Stateful AI · a learned ML core, playable

A model that taught itself to bluff.

No strategy, no examples, no language model: just the rules of Kuhn poker and 120k hands of self-play. By minimizing regret it converged to a Nash equilibrium. It learned to bluff with the Jack, value-bet the King about three times as often as it bluffs the Jack, and never bet the Queen, which matches the known game-theoretic optimum.
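A minimal sketch of what "minimizing regret" can look like on Kuhn poker. It assumes vanilla counterfactual regret minimization (CFR) with regret matching; the page does not say which regret-minimizing algorithm or sampling scheme was actually used. To stay deterministic it sweeps all six deals per iteration instead of sampling self-play hands. It also assumes the standard rules (one-chip ante, one-chip bet), and every name here (`Node`, `cfr`, `train`, `value`) is illustrative.

```python
from itertools import permutations

ACTIONS = "pb"                             # p = pass (check or fold), b = bet (bet or call)
DEALS = list(permutations([1, 2, 3], 2))   # assumed coding: 1 = Jack, 2 = Queen, 3 = King
nodes = {}                                 # information set -> Node, e.g. "1pb" = Jack after check, bet


class Node:
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self):
        # Regret matching: play each action in proportion to its positive regret.
        pos = [max(r, 0.0) for r in self.regret_sum]
        total = sum(pos)
        return [x / total for x in pos] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        # The average over iterations is what converges toward equilibrium.
        total = sum(self.strategy_sum)
        return [x / total for x in self.strategy_sum] if total > 0 else [0.5, 0.5]


def payoff(cards, history):
    """Payoff to the player about to act, or None if the hand is not over."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history[-1] == "p":
        if history == "pp":                # check, check: showdown for the antes
            return 1 if higher else -1
        return 1                           # the opponent folded to a bet
    if history[-2:] == "bb":               # bet, call: showdown for 2 chips each
        return 2 if higher else -2
    return None


def cfr(cards, history, reach):
    """Return the acting player's expected payoff and update regrets on the way."""
    result = payoff(cards, history)
    if result is not None:
        return result
    player = len(history) % 2
    node = nodes.setdefault(str(cards[player]) + history, Node())
    strat = node.strategy()
    utils = [0.0, 0.0]
    for a in range(2):
        next_reach = list(reach)
        next_reach[player] *= strat[a]
        # Zero-sum game: the child's value belongs to the opponent, so negate it.
        utils[a] = -cfr(cards, history + ACTIONS[a], next_reach)
    node_util = strat[0] * utils[0] + strat[1] * utils[1]
    for a in range(2):
        # Regret is weighted by the opponent's probability of reaching this point.
        node.regret_sum[a] += reach[1 - player] * (utils[a] - node_util)
        node.strategy_sum[a] += reach[player] * strat[a]
    return node_util


def train(iterations):
    for _ in range(iterations):
        for cards in DEALS:
            cfr(cards, "", [1.0, 1.0])


def value(cards, history=""):
    """Payoff to the acting player when both sides play their average strategy."""
    result = payoff(cards, history)
    if result is not None:
        return result
    player = len(history) % 2
    avg = nodes[str(cards[player]) + history].average_strategy()
    return sum(-avg[a] * value(cards, history + ACTIONS[a]) for a in range(2))


train(10_000)
game_value = sum(value(c) for c in DEALS) / len(DEALS)
print(f"game value for the first player: {game_value:.4f}")
for key in sorted(nodes):
    print(key, f"P(bet or call) = {nodes[key].average_strategy()[1]:.3f}")
```

Each key is a card followed by the betting so far, so `1` is the first player opening with the Jack and `1p` is the second player holding the Jack after a check. With enough iterations the printed game value should approach −1/18 ≈ −0.0556.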

exploitability 0.00057 (→ 0 = unbeatable) · game value −1/18 = −0.0556 · trained on 120k self-play hands

Play it

You're Player 1 — you act first. Each of you antes 1 chip. The AI's card stays hidden until showdown. Watch it bluff you.

The strategy it found

Kuhn poker's equilibria form a one-parameter family, so we don't grade against a single table. Instead we measure exploitability exactly (the most a perfect opponent could win), and it approaches zero as training proceeds. These are the frequencies the model actually learned:
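A sketch of how exploitability can be measured exactly in a game this small. It writes down the textbook one-parameter equilibrium family (the first player bets the Jack with probability α and the King with 3α, for α between 0 and 1/3) and finds each side's best response by brute force over all 64 pure strategies. Exploitability is taken here as the sum of the two best-response gains, which is an assumption; the page does not spell out its exact definition. The function names and the 1 = Jack, 2 = Queen, 3 = King coding are illustrative.

```python
from fractions import Fraction as F
from itertools import permutations, product

DEALS = list(permutations([1, 2, 3], 2))   # assumed coding: 1 = Jack, 2 = Queen, 3 = King


def equilibrium(alpha):
    """P(bet or call) at each (card, history) for the textbook equilibrium family."""
    a, third = F(alpha), F(1, 3)
    return {
        (1, ""): a,       (2, ""): F(0),         (3, ""): 3 * a,    # first player opens
        (1, "pb"): F(0),  (2, "pb"): a + third,  (3, "pb"): F(1),   # first player checked, faces a bet
        (1, "p"): third,  (2, "p"): F(0),        (3, "p"): F(1),    # second player after a check
        (1, "b"): F(0),   (2, "b"): third,       (3, "b"): F(1),    # second player facing a bet
    }


def payoff_first(cards, history):
    """Chips won by the first player at a terminal history, else None."""
    sign = 1 if cards[0] > cards[1] else -1
    return {"pp": sign, "bp": 1, "pbp": -1,
            "bb": 2 * sign, "pbb": 2 * sign}.get(history)


def ev_first(profile, cards, history=""):
    """Exact expected payoff to the first player from this point on."""
    result = payoff_first(cards, history)
    if result is not None:
        return F(result)
    player = len(history) % 2
    p_bet = profile[(cards[player], history)]
    return ((1 - p_bet) * ev_first(profile, cards, history + "p")
            + p_bet * ev_first(profile, cards, history + "b"))


def best_response_value(profile, player):
    """Most that `player` can win against the other side's fixed strategy."""
    keys = [k for k in profile if len(k[1]) % 2 == player]
    best = None
    # Only 2**6 = 64 pure strategies per player, so brute force is exact and cheap.
    for choice in product([F(0), F(1)], repeat=len(keys)):
        trial = {**profile, **dict(zip(keys, choice))}
        v = sum(ev_first(trial, c) for c in DEALS) / len(DEALS)
        v = v if player == 0 else -v
        best = v if best is None or v > best else best
    return best


def exploitability(profile):
    # The game values cancel in a zero-sum game, so this sum is 0 exactly at equilibrium.
    return best_response_value(profile, 0) + best_response_value(profile, 1)


for alpha in (F(0), F(1, 6), F(1, 3)):
    profile = equilibrium(alpha)
    print(alpha, best_response_value(profile, 0), exploitability(profile))
```

Because the arithmetic uses exact fractions, every member of the family should report an exploitability of exactly 0 and a first-player value of −1/18, while a non-equilibrium profile such as betting half the time everywhere scores strictly above 0. That is why a single reference table is not needed to grade the learned strategy.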

situation · card · P(bet / call)