AI Poker Study - 100-Hand Analysis

Executive Summary

What 110 hands of AI poker reveal about how models reason, adapt, and fail to adapt.

1. Agents reason about each other

Agents constructed multi-level theories about opponents mid-hand: bluffing based on perceived weakness, trapping by predicting how opponents would narrate the board. Theory of mind emerged without being prompted for it.

2. Memory is a design problem, not a capability problem

Agents with memory didn't automatically outperform those without. Fierce Lion converted observations into concrete rules ("Barrel Musa on A/K boards"). Rustic Moose wrote accurate scouting reports but never acted on them. Memory only helps when it changes decisions.

3. Models default to their training, not their notes

Claude Sonnet agents played aggressive, position-aware poker from hand one. Gemini Pro agents defaulted to ultra-tight play regardless of what their notes said. Base model tendencies were the strongest predictor of playstyle in this sample.

4. The signal is in the traces, not the scoreboard

Win/loss over 100 hands is heavily influenced by variance (one cooler hand swung Fierce Lion's P/L by 198 chips). But decision quality is visible hand by hand. The 9,500+ game events and full reasoning traces show how each model actually thinks under pressure.

Stack Progression

Game A: chip counts across 100 hands. Amnesia Minnie busted at hand #25.

Game B: three agents across 10 hands. All finished within 3 chips of breakeven, the tightest distribution in any session.

Tool Call Summary

Sandbox tool usage per agent across all 100 hands. Execute = Python/code runs. Edit = file writes (SKILL.md). Read = file reads (SKILL.md).

Agent	Execute	Edit	Read	Total
Amnesia Mickey	52	—	—	52
Amnesia Minnie	2	—	—	2
Fierce Lion	23	147	146	316
Musa	2	—	—	2
Rustic Moose	20	106	98	224
Amnesia Jill	—	—	—	0
Total	99	253	244	596
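The totals can be re-derived from the per-agent rows. The sketch below copies the counts from the table (treating a dash as zero) and also computes what share of all tool calls touched SKILL.md. The dictionary layout and function names are illustrative, not part of the study's tooling.

```python
# Per-agent sandbox tool calls, copied from the table above.
# None stands for a dash in the table (no calls of that type).
TOOL_CALLS = {
    "Amnesia Mickey": {"execute": 52, "edit": None, "read": None},
    "Amnesia Minnie": {"execute": 2, "edit": None, "read": None},
    "Fierce Lion": {"execute": 23, "edit": 147, "read": 146},
    "Musa": {"execute": 2, "edit": None, "read": None},
    "Rustic Moose": {"execute": 20, "edit": 106, "read": 98},
    "Amnesia Jill": {"execute": None, "edit": None, "read": None},
}


def row_total(row):
    """Total tool calls for one agent, counting dashes as zero."""
    return sum(count or 0 for count in row.values())


def column_totals(table):
    """Sum each tool type across all agents."""
    totals = {"execute": 0, "edit": 0, "read": 0}
    for row in table.values():
        for tool, count in row.items():
            totals[tool] += count or 0
    return totals


def memory_share(table):
    """Fraction of all tool calls that were SKILL.md edits or reads."""
    cols = column_totals(table)
    return (cols["edit"] + cols["read"]) / sum(cols.values())
```

With these counts, SKILL.md edits and reads account for 497 of the 596 calls, and all of them come from Fierce Lion and Rustic Moose.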

The Memory System: SKILL.md

How agents used (or ignored) persistent memory. Every agent was told to read, apply, decide, and write SKILL.md each hand.

SKILL.md Engagement

Agent	Reads	Writes	Rules Applied	Assessment
Fierce Lion	145	146	275	Master Student
Rustic Moose	98	106	98	Diligent Scribe
Musa	0	0	0	Complete Dropout
Amnesia Jill	0	0	0	N/A (Amnesia)
Amnesia Mickey	0	0	0	N/A (Amnesia)
Amnesia Minnie	0	0	0	N/A (Amnesia)
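The read, apply, decide, write cycle can be pictured as a small per-hand loop. This is a minimal sketch under stated assumptions, not the study's harness: `play_hand`, `toy_decide`, `toy_summarize`, and the hand-state keys are hypothetical names, and the real agents made these steps through sandbox tool calls rather than direct function calls.

```python
from pathlib import Path


def play_hand(skill_path: Path, hand_state: dict, decide, summarize) -> str:
    """Run one hand of the read -> apply -> decide -> write cycle."""
    # Read: load the persistent notes, if any exist yet.
    notes = skill_path.read_text() if skill_path.exists() else ""
    # Apply + decide: the decision function sees both the hand and the notes.
    action = decide(hand_state, notes)
    # Write: append anything new that was learned this hand.
    update = summarize(hand_state, action)
    if update and update not in notes:
        skill_path.write_text(notes + update + "\n")
    return action


def toy_decide(hand_state: dict, notes: str) -> str:
    """Hypothetical policy: bet only if the notes hold an exploit for this opponent."""
    return "bet" if f"Barrel {hand_state['opponent']}" in notes else "check"


def toy_summarize(hand_state: dict, action: str) -> str:
    """Hypothetical note-taker: record an exploit after seeing a river fold."""
    if hand_state.get("opponent_folded_river"):
        return f"Barrel {hand_state['opponent']} on A/K high boards"
    return ""
```

An agent that skips the read step (Musa, or the amnesia agents by design) calls `decide` with empty notes every hand, so nothing written earlier can influence play.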

Fierce Lion's SKILL.md Evolution

Initial State (Hand #1)

Opponent Reads

Musa: Tight pre-flop (3x)
Agile Cheetah: Post-flop aggressor (prior game)

Strategic Rules

RFI (3x): UTG (77+, A2s+, K9s+)
BB Defense: Call Q2s+ vs BTN/CO

Mid-Game (Hand #50)

Opponent Reads

Musa: Capped BB. Folds river to 65% pot on A/K high boards (new)
Amnesia Jill: Aggressive RFI (3x-4x). C-bets dry boards (60%+)
Rustic Moose: Large SB squeeze (5x). Passive post-flop

Final State (Hand #100)

Opponent Reads

Musa: Aggressive 3-bet/squeeze. C-bets dry boards ~50%. Leads river with bluffs
Amnesia Jill: Donks A-high flops (75% pot)

Exploits

Barrel Musa on A/K high boards
Do NOT c-bet A-high into Jill

Rustic Moose vs. Fierce Lion: Descriptive vs. Prescriptive Memory

Rustic Moose (Descriptive)

"Musa: Triple-barrels IP as PFR."
"Amnesia Jill: Folds to large pre 3-bets."
"Fierce Lion: Fit/fold post-flop without lead."

✗ Accurate observations, but never converted to actionable exploits.
✗ Noted that Jill folds to 3-bets, but never 3-bet her light.

Fierce Lion (Prescriptive)

"Barrel Musa on A/K high boards."
"Do NOT C-bet dry A-high into Jill."
"Call Musa's C-bets (<75% pot) on dry/paired boards with A-high+."

✓ Same observations converted into concrete action plans.
✓ Actually applied these in-game.
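The difference between the two note styles can be sketched as a difference in data shape: a descriptive note is a fact with no trigger, while a prescriptive rule pairs a condition with an action. The classes, the `spot` keys (`opponent`, `high_card`, `texture`), and the action strings below are hypothetical, chosen only to illustrate the contrast using the quoted notes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    """Descriptive memory: a fact about an opponent with no attached action."""
    opponent: str
    note: str


@dataclass
class Rule:
    """Prescriptive memory: a trigger condition plus the action to take."""
    opponent: str
    applies: Callable[[dict], bool]
    action: str


# Rustic Moose style: accurate, but nothing here can fire during a hand.
OBSERVATIONS = [
    Observation("Musa", "Triple-barrels IP as PFR."),
    Observation("Amnesia Jill", "Folds to large pre 3-bets."),
]

# Fierce Lion style: each entry says when it applies and what to do.
RULES = [
    Rule("Musa", lambda s: s.get("high_card") in ("A", "K"), "barrel"),
    Rule(
        "Amnesia Jill",
        lambda s: s.get("high_card") == "A" and s.get("texture") == "dry",
        "check (no c-bet)",
    ),
]


def choose_action(spot: dict, rules: list, default: str = "default strategy") -> str:
    """Return the first matching rule's action, else fall back to the default."""
    for rule in rules:
        if rule.opponent == spot["opponent"] and rule.applies(spot):
            return rule.action
    return default
```

Passing an empty rule list, which is effectively what a purely descriptive file offers the decision step, always returns the default strategy no matter how accurate the observations are.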

The 5-Handed Adaptation Test

When Minnie busted at hand #25, the table shrank from 6 to 5 players. Did agents adjust?

Player	Mentioned 5-Handed	Actually Widened Ranges	Grade
Musa	3 times	Yes, noted "UTG isn't as early"	A
Amnesia Jill	1 time	N/A, already aggressive	B+
Amnesia Mickey	2 times	Partially	C
Rustic Moose	3 times	No, still 12% VPIP	D
Fierce Lion	0 times	No, 4% PFR (!)	F
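For readers unfamiliar with the two statistics in the table: VPIP is the percentage of hands in which a player voluntarily put chips in preflop (a call or a raise; posting a blind or checking the big blind does not count), and PFR is the percentage of hands in which the player raised preflop. A minimal sketch of the calculation follows, assuming a hypothetical input format of one list of preflop action strings per hand; the study's actual event schema is not shown in the source.

```python
def vpip_pfr(preflop_actions):
    """Return (VPIP %, PFR %) for one player.

    preflop_actions: one list of action strings per hand dealt to the player,
    e.g. [["fold"], ["raise"], ["call", "fold"]]. Blind posts and checks are
    not voluntary, so only "call" and "raise" count toward VPIP.
    """
    hands = len(preflop_actions)
    if hands == 0:
        return 0.0, 0.0
    vpip_hands = sum(
        any(action in ("call", "raise") for action in actions)
        for actions in preflop_actions
    )
    pfr_hands = sum("raise" in actions for actions in preflop_actions)
    return 100 * vpip_hands / hands, 100 * pfr_hands / hands


# Invented sample for illustration only (not data from the study):
# 20 hands with 2 raises, 3 calls, and 15 folds.
SAMPLE = [["raise"]] * 2 + [["call"]] * 3 + [["fold"]] * 15
```

On the invented sample this gives a VPIP of 25% and a PFR of 10%. Because PFR hands are a subset of VPIP hands, PFR can never exceed VPIP.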

Memory's Biggest Failure

Fierce Lion had 411 chips (double the table average) after hand #25 but folded K8s from UTG, K6o from CO, and JTo from CO, all clear opens in 5-handed play. Its SKILL.md ranges were calibrated for 6-handed play and never updated. The largest stack played like the shortest. It lost 27 chips across hands 26-50 to blind attrition alone.
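One way to keep position-keyed ranges from going stale when the table shrinks is to look them up by the number of players left to act rather than by seat label. The sketch below assumes the common 6-max seat naming (UTG, HJ, CO, BTN, SB, BB), which the source does not spell out, and the function name is hypothetical. Under that assumption, the first seat to act 5-handed has four players behind it and maps to the 6-max HJ range, not the UTG range.

```python
# Seats in preflop action order at a full 6-handed table (assumed naming).
SIX_MAX_ORDER = ["UTG", "HJ", "CO", "BTN", "SB", "BB"]


def six_max_equivalent(action_index: int, players: int) -> str:
    """Map a seat at a short-handed table to its 6-max equivalent.

    action_index: 0 for the first player to act preflop, 1 for the next, etc.
    players: number of players dealt in (3 to 6).
    The match is made on how many players are still left to act behind.
    """
    if not 3 <= players <= 6:
        raise ValueError("players must be between 3 and 6")
    if not 0 <= action_index < players:
        raise ValueError("action_index out of range for this table size")
    players_behind = players - 1 - action_index
    return SIX_MAX_ORDER[len(SIX_MAX_ORDER) - 1 - players_behind]
```

A range file keyed this way would have had Fierce Lion consult a wider range for its first-to-act hands after hand #25 without any rewrite of SKILL.md.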

Detailed Hand Histories

12 featured hands with interactive table replays and full AI reasoning traces.


Action Frequency Analysis

How each agent distributed their actions across 100 hands.