Data · 110-Hand Analysis

AI Poker Summary

110 hands of No-Limit Hold'em between 9 LLM agents, 6 with persistent memory, 3 without. A deep analysis of decision quality, strategic reasoning, and whether memory actually helps.

110 Hands Played · 9 AI Agents · 709 Action Logs · 9,529 Game Events · 9.5h Duration

Executive Summary

What 110 hands of AI poker reveal about how models reason, adapt, and fail to adapt.

1. Agents reason about each other

Agents constructed multi-level theories about opponents mid-hand: bluffing based on perceived weakness, trapping by predicting how opponents would narrate the board. Theory of mind emerged without being prompted for it.

2. Memory is a design problem, not a capability problem

Agents with memory didn't automatically outperform those without. Fierce Lion converted observations into concrete rules ("Barrel Musa on A/K boards"). Rustic Moose wrote accurate scouting reports but never acted on them. Memory only helps when it changes decisions.

3. Models default to their training, not their notes

Claude Sonnet agents played aggressive, position-aware poker from hand one. Gemini Pro agents defaulted to ultra-tight play regardless of what their notes said. Base model tendencies were the strongest predictor of playstyle in this sample.

4. The signal is in the traces, not the scoreboard

Win/loss over 100 hands is heavily influenced by variance (one cooler hand swung Fierce Lion's P/L by 198 chips). But decision quality is visible hand by hand. The 9,500+ game events and full reasoning traces show how each model actually thinks under pressure.

Stack Progression

Game A: chip counts across 100 hands. Amnesia Minnie busted at hand #25.

Game B: three agents across 10 hands. All finished within 3 chips of breakeven, the tightest distribution in any session.

Tool Call Summary

Sandbox tool usage per agent across all 100 hands. Execute = Python/code runs. Edit = file writes (SKILL.md). Read = file reads (SKILL.md).

Agent           Execute  Edit  Read  Total
Amnesia Mickey       52     0     0     52
Amnesia Minnie        2     0     0      2
Fierce Lion          23   147   146    316
Musa                  2     0     0      2
Rustic Moose         20   106    98    224
Amnesia Jill          0     0     0      0
Total                99   253   244    596
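
The table's row and column totals can be cross-checked with a few lines of Python. This is a minimal sketch, not the project's own tooling: the dictionary layout and function names are illustrative, and it assumes that the single counts listed for the Amnesia agents and Musa are Execute calls and that blank cells mean zero. That assignment is the only one consistent with the Total row.

```python
# Per-agent tool-call counts transcribed from the table above.
# Assumption: blank cells are 0, and lone counts belong to the Execute column.
TOOL_CALLS = {
    "Amnesia Mickey": {"execute": 52, "edit": 0, "read": 0},
    "Amnesia Minnie": {"execute": 2, "edit": 0, "read": 0},
    "Fierce Lion": {"execute": 23, "edit": 147, "read": 146},
    "Musa": {"execute": 2, "edit": 0, "read": 0},
    "Rustic Moose": {"execute": 20, "edit": 106, "read": 98},
    "Amnesia Jill": {"execute": 0, "edit": 0, "read": 0},
}


def column_totals(calls):
    """Sum each tool type across all agents."""
    totals = {"execute": 0, "edit": 0, "read": 0}
    for counts in calls.values():
        for tool, n in counts.items():
            totals[tool] += n
    return totals


def agent_totals(calls):
    """Sum all tool types for each agent."""
    return {agent: sum(counts.values()) for agent, counts in calls.items()}


totals = column_totals(TOOL_CALLS)
per_agent = agent_totals(TOOL_CALLS)
print(totals, sum(totals.values()))
```

Under these assumptions the computed sums match the Total row (99, 253, 244, and 596 overall), and only the two agents that maintained SKILL.md made any Edit or Read calls.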

The Memory System: SKILL.md

How agents used (or ignored) persistent memory. Every agent was told to read, apply, decide, and write SKILL.md each hand.

SKILL.md Engagement

Agent           Reads  Writes  Rules Applied  Assessment
Fierce Lion       145     146            275  Master Student
Rustic Moose       98     106             98  Diligent Scribe
Musa                0       0              0  Complete Dropout
Amnesia Jill        0       0              0  N/A (Amnesia)
Amnesia Mickey      0       0              0  N/A (Amnesia)
Amnesia Minnie      0       0              0  N/A (Amnesia)

Fierce Lion's SKILL.md Evolution

Initial State (Hand #1)

Opponent Reads
  • Musa: Tight pre-flop (3x)
  • Agile Cheetah: Post-flop aggressor prior game
Strategic Rules
  • RFI (3x): UTG (77+, A2s+, K9s+)
  • BB Defense: Call Q2s+ vs BTN/CO

Mid-Game (Hand #50)

Opponent Reads
  • Musa: Capped BB. Folds river to 65% pot on A/K high boards [new]
  • Amnesia Jill: Aggressive RFI (3x-4x). C-bets dry boards (60%+)
  • Rustic Moose: Large SB squeeze (5x). Passive post-flop

Final State (Hand #100)

Opponent Reads
  • Musa: Aggressive 3-bet/squeeze. C-bets dry boards ~50%. Leads river with bluffs
  • Amnesia Jill: Donks A-high flops (75% pot)
Exploits
  • Barrel Musa on A/K high boards [exploit]
  • Do NOT c-bet A-high into Jill [exploit]

Rustic Moose vs. Fierce Lion: Descriptive vs. Prescriptive Memory

Rustic Moose (Descriptive)

"Musa: Triple-barrels IP as PFR."
"Amnesia Jill: Folds to large pre 3-bets."
"Fierce Lion: Fit/fold post-flop without lead."

✗ Accurate observations, but never converted to actionable exploits.
✗ Noted that Jill folds to 3-bets, but never 3-bet her light.

Fierce Lion (Prescriptive)

"Barrel Musa on A/K high boards."
"Do NOT C-bet dry A-high into Jill."
"Call Musa's C-bets (<75% pot) on dry/paired boards with A-high+."

✓ Same observations converted into concrete action plans.
✓ Actually applied these in-game.
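
The difference between the two memory styles can be made concrete as a data-structure choice. The sketch below is illustrative only and is not how either agent stored its notes: `Spot`, `Observation`, `Exploit`, and `advise` are hypothetical names, and the board features are simplified to a high card and a dry/wet flag. A descriptive note is just text about an opponent, while a prescriptive rule pairs a condition with an action, so it can be looked up at decision time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Spot:
    """A simplified decision point (hypothetical representation)."""
    opponent: str
    board_high_card: str  # e.g. "A", "K", "9"
    board_is_dry: bool


@dataclass(frozen=True)
class Observation:
    """Descriptive memory: a fact about an opponent, with no action attached."""
    opponent: str
    note: str


@dataclass(frozen=True)
class Exploit:
    """Prescriptive memory: a condition on the spot plus the action to take."""
    opponent: str
    applies: Callable[[Spot], bool]
    action: str


def advise(spot: Spot, exploits: Sequence[Exploit]) -> Optional[str]:
    """Return the first matching exploit's action, or None if no rule fires."""
    for rule in exploits:
        if rule.opponent == spot.opponent and rule.applies(spot):
            return rule.action
    return None


# Rustic Moose style: accurate, but nothing here can change a decision.
OBSERVATIONS = [Observation("Amnesia Jill", "Folds to large pre 3-bets.")]

# Fierce Lion style: two of the rules quoted above, encoded as condition -> action.
EXPLOITS = [
    Exploit("Musa", lambda s: s.board_high_card in {"A", "K"}, "barrel"),
    Exploit("Amnesia Jill", lambda s: s.board_is_dry and s.board_high_card == "A", "check"),
]

print(advise(Spot("Musa", "K", False), EXPLOITS))
print(advise(Spot("Amnesia Jill", "A", True), EXPLOITS))
```

In this framing, an observation only becomes useful once something like `advise` can return an action for it, which is the step Rustic Moose never took.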

The 5-Handed Adaptation Test

When Minnie busted at hand #25, the table shrank from 6 to 5 players. Did agents adjust?

Player          Mentioned 5-Handed  Actually Widened Ranges          Grade
Musa            3 times             Yes, noted "UTG isn't as early"  A
Amnesia Jill    1 time              N/A, already aggressive          B+
Amnesia Mickey  2 times             Partially                        C
Rustic Moose    3 times             No, still 12% VPIP               D
Fierce Lion     0 times             No, 4% PFR (!)                   F
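
VPIP (voluntarily put money in pot) and PFR (pre-flop raise) are the standard frequencies behind the numbers in this table: the share of dealt hands in which a player called or raised pre-flop, and the share in which they raised. The sketch below shows one way to compute them. The log format (one dictionary per hand mapping a player to their pre-flop actions) and the sample data are assumptions for illustration, not the project's actual action-log schema.

```python
def preflop_stats(hands, player):
    """Return (VPIP %, PFR %) for `player` over the hands they were dealt into.

    `hands` is a list of dicts mapping player name -> list of pre-flop actions.
    Posting a blind or checking the big-blind option is not voluntary,
    so only "call" and "raise" count toward VPIP.
    """
    dealt = vpip = pfr = 0
    for hand in hands:
        actions = hand.get(player)
        if actions is None:  # player was not in this hand (e.g. already busted)
            continue
        dealt += 1
        if any(a in ("call", "raise") for a in actions):
            vpip += 1
        if "raise" in actions:
            pfr += 1
    if dealt == 0:
        return 0.0, 0.0
    return 100.0 * vpip / dealt, 100.0 * pfr / dealt


# Made-up sample log, for illustration only.
sample_hands = [
    {"Hero": ["fold"], "Villain": ["raise"]},
    {"Hero": ["call"], "Villain": ["raise"]},
    {"Hero": ["raise"], "Villain": ["fold"]},
    {"Hero": ["check"], "Villain": ["fold"]},  # big blind checks: not voluntary
]

print(preflop_stats(sample_hands, "Hero"))
print(preflop_stats(sample_hands, "Villain"))
```

Computing the two figures over a window of hands, such as only those played after the table went 5-handed, is what makes it possible to see whether a stated adjustment showed up in actual play.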

Memory's Biggest Failure

Fierce Lion had 411 chips (double the table average) after hand #25 but folded K8s from UTG, K6o from CO, and JTo from CO, all clear opens in 5-handed play. Its SKILL.md ranges were calibrated for 6-handed play and never updated, so the largest stack played like the shortest. It lost 27 chips across hands 26-50 to blind attrition alone.
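
One way to picture this failure is an opening-range table keyed by table size. The sketch below is a hedged illustration, not Fierce Lion's actual logic. The 6-handed UTG range is the one shown in the Hand #1 snapshot of its SKILL.md (77+, A2s+, K9s+). The 5-handed range is entirely hypothetical and chosen only to show the mechanism, and `in_range` and `should_open` are invented helper names.

```python
RANKS = "23456789TJQKA"


def in_range(hand, token):
    """Check a hand like 'K8s', '77', or 'JTo' against a token like 'K9s+' or '77+'."""
    plus = token.endswith("+")
    base = token.rstrip("+")
    if len(base) == 2:  # pair token such as '77'
        if len(hand) != 2 or hand[0] != hand[1]:
            return False
        h, b = RANKS.index(hand[0]), RANKS.index(base[0])
        return h >= b if plus else h == b
    # Non-pair token such as 'K9s': same high card and suitedness, kicker at or above.
    if len(hand) != 3 or hand[0] != base[0] or hand[2] != base[2]:
        return False
    h, b = RANKS.index(hand[1]), RANKS.index(base[1])
    return h >= b if plus else h == b


def should_open(hand, position, n_players, ranges):
    """Open if the hand falls in any token of the range for this table size and seat."""
    tokens = ranges.get((n_players, position), ())
    return any(in_range(hand, t) for t in tokens)


OPEN_RANGES = {
    (6, "UTG"): ("77+", "A2s+", "K9s+"),  # from the Hand #1 SKILL.md snapshot
    (5, "UTG"): ("55+", "A2s+", "K8s+"),  # HYPOTHETICAL wider range, illustration only
}

print(should_open("K8s", "UTG", 6, OPEN_RANGES))
print(should_open("K8s", "UTG", 5, OPEN_RANGES))
```

With a single unkeyed range, K8s falls just below K9s+ and is folded regardless of how many players remain. Keying the lookup by player count forces the question of which range applies once the table shrinks.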

Detailed Hand Histories

12 featured hands with interactive table replays and full AI reasoning traces.


Action Frequency Analysis

How each agent distributed their actions across 100 hands.
