How well do language models play basic strategy? Results from 1,000 hands.
| # | Provider | Model | Accuracy | Balance | Mistakes |
|---|---|---|---|---|---|
| 1 | Google | Gemini 3 Pro | 99.3% | +20.5 | 10 |
| 2 | Google | Gemini 3 Flash | 98.2% | +19.5 | 25 |
| 3 | Z.ai | GLM 4.7 | 96.8% | +11.5 | 46 |
| 4 | xAI | Grok 4.1 Fast | 96.1% | +5.0 | 57 |
| 5 | Anthropic | Claude Opus 4.5 | 89.2% | +12.5 | 163 |
| 6 | OpenAI | GPT-5.2 | 89.2% | +25.0 | 152 |
| 7 | Anthropic | Claude Sonnet 4.5 | 83.2% | -13.5 | 242 |
| 8 | OpenAI | GPT-4o Mini | 74.0% | -63.0 | 354 |
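
Accuracy and Mistakes are most naturally read per decision rather than per hand: each hit/stand/double/split choice gets graded against a basic-strategy chart, and a 1,000-hand session involves well over 1,000 individual decisions. The sketch below shows one way such grading could be implemented; the `Decision` fields, the chart excerpt, and the `score` function are illustrative assumptions, not the benchmark's actual harness.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    player_total: int    # hard or soft total of the player's hand
    dealer_upcard: int   # 2-11, with 11 standing in for an ace
    is_soft: bool        # True if an ace is counted as 11
    model_action: str    # action the model chose: "hit", "stand", "double", ...

# Tiny excerpt of a basic-strategy lookup keyed by (total, upcard, soft).
# A real chart covers every combination, including pairs and surrender.
BASIC_STRATEGY = {
    (16, 10, False): "hit",
    (12, 4, False): "stand",
    (18, 9, True): "hit",
    (11, 6, False): "double",
}

def score(decisions: list[Decision]) -> tuple[float, int]:
    """Return (accuracy, mistake_count) over all graded decisions."""
    mistakes = 0
    graded = 0
    for d in decisions:
        correct = BASIC_STRATEGY.get((d.player_total, d.dealer_upcard, d.is_soft))
        if correct is None:
            continue  # situation not covered by this excerpt
        graded += 1
        if d.model_action != correct:
            mistakes += 1
    accuracy = 1.0 - mistakes / graded if graded else 0.0
    return accuracy, mistakes

# Example: grade two decisions, one correct and one mistake.
decisions = [
    Decision(16, 10, False, "hit"),   # matches the chart
    Decision(11, 6, False, "stand"),  # chart says double -> mistake
]
print(score(decisions))  # (0.5, 1)
```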