[View code on GitHub](https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/cfr/kuhn/__init__.py)
This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.
Kuhn Poker is a two-player, three-card betting game. Each player is dealt one card out of Ace, King and Queen (no suits). There are only three cards in the pack, so one card is left out. Ace beats King and Queen, and King beats Queen, just as in the normal ranking of cards.
Both players ante 1 chip (blindly bet 1 chip). After looking at the cards, the first player can either pass or bet 1 chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) 1 chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. This game is played repeatedly, and a good strategy will optimize for the long-term utility (or winnings).
Here are some example games:
* KAp - Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn't get a betting chance, and Player 2 wins the pot of 2 chips.
* QKbp - Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of 3 chips (two antes plus the bet) because Player 2 folded.
* QAbb - Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of 4 chips.

Here we extend the InfoSet class and History class defined in the parent labml_nn.cfr package (__init__.py) with Kuhn Poker specifics.
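As a cross-check of the example games, here is a minimal standalone sketch that follows the chips according to the rules described above. It does not use the classes from this file; the names `RANK` and `play` are hypothetical helpers introduced only for illustration.

```python
from typing import Tuple

RANK = {'A': 3, 'K': 2, 'Q': 1}  # higher number = stronger card


def play(history: str) -> Tuple[int, int, int]:
    """Return (pot, winner, net chips won by the winner) for a finished game.

    `history` is two card characters followed by the betting actions,
    e.g. 'QKbp'. Players are numbered 1 and 2 as in the prose above.
    """
    card1, card2, actions = history[0], history[1], history[2:]
    paid = {1: 1, 2: 1}  # both players ante 1 chip
    showdown_winner = 1 if RANK[card1] > RANK[card2] else 2

    if actions == 'p':  # Player 1 passes: immediate showdown
        winner = showdown_winner
    elif actions == 'bp':  # Player 1 bets, Player 2 folds
        paid[1] += 1
        winner = 1
    elif actions == 'bb':  # Player 1 bets, Player 2 calls: showdown
        paid[1] += 1
        paid[2] += 1
        winner = showdown_winner
    else:
        raise ValueError(f'not a finished game: {history!r}')

    pot = paid[1] + paid[2]
    # Net winnings = pot minus what the winner put in themselves
    return pot, winner, pot - paid[winner]


for h in ['KAp', 'QKbp', 'QAbb']:
    print(h, play(h))
```

The net winnings are what the utility function later in this file works with: the winner gains 1 chip after a pass or a fold, and 2 chips after a called bet.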
from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs
Kuhn poker actions are pass (p) or bet (b)
ACTIONS = cast(List[Action], ['p', 'b'])
The three cards in play are Ace, King and Queen
CHANCES = cast(List[Action], ['A', 'K', 'Q'])
There are two players
PLAYERS = cast(List[Player], [0, 1])
class InfoSet(_InfoSet):
Does not support save/load
    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':
        pass
Return the list of actions. Terminal states are handled by the History class.
    def actions(self) -> List[Action]:
        return ACTIONS
Human readable string representation - it gives the betting probability
    def __repr__(self):
        total = sum(self.cumulative_strategy.values())
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'
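To make the normalization in this representation concrete, here is a small standalone sketch. It assumes the cumulative strategy is a plain dictionary from action to accumulated weight; the function name `bet_probability` and the example numbers are hypothetical.

```python
from typing import Dict


def bet_probability(cumulative_strategy: Dict[str, float]) -> float:
    """Fraction of the accumulated strategy weight placed on the bet action 'b'."""
    total = sum(cumulative_strategy.values())
    total = max(total, 1e-6)  # avoid dividing by zero before any weight is accumulated
    return cumulative_strategy['b'] / total


# Hypothetical accumulated weights, made up for illustration
example = {'p': 750.0, 'b': 250.0}
print(f'{bet_probability(example) * 100: .1f}%')
```

The `max(total, 1e-6)` guard means an information set that has not been visited yet reports a betting probability of zero instead of raising a division error.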
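This defines when a game ends, calculates the utility and samples chance events (dealing cards).
The history is stored in a string: the first two characters are the cards dealt to Player 1 and Player 2, and the remaining characters are the betting actions in order.
class History(_History):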
History
    history: str
Initialize with a given history string
    def __init__(self, history: str = ''):
        self.history = history
Whether the history is terminal (game over).
    def is_terminal(self):
Players are yet to take actions
        if len(self.history) <= 2:
            return False
Last player to play passed (game over)
        elif self.history[-1] == 'p':
            return True
Both players called (bet) (game over)
        elif self.history[-2:] == 'bb':
            return True
Any other combination
        else:
            return False
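The terminal test can be tried on plain strings. The following is a standalone copy of the same checks, written as a free function so it runs without the `labml_nn` package; it is a sketch for experimentation, not a replacement for the method.

```python
def is_terminal(history: str) -> bool:
    """Standalone copy of the terminal test, for experimenting with strings."""
    if len(history) <= 2:  # at most the two cards: nobody has acted yet
        return False
    if history[-1] == 'p':  # the last action was a pass
        return True
    if history[-2:] == 'bb':  # a bet followed by a call
        return True
    return False


for h in ['', 'K', 'KA', 'KAp', 'KAb', 'KAbp', 'KAbb']:
    print(f'{h!r:8} terminal={is_terminal(h)}')
```

Only `'KAb'` among the histories with betting actions is non-terminal, because the second player still has to respond to the bet.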
Calculate the terminal utility for player 1, $u_1(z)$
    def _terminal_utility_p1(self) -> float:
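+1 if Player 1 has a better card and −1 otherwise. The string comparison works because the card letters sort alphabetically in rank order ('A' < 'K' < 'Q').
        winner = -1 + 2 * (self.history[0] < self.history[1])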
Second player passed
        if self.history[-2:] == 'bp':
            return 1
Both players called, the player with better card wins 2 chips
        elif self.history[-2:] == 'bb':
            return winner * 2
First player passed, the player with better card wins 1 chip
        elif self.history[-1] == 'p':
            return winner
History is non-terminal
        else:
            raise RuntimeError()
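The utility rules above can be tabulated for every deal with a standalone sketch. It mirrors the branches of the method as a free function and cross-checks the string comparison against an explicit ranking; `RANK` and `terminal_utility_p1` are hypothetical names used only here.

```python
from itertools import permutations

RANK = {'A': 3, 'K': 2, 'Q': 1}  # explicit ranking, used only as a cross-check


def terminal_utility_p1(history: str) -> int:
    """Standalone version of the Player 1 utility computed above."""
    # 'A' < 'K' < 'Q' as strings, so the "smaller" character is the better card
    winner = -1 + 2 * (history[0] < history[1])
    if history[-2:] == 'bp':  # second player folded
        return 1
    if history[-2:] == 'bb':  # bet and call: showdown for 2 chips
        return winner * 2
    if history[-1] == 'p':  # first player passed: showdown for 1 chip
        return winner
    raise RuntimeError(f'non-terminal history: {history!r}')


for c1, c2 in permutations('AKQ', 2):
    # The string comparison agrees with the explicit ranking for every deal
    assert (c1 < c2) == (RANK[c1] > RANK[c2])
    row = {a: terminal_utility_p1(c1 + c2 + a) for a in ('p', 'bp', 'bb')}
    print(c1 + c2, row)
```

The order of the branches matters: `'bp'` also ends in `'p'`, so the fold case has to be tested before the plain pass case.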
Get the terminal utility for player i
    def terminal_utility(self, i: Player) -> float:
If i is Player 1
        if i == PLAYERS[0]:
            return self._terminal_utility_p1()
Otherwise, $u_2(z) = -u_1(z)$
        else:
            return -1 * self._terminal_utility_p1()
The first two events are card dealing; i.e. chance events
    def is_chance(self) -> bool:
        return len(self.history) < 2
Add an action to the history and return a new history
    def __add__(self, other: Action):
        return History(self.history + other)
Current player
    def player(self) -> Player:
        return cast(Player, len(self.history) % 2)
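The length of the history string decides both whether the next event is a chance event and whose turn it is. The following standalone sketch walks through the prefixes of one game; the free functions mirror the two methods above, and player index 0 corresponds to Player 1 in the prose.

```python
def is_chance(history: str) -> bool:
    """The first two events (positions 0 and 1) are card deals."""
    return len(history) < 2


def player(history: str) -> int:
    """Index of the player to act: 0 after the deal, 1 after the first action."""
    return len(history) % 2


full = 'QAbb'
for n in range(len(full)):
    prefix = full[:n]
    who = 'chance' if is_chance(prefix) else f'player {player(prefix)}'
    print(f'{prefix!r:6} -> next event {full[n]!r} chosen by {who}')
```

Because exactly two cards are dealt before any betting, the parity of the length lines up with the turn order: length 2 gives index 0 and length 3 gives index 1.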
Sample a chance action
    def sample_chance(self) -> Action:
        while True:
Randomly pick a card
            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]
See if the card was dealt before
            for c in self.history:
                if c == chance:
                    chance = None
                    break
Return the card if it was not dealt before
            if chance is not None:
                return cast(Action, chance)
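The loop above is a form of rejection sampling: draw a card uniformly and redraw if it has already been dealt. Below is a standalone sketch of the same idea. As an assumption for portability it uses Python's `random` module with a fixed seed instead of `np.random`, and the free function `sample_chance` takes the history string as an argument.

```python
import random
from collections import Counter

CHANCES = ['A', 'K', 'Q']


def sample_chance(history: str, rng: random.Random) -> str:
    """Rejection sampling: redraw until the card is not already in `history`."""
    while True:
        card = rng.choice(CHANCES)
        if card not in history:
            return card


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter()
for _ in range(6000):
    first = sample_chance('', rng)
    second = sample_chance(first, rng)
    counts[first + second] += 1

# Six ordered deals are possible; each should appear roughly 1000 times
print(sorted(counts.items()))
```

Checking membership in the whole history string is safe here only because, at the time cards are dealt, the history contains nothing but cards.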
Human readable representation
    def __repr__(self):
        return repr(self.history)
Information set key for the current history. This is a string of actions only visible to the current player.
    def info_set_key(self) -> str:
Get current player
        i = self.player()
Current player sees her card and the betting actions
        return self.history[i] + self.history[2:]
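To see which histories share an information set, the key can be computed for every decision point. This standalone sketch mirrors the key computation as a free function and groups histories by key; it assumes the variant described in this file, where a pass by the first player ends the game.

```python
from itertools import permutations


def info_set_key(history: str) -> str:
    """Own card plus the public betting actions, as seen by the player to act."""
    i = len(history) % 2  # index of the player to act: 0 or 1
    return history[i] + history[2:]


# Decision points: player index 0 right after the deal, and
# player index 1 after a bet (a first-player pass ends the game).
keys = {}
for c1, c2 in permutations('AKQ', 2):
    for actions in ('', 'b'):
        h = c1 + c2 + actions
        keys.setdefault(info_set_key(h), []).append(h)

for key in sorted(keys):
    print(key, keys[key])
```

Each key groups the two histories that differ only in the opponent's hidden card, which is exactly what the acting player cannot distinguish.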
    def new_info_set(self) -> InfoSet:
Create a new information set object
        return InfoSet(self.info_set_key())
A function to create an empty history object
def create_new_history():
    return History()
The configurations class extends the CFR configurations class
class Configs(CFRConfigs):
    pass
Set the create_new_history method for Kuhn Poker
@option(Configs.create_new_history)
def _cnh():
    return create_new_history
def main():
Create an experiment. We only write tracking information to SQLite to speed things up: since the algorithm iterates fast and we track data on each iteration, writing to other destinations such as TensorBoard can be relatively time-consuming. SQLite is enough for our analytics.
    experiment.create(name='kuhn_poker', writers={'sqlite'})
Initialize configuration
    conf = Configs()
Load configuration
    experiment.configs(conf)
Start the experiment
    with experiment.start():
Start iterating
        conf.cfr.iterate()
if __name__ == '__main__':
    main()