Counterfactual Regret Minimization (CFR) on Kuhn Poker

This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.

Kuhn Poker is a two-player, three-card betting game. The players are dealt one card each out of Ace, King and Queen (no suits). There are only three cards in the pack, so one card is left out. Ace beats King and Queen, and King beats Queen, just like in the normal ranking of cards.

Both players ante 1 chip (blindly bet 1 chip). After looking at the cards, the first player can either pass or bet 1 chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) 1 chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. This game is played repeatedly, and a good strategy will optimize for the long-term utility (or winnings).

Here are some example games:

  • KAp - Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn't get a betting chance, and Player 2 wins the pot of 2 chips.
  • QKbp - Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of 3 chips because Player 2 folded.
  • QAbb - Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of 4 chips.
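
The example games can be checked with a small standalone sketch. It assumes the history-string convention described below (two cards, then the betting actions); `settle` and `RANK` are hypothetical helpers introduced only for illustration, returning the winning player (1 or 2) and the final pot size.

```python
# Hypothetical helpers for illustration; not part of labml_nn.
RANK = {'A': 3, 'K': 2, 'Q': 1}  # higher value means a stronger card


def settle(history: str) -> tuple:
    """Return (winning player, pot size) for a finished game history.

    history[0] and history[1] are the cards of Player 1 and Player 2;
    the remaining characters are betting actions ('p' or 'b').
    """
    cards, actions = history[:2], history[2:]
    # Each player antes 1 chip, and every bet adds 1 more chip to the pot
    pot = 2 + actions.count('b')
    # A bet followed by a pass is a fold: the bettor (Player 1) takes the pot
    if actions == 'bp':
        return 1, pot
    # Otherwise ('p' or 'bb') the cards are compared at showdown
    winner = 1 if RANK[cards[0]] > RANK[cards[1]] else 2
    return winner, pot


for game in ['KAp', 'QKbp', 'QAbb']:
    print(game, settle(game))
```

For `QKbp` the pot is 3 chips: two antes plus Player 1's bet. Player 1's net gain is therefore 1 chip, which matches the utility returned for a fold in `_terminal_utility_p1`.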

Here we extend the `InfoSet` and `History` classes defined in the CFR module (`labml_nn.cfr`) with Kuhn Poker specifics.

```python
from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs
```

Kuhn poker actions are pass (`p`) or bet (`b`).

```python
ACTIONS = cast(List[Action], ['p', 'b'])
```

The three cards in play are Ace, King and Queen.

```python
CHANCES = cast(List[Action], ['A', 'K', 'Q'])
```

There are two players.

```python
PLAYERS = cast(List[Player], [0, 1])
```
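
Because the two cards are dealt without replacement, there are 3 × 2 = 6 equally likely deals. The sketch below enumerates them with the standard library's `itertools.permutations`; the lists are plain stand-ins for the constants above so that it runs without labml.

```python
from itertools import permutations

# Plain-list stand-ins for the CHANCES and PLAYERS constants
CHANCES = ['A', 'K', 'Q']
PLAYERS = [0, 1]

# Each deal is an ordered pair: (card of Player 1, card of Player 2)
deals = [''.join(pair) for pair in permutations(CHANCES, len(PLAYERS))]
print(deals)
```

Each deal has probability 1/6: the first card is uniform over three cards and the second is uniform over the two that remain.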

Information set

```python
class InfoSet(_InfoSet):
```

Does not support save/load.

```python
    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':
        pass
```

Return the list of actions. Terminal states are handled by the `History` class.

```python
    def actions(self) -> List[Action]:
        return ACTIONS
```

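Human-readable string representation: it gives the betting probability of the average strategy, i.e. the share of `b` in the cumulative strategy.

```python
    def __repr__(self):
        total = sum(self.cumulative_strategy.values())
        # Guard against division by zero when nothing has been accumulated yet
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'
```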
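
A standalone sketch of the same arithmetic, assuming a plain dictionary from action to accumulated weight in place of `cumulative_strategy`. `bet_percentage` is a hypothetical helper and the numbers are made up for illustration.

```python
def bet_percentage(cumulative_strategy: dict) -> str:
    """Format the share of 'b' in a cumulative strategy, like InfoSet.__repr__."""
    total = sum(cumulative_strategy.values())
    # Guard against division by zero for an info set with no accumulated weight
    total = max(total, 1e-6)
    bet = cumulative_strategy['b'] / total
    # The space flag in ' .1f' leaves a leading space for non-negative numbers
    return f'{bet * 100: .1f}%'


print(bet_percentage({'p': 30.0, 'b': 10.0}))
print(bet_percentage({'p': 0.0, 'b': 0.0}))
```

The first call reports that `b` makes up 10 of 40 units of accumulated weight, i.e. 25%; the second shows that an empty info set prints as zero instead of raising an error.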

History

This defines when a game ends, calculates the utility and samples chance events (dealing cards).

The history is stored in a string:

  • The first two characters are the cards dealt to Player 1 and Player 2
  • The third character is the action by the first player
  • The fourth character is the action by the second player (present only if the first player bet)

```python
class History(_History):
```

The history string

```python
    history: str
```

Initialize with a given history string

```python
    def __init__(self, history: str = ''):
        self.history = history
```

Whether the history is terminal (game over).

```python
    def is_terminal(self):
        # Players are yet to take actions
        if len(self.history) <= 2:
            return False
        # Last player to play passed (game over)
        elif self.history[-1] == 'p':
            return True
        # Both players bet (game over)
        elif self.history[-2:] == 'bb':
            return True
        # Any other combination
        else:
            return False
```
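
These rules imply that, below any deal, the game tree has exactly three leaves. A sketch that walks the tree with a plain-string copy of the rules; `is_terminal` and `terminal_histories` here are hypothetical standalone functions, not the `History` methods.

```python
ACTIONS = ['p', 'b']


def is_terminal(history: str) -> bool:
    """Plain-string copy of the rules in History.is_terminal."""
    if len(history) <= 2:
        return False  # no betting action yet
    if history[-1] == 'p':
        return True  # the last player passed
    return history[-2:] == 'bb'  # a bet was called


def terminal_histories(history: str) -> list:
    """Collect every terminal history reachable from a dealt `history`."""
    if is_terminal(history):
        return [history]
    leaves = []
    for action in ACTIONS:
        leaves.extend(terminal_histories(history + action))
    return leaves


print(terminal_histories('KA'))
```

Starting from the deal `KA`, the walk finds `KAp`, `KAbp` and `KAbb`. A sequence such as `KApb` is never generated, because the walk stops as soon as a history is terminal.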

Calculate the terminal utility for Player 1, $u_1(z)$.

```python
    def _terminal_utility_p1(self) -> float:
        # +1 if Player 1 has a better card and -1 otherwise.
        # This works because the card letters sort as 'A' < 'K' < 'Q',
        # which matches their strength.
        winner = -1 + 2 * (self.history[0] < self.history[1])

        # Second player passed (folded): Player 1 wins 1 chip
        if self.history[-2:] == 'bp':
            return 1
        # Both players bet: the player with the better card wins 2 chips
        elif self.history[-2:] == 'bb':
            return winner * 2
        # First player passed: the player with the better card wins 1 chip
        elif self.history[-1] == 'p':
            return winner
        # History is non-terminal
        else:
            raise RuntimeError()
```
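
The `winner` expression works because Python compares one-character strings by code point, so `'A' < 'K' < 'Q'`, which coincides with card strength. A quick check over all six deals against an explicit rank table; the `RANK` dictionary is introduced here only for the check.

```python
from itertools import permutations

RANK = {'A': 3, 'K': 2, 'Q': 1}  # explicit card strength, higher is better

for c1, c2 in permutations('AKQ', 2):
    # The string-comparison trick used in _terminal_utility_p1
    winner_by_string = -1 + 2 * (c1 < c2)
    # The same quantity computed from explicit ranks
    winner_by_rank = 1 if RANK[c1] > RANK[c2] else -1
    assert winner_by_string == winner_by_rank, (c1, c2)
    print(c1 + c2, winner_by_string)
```

The trick is specific to these labels: adding a Jack as `'J'` would break it, since `'J' < 'K'` alphabetically although a Jack is the weaker card.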

Get the terminal utility for player $i$. The game is zero-sum, so $u_2(z) = -u_1(z)$.

```python
    def terminal_utility(self, i: Player) -> float:
        # If i is Player 1
        if i == PLAYERS[0]:
            return self._terminal_utility_p1()
        # Otherwise, u_2(z) = -u_1(z)
        else:
            return -1 * self._terminal_utility_p1()
```

The first two events are card deals, i.e. chance events.

```python
    def is_chance(self) -> bool:
        return len(self.history) < 2
```

Add an action to the history and return a new history

```python
    def __add__(self, other: Action):
        return History(self.history + other)
```

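Current player. Players alternate after the two cards are dealt, so the player index is the parity of the history length: 0 right after the deal, 1 after the first action.

```python
    def player(self) -> Player:
        return cast(Player, len(self.history) % 2)
```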
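
Together, `is_chance` and `player` determine who moves next from the length of the history string alone. A standalone sketch that prints this for every prefix of the example game `QAbb`; `who_moves` is a hypothetical helper, and player index 0 is the prose's Player 1.

```python
def who_moves(history: str) -> str:
    """Describe who acts next, following History.is_chance and History.player."""
    if len(history) < 2:
        return 'chance'  # a card is still to be dealt
    # Players alternate: even length -> player 0, odd length -> player 1
    return f'player {len(history) % 2}'


game = 'QAbb'
for k in range(len(game)):
    print(repr(game[:k]), '->', who_moves(game[:k]))
```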

Sample a chance action (deal a card).

```python
    def sample_chance(self) -> Action:
        while True:
            # Randomly pick a card
            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]
            # See if the card was dealt before
            for c in self.history:
                if c == chance:
                    chance = None
                    break
            # Return the card if it was not dealt before
            if chance is not None:
                return cast(Action, chance)
```
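
`sample_chance` uses rejection sampling: it draws uniformly from all three cards and retries if the card is already in the history. Since at most two cards are ever dealt, an equivalent approach is to draw directly from the cards that remain. The sketch below shows that alternative with the standard `random` module; `sample_remaining_card` is a hypothetical helper, and this is not how the labml code does it.

```python
import random

CHANCES = ['A', 'K', 'Q']


def sample_remaining_card(history: str, rng: random.Random) -> str:
    """Draw a card uniformly from those not yet dealt in `history`."""
    remaining = [c for c in CHANCES if c not in history]
    return rng.choice(remaining)


rng = random.Random(0)  # fixed seed for reproducibility
first = sample_remaining_card('', rng)
second = sample_remaining_card(first, rng)
print(first + second)
```

Both approaches give each of the six deals probability 1/6. Rejection sampling is simpler to write against a fixed card list, while drawing from the remaining cards never wastes a draw.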

Human-readable representation

```python
    def __repr__(self):
        return repr(self.history)
```

Information set key for the current history. This is a string of the events the current player can observe: her own card followed by the betting actions, which are public.

```python
    def info_set_key(self) -> str:
        # Get current player
        i = self.player()
        # Current player sees her card and the betting actions
        return self.history[i] + self.history[2:]
```
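
The key concatenates the current player's own card with the public betting actions, so the opponent's card never appears in it. A standalone sketch of the same rule on plain strings; `info_set_key` here is a hypothetical copy, not the `History` method.

```python
def info_set_key(history: str) -> str:
    """Own card of the player to act, followed by all betting actions."""
    i = len(history) % 2  # index of the player to act
    return history[i] + history[2:]


print(info_set_key('KA'))   # player index 0 holds K, no bets yet
print(info_set_key('KAb'))  # player index 1 holds A and has seen a bet
print(info_set_key('QAb'))  # same key as 'KAb'
```

Histories `KAb` and `QAb` map to the same key `Ab`: Player 2 holds the Ace in both and cannot see whether Player 1 holds the King or the Queen, so CFR must use one strategy for both.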

Create a new information set object for the current history.

```python
    def new_info_set(self) -> InfoSet:
        return InfoSet(self.info_set_key())
```

A function to create an empty history object

```python
def create_new_history():
    return History()
```

The configurations class extends the CFR configurations class, `CFRConfigs`.

```python
class Configs(CFRConfigs):
    pass
```

Set the `create_new_history` method for Kuhn Poker.

```python
@option(Configs.create_new_history)
def _cnh():
    return create_new_history
```

Run the experiment.

We only write tracking information to SQLite to speed things up. Since the algorithm iterates quickly and we track data on every iteration, writing to other destinations such as TensorBoard can be relatively time-consuming; SQLite is enough for our analytics.

```python
def main():
    # Create an experiment that writes only to SQLite
    experiment.create(name='kuhn_poker', writers={'sqlite'})
    # Initialize configuration
    conf = Configs()
    # Load configuration
    experiment.configs(conf)
    # Start the experiment
    with experiment.start():
        # Start iterating
        conf.cfr.iterate()


if __name__ == '__main__':
    main()
```
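
`conf.cfr.iterate()` hands the work to the CFR code configured through `CFRConfigs`, which is not shown on this page. For intuition, here is a minimal sketch of vanilla CFR with simultaneous updates for the same game rules, written independently of labml; every name in it is hypothetical. It traverses the full game tree on every iteration, enumerates the six deals instead of sampling them, uses regret matching at each information set, and reports the betting probability of the average strategy.

```python
from itertools import permutations

ACTIONS = ['p', 'b']  # index 0: pass, index 1: bet

regret_sum = {}    # info set key -> cumulative counterfactual regret per action
strategy_sum = {}  # info set key -> cumulative reach-weighted strategy per action
pending = {}       # regret updates collected during the current iteration


def is_terminal(h: str) -> bool:
    return len(h) > 2 and (h[-1] == 'p' or h[-2:] == 'bb')


def utility_p1(h: str) -> float:
    winner = -1 + 2 * (h[0] < h[1])  # 'A' < 'K' < 'Q' matches card strength
    if h[-2:] == 'bp':
        return 1.0  # Player 2 folded
    if h[-2:] == 'bb':
        return 2.0 * winner  # the bet was called
    return float(winner)  # Player 1 passed


def current_strategy(key: str) -> list:
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regret_sum.get(key, [0.0, 0.0])]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [0.5, 0.5]


def cfr(h: str, reach: list) -> float:
    """Return the expected utility of history h for player index 0."""
    if is_terminal(h):
        return utility_p1(h)
    i = len(h) % 2      # player to act
    key = h[i] + h[2:]  # own card + public betting actions
    sigma = current_strategy(key)
    values = []
    for a, prob in zip(ACTIONS, sigma):
        child_reach = list(reach)
        child_reach[i] *= prob
        values.append(cfr(h + a, child_reach))
    node_value = sum(p * v for p, v in zip(sigma, values))
    sign = 1.0 if i == 0 else -1.0  # zero-sum: player 1's utility is the negation
    delta = pending.setdefault(key, [0.0, 0.0])
    s_sum = strategy_sum.setdefault(key, [0.0, 0.0])
    for k in range(len(ACTIONS)):
        # Regret is weighted by the opponent's reach probability
        delta[k] += reach[1 - i] * sign * (values[k] - node_value)
        # The average strategy is weighted by the acting player's own reach
        s_sum[k] += reach[i] * sigma[k]
    return node_value


def average_bet_probability(key: str) -> float:
    s = strategy_sum[key]
    return s[1] / sum(s)


for _ in range(2000):
    # Enumerate all six deals; their common chance probability 1/6 is omitted
    for deal in permutations('AKQ', 2):
        cfr(''.join(deal), [1.0, 1.0])
    # Simultaneous update: apply this iteration's regrets after the full pass
    for key, delta in pending.items():
        totals = regret_sum.setdefault(key, [0.0, 0.0])
        for k in range(len(ACTIONS)):
            totals[k] += delta[k]
    pending.clear()

for key in sorted(strategy_sum):
    print(key, round(average_bet_probability(key), 3))
```

After 2,000 iterations this sketch should show Player 1 almost always betting with the Ace and almost never with the King, and Player 2 almost always calling with the Ace (`Ab`) and folding with the Queen (`Qb`). For this variant of the game, the equilibrium also has Player 1 betting with the Queen, and Player 2 calling with the King, one third of the time; the averaged probabilities for `Q` and `Kb` should approach that value as the number of iterations grows.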
