docs/rl/game.html
import multiprocessing
import multiprocessing.connection

import cv2
import gym
import numpy as np
This is a wrapper for an OpenAI Gym game environment. We do a few things here:
The observation is a tensor of size (4, 84, 84): four frames (images of the game screen) stacked along the first axis, i.e., each channel is a frame.
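The rolling 4-frame stack can be sketched with plain NumPy (tiny 2x2 frames stand in for the processed 84x84 game frames):

```python
import numpy as np

# A sketch of the 4-frame stack, with tiny 2x2 frames standing in
# for the processed 84x84 game frames.
obs_4 = np.zeros((4, 2, 2))

for t in range(5):
    new_frame = np.full((2, 2), float(t))      # pretend processed frame
    obs_4 = np.roll(obs_4, shift=-1, axis=0)   # shift out the oldest frame
    obs_4[-1] = new_frame                      # newest frame goes last

# the stack now holds the frames from the last 4 steps (t = 1..4)
```

After each push the oldest frame falls off the front and the newest sits at index -1, which is exactly how `step` maintains `self.obs_4` below.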
class Game:
def __init__(self, seed: int):
create environment
self.env = gym.make('BreakoutNoFrameskip-v4')
self.env.seed(seed)
tensor for a stack of 4 frames
self.obs_4 = np.zeros((4, 84, 84))
buffer to keep the maximum of last 2 frames
self.obs_2_max = np.zeros((2, 84, 84))
keep track of the episode rewards
self.rewards = []
and number of lives left
self.lives = 0
Executes action for 4 time steps and returns a tuple of (observation, reward, done, episode_info).
observation: stacked 4 frames (this frame and frames for the last 3 actions)
reward: total reward while the action was executed
done: whether the episode finished (a life lost)
episode_info: episode information if completed

def step(self, action):
reward = 0.
done = None
run for 4 steps
for i in range(4):
execute the action in the OpenAI Gym environment
obs, r, done, info = self.env.step(action)

if i >= 2:
    self.obs_2_max[i % 2] = self._process_obs(obs)

reward += r
get number of lives left
lives = self.env.unwrapped.ale.lives()
reset if a life is lost
if lives < self.lives:
    done = True
    break
maintain rewards for each step
self.rewards.append(reward)

if done:
if the episode is over, set episode information and reset
    episode_info = {"reward": sum(self.rewards), "length": len(self.rewards)}
    self.reset()
else:
    episode_info = None
get the max of last two frames
obs = self.obs_2_max.max(axis=0)
push it to the stack of 4 frames
self.obs_4 = np.roll(self.obs_4, shift=-1, axis=0)
self.obs_4[-1] = obs

return self.obs_4, reward, done, episode_info
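Atari sprites are sometimes drawn only on alternate frames, so taking the element-wise maximum over the last two processed frames keeps a sprite visible whichever frame it appeared in. A minimal sketch with 2x2 frames in place of 84x84 ones:

```python
import numpy as np

# Two consecutive processed frames; the "sprite" (pixel value 255)
# flickers between different positions.
obs_2_max = np.zeros((2, 2, 2))
obs_2_max[0] = np.array([[255., 0.], [0., 0.]])
obs_2_max[1] = np.array([[0., 0.], [0., 255.]])

# element-wise max over the frame axis, as in step()
obs = obs_2_max.max(axis=0)
```

Both sprite positions survive in the combined frame, while pixels that were dark in both frames stay dark.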
Clean up episode info and 4 frame stack
def reset(self):
reset OpenAI Gym environment
obs = self.env.reset()
reset caches
obs = self._process_obs(obs)
for i in range(4):
    self.obs_4[i] = obs
self.rewards = []

self.lives = self.env.unwrapped.ale.lives()

return self.obs_4
Convert game frames to grayscale and rescale to 84x84
@staticmethod
def _process_obs(obs):
obs = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
obs = cv2.resize(obs, (84, 84), interpolation=cv2.INTER_AREA)
return obs
Each worker process runs this method
def worker_process(remote: multiprocessing.connection.Connection, seed: int):
create game
game = Game(seed)
wait for instructions from the connection and execute them
while True:
    cmd, data = remote.recv()
    if cmd == "step":
        remote.send(game.step(data))
    elif cmd == "reset":
        remote.send(game.reset())
    elif cmd == "close":
        remote.close()
        break
    else:
        raise NotImplementedError
Creates a new worker and runs it in a separate process.
class Worker:
def __init__(self, seed):
    self.child, parent = multiprocessing.Pipe()
    self.process = multiprocessing.Process(target=worker_process, args=(parent, seed))
    self.process.start()