
Atari wrapper with multi-processing


    import multiprocessing
    import multiprocessing.connection

    import cv2
    import gym
    import numpy as np

Game environment

This is a wrapper for OpenAI gym game environment. We do a few things here:

  1. Apply the same action on four frames and get the last frame
  2. Convert observation frames to gray and scale them to (84, 84)
  3. Stack four frames of the last four actions
  4. Add episode information (total reward for the entire episode) for monitoring
  5. Restrict an episode to a single life (the game has 5 lives; we reset after every single life)

Observation format

The observation is a tensor of size (4, 84, 84): four frames (images of the game screen) stacked along the first axis, i.e. each channel is one frame.
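Maintaining this stack is a roll-and-overwrite on the first axis, as done in `step()` below; a minimal NumPy sketch (`push_frame` is an illustrative helper, not part of the source):

```python
import numpy as np

# A stack of four 84x84 grayscale frames; index 0 is the oldest frame
obs_4 = np.zeros((4, 84, 84))

def push_frame(obs_4, frame):
    # Shift the stack so the oldest frame falls off the front,
    # then write the newest frame into the last slot
    obs_4 = np.roll(obs_4, shift=-1, axis=0)
    obs_4[-1] = frame
    return obs_4

new_frame = np.ones((84, 84))
obs_4 = push_frame(obs_4, new_frame)
```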

    class Game:

        def __init__(self, seed: int):

create environment

            self.env = gym.make('BreakoutNoFrameskip-v4')
            self.env.seed(seed)

tensor for a stack of 4 frames

            self.obs_4 = np.zeros((4, 84, 84))

buffer to keep the maximum of last 2 frames

            self.obs_2_max = np.zeros((2, 84, 84))
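Taking the pixel-wise maximum over the last two frames is the standard trick for Atari sprite flicker, where some objects are drawn only on alternate frames. A small sketch of why the max recovers both (the frames and sprite positions are illustrative):

```python
import numpy as np

# Two consecutive frames where sprites flicker: each is visible in only one frame
frame_a = np.zeros((84, 84)); frame_a[10, 10] = 255.0  # sprite drawn on frame a
frame_b = np.zeros((84, 84)); frame_b[20, 20] = 255.0  # sprite drawn on frame b

obs_2_max = np.stack([frame_a, frame_b])
# Pixel-wise maximum keeps the content of both frames
obs = obs_2_max.max(axis=0)
```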

keep track of the episode rewards

            self.rewards = []

and number of lives left

            self.lives = 0

Step

Executes action for 4 time steps and returns a tuple of (observation, reward, done, episode_info).

  • observation : stacked 4 frames (this frame and frames for last 3 actions)
  • reward : total reward while the action was executed
  • done : whether the episode finished (a life lost)
  • episode_info : episode information if completed
        def step(self, action):

            reward = 0.
            done = None

run for 4 steps

            for i in range(4):

execute the action in the OpenAI Gym environment

                obs, r, done, info = self.env.step(action)

                if i >= 2:
                    self.obs_2_max[i % 2] = self._process_obs(obs)

                reward += r

get number of lives left

                lives = self.env.unwrapped.ale.lives()

reset if a life is lost

                if lives < self.lives:
                    done = True
                    break

maintain rewards for each step

            self.rewards.append(reward)

            if done:

if the episode is over, collect episode information and reset

                episode_info = {"reward": sum(self.rewards), "length": len(self.rewards)}
                self.reset()
            else:
                episode_info = None

get the max of last two frames

            obs = self.obs_2_max.max(axis=0)

push it to the stack of 4 frames

            self.obs_4 = np.roll(self.obs_4, shift=-1, axis=0)
            self.obs_4[-1] = obs

            return self.obs_4, reward, done, episode_info

Reset environment

Clean up episode info and 4 frame stack

        def reset(self):

reset OpenAI Gym environment

            obs = self.env.reset()

reset caches

            obs = self._process_obs(obs)
            for i in range(4):
                self.obs_4[i] = obs
            self.rewards = []

            self.lives = self.env.unwrapped.ale.lives()

            return self.obs_4

Process game frames

Convert game frames to gray and rescale to 84x84

        @staticmethod
        def _process_obs(obs):

            obs = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
            obs = cv2.resize(obs, (84, 84), interpolation=cv2.INTER_AREA)
            return obs
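`cv2.COLOR_RGB2GRAY` computes a weighted luminance sum over the color channels. To see what it does without OpenCV, here is a NumPy approximation using the standard ITU-R BT.601 weights (it omits OpenCV's rounding to integer types, so treat it as a sketch, not a drop-in replacement):

```python
import numpy as np

def rgb_to_gray(obs):
    # Approximate cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY) with the
    # standard ITU-R BT.601 luminance weights for R, G, B
    return obs @ np.array([0.299, 0.587, 0.114])

rgb = np.zeros((210, 160, 3))
rgb[..., 0] = 100.0           # a pure red image
gray = rgb_to_gray(rgb)       # each pixel becomes 100 * 0.299
```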

Worker Process

Each worker process runs this method

    def worker_process(remote: multiprocessing.connection.Connection, seed: int):

create game

        game = Game(seed)

wait for instructions from the connection and execute them

        while True:
            cmd, data = remote.recv()
            if cmd == "step":
                remote.send(game.step(data))
            elif cmd == "reset":
                remote.send(game.reset())
            elif cmd == "close":
                remote.close()
                break
            else:
                raise NotImplementedError
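The `(cmd, data)` protocol can be exercised without spawning a process or creating a `Game`; a minimal in-process sketch (`toy_worker` is an illustrative stand-in for `worker_process`, not part of the source):

```python
import multiprocessing

# Both ends of a duplex pipe, used here within a single process
parent, child = multiprocessing.Pipe()

def toy_worker(remote):
    # Handle one command in the same (cmd, data) tuple protocol
    cmd, data = remote.recv()
    if cmd == "step":
        remote.send(("stepped", data))
    elif cmd == "close":
        remote.close()

parent.send(("step", 3))   # the trainer side sends a command
toy_worker(child)          # the worker side handles it and replies
result = parent.recv()     # ("stepped", 3)
```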

Creates a new worker and runs it in a separate process.

    class Worker:

        def __init__(self, seed):
            self.child, parent = multiprocessing.Pipe()
            self.process = multiprocessing.Process(target=worker_process, args=(parent, seed))
            self.process.start()
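A trainer typically drives several workers in lockstep: send a command to every child connection, then collect all the replies. A self-contained sketch of that pattern with a stub in place of `worker_process` (`stub_worker`, the dummy observations, and the explicit fork start method are illustrative assumptions, not part of the source):

```python
import multiprocessing

def stub_worker(remote, seed):
    # Stand-in for worker_process: echoes a dummy "observation"
    # instead of stepping a real Game
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            remote.send(seed + data)
        elif cmd == "close":
            remote.close()
            break

mp = multiprocessing.get_context("fork")  # fork so the child inherits stub_worker

workers = []
for seed in range(2):
    child, parent = mp.Pipe()
    p = mp.Process(target=stub_worker, args=(parent, seed))
    p.start()
    workers.append((child, p))

# Drive all workers in lockstep, as a trainer would
for child, _ in workers:
    child.send(("step", 10))
results = [child.recv() for child, _ in workers]

# Shut the workers down cleanly
for child, p in workers:
    child.send(("close", None))
    p.join()
```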

labml.ai