To install Gym, please refer to the official tutorial or the following link:
An Env class in Gym mainly provides the following attributes and methods:
- action_space: the action space.
- observation_space: the state (observation) space.
- reset(): initializes an episode and returns the initial state.
- step(): receives an action from the agent, performs the state transition, judges whether the episode has terminated, and returns the reward information.
We need to focus on two methods: reset() and step().
The reset() method is very simple and takes no parameters: it initializes the environment and returns an initial state.
The step() method is the core part of the whole environment, and mainly implements the following functions:
- Given the received action, generate the next state. The state is usually represented as an array (vector).
- Feed a reward signal back to the agent. The reward is usually a float.
- Determine whether the episode has terminated.
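Putting reset() and step() together, a toy environment following the same interface might look like this (a minimal self-contained sketch, not part of Gym itself; the two-element state vector and the 10-step episode length are made up for illustration):

```python
import random

class ToyEnv:
    """A minimal environment following the Gym-style interface."""
    def __init__(self):
        self.steps_left = 0

    def reset(self):
        # Initialize the episode and return the initial state vector.
        self.steps_left = 10
        return [0.0, 0.0]

    def step(self, action):
        # Generate the next state (an array-like vector) ...
        obs = [random.random(), random.random()]
        # ... feed back a float reward ...
        reward = 1.0
        # ... and report whether the episode has terminated.
        self.steps_left -= 1
        done = self.steps_left <= 0
        return obs, reward, done, {}

env = ToyEnv()
obs = env.reset()
total = 0.0
while True:
    obs, reward, done, _ = env.step(0)
    total += reward
    if done:
        break
print(total)  # 10.0
```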
The classic inverted-pendulum control task, CartPole, is to reinforcement learning what the MNIST dataset is to deep learning.
First import the gym package, then create an environment with gym.make(env_name), as follows:
import gym

env = gym.make("CartPole-v0")
Through obs = env.reset() you can view the initial state, a four-dimensional array containing the cart position, cart velocity, pole angle, and pole angular velocity.
You can view the action-space and state-space information with the following two commands:

env.action_space
env.observation_space
If a random agent is used, the complete code is as follows:
import gym

if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    total_reward = 0.0
    total_steps = 0
    obs = env.reset()
    while True:
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break
    print("Episode done in %d steps, total reward %.2f" % (
        total_steps, total_reward))
Wrapper means packaging. When using Gym, we sometimes want to process the game's observations ourselves. For example, one frame of game pixels may not be enough, and we may need several stacked frames (to capture the direction an object is moving). Gym provides a convenient development tool for this, the Wrapper class. Its position in Gym's class hierarchy is shown in the following figure:
The Wrapper class inherits from the Env class. To extend some functionality, you only need to override the method you want to extend, such as step() or reset(), and call the parent class's method inside the subclass. Redefining states, rewards, or actions is all possible this way.
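As an illustration of redefining rewards, the same override-one-hook pattern works with a reward() method. The sketch below is self-contained and only mirrors the structure of gym.RewardWrapper; the toy environment and the 0.5 scale factor are made up for illustration:

```python
class ToyEnv:
    """A stand-in environment that always returns reward 1.0."""
    def step(self, action):
        return [0.0], 1.0, False, {}

class RewardWrapper:
    """Mirrors gym.RewardWrapper: step() routes the reward through reward()."""
    def __init__(self, env):
        self.env = env

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Every reward passes through the reward() hook before the agent sees it.
        return obs, self.reward(reward), done, info

class ScaledReward(RewardWrapper):
    def reward(self, reward):
        return reward * 0.5

env = ScaledReward(ToyEnv())
obs, reward, done, _ = env.step(0)
print(reward)  # 0.5
```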
If we now want to redefine actions, say by injecting 10% random actions to balance exploration and exploitation, we can do it directly in a wrapper:
import gym
from typing import TypeVar
import random

Action = TypeVar('Action')

class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon
super(RandomActionWrapper, self).__init__(env) first finds the parent class of RandomActionWrapper (i.e. gym.ActionWrapper) and then calls the parent's __init__ on the current instance, so that the wrapper is properly initialized with the wrapped environment.
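The super() call here is plain Python inheritance, nothing Gym-specific. A minimal self-contained illustration (the class and attribute names are made up):

```python
class Base:
    def __init__(self, env):
        self.env = env

class Child(Base):
    def __init__(self, env, epsilon=0.1):
        # Run Base.__init__ first so self.env is set,
        # then add the subclass's own attribute.
        super().__init__(env)
        self.epsilon = epsilon

c = Child("dummy-env")
print(c.env, c.epsilon)  # dummy-env 0.1
```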
Next, we override the parent class's action() method so that, with probability epsilon, the agent's action is replaced by a random sample:
    def action(self, action: Action) -> Action:
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action
If you are still not sure how this works, look at the corresponding part of the gym source code.
There you can see that action() is called on every call to step(), which is where the random noise is injected.
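To see why overriding action() alone is enough, here is a simplified, self-contained sketch of what gym.ActionWrapper's step() does (paraphrased and reduced; the stand-in Env and the doubling wrapper are made up for illustration):

```python
class Env:
    """Minimal stand-in for gym.Env: echoes the action back as the observation."""
    def step(self, action):
        return action, 0.0, False, {}

class ActionWrapper:
    """Simplified version of gym.ActionWrapper: step() routes the
    incoming action through self.action() before passing it on."""
    def __init__(self, env):
        self.env = env

    def action(self, action):
        raise NotImplementedError

    def step(self, action):
        # Every call to step() first transforms the action.
        return self.env.step(self.action(action))

class DoubleActionWrapper(ActionWrapper):
    def action(self, action):
        return action * 2

wrapped = DoubleActionWrapper(Env())
obs, reward, done, info = wrapped.step(3)
print(obs)  # 6
```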
The complete code is as follows:
import gym
from typing import TypeVar
import random

Action = TypeVar('Action')

class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon

    def action(self, action: Action) -> Action:
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action

if __name__ == "__main__":
    env = RandomActionWrapper(gym.make("CartPole-v0"))
    obs = env.reset()
    total_reward = 0.0
    while True:
        obs, reward, done, _ = env.step(0)
        total_reward += reward
        if done:
            break
    print("Reward got: %.2f" % total_reward)