A brief introduction to Gym and how to extend it

For the installation of Gym, please refer to the official tutorial or the following link:

  • https://blog.csdn.net/weixin_39059031/article/details/82085916

The Env class in Gym mainly provides the following attributes and methods:

  1. action_space: the action space.
  2. observation_space: the state (observation) space.
  3. reset(): initializes the environment and returns an initial state.
  4. step(): receives an action from the agent, performs the state transition, determines whether the episode has terminated, and returns the reward information.

We need to focus on two methods: reset() and step().

The reset() method is very simple: it takes no parameters, resets the environment, and returns an initial state.

The step() method is the core of the whole environment. It mainly implements the following functions (a minimal sketch follows this list):

  1. Upon receiving an action, generate the next state. The state is usually represented as an array (vector).
  2. Feed a reward signal back to the agent. The reward is usually a float.
  3. Determine whether the episode has terminated.
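
Here is a minimal sketch of a single interaction step with the classic (pre-0.26) Gym API, using CartPole-v0 (introduced in the next section) as a concrete example; the printed values are illustrative:

import gym

env = gym.make("CartPole-v0")
obs = env.reset()                      # initial state vector
obs, reward, done, info = env.step(0)  # apply one action
print(obs.shape, reward, done)         # (4,) 1.0 False (typically)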

Inverted pendulum

The inverted pendulum (CartPole) is the classic control task; it plays roughly the role in reinforcement learning that the MNIST dataset plays in deep learning.

First import the gym package, then create an environment with gym.make(env_name), as follows:

import gym
env = gym.make("CartPole-v0")

By calling obs = env.reset() you can see the initial state: a four-dimensional array containing the cart position, cart velocity, pole angle, and pole angular velocity.
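
For example (the numbers are illustrative, since the initial state is sampled randomly on every reset):

obs = env.reset()
print(obs)  # e.g. [ 0.03073904  0.00145001 -0.03088818 -0.03131252]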

You can inspect the action space and the state space with the following two commands:

env.action_space
env.observation_space

A random action can be sampled from the action space:

env.action_space.sample()
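
On CartPole-v0 the output looks roughly like the following (the exact repr of the spaces varies between Gym versions, and the sampled action is random):

>>> env.action_space
Discrete(2)
>>> env.observation_space
Box(4,)
>>> env.action_space.sample()
1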

If we use a random agent, the complete code is as follows:

import gym

if __name__ == "__main__":
    env = gym.make("CartPole-v0")

    total_reward = 0.0
    total_steps = 0
    obs = env.reset()

    while True:
        # sample a random action and apply it; step() returns the new
        # state, the reward, the terminal flag, and an info dict
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break

    print("Episode done in %d steps, total reward %.2f" % (
        total_steps, total_reward))
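
Since the actions are purely random, an episode typically lasts only a couple of dozen steps before the pole falls. CartPole-v0 gives a reward of +1 for every step the pole stays upright, so the total reward equals the episode length.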

Wrappers

Wrapper literally means "packaging". When using Gym, we sometimes want to process the game's observations ourselves. For example, a single frame of game pixels may not be enough, and we may need several consecutive frames (to capture the direction in which objects move). Gym provides a convenient development tool for this, the Wrapper class. Its place in Gym's class hierarchy is sketched below.
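
(A simplified view of the hierarchy; methods and some subclasses are omitted.)

Env
└── Wrapper
    ├── ObservationWrapper
    ├── RewardWrapper
    └── ActionWrapper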

The Wrapper class inherits from the Env class. To extend some functionality, you only need to override the method you want to change, such as step() or reset(), calling the parent's implementation inside the subclass. In this way you can redefine observations, rewards, or actions.

Suppose we want to redefine actions, injecting 10% random actions to balance exploration and exploitation. We can do this directly in a wrapper:
import gym
from typing import TypeVar
import random

Action = TypeVar('Action')

class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon

Here super(RandomActionWrapper, self).__init__(env) looks up the parent class of RandomActionWrapper (i.e., gym.ActionWrapper) and calls its constructor with env, so that the wrapper stores a reference to the wrapped environment.

Next, we override the parent class's action() method with an epsilon-greedy strategy:

    def action(self, action: Action) -> Action:
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action

If it is still unclear how this works, we can look at the relevant part of the Gym source code.
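
Here is a simplified excerpt of what gym.ActionWrapper looks like in the Gym source (details differ between versions):

class ActionWrapper(Wrapper):
    def step(self, action):
        # every action is passed through self.action() before it is
        # forwarded to the wrapped environment
        return self.env.step(self.action(action))

    def action(self, action):
        raise NotImplementedError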

You can see that action() is called every time step() is called, which is where the random noise gets injected.

The complete code is as follows:

import gym
from typing import TypeVar
import random

Action = TypeVar('Action')

class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon

    def action(self, action: Action) -> Action:
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action


if __name__ == "__main__":
    env = RandomActionWrapper(gym.make("CartPole-v0"))

    obs = env.reset()
    total_reward = 0.0

    while True:
        # the agent always proposes action 0; the wrapper replaces it
        # with a random action with probability epsilon
        obs, reward, done, _ = env.step(0)
        total_reward += reward
        if done:
            break

    print("Reward got: %.2f" % total_reward)
