ML-Agents Case: Dungeon Escape

This case is taken from the official ML-Agents examples (GitHub: https://github.com/Unity-Technologies/ml-agents). This article is a detailed walkthrough of it.

This article builds on two articles I published earlier and assumes some knowledge of ML-Agents. For background, see: Use of ML-Agents for Unity Reinforcement Learning, and ML-Agents Command and Configuration Complete.

My previous articles are:

Crawler for ML-Agents case

ML-Agents Case Push Box Game

ML-Agents Case Wall Jump Game

Food collectors in ML-Agents cases

ML-Agents Case of Two-player Football

Unity AI's Self-Evolving Five-Man Football Game

Environment description

  • Setup: Agents are trapped in a dungeon with a dragon and must work together to escape. To retrieve the key, one of the agents must find and kill the dragon, sacrificing itself in the process. The dragon drops a key for the others to use. The other agents can then pick up the key and open the dungeon door. If the agents take too long, the dragon escapes through the portal and the environment is reset.

  • Goal: Open the dungeon door and leave.

  • If any agent successfully opens the door and leaves the dungeon, +1 team reward is given.

  • The difficulty in training this project is that, to earn the team reward, one agent must learn to sacrifice itself.

  • Input: Each agent has a ray sensor, RayPerceptionSensor3D, that detects the tags wall, teammate, dragon, key, door lock, and dragon cave (the portal). There are 15 rays in total. The parameters are shown in the picture below. For a detailed description of the sensor, see ML-Agents Case Push Box Game.

    In addition to the sensor, the program adds one more observation: whether the agent is carrying the key.

  • Output: The agent has a single discrete action branch with seven values: do nothing, move forward, move backward, turn right, turn left, move left, and move right. Fewer outputs greatly reduce the complexity of the neural network and the training time. The disadvantage is that only one action can be executed per step, which reduces the agent's flexibility: it cannot, for example, move forward and rotate at the same time, or move forward and to the right at once.
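As a rough check on the size of the ray input, the ML-Agents documentation gives the length of a ray sensor's observation as (observation stacks) × (1 + 2 × rays per direction) × (detectable tags + 2). The sketch below applies that formula. The class and method names are made up for illustration, and it assumes 7 rays per direction (15 rays in total) and a single observation stack, which the text above does not state.

```csharp
// Hypothetical helper, not part of the ML-Agents example project.
public static class RaySensorMath
{
    // Observation length of one ray perception sensor, following the
    // formula in the ML-Agents documentation.
    public static int ObservationSize(int raysPerDirection, int detectableTags, int stacks = 1)
    {
        return stacks * (1 + 2 * raysPerDirection) * (detectableTags + 2);
    }
}
```

Under those assumptions, RaySensorMath.ObservationSize(7, 6) gives 15 × 8 = 120 values, to which the single "has key" flag is added as a separate vector observation.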

Code Explanation

First, there are the three standard components: Behavior Parameters, Decision Requester, and Model Overrider. Only Behavior Parameters needs to be adjusted, as shown in the figure above. Their roles were explained in detail in the earlier articles, so they are not repeated here.

Now look at the main agent code, PushAgentEscape.cs:

Initialize():

public override void Initialize()
{
    // Get Components
    m_GameController = GetComponentInParent<DungeonEscapeEnvController>();
    m_AgentRb = GetComponent<Rigidbody>();
    m_PushBlockSettings = FindObjectOfType<PushBlockSettings>();
    // No key by default
    MyKey.SetActive(false);
    IHaveAKey = false;
}

The OnEpisodeBegin() method runs at the beginning of each episode:

public override void OnEpisodeBegin()
{
    MyKey.SetActive(false);
    IHaveAKey = false;
}

State input, the CollectObservations method:

public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(IHaveAKey);
}

As you can see, apart from the sensor input, the only observation is whether the agent has the key.

Action output, the OnActionReceived method:

public override void OnActionReceived(ActionBuffers actionBuffers)
{
    MoveAgent(actionBuffers.DiscreteActions);
}

public void MoveAgent(ActionSegment<int> act)
{
    var dirToGo = Vector3.zero;
    var rotateDir = Vector3.zero;

    var action = act[0];

    switch (action)
    {
        case 1:
            dirToGo = transform.forward * 1f;
            break;
        case 2:
            dirToGo = transform.forward * -1f;
            break;
        case 3:
            rotateDir = transform.up * 1f;
            break;
        case 4:
            rotateDir = transform.up * -1f;
            break;
        case 5:
            dirToGo = transform.right * -0.75f;
            break;
        case 6:
            dirToGo = transform.right * 0.75f;
            break;
    }
    // Perform Rotation
    transform.Rotate(rotateDir, Time.fixedDeltaTime * 200f);
    // Apply force to the rigid body and perform the movement
    m_AgentRb.AddForce(dirToGo * m_PushBlockSettings.agentRunSpeed,
                       ForceMode.VelocityChange);
}

As you can see, there is a single discrete output with seven values, 0 to 6, where 0 means do nothing.
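To make the seven values easier to read, they can be given names. The following is only a sketch: the enum and the decoder class are hypothetical and do not exist in the example project. The turn directions are inferred from the Heuristic method, where D maps to 3 and A maps to 4.

```csharp
// Hypothetical names for the seven values of the single discrete branch.
public enum DungeonAction
{
    None = 0,
    Forward = 1,
    Backward = 2,
    TurnRight = 3,
    TurnLeft = 4,
    MoveLeft = 5,
    MoveRight = 6,
}

public static class ActionDecoder
{
    // Returns the multipliers applied to transform.forward, transform.right
    // and transform.up, mirroring the switch statement in MoveAgent.
    public static (float Forward, float Right, float Turn) Decode(int action)
    {
        switch ((DungeonAction)action)
        {
            case DungeonAction.Forward: return (1f, 0f, 0f);
            case DungeonAction.Backward: return (-1f, 0f, 0f);
            case DungeonAction.TurnRight: return (0f, 0f, 1f);
            case DungeonAction.TurnLeft: return (0f, 0f, -1f);
            case DungeonAction.MoveLeft: return (0f, -0.75f, 0f);
            case DungeonAction.MoveRight: return (0f, 0.75f, 0f);
            default: return (0f, 0f, 0f); // 0 or any unknown value: do nothing
        }
    }
}
```

Because each value sets exactly one of the three multipliers, the table also shows why the agent cannot move and turn in the same step.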

Collision detection:

Collision detection is split into two parts. Contact with the portal (the dragon's cave), the dragon, and the door lock is a physical collision, handled by the OnCollisionEnter method:

void OnCollisionEnter(Collision col)
{
    // If the agent carries the key and touches the lock, the key is consumed
    // and UnlockDoor is called to open the door
    if (col.transform.CompareTag("lock"))
    {
        if (IHaveAKey)
        {
            MyKey.SetActive(false);
            IHaveAKey = false;
            m_GameController.UnlockDoor();
        }
    }
    // On touching the dragon, call KilledByBaddie and remove the carried key
    // (by the game logic, the agent practically never has a key at this point)
    if (col.transform.CompareTag("dragon"))
    {
        m_GameController.KilledByBaddie(this, col);
        MyKey.SetActive(false);
        IHaveAKey = false;
    }
    // On touching the portal, call TouchedHazard
    if (col.transform.CompareTag("portal"))
    {
        m_GameController.TouchedHazard(this);
    }
}

The other part is the key, which is set up as a trigger rather than a solid collider, so picking it up is handled by the OnTriggerEnter method:

void OnTriggerEnter(Collider col)
{
    // If the key shares the agent's parent (same training area) and the agent is active,
    // hide the key in the scene and show the key carried by the agent, so it looks like a pickup
    if (col.transform.CompareTag("key") && col.transform.parent == transform.parent &&
        gameObject.activeInHierarchy)
    {
        print("Picked up key");
        MyKey.SetActive(true);
        IHaveAKey = true;
        col.gameObject.SetActive(false);
    }
}

To control one of the agents manually, override the Heuristic method; it is used when the agent has no model assigned:

public override void Heuristic(in ActionBuffers actionsOut)
{
    var discreteActionsOut = actionsOut.DiscreteActions;
    if (Input.GetKey(KeyCode.D))
    {
        discreteActionsOut[0] = 3;
    }
    else if (Input.GetKey(KeyCode.W))
    {
        discreteActionsOut[0] = 1;
    }
    else if (Input.GetKey(KeyCode.A))
    {
        discreteActionsOut[0] = 4;
    }
    else if (Input.GetKey(KeyCode.S))
    {
        discreteActionsOut[0] = 2;
    }
}

Next comes DungeonEscapeEnvController.cs, the script that controls the entire environment.

The script first defines two information classes, one for agents and one for dragons. They bundle the key data for each object so that it is easy to access and the code stays concise:

// Agent information class
public class PlayerInfo
{
    // Agent script
    public PushAgentEscape Agent;
    // Agent starting position
    public Vector3 StartingPos;
    // Agent starting rotation
    public Quaternion StartingRot;
    // Agent rigidbody
    public Rigidbody Rb;
    // Agent collider
    public Collider Col;
}

// Dragon information class
public class DragonInfo
{
    // Dragon script
    public SimpleNPC Agent;
    // Dragon starting position
    public Vector3 StartingPos;
    // Dragon starting rotation
    public Quaternion StartingRot;
    // Dragon rigidbody
    public Rigidbody Rb;
    // Dragon collider
    public Collider Col;
    // Dragon transform
    public Transform T;
    // Whether the dragon is dead
    public bool IsDead;
}

Then a series of variables are defined:

// Maximum number of environment steps per episode; the environment is reset when it is reached
[Header("Max Environment Steps")] public int MaxEnvironmentSteps = 25000;
private int m_ResetTimer;
// Area bounds
public Bounds areaBounds;
// Ground object
public GameObject ground;
// Ground material
Material m_GroundMaterial;
// Ground renderer
Renderer m_GroundRenderer;
// Agent information list
public List<PlayerInfo> AgentsList = new List<PlayerInfo>();
// Dragon information list
public List<DragonInfo> DragonsList = new List<DragonInfo>();
// Dictionary with the agent script as key and the agent information as value
private Dictionary<PushAgentEscape, PlayerInfo> m_PlayerDict = new Dictionary<PushAgentEscape, PlayerInfo>();
// Whether to randomize the agents' rotation and position on reset
public bool UseRandomAgentRotation = true;
public bool UseRandomAgentPosition = true;
// Settings script reused from the push box example, name unchanged
PushBlockSettings m_PushBlockSettings;
// Number of agents still alive
private int m_NumberOfRemainingPlayers;
// Key
public GameObject Key;
// Tombstone
public GameObject Tombstone;
// Agent group (the most important part)
private SimpleMultiAgentGroup m_AgentGroup;

The scene is then initialized in the Start method:

void Start()
{
    // Getting ground boundaries
    areaBounds = ground.GetComponent<Collider>().bounds;
    // Get ground rendering to change material easily
    m_GroundRenderer = ground.GetComponent<Renderer>();
    // Initial material
    m_GroundMaterial = m_GroundRenderer.material;
    // Get global setup script
    m_PushBlockSettings = FindObjectOfType<PushBlockSettings>();
    // Count the agents on the field
    m_NumberOfRemainingPlayers = AgentsList.Count;
    // Hide key
    Key.SetActive(false);
    // Fill in the information for each agent in the list and register it with the group.
    // Agents in the same group work together
    m_AgentGroup = new SimpleMultiAgentGroup();
    foreach (var item in AgentsList)
    {
        item.StartingPos = item.Agent.transform.position;
        item.StartingRot = item.Agent.transform.rotation;
        item.Rb = item.Agent.GetComponent<Rigidbody>();
        item.Col = item.Agent.GetComponent<Collider>();
        // Add to Group
        m_AgentGroup.RegisterAgent(item.Agent);
    }
    // Add information to dragons in the Dragon list
    foreach (var item in DragonsList)
    {
        item.StartingPos = item.Agent.transform.position;
        item.StartingRot = item.Agent.transform.rotation;
        item.T = item.Agent.transform;
        item.Col = item.Agent.GetComponent<Collider>();
    }
    // Reset the scene
    ResetScene();
}

The ResetScene method:

void ResetScene()
{
    // Reset the step counter
    m_ResetTimer = 0;
    // Reset the number of surviving agents
    m_NumberOfRemainingPlayers = AgentsList.Count;
    // Rotate the scene by a random multiple of 90 degrees to prevent overfitting to one layout
    var rotation = Random.Range(0, 4);
    var rotationAngle = rotation * 90f;
    transform.Rotate(new Vector3(0f, rotationAngle, 0f));

    // Reset each agent in the list
    foreach (var item in AgentsList)
    {
        // Use a random position and rotation if enabled, otherwise the recorded starting values
        var pos = UseRandomAgentPosition ? GetRandomSpawnPos() : item.StartingPos;
        var rot = UseRandomAgentRotation ? GetRandomRot() : item.StartingRot;
        item.Agent.transform.SetPositionAndRotation(pos, rot);
        // Clear all state
        item.Rb.velocity = Vector3.zero;
        item.Rb.angularVelocity = Vector3.zero;
        item.Agent.MyKey.SetActive(false);
        item.Agent.IHaveAKey = false;
        item.Agent.gameObject.SetActive(true);
        // Register the agent with the group again. According to the ML-Agents documentation,
        // a disabled agent has to be re-registered, so this line should stay
        m_AgentGroup.RegisterAgent(item.Agent);
    }
    // Reset the key
    Key.SetActive(false);

    // Reset the tombstone
    Tombstone.SetActive(false);

    // Reset each dragon in the list
    foreach (var item in DragonsList)
    {
        if (!item.Agent)
        {
            return;
        }
        // Return the dragon to its fixed starting position
        item.Agent.transform.SetPositionAndRotation(item.StartingPos, item.StartingRot);
        // Set a random walking speed
        item.Agent.SetRandomWalkSpeed();
        // Activate the dragon
        item.Agent.gameObject.SetActive(true);
    }
}

GetRandomSpawnPos returns a random free position in the scene. It keeps sampling points until it finds one that does not overlap any collider, and the code is easy to reuse in other projects.

public Vector3 GetRandomSpawnPos()
{
    var foundNewSpawnLocation = false;
    var randomSpawnPos = Vector3.zero;
    while (foundNewSpawnLocation == false)
    {
        var randomPosX = Random.Range(-areaBounds.extents.x * m_PushBlockSettings.spawnAreaMarginMultiplier,
                                      areaBounds.extents.x * m_PushBlockSettings.spawnAreaMarginMultiplier);

        var randomPosZ = Random.Range(-areaBounds.extents.z * m_PushBlockSettings.spawnAreaMarginMultiplier,
                                      areaBounds.extents.z * m_PushBlockSettings.spawnAreaMarginMultiplier);
        randomSpawnPos = ground.transform.position + new Vector3(randomPosX, 1f, randomPosZ);
        // Check to see if there are collisions at the generated location, regenerate if there are, or exit the loop if there are no collisions
        if (Physics.CheckBox(randomSpawnPos, new Vector3(2.5f, 0.01f, 2.5f)) == false)
        {
            foundNewSpawnLocation = true;
        }
    }
    return randomSpawnPos;
}
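The loop above keeps sampling until it finds a free spot, so it would never return if the area were completely blocked. As a hedged sketch of the same rejection-sampling idea outside Unity, the version below caps the number of attempts. Everything in it is hypothetical: the class name, the isOccupied callback (standing in for Physics.CheckBox) and the maxAttempts parameter are not part of the example project.

```csharp
using System;

// Hypothetical, engine-independent sketch of bounded rejection sampling.
public static class SpawnSampler
{
    // Tries up to maxAttempts random (x, z) offsets inside the margin-scaled
    // extents and returns true with the first one that is not occupied.
    public static bool TryFindSpawn(
        System.Random rng, float extentX, float extentZ, float margin,
        Func<float, float, bool> isOccupied, int maxAttempts,
        out float x, out float z)
    {
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            x = RandomInRange(rng, -extentX * margin, extentX * margin);
            z = RandomInRange(rng, -extentZ * margin, extentZ * margin);
            if (!isOccupied(x, z))
            {
                return true;
            }
        }
        // No free spot found within the attempt budget
        x = 0f;
        z = 0f;
        return false;
    }

    private static float RandomInRange(System.Random rng, float min, float max)
    {
        return min + (max - min) * (float)rng.NextDouble();
    }
}
```

The caller decides what to do when the method returns false, for example falling back to the agent's recorded starting position.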

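Next is the FixedUpdate method, which runs once per physics step (every 0.02 seconds with Unity's default fixed timestep).

It counts the steps of the current episode. When the counter reaches MaxEnvironmentSteps, and that limit is greater than 0, the group episode is interrupted and the environment is reset.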

void FixedUpdate()
{
    m_ResetTimer += 1;
    if (m_ResetTimer >= MaxEnvironmentSteps && MaxEnvironmentSteps > 0)
    {
        m_AgentGroup.GroupEpisodeInterrupted();
        ResetScene();
    }
}
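The counting logic can be isolated from Unity as in the sketch below. The EpisodeTimer class is hypothetical and only mirrors the condition used in FixedUpdate; in the example project the counter is reset inside ResetScene rather than in the timer itself.

```csharp
// Hypothetical stand-alone model of the step limit used in FixedUpdate.
public sealed class EpisodeTimer
{
    private readonly int maxSteps;
    private int steps;

    public EpisodeTimer(int maxSteps)
    {
        this.maxSteps = maxSteps;
    }

    // Call once per physics step. Returns true when the limit is reached
    // and restarts the count. A limit of 0 or less disables the timeout.
    public bool Tick()
    {
        steps += 1;
        if (maxSteps > 0 && steps >= maxSteps)
        {
            steps = 0;
            return true;
        }
        return false;
    }
}
```

With a limit of 3, the first two calls to Tick() return false and the third returns true, after which counting starts over.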

Next, three methods handle contact with the portal, the door lock, and the dragon.

When an agent touches the portal:

public void TouchedHazard(PushAgentEscape agent)
{
    // The agent dies: decrement the survivor count. If no agents remain,
    // or the agent was carrying the key, end the episode and reset the environment
    m_NumberOfRemainingPlayers--;
    if (m_NumberOfRemainingPlayers == 0 || agent.IHaveAKey)
    {
        m_AgentGroup.EndGroupEpisode();
        ResetScene();
    }
    else
    {
        agent.gameObject.SetActive(false);
    }
}
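The rule in TouchedHazard can be summarized in a few lines of plain C#. This is an illustrative model only; the SurvivorTracker class and its member names are assumptions and do not appear in the example project.

```csharp
// Hypothetical model of the bookkeeping done in TouchedHazard.
public sealed class SurvivorTracker
{
    public int Remaining { get; private set; }

    public SurvivorTracker(int agentCount)
    {
        Remaining = agentCount;
    }

    // Records the loss of one agent and returns true when the episode must end:
    // either nobody is left, or the lost agent was carrying the key.
    public bool OnAgentLost(bool agentHadKey)
    {
        Remaining--;
        return Remaining == 0 || agentHadKey;
    }
}
```

Losing the key holder ends the episode at once because no remaining agent could open the door anymore.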

When the agent touches the door lock:

public void UnlockDoor()
{
    // Give the group reward
    m_AgentGroup.AddGroupReward(1f);
    // Change the floor material for 0.5 seconds
    StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.goalScoredMaterial, 0.5f));
    print("Unlocked Door");
    // End the episode
    m_AgentGroup.EndGroupEpisode();
    // Reset the scene
    ResetScene();
}

When an agent touches a dragon:

public void KilledByBaddie(PushAgentEscape agent, Collision baddieCol)
{
    // The dragon is killed: hide it
    baddieCol.gameObject.SetActive(false);
    // The agent dies: decrement the survivor count and hide it
    m_NumberOfRemainingPlayers--;
    agent.gameObject.SetActive(false);
    print($"{baddieCol.gameObject.name} ate {agent.transform.name}");

    // Show the tombstone where the agent died
    Tombstone.transform.SetPositionAndRotation(agent.transform.position, agent.transform.rotation);
    Tombstone.SetActive(true);

    // Drop the key where the dragon was
    Key.transform.SetPositionAndRotation(baddieCol.collider.transform.position, baddieCol.collider.transform.rotation);
    Key.SetActive(true);
}

As an experiment, you could give the agent that kills the dragon an individual penalty here, to see whether it still gives up its own reward for the benefit of the team.

Changing the material of the ground:

IEnumerator GoalScoredSwapGroundMaterial(Material mat, float time)
{
    m_GroundRenderer.material = mat;
    yield return new WaitForSeconds(time); // Wait for the given time (0.5 s in the calls shown in this article)
    m_GroundRenderer.material = m_GroundMaterial;
}

Here is the code for the NPC dragon. It is simple and contains only movement logic:

using UnityEngine;

public class SimpleNPC : MonoBehaviour
{

    public Transform target;
    private Rigidbody rb;
    public float walkSpeed = 1;
    private Vector3 dirToGo;
    // Runs before Start
    void Awake()
    {
        rb = GetComponent<Rigidbody>();
    }
    void Update()
    {
    }
    // Runs once per physics step (every 0.02 seconds by default)
    void FixedUpdate()
    {
        dirToGo = target.position - transform.position;
        dirToGo.y = 0;
        rb.rotation = Quaternion.LookRotation(dirToGo);
        // Perform Move
        rb.MovePosition(transform.position + transform.forward * walkSpeed * Time.deltaTime);
    }
    // Set a random speed
    public void SetRandomWalkSpeed()
    {
        walkSpeed = Random.Range(1f, 7f);
    }
}

There is also a script attached to the dragon that detects whether the dragon has touched the portal:

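using UnityEngine;
using UnityEngine.Events;

namespace Unity.MLAgentsExamples
{
    public class CollisionCallbacks : MonoBehaviour
    {
        // Tag to react to. This declaration is missing from the excerpt and is
        // reconstructed from its use below; set the value in the Inspector
        public string tagToDetect;

        // The events defined below are subscribed to in the Unity editor
        [System.Serializable]
        public class TriggerEvent : UnityEvent<Collider>
        {
        }

        [Header("Trigger Callbacks")]
        public TriggerEvent onTriggerEnterEvent = new TriggerEvent();

        // Also missing from the excerpt; the argument types are inferred
        // from the Invoke call below
        [System.Serializable]
        public class CollisionEvent : UnityEvent<Collision, Transform>
        {
        }

        [Header("Collision Callbacks")]
        public CollisionEvent onCollisionEnterEvent = new CollisionEvent();

        // The only callback used in this case; nothing subscribes to the others
        private void OnCollisionEnter(Collision col)
        {
            if (col.transform.CompareTag(tagToDetect))
            {
                onCollisionEnterEvent.Invoke(col, transform);
            }
        }
    }
}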

Event subscription:

The subscribed method is implemented as follows:

public void BaddieTouchedBlock()
{
    m_AgentGroup.EndGroupEpisode();
    StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.failMaterial, 0.5f));
    ResetScene();
}

Configuration file

The simplest configuration:

behaviors:
  DungeonEscape:
    trainer_type: poca
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 20000000
    time_horizon: 64
    summary_freq: 60000

Effect demonstration

Postnote

This is a multi-agent case that explores whether an agent can learn to sacrifice itself for the benefit of the team. Building on it, we could make more complex puzzle games in the future, with solutions that humans would not expect but that agents can still learn. Designing the reward function for such games would be a huge challenge.

Tags: Unity

Posted on Mon, 29 Nov 2021 14:00:04 -0500 by h3ktlk