This case is derived from the official ML-Agents examples (GitHub: https://github.com/Unity-Technologies/ml-agents); this article is a detailed accompanying explanation.
This article builds on two pieces I have published before, so some familiarity with ML-Agents is assumed. For more information, see: Using ML-Agents for Reinforcement Learning in Unity, and The Complete Guide to ML-Agents Commands and Configuration.
My other previous articles in this series are:
ML-Agents Case: Food Collector
ML-Agents Case: Two-Player Football
Unity AI Self-Evolving Five-a-Side Football Game
Environment description
- Setup: The agents are trapped in a dragon's dungeon and must work together to escape. To retrieve the key, one of the agents must find the dragon and kill it at the cost of its own life. The dragon then drops a key that the other agents can pick up and use to open the dungeon door. If the agents take too long, the dragon escapes through the portal and the environment is reset.
- Goal: Open the dungeon door and leave.
- Reward: If any agent successfully opens the door and leaves the dungeon, the team receives a +1 group reward.
- The difficulty in training this project is that, to earn the team reward, an agent must learn to sacrifice itself.
- Input: The agent's input includes a ray sensor, a RayPerceptionSensor3D, which detects objects tagged as wall, teammate, dragon, key, door lock, and dragon cave (portal), using 15 rays in total. The parameters are shown in the picture below; for a detailed description of the sensor, see ML-Agents Case: Push Box Game. In addition to the sensor, one observation is added in code that indicates whether the agent is currently carrying the key. A sketch of a comparable sensor configuration appears right after this list.
- Output: The agent has a single discrete action branch with seven values, representing: do nothing, move forward, move backward, turn right, turn left, move left, and move right. Having so few outputs greatly reduces the complexity of the neural network and the training time. The disadvantage is that only one action can be executed at a time, which reduces the agent's flexibility; for example, it cannot move forward while rotating, nor move forward and sideways at the same time.
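The sensor in the project is configured in the Inspector. Purely as an illustration, a comparable RayPerceptionSensorComponent3D could be set up from code roughly as below. This is only a sketch under assumptions: the tag strings, ray length, and angle are guesses rather than the project's actual settings, and the property names follow recent ML-Agents releases.

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents.Sensors;

// Sketch only: a ray sensor similar to the one described above.
// Tag names, ray length and angle are assumptions, not the project's values.
public class RaySensorSetupSketch : MonoBehaviour
{
    void Awake()
    {
        var raySensor = GetComponent<RayPerceptionSensorComponent3D>();
        // Tags the rays can recognize and report as observations
        raySensor.DetectableTags = new List<string>
        {
            "wall", "agent", "dragon", "key", "lock", "portal"
        };
        // 7 rays per side plus the center ray = 15 rays in total
        raySensor.RaysPerDirection = 7;
        // Angular spread of the outermost rays, and the ray length
        raySensor.MaxRayDegrees = 90f;
        raySensor.RayLength = 20f;
    }
}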
Code Explanation
First, the agent carries the three standard components: Behavior Parameters, Decision Requester, and Model Overrider. Only the Behavior Parameters need to be adjusted, as shown in the figure above. Their roles have been explained in detail in the previous articles, so they are not repeated here.
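For reference, the code and the training configuration below imply roughly the following Behavior Parameters; the authoritative values are the ones set in the project's Inspector:
- Behavior Name: DungeonEscape (it has to match the behavior name used in the training configuration file at the end of this article)
- Vector Observation Space Size: 1 (the single "do I have a key" flag added in CollectObservations)
- Actions: one discrete branch with 7 values (the seven movement options handled in MoveAgent)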
Now look at the main agent code, PushAgentEscape.cs:
Initialize():
public override void Initialize()
{
    // Get components
    m_GameController = GetComponentInParent<DungeonEscapeEnvController>();
    m_AgentRb = GetComponent<Rigidbody>();
    m_PushBlockSettings = FindObjectOfType<PushBlockSettings>();
    // No key by default
    MyKey.SetActive(false);
    IHaveAKey = false;
}
The OnEpisodeBegin() method runs at the beginning of each episode:
public override void OnEpisodeBegin()
{
    MyKey.SetActive(false);
    IHaveAKey = false;
}
State input, the CollectObservations method:
public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(IHaveAKey);
}
You can see that, in addition to the ray sensor, there is only one vector observation: whether or not the agent has a key (a bool observation is encoded as a single float, 0 or 1).
Action output, the OnActionReceived method:
public override void OnActionReceived(ActionBuffers actionBuffers)
{
    MoveAgent(actionBuffers.DiscreteActions);
}

public void MoveAgent(ActionSegment<int> act)
{
    var dirToGo = Vector3.zero;
    var rotateDir = Vector3.zero;
    var action = act[0];
    switch (action)
    {
        case 1:
            dirToGo = transform.forward * 1f;
            break;
        case 2:
            dirToGo = transform.forward * -1f;
            break;
        case 3:
            rotateDir = transform.up * 1f;
            break;
        case 4:
            rotateDir = transform.up * -1f;
            break;
        case 5:
            dirToGo = transform.right * -0.75f;
            break;
        case 6:
            dirToGo = transform.right * 0.75f;
            break;
    }
    // Perform the rotation
    transform.Rotate(rotateDir, Time.fixedDeltaTime * 200f);
    // Apply force to the rigidbody to perform the movement
    m_AgentRb.AddForce(dirToGo * m_PushBlockSettings.agentRunSpeed, ForceMode.VelocityChange);
}
You can see that there is only one discrete branch, containing the seven values 0-6, where 0 means doing nothing.
Collision detection:
Collision detection is divided into two parts. The portal (cave), the dragon, and the door lock are handled as regular collisions, so the OnCollisionEnter method is called:
void OnCollisionEnter(Collision col)
{
    // When the agent is carrying the key and touches the lock, the door opens
    // and the key is consumed. Call the UnlockDoor method.
    if (col.transform.CompareTag("lock"))
    {
        if (IHaveAKey)
        {
            MyKey.SetActive(false);
            IHaveAKey = false;
            m_GameController.UnlockDoor();
        }
    }
    // When the agent meets the dragon, destroy the key on its body (in practice it can
    // hardly be carrying one at this point, given the game logic) and call KilledByBaddie.
    if (col.transform.CompareTag("dragon"))
    {
        m_GameController.KilledByBaddie(this, col);
        MyKey.SetActive(false);
        IHaveAKey = false;
    }
    // Call the TouchedHazard method when the agent enters the portal (cave)
    if (col.transform.CompareTag("portal"))
    {
        m_GameController.TouchedHazard(this);
    }
}
The other part is the key, whose collider is marked as a trigger, so the OnTriggerEnter method is called instead:
void OnTriggerEnter(Collider col)
{
    // If the key is under the same parent as the agent and the agent is active,
    // deactivate the scene key and activate the key child object on the agent's body,
    // so it looks like the agent picked up the key.
    if (col.transform.CompareTag("key") && col.transform.parent == transform.parent
        && gameObject.activeInHierarchy)
    {
        print("Picked up key");
        MyKey.SetActive(true);
        IHaveAKey = true;
        col.gameObject.SetActive(false);
    }
}
If you want to control one of the agents manually, remove its model and override the Heuristic method:
public override void Heuristic(in ActionBuffers actionsOut)
{
    var discreteActionsOut = actionsOut.DiscreteActions;
    if (Input.GetKey(KeyCode.D))
    {
        discreteActionsOut[0] = 3;
    }
    else if (Input.GetKey(KeyCode.W))
    {
        discreteActionsOut[0] = 1;
    }
    else if (Input.GetKey(KeyCode.A))
    {
        discreteActionsOut[0] = 4;
    }
    else if (Input.GetKey(KeyCode.S))
    {
        discreteActionsOut[0] = 2;
    }
}
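This heuristic only covers turning and moving forward/backward; the two strafing actions (5 and 6) cannot be triggered from the keyboard. If you want them too, two extra keys could be appended to the same if/else chain. The Q/E key choice below is my own assumption, not part of the original project:

// Sketch only: extra bindings for the strafing actions, appended to the
// if/else chain in Heuristic above. Q and E are arbitrary key choices.
else if (Input.GetKey(KeyCode.Q))
{
    discreteActionsOut[0] = 5; // move left
}
else if (Input.GetKey(KeyCode.E))
{
    discreteActionsOut[0] = 6; // move right
}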
The following explains the script DungeonEscapeEnvController.cs that controls the entire environment:
The script first defines information classes for the agents and the dragons. They encapsulate the key references so they are easy to access and the rest of the code stays concise:
// Agent information class
public class PlayerInfo
{
    // Agent script
    public PushAgentEscape Agent;
    // Agent starting position
    public Vector3 StartingPos;
    // Agent starting rotation
    public Quaternion StartingRot;
    // Agent rigidbody
    public Rigidbody Rb;
    // Agent collider
    public Collider Col;
}

// Dragon information class
public class DragonInfo
{
    // Dragon script
    public SimpleNPC Agent;
    // Dragon starting position
    public Vector3 StartingPos;
    // Dragon starting rotation
    public Quaternion StartingRot;
    // Dragon rigidbody
    public Rigidbody Rb;
    // Dragon collider
    public Collider Col;
    // Dragon transform
    public Transform T;
    // Whether the dragon is dead
    public bool IsDead;
}
Then a series of variables are defined:
// Maximum number of steps per episode; beyond this the environment is reset
[Header("Max Environment Steps")]
public int MaxEnvironmentSteps = 25000;
private int m_ResetTimer;

// Bounds of the ground area
public Bounds areaBounds;
// Ground
public GameObject ground;
// Ground material
Material m_GroundMaterial;
// Ground renderer
Renderer m_GroundRenderer;

// Agent information list
public List<PlayerInfo> AgentsList = new List<PlayerInfo>();
// Dragon information list
public List<DragonInfo> DragonsList = new List<DragonInfo>();
// Dictionary with the agent script as key and the agent information as value
private Dictionary<PushAgentEscape, PlayerInfo> m_PlayerDict = new Dictionary<PushAgentEscape, PlayerInfo>();

// Whether to randomize the agents' position and rotation
public bool UseRandomAgentRotation = true;
public bool UseRandomAgentPosition = true;

// The push-block settings script is reused here without renaming it
PushBlockSettings m_PushBlockSettings;

// Number of remaining (alive) agents
private int m_NumberOfRemainingPlayers;
// Key
public GameObject Key;
// Tombstone
public GameObject Tombstone;
// The multi-agent group (most important)
private SimpleMultiAgentGroup m_AgentGroup;
The scene is then initialized in the Start method:
void Start()
{
    // Get the ground bounds
    areaBounds = ground.GetComponent<Collider>().bounds;
    // Get the ground renderer so the material can be changed easily
    m_GroundRenderer = ground.GetComponent<Renderer>();
    // Remember the initial material
    m_GroundMaterial = m_GroundRenderer.material;
    // Get the global settings script
    m_PushBlockSettings = FindObjectOfType<PushBlockSettings>();
    // Count the agents currently on the field
    m_NumberOfRemainingPlayers = AgentsList.Count;
    // Hide the key
    Key.SetActive(false);

    // Fill in the information for each agent in the list and add it to the group.
    // Agents in the same group learn to work together.
    m_AgentGroup = new SimpleMultiAgentGroup();
    foreach (var item in AgentsList)
    {
        item.StartingPos = item.Agent.transform.position;
        item.StartingRot = item.Agent.transform.rotation;
        item.Rb = item.Agent.GetComponent<Rigidbody>();
        item.Col = item.Agent.GetComponent<Collider>();
        // Add to the group
        m_AgentGroup.RegisterAgent(item.Agent);
    }

    // Fill in the information for each dragon in the list
    foreach (var item in DragonsList)
    {
        item.StartingPos = item.Agent.transform.position;
        item.StartingRot = item.Agent.transform.rotation;
        item.T = item.Agent.transform;
        item.Col = item.Agent.GetComponent<Collider>();
    }

    // Reset the scene
    ResetScene();
}
In ResetScene:
void ResetScene()
{
    // Reset the timer
    m_ResetTimer = 0;
    // Reset the number of surviving agents
    m_NumberOfRemainingPlayers = AgentsList.Count;

    // Rotate the whole area to one of four directions to prevent overfitting to one layout
    var rotation = Random.Range(0, 4);
    var rotationAngle = rotation * 90f;
    transform.Rotate(new Vector3(0f, rotationAngle, 0f));

    // Reset each agent in the list
    foreach (var item in AgentsList)
    {
        // If randomization is enabled, pick a random position/rotation; otherwise use the fixed starting one
        var pos = UseRandomAgentPosition ? GetRandomSpawnPos() : item.StartingPos;
        var rot = UseRandomAgentRotation ? GetRandomRot() : item.StartingRot;
        item.Agent.transform.SetPositionAndRotation(pos, rot);
        // Clear all state
        item.Rb.velocity = Vector3.zero;
        item.Rb.angularVelocity = Vector3.zero;
        item.Agent.MyKey.SetActive(false);
        item.Agent.IHaveAKey = false;
        item.Agent.gameObject.SetActive(true);
        // I think this line can be removed; the agent does not need to be registered again
        m_AgentGroup.RegisterAgent(item.Agent);
    }

    // Reset the key
    Key.SetActive(false);
    // Reset the tombstone
    Tombstone.SetActive(false);

    // Reset each dragon in the list
    foreach (var item in DragonsList)
    {
        if (!item.Agent)
        {
            return;
        }
        // Put the dragon back at its fixed starting position
        item.Agent.transform.SetPositionAndRotation(item.StartingPos, item.StartingRot);
        // Give it a random walking speed
        item.Agent.SetRandomWalkSpeed();
        // Activate the dragon
        item.Agent.gameObject.SetActive(true);
    }
}
Whenever a random position in the scene is needed, GetRandomSpawnPos is called; this code is highly reusable:
public Vector3 GetRandomSpawnPos()
{
    var foundNewSpawnLocation = false;
    var randomSpawnPos = Vector3.zero;
    while (foundNewSpawnLocation == false)
    {
        var randomPosX = Random.Range(-areaBounds.extents.x * m_PushBlockSettings.spawnAreaMarginMultiplier,
            areaBounds.extents.x * m_PushBlockSettings.spawnAreaMarginMultiplier);
        var randomPosZ = Random.Range(-areaBounds.extents.z * m_PushBlockSettings.spawnAreaMarginMultiplier,
            areaBounds.extents.z * m_PushBlockSettings.spawnAreaMarginMultiplier);
        randomSpawnPos = ground.transform.position + new Vector3(randomPosX, 1f, randomPosZ);
        // Check whether anything overlaps the candidate position; if so, try again,
        // otherwise accept it and exit the loop
        if (Physics.CheckBox(randomSpawnPos, new Vector3(2.5f, 0.01f, 2.5f)) == false)
        {
            foundNewSpawnLocation = true;
        }
    }
    return randomSpawnPos;
}
The FixedUpdate method runs every 0.02 seconds (the default fixed timestep). It mainly checks whether the episode has reached the set maximum number of steps (MaxEnvironmentSteps must be greater than 0 for the check to apply); if both conditions hold, GroupEpisodeInterrupted is called, which ends the group episode as a timeout rather than a failure, and the environment is reset:
void FixedUpdate()
{
    m_ResetTimer += 1;
    if (m_ResetTimer >= MaxEnvironmentSteps && MaxEnvironmentSteps > 0)
    {
        m_AgentGroup.GroupEpisodeInterrupted();
        ResetScene();
    }
}
Next, three methods are defined for contact with the portal (cave), the door lock, and the dragon.
When an agent enters the portal (cave):
public void TouchedHazard(PushAgentEscape agent)
{
    // The agent dies, so the count of remaining agents drops by one.
    // If none are left, or the dead agent was carrying the key, end the episode and reset.
    m_NumberOfRemainingPlayers--;
    if (m_NumberOfRemainingPlayers == 0 || agent.IHaveAKey)
    {
        m_AgentGroup.EndGroupEpisode();
        ResetScene();
    }
    else
    {
        agent.gameObject.SetActive(false);
    }
}
When the agent touches the door lock:
public void UnlockDoor()
{
    // Give the group reward
    m_AgentGroup.AddGroupReward(1f);
    // Change the floor material for 0.5 seconds
    StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.goalScoredMaterial, 0.5f));
    print("Unlocked Door");
    // End the group episode
    m_AgentGroup.EndGroupEpisode();
    // Reset the scene
    ResetScene();
}
When an agent touches a dragon:
public void KilledByBaddie(PushAgentEscape agent, Collision baddieCol)
{
    // The dragon is killed, so hide it
    baddieCol.gameObject.SetActive(false);
    // The agent dies, so hide it as well
    m_NumberOfRemainingPlayers--;
    agent.gameObject.SetActive(false);
    print($"{baddieCol.gameObject.name} ate {agent.transform.name}");
    // Activate the tombstone at the agent's position
    Tombstone.transform.SetPositionAndRotation(agent.transform.position, agent.transform.rotation);
    Tombstone.SetActive(true);
    // Activate the key at the dragon's position
    Key.transform.SetPositionAndRotation(baddieCol.collider.transform.position, baddieCol.collider.transform.rotation);
    Key.SetActive(true);
}
Here you could try giving the agent that is killed by the dragon an individual negative reward, to see whether it will still sacrifice its own reward for the benefit of the team.
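A minimal sketch of that experiment, assuming the penalty is added at the top of KilledByBaddie; the value -0.5 is an arbitrary assumption that would need tuning. With the POCA trainer, this individual reward is handled separately from the group reward given in UnlockDoor.

// Sketch only: an individual penalty for the agent that dies to the dragon.
public void KilledByBaddie(PushAgentEscape agent, Collision baddieCol)
{
    baddieCol.gameObject.SetActive(false);
    m_NumberOfRemainingPlayers--;
    agent.gameObject.SetActive(false);
    // Individual (non-group) penalty for the sacrificed agent; -0.5f is arbitrary
    agent.AddReward(-0.5f);
    // ... the rest of the method stays the same as above ...
}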
Changing the material of the ground:
IEnumerator GoalScoredSwapGroundMaterial(Material mat, float time)
{
    m_GroundRenderer.material = mat;
    // Wait for the given number of seconds
    yield return new WaitForSeconds(time);
    m_GroundRenderer.material = m_GroundMaterial;
}
Here is the NPC dragon's code, SimpleNPC.cs. It is simple, containing only the movement logic:
using UnityEngine;

public class SimpleNPC : MonoBehaviour
{
    public Transform target;
    private Rigidbody rb;
    public float walkSpeed = 1;
    private Vector3 dirToGo;

    // Runs earlier than Start
    void Awake()
    {
        rb = GetComponent<Rigidbody>();
    }

    void Update()
    {
    }

    // Runs every 0.02 seconds
    void FixedUpdate()
    {
        dirToGo = target.position - transform.position;
        dirToGo.y = 0;
        rb.rotation = Quaternion.LookRotation(dirToGo);
        // Perform the move
        rb.MovePosition(transform.position + transform.forward * walkSpeed * Time.deltaTime);
    }

    // Set a random speed
    public void SetRandomWalkSpeed()
    {
        walkSpeed = Random.Range(1f, 7f);
    }
}
There is also a script attached to the dragon that detects whether the dragon reaches the portal (cave):
using UnityEngine;
using UnityEngine.Events;

namespace Unity.MLAgentsExamples
{
    public class CollisionCallbacks : MonoBehaviour
    {
        // The tag to react to; set in the Inspector (here, the portal's tag)
        public string tagToDetect;

        // The events defined below need to be subscribed to in the Unity editor
        [System.Serializable]
        public class TriggerEvent : UnityEvent<Collider>
        {
        }

        [Header("Trigger Callbacks")]
        public TriggerEvent onTriggerEnterEvent = new TriggerEvent();

        [System.Serializable]
        public class CollisionEvent : UnityEvent<Collision, Transform>
        {
        }

        [Header("Collision Callbacks")]
        public CollisionEvent onCollisionEnterEvent = new CollisionEvent();

        // This is the only callback used in this case; nothing subscribes to the other events
        private void OnCollisionEnter(Collision col)
        {
            if (col.transform.CompareTag(tagToDetect))
            {
                onCollisionEnterEvent.Invoke(col, transform);
            }
        }
    }
}
The onCollisionEnterEvent is subscribed to in the Unity Inspector, and the subscribed method, implemented in DungeonEscapeEnvController.cs, is the following:
public void BaddieTouchedBlock()
{
    // The dragon escaped through the portal: end the episode without reward,
    // flash the fail material, and reset the scene
    m_AgentGroup.EndGroupEpisode();
    StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.failMaterial, 0.5f));
    ResetScene();
}
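Subscribing through the Inspector is what the example does; purely as an illustration, the same wiring could also be done from code inside DungeonEscapeEnvController, assuming it holds a reference to the dragon's CollisionCallbacks component (the dragonCollisionCallbacks field below is hypothetical):

// Sketch only: code-based subscription instead of the Inspector.
// dragonCollisionCallbacks is a hypothetical reference to the CollisionCallbacks
// component on the dragon, assigned in the Inspector.
public CollisionCallbacks dragonCollisionCallbacks;

void OnEnable()
{
    // The event passes (Collision, Transform); BaddieTouchedBlock ignores both.
    dragonCollisionCallbacks.onCollisionEnterEvent.AddListener((col, t) => BaddieTouchedBlock());
}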
Configuration file
The simplest configuration:
behaviors:
  DungeonEscape:
    trainer_type: poca
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 20000000
    time_horizon: 64
    summary_freq: 60000
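Save this to a YAML file and start training with the mlagents-learn command covered in the earlier command-and-configuration article. Assuming the file is named DungeonEscape.yaml (both the file name and the run id below are arbitrary choices), the command looks roughly like:

mlagents-learn DungeonEscape.yaml --run-id=DungeonEscape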
Effect demonstration
Postnote
This is a multi-agent case that explores whether an agent can learn to sacrifice itself for the benefit of the team. Building on this, a more complex puzzle game could be made in the future, one whose solutions would be unexpected to humans yet learnable by the agents; designing the reward function for such a game would be a huge challenge.