I started by developing a basic helicopter controller as a foundation for the heuristic behaviour.
It flies well enough without pitch and roll being controlled via the mouse, provided speed is kept to a minimum, just above the hover thrust.
private void FixedUpdate()
{
    rb_Heli.AddForce(transform.up * throttle, ForceMode.Impulse);
    rb_Heli.AddTorque(transform.right * pitch * responsiveness);
    rb_Heli.AddTorque(-transform.forward * roll * responsiveness);
    rb_Heli.AddTorque(transform.up * yaw * responsiveness);
}
The heuristic is set up to give the same results as the controller, using 4 discrete action branches, each with a branch size of 3.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var discreteActionsOut = actionsOut.DiscreteActions;
    // Roll (Z axis): 'Q' (left), 'E' (right)
    discreteActionsOut[0] = Input.GetKey(KeyCode.Q) ? 1 : Input.GetKey(KeyCode.E) ? 2 : 0;
    // Pitch (X axis): 'S' (down), 'W' (up)
    discreteActionsOut[1] = Input.GetKey(KeyCode.S) ? 1 : Input.GetKey(KeyCode.W) ? 2 : 0;
    // Yaw (Y axis): 'A' (left), 'D' (right)
    discreteActionsOut[2] = Input.GetKey(KeyCode.A) ? 1 : Input.GetKey(KeyCode.D) ? 2 : 0;
    // Throttle: 'Space' (increase), 'LeftShift' (decrease)
    discreteActionsOut[3] = Input.GetKey(KeyCode.Space) ? 1 : Input.GetKey(KeyCode.LeftShift) ? 2 : 0;
}
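On the receiving side, each branch value (0, 1 or 2) has to be turned back into a control input. The following is only a minimal sketch of one way to do that, not the project's actual code: the class `DiscreteActionMap` and the method `BranchToAxis` are hypothetical names, and the convention that 1 means the negative direction and 2 the positive one is an assumption. In the heuristic above the meaning of 1 and 2 differs per branch (1 is "increase" for throttle but "down" for pitch), so the sign may need flipping for individual branches.

```csharp
using System;

// Hypothetical helper, not part of ML-Agents or of the original project.
public static class DiscreteActionMap
{
    // Assumed convention: 0 = no input, 1 = negative direction, 2 = positive direction.
    public static float BranchToAxis(int branchValue)
    {
        switch (branchValue)
        {
            case 1: return -1f;
            case 2: return 1f;
            default: return 0f; // 0 or any unexpected value means "no input"
        }
    }

    public static void Main()
    {
        // Sample branch values in the order roll, pitch, yaw, throttle.
        int[] sample = { 1, 2, 0, 2 };
        foreach (int value in sample)
        {
            Console.WriteLine(BranchToAxis(value));
        }
    }
}
```

Inside the agent, the same mapping would typically be applied to `actions.DiscreteActions[0]` through `[3]` in `OnActionReceived(ActionBuffers actions)` before the values reach `FixedUpdate`.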
No Demonstration Recorder was used for training, as agents usually train faster than the time it takes to get it configured. The Decision Requester's decision period was originally set to 10 but then brought back down to 3, as the throttle was rarely used. I added a reward, and logging of the throttle value, for the first time the agent uses the throttle and reaches the RPM needed for take-off.
private void HasLeftPad()
{
    if (transform.position.y > initialYPos + takeOffThreshold)
    {
        // Log agentThrottle first time Y position increases by takeOffThreshold
        if (!hasTakenOff)
        {
            Debug.Log("<color=green>AGENT THROTTLE AT TAKEOFF: </color>" + agentThrottle);
            AddReward(0.25f);
            hasTakenOff = true;
        }
    }
}
I set up a trigger collider for the chopper to crash into and change its orientation in OnEpisodeBegin to make things a bit more difficult. I also added a secondary Ray Perception Sensor to view objects at ground level.
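As a rough illustration of re-orienting the obstacle at the start of each episode, here is a minimal sketch. It assumes the obstacle is rotated to a random heading around the world Y axis; the class name `HeliAgentSketch` and the `obstacle` field are hypothetical, and the actual project may pick the orientation differently.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent class; only the episode reset is shown.
public class HeliAgentSketch : Agent
{
    // Assumption: the trigger-collider obstacle is assigned in the Inspector.
    [SerializeField] private Transform obstacle;

    public override void OnEpisodeBegin()
    {
        // Give the obstacle a new random heading (in degrees) each episode.
        float heading = Random.Range(0f, 360f);
        obstacle.rotation = Quaternion.Euler(0f, heading, 0f);
    }
}
```

Randomising only the heading keeps the obstacle upright, so the ground-level ray sensor still sees it from any approach direction.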
Learning has been quite slow, so I've had to move to Colab to keep the load on the local workstations low. The first training runs in Colab returned much better results. I don't understand why this is happening, as none of the hyperparameters were changed.
The good luck, however, didn't last long, as there are now frequent disconnects that delete all the training data in the virtual machine.
Cerdanya Valley, Spain