The Adventures in Unity ML-Agents blog explores the exciting and constantly changing world of reinforcement and imitation learning using the Unity ML-Agents toolkit. Although the ML-Agents toolkit already includes a number of great example environments, we have developed our own set of example RL environments/games to help you get started: basic implementations of three classic reinforcement-learning games, Catch Ball, Wall Pong, and Pong.
The Unity package can be downloaded from GitHub:
Detailed instructions on how to install, train and test the three games using the ML-Agents toolkit are provided in the GitHub repository. We have also put together a complete tutorial on how to get started with the Unity ML-Agents toolkit by walking through the creation of Wall Pong using Unity and the ML-Agents Toolkit. This tutorial can be found on our Getting Started page:
Catch Ball requires a paddle agent to learn how to “catch” balls that fall from the top of the game area. The agent observes 5 state variables (x and z ball position, x and z ball velocity, and the x position of the agent’s paddle), can perform 3 actions (do nothing, move paddle left, or move paddle right), and receives a reward of +1 each time the paddle catches the ball and a reward of -1 if the paddle misses the ball.
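To make the Catch Ball specification concrete, here is a minimal Python sketch of the observation vector and reward signal described above. This is illustrative only, not the actual Unity C# implementation; all function and variable names are hypothetical.

```python
# Hypothetical sketch of the Catch Ball observations and rewards
# described in the text (not the Unity C# implementation).

def collect_observations(ball_pos, ball_vel, paddle_x):
    """The 5 state variables: ball x/z position, ball x/z velocity,
    and the x position of the agent's paddle."""
    return [ball_pos[0], ball_pos[1], ball_vel[0], ball_vel[1], paddle_x]

def step_reward(caught: bool, missed: bool) -> float:
    """+1 when the paddle catches the ball, -1 when it misses,
    0 on all other time steps."""
    if caught:
        return 1.0
    if missed:
        return -1.0
    return 0.0
```

In the real environment these two pieces correspond to the Agent's observation-collection and reward-assignment logic, which fire once per simulation step.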
Wall Pong is a single-agent adaptation of the classic game Pong. The agent moves a paddle back and forth to bounce a ball against the walls (bounds) of a rectangular game area. The agent can perform 4 actions [do nothing, fire ball (i.e., start game), move paddle left, and move paddle right] and observes 5 state variables (x and z ball position, x and z ball velocity, and the x position of the agent’s paddle), receiving a reward of +1 every time the ball hits the paddle and a reward of -1 if the paddle misses the ball. The aim is to keep the ball in play for as long as possible: the more times the paddle hits the ball, the higher the agent’s score/reward.
The classic game Pong requires no introduction: old-school 2-D digital tennis. Two agents (paddles) compete, with an agent receiving a reward of +1 if it wins and a reward of -1 if it loses. The implementation also includes a small positive reward (e.g., 0.001) every time an agent’s paddle hits the ball, which can be increased or decreased to raise or lower, respectively, the degree of cooperative ‘rally’ play. To stimulate game play, agents also receive a small negative reward for not initiating a new game (i.e., not firing the ball). Agents can perform 4 actions [do nothing, fire ball (i.e., start game), move paddle left, and move paddle right] and observe 6 state variables (x and z ball position, x and z ball velocity, and the z position of the agent’s and the opponent’s paddle).
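Pong’s shaped reward can be summarized in a short sketch. This is an illustrative Python rendering of the scheme described above, not the Unity implementation; the idle-penalty magnitude is an assumption, since the text only says the penalty is “small.”

```python
def pong_reward(won=False, lost=False, hit_ball=False, idle_no_fire=False,
                hit_bonus=0.001, idle_penalty=-0.001):
    """Shaped Pong reward as described in the text:
    +1 for a win, -1 for a loss, a small bonus per paddle hit
    (raising hit_bonus encourages longer cooperative rallies),
    and a small penalty for failing to fire a new ball.
    idle_penalty's exact value is a hypothetical assumption."""
    r = 0.0
    if won:
        r += 1.0
    if lost:
        r -= 1.0
    if hit_ball:
        r += hit_bonus
    if idle_no_fire:
        r += idle_penalty
    return r
```

Tuning `hit_bonus` relative to the ±1 win/loss reward is what shifts the agents between competitive play and cooperative rallying.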
What the Examples (and the Getting Started Tutorial) Highlight
- How to install the ML-Agents toolkit
- How to build a simple Unity environment for the ML-Agents toolkit
- How to set up the ML-Agents components: Academy, Brain, and Agent
- How to test a game by playing it yourself
- How to train a Unity ML-Agent
- How to rapidly decrease training time by training an agent in multiple game arenas (sub-environments) simultaneously
- How to test a trained model

Plus, future posts will use these example games to explore issues such as discrete vs. continuous control, whether two brains are better than one, and how to train a Unity ML-Agent using PyTorch.
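On the training-time point above: duplicating the game arena speeds up training because N identical arenas stepping in parallel produce N transitions per environment step, so the experience buffer fills roughly N times faster. A rough back-of-the-envelope sketch (names and buffer size are illustrative, not the ML-Agents API):

```python
# Why multiple arenas cut wall-clock training time: with num_arenas
# copies stepping in parallel, each step yields num_arenas transitions.

def steps_to_fill_buffer(buffer_size: int, num_arenas: int) -> int:
    """Environment steps needed to collect buffer_size transitions."""
    return -(-buffer_size // num_arenas)  # ceiling division

# e.g. a hypothetical 10,000-transition buffer:
# 1 arena -> 10,000 steps; 8 arenas -> 1,250 steps
```

The speed-up is close to linear as long as the arenas are independent copies of the same scene feeding the same Brain.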