Deep Reinforcement Learning is a fascinating modern field, so I decided to implement PPO (a member of the family of policy gradient methods) in TensorFlow 2.0. The blueprint is the PPO algorithm developed by OpenAI (https://arxiv.org/abs/1707.06347).
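The heart of PPO is the clipped surrogate objective from the paper linked above. The repo implements it in TensorFlow 2.0, but the math can be sketched with plain NumPy (the function and variable names here are my own illustration, not the repo's actual code):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (negated, so it is minimized)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space for stability
    ratio = np.exp(logp_new - logp_old)
    # Clipping removes the incentive to push the ratio outside [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic bound: elementwise minimum of the two surrogates, negated for descent
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the old and new policies agree (`logp_new == logp_old`), the loss reduces to minus the mean advantage; once the ratio leaves the clip range, the gradient with respect to it vanishes.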
For testing, I designed four simple training environments with Unity 3D and ML-Agents. You can use this algorithm with Unity 3D executables as well as inside the Unity 3D Editor.
- CartPole
- RollerBall
- BallSorter
- BallSorterVisualObs
- Clone the PPO repo and run `pip install -e .` in the PPO folder
- Clone the Environments repo
- Put both repos in a project folder. You should have the following structure: a `Project` folder containing `Envs` and `PPO`
- (Optional) If you are familiar with ML-Agents, you can also clone this repo and run it from the Unity 3D Editor
- Set the configs in the `*.yaml` file that you want to use. The standard config `__Example__.yaml` is loaded by default if no config is specified; the standard directory is `__WORKING_DIRS__/__STANDARD__/__EXAMPLE__.yaml`
  - Set `env_name` (path + filename) to the Unity 3D executable
  - Set `nn_architecure` based on the environment to train (vector obs, visual obs, mixed, ...)
  - Set training and policy parameters (learning rate, hidden sizes of the network, ...)
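Put together, a config along these lines could look as follows. Only `env_name` and `nn_architecure` are named in this README; the remaining keys and all values are illustrative assumptions, not the repo's actual schema:

```yaml
# Illustrative config -- check __Example__.yaml for the real key names
env_name: ./Envs/CartPole/CartPole.x86_64  # path + filename of the Unity 3D executable
nn_architecure: vector_obs                 # vector obs, visual obs, mixed, ...
lr: 3.0e-4                                 # learning rate
hidden_sizes: [64, 64]                     # hidden layer sizes of the network
```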
- Run `python main.py` and specify `--runner=run-ppo --working_dir=./path/to/your/working_dir --config=your_config.yaml`:
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/CartPole/ --config=CartPole.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/RollerBall/ --config=RollerBall.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/BallSorter/ --config=BallSorter.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/BallSorterVisualObs/ --config=BallSorterVisualObs.yaml
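The command-line interface shown above could be wired up roughly like this. The flag names (`--runner`, `--working_dir`, `--config`) and the `__Example__.yaml` default come from this README; the parser itself is a sketch, not the repo's actual `main.py`:

```python
import argparse

def build_parser():
    """Sketch of a parser for the flags documented above (assumed, not the repo's code)."""
    parser = argparse.ArgumentParser(description="PPO training entry point (sketch)")
    parser.add_argument("--runner", default="run-ppo",
                        help="which runner to start")
    parser.add_argument("--working_dir", required=True,
                        help="directory containing the *.yaml configs")
    parser.add_argument("--config", default="__Example__.yaml",
                        help="config file to load (standard config if omitted)")
    return parser

if __name__ == "__main__":
    # Mirrors the CartPole invocation from the README
    args = build_parser().parse_args(
        ["--runner=run-ppo",
         "--working_dir=./__WORKING_DIRS__/CartPole/",
         "--config=CartPole.yaml"]
    )
    print(args.runner, args.working_dir, args.config)
```

Because `--config` has a default, omitting it falls back to the standard config, matching the behavior described above.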
- Watch the agent learn
- Experiment with the environments