Deep Reinforcement Learning is a fascinating modern field, so I decided to implement PPO (a member of the family of policy gradient methods) in TensorFlow 2.0. The blueprint is the PPO algorithm developed by OpenAI (https://arxiv.org/abs/1707.06347).
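The heart of PPO is the clipped surrogate objective from the paper linked above. The repo implements it in TensorFlow 2.0, but the math can be sketched with plain NumPy (the function and variable names here are my own illustration, not the repo's actual code):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (negated, so it is minimized)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space for stability
    ratio = np.exp(logp_new - logp_old)
    # Clipping removes the incentive to push the ratio outside [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic bound: elementwise minimum of the two surrogates, negated for descent
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the old and new policies agree (`logp_new == logp_old`), the loss reduces to minus the mean advantage; once the ratio leaves the clip range, the gradient with respect to it vanishes.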
For testing, I designed four simple training environments with Unity 3D and ML-Agents. You can use this algorithm with Unity 3D executables as well as inside the Unity 3D Editor.
- CartPole
- RollerBall
- BallSorter
- BallSorterVisualObs
- Clone the PPO repo and run `pip install -e .` in the PPO folder
- Clone the Environments repo
- Put both repos in a project folder. You should have the following structure: a `Project` folder containing `Envs` and `PPO`
- (Optional) If you are familiar with ML-Agents, you can also clone this repo and run it from the Unity 3D Editor
- Set the configs in the `*.yaml` file that you want to use. The standard config `__Example__.yaml` is loaded by default if no config is specified; the standard directory is `__WORKING_DIRS__/__STANDARD__/__EXAMPLE__.yaml`
  - Set `env_name` (path + filename) to the Unity 3D executable
  - Set `nn_architecure` based on the environment to train (vector obs, visual obs, mixed, ...)
  - Set training and policy parameters (learning rate, hidden sizes of the network, ...)
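Put together, a config along these lines could look as follows. Only `env_name` and `nn_architecure` are named in this README; the remaining keys and all values are illustrative assumptions, not the repo's actual schema:

```yaml
# Illustrative config -- check __Example__.yaml for the real key names
env_name: ./Envs/CartPole/CartPole.x86_64  # path + filename of the Unity 3D executable
nn_architecure: vector_obs                 # vector obs, visual obs, mixed, ...
lr: 3.0e-4                                 # learning rate
hidden_sizes: [64, 64]                     # hidden layer sizes of the network
```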
- Run `python main.py` and specify `--runner=run-ppo --working_dir=./path/to/your/working_dir --config=your_config.yaml`:
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/CartPole/ --config=CartPole.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/RollerBall/ --config=RollerBall.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/BallSorter/ --config=BallSorter.yaml
python main.py --runner=run-ppo --working_dir=./__WORKING_DIRS__/BallSorterVisualObs/ --config=BallSorterVisualObs.yaml
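The command-line interface shown above could be wired up roughly like this. The flag names (`--runner`, `--working_dir`, `--config`) and the `__Example__.yaml` default come from this README; the parser itself is a sketch, not the repo's actual `main.py`:

```python
import argparse

def build_parser():
    """Sketch of a parser for the flags documented above (assumed, not the repo's code)."""
    parser = argparse.ArgumentParser(description="PPO training entry point (sketch)")
    parser.add_argument("--runner", default="run-ppo",
                        help="which runner to start")
    parser.add_argument("--working_dir", required=True,
                        help="directory containing the *.yaml configs")
    parser.add_argument("--config", default="__Example__.yaml",
                        help="config file to load (standard config if omitted)")
    return parser

if __name__ == "__main__":
    # Mirrors the CartPole invocation from the README
    args = build_parser().parse_args(
        ["--runner=run-ppo",
         "--working_dir=./__WORKING_DIRS__/CartPole/",
         "--config=CartPole.yaml"]
    )
    print(args.runner, args.working_dir, args.config)
```

Because `--config` has a default, omitting it falls back to the standard config, matching the behavior described above.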
- Watch the agent learn
- Experiment with the environments