Report

Solution and Results

The algorithm used is DDPG described in this paper. PyTorch is used to build the four neural networks. Specifically two Actor networks (which estimate best actions) and two Critic networks (which estimate discounted reward given a state and an actions).

Unfortunately, with this approach the agent learns very slowly, hence seems not suitable for this challenge (see the image below).

Next Steps

As next step we target to use a parallel framework (several agents gaining experience from the environment in parallel) and applying the recently proposed algorithm D4PG. You can read about it in this paper.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report.md

Report.md

Report

Solution and Results

Next Steps

Files

Report.md

Latest commit

History

Report.md

File metadata and controls

Report

Solution and Results

Next Steps