
Report

Solution and Results

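The algorithm used is DDPG (Deep Deterministic Policy Gradient), described in this paper. PyTorch is used to build the four neural networks: two Actor networks, which estimate the best action for a given state, and two Critic networks, which estimate the discounted return for a given state and action. In DDPG, each pair consists of a local network that is trained directly and a target copy that changes slowly.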
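To make the roles of the four networks more concrete, here is a minimal sketch of the two standard DDPG rules that tie the local and target networks together: the bootstrapped target used to train the local Critic, and the soft update that moves each target network toward its local counterpart. Plain Python floats stand in for PyTorch tensors, and the names `td_target` and `soft_update` as well as the values of `gamma` and `tau` are assumptions chosen for illustration, not taken from this project's code.

```python
def td_target(reward, done, next_q_target, gamma=0.99):
    """Target value for the local Critic.

    next_q_target is assumed to be the target Critic's estimate for the
    next state, evaluated at the action chosen by the target Actor.
    """
    # No bootstrapping past the end of an episode.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * next_q_target


def soft_update(local_params, target_params, tau=1e-3):
    """Blend local parameters into the target parameters.

    Returns the new target parameters:
    tau * local + (1 - tau) * target, element by element.
    """
    return [
        tau * local + (1.0 - tau) * target
        for local, target in zip(local_params, target_params)
    ]


# Hypothetical usage with toy numbers.
y = td_target(reward=1.0, done=False, next_q_target=2.0, gamma=0.5)
new_target = soft_update([1.0, 2.0], [0.0, 0.0], tau=0.1)
```

A small `tau` keeps the target networks nearly fixed between updates, which stabilizes the Critic's training target at the cost of slower propagation of new value estimates.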

Unfortunately, with this approach the agent learns very slowly, so it does not seem suitable for this challenge (see the image below).

alt text

Next Steps

As a next step, we plan to use a parallel framework (several agents gaining experience from the environment in parallel) and to apply the recently proposed D4PG algorithm. You can read about it in this paper.
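One possible shape for the parallel setup is a single replay buffer that all agents write to after every environment step. The sketch below illustrates only that idea; it is not D4PG itself, and the class `SharedReplayBuffer`, the helper `collect_step`, and the toy transition values are hypothetical.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """Fixed-size buffer that stores transitions from all agents."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def collect_step(buffer, states, actions, rewards, next_states, dones):
    """Store one transition per agent from a single parallel step."""
    for transition in zip(states, actions, rewards, next_states, dones):
        buffer.add(*transition)


# Hypothetical usage: 4 agents take one step each.
buffer = SharedReplayBuffer(capacity=1000)
collect_step(
    buffer,
    states=[0.0, 0.1, 0.2, 0.3],
    actions=[1.0, -1.0, 0.5, 0.0],
    rewards=[0.0, 0.1, 0.0, 0.0],
    next_states=[0.1, 0.2, 0.3, 0.4],
    dones=[False, False, False, True],
)
batch = buffer.sample(batch_size=2)
```

With several agents feeding one buffer, each learning step can draw on more varied experience per unit of wall-clock time than a single agent provides.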