In this casual four-day hackathon:
- As a warm-up, I created an extremely simple learning algorithm for the CartPole environment.
- I implemented deep deterministic policy gradient (DDPG) with a neural network for the OpenAI Gym Pendulum environment. Read more about DDPG: https://arxiv.org/pdf/1509.02971v5.pdf
- I (fruitlessly) attempted to extend the DDPG agent to the Humanoid environment. The result is quite funny if you run it.
This project marks my first experience with TensorFlow, though I had previously implemented neural networks in other languages.
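
For readers new to DDPG, here is a minimal sketch of two pieces described in the linked paper: the critic's bootstrapped target and the soft update of the target networks. This is not the code from this project. It uses plain NumPy instead of TensorFlow so it runs on its own, the function names `td_target` and `soft_update` are made up for illustration, and the `done` mask is a common implementation convention rather than something taken from this repository. The defaults `gamma=0.99` and `tau=0.001` are the values reported in the paper.

```python
import numpy as np


def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * Q'(s', mu'(s')).

    next_q is the target critic's value for the target actor's action in
    the next state. The done mask (an assumption, see above) drops the
    bootstrap term on terminal transitions.
    """
    reward = np.asarray(reward, dtype=np.float64)
    next_q = np.asarray(next_q, dtype=np.float64)
    done = np.asarray(done, dtype=np.float64)
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.001):
    """Move each target weight a small step toward its online counterpart:
    theta_target <- tau * theta_online + (1 - tau) * theta_target.
    """
    return [
        tau * w + (1.0 - tau) * w_target
        for w_target, w in zip(target_params, online_params)
    ]
```

In a training loop, `td_target` would supply the labels for the critic's squared-error loss on each minibatch, and `soft_update` would be applied to both the target actor and the target critic after every gradient step, which keeps the targets changing slowly.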