This repository implements reinforcement learning (RL) algorithms in Python.
Currently implemented algorithms:
- Q-Learning
- SARSA
- Deep Q Network (DQN)
- Double Deep Q Network (DDQN)
- Deep Deterministic Policy Gradient (DDPG)
- Vanilla Policy Gradient (VPG)
- Advantage Actor Critic (A2C)
- Proximal Policy Optimization (PPO)
Check out the examples folder for training demonstrations.
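
For orientation, here is a minimal sketch of tabular Q-Learning, the first algorithm in the list above. It is not taken from this repository: the toy chain environment and the names `step`, `train_q_learning`, `N_STATES`, and `ACTIONS` are hypothetical and used only for illustration, and the hyperparameters are arbitrary. It uses only the standard library.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4; state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): reward 1.0 on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table indexed as q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-Learning target: bootstrap from the best next action (off-policy).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_learning()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Replacing `max(q[next_state])` in the target with the value of the action actually chosen in the next state would turn this sketch into SARSA, the on-policy counterpart.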