A hands-on project after finishing DeepMind's RL course. I decided to learn and use JAX for the implementations.
🚧 The project is still in progress.
Utilities:
- Experience accumulator
  - ✔️ by episodes
  - 🔲 by transitions (🚧 in progress)
- ✔️ Training experiment
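The by-transitions accumulator is still in progress; as a sketch, a per-transition buffer could be as simple as a bounded deque with uniform sampling (the class and method names here are illustrative, not the repo's actual API):

```python
import random
from collections import deque

class TransitionBuffer:
    """Fixed-capacity accumulator of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size, seed=None):
        # Uniform sampling without replacement, as in vanilla experience replay.
        return random.Random(seed).sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

buf = TransitionBuffer(capacity=100)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(batch_size=2, seed=0)
```

Accumulating by transitions rather than by whole episodes is what later lets DQN-style agents sample decorrelated minibatches.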
Environments:
- gym
  - ✔️ Blackjack
  - ✔️ CartPole
  - 🔲 Atari (🚧 in progress)
- MuJoCo?
- evogym (so cool, must try)
Algorithms:
- Value function approximator
  - ✔️ Tabular
  - ✔️ Linear
  - ✔️ Neural nets
- Value approximation/heuristic
  - TD
    - ✔️ TD(0)
    - ✔️ n-step TD
    - ✔️ TD(λ)
  - Q-learning
    - ✔️ vanilla Q-learning
    - 🔲 Q(λ)
- Simple agents
  - ✔️ Tabular + TD, Q (with ε-greedy)
  - ✔️ Linear + Q (with ε-greedy)
- DQN
  - ✔️ Barebones (NN + Q)
  - 🔲 Vanilla DQN (🚧 in progress)
  - 🔲 Rainbow?
- Policy gradient
  - 🔲 Vanilla
  - 🔲 Trust region/PPO?
- Model-based?
- GVF?
- Combining with evolutionary methods?
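As a taste of the JAX style used throughout, here is a minimal sketch of the tabular pieces above: a TD(0) state-value backup, an ε-greedy action chooser, and a vanilla Q-learning update. Function names and hyperparameter values are illustrative, not the repo's actual API.

```python
import jax
import jax.numpy as jnp

def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.99):
    # One-step backup of V(s) toward the TD(0) target r + γ V(s').
    td_error = r + gamma * v[s_next] - v[s]
    return v.at[s].add(alpha * td_error)

def epsilon_greedy(key, q_row, epsilon=0.1):
    # With probability ε take a uniform random action, else the greedy one.
    explore_key, action_key = jax.random.split(key)
    random_action = jax.random.randint(action_key, (), 0, q_row.shape[0])
    return jnp.where(jax.random.uniform(explore_key) < epsilon,
                     random_action, jnp.argmax(q_row))

def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy backup of Q(s, a) toward r + γ max_a' Q(s', a').
    td_error = r + gamma * jnp.max(q[s_next]) - q[s, a]
    return q.at[s, a].add(alpha * td_error)

v = td0_update(jnp.zeros(5), s=0, r=1.0, s_next=1)                  # v[0] -> 0.1
q = q_learning_update(jnp.zeros((3, 2)), s=0, a=1, r=1.0, s_next=2)
action = epsilon_greedy(jax.random.PRNGKey(0), q[0], epsilon=0.0)
```

The functional `array.at[...].add` updates keep the tables immutable, which is what makes these updates `jit`- and `vmap`-friendly.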