🚧 🛠️👷♀️ 🛑 Under construction...
Install the required dependencies using the following command:
pip install -r requirements.txt
You can run the algorithm on any supported Gymnasium environment. For example:
python main.py --env 'LunarLanderContinuous-v2'
Notes: Reward scaling appears to work very well on some environments (e.g. BipedalWalker), but it may be limiting the upper bound of performance on others. I've increased the number of episodes to 50k for the MuJoCo environments; if that gives the agent enough time to learn, I'll rerun the non-MuJoCo Gymnasium environments as well. For comparison, the examples in the paper train for millions of timesteps.
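For readers unfamiliar with reward scaling, here is a minimal sketch of one common variant: dividing each reward by a running standard deviation of the rewards seen so far. This is an assumption for illustration only. The notes above don't say which scaling scheme this repo uses, and `RunningRewardScaler` is a hypothetical name, not something from `main.py`.

```python
import math


class RunningRewardScaler:
    """Hypothetical helper: divides rewards by a running standard deviation.

    NOTE: an illustrative assumption, not necessarily the scaling used in this repo.
    """

    def __init__(self, epsilon: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
        self.epsilon = epsilon  # avoids division by zero

    def update(self, reward: float) -> None:
        """Fold one reward into the running statistics."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self) -> float:
        """Population standard deviation; 1.0 until two rewards have been seen."""
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def scale(self, reward: float) -> float:
        """Update the statistics with this reward, then return the scaled reward."""
        self.update(reward)
        return reward / (self.std + self.epsilon)
```

In a training loop this would be called once per environment step, e.g. `scaled = scaler.scale(reward)`, with the scaled value stored in place of the raw reward. Because only the magnitude changes (the mean is not subtracted), the sign of each reward is preserved.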
Pendulum-v1
MountainCarContinuous-v0
LunarLanderContinuous-v2
Pusher-v4
Reacher-v4
InvertedPendulum-v4
BipedalWalker-v3
InvertedDoublePendulum-v4
Walker2d-v4
Ant-v4
HalfCheetah-v4
Swimmer-v3
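To run every environment listed above in sequence, something like the following sketch may be handy. It only relies on the `python main.py --env NAME` invocation shown earlier. The `build_command` and `run_all` helpers are hypothetical, and it assumes `main.py` needs no other flags. It defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# Environment IDs copied from the list above.
ENVIRONMENTS = (
    "Pendulum-v1",
    "MountainCarContinuous-v0",
    "LunarLanderContinuous-v2",
    "Pusher-v4",
    "Reacher-v4",
    "InvertedPendulum-v4",
    "BipedalWalker-v3",
    "InvertedDoublePendulum-v4",
    "Walker2d-v4",
    "Ant-v4",
    "HalfCheetah-v4",
    "Swimmer-v3",
)


def build_command(env_name, script="main.py"):
    """Build the argument list for one run (assumes --env is the only flag needed)."""
    # sys.executable stands in for `python` so the current interpreter is reused.
    return [sys.executable, script, "--env", env_name]


def run_all(env_names=ENVIRONMENTS, dry_run=True):
    """Print (dry run) or execute one training run per environment, in order."""
    commands = [build_command(name) for name in env_names]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the batch at the first failing run.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Switch to `run_all(dry_run=False)` to actually launch the runs. They execute one after another, so a full pass over all twelve environments can take a long time.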
Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.