v0.3.1 Release Notes
In this release we have refactored some names inside every algorithm, in particular:
- we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m`
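A minimal sketch of this bookkeeping, assuming hypothetical placeholder names (`num_ranks`, `envs_per_rank`) rather than actual configuration keys:

```python
# Sketch of the policy_step counter; `num_ranks` and `envs_per_rank` are
# illustrative stand-ins for the distributed setup, not real config keys.
num_ranks = 2        # number of distributed processes
envs_per_rank = 4    # vectorized environments per process

# Each environment step (one policy call per environment, ignoring action
# repeat) advances the global counter by n * m policy steps.
policy_steps_per_env_step = num_ranks * envs_per_rank

policy_step = 0
for _ in range(10):  # 10 environment steps
    policy_step += policy_steps_per_env_step

print(policy_step)  # 80 = 10 env steps * (2 ranks * 4 envs per rank)
```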
We have also refactored the hydra configs, in particular:
- we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the shared hyperparameters for those objects in every algorithm
- the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since it's hard for the `policy_step` variable to be exactly divisible by the `metric.log_every` value, the logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, with `last_log = policy_step` updated every time something is logged (see the sketch after this list)
- the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works as the `metric.log_every` one, while `resume_from` specifies the experiment folder, which must contain the `.hydra` folder, to resume the training from. This is now only supported by the Dreamer algorithms
- `num_envs` and `clip_reward` have been moved to the `env` config
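As referenced above, a minimal sketch of the frequency check used for logging (and, analogously, for checkpointing). The loop structure and variable names other than the condition itself are illustrative only:

```python
# Sketch of the "log as soon as possible" check described above.
# `log_every` and `ckpt_every` stand for cfg.metric.log_every and
# cfg.checkpoint.every; everything else is illustrative.
log_every = 1000
ckpt_every = 5000

last_log = 0
last_ckpt = 0

policy_steps_per_env_step = 8  # e.g. 2 ranks * 4 envs per rank

policy_step = 0
for _ in range(10_000):
    policy_step += policy_steps_per_env_step

    # policy_step is rarely an exact multiple of log_every, so log as soon
    # as at least log_every policy steps have passed since the last log.
    if policy_step - last_log >= log_every:
        last_log = policy_step  # remember when we last logged
        # ... log metrics here ...

    # checkpoint.every follows the same pattern
    if policy_step - last_ckpt >= ckpt_every:
        last_ckpt = policy_step
        # ... save a checkpoint here ...
```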