v0.3.1 Release Notes
In this release we have refactored some names inside every algorithm, in particular:
- we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m`
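A minimal sketch of this bookkeeping, assuming hypothetical placeholder names (`num_ranks`, `envs_per_rank`) rather than actual configuration keys:

```python
# Sketch of the policy_step counter; `num_ranks` and `envs_per_rank` are
# illustrative stand-ins for the distributed setup, not real config keys.
num_ranks = 2        # number of distributed processes
envs_per_rank = 4    # vectorized environments per process

# Each environment step (one policy call per environment, ignoring action
# repeat) advances the global counter by n * m policy steps.
policy_steps_per_env_step = num_ranks * envs_per_rank

policy_step = 0
for _ in range(10):  # 10 environment steps
    policy_step += policy_steps_per_env_step

print(policy_step)  # 80 = 10 env steps * (2 ranks * 4 envs per rank)
```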
We have also refactored the hydra configs, in particular:
- we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the shared hyperparameters for those objects in every algorithm
- the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since it's hard for the `policy_step` variable to be exactly divisible by the `metric.log_every` value, the logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, with `last_log = policy_step` updated every time something is logged (see the sketch after this list)
- the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works as the `metric.log_every` one, while `resume_from` specifies the experiment folder, which must contain the `.hydra` folder, to resume the training from. This is now only supported by the Dreamer algorithms
- `num_envs` and `clip_reward` have been moved to the `env` config
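As referenced above, a minimal sketch of the frequency check used for logging (and, analogously, for checkpointing). The loop structure and variable names other than the condition itself are illustrative only:

```python
# Sketch of the "log as soon as possible" check described above.
# `log_every` and `ckpt_every` stand for cfg.metric.log_every and
# cfg.checkpoint.every; everything else is illustrative.
log_every = 1000
ckpt_every = 5000

last_log = 0
last_ckpt = 0

policy_steps_per_env_step = 8  # e.g. 2 ranks * 4 envs per rank

policy_step = 0
for _ in range(10_000):
    policy_step += policy_steps_per_env_step

    # policy_step is rarely an exact multiple of log_every, so log as soon
    # as at least log_every policy steps have passed since the last log.
    if policy_step - last_log >= log_every:
        last_log = policy_step  # remember when we last logged
        # ... log metrics here ...

    # checkpoint.every follows the same pattern
    if policy_step - last_ckpt >= ckpt_every:
        last_ckpt = policy_step
        # ... save a checkpoint here ...
```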