
the dreamer v3 changes? #269

Closed
Disastorm opened this issue Apr 24, 2024 · 7 comments
Labels: question (Further information is requested)

Comments


Disastorm commented Apr 24, 2024

Hey I noticed you guys updated your dreamer v3.

What exactly is the difference now? I see, for example, that there are no longer any "updates before training" steps, and I remember there used to be a setting related to the memory buffer. Is that still there, or does it now just automatically always use/not use the buffer?

I think some of the code related to exploration decay is gone as well? I think maybe it never worked properly anyway, but is that completely unsupported now?

Lastly, perhaps one of the biggest differences (unless I'm mistaken) is that each step seems to be a lot slower than before, at least in terms of throughput. I'm not sure about the actual performance over time, but in terms of step count over time it seems way slower than before, potentially somewhere between 5-10 times slower. Am I understanding this right, or did I mess something up?

belerico (Member) commented Apr 24, 2024

Hi @Disastorm, we have updated quite a lot of things in Dreamer-V3.

> I see, for example, that there are no longer any "updates before training" steps

This has been replaced with the more general replay ratio, i.e. the number of gradient steps per environment interaction (without considering the action repeat), as specified in #223.
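For intuition, here is a minimal sketch of the bookkeeping a replay ratio implies (illustrative only, not SheepRL's actual code; the names and numbers are assumptions):

```python
# Illustrative sketch: replay ratio = gradient steps per policy (environment) step.
replay_ratio = 0.5  # assumption: one gradient step every two policy steps

def gradient_steps_owed(policy_steps_done: int, gradient_steps_done: int) -> int:
    """How many gradient steps are still owed to keep the configured ratio."""
    target = int(policy_steps_done * replay_ratio)
    return max(0, target - gradient_steps_done)

# After 1000 policy steps with no training done yet, 500 gradient steps are owed.
print(gradient_steps_owed(1000, 0))  # 500
```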

> I think some of the code related to exploration decay is gone as well?

Yep, there's no need for exploration decay since it was never used. Exploration comes from the entropy of the policy, which encourages it to be explorative, and it is controlled by its weight cfg.algo.actor.ent_coef.
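A minimal sketch of how an entropy bonus keeps the policy explorative (simplified and illustrative, not SheepRL's actual loss; `ent_coef` plays the role of `cfg.algo.actor.ent_coef`):

```python
import torch
from torch.distributions import Categorical

# Illustrative actor loss with an entropy bonus (simplified).
def actor_loss(logits, actions, advantages, ent_coef):
    dist = Categorical(logits=logits)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()  # simplified policy-gradient term
    entropy = dist.entropy().mean()                          # higher entropy -> more exploration
    return pg_loss - ent_coef * entropy                      # ent_coef weighs the exploration bonus
```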

> Lastly, perhaps one of the biggest differences (unless I'm mistaken) is that each step seems to be a lot slower than before, at least in terms of throughput

Yes, the slowdown comes from the fact that, to maintain the replay ratio specified in the config, the agent may perform more gradient steps per update. If you want to be faster and less sample-efficient, you should lower the replay ratio.
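For a rough sense of scale (illustrative numbers): over 100k policy steps, a replay ratio of 1 implies on the order of 100k gradient steps, while a ratio of 0.1 implies roughly 10k, so lowering the ratio directly cuts the training compute per environment step at the cost of sample efficiency.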

We have also modified the signal from which the continue-predictor learns: it now learns only from the terminated flag instead of from both the terminated and truncated flags. In this way it can bootstrap the values better.
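As a rough sketch of the distinction (illustrative only, not the actual SheepRL code): the continue target is built from `terminated` alone, so a time-limit truncation does not zero out the continuation signal.

```python
import torch

# Illustrative: the continue-predictor target ignores truncation.
# terminated = episode truly ended (no future value); truncated = time limit hit.
def continue_target(terminated: torch.Tensor) -> torch.Tensor:
    # 1 where the episode continues, 0 only where it truly terminated,
    # so truncated steps can still bootstrap from the next state's value.
    return 1.0 - terminated.float()
```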

We also fixed the distribution used for the continuous actions.

You can check everything we have modified in the latest release notes.

Given all these changes, I suggest you retrain the agent and see if you get better results, which is what I'm expecting.
If you have any issues please tell us.
Thank you

Disastorm (Author) commented Apr 24, 2024

Thanks. What about the buffer setting?
Previously, if buffer.checkpoint = True, it would use a memory buffer to resume training; if it was false, it did that "updates before training" thing instead.

How does it work now when resuming training? Is it still the same?

Also, I was having some problems and noticed that the algo config doesn't seem to override the exp config for the replay_ratio variable. Is that the correct order?

michele-milesi (Member) commented

Yes, it is the same: the agent performs an initial environment interaction (equal to learning_starts) and then starts training. The updates_before_training has been replaced by the replay ratio (its state is saved in the checkpoint, so training will resume at the correct step).

learning_starts += start_step
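A rough sketch of what that offset means when resuming from a checkpoint (hypothetical names and values, not the actual code):

```python
# Illustrative only: when resuming, the step counter saved in the checkpoint
# shifts learning_starts forward, so the pre-training environment interaction
# is not repeated from scratch.
checkpoint = {"global_step": 50_000}    # hypothetical checkpoint content
learning_starts = 1_024                 # env-interaction steps before training
start_step = checkpoint["global_step"]  # step at which the run resumes
learning_starts += start_step           # training restarts at the correct step
```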

For the second question: yes, that is the correct behaviour. The experiment configuration is used to compose all the other configurations. If you want to overwrite some value, you can do so directly in the experiment configuration: anything defined in the experiment config will overwrite the value defined elsewhere.
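To illustrate the precedence (a minimal OmegaConf sketch, not how sheeprl actually composes its configs; keys and values are made up for the example): values merged later win, so whatever the experiment config sets for replay_ratio is what you get.

```python
from omegaconf import OmegaConf

# Illustrative only: later configs win the merge, like the experiment config
# overriding the algo config.
algo_cfg = OmegaConf.create({"algo": {"replay_ratio": 0.5}})
exp_cfg = OmegaConf.create({"algo": {"replay_ratio": 1.0}})

merged = OmegaConf.merge(algo_cfg, exp_cfg)
print(merged.algo.replay_ratio)  # 1.0 -> the experiment config value wins
```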

Disastorm (Author) commented

OK, cool, thanks. I guess that explains all the changes.

belerico added the question (Further information is requested) label on Apr 25, 2024
Disastorm (Author) commented Apr 26, 2024

@belerico *edit: made a new issue, #273

Disastorm (Author) commented

[image attachment]

Disastorm (Author) commented
made a new issue
