
the dreamer v3 changes? #269

Closed
Disastorm opened this issue Apr 24, 2024 · 7 comments
Labels: question (Further information is requested)

Comments


Disastorm commented Apr 24, 2024

Hey I noticed you guys updated your dreamer v3.

What exactly is the difference now? I see, for example, that there are no longer any "updates before training" steps, and I remember there used to be a setting related to the memory buffer. Is that still there, or does it now just automatically always use/not use the buffer?

I think some of the code related to exploration decay is gone as well? I think maybe it never worked properly anyway, but is that completely unsupported now?

Lastly, perhaps one of the biggest differences (unless I'm mistaken) is that each step seems to be a lot slower than before, at least in terms of throughput. I'm not sure about the actual performance over time, but in terms of step count over time it seems way slower than before, potentially somewhere between 5-10 times slower. Am I understanding this right, or did I mess something up?

belerico (Member) commented Apr 24, 2024

Hi @Disastorm, we have updated quite a lot of things in Dreamer-V3.

> I see, for example, that there are no longer any "updates before training" steps

This has been replaced with the more general replay ratio, i.e. the number of gradient steps per environment interaction (without considering the action repeat), as specified in #223.
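For intuition, here is a minimal sketch of the bookkeeping a replay ratio implies (illustrative only, not SheepRL's actual code; the names and numbers are assumptions):

```python
# Illustrative sketch: replay ratio = gradient steps per policy (environment) step.
replay_ratio = 0.5  # assumption: one gradient step every two policy steps

def gradient_steps_owed(policy_steps_done: int, gradient_steps_done: int) -> int:
    """How many gradient steps are still owed to keep the configured ratio."""
    target = int(policy_steps_done * replay_ratio)
    return max(0, target - gradient_steps_done)

# After 1000 policy steps with no training done yet, 500 gradient steps are owed.
print(gradient_steps_owed(1000, 0))  # 500
```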

> I think some of the code related to exploration decay is gone as well?

Yep, there's no need for exploration decay since it was never used. Exploration comes from the entropy of the policy, which encourages it to be explorative, and it is controlled by its weight cfg.algo.actor.ent_coef.
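A minimal sketch of how an entropy bonus keeps the policy explorative (simplified and illustrative, not SheepRL's actual loss; `ent_coef` plays the role of `cfg.algo.actor.ent_coef`):

```python
import torch
from torch.distributions import Categorical

# Illustrative actor loss with an entropy bonus (simplified).
def actor_loss(logits, actions, advantages, ent_coef):
    dist = Categorical(logits=logits)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()  # simplified policy-gradient term
    entropy = dist.entropy().mean()                          # higher entropy -> more exploration
    return pg_loss - ent_coef * entropy                      # ent_coef weighs the exploration bonus
```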

> Lastly, perhaps one of the biggest differences (unless I'm mistaken) is that each step seems to be a lot slower than before, at least in terms of throughput

Yes, the slowdown comes from the fact that, to maintain the replay ratio specified in the config, the agent may perform more gradient steps per update. If you want to be faster and less sample-efficient, you should lower the replay ratio.
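For a rough sense of scale (illustrative numbers): over 100k policy steps, a replay ratio of 1 implies on the order of 100k gradient steps, while a ratio of 0.1 implies roughly 10k, so lowering the ratio directly cuts the training compute per environment step at the cost of sample efficiency.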

We have also modified the signal from which the continue-predictor learns: it now learns only from the terminated flag instead of from both the terminated and truncated flags. In this way it can bootstrap the values better.
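As a rough sketch of the distinction (illustrative only, not the actual SheepRL code): the continue target is built from `terminated` alone, so a time-limit truncation does not zero out the continuation signal.

```python
import torch

# Illustrative: the continue-predictor target ignores truncation.
# terminated = episode truly ended (no future value); truncated = time limit hit.
def continue_target(terminated: torch.Tensor) -> torch.Tensor:
    # 1 where the episode continues, 0 only where it truly terminated,
    # so truncated steps can still bootstrap from the next state's value.
    return 1.0 - terminated.float()
```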

We also fixed the distribution used for the continuous actions.

You can check everything we have modified in the latest release notes.

Given all these changes, I suggest you retrain the agent and see if you get better results, which is what I'm expecting.
If you have any issues please tell us.
Thank you

Disastorm (Author) commented Apr 24, 2024

Thanks. What about the buffer setting?
Previously, if buffer.checkpoint = True, it would use a memory buffer to resume training; if it was false, it did that "updates before training" thing instead.

How does it work now when resuming training? Is it still the same?

Also, I was having some problems and noticed that the algo config doesn't seem to override the exp config for the replay_ratio variable. Is that the correct order?

michele-milesi (Member) commented

Yes, it is the same: the agent performs an initial environment interaction (equal to learning_starts) and then starts training. The updates_before_training has been replaced by the replay ratio (its state is saved in the checkpoint, so training will resume at the correct step).

learning_starts += start_step
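A rough sketch of what that offset means when resuming from a checkpoint (hypothetical names and values, not the actual code):

```python
# Illustrative only: when resuming, the step counter saved in the checkpoint
# shifts learning_starts forward, so the pre-training environment interaction
# is not repeated from scratch.
checkpoint = {"global_step": 50_000}    # hypothetical checkpoint content
learning_starts = 1_024                 # env-interaction steps before training
start_step = checkpoint["global_step"]  # step at which the run resumes
learning_starts += start_step           # training restarts at the correct step
```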

For the second question: yes, that is the correct behaviour. The experiment configuration is used to compose all the other configurations. If you want to overwrite some value, you can do so directly in the experiment configuration: anything defined in the experiment config will overwrite the value defined elsewhere.
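To illustrate the precedence (a minimal OmegaConf sketch, not how sheeprl actually composes its configs; keys and values are made up for the example): values merged later win, so whatever the experiment config sets for replay_ratio is what you get.

```python
from omegaconf import OmegaConf

# Illustrative only: later configs win the merge, like the experiment config
# overriding the algo config.
algo_cfg = OmegaConf.create({"algo": {"replay_ratio": 0.5}})
exp_cfg = OmegaConf.create({"algo": {"replay_ratio": 1.0}})

merged = OmegaConf.merge(algo_cfg, exp_cfg)
print(merged.algo.replay_ratio)  # 1.0 -> the experiment config value wins
```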

Disastorm (Author) commented

OK, cool, thanks. I guess that explains all the changes.

belerico added the question (Further information is requested) label on Apr 25, 2024
Disastorm (Author) commented Apr 26, 2024

@belerico *edit: made a new issue, #273

Disastorm (Author) commented

[image attachment]

Disastorm (Author) commented
made a new issue
