the dreamer v3 changes? #269
Comments
Hi @Disastorm, we have updated quite a lot of things in Dreamer-V3.
This has been replaced with the general replay ratio, i.e. the number of gradient steps per environment interaction (without considering the action repeat), as specified in #223
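As a rough illustration of how a replay ratio can translate into a gradient-step budget, here is a minimal sketch. The function name and the bookkeeping are illustrative, not sheeprl's actual internals.

```python
# Hypothetical sketch: keep total gradient steps ~ replay_ratio * policy steps.
# Names (gradient_steps_for, trained_so_far) are illustrative only.

def gradient_steps_for(policy_steps_done: int, replay_ratio: float, trained_so_far: int) -> int:
    """Return how many gradient steps are still owed so that
    total_gradient_steps ~= replay_ratio * policy_steps_done."""
    target = int(replay_ratio * policy_steps_done)
    return max(0, target - trained_so_far)

trained = 0
for policy_step in range(1, 11):
    n = gradient_steps_for(policy_step, replay_ratio=0.5, trained_so_far=trained)
    trained += n

# With replay_ratio=0.5, 10 policy steps yield 5 gradient steps in total.
print(trained)  # 5
```

A higher replay ratio means more gradient steps per environment interaction, i.e. more sample-efficient but slower wall-clock training.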
Yep, there's no need for exploration decay since it was never used. The exploration comes from the entropy of the policy, which encourages it to be explorative, and it's controlled by its weight.
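To make the entropy idea concrete, here is a toy sketch of an actor loss with an entropy bonus. The coefficient name and the loss shape are illustrative, not sheeprl's actual code; the point is only that higher-entropy (more exploratory) policies are rewarded, so no separate decay schedule is needed.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def actor_loss(advantage_term, probs, entropy_coef=3e-4):
    # The entropy bonus lowers the loss for more exploratory policies;
    # entropy_coef plays the role of the weight mentioned above.
    return advantage_term - entropy_coef * entropy(probs)

peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

# The uniform (max-entropy) policy yields the lower loss.
assert actor_loss(0.0, uniform) < actor_loss(0.0, peaked)
```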
Yes, the slowness comes from the fact that, to maintain the replay ratio specified in the config, the agent may perform more gradient steps per update. If you want to be faster and less sample-efficient, you should lower the replay ratio. We have also modified the signal from which the [...]. We also fixed the distribution for the continuous actions. You can check everything we have modified in the last release notes. Given all these changes, I suggest you retrain the agent and see whether you get better results, which I'm expecting.
Thanks! What about the buffer setting? How does it work now when resuming training? Is it still the same? Also, I was having some problems and noticed that the algo config doesn't seem to override the exp config for the replay_ratio variable. Is that the correct order?
Yes, it is the same: the agent performs an initial environment interaction (equal to the learning starts) and then starts training.
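The warm-up behaviour described above can be sketched as follows. The names (`learning_starts`, `act`, `should_train`) are illustrative placeholders, not sheeprl's actual API.

```python
# Sketch: before `learning_starts` steps have been collected, act randomly
# to fill the buffer; only afterwards start training.

learning_starts = 100

def act(step: int, policy_action, random_action):
    """Use random actions until the warm-up phase is over."""
    return random_action if step < learning_starts else policy_action

def should_train(step: int) -> bool:
    """Training only begins once the initial interaction is done."""
    return step >= learning_starts

print(act(10, "policy", "random"))   # random
print(act(200, "policy", "random"))  # policy
```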
For the second question: yes, this is the correct behaviour. The experiment configuration is used to compose all other configurations. If you want to overwrite some value, you can do so directly in the experiment configuration: anything defined there overwrites the value defined elsewhere.
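The precedence rule can be illustrated with a generic "last config wins" deep merge. This mimics the behaviour described above; it is not sheeprl's actual Hydra composition machinery, and the config keys are made up for the example.

```python
# Generic illustration: the experiment config is applied last, so its
# values overwrite those from the algo config, while untouched keys survive.

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`, with `override` winning."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

algo_cfg = {"algo": {"replay_ratio": 1.0, "learning_starts": 1024}}
exp_cfg = {"algo": {"replay_ratio": 0.5}}  # experiment config wins

final = deep_merge(algo_cfg, exp_cfg)
print(final["algo"]["replay_ratio"])    # 0.5
print(final["algo"]["learning_starts"]) # 1024
```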
Ok, cool, thanks. I guess that explains all the changes.
made a new issue |
Hey, I noticed you guys updated your Dreamer-V3.
What exactly is the difference now? I see, for example, that there are no longer any "updates before training" steps, and I remember there used to be a setting related to the memory buffer. Is that still there, or does it now automatically always use/not use the buffer?
I think some of the code related to exploration decay is gone as well? I think it may not have worked properly originally anyway, but is it completely unsupported now?
Lastly, perhaps one of the biggest differences (unless I'm mistaken) is that each step seems to be a lot slower than before, at least in terms of quantity. I'm not sure about the actual performance over time, but in terms of step count over time it seems way slower than before, potentially somewhere between 5 and 10 times slower. Am I understanding this right, or did I mess something up?