[RLlib] DreamerV3: Main algo code and required changes to some RLlib APIs (RolloutWorker). (#35386)
Showing 50 changed files with 3,149 additions and 379 deletions.
@@ -0,0 +1,27 @@
# DreamerV3
Implementation (TensorFlow/Keras) of the "DreamerV3" model-based reinforcement learning
(RL) algorithm by D. Hafner et al., 2023.

DreamerV3 trains a world model in supervised fashion using real environment
interactions. The world model utilizes a recurrent GRU-based architecture
("recurrent state space model" or RSSM) and uses it to predict rewards,
episode continuation flags, as well as observations.
With these predictions (dreams) made by the world model, both actor
and critic are trained in classic REINFORCE fashion. In other words, the
actual RL components of the model are never trained on actual environment data,
but on dreamed trajectories only.
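
The short sketch below illustrates the REINFORCE-style actor update on dreamed trajectories in isolation. It is a conceptual example only (plain TensorFlow, toy dimensions, and a hypothetical `dream_rollout` stand-in for the world model), not the RLlib implementation added in this commit:

```python
# Conceptual sketch: REINFORCE-style actor update on "dreamed" trajectories.
# All names (dream_rollout, sizes, horizon) are illustrative, not RLlib's API.
import tensorflow as tf

NUM_ACTIONS, LATENT_DIM, HORIZON, BATCH = 4, 16, 15, 8

# Tiny actor: maps a (dreamed) latent state to action logits.
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_ACTIONS),
])

def dream_rollout(batch, horizon):
    """Stand-in for the world model: returns random latent states and value
    targets. In DreamerV3, these would come from RSSM dreams, not the real env."""
    states = tf.random.normal([horizon, batch, LATENT_DIM])
    value_targets = tf.random.normal([horizon, batch])
    return states, value_targets

states, value_targets = dream_rollout(BATCH, HORIZON)

with tf.GradientTape() as tape:
    logits = actor(tf.reshape(states, [-1, LATENT_DIM]))          # [H*B, A]
    actions = tf.random.categorical(logits, num_samples=1)[:, 0]  # sample actions
    log_probs = tf.nn.log_softmax(logits)
    logp_taken = tf.gather(log_probs, actions, batch_dims=1)      # log pi(a|s)
    # REINFORCE: push up log-probs of actions, weighted by dreamed value targets.
    actor_loss = -tf.reduce_mean(logp_taken * tf.reshape(value_targets, [-1]))

grads = tape.gradient(actor_loss, actor.trainable_variables)
tf.keras.optimizers.Adam(1e-4).apply_gradients(zip(grads, actor.trainable_variables))
```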

For more algorithm details, see:

[1] Mastering Diverse Domains through World Models - 2023
D. Hafner, J. Pasukonis, J. Ba, T. Lillicrap
https://arxiv.org/pdf/2301.04104v1.pdf

.. and the "DreamerV2" paper:

[2] Mastering Atari with Discrete World Models - 2021
D. Hafner, T. Lillicrap, M. Norouzi, J. Ba
https://arxiv.org/pdf/2010.02193.pdf

## Results
TODO
@@ -0,0 +1,15 @@
""" | ||
[1] Mastering Diverse Domains through World Models - 2023 | ||
D. Hafner, J. Pasukonis, J. Ba, T. Lillicrap | ||
https://arxiv.org/pdf/2301.04104v1.pdf | ||
[2] Mastering Atari with Discrete World Models - 2021 | ||
D. Hafner, T. Lillicrap, M. Norouzi, J. Ba | ||
https://arxiv.org/pdf/2010.02193.pdf | ||
""" | ||
from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3, DreamerV3Config | ||
|
||
__all__ = [ | ||
"DreamerV3", | ||
"DreamerV3Config", | ||
] |
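
As a hedged usage sketch of the exported classes (not part of this commit; it assumes the generic RLlib `AlgorithmConfig`/`Algorithm` API, and the environment id is an arbitrary example):

```python
# Minimal usage sketch. environment(), build(), and train() are the standard
# RLlib AlgorithmConfig/Algorithm methods; the env id is an arbitrary choice.
from ray.rllib.algorithms.dreamerv3 import DreamerV3Config

config = (
    DreamerV3Config()
    .environment("CartPole-v1")   # any Gymnasium env id
)
algo = config.build()             # construct the DreamerV3 Algorithm
print(algo.train())               # one training iteration; returns a result dict
```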