[RLlib] Checkpointing enhancements: Experimentally support msgpack and separate state from architecture. #49497
Conversation
LGTM.
@@ -3459,6 +3460,7 @@ def rl_module(
    def experimental(
        self,
        *,
        _use_msgpack_checkpoints: Optional[bool] = NotProvided,
Nice!
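For context, a minimal hedged sketch of how this new experimental flag could be used (the PPO/CartPole choice and the save_to_path call are assumptions for illustration, not taken from this PR's diff):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Hedged sketch: enable the experimental msgpack checkpoint format via the
# new config flag added in this PR (algo/env choices here are assumptions).
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .experimental(_use_msgpack_checkpoints=True)
)
algo = config.build()
algo.train()
# On the new API stack, this should write msgpack- instead of pickle-based
# state files (the exact output layout is an assumption).
checkpoint_dir = algo.save_to_path()
```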
- just for testing purposes, restores the entire algorithm from the latest
  checkpoint and checks whether the state of the restored algo exactly matches the
  state of the previously saved one.
- then changes the original config used (learning rate and other settings) and
Awesome example!!
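A rough, hedged sketch of the restore-and-compare step described in those bullet points (the names checkpoint_dir and algo refer to a previously saved algorithm and are assumptions, not the example script's exact code; check is RLlib's test utility):

```python
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.utils.test_utils import check

# Restore the entire algorithm from the latest checkpoint ...
restored_algo = Algorithm.from_checkpoint(checkpoint_dir)
# ... and verify its state exactly matches the previously saved algo's state.
check(restored_algo.get_state(), algo.get_state())
restored_algo.stop()
```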
f"p{aid}": PolicySpec( | ||
config=AlgorithmConfig.overrides( | ||
lr=5e-5 | ||
* (aid + 1), # agent 1 has double the learning rate as 0. |
What if we have more than 2 agents?
Good catch. I think this example then breaks :( . I'll just force the setting to always be 2 in this particular example script and otherwise produce an error. :)
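For illustration, that guard could look roughly like this (the args.num_agents name is an assumption about the example script's CLI):

```python
# Hedged sketch: fail fast if the example is run with anything but 2 agents,
# since the per-agent learning-rate setup above assumes exactly 2.
if args.num_agents != 2:
    raise ValueError("This example script only supports exactly 2 agents.")
```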
test_eval_results = test_algo.evaluate()
assert (
    test_eval_results[ENV_RUNNER_RESULTS][EPISODE_RETURN_MEAN]
    >= args.stop_reward_first_config
Dumb question: Does this inequality always hold? We train for mean rewards, so theoretically there might be cases where this inequality does not hold, right?
True, but statistically, I think the chance is super low. We also evaluate, meaning we use greedy actions, which perform much more strongly than the stochastic ones used during training.
)


class LoadP0OnAlgoInitCallback(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm, **kwargs):
Alright, so users should always load certain modules via a callback? If so, my suggestion would be to provide a callback that does this for users.
I think this is the most expressive and transparent way, yes.
- Working on another PR that allows passing a simple, single on_algorithm_init callback lambda to the config. This way, users don't have to provide these clumsy classes anymore (they still can, but don't have to).
- This also avoids having these paths in the module specs. Imo, they don't belong in there.
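A hedged sketch of what such an on_algorithm_init callback could look like (the checkpoint path, module id "p0", and the exact component string passed to restore_from_path are assumptions, not this PR's exact example code):

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class LoadP0OnAlgoInitCallback(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Restore only the "p0" RLModule's state from a previously written
        # checkpoint; all other modules keep their fresh initialization.
        algorithm.restore_from_path(
            "/tmp/p0_checkpoint",  # hypothetical checkpoint location
            component="learner_group/learner/rl_module/p0",  # assumed component path
        )
```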
policies_to_train=all_pols,
)
expanded_config.rl_module(
    algorithm_config_overrides_per_module={
Nice!
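For readers unfamiliar with this option, a minimal hedged sketch of per-module config overrides (module ids "p0"/"p1" and the learning-rate values are made up; the PR's example builds its own expanded_config):

```python
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.algorithms.ppo import PPOConfig

# Hedged sketch: give each RLModule its own learning rate via per-module
# AlgorithmConfig overrides, as done in the example above.
config = PPOConfig().rl_module(
    algorithm_config_overrides_per_module={
        "p0": AlgorithmConfig.overrides(lr=5e-5),
        "p1": AlgorithmConfig.overrides(lr=1e-4),
    }
)
```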
Checkpointing API enhancements:
- Experimentally support msgpack checkpoints on the new API stack.
- Separate state (get_state -> dict) from architecture (get_ctor_args_and_kwargs).
- Allow restoring via .from_checkpoint([old msgpack checkpoint path]).
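To make the last bullet concrete, a hedged sketch of restoring from a previously written msgpack checkpoint (the path is a placeholder; whether extra ctor args must be passed is an assumption):

```python
from ray.rllib.algorithms.algorithm import Algorithm

# Hedged sketch: restore an Algorithm from an (older) msgpack-based checkpoint.
# Because state is kept separate from architecture, the architecture is rebuilt
# from the stored ctor args/kwargs and only the msgpack state is loaded into it.
algo = Algorithm.from_checkpoint("/tmp/old_msgpack_checkpoint")  # placeholder path
results = algo.train()
```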
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've introduced a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.