[RLlib] `Checkpointing` enhancements: Experimentally support `msgpack` and separate state from architecture. #49497

sven1977 · 2024-12-30T12:19:58Z

Checkpointing API enhancements:

Experimentally support msgpack checkpoints on the new API stack.
Separation of state (get_state -> dict) from architecture (get_ctor_args_and_kwargs).
- This allows restoring from any checkpoint, including older ones created with older Python versions.
- Users need to bring their own config (updated for the current Ray version) or other c'tor args and kwargs, and then call the normal .from_checkpoint([old msgpack checkpoint path]).
Added comprehensive backward compatibility tests for checkpoints. At each Ray version (starting from 2.40), we generate a complex multi-agent checkpoint using the same script, add it to the repo, and then verify in CI that all these checkpoints keep working going forward, without ever touching them again.
Added an example script showing how to continue training with a different config, using all of the above new features.
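
To illustrate the state/architecture split described above, here is a minimal toy sketch. It is not RLlib's implementation: `ToyModule`, its `save_to_path` helper, and the exact `from_checkpoint` signature are hypothetical, and stdlib `json` stands in for msgpack so the snippet has no dependencies. The checkpoint holds only plain state, and the caller supplies the (possibly updated) constructor arguments when restoring.

```python
import json
import tempfile
from pathlib import Path


class ToyModule:
    """Hypothetical stand-in for a checkpointable component (not an RLlib class)."""

    def __init__(self, hidden_size, lr=0.001):
        self.hidden_size = hidden_size
        self.lr = lr
        self.weights = [0.0] * hidden_size

    def get_ctor_args_and_kwargs(self):
        # Architecture: everything needed to re-create the object.
        return (self.hidden_size,), {"lr": self.lr}

    def get_state(self):
        # State: plain, serializable data only (no classes, no config objects).
        return {"weights": list(self.weights)}

    def set_state(self, state):
        self.weights = list(state["weights"])

    def save_to_path(self, path):
        # Only the state is written; the architecture is NOT part of the file.
        Path(path).mkdir(parents=True, exist_ok=True)
        (Path(path) / "state.json").write_text(json.dumps(self.get_state()))

    @classmethod
    def from_checkpoint(cls, path, *ctor_args, **ctor_kwargs):
        # The caller brings the current c'tor args; the checkpoint only
        # contributes the state.
        obj = cls(*ctor_args, **ctor_kwargs)
        obj.set_state(json.loads((Path(path) / "state.json").read_text()))
        return obj


old = ToyModule(4, lr=0.001)
old.weights = [0.1, 0.2, 0.3, 0.4]

with tempfile.TemporaryDirectory() as tmp:
    old.save_to_path(tmp)
    # Restore with an updated setting (a different learning rate).
    new = ToyModule.from_checkpoint(tmp, 4, lr=0.0005)

restored_weights = new.weights
restored_lr = new.lr
```

Because the file contains nothing but plain data, the restored object keeps the saved weights while picking up whatever constructor settings the caller passes in.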

Why are these changes needed?

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>


simonsays1980

LGTM.

simonsays1980 · 2024-12-31T12:24:52Z

rllib/algorithms/algorithm_config.py

@@ -3459,6 +3460,7 @@ def rl_module(
    def experimental(
        self,
        *,
+        _use_msgpack_checkpoints: Optional[bool] = NotProvided,


simonsays1980 · 2024-12-31T12:30:19Z

rllib/examples/checkpoints/change_config_during_training.py

+    - just for testing purposes, restores the entire algorithm from the latest
+    checkpoint and checks, whether the state of the restored algo exactly match the
+    state of the previously saved one.
+    - then changes the original config used (learning rate and other settings) and


Awesome example!!

simonsays1980 · 2024-12-31T12:32:09Z

rllib/examples/checkpoints/change_config_during_training.py

+                f"p{aid}": PolicySpec(
+                    config=AlgorithmConfig.overrides(
+                        lr=5e-5
+                        * (aid + 1),  # agent 1 has double the learning rate as 0.


What if we have more than 2 agents?

Good catch. I think this example then breaks :( . I'll just force the setting to always be 2 in this particular example script, and raise an error otherwise. :)
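
A minimal sketch of the guard discussed here, assuming a hypothetical helper name (`per_agent_learning_rates`) and a plain dict in place of the real `PolicySpec` objects; the actual example script may enforce this differently:

```python
def per_agent_learning_rates(num_agents, base_lr=5e-5):
    """Return a learning rate per policy ID; only the 2-agent case is supported."""
    if num_agents != 2:
        # Hypothetical error message; the real script may word this differently.
        raise ValueError(
            f"This example only supports exactly 2 agents, got {num_agents}."
        )
    # Agent 1 gets double the learning rate of agent 0.
    return {f"p{aid}": base_lr * (aid + 1) for aid in range(num_agents)}


rates = per_agent_learning_rates(2)
```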

simonsays1980 · 2024-12-31T12:39:01Z

rllib/examples/checkpoints/change_config_during_training.py

+    test_eval_results = test_algo.evaluate()
+    assert (
+        test_eval_results[ENV_RUNNER_RESULTS][EPISODE_RETURN_MEAN]
+        >= args.stop_reward_first_config


Dumb question: Does this inequality always hold? We train for mean rewards, so theoretically there might be cases where this inequality does not hold, aren't there?

True, but statistically, I think the chance is super low. We also evaluate, meaning we use greedy actions, which perform much stronger than the stochastic ones used during training.

simonsays1980 · 2024-12-31T12:40:48Z

rllib/examples/checkpoints/restore_1_of_n_agents_from_checkpoint.py

    )
+
+    class LoadP0OnAlgoInitCallback(DefaultCallbacks):
+        def on_algorithm_init(self, *, algorithm, **kwargs):


Alright so users should always load certain modules via callback? If so my suggestion would be to provide a callback that does this for users.

I think this is the most expressive and transparent way, yes.

Working on another PR that allows passing a simple, single on_algorithm_init callback lambda to the config. This way, users don't have to provide these clumsy classes anymore (they still can, but don't have to).

This also avoids having these paths in the module specs. Imo, they don't belong in there.
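
The difference between the two styles can be sketched with a toy example. Everything here is hypothetical (`ToyAlgorithm`, its constructor arguments, and `LoadP0Callback` are not RLlib APIs, and the follow-up PR may look different); it only shows a callback class and a single callable doing the same work at init time:

```python
class ToyAlgorithm:
    """Hypothetical algorithm that fires an init hook (not an RLlib class)."""

    def __init__(self, callbacks_class=None, on_algorithm_init=None):
        self.loaded_modules = []
        # Class-based style: instantiate the class and call its hook method.
        if callbacks_class is not None:
            callbacks_class().on_algorithm_init(algorithm=self)
        # Callable style: a plain function or lambda is called directly.
        if on_algorithm_init is not None:
            on_algorithm_init(algorithm=self)


class LoadP0Callback:
    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Stand-in for "load module p0 from a checkpoint path".
        algorithm.loaded_modules.append("p0")


via_class = ToyAlgorithm(callbacks_class=LoadP0Callback)
via_lambda = ToyAlgorithm(
    on_algorithm_init=lambda algorithm, **kwargs: algorithm.loaded_modules.append("p0")
)
```

In both cases the loading logic lives in user code that runs at algorithm init, so no checkpoint path has to be stored in the module spec.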

simonsays1980 · 2024-12-31T12:48:13Z

rllib/utils/tests/test_checkpointable.py

+                    policies_to_train=all_pols,
+                )
+                expanded_config.rl_module(
+                    algorithm_config_overrides_per_module={



sven1977 added 2 commits December 29, 2024 11:06

wip

5401b6d

wip

6edef91

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested a review from simonsays1980 as a code owner December 30, 2024 12:19

sven1977 assigned simonsays1980 Dec 30, 2024

sven1977 added the rllib, rllib-checkpointing-or-recovery, and rllib-newstack labels Dec 30, 2024

sven1977 changed the title ~~[RLlib] Checkpointing enhancements: Experimentally support msgpack and separate of state from architecture.~~ [RLlib] Checkpointing enhancements: Experimentally support msgpack and separate state from architecture. Dec 30, 2024

sven1977 added 2 commits December 30, 2024 15:05

fix

0a3e79e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

896c1a2

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 enabled auto-merge (squash) December 30, 2024 18:22

github-actions bot added the go label Dec 30, 2024

sven1977 disabled auto-merge December 30, 2024 18:22

Merge branch 'master' of https://github.com/ray-project/ray into chec…

93b32af

…kpointing_enhancements_msgpack_and_separation_of_state_and_architecture

simonsays1980 approved these changes Dec 31, 2024


sven1977 added 2 commits December 31, 2024 14:19

Merge branch 'master' of https://github.com/ray-project/ray into chec…

515b914

…kpointing_enhancements_msgpack_and_separation_of_state_and_architecture

wip

197cebf

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 enabled auto-merge (squash) December 31, 2024 13:42

wip

29ebf2e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

github-actions bot disabled auto-merge December 31, 2024 14:52

sven1977 merged commit 7791d13 into ray-project:master Dec 31, 2024
5 checks passed

sven1977 mentioned this pull request Jan 1, 2025

[RLlib] Make msgpack checkpoints NOT contain config information (it's almost impossible to serialize). #40452

Closed

8 tasks

srinathk10 pushed a commit that referenced this pull request Jan 3, 2025

[RLlib] Checkpointing enhancements: Experimentally support `msgpack…

2b78a5d

…` and separate state from architecture. (#49497)
