[Tune] [PBT] [Doc] Add example PBT notebook #28519

justinvyu · 2022-09-14T21:56:17Z

Summary

Example notebook

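See [here](https://colab.research.google.com/github/justinvyu/ray/blob/pbt_example_notebook/doc/source/tune/examples/pbt_visualization/pbt_visualization.ipynb) for a Colab version of the notebook! Some images are missing there but will show up in the Ray docs once merged.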

The purpose of the example is to help new users understand what PBT is doing under the hood, and provide an example of using PBT with a function trainable. The notebook gives recommendations on how to set PBT-specific parameters such as perturbation_interval and how to perform checkpointing alongside PBT.
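To make the interplay between `perturbation_interval` and checkpointing concrete, here is a pure-Python sketch (not Ray code) that counts how many iterations of progress a trial throws away when it is exploited. It assumes checkpoints are written every `checkpoint_interval` iterations and that exploiting restores the most recent checkpoint; `checkpoint_interval` and both helpers are hypothetical names, and treating perturbations as happening at exact multiples of the interval is a simplification.

```python
from typing import List


def iterations_lost_on_exploit(perturbation_step: int, checkpoint_interval: int) -> int:
    # Hypothetical helper. Assumes checkpoints are written every `checkpoint_interval`
    # iterations and that an exploited trial restores the most recent one.
    last_checkpoint = (perturbation_step // checkpoint_interval) * checkpoint_interval
    return perturbation_step - last_checkpoint


def perturbation_steps(perturbation_interval: int, num_iterations: int) -> List[int]:
    # Simplification: PBT considers exploit/explore at multiples of the interval.
    return list(range(perturbation_interval, num_iterations + 1, perturbation_interval))


for checkpoint_interval in (4, 3):
    lost = [iterations_lost_on_exploit(t, checkpoint_interval) for t in perturbation_steps(4, 12)]
    print(f"checkpoint_interval={checkpoint_interval}: iterations lost per exploit = {lost}")
```

In this model, when the checkpoint interval divides the perturbation interval nothing is lost; otherwise the cloned trial restarts from a stale checkpoint.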

The example notebook replicates an experiment from the [original PBT paper](https://arxiv.org/pdf/1711.09846.pdf). The paper's toy example is a good way to verify that PBT behaves correctly (trials are exploited correctly and the correct checkpoints are used).
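For readers without the notebook at hand, the toy example can be sketched in plain Python. We assume the paper's setup is: true objective Q(θ) = 1.2 - (θ0² + θ1²), each worker doing gradient ascent on a surrogate Q̂(θ | h) = 1.2 - (h0·θ0² + h1·θ1²), and two workers starting at θ = [0.9, 0.9] with h = [1, 0] and h = [0, 1]. The learning rate, step counts, and the Ray-style explore rule (resample with probability 0.25, otherwise scale by 0.8 or 1.2) are choices for this sketch, not necessarily what the notebook uses.

```python
import random
from typing import List, Optional


def true_objective(theta: List[float]) -> float:
    # Q(theta) = 1.2 - (theta0^2 + theta1^2): what we actually want to maximize.
    return 1.2 - (theta[0] ** 2 + theta[1] ** 2)


class Worker:
    def __init__(self, theta: List[float], h: List[float]):
        self.theta = list(theta)
        self.h = list(h)  # hyperparameters that weight the surrogate objective

    def step(self, lr: float) -> None:
        # One gradient-ascent step on Q_hat(theta | h) = 1.2 - (h0*theta0^2 + h1*theta1^2),
        # whose gradient is dQ_hat/dtheta_i = -2 * h_i * theta_i.
        self.theta = [t - lr * 2 * hi * t for t, hi in zip(self.theta, self.h)]


def exploit_and_explore(best: Worker, worst: Worker, rng: random.Random,
                        resample_probability: float = 0.25) -> None:
    # Exploit: the worse worker clones the better worker's weights.
    worst.theta = list(best.theta)
    new_h = []
    for hi in best.h:
        # Explore (Ray-style rule, assumed for this sketch): resample from U(0, 1)
        # with some probability, otherwise scale the copied value by 0.8 or 1.2.
        if rng.random() < resample_probability:
            new_h.append(rng.uniform(0.0, 1.0))
        else:
            new_h.append(hi * rng.choice([0.8, 1.2]))
    worst.h = new_h


def run_toy(num_steps: int = 40, perturbation_interval: int = 4, lr: float = 0.05,
            use_pbt: bool = True, seed: Optional[int] = 0) -> float:
    rng = random.Random(seed)
    # Two workers, each initially able to shrink only one coordinate of theta.
    workers = [Worker([0.9, 0.9], [1.0, 0.0]), Worker([0.9, 0.9], [0.0, 1.0])]
    for step in range(1, num_steps + 1):
        for w in workers:
            w.step(lr)
        if use_pbt and step % perturbation_interval == 0:
            best, worst = sorted(workers, key=lambda w: true_objective(w.theta), reverse=True)
            exploit_and_explore(best, worst, rng)
    return max(true_objective(w.theta) for w in workers)


print("without PBT:", run_toy(use_pbt=False))
print("with PBT:   ", run_toy(use_pbt=True))
```

Without PBT, each worker keeps one coordinate at 0.9, so Q can never exceed 1.2 - 0.81 = 0.39; with PBT, exploit-and-explore gives the population a chance to combine progress on both coordinates.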

Notebook plots

Figure in paper

PBT logging enhancements

This PR also includes some quality-of-life improvements to PBT logging that make it easier to understand what is happening.

Example:

```python
from ray import tune

hyperparam_mutations = {
    "a": tune.uniform(0, 1),
    "b": list(range(5)),
    "c": {
        "d": tune.uniform(2, 3),
        "e": {"f": [-1, 0, 1]},
    },
}
```

The above hyperparameter mutations config results in the following log after the PBT explore step:

```
[PBT] [Explore] Perturbed the hyperparameter config of trial trial_1:
a : 0.5 --- (* 1.2) --> 0.6
b : 2 --- (shift right) --> 3
c :
    d : 2.5 --- (* 1.2) --> 3.0
    e :
        f : 0 --- (shift right) --> 1
```
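The per-key lines of this log can be reproduced with a small recursive formatter. This is a hypothetical re-implementation for illustration (`summarize_changes` is our name, not Ray's API); it assumes the old config, the new config, and the operations dict share the same nested keys, which it asserts, and it omits the `[PBT] [Explore]` header line.

```python
from typing import Any, Dict, List


def summarize_changes(old: Dict[str, Any], new: Dict[str, Any],
                      operations: Dict[str, Any], indent: int = 0) -> List[str]:
    """Render one line per perturbed hyperparameter, indenting nested configs."""
    # All three dicts are expected to describe the same (possibly nested) keys.
    assert old.keys() == new.keys() == operations.keys(), "mismatched config keys"
    pad = " " * indent
    lines: List[str] = []
    for key, op in operations.items():
        if isinstance(op, dict):
            # Nested search space: print the parent key, then recurse 4 spaces deeper.
            lines.append(f"{pad}{key} :")
            lines.extend(summarize_changes(old[key], new[key], op, indent + 4))
        else:
            lines.append(f"{pad}{key} : {old[key]} --- ({op}) --> {new[key]}")
    return lines


old = {"a": 0.5, "b": 2, "c": {"d": 2.5, "e": {"f": 0}}}
new = {"a": 0.6, "b": 3, "c": {"d": 3.0, "e": {"f": 1}}}
ops = {"a": "* 1.2", "b": "shift right", "c": {"d": "* 1.2", "e": {"f": "shift right"}}}
print("\n".join(summarize_changes(old, new, ops)))
```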

Some docs restructuring

I also took the chance to nest some Tune examples in sub-sections, since there were a lot of framework-specific examples overloading the table of contents.

Related PRs

This test helped uncover some bugs fixed in the following PRs:

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


xwjiang2010

Thanks so much!
This looks great! Only some nits.

xwjiang2010 · 2022-09-28T20:31:51Z

python/ray/tune/schedulers/pbt.py

```diff
    custom_explore_fn: Optional[Callable],
-) -> Dict:
+) -> Tuple[Dict, Dict]:
    """Return a config perturbed as specified.
```


What should be updated? The return type should be updated already.

The operations are also returned.
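For context on why the return type became `Tuple[Dict, Dict]`, here is a simplified, Ray-free sketch of an explore step that returns both the perturbed config and a parallel `operations` dict describing what happened to each key. `Uniform` stands in for `tune.uniform` and `perturb` is a hypothetical function; the constants (resample probability 0.25, factors 0.8/1.2) mirror PBT's defaults, but this is not the code in `pbt.py`.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Uniform:
    # Stand-in for a continuous search space such as tune.uniform(low, high).
    low: float
    high: float


def perturb(config: Dict[str, Any], mutations: Dict[str, Any], rng: random.Random,
            resample_probability: float = 0.25) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (new_config, operations); `operations` mirrors the nesting of `mutations`."""
    new_config = dict(config)
    operations: Dict[str, Any] = {}
    for key, space in mutations.items():
        if isinstance(space, dict):
            # Nested mutations: recurse and store the nested operations dict.
            new_config[key], operations[key] = perturb(
                config[key], space, rng, resample_probability)
        elif rng.random() < resample_probability:
            # Draw a fresh value from the search space.
            if isinstance(space, Uniform):
                new_config[key] = rng.uniform(space.low, space.high)
            else:
                new_config[key] = rng.choice(space)
            operations[key] = "resample"
        elif isinstance(space, Uniform):
            # Continuous: scale the current value up or down.
            factor = rng.choice([0.8, 1.2])
            new_config[key] = config[key] * factor
            operations[key] = f"* {factor}"
        else:
            # Categorical: move to a neighboring list entry, clamped at the ends.
            shift = rng.choice([-1, 1])
            old_idx = space.index(config[key])
            new_idx = min(max(old_idx + shift, 0), len(space) - 1)
            new_config[key] = space[new_idx]
            operations[key] = f"shift {'left' if shift == -1 else 'right'}"
            if new_idx == old_idx:
                operations[key] += " (noop)"
    return new_config, operations


mutations = {"a": Uniform(0, 1), "b": list(range(5)),
             "c": {"d": Uniform(2, 3), "e": {"f": [-1, 0, 1]}}}
config = {"a": 0.5, "b": 2, "c": {"d": 2.5, "e": {"f": 0}}}
new_config, operations = perturb(config, mutations, random.Random(0))
print(new_config)
print(operations)
```

Returning `operations` alongside the config is what lets a logger print how each value was obtained instead of only the before/after values.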

xwjiang2010 · 2022-09-28T20:42:50Z

doc/source/tune/examples/other-examples.rst

```diff
+  Example of using a Trainable function with HyperBandScheduler.
+  Also uses the AsyncHyperBandScheduler.
+- :doc:`/tune/examples/pbt_visualization/pbt_visualization`:
+  Configuring and running PBT and understanding the underlying algorithm behavior with a simple example.
```


Should we mention that this illustrates synchronous PBT?


krfricke

Amazing tutorial and changes! Just two minor nits.

cc @maxpumperla for docs approval (incl. structure)

krfricke · 2022-09-30T09:44:56Z

python/ray/tune/schedulers/pbt.py

```diff
-                new_config[key] = distribution[
-                    min(len(distribution) - 1, distribution.index(config[key]) + 1)
-                ]
+                shift = random.choice([-1, 1])
```


Can we add a comment here that explains what we're doing?

Generally, let's add a few comments for this whole exploration block.

krfricke · 2022-09-30T09:45:34Z

python/ray/tune/schedulers/pbt.py

```diff
+                new_idx = distribution.index(config[key]) + shift
+                new_idx = min(max(new_idx, 0), len(distribution) - 1)
+                new_config[key] = distribution[new_idx]
+                operations[key] = f"shift {'left' if shift == -1 else 'right'}"
```


Should this be "shift left (noop)" or similar if old_idx == new_idx? E.g., when we select shift = -1 while already at the first item.

Right, this should include some indicator that we've hit the end of the list.

A few thoughts on this actually:

We might want to just wrap around from the end if trying to shift left at the first item. The fact that we don't wrap implies an ordering of the items, but there is no such guarantee. Example:

"a": [100, 1, 50, 75]

75 <- (shift left & wrap around) -- 100 -- (shift right) -> 1

It seems arbitrary that 100 can only be perturbed to the right.

Should a noop be an option here rather than always shifting left and right? SHERPA's implementation includes a "no shift" option.

Hm, what does the original paper say?

Everything else seems to have an option not to shift, so I actually believe just going with [-1, 0, 1] should be fine.

Wrapping around is basically a matter of preference. Ideally we would distinguish between ordinal and categorical hyperparameters, but since we don't have that distinction, I'd say let's assume ordinal and leave the logic as is.

The original paper doesn't really specify how to perturb, so I think adding the no-shift option makes sense to align with other implementations.

I see, can leave this for a PR in the future.
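The options discussed in this thread (clamping vs. wrapping around, flagging a no-op, and allowing a zero shift) can be compared with a small standalone helper. `shift_categorical` is a hypothetical function for illustration, not the code that was merged; its clamped branch treats the list as ordinal, while the wrapped branch treats it as a ring.

```python
from typing import Any, List, Tuple


def shift_categorical(values: List[Any], current: Any, shift: int,
                      wrap: bool = False) -> Tuple[Any, str]:
    """Shift `current` by `shift` positions in `values`; return (new_value, operation)."""
    old_idx = values.index(current)
    if wrap:
        # Ring semantics: no element is stuck at an edge.
        new_idx = (old_idx + shift) % len(values)
    else:
        # Ordinal semantics: clamp at both ends of the list.
        new_idx = min(max(old_idx + shift, 0), len(values) - 1)
    if shift == 0:
        op = "no shift"
    else:
        op = f"shift {'left' if shift < 0 else 'right'}"
        if new_idx == old_idx:
            op += " (noop)"  # the shift hit the end of the list
    return values[new_idx], op


a = [100, 1, 50, 75]
print(shift_categorical(a, 100, -1))             # (100, 'shift left (noop)')
print(shift_categorical(a, 100, -1, wrap=True))  # (75, 'shift left')
print(shift_categorical(a, 100, 1))              # (1, 'shift right')
```

Drawing `shift` with `random.choice([-1, 0, 1])` instead of `[-1, 1]` would give the no-shift option mentioned above.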


Follow-up to #28519, fixing the hierarchy to match the hierarchy contained within the pages. Also moved MLFlow from ML Framework Examples to Experiment Tracking Examples Signed-off-by: Matthew Deng <matt@anyscale.com>

See [here](https://colab.research.google.com/github/justinvyu/ray/blob/pbt_example_notebook/doc/source/tune/examples/pbt_visualization/pbt_visualization.ipynb) for a Colab version of the notebook! Some images are missing that would show up in the Ray docs once merged. The purpose of the example is to help new users understand what PBT is doing under the hood, and provide an example of using PBT with a function trainable. The notebook gives recommendations on how to set PBT-specific parameters such as `perturbation_interval` and how to perform checkpointing alongside PBT. The example notebook replicates an experiment found in the [original PBT paper](https://arxiv.org/pdf/1711.09846.pdf). The toy example in the paper is a good way of verifying that PBT behavior is correct (trials are being correctly exploited and the correct checkpoints are being used). Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

Follow-up to ray-project#28519, fixing the hierarchy to match the hierarchy contained within the pages. Also moved MLFlow from ML Framework Examples to Experiment Tracking Examples Signed-off-by: Matthew Deng <matt@anyscale.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

justinvyu added 13 commits September 7, 2022 14:52

Add better PBT logging for exploit, explore

d4ffcde

Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Add missing logging logic from previous commit Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Simplify perturb logic

d227adc

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add operation tracking and logging for PBT perturbs

dfef665

Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Add missing resample operation log Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

[Debug] Temporary fix for PBT checkpoint setting and loading

de064ab

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add forced checkpoint logic for PBT

b07f4aa

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add example notebook walking through paper toy example

aff2bb4

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add animation, 4 trial experiment, and more explanations to notebook

ea54f45

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add 4 trial gif, separate make_animation function

421dd03

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add open in colab button

6077337

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add __init__.py to pbt_visualization doc module

8521074

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Merge branch 'master' of https://github.com/ray-project/ray into pbt_example_notebook

f08ccfe

Rerun 2 trial PBT to generate a better visual

21eac42

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Clean-up tune examples TOC into sub-sections + add PBT notebook into TOC

0bdd8c8

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

justinvyu added the tune (Tune-related issues) and docs (An issue or change related to documentation) labels Sep 14, 2022

justinvyu self-assigned this Sep 14, 2022

justinvyu requested review from richardliaw, krfricke, xwjiang2010, amogkam, matthewdeng, Yard1, maxpumperla and a team as code owners September 14, 2022 21:56

justinvyu marked this pull request as draft September 14, 2022 21:56

justinvyu added 2 commits September 27, 2022 11:02

Add test for PBT mutations logging + fix for empty `hyperparam_mutations`

9726837

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Fix some wording in the notebook

550fd44

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Yard1 reviewed Sep 27, 2022

python/ray/air/_internal/checkpoint_manager.py (outdated)

python/ray/tune/execution/checkpoint_manager.py (outdated)

justinvyu added 2 commits September 27, 2022 11:58

Merge branch 'master' of https://github.com/ray-project/ray into pbt_example_notebook

1aa30c3

Revert commits related to forced checkpoint

d5f6357

This reverts commit de064ab and b07f4aa.

justinvyu force-pushed the pbt_example_notebook branch from 94c5388 to d5f6357 September 27, 2022 19:00

Fix failing _exploit tests

b9763aa

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

justinvyu marked this pull request as ready for review September 28, 2022 18:45

Yard1 requested review from c21 and removed request for c21 September 28, 2022 19:00

xwjiang2010 approved these changes Sep 28, 2022


justinvyu added 3 commits September 29, 2022 15:02

Improve pbt example notebook explanations (mention async behavior), and make runnable via colab

03e6d17

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Clean up references to the pbt example notebook

f965de7

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Remove missing reference to utils file

df6940b

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

xwjiang2010 assigned krfricke and xwjiang2010 Sep 29, 2022

krfricke approved these changes Sep 30, 2022


justinvyu added 2 commits September 30, 2022 11:20

Improve documentation and typing hints + fix shift noop case

65ecf6d

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

Add assertion for matching keys in summarize_hyperparam_changes

755bebf

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

justinvyu requested a review from krfricke September 30, 2022 21:24

krfricke approved these changes Oct 4, 2022


richardliaw approved these changes Oct 4, 2022


krfricke merged commit da50ef4 into ray-project:master Oct 4, 2022

justinvyu mentioned this pull request Oct 5, 2022

[PB2] Fix broken PB2._get_new_config method override #29102

Merged

7 tasks

matthewdeng mentioned this pull request Nov 16, 2022

[docs][tune] fix examples hierarchy #30347

Merged

7 tasks


