-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Learner API Docs #37729
[RLlib] Learner API Docs #37729
Conversation
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
These docs are WIP right now. I still need to add some of the other sections. I'm almost through getting the doc snippets to run without errors. |
Signed-off-by: Avnish <avnishnarayan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make sure that everytime you use RLModule
, Learner
, and LearnerGroup
you are referencing the actual API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nits.
update? still targeting at 2.6.2? |
yep. |
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
yes |
Signed-off-by: Avnish <avnishnarayan@gmail.com>
:hide: | ||
:skipif: True | ||
|
||
from ray.rllib.algorithms.ppo.ppo import PPOConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc contains a lot of code that is not tested.
There exists a folder "doc/source/rllib/doc_code" that could hold a guide file to hold all this test code that actually gets executed there. You could put code there and grab code snippets from there to ensure these example snippets stay up to date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lets do that in a later iteration at this point since doc picks are due today.
But I agree that if thats the workaround then we'll go that way instead.
config.build() # test that the algorithm can be built with the given resources | ||
|
||
|
||
.. note:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this more like here?
- But the warning that this feature is in alpha and the supported algos up top
- Consolidate the information on how to enable it with the above information in section "Enabling Learner API in RLlib experiments".
I purposely disabled doc testing for code snippets in this PR at the request of @bveeramani We're having ray rpc issues when running the tests on the CI, but they pass locally. |
doc/source/rllib/rllib-learner.rst
Outdated
|
||
Adjust the amount of resources for training using the | ||
`num_gpus_per_learner_worker`, `num_cpus_per_learner_worker`, and `num_learner_workers` | ||
arguments in the `AlgorithmConfig`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link to the API refs page at least once for these things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
framework_hyperparameters=FrameworkHyperparameters(), | ||
) | ||
|
||
learner_group = LearnerGroup(learner_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would become more useful if it would include the necessary imports so it would become copy-pasteable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its going to add a lot of cognitive load to the guide. Maybe instead we put all of this in a file later then come back to it. We can do that when we update the examples.
doc/source/rllib/rllib-learner.rst
Outdated
learner_group.load_state(LEARNER_GROUP_CKPT_DIR) | ||
|
||
Checkpoint the state of all learners in the `LearnerGroup` via `save_state` and | ||
`load_state`. This includes all states including neural network weights and any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make clear what the difference is between get/set/save/load state?
For many of these sections, we just start off with a code example without giving context.
I think both sections "Getting and setting state" and "Checkpointing" should begin with something like:
"You can checkpoint the state of a LearnerGroup to a directory by calling
LearnerGroup.save_state()
. This will callLearnerGroup.get_state()
under the hood and therefore includes neural networks weights and optimizer states.
Loading a checkpoint works accordingly. Here is an example of how to save and load the state of a LearnerGroup:
.. testcode::
:skipif: True
learner_group.save_state(LEARNER_GROUP_CKPT_DIR)
learner_group.load_state(LEARNER_GROUP_CKPT_DIR)
Note that since the state of all of the
Learner
s is identical, only the states from the firstLearner
need to be saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Signed-off-by: Avnish <avnishnarayan@gmail.com>
doc/source/rllib/rllib-learner.rst
Outdated
|
||
Implementation | ||
============== | ||
`Learner` has many APIs for flexible implementation, however the core ones that you need to implement are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho we should clarify what this section deals with.
Is this about implementing a learner from scratch? Or for customization of an existing learner?
The heading should be something like "Customizing Learner" or "Implementing your own Learner" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't it be both? I feel like this might be semantics.
|
||
Implementation | ||
============== | ||
:py:class:`~ray.rllib.core.learner.learner.Learner` has many APIs for flexible implementation, however the core ones that you need to implement are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Learner class has a lot of abstract methods (13 or so).
I think we should clarify here that if you would want to only override the ones below, you'll have to subclass a framework specific implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what I was going for here is to describe the minimum functions that people need to implement or override for their custom learners, whether that be them overriding them or implementing a brand new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo this should read:
Writing or customizing Learners
If you want to write your own custom :py:class:
~ray.rllib.core.learner.learner.Learner
, you will want to subclass a framework-specific Learner. The following is a list of the most important methods that you'll want to implement. Similarely, these methods are likely what you'll want to override if you want to modify an existing Learner like TorchPPOLearner :
(...)
.. testcode:: | ||
:skipif: True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just do code-block::
. Ratched will only complain for code-block:: python
arguments in the :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`. | ||
|
||
.. testcode:: | ||
:hide: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this overindented?
|
||
.. testcode:: | ||
:hide: | ||
:skipif: True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, instead of labeling all blocks with :skipif: True
, we could add rllib-learner.rst
to the exclude list in the BUILD
file.
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com> Signed-off-by: NripeshN <nn2012@hw.ac.uk>
Signed-off-by: Avnish <avnishnarayan@gmail.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Signed-off-by: Avnish avnishnarayan@gmail.com
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.