[RLlib] Add Humanoid + SAC tuned_example script. #47279
LGTM. Please check the learning rates and my comment.
    .environment("Humanoid-v4")
    .training(
        initial_alpha=1.001,
        lr=0.00005,
I wonder how this can run with a learning rate that is not `None`? The new PR is merged and requires `lr=None` in the new stack. Instead, `actor_lr`, `critic_lr`, and `alpha_lr` can be used.
True, let me double check ...
Great catch! I must have still run it on the old master (before we merged that PR).
Fixed.
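For readers following the thread, here is a rough sketch of the kind of change discussed above: `lr` is left as `None` and the three per-component learning rates are set instead. The helper `validate_sac_lrs` is hypothetical and only mimics the rule described in the review comment; it is not RLlib code. The learning-rate values are placeholders, not the tuned values from this PR. The commented-out `SACConfig` lines assume Ray RLlib is installed and that `training()` accepts these keyword arguments as the reviewer describes.

```python
# Placeholder settings. Only `initial_alpha` comes from the diff above;
# the three learning rates are illustrative, NOT the values used in this PR.
training_kwargs = {
    "initial_alpha": 1.001,
    "lr": None,          # the review comment says the new stack requires lr=None
    "actor_lr": 3e-4,    # placeholder
    "critic_lr": 3e-4,   # placeholder
    "alpha_lr": 3e-4,    # placeholder
}


def validate_sac_lrs(kwargs):
    """Hypothetical check that mirrors the rule from the review comment.

    Raises ValueError if `lr` is set, or if any of the per-component
    learning rates is missing. Returns the kwargs unchanged otherwise.
    """
    if kwargs.get("lr") is not None:
        raise ValueError(
            "`lr` must be None; use `actor_lr`, `critic_lr`, and `alpha_lr`."
        )
    missing = [
        name
        for name in ("actor_lr", "critic_lr", "alpha_lr")
        if kwargs.get(name) is None
    ]
    if missing:
        raise ValueError(f"Missing learning rates: {missing}")
    return kwargs


validate_sac_lrs(training_kwargs)

# Assumed usage with RLlib (requires `ray[rllib]`; left commented out here):
# from ray.rllib.algorithms.sac import SACConfig
# config = (
#     SACConfig()
#     .environment("Humanoid-v4")
#     .training(**training_kwargs)
# )
```

Passing the original `lr=0.00005` from the diff through the same helper would raise a `ValueError`, which is the situation the reviewer pointed out.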
…sac_humanoid_tuned_example
Add Humanoid + SAC tuned_example script.
Not fully tuned yet, but this setup already reaches a return of up to 1200 (in 10h on a single GPU with SAC). It probably just needs to run longer for even better results.
Just wanted to lock this in already as a milestone.
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.