[RLlib] CQL change hparams and data reading strategy #27451
Conversation
Signed-off-by: avnish <avnish@anyscale.com>
https://buildkite.com/ray-project/release-tests-pr/builds/12452
Signed-off-by: Avnish <avnishnarayan@gmail.com>
buildkite/ray-builders-pr — Build #41560 failed (3 hours, 2 minutes, 32 seconds). The failing tests are unrelated; this is ready to merge.
num_gpus: 1
metrics_smoothing_episodes: 5
min_time_s_per_iteration: 30
Was this necessary? This is offline RL, so more iterations should have worked equally well here. If this is necessary, something doesn't make sense; if it isn't, we should remove this hparam and instead increase timesteps_total to be consistent with the other learning tests.
Wait, this parameter controls the logging frequency. I increased it so that we don't run too many unnecessary evaluations.
It's not just that; it also controls the training time spent per iteration. You keep running the same training_step() function until that timing requirement is met. In other words, you keep taking gradient updates until that much time has elapsed.
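Roughly, the iteration loop behaves like the sketch below. This is only an illustrative reading of what min_time_s_per_iteration does, not the actual Trainer code; the algo object and training_step() call stand in for RLlib internals:

```python
import time

MIN_TIME_S_PER_ITERATION = 30  # value from the tuned example above


def run_one_iteration(algo):
    # Illustrative only: keep calling training_step() (i.e., keep taking
    # gradient updates) until the minimum wall-clock time has elapsed.
    start = time.time()
    results = {}
    while time.time() - start < MIN_TIME_S_PER_ITERATION:
        results = algo.training_step()
    return results
```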
We chatted offline, and my concern was more about reproducibility. These nits are not merge blockers, so please merge if this is the only thing holding it back.
Will open a separate PR that addresses this issue across all release tests and environments.
Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Signed-off-by: avnish <avnish@anyscale.com>
Tweaking CQL the same way I tweaked pretty much every other offline algorithm so far.
Added the dataset reader for faster reading speeds.
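For context, switching a tuned example over to the dataset reader roughly amounts to a config like the sketch below; the data path is illustrative and the exact keys in this PR's config files may differ:

```python
# Hypothetical CQL config excerpt using RLlib's dataset-based offline reader.
config = {
    "env": "HalfCheetah-v3",
    # Read the offline data through Ray Datasets instead of the JSON sampler.
    "input": "dataset",
    "input_config": {
        "format": "json",
        "paths": "/tmp/halfcheetah/expert_data.json",  # illustrative path
    },
    "num_gpus": 1,
    "min_time_s_per_iteration": 30,
}
```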
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.