[RLlib] CQL change hparams and data reading strategy #27451
Conversation
Signed-off-by: avnish <avnish@anyscale.com>
https://buildkite.com/ray-project/release-tests-pr/builds/12452
Signed-off-by: Avnish <avnishnarayan@gmail.com>
buildkite/ray-builders-pr — Build #41560 failed (3 hours, 2 minutes, 32 seconds). The failing tests are unrelated; this is ready to merge.
num_gpus: 1
metrics_smoothing_episodes: 5
min_time_s_per_iteration: 30
Was this necessary? This is offline RL, so more iterations should have worked equally well here. If this is necessary, something doesn't make sense; if it isn't, we should remove this hparam and instead increase timesteps_total to be consistent with the other learning tests.
Wait, this parameter controls the logging frequency. I increased it so that we don't run too many unnecessary evaluations.
It's not just that; it also controls the training time spent per iteration. You keep running the same training_step() function until that timing requirement is met. In other words, you keep taking gradient updates until that much time has elapsed.
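Roughly, the iteration loop behaves like the sketch below. This is only an illustrative reading of what min_time_s_per_iteration does, not the actual Trainer code; the algo object and training_step() call stand in for RLlib internals:

```python
import time

MIN_TIME_S_PER_ITERATION = 30  # value from the tuned example above


def run_one_iteration(algo):
    # Illustrative only: keep calling training_step() (i.e., keep taking
    # gradient updates) until the minimum wall-clock time has elapsed.
    start = time.time()
    results = {}
    while time.time() - start < MIN_TIME_S_PER_ITERATION:
        results = algo.training_step()
    return results
```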
We chatted offline, and my concern was more about reproducibility. These nits are not merge blockers, so please merge if this is the only thing holding it back.
Will open a separate PR that addresses this issue across all release tests and environments.
Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Signed-off-by: avnish <avnish@anyscale.com>
Tweaking CQL the same way I tweaked pretty much every other offline algorithm so far.
Added the dataset reader for faster reading speeds.
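For context, switching a tuned example over to the dataset reader roughly amounts to a config like the sketch below; the data path is illustrative and the exact keys in this PR's config files may differ:

```python
# Hypothetical CQL config excerpt using RLlib's dataset-based offline reader.
config = {
    "env": "HalfCheetah-v3",
    # Read the offline data through Ray Datasets instead of the JSON sampler.
    "input": "dataset",
    "input_config": {
        "format": "json",
        "paths": "/tmp/halfcheetah/expert_data.json",  # illustrative path
    },
    "num_gpus": 1,
    "min_time_s_per_iteration": 30,
}
```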
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.