[RLlib] Enable eager_tracing=True by default. #36556

sven1977 · 2023-06-19T11:07:19Z

Enable eager_tracing=True by default.

Algorithms run with framework=="tf2" are considerably slower under the old default setting, eager_tracing=False. This PR switches the default from eager_tracing=False to eager_tracing=True, RLlib-wide.
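For anyone who still wants the untraced behavior (for example, to step through TF code in a debugger), a minimal sketch of opting out explicitly is shown here. It assumes a Ray 2.x install with the `AlgorithmConfig.framework()` builder; the choice of PPO and `CartPole-v1` is purely illustrative and not part of this PR.

```python
# Sketch only: PPO and CartPole-v1 are illustrative choices, not taken from this PR.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Explicitly request the old behavior; with this PR the default becomes True.
    .framework("tf2", eager_tracing=False)
)
```

Setting the flag explicitly, rather than relying on the default, keeps a debugging config stable across RLlib versions.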

Why are these changes needed?

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

ArturNiederfahrenhorst · 2023-06-20T05:09:05Z

rllib/algorithms/appo/tests/test_appo.py

+                self.assertGreaterEqual(coeff, 0.0001)
+            else:
+                self.assertLessEqual(coeff, 0.01)
+                self.assertGreaterEqual(coeff, 0.001)


Can we redesign this test a little bit?

It could be simpler for example by doing:

entropy_coeff_schedule=[[0, 0.1], [200, 0.001], [600, 0.0001]]

Also, _step_n_times() should be "step_until_n_steps_reached()".
We should then be able to reuse this with entropy coefficient tests for other algorithms if so desired.
The "~100 timesteps" thing can easily change per algorithm or when something else in the algorithm under test changes that has nothing to do with the coefficient schedule.

I need to fix this, yes. I think b/c eager tracing is much faster, the async sampling also runs faster in the background.
I will add a proper check here to make sure this test performs the right checks based on the actual timesteps sampled.
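A rough sketch of the check discussed in this thread: step until a target number of timesteps is reached, then derive the expected coefficient from the timesteps actually sampled instead of assuming "~100 timesteps" per iteration. Everything here is hypothetical and stdlib-only: `piecewise_linear`, `FakeAlgo`, and this version of `step_until_n_steps_reached` are stand-ins, and linear interpolation between schedule points (with the last value held afterwards) is an assumption about how the schedule is evaluated.

```python
from bisect import bisect_right


def piecewise_linear(schedule, t):
    """Linearly interpolate a [[timestep, value], ...] schedule at timestep t."""
    timesteps = [ts for ts, _ in schedule]
    if t <= timesteps[0]:
        return schedule[0][1]
    if t >= timesteps[-1]:
        return schedule[-1][1]  # assumption: hold the last value past the end
    i = bisect_right(timesteps, t)  # index of the first endpoint strictly after t
    (t0, v0), (t1, v1) = schedule[i - 1], schedule[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def step_until_n_steps_reached(train_fn, n):
    """Call train_fn() until it reports at least n sampled timesteps."""
    sampled = 0
    while sampled < n:
        sampled = train_fn()
    return sampled


class FakeAlgo:
    """Hypothetical stand-in for an algorithm; train() returns total timesteps."""

    def __init__(self, steps_per_iter):
        self.steps_per_iter = steps_per_iter
        self.timesteps = 0

    def train(self):
        self.timesteps += self.steps_per_iter
        return self.timesteps


schedule = [[0, 0.1], [200, 0.001], [600, 0.0001]]
algo = FakeAlgo(steps_per_iter=137)  # deliberately not a multiple of 100
reached = step_until_n_steps_reached(algo.train, 200)
expected_coeff = piecewise_linear(schedule, reached)
```

In a real test, `expected_coeff` would be compared (within a tolerance) against the coefficient reported by the policy, so the assertion no longer depends on how fast async sampling runs.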

ArturNiederfahrenhorst · 2023-06-20T05:11:35Z

rllib/tests/run_regression_tests.py

-            and not exp["config"].get("eager_tracing") is False
-        ):
-
-            exp["config"]["eager_tracing"] = True


Can we actually make this so that when local mode is True for the regression tests script (we use this only for debugging, right?), eager tracing will be False?

I think in almost all cases we are happy that eager tracing is disabled when using local mode here?

True, but I feel like then you should just set it in your config, no? It's not good to change stuff b/c we assume something w/o the user being in control. I have run into this issue several times while debugging, thinking that eager tracing was True (I wanted to debug a bug that only happened for eager_tracing=True), when it wasn't b/c I was also using local mode. It took me a while to find out that RLlib had sneakily changed my config :)
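To illustrate the point about not changing the user's config behind their back, here is a small sketch of one alternative: leave the config untouched and only warn when local mode is combined with eager tracing. The helper `resolve_eager_tracing` is hypothetical and is not part of run_regression_tests.py; the fallback of True mirrors the new default proposed in this PR.

```python
import warnings


def resolve_eager_tracing(config, local_mode=False):
    """Return the effective eager_tracing value without mutating config.

    Hypothetical helper: falls back to True (the default proposed in this PR)
    when the user did not set the key, and never overrides an explicit value.
    """
    value = config.get("eager_tracing", True)
    if local_mode and value:
        # Inform instead of silently flipping the setting.
        warnings.warn(
            "local_mode=True with eager_tracing=True: traced functions are "
            "harder to step through; set eager_tracing=False in your config "
            "if you want to debug eagerly."
        )
    return value


user_config = {"framework": "tf2", "eager_tracing": False}
effective = resolve_eager_tracing(user_config, local_mode=True)
```

The user stays in control: the only way to get eager_tracing=False is to ask for it.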


wip

f4d5375

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from gjoliver, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and krfricke as code owners June 19, 2023 11:07

sven1977 assigned ArturNiederfahrenhorst Jun 19, 2023

wip

8c0b5e8

Signed-off-by: sven1977 <svenmika1977@gmail.com>

ArturNiederfahrenhorst reviewed Jun 20, 2023


ArturNiederfahrenhorst approved these changes Jun 20, 2023


sven1977 added 4 commits June 20, 2023 11:51

wip

c14cb15

Signed-off-by: sven1977 <svenmika1977@gmail.com>

LINT

fe4b7c8

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

b3a99ba

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

4e154f4

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Jun 20, 2023

sven1977 added 3 commits June 20, 2023 15:42

wip

9d32a8d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

9656b4e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

f553341

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 merged commit a3ec4a9 into ray-project:master Jun 20, 2023

akshay-anyscale mentioned this pull request Jul 21, 2023

Add service deployment instructions to stable diffusion template #37645 (Closed)

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[RLlib] Enable eager_tracing=True by default. (ray-project#36556)

2b13925

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

sven1977 deleted the make_tf2_eager_tracing_true_by_default branch October 25, 2024 21:45
