[RLlib] Fix ParametricRecSys observations #28358
Conversation
@@ -143,9 +143,8 @@ def step(self, action):
        reward = 0.0
        if which_clicked < self.slate_size:
            # Reward is 1.0 - regret if clicked. 0.0 if not clicked.
Why did we remove this comment?
I felt that it was very obvious and did not need a comment!
Please, let's not remove comments that are already there! A lot of things make sense when one reads the code, but might be unclear when one just skims through things.
Also, we should explain here where the magic new 100.0 value comes from.
scores = [
    np.dot(self.current_user, doc) for doc in self.currently_suggested_docs
]
scores = softmax(
Can you add a comment on why this should be softmax'd?
The scores can be > 1. If we then select a score to calculate the regret and reward, the reward may become < 0.
Sorry for not being more verbose about this.
The way we calculate the reward here, it ends up at <= 1 instead of between 0 and 100, as specified in the observation space above.
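For context, a rough sketch of the logic being discussed (a sketch only, not the actual diff: the toy values below and the 100.0 scaling to match the Box(0, 100) response space are assumptions):

```python
import numpy as np
from scipy.special import softmax

# Toy stand-ins for self.current_user / self.currently_suggested_docs /
# which_clicked from the env snippet above (values are made up).
current_user = np.array([0.9, 1.2, -0.3])
currently_suggested_docs = [np.array([1.0, 1.0, 0.0]), np.array([0.2, 0.1, 0.5])]
which_clicked = 1

# Raw affinity scores are unbounded dot products, so they can exceed 1.0
# (here, the first one is 2.1). A reward of 1.0 - regret computed from raw
# scores could therefore drop below 0.
scores = [np.dot(current_user, doc) for doc in currently_suggested_docs]

# Softmax squashes the scores into (0, 1), so the regret (best score minus
# the clicked score) stays in [0, 1) and the reward stays non-negative.
scores = softmax(scores)

regret = np.max(scores) - scores[which_clicked]
# Reward is 1.0 - regret if clicked, 0.0 if not clicked (as in the removed
# comment). Scaling by 100.0 is an assumption here, to map the reward into
# the Box(0, 100) response space mentioned in this thread.
reward = (1.0 - regret) * 100.0
print(reward)
```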
Got it, could you add this explanation here? :)
Thanks @ArturNiederfahrenhorst.
Just two nits on better comments, then we can merge it.
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: ilee300a <ilee300@anyscale.com>
Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com
Why are these changes needed?
The way observations are constructed right now, they don't necessarily fall into the observation space.
Furthermore, they don't make use of the complete observation space.
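As an aside, a minimal way to sanity-check this property on any env instance (a sketch, not part of the PR; the helper below is hypothetical and assumes the classic gym reset()/step() API that RLlib used at the time):

```python
import gym  # RLlib envs at the time of this PR subclass gym.Env.


def check_observations(env: gym.Env, num_steps: int = 100) -> None:
    """Roll out random actions and assert that every observation lies
    inside the declared observation space (the property this PR fixes)."""
    obs = env.reset()
    assert env.observation_space.contains(obs), f"reset() obs out of space: {obs}"
    for _ in range(num_steps):
        obs, reward, done, info = env.step(env.action_space.sample())
        assert env.observation_space.contains(obs), f"step() obs out of space: {obs}"
        if done:
            obs = env.reset()
```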
Related issue number
#28231
Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.