Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Fix ParametricRecSys observations #28358

Merged

Conversation

ArturNiederfahrenhorst
Copy link
Contributor

Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com

Why are these changes needed?

The way observations are constructed right now, they don't necessarily fall into the observations space.
Furthermore, they don't make use of the complete observations space.

Related issue number

#28231

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
@@ -143,9 +143,8 @@ def step(self, action):

reward = 0.0
if which_clicked < self.slate_size:
# Reward is 1.0 - regret if clicked. 0.0 if not clicked.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt that it was very obvious and did not need a comment!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, let's not remove comments that are already there! A lot of things make sense when one reads the code, but might be unclear when one just skims through things.

Also, here, we should explain, where the magic new 100.0 value comes from.

scores = [
np.dot(self.current_user, doc) for doc in self.currently_suggested_docs
]
scores = softmax(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on why this should be softmax'd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scores can be > 1. If we then select a score to calculate the regret and reward, the reward may become < 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not being more verbose about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we calculate the reward here makes it so that the reward ends up somewhere <= 1, instead of between 0 and 100 like it is specified in the observation space above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, could you add this explanation here? :)

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArturNiederfahrenhorst .

Just two nits on better comments, then we can merge it.

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
@sven1977 sven1977 merged commit 4b6ef4c into ray-project:master Sep 12, 2022
ilee300a pushed a commit to ilee300a/ray that referenced this pull request Sep 12, 2022
Signed-off-by: ilee300a <ilee300@anyscale.com>
justinvyu pushed a commit to justinvyu/ray that referenced this pull request Sep 14, 2022
@ArturNiederfahrenhorst ArturNiederfahrenhorst deleted the ParametricRecSysfix branch September 21, 2022 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants