Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Fix token expiration for ray autoscaler #48481

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

wadhah101
Copy link
Contributor

@wadhah101 wadhah101 commented Oct 31, 2024

Why are these changes needed?

On AKS 1.30 Ray autoscaler fails after an hour of activity due to AKS new token policy. Check issue for more details

Related issue number

closes ray-project/kuberay#2324

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch 2 times, most recently from e1fc6c2 to 5d1f28c Compare November 1, 2024 19:30
@wadhah101
Copy link
Contributor Author

all tests are passing, PR up to date @hongchaodeng @kevin85421 @MortalHappiness

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch from 5d1f28c to 4101b68 Compare November 1, 2024 19:33
@kevin85421
Copy link
Member

I will review next week.

@jcotant1
Copy link
Member

Hey @kevin85421 any update on this one?

@jcotant1 jcotant1 added the core Issues that should be addressed in Ray Core label Nov 18, 2024
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevin85421
Copy link
Member

@wadhah101 thank you for the PR! Would you mind rebasing with the master branch?

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch 2 times, most recently from e0b42fa to ff6d1be Compare November 19, 2024 09:53
@wadhah101
Copy link
Contributor Author

@kevin85421 rebased

@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Nov 20, 2024
@kevin85421
Copy link
Member

CI fails. Let me retry it.

@kevin85421
Copy link
Member

kevin85421 commented Nov 21, 2024

CI failed again. I synced the branch with the master branch. If it still fails, maybe you need to take a look at the failure.

@kevin85421
Copy link
Member

It still failed. Would you mind fixing the issue?

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch from bca47af to f41cae2 Compare December 2, 2024 09:13
@wadhah101
Copy link
Contributor Author

@kevin85421
Fixed the test and rebased

@kevin85421
Copy link
Member

@wadhah101, the CR is still failing. Would you mind taking a look?

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch from f850ae5 to 37b7df0 Compare December 3, 2024 15:19
@jjyao
Copy link
Collaborator

jjyao commented Dec 11, 2024

@wadhah101 CI is still failing

@wadhah101
Copy link
Contributor Author

@jjyao I will try to fix the test, thanks 😄

@jjyao
Copy link
Collaborator

jjyao commented Dec 12, 2024

Thanks. Ping us if you need any helps.

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch 3 times, most recently from fb7a712 to 1fa8d59 Compare January 11, 2025 09:37
@wadhah101
Copy link
Contributor Author

@kevin85421 @jjyao I have fixed the tests locally and they seem to be fine
can you re run the ci ?

@jjyao jjyao changed the title fix token expiration for ray autoscaler [Core] Fix token expiration for ray autoscaler Jan 13, 2025
@danc
Copy link

danc commented Jan 20, 2025

Interested to test this, current implementation blocks autoscaling feature on AKS clusters

@wadhah101 wadhah101 force-pushed the fix-token-expiration branch from 1fa8d59 to 1e4b662 Compare January 20, 2025 12:27
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
@wadhah101 wadhah101 force-pushed the fix-token-expiration branch from 1e4b662 to be00bc7 Compare January 20, 2025 12:28
@jjyao jjyao merged commit bd3d90a into ray-project:master Jan 22, 2025
5 checks passed
ruisearch42 added a commit to ruisearch42/ray that referenced this pull request Jan 22, 2025
ruisearch42 added a commit to ruisearch42/ray that referenced this pull request Jan 22, 2025
…8481)"

This reverts commit bd3d90a.

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
@kevin85421
Copy link
Member

The PR is reverted. Reopen the issue.

@wadhah101
Copy link
Contributor Author

why was this PR reverted @kevin85421 @ruisearch42 ?

win5923 pushed a commit to win5923/ray that referenced this pull request Jan 23, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Jan 23, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
@kevin85421
Copy link
Member

#50008 seems to cause our release test to be flaky, but it appears to have passed again. I am not sure whether we still need to revert it.

@kevin85421
Copy link
Member

Let me check with the CI team.

anson627 pushed a commit to anson627/ray that referenced this pull request Jan 31, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
Signed-off-by: Anson Qian <anson627@gmail.com>
anson627 pushed a commit to anson627/ray that referenced this pull request Jan 31, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
Signed-off-by: Anson Qian <anson627@gmail.com>
srinathk10 pushed a commit that referenced this pull request Feb 2, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
anyadontfly pushed a commit to anyadontfly/ray that referenced this pull request Feb 13, 2025
Signed-off-by: wa101 <wadhah.mahroug15@gmail.com>
Signed-off-by: Puyuan Yao <williamyao034@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests
Projects
None yet
6 participants