CI test windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is consistently_failing #47950

Closed
can-anyscale opened this issue Oct 8, 2024 · 18 comments · Fixed by #47975
Assignees
Labels
- bug: Something that is supposed to be working; but isn't
- ci-test
- flaky-tracker: Issue created via Flaky Test Tracker https://flaky-tests.ray.io/
- ray-test-bot: Issues managed by OSS test policy
- serve: Ray Serve Related Issue
- stability
- triage: Needs triage (eg: priority, bug/not-bug, and owning component)
- weekly-release-blocker: Issues that will be blocking Ray weekly releases

Comments

@can-anyscale
Collaborator

CI test windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/6490#01926dfe-ba7d-49e0-8c5f-5b4cc90367ef
- https://buildkite.com/ray-project/postmerge/builds/6490#01926dc0-143e-4412-9c59-644e38b87f41

DataCaseName-windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler-END
Managed by OSS Test Policy

@can-anyscale added the bug, ci-test, flaky-tracker, ray-test-bot, serve, stability, triage, and weekly-release-blocker labels Oct 8, 2024
@can-anyscale
Collaborator Author

Blamed commit: f1cccba found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1616

@can-anyscale
Collaborator Author

This test is now considered flaky because it has been failing on postmerge for too long. Flaky tests do not run on premerge.

@aslonnie assigned aslonnie, zcin and edoakes and unassigned aslonnie Oct 10, 2024
zcin added a commit that referenced this issue Oct 10, 2024
## Why are these changes needed?

Fix `test_pow_2_replica_scheduler.py` on Windows. Best guess is that
asyncio is slower on Windows, so the shortened timeouts in some tests
expire before the tasks get a chance to start or finish executing,
which makes those tests fail.

Failing tests on Windows:
- `test_multiple_queries_with_different_model_ids`
- `test_queue_len_cache_replica_at_capacity_is_probed`
- `test_queue_len_cache_background_probing`

## Related issue number

Closes #47950

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
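
For readers unfamiliar with this failure mode, here is a minimal sketch of how a too-short timeout can make an asyncio test fail when the event loop is slow to run a task. This is not the Ray test code: `wait_for_condition`, `background_task`, `demo`, and all timing values are hypothetical names and numbers chosen for illustration, and polling against a deadline is shown as one common mitigation, not as what the fix above necessarily does.

```python
import asyncio
import time


async def wait_for_condition(predicate, timeout_s=5.0, poll_interval_s=0.01):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Hypothetical helper for illustration; not part of Ray.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        await asyncio.sleep(poll_interval_s)
    # One last check in case the condition became true right at the deadline.
    return predicate()


async def demo():
    started = asyncio.Event()

    async def background_task():
        # Stand-in for work that needs some event-loop time before it
        # reaches the state the test wants to observe.
        await asyncio.sleep(0.05)
        started.set()

    task = asyncio.create_task(background_task())

    # Fragile pattern: a fixed, very short wait assumes the task has
    # already finished by the time the timeout expires.
    done, _pending = await asyncio.wait([task], timeout=0.001)
    fragile_saw_completion = task in done

    # More tolerant pattern: poll against a generous deadline instead.
    robust_saw_completion = await wait_for_condition(started.is_set, timeout_s=5.0)

    await task
    return fragile_saw_completion, robust_saw_completion


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The generous deadline only bounds the worst case; on a fast machine the polling loop returns as soon as the condition holds, so the test does not get slower where the short timeout was already sufficient.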
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this issue Oct 15, 2024
## Why are these changes needed?

Fix `test_pow_2_replica_scheduler.py` on windows. Best guess is asyncio
is slower on windows, so the shortened timeouts for some tests cause the
tests to fail because tasks didn't get a chance to start/finish
executing.

Failing tests on windows:
- `test_multiple_queries_with_different_model_ids`
- `test_queue_len_cache_replica_at_capacity_is_probed`
- `test_queue_len_cache_background_probing`

## Related issue number

Closes ray-project#47950

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
@can-anyscale reopened this Oct 31, 2024
@can-anyscale
Collaborator Author

CI test windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/6772#0192e0f9-6c68-407c-87bf-cc472e5534bc
- https://buildkite.com/ray-project/postmerge/builds/6769#0192dfe8-0028-4d0b-83b5-5d65f30e724d

DataCaseName-windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler-END
Managed by OSS Test Policy

@can-anyscale
Collaborator Author

This test is now considered flaky because it has been failing on postmerge for too long. Flaky tests do not run on premerge.

@can-anyscale
Collaborator Author

Blamed commit: 75d652c found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1677

@can-anyscale
Collaborator Author

Reverted PR: #48468

@can-anyscale
Collaborator Author

This test is now considered flaky because it has been failing on postmerge for too long. Flaky tests do not run on premerge.

@can-anyscale reopened this Nov 2, 2024
@can-anyscale
Collaborator Author

CI test windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/7308#0193b543-d0b3-448e-bed7-154fbde0941b
- https://buildkite.com/ray-project/postmerge/builds/7308#0193b453-c4d8-4ad0-ae76-51f0427ca92e

DataCaseName-windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler-END
Managed by OSS Test Policy

@can-anyscale
Collaborator Author

Blamed commit: 789e1c8 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1832

@edoakes
Contributor

edoakes commented Dec 11, 2024

@GeneDer @zcin PTAL

@aslonnie
Collaborator

789e1c8 seems to be a pure doc change.

@can-anyscale
Collaborator Author

CI test windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/7464#0193e6c1-aecf-4b88-88ab-251042564b22
- https://buildkite.com/ray-project/postmerge/builds/7308#0193b543-d0b3-448e-bed7-154fbde0941b
- https://buildkite.com/ray-project/postmerge/builds/7308#0193b453-c4d8-4ad0-ae76-51f0427ca92e

DataCaseName-windows://python/ray/serve/tests/unit:test_pow_2_replica_scheduler-END
Managed by OSS Test Policy
