Unit tests to use a random port for the dashboard #5060
Conversation
crusaderky commented Jul 14, 2021 (edited)
- Drastically reduce the number of warning messages at the end of the test logs complaining that port 8787 is occupied (see the sketch below)
- Use gen_test / gen_cluster whenever possible
- Clean up most of the other warnings captured by pytest
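For illustration only, here is a minimal sketch of the pattern the first bullet refers to; the test name and the `inc` helper are made up for this example, but the idea is that `dashboard_address=":0"` asks the operating system for any free port instead of the default 8787:

```python
# Sketch, not part of this diff: a test that requests an ephemeral dashboard
# port so that parallel test runs don't compete for port 8787.
from distributed import Client, Scheduler, Worker
from distributed.utils_test import gen_test


def inc(x):
    return x + 1


@gen_test()
async def test_uses_random_dashboard_port():
    # ":0" lets the operating system pick any free port for the dashboard
    async with Scheduler(port=0, dashboard_address=":0") as s:
        async with Worker(s.address):
            async with Client(s.address, asynchronous=True) as c:
                assert await c.submit(inc, 1) == 2
```

Whether the address is set on the Scheduler directly or through the test harness defaults is an implementation detail of the actual PR.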
c.sync(f)

response = requests.get("http://127.0.0.1:8787/status/")
assert response.status_code == 404
response.raise_for_status()
404 is tested by test_no_dashboard below
@@ -4,12 +4,12 @@
This whole module is broken at least against the latest version of ucp. It needs bugfixing and CI integration.
cc @quasiben
cc also @pentschev
> This whole module is broken at least against the latest version of ucp. It needs bugfixing and CI integration.
Could you provide more details? How are you installing UCX-Py and UCX, and how are you launching the tests? What errors do you see?
We run those tests in https://github.com/rapidsai/ucx-py/blob/branch-0.21/ci/gpu/build.sh#L111 and they're currently passing here, where we test both UCX 1.9 and the master branch.
As for CI integration, this is in the scope of dask/community#138.
Apologies, I botched the package installation. Should've RTFM.
No worries, thanks for the update.
From the UCX side I can confirm these changes still work, so I'm approving on behalf of the UCX changes only and leaving the other changes for those with a better understanding than myself to judge.
assert cluster.scheduler_address in text
assert "cores=4" in text or "threads=4" in text
assert "4.00 GB" in text or "3.73 GiB" in text
assert "workers=2, threads=2, memory=2.00 GiB" in text
Drop backwards compatibility with dask < 2021.04.0
with Client(s["address"], loop=loop) as c:
    for func in funcs:
        text = func(c)
        assert c.scheduler.address in text
        assert "threads=3" in text or "Total threads: </strong>" in text
        assert "6.00 GB" in text or "5.59 GiB" in text
        assert "6.00 GiB" in text
Drop backwards compatibility with dask < 2021.04.0
0b2ced7 to a8ae24f (Compare)
CI output of a successful Ubuntu 3.7 run:
You are removing a lot of pytest-asyncio markers. What's the reasoning here? Should we drop pytest-asyncio generally? If it is about the cleanup fixture, we could probably define it as an autouse fixture.
I'm replacing them with gen_test, which features both cleanup and timeout. Unlike the timeout of pytest-asyncio, gen_test cleanly aborts a single test and can produce a stack trace. Auto-adding a cleanup fixture is a problem because (1) I suspect it could interact poorly with gen_cluster (to be tested) and (2) cleanup fails for some tests. While it is ideally desirable to have all tests decorated by cleanup, being forced to fix the underlying issue every time may prove impractical.
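As a rough before/after illustration of that swap (the test names and bodies are invented, not taken from this PR):

```python
# Hypothetical example of replacing a pytest-asyncio marker with gen_test.
import asyncio

import pytest

from distributed.utils_test import gen_test


# Before: plain pytest-asyncio, with no per-test timeout that aborts cleanly.
@pytest.mark.asyncio
async def test_example_before():
    await asyncio.sleep(0.1)


# After: gen_test applies a timeout that aborts just this test with a
# traceback, plus the cleanup checks mentioned above.
@gen_test(timeout=30)
async def test_example_after():
    await asyncio.sleep(0.1)
```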
My question mostly points to "should we abandon pytest-asyncio or do we see a future for it?". That's an interesting question for future code reviews, even if we still have remnants in the code base. Regarding timeouts in pytest-asyncio, see also pytest-dev/pytest-asyncio#216, which implements our gen_test timeout logic there directly.
We should not use pytest-asyncio for as long as it doesn't implement a timeout, and when it does, we could consider cleaning up a bunch of in-house stuff from distributed.utils_test. There are challenges there; namely:
The notable exception is parametrized tests and pytest fixtures. gen_cluster supports them, but gen_test doesn't (yet), so pytest-asyncio remains the only alternative there for the time being.
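To make that concrete, a small sketch (names and values invented) of stacking pytest.mark.parametrize on top of gen_cluster, which gen_test could not do at the time:

```python
# Sketch: gen_cluster composes with pytest.mark.parametrize; the parametrized
# argument arrives after the client, scheduler, and the two default workers.
import pytest

from distributed.utils_test import gen_cluster


@pytest.mark.parametrize("n", [1, 2])
@gen_cluster(client=True)
async def test_parametrized_increment(c, s, a, b, n):
    result = await c.submit(lambda x: x + 1, n)
    assert result == n + 1
```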
Ready for review and merge.
Thanks for the updates here @crusaderky. Generally they look good. I noticed there are seemingly more timeout-related test failures here than in other recent PRs. Is this something we should be concerned about?
@jrbourbeau Of the 6 failures you're talking about, 3 were potentially caused by my PR, while the other 3 seem to be unrelated.
That test output looks great! So much less scrolling.
I think that we should consider reverting this after #4928 gets in.

If I recall correctly we used to do random ports, but then we had lots of hard-to-explain random failures that came about because a scheduler/worker from a previous test managed to connect back to the scheduler/worker of a more recent test. This was hard to understand because of the randomness. We then started to use the same ports everywhere so that this problem was more obvious, and this then led us to the various cleanup testing infrastructure.

In general, my experience is that avoiding/working around problems in the distributed CI works, but sometimes leads to hard-to-track-down situations. Randomness is generally something that I think we want to avoid.
@mrocklin I assume that with "revert" you're just referring to the instances of dashboard_address=":0" throughout the test modules. This PR does a wealth of cleanup on top of that.
Yes, sorry, that's all I meant. I don't think that we should rely on random ports to hide any weirdness that our tests may generate. We should jump into and swim in that weirdness until it is fixed. I'm hopeful that the linked PR fixes that particular issue.
@jrbourbeau Is there anything outstanding for this?
FWIW I had a brief discussion with @jrbourbeau yesterday. My outstanding PR #4928 is unfortunately still not ready to fix all the port warnings. It should not block this PR, especially since there are other valuable cleanups.
I'll merge once the CI jobs are through.