Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests to use a random port for the dashboard #5060

Merged
merged 17 commits into from
Jul 28, 2021

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Jul 14, 2021

  • Drastically reduce the amount of warning messages, complaining that port 8787 is occupied, at the end of the test logs
  • Use gen_test / gen_cluster whenever possible
  • Clean up most of the other warnings captured by pytest

c.sync(f)

response = requests.get("http://127.0.0.1:8787/status/")
assert response.status_code == 404
response.raise_for_status()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

404 is tested by test_no_dashboard below

@@ -4,12 +4,12 @@

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole module is broken at least against the latest version of ucp. It needs bugfixing and CI integration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc also @pentschev

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole module is broken at least against the latest version of ucp. It needs bugfixing and CI integration.

Could you provide more details? How are you installing UCX-Py and UCX, and how are you launching the tests? What errors do you see?

We run those tests in https://github.com/rapidsai/ucx-py/blob/branch-0.21/ci/gpu/build.sh#L111 and they're currently passing here where we test both UCX 1.9 and master branch.

As for CI integration, this is in the scope of dask/community#138 .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I botched the packages installation. Should've RTFM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, thanks for the update.

From the UCX side I can confirm these changes still work, so I'm approving on behalf of UCX changes only and leave other changes for those with better understanding than myself to judge.

assert cluster.scheduler_address in text
assert "cores=4" in text or "threads=4" in text
assert "4.00 GB" in text or "3.73 GiB" in text
assert "workers=2, threads=2, memory=2.00 GiB" in text
Copy link
Collaborator Author

@crusaderky crusaderky Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop backwards compatibility with dask < 2021.04.0

with Client(s["address"], loop=loop) as c:
for func in funcs:
text = func(c)
assert c.scheduler.address in text
assert "threads=3" in text or "Total threads: </strong>" in text
assert "6.00 GB" in text or "5.59 GiB" in text
assert "6.00 GiB" in text
Copy link
Collaborator Author

@crusaderky crusaderky Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop backwards compatibility with dask < 2021.04.0

@crusaderky
Copy link
Collaborator Author

CI output of a successful Ubuntu 3.7 run:
Master: 8186 rows
This PR: 2466 rows 😎
cc @mrocklin @jrbourbeau

@fjetter
Copy link
Member

fjetter commented Jul 15, 2021

You are removing a lot of pytest-asyncio markers. What's the reasoning here? Should we drop pytest-asyncio generally? If it is about the cleanup fixture, we could probably define it as an autouse fixture

@crusaderky
Copy link
Collaborator Author

You are removing a lot of pytest-asyncio markers. What's the reasoning here? Should we drop pytest-asyncio generally? If it is about the cleanup fixture, we could probably define it as an autouse fixture

I'm replacing them with gen_test, which features both cleanup and timeout. Unlike the timeout of pytest-asyncio, gen_test cleanly aborts a single test and can produce a stack trace.

Auto-adding a cleanup fixture is a problem because (1) I suspect it could interact poorly with gen_cluster - to be tested - and (2) cleanup fails for some tests. While ideally it is desirable to have all tests decorated by cleanup,being forced to fix the underlying issue all the times may prove to be unpractical.

@fjetter
Copy link
Member

fjetter commented Jul 15, 2021

My question mostly points to "should we abandon pytest-asyncio or do we see a future". That's an interesting question for future code reviews, even if we still have remnants in the code base.

Regarding timeout in pytest-asyncio, see also pytest-dev/pytest-asyncio#216 which implements our gen_test timeout logic there directly

@crusaderky
Copy link
Collaborator Author

crusaderky commented Jul 15, 2021

We should not use pytest-asyncio for as long as it doesn't implement a timeout, and when it does, we could consider cleaning up a bunch of in-house stuff from distributed.utils_test. There are challenges there; namely

  1. have a blanket default timeout (with the option to override it) instead of having to specify it every time
  2. asyncio event loop vs. tornado event loop
  3. blanket cleanup
  4. either have a way to selectively disable cleanup, or go through all cleanup issues of all tests (which may not be even possible)
  5. downstream projects using distributed.utils_test

The notable exception is parametrized tests and pytest fixtures. gen_cluster supports them, but gen_test doesn't (yet), so pytest-asyncio there remains the only alternative for the time being.

@crusaderky crusaderky closed this Jul 15, 2021
@crusaderky crusaderky reopened this Jul 15, 2021
@crusaderky crusaderky marked this pull request as ready for review July 15, 2021 19:52
@crusaderky
Copy link
Collaborator Author

Ready for review and merge

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates here @crusaderky. Generally they look good. I noticed there are seemingly more timeout-related test failures here than in other recent PRs. Is this something we should be concerned about?

@crusaderky
Copy link
Collaborator Author

crusaderky commented Jul 16, 2021

@jrbourbeau of the 6 failures you're talking about, 3 were potentially caused by my PR, while the other 3 seem to be unrelated.
I've taken the occasion to revisit the slow markers and the timeouts for all the tests that take >15s on CI.

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test output looks great! So much less scrolling.

@mrocklin
Copy link
Member

mrocklin commented Jul 16, 2021 via email

@crusaderky
Copy link
Collaborator Author

@mrocklin I assume that with "revert" you're just referring to the instances to dashboard_address=":0" throughout the test modules. This PR does a wealth of cleanup on top of that.

@mrocklin
Copy link
Member

mrocklin commented Jul 16, 2021 via email

@crusaderky
Copy link
Collaborator Author

@jrbourbeau is there anything outstanding for this?

Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I had a brief discussion with @jrbourbeau yesterday. My outstanding PR #4928 is unfortunately still not ready to fix all port warning. It should not block this PR, especially not since there are other valuable cleanups.

I'll merge once the CI jobs are through

@fjetter fjetter merged commit 33b795f into dask:main Jul 28, 2021
@crusaderky crusaderky deleted the dashboard_address branch July 28, 2021 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants