[BUG] dask gpu memory leaking #7092
Comments
Dask will eventually clean things up. Can you try:
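(The suggested snippet didn't survive extraction; a minimal sketch of the explicit cleanup the linked docs describe, with hypothetical names `df`/`client`:)

```python
import gc

import dask.dataframe as dd
import pandas as pd
from dask.distributed import Client

client = Client()  # assumes an existing cluster (e.g. a dask_cuda one)

# A persisted collection whose memory we want back.
df = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=4).persist()

client.cancel(df)       # tell the scheduler to forget df's tasks and results
del df                  # drop the local reference so nothing re-requests it
client.run(gc.collect)  # ask every worker to run a GC pass immediately
```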
xref: https://distributed.dask.org/en/latest/memory.html#aggressively-clearing-data
@quasiben Cool, confirming that explicitly canceling works. I'm still stuck on: how would that look within one iteration of the loop?

AFAICT, CPU-oriented Dask preserves working partitions so users can retry/inspect/etc. That's safe for CPUs because of swap/vmem. For GPU kernels, even with CPU<>GPU spilling (as was enabled in our testbed here), that seems more prone to halting the system. Maybe there's a mode/flag to free partitions on fail, or otherwise halt?
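(For concreteness, a sketch of the per-iteration "free on fail" pattern being asked about here, assuming a `client`, a `fix_dtypes` function, and a list of `paths` — all hypothetical, not from the thread:)

```python
import gc

import dask_cudf
from dask.distributed import futures_of, wait

for path in paths:  # hypothetical ingest loop
    gdf = dask_cudf.read_csv(path).map_partitions(fix_dtypes).persist()
    wait(gdf)  # returns when all partitions finish, even if some errored
    errs = [f.exception() for f in futures_of(gdf) if f.status == "error"]
    if errs:
        # Free the failed partitions instead of keeping them for retry/inspect.
        client.cancel(gdf)
        del gdf
        client.run(gc.collect)
        raise errs[0]
```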
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
This issue has been labeled
@lmeyerov sorry I didn't answer the last question. Many things have changed since you first opened this issue. Recently, @madsbk has been exploring heterogeneous computing and I would suggest we move the conversation to this Dask issue:
Describe the bug

Normally, if a Dask task has an exn, downstream tasks should also have an exn... except it seems GPU partitions leak during this with `map_partitions`/`persist`, which AFAICT is a normal pattern for ingest phases as dtypes get figured out.

Steps/Code to reproduce bug
Increase `M` below to use more memory based on your GPU, and watch nvidia-smi in a separate window. Run the test cell a few times until you use up all your memory and your browser crashes (if a local GPU).

Setup:
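(The original setup cell wasn't preserved; a plausible reconstruction matching the CPU<>GPU spilling mentioned above — the `device_memory_limit` value is a guess sized for the 2 GB card listed below:)

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Single-GPU cluster with device->host spilling once GPU usage passes ~1 GB.
cluster = LocalCUDACluster(device_memory_limit="1GB")
client = Client(cluster)
```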
Notebook cell:
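(Likewise lost in extraction; a hypothetical cell in the spirit of the description — a `map_partitions` step that fails on real partitions, then `persist`:)

```python
import numpy as np
import cudf
import dask_cudf
from dask.distributed import wait

M = 10_000_000  # increase based on your GPU memory

def failing_fix(df):
    # Stand-in for an ingest-time dtype fix; raises on real (non-empty)
    # partitions but passes dask's empty-meta inference call.
    if len(df):
        raise ValueError("bad dtype")
    return df

gdf = dask_cudf.from_cudf(
    cudf.DataFrame({"x": np.arange(M, dtype="float64")}), npartitions=8
)
gdf2 = gdf.map_partitions(failing_fix).persist()
wait(gdf2)  # tasks error out, yet nvidia-smi shows GPU memory still held
```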
Expected behavior
The above process can be run indefinitely without running out of memory.
Environment overview (please complete the following information)
Host: Ubuntu 18.04 / CUDA 10.2 -> Docker: Ubuntu 18.04 / CUDA 10.2 -> Conda RAPIDS 0.17
Smaller Pascal GPU (2 GB)