Thread-local current_task keeps garbage alive #40626
Fixes JuliaGPU/CUDA.jl#866 [skip tests]
Fixes #866 [skip tests]
@quinnj - maybe this is what happens in CSV.jl?
Also, maybe this is happening with PyCall + PyTorch? (See https://discourse.julialang.org/t/using-gpu-via-pycall-causes-non-reusable-memory-allocation/55140)
I guess this is because we keep the current task during sleep? If so, a catch-all (and kinda horrible) workaround may be to run

```julia
Threads.@threads :static for _ in 1:Threads.nthreads()
    Timer(Returns(nothing), 0; interval = 1)
end
```

? Now that we store
For a short-term solution with relatively small changes, I think Option 1 is attractive. Option 2 is attractive from other points of view like task fusion, where you'd want to create tasks without materializing the auxiliary fields (e.g., RNG).
Bumped into this issue with JuliaGPU/AMDGPU.jl#299, where each … The proposed workaround in JuliaGPU/CUDA.jl#866 does not work, as those threads do not return. Here's also a small MWE that reflects AMDGPU's situation.
We're surely running into JuliaLang/julia#40626. This is my attempt to clear thread's local storage after multithreaded parsing.
bump
(#1046)
* Clear thread state to ensure threads' local state doesn't keep references

  We're surely running into JuliaLang/julia#40626. This is my attempt to clear thread-local storage after multithreaded parsing.

* 1.6 compat
* fix again
Maybe the simplest solution would be to make sure that, when GC is invoked, the solution proposed by @tkf is run before the GC process? (The point is that, AFAICT, GC currently ensures that it does not run in parallel to anything else, but I do not know enough about the details of the current GC implementation, so this might be a wrong approach.)
I would imagine that we can teach the GC to check the task's …
Ok, I've dug into this a bit more on the CSV.jl side and here are my notes:
Following along the lines of Julian's idea, I came up w/ the following helper macros:

```julia
"""
    @syncpreserve args... begin
    end

A macro that wraps a `@sync` block with `GC.@preserve` calls for all `args...` arguments, to ensure
they are not garbage collected for the lifetime of the `@sync` block.
"""
macro syncpreserve(args...)
    expr = args[end]
    args = args[1:end-1]
    esc(quote
        GC.@preserve $(args...) begin
            @sync $expr
        end
    end)
end

"""
    @weakrefspawn args... begin
    end

A macro that wraps a `Threads.@spawn` block with `WeakRef` calls for all `args...` arguments, allowing
them to be garbage collected once the `Task` has finished running. Must be used within a `@syncpreserve`
block to ensure input arguments are not garbage collected for the lifetime of the `@sync` block.
"""
macro weakrefspawn(args...)
    expr = args[end]
    args = args[1:end-1]
    block = Expr(:block, :(wkd = Dict()))
    unpack = Expr(:block)
    for arg in args
        push!(block.args, :(wkd[$(Meta.quot(arg))] = WeakRef($arg)))
        push!(unpack.args, :($(Symbol("_", arg)) = wkd[$(Meta.quot(arg))].value))
    end
    esc(quote
        $block
        Threads.@spawn begin
            $unpack
            $expr
        end
    end)
end
```

Essentially the idea is that I wrap all my input arguments to my spawned tasks like so:

```julia
@syncpreserve ctx pertaskcolumns rowchunkguess rows wholecolumnslock for i = 1:ntasks
    @weakrefspawn ctx pertaskcolumns rowchunkguess i rows wholecolumnslock multithreadparse(_ctx, _pertaskcolumns, _rowchunkguess, _i, _rows, _wholecolumnslock)
end
```

It's a lot of argument repetition, but from my inspection, it seems to be doing the intended job: that is, even though various spawned … So all in all, a bit clunky, but it works. Anyone see any problems with this approach? Will @jpsamaroo's PR/approach be able to do this kind of thing automatically? I've come to see this as one of those unfortunate cleanups that we should really address; it doesn't jump up and bite you as a bug, but it's an unfortunate piece of our current design where stuff can hang around a lot longer than you expect.
Most complete explanation is [here](JuliaLang/julia#40626 (comment)). Also discussed [here](#1057). This PR proposes an alternative to `clear_thread_states`, since that approach can be problematic (interfering with global thread state, not working as expected in nested spawned tasks, etc.). The previous definition also started unending `Timer` tasks that could build up over time. The approach in this PR is to wrap spawned task closure arguments in `WeakRef` to allow them to be GCed as expected once the tasks are finished.
Excellent writeup @quinnj. We chatted about this a bit offline, but I don't think the … But yeah, otherwise this seems like a decent workaround.
I was wondering about the underlying issue: When a thread goes to sleep, couldn't it switch to a specific "sleeper Task" that just does the sleep, so that it can free the user's Task object? i.e. instead of sleeping a task, looking something like this:

```julia
function _jl_get_next_task()
    next_task_or_nothing = fetch_task()
    if next_task_or_nothing === nothing
        sleep(tls.sleep_cond)
    else
        task_switch(next_task_or_nothing::Task)
    end
end
```

it could instead look something like this?:

```julia
function _jl_get_next_task()
    next_task_or_nothing = fetch_task()
    if next_task_or_nothing === nothing
        task_switch(tls.sleeper_task)
    else
        task_switch(next_task_or_nothing::Task)
    end
end
```

where the sleeper_task would just be a per-Thread task that runs a function something like this:

```julia
function sleeper_task()
    while true
        sleep(tls.cond)
        next_task_or_nothing = fetch_task()
        if next_task_or_nothing !== nothing
            task_switch(next_task_or_nothing::Task)
        end
    end
end
```

Is there a reason we don't have this kind of design? It would also make CPU profiles easier to understand, since thread sleep always looks the same.
…e arguments in WeakRef

Fixes JuliaData/CSV.jl#1057. Works around current Julia limitation here: JuliaLang/julia#40626. `@wkspawn` acts just like `Threads.@spawn`, except that mutable, interpolated arguments in the spawned expression will also be transparently wrapped as `WeakRef`s, then immediately unwrapped within the spawn block. This avoids the `Task` closure capturing mutable arguments in a more permanent way and preventing their collection later, even after there are no more program references to the mutable argument.
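A hedged sketch of the usage described above, assuming `@wkspawn` behaves like `Threads.@spawn` (including `$` interpolation and returning the spawned `Task`) and comes from the WorkerUtilities.jl package mentioned later in the thread; `Ctx` and `process_chunk` are hypothetical stand-ins:

```julia
# Hedged usage sketch; `Ctx` and `process_chunk` are made-up stand-ins for a large,
# mutable parsing context and the work done on it.
using WorkerUtilities  # package mentioned later in the thread as the home of @wkspawn

mutable struct Ctx
    buf::Vector{UInt8}
end

process_chunk(ctx, i) = sum(ctx.buf) + i   # hypothetical work function

ctx = Ctx(rand(UInt8, 10^6))

# With a plain Threads.@spawn, the finished Task (left in the worker thread's
# current_task slot, per this issue) would keep `ctx` reachable. With @wkspawn, the
# interpolated mutable argument `$ctx` is captured through a WeakRef and unwrapped
# inside the task body, so `ctx` can be collected once the task is done and nothing
# else references it.
t = @wkspawn process_chunk($ctx, 1)
fetch(t)
```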
Minor update: I packaged my idea(s) above into a single … I added some tests that show the current, surprising behavior with …
(#10)
* Add new `@wkspawn` macro for wrapping mutable `Threads.@spawn` closure arguments in WeakRef

  Fixes JuliaData/CSV.jl#1057. Works around current Julia limitation here: JuliaLang/julia#40626. `@wkspawn` acts just like `Threads.@spawn`, except that mutable, interpolated arguments in the spawned expression will also be transparently wrapped as `WeakRef`s, then immediately unwrapped within the spawn block. This avoids the `Task` closure capturing mutable arguments in a more permanent way and preventing their collection later, even after there are no more program references to the mutable argument.

* Compat
* fix
* try lots of gc
* Fix
* Run with threads
…#1058)
* Use WeakRefs for spawned tasks to avoid holding unexpected references

  Most complete explanation is [here](JuliaLang/julia#40626 (comment)). Also discussed [here](#1057). This PR proposes an alternative to `clear_thread_states`, since that approach can be problematic (interfering with global thread state, not working as expected in nested spawned tasks, etc.). The previous definition also started unending `Timer` tasks that could build up over time. The approach in this PR is to wrap spawned task closure arguments in `WeakRef` to allow them to be GCed as expected once the tasks are finished.

* Try putting gc preserve inside Threads.spawn block
* Outside GC preserve manual sync block
* Make Context mutable so it gets preserved properly
* Only wrap in WeakRef if mutable
* oops
* Use `@wkspawn` from WorkerUtilities.jl package
* 1.6 compat
When returning values from tasks executed on another thread, those values are kept alive even though it should be possible to collect them:
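The original snippet is not included in this capture; a minimal sketch of the kind of reproduction described (run with `julia -t 2` or more so the task executes on another thread) might look like:

```julia
# Minimal sketch (not the original snippet) of the reported behavior.
function alloc_on_other_thread()
    val = fetch(Threads.@spawn rand(10^6))   # value returned from a task on another thread
    return WeakRef(val)                      # keep only a weak reference to it
end

wr = alloc_on_other_thread()
GC.gc(); GC.gc()
# Expected: the array is collectable, so `wr.value === nothing`.
# Observed (this issue): the finished Task, still referenced from the worker thread's
# local current_task, keeps the array alive, so `wr.value` is still the array.
@show wr.value === nothing
```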
Running under `gdb` reveals that the task object is being kept alive in the thread's local storage:

Running another task on the same thread replaces that state and allows collection of the array:
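The gdb session and the follow-up snippet are not included in this capture; continuing the sketch above, the second observation can be approximated like so (the `Threads.@threads :static` trick mirrors the workaround discussed earlier in the thread):

```julia
# Continuing the sketch above: run a trivial task on every thread so that the thread
# which ran the original task replaces its stored current_task reference.
Threads.@threads :static for i in 1:Threads.nthreads()
    # nothing to do; merely running a fresh task on each thread is the point
end
GC.gc(); GC.gc()
@show wr.value === nothing   # now expected to be true: the array can be collected
```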
As observed in JuliaGPU/CUDA.jl#866.