Repetitive AMDGPU.ones calls crash runtime #299
Comments
For some reason, after the thread in the errormonitor finishes waiting, it still prevents the kernel signal from being GC'd... One way to fix this is to wait directly rather than in a separate thread:
julia> using AMDGPU
julia> dummy_kern() = (return nothing;)
dummy_kern (generic function with 1 method)
julia> wait(@roc dummy_kern())
[ Info: [kern] exception size 4096 bytes
[ Info: [queue] create
[ Info: [queue] error monitor wait
true
julia> GC.gc()
[ Info: [kersig] kill
[ Info: [signal] kill AMDGPU.HSA.LibHSARuntime.hsa_signal_s(0x00007f60976f8000)
With this change the above example also works fine. Another way is to pass to thread in errormonitor a
schedule: Task not runnable
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] schedule(t::Task, arg::Any; error::Bool)
@ Base ./task.jl:829
[3] schedule
@ ./task.jl:827 [inlined]
[4] uv_writecb_task(req::Ptr{Nothing}, status::Int32)
@ Base ./stream.jl:1166
[5] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
@ Base ./task.jl:963
[6] wait()
@ Base ./task.jl:972
[7] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
@ Base ./condition.jl:124
[8] wait_readnb(x::Base.TTY, nb::Int64)
@ Base ./stream.jl:416
[9] eof(s::Base.TTY)
@ Base ./stream.jl:106
[10] eof(io::REPL.Terminals.TTYTerminal)
@ Base ./io.jl:450
So here's an MWE that shows that signals are not GC'd until Julia exits.
mutable struct A
    x::Int64
end
# A is captured by a task that is wrapped in errormonitor.
function A()
    a = A(1)
    errormonitor(Threads.@spawn println(a))
    finalizer(a) do ai
        println("Finalizer for $ai")
    end
    a
end
mutable struct B
    x::Int64
end
# B is the control case: same finalizer, but no task captures it.
function B()
    b = B(1)
    finalizer(b) do bi
        println("Finalizer for $bi")
    end
    b
end
function main()
    as = A[]
    bs = B[]
    push!(as, A())
    push!(bs, B())
    sleep(1)
    GC.gc()
    # Drop the only remaining references held by this function, then collect again.
    empty!(as)
    empty!(bs)
    GC.gc()
    println("done")
end
main()
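Since the Julia documentation says a finalizer must not cause a task switch (which rules out most I/O such as println), a variant of the MWE can record finalization in an atomic flag instead of printing. This is only a minimal sketch and not code from AMDGPU.jl: `Tracked`, `make_tracked`, and `check` are hypothetical names. Whether either flag ends up `true` depends on the Julia version and on what the GC still sees as reachable, so no expected output is given.

```julia
# Hypothetical names for illustration; none of this is part of AMDGPU.jl.
mutable struct Tracked
    x::Int64
end

# Create a Tracked object whose finalizer sets `flag` instead of printing.
# With `capture=true` the object is also captured by a spawned task wrapped
# in errormonitor, mirroring struct A from the MWE above.
function make_tracked(flag::Threads.Atomic{Bool}; capture::Bool)
    t = Tracked(1)
    if capture
        errormonitor(Threads.@spawn (t.x + 1))
    end
    finalizer(_ -> (flag[] = true; nothing), t)
    return nothing  # do not return `t`, so the caller holds no reference to it
end

function check()
    captured = Threads.Atomic{Bool}(false)
    plain = Threads.Atomic{Bool}(false)
    make_tracked(captured; capture=true)
    make_tracked(plain; capture=false)
    sleep(1)  # give the spawned task time to finish
    GC.gc()
    GC.gc()
    # Report which finalizers have run so far.
    return (captured = captured[], plain = plain[])
end

println(check())
```

If `plain` is reported as finalized while `captured` is not, that would be consistent with the task keeping its captured object alive, as described above.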
Looks related to JuliaLang/julia#40626
The following code snippet will crash:
with this error:
After adding logging to lots of things, I noticed that (at least) the ROCSignal objects associated with the respective ROCKernelSignals are not killed. Their finalizers do not run. I suspect that there may be some dangling references that prevent finalizers from being called.
One thing to note is that the same code, but for CUDA, runs fine on a much weaker GPU.
The finalizers are triggered only when exiting the Julia REPL. Manually calling GC.gc() also does not help.