
Duplicated of Ref value does not accumulate gradient in GPU kernel #674

Closed
jgreener64 opened this issue Mar 14, 2023 · 4 comments

@jgreener64 (Contributor)

I am on Julia 1.8.5, Enzyme main (2ccf4b) and CUDA v4.0.1. This works okay:

using Enzyme, CUDA

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

function kernel!(a, b_ref)
    b = b_ref[]
    a[threadIdx().x] = a[threadIdx().x] * b
    return nothing
end

function grad_kernel!(a, da, b, db)
    Enzyme.autodiff_deferred(
        Reverse,
        kernel!,
        Const,
        Duplicated(a, da),
        Duplicated(b, db),
    )
    return nothing
end

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = CuArray([2.0f0])
db = CuArray([0.0f0])

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel!(a, da, b, db)
println(db)
Float32[121.32266]

However, if I try to use a Ref to avoid the array, the gradient is zero:

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = Ref(2.0f0)
db = Ref(0.0f0)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel!(a, da, b, db)
println(db)
Base.RefValue{Float32}(0.0f0)

This could also be achieved with Active, but then I need to reduce the gradients either inside or outside the kernel (out-of-kernel version below; an in-kernel sketch follows the output):

function kernel_2!(a, b)
    a[threadIdx().x] = a[threadIdx().x] * b
    return nothing
end

function grad_kernel_2!(a, da, b, db)
    grads = Enzyme.autodiff_deferred(
        Reverse,
        kernel_2!,
        Const,
        Duplicated(a, da),
        Active(b),
    )
    db[threadIdx().x] = grads[1][2]
    return nothing
end

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = 2.0f0
db = CUDA.zeros(256)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel_2!(a, da, b, db)
println(sum(db))
127.38301
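
For the in-kernel reduction, something along these lines should work: accumulate into a one-element buffer with CUDA.@atomic instead of writing one partial per thread. A minimal sketch (grad_kernel_3! is just an illustrative name; it reuses kernel_2!, a, da and b from above):

function grad_kernel_3!(a, da, b, db)
    grads = Enzyme.autodiff_deferred(
        Reverse,
        kernel_2!,
        Const,
        Duplicated(a, da),
        Active(b),
    )
    # accumulate each thread's gradient into a single slot
    CUDA.@atomic db[1] += grads[1][2]
    return nothing
end

db = CUDA.zeros(1)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel_3!(a, da, b, db)
println(db)  # one accumulated value instead of 256 partials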
@vchuravy (Member)

https://github.com/JuliaGPU/CUDA.jl/blob/940d23d5b9a82e50f79a16ea46d13ca885a4d2de/src/compiler/execution.jl#L129

Ref gets translated to CuRef, which is not mutable. Ref itself is CPU memory...
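
A quick way to see this from the host is cudaconvert, which applies the same argument conversion that @cuda does (sketch; exact wrapper type name as of current CUDA.jl):

using CUDA

r = Ref(2.0f0)
# @cuda runs its arguments through the kernel adaptor; a Base.RefValue
# comes back as the immutable CuRefValue, so device-side writes go to a
# by-value copy and never reach host memory.
typeof(cudaconvert(r))  # CUDA.CuRefValue{Float32}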

cc: @maleadt

Maybe we could provide a Ref backed by unified memory, or pin the memory behind the user's back?
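
Hypothetical sketch of that idea (not current CUDA.jl behavior; assumes the Mem.alloc(Mem.Unified, ...) and unsafe_wrap APIs): back the scalar with unified memory so host and device share the same storage, and hand the kernel a one-element view:

using CUDA

buf = CUDA.Mem.alloc(CUDA.Mem.Unified, sizeof(Float32))
db_dev  = unsafe_wrap(CuArray, convert(CuPtr{Float32}, buf), 1)  # device view
db_host = unsafe_wrap(Array, convert(Ptr{Float32}, buf), 1)      # host view
db_host[1] = 0.0f0

# a kernel accumulating through db_dev would then be visible through
# db_host after CUDA.@sync, without an explicit copy back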

vchuravy added the cuda label Mar 14, 2023
@maleadt commented Mar 14, 2023

That wouldn't work with Ref, and it would also break the adapt that happens now. But it may be worth considering, yes, as it would make Ref behave more like users expect.

@vchuravy (Member)

I am closing this for now, since there is nothing we can do at the Enzyme level.

vchuravy closed this as not planned Mar 14, 2023
@maleadt commented Mar 15, 2023

I've filed an issue, JuliaGPU/CUDA.jl#1803, but I don't have time to prioritize this.
