
Duplicated of Ref value does not accumulate gradient in GPU kernel #674

Closed
jgreener64 opened this issue Mar 14, 2023 · 4 comments

@jgreener64 (Contributor)

I am on Julia 1.8.5, Enzyme main (2ccf4b) and CUDA v4.0.1. This works okay:

using Enzyme, CUDA

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

function kernel!(a, b_ref)
    b = b_ref[]
    a[threadIdx().x] = a[threadIdx().x] * b
    return nothing
end

function grad_kernel!(a, da, b, db)
    Enzyme.autodiff_deferred(
        Reverse,
        kernel!,
        Const,
        Duplicated(a, da),
        Duplicated(b, db),
    )
    return nothing
end

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = CuArray([2.0f0])
db = CuArray([0.0f0])

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel!(a, da, b, db)
println(db)
Float32[121.32266]

However, if I try to use a Ref to avoid the array, the gradient is zero:

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = Ref(2.0f0)
db = Ref(0.0f0)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel!(a, da, b, db)
println(db)
Base.RefValue{Float32}(0.0f0)

This could also be achieved with Active, but then I need to reduce the gradients either inside or outside the kernel (out-of-kernel version below; an in-kernel sketch follows the output):

function kernel_2!(a, b)
    a[threadIdx().x] = a[threadIdx().x] * b
    return nothing
end

function grad_kernel_2!(a, da, b, db)
    grads = Enzyme.autodiff_deferred(
        Reverse,
        kernel_2!,
        Const,
        Duplicated(a, da),
        Active(b),
    )
    db[threadIdx().x] = grads[1][2]
    return nothing
end

a = CUDA.rand(256)
da = zero(a) .+ 1.0f0
b = 2.0f0
db = CUDA.zeros(256)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel_2!(a, da, b, db)
println(sum(db))
127.38301
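
For the in-kernel reduction, something along these lines should work: accumulate into a one-element buffer with CUDA.@atomic instead of writing one partial per thread. A minimal sketch (grad_kernel_3! is just an illustrative name; it reuses kernel_2!, a, da and b from above):

function grad_kernel_3!(a, da, b, db)
    grads = Enzyme.autodiff_deferred(
        Reverse,
        kernel_2!,
        Const,
        Duplicated(a, da),
        Active(b),
    )
    # accumulate each thread's gradient into a single slot
    CUDA.@atomic db[1] += grads[1][2]
    return nothing
end

db = CUDA.zeros(1)

CUDA.@sync @cuda threads=256 blocks=1 grad_kernel_3!(a, da, b, db)
println(db)  # one accumulated value instead of 256 partials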
@vchuravy (Member)

https://github.com/JuliaGPU/CUDA.jl/blob/940d23d5b9a82e50f79a16ea46d13ca885a4d2de/src/compiler/execution.jl#L129

Ref gets translated to CuRef, which is not mutable. Ref itself is CPU memory...
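
A quick way to see this from the host is cudaconvert, which applies the same argument conversion that @cuda does (sketch; exact wrapper type name as of current CUDA.jl):

using CUDA

r = Ref(2.0f0)
# @cuda runs its arguments through the kernel adaptor; a Base.RefValue
# comes back as the immutable CuRefValue, so device-side writes go to a
# by-value copy and never reach host memory.
typeof(cudaconvert(r))  # CUDA.CuRefValue{Float32}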

cc: @maleadt

Maybe we could provide a Ref backed by unified memory, or pin the memory behind the user's back?
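
Hypothetical sketch of that idea (not current CUDA.jl behavior; assumes the Mem.alloc(Mem.Unified, ...) and unsafe_wrap APIs): back the scalar with unified memory so host and device share the same storage, and hand the kernel a one-element view:

using CUDA

buf = CUDA.Mem.alloc(CUDA.Mem.Unified, sizeof(Float32))
db_dev  = unsafe_wrap(CuArray, convert(CuPtr{Float32}, buf), 1)  # device view
db_host = unsafe_wrap(Array, convert(Ptr{Float32}, buf), 1)      # host view
db_host[1] = 0.0f0

# a kernel accumulating through db_dev would then be visible through
# db_host after CUDA.@sync, without an explicit copy back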

vchuravy added the cuda label Mar 14, 2023
@maleadt commented Mar 14, 2023

That wouldn't work with Ref, and it would also break the adapt that happens now. But it may be worth considering, yes, as it would make Ref behave more like users expect.

@vchuravy (Member)

I am closing this for now, since there is nothing we can do at the Enzyme level.

vchuravy closed this as not planned Mar 14, 2023
@maleadt commented Mar 15, 2023

I've filed an issue, JuliaGPU/CUDA.jl#1803, but I don't have time to prioritize this.
