Inverse Complex-to-Real FFT allocates GPU memory #2249
Comments
Known and expected; this is a bug in CUFFT, and NVIDIA has updated the documentation to indicate that these operations are expected to mutate inputs, so we need to take a copy of them.
Sorry for commenting on this closed issue, but in some applications one does not need the input data after the transform has been computed. Would it make sense to add an "advanced" interface allowing a user to explicitly specify that they're OK with CUFFT overwriting input arrays? For example, this could be controlled by an optional keyword argument. I can make a PR with the changes if that's an acceptable solution.
I think that would be fine. Maybe it would make sense to coordinate such a change with AbstractFFTs.jl though; @stevengj does this kind of problem (where computing an FFT mutates inputs) happen with other FFT back-ends as well?
Thanks for your answer. I agree, this would be better coordinated at the level of AbstractFFTs.jl. Just note that the mutating behaviour of CUFFT on complex-to-real transforms also exists in FFTW.
In FFTW.jl this is also the case when using the non-allocating interface:
using FFTW
using LinearAlgebra
û = rand(ComplexF64, 21, 30)
û_orig = copy(û)
# p = plan_brfft(û, 40; flags = FFTW.PRESERVE_INPUT) # only works for 1D inputs
p = plan_brfft(û, 40)
v = p * û # always preserves input?
norm(û - û_orig) # = 0 (input preserved)
mul!(v, p, û) # destroys input
norm(û - û_orig) # ≠ 0 (input was modified)
In that case, I guess we shouldn't default to making a preserving copy unless the user requested that on plan creation? That would be a breaking change, though.
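To make the trade-off discussed above concrete, here is a minimal sketch of an opt-in wrapper, written against FFTW.jl on the CPU so it runs without a GPU. The helper name `apply_c2r!` and its `preserve_input` keyword are hypothetical, introduced only for illustration; they are not part of FFTW.jl, CUDA.jl, or AbstractFFTs.jl, and this is not how any of those packages implement it.

```julia
using FFTW
using LinearAlgebra

# Hypothetical helper (an assumption, not an existing API): applies an
# unnormalized complex-to-real plan and optionally protects the input.
function apply_c2r!(v, p, û; preserve_input::Bool = true)
    # Copying keeps the caller's array intact at the cost of one allocation.
    src = preserve_input ? copy(û) : û
    mul!(v, p, src)  # the backend is allowed to overwrite `src`
    return v
end

û = rand(ComplexF64, 21, 30)
û_orig = copy(û)
p = plan_brfft(û, 40)               # real output has size (40, 30)
v = Array{Float64}(undef, 40, 30)

apply_c2r!(v, p, û)                 # default: only a copy is handed to the plan
@show norm(û - û_orig)

apply_c2r!(v, p, û; preserve_input = false)  # caller accepts an overwritten input
@show norm(û - û_orig)
```

With the default, the caller's array is never passed to the plan, so it cannot be modified. With `preserve_input = false` the extra copy is skipped and the input may be overwritten, as in the `mul!` example above. Which of the two should be the default is exactly the breaking-change question raised in this thread.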
Describe the bug
Inverse Complex-to-Real FFT allocates GPU memory, whereas inverse Complex-to-Complex FFT does not.
To reproduce
The Minimal Working Example (MWE) for this bug:
Manifest.toml
Expected behavior
No allocations?
Version info
Details on Julia:
Details on CUDA: