Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion for non-allocating Jacobian #120

Closed
danielwe opened this issue Nov 25, 2020 · 7 comments
Closed

Suggestion for non-allocating Jacobian #120

danielwe opened this issue Nov 25, 2020 · 7 comments

Comments

@danielwe
Copy link

danielwe commented Nov 25, 2020

I'm aware of #104 which was closed by #113, but Jacobians still allocate when using StaticArrays, even with a preallocated cache. Here's a barebones solution for a cache-free non-allocating Jacobian:

using StaticArrays
using BenchmarkTools

const RELSTEP = sqrt(eps(Float64))
const ABSSTEP = RELSTEP

function step(x)
    return max(RELSTEP * abs(x), ABSSTEP)
end

function finite_difference_jacobian(f, x)
    return finite_difference_jacobian(f, x, f(x))
end

@generated function finite_difference_jacobian(f, x::SVector{L}, fx) where {L}
    if L > 32
        # dimension too large for non-allocating Jacobian calculation
        return :(finite_difference_jacobian_fallback(f, x, fx))
    end
    ex = Expr(:call, :hcat)
    for i in 1:L
        push!(ex.args, :(finite_difference_jacobian_column(f, x, fx, $i)))
    end
    return ex
end

function finite_difference_jacobian_column(f, x::SVector{L}, fx, i) where {L}
    dxi = (x[i] + step(x[i])) - x[i]  # Floating point accuracy trick https://twitter.com/willkurt/status/1330183861452541953
    xdx = SVector{L}(j == i ? x[j] + dxi : x[j] for j in 1:L)
    return (f(xdx) - fx) / dxi
end

f(x) = @SVector [x[2]^3, x[1]^2, x[1] - 3x[2]]
x = @SVector rand(2)

println("FiniteDiff.finite_difference_jacobian:")
using FiniteDiff; cache = FiniteDiff.JacobianCache(x)
@btime FiniteDiff.finite_difference_jacobian(f, $x, $cache)

println("Custom finite_difference_jacobian:")
@btime finite_difference_jacobian(f, Ref($x)[])

Output:

FiniteDiff.finite_difference_jacobian:
  272.326 ns (21 allocations: 624 bytes)
Custom finite_difference_jacobian:
  15.568 ns (0 allocations: 0 bytes)

Could a specialization along these lines for small SVector problems be relevant for ForwardDiff? Unfortunately, I don't have the time to dig into the codebase and make a PR right now.

I had to use a generated function to hcat an arbitrary number of columns without splatting a generator, which apparently always allocates. Let me know if you know a better solution.


Edit: Included an ABSSTEP.
Edit 2: Fixed benchmark that was optimized away.

@Marc-Cox-08
Copy link

Wow 0.018 ns is faster than lightning ! As Admiral Grace Hopper was fond of saying In one nanosecond light travels about 0.3 meters . The only thing I can guess that is that fast is CPU registry memory ? Maybe that is why the check for " if L > 32 " works ? But my CPU L1 memory cache is much wider than 32 , so hmmm ?

And about "I had to use a generated function to hcat .. which apparently always allocates. Let me know if you know a better solution." Suggest Take a look at >> my example using RuntimeGeneratedFunction at this link >>
https://discourse.julialang.org/t/setting-approxfun-space-tried-several-tricks-but-still-receive-error-loaderror-cannot-infer-spaces-specifically-when-using-generalizedgenerated-jl-for-generalized-generated-functions/48909/17?u=marc.cox

using RuntimeGeneratedFunction
. . .
julia> @btime f17(7)
  26.122 ns (1 allocation: 16 bytes)
7.000000000000001

@YingboMa
Copy link
Member

When @btime reports sub-nanosecond, that implies that the actual computation is optimized away by the compiler. The correct benchmark should be

julia> @btime finite_difference_jacobian(f, Ref($x)[])
  12.866 ns (0 allocations: 0 bytes)
3×2 SMatrix{3, 2, Float64, 6} with indices SOneTo(3)×SOneTo(2):
 0.0        1.01871
 0.298806   0.0
 1.0       -3.0

@danielwe
Copy link
Author

Thanks, benchmark updated! 0.018 ns did seem a little too good to be true, although I guess it isn't a bad sign per se when the compiler is able to optimize away the whole thing.

I set the upper bound to L = 32 because during my initial prototyping the function seemed to start allocating when passing more than 32 arguments to hcat. But when testing the current implementation the largest non-allocating value seems to be L = 34, go figure.

For L > 34 this implementation is still faster and allocates less than ForwardDiff's, but I guess you gotta stop somewhere. I don't know how many versions of the generated function one should allow. In fact, I'm not sure I understand how generated functions work at all. I was expecting it to compile a new method for each L, such that, for example, after calling the function on xs of lengths 2, 3, and 7 one would get something like this:

julia> methods(finite_difference_jacobian)
# 4 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{2},T,1,2} where T, fx) in Main at <filename>:15
[2] finite_difference_jacobian(f, x::SArray{Tuple{3},T,1,3} where T, fx) in Main at <filename>:15
[3] finite_difference_jacobian(f, x::SArray{Tuple{7},T,1,7} where T, fx) in Main at <filename>:15
[4] finite_difference_jacobian(f, x) in Main at <filename>:11

But that's not what happens, you only ever see this:

julia> methods(finite_difference_jacobian)
# 2 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{L},T,1,L} where T, fx) where L in Main at <filename>:15
[2] finite_difference_jacobian(f, x) in Main at <filename>:11

@ChrisRackauckas
Copy link
Member

But that's not what happens, you only ever see this:

It only shows you that, but it's still specializing on each one.

This is a good idea and we should include such a specialization.

@YingboMa
Copy link
Member

This is great! Do you mind to make PR and add some tests?

@danielwe
Copy link
Author

I'll get to it eventually, but it doesn't look like it's gonna be a straight copy&paste and it's gonna be a couple of days before I'm able to sit down and familiarize myself with the existing code. If anyone else wants to take a stab in the meantime, feel free---that's why I wanted to share the example right away.

@Marc-Cox-08
Copy link

Hi Daniel @danielwe I'll see if I get a chance to take a stab at it.
It will probably be more efficient if you elaborate about why " it doesn't look like it's gonna be a straight copy&paste ",
And I'm disinclined to a lot of public back and forth in Github so feel free to message me on Discourse https://discourse.julialang.org/u/marc.cox/summary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants