
Finite difference Hessians? #13

Closed
chriselrod opened this issue Aug 20, 2017 · 6 comments · Fixed by #61

Comments

@chriselrod

My application needs Hessians, and I would rather not take Jacobians of gradients.
I could write up an implementation myself based on: http://www.sciencedirect.com/science/article/pii/S0377042707004086
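For context, a minimal sketch of the first-derivative complex-step trick that the paper's Hessian formulas build on (my illustration; the helper name and step size are just for the example):

# f'(x) ≈ imag(f(x + im*h)) / h: no subtraction of nearly equal numbers,
# so h can be taken near machine precision.
complex_step(f, x, h = 1e-20) = imag(f(x + im*h)) / h
complex_step(sin, 1.0)  # ≈ cos(1.0)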

@dextorious
Contributor

The reference looks good to me. If you want to write up an implementation, I'll be happy to review and pull it in. Otherwise, I can do it myself, but it will probably be a few days before I get to it.

@chriselrod
Author

I'll look at it more in a few hours. The diagonals are right, but with a simple test function (just a symmetric quadratic form, x' S x / 2) the off-diagonals keep coming out as roughly zero.

@chriselrod
Author

chriselrod commented Aug 21, 2017

The complex step methods seem to struggle when matrix or vector multiplication is involved:

julia> function x_plus_e_i!(y::Vector{S}, x::Vector{T}, i::Int, h::S = eps()*im) where S where T
           copy!(y, x)
           y[i] += h
           y
       end
x_plus_e_i! (generic function with 2 methods)

julia> S = randn(20,10) |> x -> x' * x; #positive definite matrix

julia> x = randn(10);

julia> f = x -> x' * S * x / 2 #Gradient = S * x; Hessian = S
(::#3) (generic function with 1 method)

julia> y = Vector{Complex{Float64}}(10);

julia> h = eps()
2.220446049250313e-16

julia> hi = h*im
0.0 + 2.220446049250313e-16im

julia> g = x -> sum(sin, x)
(::#7) (generic function with 1 method)

julia> [imag(g(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 0.726238  0.799128  0.71147  0.867147  0.758384  0.867052  -0.244029  0.150227  0.999054  0.477616

julia> cos.(x)'
1×10 RowVector{Float64,Array{Float64,1}}:
 0.726238  0.799128  0.71147  0.867147  0.758384  0.867052  -0.244029  0.150227  0.999054  0.477616

julia> [imag(f(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 4.44089e-16  3.33067e-16  9.99201e-16  2.74086e-15  1.77636e-15  -5.55112e-16  7.77156e-16  2.22045e-15  5.55112e-17  8.88178e-16

julia> x' * S
1×10 RowVector{Float64,Array{Float64,1}}:
 -0.551427  1.58321  13.8398  16.0132  40.4378  -8.04598  -20.2981  -33.0481  -4.83252  37.1841

julia> function floop(x::AbstractVector{T}, S::Matrix = S) where T
           out = zero(T)
           for i ∈ eachindex(x), j ∈ eachindex(x)
               out += x[i] * x[j] * S[j,i]
           end
           out / 2
       end
floop (generic function with 2 methods)

julia> [imag(floop(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 -0.551427  1.58321  13.8398  16.0132  40.4378  -8.04598  -20.2981  -33.0481  -4.83252  37.1841

f was my test function while struggling to get the Hessian to produce anything other than zero for the off-diagonals.
Even the first derivatives fail on a simple quadratic form when it is expressed as matrix multiplication.
It works just fine if you rewrite the quadratic as a for loop, but that is an awkward limitation to work around, and of course it means one can't take advantage of BLAS.
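I suspect the culprit is that x' on a complex vector is the conjugate transpose: the conjugation flips the perturbation to -ih on the left, which cancels the +ih on the right inside the quadratic form, leaving a purely real result. If that's right, a sketch of a workaround is to use transpose, which doesn't conjugate:

# Hypothetical fix: transpose(x) does not conjugate, so the imaginary
# perturbation survives the quadratic form.
ft = x -> transpose(x) * S * x / 2
# [imag(ft(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10] should then match S * x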

EDIT:
Anyway, this would probably need some work:

function finite_difference_hessian!(df::AbstractArray{T}, f, x::AbstractArray{T}, fdtype::DataType, fx::Union{Void,AbstractArray{T}}, ::Type{Val{:Default}}, avoidable_alloc = Vector{Complex{T}}(length(x))) where T<:Real
    if fdtype == Val{:complex}
        h² = sqrt(eps(T))
        h = sqrt(h²)    # so h^2 == h², i.e. h = eps(T)^(1/4)
        h_im = h*im
        copy!(avoidable_alloc, x)
        fx = f(x)
        for i in eachindex(x)
            avoidable_alloc[i] += h_im
            # diagonal: f_ii ≈ 2(f(x) - real(f(x + im*h*e_i))) / h^2
            df[i,i] = 2 * ( fx - real( f(avoidable_alloc) ) ) / h²
            for j in 1:i-1
                # off-diagonal: f_ij ≈ imag(f(x + im*h*e_i + h*e_j) - f(x + im*h*e_i - h*e_j)) / 2h^2
                avoidable_alloc[j] += h
                Hij = f(avoidable_alloc)
                avoidable_alloc[j] = x[j] - h
                df[i,j] = df[j,i] = imag(Hij - f(avoidable_alloc))/2h²
                avoidable_alloc[j] = x[j]
            end
            avoidable_alloc[i] = x[i]
        end
    else
        throw("Not yet implemented.")
    end
    df
end

For example, the author of that paper found higher accuracy using steps of ±i^(1/2) (that is, +i^(1/2) vs. +i^(5/2)).

That said, we have:

julia> df = similar(S); avoidable_alloc = Vector{Complex{Float64}}(10);

julia> using BenchmarkTools, ForwardDiff

julia> function floopsym(x::AbstractVector{T}, S::Matrix) where T
           out = zero(T)
           @inbounds for i ∈ eachindex(x)
               out += x[i]^2 * S[i,i] / 2
               for j ∈ 1:i-1
                   out += x[i] * x[j] * S[j,i]
               end
           end
           out
       end
floopsym (generic function with 2 methods)

julia> S
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> ForwardDiff.hessian!(df, x -> floopsym(x, S), x)
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> finite_difference_hessian!(df, x -> floopsym(x, S), x, Val{:complex}, nothing, Val{:Default}, avoidable_alloc)
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> @benchmark ForwardDiff.hessian!($df, x -> floopsym(x, $S), $x)
BenchmarkTools.Trial: 
  memory estimate:  21.42 KiB
  allocs estimate:  12
  --------------
  minimum time:     40.628 μs (0.00% GC)
  median time:      41.814 μs (0.00% GC)
  mean time:        43.800 μs (2.53% GC)
  maximum time:     1.415 ms (91.52% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark finite_difference_hessian!($df, x -> floopsym(x, $S), $x, $Val{:complex}, $nothing, $Val{:Default}, $avoidable_alloc)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     12.915 μs (0.00% GC)
  median time:      13.122 μs (0.00% GC)
  mean time:        13.162 μs (0.00% GC)
  maximum time:     25.761 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

There's an allocation coming from somewhere?
Unfortunately, the inability to handle actual matrix multiplication makes me hesitant to make this a default option. Although one could say it's really matrix operations that come off looking bad here:

julia> @benchmark ForwardDiff.hessian!($df, x -> x' * $S * x / 2, $x)
BenchmarkTools.Trial: 
  memory estimate:  31.05 KiB
  allocs estimate:  13
  --------------
  minimum time:     93.520 μs (0.00% GC)
  median time:      94.931 μs (0.00% GC)
  mean time:        97.900 μs (1.66% GC)
  maximum time:     1.513 ms (87.37% GC)
  --------------
  samples:          10000
  evals/sample:     1

@ChrisRackauckas
Member

Reference: https://v8doc.sas.com/sashtml/ormp/chap5/sect28.htm
And the complex form:
http://www.sciencedirect.com/science/article/pii/S0377042707004086

I'll implement these in the same manner that the Jacobians are now done. That shouldn't be so difficult.
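For reference, the central-difference formula from the SAS page would look roughly like this (a sketch under my reading of that page, not the final package API):

function fd_hessian(f, x::AbstractVector{T}) where T<:Real
    n = length(x)
    h = eps(T)^(1/4)  # rough optimal step for second-order central differences
    H = zeros(T, n, n)
    for i in 1:n, j in 1:i
        xpp = copy(x); xpp[i] += h; xpp[j] += h
        xpm = copy(x); xpm[i] += h; xpm[j] -= h
        xmp = copy(x); xmp[i] -= h; xmp[j] += h
        xmm = copy(x); xmm[i] -= h; xmm[j] -= h
        # f_ij ≈ (f(x+he_i+he_j) - f(x+he_i-he_j) - f(x-he_i+he_j) + f(x-he_i-he_j)) / 4h^2
        H[i,j] = H[j,i] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4h^2)
    end
    H
end

When i == j this collapses to the usual second difference (f(x+2he_i) - 2f(x) + f(x-2he_i)) / 4h^2.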

@ChrisRackauckas
Member

Now that the others are in and the documentation is on the README, the general shape of the API is pretty solidified. If you just follow how the gradients and Jacobians look, then it'll be great. So now someone just needs to tackle it!
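Presumably the signature would mirror the existing ones, something like this (hypothetical, just extrapolating from the gradient/Jacobian API):

# hypothetical call, by analogy with the existing functions:
# finite_difference_hessian!(H, f, x, Val{:central})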

@ChrisRackauckas
Member

I've ignored complex step for now since it seems to have issues, and AD is just better if you have that kind of control over f in this case.
