
Finite difference Hessians? #13

Closed
chriselrod opened this issue Aug 20, 2017 · 6 comments · Fixed by #61

Comments

@chriselrod

My application needs Hessians, and I would rather not take Jacobians of gradients.
I could write up an implementation myself based on: http://www.sciencedirect.com/science/article/pii/S0377042707004086
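For context, a minimal sketch of the first-derivative complex-step trick that the paper's Hessian formulas build on (my illustration; the helper name and step size are just for the example):

# f'(x) ≈ imag(f(x + im*h)) / h: no subtraction of nearly equal numbers,
# so h can be taken near machine precision.
complex_step(f, x, h = 1e-20) = imag(f(x + im*h)) / h
complex_step(sin, 1.0)  # ≈ cos(1.0)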

@dextorious
Contributor

The reference looks good to me. If you want to write up an implementation, I'll be happy to review and pull it in. Otherwise, I can do it myself, but it will probably be a few days before I get to it.

@chriselrod
Author

I'll look at it more in a few hours. The diagonals are right, but with a simple test function (just a symmetric quadratic form, x' S x / 2) the off-diagonals keep coming out as roughly zero.

@chriselrod
Author

chriselrod commented Aug 21, 2017

The complex step methods seem to struggle when matrix or vector multiplication is involved:

julia> function x_plus_e_i!(y::Vector{S}, x::Vector{T}, i::Int, h::S = eps()*im) where S where T
           copy!(y, x)
           y[i] += h
           y
       end
x_plus_e_i! (generic function with 2 methods)

julia> S = randn(20,10) |> x -> x' * x; #positive definite matrix

julia> x = randn(10);

julia> f = x -> x' * S * x / 2 #Gradient = S * x; Hessian = S
(::#3) (generic function with 1 method)

julia> y = Vector{Complex{Float64}}(10);

julia> h = eps()
2.220446049250313e-16

julia> hi = h*im
0.0 + 2.220446049250313e-16im

julia> g = x -> sum(sin, x)
(::#7) (generic function with 1 method)

julia> [imag(g(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 0.726238  0.799128  0.71147  0.867147  0.758384  0.867052  -0.244029  0.150227  0.999054  0.477616

julia> cos.(x)'
1×10 RowVector{Float64,Array{Float64,1}}:
 0.726238  0.799128  0.71147  0.867147  0.758384  0.867052  -0.244029  0.150227  0.999054  0.477616

julia> [imag(f(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 4.44089e-16  3.33067e-16  9.99201e-16  2.74086e-15  1.77636e-15  -5.55112e-16  7.77156e-16  2.22045e-15  5.55112e-17  8.88178e-16

julia> x' * S
1×10 RowVector{Float64,Array{Float64,1}}:
 -0.551427  1.58321  13.8398  16.0132  40.4378  -8.04598  -20.2981  -33.0481  -4.83252  37.1841

julia> function floop(x::AbstractVector{T}, S::Matrix = S) where T
           out = zero(T)
           for i ∈ eachindex(x), j ∈ eachindex(x)
               out += x[i] * x[j] * S[j,i]
           end
           out / 2
       end
floop (generic function with 2 methods)

julia> [imag(floop(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 -0.551427  1.58321  13.8398  16.0132  40.4378  -8.04598  -20.2981  -33.0481  -4.83252  37.1841

f was my test function while struggling to get the Hessian to produce anything other than zero for the off-diagonals.
Even the first derivatives fail on a simple quadratic form when it is expressed as matrix multiplication.
It works just fine if you rewrite the quadratic as a for loop, but that is an awkward limitation to work around, and of course it means one can't take advantage of BLAS.
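I suspect the culprit is that x' on a complex vector is the conjugate transpose: the conjugation flips the perturbation to -ih on the left, which cancels the +ih on the right inside the quadratic form, leaving a purely real result. If that's right, a sketch of a workaround is to use transpose, which doesn't conjugate:

# Hypothetical fix: transpose(x) does not conjugate, so the imaginary
# perturbation survives the quadratic form.
ft = x -> transpose(x) * S * x / 2
# [imag(ft(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10] should then match S * x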

EDIT:
Anyway, this would probably need some work:

function finite_difference_hessian!(df::AbstractArray{T}, f, x::AbstractArray{T}, fdtype::DataType, fx::Union{Void,AbstractArray{T}}, ::Type{Val{:Default}}, avoidable_alloc = Vector{Complex{T}}(length(x))) where T<:Real
    if fdtype == Val{:complex}
        h² = sqrt(eps(T))
        h = sqrt(h²)    # so h^2 == h², i.e. h = eps(T)^(1/4)
        h_im = h*im
        copy!(avoidable_alloc, x)
        fx = f(x)
        for i in eachindex(x)
            avoidable_alloc[i] += h_im
            # diagonal: f_ii ≈ 2(f(x) - real(f(x + im*h*e_i))) / h^2
            df[i,i] = 2 * ( fx - real( f(avoidable_alloc) ) ) / h²
            for j in 1:i-1
                # off-diagonal: f_ij ≈ imag(f(x + im*h*e_i + h*e_j) - f(x + im*h*e_i - h*e_j)) / 2h^2
                avoidable_alloc[j] += h
                Hij = f(avoidable_alloc)
                avoidable_alloc[j] = x[j] - h
                df[i,j] = df[j,i] = imag(Hij - f(avoidable_alloc))/2h²
                avoidable_alloc[j] = x[j]
            end
            avoidable_alloc[i] = x[i]
        end
    else
        throw("Not yet implemented.")
    end
    df
end

For example, the author of that paper found higher accuracy using steps of ±i^(1/2) (that is, +i^(1/2) vs. +i^(5/2)).

That said, we have:

julia> df = similar(S); avoidable_alloc = Vector{Complex{Float64}}(10);

julia> using BenchmarkTools, ForwardDiff

julia> function floopsym(x::AbstractVector{T}, S::Matrix) where T
           out = zero(T)
           @inbounds for i ∈ eachindex(x)
               out += x[i]^2 * S[i,i] / 2
               for j ∈ 1:i-1
                   out += x[i] * x[j] * S[j,i]
               end
           end
           out
       end
floopsym (generic function with 2 methods)

julia> S
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> ForwardDiff.hessian!(df, x -> floopsym(x, S), x)
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> finite_difference_hessian!(df, x -> floopsym(x, S), x, Val{:complex}, nothing, Val{:Default}, avoidable_alloc)
10×10 Array{Float64,2}:
 14.9324      0.0182406   1.20413   -2.30786    -0.373463   5.97281   -1.10479   -5.16091   1.49971    4.65925  
  0.0182406  11.8286     -3.18068   -2.66351     2.15562    0.816386   1.71965    1.08352   0.759179   1.26882  
  1.20413    -3.18068    10.1553     2.24107     2.06688    3.39954   -0.404018  -4.17523  -1.31675    1.32567  
 -2.30786    -2.66351     2.24107   18.4213      3.17548   -5.60214    1.47782   -1.49102   1.81466   -0.0514479
 -0.373463    2.15562     2.06688    3.17548    25.251     -5.88625    0.367059  -4.07004  -1.04861    8.79373  
  5.97281     0.816386    3.39954   -5.60214    -5.88625   18.3602    -2.35115   -4.87581  -4.50786   -0.950041 
 -1.10479     1.71965    -0.404018   1.47782     0.367059  -2.35115   14.523     -2.02552   2.01434   -0.706608 
 -5.16091     1.08352    -4.17523   -1.49102    -4.07004   -4.87581   -2.02552   22.7344    2.35894   -4.45734  
  1.49971     0.759179   -1.31675    1.81466    -1.04861   -4.50786    2.01434    2.35894  14.6459     0.616328 
  4.65925     1.26882     1.32567   -0.0514479   8.79373   -0.950041  -0.706608  -4.45734   0.616328  22.8475   

julia> @benchmark ForwardDiff.hessian!($df, x -> floopsym(x, $S), $x)
BenchmarkTools.Trial: 
  memory estimate:  21.42 KiB
  allocs estimate:  12
  --------------
  minimum time:     40.628 μs (0.00% GC)
  median time:      41.814 μs (0.00% GC)
  mean time:        43.800 μs (2.53% GC)
  maximum time:     1.415 ms (91.52% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark finite_difference_hessian!($df, x -> floopsym(x, $S), $x, $Val{:complex}, $nothing, $Val{:Default}, $avoidable_alloc)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     12.915 μs (0.00% GC)
  median time:      13.122 μs (0.00% GC)
  mean time:        13.162 μs (0.00% GC)
  maximum time:     25.761 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

There's an allocation coming from somewhere?
Unfortunately, the inability to handle actual matrix multiplication makes me hesitant to make this a default option. Although one could say it's really matrix operations that come off looking bad here:

julia> @benchmark ForwardDiff.hessian!($df, x -> x' * $S * x / 2, $x)
BenchmarkTools.Trial: 
  memory estimate:  31.05 KiB
  allocs estimate:  13
  --------------
  minimum time:     93.520 μs (0.00% GC)
  median time:      94.931 μs (0.00% GC)
  mean time:        97.900 μs (1.66% GC)
  maximum time:     1.513 ms (87.37% GC)
  --------------
  samples:          10000
  evals/sample:     1

@ChrisRackauckas
Member

Reference: https://v8doc.sas.com/sashtml/ormp/chap5/sect28.htm
And the complex form:
http://www.sciencedirect.com/science/article/pii/S0377042707004086

I'll implement these in the same manner that the Jacobians are now done. That shouldn't be so difficult.
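For reference, the central-difference formula from the SAS page would look roughly like this (a sketch under my reading of that page, not the final package API):

function fd_hessian(f, x::AbstractVector{T}) where T<:Real
    n = length(x)
    h = eps(T)^(1/4)  # rough optimal step for second-order central differences
    H = zeros(T, n, n)
    for i in 1:n, j in 1:i
        xpp = copy(x); xpp[i] += h; xpp[j] += h
        xpm = copy(x); xpm[i] += h; xpm[j] -= h
        xmp = copy(x); xmp[i] -= h; xmp[j] += h
        xmm = copy(x); xmm[i] -= h; xmm[j] -= h
        # f_ij ≈ (f(x+he_i+he_j) - f(x+he_i-he_j) - f(x-he_i+he_j) + f(x-he_i-he_j)) / 4h^2
        H[i,j] = H[j,i] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4h^2)
    end
    H
end

When i == j this collapses to the usual second difference (f(x+2he_i) - 2f(x) + f(x-2he_i)) / 4h^2.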

@ChrisRackauckas
Member

Now that the others are in and the documentation is on the README, the general shape of the API is pretty solidified. If you just follow how the gradients and Jacobians look, then it'll be great. So now someone just needs to tackle it!
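Presumably the signature would mirror the existing ones, something like this (hypothetical, just extrapolating from the gradient/Jacobian API):

# hypothetical call, by analogy with the existing functions:
# finite_difference_hessian!(H, f, x, Val{:central})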

@ChrisRackauckas
Member

I've ignored complex step for now since it seems to have issues, and AD is just better if you have that kind of control over f in this case.
