Finite difference Hessians? #13

My application needs Hessians, and I would rather not take Jacobians of gradients.
I could write up an implementation myself based on: http://www.sciencedirect.com/science/article/pii/S0377042707004086

The reference looks good to me. If you want to write up an implementation, I'll be happy to review and pull it in. Otherwise, I can do it myself, but it will probably be a few days before I get to it.

I'll look at it more in a few hours. The diagonals are right, but I can't seem to get off-diagonals that aren't roughly zero with a simple test function (just a symmetric quadratic form, x' S x / 2).
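
For reference, the standard complex-step rule everything below builds on: for real-analytic f, f(x + ih) = f(x) + ih·f'(x) - h²·f''(x)/2 + O(h³), so imag(f(x + ih))/h ≈ f'(x) with no subtractive cancellation, which is why h can sit near machine epsilon. A minimal sketch (generic Julia, not package code):

```julia
# Complex-step first derivative: imag(f(x + ih)) / h = f'(x) + O(h^2).
# No subtraction of nearly equal numbers, so a tiny h is safe.
f(x)  = sin(x)
x0, h = 0.7, 1e-20
imag(f(x0 + h*im)) / h ≈ cos(x0)   # true
```
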
The complex step method seems to struggle when matrix or vector multiplication is involved:

```julia
julia> function x_plus_e_i!(y::Vector{S}, x::Vector{T}, i::Int, h::S = eps()*im) where S where T
           copy!(y, x)
           y[i] += h
           y
       end
x_plus_e_i! (generic function with 2 methods)
julia> S = randn(20,10) |> x -> x' * x; #positive definite matrix
julia> x = randn(10);
julia> f = x -> x' * S * x / 2 #Gradient = S * x; Hessian = S
(::#3) (generic function with 1 method)
julia> y = Vector{Complex{Float64}}(10);
julia> h = eps()
2.220446049250313e-16
julia> hi = h*im
0.0 + 2.220446049250313e-16im
julia> g = x -> sum(sin, x)
(::#7) (generic function with 1 method)
julia> [imag(g(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
0.726238 0.799128 0.71147 0.867147 0.758384 0.867052 -0.244029 0.150227 0.999054 0.477616
julia> cos.(x)'
1×10 RowVector{Float64,Array{Float64,1}}:
0.726238 0.799128 0.71147 0.867147 0.758384 0.867052 -0.244029 0.150227 0.999054 0.477616
julia> [imag(f(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
4.44089e-16 3.33067e-16 9.99201e-16 2.74086e-15 1.77636e-15 -5.55112e-16 7.77156e-16 2.22045e-15 5.55112e-17 8.88178e-16
julia> x' * S
1×10 RowVector{Float64,Array{Float64,1}}:
-0.551427 1.58321 13.8398 16.0132 40.4378 -8.04598 -20.2981 -33.0481 -4.83252 37.1841
julia> function floop(x::AbstractVector{T}, S::Matrix = S) where T
           out = zero(T)
           for i ∈ eachindex(x), j ∈ eachindex(x)
               out += x[i] * x[j] * S[j,i]
           end
           out / 2
       end
floop (generic function with 2 methods)
julia> [imag(floop(x_plus_e_i!(y, x, i, hi))) / h for i ∈ 1:10]'
1×10 RowVector{Float64,Array{Float64,1}}:
 -0.551427  1.58321  13.8398  16.0132  40.4378  -8.04598  -20.2981  -33.0481  -4.83252  37.1841
```

`f` above was my test function while I was struggling to get the Hessian to produce anything other than zero for the off-diagonals.

EDIT:

```julia
function finite_difference_hessian!(df::AbstractArray{T}, f, x::AbstractArray{T},
                                    fdtype::DataType, fx::Union{Void,AbstractArray{T}},
                                    ::Type{Val{:Default}},
                                    avoidable_alloc = Vector{Complex{T}}(length(x))) where T<:Real
    if fdtype == Val{:complex}
        h² = sqrt(eps(T))   # so h = eps(T)^(1/4)
        h = sqrt(h²)
        h_im = h*im
        copy!(avoidable_alloc, x)
        fx = f(x)
        for i in eachindex(x)
            # Diagonal: real(f(x + ih*e_i)) ≈ f(x) - h² * H[i,i] / 2
            avoidable_alloc[i] += h_im
            df[i,i] = 2 * ( fx - real( f(avoidable_alloc) ) ) / h²
            for j in 1:i-1
                # Off-diagonal: imag(f(x + ih*e_i + h*e_j) - f(x + ih*e_i - h*e_j)) ≈ 2h² * H[i,j]
                avoidable_alloc[j] += h
                Hij = f(avoidable_alloc)
                avoidable_alloc[j] = x[j] - h
                df[i,j] = df[j,i] = imag(Hij - f(avoidable_alloc)) / 2h²
                avoidable_alloc[j] = x[j]
            end
            avoidable_alloc[i] = x[i]
        end
    else
        throw("Not yet implemented.")
    end
    df
end
```

It looks like the author of that paper found higher accuracy using ±i^(1/2) steps (i.e. +i^(1/2) vs. +i^(5/2), which is -i^(1/2)).
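
For context, here is how I read the ±i^(1/2) idea (a sketch reconstructed from the paper, with hypothetical naming, not the implementation above): stepping by u = h·e^(iπ/4) and by -u puts the second derivative into the imaginary part of the sum, while the h⁴ Taylor term (coefficient e^(iπ) = -1) stays real:

```julia
# ±i^(1/2) second-derivative rule (a reconstruction, not the thread's code).
# With u = h*cis(π/4):
#   imag(f(x + u) + f(x - u)) = h^2 * f''(x) + O(h^6)
function second_deriv_halfstep(f, x::Real; h = eps(typeof(float(x)))^(1/6))
    u = h * cis(π/4)              # step along i^(1/2); -u is the i^(5/2) step
    imag(f(x + u) + f(x - u)) / h^2
end

second_deriv_halfstep(sin, 0.7)   # ≈ -sin(0.7)
```

Because the truncation error enters at O(h⁴) relative to f'', a step near eps^(1/6) balances it against roundoff in the imaginary part.
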
That said, we have:

```julia
julia> df = similar(S); avoidable_alloc = Vector{Complex{Float64}}(10);

julia> using BenchmarkTools, ForwardDiff
julia> function floopsym(x::AbstractVector{T}, S::Matrix) where T
           out = zero(T)
           @inbounds for i ∈ eachindex(x)
               out += x[i]^2 * S[i,i] / 2
               for j ∈ 1:i-1
                   out += x[i] * x[j] * S[j,i]
               end
           end
           out
       end
floopsym (generic function with 2 methods)
julia> S
10×10 Array{Float64,2}:
14.9324 0.0182406 1.20413 -2.30786 -0.373463 5.97281 -1.10479 -5.16091 1.49971 4.65925
0.0182406 11.8286 -3.18068 -2.66351 2.15562 0.816386 1.71965 1.08352 0.759179 1.26882
1.20413 -3.18068 10.1553 2.24107 2.06688 3.39954 -0.404018 -4.17523 -1.31675 1.32567
-2.30786 -2.66351 2.24107 18.4213 3.17548 -5.60214 1.47782 -1.49102 1.81466 -0.0514479
-0.373463 2.15562 2.06688 3.17548 25.251 -5.88625 0.367059 -4.07004 -1.04861 8.79373
5.97281 0.816386 3.39954 -5.60214 -5.88625 18.3602 -2.35115 -4.87581 -4.50786 -0.950041
-1.10479 1.71965 -0.404018 1.47782 0.367059 -2.35115 14.523 -2.02552 2.01434 -0.706608
-5.16091 1.08352 -4.17523 -1.49102 -4.07004 -4.87581 -2.02552 22.7344 2.35894 -4.45734
1.49971 0.759179 -1.31675 1.81466 -1.04861 -4.50786 2.01434 2.35894 14.6459 0.616328
4.65925 1.26882 1.32567 -0.0514479 8.79373 -0.950041 -0.706608 -4.45734 0.616328 22.8475
julia> ForwardDiff.hessian!(df, x -> floopsym(x, S), x)
10×10 Array{Float64,2}:
14.9324 0.0182406 1.20413 -2.30786 -0.373463 5.97281 -1.10479 -5.16091 1.49971 4.65925
0.0182406 11.8286 -3.18068 -2.66351 2.15562 0.816386 1.71965 1.08352 0.759179 1.26882
1.20413 -3.18068 10.1553 2.24107 2.06688 3.39954 -0.404018 -4.17523 -1.31675 1.32567
-2.30786 -2.66351 2.24107 18.4213 3.17548 -5.60214 1.47782 -1.49102 1.81466 -0.0514479
-0.373463 2.15562 2.06688 3.17548 25.251 -5.88625 0.367059 -4.07004 -1.04861 8.79373
5.97281 0.816386 3.39954 -5.60214 -5.88625 18.3602 -2.35115 -4.87581 -4.50786 -0.950041
-1.10479 1.71965 -0.404018 1.47782 0.367059 -2.35115 14.523 -2.02552 2.01434 -0.706608
-5.16091 1.08352 -4.17523 -1.49102 -4.07004 -4.87581 -2.02552 22.7344 2.35894 -4.45734
1.49971 0.759179 -1.31675 1.81466 -1.04861 -4.50786 2.01434 2.35894 14.6459 0.616328
4.65925 1.26882 1.32567 -0.0514479 8.79373 -0.950041 -0.706608 -4.45734 0.616328 22.8475
julia> finite_difference_hessian!(df, x -> floopsym(x, S), x, Val{:complex}, nothing, Val{:Default}, avoidable_alloc)
10×10 Array{Float64,2}:
14.9324 0.0182406 1.20413 -2.30786 -0.373463 5.97281 -1.10479 -5.16091 1.49971 4.65925
0.0182406 11.8286 -3.18068 -2.66351 2.15562 0.816386 1.71965 1.08352 0.759179 1.26882
1.20413 -3.18068 10.1553 2.24107 2.06688 3.39954 -0.404018 -4.17523 -1.31675 1.32567
-2.30786 -2.66351 2.24107 18.4213 3.17548 -5.60214 1.47782 -1.49102 1.81466 -0.0514479
-0.373463 2.15562 2.06688 3.17548 25.251 -5.88625 0.367059 -4.07004 -1.04861 8.79373
5.97281 0.816386 3.39954 -5.60214 -5.88625 18.3602 -2.35115 -4.87581 -4.50786 -0.950041
-1.10479 1.71965 -0.404018 1.47782 0.367059 -2.35115 14.523 -2.02552 2.01434 -0.706608
-5.16091 1.08352 -4.17523 -1.49102 -4.07004 -4.87581 -2.02552 22.7344 2.35894 -4.45734
1.49971 0.759179 -1.31675 1.81466 -1.04861 -4.50786 2.01434 2.35894 14.6459 0.616328
4.65925 1.26882 1.32567 -0.0514479 8.79373 -0.950041 -0.706608 -4.45734 0.616328 22.8475
julia> @benchmark ForwardDiff.hessian!($df, x -> floopsym(x, $S), $x)
BenchmarkTools.Trial:
memory estimate: 21.42 KiB
allocs estimate: 12
--------------
minimum time: 40.628 μs (0.00% GC)
median time: 41.814 μs (0.00% GC)
mean time: 43.800 μs (2.53% GC)
maximum time: 1.415 ms (91.52% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark finite_difference_hessian!($df, x -> floopsym(x, $S), $x, $Val{:complex}, $nothing, $Val{:Default}, $avoidable_alloc)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 12.915 μs (0.00% GC)
median time: 13.122 μs (0.00% GC)
mean time: 13.162 μs (0.00% GC)
maximum time: 25.761 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
```

There's an allocation somewhere?

```julia
julia> @benchmark ForwardDiff.hessian!($df, x -> x' * $S * x / 2, $x)
BenchmarkTools.Trial:
memory estimate: 31.05 KiB
allocs estimate: 13
--------------
minimum time: 93.520 μs (0.00% GC)
median time: 94.931 μs (0.00% GC)
mean time: 97.900 μs (1.66% GC)
maximum time: 1.513 ms (87.37% GC)
--------------
samples: 10000
evals/sample: 1
```
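
A likely explanation for why f = x -> x' * S * x / 2 returned essentially zero imaginary parts above, while the explicit loops worked: for a complex vector, ' is the conjugate transpose, so it conjugates the +ih perturbation and the O(h) imaginary terms of the symmetric quadratic form cancel exactly. The loop versions multiply x[i] * x[j] without conjugating. A small illustration with made-up data:

```julia
# Adjoint conjugates the complex-step perturbation; transpose does not.
# (Tiny hypothetical example, not the thread's 10×10 data.)
using LinearAlgebra

S  = [2.0 1.0; 1.0 3.0]              # small symmetric matrix
x  = [0.5, -1.0]
h  = eps()
xp = complex.(x); xp[1] += h*im      # x + ih*e_1

f_adj(v) = v' * S * v / 2            # conjugates v: imaginary parts cancel
f_tr(v)  = transpose(v) * S * v / 2  # no conjugation

imag(f_adj(xp)) / h                  # ≈ 0.0 — the gradient information is lost
imag(f_tr(xp)) / h                   # ≈ (S * x)[1], the first gradient entry
```
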
Reference: https://v8doc.sas.com/sashtml/ormp/chap5/sect28.htm

I'll implement these in the same manner as the Jacobians are now done. That shouldn't be so difficult.

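For anyone picking this up: the SAS page gives the standard central-difference formulas. A minimal dense sketch of those formulas (generic code with hypothetical naming, not this package's API):

```julia
# Dense central-difference Hessian (textbook formulas; O(n^2) evaluations).
# h ~ eps^(1/4) is a reasonable default step for second differences.
function fd_hessian(f, x::AbstractVector{T}; h = sqrt(sqrt(eps(T)))) where T<:AbstractFloat
    n, fx = length(x), f(x)
    H = zeros(T, n, n)
    step(i, s) = (v = copy(x); v[i] += s; v)
    for i in 1:n
        # H[i,i] ≈ (f(x + h*e_i) - 2f(x) + f(x - h*e_i)) / h^2
        H[i,i] = (f(step(i, h)) - 2fx + f(step(i, -h))) / h^2
        for j in 1:i-1
            # 4-point central mixed difference for H[i,j]
            vpp = step(i,  h); vpp[j] += h
            vpm = step(i,  h); vpm[j] -= h
            vmp = step(i, -h); vmp[j] += h
            vmm = step(i, -h); vmm[j] -= h
            H[i,j] = H[j,i] = (f(vpp) - f(vpm) - f(vmp) + f(vmm)) / (4h^2)
        end
    end
    H
end
```

On the floopsym example above, fd_hessian(x -> floopsym(x, S), x) should recover S to roughly sqrt(eps) accuracy.
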
Now that the others are in and the documentation is on the README, the general shape of the API is pretty solidified. If you just follow how gradients and Jacobians look, then it'll be great. So now someone just needs to tackle it!

I've ignored complex step for now since it seems to have issues, and AD is just better if you have that kind of control over f in this case.