diff --git a/docs/_rst/source/advanced_usage.rst b/docs/_rst/source/advanced_usage.rst
index df0c887d..001f499a 100644
--- a/docs/_rst/source/advanced_usage.rst
+++ b/docs/_rst/source/advanced_usage.rst
@@ -209,6 +209,126 @@ expensive operation):
 Likewise, you could write a version of ``vector_hessian`` which supports functions
 of the form ``f!(y, x)``, or perhaps an in-place Jacobian with ``ForwardDiff.jacobian!``.
 
+User-defined derivatives
+------------------------
+
+Sometimes you want to use ForwardDiff on a function where a part of the function (here
+called the subfunction) does not support dual numbers. This can be the case when the
+subfunction lives in non-Julia code (for example a compiled C library) or when the
+function is implicitly defined. If you can provide the derivative of the subfunction,
+ForwardDiff can propagate that derivative through the dual numbers so that the original
+function can still be automatically differentiated. This technique is also useful when an
+optimized version of the derivative is available that is faster to compute than
+propagating dual numbers in the normal way.
+
+**Note**: When defining a function and its derivative inside another function (for
+example when creating a closure), the derivative needs to be defined first. This is due
+to a lowering bug in Julia 0.5. In the examples below the functions are defined in global
+scope, where the order of the function and derivative definitions does not matter.
+
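+For example, inside another function the definitions would be ordered like this (an
+illustrative sketch adapted from the package's test suite; the ``@implement_derivative``
+macro itself is introduced below):
+
+.. code-block:: julia
+
+    function closure(a)
+        Dff(x) = a * one(x)  # the derivative must come first, see the note above
+        ff(x) = a * x
+        ForwardDiff.@implement_derivative ff Dff
+        return ForwardDiff.derivative(ff, 2.0)  # returns a, calling Dff once
+    end
+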
+**Derivatives**
+
+Below is a minimal example where we define the explicit derivative of a small function
+and use the ``@implement_derivative`` macro to tell ForwardDiff that the function ``f``
+has a user-supplied derivative. The macro takes two arguments: the name of the function
+and the name of the function that computes the derivative:
+
+.. code-block:: julia
+
+    julia> f(x) = x^2 + x;
+
+    julia> Df(x) = (println("I got called"); 2x + 1);
+
+    julia> ForwardDiff.@implement_derivative f Df
+    f (generic function with 2 methods)
+
+    julia> ForwardDiff.derivative(f, 2.0)
+    I got called
+    5.0
+
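+Since the macro simply adds a method of ``f`` for dual numbers, ``f`` can also be used as
+a subfunction of a larger function that is being differentiated; the user-supplied ``Df``
+is then called whenever ``f`` is encountered. Continuing the example above (output shown
+for illustration):
+
+.. code-block:: julia
+
+    julia> outer(x) = 3 * f(x) + 1;
+
+    julia> ForwardDiff.derivative(outer, 2.0)
+    I got called
+    15.0
+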
+**Gradients**
+
+Two macros are available for user-defined gradients, ``@implement_gradient`` and
+``@implement_gradient!``. The first one (without the exclamation mark) takes two
+arguments: a function ``g(x)`` that takes a ``Vector`` and returns a scalar, and
+``Dg(x)``, which also takes a ``Vector`` but returns the gradient:
+
+.. code-block:: julia
+
+    julia> g(x) = norm(x);
+
+    julia> Dg(x) = (println("I got called"); x / norm(x));
+
+    julia> ForwardDiff.@implement_gradient g Dg
+    g (generic function with 2 methods)
+
+    julia> ForwardDiff.gradient(g, [1.0, 2.0])
+    I got called
+    2-element Array{Float64,1}:
+     0.447214
+     0.894427
+
+For better performance, use the second macro, ``@implement_gradient!``. It expects three
+arguments: a function ``h(x)``, its gradient ``Dh!(G, x)``, which updates ``G`` in place,
+and a config object that caches some intermediate arrays. The config is created with
+``GradientImplementConfig(T, chunk_size::Int, input_size::Int)``, where ``T`` is the
+element type of the input vector ``x``, or with ``GradientImplementConfig(chunk_size::Int, x)``.
+The same example as above (in real code this should be placed in a function or use
+``const`` variables):
+
+.. code-block:: julia
+
+    julia> h(x) = norm(x);
+
+    julia> Dh!(G, x) = (println("I got called"); copy!(G, x); scale!(G, 1 / norm(x)));
+
+    julia> config = ForwardDiff.GradientImplementConfig(Float64, 2, 2);
+
+    julia> ForwardDiff.@implement_gradient! h Dh! config
+    h (generic function with 2 methods)
+
+    julia> x = [1.0, 2.0];
+
+    julia> ForwardDiff.gradient(h, x, ForwardDiff.GradientConfig{2}(x))
+    I got called
+    2-element Array{Float64,1}:
+     0.447214
+     0.894427
+
+**Jacobians**
+
+Similar to the gradient case, there are two macros, ``@implement_jacobian`` and
+``@implement_jacobian!``. The first macro is used in an analogous fashion to the gradient
+macro:
+
+.. code-block:: julia
+
+    julia> p(x) = 5*x;
+
+    julia> Dp(x) = (println("I got called"); 5*eye(length(x)));
+
+    julia> ForwardDiff.@implement_jacobian p Dp
+    p (generic function with 2 methods)
+
+    julia> ForwardDiff.jacobian(p, [1.0, 2.0])
+    I got called
+    2×2 Array{Float64,2}:
+     5.0  0.0
+     0.0  5.0
+
+The second one is the three-argument version, which also takes a ``JacobianImplementConfig``;
+this needs access to both the input length and the output length:
+
+.. code-block:: julia
+
+    julia> q!(y, x) = (copy!(y, x); scale!(y, 5.0));
+
+    julia> function Dq!(J, x)
+               println("I got called")
+               for i in 1:length(x)
+                   J[i,i] = 5.0
+               end
+               return J
+           end;
+
+    julia> x = rand(5);
+
+    julia> y = similar(x);
+
+    julia> J = zeros(5,5);
+
+    julia> chunk_size = 5;
+
+    julia> cfig = ForwardDiff.JacobianImplementConfig(chunk_size, y, x);
+
+    julia> ForwardDiff.@implement_jacobian! q! Dq! cfig
+    q! (generic function with 2 methods)
+
+    julia> ForwardDiff.jacobian!(J, q!, y, x, ForwardDiff.JacobianConfig{chunk_size}(y, x))
+    I got called
+    5×5 Array{Float64,2}:
+     5.0  0.0  0.0  0.0  0.0
+     0.0  5.0  0.0  0.0  0.0
+     0.0  0.0  5.0  0.0  0.0
+     0.0  0.0  0.0  5.0  0.0
+     0.0  0.0  0.0  0.0  5.0
+
+
 SIMD Vectorization
 ------------------
diff --git a/src/ForwardDiff.jl b/src/ForwardDiff.jl
index 1f4058cd..5504184a 100644
--- a/src/ForwardDiff.jl
+++ b/src/ForwardDiff.jl
@@ -56,6 +56,7 @@ include("derivative.jl")
 include("gradient.jl")
 include("jacobian.jl")
 include("hessian.jl")
+include("user_implementations.jl")
 include("deprecated.jl")
 
 export DiffBase
diff --git a/src/user_implementations.jl b/src/user_implementations.jl
new file mode 100644
index 00000000..8f3270a7
--- /dev/null
+++ b/src/user_implementations.jl
@@ -0,0 +1,207 @@
+# This file contains the macros and functions that enable
+# users to define custom derivatives for arbitrary functions.
+# This is useful when calling out to external libraries,
+# when the function is implicitly defined, or when an optimized
+# version of the derivative is available.
+
+##############
+# Derivative #
+##############
+
+# Chain rule for a scalar function with a user-supplied derivative:
+# the result carries f(x) as its value and Df(x) * dx as its partials.
+_propagate_user_derivative{D <: Dual}(f, Df, x::D) = Dual(f(value(x)), Df(value(x)) * partials(x))
+
+macro implement_derivative(f, Df)
+    return :($(esc(f))(x::Dual) = _propagate_user_derivative($(esc(f)), $(esc(Df)), x))
+end
+
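+# Illustrative usage (see docs/_rst/source/advanced_usage.rst; the names are examples only):
+#
+#     Df(x) = 2x + 1
+#     f(x)  = x^2 + x
+#     ForwardDiff.@implement_derivative f Df
+#
+# After this, `f(::Dual)` dispatches to `_propagate_user_derivative`, so
+# `ForwardDiff.derivative(f, x)` evaluates `Df` instead of differentiating `f` itself.
+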
+
+############
+# Gradient #
+############
+
+immutable GradientImplementConfig{T}
+    input::Vector{T}
+    user_gradient::Vector{T}
+    current_jacobian::Matrix{T}
+    updated_gradient::Vector{T}
+end
+
+getchunksize(gic::GradientImplementConfig) = length(gic.updated_gradient)
+
+function GradientImplementConfig{T}(::Type{T}, chunk_size::Int, n_input::Int)
+    GradientImplementConfig(
+        zeros(T, n_input),
+        zeros(T, n_input),
+        zeros(T, n_input, chunk_size),
+        zeros(T, chunk_size)
+    )
+end
+
+GradientImplementConfig{T}(chunk_size::Int, x::Vector{T}) = GradientImplementConfig(T, chunk_size, length(x))
+
+function _propagate_user_gradient!{D <: Dual}(f, Df!, x::Vector{D}, cfig::GradientImplementConfig)
+    for i in eachindex(x)
+        cfig.input[i] = value(x[i])
+    end
+    fv = f(cfig.input)
+    Df!(cfig.user_gradient, cfig.input)
+    extract_jacobian!(cfig.current_jacobian, x, npartials(D))
+    # Chain rule: the new partials are current_jacobian' * user_gradient
+    At_mul_B!(cfig.updated_gradient, cfig.current_jacobian, cfig.user_gradient)
+    return D(fv, Partials(totuple(D, cfig.updated_gradient)))
+end
+
+@generated function totuple{D <: Dual}(::Type{D}, v)
+    return tupexpr(k -> :(v[$k]), npartials(D))
+end
+
+macro implement_gradient(f, Df)
+    return quote
+        function $(esc(f)){D <: Dual}(x::Vector{D})
+            cfig = GradientImplementConfig(valtype(D), npartials(D), length(x))
+            Df! = (G, x) -> copy!(G, $(esc(Df))(x))
+            _propagate_user_gradient!($(esc(f)), Df!, x, cfig)
+        end
+    end
+end
+
+macro implement_gradient!(f, Df!, cfig)
+    return quote
+        function $(esc(f)){D <: Dual}(x::Vector{D})
+            _propagate_user_gradient!($(esc(f)), $(esc(Df!)), x, $(esc(cfig)))
+        end
+    end
+end
+
+
+############
+# Jacobian #
+############
+
+immutable JacobianImplementConfig{T}
+    input::Vector{T}
+    output::Vector{T}
+    user_jacobian::Matrix{T}
+    current_jacobian::Matrix{T}
+    new_jacobian::Matrix{T}
+end
+
+function JacobianImplementConfig{T}(::Type{T}, chunk_size::Int, n_output::Int, n_input::Int)
+    JacobianImplementConfig(
+        zeros(T, n_input),
+        zeros(T, n_output),
+        zeros(T, n_output, n_input),
+        zeros(T, n_input, chunk_size),
+        zeros(T, n_output, chunk_size)
+    )
+end
+
+function JacobianImplementConfig{T}(chunk_size::Int, y::Vector{T}, x::Vector{T})
+    JacobianImplementConfig(T, chunk_size, length(y), length(x))
+end
+
+function _propagate_user_jacobian!{D <: Dual}(f!, Df!, y::Vector{D}, x::Vector{D}, cfig::JacobianImplementConfig)
+    for i in eachindex(x)
+        cfig.input[i] = value(x[i])
+    end
+    f!(cfig.output, cfig.input)
+    Df!(cfig.user_jacobian, cfig.input)
+    extract_jacobian!(cfig.current_jacobian, x, npartials(D))
+    # Chain rule: new_jacobian = user_jacobian * current_jacobian
+    A_mul_B!(cfig.new_jacobian, cfig.user_jacobian, cfig.current_jacobian)
+    for i in eachindex(cfig.output)
+        y[i] = D(cfig.output[i], Partials(getrowpartials(D, cfig.new_jacobian, i)))
+    end
+    return y
+end
+
+function _propagate_user_jacobian{D <: Dual}(f, Df, x::Vector{D})
+    input = zeros(valtype(D), length(x))
+    for i in eachindex(x)
+        input[i] = value(x[i])
+    end
+    output = f(input)
+    user_jacobian = Df(input)
+    current_jacobian = zeros(valtype(D), length(x), npartials(D))
+    extract_jacobian!(current_jacobian, x, npartials(D))
+    new_jacobian = user_jacobian * current_jacobian
+    y = similar(x, length(output))
+    for i in eachindex(output)
+        y[i] = D(output[i], Partials(getrowpartials(D, new_jacobian, i)))
+    end
+    return y
+end
+
+@generated function getrowpartials{D <: Dual}(::Type{D}, J, i)
+    return tupexpr(k -> :(J[i, $k]), npartials(D))
+end
+
+macro implement_jacobian(f, Df)
+    return quote
+        function $(esc(f)){D <: Dual}(x::Vector{D})
+            _propagate_user_jacobian($(esc(f)), $(esc(Df)), x)
+        end
+    end
+end
+
+macro implement_jacobian!(f!, Df!, cfig)
+    return quote
+        function $(esc(f!)){D <: Dual}(y::Vector{D}, x::Vector{D})
+            _propagate_user_jacobian!($(esc(f!)), $(esc(Df!)), y, x, $(esc(cfig)))
+        end
+    end
+end
+
+###########
+# Hessian #
+###########
+
+# Currently not supported, since the implementation currently
+# assumes that it is possible to evaluate gradients and Jacobians of
+# any function passed to ForwardDiff.
+
+#=
+# Note that since we only ask the user to provide the Hessian, it is impossible
+# to know what the gradient should be. Extracting lower-order results would thus
+# give wrong values.
+
+immutable HessianImplementConfig{T}
+    input::Vector{T}
+    user_hessian::Matrix{T}
+    current_hessian::Matrix{T}
+    new_hessian::Matrix{T}
+end
+
+function HessianImplementConfig{T}(::Type{T}, chunk_size::Int, n_input::Int)
+    HessianImplementConfig(
+        zeros(T, n_input),
+        zeros(T, n_input, n_input),
+        zeros(T, n_input, chunk_size),
+        zeros(T, n_input, chunk_size)
+    )
+end
+
+function HessianImplementConfig{T}(chunk_size::Int, x::Vector{T})
+    HessianImplementConfig(T, chunk_size, length(x))
+end
+
+function _propagate_user_hessian!{D <: Dual}(f, Hf!, x::Vector{D}, cfig::HessianImplementConfig)
+    for i in eachindex(x)
+        cfig.input[i] = value(x[i])
+    end
+    fv = f(cfig.input)
+    Hf!(cfig.user_hessian, cfig.input)
+    extract_hessian!(cfig.current_hessian, x, npartials(D))
+    A_mul_B!(cfig.new_hessian, cfig.user_hessian, cfig.current_hessian)
+    return Dual(fv, create_duals(D, cfig.new_hessian))
+end
+
+function extract_hessian!(out::AbstractArray, ydual::Vector)
+    for col in 1:length(ydual)
+        parts = partials(ydual[col])
+        for row in 1:length(ydual)
+            out[col, row] = parts[row]
+        end
+    end
+    return out
+end
+
+=#
diff --git a/test/UserDefinedDerivativesTest.jl b/test/UserDefinedDerivativesTest.jl
new file mode 100644
index 00000000..a1b88556
--- /dev/null
+++ b/test/UserDefinedDerivativesTest.jl
@@ -0,0 +1,119 @@
+module UserDefinedDerivativesTest
+
+##############
+# Derivative #
+##############
+
+module Derivative
+
+using Base.Test
+using ForwardDiff
+
+global d_counter = 0
+f(x) = 3*x
+Df(x) = (global d_counter += 1; return 3*one(x))
+@ForwardDiff.implement_derivative f Df
+
+@test ForwardDiff.derivative(f, rand()) == 3.0
+@test d_counter == 1
+
+# Inside a function (the derivative must be defined before the function, see the docs)
+function closure(a)
+    d_counter = 0
+    Dff(x) = (d_counter += 1; return a*one(x))
+    ff(x) = a*x
+    @ForwardDiff.implement_derivative ff Dff
+    @test ForwardDiff.derivative(ff, rand()) == a
+    @test d_counter == 1
+end
+
+closure(2.0)
+
+end
+
+
+############
+# Gradient #
+############
+
+module Gradient
+
+using Base.Test
+using ForwardDiff
+
+global g_counter = 0
+f(x) = norm(x)
+Df!(g, x) = (global g_counter += 1; copy!(g, x); scale!(g, 1 / norm(x)))
+x = rand(5)
+cfig = ForwardDiff.GradientImplementConfig(5, x)
+ForwardDiff.@implement_gradient! f Df! cfig
+@test ForwardDiff.gradient(f, x) ≈ x / norm(x)
+@test g_counter == 1
+
+# Chunk mode + preallocated config for the user gradient
+g_counter = 0
+f2(x) = norm(x)
+Df2(x) = (global g_counter += 1; x / norm(x))
+x = rand(5)
+ForwardDiff.@implement_gradient f2 Df2
+cfg = ForwardDiff.GradientConfig(nothing, x, ForwardDiff.Chunk{2}())
+@test ForwardDiff.gradient(f2, x, cfg) ≈ x / norm(x)
+@test g_counter == 3  # three chunks of size 2 for a length-5 input
+
+end
+
+############
+# Jacobian #
+############
+
+module Jacobian
+
+using Base.Test
+using ForwardDiff
+
+global j_counter = 0
+function f!(y, x)
+    y[1] = x[1]*x[2]
+    y[2] = x[2]*x[3]
+    return y
+end
+
+function Df!(J, x)
+    global j_counter += 1
+    J[1,1] = x[2]
+    J[1,2] = x[1]
+    J[1,3] = 0
+    J[2,1] = 0
+    J[2,2] = x[3]
+    J[2,3] = x[2]
+    return J
+end
+
+J = zeros(2, 3)
+x = rand(3)
+y = zeros(2)
+chunk = 2
+cfig = ForwardDiff.JacobianImplementConfig(chunk, y, x)
+
+@ForwardDiff.implement_jacobian! f! Df! cfig
+cfg = ForwardDiff.JacobianConfig(nothing, y, x, ForwardDiff.Chunk{chunk}())
+@test ForwardDiff.jacobian!(J, f!, y, x, cfg) == Df!(zeros(2, 3), x)  # compare against a fresh array, not J itself
+@test j_counter == 3  # two for chunk mode and one for the call after ==
+
+# Test the non-mutating version
+global j_counter2 = 0
+f(x) = [x[1]*x[2], x[2]*x[3]]
+
+function Df(x)
+    global j_counter2 += 1
+    return [x[2] x[1] 0;
+            0    x[3] x[2]]
+end
+
+@ForwardDiff.implement_jacobian f Df
+@test ForwardDiff.jacobian(f, x) == Df(x)
+@test j_counter2 == 2  # one for jacobian and one for the call after ==
+end
+
+end
diff --git a/test/runtests.jl b/test/runtests.jl
index ede73646..3673e2ce 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -35,6 +35,11 @@ tic()
 include("MiscTest.jl")
 println("done (took $(toq()) seconds).")
 
+println("Testing user defined derivatives...")
+tic()
+include("UserDefinedDerivativesTest.jl")
+println("done (took $(toq()) seconds).")
+
 if Base.JLOptions().opt_level >= 3
     println("Testing SIMD vectorization...")
     tic()