wip: implement macro for users to define their own derivatives #165

120 changes: 120 additions & 0 deletions docs/_rst/source/advanced_usage.rst
@@ -209,6 +209,126 @@ expensive operation):
Likewise, you could write a version of ``vector_hessian`` which supports functions of the
form ``f!(y, x)``, or perhaps an in-place Jacobian with ``ForwardDiff.jacobian!``.

User defined derivatives
------------------------

Sometimes one wants to use ``ForwardDiff`` on a function where some part of it (here called the subfunction) does not support dual numbers.
This can happen when the subfunction lives in non-Julia code (for example a compiled C library) or when it is defined implicitly.
If you can provide the derivative of the subfunction, then ``ForwardDiff`` can propagate that derivative through the dual numbers so that the full function can still be automatically differentiated.
The same technique is also useful when an optimized derivative is available that is cheaper to compute than propagating dual numbers through the subfunction.

**Note**: When defining a function and its derivative inside another function (for example, when creating a closure), the derivative needs to be defined first. This is due to a lowering bug in Julia 0.5. In the examples below, the functions are defined at global scope, where the order of the definitions does not matter.
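
A minimal sketch of the required ordering inside a closure (``make_scaled_square``, ``q`` and ``Dq`` are hypothetical names used only for illustration):

.. code-block:: julia

    function make_scaled_square(a)
        Dq(x) = 2a * x   # the derivative must be defined before the function itself
        q(x) = a * x^2
        ForwardDiff.@implement_derivative q Dq
        return q
    end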

**Derivatives**

Below is a minimal example where we define the explicit derivative of a small function and use the ``@implement_derivative`` macro to tell ``ForwardDiff`` that the function ``f`` has a user supplied derivative. The ``@implement_derivative`` macro takes two arguments: the name of the function and the name of the function that computes the derivative:

.. code-block:: julia

    julia> f(x) = x^2 + x;

    julia> Df(x) = (println("I got called"); 2x + 1);

    julia> ForwardDiff.@implement_derivative f Df
    f (generic function with 2 methods)

    julia> ForwardDiff.derivative(f, 2.0)
    I got called
    5.0
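
Under the hood, the macro adds a method of ``f`` for dual numbers that applies the chain rule: it evaluates ``f`` at the primal value and scales the partials by the user supplied derivative. Roughly, in terms of ForwardDiff's internal ``Dual``, ``value`` and ``partials``:

.. code-block:: julia

    import ForwardDiff: Dual, value, partials

    # f(x + εv) = f(x) + Df(x)·v·ε
    f(x::Dual) = Dual(f(value(x)), Df(value(x)) * partials(x))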


**Gradients**

Two macros are available for user defined gradients: ``@implement_gradient`` and ``@implement_gradient!``. The first one (without the exclamation mark) takes two arguments: a function ``f(x)`` that takes a ``Vector`` and returns a scalar, and a function ``Df(x)`` that also takes a ``Vector`` but returns the gradient.

.. code-block:: julia

    julia> g(x) = norm(x);

    julia> Dg(x) = (println("I got called"); x / norm(x));

    julia> ForwardDiff.@implement_gradient g Dg
    g (generic function with 2 methods)

    julia> ForwardDiff.gradient(g, [1.0, 2.0])
    I got called
    2-element Array{Float64,1}:
     0.447214
     0.894427

For better performance, use the second macro, ``@implement_gradient!``. This macro expects three arguments: a function ``f(x)``, its gradient ``Df!(G, x)`` which updates ``G`` in place, and a config object that caches some intermediate arrays. The config is created with ``GradientImplementConfig(T, chunk_size::Int, input_size::Int)``, where ``T`` is the element type of the input vector ``x``, or with ``GradientImplementConfig(chunk_size::Int, x)``. The same example as above is given below (in real code, this should be placed in a function, or the variables should be declared ``const``):

.. code-block:: julia

    julia> h(x) = norm(x);

    julia> Dh!(G, x) = (println("I got called"); copy!(G, x); scale!(G, 1 / norm(x)));

    julia> config = ForwardDiff.GradientImplementConfig(Float64, 2, 2);

    julia> ForwardDiff.@implement_gradient! h Dh! config
    h (generic function with 2 methods)

    julia> x = [1.0, 2.0];

    julia> ForwardDiff.gradient(h, x, ForwardDiff.GradientConfig{2}(x))
    I got called
    2-element Array{Float64,1}:
     0.447214
     0.894427
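
Note that the chunk size of the ``GradientImplementConfig`` must match the chunk size of the ``GradientConfig`` passed to ``ForwardDiff.gradient`` (both are ``2`` above), since the cached arrays are sized for that chunk.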


**Jacobians**

Similar to the gradient case, there are two macros: ``@implement_jacobian`` and ``@implement_jacobian!``. The first macro is used in an analogous fashion to the gradient macro:

.. code-block:: julia

    julia> p(x) = 5*x;

    julia> Dp(x) = (println("I got called"); 5*eye(length(x)));

    julia> ForwardDiff.@implement_jacobian p Dp
    p (generic function with 2 methods)

    julia> ForwardDiff.jacobian(p, [1.0, 2.0])
    I got called
    2×2 Array{Float64,2}:
     5.0  0.0
     0.0  5.0

The second one is the three argument version, which also takes a ``JacobianImplementConfig``; this config needs access to both the input length and the output length:

.. code-block:: julia

    julia> q!(y, x) = (copy!(y, x); scale!(y, 5.0));

    julia> function Dq!(J, x)
               println("I got called")
               for i in 1:length(x)
                   J[i,i] = 5.0
               end
               return J
           end;

    julia> x = rand(5);

    julia> y = similar(x);

    julia> J = zeros(5,5);

    julia> chunk_size = 5;

    julia> cfig = ForwardDiff.JacobianImplementConfig(chunk_size, y, x);

    julia> ForwardDiff.@implement_jacobian! q! Dq! cfig
    q! (generic function with 2 methods)

    julia> ForwardDiff.jacobian!(J, q!, y, x, ForwardDiff.JacobianConfig{chunk_size}(y, x))
    I got called
    5×5 Array{Float64,2}:
     5.0  0.0  0.0  0.0  0.0
     0.0  5.0  0.0  0.0  0.0
     0.0  0.0  5.0  0.0  0.0
     0.0  0.0  0.0  5.0  0.0
     0.0  0.0  0.0  0.0  5.0


SIMD Vectorization
------------------

1 change: 1 addition & 0 deletions src/ForwardDiff.jl
@@ -56,6 +56,7 @@ include("derivative.jl")
include("gradient.jl")
include("jacobian.jl")
include("hessian.jl")
include("user_implementations.jl")
include("deprecated.jl")

export DiffBase
207 changes: 207 additions & 0 deletions src/user_implementations.jl
@@ -0,0 +1,207 @@
# This file contains the macros and functions that enable
# users to define custom derivatives for arbitrary functions.
# This is useful when calling out to external libraries,
# when the function is implicitly defined, or when an optimized
# version of the derivative is available.

##############
# Derivative #
##############

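# Propagate a user supplied derivative through a dual number by the chain
# rule: f(x + εv) = f(x) + Df(x)·v·ε.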
_propagate_user_derivative{D <: Dual}(f, Df, x::D) = Dual(f(value(x)), Df(value(x)) * partials(x))

macro implement_derivative(f, Df)
return :($(esc(f))(x::Dual) = _propagate_user_derivative($(esc(f)), $(esc(Df)), x))
end


############
# Gradient #
############

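# Buffers reused when propagating a user supplied gradient: `input` holds the
# primal values of `x`, `user_gradient` the user's ∇f, `current_jacobian` the
# Jacobian of `x` with respect to the differentiation variables (extracted
# from the partials), and `updated_gradient` their chain-rule product.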
immutable GradientImplementConfig{T}
input::Vector{T}
user_gradient::Vector{T}
current_jacobian::Matrix{T}
updated_gradient::Vector{T}
end

getchunksize(gic::GradientImplementConfig) = length(gic.updated_gradient)

function GradientImplementConfig{T}(::Type{T}, chunk_size::Int, n_input::Int)
GradientImplementConfig(
zeros(T, n_input),
zeros(T, n_input),
zeros(T, n_input, chunk_size),
zeros(T, chunk_size)
)
end

GradientImplementConfig{T}(chunk_size::Int, x::Vector{T}) = GradientImplementConfig(T, chunk_size, length(x))

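# Chain rule: if x depends on the differentiation variables t with Jacobian
# J = ∂x/∂t (extracted from the partials of `x`), the partials of f(x) are
# Jᵀ·∇f(x), computed below with At_mul_B!.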
function _propagate_user_gradient!{D <: Dual}(f, Df!, x::Vector{D}, cfig::GradientImplementConfig)
for i in eachindex(x)
cfig.input[i] = value(x[i])
end
fv = f(cfig.input)
Df!(cfig.user_gradient, cfig.input)
extract_jacobian!(cfig.current_jacobian, x, npartials(D))
At_mul_B!(cfig.updated_gradient, cfig.current_jacobian, cfig.user_gradient)
return D(fv, Partials(totuple(D, cfig.updated_gradient)))
end

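# Convert the first npartials(D) entries of `v` to a tuple (generated for type stability).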
@generated function totuple{D <: Dual}(::Type{D}, v)
return tupexpr(k -> :(v[$k]), npartials(D))
end

macro implement_gradient(f, Df)
return quote
function $(esc(f)){D <: Dual}(x::Vector{D})
cfig = GradientImplementConfig(valtype(D), npartials(D), length(x))
Df! = (G, x) -> copy!(G, $(esc(Df))(x))
_propagate_user_gradient!($(esc(f)), Df!, x, cfig)
end
end
end

macro implement_gradient!(f, Df!, cfig)
return quote
function $(esc(f)){D <: Dual}(x::Vector{D})
_propagate_user_gradient!($(esc(f)), $(esc(Df!)), x, $(esc(cfig)))
end
end
end


############
# Jacobian #
############

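# Buffers reused when propagating a user supplied Jacobian: `input`/`output`
# hold the primal values, `user_jacobian` the user's J_f, `current_jacobian`
# the Jacobian of `x` with respect to the differentiation variables, and
# `new_jacobian` their chain-rule product, one column per partial.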
immutable JacobianImplementConfig{T}
input::Vector{T}
output::Vector{T}
user_jacobian::Matrix{T}
current_jacobian::Matrix{T}
new_jacobian::Matrix{T}
end

function JacobianImplementConfig{T}(::Type{T}, chunk_size::Int, n_output::Int, n_input::Int)
JacobianImplementConfig(
zeros(T, n_input),
zeros(T, n_output),
zeros(T, n_output, n_input),
zeros(T, n_input, chunk_size),
zeros(T, n_output, chunk_size)
)
end

function JacobianImplementConfig{T}(chunk_size::Int, y::Vector{T}, x::Vector{T})
JacobianImplementConfig(T, chunk_size, length(y), length(x))
end

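# Chain rule: J_{f∘x}(t) = J_f(x)·J_x(t); the result is written back into
# the partials of `y`.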
function _propagate_user_jacobian!{D <: Dual}(f!, Df!, y::Vector{D}, x::Vector{D}, cfig::JacobianImplementConfig)
for i in eachindex(x)
cfig.input[i] = value(x[i])
end
f!(cfig.output, cfig.input)
Df!(cfig.user_jacobian, cfig.input)
extract_jacobian!(cfig.current_jacobian, x, npartials(D))
A_mul_B!(cfig.new_jacobian, cfig.user_jacobian, cfig.current_jacobian)
for i in eachindex(cfig.output)
y[i] = D(cfig.output[i], Partials(getrowpartials(D, cfig.new_jacobian, i)))
end
return y
end

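# Allocating variant used by @implement_jacobian when no config is supplied.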
function _propagate_user_jacobian{D <: Dual}(f, Df, x::Vector{D})
input = zeros(valtype(D), length(x))
for i in eachindex(x)
input[i] = value(x[i])
end
output = f(input)
user_jacobian = Df(input)
current_jacobian = zeros(valtype(D), length(x), npartials(D))
extract_jacobian!(current_jacobian, x, npartials(D))
new_jacobian = user_jacobian * current_jacobian
y = similar(x, length(output))
for i in eachindex(output)
y[i] = D(output[i], Partials(getrowpartials(D, new_jacobian, i)))
end
return y
end

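# Build a tuple from row `i` of `J` with npartials(D) entries (generated for type stability).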
@generated function getrowpartials{D <: Dual}(::Type{D}, J, i)
return tupexpr(k -> :(J[i, $k]), npartials(D))
end

macro implement_jacobian(f, Df)
return quote
function $(esc(f)){D <: Dual}(x::Vector{D})
_propagate_user_jacobian($(esc(f)), $(esc(Df)), x)
end
end
end

macro implement_jacobian!(f!, Df!, cfig)
return quote
function $(esc(f!)){D <: Dual}(y::Vector{D}, x::Vector{D})
_propagate_user_jacobian!($(esc(f!)), $(esc(Df!)), y, x, $(esc(cfig)))
end
end
end

###########
# Hessian #
###########

# Currently not supported, since the implementation assumes that it is
# possible to evaluate gradients and Jacobians of any function passed
# to ForwardDiff.

#=
# Note that since we only ask the user to provide the Hessian it is impossible
# to know what the gradient should be. Extracting lower order results will thus
# give wrong values.

immutable HessianImplementConfig{T}
input::Vector{T}
user_hessian::Matrix{T}
current_hessian::Matrix{T}
new_hessian::Matrix{T}
end

function HessianImplementConfig{T}(::Type{T}, chunk_size::Int, n_input::Int)
HessianImplementConfig(
zeros(T, n_input),
zeros(T, n_input, n_input),
zeros(T, n_input, chunk_size),
zeros(T, n_input, chunk_size)
)
end

function HessianImplementConfig{T}(chunk_size::Int, x::Vector{T})
HessianImplementConfig(T, chunk_size, length(x))
end

function _propagate_user_hessian!{D <: Dual}(f, Hf!, x::Vector{D}, cfig::HessianImplementConfig)
for i in eachindex(x)
cfig.input[i] = value(x[i])
end
fv = f(cfig.input)
Hf!(cfig.user_hessian, cfig.input)
extract_hessian!(cfig.current_hessian, x, npartials(D))
A_mul_B!(cfig.new_hessian, cfig.user_hessian, cfig.current_hessian)
return Dual(fv, create_duals(D, cfig.new_hessian))
end

function extract_hessian!(out::AbstractArray, ydual::Vector)
for col in 1:length(ydual)
parts = partials(ydual[col])
for row in 1:length(ydual)
out[col, row] = parts[row]
end
end
return out
end

=#