Simplify tutorial and README #216

Merged · 8 commits · Apr 26, 2024
Changes from 2 commits
22 changes: 11 additions & 11 deletions DifferentiationInterface/README.md
@@ -50,9 +50,7 @@ We support all of the backends defined by [ADTypes.jl](https://github.com/SciML/
To install the stable version of the package, run the following code in a Julia REPL:

```julia
julia> using Pkg

julia> Pkg.add("DifferentiationInterface")
julia> ]add DifferentiationInterface
```

To install the development version, run this instead:
@@ -68,19 +66,21 @@ julia> Pkg.add(
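The rest of the development-install snippet is collapsed in this diff view. Purely as an illustration (the repository URL and the `subdir` keyword are assumptions inferred from the file paths above, not part of the diff), such an install typically looks like:

```julia
julia> using Pkg

julia> Pkg.add(
           url="https://github.com/gdalle/DifferentiationInterface.jl",  # assumed repository URL
           subdir="DifferentiationInterface",  # the package sits in this subdirectory of the monorepo
       )
```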

## Example

```jldoctest readme
julia> import ADTypes, ForwardDiff

julia> using DifferentiationInterface
```julia
using DifferentiationInterface, ADTypes
import ForwardDiff, Enzyme, Zygote # import automatic differentiation backends you want to use

julia> backend = ADTypes.AutoForwardDiff();
f(x) = sum(abs2, x)

julia> f(x) = sum(abs2, x);
x = [1., 2., 3.]

julia> value_and_gradient(f, backend, [1., 2., 3.])
(14.0, [2.0, 4.0, 6.0])
value_and_gradient(f, AutoForwardDiff(), x) # returns (14.0, [2.0, 4.0, 6.0]) using ForwardDiff.jl
value_and_gradient(f, AutoEnzyme(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Enzyme.jl
value_and_gradient(f, AutoZygote(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Zygote.jl
```

For more performance, take a look at the [DifferentiationInterface tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/tutorial/).

## Related packages

- [AbstractDifferentiation.jl](https://github.com/JuliaDiff/AbstractDifferentiation.jl) is the original inspiration for DifferentiationInterface.jl.
65 changes: 35 additions & 30 deletions DifferentiationInterface/docs/src/tutorial.md
@@ -6,45 +6,45 @@ CurrentModule = Main

We present a typical workflow with DifferentiationInterface.jl and showcase its potential performance benefits.

```@repl tuto
using DifferentiationInterface
import ADTypes, ForwardDiff, Enzyme
using BenchmarkTools
```@example tuto
using DifferentiationInterface, ADTypes

import ForwardDiff, Enzyme # ⚠️ import the backends you want to use ⚠️
```

## Computing a gradient

A common use case of AD is optimizing real-valued functions with first- or second-order methods.
Let's define a simple objective

```@repl tuto
f(x::AbstractArray) = sum(abs2, x)
```
A common use case of automatic differentiation (AD) is optimizing real-valued functions with first- or second-order methods.
Let's define a simple objective and a random input vector

and a random input vector
```@example tuto
f(x) = sum(abs2, x)

```@repl tuto
x = [1.0, 2.0, 3.0];
x = [1.0, 2.0, 3.0]
nothing # hide
```

To compute its gradient, we need to choose a "backend", i.e. an AD package that DifferentiationInterface.jl will call under the hood.
Most backend types are defined by [ADTypes.jl](https://github.com/SciML/ADTypes.jl) and re-exported by DifferentiationInterface.jl.

[ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) is very generic and efficient for low-dimensional inputs, so it's a good starting point:

```@repl tuto
backend = ADTypes.AutoForwardDiff()
```@example tuto
backend = AutoForwardDiff()
nothing # hide
```
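For reference, the other re-exported constructors used elsewhere in this PR follow the same pattern (an illustrative list, not an exhaustive one):

```julia
# Illustrative only: backend constructors re-exported from ADTypes.jl
# (AutoEnzyme's mode argument assumes the earlier `import ForwardDiff, Enzyme`)
AutoForwardDiff()                  # forward mode, backed by ForwardDiff.jl
AutoEnzyme(; mode=Enzyme.Reverse)  # Enzyme.jl in reverse mode
AutoZygote()                       # Zygote.jl (reverse mode)
```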

Now you can use DifferentiationInterface.jl to get the gradient:

```@repl tuto
```@example tuto
gradient(f, backend, x)
```

Was that fast?
[BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) helps you answer that question.

```@repl tuto
using BenchmarkTools
@btime gradient($f, $backend, $x);
```

@@ -58,13 +58,14 @@ Not bad, but you can do better.

## Overwriting a gradient

Since you know how much space your gradient will occupy, you can pre-allocate that memory and offer it to AD.
Since you know how much space your gradient will occupy (the same as your input `x`), you can pre-allocate that memory and offer it to AD.
Some backends get a speed boost from this trick.

```@repl tuto
```@example tuto
grad = zero(x)
gradient!(f, grad, backend, x);
grad
gradient!(f, grad, backend, x)

grad # has been mutated
```

The bang indicates that one of the arguments of `gradient!` might be mutated.
@@ -76,24 +77,26 @@ More precisely, our convention is that _every positional argument between the fu

For some reason the in-place version is not much better than your first attempt.
However, it has one less allocation, which corresponds to the gradient vector you provided.
Don't worry, you're not done yet.
Don't worry, we can get even more performance.

## Preparing for multiple gradients

Internally, ForwardDiff.jl creates some data structures to keep track of things.
These objects can be reused between gradient computations, even on different input values.
We abstract away the preparation step behind a backend-agnostic syntax:

```@repl tuto
```@example tuto
extras = prepare_gradient(f, backend, x)
nothing # hide
```

You don't need to know what this object is, you just need to pass it to the gradient operator.

```@repl tuto
grad = zero(x);
gradient!(f, grad, backend, x, extras);
grad
```@example tuto
grad = zero(x)
gradient!(f, grad, backend, x, extras)

grad # has been mutated
```

Preparation makes the gradient computation much faster, and (in this case) allocation-free.
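The benchmark backing this claim is collapsed in this diff view; a minimal sketch of how one would check it, reusing the earlier BenchmarkTools pattern (illustration only, not part of the diff):

```julia
# Illustration: time the prepared, in-place gradient
using BenchmarkTools
@btime gradient!($f, $grad, $backend, $x, $extras);
```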
@@ -115,13 +118,14 @@ So let's try the state-of-the-art [Enzyme.jl](https://github.com/EnzymeAD/Enzyme

For this one, the backend definition is slightly more involved, because you need to feed the "mode" to the object from ADTypes.jl:

```@repl tuto
backend2 = ADTypes.AutoEnzyme(; mode=Enzyme.Reverse)
```@example tuto
backend2 = AutoEnzyme(; mode=Enzyme.Reverse)
nothing # hide
```

But once it is done, things run smoothly with exactly the same syntax:

```@repl tuto
```@example tuto
gradient(f, backend2, x)
```

@@ -136,4 +140,5 @@ And you can run the same benchmarks:
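The benchmark code itself is not shown here; as a sketch following the same pattern as before (again, an illustration rather than the diff's actual content):

```julia
# Illustration: same benchmark, now with the Enzyme backend
using BenchmarkTools
@btime gradient($f, $backend2, $x);
```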

Not only is it blazingly fast, you achieved this speedup without looking at the docs of either ForwardDiff.jl or Enzyme.jl!
In short, DifferentiationInterface.jl allows for easy testing and comparison of AD backends.
If you want to go further, check out the [DifferentiationTest.jl tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterfaceTest/dev/tutorial/).
If you want to go further, check out the [DifferentiationInterfaceTest.jl tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterfaceTest/dev/tutorial/).
It provides benchmarking utilities to compare backends and help you select the one that is best suited for your problem.