
support for the determinant function #197

Closed
gideonite opened this issue Feb 20, 2017 · 19 comments · Fixed by #481
Comments

@gideonite

I naively tried ForwardDiff.gradient(det, A) and compared it to the output of (det(A) * inv(A))' (Jacobi's formula for a real square matrix A). They should be the same value, but they are radically different.

What am I getting wrong here? Shouldn't ForwardDiff support det out of the box since it's a core function of Julia?

Thanks.

@gideonite gideonite changed the title support for det function support for the determinant function Feb 20, 2017
@jrevels
Member

jrevels commented Feb 21, 2017

ForwardDiff is giving the correct result, your implementation of Jacobi's Formula is incorrect:

julia> using ForwardDiff

julia> A = rand(3, 3);

julia> adj(A) = det(A) * inv(A)
adj (generic function with 1 method)

# what Jacobi's Formula actually says should hold true
julia> isapprox(adj(A)', ForwardDiff.gradient(det, A))
true

You can also verify by checking with finite differences and reverse-mode AD, which give similar results.
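For reference, Jacobi's formula can be checked entrywise with central differences; a minimal sketch (the matrix below is just an arbitrary nonsingular example):

```julia
using LinearAlgebra  # Julia >= 0.7; det/inv live in Base on 0.6

A = [0.8 0.2 0.1;
     0.3 0.9 0.4;
     0.2 0.5 0.7]

# Jacobi's formula: d(det A)/dA[i,j] = (det(A) * inv(A)')[i,j]
adjT = det(A) * inv(A)'

# Entrywise central differences
h = 1e-6
fd = similar(A)
for i in 1:3, j in 1:3
    E = zeros(3, 3); E[i, j] = h
    fd[i, j] = (det(A + E) - det(A - E)) / 2h
end

isapprox(fd, adjT; rtol = 1e-6)  # true
```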

@jrevels jrevels closed this as completed Feb 21, 2017
@schrauf

schrauf commented Mar 9, 2017

I ran into the same issue, too.

While a generic matrix (generated randomly) seems to give the correct result, specific ones appear to be wrong:

julia> using ForwardDiff

julia> A = [1 0 0; 0 1 0; 0 1 1];

julia> adj(A) = det(A) * inv(A)
adj (generic function with 1 method)

julia> adj(A)'
3×3 Array{Float64,2}:
 1.0  0.0   0.0
 0.0  1.0  -1.0
 0.0  0.0   1.0

julia> ForwardDiff.gradient(det,A)
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

julia> isapprox(adj(A)', ForwardDiff.gradient(det, A))
false

I apologize in advance if this is a problem with my algebra, but I don't understand why I am getting it wrong.

Cheers, Matías

@KristofferC
Collaborator

KristofferC commented Mar 9, 2017

Edit: This is wrong

The determinant of

x 0 0
0 y 0
0 a z

is always xyz. So the derivative should be

yz 0 0
0 xz 0
0 0 xy

So ForwardDiff seems correct. Do all cofactors need to be nonzero for Jacobi's formula to work?

@schrauf

schrauf commented Mar 9, 2017

[UPDATE: please see comment below]
First of all, thank you for the prompt reply.

You are right; I could not find any reference as to why Jacobi's formula would not apply, with conditions on the cofactors or otherwise (it is commonly stated as a general property, i.e. applying to all cases, which puzzles me very much).

But doing the derivative by hand does give (hopefully) the correct result, and this coincides with the one from ForwardDiff.

Also, thank you for your time here and elsewhere, I am now trying to use Tensors.jl in research and it's great!

@schrauf

schrauf commented Mar 13, 2017

Hello,

after looking more carefully at the problem, I still think there is an issue with the gradient of the determinant function: with either symbolic differentiation or a finite-difference approximation, we arrive at the answer given by Jacobi's formula.

Please take a look at position [2,3] of the gradient:

Symbolic

julia> using SymPy

julia> entries = @syms a b c d e f g h i real=true;
julia> A = reshape([entries...],(3,3))
3×3 Array{SymPy.Sym,2}
⎡a  d  g⎤
⎢       ⎥
⎢b  e  h⎥
⎢       ⎥
⎣c  f  i⎦

julia> detA = det(A) |> simplify
aei - afh - bdi + bfg + cdh - ceg

julia> dA = diff.(detA,A)
3×3 Array{SymPy.Sym,2}
⎡ ei - fh   -bi + ch    bf - ce⎤
⎢                              ⎥
⎢-di + fg    ai - cg   -af + cd⎥
⎢                              ⎥
⎣ dh - eg   -ah + bg    ae - bd⎦

julia> dA |> subs(b=>0,c=>0,d=>0,g=>0,h=>0)
3×3 Array{SymPy.Sym,2}
⎡ei    0     0 ⎤
⎢              ⎥
⎢ 0   ai   -af ⎥
⎢              ⎥
⎣ 0    0    ae ⎦

Finite differences

julia> X = Float64[1 0 0; 0 1 0; 0 1 1]
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  1.0  1.0

julia> h = 0.01;  ɛ = zeros(3,3); ɛ[2,3] = h;

julia> (det(X + ɛ) - det(X))/h
-1.0000000000000009

ForwardDiff

julia> using ForwardDiff

julia> ForwardDiff.gradient(det,X)
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

@jrevels
Member

jrevels commented Mar 13, 2017

It does seem like ForwardDiff is giving the wrong answer here, thanks for pursuing this further @schrauf. I checked against ReverseDiff, whose answer matches SymPy:

julia> using ForwardDiff, ReverseDiff

julia> X = Float64[rand() 0 0; 0 rand() 0; 0 rand() rand()]
3×3 Array{Float64,2}:
 0.596899  0.0       0.0
 0.0       0.147049  0.0
 0.0       0.507581  0.2777

julia> ForwardDiff.gradient(det, X)
3×3 Array{Float64,2}:
 0.0408355  0.0       0.0
 0.0        0.165759  0.0
 0.0        0.0       0.0877734

julia> ReverseDiff.gradient(det, X)
3×3 Array{Float64,2}:
 0.0408355  0.0        0.0
 0.0        0.165759  -0.302975
 0.0        0.0        0.0877734

@jrevels jrevels reopened this Mar 13, 2017
@KristofferC
Collaborator

KristofferC commented Mar 13, 2017

With Tensors.jl I get:

<< t = Tensor{2,3}(Float64[1 0 0; 0 1 0; 0 1 1])
>> 3×3 Tensors.Tensor{2,3,Float64,9}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  1.0  1.0

<< gradient(det, t)
>> 3×3 Tensors.Tensor{2,3,Float64,9}:
 1.0  0.0   0.0
 0.0  1.0  -1.0
 0.0  0.0   1.0

which is correct. Strange because we use the Dual numbers from here...

@KristofferC
Collaborator

KristofferC commented Mar 13, 2017

The problem is in the Base definition of det. Using a hand-written version:

<< det2(A) = return (
       A[1,1]*(A[2,2]*A[3,3]-A[2,3]*A[3,2]) -
       A[1,2]*(A[2,1]*A[3,3]-A[2,3]*A[3,1]) +
       A[1,3]*(A[2,1]*A[3,2]-A[2,2]*A[3,1])
   )
>> det2 (generic function with 1 method)

<< ForwardDiff.gradient(det2,X)
>> 3×3 Array{Float64,2}:
 0.0281524  0.0         0.0     
 0.0        0.0479236  -0.495581
 0.0        0.0         0.241191

When the matrix is triangular, Base computes det from just the upper triangular part of the matrix, and I think the sensitivity is therefore not properly propagated to all the partials.

Using the hand-written det2 on the upper triangular part yields:

<< ForwardDiff.gradient(det2,UpperTriangular(X))
>> 3×3 UpperTriangular{Float64,Array{Float64,2}}:
 0.0281524  0.0        0.0
            0.0479236  0.0
                       0.241191

which seems to confirm the analysis above.

@jrevels
Member

jrevels commented Mar 13, 2017

Could it be because the Base det definition converts lower triangular inputs (our pathological input here) to UpperTriangular arrays:

function det{T}(A::AbstractMatrix{T})
    if istriu(A) || istril(A)
        S = typeof((one(T)*zero(T) + zero(T))/one(T))
        return convert(S, det(UpperTriangular(A)))
    end
    return det(lufact(A))
end

EDIT: Yup, this is the reason. This ends up calling a special version of det which only multiplies the diagonals:

det(A::UpperTriangular) = prod(diag(A.data))
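To see concretely how that definition loses derivative information, one can hand-seed a dual (a sketch using ForwardDiff's Dual type directly rather than the gradient API): the [2,3] entry has value 0.0 but carries the derivative seed, so any code path that never reads it silently drops the partial.

```julia
using ForwardDiff, LinearAlgebra

# Seed only the [2,3] entry of X = [1 0 0; 0 1 0; 0 1 1]
d(v, p = 0.0) = ForwardDiff.Dual(v, p)
X = [d(1.0) d(0.0) d(0.0);
     d(0.0) d(1.0) d(0.0, 1.0);   # value 0.0, derivative seed 1.0
     d(0.0) d(1.0) d(1.0)]

# Full cofactor expansion reads every entry, so the seed survives:
det_full(A) = A[1,1]*(A[2,2]*A[3,3] - A[2,3]*A[3,2]) -
              A[1,2]*(A[2,1]*A[3,3] - A[2,3]*A[3,1]) +
              A[1,3]*(A[2,1]*A[3,2] - A[2,2]*A[3,1])

ForwardDiff.partials(det_full(X))[1]    # -1.0, the correct partial

# The triangular fast path only reads the diagonal, so the seed is lost:
ForwardDiff.partials(prod(diag(X)))[1]  # 0.0
```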

@KristofferC
Collaborator

Out of curiosity, do you know if there is a term for this type of "deficiency" in automatic differentiation?

@jrevels
Member

jrevels commented Mar 13, 2017

Not really. I'll do some digging though, this is pretty interesting. I wonder if there are similar cases in Base linear algebra code where the optimizations aren't dual-safe.

As far as naive solutions go, we can define something like det(x::UpperTriangular{<:Dual}) = det(lufact(x.data)) (probably for LowerTriangular too).
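A runnable sketch of that naive idea (the helper name det_via_lu is made up; lufact on Julia 0.6 became lu on >= 0.7). Note the assumption it inherits from Base's fast path: parent(A) must be the full matrix whose values merely happen to be triangular, not a genuine triangular view over junk storage.

```julia
using ForwardDiff, LinearAlgebra
using ForwardDiff: Dual

# Route Dual-valued triangular wrappers back through generic LU so that
# partials stored off the triangle are not discarded.
det_via_lu(A::Union{UpperTriangular{<:Dual}, LowerTriangular{<:Dual}}) =
    det(lu(parent(A)))

X = Float64[1 0 0; 0 1 0; 0 1 1]

# The gradient now matches Jacobi's formula, [1 0 0; 0 1 -1; 0 0 1]:
ForwardDiff.gradient(A -> det_via_lu(UpperTriangular(A)), X)
```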

@jrevels
Member

jrevels commented Jul 4, 2018

So I'm at ISMP 2018 and just saw a talk where something clicked for me.

It turns out that these sorts of problematic functions can be modeled as piecewise differentiable from an AD perspective, meaning you can apply existing nonsmooth AD techniques (e.g. ANF) to them in order to recover information that is lost due to implementation structure. In other words, a caller that supports ANF can handle weird functions like these just fine. Of course, that requires that the caller (e.g. a solver) understand and utilize ANF...

More of a theoretical tidbit, but thought I'd leave a note here for posterity.

@dpsanders

My Google-fu has deserted me and I am unable to find what ANF might be.

@dpsanders

Ah, could that be Abs Normal Form that you were talking about?

@jrevels
Member

jrevels commented Sep 24, 2018

Ah, could that be Abs Normal Form that you were talking about?

Yes.

@PetrKryslUCSD

Is there still no solution to this problem?

@KristofferC
Collaborator

You could use something like #165 and extend the derivative of det for Matrix{<:Dual} based on the analytical expression.
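A hedged sketch of what such an extension could look like (the helper name det_analytic is made up, and it assumes the value matrix is invertible): the value goes through det of the values, and the partials go through Jacobi's formula.

```julia
using ForwardDiff, LinearAlgebra
using ForwardDiff: Dual, Partials, value, partials

# Value via det of the values; partials via Jacobi's formula:
# d(det A) = sum_{i,j} (det(A) * inv(A)')[i,j] * dA[i,j]
function det_analytic(A::Matrix{Dual{T,V,N}}) where {T,V,N}
    Av   = value.(A)
    dv   = det(Av)
    adjT = dv * transpose(inv(Av))   # assumes Av is invertible
    ps = ntuple(N) do k
        s = zero(V)
        for j in axes(A, 2), i in axes(A, 1)
            s += adjT[i, j] * partials(A[i, j], k)
        end
        s
    end
    return Dual{T}(dv, Partials(ps))
end

X = Float64[1 0 0; 0 1 0; 0 1 1]
ForwardDiff.gradient(det_analytic, X)   # [1 0 0; 0 1 -1; 0 0 1]
```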

@PetrKryslUCSD

PetrKryslUCSD commented Jul 28, 2019

I "solved" the issue with my own version of the determinant function. However, if this persists it will clearly waste someone else's time when they try to differentiate through some linear algebra with a determinant thrown in.

By the way, are there any other known functions in LinearAlgebra which don't differentiate through nicely?

@KristofferC
Collaborator

Basically, anything that exploits the sparsity of an array to compute the result.
