support for the determinant function #197
ForwardDiff is giving the correct result; your implementation of Jacobi's formula is incorrect:

julia> using ForwardDiff
julia> A = rand(3, 3);
julia> adj(A) = det(A) * inv(A)
adj (generic function with 1 method)
# what Jacobi's Formula actually says should hold true
julia> isapprox(adj(A)', ForwardDiff.gradient(det, A))
true

You can also verify by checking with finite differences and reverse-mode AD, which give similar results. |
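A minimal sketch of the finite-difference check mentioned above (my own addition; fd_gradient is a made-up helper, and the snippet assumes a recent Julia with the LinearAlgebra stdlib):

using ForwardDiff, LinearAlgebra

# Central finite differences of f at A, entry by entry.
function fd_gradient(f, A; h = 1e-6)
    G = similar(A, Float64)
    for idx in eachindex(A)
        E = zeros(size(A)); E[idx] = h
        G[idx] = (f(A + E) - f(A - E)) / (2h)
    end
    return G
end

A = rand(3, 3)
adj(A) = det(A) * inv(A)
isapprox(fd_gradient(det, A), adj(A)')                        # expected: true
isapprox(fd_gradient(det, A), ForwardDiff.gradient(det, A))   # expected: true for a generic A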
I too came across the same issue. While a generic (randomly generated) matrix seems to give the correct result, specific ones appear to be wrong.
I apologize in advance if this is a problem with my algebra, but I don't understand why I am getting it wrong. Cheers, Matías |
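The specific example did not survive in the comment above; a hedged reconstruction of the kind of case meant, using the triangular matrix that appears later in this thread:

using ForwardDiff, LinearAlgebra

X = Float64[1 0 0; 0 1 0; 0 1 1]

(det(X) * inv(X))'             # Jacobi's formula: [1 0 0; 0 1 -1; 0 0 1]
ForwardDiff.gradient(det, X)   # reported below as the identity matrix, i.e. the [2,3] entry is lost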
Edit: This is wrong.
The determinant of that matrix is always the same, so ForwardDiff seems correct. Do all cofactors need to be nonzero for Jacobi's formula to work? |
[UPDATE: please see comment below] You are right. I could not find any reference as to why Jacobi's formula would not apply, with conditions on the cofactors or otherwise (it is commonly stated as a general property, i.e. applying to all cases, which puzzles me very much). But doing the derivative by hand does give (hopefully) the correct result, and this coincides with the one from ForwardDiff. Also, thank you for your time here and elsewhere; I am now trying to use Tensors.jl in research and it's great! |
Hello, after looking more carefully at the problem, I still think there is an issue with the gradient of the determinant function: whether we use symbolic differentiation or a finite-difference approximation, we arrive at the answer given by Jacobi's formula. Please take a look at position [2,3] of the gradient:

Symbolic

julia> using SymPy
julia> entries = @syms a b c d e f g h i real=true;
julia> A = reshape([entries...],(3,3))
3×3 Array{SymPy.Sym,2}
⎡a d g⎤
⎢ ⎥
⎢b e h⎥
⎢ ⎥
⎣c f i⎦
julia> detA = det(A) |> simplify
a⋅e⋅i - a⋅f⋅h - b⋅d⋅i + b⋅f⋅g + c⋅d⋅h - c⋅e⋅g
julia> dA = diff.(detA,A)
3×3 Array{SymPy.Sym,2}
⎡e⋅i - f⋅h -b⋅i + c⋅h b⋅f - c⋅e ⎤
⎢ ⎥
⎢-d⋅i + f⋅g a⋅i - c⋅g -a⋅f + c⋅d⎥
⎢ ⎥
⎣d⋅h - e⋅g -a⋅h + b⋅g a⋅e - b⋅d ⎦
julia> dA |> subs(b=>0,c=>0,d=>0,g=>0,h=>0)
3×3 Array{SymPy.Sym,2}
⎡e⋅i 0 0 ⎤
⎢ ⎥
⎢ 0 a⋅i -a⋅f⎥
⎢ ⎥
⎣ 0 0 a⋅e ⎦

Finite differences

julia> X = Float64[1 0 0; 0 1 0; 0 1 1]
3×3 Array{Float64,2}:
1.0 0.0 0.0
0.0 1.0 0.0
0.0 1.0 1.0
julia> h = 0.01; ɛ = zeros(3,3); ɛ[2,3] = h;
julia> (det(X) - det(X + ɛ))/h
1.0000000000000009

(as written this is minus the derivative, so the finite-difference estimate of the [2,3] entry is -1, matching the symbolic result above)

ForwardDiff

julia> using ForwardDiff
julia> ForwardDiff.gradient(det,X)
3×3 Array{Float64,2}:
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0 |
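As a sanity check on the single-entry finite difference above, here is a sketch that estimates every entry of the gradient with central differences (my own addition, assuming a recent Julia):

using LinearAlgebra

X = Float64[1 0 0; 0 1 0; 0 1 1]
h = 1e-6
G = zeros(3, 3)
for i in 1:3, j in 1:3
    E = zeros(3, 3); E[i, j] = h
    G[i, j] = (det(X + E) - det(X - E)) / (2h)   # central difference in direction (i, j)
end
G   # ≈ [1 0 0; 0 1 -1; 0 0 1], matching the symbolic gradient above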
It does seem like ForwardDiff is giving the wrong answer here, thanks for pursuing this further @schrauf. I checked against ReverseDiff, whose answer matches SymPy:

julia> using ForwardDiff, ReverseDiff
julia> X = Float64[rand() 0 0; 0 rand() 0; 0 rand() rand()]
3×3 Array{Float64,2}:
0.596899 0.0 0.0
0.0 0.147049 0.0
0.0 0.507581 0.2777
julia> ForwardDiff.gradient(det, X)
3×3 Array{Float64,2}:
0.0408355 0.0 0.0
0.0 0.165759 0.0
0.0 0.0 0.0877734
julia> ReverseDiff.gradient(det, X)
3×3 Array{Float64,2}:
0.0408355 0.0 0.0
0.0 0.165759 -0.302975
0.0 0.0 0.0877734 |
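For completeness, a small cross-check (my own addition) that ReverseDiff's answer also agrees with Jacobi's formula for this kind of matrix:

using ReverseDiff, LinearAlgebra

X = Float64[rand() 0 0; 0 rand() 0; 0 rand() rand()]
isapprox(ReverseDiff.gradient(det, X), (det(X) * inv(X))')   # expected: true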
With Tensors.jl I get:

<< t = Tensor{2,3}(Float64[1 0 0; 0 1 0; 0 1 1])
>> 3×3 Tensors.Tensor{2,3,Float64,9}:
1.0 0.0 0.0
0.0 1.0 0.0
0.0 1.0 1.0
<< gradient(det, t)
>> 3×3 Tensors.Tensor{2,3,Float64,9}:
1.0 0.0 0.0
0.0 1.0 -1.0
0.0 0.0 1.0

which is correct. Strange, because we use the Dual numbers from here... |
The problem is in Base's definition of det. With a hand-written 3×3 determinant:

<< det2(A) = return (
A[1,1]*(A[2,2]*A[3,3]-A[2,3]*A[3,2]) -
A[1,2]*(A[2,1]*A[3,3]-A[2,3]*A[3,1]) +
A[1,3]*(A[2,1]*A[3,2]-A[2,2]*A[3,1])
)
>> det2 (generic function with 1 method)
<< ForwardDiff.gradient(det2,X)
>> 3×3 Array{Float64,2}:
0.0281524 0.0 0.0
0.0 0.0479236 -0.495581
0.0 0.0 0.241191

When the matrix is triangular, Base computes the determinant through a special triangular method (see the next comment). Using the hand-written det2 on the UpperTriangular wrapper:

<< ForwardDiff.gradient(det2,UpperTriangular(X))
>> 3×3 UpperTriangular{Float64,Array{Float64,2}}:
0.0281524 0.0 0.0
⋅ 0.0479236 0.0
⋅ ⋅ 0.241191

which seems to confirm the analysis above. |
Could it be because the Base function is defined as

det{T}(A::AbstractMatrix{T})
if istriu(A) || istril(A)
S = typeof((one(T)*zero(T) + zero(T))/one(T))
return convert(S, det(UpperTriangular(A)))
end
return det(lufact(A))
end

EDIT: Yup, this is the reason. This ends up calling a special version of det:

det(A::UpperTriangular) = prod(diag(A.data)) |
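A minimal illustration of why that fast path breaks the gradient (my own sketch; proddiag and det3 are made-up names): differentiating the product of the diagonal can only ever produce a diagonal gradient, while a determinant that reads every entry recovers the [2,3] term.

using ForwardDiff, LinearAlgebra

X = Float64[1 0 0; 0 1 0; 0 1 1]

# What the triangular fast path effectively differentiates:
proddiag(A) = prod(diag(A))
ForwardDiff.gradient(proddiag, X)   # identity matrix; the [2,3] partial is structurally zero

# A determinant that touches every entry (same idea as det2 above):
det3(A) = A[1,1]*(A[2,2]*A[3,3] - A[2,3]*A[3,2]) -
          A[1,2]*(A[2,1]*A[3,3] - A[2,3]*A[3,1]) +
          A[1,3]*(A[2,1]*A[3,2] - A[2,2]*A[3,1])
ForwardDiff.gradient(det3, X)       # ≈ [1 0 0; 0 1 -1; 0 0 1]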
Out of curiosity, do you know if there is a term for this type of "deficiency" in automatic differentiation? |
Not really. I'll do some digging though, this is pretty interesting. I wonder if there are similar cases in Base linear algebra code where the optimizations aren't dual-safe. As far as naive solutions go, we can define something like |
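The specific definition is not shown above; as a hedged guess at the shape of such an override (my sketch, not the commenter's code; it assumes ForwardDiff's Dual, value, and partials API and an invertible input), one could compute det on the primal values and push the perturbations through Jacobi's formula so the triangular fast path never sees the Dual entries:

using ForwardDiff, LinearAlgebra
using ForwardDiff: Dual, value, partials

function LinearAlgebra.det(A::AbstractMatrix{Dual{T,V,N}}) where {T,V,N}
    Av   = value.(A)           # primal values
    d    = det(Av)
    adjA = d * inv(Av)         # adjugate of the primal matrix (assumes invertibility)
    ps   = ntuple(N) do k
        s = zero(d)
        for j in axes(A, 2), i in axes(A, 1)
            s += adjA[j, i] * partials(A[i, j], k)   # ∂det/∂A[i,j] = adj(A)[j,i]
        end
        s
    end
    return Dual{T}(d, ps...)
end

Overriding det for another package's number type from user code is type piracy, so this only illustrates the idea; a real fix would have to live in ForwardDiff itself.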
So I'm at ISMP 2018 and just saw a talk where something clicked for me. It turns out that these sorts of problematic functions can be modeled as piecewise differentiable from an AD perspective, meaning you can apply existing nonsmooth AD techniques to them (e.g. ANF) in order to recover information that is lost due to implementation structure. What that means is that if you have a caller that supports ANF, it can handle weird functions like these just fine. Of course, that requires that the caller (e.g. a solver) understand and utilize ANF... More of a theoretical tidbit, but I thought I'd leave a note here for posterity. |
My Google-fu has deserted me and I am unable to find what ANF might be. |
Ah, could that be Abs Normal Form that you were talking about? |
Yes. |
Is there still no solution to this problem? |
You could use something like #165 and extend the derivative of det yourself. |
I "solved" the issue with my own version of the determinant function. However, if this persists it will clearly waste someone else's time when they try to differentiate through some linear algebra with a determinant thrown in. By the way, are there any other known functions in LinearAlgebra which don't differentiate through nicely? |
Basically, anything that exploits the sparsity of an array to compute the result. |
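The "own version of the determinant function" mentioned above isn't shown; one simple shape such a replacement could take (a sketch on a recent Julia; mydet is a made-up name) is to always go through an LU factorization, which ForwardDiff differentiates entry by entry, so the triangular shortcut is never taken:

using ForwardDiff, LinearAlgebra

# check = false avoids an error for exactly singular inputs
mydet(A::AbstractMatrix) = det(lu(A, check = false))

X = Float64[1 0 0; 0 1 0; 0 1 1]
ForwardDiff.gradient(mydet, X)   # ≈ [1 0 0; 0 1 -1; 0 0 1]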
I naively tried ForwardDiff.gradient(det, A) and compared it to the output of (det(A) * inv(A))' * A (Jacobi's formula for a real square matrix A). They should be the same value, but they are radically different. What am I getting wrong here? Shouldn't ForwardDiff support det out of the box, since it's a core function of Julia? Thanks.