The following model fails to initialize due to bad gradient if s = 1 or s = 2 but initializes fine when s = 3 and s = 4.
data {
  int<lower=1, upper=4> s;
}
parameters {
  vector[2] x;
}
model {
  if (s == 1)
    target += distance(x, x);                            // fails to initialize
  else if (s == 2)
    target += pow(squared_distance(x, x), 0.5);          // fails to initialize
  else if (s == 3)
    target += pow(squared_distance(x, x), 0.5 + 1e-16);  // initializes fine
  else if (s == 4)
    target += abs(x[1] - x[1]) + abs(x[2] - x[2]);       // initializes fine
}
This is surprising because s = 1 looks like a well-formed (though redundant) model, and in any case s=1, s=2, and s=3 should all be equivalent. (s=4 is a one-dimensional version of s=1.)
The behaviour for s=2 and s=3 differs because pow(base, exp) skips the gradient calculation at base=0 in the general case (as if the gradient were zero) but delegates the exp=0.5 case to sqrt(), which always computes the mathematically correct gradient (infinity at base=0). The zero gradient ends up being accidentally correct, whereas an infinite gradient is an unrecoverable error.
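To illustrate why the infinite gradient is fatal while the skipped one is harmless, here is a minimal Python sketch of the chain rule through sqrt(squared_distance(x, x)). It is not Stan Math code: the function names are hypothetical, and it only models the behaviour described above (sqrt reporting +inf at zero, pow reporting 0 at zero). For brevity it propagates the gradient through the first argument only.

```python
import math


def sqrt_partial(u):
    """d/du sqrt(u) for u >= 0; +inf at u == 0, the mathematically correct value."""
    if u == 0.0:
        return math.inf
    return 0.5 / math.sqrt(u)


def pow_partial_skip_zero(base, exponent):
    """d/dbase base**exponent for base >= 0, reported as 0 at base == 0."""
    if base == 0.0:
        return 0.0  # gradient calculation skipped, as if it were zero
    return exponent * base ** (exponent - 1.0)


def grad_wrt_first_arg(a, b, outer_partial):
    """Chain rule for f(sum_i (a_i - b_i)^2) with respect to each a_i."""
    u = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    outer = outer_partial(u)  # df/du
    # df/da_i = df/du * du/da_i, where du/da_i = 2 * (a_i - b_i)
    return [outer * 2.0 * (ai - bi) for ai, bi in zip(a, b)]


if __name__ == "__main__":
    x = [0.3, -1.2]
    # sqrt path: inf * 0.0 is NaN under IEEE 754 arithmetic
    print(grad_wrt_first_arg(x, x, sqrt_partial))
    # general pow path: 0.0 * 0.0 stays finite
    print(grad_wrt_first_arg(x, x, lambda u: pow_partial_skip_zero(u, 0.5 + 1e-16)))
```

In this sketch the inner partial 2 * (x_i - x_i) is exactly zero, so the only question is whether the outer partial is finite: infinity times zero is NaN, while zero times zero is zero.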
The mathematically correct value for the derivative of pow(base, exp) with respect to base at base=0.0 is:
0.0 when exp > 1.0
1.0 when exp = 1.0
positive infinity when 0.0 < exp < 1.0
0.0 when exp = 0.0
negative infinity when exp < 0.0
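The cases above can be written down as a small lookup function. This is only a sketch of the table, taking the limit of exp * base^(exp - 1) as base approaches zero from above; the function name is hypothetical and is not part of any Stan API.

```python
import math


def pow_base_derivative_at_zero(exponent):
    """Limit of exponent * base**(exponent - 1) as base -> 0 from above."""
    if exponent > 1.0:
        return 0.0
    if exponent == 1.0:
        return 1.0
    if exponent > 0.0:
        return math.inf   # 0 < exponent < 1
    if exponent == 0.0:
        return 0.0        # base**0 is constant, so its derivative vanishes
    return -math.inf      # exponent < 0


if __name__ == "__main__":
    for e in (2.0, 1.0, 0.5, 0.0, -1.0):
        print(e, pow_base_derivative_at_zero(e))
```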
It would be reasonable to round infinity down to a finite-but-near-maximum floating point number.
Moreover, Stan only requires gradients to be correct almost everywhere. Boundary points like base=0 are relevant only in expressions like distance(x,x) where the gradients cancel anyway and the only thing that matters is whether the intermediate gradient values are finite. So the most numerically stable choice might be to "round" the infinity all the way down to zero.
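Both rounding options can be compared in a short sketch. The helper below is hypothetical (it is not how Stan Math is implemented); it replaces an infinite partial either with the largest finite double of the same sign or with zero, and then multiplies by the inner partial, which is exactly zero in expressions like distance(x,x).

```python
import math
import sys


def round_partial(p, mode):
    """Replace an infinite partial with a finite stand-in.

    mode == "max":  +/-inf becomes +/-largest finite double
    mode == "zero": +/-inf becomes 0.0
    Finite values pass through unchanged.
    """
    if math.isinf(p):
        if mode == "max":
            return math.copysign(sys.float_info.max, p)
        return 0.0
    return p


def chained(outer, inner):
    """One chain-rule step: outer partial times inner partial."""
    return outer * inner


if __name__ == "__main__":
    inner = 2.0 * (0.3 - 0.3)  # du/dx_i for squared_distance(x, x), exactly 0.0
    print(chained(math.inf, inner))                         # unrounded
    print(chained(round_partial(math.inf, "max"), inner))   # rounded to max double
    print(chained(round_partial(math.inf, "zero"), inner))  # rounded to zero
```

In this sketch either rounding keeps the product finite when the inner partial is exactly zero. Rounding to zero also stays finite when the inner partial is merely very small or very large, which is one way to read the "most numerically stable" argument above.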
I'm inclined to think pow()'s current behaviour is correct for us and sqrt() (and possibly cbrt()) should be "fixed" to also have vanishing gradient at zero.
Current Version:
v4.8.1