`pow(base, exp)` gradient inconsistency at `base=0.0` #3031

nhuurre · 2024-03-20T07:16:58Z

The following model fails to initialize due to bad gradient if s = 1 or s = 2 but initializes fine when s = 3 and s = 4.

data {
  int<lower=1,upper=4> s;
}
parameters {
  vector[2] x;
}
model {
  if (s == 1)
    target += distance(x,x);
  else if (s == 2)
    target += pow(squared_distance(x,x), 0.5);
  else if (s == 3)
    target += pow(squared_distance(x,x), 0.5 + 1e-16);
  else if (s == 4)
    target += abs(x[1] - x[1]) + abs(x[2] - x[2]);
}

This is surprising because s = 1 looks like a well-formed (though redundant) model and in any case s=1, s=2, and s=3 should all be equivalent. (s=4 is a one-dimensional version of s=1)

The behaviour for s=2 and s=3 is different because pow(base, exp) skips gradient calculation at base=0 in the general case (as if the gradient was zero) but delegates exp=0.5 case to sqrt() which always computes the mathematically correct gradient (infinity at base=0). And zero gradient ends up being accidentally correct whereas an infinite gradient is an unrecoverable error.

The mathematically correct value for the derivative of pow(base, exp) with respect to base at base=0.0 is

0.0 when exp > 1.0
1.0 when exp = 1.0
positive infinity when 0.0 < exp < 1.0
0.0 when exp = 0.0
negative infinity when exp < 0.0

It would be reasonable to round infinity down to a finite-but-near-maximum floating point number.
Moreover, Stan only requires gradients to be correct almost everywhere. Boundary points like base=0 are relevant only in expressions like distance(x,x) where the gradients cancel anyway and the only thing that matters is whether the intermediate gradient values are finite. So the most numerically stable choice might be to "round" the infinity all the way down to zero.
I'm inclined to think pow()'s current behaviour is correct for us and sqrt() (and possibly cbrt()) should be "fixed" to also have vanishing gradient at zero.

Current Version:

v4.8.1

The text was updated successfully, but these errors were encountered:

WardBrian · 2024-05-22T17:31:44Z

#3033

nhuurre mentioned this issue Mar 21, 2024

sqrt(x) gradient when x=0 #3033

Merged

4 tasks

This was referenced Mar 21, 2024

pow applied to mixed variables #2993

Closed

fix pow<var>(0, 0) not causing NAN adjoints #2994

Closed

WardBrian closed this as completed May 22, 2024

WardBrian mentioned this issue Mar 6, 2025

Generalized normal distribution #3157

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`pow(base, exp)` gradient inconsistency at `base=0.0` #3031

`pow(base, exp)` gradient inconsistency at `base=0.0` #3031

nhuurre commented Mar 20, 2024

WardBrian commented May 22, 2024

pow(base, exp) gradient inconsistency at base=0.0 #3031

pow(base, exp) gradient inconsistency at base=0.0 #3031

Comments

nhuurre commented Mar 20, 2024

Current Version:

WardBrian commented May 22, 2024

`pow(base, exp)` gradient inconsistency at `base=0.0` #3031

`pow(base, exp)` gradient inconsistency at `base=0.0` #3031