-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5128][MLLib] Add common used log1pExp API in MLUtils #3915
Conversation
Test build #25112 has started for PR 3915 at commit
|
Test build #25112 has finished for PR 3915 at commit
|
Test PASSed. |
Test build #25120 has started for PR 3915 at commit
|
Test build #25120 has finished for PR 3915 at commit
|
Test PASSed. |
For MLOR case, it's not as simple as binary case, but I managed to address it. Will be in another PR. Actually, this solves instability issue when the initial weights is not properly set. |
@@ -61,14 +62,27 @@ abstract class Gradient extends Serializable { | |||
class LogisticGradient extends Gradient { | |||
override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = { | |||
val margin = -1.0 * dot(data, weights) | |||
val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label | |||
/** | |||
* gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the logistic function? Maybe we should make create a function for it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but in MLOR, the formula is very different since we have k-1 margins, and we need to carry out the largest margin. As a result, it will be very hard to generalized for BLOR and MLOR. I plan to have it as it since we probably will not use it in other place. The following is the version for MLOR from the comment of my another PR.
* `multiplier = exp(margins(i)) / (1 + \sum_k exp(margins(k))) -
* \alpha(label) * \delta_{label}{i+1}`
*
* where \alpha(label) = 1 if label != 0;
* \alpha(label) = 0 if label == 0, and
* \delta_{i}{j} = 1 if i == j;
* \delta_{i}{j} = 0 if i != j
*
* See the reference for the detailed mathematical derivation.
*
* However, the first part of multiplier will be likely suffered from arithmetic overflow,
* if any one of the `margins` is larger than 709.78.
*
* Let's say the largest `margins(l)` is `maxMargin` and positive, then the formula can be
* rewritten into equivalent formula with more numerical stability by
*
* exp(margins(i)) / (1 + \sum_k exp(margins(k))) =
* exp(margins(i) / maxMargin) / (exp(-margins(l)) + \sum_k exp(margins(k) / maxMargin))
*
* If we define margins'(i) = margins(i) / maxMargin when i != l,
* and margins'(l) = -margins(l), we will have equivalent numerically stable formula as
*
* `multiplier = exp(margins'(i)) / (1 + \sum_k exp(margins'(k))) -
* \alpha(label) * \delta_{label}{i+1}`
*
* Note that if the largest `margins` is negative, the original formula is stable;
* as a result, we don't need to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS, I'll add a test to test overflow later. It does happen quite often when there are outliners which are far away from the hyperplane, and I run into this situation before.
Test build #25150 has started for PR 3915 at commit
|
Test build #25151 has started for PR 3915 at commit
|
Test build #25152 has started for PR 3915 at commit
|
Test build #25150 has finished for PR 3915 at commit
|
Test PASSed. |
Test build #25151 has finished for PR 3915 at commit
|
Test PASSed. |
Test build #25152 has finished for PR 3915 at commit
|
Test PASSed. |
LGTM. The method successfully avoids overflow in the intended way, there's a test, and from checking the old and new logic they should be identical. |
Merged into master. Thanks! |
When
x
is positive and large, computingmath.log(1 + math.exp(x))
will lead to arithmeticoverflow. This will happen when
x > 709.78
which is not a very large number.It can be addressed by rewriting the formula into
x + math.log1p(math.exp(-x))
whenx > 0
.