[SPARK-5128][MLLib] Add common used log1pExp API in MLUtils #3915

dbtsai · 2015-01-06T20:58:31Z

When x is large and positive, computing math.log(1 + math.exp(x)) leads to arithmetic
overflow. This happens when x > 709.78, which is not a very large number.
It can be addressed by rewriting the formula as x + math.log1p(math.exp(-x)) when x > 0.
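
To make the rewrite concrete, here is a minimal sketch of such a helper. The function name `log1pExp` is taken from the PR title; the wrapper object `Log1pExpSketch` and the `main` method are hypothetical and exist only for illustration, so the merged code may well look different.

```scala
object Log1pExpSketch {
  /** Computes log(1 + exp(x)) without overflowing for large positive x. */
  def log1pExp(x: Double): Double =
    if (x > 0) {
      // exp(-x) is at most 1 here, so it cannot overflow.
      x + math.log1p(math.exp(-x))
    } else {
      // exp(x) is at most 1 here; log1p keeps precision when it is tiny.
      math.log1p(math.exp(x))
    }

  def main(args: Array[String]): Unit = {
    val x = 800.0
    println(math.log(1 + math.exp(x))) // naive form: exp(800) overflows
    println(log1pExp(x))               // rewritten form stays finite
  }
}
```

Branching on the sign of x keeps the argument of `math.exp` non-positive in both cases, which is what rules out overflow.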

SparkQA · 2015-01-06T21:02:50Z

Test build #25112 has started for PR 3915 at commit 6b3ca72.

This patch merges cleanly.

SparkQA · 2015-01-06T22:12:30Z

Test build #25112 has finished for PR 3915 at commit 6b3ca72.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-06T22:12:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25112/
Test PASSed.

SparkQA · 2015-01-06T23:17:54Z

Test build #25120 has started for PR 3915 at commit e4957d5.

This patch merges cleanly.

SparkQA · 2015-01-07T00:29:36Z

Test build #25120 has finished for PR 3915 at commit e4957d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-07T00:29:40Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25120/
Test PASSed.

dbtsai · 2015-01-07T00:48:51Z

The MLOR case is not as simple as the binary case, but I managed to address it. It will be in another PR.

Actually, this also fixes the instability that occurs when the initial weights are not properly set.
In that case the margin becomes very large, and the computation simply overflows.

mengxr · 2015-01-07T05:28:13Z

mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala

@@ -61,14 +62,27 @@ abstract class Gradient extends Serializable {
 class LogisticGradient extends Gradient {
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val margin = -1.0 * dot(data, weights)
-    val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
+    /**
+     * gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label


Is it the logistic function? Maybe we should create a function for it as well.

Yeah, but in MLOR the formula is very different, since we have k-1 margins and we need to factor out the largest margin. As a result, it would be very hard to generalize across BLOR and MLOR. I plan to leave it as is, since we probably will not use it anywhere else. The following is the MLOR version, taken from a comment in another PR of mine.

 * `multiplier = exp(margins(i)) / (1 + \sum_k exp(margins(k))) -
 *   \alpha(label) * \delta_{label}{i+1}`
 *
 * where \alpha(label) = 1 if label != 0;
 *       \alpha(label) = 0 if label == 0, and
 *       \delta_{i}{j} = 1 if i == j;
 *       \delta_{i}{j} = 0 if i != j
 *
 * See the reference for the detailed mathematical derivation.
 *
 * However, the first part of the multiplier is likely to suffer from arithmetic overflow
 * if any one of the `margins` is larger than 709.78.
 *
 * Let's say the largest `margins(l)` is `maxMargin` and positive; then the formula can be
 * rewritten into an equivalent formula with more numerical stability:
 *
 *   exp(margins(i)) / (1 + \sum_k exp(margins(k))) =
 *   exp(margins(i) - maxMargin) / (exp(-margins(l)) + \sum_k exp(margins(k) - maxMargin))
 *
 * If we define margins'(i) = margins(i) - maxMargin when i != l,
 * and margins'(l) = -margins(l), we will have an equivalent, numerically stable formula:
 *
 * `multiplier = exp(margins'(i)) / (1 + \sum_k exp(margins'(k))) -
 *   \alpha(label) * \delta_{label}{i+1}`
 *
 * Note that if the largest of the `margins` is negative, the original formula is stable;
 * as a result, we don't need to change it.

PS: I'll add a test for overflow later. It happens quite often when there are outliers that are far away from the hyperplane, and I have run into this situation before.
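
The max-margin shift described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the other PR: the object `StableMultiplierSketch` and the methods `probabilities` and `multipliers` are hypothetical names, `margins` is assumed to be non-empty, and the shift is written as a subtraction in the exponent because exp(a) / exp(m) = exp(a - m).

```scala
object StableMultiplierSketch {
  /**
   * Returns exp(margins(i)) / (1 + sum_k exp(margins(k))) for every i.
   * The largest margin is subtracted only when it is positive, so every
   * argument passed to exp is at most zero.
   */
  def probabilities(margins: Array[Double]): Array[Double] = {
    val maxMargin = math.max(margins.max, 0.0)
    val shifted = margins.map(m => math.exp(m - maxMargin))
    // The constant 1 in the original denominator becomes exp(-maxMargin).
    val denom = math.exp(-maxMargin) + shifted.sum
    shifted.map(_ / denom)
  }

  /** Subtracts alpha(label) * delta_{label}{i+1} from each probability. */
  def multipliers(margins: Array[Double], label: Int): Array[Double] = {
    val p = probabilities(margins)
    Array.tabulate(margins.length) { i =>
      if (label != 0 && label == i + 1) p(i) - 1.0 else p(i)
    }
  }
}
```

When no margin is positive, `maxMargin` is 0.0 and the expression reduces to the original formula, which matches the note that the negative case needs no change.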

…code

SparkQA · 2015-01-07T07:37:33Z

Test build #25150 has started for PR 3915 at commit 23144f3.

This patch merges cleanly.

SparkQA · 2015-01-07T07:42:34Z

Test build #25151 has started for PR 3915 at commit 3239541.

This patch merges cleanly.

SparkQA · 2015-01-07T07:47:33Z

Test build #25152 has started for PR 3915 at commit bec6a84.

This patch merges cleanly.

SparkQA · 2015-01-07T08:47:24Z

Test build #25150 has finished for PR 3915 at commit 23144f3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-07T08:47:28Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25150/
Test PASSed.

SparkQA · 2015-01-07T08:52:10Z

Test build #25151 has finished for PR 3915 at commit 3239541.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-07T08:52:14Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25151/
Test PASSed.

SparkQA · 2015-01-07T08:56:11Z

Test build #25152 has finished for PR 3915 at commit bec6a84.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-07T08:56:15Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25152/
Test PASSed.

srowen · 2015-01-07T16:33:46Z

LGTM. The method successfully avoids overflow in the intended way, there's a test, and, from checking the old and new logic, the two should give identical results.
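
One way to spot-check that the old and new forms agree is to compare them over a range where the naive form does not overflow. The sketch below is illustrative only and is not the test added in this PR; `Log1pExpCheck`, `naive`, `stable`, and `maxAbsDiff` are hypothetical names.

```scala
object Log1pExpCheck {
  /** Original form; overflows once exp(x) exceeds the double range. */
  def naive(x: Double): Double = math.log(1 + math.exp(x))

  /** Rewritten form from the PR description. */
  def stable(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))

  /** Largest absolute difference between the two forms over the inputs. */
  def maxAbsDiff(xs: Seq[Double]): Double =
    xs.map(x => math.abs(naive(x) - stable(x))).max

  def main(args: Array[String]): Unit = {
    // Grid from -30.0 to 30.0 in steps of 0.5.
    val grid = (-60 to 60).map(_ * 0.5)
    println(maxAbsDiff(grid))
  }
}
```

An absolute difference is used rather than a relative one because, for very negative x, the naive form rounds toward 0 while `log1p` keeps the small value.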

mengxr · 2015-01-07T18:14:00Z

Merged into master. Thanks!

mengxr mentioned this pull request Jan 7, 2015

[SPARK-5099][Mllib] Simplify logistic loss function #3899

Closed

mengxr reviewed Jan 7, 2015

DB Tsai added 5 commits January 6, 2015 23:30

first commit

64eefd0

address another overflow issue in gradientMultiplier in LOR gradient …

f8447f9

…code

formating

6c29ed3

temp

49f3658

doc

23144f3

revert part of patch into another PR

3239541

remove empty line

bec6a84

dbtsai changed the title ~~[SPARK-5101] Add common ML math functions~~ [SPARK-5101] Add common used log1pExp API in MLUtils Jan 7, 2015

dbtsai changed the title ~~[SPARK-5101] Add common used log1pExp API in MLUtils~~ [SPARK-5128][MLLib] Add common used log1pExp API in MLUtils Jan 7, 2015

asfgit closed this in 60e2d9e Jan 7, 2015

dbtsai deleted the mathutil branch January 7, 2015 19:40
