-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2479][MLlib] Comparing floating-point numbers using relative error in UnitTests #1425
[SPARK-2479][MLlib] Comparing floating-point numbers using relative error in UnitTests #1425
Conversation
QA tests have started for PR 1425. This patch merges cleanly. |
val weight0 = model.weights(0) | ||
assert(weight0 >= -1.60 && weight0 <= -1.40, weight0 + " not in [-1.6, -1.4]") | ||
assert(model.intercept >= 1.9 && model.intercept <= 2.1, model.intercept + " not in [1.9, 2.1]") | ||
assert(model.weights(0).almostEquals(-1.5244128696247), "weight0 should be -1.5244128696247") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should leave some buffer space for the weights. In case we update the implementation, it would be tedious to update the numbers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have higher relative error here instead. If the implementation is changed, it's also nice to have a test which can catch the slightly different behavior. Also, updating those numbers will not take too much time comparing with the implementation work.
@dbtsai The assertions with Btw, Scalatest 2.x has this tolerance feature, where you can use |
|
||
class BinaryClassificationMetricsSuite extends FunSuite with LocalSparkContext { | ||
|
||
implicit class SeqDoubleWithAlmostEquals(val x: Seq[Double]) { | ||
def almostEquals(y: Seq[Double], eps: Double = 1E-6): Boolean = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.0e-6 is way bigger than an ulp for a double; 1.0e-12 is more like it. I understand a complex calculation might legitimately vary by significantly more than an ulp depending on the implementation. As @mengxr says where you mean to allow significantly more than machine precision worth of noise, that's probably good to do with an explicitly larger epsilon. But this is certainly a good step forward already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for one ulp, it might be 10e-15. Lots of time, I manually type the numbers or just copy the first couple dights of numbers to save the line space, so that's why I chose 1.0e-6. Thus, I can just type around 7 digits of numbers.
I agree with you that in this case, we may want to explicitly specify with larger epsilon.
@mengxr Scalatest 2.x has the tolerance feature, but it's absolute error not relative error. For large numbers, the absolute error may not be meaningful. With I probably can add method called |
|
I learn
|
QA tests have started for PR 1425. This patch merges cleanly. |
QA tests have started for PR 1425. This patch merges cleanly. |
QA tests have started for PR 1425. This patch merges cleanly. |
@dbtsai The suggestion of using |
|
@dbtsai this is awesome! I actually created a JIRA on this after trying to use TestUtils in one of my unit suites, but it looks like you're already taking care of it. https://issues.apache.org/jira/browse/SPARK-2599 |
@dbtsai I saw why |
true | ||
} else if(math.abs(x) > math.abs(y)) { | ||
math.abs(x - y) / math.abs(x) < epsilon | ||
} else if (absX < 1E-15 || absY < 1E-15) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See commentary at https://issues.apache.org/jira/browse/SPARK-2599?focusedCommentId=14068293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068293 I can see the idea here, but all of these kinds of efforts seem to lead to errors or unintuitive behavior. For example this line means that:
Fails:
expected = 1e-15
actual = 2e-15
eps = 0.1
Passes:
expected = 1e-16
actual = 2e-16
eps = 0.1
Why is 1e-15 special anyway?
Is it possible to support syntax like Also, there is the method |
I also noticed that it will be suffering when comparing against 0.0. As @srowen pointed out, it's meaningless to comparing against 0.0 (or a really small number) with relative error. However, people may just want to write unittest using relative error for even comparing those small numbers. So I purpose the following APIs. |
@srowen |
@mengxr Sure, maybe the % syntax isn't helpful. I just mean two different operators or methods of some kind. Why bother with these issues instead of making two methods? Yes, choosing 1.0 as the switching point removes the discontinuity. I think it will surprise readers to find that, in a test with a series of checks like "0.1 +- 0.01", "1 +- 0.01", "3 +- 0.01" that the latter doesn't mean [2.99,3.01], when the first two do in fact mean [0.09,0.11] and [0.99,1.01], which matches what one would expect from all these other unit testing frameworks. @dbtsai I think developing two operators is a good solution If there is a separate operator for relative error, you don't need to special-case the behavior. Sure it's meaningless to make a relative error test at 0 but you can just warn the caller; it's well-defined what happens. |
QA tests have started for PR 1425. This patch merges cleanly. |
QA tests have started for PR 1425. This patch merges cleanly. |
Based on our discussion, I've implemented two different APIs for relative error, and absolute error. It makes sense that test writers should know which one they need depending on their circumstances. Developers also need to explicitly specify the eps now, and there is no default value which will sometimes cause confusion. When comparing against zero using relative error, a exception will be raised to warn users that it's meaningless. For relative error in percentage, users can now write
For absolute error,
|
QA tests have started for PR 1425. This patch merges cleanly. |
QA results for PR 1425: |
QA results for PR 1425: |
QA results for PR 1425: |
QA tests have started for PR 1425. This patch merges cleanly. |
QA results for PR 1425: |
@dbtsai As discussed with @srowen , the Instead of adding more sugar in testing, how about using
I still recommend using |
@dbtsai I think it is very uncommon to combine the scientific notation with percentage, like |
QA tests have started for PR 1425. This patch DID NOT merge cleanly! |
QA tests have started for PR 1425. This patch DID NOT merge cleanly! |
QA tests have started for PR 1425. This patch DID NOT merge cleanly! |
@@ -40,27 +41,51 @@ class KMeansSuite extends FunSuite with LocalSparkContext { | |||
// No matter how many runs or iterations we use, we should get one cluster, | |||
// centered at the mean of the points | |||
|
|||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not merged cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to rebase against master with conflicts. I addressed them in the next push.
QA tests have started for PR 1425. This patch merges cleanly. |
QA results for PR 1425: |
QA results for PR 1425: |
QA results for PR 1425: |
@mengxr Resolved all the conflicts after rebasing, and all the unittests are passed. Thanks. |
QA results for PR 1425: |
There were some problems with pyspark. Let's call Jenkins again. |
Jenkins, retest this please. |
1 similar comment
Jenkins, retest this please. |
QA tests have started for PR 1425. This patch merges cleanly. |
QA results for PR 1425: |
LGTM. I'm merging this into master! Thanks!! |
Allow small errors in comparison. @dbtsai , this unit test blocks apache#1562 . I may need to merge this one first. We can change it to use the tools in apache#1425 after that PR gets merged. Author: Xiangrui Meng <meng@databricks.com> Closes apache#1576 from mengxr/fix-binary-metrics-unit-tests and squashes the following commits: 5076a7f [Xiangrui Meng] fix binary metrics unit tests
…rror in UnitTests Floating point math is not exact, and most floating-point numbers end up being slightly imprecise due to rounding errors. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations or the precision of intermediates can change the result. That means that comparing two floats to see if they are equal is usually not what we want. As long as this imprecision stays small, it can usually be ignored. Based on discussion in the community, we have implemented two different APIs for relative tolerance, and absolute tolerance. It makes sense that test writers should know which one they need depending on their circumstances. Developers also need to explicitly specify the eps, and there is no default value which will sometimes cause confusion. When comparing against zero using relative tolerance, a exception will be raised to warn users that it's meaningless. For relative tolerance, users can now write assert(23.1 ~== 23.52 relTol 0.02) assert(23.1 ~== 22.74 relTol 0.02) assert(23.1 ~= 23.52 relTol 0.02) assert(23.1 ~= 22.74 relTol 0.02) assert(!(23.1 !~= 23.52 relTol 0.02)) assert(!(23.1 !~= 22.74 relTol 0.02)) // This will throw exception with the following message. // "Did not expect 23.1 and 23.52 to be within 0.02 using relative tolerance." assert(23.1 !~== 23.52 relTol 0.02) // "Expected 23.1 and 22.34 to be within 0.02 using relative tolerance." assert(23.1 ~== 22.34 relTol 0.02) For absolute error, assert(17.8 ~== 17.99 absTol 0.2) assert(17.8 ~== 17.61 absTol 0.2) assert(17.8 ~= 17.99 absTol 0.2) assert(17.8 ~= 17.61 absTol 0.2) assert(!(17.8 !~= 17.99 absTol 0.2)) assert(!(17.8 !~= 17.61 absTol 0.2)) // This will throw exception with the following message. // "Did not expect 17.8 and 17.99 to be within 0.2 using absolute error." assert(17.8 !~== 17.99 absTol 0.2) // "Expected 17.8 and 17.59 to be within 0.2 using absolute error." assert(17.8 ~== 17.59 absTol 0.2) Authors: DB Tsai <dbtsaialpinenow.com> Marek Kolodziej <marekalpinenow.com> Author: DB Tsai <dbtsai@alpinenow.com> Closes apache#1425 from dbtsai/SPARK-2479_comparing_floating_point and squashes the following commits: 8c7cbcc [DB Tsai] Alpine Data Labs
Floating point math is not exact, and most floating-point numbers end up being slightly imprecise due to rounding errors.
Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations or the precision of intermediates can change the result.
That means that comparing two floats to see if they are equal is usually not what we want. As long as this imprecision stays small, it can usually be ignored.
Based on discussion in the community, we have implemented two different APIs for relative tolerance, and absolute tolerance. It makes sense that test writers should know which one they need depending on their circumstances.
Developers also need to explicitly specify the eps, and there is no default value which will sometimes cause confusion.
When comparing against zero using relative tolerance, a exception will be raised to warn users that it's meaningless.
For relative tolerance, users can now write
For absolute error,
Authors:
DB Tsai dbtsai@alpinenow.com
Marek Kolodziej marek@alpinenow.com