-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap #14373
Conversation
@@ -608,7 +608,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap | |||
def optimize(): Unit = { | |||
val range = maxKey - minKey | |||
// Convert to dense mode if it does not require more memory or could fit within L1 cache | |||
if (range < array.length || range < 1024) { | |||
if (range >= 0 && (range < array.length || range < 1024)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest a comment explaining why this check exists, but LGTM
Jenkins test this please |
Oops. I added a useless comment and removed it. Sorry. Never mind that. I didn't see the previous comments of @srowen . |
Test build #62901 has finished for PR 14373 at commit
|
Thanks - merging in master/2.0. |
@@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap | |||
def optimize(): Unit = { | |||
val range = maxKey - minKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually shouldn't we just change range's type to long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a Scala expert so I'll trust your judgement!
maxKey
being initialized to Long.MinValue
, I wasn't sure of all the cases where range
could end up negative so range >=0
seemed the safest thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're already Long
s so range
is too. The difference can overflow though. This should correctly handle the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it. let me merge this.
I haven't actually merged it yet. |
Merging in master/2.0 for real this time. |
Avoid overflow of Long type causing a NegativeArraySizeException a few lines later. Unit tests for HashedRelationSuite still pass. I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this. Thanks! Author: Sylvain Zimmer <sylvain@sylvainzimmer.com> Closes #14373 from sylvinus/master. (cherry picked from commit 1178d61) Signed-off-by: Reynold Xin <rxin@databricks.com>
Thanks a lot! |
For reference, I have found another similar issue: https://issues.apache.org/jira/browse/SPARK-16802 |
What changes were proposed in this pull request?
Avoid overflow of Long type causing a NegativeArraySizeException a few lines later.
How was this patch tested?
Unit tests for HashedRelationSuite still pass.
I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this.
Thanks!