Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap #14373

Closed
wants to merge 2 commits into from

Conversation

sylvinus
Copy link
Contributor

What changes were proposed in this pull request?

Avoid overflow of Long type causing a NegativeArraySizeException a few lines later.

How was this patch tested?

Unit tests for HashedRelationSuite still pass.

I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this.

Thanks!

@@ -608,7 +608,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
def optimize(): Unit = {
val range = maxKey - minKey
// Convert to dense mode if it does not require more memory or could fit within L1 cache
if (range < array.length || range < 1024) {
if (range >= 0 && (range < array.length || range < 1024)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest a comment explaining why this check exists, but LGTM

@srowen
Copy link
Member

srowen commented Jul 26, 2016

Jenkins test this please

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 26, 2016

Oops. I added a useless comment and removed it. Sorry. Never mind that. I didn't see the previous comments of @srowen .

@SparkQA
Copy link

SparkQA commented Jul 26, 2016

Test build #62901 has finished for PR 14373 at commit a30ca9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jul 27, 2016

Thanks - merging in master/2.0.

@@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
def optimize(): Unit = {
val range = maxKey - minKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually shouldn't we just change range's type to long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a Scala expert so I'll trust your judgement!

maxKey being initialized to Long.MinValue, I wasn't sure of all the cases where range could end up negative so range >=0 seemed the safest thing to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're already Longs so range is too. The difference can overflow though. This should correctly handle the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah got it. let me merge this.

@rxin
Copy link
Contributor

rxin commented Jul 27, 2016

I haven't actually merged it yet.

@rxin
Copy link
Contributor

rxin commented Jul 28, 2016

Merging in master/2.0 for real this time.

asfgit pushed a commit that referenced this pull request Jul 28, 2016
Avoid overflow of Long type causing a NegativeArraySizeException a few lines later.

Unit tests for HashedRelationSuite still pass.

I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this.

Thanks!

Author: Sylvain Zimmer <sylvain@sylvainzimmer.com>

Closes #14373 from sylvinus/master.

(cherry picked from commit 1178d61)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 1178d61 Jul 28, 2016
@sylvinus
Copy link
Contributor Author

Thanks a lot!

@sylvinus
Copy link
Contributor Author

For reference, I have found another similar issue: https://issues.apache.org/jira/browse/SPARK-16802

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants