-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16664][SQL] Fix persist call on Data frames with more than 200… #14324
Conversation
… columns is wiping out the data.
Can you add a test case? |
Yes, working on that |
@breakdawn it'd be great to do more tests when you open a request. As I'm investigating into this too, I found that my same fix works for 201 cols but fails for 8118 cols. The exact limit is 8117. |
@lw-lin You're right, thanks for your suggestion. |
8118 cols limit due to janino, the exception like following, might be another story |
@breakdawn yes that's a different issue and I've been looking into it. Regarding what this PR tries to fix, could you run this PR's change against this test case to see whether it's sufficient? |
@@ -1571,4 +1571,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { | |||
checkAnswer(joined, Row("x", null, null)) | |||
checkAnswer(joined.filter($"new".isNull), Row("x", null, null)) | |||
} | |||
|
|||
test("SPARK-16664: persist with more than 200 columns") { | |||
val size = 201l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: write 201L for a long literal; it's too easy to read this as 2011.
There are actually 55 occurrences of this type of problem in the code base. I think I will open a PR separately to fix them. It might or might not cause a problem in practice in other cases, but many are in examples or tests, where we might not observe the consequence. |
@lw-lin umm, thanks for pointing it out. Since the limit is 8117, 10000 will fail, that case needs a update. |
@breakdawn what else can we do to actually fix the ≥ 8118 cols issue? We're actually running out of the constant pool when we compile the generated code. So maybe compile it into multiple classes? Or just fall back to the non-code-gen path? Thanks. |
@lw-lin Personally, multiple classes way is smoother base on current implementation. But no matter in what way, it's a big change, maybe it's better to open another jira issue to involve more discussions. |
Jenkins test this please |
Test build #62876 has finished for PR 14324 at commit
|
@@ -227,7 +227,8 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext { | |||
val columnTypes1 = List.fill(length1)(IntegerType) | |||
val columnarIterator1 = GenerateColumnAccessor.generate(columnTypes1) | |||
|
|||
val length2 = 10000 | |||
//SPARK-16664: the limit of janino is 8117 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this needs a space after //
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Sorry for that.
Jenkins retest this please |
Jenkins add to whitelist |
Test build #62920 has finished for PR 14324 at commit
|
## What changes were proposed in this pull request? f12f11e introduced this bug, missed foreach as map ## How was this patch tested? Test added Author: Wesley Tang <tangmingjun@mininglamp.com> Closes #14324 from breakdawn/master. (cherry picked from commit d1d5069) Signed-off-by: Sean Owen <sowen@cloudera.com>
Merged to master/2.0/1.6 |
Darn, this breaks 1.6, because the test doesn't compile. I'll revert it in 1.6. @breakdawn if you're willing, could you open a PR vs 1.6 that updates the test to work in that branch? |
f12f11e introduced this bug, missed foreach as map Test added Author: Wesley Tang <tangmingjun@mininglamp.com> Closes apache#14324 from breakdawn/master. (cherry picked from commit d1d5069) Signed-off-by: Sean Owen <sowen@cloudera.com> (cherry picked from commit 15abbf9)
What changes were proposed in this pull request?
f12f11e introduced this bug, missed foreach as map
How was this patch tested?
Test added