[SPARK-23312][SQL] add a config to turn off vectorized cache reader #20483
LGTM pending Jenkins.
buildConf("spark.sql.inMemoryColumnarStorage.enableVectorizedReader")
  .doc("Enables vectorized reader for columnar caching.")
  .booleanConf
  .createWithDefault(true)
To unblock 2.3, I think we need to disable this with `false`.
Sorry, I'm taking this back since it's too radical in general.
`internal`?
the parquet/orc vectorized reader conf is also public.
Test build #86966 has finished for PR 20483 at commit
Test build #86967 has finished for PR 20483 at commit
retest this please.
Test build #86983 has finished for PR 20483 at commit
LGTM
Thanks, merging to master/2.3!
## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-23309 reported a performance regression for cached tables in Spark 2.3. While the investigation is still ongoing, this PR adds a conf to turn off the vectorized cache reader, to unblock the 2.3 release.

## How was this patch tested?

A new test.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20483 from cloud-fan/cache.

(cherry picked from commit b9503fc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you for adding this. I will look at the performance regression.
…e reader

## What changes were proposed in this pull request?

apache#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However, even if we turn off that config, the behavior still differs from 2.2: if the output data are rows, we still enable whole-stage codegen for the scan node, which differs from 2.2. We should fix that as well.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20513 from cloud-fan/cache.
…e reader

## What changes were proposed in this pull request?

#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However, even if we turn off that config, the behavior still differs from 2.2: if the output data are rows, we still enable whole-stage codegen for the scan node, which differs from 2.2. We should fix that as well.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20513 from cloud-fan/cache.

(cherry picked from commit ac7454c)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-23309 reported a performance regression for cached tables in Spark 2.3. While the investigation is still ongoing, this PR adds a conf to turn off the vectorized cache reader, to unblock the 2.3 release.

## How was this patch tested?

A new test.
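
For readers who want to try the switch, here is a minimal sketch of turning the vectorized cache reader off for a session. The conf key is the one defined in this PR. Everything else (the object name, the app name, the local master, and the sample data) is illustrative and not taken from the patch, and the sketch assumes Spark 2.3 or later on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; only ConfKey comes from the PR itself.
object DisableVectorizedCacheReader {
  // Conf key as defined in the diff under review.
  val ConfKey = "spark.sql.inMemoryColumnarStorage.enableVectorizedReader"

  def main(args: Array[String]): Unit = {
    // Build a local session with the vectorized cache reader turned off.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("disable-vectorized-cache-reader")
      .config(ConfKey, "false")
      .getOrCreate()

    // Cache a small table and scan it; with the conf set to false the
    // cached data should be read without the vectorized reader.
    val df = spark.range(0, 1000).toDF("id")
    df.cache()
    println(df.count())

    // Confirm the value the session is actually using.
    println(spark.conf.get(ConfKey))

    spark.stop()
  }
}
```

The same key can presumably also be changed on a running session with `spark.conf.set(ConfKey, "false")`, since it is a regular SQL conf. Whether doing so avoids the regression reported in SPARK-23309 for a given workload is something to measure rather than assume.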