[SPARK-23312][SQL] add a config to turn off vectorized cache reader #20483

cloud-fan · 2018-02-02T05:57:22Z

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-23309 reported a performance regression for cached tables in Spark 2.3. While the investigation is still ongoing, this PR adds a conf to turn off the vectorized cache reader, to unblock the 2.3 release.

How was this patch tested?

a new test
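
For readers who want to try the switch, here is a minimal sketch of turning the vectorized cache reader off for one session. The config key is the one added in this PR's diff; the object name, app name, local master, and sample data are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, not part of this PR.
object DisableVectorizedCacheReader {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cache-reader-demo")
      .master("local[*]")
      .getOrCreate()

    // Key from this PR's diff; it defaults to true.
    spark.conf.set("spark.sql.inMemoryColumnarStorage.enableVectorizedReader", "false")

    // Cache a small DataFrame and materialize the cache with an action.
    val df = spark.range(0, 1000).toDF("id").cache()
    df.count()

    // Later scans of the cached data are planned with the setting above.
    println(df.filter("id % 2 = 0").count())

    spark.stop()
  }
}
```

Setting the key before the query is planned keeps the example simple; whether changing it mid-session affects plans that were already built is not something this thread states.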

cloud-fan · 2018-02-02T05:59:14Z

cc @kiszk @gatorsmile

gatorsmile

LGTM pending Jenkins.

dongjoon-hyun · 2018-02-02T06:54:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

+    buildConf("spark.sql.inMemoryColumnarStorage.enableVectorizedReader")
+      .doc("Enables vectorized reader for columnar caching.")
+      .booleanConf
+      .createWithDefault(true)


~~To unblock 2.3, I think we need to disable this with false.~~
Sorry, I'm taking this back since it's too radical in general.

viirya

internal?

cloud-fan

The Parquet/ORC vectorized reader conf is also public.

SparkQA · 2018-02-02T08:05:01Z

Test build #86966 has finished for PR 20483 at commit 376c855.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-02-02T08:05:02Z

Test build #86967 has finished for PR 20483 at commit 53d3259.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-02-02T08:19:30Z

retest this please.

SparkQA · 2018-02-02T11:41:22Z

Test build #86983 has finished for PR 20483 at commit 53d3259.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya

LGTM

cloud-fan · 2018-02-02T14:43:58Z

thanks, merging to master/2.3!

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-23309 reported a performance regression for cached tables in Spark 2.3. While the investigation is still ongoing, this PR adds a conf to turn off the vectorized cache reader, to unblock the 2.3 release.

## How was this patch tested?

a new test

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20483 from cloud-fan/cache.

(cherry picked from commit b9503fc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

kiszk · 2018-02-05T07:38:56Z

Thank you for adding this. I will look at the performance regression.

…e reader

## What changes were proposed in this pull request?

apache#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However, even if we turn off that config, the behavior still differs from 2.2. If the output data are rows, we still enable whole-stage codegen for the scan node, which differs from 2.2; we should fix that as well.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20513 from cloud-fan/cache.

…e reader

## What changes were proposed in this pull request?

#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However, even if we turn off that config, the behavior still differs from 2.2. If the output data are rows, we still enable whole-stage codegen for the scan node, which differs from 2.2; we should fix that as well.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20513 from cloud-fan/cache.

(cherry picked from commit ac7454c)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
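
One way to observe the difference described in the follow-up is to print the physical plan of a query over cached data. The sketch below assumes a local session and made-up sample data; the object name is hypothetical. In Spark's `explain()` output, operators that take part in whole-stage codegen are prefixed with `*`, so the scan node can be checked by eye. What the plan actually shows depends on the Spark version and on whether the follow-up change is included.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, not part of either PR.
object InspectCachedScanPlan {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cached-scan-plan")
      .master("local[*]")
      .getOrCreate()

    // Turn off the vectorized cache reader using the key from #20483.
    spark.conf.set("spark.sql.inMemoryColumnarStorage.enableVectorizedReader", "false")

    // Cache and materialize a small DataFrame.
    val cached = spark.range(0, 100).toDF("id").cache()
    cached.count()

    // Print the physical plan; look for a '*' prefix on the in-memory scan node.
    cached.filter("id > 10").explain()

    spark.stop()
  }
}
```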

gatorsmile approved these changes Feb 2, 2018

add a config to turn off vectorized cache reader

53d3259

cloud-fan force-pushed the cache branch from 376c855 to 53d3259 Compare February 2, 2018 06:09

dongjoon-hyun reviewed Feb 2, 2018

felixcheung approved these changes Feb 2, 2018

viirya approved these changes Feb 2, 2018

asfgit closed this in b9503fc Feb 2, 2018

cloud-fan mentioned this pull request Feb 6, 2018

[SPARK-23312][SQL][followup] add a config to turn off vectorized cache reader #20513

Closed
