[SPARK-17409] [SQL] [FOLLOW-UP] Do Not Optimize Query in CTAS More Than Once #15459

gatorsmile · 2016-10-13T07:57:34Z

What changes were proposed in this pull request?

This follow-up PR is for addressing the comment.

We added two test cases based on the suggestion from @yhuai . One is a new test case using the saveAsTable API to create a data source table. Another is for CTAS on Hive serde table.

Note: No need to backport this PR to 2.0. Will submit a new PR to backport the whole fix with new test cases to Spark 2.0

How was this patch tested?

N/A

SparkQA · 2016-10-13T10:07:55Z

Test build #66881 has finished for PR 15459 at commit 6898f4a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MetastoreRelationSuite extends QueryTest with SQLTestUtils with TestHiveSingleton

yhuai · 2016-10-14T18:24:52Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreRelationSuite.scala

+    withTable("bar") {
+      withTempView("foo") {
+        sql("select 0 as id").createOrReplaceTempView("foo")
+        sql("CREATE TABLE bar AS SELECT * FROM foo group by id")


Also put the comment of https://github.com/apache/spark/pull/15459/files#diff-5d2ebf4e9ca5a990136b276859769289R1626 at here? Also mention that this test is for hive format table?

SparkQA · 2016-10-15T00:19:16Z

Test build #66989 has finished for PR 15459 at commit 765d104.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…15048 ### What changes were proposed in this pull request? This PR is to backport #15048 and #15459. However, in 2.0, we do not have a unified logical node `CreateTable` and the analyzer rule `PreWriteCheck` is also different. To minimize the code changes, this PR adds a new rule `AnalyzeCreateTableAsSelect`. Please treat it as a new PR to review. Thanks! As explained in #14797: >Some analyzer rules have assumptions on logical plans, optimizer may break these assumption, we should not pass an optimized query plan into QueryExecution (will be analyzed again), otherwise we may some weird bugs. For example, we have a rule for decimal calculation to promote the precision before binary operations, use PromotePrecision as placeholder to indicate that this rule should not apply twice. But a Optimizer rule will remove this placeholder, that break the assumption, then the rule applied twice, cause wrong result. We should not optimize the query in CTAS more than once. For example, ```Scala spark.range(99, 101).createOrReplaceTempView("tab1") val sqlStmt = "SELECT id, cast(id as long) * cast('1.0' as decimal(38, 18)) as num FROM tab1" sql(s"CREATE TABLE tab2 USING PARQUET AS $sqlStmt") checkAnswer(spark.table("tab2"), sql(sqlStmt)) ``` Before this PR, the results do not match ``` == Results == !== Correct Answer - 2 == == Spark Answer - 2 == ![100,100.000000000000000000] [100,null] [99,99.000000000000000000] [99,99.000000000000000000] ``` After this PR, the results match. ``` +---+----------------------+ |id |num | +---+----------------------+ |99 |99.000000000000000000 | |100|100.000000000000000000| +---+----------------------+ ``` In this PR, we do not treat the `query` in CTAS as a child. Thus, the `query` will not be optimized when optimizing CTAS statement. However, we still need to analyze it for normalizing and verifying the CTAS in the Analyzer. Thus, we do it in the analyzer rule `PreprocessDDL`, because so far only this rule needs the analyzed plan of the `query`. ### How was this patch tested? Author: gatorsmile <gatorsmile@gmail.com> Closes #15502 from gatorsmile/ctasOptimize2.0.

gatorsmile · 2016-10-18T04:19:52Z

@yhuai Any further comment about it? Thanks!

…Cases

SparkQA · 2016-10-24T19:52:09Z

Test build #67462 has finished for PR 15459 at commit 1f2e7b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-10-24T19:54:03Z

cc @cloud-fan @yhuai This has been reviewed in the backport PR #15502. Can we merge this to master now? Thanks!

cloud-fan · 2016-10-25T02:45:53Z

LGTM

cloud-fan · 2016-10-25T02:47:48Z

merging to master, thanks!

… Once ### What changes were proposed in this pull request? This follow-up PR is for addressing the [comment](apache#15048). We added two test cases based on the suggestion from yhuai . One is a new test case using the `saveAsTable` API to create a data source table. Another is for CTAS on Hive serde table. Note: No need to backport this PR to 2.0. Will submit a new PR to backport the whole fix with new test cases to Spark 2.0 ### How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes apache#15459 from gatorsmile/ctasOptimizedTestCases.

Added test cases.

6898f4a

yhuai reviewed Oct 14, 2016

View reviewed changes

address comments

765d104

gatorsmile mentioned this pull request Oct 15, 2016

[SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS More Than Once #15048 #15502

Closed

Merge remote-tracking branch 'upstream/master' into ctasOptimizedTest…

1f2e7b8

…Cases

asfgit closed this in d479c52 Oct 25, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-17409] [SQL] [FOLLOW-UP] Do Not Optimize Query in CTAS More Than Once #15459

[SPARK-17409] [SQL] [FOLLOW-UP] Do Not Optimize Query in CTAS More Than Once #15459

gatorsmile commented Oct 13, 2016

SparkQA commented Oct 13, 2016

yhuai Oct 14, 2016

SparkQA commented Oct 15, 2016

gatorsmile commented Oct 18, 2016

SparkQA commented Oct 24, 2016

gatorsmile commented Oct 24, 2016

cloud-fan commented Oct 25, 2016

cloud-fan commented Oct 25, 2016

[SPARK-17409] [SQL] [FOLLOW-UP] Do Not Optimize Query in CTAS More Than Once #15459

[SPARK-17409] [SQL] [FOLLOW-UP] Do Not Optimize Query in CTAS More Than Once #15459

Conversation

gatorsmile commented Oct 13, 2016

What changes were proposed in this pull request?

How was this patch tested?

SparkQA commented Oct 13, 2016

yhuai Oct 14, 2016

Choose a reason for hiding this comment

SparkQA commented Oct 15, 2016

gatorsmile commented Oct 18, 2016

SparkQA commented Oct 24, 2016

gatorsmile commented Oct 24, 2016

cloud-fan commented Oct 25, 2016

cloud-fan commented Oct 25, 2016