[SPARK-23348][SQL] append data using saveAsTable should adjust the data types #20527
Conversation
Test build #87151 has finished for PR 20527 at commit
Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t") | ||
checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c") | ||
:: Row(null, "3") :: Nil) |
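For context, here is a minimal, self-contained sketch of the scenario this assertion exercises (the object name and session setup are illustrative; the real test runs inside the suite's shared session):

```scala
import org.apache.spark.sql.SparkSession

object Spark23348Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-23348 demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // t is created with schema (i INT, j STRING) and three rows.
    Seq(1 -> "a", 2 -> "b", 3 -> "c").toDF("i", "j").write.saveAsTable("t")

    // The appended frame has the types swapped: (i STRING, j INT).
    // With this fix, Spark casts to the table schema instead of erroring:
    // "c" is not a valid INT and becomes null; 3 becomes the string "3".
    Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")

    spark.table("t").show()
    spark.stop()
  }
}
```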
Thank you for pinging me, @cloud-fan. +1 for the patch, LGTM.
@@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
         """.stripMargin)
     }
 
-      castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
+      insert.copy(query = newQuery, partition = normalizedPartSpec)
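For readers outside this file: the rule's job is to project the input query's output onto the table's expected columns, inserting casts and renames where they differ. A simplified, hypothetical sketch of that idea follows — it is not the actual Spark implementation, and the function name and signature are illustrative:

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Cast, NamedExpression}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
import org.apache.spark.sql.types.StructField

// Wrap each output attribute in a Cast to the expected type, aliased to the
// expected column name; leave attributes that already match untouched.
def castAndRename(query: LogicalPlan, expected: Seq[StructField]): LogicalPlan = {
  val projectList: Seq[NamedExpression] = query.output.zip(expected).map {
    case (attr, field) if attr.dataType == field.dataType && attr.name == field.name =>
      attr
    case (attr, field) =>
      Alias(Cast(attr, field.dataType), field.name)()
  }
  // Avoid adding a no-op Project when nothing changed.
  if (projectList == query.output) query else Project(projectList, query)
}
```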
nit: we don't need to copy the `newQuery` if it is the same as `query`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's also ok to always copy it and the code is neater.
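For illustration, a self-contained analog of the trade-off being weighed here (all names hypothetical; the real code operates on an `InsertIntoTable` plan, modeled below as a plain case class):

```scala
case class Insert(query: String, partition: Map[String, String])

// Option A (the nit): skip copying `query` when the analyzer left it unchanged.
def resolveA(insert: Insert, newQuery: String, spec: Map[String, String]): Insert =
  if (newQuery == insert.query) insert.copy(partition = spec)
  else insert.copy(query = newQuery, partition = spec)

// Option B (what the PR keeps): always copy; one branch fewer, neater code.
def resolveB(insert: Insert, newQuery: String, spec: Map[String, String]): Insert =
  insert.copy(query = newQuery, partition = spec)
```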
@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
       checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
     }
   }
 
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
Do we also want to cover the following case: 2) target tables have column metadata?
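For reference, a hypothetical instance of that case, runnable in spark-shell (the table name and comment are illustrative) — "column metadata" could be, for example, a column comment that the append path's cast/rename projection should leave intact:

```scala
// Target table whose column i carries metadata (a comment).
spark.sql("CREATE TABLE t_meta (i INT COMMENT 'row id', j STRING) USING parquet")

// Append through saveAsTable; the question above is whether the
// table-side column metadata survives the type-adjustment projection.
Seq(1 -> "a").toDF("i", "j").write.mode("append").saveAsTable("t_meta")

spark.sql("DESCRIBE t_meta").show()
```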
maybe we can add this when unifying the test cases?
LGTM, only some nits.
LGTM. Thanks! Merged to master/2.3.
[SPARK-23348][SQL] append data using saveAsTable should adjust the data types

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20527 from cloud-fan/saveAsTable.

(cherry picked from commit 7f5f5fb)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?

When inserting or appending data to an existing table, Spark should adjust the data types of the input query to match the table schema, or fail fast if they are uncastable.

There are several ways to insert/append data: the SQL API, `DataFrameWriter.insertInto`, and `DataFrameWriter.saveAsTable`. The first two create an `InsertIntoTable` plan, while the last creates a `CreateTable` plan. However, we only adjust the input query's data types for `InsertIntoTable`, so users may hit weird errors when appending data using `saveAsTable`. See the JIRA for the error case.

This PR fixes the bug by adjusting data types for `CreateTable` too.

How was this patch tested?

A new test.
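To make the three write paths concrete, here is a sketch runnable in spark-shell against a table `t` with schema (i INT, j STRING), like the one in the test above; the plan names in the comments follow the description:

```scala
// 1) SQL API -> planned as InsertIntoTable
spark.sql("INSERT INTO t VALUES (4, 'd')")

// 2) DataFrameWriter.insertInto -> planned as InsertIntoTable
Seq(5 -> "e").toDF("i", "j").write.insertInto("t")

// 3) DataFrameWriter.saveAsTable in append mode -> planned as CreateTable;
//    before this PR, this path skipped the data-type adjustment.
Seq(6 -> "f").toDF("i", "j").write.mode("append").saveAsTable("t")
```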