[Spark] Execute MERGE using Dataframe API in Scala #3456
spark/src/test/scala/org/apache/spark/sql/delta/SchemaValidationSuite.scala
It is necessary to add some tests that check that Merge is actually captured by the |
I added a test
spark/src/test/scala/org/apache/spark/sql/delta/MergeIntoSuiteBase.scala
withKeyValueData(
  source = (0, 0) :: (1, 10) :: Nil,
  target = (1, 1) :: (2, 2) :: Nil) { case (sourceName, targetName) =>
  val plans = withLogicalPlansCaptured(spark, optimizedPlan = false) {
Is it necessary to also test the case optimizedPlan = true?
No, that doesn't bring much: the optimizer has no impact on MergeIntoCommand.
Great work!
The build for spark-master failed for an unrelated reason, so merging this.
(cherrypick of #3456) Due to Spark's unfortunate behavior of resolving plan nodes it doesn't know, the `DeltaMergeInto` plan created when using the MERGE Scala API needs to be manually resolved to ensure Spark doesn't interfere with its analysis. This currently bypasses Spark's analysis completely, as we then manually execute the MERGE command, which has negative effects, e.g. the execution is not visible in QueryExecutionListener. This change addresses the issue by executing the plan using the DataFrame API after it is manually resolved, so that the command goes through the regular code path. Resolves #1521 Covered by existing tests. Co-authored-by: Johan Lasperas <johan.lasperas@databricks.com>
Description
Due to Spark's unfortunate behavior of resolving plan nodes it doesn't know, the `DeltaMergeInto` plan created when using the MERGE Scala API needs to be manually resolved to ensure Spark doesn't interfere with its analysis. This currently bypasses Spark's analysis completely, as we then manually execute the MERGE command, which has negative effects, e.g. the execution is not visible in QueryExecutionListener.
This change addresses the issue by executing the plan using the DataFrame API after it is manually resolved, so that the command goes through the regular code path.
Resolves #1521
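For readers who want to see the listener visibility described above from the user side, here is a minimal sketch, not taken from the PR. It registers a `QueryExecutionListener`, runs a MERGE through the Scala `DeltaTable` API, and records the node name of each analyzed plan the listener sees. The object name `MergeListenerSketch`, the temp directory, the column names, and the session configuration are all assumptions made for illustration. Listener callbacks may be delivered asynchronously, so the sketch stops the session before reading the recorded names, on the assumption that stopping drains pending events.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

import scala.collection.mutable.ArrayBuffer

// Hypothetical helper object, for illustration only.
object MergeListenerSketch {

  /** Runs a small MERGE and returns the plan node names the listener observed. */
  def run(): List[String] = {
    // Assumed local session with the Delta extension and catalog enabled.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("merge-listener-sketch")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
    import spark.implicits._

    // Node names of the analyzed plans reported to the listener.
    val seen = ArrayBuffer.empty[String]

    val listener = new QueryExecutionListener {
      override def onSuccess(
          funcName: String, qe: QueryExecution, durationNs: Long): Unit =
        seen.synchronized { seen += qe.analyzed.nodeName }

      override def onFailure(
          funcName: String, qe: QueryExecution, exception: Exception): Unit =
        seen.synchronized { seen += s"FAILED ${qe.analyzed.nodeName}" }
    }
    spark.listenerManager.register(listener)

    // Hypothetical target table location in a temp directory.
    val path = java.nio.file.Files.createTempDirectory("merge-sketch").toString
    Seq((1, 1), (2, 2)).toDF("key", "value")
      .write.format("delta").mode("overwrite").save(path)

    val source = Seq((0, 0), (1, 10)).toDF("key", "value")

    // MERGE through the Scala API, the code path this change affects.
    DeltaTable.forPath(spark, path).as("t")
      .merge(source.as("s"), "s.key = t.key")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()

    // Stop the session before reading what the listener recorded.
    spark.stop()
    seen.synchronized(seen.toList)
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

The recorded names will include other executions as well, such as the initial write that creates the table. According to the description above, the MERGE command should only show up among them once the plan is executed through the DataFrame API.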
How was this patch tested?
Covered by existing tests.