
[SPARK-29618] remove V1_BATCH_WRITE table capability #26281

Closed
wants to merge 1 commit

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

Build the BatchWrite in the planner and get rid of the V1_BATCH_WRITE table capability.
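The idea can be illustrated with a minimal self-contained sketch (stub interfaces, not Spark's real ones): instead of checking a `V1_BATCH_WRITE` table capability to pick a write path, the planner builds the `BatchWrite` itself and falls back when the builder produces a v1 write. The `V1Write` marker and `Planner` object below are hypothetical simplifications for illustration.

```scala
trait BatchWrite
trait V1Write extends BatchWrite   // hypothetical marker for a v1 fallback write
trait WriteBuilder { def buildForBatch(): BatchWrite }
trait Table { def newWriteBuilder(options: Map[String, String]): WriteBuilder }

object Planner {
  // The planner constructs the BatchWrite directly, so no table
  // capability is needed to tell v1-fallback tables apart.
  def planAppend(table: Table, options: Map[String, String]): String = {
    val write = table.newWriteBuilder(options).buildForBatch()
    write match {
      case _: V1Write => "v1 fallback exec"
      case _          => "v2 append exec"
    }
  }
}
```

Because the builder runs in one place, the v1/v2 decision is made from the object actually returned, and the duplicated `WriteBuilder` construction spread across exec nodes goes away.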

Why are the changes needed?

It's always better to make the API simpler and easier to implement. When I was working on the v1 read fallback API in #26231, I realized that we don't need a table capability for it, because we create the Scan object in the planner. We can do the same thing for the v1 write fallback API.

This can also reduce duplicated code of creating the WriteBuilder.

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @brkyvz @rdblue

@cloud-fan
Contributor Author

A related topic: #25990 suggests that we should pass some physical information when creating a Write. This makes me wonder whether we should have logical and physical writes, as the read-side API does: we create the logical write in the planner, and the physical plan creates the physical write given some physical information.
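The split floated here could look like the sketch below, mirroring the read path's logical-then-physical shape. All names are hypothetical, not Spark's actual API: the planner creates a logical `Write`, and the physical plan later turns it into a `BatchWrite` once physical details (for example the input's partitioning, the kind of information discussed in #25990) are known.

```scala
// Physical information only available at execution time (illustrative).
case class PhysicalWriteInfo(numPartitions: Int)

trait BatchWrite
trait Write {
  // Logical write, created by the planner; the physical write is
  // derived from it once PhysicalWriteInfo is available.
  def toBatch(info: PhysicalWriteInfo): BatchWrite
}

// A toy implementation: the physical write captures the partition count.
class CountingBatchWrite(val numPartitions: Int) extends BatchWrite
class CountingWrite extends Write {
  def toBatch(info: PhysicalWriteInfo): BatchWrite =
    new CountingBatchWrite(info.numPartitions)
}
```

This keeps planning decisions (which write to use) separate from execution details (how the data is physically laid out), the same way `Scan` is separate from `Batch` on the read side.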

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112766 has finished for PR 26281 at commit b973fbd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AppendDataExec(write: BatchWrite, query: SparkPlan) extends V2TableWriteExec
  • trait V2TableWriteWithV1FallBack extends V2TableWriteExec with SupportsV1Write

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112769 has finished for PR 26281 at commit 9156561.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AppendDataExec(write: BatchWrite, query: SparkPlan) extends V2TableWriteExec
  • trait V2TableWriteWithV1FallBack extends V2TableWriteExec with SupportsV1Write

@brkyvz
Contributor

brkyvz commented Oct 28, 2019

Until we support Table creation through V2 in DataFrameWriter.save (through the Options -> Catalog and Identifier resolvers), I'm a -1 on this. This would break all V1 Write fallback codepaths for DataFrameWriter.save.

@github-actions

github-actions bot commented Feb 8, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 8, 2020
@github-actions github-actions bot closed this Feb 9, 2020
4 participants