
[SPARK-29618] remove V1_BATCH_WRITE table capability #26281

Closed
wants to merge 1 commit

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

Build the BatchWrite in the planner and get rid of the V1_BATCH_WRITE table capability.
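The idea can be illustrated with a minimal self-contained sketch (stub interfaces, not Spark's real ones): instead of checking a `V1_BATCH_WRITE` table capability to pick a write path, the planner builds the `BatchWrite` itself and falls back when the builder produces a v1 write. The `V1Write` marker and `Planner` object below are hypothetical simplifications for illustration.

```scala
trait BatchWrite
trait V1Write extends BatchWrite   // hypothetical marker for a v1 fallback write
trait WriteBuilder { def buildForBatch(): BatchWrite }
trait Table { def newWriteBuilder(options: Map[String, String]): WriteBuilder }

object Planner {
  // The planner constructs the BatchWrite directly, so no table
  // capability is needed to tell v1-fallback tables apart.
  def planAppend(table: Table, options: Map[String, String]): String = {
    val write = table.newWriteBuilder(options).buildForBatch()
    write match {
      case _: V1Write => "v1 fallback exec"
      case _          => "v2 append exec"
    }
  }
}
```

Because the builder runs in one place, the v1/v2 decision is made from the object actually returned, and the duplicated `WriteBuilder` construction spread across exec nodes goes away.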

Why are the changes needed?

It's always better to make the API simpler and easier to implement. When I was working on the v1 read fallback API in #26231, I realized that we don't need a table capability for it, because we create the Scan object in the planner. We can do the same thing for the v1 write fallback API.

This can also reduce duplicated code of creating the WriteBuilder.

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @brkyvz @rdblue

@cloud-fan
Contributor Author

A related topic: #25990 suggests that we should pass some physical information when creating a Write. This makes me wonder whether we should have logical and physical writes, as the read-side API does: we create the logical write in the planner, and the physical plan creates the physical write given some physical information.
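The split floated here could look like the sketch below, mirroring the read path's logical-then-physical shape. All names are hypothetical, not Spark's actual API: the planner creates a logical `Write`, and the physical plan later turns it into a `BatchWrite` once physical details (for example the input's partitioning, the kind of information discussed in #25990) are known.

```scala
// Physical information only available at execution time (illustrative).
case class PhysicalWriteInfo(numPartitions: Int)

trait BatchWrite
trait Write {
  // Logical write, created by the planner; the physical write is
  // derived from it once PhysicalWriteInfo is available.
  def toBatch(info: PhysicalWriteInfo): BatchWrite
}

// A toy implementation: the physical write captures the partition count.
class CountingBatchWrite(val numPartitions: Int) extends BatchWrite
class CountingWrite extends Write {
  def toBatch(info: PhysicalWriteInfo): BatchWrite =
    new CountingBatchWrite(info.numPartitions)
}
```

This keeps planning decisions (which write to use) separate from execution details (how the data is physically laid out), the same way `Scan` is separate from `Batch` on the read side.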

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112766 has finished for PR 26281 at commit b973fbd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AppendDataExec(write: BatchWrite, query: SparkPlan) extends V2TableWriteExec
  • trait V2TableWriteWithV1FallBack extends V2TableWriteExec with SupportsV1Write

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112769 has finished for PR 26281 at commit 9156561.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AppendDataExec(write: BatchWrite, query: SparkPlan) extends V2TableWriteExec
  • trait V2TableWriteWithV1FallBack extends V2TableWriteExec with SupportsV1Write

@brkyvz
Contributor

brkyvz commented Oct 28, 2019

Until we support Table creation through V2 in DataFrameWriter.save (through the Options -> Catalog and Identifier resolvers), I'm a -1 on this. This would break all V1 Write fallback codepaths for DataFrameWriter.save.

@github-actions

github-actions bot commented Feb 8, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 8, 2020
@github-actions github-actions bot closed this Feb 9, 2020
4 participants