[SPARK-30627][SQL] Disable all the V2 file sources by default #27348

Closed

Member

@gengliangwang gengliangwang commented Jan 24, 2020

What changes were proposed in this pull request?

Disable all the V2 file sources in Spark 3.0 by default.

Why are the changes needed?

There are still some missing parts in the file source V2 framework:

  1. It doesn't support reporting file scan metrics such as "numOutputRows"/"numFiles"/"fileSize", as FileSourceScanExec does. This requires another patch in the data source V2 framework. Tracked by SPARK-30362.
  2. It doesn't yet support partition pruning with subqueries (including dynamic partition pruning). Tracked by SPARK-30628.

As the code freeze is on Jan 31st, this PR proposes disabling all the V2 file sources by default in Spark 3.0.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.
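
For anyone who still wants to exercise a V2 file source after this change, here is a minimal sketch of overriding the default for one session. It assumes the list being changed is the `spark.sql.sources.useV1SourceList` setting (the key name is not shown in this thread) and that Spark is on the classpath. The object name, app name, and local master are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object V2FileSourceOptIn {
  def main(args: Array[String]): Unit = {
    // Assumption: "spark.sql.sources.useV1SourceList" is the key behind the
    // default changed in this PR. Listing only "kafka" here leaves the file
    // sources off the V1 fallback list for this session.
    val spark = SparkSession.builder()
      .appName("v2-file-source-opt-in") // hypothetical app name
      .master("local[*]")               // local mode, for illustration only
      .config("spark.sql.sources.useV1SourceList", "kafka")
      .getOrCreate()

    // Print the effective value so the override can be checked by eye.
    println(spark.conf.get("spark.sql.sources.useV1SourceList"))

    spark.stop()
  }
}
```

Because of the gaps listed above (scan metrics, subquery-based partition pruning), an override like this is probably best kept to experiments.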

@gengliangwang
Member Author

cc @cloud-fan @rdblue @gatorsmile

@@ -1728,7 +1728,7 @@ object SQLConf {
       "implementation class names for which Data Source V2 code path is disabled. These data " +
       "sources will fallback to Data Source V1 code path.")
     .stringConf
-    .createWithDefault("kafka")
+    .createWithDefault("kafka,parquet,orc,json,csv,text,avro")
Member

Shall we sort this list alphabetically while we are here, @gengliangwang?

Member Author

ok, sure
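
To illustrate what the default in the diff above controls, here is a rough, self-contained sketch of how a comma-separated fallback list could be parsed and consulted. It is not Spark's actual lookup code: `V1FallbackSketch`, `parseSourceList`, `fallsBackToV1`, and `sortedSourceList` are hypothetical names, and the case-insensitive, whitespace-tolerant matching is an assumption.

```scala
import java.util.Locale

object V1FallbackSketch {
  // Default value taken from the diff above; in Spark it would be read from
  // the SQL configuration rather than hard-coded.
  val defaultUseV1SourceList: String = "kafka,parquet,orc,json,csv,text,avro"

  /** Splits the comma-separated value into lower-cased, trimmed entries, dropping blanks. */
  def parseSourceList(value: String): Set[String] =
    value.toLowerCase(Locale.ROOT).split(",").map(_.trim).filter(_.nonEmpty).toSet

  /** True when the source is listed, i.e. the V2 code path is disabled for it. */
  def fallsBackToV1(source: String, useV1SourceList: String = defaultUseV1SourceList): Boolean =
    parseSourceList(useV1SourceList).contains(source.trim.toLowerCase(Locale.ROOT))

  /** Returns the same entries in alphabetical order, as suggested in the review comment. */
  def sortedSourceList(value: String): String =
    parseSourceList(value).toSeq.sorted.mkString(",")
}
```

Under these assumptions, `V1FallbackSketch.fallsBackToV1("parquet")` is true with the new default and false when the list is just `"kafka"`. Sorting the new default gives `avro,csv,json,kafka,orc,parquet,text`.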

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. (Pending Jenkins. Hopefully, we didn't miss anything in test suites)

SparkQA commented Jan 24, 2020

Test build #117326 has finished for PR 27348 at commit 0567514.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@gatorsmile gatorsmile left a comment

LGTM pending Jenkins

SparkQA commented Jan 24, 2020

Test build #117327 has finished for PR 27348 at commit 412dda1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -86,6 +87,11 @@ class FileDataSourceV2FallBackSuite extends QueryTest with SharedSparkSession {
private val dummyReadOnlyFileSourceV2 = classOf[DummyReadOnlyFileDataSourceV2].getName
private val dummyWriteOnlyFileSourceV2 = classOf[DummyWriteOnlyFileDataSourceV2].getName

override protected def sparkConf: SparkConf =
Member

indentation?

Member Author

Updated, thanks!

SparkQA commented Jan 24, 2020

Test build #117329 has finished for PR 27348 at commit 7afda97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM again. Thank you, @gengliangwang and @gatorsmile.
All tests passed and the last commit is only indentation changes.
Merged to master.

SparkQA commented Jan 24, 2020

Test build #117334 has finished for PR 27348 at commit 0aaaa58.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
