Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23305][SQL][TEST] Test spark.sql.files.ignoreMissingFiles for all file-based data sources #20479

Closed
wants to merge 4 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Feb 1, 2018

What changes were proposed in this pull request?

Like Parquet, all file-based data source handles spark.sql.files.ignoreMissingFiles correctly. We had better have a test coverage for feature parity and in order to prevent future accidental regression for all data sources.

How was this patch tested?

Pass Jenkins with a newly added test case.

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86940 has finished for PR 20479 at commit 7fdfc35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

new Path(basePath, "third").toString)

val fs = thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.delete(thirdPath, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we assert true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dongjoon-hyun
Copy link
Member Author

Thank you for review and approval!

@viirya
Copy link
Member

viirya commented Feb 2, 2018

LGTM

@SparkQA
Copy link

SparkQA commented Feb 2, 2018

Test build #86955 has finished for PR 20479 at commit c2891d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -655,4 +655,35 @@ class OrcQuerySuite extends OrcQueryTest with SharedSQLContext {
}
}
}

testQuietly("Enabling/disabling ignoreMissingFiles") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this is copied from Parquet. To avoid duplicate codes, create a common base test class for parquet and orc? Then, we can deduplicate the codes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Which suite is proper for that base test class?

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Feb 2, 2018

@gatorsmile . Can we do the refactoring later since we have still many things to do for RC3?
The main purpose is preventing any regression since Spark 2.3.0.

@dongjoon-hyun
Copy link
Member Author

Ah, I moved both of them into FileBasedDataSourceSuite.

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @HyukjinKwon , @viirya , and @gatorsmile .
Now, there is no redundancy.

@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
}
}
}

// Only ORC/Parquet support this.
Seq("orc", "parquet").foreach { format =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah. This sounds better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
}
}
}

// Only ORC/Parquet support this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait .. don't other sources support this option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, let me check back.

@gatorsmile
Copy link
Member

For test-only PRs, we still can merge it to 2.3

@SparkQA
Copy link

SparkQA commented Feb 2, 2018

Test build #86976 has finished for PR 20479 at commit 4512d72.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -92,4 +96,37 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
}
}
}

allFileBasedDataSources.foreach { format =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @HyukjinKwon !

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-23305][SQL][TEST] Add spark.sql.files.ignoreMissingFiles test case for ORC [SPARK-23305][SQL][TEST] Add spark.sql.files.ignoreMissingFiles test case Feb 2, 2018
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-23305][SQL][TEST] Add spark.sql.files.ignoreMissingFiles test case [SPARK-23305][SQL][TEST] Test spark.sql.files.ignoreMissingFiles for all data sources Feb 2, 2018
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-23305][SQL][TEST] Test spark.sql.files.ignoreMissingFiles for all data sources [SPARK-23305][SQL][TEST] Test spark.sql.files.ignoreMissingFiles for all file-based data sources Feb 2, 2018
@dongjoon-hyun
Copy link
Member Author

I updated the JIRA issue and PR descriptions to mention all file-based data sources.

@SparkQA
Copy link

SparkQA commented Feb 2, 2018

Test build #86982 has finished for PR 20479 at commit 9a2f640.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Feb 3, 2018

Test build #87019 has finished for PR 20479 at commit 9a2f640.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 3, 2018

Test build #87022 has finished for PR 20479 at commit 9a2f640.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master/2.3

@asfgit asfgit closed this in 522e0b1 Feb 3, 2018
asfgit pushed a commit that referenced this pull request Feb 3, 2018
…r all file-based data sources

## What changes were proposed in this pull request?

Like Parquet, all file-based data source handles `spark.sql.files.ignoreMissingFiles` correctly. We had better have a test coverage for feature parity and in order to prevent future accidental regression for all data sources.

## How was this patch tested?

Pass Jenkins with a newly added test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #20479 from dongjoon-hyun/SPARK-23305.

(cherry picked from commit 522e0b1)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@dongjoon-hyun
Copy link
Member Author

Thank you, @HyukjinKwon and @gatorsmile .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-23305 branch February 3, 2018 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants