[SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter #2044
…gle file as parameter
QA tests have started for PR 2044 at commit
|
QA tests have finished for PR 2044 at commit
|
A normal, complete Parquet file on HDFS should be directory with a |
I am confused about this definition: "Parquet file on HDFS should be directory"... This was reported on the user list: http://apache-spark-user-list.1001560.n3.nabble.com/sqlContext-parquetFile-path-fails-if-path-is-a-file-but-succeeds-if-a-directory-td12345.html I think he is also right: the parameter of sqlContext.parquetFile should be a directory of Parquet files (better with a _metadata file in it), but, judging by our javadoc, this should not be a mandatory requirement. Otherwise, maybe we should rename this method to sqlContext.parquetDir or something. And this PR is not "try to handle"; it is more like "try to ignore a useless check" :) It is already handled by children.find(...) |
I tend to agree here with attempting to provide more functionality as long as it doesn't complicate the code base. Seems like parquet works fine without this check. |
Jenkins, test this please. |
QA tests have started for PR 2044 at commit
|
QA tests have finished for PR 2044 at commit
|
Hmm... the last Jenkins build info has already been cleaned up, don't know why it failed. |
Jenkins, retest this please. |
QA tests have started for PR 2044 at commit
|
QA tests have finished for PR 2044 at commit
|
The build failure was caused by unrelated test suites, should be good to go. |
…gle file as parameter
```if (!fs.getFileStatus(path).isDir) throw Exception``` makes no sense after commit #1370.
Be careful if someone is working on SPARK-2551: make sure the new change passes the test case ```test("Read a parquet file instead of a directory")```
Author: chutium <teng.qiu@gmail.com>
Closes #2044 from chutium/parquet-singlefile and squashes the following commits:
4ae477f [chutium] [SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter
(cherry picked from commit 48f4278)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Thanks! I've merged this into master and 1.1. |
Can we have this take a parent directory that has subdirectories containing Parquet files, instead of requiring the Parquet files to be immediate children of the directory? |
if (!fs.getFileStatus(path).isDir) throw Exception
makes no sense after commit #1370. Be careful if someone is working on SPARK-2551: make sure the new change passes the test case
test("Read a parquet file instead of a directory")
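To illustrate why the directory check is unnecessary, here is a minimal sketch of path handling that accepts either a single file or a directory. It is not Spark's actual code: it uses `java.io.File` on the local filesystem rather than the Hadoop `FileSystem` API, and the object `ParquetPathSketch`, the helper `dataFiles`, and the rule of skipping names that start with `_` or `.` are all assumptions made for the example.

```scala
import java.io.File

// Hypothetical sketch, not Spark's implementation: resolve a user-supplied
// path to the data files under it, whether the path is a file or a directory.
object ParquetPathSketch {

  def dataFiles(path: File): Seq[File] = {
    require(path.exists(), s"Path does not exist: $path")

    // A directory contributes its immediate children; a single file
    // contributes only itself. No "must be a directory" check is needed.
    val candidates: Seq[File] =
      if (path.isDirectory()) Option(path.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      else Seq(path)

    // Assumption for this sketch: names starting with "_" or "." are
    // metadata or hidden files, so they are not treated as data files.
    candidates
      .filter(f => f.isFile() && !f.getName.startsWith("_") && !f.getName.startsWith("."))
      .sortBy(_.getName)
  }
}
```

Under these assumptions, calling `ParquetPathSketch.dataFiles` on a directory or on one file inside it goes through the same filtering step, so a single-file argument needs no special case.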