[SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON #18202
Test build #77733 has started for PR 18202 at commit
Retest this please.
@@ -128,7 +128,7 @@ class CSVOptions(
     FastDateFormat.getInstance(
       parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)

-  val wholeFile = parameters.get("wholeFile").map(_.toBoolean).getOrElse(false)
+  val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
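For readers unfamiliar with the pattern in this hunk, the following is a minimal sketch of how a boolean option with a default of `false` is read from a string map. It assumes a plain `Map[String, String]` instead of the parameter map Spark actually passes to `CSVOptions`, and the names `OptionSketch` and `parseMultiLine` are hypothetical.

```scala
object OptionSketch {
  // Hypothetical helper mirroring the expression on the added line of the diff.
  // Assumption: a plain immutable Map stands in for Spark's parameter map.
  def parseMultiLine(parameters: Map[String, String]): Boolean =
    parameters.get("multiLine").map(_.toBoolean).getOrElse(false)

  def main(args: Array[String]): Unit = {
    // Option set explicitly.
    println(parseMultiLine(Map("multiLine" -> "true")))
    // Option absent: falls back to the default.
    println(parseMultiLine(Map.empty))
    // Only the old key is set: with a plain rename it is not consulted.
    println(parseMultiLine(Map("wholeFile" -> "true")))
  }
}
```

The last call illustrates why a plain rename matters for compatibility: in this sketch a caller that still passes `wholeFile` gets the default without any error.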
Hi, @gatorsmile.
It seems that we need to change JSONOptions.wholeFile along with it.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L84
That is different. Each JSON file can be parsed into at most one record when wholeFile is on.
Oh, I see. Thank you!
After rethinking the issue, we need to rename the option to multiLine for both CSV and JSON, and fix the JSON parsing to make them consistent.
Test build #77746 has finished for PR 18202 at commit
I think the change itself looks good to me as targeted (if this is going to be included in 2.2.0 - I just saw https://issues.apache.org/jira/browse/SPARK-20980?focusedCommentId=16037416&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16037416). It looks like we just need a decision. Let me cc @rxin, who I believe came up with the option name initially.
Wouldn't this break compatibility?
Test build #77751 has finished for PR 18202 at commit
Test build #77752 has finished for PR 18202 at commit
Both options appear to have been added in 2.2, judging from the JIRAs https://issues.apache.org/jira/browse/SPARK-19610 and https://issues.apache.org/jira/browse/SPARK-18352. If this targets 2.2.0, I guess it wouldn't.
Let's hold it until RC4 finishes. If RC4 passes, we need to update this PR to support the old option name; otherwise we can just rename.
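As an illustration of what "support the old option name" could mean, here is a minimal sketch that prefers `multiLine` and falls back to `wholeFile` when only the old key is present. This is not code from the PR; `OptionFallbackSketch` and `parseMultiLine` are hypothetical names, and a plain `Map[String, String]` is assumed in place of Spark's parameter map.

```scala
object OptionFallbackSketch {
  // Hypothetical: look up the new key first, then the old key, then default to false.
  def parseMultiLine(parameters: Map[String, String]): Boolean =
    parameters.get("multiLine")
      .orElse(parameters.get("wholeFile"))
      .map(_.toBoolean)
      .getOrElse(false)

  def main(args: Array[String]): Unit = {
    // Old key alone is still honored.
    println(parseMultiLine(Map("wholeFile" -> "true")))
    // When both keys are present, the new key wins.
    println(parseMultiLine(Map("multiLine" -> "false", "wholeFile" -> "true")))
    // Neither key present: default.
    println(parseMultiLine(Map.empty))
  }
}
```

Letting the new key take precedence is one possible design choice; a real implementation might also want to log a deprecation warning when the old key is used.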
Hi all, it sounds like the RC4 vote failed. Should we proceed with this one?
ah lucky :) merging to master/2.2!
… JSON
The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.
N/A
Author: Xiao Li <gatorsmile@gmail.com>
Closes #18202 from gatorsmile/renameCVSOption.
(cherry picked from commit 2051428)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
opened #18312
… JSON
### What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.
### How was this patch tested?
N/A
Author: Xiao Li <gatorsmile@gmail.com>
Closes apache#18202 from gatorsmile/renameCVSOption.
What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.
How was this patch tested?
N/A
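To make the description concrete, here is a minimal usage sketch of the renamed option from the reader side. It assumes a Spark version that includes this change with `spark-sql` on the classpath, runs in local mode, and writes its own temporary sample file; `MultiLineCsvSketch` and `readCsv` are hypothetical names used only for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineCsvSketch {
  // Hypothetical helper: read a CSV whose quoted fields may span several lines.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", true)
      .option("multiLine", true) // the option name introduced by this PR
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multiLine-sketch")
      .getOrCreate()
    try {
      // One file, two records; the first record has a quoted field containing a newline.
      val file = Files.createTempFile("multiline-sketch", ".csv")
      val content = "id,note\n1,\"first line\nsecond line\"\n2,plain\n"
      Files.write(file, content.getBytes(StandardCharsets.UTF_8))
      readCsv(spark, file.toString).show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```

The sample file holds more than one record, which is the point made in the description: the option is about records that span lines, not about one record per file.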