No error raised if xml has only one failed row #436
The issue here is that in the second example, the row's start tag is never closed. Now, arguably that should be an error: if a start tag is found but no end tag, something's wrong. Right now it's silently ignored. I think I can see a way to implement it: return whatever text was found after the start tag, which will certainly fail to parse correctly later, and then PERMISSIVE/FAILFAST works.

There's a subtle issue here in any event. If an end tag is missing, the reader keeps reading. If the unclosed tag is followed by valid row tags, they are considered nested. It will just eat the rest of the file and not recover to parse the valid tags. Allowing nested row tags is a valid use case, so I don't want to change that. At least one would get a real chance to handle the error with a change like the above.
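To make the idea concrete, here is a minimal sketch in plain Python. It is not the library's actual reader (spark-xml is written in Scala); `scan_rows`, `propagate_unclosed`, and `parse_failfast` are hypothetical names, and the scanner ignores attributes and nesting. It only illustrates the difference between silently dropping an unclosed row and handing the leftover text to the normal parse step so that it fails there.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List


def scan_rows(text: str, row_tag: str, propagate_unclosed: bool) -> Iterator[str]:
    """Yield raw row fragments from `text` (toy scanner: no attributes, no nesting)."""
    start, end = f"<{row_tag}>", f"</{row_tag}>"
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            return  # no further start tag in the input
        close = text.find(end, begin)
        if close == -1:
            # A start tag was found but no matching end tag.
            if propagate_unclosed:
                # Proposed behavior: hand back the remaining text so the
                # later parse step fails on it instead of never seeing it.
                yield text[begin:]
            # Otherwise the fragment is dropped silently.
            return
        yield text[begin:close + len(end)]
        pos = close + len(end)


def parse_failfast(fragments: Iterator[str]) -> List[ET.Element]:
    """Parse every fragment; a malformed one raises ET.ParseError (FAILFAST-like)."""
    return [ET.fromstring(fragment) for fragment in fragments]
```

With an input such as `<ROWSET><ROW><a>1</a></ROWSET>`, the scanner yields nothing when `propagate_unclosed` is false, so nothing can fail downstream. With `propagate_unclosed` set to true it yields the leftover text, and `parse_failfast` raises on it.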
Hey @srowen, thanks for the quick update and the PR!
Shouldn't it throw an error if it can't find a start tag at all? I mean, if there is a JSON file in the path instead of XML, shouldn't it throw an exception?
Good question. I hesitate to do it because, well, the current behavior is to just return nothing (which doesn't mean it's correct). The XML data source is a little bit different in that it's meant to potentially just examine a small subset of larger files in general, so maybe it's less unusual to accept not finding any row tags in input sometimes? What I think we might want to do is throw an error if the schema is auto-inferred, but it's entirely empty. This doesn't seem like a meaningful outcome and could/should be an error?
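As a rough sketch of that last suggestion, again in plain Python with hypothetical names (`resolve_schema`, `user_schema`, `inferred_fields`) rather than anything from the library: an explicitly supplied schema is accepted as-is, while an auto-inferred schema with no fields is treated as an error.

```python
from typing import List, Optional, Sequence


def resolve_schema(user_schema: Optional[Sequence[str]],
                   inferred_fields: Sequence[str]) -> List[str]:
    """Pick the schema to use; reject an empty schema only when it was inferred."""
    if user_schema is not None:
        # The caller supplied a schema, so finding no rows is not an error.
        return list(user_schema)
    if not inferred_fields:
        # Inference saw no row tags at all, e.g. the input was not XML.
        raise ValueError("schema inference found no fields in the input")
    return list(inferred_fields)
```

This keeps the case where a user-supplied schema matches no rows working as before, which fits the point that not finding any row tags can sometimes be acceptable.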
…ling instead (#437) See #436 (comment) for context. This stops silently swallowing unclosed tags at the end of input and instead propagates the remaining content for normal error handling. Closes #436
Hi.
We are running some failure tests with the spark-xml package within Databricks. We discovered that an XML file containing only one failing row does not raise any error.
Let me explain it with a simple example.
I've used your /tests samples for reproduction:
Here is a malformed XML file:
The test uses the FAILFAST option, but the default PERMISSIVE option ends up with the same behavior. So far, the test is:
and the expected result is:
But now, if I have a file with only one failing row, like this:
Now, with the same test, the result should be pretty much the same as in the previous test, but unfortunately the result is:
Any thoughts?
Additional question: do you support the Databricks option "badRecordsPath"?