Parsing issue of quoted values in large CSV file #1930
@peterdesmet Thank you, Peter, for reporting this, and sorry for the late reply.
Oh great, so it …
Yes, it works. I've also just checked by creating a couple of file sources, and they all have …
Thanks! I notice now that other … In any case, it would indeed be good if the description of the field were updated.
Did some more tests. It looks like the IPT assesses the source data to figure out whether to use quoted values or not:
So far so good. The issue seems to be that it only looks at the first 20 lines of a file. Compare:
This will lead to parsing issues down the line. The user can of course always set …
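To make this failure mode concrete, here is a minimal Python sketch of sampling-based quote detection. It is not the IPT's actual implementation: the function `detect_quote_char` and its `sample_size=20` default are hypothetical, chosen only to mirror the behaviour observed in the tests above.

```python
# Hypothetical illustration only: this is NOT the IPT's code.
# It mimics the observed behaviour, where quote detection only
# inspects the first `sample_size` lines of a file.
def detect_quote_char(text, sample_size=20):
    """Return '"' if a double quote appears in the first sample_size lines, else None."""
    sample = text.splitlines()[:sample_size]
    return '"' if any('"' in line for line in sample) else None


header = "occurrenceID,occurrenceRemarks"
plain_rows = [f"occ{i},no remarks" for i in range(1, 30)]
quoted_row = 'occ30,"BIG FLOCK, WIDESPREAD FORAGING ON SPRAT"'

# Quoted value on line 2: inside the 20-line sample.
early = "\n".join([header, quoted_row] + plain_rows)
# Quoted value on line 31: outside the 20-line sample.
late = "\n".join([header] + plain_rows + [quoted_row])

print(detect_quote_char(early))  # prints: "
print(detect_quote_char(late))   # prints: None
```

Passing a `sample_size` that covers the whole file would catch the late quote in this sketch, at the cost of reading every line (2.5 million rows in this case).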
@peterdesmet Thank you for spending your time testing this issue. |
Let's just set …
I ran into a CSV parsing issue.
The source file I'm uploading is a CSV file (2.5 million rows) that only uses "quotes" when needed. This is the default `readr::write_csv()` behaviour, e.g. to escape commas in values. Note that I am not indicating `Field Quotes: "` in the IPT, as that is reserved for when all values are quoted.

Snippet of source file. Notice the quoted `"BIG FLOCK, WIDESPREAD FORAGING ON SPRAT"` in `occurrenceRemarks`:
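For comparison, Python's `csv` module calls this "quote only when needed" style `csv.QUOTE_MINIMAL`. The sketch below assumes it is comparable to readr's default; the rows are made-up examples modelled on the `occurrenceRemarks` value above, not taken from the actual file.

```python
import csv
import io

rows = [
    ["occurrenceID", "occurrenceRemarks"],
    ["occ1", "no remarks"],
    ["occ2", "BIG FLOCK, WIDESPREAD FORAGING ON SPRAT"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL (the csv module default) only quotes fields that contain
# the delimiter, the quote character or a line-terminator character.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
print(buffer.getvalue(), end="")
# occurrenceID,occurrenceRemarks
# occ1,no remarks
# occ2,"BIG FLOCK, WIDESPREAD FORAGING ON SPRAT"
```

Only the value containing a comma is quoted, so most lines of such a file contain no quote characters at all.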
In the generated Darwin Core Archive, the resulting file is the following. Notice how `"BIG FLOCK ...` is now spread over multiple fields:

Any idea what might be causing this? It's the first time I've encountered this, even though I have uploaded many such CSV files (with values quoted only when necessary) to the IPT before without ever running into issues. See e.g. https://www.gbif.org/occurrence/3795234906, where some values contain commas:
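A minimal Python sketch of why the value ends up spread over multiple fields, assuming the line is read without treating `"` as a quote character. Here `csv.QUOTE_NONE` merely stands in for whatever the IPT does internally; that mapping is an assumption.

```python
import csv
import io

line = 'occ2,"BIG FLOCK, WIDESPREAD FORAGING ON SPRAT"'

# Quote-aware parsing: the comma inside the quotes stays part of the value.
quoted = next(csv.reader(io.StringIO(line)))
# Quote-unaware parsing: quotes are kept literally and every comma splits a field.
unquoted = next(csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE))

print(quoted)    # ['occ2', 'BIG FLOCK, WIDESPREAD FORAGING ON SPRAT']
print(unquoted)  # ['occ2', '"BIG FLOCK', ' WIDESPREAD FORAGING ON SPRAT"']
```

The quote-unaware read yields one field too many, which shifts every following value into the wrong column.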