Set fieldQuotes to `"` by default #2190

peterdesmet · 2023-11-10T15:15:36Z

When publishing a Camtrap DP dataset (started from a zip archive), I get the following error:

Publishing version #1.0 of resource muntjak failed: Archive generation for resource muntjak failed: java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter

There were no issues when importing the archive and it is a different dataset than the Camtrap DP example dataset.

Originally posted by @peterdesmet in #2189 (comment)

peterdesmet · 2023-11-10T15:16:44Z

This is the entire log file:

Data package generation started for version #1.0
Metadata added
Start writing data file for deployments
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source deployments
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source deployments
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source deployments
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source deployments
Data file written for deployments with 213 records and 24 columns
Start writing data file for media
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source media
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source media
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source media
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source media
Data file written for media with 466674 records and 11 columns
Start writing data file for observations
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source observations
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source observations
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source observations
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source observations
Data file written for observations with 38426 records and 28 columns
All data files completed
Data package generation failed!
java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:149)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.hasNext(CSVParser.java:159)
	at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
	at io.frictionlessdata.tableschema.iterator.TableIterator.hasNext(TableIterator.java:59)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
	at io.frictionlessdata.tableschema.Table.writeCSVData(Table.java:730)
	at io.frictionlessdata.tableschema.Table.writeCsv(Table.java:474)
	at io.frictionlessdata.tableschema.Table.write(Table.java:498)
	at io.frictionlessdata.tableschema.Table.writeCsv(Table.java:525)
	at io.frictionlessdata.datapackage.resource.AbstractResource.writeData(AbstractResource.java:591)
	at io.frictionlessdata.datapackage.Package.write(Package.java:557)
	at io.frictionlessdata.datapackage.Package.write(Package.java:525)
	at org.gbif.ipt.task.GenerateDataPackage.bundleArchive(GenerateDataPackage.java:235)
	at org.gbif.ipt.task.GenerateDataPackage.call(GenerateDataPackage.java:135)
	at org.gbif.ipt.task.GenerateDataPackage.call(GenerateDataPackage.java:77)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter
	at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:364)
	at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
	at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:146)
	... 18 more

mike-podolskiy90 · 2023-11-10T16:38:22Z

Thanks, I'll have a look

mike-podolskiy90 · 2023-11-13T10:22:53Z

That line has the value "nie zeker, soort vogel", which contains the delimiter (a comma) inside quotes.
After configuring Field quotes for the observations source, it works fine.
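A minimal sketch of why the Field quotes setting matters for a value like this one. It uses Python's standard `csv` module purely as an illustration and is not the IPT's own parsing code. The surrounding columns (`obs1`, `1`) and the `parse` helper are hypothetical; only the quoted value comes from this thread.

```python
import csv
import io

# Hypothetical row: only the quoted middle value is taken from the issue.
line = 'obs1,"nie zeker, soort vogel",1\n'


def parse(text, quotechar):
    """Parse one CSV line, with or without quote handling."""
    if quotechar is None:
        # No field quotes configured: quote characters are treated as plain text,
        # so every comma acts as a delimiter.
        reader = csv.reader(io.StringIO(text), delimiter=",", quoting=csv.QUOTE_NONE)
    else:
        # Field quotes configured: commas inside the quotes belong to the value.
        reader = csv.reader(io.StringIO(text), delimiter=",", quotechar=quotechar)
    return next(reader)


without_quotes = parse(line, None)
with_quotes = parse(line, '"')

print(len(without_quotes), without_quotes)
print(len(with_quotes), with_quotes)
```

Without a quote character the row is split into four fields and the value is cut in two at the comma; with `"` as the quote character it stays a single three-field row. This is presumably the kind of mismatch that configuring Field quotes on the source avoids.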

peterdesmet · 2023-11-13T10:40:30Z

Oh thanks. I think we've had this issue before. Is there a disadvantage to enabling Field quotes by default (which would avoid this error)?

mike-podolskiy90 · 2023-11-13T11:27:48Z

Yes, we certainly have. I don't see any, but I might be mistaken. I'll have a look

mike-podolskiy90 · 2023-11-17T12:06:08Z

Implemented here #2193

peterdesmet added the Data Package : Camtrap DP label Nov 10, 2023

peterdesmet mentioned this issue Nov 10, 2023

Starting Camtrap DP resource from zip does not recognize observationID #2189

Closed

mike-podolskiy90 self-assigned this Nov 13, 2023

mike-podolskiy90 added this to the 3.0 milestone Nov 13, 2023

peterdesmet changed the title ~~invalid char between encapsulated token and delimiter~~ Set fieldQuotes to `"` by default Nov 13, 2023

peterdesmet mentioned this issue Nov 13, 2023

Publication fails for values containing escaped quotes and comma #2198

Closed

mike-podolskiy90 mentioned this issue Nov 16, 2023

Parsing issue of quoted values in large CSV file #1930

Closed

mike-podolskiy90 closed this as completed Nov 17, 2023
