Set fieldQuotes to " by default #2190

Closed
peterdesmet opened this issue Nov 10, 2023 · 6 comments

@peterdesmet
Member

When publishing a Camtrap DP dataset (started from a zip archive), I get the following error:

Publishing version #1.0 of resource muntjak failed: Archive generation for resource muntjak failed: java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter

There were no issues when importing the archive and it is a different dataset than the Camtrap DP example dataset.

Originally posted by @peterdesmet in #2189 (comment)

@peterdesmet
Member Author

This is the entire log file:

Data package generation started for version #1.0
Metadata added
Start writing data file for deployments
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source deployments
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source deployments
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source deployments
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source deployments
Data file written for deployments with 213 records and 24 columns
Start writing data file for media
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source media
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source media
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source media
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source media
Data file written for media with 466674 records and 11 columns
Start writing data file for observations
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source observations
No lines were skipped due to errors for mapping Camera Trap Data Package (Camtrap DP) in source observations
No lines with fewer columns than mapped for mapping Camera Trap Data Package (Camtrap DP) in source observations
All lines match the filter criteria for mapping Camera Trap Data Package (Camtrap DP) in source observations
Data file written for observations with 38426 records and 28 columns
All data files completed
Data package generation failed!
java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:149)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.hasNext(CSVParser.java:159)
	at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
	at io.frictionlessdata.tableschema.iterator.TableIterator.hasNext(TableIterator.java:59)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
	at io.frictionlessdata.tableschema.Table.writeCSVData(Table.java:730)
	at io.frictionlessdata.tableschema.Table.writeCsv(Table.java:474)
	at io.frictionlessdata.tableschema.Table.write(Table.java:498)
	at io.frictionlessdata.tableschema.Table.writeCsv(Table.java:525)
	at io.frictionlessdata.datapackage.resource.AbstractResource.writeData(AbstractResource.java:591)
	at io.frictionlessdata.datapackage.Package.write(Package.java:557)
	at io.frictionlessdata.datapackage.Package.write(Package.java:525)
	at org.gbif.ipt.task.GenerateDataPackage.bundleArchive(GenerateDataPackage.java:235)
	at org.gbif.ipt.task.GenerateDataPackage.call(GenerateDataPackage.java:135)
	at org.gbif.ipt.task.GenerateDataPackage.call(GenerateDataPackage.java:77)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: (line 14721) invalid char between encapsulated token and delimiter
	at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:364)
	at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
	at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:146)
	... 18 more

@mike-podolskiy90
Contributor

Thanks, I'll have a look

@mike-podolskiy90 mike-podolskiy90 self-assigned this Nov 13, 2023
@mike-podolskiy90 mike-podolskiy90 added this to the 3.0 milestone Nov 13, 2023
@mike-podolskiy90
Contributor

That line has the value "nie zeker, soort vogel", which contains a comma inside the quotes.
So after configuring Field quotes for the observations source, it works fine.
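
For illustration only, here is a minimal Python sketch of why a quote character matters for a value like this one. It uses Python's standard `csv` module, not the IPT or Apache Commons CSV code, and the surrounding columns (`obs1`, the date) are made up; only the quoted value comes from this thread.

```python
import csv

# Hypothetical row: only the quoted middle value is taken from this issue.
line = 'obs1,"nie zeker, soort vogel",2023-11-10'


def split_fields(row, use_quotes):
    """Split one CSV row, with or without treating '"' as a field quote."""
    if use_quotes:
        reader = csv.reader([row], delimiter=",", quotechar='"')
    else:
        # QUOTE_NONE: quote characters get no special handling,
        # so every comma is a delimiter.
        reader = csv.reader([row], delimiter=",", quoting=csv.QUOTE_NONE)
    return next(reader)


without_quotes = split_fields(line, use_quotes=False)
with_quotes = split_fields(line, use_quotes=True)

print(len(without_quotes), without_quotes)
print(len(with_quotes), with_quotes)
```

Without a quote character the comma inside the value is treated as a delimiter and the row gains an extra column; with `"` configured the value stays in one field.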

@peterdesmet
Member Author

Oh thanks. I think we've had this issue before. Is there a disadvantage to setting Field quotes to " by default (which would avoid this error)?

@mike-podolskiy90
Contributor

Yes, we certainly have. I don't see any, but I might be mistaken. I'll have a look

@peterdesmet peterdesmet changed the title from "invalid char between encapsulated token and delimiter" to "Set fieldQuotes to " by default" Nov 13, 2023
@mike-podolskiy90
Contributor

Implemented here #2193
