Aborting IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766 caused by GeofileParser stacking adjacent folders #1668

Closed
hauck-jvsh opened this issue Apr 27, 2023 · 15 comments

@hauck-jvsh
Member

hauck-jvsh commented Apr 27, 2023

A user sent me a log with an exception that occurred while processing a carved EML file. It looks like the EML has an extremely deep structure. This leads to a very large number of parentIds, apparently exceeding the maximum size of a single DocValues value in the Lucene index (32766 bytes).
I don't know what to do in this case. Should I ignore the items beyond this limit, or should I index them without the parentIds?
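For a rough sense of scale, here is a minimal sketch that computes how many ancestors fit under the 32766-byte limit. It assumes, purely hypothetically, that parentIds is serialized as decimal IDs joined by single spaces; IPED's actual encoding may differ, so the result is only indicative. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class ParentIdsSize {
    // Limit quoted in the exception: max bytes of a single sorted DocValues value.
    static final int MAX_DOC_VALUE_BYTES = 32766;

    // Hypothetical encoding: "depth" consecutive decimal IDs starting at firstId,
    // separated by single spaces (e.g. "100000 100001 100002").
    static String encode(int depth, int firstId) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            if (i > 0) sb.append(' ');
            sb.append(firstId + i);
        }
        return sb.toString();
    }

    // Largest number of ancestors whose encoded value still fits within the limit.
    static int maxDepth(int firstId) {
        int bytes = 0;
        int depth = 0;
        while (true) {
            int idBytes = Integer.toString(firstId + depth).getBytes(StandardCharsets.UTF_8).length;
            int next = bytes + idBytes + (depth > 0 ? 1 : 0); // +1 for the separator
            if (next > MAX_DOC_VALUE_BYTES) {
                return depth;
            }
            bytes = next;
            depth++;
        }
    }

    public static void main(String[] args) {
        int depth = maxDepth(100_000);
        int bytes = encode(depth, 100_000).getBytes(StandardCharsets.UTF_8).length;
        System.out.println("max depth with 6-digit IDs: " + depth + " (" + bytes + " bytes)");
    }
}
```

With 6-digit IDs this gives 4681 levels (4681 × 6 digits + 4680 separators = 32766 bytes), so under this assumed encoding, hitting the limit implies thousands of nesting levels, not a handful.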

@hauck-jvsh
Member Author

The error message:

[B720]/Polyline [B71F]/Polyline [B71E]/Polyline [B71D]/Polyline [B71C]/Polyline [B71B]/Polyline [B71A]/Polyline [B719]/Line [B718]/Arc [B717]/Line [B716]/Arc [B715]/Arc [B714]/Arc [B713]/Arc [B713]:0 (1124bytes)
at iped.engine.core.Worker.process(Worker.java:186) ~[iped-engine-4.1.2.jar:?]
at iped.engine.core.Worker.processNewItem(Worker.java:226) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.ParsingTask.parseEmbedded(ParsingTask.java:689) ~[iped-engine-4.1.2.jar:?]
at iped.geo.parsers.GeofileParser.featureParser(GeofileParser.java:231) ~[iped-geo-4.1.2.jar:?]
at iped.geo.parsers.GeofileParser.recursiveFolderParse(GeofileParser.java:116) ~[iped-geo-4.1.2.jar:?]
at iped.geo.parsers.GeofileParser.recursiveFolderParse(GeofileParser.java:123) ~[iped-geo-4.1.2.jar:?]
at iped.geo.parsers.GeofileParser.parse(GeofileParser.java:94) ~[iped-geo-4.1.2.jar:?]
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-core-2.4.0-p1.jar:2.4.0]
at iped.parsers.standard.StandardParser.parse(StandardParser.java:245) ~[iped-parsers-impl-4.1.2.jar:?]
at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247) ~[iped-engine-4.1.2.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.lang.IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766
at org.apache.lucene.index.SortedDocValuesWriter.addValue(SortedDocValuesWriter.java:77) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.IndexingChain.indexDocValue(IndexingChain.java:977) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:758) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:620) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:241) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1503) ~[lucene-core-9.2.0.jar:9.2.0 ba8c3a806ada3d7b3c34d408e449a92376a8481b - romseygeek - 2022-05-19 15:10:13]
at iped.engine.task.index.IndexTask.process(IndexTask.java:149) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processMonitorTimeout(AbstractTask.java:277) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:192) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.MinIOTask.sendToNextTask(MinIOTask.java:372) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.PythonTask.sendToNextTask(PythonTask.java:278) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.PythonTask.sendToNextTask(PythonTask.java:278) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.PythonTask.sendToNextTask(PythonTask.java:278) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) ~[iped-engine-4.1.2.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) ~[iped-engine-4.1.2.jar:?]
at iped.engine.core.Worker.process(Worker.java:177) ~[iped-engine-4.1.2.jar:?]

@hauck-jvsh hauck-jvsh added the bug label Apr 27, 2023
@lfcnassif
Member

What kind of XML is this file? It seems it is being processed by GeofileParser, and that parser is generating this huge and deep subitem tree. Maybe the generated tree could be simplified/shortened... @patrickdalla, since you implemented that parser, could you take a look? @hauck-jvsh, could you send the triggering file to @patrickdalla?

@lfcnassif
Member

PS: I think it would be difficult to work around that Lucene limitation, so I think the easier path would be to generate a shorter subitem tree, if possible...

@hauck-jvsh
Member Author

Sorry, I corrected the original message: it is an EML, not an XML file. The user says it is an EML with a map attached.

@hauck-jvsh
Member Author

> What kind of XML is this file? Seems it is being processed by GeofileParser and that parser is generating this huge and deep subitem tree. Maybe the generated tree could be simplified/shortened... @patrickdalla, since you implemented that parser, could you take a look? @hauck-jvsh could you send the triggering file to @patrickdalla?

I will ask our colleague if he can share the file. He only shared the log file with me.

@patrickdalla
Collaborator

patrickdalla commented Apr 28, 2023 via email

@patrickdalla
Collaborator

patrickdalla commented Apr 28, 2023 via email

@patrickdalla
Collaborator

Looking at the log, we can see only 2 calls to the method "recursiveFolderParse". So it seems that GeofileParser itself adds only 2 levels of parent IDs. Maybe this error occurred in GeofileParser just by chance.

@hauck-jvsh
Member Author

The user says he will send the file. As soon as I receive it, I will send it to you, @patrickdalla. Anyway, if the maximum depth is reached by any means, do you think the best behavior is to abort?

@lfcnassif
Member

Nesting deep enough to hit the 32K limit seems a bit crazy to me. I think we should first look into the triggering file: maybe it is corrupted, or maybe it caused an infinite loop in some parser, in which case a fix at the parser level may be more appropriate.

If it is a normal situation, we can think of a workaround for the Lucene limitation: maybe pruning some deep subparents and placing deep children below some maximum parent level, or maybe just ignoring deep subparents in the parentIds field before indexing it as a Lucene DocValue (this could cause side effects, e.g. in the Metadata panel filter when filtering by parentIds).
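The second option above, ignoring deep subparents before indexing, could look roughly like this. It is a minimal sketch assuming a hypothetical space-joined encoding and an ancestor list ordered from the root down; it keeps the shallowest ancestors and drops the deeper ones once the byte budget is spent. ParentIdsPruner, pruneToFit and join are illustrative names, not IPED code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ParentIdsPruner {
    // Limit quoted in the exception: max bytes of a single sorted DocValues value.
    static final int MAX_DOC_VALUE_BYTES = 32766;

    // Keeps ancestors from the root downwards while the space-joined value
    // still fits in maxBytes; deeper ancestors are ignored.
    static List<Integer> pruneToFit(List<Integer> ancestorsRootFirst, int maxBytes) {
        List<Integer> kept = new ArrayList<>();
        int bytes = 0;
        for (int id : ancestorsRootFirst) {
            int added = Integer.toString(id).getBytes(StandardCharsets.UTF_8).length
                    + (kept.isEmpty() ? 0 : 1); // separator before every ID except the first
            if (bytes + added > maxBytes) {
                break; // everything deeper than this point is dropped
            }
            kept.add(id);
            bytes += added;
        }
        return kept;
    }

    // Hypothetical serialization used for the size check above.
    static String join(List<Integer> ids) {
        StringBuilder sb = new StringBuilder();
        for (int id : ids) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(id);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> deepChain = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) deepChain.add(100_000 + i);
        List<Integer> kept = pruneToFit(deepChain, MAX_DOC_VALUE_BYTES);
        System.out.println(kept.size() + " of " + deepChain.size() + " ancestors kept, "
                + join(kept).getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The side effect mentioned in the comment follows directly: filtering by a dropped ancestor ID would no longer match the item.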

@lfcnassif
Member

lfcnassif commented Apr 28, 2023

> Looking at the log, we can see only 2 calls to the method "recursiveFolderParse". So, it seems that in the GeofileParser itself, only 2 levels of parent ID are added. Maybe this error occurred by chance in GeofileParser.

There are some for loops in that parser that can create a deep subitem tree without deep method recursion. I changed that parser a while ago to make it thread safe; I tried to keep the previous logic and asked for review. Maybe I broke it, maybe it was already broken, or maybe the file really has a deep structure and we should change the parser's approach. Anyway, we need the triggering file...

@hauck-jvsh
Member Author

I just got the files. How can I send them to you, @patrickdalla?

@patrickdalla
Collaborator

via team

@patrickdalla
Collaborator

It was a bug in GeofileParser, which was stacking sequential folders one inside the other. I submitted the fix in branch GeoStackFoldersBug.
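A hypothetical model of the bug described here: when a loop attaches each parsed folder to the previously parsed one instead of to their shared container, depth grows with the number of sibling folders even though the method never recurses. Node, parseSiblingsBuggy and parseSiblingsFixed are illustrative names, not code taken from GeofileParser or the fix branch.

```java
import java.util.ArrayList;
import java.util.List;

final class FolderStackingDemo {
    // Hypothetical stand-in for a parsed folder item and its parent link.
    static final class Node {
        final String name;
        final Node parent;

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        // Iterative depth count, so very deep chains cannot overflow the stack.
        int depth() {
            int d = 0;
            for (Node p = parent; p != null; p = p.parent) d++;
            return d;
        }
    }

    // Buggy pattern: "parent" is overwritten in the loop, so each sibling
    // becomes a child of the previous one and depth grows with the sibling count.
    static List<Node> parseSiblingsBuggy(Node container, int count) {
        List<Node> out = new ArrayList<>();
        Node parent = container;
        for (int i = 0; i < count; i++) {
            Node folder = new Node("Folder " + i, parent);
            out.add(folder);
            parent = folder;
        }
        return out;
    }

    // Fixed pattern: every sibling is attached to the same container.
    static List<Node> parseSiblingsFixed(Node container, int count) {
        List<Node> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(new Node("Folder " + i, container));
        }
        return out;
    }

    static int maxDepth(List<Node> nodes) {
        int max = 0;
        for (Node n : nodes) max = Math.max(max, n.depth());
        return max;
    }

    public static void main(String[] args) {
        Node root = new Node("geofile", null);
        System.out.println("buggy max depth: " + maxDepth(parseSiblingsBuggy(root, 1000)));
        System.out.println("fixed max depth: " + maxDepth(parseSiblingsFixed(root, 1000)));
    }
}
```

In the buggy variant, 1000 adjacent folders end up 1000 levels deep, which is how a file with many adjacent folders could yield an extremely long parentIds chain; in the fixed variant they all stay one level below the container.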

@lfcnassif lfcnassif changed the title Problem with more than 32766 in the parentIds Aborting IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766 caused by GeofileParser May 5, 2023
@lfcnassif lfcnassif changed the title Aborting IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766 caused by GeofileParser Aborting IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766 caused by GeofileParser stacking adjacent folders May 5, 2023
@lfcnassif
Member

Fixed by #1674.

lfcnassif pushed a commit that referenced this issue Jun 27, 2023
return so sequential folders are not recursively stacked one inside the
other.