Aborting IllegalArgumentException: DocValuesField "parentIds" is too large, must be <= 32766 caused by GeofileParser stacking adjacent folders #1668
Comments
The error message:
|
What kind of XML is this file? It seems to be processed by GeofileParser, and that parser is generating this huge and deep subitem tree. Maybe the generated tree could be simplified/shortened... @patrickdalla, since you implemented that parser, could you take a look? @hauck-jvsh could you send the triggering file to @patrickdalla? |
PS: I think it would be difficult to work around that Lucene limitation, so I think the easier path would be to generate a shorter subitem tree, if possible... |
Sorry, I corrected the original message: it is an EML, not an XML file. The original user says it is an EML with a map attached. |
I will ask our colleague if he can share the file. He only shared the log file with me. |
yes.
|
The file will be very useful to find the best solution.
|
Looking at the log, we can see only 2 calls to the method "recursiveFolderParse". So it seems that, in GeofileParser itself, only 2 levels of parent IDs are added. Maybe this error occurred by chance in GeofileParser. |
The user says he will send the file. As soon as I receive it, I will send it to you, @patrickdalla. Anyway, if the max depth is reached by any means, do you think the best behavior is to abort? |
32K nested subitem levels seems a bit crazy to me. I think we should first look into the triggering file: maybe it is corrupted, maybe it caused an infinite loop in some parser, so a fix at the parser level may be more appropriate. If it is a normal situation, we can think of a workaround for the Lucene limitation, maybe pruning some deep subparents and putting deep children below some max parent level, or maybe just ignoring deep subparents in the parentIds field before indexing it as a Lucene DocValue (this can cause some side effects, e.g. in the Metadata panel filter when filtering by parentIds). |
There are some for loops in that parser that can create a deep subitem tree without deep method recursion. I changed that parser a while ago to make it thread safe; I tried to keep the previous logic and asked for review. Maybe I broke it, maybe it was already broken, maybe the file really has a deep structure and we should change the parser approach. Anyway, we need the triggering file... |
I just got the files. How can I send them to you, @patrickdalla? |
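As a rough illustration of the last option mentioned above (ignoring deep subparents in parentIds before indexing), here is a minimal Java sketch. It is not IPED code: the class `ParentIdsPruner`, the method `joinWithinLimit`, and the idea that parentIds is stored as a space-separated string ordered from root to deepest parent are all assumptions made for the example. It also assumes the 32766 limit from the error message applies to the UTF-8 byte length of the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IPED or Lucene.
class ParentIdsPruner {
    // Limit quoted in the error message ("must be <= 32766"); assumed to be in bytes.
    static final int MAX_DOCVALUES_BYTES = 32766;

    // Joins ids with single spaces, keeping the ids closest to the root and
    // dropping the deepest ones once the joined value would exceed maxBytes.
    static String joinWithinLimit(List<String> parentIds, int maxBytes) {
        StringBuilder sb = new StringBuilder();
        int used = 0;
        boolean first = true;
        for (String id : parentIds) {
            int needed = id.getBytes(StandardCharsets.UTF_8).length + (first ? 0 : 1);
            if (used + needed > maxBytes) {
                break; // every deeper parent is ignored from here on
            }
            if (!first) {
                sb.append(' ');
            }
            sb.append(id);
            used += needed;
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            ids.add(Integer.toString(i));
        }
        String joined = joinWithinLimit(ids, MAX_DOCVALUES_BYTES);
        System.out.println(joined.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Note the side effect already pointed out in the comment: items whose deepest parents were dropped would no longer match a filter on those parent ids.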
via team |
It was a bug in GeoFileParser, which was stacking sequential folders one inside the other. I submitted the fix in branch GeoStackFoldersBug. |
Fixed by #1674. |
return so sequential folders are not recursively stacked one inside the other.
A user sent me a log with an exception that occurred in a carved EML. It looks like the EML has an extremely deep structure. This leads to a very large number of parentIds, apparently exceeding the 32766 limit that Lucene enforces on the parentIds DocValuesField.
I don't know what to do in this case: should I ignore the items after this limit? Or should I add them without the parentIds?
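To make this kind of bug concrete, here is a toy Java sketch of how a missing return can stack sequential folders one inside the other. It is not the GeofileParser code: the token format (`open:<name>` / `close`), the `Folder` class, and the `returnOnClose` flag are hypothetical and exist only to contrast the two behaviors.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only; none of these names come from IPED.
class FolderStackingSketch {
    static class Folder {
        final String name;
        final List<Folder> children = new ArrayList<>();
        Folder(String name) { this.name = name; }
    }

    // Consumes "open:<name>" and "close" tokens from a flat stream.
    // With returnOnClose = false the "close" token is ignored, so the next
    // sibling is parsed as a child of the folder that has just ended.
    static void parse(Iterator<String> tokens, Folder parent, boolean returnOnClose) {
        while (tokens.hasNext()) {
            String token = tokens.next();
            if (token.startsWith("open:")) {
                Folder folder = new Folder(token.substring("open:".length()));
                parent.children.add(folder);
                parse(tokens, folder, returnOnClose);
            } else if (returnOnClose) {
                return; // folder ended: give control back so the next folder is a sibling
            }
        }
    }

    // Number of levels in the tree, counting the given folder itself.
    static int depth(Folder folder) {
        int max = 0;
        for (Folder child : folder.children) {
            max = Math.max(max, depth(child));
        }
        return 1 + max;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("open:A", "close", "open:B", "close", "open:C", "close");
        Folder stacked = new Folder("root");
        parse(tokens.iterator(), stacked, false);
        Folder flat = new Folder("root");
        parse(tokens.iterator(), flat, true);
        System.out.println("without return: depth " + depth(stacked));
        System.out.println("with return: depth " + depth(flat));
    }
}
```

In this toy model the depth without the return grows with the number of sequential folders, which is the kind of growth that would inflate parentIds for a file with many adjacent folders.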