qiqqa crashing when generating autotags #283

quissicks · 2021-01-03T16:23:15Z

Happy New Year to the qiqqa community! I have a very large library. I am running version V82.0.7579.33985. It crashes when I try to generate autotags.

GerHobbelt · 2021-01-03T18:58:30Z

Happy & healthy new year! Re issue: Much appreciated if you can send the logfiles. Did the crash happen again after restarting the application and regenerating the autotags, i.e. is the application crashing *consistently*?


GerHobbelt · 2021-01-10T22:54:14Z

Hi Chris,

Finally took time to inspect your logfiles earlier today. Still going through them as there's some other stuff in there that hints at other trouble. Anyway, we'll get to that.

From what I can see in the logfiles, the root cause is (with high probability) the auto tag processing (and not something happening in the background that "just happens at the same time"). The out-of-memory failure happens inside the LuceneNET library code while it is busy updating the search index with the new autotags which are attached to each document. (The LuceneNET search index processes all PDF document texts plus all PDF text-based metadata: tags, BibTeX, title, etc.)

Thank you very much for sending the bundled logfiles; I'll bother you with a few more requests if that's okay:

Aside

It's not related to this issue, but I noticed a bunch of PDFs producing 'irregular' log output during OCR/text background processing for the search index updates, which translates to:

- you seem to have several PDFs in your collection which would be good to have as test cases for further Qiqqa PDF/OCR work; these would then end up in the large GitHub test set repository at https://github.com/GerHobbelt/Evil-PDF-Library-for-Qiqqa
- some of those 'irregular' PDFs produce a 'nil' output, which means Qiqqa has been unable to extract any text from them, or no text from a limited set of pages (which can legitimately happen when those pages only carry graphics, such as charts, photos or schematics)
- at time of writing there was one (1) very unexpected out-of-page-range request "which should never have happened": it's not harmful, but it indicates a PDF apparently triggering some faulty internal behaviour that I haven't seen before and that needs looking into.

I'd like to have a look at those PDFs when time allows, if that's okay.

Back to the issue at hand

The short of it is that I don't have a quick fix for the problem right now.

Memory management in .NET applications isn't easy stuff; I'm considering how to tackle this sooner than my intended end result: Qiqqa in 64 bit with upgraded libraries. (#289, section "How much .NET memory is gobbled up by the Lucene search databases in current Qiqqa?")

From what I can see so far, the problem is caused by all the LuceneNET activity resulting from the set of AutoTags discovered and assigned to the documents. 🤔 Thinking about how to approach this problem and reduce the memory pressure in the application.

Current questions for you (@quissicks)

What's the total number of documents in your libraries?

No need to be exact down to the last item; a range like 'between 40K and 42K documents' is enough: I'm wondering if my own libraries are sufficiently large to be useful for testing the issue you're experiencing or whether I need to build a larger library to help inspect memory pressure in .NET.
The request for particular PDFs will follow later as a single batch to keep that separate.

…en* exactly these out-of-bounds requests occur - as this was discovered in customer log files during problem analysis of jimmejardine#283

GerHobbelt · 2021-01-13T02:09:03Z

@quissicks : Hi Chris,

There's a new (test) release published at https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7649.30836 ; see description there. You can simply install it over your existing Qiqqa; if you want to revert to another Qiqqa version, you can install that version over the new one without trouble.

See also the last comment at #288 (the other issue this release is targeting) and the screenshot of the startup dialog there: it's not meant for your situation, so this is just for your awareness.

In your case I'm particularly interested in the new log files. Regrettably I haven't been able to do anything serious about memory reduction yet. I have a few observations, also from my own testing, but it's pretty tough to pinpoint the culprits. (Well, technically it's more accurate to say the culprits are easily found in a memory profiler; the big hurdle is coming up with ways to alleviate the memory pressure there.) It's all the documents, which load their metadata into memory at the first "opportunity" where it is needed (e.g. when analyzing metadata in the background for auto-tagging, checking the indexing, etc.), and then Qiqqa isn't smart about it and doesn't know how to, say, "throw away" this data when the acute need for it has gone.

Plus there's the curious observation in my own tests that 'apparently' there are more PDF document 'instances' in memory than I have PDF documents in all the libraries, so that's another ho-hum-hum to research: that one has to be tested with a very small library (or set of libraries) to see if I can reproduce that 'too many' situation and find out where it originates -- doing that in a huge lib is too cumbersome.
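To illustrate the 'more instances than documents' check on a small library: one way to test for it is to track live objects through weak references and count them per document key. The sketch below is in Python purely for illustration (Qiqqa itself is a .NET application), and `PdfDocument`, `fingerprint` and `duplicate_instances` are hypothetical names, not Qiqqa's actual classes.

```python
import weakref
from collections import Counter


class PdfDocument:
    """Hypothetical stand-in for an in-memory document record."""

    # Weak references let us observe live instances without keeping them alive.
    _live = weakref.WeakSet()

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint
        PdfDocument._live.add(self)


def duplicate_instances():
    """Return {fingerprint: count} for documents held in memory more than once."""
    counts = Counter(doc.fingerprint for doc in PdfDocument._live)
    return {fp: n for fp, n in counts.items() if n > 1}
```

Any non-empty result means the same document has been materialized more than once, which would point at the code path that creates the extra copy.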

Anyway, just so you get a bit of a feel for what's seen and know that work is being done; I cannot predict results yet as I'm still in the 'finding out what's going on exactly' phase, while also realizing that there's some serious refactoring required if I must detect high memory pressure and 'discard' old-ish metadata -- which isn't timestamped yet as these are all persistent stores, not 'caches' in the usual sense, where stuff comes in, gets a timestamp that's tracked and refreshed based on usage, and is then killed off when the cached stuff 'expires'.
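For readers unfamiliar with the 'cache in the usual sense' described here, a minimal sketch of the idea follows: each entry carries a last-used timestamp that is refreshed on access, and stale entries are dropped on request. It is written in Python for illustration only; `ExpiringCache` and its methods are hypothetical and do not exist in Qiqqa.

```python
import time


class ExpiringCache:
    """Cache whose entries carry a last-used timestamp (hypothetical sketch)."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # key -> (value, last_used)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, _ = entry
        # Refresh the timestamp on every use so frequently used entries stay alive.
        self._entries[key] = (value, self._clock())
        return value

    def evict_expired(self):
        """Drop entries not used within max_age_seconds; return how many were dropped."""
        now = self._clock()
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used > self._max_age]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

This only helps if a dropped entry can be reloaded from the persistent store on its next use, which is the refactoring effort mentioned above; `evict_expired()` would be called when memory pressure is detected.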

No matter, ignore if that's too geeky for you 😅

Have a go at the new version if you like and I'd be happy to see another set of logfiles. Thanks!

By The Way

Apologies for any 'rough edges' with the new one; pushed the release out so it's here today and not, say, Friday or later. Real life and all that jazz. Ciao!

quissicks · 2021-01-13T09:07:01Z

Dear Ger, Thanks for this. I have now installed the new version. I will send you logs if there are any crashes. I really welcome your commentary. I used to do a lot of computing myself – I used to administer a cluster of Sun workstations and I did a lot of programming (I developed a very large simulation model for my PhD, which was coded from scratch). However, I am very out of date – I haven’t done much since 2006 when the Sun compilers were withdrawn (and porting to something else would have been a huge task). At some stage I must get round to programming in some contemporary language. Thanks again, Chris.

GerHobbelt · 2021-01-16T21:59:18Z

Quick heads up: new release to try: https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7655.37537

Please report anything you observe with the new release. Thanks!

quissicks · 2021-01-16T22:03:07Z

Thanks Ger, I am just installing it now. Best wishes, Chris.

GerHobbelt · 2021-01-17T02:48:19Z

Quick heads up: hotfix release to try: https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7656.6401 (which fixes a known issue in the previous release https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7655.37537)

Please report anything you observe with the new release. Thanks!

…I thread; the lib uses COM under the hood, which requires a working and accessible Windows message pipe, something which only the UI thread can provide.
- littered the code with WPFDoEvents UI/not-UI assertions -- which caught the above scenario in a Dispose() for a page image render. And that was the hint needed to progress a little further towards stability: it was SORAX which caused a *lot* of the out-of-memory failures due to crazy COM/WPF/UI failures, even for smaller libraries under test.
- fix a bit of an odd crash in the Lucene flush/cleanup during shutdown, where Lucene kept busy with 'optimizing the index' while a quick application termination was happening in the background, resulting in lockup and then a crash.
- this MAY be a fix for the reported "number of documents reported not matching reality": added update/refresh code to update the library list panel when PDF documents are added in the background via FolderWatcher or other means (async library loading). WARNING: this code is still incomplete/buggy!
- most UI assertions have been covered now. Keeping them anyway as this is hairy stuff and should be tested more.

Addresses (but is not guaranteed to fix) jimmejardine#290, jimmejardine#283, jimmejardine#281, jimmejardine#280, jimmejardine#243

GerHobbelt added the 🐛bug Something isn't working label Jan 4, 2021

GerHobbelt mentioned this issue Jan 10, 2021

Migrate Qiqqa to 64 bit architecture to cope with large libraries, etc. (Future Plan) #289

Open

GerHobbelt added this to the v82 milestone Jan 10, 2021

GerHobbelt mentioned this issue Jan 16, 2021

Qiqqa crashing #264

Closed

GerHobbelt mentioned this issue Feb 27, 2021

Qiqqa error pops up "unexpected problem in qiqqa" v83.0.7656.6401 - I sent you zipped logs to email #304

Open
