search index stops working and re-indexing doesn't recreate the Lucene search db #16

Closed
GerHobbelt opened this issue Aug 2, 2019 · 3 comments
Labels
🐛bug Something isn't working

@GerHobbelt
Collaborator

I've had this problem many times over the years with a database of 20K+ documents (using v76-80, GitHub release).

@GerHobbelt
Collaborator Author

From #13

  1. #16 (search index stops working and re-indexing doesn't recreate the Lucene search db): Qiqqa failed on several occasions with my large PDF collection, causing a permanent and total failure of its search feature, i.e. the Lucene database got nuked/b0rked. All subsequent searches in Qiqqa would return ZERO results, quickly.

    • Reindexing via the Qiqqa Tools panel would have no effect.

      Tools > Qiqqa Configuration > Troubleshooting > Rebuild Library Search Indices
      
    • Manually deleting all the Lucene DB files in base/Guest/index/ would also be to no avail.

    • Reconstructing the Library by importing the PDF files in tiny batches via the Directory Watch feature of Qiqqa would result in 'semi-random behaviour'. This turns out to be highly dependent on which PDF files get loaded first: as soon as an offending PDF (queued to be imported later) is included in the library, the Lucene-backed search facility breaks down and stops functioning.

    Note: Pending investigation, #11 (SyncFusion 14 locks up (hangs) when reading some PDF files) is suspected to be at least one of the causes. At the time of this writing #11 has been fixed; that was a required first step towards making the Lucene-backed search feature work and (re)generate a working search index once again.
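
For anyone trying to reproduce this: since the breakage depends on which PDF gets included, the offending file can be narrowed down by bisection instead of importing tiny batches by hand. Below is a minimal sketch in Python, not Qiqqa code. `index_works` is a hypothetical callback that rebuilds a throwaway index from the given files and reports whether search still returns results, and the sketch assumes that once an offending file is in the set the index stays broken.

```python
from typing import Callable, Optional, Sequence


def first_offender(
    files: Sequence[str],
    index_works: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the first file whose inclusion breaks the index, or None.

    Assumption (hypothetical): index_works(prefix) is True for every prefix
    that contains no offending file and False for every prefix that does.
    """
    if not files or index_works(files):
        return None  # nothing to search, or the whole set indexes fine

    lo, hi = 0, len(files)  # invariant: files[:lo] works, files[:hi] is broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if index_works(files[:mid]):
            lo = mid
        else:
            hi = mid
    return files[hi - 1]
```

With 20K+ files this needs roughly 15 rebuilds rather than thousands of small imports. If there are several offending files, it finds the earliest one; remove it and run again.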

@GerHobbelt
Collaborator Author

Done as per #33.

Commits:

Revision: d58bd7a
revert debug code that was part of commit SHA-1: 89307ed -- some invalid BibTeX was crashing the Lucene indexer (AddDocumentMetadata_BibTex() would b0rk on a NULL Key)

That problem was fixed in that commit at a higher level (in PDFDocument)

Revision: 89307ed
some invalid BibTeX was crashing the Lucene indexer (AddDocumentMetadata_BibTex() would b0rk on a NULL Key)

Sample invalid BibTeX:

@empty = delete?
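
To illustrate the kind of guard involved, here is a minimal Python sketch (not the actual Qiqqa/C# code). It treats a BibTeX record without a usable key as "no BibTeX" before anything is handed to the indexer. The regular expression and the function and field names are assumptions made for this example only.

```python
import re
from typing import Dict, Optional

# Hypothetical header pattern "@type{key," ; a stand-in, not Qiqqa's parser.
_HEADER = re.compile(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,")


def bibtex_key(raw: Optional[str]) -> Optional[str]:
    """Return the citation key, or None when the record has no usable key."""
    if not raw:
        return None
    match = _HEADER.search(raw)
    return match.group(2) if match else None


def bibtex_index_fields(raw: Optional[str]) -> Dict[str, str]:
    """Build the metadata fields to index.

    Records without a key yield no fields instead of raising, so one bad
    record cannot take down the whole indexing run.
    """
    key = bibtex_key(raw)
    if key is None or raw is None:
        return {}
    return {"bibtex_key": key, "bibtex_raw": raw}
```

The sample above (`@empty = delete?`) has no `{key,` header, so it produces an empty field set and the document is still indexed on its other metadata.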

Revision: 8a1d766
Fix #17 by processing the PDFs in any Qiqqa library in small batches, so that Qiqqa is not unresponsive for a very long time when it is re-indexing, upgrading, or otherwise processing a large library, e.g. 20K+ PDF files. The key here is to make the 'infrequent background task' produce some result quickly (like a working, yet incomplete, Lucene search index DB!) and then update/augment that result as time goes by. This way, we can recover a search index for larger Qiqqa libraries!
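
The batching idea can be sketched as follows. This is a simplified Python illustration, not the Qiqqa implementation: `TinyIndex` is a toy in-memory stand-in for the Lucene index, and the batch size and all names are assumptions. The point it shows is that control returns to the caller after every batch, at which moment the partial index is already searchable.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class TinyIndex:
    """Toy inverted index standing in for the real search index."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, word: str) -> Set[str]:
        return self.postings.get(word.lower(), set())


def index_in_batches(
    docs: Iterable[Tuple[str, str]], index: TinyIndex, batch_size: int = 2
) -> Iterator[int]:
    """Index docs batch by batch, yielding the running total after each batch."""
    done = 0
    for batch in batches(docs, batch_size):
        for doc_id, text in batch:
            index.add(doc_id, text)
        done += len(batch)
        yield done  # the caller regains control; the partial index is usable
```

Driving this generator from a background timer (one `next()` per tick) keeps the application responsive while the index fills in over time.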

@GerHobbelt changed the title from "search index stops working and re-indexing doesn't recreate the Lucene search db" to "✅ search index stops working and re-indexing doesn't recreate the Lucene search db" on Aug 8, 2019
@GerHobbelt
Collaborator Author

Closing and decluttering the issue list so it stays workable for me: fixed in https://github.com/GerHobbelt/qiqqa-open-source mainline=master branch, pending #15 / any maintainer rights/actions.

@GerHobbelt GerHobbelt added the 🐛bug Something isn't working label Oct 4, 2019
@GerHobbelt GerHobbelt added this to the v82 milestone Oct 4, 2019
@GerHobbelt changed the title from "✅ search index stops working and re-indexing doesn't recreate the Lucene search db" to "search index stops working and re-indexing doesn't recreate the Lucene search db" on Oct 4, 2019