QiqqaOCR: The Sorax PDF render library seems to take longer and longer, the higher the page number to render is. #136
Labels: 🦸♀️enhancement🦸♂️ · 🕵investigate · ⛷performance · 👮wontfix
The Sorax PDF render library seems to take longer and longer to render a page, the higher that page's number in the PDF is.
This smells like O(n^2) performance behaviour caused by the way QiqqaOCR works. In SINGLE mode it renders one page per invocation, so Sorax's apparent O(n) cost per page (rendering page k takes time roughly proportional to k) adds up to 1 + 2 + ... + n = n(n+1)/2 ≈ ½n² over the whole document: an O(n^2) performance hog at the Qiqqa level.
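Written out as a worked sum, assuming (as a model of the observed slowdown, not a measured Sorax figure) that rendering page k in a fresh invocation costs c·k for some constant c:

```latex
T_{\text{SINGLE}}(n) = \sum_{k=1}^{n} c\,k = c\,\frac{n(n+1)}{2} \approx \tfrac{1}{2}\,c\,n^{2} = O(n^{2})
```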
Would Sorax render costs drop if we simply grabbed all pages of the PDF at once and dumped them to image files, to be Tesseract-OCR'd in another process?
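A toy cost model comparing the two approaches. It assumes, hypothetically, that a per-page invocation costs about k units for page k, and that a single batch pass over the document costs about 1 unit per page. Whether Sorax actually behaves like that in a batch pass is exactly the open question here, and the function names are illustrative only, not part of QiqqaOCR or Sorax.

```python
# Toy cost model (assumptions, not measurements):
#   - SINGLE mode: each page gets its own invocation, and reaching page k
#     costs ~k units, because Sorax appears to walk the document up to page k.
#   - Batch mode (hypothetical): one invocation renders every page in one pass
#     and dumps them to image files, costing ~1 unit per page.


def single_mode_cost(n_pages: int) -> int:
    """Total cost when each page is rendered in its own invocation."""
    return sum(range(1, n_pages + 1))


def batch_mode_cost(n_pages: int) -> int:
    """Total cost when all pages are rendered in a single pass."""
    return n_pages


if __name__ == "__main__":
    for n in (10, 100, 1000):
        single = single_mode_cost(n)
        batch = batch_mode_cost(n)
        print(f"{n:5d} pages: single={single:8d} batch={batch:5d} ratio={single / batch:7.1f}")
```

Under these assumptions, the single-mode-to-batch ratio is (n+1)/2, so the gap widens linearly with document length. That is why a long PDF would hurt far more than a short one.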