I installed this easyocr version via pipx and I went to compare a bunch of files between the original ocrmypdf and this one, and found that while easyocr is WAY more accurate at getting the letters right, the sidecar is all one line. Less than ideal and sounds like a bug to me.
If I pdftotext the pdf, it comes out on multiple lines. But the sidecar is jacked.
To reproduce, run with --sidecar. I can provide a jpg if you want.
The output format from easyocr doesn't really include line grouping, so that information has to be inferred. Using pdftotext -layout on the output PDF should give an accurate reconstruction.
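For illustration, here is a rough sketch of how line grouping could be inferred from the bounding boxes. It assumes input shaped like the (bbox, text, confidence) tuples that EasyOCR's readtext() returns, where bbox is four [x, y] corner points. The function name group_into_lines and the tolerance parameter are hypothetical, and the heuristic is only one possible approach, not what the plugin actually does.

```python
from statistics import median


def group_into_lines(results, tolerance=0.5):
    """Group (bbox, text, confidence) tuples into newline-separated text.

    Assumption: bbox is four [x, y] corner points. Two boxes are placed on
    the same line when their vertical centers differ by at most `tolerance`
    times the median box height. This is a simple heuristic.
    """
    if not results:
        return ""

    boxes = []
    for bbox, text, _conf in results:
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        left = min(xs)
        center_y = (min(ys) + max(ys)) / 2
        height = max(ys) - min(ys)
        boxes.append((left, center_y, height, text))

    threshold = tolerance * median(box[2] for box in boxes)

    # Walk the boxes top to bottom, starting a new line whenever the
    # vertical gap to the previous box exceeds the threshold.
    boxes.sort(key=lambda box: box[1])
    lines = [[boxes[0]]]
    for box in boxes[1:]:
        if abs(box[1] - lines[-1][-1][1]) <= threshold:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Within each line, order the boxes left to right.
    return "\n".join(
        " ".join(box[3] for box in sorted(line, key=lambda box: box[0]))
        for line in lines
    )


# Hand-made sample data, not real OCR output.
sample = [
    ([[120, 10], [200, 10], [200, 30], [120, 30]], "world", 0.9),
    ([[10, 12], [100, 12], [100, 32], [10, 32]], "Hello", 0.9),
    ([[10, 50], [150, 50], [150, 70], [10, 70]], "second line", 0.8),
]
print(group_into_lines(sample))
```

A heuristic like this breaks down on multi-column or skewed pages, which is why running pdftotext -layout on the finished PDF is the safer way to get line breaks for now.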