Newlines missing in sidecar #4

BillyCroan · 2024-01-21T03:28:35Z

I installed this easyocr version via pipx and compared a bunch of files processed by the original ocrmypdf and by this one. While easyocr is WAY more accurate at getting the letters right, the sidecar text is all on one line. That's less than ideal and sounds like a bug to me.

If I run pdftotext on the PDF, the text comes out on multiple lines. But the sidecar is jacked: everything is run together on one line.

To reproduce, use --sidecar. I can provide a JPG if you want.

jbarlow83 · 2024-01-21T04:23:29Z

The output format from easyocr doesn't really have line grouping, so that information has to be inferred. Using pdftotext -layout should give an accurate reconstruction.
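To illustrate what "inferred" means here, below is a minimal sketch of one way to rebuild lines from word boxes. It assumes results shaped like easyocr's `Reader.readtext()` default output, a list of `(bbox, text, confidence)` tuples where `bbox` holds four corner points. The function name `group_into_lines` and the 50% vertical-overlap threshold are hypothetical choices for this example, not the plugin's actual code.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Result = Tuple[Sequence[Point], str, float]  # (bbox corners, text, confidence)


def _top_bottom_left(bbox: Sequence[Point]) -> Tuple[float, float, float]:
    """Return the (top, bottom, left) extents of a quadrilateral box."""
    ys = [p[1] for p in bbox]
    xs = [p[0] for p in bbox]
    return min(ys), max(ys), min(xs)


def group_into_lines(results: List[Result], overlap: float = 0.5) -> str:
    """Join easyocr-style results into newline-separated text (hypothetical heuristic).

    A box joins the current line when its vertical extent overlaps the line's
    extent by at least `overlap` of the shorter of the two heights.
    """
    boxes = []
    for bbox, text, _conf in results:
        top, bottom, left = _top_bottom_left(bbox)
        boxes.append((top, bottom, left, text))
    # Process boxes top to bottom by their vertical centre.
    boxes.sort(key=lambda b: (b[0] + b[1]) / 2)

    lines: List[List[tuple]] = []
    for box in boxes:
        top, bottom = box[0], box[1]
        if lines:
            line = lines[-1]
            line_top = min(b[0] for b in line)
            line_bottom = max(b[1] for b in line)
            shared = min(bottom, line_bottom) - max(top, line_top)
            shortest = min(bottom - top, line_bottom - line_top)
            if shortest > 0 and shared / shortest >= overlap:
                line.append(box)
                continue
        lines.append([box])  # no sufficient overlap: start a new line

    # Within each line, order words left to right and join with spaces.
    return "\n".join(
        " ".join(b[3] for b in sorted(line, key=lambda b: b[2])) for line in lines
    )


sample = [
    ([[120, 10], [200, 10], [200, 30], [120, 30]], "world", 0.98),
    ([[10, 12], [100, 12], [100, 32], [10, 32]], "Hello", 0.97),
    ([[10, 50], [90, 50], [90, 70], [10, 70]], "Second", 0.95),
    ([[100, 52], [160, 52], [160, 72], [100, 72]], "line", 0.96),
]
print(group_into_lines(sample))  # prints "Hello world" then "Second line"
```

A fixed overlap threshold like this can misgroup skewed scans or multi-column pages, which is why a layout-aware tool such as pdftotext -layout is the more reliable route for the final text.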
