Importing & cleaning data from PDFs efficiently #5

antoniovalentim · 2020-10-15T09:41:33Z

This is more a question about cleaning scraped data than about collecting it, but one thing I have struggled with in the past is how to import and clean data from PDFs, and how to scale that up to large numbers of similar PDFs.

I tried different approaches, importing them with pdf_text or various OCR packages, but I never quite found an efficient way to then import and clean the data in bulk.
Thanks!
