Importing & cleaning data from PDFs efficiently #5

Open
antoniovalentim opened this issue Oct 15, 2020 · 0 comments

This is more of a question about cleaning scraped data than about collecting it, but one thing I have struggled with in the past is how to import and clean data from PDFs, and how to scale that up to large numbers of similar PDFs.

I tried importing them with pdf_text and with various OCR packages, but I never quite found an efficient way to then import and clean the data in bulk.
Thanks!
