Jupyter notebook code that uses Tesseract OCR, via pytesseract and Python, to extract text from images and to generate box files and hOCR files
Installing and using Tesseract on Ubuntu:
https://www.linuxhelp.com/how-to-install-tesseract-ocr-on-ubuntu-16-04
https://www.linux.com/blog/using-tesseract-ubuntu
https://pypi.org/project/pytesseract/
pip install pytesseract
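pip install pytesseract only installs the Python wrapper. The Tesseract engine itself has to be installed separately (see the Ubuntu links above). A quick check from a notebook cell could look like this. It is a minimal sketch using only the standard library, and check_ocr_setup is a hypothetical helper name, not part of pytesseract:

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Hypothetical helper: report whether the OCR pieces are installed."""
    return {
        # True if `pip install pytesseract` has put the wrapper on sys.path
        "pytesseract_installed": importlib.util.find_spec("pytesseract") is not None,
        # Full path of the tesseract executable found on PATH, or None
        "tesseract_binary": shutil.which("tesseract"),
    }


print(check_ocr_setup())
```

If tesseract_binary is None but the find commands locate the executable somewhere else, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to that path.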
To locate the tesseract executable and the tessdata (language data) folder:
sudo find / -name "tesseract"
sudo find / -name "tessdata"
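sudo find / scans the whole filesystem. The same directory search can be done from Python. This is a minimal sketch that only walks a few roots, and those default roots (/usr, /usr/local, /opt) are an assumption. find_dirs_named is a hypothetical helper name:

```python
import os
from pathlib import Path


def find_dirs_named(name, roots=("/usr", "/usr/local", "/opt")):
    """Hypothetical helper: return every directory called `name` under `roots`.

    Roughly `find ROOT -type d -name NAME` for each root. The default roots
    are an assumption, not a place Tesseract is guaranteed to live.
    """
    matches = []
    for root in roots:
        if not os.path.isdir(root):
            continue  # skip roots that do not exist on this machine
        # os.walk silently skips unreadable directories by default (onerror=None)
        for dirpath, dirnames, _filenames in os.walk(root):
            if name in dirnames:
                matches.append(str(Path(dirpath) / name))
    return matches
```

For example, find_dirs_named("tessdata") lists candidate language-data folders. A folder found this way can be handed to Tesseract through the TESSDATA_PREFIX environment variable. Check your Tesseract version's documentation for whether it expects the tessdata folder itself or its parent.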
- Put all the PNG files in one folder and replace the path in the notebook with that folder's location
- Use "" to increase the DPI of the images
- For each input image, the notebook creates a text file, a box file and an hOCR file and saves them in the same directory
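The steps above can be sketched as follows. This is a minimal sketch, not the notebook's own code. It assumes Pillow is installed for opening images and uses pytesseract's image_to_string, image_to_boxes and image_to_pdf_or_hocr(..., extension="hocr") calls. It writes <name>.txt, <name>.box and <name>.hocr next to each PNG and does no DPI enhancement. The helper names list_pngs, output_paths and ocr_folder are hypothetical:

```python
from pathlib import Path


def list_pngs(folder):
    """Return the PNG files in `folder` (case-insensitive extension), sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".png")


def output_paths(image_path):
    """Map an image path to the .txt, .box and .hocr files written beside it."""
    image_path = Path(image_path)
    return {ext: image_path.with_suffix("." + ext) for ext in ("txt", "box", "hocr")}


def ocr_folder(folder, lang="eng"):
    """Run Tesseract on every PNG in `folder` and save text, box and hOCR output."""
    # Imported here so the helpers above work even without these packages installed.
    import pytesseract
    from PIL import Image

    results = []
    for image_path in list_pngs(folder):
        paths = output_paths(image_path)
        with Image.open(image_path) as img:
            text = pytesseract.image_to_string(img, lang=lang)
            # Tesseract box format: one "char left bottom right top page" line per symbol
            boxes = pytesseract.image_to_boxes(img, lang=lang)
            # With extension="hocr" this call returns the hOCR document as bytes
            hocr = pytesseract.image_to_pdf_or_hocr(img, lang=lang, extension="hocr")
        paths["txt"].write_text(text, encoding="utf-8")
        paths["box"].write_text(boxes, encoding="utf-8")
        paths["hocr"].write_bytes(hocr)
        results.append(paths)
    return results
```

For example, ocr_folder("/home/user/png_images") (a placeholder path) processes every PNG in that folder. The lang value selects the Tesseract language data, which must be present in tessdata.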