Model weights: Mirror Site, Original Site
Original GitHub: GOT-OCR2.0
Thanks to GLM4 for providing some of the code (I'm really not good at coding, so I had to use AI)
Thanks to Yi-1.5 for translating README.md into English
This project was developed under Windows, so I can't guarantee that it will work on Linux. If you want to use it on Linux, you can check this issue.
Please give this project a star!
- Preliminary implementation of PDF export feature
- Fixed LaTeX rendering issues in PDF
- Implemented functionality to set the language using .json files
- Wrote a script to manage language configuration files
- Batch rendering script
- PDF processing: split the PDF into individual page PNGs first, then batch render them to generate a PDF for each page
- Integrated the batch renderer into the GUI
- More configuration options
- Refactored the PDF processing script
- Pulled an Alpha branch and put unfinished tasks into it:
  - ⚠️ PDF processing should be able to render a whole PDF, not one PDF per page (I ran into some trouble; if you have a solution, you can create a pull request into the Alpha branch for now. The HuggingFace model used is kaifeise/GOT-gguf; GGUF Test.py uses code from 1694439208/GOT-OCR-Inference. Files are in the gguf folder.)
  - Support for llama-cpp-python, hoping to accelerate inference
  - HTML-to-Word functionality that preserves formulas for editing
If a folder mentioned here doesn't exist, create it.
This environment was tested under python 3.11.9.
Choose the GPU version of torch that suits your system from PyTorch and install it.
I am using stable 2.4.1 + cu124, so I suggest you use this version.
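To confirm that the GPU build of torch is the one actually installed, a quick check like the following can help. This is a minimal sketch, not part of the project (describe_torch_build is a hypothetical name); it only relies on standard torch attributes: torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name().

```python
def describe_torch_build() -> str:
    """Report whether the installed torch build can use a CUDA GPU."""
    try:
        import torch  # imported lazily so the check also runs where torch is missing
    except ImportError:
        return "torch is not installed"
    if torch.version.cuda is None:
        # CPU-only wheels (and non-CUDA builds) report no CUDA version
        return f"torch {torch.__version__} is not a CUDA build"
    if not torch.cuda.is_available():
        return (f"torch {torch.__version__} was built for CUDA {torch.version.cuda}, "
                "but no usable GPU was found")
    return (f"torch {torch.__version__} with CUDA {torch.version.cuda} "
            f"on {torch.cuda.get_device_name(0)}")


if __name__ == "__main__":
    print(describe_torch_build())
```

If this reports a CPU-only build, uninstall torch and reinstall the CUDA version from PyTorch.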
Not a must-have, but if you want to install it, check #12
In my tests, if you install it directly through requirements.txt, you will get a
ModuleNotFoundError: No module named 'frontend'
error. But if you install it separately on the command line, it works fine. I don't know why; just try it yourself.
By the way, if you still get the ModuleNotFoundError, try uninstalling and reinstalling fitz and PyMuPDF separately.
In my tests, pip install -U doesn't fix it. Strange.
pip install fitz
pip install PyMuPDF
pip install -r requirements.txt
Also, someone reported running into conflicting dependencies after installing. But I didn't find any conflicts in the requirements.txt file, and pipdeptree shows no conflicts either. I generated this requirements.txt with pip freeze, so it should be fine.
Still, the problem did happen, so I have provided a requirements-noversion.txt that contains no version numbers.
For more information, see issue #4.
pip install -r requirements-noversion.txt
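Because requirements.txt was generated with pip freeze, its lines are exact name==version pins. If you suspect a mismatch, a small script using the standard library's importlib.metadata can compare those pins with your current environment. This is a hypothetical helper (check_pins is not part of the project), and it only checks exact == pins; ranges, options and comments are skipped.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def check_pins(requirements_file: str) -> list[str]:
    """Compare 'name==version' pins with what is installed in this environment."""
    problems = []
    for raw in Path(requirements_file).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line or line.startswith("-"):
            continue  # skip blank lines, ranges, and pip options such as -r / -e
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # ignore extras such as pkg[extra]==1.0
        pinned = pinned.split(";", 1)[0].strip()  # ignore environment markers
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {pinned})")
            continue
        if installed != pinned:
            problems.append(f"{name}: installed {installed}, pinned {pinned}")
    return problems
```

Run check_pins("requirements.txt") from the project folder; an empty list means every pinned package is installed at exactly the pinned version.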
- Edge WebDriver: download the .zip file, unzip it, and put its contents into the edge_driver folder
This requires the Edge browser to be installed on your computer (it is preinstalled on Windows).
The file structure should be:
GOT-OCR-2-GUI
└─edge_driver
├─msedgedriver.exe
└─...
- Download the model files to the models folder
- Don't skip any files; download all of them
- The file structure should be:
GOT-OCR-2-GUI
└─models
├─config.json
├─generation_config.json
├─got_vision_b.py
├─model.safetensors
├─modeling_GOT.py
├─qwen.tiktoken
├─render_tools.py
├─special_tokens_map.json
├─tokenization_qwen.py
└─tokenizer_config.json
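A model that fails to load is often missing one of these files. The sketch below checks the models folder against the file list above; missing_model_files and REQUIRED_MODEL_FILES are hypothetical names, not part of the project.

```python
from pathlib import Path

# File list copied from the expected models folder layout above.
REQUIRED_MODEL_FILES = [
    "config.json",
    "generation_config.json",
    "got_vision_b.py",
    "model.safetensors",
    "modeling_GOT.py",
    "qwen.tiktoken",
    "render_tools.py",
    "special_tokens_map.json",
    "tokenization_qwen.py",
    "tokenizer_config.json",
]


def missing_model_files(models_dir: str = "models") -> list[str]:
    """Return the expected model files that are absent from models_dir."""
    folder = Path(models_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are present.")
```

It only checks that the files exist, not that a download completed correctly, so a truncated model.safetensors would still pass.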
- If you want to use the command line, use CLI.py.
- If you want to use the graphical interface, use GUI.py.
- If you want to modify settings, use Config Manager.py.
- If you want to perform automated rendering, use Renderer.py, which automatically renders all .jpg and .png images in the imgs folder.
GUI users can ignore this, but CLI users should remember to put the images they want to OCR into the imgs folder (the CLI currently only detects .jpg and .png files).
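To check in advance which files will be picked up, a sketch like this lists the candidates in imgs. It assumes only files directly inside imgs are scanned (no subfolders); find_images is a hypothetical name, not the project's code, and unlike the real CLI it may also accept upper-case extensions such as .JPG.

```python
from pathlib import Path

# The CLI is described as detecting only .jpg and .png files.
SUPPORTED_SUFFIXES = {".jpg", ".png"}


def find_images(folder: str = "imgs") -> list[Path]:
    """List the .jpg/.png files directly inside folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Note that .jpeg files are not matched, in line with the CLI only detecting .jpg and .png.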
- You can find the language .json files in the Locales folder, with CLI and GUI language files stored separately.
- In the gui subfolder, in addition to the language.json files, there is also an instructions folder that contains the GUI's built-in tutorials, named language.md.
- To change the language, simply change the value of 'language' in the config.json file. The available options are the file names (without extensions) of the language.json files.
- If you want to add support for a new language, for the CLI just add a new language.json file (I strongly recommend using an existing file as a starting point). For the GUI, you will also need the corresponding language.md file.
- You can run Config Manager.py to manage the language and other configuration files.
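The language lookup described above can be sketched like this: the available options are the .json file names (without extension) in a locale folder, and switching means rewriting the 'language' value in config.json. This is a hypothetical illustration: available_languages, set_language and the Locales/cli path are my assumptions, not the project's code, and it assumes 'language' is a top-level key. Config Manager.py remains the intended tool.

```python
import json
from pathlib import Path


def available_languages(locale_dir: str) -> list[str]:
    """Language options are the .json file names without their extension."""
    return sorted(p.stem for p in Path(locale_dir).glob("*.json"))


def set_language(config_path: str, language: str, locale_dir: str) -> None:
    """Write `language` into config.json after checking a matching file exists."""
    options = available_languages(locale_dir)
    if language not in options:
        raise ValueError(f"unknown language {language!r}; choose one of {options}")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["language"] = language  # assumed to be a top-level key
    # Re-serializing may change the file's original indentation.
    path.write_text(json.dumps(config, ensure_ascii=False, indent=4), encoding="utf-8")
```

For example, set_language("config.json", "en", "Locales/cli") would work if an en.json exists in that (hypothetical) CLI locale folder.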
- DO NOT DELETE markdown-it.js inside the result folder, or PDF output may fail
If you deleted it, you can find a backup in the scripts folder. Just copy it to the result folder.
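Restoring the backup can also be scripted. This minimal sketch assumes it is run against the project root and only copies scripts/markdown-it.js into result/ when the file is missing; restore_markdown_it is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path


def restore_markdown_it(project_root: str = ".") -> bool:
    """Copy scripts/markdown-it.js back into result/ if it is missing.

    Returns True if a copy was made, False if the file was already there.
    """
    root = Path(project_root)
    target = root / "result" / "markdown-it.js"
    if target.is_file():
        return False
    backup = root / "scripts" / "markdown-it.js"
    if not backup.is_file():
        raise FileNotFoundError(f"backup not found: {backup}")
    target.parent.mkdir(parents=True, exist_ok=True)  # create result/ if needed
    shutil.copy2(backup, target)  # copy2 also preserves file timestamps
    return True
```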
- If the script crashes, try running it from cmd with python + file name. I encountered crashes during testing, but I don't know why.
- Make sure that you have installed the GPU version of torch
- Q: What is an "HTML local file"? Are there HTML files that are not saved locally?
- A: Although the HTML files output by the model are saved locally, they use external scripts, so you still need an internet connection to open them even though the files are on your machine. I have downloaded the external script, which is the markdown-it.js mentioned above. The main reason for doing this is to prevent PDF export failures caused by network issues.
- Q: Why did my model fail to load?
- A: Check whether you are missing any files. The model files downloaded from Baidu Cloud seem to be missing some files, so I recommend downloading from Huggingface (mentioned above) instead.
- Q: Any suggestions on deploying this repo?
- A: See issue #5.
For GUI users, the tutorial is built into the GUI; just open the GUI and follow the instructions.
Below are tutorials for CLI users.
- ocr: Standard OCR
- format: OCR with formatting
- fine-grained-ocr: OCR content within a specific box
- fine-grained-format: OCR and format content within a specific box
- fine-grained-color-ocr: OCR content within a box of a specific color (I haven't tried this, but it seems like you would need to draw a red/green/blue box first and then select the color in the GUI)
- fine-grained-color-format: OCR and format content within a box of a specific color
- Suitable for more complex images
- Existing files will be overwritten! Check the file path before clicking the button!
- Render OCR content and save it as an HTML file
- Saved as two files: one UTF-8 encoded and one GB2312 encoded
- You can convert HTML to PDF
- The CLI gets the image name automatically
- HTML files will be saved in the result folder
- If you want to convert HTML to PDF, just enter y when the CLI asks you
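The OCR modes listed above line up with the options of the upstream GOT-OCR2.0 model's chat() method. The sketch below maps a mode name to keyword arguments; it is an illustration, not how this project's CLI is implemented. The keyword names ocr_type, ocr_box and ocr_color follow the usage shown on the GOT-OCR2.0 model card, while mode_to_chat_kwargs and the '[x1,y1,x2,y2]' box string format are my assumptions.

```python
from typing import Optional

BASE_MODES = {
    "ocr": {"ocr_type": "ocr"},
    "format": {"ocr_type": "format"},
}
# mode name -> (ocr_type, extra keyword argument the mode needs)
FINE_GRAINED_MODES = {
    "fine-grained-ocr": ("ocr", "ocr_box"),
    "fine-grained-format": ("format", "ocr_box"),
    "fine-grained-color-ocr": ("ocr", "ocr_color"),
    "fine-grained-color-format": ("format", "ocr_color"),
}
COLORS = {"red", "green", "blue"}


def mode_to_chat_kwargs(mode: str, box: Optional[str] = None,
                        color: Optional[str] = None) -> dict:
    """Translate a CLI mode name into keyword arguments for model.chat()."""
    if mode in BASE_MODES:
        return dict(BASE_MODES[mode])  # copy so callers can't mutate the table
    if mode not in FINE_GRAINED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    ocr_type, extra = FINE_GRAINED_MODES[mode]
    if extra == "ocr_box":
        if not box:
            raise ValueError(f"{mode} needs a box string such as '[x1,y1,x2,y2]'")
        return {"ocr_type": ocr_type, "ocr_box": box}
    if color not in COLORS:
        raise ValueError(f"{mode} needs a color: one of {sorted(COLORS)}")
    return {"ocr_type": ocr_type, "ocr_color": color}
```

With a GOT model and tokenizer already loaded, a call might look like model.chat(tokenizer, "imgs/example.png", **mode_to_chat_kwargs("format")), where the image path is only an example.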