About the OCR engine that you use, three questions need your help #9
Comments
Thanks for your kind interest and sorry for the late reply. Also, I think it is unnecessary to annotate the GT strings manually.
Hope this helps!
Thanks a lot for your detailed explanation. Based on your code, Tesseract version, and PyTesseract version, I have achieved the same CER performance as reported in the paper. DocScanner is another great work that achieves the best MS-SSIM; I will spend some time following it as a next step.
@hanquansanren Thanks for your feedback.
@fh2019ustc I've installed the corresponding version, but achieved a different ED value (607), while the CER value (0.20) is the same as in Table 2. Eval dataset: DocUNet.
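For reference, a minimal sketch of how CER commonly relates to ED (this is an assumption about the usual convention, not the repo's OCR_eval.py): CER normalizes the Levenshtein edit distance by the ground-truth length, which is why a different total ED can still yield the same CER.

```python
# Minimal sketch (not the authors' OCR_eval.py): CER as edit distance
# normalized by ground-truth length.
import Levenshtein  # pip install python-Levenshtein (assumed dependency)

def cer(pred: str, gt: str) -> float:
    # Character error rate = edit distance / ground-truth length
    return Levenshtein.distance(pred, gt) / max(len(gt), 1)
```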
@an1018 Hi, please use the OCR eval code in our repo, in which we have updated the image list used in DewarpNet.
@an1018 For more OCR results of other methods under the two settings (DocTr and DewarpNet), you can refer to DocScanner.
@an1018 Hope to get your reply.
@fh2019ustc Yes, I use OCR_eval.py for evaluation, but there are still some problems:
Q2: Is the performance of DocTr in the following table based on the geometrically rectified results of GeoTr, rather than on the illumination correction of IllTr?
Q3: I still can't get the same performance using the rectified images from Baidu Cloud. (Note: 'docunet/scan/' contains the scanned images of DocUNet.)
Q4: How can I get the same result without using the rectified images from Baidu Cloud?
@an1018 Note that in the DocUNet Benchmark, the '64_1.png' and '64_2.png' distorted images are rotated by 180 degrees, so they do not match the GT documents. This is ignored by most existing works. Please check this before evaluation.
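In case it helps, a minimal sketch of that pre-processing step (the file paths are assumptions; adjust them to your local copy of the benchmark):

```python
# Rotate the two mismatched DocUNet distorted images by 180 degrees so they
# align with their GT documents before OCR evaluation.
from PIL import Image

for name in ["64_1.png", "64_2.png"]:  # assumed local paths to the distorted images
    img = Image.open(name)
    img.rotate(180).save(name)
```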
@an1018 For your Q2, this performance is based on GeoTr.
@an1018 For Q3 and Q4, to reproduce the above performance, please use the geometrically rectified images rather than the illumination-corrected images.
@fh2019ustc Thanks for your quick response; I'll try again and give you feedback.
@fh2019ustc Hi, I've installed Tesseract (v5.0.1) from Git and downloaded the eng model. The performance is close to yours, but there are still some differences. What else could be causing it? CER: 0.1759. Here are some of my configurations:
1) How can I install 5.0.1.20220118 rather than 5.0.1? (My environment is Ubuntu Linux.)
Oh, I can get the same performance in a Windows environment. But on Ubuntu, I can't find Tesseract v5.0.1.20220118.
@an1018 Thanks for your reply. For OCR evaluation, I think you can compare the performance within the same environment, whether it is Windows or Ubuntu.
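A quick way to record which environment is being compared (both calls below are part of pytesseract's public API; the output is worth noting alongside any reported CER):

```python
# Print the local Tesseract binary version and the pytesseract wrapper version
# so OCR results can be compared under the same environment.
import pytesseract
from importlib.metadata import version

print("Tesseract binary:", pytesseract.get_tesseract_version())
print("pytesseract wrapper:", version("pytesseract"))
```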
Yes, thanks for your continued technical support.
Q1: Hello, in Section 5.1 of your paper, I notice you used PyTesseract v3.02.02, as shown in the picture above.
But on the PyTesseract homepage, I can only find versions 0.3.x or 0.2.x; could you please tell me the exact version you use? By the way, the DewarpNet paper specifies PyTesseract version 0.2.9. Are there big differences caused by the OCR engine version?
Q2: The calculation of the CER metric requires the ground truth of each character in the images. I also notice your repository provides a 60-image index for the OCR metric test, while DewarpNet provides a 25-image index as well as ground truth in JSON form. Can you tell me how you annotate the ground truth? And if possible, could you share your ground-truth file?
In addition, I noticed that several of the 25 ground truths in DewarpNet have label errors, so I guess they also used some OCR engine. If you also used an OCR engine to label the ground truth, could you share more details about how you annotate it?
Q3: In fact, I also tried to test the OCR performance on your model output. However, neither PyTesseract version 0.3.x nor 0.2.x achieves the same result as in the paper.
Here is my OCR test code. In brief, the core call for OCR is
h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng')
with which I only get a CER of 0.6. This result is far from the 0.2~0.3 CER of previous models. Could you share your OCR version and code for the OCR metric? Many thanks for your generous response!
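A minimal, self-contained version of such a CER test might look like the sketch below (the pairing of rectified images with GT strings and the GT format are assumptions, not the code actually used in this thread):

```python
# Sketch: OCR each rectified image with pytesseract and average the per-image
# CER against its ground-truth string. Paths and GT format are assumptions.
import pytesseract
import Levenshtein  # pip install python-Levenshtein (assumed dependency)
from PIL import Image

def evaluate(pairs):
    """pairs: iterable of (rectified_image_path, gt_text) tuples."""
    cers = []
    for img_path, gt in pairs:
        pred = pytesseract.image_to_string(Image.open(img_path), lang='eng')
        cers.append(Levenshtein.distance(pred, gt) / max(len(gt), 1))
    return sum(cers) / len(cers)
```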