Where are the text images in CC-OCR? #23
Comments
We uploaded the GCC index file at https://tapvqacaption.blob.core.windows.net/data/GoogleCC/Train_GCC-training.tsv. In "ocr_feat/visu_feat_resx", the part of each name before the "_" is the row number in the index file (both are 0-indexed). E.g., "100000_1967358300" corresponds to row 100000 of the index file, which is the soccer match image.
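For anyone mapping feature names back to the index file, here is a minimal sketch of the lookup described above. The helper names `row_index_from_feature_name` and `lookup_row` are hypothetical (they are not part of this repo), and the in-memory sample only assumes the index file is tab-separated with one image per row; check the column layout against your own copy of the TSV.

```python
import csv
import io


def row_index_from_feature_name(name):
    """Return the 0-indexed row number encoded before the first "_"."""
    prefix, _, _ = name.partition("_")
    return int(prefix)


def lookup_row(tsv_file, row_index):
    """Return the fields of the given 0-indexed row of a tab-separated file."""
    # QUOTE_NONE: treat quote characters in captions as ordinary text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if i == row_index:
            return row
    raise IndexError(f"row {row_index} is past the end of the index file")


if __name__ == "__main__":
    # Tiny stand-in for the real index file (assumed layout, made-up rows).
    sample = io.StringIO("caption 0\thttp://example.com/0.jpg\n"
                         "caption 1\thttp://example.com/1.jpg\n")
    idx = row_index_from_feature_name("1_1967358300")
    print(lookup_row(sample, idx))
```

With the real file, you would open `Train_GCC-training.tsv` with `open(path, newline="", encoding="utf-8")` and pass the file object to `lookup_row` instead of the in-memory sample.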
Is there another way to download the OCR-CC data, such as Google Drive? I cannot download the dataset reliably from my region. Many thanks.
Unfortunately, the CC3M dataset does not allow sharing raw images due to copyright issues. If you have a copy of the CC3M images, it should cover all images in OCR-CC. There are also various online tools for downloading CC3M, which might solve or alleviate the network issue.
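Hey, could you share the copy of this dataset that you downloaded? I have been trying to download it with the azcopy command they provide, but it keeps failing. Could you share a Baidu Netdisk link? Thanks a lot.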
Hello!
When I download from the link "OCR-CC Data (Huge, ~1.3T)", I find that the CC-OCR dataset does not contain the text images. Where can I get these images?