Where are the text images in CC-OCR? #23

TongkunGuan · 2022-08-20T11:59:57Z

Hello!
When I download the data from the OCR-CC Data (Huge, ~1.3T) link, I find that the OCR-CC dataset does not contain the text images. Where can I get these images?

zyang-ur · 2022-08-28T04:56:30Z

We uploaded the GCC index file at https://tapvqacaption.blob.core.windows.net/data/GoogleCC/Train_GCC-training.tsv

In "ocr_feat/visu_feat_resx", the first index (the part before the "_") is the row number in the index file; both are 0-indexed. For example, "100000_1967358300" refers to row 100000 of the index file, which is the soccer match image.
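
A minimal sketch of that lookup in Python, for anyone mapping feature names back to the index file. It assumes the index file is a plain tab-separated file with no header row, so that row 0 is the first line; the function names `row_index_from_key` and `lookup_row` and the sample rows are hypothetical, made up for illustration. The meaning of the number after the "_" is not stated in this thread, so the sketch ignores it.

```python
import csv
import io
from itertools import islice


def row_index_from_key(feature_key):
    """Return the 0-indexed row number encoded before the first underscore."""
    prefix, _, _ = feature_key.partition("_")
    return int(prefix)


def lookup_row(tsv_file, feature_key):
    """Return the fields of the index-file row that the feature key points to.

    Assumes a tab-separated file with no header line (an assumption, not
    something confirmed in this thread).
    """
    index = row_index_from_key(feature_key)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip `index` rows, then take the next one without loading the whole file.
    row = next(islice(reader, index, None), None)
    if row is None:
        raise IndexError(f"index file has no row {index}")
    return row


# Tiny in-memory stand-in for the index file (hypothetical rows).
sample = io.StringIO(
    "first caption\thttp://example.com/0.jpg\n"
    "second caption\thttp://example.com/1.jpg\n"
)
print(row_index_from_key("100000_1967358300"))  # 100000
print(lookup_row(sample, "1_42"))
```

With the real file, you would open the downloaded `Train_GCC-training.tsv` (for example with `open(path, newline="", encoding="utf-8")`) and pass the file object to `lookup_row` in place of `sample`.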

daeing · 2022-09-08T07:00:41Z

Is there another way to download the OCR-CC data, such as Google Drive? I cannot download the dataset reliably from my region. Many thanks.

zyang-ur · 2022-09-09T18:31:12Z

Unfortunately, the CC3M dataset does not allow sharing raw images due to copyright issues. If you have a copy of CC3M images, it should cover all images in OCR-CC. There are also various online tools for CC3M downloading, which might solve/alleviate the network issue.

daeing · 2022-10-28T03:39:48Z

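Hey, could you share the dataset you downloaded? I have been trying the azcopy method they provide, but it keeps failing. Could you share a Baidu Netdisk link? Thanks a lot.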
