Will this extract images embedded in a PDF? #365
Replies: 2 comments 7 replies
-
I, too, have a PDF with images, but docling is not returning the images in the result. The document is in English. OCR works well, and when reviewing the chunks with this code
I get the text chunks on page 11 but no images. I do get the tables and the text for the pages. I had been hoping to use docling for the many PDFs whose images are not properly labelled. Will docling support extraction of these images? Does it depend on image metadata, or can it detect the images itself? Thanks for any help.
-
Hello. I used DocumentConverter() and export_to_dict(), and through that I am able to access the coordinates of each picture's location on the page. Do you know of a way to access the text inside the picture with docling? Thank you.
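A minimal sketch of that coordinate lookup, in case it helps others. It assumes the dict returned by export_to_dict() has a top-level "pictures" list whose items carry a "prov" list with "page_no" and a "bbox" holding "l", "t", "r", "b"; that layout is an assumption and may differ between docling versions. The helper name picture_boxes and the sample dict are made up for illustration.

```python
from typing import Any


def picture_boxes(doc_dict: dict[str, Any]) -> list[tuple[int, float, float, float, float]]:
    """Collect (page_no, l, t, r, b) for every picture location in the exported dict.

    Assumed layout (not confirmed for every docling version):
    doc_dict["pictures"][i]["prov"][j] == {"page_no": int, "bbox": {"l", "t", "r", "b"}}
    """
    boxes = []
    for picture in doc_dict.get("pictures", []):
        for prov in picture.get("prov", []):
            bbox = prov.get("bbox")
            if bbox is None:
                # Skip provenance entries that carry no bounding box.
                continue
            boxes.append((prov.get("page_no"), bbox["l"], bbox["t"], bbox["r"], bbox["b"]))
    return boxes


# Hand-written stand-in for the output of export_to_dict(); real values will differ.
sample = {
    "pictures": [
        {"prov": [{"page_no": 11, "bbox": {"l": 72.0, "t": 540.0, "r": 300.0, "b": 380.0}}]},
        {"prov": []},
    ]
}

for page_no, l, t, r, b in picture_boxes(sample):
    print(f"page {page_no}: l={l} t={t} r={r} b={b}")
```

This only locates the pictures; it does not read any text inside them.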
-
I have a PDF in Japanese that I need to translate. Using HierarchicalChunker, I am able to extract the chunks and do the translation. The PDF has an embedded image containing Japanese text, and I do not see it in the chunks. Will docling support extraction of embedded images?