How to effectively embed PDFs with images for a RAG LLM? #4031

lucasmirachi · 2024-11-18T01:25:22Z

I’m working on a project where I need to embed PDF documents that contain images (which may or may not be relevant to the response) to build a vector database for later retrieval in a Retrieval-Augmented Generation (RAG) pipeline. Currently, I’m using Unstructured + FAISS, but the results for the images in the PDFs are not satisfactory.

Here are some details about my approach:

I’m using the Unstructured library to parse the PDFs.
FAISS is being used to create and manage the vector database.
Text embeddings are working fine, but image embeddings are not yielding good results.
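
As a rough illustration of the shape of this setup (not the actual project code), the sketch below keeps text chunks and image entries in one index and tags each entry with its kind. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words function standing in for a real embedding model, `FlatIndex` is a hypothetical brute-force stand-in for a FAISS flat index, and each image is assumed to be represented by a short text description so that it lives in the same vector space as the text chunks.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FlatIndex:
    """Brute-force similarity search, standing in for a FAISS flat index."""

    def __init__(self) -> None:
        self.vectors: list[Counter] = []
        self.records: list[dict] = []

    def add(self, text: str, kind: str, page: int) -> None:
        # For kind == "image", `text` is an assumed description of the image.
        self.vectors.append(embed(text))
        self.records.append({"text": text, "kind": kind, "page": page})

    def search(self, query: str, k: int = 2) -> list[dict]:
        q = embed(query)
        scored = [(cosine(q, v), i) for i, v in enumerate(self.vectors)]
        scored.sort(reverse=True)
        return [self.records[i] for _, i in scored[:k]]


index = FlatIndex()
index.add("Quarterly revenue grew because of new subscriptions.", "text", 1)
index.add("Bar chart of quarterly revenue by region.", "image", 1)
index.add("The appendix lists the office addresses.", "text", 2)

hits = index.search("revenue chart by region", k=2)
for hit in hits:
    print(hit["kind"], hit["page"], hit["text"])
```

Keeping the `kind` and `page` fields next to each vector makes it possible to tell, at retrieval time, whether a hit came from an image or from a text chunk.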

Questions:

What are the best practices for embedding PDFs that contain both text and images?
Are there any specific techniques or libraries that handle image embeddings within PDFs more effectively?
How can I improve the integration of image embeddings with text embeddings in my current setup?

Any advice or suggestions would be greatly appreciated!

andysingal · 2024-12-01T22:43:43Z

I have added examples in this repository: https://github.com/andysingal/llm-course

hypomariia · 2025-01-16T21:47:07Z

I recently found a great webinar on this: https://lu.ma/b1yqug98
