[BUG] Information extracted from table/image using Azure Document Intelligence API is not reflected in GraphRAG input #594

hide212131 · 2025-01-03T12:57:09Z

Description

When a PDF document with the following structure is read by Azure Document Intelligence, files for Paragraph 1 and Paragraph 2 are created in the GraphRAG input folder, but no file is created for the table or for the image description.

Paragraph 1
Table
Paragraph 2
Image
...

Reproduction steps

1. In Retrieval settings > GraphRAG Collection > File loader, select `Azure AI Document Intelligence (figure+table extraction)`
1. Upload a PDF file containing a table to the GraphRAG collection
1. Execute a query related to the table

Screenshots

No response

Logs

No response

Browsers

No response

OS

No response

Additional information

AzureAIDocumentIntelligenceLoader stores text, tables, and images as separate Documents, with no content duplicated between them, while GraphRAGIndexingPipeline outputs only the text Documents.
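The reported behavior can be sketched roughly as follows. This is a hypothetical illustration, not kotaemon code: `Doc` is a stand-in for the loader's Document class, and the `type` metadata key with the values `text`/`table`/`image` is an assumption about how the loader tags each element.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loader output document (not the kotaemon class)."""
    text: str
    metadata: dict = field(default_factory=dict)


def text_only_input(docs: list) -> list:
    # Mirrors the reported behavior: only plain-text documents are kept,
    # so table and image documents never reach the GraphRAG input folder.
    # The "type" metadata key is an assumption for illustration.
    return [d.text for d in docs if d.metadata.get("type", "text") == "text"]


docs = [
    Doc("Paragraph 1"),
    Doc("| a | b |\n|---|---|\n| 1 | 2 |", {"type": "table"}),
    Doc("Paragraph 2"),
    Doc("bar chart of sales", {"type": "image"}),
]
print(text_only_input(docs))  # ['Paragraph 1', 'Paragraph 2']
```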

I think it would be more appropriate to index text in a format like the one in `ktem_app_data/markdown_cache_dir`, where tables and other elements are expanded inline.
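A rough sketch of the proposed direction, under the same assumptions as above (hypothetical `Doc` class; assumed `type` and `page_label` metadata keys), and additionally assuming the documents arrive in reading order. It joins each page's text, markdown tables, and image descriptions into one string that could be written to the GraphRAG input folder. It is not the `markdown_cache_dir` implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loader output document (not the kotaemon class)."""
    text: str
    metadata: dict = field(default_factory=dict)


def merge_inline(docs: list) -> dict:
    """Join text, table, and image-description docs per page, keeping input order."""
    pages = {}
    for d in docs:
        kind = d.metadata.get("type", "text")           # assumed metadata key
        page = str(d.metadata.get("page_label", "1"))   # assumed metadata key
        if kind == "image":
            # Keep the generated description, marked so the indexer sees it as a figure.
            chunk = f"[Figure] {d.text}"
        else:
            # Plain text and markdown tables are used verbatim.
            chunk = d.text
        pages.setdefault(page, []).append(chunk)
    return {page: "\n\n".join(chunks) for page, chunks in pages.items()}


docs = [
    Doc("Paragraph 1", {"page_label": "1"}),
    Doc("| a | b |\n|---|---|\n| 1 | 2 |", {"type": "table", "page_label": "1"}),
    Doc("Paragraph 2", {"page_label": "1"}),
    Doc("bar chart of sales", {"type": "image", "page_label": "1"}),
]
print(merge_inline(docs)["1"])
```

If the loader emits all text before tables and figures, reading order would have to be recovered from positional metadata (for example, the bounding regions Azure Document Intelligence returns) before merging; otherwise tables end up at the end of each page.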


hide212131 added the `bug` label ("Something isn't working") on Jan 3, 2025.
