-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
128 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
# Graph Retrieval-Augmented Generation: A Survey | ||
- **Published**: [arXiv 2024] | ||
- **Link**: [GraphRAG Survey Paper](URL to the paper) | ||
- **Summary**: The paper surveys GraphRAG, a framework that enhances traditional RAG by incorporating graph-based retrieval for improved knowledge representation and generation. | ||
|
||
## 1. Introduction | ||
|
||
- Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing. | ||
- LLMs face challenges such as hallucinations, lack of domain-specific knowledge, and outdated information. | ||
- Retrieval-Augmented Generation (RAG) integrates external knowledge to improve LLM outputs. | ||
- GraphRAG enhances RAG by leveraging structured relationships in graph data for more accurate and context-aware responses. | ||
|
||
## 2. Comparison with Related Techniques and Surveys | ||
|
||
- **2.1 RAG:** Combines external knowledge with LLMs for improved task performance. GraphRAG is a branch of RAG but emphasizes structured relationships in graph data. | ||
- **2.2 LLMs on Graphs:** LLMs are primarily text-based and struggle with graph data, but combining them with Graph Neural Networks (GNNs) enhances performance on tasks involving graphs. | ||
- **2.3 KBQA:** A task in natural language processing aiming to respond to queries using external knowledge bases. GraphRAG is closely related but focuses on a broader range of downstream tasks. | ||
|
||
## 3. Preliminaries | ||
|
||
- **3.1 Text-Attributed Graphs (TAGs):** The graph format used in GraphRAG where nodes and edges have textual attributes. | ||
- **3.2 Graph Neural Networks (GNNs):** Used for modeling graph data, GNNs aggregate information from neighboring nodes and edges to update node representations. | ||
- **3.3 Language Models (LMs):** LMs are categorized into discriminative and generative models. They are integral to RAG and GraphRAG for improving retrieval and generation tasks. | ||
|
||
## 4. Overview of GraphRAG | ||
|
||
- GraphRAG integrates external knowledge graphs to enhance LLMs. | ||
- **Stages:** | ||
- **Graph-Based Indexing (G-Indexing):** Constructing and indexing graph data. | ||
- **Graph-Guided Retrieval (G-Retrieval):** Extracting relevant graph elements in response to queries. | ||
- **Graph-Enhanced Generation (G-Generation):** Generating responses based on retrieved graph data. | ||
|
||
## 5. Graph-Based Indexing | ||
|
||
- **5.1 Graph Data:** | ||
- **Open Knowledge Graphs:** Publicly available graphs (e.g., Wikidata, Freebase). | ||
- **General Knowledge Graphs:** Store broad, structured knowledge (e.g., DBpedia, YAGO). | ||
- **Domain Knowledge Graphs:** Focus on specific fields (e.g., CMeKG for biomedical data). | ||
- **Self-Constructed Graph Data:** Custom graphs created for specific tasks (e.g., heterogeneous document graphs). | ||
|
||
- **5.2 Indexing Methods:** | ||
- **Graph Indexing:** Preserves the graph structure, facilitating efficient retrieval. | ||
- **Text Indexing:** Converts graph data into text for retrieval using text-based methods. | ||
- **Vector Indexing:** Converts graph data into vectors for efficient retrieval. | ||
|
||
## 6. Graph-Guided Retrieval | ||
|
||
- **6.1 Retriever Types:** | ||
- **Non-parametric Retriever:** Efficient but lacks accuracy without training. | ||
- **LM-based Retriever:** Utilizes LLMs for accurate retrieval but at a higher computational cost. | ||
- **GNN-based Retriever:** Leverages GNNs for complex graph structures. | ||
|
||
- **6.2 Retrieval Paradigms:** | ||
- **Once Retrieval:** Single-step retrieval for efficiency. | ||
- **Iterative Retrieval:** Multiple retrieval steps for accuracy, can be adaptive or non-adaptive. | ||
- **Multi-Stage Retrieval:** Combines different retrieval methods for precision. | ||
|
||
- **6.3 Retrieval Granularity:** | ||
- **Nodes, Triplets, Paths, Subgraphs:** Different levels of granularity for retrieval depending on task requirements. | ||
- **Hybrid Granularities:** Combines multiple granularities for comprehensive retrieval. | ||
|
||
- **6.4 Retrieval Enhancement:** | ||
- **Query Enhancement:** Expands or decomposes queries for better retrieval. | ||
- **Knowledge Enhancement:** Merges or prunes retrieved data for relevance. | ||
|
||
## 7. Graph-Enhanced Generation | ||
|
||
- **7.1 Generators:** | ||
- **GNNs:** Used for discriminative tasks. | ||
- **LMs:** Used for both discriminative and generative tasks, requiring transformation of graph data. | ||
- **Hybrid Models:** Combine GNNs and LMs in cascaded or parallel approaches. | ||
|
||
- **7.2 Graph Formats:** | ||
- **Graph Languages:** Adjacency/Edge tables, Natural Language, Code-like forms, Syntax Trees, Node Sequences. | ||
- Converts graph data into formats that LMs can process. | ||
|
||
## 8. Training Strategies | ||
|
||
### 1. Training-Free | ||
|
||
- "Training-Free methods are commonly employed when using closed-source LLMs such as GPT-4 [116] as retrievers or generators" | ||
- **Non-parametric Retrievers**: Utilize predefined rules or traditional graph search algorithms. | ||
- No specific model training is required. | ||
|
||
- **Pre-trained LMs as Retrievers**: | ||
- Employ pre-trained language models. | ||
- Retrieve graph elements based on query-graph element similarity. | ||
|
||
- **Prompt-Based Methods**: | ||
- Use generative language models. | ||
- Select relevant graph elements from a prompt. | ||
- Leverage semantic associations. | ||
|
||
### 2. Training-Based | ||
|
||
- **Autoregressive Approach**: | ||
- The model predicts the next relation. | ||
- Achieved by concatenating the previous relationship path to the query. | ||
|
||
- **Distant Supervision and Implicit Intermediate Supervision**: | ||
- Use distant supervision to guide retriever training. | ||
- Implicit signals may also be used. | ||
- These methods may introduce noise. | ||
|
||
- **Self-Supervised Methods**: | ||
- Employ self-supervised pre-training techniques. | ||
- Techniques include Masked Language Models (MLM). | ||
- Also use contrastive learning to enhance retriever performance. | ||
|
||
### 3. Joint Training | ||
|
||
- **Synergy Between Retriever and Generator**: | ||
- Jointly train retrievers and generators. | ||
- This optimizes their collaboration. | ||
- Results in more robust and effective retrieval and generation. | ||
|
||
## 9. Downstream Tasks and Evaluation | ||
|
||
- **Applications:** GraphRAG is used across multiple domains and tasks. | ||
- **Challenges:** Discusses current challenges and future research directions in GraphRAG. | ||
|
||
## 10. Future Directions | ||
|
||
- Potential areas for future research to advance GraphRAG. | ||
|
||
## 11. Conclusion | ||
|
||
- Summarizes the contributions of the survey and the potential impact of GraphRAG on future AI developments. |