Create GraphRAG_survey.md

AIResponsibly · Aug 25, 2024 · aeef8a2 · aeef8a2
1 parent d0ea3d6
commit aeef8a2
Showing 1 changed file with 128 additions and 0 deletions.
diff --git a/summaries/safety/GraphRAG_survey.md b/summaries/safety/GraphRAG_survey.md
@@ -0,0 +1,128 @@
+# Graph Retrieval-Augmented Generation: A Survey
+- **Published**: [arXiv 2024]
+- **Link**: [GraphRAG Survey Paper](URL to the paper)
+- **Summary**: The paper surveys GraphRAG, a framework that enhances traditional RAG by incorporating graph-based retrieval for improved knowledge representation and generation.
+
+## 1. Introduction
+
+- Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing.
+- LLMs face challenges such as hallucinations, lack of domain-specific knowledge, and outdated information.
+- Retrieval-Augmented Generation (RAG) integrates external knowledge to improve LLM outputs.
+- GraphRAG enhances RAG by leveraging structured relationships in graph data for more accurate and context-aware responses.
+
+## 2. Comparison with Related Techniques and Surveys
+
+- **2.1 RAG:** Combines external knowledge with LLMs for improved task performance. GraphRAG is a branch of RAG but emphasizes structured relationships in graph data.
+- **2.2 LLMs on Graphs:** LLMs are primarily text-based and struggle with graph data, but combining them with Graph Neural Networks (GNNs) enhances performance on tasks involving graphs.
+- **2.3 KBQA:** A task in natural language processing aiming to respond to queries using external knowledge bases. GraphRAG is closely related but focuses on a broader range of downstream tasks.
+
+## 3. Preliminaries
+
+- **3.1 Text-Attributed Graphs (TAGs):** The graph format used in GraphRAG where nodes and edges have textual attributes.
+- **3.2 Graph Neural Networks (GNNs):** Used for modeling graph data, GNNs aggregate information from neighboring nodes and edges to update node representations.
+- **3.3 Language Models (LMs):** LMs are categorized into discriminative and generative models. They are integral to RAG and GraphRAG for improving retrieval and generation tasks.
+
+## 4. Overview of GraphRAG
+
+- GraphRAG integrates external knowledge graphs to enhance LLMs.
+- **Stages:**
+  - **Graph-Based Indexing (G-Indexing):** Constructing and indexing graph data.
+  - **Graph-Guided Retrieval (G-Retrieval):** Extracting relevant graph elements in response to queries.
+  - **Graph-Enhanced Generation (G-Generation):** Generating responses based on retrieved graph data.
+
+## 5. Graph-Based Indexing
+
+- **5.1 Graph Data:**
+  - **Open Knowledge Graphs:** Publicly available graphs (e.g., Wikidata, Freebase).
+    - **General Knowledge Graphs:** Store broad, structured knowledge (e.g., DBpedia, YAGO).
+    - **Domain Knowledge Graphs:** Focus on specific fields (e.g., CMeKG for biomedical data).
+  - **Self-Constructed Graph Data:** Custom graphs created for specific tasks (e.g., heterogeneous document graphs).
+
+- **5.2 Indexing Methods:**
+  - **Graph Indexing:** Preserves the graph structure, facilitating efficient retrieval.
+  - **Text Indexing:** Converts graph data into text for retrieval using text-based methods.
+  - **Vector Indexing:** Converts graph data into vectors for efficient retrieval.
+
+## 6. Graph-Guided Retrieval
+
+- **6.1 Retriever Types:**
+  - **Non-parametric Retriever:** Efficient but lacks accuracy without training.
+  - **LM-based Retriever:** Utilizes LLMs for accurate retrieval but at a higher computational cost.
+  - **GNN-based Retriever:** Leverages GNNs for complex graph structures.
+
+- **6.2 Retrieval Paradigms:**
+  - **Once Retrieval:** Single-step retrieval for efficiency.
+  - **Iterative Retrieval:** Multiple retrieval steps for accuracy, can be adaptive or non-adaptive.
+  - **Multi-Stage Retrieval:** Combines different retrieval methods for precision.
+
+- **6.3 Retrieval Granularity:**
+  - **Nodes, Triplets, Paths, Subgraphs:** Different levels of granularity for retrieval depending on task requirements.
+  - **Hybrid Granularities:** Combines multiple granularities for comprehensive retrieval.
+
+- **6.4 Retrieval Enhancement:**
+  - **Query Enhancement:** Expands or decomposes queries for better retrieval.
+  - **Knowledge Enhancement:** Merges or prunes retrieved data for relevance.
+
+## 7. Graph-Enhanced Generation
+
+- **7.1 Generators:**
+  - **GNNs:** Used for discriminative tasks.
+  - **LMs:** Used for both discriminative and generative tasks, requiring transformation of graph data.
+  - **Hybrid Models:** Combine GNNs and LMs in cascaded or parallel approaches.
+
+- **7.2 Graph Formats:**
+  - **Graph Languages:** Adjacency/Edge tables, Natural Language, Code-like forms, Syntax Trees, Node Sequences.
+    - Converts graph data into formats that LMs can process.
+
+## 8. Training Strategies
+
+### 1. Training-Free
+
+- "Training-Free methods are commonly employed when using closed-source LLMs such as GPT-4 [116] as retrievers or generators"
+- **Non-parametric Retrievers**: Utilize predefined rules or traditional graph search algorithms.
+- No specific model training is required.
+
+- **Pre-trained LMs as Retrievers**:
+  - Employ pre-trained language models.
+  - Retrieve graph elements based on query-graph element similarity.
+
+- **Prompt-Based Methods**:
+  - Use generative language models.
+  - Select relevant graph elements from a prompt.
+  - Leverage semantic associations.
+
+### 2. Training-Based
+
+- **Autoregressive Approach**:
+  - The model predicts the next relation.
+  - Achieved by concatenating the previous relationship path to the query.
+
+- **Distant Supervision and Implicit Intermediate Supervision**:
+  - Use distant supervision to guide retriever training.
+  - Implicit signals may also be used.
+  - These methods may introduce noise.
+
+- **Self-Supervised Methods**:
+  - Employ self-supervised pre-training techniques.
+  - Techniques include Masked Language Models (MLM).
+  - Also use contrastive learning to enhance retriever performance.
+
+### 3. Joint Training
+
+- **Synergy Between Retriever and Generator**:
+  - Jointly train retrievers and generators.
+  - This optimizes their collaboration.
+  - Results in more robust and effective retrieval and generation.
+
+## 9. Downstream Tasks and Evaluation
+
+- **Applications:** GraphRAG is used across multiple domains and tasks.
+- **Challenges:** Discusses current challenges and future research directions in GraphRAG.
+
+## 10. Future Directions
+
+- Potential areas for future research to advance GraphRAG.
+
+## 11. Conclusion
+
+- Summarizes the contributions of the survey and the potential impact of GraphRAG on future AI developments.