Commit

Create GraphRAG_survey.md
hamid58b authored Aug 25, 2024
1 parent d0ea3d6 commit aeef8a2
Showing 1 changed file with 128 additions and 0 deletions.
128 changes: 128 additions & 0 deletions summaries/safety/GraphRAG_survey.md
@@ -0,0 +1,128 @@
# Graph Retrieval-Augmented Generation: A Survey
- **Published**: [arXiv 2024]
- **Link**: [GraphRAG Survey Paper](URL to the paper)
- **Summary**: The paper surveys GraphRAG, a framework that enhances traditional RAG by incorporating graph-based retrieval for improved knowledge representation and generation.

## 1. Introduction

- Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing.
- LLMs face challenges such as hallucinations, lack of domain-specific knowledge, and outdated information.
- Retrieval-Augmented Generation (RAG) integrates external knowledge to improve LLM outputs.
- GraphRAG enhances RAG by leveraging structured relationships in graph data for more accurate and context-aware responses.

## 2. Comparison with Related Techniques and Surveys

- **2.1 RAG:** Combines external knowledge with LLMs for improved task performance. GraphRAG is a branch of RAG but emphasizes structured relationships in graph data.
- **2.2 LLMs on Graphs:** LLMs are primarily text-based and struggle with graph data, but combining them with Graph Neural Networks (GNNs) enhances performance on tasks involving graphs.
- **2.3 KBQA:** Knowledge Base Question Answering (KBQA) is a natural language processing task that answers queries using external knowledge bases. GraphRAG is closely related but targets a broader range of downstream tasks.

## 3. Preliminaries

- **3.1 Text-Attributed Graphs (TAGs):** The graph format used in GraphRAG where nodes and edges have textual attributes.
- **3.2 Graph Neural Networks (GNNs):** Used for modeling graph data, GNNs aggregate information from neighboring nodes and edges to update node representations.
- **3.3 Language Models (LMs):** LMs are categorized into discriminative and generative models. They are integral to RAG and GraphRAG for improving retrieval and generation tasks.
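
As a rough illustration of the first two preliminaries, the sketch below builds a toy text-attributed graph and runs one simplified message-passing step (mean aggregation over a node and its neighbours). The class name `TextAttributedGraph`, the function `mean_aggregate`, and the hand-written feature vectors are assumptions made for this example; they are not taken from the survey, and real GNN layers add learned weights and non-linearities.

```python
from dataclasses import dataclass, field


@dataclass
class TextAttributedGraph:
    """Toy text-attributed graph: nodes and edges both carry text."""
    node_text: dict = field(default_factory=dict)  # node id -> text
    edge_text: dict = field(default_factory=dict)  # (src, dst) -> text

    def add_edge(self, src, dst, text):
        self.edge_text[(src, dst)] = text

    def neighbors(self, node):
        # Edges are treated as undirected for aggregation purposes.
        outgoing = [d for (s, d) in self.edge_text if s == node]
        incoming = [s for (s, d) in self.edge_text if d == node]
        return outgoing + incoming


def mean_aggregate(graph, features):
    """One simplified message-passing step: average own and neighbour features."""
    updated = {}
    for node, own in features.items():
        messages = [features[n] for n in graph.neighbors(node)] + [own]
        updated[node] = [
            sum(m[i] for m in messages) / len(messages) for i in range(len(own))
        ]
    return updated


tag = TextAttributedGraph(
    node_text={"a": "Marie Curie", "b": "Physics", "c": "Warsaw"}
)
tag.add_edge("a", "b", "field of work")
tag.add_edge("a", "c", "born in")

# Hypothetical 2-dimensional node features (a real system would encode node_text).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
updated = mean_aggregate(tag, features)
```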

## 4. Overview of GraphRAG

- GraphRAG integrates external knowledge graphs to enhance LLMs.
- **Stages:**
  - **Graph-Based Indexing (G-Indexing):** Constructing and indexing graph data.
  - **Graph-Guided Retrieval (G-Retrieval):** Extracting relevant graph elements in response to queries.
  - **Graph-Enhanced Generation (G-Generation):** Generating responses based on retrieved graph data.
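
The three stages can be pictured as a small pipeline. The following is a minimal sketch under strong simplifying assumptions: the function names `g_index`, `g_retrieve`, and `g_generate` are invented for this example, retrieval is plain string matching, and the generation step only assembles the prompt a generator would receive rather than calling any model.

```python
def g_index(triples):
    """G-Indexing: build an adjacency index from (head, relation, tail) triples."""
    index = {}
    for head, relation, tail in triples:
        index.setdefault(head, []).append((relation, tail))
    return index


def g_retrieve(index, query):
    """G-Retrieval: return triples whose head entity is mentioned in the query."""
    q = query.lower()
    return [
        (head, relation, tail)
        for head, edges in index.items()
        if head.lower() in q
        for relation, tail in edges
    ]


def g_generate(query, retrieved):
    """G-Generation: assemble the prompt for a generator (no model call here)."""
    facts = "\n".join(f"{h} {r} {t}." for h, r, t in retrieved)
    return f"Facts:\n{facts}\n\nQuestion: {query}\nAnswer:"


triples = [
    ("Paris", "is the capital of", "France"),
    ("Berlin", "is the capital of", "Germany"),
]
question = "What country is Paris the capital of?"

index = g_index(triples)
retrieved = g_retrieve(index, question)
prompt = g_generate(question, retrieved)
```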

## 5. Graph-Based Indexing

- **5.1 Graph Data:**
  - **Open Knowledge Graphs:** Publicly available graphs, which fall into two groups:
    - **General Knowledge Graphs:** Store broad, structured knowledge (e.g., Wikidata, Freebase, DBpedia, YAGO).
    - **Domain Knowledge Graphs:** Focus on specific fields (e.g., CMeKG for biomedical data).
  - **Self-Constructed Graph Data:** Custom graphs built for specific tasks (e.g., heterogeneous document graphs).

- **5.2 Indexing Methods:**
  - **Graph Indexing:** Preserves the graph structure, facilitating efficient retrieval.
  - **Text Indexing:** Converts graph data into text for retrieval using text-based methods.
  - **Vector Indexing:** Converts graph data into vectors for efficient retrieval.
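
To make the difference between the three indexing methods concrete, the sketch below builds all three over the same toy triples. The example triples are illustrative only, and the bag-of-words vectors are a stand-in assumption; a real system would typically use learned embeddings and a dedicated vector store.

```python
from collections import Counter

# Illustrative triples, not taken from any specific knowledge graph.
triples = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "interacts with", "Warfarin"),
]

# Graph indexing: keep the structure as adjacency lists.
graph_index = {}
for head, relation, tail in triples:
    graph_index.setdefault(head, []).append((relation, tail))

# Text indexing: verbalise each triple so text-based retrievers can search it.
text_index = [f"{head} {relation} {tail}" for head, relation, tail in triples]

# Vector indexing: map each text to a vector (toy bag-of-words counts here).
vector_index = [Counter(text.lower().split()) for text in text_index]
```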

## 6. Graph-Guided Retrieval

- **6.1 Retriever Types:**
  - **Non-parametric Retriever:** Efficient because it relies on heuristic rules or graph search, but can be inaccurate since it is not trained.
  - **LM-based Retriever:** Uses language models for more accurate retrieval, at a higher computational cost.
  - **GNN-based Retriever:** Leverages GNNs to capture complex graph structures.
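
A non-parametric retriever needs no training because it is just a graph algorithm. As one possible instance, the sketch below retrieves the k-hop neighbourhood of a seed entity with breadth-first search. The function name `k_hop_neighbors` and the adjacency-dict representation are assumptions for this example.

```python
from collections import deque


def k_hop_neighbors(adj, start, k):
    """Return {node: hop distance} for all nodes within k hops of start (BFS)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


adj = {"A": ["B"], "B": ["C"], "C": ["D"]}
two_hop = k_hop_neighbors(adj, "A", 2)
```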

- **6.2 Retrieval Paradigms:**
  - **Once Retrieval:** Retrieves all relevant information in a single step, for efficiency.
  - **Iterative Retrieval:** Performs multiple retrieval steps for accuracy; it can be adaptive (the model decides when to stop) or non-adaptive (a fixed number of steps).
  - **Multi-Stage Retrieval:** Combines different retrieval methods across stages for precision.
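
The adaptive versus non-adaptive distinction in iterative retrieval can be sketched with a simple hop-by-hop expansion. Everything here is an assumption for illustration: the function `iterative_retrieve`, the `is_sufficient` callback (which in practice might be a model judging whether enough evidence has been gathered), and the toy graph.

```python
def iterative_retrieve(adj, seeds, is_sufficient, max_steps=3):
    """Expand the retrieved node set by one hop per step.

    Adaptive: stops as soon as is_sufficient(retrieved) returns True.
    Non-adaptive: pass is_sufficient=lambda _: False to always run max_steps
    (or until the graph is exhausted).
    """
    retrieved = set(seeds)
    frontier = set(seeds)
    steps = 0
    while steps < max_steps and not is_sufficient(retrieved):
        frontier = {n for node in frontier for n in adj.get(node, [])} - retrieved
        if not frontier:
            break  # nothing new to retrieve
        retrieved |= frontier
        steps += 1
    return retrieved, steps


adj = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}

# Adaptive: stop once node "C" has been retrieved.
adaptive_nodes, adaptive_steps = iterative_retrieve(adj, ["A"], lambda r: "C" in r)

# Non-adaptive: always take the fixed number of steps.
fixed_nodes, fixed_steps = iterative_retrieve(adj, ["A"], lambda _: False, max_steps=3)
```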

- **6.3 Retrieval Granularity:**
  - **Nodes, Triplets, Paths, Subgraphs:** Different levels of granularity for retrieval depending on task requirements.
  - **Hybrid Granularities:** Combines multiple granularities for comprehensive retrieval.
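
The granularities differ in how much structure each retrieved unit carries. The sketch below extracts nodes, triplets, paths, and a subgraph from the same toy triple list. The helper names and the example triples are hypothetical, and the path search is a plain depth-first search rather than any method from the survey.

```python
triples = [
    ("Alice", "works at", "Acme"),
    ("Acme", "located in", "Berlin"),
    ("Bob", "works at", "Acme"),
]


def all_nodes(triples):
    """Node granularity: the set of entities."""
    return sorted({x for head, _, tail in triples for x in (head, tail)})


def triplets_about(triples, entity):
    """Triplet granularity: facts that mention the entity."""
    return [tr for tr in triples if entity in (tr[0], tr[2])]


def find_paths(triples, start, end, max_len=3):
    """Path granularity: directed relation paths from start to end (DFS)."""
    found = []

    def walk(node, path):
        if node == end and path:
            found.append(path)
            return
        if len(path) == max_len:
            return
        for head, relation, tail in triples:
            # Skip tails already used as a head on this path to avoid cycles.
            if head == node and all(tail != step[0] for step in path):
                walk(tail, path + [(head, relation, tail)])

    walk(start, [])
    return found


def induced_subgraph(triples, keep):
    """Subgraph granularity: triples whose endpoints are both in keep."""
    return [(h, r, t) for h, r, t in triples if h in keep and t in keep]


paths = find_paths(triples, "Alice", "Berlin")
```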

- **6.4 Retrieval Enhancement:**
  - **Query Enhancement:** Expands or decomposes queries for better retrieval.
  - **Knowledge Enhancement:** Merges or prunes retrieved data for relevance.
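
Both kinds of enhancement can be illustrated in a few lines. In this sketch, query expansion uses a hand-written synonym map and pruning keeps the top-scoring triples by word overlap. The functions `expand_query` and `prune`, the synonym map, and the overlap score are all assumptions for this example; systems covered by the survey often use LLMs or learned rerankers instead.

```python
def expand_query(query, synonyms):
    """Query enhancement: add known synonyms of each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def prune(triples, terms, top_k=2):
    """Knowledge enhancement: keep the top_k triples that overlap with the terms."""
    term_set = set(terms)

    def score(triple):
        words = set(" ".join(triple).lower().split())
        return len(words & term_set)

    ranked = sorted(triples, key=score, reverse=True)
    return [tr for tr in ranked[:top_k] if score(tr) > 0]


synonyms = {"drug": ["medication"]}  # hypothetical synonym map
terms = expand_query("drug for headache", synonyms)

retrieved = [
    ("Aspirin", "is a medication for", "Headache"),
    ("Aspirin", "interacts with", "Warfarin"),
    ("Ibuprofen", "treats", "Headache"),
]
kept = prune(retrieved, terms)
```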

## 7. Graph-Enhanced Generation

- **7.1 Generators:**
  - **GNNs:** Used for discriminative tasks.
  - **LMs:** Used for both discriminative and generative tasks, requiring transformation of graph data.
  - **Hybrid Models:** Combine GNNs and LMs in cascaded or parallel approaches.

- **7.2 Graph Formats:**
  - **Graph Languages:** Convert graph data into formats that LMs can process, such as adjacency/edge tables, natural language, code-like forms, syntax trees, and node sequences.
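
Three of the graph languages listed above are easy to show side by side. The sketch below serialises the same triples as an edge table, as natural language, and as a code-like form. The exact output formats, including the `graph.add_edge(...)` style, are assumptions chosen for this example; the survey does not prescribe a specific syntax.

```python
triples = [("Alice", "knows", "Bob"), ("Bob", "works at", "Acme")]


def to_edge_table(triples):
    """Edge table: one row per edge with a header line."""
    rows = ["head | relation | tail"]
    rows += [f"{h} | {r} | {t}" for h, r, t in triples]
    return "\n".join(rows)


def to_natural_language(triples):
    """Natural language: one short sentence per triple."""
    return " ".join(f"{h} {r} {t}." for h, r, t in triples)


def to_code_like(triples):
    """Code-like form: edges written as pseudo API calls for the LM to read."""
    return "\n".join(
        f'graph.add_edge("{h}", "{t}", relation="{r}")' for h, r, t in triples
    )


table = to_edge_table(triples)
sentences = to_natural_language(triples)
code_form = to_code_like(triples)
```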

## 8. Training Strategies

### 1. Training-Free

- "Training-Free methods are commonly employed when using closed-source LLMs such as GPT-4 [116] as retrievers or generators"
- **Non-parametric Retrievers**: Utilize predefined rules or traditional graph search algorithms.
  - No specific model training is required.

- **Pre-trained LMs as Retrievers**:
  - Employ pre-trained language models.
  - Retrieve graph elements based on query-graph element similarity.

- **Prompt-Based Methods**:
  - Use generative language models.
  - Select relevant graph elements from a prompt.
  - Leverage semantic associations.
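
Similarity-based retrieval with a frozen encoder can be sketched as follows. The `embed` function here is a bag-of-words stand-in, used only so the example runs without external dependencies; it is an assumption, not a pre-trained LM. In practice the encoder would be a pre-trained model used without further training, and the ranking step would stay the same.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a frozen pre-trained encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, elements, top_k=1):
    """Rank verbalised graph elements by similarity to the query."""
    q = embed(query)
    ranked = sorted(elements, key=lambda e: cosine(q, embed(e)), reverse=True)
    return ranked[:top_k]


elements = ["Paris capital of France", "Berlin capital of Germany"]
best = retrieve("capital of France", elements)
```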

### 2. Training-Based

- **Autoregressive Approach**:
  - The model predicts the next relation in the path, one step at a time.
  - At each step, the relation path predicted so far is concatenated to the query and used as input.
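
One way to picture the autoregressive setup is to look at the training pairs it implies: each input is the query plus the relations chosen so far, and each target is the next relation. The sketch below only constructs such pairs. The function name `make_training_pairs`, the `[SEP]` separator, the `<END>` stop token, and the relation names are assumptions for illustration, not details from the survey.

```python
def make_training_pairs(query, relation_path, sep=" [SEP] ", end="<END>"):
    """Build (input, target) pairs for next-relation prediction.

    The input at step i is the query concatenated with the first i relations;
    the target is relation i, or the end token once the path is complete.
    """
    pairs = []
    for i in range(len(relation_path) + 1):
        source = sep.join([query] + relation_path[:i])
        target = relation_path[i] if i < len(relation_path) else end
        pairs.append((source, target))
    return pairs


query = "Where was the director of Inception born?"
gold_path = ["directed_by", "place_of_birth"]  # hypothetical relation names
pairs = make_training_pairs(query, gold_path)
```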

- **Distant Supervision and Implicit Intermediate Supervision**:
  - Use distant supervision to guide retriever training.
  - Implicit intermediate supervision signals may also be used.
  - These methods may introduce noise, since the supervision is inferred rather than directly annotated.

- **Self-Supervised Methods**:
  - Employ self-supervised pre-training techniques.
  - Techniques include masked language modeling (MLM).
  - Contrastive learning is also used to enhance retriever performance.
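
As a hedged illustration of the contrastive objective, the sketch below computes an InfoNCE-style loss for a single query, given its similarity to one positive graph element and to several negatives. InfoNCE is a common choice for contrastive retriever training, but the summary does not say which loss the surveyed methods use, so treat the specific loss, the function name `info_nce`, and the temperature value as assumptions.

```python
import math


def info_nce(sim_pos, sim_negs, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive among all candidates."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# The loss is lower when the positive is more similar than the negatives.
good = info_nce(sim_pos=0.9, sim_negs=[0.1, 0.2])
bad = info_nce(sim_pos=0.1, sim_negs=[0.9, 0.2])
```

Minimising this loss pushes the query representation toward the positive element and away from the negatives, which is the effect contrastive training aims for in a retriever.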

### 3. Joint Training

- **Synergy Between Retriever and Generator**:
  - Jointly train retrievers and generators.
  - This optimizes how the two components work together.
  - The result can be more robust and effective retrieval and generation.

## 9. Downstream Tasks and Evaluation

- **Applications:** GraphRAG is used across multiple domains and tasks.
- **Challenges:** Discusses current challenges and future research directions in GraphRAG.

## 10. Future Directions

- Potential areas for future research to advance GraphRAG.

## 11. Conclusion

- Summarizes the contributions of the survey and the potential impact of GraphRAG on future AI developments.
