From b0e42e28b6e18a3c2cb440afc5644b7515c42c57 Mon Sep 17 00:00:00 2001
From: Hamid Bagheri
Date: Tue, 23 Jul 2024 18:21:40 -0500
Subject: [PATCH] Update knowlege_graphs_survey.md

---
 .../explainability/knowlege_graphs_survey.md | 324 +++++++++++++++++-
 1 file changed, 312 insertions(+), 12 deletions(-)

diff --git a/summaries/explainability/knowlege_graphs_survey.md b/summaries/explainability/knowlege_graphs_survey.md
index 67fdc9e..8c0122f 100644
--- a/summaries/explainability/knowlege_graphs_survey.md
+++ b/summaries/explainability/knowlege_graphs_survey.md
@@ -4,7 +4,6 @@
 - **Link**: [IEEE Xplore](https://ieeexplore.ieee.org/document/9446554)
 - **Summary**: A comprehensive review of knowledge graph representation learning, acquisition methods, and applications.
-
 ## I. INTRODUCTION
 
 - Incorporating human knowledge in AI is essential for solving complex tasks.
@@ -53,6 +52,13 @@
 - Complex space allows for symmetric and antisymmetric relations.
 - Manifold space offers more expressive representation.
 
+| Representation Space | Description | Advantages | Disadvantages |
+|----------------------|-------------|------------|---------------|
+| **Euclidean Space** | Entities and relations are represented as vectors in a continuous space. | Simple and intuitive; well-understood mathematical properties. | Limited expressiveness for complex structures. |
+| **Complex Vector Space** | Uses complex numbers to represent entities and relations. | Handles symmetric and antisymmetric relations effectively. | Increased computational complexity. |
+| **Gaussian Distribution** | Models entities and relations as Gaussian distributions to account for uncertainty. | Captures uncertainty and variance in the data; provides a probabilistic interpretation. | Computationally intensive; requires modeling covariance. |
+| **Manifold Space** | Embeddings lie on complex topological spaces such as spheres or hyperplanes. | Highly expressive for complex and hierarchical data; captures non-Euclidean structure. | Greater mathematical and computational complexity; harder to implement and optimize. |
+
 ### B. Scoring Function
 
 - Measures the plausibility of facts in knowledge graphs.
@@ -86,19 +92,90 @@
 ### A. Knowledge Graph Completion (KGC)
 
 - Expands existing knowledge graphs with new facts.
-- Embedding-based ranking, path-based reasoning, rule-based reasoning, and meta relational learning are methods used.
+- Methods:
+  - Embedding-based ranking (see the scoring sketch right after this section)
+    - Valid triples (head, relation, tail) lie closer together in the embedding space than invalid ones.
+    - Scoring functions measure the plausibility of triples.
+  - Path-based reasoning
+    - Explores the paths between entities to infer new relationships.
+    - Paths provide contextual information that can improve link prediction.
+    - RL-based path finding: multi-hop path search framed as a sequential decision process.
+  - Rule-based reasoning
+    - Uses logical rules to infer new facts,
+      e.g., (Y, sonOf, X) ← (X, hasChild, Y) ∧ (Y, gender, Male): if X has child Y and Y is male, then Y is a son of X.
+  - Meta relational learning (few-shot relation learning)
 - Triple classification and entity discovery tasks are included.
 
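+To make embedding-based ranking concrete, here is a minimal sketch of the two dominant scoring styles: a translational-distance score in the style of TransE and a semantic-matching score in the style of DistMult. This is illustrative only, not code from the survey; the embeddings and numbers are made up. Higher score means more plausible triple, which is exactly what the ranking is based on:
+
+```python
+import torch
+
+def transe_score(h, r, t, p=1):
+    """Translational distance: h + r should land near t, so we negate the
+    distance to make higher = more plausible."""
+    return -torch.norm(h + r - t, p=p, dim=-1)
+
+def distmult_score(h, r, t):
+    """Semantic matching: a bilinear (element-wise) product; higher = more plausible."""
+    return (h * r * t).sum(dim=-1)
+
+# Toy 4-dimensional embeddings with made-up values.
+h = torch.tensor([0.5, 0.1, -0.3, 0.2])
+r = torch.tensor([0.1, 0.2, 0.4, -0.1])
+valid_t = torch.tensor([0.6, 0.3, 0.1, 0.1])      # lies close to h + r
+corrupt_t = torch.tensor([-0.9, 0.8, -0.7, 0.5])  # lies far from h + r
+
+print(transe_score(h, r, valid_t), transe_score(h, r, corrupt_t))
+print(distmult_score(h, r, valid_t), distmult_score(h, r, corrupt_t))
+# A well-trained model scores the valid tail higher than the corrupted one,
+# so ranking all candidate tails by score performs link prediction.
+```
+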
-### B. Relation Extraction
-
-- Identifies relations between entities from text.
-- Uses neural paradigms like attention mechanisms, GCNs, and deep residual learning.
-- Techniques include adversarial training and reinforcement learning.
-
-### C. Entity Discovery
-
-- Involves tasks like entity recognition, typing, disambiguation, and alignment.
-- Methods use neural models to identify and categorize entities.
+### B. Entity Discovery
+
+- The survey groups entity-based knowledge acquisition into several tasks: entity recognition, entity typing, entity disambiguation, and entity alignment. These are collectively termed entity discovery, as they all explore entity-related knowledge under different settings.
+
+- Entity Recognition
+  - Also known as named entity recognition (NER).
+  - Task: Tags entities in text.
+  - Classical methods: hand-crafted features (capitalization patterns, gazetteers).
+  - Recent work: neural sequence-labeling architectures such as LSTM-CNN.
+  - Examples: LSTM-CRF; MGNER; multi-task training for multi-token and single-token entities.
+  - Advanced techniques: pretrained language models combined with knowledge graphs (ERNIE, K-BERT).
+
+- Entity Typing
+  - Types: coarse-grained and fine-grained.
+    - Fine-grained typing uses a tree-structured type hierarchy and is usually treated as multi-class, multi-label classification.
+  - Challenges: reducing label noise and managing growing type sets.
+  - Methods: PLE with partial-label embedding; prototype-driven label embedding.
+  - Examples: JOIE for joint embeddings; ConnectE for local and global triple knowledge.
+
+- Entity Disambiguation
+  - Also known as entity linking.
+  - Task: Links entity mentions to the corresponding entities in a knowledge graph.
+    - Example: linking the mention "Einstein" to the entity Albert Einstein.
+  - Techniques: representation learning of entities and mentions.
+  - Example models: DSRM for entity semantic relatedness; EDKate for joint embedding of entities and text.
+  - Advanced techniques: attentive neural models over local context windows; differentiable message passing.
+
+- Entity Alignment
+  - Task: Fuses knowledge across different knowledge graphs.
+  - Goal: Given the entity sets $E_1$ and $E_2$ of two knowledge graphs, find the alignment set
+    $$ A = \{ (e_1, e_2) \in E_1 \times E_2 \mid e_1 \equiv e_2 \} $$
+    where $\equiv$ denotes that the two entities refer to the same real-world object.
+  - Methods: embedding-based alignment; iterative alignment.
+  - Examples: MTransE for multilingual scenarios; IPTransE for joint embedding; BootEA for a bootstrapping approach.
+  - Additional techniques: JAPE for cross-lingual attributes; KDCoE for multilingual entity descriptions; MultiKE for multiple views of entity names, relations, and attributes.
+  - Alignments are often refined by incorporating additional entity information.
+
+### C. Relation Extraction
+
+- A key task for building large-scale knowledge graphs: extracting relational facts from text.
+- Uses distant supervision to create training data by heuristically matching a knowledge graph against a corpus, which introduces noisy labels.
+
+- Neural Relation Extraction
+  - Uses CNNs and RNNs for relation classification and extraction.
+  - Examples: PCNN applies piecewise max pooling; SDP-LSTM runs a multichannel LSTM along the shortest dependency path.
+
+- Attention Mechanism (a toy classifier sketch follows this section)
+  - Combines attention mechanisms with CNNs and RNNs.
+  - Examples: APCNN uses entity descriptions; Att-BLSTM uses word-level attention over a BiLSTM.
+
+- Graph Convolutional Networks (GCNs)
+  - Encode dependency trees or learn KG embeddings for sentence encoding.
+  - Examples: C-GCN uses a contextualized GCN; AGGCN applies multi-head attention.
+
+- Adversarial Training
+  - Adds adversarial noise to word embeddings.
+  - Example: DSGAN denoises distantly supervised relation extraction.
+
+- Reinforcement Learning
+  - Trains instance selectors with policy networks.
+  - Examples: Qin et al. use F1-score-based rewards; HRL proposes hierarchical policy learning.
+
+- Other Advances
+  - These include deep residual learning, transfer learning, and trustworthy relation extraction.
+  - Examples: Huang and Wang use deep residual learning; CORD combines a text corpus and a KG with logical rules.
+
+- Joint Entity and Relation Extraction
+  - Combines entity mention extraction and relation classification in a single model.
+  - Examples: attention-based LSTM network by Katiyar and Cardie; cascade binary tagging by Wei et al.
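+
+As a rough illustration of the attention-based neural extractors above, here is a toy relation classifier in the spirit of Att-BLSTM: a BiLSTM encoder with word-level attention pooled into a sentence vector. Everything here (class name, dimensions, random inputs) is made up for the sketch; a real model would add position features, pretrained word vectors, and dropout:
+
+```python
+import torch
+import torch.nn as nn
+
+class AttBiLSTMClassifier(nn.Module):
+    """Toy relation classifier: BiLSTM encoder + word-level attention."""
+    def __init__(self, vocab_size, embed_dim, hidden_dim, num_relations):
+        super().__init__()
+        self.embed = nn.Embedding(vocab_size, embed_dim)
+        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
+                            bidirectional=True)
+        self.attn = nn.Linear(2 * hidden_dim, 1)  # word-level attention scores
+        self.out = nn.Linear(2 * hidden_dim, num_relations)
+
+    def forward(self, token_ids):                    # (batch, seq_len)
+        h, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, 2*hidden)
+        weights = torch.softmax(self.attn(h), dim=1) # attention over words
+        sentence = (weights * h).sum(dim=1)          # weighted sentence vector
+        return self.out(sentence)                    # relation logits
+
+# Hypothetical usage: 2 sentences of 5 tokens each, 3 candidate relations.
+model = AttBiLSTMClassifier(vocab_size=100, embed_dim=16, hidden_dim=32,
+                            num_relations=3)
+logits = model(torch.randint(0, 100, (2, 5)))
+print(logits.shape)  # torch.Size([2, 3])
+```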
 
 ## V. TEMPORAL KNOWLEDGE GRAPHS
 
@@ -146,3 +223,226 @@
 - Knowledge graphs are vital for advancing AI with structured information.
 - Significant progress has been made, but challenges remain in reasoning and scalability.
 - Future research should address these challenges for full potential utilization.
+
+### Reference
+
+- Shaoxiong Ji, Pekka Marttinen, Shirui Pan, Erik Cambria, and Philip S. Yu, "A Survey on Knowledge Graphs: Representation, Acquisition, and Applications," IEEE Transactions on Neural Networks and Learning Systems, 2021.
+
+## Notes
+
+- Key Concepts
+  - Quick recap of entities, relationships, and triples
+  - Ontologies and schemas
+
+- Construction of Knowledge Graphs
+  - Data Collection and Preprocessing
+  - Entity Recognition and Linking
+  - Triple Extraction
+  - Integration and Fusion
+  - Storage Solutions (graph databases, RDF stores)
+
+- Advanced Techniques
+  - Using machine learning for entity recognition and relationship extraction
+  - Probabilistic knowledge graphs
+  - Incorporating NLP for enhanced extraction and linking
+
+- Tools and Technologies
+  - Popular tools: Neo4j
+  - Brief demos or screenshots
+
+- Use Cases and Applications
+  - Real-world examples in various domains (healthcare, finance, e-commerce)
+  - Enhancing recommendation systems, semantic search, and data integration
+
+- Challenges and Solutions
+  - Scalability
+  - Data Quality and Integration
+  - Maintaining and Updating Graphs
+
+## TransE paper
+
+- Overview
+  - The TransE model represents entities and relations in a continuous vector space. The core idea is that if a triple $(h, r, t)$ holds, then the embedding of the head entity $\mathbf{h}$ plus the embedding of the relation $\mathbf{r}$ should be close to the embedding of the tail entity $\mathbf{t}$. The loss function enforces this principle by minimizing the distance between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$ for valid triples while ensuring that invalid triples end up at a larger distance.
+
+- Loss Function
+  - TransE uses a margin-based ranking criterion that contrasts positive examples with negative ones, so the model learns to distinguish correct from incorrect triples:
+    $$
+    L = \sum_{(h, r, t) \in \mathcal{S}} \sum_{(h', r, t') \in \mathcal{S}'} \left[ \gamma + d(\mathbf{h} + \mathbf{r}, \mathbf{t}) - d(\mathbf{h'} + \mathbf{r}, \mathbf{t'}) \right]_{+}
+    $$
+  - $\mathcal{S}$ is the set of positive (valid) triples.
+  - $\mathcal{S}'$ is the set of negative (invalid) triples, generated by corrupting the head or tail of triples in $\mathcal{S}$.
+  - $\gamma$ is the margin, a hyperparameter that sets the minimum separation between positive and negative triples.
+  - $d(\cdot, \cdot)$ is the distance function (L1 or L2 norm).
+  - $[x]_{+}$ denotes the positive part, $\max(0, x)$.
+
+- Intuition (a short worked example follows this list)
+  - Positive triples: the loss minimizes $d(\mathbf{h} + \mathbf{r}, \mathbf{t})$ for valid triples, pushing $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$.
+  - Negative triples: for corrupted triples, the distance $d(\mathbf{h'} + \mathbf{r}, \mathbf{t'})$ should be large, ensuring that $\mathbf{h'} + \mathbf{r}$ stays away from $\mathbf{t'}$.
+  - Minimizing this loss yields embeddings in which valid triples sit closer together in the embedding space than invalid ones.
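+
+To make the margin concrete, here is a small worked example with made-up distances (not from the paper). Take $\gamma = 1$, a positive triple with $d(\mathbf{h} + \mathbf{r}, \mathbf{t}) = 0.4$, and a corrupted triple with $d(\mathbf{h'} + \mathbf{r}, \mathbf{t'}) = 0.9$:
+
+$$
+[\gamma + d_{\text{pos}} - d_{\text{neg}}]_{+} = [1 + 0.4 - 0.9]_{+} = 0.5
+$$
+
+The pair still incurs a loss of 0.5 because the negative triple is not yet a full margin away. Once $d_{\text{neg}} \geq d_{\text{pos}} + \gamma = 1.4$, the bracket becomes non-positive and the pair contributes zero, so training focuses only on margin violations.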
+
+- Summary
+  - The TransE loss uses a margin-based ranking criterion so that valid triples have smaller distances than invalid ones.
+  - The intuitive goal is to pull valid triples close together in the embedding space while pushing invalid triples apart.
+  - The margin $\gamma$ controls the separation between positive and negative triples, and the distance metric can be either the L1 or L2 norm.
+  - This trains the embeddings to distinguish valid from invalid relationships in a knowledge graph.
+
+### Implementations
+
+```python
+import torch
+import torch.nn as nn
+import torch.optim as optim
+
+class TransE(nn.Module):
+    def __init__(self, num_entities, num_relations, embedding_dim, margin=1.0, norm=1):
+        super(TransE, self).__init__()
+        self.entity_embeddings = nn.Embedding(num_entities, embedding_dim)
+        self.relation_embeddings = nn.Embedding(num_relations, embedding_dim)
+        self.margin = margin
+        self.norm = norm  # 1 for L1 norm, 2 for L2 norm
+
+        # Initialize embeddings randomly.
+        nn.init.xavier_uniform_(self.entity_embeddings.weight.data)
+        nn.init.xavier_uniform_(self.relation_embeddings.weight.data)
+
+    def forward(self, h, r, t):
+        h_embedded = self.entity_embeddings(h)
+        r_embedded = self.relation_embeddings(r)
+        t_embedded = self.entity_embeddings(t)
+        return h_embedded, r_embedded, t_embedded
+
+    def loss(self, pos_triples, neg_triples):
+        pos_h, pos_r, pos_t = pos_triples
+        neg_h, neg_r, neg_t = neg_triples
+
+        pos_h_embedded, pos_r_embedded, pos_t_embedded = self.forward(pos_h, pos_r, pos_t)
+        neg_h_embedded, neg_r_embedded, neg_t_embedded = self.forward(neg_h, neg_r, neg_t)
+
+        # d(h + r, t) for the positive and the corrupted triples.
+        pos_distance = torch.norm(pos_h_embedded + pos_r_embedded - pos_t_embedded, p=self.norm, dim=1)
+        neg_distance = torch.norm(neg_h_embedded + neg_r_embedded - neg_t_embedded, p=self.norm, dim=1)
+
+        # Margin-based ranking loss: [margin + d_pos - d_neg]_+
+        loss = torch.sum(torch.relu(self.margin + pos_distance - neg_distance))
+        return loss
+
+# Example usage
+num_entities = 4   # Alice, Bob, Charlie, David
+num_relations = 2  # friendOf, colleagueOf
+embedding_dim = 7
+margin = 1.0
+model = TransE(num_entities, num_relations, embedding_dim, margin)
+
+# Optimizer
+optimizer = optim.Adam(model.parameters(), lr=0.001)
+
+# Example positive and negative triples
+pos_triples = (
+    torch.tensor([0, 1, 2]),  # Head entities: Alice, Bob, Charlie
+    torch.tensor([0, 0, 1]),  # Relations: friendOf, friendOf, colleagueOf
+    torch.tensor([1, 2, 3])   # Tail entities: Bob, Charlie, David
+)
+
+neg_triples = (
+    torch.tensor([0, 1, 2]),  # Head entities: Alice, Bob, Charlie
+    torch.tensor([0, 0, 1]),  # Relations: friendOf, friendOf, colleagueOf
+    torch.tensor([2, 3, 0])   # Tail entities: Charlie, David, Alice (randomly corrupted)
+)
+
+# Print the initial embeddings before training
+initial_entity_embeddings = model.entity_embeddings.weight.data.clone()
+initial_relation_embeddings = model.relation_embeddings.weight.data.clone()
+
+print("Initial Entity Embeddings:")
+print(initial_entity_embeddings)
+
+print("Initial Relation Embeddings:")
+print(initial_relation_embeddings)
+
+# Training loop
+epochs = 200
+for epoch in range(epochs):
+    model.train()
+    optimizer.zero_grad()
+    loss = model.loss(pos_triples, neg_triples)
+    loss.backward()
+    optimizer.step()  # (see the normalization note after this code block)
+    if (epoch + 1) % 30 == 0:
+        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
+
+# Accessing the entity and relation embeddings after training
+final_entity_embeddings = model.entity_embeddings.weight.data
+final_relation_embeddings = model.relation_embeddings.weight.data
+
+print("\nFinal Entity Embeddings:")
+# print(final_entity_embeddings)
+
+print("\nFinal Relation Embeddings:")
+# print(final_relation_embeddings)
+
+# Example inference: get the embedding for a specific entity and relation
+entity_idx = 0    # Example entity index
+relation_idx = 0  # Example relation index
+
+initial_entity_embedding = initial_entity_embeddings[entity_idx]
+initial_relation_embedding = initial_relation_embeddings[relation_idx]
+final_entity_embedding = final_entity_embeddings[entity_idx]
+final_relation_embedding = final_relation_embeddings[relation_idx]
+
+print(f'\nInitial Embedding for entity {entity_idx}: {initial_entity_embedding}')
+print(f'Initial Embedding for relation {relation_idx}: {initial_relation_embedding}')
+print(f'Final Embedding for entity {entity_idx}: {final_entity_embedding}')
+print(f'Final Embedding for relation {relation_idx}: {final_relation_embedding}')
+```
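+
+One step the minimal example above leaves out: the original TransE training procedure constrains entity embeddings to unit L2 norm at each iteration, which prevents the loss from being trivially reduced by shrinking all embeddings. A sketch of that projection step, to be placed right after `optimizer.step()`:
+
+```python
+# Optional, per the original TransE paper: project entity embeddings
+# back onto the unit L2 ball after each parameter update.
+with torch.no_grad():
+    model.entity_embeddings.weight.data = torch.nn.functional.normalize(
+        model.entity_embeddings.weight.data, p=2, dim=1
+    )
+```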
+
+Output:
+
+```text
+Initial Entity Embeddings:
+tensor([[ 0.3136, -0.1941, -0.1631, -0.2950,  0.3734,  0.0123, -0.3953],
+        [-0.2188, -0.2769, -0.5799, -0.2892, -0.2250,  0.7215, -0.4351],
+        [-0.0712,  0.1976, -0.2865, -0.2884,  0.0214, -0.3816, -0.0810],
+        [ 0.2617, -0.6003,  0.7249,  0.6405, -0.6770,  0.1776, -0.4784]])
+Initial Relation Embeddings:
+tensor([[ 0.3823,  0.7712, -0.7779, -0.7209, -0.3273, -0.7017,  0.2453],
+        [-0.0923,  0.2648,  0.0189, -0.6642, -0.0149,  0.1680, -0.6854]])
+Epoch 30, Loss: 5.484335899353027
+Epoch 60, Loss: 4.4643402099609375
+Epoch 90, Loss: 3.444344997406006
+Epoch 120, Loss: 2.511836290359497
+Epoch 150, Loss: 1.6007781028747559
+Epoch 180, Loss: 0.713299036026001
+
+Final Entity Embeddings:
+
+Final Relation Embeddings:
+
+Initial Embedding for entity 0: tensor([ 0.3136, -0.1941, -0.1631, -0.2950,  0.3734,  0.0123, -0.3953])
+Initial Embedding for relation 0: tensor([ 0.3823,  0.7712, -0.7779, -0.7209, -0.3273, -0.7017,  0.2453])
+Final Embedding for entity 0: tensor([ 0.5136, -0.3941,  0.0318, -0.0950,  0.5734,  0.2123, -0.4633])
+Final Embedding for relation 0: tensor([ 0.3823,  0.7712, -0.7981, -0.7209, -0.3273, -0.7017,  0.1057])
+```
+
+## More notes
+
+- Triple validation and completion (see the completion sketch at the end of this page)
+
+  - Triple Validation: validate new facts or check the consistency of existing facts in the knowledge graph. For example, to validate `(Alice, friendOf, Bob)`, the model scores this triple; if the distance $d(\mathbf{h} + \mathbf{r}, \mathbf{t})$ is small, the fact is considered plausible.
+
+  - Triple Completion: predict missing entities in a query. For example, given `(Alice, friendOf, ?)`, the model scores candidate entities to find the most likely friend of Alice.
+
+- Edge prediction using AmpliGraph:
+
+  ![img](../img/AmpliGraph.png)
+
+- Paper review:
+
+- Demo: Neo4j
+
+- TransE:
+
+- DFS/BFS for gathering a node's context; used in the random-walk approach.
+
+  ![dfs-bfs](../img/bfs-dfs.jpeg)
+
+- Recent work: Graph RAG using LLMs
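+
+A minimal sketch of triple completion, reusing the toy `model` trained in the Implementations section above (so it is not standalone). `entity_names` is an illustrative label list; under the TransE scoring function, a lower distance means a more plausible tail:
+
+```python
+# Tail prediction for the query (Alice, friendOf, ?).
+entity_names = ["Alice", "Bob", "Charlie", "David"]  # illustrative labels
+
+model.eval()
+with torch.no_grad():
+    h = model.entity_embeddings(torch.tensor([0]))    # Alice
+    r = model.relation_embeddings(torch.tensor([0]))  # friendOf
+    tails = model.entity_embeddings.weight            # every candidate tail
+    distances = torch.norm(h + r - tails, p=model.norm, dim=1)
+
+# Rank candidate tails by distance, most plausible first.
+for idx in torch.argsort(distances).tolist():
+    print(f"(Alice, friendOf, {entity_names[idx]}): distance = {distances[idx].item():.4f}")
+```
+
+In practice, the query head itself and already-known triples are filtered out of the candidate list before ranking, which is the standard "filtered" evaluation setting for link prediction.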