Knowledge graphs represent structured information as entities and relations in a graph format. TorchDrug provides comprehensive support for knowledge graph completion (link prediction) using embedding-based models and neural reasoning approaches.
FB15k (Freebase subset):
FB15k-237:
WN18 (WordNet):
WN18RR:
Hetionet:
The primary task for knowledge graphs is link prediction: given a head entity and a relation, predict the tail entity, or given a relation and a tail entity, predict the head.
Head Prediction:
Tail Prediction:
Both:
Ranking Metrics:
Filtered vs Raw:
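To make the ranking metrics and the filtered/raw distinction concrete, here is a minimal sketch in plain Python. It assumes the commonly reported metrics are mean rank (MR), mean reciprocal rank (MRR), and Hits@K. The function names `rank_of_target` and `ranking_metrics` and the toy scores are hypothetical illustrations, not TorchDrug APIs.

```python
def rank_of_target(scores, target, known_true=()):
    """Rank (1 = best) of `target` among candidate entities.

    scores: dict mapping candidate entity -> model score (higher is better).
    known_true: other entities known to be correct answers. Passing them
    gives the filtered rank; leaving it empty gives the raw rank.
    """
    filtered_out = set(known_true) - {target}
    target_score = scores[target]
    # Count remaining candidates that score strictly higher than the target.
    better = sum(
        1 for entity, score in scores.items()
        if entity not in filtered_out and score > target_score
    )
    return better + 1


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Aggregate a list of ranks into MR, MRR, and Hits@K."""
    n = len(ranks)
    metrics = {"mr": sum(ranks) / n, "mrr": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


# Toy tail-prediction query (h, r, ?) whose test answer is "c".
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
# Raw: both "a" and "b" outrank "c".
raw_rank = rank_of_target(scores, "c")
# Filtered: "a" is another known true tail, so it is not counted against "c".
filtered_rank = rank_of_target(scores, "c", known_true=["a"])
summary = ranking_metrics([raw_rank, filtered_rank])
```

The filtered setting never gives a worse rank than the raw one, because it only removes competing candidates that are themselves correct answers.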
TransE (Translation Embedding):
RotatE (Rotation Embedding):
DistMult:
ComplEx:
SimplE:
NeuralLP (Neural Logic Programming):
KBGAT (Knowledge Base Graph Attention):
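As a rough illustration of how the embedding models listed above differ, the sketch below writes out the scoring functions of TransE, DistMult, ComplEx, and RotatE in plain Python, using their standard published forms. These are assumptions for illustration only: TorchDrug's own implementations may differ in details such as the norm, the margin (`max_score`), and the parameterization, and the toy embedding values are hypothetical.

```python
import cmath


def transe_score(h, r, t):
    """TransE: relation as a translation, score = -||h + r - t||_1."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def distmult_score(h, r, t):
    """DistMult: trilinear product, score = sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """RotatE: relation as a rotation, score = -sum_i |h_i * e^(i*phase_i) - t_i|."""
    return -sum(
        abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t)
    )


# Toy 2-dimensional embeddings (hypothetical values).
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [1.5, 1.0]
h_cplx, t_cplx = [1 + 0j, 0 + 1j], [0 + 1j, -1 + 0j]
quarter_turn = [cmath.pi / 2, cmath.pi / 2]

# Here h + r equals t exactly, so TransE gives its best possible score (zero).
transe_perfect = transe_score(h_real, r_real, t_real)
# Rotating h by 90 degrees gives t, so RotatE is at its best score up to rounding.
rotate_perfect = rotate_score(h_cplx, quarter_turn, t_cplx)
```

In all four functions a higher score means a more plausible triple. For the distance-based models (TransE, RotatE) the best achievable score is zero.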
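import torch
from torchdrug import core, datasets, models, tasks

# Load dataset and split it into train/validation/test sets
dataset = datasets.FB15k237("~/kg-datasets/")
train_set, valid_set, test_set = dataset.split()

# Define model
model = models.RotatE(
    num_entity=dataset.num_entity,
    num_relation=dataset.num_relation,
    embedding_dim=2000,
    max_score=9
)

# Define task
task = tasks.KnowledgeGraphCompletion(
    model,
    num_negative=128,
    adversarial_temperature=2,
    criterion="bce"
)

# Train with TorchDrug's core.Engine (optimizer settings below are illustrative)
optimizer = torch.optim.Adam(task.parameters(), lr=2e-5)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer, batch_size=1024)
solver.train(num_epoch=200)
solver.evaluate("valid")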
Strategies:
Parameters:
num_negative: Number of negative samples per positive triple
adversarial_temperature: Temperature for self-adversarial weighting
Binary Cross-Entropy (BCE):
Margin Loss:
max(0, margin + score_neg - score_pos)
Logistic Loss:
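The loss terms and negative-sampling parameters named above can be sketched in plain Python as follows. This is a minimal illustration, not TorchDrug's implementation. It assumes that self-adversarial weights are a softmax over negative scores divided by the temperature (conventions differ between implementations) and that BCE is applied to raw scores treated as logits. All function names here are hypothetical.

```python
import math


def margin_loss(score_pos, score_neg, margin=1.0):
    """Margin ranking loss: max(0, margin + score_neg - score_pos)."""
    return max(0.0, margin + score_neg - score_pos)


def self_adversarial_weights(neg_scores, temperature):
    """Softmax over negative scores divided by the temperature (assumed convention).

    Higher-scoring (harder) negatives receive larger weights.
    """
    scaled = [s / temperature for s in neg_scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    """Numerically stable log(1 + exp(x)), the logistic loss on a logit."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_loss(score_pos, neg_scores, neg_weights):
    """BCE on logits: the positive has target 1, weighted negatives have target 0."""
    pos_term = softplus(-score_pos)  # equals -log(sigmoid(score_pos))
    # softplus(s) equals -log(1 - sigmoid(s)) for each negative score s.
    neg_term = sum(w * softplus(s) for w, s in zip(neg_weights, neg_scores))
    return pos_term + neg_term


neg_scores = [2.0, 0.0, -2.0]  # toy scores for three negatives
weights = self_adversarial_weights(neg_scores, temperature=2.0)
loss = bce_loss(score_pos=3.0, neg_scores=neg_scores, neg_weights=weights)
```

Under this convention, a larger temperature flattens the weights toward uniform sampling, while a smaller one concentrates the loss on the hardest negatives.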
1-to-1 Relations:
1-to-N Relations:
N-to-1 Relations:
N-to-N Relations:
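One way to assign relations to the four categories above is to measure the average number of tails per head and heads per tail for each relation. The sketch below does this in plain Python. The 1.5 cutoff is a commonly used convention and is an assumption here, and `categorize_relations` and the toy triples are hypothetical.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Label each relation as 1-to-1, 1-to-N, N-to-1, or N-to-N.

    triples: iterable of (head, relation, tail).
    A side counts as "N" when its average fan-out is at least `threshold`.
    """
    tails_of = defaultdict(set)  # (relation, head) -> tails
    heads_of = defaultdict(set)  # (relation, tail) -> heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    def average_fanout(groups):
        sizes = defaultdict(list)
        for (r, _), members in groups.items():
            sizes[r].append(len(members))
        return {r: sum(v) / len(v) for r, v in sizes.items()}

    tails_per_head = average_fanout(tails_of)
    heads_per_tail = average_fanout(heads_of)

    categories = {}
    for r in tails_per_head:
        head_side = "N" if heads_per_tail[r] >= threshold else "1"
        tail_side = "N" if tails_per_head[r] >= threshold else "1"
        categories[r] = f"{head_side}-to-{tail_side}"
    return categories


# Toy triples for illustration only.
triples = [
    ("france", "capital", "paris"),
    ("japan", "capital", "tokyo"),
    ("alice", "parent_of", "bob"),
    ("alice", "parent_of", "carol"),
    ("bob", "born_in", "paris"),
    ("carol", "born_in", "paris"),
]
categories = categorize_relations(triples)
```

Reporting link prediction metrics separately per category can show whether a model struggles specifically on the "N" side of a relation.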
Symmetric Relations:
Antisymmetric Relations:
Inverse Relations:
Composition:
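The relation patterns above can be checked numerically on toy embeddings. The sketch below assumes the standard DistMult and RotatE formulations, with hypothetical values. It shows that DistMult scores are invariant to swapping head and tail, so it models symmetric relations but cannot model antisymmetric ones, and that RotatE-style rotations support inversion and composition.

```python
import cmath


def distmult_score(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate(entity, phases):
    """Apply a RotatE-style relation: rotate each complex coordinate by a phase."""
    return [e * cmath.exp(1j * p) for e, p in zip(entity, phases)]


def close(a, b, tol=1e-9):
    """True when two complex vectors agree elementwise within `tol`."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


h = [1 + 2j, -0.5 + 1j]
r1 = [0.3, 1.1]   # phases of a first relation
r2 = [0.7, -0.4]  # phases of a second relation

# Symmetry: swapping head and tail leaves the DistMult score unchanged.
forward = distmult_score([1.0, 2.0], [0.5, -1.0], [3.0, 4.0])
backward = distmult_score([3.0, 4.0], [0.5, -1.0], [1.0, 2.0])
distmult_symmetric = abs(forward - backward) < 1e-12

# Inversion: rotating by the negated phases undoes the relation.
inverse_ok = close(rotate(rotate(h, r1), [-p for p in r1]), h)

# Composition: applying r1 then r2 equals one rotation by the summed phases.
composed = [p1 + p2 for p1, p2 in zip(r1, r2)]
composition_ok = close(rotate(rotate(h, r1), r2), rotate(h, composed))

# Symmetric relation in RotatE: a phase of pi applied twice is the identity.
half_turn = [cmath.pi, cmath.pi]
symmetric_rotation_ok = close(rotate(rotate(h, half_turn), half_turn), h)
```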
Small Graphs (< 50k entities):
Large Graphs (> 100k entities):
Sparse Graphs:
Dense, Complete Graphs:
Biomedical/Domain Graphs:
Chain multiple relations to answer complex queries:
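As a simple illustration of chaining relations, the sketch below answers a two-hop path query by symbolic traversal over observed triples. This is an assumption-laden toy, not a TorchDrug API: learned multi-hop models would score candidate answers instead of requiring every hop to be present in the graph, and the entity and relation names are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Index triples as (head, relation) -> set of tails."""
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    return index


def follow_path(index, start, relations):
    """Answer a path query by following `relations` in order from `start`."""
    frontier = {start}
    for r in relations:
        # Expand every entity in the frontier along relation r.
        frontier = {t for h in frontier for t in index.get((h, r), ())}
    return frontier


# Toy graph for illustration only.
triples = [
    ("drug_a", "targets", "gene_x"),
    ("gene_x", "participates_in", "process_p"),
    ("drug_a", "targets", "gene_y"),
    ("gene_y", "participates_in", "process_q"),
    ("drug_b", "targets", "gene_z"),
]
index = build_index(triples)
# Query: which processes involve a gene targeted by drug_a?
answers = follow_path(index, "drug_a", ["targets", "participates_in"])
```

The traversal returns an empty set as soon as one hop is missing, which is exactly the gap that link prediction models are meant to fill.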
Extend to time-varying facts:
Handle relations with few examples:
Generalize to unseen entities:
Predict "drug treats disease" links in Hetionet:
Identify genes associated with diseases:
Link proteins to biological processes:
Issue: Poor performance on specific relation types
Issue: Overfitting on small graphs
Issue: Slow training on large graphs
Issue: Cannot handle new entities