scientific-skills/torchdrug/references/protein_modeling.md
TorchDrug provides extensive support for protein-related tasks, including sequence analysis, structure prediction, property prediction, and protein-protein interaction prediction. Proteins are represented as graphs in which nodes are amino acid residues and edges encode sequential or spatial relationships between them.
Enzyme Function:
Protein Characteristics:
Gene Ontology:
Protein-Protein Interactions:
Protein-Ligand Binding:
Predict properties at the residue (node) level, such as secondary structure or contact maps.
Use Cases:
Predict protein-level properties like function, stability, or localization.
Use Cases:
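Protein-level prediction needs one fixed-size vector per protein, regardless of sequence length. The following is a minimal, library-independent sketch of mean and sum readouts over per-residue embeddings. The function names `mean_readout` and `sum_readout` are illustrative and are not TorchDrug API; the embeddings are assumed to be plain lists of floats rather than tensors.

```python
from typing import List, Sequence


def sum_readout(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-residue vectors into one protein-level vector."""
    if not residue_embeddings:
        raise ValueError("protein has no residues")
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) for d in range(dim)]


def mean_readout(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors; insensitive to protein length."""
    total = sum_readout(residue_embeddings)
    return [value / len(residue_embeddings) for value in total]
```

Mean readout keeps the output scale independent of protein length, while sum readout lets longer proteins produce larger activations. Which one suits a given property is an empirical choice.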
Predict interactions between protein pairs or protein-ligand pairs.
Key Features:
Specialized task for predicting spatial proximity between residues in folded structures.
Applications:
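To make the contact-prediction target concrete, here is a small sketch that derives a binary contact map from residue coordinates. It assumes one representative coordinate per residue (for example the C-alpha atom), an 8 Å distance threshold, and a minimum sequence separation of 6 residues. These values follow a common convention and are assumptions of this sketch, as is the function name `contact_map`.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(
    coords: Sequence[Coord],
    threshold: float = 8.0,   # assumed distance cutoff in angstroms
    min_separation: int = 6,  # assumed minimum gap along the sequence
) -> List[List[int]]:
    """Return a symmetric 0/1 matrix marking residue pairs in contact."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs that are close in sequence: they are trivially near in space.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts[i][j] = contacts[j][i] = 1
    return contacts
```

A model trained for contact prediction outputs a score for each residue pair, and a matrix like this one serves as the label.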
ESM (Evolutionary Scale Modeling):
ProteinBERT:
ProteinLSTM:
ProteinCNN / ProteinResNet:
GearNet (Geometry-Aware Relational Graph Network):
GCN/GAT/GIN on Protein Graphs:
SchNet:
Statistic Features:
Physicochemical Features:
Sequential Edges:
Spatial Edges:
Contact Edges:
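The edge types listed above can be illustrated with a short, library-independent sketch. It builds typed, directed edges from a residue count and per-residue coordinates. The helper names, the `(source, target, type)` tuple layout, and the default cutoffs are assumptions made for illustration. TorchDrug's `layers.geometry` module provides its own edge constructors (such as `SequentialEdge`, `SpatialEdge`, and `KNNEdge`), which should be preferred in real pipelines.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int, str]  # (source residue, target residue, edge type)


def sequential_edges(num_residues: int, max_offset: int = 2) -> List[Edge]:
    """Connect residues that are at most `max_offset` apart in the sequence."""
    edges: List[Edge] = []
    for i in range(num_residues):
        for offset in range(1, max_offset + 1):
            j = i + offset
            if j < num_residues:
                edges.append((i, j, "sequential"))
                edges.append((j, i, "sequential"))
    return edges


def spatial_edges(coords: Sequence[Coord], radius: float = 10.0) -> List[Edge]:
    """Connect residues whose coordinates lie within `radius` angstroms."""
    edges: List[Edge] = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < radius:
                edges.append((i, j, "spatial"))
                edges.append((j, i, "spatial"))
    return edges
```

Under the same assumptions, contact-style edges can be obtained by calling `spatial_edges` with a tighter radius. Keeping the edge type in each tuple lets a relational model treat the edge types differently.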
Residue Identity:
Position Information:
Physicochemical Properties:
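As an illustration of the node features listed above, the sketch below builds a per-residue feature vector from a one-hot residue identity, a relative sequence position, and one physicochemical value. The Kyte-Doolittle hydropathy scale is used as the example property. The function name and feature layout are assumptions of this sketch and do not reflect TorchDrug's built-in residue featurizers.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_features(sequence: str) -> List[List[float]]:
    """Return one 22-dim vector per residue: one-hot, position, hydropathy."""
    n = len(sequence)
    features: List[List[float]] = []
    for pos, aa in enumerate(sequence):
        if aa not in KYTE_DOOLITTLE:
            raise ValueError(f"non-standard residue: {aa!r}")
        one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
        relative_pos = pos / (n - 1) if n > 1 else 0.0  # 0 at N-terminus, 1 at C-terminus
        features.append(one_hot + [relative_pos, KYTE_DOOLITTLE[aa]])
    return features
```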
Self-Supervised Pre-training:
Pre-trained Model Usage:
from torchdrug import models, tasks
# Load pre-trained ESM-1b; `path` is the directory where the weights are downloaded and cached
model = models.ESM(path="~/esm-model-weights/", model="ESM-1b")
# Fine-tune on downstream task
task = tasks.PropertyPrediction(
    model,
    task=["stability"],
    criterion="mse",
    metric=["mae", "rmse"],
)
Train on multiple related tasks simultaneously:
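One way to picture multi-task training is as a weighted combination of per-task losses in which missing labels are skipped. The sketch below shows that idea in plain Python for regression targets. It is not TorchDrug's implementation; the function name `multitask_mse`, the dictionary layout, and the use of NaN to mark missing labels are assumptions made for illustration.

```python
import math
from typing import Dict, List


def multitask_mse(
    predictions: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-task MSE, ignoring NaN (missing) targets."""
    total = 0.0
    weight_sum = 0.0
    for task, weight in weights.items():
        # Keep only samples that actually carry a label for this task.
        pairs = [
            (p, t)
            for p, t in zip(predictions[task], targets[task])
            if not math.isnan(t)
        ]
        if not pairs:
            continue  # no labels for this task in the batch
        task_loss = sum((p - t) ** 2 for p, t in pairs) / len(pairs)
        total += weight * task_loss
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no labeled samples for any task")
    return total / weight_sum
```

Masking missing labels lets datasets with partially overlapping annotations share one encoder, and the weights control how strongly each task shapes the shared representation.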
For Sequence-Only Tasks:
For Structure-Based Tasks:
For Small Datasets:
Load AlphaFold-predicted structures:
from torchdrug import data
# Load AlphaFold structure
protein = data.Protein.from_pdb("alphafold_structure.pdb")
# Use in TorchDrug workflows
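AlphaFold PDB files store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column, so it can be worth checking confidence before feeding a predicted structure to a model. The following standard-library sketch reads pLDDT from C-alpha `ATOM` records using the fixed-column PDB layout. The helper names and the cutoff of 70 are assumptions for illustration, and the sketch does not handle multi-model files or alternate locations.

```python
from typing import List, Tuple

ResidueConfidence = Tuple[int, str, float]  # (residue number, residue name, pLDDT)


def read_ca_plddt(pdb_text: str) -> List[ResidueConfidence]:
    """Extract pLDDT for each residue from its C-alpha ATOM record."""
    records: List[ResidueConfidence] = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; restrict to ATOM records so that
        # calcium ions (HETATM "CA") are not mistaken for C-alpha atoms.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()
            res_seq = int(line[22:26])
            plddt = float(line[60:66])  # B-factor column holds pLDDT
            records.append((res_seq, res_name, plddt))
    return records


def confident_residues(
    records: List[ResidueConfidence], min_plddt: float = 70.0
) -> List[int]:
    """Return residue numbers whose pLDDT meets the assumed cutoff."""
    return [res_seq for res_seq, _, plddt in records if plddt >= min_plddt]
```

Typical usage would be `read_ca_plddt(open("alphafold_structure.pdb").read())`. Regions with low scores are often disordered or poorly predicted and may deserve masking or separate treatment.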
Predict a structure with ESMFold, save it as a PDB file, then load it into TorchDrug for analysis with its models.
Generate structures with Rosetta, then import them into TorchDrug for analysis.