scientific-skills/torchdrug/references/molecular_property_prediction.md
Molecular property prediction estimates chemical, physical, or biological properties of molecules from their structure. TorchDrug supports both classification and regression tasks on molecular graphs.
Classification Tasks:
Regression Tasks:
tasks.PropertyPrediction is the standard task for graph-level property prediction and supports both classification and regression.
Key Parameters:
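- model: Graph representation model (GNN)
- task: Name(s) of the target properties to predict, given as a string, list, or dict of task names to weights (dataset.tasks selects every label in the dataset)
- criterion: Loss function ("mse", "bce", "ce")
- metric: Evaluation metrics ("mae", "rmse", "auroc", "auprc")
- num_mlp_layer: Number of layers in the MLP prediction head applied to the graph readout

Example Workflow: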
import torch
from torchdrug import core, models, tasks, datasets

# Load dataset
dataset = datasets.BBBP("~/molecule-datasets/")

# Define model
model = models.GIN(input_dim=dataset.node_feature_dim,
                   hidden_dims=[256, 256, 256, 256],
                   edge_input_dim=dataset.edge_feature_dim,
                   batch_norm=True, readout="mean")

# Define task
task = tasks.PropertyPrediction(model, task=dataset.tasks,
                                criterion="bce",
                                metric=("auprc", "auroc"))
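To make the metric names above concrete, the following is a minimal pure-Python sketch of what an AUROC score measures: the probability that a randomly chosen positive molecule receives a higher score than a randomly chosen negative one. The function name pairwise_auroc and the toy labels and scores are illustrative assumptions, not part of TorchDrug, whose own implementation may differ.

```python
def pairwise_auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n^2) version is meant for
    illustration only, not for large data sets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example (hypothetical data): two negatives and two positives
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auroc = pairwise_auroc(labels, scores)
print(auroc)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why AUROC is often reported alongside AUPRC for binary classification.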
Specialized task for multi-label scenarios where each molecule can have multiple binary labels (e.g., Tox21, SIDER).
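In multi-label benchmarks such as Tox21, some molecules lack a measurement for some labels, so the loss is usually averaged only over the labels that are present. The sketch below shows that idea in plain Python. The function name masked_bce, the use of None to mark a missing label, and the toy numbers are assumptions for illustration, not TorchDrug's implementation.

```python
import math


def masked_bce(probabilities, targets):
    """Mean binary cross-entropy over the labels that are actually present.

    probabilities: per-molecule lists of predicted probabilities, one per label
    targets: same shape, holding 1, 0, or None (None marks a missing label)
    """
    total, count = 0.0, 0
    for probs, labels in zip(probabilities, targets):
        for p, y in zip(probs, labels):
            if y is None:
                continue  # skip labels with no measurement
            p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    if count == 0:
        raise ValueError("no labeled entries")
    return total / count


# Two molecules, three binary labels each (hypothetical values)
probabilities = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.6]]
targets = [[1, 0, None], [0, None, 1]]
loss = masked_bce(probabilities, targets)
print(round(loss, 4))
```

Each label contributes independently, so a molecule with only one measured label still provides a training signal for that label.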
Key Features:
Small Datasets (< 1000 molecules):
Medium Datasets (1k-100k molecules):
Large Datasets (> 100k molecules):
3D Structure Available:
TorchDrug automatically extracts atom features:
Bond features include:
Add custom node/edge features using transforms:
from torchdrug import datasets, transforms

# Add a virtual node to each molecular graph
transform = transforms.VirtualNode()
dataset = datasets.BBBP("~/molecule-datasets/",
                        transform=transform)
- Random Split: Standard train/val/test split
- Scaffold Split: Group molecules by Bemis-Murcko scaffolds (recommended for drug discovery)
- Stratified Split: Maintain label distribution across splits
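The scaffold split described above can be sketched in plain Python. The sketch assumes each molecule already has a scaffold string (in practice computed with a cheminformatics toolkit such as RDKit). The function name scaffold_split_indices, the largest-group-first ordering, and the toy scaffold names are illustrative assumptions rather than TorchDrug's implementation.

```python
from collections import defaultdict


def scaffold_split_indices(scaffolds, fractions=(0.8, 0.1, 0.1)):
    """Assign whole scaffold groups to train/valid/test index lists.

    Groups are placed largest first, and a group is never divided,
    so no scaffold is shared between splits.
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)
    # Largest groups first; ties broken by first index for determinism
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = fractions[0] * n
    valid_cap = (fractions[0] + fractions[1]) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules
scaffolds = ["benzene"] * 6 + ["pyridine"] * 2 + ["indole", "furan"]
train, valid, test = scaffold_split_indices(scaffolds)
print(train, valid, test)
```

Because rare scaffolds end up in the validation and test sets, this kind of split tends to give a more pessimistic, and for drug discovery more realistic, estimate of generalization than a random split.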
Issue: Poor performance on imbalanced datasets
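One common mitigation for class imbalance, offered here as a general technique rather than a TorchDrug-specific recipe, is to weight the positive class by the ratio of negatives to positives in the training labels. A minimal sketch, with the function name positive_class_weight and the toy labels as assumptions:

```python
def positive_class_weight(labels):
    """Return n_negative / n_positive for a list of 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Hypothetical training labels: 2 actives among 10 molecules
train_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
weight = positive_class_weight(train_labels)
print(weight)
```

A weight like this can be supplied to a weighted loss such as the pos_weight argument of PyTorch's torch.nn.BCEWithLogitsLoss. Whether and how a given TorchDrug task accepts such a weight should be checked against its documentation.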
Issue: Overfitting on small datasets
Issue: Large memory consumption
Issue: Slow training