Retrosynthesis is the process of planning synthetic routes from target molecules back to commercially available starting materials. TorchDrug provides tools for learning-based retrosynthesis prediction, breaking down the complex task into manageable subtasks.
USPTO-50k is the standard benchmark dataset for retrosynthesis, derived from US patent literature.
Statistics:
Reaction Types:
Data Splits:
Format:
TorchDrug decomposes retrosynthesis into a multi-step pipeline:
Identifies the reaction center: the bonds that were formed or broken in the forward reaction.
Input: Product molecule
Output: Probability, for each bond, of being part of the reaction center
Purpose:
Model Architecture:
Evaluation Metrics:
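Center identification is often scored with top-k accuracy over the predicted bond probabilities. The following is a minimal sketch in plain Python, not TorchDrug's own metric code. The function names and the toy probabilities are illustrative assumptions, and it assumes one true center bond per product.

```python
def top_k_bonds(bond_probs, k):
    """Return the k bond indices with the highest reaction-center probability."""
    ranked = sorted(range(len(bond_probs)), key=lambda i: bond_probs[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(all_probs, true_centers, k):
    """Fraction of products whose true center bond is among the top-k predictions."""
    hits = sum(
        true in top_k_bonds(probs, k)
        for probs, true in zip(all_probs, true_centers)
    )
    return hits / len(true_centers)


# Toy data: per-bond probabilities for three products (made-up values).
all_probs = [
    [0.10, 0.70, 0.20],        # true center: bond 1
    [0.50, 0.30, 0.15, 0.05],  # true center: bond 1
    [0.05, 0.15, 0.80],        # true center: bond 0
]
true_centers = [1, 1, 0]

for k in (1, 2, 3):
    print(k, top_k_accuracy(all_probs, true_centers, k))
```

Raising k trades precision for recall: more candidate centers are passed downstream, at the cost of more synthon completions to run.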
Given the product and the identified reaction center, the product is split into synthons, and each synthon is completed into a full reactant structure.
Input:
Output:
Process:
Challenges:
Evaluation:
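One simple way to score predicted reactants is order-invariant exact match against the reference reactant set. The sketch below assumes every SMILES string has already been canonicalized (for example with RDKit), so string equality means structural equality. The helper names are hypothetical and are not part of TorchDrug.

```python
def reactant_key(smiles):
    """Order-invariant key for a dot-separated reactant SMILES string.

    Assumes each fragment is already a canonical SMILES string.
    A sorted tuple (not a set) keeps duplicate reactants distinct.
    """
    return tuple(sorted(smiles.split(".")))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match the reference reactant set exactly."""
    matches = sum(
        reactant_key(pred) == reactant_key(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Toy example: the first prediction differs only in reactant order.
predictions = ["NC.CC(=O)O", "CC(=O)O.CO"]
references = ["CC(=O)O.NC", "CC(=O)Cl.CO"]
print(exact_match_accuracy(predictions, references))
```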
Combines center identification and synthon completion into a unified pipeline.
Input: Target product molecule
Output: Ranked list of reactant sets (synthesis pathways)
Workflow:
Advantages:
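The way the two stages combine into a single ranking can be sketched in plain Python. This is an illustration under stated assumptions, not TorchDrug's implementation: `rank_reactant_sets` and `complete_synthons` are hypothetical names, the probabilities are made up, and the joint score is assumed to be the product of the center probability and the completion probability.

```python
import heapq


def rank_reactant_sets(center_candidates, complete_synthons,
                       center_topk=2, num_synthon_beam=2, max_prediction=3):
    """Rank reactant sets by joint probability of center and completion.

    center_candidates: list of (center, probability) pairs.
    complete_synthons: callable mapping a center to (reactants, probability) pairs.
    """
    # Keep only the most likely reaction centers.
    top_centers = heapq.nlargest(center_topk, center_candidates, key=lambda c: c[1])
    scored = []
    for center, p_center in top_centers:
        # Keep only the best completions for each center (the "beam").
        beam = heapq.nlargest(num_synthon_beam, complete_synthons(center),
                              key=lambda r: r[1])
        for reactants, p_completion in beam:
            scored.append((reactants, p_center * p_completion))
    return heapq.nlargest(max_prediction, scored, key=lambda s: s[1])


# Toy stand-ins for the two trained models (placeholder labels, made-up values).
centers = [("bond_3", 0.6), ("bond_7", 0.3), ("bond_1", 0.1)]
completions = {
    "bond_3": [("A.B", 0.5), ("A.C", 0.4), ("A.D", 0.1)],
    "bond_7": [("E.F", 0.9), ("E.G", 0.1)],
    "bond_1": [("H.I", 1.0)],
}

ranked = rank_reactant_sets(centers, lambda c: completions[c])
for reactants, score in ranked:
    print(reactants, round(score, 2))
```

A confident completion from the second-best center can outrank a weaker completion from the best center, which is why the candidates are merged before the final ranking.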
```python
from torchdrug import datasets, models, tasks

# Load the dataset twice: once as full reactions, once split into synthons
reaction_dataset = datasets.USPTO50k(
    "~/retro-datasets/",
    atom_feature="center_identification",
    kekulize=True
)
synthon_dataset = datasets.USPTO50k(
    "~/retro-datasets/",
    as_synthon=True,
    atom_feature="synthon_completion",
    kekulize=True
)

# For center identification
model_center = models.RGCN(
    input_dim=reaction_dataset.node_feature_dim,
    num_relation=reaction_dataset.num_bond_type,
    hidden_dims=[256, 256, 256]
)
task_center = tasks.CenterIdentification(
    model_center,
    feature=("graph", "atom", "bond")
)

# For synthon completion
model_synthon = models.GIN(
    input_dim=synthon_dataset.node_feature_dim,
    hidden_dims=[256, 256, 256]
)
task_synthon = tasks.SynthonCompletion(
    model_synthon,
    feature=("graph",)
)

# End-to-end: the combined task takes the two sub-tasks (not the raw models);
# the top-k and beam-width settings are configured here
task_retro = tasks.Retrosynthesis(
    task_center,
    task_synthon,
    center_topk=5,        # Consider the top 5 reaction centers
    num_synthon_beam=10   # Beam width for synthon completion
)
```
Pre-train on large reaction datasets (e.g., USPTO-full with 1M+ reactions), then fine-tune on specific reaction classes.
Benefits:
Train jointly on:
Advantages:
RGCN (Relational Graph Convolutional Network):
GIN (Graph Isomorphism Network):
GAT (Graph Attention Network):
Transformer Models:
LSTM/GRU:
Combine graph and sequence representations:
Common Transformations:
Rare Reactions:
Regioselectivity:
Stereoselectivity:
Chemoselectivity:
While TorchDrug focuses on reaction connectivity, consider:
Predict immediate precursors for target molecule.
Use Case:
Recursively apply retrosynthesis to each predicted reactant until reaching commercial building blocks.
Tree Search Strategies:
Breadth-First Search:
Depth-First Search:
Monte Carlo Tree Search (MCTS):
A* Search:
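As an illustration of the search strategies listed above, the following sketch runs a best-first (uniform-cost) search, which is A* with a zero heuristic, over a toy single-step model. Everything here is an assumption for illustration: `one_step` stands in for a trained single-step retrosynthesis model, the molecule labels are placeholders, and the route cost is taken to be the summed negative log-probability of its steps.

```python
import heapq
import itertools
import math


def best_first_route(target, one_step, building_blocks, max_steps=5):
    """Find the most probable route from target down to building blocks.

    one_step: callable mapping a molecule to (reactants, probability) pairs.
    Returns (route, cost), where route is a list of (product, reactants) steps.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares routes
    frontier = [(0.0, next(counter), (target,), [])]
    while frontier:
        cost, _, open_mols, route = heapq.heappop(frontier)
        pending = [m for m in open_mols if m not in building_blocks]
        if not pending:
            return route, cost  # every leaf is purchasable
        if len(route) >= max_steps:
            continue  # depth limit reached, abandon this branch
        mol, rest = pending[0], pending[1:]
        for reactants, prob in one_step(mol):
            if prob <= 0:
                continue
            new_state = tuple(rest) + tuple(reactants)
            new_route = route + [(mol, tuple(reactants))]
            heapq.heappush(
                frontier,
                (cost - math.log(prob), next(counter), new_state, new_route),
            )
    return None, math.inf


# Toy single-step predictions (placeholder labels, made-up probabilities).
TOY_MODEL = {
    "T": [(("X", "Y"), 0.6), (("Z",), 0.4)],
    "X": [(("a", "b"), 0.5)],
    "Z": [(("a", "c"), 0.9)],
}
building_blocks = {"a", "b", "c", "Y"}

route, cost = best_first_route("T", lambda m: TOY_MODEL.get(m, []), building_blocks)
print(route, math.exp(-cost))
```

In this toy case the greedy first step (probability 0.6) leads to a less likely overall route than the alternative, so the search returns the route through `Z`. A learned heuristic or an MCTS rollout policy would replace the zero heuristic in a fuller planner.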
Rank synthetic routes by:
Stop retrosynthesis when reaching:
Check each predicted reaction:
Filters:
Expert Systems:
Databases:
Considerations:
Train forward reaction prediction models to validate retrosynthetic proposals:
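A round-trip check of this kind can be sketched as follows. This is a minimal illustration, assuming a hypothetical `forward_predict` callable that maps reactant SMILES to a predicted product, and assuming all SMILES fragments are already canonical so that string comparison is meaningful. Here a lookup table stands in for the trained forward model.

```python
def canonical(smiles):
    """Order-invariant form of a dot-separated SMILES string.

    Assumes each fragment is already canonical; only fragment order is fixed.
    """
    return ".".join(sorted(smiles.split(".")))


def round_trip_filter(target, proposals, forward_predict):
    """Keep proposals whose predicted forward product matches the target."""
    kept = []
    for reactants, score in proposals:
        predicted = forward_predict(reactants)
        if predicted is not None and canonical(predicted) == canonical(target):
            kept.append((reactants, score))
    return kept


# Toy forward model: a lookup table in place of a trained predictor.
FORWARD_TABLE = {
    canonical("CC(=O)O.NC"): "CC(=O)NC",
    canonical("CC(=O)Cl.NC"): "CC(=O)NC",
    canonical("CC(=O)O.CO"): "CC(=O)OC",
}


def toy_forward(reactants):
    return FORWARD_TABLE.get(canonical(reactants))


target = "CC(=O)NC"
proposals = [("NC.CC(=O)O", 0.7), ("CC(=O)Cl.NC", 0.2), ("CC(=O)O.CO", 0.1)]
validated = round_trip_filter(target, proposals, toy_forward)
print(validated)
```

Proposals that the forward model cannot reproduce are dropped rather than re-ranked, which keeps the filter conservative; a softer variant could down-weight them instead.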
Integration with:
TorchDrug as Component:
High-Throughput Screening:
Robotic Synthesis: