# Available Featurizers

This document provides a comprehensive catalog of the featurizers available in molfeat, organized by category.
- Pre-trained transformer models for molecular embeddings using SMILES/SELFIES representations.
- Pre-trained graph neural network models operating on molecular graph structures, all pre-trained on ChEMBL molecules with different objectives.
- Calculators for physicochemical properties and molecular characteristics.
- Binary or count-based fixed-length vectors (fingerprints) representing molecular substructures.
- Features based on pharmacologically relevant functional groups and their spatial relationships.
- Descriptors capturing 3D molecular shape and electrostatic properties.
- Descriptors based on molecular scaffolds and core structures.
- Atom- and bond-level features for constructing graph representations for graph neural networks.
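The fingerprint category above can be illustrated with a small pure-Python sketch (a toy for intuition only, not molfeat's actual implementation): substructure keys are hashed onto a fixed-length bit vector, much as ECFP-style fingerprints fold circular atom environments into a fixed number of bits.

```python
# Illustrative sketch only -- not molfeat's implementation.
# Folds arbitrary substructure keys (here: plain strings) into a
# fixed-length binary vector by hashing, the way ECFP-style
# fingerprints fold circular atom environments.
import hashlib

def fold_fingerprint(substructures, n_bits=2048):
    """Hash each substructure key onto one bit of a fixed-length vector."""
    bits = [0] * n_bits
    for key in substructures:
        # sha1 (unlike the builtin hash) is deterministic across runs
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        bits[h % n_bits] = 1
    return bits

fp = fold_fingerprint(["c1ccccc1", "C(=O)O"], n_bits=16)
```

Count-based variants simply increment the bit instead of setting it, which preserves how often each substructure occurs.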
Molfeat integrates models from various sources:

- Transformer models accessed through the Hugging Face Hub.
- Pre-trained GNN models from DGL-LifeSci.
Recommended featurizer choices depend on the use case:

- For traditional ML (Random Forest, SVM, etc.):
- For deep learning:
- For similarity searching:
- For pharmacophore-based approaches:
- For interpretability:
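Similarity searching over binary fingerprints is typically done with the Tanimoto (Jaccard) coefficient; a minimal pure-Python version, independent of any molfeat API, looks like this:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two equal-length binary vectors."""
    on_a = sum(fp_a)
    on_b = sum(fp_b)
    common = sum(a & b for a, b in zip(fp_a, fp_b))
    union = on_a + on_b - common
    if union == 0:
        return 0.0  # both vectors empty: define similarity as 0
    return common / union

sim = tanimoto([1, 1, 0, 1], [1, 0, 0, 1])  # → 2 / 3 ≈ 0.667
```

Ranking a library by Tanimoto similarity to a query fingerprint is the standard workflow this metric supports.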
Some featurizers require optional dependencies:
```bash
pip install "molfeat[dgl]"
pip install "molfeat[graphormer]"
pip install "molfeat[transformer]"
pip install "molfeat[fcd]"
pip install "molfeat[map4]"
pip install "molfeat[all]"
```

The model store can be queried to list every available featurizer:

```python
from molfeat.store.modelstore import ModelStore

store = ModelStore()
all_models = store.available_models

# Print all available featurizers
for model in all_models:
    print(f"{model.name}: {model.description}")

# Search for specific types
transformers = [m for m in all_models if "transformer" in m.tags]
gnn_models = [m for m in all_models if "gnn" in m.tags]
fingerprints = [m for m in all_models if "fingerprint" in m.tags]
```
Relative featurization speed:

- Fastest:
- Medium:
- Slower:
- Slowest (first run):

Output dimensionality:

- Low (< 200 dims):
- Medium (200-2000 dims):
- High (> 2000 dims):
- Variable:
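To see where a given featurizer falls in the speed tiers above, a simple wall-clock harness can be used (the `featurize` callable below is a hypothetical stand-in for any featurizer; absolute times vary by machine, so only relative comparisons are meaningful):

```python
import time

def time_featurizer(featurize, smiles_list, repeats=3):
    """Return best-of-N wall-clock seconds to featurize a list of molecules."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for smi in smiles_list:
            featurize(smi)
        best = min(best, time.perf_counter() - start)
    return best

# Dummy stand-in for a real featurizer (any callable taking a SMILES works)
elapsed = time_featurizer(lambda s: [len(s)] * 8, ["CCO", "c1ccccc1"])
```

Taking the best of several repeats reduces noise from caching and one-time model loading, which is why "slowest (first run)" entries should be timed separately from steady-state runs.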