This document covers the main functions available in the datamol namespace, conventionally imported as `dm` (`import datamol as dm`).

## `to_mol(mol, ...)`

Convert SMILES string or other molecular representations to RDKit molecule objects.

- Returns: `rdkit.Chem.Mol` object
- Example: `mol = dm.to_mol("CCO")`

## `from_inchi(inchi)`

Convert InChI string to molecule object.

## `from_smarts(smarts)`

Convert SMARTS pattern to molecule object.
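
## `from_selfies(selfies, ...)`

Convert a SELFIES string back to a molecule. In recent datamol releases the function returns a SMILES string by default; pass `as_mol=True` to get a molecule object instead.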

## `copy_mol(mol)`

Create a copy of a molecule object to avoid modifying the original.

## `to_smiles(mol, ...)`

Convert molecule object to SMILES string.

- Common parameters: `canonical=True`, `isomeric=True`

## `to_inchi(mol, ...)`

Convert molecule to InChI string representation.

## `to_inchikey(mol)`

Convert molecule to InChIKey, a fixed-length (27-character) hashed form of the InChI.

## `to_smarts(mol)`

Convert molecule to SMARTS pattern.

## `to_selfies(mol)`

Convert molecule to SELFIES (Self-Referencing Embedded Strings) format.

## `sanitize_mol(mol, ...)`

Enhanced version of RDKit's sanitize operation using mol→SMILES→mol conversion and aromatic nitrogen fixing.
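
## `standardize_mol(mol, disconnect_metals=False, normalize=True, reionize=True, ...)`

Apply comprehensive standardization procedures, controlled by keyword flags including:

- `disconnect_metals`: disconnect metal atoms (off by default)
- `normalize`: normalize functional groups to a consistent representation (on by default)
- `reionize`: reassign ionization states so that the strongest acid groups are ionized first (on by default)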

## `standardize_smiles(smiles, ...)`

Apply SMILES standardization procedures directly to a SMILES string.

## `fix_mol(mol)`

Attempt to fix molecular structure issues automatically.

## `fix_valence(mol)`

Correct valence errors in molecular structures.

## `reorder_atoms(mol, ...)`

Ensure consistent atom ordering for the same molecule regardless of original SMILES representation.

## `remove_hs(mol, ...)`

Remove hydrogen atoms from molecular structure.

## `add_hs(mol, ...)`

Add explicit hydrogen atoms to molecular structure.

## `to_fp(mol, fp_type='ecfp', ...)`

Generate molecular fingerprints for similarity calculations.

- Fingerprint types:
  - `'ecfp'` - Extended Connectivity Fingerprints (Morgan)
  - `'fcfp'` - Functional Connectivity Fingerprints
  - `'maccs'` - MACCS keys
  - `'topological'` - Topological fingerprints
  - `'atompair'` - Atom pair fingerprints
- Common parameters: `n_bits`, `radius`

## `pdist(mols, ...)`

Calculate pairwise Tanimoto distances between all molecules in a list.

- Supports parallel processing via the `n_jobs` parameter

## `cdist(mols1, mols2, ...)`

Calculate Tanimoto distances between two sets of molecules.

## `cluster_mols(mols, cutoff=0.2, feature_fn=None, n_jobs=1)`

Cluster molecules using Butina clustering algorithm.

- Parameters:
  - `cutoff`: Distance threshold (default 0.2)
  - `feature_fn`: Custom function for molecular features
  - `n_jobs`: Parallelization (-1 for all cores)

## `pick_diverse(mols, npick, ...)`

Select diverse subset of molecules based on fingerprint diversity.

## `pick_centroids(mols, npick, ...)`

Select centroid molecules representing clusters.

## `to_graph(mol)`

Convert molecule to a graph representation (a NetworkX graph with one node per atom) for graph-based analysis.

## `get_all_path_between(mol, start, end)`

Find all paths between two atoms in molecular structure.

## `to_df(mols, smiles_column='smiles', mol_column='mol')`

Convert list of molecules to pandas DataFrame.

## `from_df(df, smiles_column='smiles', mol_column='mol')`

Convert pandas DataFrame to list of molecules.