skills/esm/SKILL.md
ESM provides protein language models for understanding, generating, and designing proteins. Use this skill for current EvolutionaryScale/Biohub workflows: ESM3 for generative design, ESMC for representation learning and embeddings, hosted Forge/Biohub inference, and ESMFold2 all-atom structure prediction.
Generate novel protein sequences with desired properties using multimodal generative modeling.
When to use:
Basic usage:
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig
# Load local open weights after accepting the license on Hugging Face.
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cuda")
# Create protein prompt
protein = ESMProtein(sequence="MPRT___KEND") # '_' represents masked positions
# Generate completion
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
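The masked prompt above can also be built programmatically. The following is a minimal sketch using plain string operations; `mask_region` and `count_masked` are hypothetical helpers, not part of the esm package. Counting the masked positions is useful context when choosing `num_steps`, though how the two relate in practice is an assumption to verify against the ESM3 documentation.

```python
def mask_region(sequence: str, start: int, end: int, mask: str = "_") -> str:
    """Return `sequence` with residues in the half-open range [start, end) masked."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError(f"invalid region [{start}, {end}) for length {len(sequence)}")
    return sequence[:start] + mask * (end - start) + sequence[end:]


def count_masked(sequence: str, mask: str = "_") -> int:
    """Count the masked positions in a prompt string."""
    return sequence.count(mask)


# Hypothetical starting sequence; positions 4..6 (0-indexed) get masked.
prompt = mask_region("MPRTAAAKEND", 4, 7)
print(prompt)                # MPRT___KEND
print(count_masked(prompt))  # 3
```

The resulting string can be passed as `ESMProtein(sequence=prompt)`.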
For remote/cloud usage via Forge API:
import os
import esm
from esm.sdk.api import ESMProtein, GenerationConfig
# Same interface as local ESM3; token from ESM_API_KEY (see Authentication)
model = esm.sdk.client("esm3-medium-2024-08", token=os.environ["ESM_API_KEY"])
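# Build the prompt, then generate ('_' represents masked positions)
protein = ESMProtein(sequence="MPRT___KEND")
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))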
See references/esm3-api.md for detailed ESM3 model specifications, advanced generation configurations, and multimodal prompting examples.
Use ESM3's structure track for structure prediction from sequence or inverse folding (sequence design from structure).
Structure prediction:
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig
# Predict structure from sequence
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
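protein_with_structure = model.generate(
    protein,
    # A fully specified sequence contains no "_" positions, so
    # sequence.count("_") would be 0; use a fixed step count instead.
    GenerationConfig(track="structure", num_steps=8),
)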
# Access predicted structure
coordinates = protein_with_structure.coordinates # 3D coordinates
pdb_string = protein_with_structure.to_pdb()
Inverse folding (sequence from structure):
# Design sequence for a target structure
protein_with_structure = ESMProtein.from_pdb("target_structure.pdb")
protein_with_structure.sequence = None # Remove sequence
# Generate sequence that folds to this structure
designed_protein = model.generate(
    protein_with_structure,
    GenerationConfig(track="sequence", num_steps=50, temperature=0.7),
)
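One common sanity check after inverse folding is sequence recovery: the fraction of positions where the designed sequence matches the native one. The sketch below assumes the native sequence was saved before it was set to `None`; `sequence_identity` is a hypothetical helper, and both example sequences are made up for illustration.

```python
def sequence_identity(designed: str, reference: str) -> float:
    """Fraction of positions where two equal-length sequences agree."""
    if len(designed) != len(reference):
        raise ValueError("sequences must have equal length")
    if not designed:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(designed, reference))
    return matches / len(designed)


# Illustrative sequences only; they differ at two of ten positions.
native = "MKTAYIAKQR"
designed = "MKTAYLAKQK"
print(f"{sequence_identity(designed, native):.2f}")  # 0.80
```

High recovery does not by itself show that the design folds to the target, so treat it as a quick filter rather than validation.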
Generate high-quality embeddings for downstream tasks like function prediction, classification, or similarity analysis.
When to use:
Basic usage:
from esm.models.esmc import ESMC
from esm.sdk.api import ESMProtein, LogitsConfig
# Load ESM C model
model = ESMC.from_pretrained("esmc_300m").to("cuda")
# Get embeddings
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_tensor = model.encode(protein)
logits_output = model.logits(
    protein_tensor,
    LogitsConfig(sequence=True, return_embeddings=True),
)
embeddings = logits_output.embeddings
Batch processing:
# Encode multiple proteins
proteins = [
    ESMProtein(sequence="MPRTKEIND..."),
    ESMProtein(sequence="AGLIVHSPQ..."),
    ESMProtein(sequence="KTEFLNDGR..."),
]
embeddings_list = [
    model.logits(
        model.encode(p),
        LogitsConfig(sequence=True, return_embeddings=True),
    ).embeddings
    for p in proteins
]
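For similarity analysis, per-residue embeddings are often reduced to one fixed-size vector per protein and then compared. The sketch below shows mean pooling followed by cosine similarity in plain Python. It assumes the embeddings have already been converted to nested lists (for example with a tensor's `tolist()`); `mean_pool` and `cosine_similarity` are hypothetical helpers, and the toy vectors are not real ESM C output.

```python
import math
from typing import Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> list:
    """Average per-residue vectors into one fixed-size vector."""
    if not per_residue:
        raise ValueError("need at least one residue vector")
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[i] for row in per_residue) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy data: two "proteins" with 3-dimensional per-residue vectors.
emb_a = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
emb_b = [[2.0, 2.0, 0.0]]
vec_a = mean_pool(emb_a)  # [1.0, 1.0, 0.0]
vec_b = mean_pool(emb_b)  # [2.0, 2.0, 0.0]
print(round(cosine_similarity(vec_a, vec_b), 6))  # 1.0
```

Mean pooling is one choice among several; for real workloads the same operations are usually done directly on tensors.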
See references/esm-c-api.md for ESM C model details, efficiency comparisons, and advanced embedding strategies.
Use ESM3's function track to generate proteins with specific functional annotations or predict function from sequence.
Function-conditioned generation:
from esm.sdk.api import ESMProtein, FunctionAnnotation, GenerationConfig
# Create protein with desired function
protein = ESMProtein(
    sequence="_" * 200,  # Generate 200 residue protein
    function_annotations=[
        FunctionAnnotation(label="fluorescent_protein", start=50, end=150)
    ],
)
# Generate sequence with specified function
functional_protein = model.generate(
    protein,
    GenerationConfig(track="sequence", num_steps=200),
)
Iteratively refine protein designs using ESM3's chain-of-thought generation approach.
from esm.sdk.api import ESMProtein, GenerationConfig
# Multi-step refinement
protein = ESMProtein(sequence="MPRT" + "_" * 100 + "KEND")
# Step 1: Generate initial structure
config = GenerationConfig(track="structure", num_steps=50)
protein = model.generate(protein, config)
# Step 2: Refine sequence based on structure
config = GenerationConfig(track="sequence", num_steps=50, temperature=0.5)
protein = model.generate(protein, config)
# Step 3: Predict function
config = GenerationConfig(track="function", num_steps=20)
protein = model.generate(protein, config)
Process multiple proteins efficiently using Forge's async methods.
import os
import asyncio
import esm
from esm.sdk.api import ESMProtein, GenerationConfig
client = esm.sdk.client("esm3-medium-2024-08", token=os.environ["ESM_API_KEY"])
# Async batch processing
async def batch_generate(proteins_list):
    tasks = [
        client.async_generate(protein, GenerationConfig(track="sequence"))
        for protein in proteins_list
    ]
    return await asyncio.gather(*tasks)
# Execute
proteins = [ESMProtein(sequence=f"MPRT{'_' * 50}KEND") for _ in range(10)]
results = asyncio.run(batch_generate(proteins))
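Launching every request at once can run into rate limits on a hosted API. The following sketch caps the number of in-flight requests with `asyncio.Semaphore` and collects per-item failures instead of aborting the whole batch. `gather_bounded` is a hypothetical helper, and `fake_generate` is a stand-in coroutine so the example runs without network access; in real use the worker would wrap the client's async call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def gather_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrency: int = 4,
) -> List[Any]:
    """Run `worker` over `items` with at most `max_concurrency` running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: Any) -> Any:
        async with semaphore:
            return await worker(item)

    # return_exceptions=True keeps one failure from cancelling the rest;
    # results come back in input order.
    return await asyncio.gather(*(run_one(i) for i in items), return_exceptions=True)


async def fake_generate(sequence: str) -> str:
    """Stand-in for a remote call: fills masks with 'A', rejects 'X'."""
    await asyncio.sleep(0.01)
    if "X" in sequence:
        raise ValueError(f"unsupported residue in {sequence!r}")
    return sequence.replace("_", "A")


results = asyncio.run(
    gather_bounded(["MP_T", "MXRT", "K_ND"], fake_generate, max_concurrency=2)
)
for result in results:
    print(repr(result))
```

Each entry in `results` is either a return value or an exception instance, so callers can retry only the failed items.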
See references/forge-api.md for detailed Forge API documentation, authentication, rate limits, and batch processing patterns.
ESM3 Models (Generative):
- esm3-open (1.4B) - Open weights, local usage after accepting the Hugging Face license
- esm3-medium-2024-08 (7B) - Best balance of quality and speed (Forge only)
- esm3-large-2024-03 (98B) - Highest quality, slower (Forge only)
ESM C Models (Embeddings):
- esmc_300m / esmc-300m-2024-12 (30 layers) - Lightweight, fast inference (open weights, local)
- esmc_600m / esmc-600m-2024-12 (36 layers) - Balanced performance (open weights, local)
- esmc-6b-2024-12 (80 layers) - Maximum quality (Forge API; local 6B weights require Forge or SageMaker)
Local ESMC.from_pretrained() examples use underscore aliases (esmc_300m, esmc_600m). Hosted API clients use dated model IDs such as esmc-600m-2024-12.
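The split between local aliases and hosted model IDs can be captured in a small lookup so that calling code does not mix the two. This is a minimal sketch using only the name pairs listed above; `LOCAL_TO_HOSTED` and `hosted_model_id` are hypothetical names, not part of the esm package.

```python
# Local underscore aliases mapped to the dated hosted IDs listed above.
LOCAL_TO_HOSTED = {
    "esmc_300m": "esmc-300m-2024-12",
    "esmc_600m": "esmc-600m-2024-12",
}


def hosted_model_id(name: str) -> str:
    """Translate a local alias to its hosted ID; pass other names through unchanged."""
    return LOCAL_TO_HOSTED.get(name, name)


print(hosted_model_id("esmc_600m"))        # esmc-600m-2024-12
print(hosted_model_id("esmc-6b-2024-12"))  # esmc-6b-2024-12
```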
Selection criteria:
- esm3-open or esmc_300m
- esm3-medium-2024-08 via Forge
- esm3-large-2024-03 or esmc-6b-2024-12 via Forge
Install from PyPI (esm on PyPI by EvolutionaryScale). Current PyPI release: 3.2.3 (Oct 14, 2025). Requires Python >=3.12,<3.13.
Basic installation:
uv pip install "esm==3.2.3"
With Flash Attention (recommended for faster inference on NVIDIA GPUs):
uv pip install "esm==3.2.3"
uv pip install flash-attn --no-build-isolation
The Forge client ships with the esm package - no extra install for ESM3 or ESMC Forge inference.
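Because the package requires Python >=3.12,<3.13, it can help to check the interpreter before installing. The sketch below is one way to do that with the standard library; `python_supported` is a hypothetical helper.

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if `version` satisfies >=3.12,<3.13."""
    return (3, 12) <= tuple(version[:2]) < (3, 13)


print(python_supported((3, 12)))  # True
print(python_supported((3, 11)))  # False
print(python_supported((3, 13)))  # False
```

Called with no argument, the function checks the running interpreter.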
Forge API access requires an API key. Never hardcode tokens in scripts or commit them to version control.
- ESM_API_KEY is already set in the environment.
- .env for ESM_API_KEY only (do not load unrelated secrets).
import os
token = os.environ["ESM_API_KEY"] # raises KeyError if unset
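When the key lives in a `.env` file, the guidance to read ESM_API_KEY only can be followed by parsing the file directly rather than loading every entry into the environment. The following is a minimal sketch that assumes a simple `KEY=VALUE` file format; `read_key_from_env_file` is a hypothetical helper, and the token shown is a placeholder.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_key_from_env_file(path: str, key: str = "ESM_API_KEY") -> Optional[str]:
    """Return the value of `key` from a KEY=VALUE file, ignoring all other entries."""
    file = Path(path)
    if not file.is_file():
        return None
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


# Demonstration with a throwaway file; other secrets in it are never returned.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("OTHER_SECRET=do-not-load\nESM_API_KEY='example-token'\n")
    print(read_key_from_env_file(env_path))  # example-token
```

The returned value can be passed as `token=` without ever exporting unrelated entries from the file.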
esm.sdk.client() reads ESM_API_KEY automatically when token is omitted. Keep endpoint URLs fixed to trusted hosts such as https://forge.evolutionaryscale.ai or https://biohub.ai; do not take API hosts from untrusted user input.
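The trusted-host rule can be enforced with an explicit allowlist check before any URL reaches the client. This sketch uses `urllib.parse.urlsplit` and the two hosts named above; `TRUSTED_HOSTS` and `is_trusted_endpoint` are hypothetical names, and requiring `https` is an added assumption.

```python
from urllib.parse import urlsplit

# Hosts taken from the guidance above; extend only with hosts you control or trust.
TRUSTED_HOSTS = frozenset({"forge.evolutionaryscale.ai", "biohub.ai"})


def is_trusted_endpoint(url: str) -> bool:
    """Accept only https URLs whose hostname exactly matches the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in TRUSTED_HOSTS


print(is_trusted_endpoint("https://forge.evolutionaryscale.ai"))  # True
print(is_trusted_endpoint("http://forge.evolutionaryscale.ai"))   # False
print(is_trusted_endpoint("https://biohub.ai.evil.example"))      # False
```

Matching the parsed hostname exactly, rather than checking a string prefix, avoids lookalike hosts that merely begin with a trusted name.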
Biohub platform: EvolutionaryScale and Forge now surface current hosted models through biohub.ai. SDK class names may still reference "Forge". See references/biohub-platform.md for ESMFold2 and Biohub-specific setup.
For detailed examples and complete workflows, see references/workflows.md.
This skill includes comprehensive reference documentation:
- references/esm3-api.md - ESM3 model architecture, API reference, generation parameters, and multimodal prompting
- references/esm-c-api.md - ESM C model details, embedding strategies, and performance optimization
- references/forge-api.md - Forge platform documentation, authentication, batch processing, and deployment
- references/biohub-platform.md - Biohub API migration, ESMFold2 structure prediction, and developer-console auth
- references/workflows.md - Complete examples and common workflow patterns
These references contain detailed API specifications, parameter descriptions, and advanced usage patterns. Load them as needed for specific tasks.
For generation tasks:
esm3-open)
For embedding tasks:
For production deployment:
ESM is designed for beneficial applications in protein engineering, drug discovery, and scientific research. Follow the Responsible Biodesign Framework (https://responsiblebiodesign.ai/) and Biohub Acceptable Use Policy (https://biohub.org/acceptable-use-policy/) when designing novel proteins. Consider biosafety and ethical implications of protein designs before experimental validation.