optional-skills/mlops/saelens/references/README.md
This directory contains comprehensive reference materials for SAELens.
pip install sae-lens
Requirements: Python 3.10+, transformer-lens>=2.0.0
from transformer_lens import HookedTransformer
from sae_lens import SAE
# Load model and SAE
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda")
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
    device="cuda",
)
# Encode activations to sparse features
tokens = model.to_tokens("Hello world")
_, cache = model.run_with_cache(tokens)
activations = cache["resid_pre", 8]
features = sae.encode(activations) # Sparse feature activations
reconstructed = sae.decode(features) # Reconstructed activations
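To sanity-check the encode/decode round trip above, it can help to measure how sparse the features are and how much of the activation variance the reconstruction retains. The following is a minimal sketch, not part of SAELens: the helper name `sae_diagnostics` is hypothetical, and it assumes the tensors have first been converted to NumPy arrays (for example with `.detach().cpu().numpy()`).

```python
import numpy as np


def sae_diagnostics(activations, features, reconstructed):
    """Return (mean L0, fraction of variance explained) for one batch.

    Hypothetical helper, not a SAELens API. The last axis is the model
    dimension for `activations` and `reconstructed`, and the SAE feature
    dimension for `features`.
    """
    activations = np.asarray(activations, dtype=np.float64)
    features = np.asarray(features, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)

    # L0: average number of non-zero features per token position.
    l0 = np.count_nonzero(features, axis=-1).mean()

    # Flatten batch and position axes so each row is one token's activation.
    flat = activations.reshape(-1, activations.shape[-1])
    flat_rec = reconstructed.reshape(-1, reconstructed.shape[-1])

    # Fraction of variance explained: 1 - residual variance / total variance.
    residual = ((flat - flat_rec) ** 2).sum()
    total = ((flat - flat.mean(axis=0)) ** 2).sum()
    return float(l0), float(1.0 - residual / total)
```

A low L0 relative to the SAE width together with a variance fraction close to 1 suggests the SAE is both sparse and faithful on that input. The thresholds that count as "good" depend on the model and layer.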
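SAEs decompose dense neural activations into sparse, interpretable features. They are trained to minimize reconstruction error plus a sparsity penalty on the feature activations: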
Loss = MSE(original, reconstructed) + L1_coefficient × L1(features)
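The loss above can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the SAELens training code: the function name `sae_loss` is hypothetical, and the reductions (MSE averaged over all elements, L1 summed over features and averaged over tokens) are one common convention that the actual implementation may not share.

```python
import numpy as np


def sae_loss(original, reconstructed, features, l1_coefficient):
    """MSE reconstruction loss plus an L1 sparsity penalty.

    Hypothetical helper for illustration; the reduction choices are
    assumptions, not taken from SAELens.
    """
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    features = np.asarray(features, dtype=np.float64)

    # Reconstruction term: mean squared error over every element.
    mse = np.mean((original - reconstructed) ** 2)

    # Sparsity term: L1 norm of each token's feature vector, averaged over tokens.
    l1 = np.abs(features).sum(axis=-1).mean()

    return float(mse + l1_coefficient * l1)
```

Raising `l1_coefficient` pushes more feature activations toward zero at the cost of a larger reconstruction error, which is the trade-off the formula expresses.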
| Release | Model | Description |
|---|---|---|
| gpt2-small-res-jb | GPT-2 Small | Residual stream SAEs |
| gemma-2b-res | Gemma 2B | Residual stream SAEs |
| Various | Search HuggingFace | Community-trained SAEs |