docs/source/package_reference/cartridges.md
Cartridges are a prompt-learning method that stores a compressed long-context representation as a parameterized KV-cache prefix. The core idea comes from the paper *Cartridges: Lightweight and general-purpose long context representations via self-study*.
For a high-level overview and motivation, see the blog post *Cartridges: Storing long contexts in tiny caches with self-study*.
Both Prefix Tuning and Cartridges are served by injecting `past_key_values` (a prefix KV cache) into the base model. A cartridge directly parameterizes the key and value tensors of that prefix (p virtual tokens), and is designed to be initialized from real prefill KV (for example, the KV cache of the first p tokens of a corpus or system prompt). The paper also recommends freezing the first token as an attention sink for stability (`num_frozen_tokens=1` is the default).
Load a trained CARTRIDGE adapter and run generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_path = "path/to/cartridge_adapter"

# Load the base model and attach the trained cartridge adapter.
base = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base, adapter_path)

tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# The cartridge KV prefix is injected automatically during generation.
out = model.generate(**tok("Question about the corpus:", return_tensors="pt"), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
If you need to create and initialize a cartridge before training, see the initialization options below.
The paper discusses a few practical initialization strategies:
- Create a `CartridgeConfig` and start training. This initializes the KV prefix randomly.
- Call `initialize_kv_prefix_from_text(model, tokenizer, text=...)`. This runs a prefill on `text` and copies the resulting KV cache for the first `num_virtual_tokens` tokens into the adapter.
- Call `initialize_kv_prefix_from_past_key_values(model, past_key_values=...)` if you already have a `past_key_values` object from a base-model prefill.
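For example, the text-based initializer could be used roughly as follows. This is a sketch: it assumes the helper is importable from `peft` and is applied to the cartridge-wrapped model created with `get_peft_model`; check the exact import path and signature against the API reference below.

```python
from transformers import AutoTokenizer
from peft import initialize_kv_prefix_from_text  # assumed import location

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# `model` is the cartridge-wrapped model returned by get_peft_model (see the sketch above).
# Prefill on the corpus text and copy the KV cache of the first
# num_virtual_tokens tokens into the adapter.
corpus_text = "..."  # your corpus or system prompt
initialize_kv_prefix_from_text(model, tokenizer, text=corpus_text)
```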
The Cartridges paper proposes a self-study distillation objective: a frozen base model provides teacher logits, and the CARTRIDGE adapter is trained so that the student matches the teacher's next-token distribution over the target segment.
PEFT keeps training logic out of the core library; see the [cartridge self-study example](https://github.com/huggingface/peft/tree/main/examples/cartridge_self_study) for a reference workflow.
The example scripts use the frozen base model as the teacher and the adapted model as the student, so both share the
same underlying checkpoint.
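At its core, the objective in those scripts is logit distillation, so a single training step could look roughly like the sketch below. This is not the exact example code: it assumes a KL-divergence loss between teacher and student next-token distributions, restricted to the target segment via a `loss_mask`, and for brevity it feeds the same batch to both models (in the full self-study recipe the teacher is typically conditioned on the raw context, while the student relies only on the cartridge).

```python
import torch
import torch.nn.functional as F


def self_study_step(student, teacher, batch, optimizer, temperature=1.0):
    """One distillation step: the frozen teacher (base model) provides target
    logits; the student (cartridge-adapted model) learns to match them."""
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    loss_mask = batch["loss_mask"].bool()  # True on target-segment positions only

    # Teacher forward pass without gradients; both models share the same checkpoint,
    # but only the student carries the trainable cartridge prefix.
    with torch.no_grad():
        teacher_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits

    student_logits = student(input_ids=input_ids, attention_mask=attention_mask).logits

    # KL(teacher || student) over next-token distributions at the masked positions.
    teacher_logp = F.log_softmax(teacher_logits[loss_mask] / temperature, dim=-1)
    student_logp = F.log_softmax(student_logits[loss_mask] / temperature, dim=-1)
    loss = F.kl_div(student_logp, teacher_logp, log_target=True, reduction="batchmean")

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The optimizer would be built over only the trainable cartridge parameters (everything else is frozen by PEFT), for example `torch.optim.AdamW(p for p in student.parameters() if p.requires_grad)`.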
To concatenate independently trained cartridges into a single adapter, use `compose_cartridge_adapters(...)`.
[[autodoc]] tuners.cartridge.config.CartridgeConfig
[[autodoc]] tuners.cartridge.model.CartridgeEncoder