The Hugging Face Transformers library provides access to thousands of pre-trained models for tasks across NLP, computer vision, audio, and multimodal domains. Use this skill to load models, perform inference, and fine-tune on custom data.
Tested against transformers 5.9.x (stable; May 2026). Requires Python 3.10+ and PyTorch 2.4+.
uv pip install "transformers[torch]>=5.9" huggingface_hub datasets evaluate accelerate
For vision tasks, add:
uv pip install timm pillow
For audio tasks, add:
uv pip install librosa soundfile
Check your version:
import transformers
print(transformers.__version__)
Many models on the Hugging Face Hub are gated or private. Authenticate before loading them.
Recommended: CLI login (stores token in ~/.cache/huggingface/token):
hf auth login
Python:
from huggingface_hub import login
login() # Interactive prompt; do not hardcode tokens in scripts
Servers / CI: set HF_TOKEN in the environment (never commit tokens to git or store them in shell profiles):
export HF_TOKEN="..." # Read token from a secret manager, not source code
Get tokens at: https://huggingface.co/settings/tokens
Security: Never paste tokens into notebooks, repos, or shared configs. Prefer hf auth login over exporting tokens in .bashrc or .zshrc.
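On servers and in CI, a script can fail fast when the token is missing instead of hitting an authentication error partway through a download. The following is a minimal sketch; `require_hf_token` and `mask_token` are hypothetical helper names, not part of huggingface_hub, and the only thing assumed from the text above is the `HF_TOKEN` variable.

```python
import os


def require_hf_token(env=None):
    """Return the Hub token from the environment or raise a clear error.

    Hypothetical helper for illustration: it only reads HF_TOKEN and
    never prints or logs the token value.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it from your secret manager "
            "or run `hf auth login`."
        )
    return token


def mask_token(token):
    """Return a redacted form that is safe to show in logs."""
    return token[:3] + "..." if len(token) > 3 else "***"
```

Calling `require_hf_token()` at the top of a job makes a missing secret show up as one readable error. If anything about the token must be logged, log `mask_token(token)` rather than the token itself.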
Transformers v5 is PyTorch-only (TensorFlow and JAX backends were removed). For upgrades from v4, see the v5 migration guide. New projects should pair transformers 5.x with huggingface_hub 1.x.
Gated or custom architectures: accept the model license on the Hub first. Pass trust_remote_code=True only when the model card requires custom code and you have reviewed that code, because the flag runs Python from the model repository on your machine.
Cache location: set HF_HOME for a writable cache root (Hub files default under $HF_HOME/hub).
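To check where downloads will land before starting a large job, the rule above can be sketched in a few lines. This is an illustration of the documented default only: `resolve_hub_cache` is a hypothetical name, the real resolution happens inside huggingface_hub, and the sketch ignores every override other than `HF_HOME`.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Approximate the Hub cache directory from HF_HOME.

    Illustrative only: falls back to ~/.cache/huggingface when HF_HOME
    is unset and does not consider any other environment variables.
    """
    env = os.environ if env is None else env
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


cache_dir = resolve_hub_cache()
# os.access is only meaningful once the directory exists.
print(cache_dir, "writable:", cache_dir.exists() and os.access(cache_dir, os.W_OK))
```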
Use the Pipeline API for fast inference without manual configuration:
from transformers import pipeline
# Text generation (prefer max_new_tokens for causal LMs)
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B")
result = generator("The future of AI is", max_new_tokens=50)
# Text classification
classifier = pipeline("text-classification")
result = classifier("This movie was excellent!")
# Question answering
qa = pipeline("question-answering")
result = qa(question="What is AI?", context="AI is artificial intelligence...")
Pipelines bundle preprocessing, the model forward pass, and postprocessing behind a single call. Supported tasks include text generation, classification, NER, question answering, summarization, translation, image classification, object detection, audio classification, and more.
When to use: Quick prototyping, simple inference tasks, no custom preprocessing needed.
See references/pipelines.md for comprehensive task coverage and optimization.
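Pipeline results usually need a little post-processing before they are useful downstream. The sketch below assumes the output shape a text-classification pipeline returns for a batch with default settings: a list of dicts with `label` and `score` keys. `confident_labels` is a hypothetical helper, and the sample data is a stand-in, not real model output.

```python
def confident_labels(results, threshold=0.9):
    """Keep predictions whose score meets the threshold.

    Assumes `results` is a list of {"label": str, "score": float} dicts.
    Low-confidence items become None so the output stays aligned with
    the inputs.
    """
    return [r["label"] if r["score"] >= threshold else None for r in results]


# Stand-in data in the assumed output shape (not real model output).
sample = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.61},
]
print(confident_labels(sample))  # ['POSITIVE', None]
```

Returning `None` rather than dropping items keeps each prediction paired with its input, which makes it easier to route uncertain cases to review.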
Load pre-trained models with fine-grained control over configuration, device placement, and precision.
When to use: Custom model initialization, advanced device management, model inspection.
See references/models.md for loading patterns and best practices.
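Precision and device placement are mostly decided by memory. As a rough aid, the sketch below estimates how much memory the weights alone need at a given precision. The helper name and the 1.5e9 parameter count are illustrative assumptions; the bytes-per-parameter figures are the standard sizes of each dtype.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float16"):
    """Approximate memory for the weights alone, in GiB.

    Excludes activations, KV cache, and optimizer state, so treat the
    result as a lower bound when choosing device placement.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Illustrative parameter count for a model of roughly 1.5B parameters.
for dtype in ("float32", "bfloat16", "int8"):
    print(dtype, round(weight_memory_gib(1.5e9, dtype), 2), "GiB")
```

Halving the precision halves the weight footprint, which is often the difference between fitting on one device and needing to shard or offload.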
Generate text with LLMs using various decoding strategies (greedy, beam search, sampling) and control parameters (temperature, top-k, top-p).
When to use: Creative text generation, code generation, conversational AI, text completion.
See references/generation.md for generation strategies and parameters.
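To build intuition for how temperature, top-k, and top-p interact, here is a toy pure-Python sampler over a list of logits. It is a sketch for understanding, not the library's implementation, and `filtered_probs` and `sample_token` are hypothetical names. It assumes `temperature > 0`, and the usual convention that the nucleus includes the token that crosses the `top_p` threshold.

```python
import math
import random


def filtered_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Turn raw logits into a sampling distribution.

    Applies temperature scaling, then top-k, then top-p (nucleus)
    filtering, and renormalizes over the surviving tokens.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)

    # Keep the smallest prefix whose renormalized cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index; top_k=1 reduces to greedy decoding."""
    dist = filtered_probs(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


print(sample_token([2.0, 1.0, 0.0], top_k=1))  # 0
```

Lower temperatures concentrate probability on the top token, while top-k and top-p cut off the unlikely tail. Beam search is a different family (it tracks several candidate sequences) and is not covered by this sketch.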
Fine-tune pre-trained models on custom datasets using the Trainer API with automatic mixed precision, distributed training, and logging.
When to use: Task-specific model adaptation, domain adaptation, improving model performance.
See references/training.md for training workflows and best practices.
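Trainer accepts a `compute_metrics` callable for evaluation. The sketch below is one possible accuracy function for a classification model. It assumes the evaluation object unpacks into `(logits, labels)` and that `logits` is a sequence of per-class score rows; written this way it runs on nested lists as well as NumPy arrays.

```python
def compute_metrics(eval_pred):
    """Accuracy for a classification model.

    Assumes `eval_pred` unpacks into (logits, labels), where each row of
    logits holds one score per class.
    """
    logits, labels = eval_pred
    # Argmax over each row without requiring NumPy.
    predictions = [max(range(len(row)), key=lambda j: row[j]) for row in logits]
    correct = sum(int(p) == int(y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Stand-in logits and labels for illustration.
print(compute_metrics(([[0.1, 0.9], [0.8, 0.2]], [1, 0])))  # {'accuracy': 1.0}
```

Pass it as `Trainer(..., compute_metrics=compute_metrics)` together with an evaluation dataset; the returned keys are what appear in the evaluation logs.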
Convert text to tokens and token IDs for model input, with padding, truncation, and special token handling.
When to use: Custom preprocessing pipelines, understanding model inputs, batch processing.
See references/tokenizers.md for tokenization details.
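Padding and truncation are easier to reason about with a toy model of what they do to a batch. The sketch below pads or cuts every sequence to a fixed `max_length` and builds the matching attention mask. It is not the library's implementation: `pad_and_truncate` is a hypothetical name, the token IDs are made up, and real tokenizers also handle special tokens and model-specific padding sides.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Right-pad or truncate token-ID lists to exactly max_length.

    Returns input_ids plus an attention_mask (1 = real token, 0 = padding).
    """
    input_ids, attention_mask = [], []
    for ids in batch:
        ids = ids[:max_length]  # truncation
        pad = max_length - len(ids)
        input_ids.append(ids + [pad_id] * pad)  # padding
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs for illustration.
encoded = pad_and_truncate([[101, 7592, 102], [101, 7592, 2088, 999, 102]], max_length=4)
print(encoded["input_ids"])       # [[101, 7592, 102, 0], [101, 7592, 2088, 999]]
print(encoded["attention_mask"])  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

The attention mask is what lets the model ignore padded positions, which is why it travels alongside `input_ids` in tokenizer output.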
For straightforward tasks, use pipelines:
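from transformers import pipeline
# Replace "task-name" and "model-id" with a real task and Hub checkpoint
pipe = pipeline("task-name", model="model-id")
output = pipe(input_data)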
For advanced control, load model and tokenizer separately:
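from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("model-id")
model = AutoModelForCausalLM.from_pretrained("model-id", device_map="auto")
# Move inputs to the model's device; device_map="auto" may place it on a GPU
inputs = tokenizer("text", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Drop special tokens such as EOS from the decoded text
result = tokenizer.decode(outputs[0], skip_special_tokens=True)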
For task adaptation, use Trainer:
from transformers import Trainer, TrainingArguments
# Assumes `model` and a tokenized `train_dataset` are already defined
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
For detailed information on specific components:
references/pipelines.md - All supported tasks and optimization
references/models.md - Loading, saving, and configuration
references/generation.md - Text generation strategies and parameters
references/training.md - Fine-tuning with Trainer API
references/tokenizers.md - Tokenization and preprocessing