# TimesFM System Requirements
TimesFM can run on a variety of hardware configurations. This guide helps you choose the right setup and tune performance for your machine.
**Low-memory machines:** use `per_core_batch_size=4` and `max_context=512`:

```python
model.compile(timesfm.ForecastConfig(
    max_context=512,
    max_horizon=128,
    per_core_batch_size=4,
    normalize_inputs=True,
    use_continuous_quantile_head=True,
    fix_quantile_crossing=True,
))
```
**Mid-range machines:** use `per_core_batch_size=32` (CPU) or `64` (GPU) and `max_context=1024`:

```python
model.compile(timesfm.ForecastConfig(
    max_context=1024,
    max_horizon=256,
    per_core_batch_size=64,
    normalize_inputs=True,
    use_continuous_quantile_head=True,
    fix_quantile_crossing=True,
))
```
**High-end machines:** use `per_core_batch_size=128–256` and `max_context=4096` or higher:

```python
model.compile(timesfm.ForecastConfig(
    max_context=4096,
    max_horizon=256,
    per_core_batch_size=128,
    normalize_inputs=True,
    use_continuous_quantile_head=True,
    fix_quantile_crossing=True,
))
```
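As a rough illustration, the tiers above can be collapsed into a small helper that picks settings from available memory. This is a sketch only; the RAM thresholds are assumptions, not official guidance, and `recommended_config` is a hypothetical name:

```python
def recommended_config(ram_gb: float, has_gpu: bool = False) -> dict:
    """Map available memory to the batch/context tiers above (boundaries are illustrative)."""
    if ram_gb < 16:  # low-memory machines
        return {"max_context": 512, "max_horizon": 128, "per_core_batch_size": 4}
    if ram_gb < 64:  # mid-range machines
        return {"max_context": 1024, "max_horizon": 256,
                "per_core_batch_size": 64 if has_gpu else 32}
    # high-end machines
    return {"max_context": 4096, "max_horizon": 256, "per_core_batch_size": 128}
```

The resulting dict can be splatted into `timesfm.ForecastConfig(**cfg, ...)` alongside the normalization and quantile options shown above.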
Approximate RAM usage during inference:
| Component | TimesFM 2.5 (200M) | TimesFM 2.0 (500M) |
|---|---|---|
| Model weights | ~800 MB | ~2 GB |
| Runtime overhead | ~500 MB | ~1 GB |
| Input/output buffers | ~200 MB per 1000 series | ~500 MB per 1000 series |
| Total (small batch) | ~1.5 GB | ~3.5 GB |
| Total (large batch) | ~3 GB | ~6 GB |
Formula: RAM ≈ model_weights + 0.5 GB + (0.2 MB × num_series × context_length / 1000)
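The formula can be computed directly. A minimal sketch (units in GB; the 0.2 MB constant comes from the input/output-buffer row of the table, and `estimate_ram_gb` is a hypothetical helper name):

```python
def estimate_ram_gb(model_weights_gb: float, num_series: int, context_length: int) -> float:
    """RAM ≈ weights + 0.5 GB runtime overhead + 0.2 MB per series per 1000 context steps."""
    buffers_gb = 0.0002 * num_series * context_length / 1000  # 0.2 MB = 0.0002 GB
    return model_weights_gb + 0.5 + buffers_gb

# e.g. TimesFM 2.5 (~0.8 GB weights), 1000 series at context 512:
# estimate_ram_gb(0.8, 1000, 512) ≈ 1.4 GB
```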
Approximate GPU memory (VRAM) usage during inference:
| Component | TimesFM 2.5 (200M) |
|---|---|
| Model weights | ~800 MB |
| KV cache + activations | ~200–500 MB (scales with context) |
| Batch buffers | ~100 MB per 100 series at context=1024 |
| Total (batch=32) | ~1.2 GB |
| Total (batch=128) | ~1.8 GB |
| Total (batch=256) | ~2.5 GB |
Disk space needed for the model download:
| Item | Size |
|---|---|
| TimesFM 2.5 safetensors | ~800 MB |
| Hugging Face cache overhead | ~200 MB |
| Total download | ~1 GB |
Model weights are downloaded once from Hugging Face Hub and cached in `~/.cache/huggingface/` (or `$HF_HOME`).
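To see where the weights will land before downloading, the default cache location can be resolved with a stdlib-only sketch that mirrors the Hub's layout (`HF_HOME` is the documented override; `hf_cache_dir` is a hypothetical helper name):

```python
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    """Resolve the Hugging Face cache root: $HF_HOME if set, else ~/.cache/huggingface."""
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home)
    return Path.home() / ".cache" / "huggingface"
```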
NVIDIA GPUs (CUDA):
| GPU | VRAM | Recommended batch | Notes |
|---|---|---|---|
| RTX 3060 | 12 GB | 64 | Good entry-level |
| RTX 3090 / 4090 | 24 GB | 256 | Excellent for production |
| A100 (40 GB) | 40 GB | 512 | Cloud/HPC |
| A100 (80 GB) | 80 GB | 1024 | Cloud/HPC |
| T4 | 16 GB | 128 | Cloud (Colab, AWS) |
| V100 | 16–32 GB | 128–256 | Cloud |
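The recommended batch sizes in the table scale roughly with VRAM. A hypothetical helper, with thresholds taken directly from the table rows (treat it as a starting point, not a guarantee against OOM):

```python
def batch_for_vram(vram_gb: float) -> int:
    """Pick a starting per_core_batch_size from GPU VRAM (thresholds from the table above)."""
    if vram_gb >= 80:
        return 1024
    if vram_gb >= 40:
        return 512
    if vram_gb >= 24:
        return 256
    if vram_gb >= 16:
        return 128
    return 64  # 12 GB-class cards such as the RTX 3060
```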
Apple Silicon (MPS):
| Chip | Unified Memory | Recommended batch | Notes |
|---|---|---|---|
| M1 | 8–16 GB | 16–32 | Works, slower than CUDA |
| M1 Pro/Max | 16–64 GB | 32–128 | Good performance |
| M2/M3/M4 Pro/Max | 18–128 GB | 64–256 | Excellent |
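Backend choice follows directly from the tables above: prefer CUDA, then Apple's MPS, then CPU. A sketch of that selection logic, written as a pure function so the inputs are explicit (in practice the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then the Apple MPS backend, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```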
CPU-only inference works on any CPU with sufficient RAM; expect it to run 5–20× slower than on a GPU.
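On CPU, throughput depends heavily on thread count. A heuristic sketch that leaves one logical core free for the OS (the `reserve` parameter and `cpu_threads` name are illustrative, not part of TimesFM):

```python
import os

def cpu_threads(reserve: int = 1) -> int:
    """Use all but `reserve` logical cores for inference (heuristic, not official guidance)."""
    return max(1, (os.cpu_count() or 1) - reserve)

# Applied before forecasting, e.g.:
# torch.set_num_threads(cpu_threads())
```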
Core dependencies:
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.12+ |
| numpy | 1.26.4 | latest |
| torch | 2.0.0 | latest |
| huggingface_hub | 0.23.0 | latest |
| safetensors | 0.5.3 | latest |
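A pre-flight check against the minimums above can catch environment problems before the first forecast. A stdlib-only sketch (`importlib.metadata` reads installed package versions; the naive tuple compare ignores pre-release suffixes, so use the `packaging` library for anything stricter):

```python
import sys
from importlib.metadata import version, PackageNotFoundError

MINIMUMS = {"numpy": "1.26.4", "torch": "2.0.0",
            "huggingface_hub": "0.23.0", "safetensors": "0.5.3"}

def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the minimums above are met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} < 3.10")
    for pkg, minimum in MINIMUMS.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} not installed (need >= {minimum})")
            continue
        base = installed.split("+")[0]  # drop local tags like +cu118
        try:
            inst = tuple(int(p) for p in base.split(".")[:3])
        except ValueError:
            continue  # non-numeric version; skip the naive compare
        if inst < tuple(int(p) for p in minimum.split(".")):
            problems.append(f"{pkg} {installed} < {minimum}")
    return problems
```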
Optional packages:
| Package | Purpose | Install |
|---|---|---|
| jax | Flax backend | pip install jax[cuda] |
| flax | Flax backend | pip install flax |
| scikit-learn | XReg covariates | pip install scikit-learn |
Operating system support:
| OS | Status | Notes |
|---|---|---|
| Linux (Ubuntu 20.04+) | ✅ Fully supported | Best performance with CUDA |
| macOS 13+ (Ventura) | ✅ Fully supported | MPS acceleration on Apple Silicon |
| Windows 11 + WSL2 | ✅ Supported | Use WSL2 for best experience |
| Windows (native) | ⚠️ Partial | PyTorch works, some edge cases |
**Out of memory:** lower the batch size and context, and process inputs in chunks:

```python
# Reduce batch size and context
model.compile(timesfm.ForecastConfig(
    per_core_batch_size=4,  # start very small
    max_context=512,        # reduce context
    ...
))

# Process in chunks
for i in range(0, len(inputs), 50):
    chunk = inputs[i:i + 50]
    p, q = model.forecast(horizon=H, inputs=chunk)
```
**Slow inference:** confirm matmul precision is set and shorten the context:

```python
# Ensure matmul precision is set
import torch
torch.set_float32_matmul_precision("high")

# Use smaller context
model.compile(timesfm.ForecastConfig(
    max_context=256,  # shorter context = faster
    ...
))
```
**Not enough disk space:** point the Hugging Face cache at a larger volume, or fetch the weights manually:

```bash
# Set a different cache directory
export HF_HOME=/path/with/more/space

# Or download manually
huggingface-cli download google/timesfm-2.5-200m-pytorch
```