# AdaMSS Fine-Tuning Examples (`examples/adamss_finetuning`)
AdaMSS (Adaptive Matrix Decomposition with Subspace Selection) is a parameter-efficient fine-tuning method that decomposes weight matrices via SVD into low-rank subspaces. It trains only ~0.07% of the parameters updated by full fine-tuning (e.g., 59K for ViT-Base vs. 86M) while maintaining competitive performance.
The method optionally supports ASA (Adaptive Subspace Allocation), which dynamically selects subspaces during training to further improve efficiency and performance.
See the paper for more details.
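The core idea can be illustrated with a few lines of tensor algebra. The sketch below is illustrative only, not the library's internal implementation: the shapes, the per-subspace factors, and the binary mask are simplifying assumptions used to show how a weight is factored by SVD, split into K subspaces, and updated only in the active subspaces.

```python
# Illustrative sketch of the AdaMSS idea (simplified; not the PEFT internals).
import torch

d_out, d_in, r, K, ri = 768, 768, 100, 10, 3   # e.g. a ViT-Base attention projection
W = torch.randn(d_out, d_in)                    # stand-in for a pretrained weight

# SVD of the pretrained weight, truncated to rank r.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
U, S, Vh = U[:, :r], S[:r], Vh[:r, :]

# Split the r singular directions into K contiguous subspaces.
chunk = r // K
mask = torch.ones(K)          # 1 = subspace active, 0 = masked out (ASA would decide this)
delta = torch.zeros_like(W)
for k in range(K):
    sl = slice(k * chunk, (k + 1) * chunk)
    # A small rank-ri trainable update lives inside each subspace (random here, for illustration).
    B_k = torch.randn(chunk, ri) * 1e-3
    A_k = torch.randn(ri, chunk) * 1e-3
    delta += mask[k] * U[:, sl] @ (B_k @ A_k) @ torch.diag(S[sl]) @ Vh[sl, :]

W_adapted = W + delta          # adapted weight; only the small per-subspace factors train
```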
Install from local source:

```bash
cd peft-main && pip install -e .
pip install transformers datasets torch torchvision evaluate accelerate scikit-learn
```
Verify installation:

```bash
python -c "from peft import AdamssConfig; print('AdaMSS ready')"
```
Core AdaMSS configuration:

```python
from peft import AdamssConfig, get_peft_model

# Configure AdaMSS with ASA
config = AdamssConfig(
    r=100,                              # SVD rank (full decomposition rank)
    num_subspaces=10,                   # Number of subspaces (K) - initial capacity
    subspace_rank=3,                    # Rank per subspace (ri) - use 1 for NLU, 3 for vision
    target_modules=["query", "value"],  # Target attention layers
    use_asa=True,                       # Enable Adaptive Subspace Allocation
    asa_target_subspaces=5,             # Target active subspaces (ASA reduces K→5)
    init_warmup=50,                     # Start ASA after 50 steps
    final_warmup=1000,                  # Complete masking by step 1000
    mask_interval=100,                  # Update mask every 100 steps
    modules_to_save=["classifier"],     # Modules to train without decomposition
)

peft_model = get_peft_model(model, config)
```
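After wrapping the model, you can confirm how few parameters remain trainable with PEFT's standard helper (the exact counts depend on the base model and configuration):

```python
# Print trainable vs. total parameter counts.
peft_model.print_trainable_parameters()
# e.g. "trainable params: ... || all params: ... || trainable%: 0.07"
```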
Option A – With Hugging Face Trainer (callback):

```python
from transformers import Trainer
from peft.tuners.adamss.asa_callback import AdamssAsaCallback

# The callback is a thin wrapper around model.update_and_allocate()
trainer = Trainer(
    model=peft_model,
    callbacks=[AdamssAsaCallback()],
    # ... other arguments
)
trainer.train()
```
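For reference, the callback's behaviour is roughly equivalent to the hand-rolled `TrainerCallback` below. This is a simplified sketch, not the shipped implementation: it simply drives `update_and_allocate()` from the Trainer's global step at the end of every optimizer step.

```python
from transformers import TrainerCallback

class SimpleAsaCallback(TrainerCallback):
    """Simplified sketch of what AdamssAsaCallback does."""

    def on_step_end(self, args, state, control, model=None, **kwargs):
        # Run ASA's subspace selection using the current global step.
        if model is not None:
            model.base_model.update_and_allocate(state.global_step)
        return control
```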
Option B – Custom training loop (no Trainer needed):

```python
for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    peft_model.base_model.update_and_allocate(step)  # all ASA logic in one call
    optimizer.zero_grad()
```
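Once training finishes, the adapter can be saved and reloaded with the usual PEFT API (the directory path below is only an example):

```python
# Save only the AdaMSS adapter weights (not the full base model).
peft_model.save_pretrained("./output/adamss_adapter")

# Later: reload the adapter on top of a freshly loaded base model.
from peft import PeftModel
restored = PeftModel.from_pretrained(model, "./output/adamss_adapter")
```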
Key Points:

- The rank-`r` SVD decomposition has size `r × (d_in + d_out)` and is split into K subspaces of rank `ri` each.
- ASA keeps the `asa_target_subspaces` most important subspaces out of the initial `num_subspaces`.
- Masking is applied gradually between `init_warmup` and `final_warmup`.
- Recommended settings: `subspace_rank=3` for vision, `subspace_rank=1` for NLU tasks.

Run the provided script with your configuration:
```bash
python examples/adamss_finetuning/image_classification_adamss_asa.py \
    --model_name_or_path google/vit-base-patch16-224-in21k \
    --dataset_name cifar10 \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 3 \
    --use_asa \
    --asa_target_subspaces 5 \
    --output_dir ./output
```
Run GLUE tasks (e.g., CoLA) with ASA:

```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
    --dataset_name cola \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 1 \
    --use_asa \
    --asa_target_subspaces 5 \
    --num_epochs 100 \
    --batch_size 32 \
    --output_dir ./output_cola_asa
```
Without ASA (fixed K=10):

```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
    --dataset_name cola \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 1 \
    --num_epochs 100 \
    --batch_size 32 \
    --output_dir ./output_cola_no_asa
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `r` | int | 100 | SVD decomposition rank |
| `num_subspaces` | int | 10 | Number of subspaces (K) |
| `subspace_rank` | int | 3 | Rank per subspace (ri) |
| `target_modules` | list | - | Modules to apply AdaMSS (e.g., `["query", "value"]`) |
| `use_asa` | bool | False | Enable Adaptive Subspace Allocation |
| `asa_target_subspaces` | int | None | Target active subspaces when ASA is enabled |
| `modules_to_save` | list | None | Modules to train without decomposition |
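As a second worked example of these options, here is a minimal configuration for a RoBERTa-style NLU run without ASA, matching the "fixed K=10" command above. The module names are a sketch that assumes the standard RoBERTa self-attention layer naming:

```python
from peft import AdamssConfig, get_peft_model

# Minimal NLU configuration without ASA: fixed K = 10 subspaces of rank 1.
nlu_config = AdamssConfig(
    r=100,
    num_subspaces=10,
    subspace_rank=1,                    # rank 1 per subspace recommended for NLU
    target_modules=["query", "value"],  # RoBERTa self-attention projections (assumed naming)
    use_asa=False,                      # keep all subspaces active
    modules_to_save=["classifier"],
)
peft_model = get_peft_model(model, nlu_config)
```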
The ASA callback reads all of its parameters from `AdamssConfig`. Import it directly:

```python
from peft.tuners.adamss.asa_callback import AdamssAsaCallback
```
ASA-related config parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `init_warmup` | int | 50 | Steps before starting masking |
| `final_warmup` | int | 1000 | Steps to reach target active subspaces |
| `mask_interval` | int | 100 | Steps between subspace selection updates |
| `asa_importance_beta` | float | 0.85 | EMA decay for importance tracking |
| `asa_uncertainty_beta` | float | 0.85 | EMA decay for uncertainty tracking |
| `asa_schedule_exponent` | float | 3.0 | Exponent for masking schedule |
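The warmup and exponent parameters control how quickly subspaces are pruned. The helper below sketches one plausible reading of that schedule (a polynomial decay from `num_subspaces` to `asa_target_subspaces` between `init_warmup` and `final_warmup`); it is an assumption for illustration, and the actual implementation may differ in details:

```python
def active_subspaces(step, K=10, target=5, init_warmup=50, final_warmup=1000, exponent=3.0):
    """Illustrative masking schedule: how many subspaces stay active at a given step."""
    if step < init_warmup:
        return K                     # no masking during initial warmup
    if step >= final_warmup:
        return target                # fully pruned to the ASA target
    progress = (step - init_warmup) / (final_warmup - init_warmup)
    # Polynomial decay from K down to target (exponent=3.0 prunes slowly at first).
    return round(target + (K - target) * (1 - progress) ** exponent)

print([active_subspaces(s) for s in (0, 100, 500, 1000)])  # -> [10, 9, 6, 5]
```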
Results with AdaMSS + ASA (100 epochs, seed=0):
| Task | Model | AdaMSS Params | Metric | Score |
|---|---|---|---|---|
| CoLA | RoBERTa-base | 27.0K (ASA K→5) | Matthews | 0.6466 |
| CoLA | RoBERTa-large | 64.8K (ASA K→5) | Matthews | 0.7093 |
| MRPC | RoBERTa-base | 27.2K (ASA K→5) | Accuracy | 0.8824 |
| MRPC | RoBERTa-large | 66.7K (ASA K→5) | Accuracy | 0.9044 |
Results with AdaMSS on Stanford Cars (10 epochs, seed=0):
| Model | Method | AdaMSS Params | Test Accuracy |
|---|---|---|---|
| ViT-Base | AdaMSS (no ASA) | 121K (K=10) | 82.15% |
| ViT-Base | AdaMSS + ASA | 75.0K (K→5) | 80.45% |
If you use AdaMSS in your research, please cite:

```bibtex
@inproceedings{zheng2025adamss,
  title={AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning},
  author={Zheng, Jingjing and Lu, Wanglong and Dong, Yiming and Ji, Chaojie and Cao, Yankai and Lin, Zhouchen},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
}
```