AdaMSS Fine-tuning

Introduction

AdaMSS (Adaptive Multi-Subspace approach) is a parameter-efficient fine-tuning method that uses SVD to decompose weight matrices into low-rank subspaces. It trains only ~0.07% of the original parameters (e.g., 59K for ViT-Base versus 86M for full fine-tuning) while maintaining competitive performance.
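As a quick sanity check, the quoted figures are consistent (pure arithmetic on the numbers above):

```python
# Sanity check of the quoted parameter ratio (figures from the text above)
adamss_params = 59_000        # AdaMSS trainable parameters for ViT-Base
full_params = 86_000_000      # full fine-tuning parameters for ViT-Base
ratio_pct = adamss_params / full_params * 100
print(f"{ratio_pct:.2f}%")    # → 0.07%
```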

The method optionally supports ASA (Adaptive Subspace Allocation) for dynamic subspace selection during training, further improving efficiency and performance.

See the paper for more details.

Installation & Quick Test

Install from local source:

bash
cd peft-main && pip install -e .
pip install transformers datasets torch torchvision evaluate accelerate scikit-learn

Verify installation:

bash
python -c "from peft import AdamssConfig; print('AdaMSS ready')"

Detailed Code Explanation

Core AdaMSS Configuration:

python
from peft import AdamssConfig, get_peft_model

# Configure AdaMSS with ASA
config = AdamssConfig(
    r=100,                          # SVD rank (full decomposition rank)
    num_subspaces=10,               # Number of subspaces (K) - initial capacity
    subspace_rank=3,                # Rank per subspace (ri) - use 1 for NLU, 3 for Vision
    target_modules=["query", "value"],  # Target attention layers
    use_asa=True,                   # Enable Adaptive Subspace Allocation
    asa_target_subspaces=5,         # Target active subspaces (ASA reduces K→5)
    init_warmup=50,                 # Start ASA after 50 steps
    final_warmup=1000,              # Complete masking by step 1000
    mask_interval=100,              # Update mask every 100 steps
    modules_to_save=["classifier"], # Modules to train without decomposition
)
peft_model = get_peft_model(model, config)

Option A – With HuggingFace Trainer (callback):

python
from transformers import Trainer
from peft.tuners.adamss.asa_callback import AdamssAsaCallback

# The callback is a thin wrapper around model.update_and_allocate()
trainer = Trainer(
    model=peft_model,
    callbacks=[AdamssAsaCallback()],
    # ... other arguments (training args, datasets, etc.)
)
trainer.train()

Option B – Custom training loop (no Trainer needed):

python
for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    peft_model.base_model.update_and_allocate(step)   # ← all ASA logic in one call
    optimizer.zero_grad()

Key Points:

  • Parameterization: Total params = r × (d_in + d_out), split into K subspaces of rank ri each
  • ASA Mechanism: Dynamically selects asa_target_subspaces most important subspaces from initial num_subspaces
  • Warmup Schedule: ASA gradually increases masking strength from init_warmup to final_warmup
  • Vision vs NLU: Use subspace_rank=3 for vision, subspace_rank=1 for NLU tasks
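
As a rough illustration of the decomposition, here is a NumPy sketch (not the PEFT internals: the contiguous slicing below is an assumption, and note that K × ri need not equal r):

```python
import numpy as np

# Illustrative only: truncate a weight matrix's SVD to rank r, then carve
# the truncated factors into K rank-ri slices. How peft's AdaMSS actually
# groups singular directions may differ; this just shows the shapes involved.
d_out, d_in = 768, 768            # e.g. a ViT-Base attention projection
r, K, ri = 100, 10, 3             # values from the config example above

W = np.random.randn(d_out, d_in)
U, S, Vh = np.linalg.svd(W, full_matrices=False)
U_r, S_r, Vh_r = U[:, :r], S[:r], Vh[:r, :]   # rank-r truncation

# Subspace i takes the i-th contiguous rank-ri slice of the truncated factors
subspaces = [
    (U_r[:, i*ri:(i+1)*ri], S_r[i*ri:(i+1)*ri], Vh_r[i*ri:(i+1)*ri, :])
    for i in range(K)
]
print(len(subspaces), subspaces[0][0].shape)  # 10 subspaces; U slice is (768, 3)
```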

Using the Training Example Scripts

Vision Tasks (Image Classification)

Run the provided script with your configuration:

bash
python examples/adamss_finetuning/image_classification_adamss_asa.py \
    --model_name_or_path google/vit-base-patch16-224-in21k \
    --dataset_name cifar10 \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 3 \
    --use_asa \
    --asa_target_subspaces 5 \
    --output_dir ./output

NLU Tasks (GLUE Benchmark)

Run GLUE tasks (e.g., CoLA) with ASA:

bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
    --dataset_name cola \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 1 \
    --use_asa \
    --asa_target_subspaces 5 \
    --num_epochs 100 \
    --batch_size 32 \
    --output_dir ./output_cola_asa

Without ASA (fixed K=10):

bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
    --dataset_name cola \
    --adamss_r 100 \
    --adamss_k 10 \
    --adamss_ri 1 \
    --num_epochs 100 \
    --batch_size 32 \
    --output_dir ./output_cola_no_asa

AdamssConfig Parameters

Parameter             Type  Default  Description
r                     int   100      SVD decomposition rank
num_subspaces         int   10       Number of subspaces (K)
subspace_rank         int   3        Rank per subspace (ri)
target_modules        list  -        Modules to apply AdaMSS to (e.g., ["query", "value"])
use_asa               bool  False    Enable Adaptive Subspace Allocation
asa_target_subspaces  int   None     Target number of active subspaces when ASA is enabled
modules_to_save       list  None     Modules to train without decomposition

AdamssAsaCallback

The ASA callback reads all parameters from AdamssConfig. Import it directly:

python
from peft.tuners.adamss.asa_callback import AdamssAsaCallback

ASA-related config parameters:

Parameter              Type   Default  Description
init_warmup            int    50       Steps before masking begins
final_warmup           int    1000     Steps to reach the target number of active subspaces
mask_interval          int    100      Steps between subspace selection updates
asa_importance_beta    float  0.85     EMA decay for importance tracking
asa_uncertainty_beta   float  0.85     EMA decay for uncertainty tracking
asa_schedule_exponent  float  3.0      Exponent of the masking schedule
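
These warmup parameters combine into a masking schedule. A pure-Python sketch of one plausible cubic schedule (an assumption modeled on AdaLoRA-style budget schedulers; the exact AdaMSS schedule may differ):

```python
# Hypothetical cubic masking schedule (an assumption, not the exact PEFT
# implementation). Interpolates the number of active subspaces from K
# down to asa_target_subspaces between init_warmup and final_warmup.
def active_subspaces(step, K=10, target=5, init_warmup=50,
                     final_warmup=1000, exponent=3.0):
    if step < init_warmup:
        return K                      # no masking during initial warmup
    if step >= final_warmup:
        return target                 # schedule finished: target budget only
    progress = (step - init_warmup) / (final_warmup - init_warmup)
    # cubic decay from K to target as progress goes 0 -> 1
    return target + round((K - target) * (1 - progress) ** exponent)

print([active_subspaces(s) for s in (0, 100, 500, 1000)])  # → [10, 9, 6, 5]
```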

Experimental Results

NLU Tasks (GLUE Benchmark)

Results with AdaMSS + ASA (100 epochs, seed=0):

Task  Model          AdaMSS Params    Metric    Score
CoLA  RoBERTa-base   27.0K (ASA K→5)  Matthews  0.6466
CoLA  RoBERTa-large  64.8K (ASA K→5)  Matthews  0.7093
MRPC  RoBERTa-base   27.2K (ASA K→5)  Accuracy  0.8824
MRPC  RoBERTa-large  66.7K (ASA K→5)  Accuracy  0.9044

Notes:

  • Configuration: r=100, K=10→5 (ASA), ri=1
  • AdaMSS active params with ASA (5 out of 10 subspaces selected)
  • Full AdaMSS capacity: 97K (large) / 42K (base)
  • Training: 100 epochs, batch_size=32, warmup_ratio=0.06

Vision Tasks (Image Classification)

Results with AdaMSS on Stanford Cars (10 epochs, seed=0):

Model     Method           AdaMSS Params  Test Accuracy
ViT-Base  AdaMSS (no ASA)  121K (K=10)    82.15%
ViT-Base  AdaMSS + ASA     75.0K (K→5)    80.45%

Notes:

  • Configuration: r=100, K=10, ri=3, 10 epochs, batch_size=32
  • ASA dynamically selects 5 out of 10 subspaces (75K active from 121K total)

Citation

If you use AdaMSS in your research, please cite:

bibtex
@inproceedings{zheng2025adamss,
  title={AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning},
  author={Zheng, Jingjing and Lu, Wanglong and Dong, Yiming and Ji, Chaojie and Cao, Yankai and Lin, Zhouchen},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
}
