examples/gralora_finetuning/README.md
Granular Low-Rank Adaptation (GraLoRA) is a PEFT method designed to enhance the expressivity of low-rank adaptation while improving robustness to outlier activations, based on insights from well-known issues in quantization.
GraLoRA introduces a structured and fine-grained adaptation scheme. It divides the adaptation space into a grid of $k^2$ smaller, independent adapter pairs, each responsible for a localized subset of the input and output dimensions.
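The grid partition can be illustrated with a small NumPy sketch (an illustration of the idea, not the PEFT implementation): a $k \times k$ grid of rank-$r/k$ adapter pairs uses the same number of parameters as a single rank-$r$ LoRA pair, while the combined update can attain a higher rank.

```python
import numpy as np

# Illustrative sketch (not the PEFT implementation): GraLoRA splits a
# (out_dim, in_dim) weight update into a k x k grid of independent
# low-rank blocks, each with its own adapter pair of rank r // k.
in_dim, out_dim, r, k = 64, 64, 8, 2

rng = np.random.default_rng(0)
delta_w = np.zeros((out_dim, in_dim))
for i in range(k):      # output-dimension blocks
    for j in range(k):  # input-dimension blocks
        B = rng.standard_normal((out_dim // k, r // k))
        A = rng.standard_normal((r // k, in_dim // k))
        delta_w[i * out_dim // k:(i + 1) * out_dim // k,
                j * in_dim // k:(j + 1) * in_dim // k] = B @ A

# The parameter count matches a vanilla LoRA of rank r ...
lora_params = r * (in_dim + out_dim)
gralora_params = k * k * ((out_dim // k) * (r // k) + (r // k) * (in_dim // k))
assert gralora_params == lora_params

# ... while the attainable rank of the update grows from r to k * r.
print(np.linalg.matrix_rank(delta_w))  # equals k * r = 16 for generic random factors
```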
Relative to a standard PEFT training run with LoRA, simply swap your `LoraConfig` for a `GraloraConfig`:
```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig
from peft import GraloraConfig

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

gralora_config = GraloraConfig()
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=gralora_config,
    args=SFTConfig(
        max_length=2048,
        dataset_text_field="text",
        per_device_train_batch_size=2,
    ),
)
trainer.train()
trainer.model.save_pretrained("gralora-llama-3.2-3b")
```
Run the finetuning script with:

```bash
python examples/gralora_finetuning/gralora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco
```
You can then load and use the model like any other 🤗 Transformers model:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", dtype=torch.bfloat16, device_map="auto"
)
peft_model = PeftModel.from_pretrained(model, "gralora-llama-3-8b")
```
While `gralora_k` defaults to 2, you can increase it to create more fine-grained adapters. A `gralora_k` of 4 is recommended when the total rank (`r + hybrid_r`) is 64 or higher.
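For example, a config for that higher-rank regime might look like the sketch below. The field names `r`, `hybrid_r`, and `gralora_k` are taken from this README; the concrete values and the `target_modules` list are illustrative choices, not recommendations from the paper.

```python
from peft import GraloraConfig

# Illustrative values: with total rank r + hybrid_r >= 64,
# this README recommends the finer gralora_k = 4 grid.
gralora_config = GraloraConfig(
    r=56,
    hybrid_r=8,  # hybrid (vanilla LoRA) rank component, per the text above
    gralora_k=4,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # example modules
)
```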
```bib
@misc{jung2025graloragranularlowrankadaptation,
    title={GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning},
    author={Yeonjoon Jung and Daehyun Ahn and Hyungjun Kim and Taesu Kim and Eunhyeok Park},
    year={2025},
    eprint={2505.20355},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2505.20355},
}
```