<!--Copyright 2025 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. -->

# TinyLoRA: Learning to Reason in 13 Parameters

TinyLoRA is an extremely parameter-efficient fine-tuning technique that builds on the LoRA-XS approach: the frozen weights are decomposed with an SVD, and a tiny trainable vector is projected through fixed random tensors. When combined with reinforcement learning (RL) training methods such as GRPO, TinyLoRA can achieve competitive performance with as few as 1 to 13 trainable parameters.

The key innovation of TinyLoRA is replacing the trainable low-rank matrix R with a weighted sum of fixed random projection matrices, R = Σᵢ vᵢ Pᵢ, where v ∈ ℝᵘ is a tiny trainable vector of dimension u and the Pᵢ are fixed random matrices. This dramatically reduces the number of trainable parameters while maintaining competitive performance.
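The toy sketch below illustrates this parameterization in plain PyTorch. It is not the library's internal implementation: the rank `r = 4`, vector size `u = 13`, matrix shapes, and seed are arbitrary illustrative choices, and the SVD step mirrors the LoRA-XS construction referenced above.

```python
# Schematic illustration of the TinyLoRA parameterization (not the library's
# internal implementation). The low-rank matrix R is a weighted sum of fixed
# random projections P_i, so only the vector v (u parameters) is trainable.
import torch

r, u = 4, 13  # rank of the frozen SVD factors, size of the trainable vector

# Fixed random projections, generated once from a seed and never trained
generator = torch.Generator().manual_seed(42)
P = torch.randn(u, r, r, generator=generator)

# The only trainable parameters: a vector of u scalars
v = torch.nn.Parameter(torch.zeros(u))

# R = sum_i v_i * P_i, combined into an (r, r) matrix
R = torch.einsum("i,ijk->jk", v, P)

# Frozen base weight and its truncated SVD (the LoRA-XS part): the rank-r
# factors A and B stay fixed, only R (i.e. v) would be trained.
W = torch.randn(64, 64)
U, S, Vh = torch.linalg.svd(W)
A = U[:, :r] * S[:r]   # (64, r), frozen
B = Vh[:r, :]          # (r, 64), frozen
delta_W = A @ R @ B    # weight update driven by just 13 trainable parameters

print(delta_W.shape)                         # torch.Size([64, 64])
print(sum(p.numel() for p in [v]))           # 13
```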

TinyLoRA supports weight tying through the weight_tying parameter, a ratio between 0.0 and 1.0 that controls how many modules share the same trainable vector v. Setting weight_tying=0.0 (the default) means no sharing, while weight_tying=1.0 means full sharing across all target modules — achieving extreme parameter efficiency with just a single vector of u trainable parameters for the entire model.

When saving the adapter parameters, it's possible to eschew storing the random projection matrices by setting save_projection=False on the TinyLoraConfig. In that case, these matrices will be restored based on the fixed random seed from the projection_seed argument. This cuts down on the size of the checkpoint, but we cannot guarantee reproducibility on all devices and for all future versions of PyTorch. If you want to ensure reproducibility, set save_projection=True (which is the default).
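The following is a minimal configuration sketch. It assumes `TinyLoraConfig` is importable from `peft` and exposes the `weight_tying`, `save_projection`, and `projection_seed` arguments described above; the model name and `target_modules` are illustrative choices, and the full argument list (including how to set the size of the trainable vector) is documented in the `TinyLoraConfig` reference below.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import TinyLoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = TinyLoraConfig(
    target_modules=["q_proj", "v_proj"],  # linear layers to adapt
    weight_tying=1.0,                     # one shared trainable vector across all targeted modules
    save_projection=False,                # don't store the random projections in the checkpoint...
    projection_seed=42,                   # ...regenerate them from this seed when loading
)

peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```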

TinyLoRA currently has the following constraints:

  • Only nn.Linear, nn.Embedding, and transformers.pytorch_utils.Conv1D layers are supported.

The abstract from the paper is:

Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000x larger updates to reach the same performance.

## TinyLoraConfig

[[autodoc]] tuners.tinylora.config.TinyLoraConfig

## TinyLoraModel

[[autodoc]] tuners.tinylora.model.TinyLoraModel