docs/source/package_reference/pvera.md
PVeRA is a parameter-efficient fine-tuning technique based on VeRA, in the family of LoRA-based adapters. It keeps VeRA's very low parameter budget but improves performance by learning a distribution of latent adaptations. This also enables models adapted with PVeRA to produce Monte Carlo confidence interval estimates by sampling from the learned distribution at inference time.
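Below is a minimal usage sketch. It assumes `PveraConfig` is importable from `peft` and accepts VeRA-style arguments such as `r` and `target_modules` (check the config reference below for the exact signature); the repeated-forward-pass loop only illustrates the Monte Carlo idea and is not a fixed API.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PveraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

# Hypothetical argument values, assumed to mirror VeRA's config (r, target_modules).
config = PveraConfig(r=256, target_modules=["query", "value"])
model = get_peft_model(base_model, config)
model.print_trainable_parameters()

# Sketch of Monte Carlo inference: if the adapter samples from its learned
# distribution on each forward pass, repeated passes yield an uncertainty estimate.
inputs = tokenizer("PVeRA keeps VeRA's parameter budget.", return_tensors="pt")
with torch.no_grad():
    samples = torch.stack([model(**inputs).logits for _ in range(10)])
mean, std = samples.mean(dim=0), samples.std(dim=0)
```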
When saving the adapter parameters, it is possible to eschew storing the low-rank matrices by setting `save_projection=False` on the `PveraConfig`. In that case, these matrices will be restored based on the fixed random seed from the `projection_prng_key` argument. This cuts down on the size of the checkpoint, but we cannot guarantee reproducibility on all devices and for all future versions of PyTorch. If you want to ensure reproducibility, set `save_projection=True` (which is the default).
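As a sketch of the two checkpointing modes described above (the argument names come from this page; the PRNG key value of 0 is only an illustrative choice):

```python
from peft import PveraConfig  # assumes PveraConfig is exported from peft

# Smaller checkpoint: the frozen low-rank projection matrices are not stored and
# are re-created from the PRNG key when loading (reproducibility across devices
# and future PyTorch versions is not guaranteed).
compact_config = PveraConfig(save_projection=False, projection_prng_key=0)

# Default and fully reproducible: the projection matrices are saved with the adapter.
reproducible_config = PveraConfig(save_projection=True)
```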
To handle different shapes of adapted layers, PVeRA initializes shared A and B matrices with the largest required size for each dimension. During the forward pass, submatrices A and B for a given layer are sliced out from these shared matrices and used as described in the paper. For example, adapting two linear layers of shapes (100, 20) and (80, 50) will create A and B matrices of shapes (rank, 50) and (100, rank) respectively. Then, to adapt a layer of shape (100, 20), submatrices A and B of shapes (rank, 20) and (100, rank) will be extracted.
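The following standalone sketch illustrates the shape bookkeeping from the example above in plain PyTorch; it is not the library's internal implementation.

```python
import torch

# Shared A and B are sized to the largest in/out dimensions over all adapted
# layers, then sliced per layer. Layer shapes are (out_features, in_features).
rank = 4
layer_shapes = [(100, 20), (80, 50)]

max_out = max(out for out, _ in layer_shapes)  # 100
max_in = max(inp for _, inp in layer_shapes)   # 50
shared_A = torch.randn(rank, max_in)           # shape (rank, 50)
shared_B = torch.randn(max_out, rank)          # shape (100, rank)

# Slicing for the layer of shape (100, 20):
out_f, in_f = 100, 20
A = shared_A[:, :in_f]   # (rank, 20)
B = shared_B[:out_f, :]  # (100, rank)
```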
PVeRA currently has the following constraint:

- Only `nn.Linear` layers are supported.

The abstract from the paper is:
Large foundation models have emerged in the last years and are pushing performance boundaries for a variety of tasks. Training or even finetuning such models demands vast datasets and computational resources, which are often scarce and costly. Adaptation methods provide a computationally efficient solution to address these limitations by allowing such models to be finetuned on small amounts of data and computing power. This is achieved by appending new trainable modules to frozen backbones with only a fraction of the trainable parameters and fitting only these modules on novel tasks. Recently, the VeRA adapter was shown to excel in parameter-efficient adaptations by utilizing a pair of frozen random low-rank matrices shared across all layers. In this paper, we propose PVeRA, a probabilistic version of the VeRA adapter, which modifies the low-rank matrices of VeRA in a probabilistic manner. This modification naturally allows handling inherent ambiguities in the input and allows for different sampling configurations during training and testing. A comprehensive evaluation was performed on the VTAB-1k benchmark and seven adapters, with PVeRA outperforming VeRA and other adapters.
[[autodoc]] tuners.pvera.config.PveraConfig
[[autodoc]] tuners.pvera.model.PveraModel