examples/int8_training/Finetune_flan_t5_large_bnb_peft.ipynb
bitsandbytes, peft & transformers 🤗
In this notebook we will see how to properly use peft, transformers & bitsandbytes to fine-tune flan-t5-large in a Google Colab!
We will fine-tune the model on the zeroshot/twitter-financial-news-sentiment dataset, which consists of financial tweets labeled with sentiment (bearish, bullish, or neutral).
Note that you could use the same notebook to fine-tune flan-t5-xl as well, but you would need to shard the model first to avoid CPU RAM issues on Google Colab; check these weights.
!pip install -q datasets==3.6.0 accelerate
!pip install -q git+https://github.com/bitsandbytes-foundation/bitsandbytes.git
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main
# Select CUDA device index
import os
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
model_name = "google/flan-t5-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_name)
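To get a feel for why 8-bit loading helps on a Colab GPU, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It assumes roughly 780 million parameters for flan-t5-large (an approximate figure, not taken from this notebook) and ignores activations, quantization metadata, and the modules that stay in higher precision, so treat it as a rough sketch rather than a measurement.

```python
# Rough weight-memory estimate. APPROX_PARAMS is an assumed, approximate count.
APPROX_PARAMS = 780_000_000


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory needed to store num_params weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


fp32_gib = weight_memory_gib(APPROX_PARAMS, 4)  # 4 bytes per float32 weight
int8_gib = weight_memory_gib(APPROX_PARAMS, 1)  # 1 byte per int8 weight

print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"int8 weights: {int8_gib:.2f} GiB")
```

On a loaded model, calling model.get_memory_footprint() gives the actual figure.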
Some pre-processing needs to be done before training such an int8 model using peft, so let's import a utility function, prepare_model_for_kbit_training, that will:
- cast all the non-int8 modules to full precision (fp32) for stability
- add a forward_hook to the input embedding layer to enable gradient computation of the input hidden states
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
PeftModel
Here we will use LoRA (Low-Rank Adaptation) to train our model.
from peft import LoraConfig, get_peft_model, TaskType
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q", "v"], lora_dropout=0.05, bias="none", task_type="SEQ_2_SEQ_LM"
)
model = get_peft_model(model, lora_config)
print_trainable_parameters(model)
As you can see, here we are only training about 0.6% of the model's parameters! This greatly reduces the memory needed for gradients and optimizer states, which lets us fine-tune the model without running out of memory.
Here we will use the zeroshot/twitter-financial-news-sentiment dataset to fine-tune our model for sentiment classification of financial tweets.
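The 0.6% figure can be sanity-checked by hand. The sketch below counts the parameters LoRA adds when r=16 is applied to every q and v projection. The architecture numbers (hidden size 1024, attention inner dimension 1024, 24 encoder and 24 decoder layers) and the approximate base parameter count are assumptions about flan-t5-large, not values read from this notebook; compare them against model.config before relying on the result.

```python
def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds two matrices: A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


# Assumed flan-t5-large dimensions (check model.config to confirm).
D_MODEL = 1024
INNER_DIM = 1024          # num_heads * d_kv, assumed 16 * 64
ENCODER_LAYERS = 24
DECODER_LAYERS = 24
R = 16                    # matches r=16 in the LoraConfig above
APPROX_BASE_PARAMS = 780_000_000  # approximate, assumed

# Each encoder layer has one attention block (self-attention);
# each decoder layer has two (self-attention and cross-attention).
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
targeted_linears = attention_blocks * 2  # one q and one v projection per block

trainable = targeted_linears * lora_params_per_linear(D_MODEL, INNER_DIM, R)
percent = 100 * trainable / (APPROX_BASE_PARAMS + trainable)

print(f"LoRA parameters: {trainable:,}")
print(f"trainable share: {percent:.2f}%")
```

Under these assumptions the count comes to about 4.7 million parameters, which is consistent with the share printed by print_trainable_parameters.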
# loading dataset
dataset = load_dataset("zeroshot/twitter-financial-news-sentiment")
dataset = dataset["train"].train_test_split(test_size=0.1)
dataset["validation"] = dataset["test"]
del dataset["test"]
if hasattr(dataset["train"].features["label"], "names"):
    classes = dataset["train"].features["label"].names
else:
    classes = ["Bearish", "Bullish", "Neutral"]
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["label"]]},
    batched=True,
    num_proc=1,
)
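To make the batched mapping above concrete, here is a small stand-alone sketch of what the lambda receives and returns. With batched=True the function gets a dict of columns (lists), not a single row. The tweets in toy_batch are made-up placeholders and add_text_label is a hypothetical name for the same logic.

```python
classes = ["Bearish", "Bullish", "Neutral"]


def add_text_label(batch):
    """Map each integer label in the batch to its class name."""
    return {"text_label": [classes[label] for label in batch["label"]]}


# A batch is a dict of columns; these rows are illustrative placeholders.
toy_batch = {
    "text": ["made-up tweet A", "made-up tweet B", "made-up tweet C"],
    "label": [2, 0, 1],
}

result = add_text_label(toy_batch)
print(result)
```

The returned column is added alongside the existing ones, so each example ends up with both its integer label and a text_label string the seq2seq model can be trained to generate.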
Let's also pre-process the input data. The labels need to be tokenized, and the label tokens equal to pad_token_id need to be set to -100 so that the cross-entropy loss used by the model ignores these positions.
# data preprocessing
text_column = "text"
label_column = "text_label"
max_length = 128
def preprocess_function(examples):
    inputs = examples[text_column]
    targets = examples[label_column]
    model_inputs = tokenizer(inputs, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
    labels = tokenizer(targets, max_length=3, padding="max_length", truncation=True, return_tensors="pt")
    labels = labels["input_ids"]
    labels[labels == tokenizer.pad_token_id] = -100
    model_inputs["labels"] = labels
    return model_inputs
processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]
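The -100 masking applied to the labels can be illustrated without a tokenizer. The sketch below uses plain lists and made-up token ids; PAD_TOKEN_ID = 0 is an assumption about the T5 tokenizer (check tokenizer.pad_token_id), and mask_padding and count_scored_positions are hypothetical helper names.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss
PAD_TOKEN_ID = 0     # assumed pad id; verify with tokenizer.pad_token_id


def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace every padding id with IGNORE_INDEX, row by row."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in row]
        for row in label_ids
    ]


def count_scored_positions(masked_labels):
    """Number of label positions that still contribute to the loss."""
    return sum(token != IGNORE_INDEX for row in masked_labels for token in row)


# Made-up token ids: the first row ends with one padding token.
toy_labels = [[71, 1, 0], [45, 12, 1]]
masked = mask_padding(toy_labels)

print(masked)
print(count_scored_positions(masked))
```

Only the non-padding positions are scored, so padding a short label out to the fixed length does not change its loss.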
Let's now train our model by running the cells below.
Note that for T5, since some layers are kept in float32 for stability, there is no need to call autocast on the trainer.
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    "temp",
    eval_strategy="epoch",
    learning_rate=1e-3,
    gradient_accumulation_steps=1,
    auto_find_batch_size=True,
    num_train_epochs=1,
    save_steps=100,
    save_total_limit=8,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
trainer.train()
Let's do a quick qualitative evaluation of the model by taking a sample that corresponds to a positive label. Generation works the same way as with a regular transformers model:
model.eval()
input_text = "In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 ."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
print("input sentence: ", input_text)
print(" output prediction: ", tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
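Because the model generates free text rather than a class index, a quantitative check needs the decoded strings mapped back to the three classes. Here is a minimal sketch of that step; to_class and accuracy are hypothetical helpers, not part of peft or transformers, and the predictions are made-up strings, not real model output.

```python
CLASSES = ["Bearish", "Bullish", "Neutral"]


def to_class(decoded, classes=CLASSES):
    """Return the matching class name, or None if the text matches no class."""
    cleaned = decoded.strip().lower()
    for name in classes:
        if cleaned == name.lower():
            return name
    return None


def accuracy(predictions, references):
    """Share of decoded predictions that match the reference class names."""
    if not references:
        return 0.0
    matches = sum(to_class(p) == r for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up decoded outputs, for illustration only.
toy_predictions = ["Bullish", " neutral ", "???"]
toy_references = ["Bullish", "Neutral", "Bearish"]

print(accuracy(toy_predictions, toy_references))
```

Outputs that match no class count as errors here, which keeps the metric conservative when the model produces unexpected text.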
Once you have trained your adapter, you can easily share it on the Hub using the push_to_hub method. Note that only the adapter weights and config will be pushed.
from huggingface_hub import notebook_login
notebook_login()
model.push_to_hub("ybelkada/flan-t5-large-financial-phrasebank-lora", token=True)  # token replaces the deprecated use_auth_token argument
You can load the model together with the adapter in a few lines of code! The snippet below loads the adapter from the Hub and runs the example evaluation.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
peft_model_id = "ybelkada/flan-t5-large-financial-phrasebank-lora"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()
input_text = "In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 ."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
print("input sentence: ", input_text)
print(" output prediction: ", tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))