<!--Copyright 2021 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. -->

This model was released on 2020-02-12 and added to Hugging Face Transformers on 2023-06-20.

T5v1.1


Overview

T5v1.1 was released in the google-research/text-to-text-transfer-transformer repository by Colin Raffel et al. It is an improved version of the original T5 model. This model was contributed by patrickvonplaten. The original code can be found in the same repository.

Usage tips

The T5v1.1 weights can be loaded directly into a T5 model class, like so:

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base", device_map="auto")
```

T5 Version 1.1 includes the following improvements compared to the original T5 model:

  • GEGLU activation in the feed-forward hidden layer, rather than ReLU. See the paper "GLU Variants Improve Transformer" (Shazeer, 2020).

  • Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.

  • Pre-trained on C4 only without mixing in the downstream tasks.

  • No parameter sharing between the embedding and classifier layer.

  • "xl" and "xxl" replace "3B" and "11B". The model shapes differ slightly: larger d_model and smaller num_heads and d_ff.
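
To make the first item in the list above more concrete, here is a minimal pure-Python sketch that contrasts a ReLU feed-forward layer with a gated-GELU (GEGLU) one. It is an illustration, not the library's implementation: the helper names (`gelu_tanh`, `matvec`, `relu_ffn`, `geglu_ffn`) and the toy weights are made up for this example, the tanh approximation of GELU is an assumption (check the checkpoint's config for the exact activation), and dropout and layer normalization are omitted.

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed variant, for illustration)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(x, w):
    """Compute x @ w for a vector x (length n) and a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def relu_ffn(x, w_in, w_out):
    """ReLU feed-forward, as in the original T5: ReLU(x W_in) W_out."""
    hidden = [max(0.0, h) for h in matvec(x, w_in)]
    return matvec(hidden, w_out)


def geglu_ffn(x, w_gate, w_lin, w_out):
    """Gated-GELU feed-forward: (GELU(x W_gate) * (x W_lin)) W_out."""
    gate = [gelu_tanh(h) for h in matvec(x, w_gate)]
    linear = matvec(x, w_lin)
    hidden = [g * l for g, l in zip(gate, linear)]  # element-wise gating
    return matvec(hidden, w_out)


# Toy shapes: d_model = 2, d_ff = 3. The weights are hypothetical values.
x = [1.0, -2.0]
w_gate = [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]]
w_lin = [[1.0, 0.0, -0.5], [0.0, 1.0, 0.5]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

y_relu = relu_ffn(x, w_gate, w_out)
y_geglu = geglu_ffn(x, w_gate, w_lin, w_out)
print(y_relu, y_geglu)
```

Note that the gated variant needs two input projections (`w_gate` and `w_lin`) instead of one, so for the same d_ff it carries more parameters in the feed-forward block.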

Note: T5 Version 1.1 was pre-trained on C4 only, without any supervised training. Therefore, unlike the original T5 model, it has to be fine-tuned before it is usable on a downstream task. Since T5v1.1 was pre-trained without supervision, there is no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
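
As a rough illustration of the prefix advice in the note, the sketch below formats source strings for single-task and multi-task fine-tuning. The helper `build_input` is hypothetical (it is not part of Transformers), and the prefix strings and example sentences are placeholders chosen for this example; any consistent, distinct prefix per task should serve the same purpose.

```python
from typing import Optional


def build_input(text: str, task_prefix: Optional[str] = None) -> str:
    """Return the source string for fine-tuning, optionally with a task prefix."""
    if task_prefix is None:
        # Single-task fine-tuning: the raw text is enough.
        return text
    # Multi-task fine-tuning: the prefix tells the model which task to perform.
    return f"{task_prefix}: {text}"


# Single-task setup: no prefix.
single_task_source = build_input("The house is wonderful.")

# Multi-task setup: one distinct prefix per task (placeholder prefixes).
multi_task_sources = [
    build_input("The house is wonderful.", "translate English to German"),
    build_input("The quick brown fox jumped over the lazy dog.", "summarize"),
]
print(single_task_source)
print(multi_task_sources)
```

The resulting strings would then be tokenized and passed to the model as usual.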

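Google has released the following variants:

  • google/t5-v1_1-small

  • google/t5-v1_1-base

  • google/t5-v1_1-large

  • google/t5-v1_1-xl

  • google/t5-v1_1-xxl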

<Tip>

Refer to T5's documentation page for all API reference, tips, code examples and notebooks.

</Tip>