This model was released on 2020-04-10 and added to Hugging Face Transformers on 2021-01-05.


LED

Longformer-Encoder-Decoder (LED) is an encoder-decoder transformer model for sequence-to-sequence tasks like summarization. It extends Longformer, an encoder-only model designed to handle long inputs, by adding a decoder. The decoder uses full attention over all encoded tokens and over previously decoded positions. Because Longformer's self-attention scales linearly with sequence length, while standard full self-attention scales quadratically, LED is more efficient than standard encoder-decoder models when processing long sequences.
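To make the linear-versus-quadratic difference concrete, the sketch below roughly counts query-key scores for full self-attention and for Longformer-style windowed attention. It assumes the Transformers convention that `attention_window` is the total window size, split evenly on each side of a token, and uses an assumed window of 1024 tokens. The count ignores sequence edges and overlap between local and global attention, so treat it as an order-of-magnitude estimate, not a measurement of LED.

```python
def full_attention_scores(seq_len: int) -> int:
    """Scores computed by full self-attention: one per (query, key) pair."""
    return seq_len * seq_len


def windowed_attention_scores(seq_len: int, attention_window: int, num_global: int = 1) -> int:
    """Rough count for Longformer-style attention (ignores edges and overlap).

    Each token attends to attention_window // 2 neighbours on each side plus itself.
    Global tokens attend to every token, and every token attends to them.
    """
    local = seq_len * (2 * (attention_window // 2) + 1)
    global_part = 2 * num_global * seq_len
    return local + global_part


# Assumed window of 1024 tokens; compare short and long inputs.
for n in (1024, 4096, 16384):
    full = full_attention_scores(n)
    windowed = windowed_attention_scores(n, attention_window=1024)
    print(f"{n:>6} tokens: full={full:,}  windowed~{windowed:,}  ratio={full / windowed:.1f}")
```

Doubling the sequence length doubles the windowed count but quadruples the full count. At around 1024 tokens the two are comparable, which is why LED pays off mainly on much longer inputs.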

You can find all the original [LED] checkpoints under the Ai2 organization.

[!TIP] This model was contributed by patrickvonplaten.

Click on the LED models in the right sidebar for more examples of how to apply LED to different language tasks.

The example below demonstrates how to summarize text with [AutoModel].

```python
import torch

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained(
    "allenai/led-base-16384"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/led-base-16384",
    device_map="auto"
)

input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
# `inputs` holds both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Place global attention on the first token (zeros_like keeps the device of input_ids)
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output = model.generate(**inputs, global_attention_mask=global_attention_mask, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize only the weights to 4-bit NF4.

```python
import torch

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig


quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/led-large-16384",
    device_map="auto",
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(
    "allenai/led-large-16384"
)

input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
# `inputs` holds both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Place global attention on the first token (zeros_like keeps the device of input_ids)
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output = model.generate(**inputs, global_attention_mask=global_attention_mask, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
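As a back-of-the-envelope check on why 4-bit quantization helps, weight memory is roughly the parameter count times the bits per parameter divided by 8. The sketch below uses a hypothetical parameter count of 400 million, not a measured LED size. Real savings are somewhat smaller than the raw bit ratio because NF4 also stores per-block quantization constants and some modules, such as the output head, may stay in higher precision.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB (ignores activations and cache)."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 400_000_000  # hypothetical parameter count for illustration only

for label, bits in (("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)):
    print(f"{label:>9}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```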

Notes

- [LEDForConditionalGeneration] is an extension of [BartForConditionalGeneration] that replaces the traditional self-attention layer with Longformer's chunked self-attention layer. [LEDTokenizer] is an alias of [BartTokenizer].
- LED pads the input_ids to a multiple of config.attention_window if required. A small speedup is gained by padding in advance with the pad_to_multiple_of argument of [LEDTokenizer].
- LED works best on long-range sequence-to-sequence tasks where the input_ids are significantly longer than 1024 tokens.
- LED uses global attention through the global_attention_mask (see [LongformerModel]). For summarization, it is recommended to put global attention only on the first <s> token. For question answering, it is recommended to put global attention on all tokens of the question.
- To fine-tune LED on inputs of up to 16384 tokens, enable gradient checkpointing if training leads to out-of-memory (OOM) errors. Call model.gradient_checkpointing_enable() and set use_cache=False to disable the caching mechanism and save memory.
- Pad inputs on the right, because LED uses absolute position embeddings.
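The padding, global attention, gradient checkpointing, and right-padding notes can be combined in a single training step. The sketch below is a minimal illustration, assuming a tiny randomly initialized LED built from [LEDConfig] so it runs without downloading weights. The model sizes, token ids, and the helper `pad_right_to_multiple` are hypothetical and not part of the Transformers API; padding is done by hand only because no tokenizer vocabulary is loaded.

```python
import torch
from transformers import LEDConfig, LEDForConditionalGeneration

torch.manual_seed(0)
ATTENTION_WINDOW = 8  # must be even; deliberately tiny (hypothetical value)

# Tiny, randomly initialized model so the sketch runs offline.
config = LEDConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_encoder_position_embeddings=64,
    max_decoder_position_embeddings=32,
    attention_window=ATTENTION_WINDOW,
)
model = LEDForConditionalGeneration(config)


def pad_right_to_multiple(ids, multiple, pad_id):
    """Hypothetical helper: right-pad `ids` to a multiple of `multiple`.

    Returns the padded ids and a matching attention mask (1 = real token, 0 = pad).
    """
    target = -(-len(ids) // multiple) * multiple  # ceiling division
    n_pad = target - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad


# <s>, 17 arbitrary "document" tokens, </s>: 19 tokens, right-padded to 24.
raw_ids = [config.bos_token_id] + list(range(10, 27)) + [config.eos_token_id]
ids, mask = pad_right_to_multiple(raw_ids, ATTENTION_WINDOW, config.pad_token_id)
input_ids = torch.tensor([ids])
attention_mask = torch.tensor([mask])

# Summarization-style global attention: only the first <s> token.
# For question answering you would instead mark every question token, e.g.
# global_attention_mask[:, :question_length] = 1  (question_length is hypothetical).
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

# Trade compute for memory and disable the decoder cache during training.
model.gradient_checkpointing_enable()
model.config.use_cache = False
model.train()  # gradient checkpointing only takes effect in training mode

labels = torch.randint(3, config.vocab_size, (1, 6))  # arbitrary target ids
outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    global_attention_mask=global_attention_mask,
    labels=labels,
)
outputs.loss.backward()
print(input_ids.shape, outputs.loss.item())
```

With a released checkpoint, the manual padding would typically be replaced by calling the tokenizer with `padding=True` and `pad_to_multiple_of` set to the model's attention window, keeping the default right-side padding.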

Resources

Read the LED on Arxiv notebook to see how LED can achieve state-of-the-art performance on Arxiv article summarization.
Read the Fine-tune LED notebook to learn how to fine-tune LED on PubMed articles.

LEDConfig

[[autodoc]] LEDConfig

LEDTokenizer

[[autodoc]] LEDTokenizer
    - get_special_tokens_mask
    - save_vocabulary

LEDTokenizerFast

[[autodoc]] LEDTokenizerFast

LED-specific outputs

[[autodoc]] models.led.modeling_led.LEDEncoderBaseModelOutput

[[autodoc]] models.led.modeling_led.LEDSeq2SeqModelOutput

[[autodoc]] models.led.modeling_led.LEDSeq2SeqLMOutput

[[autodoc]] models.led.modeling_led.LEDSeq2SeqSequenceClassifierOutput

[[autodoc]] models.led.modeling_led.LEDSeq2SeqQuestionAnsweringModelOutput

LEDModel

[[autodoc]] LEDModel
    - forward

LEDForConditionalGeneration

[[autodoc]] LEDForConditionalGeneration
    - forward

LEDForQuestionAnswering

[[autodoc]] LEDForQuestionAnswering
    - forward