<!--Copyright 2020 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. -->

This model was released on 2019-10-02 and added to Hugging Face Transformers on 2020-11-16.


# DistilBERT

DistilBERT is pretrained by knowledge distillation to create a smaller model with faster inference that requires less compute to train. Its pretraining combines a triple loss objective: a language modeling loss, a distillation loss, and a cosine-distance loss. With this objective, DistilBERT achieves performance similar to a larger transformer language model.
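
The sketch below illustrates one way to compute such a triple loss in PyTorch. It is a minimal sketch, not DistilBERT's actual training code: `triple_loss` is a hypothetical helper, it assumes you already have student and teacher logits, hidden states, and masked-LM labels, and it weights the three terms equally for brevity (the released training code weights each term separately).

```python
import torch
import torch.nn.functional as F


def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden, labels, temperature=2.0):
    """Hypothetical helper combining the three pretraining losses (equal weights for brevity)."""
    vocab_size = student_logits.size(-1)
    hidden_size = student_hidden.size(-1)

    # 1. Masked language modeling loss on the student's own predictions
    #    (labels are -100 at unmasked positions, cross_entropy's default ignore_index).
    mlm_loss = F.cross_entropy(student_logits.reshape(-1, vocab_size), labels.reshape(-1))

    # 2. Distillation loss: KL divergence between the temperature-softened
    #    teacher and student output distributions.
    distill_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    # 3. Cosine-distance loss aligning the student's hidden states with the teacher's.
    student_h = student_hidden.reshape(-1, hidden_size)
    teacher_h = teacher_hidden.reshape(-1, hidden_size)
    target = torch.ones(student_h.size(0), device=student_h.device)  # 1 = pull vectors together
    cosine_loss = F.cosine_embedding_loss(student_h, teacher_h, target)

    return mlm_loss + distill_loss + cosine_loss
```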

You can find all the original DistilBERT checkpoints under the [DistilBERT](https://huggingface.co/distilbert) organization.

> [!TIP]
> Click on the DistilBERT models in the right sidebar for more examples of how to apply DistilBERT to different language tasks.

The examples below demonstrate how to classify text with [`Pipeline`] and [`AutoModel`].

<hfoptions id="usage">
<hfoption id="Pipeline">

```python
from transformers import pipeline


classifier = pipeline(
    task="text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # first GPU; use device=-1 (or omit) to run on CPU
)

result = classifier("I love using Hugging Face Transformers!")
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.9998}]
```

</hfoption>
<hfoption id="AutoModel">

```python
import torch

from transformers import AutoModelForSequenceClassification, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    device_map="auto",
    attn_implementation="sdpa"
)
inputs = tokenizer("I love using Hugging Face Transformers!", return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```

</hfoption>
</hfoptions>

## Notes

- DistilBERT doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`); see the sketch after this list.
- DistilBERT doesn't have options to select the input positions (a `position_ids` input). This could be added if necessary though, just let us know if you need this option.
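
As a minimal sketch of the first note: passing a pair of texts to the tokenizer of the `distilbert/distilbert-base-uncased` checkpoint inserts `[SEP]` between the segments, and the resulting encoding contains no `token_type_ids`.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# Passing two texts encodes them as a pair, with [SEP] between the segments.
encoded = tokenizer("How old are you?", "I'm 6 years old.")
print(tokenizer.decode(encoded["input_ids"]))
# [CLS] how old are you? [SEP] i'm 6 years old. [SEP]

print("token_type_ids" in encoded)  # False: DistilBERT has no segment embeddings
```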

## DistilBertConfig

[[autodoc]] DistilBertConfig

## DistilBertTokenizer

[[autodoc]] DistilBertTokenizer

## DistilBertTokenizerFast

[[autodoc]] DistilBertTokenizerFast

## DistilBertModel

[[autodoc]] DistilBertModel
    - forward

## DistilBertForMaskedLM

[[autodoc]] DistilBertForMaskedLM
    - forward

## DistilBertForSequenceClassification

[[autodoc]] DistilBertForSequenceClassification
    - forward

## DistilBertForMultipleChoice

[[autodoc]] DistilBertForMultipleChoice
    - forward

## DistilBertForTokenClassification

[[autodoc]] DistilBertForTokenClassification
    - forward

## DistilBertForQuestionAnswering

[[autodoc]] DistilBertForQuestionAnswering
    - forward