This model was released on {release_date} and added to Hugging Face Transformers on 2026-01-14.

LightOnOcr

LightOnOcr is a compact, end-to-end vision–language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.

📝 Read the full blog post | 📓 Finetuning notebook

Model Overview

LightOnOcr combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.

Usage

python

from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor

# Load the model and its processor from the Hub.
model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-1B-1025", device_map="auto")
processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-1B-1025")

url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"

# A single user turn containing only the image to transcribe.
conversation = [{"role": "user", "content": [{"type": "image", "url": url}]}]

# Apply the chat template, tokenize, and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids = output_ids[0, inputs["input_ids"].shape[1] :]
output_text = processor.decode(generated_ids, skip_special_tokens=True)
print(output_text)
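
When several pages need to be transcribed, the conversation structure and the prompt-trimming step from the example above can be factored into small helpers. The following is a minimal sketch in plain Python, not part of the library: `build_ocr_conversation`, `strip_prompt`, and the `example.com` page URLs are hypothetical names introduced here for illustration only.

```python
from typing import Any


def build_ocr_conversation(image_url: str) -> list[dict[str, Any]]:
    """Return a single-turn chat holding one image, in the shape used above."""
    return [{"role": "user", "content": [{"type": "image", "url": image_url}]}]


def strip_prompt(output_ids: list[int], prompt_length: int) -> list[int]:
    """Drop the leading prompt tokens so only newly generated tokens remain."""
    return output_ids[prompt_length:]


# Hypothetical page URLs, used only to illustrate the structure.
page_urls = [
    "https://example.com/page-1.png",
    "https://example.com/page-2.png",
]

# One conversation per page.
conversations = [build_ocr_conversation(page_url) for page_url in page_urls]
```

Each conversation built this way can then be passed to `processor.apply_chat_template` one at a time, exactly as in the example above. `strip_prompt` mirrors the slicing step there, shown on a plain list of token ids rather than a tensor.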

LightOnOcrConfig

[[autodoc]] LightOnOcrConfig

LightOnOcrProcessor

[[autodoc]] LightOnOcrProcessor
    - __call__

LightOnOcrModel

[[autodoc]] LightOnOcrModel
    - forward
    - get_image_features

LightOnOcrForConditionalGeneration

[[autodoc]] LightOnOcrForConditionalGeneration
    - forward
    - get_image_features