# LightOnOCR
This model was released on {release_date} and added to Hugging Face Transformers on 2026-01-14.
LightOnOcr is a compact, end-to-end vision–language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.
📝 Read the full blog post | 📓 Finetuning notebook
## Model Overview
LightOnOcr combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.
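One reason page resolution matters for a ViT-based encoder is that the number of vision tokens handed to the decoder grows with the patch grid. The sketch below is purely illustrative: the 16×16 patch size is an assumption (common for Pixtral-style encoders), not a value read from the released config, so check the model's actual configuration before relying on these numbers.

```python
# Hypothetical illustration: vision-token count for a ViT-style encoder.
# patch_size=16 is an assumption, not taken from the LightOnOcr config.
def num_vision_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Each non-overlapping patch becomes one token fed to the text decoder."""
    return (height // patch_size) * (width // patch_size)

# A 1024x1024 page yields 4096 patch tokens; doubling each side quadruples it.
print(num_vision_tokens(1024, 1024))  # 4096
print(num_vision_tokens(2048, 2048))  # 16384
```

This quadratic growth is why pairing a high-resolution encoder with a lightweight decoder keeps inference cheap on full document pages.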
```python
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor

model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-1B-1025", device_map="auto")
processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-1B-1025")

# A sample receipt image; the model expects a single user turn containing the page.
url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"
conversation = [{"role": "user", "content": [{"type": "image", "url": url}]}]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)

# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = output_ids[0, inputs["input_ids"].shape[1] :]
output_text = processor.decode(generated_ids, skip_special_tokens=True)
print(output_text)
```
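To run the same pipeline on a local scan, a PIL image can be passed in the conversation instead of a URL. The helper below is a hypothetical sketch for pre-shrinking very large scans; the 2048-pixel cap is an arbitrary choice, not a model requirement, since the processor performs its own resizing.

```python
from PIL import Image


def load_page(path: str, max_side: int = 2048) -> Image.Image:
    """Open a scanned page and cap its longest side (arbitrary limit)."""
    image = Image.open(path).convert("RGB")
    longest = max(image.size)
    if longest > max_side:
        scale = max_side / longest
        image = image.resize(
            (round(image.size[0] * scale), round(image.size[1] * scale))
        )
    return image


# Hypothetical usage with a local file instead of a URL:
# image = load_page("scan.jpeg")
# conversation = [{"role": "user", "content": [{"type": "image", "image": image}]}]
```

Downscaling before preprocessing mainly helps memory use on multi-megapixel scans; small images pass through unchanged.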
## LightOnOcrConfig

[[autodoc]] LightOnOcrConfig

## LightOnOcrProcessor

[[autodoc]] LightOnOcrProcessor
    - __call__

## LightOnOcrModel

[[autodoc]] LightOnOcrModel
    - forward
    - get_image_features

## LightOnOcrForConditionalGeneration

[[autodoc]] LightOnOcrForConditionalGeneration
    - forward
    - get_image_features