docs/source/en/model_doc/kimi_k25.md
This model was contributed to Hugging Face Transformers on 2026-07-03.
This model class supports all three different releases: KimiK-2.5,KimiK-2.6, KimiK-2.7
Kimi K2.5 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. The model was proposed in Kimi K2.5: Visual Agentic Intelligence and further improved in [Kimi K2.6: Advancing Open-Source Coding](Kimi K2.5: Visual Agentic Intelligence).
Kimi K2.5 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization. The model is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
This model was contributed by RaushanTurganbay. The offical checkpoints are moonshotai/Kimi-K2.5, moonshotai/Kimi-K2.6 and moonshotai/Kimi-K2.7-Code.
Note that the repositories don't yet have the correct fast tokenizer uploaded. You can get the converted processor and tokenizer from RaushanTurganbay/kimi2.7-processor
</Tip>import os
import torch
from transformers import AutoProcessor, AutoTokenizer, AutoModelForImageTextToText
from transformers.distributed.configuration_utils import DistributedConfig
distributed_config = DistributedConfig(enable_expert_parallel=True)
processor = AutoProcessor.from_pretrained('moonshotai/Kimi-K2.6')
model = AutoModelForImageTextToText.from_pretrained(
'moonshotai/Kimi-K2.6',
distributed_config=distributed_config,
)
messages = [
{
"role": "user",
"content": [
{"type": "image", "image": "https://www.ilankelman.org/stopsigns/australia.jpg"},
{"type": "text", "text": "What is shown in this image?"},
],
}
]
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt",
return_dict=True,
).to(device=model.device, dtype=model.dtype)
generated_ids = model.generate(**inputs, max_new_tokens=64)
generated_text = processor.batch_decode(generated_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0]
print(generated_text)
[[autodoc]] Kimi_K25ImageProcessor
[[autodoc]] Kimi_K25Processor
[[autodoc]] Kimi_K25VideoProcessor
[[autodoc]] Kimi_K25Config
[[autodoc]] Kimi_K25VisionConfig
[[autodoc]] Kimi_K25PreTrainedModel - forward
[[autodoc]] Kimi_K25VisionModel
[[autodoc]] Kimi_K25Model - forward
[[autodoc]] Kimi_K25ForConditionalGeneration