Kimi-K2.7-Code is a coding-focused agentic model by Moonshot AI, built on top of Kimi-K2.6. It improves real-world long-horizon coding task completion while reducing thinking-token usage by approximately 30% compared with Kimi-K2.6.
Key Features:
Benchmarks:
<table>
  <thead>
    <tr> <th>Benchmark</th> <th>Kimi-K2.6</th> <th>Kimi-K2.7-Code</th> </tr>
  </thead>
  <tbody>
    <tr> <td>Kimi Code Bench v2</td> <td>50.9</td> <td>62.0</td> </tr>
    <tr> <td>Program Bench</td> <td>48.3</td> <td>53.6</td> </tr>
    <tr> <td>MLS Bench Lite</td> <td>26.7</td> <td>35.1</td> </tr>
    <tr> <td>Kimi Claw 24/7 Bench</td> <td>42.9</td> <td>46.9</td> </tr>
    <tr> <td>MCP Atlas</td> <td>69.4</td> <td>76.0</td> </tr>
    <tr> <td>MCP Mark Verified</td> <td>72.8</td> <td>81.1</td> </tr>
  </tbody>
</table>

Recommended Generation Parameters:
temperature=1.0, top_p=0.95

Available Models:
License: Modified MIT for the native checkpoint.
For details, see the official model card.
Refer to the official SGLang installation guide.
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, deployment strategy, and capabilities.
import { KimiK27CodeDeployment } from '/src/snippets/autoregressive/kimi-k27-code-deployment.jsx'
<KimiK27CodeDeployment />

- Set --context-length 128000 to conserve memory.
- Lower --context-length when you need to reserve memory for larger batches.
- Requires transformers>=4.57.1,<5.0.0.
- Constraint: heads_per_gpu % 16 == 0. With TP=4, each GPU gets 16 heads (valid). With TP=8, each GPU gets 8 heads (invalid).
- Docker images: lmsysorg/sglang:v0.5.9-rocm700-mi35x for MI350X/MI355X and lmsysorg/sglang:v0.5.9-rocm700-mi30x for MI300X/MI325X.
- Use --dp <N> --enable-dp-attention for production throughput. A common choice is to set --dp equal to --tp, but this is not required.
- Use --reasoning-parser kimi_k2 to separate thinking and content in model outputs.
- Use --tool-call-parser kimi_k2 for structured tool calls.
- The generated command uses --kv-cache-dtype fp8_e4m3 by default and sets --mem-fraction-static 0.8 to fit the INT4 weights plus KV cache. FP8 KV cache trades a small amount of accuracy for memory; omit the flag if you observe accuracy regressions on your workload.

See Basic API Usage.
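The heads_per_gpu % 16 == 0 rule quoted above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not part of SGLang: the total of 64 attention heads is inferred from the two cases given above (16 heads per GPU at TP=4, 8 at TP=8), the function name is hypothetical, and the rule may only apply on some platforms, so confirm both against the model config and the command generator.

```python
# Sketch: check the heads_per_gpu % 16 == 0 rule for a tensor-parallel size.
# NUM_HEADS = 64 is an inference from the examples above (TP=4 -> 16, TP=8 -> 8);
# verify it against the model config before relying on it.
NUM_HEADS = 64


def tp_is_valid(tp: int, num_heads: int = NUM_HEADS, multiple: int = 16) -> bool:
    """Return True if heads split evenly and each GPU's share is a multiple of `multiple`."""
    if tp <= 0 or num_heads % tp != 0:
        return False
    return (num_heads // tp) % multiple == 0


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        print(f"TP={tp}: heads_per_gpu={NUM_HEADS // tp}, valid={tp_is_valid(tp)}")
```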
Kimi-K2.7-Code supports native multimodal input with images:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What is in this image? Describe it in detail."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Output Example:
This image shows a **paper receipt from Auntie Anne's**, the pretzel chain restaurant. Here's a detailed breakdown:
## Header
- At the top left is the Auntie Anne's logo (a pretzel with a halo)
- The store name "**Auntie Anne's**" is printed prominently at the top
- Some text below the store name appears blurred/redacted (likely store location, address, or transaction details)
## Purchase Details
- **Item**: CINNAMON SUGAR
- **Quantity & Price**: 1 × 17,000
- **Item Total**: 17,000
## Financial Summary
- **SUB TOTAL**: 17,000
- **GRAND TOTAL**: 17,000
- **CASH IDR**: 20,000 (customer paid 20,000 Indonesian Rupiah)
- **CHANGE DUE**: 3,000
## Physical Description
- The receipt is printed on white thermal paper
- Some information in the middle section and toward the bottom is intentionally blurred/obscured
- The paper appears slightly curved/wrinkled and is placed on a dark brown surface (likely a table or counter)
The transaction is in **Indonesian Rupiah (IDR)**, indicating this purchase was made at an Auntie Anne's location in Indonesia. The customer bought one Cinnamon Sugar pretzel for 17,000 IDR and received 3,000 IDR in change after paying with 20,000 IDR cash.
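The example above passes a public image URL. For local files, OpenAI-compatible servers commonly also accept a base64 data URL in the same image_url field. The sketch below assumes your deployment does; the helper names and the file path in the usage note are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_file(path: str) -> dict:
    """Build an OpenAI-style image_url content part from a local image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def build_vision_messages(path: str, question: str) -> list:
    """Pair one local image with a text question in a single user message."""
    return [
        {
            "role": "user",
            "content": [image_part_from_file(path), {"type": "text", "text": question}],
        }
    ]
```

The result can be passed as the messages argument of client.chat.completions.create, for example build_vision_messages("receipt.png", "What is in this image?").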
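Kimi-K2.7-Code always runs in thinking mode and always preserves reasoning content across turns; neither behavior can be turned off.
Thinking Mode (always on): reasoning content is returned separately from the final answer, in reasoning_content: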
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=[
        {"role": "user", "content": "Which one is bigger, 9.11 or 9.9? Think carefully."}
    ]
)
print("====== Reasoning Content (Thinking Mode) ======")
print(response.choices[0].message.reasoning_content)
print("====== Response (Thinking Mode) ======")
print(response.choices[0].message.content)
Output Example:
====== Reasoning Content (Thinking Mode) ======
The user is asking which number is bigger: 9.11 or 9.9. This seems straightforward, but there's a viral internet debate about this due to decimal confusion.
Let me think carefully:
- 9.11 means 9 + 11/100 = 9.11
- 9.9 means 9 + 9/10 = 9.90
So 9.9 = 9.90, and 9.90 > 9.11 because 0.90 > 0.11.
The confusion often comes from people thinking of software versioning (where 9.11 comes after 9.9) or comparing the numbers after the decimal as whole numbers (11 vs 9, thinking 11 > 9).
So mathematically, 9.9 is clearly bigger. 9.9 - 9.11 = 0.79.
I should explain this clearly and address the common misconception.
====== Response (Thinking Mode) ======
Mathematically, **9.9 is bigger**.
Here's why:
**9.9 = 9.90**
When comparing decimals, you need to look at the same place values:
- 9.11 = 9 ones, 1 tenth, and 1 hundredth
- 9.9 = 9 ones, 9 tenths, and 0 hundredths (9.90)
Since **0.90 > 0.11**, it follows that **9.9 > 9.11**.
The difference is:
9.9 - 9.11 = 0.79
**Why people get confused:** Many mistakenly treat the decimals like whole numbers (thinking "11 is bigger than 9") or confuse this with software version numbering (where version 9.11 comes after version 9.9). But in standard mathematics, 9.9 is definitively larger.
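The request above relies on the server's default sampling settings. A minimal sketch of passing the recommended generation parameters from this page (temperature=1.0, top_p=0.95) explicitly follows; the helper name and the max_tokens value in the demo are illustrative assumptions.

```python
# Recommended values stated on this page.
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}


def chat_kwargs(messages, model="moonshotai/Kimi-K2.7-Code", **overrides):
    """Return keyword arguments for client.chat.completions.create()."""
    kwargs = {"model": model, "messages": messages, **RECOMMENDED_SAMPLING}
    kwargs.update(overrides)  # caller-supplied values win over the defaults
    return kwargs


if __name__ == "__main__":
    kwargs = chat_kwargs([{"role": "user", "content": "Hello"}], max_tokens=4096)
    print(kwargs)
    # With a running server: response = client.chat.completions.create(**kwargs)
```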
Kimi-K2.7-Code keeps reasoning content across multi-turn interactions. This behavior is enabled by default and cannot be disabled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

messages = [
    {
        "role": "user",
        "content": "Tell me three random numbers."
    },
    {
        "role": "assistant",
        "reasoning_content": "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
        "content": "473, 921, 235"
    },
    {
        "role": "user",
        "content": "What are the other two numbers you have in mind?"
    }
]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=messages,
    stream=False,
    max_tokens=4096,
)
print(response.choices[0].message.content)
Some OpenAI-compatible deployments use reasoning instead of reasoning_content in assistant messages. Use the field your serving stack exposes.
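Because the field name can differ between serving stacks, it can help to read the reasoning text defensively and to carry it into the next turn explicitly. The sketch below is one way to do that; both helper names are hypothetical, and it only handles plain text replies (tool calls are not copied).

```python
from types import SimpleNamespace


def get_reasoning(message):
    """Return reasoning text from a message, trying both common field names."""
    for field in ("reasoning_content", "reasoning"):
        value = getattr(message, field, None)
        if value:
            return value
    return None


def to_history_entry(message, reasoning_field="reasoning_content"):
    """Convert an assistant response message into a dict for the next request."""
    entry = {"role": "assistant", "content": message.content}
    reasoning = get_reasoning(message)
    if reasoning:
        # Write the reasoning back under the field name your server expects.
        entry[reasoning_field] = reasoning
    return entry


if __name__ == "__main__":
    # Stand-in for response.choices[0].message from a stack that uses "reasoning".
    fake = SimpleNamespace(content="473, 921, 235", reasoning="pick five, show three")
    print(to_history_entry(fake))
```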
Kimi-K2.7-Code supports tool calling for agentic tasks:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=[
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    tools=tools,
    stream=True
)

# Process streaming response
tool_calls_accumulator = {}
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if hasattr(delta, 'tool_calls') and delta.tool_calls:
            for tool_call in delta.tool_calls:
                index = tool_call.index
                if index not in tool_calls_accumulator:
                    tool_calls_accumulator[index] = {'name': None, 'arguments': ''}
                if tool_call.function:
                    if tool_call.function.name:
                        tool_calls_accumulator[index]['name'] = tool_call.function.name
                    if tool_call.function.arguments:
                        tool_calls_accumulator[index]['arguments'] += tool_call.function.arguments
        if delta.content:
            print(delta.content, end="", flush=True)

for index, tool_call in sorted(tool_calls_accumulator.items()):
    print(f"Tool Call: {tool_call['name']}")
    print(f" Arguments: {tool_call['arguments']}")
Output Example:
Tool Call: get_weather
Arguments: {"location": "Beijing"}
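The accumulated arguments are still a JSON string at this point. A minimal sketch of decoding them and dispatching to a local function follows. It assumes an accumulator shaped like tool_calls_accumulator above; the get_weather body, TOOL_REGISTRY, and run_tool_calls are hypothetical stand-ins for your own tool implementations.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical local implementation; replace with a real lookup."""
    return f"The weather in {location} is 22 degrees {unit} and sunny."


# Maps tool names declared in `tools` to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(accumulator, registry=TOOL_REGISTRY):
    """Decode accumulated arguments and call the matching local function.

    Returns a list of (index, result_or_error_string) in index order.
    """
    results = []
    for index, call in sorted(accumulator.items()):
        func = registry.get(call["name"])
        if func is None:
            results.append((index, f"error: unknown tool {call['name']!r}"))
            continue
        try:
            args = json.loads(call["arguments"] or "{}")
        except json.JSONDecodeError as exc:
            results.append((index, f"error: invalid JSON arguments ({exc})"))
            continue
        if not isinstance(args, dict):
            results.append((index, "error: arguments must be a JSON object"))
            continue
        results.append((index, func(**args)))
    return results


if __name__ == "__main__":
    demo = {0: {"name": "get_weather", "arguments": '{"location": "Beijing"}'}}
    print(run_tool_calls(demo))
```

Returning errors as strings rather than raising lets the caller send the failure back to the model as a tool result, which is one common design choice rather than a requirement.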
Handling Tool Call Results:
# Send tool result back to the model
messages = [
    {"role": "user", "content": "What's the weather in Beijing?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"location": "Beijing", "unit": "celsius"}'
            }
        }]
    },
    {
        "role": "tool",
        "tool_call_id": "call_123",
        "content": "The weather in Beijing is 22°C and sunny."
    }
]

final_response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=messages
)
print(final_response.choices[0].message.content)
Output Example:
The weather in Beijing is currently **22°C and sunny**. ☀️
It's a nice, warm day there—great for being outdoors!
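The example above hard-codes a single call with the id call_123. When the model returns several tool calls, each one needs its own tool message with a matching tool_call_id. The sketch below builds that follow-up history generically; the function name and the plain-dict input shape are assumptions for illustration, not part of the OpenAI SDK.

```python
def tool_followup_messages(history, tool_calls, results):
    """Return history plus the assistant tool-call turn and one tool message per call.

    tool_calls: list of dicts with "id", "name", and "arguments" (a JSON string).
    results: dict mapping each tool call id to its result.
    """
    assistant_turn = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call["id"],
                "type": "function",
                "function": {"name": call["name"], "arguments": call["arguments"]},
            }
            for call in tool_calls
        ],
    }
    tool_turns = [
        {"role": "tool", "tool_call_id": call["id"], "content": str(results[call["id"]])}
        for call in tool_calls
    ]
    # Copy the history so the caller's list is not mutated.
    return list(history) + [assistant_turn] + tool_turns


if __name__ == "__main__":
    history = [{"role": "user", "content": "What's the weather in Beijing?"}]
    calls = [{"id": "call_123", "name": "get_weather", "arguments": '{"location": "Beijing"}'}]
    results = {"call_123": "The weather in Beijing is 22°C and sunny."}
    for message in tool_followup_messages(history, calls, results):
        print(message)
```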
Combine vision understanding with tool calling for advanced agentic tasks:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_product",
            "description": "Search for a product by name or description",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The product name or description to search for"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.7-Code",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Can you identify this product and search for similar items?"
                }
            ]
        }
    ],
    tools=tools
)

msg = response.choices[0].message

# Print reasoning process (getattr avoids an AttributeError if the field is absent)
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("=== Reasoning ===")
    print(reasoning)

# Print response content
if msg.content:
    print("=== Content ===")
    print(msg.content)

# Print tool calls
if msg.tool_calls:
    print("=== Tool Calls ===")
    for tc in msg.tool_calls:
        print(f" Function: {tc.function.name}")
        print(f" Arguments: {tc.function.arguments}")
Output Example:
=== Reasoning ===
The user wants me to identify the product from the receipt and search for similar items. Looking at the receipt, it's from Auntie Anne's and the item purchased is "CINNAMON SUGAR" for 17,000 IDR. This is likely a Cinnamon Sugar Pretzel from Auntie Anne's, which is a popular pretzel chain.
I should search for this product using the search_product function. The query should be something like "Auntie Anne's Cinnamon Sugar Pretzel" or just "Cinnamon Sugar Pretzel" to find similar items.
=== Content ===
Based on the receipt, the product is a **Cinnamon Sugar Pretzel** from **Auntie Anne's** (a popular pretzel bakery chain). The receipt shows it was purchased for 17,000 Indonesian Rupiah (IDR).
Let me search for this product and similar items for you.
=== Tool Calls ===
Function: search_product
Arguments: {"query":"Auntie Anne's Cinnamon Sugar Pretzel"}
Deploy Kimi-K2.7-Code with the following command (H200/B300, reasoning and tool parsing enabled):
sglang serve \
  --model-path moonshotai/Kimi-K2.7-Code \
  --tp 8 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000
For GB300, use --tp 4.
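If you launch the server from a script, the per-platform tensor-parallel choice above can be captured in a small helper. This is a sketch that uses only the flags and TP values shown on this page; the function name and the hardware keys are hypothetical, and it does not cover the other platforms offered by the command generator.

```python
import shlex

# Tensor-parallel sizes taken from this page: 8 for H200/B300, 4 for GB300.
TP_BY_HARDWARE = {"H200": 8, "B300": 8, "GB300": 4}


def build_serve_command(hardware, host="0.0.0.0", port=30000):
    """Return the sglang serve argv for one of the platforms listed above."""
    try:
        tp = TP_BY_HARDWARE[hardware]
    except KeyError:
        raise ValueError(f"no TP setting documented here for {hardware!r}") from None
    return [
        "sglang", "serve",
        "--model-path", "moonshotai/Kimi-K2.7-Code",
        "--tp", str(tp),
        "--reasoning-parser", "kimi_k2",
        "--tool-call-parser", "kimi_k2",
        "--trust-remote-code",
        "--host", host,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to launch.
    print(shlex.join(build_serve_command("GB300")))
```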
The following results are from the official Kimi-K2.7-Code model card. They were evaluated with thinking mode enabled through Kimi Code CLI at temperature=1.0, top_p=0.95, and a 262,144-token context length unless otherwise stated.