# K-EXAONE
This model was released on 2025-12-31 and added to Hugging Face Transformers on 2026-02-04.
The K-EXAONE model is a large-scale multilingual language model developed by LG AI Research. Built on a Mixture-of-Experts architecture named EXAONE-MoE, K-EXAONE has 236 billion total parameters, of which 23 billion are active during inference. Evaluations across a range of benchmarks show that K-EXAONE excels at reasoning, agentic tasks, general knowledge, multilingual understanding, and long-context processing.
For more details, please refer to the technical report and GitHub repository.

All model weights, including quantized versions, are available in the Hugging Face collection.
For tasks that require accurate results, run K-EXAONE in reasoning mode as shown below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/K-EXAONE-236B-A23B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are K-EXAONE, a large language model developed by LG AI Research in South Korea, built to serve as a helpful and reliable assistant."},
    {"role": "user", "content": "Which one is bigger, 3.9 vs 3.12?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    enable_thinking=True,  # optional (default: True)
)

generated_ids = model.generate(
    **input_ids.to(model.device),
    max_new_tokens=16384,
    temperature=1.0,
    top_p=0.95,
)
output_ids = generated_ids[0][input_ids["input_ids"].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
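In reasoning mode, the decoded output contains the model's reasoning trace followed by the final answer. The sketch below shows one way to separate the two; it assumes the chat template wraps the trace in `<think>...</think>` tags, which is an assumption here, so check the model's actual chat template for the real marker tokens.

```python
# A minimal sketch of splitting the reasoning trace from the final answer.
# ASSUMPTION: the chat template wraps the trace in <think>...</think> tags;
# verify the actual marker tokens in the model's chat template.
def split_reasoning(text: str, end_tag: str = "</think>"):
    """Return (reasoning, answer); reasoning is empty if no tag is found."""
    head, sep, tail = text.partition(end_tag)
    if not sep:  # no reasoning marker present in the output
        return "", text.strip()
    return head.replace("<think>", "").strip(), tail.strip()

reasoning, answer = split_reasoning(
    "<think>3.9 = 3.90, which is greater than 3.12</think>3.9 is bigger."
)
```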
For tasks where latency matters more than accuracy, run K-EXAONE in non-reasoning mode as shown below.
```python
messages = [
    {"role": "system", "content": "You are K-EXAONE, a large language model developed by LG AI Research in South Korea, built to serve as a helpful and reliable assistant."},
    {"role": "user", "content": "Explain how wonderful you are"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    enable_thinking=False,
)

generated_ids = model.generate(
    **input_ids.to(model.device),
    max_new_tokens=1024,
    temperature=1.0,
    top_p=0.95,
)
output_ids = generated_ids[0][input_ids["input_ids"].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
For your AI-powered agent, you can leverage K-EXAONE's tool-calling capability. K-EXAONE is compatible with both the OpenAI and Hugging Face tool-calling specifications. The example below demonstrates tool calling using Hugging Face's docstring-to-tool-schema utility.

Please check the example file for a search-agent conversation using K-EXAONE.
```python
import random

from transformers.utils import get_json_schema

def roll_dice(max_num: int):
    """
    Roll a die with faces numbered 1 to N. The user can select the number N.

    Args:
        max_num: The maximum number on the die.
    """
    return random.randint(1, max_num)

tool_schema = get_json_schema(roll_dice)
tools = [tool_schema]

messages = [
    {"role": "system", "content": "You are K-EXAONE, a large language model developed by LG AI Research in South Korea, built to serve as a helpful and reliable assistant."},
    {"role": "user", "content": "Roll a D20 twice and sum the results."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tools=tools,
)

generated_ids = model.generate(
    **input_ids.to(model.device),
    max_new_tokens=16384,
    temperature=1.0,
    top_p=0.95,
)
output_ids = generated_ids[0][input_ids["input_ids"].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
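To complete the agent loop, the tool calls in the model's output must be parsed, executed, and fed back as `"tool"` role messages before re-running `apply_chat_template`. The sketch below assumes each call is serialized as a JSON object inside `<tool_call>...</tool_call>` tags; that serialization is an assumption, not the documented K-EXAONE format, so inspect the model's chat template before relying on it.

```python
import json
import random
import re

# ASSUMPTION: the model emits each tool call as JSON inside <tool_call> tags;
# the real serialization is model-specific -- check K-EXAONE's chat template.
def roll_dice(max_num: int):
    """Roll a die with faces numbered 1 to max_num."""
    return random.randint(1, max_num)

TOOLS = {"roll_dice": roll_dice}

def run_tool_calls(model_output: str, messages: list):
    """Parse <tool_call> payloads, execute them, and append 'tool' messages."""
    for payload in re.findall(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL):
        call = json.loads(payload)
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return messages

# Example with a hypothetical model output string:
history = run_tool_calls(
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 20}}</tool_call>',
    [],
)
```

The updated `history` can be passed back through `apply_chat_template` (with the same `tools` list) to let the model produce its final answer from the tool results.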
## ExaoneMoeConfig

[[autodoc]] ExaoneMoeConfig

## ExaoneMoeModel

[[autodoc]] ExaoneMoeModel
    - forward

## ExaoneMoeForCausalLM

[[autodoc]] ExaoneMoeForCausalLM
    - forward