# MLX
MLX is an array framework for machine learning on Apple silicon that also works with CUDA. On Apple silicon, arrays stay in shared memory to avoid data copies between CPU and GPU. Lazy computation enables graph manipulation and optimizations. Native safetensors support means Transformers language models run directly on MLX.
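The lazy-computation model can be sketched in a few lines (a minimal illustration, assuming `mlx` is installed; `mlx.core` is MLX's array API):

```python
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = a * 2 + 1  # builds a computation graph; nothing is evaluated yet

# Computation runs only when a result is actually needed, or when forced:
mx.eval(b)
print(b)
```

Because evaluation is deferred until `mx.eval` (or until a value is inspected), MLX can fuse and optimize the whole graph before running it.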
Install the mlx-lm and Transformers libraries.

```bash
pip install mlx-lm transformers
```
Load any Transformers language model from the Hub as long as the model architecture is supported. No weight conversion is required.
```python
from mlx_lm import load, generate

model, tokenizer = load("openai/gpt-oss-20b")
output = generate(
    model,
    tokenizer,
    prompt="The capital of France is",
    max_tokens=100,
)
print(output)
```
> [!TIP]
> The MLX Transformers integration is bidirectional. Transformers can also load and run MLX weights from the Hub.
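The reverse direction can be sketched with the standard Transformers loading API (the repository id below is a hypothetical placeholder; substitute any MLX-format checkpoint on the Hub whose architecture Transformers supports):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical MLX-format checkpoint on the Hub, used here for illustration.
# Because MLX checkpoints are stored as safetensors, Transformers can load
# them with from_pretrained like any other repository.
model_id = "mlx-community/My-MLX-Model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```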