Neutrino routes each query to the LLM best suited to the prompt, aiming to maximize response quality while keeping cost and latency down.
Check us out at: <a href="https://www.neutrinoapp.com/">neutrinoapp.com</a>. Docs: <a href="https://docs.neutrinoapp.com/">docs.neutrinoapp.com</a>. Create an API key: <a href="https://platform.neutrinoapp.com/">platform.neutrinoapp.com</a>.
%pip install llama-index-llms-neutrino
%pip install llama-index
You can create an API key at: <a href="https://platform.neutrinoapp.com/">platform.neutrinoapp.com</a>
import os
os.environ["NEUTRINO_API_KEY"] = "<your-neutrino-api-key>"
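If the placeholder string is left in place, the failure tends to show up later as an authentication error. As a minimal sketch, one could check the environment variable before constructing the client. The helper name `require_api_key` is hypothetical and not part of the Neutrino integration, and the check for a leading `<` only assumes the placeholder format used in this notebook.

```python
import os


def require_api_key(name="NEUTRINO_API_KEY"):
    """Return the key stored in the environment, or raise if it looks unset.

    Hypothetical helper for illustration; not part of llama-index.
    """
    key = os.environ.get(name, "").strip()
    # Treat an empty value or the "<your-...>" placeholder as "not configured".
    if not key or key.startswith("<"):
        raise RuntimeError(
            f"{name} is not set; create a key at platform.neutrinoapp.com"
        )
    return key
```

Calling `require_api_key()` once, before creating the `Neutrino` object, would surface a missing key early.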
A router is a collection of LLMs that you can route queries to. You can create a router in the Neutrino <a href="https://platform.neutrinoapp.com/">dashboard</a> or use the default router, which includes all supported models. You can treat a router as an LLM: pass its ID as `router` and call it like any other LLM.
from llama_index.llms.neutrino import Neutrino
from llama_index.core.llms import ChatMessage
llm = Neutrino(
    # api_key="<your-neutrino-api-key>",
    # router="<your-router-id>"  # (or 'default')
)
response = llm.complete("In short, a Neutrino is")
print(f"Optimal model: {response.raw['model']}")
print(response)
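To see how a router spreads a batch of prompts across models, one could tally the model name reported with each response. This is only a sketch: `tally_routed_models` is a hypothetical helper, and it assumes `response.raw` can be indexed with `"model"`, as the snippet above does.

```python
from collections import Counter


def tally_routed_models(llm, prompts):
    """Count how often each model is chosen for a list of prompts.

    Hypothetical helper: `llm` is any object whose `complete(prompt)` returns
    a response with a dict-like `raw` containing a "model" entry.
    """
    counts = Counter()
    for prompt in prompts:
        response = llm.complete(prompt)
        counts[response.raw["model"]] += 1
    return counts
```

With the `llm` object created above, this might be used as `tally_routed_models(llm, ["In short, a Neutrino is", "Write a haiku about routers"])`. Each prompt triggers one API call.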
message = ChatMessage(
    role="user",
    content="Explain the difference between statically typed and dynamically typed languages.",
)
resp = llm.chat([message])
print(f"Optimal model: {resp.raw['model']}")
print(resp)
message = ChatMessage(
    role="user", content="What is the approximate population of Mexico?"
)
resp = llm.stream_chat([message])
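for i, r in enumerate(resp):
    # Report the routed model once, using the first streamed chunk
    if i == 0:
        print(f"Optimal model: {r.raw['model']}")
    # Print each piece of text as it arrives, without adding newlines
    print(r.delta, end="")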
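If the full streamed text is needed afterwards rather than printed piece by piece, the chunks can be accumulated instead. The following is a minimal sketch: `collect_stream` is a hypothetical helper, and it assumes each chunk exposes `delta` (possibly `None`) and a dict-like `raw`, as the loop above suggests.

```python
def collect_stream(chunks):
    """Join streamed deltas and capture the first reported model name.

    Hypothetical helper: each chunk is expected to have a `delta` attribute
    (text or None) and a dict-like `raw` attribute.
    """
    model = None
    parts = []
    for chunk in chunks:
        # Remember the model from the first chunk that reports one.
        if model is None and chunk.raw:
            model = chunk.raw.get("model")
        # Skip chunks that carry no text.
        if chunk.delta:
            parts.append(chunk.delta)
    return model, "".join(parts)
```

With the objects from this notebook it might be called as `model, text = collect_stream(llm.stream_chat([message]))`.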