docs/examples/llm/ollama_gemma.ipynb
First, follow the readme to set up and run a local Ollama instance.
Gemma: a family of lightweight, state-of-the-art open models built by Google DeepMind, available in 2b and 7b parameter sizes.
Ollama: supports both the 2b and 7b models.
Note: please install ollama>=0.1.26.
You can download the pre-release version from Ollama.
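To make the version requirement concrete, here is a minimal sketch of checking a version string against the 0.1.26 minimum. The helper names (`parse_version`, `meets_minimum`, `fetch_server_version`) are hypothetical and are not part of LlamaIndex or Ollama. The sketch also assumes the local server listens on `http://localhost:11434` and answers `GET /api/version` with a JSON object containing a `version` field; adjust it if your setup differs.

```python
import json
import re
from urllib.request import urlopen

MIN_VERSION = (0, 1, 26)  # minimum taken from the note above


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints.

    Any suffix such as "-rc1" is ignored, so a pre-release build of 0.1.26
    is treated the same as 0.1.26 itself.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum=MIN_VERSION):
    """Compare numerically, so that 0.1.9 correctly sorts below 0.1.26."""
    return parse_version(version_text) >= minimum


def fetch_server_version(base_url="http://localhost:11434", timeout=5.0):
    """Ask a running Ollama server for its version.

    Assumption: the server exposes GET /api/version and returns
    {"version": "..."}; base_url is the assumed default local address.
    """
    with urlopen(f"{base_url}/api/version", timeout=timeout) as reply:
        return json.load(reply)["version"]
```

With the app running, `meets_minimum(fetch_server_version())` should tell you whether the installed build is recent enough for the Gemma models.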
When the Ollama app is running on your local machine:
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index-llms-ollama
!pip install llama-index
from llama_index.llms.ollama import Ollama
gemma_2b = Ollama(model="gemma:2b", request_timeout=30.0)
gemma_7b = Ollama(model="gemma:7b", request_timeout=30.0)
resp = gemma_2b.complete("Who is Paul Graham?")
print(resp)
resp = gemma_7b.complete("Who is Paul Graham?")
print(resp)
resp = gemma_2b.complete("Who owns Tesla?")
print(resp)
resp = gemma_7b.complete("Who owns Tesla?")
print(resp)
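The cells above send the same prompts to both model sizes one call at a time. As a rough sketch, the comparison can be wrapped in a small helper that also records how long each call takes. `compare_models` is a hypothetical helper, not a LlamaIndex function; it only assumes that each model object has a `complete(prompt)` method whose result converts to text with `str()`, as the objects above do.

```python
import time


def compare_models(models, prompts):
    """Run every prompt through every model and collect the answers.

    models:  dict mapping a display name to an object with .complete(prompt)
    prompts: iterable of prompt strings
    Returns a list of dicts, one per (prompt, model) pair, in call order.
    """
    results = []
    for prompt in prompts:
        for name, llm in models.items():
            start = time.perf_counter()
            response = llm.complete(prompt)
            elapsed = time.perf_counter() - start
            results.append(
                {
                    "model": name,
                    "prompt": prompt,
                    "text": str(response),
                    "seconds": elapsed,
                }
            )
    return results


# Example usage with the models defined above (needs a running Ollama server):
# rows = compare_models(
#     {"gemma:2b": gemma_2b, "gemma:7b": gemma_7b},
#     ["Who is Paul Graham?"],
# )
# for row in rows:
#     print(f"[{row['model']}] {row['seconds']:.1f}s\n{row['text']}\n")
```

Timings measured this way include model load time on the first call, so treat them as indicative only.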
Chat with a list of messages
from llama_index.core.llms import ChatMessage
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = gemma_7b.chat(messages)
print(resp)
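The chat call above is a single turn. To continue the conversation, the assistant's reply has to be appended to the message list before the next user message is sent. The sketch below shows one way to do that. `chat_turn` is a hypothetical helper; it assumes only that `llm.chat(messages)` returns an object with a `message` attribute that carries `content`, which is how the response above is structured.

```python
def chat_turn(llm, history, user_message):
    """Send one user message and record both sides of the exchange.

    llm:          object with .chat(list_of_messages)
    history:      list of messages so far; modified in place
    user_message: an already constructed message with role "user"
    Returns the text of the assistant's reply.
    """
    history.append(user_message)
    response = llm.chat(history)
    # Keep the assistant reply so the next turn sees the full conversation.
    history.append(response.message)
    return response.message.content


# Example usage with the objects defined above (needs a running Ollama server):
# history = [
#     ChatMessage(
#         role="system", content="You are a pirate with a colorful personality"
#     )
# ]
# print(chat_turn(gemma_7b, history, ChatMessage(role="user", content="What is your name")))
# print(chat_turn(gemma_7b, history, ChatMessage(role="user", content="Where do you sail?")))
```

Because `history` grows with every turn, long conversations may eventually need trimming to stay within the model's context window.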
Using stream_complete endpoint
response = gemma_7b.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
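The loop above prints each piece of text as it arrives but does not keep the full answer. If you also want the complete string afterwards, a small helper along these lines can help. `collect_stream` is a hypothetical name; the sketch assumes only that each streamed item has a `delta` attribute, which may be `None` for some chunks.

```python
def collect_stream(chunks, echo=True):
    """Consume a streaming response and return the concatenated text.

    chunks: iterable of objects with a .delta attribute (str or None)
    echo:   when True, print each delta as soon as it arrives
    """
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # treat a missing delta as empty text
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)


# Example usage with the model defined above (needs a running Ollama server):
# full_text = collect_stream(gemma_7b.stream_complete("Who is Paul Graham?"))
```

Since the helper relies only on `delta`, it should work the same way on the items yielded by `stream_chat`.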
Using stream_chat endpoint
from llama_index.core.llms import ChatMessage
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = gemma_7b.stream_chat(messages)
for r in resp:
    print(r.delta, end="")