Ollama - Gemma


Setup

First, follow the Ollama README to set up and run a local Ollama instance.

Gemma: a family of lightweight, state-of-the-art open models built by Google DeepMind, available in 2b and 7b parameter sizes.

Ollama: supports both the 2b and 7b models.

Note: please install ollama>=0.1.26. You can download a pre-release version from Ollama.

When the Ollama app is running on your local machine:

  • All of your local models are automatically served on localhost:11434
  • Select your model when setting llm = Ollama(..., model="<model family>:<version>")
  • Increase the default timeout (30 seconds) if needed by setting Ollama(..., request_timeout=300.0)
  • If you set llm = Ollama(..., model="<model family>") without a version, it will look for latest
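
Before constructing the LLM, it can help to confirm that the local server is reachable and that the tag you plan to use is actually present. The following is a minimal sketch using only the standard library. It assumes the server exposes Ollama's `/api/tags` listing on the default address mentioned above. `normalize_tag` and `list_local_models` are hypothetical helper names introduced here for illustration; they are not part of Ollama or LlamaIndex.

```python
import json
import urllib.request
from typing import List, Optional


def normalize_tag(model: str) -> str:
    """Append ':latest' when the model name carries no explicit version.

    Hypothetical helper: mirrors the rule that a bare family name such as
    "gemma" is looked up as "gemma:latest".
    """
    # Only inspect the last path segment so a registry host with a port
    # (for example "host:5000/gemma") is not mistaken for a version.
    last_segment = model.rsplit("/", 1)[-1]
    return model if ":" in last_segment else f"{model}:latest"


def list_local_models(
    base_url: str = "http://localhost:11434", timeout: float = 5.0
) -> Optional[List[str]]:
    """Return the names of locally available models, or None if unreachable.

    Assumption: the server answers GET /api/tags with JSON of the form
    {"models": [{"name": "gemma:2b", ...}, ...]}.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None
    if not isinstance(payload, dict):
        return None
    return [m["name"] for m in payload.get("models", []) if "name" in m]
```

A typical check would compare `normalize_tag("gemma:2b")` against the list returned by `list_local_models()`. A `None` result suggests the Ollama app is not running, and a missing tag suggests the model still needs to be pulled.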

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

python
!pip install llama-index-llms-ollama
python
!pip install llama-index
python
from llama_index.llms.ollama import Ollama
python
gemma_2b = Ollama(model="gemma:2b", request_timeout=30.0)
gemma_7b = Ollama(model="gemma:7b", request_timeout=30.0)
python
resp = gemma_2b.complete("Who is Paul Graham?")
print(resp)
python
resp = gemma_7b.complete("Who is Paul Graham?")
print(resp)
python
resp = gemma_2b.complete("Who owns Tesla?")
print(resp)
python
resp = gemma_7b.complete("Who owns Tesla?")
print(resp)
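
The cells above send the same prompt to each model one at a time. To view the answers side by side, along with a rough wall-clock time per model, the calls can be wrapped in a small loop. This is a sketch only: `compare_completions` is a hypothetical helper, and it assumes nothing beyond each object having a `complete(prompt)` method whose result prints as text, as the `gemma_2b` and `gemma_7b` objects do above.

```python
import time
from typing import Any, Dict


def compare_completions(models: Dict[str, Any], prompt: str) -> Dict[str, Dict[str, Any]]:
    """Run one prompt against several LLM objects and record text and latency.

    `models` maps a display name to any object exposing `.complete(prompt)`.
    """
    results: Dict[str, Dict[str, Any]] = {}
    for name, llm in models.items():
        start = time.perf_counter()
        resp = llm.complete(prompt)
        elapsed = time.perf_counter() - start
        # str(resp) matches what print(resp) shows in the cells above.
        results[name] = {"text": str(resp), "seconds": elapsed}
    return results


# Example usage with the objects defined earlier in this notebook:
# results = compare_completions(
#     {"gemma:2b": gemma_2b, "gemma:7b": gemma_7b}, "Who is Paul Graham?"
# )
# for name, r in results.items():
#     print(f"{name} ({r['seconds']:.1f}s): {r['text']}")
```

Timings from a single run are only indicative, since the first request to a model may include the time Ollama needs to load it into memory.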

Call chat with a list of messages

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = gemma_7b.chat(messages)
python
print(resp)

Streaming

Using the stream_complete endpoint

python
response = gemma_7b.stream_complete("Who is Paul Graham?")
python
for r in response:
    print(r.delta, end="")

Using the stream_chat endpoint

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = gemma_7b.stream_chat(messages)
python
for r in resp:
    print(r.delta, end="")
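
Both streaming loops above print each `delta` as it arrives but do not keep the full text. If the complete response is needed afterwards, the chunks can be accumulated while printing. The sketch below uses a hypothetical helper, `print_and_collect`, which assumes only that each streamed item has a `delta` attribute, as in the loops above. It treats a missing or `None` delta as an empty string.

```python
from typing import Any, Iterable


def print_and_collect(stream: Iterable[Any]) -> str:
    """Print streamed deltas as they arrive and return the concatenated text."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no new text; treat those as empty.
        delta = getattr(chunk, "delta", None) or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# Example usage with the objects defined earlier in this notebook:
# full_text = print_and_collect(gemma_7b.stream_complete("Who is Paul Graham?"))
# full_reply = print_and_collect(gemma_7b.stream_chat(messages))
```

Passing `flush=True` makes partial output appear immediately in terminals that buffer by line.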