
Replicate - Llama 2 13B


<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/llama_2.ipynb" target="_parent"></a>


Setup

Make sure you have the REPLICATE_API_TOKEN environment variable set.
If you don't have one yet, go to https://replicate.com/ to obtain one.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-replicate
python
!pip install llama-index
python
import os
python
os.environ["REPLICATE_API_TOKEN"] = "<your API key>"
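
Pasting a token straight into a cell makes it easy to leak when the notebook is shared. As a minimal sketch (the helper name `ensure_replicate_token` is hypothetical and not part of LlamaIndex or Replicate), the token can be read from the environment and requested interactively only when it is missing:

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_replicate_token(
    env: MutableMapping[str, str] = os.environ,
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the token from env, prompting for it only when it is missing."""
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        # getpass hides the input, so the token is not echoed into the notebook
        token = prompt_fn("Replicate API token: ").strip()
        if not token:
            raise ValueError("REPLICATE_API_TOKEN is required")
        env["REPLICATE_API_TOKEN"] = token
    return token


# In a notebook cell, call it once before creating the LLM:
# ensure_replicate_token()
```

The `env` and `prompt_fn` parameters exist only so the helper can be exercised without touching the real environment or blocking on input.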

Basic Usage

We showcase the "llama13b-v2-chat" model, which you can play with directly at: https://replicate.com/a16z-infra/llama13b-v2-chat

python
from llama_index.llms.replicate import Replicate

llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"
)

Call complete with a prompt

python
resp = llm.complete("Who is Paul Graham?")
python
print(resp)

Call chat with a list of messages

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
python
print(resp)

Streaming

Using stream_complete endpoint

python
response = llm.stream_complete("Who is Paul Graham?")
python
for r in response:
    print(r.delta, end="")
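
The loop above prints each delta and then discards it. If the full text is also needed afterwards, the deltas can be collected while they are printed. The sketch below assumes only that each chunk exposes a `delta` attribute, as in the loop above; `collect_stream` is a hypothetical helper, and the stand-in chunks are made-up data so the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # treat a missing delta as empty text
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)


# Stand-in chunks (hypothetical data) in place of llm.stream_complete(...)
fake_stream = (
    SimpleNamespace(delta=d) for d in ["Paul ", "Graham ", None, "is..."]
)
full_text = collect_stream(fake_stream, echo=False)
```

With a real model, the call would be `collect_stream(llm.stream_complete("Who is Paul Graham?"))`.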

Using stream_chat endpoint

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
python
for r in resp:
    print(r.delta, end="")

Configure Model

python
from llama_index.llms.replicate import Replicate

llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5",
    temperature=0.9,  # sampling temperature
    context_window=32,  # context window size, in tokens; 32 is very small
)
python
resp = llm.complete("Who is Paul Graham?")
python
print(resp)
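
When the same settings are reused in several cells, it can help to keep them in one place and catch obviously bad values early. This is a rough sketch only: `replicate_settings` is a hypothetical helper, and its bounds are illustrative assumptions rather than limits documented by Replicate or LlamaIndex.

```python
from typing import Any, Dict

# Model identifier copied from the example above.
MODEL_ID = (
    "a16z-infra/llama13b-v2-chat:"
    "df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"
)


def replicate_settings(
    temperature: float = 0.9, context_window: int = 32
) -> Dict[str, Any]:
    """Build the keyword arguments used above, rejecting obviously bad values."""
    # Illustrative checks only (assumed, not taken from the Replicate docs).
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if context_window <= 0:
        raise ValueError("context_window must be a positive token count")
    return {
        "model": MODEL_ID,
        "temperature": temperature,
        "context_window": context_window,
    }


settings = replicate_settings(temperature=0.2, context_window=2048)
# With the package installed and a token set:
# llm = Replicate(**settings)
```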