
Call Replicate LLMs using chatGPT Input/Output Format

cookbook/liteLLM_Replicate_Demo.ipynb


This tutorial covers calling the following Replicate models (Llama 2 70B, Llama 2 7B, Dolly v2, and Vicuna) with liteLLM.

```python
# install liteLLM
!pip install litellm
```

Imports & set ENV variables. Get your Replicate key here: https://replicate.com/account/api-tokens

```python
from litellm import completion
import os

os.environ['REPLICATE_API_TOKEN'] = ' ' # @param
user_message = "Hello, what's the weather in San Francisco?"
messages = [{"content": user_message, "role": "user"}]
```

Call Replicate models using `completion(model, messages)` - chatGPT format

```python
llama_2 = "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1"
llama_2_7b = "a16z-infra/llama-2-7b-chat:4f0b260b6a13eb53a6b1891f089d57c08f41003ae79458be5011303d81a394dc"
dolly_v2 = "replicate/dolly-v2-12b:ef0e1aefc61f8e096ebe4db6b2bacc297daf2ef6899f0f7e001ec445893500e5"
vicuna = "replicate/vicuna-13b:6282abe6a492de4145d7bb601023762212f9ddbbe78278bd6771c8b3b2f2a13b"

models = [llama_2, llama_2_7b, dolly_v2, vicuna]
for model in models:
  response = completion(model=model, messages=messages)
  print(f"Response from {model}\n")
  print(response)
```
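Because `completion()` returns responses in the OpenAI chat format, the generated text can be pulled out the same way regardless of which Replicate model produced it. A minimal sketch, using a hand-built dict as a stand-in for a real API result:

```python
# Stand-in for a real completion() result; live calls return this OpenAI-style shape.
response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "It's sunny in San Francisco.", "role": "assistant"},
        }
    ],
    "model": "replicate/llama-2-70b-chat",
}

# Extract the assistant's reply text
reply = response["choices"][0]["message"]["content"]
print(reply)  # It's sunny in San Francisco.
```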
```python
# @title Stream Responses from Replicate - Outputs in the same format used by chatGPT streaming
response = completion(model=llama_2, messages=messages, stream=True)

for chunk in response:
  print(chunk['choices'][0]['delta'])
```
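Each streamed chunk carries only a small piece of the reply in its `delta`; to rebuild the full message, concatenate the `content` of every delta. A minimal sketch, using a mocked chunk list in place of a live stream (real chunks follow the same shape):

```python
# Mocked chunks standing in for a live stream from completion(..., stream=True).
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": "The weather "}}]},
    {"choices": [{"delta": {"content": "is sunny."}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry an empty delta
]

# Accumulate the delta contents into the complete reply
full_reply = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in chunks
)
print(full_reply)  # The weather is sunny.
```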