# LlamaIndex LLMs Integration: Friendli
## Installation

Install the required Python packages:

```shell
%pip install llama-index-llms-friendli
!pip install llama-index
```
## Setup

Set the Friendli token as an environment variable:

```python
%env FRIENDLI_TOKEN=your_token_here
```
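The `%env` magic only works in a notebook. In a plain Python script you can set the variable with the standard library instead; a minimal sketch (the token value is a placeholder):

```python
import os

# Placeholder for illustration; avoid hard-coding real credentials in source files.
os.environ["FRIENDLI_TOKEN"] = "your_token_here"
```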
## Usage

### Chat

To generate a chat response, use the following code:

```python
from llama_index.llms.friendli import Friendli
from llama_index.core.llms import ChatMessage, MessageRole

llm = Friendli()
message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.")
resp = llm.chat([message])
print(resp)
```
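Because `chat` accepts a list of messages, you can also include a system prompt or earlier conversation turns. A minimal sketch continuing from the example above (the prompt text is illustrative):

```python
# A system prompt followed by a user turn; both use the imports from above.
messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a concise assistant."),
    ChatMessage(role=MessageRole.USER, content="Tell me a joke."),
]
resp = llm.chat(messages)
print(resp)
```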
### Streaming chat

To stream chat responses in real time:

```python
resp = llm.stream_chat([message])
for r in resp:
    print(r.delta, end="")
```
### Async chat

For asynchronous chat interactions, use the following:

```python
resp = await llm.achat([message])
print(resp)
```
### Async streaming chat

To handle async streaming of chat responses:

```python
resp = await llm.astream_chat([message])
async for r in resp:
    print(r.delta, end="")
```
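The `await` examples above assume an environment with a running event loop, such as a notebook. In a plain script, wrap the calls in a coroutine and drive it with `asyncio.run`; a minimal sketch:

```python
import asyncio

from llama_index.llms.friendli import Friendli
from llama_index.core.llms import ChatMessage, MessageRole


async def main() -> None:
    llm = Friendli()
    message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.")
    # Await the coroutine to get an async generator, then iterate over deltas.
    resp = await llm.astream_chat([message])
    async for r in resp:
        print(r.delta, end="")


asyncio.run(main())
```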
### Completion

To generate a completion based on a prompt:

```python
prompt = "Draft a cover letter for a role in software engineering."
resp = llm.complete(prompt)
print(resp)
```
### Streaming completion

To stream completions in real time:

```python
resp = llm.stream_complete(prompt)
for r in resp:
    print(r.delta, end="")
```
### Async completion

To handle async completions:

```python
resp = await llm.acomplete(prompt)
print(resp)
```
### Async streaming completion

For async streaming of completions:

```python
resp = await llm.astream_complete(prompt)
async for r in resp:
    print(r.delta, end="")
```
### Model configuration

To configure a specific model:

```python
llm = Friendli(model="llama-2-70b-chat")
resp = llm.chat([message])
print(resp)
```
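All of the chat and completion methods shown above work the same way with an explicitly configured model; consult the Friendli documentation for the list of currently available models.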