docs/examples/llm/monsterapi.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/monsterapi.ipynb" target="_parent">Open In Colab</a>
MonsterAPI hosts a wide range of popular LLMs as an inference service, and this notebook serves as a tutorial on how to use llama-index to access MonsterAPI LLMs.
Check us out here: https://monsterapi.ai/
Install Required Libraries
%pip install llama-index-llms-monsterapi
!python3 -m pip install llama-index --quiet
!python3 -m pip install monsterapi --quiet
!python3 -m pip install sentence_transformers --quiet
Import required modules
import os
from llama_index.llms.monsterapi import MonsterLLM
from llama_index.core.embeddings import resolve_embed_model
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
Sign up on MonsterAPI and get a free auth key. Paste it below:
os.environ["MONSTER_API_KEY"] = ""
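If you'd rather not hardcode the key in the notebook, a minimal alternative is to prompt for it at runtime with the standard-library getpass module:
import getpass

# Prompt for the key interactively instead of storing it in the notebook
os.environ["MONSTER_API_KEY"] = getpass.getpass("Enter your MonsterAPI key: ")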
Set the model
model = "meta-llama/Meta-Llama-3-8B-Instruct"
Initialize the LLM module
llm = MonsterLLM(model=model, temperature=0.75)
result = llm.complete("Who are you?")
print(result)
from llama_index.core.llms import ChatMessage

# Construct mock chat history: an instruction turn followed by the actual question
history_message = ChatMessage(
    role="user",
    content=(
        "When asked 'who are you?' respond with 'I am qblocks llm model'"
        " every time."
    ),
)
current_message = ChatMessage(role="user", content="Who are you?")
response = llm.chat([history_message, current_message])
print(response)
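If the underlying model honors system prompts, the same instruction can be carried as a system message instead of a user turn. A minimal sketch (whether the system role is respected depends on the model's chat template):
from llama_index.core.llms import ChatMessage, MessageRole

# Same instruction, expressed as a system message rather than a user turn
system_message = ChatMessage(
    role=MessageRole.SYSTEM,
    content="When asked 'who are you?' respond with 'I am qblocks llm model' every time.",
)
response = llm.chat(
    [system_message, ChatMessage(role=MessageRole.USER, content="Who are you?")]
)
print(response)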
## RAG Approach to import external knowledge into the LLM as context
Source Paper: https://arxiv.org/pdf/2005.11401.pdf
Retrieval-Augmented Generation (RAG) is a method that combines a model's learned knowledge (parametric memory) with an external knowledge source, such as a retrieval index over documents (non-parametric memory), to generate responses. By grounding its answers in retrieved passages, the model can produce responses that are more specific and factual than it could from its parameters alone.
Install the pypdf library, which is needed to parse the PDF.
!python3 -m pip install pypdf --quiet
Let's augment our LLM with the RAG source paper PDF as external information. First, download the PDF into a data directory.
!rm -rf ./data
!mkdir -p data && cd data && curl 'https://arxiv.org/pdf/2005.11401.pdf' -o "RAG.pdf"
Load the document
documents = SimpleDirectoryReader("./data").load_data()
Initialize the LLM and embedding model
llm = MonsterLLM(model=model, temperature=0.75, context_window=1024)
embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
splitter = SentenceSplitter(chunk_size=1024)
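Optionally, you can preview how SentenceSplitter will chunk the loaded documents before they are embedded; get_nodes_from_documents is the standard LlamaIndex node-parser entry point:
# Optional sanity check: see how many chunks the splitter produces
nodes = splitter.get_nodes_from_documents(documents)
print(len(nodes), "chunks; first chunk starts with:", nodes[0].get_content()[:120])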
Create the embedding store and build the index
index = VectorStoreIndex.from_documents(
documents, transformations=[splitter], embed_model=embed_model
)
query_engine = index.as_query_engine(llm=llm)
Actual LLM output without RAG:
response = llm.complete("What is Retrieval-Augmented Generation?")
print(response)
LLM Output with RAG
response = query_engine.query("What is Retrieval-Augmented Generation?")
print(response)
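To see which chunks the query engine actually retrieved as context, you can inspect source_nodes on the response object, a standard attribute of LlamaIndex's Response:
# Each source node carries the retrieved chunk and its similarity score
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:150])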
Monster Deploy enables you to host any vLLM-supported large language model (LLM), such as TinyLlama, Mixtral, or Phi-2, as a REST API endpoint on MonsterAPI's cost-optimised GPU cloud.
With MonsterAPI's integration in LlamaIndex, you can use your deployed LLM API endpoints to build RAG systems or RAG bots for your use cases.
Once the deployment is live, grab its base_url and api_auth_token and use them below.
Note: When using LlamaIndex to access Monster Deploy LLMs, you need to build a prompt in the template your base model expects and send the compiled prompt as input. See the prompt template sketch below for details.
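A minimal sketch of compiling a prompt with LlamaIndex's PromptTemplate. The instruction-style template string here is only an example; swap it for the template your deployed base model expects:
from llama_index.core import PromptTemplate

# Example template; replace with the format required by your deployed model
qa_template = PromptTemplate("### Instruction:\n{query}\n\n### Response:")
compiled_prompt = qa_template.format(
    query="What is Retrieval-Augmented Generation?"
)
print(compiled_prompt)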
deploy_llm = MonsterLLM(
model="<Replace with basemodel used to deploy>",
api_base="https://ecc7deb6-26e0-419b-a7f2-0deb934af29a.monsterapi.ai",
api_key="a0f8a6ba-c32f-4407-af0c-169f1915490c",
temperature=0.75,
)
deploy_llm.complete("What is Retrieval-Augmented Generation?")
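You can also point the RAG pipeline built earlier at the deployed endpoint, assuming the index from the previous section is still in memory:
# Reuse the existing vector index, swapping in the deployed LLM
deploy_query_engine = index.as_query_engine(llm=deploy_llm)
print(deploy_query_engine.query("What is Retrieval-Augmented Generation?"))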
from llama_index.core.llms import ChatMessage

# Construct mock chat history: an instruction turn followed by the actual question
history_message = ChatMessage(
    role="user",
    content=(
        "When asked 'who are you?' respond with 'I am qblocks llm model'"
        " every time."
    ),
)
current_message = ChatMessage(role="user", content="Who are you?")
response = deploy_llm.chat([history_message, current_message])
print(response)