docs/examples/llm/monsterapi.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/monsterapi.ipynb" target="_parent">Open In Colab</a>
MonsterAPI hosts a wide range of popular LLMs as an inference service, and this notebook serves as a tutorial on how to use llama-index to access MonsterAPI LLMs.
Check us out here: https://monsterapi.ai/
Install Required Libraries
%pip install llama-index-llms-monsterapi
!python3 -m pip install llama-index --quiet
!python3 -m pip install monsterapi --quiet
!python3 -m pip install sentence_transformers --quiet
Import required modules
import os
from llama_index.llms.monsterapi import MonsterLLM
from llama_index.core.embeddings import resolve_embed_model
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
Sign up on MonsterAPI and get a free auth key. Paste it below:
os.environ["MONSTER_API_KEY"] = ""
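If you'd rather not hardcode the key in the notebook, a minimal alternative is to prompt for it at runtime with the standard-library getpass module:
import getpass

# Prompt for the key interactively instead of storing it in the notebook
os.environ["MONSTER_API_KEY"] = getpass.getpass("Enter your MonsterAPI key: ")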
Set the model
model = "meta-llama/Meta-Llama-3-8B-Instruct"
Initialize the LLM module
llm = MonsterLLM(model=model, temperature=0.75)
result = llm.complete("Who are you?")
print(result)
from llama_index.core.llms import ChatMessage

# Construct mock chat history: an instruction turn followed by the actual question
history_message = ChatMessage(
    role="user",
    content=(
        "When asked 'who are you?' respond with 'I am qblocks llm model'"
        " every time."
    ),
)
current_message = ChatMessage(role="user", content="Who are you?")
response = llm.chat([history_message, current_message])
print(response)
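If the underlying model honors system prompts, the same instruction can be carried as a system message instead of a user turn. A minimal sketch (whether the system role is respected depends on the model's chat template):
from llama_index.core.llms import ChatMessage, MessageRole

# Same instruction, expressed as a system message rather than a user turn
system_message = ChatMessage(
    role=MessageRole.SYSTEM,
    content="When asked 'who are you?' respond with 'I am qblocks llm model' every time.",
)
response = llm.chat(
    [system_message, ChatMessage(role=MessageRole.USER, content="Who are you?")]
)
print(response)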
## RAG Approach to import external knowledge into the LLM as context
Source Paper: https://arxiv.org/pdf/2005.11401.pdf
Retrieval-Augmented Generation (RAG) is a method that combines a model's learned knowledge (parametric memory) with an external knowledge source, such as a retrieval index over documents (non-parametric memory), to generate responses. By grounding its answers in retrieved passages, the model can produce responses that are more specific and factual than it could from its parameters alone.
Install the pypdf library, which is needed to parse the PDF.
!python3 -m pip install pypdf --quiet
Let's augment our LLM with the RAG source paper PDF as external information. First, download the PDF into a data directory.
!rm -rf ./data
!mkdir -p data && cd data && curl 'https://arxiv.org/pdf/2005.11401.pdf' -o "RAG.pdf"
Load the document
documents = SimpleDirectoryReader("./data").load_data()
Initialize the LLM and embedding model
llm = MonsterLLM(model=model, temperature=0.75, context_window=1024)
embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
splitter = SentenceSplitter(chunk_size=1024)
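Optionally, you can preview how SentenceSplitter will chunk the loaded documents before they are embedded; get_nodes_from_documents is the standard LlamaIndex node-parser entry point:
# Optional sanity check: see how many chunks the splitter produces
nodes = splitter.get_nodes_from_documents(documents)
print(len(nodes), "chunks; first chunk starts with:", nodes[0].get_content()[:120])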
Create the embedding store and build the index
index = VectorStoreIndex.from_documents(
documents, transformations=[splitter], embed_model=embed_model
)
query_engine = index.as_query_engine(llm=llm)
Actual LLM output without RAG:
response = llm.complete("What is Retrieval-Augmented Generation?")
print(response)
LLM Output with RAG
response = query_engine.query("What is Retrieval-Augmented Generation?")
print(response)
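To see which chunks the query engine actually retrieved as context, you can inspect source_nodes on the response object, a standard attribute of LlamaIndex's Response:
# Each source node carries the retrieved chunk and its similarity score
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:150])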
Monster Deploy enables you to host any vLLM-supported large language model (LLM), such as TinyLlama, Mixtral, or Phi-2, as a REST API endpoint on MonsterAPI's cost-optimised GPU cloud.
With MonsterAPI's integration in LlamaIndex, you can use your deployed LLM API endpoints to build RAG systems or RAG bots for your use cases.
Once the deployment is live, grab its base_url and api_auth_token and use them below.
Note: When using LlamaIndex to access Monster Deploy LLMs, you need to build a prompt in the template your base model expects and send the compiled prompt as input. See the prompt template sketch below for details.
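A minimal sketch of compiling a prompt with LlamaIndex's PromptTemplate. The instruction-style template string here is only an example; swap it for the template your deployed base model expects:
from llama_index.core import PromptTemplate

# Example template; replace with the format required by your deployed model
qa_template = PromptTemplate("### Instruction:\n{query}\n\n### Response:")
compiled_prompt = qa_template.format(
    query="What is Retrieval-Augmented Generation?"
)
print(compiled_prompt)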
deploy_llm = MonsterLLM(
model="<Replace with basemodel used to deploy>",
api_base="https://ecc7deb6-26e0-419b-a7f2-0deb934af29a.monsterapi.ai",
api_key="a0f8a6ba-c32f-4407-af0c-169f1915490c",
temperature=0.75,
)
deploy_llm.complete("What is Retrieval-Augmented Generation?")
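You can also point the RAG pipeline built earlier at the deployed endpoint, assuming the index from the previous section is still in memory:
# Reuse the existing vector index, swapping in the deployed LLM
deploy_query_engine = index.as_query_engine(llm=deploy_llm)
print(deploy_query_engine.query("What is Retrieval-Augmented Generation?"))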
from llama_index.core.llms import ChatMessage

# Construct mock chat history: an instruction turn followed by the actual question
history_message = ChatMessage(
    role="user",
    content=(
        "When asked 'who are you?' respond with 'I am qblocks llm model'"
        " every time."
    ),
)
current_message = ChatMessage(role="user", content="Who are you?")
response = deploy_llm.chat([history_message, current_message])
print(response)