embedchain/docs/components/llms.mdx
Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.
<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Google AI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="Anthropic" href="#anthropic"></Card>
  <Card title="Cohere" href="#cohere"></Card>
  <Card title="Together" href="#together"></Card>
  <Card title="Ollama" href="#ollama"></Card>
  <Card title="vLLM" href="#vllm"></Card>
  <Card title="Clarifai" href="#clarifai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="JinaChat" href="#jinachat"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Llama2" href="#llama2"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
  <Card title="Mistral AI" href="#mistral-ai"></Card>
  <Card title="AWS Bedrock" href="#aws-bedrock"></Card>
  <Card title="Groq" href="#groq"></Card>
  <Card title="NVIDIA AI" href="#nvidia-ai"></Card>
</CardGroup>

## OpenAI

To use OpenAI LLM models, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the OpenAI Platform.
Once you have obtained the key, you can use it like this:
```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```
If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a YAML config file.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: openai
  config:
    model: 'gpt-4o-mini'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
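If you prefer to keep everything in code, the same settings can also be passed as a Python dict instead of a YAML file. A minimal sketch mirroring the YAML above, using the `App.from_config(config=...)` form that appears elsewhere on this page:

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# Same configuration as config.yaml above, expressed as a Python dict
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0.5,
            "max_tokens": 1000,
            "top_p": 1,
            "stream": False,
        },
    }
}

app = App.from_config(config=config)
```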
Embedchain supports OpenAI function calling with a single function. It accepts inputs in accordance with the LangChain interface:
<Accordion title="Pydantic Model"> ```python from pydantic import BaseModelclass multiply(BaseModel): """Multiply two integers together."""
a: int = Field(..., description="First integer")
b: int = Field(..., description="Second integer")
</Accordion>
<Accordion title="Python function">
```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b
```

</Accordion>
With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.
```python main.py
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
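For the query above, the model is expected to select the `multiply` tool with arguments `a=125` and `b=15` (which evaluates to 1875); the exact shape of `result` depends on the tool-calling response returned by the model.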
## Google AI

To use Google AI models, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from Google MakerSuite.
os.environ["GOOGLE_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")
app.add("https://www.forbes.com/profile/elon-musk")
response = app.query("What is the net worth of Elon Musk?") if app.llm.config.stream: # if stream is enabled, response is a generator for chunk in response: print(chunk) else: print(response)
```yaml config.yaml
llm:
  provider: google
  config:
    model: gemini-pro
    max_tokens: 1000
    temperature: 0.5
    top_p: 1
    stream: false

embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
## Azure OpenAI

To use the Azure OpenAI model, you have to set the Azure OpenAI-related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-4o-mini
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the Azure OpenAI Platform.
## Anthropic

To use Anthropic's models, please set the `ANTHROPIC_API_KEY`, which you can find on their Account Settings page.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["ANTHROPIC_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: anthropic
  config:
    model: 'claude-instant-1'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
## Cohere

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` environment variable, which you can find on their Account settings page.

Once you have the API key, you are all set to use it with Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["COHERE_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: cohere
  config:
    model: large
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>
## Together

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` environment variable, which you can find on their Account settings page.

Once you have the API key, you are all set to use it with Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["TOGETHER_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: together
  config:
    model: togethercomputer/RedPajama-INCITE-7B-Base
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>
## Ollama

Set up Ollama by following the instructions at https://github.com/jmorganca/ollama.
<CodeGroup>

```python main.py
import os

os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"

from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: ollama
  config:
    model: 'llama2'
    temperature: 0.5
    top_p: 1
    stream: true
    base_url: 'http://localhost:11434'

embedder:
  provider: ollama
  config:
    model: znbang/bge:small-en-v1.5-q8_0
    base_url: http://localhost:11434
```

</CodeGroup>
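Note that the models referenced in the config (here `llama2` and `znbang/bge:small-en-v1.5-q8_0`) must already be available locally; you would typically fetch them first with `ollama pull llama2` and `ollama pull znbang/bge:small-en-v1.5-q8_0`.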
## vLLM

Set up vLLM by following the instructions given in their docs.
<CodeGroup>

```python main.py
import os
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: vllm
  config:
    model: 'meta-llama/Llama-2-70b-hf'
    temperature: 0.5
    top_p: 1
    top_k: 10
    stream: true
    trust_remote_code: true
```

</CodeGroup>
## Clarifai

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` environment variable, which you can find on the Security page. Optionally, you can also pass the PAT key as a parameter to the LLM/Embedder class.

Now you are all set to explore Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["CLARIFAI_PAT"] = "XXX"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# Now let's add some data.
app.add("https://www.forbes.com/profile/elon-musk")

# Query the app
response = app.query("what college degrees does elon musk have?")
```

</CodeGroup>
Head to the Clarifai Platform to browse various state-of-the-art LLM models for your use case.

To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass the `CLARIFAI_PAT` in the config.
```yaml config.yaml
llm:
  provider: clarifai
  config:
    model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
    model_kwargs:
      temperature: 0.5
      max_tokens: 1000

embedder:
  provider: clarifai
  config:
    model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
```
## GPT4All

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet is required. You can use it with Embedchain using the following code:
<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
## JinaChat

First, set the `JINACHAT_API_KEY` environment variable, which you can obtain from their platform.

Once you have the key, load the app using the YAML config file:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["JINACHAT_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: jina
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
## Hugging Face

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set the `HUGGINGFACE_ACCESS_TOKEN` environment variable, which you can obtain from their platform.

You can load LLMs from Hugging Face in three ways:

- Hugging Face Hub
- Hugging Face Local Pipelines
- Hugging Face Inference Endpoint

### Hugging Face Hub

To load the model from the Hugging Face Hub, use the following code:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

</CodeGroup>
### Hugging Face Local Pipelines

If you want to load a locally downloaded model from Hugging Face, you can do so using the code provided below:
<CodeGroup>

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    }
}

app = App.from_config(config=config)
```
</CodeGroup>
### Hugging Face Inference Endpoint
You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.
Then, load the app using the config yaml file:
<CodeGroup>
```python main.py
from embedchain import App
config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "endpoint": "https://api-inference.huggingface.co/models/gpt2",
            "model_params": {"temperature": 0.1, "max_new_tokens": 100},
        },
    },
}

app = App.from_config(config=config)
```

</CodeGroup>
Currently, only `text-generation` and `text2text-generation` are supported [ref].

See LangChain's Hugging Face endpoint documentation for more information.
## Llama2

Llama2 is integrated through Replicate. Set the `REPLICATE_API_TOKEN` environment variable, which you can obtain from their platform.

Once you have the token, load the app using the YAML config file:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["REPLICATE_API_TOKEN"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: llama2
  config:
    model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false
```

</CodeGroup>
## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on GCP. Once setup is done, use the following code to create an app using Vertex AI as the provider:
<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5
```

</CodeGroup>
## Mistral AI

Obtain the Mistral AI API key from their console.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["MISTRAL_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

response = app.chat("which companies does elon own?")
# Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

response = app.chat("what question did I ask you already?")
# You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
```
```yaml config.yaml
llm:
  provider: mistralai
  config:
    model: mistral-tiny
    temperature: 0.5
    max_tokens: 1000
    top_p: 1

embedder:
  provider: mistralai
  config:
    model: mistral-embed
```

</CodeGroup>
## AWS Bedrock

Before using the AWS Bedrock LLM, make sure to authenticate the boto3 client by using a method described in the AWS documentation. You can also optionally set the `AWS_REGION` environment variable, as shown below:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["AWS_REGION"] = "us-west-2"

app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: amazon.titan-text-express-v1
    # check notes below for model_kwargs
    model_kwargs:
      temperature: 0.5
      topP: 1
      maxTokenCount: 1000
```

</CodeGroup>
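Note that the keys inside `model_kwargs` follow the parameter names native to the selected Bedrock model (for Amazon Titan text models these are camelCase names such as `topP` and `maxTokenCount`, as shown above) and differ from provider to provider, so check the AWS Bedrock documentation for the arguments your model accepts.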
<br />
## Groq

Groq is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

In order to use LLMs from Groq, go to their platform and get the API key. Set the API key as the `GROQ_API_KEY` environment variable or pass it in your app configuration, as shown in the example below.
```python main.py
import os
from embedchain import App

# Set your API key here or pass as the environment variable
groq_api_key = "gsk_xxxx"

config = {
    "llm": {
        "provider": "groq",
        "config": {
            "model": "mixtral-8x7b-32768",
            "api_key": groq_api_key,
            "stream": True
        }
    }
}

app = App.from_config(config=config)

# Add your data source here
app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
app.query("Write a poem about Embedchain")

# In the realm of data, vast and wide,
# Embedchain stands with knowledge as its guide.
# A platform open, for all to try,
# Building bots that can truly fly.
# With REST API, data in reach,
# Deployment a breeze, as easy as a speech.
# Updating data sources, anytime, anyday,
# Embedchain's power, never sway.
# A knowledge base, an assistant so grand,
# Connecting to platforms, near and far.
# Discord, WhatsApp, Slack, and more,
# Embedchain's potential, never a bore.
```
## NVIDIA AI

NVIDIA AI Foundation Endpoints let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through an API. These models are available in the NVIDIA NGC catalog, fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

In order to use LLMs from NVIDIA AI, create an account on NVIDIA NGC Service.

Generate an API key from their dashboard and set it as the `NVIDIA_API_KEY` environment variable. Note that the `NVIDIA_API_KEY` will start with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

config = {
    "app": {
        "config": {
            "id": "my-app",
        },
    },
    "llm": {
        "provider": "nvidia",
        "config": {
            "model": "nemotron_steerlm_8b",
        },
    },
    "embedder": {
        "provider": "nvidia",
        "config": {
            "model": "nvolveqa_40k",
            "vector_dimension": 1024,
        },
    },
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk")
answer = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
# As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
# Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
```

</CodeGroup>
## Token Usage

You can get the cost of the query by setting `token_usage` to `True` in the config file. This will return the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, `cost_currency`.

The following paid LLMs support token usage:

Here is an example of how to use token usage:

<CodeGroup>
os.environ["OPENAI_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")
app.add("https://www.forbes.com/profile/elon-musk")
response = app.query("what is the net worth of Elon Musk?")
# {'answer': 'Elon Musk's net worth is $209.9 billion as of 6/9/24.',
# 'usage': {'prompt_tokens': 1228,
# 'completion_tokens': 21,
# 'total_tokens': 1249,
# 'total_cost': 0.001884,
# 'cost_currency': 'USD'}
# }
response = app.chat("Which companies did Elon Musk found?")
# {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
# 'usage': {'prompt_tokens': 1616,
# 'completion_tokens': 34,
# 'total_tokens': 1650,
# 'total_cost': 0.002492,
# 'cost_currency': 'USD'}
# }
```yaml config.yaml
llm:
  provider: openai
  config:
    model: gpt-4o-mini
    temperature: 0.5
    max_tokens: 1000
    token_usage: true
```

</CodeGroup>
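Since the response is a dictionary with `answer` and `usage` keys (as shown in the comments above), you can read the cost directly, for example via `response["usage"]["total_cost"]`.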
If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.
<br />
<Snippet file="missing-llm-tip.mdx" />