embedchain/docs/components/llms.mdx
Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.
<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Google AI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="Anthropic" href="#anthropic"></Card>
  <Card title="Cohere" href="#cohere"></Card>
  <Card title="Together" href="#together"></Card>
  <Card title="Ollama" href="#ollama"></Card>
  <Card title="vLLM" href="#vllm"></Card>
  <Card title="Clarifai" href="#clarifai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="JinaChat" href="#jinachat"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Llama2" href="#llama2"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
  <Card title="Mistral AI" href="#mistral-ai"></Card>
  <Card title="AWS Bedrock" href="#aws-bedrock"></Card>
  <Card title="Groq" href="#groq"></Card>
  <Card title="NVIDIA AI" href="#nvidia-ai"></Card>
</CardGroup>

## OpenAI

To use OpenAI LLM models, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the OpenAI Platform.
Once you have obtained the key, you can use it like this:
```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```
If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a YAML config file.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: openai
  config:
    model: 'gpt-4o-mini'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
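If you prefer to keep everything in code, the same settings can also be passed as a Python dict instead of a YAML file. A minimal sketch mirroring the YAML above, using the `App.from_config(config=...)` form that appears elsewhere on this page:

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# Same configuration as config.yaml above, expressed as a Python dict
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0.5,
            "max_tokens": 1000,
            "top_p": 1,
            "stream": False,
        },
    }
}

app = App.from_config(config=config)
```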
Embedchain supports OpenAI function calling with a single function. It accepts inputs in accordance with the LangChain interface:
<Accordion title="Pydantic Model"> ```python from pydantic import BaseModelclass multiply(BaseModel): """Multiply two integers together."""
a: int = Field(..., description="First integer")
b: int = Field(..., description="Second integer")
</Accordion>
<Accordion title="Python function">
```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b
```

</Accordion>
With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.
```python main.py
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
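For the query above, the model is expected to select the `multiply` tool with arguments `a=125` and `b=15` (which evaluates to 1875); the exact shape of `result` depends on the tool-calling response returned by the model.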
## Google AI

To use Google AI models, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from Google MakerSuite.
os.environ["GOOGLE_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")
app.add("https://www.forbes.com/profile/elon-musk")
response = app.query("What is the net worth of Elon Musk?") if app.llm.config.stream: # if stream is enabled, response is a generator for chunk in response: print(chunk) else: print(response)
```yaml config.yaml
llm:
  provider: google
  config:
    model: gemini-pro
    max_tokens: 1000
    temperature: 0.5
    top_p: 1
    stream: false

embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
## Azure OpenAI

To use the Azure OpenAI model, you have to set the Azure OpenAI-related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-4o-mini
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the Azure OpenAI Platform.
## Anthropic

To use Anthropic's models, please set the `ANTHROPIC_API_KEY`, which you can find on their Account Settings page.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["ANTHROPIC_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: anthropic
  config:
    model: 'claude-instant-1'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
## Cohere

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` environment variable, which you can find on their Account settings page.

Once you have the API key, you are all set to use it with Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["COHERE_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: cohere
  config:
    model: large
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>
## Together

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` environment variable, which you can find on their Account settings page.

Once you have the API key, you are all set to use it with Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["TOGETHER_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: together
  config:
    model: togethercomputer/RedPajama-INCITE-7B-Base
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>
## Ollama

Set up Ollama by following the instructions at https://github.com/jmorganca/ollama.
<CodeGroup>

```python main.py
import os

os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"

from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: ollama
  config:
    model: 'llama2'
    temperature: 0.5
    top_p: 1
    stream: true
    base_url: 'http://localhost:11434'

embedder:
  provider: ollama
  config:
    model: znbang/bge:small-en-v1.5-q8_0
    base_url: http://localhost:11434
```

</CodeGroup>
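Note that the models referenced in the config (here `llama2` and `znbang/bge:small-en-v1.5-q8_0`) must already be available locally; you would typically fetch them first with `ollama pull llama2` and `ollama pull znbang/bge:small-en-v1.5-q8_0`.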
## vLLM

Set up vLLM by following the instructions given in their docs.
<CodeGroup>

```python main.py
import os
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: vllm
  config:
    model: 'meta-llama/Llama-2-70b-hf'
    temperature: 0.5
    top_p: 1
    top_k: 10
    stream: true
    trust_remote_code: true
```

</CodeGroup>
## Clarifai

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` environment variable, which you can find on the Security page. Optionally, you can also pass the PAT key as a parameter to the LLM/Embedder class.

Now you are all set to explore Embedchain.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["CLARIFAI_PAT"] = "XXX"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# Now let's add some data.
app.add("https://www.forbes.com/profile/elon-musk")

# Query the app
response = app.query("what college degrees does elon musk have?")
```

</CodeGroup>
Head to the Clarifai Platform to browse various state-of-the-art LLM models for your use case.

To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass the `CLARIFAI_PAT` in the config.
```yaml config.yaml
llm:
  provider: clarifai
  config:
    model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
    model_kwargs:
      temperature: 0.5
      max_tokens: 1000

embedder:
  provider: clarifai
  config:
    model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
```
## GPT4All

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet is required. You can use it with Embedchain using the following code:
<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
## JinaChat

First, set the `JINACHAT_API_KEY` environment variable, which you can obtain from their platform.

Once you have the key, load the app using the YAML config file:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["JINACHAT_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: jina
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>
## Hugging Face

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set the `HUGGINGFACE_ACCESS_TOKEN` environment variable, which you can obtain from their platform.

You can load LLMs from Hugging Face in three ways:

- Hugging Face Hub
- Hugging Face Local Pipelines
- Hugging Face Inference Endpoint

### Hugging Face Hub

To load the model from the Hugging Face Hub, use the following code:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

</CodeGroup>
### Hugging Face Local Pipelines

If you want to load a locally downloaded model from Hugging Face, you can do so using the code provided below:
<CodeGroup>

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    }
}

app = App.from_config(config=config)
```
</CodeGroup>
### Hugging Face Inference Endpoint
You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.
Then, load the app using the config yaml file:
<CodeGroup>
```python main.py
from embedchain import App
config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "endpoint": "https://api-inference.huggingface.co/models/gpt2",
            "model_params": {"temperature": 0.1, "max_new_tokens": 100},
        },
    },
}

app = App.from_config(config=config)
```

</CodeGroup>
Currently, only `text-generation` and `text2text-generation` are supported [ref].

See LangChain's Hugging Face endpoint documentation for more information.
## Llama2

Llama2 is integrated through Replicate. Set the `REPLICATE_API_TOKEN` environment variable, which you can obtain from their platform.

Once you have the token, load the app using the YAML config file:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["REPLICATE_API_TOKEN"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: llama2
  config:
    model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false
```

</CodeGroup>
## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on GCP. Once setup is done, use the following code to create an app using Vertex AI as the provider:
<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5
```

</CodeGroup>
## Mistral AI

Obtain the Mistral AI API key from their console.
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["MISTRAL_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

response = app.chat("which companies does elon own?")
# Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

response = app.chat("what question did I ask you already?")
# You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
```
```yaml config.yaml
llm:
  provider: mistralai
  config:
    model: mistral-tiny
    temperature: 0.5
    max_tokens: 1000
    top_p: 1

embedder:
  provider: mistralai
  config:
    model: mistral-embed
```

</CodeGroup>
## AWS Bedrock

Before using the AWS Bedrock LLM, make sure to authenticate the boto3 client by using a method described in the AWS documentation. You can also optionally set the `AWS_REGION` environment variable, as shown below:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["AWS_REGION"] = "us-west-2"

app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: amazon.titan-text-express-v1
    # check notes below for model_kwargs
    model_kwargs:
      temperature: 0.5
      topP: 1
      maxTokenCount: 1000
```

</CodeGroup>
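Note that the keys inside `model_kwargs` follow the parameter names native to the selected Bedrock model (for Amazon Titan text models these are camelCase names such as `topP` and `maxTokenCount`, as shown above) and differ from provider to provider, so check the AWS Bedrock documentation for the arguments your model accepts.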
<br />
## Groq

Groq is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

In order to use LLMs from Groq, go to their platform and get the API key. Set the API key as the `GROQ_API_KEY` environment variable or pass it in your app configuration, as shown in the example below.
```python main.py
import os
from embedchain import App

# Set your API key here or pass as the environment variable
groq_api_key = "gsk_xxxx"

config = {
    "llm": {
        "provider": "groq",
        "config": {
            "model": "mixtral-8x7b-32768",
            "api_key": groq_api_key,
            "stream": True
        }
    }
}

app = App.from_config(config=config)

# Add your data source here
app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
app.query("Write a poem about Embedchain")

# In the realm of data, vast and wide,
# Embedchain stands with knowledge as its guide.
# A platform open, for all to try,
# Building bots that can truly fly.
# With REST API, data in reach,
# Deployment a breeze, as easy as a speech.
# Updating data sources, anytime, anyday,
# Embedchain's power, never sway.
# A knowledge base, an assistant so grand,
# Connecting to platforms, near and far.
# Discord, WhatsApp, Slack, and more,
# Embedchain's potential, never a bore.
```
## NVIDIA AI

NVIDIA AI Foundation Endpoints let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through an API. These models are available in the NVIDIA NGC catalog, fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

In order to use LLMs from NVIDIA AI, create an account on NVIDIA NGC Service.

Generate an API key from their dashboard and set it as the `NVIDIA_API_KEY` environment variable. Note that the `NVIDIA_API_KEY` will start with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

config = {
    "app": {
        "config": {
            "id": "my-app",
        },
    },
    "llm": {
        "provider": "nvidia",
        "config": {
            "model": "nemotron_steerlm_8b",
        },
    },
    "embedder": {
        "provider": "nvidia",
        "config": {
            "model": "nvolveqa_40k",
            "vector_dimension": 1024,
        },
    },
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk")
answer = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
# As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
# Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
```

</CodeGroup>
## Token Usage

You can get the cost of the query by setting `token_usage` to `True` in the config file. This will return the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, `cost_currency`.

The following paid LLMs support token usage:

Here is an example of how to use token usage:

<CodeGroup>
os.environ["OPENAI_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")
app.add("https://www.forbes.com/profile/elon-musk")
response = app.query("what is the net worth of Elon Musk?")
# {'answer': 'Elon Musk's net worth is $209.9 billion as of 6/9/24.',
# 'usage': {'prompt_tokens': 1228,
# 'completion_tokens': 21,
# 'total_tokens': 1249,
# 'total_cost': 0.001884,
# 'cost_currency': 'USD'}
# }
response = app.chat("Which companies did Elon Musk found?")
# {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
# 'usage': {'prompt_tokens': 1616,
# 'completion_tokens': 34,
# 'total_tokens': 1650,
# 'total_cost': 0.002492,
# 'cost_currency': 'USD'}
# }
```yaml config.yaml
llm:
  provider: openai
  config:
    model: gpt-4o-mini
    temperature: 0.5
    max_tokens: 1000
    token_usage: true
```

</CodeGroup>
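Since the response is a dictionary with `answer` and `usage` keys (as shown in the comments above), you can read the cost directly, for example via `response["usage"]["total_cost"]`.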
If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.
<br />
<Snippet file="missing-llm-tip.mdx" />