docs/ai-functions/providers.md
Providers in Daft give you flexible control over where and how your AI inference runs. Whether you're using commercial APIs like OpenAI, running local models with LM Studio, or leveraging open-source models from Hugging Face, providers offer a unified interface for configuration.
This guide covers:

- Setting a global provider with `daft.set_provider()`
- Creating named providers within a session
- Using the Google provider
- Combining multiple providers in a single pipeline
- Connecting to OpenAI-compatible endpoints, including OpenRouter, Hugging Face, Databricks, and vLLM
The simplest way to use a provider is with `daft.set_provider()`, which configures a provider globally for your Daft session:
```python
import os

import daft

# Set the OpenAI provider for the global Daft session
daft.set_provider(
    "openai",
    api_key=os.environ.get("OPENAI_API_KEY"),
    timeout=30.0,
    max_retries=3,
)

# Retrieve the provider for an AI function
provider = daft.get_provider("openai")
```
Daft supports multiple AI providers, including OpenAI (and OpenAI-compatible endpoints such as OpenRouter, Hugging Face, and Databricks), Google, and locally served models via LM Studio and vLLM.
For more control, you can create named providers within a session. This is useful when working with OpenAI-compatible APIs like OpenRouter, or when you need to switch between different configurations.
```python
import os

import daft
from daft.ai.openai.provider import OpenAIProvider

sess = daft.Session()

openrouter_provider = OpenAIProvider(
    name="OpenRouter",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

sess.attach_provider(openrouter_provider)  # Register the provider
sess.set_provider("OpenRouter")  # Set as the default for the session

# Retrieve the provider for an AI function
provider = sess.get_provider("OpenRouter")
```
The Google provider enables you to use Google's Gemini models for text generation, multimodal processing, and structured outputs.
```python
import os

import daft

# Set up the Google provider with your API key
with daft.session() as session:
    session.set_provider("google", api_key=os.environ["GOOGLE_API_KEY"])

    # Create a DataFrame with questions
    df = daft.from_pydict({
        "question": [
            "What is the capital of France?",
            "Explain quantum computing in simple terms.",
            "What are the benefits of fusion energy?",
        ]
    })

    # Use the prompt function with Google's Gemini model
    df = df.with_column(
        "answer",
        daft.functions.prompt(
            daft.col("question"),
            provider="google",
            model="gemini-2.5-flash",  # or "gemini-3-pro-preview"
        ),
    )

    df.show()
```
The Google provider supports the same features as the other providers, including text generation, multimodal processing, and structured outputs.
For complex workflows, you might need different providers for different tasks. For example, you might use GPT-5 for validation while a cheaper model handles initial classification. Daft makes it easy to manage multiple providers in a single session.
```python
import os

from dotenv import load_dotenv
from pydantic import BaseModel, Field

import daft
from daft import Session
from daft.ai.openai.provider import OpenAIProvider
from daft.functions import format, prompt, unnest

# Load environment variables
load_dotenv()

class Anime(BaseModel):
    show: str = Field(description="The name of the anime show")
    character: str = Field(description="The name of the character who says the quote")
    explanation: str = Field(description="Why the character says the quote")

openai_provider = OpenAIProvider(
    name="OAI_DEV",
    api_key=os.environ.get("OPENAI_API_KEY_DEV"),
)

openrouter_provider = OpenAIProvider(
    name="OpenRouter",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

# Create a session and register both providers
sess = Session()
sess.attach_provider(openai_provider)
sess.attach_provider(openrouter_provider)
sess.set_provider("OpenRouter")  # Set OpenRouter as the default

# Create a DataFrame with the quotes
df = daft.from_pydict({
    "quote": [
        "I am going to be the king of the pirates!",
        "I'm going to be the next Hokage!",
    ],
})

# Use different providers for different tasks in the same pipeline
df = (
    df
    .with_column(
        "nemotron-response",
        prompt(
            daft.col("quote"),
            system_message="You are an anime expert. Classify the anime based on the quote and return the show name, the character, and an explanation.",
            return_format=Anime,
            provider=sess.get_provider("OpenRouter"),  # Use OpenRouter for initial classification
            model="nvidia/nemotron-nano-9b-v2:free",
        ),
    )
    .select("quote", unnest(daft.col("nemotron-response")))
    .with_column(
        "gpt-5-response",
        prompt(
            format(
                """Does the quote "{}" match the assigned anime series name {} and attributed character {}?""",
                daft.col("quote"),
                daft.col("show"),
                daft.col("character"),
            ),
            system_message="Validate whether the user prompt is correct or not.",
            return_format=Anime,
            provider=sess.get_provider("OAI_DEV"),  # Use OpenAI for validation
            model="gpt-5",
        ),
    )
)

df.show(format="fancy", max_width=120)
```
This example demonstrates attaching multiple named providers to a single session, setting one as the default, and selecting a provider per `prompt()` call within the same pipeline.
!!! tip "When to Use Multiple Providers"

    Multiple providers are useful when you need to:

    - **Balance cost and quality**: Use cheaper models for bulk processing and premium models for critical tasks
    - **Ensure redundancy**: Fall back to alternative providers if one fails
    - **Compare model outputs**: Run the same prompt through different models for quality assessment
    - **Manage rate limits**: Distribute load across multiple API keys or providers
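The redundancy pattern above can be sketched in plain Python, outside of Daft's API entirely. The `call_with_fallback` helper and the stub provider functions below are hypothetical, for illustration only; in practice each callable would wrap a real provider call.

```python
def call_with_fallback(providers, text):
    """Try each (name, call) pair in order; return (name, response) from the first that succeeds."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as err:  # e.g. a rate-limit or network error
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# Stub providers: the first always fails, the second always succeeds.
def flaky_provider(text):
    raise TimeoutError("rate limited")

def backup_provider(text):
    return f"echo: {text}"

name, response = call_with_fallback(
    [("primary", flaky_provider), ("backup", backup_provider)],
    "hello",
)
print(name, response)  # backup echo: hello
```

The same ordering idea applies when distributing load: rotate which provider appears first in the list rather than always starting with the same one.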
You can use the `prompt` function with OpenAI-compatible providers such as OpenRouter, Hugging Face Inference Providers, Databricks, and more.
```python
import os

import daft

# For OpenRouter
# See: https://openrouter.ai/docs/quickstart#using-the-openai-sdk
daft.set_provider(
    "openai",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

# For Hugging Face Inference Providers
# See: https://huggingface.co/inference/get-started
daft.set_provider(
    "openai",
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# For Databricks
# How to get your Databricks token: https://docs.databricks.com/en/dev-tools/auth/pat.html
# More info on model serving: https://docs.databricks.com/aws/en/machine-learning/model-serving/score-foundation-models
daft.set_provider(
    "openai",
    base_url="https://<workspace-id>.cloud.databricks.com/serving-endpoints",
    api_key=os.getenv("DATABRICKS_TOKEN"),
)
```
For vLLM Online Serving, you can set the provider as demonstrated in the example below.
!!! note "vLLM requires the Chat Completions API"

    vLLM's OpenAI-compatible endpoint expects requests in the Chat Completions API format, while Daft's `prompt()` defaults to the newer Responses API format, which vLLM does not support. Set `use_chat_completions=True` in `prompt()` so your requests use the Chat Completions API format that vLLM requires.
```python
import daft
from daft.functions import prompt

# For vLLM Online Serving
daft.set_provider(
    "openai",
    api_key="none",
    base_url="http://localhost:8000/v1",
)

# Example input data
df = daft.from_pydict({"input": ["What is the capital of France?"]})

df = df.with_column(
    "response",
    prompt(
        daft.col("input"),
        use_chat_completions=True,  # Required: vLLM expects the Chat Completions API format
        model="google/gemma-3-4b-it",
    ),
)
df.show()
```
The following sample CLI launch command is optimized for Google Colab's A100 High-RAM instance and Google's `gemma-3-4b-it` model:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model google/gemma-3-4b-it \
    --enable-chunked-prefill \
    --guided-decoding-backend guidance \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.85 \
    --host 0.0.0.0 --port 8000
```
When connecting from Daft, use `api_key="none"` and `base_url="http://0.0.0.0:8000/v1"`. The `--guided-decoding-backend guidance` flag is required for structured outputs.
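Once the server is running, you can confirm it is reachable before pointing Daft at it. This assumes the default host and port from the launch command above; the endpoint lists the models the server is currently serving.

```shell
# List the models served by the local vLLM OpenAI-compatible endpoint
curl http://localhost:8000/v1/models
```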