# Batch Inference
Run prompts, embeddings, and model scoring over large datasets, then stream the results to durable storage. Daft is a reliable engine to express batch inference pipelines and scale them from your laptop to a distributed cluster.
Use Daft's built-in AI functions (`prompt`, `embed_text`, `embed_image`) and let Daft handle batching, concurrency, and backpressure. If you're new to Daft, see the quickstart first. For distributed execution, see our docs on Scaling Out and Deployment.
Daft provides first-class APIs for model inference. Under the hood, Daft pipelines data operations so that reading, inference, and writing overlap automatically, optimizing for throughput.
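To build intuition for this overlap, here is a toy plain-Python sketch of a pipelined read → infer → write chain using threads and bounded queues. This illustrates the concept only; it is not Daft's actual implementation, and `pipelined`, `infer`, and `write` are hypothetical names:

```python
import queue
import threading


def pipelined(rows, infer, write, batch_size=2):
    """Toy illustration of overlapping read -> infer -> write stages."""
    q_in: queue.Queue = queue.Queue(maxsize=4)   # bounded queues provide backpressure
    q_out: queue.Queue = queue.Queue(maxsize=4)
    results = []

    def reader():
        batch = []
        for row in rows:                          # "read" stage
            batch.append(row)
            if len(batch) == batch_size:
                q_in.put(batch)
                batch = []
        if batch:
            q_in.put(batch)
        q_in.put(None)                            # sentinel: no more batches

    def worker():
        while (batch := q_in.get()) is not None:  # "inference" stage
            q_out.put([infer(row) for row in batch])
        q_out.put(None)

    def writer():
        while (batch := q_out.get()) is not None: # "write" stage
            results.extend(write(row) for row in batch)

    threads = [threading.Thread(target=f) for f in (reader, worker, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


out = pipelined(range(5), infer=lambda x: x * 2, write=lambda y: y + 1)
print(out)  # [1, 3, 5, 7, 9]
```

Because the queues are bounded, a slow writer naturally slows the reader down, which is the backpressure behavior described above.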
```python
import daft
from daft.functions import prompt

(
    daft.read_huggingface("fka/awesome-chatgpt-prompts")
    .with_column(  # Generate model outputs in a new column
        "output",
        prompt(
            daft.col("prompt"),
            model="gpt-5",  # Any chat/completions-capable model
            provider="openai",  # Switch providers by changing this; e.g. to "vllm"
            max_output_tokens=256,  # OpenAI provider uses the Responses API by default
        ),
    )
    .write_parquet("output.parquet/", write_mode="overwrite")  # Write to Parquet as the pipeline runs
)
```
What this does:

- Uses `prompt()` to express inference.

You can also generate embeddings with `embed_text`, here using LM Studio as the provider:

```python
import daft
from daft.ai.provider import load_provider
from daft.functions.ai import embed_text

provider = load_provider("lm_studio")
model = "text-embedding-nomic-embed-text-v1.5"

(
    daft.read_huggingface("Open-Orca/OpenOrca")
    .with_column("embedding", embed_text(daft.col("response"), provider=provider, model=model))
    .show()
)
```
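`embed_text` produces one fixed-length float vector per row. A common downstream use of such a column is comparing vectors by cosine similarity; here is a plain-Python sketch, independent of Daft, using hypothetical toy vectors:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real models emit hundreds of dimensions).
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0 (identical)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0 (orthogonal)
```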
Turn on distributed execution with a single line; then run the same script on a Ray cluster.
```python
import daft

daft.set_runner_ray()  # Enable Daft's distributed runner
```
Daft partitions the data, schedules remote execution, and orchestrates your workload across the cluster. No pipeline rewrites.
For inspiration and real-world scale:
Ready to explore Daft further? Check out these topics: