Baseten - Cline — ContextQMD

Baseten provides on-demand frontier model APIs designed for production applications, not just experimentation. Built on the Baseten Inference Stack, these APIs deliver optimized inference for leading open-source models from OpenAI, DeepSeek, Moonshot AI, and Alibaba Cloud.

Website: https://www.baseten.co/products/model-apis/

Getting an API Key

Sign Up/Sign In: Go to Baseten and create an account or sign in.
Navigate to API Keys: Access your dashboard and go to the API Keys section.
Create a Key: Generate a new API key. Give it a descriptive name (e.g., "Cline").
Copy the Key: Copy the API key immediately and store it securely.

Configuration in Cline

Open Cline Settings: Click the settings icon (⚙️) in the Cline panel.
Select Provider: Choose "Baseten" from the "API Provider" dropdown.
Enter API Key: Paste your Baseten API key into the "Baseten API Key" field.
Select Model: Choose your desired model from the "Model" dropdown.

IMPORTANT: For Kimi K2 Thinking: To use the moonshotai/Kimi-K2-Thinking model, you must enable Native Tool Call (Experimental) in Cline settings. This setting allows Cline to call tools through their native tool processor and is required for this reasoning model to function properly.

Supported Models

Cline supports all current models under Baseten Model APIs, including: For the most updated pricing, please visit: https://www.baseten.co/products/model-apis/

moonshotai/Kimi-K2-Thinking (Moonshot AI) - Enhanced reasoning capabilities with step-by-step thought processes (262K context) - $0.60/$2.50 per 1M tokens
zai-org/GLM-4.6 (Z AI) - Frontier open model with advanced agentic, reasoning and coding capabilities by Z AI (200k context) $0.60/$2.20 per 1M tokens
moonshotai/Kimi-K2-Instruct-0905 (Moonshot AI) - September update with enhanced capabilities (262K context) - $0.60/$2.50 per 1M tokens
openai/gpt-oss-120b (OpenAI) - 120B MoE with strong reasoning capabilities (128K context) - $0.10/$0.50 per 1M tokens
Qwen/Qwen3-Coder-480B-A35B-Instruct- Advanced coding and reasoning (262K context) - $0.38/$1.53 per 1M tokens
Qwen/Qwen3-235B-A22B-Instruct-2507 - Math and reasoning expert (262K context) - $0.22/$0.80 per 1M tokens
deepseek-ai/DeepSeek-R1 - DeepSeek's first-generation reasoning model (163K context) - $2.55/$5.95 per 1M tokens
deepseek-ai/DeepSeek-R1-0528 - Latest revision of DeepSeek's reasoning model (163K context) - $2.55/$5.95 per 1M tokens
deepseek-ai/DeepSeek-V3-0324 - Fast general-purpose with enhanced reasoning (163K context) - $0.77/$0.77 per 1M tokens
deepseek-ai/DeepSeek-V3.1 - Hybrid reasoning with advanced tool calling (163K context) - $0.50/$1.50 per 1M tokens
deepseek-ai/DeepSeek-V3.2 - Hybrid reasoning with efficient long context scaling (163K context) - $0.30/$0.45 per 1M tokens

Production-First Architecture

Baseten's Model APIs are built for production environments with several key advantages:

Enterprise-Grade Reliability

Four nines of uptime (99.99%) through active-active redundancy
Cloud-agnostic, multi-cluster autoscaling for consistent availability
SOC 2 Type II certified and HIPAA compliant for security requirements

Optimized Performance

Pre-optimized models shipped with the Baseten Inference Stack
Latest-generation GPUs with multi-cloud infrastructure
Ultra-fast inference optimized from the bottom up for production workloads

Cost Efficiency

5-10x less expensive than closed alternatives
Optimized multi-cloud infrastructure for efficient resource utilization
Transparent pricing with no hidden costs or rate limit surprises

Developer Experience

OpenAI compatible API - migrate by swapping a single URL
Drop-in replacement for closed models with comprehensive observability and analytics
Seamless scaling from Model APIs to dedicated deployments

Special Features

Function Calling & Tool Use

All Baseten models support structured outputs, function calling, and tool use as part of the Baseten Inference Stack, making them ideal for agentic applications and coding workflows.

Tips and Notes

Dynamic Model Updates: Cline automatically fetches the latest model list from Baseten, ensuring access to new models as they're released in real time.
Multi-Cloud Capacity Management (MCM): Baseten's multi-cloud infrastructure ensures high availability and low latency globally.
Support: Baseten provides dedicated support for production deployments and can work with you on dedicated resources as you scale.

Pricing Information

Current pricing is highly competitive and transparent. For the most up-to-date pricing, visit the Baseten Model APIs page. Prices typically range from $0.10-$6.00 per million tokens, making Baseten significantly more cost-effective than many closed-model alternatives while providing access to state-of-the-art open-source models.