Your API? 0.2% of your context window

Making your AI agents talk to APIs is easy and fun. It's just that bigger APIs are a worst-case workload for agents: raw API definitions in the prompt overflow the context window fast. We tried that, and all we got was hallucinated endpoints and unreliable responses.

That's something MCP helps with. Wrap your API as MCP tools, and you're good to go. Or so we thought. In practice, the repeated schemas come with a cost (tokens), too.

I'm glad to say we found something better. It's called Agent: our newest product, tightly integrated with our set of tools for your API. Here is why it might be better than your average MCP server:

Agent keeps the tool surface fixed and super small. It fetches just-in-time details. The result is smaller context, fewer steps, and better routing. And happy agents. :-)

Get Started

<scalar-button title="Chat with Agent" href="https://agent.scalar.com" icon="phosphor/regular/chat-circle-dots"> </scalar-button>

<scalar-button title="Create your own MCP" href="https://dashboard.scalar.com/register" icon="phosphor/regular/cpu"> </scalar-button>

The problem

If you dump the full OpenAPI document into the prompt, you often blow past the model's context window before the model can do any work. That's exactly what happens with the Zoom Meetings API, for example.

Even when the API is smaller, like Notion's API, raw OpenAPI is expensive: it works, sure, but you pay a steep token tax every run. MCP reduces that cost a lot, but a native MCP server still carries schema tokens for every single endpoint.

Agent collapses that into three tools and pulls only the schema it needs.
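To make the scaling difference concrete, here's a back-of-the-envelope sketch. The per-tool token figure is a made-up average for illustration, not a benchmark number; the point is the shape of the curve:

```typescript
// Rough model of tool-schema cost in the prompt (illustrative only).
// A native MCP server registers one tool per endpoint, so its schema
// footprint grows linearly with the API. Agent always exposes three tools.

const TOKENS_PER_TOOL_SCHEMA = 500 // hypothetical average, for illustration
const AGENT_TOOL_COUNT = 3

function nativeMcpSchemaCost(endpointCount: number): number {
  return endpointCount * TOKENS_PER_TOOL_SCHEMA
}

function agentSchemaCost(_endpointCount: number): number {
  // Fixed surface: summarize, search, execute.
  return AGENT_TOOL_COUNT * TOKENS_PER_TOOL_SCHEMA
}

// A bigger API means a proportionally bigger native MCP footprint...
console.log(nativeMcpSchemaCost(26)) // 13000
console.log(nativeMcpSchemaCost(183)) // 91500
// ...while Agent's stays flat no matter how many endpoints you add.
console.log(agentSchemaCost(183)) // 1500
```

The constant in `agentSchemaCost` is the whole trick: the tool surface no longer depends on the size of the API.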

Benchmarking what we built

We ran a few benchmarks to test Agent with real-world APIs. The results are really good, but take a look yourself:

Benchmark setup

We ran identical tasks across three approaches:

  1. Raw OpenAPI documents in prompt
  2. Native MCP server (one tool per endpoint)
  3. Agent (3 tools: summarize, search, and execute)

We used the Zoom Meetings API (list, create, update) and Notion's API (search, create page, get workspace). Example Notion prompts are aligned with Notion's MCP tools guide.

Token counting was done with tiktoken.

Zoom Meetings

Summary

| Mode | Task | Runs | Success | Avg Tokens | Avg Latency | Avg Steps |
| --- | --- | --- | --- | --- | --- | --- |
| raw-openapi | List upcoming meetings | 1 | 0% | 385797 | 2205 ms | 0.0 |
| native-mcp | List upcoming meetings | 1 | 100% | 20179 | 47887 ms | 6.0 |
| agent-scalar | List upcoming meetings | 1 | 100% | 5531 | 23029 ms | 2.0 |
| raw-openapi | Create a meeting | 1 | 0% | 385791 | 2011 ms | 0.0 |
| native-mcp | Create a meeting | 1 | 100% | 95474 | 46529 ms | 6.0 |
| agent-scalar | Create a meeting | 1 | 100% | 39327 | 20637 ms | 2.0 |
| raw-openapi | Update a meeting | 1 | 0% | 385792 | 1978 ms | 0.0 |
| native-mcp | Update a meeting | 1 | 100% | 95406 | 45189 ms | 6.0 |
| agent-scalar | Update a meeting | 1 | 100% | **12674** | 13367 ms | 2.0 |

Schema Cost (200k context)

| Approach | Tools | Token cost | Schema Tokens | All-in Tokens | Context used (200k) |
| --- | --- | --- | --- | --- | --- |
| Raw OpenAPI Spec in prompt | – | 295656 | 0 | 295656 | 147.8% |
| Native MCP (full schemas) | 183 | 89281 | 89530 | 178811 | 89.4% |
| Native MCP (required params only) | 183 | 5504 | 89530 | 95034 | 47.5% |
| Agent (MCP tools) | 3 | 412 | 412 | 412 | **0.2%** |
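The "Context used" column is straightforward to verify: it's just the all-in token count divided by a 200k-token window. A quick sanity check:

```typescript
// "Context used" = all-in tokens as a share of a 200k-token window,
// rounded to one decimal place.
const CONTEXT_WINDOW = 200_000

function contextUsedPct(allInTokens: number): string {
  return ((allInTokens / CONTEXT_WINDOW) * 100).toFixed(1) + '%'
}

console.log(contextUsedPct(295656)) // "147.8%" – the raw spec alone overflows
console.log(contextUsedPct(412)) // "0.2%" – Agent's fixed tool surface
```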

Notion

Summary

| Mode | Task | Runs | Success | Avg Tokens | Avg Latency | Avg Steps |
| --- | --- | --- | --- | --- | --- | --- |
| raw-openapi | Search for budget approval docs | 1 | 100% | 95819 | 114858 ms | 1.0 |
| native-mcp | Search for budget approval docs | 1 | 100% | 14725 | 39663 ms | 6.0 |
| agent-scalar | Search for budget approval docs | 1 | 100% | 1873 | 51474 ms | 3.0 |
| raw-openapi | Create project kickoff page | 1 | 100% | 96479 | 75803 ms | 1.0 |
| native-mcp | Create project kickoff page | 1 | 100% | 14643 | 51163 ms | 6.0 |
| agent-scalar | Create project kickoff page | 1 | 100% | 1454 | 103488 ms | 2.0 |
| raw-openapi | Which workspace am I connected to? | 1 | 100% | 95490 | 22175 ms | 1.0 |
| native-mcp | Which workspace am I connected to? | 1 | 100% | 14385 | 27013 ms | 6.0 |
| agent-scalar | Which workspace am I connected to? | 1 | 100% | **1206** | 27580 ms | 3.0 |

Schema Cost (200k context)

| Approach | Tools | Token cost | Schema Tokens | All-in Tokens | Context used (200k) |
| --- | --- | --- | --- | --- | --- |
| Raw OpenAPI Spec in prompt | – | 69114 | 0 | 69114 | 34.6% |
| Native MCP (full schemas) | 26 | 2104 | 12803 | 14907 | 7.5% |
| Native MCP (required params only) | 26 | 829 | 12803 | 13632 | 6.8% |
| Agent (MCP tools) | 3 | 400 | 400 | 400 | **0.2%** |

The Results

Guess who's the clear winner with just 0.2% of your context window: Agent. Why is that?

  1. Native MCP scales with endpoint count. Agent does not. Three tools cover the entire API.

  2. Instead of loading the full API definition upfront, the agent fetches a minified version with just the endpoints and schemas it needs.

  3. The schema footprint is tiny (hundreds of tokens) even for large APIs like the Zoom Meetings API.

How it works

  1. Upload your OpenAPI document to Scalar.
  2. Scalar augments it for search and execution.
  3. Agents connect via MCP with three tools:
    • summarize-openapi-specs (a short summary of the spec and available endpoints)
    • search-openapi-operations (minified OpenAPI documents for the endpoints matching the agent's search)
    • execute-request (executes the chosen request against the API)
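In code, a run looks roughly like this. This is a simplified in-memory mock of the three-tool flow, not the real implementation; the endpoints mirror the Zoom tasks from the benchmark:

```typescript
// Simplified mock of the three-tool flow over a tiny in-memory "API".
// The real tools talk to Scalar's servers; this only illustrates the shape.

type Operation = { method: string; path: string; summary: string; schema: string }

const operations: Operation[] = [
  { method: 'GET', path: '/meetings', summary: 'List upcoming meetings', schema: '{...}' },
  { method: 'POST', path: '/meetings', summary: 'Create a meeting', schema: '{...}' },
]

// Tool 1: a short overview of the spec – small, regardless of API size.
function summarizeOpenapiSpecs(): string {
  return operations.map((op) => `${op.method} ${op.path} – ${op.summary}`).join('\n')
}

// Tool 2: fetch minified schemas just-in-time, only for matching operations.
function searchOpenapiOperations(query: string): Operation[] {
  const q = query.toLowerCase()
  return operations.filter((op) => op.summary.toLowerCase().includes(q))
}

// Tool 3: execute the chosen operation (stubbed here).
function executeRequest(op: Operation): string {
  return `executed ${op.method} ${op.path}`
}

// A typical short run: search, then execute – only one schema ever
// enters the context.
const [match] = searchOpenapiOperations('create')
console.log(executeRequest(match)) // "executed POST /meetings"
```

The schemas stay outside the prompt until `search-openapi-operations` pulls exactly the one the task needs.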

Try it

You can use Agent in two ways:

  1. Chat UI: upload your OpenAPI and chat at agent.scalar.com.
  2. Agent SDK: connect to Scalar MCP servers from your agent runtime (Vercel AI SDK, OpenAI Agents SDK, Anthropic Claude SDK).
```ts
import { Agent, MCPServerStreamableHttp, run } from '@openai/agents'
import { agentScalar } from '@scalar-org/agent-sdk'

const scalar = agentScalar({
  agentKey: 'YOUR_AGENT_KEY',
})

const session = await scalar.session()

const serverOptions = session.createOpenAIMCPServerOptions()
const servers = serverOptions.map((opts) => new MCPServerStreamableHttp(opts))
await Promise.all(servers.map((s) => s.connect()))

const agent = new Agent({
  name: 'api-agent',
  instructions: 'You help users interact with APIs.',
  mcpServers: servers,
})

const result = await run(agent, 'pls list available endpoints in the zoom api thanks')

await Promise.all(servers.map((s) => s.close()))
```

Agent scales across as many APIs as you want your agent to have access to, without flooding the context window, while keeping tool calls accurate. Crazy, eh?

Mar 5, 2026