Opik's MCP server

Opik's MCP server is a Python 3.13+ package that connects your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly to your Opik workspace — read traces, log scores, save prompt versions, and ask Ollie investigative questions, all from the chat.

opik-mcp has been rewritten in Python. If you previously installed the npx-based JavaScript server, replace npx -y opik-mcp with uvx opik-mcp in the snippets below.

Before you start

You will need:

  • OPIK_API_KEY: generate one at comet.com/api/my/settings.
  • COMET_WORKSPACE: the lowercase workspace name from your Comet URL. For example, the URL https://www.comet.com/acme-ai/... corresponds to COMET_WORKSPACE=acme-ai.
  • uv installed locally. The fastest way is brew install uv (macOS) or curl -LsSf https://astral.sh/uv/install.sh | sh. uvx (bundled with uv) fetches and runs the latest published opik-mcp on demand, so no global install is required.
<Tip> **Pre-release note:** `opik-mcp` is not yet published to PyPI. Until the first PyPI release lands, replace `uvx opik-mcp` in any snippet on this page with `uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp`. </Tip>
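
The prerequisites above can be checked from a script before you configure a host. The following is a minimal sketch and not part of `opik-mcp`: the helper name `missing_prerequisites` is hypothetical, and it only verifies that the two environment variables are set and that `uvx` is on the `PATH`. It does not validate the key against Comet.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Both variables are required by every host configuration on this page.
    for name in ("OPIK_API_KEY", "COMET_WORKSPACE"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # The workspace is expected to be the lowercase name from the Comet URL.
    workspace = env.get("COMET_WORKSPACE", "")
    if workspace and workspace != workspace.lower():
        problems.append("COMET_WORKSPACE should be the lowercase workspace name")
    # uvx ships with uv, so a missing uvx usually means uv is not installed.
    if which("uvx") is None:
        problems.append("uvx not found on PATH (install uv first)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.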

Setting up the MCP server

<Tabs> <Tab title="Claude Code">
Add the server with one command:

```bash
claude mcp add --transport stdio opik-mcp \
  --env OPIK_API_KEY=<your-key> \
  --env COMET_WORKSPACE=<your-workspace> \
  -- uvx opik-mcp
```

Or edit `~/.claude.json` directly:

```json
{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}
```

Restart Claude Code, verify with `/mcp` (`opik-mcp` should appear as
connected), and then ask in the chat: **"list my Opik projects"**.

</Tab>
<Tab title="Cursor">

Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project), or open
**Cmd+Shift+J → Features → Model Context Protocol**:

```json
{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}
```

Reload Cursor; the green dot next to `opik-mcp` in the MCP panel confirms
the connection. Ask in chat: **"list my Opik projects"**.

<Tip>
**Cursor 60s timeout.** Cursor enforces a hard tool-call timeout that does
not reset on progress notifications. Long `ask_ollie` turns will fail on
Cursor — see [Known host limits](#known-host-limits).
</Tip>

</Tab>
<Tab title="VS Code Copilot">

Create or open `.vscode/mcp.json` in your workspace (or run the
**MCP: Open User Configuration** command to add it globally):

```json
{
  "servers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}
```

Reload the window. The Copilot Chat **MCP** indicator shows `opik-mcp` once
the server is reachable. Ask in chat: **"list my Opik projects"**.

</Tab>
<Tab title="MCP Inspector">

For manual testing or debugging, run the inspector against `opik-mcp`:

```bash
OPIK_API_KEY=<your-key> COMET_WORKSPACE=<your-workspace> \
  npx @modelcontextprotocol/inspector uvx opik-mcp
```

The inspector opens in your browser and lets you call each tool directly.

</Tab>
</Tabs>

<Tip>
**Self-hosted Opik.** Add `COMET_URL_OVERRIDE` to the `env` block (and `OPIK_URL` if Opik lives at a non-default path). `ask_ollie` and `run_experiment` are available on Comet Cloud only; on self-hosted those calls fail at dispatch, so use `read` / `list` / `write` directly.
</Tip>
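
The three JSON snippets in the tabs differ only in their top-level key: Claude Code and Cursor use `mcpServers`, while VS Code Copilot uses `servers`. As an illustration, the sketch below generates the right shape for each host. The function name `build_host_config` and the host labels (`claude-code`, `cursor`, `vscode`) are hypothetical conveniences, not anything `opik-mcp` defines.

```python
import json

# Hypothetical host labels mapped to the top-level key each host expects.
TOP_LEVEL_KEY = {
    "claude-code": "mcpServers",
    "cursor": "mcpServers",
    "vscode": "servers",
}


def build_host_config(host, api_key, workspace, url_override=None):
    """Build the config for one host as a plain dict, mirroring the tabs above."""
    if host not in TOP_LEVEL_KEY:
        raise ValueError(f"unknown host: {host!r}")
    env = {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace}
    if url_override:
        # Only needed for self-hosted Opik.
        env["COMET_URL_OVERRIDE"] = url_override
    server = {"type": "stdio", "command": "uvx", "args": ["opik-mcp"], "env": env}
    return {TOP_LEVEL_KEY[host]: {"opik-mcp": server}}


if __name__ == "__main__":
    print(json.dumps(build_host_config("vscode", "<your-key>", "<your-workspace>"), indent=2))
```

Paste the printed JSON into the file named in the matching tab. Merging it into an existing config file is left out of the sketch.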

Using the MCP server

The tools at a glance

| Tool | Purpose |
| --- | --- |
| `read` | Universal read by id / name / `opik://` URI. |
| `list` | Universal list with optional name filter and pagination. |
| `ask_ollie` | Investigate or synthesize via the Opik in-product assistant. |
| `write` | Universal write: log traces/spans, score, comment, save prompts, manage test suites and experiments. |
| `schema` | Introspect write-operation schemas (used by the LLM to construct valid payloads). |
| `run_experiment` | Run an evaluation experiment end-to-end via Ollie. |
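
Your AI host normally issues these tool calls for you. For readers curious about what travels over the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request with the standard library only. The tool names come from the table above, but the argument keys in the usage example (`entity`, `name`) are purely illustrative assumptions; the real payload shapes are defined by the server.

```python
import json
from itertools import count

# Tool names taken from the table above.
KNOWN_TOOLS = {"read", "list", "ask_ollie", "write", "schema", "run_experiment"}

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def tool_call_request(tool, arguments):
    """Build an MCP `tools/call` request as a JSON-serializable dict."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical arguments: the real keys are defined by opik-mcp, not here.
    request = tool_call_request("list", {"entity": "project", "name": "demo"})
    print(json.dumps(request, indent=2))
```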

Browsing your workspace

list my Opik projects

what was the most recent trace logged to the "demo" project?

show me trace <trace-id>

Scoring, commenting, saving prompts

score trace <trace-id> 0.9 on helpfulness with reason "great recovery"

comment "retry with temperature=0" on span <span-id>

save the following text as a new version of the "rerank-system" prompt: ...

For the full set of write operations and their payload shapes, ask the host "show me the schema for trace.create" (calls the schema tool) or see the README.

Asking Ollie

For investigative or cross-entity questions:

why are spans in the "demo" project slower this week than last?

compare experiments "rerank-v2" and "rerank-v3" on factuality

ask_ollie returns a thread_id that you can pass back on follow-up questions to preserve context. For more about Ollie itself, see the Ollie documentation. Before running write-style prompts in shared workspaces, read Ollie & auto-approve below.
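
The follow-up pattern amounts to remembering the `thread_id` from one `ask_ollie` result and including it in the next call. The sketch below illustrates that bookkeeping under explicit assumptions: the `OllieThread` class, the `question` argument key, and the idea that the result arrives as a dict with a `thread_id` field are all hypothetical. Only the `thread_id` name itself comes from the text above.

```python
class OllieThread:
    """Track one Ollie conversation so follow-ups reuse the same thread."""

    def __init__(self):
        self.thread_id = None

    def arguments_for(self, question):
        """Build ask_ollie arguments, adding thread_id once one is known."""
        arguments = {"question": question}  # hypothetical key name
        if self.thread_id is not None:
            arguments["thread_id"] = self.thread_id
        return arguments

    def record(self, result):
        """Remember the thread_id from a result (assumed to be a dict)."""
        self.thread_id = result.get("thread_id", self.thread_id)


if __name__ == "__main__":
    thread = OllieThread()
    print(thread.arguments_for("why are spans slower this week?"))
    thread.record({"thread_id": "t-123", "answer": "..."})  # made-up result
    print(thread.arguments_for("which spans are the worst offenders?"))
```

In practice the host's LLM usually does this bookkeeping itself; the sketch only makes the data flow explicit.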

Ollie & auto-approve

By default, writes that Ollie performs mid-stream (scores, comments, prompt versions, test-suite items) execute without a per-action confirmation step. Each auto-approved write is logged as a JSON audit row on the opik_mcp.audit Python logger.
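
Because the audit rows go through Python's standard `logging` module, they can be captured with an ordinary handler. The sketch below assumes several things that are not confirmed here: that the code runs in the same process as the server, that each row is emitted as a JSON string in the log message, and that rows are logged at `INFO` level or higher. The `AuditRowCollector` class is hypothetical; only the logger name `opik_mcp.audit` comes from the text above.

```python
import json
import logging


class AuditRowCollector(logging.Handler):
    """Collect audit log messages, parsing each as JSON when possible."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def emit(self, record):
        message = record.getMessage()
        try:
            self.rows.append(json.loads(message))
        except json.JSONDecodeError:
            # Keep unparseable messages rather than dropping them silently.
            self.rows.append({"raw": message})


def attach_audit_collector():
    """Attach a collector to the opik_mcp.audit logger and return it."""
    collector = AuditRowCollector()
    audit_logger = logging.getLogger("opik_mcp.audit")
    audit_logger.setLevel(logging.INFO)  # assumption: rows are INFO or above
    audit_logger.addHandler(collector)
    return collector
```

Swapping the collector for a `logging.FileHandler` would persist the rows to disk instead of keeping them in memory.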

To require manual confirmation instead, set OPIK_MCP_AUTO_APPROVE=disabled in the server's env block. Ollie's confirmation requests then surface as typed errors that you can re-issue manually.
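
If you maintain host configs by script, the setting can be added to an existing config programmatically. This is a minimal sketch that handles both the `mcpServers` and `servers` layouts used on this page. The function name `disable_auto_approve` is hypothetical, and it operates on JSON text rather than on a file so that it has no side effects.

```python
import json


def disable_auto_approve(config_text, server_name="opik-mcp"):
    """Return config JSON with OPIK_MCP_AUTO_APPROVE=disabled for the server."""
    config = json.loads(config_text)
    # Claude Code and Cursor use "mcpServers"; VS Code Copilot uses "servers".
    servers = config.get("mcpServers") or config.get("servers") or {}
    if server_name not in servers:
        raise KeyError(f"no server named {server_name!r} in config")
    # Create the env block if it is missing, then set the flag.
    servers[server_name].setdefault("env", {})["OPIK_MCP_AUTO_APPROVE"] = "disabled"
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    original = '{"mcpServers": {"opik-mcp": {"command": "uvx", "args": ["opik-mcp"]}}}'
    print(disable_auto_approve(original))
```

Restart or reload the host after changing its config so the server process picks up the new environment.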

ask_ollie and run_experiment are available on Comet Cloud only — on self-hosted those calls fail at dispatch; use read / list / write directly.

Known host limits

  • Cursor enforces a 60-second hard tool-call timeout that does not reset on progress notifications. Long ask_ollie turns will fail on Cursor. For long-running investigations, use Claude Code or VS Code Copilot.

Example conversation

A typical investigative loop using Claude Code:

You: Why did the experiment "gpt-4o-rerank-v3" regress on factuality?

Claude: (calls ask_ollie) Three traces failed because the reranker dropped the system message. The remaining 12 traces scored above 0.8…

You: Score the bottom 3 traces 0.2 with reason "dropped system message".

Claude: (calls write with score.create ×3) Done — three scores recorded on traces <id-1>, <id-2>, <id-3>.