import { Callout, Tabs } from 'nextra/components';

OpenAI-Compatible API

DocsGPT exposes /v1/chat/completions following the standard chat completions protocol. Point any compatible client — opencode, Aider, LibreChat or the OpenAI SDKs — at your DocsGPT Agent by changing only the base URL and API key.

Quick Start

<Tabs items={['Python', 'cURL']}>
<Tabs.Tab>

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:7091/v1",   # or https://gptcloud.arc53.com/v1
    api_key="your_agent_api_key",
)

response = client.chat.completions.create(
    model="docsgpt-agent",
    messages=[{"role": "user", "content": "Summarize our refund policy"}],
)
print(response.choices[0].message.content)
```

</Tabs.Tab>
<Tabs.Tab>

```bash
curl -X POST http://localhost:7091/v1/chat/completions \
  -H "Authorization: Bearer your_agent_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"docsgpt-agent","messages":[{"role":"user","content":"Summarize our refund policy"}]}'
```

</Tabs.Tab>
</Tabs>

The model field is accepted but ignored — the agent bound to your API key determines the model. The agent's prompt, sources, tools, and default model are loaded automatically.

Base URL & Auth

| Environment | Base URL |
| --- | --- |
| Local | `http://localhost:7091/v1` |
| Cloud | `https://gptcloud.arc53.com/v1` |

Authenticate with Authorization: Bearer <agent_api_key>.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | `/v1/chat/completions` | Chat request (streaming or non-streaming) |
| GET | `/v1/models` | List agents available to your key |
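
As a rough illustration of the auth header and the `/v1/models` path, the sketch below builds the request with the standard library only and does not send it. The helper name `build_models_request` is hypothetical, and the response shape is not shown because this page does not specify it.

```python
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the models endpoint."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_models_request("http://localhost:7091/v1", "your_agent_api_key")
print(req.get_method(), req.full_url)
```

Against a running instance, the request could be passed to `urllib.request.urlopen(req)`.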

Streaming

Set "stream": true. You'll receive SSE chunks with choices[0].delta.content. DocsGPT-specific events (sources, tool calls) arrive as extra frames that carry a top-level docsgpt key on an otherwise-empty chunk — standard clients ignore them.

```python
stream = client.chat.completions.create(
    model="docsgpt-agent",
    stream=True,
    messages=[{"role": "user", "content": "Explain vector search"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
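
For clients that read the raw SSE stream and want the DocsGPT-specific frames as well, a minimal sketch of separating answer text from `docsgpt` payloads might look like this. It assumes the OpenAI-style `data: ...` framing with a `data: [DONE]` terminator; the sample frames and the function name `consume_sse_lines` are made up for illustration and are not captured server output.

```python
import json


def consume_sse_lines(lines):
    """Split SSE data lines into answer text and DocsGPT extension payloads."""
    text_parts, extras = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed OpenAI-style end-of-stream marker
        chunk = json.loads(data)
        if "docsgpt" in chunk:
            extras.append(chunk["docsgpt"])  # sources, tool calls, etc.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
    return "".join(text_parts), extras


# Hypothetical frames, shaped after the description above.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Vector "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}}], "docsgpt": {"sources": []}}',
    'data: {"choices": [{"index": 0, "delta": {"content": "search"}}]}',
    "data: [DONE]",
]
text, extras = consume_sse_lines(sample)
print(text)
```

The loop tolerates frames with an empty or missing `choices` list, so it does not depend on the exact shape of the extension frames.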

Sampling Parameters

Standard OpenAI sampling parameters are forwarded to the model. When omitted, the agent's configured defaults apply. Supported: temperature, max_tokens (or max_completion_tokens), top_p, frequency_penalty, presence_penalty, stop, seed.

```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Write a haiku about search"}],
  "temperature": 0.2,
  "max_tokens": 256,
  "seed": 42
}
```
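
One way to keep request bodies within the supported set is a small payload builder, sketched below. Rejecting unknown parameters on the client is a design choice of this example, not documented server behavior, and `build_chat_payload` is a hypothetical helper.

```python
# Parameters listed as supported in this section.
SUPPORTED_SAMPLING = {
    "temperature", "max_tokens", "max_completion_tokens", "top_p",
    "frequency_penalty", "presence_penalty", "stop", "seed",
}


def build_chat_payload(messages, model="docsgpt-agent", **sampling):
    """Build a chat request body, keeping only explicitly set sampling values."""
    unknown = set(sampling) - SUPPORTED_SAMPLING
    if unknown:
        raise ValueError(f"unsupported sampling parameters: {sorted(unknown)}")
    payload = {"model": model, "messages": messages}
    # Leave out None values so the agent's configured defaults apply.
    payload.update({k: v for k, v in sampling.items() if v is not None})
    return payload


payload = build_chat_payload(
    [{"role": "user", "content": "Write a haiku about search"}],
    temperature=0.2,
    max_tokens=256,
    seed=42,
    top_p=None,
)
print(sorted(payload))
```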

Structured Output

You can force the model to return JSON matching a schema, using either the OpenAI response_format field or the response_schema convenience field.

<Tabs items={['response_format', 'response_schema']}>
<Tabs.Tab>

```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Extract the order id and total"}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "order",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "order_id": {"type": "string"},
          "total": {"type": "number"}
        },
        "required": ["order_id", "total"]
      }
    }
  }
}
```

</Tabs.Tab>
<Tabs.Tab>

```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Extract the order id and total"}],
  "response_schema": {
    "type": "object",
    "properties": {
      "order_id": {"type": "string"},
      "total": {"type": "number"}
    },
    "required": ["order_id", "total"]
  }
}
```

</Tabs.Tab>
</Tabs>

  • response_format follows OpenAI Structured Outputs. strict defaults to true; set strict: false to relax enforcement.
  • response_format: {"type": "json_object"} requests JSON without a fixed schema (the model is steered by the prompt).
  • response_schema is a DocsGPT convenience: pass a raw JSON Schema object (or a {"schema": {...}} wrapper) directly.
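
The relationship between the two fields can be sketched in a few lines: the helper below wraps a raw JSON Schema in the `response_format` envelope shown above, and a second helper parses a reply and checks only that the schema's `required` keys are present. Both function names are hypothetical and the reply string is invented; this is not full JSON Schema validation.

```python
import json


def to_response_format(name, schema, strict=True):
    """Wrap a raw JSON Schema in the OpenAI-style response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def parse_structured(content, schema):
    """Parse a JSON reply and check that the schema's required keys exist."""
    data = json.loads(content)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data


order_schema = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "total": {"type": "number"}},
    "required": ["order_id", "total"],
}
response_format = to_response_format("order", order_schema)
order = parse_structured('{"order_id": "A-1", "total": 12.5}', order_schema)
print(order["order_id"], order["total"])
```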

Multimodal Input (text + images)

User messages may use OpenAI typed-content arrays with image_url parts. Images are forwarded to vision-capable models.

```json
{
  "model": "docsgpt-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this screenshot?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/shot.png"}}
      ]
    }
  ]
}
```
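
When messages are assembled in code, a small helper can produce the typed-content array. This is a sketch with a hypothetical function name; falling back to a plain string when there are no images is a choice of this example.

```python
def user_message(text, image_urls=()):
    """Build a user message, using typed content parts only when images are given."""
    if not image_urls:
        return {"role": "user", "content": text}
    parts = [{"type": "text", "text": text}]
    parts.extend(
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    )
    return {"role": "user", "content": parts}


message = user_message("What's in this screenshot?", ["https://example.com/shot.png"])
print(len(message["content"]))
```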

Tool Calling (client-side, stateless)

You can register your own tools and execute them on the client. The flow is stateless — OpenAI clients that don't carry a conversation_id re-send the full message history each turn, and DocsGPT rebuilds the agent from it.

  1. Send a request with a tools array.
  2. If the agent decides to call a tool, the response comes back with finish_reason: "tool_calls" and a tool_calls array (and content: null).
  3. Execute the tool(s) on your side, then re-POST the full message history with the assistant's tool_calls message followed by role: "tool" result messages.
  4. DocsGPT continues the run and returns the final answer.

```json
{
  "model": "docsgpt-agent",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
      {"id": "call_1", "type": "function",
       "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18°C, clear"}
  ],
  "tools": [ { "type": "function", "function": { "name": "get_weather", "...": "..." } } ]
}
```
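
Step 3 of the flow above, executing the requested tools locally and rebuilding the history to re-POST, could be sketched as follows. The `get_weather` implementation returns a canned string, and the registry and function names are hypothetical; error handling for unknown tools or malformed arguments is left out.

```python
import json


def get_weather(city):
    """Stand-in tool; a real client would call a weather service here."""
    return "18°C, clear"


TOOLS = {"get_weather": get_weather}


def append_tool_results(history, assistant_message, tools=TOOLS):
    """Return a new history: prior turns, the tool_calls message, then tool results."""
    updated = list(history) + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"] or "{}")  # arguments arrive as a JSON string
        result = tools[function["name"]](**arguments)
        updated.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        )
    return updated


history = [{"role": "user", "content": "What's the weather in Paris?"}]
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
    ],
}
messages = append_tool_results(history, assistant_message)
print(messages[-1]["content"])
```

The resulting `messages` list is what would be sent back, together with the same `tools` array, in the follow-up request.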

Reasoning

For models that emit reasoning ("thinking") tokens, the response surfaces them in a non-standard reasoning_content field (a reasoning_content delta when streaming). Standard clients ignore it; clients that understand it can display the model's thinking separately from the answer.
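
A client that wants to show thinking separately might accumulate the two streams as sketched below. This assumes `reasoning_content` sits on the delta object next to `content`, which this page does not spell out; the sample deltas and the function name are invented.

```python
def accumulate_deltas(deltas):
    """Collect reasoning text and answer text from a sequence of delta dicts."""
    reasoning, answer = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)


# Hypothetical deltas for illustration.
sample_deltas = [
    {"reasoning_content": "User wants a definition. "},
    {"reasoning_content": "Keep it short."},
    {"content": "Vector search "},
    {"content": "ranks items by similarity."},
]
thinking, reply = accumulate_deltas(sample_deltas)
print(reply)
```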

Idempotent Retries

Add an Idempotency-Key header so a retried request returns the stored first response instead of re-running the agent (which would duplicate the answer and double-bill tokens).

```bash
curl -X POST http://localhost:7091/v1/chat/completions \
  -H "Authorization: Bearer your_agent_api_key" \
  -H "Idempotency-Key: 8f1c...unique-per-request" \
  -H "Content-Type: application/json" \
  -d '{"model":"docsgpt-agent","messages":[{"role":"user","content":"hi"}]}'
```

  • Opt-in: without the header, behavior is unchanged and every request runs the agent.
  • Non-streaming only — streaming replay is not supported.
  • A completed key replays the cached body (and status) for 24 hours.
  • A request with a key whose first attempt is still in flight returns HTTP 409.
  • Keys are scoped per agent and capped at 256 characters (oversized keys are rejected).
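
The rules above suggest a retry loop that reuses one key across attempts and treats HTTP 409 as "still running, try again later". The sketch below illustrates that under stated assumptions: `send` is a hypothetical transport returning `(status, body)`, the fake transport only simulates outcomes, and the backoff schedule is arbitrary.

```python
import uuid

MAX_KEY_LENGTH = 256  # cap stated in the list above


def idempotent_headers(api_key, idempotency_key=None):
    """Build headers carrying a single Idempotency-Key for all attempts."""
    key = idempotency_key or str(uuid.uuid4())
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError("Idempotency-Key exceeds 256 characters")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": key,
    }


def post_with_retry(send, payload, headers, attempts=3, wait=lambda seconds: None):
    """Resend with the same headers until a response other than 409 arrives."""
    for attempt in range(attempts):
        try:
            status, body = send(payload, headers)
        except ConnectionError:
            status = None  # outcome unknown; retrying with the same key is safe
        if status is not None and status != 409:
            return status, body
        if attempt < attempts - 1:
            wait(2 ** attempt)  # 409 means the first attempt is still in flight
    raise RuntimeError("request did not complete within the retry budget")


# Fake transport: drops once, reports in-flight once, then succeeds.
seen_keys = []
outcomes = iter([ConnectionError("dropped"), (409, None), (200, {"ok": True})])


def fake_send(payload, headers):
    seen_keys.append(headers["Idempotency-Key"])
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


headers = idempotent_headers("your_agent_api_key")
request_body = {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]}
status, body = post_with_retry(fake_send, request_body, headers)
print(status, len(set(seen_keys)))
```

In real use, `wait=time.sleep` would replace the no-op default, and requests must be non-streaming for replay to apply.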

System Prompt Override

System messages are dropped by default — the agent's configured prompt is used. To allow callers to override it, enable Allow prompt override in the agent's Advanced settings.

<Callout type="warning"> When an override is active, the agent's prompt template is replaced wholesale — template variables like `{summaries}` are not substituted. </Callout>

Conversation Persistence

Conversations are always persisted server-side, and the response includes docsgpt.conversation_id. They never appear in the agent owner's sidebar — /v1 traffic is stored hidden, so external clients can't clutter the owner's conversation list.

Stateless tool continuations (no conversation_id, e.g. opencode) skip persistence by default to avoid writing orphan rows; set docsgpt.persist to override. The legacy docsgpt.save_conversation flag from older releases is deprecated and ignored.

DocsGPT Extension Fields

DocsGPT adds an optional docsgpt object to both requests and responses for features outside the OpenAI schema.

Request (docsgpt.*):

| Field | Description |
| --- | --- |
| `attachments` | List of attachment IDs to include as context for this turn. |
| `persist` | Force-enable/disable conversation persistence (mainly for stateless tool continuations). |

Response (docsgpt.*):

| Field | Description |
| --- | --- |
| `conversation_id` | Server-side conversation ID for this exchange. |
| `sources` | RAG sources used to answer. |
| `tool_calls` | Completed tool-call results from the run. |

When streaming, these arrive on otherwise-empty chunks that carry a top-level docsgpt key, so strict OpenAI clients still validate each frame.
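
For non-streaming use, attaching and reading the extension object might look like the sketch below. Both helper names are hypothetical, the attachment ID and sample response are invented, and treating `persist` as a boolean is an assumption based on the "enable/disable" wording in the table.

```python
def with_docsgpt(payload, attachments=None, persist=None):
    """Return a copy of the request body with an optional docsgpt object."""
    extension = {}
    if attachments:
        extension["attachments"] = list(attachments)
    if persist is not None:
        extension["persist"] = bool(persist)  # assumed to be a boolean flag
    result = dict(payload)
    if extension:
        result["docsgpt"] = extension
    return result


def read_docsgpt(response):
    """Pull the documented extension fields out of a response body."""
    extension = response.get("docsgpt") or {}
    return {
        "conversation_id": extension.get("conversation_id"),
        "sources": extension.get("sources") or [],
        "tool_calls": extension.get("tool_calls") or [],
    }


base = {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]}
request_body = with_docsgpt(base, attachments=["att_123"], persist=False)

# Hypothetical response body for illustration.
sample_response = {"choices": [], "docsgpt": {"conversation_id": "conv_1"}}
fields = read_docsgpt(sample_response)
print(request_body["docsgpt"], fields["conversation_id"])
```

With the OpenAI Python SDK, the same object can likely be passed through the `extra_body` argument of `chat.completions.create`, since `docsgpt` is not part of the standard request schema.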

When to Use Native Endpoints Instead

Use /api/answer or /stream if you need server-side attachments, passthrough template variables, explicit conversation_id reuse, or sidebar visibility control via visibility.