docs/content/Agents/openai-compatible.mdx
import { Callout, Tabs } from 'nextra/components';
DocsGPT exposes /v1/chat/completions following the standard chat completions protocol. Point any compatible client — opencode, Aider, LibreChat or the OpenAI SDKs — at your DocsGPT Agent by changing only the base URL and API key.
<Tabs items={['Python', 'cURL']}>
<Tabs.Tab>
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:7091/v1",  # or https://gptcloud.arc53.com/v1
    api_key="your_agent_api_key",
)
response = client.chat.completions.create(
    model="docsgpt-agent",
    messages=[{"role": "user", "content": "Summarize our refund policy"}],
)
print(response.choices[0].message.content)
```
</Tabs.Tab>
<Tabs.Tab>
```bash
curl -X POST http://localhost:7091/v1/chat/completions \
  -H "Authorization: Bearer your_agent_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"docsgpt-agent","messages":[{"role":"user","content":"Summarize our refund policy"}]}'
```
</Tabs.Tab>
</Tabs>
The model field is accepted but ignored — the agent bound to your API key determines the model. The agent's prompt, sources, tools, and default model are loaded automatically.
| Environment | Base URL |
|---|---|
| Local | http://localhost:7091/v1 |
| Cloud | https://gptcloud.arc53.com/v1 |
Authenticate with `Authorization: Bearer <agent_api_key>`.
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | Chat request (streaming or non-streaming) |
| `GET` | `/v1/models` | List agents available to your key |
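As a minimal sketch of how the base URL, bearer token, and paths in the tables above fit together, the following builds (but does not send) both requests with Python's standard library. The helper name `build_request` and the placeholder key are illustrative and not part of DocsGPT.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7091/v1"  # swap in the cloud base URL if needed


def build_request(path, api_key, payload=None):
    """Build an authenticated request: GET without a body, POST with one."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method="GET" if payload is None else "POST",
    )
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Hypothetical key; use the API key of your own agent.
list_agents = build_request("/models", "your_agent_api_key")
chat = build_request(
    "/chat/completions",
    "your_agent_api_key",
    {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]},
)
print(list_agents.get_method(), list_agents.full_url)
print(chat.get_method(), chat.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running instance.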
Set `"stream": true`. You'll receive SSE chunks with `choices[0].delta.content`. DocsGPT-specific events (sources, tool calls) arrive as extra frames that carry a top-level `docsgpt` key on an otherwise-empty chunk, so standard clients ignore them.
```python
stream = client.chat.completions.create(
    model="docsgpt-agent",
    stream=True,
    messages=[{"role": "user", "content": "Explain vector search"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
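A client that reads the raw SSE stream can pick up the DocsGPT-specific frames instead of ignoring them. The sketch below assumes the usual OpenAI framing (`data: ` lines holding JSON, ending with `data: [DONE]`); the sample frames are made up for illustration, and the exact contents of a `docsgpt` frame may differ.

```python
import json


def split_stream(lines):
    """Separate answer text from DocsGPT extras in an iterable of SSE lines."""
    text_parts = []
    extras = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break
        chunk = json.loads(body)
        if "docsgpt" in chunk:
            extras.append(chunk["docsgpt"])
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
    return "".join(text_parts), extras


# Made-up frames: two content deltas around one DocsGPT-only frame.
sample = [
    'data: {"choices": [{"delta": {"content": "Vector "}}]}',
    'data: {"choices": [{"delta": {}}], "docsgpt": {"sources": []}}',
    'data: {"choices": [{"delta": {"content": "search"}}]}',
    "data: [DONE]",
]
answer, extras = split_stream(sample)
print(answer)
print(extras)
```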
Standard OpenAI sampling parameters are forwarded to the model. When omitted, the agent's configured defaults apply. Supported: `temperature`, `max_tokens` (or `max_completion_tokens`), `top_p`, `frequency_penalty`, `presence_penalty`, `stop`, `seed`.
```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Write a haiku about search"}],
  "temperature": 0.2,
  "max_tokens": 256,
  "seed": 42
}
```
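One way to keep the agent's defaults in play is to send only the parameters you actually set. The helper below is a hypothetical client-side sketch: `chat_payload` drops `None` values and rejects names outside the supported list, which is a choice made here and says nothing about how the server treats other parameters.

```python
SUPPORTED_SAMPLING = {
    "temperature", "max_tokens", "max_completion_tokens", "top_p",
    "frequency_penalty", "presence_penalty", "stop", "seed",
}


def chat_payload(prompt, **sampling):
    """Build a request body, omitting unset sampling parameters."""
    unknown = set(sampling) - SUPPORTED_SAMPLING
    if unknown:
        raise ValueError(f"unsupported sampling parameters: {sorted(unknown)}")
    payload = {
        "model": "docsgpt-agent",
        "messages": [{"role": "user", "content": prompt}],
    }
    # Leaving a key out lets the agent's configured default apply.
    payload.update({k: v for k, v in sampling.items() if v is not None})
    return payload


payload = chat_payload(
    "Write a haiku about search", temperature=0.2, max_tokens=256, seed=42, top_p=None
)
print(payload)
```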
You can force the model to return JSON matching a schema, using either the OpenAI response_format field or the response_schema convenience field.
<Tabs items={['response_format', 'response_schema']}>
<Tabs.Tab>
```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Extract the order id and total"}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "order",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "order_id": {"type": "string"},
          "total": {"type": "number"}
        },
        "required": ["order_id", "total"]
      }
    }
  }
}
```
</Tabs.Tab>
<Tabs.Tab>
```json
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Extract the order id and total"}],
  "response_schema": {
    "type": "object",
    "properties": {
      "order_id": {"type": "string"},
      "total": {"type": "number"}
    },
    "required": ["order_id", "total"]
  }
}
```
</Tabs.Tab>
</Tabs>
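Both request shapes above can be produced from one schema object. The following is a sketch only: `with_schema` and `parse_structured` are hypothetical helper names, and the reply string is made up to show the client-side decoding step.

```python
import json

ORDER_SCHEMA = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "total": {"type": "number"}},
    "required": ["order_id", "total"],
}


def with_schema(payload, schema, style="response_format", name="order"):
    """Return a copy of payload that requests JSON matching schema."""
    request = dict(payload)
    if style == "response_format":
        request["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    elif style == "response_schema":
        request["response_schema"] = schema
    else:
        raise ValueError(f"unknown style: {style}")
    return request


def parse_structured(content, schema):
    """Decode the reply text and check that every required key is present."""
    data = json.loads(content)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


base = {
    "model": "docsgpt-agent",
    "messages": [{"role": "user", "content": "Extract the order id and total"}],
}
standard = with_schema(base, ORDER_SCHEMA)
shorthand = with_schema(base, ORDER_SCHEMA, style="response_schema")
# Made-up reply content, standing in for choices[0].message.content.
order = parse_structured('{"order_id": "A-1001", "total": 42.5}', ORDER_SCHEMA)
print(order)
```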
- `response_format` follows OpenAI Structured Outputs. `strict` defaults to `true`; set `strict: false` to relax enforcement.
- `response_format: {"type": "json_object"}` requests JSON without a fixed schema (the model is steered by the prompt).
- `response_schema` is a DocsGPT convenience: pass a raw JSON Schema object (or a `{"schema": {...}}` wrapper) directly.

User messages may use OpenAI typed-content arrays with `image_url` parts. Images are forwarded to vision-capable models.
```json
{
  "model": "docsgpt-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this screenshot?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/shot.png"}}
      ]
    }
  ]
}
```
You can register your own tools and execute them on the client. The flow is stateless — OpenAI clients that don't carry a conversation_id re-send the full message history each turn, and DocsGPT rebuilds the agent from it.
1. Send a request that includes a `tools` array.
2. The response returns `finish_reason: "tool_calls"` and a `tool_calls` array (and `content: null`).
3. Run the tools on the client, then re-send the history with the assistant `tool_calls` message followed by `role: "tool"` result messages.

```json
{
  "model": "docsgpt-agent",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
      {"id": "call_1", "type": "function",
       "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18°C, clear"}
  ],
  "tools": [ { "type": "function", "function": { "name": "get_weather", "...": "..." } } ]
}
```
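The client-side half of this flow can be sketched as a function that runs each requested tool locally and builds the message list for the next request. `get_weather` here is a stub with a hard-coded result, and `LOCAL_TOOLS` and `continue_with_tool_results` are hypothetical names used only for this example.

```python
import json


def get_weather(city):
    """Stub tool: a real client would call a weather service here."""
    return "18°C, clear"


LOCAL_TOOLS = {"get_weather": get_weather}


def continue_with_tool_results(messages, assistant_message):
    """Append the assistant tool_calls message and one tool result per call."""
    follow_up = list(messages)
    follow_up.append(
        {"role": "assistant", "tool_calls": assistant_message["tool_calls"]}
    )
    for call in assistant_message["tool_calls"]:
        tool = LOCAL_TOOLS[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])  # JSON-encoded string
        follow_up.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": str(tool(**arguments)),
            }
        )
    return follow_up


history = [{"role": "user", "content": "What's the weather in Paris?"}]
# Shape of the assistant message returned with finish_reason "tool_calls".
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city":"Paris"}'},
        }
    ],
}
next_messages = continue_with_tool_results(history, assistant)
print(json.dumps(next_messages, ensure_ascii=False, indent=2))
```

Because the flow is stateless, `next_messages` carries the whole history and is sent as the `messages` field of the next request, alongside the same `tools` array.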
For models that emit reasoning ("thinking") tokens, the response surfaces them in a non-standard reasoning_content field (a reasoning_content delta when streaming). Standard clients ignore it; clients that understand it can display the model's thinking separately from the answer.
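A client that wants to show thinking separately might accumulate the two delta fields side by side. This sketch assumes `reasoning_content` sits next to `content` inside each streamed delta; the sample deltas are invented for illustration.

```python
def accumulate(deltas):
    """Collect reasoning text and answer text from streamed delta dicts."""
    thinking = []
    answer = []
    for delta in deltas:
        if delta.get("reasoning_content"):  # assumed location of the field
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)


# Made-up deltas: reasoning arrives first, then the visible answer.
sample_deltas = [
    {"reasoning_content": "The user wants "},
    {"reasoning_content": "a short answer."},
    {"content": "Yes"},
    {"content": "."},
]
thinking, answer = accumulate(sample_deltas)
print("thinking:", thinking)
print("answer:", answer)
```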
Add an Idempotency-Key header so a retried request returns the stored first response instead of re-running the agent (which would duplicate the answer and double-bill tokens).
```bash
curl -X POST http://localhost:7091/v1/chat/completions \
  -H "Authorization: Bearer your_agent_api_key" \
  -H "Idempotency-Key: 8f1c...unique-per-request" \
  -H "Content-Type: application/json" \
  -d '{"model":"docsgpt-agent","messages":[{"role":"user","content":"hi"}]}'
```
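The key only helps if every retry of the same logical request reuses it. A minimal sketch of that pattern follows, with the transport injected as a callable so the example runs without a server; `send_with_retries` and `flaky_send` are hypothetical names, and using a UUID as the key is an assumption rather than a DocsGPT requirement.

```python
import uuid


def send_with_retries(send, payload, attempts=3):
    """Call send(payload, headers), reusing one Idempotency-Key across retries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # generated once per request
    last_error = None
    for _ in range(attempts):
        try:
            return send(payload, headers)
        except ConnectionError as error:
            last_error = error  # retry with the same key
    raise last_error


seen_keys = []


def flaky_send(payload, headers):
    """Fake transport: fails on the first call, succeeds on the second."""
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) == 1:
        raise ConnectionError("simulated network drop")
    return {"ok": True}


result = send_with_retries(
    flaky_send,
    {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]},
)
print(result, seen_keys[0] == seen_keys[1])
```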
System messages are dropped by default — the agent's configured prompt is used. To allow callers to override it, enable Allow prompt override in the agent's Advanced settings.
<Callout type="warning">
  When an override is active, the agent's prompt template is replaced wholesale; template variables like `{summaries}` are not substituted.
</Callout>

Conversations are always persisted server-side, and the response includes `docsgpt.conversation_id`. They never appear in the agent owner's sidebar: `/v1` traffic is stored hidden, so external clients can't clutter the owner's conversation list.
Stateless tool continuations (no conversation_id, e.g. opencode) skip persistence by default to avoid writing orphan rows; set docsgpt.persist to override. The legacy docsgpt.save_conversation flag from older releases is deprecated and ignored.
DocsGPT adds an optional docsgpt object to both requests and responses for features outside the OpenAI schema.
Request (docsgpt.*):
| Field | Description |
|---|---|
| `attachments` | List of attachment IDs to include as context for this turn. |
| `persist` | Force-enable/disable conversation persistence (mainly for stateless tool continuations). |
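As a sketch of how these request fields might be attached to an otherwise standard body, the helper below adds a `docsgpt` object only when something is set. `with_docsgpt` is a hypothetical name and `att_123` is a made-up attachment ID.

```python
def with_docsgpt(payload, attachments=None, persist=None):
    """Return a copy of payload with an optional docsgpt object attached."""
    options = {}
    if attachments:
        options["attachments"] = list(attachments)
    if persist is not None:
        options["persist"] = bool(persist)
    request = dict(payload)
    if options:
        request["docsgpt"] = options
    return request


base = {
    "model": "docsgpt-agent",
    "messages": [{"role": "user", "content": "Summarize the attached file"}],
}
request = with_docsgpt(base, attachments=["att_123"], persist=True)
plain = with_docsgpt(base)
print(request["docsgpt"])
```

With the OpenAI Python SDK, non-standard body fields like this are typically passed through the `extra_body` argument of `client.chat.completions.create`, for example `extra_body={"docsgpt": {"persist": True}}`.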
Response (docsgpt.*):
| Field | Description |
|---|---|
| `conversation_id` | Server-side conversation ID for this exchange. |
| `sources` | RAG sources used to answer. |
| `tool_calls` | Completed tool-call results from the run. |
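Reading these fields from a decoded, non-streaming response could look like the sketch below. It assumes the `docsgpt` object sits at the top level of the response body, and the sample response (including the ID and source entry) is invented for illustration.

```python
def docsgpt_extras(response):
    """Pull the DocsGPT-specific fields out of a decoded response body."""
    extras = response.get("docsgpt") or {}
    return {
        "conversation_id": extras.get("conversation_id"),
        "sources": extras.get("sources", []),
        "tool_calls": extras.get("tool_calls", []),
    }


# Made-up response body with a top-level docsgpt object (assumed placement).
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Refunds take 5 days."}}],
    "docsgpt": {
        "conversation_id": "conv_example",
        "sources": [{"title": "refund-policy.md"}],
    },
}
extras = docsgpt_extras(sample_response)
print(extras["conversation_id"], len(extras["sources"]))
```

A response from a plain OpenAI-compatible server has no `docsgpt` key, in which case the helper returns empty defaults rather than raising.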
When streaming, these arrive on otherwise-empty chunks that carry a top-level docsgpt key, so strict OpenAI clients still validate each frame.
Use /api/answer or /stream if you need server-side attachments, passthrough template variables, explicit conversation_id reuse, or sidebar visibility control via visibility.