import { Callout } from 'nextra/components'
import Image from 'next/image'
Every source in DocsGPT carries its own behavior contract — a small config object that controls how that source is chunked when it is ingested and how it is retrieved when you ask a question. This lets you tune each source independently: a large reference manual can use a different chunking strategy and retriever than a short FAQ.
You edit this config from a source's settings in the UI (shown below), or through the API. The same options are also available in Advanced settings when you first upload a document.
<Image src="/sources-settings-screen.png" alt="Source settings panel showing Retrieval options (retriever, top-k, score threshold, rephrase, exposure, prescreen) and Chunking options (strategy, max/min tokens, duplicate headers)" width={633} height={862} />
<Callout type="info" emoji="ℹ️">
Per-source retrieval is enabled by default. Operators can turn it off instance-wide with `PER_SOURCE_RETRIEVAL_ENABLED=false`, in which case all sources fall back to the classic retriever regardless of their stored config.
</Callout>

The config has two groups of settings that differ in when they take effect:

| Group | When it applies | Re-ingest needed? |
|---|---|---|
| Retrieval (`retrieval.*`) | Query time: applied live on the next question | No |
| Chunking (`chunking.*`) | Ingest time: baked into the stored chunks | Yes |

Changing a retrieval setting takes effect immediately. Changing a chunking setting only affects documents ingested after the change, so you must re-ingest the source to apply it to existing content. The API response includes a `requires_reingest` flag to make this explicit.
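
The split between the two groups can be sketched in Python. This is a minimal illustration, not DocsGPT's implementation: the helper name `requires_reingest` and the rule that only a changed `chunking.*` value sets the flag are assumptions drawn from the description above.

```python
def requires_reingest(stored: dict, patch: dict) -> bool:
    """Return True when the patch changes any chunking.* value.

    Hypothetical helper: assumes the flag is driven only by chunking changes.
    """
    current = stored.get("chunking", {})
    proposed = patch.get("chunking", {})
    # Retrieval settings are read at query time, so they never force a re-ingest.
    return any(current.get(key) != value for key, value in proposed.items())


stored = {
    "chunking": {"strategy": "classic_chunk", "max_tokens": 1250},
    "retrieval": {"retriever": "classic", "chunks": 2},
}

print(requires_reingest(stored, {"retrieval": {"chunks": 4}}))            # False
print(requires_reingest(stored, {"chunking": {"strategy": "markdown"}}))  # True
```

In practice the `requires_reingest` value returned by the API is the one to rely on; the sketch only shows why a retrieval-only patch does not need one.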
Chunking decides how a document is split into the pieces that get embedded and stored.
```json
{
  "chunking": {
    "strategy": "classic_chunk",
    "max_tokens": 1250,
    "min_tokens": 150,
    "duplicate_headers": false
  }
}
```

| Field | Default | Description |
|---|---|---|
| `strategy` | `classic_chunk` | Which chunking algorithm to use (see below). |
| `max_tokens` | `1250` | Upper bound on chunk size in tokens. |
| `min_tokens` | `150` | Lower bound; small fragments are merged up to this size. |
| `duplicate_headers` | `false` | Repeat section headers into each child chunk for context. |

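
The interaction between `min_tokens` and `max_tokens` can be pictured with a small sketch. Everything here is an assumption for illustration: the helper names are hypothetical, whitespace-separated words stand in for real tokenizer tokens, and the greedy forward merge is one plausible reading of "merged up to this size", not a description of DocsGPT's splitter.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace words stand in for real tokenizer tokens.
    return len(text.split())


def merge_small_fragments(
    fragments: List[str], min_tokens: int = 150, max_tokens: int = 1250
) -> List[str]:
    """Greedily merge a fragment into its predecessor while the predecessor is too small."""
    merged: List[str] = []
    for fragment in fragments:
        if merged:
            previous_size = count_tokens(merged[-1])
            fits = previous_size + count_tokens(fragment) <= max_tokens
            if previous_size < min_tokens and fits:
                merged[-1] = merged[-1] + " " + fragment
                continue
        merged.append(fragment)
    return merged


# Tiny limits so the effect is visible on short strings.
fragments = ["a b", "c d e", "f g h i j k", "l"]
print(merge_small_fragments(fragments, min_tokens=4, max_tokens=8))
# ['a b c d e', 'f g h i j k', 'l']
```

Note that this sketch leaves a short trailing fragment untouched; a real splitter may handle that case differently.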

| Strategy | Behavior |
|---|---|
| `classic_chunk` | The default token-window splitter. An empty config reproduces DocsGPT's historical chunking byte-for-byte. |
| `recursive` | Recursive character/token splitter that tries to break on natural boundaries (paragraphs, sentences). |
| `markdown` | Splits along Markdown structure (headings, sections). Well suited to docs and wikis. |
| `parent_child` | Embeds small child chunks for precise matching but carries a larger parent window in metadata, so the model still sees surrounding context. |
| `semantic` | Embeds sentences and splits where meaning shifts (at the 95th-percentile cosine-distance gap between adjacent sentences), falling back to `recursive` on failure. Produces topically coherent chunks at the cost of extra embedding calls during ingest. |

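
The `semantic` strategy's splitting rule can be sketched as follows. This is an illustration under stated assumptions, not DocsGPT's code: the function names are hypothetical, the sentence vectors are hand-made 2-D toys instead of real embeddings (and are assumed non-zero), the percentile uses linear interpolation, and the fallback to `recursive` and the `max_tokens`/`min_tokens` limits are left out.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of a non-empty sequence."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction


def semantic_split(
    sentences: Sequence[str], vectors: Sequence[Sequence[float]], q: float = 95.0
) -> List[str]:
    """Split wherever the gap between adjacent sentences exceeds the q-th percentile gap."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    gaps = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(gaps, q)
    chunks: List[str] = []
    current = [sentences[0]]
    for index, gap in enumerate(gaps):
        if gap > threshold:  # meaning shifts here, so close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[index + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Cats purr.", "Cats nap often.", "Kittens play.",
    "Invoices are due monthly.", "Late fees apply.",
]
# Toy vectors: the first three point one way, the last two another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
for chunk in semantic_split(sentences, vectors):
    print(chunk)
```

With these toy vectors the only gap above the threshold is the one between the third and fourth sentences, so the sketch yields two chunks, one per topic.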
Retrieval decides which chunks are pulled in to answer a question. These settings apply live.
```json
{
  "retrieval": {
    "retriever": "classic",
    "exposure": "prefetch",
    "chunks": 2,
    "score_threshold": null,
    "rephrase_query": true,
    "prescreen": null
  }
}
```

| Field | Default | Description |
|---|---|---|
| `retriever` | `classic` | Retrieval strategy: `classic`, `hybrid`, or `graphrag`. |
| `exposure` | `prefetch` | How retrieved context reaches the model: `prefetch` or `agentic_tool` (see below). |
| `chunks` | `2` | Final number of chunks (top-k) returned to the answer. Range 1–500. |
| `score_threshold` | `null` | Minimum similarity score. Honored by pgvector and MongoDB Atlas; other stores ignore it. |
| `rephrase_query` | `true` | Whether to run a query-rephrasing side-call before retrieval. |
| `prescreen` | `null` | Optional LLM relevance filter (see below). `null` = off. |

- `classic`: Vector similarity search. The default and a safe choice for any vector store.
- `hybrid`: Fuses vector search with full-text keyword search using Reciprocal Rank Fusion, which improves recall for exact terms, codes, and names that pure vector search can miss.
- `graphrag`: Knowledge-graph retrieval. Set indirectly when you enable GraphRAG on a source. See GraphRAG.

Operators can restrict which retrievers are usable instance-wide with the `RETRIEVERS_ENABLED` setting; a per-source `retriever` value must be within that allow-list.
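
Reciprocal Rank Fusion, the technique named for `hybrid`, can be sketched in a few lines. This is a generic illustration rather than DocsGPT's implementation: the function name and chunk IDs are hypothetical, and `k=60` is the constant commonly used in the RRF literature, not a confirmed DocsGPT setting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a deterministic order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["c1", "c2", "c3"]   # hypothetical vector-search order
keyword_hits = ["c4", "c2", "c1"]  # hypothetical full-text order
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['c1', 'c2', 'c4', 'c3']
```

Chunks found by both searches (`c1`, `c2`) rise to the top, while `c4`, which only the keyword search found, still outranks a chunk that vector search alone placed third.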
`exposure` controls how a source's content is delivered to the model:

- `prefetch` (default): DocsGPT retrieves the top chunks up front and injects them into the prompt before the model answers. Best for focused Q&A over a source.
- `agentic_tool`: The source is exposed to the model as a search tool it can call on demand, so the model decides when and what to look up (browse-as-you-go) rather than receiving a bulk prefetch. This is the default exposure for Wiki sources.

Pre-screening adds an optional map-reduce step between retrieval and answering: a base retriever fetches a wider set of candidates, an LLM screens them in batches, and only the most relevant survivors are passed to the answer. It improves precision on noisy sources at the cost of extra query-time LLM calls, so it is off by default.
```json
{
  "retrieval": {
    "chunks": 8,
    "prescreen": {
      "candidate_k": 40,
      "batch_size": 10,
      "max_keep": 8,
      "model": null
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `candidate_k` | `40` | Candidates fetched before screening. Must be `>= chunks`. |
| `batch_size` | `10` | Candidates screened per LLM call. |
| `max_keep` | `8` | Survivors kept after screening. Must be `<= candidate_k`. |
| `model` | `null` | Model used for screening. `null` reuses the request's resolved model. |

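
The pre-screening flow and its constraints can be sketched as below. This is a hedged illustration: `validate`, `prescreen`, and `keyword_scorer` are hypothetical names, the scorer is a keyword stub standing in for the LLM call, and the way `max_keep` and `chunks` combine at the end (survivors are trimmed to `chunks`) is an assumption.

```python
from typing import Callable, List, Sequence


def validate(chunks: int, candidate_k: int, max_keep: int) -> None:
    """Check the documented constraints before running the screen."""
    if not 1 <= chunks <= 500:
        raise ValueError("chunks must be in the range 1-500")
    if candidate_k < chunks:
        raise ValueError("candidate_k must be >= chunks")
    if max_keep > candidate_k:
        raise ValueError("max_keep must be <= candidate_k")


def prescreen(
    candidates: Sequence[str],
    score_batch: Callable[[List[str]], List[float]],
    *,
    chunks: int = 8,
    candidate_k: int = 40,
    batch_size: int = 10,
    max_keep: int = 8,
) -> List[str]:
    validate(chunks, candidate_k, max_keep)
    pool = list(candidates[:candidate_k])
    scored = []
    # Map step: score candidates batch by batch (one LLM call per batch in a real system).
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        scored.extend(zip(score_batch(batch), batch))
    # Reduce step: keep the highest-scoring survivors; ties keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    survivors = [text for _, text in scored[:max_keep]]
    return survivors[:chunks]  # assumption: the answer still receives at most `chunks`


def keyword_scorer(batch: List[str]) -> List[float]:
    # Stand-in for the LLM relevance call: 1.0 if the chunk mentions "refund".
    return [1.0 if "refund" in text else 0.0 for text in batch]


candidates = [f"note {i}" for i in range(38)] + ["refund policy", "refund form"]
print(prescreen(candidates, keyword_scorer, chunks=2))
# ['refund policy', 'refund form']
```

With 40 candidates and a batch size of 10, the stub is called four times, which mirrors the extra query-time cost described above.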
The config is edited with a PATCH to the source's config endpoint:
```bash
# The URL is quoted so the shell does not treat the <source_id> placeholder as a redirect.
curl -X PATCH "https://your-docsgpt/api/sources/<source_id>/config" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "retrieval": { "retriever": "hybrid", "chunks": 4 },
    "chunking": { "strategy": "semantic" }
  }'
```
The response echoes the stored config and a `requires_reingest` flag:
```json
{
  "success": true,
  "config": { "...": "..." },
  "requires_reingest": true
}
```
Notes:

- Invalid config values are rejected with `400` (strict validation on write).
- The `kind` field (`classic` / `wiki` / `graphrag`) cannot be changed through this endpoint; converting a source to a Wiki or enabling GraphRAG uses dedicated endpoints.
- Editing requires an `editor` grant; viewers receive `403`.
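
The same PATCH request can also be built from Python with only the standard library. This is a minimal sketch: `build_config_patch` is a hypothetical helper, and the host, source ID, and token are placeholders. The request is constructed but not sent, so the example runs without a server.

```python
import json
import urllib.request


def build_config_patch(base_url: str, source_id: str, token: str, patch: dict) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for a source's config."""
    return urllib.request.Request(
        url=f"{base_url}/api/sources/{source_id}/config",
        data=json.dumps(patch).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_config_patch(
    "https://your-docsgpt",  # placeholder host
    "SOURCE_ID",             # placeholder source ID
    "TOKEN",                 # placeholder token
    {"retrieval": {"retriever": "hybrid", "chunks": 4}},
)
print(request.get_method(), request.full_url)

# To send it against a real instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     if result.get("requires_reingest"):
#         print("Re-ingest the source to apply the chunking change.")
```

A `400` or `403` response surfaces as `urllib.error.HTTPError` when the request is sent, so callers would typically wrap `urlopen` in a `try` block.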