FastGPT uses an Embedding-based RAG approach for its knowledge base. To use FastGPT effectively, you need a basic understanding of how Embedding vectors work and their characteristics.
Computers cannot directly understand human text, images, and other media. To determine whether two pieces of text are similar or related, the text typically needs to be converted into a form that computers can process; vectors are one such form.
A vector is essentially an array of numbers. The "distance" between two vectors can be calculated using mathematical formulas — the smaller the distance, the more similar the vectors. This maps back to text, images, and other media to measure similarity between them. Vector search leverages this principle.
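To make the idea of "distance" concrete, here is a minimal sketch using cosine distance, one common choice of formula. The three-dimensional vectors and their labels are toy values invented for illustration; real embedding models output hundreds or thousands of dimensions, and the distance function a given deployment uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Smaller distance means more similar vectors."""
    return 1.0 - cosine_similarity(a, b)


# Toy "embeddings" (illustrative values only, not produced by any model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# "cat" is closer to "kitten" than to "invoice".
print(cosine_distance(cat, kitten) < cosine_distance(cat, invoice))
```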
Since text comes in many types with countless combinations, exact matching is hard to guarantee when converting to vectors for similarity comparison. In vector-based knowledge bases, a top-k recall approach is typically used — finding the top k most similar results and passing them to an LLM for further semantic evaluation, logical reasoning, and summarization, enabling knowledge base Q&A. This makes vector search the most critical step in the process.
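The top-k recall step described above can be sketched as follows. This is only an illustration: the chunk IDs and similarity scores are made up, and a real retriever performs this selection inside the vector database rather than in application code.

```python
import heapq


def top_k(scored_chunks: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (chunk_id, similarity) pairs, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda item: item[1])


# Hypothetical similarity scores for four chunks.
candidates = [("chunk-a", 0.62), ("chunk-b", 0.91), ("chunk-c", 0.47), ("chunk-d", 0.85)]
best = top_k(candidates, k=2)
print(best)
```

The selected chunks would then be passed to the LLM as context for the answer.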
Many factors affect vector search accuracy, including: vector model quality, data quality (length, completeness, diversity), and retriever precision (the speed vs. accuracy tradeoff). Search query quality is equally important.
Retriever precision is relatively straightforward to address, while training vector models is more complex, so optimizing data and query quality becomes the key focus.
- Index content: reduce vector content length. Shorter, more precise index content improves search accuracy, though it may narrow the search scope. Best suited for scenarios requiring strict answers.
- Index quantity: add multiple index entries for the same chunk to improve recall.

In FastGPT, a knowledge base consists of three parts: libraries, collections, and data entries. A collection can be thought of as a "file." A library can contain multiple collections, and a collection can contain multiple data entries. The smallest searchable unit is the library, so searches span the entire library. Collections are only for organizing and managing data and do not affect search results (at least for now).
FastGPT uses the pgvector extension for PostgreSQL as the vector retriever, with HNSW indexing. PostgreSQL is used solely for vector search (this engine can be swapped for other databases), while MongoDB handles all other data storage.
In MongoDB's dataset.datas collection, vector source data is stored along with an indexes field that records corresponding vector IDs. This is an array, meaning a single data entry can map to multiple vectors. In addition to default text indexes, image content can also generate image description indexes or image vector indexes when the configured models support it.
In PostgreSQL, a vector field stores the vectors. During search, vectors are recalled first, then their IDs are used to look up the original data in MongoDB. If multiple vectors map to the same source data, they are merged and the highest vector score is used.
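The merge step described above can be sketched as follows. The field names `vector_id`, `data_id`, and `score` are hypothetical and chosen for illustration; they are not FastGPT's actual schema.

```python
def merge_by_source(hits: list[dict]) -> list[dict]:
    """Collapse vector hits that share a data_id, keeping the highest score."""
    best: dict[str, float] = {}
    for hit in hits:
        data_id, score = hit["data_id"], hit["score"]
        if data_id not in best or score > best[data_id]:
            best[data_id] = score
    merged = [{"data_id": d, "score": s} for d, s in best.items()]
    # Present the merged entries best first.
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged


# Two vectors (v1, v3) point at the same source data entry d1.
hits = [
    {"vector_id": "v1", "data_id": "d1", "score": 0.71},
    {"vector_id": "v2", "data_id": "d2", "score": 0.80},
    {"vector_id": "v3", "data_id": "d1", "score": 0.88},
]
merged = merge_by_source(hits)
print(merged)
```

After merging, each `data_id` would be used to look up the original data entry in MongoDB.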
In a single vector, content length and semantic richness are often at odds. FastGPT uses multi-vector mapping to map a single data entry to multiple vectors, preserving both data completeness and semantic richness.
You can add multiple vectors to a longer text so that if any one vector is matched during search, the entire data entry is recalled.
This means you can continuously improve data chunk accuracy through annotation.
A Knowledge Base search is not simply "user question -> vector database -> result." Depending on the input and search parameters, FastGPT combines text, images, semantic recall, full-text recall, query optimization, and reranking, then fuses multiple result paths into the final quoted content.
- Query Optimization for coreference resolution and query expansion, improving multi-turn conversation search capability and semantic richness.
- Semantic Search, Full-Text Search, or Hybrid Search to recall candidate content.
- RRF (Reciprocal Rank Fusion) to merge results from multiple search channels.
- Rerank for secondary sorting to improve text result relevance.

In Knowledge Base search, images can participate in retrieval in addition to text questions. FastGPT handles images differently depending on the configured model capabilities.
Image search mainly works in two ways:
Image search is not a separate system outside the Knowledge Base. It adds an image-input path to the existing Knowledge Base search pipeline.
Common usage patterns include:
Image search quality usually depends on image clarity, whether the image content is easy for the model to understand, whether a vision model is configured, and whether the embedding model supports image vectors.
Whether an image can be retrieved does not only depend on uploading an image at search time. It also depends on which indexes were created during ingestion:
| Knowledge Base capability | Text-only query | Image-only query | Text + image query |
|---|---|---|---|
| Regular embedding model, no vision model | Normal text retrieval | Usually unavailable | Mainly uses the text part |
| Regular embedding model with a vision model | Normal text retrieval | Converts the image into a description, then uses text retrieval | Text + image description participate in retrieval |
| Image-capable embedding model, no vision model | Normal text retrieval | Image vector retrieval | Text retrieval + image vector retrieval |
| Image-capable embedding model with a vision model | Text retrieval, including image descriptions | Image description + image vector retrieval | Text + image description + image vector retrieval |
So when image-to-image search performs poorly, do not only adjust search parameters. Also check whether the Knowledge Base is configured with a vision model or an image-capable embedding model, and whether valid image indexes were generated during ingestion.
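As a reading aid, the "Image-only query" column of the table above can be expressed as a small function. The function name and the path labels are hypothetical; this restates the table and is not FastGPT's code.

```python
def image_query_paths(image_embedding: bool, vision_model: bool) -> list[str]:
    """Recall paths available to an image-only query, following the table above."""
    paths = []
    if vision_model:
        # The vision model turns the image into a description used for text retrieval.
        paths.append("image description")
    if image_embedding:
        # An image-capable embedding model allows direct image vector retrieval.
        paths.append("image vector")
    return paths


for image_embedding in (False, True):
    for vision_model in (False, True):
        paths = image_query_paths(image_embedding, vision_model)
        print(image_embedding, vision_model, paths or "usually unavailable")
```

An empty list corresponds to the "Usually unavailable" cell: with neither capability configured, there is no path through which an image alone can be matched.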
FastGPT fuses results from different recall paths instead of using only one path. Common paths include text vector recall, full-text recall, image description recall, image vector recall, and reranked results.
Keep these points in mind:
- Semantic Search relies more on vector similarity and is better for natural-language questions and semantically related content.
- Full-Text Search relies more on keyword matches and is better for IDs, model numbers, proper nouns, error codes, and other exact queries.
- Hybrid Search uses both semantic recall and full-text recall, then merges the results with RRF.
- Rerank re-sorts candidate text results and works best when the question is clear and there are enough candidates.

This means final quoted content may not be strictly sorted by a single vector similarity score. Content matched by multiple recall paths is usually more likely to rank higher.
Semantic search calculates the vector distance between the user's query and knowledge base content to determine "similarity" — mathematical similarity, not linguistic.
Pros:
Cons:
Full-Text Search uses traditional keyword-based retrieval. It is best for finding key subjects, predicates, and other specific terms.
Hybrid Search combines vector search and full-text search, merging results using the RRF formula. It generally produces richer and more accurate results.
Since hybrid search covers a large range and cannot directly filter by similarity, a rerank model is typically used to re-sort results and filter by rerank scores.
Result Reranking uses a rerank model to re-sort search results. In most cases, this significantly improves accuracy. Rerank models work better with complete questions (with proper subjects and predicates), so query optimization is usually applied before search and reranking. Reranking produces a score between 0 and 1 representing the relevance between the search content and the query. This score is typically more accurate than vector similarity scores and can be used for filtering.
FastGPT uses RRF to merge rerank results, vector search results, and full-text search results into the final output.
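For reference, the standard RRF formula scores each item as the sum of `1 / (k + rank)` over every ranked list in which it appears. The sketch below assumes the commonly used constant `k = 60`; the constant and any per-channel weighting FastGPT applies are not stated here, so treat both as assumptions.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute more to the fused score.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from two recall channels.
semantic = ["d1", "d2", "d3"]
full_text = ["d3", "d1", "d4"]
fused = rrf_merge([semantic, full_text])
print([doc_id for doc_id, _ in fused])
```

Because RRF works only with ranks, it can merge channels whose raw scores are not comparable, which is also why items found by several channels tend to rise to the top.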
Reference limit: the maximum number of tokens to reference per search.
Instead of using top k, we use a token limit. We found that in mixed knowledge bases (Q&A + document), chunk lengths vary significantly, which makes top k results unstable. A token limit provides more consistent control.
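A minimal sketch of selecting by token budget rather than by count is shown below. The whitespace-based token counter is a stand-in for a real tokenizer, and stopping at the first chunk that does not fit is one possible policy chosen for this example, not necessarily what FastGPT does.

```python
from typing import Callable


def select_within_token_limit(
    ranked_chunks: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Take chunks in rank order until the next one would exceed the budget."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # Preserve rank order: do not skip ahead to smaller chunks.
        selected.append(chunk)
        used += cost
    return selected


def rough_token_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


chunks = ["a b c", "d e", "f g h i", "j"]  # already sorted best first
picked = select_within_token_limit(chunks, max_tokens=5, count_tokens=rough_token_count)
print(picked)
```

With a fixed budget, a few long document chunks or many short Q&A chunks both produce roughly the same amount of quoted content.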
Minimum relevance: a value between 0 and 1 that filters out low-relevance search results.
This only takes effect when using Semantic Search or Result Reranking.
Note that minimum relevance is a filtering threshold, not the final sorting rule. When query optimization, hybrid search, image search, or result reranking is enabled, final results may be fused from multiple recall paths and may not be strictly sorted by a single vector similarity score.
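The threshold behavior can be sketched as a simple filter. The `id` and `score` field names are hypothetical, and treating a score equal to the threshold as passing is an assumption of this example.

```python
def filter_by_min_relevance(results: list[dict], min_relevance: float) -> list[dict]:
    """Drop results whose score is below the threshold; input order is kept."""
    if not 0.0 <= min_relevance <= 1.0:
        raise ValueError("min_relevance must be between 0 and 1")
    return [r for r in results if r["score"] >= min_relevance]


# Hypothetical results, in whatever order the fusion step produced.
results = [
    {"id": "d1", "score": 0.82},
    {"id": "d2", "score": 0.41},
    {"id": "d3", "score": 0.65},
]
kept = filter_by_min_relevance(results, min_relevance=0.6)
print([r["id"] for r in kept])
```

The filter removes entries but does not reorder them, which matches the point that the threshold is not a sorting rule.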
In RAG, we need to perform embedding searches against the database based on the input query to find similar content (i.e., knowledge base search).
During search — especially in multi-turn conversations — follow-up questions often fail to find relevant content because knowledge base search only uses the "current" question. Consider this example:
When the user asks "What's the second point?", the system searches for "What's the second point?" in the knowledge base, which returns nothing useful. The actual query should be "What is the QA structure?". This is why we need a Query Optimization module to complete the user's current question, enabling the knowledge base search to find relevant content. Here's the result after optimization:
Before performing data retrieval, the model first performs coreference resolution and query expansion. This resolves ambiguous references and enriches the query's semantic content. You can view the optimized query in the conversation details after each interaction.
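One way to picture this step is as a prompt that asks a model to rewrite the latest question so that it stands on its own. The sketch below only builds such a prompt. The wording of the instructions and the sample conversation are hypothetical, not FastGPT's actual prompt or data, and the model call itself is omitted.

```python
def build_rewrite_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Build an instruction asking a model to make `question` self-contained."""
    lines = [
        "Rewrite the final user question so it can be understood without the conversation.",
        "Resolve pronouns and vague references, and keep the original intent.",
        "",
        "Conversation:",
    ]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append("")
    lines.append(f"Final user question: {question}")
    lines.append("Rewritten question:")
    return "\n".join(lines)


# Hypothetical conversation in which the follow-up question depends on the previous answer.
history = [
    ("user", "How should I structure my data?"),
    ("assistant", "1. Plain text chunks. 2. The QA structure."),
]
prompt = build_rewrite_prompt(history, "What's the second point?")
print(prompt)
```

The model's rewritten question, rather than the original follow-up, would then be used as the search query.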
Query Optimization adds an extra model call before the actual search. It often improves retrieval in multi-turn conversations, but it also increases total latency. If the current question is already clear, or response speed is more important, decide whether to enable it based on actual results.
If search results are not as expected, start from the symptom. Avoid changing every parameter at once.
| Symptom | What to check or adjust first |
|---|---|
| No results found | Confirm the data has finished training; lower the minimum relevance; increase the reference limit; check whether the question is too short or missing a subject |
| Results are too broad or off-topic | Raise the minimum relevance; reduce the reference limit; improve chunking; check whether recalled content contains too many unrelated chunks |
| IDs, model numbers, or proper nouns are inaccurate | Use full-text or hybrid search; reduce semantic search weight; avoid overusing query optimization for exact ID queries |
| Natural-language questions do not retrieve well | Use semantic or hybrid search; enable query optimization; add more accurate indexes to the data |
| Query optimization makes search slower | Query optimization adds an extra model call. Use a faster optimization model, or enable it only for follow-up questions and short queries |
| Rerank still gives poor ordering | Check whether the user question is complete; make sure enough candidates are recalled; adjust minimum relevance and reference limit |
| Image-to-image search is weak | Confirm the embedding model supports image input; confirm image vector indexes were generated during ingestion; check whether the image is clear and has an obvious subject |
| Text + image search is unstable | Clarify whether text or image should be more important; if you only want visual similarity, reduce extra text constraints |