Starting from version 4.8.9, FastGPT supports configuring file and image uploads in both Basic Mode and Workflows. This guide covers how to use the file input feature and explains how document parsing works under the hood.
When file upload is enabled in Basic Mode, it uses tool-calling mode — the model decides whether to read the file content.
Find the file upload option on the left panel and click the Enable/Disable toggle to open the configuration dialog.
Once enabled, a file selection icon appears in the chat input area. Click it to select files for upload.
### Behavior
Starting from version 4.8.13, Basic Mode forces file parsing and injects the content into the system prompt, preventing cases where the model skips reading the file during multi-turn conversations.
In Workflows, find the File Input option in the system configuration panel and click the Enable/Disable toggle to open the configuration dialog.
There are many ways to use files in Workflows. The simplest approach, shown below, connects document parsing via tool calling — achieving the same result as Basic Mode.
You can also use Workflows to extract or analyze document content, then pass the results to HTTP requests or other modules to build a document processing pipeline.
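For instance, an HTTP Request node could forward the parsed text to your own service. The sketch below is a hypothetical receiving endpoint; the route and payload shape are assumptions, not a FastGPT API:

```ts
import { createServer } from 'node:http';

// Hypothetical service receiving parsed document text from an HTTP Request node.
createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/process-document') {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      // Assumed payload shape: { content: string }
      const { content } = JSON.parse(body) as { content: string };
      console.log(`Received ${content.length} characters of parsed text`);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ ok: true }));
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(3000);
```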
Unlike image recognition, LLMs currently cannot parse documents directly. All document "understanding" is achieved by converting documents to text and injecting it into the prompt. The following FAQs explain how this works — understanding the mechanics helps you use document parsing more effectively in Workflows.
In FastGPT's chat history, messages with `role=user` store their value in the following structure:

```ts
type UserChatItemValueItemType = {
  type: 'text' | 'file';
  text?: {
    content: string;
  };
  file?: {
    type: 'img' | 'doc';
    name?: string;
    url: string;
  };
};
```
Uploaded images and documents are stored as URLs — the parsed document content is not stored.
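For illustration, a single user turn combining a question with an uploaded document and an image might be stored like this (the URLs and filenames are made up):

```ts
// A hypothetical user turn: a question plus an uploaded document and an image.
// Only the URLs are stored; the parsed text of report.pdf is not.
const userValue: UserChatItemValueItemType[] = [
  { type: 'text', text: { content: 'Summarize the attached report.' } },
  {
    type: 'file',
    file: { type: 'doc', name: 'report.pdf', url: 'https://example.com/files/report.pdf' }
  },
  { type: 'file', file: { type: 'img', url: 'https://example.com/files/chart.png' } }
];
```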
The document parsing node does not process images. Image URLs are filtered out. For image recognition, use an LLM that supports vision.
The document parsing node accepts an `Array<string>` input (file URLs) and outputs a `string` (the parsed content).
Multiple files are concatenated as filename plus content, with entries separated by `\n******\n`, using the following template:

```
File: ${filename}
<Content>
${content}
</Content>
```
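As a rough sketch of the node's behavior (not FastGPT's actual implementation; `parseDoc` and the image-extension check are stand-ins), the filtering and concatenation could look like this:

```ts
type ParsedFile = { filename: string; content: string };

// Hypothetical stand-in for FastGPT's internal document parser (pdf, docx, ...).
declare function parseDoc(url: string): Promise<ParsedFile>;

// Assumed heuristic: treat common image extensions as images to filter out.
const isImageUrl = (url: string) => /\.(png|jpe?g|gif|webp)(\?.*)?$/i.test(url);

// Array<string> in, string out: drop image URLs, parse the documents,
// render each with the filename + <Content> template, join with \n******\n.
async function parseDocuments(urls: string[]): Promise<string> {
  const docs = await Promise.all(urls.filter((u) => !isImageUrl(u)).map(parseDoc));
  return docs
    .map(({ filename, content }) => `File: ${filename}\n<Content>\n${content}\n</Content>`)
    .join('\n******\n');
}
```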
AI nodes (AI Chat / Tool Calling) have a document URL input that lets you reference document addresses directly.
It accepts an `Array<string>` input. The URLs are parsed and injected into a system message using this prompt template:
```
Use the content in <FilesContent></FilesContent> as reference for this conversation:
<FilesContent>
{{quote}}
</FilesContent>
```
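Continuing the same sketch, the injection step amounts to substituting `{{quote}}` in that template with the concatenated result:

```ts
// Builds the system message the AI node prepends to the conversation,
// reusing the hypothetical parseDocuments() from the previous sketch.
async function buildFilesSystemMessage(urls: string[]): Promise<string> {
  const quote = await parseDocuments(urls);
  return [
    'Use the content in <FilesContent></FilesContent> as reference for this conversation:',
    '<FilesContent>',
    quote,
    '</FilesContent>'
  ].join('\n');
}
```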
This behavior differs in a few ways from version 4.8.9. We've maintained backward compatibility to avoid breaking existing workflows, but please update your workflows to follow the new rules as soon as possible; the compatibility code will be removed in a future version.