docs/guides/dataset/advanced/auto_metadata.md
Automatically extract metadata from uploaded files.
RAGFlow v0.23.0 introduces the Auto-metadata feature, which uses large language models to automatically extract and generate metadata for files—eliminating the need for manual entry. In a typical RAG pipeline, metadata serves two key purposes:
:::danger WARNING Enabling TOC extraction requires significant memory, computational resources, and tokens. :::
Click Auto metadata > Settings to go to the configuration page for automatic metadata generation rules.
The configuration page for rules on automatically generating metadata appears.
Enter a field name, such as Author, and add a description and examples in the Description section. This provides context to the large language model (LLM) for more accurate value extraction. If left blank, the LLM will extract values based only on the field name.
To restrict the LLM to generating metadata from a predefined list, enable the Restrict to defined values mode and manually add the allowed values. The LLM will then only generate results from this preset range.
Once configured, turn on the Auto-metadata switch on the Configuration page. All newly uploaded files will have these rules applied during parsing. For files that have already been processed, you must re-parse them to trigger metadata generation. You can then use the filter function to check the metadata generation status of your files.