import Icon from "@site/src/components/icon";
<Icon name="Blocks" aria-hidden="true" /> Bundles contain custom components that support specific third-party integrations with Langflow.
Langflow integrates with OpenDsStar through a bundle of file processing components for ingesting, indexing, and retrieving content from large collections of files in agent workflows.
:::info OpenDsStar package (File Description Generator only)
The File Description Generator component requires the OpenDsStar package and Python 3.11 or later.
Install the dependency with:

```bash
uv pip install OpenDsStar
```

For more information, see Install custom dependencies.
:::
For an example of using the File Description Generator component, see the Structured Data Agent starter template.
The following sections describe the purpose and configuration options for each component in the File Processing bundle.
## File Content Retriever

The File Content Retriever component takes file outputs from a Read File component and exposes two tools that an agent can use to look up file content by path:

- `retrieve_content`: Returns the file content as text (Message).
- `retrieve_content_as_dataframe`: Returns the file content as a Table for tabular formats (CSV, Excel, Parquet, JSON, and TSV).

File maps are built on first use and then cached in memory. To cache maps on disk and preserve them across flow runs, set Persistent Directory.
| Name | Type | Description |
|---|---|---|
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| persistent_dir | String | Input parameter. Optional path to a directory for persisting file maps across runs. If empty, maps are kept in memory only. |
| file_path | String | Input parameter (Tool Mode). The full path of the file to retrieve, for example, `/path/to/file.csv`. Agents use this parameter to request a specific file's content. |
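The build-once, look-up-by-path behavior described above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the component's implementation: `FileMapSketch`, its method names, and the `file_map.json` file name are illustrative assumptions. To stay within the standard library, it parses only CSV and TSV and returns rows as lists of strings instead of a Table.

```python
import csv
import io
import json
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical: this stdlib-only sketch parses only delimited text. The real
# component also handles Excel, Parquet, and JSON.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


class FileMapSketch:
    """Hypothetical path -> content map, built once and then reused."""

    def __init__(self, persistent_dir: Optional[str] = None) -> None:
        self._map: Optional[Dict[str, str]] = None
        # Hypothetical on-disk location used when a persistent directory is set.
        self._store = Path(persistent_dir) / "file_map.json" if persistent_dir else None

    def build(self, files: Dict[str, str]) -> None:
        if self._map is not None:
            return  # Already built in this process: keep the in-memory map.
        if self._store is not None and self._store.exists():
            # A previous run persisted the map: load it instead of rebuilding.
            self._map = json.loads(self._store.read_text(encoding="utf-8"))
            return
        self._map = dict(files)
        if self._store is not None:
            self._store.parent.mkdir(parents=True, exist_ok=True)
            self._store.write_text(json.dumps(self._map), encoding="utf-8")

    def _lookup(self, file_path: str) -> Optional[str]:
        return (self._map or {}).get(file_path)

    def retrieve_content(self, file_path: str) -> str:
        """Text tool: return the file content, or a message the agent can read."""
        content = self._lookup(file_path)
        return content if content is not None else f"File not found: {file_path}"

    def retrieve_content_as_rows(self, file_path: str) -> List[List[str]]:
        """Tabular tool stand-in: parse CSV/TSV content into rows."""
        delimiter = DELIMITERS.get(Path(file_path).suffix.lower())
        content = self._lookup(file_path)
        if delimiter is None or content is None:
            raise ValueError(f"No tabular content for: {file_path}")
        return list(csv.reader(io.StringIO(content), delimiter=delimiter))
```

A later flow run can skip the build step because it loads the persisted map instead of rebuilding it. A real implementation would also need to detect when the underlying files change. This sketch does not attempt that.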
## File Description Generator

The File Description Generator component runs the OpenDsStar Docling-based ingestion pipeline to produce a natural-language description of each file.
For each file, the pipeline converts the document with Docling, shortens the Markdown output, and prompts the connected LLM to write a searchable description. Processing runs in a subprocess to avoid memory pressure when handling large files.
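The standard library's `subprocess.run` can illustrate how the subprocess isolation and the Timeout parameter fit together. This is a hedged sketch under stated assumptions. The inline `WORKER` script is a hypothetical placeholder that returns dummy descriptions instead of running Docling or an LLM. The JSON-over-stdin protocol is also an illustrative choice, not necessarily how OpenDsStar communicates with its subprocess.

```python
import json
import subprocess
import sys
from typing import Dict, List

# Hypothetical worker: the real pipeline would run Docling and the LLM here.
# This placeholder just returns a dummy description for each path.
WORKER = """
import json, sys
paths = json.load(sys.stdin)
json.dump({p: f"Placeholder description of {p}" for p in paths}, sys.stdout)
"""


def describe_in_subprocess(paths: List[str], timeout: int = 3600) -> Dict[str, str]:
    """Run the heavy step in a child process so its memory is released on exit."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=json.dumps(paths),  # send the file list on stdin
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        check=True,       # raises subprocess.CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)
```

The operating system reclaims all of the child process's memory when it exits, so a large document cannot leave the parent process bloated. The cost is serializing inputs and outputs across the process boundary.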
The component outputs a list of Data objects, each containing file_path and the generated description text. Connect this output to a vector store's Ingest Data input to make the files searchable by an agent.
Descriptions are cached in the Cache Directory to avoid regenerating them on subsequent runs with the same files.
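Cache keying can be sketched as a content hash that also covers the embedding model name, because the Embedding Model parameter is documented as part of the cache key. The rest of this sketch is assumed for illustration: the SHA-256 scheme, the one-JSON-file-per-entry layout, and the names `cache_key`, `cached_description`, and `demo_llm_calls`. It does not describe the actual OpenDsStar cache format.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_key(file_bytes: bytes, embedding_model: str) -> str:
    """Hypothetical key: SHA-256 over the model name and the file content."""
    digest = hashlib.sha256()
    digest.update(embedding_model.encode("utf-8"))
    digest.update(b"\0")  # separator between model name and file content
    digest.update(file_bytes)
    return digest.hexdigest()


def cached_description(
    file_bytes: bytes,
    embedding_model: str,
    cache_dir: str,
    generate: Callable[[bytes], str],
) -> str:
    """Return a cached description, or generate and store one on a miss."""
    entry = Path(cache_dir) / f"{cache_key(file_bytes, embedding_model)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["description"]
    description = generate(file_bytes)  # the expensive LLM step
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps({"description": description}), encoding="utf-8")
    return description


def demo_llm_calls() -> int:
    """Describe the same file twice and count how often the fake LLM runs."""
    calls: List[bytes] = []

    def fake_llm(data: bytes) -> str:
        calls.append(data)
        return f"A {len(data)}-byte CSV file"

    with tempfile.TemporaryDirectory() as cache_dir:
        cached_description(b"a,b\n1,2\n", "model-x", cache_dir, fake_llm)
        cached_description(b"a,b\n1,2\n", "model-x", cache_dir, fake_llm)
    return len(calls)
```

Because the key comes from file content rather than the path alone, editing a file yields a new key and a fresh description. Changing the embedding model also invalidates earlier entries.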
| Name | Type | Description |
|---|---|---|
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| llm | LanguageModel | Input parameter. The LLM used to generate file descriptions. |
| cache_dir | String | Input parameter. Directory for caching Docling analysis and LLM-generated descriptions. Default: `./opendsstar_cache`. |
| embedding_model | String | Input parameter. Embedding model name used for cache keying. Default: `ibm-granite/granite-embedding-english-r2`. |
| timeout | Integer | Input parameter. Maximum time in seconds allowed for the ingestion subprocess. Default: 3600. Increase this value for large file sets. |
| batch_size | Integer | Input parameter. Number of files to process per LLM batch. Default: 8. |
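The Batch Size parameter controls how many files are grouped into each LLM batch. The following sketch assumes a hypothetical `describe_batch` callable that takes a list of paths and returns one description per path. It emits records with the same `file_path` and `description` fields that the component's Data output is documented to contain. The helper names are illustrative, not part of the component.

```python
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def describe_all(
    paths: List[str],
    describe_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # mirrors the documented default
) -> List[Dict[str, str]]:
    """Describe files batch by batch and return one record per file."""
    records: List[Dict[str, str]] = []
    for batch in batched(paths, batch_size):
        descriptions = describe_batch(batch)  # one hypothetical LLM call per batch
        if len(descriptions) != len(batch):
            raise ValueError("describe_batch must return one description per path")
        for path, text in zip(batch, descriptions):
            records.append({"file_path": path, "description": text})
    return records
```

For example, 19 files with the default batch size produce three batches of 8, 8, and 3 files.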
## Merge Flows

The Merge Flows component accepts outputs from multiple upstream components and triggers all of them when it executes.
Use this component to synchronize parallel setup pipelines, such as running the File Description Generator ingestion flow and the File Content Retriever initialization together before starting an agent.
The component outputs a Message that confirms how many upstream flows completed.
| Name | Type | Description |
|---|---|---|
| inputs | Data, Table, Message, Tool, or JSON | Input parameter. Connect any number of upstream component outputs here. All connected components will run when this component executes. |
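The fan-in behavior can be sketched with `concurrent.futures`. This illustration rests on several assumptions. Each upstream output is modeled as a zero-argument callable. Thread-based concurrency is a choice made for the sketch, since the source does not say whether Langflow runs the upstream components concurrently. The exact wording of the confirmation message is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional


def merge_flows(
    upstream: List[Callable[[], Any]],
    max_workers: Optional[int] = None,
) -> str:
    """Run every upstream callable, wait for all of them, and report the count."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run) for run in upstream]
        # result() blocks until each job finishes and re-raises its exception.
        results = [future.result() for future in futures]
    return f"{len(results)} upstream flows completed."
```

Waiting on every future before returning means the downstream agent starts only after all setup pipelines have finished. A failure in any upstream job surfaces as an exception instead of a misleading count.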