
Manage vector data


import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Icon from "@site/src/components/icon";
import PartialGlobalModelProviders from '@site/docs/_partial-global-model-providers.mdx';

Vector data is critical to AI applications. Langflow provides several components to help you store and retrieve vector data in your flows, including embedding models, vector stores, and knowledge bases.

Embedding models

Embedding model components generate text embeddings using a specified Large Language Model (LLM).

There are two common use cases for these components:

  • Store vectors: Generate embeddings for content written to a vector database.
  • Search vectors: Generate an embedding from a query to run a similarity search.

In both cases the embedding model component is attached to a vector store component. For more information, examples, and available options, see Embedding model components.

Alternatively, you can use knowledge bases, which include built-in support for several embedding models.

Vector stores

Vector store components read and write to vector databases. Typically, these components connect to remote databases, but some vector store components support local databases.

import PartialVectorRagBlurb from '@site/docs/_partial-vector-rag-blurb.mdx';

<PartialVectorRagBlurb />

<details>
<summary>Example: Vector search flow</summary>

import PartialVectorRagFlow from '@site/docs/_partial-vector-rag-flow.mdx';

<PartialVectorRagFlow />

</details>

Knowledge bases

import PartialKbSummary from '@site/docs/_partial-kb-summary.mdx';

<PartialKbSummary />

Create a knowledge base

In this example, you'll create a knowledge base of chunked customer orders. To follow along with this example, download customer-orders.csv to your local machine, or adapt the steps for your own structured data.

  1. On the Projects page, click <Icon name="Library" aria-hidden="true"/>Knowledge below the list of projects to view and manage your knowledge bases.

  2. To create a new knowledge base, click <Icon name="Plus" aria-hidden="true"/>Add Knowledge.

  3. In the Create Knowledge Base pane, enter a name for your knowledge base, select an embedding model, and select a DB Provider.

     <PartialGlobalModelProviders />

     The DB Provider determines where embeddings are stored. It defaults to the provider configured in Settings → DB Providers. Existing knowledge bases keep their original backend; changing the global DB Provider only affects new knowledge bases.

  4. To configure sources for your knowledge base, click Configure Sources. Optionally, to create an empty knowledge base, click Create.

  5. In the Configure Sources pane, configure the sources for your knowledge base and how the data will be chunked for vector search retrieval. For this example, click <Icon name="Upload" aria-hidden="true"/>Add Sources, and then select the downloaded customer-orders.csv file from your local machine. The default settings for Chunk Size, Chunk Overlap, and Separator are fine. To continue, click Next Step.

  6. The Review & Build pane allows you to preview your first chunk before you commit to spending tokens to embed all of the data into the knowledge base. If the chunk isn't what you want to embed, click Back to configure your chunking strategy. To embed this data, click Create.

  7. Your data is embedded into the new knowledge base. When the knowledge base is ready to use, its Status changes to Ready.

To use the new knowledge base in a flow, see Use the Knowledge Base component in a flow.
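An optional way to confirm that your new knowledge base was written to disk (assuming the default ChromaDB provider described under Configure vector database providers) is to list the storage directory from a terminal. The fallback path below is a common default and an assumption; substitute the location for your operating system and install method:

```bash
# Assumption: default ChromaDB provider. The fallback path is a common
# default; LANGFLOW_KNOWLEDGE_BASES_DIR overrides it if set.
KB_DIR="${LANGFLOW_KNOWLEDGE_BASES_DIR:-$HOME/.langflow/knowledge_bases}"
echo "Knowledge base storage: $KB_DIR"
ls "$KB_DIR" 2>/dev/null || echo "(directory not created yet)"
```

If the knowledge base was created with the default provider, you should see an entry for it under this directory.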

Manage knowledge bases

On the Projects page, click <Icon name="Library" aria-hidden="true"/>Knowledge below the list of projects to view and manage your knowledge bases.

For each knowledge base, you can see the following information:

  • Name
  • Embedding model
  • Size on disk
  • Number of words, characters, and chunks
  • The average length and size of chunks
  • The knowledge base's status

The icon next to the knowledge base name indicates the source file type:

  • <Icon name="File" aria-hidden="true"/> Red — PDF
  • <Icon name="FileChartColumn" aria-hidden="true"/> Green — CSV
  • <Icon name="FileType" aria-hidden="true"/> Purple — plain text (.txt)
  • <Icon name="FileText" aria-hidden="true"/> Fuchsia — Markdown (.md, .mdx)
  • <Icon name="FileCode" aria-hidden="true"/> Yellow — HTML
  • <Icon name="FileCode" aria-hidden="true"/> Blue — code files (.py, .js, .ts)
  • <Icon name="FileJson" aria-hidden="true"/> Indigo — JSON
  • <Icon name="Layers" aria-hidden="true"/> — multiple source types

Chunking behavior is determined by the embedding model, and the embedding model is set when you create the knowledge base. If you need to change the embedding model, you must delete and recreate the knowledge base.

To update a knowledge base, click <Icon name="EllipsisVertical" aria-hidden="true"/> More, and then select <Icon name="RefreshCW" aria-hidden="true"/> Update Knowledge Base.

To view a knowledge base's chunks, click <Icon name="EllipsisVertical" aria-hidden="true"/> More, and then select <Icon name="Layers" aria-hidden="true"/> View Chunks.

To delete a knowledge base, click <Icon name="EllipsisVertical" aria-hidden="true"/> More, and then click <Icon name="Trash2" aria-hidden="true"/> Delete. If any flows use the deleted knowledge base, you must update them to use a different knowledge base.

For more information on using knowledge bases in a flow, see the Knowledge Base component documentation.

Configure vector database providers

DB Providers are the vector databases where your knowledge bases store and search embeddings. To configure these providers, go to Settings → DB Providers. The selected provider applies to all new knowledge bases you create. Existing knowledge bases continue to use the provider that was active when they were created.

Chroma (default)

By default, knowledge bases use ChromaDB as a local vector store, with no additional setup required. Knowledge bases are stored locally, alongside your Langflow installation. The default storage location depends on your operating system and installation method:

  • Langflow Desktop:
    • macOS: /Users/<username>/.langflow/knowledge_bases
    • Windows: C:\Users\<name>\AppData\Roaming\com.LangflowDesktop\knowledge_bases
  • Langflow OSS:
    • macOS/Windows/Linux/WSL with uv pip install: <path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases (Python version can vary. Knowledge bases aren't shared between virtual environments.)
    • macOS/Windows/Linux/WSL with git clone: <path_to_clone>/src/backend/base/langflow/knowledge_bases

If you set the LANGFLOW_CONFIG_DIR environment variable, the knowledge_bases subdirectory is created relative to that path.

To change the default knowledge_bases directory path, set the LANGFLOW_KNOWLEDGE_BASES_DIR environment variable:

```bash
export LANGFLOW_KNOWLEDGE_BASES_DIR="/path/to/parent/directory"
```
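For example, to keep knowledge bases in a dedicated data directory (the path below is a hypothetical choice; any directory your user can write to works), create the directory and export the variable in the shell that starts Langflow:

```bash
# Hypothetical data directory; Langflow stores knowledge bases beneath it.
export LANGFLOW_KNOWLEDGE_BASES_DIR="$HOME/langflow-data"
mkdir -p "$LANGFLOW_KNOWLEDGE_BASES_DIR"

# Start (or restart) Langflow from this shell so it picks up the variable,
# for example: uv run langflow run
```

Changing this path doesn't migrate existing knowledge bases; copy the old directory's contents to the new location if you want to keep them.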

OpenSearch

To use OpenSearch as a database provider, you need a running OpenSearch cluster that is accessible to your Langflow instance. This example uses an OpenSearch container running locally, but you can also use a remote OpenSearch instance.

  1. For this example, start a local OpenSearch container with security disabled. This allows you to connect without a username, password, or TLS. This configuration is for example purposes only; it isn't recommended in production environments.

    ```bash
    podman run -d \
      --name opensearch \
      -p 9200:9200 \
      -p 9600:9600 \
      -e "discovery.type=single-node" \
      -e "plugins.security.disabled=true" \
      -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=YOUR_OPENSEARCH_PASSWORD" \
      opensearchproject/opensearch:latest
    ```

    :::note
    OpenSearch 3.x requires OPENSEARCH_INITIAL_ADMIN_PASSWORD to be set even when security is disabled.

    If the password fails validation, container startup exits immediately with Password failed validation.

    The password must adhere to the [OpenSearch password complexity requirements](https://docs.opensearch.org/latest/security/configuration/demo-configuration/#setting-up-a-custom-admin-password).
    :::

  2. Verify the cluster is reachable:

    ```bash
    curl -s http://localhost:9200
    ```

    A successful response indicates that the container has started and can receive requests:

    ```json
    {
      "name" : "your-node-name",
      "cluster_name" : "docker-cluster",
      "version" : {
        "distribution" : "opensearch",
        "number" : "3.6.0"
      },
      "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
    ```

    If you get no response or a connection error, the container might still be starting. Wait a few seconds and try again.

  3. To connect the OpenSearch database to Langflow as a knowledge base, click Settings, and then click DB Providers.

  4. Select OpenSearch.

  5. Enter the following values for the local OpenSearch container:

    • Cluster URL: Enter http://localhost:9200.
    • Username: Leave blank if security is disabled. Otherwise, enter your basic auth username.
    • Password: Leave blank if security is disabled. Otherwise, enter your basic auth password.
    • Default Index name: Enter langflow_knowledge. This is the OpenSearch index that Langflow reads from and writes to. The index is created during ingestion, so it doesn't exist until you build your first knowledge base.
    • Vector field: Enter vector_field. The document field for storing the embedding vector.
    • Text field: Enter text. The document field for storing the chunk text.
    • Use TLS (HTTPS): Turn off. Enable if your cluster uses HTTPS.
    • Verify TLS certificate: Turn off. Enable if your cluster uses CA-signed certificates.
  6. Click Save and Use OpenSearch.

    Optionally, click Test Connection to verify that Langflow can reach your OpenSearch cluster before saving.

    The OpenSearch database is now connected to Langflow as a knowledge base, so you can create a knowledge base that stores its embeddings in OpenSearch.

  7. Click <Icon name="Library" aria-hidden="true"/> Knowledge, and then click <Icon name="Plus" aria-hidden="true"/> Add Knowledge.

  8. Enter a name for this knowledge base. The name can be anything, and doesn't need to match the OpenSearch index name. The name becomes the internal label used to scope searches to this knowledge base within the shared OpenSearch index.

  9. Select an embedding model. When you create a knowledge base in Langflow, you can choose one of your configured embedding model providers. Once you create a knowledge base, you cannot change its provider unless you recreate the knowledge base. For more information, see Embedding Model.

  10. Optional: Add Custom Metadata Fields to tag every chunk with additional context. For example, if you're ingesting files from multiple teams, add a field team with a value of support. When the Knowledge Base component searches, you can then filter results to only return chunks where team equals support to keep results scoped to the support team's content.

  11. Click Next Step.

  12. Add your source files and configure chunking settings, then click Next Step.

  13. In the Review & Build pane, preview the first chunk of your data and confirm the chunk size is appropriate for your use case. A typical chunk size is 512–1000 characters. Smaller chunks support more granular retrieval but they can lose context across chunks.

  14. Click Create.

The knowledge base is now available to use in a flow with the Knowledge Ingestion and Knowledge Base components.
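As a sanity check, you can query OpenSearch directly to confirm that ingestion wrote chunks to the index. This assumes the local, security-disabled cluster from step 1 and the Default Index name langflow_knowledge from step 5; the index only exists after your first knowledge base is built:

```bash
# Count the documents (chunks) in the index Langflow writes to. The fallback
# message covers the case where the container isn't running.
curl -s "http://localhost:9200/langflow_knowledge/_count" \
  || echo "cluster not reachable on localhost:9200"

# Preview one stored chunk, including the text and vector_field fields
# configured in the provider settings.
curl -s "http://localhost:9200/langflow_knowledge/_search?size=1&pretty" \
  || echo "cluster not reachable on localhost:9200"
```

A count of zero, or a missing-index error, means ingestion hasn't run against this index yet.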

See also