NextPlaid

import Icon from "@site/src/components/icon"; import PartialParams from '@site/docs/_partial-hidden-params.mdx'; import PartialVectorSearchResults from '@site/docs/_partial-vector-search-results.mdx'; import PartialVectorStoreInstance from '@site/docs/_partial-vector-store-instance.mdx';

<Icon name="Blocks" aria-hidden="true" /> Bundles contain custom components that support specific third-party integrations with Langflow.

The NextPlaid bundle provides ColBERT-style multi-vector retrieval for Langflow through a running NextPlaid server. NextPlaid stores each document as a matrix of token embeddings and scores queries against it with late interaction (MaxSim), which generally improves retrieval quality on semantic search tasks compared with single-vector embeddings. For more information on multi-vector embeddings, see the langchain-plaid package that this bundle is built on.

Install the NextPlaid bundle

The bundle includes two components:

NextPlaid: vector store component backed by a running NextPlaid server.
vLLM Multivector Embeddings: generates the multi-vector token embeddings required by NextPlaid.

The NextPlaid components ship in the `lfx-nextplaid` extension bundle, which is installed automatically as part of `uv pip install langflow`.

If you need to install it separately, run:

bash

uv pip install lfx-nextplaid
uv run langflow run

To verify the bundle is loaded in your environment:

bash

lfx extension list

NextPlaid

The NextPlaid component reads from and writes to a NextPlaid multi-vector search server.

NextPlaid is a Rust-based server that implements ColBERT-style late interaction retrieval with MaxSim scoring. Each document is stored as a matrix of token embeddings rather than a single dense vector, giving better retrieval quality on semantic search tasks. The component supports both text retrieval with ColBERT models and image retrieval with ColPali models.
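
To make the scoring rule concrete, here is a minimal pure-Python sketch of MaxSim as it is usually defined for ColBERT-style late interaction: for each query token, take the highest similarity to any document token, then sum those maxima. The function names and the toy 2-dimensional vectors are illustrative assumptions, not part of NextPlaid, and the server's actual implementation works on compressed embeddings rather than raw lists.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""
    if not doc_tokens:
        raise ValueError("document must contain at least one token embedding")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings (hypothetical values): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one closely matching token per query token
doc_b = [[0.5, 0.5]]              # a single token that partially matches both

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a > score_b)  # doc_a ranks above doc_b
```

Because each query token is matched independently, a document can score well when different parts of it answer different parts of the query, which a single pooled vector cannot express.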

<details> <summary>About vector store instances</summary> <PartialVectorStoreInstance /> </details> <PartialVectorSearchResults />

Use the NextPlaid component in a flow

Connect a vLLM Multivector Embeddings component to the Embedding (Multivector) input. Standard single-vector embedding components are not compatible with NextPlaid.

The NextPlaid component can be used for both writes and reads:

When writing, it ingests documents from an attached data source, computes multi-vector embeddings with the connected vLLM Multivector Embeddings component, and then loads them into the NextPlaid index. To trigger writes, click <Icon name="Play" aria-hidden="true"/> Run component on the NextPlaid component.
When reading, the NextPlaid component uses chat input to perform a MaxSim similarity search against the index and returns the top results.

NextPlaid parameters

Name	Type	Description
Server URL (`url`)	String	Input parameter. Base URL of the running NextPlaid server. Default: `http://localhost:8080`.
Index Name (`index_name`)	String	Input parameter. Name of the index to create or connect to. Default: `langflow`.
Ingest Data (`ingest_data`)	Data	Input parameter. Documents or images to write to the vector store. Only relevant for writes.
Search Query (`search_query`)	String	Input parameter. The query string to use for similarity search. Only relevant for reads.
Embedding (Multivector) (`embedding`)	Embeddings	Input parameter. Connect a vLLM Multivector Embeddings component to generate token-level embeddings. Required for both reads and writes.
Index Batch Size (`index_batch_size`)	Integer	Input parameter. Number of documents per indexing request. PLAID builds its initial cluster centroids from the first batch, so larger batches produce better retrieval quality. Default: `500`.
Number of Results (`number_of_results`)	Integer	Input parameter. Number of results to return from similarity search. Default: `4`.
Quantization Bits (`nbits`)	Dropdown	Input parameter. Bit-width for PLAID quantization. `4` gives better quality; `2` uses less memory. Options: `2`, `4`. Default: `4`.
Create Index If Not Exists (`create_index_if_not_exists`)	Boolean	Input parameter. If `true`, creates the index on the NextPlaid server if it does not already exist. Default: `true`.
Write Timeout (`write_timeout`)	Float	Input parameter. Seconds to wait for each indexing batch to finish. Set to `0` for async indexing (search may return empty results on the first run). Recommended: `30` or higher when ingesting and searching in the same flow run. Default: `30.0`.
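
As an illustration of how `index_batch_size` shapes the indexing requests, the following sketch splits a document list into batches of the default size. The `batched` helper and the sample document IDs are hypothetical and are not taken from the component's source; the point is only that the first batch, from which the initial centroids are built, holds up to `index_batch_size` documents.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 1,200 placeholder documents with the default batch size of 500.
docs = [f"doc-{i}" for i in range(1200)]
sizes = [len(batch) for batch in batched(docs, batch_size=500)]
print(sizes)  # [500, 500, 200]
```

With a small batch size, the first request would contain only a handful of documents, which is why the parameter description recommends larger batches.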

vLLM Multivector Embeddings

The vLLM Multivector Embeddings component generates multi-vector token embeddings by calling vLLM's `/pooling` endpoint with `task: token_embed`.

The output is a multi-vector Embeddings object that returns a matrix of token embeddings per document, which is required by the NextPlaid vector store. It is not compatible with standard single-vector stores.
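
For orientation, this sketch builds (but does not send) the kind of HTTP request described above, using only the Python standard library. The payload field names (`model`, `input`, `task`) and the bearer-token header are assumptions based on the description here and on vLLM's OpenAI-style conventions; the exact schema can differ between vLLM versions, and the helper name `build_pooling_request` is hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def build_pooling_request(
    texts: List[str],
    model: str = "answerdotai/answerai-colbert-small-v1",
    api_base: str = "http://localhost:8000",  # no /v1 suffix
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for the /pooling endpoint (assumed payload shape)."""
    payload = {"model": model, "input": texts, "task": "token_embed"}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the server accepts a standard bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/pooling",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_pooling_request(["What is late interaction?"])
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` with a timeout would send it to a running server; the response is expected to contain one matrix of token embeddings per input text.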

For more information about using embedding model components in flows, see Embedding model components.

Your vLLM server must be started with a ColBERT- or ColPali-compatible model and the pooling runner enabled:

bash

vllm serve <model> --runner pooling --pooler-config '{"task": "token_embed"}'

Compatible models include:

Text (ColBERT): answerdotai/answerai-colbert-small-v1
Text + Images (ColPali): ModernVBERT/colmodernvbert

For more information on the vLLM server, see the vLLM documentation.

vLLM Multivector Embeddings parameters

Name	Type	Description
Model Name (`model_name`)	String	Input parameter. The multi-vector model name served by vLLM. Default: `answerdotai/answerai-colbert-small-v1`.
vLLM API Base (`api_base`)	String	Input parameter. Base URL of the vLLM server (without the `/v1` suffix). Default: `http://localhost:8000`.
API Key (`api_key`)	SecretString	Input parameter. API key for the vLLM server. Leave empty for local servers. Optional.
Request Timeout (`request_timeout`)	Float	Input parameter. Timeout in seconds for each request to the vLLM API. Default: `60.0`. Advanced.
Max Retries (`max_retries`)	Integer	Input parameter. Number of times to retry a failed request before raising an error. Default: `3`. Advanced.
