Back to Pinecone Python Client

Integrated Records (Server-Side Embedding)

docs/how-to/integrated-records.md

9.0.04.4 KB
Original Source

Integrated Records (Server-Side Embedding)

Integrated indexes store text records and embed them server-side using a hosted model. You write text; Pinecone handles the embedding. No separate embed step required.

Create an integrated index

Use {class}~pinecone.IntegratedSpec and provide an {class}~pinecone.EmbedConfig that maps a document field to the embedding input:

python
from pinecone import Pinecone
from pinecone.models.indexes.specs import EmbedConfig, IntegratedSpec

pc = Pinecone(api_key="your-api-key")

pc.indexes.create(
    name="articles",
    spec=IntegratedSpec(
        cloud="aws",
        region="us-east-1",
        embed=EmbedConfig(
            model="multilingual-e5-large",
            field_map={"text": "body"},   # embed the "body" field of each record
        ),
    ),
)

field_map maps the model's input field name ("text" for most models) to the field in your records that holds the content to embed.

Get a handle to the index:

python
index = pc.index("articles")

Upsert records

Call {meth}~pinecone.Index.upsert_records with a list of record dicts. Each record must have an _id (or id) field. Include any fields you configured in field_map:

python
response = index.upsert_records(
    namespace="en",
    records=[
        {"_id": "article-1", "body": "Vector databases accelerate AI search."},
        {"_id": "article-2", "body": "RAG pipelines combine retrieval with generation."},
        {"_id": "article-3", "body": "Pinecone scales to billions of vectors."},
    ],
)
print(response.record_count)   # number of records submitted

Records are sent as newline-delimited JSON (NDJSON). Embeddings are generated asynchronously by Pinecone; allow a moment before searching.

Search records

Call {meth}~pinecone.Index.search with inputs containing the query text. Pinecone embeds the query server-side and returns the nearest records:

python
results = index.search(
    namespace="en",
    top_k=5,
    inputs={"text": "AI and machine learning"},
)

for hit in results.result.hits:
    print(hit.id, hit.score, hit.fields)

Response: SearchRecordsResponse

search returns a {class}~pinecone.models.vectors.search.SearchRecordsResponse:

  • .result.hits — list of {class}~pinecone.models.vectors.search.Hit objects, ordered by descending score.
  • .usage.read_units — read units consumed.
  • .usage.embed_total_tokens — tokens used for embedding the query.

Each {class}~pinecone.models.vectors.search.Hit exposes:

  • .id — the record identifier.
  • .score — similarity score (higher is more relevant).
  • .fields — dict of record fields returned in the result.

Use SearchInputs for IDE support

python
from pinecone.models.vectors.search import SearchInputs

results = index.search(
    namespace="en",
    top_k=5,
    inputs=SearchInputs(text="AI and machine learning"),
)

Rerank in a single search call

Pass a rerank config to retrieve and rerank in one request:

python
from pinecone.models.vectors.search import RerankConfig

results = index.search(
    namespace="en",
    top_k=10,
    inputs={"text": "best practices for vector search"},
    rerank=RerankConfig(
        model="bge-reranker-v2-m3",
        rank_fields=["body"],
        top_n=3,
    ),
)

for hit in results.result.hits:
    print(hit.id, hit.score)

top_k controls how many candidates are retrieved; top_n controls how many survive reranking. The response contains at most top_n hits.

You can also pass a plain dict for rerank:

python
results = index.search(
    namespace="en",
    top_k=10,
    inputs={"text": "best practices"},
    rerank={"model": "bge-reranker-v2-m3", "rank_fields": ["body"], "top_n": 3},
)

Filter by metadata

Pass a filter dict to restrict results to records matching metadata conditions:

python
results = index.search(
    namespace="en",
    top_k=5,
    inputs={"text": "quantum computing"},
    filter={"category": {"$eq": "science"}},
)

Select returned fields

By default the server returns all record fields. Use fields to restrict the response:

python
results = index.search(
    namespace="en",
    top_k=5,
    inputs={"text": "AI research"},
    fields=["body", "author"],
)

See also

  • {doc}/how-to/inference/embeddings — generate embeddings manually for non-integrated indexes.
  • {doc}/how-to/inference/reranking — rerank results from any source.