Back to Pinecone Python Client

Generating Embeddings

docs/how-to/inference/embeddings.md

9.0.03.2 KB
Original Source

Generating Embeddings

Pinecone hosts embedding models so you can generate vectors without managing your own embedding infrastructure. Call pc.inference.embed and pass your text inputs directly.

Basic usage

python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The quick brown fox", "A second piece of text"],
    parameters={"input_type": "passage"},
)

for embedding in result:
    print(embedding.values[:5])   # first five values

The parameters dict is model-specific. Common keys:

  • input_type"query" for search queries, "passage" for documents being indexed.
  • truncate"END" (default) or "NONE" to raise an error on overlong input.

Discover supported parameters for any model:

python
info = pc.inference.model.get("multilingual-e5-large")
print(info.supported_parameters)

Response: EmbeddingsList

embed returns an {class}~pinecone.models.inference.embed.EmbeddingsList containing:

  • .data — list of {class}~pinecone.models.inference.embed.DenseEmbedding or {class}~pinecone.models.inference.embed.SparseEmbedding objects (one per input).
  • .model — model name used.
  • .usage.total_tokens — token count consumed.

Iterate to access individual embeddings:

python
for emb in result:
    print(emb.values)       # DenseEmbedding: list of floats

For sparse embeddings (e.g. pinecone-sparse-english-v0), access sparse_indices and sparse_values instead:

python
result = pc.inference.embed(
    model="pinecone-sparse-english-v0",
    inputs=["machine learning frameworks"],
)
sparse = result.data[0]
print(sparse.sparse_indices)
print(sparse.sparse_values)

Some models return hybrid (dense + sparse) embeddings as two separate items per input.

Using the EmbedModel enum

Use the {class}~pinecone.models.enums.EmbedModel enum for tab-completion and typo safety:

python
from pinecone import Pinecone
from pinecone.client.inference import Inference

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model=Inference.EmbedModel.Multilingual_E5_Large,
    inputs=["search query"],
    parameters={"input_type": "query"},
)

Batch size

Send multiple inputs in a single call to amortize network overhead. The API enforces a per-call token limit; for large batches, split inputs into chunks and iterate:

python
texts = [...]   # potentially hundreds of documents

batch_size = 96
all_embeddings = []
for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    result = pc.inference.embed(model="multilingual-e5-large", inputs=batch)
    all_embeddings.extend(result.data)

Storing embeddings in an index

Extract raw values and upsert into a standard (non-integrated) index:

python
index = pc.index("product-search")

vectors = [
    (f"doc-{i}", emb.values)
    for i, emb in enumerate(result.data)
]
index.upsert(vectors=vectors)

For server-side embedding (no manual embed step), use an integrated index and {meth}~pinecone.Index.upsert_records instead — see {doc}/how-to/integrated-records.

List available models

python
models = pc.inference.model.list(type="embed")
print(models.names())