Pinecone hosts embedding models so you can generate vectors without managing your own
embedding infrastructure. Call `pc.inference.embed` and pass your text inputs directly:
```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The quick brown fox", "A second piece of text"],
    parameters={"input_type": "passage"},
)

for embedding in result:
    print(embedding.values[:5])  # first five values
```
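Each `DenseEmbedding` from `multilingual-e5-large` is a 1024-dimensional vector, so `embedding.values` holds 1024 floats per input.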
The `parameters` dict is model-specific. Common keys:

- `input_type` — `"query"` for search queries, `"passage"` for documents being indexed.
- `truncate` — `"END"` (default) or `"NONE"` to raise an error on overlong input.

Discover supported parameters for any model:
```python
info = pc.inference.model.get("multilingual-e5-large")
print(info.supported_parameters)
```
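For example, to fail fast on overlong input rather than silently truncating, set `truncate` to `"NONE"` (a minimal sketch; the input text is illustrative):

```python
# Raises an error if the input exceeds the model's token limit,
# instead of truncating at the end (the default behavior).
result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["A very long document that might exceed the token limit..."],
    parameters={"input_type": "passage", "truncate": "NONE"},
)
```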
`embed` returns an {class}`~pinecone.models.inference.embed.EmbeddingsList` containing:

- `.data` — list of {class}`~pinecone.models.inference.embed.DenseEmbedding` or
  {class}`~pinecone.models.inference.embed.SparseEmbedding` objects (one per input).
- `.model` — model name used.
- `.usage.total_tokens` — token count consumed.

Iterate to access individual embeddings:
```python
for emb in result:
    print(emb.values)  # DenseEmbedding: list of floats
```
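The list-level attributes are useful for logging and cost tracking, e.g.:

```python
print(result.model)               # model name, e.g. "multilingual-e5-large"
print(result.usage.total_tokens)  # tokens consumed by this call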
For sparse embeddings (e.g. `pinecone-sparse-english-v0`), access `sparse_indices`
and `sparse_values` instead:
```python
result = pc.inference.embed(
    model="pinecone-sparse-english-v0",
    inputs=["machine learning frameworks"],
    parameters={"input_type": "passage"},
)

sparse = result.data[0]
print(sparse.sparse_indices)
print(sparse.sparse_values)
```
Some models return hybrid (dense + sparse) embeddings as two separate items per input.
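To store these in a sparse index, the two fields map directly onto the upsert record's `sparse_values` shape. A minimal sketch, assuming a sparse index named `sparse-search` already exists:

```python
# Hypothetical index name; requires an index created with a sparse vector type.
index = pc.Index("sparse-search")
index.upsert(
    vectors=[
        {
            "id": "doc-0",
            "sparse_values": {
                "indices": sparse.sparse_indices,
                "values": sparse.sparse_values,
            },
        }
    ]
)
```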
Use the {class}`~pinecone.models.enums.EmbedModel` enum for tab-completion and typo
safety:
```python
from pinecone import Pinecone, EmbedModel

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model=EmbedModel.Multilingual_E5_Large,
    inputs=["search query"],
    parameters={"input_type": "query"},
)
```
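Each enum member's value is the corresponding model name string, so the enum and plain-string forms are interchangeable; a quick sanity check, assuming the conventional string-valued enum:

```python
assert EmbedModel.Multilingual_E5_Large.value == "multilingual-e5-large"
```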
Send multiple inputs in a single call to amortize network overhead. The API enforces a per-call token limit; for large batches, split inputs into chunks and iterate:
```python
texts = [...]  # potentially hundreds of documents

batch_size = 96
all_embeddings = []

for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    result = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=batch,
        parameters={"input_type": "passage"},
    )
    all_embeddings.extend(result.data)
```
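The batch size of 96 here is a reasonable default rather than a universal constant; the per-model limit is reported in the model metadata. A small sketch, assuming the model-info object exposes a `max_batch_size` field:

```python
info = pc.inference.model.get("multilingual-e5-large")
batch_size = info.max_batch_size  # assumed attribute; check your SDK's model-info fields
```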
Extract raw values and upsert into a standard (non-integrated) index:
```python
index = pc.Index("product-search")

vectors = [
    (f"doc-{i}", emb.values)  # (id, values) tuples
    for i, emb in enumerate(all_embeddings)
]
index.upsert(vectors=vectors)
```
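Search mirrors this flow: embed the query text with `input_type="query"` and pass the resulting vector to `query`. A minimal sketch (the query text and `top_k` are illustrative):

```python
query_result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["fast predatory animals"],
    parameters={"input_type": "query"},
)

matches = index.query(
    vector=query_result.data[0].values,
    top_k=5,
    include_metadata=True,
)
```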
For server-side embedding (no manual embed step), use an integrated index and
{meth}`~pinecone.Index.upsert_records` instead — see
{doc}`/how-to/integrated-records`.
To list all models available for embedding:

```python
models = pc.inference.model.list(type="embed")
print(models.names())
```