# Jina AI Provider
`patelvivekdev/jina-ai-provider` is a community provider that uses Jina AI to provide text and multimodal embedding support for the AI SDK.
## Setup

The Jina provider is available in the `jina-ai-provider` module. You can install it with:
<Tabs items={['pnpm', 'npm', 'yarn', 'bun']}>
  <Tab>
    <Snippet text="pnpm add jina-ai-provider" dark />
  </Tab>
  <Tab>
    <Snippet text="npm install jina-ai-provider" dark />
  </Tab>
  <Tab>
    <Snippet text="yarn add jina-ai-provider" dark />
  </Tab>
  <Tab>
    <Snippet text="bun add jina-ai-provider" dark />
  </Tab>
</Tabs>
## Provider Instance

You can import the default provider instance `jina` from `jina-ai-provider`:
```ts
import { jina } from 'jina-ai-provider';
```
If you need a customized setup, you can import `createJina` from `jina-ai-provider` and create a provider instance with your settings:
```ts
import { createJina } from 'jina-ai-provider';

const customJina = createJina({
  // custom settings
});
```
You can use the following optional settings to customize the Jina provider instance:
- **baseURL** _string_

  The base URL of the Jina API. Defaults to `https://api.jina.ai/v1`.

- **apiKey** _string_

  API key that is sent using the `Authorization` header. Defaults to the `JINA_API_KEY` environment variable.

- **headers** _Record&lt;string, string&gt;_

  Custom headers to include in the requests.

- **fetch** _(input: RequestInfo, init?: RequestInit) =&gt; Promise&lt;Response&gt;_

  Custom fetch implementation. Defaults to the global `fetch` function. You can use it as a middleware to intercept requests, or to provide a custom fetch implementation for e.g. testing.
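Putting these settings together, a customized provider instance might look like the following sketch. The header name and values shown are placeholders, not requirements of the provider:

```ts
import { createJina } from 'jina-ai-provider';

const customJina = createJina({
  // falls back to the JINA_API_KEY environment variable when omitted
  apiKey: process.env.JINA_API_KEY,
  baseURL: 'https://api.jina.ai/v1',
  headers: {
    // hypothetical custom header for request tracing
    'X-Request-Source': 'my-app',
  },
});
```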
## Embedding Models

You can create models that call the Jina text embeddings API using the `.embeddingModel()` factory method.
```ts
import { jina } from 'jina-ai-provider';

const embeddingModel = jina.embeddingModel('jina-embeddings-v3');
```
You can use Jina embedding models to generate embeddings with the `embed` or `embedMany` function:
```ts
import { jina } from 'jina-ai-provider';
import { embedMany } from 'ai';

const embeddingModel = jina.embeddingModel('jina-embeddings-v3');

export const generateEmbeddings = async (
  value: string,
): Promise<Array<{ embedding: number[]; content: string }>> => {
  const chunks = value.split('\n');
  const { embeddings } = await embedMany({
    model: embeddingModel,
    values: chunks,
    providerOptions: {
      jina: {
        inputType: 'retrieval.passage',
      },
    },
  });
  return embeddings.map((embedding, index) => ({
    content: chunks[index]!,
    embedding,
  }));
};
```
## Multimodal Embedding Models

You can create models that call the Jina multimodal (text + image) embeddings API using the `.multiModalEmbeddingModel()` factory method.
```ts
import { jina, type MultimodalEmbeddingInput } from 'jina-ai-provider';
import { embedMany } from 'ai';

const multimodalModel = jina.multiModalEmbeddingModel('jina-clip-v2');

export const generateMultimodalEmbeddings = async () => {
  const values: MultimodalEmbeddingInput[] = [
    { text: 'A beautiful sunset over the beach' },
    { image: 'https://i.ibb.co/r5w8hG8/beach2.jpg' },
  ];
  const { embeddings } = await embedMany<MultimodalEmbeddingInput>({
    model: multimodalModel,
    values,
  });
  return embeddings.map((embedding, index) => ({
    content: values[index]!,
    embedding,
  }));
};
```
## Provider Options

Pass Jina embedding options via `providerOptions.jina`. The following options are supported:
- **inputType** _'text-matching' | 'retrieval.query' | 'retrieval.passage' | 'separation' | 'classification'_

  Intended downstream application to help the model produce better embeddings. Defaults to `'retrieval.passage'`.

  - `'retrieval.query'`: input is a search query.
  - `'retrieval.passage'`: input is a document/passage.
  - `'text-matching'`: for semantic textual similarity tasks.
  - `'classification'`: for classification tasks.
  - `'separation'`: for clustering tasks.

- **outputDimension** _number_

  Number of dimensions for the output embeddings. See the model documentation for valid ranges:

  - `jina-embeddings-v3`: min 32, max 1024.
  - `jina-clip-v2`: min 64, max 1024.
  - `jina-clip-v1`: fixed 768.

- **embeddingType** _'float' | 'binary' | 'ubinary' | 'base64'_

  Data type for the returned embeddings.

- **normalized** _boolean_

  Whether to L2-normalize the embeddings. Defaults to `true`.

- **truncate** _boolean_

  Whether to truncate inputs beyond the model context limit instead of erroring. Defaults to `false`.

- **lateChunking** _boolean_

  Split long inputs into 1024-token chunks automatically. Only supported by text embedding models.
## Model Capabilities

| Model                | Context Length (tokens) | Embedding Dimension | Modalities    |
| -------------------- | ----------------------- | ------------------- | ------------- |
| `jina-embeddings-v3` | 8,192                   | 1024                | Text          |
| `jina-clip-v2`       | 8,192                   | 1024                | Text + Images |
| `jina-clip-v1`       | 8,192                   | 768                 | Text + Images |
Both plain strings and object inputs are accepted as values:

```ts
// plain strings
const strings = ['text1', 'text2'];

// text objects
const text = [{ text: 'Your text here' }];

// image objects: URLs or Base64 data URLs
const image = [{ image: 'https://example.com/image.jpg' }];

// mixed inputs
const mixed = [
  { text: 'object text' },
  { image: 'image-url' },
  { image: 'data:image/jpeg;base64,...' },
];
```