Running AI Models - Supabase

Edge Functions have a built-in API for running AI models. You can use this API to generate embeddings, build conversational workflows, and do other AI related tasks in your Edge Functions.

This allows you to:

Generate text embeddings without external dependencies
Run Large Language Models via Ollama or Llamafile
Build conversational AI workflows

Setup

There are no external dependencies or packages to install to enable the API.

Create a new inference session:

const model = new Supabase.ai.Session('model-name')

To get type hints and checks for the API, import types from functions-js:

import 'jsr:@supabase/functions-js/edge-runtime.d.ts'

</Admonition>

Running a model inference

Once the session is instantiated, you can call it with inputs to perform inferences:

// For embeddings (gte-small model)
const embeddings = await model.run('Hello world', {
  mean_pool: true,
  normalize: true,
})

// For text generation (non-streaming)
const response = await model.run('Write a haiku about coding', {
  stream: false,
  timeout: 30,
})

// For streaming responses
const stream = await model.run('Tell me a story', {
  stream: true,
  mode: 'ollama',
})

Generate text embeddings

Generate text embeddings using the built-in gte-small model:

gte-small model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens. While you can provide inputs longer than 512 tokens, truncation may affect the accuracy.

</Admonition>

const model = new Supabase.ai.Session('gte-small')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const input = params.get('input')
  const output = await model.run(input, { mean_pool: true, normalize: true })
  return new Response(JSON.stringify(output), {
    headers: {
      'Content-Type': 'application/json',
      Connection: 'keep-alive',
    },
  })
})

Using Large Language Models (LLM)

Inference via larger models is supported via Ollama and Mozilla Llamafile. In the first iteration, you can use it with a self-managed Ollama or Llamafile server.

We are progressively rolling out support for the hosted solution. To sign up for early access, fill out this form.

</Admonition> <video width="99%" muted playsInline controls={true}> <source src="https://xguihxuzqibwxjnimxev.supabase.co/storage/v1/object/public/videos/docs/guides/edge-functions-inference-2.mp4" type="video/mp4" /> </video>

Running locally

<Tabs scrollable size="large" type="underlined" defaultActiveId="ollama" queryGroup="platform"

<TabPanel id="ollama" label="Ollama"> <StepHikeCompact> <StepHikeCompact.Step step={1} fullWidth> <StepHikeCompact.Details title="Install Ollama" fullWidth> [Install Ollama](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pull the Mistral model

  ```bash
  ollama pull mistral
  ```
</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={2} fullWidth> <StepHikeCompact.Details title="Run the Ollama server" fullWidth> bash ollama serve </StepHikeCompact.Details> </StepHikeCompact.Step> <StepHikeCompact.Step step={3} fullWidth> <StepHikeCompact.Details title="Set the function secret" fullWidth> Set a function secret called AI_INFERENCE_API_HOST to point to the Ollama server

  ```bash
  echo "AI_INFERENCE_API_HOST=http://host.docker.internal:11434" >> supabase/functions/.env
  ```
</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={4} fullWidth> <StepHikeCompact.Details title="Create a new function" fullWidth>

  ```bash
  supabase functions new ollama-test
  ```

  ```ts supabase/functions/ollama-test/index.ts
  import 'jsr:@supabase/functions-js/edge-runtime.d.ts'
  const session = new Supabase.ai.Session('mistral')

  Deno.serve(async (req: Request) => {
    const params = new URL(req.url).searchParams
    const prompt = params.get('prompt') ?? ''

    // Get the output as a stream
    const output = await session.run(prompt, { stream: true })

    const headers = new Headers({
      'Content-Type': 'text/event-stream',
      Connection: 'keep-alive',
    })

    // Create a stream
    const stream = new ReadableStream({
      async start(controller) {
        const encoder = new TextEncoder()

        try {
          for await (const chunk of output) {
            controller.enqueue(encoder.encode(chunk.response ?? ''))
          }
        } catch (err) {
          console.error('Stream error:', err)
        } finally {
          controller.close()
        }
      },
    })

    // Return the stream to the user
    return new Response(stream, {
      headers,
    })
  })
  ```

</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={5} fullWidth> <StepHikeCompact.Details title="Serve the function" fullWidth> bash supabase functions serve --env-file supabase/functions/.env </StepHikeCompact.Details> </StepHikeCompact.Step> <StepHikeCompact.Step step={6} fullWidth> <StepHikeCompact.Details title="Execute the function" fullWidth> bash curl --get "http://localhost:54321/functions/v1/ollama-test" \ --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \ -H "Authorization: $ANON_KEY" </StepHikeCompact.Details> </StepHikeCompact.Step> </StepHikeCompact>

</TabPanel> <TabPanel id="llamafile" label="Mozilla Llamafile">

Follow the Llamafile Quickstart to download an run a Llamafile locally on your machine.

Since Llamafile provides an OpenAI API compatible server, you can either use it with @supabase/functions-js or with the official OpenAI Deno SDK.

<Tabs scrollable size="large" type="underlined" defaultActiveId="supabase-functions-js" queryGroup="sdk"

<TabPanel id="supabase-functions-js" label="Supabase Functions JS"> <StepHikeCompact> <StepHikeCompact.Step step={1} fullWidth> <StepHikeCompact.Details title="Set function secret" fullWidth> Set a function secret called `AI_INFERENCE_API_HOST` to point to the Llamafile server

  ```bash
  echo "AI_INFERENCE_API_HOST=http://host.docker.internal:8080" >> supabase/functions/.env
  ```
</StepHikeCompact.Details>

</StepHikeCompact.Step>

<StepHikeCompact.Step step={2} fullWidth> <StepHikeCompact.Details title="Create a new function" fullWidth> Create a new function with the following code

  ```bash
  supabase functions new llamafile-test
  ```
</StepHikeCompact.Details>

</StepHikeCompact.Step>

<StepHikeCompact.Step step={3} fullWidth> <StepHikeCompact.Details title="Add the function code" fullWidth> <Admonition type="note">

    Note that the model parameter doesn't have any effect here. The model depends on which Llamafile is currently running.

  </Admonition>

  ```ts supabase/functions/llamafile-test/index.ts
  import 'jsr:@supabase/functions-js/edge-runtime.d.ts'
  const session = new Supabase.ai.Session('LLaMA_CPP')

  Deno.serve(async (req: Request) => {
    const params = new URL(req.url).searchParams
    const prompt = params.get('prompt') ?? ''

    // Get the output as a stream
    const output = await session.run(
      {
        messages: [
          {
            role: 'system',
            content:
              'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
          },
          {
            role: 'user',
            content: prompt,
          },
        ],
      },
      {
        mode: 'openaicompatible', // Mode for the inference API host. (default: 'ollama')
        stream: false,
      }
    )

    console.log('done')
    return Response.json(output)
  })
  ```
</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={4} fullWidth> <StepHikeCompact.Details title="Serve the function" fullWidth> bash supabase functions serve --env-file supabase/functions/.env </StepHikeCompact.Details> </StepHikeCompact.Step> <StepHikeCompact.Step step={5} fullWidth> <StepHikeCompact.Details title="Execute the function" fullWidth> bash curl --get "http://localhost:54321/functions/v1/llamafile-test" \ --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \ -H "Authorization: $ANON_KEY" </StepHikeCompact.Details> </StepHikeCompact.Step> </StepHikeCompact>

</TabPanel> <TabPanel id="openai" label="OpenAI Deno SDK"> <StepHikeCompact> <StepHikeCompact.Step step={1} fullWidth> <StepHikeCompact.Details title="Set function secret" fullWidth> Set the following function secrets to point the OpenAI SDK to the Llamafile server

  ```bash
  echo "OPENAI_BASE_URL=http://host.docker.internal:8080/v1" >> supabase/functions/.env
  echo "OPENAI_API_KEY=sk-XXXXXXXX" >> supabase/functions/.env
  ```

</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={2} fullWidth> <StepHikeCompact.Details title="Create a new function" fullWidth> bash supabase functions new llamafile-test </StepHikeCompact.Details> </StepHikeCompact.Step> <StepHikeCompact.Step step={3} fullWidth> <StepHikeCompact.Details title="Add the function code" fullWidth> <Admonition type="note">

    Note that the model parameter doesn't have any effect here. The model depends on which Llamafile is currently running.

  </Admonition>

  ```ts
  import OpenAI from 'https://deno.land/x/[email protected]/mod.ts'

  Deno.serve(async (req) => {
    const client = new OpenAI()
    const { prompt } = await req.json()
    const stream = true

    const chatCompletion = await client.chat.completions.create({
      model: 'LLaMA_CPP',
      stream,
      messages: [
        {
          role: 'system',
          content:
            'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
        },
        {
          role: 'user',
          content: prompt,
        },
      ],
    })

    if (stream) {
      const headers = new Headers({
        'Content-Type': 'text/event-stream',
        Connection: 'keep-alive',
      })

      // Create a stream
      const stream = new ReadableStream({
        async start(controller) {
          const encoder = new TextEncoder()

          try {
            for await (const part of chatCompletion) {
              controller.enqueue(encoder.encode(part.choices[0]?.delta?.content || ''))
            }
          } catch (err) {
            console.error('Stream error:', err)
          } finally {
            controller.close()
          }
        },
      })

      // Return the stream to the user
      return new Response(stream, {
        headers,
      })
    }

    return Response.json(chatCompletion)
  })
  ```
</StepHikeCompact.Details>

</TabPanel> </Tabs> </TabPanel> </Tabs>

Deploying to production

Once the function is working locally, it's time to deploy to production.

<StepHikeCompact> <StepHikeCompact.Step step={1} fullWidth> <StepHikeCompact.Details title="Deploy an Ollama or Llamafile server" fullWidth> Deploy an Ollama or Llamafile server and set a function secret called `AI_INFERENCE_API_HOST` to point to the deployed server:

  ```bash
  supabase secrets set AI_INFERENCE_API_HOST=https://path-to-your-llm-server/
  ```

</StepHikeCompact.Details>

</StepHikeCompact.Step> <StepHikeCompact.Step step={2} fullWidth> <StepHikeCompact.Details title="Deploy the function" fullWidth> bash supabase functions deploy </StepHikeCompact.Details> </StepHikeCompact.Step> <StepHikeCompact.Step step={3} fullWidth> <StepHikeCompact.Details title="Execute the function" fullWidth> bash curl --get "https://project-ref.supabase.co/functions/v1/ollama-test" \ --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \ -H "Authorization: $ANON_KEY" </StepHikeCompact.Details> </StepHikeCompact.Step> </StepHikeCompact>

As demonstrated in the video above, running Ollama locally is typically slower than running it in on a server with dedicated GPUs. We are collaborating with the Ollama team to improve local performance.

In the future, a hosted LLM API, will be provided as part of the Supabase platform. Supabase will scale and manage the API and GPUs for you. To sign up for early access, fill up this form.

</Admonition>