Back to Paradedb

Configure your Environment

docs/documentation/getting-started/environment.mdx

0.24.024.8 KB
Original Source

This guide will walk you through setting up your environment to run queries against ParadeDB. Choose your preferred tool below:

<Accordion title="SQL">

ParadeDB comes with a helpful procedure that creates a table populated with mock data to help you get started. Run the following command to create this table.

sql
CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'mock_items'
);

Then, inspect the first 3 rows:

sql
SELECT description, rating, category
FROM mock_items
LIMIT 3;
ini
       description        | rating |  category
--------------------------+--------+-------------
 Ergonomic metal keyboard |      4 | Electronics
 Plastic Keyboard         |      4 | Electronics
 Sleek running shoes      |      5 | Footwear
(3 rows)

Next, let's create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.

sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

You're all set! Try running some queries.

</Accordion> <Accordion title="Drizzle (JavaScript)"> To get started, create a TypeScript project with [Drizzle](https://orm.drizzle.team/), [postgres.js](https://github.com/porsager/postgres), and [@paradedb/drizzle-paradedb](https://www.npmjs.com/package/@paradedb/drizzle-paradedb) installed.
bash
npm init -y
npm pkg set type=module
npm install [email protected] [email protected] postgres @paradedb/[email protected] tsx

Create a db.ts file with your database connection and a schema for ParadeDB's built-in test table:

ts
import { indexing } from "@paradedb/drizzle-paradedb";
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";
import {
  boolean,
  customType,
  date,
  integer,
  jsonb,
  pgTable,
  serial,
  text,
  time,
  timestamp,
  varchar,
} from "drizzle-orm/pg-core";

export const client = postgres(
  "postgres://myuser:mypassword@localhost:5432/mydatabase",
);
export const db = drizzle({ client });

export const mockItems = pgTable(
  "mock_items",
  {
    id: serial("id").primaryKey(),
    description: text("description"),
    rating: integer("rating"),
    category: varchar("category", { length: 255 }),
    inStock: boolean("in_stock"),
    metadata: jsonb("metadata"),
    createdAt: timestamp("created_at"),
    lastUpdatedDate: date("last_updated_date"),
    latestAvailableTime: time("latest_available_time"),
    weightRange: customType<{ data: string }>({
      dataType: () => "int4range",
    })("weight_range"),
  },
  (table) => [
    indexing
      .bm25Index("search_idx")
      .on(
        table.id,
        table.description,
        table.category,
        table.rating,
        table.inStock,
        table.metadata,
        table.createdAt,
        table.lastUpdatedDate,
        table.latestAvailableTime,
        table.weightRange,
      ),
  ],
);
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

Create a Drizzle Kit configuration that points at the same database:

ts
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  schema: "./db.ts",
  dialect: "postgresql",
  // Keep Drizzle Kit focused on `public.mock_items` so it
  // doesn't try to modify tables from preinstalled extensions.
  schemaFilter: ["public"],
  tablesFilter: ["mock_items"],
  dbCredentials: {
    url: "postgres://myuser:mypassword@localhost:5432/mydatabase",
  },
});

Open a TypeScript REPL:

bash
node --import tsx

Import your database connection, schema, and any helpers used by the setup and query snippets. The Node REPL does not support static import statements, so use dynamic imports here:

ts
const { client, db, mockItems } = await import("./db.ts");
const { and, desc, gt, sql } = await import("drizzle-orm");
const { search } = await import("@paradedb/drizzle-paradedb");

Create and populate mock_items:

ts
await db.execute(sql`
  CALL paradedb.create_bm25_test_table(
    schema_name => 'public',
    table_name => 'mock_items'
  )
`);

Now, in a separate terminal, push the Drizzle schema to create the BM25 index:

bash
npx drizzle-kit push

To paste a formatted query snippet, enter editor mode:

text
.editor

Paste the query body without its import lines. The helpers from those imports are already available from the dynamic imports above. Press Ctrl+D to run the pasted query.

You're all set! Try running some queries in the REPL.

</Accordion> <Accordion title="Django (Python)"> To start you'll need a [Django](https://www.djangoproject.com/) project with [Psycopg](https://www.psycopg.org/) and [django-paradedb](https://pypi.org/project/django-paradedb/) installed. Run the following to create one:
bash
python3 -m venv .venv
source .venv/bin/activate
pip install django psycopg django-paradedb==0.8.0
python3 -m django startproject myproject .
python3 manage.py startapp myapp

In myproject/settings.py, add 'django.contrib.postgres' and 'myapp' to INSTALLED_APPS. Then, configure DATABASES["default"] to point to Postgres:

python
INSTALLED_APPS = [
    ...,
    'django.contrib.postgres',
    'myapp',
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydatabase",
        "USER": "myuser",
        "PASSWORD": "mypassword",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

We can now add a model for ParadeDB's built-in test table and BM25 index:

python
from django.db import models

from django.contrib.postgres.fields import IntegerRangeField
from paradedb.indexes import BM25Index
from paradedb.queryset import ParadeDBManager
from paradedb.search import Tokenizer


class MockItem(models.Model):
    description = models.TextField(null=True, blank=True)
    rating = models.IntegerField(null=True, blank=True)
    category = models.CharField(max_length=255, null=True, blank=True)
    in_stock = models.BooleanField(null=True, blank=True)
    metadata = models.JSONField(null=True, blank=True)
    created_at = models.DateTimeField(null=True, blank=True)
    last_updated_date = models.DateField(null=True, blank=True)
    latest_available_time = models.TimeField(null=True, blank=True)
    weight_range = IntegerRangeField(null=True, blank=True)

    objects = ParadeDBManager()

    class Meta:
        db_table = "mock_items"
        indexes = [
            BM25Index(
                fields={
                    "id": {},
                    "description": {"tokenizer": Tokenizer.unicode_words()},
                    "category": {"tokenizer": Tokenizer.literal()},
                    "rating": {},
                    "in_stock": {},
                    "metadata": {"json_fields": {"fast": True}},
                    "created_at": {},
                    "last_updated_date": {},
                    "latest_available_time": {},
                    "weight_range": {},
                },
                key_field="id",
                name="search_idx",
            ),
        ]
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

Run the migrations to create the table and index:

bash
python3 manage.py makemigrations
python3 manage.py migrate

Now, open a Python shell with python3 manage.py shell and run the following command to populate mock_items.

python
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        CALL paradedb.create_bm25_test_table(
          schema_name => 'public',
          table_name  => 'mock_items_tmp'
        );
        INSERT INTO public.mock_items
        SELECT * FROM public.mock_items_tmp;
        DROP TABLE public.mock_items_tmp;
    """)

You're all set! Try running some queries in your Python shell.

</Accordion> <Accordion title="SQLAlchemy (Python)"> To get started, install [SQLAlchemy](https://www.sqlalchemy.org/), [Alembic](https://alembic.sqlalchemy.org/en/latest/), [Psycopg](https://www.psycopg.org/), and [sqlalchemy-paradedb](https://pypi.org/project/sqlalchemy-paradedb/).
bash
python3 -m venv .venv
source .venv/bin/activate
pip install sqlalchemy psycopg alembic sqlalchemy-paradedb==0.6.0

Initialize Alembic:

bash
alembic init migrations

Then update the Alembic configuration to point to your database:

ini
sqlalchemy.url = postgresql+psycopg://myuser:mypassword@localhost:5432/mydatabase

ParadeDB comes with a built-in test table that we'll run our queries against. Create a models.py file with a model and search index for that table:

python
from __future__ import annotations

from datetime import date, datetime, time
from typing import Any

from sqlalchemy import Boolean, Date, DateTime, Index, Integer, String, Text, Time
from sqlalchemy.dialects.postgresql import INT4RANGE, JSONB, Range
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

from paradedb.sqlalchemy import indexing


class Base(DeclarativeBase):
    pass


class MockItem(Base):
    __tablename__ = "mock_items"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    description: Mapped[str | None] = mapped_column(Text, nullable=True)
    rating: Mapped[int | None] = mapped_column(Integer, nullable=True)
    category: Mapped[str | None] = mapped_column(String(255), nullable=True)
    in_stock: Mapped[bool | None] = mapped_column(Boolean, nullable=True)
    metadata_: Mapped[dict[str, Any] | None] = mapped_column("metadata", JSONB, nullable=True)
    created_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
    last_updated_date: Mapped[date | None] = mapped_column(Date, nullable=True)
    latest_available_time: Mapped[time | None] = mapped_column(Time, nullable=True)
    weight_range: Mapped[Range[int] | None] = mapped_column(INT4RANGE, nullable=True)


Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(MockItem.description),
    indexing.BM25Field(MockItem.category),
    indexing.BM25Field(MockItem.rating),
    indexing.BM25Field(MockItem.in_stock),
    indexing.BM25Field(MockItem.metadata_),
    indexing.BM25Field(MockItem.created_at),
    indexing.BM25Field(MockItem.last_updated_date),
    indexing.BM25Field(MockItem.latest_available_time),
    indexing.BM25Field(MockItem.weight_range),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

Copy this configuration into your migrations/env.py:

python
from logging.config import fileConfig

from sqlalchemy import engine_from_config, text
from sqlalchemy import pool

from alembic import context

# This import is required for autogenerated ParadeDB migrations
# to work properly.
import paradedb.sqlalchemy.alembic  # noqa: F401
from models import Base

config = context.config

if config.config_file_name is not None:
    fileConfig(config.config_file_name)

target_metadata = Base.metadata

# The ParadeDB Docker image comes pre-bundled with some popular
# extensions like PostGIS. PostGIS automatically creates a table
# called `spatial_ref_sys`. This tells Alembic not to drop it even
# though it isn't tracked in Alembic's metadata.
IGNORED_TABLES = {"spatial_ref_sys"}


def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table" and reflected and name in IGNORED_TABLES:
        return False
    return True


def run_migrations_offline() -> None:
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        # This prevents Alembic from modifying tables outside
        # of the `public` schema.
        connection.execute(text("SET search_path TO public"))
        connection.commit()
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            include_object=include_object,
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

Next, add a migration to create the mock_items test table. Create a blank migration in 0001_create_mock_items_table.py by running the following command:

bash
alembic revision --rev-id 0001 -m "Create mock_items table"

Update the generated migration to create the table:

python
def upgrade() -> None:
    """Upgrade schema."""
    op.execute(
        """
        CALL paradedb.create_bm25_test_table(
          schema_name => 'public',
          table_name => 'mock_items'
        )
        """
    )


def downgrade() -> None:
    """Downgrade schema."""
    op.execute("DROP TABLE IF EXISTS public.mock_items")

Then, run it with:

bash
alembic upgrade head

Next, autogenerate a new migration to create the search index.

bash
alembic revision --rev-id 0002 --autogenerate -m "Create search index on mock_items"

The generated migration should look like this:

python
"""add mock_items search index

Revision ID: 0002
Revises: 0001
Create Date: 2026-04-07 13:56:45.304941

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '0002'
down_revision: Union[str, Sequence[str], None] = '0001'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_bm25_index('search_idx', 'mock_items', ['id', 'description', 'category', 'rating', 'in_stock', 'metadata', 'created_at', 'last_updated_date', 'latest_available_time', 'weight_range'], key_field='id', table_schema='public')
    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_bm25_index('search_idx', if_exists=True, schema='public')
    # ### end Alembic commands ###

Then run it with:

bash
alembic upgrade head

Finally, run python and execute the following:

python
from models import MockItem
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg://myuser:mypassword@localhost:5432/mydatabase')

You're all set! Try running some queries in your shell.

</Accordion> <Accordion title="Rails (Ruby)"> To get started, create a [Rails](https://rubyonrails.org/) app that uses PostgreSQL.
bash
rails new paradedb -d postgresql
cd paradedb

Add the rails-paradedb gem to your Gemfile:

ruby
gem "rails-paradedb", "0.7.0", require: "parade_db"

Then install it:

bash
bundle install

Update config/database.yml to point to your ParadeDB database:

yml
development:
  adapter: postgresql
  encoding: unicode
  database: mydatabase
  username: myuser
  password: mypassword
  host: localhost
  port: 5432

ParadeDB comes with a built-in test table that we'll run our queries against. Generate a migration to create it:

bash
rails generate migration CreateMockItemsTable

Update the generated migration to create mock_items:

ruby
def up
  execute <<~SQL
    CALL paradedb.create_bm25_test_table(
      schema_name => 'public',
      table_name => 'mock_items'
    );
  SQL
end

def down
  drop_table :mock_items, if_exists: true
end

Next, create a model for the mock_items table in app/models/mock_item.rb:

ruby
class MockItem < ApplicationRecord
  include ParadeDB::Model

  self.table_name = "mock_items"
  self.primary_key = "id"
end

Then, create a search index for that table in app/models/mock_item_index.rb:

ruby
class MockItemIndex < ParadeDB::Index
  self.table_name = :mock_items
  self.key_field = :id
  self.index_name = :search_idx
  self.fields = {
    id: nil,
    description: nil,
    category: nil,
    rating: nil,
    in_stock: nil,
    metadata: nil,
    created_at: nil,
    last_updated_date: nil,
    latest_available_time: nil,
    weight_range: nil
  }
end
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

Generate a migration for the search index:

bash
rails generate migration CreateMockItemsIndex

Update the generated migration to create the index:

ruby
def up
  create_paradedb_index(MockItemIndex, if_not_exists: true)
end

def down
  remove_bm25_index :mock_items, name: :search_idx, if_exists: true
end

Run the migrations:

bash
rails db:migrate

You're all set! Open the Rails console and run some queries.

bash
rails console
</Accordion> <Accordion title="Entity Framework Core (C#)"> To get started, create a .NET project with [EF Core](https://learn.microsoft.com/en-us/ef/core/), [Npgsql.EntityFrameworkCore.PostgreSQL](https://www.nuget.org/packages/npgsql.entityframeworkcore.postgresql), and [ParadeDB.EntityFrameworkCore](https://www.nuget.org/packages/ParadeDB.EntityFrameworkCore) installed.
bash
dotnet new console --framework net10.0
dotnet new tool-manifest
dotnet tool install dotnet-ef
dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL
dotnet add package ParadeDB.EntityFrameworkCore --version 0.1.0
<Note> This console app uses `OnConfiguring` to keep the example self-contained. In ASP.NET Core or another app with dependency injection, register ParadeDB through `UseNpgsql` in `Program.cs`:
cs
builder.Services.AddDbContextPool<AppDbContext>(opt =>
{
    opt.UseNpgsql(
        builder.Configuration.GetConnectionString("AppDatabase"),
        o => o.UseParadeDb()
    );
});
</Note>

Replace Program.cs with a DbContext, model, and query scratchpad for ParadeDB's built-in test table:

cs
using System.Text.Json;
using Microsoft.EntityFrameworkCore;
using NpgsqlTypes;
using ParadeDB.EntityFrameworkCore.Extensions;

await using var dbContext = new AppDbContext();

// Replace this with the query you want to run.
var results = await dbContext
    .MockItems.Where(item =>
        EF.Functions.MatchAny(item.Description, "running shoes") && item.Rating > 2
    )
    .OrderBy(item => item.Rating)
    .Select(item => new { item.Description, item.Rating, item.Category })
    .Take(5)
    .ToListAsync();

PrintResults(results);

static void PrintResults<T>(IReadOnlyList<T> rows)
{
  var properties = typeof(T).GetProperties();

  foreach (var property in properties)
  {
      Console.Write($"{property.Name,-24}");
  }

  Console.WriteLine();
  Console.WriteLine(new string('-', properties.Length * 24));

  foreach (var row in rows)
  {
      foreach (var property in properties)
      {
          var value = property.GetValue(row)?.ToString() ?? "";
          value = value.Length > 21 ? value[..21] + "..." : value;

          Console.Write($"{value,-24}");
      }

      Console.WriteLine();
  }
}

public sealed class AppDbContext : DbContext
{
    public DbSet<MockItem> MockItems => Set<MockItem>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseNpgsql("Host=localhost;Port=5432;Database=mydatabase;Username=myuser;Password=mypassword", npgsql => npgsql.UseParadeDb());

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<MockItem>(entity =>
        {
            entity.ToTable("mock_items");
            entity.HasKey(item => item.Id);

            entity.Property(item => item.Id).HasColumnName("id");
            entity.Property(item => item.Description).HasColumnName("description");
            entity.Property(item => item.Rating).HasColumnName("rating");
            entity.Property(item => item.Category).HasColumnName("category").HasColumnType("varchar(255)");
            entity.Property(item => item.InStock).HasColumnName("in_stock");
            entity.Property(item => item.Metadata).HasColumnName("metadata").HasColumnType("jsonb");
            entity.Property(item => item.CreatedAt).HasColumnName("created_at").HasColumnType("timestamp");
            entity.Property(item => item.LastUpdatedDate).HasColumnName("last_updated_date").HasColumnType("date");
            entity.Property(item => item.LatestAvailableTime).HasColumnName("latest_available_time").HasColumnType("time");
            entity.Property(item => item.WeightRange).HasColumnName("weight_range").HasColumnType("int4range");

            entity
                .HasBm25Index("search_idx", item => item.Id)
                .HasField(item => item.Description)
                .HasField(item => item.Category)
                .HasField(item => item.Rating)
                .HasField(item => item.InStock)
                .HasField(item => item.Metadata)
                .HasField(item => item.CreatedAt)
                .HasField(item => item.LastUpdatedDate)
                .HasField(item => item.LatestAvailableTime)
                .HasField(item => item.WeightRange);
        });
    }
}

public sealed class MockItem
{
    public int Id { get; set; }
    public string? Description { get; set; }
    public int? Rating { get; set; }
    public string? Category { get; set; }
    public bool? InStock { get; set; }
    public JsonDocument? Metadata { get; set; }
    public DateTime? CreatedAt { get; set; }
    public DateOnly? LastUpdatedDate { get; set; }
    public TimeOnly? LatestAvailableTime { get; set; }
    public NpgsqlRange<int>? WeightRange { get; set; }
}
<Note> As a general rule of thumb, any columns that you want to filter, `GROUP BY`, `ORDER BY`, or aggregate as part of a full text query should be added to the index for faster performance. </Note> <Note> Note the mandatory `key_field` option. See [choosing a key field](/documentation/indexing/create-index#choosing-a-key-field) for more details. </Note>

Create the EF Core migration:

bash
dotnet ef migrations add CreateMockItems

Open the generated migration in Migrations/*_CreateMockItems.cs and add this seed step to the end of the Up method:

cs
migrationBuilder.Sql("""
    CALL paradedb.create_bm25_test_table(
      schema_name => 'public',
      table_name  => 'mock_items_tmp'
    );

    INSERT INTO public.mock_items
    SELECT * FROM public.mock_items_tmp;

    DROP TABLE public.mock_items_tmp;
    """);

Then apply the migration:

bash
dotnet ef database update

This creates mock_items, the BM25 index, and loads the mock data. Now, run the query included in Program.cs:

bash
dotnet run

You're all set! Try running some queries by adding them to Program.cs.

</Accordion>