Back to Paradedb

Migrating from Elasticsearch

docs/documentation/migration/elasticsearch-feature-comparison.mdx

0.23.322.2 KB
Original Source

This page is for developers who are evaluating or actively migrating from Elasticsearch (or OpenSearch) to ParadeDB. ParadeDB delivers Elastic-quality full-text search as a Postgres extension, so your data, queries, and infrastructure all live inside Postgres — no ETL pipelines, no separate cluster to manage.

Key Differences

ElasticsearchParadeDB
Query languageJSON DSLStandard SQL with search operators
Data modelDenormalized documentsNormalized relational tables with JOINs
TransactionsPer-document atomicity, eventual consistencyFull ACID transactions
Index storageSeparate clusterInside Postgres (same database)
Schema changesDynamic mapping or reindexDefined at index creation; REINDEX to change
Updates/deletesExpensive (reindex internally)Native Postgres operations

Migration Tips

  • Start with your most common queries. Map your highest-traffic Elasticsearch queries using the full-text search reference.
  • Use SQL JOINs instead of denormalization. Elasticsearch requires denormalized documents, but ParadeDB supports full SQL JOINs. You can normalize your schema and simplify your data model.
  • Continue to use Postgres tooling. Backups, replication, monitoring, and CI/CD integrate with standard Postgres tools you already use.

Feature Comparison

Query Capabilities

FeatureElasticsearchParadeDBNotes
Full-text search (BM25)Match and phrase operators
Fuzzy matchingMax edit distance of 2 via pdb.fuzzy()
Phrase matching### operator
Phrase prefixpdb.phrase_prefix()
Regular expressionspdb.regex()
Wildcard queriesVia regex
Boolean queriesVia SQL AND/OR/NOT or paradedb.boolean
Proximity search## operator
More Like Thisparadedb.more_like_this
Nested queriesVia SQL JOINs
Parent-child queriesVia SQL JOINs
Geo queriesUse PostGIS
Percolator (reverse search)
Script-based scoring
Suggesters (autocomplete)Via search_tokenizer (index with ngram, search with unicode) or fuzzy prefix

Text Analysis

FeatureElasticsearchParadeDBNotes
Custom tokenizers12+ built-in tokenizers
Token filters7 filters: lowercase, stemmer, stopwords, ascii_folding, alpha_num_only, trim, token_length
Character filters
Synonyms⚠️Coming soon
Different search-time analyzerVia search_tokenizer or multiple tokenizers per field
Multi-language supportChinese (Jieba), Japanese/Korean (Lindera), ICU
Stemming19 languages
Stopwords29 languages
N-gram tokenizationConfigurable min/max gram size

Aggregations

FeatureElasticsearchParadeDBNotes
Bucket aggregationsterms, histogram, date_histogram, range, filters
Metric aggregationsavg, sum, min/max, count, stats, percentiles, cardinality, top_hits
Pipeline aggregationsUse SQL window functions (SUM() OVER(), LAG(), etc.)
Nested aggregationsUse pdb.agg() with SQL GROUP BY
ES-compatible JSON syntaxpdb.agg() accepts ES JSON directly
SQL GROUP BYLimitedFull SQL aggregation support
<Note> Since `pdb.agg()` accepts Elasticsearch-compatible JSON, many of your existing aggregation queries can be migrated with minimal changes. See the [aggregates documentation](/documentation/aggregates/overview). </Note>

Scoring and Relevance

FeatureElasticsearchParadeDBNotes
BM25 scoringpdb.score()
Custom boostpdb.boost() type cast
Constant scorepdb.const()
Disjunction maxparadedb.disjunction_max()
Function scoreUse boost / const as partial alternatives
Script scoring
Decay functions
Field value factor

Highlighting

FeatureElasticsearchParadeDBNotes
Snippet highlightingpdb.snippet()
Multiple snippetspdb.snippets()
Custom tagsstart_tag, end_tag parameters
Byte offsetspdb.snippet_positions()
Fuzzy highlighting

Index Management

FeatureElasticsearchParadeDBNotes
Create indexCREATE INDEX ... USING bm25
Drop indexDROP INDEX
ReindexREINDEX
Index aliasesVia Postgres views
Index templates
Dynamic mappingSchema defined at index creation; requires REINDEX to change
Multi-field indexAll columns included in one index per table

Data Operations

FeatureElasticsearchParadeDBNotes
ACID transactionsFull Postgres ACID compliance
Real-time indexing⚠️ES is near-real-time (requires refresh); ParadeDB provides immediate read-after-write
JOINsFull SQL JOIN support
UPDATE / DELETE⚠️ES internally reindexes; Postgres handles natively
Bulk insertCOPY or batch INSERT
SQL queriesFull SQL including subqueries, CTEs, window functions

Deployment and Operations

FeatureElasticsearchParadeDBNotes
Horizontal sharding⚠️Via Citus for distributed workloads
Read replicasPostgres streaming replication
KubernetesCNPG / Helm charts
DockerOfficial Docker image
Logical replication ingestSync from existing Postgres
Cross-cluster search
Snapshot / restoreVia Postgres backup tools (pg_dump, WAL archiving)
Monitoringpg_stat, pganalyze, standard Postgres tools

Pagination

FeatureElasticsearchParadeDBNotes
from / sizeSQL LIMIT / OFFSET
scroll APIUse SQL cursors (DECLARE / FETCH) instead
search_afterUse keyset pagination (WHERE id > last_id ORDER BY id) instead
Top K optimizationparadedb.limit_fetch_multiplier