Match queries are the go-to query type for text search in ParadeDB. There are two types of match queries: match disjunction and match conjunction.
Match disjunction uses the `|||` operator and means "find all documents that contain one or more of the terms tokenized from this text input."
To understand what this looks like in practice, let's consider the following query:
<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes';
```

```python Django
from paradedb import Match, ParadeDB

MockItem.objects.filter(
    description=ParadeDB(Match('running shoes', operator='OR'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.match_any(MockItem.description, "running shoes"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("running shoes")
        .select(:description, :rating, :category)
```

</CodeGroup>
This query returns:
```text
     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
 White jogging shoes |      3 | Footwear
 Generic shoes       |      4 | Footwear
(3 rows)
```
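Let's look at what the `|||` operator does:

1. Retrieves the tokenizer configuration of the `description` column. In this example, let's assume `description` uses the unicode tokenizer.
2. Tokenizes the query string with the same tokenizer. This means `running shoes` becomes two tokens: `running` and `shoes`.
3. Finds all rows where `description` contains any one of the tokens, `running` or `shoes`.

This is why all results have either `running` or `shoes` tokens in `description`.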
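The steps above can be sketched in plain Python. This is only an illustration of the matching logic, not how ParadeDB implements it: `tokenize` is a simplified stand-in that lowercases and splits on non-alphanumeric characters, `match_any` is a hypothetical helper name, and the `Plastic keyboard` row is made up to show a non-match.

```python
import re


def tokenize(text: str) -> list[str]:
    """Simplified stand-in for the index tokenizer (an assumption):
    lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(document: str, query: str) -> bool:
    """Match disjunction: True if the document shares at least one token with the query."""
    doc_tokens = set(tokenize(document))
    return any(token in doc_tokens for token in tokenize(query))


rows = [
    "Sleek running shoes",
    "White jogging shoes",
    "Generic shoes",
    "Plastic keyboard",  # hypothetical row with no matching token
]

matches = [row for row in rows if match_any(row, "running shoes")]
print(matches)
# ['Sleek running shoes', 'White jogging shoes', 'Generic shoes']
```

The query is tokenized with the same function as the documents, which mirrors the second step in the list.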
Let's consider a few more hypothetical documents to see whether they would be returned by match disjunction.
These examples assume that the index uses the default tokenizer and token filters, and that the query is `running shoes`.
| Original Text | Tokens | Match | Reason | Related |
|---|---|---|---|---|
| Sleek running shoes | sleek running shoes | ✅ | Contains both running and shoes. | |
| Running shoes sleek | running shoes sleek | ✅ | Contains both running and shoes. | Phrase |
| SLeeK RUNNING ShOeS | sleek running shoes | ✅ | Contains both running and shoes. | Lowercasing |
| Sleek run shoe | sleek run shoe | ❌ | Contains neither running nor shoes. | Stemming |
| Sleke ruining shoez | sleke ruining shoez | ❌ | Contains neither running nor shoes. | Fuzzy |
| White jogging shoes | white jogging shoes | ✅ | Contains shoes. | Match conjunction |
Suppose we want to find rows that contain both `running` and `shoes`. This is where the `&&&` match conjunction operator comes in.
`&&&` means "find all documents that contain all terms tokenized from this text input."
<CodeGroup>

```python Django
from paradedb import Match, ParadeDB

MockItem.objects.filter(
    description=ParadeDB(Match('running shoes', operator='AND'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.match_all(MockItem.description, "running shoes"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_all("running shoes")
        .select(:description, :rating, :category)
```

</CodeGroup>
This query returns:
```text
     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
(1 row)
```
Note that `White jogging shoes` and `Generic shoes` are no longer returned because they do not have the token `running`.
Match conjunction works exactly like match disjunction, except for one key distinction. Instead of finding documents containing at least one matching token from the query, it finds documents where all tokens from the query are a match.
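The difference from disjunction comes down to requiring every query token instead of at least one. A minimal sketch, assuming a simplified tokenizer that lowercases and splits on non-alphanumeric characters (a stand-in, not ParadeDB's actual tokenizer); `match_all` is a hypothetical helper name:

```python
import re


def tokenize(text: str) -> list[str]:
    """Simplified stand-in for the index tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_all(document: str, query: str) -> bool:
    """Match conjunction: True only if every query token appears in the document."""
    doc_tokens = set(tokenize(document))
    return all(token in doc_tokens for token in tokenize(query))


rows = ["Sleek running shoes", "White jogging shoes", "Generic shoes"]

matches = [row for row in rows if match_all(row, "running shoes")]
print(matches)
# ['Sleek running shoes']
```

Token order is not checked, so a document such as `Running shoes sleek` would also satisfy `match_all` in this sketch.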
Let’s consider a few more hypothetical documents to see whether they would be returned by match conjunction.
These examples assume that the index uses the default tokenizer and token filters, and that the query is `running shoes`.
| Original Text | Tokens | Match | Reason | Related |
|---|---|---|---|---|
| Sleek running shoes | sleek running shoes | ✅ | Contains both running and shoes. | |
| Running shoes sleek | running shoes sleek | ✅ | Contains both running and shoes. | Phrase |
| SLeeK RUNNING ShOeS | sleek running shoes | ✅ | Contains both running and shoes. | Lowercasing |
| Sleek run shoe | sleek run shoe | ❌ | Does not contain both running and shoes. | Stemming |
| Sleke ruining shoez | sleke ruining shoez | ❌ | Does not contain both running and shoes. | Fuzzy |
| White jogging shoes | white jogging shoes | ❌ | Contains shoes but not running. | Match disjunction |
If the query string contains only one token, `|||` and `&&&` are effectively the same:

```sql
-- These two queries produce the same results
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'shoes';

SELECT description, rating, category
FROM mock_items
WHERE description &&& 'shoes';
```
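The equivalence for a single-token query follows from the logic itself: "any of one token" and "all of one token" are the same test. A small sketch, assuming a simplified lowercase-and-split tokenizer as a stand-in for the real one; `matches` and its `require_all` flag are hypothetical names:

```python
import re


def tokenize(text: str) -> list[str]:
    """Simplified stand-in for the index tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(document: str, query: str, require_all: bool) -> bool:
    """require_all=False models disjunction; require_all=True models conjunction."""
    doc_tokens = set(tokenize(document))
    hits = [token in doc_tokens for token in tokenize(query)]
    return all(hits) if require_all else any(hits)


rows = ["Sleek running shoes", "White jogging shoes", "Generic shoes"]

any_rows = [row for row in rows if matches(row, "shoes", require_all=False)]
all_rows = [row for row in rows if matches(row, "shoes", require_all=True)]
print(any_rows == all_rows)
# True
```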
By default, the match query automatically tokenizes the query string with the same tokenizer used by the field it's being searched against. This behavior can be overridden by explicitly casting the query to a different tokenizer.
<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'::pdb.whitespace;
```

```python Django
from paradedb import Match, ParadeDB

MockItem.objects.filter(
    description=ParadeDB(Match('running shoes', operator='OR', tokenizer='whitespace'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.match_any(MockItem.description, "running shoes", tokenizer="whitespace"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("running shoes", tokenizer: "whitespace")
        .select(:description, :rating, :category)
```

</CodeGroup>
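To see why the choice of query tokenizer can change results, consider a rough Python sketch. Both tokenizers here are simplified stand-ins written for illustration and are assumptions about behavior, not ParadeDB's implementations: `unicode_like` lowercases and splits on non-alphanumeric characters, while `whitespace_like` lowercases and splits on whitespace only. The hyphenated query string is a made-up example.

```python
import re


def unicode_like(text: str) -> list[str]:
    """Stand-in for a word-boundary tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def whitespace_like(text: str) -> list[str]:
    """Stand-in for a whitespace tokenizer (an assumption): splits on whitespace only."""
    return text.lower().split()


# Tokens stored in the index for one document, produced by the field's tokenizer.
indexed = set(unicode_like("Sleek running shoes"))

query = "running-shoes"  # hypothetical query string
default_tokens = unicode_like(query)      # same tokenizer as the field
override_tokens = whitespace_like(query)  # query tokenizer overridden

print(default_tokens, any(t in indexed for t in default_tokens))
# ['running', 'shoes'] True
print(override_tokens, any(t in indexed for t in override_tokens))
# ['running-shoes'] False
```

In this sketch the override keeps `running-shoes` as one token, which no longer lines up with the tokens stored in the index.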
The match operators also accept text arrays. If a text array is provided, each element of the array is treated as an exact token, which means that no further processing is done.
<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description &&& ARRAY['running', 'shoes'];
```

```python Django
from paradedb import Match, ParadeDB

MockItem.objects.filter(
    description=ParadeDB(Match('running', 'shoes', operator='AND'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.match_all(MockItem.description, "running", "shoes"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_all("running", "shoes")
        .select(:description, :rating, :category)
```

</CodeGroup>
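Because array elements are used as exact tokens, they are not lowercased or split before matching. The following sketch illustrates the consequence under the assumption that the index stores lowercase tokens; `match_all_tokens` is a hypothetical helper, not part of any ParadeDB client.

```python
def match_all_tokens(doc_tokens: set[str], tokens: list[str]) -> bool:
    """Conjunction over exact tokens: each element is compared as-is, with no tokenization."""
    return all(token in doc_tokens for token in tokens)


# Tokens assumed to be stored in the index for "Sleek running shoes".
indexed = {"sleek", "running", "shoes"}

print(match_all_tokens(indexed, ["running", "shoes"]))  # True
print(match_all_tokens(indexed, ["Running", "shoes"]))  # False: "Running" is not lowercased
print(match_all_tokens(indexed, ["running shoes"]))     # False: the element is not split
```

When passing an array, each element therefore needs to already be in the form the index stores.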