Phrase queries work like match conjunction, but are stricter: they also require the tokens to appear in the same order and position as in the query.

Suppose our query is `running shoes`, and we want to omit results like
`running sleek shoes` or `shoes running`. These results contain the right tokens, but not in the exact order and position
that the query specifies.

Enter the `###` phrase operator:

<CodeGroup>

```sql SQL
INSERT INTO mock_items (description, rating, category) VALUES
('running sleek shoes', 5, 'Footwear'),
('shoes running', 5, 'Footwear');

SELECT description, rating, category
FROM mock_items
WHERE description ### 'running shoes';
```

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('running shoes'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "running shoes"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .phrase("running shoes")
        .select(:description, :rating, :category)
```

</CodeGroup>

This query returns:

```csv
     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
(1 row)
```

Note that `running sleek shoes` and `shoes running` did not match the phrase `running shoes` despite having the tokens `running` and
`shoes`, because they appear in the wrong order or with other words in between.

Let's look at what happens under the hood for the above phrase query:

1. The tokenizer configured for the `description` column is looked up. In this example,
   let's assume `description` uses the unicode tokenizer.
2. The query string is tokenized with that tokenizer, so `running shoes` becomes two tokens: `running` and `shoes`.
3. All rows where `description` contains `running` immediately followed by `shoes` are returned.

Let's consider a few more hypothetical documents to see whether they would be returned by the phrase query.
These examples assume that the index uses the default tokenizer and token filters, and that the query is
`running shoes`.

| Original Text | Tokens | Match | Reason | Related |
|---|---|---|---|---|
| Sleek running shoes | `sleek` `running` `shoes` | ✅ | Contains `running` and `shoes`, in that order. | |
| Sleek shoes running | `sleek` `shoes` `running` | ❌ | `running` and `shoes` are not in the right order. | Match conjunction |
| SLeeK RUNNING ShOeS | `sleek` `running` `shoes` | ✅ | Contains `running` and `shoes`, in that order. | Lowercasing |
| Sleek run shoe | `sleek` `run` `shoe` | ❌ | Does not contain both `running` and `shoes`. | Stemming |
| Sleke ruining shoez | `sleke` `ruining` `shoez` | ❌ | Does not contain both `running` and `shoes`. | |
| White jogging shoes | `white` `jogging` `shoes` | ❌ | Does not contain both `running` and `shoes`. | |
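
The matching steps and the table above can be approximated in a few lines of Python. This is a minimal sketch and not ParadeDB's implementation: `tokenize` and `phrase_match` are hypothetical helpers, and lowercasing plus whitespace splitting is assumed as a stand-in for the default tokenizer and token filters.

```python
def tokenize(text):
    """Stand-in tokenizer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def phrase_match(query, document):
    """Return True if the query tokens appear consecutively and in order."""
    query_tokens = tokenize(query)
    doc_tokens = tokenize(document)
    if not query_tokens:
        return False
    width = len(query_tokens)
    # Slide a window over the document tokens and compare it to the query.
    return any(
        doc_tokens[start:start + width] == query_tokens
        for start in range(len(doc_tokens) - width + 1)
    )


documents = [
    "Sleek running shoes",
    "Sleek shoes running",
    "SLeeK RUNNING ShOeS",
    "Sleek run shoe",
    "Sleke ruining shoez",
    "White jogging shoes",
]
for doc in documents:
    print(f"{doc!r}: {phrase_match('running shoes', doc)}")
```

Under these assumptions, only the first and third documents match, which agrees with the Match column of the table.
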

Slop relaxes the token ordering requirement of phrase queries. It specifies how many changes, such as extra words in between or transposed word positions, are allowed while still considering the phrase a match:

- An extra word in between (e.g. `sleek shoes` vs. `sleek running shoes`) has a slop of `1`
- A transposition (e.g. `running shoes` vs. `shoes running`) has a slop of `2`

To apply slop to a phrase query, cast the query to `slop(n)`, where `n` is the maximum allowed slop, for example `'shoes running'::pdb.slop(2)`.

<CodeGroup>

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('shoes running', slop=2))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "shoes running", slop=2))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .phrase("shoes running", slop: 2)
        .select(:description, :rating, :category)
```

</CodeGroup>

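
To make the slop values above concrete, here is one way to model them in plain Python. It is a simplified sketch, not the engine's actual algorithm: `required_slop` and `matches_with_slop` are hypothetical helpers, the tokenizer is an assumed lowercase-and-split stand-in, and the cost is measured as the spread between how far each query token sits from its expected position. That measure reproduces the two cases described above (an extra word costs 1, a transposition costs 2), but the real engine may count other cases differently.

```python
from itertools import product


def tokenize(text):
    """Stand-in tokenizer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def required_slop(query, document):
    """Smallest slop under this model, or None if a query token is missing."""
    query_tokens = tokenize(query)
    doc_tokens = tokenize(document)
    if not query_tokens:
        return None
    candidates = []
    for term in query_tokens:
        hits = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not hits:
            return None
        candidates.append(hits)
    best = None
    # Try every assignment of query tokens to document positions.
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # each query token needs its own document position
        # Shift = document position minus position within the query.
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifts) - min(shifts)
        if best is None or spread < best:
            best = spread
    return best


def matches_with_slop(query, document, slop):
    needed = required_slop(query, document)
    return needed is not None and needed <= slop


print(required_slop("sleek shoes", "sleek running shoes"))  # 1 (extra word)
print(required_slop("running shoes", "shoes running"))      # 2 (transposition)
print(matches_with_slop("shoes running", "Sleek running shoes", slop=2))  # True
```

With `slop=0` the model reduces to an ordinary phrase match, since every query token must then sit exactly at its expected offset.
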
The phrase query supports custom query tokenization.

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### 'running shoes'::pdb.whitespace;
```

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('running shoes', tokenizer='whitespace'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "running shoes", tokenizer="whitespace"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .phrase("running shoes", tokenizer: "whitespace")
        .select(:description, :rating, :category)
```

</CodeGroup>

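
The choice of query tokenizer matters because it decides which tokens the phrase is made of. The sketch below illustrates the idea with two hypothetical Python tokenizers. Neither is claimed to reproduce `pdb.whitespace` or any other ParadeDB tokenizer exactly; they only show how the same query string can yield different token sequences.

```python
import re


def whitespace_tokenize(text):
    """Hypothetical tokenizer: split on runs of whitespace only."""
    return text.split()


def word_tokenize(text):
    """Hypothetical tokenizer: lowercase, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


query = "Running-Shoes"
print(whitespace_tokenize(query))  # ['Running-Shoes']
print(word_tokenize(query))        # ['running', 'shoes']
```

In this illustration, the first tokenizer produces a single-token phrase and the second produces a two-token phrase, so the two would look for different things in the index.
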
The phrase operator also accepts a text array as the right-hand side argument. If a text array is provided, each element of the array is treated as an exact token, which means that no further processing is done.

The following query matches documents containing the token `running` immediately followed by `shoes`:

<CodeGroup>

```python Django
MockItem.objects.extra(
    where=["description ### ARRAY['running', 'shoes']"]
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, ["running", "shoes"]))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .phrase(%w[running shoes])
        .select(:description, :rating, :category)
```

</CodeGroup>

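
Because array elements are used as exact tokens, they have to match the indexed tokens character for character. The following sketch shows the consequence under one assumption: that the indexed text was lowercased at index time. `index_tokenize` and `phrase_match_tokens` are hypothetical helpers, not part of ParadeDB.

```python
def index_tokenize(text):
    """Stand-in for index-time tokenization (assumption): lowercase and split."""
    return text.lower().split()


def phrase_match_tokens(query_tokens, document):
    """Match pre-tokenized query tokens as given, with no further processing."""
    doc_tokens = index_tokenize(document)
    query_tokens = list(query_tokens)
    width = len(query_tokens)
    if width == 0:
        return False
    return any(
        doc_tokens[start:start + width] == query_tokens
        for start in range(len(doc_tokens) - width + 1)
    )


print(phrase_match_tokens(["running", "shoes"], "Sleek running shoes"))  # True
print(phrase_match_tokens(["Running", "Shoes"], "Sleek running shoes"))  # False
```

The second call fails in this model because the capitalized tokens are never lowercased, whereas a query string passed through the tokenizer would be.
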
Adding slop is supported:

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### ARRAY['shoes', 'running']::pdb.slop(2);
```

```python Django
MockItem.objects.extra(
    where=["description ### ARRAY['shoes', 'running']::pdb.slop(2)"]
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, ["shoes", "running"], slop=2))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .phrase(%w[shoes running], slop: 2)
        .select(:description, :rating, :category)
```

</CodeGroup>