Phrase - ParadeDB

Phrase queries work like match conjunction but are stricter: the tokens must also appear in the same order and at the same relative positions as in the query.

Suppose our query is `running shoes`, and we want to omit results like `running sleek shoes` or `shoes running`. These results contain the right tokens, but not in the exact order and position that the query specifies.

Enter the `###` phrase operator:

```sql
INSERT INTO mock_items (description, rating, category) VALUES
('running sleek shoes', 5, 'Footwear'),
('shoes running', 5, 'Footwear');
```

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### 'running shoes';
```

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('running shoes'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "running shoes"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Ruby
MockItem.search(:description)
        .phrase("running shoes")
        .select(:description, :rating, :category)
```

</CodeGroup>

This query returns:

```csv
     description     | rating | category
---------------------+--------+----------
 Sleek running shoes |      5 | Footwear
(1 row)
```

Note that `running sleek shoes` and `shoes running` did not match the phrase `running shoes` despite having the tokens `running` and `shoes`, because they appear in the wrong order or with other words in between.

How It Works

Let's look at what happens under the hood for the above phrase query:

1. Retrieves the tokenizer configuration of the `description` column. In this example, let's assume `description` uses the unicode tokenizer.
2. Tokenizes the query string with the same tokenizer. This means `running shoes` becomes two tokens: `running` and `shoes`.
3. Finds all rows where `description` contains `running` immediately followed by `shoes`.
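The steps above can be sketched in plain Python. This is only an illustration of the matching logic, not ParadeDB's implementation: `tokenize` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, and `phrase_match` is a helper name made up for this sketch.

```python
import re


def tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def phrase_match(document, query):
    """True if the query tokens appear in the document consecutively, in order."""
    doc_tokens = tokenize(document)
    query_tokens = tokenize(query)  # same tokenizer as the document
    n = len(query_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the document and compare it to the query.
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


docs = ["Sleek running shoes", "running sleek shoes", "shoes running"]
matches = [d for d in docs if phrase_match(d, "running shoes")]
print(matches)  # ['Sleek running shoes']
```

Only the document where `running` is immediately followed by `shoes` survives, which mirrors the single row returned by the query above.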

Examples

Let’s consider a few more hypothetical documents to see whether they would be returned by the phrase query. These examples assume that the index uses the default tokenizer and token filters, and that the query is `running shoes`.

| Original Text | Tokens | Match | Reason | Related |
| --- | --- | --- | --- | --- |
| Sleek running shoes | `sleek` `running` `shoes` | ✅ | Contains `running` and `shoes`, in that order. | |
| Sleek shoes running | `sleek` `shoes` `running` | ❌ | `running` and `shoes` not in the right order. | Match conjunction |
| SLeeK RUNNING ShOeS | `sleek` `running` `shoes` | ✅ | Contains `running` and `shoes`, in that order. | Lowercasing |
| Sleek run shoe | `sleek` `run` `shoe` | ❌ | Does not contain both `running` and `shoes`. | Stemming |
| Sleke ruining shoez | `sleke` `ruining` `shoez` | ❌ | Does not contain both `running` and `shoes`. | |
| White jogging shoes | `white` `jogging` `shoes` | ❌ | Does not contain both `running` and `shoes`. | |

Adding Slop

Slop allows the token ordering requirement of phrase queries to be relaxed. It specifies how many changes — like extra words in between or transposed word positions — are allowed while still considering the phrase a match:

- An extra word in between (e.g. `sleek shoes` vs. `sleek running shoes`) has a slop of `1`
- A transposition (e.g. `running shoes` vs. `shoes running`) has a slop of `2`
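To see why these two cases cost `1` and `2`, here is a minimal sketch of one common way to model slop. It is a simplified model under stated assumptions, not ParadeDB's actual algorithm: each query token at query index `q` that is found at document position `p` gets an offset `p - q`, and the slop needed is the spread between the largest and smallest offset. The function name `required_slop` is hypothetical.

```python
from itertools import product
from typing import List, Optional


def required_slop(doc_tokens: List[str], query_tokens: List[str]) -> Optional[int]:
    """Smallest slop under which the phrase matches, or None if it cannot match.

    Simplified model (an assumption of this sketch): an exact phrase has the
    same offset (doc position - query index) for every token, so the spread
    of the offsets measures how far the tokens had to move.
    """
    if not query_tokens:
        return None
    positions = []
    for token in query_tokens:
        found = [i for i, t in enumerate(doc_tokens) if t == token]
        if not found:
            return None  # a query token is missing from the document
        positions.append(found)

    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # each document position may be used only once
        offsets = [p - q for q, p in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        best = spread if best is None else min(best, spread)
    return best


# An extra word in between: `sleek shoes` vs. `sleek running shoes`
print(required_slop(["sleek", "running", "shoes"], ["sleek", "shoes"]))  # 1
# A transposition: `running shoes` vs. `shoes running`
print(required_slop(["shoes", "running"], ["running", "shoes"]))  # 2
```

Under this model a document matches a phrase query with `slop(n)` when `required_slop(...)` is not `None` and is at most `n`.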

To apply slop to a phrase query, cast the query to `pdb.slop(n)`, where `n` is the maximum allowed slop.

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### 'shoes running'::pdb.slop(2);
```

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('shoes running', slop=2))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "shoes running", slop=2))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Ruby
MockItem.search(:description)
        .phrase("shoes running", slop: 2)
        .select(:description, :rating, :category)
```

</CodeGroup>

Using a Custom Tokenizer

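By default, the query string is tokenized with the same tokenizer as the column. To tokenize the query differently, cast it to a tokenizer type such as `pdb.whitespace`.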

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### 'running shoes'::pdb.whitespace;
```

```python Django
from paradedb import ParadeDB, Phrase

MockItem.objects.filter(
    description=ParadeDB(Phrase('running shoes', tokenizer='whitespace'))
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, "running shoes", tokenizer="whitespace"))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Ruby
MockItem.search(:description)
        .phrase("running shoes", tokenizer: "whitespace")
        .select(:description, :rating, :category)
```

</CodeGroup>
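The choice of query tokenizer matters because the query tokens must line up with the tokens stored in the index. The sketch below illustrates this with two hypothetical stand-in tokenizers written in plain Python. They are assumptions made for illustration (including the lowercasing) and are not ParadeDB's tokenizers: `word_tokenize` splits on any non-alphanumeric character, while `whitespace_tokenize` splits on whitespace only.

```python
import re


def word_tokenize(text):
    """Stand-in word tokenizer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[0-9a-z]+", text.lower())


def whitespace_tokenize(text):
    """Stand-in whitespace tokenizer: lowercase, split on whitespace only."""
    return text.lower().split()


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens consecutively, in order."""
    n = len(query_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


indexed = word_tokenize("Sleek running shoes")  # tokens stored in the index
query = "running-shoes"

by_word = word_tokenize(query)
by_whitespace = whitespace_tokenize(query)

print(by_word, phrase_match(indexed, by_word))
# ['running', 'shoes'] True
print(by_whitespace, phrase_match(indexed, by_whitespace))
# ['running-shoes'] False
```

The same query string produces different tokens under each tokenizer, so it can match under one and miss under the other.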

Using Pretokenized Text

The phrase operator also accepts a text array as the right-hand side argument. If a text array is provided, each element of the array is treated as an exact token, which means that no further processing is done.
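As a rough illustration of what "no further processing" implies, the sketch below contrasts a string query, which is tokenized, with a list query, which is used as-is. It is plain Python with hypothetical helpers (`tokenize`, `to_query_tokens`, `phrase_match`), and it assumes the indexed tokens were lowercased; it is not ParadeDB's implementation.

```python
import re


def tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def to_query_tokens(query):
    """Strings are tokenized; lists are taken as exact tokens, unprocessed."""
    if isinstance(query, str):
        return tokenize(query)
    return list(query)


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens consecutively, in order."""
    n = len(query_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


indexed = tokenize("Sleek running shoes")  # ['sleek', 'running', 'shoes']

print(phrase_match(indexed, to_query_tokens("Running Shoes")))         # True
print(phrase_match(indexed, to_query_tokens(["running", "shoes"])))    # True
print(phrase_match(indexed, to_query_tokens(["Running", "Shoes"])))    # False
```

In this model the last call fails because the array elements are compared verbatim, so a pretokenized array should contain tokens exactly as they appear in the index.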

The following query matches documents containing the token `running` immediately followed by `shoes`:

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### ARRAY['running', 'shoes'];
```

```python Django
MockItem.objects.extra(
    where=["description ### ARRAY['running', 'shoes']"]
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, ["running", "shoes"]))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Ruby
MockItem.search(:description)
        .phrase(%w[running shoes])
        .select(:description, :rating, :category)
```

</CodeGroup>

Slop can also be applied to a pretokenized array:

<CodeGroup>

```sql SQL
SELECT description, rating, category
FROM mock_items
WHERE description ### ARRAY['shoes', 'running']::pdb.slop(2);
```

```python Django
MockItem.objects.extra(
    where=["description ### ARRAY['shoes', 'running']::pdb.slop(2)"]
).values('description', 'rating', 'category')
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search

stmt = (
    select(MockItem.description, MockItem.rating, MockItem.category)
    .where(search.phrase(MockItem.description, ["shoes", "running"], slop=2))
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Ruby
MockItem.search(:description)
        .phrase(%w[shoes running], slop: 2)
        .select(:description, :rating, :category)
```

</CodeGroup>