Highlighting

<Note> Highlighting is an expensive process and can slow down query times. We recommend passing a `LIMIT` to any query where `pdb.snippet` or `pdb.snippets` is called to restrict the number of snippets that need to be generated. </Note>

<Note>Highlighting is not supported for fuzzy search.</Note>

Highlighting refers to the practice of visually emphasizing the portions of a document that match a user's search query.

Basic Usage

`pdb.snippet(<column>)` can be added to any query where a ParadeDB operator is present. `pdb.snippet` returns the single best snippet, ranked by relevance score. The following query generates highlighted snippets against the `description` field.

<CodeGroup>
```sql SQL
SELECT id, pdb.snippet(description)
FROM mock_items
WHERE description ||| 'shoes'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippet

MockItem.objects.filter(
    description=ParadeDB(Match('shoes', operator='OR'))
).annotate(
    snippet=Snippet('description')
).values('id', 'snippet')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(MockItem.id, pdb.snippet(MockItem.description).label("snippet"))
    .where(search.match_any(MockItem.description, "shoes"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("shoes")
        .with_snippet(:description)
        .select(:id)
        .limit(5)
```
</CodeGroup>

<ParamField body="start_tag" default="<b>">
  The leading indicator around the highlighted region.
</ParamField>
<ParamField body="end_tag" default="</b>">
  The trailing indicator around the highlighted region.
</ParamField>
<ParamField body="max_num_chars" default={150}>
  Max number of characters for a highlighted snippet. A snippet may contain multiple matches if they are close to each other.
</ParamField>

By default, each highlighted region is wrapped in `<b>` and `</b>`. This can be configured with `start_tag` and `end_tag`:

<CodeGroup>
```sql SQL
SELECT id, pdb.snippet(description, start_tag => '<i>', end_tag => '</i>')
FROM mock_items
WHERE description ||| 'shoes'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippet

MockItem.objects.filter(
    description=ParadeDB(Match('shoes', operator='OR'))
).annotate(
    snippet=Snippet('description', start_sel='<i>', stop_sel='</i>')
).values('id', 'snippet')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(
        MockItem.id,
        pdb.snippet(
            MockItem.description,
            start_tag="<i>",
            end_tag="</i>",
        ).label("snippet"),
    )
    .where(search.match_any(MockItem.description, "shoes"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("shoes")
        .with_snippet(:description, start_tag: "<i>", end_tag: "</i>")
        .select(:id)
        .limit(5)
```
</CodeGroup>
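
When snippets are rendered in a web page, the source text itself may contain characters such as `<` or `&`. One possible approach, sketched below in Python, is to request neutral marker strings through `start_tag` and `end_tag`, HTML-escape the whole snippet, and only then swap the markers for real tags. The marker strings `[[HL]]` and `[[/HL]]` and the helper name `snippet_to_safe_html` are assumptions made for illustration and are not part of ParadeDB.

```python
import html

# Hypothetical markers, assumed to have been passed as start_tag / end_tag
# and assumed not to occur in the indexed text itself.
START_MARK = "[[HL]]"
END_MARK = "[[/HL]]"


def snippet_to_safe_html(snippet: str) -> str:
    """Escape the snippet text, then turn the markers into <mark> tags."""
    escaped = html.escape(snippet)  # the markers contain no escapable characters
    return escaped.replace(START_MARK, "<mark>").replace(END_MARK, "</mark>")


print(snippet_to_safe_html("Sleek <running> [[HL]]shoes[[/HL]] & socks"))
```

The markers survive `html.escape` unchanged because they contain none of the characters it rewrites, so the only raw tags in the result are the ones added in the final step.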

Multiple Snippets

`pdb.snippets(<column>)` returns an array of snippets, allowing you to retrieve multiple highlighted matches from a document. This is particularly useful when a document has several relevant matches spread throughout its content.

<CodeGroup>
```sql SQL
SELECT id, pdb.snippets(description, max_num_chars => 15)
FROM mock_items
WHERE description ||| 'artistic vase'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippets

MockItem.objects.filter(
    description=ParadeDB(Match('artistic vase', operator='OR'))
).annotate(
    snippets=Snippets('description', max_num_chars=15)
).values('id', 'snippets')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(MockItem.id, pdb.snippets(MockItem.description, max_num_chars=15).label("snippets"))
    .where(search.match_any(MockItem.description, "artistic vase"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("artistic vase")
        .with_snippets(:description, max_chars: 15)
        .select(:id)
        .limit(5)
```
</CodeGroup>
```ini
 id |                snippets
----+-----------------------------------------
 19 | {<b>Artistic</b>,"ceramic <b>vase</b>"}
(1 row)
```

<ParamField body="start_tag" default="<b>">
  The leading indicator around the highlighted region.
</ParamField>
<ParamField body="end_tag" default="</b>">
  The trailing indicator around the highlighted region.
</ParamField>
<ParamField body="max_num_chars" default={150}>
  Max number of characters for a highlighted snippet. When `max_num_chars` is small, multiple snippets may be generated for a single document.
</ParamField>
<ParamField body="limit" default={5}>
  The maximum number of snippets to return per document.
</ParamField>
<ParamField body="offset" default={0}>
  The number of snippets to skip before returning results. Use with `limit` for pagination.
</ParamField>
<ParamField body="sort_by" default="score">
  The order in which to sort the snippets. Can be `'score'` (default, sorts by relevance) or `'position'` (sorts by appearance in the document).
</ParamField>

Limiting and Offsetting Snippets

You can control the number and order of snippets returned using the `limit`, `offset`, and `sort_by` parameters.

For example, to get only the first snippet:

<CodeGroup>
```sql SQL
SELECT id, pdb.snippets(description, max_num_chars => 15, "limit" => 1)
FROM mock_items
WHERE description ||| 'running'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippets

MockItem.objects.filter(
    description=ParadeDB(Match('running', operator='OR'))
).annotate(
    snippets=Snippets('description', max_num_chars=15, limit=1)
).values('id', 'snippets')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(MockItem.id, pdb.snippets(MockItem.description, max_num_chars=15, limit=1).label("snippets"))
    .where(search.match_any(MockItem.description, "running"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("running")
        .with_snippets(:description, max_chars: 15, limit: 1)
        .select(:id)
        .limit(5)
```
</CodeGroup>

To get the second snippet (by skipping the first one):

<CodeGroup>
```sql SQL
SELECT id, pdb.snippets(description, max_num_chars => 15, "limit" => 1, "offset" => 1)
FROM mock_items
WHERE description ||| 'running'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippets

MockItem.objects.filter(
    description=ParadeDB(Match('running', operator='OR'))
).annotate(
    snippets=Snippets('description', max_num_chars=15, limit=1, offset=1)
).values('id', 'snippets')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(
        MockItem.id,
        pdb.snippets(MockItem.description, max_num_chars=15, limit=1, offset=1).label("snippets"),
    )
    .where(search.match_any(MockItem.description, "running"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("running")
        .with_snippets(:description, max_chars: 15, limit: 1, offset: 1)
        .select(:id)
        .limit(5)
```
</CodeGroup>

Sorting Snippets

Snippets can be sorted either by their relevance score (`'score'`) or by their position within the document (`'position'`).

To sort snippets by their appearance in the document:

<CodeGroup>
```sql SQL
SELECT id, pdb.snippets(description, max_num_chars => 15, sort_by => 'position')
FROM mock_items
WHERE description ||| 'artistic vase'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippets

MockItem.objects.filter(
    description=ParadeDB(Match('artistic vase', operator='OR'))
).annotate(
    snippets=Snippets('description', max_num_chars=15, sort_by='position')
).values('id', 'snippets')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(
        MockItem.id,
        pdb.snippets(MockItem.description, max_num_chars=15, sort_by="position").label("snippets"),
    )
    .where(search.match_any(MockItem.description, "artistic vase"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("artistic vase")
        .with_snippets(:description, max_chars: 15, sort_by: :position)
        .select(:id)
        .limit(5)
```
</CodeGroup>
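
The interaction of `limit`, `offset`, and `sort_by` can be pictured with a small Python model. This is only a mental model, under the assumption that snippets are sorted first and then sliced. The `Candidate` class, its scores and positions, and the `select_snippets` helper are hypothetical and do not reflect ParadeDB's internals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float   # hypothetical relevance score
    position: int  # hypothetical start offset within the document


def select_snippets(candidates, limit=5, offset=0, sort_by="score"):
    """Sort candidates, skip `offset` of them, then keep at most `limit`."""
    if sort_by == "score":
        ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    elif sort_by == "position":
        ordered = sorted(candidates, key=lambda c: c.position)
    else:
        raise ValueError("sort_by must be 'score' or 'position'")
    return [c.text for c in ordered[offset:offset + limit]]


# Made-up data for illustration only.
candidates = [
    Candidate("first <b>run</b>", score=0.5, position=0),
    Candidate("<b>running</b> fast", score=2.0, position=40),
    Candidate("a <b>running</b> shoe", score=1.0, position=80),
]

print(select_snippets(candidates, limit=1))                # best-scoring snippet
print(select_snippets(candidates, limit=1, offset=1))      # second-best snippet
print(select_snippets(candidates, sort_by="position"))     # document order
```

Under this model, `offset` always counts from the start of the sorted list, so changing `sort_by` changes which snippet a given `limit` and `offset` pair selects.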

Byte Offsets

`pdb.snippet_positions(<column>)` returns the byte offsets in the original text where the snippets would appear. It returns a two-dimensional integer array where each nested pair is `[start, end)`: the first value is the byte index of the first highlighted byte, and the second value is the byte index immediately after the last highlighted byte.

<CodeGroup>
```sql SQL
SELECT id, pdb.snippet(description), pdb.snippet_positions(description)
FROM mock_items
WHERE description ||| 'shoes'
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Snippet, SnippetPositions

MockItem.objects.filter(
    description=ParadeDB(Match('shoes', operator='OR'))
).annotate(
    snippet=Snippet('description'),
    snippet_positions=SnippetPositions('description')
).values('id', 'snippet', 'snippet_positions')[:5]
```

```python SQLAlchemy
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(
        MockItem.id,
        pdb.snippet(MockItem.description).label("snippet"),
        pdb.snippet_positions(MockItem.description).label("snippet_positions"),
    )
    .where(search.match_any(MockItem.description, "shoes"))
    .limit(5)
)

with Session(engine) as session:
    session.execute(stmt).all()
```

```ruby Rails
MockItem.search(:description)
        .matching_any("shoes")
        .with_snippet(:description)
        .with_snippet_positions(:description)
        .select(:id)
        .limit(5)
```
</CodeGroup>
```ini
 id |          snippet           | snippet_positions
----+----------------------------+-------------------
  4 | White jogging <b>shoes</b> | {{14,19}}
  3 | Sleek running <b>shoes</b> | {{14,19}}
  5 | Generic <b>shoes</b>       | {{8,13}}
(3 rows)
```
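
Because the offsets are byte indices rather than character indices, client code that applies them should slice the encoded bytes, not the string. The Python sketch below shows one way to do that, assuming the text is UTF-8 encoded and the ranges do not overlap. The helper name `highlight_by_byte_offsets` is hypothetical.

```python
def highlight_by_byte_offsets(text, positions, start_tag="<b>", end_tag="</b>"):
    """Wrap each [start, end) byte range of `text` in start_tag / end_tag."""
    raw = text.encode("utf-8")  # assumption: offsets refer to UTF-8 bytes
    parts = []
    cursor = 0
    for start, end in sorted(positions):
        parts.append(raw[cursor:start])           # unhighlighted bytes before the match
        parts.append(start_tag.encode("utf-8"))
        parts.append(raw[start:end])              # the highlighted bytes
        parts.append(end_tag.encode("utf-8"))
        cursor = end
    parts.append(raw[cursor:])                    # remainder after the last match
    return b"".join(parts).decode("utf-8")


# Values taken from the result table above.
print(highlight_by_byte_offsets("White jogging shoes", [(14, 19)]))
print(highlight_by_byte_offsets("Generic shoes", [(8, 13)]))
```

For ASCII-only text, byte and character indices coincide. They diverge as soon as the text contains multi-byte characters, which is why the helper works on the encoded bytes.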