Fuzziness lets tokens match even when they are not identical, which tolerates typos in the query string.
<Warning>
  While fuzzy matching works for non-Latin characters (Chinese, Japanese, Korean, etc.), it may not give the expected results and can return large result sets, because Levenshtein distance relies on individual character differences. If you need this functionality, please thumbs-up this issue and leave a comment with your use case.
</Warning>

To add fuzziness to a query, cast it to the fuzzy(n) type, where n is the edit distance.
Fuzziness is supported for match and term queries.
-- Fuzzy match conjunction
SELECT id, description
FROM mock_items
WHERE description &&& 'runing shose'::pdb.fuzzy(2)
LIMIT 5;

-- Fuzzy term
SELECT id, description
FROM mock_items
WHERE description === 'shose'::pdb.fuzzy(2)
LIMIT 5;
```ts Drizzle
import { search } from "@paradedb/drizzle-paradedb";

// Fuzzy match disjunction
await db
  .select({
    id: mockItems.id,
    description: mockItems.description,
  })
  .from(mockItems)
  .where(search.matchAny(mockItems.description, search.fuzzy("runing shose", 2)))
  .limit(5);

// Fuzzy match conjunction
await db
  .select({
    id: mockItems.id,
    description: mockItems.description,
  })
  .from(mockItems)
  .where(search.matchAll(mockItems.description, search.fuzzy("runing shose", 2)))
  .limit(5);

// Fuzzy term
await db
  .select({
    id: mockItems.id,
    description: mockItems.description,
  })
  .from(mockItems)
  .where(search.term(mockItems.description, search.fuzzy("shose", 2)))
  .limit(5);
```
from paradedb import Fuzzy, MatchAll, MatchAny, ParadeDB, Term
# Fuzzy match disjunction
MockItem.objects.filter(
description=ParadeDB(MatchAny(Fuzzy('runing shose', 2)))
).values('id', 'description')[:5]
# Fuzzy match conjunction
MockItem.objects.filter(
description=ParadeDB(MatchAll(Fuzzy('runing shose', 2)))
).values('id', 'description')[:5]
# Fuzzy term
MockItem.objects.filter(
description=ParadeDB(Fuzzy(Term('shose'), 2))
).values('id', 'description')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
# Fuzzy match disjunction
fuzzy_or_stmt = (
select(MockItem.id, MockItem.description)
.where(search.match_any(MockItem.description, "runing shose", distance=2))
.limit(5)
)
# Fuzzy match conjunction
fuzzy_and_stmt = (
select(MockItem.id, MockItem.description)
.where(search.match_all(MockItem.description, "runing shose", distance=2))
.limit(5)
)
# Fuzzy term
fuzzy_term_stmt = (
select(MockItem.id, MockItem.description)
.where(search.term(MockItem.description, "shose", distance=2))
.limit(5)
)
with Session(engine) as session:
    {
        "or_rows": session.execute(fuzzy_or_stmt).all(),
        "and_rows": session.execute(fuzzy_and_stmt).all(),
        "term_rows": session.execute(fuzzy_term_stmt).all(),
    }
# Fuzzy match disjunction
MockItem.search(:description)
.matching_any('runing shose', distance: 2)
.select(:id, :description)
.limit(5)
# Fuzzy match conjunction
MockItem.search(:description)
.matching_all('runing shose', distance: 2)
.select(:id, :description)
.limit(5)
# Fuzzy term
MockItem.search(:description)
.term("shose", distance: 2)
.select(:id, :description)
.limit(5)
// Fuzzy match disjunction
await dbContext
.MockItems.Where(item =>
EF.Functions.MatchAny(item.Description, Pdb.Fuzzy("runing shose", 2))
)
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
// Fuzzy match conjunction
await dbContext
.MockItems.Where(item =>
EF.Functions.MatchAll(item.Description, Pdb.Fuzzy("runing shose", 2))
)
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
// Fuzzy term
await dbContext
.MockItems.Where(item => EF.Functions.Term(item.Description, Pdb.Fuzzy("shose", 2)))
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
By default, the match and term queries require exact token matches between the query and indexed text. When a query is cast to fuzzy(n), this requirement is relaxed -- tokens are matched if their Levenshtein distance, or edit distance, is less than or equal to n.
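Edit distance is a measure of how many single-character operations are needed to turn one string into another. The allowed operations are:

- Insertion adds a character, e.g. "shoe" → "shoes" (edit distance 1)
- Deletion removes a character, e.g. "shoess" → "shoes" (edit distance 1)
- Transposition swaps two adjacent characters, e.g. "shose" → "shoes" (edit distance 2 by default)

<Note>For performance reasons, the maximum allowed edit distance is 2.</Note>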
<Note>Casting a query to fuzzy(0) is the same as an exact token match.</Note>
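
To make the edit-distance rule concrete, here is a minimal Python sketch of textbook Levenshtein distance, in which an insertion, a deletion, or a substitution each costs 1. The names `levenshtein` and `fuzzy_match` are hypothetical and only illustrate the matching rule; this is not ParadeDB's implementation, which may differ in detail.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance between two strings."""
    # prev[j] is the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query_token: str, indexed_token: str, n: int) -> bool:
    """Hypothetical helper: tokens match if their distance is at most n."""
    return levenshtein(query_token, indexed_token) <= n


print(levenshtein("runing", "running"))  # 1 (one insertion)
print(levenshtein("shose", "shoes"))     # 2 (a swap counts as two edits here)
print(fuzzy_match("shose", "shoes", 2))  # True
print(fuzzy_match("shoes", "shoes", 0))  # True (distance 0 is an exact match)
```

Under this definition a swap of two adjacent characters counts as two edits, which is why "shose" needs an edit distance of 2 to reach "shoes".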
fuzzy also supports prefix matching.
For instance, "runn" is a prefix of "running" because it matches the beginning of the token exactly. "rann" is a fuzzy prefix of "running" because it matches the
beginning within an edit distance of 1.
To treat the query string as a prefix, set the second argument of fuzzy to either t or "true":
import { search } from "@paradedb/drizzle-paradedb";
await db
.select({
id: mockItems.id,
description: mockItems.description,
})
.from(mockItems)
.where(search.term(mockItems.description, search.fuzzy("rann", 1, true)))
.limit(5);
from paradedb import Fuzzy, ParadeDB, Term
MockItem.objects.filter(
description=ParadeDB(Fuzzy(Term('rann'), 1, prefix=True))
).values('id', 'description')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.id, MockItem.description)
.where(search.term(MockItem.description, "rann", distance=1, prefix=True))
.limit(5)
)
with Session(engine) as session:
    session.execute(stmt).all()
MockItem.search(:description)
.term("rann", distance: 1, prefix: true)
.select(:id, :description)
.limit(5)
await dbContext
.MockItems.Where(item => EF.Functions.Term(item.Description, Pdb.Fuzzy("rann", 1, true)))
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
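
The fuzzy prefix rule described in this section can be sketched in Python as "some prefix of the indexed token is within the edit distance of the query". The helper names below are hypothetical, and the brute-force scan over every prefix is an assumption chosen for clarity, not a description of how the search engine evaluates it.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_prefix_match(query: str, token: str, n: int) -> bool:
    """Hypothetical helper: True if any prefix of `token` is within
    edit distance n of `query`."""
    return any(
        levenshtein(query, token[:k]) <= n
        for k in range(len(token) + 1)
    )


print(fuzzy_prefix_match("runn", "running", 0))  # True: exact prefix
print(fuzzy_prefix_match("rann", "running", 0))  # False: no exact prefix
print(fuzzy_prefix_match("rann", "running", 1))  # True: "runn" is 1 edit away
```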
When used with match queries, fuzzy prefix treats all tokens in the query string as prefixes.
For instance, the following query means "find all documents containing the fuzzy prefix rann AND the fuzzy prefix slee":
import { search } from "@paradedb/drizzle-paradedb";
await db
.select({
id: mockItems.id,
description: mockItems.description,
})
.from(mockItems)
.where(search.matchAll(mockItems.description, search.fuzzy("slee rann", 1, true)))
.limit(5);
from paradedb import Fuzzy, MatchAll, ParadeDB
MockItem.objects.filter(
description=ParadeDB(MatchAll(Fuzzy('slee rann', 1, prefix=True)))
).values('id', 'description')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.id, MockItem.description)
.where(search.match_all(MockItem.description, "slee rann", distance=1, prefix=True))
.limit(5)
)
with Session(engine) as session:
    session.execute(stmt).all()
MockItem.search(:description)
.matching_all("slee rann", distance: 1, prefix: true)
.select(:id, :description)
.limit(5)
await dbContext
.MockItems.Where(item =>
EF.Functions.MatchAll(item.Description, Pdb.Fuzzy("slee rann", 1, true))
)
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
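
As a rough illustration of the conjunction semantics above, the following Python sketch requires every query token to be a fuzzy prefix of at least one document token. Splitting on whitespace stands in for the real tokenizer, and the function names and sample strings are made up for this example; treat it as a model of the behavior rather than the actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_prefix_match(query: str, token: str, n: int) -> bool:
    """True if any prefix of `token` is within edit distance n of `query`."""
    return any(levenshtein(query, token[:k]) <= n for k in range(len(token) + 1))


def match_all_fuzzy_prefix(query: str, document: str, n: int) -> bool:
    """Hypothetical model of a fuzzy-prefix match conjunction:
    every query token must fuzzy-prefix-match some document token."""
    doc_tokens = document.lower().split()  # assumption: whitespace tokenizer
    return all(
        any(fuzzy_prefix_match(q, t, n) for t in doc_tokens)
        for q in query.lower().split()
    )


print(match_all_fuzzy_prefix("slee rann", "Sleek running shoes", 1))  # True
print(match_all_fuzzy_prefix("slee rann", "Sleek leather boots", 1))  # False
```

The second call fails because no token in the made-up document starts with anything within one edit of "rann", even though "slee" matches "sleek".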
By default, the cost of a transposition (e.g. "shose" → "shoes") is 2. Setting the third argument of fuzzy to t lowers the cost of a transposition to 1:
import { search } from "@paradedb/drizzle-paradedb";
await db
.select({
id: mockItems.id,
description: mockItems.description,
})
.from(mockItems)
.where(search.term(mockItems.description, search.fuzzy("shose", 1, false, true)))
.limit(5);
from paradedb import Fuzzy, ParadeDB, Term
MockItem.objects.filter(
description=ParadeDB(Fuzzy(Term('shose'), 1, transposition_cost_one=True))
).values('id', 'description')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.id, MockItem.description)
.where(search.term(MockItem.description, "shose", distance=1, transpose_cost_one=True))
.limit(5)
)
with Session(engine) as session:
    session.execute(stmt).all()
MockItem.search(:description)
.term("shose", distance: 1, transposition_cost_one: true)
.select(:id, :description)
.limit(5)
await dbContext
.MockItems.Where(item =>
EF.Functions.Term(item.Description, Pdb.Fuzzy("shose", 1, false, true))
)
.Select(item => new { item.Id, item.Description })
.Take(5)
.ToListAsync();
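
The effect of the transposition setting can be illustrated with a small Python sketch. It uses the optimal string alignment variant of Damerau-Levenshtein distance, which is an assumption made for illustration; the function name `edit_distance` and its `transposition_cost_one` parameter are hypothetical and do not describe the engine's internals.

```python
def edit_distance(a: str, b: str, transposition_cost_one: bool = False) -> int:
    """Edit distance where swapping two adjacent characters costs 1
    when transposition_cost_one is True, and 2 otherwise."""
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free if equal)
            )
            if (
                transposition_cost_one
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                # Adjacent swap counted as a single edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("shose", "shoes"))        # 2
print(edit_distance("shose", "shoes", True))  # 1
```

With the lower transposition cost, "shose" reaches "shoes" within an edit distance of 1, which is why the examples in this section use a distance of 1 instead of 2.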