docs/inbox-matching.md
The inbox matching system links inbox documents (receipts/invoices) to bank transactions using a deterministic scoring model optimized on real production feedback.
The current algorithm is embedding-free for inbox matching. It relies on:
This design keeps matching explainable, fast, and stable in production.
This document covers:
- packages/db/src/queries/transaction-matching.ts,
- packages/db/src/utils/transaction-matching.ts,
It does not cover unrelated transaction embedding features used elsewhere.
```mermaid
graph TD
    A[New Inbox Item] --> B[process-attachment]
    B --> C[batch-process-matching]
    C --> D[findMatches]
    E[New Transaction] --> F[match-transactions-bidirectional]
    F --> G[findInboxMatches]
    D --> H[scoreMatch]
    G --> H
    H --> I{Team thresholds}
    I -->|auto threshold| J[Auto-match + confirm]
    I -->|suggested threshold| K[Create suggestion]
    I -->|below threshold| L[No match yet]
```
Candidate search is SQL-first and efficient:
- team_id) and status-bounded records.
- pg_trgm text similarity (word_similarity) for name-driven retrieval.
pg_trgm is used for retrieval speed and relevance, while final ranking is done by the custom scorer.
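The retrieval step described above can be sketched as a parameterized query builder. This is a minimal illustration only: the table name `transactions`, the columns `id`, `name`, `team_id`, `status`, the status value `'posted'`, and the default cutoff and limit are all assumptions, not the actual schema or settings. Only `word_similarity()` is taken from pg_trgm.

```typescript
// Hypothetical query builder: table, columns, status value and defaults are assumed.
// Only word_similarity() is a real pg_trgm function.
type CandidateQuery = { text: string; values: (string | number)[] };

function buildCandidateQuery(
  teamId: string,
  inboxName: string,
  minSimilarity: number = 0.3, // assumed retrieval cutoff
  limit: number = 20, // assumed candidate cap
): CandidateQuery {
  const text = `
    SELECT t.id, t.name, word_similarity($2, t.name) AS name_similarity
    FROM transactions t
    WHERE t.team_id = $1
      AND t.status = 'posted'
      AND word_similarity($2, t.name) >= $3
    ORDER BY name_similarity DESC
    LIMIT $4`;
  // Values are passed separately so the driver can bind them safely.
  return { text, values: [teamId, inboxName, minSimilarity, limit] };
}
```

The query only narrows the candidate set by team, status and name similarity; the returned rows would still go through the custom scorer for final ranking.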
Final confidence is produced by scoreMatch() from:
- nameScore from normalized token similarity and containment logic,
- amountScore with strict same-currency behavior and base-amount cross-currency handling,
- currencyScore (same currency strongest; shared base currency next),
- dateScore with invoice/expense-aware timing logic.
Confidence receives additional guarded adjustments:
For a normalized (inboxName, transactionName) pair:
This improves matching of recurring merchant name variants (e.g. a legal entity name vs. the name shown on a card statement). Alias learning is scoped per team: one team's data never influences another team's matching.
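The team scoping can be illustrated with a small in-memory store keyed by team and normalized name pair. Everything here is hypothetical: `TeamAliasStore`, `normalizeMerchantName`, and the confirmed/total ratio used as the alias score are assumptions for illustration, and the real system presumably derives this from persisted match history.

```typescript
// Hypothetical normalization: lowercase, drop punctuation, collapse whitespace.
function normalizeMerchantName(raw: string): string {
  return raw.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

type AliasCounts = { confirmed: number; declined: number };

// Hypothetical store. The team id is part of every key, so one team's
// feedback can never be read back for another team.
class TeamAliasStore {
  private readonly counts = new Map<string, AliasCounts>();

  private key(teamId: string, inboxName: string, transactionName: string): string {
    return JSON.stringify([
      teamId,
      normalizeMerchantName(inboxName),
      normalizeMerchantName(transactionName),
    ]);
  }

  record(teamId: string, inboxName: string, transactionName: string, outcome: "confirmed" | "declined"): void {
    const k = this.key(teamId, inboxName, transactionName);
    const entry = this.counts.get(k) ?? { confirmed: 0, declined: 0 };
    entry[outcome] += 1;
    this.counts.set(k, entry);
  }

  // Share of confirmations for this pair within the team; 0 when unseen.
  aliasScore(teamId: string, inboxName: string, transactionName: string): number {
    const entry = this.counts.get(this.key(teamId, inboxName, transactionName));
    if (!entry) return 0;
    const total = entry.confirmed + entry.declined;
    return total === 0 ? 0 : entry.confirmed / total;
  }
}
```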
Declines/unmatches are converted into a decayed penalty:
This prevents repeated bad suggestions while remaining recoverable.
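One way to get a penalty that fades over time is exponential decay with a cap. The sketch below assumes a half-life model; `perEvent`, `halfLifeDays` and `maxPenalty` are illustrative values, not the production constants, and the actual decay function may differ.

```typescript
// Hypothetical shape: each decline or unmatch is reduced to its age in days.
type NegativeEvent = { ageDays: number };

// Each event contributes perEvent, halved every halfLifeDays; the sum is capped
// so that old mistakes fade and the pair can be suggested again later.
function decayedPenalty(
  events: NegativeEvent[],
  perEvent: number = 0.1, // assumed weight of a fresh negative event
  halfLifeDays: number = 30, // assumed half-life
  maxPenalty: number = 0.3, // assumed cap
): number {
  const raw = events.reduce(
    (sum, e) => sum + perEvent * Math.pow(0.5, e.ageDays / halfLifeDays),
    0,
  );
  return Math.min(maxPenalty, raw);
}
```

With these assumed values, a decline from today costs the full per-event weight, one from a half-life ago costs half of that, and no amount of negative feedback pushes the penalty past the cap.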
Previously dismissed exact inbox/transaction pairs are not re-suggested.
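The exact-pair exclusion can be pictured as a set lookup applied to candidates before they are scored or suggested. The types and the `excludeDismissed` helper are hypothetical; in practice this filter may well live in the SQL candidate query instead.

```typescript
// Hypothetical reference to one inbox/transaction pair.
type PairRef = { inboxId: string; transactionId: string };

function pairKey(p: PairRef): string {
  return JSON.stringify([p.inboxId, p.transactionId]);
}

// Drops candidates whose exact pair was dismissed before; other pairs that
// involve the same inbox item or the same transaction are kept.
function excludeDismissed<T extends PairRef>(candidates: T[], dismissed: PairRef[]): T[] {
  const dismissedKeys = new Set(dismissed.map(pairKey));
  return candidates.filter((c) => !dismissedKeys.has(pairKey(c)));
}
```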
Calibration is computed from recent labeled outcomes and cached briefly in memory.
- calibratedSuggestedThreshold
- calibratedAutoThreshold (strict, derived above suggested threshold)
This avoids one global threshold for all teams and improves precision/recall balance per team.
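A rough sketch of per-team calibration with a short-lived in-memory cache follows. The selection rule (lowest cutoff that reaches a target precision), the `targetPrecision`, `autoMargin`, `fallback` and `ttlMs` values, and the function names are all assumptions; only the two output field names come from the description above.

```typescript
type LabeledOutcome = { confidence: number; accepted: boolean };
type Calibration = { calibratedSuggestedThreshold: number; calibratedAutoThreshold: number };

// Assumed rule: pick the lowest observed confidence at which matches at or
// above it were accepted often enough, then place the auto threshold above it.
function calibrate(
  outcomes: LabeledOutcome[],
  targetPrecision: number = 0.8,
  autoMargin: number = 0.15,
  fallback: number = 0.6,
): Calibration {
  let suggested = fallback; // used when there is no data or no cutoff qualifies
  const cutoffs = Array.from(new Set(outcomes.map((o) => o.confidence))).sort((a, b) => a - b);
  for (const cutoff of cutoffs) {
    const above = outcomes.filter((o) => o.confidence >= cutoff);
    const precision = above.filter((o) => o.accepted).length / above.length;
    if (precision >= targetPrecision) {
      suggested = cutoff;
      break;
    }
  }
  return {
    calibratedSuggestedThreshold: suggested,
    calibratedAutoThreshold: Math.min(1, suggested + autoMargin),
  };
}

// Hypothetical TTL cache keyed by team id.
const calibrationCache = new Map<string, { value: Calibration; expiresAt: number }>();

function getTeamCalibration(
  teamId: string,
  loadOutcomes: () => LabeledOutcome[],
  now: number = Date.now(),
  ttlMs: number = 60000,
): Calibration {
  const hit = calibrationCache.get(teamId);
  if (hit && hit.expiresAt > now) return hit.value;
  const value = calibrate(loadOutcomes());
  calibrationCache.set(teamId, { value, expiresAt: now + ttlMs });
  return value;
}
```

Passing `now` explicitly keeps the cache easy to test; callers would normally rely on the default.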
Auto-match is conservative and requires both:
Pattern gate expectations include repeated confirmations and high historical reliability, with low negative evidence.
If not eligible for auto-match but above suggested threshold, a pending suggestion is created instead.
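The decision flow can be sketched as follows, assuming the two auto-match requirements are a confidence at or above the calibrated auto threshold and a passing pattern gate. The `PatternStats` fields and the gate cutoffs (3 confirmations, 0.9 reliability, at most 1 negative) are illustrative placeholders, not the production values.

```typescript
// Hypothetical summary of a merchant pattern's match history.
type PatternStats = { confirmations: number; reliability: number; negatives: number };
type TeamThresholds = { suggested: number; auto: number };
type MatchDecision = "auto_match" | "suggest" | "no_match";

// Assumed gate: repeated confirmations, high reliability, little negative evidence.
function passesPatternGate(stats: PatternStats): boolean {
  return stats.confirmations >= 3 && stats.reliability >= 0.9 && stats.negatives <= 1;
}

function decideMatch(confidence: number, thresholds: TeamThresholds, stats: PatternStats): MatchDecision {
  // Auto-match needs both a high score and a trusted pattern.
  if (confidence >= thresholds.auto && passesPatternGate(stats)) return "auto_match";
  // A high score without a trusted pattern falls back to a pending suggestion.
  if (confidence >= thresholds.suggested) return "suggest";
  return "no_match";
}
```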
Inbox:
new -> analyzing -> pending -> suggested_match/done (or later no_match)
Suggestion:
pending -> confirmed/declined/unmatched/expired
unmatched is treated as negative feedback for future calibration/penalties.
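The two lifecycles can be written down as transition tables. The tables below encode only the flows listed above; transitions out of `suggested_match` are not described here, so that entry is left empty, and `canTransition` and `isNegativeFeedback` are hypothetical helpers.

```typescript
type InboxStatus = "new" | "analyzing" | "pending" | "suggested_match" | "done" | "no_match";
type SuggestionStatus = "pending" | "confirmed" | "declined" | "unmatched" | "expired";

const inboxTransitions: Record<InboxStatus, InboxStatus[]> = {
  new: ["analyzing"],
  analyzing: ["pending"],
  pending: ["suggested_match", "done", "no_match"],
  suggested_match: [], // follow-up transitions are not covered by this sketch
  done: [],
  no_match: [],
};

const suggestionTransitions: Record<SuggestionStatus, SuggestionStatus[]> = {
  pending: ["confirmed", "declined", "unmatched", "expired"],
  confirmed: [],
  declined: [],
  unmatched: [],
  expired: [],
};

// Generic lookup that works for either table.
function canTransition<S extends string>(table: Record<S, S[]>, from: S, to: S): boolean {
  return table[from].indexOf(to) !== -1;
}

// Declines and unmatches both count as negative evidence.
function isNegativeFeedback(status: SuggestionStatus): boolean {
  return status === "declined" || status === "unmatched";
}
```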
packages/db/src/scripts/matching-eval-db.ts provides safe verification against real DB data.
Key properties:
- BEGIN TRANSACTION READ ONLY),
Main command:
bun run eval:matching:db
Useful options:
- --team-id <uuid>
- --from-days-ago <n> --to-days-ago <n>
- --fixed-threshold <n>
- --show-review-list true --review-limit <n>
Review list output highlights:
Compared to the legacy embedding-driven matcher, V2 is:
- packages/db/src/queries/transaction-matching.ts
- packages/db/src/queries/inbox-matching.ts
- packages/db/src/queries/transactions.ts
- packages/db/src/queries/inbox.ts
- packages/db/src/utils/transaction-matching.ts
- packages/db/src/scripts/matching-eval-db.ts