Fuzzy Search for File List

docs/en/references/fuzzy-search.md

This document describes the fuzzy search implementation for file listing in Cherry Studio.

Overview

The fuzzy search feature lets users find files by typing partial or approximate file names or paths. It filters files in two tiers (ripgrep glob pre-filtering, with greedy substring matching as a fallback) and then ranks the remaining files with subsequence-based scoring.

Features

  • Ripgrep Glob Pre-filtering: Primary filtering using glob patterns for fast native-level filtering
  • Greedy Substring Matching: Fallback file filtering strategy when ripgrep glob pre-filtering returns no results
  • Subsequence-based Segment Scoring: During scoring, path segments gain additional weight when query characters appear in order
  • Relevance Scoring: Results are sorted by a relevance score derived from multiple factors

Matching Strategies

1. Ripgrep Glob Pre-filtering (Primary)

The query is converted to a glob pattern for ripgrep to do initial filtering:

Query: "updater"
Glob:  "*u*p*d*a*t*e*r*"

This leverages ripgrep's native performance for the initial file filtering.
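The conversion above can be sketched in a few lines. This is a minimal illustration, not the actual `queryToGlobPattern()` from FileStorage.ts: the name `queryToGlobPatternSketch`, the trimming of whitespace, and the `*` result for an empty query are assumptions, and characters that are special in glob syntax are not handled here.

```typescript
// Hypothetical sketch: interleave "*" between the query characters so that
// any path containing them in order matches the resulting glob.
function queryToGlobPatternSketch(query: string): string {
  const chars = Array.from(query.trim())
  // Assumption: an empty query matches everything.
  if (chars.length === 0) return '*'
  return `*${chars.join('*')}*`
}

console.log(queryToGlobPatternSketch('updater')) // prints *u*p*d*a*t*e*r*
```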

2. Greedy Substring Matching (Fallback)

When the glob pre-filter returns no results, the system falls back to greedy substring matching. This allows more flexible matching:

Query: "updatercontroller"
File:  "packages/update/src/node/updateController.ts"

Matching process:
1. Find "update" (longest match from start)
2. Remaining "rcontroller" → find "r" then "controller"
3. All parts matched → Success
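One way to read the matching process above is as a loop that repeatedly consumes the longest prefix of the remaining query that still occurs in the path. The sketch below follows that reading. It is not the actual `isGreedySubstringMatch()` implementation; the function name, the case-insensitive comparison, and the left-to-right search order are assumptions. With this particular search order the remainder "rcontroller" is split as "rc" (from "src") plus "ontroller" rather than "r" plus "controller", but the overall result for the example is the same.

```typescript
// Hypothetical sketch of greedy substring matching.
function isGreedySubstringMatchSketch(query: string, filePath: string): boolean {
  const q = query.toLowerCase()
  const text = filePath.toLowerCase()
  let queryIndex = 0 // first query character not yet matched
  let searchFrom = 0 // position in the path after the previous match

  while (queryIndex < q.length) {
    let matched = false
    // Try the longest remaining chunk first, shrinking until one is found.
    for (let end = q.length; end > queryIndex; end--) {
      const chunk = q.slice(queryIndex, end)
      const foundAt = text.indexOf(chunk, searchFrom)
      if (foundAt !== -1) {
        queryIndex = end
        searchFrom = foundAt + chunk.length
        matched = true
        break
      }
    }
    // Not even a single character could be placed: no match.
    if (!matched) return false
  }
  return true
}

console.log(
  isGreedySubstringMatchSketch('updatercontroller', 'packages/update/src/node/updateController.ts')
) // prints true
```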

Scoring Algorithm

Results are ranked by a relevance score based on named constants defined in FileStorage.ts:

| Constant | Value | Description |
| --- | --- | --- |
| SCORE_FILENAME_STARTS | 100 | Filename starts with query (highest priority) |
| SCORE_FILENAME_CONTAINS | 80 | Filename contains exact query substring |
| SCORE_SEGMENT_MATCH | 60 | Per path segment that matches query |
| SCORE_WORD_BOUNDARY | 20 | Query matches start of a word |
| SCORE_CONSECUTIVE_CHAR | 15 | Per consecutive character match |
| PATH_LENGTH_PENALTY_FACTOR | 4 | Logarithmic penalty for longer paths |

Scoring Strategy

The scoring prioritizes:

  1. Filename matches (highest): Files where the query appears in the filename are most relevant
  2. Path segment matches: Multiple matching segments indicate stronger relevance
  3. Word boundaries: Matching at word starts (e.g., "upd" matching "update") is preferred
  4. Consecutive matches: Longer consecutive character sequences score higher
  5. Path length: Shorter paths are preferred (logarithmic penalty prevents long paths from dominating)
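To make the priorities above concrete, here is a simplified scoring sketch. It is not the real `getFuzzyMatchScore()`: the constant values come from the table in this document, but how they combine is an assumption, the penalty formula (`factor * ln(path length + 1)`) is a guess at what "logarithmic penalty" means, and the word-boundary and consecutive-character bonuses are left out to keep the example short.

```typescript
// Values as documented for FileStorage.ts; the combination logic is hypothetical.
const SCORE_FILENAME_STARTS = 100
const SCORE_FILENAME_CONTAINS = 80
const SCORE_SEGMENT_MATCH = 60
const PATH_LENGTH_PENALTY_FACTOR = 4

function scoreSketch(filePath: string, query: string): number {
  const q = query.toLowerCase()
  const path = filePath.toLowerCase()
  const segments = path.split('/').filter(Boolean)
  const fileName = segments[segments.length - 1] ?? ''
  let score = 0

  // 1. Filename matches carry the most weight.
  if (fileName.startsWith(q)) score += SCORE_FILENAME_STARTS
  else if (fileName.includes(q)) score += SCORE_FILENAME_CONTAINS

  // 2. Every path segment containing the query adds to the score.
  for (const segment of segments) {
    if (segment.includes(q)) score += SCORE_SEGMENT_MATCH
  }

  // 5. Assumed penalty shape: grows with the logarithm of the path length.
  score -= PATH_LENGTH_PENALTY_FACTOR * Math.log(path.length + 1)
  return score
}

// Sort candidates so that the highest score comes first.
const ranked = ['a/very/long/nested/path/src/updater.ts', 'src/RCUpdater.js', 'src/updater.ts']
  .sort((a, b) => scoreSketch(b, 'updater') - scoreSketch(a, 'updater'))
console.log(ranked[0]) // prints src/updater.ts
```

Because the penalty is logarithmic, doubling the path length subtracts only a few points, so it mainly acts as a tie-breaker between otherwise similar matches.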

Example Scoring

For query updater:

| File | Score Factors |
| --- | --- |
| RCUpdater.js | Short path + filename contains "updater" |
| updateController.ts | Multiple segment matches |
| UpdaterHelper.plist | Long path penalty |

Configuration

DirectoryListOptions

```typescript
interface DirectoryListOptions {
  recursive?: boolean          // Default: true
  maxDepth?: number            // Default: 10
  includeHidden?: boolean      // Default: false
  includeFiles?: boolean       // Default: true
  includeDirectories?: boolean // Default: true
  maxEntries?: number          // Default: 20
  searchPattern?: string       // Default: '.'
  fuzzy?: boolean              // Default: true
}
```

Usage

```typescript
// Basic fuzzy search
const fuzzyFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'updater',
  fuzzy: true,
  maxEntries: 20
})

// Disable fuzzy search (exact glob matching)
const exactFiles = await window.api.file.listDirectory(dirPath, {
  searchPattern: 'update',
  fuzzy: false
})
```

Performance Considerations

  1. Ripgrep Pre-filtering: Most queries are handled by ripgrep's native glob matching, which is extremely fast
  2. Fallback Only When Needed: Greedy substring matching (which loads all files) only runs when glob matching returns empty results
  3. Result Limiting: Only the top 20 results are returned by default (see maxEntries)
  4. Excluded Directories: Common large directories are automatically excluded:
    • node_modules
    • .git
    • dist, build
    • .next, .nuxt
    • coverage, .cache
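The "fallback only when needed" behavior from points 1 to 3 can be sketched as follows. This is a simplified, synchronous illustration with injected dependencies, not the real `listDirectoryWithRipgrep()`: the function and parameter names are hypothetical, the real search is presumably asynchronous, and scoring and sorting are omitted.

```typescript
// Hypothetical sketch of the two-tier filtering flow.
function twoTierSearchSketch(
  query: string,
  globFilter: (query: string) => string[], // tier 1: stand-in for the ripgrep glob pre-filter
  listAllFiles: () => string[], // only invoked when tier 1 finds nothing
  fallbackMatch: (query: string, filePath: string) => boolean, // tier 2 matcher
  maxEntries = 20
): string[] {
  const primary = globFilter(query)
  const candidates =
    primary.length > 0
      ? primary
      : listAllFiles().filter((filePath) => fallbackMatch(query, filePath))
  // Cap the result size, mirroring the documented default of 20.
  return candidates.slice(0, maxEntries)
}

// Usage with in-memory stand-ins for the real file system calls.
const found = twoTierSearchSketch(
  'update',
  () => [], // pretend the glob pre-filter returned nothing
  () => ['src/update.ts', 'README.md'],
  (q, filePath) => filePath.toLowerCase().includes(q)
)
console.log(found) // prints [ 'src/update.ts' ]
```

Passing `listAllFiles` as a function rather than a precomputed array keeps the expensive full listing off the common path, which is the point of the fallback design.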

Implementation Details

The implementation is located in src/main/services/FileStorage.ts:

  • queryToGlobPattern(): Converts query to ripgrep glob pattern
  • isFuzzyMatch(): Subsequence matching algorithm
  • isGreedySubstringMatch(): Greedy substring matching fallback
  • getFuzzyMatchScore(): Calculates relevance score
  • listDirectoryWithRipgrep(): Main search orchestration
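
For reference, subsequence matching of the kind `isFuzzyMatch()` is described as performing can be sketched as below. The name `isSubsequenceMatchSketch` and the case-insensitive comparison are assumptions; the actual function in FileStorage.ts may differ.

```typescript
// Hypothetical sketch: true when every query character appears in the text
// in the same order, not necessarily adjacent.
function isSubsequenceMatchSketch(query: string, text: string): boolean {
  const q = query.toLowerCase()
  const t = text.toLowerCase()
  let queryIndex = 0
  for (let textIndex = 0; textIndex < t.length && queryIndex < q.length; textIndex++) {
    // Advance through the query only when the current characters agree.
    if (t[textIndex] === q[queryIndex]) queryIndex++
  }
  return queryIndex === q.length
}

console.log(isSubsequenceMatchSketch('updtr', 'RCUpdater.js')) // prints true
console.log(isSubsequenceMatchSketch('retadpu', 'RCUpdater.js')) // prints false
```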