# Navidrome Scanner
This document provides a technical explanation of Navidrome's music library scanner.

The scanner is built on a multi-phase pipeline architecture designed for efficient processing of music files. It traverses the library's directories, extracts metadata from the files it finds, and keeps the database representation of the music library up to date. For performance, some phases run sequentially while others execute in parallel.
```mermaid
flowchart TD
    subgraph "Scanner Execution Flow"
        Controller[Scanner Controller] --> Scanner[Scanner Implementation]
        Scanner --> Phase1[Phase 1: Folders Scan]
        Phase1 --> Phase2[Phase 2: Missing Tracks]
        Phase2 --> ParallelPhases

        subgraph ParallelPhases["Parallel Execution"]
            Phase3[Phase 3: Refresh Albums]
            Phase4[Phase 4: Playlist Import]
        end

        ParallelPhases --> FinalSteps[Final Steps: GC + Stats]
    end

    %% Triggers that can initiate a scan
    FileChanges[File System Changes] -->|Detected by| Watcher[Filesystem Watcher]
    Watcher -->|Triggers| Controller
    ScheduledJob[Scheduled Job] -->|Based on Scanner.Schedule| Controller
    ServerStartup[Server Startup] -->|If Scanner.ScanOnStartup=true| Controller
    ManualTrigger[Manual Scan via UI/API] -->|Admin user action| Controller
    CLICommand[Command Line: navidrome scan] -->|Direct invocation| Controller
    PIDChange[PID Configuration Change] -->|Forces full scan| Controller
    DBMigration[Database Migration] -->|May require full scan| Controller

    Scanner -.->|Alternative| External[External Scanner Process]
```
The execution flow shows that Phases 1 and 2 run sequentially, while Phases 3 and 4 execute in parallel to maximize performance before the final processing steps.
## Scanner Controller (`controller.go`)

This is the entry point for all scanning operations. It exposes the `Scanner` interface:
```go
type Scanner interface {
	// ScanAll starts a full scan of the music library. This is a blocking operation.
	ScanAll(ctx context.Context, fullScan bool) (warnings []string, err error)
	// Status returns information about the scanner's current state.
	Status(context.Context) (*StatusInfo, error)
}
```
## Scanner Implementation (`scanner.go`)

This is the primary implementation, which orchestrates the four-phase scanning pipeline. Each phase implements the `phase` interface:
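```go
type phase[T any] interface {
	// producer returns the source of the items (of type T) the phase processes.
	producer() ppl.Producer[T]
	// stages returns the pipeline stages each item passes through.
	stages() []ppl.Stage[T]
	// finalize is called when the pipeline finishes, with its error (if any).
	finalize(error) error
	// description returns a human-readable description of the phase.
	description() string
}
```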
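With this design, every phase is expressed the same way. A producer emits items of type `T`, a list of pipeline stages processes each item, and a `finalize` step runs when the pipeline completes and receives the pipeline's error.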
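The sketch below shows how a phase's pieces could fit together in a simplified, sequential runner. It deliberately avoids the `ppl` package. `producer` is modeled as a plain slice and each stage as a plain function, so `simplePhase`, `stage`, `runPhase` and `upperPhase` are hypothetical stand-ins rather than Navidrome's API. The real pipeline can also run stages concurrently.

```go
package main

import (
	"fmt"
	"strings"
)

// stage is a simplified stand-in for ppl.Stage[T]: it transforms one item.
type stage[T any] func(T) (T, error)

// simplePhase is a simplified stand-in for the scanner's phase[T] interface.
type simplePhase[T any] interface {
	producer() []T // items to process (ppl.Producer[T] in the real code)
	stages() []stage[T]
	finalize(error) error
	description() string
}

// runPhase pushes every item through every stage in order, stops at the
// first error, and always calls finalize with that error (or nil).
func runPhase[T any](p simplePhase[T]) error {
	var err error
items:
	for _, item := range p.producer() {
		for _, s := range p.stages() {
			if item, err = s(item); err != nil {
				break items
			}
		}
	}
	return p.finalize(err)
}

// upperPhase is a toy phase that upper-cases folder names and collects them.
type upperPhase struct{ done []string }

func (u *upperPhase) producer() []string { return []string{"rock", "jazz"} }

func (u *upperPhase) stages() []stage[string] {
	return []stage[string]{
		func(s string) (string, error) { return strings.ToUpper(s), nil },
		func(s string) (string, error) {
			u.done = append(u.done, s)
			return s, nil
		},
	}
}

func (u *upperPhase) finalize(err error) error { return err }

func (u *upperPhase) description() string { return "upper-case folders" }

func main() {
	p := &upperPhase{}
	err := runPhase[string](p)
	fmt.Println(p.description(), p.done, err) // upper-case folders [ROCK JAZZ] <nil>
}
```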
## External Scanner (`external.go`)

The External Scanner is a specialized implementation that offloads the scanning process to a separate subprocess. It is specifically designed to address memory management challenges in long-running Navidrome instances.
```go
// scannerExternal is a scanner that runs an external process to do the scanning. It is used to avoid
// memory leaks or retention in the main process, as the scanner can consume a lot of memory. The
// external process will be spawned with the same executable as the current process, and will run
// the "scan" command with the "--subprocess" flag.
//
// The external process will send progress updates to the main process through its STDOUT, and the main
// process will forward them to the caller.
```
```mermaid
sequenceDiagram
    participant MP as Main Process
    participant ES as External Scanner
    participant SP as Subprocess (navidrome scan --subprocess)
    participant FS as File System
    participant DB as Database

    Note over MP: DevExternalScanner=true
    MP->>ES: ScanAll(ctx, fullScan)
    activate ES

    ES->>ES: Locate executable path
    ES->>SP: Start subprocess with args:<br>scan --subprocess --configfile ... etc.
    activate SP

    Note over ES,SP: Create pipe for communication

    par Subprocess executes scan
        SP->>FS: Read files & metadata
        SP->>DB: Update database
    and Main process monitors progress
        loop For each progress update
            SP->>ES: Send encoded progress info via stdout pipe
            ES->>MP: Forward progress info
        end
    end

    SP-->>ES: Subprocess completes (success/error)
    deactivate SP
    ES-->>MP: Return aggregated warnings/errors
    deactivate ES
```
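Technical details:

1. **Process Isolation**
   - The subprocess runs the same executable with the `scan` command and the `--subprocess` flag to indicate it's running as a child process.
   - Configuration is passed as command-line arguments (`--configfile`, `--datafolder`, etc.).
2. **Inter-Process Communication**
   - Progress updates are written to the subprocess's STDOUT and forwarded by the main process to the caller.
   - Messages use `gob` encoding for efficient binary transfer.
3. **Memory Management Benefits**
   - Memory consumed during the scan belongs to the subprocess and is released when it exits. This avoids leaks or retention in the long-running main process.
4. **Error Handling**
   - When the subprocess completes, successfully or with an error, the external scanner returns the aggregated warnings and errors to the caller.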
## Phase 1: Folders Scan (`phase_1_folders.go`)

This phase handles the initial traversal and media file processing.
```mermaid
flowchart TD
    A[Start Phase 1] --> B{Full Scan?}
    B -- Yes --> C[Scan All Folders]
    B -- No --> D[Scan Modified Folders]
    C --> E[Read File Metadata]
    D --> E
    E --> F[Create Artists]
    E --> G[Create Albums]
    F --> H[Save to Database]
    G --> H
    H --> I[Mark Missing Folders]
    I --> J[End Phase 1]
```
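Technical implementation details:

1. **Folder Traversal**
   - Uses `walkDirTree` to traverse the directory structure.
   - Honors `.ndignore` files for exclusions.
2. **Metadata Extraction**
   - Processes files in batches (`filesBatchSize = 200`).
   - Builds `MediaFile` objects from the extracted metadata.
3. **Album and Artist Creation**
   - Creates album and artist records from the metadata of the scanned files.
4. **Database Persistence**
   - Saves the results to the database and marks folders that no longer exist as missing.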
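Batching with a fixed size such as `filesBatchSize = 200` can be expressed with a small generic helper. The `chunk` function below is an illustrative sketch, not the helper Navidrome uses.

```go
package main

import "fmt"

const filesBatchSize = 200

// chunk splits items into consecutive batches of at most size elements.
// Batches share the input's backing array (no copying); the full slice
// expression caps each batch so appending to it cannot overwrite the next.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		panic("chunk: size must be positive")
	}
	var batches [][]T
	for len(items) > size {
		batches = append(batches, items[:size:size])
		items = items[size:]
	}
	if len(items) > 0 {
		batches = append(batches, items)
	}
	return batches
}

func main() {
	files := make([]string, 450)
	for i := range files {
		files[i] = fmt.Sprintf("track%03d.flac", i)
	}
	for _, batch := range chunk(files, filesBatchSize) {
		// In the scanner, each batch would be read for metadata here.
		fmt.Println(len(batch), batch[0])
	}
}
```

Running it prints `200 track000.flac`, `200 track200.flac` and `50 track400.flac`.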
## Phase 2: Missing Tracks (`phase_2_missing_tracks.go`)

This phase identifies tracks that have moved or been deleted.
```mermaid
flowchart TD
    A[Start Phase 2] --> B[Load Libraries]
    B --> C[Get Missing and Matching Tracks]
    C --> D[Group by PID]
    D --> E{Match Type?}
    E -- Exact --> F[Update Path]
    E -- Same PID --> G[Update If Only One]
    E -- Equivalent --> H[Update If No Better Match]
    F --> I[End Phase 2]
    G --> I
    H --> I
```
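Technical implementation details:

1. **Track Identification Strategy**
   - Missing tracks and their potential matches are grouped by persistent ID (PID), whose format is configured by `PID.Track`.
2. **Match Analysis**
   - **Exact** match: the track's path is updated.
   - **Same PID** match: the track is updated only if there is a single candidate.
   - **Equivalent** match: the track is updated only if no better match is found.
3. **Database Update Strategy**
   - A missing track is updated in the database only when its match is unambiguous.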
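The "update only if there is a single candidate" rule for same-PID matches can be sketched as follows. `Track`, `groupByPID` and `matchMoved` are hypothetical simplifications. Requiring a one-to-one match on *both* sides is an assumption of this sketch. Exact and equivalent matches are omitted.

```go
package main

import "fmt"

// Track is a hypothetical, minimal track record.
type Track struct {
	ID   string
	PID  string
	Path string
}

// groupByPID indexes tracks by their persistent ID.
func groupByPID(tracks []Track) map[string][]Track {
	groups := make(map[string][]Track)
	for _, t := range tracks {
		groups[t.PID] = append(groups[t.PID], t)
	}
	return groups
}

// matchMoved returns missing-track ID -> new path for every PID shared by
// exactly one missing track and exactly one newly found track. Ambiguous
// PIDs (several tracks on either side) are left untouched.
func matchMoved(missing, found []Track) map[string]string {
	foundByPID := groupByPID(found)
	moves := make(map[string]string)
	for pid, ms := range groupByPID(missing) {
		if fs := foundByPID[pid]; len(ms) == 1 && len(fs) == 1 {
			moves[ms[0].ID] = fs[0].Path
		}
	}
	return moves
}

func main() {
	missing := []Track{
		{ID: "1", PID: "pid-a", Path: "old/a.flac"},
		{ID: "2", PID: "pid-b", Path: "old/b.flac"},
	}
	found := []Track{
		{ID: "3", PID: "pid-a", Path: "new/a.flac"},
		{ID: "4", PID: "pid-b", Path: "new/b1.flac"},
		{ID: "5", PID: "pid-b", Path: "new/b2.flac"}, // two candidates: ambiguous
	}
	fmt.Println(matchMoved(missing, found)) // map[1:new/a.flac]
}
```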
## Phase 3: Refresh Albums (`phase_3_refresh_albums.go`)

This phase updates album information based on the latest track metadata.
```mermaid
flowchart TD
    A[Start Phase 3] --> B[Load Touched Albums]
    B --> C[Filter Unmodified]
    C --> D{Changes Detected?}
    D -- Yes --> E[Refresh Album Data]
    D -- No --> F[Skip]
    E --> G[Update Database]
    F --> H[End Phase 3]
    G --> H
    H --> I[Refresh Statistics]
```
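Technical implementation details:

1. **Album Selection Logic**
   - Loads the albums touched during the scan and filters out those that were not modified.
2. **Change Detection**
   - An album's data is refreshed and saved only when changes are detected; otherwise it is skipped.
3. **Statistics Refreshing**
   - Statistics are refreshed after the album updates complete.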
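The "refresh only when changes are detected" step could work like this: recompute an album's aggregate fields from its tracks, then keep it for a database update only if the result differs from the stored values. The `Song`/`Album` types, their fields, and the `recompute`/`refreshTouched` helpers are illustrative assumptions.

```go
package main

import "fmt"

// Song and Album are hypothetical, minimal models.
type Song struct {
	AlbumID  string
	Duration float64 // seconds
}

type Album struct {
	ID        string
	SongCount int
	Duration  float64
}

// recompute derives an album's aggregate fields from its songs.
func recompute(a Album, songs []Song) Album {
	a.SongCount = 0
	a.Duration = 0
	for _, s := range songs {
		if s.AlbumID == a.ID {
			a.SongCount++
			a.Duration += s.Duration
		}
	}
	return a
}

// refreshTouched returns only the albums whose recomputed data changed,
// i.e. the ones that actually need a database update.
func refreshTouched(touched []Album, songs []Song) []Album {
	var changed []Album
	for _, old := range touched {
		if updated := recompute(old, songs); updated != old {
			changed = append(changed, updated)
		}
	}
	return changed
}

func main() {
	touched := []Album{
		{ID: "a1", SongCount: 2, Duration: 400},
		{ID: "a2", SongCount: 1, Duration: 180},
	}
	songs := []Song{
		{AlbumID: "a1", Duration: 200}, {AlbumID: "a1", Duration: 200},
		{AlbumID: "a2", Duration: 180}, {AlbumID: "a2", Duration: 240}, // a2 gained a track
	}
	for _, a := range refreshTouched(touched, songs) {
		fmt.Printf("%s: %d songs, %.0fs\n", a.ID, a.SongCount, a.Duration) // a2: 2 songs, 420s
	}
}
```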
## Phase 4: Playlist Import (`phase_4_playlists.go`)

This phase imports and updates playlists from the file system.
```mermaid
flowchart TD
    A[Start Phase 4] --> B{AutoImportPlaylists?}
    B -- No --> C[Skip]
    B -- Yes --> D{Admin User Exists?}
    D -- No --> E[Log Warning & Skip]
    D -- Yes --> F[Load Folders with Playlists]
    F --> G{For Each Folder}
    G --> H[Read Directory]
    H --> I{For Each Playlist}
    I --> J[Import Playlist]
    J --> K[Pre-cache Artwork]
    K --> L[End Phase 4]
    C --> L
    E --> L
```
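Technical implementation details:

1. **Playlist Discovery**
   - Loads the folders that contain playlists and reads each directory. The locations searched are controlled by `PlaylistsPath`.
2. **Import Process**
   - Imports each playlist found and pre-caches its artwork.
3. **Configuration Awareness**
   - Runs only when `AutoImportPlaylists` is enabled. If no admin user exists, it logs a warning and skips the import.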
## Final Steps

After the four main phases, several finalization steps occur:

1. **Garbage Collection**
2. **Statistics Refresh**
3. **Library Status Update**
4. **Database Optimization**
## File System Watcher

The watcher system (`watcher.go`) provides real-time monitoring of file system changes:
```mermaid
flowchart TD
    A[Start Watcher] --> B[For Each Library]
    B --> C[Start Library Watcher]
    C --> D[Monitor File Events]
    D --> E{Change Detected?}
    E -- Yes --> F[Wait for More Changes]
    F --> G{Time Elapsed?}
    G -- Yes --> H[Trigger Scan]
    G -- No --> F
    H --> I[Wait for Scan Completion]
    I --> D
```
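Technical implementation details:

1. **Event Throttling**
   - After a change is detected, the watcher keeps collecting further changes. It triggers a scan only once the `Scanner.WatcherWait` delay has elapsed, then waits for that scan to complete before resuming monitoring.
2. **Library-specific Watching**
   - A separate watcher is started for each library.
3. **Platform Adaptability**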
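The scanner keeps album identity stable across scans. Album persistent IDs are derived from the tags listed in the `PID.Album` setting, which therefore determines how tracks are grouped into albums.

Moved files are identified by matching missing tracks against candidate tracks using their persistent IDs (configured by `PID.Track`), as described in Phase 2.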
If a scan is interrupted:
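Memory usage is kept in check in two ways. Files are processed in batches (`filesBatchSize = 200`). When `DevExternalScanner` is enabled (the default), the scan also runs in a separate process.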
## Concurrency Model

The scanner uses a concurrency model designed to optimize performance:

1. **Phase-Level Parallelism:** Phases 3 and 4 run in parallel using the `chain.RunParallel()` function.
2. **Within-Phase Concurrency:** Pipeline stages can process items concurrently. For example, `phase_1_folders.go` processes folders concurrently: `ppl.NewStage(p.processFolder, ppl.Name("process folder"), ppl.Concurrency(conf.Server.DevScannerThreads))`
3. **Pipeline Architecture Benefits**
4. **Thread Safety Mechanisms**
## Configuration

The scanner's behavior can be customized through several configuration settings that directly affect its operation:

**Core scanner options**

| Setting | Description | Default |
|---|---|---|
| `Scanner.Enabled` | Whether the automatic scanner is enabled | `true` |
| `Scanner.Schedule` | Cron expression or duration for scheduled scans (e.g., `"@daily"`) | `"0"` (disabled) |
| `Scanner.ScanOnStartup` | Whether to scan when the server starts | `true` |
| `Scanner.WatcherWait` | Delay before triggering a scan after file changes are detected | `5s` |
| `Scanner.ArtistJoiner` | String used to join multiple artists in track metadata | `" • "` |

**Playlist options**

| Setting | Description | Default |
|---|---|---|
| `PlaylistsPath` | Path(s) to search for playlists (supports glob patterns) | `""` |
| `AutoImportPlaylists` | Whether to import playlists during scanning | `true` |

**Performance options**

| Setting | Description | Default |
|---|---|---|
| `DevExternalScanner` | Use an external process for scanning (reduces memory issues) | `true` |
| `DevScannerThreads` | Number of concurrent processing threads during scanning | `5` |

**Persistent ID options**

| Setting | Description | Default |
|---|---|---|
| `PID.Track` | Format for track persistent IDs (critical for tracking moved files) | `"musicbrainz_trackid\|albumid,discnumber,tracknumber,title"` |
| `PID.Album` | Format for album persistent IDs (affects album grouping) | `"musicbrainz_albumid\|albumartistid,album,albumversion,releasedate"` |
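One reading of the `PID.Track` default is that `|` separates alternatives and `,` joins fields. Under that reading, the PID is `musicbrainz_trackid` when present, and otherwise a combination of `albumid`, `discnumber`, `tracknumber` and `title`. The sketch below implements this interpretation, which is an assumption, not confirmed behavior. The `computePID` helper, the `/` separator, and the rule that an alternative wins if any of its fields is non-empty are all illustrative choices. The real implementation may normalize or hash the values.

```go
package main

import (
	"fmt"
	"strings"
)

// computePID is a sketch of one plausible reading of the PID format:
// alternatives are separated by "|" and tried in order; within an
// alternative, comma-separated fields are joined with "/" (an arbitrary
// separator for this sketch). The first alternative with at least one
// non-empty field wins. Assumed semantics, not Navidrome's code.
func computePID(format string, tags map[string]string) string {
	for _, alt := range strings.Split(format, "|") {
		fields := strings.Split(alt, ",")
		values := make([]string, len(fields))
		found := false
		for i, f := range fields {
			values[i] = tags[strings.TrimSpace(f)]
			if values[i] != "" {
				found = true
			}
		}
		if found {
			return strings.Join(values, "/")
		}
	}
	return ""
}

func main() {
	const trackPID = "musicbrainz_trackid|albumid,discnumber,tracknumber,title"
	withMBID := map[string]string{"musicbrainz_trackid": "mbid-123", "title": "Intro"}
	withoutMBID := map[string]string{"albumid": "al-9", "discnumber": "1", "tracknumber": "3", "title": "Intro"}
	fmt.Println(computePID(trackPID, withMBID))    // mbid-123
	fmt.Println(computePID(trackPID, withoutMBID)) // al-9/1/3/Intro
}
```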
These options can be set in the Navidrome configuration file (e.g., `navidrome.toml`) or via environment variables with the `ND_` prefix (e.g., `ND_SCANNER_ENABLED=false`). For environment variables, dots in option names are replaced with underscores.
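The environment-variable naming rule can be written as a one-line helper: add the `ND_` prefix and replace dots with underscores. Upper-casing the result follows the `ND_SCANNER_ENABLED` example. `envVarName` is a hypothetical helper for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName converts a config option name to its environment variable form:
// "ND_" prefix, dots replaced with underscores, upper-cased.
func envVarName(option string) string {
	return "ND_" + strings.ToUpper(strings.ReplaceAll(option, ".", "_"))
}

func main() {
	fmt.Println(envVarName("Scanner.Enabled"))     // ND_SCANNER_ENABLED
	fmt.Println(envVarName("Scanner.WatcherWait")) // ND_SCANNER_WATCHERWAIT
	fmt.Println(envVarName("AutoImportPlaylists")) // ND_AUTOIMPORTPLAYLISTS
}
```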
The Navidrome scanner represents a sophisticated system for efficiently managing music libraries. Its phase-based pipeline architecture, careful handling of edge cases, and performance optimizations allow it to handle libraries of significant size while maintaining data integrity and providing a responsive user experience.