# log-analyzer

A log analysis system for Spacedrive that parses, pattern-matches, and collapses repetitive log entries into queryable insights.
## Quick Start

For most debugging tasks, start with the phase summary:

```bash
cargo run --example analyze_sync_log --features cli -- phases test.log --duration 5
```

This aggregates 85,000+ interleaved log lines into 5-second windows showing operation counts, which is ideal for understanding sync flow.
## Library Usage

```rust
use log_analyzer::LogAnalyzer;

let analyzer = LogAnalyzer::from_file("test.log")?;

// Phase-based summary (most useful)
let summary = analyzer.generate_phase_summary(5)?; // 5s windows
println!("{}", summary);

// Statistics
println!("Compression: {:.1}%", analyzer.compression_ratio() * 100.0);
```
## CLI Commands

```bash
# Phase summary (recommended - shows "what happened when")
cargo run --example analyze_sync_log --features cli -- phases test.log --duration 5

# Statistics (quick overview)
cargo run --example analyze_sync_log --features cli -- stats test.log

# Condensed timeline (preserves chronological order)
cargo run --example analyze_sync_log --features cli -- condense test.log

# Full analysis with markdown report
cargo run --example analyze_sync_log --features cli -- analyze test.log

# Export to JSON with database
cargo run --example analyze_sync_log --features cli -- analyze test.log \
    --format json \
    --output analysis.json \
    --database analysis.db
```
## How It Works

### Parsing

Extracts structured components from each log line:

```text
2025-11-16T07:19:57.232531Z DEBUG ThreadId(02) sd_core::sync::peer: Recorded ACK peer=1817e146
                    ↓
{
  timestamp: 2025-11-16T07:19:57.232531Z,
  level: DEBUG,
  thread: ThreadId(02),
  module: "sd_core::sync::peer",
  message: "Recorded ACK peer=1817e146"
}
```
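Conceptually, this step can be sketched as follows. The `parse_line` helper below is hypothetical and simply splits on whitespace; the crate's real parser is more robust:

```rust
// Hypothetical sketch of the parsing step: split a log line into its
// structured components on whitespace (the real parser handles more cases).
#[derive(Debug, PartialEq)]
struct LogEntry {
    timestamp: String,
    level: String,
    thread: String,
    module: String,
    message: String,
}

fn parse_line(line: &str) -> Option<LogEntry> {
    let mut parts = line.splitn(5, ' ');
    Some(LogEntry {
        timestamp: parts.next()?.to_string(),
        level: parts.next()?.to_string(),
        thread: parts.next()?.to_string(),
        // The module field ends with a ':' separator in the raw line.
        module: parts.next()?.trim_end_matches(':').to_string(),
        message: parts.next()?.to_string(),
    })
}

fn main() {
    let entry = parse_line(
        "2025-11-16T07:19:57.232531Z DEBUG ThreadId(02) sd_core::sync::peer: Recorded ACK peer=1817e146",
    )
    .unwrap();
    assert_eq!(entry.level, "DEBUG");
    assert_eq!(entry.module, "sd_core::sync::peer");
    assert_eq!(entry.message, "Recorded ACK peer=1817e146");
}
```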
### Pattern Detection

Identifies templates by comparing similar messages:

```text
Log A: "Recorded ACK peer=1817e146"
Log B: "Recorded ACK peer=8ef7a321"
        ↓
Template: "Recorded ACK peer={UUID}"
```
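A minimal sketch of this masking step, assuming a hypothetical `to_template` helper that treats any all-hex `key=value` token as a variable slot:

```rust
// Hypothetical sketch of template extraction: mask variable tokens so
// similar messages collapse to one template.
fn to_template(message: &str) -> String {
    message
        .split(' ')
        .map(|tok| match tok.split_once('=') {
            // Treat an all-hex value as a variable slot.
            Some((key, val)) if !val.is_empty() && val.chars().all(|c| c.is_ascii_hexdigit()) => {
                format!("{key}={{UUID}}")
            }
            _ => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let a = to_template("Recorded ACK peer=1817e146");
    let b = to_template("Recorded ACK peer=8ef7a321");
    // Both messages reduce to the same template.
    assert_eq!(a, b);
    assert_eq!(a, "Recorded ACK peer={UUID}");
}
```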
### Collapsing

Groups consecutive instances with statistics:

```text
Template: "Recorded ACK peer={UUID}"
Count: 1992
Duration: 96ms
Variables:
  - UUID: 1992 unique values
```
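The grouping step can be sketched as a run-length encoding over templates. The `collapse` helper below is hypothetical; the real engine also tracks durations:

```rust
use std::collections::HashSet;

// Hypothetical sketch of the collapse step: run-length-encode consecutive
// identical templates, tracking how many distinct variable values appeared.
// Input: (template, variable value) pairs in log order.
// Output: (template, count, unique variable values) per consecutive group.
fn collapse(entries: &[(&str, &str)]) -> Vec<(String, usize, usize)> {
    let mut groups: Vec<(String, usize, HashSet<String>)> = Vec::new();
    for &(template, var) in entries {
        match groups.last_mut() {
            Some(g) if g.0 == template => {
                g.1 += 1;
                g.2.insert(var.to_string());
            }
            _ => {
                let mut vars = HashSet::new();
                vars.insert(var.to_string());
                groups.push((template.to_string(), 1, vars));
            }
        }
    }
    groups.into_iter().map(|(t, n, v)| (t, n, v.len())).collect()
}

fn main() {
    let entries = [
        ("Recorded ACK peer={UUID}", "1817e146"),
        ("Recorded ACK peer={UUID}", "8ef7a321"),
        ("Recorded ACK peer={UUID}", "1817e146"),
        ("Commit done", ""),
    ];
    let collapsed = collapse(&entries);
    // 3 consecutive ACKs with 2 unique peer IDs, then 1 commit.
    assert_eq!(collapsed[0], ("Recorded ACK peer={UUID}".to_string(), 3, 2));
    assert_eq!(collapsed.len(), 2);
}
```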
### Sequence Detection

Identifies repeating multi-step patterns:

```text
Pattern: [Commit, Emit Event]
Detected: This 2-step sequence repeats 1,000 times
Result: 2000 groups → 1 sequence (compression!)
```
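A simplified illustration of cycle detection, assuming a hypothetical `detect_cycle` that is handed the pattern width; the real detector also has to discover that width:

```rust
// Hypothetical sketch of sequence detection: check whether a stream of
// collapsed groups is a fixed-width repeating cycle, and if so return
// how many times the cycle repeats.
fn detect_cycle(groups: &[&str], width: usize) -> Option<usize> {
    if width == 0 || groups.len() % width != 0 {
        return None;
    }
    let pattern = &groups[..width];
    for chunk in groups.chunks(width) {
        if chunk != pattern {
            return None;
        }
    }
    Some(groups.len() / width)
}

fn main() {
    // [Commit, Emit Event] repeated 3 times collapses to one sequence.
    let groups = ["Commit", "Emit Event", "Commit", "Emit Event", "Commit", "Emit Event"];
    assert_eq!(detect_cycle(&groups, 2), Some(3));
    // The same stream is not a repeating 3-step cycle.
    assert_eq!(detect_cycle(&groups, 3), None);
}
```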
### Phase Summary

Aggregates operations into time windows for a high-level view of the flow:

```bash
cargo run --example analyze_sync_log --features cli -- phases test.log --duration 5
```

Output:

```text
## 09:15:28 → 09:15:33 (5s, 12,450 events)

### Key Operations
  [4,028×] transaction: Committing device-owned data (entry)
  [3,829×] sync_transport: [MockTransport] Delivering message to target
  [3,829×] peer: State change applied successfully (entry)
  [2,015×] peer: Broadcasting shared change (content_identity)
  [1,936×] peer: Shared change applied successfully
  [  200×] dependency: Added dependency tracking

### By Module
  sd_core::service::sync: 8,450 events
  sd_core::infra::sync: 2,100 events
  sync_realtime_integration_test::helpers: 1,900 events
```
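The windowing itself can be sketched like this, with a hypothetical `phase_counts` helper; timestamps are reduced to seconds-since-start for simplicity:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of phase aggregation: bucket events into fixed-size
// windows (keyed by the window's start second) and count each operation.
fn phase_counts(
    events: &[(u64, &str)], // (seconds since start, operation)
    window: u64,
) -> BTreeMap<u64, BTreeMap<String, usize>> {
    let mut phases: BTreeMap<u64, BTreeMap<String, usize>> = BTreeMap::new();
    for &(t, op) in events {
        let bucket = (t / window) * window;
        *phases
            .entry(bucket)
            .or_default()
            .entry(op.to_string())
            .or_default() += 1;
    }
    phases
}

fn main() {
    let events = [(0, "commit"), (1, "commit"), (3, "emit"), (6, "commit")];
    let phases = phase_counts(&events, 5);
    // Seconds 0-4 fall in the first window, second 6 in the next.
    assert_eq!(phases[&0]["commit"], 2);
    assert_eq!(phases[&0]["emit"], 1);
    assert_eq!(phases[&5]["commit"], 1);
}
```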
## Database Queries

Save results to SQLite for advanced queries:

```sql
-- Query sequences
SELECT id, repetitions, description, group_count
FROM sequences
ORDER BY repetitions DESC;

-- Query templates
SELECT template_id, COUNT(*) AS count
FROM log_instances
WHERE timestamp BETWEEN '07:19:57' AND '07:20:00'
GROUP BY template_id;
```
## Architecture

```text
Log File → Parser → Pattern Detector → Collapse Engine → Sequence Detector → Database
                                                                                ↓
                                                                      Analysis & Reports
```
## Examples

See the `examples/` directory:

- `simple.rs` - Basic library usage
- `analyze_sync_log.rs` - Full-featured CLI tool

Run the tests with:

```bash
cargo test -p log-analyzer
```
## Output Modes

The samples below are real-world results from Spacedrive sync test logs, including example sequences detected.
### Phase Summary (`phases`)

Shows aggregated operations per time window; answers "what happened when?"

```text
## 09:15:37 → 09:15:42 (5s, 46,581 events)
  [5,374×] Delivering message to target
  [3,576×] State change sent successfully
  [1,798×] Shared change applied successfully
  [1,798×] Broadcasting shared change
  [1,798×] ACK sent successfully
```

Use when: debugging sync flow, understanding system behavior.
### Condensed Timeline (`condense`)

Shows collapsed sequences while preserving chronological order; answers "in what sequence?"

```text
[999× SEQUENCE] 09:15:28 → 09:15:38 (2 steps, 10s total)
  Step 1: Committing device-owned data (entry)
  Step 2: Sync event emitted (StateChange)
09:15:38 Location created successfully
[100×] 09:15:38 Indexing jobs still running (200ms)
```

Use when: timeline analysis, when event ordering matters.
### Full Report (`analyze`)

Produces Markdown or JSON with complete statistics and templates.

Use when: sharing analysis, generating documentation.
Instead of manually scrolling through 85,000 log lines, you get:

- **Phase summary** - aggregated operation counts per time window
- **Condensed timeline** - collapsed sequences in chronological order
```sql
-- Find the most frequently repeating sequences
SELECT
    s.id,
    s.repetitions,
    s.description,
    s.group_count,
    s.template_sequence
FROM sequences s
ORDER BY s.repetitions DESC
LIMIT 10;
```
See `/docs/core/design/LOG_ANALYSIS_SYSTEM.md` for the complete design specification.