docs/long-term-plans/multi-client-file-sync-reliability.md
Status: Planned
## The problem

The single-file approach (`sync-data.json`) has these specific weaknesses when multiple clients sync simultaneously:

1. **Single retry on conflict.** `_uploadWithRetry()` (`file-based-sync-adapter.service.ts:474`) retries exactly once on rev mismatch. With 3+ clients syncing at similar intervals, the retry can also fail — the second upload attempt has no fallback.
2. **Long upload window.** The upload cycle is: download → read state snapshot → merge ops → encrypt → compress → upload. This can take seconds (especially with large state + archives). Any other client uploading during that window causes a conflict.
3. **Large uploads.** Every upload includes the complete application state (line 452: `getStateSnapshot()`), both archives, and 500 recent ops. This makes the file large and the upload slow, widening the race window.
4. **Coarse WebDAV revisions.** WebDAV uses `lastmod` (seconds resolution) as the revision. Two uploads within the same second can't be distinguished. The `syncVersion` counter inside the file compensates, but only if the file is actually re-downloaded between attempts.
5. **No compare-and-swap for local files.** For local file sync (Electron/Android), there's no server-side compare-and-swap. The rev is an MD5 hash computed client-side, but the read-modify-write is not atomic.
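For illustration, the client-side rev for local file sync can be sketched like this (the helper name is made up; only the MD5-of-content idea comes from the text above):

```typescript
import { createHash } from 'node:crypto';

// Client-side revision as described above: an MD5 hash of the file
// content. The hash itself is not the problem: read, compare rev, and
// write are three separate steps with no lock, so two clients can both
// pass the compare before either one writes.
function computeRev(content: string): string {
  return createHash('md5').update(content, 'utf8').digest('hex');
}
```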
It works reasonably well for 2 clients, but gets fragile with 3+ clients or short sync intervals: the single retry isn't enough, and the large file size makes uploads slow.
## Level 1: Retry + backoff

What: Fix the most obvious weaknesses without changing the storage model.
Changes to `file-based-sync-adapter.service.ts`:
- Retry loop with exponential backoff instead of single retry
  - `_uploadWithRetry()` with a loop: attempt up to 3-5 times
- Lock file before upload (optional, for providers that support it)
  - `sync.lock` file with client ID + timestamp before uploading (a similar pattern already exists as `migration.lock` in the codebase)
- WebDAV: use ETag headers instead of `lastmod` for revision
- Pros: Minimal code change, backward compatible, no migration needed
- Cons: Still fundamentally limited — single file remains the bottleneck
- Reliability improvement: Good enough for 3-4 clients with reasonable sync intervals (2+ minutes)
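A sketch of what the retry loop could look like (`uploadOnce`, `RevMismatchError`, and the tuning constants are placeholders, not the actual adapter API):

```typescript
// Sketch of a retry loop with exponential backoff + jitter, replacing
// the single retry in _uploadWithRetry(). All names and constants here
// are illustrative.

class RevMismatchError extends Error {}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function uploadWithBackoff(
  uploadOnce: () => Promise<string>, // resolves to the new rev
  maxAttempts = 5,
  baseDelayMs = 250,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await uploadOnce();
    } catch (e) {
      if (!(e instanceof RevMismatchError) || attempt === maxAttempts - 1) {
        throw e;
      }
      // Full jitter: wait 0..base*2^attempt ms. The randomness
      // de-synchronizes clients whose sync cycles started together.
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
  throw new Error('unreachable');
}
```

The jitter matters as much as the exponential growth: without it, clients that conflicted once tend to retry in lockstep and conflict again.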
## Level 2: Snapshot + ops log

What: Split into two files — a state snapshot (updated infrequently) and an operations log (updated every sync). This reduces contention because most sync cycles only touch the ops file.
Storage structure:
- `sync-data.json` → state snapshot (updated every Nth sync or on demand)
- `sync-ops.jsonl` → append-only operation log (updated every sync)
- `sync-meta.json` → vector clock + `syncVersion` + metadata
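The tiny meta file could be typed roughly like this (only `syncVersion` and the vector clock are named above; the extra field and the helper are assumptions):

```typescript
// Rough shape for sync-meta.json. `syncVersion` and the vector clock
// come from the plan; `lastSnapshotSyncVersion` and bumpMeta() are
// illustrative assumptions.

interface SyncMeta {
  syncVersion: number; // bumped on every upload, as today
  vectorClock: Record<string, number>; // clientId -> highest seq seen
  lastSnapshotSyncVersion: number; // when sync-data.json was last rewritten
}

function bumpMeta(meta: SyncMeta, clientId: string, seq: number): SyncMeta {
  // Pure update: returns a new object so callers can retry safely.
  return {
    ...meta,
    syncVersion: meta.syncVersion + 1,
    vectorClock: { ...meta.vectorClock, [clientId]: seq },
  };
}
```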
How it works:
- Upload: append new ops to `sync-ops.jsonl`. This is smaller and faster than rewriting the full state.
- Download: fetch `sync-ops.jsonl`, filter to new ops. Fast because it's just the ops, not the full state.
- Periodically (every Nth sync): rewrite `sync-data.json` with current state and reset `sync-ops.jsonl`.
- `sync-meta.json` has the `syncVersion` counter. Only contested during uploads, and the file is tiny (fast upload → small race window).

The key insight: Most sync cycles don't need to touch the large state file at all. Ops are small. Conflicts on a small file are rare and fast to resolve.
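The normal-vs-compaction decision above can be sketched as follows (the helper name, op shape, and compaction threshold are assumptions; only the file names come from the structure above):

```typescript
// Illustrative Level 2 upload decision: most cycles only touch the
// small ops file; the big state snapshot is rewritten every Nth sync.

interface SyncOp {
  clientId: string;
  seq: number;
  payload: unknown;
}

const COMPACT_EVERY_N_SYNCS = 20; // assumption: tune per provider

function filesToUpload(
  syncCount: number,
  newOps: SyncOp[],
): { file: string; mode: 'append' | 'replace' }[] {
  const uploads: { file: string; mode: 'append' | 'replace' }[] = [];
  if (newOps.length > 0) {
    // Normal cycle: append ops only (small, fast, narrow race window).
    uploads.push({ file: 'sync-ops.jsonl', mode: 'append' });
  }
  if (syncCount % COMPACT_EVERY_N_SYNCS === 0) {
    // Compaction cycle: rewrite the snapshot and reset the ops log.
    uploads.push({ file: 'sync-data.json', mode: 'replace' });
    uploads.push({ file: 'sync-ops.jsonl', mode: 'replace' });
  }
  // sync-meta.json carries the syncVersion counter either way.
  uploads.push({ file: 'sync-meta.json', mode: 'replace' });
  return uploads;
}
```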
- Pros: Significantly less contention, smaller uploads, backward-compatible migration path
- Cons: Three files to manage instead of one; append-only JSONL needs periodic compaction; providers that don't support append (Dropbox) would need to re-upload the ops file
- Reliability improvement: Handles 4-5+ concurrent clients well
Files to modify:
- `file-based-sync-adapter.service.ts` — split upload/download into ops-only and snapshot paths
- `file-based-sync.types.ts` — add new file type constants, ops file format
- Providers: add an `appendFile()` method (or just re-upload the ops file for providers that don't support append)

## Level 3: Per-client files

What: Each client writes only to its own files. Other clients only read. Zero write conflicts by design.
Storage structure:
```
sp-sync/
  clients/
    <client-id-A>/
      manifest.json               # Batch list + vector clock (unencrypted)
      ops/
        <timestamp>-<seq>.jsonl   # Immutable operation batch files
      snapshot.json               # This client's state snapshot (encrypted)
      snapshot-archive-young.json
      snapshot-archive-old.json
    <client-id-B>/
      manifest.json
      ops/
        ...
```
How it works:
- Upload: write a new immutable batch file to `clients/<myId>/ops/`, update `manifest.json`. Never modify another client's files.
- Download: read each peer's `manifest.json` → download new batch files by exact path.
- A new client downloads each peer's `snapshot.json` for initial state, then catches up with batch files.

Why it eliminates conflicts:
- `manifest.json` is the only mutable file per client, and only the owning client writes it
- Batch files are immutable once written: created, never modified

Implementation: This would be a new provider (not modifying existing file-based sync), implementing `OperationSyncCapable` directly. The existing `FileBasedSyncAdapterService` stays unchanged for users who don't need multi-client reliability.
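A sketch of the owning client's write path (the manifest shape and helper names are hypothetical; only the paths come from the structure above):

```typescript
// Hypothetical manifest shape and upload step for Level 3. The owning
// client is the only writer of these paths, so no compare-and-swap is
// needed anywhere.

interface ClientManifest {
  clientId: string;
  vectorClock: Record<string, number>;
  batchFiles: string[]; // paths under clients/<id>/ops/, newest last
}

function nextBatchPath(clientId: string, seq: number, nowMs: number): string {
  // Immutable, unique name: never overwritten once uploaded.
  return `clients/${clientId}/ops/${nowMs}-${seq}.jsonl`;
}

function appendBatch(
  manifest: ClientManifest,
  seq: number,
  nowMs: number,
): { manifest: ClientManifest; batchPath: string } {
  const batchPath = nextBatchPath(manifest.clientId, seq, nowMs);
  return {
    batchPath,
    manifest: {
      ...manifest,
      vectorClock: { ...manifest.vectorClock, [manifest.clientId]: seq },
      batchFiles: [...manifest.batchFiles, batchPath],
    },
  };
}
```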
- Pros: Zero contention, scales to any number of clients, works with folder sync tools
- Cons: More files to manage, needs directory listing support, biggest implementation effort, needs migration path
- Reliability improvement: Handles unlimited concurrent clients reliably
New files:
- `src/app/op-log/sync-providers/file-based/multi-client/multi-client-sync-adapter.service.ts`
- `src/app/op-log/sync-providers/file-based/multi-client/multi-client-sync.types.ts`
- `src/app/op-log/sync-providers/file-based/multi-client/multi-client-gc.service.ts`

Modified files:
- `provider.const.ts` — new provider ID (or config flag on existing providers)
- `provider-manager.service.ts` — register new provider
- `global-config.model.ts` — config for multi-client mode
- `sync-form.const.ts` — UI toggle or separate provider option

## Recommendation

Level 1 (retry + backoff) is a quick win worth doing regardless — it's a small change that makes the current system more robust.
Level 3 (per-client files) is the correct long-term solution if multi-client reliability is a priority. It also naturally enables Syncthing compatibility as a side effect. Level 2 is a half-measure that adds complexity without fully solving the problem.
The question is whether to go 1 → 3 (quick fix now, proper solution later) or straight to 3.
## Does Level 3 need `listFiles()`?

Yes, but only for peer discovery — and it can be minimized with a manifest approach.
Level 3 needs `listFiles()` for two things:

1. Listing the `clients/` directory to find other client IDs
2. Listing `clients/<peerId>/ops/` to find new operation batches

We can eliminate need #2 entirely with per-client manifest files. Each client updates its own `manifest.json` with the list of its batch files. Other clients read the manifest by exact path (`clients/<peerId>/manifest.json`) — no directory listing needed.
This reduces `listFiles()` to just peer discovery (listing `clients/` once to find new peers). Known peers are cached locally.
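With manifests, working out what to download from peers is a plain list difference over exact paths, with no directory listing involved (a sketch; shapes and names are assumptions):

```typescript
// "What do I download?" as a set difference over a peer's manifest.
// No listFiles() call needed for known peers.

interface PeerManifest {
  batchFiles: string[]; // exact paths under clients/<peerId>/ops/
}

function planDownloads(
  peerManifests: Record<string, PeerManifest>,
  applied: Set<string>, // batch files already merged locally
): string[] {
  const plan: string[] = [];
  for (const peerId of Object.keys(peerManifests)) {
    for (const path of peerManifests[peerId].batchFiles) {
      if (!applied.has(path)) plan.push(path);
    }
  }
  return plan;
}
```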
## Sync flow (minimal `listFiles()`)

First sync / peer discovery (needs `listFiles()` once):

1. `listFiles('clients/')` → discover peer directories
2. Download each peer's `manifest.json` → get their batch files + vector clock
3. Download each peer's `snapshot.json` for initial state

Normal sync cycle (no `listFiles()` needed):

1. Upload own batch file + update own `manifest.json`
2. Download each known peer's `manifest.json` → download new batch files
3. `listFiles('clients/')` occasionally (every Nth cycle) to find new peers

## Could we avoid `listFiles()` entirely?

Alternatives considered:
- Each client registers itself via `register/<myId>.json`. Still needs listing `register/` to find peers.
- A shared `peers.json` listing all peers. Creates the shared-mutable-file problem we're trying to avoid.

Verdict: `listFiles()` is the cleanest solution. The missing implementations are trivial:

- Electron: `ipcMain.handle(IPC_FILE_SYNC_LIST_FILES, ...)` with `fs.readdirSync()` — ~10 lines
- Android: `DocumentFile.listFiles()` in Capacitor plugin — natural SAF capability

Implementing `listFiles()` is much simpler than designing a discovery mechanism that avoids it.
Level 3 needs `clients/<id>/ops/` directories to exist:

- Dropbox: `create_folder_v2` API (already available in the Dropbox API)
- Electron: `fs.mkdirSync(path, { recursive: true })` — add to IPC handler
- Android: `DocumentFile.createDirectory()` — add to Capacitor plugin

| Prerequisite | WebDAV | Dropbox | Electron | Android |
|---|---|---|---|---|
| `listFiles()` | exists | exists | needs IPC handler (~10 lines) | needs implementation |
| Directory creation | auto (MKCOL) | needs `createDir()` call | needs `mkdirSync()` call | needs `createDirectory()` call |
| `uploadFile()` to subdirs | works | works | works | works |
| `downloadFile()` from subdirs | works | works | works | works |
## Notes

- Piggybacking was removed from the file-based sync adapter. Remote ops are now discovered exclusively via `downloadOps()` on the next sync cycle, eliminating the stale piggyback bug and simplifying the upload path.
- `FileBasedSyncData` already has an unused `checksum?: string` field (line 83 in `file-based-sync.types.ts`). Could be leveraged for integrity verification in any level of improvement.
- Recent commit 87d884ed17 ("fix(sync): prevent recurring task duplication across clients") confirms multi-client sync issues are a real problem users hit, not just theoretical.
## Current state of `listFiles()`

The IPC event `FILE_SYNC_LIST_FILES` is defined in `ipc-events.const.ts:46` and exposed in `preload.ts:47-48`, but there is no `ipcMain.handle()` implementation in the Electron main process. So `listFiles()` is missing on both Android SAF and Electron LocalFile. WebDAV already has a working implementation (`webdav-api.ts`).