docs-src/docs/partial-sync.md
Partial sync is a replication pattern where a client only synchronizes a subset of the server's data instead of the full dataset. The subset is defined by a scope, for example a geographic region, a project, a tenant, a permission group, or a chunk of a game world. RxDB does not have a single "partial sync" switch; the pattern is built on top of the standard RxDB Sync Engine by running multiple replication states in parallel, each one filtered to a different scope, and starting or stopping them as the user's context changes.
This page explains what partial sync is, when to use it, and how to implement it correctly with RxDB.
A full sync, where every client downloads the entire collection, works well when the dataset is small or when every client truly needs every document. Partial sync becomes useful when one or more of the following is true:
The pattern shows up in very different products, but the shape of the solution stays the same: pick a scope id, run one replication per active scope, stop replications when the scope is no longer in use.
A voxel game (think Minecraft) divides the world into fixed-size chunks, where each chunk is a cube of blocks identified by its grid coordinates. The full world can be effectively infinite, so syncing every block to every client is not an option, both because of the amount of data and because the player only ever interacts with the blocks near them.
The natural scope is the chunk id. The client keeps one db.voxels collection and starts a replication for every chunk inside the player's render distance, for example a 5x5 area around the player. Each replication uses a replicationIdentifier like voxels-chunk-123 and a pull URL like /api/voxels/pull?chunkId=123. As the player walks, a new ring of chunks comes into range and an old ring leaves. The client starts replications for the new chunks and cancels replications for the chunks that fell out of range. Because every replication keeps its own checkpoint, walking back to a chunk visited five minutes ago only fetches the blocks that other players modified in the meantime instead of re-downloading the whole chunk.
The player can visit thousands of chunks per session, so old chunks should be removed from local storage when the scope is left to keep the synced dataset small.
In a multi-tenant SaaS product (think Notion, Linear, or a B2B CRM), every user belongs to one or more workspaces or tenants, and each workspace has its own documents, tasks, customers, or notes. A user only ever works inside one workspace at a time, and they must never see data from a workspace they have no access to. Downloading every workspace's data to the client would waste bandwidth and would also be a serious permissions problem.
The scope is the workspace id. When the user opens workspace acme, the client starts a replication with replicationIdentifier: 'workspace-acme' against an endpoint like /api/sync?workspaceId=acme. The server enforces, on every pull and push, that the authenticated user is allowed to access that workspace and only returns rows that belong to it. When the user switches to workspace globex, the client starts a second replication for globex and, depending on product behavior, either keeps the acme replication running in the background (so notifications and counts stay live) or cancels it to save resources. Because each workspace has its own replicationIdentifier, switching back to acme later resumes from the last checkpoint instead of re-syncing from scratch.
This is also a good fit for the cleanup plugin: when a user leaves a workspace permanently, removing the locally cached documents for that scope keeps the on-device footprint proportional to the workspaces the user actually uses.
A field-service or last-mile delivery app (think a technician dispatch tool or a courier app) gives each worker a list of jobs assigned to them for the day. The backend has work orders for thousands of technicians across many regions, but a single technician only needs the orders on their current route, plus a small buffer for jobs that might be reassigned to them. They also need full offline support, because the app is used inside basements, elevators, and rural areas with no signal.
The scope here can be the route id, the assigned-technician id, or even a geo region like a city or postal code. The client starts a replication identified by that scope, for example routes-2026-05-28-tech-42, and the server returns only the work orders matching it. When dispatch reassigns the technician to a different route mid-day, the client cancels the old replication and starts a new one. Updates the technician made offline (status changes, signatures, photos) sit in the local collection until the device gets signal again, at which point the active scope's replication pushes them up and resolves any conflicts.
Compared to a full sync of all work orders, this keeps the device's storage tiny, makes the initial load fast even on a weak connection, and removes any risk of one technician seeing another technician's jobs.
Partial sync in RxDB is built on three properties of the Sync Engine:
replicateRxCollection() returns an independent RxReplicationState that can be started and cancelled at any time.replicationIdentifier. RxDB stores a separate checkpoint per identifier, so two replications never share progress and never overwrite each other's metadata.pull.handler is just a function. It can call any URL with any query parameters, so filtering the data by scope is done on the server using parameters that the client passes in.The pattern is: keep one RxCollection on the client (for example db.voxels) and attach multiple replication states to it. Each replication uses a different replicationIdentifier (for example voxels-chunk-123) and a pull/push handler that talks to a scoped server endpoint (for example /api/voxels?chunkId=123). When the user enters a new scope, you start a new replication. When they leave, you cancel the old one.
RxCollection: db.voxels
|
+------------------------+------------------------+
| | |
replicationState replicationState replicationState
id: chunk-123 id: chunk-124 id: chunk-125
pull: ?chunkId=123 pull: ?chunkId=124 pull: ?chunkId=125
All replication states share the same local storage, so queries against the collection return documents from every active scope at once. The user does not see a difference between data that came from one chunk or another, they just see the documents.
The minimal implementation has three parts:
startScopeReplication(scopeId) function that creates a new replication if one does not exist.stopScopeReplication(scopeId) function that cancels the replication when the scope is no longer needed.import { replicateRxCollection } from 'rxdb/plugins/replication';
const activeReplications: Record<string, RxReplicationState<any, any>> = {};
function startChunkReplication(chunkId: string) {
if (activeReplications[chunkId]) return activeReplications[chunkId];
const replicationState = replicateRxCollection({
collection: db.voxels,
// Unique per scope so RxDB tracks its own checkpoint for this slice.
replicationIdentifier: 'voxels-chunk-' + chunkId,
live: true,
pull: {
async handler(checkpoint, batchSize) {
const minUpdatedAt = checkpoint ? checkpoint.updatedAt : 0;
const res = await fetch(
`/api/voxels/pull?chunkId=${chunkId}` +
`&minUpdatedAt=${minUpdatedAt}&limit=${batchSize}`
);
const documents = await res.json();
return {
documents,
checkpoint: documents.length === 0
? checkpoint
: {
id: documents[documents.length - 1].id,
updatedAt: documents[documents.length - 1].updatedAt
}
};
},
// Tag every pulled document with the scope it came from.
// This is what makes per-scope cleanup possible later.
modifier: doc => ({ ...doc, chunkId })
},
push: {
async handler(changeRows) {
const res = await fetch(`/api/voxels/push?chunkId=${chunkId}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(changeRows)
});
// Must return an array of conflicts, or [] if there are none.
return res.json();
},
// Only push documents that belong to this scope.
// Returning null skips the document for this replication.
modifier: doc => doc.chunkId === chunkId ? doc : null
}
});
activeReplications[chunkId] = replicationState;
return replicationState;
}
async function stopChunkReplication(chunkId: string) {
const rep = activeReplications[chunkId];
if (!rep) return;
await rep.cancel();
delete activeReplications[chunkId];
}
// Reconcile active scopes whenever the player moves.
function onPlayerMove(neighboringChunkIds: string[]) {
neighboringChunkIds.forEach(startChunkReplication);
Object.keys(activeReplications).forEach(cid => {
if (!neighboringChunkIds.includes(cid)) {
stopChunkReplication(cid);
}
});
}
The server must understand the scope parameter and only return documents that belong to it. A typical pull endpoint:
chunkId, minUpdatedAt and limit from the request.limit documents from the scope, ordered by updatedAt, that have changed after minUpdatedAt.A typical push endpoint:
{ assumedMasterState, newDocumentState } rows.assumedMasterState to the current server state.newDocumentState. If they do not, returns the current server state as a conflict.This is the same contract as for a normal full RxDB replication, just with an extra scope filter applied to the query.
For real time updates, add a pull.stream$ that emits whenever the server has new writes in the scope. See Replication for the full handler contract.
Because each replication has its own replicationIdentifier, RxDB stores a separate checkpoint for every scope in its internal replication metadata. When the user leaves chunk-123 and comes back later, the new replication state for chunk-123 reuses the existing checkpoint and only fetches documents that changed since the last sync. The client never re-downloads the whole chunk after the first time.
This is the same "git-like" diff sync that the standard Sync Engine uses, applied independently per scope.
The pull.modifier runs on every pulled document before it is written to local storage. Use it to add a field like chunkId, projectId or tenantId that records which scope a document belongs to. The corresponding push.modifier can then return null for documents that do not belong to the current scope, so each replication only pushes the rows it is responsible for.
Tagging is what makes the next two things possible:
If the document already has a natural scope field on the server (for example chunkId is already part of the schema), the pull.modifier is not strictly needed, the field comes down with the document. The modifier is only required when the scope is implicit on the server side.
Cancelling a replication state stops the sync, but it does not remove the documents that were already written into the local collection. For long running apps, those documents accumulate. There are two patterns to deal with this:
Remove the documents from the local collection on scope exit. After stopChunkReplication(chunkId), run a local query that finds all documents tagged with that chunkId and remove them. Use RxCollection.find({ selector: { chunkId } }).remove() for a bulk delete. Be aware that this also pushes deletes back to the server through any replication that still considers those documents in scope, so make sure the push modifiers filter correctly.
Let the cleanup plugin reclaim space over time. Deleted documents are kept around for a configurable minimumDeletedTime so deletions can still replicate, then the plugin physically removes them. Set awaitReplicationsInSync: true to make sure cleanup only runs when all active replication states are caught up.
For purely local "evict from cache" semantics (drop the data without telling the server), use RxCollection.bulkRemove() and combine it with a push modifier that returns null for those rows, so the local delete is not propagated as a server delete.
live: true replication also needs a pull.stream$ observable that emits new server-side writes as they happen. With partial sync, each replication state should subscribe to a stream that is already filtered to its scope, otherwise every client receives events for every scope and ignores most of them. Typical implementations:
chunkId query parameter.When the stream emits a RESYNC event, RxDB falls back to the pull handler with the last checkpoint, which gives you a way to recover from missed events.
Partial sync is more flexible than full sync but adds complexity. Things to keep in mind:
replicationIdentifier keeps its own metadata. If a user opens thousands of distinct scopes, call replicationState.remove() instead of cancel() to clear the metadata for scopes you do not plan to return to.pull.stream$ defeats the bandwidth savings of partial sync.The voxel example is just a convenient way to picture the pattern. The same shape applies to many real applications:
In all of these, the building blocks are the same: a stable replicationIdentifier per scope, a scoped pull/push handler, a tag on every pulled document, and a clear rule for when a scope is started and stopped. With those four pieces, RxDB gives you the offline-first, conflict-aware, resumable replication of the standard Sync Engine, applied to exactly the slice of data the user actually needs.
A single full replication still downloads every document in the collection to the client, even if your queries only read a small subset. Partial sync moves the filter to the server side so the documents are never sent over the wire and never written to local storage in the first place. Queries on the client then operate on a smaller dataset, which is faster and uses less disk.
</details> <details> <summary>Can I run many replication states on the same collection at the same time?</summary>Yes. RxDB does not limit the number of replication states attached to a single RxCollection. Each one keeps its own checkpoint metadata under its replicationIdentifier. In practice the limit is the number of concurrent network connections and the server's ability to handle parallel pull requests. If you need hundreds of active scopes, consider combining several into one replication on the server side and using a composite scope id.
No. You typically use one endpoint with a scope parameter, for example /api/sync/pull?scope=acme. The handler passes the scope as a query parameter on each pull and push call. The server uses that parameter to filter rows.
The document is stored once in the local collection because the primary key is shared. Without care, both replications would try to push local changes for that document. Use a push.modifier that returns null for documents that do not belong to the current scope, so exactly one replication is responsible for pushing each document. For pulls, RxDB de-duplicates by primary key, so receiving the same document from two scopes is safe.
No. As long as you reuse the same replicationIdentifier, RxDB resumes from the last stored checkpoint and only fetches documents that changed since then. This is the same diff sync that the standard Sync Engine uses, applied per scope.
The server should reject the next pull or push for that scope. On the client, cancel the replication and remove the locally cached documents that belong to it. The cleanup plugin can handle the physical removal of soft-deleted rows over time.
</details> <details> <summary>Can partial sync use the real-time pull stream?</summary>Yes, and it should. Each replication's pull.stream$ must only emit events for its scope, otherwise every client receives events for every scope. Implementations include one WebSocket per scope, a shared WebSocket with server-side subscriptions per scope, or Server-Sent Events with a scope query parameter.
Start the new scoped replications with their own replicationIdentifier values. The first pull for each scope will see no checkpoint and fetch from the beginning of that scope. Documents already in the local collection from the old full sync are matched by primary key and updated in place. Once the new replications are in sync, cancel the old full replication and call .remove() on it to clear its checkpoint metadata.
Not always. If the server can derive the scope from the document's own fields (for example a chunkId or workspaceId column), no client-side tagging is needed. If the scope is implicit on the server side, use a pull.modifier to attach the scope id to each pulled document so the client can identify and clean up data per scope later.