website/blog/posts/2025-08-04-reliability-sprint.md
When AWS launched S3 in 2006 they didn’t lead with features — they led with eleven nines.
Our last quarter was our own “eleven‑nine sprint.” We set one goal: make ElectricSQL so boring‑reliable that you stop thinking about it and just build.
Everyone says "use boring software". How do you become the "boring software"? As it turns out, through lots of unglamorous work.
Electric is a Postgres-native, CDN-powered sync engine. We power the sync layer for companies all around the world.
Your sync layer just has to work. It's load-bearing infrastructure that drives critical data flows and people rightfully have database-level expectations for it.
We look to tools like S3 and Redis for inspiration.
S3 is a pretty simple idea. Read and write files in the cloud. Yet they made it extraordinary by delivering on the 11 9s promise and scaling it to essentially infinite capacity.
Redis is just a networked data structures server. But by scaling it to hundreds of thousands of reads/writes per second, it became essential glue in almost every backend.
Postgres-native sync only becomes extraordinary when scaled to huge numbers of subscribers, transactions/sec, and S3/PG levels of reliability.
We want sync to be magical infrastructure like S3 and Redis. So for four months we went heads‑down on the unglamorous work of reliability engineering, chasing every incident on Electric Cloud and every user-reported bug.
Companies like Trigger.dev have achieved 20,000 updates per second and sub-100ms latency using Electric.
Electric captures changes from Postgres via logical replication and streams them to clients over HTTP + JSON. This gives us:
The result is a system that handles 500GB+ of daily Postgres traffic while maintaining sub-100ms update latency. Our Electric Cloud syncs data to devices in over 100 countries every month.
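To make the HTTP + JSON streaming described above a little more concrete, here is a minimal Python sketch of how a client might sequence its requests for a single shape. It is an illustration under stated assumptions, not Electric's client implementation: `ShapeState`, `next_request_url` and `advance` are hypothetical names, and the `/v1/shape` path, the `table`/`offset`/`handle`/`live` query parameters and the `electric-*` response headers reflect our reading of the HTTP API and should be checked against the current docs.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class ShapeState:
    """Where a client is in a shape's log (hypothetical structure)."""
    table: str
    offset: str = "-1"            # assumed: "-1" requests the shape from the start
    handle: Optional[str] = None  # assumed: server-issued identifier for the shape
    up_to_date: bool = False      # True once the client has caught up


def next_request_url(base_url: str, state: ShapeState) -> str:
    """Build the URL for the next request in the sync loop."""
    params = {"table": state.table, "offset": state.offset}
    if state.handle is not None:
        params["handle"] = state.handle
    if state.up_to_date:
        # Caught up: long-poll so the server can hold the request until new data arrives.
        params["live"] = "true"
    return f"{base_url}/v1/shape?{urlencode(params)}"


def advance(state: ShapeState, status: int, headers: Mapping[str, str]) -> ShapeState:
    """Compute the next state from a response's status and (lowercased) headers."""
    if status == 409:
        # Assumed "must refetch" signal: restart from the beginning,
        # using the new handle if the server supplied one.
        return ShapeState(table=state.table, handle=headers.get("electric-handle"))
    if 200 <= status < 300:
        return ShapeState(
            table=state.table,
            offset=headers.get("electric-offset", state.offset),
            handle=headers.get("electric-handle", state.handle),
            up_to_date="electric-up-to-date" in headers,
        )
    # Any other status: keep the same position so the caller can retry.
    return state
```

Keeping the state transitions as pure functions means the same request is rebuilt identically after a network failure or a server restart, so a client can simply retry from its last known offset.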
Here's how we made Electric (almost) boring.
Electric now handles network failures gracefully:
- If IPv6 fails, we automatically retry on IPv4 (#2753)
- Dedicated connection prevents pool exhaustion deadlocks (#2682)
- If a slot falls too far behind, Electric recovers instead of halting (#2651)
- "Unable to connect to Postgres" instead of stack traces (#2654)
- #2866 forces a brutal kill‑and‑respawn, shrinking median recovery from 18 s → 1.2 s.

Electric adapts to your evolving database:
The silent schema drift: New tables could appear without Electric noticing, breaking shapes. Now we auto-refresh metadata when unknown relations appear (#2510).
- Detects and adapts when publications are altered externally (#2634)
- Prevents including computed columns that would break replication (#2507)
- Old shape queries continue working after upgrades (#2487)

Your clients stay connected through restarts:
- Long-poll requests survive Electric restarts (#2624)
- Individual shapes time out independently, preventing one slow shape from blocking others (#2575)
- Clear signal to clients when they need to refetch shapes (#2476)
- (#2576, #2531)

Better visibility into what Electric is doing:
- See "ReplicationClient" instead of anonymous PIDs in Observer (#2592)
- Connection blocks and backpressure now emit observable events (#2637)
- Dead processes no longer crash metrics collection (#2535, #2555)
- Only log real problems, not routine retries (#2684)

Preventing resource exhaustion at scale:
- Inactive shapes automatically expire from memory (#2514)
- Error responses (4xx/5xx) no longer pollute caches (#2501)
- (#2616)
- Race-free deletion of shape files (#2662)

Handling edge cases in Postgres replication:
- Fixed LSN persistence race conditions (#2470)
- Don't sync updates that change nothing (#2499)
- Fixed visibility of composite primary key changes (#2638)
- Periodic checks can't crash Electric (#2604, #2617)

We've killed a huge number of bugs. But we're not done if something doesn't work right for you.
So see something odd or unexpected? Please file an issue or chat with us over on Discord.
Thanks to everyone who filed issues. We've learned a lot with all of you.
P.S. We've learned some surprising things about performance along the way. Check back soon for some news about huge performance improvements.