Napkin Math

The goal of this project is to collect software, numbers, and techniques for quickly estimating the expected performance of systems from first principles. For example, how quickly can you read 1 GB of memory? By composing these resources you should be able to answer interesting questions like: how much should you expect to pay for log storage for an application doing 100,000 RPS?
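
That first question is a single division. A minimal sketch in Python, assuming the ~20 GiB/s single-threaded sequential-read figure from the table below:

```python
# Napkin estimate: how long does it take to read 1 GiB of memory?
# Assumes ~20 GiB/s single-threaded sequential read throughput.
GIB = 1024 ** 3

def read_time_seconds(bytes_to_read: float, throughput_bytes_per_s: float) -> float:
    """Time = amount / rate; keeping the units makes the answer self-checking."""
    return bytes_to_read / throughput_bytes_per_s

seconds = read_time_seconds(1 * GIB, 20 * GIB)
print(f"{seconds * 1e3:.0f} ms")  # 50 ms
```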

The best introduction to this skill is my talk at SREcon.

The best way to practise napkin math in the grand domain of computers is to work on your own problems. The second-best is to subscribe to this newsletter, where you'll get a problem to practise on every few weeks. Each should only take a few minutes to solve as your facility with these techniques improves.

The archive of practise problems is here. Each solution appears in the following newsletter.

Numbers

Below are numbers rounded for memorization, not faux precision. The rows this repo can currently refresh on a single host were re-measured and revalidated on fresh GCP c4-standard-48-lssd instances on March 8, 2026 (Intel Xeon 6985P-C, 48 vCPU / 24 physical cores, 180 GB RAM, Ubuntu 22.04.5 LTS).

Note 1: Some throughput and latency numbers don't line up; this is intentional, for ease of calculation.

Note 2: Take the numbers with a grain of salt. E.g. for I/O, fio is the state of the art. I continuously update these numbers as I learn more and as hardware improves, to keep them accurate.

| Operation                            | Latency | Throughput | 1 MiB  | 1 GiB  |
|--------------------------------------|---------|------------|--------|--------|
| Sequential Memory R/W (64 bytes)     | 0.5 ns  |            |        |        |
| ├ Single Thread                      |         | 20 GiB/s   | 50 μs  | 50 ms  |
| ├ Threaded                           |         | 200 GiB/s  | 5 μs   | 5 ms   |
| Network Same-Zone                    |         | 10 GiB/s   | 100 μs | 100 ms |
| ├ Inside VPC                         |         | 10 GiB/s   | 100 μs | 100 ms |
| ├ Outside VPC                        |         | 3 GiB/s    | 300 μs | 300 ms |
| Hashing, not crypto-safe (64 bytes)  | 10 ns   | 5 GiB/s    | 200 μs | 200 ms |
| Random Memory R/W (64 bytes)         | 20 ns   | 3 GiB/s    | 300 μs | 300 ms |
| Fast Serialization † [8] [9]         | N/A     | 1 GiB/s    | 1 ms   | 1 s    |
| Fast Deserialization † [8] [9]       | N/A     | 1 GiB/s    | 1 ms   | 1 s    |
| System Call                          | 300 ns  | N/A        | N/A    | N/A    |
| Hashing, crypto-safe (64 bytes)      | 100 ns  | 1 GiB/s    | 1 ms   | 1 s    |
| Sequential SSD read (8 KiB)          | 1 μs    | 8 GiB/s    | 100 μs | 100 ms |
| Context Switch [1] [2]               | 10 μs   | N/A        | N/A    | N/A    |
| Sequential SSD write, -fsync (8 KiB) | 2 μs    | 3 GiB/s    | 300 μs | 300 ms |
| TCP Echo Server (32 KiB)             | 50 μs   | 500 MiB/s  | 2 ms   | 2 s    |
| Decompression [11]                   | N/A     | 1 GiB/s    | 1 ms   | 1 s    |
| Compression [11]                     | N/A     | 500 MiB/s  | 2 ms   | 2 s    |
| Sequential SSD write, +fsync (8 KiB) | 300 μs  | 30 MiB/s   | 30 ms  | 30 s   |
| Sorting (64-bit integers)            | N/A     | 500 MiB/s  | 2 ms   | 2 s    |
| Sequential HDD Read (8 KiB)          | 10 ms   | 250 MiB/s  | 2 ms   | 2 s    |
| Blob Storage GET, if not match       | 30 ms   |            |        |        |
| Blob Storage GET, 1 conn (128 KiB)   | 50 ms   | 100 MiB/s  | 10 ms  | 10 s   |
| Blob Storage GET, n conn (offsets)   | 50 ms   | NW limit   |        |        |
| Blob Storage PUT, 1 conn (128 KiB)   | 100 ms  | 100 MiB/s  | 10 ms  | 10 s   |
| Blob Storage PUT, n conn (multipart) | 150 ms  | NW limit   |        |        |
| Blob Storage PUT, CAS (8 KiB)        | 100 ms  |            |        |        |
| Random SSD Read (8 KiB)              | 100 μs  | 70 MiB/s   | 15 ms  | 15 s   |
| Serialization [8] [9]                | N/A     | 100 MiB/s  | 10 ms  | 10 s   |
| Deserialization [8] [9]              | N/A     | 100 MiB/s  | 10 ms  | 10 s   |
| Proxy: Envoy/ProxySQL/Nginx/HAProxy  | 50 μs   | ?          | ?      | ?      |
| Network within same region           | 250 μs  | 2 GiB/s    | 500 μs | 500 ms |
| Premium network within zone/VPC      | 250 μs  | 25 GiB/s   | 50 μs  | 40 ms  |
| {MySQL, Memcached, Redis, ..} Query  | 500 μs  | ?          | ?      | ?      |
| Random HDD Read (8 KiB)              | 10 ms   | 0.7 MiB/s  | 2 s    | 30 m   |
| Network between regions [6]          | Varies  | 25 MiB/s   | 40 ms  | 40 s   |
| Network NA Central <-> East          | 25 ms   | 25 MiB/s   | 40 ms  | 40 s   |
| Network NA Central <-> West          | 40 ms   | 25 MiB/s   | 40 ms  | 40 s   |
| Network NA East <-> West             | 60 ms   | 25 MiB/s   | 40 ms  | 40 s   |
| Network EU West <-> NA East          | 80 ms   | 25 MiB/s   | 40 ms  | 40 s   |
| Network EU West <-> NA Central       | 100 ms  | 25 MiB/s   | 40 ms  | 40 s   |
| Network NA West <-> Singapore        | 180 ms  | 25 MiB/s   | 40 ms  | 40 s   |
| Network EU West <-> Singapore        | 160 ms  | 25 MiB/s   | 40 ms  | 40 s   |

†: "Fast serialization/deserialization" typically means a simple wire protocol that just dumps bytes, or a very efficient environment. Standard serialization formats such as JSON will typically be of the slower kind. Both are included here because serialization/deserialization is a very, very broad topic with wildly different performance characteristics depending on the data and the implementation.
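
The rows are meant to be composed. As a sketch, consider whether it's worth compressing 1 GiB before an inter-region transfer: the 500 MiB/s compression and 25 MiB/s network figures are from the table, while the 3x ratio is an assumed input (see the Compression Ratios section):

```python
# Compose two table rows: compress-then-send vs. send raw over an
# inter-region link. All figures are napkin estimates from the table.
MIB = 1024 ** 2
GIB = 1024 ** 3

def transfer_seconds(size_bytes: float, net_bytes_per_s: float = 25 * MIB) -> float:
    return size_bytes / net_bytes_per_s

uncompressed = transfer_seconds(1 * GIB)                          # ~41 s
# ~2 s to compress at 500 MiB/s, then ~14 s to send a third of the data
compressed = 1 * GIB / (500 * MIB) + transfer_seconds(GIB / 3)
print(f"{uncompressed:.0f} s raw vs {compressed:.0f} s compressed")
```

Even a modest ratio wins easily here because the network is ~20x slower than the compressor.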

For the active Criterion suite, run `./run --bench napkin_math` to get the right optimization levels and Linux tuning; you won't get representative numbers in debug mode. The wrapper already uses sudo internally. On locked-down cloud images, run `sudo sysctl -w kernel.perf_event_paranoid=-1` once before invoking it. You can help this project by adding new suites and filling in the blanks.

Note: The active benchmark path today is Criterion.rs in benches/. src/main.rs is still the older ad hoc harness and remains the source of truth for the benches that have not been fully migrated and revalidated yet. The current Criterion suite now includes memory_read, memory_random, hash, syscall, sort, serialization, compression, and compressed_memory_read. The current SSD rows were refreshed from the older harness with NAPKIN_BENCH_FILE pointed at a RAID0 local-SSD mount. The compressed_memory_read Criterion bench is a BitPacker integer-unpack microbenchmark; it should not be used to rewrite the generic [11] compression/decompression rows above. The new serialization and compression Criterion groups are workload-specific and are not yet wired into the generic README rows above. memory_read now emits explicit No SIMD and SIMD variants in Criterion, but the README intentionally collapses them to one single-thread row and one threaded row for memorability.

I am aware of some inefficiencies in this suite and intend to improve my skills in this area, to ensure the numbers represent the upper bound of performance you may be able to squeeze out in production. I find it highly unlikely that any of them are more than 2-3x off, which shouldn't be a problem for most users.

Cost Numbers

Approximate numbers that should be consistent across cloud providers.

| What                | Amount | $ / Month | 1y commit $ / month | Spot $ / month | Hourly Spot $ |
|---------------------|--------|-----------|---------------------|----------------|---------------|
| CPU                 | 1      | $15       | $10                 | $2             | $0.005        |
| GPU                 | 1      | $5000     | $3000               | $1500          | $2            |
| Memory              | 1 GB   | $2        | $1                  | $0.2           | $0.0005       |
| Storage             |        |           |                     |                |               |
| ├ Warehouse Storage | 1 GB   | $0.02     |                     |                |               |
| ├ Blob (S3, GCS)    | 1 GB   | $0.02     |                     |                |               |
| ├ Zonal HDD         | 1 GB   | $0.05     |                     |                |               |
| ├ Ephemeral SSD     | 1 GB   | $0.08     | $0.05               | $0.05          | $0.07         |
| ├ Regional HDD      | 1 GB   | $0.1      |                     |                |               |
| ├ Zonal SSD         | 1 GB   | $0.2      |                     |                |               |
| ├ Regional SSD      | 1 GB   | $0.35     |                     |                |               |
| Networking          |        |           |                     |                |               |
| ├ Same Zone         | 1 GB   | $0        |                     |                |               |
| ├ Blob              | 1 GB   | $0        |                     |                |               |
| ├ Ingress           | 1 GB   | $0        |                     |                |               |
| ├ L4 LB             | 1 GB   | $0.008    |                     |                |               |
| ├ Inter-Zone        | 1 GB   | $0.01     |                     |                |               |
| ├ Inter-Region      | 1 GB   | $0.02     |                     |                |               |
| ├ Internet Egress † | 1 GB   | $0.1      |                     |                |               |
| CDN Egress          | 1 GB   | $0.05     |                     |                |               |
| CDN Fill ‡          | 1 GB   | $0.01     |                     |                |               |
| Warehouse Query     | 1 GB   | $0.005    |                     |                |               |
| Logs/Traces ♣       | 1 GB   | $0.5      |                     |                |               |
| Metrics             | 1000   | $20       |                     |                |               |
| EKM Keys            | 1      | $1        |                     |                |               |

† This refers to traffic leaving your cloud provider, e.g. sending data from GCP to S3, or egress for serving HTML from AWS to a client.

‡ An additional per-cache-fill fee is incurred, costing close to blob storage write costs (see the table just below).

♣ This is standard pricing among a few logging providers, but e.g. Datadog prices differently: $0.1 per GB of ingested logs, plus $1.5 per 1M events on top for 7-day retention.

Furthermore, for blob storage (S3/GCS/R2/...), you're charged per read/write operation (fewer, larger files are cheaper):

|                | 1M   | 1000    |
|----------------|------|---------|
| Reads          | $0.4 | $0.0004 |
| Writes         | $5   | $0.005  |
| EKM Encryption | $3   | $0.003  |
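
To see why fewer, larger files are cheaper, a sketch comparing object sizes for writing 1 TiB, using the ~$5 per 1M writes figure above (storage and bandwidth costs excluded):

```python
# Request cost of writing 1 TiB to blob storage in small vs. large objects.
# Uses ~$5 per 1M write operations; the object sizes are illustrative.
WRITE_COST_PER_OP = 5 / 1_000_000
TIB = 1024 ** 4

def upload_cost(total_bytes: int, object_bytes: int) -> float:
    ops = total_bytes / object_bytes
    return ops * WRITE_COST_PER_OP

small = upload_cost(TIB, 128 * 1024)        # 128 KiB objects: ~$42
large = upload_cost(TIB, 128 * 1024 ** 2)   # 128 MiB objects: ~$0.04
print(f"${small:.2f} vs ${large:.4f}")
```

A 1024x larger object size means 1024x fewer operations, and the per-operation fee scales down with it.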

Compression Ratios

These ratios are drawn from a few sources. [3] [4] [5] Note that compression speed (but generally not ratio) varies by an order of magnitude depending on the algorithm and the compression level (which trades speed for ratio).

I typically ballpark that each additional x of compression ratio decreases throughput by ~10x. E.g. we can get a 2x ratio on English Wikipedia at ~200 MiB/s, 3x at ~20 MiB/s, and 4x at ~1 MiB/s.

| What        | Compression Ratio |
|-------------|-------------------|
| HTML        | 2-3x              |
| English     | 2-4x              |
| Source Code | 2-4x              |
| Executables | 2-3x              |
| RPC         | 5-10x             |
| SSL         | -2% [10]          |
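
The 10x-per-ratio-step rule of thumb above can be sketched as a function. The 2x-at-~200 MiB/s starting point comes from the English Wikipedia example; everything else is ballpark, not a model of any specific compressor:

```python
# Ballpark: each +1x of compression ratio costs ~10x throughput,
# anchored at a 2x ratio running at ~200 MiB/s (English text).
def est_throughput_mib_s(ratio: float, base_ratio: float = 2, base_mib_s: float = 200) -> float:
    return base_mib_s / (10 ** (ratio - base_ratio))

for r in (2, 3, 4):
    print(f"{r}x ratio -> ~{est_throughput_mib_s(r):g} MiB/s")  # 200, 20, 2
```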

Techniques

  • Don't overcomplicate. If you are basing your calculation on more than 6 assumptions, you're likely making it harder than it should be.
  • Keep the units. They're a good checksum. WolframAlpha has terrific support if you need a hand converting e.g. KiB to TiB.
  • Calculate with exponents. A lot of back-of-the-envelope calculations are done with just coefficients and exponents, e.g. c * 10^e. Your goal is to get within an order of magnitude; that's just e, and c matters a lot less. Worrying only about single-digit coefficients and exponents makes the arithmetic much easier on a napkin (not to speak of all the zeros you avoid writing).
  • Perform Fermi decomposition. Write down things you can guess at until you can start to hint at an answer. When you want to know the cost of storage for logging, you're going to want to know how big a log line is, how many of those you have per second, what that costs, and so on.
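
Putting these techniques together on the intro's logging question, a sketch of a Fermi decomposition: the 1 KiB-per-request figure is a guess to be refined, and the $0.5/GB comes from the cost table above.

```python
# Fermi decomposition: monthly logging cost at 100,000 RPS.
# Every input is an explicit, refinable guess.
RPS = 100_000
BYTES_PER_REQUEST = 1024          # guess: ~1 KiB of log lines per request
SECONDS_PER_MONTH = 30 * 24 * 3600
COST_PER_GB = 0.5                 # logs/traces row from the cost table

gb_per_month = RPS * BYTES_PER_REQUEST * SECONDS_PER_MONTH / 1e9
monthly_cost = gb_per_month * COST_PER_GB
print(f"~{gb_per_month:,.0f} GB/month, ~${monthly_cost:,.0f}/month")
# Roughly a quarter petabyte and six figures per month: a result that
# big is itself useful; it says sampling or shorter retention is needed.
```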

Resources