Let’s say we’re a SaaS company serving 10,000 RPS, each request taking 100ms, our application is single-threaded, and each CPU core costs $10/month.
The cost works out as: 10,000 RPS × 100ms = 1,000 CPU-seconds of work arriving every second, i.e. ~1,000 cores, which at $10/core/month is ~$10,000/month.
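A quick sketch of that arithmetic in Python, simply restating the assumptions above (10,000 RPS, 100ms per request, $10/core/month):

```python
# Napkin math: cores and monthly cost for a single-threaded service.
RPS = 10_000              # requests per second (assumption from above)
CPU_TIME_S = 0.100        # CPU time per request, seconds (assumption from above)
COST_PER_CORE_MONTH = 10  # $/core/month (assumption from above)

cpu_seconds_per_second = RPS * CPU_TIME_S  # work arriving each second
cores_needed = cpu_seconds_per_second      # single-threaded => 1 core per CPU-second
monthly_cost = cores_needed * COST_PER_CORE_MONTH

print(f"cores needed: {cores_needed:.0f}")          # ~1,000 cores
print(f"monthly cost: ${monthly_cost:,.0f}/month")  # ~$10,000/month
```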
Here are the average cloud rates in 2025:
- CPU: $30 / core / month
- Memory: $1 / GB / month
- SSD: $0.1 / GB / month
- Disk: $0.01 / GB / month
- Cloud-storage: $0.02 / GB / month
- Network: $0.10 / GB transferred (cross-zone, egress, cross-region)
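As a rough illustration of how this rate card can be used, here is a sketch that prices a hypothetical workload (the resource quantities are made up for the example):

```python
# 2025 ballpark rates from the list above.
RATES = {
    "cpu_core_month": 30.0,          # $/core/month
    "memory_gb_month": 1.0,          # $/GB/month
    "ssd_gb_month": 0.10,            # $/GB/month
    "disk_gb_month": 0.01,           # $/GB/month
    "cloud_storage_gb_month": 0.02,  # $/GB/month
    "network_gb": 0.10,              # $/GB transferred (egress / cross-zone / cross-region)
}

# Hypothetical workload, purely for illustration.
monthly = (
    8     * RATES["cpu_core_month"]            # 8 cores
    + 64  * RATES["memory_gb_month"]           # 64 GB RAM
    + 500 * RATES["ssd_gb_month"]              # 500 GB SSD
    + 2_000 * RATES["cloud_storage_gb_month"]  # 2 TB object storage
    + 1_000 * RATES["network_gb"]              # 1 TB egress this month
)
print(f"~${monthly:,.0f}/month")  # ~$494/month
```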
Here are some common terms and what they mean:
- Median and P99 response time - common latency percentiles used in performance SLIs (Service Level Indicators)
- Throughput - the number of requests per second (RPS) the system can handle
- Transactions - used here the same way: the RPS the system can handle
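For instance, a minimal way to pull the median and P99 out of a batch of measured response times (the sample values below are made up):

```python
# Compute latency percentiles from samples (values in milliseconds, made up for illustration).
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank style percentile, p in [0, 100]."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(round(p / 100 * (len(ordered) - 1))))
    return ordered[idx]

latencies_ms = [12, 15, 14, 200, 13, 16, 18, 17, 450, 14]  # hypothetical samples
print("median:", percentile(latencies_ms, 50), "ms")
print("p99:   ", percentile(latencies_ms, 99), "ms")
```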
Q Assume we’re building a region failover mechanism: if one region fails, we fail over to the database read replica in another region and promote it to writer so it can take over from the failed region. The database server also holds ~16GB of auth sessions in memory, and we want to transfer those in-memory sessions to the replica in the new region. Is that feasible?
A The naive approach is: “can we just write the in-memory sessions to SSD and send them over the network to the new replica a couple of seconds before the crash?”
Here is the flow:
- Reading 1GB of sequential memory takes ~100ms
- Writing 1GB to SSD takes ~500ms
- Transferring 1GB from one cloud region (not zone) to another takes ~1 min (~150 Mbit/s)
- Reading 1GB from SSD takes ~250ms (New Replica)
- Writing 1GB of random memory in 64-bit increments takes ~1.5 seconds (New Replica)
- The times above are per 1GB, but we have 16GB; even if we only consider the transfer time and ignore everything else, it’s still not feasible
- 16 GB = ~16,384 MiB
- If cross-region throughput is ~25 MiB/s (a reasonable ballpark from low-level measurements) → 16,384 / 25 ≈ 655 seconds (~11 minutes)
- If we use the 150 Mbit/s figure above (~17.9 MiB/s) → 16,384 / 17.9 ≈ 915 seconds (~15 minutes)
- Egress cost is tiny (roughly $0.32 at $0.02/GB, or ~$1.60 even at the $0.10/GB rate above), so cost is not the blocker; time is the blocker (see the sketch below)
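A sketch of the end-to-end timing, reusing the per-GB step times from the flow above and deriving the transfer step from the assumed ~150 Mbit/s cross-region bandwidth:

```python
# Napkin math for shipping 16 GiB of in-memory sessions to another region.
GIB = 16

# Seconds per GiB for each step, taken from the flow above.
read_memory_s  = 0.100  # sequential memory read
write_ssd_s    = 0.500  # SSD write on the old server
read_ssd_s     = 0.250  # SSD read on the new replica
write_memory_s = 1.5    # random memory writes on the new replica

# Cross-region transfer derived from ~150 Mbit/s.
bandwidth_bits_per_s = 150e6
transfer_s_per_gib = (1024**3 * 8) / bandwidth_bits_per_s  # ~57 s per GiB

per_gib = read_memory_s + write_ssd_s + transfer_s_per_gib + read_ssd_s + write_memory_s
total_s = GIB * per_gib
transfer_only_s = GIB * transfer_s_per_gib

print(f"transfer alone: {transfer_only_s / 60:.1f} min")  # ~15 min
print(f"everything:     {total_s / 60:.1f} min")          # ~16 min -> not feasible pre-crash
```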
Q A merchant in Australia takes 2-3 seconds to load a response from our U.S. server, but per the math we know:
- Render time - ~100ms
- Round-trip time between Australia and the U.S. server - ~250ms
- Request cycle round-trips - ~4.5: DNS (1), TCP (1), SSL (2), HTTP (1), with the DNS lookup usually hitting a nearby resolver rather than crossing the ocean
- So the total should be roughly 4 × 250ms + 100ms ≈ 1.1 seconds
But why is it taking longer than the estimate?
A TCP has slow start: it doesn’t know the size of the pipe between the client and the server, so it can’t send everything at once; it has to inch its way up to discover the bandwidth between client and server.
TCP initially sends ~15kB (the Linux default initial congestion window; it can be changed) from server to client. If all of those packets make it to the client, the next round trip it sends ~30kB, then ~60kB, then ~120kB, and so on.
If we increase the initial window, the chance of losing packets goes up, and recovering from that loss costs another 3-4 round trips to make sure everything reached the client, so ~15kB is a sensible default initial window (see the sketch below).
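A rough sketch of how slow start inflates the total: the ~1 MB response size is a made-up assumption, and this idealized model ignores loss, ssthresh, and congestion avoidance:

```python
# Idealized slow start: the server can send `cwnd` bytes per round trip,
# and cwnd doubles each round trip while nothing is lost.
RTT_S = 0.250      # Australia <-> U.S. round trip (from above)
SETUP_RTTS = 4     # DNS + TCP + SSL + first HTTP request (from above)
RENDER_S = 0.100   # server render time (from above)
PAGE_KB = 1024     # response size, ~1 MB -- made-up assumption
INIT_CWND_KB = 15  # Linux-ish default initial window (from above)

sent, cwnd, transfer_rtts = 0, INIT_CWND_KB, 0
while sent < PAGE_KB:
    sent += cwnd        # one round trip delivers cwnd worth of data
    cwnd *= 2           # window doubles if nothing was dropped
    transfer_rtts += 1

naive_s = SETUP_RTTS * RTT_S + RENDER_S
with_slow_start_s = (SETUP_RTTS + transfer_rtts) * RTT_S + RENDER_S
print(f"naive estimate:  {naive_s:.2f} s")  # ~1.1 s
print(f"with slow start: {with_slow_start_s:.2f} s (+{transfer_rtts} RTTs)")  # ~2.85 s
```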
Ballpark Values for Reference:
| Operation | Latency / Throughput |
|---|---|
| Sequential memory R/W (no SIMD) | ~10 GiB/s → ~100 ms to read 1 GiB |
| Random memory R/W | ~1 GiB/s → ~1 second per GiB for random access |
| SSD sequential read (8 KiB) | ~4 GiB/s → ~250 ms for 1 GiB |
| SSD sequential write (8 KiB, no fsync) | ~1 GiB/s → ~1 second per GiB |
| Network between regions | ~25 MiB/s (ballpark, depends on cloud provider) |
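A tiny helper that turns the table into “how long does N GiB take” numbers, using only the ballpark throughputs above:

```python
# Ballpark throughputs from the table above, in GiB per second.
THROUGHPUT_GIB_S = {
    "sequential memory": 10.0,
    "random memory": 1.0,
    "ssd sequential read": 4.0,
    "ssd sequential write": 1.0,
    "cross-region network": 25 / 1024,  # ~25 MiB/s
}

def seconds_for(gib: float, op: str) -> float:
    """Time to move `gib` GiB through the given operation."""
    return gib / THROUGHPUT_GIB_S[op]

for op in THROUGHPUT_GIB_S:
    print(f"{op:>24}: {seconds_for(16, op):8.1f} s for 16 GiB")
```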