Every SaaS reaches the moment where someone's loop hits an API endpoint 800 times a second. The textbook answer is "add Redis, write a sliding window." The textbook answer for us was wrong. We didn't have Redis in the cluster. Adding it for one feature felt expensive — operationally, mentally, and on the runbook.
Postgres can do this. The only primitive you need is INSERT … ON CONFLICT DO UPDATE … RETURNING.
The whole thing
```sql
CREATE TABLE rate_buckets (
    scope        TEXT        NOT NULL,
    key          TEXT        NOT NULL,
    window_start TIMESTAMPTZ NOT NULL,
    count        INTEGER     NOT NULL DEFAULT 0,
    PRIMARY KEY (scope, key, window_start)
);
```
```sql
-- atomic: returns the new count after this attempt
INSERT INTO rate_buckets (scope, key, window_start, count)
VALUES ($1, $2, $3, 1)
ON CONFLICT (scope, key, window_start)
DO UPDATE SET count = rate_buckets.count + 1
RETURNING count;
```
The caller picks the window (second, minute, hour) by truncating the current time to that boundary in app code. Two checks per send path: 5/sec/tenant and 60/min/phone. Each adds ~1 ms to the round trip. A nightly worker prunes rows where window_start < NOW() - interval '1 hour'.
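The post doesn't show the caller, so here is a minimal sketch of what the app-side check could look like in Python. The function names, the `allow` signature, and the scope labels are illustrative assumptions; the upsert is the statement from above verbatim.

```python
from datetime import datetime, timezone

def window_start(now: datetime, seconds: int) -> datetime:
    """Truncate `now` down to the start of its `seconds`-wide window."""
    epoch = int(now.timestamp())
    return datetime.fromtimestamp(epoch - epoch % seconds, tz=timezone.utc)

# The atomic upsert from the post; %s placeholders assume a psycopg-style driver.
UPSERT = """
    INSERT INTO rate_buckets (scope, key, window_start, count)
    VALUES (%s, %s, %s, 1)
    ON CONFLICT (scope, key, window_start)
    DO UPDATE SET count = rate_buckets.count + 1
    RETURNING count
"""

def allow(conn, scope: str, key: str, seconds: int, limit: int) -> bool:
    """True if this attempt is still within `limit` for the current window.
    Note: the attempt is counted even when it is rejected."""
    ws = window_start(datetime.now(timezone.utc), seconds)
    with conn.cursor() as cur:
        cur.execute(UPSERT, (scope, key, ws))
        (count,) = cur.fetchone()
    return count <= limit

# Send path, both checks must pass (hypothetical scope names):
#   if allow(conn, "tenant-sec", tenant_id, 1, 5) and \
#      allow(conn, "phone-min", phone, 60, 60):
#       send(...)
```

Truncating in the app rather than in SQL keeps the upsert identical for every window size; only the `window_start` parameter changes.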
Where it falls down
At a few million ops/min you start contending on the same row. Strategies for that day: shard the key (tenant:abc:0..15), push to Redis, or batch increments. None apply at our scale, and the whole table has under 10k live rows at any time.
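For the record, the sharding option is a small change on the write path. A hedged sketch, assuming the post's 16-way split (`tenant:abc:0..15`); the hash choice and helper name are mine:

```python
import zlib

SHARDS = 16  # the post's example: tenant:abc:0..15

def sharded_key(key: str, attempt_id: str) -> str:
    """Spread increments for one logical key across SHARDS rows.

    Hash something unique per attempt (not the key itself) so successive
    writes rotate across shards instead of all landing on one row. The
    read side must then SUM(count) over all 16 shard rows for the window.
    """
    shard = zlib.crc32(attempt_id.encode()) % SHARDS
    return f"{key}:{shard}"
```

The trade is one contended row for sixteen lightly contended ones, at the cost of an aggregate on the read path.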
We've been running this for six weeks. Zero ops issues. The most expensive thing about it is that our rate-limit metric is called "rate_buckets" instead of something with hyperloglog in the name.