2026-04-28

Webhooks that don't lie: what we changed after a 4am incident

At 04:11 UTC, a customer Slacked: "our pipeline is empty for two days, what gives?" We checked our webhook delivery dashboard. 100% success rate. Every event delivered. Every status code 200.

It turned out their reverse proxy had a misconfigured route — when we POSTed to https://their-app/webhooks/inboxr, the proxy returned a 200 OK with a friendly HTML error page ("We'll be right back!"). Their actual webhook handler never got the payload. We marked them delivered. Two days of silent data loss.

The fix

Two changes, both small:

1. Capture the response body on every attempt, not just failures. Success bodies are usually empty (correct receivers return 202 with no body). Receivers that return 200 + HTML are broken 99% of the time.

// before
if (!res.ok) errorBody = (await res.text()).slice(0, 1024);

// after
responseBody = (await res.text()).slice(0, 1024);
if (!res.ok) errorBody = responseBody;

2. Surface the body in the dashboard. The delivery history accordion now shows the receiver's response — even on success — so a glance tells you whether your endpoint is healthy or a lying proxy is in front of it.
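
To make the two changes concrete, here is a minimal sketch of a per-attempt record and the row a dashboard might render from it. The names `recordAttempt` and `summarize` and the shape of the record are assumptions for illustration, not the actual runner code. It also assumes a runtime with the Fetch API `Response` (Node 18+ or a browser).

```javascript
// Hypothetical sketch: names and record shape are illustrative only.
const MAX_BODY_CHARS = 1024; // same cap as the before/after snippet

// Build a delivery record from a fetch Response.
async function recordAttempt(res) {
  // Read the body on every attempt, success or failure.
  const responseBody = (await res.text()).slice(0, MAX_BODY_CHARS);
  return {
    status: res.status,
    ok: res.ok,
    contentType: res.headers.get("content-type"), // null if the header is absent
    responseBody,
    // errorBody keeps its old meaning: set only for non-2xx responses.
    errorBody: res.ok ? null : responseBody,
  };
}

// One line per attempt: raw facts, no verdict about whether it "looks wrong".
function summarize(attempt) {
  const body = attempt.responseBody === "" ? "(empty body)" : attempt.responseBody;
  return `${attempt.status} ${attempt.contentType ?? "no content-type"}: ${body}`;
}
```

Fed a 200 response whose body is an HTML error page, `recordAttempt` still returns `ok: true` and `errorBody: null`, but the page is now sitting in `responseBody` where a person can see it.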

What we explicitly didn't do

We didn't add Content-Type sniffing or HTML detection in the webhook runner to "auto-fail" deliveries that look wrong. That's the kind of clever heuristic that breaks things later. Show the user the raw response and let them decide.

We also redesigned the backoff schedule to match the spec our customers had been asking for: 1s, 10s, 1m, 10m, 1h, 6h. That's six retries, then dead-letter. The delays add up to just over seven hours, which is enough to survive a 6-hour outage, and the schedule doesn't pummel a struggling endpoint with retries every five minutes for a day.
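
The schedule is easier to reason about as cumulative offsets. This is a minimal sketch, assuming each delay is measured from the previous failed attempt and that an initial attempt precedes the six delays (the post doesn't spell out either). `retryOffsets` and `nextDelay` are hypothetical names.

```javascript
// Delays from the schedule, in seconds: 1s, 10s, 1m, 10m, 1h, 6h.
const BACKOFF_SECONDS = [1, 10, 60, 600, 3600, 21600];

// Assumption: each delay starts when the previous attempt fails.
// Returns seconds elapsed since the first failure at the moment each retry fires.
function retryOffsets(delays) {
  let elapsed = 0;
  return delays.map((d) => (elapsed += d));
}

// Hypothetical helper: delay before the next retry, given how many retries
// have already failed. Returns null once the schedule is exhausted,
// which is the signal to dead-letter the event.
function nextDelay(failedRetries, delays = BACKOFF_SECONDS) {
  return failedRetries < delays.length ? delays[failedRetries] : null;
}

console.log(retryOffsets(BACKOFF_SECONDS)); // [1, 11, 71, 671, 4271, 25871]
```

Under those assumptions, the last retry fires 25,871 seconds (7h 11m 11s) after the first failure, and `nextDelay(6)` returns `null`.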

Receivers should never lie. Webhook senders should never trust a 2xx in isolation.