The HTTP status codes you'll see most often during outages
The HTTP status code your monitor reports is the first clue you get about what's actually broken. A 502 and a 504 sound similar; they point at completely different problems. A 503 with a Retry-After header is benign; a 503 without one is a fire. This is a working engineer's field guide to the codes you'll see most during incidents and how to read them.
The 0 response: not really HTTP at all
Most monitors report "status 0" or "no response" when the HTTP exchange never completed. This is the worst kind of failure because the code carries no information — only the underlying error does. The four common causes:
- DNS resolution failed. Your domain doesn't resolve. See our DNS monitoring guide for the diagnostic.
- TCP connection refused or timed out. The host resolved but nothing's listening on the port. Process crashed, load balancer pulled the host, security group changed.
- TLS handshake failed. Certificate expired, hostname mismatch, protocol version unsupported. Will appear as SSL_ERROR or HANDSHAKE_FAILURE in logs.
- Read timeout. Connection succeeded, request sent, but the server never started writing a response within the monitor's timeout window. Usually a downstream dependency stuck on a slow query.
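If your monitor is a script you control, the four causes above can usually be told apart by the exception the client raises. The following is a minimal sketch using only the Python standard library; the function name and the returned labels are illustrative, not part of any monitoring product.

```python
import socket
import ssl
import urllib.error


def classify_no_response(exc: BaseException) -> str:
    """Map a low-level client exception to one of the 'status 0' causes."""
    # urllib wraps the underlying socket error in URLError.reason; unwrap it.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc

    if isinstance(reason, socket.gaierror):
        return "dns"
    if isinstance(reason, ssl.SSLError):
        return "tls"
    if isinstance(reason, ConnectionRefusedError):
        return "tcp_refused"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        # Connect and read timeouts raise the same exception type; telling
        # them apart requires timing the connect phase separately.
        return "timeout"
    return "unknown"
```

Recording the label next to the "status 0" result gives the on-call engineer the information the status code itself cannot carry.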
2xx and 3xx: not always good news
A 200 isn't a guarantee of health. Two patterns to watch:
- The smiling 200 of doom. Your server returned 200 OK with a body that says "Internal Server Error." Common when an upstream proxy swallows errors. The fix: content-match checks. Your monitor should assert a specific string (e.g. "healthy") appears in the body, not just that the status is 2xx.
- The 302 to nowhere. Your load balancer's health probe follows a redirect into an authenticated page that returns 200, so the LB thinks all is well while the actual public homepage 500s. Monitors that follow redirects need to assert what the final URL and content are.
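Both patterns come down to asserting more than the status code. Here is a rough sketch of such a check as a pure function; the name check_health and its parameters are assumptions for illustration, and the caller is expected to pass in the status, body, and final URL from whatever HTTP client it uses.

```python
from typing import List, Optional


def check_health(status: int, body: str, final_url: str,
                 must_contain: str,
                 expected_url: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not 200 <= status < 300:
        problems.append(f"unexpected status {status}")
    # Content match: catches a 200 whose body is really an error page.
    if must_contain not in body:
        problems.append(f"body is missing {must_contain!r}")
    # Final-URL match: catches a redirect that landed somewhere else.
    if expected_url is not None and final_url != expected_url:
        problems.append(f"redirected to {final_url}")
    return problems
```

For example, check_health(200, "Internal Server Error", "https://example.com/", "healthy") reports a missing-content problem even though the status is 200.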
4xx: the client (probably) did something
Most 4xx codes mean "your request is wrong." In monitoring, the question is whether your monitor is the bad client or your users are getting blocked.
400 Bad Request
Malformed request. If your monitor sees this consistently, the monitor is sending bad data — wrong content-type, malformed JSON, invalid header. Fix the monitor.
401 Unauthorized / 403 Forbidden
Auth failed: a 401 means the credentials were missing or rejected, a 403 means they were accepted but lack permission. Common when you monitor an API endpoint and the API token rotated. Rotate your monitor tokens through the same secrets system as production. If the secret rotates and the monitor doesn't pick it up, you get a false page.
404 Not Found
Endpoint or route doesn't exist. Usually means a deploy moved the URL. If your monitor 404s after a deploy, that's the deploy's problem — but verify the URL is what you intend, not what you typed.
408 Request Timeout
Rare but real. The server gave up waiting because the client (your monitor) did not finish sending its request in time. Often a sign that a load balancer is pre-emptively killing slow uploads.
429 Too Many Requests
Rate limited. If your monitor hits this, you're probably checking too aggressively (sub-30-second intervals against a shared endpoint) or your rate limiter doesn't allowlist the monitor IPs. Add the monitor's probe IPs to your rate limit bypass list.
5xx: the server did something — the codes you'll see
This is where most real outages live. The codes are similar enough that teams confuse them constantly.
500 Internal Server Error
Catch-all for "something blew up in the application." Unhandled exceptions, divide-by-zero, the database returning a result the app didn't expect. The status code carries no diagnostic — you have to look at server logs. Always page on a sustained rate of 500s.
502 Bad Gateway
A proxy in front of your app (Nginx, Caddy, an AWS ALB) couldn't get a valid response from the upstream. Usual translation: the application crashed or restarted while serving the request, so the proxy saw a refused or reset connection, or a malformed response. 502s typically come in bursts during a deploy or process restart. A small spike during release is normal; a sustained 502 rate means crashes.
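One way to separate a deploy blip from a sustained 502 rate is to measure how long the current unbroken run of 502s has lasted. The sketch below assumes check results arrive as (timestamp in seconds, status) pairs in chronological order; the 120-second threshold is an arbitrary placeholder, not a recommendation from this guide.

```python
from typing import Sequence, Tuple


def error_run_seconds(checks: Sequence[Tuple[float, int]], code: int) -> float:
    """Length in seconds of the unbroken run of `code` ending at the latest check."""
    run_start = None
    for timestamp, status in reversed(checks):
        if status != code:
            break
        run_start = timestamp
    if run_start is None:
        return 0.0
    return checks[-1][0] - run_start


def is_sustained(checks: Sequence[Tuple[float, int]], code: int = 502,
                 min_seconds: float = 120.0) -> bool:
    # min_seconds is an assumed threshold; tune it to your deploy duration.
    return error_run_seconds(checks, code) >= min_seconds
```

A single 502 between two healthy checks yields a run of zero seconds, so it never trips the threshold on its own.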
503 Service Unavailable
The server is temporarily refusing to serve, usually because it's deliberately throttling or shedding load. Two flavors:
- With Retry-After: the server is asking the client to back off for N seconds. Often maintenance mode. Benign if expected.
- Without Retry-After: your service is at capacity. Either the app is rejecting connections to protect itself (good) or the load balancer can't find healthy backends (bad). Both warrant investigation.
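To act on the two flavors, a monitor needs to read the header. Retry-After can carry either a number of seconds or an HTTP date, so a parser has to handle both. This is a small sketch with the standard library; the function name is made up for illustration, and returning None for a missing or unparseable header is a design assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Seconds the server asked us to wait, or None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

A None result corresponds to the "without Retry-After" flavor and can be routed to the louder alert.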
504 Gateway Timeout
The proxy reached the upstream, sent the request, and the upstream didn't respond in time. Translation: the app is up, but slow. A 504 is a 502's slower cousin; almost always means a downstream dependency (database, third-party API, internal service) is sluggish. 504s pile up — they are the symptom that turns into 503s or 502s as the service buckles.
521, 522, 523, 524 (Cloudflare-specific)
Cloudflare extends the 5xx range with codes that distinguish origin failures:
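- 521: Web server is down (the origin refused the TCP connection).
- 522: Connection timed out (the TCP handshake with the origin never completed; check the origin firewall, MTU, or routing).
- 523: Origin unreachable (Cloudflare can't route to your network at all).
- 524: A timeout occurred (the origin accepted the connection but didn't return an HTTP response within Cloudflare's timeout, 100 seconds by default).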
If you're behind Cloudflare, alert on these as you would on a 502 or 504; they're the most diagnostic information you'll get from the edge.
Which codes to alert on, and how loudly
- Page immediately: 0 (no response), 502, 503-without-Retry-After, 504, 521/522/523/524.
- Page if sustained >5 minutes: 500, 503-with-Retry-After.
- Warn (don't page): 4xx on monitoring endpoints — fix the monitor first.
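The three tiers above can be written down as a small routing function, which makes the policy easy to review and test. This is one possible encoding, not a prescribed implementation: the function name, the "page"/"warn"/"none" labels, and the choice to return "none" while a 500 is not yet sustained are all assumptions.

```python
CLOUDFLARE_ORIGIN_CODES = {521, 522, 523, 524}


def alert_level(status: int, has_retry_after: bool = False,
                sustained_minutes: float = 0.0) -> str:
    """Return "page", "warn", or "none" for one observed status."""
    # Tier 1: page immediately.
    if status == 0 or status in {502, 504} or status in CLOUDFLARE_ORIGIN_CODES:
        return "page"
    if status == 503 and not has_retry_after:
        return "page"
    # Tier 2: page only once the condition has lasted more than 5 minutes.
    if status in {500, 503}:
        return "page" if sustained_minutes > 5 else "none"
    # Tier 3: 4xx on a monitoring endpoint is a warning, not a page.
    if 400 <= status < 500:
        return "warn"
    return "none"
```

For instance, alert_level(503, has_retry_after=True, sustained_minutes=2) stays quiet, while the same 503 without the header pages at once.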
Where to go from here
Walk through your current alert set and ask: do we know what each code means in our stack? Do we have content-match checks where 200s can lie? Are 4xx alerts going to the right place (engineering, not on-call)? Getting the mapping right is most of the value — better alerts beat more alerts every time.
Looking for one specific status code? Our HTTP status code reference has a page per code with everything you'd want to know.