MTTA vs MTTR: What's the Difference?
MTTA and MTTR get conflated constantly, usually in the same sentence by someone who means one and says the other. It's an easy mistake -- the acronyms look alike and both show up in the same incident report -- but they measure very different things. One is about how fast a human engages with a problem. The other is about how fast the service is actually restored. Confusing them leads teams to celebrate the wrong number, or worse, to invest in fixing a problem they don't have.
This is the focused companion to our broader guide to MTTR, MTBF, and MTTD. If you want the whole family of reliability acronyms, start there. If you specifically need to know how MTTA vs MTTR differ and what to do about each, keep reading.
MTTA vs MTTR in one sentence
Here's the whole distinction, stripped down:
- MTTA — Mean Time To Acknowledge. From the moment your monitoring detects a problem to the moment a human acknowledges the page. It measures the gap between the alert firing and someone saying "I've got it."
- MTTR — Mean Time To Recover. From the start of the incident to full service restoration. It measures the entire event, end to end. (Some teams expand the R as Repair or Resolve; those measure slightly different endpoints, but the shape is the same.)
The key thing to internalize: MTTA is a small window inside the incident, and MTTR spans the whole thing. If an incident runs for 40 minutes and someone acknowledged the page 4 minutes in, your MTTA for that incident is 4 minutes and your MTTR is 40. The acknowledgment window is a subset of the recovery window, not a separate timeline. That's exactly why a great MTTA can hide a terrible MTTR: you engaged fast and then spent 36 minutes flailing.
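To make the nesting concrete, here is a minimal sketch of that same 40-minute incident as three timestamps. The date and the variable names are made up for illustration, and it assumes detection happens at the moment the incident starts, which real monitoring will only approximate.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for the 40-minute incident described above.
# Assumption: the problem is detected the moment it starts.
started = datetime(2024, 1, 15, 3, 0)
acknowledged = started + timedelta(minutes=4)   # someone says "I've got it"
resolved = started + timedelta(minutes=40)      # service fully restored

time_to_acknowledge = acknowledged - started    # feeds MTTA
time_to_recover = resolved - started            # feeds MTTR
time_after_ack = resolved - acknowledged        # the "flailing" part

print(time_to_acknowledge)  # 0:04:00
print(time_to_recover)      # 0:40:00
print(time_after_ack)       # 0:36:00
```

The acknowledgment window starts and ends inside the recovery window, so the two durations can never be added together as if they were separate timelines.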
How each is calculated
Both are simple averages over a set of incidents in some window (a quarter is typical):
- MTTA = sum of (acknowledged − detected) / number of incidents
- MTTR = sum of (resolved − started) / number of incidents
Concretely, if you had three incidents last month with acknowledgment delays of 2m, 4m, and 18m, your MTTA is (2 + 4 + 18) / 3 = 8 minutes. But notice how much that one 18-minute page drags the average up -- the median is 4 minutes, and that's a much more honest picture of a typical response.
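The same arithmetic as a short Python sketch, using only the standard library. The acknowledgment delays are the three from the example above. The recovery durations are invented purely so the MTTR formula has something to run on.

```python
from statistics import mean, median

# Acknowledgment delays (acknowledged - detected) in minutes, from the example above.
ack_delays = [2, 4, 18]

# Recovery durations (resolved - started) in minutes.
# These three values are invented for illustration only.
recovery_durations = [20, 40, 90]

mtta = mean(ack_delays)              # sum of delays / number of incidents
mttr = mean(recovery_durations)      # sum of durations / number of incidents
median_tta = median(ack_delays)      # the middle value, less sensitive to one bad page

print(f"MTTA (mean):   {mtta} min")
print(f"MTTA (median): {median_tta} min")
print(f"MTTR (mean):   {mttr} min")
```

With these inputs the mean acknowledgment delay is 8 minutes and the median is 4, which is the gap the paragraph above is pointing at.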
Why the distinction matters
This is the part teams miss, and it's the whole reason to keep the two numbers separate: MTTA and MTTR are fixed by completely different work.
MTTA is an alerting and on-call problem
If your MTTA is bad, no amount of better runbooks or faster rollbacks will help, because the clock stops the moment someone acknowledges -- before any fixing happens. High MTTA comes from the alerting layer and the humans attached to it: pages routed to the wrong person, alerts so noisy that people ignore them, an on-call rotation that's burned out and slow to respond, or no clear escalation when the primary misses. These are covered in our posts on setting up an on-call rotation and fixing alert fatigue.
MTTR is a tooling problem
If your MTTR is bad but MTTA is fine, people are engaging quickly and then getting stuck. That's a tooling and process gap: no fast rollback path, no runbook for the failure mode, or observability so thin that nobody knows where to look. You improve MTTR by making the fixing faster -- and improving MTTA does almost nothing for it. The reverse is also true. Pouring effort into a slick rollback pipeline is wasted if the page sits unacknowledged for 30 minutes first.
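A quick back-of-the-envelope sketch shows why each kind of work only pays off in its own window. It reuses the 40-minute incident from earlier (4 minutes to acknowledge, 36 to fix). The 5-minute and 1-minute rollback times in the second case are assumptions chosen for illustration.

```python
def recovery_minutes(ack_minutes: float, fix_minutes: float) -> float:
    """Total time to recover for one incident: time to acknowledge plus time to fix."""
    return ack_minutes + fix_minutes

# Case 1: fast acknowledgment, slow fix (the 40-minute incident from earlier).
baseline = recovery_minutes(4, 36)      # 40
halve_ack = recovery_minutes(2, 36)     # 38: halving the ack time saves 2 minutes
halve_fix = recovery_minutes(4, 18)     # 22: halving the fix time saves 18 minutes

# Case 2: the page sits unacknowledged for 30 minutes.
# Rollback times of 5 and 1 minutes are assumed values.
slow_ack = recovery_minutes(30, 5)          # 35
slicker_rollback = recovery_minutes(30, 1)  # 31: the rollback work barely shows

print(baseline, halve_ack, halve_fix)   # 40 38 22
print(slow_ack, slicker_rollback)       # 35 31
```

In the first case the same relative improvement is worth nine times more when it is applied to the fix. In the second, even a five-fold faster rollback leaves the incident dominated by the wait for someone to pick up the page.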
Realistic targets for a small team
Rough numbers for a B2B SaaS with one or two on-call engineers. Treat these as starting points, not gospel:
- MTTA: < 5 minutes during business hours, < 10 minutes overnight. If overnight is much worse than daytime, that's a rotation-health signal, not a personal failing.
- MTTR: P50 under 30 minutes, P95 under 4 hours. Anything routinely past 4 hours is usually a dependency failure outside your control or a database problem requiring a restore.
The gap between those two ranges is the tell. If MTTA is 4 minutes and MTTR P50 is 90 minutes, your people are fast and your tooling is slow. If MTTA is 25 minutes and MTTR is 35, the incidents are basically acknowledgment delays -- the fix is quick once someone actually looks.
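That reading of the gap can be sketched as a small triage helper. This is a rough heuristic, not a standard: the function name is hypothetical, the default targets are the daytime MTTA and P50 MTTR figures above, and the 50% "acknowledgment share" cutoff is an assumption you should tune to your own data.

```python
def diagnose(mtta: float, mttr_p50: float,
             mtta_target: float = 5.0, mttr_target: float = 30.0,
             ack_share_cutoff: float = 0.5) -> str:
    """Suggest which layer to work on, given MTTA and P50 MTTR in minutes."""
    if mttr_p50 <= 0:
        raise ValueError("mttr_p50 must be positive")

    slow_ack = mtta > mtta_target
    slow_recovery = mttr_p50 > mttr_target
    ack_share = mtta / mttr_p50  # fraction of the incident spent waiting for an ack

    if slow_ack and ack_share >= ack_share_cutoff:
        return "alerting/on-call"      # incidents are mostly acknowledgment delay
    if slow_ack and slow_recovery:
        return "both"
    if slow_recovery:
        return "tooling/process"       # people engage fast, then get stuck
    if slow_ack:
        return "alerting/on-call"
    return "within targets"

print(diagnose(4, 90))    # tooling/process
print(diagnose(25, 35))   # alerting/on-call
```

The two calls are the two cases from the paragraph above: fast people with slow tooling, and incidents that are basically acknowledgment delays.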
How to improve MTTA
Everything here targets the window between detection and someone engaging. In rough priority order:
- Stop pages people ignore. The single biggest MTTA killer is a rotation that has learned alerts are probably false. Cut the noise first; alert fatigue quietly adds minutes to every acknowledgment.
- Require multi-region quorum before paging. A page that only fires when several probe regions agree is a page worth trusting -- and a trusted page gets acknowledged faster. See why quorum matters.
- Set a clear escalation path. If the primary hasn't acknowledged within, say, 5 minutes, escalate to the secondary automatically. Don't let a missed page sit.
- Keep the rotation fair and healthy. A well-rested engineer acknowledges in two minutes; an exhausted one silences and rolls over. Sane shifts and handoffs are an MTTA intervention.
- Use minimum-duration thresholds. Require a failure to persist for, say, 60-90 seconds before it pages. It won't hurt MTTA on real incidents and it kills the flapping alerts that train people to wait and see.
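Three of the items above (quorum, minimum duration, and automatic escalation) are simple enough to sketch as logic. This is an illustration of the ideas only, not any particular tool's API: `ProbeResult`, `should_page`, and `escalation_target` are hypothetical names, and the defaults (2 regions, 60 seconds, 5 minutes) are example values taken from or consistent with the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProbeResult:
    region: str
    failing: bool
    failing_for_seconds: float  # how long this region has seen the failure

def should_page(results: list[ProbeResult], quorum: int = 2,
                min_duration_seconds: float = 60) -> bool:
    """Page only when enough distinct regions have seen a sustained failure."""
    sustained_regions = {
        r.region for r in results
        if r.failing and r.failing_for_seconds >= min_duration_seconds
    }
    return len(sustained_regions) >= quorum

def escalation_target(minutes_since_page: float, acknowledged: bool,
                      escalate_after_minutes: float = 5) -> Optional[str]:
    """Who should currently be holding the page, or None once it is acknowledged."""
    if acknowledged:
        return None
    if minutes_since_page >= escalate_after_minutes:
        return "secondary"
    return "primary"

# One region blipping for 20 seconds: no page.
blip = [ProbeResult("us-east", True, 20), ProbeResult("eu-west", False, 0)]
# Two regions failing for 90 seconds: page.
outage = [ProbeResult("us-east", True, 90), ProbeResult("eu-west", True, 90)]

print(should_page(blip), should_page(outage))        # False True
print(escalation_target(2, acknowledged=False))      # primary
print(escalation_target(6, acknowledged=False))      # secondary
```

The gate trades up to a minute of paging delay for pages that are almost always real, which is the trade the list above argues for.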
How to improve MTTR
These target the window between someone engaging and the service actually coming back:
- Write mitigation-first runbooks. The first job is to stop the bleeding, not diagnose root cause. A good runbook leads with "how do I make this stop right now" -- restart, fail over, roll back -- and saves the investigation for the postmortem. Our on-call runbook template is built around this.
- Make rollback fast and boring. A large share of incidents are "the last deploy did it." If rolling back is a one-command, well-rehearsed operation, your MTTR on that whole category collapses. If it's theoretical, it won't happen at 3am.
- Invest in observability so people know where to look. Much of a long MTTR is spent finding the problem, not fixing it. Dashboards that answer "what changed and where does it hurt" in one glance turn a 30-minute hunt into a 5-minute one.
Track both, and read them together
The reason to keep MTTA and MTTR side by side is that each one is meaningless without the other. A great MTTA with a terrible MTTR means you engage fast and then flail -- your alerting works, your response doesn't. A great MTTR with a terrible MTTA shouldn't be possible when MTTR is measured from the start of the incident, because the acknowledgment delay is part of it. If you see that pattern, real outages are sitting unacknowledged while the clock runs, and recovery only looks fast because you're measuring it from when someone finally noticed.
Pull both from tools you already have: MTTA usually lives in your monitoring or paging tool's alert log, and MTTR lives in your incident tracker. Report the median and P95 of each, month over month, and feed the outliers back through your postmortem process. When one number moves and the other doesn't, you'll know immediately whether the work belongs to the alerting layer or the tooling layer -- which is the entire point of not conflating them.
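A monthly median-and-P95 report can be put together with a few lines of Python once both exports are joined. This sketch assumes you have already reduced each incident to a `(month, minutes_to_acknowledge, minutes_to_recover)` tuple; that record shape, the function names, and the sample numbers are all hypothetical. P95 here uses the simple nearest-rank method, which is one of several common percentile definitions.

```python
from collections import defaultdict
from statistics import median

def p95(values: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

def monthly_report(records):
    """records: iterable of (month 'YYYY-MM', minutes_to_ack, minutes_to_recover)."""
    by_month = defaultdict(lambda: {"tta": [], "ttr": []})
    for month, tta, ttr in records:
        by_month[month]["tta"].append(tta)
        by_month[month]["ttr"].append(ttr)

    return {
        month: {
            "mtta_p50": median(data["tta"]), "mtta_p95": p95(data["tta"]),
            "mttr_p50": median(data["ttr"]), "mttr_p95": p95(data["ttr"]),
        }
        for month, data in sorted(by_month.items())
    }

# Invented sample data for illustration.
sample = [
    ("2024-01", 2, 20), ("2024-01", 4, 40), ("2024-01", 18, 90),
    ("2024-02", 3, 25), ("2024-02", 5, 30),
]

for month, stats in monthly_report(sample).items():
    print(month, stats)
```

With only a handful of incidents per month the P95 is effectively the worst incident, which is fine: that is the outlier you want to take into the postmortem anyway.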
Frequently asked questions
- What is the difference between MTTA and MTTR?
- MTTA (Mean Time To Acknowledge) measures how long it takes a human to acknowledge an alert after detection. MTTR (Mean Time To Recover) measures how long it takes to restore full service from the moment the incident starts. MTTA is a small window inside the incident; MTTR spans the entire event, including the acknowledgment window. A short MTTA tells you people engage quickly; a short MTTR tells you the service comes back quickly.
- How do you calculate MTTA?
- MTTA is the sum of (acknowledged time minus detected time) across all incidents divided by the number of incidents in your window, usually a quarter. Because a few ignored 2am pages can dominate the average, report the median (P50) and 95th percentile (P95) alongside the mean rather than the mean alone.
- What is a good MTTA?
- For a small B2B SaaS team with one or two on-call engineers, aim for under 5 minutes during business hours and under 10 minutes overnight. If your MTTA is routinely higher than that, the problem is almost always alerting or on-call health -- noisy pages, bad routing, or an exhausted rotation -- not the engineers.
- Does improving MTTA improve MTTR?
- Only at the margin. MTTA is one component of MTTR, so shaving acknowledgment time does lower the total slightly. But if it takes 3 hours to roll back a bad deploy, cutting MTTA from 8 minutes to 3 barely moves MTTR. MTTA is an alerting and on-call problem; MTTR is mostly a tooling problem. They improve through different work.
- Is MTTR the same as MTTD?
- No. MTTD (Mean Time To Detect) measures the gap between something breaking and your monitoring noticing. MTTR (Mean Time To Recover) measures the gap between the incident starting and full service restoration. MTTD is about monitoring coverage and check frequency; MTTR is about how fast you respond and remediate once you know.
Uptimera team
We build Uptimera — multi-region uptime monitoring, SSL and DNS checks, and branded status pages. These guides come from running the same monitoring and on-call practices we write about.