
pre-mortems beat post-mortems

the cheapest debugging is the kind you do before the bug.

a post-mortem is what you write after something broke. a pre-mortem is the same document, written before. you imagine the launch failed, you have an hour, you write the post-mortem now.

it sounds like a thought experiment. it’s not. it’s a debugging session. gary klein named the technique in hbr in 2007, drawing on his applied research in naturalistic decision-making. the empirical claim he cites comes from a 1989 study by mitchell, russo and pennington: prospective hindsight (imagining that the failure has already happened) improves the ability to correctly identify reasons for future outcomes by about 30%. that matches my own experience.

the trick is the framing

“what could go wrong” is a question that reads as risk-averse and gets answered with hand-waves. “the launch failed — what happened?” is a story prompt. people fill in stories. the stories are concrete: the queue backed up, the migration locked the orders table, the new vendor’s webhook was three minutes late and our retry logic doubled the request volume.

i’ve run pre-mortems before every major launch since 2019. the same three things show up every time:

  1. one or two failure modes the original plan missed entirely.
  2. one mitigation that’s cheap if you do it now and expensive if you do it later — a feature flag, a secondary index, a rate limit.
  3. one assumption that turns out to be wrong when you state it out loud — “we said we’d cut over at midnight, but the support team starts at 8am.”

the format

thirty minutes of silent writing, thirty minutes of reading each other’s drafts, one shared doc with the failure modes ranked by how cheap they are to defuse.
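
the ranking step is simple enough to sketch. this is a minimal illustration, not a tool i actually use: the `FailureMode` type, the `defuse_hours` field and the hour estimates are all made up for the example. the stories are the ones from earlier in the post.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    story: str           # the concrete failure story, as written in the session
    defuse_hours: float  # rough guess at the effort to mitigate it now (hypothetical unit)


def rank_by_cheapest_fix(modes):
    """Return failure modes ordered from cheapest to most expensive to defuse."""
    return sorted(modes, key=lambda m: m.defuse_hours)


# illustrative entries only; the hour estimates are invented for this sketch
modes = [
    FailureMode("migration locked the orders table", 8.0),
    FailureMode("vendor webhook late, retries doubled request volume", 2.0),
    FailureMode("queue backed up", 4.0),
]

ranked = rank_by_cheapest_fix(modes)
for m in ranked:
    print(f"{m.defuse_hours:>5.1f}h  {m.story}")
```

sorting by cost alone is a deliberate simplification. a team that also wants to weigh likelihood or blast radius could swap in a different sort key without changing the shape of the doc.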

the pre-mortem isn’t a substitute for the post-mortem. it’s a forecast. the post-mortem audits the forecast — did the failure mode we predicted actually happen, and if not, what did we miss? over time the forecasts get sharper.
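
the audit can be as small as a set comparison. a rough sketch, assuming each failure mode is reduced to a short label that both documents share. the function name, the labels and the three buckets are my own invention for illustration.

```python
def audit_forecast(predicted, observed):
    """Compare pre-mortem predictions with what the post-mortem actually found."""
    predicted, observed = set(predicted), set(observed)
    return {
        # we saw it coming and it happened anyway
        "predicted_and_happened": sorted(predicted & observed),
        # predicted, but it did not show up (mitigated, or simply did not occur)
        "predicted_not_seen": sorted(predicted - observed),
        # the interesting bucket: what the forecast missed
        "missed": sorted(observed - predicted),
    }


# hypothetical labels for one launch
report = audit_forecast(
    predicted=["queue backlog", "orders table lock", "webhook retry storm"],
    observed=["orders table lock", "expired tls cert"],
)
for bucket, items in report.items():
    print(f"{bucket}: {items}")
```

the `missed` bucket is the one to read first in the next pre-mortem. it is the list of things the last forecast did not imagine.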

a post-mortem after a failure is information. a pre-mortem before a launch is leverage. if you can only afford one, do the cheaper one.
