The incident is over. The service is stable. The timeline has been written, the contributing factors identified, and the action items assigned.
Everyone agrees the learning matters.
Three months later, a similar pattern starts forming. The on-call engineer opens a war room. The first questions are familiar: what changed, who owns it, has this happened before? Someone may vaguely remember a similar incident, but the detail has faded. The postmortem exists. Nobody searches for it in the first fifteen minutes.
Postmortems are useful, but passive
A postmortem does something important. It slows a team down long enough to think carefully about what actually happened. Writing a timeline, identifying contributing factors, and proposing corrective action has real value. It creates a shared record.
The value of a postmortem is not the document. It is the future decision it improves.
The problem is not what gets written. The problem is that the document waits to be found.
A postmortem assumes that next time, someone will:
- know the document exists
- think to search for it at the right moment
- recognise that the current situation is similar enough to be relevant
- find the right page before the incident has moved on
- extract the useful detail while the team is already under pressure
That is a long chain of steps, each relying on memory and pattern recognition at exactly the moment a team has the least capacity for them.
Static learning fails under pressure
During an incident, teams are not calmly reviewing old documentation. They are working through symptoms, recent changes, ownership paths, and customer impact simultaneously.
The team that wrote the postmortem three months ago had full context. The team responding today may not have the same people in the room. They may not know where to look. They may not recognise a partial pattern as related to an incident nobody has thought about since the postmortem was filed.
Incident learning should not depend on someone remembering the right page under pressure.
The postmortem is there. It is just not where the team is looking.
Incident memory should show up next time
The goal is not to write better documents. The goal is to make past learning available when it can still change the outcome.
Incident knowledge becomes more useful when it can surface during:
- live incident response, when a similar pattern is forming
- deployment review, when a risky change touches the same service or dependency
- code review, when a PR modifies an area with a history of incidents
- ownership handovers, when operational context needs to transfer between teams
The question is not "did we document this?" It is "will the team recognise the pattern before it becomes expensive again?"
Teams should not have to rediscover the same lesson every time. The signals were present before. The contributing factors were identified. The blast radius was mapped. That knowledge exists in the organisation. It should travel with the system, not sit archived in a page that requires the right search terms to surface.
Where Ember fits
Monitoring tells teams something is wrong. Version control shows what changed. Postmortems record what was learned.
None of these surfaces past incident learning automatically when it is most relevant.
Ember is being built to connect incident memory to live incident context. Not to replace postmortems or the process of writing them, but to help relevant past context appear when similar conditions are present.
When a pattern resembles a previous incident, Ember should help surface:
- what the previous incident involved and which services were affected
- which changes were implicated and what the contributing factors were
- what actions helped and what wasted time
- how strong the similarity is, and what evidence supports it
Ember does not guarantee recurrence detection or prevent repeated incidents by itself. But it tries to make the learning that already exists available at the moment it can change a decision, rather than waiting to be found.
Learning only matters if it changes what happens next
A team does not really learn from an incident because a document exists.
It learns when that knowledge changes what happens next time. When a responder recognises a pattern earlier. When a deployment gets a closer look because a similar change caused trouble before. When the team can move quickly because they are not rebuilding context from scratch.
The incidents will come. The question is whether the team has to rediscover the same lessons, or whether past experience is working for them.