Standard: Learnings from incidents are turned into engineering improvements

Purpose and Strategic Importance

This standard ensures that lessons from incidents are not lost but actively drive engineering improvements. It turns short-term recovery into long-term resilience, helping teams build better systems through continuous learning.

Aligned to our "Post-Incident Learning Culture" policy, this standard fosters accountability, strengthens feedback loops, and prevents recurrence. Without it, incidents repeat, trust erodes, and opportunities to grow are missed.

Strategic Impact

  • Improved delivery flow and system stability
  • Reduced recurrence of failures and incidents
  • Stronger engineering discipline and accountability
  • Enhanced trust and confidence in releases
  • Better alignment of technical improvements with business priorities

Risks of Not Having This Standard

  • Reduced ability to respond to change or failure
  • Accumulation of technical debt or unresolved issues
  • Poor developer morale and frustration
  • Loss of stakeholder confidence in delivery quality
  • Repeated failures due to lack of systemic improvement

CMMI Maturity Model

Level 1 – Initial

People & Culture
  • Action items from incidents are rarely completed or followed up.
  • Learning from incidents is informal or ignored.
Process & Governance
  • No structured process for tracking or implementing improvements.
  • Incident resolution focuses on firefighting only.
Technology & Tools
  • Limited or no tooling for managing post-incident actions.
  • Knowledge is siloed or lost.
Measurement & Metrics
  • No tracking of improvement completion or impact.

Level 2 – Managed

People & Culture
  • Some learnings are captured but implementation is inconsistent.
  • Responsibility for improvements is unclear.
Process & Governance
  • Improvements identified during incident reviews are occasionally tracked.
  • Processes lack rigour and prioritisation.
Technology & Tools
  • Basic tracking tools (e.g., spreadsheets, ticket systems) are used for some actions.
  • Sharing of learnings is ad hoc.
Measurement & Metrics
  • Partial tracking of improvement progress; limited outcome analysis.

Level 3 – Defined

People & Culture
  • Teams prioritise and commit to implementing improvements.
  • Incident learnings are part of team retrospectives.
Process & Governance
  • Formal processes ensure post-incident actions are tracked and closed.
  • Improvement outcomes are reviewed regularly.
Technology & Tools
  • Tools support linking incidents to improvement work.
  • Documentation and sharing of lessons is standardised.
Measurement & Metrics
  • Metrics track closure rates and effectiveness of improvements.

Level 4 – Quantitatively Managed

People & Culture
  • Learning from incidents drives continuous engineering improvement.
  • Teams proactively identify systemic issues.
Process & Governance
  • Cross-incident trend analysis guides strategic investments.
  • Improvement work is integrated into planning cycles.
Technology & Tools
  • Advanced tooling aggregates incident data and tracks systemic fixes.
  • Dashboards provide visibility on improvement impact.
Measurement & Metrics
  • Analysis of failure recurrence reduction and engineering health trends.
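
As an illustration of the cross-incident trend analysis described at this level, the sketch below counts repeat incidents per contributing cause and the share of a quarter's incidents whose cause had already appeared earlier. The `Incident` record, the cause categories, and the "YYYY-Qn" quarter labels are all hypothetical; a real implementation would read these fields from whatever incident tracker the team uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record, as it might be exported from a tracker."""
    incident_id: str
    quarter: str  # assumed "YYYY-Qn" label, so plain string comparison orders quarters
    cause: str    # contributing-cause category assigned during the incident review


def repeats_by_cause(incidents: list[Incident]) -> dict[str, int]:
    """Count repeat incidents per cause (every occurrence after the first)."""
    counts = Counter(incident.cause for incident in incidents)
    return {cause: n - 1 for cause, n in counts.items() if n > 1}


def recurrence_rate(incidents: list[Incident], quarter: str) -> float:
    """Share of a quarter's incidents whose cause was seen in an earlier quarter."""
    earlier_causes = {i.cause for i in incidents if i.quarter < quarter}
    current = [i for i in incidents if i.quarter == quarter]
    if not current:
        return 0.0
    repeated = sum(1 for i in current if i.cause in earlier_causes)
    return repeated / len(current)


# Illustrative data only; not taken from any real incident history.
incidents = [
    Incident("INC-1", "2024-Q1", "config-drift"),
    Incident("INC-2", "2024-Q1", "expired-certificate"),
    Incident("INC-3", "2024-Q2", "config-drift"),
    Incident("INC-4", "2024-Q2", "capacity"),
    Incident("INC-5", "2024-Q3", "config-drift"),
    Incident("INC-6", "2024-Q3", "capacity"),
]

print(repeats_by_cause(incidents))
print(recurrence_rate(incidents, "2024-Q2"))
```

A recurrence rate that falls from one quarter to the next is one possible signal that systemic fixes are taking effect, although it depends on causes being categorised consistently across reviews.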

Level 5 – Optimising

People & Culture
  • Continuous learning culture is embedded organisation-wide.
  • Teams innovate proactively based on incident insights.
Process & Governance
  • Incident learnings directly influence architecture, processes, and culture.
  • Strategic foresight prevents recurrence and drives resilience.
Technology & Tools
  • Predictive analytics anticipate issues and recommend improvements.
  • Integrated systems enable real-time feedback loops.
Measurement & Metrics
  • Demonstrated systemic improvement and reduced incident impact over time.

Key Measures

  • Completion rate of post-incident action items
  • Reduction in repeat incidents linked to implemented improvements
  • Time from incident identification to improvement deployment
  • Developer and stakeholder satisfaction related to incident handling
  • Maturity assessment scores based on learning culture and process rigour
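
The first and third measures above reduce to simple arithmetic over action-item records. The following is a minimal sketch, assuming a hypothetical `ActionItem` record with an incident identification date and an optional deployment date; the field names and sample data are illustrative and not prescribed by this standard.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class ActionItem:
    """Hypothetical post-incident action item."""
    incident_id: str
    incident_identified: date
    deployed: Optional[date] = None  # None while the improvement is still open


def completion_rate(items: list[ActionItem]) -> float:
    """Fraction of action items whose improvement has been deployed."""
    if not items:
        return 0.0
    done = sum(1 for item in items if item.deployed is not None)
    return done / len(items)


def median_days_to_improvement(items: list[ActionItem]) -> Optional[float]:
    """Median days from incident identification to improvement deployment."""
    durations = [
        (item.deployed - item.incident_identified).days
        for item in items
        if item.deployed is not None
    ]
    return median(durations) if durations else None


# Illustrative data only.
items = [
    ActionItem("INC-1", date(2024, 1, 10), date(2024, 1, 20)),
    ActionItem("INC-1", date(2024, 1, 10), date(2024, 2, 9)),
    ActionItem("INC-2", date(2024, 3, 1)),
    ActionItem("INC-3", date(2024, 3, 5), date(2024, 3, 25)),
]

print(f"Completion rate: {completion_rate(items):.0%}")
print(f"Median days to improvement: {median_days_to_improvement(items)}")
```

The median is used here rather than the mean so that a single long-running action item does not dominate the figure; that is a design choice for this sketch, not a requirement of the standard.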

Associated Policies

  • Post-Incident Learning Culture

Associated Practices

  • Root Cause Analysis (RCA)
  • Exploratory Testing
  • Test-Driven Development (TDD)
  • Behaviour-Driven Development (BDD)
  • Non-functional Requirement Testing
  • Mutation Testing
  • End-to-End (E2E) Testing
  • Contract Testing
  • Visual Regression Testing
  • Integration Testing
  • Blameless Postmortems
