Standard: Major incidents are followed by timely, blameless reviews

Purpose and Strategic Importance

This standard ensures major incidents are followed by timely, blameless reviews that focus on learning, not fault. It helps teams uncover root causes, share insights, and strengthen systems without fear or blame.

Aligned with our "Post-Incident Learning Culture" policy, this standard builds trust, encourages transparency, and improves system resilience. Without it, teams miss critical learning opportunities and risk repeating avoidable failures.

Strategic Impact

  • Improved delivery flow and reduced risk
  • Higher system resilience through continuous improvement
  • Increased trust and transparency across teams
  • Stronger alignment between technical and business priorities
  • Faster time to value and reduced rework

Risks of Not Having This Standard

  • Reduced ability to respond effectively to incidents
  • Accumulation of technical debt and recurring failures
  • Poor developer morale and increased frustration
  • Loss of confidence in delivery quality
  • Misalignment between technical fixes and business needs

CMMI Maturity Model

Level 1 – Initial

People & Culture
  • No shared mindset or training around reviews.
  • Incident post-mortems are punitive or skipped.

Process & Governance
  • No formal triggers or timelines for reviews.
  • Teams act independently with no standard approach.

Technology & Tools
  • No dedicated tracking or collaboration tools.
  • Notes are informal and fragmented.

Measurement & Metrics
  • No visibility or tracking of review completion or outcomes.

Level 2 – Managed

People & Culture
  • Some trained facilitators conduct blameless retrospectives.
  • Learning is recognised but seen as optional.

Process & Governance
  • Policy mandates reviews within 72 hours for Severity 1/2 incidents.
  • Some teams adopt basic review templates.

Technology & Tools
  • Incident registers or ticketing systems flag major incidents.
  • Shared documentation captures review outputs.

Measurement & Metrics
  • Percentage of incidents reviewed on time.
  • Number of action items generated per review.
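
The Level 2 timeliness measure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Incident` record and its field names are hypothetical, and starting the 72-hour clock when the incident is resolved is an assumption, since the policy does not say when the window begins.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical incident record: field names are illustrative, not from a real tool.
@dataclass
class Incident:
    incident_id: str
    severity: int                                   # 1 = most severe
    resolved_at: datetime                           # assumed start of the 72-hour clock
    review_completed_at: Optional[datetime] = None

REVIEW_WINDOW = timedelta(hours=72)   # window stated in the Level 2 policy
MAJOR_SEVERITIES = {1, 2}             # Severity 1/2 incidents require a review

def is_major(incident: Incident) -> bool:
    """Flag incidents that the policy requires to be reviewed."""
    return incident.severity in MAJOR_SEVERITIES

def reviewed_on_time(incident: Incident) -> bool:
    """True if the review was completed within the window after resolution."""
    if incident.review_completed_at is None:
        return False
    return incident.review_completed_at - incident.resolved_at <= REVIEW_WINDOW

def on_time_review_rate(incidents: list[Incident]) -> Optional[float]:
    """Percentage of major incidents reviewed within the window (None if there are none)."""
    major = [i for i in incidents if is_major(i)]
    if not major:
        return None
    return 100.0 * sum(reviewed_on_time(i) for i in major) / len(major)

t0 = datetime(2024, 3, 1, 9, 0)
incidents = [
    Incident("INC-1", 1, t0, t0 + timedelta(hours=48)),  # reviewed on time
    Incident("INC-2", 2, t0, t0 + timedelta(hours=96)),  # reviewed late
    Incident("INC-3", 2, t0),                            # not yet reviewed
    Incident("INC-4", 3, t0),                            # not major, excluded
]
print(f"{on_time_review_rate(incidents):.1f}%")  # 33.3%
```

Unreviewed major incidents count against the rate, so a skipped review shows up in the measure rather than silently disappearing from it.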

Level 3 – Defined

People & Culture
  • All relevant staff are trained in lessons-learned facilitation.
  • Peer reviewers ensure reports use blameless language.

Process & Governance
  • A global playbook guides how reviews are tailored.
  • Reviews contribute to an organisation-wide knowledge base.

Technology & Tools
  • Automated reminders and dashboards track overdue reviews.
  • A centralised repository supports tagging and reuse.

Measurement & Metrics
  • Quality scoring of review reports.
  • Median time from incident to report publication.
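
A minimal sketch of the Level 3 tracking, assuming a hypothetical `ReviewRecord` with a resolution time and an optional publication time. `overdue_reviews` returns the candidates an automated reminder or dashboard might surface, and `median_hours_to_publication` computes the median-time measure. Reusing the Level 2 72-hour window and measuring from resolution are assumptions; a real tracker would supply its own schema and thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class ReviewRecord:
    # Hypothetical record; a real tracker will have its own schema.
    incident_id: str
    resolved_at: datetime                     # assumed start of the review clock
    published_at: Optional[datetime] = None   # when the review report was published

def overdue_reviews(records: list[ReviewRecord], now: datetime,
                    window: timedelta = timedelta(hours=72)) -> list[str]:
    """IDs of reviews still unpublished after the window: candidates for a reminder."""
    return [r.incident_id for r in records
            if r.published_at is None and now - r.resolved_at > window]

def median_hours_to_publication(records: list[ReviewRecord]) -> Optional[float]:
    """Median hours from resolution to publication, over published reviews only."""
    hours = [(r.published_at - r.resolved_at).total_seconds() / 3600
             for r in records if r.published_at is not None]
    return median(hours) if hours else None

t0 = datetime(2024, 5, 6, 8, 0)
records = [
    ReviewRecord("INC-10", t0, t0 + timedelta(hours=30)),
    ReviewRecord("INC-11", t0, t0 + timedelta(hours=50)),
    ReviewRecord("INC-12", t0, t0 + timedelta(hours=70)),
    ReviewRecord("INC-13", t0 - timedelta(days=5)),     # unpublished, overdue
    ReviewRecord("INC-14", t0 + timedelta(hours=40)),   # unpublished, still in window
]
now = t0 + timedelta(hours=80)
print(overdue_reviews(records, now))          # ['INC-13']
print(median_hours_to_publication(records))   # 50.0
```

The median is used rather than the mean so that one very late report does not dominate the figure.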

Level 4 – Quantitatively Managed

People & Culture
  • Data-driven retrospectives use control charts to detect trends.
  • Roles such as “Data Champion” monitor review health.

Process & Governance
  • KPIs with SLAs for timeliness, action closure, and recurrence.
  • Quarterly health checks adjust processes as needed.

Technology & Tools
  • Real-time analytics reveal recurring error modes.
  • Automated playbook suggestions based on root-cause patterns.

Measurement & Metrics
  • Control chart analysis of review cycle times.
  • Percentage of actions closed on time.
  • Reduction in repeat incidents.
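
Because each review yields a single cycle-time measurement, an individuals (XmR) chart is one common choice for the control chart analysis; this standard does not mandate a particular chart type, so treat the sketch below as an illustration. The limits are the baseline mean plus or minus 2.66 times the average moving range, and later points outside them are flagged. The function names and sample data are hypothetical.

```python
from statistics import mean

XMR_SCALING = 2.66  # 3 / d2, with d2 = 1.128 for moving ranges of two points

def xmr_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Centre line and natural process limits of an individuals (XmR) chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    centre = mean(baseline)
    mr_bar = mean(moving_ranges)
    lower = max(0.0, centre - XMR_SCALING * mr_bar)  # cycle times cannot be negative
    upper = centre + XMR_SCALING * mr_bar
    return centre, lower, upper

def out_of_limits(values: list[float],
                  limits: tuple[float, float, float]) -> list[tuple[int, float]]:
    """(index, value) pairs outside the limits: signals worth investigating."""
    _, lower, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical review cycle times in hours (incident resolved -> review published).
baseline = [40, 44, 38, 42, 46, 41, 39, 43]
limits = xmr_limits(baseline)
print(out_of_limits([45, 120, 37], limits))  # [(1, 120)]
```

Computing limits from a stable baseline rather than from all data avoids letting a single outlier widen the very limits that are meant to detect it.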

Level 5 – Optimising

People & Culture
  • Learning champions drive cross-team innovation.
  • Successes are celebrated and inform strategic roadmaps.

Process & Governance
  • Predictive triggers initiate proactive reviews.
  • Outcomes feed into training and design standards.

Technology & Tools
  • Machine-learning-driven RCA assistants suggest hypotheses in real time.
  • Integration with planning tools supports continuous improvement.

Measurement & Metrics
  • Year-on-year reduction in Severity 1/2 incidents.
  • Quantified business impact from avoided downtime or costs.
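
Year-on-year reduction is the percentage change (previous - current) / previous x 100 in the count of Severity 1/2 incidents. A minimal sketch, assuming incidents are available as hypothetical `(date, severity)` pairs and that years are calendar years:

```python
from collections import Counter
from datetime import date
from typing import Optional

MAJOR_SEVERITIES = {1, 2}

def major_incidents_per_year(incidents: list[tuple[date, int]]) -> Counter:
    """Count Severity 1/2 incidents per calendar year from (date, severity) pairs."""
    return Counter(day.year for day, severity in incidents if severity in MAJOR_SEVERITIES)

def yoy_reduction_pct(counts: Counter, year: int) -> Optional[float]:
    """Percentage drop versus the previous year (negative means an increase)."""
    previous, current = counts[year - 1], counts[year]
    if previous == 0:
        return None  # no baseline to compare against
    return 100.0 * (previous - current) / previous

incidents = [
    (date(2023, 1, 5), 1), (date(2023, 4, 2), 2), (date(2023, 9, 9), 1),
    (date(2023, 10, 1), 3), (date(2024, 2, 2), 2), (date(2024, 7, 7), 1),
    (date(2024, 8, 8), 4),
]
counts = major_incidents_per_year(incidents)
print(counts[2023], counts[2024])                 # 3 2
print(f"{yoy_reduction_pct(counts, 2024):.1f}%")  # 33.3%
```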

Key Measures

  • % of major incidents followed by timely reviews
  • Quality and completeness of review documentation
  • Action item closure rate and time-to-closure
  • Reduction in recurrence of similar incidents
  • Stakeholder satisfaction with incident handling and learning
  • Maturity scores from structured review process assessments
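
Several of these key measures reduce to straightforward calculations over review data. The sketch below assumes hypothetical action-item records with opened, due, and closed dates, and uses a root-cause tag per incident as a rough proxy for "similar incidents". How incidents are judged similar is an assumption; teams may match on service, failure mode, or RCA category instead, and the tag names shown are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional

@dataclass
class ActionItem:
    # Hypothetical action item raised by a post-incident review.
    opened: date
    due: date
    closed: Optional[date] = None

def action_closure_summary(actions: list[ActionItem]) -> dict:
    """Closure rate, on-time share of closed items, and median days to close."""
    closed = [a for a in actions if a.closed is not None]
    on_time = [a for a in closed if a.closed <= a.due]
    days = [(a.closed - a.opened).days for a in closed]
    return {
        "closure_rate": len(closed) / len(actions) if actions else None,
        "on_time_rate": len(on_time) / len(closed) if closed else None,
        "median_days_to_close": median(days) if days else None,
    }

def repeat_incident_rate(root_cause_tags: list[str]) -> Optional[float]:
    """Share of incidents, in time order, whose root-cause tag has been seen before."""
    if not root_cause_tags:
        return None
    seen: set[str] = set()
    repeats = 0
    for tag in root_cause_tags:
        if tag in seen:
            repeats += 1
        seen.add(tag)
    return repeats / len(root_cause_tags)

actions = [
    ActionItem(date(2024, 6, 3), date(2024, 6, 17), date(2024, 6, 10)),  # on time, 7 days
    ActionItem(date(2024, 6, 3), date(2024, 6, 17), date(2024, 6, 24)),  # late, 21 days
    ActionItem(date(2024, 6, 3), date(2024, 6, 17)),                     # still open
    ActionItem(date(2024, 6, 5), date(2024, 6, 19), date(2024, 6, 14)),  # on time, 9 days
]
tags = ["db-failover", "expired-cert", "db-failover", "config-drift", "expired-cert"]
print(action_closure_summary(actions))
print(repeat_incident_rate(tags))  # 0.4
```

Tracking the repeat rate over successive periods, rather than as a single figure, is what indicates whether review actions are actually preventing recurrence.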

Associated Policies
  • Post-Incident Learning Culture

Associated Practices
  • Root Cause Analysis (RCA)
  • Exploratory Testing
  • Test-Driven Development (TDD)
  • Behaviour-Driven Development (BDD)
  • Non-functional Requirement Testing
  • Mutation Testing
  • End-to-End (E2E) Testing
  • Contract Testing
  • Visual Regression Testing
  • Integration Testing
  • Incident Response Playbooks
  • Blameless Postmortems