Standard: Failure patterns are used to inform architectural investment

Purpose and Strategic Importance

This standard ensures that patterns from failures are systematically analysed and used to guide architectural decisions. It turns operational pain into long-term improvement, enabling teams to invest in resilience where it matters most.

Aligned to our "Post-Incident Learning Culture" policy, this standard promotes smarter design, reduces repeat failures, and supports continuous learning. Without it, teams risk fixing symptoms instead of causes, which slows progress and weakens system reliability.

Strategic Impact

Improved consistency and quality across teams
Reduced operational friction and delivery risks
Stronger ownership and autonomy in technical decision-making
More inclusive and sustainable engineering culture

Risks of Not Having This Standard

Slower time-to-value and increased rework
Accumulation of inconsistency and process debt
Reduced trust in engineering data, systems, or ownership
Loss of agility in the face of change or failure

CMMI Maturity Model

Level 1 – Initial

Category	Description
People & Culture	- Failures resolved in isolation without shared learning. - Root causes often go undocumented, so the same issues are revisited repeatedly.
Process & Governance	- No standard practice for identifying failure patterns. - Architectural decisions rarely informed by incident data.
Technology & Tools	- Incident management tools used for firefighting only. - No analysis or aggregation of failure data.
Measurement & Metrics	- Lack of visibility into recurring failure trends or architectural impacts.

Level 2 – Managed

Category	Description
People & Culture	- Some teams capture incident learnings, but sharing is inconsistent. - Architectural changes driven by local needs.
Process & Governance	- Informal feedback loops link incidents to architecture. - Improvements are reactive and isolated.
Technology & Tools	- Incident data occasionally reviewed but not systematically analysed. - Tools support manual extraction of failure info.
Measurement & Metrics	- Basic tracking of incident recurrence without architectural prioritisation.

Level 3 – Defined

Category	Description
People & Culture	- Teams systematically review failure patterns. - Architectural teams engaged in incident retrospectives.
Process & Governance	- Failure analyses standardised and documented. - Learnings feed into architecture and design discussions.
Technology & Tools	- Tools support aggregation and trending of failure data. - Dashboards provide visibility into architectural risks.
Measurement & Metrics	- Metrics on failure types, frequency, and related architectural changes.
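
As an illustration of the Level 3 tooling and metrics rows, the following is a minimal sketch of aggregating incident records by component and failure type, then trending them by month. The `Incident` record shape, the field names, and the sample data are all hypothetical; a real incident management tool will expose different fields and would normally be queried through its own export or API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape; real incident tools expose different fields.
    opened: date
    component: str      # architectural component implicated in the review
    failure_type: str   # e.g. "timeout", "capacity", "config"


def failure_patterns(incidents):
    """Count incidents per (component, failure_type) pair, most frequent first."""
    counts = Counter((i.component, i.failure_type) for i in incidents)
    return counts.most_common()


def monthly_trend(incidents, component):
    """Return {(year, month): count} for one component, in chronological order."""
    counts = Counter(
        (i.opened.year, i.opened.month)
        for i in incidents
        if i.component == component
    )
    return dict(sorted(counts.items()))


# Illustrative sample data only.
INCIDENTS = [
    Incident(date(2024, 1, 9), "checkout-api", "timeout"),
    Incident(date(2024, 1, 23), "checkout-api", "timeout"),
    Incident(date(2024, 2, 2), "search", "capacity"),
    Incident(date(2024, 2, 14), "checkout-api", "timeout"),
    Incident(date(2024, 2, 20), "checkout-api", "config"),
]

if __name__ == "__main__":
    for (component, failure_type), count in failure_patterns(INCIDENTS):
        print(f"{component:15} {failure_type:10} {count}")
    print(monthly_trend(INCIDENTS, "checkout-api"))
```

The ranked pairs give a simple starting point for the "failure types and frequency" metric. The monthly trend is the kind of series a dashboard of architectural risk might plot per component.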

Level 4 – Quantitatively Managed

Category	Description
People & Culture	- Organisation-wide awareness of failure patterns. - Architecture investment prioritised by data.
Process & Governance	- Failure trend analysis informs roadmap and resilience strategies. - Continuous feedback between incidents and architecture.
Technology & Tools	- Automated analysis identifies systemic issues. - Tools correlate failures with architectural components.
Measurement & Metrics	- Impact of architectural changes on failure rates tracked and reported.
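
The Level 4 metric, tracking the impact of architectural changes on failure rates, can be approximated with a before/after comparison. The sketch below is one possible approach and makes several assumptions: the function names are hypothetical, rates are normalised to 30 days, and windows are half-open. A before/after difference alone does not establish that the change caused it, so results of this kind are best read alongside traffic, release, and seasonality data.

```python
from datetime import date


def incidents_per_30_days(incident_dates, start, end):
    """Incident rate normalised to 30 days over the half-open window [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for d in incident_dates if start <= d < end)
    return count * 30 / days


def change_impact(incident_dates, change_date, window_start, window_end):
    """Compare incident rates before and after an architectural change.

    Incidents on change_date itself are counted in the 'after' window.
    """
    before = incidents_per_30_days(incident_dates, window_start, change_date)
    after = incidents_per_30_days(incident_dates, change_date, window_end)
    return {"before": before, "after": after, "delta": after - before}


# Illustrative sample data only: incidents attributed to one component.
DATES = [
    date(2024, 1, 5), date(2024, 1, 19), date(2024, 2, 2),
    date(2024, 2, 11), date(2024, 2, 20), date(2024, 2, 27),
    date(2024, 3, 15), date(2024, 4, 10),
]

if __name__ == "__main__":
    result = change_impact(
        DATES,
        change_date=date(2024, 3, 1),
        window_start=date(2024, 1, 1),
        window_end=date(2024, 4, 30),
    )
    print(result)
```

With this sample data both windows are 60 days long, so the rate drops from 3.0 to 1.0 incidents per 30 days.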

Level 5 – Optimising

Category	Description
People & Culture	- Post-incident learning drives architecture evolution. - Cross-team collaboration prevents failure recurrence.
Process & Governance	- Failure patterns shape long-term resilience and platform strategies. - Shared learning culture embedded organisation-wide.
Technology & Tools	- Advanced predictive analytics guide architectural investment. - Systems proactively alert on emerging failure patterns.
Measurement & Metrics	- Demonstrated reduction in critical failures due to architectural improvements. - Evidence of organisational learning loops impacting resilience.
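
To make "proactively alert on emerging failure patterns" concrete, the sketch below flags patterns whose recent weekly incident count has risen well above their own baseline. It is a deliberately simple threshold heuristic, not predictive analytics. The function name, the `recent_weeks`, `factor`, and `min_recent` parameters, and their default values are assumptions chosen for illustration and would need tuning against real incident data.

```python
def emerging_patterns(weekly_counts, recent_weeks=2, factor=2.0, min_recent=3):
    """Flag patterns whose recent average exceeds `factor` times their baseline.

    weekly_counts maps a pattern label to incident counts per week, oldest first.
    min_recent suppresses alerts for patterns with very few recent incidents.
    """
    flagged = []
    for pattern, counts in weekly_counts.items():
        if len(counts) <= recent_weeks:
            continue  # not enough history to form a baseline
        baseline = counts[:-recent_weeks]
        recent = counts[-recent_weeks:]
        baseline_avg = sum(baseline) / len(baseline)
        recent_avg = sum(recent) / len(recent)
        if sum(recent) >= min_recent and recent_avg > factor * baseline_avg:
            flagged.append(pattern)
    return sorted(flagged)


# Illustrative sample data only: six weeks of counts per pattern.
WEEKLY = {
    "checkout-api/timeout": [1, 0, 1, 1, 3, 4],
    "search/capacity": [2, 2, 3, 2, 2, 3],
    "auth/config": [0, 0, 0, 0, 1, 0],
}

if __name__ == "__main__":
    print(emerging_patterns(WEEKLY))
```

In the sample data only "checkout-api/timeout" is flagged: "search/capacity" is frequent but stable, and "auth/config" has too few recent incidents to pass the `min_recent` floor.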

Key Measures

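Adoption rate and coverage of failure-pattern reviews across teams
Impact of resulting architectural changes on delivery metrics, quality, or team health
Evidence of ownership, governance, or learning loops linking incidents to architectural decisions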