Standard: Monitoring is embedded in design and operations
Purpose and Strategic Importance
This standard ensures that monitoring is a first-class capability built into system design, development, and operations. By instrumenting services, infrastructure, and user workflows with real-time metrics, health checks, and user-experience indicators, teams gain the visibility needed to detect anomalies early, troubleshoot effectively, and maintain high levels of service reliability.
Strategic Impact
- Early detection and proactive response to anomalies
- Improved operational excellence through data-driven decisions
- Enhanced ability to prioritize features and architectural improvements
- Assurance of SLA compliance and regulatory governance
Risks of Not Having This Standard
- Blind spots in production leading to user impact
- Inefficient incident diagnosis and longer outages
- Decreased customer satisfaction due to unnoticed errors
- Growth of fragmented and costly monitoring solutions
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Monitoring is informal or manual, with little standard practice. |
| Process & Governance | Monitoring efforts are ad hoc, reactive, and inconsistent. |
| Technology & Tools | Reliance on logs and manual checks without automation. |
| Measurement & Metrics | No consistent measurement of detection or alert effectiveness. |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams begin to recognise the importance of monitoring and define basic metrics. |
| Process & Governance | Central collection of key service and infrastructure metrics is established. |
| Technology & Tools | Basic alerting is in place but varies across teams and systems. |
| Measurement & Metrics | Some tracking of alert volumes and incident detection times. |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Monitoring is embedded in team practices, with clear ownership. |
| Process & Governance | Standardised metric schemas and dashboards mandated across teams. |
| Technology & Tools | SLIs and SLOs defined, tracked, and reported regularly. |
| Measurement & Metrics | Metrics quality and coverage are monitored for completeness. |
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams use monitoring data proactively to improve system health. |
| Process & Governance | Monitoring quality metrics (accuracy, latency) are measured and optimised. |
| Technology & Tools | Anomaly detection, dynamic thresholds, and alert tuning implemented. |
| Measurement & Metrics | Quantitative tracking of detection time and alert precision. |
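As an illustration of the dynamic thresholds named at Level 4, the sketch below derives an alert threshold from recent samples instead of a fixed constant. It is one simple approach (mean plus k standard deviations), not a prescribed method; the function names, the `k` parameter, and the sample latencies are all hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold(history: list[float], k: float = 3.0) -> float:
    """Upper alert threshold: mean of recent samples plus k sample standard deviations.

    Assumption: `history` is a recent window of a metric such as request latency.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return mean(history) + k * stdev(history)


def is_anomalous(value: float, history: list[float], k: float = 3.0) -> bool:
    """Flag a new observation that exceeds the threshold derived from history."""
    return value > dynamic_threshold(history, k)


# Hypothetical latency samples in milliseconds.
latencies_ms = [100.0, 102.0, 98.0, 101.0, 99.0]
print(is_anomalous(150.0, latencies_ms))  # far above the recent range
print(is_anomalous(103.0, latencies_ms))  # within normal variation
```

Raising `k` trades sensitivity for fewer false positives, which is one lever for the alert tuning mentioned in the same row.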
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Predictive analytics and automated remediation are cultural norms. |
| Process & Governance | Continuous monitoring improvement processes reduce noise and improve relevance. |
| Technology & Tools | Intelligent systems anticipate issues and guide preventive action. |
| Measurement & Metrics | Business impact quantified from monitoring-driven improvements. |
Key Measures
- Monitoring coverage (% of systems with standardised metrics)
- Mean Time to Detect (MTTD) (average time from the start of an issue to its detection by monitoring)
- Alert precision (ratio of true positive alerts to total alerts)
- SLO compliance rate (percentage of time services meet defined objectives)
- Monitoring data latency (time from event to metric availability)
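
The measures above reduce to simple ratios and averages. The following is a minimal sketch of how they might be computed, assuming incident and alert records are already available; the `Incident` type, the function names, and the example figures are hypothetical and do not refer to any particular monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started_at: datetime   # when the issue began (assumed known after the fact)
    detected_at: datetime  # when monitoring first flagged it


def percentage(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole


def monitoring_coverage(systems_with_metrics: int, total_systems: int) -> float:
    """Percentage of systems emitting standardised metrics."""
    return percentage(systems_with_metrics, total_systems)


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average delay between issue start and detection."""
    delays = [(i.detected_at - i.started_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(delays))


def alert_precision(true_positive_alerts: int, total_alerts: int) -> float:
    """Ratio of true positive alerts to all alerts fired."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    return true_positive_alerts / total_alerts


def slo_compliance_rate(compliant_minutes: int, total_minutes: int) -> float:
    """Percentage of time the service met its objective."""
    return percentage(compliant_minutes, total_minutes)


def data_latency(event_time: datetime, metric_available_at: datetime) -> timedelta:
    """Time from an event occurring to its metric being queryable."""
    return metric_available_at - event_time


# Hypothetical example values.
incidents = [
    Incident(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 4)),
    Incident(datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 13, 2)),
]
print(mean_time_to_detect(incidents))  # 0:03:00
print(alert_precision(45, 60))         # 0.75
print(monitoring_coverage(18, 24))     # 75.0
```

Tracking these values per team or per service over time, rather than as a single organisation-wide number, makes it easier to see where coverage or alert quality is lagging.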