Why Most Engineering Organisations Misdiagnose Their Problems

The Real Problem Isn't Capability

Most engineering organisations don't struggle because they lack capability.

They struggle because they apply the wrong thinking to the wrong problems.

I've seen it repeatedly. A leadership team kicks off a transformation programme - detailed roadmap, phased milestones, RAG status updates every fortnight. Eighteen months later, very little has changed. Not because the people weren't talented, or the intention wasn't genuine. But because they treated a complex, emergent challenge like a predictable delivery problem.

The same pattern plays out across engineering organisations every day:

  • Treating innovation like delivery - defining requirements upfront for problems that aren't yet understood
  • Treating incidents like predictable systems - writing runbooks for failure modes that don't repeat
  • Treating transformation like a roadmap exercise - planning culture change the same way you plan a release

The root cause isn't execution. It's misdiagnosis.


What Cynefin Actually Is

Dave Snowden developed Cynefin (pronounced kuh-NEV-in - a Welsh word meaning habitat or haunt) in the late 1990s while working at IBM. It has since become one of the most widely used sense-making frameworks for leaders operating under uncertainty.

But here's what it isn't: a categorisation model.

You don't use Cynefin to label problems and file them away. You use it to decide how to think before you decide what to do. It's a sense-making framework - and that distinction matters enormously.

The central insight is deceptively simple:

The nature of the problem determines the approach. Not the other way around.

Most organisations do the opposite. They have a preferred approach - planning, best practice, expertise, agile ceremonies - and they apply it regardless of the problem in front of them. Cynefin gives you the language and the discipline to stop doing that.


The Five Domains

Cynefin defines five domains. Each one describes a different relationship between cause and effect, and demands a fundamentally different leadership stance.


Clear (formerly Obvious)

The domain of known knowns.

In this domain, cause and effect are self-evident. Best practice applies. The right answer is known, repeatable, and documentable.

Engineering examples:

  • Repeatable deployments via CI/CD pipelines
  • Standard onboarding processes
  • Routine patching and dependency updates
  • Incident playbooks for well-understood failure modes

Leadership stance: Sense → Categorise → Respond

You observe what's happening, recognise which category it falls into, and apply the known best practice. There is no need to analyse deeply or experiment. The answer is already known.

Failure mode: Over-engineering simple problems. If your team is debating the "right approach" to something that already has a well-established answer, you're wasting energy. Standardise it, automate it, and move on.
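
To make the stance concrete, here is a minimal sketch in Python of Sense → Categorise → Respond as a runbook dispatcher. The alert categories and actions are invented for illustration; the point is that recognition maps straight to a documented response, with no analysis step in between.

```python
# Sense -> Categorise -> Respond: a toy runbook dispatcher.
# The alert categories and actions are illustrative, not a real incident taxonomy.

RUNBOOKS = {
    "disk_full": "Rotate logs and expand the volume per the storage runbook",
    "cert_expiring": "Renew the certificate via the standard automation job",
    "dependency_cve": "Bump the dependency and run the patch pipeline",
}

def respond(alert_category: str) -> str:
    """Categorise the alert and apply the documented best practice."""
    action = RUNBOOKS.get(alert_category)
    if action is None:
        # No known category is a signal you may have left the Clear domain.
        return "Escalate: no runbook exists - re-diagnose the domain"
    return action

print(respond("cert_expiring"))  # known category -> known response
print(respond("novel_failure"))  # unknown category -> escalate and re-diagnose
```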


Complicated

The domain of known unknowns.

Cause and effect exist, but they're not immediately obvious. You need expertise to navigate them. There may be more than one good answer, and a competent specialist can identify one that works.

Engineering examples:

  • System architecture decisions
  • Performance tuning and capacity planning
  • Security design
  • Database schema evolution

Leadership stance: Sense → Analyse → Respond

You observe what's happening, bring in the right expertise to analyse it, and respond based on that analysis. Good engineering organisations thrive here - it's where deep technical skill creates genuine value.

Failure mode: Assuming there is only one right answer, or that the expert in the room must be deferred to without challenge. Complicated problems reward expertise, but they also reward second opinions and structured analysis.


Complex

The domain of unknown unknowns. This is where most of your leadership attention should live.

In the complex domain, cause and effect can only be understood in retrospect. You cannot design the solution upfront. Patterns emerge over time. The environment responds to your actions in ways you cannot fully predict.

Engineering examples:

  • Culture change
  • AI adoption across engineering teams
  • DevOps transformation
  • Platform engineering rollout
  • Building psychological safety
  • Organisational redesign

Leadership stance: Probe → Sense → Respond

You run safe-to-fail experiments. You observe what happens. You amplify what works and dampen what doesn't. You do this repeatedly, adjusting as the system responds.

The critical phrase is safe-to-fail, not fail-safe. You're not trying to prevent failure - you're designing experiments small enough that failure is tolerable and informative.

Failure mode: Applying detailed plans and expecting predictability. This is where transformation programmes die. You cannot roadmap your way through a complex domain. The leaders who succeed here are the ones who treat their plans as hypotheses, not commitments.
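
As a sketch of what the Probe → Sense → Respond loop looks like in code - with the probes and the measure() signal as stand-ins, not a real experimentation platform:

```python
import random

# Probe -> Sense -> Respond: a toy safe-to-fail experiment loop.
# Each probe is small enough that failure is tolerable and informative.

probes = {"pairing_rotation": 0.0, "demo_fridays": 0.0, "docs_bounty": 0.0}

def measure(probe: str) -> float:
    """Stand-in for a real feedback signal (survey delta, cycle time, etc.)."""
    return random.uniform(-1.0, 1.0)

for round_number in range(3):
    for probe in list(probes):
        probes[probe] += measure(probe)        # sense how the system responded
    best = max(probes, key=probes.get)
    worst = min(probes, key=probes.get)
    print(f"round {round_number}: amplify {best!r}, watch {worst!r}")
    if probes[worst] < -1.0:                   # respond: dampen what clearly isn't working
        probes.pop(worst)                      # stop the probe, keep the learning
```

The pop is the point: a probe that clearly isn't working gets stopped cheaply, and the learning survives it.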


Chaotic

The domain where cause and effect have no perceivable relationship - at least, not one you can act on right now.

In the chaotic domain, there is no time for analysis. Something is broken, spreading, or burning. The priority is to act - to establish any order before attempting sense-making.

Engineering examples:

  • Major production outages
  • Security breaches and active intrusions
  • Data loss events
  • Critical vendor failures mid-deployment

Leadership stance: Act → Sense → Respond

You do something - not everything, but something - to stabilise the situation. Once you've interrupted the freefall, you can begin to understand what happened and move into complex or complicated thinking.

Failure mode: Analysis paralysis. The worst thing you can do in a chaotic situation is convene a working group. Act first. Understand later.


Disorder

The domain you're actually in most of the time - and the most dangerous one.

Disorder is when you don't know which of the other four domains you're in. People default to their comfort zone:

  • Engineers treat everything as complicated (it just needs the right analysis)
  • Leaders treat everything as clear (it just needs the right process)
  • Innovators treat everything as complex (everything needs an experiment)

No one agrees because everyone is speaking from a different implicit model of reality - which is why so many decisions feel confused, circular, or politically charged.

The way out is deliberate sense-making. Stop. Name the domain. Build shared understanding of what kind of problem you're actually dealing with.


Misdiagnosis Is the Real Problem

Here is something organisations rarely say out loud, but which is almost always true:

Most transformation programmes fail not because their goals are wrong, but because their problems are misclassified.

The intent is good. The people are capable. But the approach is fundamentally mismatched to the nature of the challenge.

Common patterns:

  • Treating complex work as complicated → heavy planning, slow progress, a growing backlog of "edge cases"
  • Treating chaos as complex → too much discussion, not enough action, the fire keeps spreading
  • Treating clear work as complex → unnecessary reinvention of solved problems, endless workshops
  • Treating complicated work as clear → best practice applied without context, experts ignored

The second pattern deserves particular attention. When organisations face genuine disruption - a major incident, a market shift, a platform failure - the temptation is to convene a discovery process. Run a retrospective. Form a working group. In a chaotic domain, this is catastrophic. Act first. Stabilise. Then reflect.


Mapping Digital Engineering Activities

Once you start seeing through a Cynefin lens, your work landscape looks different. Here is a practical mapping for common engineering activities:

  • CI/CD pipeline operation → Clear: standardise, automate, measure compliance
  • System architecture decisions → Complicated: bring the right experts, document trade-offs
  • Performance tuning → Complicated: analyse, instrument, optimise iteratively
  • Platform engineering adoption → Complex: run pilots, learn from early adopters, scale what works
  • AI integration into engineering workflows → Complex: experiment team by team, don't mandate upfront
  • DevOps transformation → Complex: probe with safe-to-fail experiments, not programmes
  • Major incident response → Chaotic: act to stabilise, then move to complicated/complex
  • Culture change → Complex: never, ever treat this as a project
  • Standard onboarding → Clear: standardise ruthlessly, automate where possible

The value of this table isn't the specific classifications - yours may differ. The value is having the conversation at all.


Leadership Implications

Autonomy and Constraints

Where you sit in Cynefin should determine how tightly you constrain your teams:

  • Clear → strict constraints. There is a best practice. Enforce it.
  • Complicated → expert-informed constraints. Set the standards, allow professional judgement within them.
  • Complex → flexible guardrails. Define the boundaries of safe-to-fail, then get out of the way.
  • Chaotic → command and control. Temporarily. Until you've stabilised.

Applying uniform governance across all four domains is itself a failure mode. The leader who insists on approval gates for every architectural decision, regardless of context, is slowing down the complicated domain unnecessarily. The leader who gives teams full autonomy in the chaotic domain is creating more chaos.
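
One way to make this explicit is to encode a default governance posture per domain, so that exceptions become visible decisions rather than habits. A sketch, with invented field names paraphrasing the bullets above:

```python
from dataclasses import dataclass

# Default governance posture per domain. Field names are invented;
# the point is that constraint tightness is a function of domain, not taste.

@dataclass(frozen=True)
class Posture:
    constraints: str   # how tightly work is constrained
    decided_by: str    # where decision rights sit

GOVERNANCE = {
    "clear":       Posture("strict: one enforced best practice", "the standard"),
    "complicated": Posture("expert-informed: standards plus judgement", "experts, with challenge"),
    "complex":     Posture("guardrails: boundaries of safe-to-fail", "teams, within guardrails"),
    "chaotic":     Posture("command and control, temporarily", "the incident commander"),
}

def posture_for(domain: str) -> Posture:
    """Unknown domains force a diagnosis conversation instead of a default."""
    if domain not in GOVERNANCE:
        raise ValueError(f"Unrecognised domain {domain!r}: diagnose before governing")
    return GOVERNANCE[domain]

print(posture_for("complex"))  # flexible guardrails: boundaries set, then step back
```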

BVSSH Alignment

The BVSSH outcomes map cleanly onto Cynefin thinking:

  • Better is primarily a complex pursuit. Quality, culture, and engineering excellence emerge through experimentation, feedback, and learning - not through compliance.
  • Safer requires discipline across clear and complicated domains. Security standards, compliance controls, and operational runbooks belong here.
  • Sooner is achieved by moving work across domains. Complexity gets understood, becomes complicated, gets standardised, becomes clear. Flow improves as you reduce the cognitive load at each stage.
  • Happier lives almost entirely in the complex domain. You cannot mandate psychological safety. You probe, observe, and amplify.

DORA Metrics

DORA metrics - deployment frequency, lead time for changes, change failure rate, mean time to restore - are powerful. But they are meaningful in the clear and complicated domains, and potentially misleading in the complex domain.

If you're running safe-to-fail experiments in platform adoption or AI integration, your deployment frequency might intentionally be low during a discovery phase. Your lead time might be long while you're learning. Applying DORA benchmarks as universal targets across all domains rewards the wrong behaviour.

Use DORA metrics where cause and effect are understood. In complex domains, track learning velocity instead.
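
As a reminder of what the four metrics actually measure, here is a minimal computation over invented deployment records - the event shape is a simplification, not any particular tool's schema:

```python
from datetime import datetime, timedelta
from statistics import median

# Invented deployment records. A real pipeline would pull these
# from version control and deployment tooling.
t0 = datetime(2024, 1, 1)
deploys = [
    {"committed": t0, "deployed": t0 + timedelta(hours=6),
     "failed": False, "restored": None},
    {"committed": t0 + timedelta(days=1), "deployed": t0 + timedelta(days=1, hours=4),
     "failed": True, "restored": t0 + timedelta(days=1, hours=5)},
    {"committed": t0 + timedelta(days=2), "deployed": t0 + timedelta(days=2, hours=3),
     "failed": False, "restored": None},
]

window_days = 7
deployment_frequency = len(deploys) / window_days                     # deploys per day
lead_time = median(d["deployed"] - d["committed"] for d in deploys)   # commit -> production

failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)
restore_times = [d["restored"] - d["deployed"] for d in failures]
mttr = sum(restore_times, timedelta()) / len(restore_times)           # mean time to restore

print(f"deploys/day: {deployment_frequency:.2f}, median lead time: {lead_time}")
print(f"change failure rate: {change_failure_rate:.0%}, MTTR: {mttr}")
```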


OKRs: The Framework We Keep Getting Wrong

Outcome-oriented OKRs are arguably the best goal-setting framework available to engineering organisations. But most teams use them incorrectly - and Cynefin explains exactly why.

There are two distinct levels where complexity plays out in an OKR structure, and conflating them is where things go wrong.

The Objective is a complex outcome

An objective like "Improve developer experience" sits firmly in the complex domain. It is long-term, emergent, and shaped by human behaviour you cannot fully predict. You cannot design your way to it. You can only move towards it iteratively, as signals accumulate and patterns emerge.

The Key Result is a hypothesis

A Key Result like "Reduce P95 build time to under 3 minutes by Q3" is not a delivery commitment - it is a hypothesis. It encodes a belief: if we reduce build time, we think developer experience will improve.

That is a reasonable bet. Build time is a known friction point. But it is still a bet. Developer experience is complex enough that reducing build time might move the needle significantly, or it might turn out that slow builds weren't actually the primary frustration. You don't know until you act and observe.

This reframing matters. A Key Result in a complex domain is not something you execute - it is something you probe towards. You are testing whether your hypothesis holds, not delivering a predetermined outcome.

Initiatives are your Probe → Sense → Respond actions

The activities you run to test the hypothesis - the surveys, the experiments, the improvements - are not Key Results. They are initiatives, sometimes called bets or probes. They sit beneath the Key Result, not alongside it:

Objective: Improve developer experience
Key Result (measurable hypothesis): Improve developer satisfaction score from 6.2 to 7.5 by end of Q3 - on the belief that targeted friction reduction drives measurable sentiment improvement

Initiatives (Probe → Sense → Respond):

  • Run a developer satisfaction pulse with at least 60% response rate to surface the top friction points (Probe)
  • Implement one targeted improvement per friction point and measure sentiment change within 30 days (Sense)
  • Scale the highest-impact improvement to all teams and re-baseline the satisfaction score (Respond)

The initiatives are how you test the hypothesis. If the satisfaction score moves, your hypothesis holds and you double down. If it doesn't, you've learned something - and you revise the hypothesis before committing further.
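
The shape of that structure is easy to make explicit in code. A sketch using the example above, with field names of my own invention rather than any standard OKR schema:

```python
from dataclasses import dataclass, field

@dataclass
class Initiative:
    description: str
    stage: str  # "probe", "sense", or "respond"

@dataclass
class KeyResult:
    hypothesis: str  # the belief being tested, not a delivery commitment
    baseline: float
    target: float
    current: float
    initiatives: list[Initiative] = field(default_factory=list)

    def holds(self) -> bool:
        """Has the signal moved towards the target? If not, that is data, not failure."""
        return self.current > self.baseline

kr = KeyResult(
    hypothesis="Targeted friction reduction drives measurable sentiment improvement",
    baseline=6.2, target=7.5, current=6.2,
    initiatives=[
        Initiative("Developer satisfaction pulse, >=60% response rate", "probe"),
        Initiative("One targeted improvement per friction point", "sense"),
        Initiative("Scale highest-impact improvement, re-baseline", "respond"),
    ],
)
kr.current = 6.8  # signal after the first cycle
print("hypothesis holds, double down" if kr.holds() else "revise the hypothesis")
```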

This is Probe → Sense → Respond applied to OKRs. You're not abandoning rigour. You're applying the right kind of rigour for the domain you're actually in.

The implications are significant:

  • Treat Key Results as hypotheses, not commitments. In a complex domain, a Key Result tells you what you believe will move the objective - not what you guarantee will happen.
  • Keep initiatives lightweight and time-boxed. If an initiative doesn't move the Key Result, that is data. Stop it. Try something else.
  • Treat missed Key Results as learning, not failure. If the hypothesis didn't hold, you've discovered something true about the system. That's worth more than hitting a number that was guessed upfront.

The organisations that get the most from OKRs are the ones that stop treating them as a performance management tool and start treating them as a learning system. Cynefin gives you the language to make that distinction clearly - and to hold your teams accountable for learning, not just for hitting numbers.


A Practical Playbook

Step 1: Diagnose Before Acting

Before selecting an approach, ask:

  • Are cause and effect predictable and repeatable?
  • Can we design the solution upfront with reasonable confidence?
  • Has this problem been solved before, in comparable conditions?

If the answer to all three is yes, you're in clear or complicated territory. If the answer to any of them is no, you may be in complex or chaotic territory.
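
Those three questions compress into a small triage rule. A toy version follows - the actively_burning check is my own addition, to separate chaotic from complex:

```python
def diagnose(predictable: bool, designable_upfront: bool,
             solved_before: bool, actively_burning: bool = False) -> str:
    """Rough Cynefin triage from the three diagnostic questions.
    actively_burning is an added assumption to catch the chaotic case."""
    if actively_burning:
        return "chaotic: act to stabilise, then re-diagnose"
    if predictable and designable_upfront and solved_before:
        return "clear or complicated: standardise, or bring expertise"
    return "complex: probe with safe-to-fail experiments"

print(diagnose(True, True, True))            # routine work
print(diagnose(False, False, False))         # emergent work
print(diagnose(False, False, False, True))   # crisis
```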

Step 2: Match the Approach to the Domain

  • Clear → Standardise. Automate. Enforce. Stop reinventing.
  • Complicated → Bring the right expertise. Analyse the trade-offs. Document the decision.
  • Complex → Run safe-to-fail experiments. Probe, sense, respond. Build in feedback loops.
  • Chaotic → Act immediately. Stabilise first. Understand later.

Step 3: Re-evaluate Continuously

Domains shift. This is one of the most important things Cynefin teaches.

A new technology starts in the complex domain - patterns are unknown, adoption is uncertain, outcomes are emergent. Over time, as understanding grows, it moves into the complicated domain - experts can reason about it, trade-offs become visible. Eventually, with enough standardisation, it reaches the clear domain - best practice is established, it becomes routine.

This is the natural direction of travel: Chaos → Complex → Complicated → Clear.

Your job as a leader is to accelerate this movement where it matters, and to resist the temptation to treat something as clear before it has actually become clear.


The Closing Point

There is a temptation, especially in engineering organisations, to want everything to be in the clear domain. Clear is comfortable. Clear is efficient. Clear is measurable.

But the goal isn't to move everything into the clear domain.

The goal is to recognise where you are, and respond appropriately.

The organisations that perform best over time aren't the ones with the most standardisation, or the best processes, or the most rigorous governance. They're the ones that can read the nature of a challenge accurately - and shift their approach to match it.

High-performing engineering organisations don't just solve problems well.

They understand what kind of problem they are solving.

That is the discipline Cynefin develops. And it may be the most important leadership skill available to engineering organisations operating in a world that is becoming more complex, not less.

Ragan McGill

Engineering leader blending strategy, culture, and craft to build high-performing teams and future-ready platforms - driving transformation through autonomy, continuous improvement, and data-driven excellence.