Design the Failure State First

Methodology · May 2026 · 6 min read

AI is wrong 30–70% of the time in real usage. The happy path demo is 30% of the design work. The failure state—when confidence is low, data is missing, or the model is uncertain—is the other 70%. This is what I design first.

The Illusion of the Demo

In 2020, I watched a machine learning team demo their new model to a PE client. Confidence: 94%. Accuracy: perfect. Everyone left the room impressed. Three months later, on actual deal data, the same model was wrong 40% of the time. The difference wasn't the model—it was the data it had never seen.

The demo showed the happy path. It didn't show the 40% of the time when the model was uncertain, when input data was missing, or when edge cases broke every assumption. That's the failure state, and that's where users lose trust.

What Failure Looks Like

Users encounter three types of AI failures:

Most teams don't design for these. They assume edge cases are "too rare" or "not important." Then production hits the edge cases and trust evaporates.

Failure state design is not error handling. It's trust recovery. When the system breaks, users need to know why and what to do. That's design.

My Approach: Three Layers

Layer 1: Prevention. Can we avoid the failure entirely? Validate inputs before they hit the model. Reject malformed data with a clear message: "This field expects an email address (example: [email protected])."
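A minimal sketch of Layer 1, assuming a single email field checked before anything reaches the model. The names `validate_email_field` and `ValidationResult` are hypothetical, the pattern is deliberately loose rather than a full address validator, and the address in the message is a placeholder.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose illustrative pattern: something@something.something, no whitespace.
# This is an assumption for the sketch, not a complete email validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass
class ValidationResult:
    ok: bool
    message: Optional[str] = None  # user-facing explanation when ok is False


def validate_email_field(value: str) -> ValidationResult:
    """Reject malformed input with a clear message before the model sees it."""
    if EMAIL_PATTERN.fullmatch(value.strip()):
        return ValidationResult(ok=True)
    return ValidationResult(
        ok=False,
        # Placeholder example address; swap in whatever fits the product.
        message="This field expects an email address (example: name@example.com).",
    )
```

The point of the design is that the rejection carries its own instructions: the user learns what the field expects instead of seeing a generic error.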

Layer 2: Transparency. If the model runs but is uncertain, don't hide it. Show the confidence. Show the alternative options. Show what the model was uncertain about.
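One way Layer 2 might look in code, assuming the model returns a score per candidate label. The 0.80 cutoff, the `PredictionView` shape, and the function name are illustrative assumptions. Explaining what the model was uncertain about depends on the model and is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed cutoff below which the UI should surface uncertainty.
LOW_CONFIDENCE_THRESHOLD = 0.80


@dataclass
class PredictionView:
    label: str
    confidence: float
    is_uncertain: bool
    alternatives: List[Tuple[str, float]]  # runner-up labels with their scores


def build_prediction_view(
    scores: Dict[str, float], max_alternatives: int = 3
) -> PredictionView:
    """Turn raw model scores into something the interface can show honestly."""
    if not scores:
        raise ValueError("scores must contain at least one label")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_label, top_confidence = ranked[0]
    is_uncertain = top_confidence < LOW_CONFIDENCE_THRESHOLD
    # Only surface alternatives when the top answer is not trustworthy on its own.
    alternatives = ranked[1 : 1 + max_alternatives] if is_uncertain else []
    return PredictionView(top_label, top_confidence, is_uncertain, alternatives)
```

With scores such as `{"approve": 0.55, "review": 0.30, "reject": 0.15}`, the view is flagged as uncertain and carries the two runner-up labels, so the interface can show them next to the top answer rather than presenting 55% as a verdict.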

Layer 3: Recovery. Give users a path forward when failure happens. "The model is uncertain. You can: (A) review the top 3 alternatives, (B) input additional data, or (C) ask a human expert."
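The recovery message above can be sketched as a small function that only offers paths the system can actually deliver. The `RecoveryAction` enum and both function names are hypothetical, and keeping the human expert as an always-available fallback is an assumption of this sketch.

```python
from enum import Enum
from typing import List


class RecoveryAction(Enum):
    REVIEW_ALTERNATIVES = "review the top 3 alternatives"
    ADD_DATA = "input additional data"
    ASK_EXPERT = "ask a human expert"


def recovery_options(has_alternatives: bool, can_add_data: bool) -> List[RecoveryAction]:
    """List the paths forward that are really available in this situation."""
    options = []
    if has_alternatives:
        options.append(RecoveryAction.REVIEW_ALTERNATIVES)
    if can_add_data:
        options.append(RecoveryAction.ADD_DATA)
    # Assumed fallback: escalating to a person is always possible.
    options.append(RecoveryAction.ASK_EXPERT)
    return options


def format_recovery_message(options: List[RecoveryAction]) -> str:
    """Render the options as a lettered list under a plain-language heading."""
    lines = ["The model is uncertain. You can:"]
    for index, option in enumerate(options):
        letter = chr(ord("A") + index)
        lines.append(f"({letter}) {option.value}")
    return "\n".join(lines)
```

Building the list from what is available keeps the message from promising an option, such as alternatives, that does not exist for a given prediction.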

Key Takeaway

Start every intelligent system design with the question: "What happens when this fails?" Not as an afterthought. Not as "error handling." But as the primary design challenge. The happy path is 30% of the work. The failure state is the 70% that determines whether users trust your system to run their business.
