Everyone has an AI pilot. Few have AI in production.
We've reviewed dozens of AI initiatives across financial services and telecoms. The pattern is consistent: six-month pilots that prove the technology is possible and the business case is not. Then they die.
Here is why, and what we do differently.
The three failure modes
1. Starting with the tech, not the outcome
"We want to try GPT-4" is not a use case. It is a distraction. We see teams spend months integrating models before asking: what metric will this actually move? If you cannot answer that in one sentence, stop.
2. Underestimating the operational work
AI does not operate itself. Someone must monitor drift, handle edge cases, and defend decisions to regulators. Most pilots ignore this until month five, when someone asks who owns the thing.
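"Monitor drift" is concrete work, not a platitude. A minimal sketch of what it can look like, assuming a model that emits scores, using a population stability index with an illustrative 0.2 alert threshold (a common heuristic, not a standard):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a production score distribution against a reference window."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e = np.histogram(expected, cuts)[0] / len(expected)
    # Clip production scores into the reference range so every score lands in a bin
    a = np.histogram(np.clip(actual, cuts[0], cuts[-1]), cuts)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

# Illustrative weekly check: this week's scores vs the validation baseline
baseline = np.random.beta(2, 5, 10_000)   # stand-in for validation scores
this_week = np.random.beta(2, 4, 2_000)   # stand-in for production scores
if population_stability_index(baseline, this_week) > 0.2:
    print("Score distribution has shifted; trigger a review.")
```

Someone has to own that alert and decide what "trigger a review" means in practice. That is the month-five question, asked in week one.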
3. No handover plan
Consultants build impressive demos. Then they leave. The internal team has no documentation, no training materials, and no budget to maintain what was built. The pilot ends; nothing ships.
What works instead
We run AI programmes in four phases.
Phase 1: Use case validation (2–3 weeks)
- Define the decision this AI will make
- Identify who reviews it, what happens when it fails, and how success will be measured
- Kill it here if the numbers do not work (a worked example follows this list)
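The "numbers" test fits on one screen. A hedged sketch of the Phase 1 arithmetic, with every figure invented for illustration:

```python
# All figures are illustrative assumptions, not client data.
analysts = 40
hours_saved_per_analyst_per_day = 0.5   # the metric this use case must move
loaded_hourly_cost = 60.0               # fully loaded cost per analyst-hour
working_days = 220

annual_benefit = (analysts * hours_saved_per_analyst_per_day
                  * loaded_hourly_cost * working_days)   # 264,000

annual_run_cost = 90_000    # inference, hosting, vendor fees
annual_ops_cost = 120_000   # the monitoring and review work from failure mode 2

net = annual_benefit - (annual_run_cost + annual_ops_cost)
print(f"Annual benefit: {annual_benefit:,.0f}  Net: {net:,.0f}")
```

A net of 54,000 on a benefit of 264,000 is exactly the thin-margin conversation this phase exists to force, before anyone writes code.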
Phase 2: Design with guardrails (4–6 weeks)
- Build the minimum viable intervention
- Design the human-in-the-loop process from day one (see the sketch after this list)
- Document operational responsibilities before writing code
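"Human-in-the-loop from day one" usually reduces to an explicit routing rule, written down before any model is integrated. A minimal sketch, with thresholds, team names, and the regulated-topic rule all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # "auto", "review", or "escalate"
    owner: str    # the named team accountable for the outcome
    reason: str

# Illustrative thresholds; in practice they come out of Phase 1, not model tuning.
AUTO_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.70

def route(confidence: float, is_regulated_topic: bool) -> Decision:
    """Decide who acts on a model output. Regulated topics never auto-complete."""
    if is_regulated_topic:
        return Decision("escalate", "compliance-team", "regulated topic")
    if confidence >= AUTO_THRESHOLD:
        return Decision("auto", "ops-team", "high confidence")
    if confidence >= REVIEW_THRESHOLD:
        return Decision("review", "ops-team", "medium confidence")
    return Decision("escalate", "senior-analyst", "low confidence")
```

A dozen lines of routing logic, but writing them down first forces the ownership question to be answered before the build starts.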
Phase 3: Pilot with teeth (8–12 weeks)
- Run against real data with real users
- Measure actual business metrics, not technical accuracy (sketched after this list)
- Daily stand-ups with the business, not just the tech team
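Measuring the business metric can be as unglamorous as pulling handling times from the case-management system and comparing pilot weeks against the baseline. A sketch with invented numbers:

```python
import statistics

# Minutes per case; both samples invented for illustration.
baseline_minutes = [34, 41, 29, 38, 45, 33, 40, 36, 31, 39]
pilot_minutes    = [22, 27, 19, 30, 25, 21, 28, 24, 26, 20]

saved = statistics.mean(baseline_minutes) - statistics.mean(pilot_minutes)
print(f"Mean minutes saved per case: {saved:.1f}")  # 12.4 on these samples
```

That single figure is what the daily stand-up discusses; model accuracy only comes up when it explains a movement in it.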
Phase 4: Production handover (4 weeks)
- Train internal teams on operation and maintenance
- Deliver runbooks, not just repositories
- Stay on-site until the internal team says they don't need us
A real example
A challenger bank wanted AI to triage customer complaints. Their first pilot classified emails with 94% accuracy and saved zero minutes: no one trusted it enough to skip the manual review.
We started over. Same technology, different problem: could we reduce the time analysts spend gathering case context by suggesting relevant templates and previous resolutions?
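We cannot publish the bank's system, but the shape of the solution is ordinary retrieval. A minimal sketch assuming TF-IDF similarity over past resolutions (scikit-learn; all data invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-ins for a library of past resolutions.
past_resolutions = [
    "Refunded duplicate card payment and confirmed in writing.",
    "Reset mobile app access after a failed-login lockout.",
    "Explained the overdraft fee and applied a goodwill credit.",
]

vectorizer = TfidfVectorizer()
resolution_vectors = vectorizer.fit_transform(past_resolutions)

def suggest(complaint: str, top_k: int = 2) -> list[str]:
    """Return the most similar past resolutions for an incoming complaint."""
    scores = cosine_similarity(
        vectorizer.transform([complaint]), resolution_vectors
    )[0]
    return [past_resolutions[i] for i in scores.argsort()[::-1][:top_k]]

print(suggest("I was charged twice for the same card transaction"))
```

The analyst still makes every decision; the system only shortens the context-gathering. That is why trust never became the bottleneck.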
Six weeks later: analysts saved two hours per day. Accuracy mattered less than utility.
The bottom line
AI projects fail when they optimise for the demo. They succeed when they optimise for the operator.
If you are six months into an AI pilot with nothing in production, the problem is not the model. It is the approach.