The Vanity Metric Trap

Your chatbot dashboard looks impressive. The vendor reports 40% containment rate. The line chart trends upward. The C-suite sees cost savings. You look competent, innovative, forward-thinking.

But here is what nobody is measuring: how many of those "contained" conversations actually resolved the customer's problem? How many customers left the interaction satisfied? How many had to phone the call centre immediately afterwards anyway?

We have audited chatbot implementations across dozens of organisations. The pattern is depressingly consistent. Containment rates of 30-50% are common. Actual resolution rates—the percentage of conversations where the customer achieved their objective without escalating—often sit below 5%. Some organisations we reviewed had effectively zero genuine resolution through their chatbot, despite proudly reporting "industry standard" containment figures.

How Vendors Game the Numbers

Containment is a meaningless metric because it is trivial to manipulate. A vendor can report high containment simply by making the chatbot refuse to escalate. When customers give up in frustration after ten minutes of circular conversation, that counts as a "contained" interaction. When the bot pretends to understand but provides no useful information, that counts as contained.

We saw one retail chatbot that responded to refund requests with: "I understand you want a refund. Refunds are processed by our returns team. Is there anything else I can help you with?" This resolved nothing. The customer still needed to contact support. But it counted as contained because the customer eventually stopped engaging.

Other common tricks include hiding or delaying the escalation option, timing out silent customers and logging the session as handled, and answering every query with generic deflection text that sounds responsive but commits to nothing. These tactics inflate containment metrics while destroying customer trust and increasing downstream costs.

The Hidden Costs of Fake Containment

The problem goes deeper than misleading dashboards. Every failed chatbot interaction creates real business costs that do not show up in your vendor reports.

The Recontact Multiplier: When customers fail to resolve their issue via chatbot, they contact you again through another channel. Often they are angrier and have higher expectations because they have already wasted time. The cost to handle these second contacts is 2-3x higher than handling the original query properly.

The Satisfaction Bleed: Customer satisfaction with your brand drops with every failed self-service attempt. Research shows that customers who fail at self-service have satisfaction scores 20-30% lower than those who never attempted self-service at all. You are actively damaging relationships with your most motivated digital customers.

The Agent Corrosion: When your human agents pick up the pieces, they spend the first minutes of every chatbot-escalated call untangling confusion and rebuilding rapport. This is exhausting work that drives attrition. We have seen agent retention rates drop 15-20% in teams handling high volumes of chatbot fallout.

How to Measure What Actually Matters

Stop reporting containment. Start measuring these instead:

Verified Resolution Rate

Not "did the customer stop chatting" but "did the customer confirm their issue was resolved?" This requires explicit confirmation, ideally with a simple post-interaction survey question: "Did this solve your problem? Yes / No."
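The calculation is deliberately simple. A minimal sketch, assuming each session record carries an optional survey answer (field names here are hypothetical, not tied to any platform):

```python
# Sketch: Verified Resolution Rate from post-interaction survey responses.
# The "survey_answer" field ("yes"/"no"/None) is an assumed schema.

def verified_resolution_rate(sessions: list[dict]) -> dict:
    answered = [s for s in sessions if s.get("survey_answer") in ("yes", "no")]
    resolved = sum(1 for s in answered if s["survey_answer"] == "yes")
    return {
        "response_rate": len(answered) / len(sessions) if sessions else 0.0,
        "verified_resolution_rate": resolved / len(answered) if answered else 0.0,
    }

sessions = [
    {"id": 1, "survey_answer": "yes"},
    {"id": 2, "survey_answer": "no"},
    {"id": 3, "survey_answer": None},   # customer skipped the survey
    {"id": 4, "survey_answer": "no"},
]
print(verified_resolution_rate(sessions))
```

Report the response rate alongside the resolution rate: a high resolution figure from a 5% survey response is not trustworthy evidence.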

Recontact Rate Within 24 Hours

Track how many chatbot users contact you again within a day through any channel. If your chatbot was genuinely helpful, this number should be low. Most organisations we audit see 40-60% of chatbot users recontact—evidence of systemic failure.
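Linking sessions to subsequent contacts can be sketched as a windowed join. This assumes chatbot sessions and contacts from all channels share a customer identifier; the schema is hypothetical.

```python
from datetime import datetime, timedelta

# Sketch: 24-hour recontact rate. Assumes sessions and cross-channel contacts
# can be joined on a shared customer_id; field names are illustrative.

def recontact_rate_24h(chat_sessions, contacts, window=timedelta(hours=24)):
    recontacted = 0
    for s in chat_sessions:
        # Count the session if ANY channel saw the same customer within the window.
        if any(
            c["customer_id"] == s["customer_id"]
            and timedelta(0) < c["timestamp"] - s["ended_at"] <= window
            for c in contacts
        ):
            recontacted += 1
    return recontacted / len(chat_sessions) if chat_sessions else 0.0

sessions = [
    {"customer_id": "a", "ended_at": datetime(2024, 3, 1, 10)},
    {"customer_id": "b", "ended_at": datetime(2024, 3, 1, 11)},
]
contacts = [
    {"customer_id": "a", "channel": "phone", "timestamp": datetime(2024, 3, 1, 10, 30)},
    {"customer_id": "b", "channel": "email", "timestamp": datetime(2024, 3, 3, 9)},  # outside window
]
print(recontact_rate_24h(sessions, contacts))  # 0.5
```

In practice this runs against a contact data warehouse rather than in-memory lists, but the join logic is the same.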

Channel Shift Index

Measure whether your digital channels are reducing contacts to expensive channels (phone, email), not just moving them around. A proper chatbot implementation should show a measurable decrease in total contact volume, not just redistribution.
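The distinction between reduction and redistribution is easy to check if you compare total volume, not just the channel mix. A sketch with invented monthly figures:

```python
# Sketch: channel-shift check comparing total contact volume before and after
# chatbot launch, not just the per-channel split. All volumes are illustrative.

def channel_shift(before: dict, after: dict) -> dict:
    total_before, total_after = sum(before.values()), sum(after.values())
    return {
        "total_change_pct": (total_after - total_before) / total_before * 100,
        "phone_change_pct": (after.get("phone", 0) - before.get("phone", 0))
                             / before["phone"] * 100,
    }

before = {"phone": 10_000, "email": 4_000}                   # monthly contacts, pre-launch
after  = {"phone": 8_000, "email": 4_000, "chatbot": 3_000}  # post-launch

print(channel_shift(before, after))
# Phone fell 20%, but total volume ROSE ~7%: redistribution, not reduction.
```

A falling phone number with a rising total is exactly the failure mode the metric is designed to expose.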

Customer Effort Score (CES)

Ask customers how easy it was to get what they needed. Chatbot interactions should score comparably to or better than human-assisted channels. If your chatbot scores higher effort than speaking to an agent, you have failed.

Task Completion Rate by Intent

Break down resolution rates by the type of query. Knowing your chatbot solves 80% of password resets but 2% of billing disputes tells you where to focus improvement efforts.
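Computing this is a straightforward group-by over labelled sessions. The intent labels below are hypothetical examples, not a prescribed taxonomy:

```python
from collections import defaultdict

# Sketch: resolution rate broken down by intent. Assumes each session is
# labelled with an intent and a resolved flag; labels are illustrative.

def resolution_by_intent(sessions):
    totals, resolved = defaultdict(int), defaultdict(int)
    for s in sessions:
        totals[s["intent"]] += 1
        resolved[s["intent"]] += s["resolved"]
    return {intent: resolved[intent] / totals[intent] for intent in totals}

sessions = [
    {"intent": "password_reset", "resolved": True},
    {"intent": "password_reset", "resolved": True},
    {"intent": "billing_dispute", "resolved": False},
    {"intent": "billing_dispute", "resolved": False},
    {"intent": "billing_dispute", "resolved": True},
]
print(resolution_by_intent(sessions))
```

Sorting the output by volume times failure rate gives a rough priority order for improvement work.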

The 90-Day Audit Framework

If you suspect your chatbot metrics are misleading, here is how to find the truth:

Week 1: Baseline Your Reality

Add a mandatory post-chatbot question asking whether the issue was resolved. Enable tracking that links chatbot sessions to subsequent contacts across all channels. Accept that the numbers will be uncomfortable.

Weeks 2-4: Deep Dive Analysis

Sample 100 recent chatbot conversations where no escalation occurred. Have humans review them against the original query. Categorise outcomes: truly resolved, partially resolved, given information but not resolution, abandoned in frustration, completely misunderstood.
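Tallying the reviewers' judgements into those categories is simple to automate. The category labels mirror the list above; the sample distribution is invented for illustration:

```python
from collections import Counter

# Sketch: summarise manual audit judgements into the outcome categories named
# in the framework. Sample data is invented; only the labels come from the text.

CATEGORIES = [
    "truly_resolved", "partially_resolved", "info_not_resolution",
    "abandoned_in_frustration", "misunderstood",
]

def audit_summary(judgements: list[str]) -> dict:
    counts = Counter(judgements)
    n = len(judgements)
    return {cat: counts.get(cat, 0) / n for cat in CATEGORIES}

judgements = (["truly_resolved"] * 4 + ["partially_resolved"] * 16 +
              ["info_not_resolution"] * 35 + ["abandoned_in_frustration"] * 30 +
              ["misunderstood"] * 15)  # 100 sampled "contained" conversations

print(audit_summary(judgements))
# e.g. truly_resolved: 0.04 — the sub-5% resolution pattern described earlier
```

Two reviewers per conversation, with disagreements adjudicated, keeps the categorisation honest.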

Weeks 5-8: Shadowing and Validation

Listen to calls or review chats from customers who used the chatbot first. Document what went wrong. Map the most common failure patterns.

Weeks 9-12: Reset and Rebuild

Based on actual evidence, redesign conversation flows for your top five intents. Focus on resolution, not containment. Pilot changes with a control group.

What Good Chatbots Actually Look Like

The organisations using chatbots effectively share common characteristics:

They Are Ruthlessly Scoped

Successful chatbots handle three to five things exceptionally well, not fifty things badly. They are explicit about their limitations and offer fast, clear escalation paths.

They Optimise for Resolution, Not Engagement

The goal is to get the customer what they need as quickly as possible, even if that means immediately transferring to a human. Long conversation duration is a warning sign, not a success metric.

They Integrate Properly

Real value comes from connected systems: checking order status in the warehouse database, processing returns in the CRM, updating accounts in real-time. If your chatbot only provides information from a static FAQ, it is not solving anything.

They Learn from Failures

Every escalation is analysed. Every failure pattern is documented. The bot improves weekly based on what actually happened, not what was theoretically supposed to happen.

The Hard Truth About Cost Savings

Full disclosure: properly implemented chatbots that genuinely resolve customer issues do save money. Our clients typically see 15-25% cost reduction in handling for well-scoped, well-integrated intents.

But this requires investment. Good conversation design. Proper system integration. Continuous optimisation. Testing with real customers. The "cheap" chatbot that promises 40% containment with minimal set-up costs is lying to you about both the results and the total cost of ownership.

The real cost savings are modest and hard-won. The customer satisfaction improvements only come if you prioritise resolution over containment. The vendor promising breakthrough results with minimal effort is selling you vanity metrics that will crumble under scrutiny.


Suspect your chatbot metrics are too good to be true? Albion Illiriya conducts independent chatbot audits that reveal the real picture behind vendor dashboards. We help organisations rebuild their conversational AI around resolution and customer outcomes. Get in touch for a confidential discussion about your situation.