Introduction #
Here’s a scenario I keep running into in real projects: the algorithm engineer tells you the model accuracy has gone from 85% to 87%, and it took two weeks. You’re still 3 percentage points away from the 90% target, and at this rate it might take another month. But the product ships next month.
What do you do? Keep grinding on the algorithm? Or try a different angle?
That’s what this post is about: how to close the gap with product design when algorithm performance hits a ceiling. In the AI product world, there’s a shorthand for this mindset: “when technology falls short, let product design pick up the slack.”
Three Pitfalls in AI Product Design #
Before diving into specific techniques, let me share three lessons I’ve seen play out repeatedly.
Don’t Neglect the Front-End Interaction #
AI products have a peculiar quality: they’re like an iceberg. Below the waterline sits a massive amount of algorithm engineering work and iterative logic strategies. Above the surface is the only thing users ever see: the interactive interface.
You might spend most of your time on algorithms, business logic, data, and strategy, but that doesn’t mean you can afford to skimp on front-end interaction design. User experience comes first. All that backend logic exists to serve the user.[1]
I’ve seen plenty of projects make this mistake: spending 80% of the effort on algorithm performance while leaving the front-end rough and unpolished. Users simply don’t care how much work went into the algorithm. No matter how good it is, if people won’t use the product, it doesn’t matter.
Keep the Algorithm Loop Closed #
AI products have a requirement that other products rarely worry about: the algorithm module must form a closed loop.
Offline, the cycle looks like this: collect bad cases, train the model, improve performance, deploy. But the online product needs to keep this loop running smoothly too: data collection, model training, iterative optimization, deployment. The entire pipeline has to work seamlessly and continuously.
If your product only has a “use” function but no “feedback” or “optimization” channel, the AI product is essentially dead. The launch day is its peak, and it only goes downhill from there.
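To make the “feedback” channel concrete, here is a minimal sketch of how a product might capture user ratings so that bad cases can flow back into training. The class name, field names, and intent IDs are all hypothetical; a real system would persist these records rather than keep them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    """Collects user feedback so bad cases can flow back into training."""

    records: list[dict] = field(default_factory=list)

    def record(self, question: str, predicted_intent: str, helpful: bool) -> None:
        # Store what the user asked, what the bot predicted, and the user's rating.
        self.records.append(
            {"question": question, "intent": predicted_intent, "helpful": helpful}
        )

    def bad_cases(self) -> list[dict]:
        # Answers rated unhelpful are candidates for relabeling and retraining.
        return [r for r in self.records if not r["helpful"]]


log = FeedbackLog()
log.record("Can I repay my loan early?", "repayment_methods", helpful=True)
log.record("my rate went up", "repayment_methods", helpful=False)
print(len(log.bad_cases()))
```

The point of the sketch is only that every answer leaves a trace the team can train on; how the rating is collected (thumbs up/down, a transfer to a human agent, and so on) is a product decision.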
Limited Algorithm Performance Is the Norm #
Algorithm optimization has a harsh reality: putting in 80% more effort might only gain you 1-2 percentage points of accuracy. Going from 85% to 87% might take two weeks; from 90% to 92% might take two months.[2]
So you need to learn how to leverage other forces: rule-based assistance, algorithmic aids, corpus cleaning, and so on. The core idea is simple: “when technology falls short, let product design compensate” or “let data compensate.”
Making It Concrete with a Smart Customer Service Scenario #
Let’s ground this in a specific use case: a customer service chatbot. A complete intelligent customer service system has three key touchpoints.
Touchpoint 1: The Visitor Chat Page #
What users see the moment they enter the consultation page determines the rest of their experience. This page needs several carefully designed elements:
- Quick access: a transfer-to-human button, a rating button, common action shortcuts, giving users an escape route at all times
- Question recommendations: if the user just submitted a loan application, suggest approval timeline queries or disbursement schedule questions they might care about
- Product or promotion info: if the user came from a loan product detail page, show that product; if from the homepage, show current featured products
The goal here isn’t to show off. It’s to anticipate the user’s needs before they even speak, reducing their effort.
Touchpoint 2: The Q&A Interaction #
This is where users talk directly to the bot, and it’s where product design is tested the most. Different scenarios require different conversation management strategies: how to guide the user in a pre-loan consultation, how to move the conversation forward in outbound marketing, how to handle an early repayment request. Each needs its own design.
There’s a crucial design principle here: offer users selectable options, not instructions.
Here’s a concrete example. The typical approach is to tell the user, “You can check in the Loan Management section of the mobile banking app.” The user still has to find the entry point themselves. The better approach: display the user’s loans directly and let them pick one. One tap and done.
Give users multiple choice, not fill-in-the-blank. This principle is critical in AI product design.[3]
Also, if your product involves voice interaction, there’s a dedicated role for it: the VUI (Voice User Interface) designer. In voice scenarios, users don’t even have a screen. The entire experience comes from each turn of conversation, so dialogue script design directly determines the product’s success or failure.
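One way to picture “options, not instructions” is as the reply payload the bot sends to the chat page. The sketch below is an assumption-heavy illustration: the `Loan` fields, the payload keys, and the prompt text are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    loan_id: str
    product_name: str
    outstanding_balance: float


def build_loan_options(loans: list[Loan]) -> dict:
    """Turn a user's loans into tappable options instead of a text instruction."""
    if not loans:
        # Nothing to choose from, so fall back to a plain message.
        return {"type": "text", "text": "We could not find any active loans."}
    return {
        "type": "options",
        "prompt": "Which loan would you like to check?",
        "options": [
            {
                "label": f"{loan.product_name} (balance {loan.outstanding_balance:,.2f})",
                "value": loan.loan_id,
            }
            for loan in loans
        ],
    }


reply = build_loan_options([Loan("L001", "Home Loan", 120000.0)])
print(reply["options"][0]["label"])
```

The front-end renders each option as a button, and the selected `value` goes straight back to the bot, so no intent recognition is needed for that turn.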
Touchpoint 3: The Admin Backend #
This is for the operations team, typically including:
- Configuring visitor-side display content
- Maintaining the knowledge base
- Training and teaching the bot
- Viewing data reports
For an MVP, these three touchpoints are the core: a client-side chat page + Q&A logic + admin backend. Everything else can come in later iterations.
Two Directions for “Product Compensating for Technology” #
When algorithm performance isn’t good enough, there are two main directions to pursue.
Direction 1: Algorithm-Assisted Recognition #
Expand similar questions during cold start. At launch, each standard question in the knowledge base might only have a handful of phrasing variations. If you throw the algorithm straight at it, accuracy will be low.
The solution is to expand every standard question with additional phrasings, either by hand or by generating them automatically. For example, if the standard question is “Can I repay my loan early?”, you could expand it to:
- “How do I make an early repayment”
- “I’ve been paying for a year, can I still settle early”
- “Where’s the early repayment entrance”
The more phrasings you have, the broader the algorithm’s matching range. During cold start, manually expanding similar questions is the simplest and most effective improvement.
Add regex or keyword-based forced recall. This is a highly practical “safety net” strategy. For example, if the user’s input contains the phrase “early repayment,” regardless of what the algorithm thinks, force-recall the “repayment methods” standard question.
Why do this? Because during cold start, the corpus is thin and the algorithm might miss obvious matches. Keyword forced recall acts as insurance: even if the algorithm misfires, keyword matching won’t let things slip through.
    # Simple implementation of keyword forced recall
    # (the `str | None` annotation requires Python 3.10+)
    FORCE_RECALL_RULES = {
        "early repayment": "repayment_methods",
        "rate adjustment": "rate_change_process",
        "early settlement": "early_settlement_policy",
    }

    def force_recall(user_input: str) -> str | None:
        """Return the forced intent if any keyword appears in the input, else None."""
        text = user_input.lower()  # match keywords case-insensitively
        for keyword, intent in FORCE_RECALL_RULES.items():
            if keyword in text:
                return intent
        return None

Guide users to click and select. Another approach works from the user’s side, reducing the algorithm’s burden:
- Input suggestions: as soon as the user types a few characters, surface questions containing those characters. The user sees what they want before finishing their sentence and just clicks
- Quick access buttons: show the most likely questions at the start of a session; after the user asks something, predict and display the most likely follow-up
The essence of these designs: don’t expect the algorithm to guess user intent 100% correctly. Instead, make the “guessing” process visible and let the user help the algorithm choose.[4]
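Input suggestions can be as simple as a substring search over the standard questions. The sketch below assumes a small in-memory question list and a two-character minimum, both invented for illustration; a production system would more likely query a search index.

```python
# Hypothetical standard questions from the knowledge base.
STANDARD_QUESTIONS = [
    "Can I repay my loan early?",
    "How do I change my repayment date?",
    "How long does loan approval take?",
]


def suggest(partial_input: str, limit: int = 5) -> list[str]:
    """Return standard questions containing what the user has typed so far."""
    text = partial_input.strip().lower()
    if len(text) < 2:
        # Too little input to suggest anything meaningful.
        return []
    matches = [q for q in STANDARD_QUESTIONS if text in q.lower()]
    return matches[:limit]


print(suggest("repay"))
```

When the user taps a suggestion, the bot receives an exact standard question, so recognition for that turn is effectively solved by the interface.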
Direction 2: Clean Data to Reduce Algorithm Noise #
Sometimes the algorithm performs poorly not because the algorithm is bad, but because the input data is dirty. You can improve performance from a data cleaning angle.
Maintain a translation lexicon. In speech-to-text (ASR) scenarios, dialects are a major problem. For example, in Sichuan dialect, a phrase that sounds like “yao ai” means “okay,” but the algorithm doesn’t recognize it. If you maintain a dialect translation lexicon that automatically converts “yao ai” to “okay,” the algorithm can handle it correctly.
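A dialect lexicon can be applied as a plain text substitution before the transcript reaches the intent model. This is a minimal sketch: the lexicon holds only the example above, and the function name is hypothetical.

```python
import re

# Dialect phrase (as transcribed by ASR) -> standard phrase.
DIALECT_LEXICON = {
    "yao ai": "okay",
}


def normalize_dialect(transcript: str) -> str:
    """Replace known dialect phrases with their standard equivalents."""
    normalized = transcript
    for dialect_phrase, standard_phrase in DIALECT_LEXICON.items():
        # Word boundaries avoid rewriting text inside longer words.
        pattern = rf"\b{re.escape(dialect_phrase)}\b"
        normalized = re.sub(pattern, standard_phrase, normalized, flags=re.IGNORECASE)
    return normalized


print(normalize_dialect("Yao ai, go ahead"))
```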
Maintain proper noun and synonym dictionaries. Brand names and proper nouns are frequently missegmented by algorithms. For example, “Shenzhou Car Rental” might get split into “Shenzhou” and “Car Rental,” or “NetEase Qiyu” into “NetEase” and “Qiyu.” Maintaining a proper noun dictionary tells the algorithm to treat these as single units.
Similarly, users might say “early repayment,” “early settlement,” or “I don’t want the loan anymore,” all meaning the same thing. A synonym dictionary helps the algorithm understand these different expressions.
    # Proper noun dictionary example
    proper_nouns:
      - Shenzhou Car Rental
      - NetEase Qiyu
      - Fliggy Travel

    # Synonym dictionary example
    synonyms:
      early repayment:
        - early settlement
        - I don't want the loan anymore
        - pay off the loan early
      rate adjustment:
        - my rate went up
        - interest is too high
        - can you lower the rate

Filter meaningless filler words. Phrases like “uh uh uh,” “this that,” “you know that thing” don’t help with understanding user intent at all. They just add noise. Maintaining a blocklist to filter these out can noticeably improve algorithm performance.
The value of data cleaning is often underestimated, but it might offer the best return on investment of any optimization technique. No algorithm changes, no new model training. Just one extra processing step before data enters the algorithm.[5]
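That “one extra processing step” might look something like the sketch below, which drops filler phrases and rewrites synonyms to a canonical term. The function name is hypothetical, the dictionaries are trimmed to the examples from this post, and simple string replacement is an assumption; real systems often do this at the tokenizer level.

```python
import re

# Canonical term -> variants that should be rewritten to it.
SYNONYMS = {
    "early repayment": [
        "early settlement",
        "i don't want the loan anymore",
        "pay off the loan early",
    ],
}

# Filler phrases that carry no intent.
FILLER_BLOCKLIST = ["you know that thing", "this that", "uh"]


def clean_input(text: str) -> str:
    """Lowercase the text, drop filler phrases, and canonicalize synonyms."""
    cleaned = text.lower()
    for filler in FILLER_BLOCKLIST:
        # Word boundaries keep "uh" from matching inside other words.
        cleaned = re.sub(rf"\b{re.escape(filler)}\b", " ", cleaned)
    for canonical, variants in SYNONYMS.items():
        for variant in variants:
            cleaned = cleaned.replace(variant, canonical)
    # Collapse the whitespace left behind by removed fillers.
    return re.sub(r"\s+", " ", cleaned).strip()


print(clean_input("Uh uh uh I want early settlement"))
```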
SaaS vs. Custom Products: Different Design Thinking #
The specific strategies for “product compensating for technology” also depend on your product type.
SaaS products serve many businesses. You can’t do offline data collection, labeling, and training for each customer. So the admin backend must support a complete online closed loop: online data collection, online labeling, online training, and online model metrics viewing. Customer data and model training processes need to be productized and automated. This demands more from product design, but once it’s done, the marginal cost is extremely low.
Deeply customized products offer more room to maneuver. For example, you could train 10 deep learning models for a specific scenario, allocate 10% of traffic to each for A/B testing, and after a period select the best-performing model. Custom products allow more aggressive optimization, but the per-customer delivery cost is also higher.
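The 10-model traffic split can be sketched with a deterministic hash, so the same user always lands on the same model for the duration of the test. The model names and the choice of SHA-256 are assumptions for illustration, not a description of any particular A/B testing platform.

```python
import hashlib

# Hypothetical names for the 10 candidate models.
MODEL_VARIANTS = [f"model_{i}" for i in range(10)]


def assign_model(user_id: str) -> str:
    """Deterministically map a user to one of the candidate models."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # The hash is roughly uniform, so each model gets about 10% of users.
    bucket = int(digest, 16) % len(MODEL_VARIANTS)
    return MODEL_VARIANTS[bucket]


print(assign_model("user-42"))
```

Hashing on the user ID rather than assigning randomly per request keeps each user’s experience consistent, which also makes per-model metrics easier to compare.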
The core distinction: SaaS products win through automation and scale; custom products win through depth and differentiation. Think this through before deciding where to invest your resources.
1. This is especially true in consumer-facing products. Users have less patience with AI products than traditional ones because their expectations are higher. The first 30 seconds of experience determine whether they’ll stick around.
2. Diminishing marginal returns in algorithm optimization is a well-documented phenomenon. Simply put, the closer you get to the theoretical ceiling, the more compute, data, and time each additional percentage point requires, often growing exponentially.
3. This principle doesn’t apply only to AI products. Any scenario requiring user input benefits from multiple choice over fill-in-the-blank. But AI products especially need to prioritize this, because users have higher expectations for “intelligence” and lower tolerance for friction.
4. This approach is common in recommendation systems too: rather than precisely predicting what a user wants, present a candidate set and let them choose. Giving the user the final say reduces algorithmic pressure while improving perceived quality.
5. There’s an old saying in data science: “Garbage in, garbage out.” The ceiling of an algorithm is often determined not by the model architecture, but by data quality. Before investing heavily in model optimization, check whether your data is clean first.