August 28th 2025

90% of AI Agents Are Stuck at the "Double Cliff": General Agents Lack the Capability, Vertical Agents Lack the Resources

Ollie @PuppyAgent

The Gartner 2025 report states that 83% of corporate AI projects fail to meet expectations. While the AI industry is still debating the merits of "generalists" versus "vertical specialists," a harsh reality is emerging: 90% of AI Agent companies are caught in a double bind of "insufficient general capabilities + lack of vertical data." In this dilemma with no right answer, 90% of Agent projects are quietly heading toward failure, while the survivors are searching for new ways to stay alive on the edge of the cliff.

Front Cliff: The "Capability Cliff" of General Agents

Intent understanding and instruction following are the two basic capabilities an Agent depends on. Foundation models on their own can no longer meet the demands of complex tasks: an Agent needs workflows, supporting systems, and controls that wrap the base model into a system that can actually be deployed.

The vision of a general Agent is enticing: a single intelligent entity capable of solving problems across various fields.

However, when most companies attempt to apply so-called general AI Agents to professional scenarios, they frequently encounter a "capability cliff." A 2025 test by Stanford University revealed a shocking fact: when user instructions exceed three rounds of dialogue, the intent recognition accuracy of general Agents plummets to 41%. Although models like Claude have expanded their context windows to 128K tokens or more, in multi-role scenarios these Agents still frequently confuse the users' true needs, misjudging simple inquiries as complex decisions.

What's more dangerous is the trap of scaled-up hallucinations.

In professional scenarios such as financial risk control, the error rate of content generated by general Agents is as high as 52%, and these errors are often presented in a professional tone, such as fabricated regulatory provisions or made-up statistical data. A bank once invested heavily in developing a general risk control Agent, only to find that its error rate exceeded 65% when handling composite decisions involving "customer historical behavior + market fluctuations + policy changes," forcing the company to invest 3.2 times the manpower for verification. An MIT experiment on cross-field migration showed that when an Agent trained in the medical field was transferred to a legal scenario, the task pass rate dropped from 78% to 32%. The core issue lies in the non-generalizability of the action space. When the tool-usage interface switches, for example from a medical API to a financial API, the Agent cannot adaptively adjust the set of actions it is allowed to take.
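
To make the "action space" idea concrete, here is a minimal Python sketch. It assumes each domain exposes its own set of tools and checks an Agent's proposed plan against the tools of the active domain. The registry, the domain names, and the tool names are all hypothetical and chosen only for illustration; they do not come from the MIT experiment or from any real API.

```python
# Hypothetical tool registries; domain and tool names are illustrative only.
TOOL_REGISTRY = {
    "medical": {"lookup_drug_interaction", "fetch_patient_record"},
    "legal": {"search_case_law", "check_statute"},
}


def validate_action(domain: str, tool_name: str) -> bool:
    """Return True if the tool exists in the active domain's action space."""
    return tool_name in TOOL_REGISTRY.get(domain, set())


def filter_plan(domain: str, plan: list[str]) -> tuple[list[str], list[str]]:
    """Split a plan into tool calls that are valid in this domain and those that are not."""
    valid = [tool for tool in plan if validate_action(domain, tool)]
    invalid = [tool for tool in plan if not validate_action(domain, tool)]
    return valid, invalid


# An Agent that still "thinks" in medical tools after moving to a legal scenario.
plan = ["fetch_patient_record", "check_statute"]
valid, invalid = filter_plan("legal", plan)
print("valid:", valid)
print("invalid:", invalid)
```

A check like this does not make the Agent adapt, but it turns a silent failure into an explicit one: calls that fall outside the new domain's action space are caught before they are executed.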

Many companies are falling into a dangerous misconception: equating "being able to run a demo" with "having business value."

A car manufacturer once invested 20 million in training a "general customer service Agent," but it failed in real scenarios because it couldn't handle composite decisions involving "tire type + weather + driving habits." This reveals a key paradox: the more a general Agent pursues "omnipotence," the lower its reliability in vertical scenarios becomes.

"We are not training intelligent agents; we are dressing up hallucinations in professional attire." The dilemma of general Agents is not that they can't do everything, but that they can't even perform basic actions reliably in professional scenarios.

Back Cliff: The "Resource Cliff" of Vertical Agents

When companies turn to vertical Agents for a breakthrough, they find themselves falling into another "resource cliff."

Core industry data is like treasure locked on an island: diagnostic data from top-tier hospitals, bank risk control logs, and other key assets are inaccessible to 91% of companies due to compliance barriers. What's more severe is the data quality trap. An industrial AI team spent eight months obtaining equipment failure data, but 67% of it was invalidated due to inconsistent labeling standards. Vertical data requires industry know-how to be used correctly. Real cases show that the cost for a medical AI company to obtain 100,000 compliant and labeled data points has soared from 830,000 yuan in 2022 to 4.12 million yuan in 2024, an increase of nearly 400%.
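
The size of that cost jump can be checked with a few lines of Python. This is only a back-of-the-envelope calculation on the two figures quoted above; the function name is our own.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


cost_2022 = 830_000    # yuan, for 100,000 compliant labeled data points
cost_2024 = 4_120_000  # yuan, for the same volume

increase = percent_increase(cost_2022, cost_2024)
multiple = cost_2024 / cost_2022

print(f"{increase:.0f}% increase, {multiple:.2f}x the 2022 cost")
# prints "396% increase, 4.96x the 2022 cost"
```

In other words, the 2024 price is roughly five times the 2022 price, which is what an increase of about 400% means.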

Even scarcer than data are the hybrid talents who can bridge the gap between technology and industry. In the development of financial Agents, engineers who have mastered both quantitative trading logic and RLHF (Reinforcement Learning from Human Feedback) tuning are in short supply: the market can meet less than 3.7% of the demand, a supply-to-demand ratio of 1:27. The communication breakdown between industry experts and AI engineers often leads to disastrous outcomes: industry experts produce vague, experience-based "rule fragments," which AI engineers then force into "erroneous knowledge graphs," causing the final Agents to deviate significantly from the essence of the business. A manufacturing client requested the development of a "device failure prediction Agent," but the industry experts were unable to describe the "spectral characteristics of bearing noise" in technical terms, so the model was trained on something that completely missed the actual requirements.

The dilemma of vertical Agents is not only that the data is hard to obtain, but also that, once obtained, it is hard to understand and use correctly.

Survival Strategy: The Three-Step Breakthrough Method to Bridge the Double Cliff

At present, most base models rely on distillation from model series such as GPT and Claude. Most of their data has not been labeled for a company's own business scenarios or for regional and national conditions. Simply constructing workflows and adding RAG (Retrieval-Augmented Generation) or similar techniques cannot deliver truly end-to-end implementation capabilities.

Many companies and organizations haven't even sorted out their own business data flywheels, let alone driven business growth with Agents.

In this awkward situation, what matters more may be using Agents in scenarios where they genuinely add business value. If high accuracy is out of reach, start with creative fields that tolerate lower accuracy; if data is lacking, look for business entry points where open-source data is plentiful. The fundamental goal is to accumulate real cases and build your own Agent moat through practice.

If you would also like to offload the complicated work at hand, click "Get Started" on the side to explore the business possibilities PuppyAgent offers.

PuppyAgent has always been exploring how dynamic interactive RAG and Agents can serve real business growth inside workflows. We hope to inspire users with each end-to-end case, rather than boasting about our generalizability or confining ourselves to a single vertical scenario. We have now implemented numerous cases in fields such as customer service, housing rental, legal affairs, and document management. You can click to watch the case videos.

Conclusion

Excellent cases and companies all share a common feature: they no longer focus on creating the "perfect Agent," but on building a "human-machine collaborative decision-making system," setting up safety rails at key nodes and unleashing Agent efficiency in routine scenarios.
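
One way to picture such "safety rails" is a simple routing rule: outputs at key nodes, or outputs the Agent is unsure about, go to a human, and everything else is handled automatically. The Python sketch below is only an illustration under stated assumptions. The threshold, the node names, and the idea that the Agent reports a confidence score between 0 and 1 are all hypothetical; this is not a description of PuppyAgent's implementation.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
CONFIDENCE_THRESHOLD = 0.85
KEY_NODES = {"loan_approval", "contract_review"}


@dataclass
class AgentResult:
    task_type: str
    answer: str
    confidence: float  # assumed to be a score in [0, 1] reported by the Agent


def route(result: AgentResult) -> str:
    """Return 'human_review' or 'auto' for a single Agent output."""
    # Safety rail 1: key decision nodes always get a human check.
    if result.task_type in KEY_NODES:
        return "human_review"
    # Safety rail 2: low-confidence outputs are escalated as well.
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    # Routine, high-confidence work is where the Agent saves time.
    return "auto"


results = [
    AgentResult("faq", "Our office opens at 9 a.m.", 0.97),
    AgentResult("faq", "The refund policy is ...", 0.60),
    AgentResult("loan_approval", "Approve", 0.99),
]
for r in results:
    print(r.task_type, r.confidence, "->", route(r))
```

The design choice here is that key nodes are escalated regardless of confidence: a confidently wrong answer is exactly the failure mode that professional-sounding hallucinations produce.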

For companies teetering on the edge of the cliff, the primary task is to face reality: general Agents can't do everything, and vertical Agents can't firmly grasp core resources. The real way out lies in systematic thinking: organically combining industry knowledge distillation, small data enhancement, and human-machine collaborative verification to form an implementable "three-step breakthrough method."

We may repeatedly ask Manus and Genspark: if one day OpenAI and Google really achieve general Agents, where will your competitiveness be? We may also ask OpenAI and Google what the real difficulties are in scaling scenarios and generalizing workflows. Essentially, this is a tightrope hanging between cliffs.

Is your company ready to walk this tightrope?
