Due diligence duds: Salesforce study reveals AI agents fail 65% of multi-step CRM tasks

A new study led by Kung-Hsiang Huang, a Salesforce AI researcher, reveals that large language model (LLM) agents struggle significantly with customer relationship management (CRM) tasks and fail to properly handle confidential information. The findings expose a critical gap between AI capabilities and real-world enterprise requirements, potentially undermining ambitious efficiency targets set by companies and governments that are banking on AI agent adoption.

What you should know: The researchers built a new benchmark, CRMArena-Pro, to test AI agents on realistic CRM scenarios generated from synthetic data.

  • LLM agents achieved only a 58 percent success rate on single-step tasks that require no follow-up actions or additional information.
  • Performance plummeted to just 35 percent when tasks required multiple steps to complete.
  • The study found that “agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance.”

Why this matters: These limitations could derail major efficiency initiatives that depend heavily on AI agent capabilities.

  • Salesforce CEO Marc Benioff previously told investors that AI agents represent “a very high margin opportunity,” because the company would capture a share of the efficiency savings its customers realize.
  • The UK government has targeted £13.8 billion ($18.7 billion) in savings by 2029 through digitization efforts that rely partly on AI agent adoption.
  • Organizations may be overestimating AI agents’ readiness for complex enterprise tasks.

The research approach: CRMArena-Pro creates a realistic testing environment by feeding synthetic data into a Salesforce organization sandbox.

  • The benchmark addresses what the researchers describe as a gap left by existing benchmarks, which “failed to rigorously measure the capabilities or limitations of AI agents.”
  • Previous benchmarks largely ignored AI agents’ ability to recognize sensitive information and follow proper data handling protocols.
  • Depending on the complexity of a query, agents must decide whether to make API calls directly or to ask the user for clarification.

The big picture: “These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios,” the research paper concluded.

  • The study highlights the disconnect between AI marketing promises and actual performance in business-critical applications.
  • Organizations should exercise caution before banking on AI agent benefits that remain unproven in complex enterprise environments.