MCP Protocol Security: Function Hijacking Risks for AI Agents

As the business community aggressively invests in autonomous agents, researchers have uncovered a significant structural flaw. According to a preprint by Yannis Belkhiter and his team published on arXiv, the industry is making a classic mistake by focusing primarily on jailbreaks and prompt injections. In the era of agents, the real threat stems from Function Hijacking Attacks (FHA), which allow attackers to seize control over tool selection via the Model Context Protocol (MCP).

The mechanics of the attack are elegantly destructive: an adversary forces the agent to call a specific malicious function, bypassing semantic context and prompt filters entirely. This is no theoretical hypothesis—during Belkhiter's experiments across five different models (including specialized reasoning variants), the Attack Success Rate (ASR) on the BFCL dataset ranged from 70% to 100%. Essentially, if your AI architect assumes the model will always "choose" the right tool based on user intent, the reality is far more precarious.

The MCP vulnerability provides a direct path to data sabotage, theft of corporate assets, and the creation of infinite execution loops. The research demonstrates that these attacks are universal: they are domain-agnostic and do not require fine-tuning for specific business processes. More concerningly, Belkhiter found that it is possible to train adversarial functions capable of hijacking tool selection across multiple diverse queries and configurations simultaneously.

From our perspective, deploying MCP in its current form without rigorous execution environment isolation represents an unjustifiable operational risk. The hype surrounding "autonomy" has blinded executive leadership: expanding agent permissions through MCP creates attack vectors that traditional cybersecurity frameworks are simply not equipped to handle. The reliability of agentic systems now depends less on the internal "intelligence" of the LLM and more on external defensive barriers. Integrating these agents into critical business logic today is effectively handing the keys to your infrastructure to a system that stress tests break 100% of the time. Until full execution isolation is guaranteed, trusting AI agents as internal actors is an unaffordable luxury.

Source: arXiv cs.AI →

Rate this material

★ ★ ★ ★ ★

AI AgentsCybersecurityAI SafetyAI in BusinessAnthropic