Oracle Poisoning: A New Threat to AI Agents and Knowledge Graphs

Your AI agents are making flawlessly logical decisions based on absolute lies. A new study by researchers from Microsoft, UNSW Canberra, and SAP introduces 'Oracle Poisoning'—a class of attacks that renders traditional prompt-injection defenses useless. The premise is simple: the attacker doesn't corrupt the instructions, but rather the structured knowledge graphs that the agent treats as its sole source of truth. Unlike standard RAG poisoning, which targets text similarity, this method manipulates the Model Context Protocol (MCP).

When an agent queries a database and receives a response stating that a function is safe or a dependency has been verified, it treats this as an objective fact rather than a recommendation. As lead researcher Ben Kereopa-York aptly notes, the agent becomes a prisoner in Plato’s Cave: the knowledge graph is the wall, and the query results are the shadows it is forced to accept as reality.

Data suggests the scale of this vulnerability is catastrophic. Researchers tested nine models from three leading providers against an industrial code graph containing 42 million nodes. At a moderate attack complexity (L2), every model tested accepted fabricated security data in 269 out of 270 cases. This near-100% success rate proves that when agents autonomously invoke graph query tools, they lack the skepticism required to verify data integrity. While trust in false data fluctuates between 3% and 55% during standard prompting, directed queries via internal tools cause a total collapse of defensive logic. Even advanced models that showed zero trust in static tests immediately defected to the 'lying oracle' once placed in a live agentic environment.

For the enterprise sector, this means the current obsession with prompt filtering and interface monitoring is like trying to lock the door of a house with no walls. The real problem lies in the integrity of the data infrastructure. After examining five defense mechanisms, researchers concluded that most are either partially effective or model-dependent. The only reliable way to eliminate the mutation vector is through rigorous read-only access controls. As AI agents integrate deeper with platforms like CodeQL or Sourcegraph to manage codebases, the trusted channel within the MCP becomes a critical point of failure.

It is naive to rely on the 'intelligence' of the model. In fact, the more powerful the agent, the more detailed, persuasive, and ultimately dangerous its version of a false reality becomes when built on poisoned data. Traditional cybersecurity is used to guarding the perimeter, but Oracle Poisoning proves that in the world of AI, the data is the perimeter. If your architecture allows an adversary to insert nodes or modify properties within a knowledge graph, no amount of prompt engineering or model alignment will prevent a systemic failure.

Tech leaders must shift their focus from protecting what an agent is told to auditing what an agent knows. The gap between perfect logic and false facts is precisely where the next generation of enterprise exploits will live.

Source: arXiv cs.AI →

Rate this material

★ ★ ★ ★ ★

AI AgentsCybersecurityAI SafetyMicrosoft