How DoorDash Cut AI Search Costs by 98%

Native search grounding has become the standard for AI agents, but in production it is turning into an architectural trap. When search is baked into a provider's API as a "black box," you lose control over data retrieval policies and token costs. Emmanuel Boateng and the DoorDash engineering team argue that to make agents stable, you must strip them of the privilege to roam the web independently and move search into a decoupled layer.

The Redundancy Crisis: When Search Ruins Character

The primary issue uncovered at DoorDash is Search-Induced Verbosity. As soon as a model gains access to external data via native tools, it tends to ignore system instructions. An agent tasked with returning a short code snippet or an object name suddenly begins churning out explanatory paragraphs prompted by the retrieved context. This isn't just annoying: it breaks software contracts and inflates token bills. The model fails to balance the weight of the retrieved evidence against the conciseness the original prompt demands. DoorDash's solution is Decoupled Search Grounding (DSG), an architecture in which data retrieval becomes a structured tool layer rather than a hidden process inside the model.
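To make the idea of a "structured tool layer" concrete, here is a minimal Python sketch. It is not DoorDash's implementation: the names SearchResult, SearchProvider, render_context, and grounded_answer are hypothetical, and the model is represented as a plain callable so that any reasoning engine could be plugged in. The point it illustrates is that the application, not the model, decides how many results to fetch and how much evidence text reaches the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class SearchResult:
    """One retrieved item (hypothetical shape, for illustration only)."""
    title: str
    url: str
    snippet: str


class SearchProvider(Protocol):
    """Any search backend that can return structured results."""
    def search(self, query: str, max_results: int) -> list[SearchResult]: ...


# Assumed output contract: the agent must stay terse even with evidence attached.
DEFAULT_RULES = "Answer with a single short phrase. No explanations."


def render_context(results: list[SearchResult], max_chars: int = 600) -> str:
    """Render evidence as a compact block whose size the application bounds."""
    lines = [f"[{i}] {r.title}: {r.snippet}" for i, r in enumerate(results, start=1)]
    return "\n".join(lines)[:max_chars]


def grounded_answer(
    question: str,
    provider: SearchProvider,
    model: Callable[[str], str],
    *,
    depth: int = 3,
    rules: str = DEFAULT_RULES,
) -> str:
    """Run search outside the model, then pass bounded evidence to it."""
    results = provider.search(question, max_results=depth)
    evidence = render_context(results)
    prompt = f"{rules}\n\nEvidence:\n{evidence}\n\nQuestion: {question}"
    return model(prompt)
```

Because the provider and the model are both passed in as arguments, either one can be replaced without touching the calling code, and search depth and context size become explicit parameters rather than provider-side defaults.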

"Real-time grounding is an optimizable interface boundary, not a fixed model feature."

This approach allows for semantic-level caching, which is critical for response predictability. In benchmarks on SimpleQA and FreshQA, DSG gave engineers control over search depth and context rendering, capabilities that native integrations simply do not offer. While standard solutions struggle to keep up with data freshness, DSG ensures rigid format compliance, a quality valued higher in production than a model's ability to "chat."
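The article does not describe how the cache is keyed, so the following Python sketch uses a deliberately crude stand-in for semantic-level keys: queries are lowercased, stripped of punctuation, and whitespace-collapsed before lookup. GroundingCache, normalize_query, and the TTL default are all assumptions made for illustration; a real semantic cache would more likely key on embeddings or a canonicalized intent.

```python
import re
import time
from typing import Callable


def normalize_query(query: str) -> str:
    """Crude stand-in for a semantic key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(cleaned.split())


class GroundingCache:
    """Caches rendered evidence per normalized query, with a freshness TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,  # hypothetical freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> str:
        key = normalize_query(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: pay for one real search and store the result.
        self.misses += 1
        evidence = self._fetch(key)
        self._entries[key] = (now, evidence)
        return evidence
```

With this sketch, "Best pizza near me?" and "best  pizza near me" resolve to the same entry, so only the first of the two triggers a search call. The TTL is the knob that trades cache hit rate against data freshness.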

The Economics of a Modular Stack

DoorDash's financial results read like a death sentence for monolithic solutions. In Query Intent Understanding (QIU) workloads, the DSG architecture not only matched native search in accuracy but slashed costs by 98%. The secret lies in creating a shared grounding layer that achieved a 99.4% warm cache hit rate. For businesses, this marks the end of LLM provider dictatorship: model selection is no longer tied to the quality of a provider's proprietary search. You can swap reasoning engines or data providers without rewriting your application logic.
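The link between cache hit rate and search spend can be checked with simple arithmetic. The Python sketch below assumes that each cache miss triggers exactly one paid search call and that cache hits cost nothing; the query volume and per-call price are hypothetical inputs, not figures from DoorDash. It covers only search-call volume, not the full cost model behind the reported numbers.

```python
def expected_search_calls(total_queries: int, hit_rate: float) -> float:
    """Number of queries that miss the cache and require a real search."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_queries * (1.0 - hit_rate)


def search_spend(total_queries: int, hit_rate: float, price_per_search: float) -> float:
    """Search spend, assuming only cache misses are billed (an assumption)."""
    return expected_search_calls(total_queries, hit_rate) * price_per_search


if __name__ == "__main__":
    queries = 1_000_000   # hypothetical workload
    price = 0.01          # hypothetical price per search call
    uncached = search_spend(queries, 0.0, price)
    cached = search_spend(queries, 0.994, price)
    print(f"search calls at 99.4% hit rate: {expected_search_calls(queries, 0.994):.0f}")
    print(f"spend without cache: {uncached:.2f}, with cache: {cached:.2f}")
```

At a 99.4% warm hit rate, roughly 6,000 of every 1,000,000 queries reach the search provider, which is why a shared grounding layer changes the economics so sharply.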

"On SimpleQA, accuracy is nearly identical to native (86.1% vs 87.7%), while search costs are 91% lower."

Beyond savings, the modular approach solves the latency bottleneck. Layer separation allowed DoorDash to reduce latency by 68% through optimized caching. Instead of trusting a "black box" to decide which sources to believe, DSG enables the implementation of custom evidence-verification policies. The model now functions as a processor for verified information rather than an autonomous browser prone to improvisation. The "all-in-one" era ends where high workloads and the need to control total cost of ownership (TCO) begin. While accuracy on hyper-current data remains a point of debate, for stability and cost-efficiency, moving search outside the LLM "brain" is becoming the only viable path.
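What a custom evidence-verification policy might look like is not specified in the article, so the Python sketch below is an assumption: it filters retrieved items by an allowlist of domains and a maximum age before anything reaches the model. Evidence, EvidencePolicy, and filter_evidence are hypothetical names, and real policies could just as well check source reputation, deduplicate, or cross-validate claims.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass(frozen=True)
class Evidence:
    """A retrieved item with provenance (hypothetical shape)."""
    url: str
    snippet: str
    published_at: datetime  # expected to be timezone-aware


@dataclass(frozen=True)
class EvidencePolicy:
    """Accept only items from trusted domains that are recent enough."""
    allowed_domains: frozenset[str]
    max_age: timedelta

    def accepts(self, item: Evidence, now: datetime) -> bool:
        host = (urlparse(item.url).hostname or "").lower()
        # Match the domain itself or any subdomain, but not lookalikes
        # such as "notexample.com" for "example.com".
        domain_ok = any(
            host == d or host.endswith("." + d) for d in self.allowed_domains
        )
        fresh = now - item.published_at <= self.max_age
        return domain_ok and fresh


def filter_evidence(
    items: list[Evidence], policy: EvidencePolicy, now: datetime
) -> list[Evidence]:
    """Keep only the evidence the policy accepts, preserving order."""
    return [item for item in items if policy.accepts(item, now)]
```

Passing the current time in as an argument keeps the policy deterministic and easy to test, and the filter runs before prompt construction, so rejected sources never consume tokens.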

Source: arXiv cs.AI
