Anthropic has officially acknowledged a systemic decline in the quality of Claude Code. A series of overlapping bugs, from reduced reasoning depth to caching failures, turned the tool into an unpredictable 'black box.' According to a report by The Decoder, during the incident window of March 4 to April 20 the company intentionally lowered the 'reasoning effort' setting from high to medium in an attempt to reduce latency. The gamble that users would not notice the drop in quality failed: the tool became markedly less capable, and the change was rolled back on April 7.

Anthropic’s technical post-mortem vividly illustrates how easily 'silent' degradation can slip past internal testing. A bug introduced in the Claude Agent SDK on March 26 was meant to clear reasoning history after an hour of inactivity; instead, it wiped the history after every single step. As a result, Claude lost context, repeated old mistakes, and burned through usage limits to no avail. At the same time, an attempt to curb the verbosity of the Opus 4.7 model with a system instruction imposing a 100-word limit caused a 3% quality regression, one that was only detected after an expanded evaluation. Anthropic admitted that the layering of these issues produced a cumulative decline that users found hard to articulate but the company could no longer ignore.
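The post-mortem does not publish the offending code, but the behavior it describes is the classic signature of an inverted time-to-live check. A minimal sketch of how such a bug could produce exactly this failure, using a simulated clock and a hypothetical `SessionHistory` class (the actual SDK code is not public in this form):

```python
HISTORY_TTL = 3600  # intended: wipe reasoning history after 1h of inactivity

class SessionHistory:
    """Hypothetical reconstruction of the reasoning-history cache."""

    def __init__(self, buggy=True):
        self.buggy = buggy
        self.entries = []
        self.last_activity = None  # simulated-clock timestamp of last step

    def record_step(self, entry, now):
        if self.last_activity is not None:
            idle = now - self.last_activity
            if self.buggy:
                # BUG: comparison inverted -- this fires on every *active*
                # step, so the agent loses its context after each action.
                if idle < HISTORY_TTL:
                    self.entries.clear()
            elif idle >= HISTORY_TTL:
                # Intended behavior: only expire a genuinely idle session.
                self.entries.clear()
        self.entries.append(entry)
        self.last_activity = now
```

A one-character flip in a comparison is enough to turn "expire stale sessions" into "forget everything after every step", and nothing crashes, so nothing alerts.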

For CTOs and engineering leads, this case serves as a stark warning. The industry’s acute shortage of compute power is forcing providers to balance update speed against model stability, and that balance is increasingly shifting to the detriment of the latter. A tool that worked perfectly yesterday may degrade today without a single change to your own codebase. Blindly trusting a vendor's internal testing is becoming a dangerous oversight.
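The practical alternative to blind trust is a regression canary you run yourself: replay a fixed "golden" task suite against the vendor endpoint on a schedule and alert when the pass rate drops below a known-good baseline. A minimal sketch, where `call_model`, the suite, and the thresholds are all illustrative placeholders, not a real API:

```python
BASELINE_PASS_RATE = 0.95  # measured while the tool was known-good
MAX_ALLOWED_DROP = 0.05    # alert on a drop of more than 5 points

def call_model(prompt: str) -> str:
    # Stub standing in for a real vendor API call; swap in your client.
    return prompt.strip().upper()

GOLDEN_SUITE = [
    # (prompt, predicate over the model's output)
    ("echo hello", lambda out: "HELLO" in out),
    ("echo world", lambda out: "WORLD" in out),
]

def run_canary(model, suite):
    # Fraction of golden tasks the model still passes today.
    passed = sum(1 for prompt, check in suite if check(model(prompt)))
    return passed / len(suite)

def should_alert(pass_rate, baseline=BASELINE_PASS_RATE):
    # Fire when today's pass rate falls too far below the baseline.
    return baseline - pass_rate > MAX_ALLOWED_DROP
```

Even a dozen fixed tasks run daily would have surfaced the kind of silent, cumulative degradation described above long before users could articulate it.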

In our view, it is time for companies to implement an Index of Human Reliance (IHR). This metric tracks how often developers must manually correct errors made by AI agents. If your team treats AI as a 'set-and-forget' solution, you are effectively outsourcing quality control to algorithms constrained by hardware shortages. The priority must shift from the volume of generated code to the number of sessions completed without external intervention. The era of blind faith in autonomy is over: the future belongs to rigorous external audits and a persistent human-in-the-loop model.
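The article leaves the IHR undefined beyond its intent. One plausible formalization, assuming you log a per-session count of manual corrections (the class and field names here are illustrative, not an established standard):

```python
from dataclasses import dataclass

@dataclass
class Session:
    # Per-session log of one AI-agent run.
    human_corrections: int  # edits a developer made to fix the agent's output

def human_reliance_index(sessions):
    """Fraction of sessions needing at least one manual correction --
    one plausible reading of the proposed IHR metric."""
    if not sessions:
        return 0.0
    corrected = sum(1 for s in sessions if s.human_corrections > 0)
    return corrected / len(sessions)

def unattended_rate(sessions):
    # The complementary figure the article says to prioritize: the share
    # of sessions completed without external intervention.
    return 1.0 - human_reliance_index(sessions)
```

Tracked week over week, a rising IHR is exactly the early-warning signal for vendor-side degradation that the incidents above went without.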

AI in Business · AI Agents · Large Language Models · Anthropic