Stella Lorenzo, AI Director at AMD, has revealed stark data showing that Anthropic's Claude 3 model began exhibiting significant performance degradation shortly after its launch. An analysis of over 230,000 queries spanning February and March indicated a sharp drop in the model's median reasoning length, falling from 2,200 to 600 characters. Concurrently, the ratio of read operations to code edits contracted from 6.6:1 to 2:1. In essence, the model appears to have shifted from thoughtful analysis to superficial imitation, a development that is, to put it mildly, concerning.
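To put the reported figures in relative terms, the arithmetic can be sketched in a few lines of Python. The helper name `relative_change` is an illustrative assumption, not part of AMD's analysis; the input numbers are the ones quoted above.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a signed percentage."""
    return (after - before) / before * 100.0

# Figures as quoted in the paragraph above.
reasoning_change = relative_change(2200, 600)  # median reasoning length, characters
ratio_change = relative_change(6.6, 2.0)       # read operations per code edit

print(f"Median reasoning length: {reasoning_change:.1f}%")
print(f"Read-to-edit ratio: {ratio_change:.1f}%")
```

By this calculation, both metrics fell by roughly 70 percent over the period.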
The situation reportedly intensified after March 8, when Anthropic introduced a 'thinking redaction' feature, presented as a convenience. While the developers described this as a minor adjustment, AMD's data suggests otherwise. Following the change, Claude 3 began to refuse tasks more frequently and generated increasingly contradictory responses. Lorenzo noted that while users could previously expect some stability from a model, it must now be acknowledged that a model's performance can degrade visibly and rapidly after launch. The long-term value proposition of such models is therefore an open question.
For businesses, this decline in quality translates directly into increased expenses. AMD estimates that the daily cost of using Claude has surged by a factor of 122 due to the drop in efficiency. Tasks take longer to complete, require more iterations, or are not completed at all. The anticipated savings from automation are instead turning into unforeseen expenditures on manual rework and verification of outputs generated by the 'optimized' AI.
This case serves as a potent illustration of how changes presented as improvements can lead to hidden increases in operational costs and reputational risks. Before scaling any AI solution, it is critical to rigorously assess its actual performance and total cost of ownership. Independent verification is no longer a luxury but a necessity. Otherwise, your budget risks becoming a black hole, and trust in the technology may evaporate.
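One way to approach independent verification is to track a few behavioral metrics in your own usage logs and compare them against a baseline period. The following is a minimal sketch under stated assumptions: the record fields (`reasoning_chars`, `reads`, `edits`), the function names, the 25 percent threshold, and the sample values are all hypothetical and chosen only for illustration; they do not describe AMD's actual methodology or data.

```python
from statistics import median

def summarize(sessions):
    """Compute the median reasoning length and the overall read-to-edit ratio."""
    reads = sum(s["reads"] for s in sessions)
    edits = sum(s["edits"] for s in sessions)
    return {
        "median_reasoning_chars": median(s["reasoning_chars"] for s in sessions),
        "reads_per_edit": reads / edits if edits else float("inf"),
    }

def find_regressions(baseline, current, max_drop=0.25):
    """Return metrics that fell by more than `max_drop` relative to baseline."""
    flagged = {}
    for name, base_value in baseline.items():
        if base_value == 0:
            continue  # a relative drop is undefined for a zero baseline
        drop = (base_value - current[name]) / base_value
        if drop > max_drop:
            flagged[name] = round(drop, 3)
    return flagged

# Made-up sample records, for illustration only.
baseline_sessions = [
    {"reasoning_chars": 2000, "reads": 12, "edits": 2},
    {"reasoning_chars": 2400, "reads": 14, "edits": 2},
]
current_sessions = [
    {"reasoning_chars": 500, "reads": 4, "edits": 2},
    {"reasoning_chars": 700, "reads": 4, "edits": 2},
]

flagged = find_regressions(summarize(baseline_sessions), summarize(current_sessions))
print(flagged)
```

Running a check like this on a schedule, against a fixed baseline window, makes a gradual decline visible before it shows up as rework costs.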