The era when OpenAI was considered the uncontested heavyweight champion of cognitive performance is officially over. A new report from Epoch AI reveals a tectonic shift: Anthropic’s Claude Fable 5 hasn't just edged out the competition; it has opened a double-digit lead in a field where every percentage point is won through engineering blood and sweat. On FrontierMath, the industry's most ruthless mathematical benchmark, the model delivered 87% accuracy across Tiers 1 through 3 and a phenomenal 88% on the elite Tier 4.
A leap from under 10% to 88% in less than a year isn't a routine update; it’s a radical overhaul of reasoning architecture.
To grasp the scale of this progress, one only needs to look in the rearview mirror: as recently as early 2026, its predecessor, Opus 4.5, was struggling below the 10% mark on Tier 4 tasks. While the market speculated whether large language models were hitting a complexity ceiling, Anthropic simply dismantled that ceiling.
Against this backdrop, OpenAI’s GPT-5.5 looks unexpectedly lackluster. With a Tier 4 score of approximately 75%, Sam Altman’s latest offering trails by roughly 13 percentage points. On FrontierMath, which uses standardized Epoch AI scaffolding to eliminate manipulation and "marketing polish," this distance is not a rounding error; it is a chasm. It represents the difference between autonomous engineering you can trust and a system that requires constant supervision. As OpenAI prepares GPT-5.6, leadership in R&D and complex financial modeling has de facto shifted to Anthropic.
Key Takeaways for Business:
Superiority in verifiable computations makes Claude the primary choice for high-stakes tasks where the cost of error is prohibitive.
Claude Mythos and OpenAI models recently co-solved a long-standing Erdős conjecture, signaling a transition from theory to genuine scientific breakthroughs.
In the solo race of logical endurance, Anthropic is now setting the pace, forcing OpenAI into the uncharacteristic role of the underdog.