The race to create 'digital Einsteins' has entered a phase of pragmatic minimalism. A research team from Shanghai AI Laboratory and Tsinghua University has unveiled SU-01 (a 30B-A3B architecture), a model that clinches gold at both the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO) without the usual architectural crutches. Rather than building a labyrinth of symbolic engines and hyper-specialized modules, Yafu Li, Yu Cheng, and their colleagues bet on a 'Unified Scaling' method—a pure scaling strategy that relegates complex neuro-symbolic hybrids to the scrapheap of history.

Technically, SU-01 is a lean, three-stage pipeline. It begins with supervised fine-tuning (SFT) on 340,000 reasoning trajectories, followed by two-stage reinforcement learning (RL), and finishes with test-time scaling. The model maintains a coherent chain of thought across more than 100,000 tokens, which allows it to solve problems at the level of IMO 2025 and IPhO 2024—tasks that previously required a 'zoo' of neural networks and external search algorithms. This is a critical precedent: the success of SU-01 suggests that 'hard' skills like proof and self-verification are a matter of computational discipline, not clever code.
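The article doesn't detail SU-01's exact test-time scaling recipe, but the general idea—spend more inference compute per problem by sampling many long reasoning trajectories and keeping the one that survives self-verification—can be sketched as a best-of-N loop. Everything below (`generate_solution`, `verify`, the scoring scheme) is a hypothetical stand-in, not the paper's implementation:

```python
import random

def generate_solution(problem: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for a model call that samples one long
    chain-of-thought solution for the given problem."""
    return f"solution-{random.randint(0, 9)} for {problem}"

def verify(problem: str, solution: str) -> float:
    """Hypothetical self-verification pass: the model re-reads a
    candidate solution and returns a confidence score in [0, 1]."""
    return random.random()

def best_of_n(problem: str, n: int = 8) -> str:
    """Test-time scaling via best-of-N: sample n independent
    trajectories, keep the one the verifier scores highest.
    More samples = more inference compute = better odds that at
    least one trajectory is correct and recognized as such."""
    candidates = [generate_solution(problem) for _ in range(n)]
    return max(candidates, key=lambda s: verify(problem, s))

print(best_of_n("IMO-style inequality", n=8))
```

The knob here is `n`: accuracy is bought with raw inference compute rather than with extra architectural machinery, which is precisely the trade the article describes.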

On the IMO-ProofBench, the model paired with inference-time scaling achieved a staggering 80.5% success rate. To put that in context, it leaves Gemini-1.5-Pro and current iterations of GPT-4 in the dust, confirming that Scaling Laws are more effective than attempts to mimic human logic through software add-ons. Unlike AlphaGeometry, which is locked into geometric tasks, SU-01 demonstrates robust generalization: physics, mathematics, and related disciplines are all handled by the same unified method.

For business leaders and R&D departments in fintech or engineering, this case is a clear signal to simplify the stack. Instead of investing in custom 'smart' modules for every task, the focus is shifting toward reward design and optimizing inference-time compute. We are witnessing the erosion of margins for 'wrapper' solutions even in R&D-heavy segments: universal reasoning models are becoming experts through raw power and correct methodology. The only remaining question is where this computational expansion will hit its ceiling, but for now, the line between pure calculation and the 'understanding' of physical processes is more blurred than ever.

Artificial Intelligence · Large Language Models · Machine Learning · Fine-tuning · SU-01