Aryabhata 2: Scaling RL for STEM Logic in LLMs

The PhysicsWallah team (Ritvik Rastogi, Vishal Singh, Tejas Chaudhary, and Sandeep Varma) has unveiled Aryabhata 2. The model targets a core weakness of general-purpose LLMs: unreliable symbolic reasoning in physics and mathematics. Where large general-purpose models often hallucinate formulas because they are only predicting the next token, Aryabhata 2, built on the GPT-OSS-20B foundation, shifts toward verifiable problem-solving via reinforcement learning (RL). This isn't just a "smart chat" interface; it is an ambitious attempt to build a system that can withstand the rigor of India's JEE and NEET exams, where a reasoning error costs far more than it would in a humanities essay.

Technical Efficiency and Conciseness

The technical elegance of this solution lies in its mechanics: instead of adding parameters, the researchers scaled up their RL runs. Leveraging PhysicsWallah's internal question banks, they trained the model not merely to "speak correctly," but to explore multiple paths to a solution.
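As a rough illustration of what a verifiable reward can look like in an RL loop, here is a minimal Python sketch. It is not the team's reward function, which this article does not describe; the `\boxed{...}` answer convention and the names `extract_final_answer` and `verifiable_reward` are assumptions made for the example. The point is that the reward comes from checking the final answer against a reference, not from how fluent the reasoning sounds.

```python
import re
from fractions import Fraction
from typing import Optional

# Assumed convention for this sketch: the final answer is wrapped in \boxed{...}.
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last boxed answer, or None if there is none."""
    matches = BOXED_PATTERN.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference, else 0.0.

    Numeric answers are compared exactly as rationals, so "1/2" and "0.5"
    count as the same answer. Anything non-numeric falls back to a plain
    string comparison.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no checkable answer, no reward
    try:
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    sample = "Using v = u + at with u = 0, we get \\boxed{1/2}"
    print(verifiable_reward(sample, "0.5"))
```

Because the check looks only at the final answer, any reasoning path that reaches the correct result earns the same reward, which leaves the model free to explore different solution routes.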

According to the team's report, this approach allowed the model to not only outperform its base version on specialized benchmarks but also to reduce output length by 64%.

In a world where every extra token represents both financial cost and latency, such concise reasoning is a direct challenge to the "wordy but dim-witted" general-purpose models.
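To see why shorter outputs matter, a back-of-the-envelope calculation helps. The 64% figure comes from the team's report; the baseline token count and the price per million tokens below are hypothetical placeholders, not measured values.

```python
LENGTH_REDUCTION = 0.64  # output-length reduction cited from the team's report


def shortened_length(baseline_tokens: int, reduction: float = LENGTH_REDUCTION) -> int:
    """Output tokens left after applying a fractional length reduction."""
    return round(baseline_tokens * (1 - reduction))


def response_cost(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of generating `output_tokens` output tokens."""
    return output_tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    baseline_tokens = 5_000        # hypothetical length of a long reasoning trace
    usd_per_million_tokens = 2.0   # hypothetical output-token price

    short_tokens = shortened_length(baseline_tokens)
    before = response_cost(baseline_tokens, usd_per_million_tokens)
    after = response_cost(short_tokens, usd_per_million_tokens)
    print(f"tokens: {baseline_tokens} -> {short_tokens}")
    print(f"cost per response: ${before:.4f} -> ${after:.4f}")
```

Under these assumptions, the output-token cost per response falls in proportion to the length reduction. Since tokens are generated one after another, generation time should shrink roughly in step as well.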

The model is built on a 20B-parameter architecture. Proprietary datasets were used for reinforcement learning. The reported result is a significant reduction in compute costs while maintaining high accuracy.

A Lesson for Business and EdTech

For EdTech founders and system architects, there is a vital takeaway here: the "reasoning tax" in scalable student support systems can be cut sharply. Instead of paying cloud providers for endless chain-of-thought sequences, teams can turn to Aryabhata 2 for precise symbolic logic in a compact 20B form factor. This transforms the AI tutor from an expensive novelty into a cost-effective tool with predictable inference economics.

The transition from linguistic intuition to verifiable rewards in symbolic domains marks the end of an era in which AI was simply required to sound human. Aryabhata 2 makes the case that in high-stakes fields like STEM, specialization and deep RL can beat scale and versatility. If that holds, the future of AI integration in education and engineering belongs to those who trade creative generation for rigorous mathematical correctness.

Source: arXiv cs.AI

