Just eight days after the launch of Composer 2, which Cursor billed as a significant leap in performance and pricing, a developer named Finn uncovered an API identifier: `kimi-k2p5-rl-0317-s515-fast`. The company had heavily promoted Composer 2 as its own proprietary development, but the identifier suggests that another model lies beneath the 'Composer 2' facade. The presentations, the Pareto efficiency charts, and the aggressive pricing designed to undercut competitors may all now be undermined by a single model identifier that nobody renamed.
In the eight days leading up to Composer 2's launch, Cursor released details about a system that deserved more attention than it received. On March 11, the team unveiled CursorBench, an internal evaluation system for coding agents. Whatever your opinion of subsequent events, this benchmark is genuinely innovative. Its tasks are sourced from actual Cursor sessions via the Cursor Blame system, which traces code back to the prompt that produced it. The prompts are intentionally ambiguous, mirroring how developers actually interact with AI agents. The scale of tasks roughly doubled between the first and third iterations of the benchmark. Token efficiency is a first-class metric, and the evaluation scores four dimensions rather than a single pass/fail result.
Cursor then turned to model engineering, developing a technique it called 'compactification in an RL loop', which is aimed at making models more efficient and better performing. Finn's discovery now calls into question the company's claim that Composer 2 is a novel, internally developed technology.