Google DeepMind is once again attempting to convince the market that its new models, Gemini 2.5 Pro and Flash, along with the newcomer Flash-Lite, possess a form of "internal thinking." Google DeepMind claims this feature, a configurable "thinking budget," lets developers control how long a query is processed in order to improve accuracy. Stripping away the public relations messaging, Gemini 2.5 Pro appears to be an evolution of last year's developments, with the most significant changes occurring in pricing. Input tokens for Gemini 2.5 Flash have doubled in price, from $0.15 to $0.30 per million, although output costs have decreased. At the same time, the price difference between versions with and without the "thinking" capability has vanished. This looks like an attempt to simplify the tariff structure, but it could also signal a strategy of steering customers toward more expensive options.
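The practical impact of the input-price change is easy to quantify. A minimal sketch: the $0.15 and $0.30 per-million rates come from the figures above, while the monthly token volume is a hypothetical workload chosen for illustration.

```python
# Compare monthly input-token spend for Gemini 2.5 Flash before and
# after the price change. Rates are from the article; the workload
# size is a hypothetical example.
OLD_RATE = 0.15  # USD per 1M input tokens (previous Flash pricing)
NEW_RATE = 0.30  # USD per 1M input tokens (updated Flash pricing)

def input_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a given number of input tokens."""
    return tokens / 1_000_000 * rate_per_million

monthly_tokens = 500_000_000  # e.g. 500M input tokens/month (hypothetical)
old_cost = input_cost(monthly_tokens, OLD_RATE)
new_cost = input_cost(monthly_tokens, NEW_RATE)
print(f"before: ${old_cost:.2f}, after: ${new_cost:.2f}")
# prints "before: $75.00, after: $150.00"
```

For a high-volume classification pipeline, the input bill simply doubles unless the lower output rates offset it, which depends entirely on the workload's input-to-output ratio.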
The new Gemini 2.5 Flash-Lite is positioned as the fastest and most budget-friendly option for mass classification and summarization tasks. Its "thinking" capability is disabled by default but can be activated via the API, and the model supports integration with Google Search and code execution. The introduction of Flash-Lite clearly signals Google's aim to segment its audience by price point. With its emphasis on speed and low cost, Flash-Lite could become a justifiable replacement for earlier Flash models, particularly when every cent counts. For Gemini 2.5 Flash itself, this update is more of an evolution, and one whose appeal is diminished by the increased cost of input tokens.
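The opt-in toggle described above can be sketched as a request payload. The structure below follows the shape of the Gemini REST API's `generationConfig`; the field names (`thinkingConfig`, `thinkingBudget`) and the budget values are assumptions based on public documentation of the feature, not details taken from this article.

```python
import json

# Hypothetical sketch of a generateContent-style request body that
# enables "thinking" on Gemini 2.5 Flash-Lite by allocating a token
# budget for reasoning. Field names mirror the publicly documented
# Gemini REST API; treat them as assumptions, not a spec.
def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a request payload.

    thinking_budget=0 leaves thinking off (the Flash-Lite default);
    a positive value opts in to deeper reasoning at extra token cost.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

payload = build_request("Classify this support ticket.", thinking_budget=1024)
print(json.dumps(payload, indent=2))
```

The design point worth noting is that the budget is a per-request knob, so a pipeline can reserve reasoning tokens for ambiguous items and run the bulk of its traffic with thinking off.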
When will "thinking" models become a genuine working tool rather than just another API option? Gemini 2.5 Flash-Lite, with its dynamic "thinking budget" management, appears most promising for integration into business processes where speed and cost are critical. However, the fact that "thinking" is dormant by default, and the focus on "economic efficiency" for sensitive tasks, indicate that the actual ROI from the model's deep reasoning will depend heavily on the specific use case. True breakthroughs capable of fundamentally altering existing paradigms will likely require more advanced and, consequently, more expensive versions. For now, most of the advantages boil down to fine-tuning performance and cost rather than a fundamentally new level of intelligence. More accurately, this is a more expensive iteration of what existed before.
Google DeepMind continues its strategic play in the generative AI landscape, offering more flexible and, by its own account, cost-effective models. The Gemini 2.5 updates, especially Flash-Lite, target the mass-market segment where price and speed are decisive factors. If you are a CEO considering AI adoption, remember that the true value of "thinking" models will be realized only when developers learn to use them effectively, and only if Google can provide stable, transparent pricing without hidden pitfalls. For now, this looks like an effort to optimize AI costs by passing them on to customers, masked as "advanced reasoning" capabilities.