For a long time, the modeling of complex molecular systems ran into a computational dead end often called the 'quadratic tax' of classic transformers. According to research by Sheng Gong and Wen Yan of ByteDance Seed, published in Nature Machine Intelligence, standard architectures, introduced by Vaswani et al. in 2017, struggle to scale: memory and compute costs grow quadratically with the number of atoms, which puts simulations of realistic biosystems out of reach for most R&D budgets. Researchers previously had to choose between precision at the micro level and a view of the entire system. Usually they chose the former, producing fragmented data in which physical context crucial for drug discovery was lost.
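To make the quadratic scaling concrete, here is a back-of-the-envelope sketch of how much memory a single dense attention matrix requires as the atom count grows. The system sizes are hypothetical round numbers, not figures from the paper:

```python
# Illustrative only: memory footprint of one dense N x N attention matrix
# (float32, single head). Ten times more atoms means a hundred times
# more memory -- the 'quadratic tax'.

def attention_matrix_bytes(n_atoms: int, dtype_bytes: int = 4) -> int:
    """Bytes needed to store a dense N x N attention matrix."""
    return n_atoms * n_atoms * dtype_bytes

for n in (1_000, 10_000, 100_000):
    gib = attention_matrix_bytes(n) / 2**30
    print(f"{n:>7} atoms -> {gib:,.2f} GiB per attention matrix")
```

At 100,000 atoms, a single float32 attention matrix already exceeds the memory of most accelerators, before activations or gradients are even counted.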

Gong and Yan propose a technical maneuver: replacing heavy quadratic attention with linear attention of O(N) computational complexity. In our view, this is a rare case where a mathematical trade-off translates directly into savings on GPU hours. The method lets the model maintain global awareness without running up excessive cloud computing bills. According to the report, the new mechanism preserves physical symmetries, a basic requirement for force fields, and removes the need for workarounds such as fragmentation or crude long-range physical corrections. The neural network can now track how changes in a distant part of a protein affect binding at the active site without turning the task into an exercise in electricity consumption.
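The paper's specific symmetry-preserving mechanism is its own construction, but the core O(N) idea can be sketched with a generic kernelized linear attention (in the style of Katharopoulos et al.): replace the softmax with a positive feature map φ, then regroup the matrix products so the N × N attention matrix is never formed. The feature map and shapes below are illustrative assumptions:

```python
import numpy as np

# Generic linear-attention sketch (not the authors' exact mechanism).
# Softmax attention computes softmax(Q K^T) V, an O(N^2) step.
# With a kernel feature map phi, associativity lets us regroup:
#   phi(Q) (phi(K)^T V)  -- phi(K)^T V is only d x d_v, so cost is O(N d^2).

def elu_plus_one(x):
    # A common choice of positive feature map: elu(x) + 1
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention. Q, K: shape (N, d); V: shape (N, d_v)."""
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)
    KV = Kf.T @ V                # (d, d_v): size independent of N
    Z = Qf @ Kf.sum(axis=0)     # (N,): per-row normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 512, 16
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (512, 16)
```

Because the regrouping is exact matrix associativity, this produces the same result as the O(N²) formulation with the same kernel; the saving comes purely from the order of operations.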

This paradigm shift changes the economics of drug discovery. Previously, the industry relied on approximations that could fail in edge cases. According to the study, a unified global model, rather than a patchwork of local fixes, creates a more stable foundation for molecular dynamics. Testing on a range of systems confirmed that accuracy in capturing long-range effects remains high. This is not just a technical curiosity but a direct impact on operating costs: time previously spent correcting errors in simplified models can now go toward the actual search for compounds.

The introduction of O(N) linear attention marks the end of an era in which scientists had to sacrifice physical reliability for computability. Of course, adoption is slowed by the conservatism of existing pipelines tied to legacy frameworks. But the competitive advantage in materials science will inevitably pass to those who start using globally aware models to simulate rare folding events and complex chemical reactions. The 'quadratic tax' is no longer a mandatory cost of doing business, and those who cling to fragmented modeling risk falling behind in both speed and scientific accuracy.

Neural Networks · AI in Healthcare · Cost Reduction · Digital Transformation · ByteDance