LLM Vulnerability: Stealing Model Weights via Token Lists

Hiding full logit values no longer guarantees that a proprietary model is secure. A study from the University of Southern California and the University of Edinburgh shows that the token ranking alone, meaning the list of candidate tokens ordered from most to least probable, acts as a unique geometric "signature" of the model.

As Matthew Finlayson and his colleagues describe, the low-rank unembedding layer creates what is known as a "softmax bottleneck": the hidden state that feeds the final layer has far fewer dimensions than the vocabulary has tokens. This restricts model outputs to a tiny, model-specific subset of all possible rankings. Put simply, the order of the suggested tokens gives away information about the parameters of the system's final layer.
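The bottleneck can be illustrated with a toy model. The sketch below is not the authors' code; the sizes (5 tokens, 2 hidden dimensions), the random matrix `W`, and the helper names `ranking` and `reachable_rankings` are all assumptions made for illustration. It samples many hidden states and counts how many distinct token orderings the layer can actually produce.

```python
import math
import random

def ranking(W, h):
    """Token ids sorted by descending logit, where logit_i = W[i] . h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    return tuple(sorted(range(len(W)), key=lambda i: -logits[i]))

def reachable_rankings(W, samples=20000, seed=0):
    """Collect the rankings produced by random 2-D hidden states.

    Scaling h by a positive factor never changes the ranking, so sampling
    directions on the unit circle covers every possible hidden state.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(samples):
        t = rng.uniform(0.0, 2.0 * math.pi)
        seen.add(ranking(W, (math.cos(t), math.sin(t))))
    return seen

# Hypothetical toy sizes: a 5-token vocabulary and a 2-dimensional hidden state.
VOCAB, HIDDEN = 5, 2
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

seen = reachable_rankings(W)
total = math.factorial(VOCAB)  # 5! = 120 orderings exist in principle
print(f"{len(seen)} of {total} possible rankings are reachable")
```

In two dimensions each pair of tokens swaps order along a single line through the origin, so 5 tokens give at most 10 such lines and at most 20 regions, each with one fixed ranking. Which rankings are reachable depends on `W`, which is the intuition behind treating the set as a signature.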

According to the study, an attacker can use these rankings to reconstruct the weights almost as effectively as with access to the raw probabilities.

API providers already restrict logit access to guard against prompt inversion and direct parameter theft, but the ranking signature still carries a great deal of information. The one consolation for platform owners is that the number of top-k tokens needed to identify a model is typically smaller than the number needed to steal it entirely. By limiting API responses to a sufficiently small number of tokens, providers can maintain a verifiable signature without handing over the keys to their intellectual property.
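The effect of truncating responses to the top k tokens can be seen in the same kind of toy setting. This is a minimal sketch under assumed sizes (6 tokens, 2 hidden dimensions) with hypothetical helpers `top_k` and `distinct_top_k`; it does not reproduce the thresholds from the study, only the general trend that a shorter list exposes fewer distinguishable patterns.

```python
import math
import random

def top_k(W, h, k):
    """The k highest-scoring token ids, best first, for logits W @ h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    order = sorted(range(len(W)), key=lambda i: -logits[i])
    return tuple(order[:k])

def distinct_top_k(W, k, samples=20000, seed=0):
    """Count distinct top-k lists seen over random 2-D hidden states.

    The seed is fixed, so every k is evaluated on the same hidden states.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(samples):
        t = rng.uniform(0.0, 2.0 * math.pi)
        seen.add(top_k(W, (math.cos(t), math.sin(t)), k))
    return len(seen)

# Hypothetical toy sizes: a 6-token vocabulary and a 2-dimensional hidden state.
VOCAB, HIDDEN = 6, 2
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

counts = {k: distinct_top_k(W, k) for k in range(1, VOCAB + 1)}
for k, n in counts.items():
    print(f"top-{k}: {n} distinct lists observed")
```

Because a longer list always contains the shorter one as a prefix, the counts can only grow with k. A provider choosing k is therefore choosing how much of the ranking geometry each response exposes.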

New Rules for Corporate Espionage

This discovery radically changes the industry's playing field. The authors describe the ranking signature as the first known "polynomially unforgeable" identifier, because finding another set of weights that produces identical rankings is an NP-hard problem.

Finlayson's team suggests this lets owners prove authenticity or detect weight leaks simply by checking a set of weights against API outputs. Interface security requires an immediate overhaul: even seemingly harmless autocomplete suggestions now function as high-precision maps of a system's internal geometry. Standard top-k features in commercial AI services may be unwittingly exposing details of a model's final layer to competitors.
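Checking weights against an observed ranking amounts to asking whether any hidden state could make those weights produce that ranking. The sketch below answers this exactly for a 2-dimensional toy layer; the function names and the matrix `W_OWNER` are hypothetical, and the procedure in the study may differ. At realistic dimensions the same question would be posed as a linear feasibility problem rather than solved with angles.

```python
import math

def ranking(W, h):
    """Token ids sorted by descending logit, where logit_i = W[i] . h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    return tuple(sorted(range(len(W)), key=lambda i: -logits[i]))

def is_feasible_2d(W, observed):
    """True if some 2-D hidden state h makes W produce `observed` strictly."""
    # Each adjacent pair (a, b) in the ranking needs (W[a] - W[b]) . h > 0.
    diffs = [(W[a][0] - W[b][0], W[a][1] - W[b][1])
             for a, b in zip(observed, observed[1:])]
    if any(dx == 0 and dy == 0 for dx, dy in diffs):
        return False  # identical rows can never be strictly ordered
    # Such an h exists exactly when all difference vectors fit inside an
    # open half-plane, i.e. the largest circular gap between their angles
    # is wider than pi.
    angles = sorted(math.atan2(dy, dx) for dx, dy in diffs)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])
    return max(gaps) > math.pi

# Hypothetical 4-token, 2-dimensional unembedding matrix.
W_OWNER = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

genuine = ranking(W_OWNER, (2.0, 1.0))   # a ranking the layer really emits
print(genuine, is_feasible_2d(W_OWNER, genuine))
print((0, 2, 1, 3), is_feasible_2d(W_OWNER, (0, 2, 1, 3)))
```

A single infeasible ranking is enough to rule out a candidate weight matrix, while a long run of feasible rankings is evidence that the weights and the API share a final layer.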

The question of how many companies are currently volunteering their technological secrets through public interfaces remains wide open.

Source: arXiv cs.AI

