Porting voice-activated AI from smart speakers to wireless earbuds hit a hard physical wall: instead of a stable power outlet, there is a tiny battery; instead of a powerful processor, there is a chip with an instruction cache of just 4 KB. As Grigory Afanasenko from Yandex’s voice technology team points out, the standard 1.7 MB models used in home systems simply won't fit. To launch the Yandex Drops earbuds, engineers had to rebuild the wake-word engine from scratch, fighting for every kilobyte of the 208 KB of available SRAM.
Surgery on the Architecture
The optimization strategy boiled down to radical architectural surgery without sacrificing accuracy. The team implemented a two-stage pipeline: a lightweight Voice Activity Detector (VAD) runs constantly, reducing the system load fivefold. The main wake-word model was "squeezed" using knowledge distillation and 8-bit quantization, while standard convolutions were replaced with depthwise-separable ones. This reduced the number of parameters by a factor of 20. Ultimately, the neural network was slimmed down to 200 KB, managing to function even within the constraints of the chip manufacturer's SDK, which lacks basic padding support and clips context to just 11–15 frames.
The Business Value of Tiny ML
For the business world, this case sets a new standard for efficiency: smart features no longer require expensive hardware or the "crutch" of a constant cloud connection. Running locally on ultra-low power not only saves data traffic but also drastically reduces latency, transforming earbuds from a simple accessory into a true wearable terminal. Yandex has demonstrated that autonomous AI can survive under extreme resource scarcity if the model architecture is dictated by the specific silicon rather than theoretical textbook accuracy.
Alisa now lives in a space smaller than a single high-quality smartphone photo.
Real industry progress isn't just about expanding data centers. Efficiency requires knowing when to stop and what to cut. On-device AI is moving from a luxury to a technical necessity for wearables.