Alibaba has unveiled Qwen3.5-Omni, a model that its engineers claim can generate software code from voice commands and video input, reportedly without direct training on such data. If the claim holds, it points toward a new era in which businesses can communicate with AI without mastering complex prompting techniques.

In audio tasks, Alibaba states that Qwen3.5-Omni surpasses Google Gemini 3.1 Pro. The model's support for 74 languages unlocks vast potential for global automation. Qwen3.5-Omni can process over ten hours of audio and seven minutes of video, indicating it is more than a novelty; it is a tool for analyzing intricate business processes.

Alibaba has not disclosed the model's size, offering access instead through an API. This approach likely aims to maintain control and generate revenue from enterprise clients. The model's multimodality—its ability to understand text, images, audio, and video—strongly suggests that the future lies with AI systems that can interact with users in a human-like manner.

For business, the immediate implication is that Qwen3.5-Omni is less about empowering developers than about letting executives issue commands to an AI by voice or video and receive code or analyses in return. That capability could significantly accelerate development cycles, boost efficiency, and prompt a rethink of how people interact with technology, not just inside IT departments but across entire company operations. Other players in the AI space are likely to follow suit.

Tags: Generative AI, Large Language Models, AI in Business, Automation, Alibaba