@Ali_TongyiLab: 1/10 🚀 Qwen3.5-Omni is here! S...

1/10 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI.
Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction.
A standout feature:
Audio-Visual Vibe Coding: Describe your vision to the camera, and Qwen3.5-Omni instantly builds a functional website or game for you.
Highlights:
Script-Level Captioning: Generate detailed video scripts with timestamps, scene cuts & speaker mapping.
SOTA Performance: Qwen3.5-Omni sets 215 SOTA results across sub-tasks while matching the top-tier text/vision capabilities of the Qwen3.5 series.
Audio-Visual Understanding: From auto-segmentation to fine-grained script generation, it understands the relationship between characters and their environment like never before.
Seamless Interaction: With native API support for Semantic Interruption, voice conversations feel human-like and stay robust to background noise.
Global Multilingual Mastery: Pioneering support for 74 languages in speech recognition and 29 languages in expressive speech generation, breaking down global communication barriers.
Autonomous Intelligence: Native support for WebSearch and complex Function Calling. The model now decides on its own when to pull real-time data.
Qwen3.5-Omni is built to be the backbone of next-gen AI applications, empowering developers and users alike with true multimodal reasoning.