Qwen3-Next: Toward Extreme Training and Inference Cost-Effectiveness

Date: 12/09/2025
Alibaba's Tongyi Qianwen (Qwen) team has released Qwen3-Next, a next-generation foundation model architecture, and has open-sourced the Qwen3-Next-80B-A3B series of models built on it. The team believes that context length scaling and total parameter scaling are two major trends in the future development of large models. To further improve training and inference efficiency under long contexts and large total parameter counts, they designed a new model structure for Qwen3-Next. Compared with the MoE structure of Qwen3, the new structure includes the following core improvements: a hybrid attention mechanism, a highly sparse MoE structure, a series of optimizations that make training more stable, and a multi-token prediction mechanism that improves inference efficiency.
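To make "highly sparse" concrete, here is a minimal arithmetic sketch. It assumes the model name "80B-A3B" is read in the usual way, as roughly 80 billion total parameters with roughly 3 billion activated per token; the helper name active_fraction is hypothetical and is not part of any Qwen tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token in a sparse MoE model.

    Both arguments are in billions of parameters.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not (0 < active_params_b <= total_params_b):
        raise ValueError("active_params_b must be in (0, total_params_b]")
    return active_params_b / total_params_b


# Assumption: "80B-A3B" means ~80B total and ~3B activated parameters.
frac = active_fraction(80.0, 3.0)
print(f"Activated per token: {frac:.2%}")  # Activated per token: 3.75%
```

Under that reading, each token touches under 4% of the model's weights, which is why total parameter count can grow without a matching rise in per-token compute.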