Latest News

date
12/09/2025
Alibaba has released Qwen3-Next, its next-generation base model architecture, and open-sourced the Qwen3-Next-80B-A3B series of models built on it. The team believes that context length scaling and total parameter scaling are the two major trends in the future development of large models, so it designed the new Qwen3-Next structure to further improve training and inference efficiency in long-context, large-total-parameter settings. Compared with the MoE structure of Qwen3, the new architecture makes four core improvements: a hybrid attention mechanism, a high-sparsity MoE structure, a series of optimizations that improve training stability, and a multi-token prediction mechanism that improves inference efficiency.
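
To make "high sparsity" concrete, here is a minimal sketch of top-k expert routing, the general idea behind a sparse MoE layer. It is not Qwen3-Next's actual router: the function names, the 8-expert layout, and k=2 are hypothetical values chosen for illustration. The last line reads the model name "80B-A3B" as roughly 80B total parameters with about 3B active per token, which is an assumption based on the naming convention.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper for illustration; not taken from any Qwen codebase.
    """
    # Pick the k experts with the largest router logits.
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per token, given counts in billions."""
    return active_params_b / total_params_b


# Illustrative router scores for one token over 8 hypothetical experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.7]
routes = top_k_route(logits, k=2)
print(routes)

# Assumed reading of "80B-A3B": 80B total, about 3B active per token.
print(f"{active_fraction(80, 3):.2%}")
```

Only the selected experts run for a given token, so compute per token tracks the active parameter count rather than the total.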