MindWall Intelligence releases a 9B model trained on the sparse-linear hybrid architecture SALA
On February 12th, MindWall Intelligence officially released the Sparse-Linear Attention Hybrid Architecture (SALA) and MiniCPM-SALA, a text model built on this architecture with only 9 billion parameters. According to reports, MiniCPM-SALA uses no acceleration algorithms such as speculative sampling, yet on cloud inference chips it reaches 3.5 times the inference speed of Qwen3-8B at a sequence length of 256K tokens, and it supports inference with context lengths of up to one million tokens on both cloud chips and consumer-grade edge GPUs.
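The announcement does not describe SALA's internals, but the name suggests combining sparse attention (each token attends to only a subset of positions) with linear attention (a recurrent state whose cost grows linearly with sequence length), which is how such hybrids typically scale to million-token contexts. The NumPy sketch below illustrates only that general idea; the sliding-window pattern, feature map, and layer interleaving are illustrative assumptions, not MindWall's actual design.

```python
# A minimal, illustrative sketch of a sparse-linear attention hybrid.
# NOT MindWall's SALA implementation: window size, feature map, and
# layer pattern are all assumptions for demonstration only.
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Sparse attention: each query attends only to the last `window` keys."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[t] = w @ v[lo:t + 1]
    return out

def linear_attention(q, k, v):
    """Linear attention: running (key, value) statistics give O(T) cost
    and a constant-size state, which is what enables long contexts."""
    T, d = q.shape
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple positive feature map
    S = np.zeros((d, v.shape[1]))               # running sum of phi(k) v^T
    z = np.zeros(d)                             # running sum of phi(k)
    out = np.zeros_like(v)
    for t in range(T):
        S += np.outer(phi(k[t]), v[t])
        z += phi(k[t])
        out[t] = (phi(q[t]) @ S) / (phi(q[t]) @ z)
    return out

def hybrid_stack(x, layers=("sparse", "linear", "sparse", "linear")):
    """Toy hybrid: interleave sparse and linear self-attention layers
    (q = k = v = x for simplicity), with residual connections."""
    for kind in layers:
        attn = sliding_window_attention if kind == "sparse" else linear_attention
        x = x + attn(x, x, x)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 16))
print(hybrid_stack(x).shape)  # (32, 16)
```

In such a hybrid, the sparse layers preserve precise local attention while the linear layers carry long-range information at linear cost, which is consistent with the reported speedup over a fully quadratic-attention model like Qwen3-8B at 256K tokens.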