Tencent Hunyuan proposes the Stem sparse attention algorithm, cutting time-to-first-token latency by 3.6 times and setting a new SOTA for long-context inference acceleration

Date: 05/06/2026
Tencent Hunyuan has announced the Stem sparse attention algorithm, which has been accepted to the machine learning conference ICML 2026. The work is a full-stack acceleration solution that pairs the Stem algorithm with HPC operators. At the algorithm level, Stem achieves nearly lossless accuracy with a 25% budget reduction through token position decay and output-aware metrics. At the operator level, the open-source Stem+BSA operator from HPC converts the gains from sparsity into real hardware acceleration, reducing time-to-first-token latency by 3.7 times at a 128K context.
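
For readers unfamiliar with budgeted sparse attention, the general idea can be illustrated with a minimal sketch. This is not the Stem algorithm, whose position decay and output-aware scoring are not detailed here. It is a generic top-k selection, and it assumes that "budget" means the fraction of keys each query is allowed to attend to, which the announcement does not spell out. The function name `topk_sparse_attention` and its parameters are hypothetical.

```python
import math


def topk_sparse_attention(q, keys, values, budget=0.25):
    """Attend from one query to only the highest-scoring fraction of keys.

    Illustrative only: a generic top-k scheme, not Stem's selection rule.
    `budget` is assumed to be the fraction of keys retained.
    """
    d = len(q)
    # Scaled dot-product score of the query against every key.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # Number of keys to keep; always keep at least one.
    keep = max(1, math.ceil(budget * len(keys)))
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:keep]
    # Softmax over the kept keys only (subtract the max for stability).
    m = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - m) for i in kept]
    total = sum(weights)
    dim_v = len(values[0])
    out = [
        sum(w * values[i][j] for w, i in zip(weights, kept)) / total
        for j in range(dim_v)
    ]
    return out, sorted(kept)


keys = [[10.0, 0.0], [0.0, 1.0], [0.0, 2.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out, kept = topk_sparse_attention([1.0, 0.0], keys, values, budget=0.25)
print(kept, out)
```

Note that this sketch still scores every key, so it saves no compute by itself. It only shows what attending under a budget means. Turning such sparsity into actual speedup requires choosing the kept positions cheaply and running a sparse kernel, which is the role the announcement assigns to the Stem+BSA operator.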