Tencent open-sources core Hunyuan AI Infra technology: inference throughput up 30%
Tencent's Hunyuan AI Infra team has officially open-sourced HPC-Ops, a production-grade, high-performance library of core operators for LLM inference. In real-world deployments, HPC-Ops raises inference QPM (queries per minute) by 30% for the Hunyuan model and by 17% for the DeepSeek model. At the single-operator level, HPC-Ops achieves up to a 2.22x improvement over FlashInfer/FlashAttention for Attention, up to 1.88x over DeepGEMM for GroupGEMM, and up to 1.49x over TensorRT-LLM for FusedMoE.

