Latest News
Tencent's Hunyuan AI Infra team has officially open-sourced HPC-Ops, a production-grade, high-performance operator library for LLM inference. In real-world scenarios, inference with HPC-Ops improves QPM by 30% for the DeepSeek model and by 17% for the Hunyuan model. At the single-operator level, HPC-Ops delivers up to a 2.22x speedup over FlashInfer/FlashAttention for Attention, up to 1.88x over DeepGEMM for GroupGEMM, and up to 1.49x over TensorRT-LLM for FusedMoE.

