Tencent Hunyuan open-sources the first unified multimodal CoT reward model.

14/05/2025

On May 13, Tencent Hunyuan announced that, together with Shanghai AI Lab, Fudan University, and Shanghai Innovation Institute, it has proposed a new research work called UnifiedReward-Think. The team has built the first unified multimodal reward model with long-chain reasoning (chain-of-thought) capabilities, which for the first time lets a reward model genuinely "learn to think" across a range of visual tasks. According to the announcement, this substantially improves the accuracy of reward evaluation on complex visual generation and understanding tasks, and also strengthens cross-task generalization and the interpretability of the model's reasoning. The project is now fully open source, including the model, dataset, training scripts, and evaluation tools.