Grok 4: Potential for long-process workflow applications begins to show, driving demand for AI infrastructure and computing power

Date: 12/07/2025
GMT Eight
CITIC SEC recommends focusing on investment opportunities in key companies in related fields and groups them into three investment themes: 1) Theme 1: General management software; 2) Theme 2: Tool software and other key industry software; 3) Theme 3: AI infrastructure.
CITIC SEC released a research report stating that Grok 4 shows outstanding reasoning capabilities in professional disciplines and complex tasks, demonstrating the potential of future models in long-process professional work. This supports Agents in delivering high-value scenarios and, combined with upcoming multimodal capabilities, is expected to open up new application scenarios. Industry adoption will in turn drive demand for AI infrastructure and computing power. The report recommends paying attention to investment opportunities in key companies in related fields, summarized under the three themes above.
Key points from CITIC SEC are as follows:
Event: Grok 4 officially released and open for use
On July 10th Beijing time, xAI released its new-generation foundation model Grok 4, in two versions, Grok 4 and Grok 4 Heavy, with improved reasoning performance on professional-discipline tasks. Enterprise API pricing is $3 per million input tokens and $15 per million output tokens, about 50% more expensive than o3. Consumer subscribers paying $30 per month can use Grok 4, while Grok 4 Heavy, which consumes substantially more inference compute, requires a $300 per month membership.
Significant upgrades to reasoning capabilities in professional disciplines and complex tasks
Grok 4 far surpasses the previous state-of-the-art (SOTA) models in professional disciplines and in complex tasks in business environments. Its knowledge capabilities already exceed undergraduate and graduate levels and are quickly narrowing the gap with top human experts across fields.
1) HLE: On the Humanity's Last Exam (HLE) test set compiled by experts in various disciplines, Grok 4 achieved an accuracy of 26.9% without tools and 41.0% with tools, rising further to 50.7% when more RL compute is invested at the inference stage, more than double the 21.6% of the previous SOTA model.
2) Vending-Bench: On Vending-Bench, which measures complex task-solving ability in a business environment, Grok 4 scored twice as high as second-place Claude Opus 4, pointing toward the ability to solve real, complex problems.
3) Others: On professional-discipline test sets such as GPQA, AIME25, HMMT25, and USAMO25, Grok 4 Heavy ranked first on all four, with perfect or near-perfect scores of 100% on AIME25 and 96.7% on HMMT25.
Development of reasoning capabilities drives demand for computing power, and technical innovation brings new ideas to improve reasoning efficiency for subsequent models
On the training side, Grok 4's training scale is about 100 times that of Grok 2, and its post-training reinforcement learning scale is about 10 times that of Grok 3. On the inference side, similar to OpenAI o3-high, Grok 4 Heavy improves model performance by investing more computing power, validating the effectiveness of test-time compute. According to ARC-AGI v2 test results, Grok 4's reasoning efficiency, measured in terms of cost, is significantly higher than that of all previous models. On the technical side, Grok 4 introduces two engineering innovations: 1) it demonstrates the significant value of tool use for reasoning performance, greatly improving that performance by letting the model learn to use tools in the pre-training stage; 2) it finds a reliable reward signal scheme for post-training reinforcement learning.
Grok 4's innovations show that reasoning capability remains the industry's focus and future direction, and its engineering exploration offers new ideas for upgrading the reasoning capabilities of subsequent models.
Updated flexible and emotionally nuanced voice interactions, multimodality is a focus for future updates
Grok 4 ships with a new voice assistant, Eve, with a 50% reduction in conversation latency and a 10-fold increase in daily user engagement time. In live demonstrations, the new assistant's tone, pitch, and intonation closely resembled a real person's, and it could imitate whispers or sing newly composed songs. The event also showcased Grok 4's potential in game development, where a game designer used AI to create a simple first-person shooter in just 4 hours. Musk said the first AI game and the first AI movie are expected to be released next year. Grok 4's current understanding and generation abilities in the visual domain are still lacking, and Musk stated that these functions are expected to improve significantly in the next minor version, within weeks to months. xAI plans to release a code model in August, a multimodal intelligent agent in September, and a video generation model in October.
Risk factors: Underperformance in the development of core AI technologies, misuse of AI leading to serious social impacts, enterprise data security risks, information security risks, intensified industry competition, geopolitical risks.
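For reference, the API pricing quoted in the event section ($3 per million input tokens, $15 per million output tokens) can be turned into a rough per-request cost estimate. The sketch below is only an illustration: the helper name `estimate_cost_usd` and the token counts in the usage example are hypothetical and do not come from the report or from xAI's API.

```python
# Prices quoted in the report, in USD per million tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost of one API call at the quoted rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload for illustration only: a long agent task that
    # reads 200,000 tokens of context and writes 50,000 tokens of output.
    cost = estimate_cost_usd(200_000, 50_000)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $1.35"
```

Because output tokens cost five times as much as input tokens at these rates, long-process tasks that generate extensive reasoning traces are dominated by the output side of the bill.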