DeepSeek responds for the first time to OpenAI's questioning over distillation

18/09/2025
On September 18th, DeepSeek once again made headlines. The DeepSeek-R1 research paper, completed by the DeepSeek team with Liang Wenfeng as corresponding author, appeared on the cover of the prestigious international journal "Nature".

In January of this year, DeepSeek had released an initial preprint of the paper on arXiv. Compared with that version, the one published in "Nature" includes more details about the model and tones down anthropomorphic descriptions. In the supplementary materials, DeepSeek disclosed that the training cost of the R1 model was only $294,000, and responded to earlier doubts about whether it had distilled OpenAI models.

Back in January, reports claimed that researchers at OpenAI believed DeepSeek may have used the outputs of OpenAI models to train R1, which would have accelerated the model's improvement with fewer resources. In the supplementary information section of the paper, DeepSeek addressed questions about the source of the training data for DeepSeek-V3-Base.

"The training data for DeepSeek-V3-Base comes solely from ordinary web pages and e-books, without any synthetic data. During the cooling-down phase of pretraining, we did not intentionally incorporate any synthetic data generated by OpenAI; all data used in this phase was obtained through web crawling," DeepSeek stated.