DeepSeek R1-0528, the latest release from the Chinese company DeepSeek, represents a significant advance in the reasoning capabilities of artificial intelligence models. It is an improved version of the DeepSeek R1 model released in January. According to the company, DeepSeek R1-0528 already rivals OpenAI's o3 model and approaches the capabilities of Google Gemini 2.5 Pro.
The model's reasoning and inference capabilities have improved substantially, a result of increased computing resources, algorithmic optimization, and a jump in average token usage per question from roughly 12,000 to 23,000. These deeper reasoning chains translate into measurable gains on benchmarks: on AIME 2025, for example, accuracy rose from 70% to 87.5%.
The DeepSeek R1-0528 architecture contains 685 billion parameters (up from 671 billion in the previous R1) and uses a Mixture-of-Experts (MoE) design in which only 37 billion parameters are active per token. The model has a 128K-token context window and can generate up to 64K tokens. It supports function calling and JSON output. In addition, the hallucination rate has been reduced, especially when rewriting and summarizing content, and code generation has also improved.
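To make the MoE idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. It is illustrative only: the layer sizes, the router, and the TopKMoELayer name are invented for this example, and DeepSeek's production routing (shared experts, fine-grained expert segmentation, load balancing) is considerably more elaborate. What it demonstrates is the core mechanism: each token passes through only a small, router-selected subset of the experts, which is how a 685-billion-parameter model can activate only about 37 billion parameters per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sketch,
    not DeepSeek's actual implementation)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)        # normalize selected scores
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; all other expert
        # parameters stay idle for that token.
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

x = torch.randn(4, 64)
print(TopKMoELayer()(x).shape)  # torch.Size([4, 64])
```

With 8 experts and k=2, each token touches only a quarter of the expert parameters; scaled up, the same principle yields R1-0528's roughly 5% active-parameter ratio.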
The model has achieved remarkable results across benchmarks. On mathematical tasks, its performance reaches or exceeds that of leading models such as OpenAI o3 and Google Gemini 2.5 Pro. On LiveCodeBench, a programming and coding benchmark, it ranks just behind OpenAI's o4-mini and o3 reasoning models. Its general reasoning has also improved, as evidenced by the jump in its GPQA-Diamond score from 71.5% to 81.0%.
DeepSeek has also released a smaller, distilled version of the main R1-0528 model, called DeepSeek-R1-0528-Qwen3-8B. Built on Qwen3-8B with reasoning knowledge distilled from DeepSeek-R1-0528, it delivers outstanding performance among open-source models: it outperforms the base Qwen3-8B by 10.0% on AIME 2024 and matches the performance of the much larger Qwen3-235B-thinking. It can run on a single GPU with at least 40 GB of VRAM.
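For readers who want to try the distilled model, the following is a minimal loading sketch using Hugging Face transformers. The repository ID is assumed from DeepSeek's usual naming convention, and the prompt is purely illustrative. In bfloat16, the 8B weights occupy roughly 16 GB, leaving the rest of a 40 GB card for the KV cache that long reasoning traces require.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo ID, following DeepSeek's naming convention.
model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for an 8B model
    device_map="auto",           # place layers on the available GPU
)

messages = [{"role": "user", "content": "What is 12 * 17? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow generous output.
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```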