MiniMax-M1: An AI Model Aimed at Handling Long Texts

With the development of artificial intelligence systems, there is a growing demand for models that can not only interpret language but also carry out complex, multi-step reasoning. Such models can be crucial not only for theoretical tasks but also for applications such as software development and real-time decision-making. These applications, however, are particularly sensitive to computational costs, which are often difficult to control with traditional approaches.

The computational load of today's widely used transformer-based models grows rapidly with input length, because the softmax attention mechanism scales quadratically with the number of tokens. Working with longer texts therefore drives up resource requirements dramatically, which is simply unsustainable in many applications. Several research directions have tried to address this problem, such as sparse or linear attention mechanisms and recurrence-based networks, but these approaches have typically not proven sufficiently stable or scalable at the level of the largest systems.
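To make the quadratic cost concrete, the following minimal NumPy sketch implements single-head softmax attention; the sequence length, head dimension, and random inputs are purely illustrative, and this is not any particular model's implementation.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard single-head softmax attention (illustrative sketch)."""
    # The score matrix has shape (n, n): its memory and compute cost
    # grow quadratically with the sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 4096, 64                       # sequence length, head dimension
q, k, v = (np.random.randn(n, d) for _ in range(3))
out = softmax_attention(q, k, v)      # builds a 4096 x 4096 score matrix
print(out.shape)                      # (4096, 64)
```

Because the score matrix has n x n entries, doubling the input length roughly quadruples both the memory and the compute needed for this step.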

In this challenging environment, the MiniMax AI research group presented its new model, MiniMax-M1, which aims at both computational efficiency and practical applicability to real-world problems. An important feature of the model is that it is open-source: it is not designed exclusively for corporate use but is also available for research. MiniMax-M1 is built on a mixture-of-experts architecture and handles long text contexts through a hybrid attention system. It comprises 456 billion parameters in total, of which approximately 45.9 billion are activated per token.
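A mixture-of-experts layer activates only a small subset of its parameters for each token. The sketch below uses a hypothetical top-2 router and plain weight matrices as "experts"; it is meant only to illustrate why the active parameter count per token can be a small fraction of the total, not to reproduce MiniMax-M1's actual routing scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2

# Each "expert" is just a weight matrix here; in a real model it would be
# a full feed-forward sub-network with far more parameters.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ router                               # (n_tokens, n_experts)
    gates = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        for e in np.argsort(gates[t])[-top_k:]:       # top-k experts only
            out[t] += gates[t, e] * (token @ experts[e])
    return out

tokens = rng.standard_normal((5, d))
print(moe_forward(tokens).shape)                      # (5, 16)
```

With top-2 routing over 8 experts, only a quarter of the expert parameters are touched per token, which is the same principle behind activating roughly 45.9 billion of 456 billion parameters.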

The system can handle inputs of up to one million tokens, roughly eight times the context length of some earlier models. To optimize the attention mechanism, the researchers introduced a so-called "lightning attention" mechanism, which is more efficient than the traditional softmax approach. In MiniMax-M1, the classic softmax method is retained in every seventh transformer block, while the new, linear attention-based method is used in the remaining blocks. This hybrid structure allows large inputs to be processed while keeping computational requirements at an acceptable level.
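The contrast between the two attention types, and the interleaving pattern, can be sketched roughly as follows. The kernel feature map, the helper names, and the exact layout are assumptions made for illustration, not the published lightning attention implementation.

```python
import numpy as np

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernel-based linear attention: no n x n score matrix is formed."""
    kv = phi(k).T @ v                        # (d, d), independent of n
    z = phi(q) @ phi(k).sum(axis=0)          # (n,) normalizer
    return (phi(q) @ kv) / z[:, None]        # cost grows linearly with n

def block_layout(n_blocks, softmax_every=7):
    """Softmax attention in every seventh block, linear attention elsewhere."""
    return ["softmax" if (i + 1) % softmax_every == 0 else "linear"
            for i in range(n_blocks)]

print(block_layout(14))
# ['linear', 'linear', 'linear', 'linear', 'linear', 'linear', 'softmax',
#  'linear', 'linear', 'linear', 'linear', 'linear', 'linear', 'softmax']
```

The occasional softmax blocks preserve the expressiveness of full attention, while the linear blocks keep the overall cost from growing quadratically with the million-token input.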

A new reinforcement learning algorithm called CISPO was also developed to train the model. Rather than clipping the updates of individual generated tokens, it clips the so-called importance sampling weights, which results in a more stable learning process. Training took three weeks on 512 H800 GPUs, at a rental cost of approximately $534,000.
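The core idea behind this kind of clipping can be sketched as follows; the clipping bounds are hypothetical placeholders, and a real implementation would run in an autodiff framework with a stop-gradient applied to the clipped weight.

```python
import numpy as np

def cispo_style_loss(logp_new, logp_old, advantages,
                     eps_low=0.1, eps_high=2.0):
    """Clip the importance sampling weight, not the per-token update."""
    ratio = np.exp(logp_new - logp_old)        # IS weight per token
    # The weight itself is clipped (and would be detached from the
    # gradient); every token's log-probability still contributes to the
    # objective, unlike PPO-style clipping, which effectively drops
    # clipped tokens from the update.
    w = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return -(w * advantages * logp_new).mean()

# Toy example with three generated tokens.
logp_new = np.array([-1.2, -0.7, -2.1])
logp_old = np.array([-1.0, -0.9, -2.0])
adv = np.array([0.5, 0.5, -0.3])
print(cispo_style_loss(logp_new, logp_old, adv))
```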

The model's performance was also evaluated on a range of benchmarks. Based on the results, MiniMax-M1 performed particularly well in software engineering and in tasks requiring long text contexts, and it also showed strong results in so-called "agentic" tool use. Although it was surpassed by some newer models in mathematics and coding competitions, it outperformed several widely used systems when working with long texts.

MiniMax-M1 is therefore not just another large model in the history of artificial intelligence development, but an initiative that combines practical considerations with openness to research. Although the technology is still evolving, this development points toward scalable and transparent systems capable of deep reasoning over long contexts.
