MiniMax-M1: an AI model aimed at handling very long texts

As artificial intelligence systems develop, demand is growing for models that can not only interpret language but also carry out complex, multi-step reasoning. Such models can be crucial not only in theoretical tasks but also in practical areas such as software development and real-time decision-making. These applications, however, are particularly sensitive to computational costs, which are often difficult to control with traditional approaches.

The computational load of today's widely used transformer-based models grows rapidly with input length, because the softmax attention mechanism scales quadratically with the number of tokens. Working with longer texts therefore drives up resource requirements dramatically, which is simply unsustainable in many applications. Several research directions have attacked this problem, such as sparse and linear attention mechanisms and recurrent architectures, but these approaches have typically not proven stable or scalable enough at the level of the largest systems.
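To make the scaling problem concrete, here is a minimal NumPy sketch of naive softmax attention (illustrative only, not any production implementation). The n × n score matrix is what makes both memory and compute grow quadratically with input length n:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Naive single-head softmax attention for n tokens of width d.
    The (n, n) score matrix is the quadratic bottleneck: doubling the
    input length quadruples both its size and the work to fill it."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # shape (n, n)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # shape (n, d)

n, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)  # at n = 1_000_000 the score matrix alone
                                  # would hold 10^12 floats
```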

In this challenging environment, the MiniMax AI research group presented its new model, MiniMax-M1, which aims for both computational efficiency and practical applicability to real-world problems. An important feature of the model is that it is open-source: it is not reserved for corporate use but is also available for research. MiniMax-M1 is built on a mixture-of-experts (MoE) architecture and handles long text contexts through a hybrid attention system. It comprises 456 billion parameters in total, of which approximately 45.9 billion are activated per token.
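The gap between total and active parameters comes from expert routing. The toy sketch below (hypothetical names and shapes, not MiniMax's actual router) shows the general MoE mechanism: a router scores all experts for a token but only the top-k actually run, so only a fraction of the total parameter count is exercised per token:

```python
import numpy as np

def moe_forward(x, expert_weights, router_w, k=2):
    """Toy mixture-of-experts layer: score every expert, run only the
    top-k, and mix their outputs with softmax gates. Parameters of the
    unselected experts stay untouched for this token."""
    logits = x @ router_w                           # (num_experts,)
    top = np.argsort(logits)[-k:]                   # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over selected experts
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
expert_weights = [rng.standard_normal((d, d)) for _ in range(num_experts)]
router_w = rng.standard_normal((d, num_experts))
y = moe_forward(rng.standard_normal(d), expert_weights, router_w, k=2)
```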

The system can handle inputs of up to one million tokens, roughly eight times the 128K-token context window of comparable contemporary models. To tame the attention cost, the researchers introduced a so-called “lightning attention” mechanism, which is more efficient than the traditional softmax approach. In MiniMax-M1 the two are interleaved: after every seven blocks using the new linear “lightning” attention, one transformer block retains classic softmax attention. This hybrid structure allows very long inputs to be processed while keeping computational requirements at an acceptable level.
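The sketch below shows the generic idea behind both ingredients; it is a stand-in, not the paper's implementation (lightning attention's exact kernel and I/O-aware tiling are not reproduced here, and the schedule function is hypothetical). Kernelized linear attention avoids the n × n matrix entirely, and the schedule interleaves it with softmax blocks:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Generic kernelized linear attention: because phi(K).T @ V is a
    small (d, d) summary, cost grows linearly with sequence length n
    instead of quadratically."""
    kv = phi(K).T @ V                     # (d, d), independent of n
    norm = phi(Q) @ phi(K).sum(axis=0)    # (n,) normalizer
    return (phi(Q) @ kv) / norm[:, None]  # (n, d)

def layer_schedule(num_layers, period=8):
    """Hypothetical hybrid stack: seven linear-attention ('lightning')
    blocks, then one classic softmax block, repeated."""
    return ["softmax" if (i + 1) % period == 0 else "lightning"
            for i in range(num_layers)]

print(layer_schedule(16))  # softmax appears at blocks 8 and 16
```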

A new reinforcement learning algorithm called CISPO was also developed to train the model. Instead of clipping the updates of individual generated tokens, as PPO-style methods do, CISPO clips the importance-sampling weights, which results in a more stable learning process. Training ran for three weeks on 512 H800 GPUs, at a rental cost of approximately $534,000.
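A hedged sketch of the idea follows (the epsilon values and function names are illustrative, not the paper's exact hyperparameters): the importance-sampling ratio is clamped and detached from the gradient, so every generated token still pushes the policy through its log-probability, whereas PPO-style clipping silently drops the update for clipped tokens:

```python
import torch

def cispo_style_loss(logp_new, logp_old, advantages,
                     eps_low=1.0, eps_high=0.3):
    """CISPO-style objective sketch. `ratio` is the per-token
    importance-sampling weight pi_new / pi_old; it is clamped and
    detached, so the gradient flows through logp_new for *all* tokens,
    rather than being zeroed out wherever the ratio is clipped."""
    ratio = torch.exp(logp_new - logp_old)  # importance-sampling weights
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    return -(clipped * advantages * logp_new).mean()

# Usage with dummy per-token tensors:
logp_new = torch.randn(32, requires_grad=True)
logp_old = torch.randn(32)
advantages = torch.randn(32)
loss = cispo_style_loss(logp_new, logp_old, advantages)
loss.backward()  # gradients reach every token's log-probability
```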

The model's performance was evaluated across a range of benchmarks. MiniMax-M1 performed particularly well in software engineering and long-context tasks, and also showed strong results in agentic tool use. Although some newer models surpassed it in mathematics and coding competitions, it outperformed several widely used systems when working with long texts.

MiniMax-M1 is therefore not just another large model in the history of artificial intelligence development, but an initiative that combines practical considerations with openness to research. Although the technology is still evolving, this development shows promise for the scalable and transparent implementation of systems capable of deep thinking in long contexts. 
