MiniMax-M1: an AI model built for handling long texts

As artificial intelligence systems develop, there is growing demand for models that can not only interpret language but also carry out complex, multi-step reasoning. Such models can be crucial not only in theoretical tasks but also in practical areas such as software development and real-time decision-making. These applications, however, are particularly sensitive to computational cost, which is often difficult to control with traditional approaches.

The computational load of today's widely used transformer-based models grows rapidly with input length, because the softmax attention mechanism scales quadratically with the number of tokens: doubling the input roughly quadruples the cost of attention. Working with longer texts therefore drives resource requirements up dramatically, which is simply unsustainable in many applications. Several research directions have tried to address this problem, such as sparse and linear attention mechanisms and recurrent networks, but these approaches have typically not proven stable or scalable enough at the level of the largest systems.
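
To make the scaling argument concrete, the sketch below contrasts the two formulations. Standard softmax attention materializes an n-by-n score matrix, so its time and memory grow quadratically with sequence length; a linear-attention variant reorders the same multiplication so the cost grows only linearly in n. The feature map `phi` and the toy dimensions are illustrative assumptions, not MiniMax's actual implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds an (n, n) score matrix -> O(n^2) time/memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                   # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                        # (n, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: associativity avoids the (n, n) matrix,
    giving O(n * d^2) cost. phi is an illustrative positive feature map."""
    Qf, Kf = phi(Q), phi(K)                                   # (n, d)
    KV = Kf.T @ V                                             # (d, d) summary
    Z = Qf @ Kf.sum(axis=0)                                   # (n,) normalizer
    return (Qf @ KV) / Z[:, None]                             # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The key design point is that the (d, d) summary `KV` replaces the (n, n) score matrix, so growing the sequence length no longer grows the dominant intermediate.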

In this challenging environment, the MiniMax AI research group presented its new model, MiniMax-M1, which aims at both computational efficiency and practical applicability to real-world problems. An important feature of the model is that it is open-source: it is not designed exclusively for corporate use but is also available for research purposes. MiniMax-M1 is based on a mixture-of-experts (MoE) architecture and handles long text contexts through a hybrid attention system. It has a total of 456 billion parameters, of which approximately 45.9 billion are activated per token.
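
The gap between total and active parameters comes from expert routing: each token is processed by only a few experts chosen by a router, so most weights sit idle on any given forward pass. The toy layer below illustrates the mechanism; the expert count, top-k value, and dimensions are illustrative assumptions, not MiniMax-M1's actual configuration.

```python
import numpy as np

class MoELayer:
    """Toy mixture-of-experts layer with top-k routing over linear experts."""
    def __init__(self, d_model=16, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_experts)]
        self.top_k = top_k

    def __call__(self, x):                        # x: (d_model,) one token
        logits = x @ self.router                  # router score per expert
        idx = np.argsort(logits)[-self.top_k:]    # indices of top-k experts
        gate = np.exp(logits[idx] - logits[idx].max())
        gate /= gate.sum()                        # softmax over selected experts
        # Only top_k of n_experts weight matrices are touched per token,
        # which is why "active" parameters are far fewer than the total.
        return sum(g * (x @ self.experts[i]) for g, i in zip(gate, idx))

layer = MoELayer()
token = np.random.default_rng(1).standard_normal(16)
print(layer(token).shape)  # (16,), computed with 2 of the 8 experts
```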

The system can handle inputs up to one million tokens long, roughly eight times the context window of several earlier models. To optimize the attention mechanism, the researchers introduced a so-called “lightning attention” procedure, which is more efficient than the traditional softmax approach. In MiniMax-M1, one transformer block with classic softmax attention follows every seven blocks that use the new linear attention. This hybrid structure makes very long inputs tractable while keeping computational requirements at an acceptable level.
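
The hybrid layout itself is easy to express as a block schedule: one softmax attention block after every seven linear (“lightning”) attention blocks, matching the ratio described above. The helper below is only a sketch; the real model would instantiate actual attention modules rather than strings.

```python
def build_block_schedule(n_blocks: int, softmax_every: int = 8) -> list[str]:
    """Attention type per block: every softmax_every-th block uses classic
    softmax attention; the seven blocks in between use linear attention."""
    return ["softmax" if (i + 1) % softmax_every == 0 else "linear"
            for i in range(n_blocks)]

print(build_block_schedule(16))
# 7x 'linear', then 'softmax', repeated: blocks 8 and 16 use softmax
```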

A new reinforcement learning algorithm called CISPO was also developed to train the model. Instead of clipping the updates of individual generated tokens, as PPO-style methods do, it clips the importance sampling weights, which makes the learning process more stable. The training run took three weeks on 512 H800 GPUs, at a rental cost of approximately $534,000.
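
The distinction is easiest to see side by side. In a PPO-style objective, a token whose probability ratio leaves the clipping range can drop out of the update entirely; in the CISPO formulation, the clipped ratio acts as a constant weight (a stop-gradient), so every generated token keeps contributing a gradient through its log-probability. The NumPy sketch below uses illustrative epsilon values and is a conceptual sketch, not MiniMax's training code.

```python
import numpy as np

def ppo_token_objective(logp_new, logp_old, adv, eps=0.2):
    """PPO-style clipping: when the ratio leaves [1-eps, 1+eps] on the
    clipped side, that token's gradient contribution vanishes."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)

def cispo_token_objective(logp_new, logp_old, adv, eps_low=0.2, eps_high=0.2):
    """CISPO-style objective: clip the importance sampling *weight* and treat
    it as a constant (in an autograd framework, a stop-gradient), so the
    gradient flows through logp_new for every token. Epsilons illustrative."""
    ratio = np.exp(logp_new - logp_old)
    w = np.clip(ratio, 1 - eps_low, 1 + eps_high)  # constant weight, not masked out
    return w * adv * logp_new

rng = np.random.default_rng(0)
logp_new = rng.normal(-1.0, 0.3, size=8)   # per-token log-probs, new policy
logp_old = rng.normal(-1.0, 0.3, size=8)   # per-token log-probs, old policy
adv = rng.normal(0.0, 1.0, size=8)         # per-token advantages
print(ppo_token_objective(logp_new, logp_old, adv).mean())
print(cispo_token_objective(logp_new, logp_old, adv).mean())
```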

The model's performance was evaluated on a range of benchmarks. MiniMax-M1 performed particularly well in software engineering tasks and in tasks requiring long contexts, and it also produced strong results in so-called “agentic” tool use. Although some newer models surpassed it in mathematics and coding competitions, it outperformed several widely used systems when working with long texts.

MiniMax-M1 is therefore not just another large model in the history of artificial intelligence development, but an initiative that combines practical considerations with openness to research. Although the technology is still evolving, it is a promising step toward scalable and transparent systems capable of deep reasoning over long contexts.
