Strong Turbulence Around Meta Llama Models

Less than a week after its market debut, Llama 4 has already drawn harsh criticism from users. As mentioned before, one of Llama 4's new features is its mixture-of-experts architecture, in which the model is built from many expert modules but only a few of them are activated for each token. This design gives the model a much larger total parameter count than the number of parameters it actually uses at run time, so in theory it should perform considerably better. However, several independent user tests show that it falls short of expectations, especially on mathematical and coding tasks. Some users claim that Meta heavily manipulated benchmarks to achieve better scores, while others believe that an internal version of the model was submitted to the benchmarks while a more modest version was released to the public.
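To make the "larger total than active parameter set" point concrete, here is a minimal toy sketch of a mixture-of-experts layer in PyTorch. It is not Meta's implementation, just an illustration of the routing idea: many expert MLPs exist, but each token only passes through a few of them, so the parameters touched per token are a small fraction of the total.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: many expert MLPs exist in the layer,
    but each token is routed to only top_k of them, so the parameters
    actually used per token are a small fraction of the total."""

    def __init__(self, d_model=512, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top_k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
_ = layer(torch.randn(8, 512))                         # 8 tokens each flow through only 2 of 16 experts

total_params  = sum(p.numel() for p in layer.parameters())
active_params = sum(p.numel() for p in layer.router.parameters()) \
              + layer.top_k * sum(p.numel() for p in layer.experts[0].parameters())
print(f"total: {total_params:,}  used per token: {active_params:,}")
```

In this toy setup the layer holds roughly 33 million parameters but uses only about 4 million per token, which is the same trade-off, at a vastly larger scale, that lets Llama 4 advertise a big effective capacity with a smaller runtime cost.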

Another major new feature of Llama 4 is its 10-million-token context window, which is supposed to let the model handle much larger codebases. Critics point out that the model was not trained on sequences longer than roughly 256,000 tokens, so it is unclear how realistic the 10-million-token figure is. They argue that, at the promised input sizes, the quality of the model's outputs is highly questionable.
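To put the 10-million-token claim in perspective, here is a rough, hedged sketch of how one could measure a codebase against such a window. The tokenizer (tiktoken's cl100k_base) and the project path are stand-ins, since Llama 4's own tokenizer would give somewhat different counts.

```python
# Back-of-the-envelope sketch: how much of a 10M-token window would a codebase fill?
# cl100k_base is only an approximation of Llama 4's tokenizer; "my_project" is a placeholder.
from pathlib import Path
import tiktoken

CONTEXT_WINDOW = 10_000_000                     # the advertised 10M-token limit
enc = tiktoken.get_encoding("cl100k_base")

total_tokens = 0
for path in Path("my_project").rglob("*.py"):   # count only Python sources for simplicity
    text = path.read_text(errors="ignore")
    total_tokens += len(enc.encode(text, disallowed_special=()))

print(f"{total_tokens:,} tokens = {total_tokens / CONTEXT_WINDOW:.1%} of the window")
```

Most real projects come in far below 10 million tokens, which is exactly why critics focus less on whether such inputs fit and more on whether the model can reason reliably over them.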

For now, Meta attributes the errors to teething problems that are normal in an initial release phase. The company also claims that some of the issues stem from how users fine-tune the model, which is why results vary so much between tests. This explanation has not reassured the professional community; many experts have voiced concerns about Meta's lack of transparency in how it handles benchmarks.

Meanwhile, even as Llama 4 continues to face strong turbulence, two new models have been released that build on the previous Llama generation, and both aim to reduce computational demands. NVIDIA introduced Llama 3.1 Nemotron Ultra, a 253-billion-parameter model with advanced reasoning capabilities, designed specifically to support AI-assisted workflows.

The original Llama 3.1 model was modified with multi-phase post-training techniques so that it runs in far less memory and with lower computational requirements. NVIDIA claims that, with less than half as many parameters, its performance still beats that of DeepSeek R1.

The model is open source and available to everyone on Hugging Face. It can run on a single node with eight H100 GPUs, so it is mainly aimed at users with access to H100-class server hardware, which also means that its use in home setups remains limited.
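For readers who do have access to such a node, loading the checkpoint looks like any other Hugging Face causal language model. The sketch below is a minimal example using the transformers library; the exact model ID, the bfloat16 precision, and the memory estimate are assumptions to verify against the official model card.

```python
# Minimal sketch, assuming the checkpoint is published under an ID similar to the
# one below (check the model card for the exact name) and that an 8x H100 node
# (~640 GB of total GPU memory) is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"   # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~2 bytes per parameter, roughly 500 GB of weights
    device_map="auto",            # shard the weights across all visible GPUs
    trust_remote_code=True,       # the repo may ship custom modeling code
)

prompt = "Explain the trade-offs of pruning a large language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```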

The other development is Cogito v1, released by Deep Cogito and fine-tuned from Meta's Llama 3.2 model. The new model adds self-reflection capabilities to hybrid reasoning, allowing it to iteratively improve its own reasoning strategies. It is available in several sizes (3B, 8B, 14B, 32B, and 70B parameters) and has already delivered strong results on widely used benchmarks such as MMLU, ARC, and several tool-calling tasks. In some mathematical tests, however, Cogito still does not produce the expected results.
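The hybrid part of Cogito's reasoning is reportedly a runtime switch: the same weights can answer directly or first produce an extended, self-reflective reasoning trace. The sketch below shows how such a toggle could look with the transformers chat API; the model ID and the system-prompt trigger are assumptions based on the public release notes and should be confirmed on the model card.

```python
# Minimal sketch of toggling Cogito's "deep thinking" mode at inference time.
# Both the model ID and the exact system-prompt trigger are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v1-preview-llama-3B"   # assumed ID for the 3B variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def ask(question: str, deep_thinking: bool = False) -> str:
    """Run one chat turn, optionally enabling the self-reflective reasoning mode."""
    messages = []
    if deep_thinking:
        # Reportedly the reasoning mode is switched on via a dedicated system prompt.
        messages.append({"role": "system", "content": "Enable deep thinking subroutine."})
    messages.append({"role": "user", "content": question})

    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("What is the sum of the first 100 odd numbers?", deep_thinking=True))
```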
