Strong Turbulence Around Meta Llama Models

Less than a week after its market debut, Llama 4 has already drawn harsh criticism from users. As noted previously, one of Llama 4's headline features is its mixture-of-experts architecture, in which the model is built from separate expert modules. This design gives the model a much larger total parameter count than the number of parameters active at inference time, so in theory it should perform considerably better. However, several independent user tests show that it falls short of expectations, especially on mathematical and coding tasks. Some users claim that Meta heavily manipulated benchmarks to achieve better scores, while others believe that an internal version of the model was used for testing while a more modest version was released to the public.
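The modular design described above can be illustrated with a toy mixture-of-experts layer. All sizes below are made-up toy values, not Llama 4's actual configuration; the point is only to show why the total parameter count exceeds the parameters touched per token.

```python
# Illustrative sketch of mixture-of-experts (MoE) routing.
# Sizes are toy values chosen for readability, not Llama 4's real dimensions.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8      # total expert modules held in memory
TOP_K = 2          # experts actually consulted per token
D = 16             # hidden dimension (toy)

# Each expert is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))  # produces one score per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the token vector x to its top-k experts and mix their outputs."""
    scores = x @ router                       # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(D)
y = moe_forward(x)

total_params = N_EXPERTS * D * D   # everything that must sit in memory
active_params = TOP_K * D * D      # what one token actually exercises
print(f"total expert params: {total_params}")
print(f"active per token:    {active_params} ({active_params / total_params:.0%})")
```

Only a fraction of the weights participate in any single forward pass, which is exactly why an MoE model can claim a huge effective capacity while its per-token compute stays modest.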

Another major new feature of Llama 4 is its 10-million-token context window, which is supposed to let the model handle larger codebases. Critics point out that no training data exceeded 256,000 tokens in length, so it is unclear whether the 10-million-token figure is realistic; for inputs approaching the promised size, they argue, output quality is highly questionable.

For now, Meta attributes the errors to teething problems typical of an initial release. It also claims that some of the issues stem from user fine-tuning, which would explain why results vary so widely. This explanation has not reassured the professional community; many experts have voiced concerns about Meta's lack of transparency in how it handles benchmarks.

Meanwhile, even as Llama 4 weathers this turbulence, two new models have been released that build on the previous Llama generation, both aiming to reduce computational demands. NVIDIA introduced Llama 3.1 Nemotron Ultra, a 253-billion-parameter model with advanced reasoning capabilities, designed specifically to support AI-assisted workflows.

The original Llama 3.1 model was modified with multi-phase post-training techniques to run in far less memory and with lower compute requirements. NVIDIA claims that even with half as many parameters, it outperforms DeepSeek R1.

The model is open source and available to everyone on Hugging Face. It runs on a single node with 8x H100 GPUs, which makes it practical for organizations with access to server-grade hardware but still leaves home use out of reach for now.

The other development is the Cogito v1 model from Deep Cogito, fine-tuned from the Meta Llama 3.2 model. It is designed to add self-reflection capabilities to its hybrid reasoning, allowing it to iteratively improve its own reasoning strategies. Available in several sizes (3B, 8B, 14B, 32B, and 70B parameters), it has already posted strong results on various international benchmarks, including MMLU, ARC, and several tool-calling tasks. On some mathematical tests, however, Cogito still falls short of expectations.

Thinkless: Fighting the Growing Resource Demands of AI
In recent months, major tech companies have announced a series of reasoning features in their models. However, the immense resource requirements of these systems quickly became apparent, causing the prices of such subscription services to soar. Researchers at the National University of Singapore (NUS) have developed a new framework called "Thinkless", which could significantly transform how large language models (LLMs) handle reasoning tasks. This innovative approach, created by Gongfan Fang, Xinyin Ma, and Xinchao Wang at the NUS xML Lab, enables AI systems to dynamically choose between simple and complex reasoning strategies—potentially reducing computational costs by up to 90%. The framework addresses a critical inefficiency in current AI reasoning methods and represents a major step toward more resource-efficient AI.
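The core idea of routing each query to either a cheap or an expensive reasoning mode can be sketched in a few lines. In the actual Thinkless framework the model itself learns, via reinforcement learning, to emit a control signal selecting the mode; the hand-written difficulty heuristic and token budgets below are purely illustrative stand-ins for that learned decision.

```python
# Toy sketch of the Thinkless routing idea: easy queries get a short,
# cheap answer; hard ones get full chain-of-thought reasoning.
# The heuristic and the token budgets are illustrative assumptions,
# not values from the paper.
SHORT_COST = 50      # assumed token budget for a direct answer
LONG_COST = 2000     # assumed token budget for full chain-of-thought

def is_hard(query: str) -> bool:
    """Stand-in difficulty signal (the real router is learned, not keyword-based)."""
    return any(kw in query.lower() for kw in ("prove", "integral", "optimize"))

def routed_cost(queries: list[str]) -> int:
    """Total token cost when easy queries skip the long reasoning path."""
    return sum(LONG_COST if is_hard(q) else SHORT_COST for q in queries)

queries = [
    "What is the capital of France?",
    "Prove that sqrt(2) is irrational.",
    "Convert 3 km to miles.",
    "Optimize this SQL join order.",
]
always_long = LONG_COST * len(queries)   # baseline: reason deeply on everything
routed = routed_cost(queries)
print(f"always-long cost: {always_long} tokens")
print(f"routed cost:      {routed} tokens ({1 - routed / always_long:.0%} saved)")
```

Even in this crude form, the savings scale with the share of easy queries in the workload, which is the intuition behind the large cost reductions the NUS team reports.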
The EU’s Open Web Index Project: Another Step Toward Digital Independence
The Open Web Index (OWI) is an open-source initiative under the European Union’s Horizon Programme, aimed at democratizing web-search technologies and strengthening Europe’s digital sovereignty. The project will launch in June 2025, providing a common web index accessible to all and decoupling the indexing infrastructure from the search services that use it. In doing so, the OWI offers not only technical innovations but also a paradigm shift in the global search market—today, a single player (Google) holds over ninety percent of the market share and determines access to online information.
Android 16 launches with enhanced protection
The new Android 16 release offers the platform’s three billion users the most comprehensive device-level protection to date. It focuses on safeguarding high-risk individuals while also marking a significant advancement for all security-conscious users. The system’s cornerstone is the upgraded Advanced Protection Program, which now activates a full suite of device-level defense mechanisms rather than the previous account-level settings. As a result, journalists, public figures, and other users vulnerable to sophisticated cyber threats can enable the platform’s strongest security features with a single switch.
Gemini Advanced Strengthens GitHub Integration
There is no shortage of innovation in the world of AI-based development tools. Google has now announced direct GitHub integration for its premium AI assistant, Gemini Advanced. This move is not only a response to similar developments by its competitor OpenAI, but also a significant step forward in improving developer workflows.