Strong Turbulence Around Meta Llama Models

Less than a week after its market debut, Llama 4 has already drawn harsh criticism from users. As mentioned before, one of Llama 4's new features is its mixture-of-experts architecture, built from many smaller expert modules. This design gives the model a total parameter count far larger than the number of parameters actually active at run time, so in theory it should perform much better. However, several independent user tests show that it falls short of expectations, especially on mathematical and coding tasks. Some users claim that Meta heavily manipulated benchmarks to achieve better scores, while others believe that an internal version of the model was used for testing while a more modest version was released to the public.
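
The distinction between total and active parameters is easiest to see in a minimal mixture-of-experts layer: a router picks a small subset of expert networks for each token, so only a fraction of the stored weights participate in any single forward pass. The sketch below is a generic PyTorch illustration of that idea, not Meta's actual implementation; all names and sizes are invented.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: many experts stored, few used per token."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)       # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TinyMoELayer()
total = sum(p.numel() for p in layer.parameters())
# Only roughly top_k / n_experts of the expert weights are active for any one token.
print(f"total parameters: {total}")
```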

Another major new feature of Llama 4 is its 10-million-token context window, which is supposed to let the model handle much larger codebases. Critics point out that no training examples longer than 256,000 tokens were used, so it is unclear whether the 10-million-token figure is realistic in practice. They argue that, at the promised input sizes, the quality of the model's outputs is highly questionable.

For now, Meta attributes the errors to the normal teething problems of an initial release. It also claims that some of the issues stem from user fine-tuning, which is why results vary so widely. This explanation has not reassured the professional community; many experts have voiced concerns about Meta's lack of transparency in how it manages benchmarks.

Meanwhile, even as Llama 4 weathers this turbulence, two new models built on the previous Llama generation have been released, both aimed at reducing computational demands. NVIDIA introduced Llama 3.1 Nemotron Ultra, a 253-billion-parameter model with advanced reasoning capabilities, designed specifically to support AI-assisted workflows.

The original Llama 3.1 model was modified with multi-phase post-training techniques to run in far less memory and with reduced computational requirements. NVIDIA claims that, despite having less than half as many parameters, it outperforms DeepSeek R1.

The model is open source and available to everyone on Hugging Face. It can run on a single node with eight H100 GPUs, which makes it practical for organizations with access to such server hardware, but still leaves home use largely out of reach.
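
For readers with that kind of hardware, a minimal sketch of loading the checkpoint from Hugging Face with the transformers library might look like the following; the repository ID and generation settings are assumptions based on NVIDIA's public release, not a tested recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository ID for the open NVIDIA release.
MODEL_ID = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 253B weights across the node's GPUs
)

messages = [{"role": "user", "content": "Summarize the key idea of mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```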

The other development is the Cogito v1 model family released by Deep Cogito, fine-tuned from Meta's Llama 3.2 model. It adds self-reflection capabilities to hybrid reasoning, allowing the model to iteratively improve its own reasoning strategies. It is available in several sizes (3B, 8B, 14B, 32B, and 70B parameters) and has already posted strong results on benchmarks such as MMLU and ARC, as well as several tool-calling tasks. However, on some mathematical tests, Cogito still falls short of expectations.
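
Deep Cogito describes switching the extended-reasoning mode on and off through a system prompt. The snippet below sketches how that might look with the smaller 8B checkpoint on Hugging Face; the repository ID and the exact system-prompt wording are assumptions drawn from the public model card and may differ from the released version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID for the 8B variant of Cogito v1.
MODEL_ID = "deepcogito/cogito-v1-preview-llama-8B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

messages = [
    # Assumed wording: a system prompt is used to enable the extended "thinking" mode.
    {"role": "system", "content": "Enable deep thinking subroutine."},
    {"role": "user", "content": "How many prime numbers are there below 50?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```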

Google Introduces the Agent2Agent (A2A) Open Source Protocol
In a recent speech, NVIDIA CEO Jensen Huang divided the evolution of artificial intelligence into several phases and called the current one the era of agentic AI. Although he focused mainly on the next phase, the era of physical AI, it is worth remembering that the agentic AI era itself only began this year, so its fully developed form has yet to be seen. Google's recent announcement of the open-source Agent2Agent (A2A) protocol gives a hint of what that more mature form might look like. The protocol is designed to bridge the gap between AI agents created on different platforms and frameworks and by different vendors, enabling smooth communication and collaboration.
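
At its core, A2A lets one agent discover another through a published "Agent Card" and then exchange tasks over plain HTTP using JSON-RPC. The sketch below shows what discovery plus a task request might look like from the client side; the host is hypothetical, and the endpoint path, method name, and payload fields follow Google's published draft but should be treated as assumptions here, since the specification is still evolving.

```python
import requests

REMOTE_AGENT = "https://agents.example.com"  # hypothetical host running an A2A server

# 1. Discover the remote agent's capabilities via its Agent Card
#    (the draft spec serves it from a well-known path).
card = requests.get(f"{REMOTE_AGENT}/.well-known/agent.json").json()
print(card.get("name"), card.get("skills"))

# 2. Submit a task as a JSON-RPC request; "tasks/send" is the method name
#    used in the draft protocol for handing work to another agent.
task_request = {
    "jsonrpc": "2.0",
    "id": "1",
    "method": "tasks/send",
    "params": {
        "id": "task-001",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Find three candidate venues for a 50-person workshop."}],
        },
    },
}
response = requests.post(REMOTE_AGENT, json=task_request).json()
print(response)
```
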
Apple in Trouble with Artificial Intelligence Developments?
With Trump's tariffs, Apple appears to be facing mounting problems. Beyond the tariffs themselves, which have hit Apple's shares hard, there are internal conflicts, especially in the division responsible for AI integration. Tripp Mickle, a journalist for The New York Times, reports that Apple has not managed to produce any notable innovations lately. This may not be entirely fair, since the company did, after much debate, finally launch Apple Intelligence, but there is no doubt that it is lagging behind its competitors in artificial intelligence.
New Collaboration Between Netflix and OpenAI
Netflix recently began testing a new AI-based search feature that uses OpenAI's technology to improve content discovery. It is a significant departure from traditional search because it lets users find movies and TV shows by describing, for example, their mood or preferences, rather than relying only on titles, genres, or actor names.
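
Conceptually, this kind of search works by embedding both the catalog and the free-form query in the same vector space and ranking by similarity rather than keyword overlap. The sketch below illustrates that idea with OpenAI's embeddings API; the model name and the tiny in-memory catalog are placeholders, and this is not Netflix's actual implementation.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

catalog = {
    "Chef's Table": "calm, beautifully shot documentary about food and craft",
    "The Haunting of Hill House": "slow-burn horror drama about family and grief",
    "Brooklyn Nine-Nine": "light, fast-paced comedy set in a police precinct",
}

def embed(text):
    # "text-embedding-3-small" is a current OpenAI embedding model; swap as needed.
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

query = "something cozy and uplifting for a rainy evening"
query_vec = embed(query)
ranked = sorted(catalog, key=lambda title: cosine(query_vec, embed(catalog[title])), reverse=True)
print(ranked[0])
```
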
Google Geospatial Reasoning: A New AI Tool for Solving Geospatial Problems
Geospatial information science is one of today’s most dynamic fields. It deals with collecting, analyzing, and visualizing location-based data. This discipline combines geosciences with information technology to address practical needs such as urban planning, infrastructure development, natural disaster management, and public health. Although technology like GPS navigation and Google Maps has long been available, the recent explosion of data and the growing demand for real-time decision-making have created a need for new solutions. This is where artificial intelligence comes in—especially with Google’s Geospatial Reasoning framework.