Strong Turbulence Around Meta Llama Models

Less than a week after its market debut, Llama 4 has already drawn harsh criticism from users. As mentioned before, one of Llama 4's new features is its mixture-of-experts architecture, in which the model is built from specialized expert modules. This design gives the model a far larger total parameter count than the number of parameters active at inference time, so in theory it should outperform a dense model of comparable running cost. However, several independent user tests show that it falls short of expectations, especially on mathematical and coding tasks. Some users claim that Meta heavily manipulated benchmarks to achieve better scores, while others believe that an internal version of the model was benchmarked while a more modest version was released to the public.
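To make the "larger total parameter count" idea concrete, here is a minimal, generic sketch of a mixture-of-experts layer. It illustrates the general technique only, not Meta's implementation; all names and sizes in it are invented. A small router scores each token and dispatches it to a single expert, so only that expert's weights participate in the forward pass even though the layer stores many experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    """Toy MoE layer: a router picks one expert per token, so only a
    fraction of the stored parameters is active for any given input."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model). Route each token to its best-scoring expert.
        scores = F.softmax(self.router(x), dim=-1)   # (n_tokens, n_experts)
        weight, idx = scores.max(dim=-1)             # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

layer = MixtureOfExperts(d_model=64, d_ff=256, n_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

With eight experts and top-1 routing, the layer stores roughly eight times the parameters it actually uses per token, which is exactly the trade-off the criticism centers on: capacity on paper versus quality in practice.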

Another major new feature of Llama 4 is its 10-million-token context window, which is supposed to let the model take in entire large codebases at once. Critics point out that the model was reportedly never trained on sequences longer than 256,000 tokens, so it is unclear whether the 10-million-token figure is meaningful in practice. They argue that, at the promised input sizes, the quality of the model's outputs is highly questionable.

For now, Meta attributes the errors to teething problems that it says are normal in the initial phase of a release. The company also claims that some of the issues stem from how users fine-tune and deploy the model, which is why results vary so widely between tests. This explanation has not reassured the professional community; many experts have voiced concerns about Meta's lack of transparency in how it handles benchmarks.

Meanwhile, even as Llama 4 weathers this turbulence, two new models have been released that build on the previous Llama generation, both aiming to reduce computational demands. NVIDIA introduced Llama 3.1 Nemotron Ultra, a 253-billion-parameter model with advanced reasoning capabilities, designed specifically to support AI-assisted workflows.

The original Llama 3.1 model was reworked with multi-phase post-training and architecture optimization to fit into far less memory and to reduce its computational requirements. NVIDIA claims that, with fewer than half the parameters of DeepSeek R1 (253 billion versus R1's 671 billion), its performance is better.

The model is open and available to everyone on Hugging Face, and it can run on a single 8x H100 GPU node. That makes it practical for teams with access to data-center hardware, but its use in home setups remains out of reach.
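For those with the hardware, loading the model follows the standard Hugging Face Transformers pattern. The sketch below is a minimal example, not an official recipe; the repository id, the `trust_remote_code` requirement, and the memory estimate in the comments are assumptions based on the model card at the time of writing and should be verified.

```python
# Minimal sketch of loading Nemotron Ultra with Transformers.
# Assumes the repository id below and an 8x H100 (or comparable) node;
# bfloat16 weights for a 253B model alone occupy roughly 500 GB.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard layers across all visible GPUs
    trust_remote_code=True,  # the NAS-derived architecture may need custom code
)

messages = [{"role": "user", "content": "Summarize the Llama 4 controversy in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```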

The other development is the Cogito v1 model released by Deep Cogito, fine-tuned from the Meta Llama 3.2 model. The model combines hybrid reasoning with self-reflection, allowing it to iteratively improve its own reasoning strategies. It is available in several sizes (3B, 8B, 14B, 32B, and 70B parameters) and has already posted strong results on standard benchmarks such as MMLU and ARC, as well as on several tool-calling tasks. In some mathematical tests, however, Cogito still falls short of expectations.
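Hybrid reasoning means the same checkpoint can answer directly or think step by step before answering. The sketch below shows how such a toggle might be driven from code; the repository id and the system-prompt switch are assumptions drawn from the model card as reported at release, so check the current documentation before relying on them.

```python
# Minimal sketch of toggling Cogito's hybrid reasoning mode.
# The repo id and the "deep thinking" system prompt are assumed, not verified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v1-preview-llama-8B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def ask(question: str, deep_thinking: bool) -> str:
    messages = []
    if deep_thinking:
        # Reported switch for extended self-reflection before answering.
        messages.append({"role": "system", "content": "Enable deep thinking subroutine."})
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("What is 17 * 24?", deep_thinking=True))   # reasons step by step first
print(ask("What is 17 * 24?", deep_thinking=False))  # answers directly
```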
