Strong Turbulence Around Meta Llama Models

Less than a week after its market debut, Llama 4 has already drawn harsh criticism from users. As mentioned before, one of Llama 4's key new features is its mixture-of-experts architecture, in which the model is built from many specialized modules. This design gives the model a much larger total parameter count than it actually activates at run time, so in theory it should deliver much better results for the same amount of compute. However, several independent user tests show that it falls short of expectations, especially in mathematics and coding. Some users claim that Meta heavily manipulated benchmarks to achieve better scores, while others believe that an internal version of the model was tested while a more modest version was released to the public.
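
The idea behind a mixture-of-experts layer can be illustrated with a toy sketch: a router scores every expert for each token, but only the few top-scoring experts actually run. The layer sizes, the single-matrix experts, and the top-2 routing below are illustrative assumptions, not Llama 4's actual configuration.

```python
import math
import random

# Toy mixture-of-experts (MoE) layer, for illustration only.
# D_MODEL, NUM_EXPERTS and TOP_K are hypothetical values, far smaller
# than any real Llama 4 configuration.
D_MODEL = 4
NUM_EXPERTS = 8
TOP_K = 2

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Each expert is a single D_MODEL x D_MODEL linear map (a simplification).
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)]
# The router maps a token vector to one score per expert.
router = rand_matrix(NUM_EXPERTS, D_MODEL)

def moe_forward(x):
    """Route one token through its TOP_K highest-scoring experts."""
    scores = matvec(router, x)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores only (numerically stabilized).
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    out = [0.0] * D_MODEL
    for i in top:
        y = matvec(experts[i], x)
        gate = exps[i] / total
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

# Parameter bookkeeping: all experts exist, but only TOP_K run per token.
expert_params = D_MODEL * D_MODEL
router_params = NUM_EXPERTS * D_MODEL
total_params = NUM_EXPERTS * expert_params + router_params
active_params = TOP_K * expert_params + router_params

y, used = moe_forward([0.5, -1.0, 0.25, 2.0])
print("experts used:", used)
print("total params:", total_params, "active per token:", active_params)
```

In this toy setup, each token touches only 64 of the layer's 160 parameters; real MoE models apply the same principle at a far larger scale.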

Another major new feature of Llama 4 is the 10-million-token context window of its Scout variant, which is supposed to let the model handle much larger codebases. Critics point out that the model was never trained on sequences longer than 256,000 tokens, so it is unclear whether the 10-million-token figure is realistic. They argue that at the promised input sizes, the quality of the model's outputs is highly questionable.

For now, Meta attributes the errors to problems that are normal in the initial phase of a release. The company also claims that some of these issues stem from user fine-tuning, which is why results vary so widely. This explanation has not reassured the professional community; many experts have voiced concerns about Meta's lack of transparency in how it handles benchmarks.

Meanwhile, even as Llama 4 continues to face strong turbulence, two new models have been released that build on earlier Llama versions. Both aim to reduce computational demands. NVIDIA introduced Llama 3.1 Nemotron Ultra, a 253-billion-parameter model with advanced reasoning capabilities that is designed specifically to support AI-assisted workflows.

NVIDIA reworked the original Llama 3.1 405B model with a multi-phase post-training process so that it runs on much less memory and has lower computational requirements. NVIDIA claims that even with fewer than half as many parameters as DeepSeek R1, the model outperforms it.

The model is open source and freely available on Hugging Face. It fits on a single node of eight H100 GPUs, which makes it a practical choice for anyone with access to H100 servers, but it remains largely out of reach for home use.
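
A rough calculation shows why a single 8x H100 node is enough. The sketch below assumes weights stored in BF16 (2 bytes per parameter) and the 80 GB H100 variant, and it ignores activations and the KV cache, so real memory needs will be higher. It also runs the same estimate for the 405B-parameter Llama 3.1 model that Nemotron Ultra is derived from.

```python
# Back-of-the-envelope check: do the weights of a 253B-parameter model fit
# on one node of eight H100 GPUs? Assumes BF16 weights (2 bytes/parameter)
# and the 80 GB H100 variant; activations and KV cache are ignored.
PARAMS = 253e9
BYTES_PER_PARAM = 2      # BF16 (assumption)
GPUS_PER_NODE = 8
GB_PER_GPU = 80          # H100 80 GB variant (assumption)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
node_gb = GPUS_PER_NODE * GB_PER_GPU
headroom_gb = node_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, node memory: {node_gb} GB, "
      f"headroom: {headroom_gb:.0f} GB")

# Same estimate for the 405B-parameter Llama 3.1 model, for comparison.
base_gb = 405e9 * BYTES_PER_PARAM / 1e9
print(f"Llama 3.1 405B weights: {base_gb:.0f} GB, fits: {base_gb < node_gb}")
```

Under these assumptions, the 253B model's weights take about 506 GB and fit within the node's 640 GB, while the 405B base model (about 810 GB) would not.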

The other development is Cogito v1, released by Deep Cogito and fine-tuned from Meta's Llama 3.2 model. The model is designed to add self-reflection to its hybrid reasoning, allowing it to iteratively refine its own reasoning strategies. It is available in several sizes (3B, 8B, 14B, 32B, and 70B parameters) and has already posted outstanding results on widely used benchmarks such as MMLU and ARC, as well as on several tool-calling tasks. However, on some mathematical tests, Cogito still falls short of expectations.
