Several Important New Features in Llama 4

Meta’s latest family of artificial intelligence models, Llama 4, brings significant innovations to multimodal model development. In addition to the two models currently available—Llama 4 Scout and Llama 4 Maverick—a far more powerful model, Llama 4 Behemoth, is still in training. Behemoth is expected to play a significant role in STEM-related tasks (science, technology, engineering, and mathematics) in the future.

Recently, many multimodal models have been introduced. These models can process and integrate different types of data—such as text, images, sound, and video—at the same time. This allows them to understand questions in a richer context and solve much more complex problems than previous text-only models. However, this strength can also be a drawback, as such models usually need far more resources than traditional single-modal systems. Meta addresses this issue with the Mixture of Experts (MoE) architecture used in the Llama 4 family. The MoE approach activates only a portion of the model for a given input, which improves efficiency and greatly reduces computational costs. This method is not unique to Llama 4; many large companies are following a similar trend. Nevertheless, Llama 4’s open-source strategy clearly sets it apart from its competitors.
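The core idea behind MoE routing can be shown in a few lines. The sketch below is purely illustrative and is not Meta’s actual Llama 4 implementation: a small router scores every expert for each token, only the top-k experts run, and the rest of the parameters stay inactive. All sizes and names (`router_w`, `moe_layer`, the tiny two-layer experts) are made up for the example.

```python
# Minimal Mixture-of-Experts routing sketch (illustrative only, not
# Meta's Llama 4 code). Only TOP_K of N_EXPERTS experts run per token.
import numpy as np

rng = np.random.default_rng(0)
D, H, N_EXPERTS, TOP_K = 8, 16, 4, 1  # toy dimensions

router_w = rng.standard_normal((D, N_EXPERTS))
experts = [(rng.standard_normal((D, H)), rng.standard_normal((H, D)))
           for _ in range(N_EXPERTS)]

def moe_layer(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    scores = x @ router_w                    # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w_in, w_out = experts[i]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)  # tiny 2-layer expert
    return out

token = rng.standard_normal(D)
print(moe_layer(token).shape)  # -> (8,)
```

Because only `TOP_K` experts execute, the per-token compute stays roughly constant even as `N_EXPERTS` (and thus total parameter count) grows — which is exactly the efficiency argument made above.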

As mentioned earlier, only the two smaller models in the family—Scout and Maverick—are currently available. Both have 17 billion active parameters, meaning that only 17 billion parameters are engaged when processing any given input. Each model, however, contains far more parameters in total: Scout has 109 billion, while Maverick has 400 billion. The difference comes from the MoE architecture: for each input, the models activate only specific submodules (which Meta calls “experts”). Scout uses 16 experts and Maverick 128. Although Scout is the smaller of the two, it has a distinctive feature: a context window of up to 10 million tokens, which makes it well suited to analyzing long texts, documents, or large codebases. Maverick does not offer such an extensive context window, but several benchmarks show that it outperforms competitors such as GPT-4o and Gemini 2.0 Flash on reasoning and coding tasks, while using fewer than half as many active parameters as DeepSeek V3.
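The gap between active and total parameters is simple arithmetic once the expert structure is known. The snippet below is back-of-the-envelope illustration only: the `shared` and `per_expert` figures are hypothetical numbers chosen so the result echoes Scout’s published shape (16 experts, ~17B active, ~109B total); Meta has not published this exact breakdown.

```python
# Back-of-the-envelope active-vs-total parameter arithmetic for an MoE
# model. The split between shared and per-expert parameters below is an
# assumption for illustration, not Meta's published breakdown.

def moe_params(shared, per_expert, n_experts, active_experts):
    """Total parameters grow with n_experts; active count stays fixed."""
    total = shared + per_expert * n_experts
    active = shared + per_expert * active_experts
    return total, active

shared = 11e9          # hypothetical: parameters every token passes through
per_expert = 6.125e9   # hypothetical: parameters held by each expert
total, active = moe_params(shared, per_expert, n_experts=16, active_experts=1)
print(f"total = {total/1e9:.0f}B, active = {active/1e9:.0f}B")
# -> total = 109B, active = 17B
```

Adding experts inflates `total` without touching `active`, which is how Maverick reaches 400 billion total parameters with 128 experts while keeping the same 17 billion active as Scout.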

Although Behemoth is still in development, Meta claims it will outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-related tasks. Unlike its two smaller siblings, Behemoth will have 288 billion active parameters, and with 16 experts it will total nearly two trillion parameters. Behemoth is also notable because Meta plans to use it to train the smaller models, and it could eventually be integrated into Meta services such as Messenger, Instagram Direct, and WhatsApp.
