Phi-4 model family expanded with two new models

Microsoft recently announced the new generation of the Phi-4 model family, which now includes two distinct yet complementary models: Phi-4 multimodal and Phi-4 mini. These models not only further improve computing performance but also combine different data types in innovative ways to support a wide range of AI applications, all in a compact design with optimized resource usage.

The Phi-4 multimodal model

The Phi-4 multimodal model, built on an architecture with 5.6 billion parameters, is an excellent solution for processing speech, image, and text data simultaneously. Unlike traditional systems, where a separate model handles each data type, this model uses a mixture-of-LoRAs technique to represent the various data types together. As a result, it processes information faster and requires fewer computing resources, which is especially important for edge computing and real-time applications.

Its optimized design uses 40% less RAM, and with a context window that can handle up to 128,000 tokens, it easily manages long and complex content. The model also supports up to 85 languages, making it suitable for global applications. In several key benchmarks, the Phi-4 multimodal model achieved outstanding results. For example, it set a new record in speech recognition on the HuggingFace OpenASR leaderboard with a 6.14% word error rate, reached 89.2% accuracy in document analysis on the DocVQA test, and achieved a 78.5% success rate on scientific questions—performance levels comparable to the latest generation models. Furthermore, for tasks that require processing different types of data together (such as understanding a diagram alongside its spoken instructions), the model delivered results that were 35% more accurate than competing solutions.
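The mixture-of-LoRAs idea mentioned above can be illustrated with a minimal NumPy sketch. This is not Microsoft's implementation; the dimensions, modality names, and zero-initialization of one adapter factor (standard LoRA practice) are illustrative assumptions. The base weight stays frozen while each modality gets its own small low-rank adapter:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                                  # model width and LoRA rank (illustrative)

W = rng.standard_normal((d, d))               # frozen base weight, shared by all modalities

# One low-rank adapter pair (A, B) per modality; only these small matrices are trained.
# B starts at zero, so each adapter initially leaves the base model unchanged.
adapters = {
    m: (rng.standard_normal((r, d)) * 0.01, np.zeros((d, r)))
    for m in ("text", "vision", "speech")
}

def forward(x, modality):
    """Base projection plus the low-rank update for the active modality."""
    A, B = adapters[modality]
    return x @ (W + B @ A).T

x = rng.standard_normal((1, d))
y_text = forward(x, "text")                   # equals x @ W.T until the adapter is trained
```

The appeal of this design is that the adapters are tiny relative to the base model: here each one stores 2 × r × d values instead of d × d, which is how a single compact model can serve several modalities.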

The Phi-4 mini model

With 3.8 billion parameters, the Phi-4 mini model is optimized for text-based tasks. It uses a decoder-only transformer architecture with a grouped-query attention mechanism, in which several query heads share a single key/value head. This approach cuts memory usage, requiring 22% fewer resources, while maintaining context sensitivity. The model supports 43 languages and can manage context windows of up to 128,000 tokens, making it ideal for processing long texts and documents.
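As a rough illustration of grouped-query attention, here is a NumPy sketch (not Microsoft's implementation; head counts and dimensions are arbitrary). Eight query heads share just two key/value heads, so the key/value cache is a quarter of the full multi-head size:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_heads, seq, d); k, v: (n_groups, seq, d), n_heads % n_groups == 0.
    Each group of query heads attends over one shared K/V pair, shrinking
    the KV cache by a factor of n_heads / n_groups versus multi-head attention."""
    n_heads, _, d = q.shape
    reps = n_heads // k.shape[0]
    k = np.repeat(k, reps, axis=0)            # broadcast each KV group to its query heads
    v = np.repeat(v, reps, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # softmax over key positions
    return w @ v

rng = np.random.default_rng(0)
out = grouped_query_attention(
    rng.standard_normal((8, 4, 16)),          # 8 query heads
    rng.standard_normal((2, 4, 16)),          # only 2 key heads
    rng.standard_normal((2, 4, 16)),          # only 2 value heads
)
```

Because keys and values dominate inference memory at long context lengths, sharing them across query groups is what makes a 128,000-token window feasible on modest hardware.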

Another notable innovation of the Phi-4 mini is its ability to handle function calls and external integrations. It can automatically identify relevant APIs based on user queries, generate the necessary parameters, and interact with various external systems. For instance, in a smart home management scenario, the model can activate the air conditioning, adjust the lighting, and send a notification from a single natural language command, with an average response time of 1.2 seconds.
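The smart home flow above typically works by having the model emit structured tool calls that the host application executes. The following sketch shows the dispatch side; the tool names, argument shapes, and JSON format are hypothetical stand-ins, not the actual Phi-4 or Azure function-calling schema:

```python
import json

# Hypothetical device handlers -- illustrative, not a real smart home API.
def set_ac(temperature):
    return f"AC set to {temperature}°C"

def set_lights(level):
    return f"Lights dimmed to {level}%"

def notify(message):
    return f"Notification sent: {message}"

TOOLS = {"set_ac": set_ac, "set_lights": set_lights, "notify": notify}

def dispatch(model_output: str):
    """Parse the JSON tool calls a function-calling model emits and run each one."""
    calls = json.loads(model_output)
    return [TOOLS[call["name"]](**call["arguments"]) for call in calls]

# A response the model might generate for:
# "Cool the room to 22, dim the lights, and let me know when it's done."
response = """[
  {"name": "set_ac", "arguments": {"temperature": 22}},
  {"name": "set_lights", "arguments": {"level": 30}},
  {"name": "notify", "arguments": {"message": "Scene applied"}}
]"""
results = dispatch(response)
```

The key property is that the model never touches the devices directly: it only proposes calls, and the application decides which registered handlers actually run.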

Industrial applicability and customizability

Both models—the multimodal and the text-specific versions—offer solutions for a wide range of industrial applications. In healthcare, they can analyze CT scans using real-time image processing. In the automotive industry, they support the integration of driver monitoring systems and gesture recognition. In financial services, multi-language document analysis enables real-time report generation and risk analysis. A case study from the Japanese manufacturer Headwaters Co. demonstrates that using the Phi-4 mini edge version can reduce manufacturing errors by up to 40% while processing data locally, thereby protecting industrial secrets.

The models are highly customizable. With Azure AI Foundry tools, they can be easily fine-tuned for domain-specific tasks—whether for language translation, answering medical questions, or other specialized needs. For example, after fine-tuning, an English–Indonesian translation task saw its BLEU score improve from 17.4 to 35.5, while accuracy for medical questions increased from 47.6% to 56.7%. These enhancements not only boost model performance but also expand its practical applications.

The BLEU (Bilingual Evaluation Understudy) score is a standard metric for measuring the quality of machine translation, providing a score between 0 and 100 based on how closely the machine translation matches human reference translations. A comparison table shows that while the Phi-4 mini does not outperform its competitors in absolute terms, it achieves its results with only 3.8 billion parameters. In contrast, the competitor Madlad-400-10B has 10 billion parameters yet only slightly outperforms the Phi-4 mini, highlighting a significant gain in efficiency. Even so, these BLEU scores remain far from the maximum of 100.

Model                Parameter count   BLEU (base)   BLEU (fine-tuned)   Fine-tuning time
Phi-4-mini           3.8B              17.4          35.5                3 hours (16 A100)
Madlad-400-10B       10B               29.1          38.2                14 hours (32 A100)
NLLB-200-distilled   1.3B              22.7          31.9                8 hours (8 A100)
OPUS-MT (latest)     0.5B              15.8          24.3                2 hours (4 A100)
Tower-7B-v0.1        7B                26.4          34.1                12 hours (24 A100)
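To make the metric concrete, here is a compact sentence-level BLEU sketch: geometric mean of 1- to 4-gram precisions with a brevity penalty. Production evaluations use corpus-level tooling with standardized tokenization, so treat this as a simplified illustration of the formula, not the scoring setup behind the table above:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU (0-100): geometric mean of clipped n-gram
    precisions, times a brevity penalty for too-short candidates."""
    cand, ref = candidate.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())   # clipped matches
        if overlap == 0:
            return 0.0                                # any zero precision zeroes the score
        log_precision += math.log(overlap / sum(c_grams.values())) / max_n
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(log_precision)

perfect = bleu("the cat sat on the mat", "the cat sat on the mat")   # → 100.0
partial = bleu("the cat sat on a mat", "the cat sat on the mat")     # between 0 and 100
```

This also shows why scores far below 100 are normal: a translation can be fully adequate yet phrase things differently from the single reference, losing n-gram matches.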

According to Microsoft, great emphasis has been placed on the security and ethical operation of these models. The Phi-4 family underwent extensive security audits by Microsoft’s AI Red Team using the Python Risk Identification Toolkit (PyRIT), which tested over 120 different attack vectors. As a result, the models withstood multilingual security tests and injection attacks, handle dynamic privilege management correctly, and support multi-factor authentication. Moreover, the ability to deploy locally allows these systems to operate without an internet connection, while 256-bit encryption ensures the security of local data.

The Phi-4 models are available on three major platforms: Azure AI Foundry, the NVIDIA API Catalog (optimized for the latest GPU architectures such as NVIDIA H100 and Blackwell), and the HuggingFace Hub, where open-source implementations are also offered.

The integration of the Phi-4 model family is set to bring significant changes across various industries. In smartphones, local AI processing can make language translation systems not only faster but also up to 65% more energy efficient. In education technology, adaptive learning platforms can provide personalized feedback to support learning, while predictive maintenance for IoT devices can boost system efficiency. Microsoft also plans to base the next generation of Copilot+ PCs on the Phi-4 models, where local AI processing may improve energy efficiency by up to 90%.

Summary

Overall, the Phi-4 model family represents a significant advance in the field of compact language models. Its multimodal capabilities, compact design, edge computing optimization, and extensive customizability are set to revolutionize AI applications in both everyday life and industrial settings. Microsoft’s innovative approach demonstrates how advanced technology can be made accessible and useful not only for large enterprises but also for a broader audience of everyday users. 
