Phi-4 model family expanded with two new models

Microsoft recently announced the new generation of the Phi-4 model family, which now includes two distinct yet complementary models: Phi-4-multimodal and Phi-4-mini. These models not only improve computing performance further but also combine different data types in innovative ways to support a wide range of AI applications, all in a compact design with optimized resource usage.

The Phi-4 multimodal model

The Phi-4 multimodal model, built on an architecture with 5.6 billion parameters, is an excellent solution for processing speech, image, and text data simultaneously. Unlike traditional systems—where different models handle each data type—this model uses a mixture-of-LoRAs technology to represent various data types together. As a result, it processes information faster and requires fewer computing resources, which is especially important for edge computing and real-time applications.
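To make the mixture-of-LoRAs idea concrete, here is a minimal NumPy sketch (all names and dimensions are illustrative, not Phi-4's actual implementation): one frozen base weight is shared across modalities, and each modality routes through its own small low-rank adapter, so adding a modality adds only a thin adapter rather than a whole new model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 64, 8

# Shared (frozen) base projection used by every modality.
W_base = rng.standard_normal((d_model, d_model)) * 0.02

# One low-rank LoRA adapter (A, B) per modality; only these are trained.
adapters = {
    modality: (rng.standard_normal((d_model, rank)) * 0.02,  # A: d_model -> rank
               np.zeros((rank, d_model)))                    # B: rank -> d_model (zero init)
    for modality in ("text", "vision", "speech")
}

def forward(x: np.ndarray, modality: str) -> np.ndarray:
    """Apply the shared base weight plus the selected modality's LoRA delta."""
    A, B = adapters[modality]
    return x @ W_base + (x @ A) @ B

tokens = rng.standard_normal((10, d_model))  # 10 token embeddings
out = forward(tokens, "vision")
print(out.shape)  # (10, 64)
```

Because the adapters share the backbone, switching modality is just a different (A, B) pair; the bulk of the parameters is loaded once.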

Its optimized design uses 40% less RAM, and with a context window that can handle up to 128,000 tokens, it easily manages long and complex content. The model also supports up to 85 languages, making it suitable for global applications. In several key benchmarks, the Phi-4 multimodal model achieved outstanding results. For example, it set a new record in speech recognition on the HuggingFace OpenASR leaderboard with a 6.14% word error rate, reached 89.2% accuracy in document analysis on the DocVQA test, and achieved a 78.5% success rate on scientific questions—performance levels comparable to the latest generation models. Furthermore, for tasks that require processing different types of data together (such as understanding a diagram alongside its spoken instructions), the model delivered results that were 35% more accurate than competing solutions.

The Phi-4-mini model

With 3.8 billion parameters, the Phi-4-mini model is optimized for text-based tasks. It uses a decoder-only transformer architecture with a grouped-query attention mechanism, an approach that cuts memory usage (requiring 22% fewer resources) while maintaining context sensitivity. By default, the model supports 43 languages and can manage context windows of up to 128,000 tokens, making it ideal for processing long texts and documents.
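Grouped-query attention saves memory by letting several query heads share each key/value head, shrinking the K/V cache proportionally. A rough sketch (head counts and dimensions below are illustrative, not Phi-4-mini's actual configuration):

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Toy grouped-query attention: n_q_heads query heads share n_kv_heads K/V heads."""
    seq, d_model = x.shape
    d_head = d_model // n_q_heads
    group = n_q_heads // n_kv_heads            # query heads per shared K/V head

    q = (x @ Wq).reshape(seq, n_q_heads, d_head)
    k = (x @ Wk).reshape(seq, n_kv_heads, d_head)
    v = (x @ Wv).reshape(seq, n_kv_heads, d_head)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                        # which shared K/V head this query head uses
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, seq, n_q, n_kv = 64, 12, 8, 2
d_head = d_model // n_q
Wq = rng.standard_normal((d_model, d_model)) * 0.02
# K/V projections are smaller: the memory saving comes from caching only n_kv heads.
Wk = rng.standard_normal((d_model, n_kv * d_head)) * 0.02
Wv = rng.standard_normal((d_model, n_kv * d_head)) * 0.02

y = grouped_query_attention(rng.standard_normal((seq, d_model)), Wq, Wk, Wv, n_q, n_kv)
print(y.shape)  # (12, 64)
```

With 8 query heads sharing 2 K/V heads, the cached K and V tensors are a quarter of their multi-head-attention size, which is where the memory reduction comes from.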

Another notable innovation of the Phi-4-mini is its ability to handle function calls and external integrations. It can automatically identify relevant APIs based on user queries, generate the required parameters, and interact with various external systems. For instance, in a smart-home management scenario, the model can activate the air conditioning, adjust the lighting, and send a notification from a single natural-language command, with an average response time of 1.2 seconds.
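The flow can be illustrated with a short sketch (the tool names, JSON shape, and device functions below are hypothetical, not an actual Phi-4 API): the model emits structured tool calls for the user's request, and the host application validates each call against a registry and dispatches it.

```python
import json

# Hypothetical registry of device APIs the model is allowed to call.
def set_ac(temperature: int) -> str:
    return f"AC set to {temperature}°C"

def set_lights(level: int) -> str:
    return f"Lights dimmed to {level}%"

def notify(message: str) -> str:
    return f"Notification sent: {message}"

TOOLS = {"set_ac": set_ac, "set_lights": set_lights, "notify": notify}

# In a real system, the model would produce this JSON from a natural-language
# request such as "Cool the living room, dim the lights, and let me know".
model_output = json.dumps([
    {"name": "set_ac", "arguments": {"temperature": 22}},
    {"name": "set_lights", "arguments": {"level": 30}},
    {"name": "notify", "arguments": {"message": "Evening scene active"}},
])

def dispatch(tool_calls_json: str) -> list[str]:
    """Validate each tool call against the registry and execute it."""
    results = []
    for call in json.loads(tool_calls_json):
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append(f"Unknown tool: {call['name']}")
            continue
        results.append(fn(**call["arguments"]))
    return results

for line in dispatch(model_output):
    print(line)
```

Keeping the registry on the application side means the model can only trigger actions the host has explicitly exposed, which matters for the local, privacy-sensitive deployments discussed below.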

Industrial applicability and customizability

Both models—the multimodal and the text-specific versions—offer solutions for a wide range of industrial applications. In healthcare, they can analyze CT scans using real-time image processing. In the automotive industry, they support the integration of driver monitoring systems and gesture recognition. In financial services, multi-language document analysis enables real-time report generation and risk analysis. A case study from the Japanese manufacturer Headwaters Co. demonstrates that using the Phi-4 mini edge version can reduce manufacturing errors by up to 40% while processing data locally, thereby protecting industrial secrets.

The models are highly customizable. With Azure AI Foundry tools, they can be easily fine-tuned for domain-specific tasks—whether for language translation, answering medical questions, or other specialized needs. For example, after fine-tuning, an English–Indonesian translation task saw its BLEU score improve from 17.4 to 35.5, while accuracy for medical questions increased from 47.6% to 56.7%. These enhancements not only boost model performance but also expand its practical applications.

The BLEU (Bilingual Evaluation Understudy) score is a standard metric for measuring machine-translation quality: it assigns a score between 0 and 100 based on how closely the machine translation matches human reference translations. The comparison table below shows that while Phi-4-mini does not beat its strongest competitors in absolute terms, it achieves its scores with only 3.8 billion parameters. The competing Madlad-400-10B, with 10 billion parameters, only slightly outperforms it, highlighting a significant gain in efficiency. Even so, the fine-tuned scores remain far from a perfect 100.

Model                Parameter count   BLEU (base)   BLEU (fine-tuned)   Fine-tuning time
Phi-4-mini           3.8B              17.4          35.5                3 hours (16 A100)
Madlad-400-10B       10B               29.1          38.2                14 hours (32 A100)
NLLB-200-distilled   1.3B              22.7          31.9                8 hours (8 A100)
OPUS-MT (latest)     0.5B              15.8          24.3                2 hours (4 A100)
Tower-7B-v0.1        7B                26.4          34.1                12 hours (24 A100)
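For readers unfamiliar with the metric in the table above, sentence-level BLEU can be sketched in a few lines. This is a toy implementation (real evaluations use corpus-level tooling such as sacreBLEU): it combines clipped n-gram precisions with a brevity penalty for translations shorter than the reference.

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n), times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return 100 * brevity * math.exp(sum(log_precisions) / max_n)

score = bleu("the quick brown fox jumps over the lazy dog",
             "the quick brown fox jumped over the lazy dog")
print(round(score, 1))  # ≈ 59.7: one changed word costs several n-gram matches
```

A single substituted word breaks every n-gram that spans it, which is why even good translations rarely approach 100, as the table illustrates.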

According to Microsoft, great emphasis has been placed on the security and ethical operation of these models. The Phi-4 family underwent extensive security audits by Microsoft's AI Red Team using the Python Risk Identification Toolkit (PyRIT), which tested over 120 different attack vectors. As a result, the models proved robust in multilingual security tests and against database-injection attacks and flaws in dynamic privilege management, and they support multi-factor authentication. Moreover, local deployment allows these systems to operate without an internet connection, while 256-bit encryption secures local data.

The Phi-4 models are available on three major platforms: Azure AI Foundry, the NVIDIA API Catalog (optimized for the latest GPU architectures such as NVIDIA H100 and Blackwell), and the HuggingFace Hub, where open-source implementations are also offered.

The integration of the Phi-4 model family is set to bring significant changes across various industries. In smartphones, local AI processing can make language translation systems not only faster but also up to 65% more energy efficient. In education technology, adaptive learning platforms can provide personalized feedback to support learning, while predictive maintenance for IoT devices can boost system efficiency. Microsoft also plans to base the next generation of Copilot+ PCs on the Phi-4 models, where local AI processing may improve energy efficiency by up to 90%.

Summary

Overall, the Phi-4 model family represents a significant advance in the field of compact language models. Its multimodal capabilities, compact design, edge computing optimization, and extensive customizability are set to revolutionize AI applications in both everyday life and industrial settings. Microsoft’s innovative approach demonstrates how advanced technology can be made accessible and useful not only for large enterprises but also for a broader audience of everyday users. 
