Phi-4 model family expanded with two new models

Microsoft recently announced the new generation of the Phi-4 model family, which now includes two distinct yet complementary models: Phi-4-multimodal and Phi-4-mini. These models not only improve computing performance but also combine different data types in novel ways to support a wide range of AI applications, all in a compact design with optimized resource usage.

The Phi-4-multimodal model

The Phi-4-multimodal model, built on a 5.6-billion-parameter architecture, is well suited to processing speech, image, and text data simultaneously. Unlike traditional systems, where a separate model handles each data type, it uses a mixture-of-LoRAs technique to represent the different modalities within a single model. As a result, it processes information faster and requires fewer computing resources, which is especially important for edge computing and real-time applications.
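
To make the idea concrete, here is a minimal PyTorch sketch of the mixture-of-LoRAs pattern: a frozen shared projection plus one low-rank adapter per modality, selected at inference time. The dimensions, rank, and routing below are illustrative assumptions, not Phi-4-multimodal's actual configuration.

```python
import torch
import torch.nn as nn

class MixtureOfLoRAs(nn.Module):
    """Frozen shared projection plus one low-rank (LoRA) adapter per modality.

    Illustrative sketch only; dims, rank, and routing are assumptions,
    not Phi-4-multimodal's actual configuration.
    """
    def __init__(self, dim=512, rank=8, modalities=("text", "vision", "audio")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)   # shared backbone stays frozen
        self.base.bias.requires_grad_(False)
        # One (A, B) low-rank pair per modality: delta_W = B @ A
        self.lora_A = nn.ModuleDict({m: nn.Linear(dim, rank, bias=False) for m in modalities})
        self.lora_B = nn.ModuleDict({m: nn.Linear(rank, dim, bias=False) for m in modalities})
        for m in modalities:
            nn.init.zeros_(self.lora_B[m].weight)  # start as a no-op (standard LoRA init)

    def forward(self, x, modality):
        # Route the input through the adapter that matches its modality.
        return self.base(x) + self.lora_B[modality](self.lora_A[modality](x))

layer = MixtureOfLoRAs()
tokens = torch.randn(1, 16, 512)           # e.g. 16 embedded audio frames
out = layer(tokens, modality="audio")      # same backbone, audio-specific adapter
print(out.shape)                           # torch.Size([1, 16, 512])
```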

Its optimized design uses 40% less RAM, and with a context window of up to 128,000 tokens it easily manages long and complex content. The model also supports up to 85 languages, making it suitable for global applications. Phi-4-multimodal achieved outstanding results in several key benchmarks: it set a new record in speech recognition on the Hugging Face OpenASR leaderboard with a 6.14% word error rate, reached 89.2% accuracy in document analysis on the DocVQA test, and achieved a 78.5% success rate on scientific questions, performance comparable to the latest generation of models. Furthermore, for tasks that require processing different data types together (such as understanding a diagram alongside its spoken instructions), the model delivered results 35% more accurate than competing solutions.

The Phi-4-mini model

With 3.8 billion parameters, the Phi-4-mini model is optimized for text-based tasks. It uses a decoder-only transformer architecture with grouped-query attention, an approach that cuts memory usage by roughly 22% while maintaining context sensitivity. The model supports 43 languages out of the box and handles context windows of up to 128,000 tokens, making it ideal for processing long texts and documents.
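
The memory saving comes from grouped-query attention letting several query heads share a single key/value head, which shrinks the KV cache. A compact PyTorch sketch of the mechanism follows; the head counts and dimensions are illustrative assumptions rather than Phi-4-mini's published configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    """Several query heads share each key/value head, shrinking the KV cache.

    Illustrative sketch; head counts are assumptions, not Phi-4-mini's.
    """
    B, T, D = x.shape
    hd = D // n_q_heads                                       # per-head dimension
    q = (x @ wq).view(B, T, n_q_heads, hd).transpose(1, 2)    # (B, Hq,  T, hd)
    k = (x @ wk).view(B, T, n_kv_heads, hd).transpose(1, 2)   # (B, Hkv, T, hd)
    v = (x @ wv).view(B, T, n_kv_heads, hd).transpose(1, 2)
    # Repeat each KV head so groups of query heads attend to the same keys/values.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)                     # (B, Hq, T, hd)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

B, T, D = 1, 10, 64
x = torch.randn(B, T, D)
wq = torch.randn(D, D)
wk = torch.randn(D, D // 4)   # KV projections are 4x smaller: 2 heads instead of 8
wv = torch.randn(D, D // 4)
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 8, 10, 8])
```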

Another notable innovation of Phi-4-mini is its ability to handle function calls and external integrations. It can automatically identify relevant APIs based on user queries, generate the necessary parameters, and interact with various external systems. For instance, in a smart-home scenario, the model can activate the air conditioning, adjust the lighting, and send a notification, all from a single natural-language command and with an average response time of 1.2 seconds.
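
The typical flow behind such function calling can be sketched as follows: the model receives tool descriptions, emits a structured call, and the application dispatches it. The tool names, schema, and simulated model output below are hypothetical and only illustrate the pattern, not Phi-4-mini's actual interface.

```python
import json

# Hypothetical smart-home tools the model would be told about in its prompt.
TOOLS = {
    "set_ac": lambda temperature: f"AC set to {temperature} C",
    "set_lights": lambda level: f"Lights dimmed to {level}%",
    "notify": lambda message: f"Notification sent: {message}",
}

def dispatch(model_output: str) -> list[str]:
    """Parse the model's structured tool calls and invoke the matching functions."""
    calls = json.loads(model_output)
    return [TOOLS[call["name"]](**call["arguments"]) for call in calls]

# Simulated model response to: "I'm home, cool it down, dim the lights, tell Anna."
model_output = json.dumps([
    {"name": "set_ac", "arguments": {"temperature": 22}},
    {"name": "set_lights", "arguments": {"level": 30}},
    {"name": "notify", "arguments": {"message": "I'm home"}},
])
for result in dispatch(model_output):
    print(result)
```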

Industrial applicability and customizability

Both models, the multimodal and the text-specific version, offer solutions for a wide range of industrial applications. In healthcare, they can analyze CT scans using real-time image processing. In the automotive industry, they support the integration of driver-monitoring systems and gesture recognition. In financial services, multilingual document analysis enables real-time report generation and risk analysis. A case study from the Japanese manufacturer Headwaters Co. demonstrates that the edge version of Phi-4-mini can reduce manufacturing errors by up to 40% while processing data locally, thereby protecting industrial secrets.

The models are highly customizable. With Azure AI Foundry tools, they can be easily fine-tuned for domain-specific tasks—whether for language translation, answering medical questions, or other specialized needs. For example, after fine-tuning, an English–Indonesian translation task saw its BLEU score improve from 17.4 to 35.5, while accuracy for medical questions increased from 47.6% to 56.7%. These enhancements not only boost model performance but also expand its practical applications.
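
The article points to Azure AI Foundry for fine-tuning; as one way to picture the same idea with open-source tooling, here is a minimal LoRA fine-tuning sketch using Hugging Face's peft library. The model id, target module names, and hyperparameters are assumptions to be checked against the actual model card, not the article's method.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed Hugging Face model id; verify against the actual Hub listing.
model_id = "microsoft/Phi-4-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small low-rank adapters instead of updating all 3.8B parameters.
config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train on translation pairs with any standard Trainer loop.
```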

The BLEU (Bilingual Evaluation Understudy) score is a standard metric for measuring machine-translation quality, assigning a score between 0 and 100 based on how closely the machine output matches human reference translations. The comparison table below shows that while Phi-4-mini does not outperform its competitors in absolute terms, it achieves its scores with only 3.8 billion parameters. The competitor Madlad-400-10B, with 10 billion parameters, only slightly outperforms Phi-4-mini, which highlights a significant gain in efficiency. Of course, even these BLEU scores remain far from 100.

Model                Parameters   BLEU (base)   BLEU (fine-tuned)   Fine-tuning time
Phi-4-mini           3.8B         17.4          35.5                3 hours (16 A100)
Madlad-400-10B       10B          29.1          38.2                14 hours (32 A100)
NLLB-200-distilled   1.3B         22.7          31.9                8 hours (8 A100)
OPUS-MT (latest)     0.5B         15.8          24.3                2 hours (4 A100)
Tower-7B-v0.1        7B           26.4          34.1                12 hours (24 A100)
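
For readers who want to reproduce the metric itself, BLEU can be computed with the standard sacreBLEU library; a minimal example on toy sentences (not the benchmark data) looks like this:

```python
import sacrebleu

# Toy example: one hypothesis and one reference stream parallel to it.
hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # 0-100; higher means closer to the reference
```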

According to Microsoft, great emphasis has been placed on the security and ethical operation of these models. The Phi-4 family underwent extensive security audits by Microsoft's AI Red Team using the Python Risk Identification Toolkit (PyRIT), which tested over 120 different attack vectors. As a result, the models are hardened against multilingual attacks and database injection attacks, handle dynamic privilege management more robustly, and support multi-factor authentication. Moreover, the ability to deploy locally allows these systems to operate without an internet connection, while 256-bit encryption secures local data.

The Phi-4 models are available on three major platforms: Azure AI Foundry, the NVIDIA API Catalog (optimized for the latest GPU architectures such as NVIDIA H100 and Blackwell), and the Hugging Face Hub, where open-source implementations are also offered.
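
As an illustration of the Hugging Face route, a minimal text-generation sketch with the transformers library might look like the following; the model id is an assumption and should be checked against the Hub listing.

```python
from transformers import pipeline

# Assumed Hub id for the text model; confirm on the Hugging Face listing.
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",
    trust_remote_code=True,  # Phi models have shipped custom modeling code
)

prompt = "Explain grouped-query attention in one sentence."
out = generator(prompt, max_new_tokens=60)
print(out[0]["generated_text"])
```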

The integration of the Phi-4 model family is set to bring significant changes across various industries. In smartphones, local AI processing can make language translation systems not only faster but also up to 65% more energy efficient. In education technology, adaptive learning platforms can provide personalized feedback to support learning, while predictive maintenance for IoT devices can boost system efficiency. Microsoft also plans to base the next generation of Copilot+ PCs on the Phi-4 models, where local AI processing may improve energy efficiency by up to 90%.

Summary

Overall, the Phi-4 model family represents a significant advance in the field of compact language models. Its multimodal capabilities, compact design, edge computing optimization, and extensive customizability are set to revolutionize AI applications in both everyday life and industrial settings. Microsoft’s innovative approach demonstrates how advanced technology can be made accessible and useful not only for large enterprises but also for a broader audience of everyday users. 
