When we look at our phone's display, what we see feels obvious—icons, text, and buttons we’re used to. But how does artificial intelligence interpret that same interface? This question is at the heart of joint research between Apple and Finland’s Aalto University, resulting in a model called ILuvUI. This development isn’t just a technical milestone; it’s a major step toward enabling digital systems to truly understand how we use applications—and how they can assist us even more effectively.
ILuvUI (Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations) is a vision-language model that can interpret both images and text-based instructions. But it doesn't stop at recognizing screen elements: it's designed to understand user intent, interpret visual information in context, and support more natural interaction within digital environments.
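To make that idea concrete, the contract of such a model can be pictured as a function that takes a full screenshot plus a natural-language instruction and returns a textual answer. The types and the interpret function below are purely illustrative assumptions, not part of any published ILuvUI API.

```swift
import Foundation
import CoreGraphics

// A hypothetical request pairing a UI screenshot with a text instruction.
struct UIQuery {
    let screenshot: CGImage      // full screen capture, no cropping or regions required
    let instruction: String      // e.g. "How would I enable dark mode on this screen?"
}

// A hypothetical response: free-form text plus, optionally, a suggested action.
struct UIAnswer {
    let text: String             // description, answer, or step-by-step guidance
    let suggestedAction: String? // e.g. "Tap the 'Appearance' row"
}

// Illustrative signature only: a vision-language model maps (image, instruction) -> answer.
protocol UIVisionLanguageModel {
    func interpret(_ query: UIQuery) async throws -> UIAnswer
}
```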
Most of today’s AI models are trained primarily on natural images, like animals or landscapes. While these models can perform well when answering text-based questions, they often struggle with the structured and complex layouts of mobile app interfaces. ILuvUI, on the other hand, was built specifically to understand such structured environments, and it outperformed its open-source base model, LLaVA, not just in machine-based evaluations but also in human preference tests.
Instead of being trained on real user interactions, ILuvUI was trained on synthetically generated data, such as detailed screen descriptions, question-and-answer dialogues, and the expected outcomes of various user actions. Perhaps its most remarkable feature is that it doesn't need a user-specified region of the screen to work from: it can interpret the entire content of a screen based on a simple text prompt and respond accordingly.
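As a rough illustration of what one piece of such synthetic training data could look like, the structure below pairs a screen with a generated description, a question-and-answer exchange, and an expected action outcome. The field names and the example values are assumptions made for illustration, not the actual schema used in the research.

```swift
import Foundation

// Hypothetical shape of one synthetically generated training sample.
struct SyntheticUISample: Codable {
    let screenDescription: String   // generated prose describing the whole screen
    let question: String            // generated question about the UI
    let answer: String              // generated answer grounded in the screen
    let action: String              // a user action, e.g. "tap 'Checkout'"
    let expectedOutcome: String     // what the screen should do after that action
}

// Example instance in the spirit of machine-generated UI conversations.
let sample = SyntheticUISample(
    screenDescription: "A shopping cart screen listing two items with a total of $42.50.",
    question: "How can the user complete the purchase?",
    answer: "By tapping the 'Checkout' button at the bottom of the screen.",
    action: "tap 'Checkout'",
    expectedOutcome: "A payment screen appears asking for shipping and card details."
)
```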
One of the most promising application areas for this technology is accessibility. For users who, for whatever reason, cannot visually follow what's happening on an app interface, this could be a powerful tool for navigating digital spaces that would otherwise be difficult to access. Automated testing could also benefit significantly: a more intelligent interpretation of user interface behavior can speed up debugging and make functional checks more reliable.
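To illustrate the testing idea, the sketch below reuses the hypothetical UIVisionLanguageModel protocol from the earlier snippet inside a standard XCUITest. The XCTest and screenshot calls are real; the model integration and the question being asked are assumptions for illustration only.

```swift
import XCTest
import UIKit

final class CheckoutScreenTests: XCTestCase {
    // Injected elsewhere; any implementation of the hypothetical protocol would do.
    var uiModel: UIVisionLanguageModel!

    func testCheckoutScreenIsUnderstandable() async throws {
        let app = XCUIApplication()
        app.launch()

        // Capture the current screen exactly as a user would see it.
        let screenshot = XCUIScreen.main.screenshot()
        guard let cgImage = screenshot.image.cgImage else {
            return XCTFail("Could not capture the screen")
        }

        // Ask the model a plain-language question about the UI state.
        let answer = try await uiModel.interpret(
            UIQuery(screenshot: cgImage,
                    instruction: "Is there a visible button to complete the purchase?")
        )

        // A human-readable check instead of a brittle element-identifier assertion.
        XCTAssertTrue(answer.text.lowercased().contains("yes"),
                      "Model did not find a purchase button: \(answer.text)")
    }
}
```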
It’s important to note that ILuvUI is not a finished product. Future development plans include expanding its image encoders, improving resolution handling, and supporting output formats that integrate seamlessly with app development environments. Even so, the current foundation is promising—and it ties directly into another major Apple initiative: its next-generation AI system, Apple Intelligence.
This new system brings the latest advances in generative language models directly to Apple devices. It includes several components: a smaller, on-device model that ensures fast and energy-efficient performance, and a larger, server-based model for handling more complex tasks. Both architectures feature innovations aimed at reducing memory use and processing time. Apple has also invested heavily in image understanding, developing its own vision encoder and training it on large-scale image data.
Apple emphasizes that it does not use personal data to train these models. Instead, they're built using licensed, open-source, and publicly available datasets, along with content collected by its web crawler, Applebot. Additional filtering mechanisms are in place to ensure the training data does not include personally identifiable or unsafe content. Privacy remains a cornerstone of the system's design, with development grounded in on-device processing and a new infrastructure called Private Cloud Compute.
With its Foundation Models framework, Apple allows developers to integrate these models directly into their apps. This includes guided text generation, output that maps onto Swift data types, and the ability to call into an app's own functions (tool calling), enabling developers to build reliable, focused AI features tailored to specific services or data sources.
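As a brief sketch of what that developer experience looks like, the snippet below follows the guided-generation pattern Apple has demonstrated for the Foundation Models framework, where a @Generable Swift type constrains the model's output. Treat the exact type and method names, and the TripSummary example itself, as approximations rather than a verified API reference.

```swift
import FoundationModels

// A Swift type the on-device model is asked to fill in directly (guided generation).
@Generable
struct TripSummary {
    @Guide(description: "A short, friendly title for the trip")
    var title: String

    @Guide(description: "Three packing suggestions")
    var packingList: [String]
}

func summarizeTrip(notes: String) async throws -> TripSummary {
    // A session wraps the on-device language model shipped with Apple Intelligence.
    let session = LanguageModelSession()

    // The framework constrains decoding so the response conforms to TripSummary.
    let response = try await session.respond(
        to: "Summarize this trip for a travel journal: \(notes)",
        generating: TripSummary.self
    )
    return response.content
}
```

Because the output arrives as a typed Swift value rather than raw text, the app can use it directly, without parsing or validating free-form strings.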
While public demos often emphasize the speed, efficiency, and "intelligence" of these new AI systems, it’s essential to remember that they are still human-designed tools. They don’t possess intentions or understanding of their own. Nonetheless, they are getting ever closer to interpreting users’ goals and responding in meaningful, context-aware ways.