Spatial intelligence is the next hurdle for AGI to overcome

With the advent of LLMs (large language models), machines have gained impressive capabilities, and the pace of development keeps accelerating, with new models appearing almost daily that make machines more efficient and more capable. On closer inspection, however, this technology has only enabled machines to think in one dimension. The world we live in is three-dimensional, at least as humans perceive it. It is not difficult for a person to determine that something is under or behind a chair, or where a ball flying towards us will land. According to many artificial intelligence researchers, for AGI (artificial general intelligence) to emerge, machines must be able to think in three dimensions, and that requires developing spatial intelligence.

What does spatial intelligence mean?

Spatial intelligence essentially means that an artificial system is able to perceive, understand, and manipulate three-dimensional data, as well as navigate in a 3D environment. This goes far beyond mere object recognition, which today's AIs are already excellent at. It is about machines recognizing depth, volume, relationships between objects, and spatial context, much as we humans interpret the space around us. Dr. Fei-Fei Li, a pioneer in the field of AI often referred to as the "godmother of artificial intelligence," emphasizes that this ability is just as fundamental to the future of AI as language processing. Just as language laid the foundation for communication, understanding 3D space will enable AI to interact meaningfully with our physical environment.

Achieving this, however, is a serious challenge and does not follow straightforwardly from existing LLM technology. One part of the problem is that language is fundamentally one-dimensional (1D): linguistic information arrives sequentially, in order; for example, words and syllables come one after another in speech or writing. For this reason, models suited to language processing, such as LLMs, work well with sequence-based learning (e.g., sequence-to-sequence models). The other problem is that language is a purely generative phenomenon: it is not tangible, we cannot see or touch it; it originates from the human mind as a completely internal construct that we only record afterwards (e.g., in writing).
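To make the 1D, sequential nature of language concrete, here is a minimal toy sketch in Python: a character-level bigram model trained on a tiny corpus. The corpus and all names are my own illustrative assumptions, and this is of course not how an LLM works internally, but it shows the basic pattern of sequence-based, next-token prediction that LLMs scale up enormously.

```python
# Toy illustration of language as a 1D sequence: a character-level bigram model.
# It treats text purely as an ordered stream of symbols and predicts each next
# symbol from the previous one, strictly left to right.
from collections import Counter, defaultdict

corpus = "the ball rolled under the chair and the cat sat behind the chair"

# Count how often each character follows each other character (1D context only).
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(char: str) -> str:
    """Return the most frequent character observed after `char`."""
    followers = transitions.get(char)
    return followers.most_common(1)[0][0] if followers else " "

# Generate a short continuation one symbol at a time.
text = "t"
for _ in range(20):
    text += predict_next(text[-1])
print(text)
```

The point of the sketch is simply that every prediction depends only on what came earlier in the sequence; there is no notion of space, depth, or geometry anywhere in the data.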

In contrast, the visual world is three-dimensional (3D), and if we include time, four-dimensional (4D). During visual perception, the 3D world is reduced to a two-dimensional projection (e.g., on our retina or in a camera image), which makes reconstructing it a mathematically ill-posed problem: there is no unique solution. In addition, the visual world is not only generative but also reconstructive, bound by real physical laws, and its applications are more diverse, ranging from metaverse generation to robotics. Therefore, according to Fei-Fei Li, modeling spatial intelligence (e.g., 3D world models) is a much more complex and difficult challenge than developing LLMs.
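A small numpy sketch can show why the inverse problem is ill-posed, assuming a simple pinhole camera model (the focal length and point coordinates below are illustrative values I chose): two different 3D points lying on the same ray project to exactly the same pixel, so depth cannot be recovered from a single image alone.

```python
# Pinhole-camera sketch: projecting 3D points onto a 2D image plane.
# Two points at different depths on the same ray land on the same pixel,
# which is why single-image 3D reconstruction is an ill-posed inverse problem.
import numpy as np

f = 800.0  # assumed focal length in pixels (illustrative value)

def project(point_3d: np.ndarray) -> np.ndarray:
    """Perspective projection: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    x, y, z = point_3d
    return np.array([f * x / z, f * y / z])

near = np.array([0.5, 0.2, 2.0])   # a point 2 m from the camera
far  = np.array([1.0, 0.4, 4.0])   # a different point, twice as far, on the same ray

print(project(near))  # -> [200.  80.]
print(project(far))   # -> [200.  80.]  identical pixel: the depth information is gone
```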

Google Geospatial Reasoning Framework: is this spatial intelligence?

There are many approaches to building spatial intelligence today, and computer vision and 3D processing play a key role in most of them. Lidar, stereo cameras, and structured-light sensors are used to collect depth information, which is then processed by neural networks. These technologies are already being used in autonomous systems, robotics, and geospatial applications.
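As a concrete example of how such sensors yield depth, here is a short sketch of standard stereo triangulation, Z = f * B / d, where f is the focal length, B the baseline between the two cameras, and d the per-pixel disparity. The camera parameters and the tiny disparity map below are illustrative assumptions, not values from any specific device.

```python
# Stereo-depth sketch: a rectified stereo pair gives, for each pixel, a disparity
# (horizontal shift between the left and right images). Metric depth follows from
# similar triangles: Z = f * B / d.
import numpy as np

focal_length_px = 700.0   # assumed focal length in pixels
baseline_m = 0.12         # assumed distance between the two cameras, in meters

def depth_from_disparity(disparity_px: np.ndarray) -> np.ndarray:
    """Convert a disparity map (in pixels) to metric depth (in meters)."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        # Zero disparity means no match / point at infinity, so report inf depth.
        return np.where(disparity_px > 0,
                        focal_length_px * baseline_m / disparity_px,
                        np.inf)

# A tiny 2x3 "disparity map": larger disparity means the object is closer.
disparities = np.array([[40.0, 20.0, 10.0],
                        [80.0,  5.0,  0.0]])
print(depth_from_disparity(disparities))
# -> [[ 2.1   4.2   8.4 ]
#     [ 1.05 16.8    inf]]
```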

The Geospatial Reasoning Framework developed by Google is a significant technological step towards applied spatial intelligence, building on the company's global geodata infrastructure and advanced generative AI capabilities (for more information, see my previous article Google Geospatial Reasoning: A New AI Tool for Solving Geospatial Problems). The system aims to uncover and interpret complex spatial relationships from various data sources, such as satellite images, maps, and mobility patterns. At its core are foundation models such as the Population Dynamics Foundation Model, which models population change, and trajectory-based mobility models, which analyze how people move across large areas. These models are closely integrated with Google's existing systems (Google Maps, Earth Engine, Street View), giving them access to hundreds of millions of locations and extensive geographic data.

This framework enables, for example, the modeling of urban planning scenarios, spatial analysis of disaster situations, mapping of climate vulnerabilities, and tracking of public health trends. The system uses AI—specifically Gemini capabilities—to automatically perform GIS operations from natural language queries, generate new spatial data content, or present complex geographic relationships.
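To illustrate the general pattern of "natural-language question in, structured GIS operation out", here is a deliberately hypothetical sketch. The `GISQuery` structure, the keyword-based parser, and the layer names are stand-ins I invented for illustration; they are not Google's actual interface, where an LLM such as Gemini would perform the interpretation step.

```python
# Hypothetical sketch of the "natural language -> GIS operation" pattern.
# A trivial keyword parser stands in for the LLM interpretation step, just to
# show the shape of a structured query that a GIS backend could then execute.
from dataclasses import dataclass

@dataclass
class GISQuery:
    operation: str    # e.g., "overlay", "buffer", "aggregate"
    layer: str        # which spatial dataset to use
    region: str       # area of interest
    parameters: dict  # operation-specific settings

def parse_question(question: str) -> GISQuery:
    """Toy stand-in for the LLM that turns a question into a structured query."""
    q = question.lower()
    if "flood" in q:
        return GISQuery(operation="overlay", layer="flood_risk_zones",
                        region="budapest", parameters={"with": "population_density"})
    return GISQuery(operation="aggregate", layer="population",
                    region="unknown", parameters={})

query = parse_question("Which districts of Budapest are most exposed to flood risk?")
print(query)
```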

At the same time, it is important to note that this approach does not cover the entire spectrum of spatial intelligence, and especially not the kind of 3D world understanding that Fei-Fei Li refers to. Google's system is fundamentally built on 2D maps and planar geographic models, which are excellent for large-scale, aggregated spatial analysis but not suited to fine-grained, object-level 3D relationships, physical laws, or embodied AI tasks. True spatial intelligence, such as a robot navigating a room, identifying objects, or manipulating them, requires much more than processing location data: it requires dynamic world modeling, handling of perceptual uncertainty, and understanding of time-varying physical interactions.

According to Dr. Fei-Fei Li, vision took 540 million years of evolution to develop, while language emerged in only about half a million years, which shows just how fundamental and complex a capability vision, and the spatial intelligence built on it, really is.

The Paths of the Future

Although remarkable results are already visible in specialized applications, achieving human-level spatial intelligence remains an ambitious goal. Initiatives such as World Labs, which attract huge investments, show that the industry sees great potential in this area. In the future, the effective integration of different types of spatial intelligence, from fine-grained 3D object manipulation to large-scale geographic reasoning, will be key. In addition, standardized measurement and evaluation frameworks need to be developed to accurately track progress. Collaboration between experts in computer vision, robotics, cognitive science, and geography is essential for success, not least because training models with spatial intelligence is extremely difficult: while the web offers a wealth of text and images for training LLMs, acquiring a comparable amount of data about the 3D world is not only a major challenge but also requires completely new approaches.

But how long will all this take? Obviously, no one knows the answer; the task is so complex that even the researchers themselves are reluctant to make predictions. There is, however, one noteworthy story worth mentioning here. In an interview, Dr. Fei-Fei Li said that when she graduated from university, her dream was that, with her life's work, she might be able to create software that could describe in words what is in a picture. In 2015, she and her colleagues and students (Andrej Karpathy, Justin Johnson, and others) suddenly found themselves with a working solution. Dr. Li was a little disappointed and wondered what the hell she was going to do with the rest of her life. She jokingly remarked to Andrej Karpathy that now they should build the reverse of the software, that is, generate an image from text. Andrej laughed at the absurdity of the idea, and Dr. Li probably chuckled to herself, but those of us who haven't spent the last few years living in a cave or under a rock know how the story ended.
