Over the past few decades, software development has fundamentally shaped our digital world, but the latest technological breakthroughs are ushering in a new era in which programming itself is undergoing a radical transformation. According to Andrej Karpathy, former director of artificial intelligence at Tesla, after decades of slow change, software has begun to change dramatically within just a few years, fundamentally rewriting our understanding of what programming means.
The evolution of software
Traditional software development, which Karpathy calls Software 1.0, means programmers writing explicit instructions for the computer. This code, written in languages such as Python or C++, makes the computer perform precisely defined tasks. Just think of the vast universe of code stored on GitHub, virtually all of which is built on this principle.
A few years ago, however, a new type of software appeared: Software 2.0, which is based on neural networks. Here, programmers no longer write the code directly; instead, they curate data and run training processes through which a neural network "learns" to perform the task, and the resulting program is encoded in the network's weights. An image recognition system, for example, does not identify cats by explicit rules but learns recognition patterns from a huge amount of data, as the sketch below illustrates. The Hugging Face platform serves as the GitHub of Software 2.0: instead of traditional source code, models and weights are shared there.
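To make the contrast concrete, here is a minimal, self-contained sketch of the Software 2.0 idea: a toy classifier whose "program" is a pair of weights fitted to labeled examples by gradient descent rather than written by hand. The dataset, learning rate, and step count are invented purely for illustration.

```python
import math
import random

# Toy labeled dataset: in a real Software 2.0 system this would be millions
# of labeled images or text samples.
data = [([0.1, 0.9], 1), ([0.8, 0.2], 0), ([0.2, 0.8], 1), ([0.9, 0.1], 0)]

w = [0.0, 0.0]  # in Software 2.0, these weights ARE the program
b = 0.0
lr = 0.5        # learning rate, chosen arbitrarily for this toy example

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# The optimizer, not the programmer, "writes" the program.
for step in range(1000):
    x, y = random.choice(data)
    err = predict(x) - y
    for j in range(len(w)):
        w[j] -= lr * err * x[j]
    b -= lr * err

print(w, b)                   # the learned parameters
print(predict([0.15, 0.85]))  # close to 1: classified by learned weights, not rules
```

The point is that `w` and `b`, not the Python above, are the real program: change the data and a different program emerges without anyone editing any rules.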
The latest and perhaps most revolutionary shift is the emergence of Software 3.0, driven by large language models (LLMs). These models can be "programmed" in plain human language, through instructions, or "prompts," written in English. This is a genuinely new workflow: instead of writing Python code for an algorithm, we simply ask the language model to perform the task. According to Karpathy, this change is remarkable because we can now program computers in our native language. The development of Tesla Autopilot illustrates the trend well: as neural networks matured, more and more of the previously handwritten C++ code was replaced by Software 2.0 solutions. Now the same process is beginning with Software 3.0, where LLMs are taking over the role of traditional code.
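The difference between the paradigms can be sketched side by side. The hypothetical example below contrasts a Software 1.0 sentiment classifier (hand-written rules) with its Software 3.0 counterpart, where the English prompt itself is the program; `call_llm` is a stand-in for any LLM API, not a real library call, and the word lists are illustrative.

```python
def sentiment_v1(text: str) -> str:
    """Software 1.0: the programmer enumerates the logic by hand."""
    positive = {"great", "good", "excellent", "love"}
    negative = {"bad", "terrible", "awful", "hate"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def call_llm(prompt: str) -> str:
    # Stub standing in for a metered LLM API of your choice.
    raise NotImplementedError("wire this up to an LLM provider")

def sentiment_v3(text: str) -> str:
    """Software 3.0: the English instruction itself is the program."""
    prompt = (
        "Classify the sentiment of the following review as positive, "
        f"negative, or neutral. Reply with one word.\n\nReview: {text}"
    )
    return call_llm(prompt)
```

Note that improving `sentiment_v1` means editing code, while improving `sentiment_v3` typically means editing the English prompt.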
LLMs as operating systems and “utility services”
Karpathy draws instructive analogies between LLMs and other familiar systems. On the one hand, LLMs resemble utilities such as electricity. Labs like OpenAI, Google (Gemini), and Anthropic make significant capital investments to train their models, then offer this "intelligence" through APIs with metered access (e.g., pay-per-token pricing). Just as we expect low latency and high availability from the electricity supply, these are fundamental requirements for LLMs as well. The parallel extends to failures: if a major language model goes down, it causes a kind of "power outage" in the global intelligence supply, paralyzing systems that rely on it.
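As a rough illustration of the utility analogy, the sketch below models metered, pay-per-token billing and failover between providers, the way a grid reroutes power around an outage. The `query_*` functions and the price constant are invented placeholders, not real SDK calls or actual rates.

```python
PRICE_PER_MILLION_TOKENS = 0.50  # illustrative rate, not a real price quote

def estimated_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Metered billing, analogous to a kilowatt-hour meter."""
    return (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_MILLION_TOKENS

def query_openai(prompt: str) -> str:
    raise ConnectionError("stub: imagine this provider is down")

def query_anthropic(prompt: str) -> str:
    raise ConnectionError("stub: this one too")

def query_gemini(prompt: str) -> str:
    return f"answer to: {prompt}"  # stub: this provider happens to be up

def ask(prompt: str) -> str:
    """Fail over so that one provider's outage is not a global 'blackout'."""
    for provider in (query_openai, query_anthropic, query_gemini):
        try:
            return provider(prompt)
        except ConnectionError:
            continue  # reroute to the next provider
    raise RuntimeError("all intelligence providers are unavailable")

print(ask("summarize this document"))
print(f"${estimated_cost(1200, 300):.6f}")  # cost of a 1,500-token exchange
```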
On the other hand, LLMs are increasingly reminiscent of operating systems. They are no longer simple commodities like water or electricity, but complex software ecosystems, growing ever more sophisticated through tool use and multimodality (text, image, and audio processing). And just as Windows and macOS have an open-source alternative in Linux, the LLM field pits closed-source models (e.g., GPT, Claude, Gemini) against open-source initiatives (e.g., the LLaMA ecosystem).
For now, LLM computation remains expensive, which keeps models centralized in the cloud. This is reminiscent of the 1960s, when computers were similarly centralized and accessed through time-sharing systems. The "personal computer revolution" has not yet happened for LLMs, although the first signs of local, "personal LLMs" are already appearing. It is also telling that users still talk to LLMs directly through a text interface, much as one talks to an operating system through a terminal: a general-purpose graphical user interface (GUI) for LLMs has not yet been invented.
However, there is also a significant difference: LLMs are disrupting the traditional direction of technology diffusion. In the past, new, transformative technologies (such as electricity and computers) were first used by governments and large corporations and later spread to consumers. In the case of LLMs, however, the opposite is true: it is the general public, everyday users, who are the first and most intensive users of these new “magical computers.”
The psychology of LLMs and “partially autonomous applications”
To understand how LLMs behave, it is worth examining their "psychology." According to Karpathy, LLMs are like stochastic simulations of the human mind. Trained on vast amounts of text, they possess encyclopedic knowledge and can recall an enormous amount, like a kind of "Rain Man," yet they also have significant cognitive deficits. They tend to hallucinate, inventing things, and they lack reliable self-knowledge. Their intelligence is "jagged": in some areas they perform at superhuman levels, while elsewhere they make elementary mistakes no human would, insisting, for example, that 9.11 is greater than 9.9. They also suffer from a kind of anterograde amnesia: their context window (their current "working memory") is wiped after each interaction, and unlike humans, they cannot consolidate what they learn on their own over time.
This duality—superpowers and cognitive deficits—fundamentally defines collaboration with LLMs. That is why, Karpathy emphasizes, “partially autonomous applications” are the future. This means that LLMs do not function as fully autonomous agents, but rather assist users in performing tasks while humans continue to supervise and control the processes.
The coding application Cursor is an excellent example: rather than having the user copy code back and forth into a chatbot, it integrates the LLM directly into the development environment. Such applications share the following characteristics (illustrated in the sketch after the list):
- Context management: the application automatically assembles the relevant information and context for the LLM.
- Multiple LLM invocations: several models are orchestrated in the background, such as embedding models, chat models, and models that apply edits to code files.
- Application-specific graphical user interface (GUI): visual interfaces are critical because they let humans audit the system's work and act far more quickly. An interface that displays code changes visually, for instance, is much more effective than a long textual description.
- Autonomy slider: lets the user control how much autonomy the LLM is granted. In Cursor, for example, you can choose between completion, modifying a code snippet, modifying an entire file, or even transforming the whole code base automatically.
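Putting these characteristics together, the following hypothetical skeleton shows how such a partially autonomous app might wire context management, multiple model calls, a GUI for auditing, and an autonomy slider into one loop. Every name here is an illustrative placeholder, not Cursor's actual architecture or API.

```python
from enum import Enum

class Autonomy(Enum):
    COMPLETE = 1  # suggest a completion at the cursor
    SNIPPET = 2   # rewrite only the selected code
    FILE = 3      # modify one whole file
    REPO = 4      # propose changes across the entire code base

def gather_context(request: str) -> str:
    return "relevant files selected via an embedding model"  # stub

def propose_diff(request: str, context: str, scope: Autonomy) -> str:
    return "- old line\n+ new line"  # stub for the chat and code-editing models

def human_approves(diff: str) -> bool:
    return True  # stub: in a real app, the human clicks accept or reject

def assist(request: str, level: Autonomy) -> None:
    context = gather_context(request)              # context management
    diff = propose_diff(request, context, level)   # multiple LLM invocations
    print(f"[GUI] diff for human audit:\n{diff}")  # application-specific GUI
    if human_approves(diff):                       # human stays in the loop
        print(f"applied at autonomy level {level.name}")

assist("rename the config loader", Autonomy.SNIPPET)
```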
The search application Perplexity works on a similar principle, with cited sources and selectable levels of search depth.
Collaboration with artificial intelligence
One of the most important aspects of collaborating with artificial intelligence is accelerating the generation-verification loop: the AI generates, the human verifies; that is the division of labor. There are two main ways to speed up this loop:
- Speeding up verification: graphical user interfaces (GUIs) are extremely important here, since the human brain processes visual information much faster than text.
- Keeping AI on a leash: the AI must not generate changes that are too large to manage (e.g., a 10,000-line code modification), because human verification will remain the bottleneck; a minimal guard sketch follows this list. As Tesla experienced while developing its self-driving systems, partial autonomy had to be introduced step by step under close control, and even today it involves substantial human intervention.
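One simple way to enforce the "leash," assuming patches arrive in unified-diff format, is to bound the size of any generated change before it ever reaches a human reviewer. The threshold below is an arbitrary illustration, not a recommended value.

```python
MAX_CHANGED_LINES = 200  # illustrative threshold, not a recommended value

def changed_lines(diff: str) -> int:
    """Count added/removed lines in a unified diff, ignoring file headers."""
    return sum(
        1 for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

def accept_patch(diff: str) -> None:
    """Reject patches too large for a human to realistically audit."""
    if changed_lines(diff) > MAX_CHANGED_LINES:
        raise ValueError(
            "patch too large to verify; ask the model for smaller, incremental edits"
        )
    print("patch staged for human review")  # verification stays with the human
```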
Karpathy warns against excessive optimism about AI agents. While it is fashionable to declare that "2025 is the year of AI agents," Karpathy believes we are more likely facing a decade of agents, throughout which human supervision will remain essential. His "Iron Man suit" analogy captures the approach well: we should be building not Iron Man robots but Iron Man suits that augment human capabilities. The goal is to create partially autonomous products with purpose-built GUIs and user interfaces, where an autonomy slider regulates how much the system may intervene. This ensures that artificial intelligence provides effective assistance while humans remain in control of the process and keep the risk of errors to a minimum.