All it takes is a photo and a voice recording – Alibaba's new artificial intelligence creates a full-body avatar from them

A single voice recording and a photo are enough to create lifelike, full-body virtual characters with facial expressions and emotions – no studio, actors, or green screen required. Alibaba's latest development, an open-source artificial intelligence model called OmniAvatar, promises to do just that. The technology is still evolving, but what it already enables – and the new questions it raises – deserves attention.

OmniAvatar is based on a multimodal learning approach: the model processes voice, image, and text prompts simultaneously. It breaks the speech down into smaller units and infers from them the emotional charge, emphasis, and rhythm of each moment. Conditioned on the supplied image and text prompt, it then generates a moving, talking character whose video reflects those emotions. The system not only synchronizes mouth movements but also harmonizes body language and facial expressions with what is being said; the character can even interact with objects, for example pointing, lifting something, or gesturing.
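To make that flow concrete, here is a minimal Python sketch of such a pipeline. Every name in it (load_pcm, chunk_and_score, generate_video) is hypothetical and the generative stage is a stub: the real OmniAvatar renders video with a generative model, while this toy version only chunks a WAV file and scores each speech unit's loudness as a crude stand-in for the emphasis and emotion features described above.

```python
"""Minimal, illustrative sketch of an audio-driven avatar pipeline.

All names are hypothetical; this is not OmniAvatar's actual API.
"""
import struct
import wave


def load_pcm(path: str) -> tuple[list[int], int]:
    """Read 16-bit mono PCM samples and the sample rate from a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    return list(struct.unpack(f"<{len(raw) // 2}h", raw)), rate


def chunk_and_score(samples: list[int], rate: int, ms: int = 40) -> list[float]:
    """Split speech into short units and score each by RMS energy –
    a crude stand-in for the emphasis/emotion features the model infers."""
    step = rate * ms // 1000
    return [
        (sum(s * s for s in samples[i:i + step]) / len(samples[i:i + step])) ** 0.5
        for i in range(0, len(samples), step)
    ]


def generate_video(photo: str, prompt: str, emphasis: list[float]) -> None:
    """Stub for the generative stage: the real system renders frames
    conditioned on the photo, the text prompt, and the audio features."""
    for t, e in enumerate(emphasis[:5]):
        print(f"frame {t}: photo={photo}, prompt={prompt!r}, emphasis={e:.0f}")


if __name__ == "__main__":
    samples, rate = load_pcm("speech.wav")
    generate_video("portrait.jpg", "smiling, standing in an office",
                   chunk_and_score(samples, rate))
```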

An important innovation is that the user can control all of this with simple text prompts: we can specify, for example, that the character should smile, be angry, or look surprised, or that the scene should take place in an office or even under a lemon tree. This opens up new possibilities in content creation: educational videos, virtual tours, customer-service role-play, and even singing avatars become easier to produce – without motion capture or actors.
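As an illustration of this kind of prompt-driven control, the snippet below assembles a text command from an emotion, a scene, and an optional action. The prompt grammar the real model expects is not documented in the article, so build_prompt and these strings only mirror the examples given above.

```python
# Hypothetical prompt builder; the strings mirror the article's examples,
# not OmniAvatar's actual prompt format.
def build_prompt(emotion: str, scene: str, action: str | None = None) -> str:
    parts = [f"The character is {emotion}", f"the scene is set {scene}"]
    if action:
        parts.append(f"they {action}")
    return ", ".join(parts) + "."


print(build_prompt("smiling", "in an office"))
print(build_prompt("surprised", "under a lemon tree", "point at a lemon"))
```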

The model's uniqueness, however, lies not only in its technical flexibility but also in the fact that it has been released as open source – a rare step for cutting-edge technology developed inside a large corporation. With this decision, Alibaba and Zhejiang University, which collaborated on the development, give researchers, developers, and creative professionals around the world the chance to experiment with the model, customize it, and even integrate it into their own applications.

It is important to note, however, that the characters in the current demonstration videos are not yet entirely free of artificial artifacts: some observers describe a “plastic” look that falls short of realism. This is not necessarily a disadvantage – the characters may still serve informational, educational, or promotional purposes well, especially where the goal is effective content delivery rather than photorealism. And as the technology matures, this visual limitation may gradually fade.

The research team has published only partial technical documentation of the underlying system, but according to the accompanying scientific communication, the model relies on so-called cross-modal (multimodal) learning: it achieves the rich movement and emotional output seen in the demonstration videos by interpreting audio and visual signals jointly.
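The toy example below shows the core idea behind one common form of such cross-modal conditioning: visual tokens attend to per-chunk audio features, so the generated motion can follow the speech. The shapes, values, and single attention step are deliberately simplified assumptions; a real model uses learned projections and many layers.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(visual: list[list[float]],
                 audio: list[list[float]]) -> list[list[float]]:
    """Replace each visual token with an audio-weighted mixture, where the
    weights come from dot-product similarity between the two modalities."""
    out = []
    for v in visual:
        weights = softmax([sum(a * b for a, b in zip(v, au)) for au in audio])
        out.append([sum(w * au[d] for w, au in zip(weights, audio))
                    for d in range(len(v))])
    return out


visual_tokens = [[1.0, 0.0], [0.0, 1.0]]               # e.g. face/body latents
audio_features = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # per-chunk speech features
print(cross_attend(visual_tokens, audio_features))
```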

The technology's future depends on several factors, above all on how natural the avatars can be made to look and how smoothly the model can be integrated into industry workflows. The direction, however, is already clear: digital communication is becoming increasingly automated yet personal, complete with body language and emotion.

Thanks to its accessibility and versatility, the tool offers exciting opportunities for research and practical applications alike. The key question for the coming years is how we use this opportunity: will we manage to integrate it into everyday digital communication in a thoughtful, value-creating way, or will it remain just another spectacular technological promise? The answer is still open – but the tool is already in our hands, and anyone can download it from the official GitHub repository.
