This Animated Mona Lisa Was Created by AI, and It Is Terrifying

A new type of artificial intelligence can generate a "living portrait" from just one image. (Image credit: Egor Zakharov)

The enigmatic, painted smile of the "Mona Lisa" is known around the world, but that famous face recently displayed a startling new range of expressions, courtesy of artificial intelligence (AI).

In a video shared to YouTube on May 21, three clips show disconcerting examples of the "Mona Lisa" moving her lips and turning her head. The animations were created by a convolutional neural network, a type of AI that analyzes and processes visual information much as a human brain does.

Researchers trained the algorithm to understand facial features' general shapes and how they behave relative to each other, and then to apply that information to still images. The result was a realistic video sequence of new facial expressions from a single frame. [Can Machines Be Creative? Meet 9 AI 'Artists']
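Those "general shapes" are typically represented as facial landmarks: a fixed set of points marking the eyes, brows, nose, mouth and jawline that can be located on any face. The snippet below is a rough illustration of that idea rather than the researchers' own code; it assumes the open-source dlib library and its separately downloaded 68-point landmark model.

```python
import dlib
import numpy as np

# Assumed local copy of dlib's pretrained 68-point landmark model
# (downloaded separately; not part of the study's code).
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()       # finds face bounding boxes
predictor = dlib.shape_predictor(PREDICTOR_PATH)  # fits 68 landmark points

def extract_landmarks(image: np.ndarray) -> np.ndarray:
    """Return a (68, 2) array of landmark coordinates for the first face found."""
    faces = detector(image, 1)  # upsample once to catch smaller faces
    if not faces:
        raise ValueError("No face detected in the image.")
    shape = predictor(image, faces[0])
    return np.array([(point.x, point.y) for point in shape.parts()])
```

Tracking how those points move from frame to frame in training videos gives a network a compact, face-agnostic description of expression and head pose, which it can then apply to a new still image.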

For the Mona Lisa videos, the AI "learned" facial movement from datasets of three human subjects, producing three very different animations. While each of the three clips was still recognizable as the Mona Lisa, variations in the training subjects' looks and behavior lent distinct "personalities" to the "living portraits," Egor Zakharov, an engineer with the Skolkovo Institute of Science and Technology and the Samsung AI Center (both located in Moscow), explained in the video.

Zakharov and his colleagues also generated animations from photos of 20th-century cultural icons such as Albert Einstein, Marilyn Monroe and Salvador Dalí. The researchers described their findings, which have not been peer-reviewed, in a study published online May 20 on the preprint server arXiv.

Familiar faces take on unfamiliar expressions. (Image credit: E. Zakharov et al.)

Producing original videos such as these, known as deepfakes, isn't easy. Human heads are geometrically complex and highly dynamic; 3D models of heads have "tens of millions of parameters," the study authors wrote.

What's more, the human visual system is very good at identifying "even minor mistakes" in 3D-modeled human heads, according to the study. Seeing something that looks almost human — but not quite — triggers a sensation of profound unease known as the uncanny valley effect.

Previous work has shown that AI can produce convincing deepfakes, but earlier methods required footage of the desired subject from multiple angles. For the new study, the engineers introduced the AI to a very large dataset of reference videos showing human faces in action. The scientists established facial landmarks that would apply to any face, to teach the neural network how faces behave in general.

Then, they trained the AI to use the reference expressions to map movement of the source's features. This enabled the AI to create a deepfake even when it had just one image to work from, the researchers reported.
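Conceptually, that pipeline splits into two pieces: an "embedder" network that squeezes one or more source images into a vector describing what the person looks like, and a generator network that turns a landmark drawing of the desired pose into a photorealistic frame of that person. The PyTorch sketch below is a deliberately tiny mock-up of that structure, with made-up layer sizes; the published model uses far larger, adversarially trained networks.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Maps a source face image to an identity embedding vector."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)  # (batch, embed_dim)

class Generator(nn.Module):
    """Turns a rasterized landmark sketch plus an identity embedding into a face frame."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.inject = nn.Linear(embed_dim, 64)  # mixes identity into image features
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, landmark_img: torch.Tensor, identity: torch.Tensor) -> torch.Tensor:
        feats = self.encode(landmark_img)                        # (B, 64, H/2, W/2)
        feats = feats + self.inject(identity)[:, :, None, None]  # broadcast identity
        return self.decode(feats)                                # (B, 3, H, W) output frame
```

Feeding the generator a sequence of landmark sketches taken from a driving video, alongside the Mona Lisa's embedding, yields one output frame per sketch; strung together, those frames form the animation.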

And more source images delivered an even more detailed final animation. Videos created from 32 images, rather than just one, achieved "perfect realism" in a user study, the scientists wrote.
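One way to read that result: with more source frames available, the identity embedding can be averaged over many views of the face, so it captures the person more faithfully before any per-subject fine-tuning. Below is a minimal sketch of that averaging step, reusing the hypothetical Embedder from the mock-up above.

```python
import torch

def identity_from_frames(embedder: "Embedder", frames: torch.Tensor) -> torch.Tensor:
    """Average per-frame embeddings over K source frames (K may be 1 or 32)."""
    with torch.no_grad():
        per_frame = embedder(frames)             # (K, embed_dim)
    return per_frame.mean(dim=0, keepdim=True)   # (1, embed_dim) identity vector
```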

Originally published on Live Science.

Mindy Weisberger
Live Science Contributor

Mindy Weisberger is an editor at Scholastic and a former Live Science channel editor and senior writer. She has reported on general science, covering climate change, paleontology, biology, and space. Mindy studied film at Columbia University; prior to Live Science she produced, wrote and directed media for the American Museum of Natural History in New York City. Her videos about dinosaurs, astrophysics, biodiversity and evolution appear in museums and science centers worldwide, earning awards such as the CINE Golden Eagle and the Communicator Award of Excellence. Her writing has also appeared in Scientific American, The Washington Post and How It Works Magazine.