AI-powered humanoid robot can serve you food, stack the dishes — and have a conversation with you

Figure 01 (left) hands an apple to a technician (right) in a minimalist test environment resembling a kitchen. (Image credit: Figure)

A self-correcting humanoid robot that learned to make a cup of coffee just by watching footage of a human doing it can now answer questions thanks to an integration with OpenAI's technology.

In the new promotional video, a technician asks Figure 01 to perform a range of simple tasks in a minimalist test environment resembling a kitchen. He first asks the robot for something to eat and is handed an apple. Next, while the robot picks up some trash, he asks it to explain why it handed him the apple. The robot answers each question in a robotic but friendly voice.


The company said in its video that the conversation is powered by an integration with technology made by OpenAI, the company behind ChatGPT. It's unlikely that Figure 01 is using ChatGPT itself, however, because that AI tool does not normally use pause words like "um," which this robot does.


If everything in the video works as claimed, it represents an advance in two key areas of robotics. As experts previously told Live Science, the first is the mechanical engineering behind dexterous, self-correcting movements like those people can perform. This requires very precise motors, actuators and grippers inspired by joints or muscles, as well as the motor control to manipulate them so the robot can carry out a task and hold objects delicately.

Even picking up a cup, something people barely think about consciously, requires intensive on-board processing to move muscles in a precise sequence.

The second is real-time natural language processing (NLP), a field of computer science that aims to give machines the capacity to understand and convey speech. In Figure 01, this comes from the addition of OpenAI's engine, which needs to be as immediate and responsive as ChatGPT is when you type a query into it. The robot also needs software to convert the engine's text output into audio, or speech.

Although the footage appears impressive, Live Science remains sceptical. Listen at 0:52 and again at 1:49, when Figure 01 starts a sentence with a quick "uh" and repeats the word "I," just like a human taking a split second to gather their thoughts before speaking. Why (and how) would an AI speech engine include such random, humanlike verbal tics? The overall inflection is also suspiciously imperfect, too close to the natural, unconscious cadence humans use in speech.

We suspect the video may have been pre-recorded to showcase what Figure is working on, rather than captured during a live field test. But if, as the video caption claims, everything really is the result of a neural network and Figure 01 is genuinely responding in real time, we've just taken another giant leap toward the future.

Drew Turney

Drew is a freelance science and technology journalist with 20 years of experience. After growing up knowing he wanted to change the world, he realized it was easier to write about other people changing it instead. As an expert in science and technology for decades, he’s written everything from reviews of the latest smartphones to deep dives into data centers, cloud computing, security, AI, mixed reality and everything in between.