Robots receive major intelligence boost thanks to Google DeepMind's 'thinking AI' — a pair of models that help machines understand the world
Two new AI models allow robots to perform complex, multistep tasks in a way that they couldn't previously.

Google DeepMind has unveiled a pair of artificial intelligence (AI) models that will enable robots to perform complex general tasks and reason in a way that was previously impossible.
Earlier this year, the company revealed the first iteration of Gemini Robotics, an AI model based on its Gemini large language model (LLM) — but specialized for robotics. This allowed machines to reason and perform simple tasks in physical spaces.
The new models, dubbed Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, greatly expand on the original version's capabilities, handling multistep, "long-horizon" tasks and marking a significant milestone toward robots that can assist people in real-world settings.
The baseline example Google points to is the banana test. The original AI model could take a simple instruction, such as "place this banana in the basket," and guide a robotic arm to complete that command.
Powered by the two new models, a robot can now take a selection of fruit and sort them into individual containers based on color. In one demonstration, a pair of robotic arms (the company's Aloha 2 robot) accurately sorts a banana, an apple and a lime onto three plates of the appropriate color. Further, the robot explains in natural language what it's doing and why as it performs the task.
"We enable it to think," said Jie Tan, a senior staff research scientist at DeepMind, in the video. "It can perceive the environment, think step-by-step and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks."
AI-powered robotics of tomorrow
While the demonstration may seem simple on the surface, it draws on a number of sophisticated capabilities: the robot must spatially locate the fruit and the plates, identify each object and its color, match the fruit to the plates according to their shared characteristics and describe its reasoning in natural language as it works.
It's all possible because of the way the newest iterations of the AI models interact. They work together in much the same way a supervisor and worker do.
Gemini Robotics-ER 1.5 (the "brain") is a vision-language model (VLM) that gathers information about a space and the objects within it, processes natural language commands and can use advanced reasoning and tools to send instructions to Gemini Robotics 1.5 (the "hands and eyes"), a vision-language-action (VLA) model. Gemini Robotics 1.5 grounds those instructions in its own visual understanding of the space and builds a plan before executing them, providing feedback about its process and reasoning throughout.
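To make that division of labor concrete, the sketch below shows one way a planner model and an action model might be wired together in Python. It is a hypothetical illustration only: the class names, methods and hard-coded plan are stand-ins rather than Google's actual API, and a real system would reason over camera images and motor commands rather than strings.

from dataclasses import dataclass

@dataclass
class Step:
    """One small, executable instruction produced by the planner."""
    description: str

class PlannerModel:
    """Stand-in for the reasoning "brain" (the VLM): it reads the scene and
    the user's request and emits a step-by-step plan in natural language."""
    def plan(self, instruction: str, scene: list) -> list:
        # A real model would reason over camera images and could call tools
        # such as web search; here we fake a plan for the fruit-sorting demo.
        return [Step(f"pick up the {obj} and place it on the matching plate")
                for obj in scene]

class ActionModel:
    """Stand-in for the "hands and eyes" (the VLA): it grounds each step in
    its own visual understanding and drives the arms, narrating as it goes."""
    def execute(self, step: Step) -> str:
        # A real VLA would output motor commands; this sketch just narrates.
        return "Executing: " + step.description

def run_task(instruction: str, scene: list) -> None:
    planner, actor = PlannerModel(), ActionModel()
    for step in planner.plan(instruction, scene):
        print(actor.execute(step))  # feedback at every step, as in the demo

if __name__ == "__main__":
    run_task("sort the fruit onto plates of the same color",
             ["banana", "apple", "lime"])

In Google's system the planning happens inside Gemini Robotics-ER 1.5 and the grounded execution inside Gemini Robotics 1.5, with natural-language feedback flowing back throughout, but the basic plan-then-act loop is the same idea.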
The two models are more capable than previous versions and can use tools like Google Search to complete tasks.
The team demonstrated this capacity by having a researcher ask the Aloha 2 robot to sort some objects into compost, recycling and trash bins according to the recycling rules for her location. The robot recognized that the user was in San Francisco and found the local recycling rules on the internet, which helped it sort the trash accurately into the appropriate receptacles.
Another advance in the new models is the ability to learn, and apply that learning, across multiple robotic systems. DeepMind representatives said in a statement that skills learned on its Aloha 2 robot (the pair of robotic arms), its Apollo humanoid robot or its bi-arm Franka robot can be applied to any of the other systems, thanks to the generalized way the models learn and evolve.
"General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control," the Gemini Robotics Team said in a technical report on the new models. That kind of generalized reasoning means that the models can approach a problem with a broad understanding of physical spaces and interactions and problem-solve accordingly, breaking tasks down into small, individual steps that can be easily executed. This contrasts with earlier approaches, which relied on specialized knowledge that only applied to very specific, narrow situations and individual robots.
The scientists offered another example of how the robots could help in a real-world scenario. They presented an Apollo robot with two bins and asked it to sort clothes by color, with whites going into one bin and other colors into the other. They then raised the difficulty mid-task by moving the clothes and bins around, forcing the robot to reassess the physical space and react accordingly, which it did successfully.

Alan is a freelance tech and entertainment journalist who specializes in computers, laptops, and video games. He's previously written for sites like PC Gamer, GamesRadar, and Rolling Stone. If you need advice on tech, or help finding the best tech deals, Alan is your man.