Researchers gave AI an 'inner monologue' and it massively improved its performance
Scientists trained an AI system to think before speaking with a technique called Quiet-STaR. The inner monologue improved common-sense reasoning and nearly doubled math performance.
Giving artificial intelligence (AI) systems an "inner monologue" makes them considerably better at reasoning, new research shows.
The method trains AI systems to think before they respond to prompts, just as many people consider what they should say next before they speak. This differs from the way scientists have trained mainstream AI chatbots, like ChatGPT, which don't "think" about what they write or anticipate different possibilities for the next steps in a conversation.
Dubbed "Quiet-STaR," the new method instructs an AI system to generate many inner rationales in parallel before responding to a conversational prompt. When the AI answers prompts, it generates a mixture of these predictions with and without a rationale, printing the best answer — which can be verified by a human participant depending on the nature of the question.
Finally, it learns by discarding rationales that proved incorrect. In effect, the training method gives AI agents the capacity to anticipate future conversations and learn from ongoing ones.
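To make that generate-mix-discard loop concrete, here is a minimal Python sketch. It is not the researchers' code: the two scoring functions are random stand-ins for a language model's answer probabilities, the fixed `mixing_weight` is an assumption (the paper learns this weight), and the keep-or-discard rule is a simplification of the paper's reinforcement-style update.

```python
import random

# Toy sketch of the Quiet-STaR loop described above. NOT the authors'
# implementation: the scoring functions are random stand-ins for a
# language model's token probabilities, and `mixing_weight` is fixed
# here even though the paper learns it.

def score_without_rationale(question: str, answer: str) -> float:
    """Stand-in for the model's confidence in `answer` given only the question."""
    return random.random()

def score_with_rationale(question: str, rationale: str, answer: str) -> float:
    """Stand-in for the model's confidence after 'thinking' the rationale first."""
    return random.random()

def quiet_star_step(question: str, correct_answer: str,
                    num_rationales: int = 4, mixing_weight: float = 0.5):
    # 1. Generate several candidate inner rationales in parallel.
    #    (A real system would sample these token-by-token from the LLM.)
    rationales = [f"hypothetical-rationale-{i}" for i in range(num_rationales)]

    base = score_without_rationale(question, correct_answer)
    kept = []
    for rationale in rationales:
        with_r = score_with_rationale(question, rationale, correct_answer)
        # 2. Mix the with-rationale and without-rationale predictions.
        mixed = mixing_weight * with_r + (1.0 - mixing_weight) * base
        # 3. Keep rationales that raised confidence in the correct answer;
        #    discard the ones that didn't help.
        if with_r > base:
            kept.append((rationale, mixed))
    return base, kept

if __name__ == "__main__":
    base, useful = quiet_star_step("What is 6 * 7?", "42")
    print(f"confidence without thinking: {base:.2f}")
    for rationale, mixed in useful:
        print(f"kept {rationale} (mixed confidence {mixed:.2f})")
```

The point of the mixture is that a rationale only influences the output to the degree that it actually helps, so unhelpful "thoughts" are cheap to throw away.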
The researchers applied the Quiet-STaR algorithm to Mistral 7B, an open-source large language model (LLM), and posted the results March 14 to the pre-print database arXiv. (The paper has not yet been peer-reviewed.)
The Quiet-STaR-trained version of Mistral 7B scored 47.2% on a reasoning test, up from 36.3% before the Quiet-STaR training. It still flunked a school math test, earning a score of 10.9%. But that was nearly double the vanilla version's starting score of 5.9%.
Models like ChatGPT and Gemini are built from neural networks, collections of machine learning algorithms arranged in a way that mimics the structure and learning patterns of the human brain. However, systems built on this architecture are abysmal at common-sense reasoning and contextualization, and AI chatbots do not have genuine "understanding."
Past attempts to improve the reasoning capabilities of LLMs have been highly domain-specific and could not be applied across different types of AI models.
The self-taught reasoner (STaR) algorithm, which the researchers used as the basis for their work, is one example of such a training method, and it is held back by the same limitations.
The scientists who developed Quiet-STaR chose the name because STaR's principles can be applied quietly in the background and generalize across many different types of LLM, independent of the original training data. Next, they want to investigate how techniques like theirs can close the gap between neural network-based AI systems and human-like reasoning capabilities.

Keumars is the technology editor at Live Science. He has written for a variety of publications including ITPro, The Week Digital, ComputerActive, The Independent, The Observer, Metro and TechRadar Pro. He has worked as a technology journalist for more than five years, having previously held the role of features editor with ITPro. He is an NCTJ-qualified journalist and has a degree in biomedical sciences from Queen Mary, University of London. He's also registered as a foundational chartered manager with the Chartered Management Institute (CMI), having qualified as a Level 3 Team leader with distinction in 2023.
