AI chatbots need to be much better at remembering things. Have scientists just cracked their terrible memory problem?

Chatbots like ChatGPT begin to fail if a conversation runs long enough, and they haven't yet been able to remember details between separate conversations. (Image credit: Eoneren via Getty Images)

Artificial intelligence (AI) chatbots are terrible at remembering things — both between separate conversations and even during the same conversation. But two recent breakthroughs might completely change this.

If you talk to a large language model (LLM) like OpenAI's ChatGPT for long enough, it will begin to forget crucial pieces of information — especially if the conversation stretches on for more than 4 million words of input. Its performance then begins to deteriorate rapidly. 

Meanwhile, ChatGPT and other LLMs can't retain information between conversations. For example, if you finish one conversation and open ChatGPT again a week later, the chatbot won't remember anything from the previous exchange.

But two separate teams have potentially found solutions to these memory issues. A team of scientists led by the Massachusetts Institute of Technology (MIT) have pinpointed the reason AI forgets things mid-conversation and come up with a method to fix it, while developers at OpenAI have begun testing long-term memory, in which you can tell ChatGPT to remember parts of conversations, ask it what it remembers and later tell it to forget something — or wipe its memory completely. 

Improving mid-conversation performance 

The scientists found that they could improve chatbots' mid-conversation recall by changing how the key-value cache (in effect, the chatbot's short-term memory) stores and replaces tokens, where one token is a chunk of input text. The scientists dubbed their new approach "StreamingLLM" and presented their findings in a paper published Dec. 12, 2023, on the preprint server arXiv.

A chatbot's memory is limited, so it evicts the oldest tokens and replaces them with newer tokens as the conversation continues. With StreamingLLM applied, the model instead always keeps the first four tokens and evicts from the fifth token onward. It will still forget things, because its memory remains limited, but it will remember the very first interactions.
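The eviction rule described above can be illustrated with a toy cache. This is a minimal sketch, not the researchers' code: the class name `SinkCache`, the window size of six and the use of plain integers as stand-ins for tokens are all assumptions made for illustration. A real key-value cache stores tensors for each token, not the tokens themselves.

```python
from collections import deque


class SinkCache:
    """Toy token cache (hypothetical, for illustration only).

    Keeps the first `num_sinks` tokens permanently, plus a sliding
    window holding the `window` most recent tokens.
    """

    def __init__(self, num_sinks=4, window=6):
        self.num_sinks = num_sinks
        self.sinks = []                     # earliest tokens, never evicted
        self.recent = deque(maxlen=window)  # oldest entry is dropped when full

    def add(self, token):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token)
        else:
            self.recent.append(token)

    def contents(self):
        return self.sinks + list(self.recent)


cache = SinkCache(num_sinks=4, window=6)
for token in range(15):  # integers 0..14 stand in for real tokens
    cache.add(token)

print(cache.contents())  # [0, 1, 2, 3, 9, 10, 11, 12, 13, 14]
```

Tokens 4 through 8 have been evicted, while the first four survive alongside the six most recent ones.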

The order of the tokens (and whether they are labeled first, second, third, and so on) also matters because they feed into an "attention map" for the active conversation. This maps out how strongly each token relates to other tokens.

For example, if the fifth token is evicted, you may expect the sixth token to become the new fifth token. But for StreamingLLM to work, tokens must remain encoded as they were originally. In this example, the sixth token must not be encoded as the new "fifth" token just because it is now fifth in line — but remain encoded as the sixth token. 

Tokens feed into an "attention map" for each conversation, with the AI chatbot forging links between tokens and determining their relevance to one another. (Image credit: Andriy Onufriyenko via Getty Images)
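To make the idea of an "attention map" concrete, here is a small sketch of how such a map can be computed. It is an assumption-laden simplification: the function name `attention_map` and the two-number vectors are invented for illustration, and each token's vector is used as both its query and its key, whereas a real LLM learns separate projections for each.

```python
import math


def attention_map(vectors):
    """Return one row of attention weights per token; each row sums to 1.

    Simplification: the same vector serves as query and key.
    """
    dim = len(vectors[0])
    rows = []
    for i, query in enumerate(vectors):
        # A token can only attend to itself and to earlier tokens.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors[: i + 1]
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        padding = [0.0] * (len(vectors) - i - 1)  # later tokens get zero weight
        rows.append([e / total for e in exps] + padding)
    return rows


# Three made-up token vectors.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_map(toy_tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row shows how strongly one token relates to the tokens before it. The first token can only attend to itself, so its row puts all of its weight in the first position.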

These two changes mean a chatbot performs just as effectively beyond 4 million words as it did before, the scientists said in their paper. It's also 22 times faster than another short-term memory method that avoids performance crashing by constantly recomputing part of the earlier conversation.

"Now, with this method, we can persistently deploy these large language models. By making a chatbot that we can always chat with, and that can always respond to us based on our recent conversations, we could use these chatbots in some new applications," said study lead author Guangxuan Xiao, an electrical engineering and computer science graduate student at MIT, in a statement.

StreamingLLM has already been incorporated into TensorRT-LLM, Nvidia's open-source library for optimizing LLMs, which developers use as a foundation for their own AI models. The researchers also plan to improve StreamingLLM by designing it to find and reincorporate evicted tokens if they're needed again.

ChatGPT will never forget

OpenAI is also testing a method to improve ChatGPT's long-term memory, so that users can continue conversations and effectively build a working relationship with the AI chatbot.

When conversing with the LLM, users can ask ChatGPT to remember something specific or grant it autonomy to remember elements of the conversation that it deems appropriate to store for later. These memories are not linked with specific conversations, so deleting chats does not erase memories; each memory must be deleted in a separate interface. Unless memories are manually deleted, ChatGPT starts each new chat pre-loaded with them.

OpenAI provided several examples of how this would be useful. In one example, the chatbot remembers that a kindergarten teacher with 25 students prefers 50-minute lessons with follow-up activities, and recalls this information when helping them create a lesson plan. In another, somebody tells ChatGPT their toddler loves jellyfish — and the AI tool remembers this when designing a birthday card for them. 

The company has rolled out the new memory features to a small portion of ChatGPT users, representatives said in a statement on Feb. 13, ahead of a planned broader rollout to all users. 

OpenAI will use information from memories to improve its models, company representatives said in the statement. They added, however, that scientists are taking steps to assess and mitigate biases and prevent ChatGPT from remembering sensitive information like health details unless a user explicitly asks it to. Users with memory access can also use a "temporary chat" in which memory is deactivated entirely. 

Keumars Afifi-Sabet
Channel Editor, Technology
