Large language models can be squeezed onto your phone — rather than needing 1000s of servers to run — after breakthrough

Running massive AI models locally on smartphones or laptops may be possible after a new compression algorithm trims down their size — meaning your data never leaves your device. The catch is that it might drain your battery in an hour.

The logos of Google Gemini, ChatGPT, Microsoft Copilot, Claude by Anthropic, Perplexity, and Bing apps are displayed on the screen of a smartphone in Reno, United States, on November 21, 2024.
(Image credit: Jaque Silva/NurPhoto via Getty Images)

Powerful artificial intelligence (AI) models like ChatGPT need copious amounts of power to run so they are usually housed in vast data centers. But a new breakthrough could compress these AI models so they fit onto a smartphone or laptop.

A new algorithm, dubbed Calibration Aware Low precision Decomposition with Low Rank Adaptation (CALDERA), compresses the massive amounts of data needed to run a large language model (LLM) by trimming redundancies in the code and reducing the precision of its layers of information.

Keumars Afifi-Sabet
Channel Editor, Technology

Keumars is the technology editor at Live Science. He has written for a variety of publications including ITPro, The Week Digital, ComputerActive, The Independent, The Observer, Metro and TechRadar Pro. He has worked as a technology journalist for more than five years, having previously held the role of features editor with ITPro. He is an NCTJ-qualified journalist and has a degree in biomedical sciences from Queen Mary, University of London. He's also registered as a foundational chartered manager with the Chartered Management Institute (CMI), having qualified as a Level 3 Team leader with distinction in 2023.