Why is DeepSeek such a game-changer? Scientists explain how the AI models work and why they were so cheap to build.

DeepSeek is a new artificial intelligence (AI) model from China.

(Image credit: Thomas Fuller/SOPA Images/LightRocket via Getty Images)

Less than two weeks ago, a scarcely known Chinese company released its latest artificial intelligence (AI) model and sent shockwaves around the world.

DeepSeek claimed in a technical paper uploaded to GitHub that its open-weight R1 model achieved results comparable to, or better than, those of AI models made by leading Silicon Valley giants, namely OpenAI's ChatGPT, Meta's Llama and Anthropic's Claude. Most staggeringly, the model achieved these results while being trained and run at a fraction of the cost.

Nvidia, which makes the high-end H100 graphics chips presumed essential for AI training, lost $589 billion in market value, the biggest one-day loss in U.S. history. DeepSeek, after all, said it trained its AI model without them, though it did use less-powerful Nvidia chips. U.S. tech companies responded with panic and ire, with OpenAI representatives even suggesting that DeepSeek plagiarized parts of its models.

Experts cite DeepSeek's efficiency, which extends to how its models were trained, as an unintended consequence of U.S. export restrictions. China's access to Nvidia's state-of-the-art H100 chips is limited, so DeepSeek says it built its models using H800 chips instead, which have a reduced chip-to-chip data transfer rate. Nvidia designed this "weaker" chip in 2023 specifically to circumvent the export controls.
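To make the bandwidth handicap concrete, here is a back-of-the-envelope sketch in Python. It assumes the commonly reported NVLink figures of roughly 900 gigabytes per second for the H100 and 400 GB/s for the H800, plus a hypothetical payload of one billion 16-bit numbers; latency and any overlap with computation are ignored, so this is an illustration rather than a model of real training traffic.

```python
# Rough chip-to-chip transfer times. Bandwidths are commonly reported NVLink
# figures in gigabytes per second; treat them as assumptions, not official specs.
NVLINK_GB_PER_S = {"H100": 900.0, "H800": 400.0}


def transfer_seconds(num_values: int, bytes_per_value: int, gb_per_s: float) -> float:
    """Time to move num_values numbers at a given bandwidth (ignores latency)."""
    return num_values * bytes_per_value / (gb_per_s * 1e9)


payload = 1_000_000_000  # hypothetical: one billion 16-bit (2-byte) values
for chip, bandwidth in NVLINK_GB_PER_S.items():
    ms = transfer_seconds(payload, 2, bandwidth) * 1000
    print(f"{chip}: {ms:.1f} ms")
# prints "H100: 2.2 ms" then "H800: 5.0 ms"
```

Under these assumptions the H800 needs 2.25 times as long to move the same data, which is why shrinking the amount of data exchanged between chips matters so much on this hardware.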

The need to use these less-powerful chips pushed DeepSeek toward another significant breakthrough: its mixed-precision framework. Instead of representing all of its model's weights (the numbers that set the strength of the connections between an AI model's artificial neurons) as 32-bit floating-point numbers (FP32), it trained parts of its model with less-precise 8-bit numbers (FP8), switching to 32 bits only for harder calculations where accuracy matters.
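The idea behind mixed precision can be sketched in plain Python. This is an illustration under simplifying assumptions, not DeepSeek's implementation: it rounds numbers to the E4M3 flavor of FP8 (4 exponent bits, 3 mantissa bits, largest finite value 448), uses a single scale factor per vector to fit values into that narrow range, and keeps the running sum of a dot product in FP32 so rounding errors do not pile up in 8 bits. The helper names `quantize_e4m3` and `fp8_dot` are hypothetical, and real systems typically use finer-grained scaling than one factor per vector.

```python
import math
import struct

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 FP8 format


def to_fp32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(a)           # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)           # -6 is the smallest normal exponent; below it, subnormals
    step = 2.0 ** (exp - 3)        # spacing between neighbors with 3 mantissa bits
    q = round(a / step) * step     # round to nearest, ties to even
    return sign * min(q, FP8_E4M3_MAX)


def fp8_dot(weights, activations):
    """Dot product with FP8-rounded inputs and FP32 accumulation (simplified sketch)."""
    # Assumption: one scale per vector maps its largest magnitude onto 448.
    w_scale = FP8_E4M3_MAX / (max(abs(w) for w in weights) or 1.0)
    a_scale = FP8_E4M3_MAX / (max(abs(a) for a in activations) or 1.0)
    acc = 0.0
    for w, a in zip(weights, activations):
        qw = quantize_e4m3(w * w_scale)
        qa = quantize_e4m3(a * a_scale)
        acc = to_fp32(acc + to_fp32(qw * qa))  # the running sum stays in FP32
    return to_fp32(acc / (w_scale * a_scale))  # undo the scaling


weights = [0.12, -0.53, 0.91, 0.004]
activations = [1.5, 0.25, -2.0, 3.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = fp8_dot(weights, activations)
print(f"exact={exact:.4f}  fp8={approx:.4f}")
```

With only 3 mantissa bits, each rounding step can be off by up to about 6 percent, and in this example the FP8 result lands a few percent away from the exact value. That loss is tolerable for bulk arithmetic but not for sensitive steps, which is why those stay in higher precision.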

All of this adds up to a startlingly efficient pair of models, V3 and R1. While DeepSeek's competitors spend tens to hundreds of millions of dollars on training, often over several months, DeepSeek representatives say the company trained V3 in two months for just $5.58 million. DeepSeek V3 is similarly cheap to run: 21 times cheaper than Anthropic's Claude 3.5 Sonnet.
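The headline training cost is simple arithmetic. DeepSeek's V3 technical report put the final training run at about 2.788 million H800 GPU-hours on a cluster of 2,048 H800 chips and priced those hours at an assumed rental rate of $2 per GPU-hour. The sketch below plugs in those reported figures; they are DeepSeek's own numbers, not independently verified, and they exclude earlier research and experiments. The function names are illustrative.

```python
# Figures as reported by DeepSeek for training V3 (not independently verified).
GPU_HOURS = 2_788_000       # total H800 GPU-hours
PRICE_PER_GPU_HOUR = 2.00   # assumed rental price in U.S. dollars
NUM_GPUS = 2048             # H800 chips in the training cluster


def training_cost(gpu_hours: float, price_per_hour: float) -> float:
    """Total rental cost in dollars."""
    return gpu_hours * price_per_hour


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Elapsed days if every GPU runs in parallel the whole time."""
    return gpu_hours / num_gpus / 24


cost = training_cost(GPU_HOURS, PRICE_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)
print(f"${cost / 1e6:.3f} million over about {days:.0f} days")
# prints "$5.576 million over about 57 days"
```

Rounded, these are the $5.58 million and roughly two months that DeepSeek quotes.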

"As cheaper, more efficient methods for developing cutting-edge AI models become publicly available, they can allow more researchers worldwide to pursue cutting-edge LLM development, potentially speeding up scientific progress and application creation," Cao said. "At the same time, this lower barrier to entry raises new regulatory challenges — beyond just the U.S.-China rivalry — about the misuse or potentially destabilizing effects of advanced AI by state and non-state actors."


Ben Turner is a U.K.-based writer and editor at Live Science. He covers physics and astronomy, tech and climate change. He graduated from University College London with a degree in particle physics before training as a journalist. When he's not writing, Ben enjoys reading literature, playing the guitar and embarrassing himself with chess.
