NEWS They Overtook ChatGPT, Cost Pennies, and Surprised Again. DeepSeek Proves That AI Can Be Cheap

ExcalibuR

The economics of large language models have changed for good.
The Chinese company DeepSeek has released an experimental version of its language model, DeepSeek-V3.2-Exp, the first to implement the company's own variant of sparse attention, a technique that significantly reduces computational cost when processing long text sequences. The new mechanism is called DeepSeek Sparse Attention and, according to the developers, can cut the model's operating costs by almost half. To back up the claimed savings, the company cut its API prices by 50%.

Computational load in large language models becomes particularly acute in long dialogues. The classic Transformer architecture, introduced in 2017, compares each word (token) in the input sequence with every other one, so the number of operations grows quadratically with input length. An input of a thousand words already means a million comparisons, and ten thousand words means a hundred million. This growth makes long sessions resource-intensive and slows responses, because with each new query the system has to re-analyze the entire dialogue history.
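To make the quadratic growth concrete, here is a minimal Python sketch of the arithmetic. The function name `dense_comparisons` is made up for illustration, and the count is the simple "every token against every token" figure used above, not a measurement of any real model.

```python
def dense_comparisons(n_tokens: int) -> int:
    """Token-to-token comparisons when every token is compared with every token."""
    return n_tokens * n_tokens


# Ten times more input means a hundred times more comparisons.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {dense_comparisons(n):>15,} comparisons")
```

The third row (a hundred thousand tokens) is just the same formula pushed one step further than the figures in the text.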

Sparse attention works differently. Instead of matching every word against all the others, it keeps only a limited set of the most significant connections. DeepSeek uses its own mechanism for this, called the "lightning indexer": a small additional neural network block that scores the significance of word pairs and selects up to 2,048 of the most relevant connections for each position. The company has not disclosed the details of how the indexer makes its decisions, but claims that text comprehension quality does not suffer.
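The general idea of "score every pair cheaply, then attend only to the best k" can be sketched in plain Python. This is a toy illustration under stated assumptions, not DeepSeek's implementation: the `index_scores` table stands in for whatever the lightning indexer produces, and the function names, the causal (look-back only) masking, and the tiny vectors are all chosen for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(queries, keys, values, index_scores, k=2048):
    """Top-k attention: position t attends only to its k best-scored earlier positions.

    index_scores[t][s] is a hypothetical relevance score for the pair (t, s);
    in a real model it would come from a learned indexer, here it is just input.
    """
    dim = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        candidates = range(t + 1)  # assumption: each position looks at itself and earlier ones
        selected = sorted(candidates, key=lambda s: index_scores[t][s], reverse=True)[:k]
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([dot(q, keys[s]) * scale for s in selected])
        outputs.append(
            [sum(w * values[s][d] for w, s in zip(weights, selected)) for d in range(dim)]
        )
    return outputs


# Toy data: three positions, two-dimensional vectors, made-up indexer scores.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
index_scores = [[0.9, 0.0, 0.0], [0.1, 0.8, 0.0], [0.7, 0.2, 0.5]]

print(sparse_attention(queries, keys, values, index_scores, k=1))
```

With `k=1` each position copies the value of the single connection its scores rank highest. Raising `k` to the sequence length or beyond turns this back into ordinary dense attention, which is why the cap on `k` is where the savings in the attention step come from.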

Internal tests showed that the new model delivers comparable results to the previous version, DeepSeek-V3.1-Terminus, while maintaining high accuracy and the ability to process long sequences. Notably, DeepSeek has open-sourced the components under an MIT license and provided open weights, allowing other researchers to verify and develop the proposed solutions.

DeepSeek first made headlines in January 2025, when its R1 model reportedly matched OpenAI's o1 at a training cost of only $6 million. The company's chat app also briefly reached the top spot in the iPhone App Store, overtaking ChatGPT. Since then, the industry has kept a close eye on the Chinese lab, which has to find ways to optimize computation because export restrictions limit its access to modern GPUs and other specialized chips.

Although sparse attention has long been known as an approach (OpenAI described sparse transformers in 2019 and used sparse attention patterns in GPT-3, and a number of other Western developers have used it as well), DeepSeek claims that its implementation achieves fine-grained sparsity and a real reduction in computational cost without a tangible loss of quality. Independent experts have not yet confirmed these results. If the company's findings hold up, however, such methods could seriously change the economics of using AI models in the long term.
 