DeepSeek Engram is a new AI model training method designed to decouple memory storage from computation. The method, developed in collaboration with Peking University, is expected to reduce dependence on the expensive HBM memory used for AI inference and training.
Traditional large language models (LLMs) require high-bandwidth memory for knowledge retrieval and basic computation, creating a bottleneck in both performance and cost. This HBM bottleneck is widely cited as a key reason why the prices of DRAM and NAND used in client devices have risen sharply in recent months.
DeepSeek, the Chinese artificial intelligence startup, has already caused an economic, stock-market, technological and geostrategic earthquake by demonstrating that AI models can be trained and run far more efficiently and cheaply. Engram continues along the same path.
How DeepSeek Engram works
The Chinese researchers explain that existing models waste sequential depth on trivial operations that could otherwise support higher-level reasoning. DeepSeek Engram lets models "look up" essential information efficiently without overloading GPU memory, freeing capacity for more complex reasoning tasks.
The system was tested on a 27-billion-parameter model and showed measurable improvements on industry-standard benchmarks. By performing knowledge retrieval via hashed N-grams, Engram provides static memory access that is independent of the current context.
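To make the idea concrete, here is a minimal sketch of context-independent retrieval through hashed N-grams. Everything in it is an illustrative assumption: the class name, the polynomial rolling hash, and the table sizes are invented for the example, since the article does not describe DeepSeek's actual hashing scheme or dimensions.

```python
import torch
import torch.nn as nn

class HashedNGramMemory(nn.Module):
    def __init__(self, n: int = 3, num_buckets: int = 1_000_000, dim: int = 512):
        super().__init__()
        self.n = n
        self.num_buckets = num_buckets
        # The static memory: a flat embedding table addressed by hash bucket.
        self.table = nn.Embedding(num_buckets, dim)

    def bucket_ids(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) integer tokens. Hash the trailing n-gram at
        # each position with a simple polynomial hash (a stand-in for whatever
        # scheme the real system uses).
        h = torch.zeros_like(token_ids)
        for k in range(self.n):
            shifted = torch.roll(token_ids, shifts=k, dims=1)
            shifted[:, :k] = 0  # mask wrap-around at the sequence start
            h = (h * 1000003 + shifted) % self.num_buckets
        return h

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # The lookup depends only on the raw tokens, never on hidden state,
        # which is what makes it static and prefetchable.
        return self.table(self.bucket_ids(token_ids))

mem = HashedNGramMemory()
tokens = torch.randint(0, 32000, (2, 16))   # fake batch of token ids
retrieved = mem(tokens)                     # (2, 16, 512) memory vectors
```

Because the bucket index is a pure function of the tokens, the lookup can be computed before the model's forward pass even starts, which is what enables the prefetching described next.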
The retrieved information is then adjusted by a context-sensitive control mechanism to align with the hidden state of the model. This design allows models to handle long-context inputs more efficiently and supports system-level prefetching with minimal performance overhead.
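A plausible form of that control step is a learned gate that decides, per position, how much of the retrieved vector to blend into the hidden state. The module below is a hedged sketch of that idea, not DeepSeek's published code; `GatedMemoryInjection` and its dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GatedMemoryInjection(nn.Module):
    def __init__(self, hidden_dim: int = 512, mem_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(mem_dim, hidden_dim)   # align memory to model width
        self.gate = nn.Linear(hidden_dim + mem_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor, mem: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim), mem: (batch, seq, mem_dim).
        # The gate is a function of the current hidden state, so the same
        # static lookup can be amplified or suppressed depending on context.
        g = torch.sigmoid(self.gate(torch.cat([hidden, mem], dim=-1)))
        return hidden + g * self.proj(mem)
```

The design choice matters: the expensive, context-dependent work is confined to a cheap gating computation, while the bulky memory access itself stays static.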
The Engram method complements other hardware-efficient approaches, including Phison's AI inference accelerators. It also works in conjunction with the emerging CXL (Compute Express Link) standard, which aims to overcome GPU memory bottlenecks in large-scale AI workloads.
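Because the lookup indices depend only on the input tokens, the large static table does not have to live in HBM at all. The fragment below sketches the generic offload-and-prefetch pattern such hardware approaches build on; it is a standard PyTorch idiom assuming a CUDA device, not a documented DeepSeek, Phison, or CXL API.

```python
import torch

# A large static table kept in pinned host RAM instead of GPU HBM.
table_cpu = torch.randn(1_000_000, 512).pin_memory()
copy_stream = torch.cuda.Stream()

def prefetch_rows(bucket_ids: torch.Tensor) -> torch.Tensor:
    # bucket_ids are computable from the tokens alone, so this copy can be
    # issued early and overlapped with GPU compute on a side stream.
    rows = table_cpu[bucket_ids.cpu()].pin_memory()  # host-side gather
    with torch.cuda.stream(copy_stream):
        return rows.to("cuda", non_blocking=True)    # async host-to-device copy
```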
Engram minimizes the amount of high-speed memory required by using static information lookups, making its memory usage more efficient. Tests show that reallocating about 20-25% of the sparse parameter budget to Engram produces better performance than pure MoE models, with gains that hold stable across scales.
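As a back-of-the-envelope illustration of what that reallocation means, the snippet below splits a hypothetical sparse budget using the reported range; only the 20% share comes from the article, the absolute figures are invented.

```python
sparse_budget = 10_000_000_000                       # hypothetical 10B sparse params
engram_share = 0.20                                  # low end of the 20-25% range
engram_params = int(sparse_budget * engram_share)    # 2.0B in the static table
moe_params = sparse_budget - engram_params           # 8.0B left for MoE experts
print(f"Engram table: {engram_params:,}  |  MoE experts: {moe_params:,}")
```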
This technique can relieve pressure on expensive memory hardware, particularly in regions such as China, where domestic HBM capability lags behind leaders such as Samsung, SK Hynix and Micron. Engram's early validation suggests that models can expand parameter scaling and reasoning ability while managing memory demands more efficiently. The approach is expected to reduce the astronomical cost of AI infrastructure and, in turn, ease pressure on the wider electronics market, which has been strained by supply shortages.
