DeepSeek researchers have developed a technology called Manifold-Constrained Hyper-Connections, or mHC, that can improve the performance of artificial intelligence models.
The Chinese AI lab detailed the technique in a paper published on Wednesday.
DeepSeek created mHC to enhance the so-called residual connection mechanism that large language models use to learn new information. The mechanism, invented in 2015, is also built into many vision models. DeepSeek is not the first market player to have tried to improve upon residual connections, but earlier attempts produced mixed results.
An AI model comprises numerous software components called layers. When a user enters a prompt, the text enters the first layer, which performs a small portion of the calculations necessary to generate a prompt response. The first layer sends the results of its calculations to the second layer, which completes another portion of the work, passes its results to the third layer and so forth. The last layer outputs an answer to the user’s question.
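As a toy illustration of that pipeline (not DeepSeek's code), a model can be pictured as a chain of functions, each one consuming the previous layer's output:

```python
# Toy sketch: a model as a chain of layers. Each "layer" here is just
# multiplication by a weight; real layers perform far richer math.

def make_layer(weight):
    return lambda x: weight * x

layers = [make_layer(w) for w in (0.5, 2.0, 3.0)]

def forward(x, layers):
    # The input flows through layer 1, then layer 2, and so on;
    # the last layer's result is the model's answer.
    for layer in layers:
        x = layer(x)
    return x

print(forward(1.0, layers))  # 1.0 -> 0.5 -> 1.0 -> 3.0
```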
The last layer plays a key role in the AI training process. If a model outputs an incorrect prompt response, the last layer receives a so-called gradient: a signal that indicates the AI made a mistake and carries information about how the model can improve. The gradient enters the last layer and travels backwards through the rest of the AI’s structure until it reaches the first layer.
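A hedged sketch of that backward journey, using scalar layers f(x) = w·x whose local derivative is simply w (real backpropagation applies the chain rule to far more complex functions):

```python
# Toy backpropagation sketch. The gradient starts at the last layer
# and is multiplied by each layer's local derivative (here just the
# weight w) as it travels backward toward the first layer.

weights = [0.5, 0.5, 0.5, 0.5]  # one weight per layer

def backward(loss_grad, weights):
    grads = []
    g = loss_grad
    for w in reversed(weights):
        grads.append(g)  # gradient arriving at this layer
        g = g * w        # shrunk (or stretched) by the local derivative
    return list(reversed(grads))  # grads[0] = gradient at the first layer

print(backward(1.0, weights)[0])  # 0.125: much weaker by layer 1
```

With many layers and small local derivatives, the signal reaching the early layers can shrink toward zero, which is the kind of training problem residual connections were designed to counter.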
In 2015, researchers invented a gradient management mechanism known as a residual connection. It’s a shortcut that enables the gradient to travel directly between two distant AI layers without passing through all the layers in between. Residual connections mitigate several common training problems, notably vanishing gradients, which is why they’re widely used in LLMs and vision models.
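A minimal sketch of the idea (an illustration, not the original paper's code): a residual block computes x + f(x) rather than f(x) alone, so the gradient always has an identity path with derivative 1 alongside f's own derivative:

```python
# Residual connection sketch: output = input + f(input).

def f(x, w=0.1):
    return w * x          # the layer's own transformation

def residual_block(x):
    return x + f(x)       # shortcut adds the input back in

# The block's local derivative is 1 + f'(x) = 1 + w. Even when f' is
# tiny, the "1" from the shortcut keeps the gradient from vanishing
# as it flows back through many stacked blocks.
def backward_through_blocks(loss_grad, n_blocks, w=0.1):
    g = loss_grad
    for _ in range(n_blocks):
        g = g * (1.0 + w)
    return g

print(backward_through_blocks(1.0, 10))  # ~2.59: still healthy
```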
Last September, researchers debuted an alternative to residual connections called Hyper-Connections. It addresses several of residual connections’ shortcomings but comes with limitations of its own. The mHC architecture introduced by DeepSeek this week is an enhanced implementation of Hyper-Connections. It avoids several of the technical challenges associated with the earlier mechanism, which makes it more suitable for production use.
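A heavily simplified sketch of the Hyper-Connections idea, assuming the core trick is to keep several parallel residual streams mixed by learnable weights around each layer (the `mix` matrix layout below is an assumption for illustration, not the paper's parameterization):

```python
# Hedged sketch: n parallel residual streams instead of one.
import numpy as np

n, d = 4, 8  # n streams, hidden size d

def hyper_connected_layer(streams, layer_fn, mix):
    # mix has shape (n + 1, n): row 0 weights how the layer's output
    # is written back to each stream; rows 1..n mix the streams.
    layer_in = streams.mean(axis=0)        # read: combine the streams
    out = layer_fn(layer_in)               # ordinary layer computation
    return mix[1:].T @ streams + np.outer(mix[0], out)

rng = np.random.default_rng(0)
streams = np.tile(rng.normal(size=d), (n, 1))

# Initialized so the block behaves like a plain residual connection:
# each stream keeps itself, and the output is spread evenly.
mix = np.vstack([np.full(n, 1.0 / n), np.eye(n)])

out = hyper_connected_layer(streams, lambda x: 0.1 * x, mix)
print(out.shape)  # (4, 8)
```

Training then tunes the mixing weights along with the rest of the model, instead of hard-coding a single shortcut path.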
The primary innovation in mHC is that it incorporates a so-called manifold. Manifolds are a broad family of mathematical objects that vary significantly in complexity. Some manifolds are simple geometric shapes such as circles, while others span more than three dimensions. DeepSeek says that mHC uses a manifold to maintain the stability of gradients while they travel between an AI model’s layers.
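The article doesn't specify which manifold mHC uses, so the following is an illustration only: one classic way to constrain a stream-mixing matrix is to project it onto the doubly stochastic matrices (every row and column sums to 1) using Sinkhorn normalization, so the mixing step neither amplifies nor attenuates signals as gradients pass through.

```python
# Illustration only (the paper's actual manifold may differ): project
# an unconstrained matrix onto the doubly stochastic matrices via
# Sinkhorn normalization so rows and columns each sum to 1.
import numpy as np

def sinkhorn(m, n_iters=50):
    m = np.exp(m)  # make all entries positive
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

raw = np.random.default_rng(0).normal(size=(4, 4))  # free parameters
mix = sinkhorn(raw)                                 # constrained mix
print(mix.sum(axis=0).round(4), mix.sum(axis=1).round(4))
```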
The company put the architecture to the test by using it to train three LLMs with 3 billion, 9 billion and 27 billion parameters. It then trained three other models with identical parameter counts using Hyper-Connections, the technology from which mHC is derived. According to DeepSeek, the mHC-powered LLMs performed better across eight different AI benchmarks.
The company says that the architecture is also more hardware-efficient than Hyper-Connections. The latter mechanism significantly increases LLMs’ memory requirements during training. In its internal tests, DeepSeek determined that mHC incurs a hardware overhead of only 6.27%.
“By deepening the understanding of how topological structures influence optimization and representation learning, mHC will help address current limitations and potentially illuminate new pathways for the evolution of next-generation foundational architectures,” DeepSeek researchers wrote in the mHC paper.
