On Tuesday, Chinese GPU (Graphics Processing Unit) maker Moore Threads announced the rapid deployment of inference services for DeepSeek’s distilled models, which transfer the capabilities of large-scale models into smaller, more efficient versions for high-performance inference on domestic GPUs.
Why it matters: This deployment strengthens China’s AI ecosystem by integrating domestic AI hardware (GPUs) with homegrown large models, helping to reduce reliance on foreign technology.
Details: Based on its self-developed GPU, Moore Threads has quickly deployed inference services for the DeepSeek distilled model through a dual-engine approach that combines open-source and proprietary technologies.
- According to the announcement, Moore Threads has deployed the DeepSeek-R1-Distill-Qwen-7B distilled model on the Ollama open-source framework. The model demonstrates strong performance across a range of Chinese-language tasks, confirming the versatility of the company’s self-developed GPU and its compatibility with CUDA (Compute Unified Device Architecture), the company said.
- The GPU maker claimed that its self-developed high-performance inference engine improves model computational efficiency and resource utilization through customized operator acceleration and memory management. This engine ensures the efficient operation of DeepSeek’s distilled model while laying a strong foundation for future large-scale model deployments, the company added.
- Moore Threads said it aims to empower more developers in AI innovation using its GPU by deploying inference services for the DeepSeek distilled model.
Context: Within just 20 days of launch, DeepSeek’s daily active users (DAUs) surged past 20 million, local media outlet Sina Tech reported today. In the global AI product DAU rankings, ChatGPT leads with 53.23 million users, followed by DeepSeek with 22.15 million, while ByteDance’s Doubao ranks third with 16.95 million.