Themes - DeepSeek, Kimi And AI Efficiency Paradigm Shift (Pt.2)

Summary

  • DeepSeek's operating expenses and cluster costs are several times lower than those of its Western rivals, but its total capex is likely much closer.
  • Despite US chip restrictions, DeepSeek has sustained its AI advancements through innovative training strategies.
  • DeepSeek is scaling up with potential outside funding and deeper collaboration with Huawei.

Training

One of the most misunderstood aspects of DeepSeek is its infrastructure expertise. Thanks to High Flyer's earlier work, DeepSeek has a leading-edge understanding of how to operate and optimize large GPU clusters. Before 2022, while deploying its own large-scale GPU clusters, High Flyer identified bugs that no one else had found. It also shared a new network optimization solution with NVDA, which later used it as a template for other customers.

Even before the AI boom sparked by ChatGPT, High Flyer was already deeply experienced in large-scale distributed deep learning (DL) training. Much like other quantitative firms, High Flyer has always focused on maximizing the efficiency of its hardware infrastructure. Its overarching goal has been to achieve a 2x to 3x overall efficiency gain by fully utilizing the potential of its hardware.
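
To make the utilization idea concrete, the sketch below shows one generic technique for keeping a GPU busy: overlapping data transfers with computation using CUDA streams, so the device is not idle while the next chunk of data is copied in. This is an illustrative example of the kind of low-level engineering involved, not High Flyer's actual pipeline; all names and sizes are hypothetical.

    // Illustrative sketch: overlapping host-to-device copies with kernel
    // execution via CUDA streams, a generic way to raise hardware utilization.
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n_chunks = 4, chunk = 1 << 20;   // 4 chunks of 1M floats (hypothetical sizes)
        float *h_buf, *d_buf;
        cudaMallocHost(&h_buf, n_chunks * chunk * sizeof(float));  // pinned host memory enables async copies
        cudaMalloc(&d_buf, n_chunks * chunk * sizeof(float));
        for (int i = 0; i < n_chunks * chunk; ++i) h_buf[i] = 1.0f;

        cudaStream_t streams[n_chunks];
        for (int s = 0; s < n_chunks; ++s) cudaStreamCreate(&streams[s]);

        // Each stream copies its chunk in, runs the kernel, and copies the
        // result back; copies in one stream overlap with compute in another,
        // keeping both the copy engines and the SMs busy.
        for (int s = 0; s < n_chunks; ++s) {
            size_t off = (size_t)s * chunk;
            cudaMemcpyAsync(d_buf + off, h_buf + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_buf + off, chunk, 2.0f);
            cudaMemcpyAsync(h_buf + off, d_buf + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();
        printf("h_buf[0] = %f\n", h_buf[0]);

        for (int s = 0; s < n_chunks; ++s) cudaStreamDestroy(streams[s]);
        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        return 0;
    }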

A key factor in this efficiency gain is the development of proprietary numerical operators that NVDA had not pre-built for customers in CUDA. By writing its own low-level operators, High Flyer achieved performance gains of 25% to 500% in some cases.
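
As an illustration of what such a hand-written operator can look like, the sketch below fuses a bias add and a GELU activation into a single CUDA kernel, so the tensor is read from and written to global memory once instead of twice. It is a hypothetical example of the general technique, not one of High Flyer's or DeepSeek's actual operators.

    // Illustrative sketch: a hand-written fused bias-add + GELU kernel of the
    // kind a team might write when no pre-built fused operator is available.
    #include <cuda_runtime.h>
    #include <math.h>
    #include <stdio.h>

    __global__ void fused_bias_gelu(const float* __restrict__ x,
                                    const float* __restrict__ bias,
                                    float* __restrict__ out,
                                    int rows, int cols) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= rows * cols) return;
        float v = x[idx] + bias[idx % cols];       // bias add (row-major layout)
        // tanh approximation of GELU, applied in the same pass
        float c = 0.7978845608f * (v + 0.044715f * v * v * v);
        out[idx] = 0.5f * v * (1.0f + tanhf(c));
    }

    int main() {
        const int rows = 1024, cols = 4096, n = rows * cols;  // hypothetical sizes
        float *x, *bias, *out;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&bias, cols * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) x[i] = 0.01f * (i % 97);
        for (int j = 0; j < cols; ++j) bias[j] = 0.1f;

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        fused_bias_gelu<<<blocks, threads>>>(x, bias, out, rows, cols);
        cudaDeviceSynchronize();
        printf("out[0] = %f\n", out[0]);

        cudaFree(x); cudaFree(bias); cudaFree(out);
        return 0;
    }

The fusion matters because elementwise operations are memory-bound: launching two separate kernels roughly doubles the global-memory traffic for the same arithmetic, which is why custom fused operators can deliver outsized speedups.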

DeepSeek has summarised its training approach well: "Pre-Training: Towards Ultimate Training Efficiency."