Themes - DeepSeek, Kimi And AI Efficiency Paradigm Shift (Pt.1)

Summary
- DeepSeek is a true innovator, not a copycat. Its model architecture, algorithms, and training and inference frameworks are all industry firsts.
- According to its disclosure, DeepSeek has not used smuggled H100s, just 2,048 H800s.
- In the near term, it will likely continue to use NVDA cards, but over the longer term it has a good chance of building its own training library and adopting other AI ASICs, such as Huawei's Ascend chips.
- DeepSeek is shifting GPU demand from memory-bound back to compute-bound (see the illustrative sketch after this summary). Its parallelism strategy could enable more distributed training and even larger clusters.
- Its inference cluster also departs from the single-card/single-node inference paradigm currently adopted by all other LLM builders, which will affect vendors unevenly; we will discuss this in Part 2.
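To make the memory-bound versus compute-bound bullet above concrete, here is a minimal roofline-style sketch in Python. It is not taken from DeepSeek's reports: the 7168 matrix dimension, the throughput and bandwidth figures, and the batch sizes are all assumed, ballpark numbers chosen only to show how arithmetic intensity crosses a GPU's balance point as batch size grows.

```python
# Illustrative roofline-style check of when a GEMM is memory-bound vs compute-bound.
# All hardware figures and matrix sizes below are rough, assumed values for
# illustration only; they are not taken from DeepSeek's disclosures.

def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul in BF16/FP16."""
    flops = 2 * m * n * k                              # each multiply-accumulate counts as 2 FLOPs
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Assumed accelerator balance point: ~1e15 FLOP/s tensor throughput vs ~3.35e12 B/s HBM bandwidth.
machine_balance = 1e15 / 3.35e12                       # ~300 FLOPs/byte breakeven

# Batch-1 decode: tiny m keeps intensity near 1 FLOP/byte -> heavily memory-bound.
print(arithmetic_intensity(m=1, n=7168, k=7168))       # ~1.0

# Large-batch prefill/training: intensity ~2,500 FLOPs/byte -> compute-bound.
print(arithmetic_intensity(m=8192, n=7168, k=7168))    # ~2.5e3
print(machine_balance)                                 # ~300
```

The takeaway: small-batch decode sits far below the balance point (bandwidth-limited), while heavily batched training or inference sits far above it (compute-limited), which is the shift the summary refers to.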
DeepSeek has unexpectedly dominated headlines since 20th January 2025, following the open-source release of its DeepSeek R1 model. The response has been remarkable, particularly given that its predecessor, DeepSeek R1-lite, was released on its official website on 20th November 2024 yet garnered no media attention whatsoever. While we had been evaluating R1-lite against competitors like o1 and o1-mini for months, the absence of DeepSeek R1 on platforms like Poe (a popular platform for comparing AI models) initially puzzled us, until we realized the model was only open-sourced this January.
The open-sourcing of R1 is the pivotal moment that thrust DeepSeek into the spotlight. In contrast, Moonshot AI, its Chinese startup rival, released Kimi 1.5 on the same day, 20th January. Though Kimi 1.5 reportedly outperforms R1 in certain tasks, it failed to generate comparable buzz. Why? DeepSeek’s strategy — full transparency, replicable training pipelines, and a diverse ecosystem of models (including distilled versions of BABA’s Qwen and Meta’s Llama) — resonates deeply with the open-source community. This move isn’t just about catching up to GPT models; it’s about democratizing access to cutting-edge reasoning capabilities, validating the power of open-source collaboration.
The story didn’t stop at the technical community. After trending among open-source developers, it caught the eye of AI experts on platforms like X, then cascaded into mainstream media and public discourse. Beyond the hype, however, lie deeper implications: DeepSeek’s innovations — such as its efficient model architecture, training framework, and cross-model distillation techniques — could redefine industry standards for transparency and scalability.
In short, DeepSeek’s open-source ethos isn’t just a PR win; it’s a testament to how collaboration can accelerate progress in AI. As the community races to replicate, refine, and build upon R1, the industry may witness a true paradigm shift — one driven not by proprietary tech, but by shared knowledge.
In this article, we will briefly share what we know and offer our unique insights:
On the AI model front:
- The story and background of DeepSeek and High Flyer, the quant hedge fund behind it
- DeepSeek's differentiation
- DeepSeek and Kimi's algo innovation
- The response from OpenAI, META, and X.ai
- Why you should avoid Baidu (BIDU) as an AI play
- Major Chinese AI vendors and their prospects
On the AI chip front:
- The chips used by DeepSeek and Chinese AI vendors
- The impact of the dramatic decrease in cost, and the true cost of DeepSeek
- CUDA moat and training stack
- Inference stack and future evolution
(Images in this article are from the DeepSeek V3 and R1 technical reports unless otherwise sourced.)
DeepSeek and High Flyer
High Flyer, a quantitative trading firm founded in 2015, aims to capitalize on inefficiencies and arbitrage opportunities in China's stock market. Unlike peers that leveraged expertise from leading Western quant firms such as Jane Street, Two Sigma, DE Shaw, and Citadel, High Flyer focused on hiring promising young talent locally, recruiting people with deep expertise in mathematics, high-performance computing, and software development. And as a latecomer relative to the early entrants in China's quant industry (which emerged in the early 2010s), it adopted an aggressive strategy to establish dominance through marketing, talent acquisition, and technological innovation, particularly in AI.