DeepSeek's Clever Shortcut: How Chinese AI Labs Are Using Claude to Train Reasoning Models

DeepSeek used Anthropic's Claude API to extract synthetic reasoning data, grading signals, and alignment techniques that helped power its reasoning model, DeepSeek-R1, according to detailed accusations Anthropic released in February 2026. The Chinese AI lab was responsible for approximately 150,000 of 16 million total API exchanges in what Anthropic describes as coordinated "industrial-scale distillation attacks" involving three Chinese AI companies.

What Exactly Is DeepSeek Accused of Doing?

DeepSeek's strategy was notably different from that of the other labs in the distillation campaign. Rather than simply copying Claude's outputs at massive scale, DeepSeek focused on three specific capabilities that would be expensive and time-consuming to build from scratch.

  • Reasoning Extraction: DeepSeek crafted prompts asking Claude to "imagine and articulate the internal reasoning behind a completed response and write it out step by step." Since Claude's API doesn't typically expose its chain-of-thought reasoning process, this generated synthetic reasoning-trace data at scale, exactly the kind of expensive training data needed to build a thinking model like DeepSeek-R1.
  • Reward Model Training: DeepSeek used Claude as a grading system, feeding in its own model outputs and having Claude score them against specific rubrics. This provided free reinforcement learning signals without needing human annotators, essentially outsourcing the reinforcement learning from human feedback (RLHF) process to a competitor's API.
  • Censorship-Safe Alignment: DeepSeek prompted Claude to generate "safe" alternatives to politically sensitive queries about dissidents and authoritarianism, then used those responses to train its own model on how to handle censorship according to Chinese government standards.
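The three extraction patterns above can be sketched as prompt templates aimed at a generic chat API. This is an illustrative sketch only: apart from the quoted reasoning prompt, the templates, function names, and the 1-10 scoring scheme are assumptions for demonstration, not DeepSeek's actual implementation.

```python
# Illustrative sketch of the extraction patterns described in Anthropic's
# report. Only the reasoning prompt text is quoted from the report; the
# rest (templates, rubric format, reward scaling) is hypothetical.

def reasoning_extraction_prompt(question: str, answer: str) -> str:
    """Ask a teacher model to reconstruct a chain of thought for an
    existing answer, yielding synthetic reasoning-trace training data."""
    return (
        f"Question: {question}\n"
        f"Completed response: {answer}\n"
        "Imagine and articulate the internal reasoning behind the "
        "completed response and write it out step by step."
    )

def grading_prompt(candidate_output: str, rubric: str) -> str:
    """Use the teacher as a judge: score a student model's output
    against a rubric, producing a cheap reward signal without human
    annotators."""
    return (
        f"Rubric:\n{rubric}\n\n"
        f"Candidate answer:\n{candidate_output}\n\n"
        "Score the candidate from 1-10 against the rubric. "
        "Reply with the number only."
    )

def parse_reward(judge_reply: str) -> float:
    """Map a 1-10 judge score onto a [0, 1] reward for RL fine-tuning."""
    score = float(judge_reply.strip().split()[0])
    return (score - 1.0) / 9.0
```

Run at scale, the first template produces reasoning traces to imitate, while the second and third turn the teacher into an automated reward model.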

The scale of DeepSeek's operation was modest compared to the other labs in the campaign: it accounted for less than 1% of the 16 million total exchanges Anthropic documented. Yet its targeting was highly strategic, focused on the capabilities most valuable for building advanced reasoning models.

How Does Knowledge Distillation Actually Work?

To understand why this matters, it helps to know what knowledge distillation is and why it's become such a powerful technique. Knowledge distillation was formalized in 2015 by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, three of the most influential figures in deep learning. The original concept is straightforward: train a smaller model to match the full probability distribution of a larger, more capable model.

Traditionally, distillation requires access to the teacher model's internal parameters and probability outputs. A student model learns not just the final answers but the teacher's confidence levels across all possible answers. When the teacher is 85% confident about a particular response, the student learns to replicate that exact uncertainty pattern. This approach, which Hinton called "dark knowledge," allows smaller models to achieve performance far beyond what their size would normally suggest.
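The soft-target idea fits in a few lines. Below is a minimal sketch of the 2015 formulation with toy logits; in the full recipe the KL term is scaled by the squared temperature and combined with an ordinary cross-entropy loss on hard labels, which is omitted here for brevity.

```python
import math

def softmax(logits, temperature=1.0):
    """Softened distribution; a higher temperature moves probability
    mass onto less-likely classes, exposing the teacher's 'dark
    knowledge'."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions: the student is trained to match the teacher's full
    confidence pattern, not just its top answer."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student))

# A teacher strongly confident in its first answer: a student with flat
# logits is penalized until it reproduces that uncertainty pattern.
teacher_logits = [4.0, 1.5, 0.5]
mismatched = distillation_loss([1.0, 1.0, 1.0], teacher_logits)  # > 0
perfect = distillation_loss(teacher_logits, teacher_logits)      # 0.0
```

The loss is zero only when the student reproduces the teacher's entire distribution, which is exactly what makes soft targets so much richer than plain right/wrong labels.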

The problem for Chinese AI labs and smaller open-source efforts is that they don't have access to the best models' internal weights and parameters. They can only interact through the chat interface or API, receiving only the final text outputs. This is where the workaround comes in: even without seeing internal probabilities, you can still learn from a model's text outputs alone by asking thousands or millions of carefully chosen questions and collecting its answers, explanations, code completions, tool calls, and refusals.
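That black-box workaround amounts to building a supervised fine-tuning set out of nothing but text outputs. A minimal sketch follows, where `query_teacher` is a stand-in for a real chat-completions API call and every name is illustrative rather than taken from any lab's actual pipeline:

```python
def query_teacher(prompt: str) -> str:
    """Stand-in for a real API call to the teacher model; it returns a
    canned string here so the sketch runs offline."""
    return f"[teacher's answer to: {prompt}]"

def harvest_pairs(prompts):
    """Collect (prompt, completion) records. With no access to teacher
    logits, these text pairs are the entire distillation signal: the
    student is simply fine-tuned to reproduce the completions."""
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = harvest_pairs([
    "Explain binary search step by step.",
    "Refuse a request for malware, and explain why.",
])
# Each record becomes one supervised fine-tuning example for the student.
```

Note that refusals are harvested alongside answers: a student can imitate a teacher's safety behavior just as readily as its reasoning.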

Why Should You Care About This Distillation Controversy?

The distillation accusations against DeepSeek, Moonshot AI, and MiniMax represent a fundamental challenge to how AI companies protect their competitive advantages. Anthropic's report describes a sophisticated operation involving 24,000 fake accounts, distributed networks of API accounts managed through commercial proxy services, and coordinated infrastructure designed to evade detection.

What makes DeepSeek's approach particularly noteworthy is its efficiency and focus. While MiniMax drove over 13 million exchanges attempting broad-scale imitation of Claude's capabilities, and Moonshot AI conducted 3.4 million exchanges targeting agentic reasoning and computer-use agents, DeepSeek's smaller volume of 150,000 exchanges was precisely targeted at the most valuable and expensive-to-create training signals: reasoning traces, reward model data, and alignment techniques.

The timing is significant. Moonshot released Kimi K2.5 in January 2026 as an open-source model that outperformed Claude 3.5 Sonnet on some coding benchmarks at 90% lower cost. The community called it a second "DeepSeek moment," suggesting that if the distillation accusations are accurate, these competitive capabilities may have originated from Claude's API rather than independent development.

How to Understand the Broader Context of AI Model Training

  • Knowledge Distillation History: The technique itself is not new or inherently unethical. Geoffrey Hinton and other pioneers developed it as a legitimate way to compress large models into smaller, more efficient versions. The controversy centers on whether using a competitor's API to extract training data without explicit permission constitutes misuse.
  • The Cost of Reasoning Data: Building training data for reasoning models like DeepSeek-R1 is extraordinarily expensive because it requires people to manually write out step-by-step reasoning for hundreds of thousands of examples. Using an API to generate this data synthetically represents a massive cost savings and explains why reasoning extraction was DeepSeek's primary focus.
  • Competitive Pressure in AI: The distillation campaign reflects intense competition between Chinese and American AI labs. Smaller labs and open-source efforts face a fundamental disadvantage: they cannot match the training compute and human annotation budgets of frontier AI companies like Anthropic and OpenAI, making API-based distillation an attractive shortcut.

The accusations against DeepSeek and other Chinese labs arrived alongside similar claims from OpenAI, which sent a memo to the U.S. House Select Committee on China accusing DeepSeek of "free-riding" on U.S. frontier-model capabilities, and Google, which documented a 100,000-prompt campaign targeting Gemini's reasoning traces. All three reports were released within 11 days of each other in mid-February 2026.

What remains unresolved is whether this represents a genuine security threat to AI development or a natural consequence of making powerful models available through public APIs. The technical sophistication of the distillation operation is undeniable, but so is the fact that knowledge distillation itself is a legitimate, published technique developed by some of the field's most respected researchers.