Why China's Humanoid Robot Industry Is Stuck in Data Silos While Competitors Race Ahead
China's humanoid robot industry is fragmenting at a critical moment, with domestic companies hoarding training data and building separate operating systems instead of pooling resources to compete globally. While international leaders such as Tesla (with Optimus) and Boston Dynamics have already accumulated over one million hours of training data, China's embodied AI companies combined possess only a few hundred thousand hours, according to industry insiders. This data gap threatens to derail the sector's commercialization timeline, which experts view as a make-or-break window spanning 2026 to 2027.
How Much Training Data Do Humanoid Robots Actually Need?
The embodied AI industry consensus is clear: training a truly capable humanoid robot model to convergence requires several million hours of effective training data, and possibly tens of millions. This isn't theoretical speculation. Public data from overseas leaders demonstrates the scale required: Tesla's Optimus program and Boston Dynamics have both surpassed the one-million-hour mark in their core training datasets. Yet when asked about China's standing, Xu Guoqiang, Director of the Research Ecosystem at Qunxun Intelligence, delivered a sobering assessment.
"The combined data from all domestic embodied intelligence companies might only amount to a few hundred thousand hours. We are a full order of magnitude away from true model convergence," stated Xu Guoqiang.
The problem extends beyond raw volume. Most domestic data covers basic scenarios, such as robotic arms grasping cups or simple repetitive movements, and the proportion of effective, high-complexity, cross-scenario data remains extremely low. Even if every top-tier Chinese company pooled its datasets, the combined volume would fall short of what's needed, and the quality would likely fail to meet the standards required for true generalization.
Why Are Companies Treating Data Like Buried Treasure?
The core issue driving this fragmentation is simple: data has become every company's most jealously guarded asset. In the software era, copying code cost virtually nothing, enabling rapid knowledge sharing and industry-wide progress. But collecting embodied data requires real physical equipment, dedicated space, and significant time investment. Every robot grab, movement, or stumble represents tangible financial expenditure. This fundamental difference has created a mindset in which companies treat data as a competitive advantage rather than as shared infrastructure.
Xu describes the current landscape as companies digging separate wells in a dry riverbed. If they dug a little deeper together, they would hit groundwater. Instead, they hover at the surface, repeatedly collecting basic data while the entire industry wastes massive social resources on redundant efforts. This mirrors the data silos that plagued the early automotive industry, when plant process data, vehicle-to-infrastructure information, and in-car operation data were all treated as top secrets, creating major bottlenecks for intelligent upgrades.
Steps to Break Down Industry Data Barriers
- National-Level Infrastructure: China is pushing for the construction of national-level embodied intelligence data training grounds that can serve multiple companies and reduce duplication of effort.
- Academic Integration: Several universities have incorporated embodied intelligence majors into their curricula and integrated data collection into daily student assessments, creating a pipeline of shared training data.
- Hybrid Ecosystem Models: Companies are exploring open-source base platforms combined with proprietary data, allowing shared foundations while protecting competitive advantages in specialized datasets.
Xu expects this barrier to break down gradually. "The current fragmentation is a specific characteristic of the industry's early stages. In the future, the lines of data sharing will blur," he noted. The timeline, however, is tight. The 2026 to 2027 period is already viewed as the critical commercialization window for embodied humanoid robots. If the sector cannot cross the fundamental threshold of "data magnitude plus quality," the so-called industry competition might be lost before it even begins.
What's Happening With Robot Operating Systems?
Beyond data silos, a second major bottleneck is emerging: the fragmented landscape of robot operating systems. If data is the fuel, the operating system is the engine that makes everything run. Currently, the industry remains in its infancy, with technology nowhere near achieving cross-scenario universality. Even within a single company, different delivery scenarios often require multiple operating systems running in parallel, making industry-wide standardization impossible at this stage.
Lyu Jun, a research scientist at NonEmpty Intelligence, explained the core issue. "This isn't because companies are doing a poor job," he stated. "Fundamentally, embodied intelligence technology hasn't reached the level of cross-scenario, large-scale application." A model that runs smoothly on one company's wheeled robot might crash entirely when ported to another company's bipedal robot. Given this technological immaturity, standardization of the kind that unified the PC era under Windows or created the mobile duopoly of Android and iOS remains premature.
"Personally, I hope to see one or two companies launch a stable, reliable general operating system compatible with different models and hardware devices. This would reduce homogeneous competition and encourage resource integration, allowing us to focus limited power on core technology R&D," explained Lyu Jun.
The industry faces a paradox. In fields where technology is immature and basic science has yet to see breakthroughs, premature and excessive homogeneous competition crowds out scarce scientific research resources. If every company must reinvent the wheel, write underlying drivers from scratch, and build its own operating system, who will be left to solve the true core challenges, such as enabling robots to understand vague human commands or achieve autonomous cross-scenario adaptation?
Lyu Jun elevated the perspective to a national level: "Embodied intelligence is a cause requiring massive investment, and it concerns national-level technological competition. We cannot afford internal friction." If domestic companies fail to form synergy at the operating system level, what they face in the future may not be commercial competition but a crushing technological defeat on the international stage.
What's the Timeline for Industry Breakthrough?
The 4th Embodied Intelligent Robot Industry Development Forum, held in spring 2026, revealed both progress and anxiety. Speakers showcased proprietary achievements, with companies claiming they had built their own base models, developed full-stack operating systems, and logged specific numbers of training-data hours. Yet beneath the surface, a nagging concern persisted: everyone is fighting their own battles in isolation. The question isn't moral but mathematical: are these limited resources, scarce data, and top talent actually accelerating the industry, or just fueling massive waste?
A turning point may be emerging. China is actively constructing national-level embodied intelligence data training grounds, and universities are incorporating embodied intelligence into their curricula. These structural changes suggest the data-sharing barrier will gradually dissolve. The critical question is whether the industry can afford to wait. The 2026 to 2027 window represents the make-or-break moment for commercialization of embodied humanoid robots. If the sector cannot reach the fundamental threshold of data magnitude and quality by then, the competitive race may already be decided.