The Hidden Labor Behind Humanoid Robots: Why Tech Companies Are Paying Workers $15/Hour to Train AI

The race to build humanoid robots has created an unexpected job market: companies like Figure AI, Tesla, and others are hiring thousands of workers worldwide to record everyday hand movements, from folding clothes to opening refrigerators, at wages as low as $15 per hour. This emerging gig economy reveals a critical bottleneck in artificial intelligence development: the scarcity of real-world motion data needed to train robots that can perform physical tasks alongside humans.

Why Is Real-World Motion Data So Valuable for Robot Training?

Humanoid robots don't learn to fold clothes or wash dishes from instruction manuals or simulations alone. They require billions of hours of actual human movement data, captured from real homes and workplaces, to understand the subtle nuances of physical tasks. This is where the data collection boom begins. Micro1, a Palo Alto-based company, has recruited approximately 4,000 workers across 71 countries, collecting over 160,000 hours of video footage each month. Each worker submits at least 10 hours of video per week, alternating between different household tasks.

The scale of this need is staggering. According to Arian Sadeghi, Vice President of Micro1, "160,000 hours of monthly footage is far from sufficient. We likely need billions of hours. We haven't even begun collecting data on human-to-human interactions; we're still at the most basic level of household tasks." At the current collection rate, gathering the billions of hours needed would take roughly ten thousand years of continuous operation.
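The back-of-envelope arithmetic behind that estimate can be checked directly. A minimal sketch, using only the figures reported above; since "billions of hours" is not pinned down precisely, two illustrative targets are shown:

```python
# Rough check of the collection-rate arithmetic reported in the article.
WORKERS = 4_000            # Micro1's reported workforce
HOURS_PER_WEEK = 10        # minimum footage each worker submits weekly
WEEKS_PER_MONTH = 52 / 12

monthly_hours = WORKERS * HOURS_PER_WEEK * WEEKS_PER_MONTH
print(f"Implied monthly footage: ~{monthly_hours:,.0f} hours")

# The target in "billions of hours" is not specified; two illustrative
# values, divided by the article's figure of 160,000+ hours per month.
yearly_hours = 160_000 * 12
for target in (1e9, 20e9):
    print(f"{target / 1e9:.0f}B hours -> ~{target / yearly_hours:,.0f} years")
```

At a target of roughly 20 billion hours, the "ten thousand years" figure checks out; even a single billion hours would take more than five centuries at the current rate.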

How Does the AI-Powered Interview and Video Submission Process Work?

The hiring process itself is unusual. Candidates must first be interviewed by an AI agent named Zara, which assesses their suitability and requests a trial recording. After approval, workers receive a headband mount, a recording guide, and a task checklist. There's a catch: workers need an iPhone with a LiDAR sensor, meaning an iPhone 12 Pro or newer.

The technical requirements are strict. Workers must keep their hands visible in the frame at all times and move at a "natural pace." However, what feels natural to humans often appears too fast on camera, so workers commonly report deliberately slowing down their movements, creating an unnatural, sleepwalking-like effect. After submission, videos undergo dual review by AI and human reviewers, with only about half of all submissions ultimately approved.

  • Rejection Reasons: Videos are rejected for insufficient lighting, hands moving out of frame, movements that are too fast, or unauthorized objects appearing in the background
  • Payment Structure: Workers are paid by the hour, but if a video is rejected, their labor during that time is unpaid
  • Annotation Phase: Videos that pass review enter a second phase where human annotators label action categories, object names, and motion trajectories frame by frame
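The payment structure and the roughly 50% approval rate interact: when rejected recording time goes unpaid, effective pay per hour of recording is about half the nominal rate. A hedged sketch of that arithmetic, using only the figures in this article:

```python
# Effective hourly earnings when rejected recording time goes unpaid.
# Figures from the article: $15/hour nominal, roughly half of videos approved.
NOMINAL_RATE = 15.0    # USD per approved recorded hour
APPROVAL_RATE = 0.5    # share of submissions that pass AI + human review

def effective_hourly_rate(nominal: float, approval: float) -> float:
    """Pay per hour actually spent recording, counting unpaid rejections."""
    return nominal * approval

print(effective_hourly_rate(NOMINAL_RATE, APPROVAL_RATE))  # 7.5
```

This simple model ignores unpaid time spent brainstorming tasks, re-shooting, and uploading, so real effective rates are likely lower still.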

The creative burden falls heavily on workers. Arjun, a tutor in New Delhi, says it typically takes him an hour to brainstorm enough household tasks to fill a 15-minute recording. Micro1 requires workers to constantly "vary the content," as diverse scenarios are crucial for training effectiveness, but homes are limited in size and creativity eventually runs out.

Why Do U.S. Workers Earn Three Times More Than Workers in India?

There's a stark geographic wage gap in this emerging labor market. Videos from U.S. households sell for significantly higher prices than those from other regions. Ravi Rajalingam, founder of the data labeling company Objectways, explains the reasoning: "Because robot companies assume U.S. consumers will be the first to purchase humanoid robots, operational environment data from American homes is more valuable." For the same task of folding clothes, hands in Los Angeles can earn three times more than hands in Chennai.

This wage disparity reflects a deeper economic reality. While $15 per hour is competitive in Nairobi or Manila, it pales in comparison to the billions of dollars invested in robotic companies. Workers in lower-income countries are providing the foundational data that will power trillion-dollar industries, yet they capture only a fraction of the value they create.

What Is "Ghost Work" and How Does It Apply to Robot Training?

The concept of invisible labor behind AI systems isn't new. In 2019, anthropologist Mary Gray and computer scientist Siddharth Suri published the book "Ghost Work," which describes the human labor (labeling images, filtering inappropriate content, cleaning training data) that makes AI systems seem intelligent yet never appears in any product description. When Gray asked engineers who was doing this work, the responses included "I'm not sure" and "I don't dare to check."

What's different now is the nature of the labor itself. In the past, ghost work primarily occurred in front of screens, involving actions like clicking, labeling, and reviewing. Now, the body itself has become raw material. Gestures such as folding clothes, the rhythm of cooking, or the motion of opening a refrigerator are being collected, priced, and resold. These raw materials flow from ordinary households in India, Nigeria, the Philippines, and Kenya, converge at companies in Palo Alto and San Francisco, and are transformed into products that enter the market.

In their study of the digital economy, Nick Couldry and Ulises Mejias proposed a framework they call "data colonialism": tech companies' appropriation of data structurally continues colonialism's historical extraction of land and resources, transforming everyday human life itself into raw material for capital.

What Information Do Workers Actually Have About How Their Data Is Used?

Perhaps most troubling is the information asymmetry between workers and the companies collecting their data. Micro1, citing confidentiality, does not disclose its client list to workers. Workers remain unclear about how their data will be stored, whether it will be resold to other third parties, or what specific robots or applications will ultimately use their movements. They sign agreements and receive payment, but they remain at the bottom of the information chain, with little knowledge of the full scope of what they are participating in.

Gray's research on ghost labor revealed something striking: workers often spontaneously find one another and form informal mutual support networks, because the work itself provides almost no support. Isolation is the default state of this kind of labor, and people must rely on each other to sustain a sense of meaning.

How Does This Connect to the Broader Humanoid Robot Market?

The timing is critical. In 2026, the global humanoid robot market is projected to reach $4.23 billion, and by 2027, mass-production plans at companies such as Tesla are expected to push global cumulative installations past 100,000 units. These robots are likely to enter factories and homes to take over physical labor, and the data used to train them comes from people who currently rely on physical labor to make a living.

The supply chain supporting this growth is equally complex. According to Morgan Stanley's Humanoid 100 report, the humanoid robotics market is projected to reach trillions of dollars in economic impact by 2050, with over one billion humanoid robots potentially in operation. Supply chain bottlenecks in actuators, sensors, wiring, connectors, and power electronics remain one of the biggest barriers to scaling humanoid robotics.

What Is "Tacit Knowledge" and Why Does It Matter for Robot Training?

Philosopher Michael Polanyi, who introduced the concept of "tacit knowledge" in his 1958 book Personal Knowledge, famously observed that "we know more than we can tell": humans possess vast stores of knowledge that exist not in propositional form but are embodied in action, perception, and intuition. Riding a bicycle is the classic example: you know how to keep your balance, but you cannot write down a set of rules that would teach it to someone else. It can only be learned through practice, gradually internalized through observation, imitation, and repetition.

What companies like Figure AI and Tesla are attempting is unprecedented: extracting this tacit knowledge from the human body and converting it into data that machines can process. The camera on a worker's forehead at Micro1 captures not just the motion of folding clothes, but also how the fingers sense the weight of the fabric, how the wrist flips at just the right moment, and how the gaze tracks the edge of the fabric throughout the folding process. Scale AI has announced that it has collected over 100,000 hours of data, and other companies including Encord and DoorDash have launched their own data collection initiatives.

This represents the first time in human history that an attempt has been made to externalize and commodify the embodied knowledge that humans use to navigate the physical world. The implications for labor, inequality, and the future of work remain largely unexamined as the humanoid robot industry accelerates toward mass production.