Why Boston Dynamics' Spot Robot Now Understands the Physical World Like Humans Do
Boston Dynamics has given its Spot robot a major upgrade: the ability to reason about the physical world in ways that align with human understanding. The quadruped robot is now equipped with Google DeepMind's Gemini Robotics-ER 1.6, a high-level reasoning model that transforms how Spot approaches complex industrial inspection tasks. This partnership marks a significant step toward robots that can work autonomously without constant human instruction.
What Does It Mean for a Robot to "Understand" the Physical World?
The word "understanding" gets thrown around a lot in robotics and AI, but what it actually means in practice remains fuzzy. The key difference with Gemini Robotics-ER 1.6 is that it attempts to make robots think the way humans do about their environment. This matters because when a human tells a robot to do something, there's often a gap between the instruction and how the robot decides to execute it.
"The benchmark we measure ourselves against when it comes to understanding is that the system should answer the way a human would," explained Carolina Parada, Head of Robotics at Google DeepMind.
Consider a simple example: if you ask a robot to recycle cans in a living room, it might grip the can sideways. A human would avoid this because we know from experience that cans with leftover liquid will spill. Robots don't have that lifetime of accumulated knowledge, but Gemini Robotics-ER 1.6 is designed to reason about these kinds of safety considerations.
How Does Spot Actually Use This New Reasoning Capability?
Boston Dynamics has deployed several thousand Spot robots in commercial settings, making it one of the few companies operating embodied AI at real-world scale. The new reasoning model targets one of the rare applications where legged robots have proven commercially viable: industrial inspection.
With Gemini Robotics-ER 1.6, Spot can now perform tasks that previously required human oversight or explicit programming. The robot's new capabilities include:
- Autonomous hazard detection: Spot can independently identify dangerous debris or spills in industrial facilities without being explicitly told what to look for.
- Instrument reading: The robot can read complex gauges and sight glasses, interpreting analog measurements that would normally require human inspection (a sketch of what such a query might look like follows this list).
- Environmental reasoning: Spot can call on vision-language-action models to understand what's happening in its surroundings and respond appropriately.
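To make the instrument-reading capability concrete, here is a minimal sketch of what a gauge-reading query to a vision-language model could look like. It assumes the google-genai Python SDK, and the model ID and prompt are illustrative assumptions, not Boston Dynamics' actual integration or a confirmed identifier for Gemini Robotics-ER 1.6.

```python
# Hedged sketch: asking a vision-language model to read an analog gauge.
# Assumes the google-genai Python SDK and an API key in the environment.
# The model ID below is an assumption, not a confirmed identifier.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

with open("gauge_photo.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # illustrative; swap in the real ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Read the pressure gauge in this photo. Report the needle value, "
        "the units printed on the dial, and whether the reading falls "
        "inside the marked safe operating range.",
    ],
)
print(response.text)
```

The same pattern generalizes to hazard detection: swap the gauge prompt for one asking the model to list any spills or debris visible in the frame.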
"Advances like Gemini Robotics ER 1.6 mark an important step toward robots that can better understand and operate in the physical world. Capabilities like instrument reading and more reliable task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously," said Marco da Silva, Vice President and General Manager of Spot at Boston Dynamics.
The reasoning model also includes a safety layer, evaluated against DeepMind's ASIMOV benchmark, that is designed to steer the system away from dangerous actions. For instance, if you ask the robot to bring you a cup of water, it will reason that the cup should not be placed at the edge of a table, where it could fall.
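As a rough illustration of what this kind of semantic safety gate could look like in code (the helper and prompt below are hypothetical; ASIMOV itself is an evaluation benchmark, not an API):

```python
# Hedged sketch: a semantic safety gate checked before executing an action.
# ask_model is a hypothetical text-in/text-out model call; the prompt is
# only illustrative of the hazard reasoning described above.
from typing import Callable

def is_action_safe(scene: str, action: str,
                   ask_model: Callable[[str], str]) -> bool:
    prompt = (
        f"Scene: {scene}\n"
        f"Proposed action: {action}\n"
        "Could this action plausibly cause harm or damage, such as an "
        "object falling or spilling? Answer only 'SAFE' or 'UNSAFE'."
    )
    return ask_model(prompt).strip().upper().startswith("SAFE")

# Example: "place full cup at the table's edge" should come back UNSAFE,
# prompting the planner to choose a placement farther from the edge.
```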
What's the Catch? Why Can't Robots Just Use All Their Sensors?
There's still a disconnect between what Gemini Robotics-ER 1.6 can do and what physical robots actually need to operate reliably. One of the new features is success detection, which uses multiple camera angles to determine whether Spot has successfully grasped an object. But robots have other well-established ways to sense a successful grasp, including touch sensors and force sensors, that the model isn't currently using.
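Here is a minimal sketch of how a vision-only, multi-view grasp check could be aggregated. The ask_model parameter is a hypothetical stand-in for a vision-language query like the gauge-reading sketch above, not a real API:

```python
# Hedged sketch: vision-only grasp verification from multiple camera angles.
# ask_model is a hypothetical stand-in for a per-frame vision-language query.
from typing import Callable

def detect_grasp_success(
    frames: list[bytes],
    ask_model: Callable[[bytes, str], str],
) -> bool:
    """Majority vote over per-view yes/no judgments."""
    question = (
        "Is the target object securely held in the robot's gripper? "
        "Answer only 'yes' or 'no'."
    )
    votes = sum(
        1 for frame in frames
        if ask_model(frame, question).strip().lower().startswith("yes")
    )
    # Require agreement from more than half of the viewpoints, since any
    # single camera angle can be occluded and judge wrongly.
    return votes > len(frames) / 2
```

A majority vote across viewpoints is one simple way to hedge against a single occluded camera, which is the kind of ambiguity that multi-angle success detection has to cope with.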
The reason reveals a fundamental problem the robotics field is still wrestling with: training models requires data, and there's a massive shortage of the right kind of data. Vision data is abundant on the internet, but touch and force sensing data is extremely rare. This creates a bottleneck where models become vision-only by necessity, not by design.
"At the moment, these models are strictly vision only. There is lots of visual information on the web about how to pick up a pen. If we had enough data with touch information, we could easily learn it, but there is not a lot of data with touch sensing on the internet," noted Carolina Parada.
Boston Dynamics is addressing this by requiring customers who use the new inspection capabilities to share their data with the company. This real-world data becomes part of the training pipeline for future versions of the model.
How Reliable Does a Robot Actually Need to Be?
You might assume that robots need to be nearly perfect to be useful in industrial settings, but the reality is more nuanced. Most critical infrastructure in facilities is already instrumented with sensors that alert operators when something goes wrong. Spot's job is to catch the problems that fall through the cracks: the uninstrumented issues that could still cause trouble if nobody's paying attention.
Boston Dynamics has found that somewhere north of 80 percent accuracy is the threshold where operators find the robot useful rather than annoying. Below that level, the robot essentially becomes a false alarm machine, and operators start ignoring it. This insight comes from real-world deployment experience with actual customers, not laboratory testing.
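As a back-of-the-envelope illustration of that threshold (the alert counts below are invented; only the 80 percent figure comes from the reporting above):

```python
# Hedged sketch: tracking whether an inspection robot's alerts stay above
# the ~80% usefulness threshold described above. All counts are invented.

def alert_precision(true_alerts: int, false_alerts: int) -> float:
    """Fraction of raised alerts that flagged a real problem."""
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0

USEFUL_THRESHOLD = 0.80  # below this, operators start tuning the robot out

precision = alert_precision(true_alerts=46, false_alerts=9)  # 46/55 ~ 0.84
print(f"precision={precision:.2f}, useful={precision >= USEFUL_THRESHOLD}")
```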
"We take this very seriously. We roll out new DeepMind capabilities through beta programs to a smaller set of customers to understand what to anticipate, and we only actively advertise features we are confident will work," said Marco da Silva.
The company's cautious approach to rolling out new features reflects the trust problem that comes with deploying AI in the real world. Customers need to believe that the robot will work reliably, and Boston Dynamics has learned that overpromising capabilities damages that trust.
What Comes Next for Embodied AI?
Both Boston Dynamics and Google DeepMind see the Spot deployment as a testing ground for how reasoning models can be most useful in the real world. The insights gained from Spot's inspection work will inform how similar models are applied to other embodied AI platforms, including Boston Dynamics' humanoid robot Atlas.
The ultimate goal isn't to make better inspection robots. It's to get closer to safe and reliable robots that can perform household tasks like picking up laundry, taking a dog for a walk, or clearing away soda cans without making a mess. Each real-world deployment of Spot brings the field closer to that vision by generating the kind of data and operational insights that can't be obtained in a laboratory.