Google's AI Overviews Are Spreading Misinformation at a Scale Never Seen Before

Google's AI Overviews are providing false information to hundreds of millions of people every day, even though they appear accurate 91 percent of the time. A recent analysis by the AI startup Oumi, conducted at the request of The New York Times, reveals that the AI-generated summaries appearing above Google search results create a misinformation crisis of unprecedented scale. With Google processing roughly five trillion search queries annually, a 9 percent error rate translates to tens of millions of incorrect answers every hour, or hundreds of thousands every minute.
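
To make that scale concrete, the back-of-the-envelope arithmetic behind those figures can be written out as a short Python sketch. The five-trillion-query volume and the 9 percent error rate come from the analysis itself; the simplifying assumption that every search surfaces an AI Overview is ours, so treat these as upper-bound estimates.

```python
# Back-of-the-envelope estimate of erroneous AI Overview answers.
# Assumes, for simplicity, that every search triggers an AI Overview;
# in reality only a fraction do, so these figures are upper bounds.

QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion searches annually
ERROR_RATE = 0.09                     # 9% of Gemini 3 answers are wrong

errors_per_year = QUERIES_PER_YEAR * ERROR_RATE
errors_per_hour = errors_per_year / (365 * 24)
errors_per_minute = errors_per_hour / 60

print(f"Errors per year:   {errors_per_year:,.0f}")    # 450,000,000,000
print(f"Errors per hour:   {errors_per_hour:,.0f}")    # ~51,369,863
print(f"Errors per minute: {errors_per_minute:,.0f}")  # ~856,164
```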

Why Are People Trusting AI Overviews Without Verification?

The danger lies not just in the errors themselves, but in how users interact with AI-generated content. Research shows that people tend to trust what an AI tells them without questioning it: only 8 percent of users actually double-check an AI's answer, according to one study cited in the analysis. Even more concerning, another experiment found that users still followed AI guidance nearly 80 percent of the time when it gave them the wrong answer, a phenomenon researchers dubbed "cognitive surrender."

Large language models, the AI systems powering these overviews, are designed to sound authoritative and confident. When they cannot find a straightforward answer, they can fabricate information and present it as fact. And because AI Overviews deliver those confident answers prominently above Google's search results, untold numbers of users accept the summaries without verification.

What Did the Oumi Study Actually Test?

Oumi's analysis used SimpleQA, a widely recognized benchmark for AI accuracy designed by OpenAI. The researchers conducted two rounds of testing, each involving 4,326 Google searches. The first round, conducted in October, tested AI Overviews powered by Google's Gemini 2 model. A follow-up in February tested the feature after Google switched to Gemini 3, its much-hyped upgrade.
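
Oumi has not published its test harness, but a SimpleQA-style evaluation follows a simple pattern: pose each benchmark question, capture the system's answer, and grade it against a reference answer (SimpleQA itself uses an LLM grader for that step). The Python sketch below illustrates the pattern; fetch_overview and grade are hypothetical stand-ins, not Oumi's actual code.

```python
from dataclasses import dataclass

@dataclass
class Result:
    question: str
    answer: str
    correct: bool

def fetch_overview(question: str) -> str | None:
    """Hypothetical: run a Google search and capture the AI Overview text,
    or None if no overview appears for this query."""
    raise NotImplementedError

def grade(answer: str, reference: str) -> bool:
    """Hypothetical: judge factual agreement with the reference answer
    (SimpleQA uses an LLM-based grader for this)."""
    raise NotImplementedError

def run_eval(benchmark: list[tuple[str, str]]) -> float:
    """Return accuracy over (question, reference_answer) pairs."""
    results = []
    for question, reference in benchmark:
        answer = fetch_overview(question)
        if answer is None:
            continue  # no overview shown; excluded from accuracy
        results.append(Result(question, answer, grade(answer, reference)))
    return sum(r.correct for r in results) / len(results)
```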

The results showed improvement between the two models, but also revealed troubling patterns:

  • Gemini 2 Accuracy: The older model provided factually sound responses 85 percent of the time, meaning it was wrong 15 percent of the time.
  • Gemini 3 Accuracy: The newer model improved to 91 percent accuracy, suggesting Google's models are getting better at avoiding hallucinations.
  • Ungrounded Responses: Gemini 2 provided answers that were "ungrounded" 37 percent of the time, meaning the AI cited websites that didn't actually support the information provided. Surprisingly, this problem worsened with Gemini 3, with 56 percent of its responses ungrounded.

Ungrounded responses are particularly problematic because they make it nearly impossible for users to verify the AI's claims. When an AI cites a source that doesn't support its answer, users who try to fact-check are left confused and misinformed.
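
What "ungrounded" means can be made concrete with a toy check: does the cited page actually contain the content of the claim? The sketch below uses a naive word-overlap heuristic; a real grounding evaluation, presumably including Oumi's, would use a more robust entailment judgment, so this illustrates the concept rather than the study's method.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_grounded(claim: str, cited_page_text: str, threshold: float = 0.8) -> bool:
    """Crude check: does the cited page contain most of the claim's content words?

    A real grounding check would use an entailment model; this overlap
    heuristic only illustrates what an 'ungrounded' citation looks like.
    """
    claim_words = tokens(claim) - {"the", "a", "an", "is", "was", "of", "in"}
    if not claim_words:
        return True
    overlap = len(claim_words & tokens(cited_page_text)) / len(claim_words)
    return overlap >= threshold

# Example: an AI claim the cited page supports, and one it never states.
page = "The Eiffel Tower was completed in 1889 and stands in Paris."
print(is_grounded("The Eiffel Tower was completed in 1889", page))  # True
print(is_grounded("The Eiffel Tower was completed in 1925", page))  # False
```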

How to Verify Information From AI Overviews

Given the scale and nature of these errors, users should take specific steps to protect themselves from misinformation:

  • Always Check Sources: Click through to the websites that AI Overviews cite to verify the information actually appears there and supports the AI's claim.
  • Cross-Reference Multiple Sources: Don't rely on a single source or a single AI Overview. Search for the same information across multiple websites and compare the answers (a scripted version of this check is sketched after this list).
  • Be Skeptical of Confident Statements: When an AI presents information with absolute certainty, especially on complex or controversial topics, treat it as a starting point for research rather than a final answer.
  • Use Specialized Databases for Critical Information: For health, legal, or financial questions, consult official databases, government websites, or professional organizations rather than relying on AI summaries.
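
Readers who want to automate the cross-referencing step could script something like the following: gather answers from several independent sources and accept one only when a clear majority agree. Here, ask_source is a hypothetical stand-in for whatever lookup each source requires, and the quorum threshold is an arbitrary choice.

```python
from collections import Counter

def ask_source(source: str, question: str) -> str:
    """Hypothetical: fetch this source's answer to the question."""
    raise NotImplementedError

def cross_reference(question: str, sources: list[str],
                    quorum: float = 0.75) -> str | None:
    """Return an answer only if a clear majority of sources agree on it."""
    answers = [ask_source(s, question).strip().lower() for s in sources]
    best, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= quorum:
        return best
    return None  # sources disagree: treat the AI Overview as unverified
```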

What Is Google Saying About These Findings?

Google disputed the Oumi analysis, arguing it has serious flaws. A Google spokesperson stated that the study "doesn't reflect what people are actually searching on Google." However, Google's own internal testing paints a similarly damning picture: in an internal analysis of Gemini 3, Google found that the model produced incorrect information 28 percent of the time. Google claims that AI Overviews are more accurate than the raw model because they draw on Google search results before generating answers, but that claim is contradicted by the ungrounded response data.

The improvement between Gemini 2 and Gemini 3 may mask a deeper problem. While accuracy improved by 6 percentage points, the rate of ungrounded responses increased by 19 percentage points. This suggests that Gemini 3 has become better at sounding confident while becoming worse at actually supporting its claims with real sources.

Why Does This Matter for AI Research and Development?

This situation highlights a critical gap between how AI models are evaluated in research settings and how they perform in real-world applications serving billions of users. The SimpleQA benchmark, while widely used in the AI research community, may not capture the full scope of how these models fail in practice. The fact that Google was willing to deploy a model with an 85 percent accuracy rate to its entire user base, only to later upgrade to a 91 percent accurate version, suggests that accuracy thresholds for public-facing AI systems may need to be reconsidered.

The scale of this misinformation crisis is difficult to overstate. With hundreds of thousands of incorrect answers provided every minute, Google's AI Overviews represent a fundamentally new category of information hazard. Unlike traditional misinformation, which spreads through social networks and requires active sharing, AI Overviews deliver false information directly to users through one of the world's most trusted search engines, with minimal friction and maximum authority.