The AI Agents Problem: Why Humans Still Outperform AI on Complex Scientific Tasks

AI agents are transforming how scientists work, but they're not ready to replace human expertise on complex tasks. According to the 2026 Artificial Intelligence Index Report released by Stanford University, the best AI agents score roughly half as well as human specialists with PhDs when tackling multistep scientific workflows. This finding comes as the number of scientific publications mentioning AI has skyrocketed nearly 30-fold since 2010, raising questions about whether the technology is truly improving research productivity.

Why Are Scientists Adopting AI So Rapidly?

The adoption curve is striking. In 2025 alone, more than 80,000 papers and preprints in the natural sciences mentioned AI, a 26% increase over 2024. Across scientific fields, between 6% and 9% of all publications now reference AI tools. The physical sciences led the way with 33,000 AI-related publications, while the Earth sciences had the highest share at 9%.

Despite these numbers, researchers remain uncertain whether AI is actually making science faster or better.

"The studies are limited," said Yolanda Gil, a computer scientist at the University of Southern California who led the index report. "But scientists can't live without it. If you took AI away from them, there would be a riot. So it must be helping in some way."


The paradox is real: scientists feel dependent on AI tools, yet evidence that these tools boost productivity remains sparse. Some researchers worry the growth is happening too fast.

"Whether or not this explosive growth is meaningful is hotly debated. My view is that it is happening too fast, without giving scientific norms time to adjust, and so the quality of research has taken a nosedive," noted Arvind Narayanan, a computer scientist at Princeton University.


What's Holding AI Agents Back in Scientific Research?

AI agents are software programs designed to autonomously carry out sequences of actions, including complex scientific workflows. They sound promising in theory: imagine an AI that could design experiments, analyze data, and write up findings without human intervention. In practice, they fall short.

The core problem is reliability. AI agents struggle to consistently perform multistep workflows, where one mistake early on cascades into larger errors downstream. When researchers tested the best available AI agents against human PhD specialists on complex scientific tasks, the agents performed at roughly 50% of human-level accuracy. For tasks requiring precision, logical reasoning, and domain expertise, humans remain the gold standard.
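To make the cascade concrete, here is a back-of-the-envelope sketch (the per-step accuracy figures below are hypothetical illustrations, not numbers from the Stanford report): if an agent completes each step correctly with probability p and steps fail independently, the chance of a clean n-step run is p^n, which shrinks quickly as workflows grow longer.

```python
# Illustrative arithmetic only: the per-step accuracies below are
# hypothetical, not measurements from the AI Index Report.

def workflow_success_rate(per_step_accuracy: float, num_steps: int) -> float:
    """Probability of completing every step without error,
    assuming independent, equally reliable steps."""
    return per_step_accuracy ** num_steps

for p in (0.90, 0.95, 0.99):
    for n in (5, 10, 20):
        print(f"per-step accuracy {p:.0%}, {n:2d} steps -> "
              f"{workflow_success_rate(p, n):.0%} chance of a clean run")
```

Under these toy assumptions, even a 95%-reliable step leaves only about a 60% chance of getting through ten steps without an error, which matches the report's broader picture of agents faltering on long workflows.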

"Agents are wonderful, but we are still far from a place where we understand how to use them effectively," explained Yolanda Gil.


How Are Science Foundation Models Changing the Landscape?

One bright spot in the AI-for-science ecosystem is the emergence of specialized foundation models. These are large AI models trained on massive datasets from specific scientific domains, allowing them to tackle a wider range of tasks within that field. The past year has seen rapid proliferation of these tools.

A notable example is AION-1, the first foundation model designed specifically for astronomy. Trained on more than 200 million celestial objects, it can classify galaxies and estimate their properties with high accuracy. Similar specialized models are emerging across physics, chemistry, biology, and other disciplines.

The speed of development surprised even researchers.

"When I talked to scientists in 2024 and said 'There's foundation models for science', scientists would not know what that means. They didn't know they existed. I think we have seen that really advance very quickly," said Gil.


Steps to Understand AI's Role in Your Research Field

  • Assess Current Capabilities: Evaluate whether existing AI agents and foundation models in your field can handle routine tasks like literature review, data preprocessing, or hypothesis generation, while reserving complex decision-making for human experts.
  • Monitor Domain-Specific Tools: Stay informed about newly released science foundation models tailored to your discipline, as these specialized tools are advancing faster than general-purpose AI systems.
  • Design Hybrid Workflows: Rather than replacing human scientists, design research processes that leverage AI for speed and pattern recognition while keeping humans in control of critical decisions and quality assurance (see the sketch after this list).
  • Track Productivity Evidence: Document whether AI adoption actually improves your team's research output, publication rate, or discovery speed, since the broader scientific community still lacks clear evidence of productivity gains.
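To illustrate the hybrid-workflow idea from the list above, here is a minimal sketch of a pipeline in which an AI step handles routine summarization while low-confidence output is routed to a human checkpoint. Every name in it (ai_summarize_papers, human_review, CONFIDENCE_THRESHOLD) is a hypothetical placeholder, not an API from the report or from any particular tool.

```python
# A minimal human-in-the-loop pipeline sketch. All names here are
# hypothetical placeholders, not a real library's API.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # output below this goes to a human

@dataclass
class Summary:
    paper_id: str
    text: str
    confidence: float  # the model's self-reported confidence

def ai_summarize_papers(paper_ids: list[str]) -> list[Summary]:
    """Stand-in for an AI step: fast, but not fully reliable."""
    # A real pipeline would call a model here; this fakes the output.
    return [Summary(pid, f"summary of {pid}", 0.70 + 0.05 * i)
            for i, pid in enumerate(paper_ids)]

def human_review(summary: Summary) -> Summary:
    """Stand-in for the human checkpoint on low-confidence output."""
    print(f"[needs review] {summary.paper_id} "
          f"(confidence {summary.confidence:.2f})")
    return summary  # a real reviewer would correct or reject here

def hybrid_pipeline(paper_ids: list[str]) -> list[Summary]:
    results = []
    for summary in ai_summarize_papers(paper_ids):
        if summary.confidence < CONFIDENCE_THRESHOLD:
            summary = human_review(summary)  # human stays in control
        results.append(summary)
    return results

if __name__ == "__main__":
    hybrid_pipeline(["paper-001", "paper-002", "paper-003"])
```

Gating on the model's own confidence is just one design choice; a team could instead sample a fixed fraction of outputs for audit, which avoids trusting the model's self-assessment.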

What Does This Mean for the Future of Scientific Research?

The gap between AI agent performance and human expertise suggests that the near-term future of AI in science is not replacement, but augmentation. AI tools excel at specific, well-defined subtasks: scanning thousands of papers for relevant citations, identifying patterns in large datasets, or generating initial hypotheses. Humans excel at the creative, strategic, and judgment-heavy aspects of research.

The challenge ahead is figuring out how to integrate these capabilities effectively. As more scientists adopt AI tools without clear guidance on best practices, the risk of lower-quality research increases. At the same time, researchers who ignore AI entirely may fall behind peers who use it strategically for routine work.

The Stanford report suggests that the scientific community needs time to develop norms around AI use, establish best practices for validation and reproducibility, and honestly assess where AI adds value versus where it introduces risk. Until then, the explosive growth in AI-related publications may reflect adoption hype more than genuine scientific progress.