Google DeepMind has introduced a scientific framework for measuring progress toward artificial general intelligence (AGI) by identifying 10 key cognitive abilities that AI systems need to develop. The research team released a paper titled "Measuring Progress Toward AGI: A Cognitive Taxonomy" on March 17, 2026, alongside a community hackathon offering $200,000 in prizes to help build the evaluations that will put this framework into practice.

## Why Has Measuring AGI Progress Been So Difficult?

One of the biggest challenges in AI research is knowing how close we actually are to artificial general intelligence: a system that could match or exceed human cognitive abilities across virtually any task. The core problem is a lack of empirical tools for evaluating whether AI systems are genuinely becoming more intelligent in a general sense, or simply getting better at narrow, specific tasks.

Ryan Burnell, a research scientist at Google DeepMind, and Oran Kelly, a product manager at the same organization, led the effort to address this gap by drawing on decades of research from psychology, neuroscience, and cognitive science.

## What Are the 10 Cognitive Abilities in Google's Framework?

Rather than relying on traditional benchmarks that test narrow capabilities, Google DeepMind's framework identifies 10 cognitive abilities that researchers hypothesize will be important for general intelligence in AI systems. These abilities span the full spectrum of human cognition and form the foundation for how the team plans to evaluate progress.

- Perception: The ability to extract and process sensory information from the environment, such as understanding images, audio, or text input.
- Generation: The capacity to produce outputs such as text, speech, and physical actions in response to tasks.
- Attention: The skill of focusing cognitive resources on what matters most in a given situation or problem.
- Learning: The capability to acquire new knowledge through experience and direct instruction without being explicitly programmed.
- Memory: The ability to store and retrieve information over time, maintaining context across conversations or tasks.
- Reasoning: The capacity to draw valid conclusions through logical inference and structured thinking.
- Metacognition: Knowledge and monitoring of one's own cognitive processes, essentially thinking about thinking.
- Executive Functions: Skills like planning, inhibition of impulses, and the cognitive flexibility to switch between tasks.
- Problem Solving: The ability to find effective solutions to domain-specific problems that haven't been encountered before.
- Social Cognition: The capacity to process and interpret social information and respond appropriately in social situations.
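For readers thinking about how evaluations might be organized around this taxonomy, here is a minimal Python sketch of one way to encode the 10 abilities as machine-readable tags inside an evaluation harness. The class, constant, and task names are illustrative assumptions, not part of any official DeepMind or Kaggle API.

```python
from enum import Enum


class CognitiveAbility(Enum):
    """The 10 abilities in the taxonomy, encoded as tags for evaluation tasks."""
    PERCEPTION = "perception"
    GENERATION = "generation"
    ATTENTION = "attention"
    LEARNING = "learning"
    MEMORY = "memory"
    REASONING = "reasoning"
    METACOGNITION = "metacognition"
    EXECUTIVE_FUNCTIONS = "executive_functions"
    PROBLEM_SOLVING = "problem_solving"
    SOCIAL_COGNITION = "social_cognition"


# Hypothetical example: tagging each evaluation task with the abilities it
# exercises, so results can later be aggregated per ability.
task_tags = {
    "n_back_recall": {CognitiveAbility.MEMORY, CognitiveAbility.ATTENTION},
    "false_belief_story": {CognitiveAbility.SOCIAL_COGNITION, CognitiveAbility.REASONING},
}
```

Tagging tasks this way makes it straightforward to report one score per ability rather than a single aggregate number, which is the granularity the framework calls for.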
## How Can Researchers Participate in the Cognitive Abilities Hackathon?

Google DeepMind is inviting the global research community to help build the evaluations needed to measure these cognitive abilities. The hackathon focuses on the five areas where the evaluation gap is largest: learning, metacognition, attention, executive functions, and social cognition. Participants can use Kaggle's newly launched Community Benchmarks platform to design and test their evaluations against a lineup of frontier AI models.

- Timeline: Submissions are open from March 17 through April 16, 2026, with results announced on June 1.
- Prize Structure: The hackathon offers a total prize pool of $200,000, including $10,000 awards for the top two submissions in each of the five cognitive ability tracks, plus $25,000 grand prizes for the four best overall submissions.
- Platform Access: Participants can build and test their evaluations on Kaggle's Community Benchmarks platform, which provides access to frontier models for benchmarking purposes.
- Submission Requirements: Entries should be rigorous evaluations that test AI systems' performance in one or more of the five targeted cognitive abilities against human baselines.

## How Does the Evaluation Protocol Actually Work?

Google DeepMind's approach to measuring cognitive abilities follows a three-stage evaluation protocol that benchmarks AI system performance against human capabilities. This methodology ensures that progress toward AGI is measured against a meaningful standard: how well AI systems perform compared to humans across diverse cognitive tasks.

1. Evaluate AI systems across a broad suite of cognitive tasks covering each of the 10 abilities, using held-out test sets to prevent data contamination, where models might have already seen the test data during training.
2. Collect human baselines for the same tasks from a demographically representative sample of adults, ensuring that the human comparison group reflects the diversity of human cognition.
3. Map each AI system's performance onto the distribution of human performance in each ability, showing whether the AI exceeds, matches, or falls short of human-level capability in specific domains (a minimal sketch of this mapping appears at the end of this article).

## Why Does This Framework Matter for the Future of AI?

The significance of this framework extends beyond academic interest. Artificial general intelligence has the potential to accelerate scientific discovery and help solve some of humanity's most pressing problems, from disease research to climate modeling. However, without clear metrics for measuring progress, it's impossible to know whether we're moving toward that goal or simply creating increasingly specialized systems that appear intelligent in narrow domains. By establishing a cognitive taxonomy grounded in decades of psychological and neuroscience research, Google DeepMind is creating a shared language and measurement standard that the entire AI research community can use to track genuine progress toward AGI.

The hackathon component is particularly important because it democratizes the evaluation-building process. Rather than relying solely on researchers at major labs, Google DeepMind is inviting the broader community of data scientists, machine learning engineers, and academics to contribute their expertise in designing rigorous cognitive tests. This crowdsourced approach could yield more diverse and robust evaluations than any single organization could develop alone, ultimately creating a more comprehensive picture of how close AI systems are to achieving general intelligence.
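To ground the three-stage protocol described above in something concrete, the sketch below shows one possible version of stage three: mapping a model's per-ability scores onto a sample of human baseline scores and labeling the result. All numbers, thresholds, and function names are hypothetical placeholders for illustration, not DeepMind's implementation or the Kaggle platform's API.

```python
def percentile_of(score: float, human_scores: list[float]) -> float:
    """Empirical percentile of an AI score within a human baseline sample."""
    below = sum(1 for h in human_scores if h < score)
    ties = sum(1 for h in human_scores if h == score)
    return 100.0 * (below + 0.5 * ties) / len(human_scores)


def classify(pct: float, match_band: tuple[float, float] = (25.0, 75.0)) -> str:
    """Label the AI relative to humans; the 25th-75th percentile band is an assumed convention."""
    lo, hi = match_band
    if pct < lo:
        return "falls short of human-level"
    if pct > hi:
        return "exceeds human-level"
    return "matches human-level"


# Stage 1 output: AI scores per ability on held-out task suites (hypothetical values).
ai_scores = {"reasoning": 0.82, "memory": 0.64, "social_cognition": 0.41}

# Stage 2 output: human baselines on the same tasks from a representative sample
# (hypothetical values; real baselines would come from human participants).
human_baselines = {
    "reasoning": [0.55, 0.61, 0.68, 0.72, 0.75, 0.80, 0.84],
    "memory": [0.60, 0.66, 0.70, 0.73, 0.78, 0.81, 0.85],
    "social_cognition": [0.58, 0.63, 0.69, 0.74, 0.77, 0.82, 0.88],
}

# Stage 3: map each AI score onto the human distribution for that ability.
for ability, score in ai_scores.items():
    pct = percentile_of(score, human_baselines[ability])
    print(f"{ability}: score {score:.2f} -> {pct:.0f}th percentile ({classify(pct)})")
```

The key design choice illustrated here is that the comparison is distributional: a model isn't simply marked "human-level" or not, but placed somewhere within the spread of human performance for each ability.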