Google DeepMind has introduced a scientific framework for measuring progress toward artificial general intelligence (AGI) by identifying 10 key cognitive abilities that AI systems need to develop. The research team released a paper titled "Measuring Progress Toward AGI: A Cognitive Taxonomy" on March 17, 2026, alongside a community hackathon offering $200,000 in prizes to help build the evaluations that will put this framework into practice.

## Why Has Measuring AGI Progress Been So Difficult?

One of the biggest challenges in AI research is knowing how close we actually are to artificial general intelligence: a system that could match or exceed human cognitive abilities across virtually any task. The core problem is a lack of empirical tools for evaluating whether AI systems are genuinely becoming more intelligent in a general sense, or simply getting better at narrow, specific tasks.

Ryan Burnell, a research scientist at Google DeepMind, and Oran Kelly, a product manager at the same organization, led the effort to address this gap by drawing on decades of research from psychology, neuroscience, and cognitive science.

## What Are the 10 Cognitive Abilities in Google's Framework?

Rather than relying on traditional benchmarks that test narrow capabilities, Google DeepMind's framework identifies 10 cognitive abilities that researchers hypothesize will be important for general intelligence in AI systems. These abilities span the full spectrum of human cognition and form the foundation for how the team plans to evaluate progress.

- Perception: The ability to extract and process sensory information from the environment, such as understanding images, audio, or text input.
- Generation: The capacity to produce outputs such as text, speech, and physical actions in response to tasks.
- Attention: The skill of focusing cognitive resources on what matters most in a given situation or problem.
- Learning: The capability to acquire new knowledge through experience and direct instruction without being explicitly programmed.
- Memory: The ability to store and retrieve information over time, maintaining context across conversations or tasks.
- Reasoning: The capacity to draw valid conclusions through logical inference and structured thinking.
- Metacognition: Knowledge and monitoring of one's own cognitive processes, essentially thinking about thinking.
- Executive Functions: Skills like planning, inhibition of impulses, and the cognitive flexibility to switch between tasks.
- Problem Solving: The ability to find effective solutions to domain-specific problems that haven't been encountered before.
- Social Cognition: The capacity to process and interpret social information and respond appropriately in social situations.
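For readers thinking about how evaluations might be organized around this taxonomy, here is a minimal Python sketch of one way to encode the 10 abilities as machine-readable tags inside an evaluation harness. The class, constant, and task names are illustrative assumptions, not part of any official DeepMind or Kaggle API.

```python
from enum import Enum


class CognitiveAbility(Enum):
    """The 10 abilities in the taxonomy, encoded as tags for evaluation tasks."""
    PERCEPTION = "perception"
    GENERATION = "generation"
    ATTENTION = "attention"
    LEARNING = "learning"
    MEMORY = "memory"
    REASONING = "reasoning"
    METACOGNITION = "metacognition"
    EXECUTIVE_FUNCTIONS = "executive_functions"
    PROBLEM_SOLVING = "problem_solving"
    SOCIAL_COGNITION = "social_cognition"


# Hypothetical example: tagging each evaluation task with the abilities it
# exercises, so results can later be aggregated per ability.
task_tags = {
    "n_back_recall": {CognitiveAbility.MEMORY, CognitiveAbility.ATTENTION},
    "false_belief_story": {CognitiveAbility.SOCIAL_COGNITION, CognitiveAbility.REASONING},
}
```

Tagging tasks this way makes it straightforward to report one score per ability rather than a single aggregate number, which is the granularity the framework calls for.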
## How Can Researchers Participate in the Cognitive Abilities Hackathon?

Google DeepMind is inviting the global research community to help build the evaluations needed to measure these cognitive abilities. The hackathon focuses on the five areas where the evaluation gap is largest: learning, metacognition, attention, executive functions, and social cognition. Participants can use Kaggle's newly launched Community Benchmarks platform to design and test their evaluations against a lineup of frontier AI models.

- Timeline: Submissions are open from March 17 through April 16, 2026, with results announced on June 1.
- Prize Structure: The hackathon offers a total prize pool of $200,000, including $10,000 awards for the top two submissions in each of the five cognitive ability tracks, plus $25,000 grand prizes for the four best overall submissions.
- Platform Access: Participants can build and test their evaluations on Kaggle's Community Benchmarks platform, which provides access to frontier models for benchmarking purposes.
- Submission Requirements: Entries should be rigorous evaluations that test AI systems' performance in one or more of the five targeted cognitive abilities against human baselines.

## How Does the Evaluation Protocol Actually Work?

Google DeepMind's approach to measuring cognitive abilities follows a three-stage evaluation protocol that benchmarks AI system performance against human capabilities. This methodology ensures that progress toward AGI is measured against a meaningful standard: how well AI systems perform compared to humans across diverse cognitive tasks.

1. Evaluate AI systems across a broad suite of cognitive tasks covering each of the 10 abilities, using held-out test sets to prevent data contamination, where models might have already seen the test data during training.
2. Collect human baselines for the same tasks from a demographically representative sample of adults, ensuring that the human comparison group reflects the diversity of human cognition.
3. Map each AI system's performance onto the distribution of human performance in each ability, showing whether the AI exceeds, matches, or falls short of human-level capability in specific domains (a minimal sketch of this mapping appears at the end of this article).

## Why Does This Framework Matter for the Future of AI?

The significance of this framework extends beyond academic interest. Artificial general intelligence has the potential to accelerate scientific discovery and help solve some of humanity's most pressing problems, from disease research to climate modeling. However, without clear metrics for measuring progress, it's impossible to know whether we're moving toward that goal or simply creating increasingly specialized systems that appear intelligent in narrow domains. By establishing a cognitive taxonomy grounded in decades of psychological and neuroscience research, Google DeepMind is creating a shared language and measurement standard that the entire AI research community can use to track genuine progress toward AGI.

The hackathon component is particularly important because it democratizes the evaluation-building process. Rather than relying solely on researchers at major labs, Google DeepMind is inviting the broader community of data scientists, machine learning engineers, and academics to contribute their expertise in designing rigorous cognitive tests. This crowdsourced approach could yield more diverse and robust evaluations than any single organization could develop alone, ultimately creating a more comprehensive picture of how close AI systems are to achieving general intelligence.
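To ground the three-stage protocol described above in something concrete, the sketch below shows one possible version of stage three: mapping a model's per-ability scores onto a sample of human baseline scores and labeling the result. All numbers, thresholds, and function names are hypothetical placeholders for illustration, not DeepMind's implementation or the Kaggle platform's API.

```python
def percentile_of(score: float, human_scores: list[float]) -> float:
    """Empirical percentile of an AI score within a human baseline sample."""
    below = sum(1 for h in human_scores if h < score)
    ties = sum(1 for h in human_scores if h == score)
    return 100.0 * (below + 0.5 * ties) / len(human_scores)


def classify(pct: float, match_band: tuple[float, float] = (25.0, 75.0)) -> str:
    """Label the AI relative to humans; the 25th-75th percentile band is an assumed convention."""
    lo, hi = match_band
    if pct < lo:
        return "falls short of human-level"
    if pct > hi:
        return "exceeds human-level"
    return "matches human-level"


# Stage 1 output: AI scores per ability on held-out task suites (hypothetical values).
ai_scores = {"reasoning": 0.82, "memory": 0.64, "social_cognition": 0.41}

# Stage 2 output: human baselines on the same tasks from a representative sample
# (hypothetical values; real baselines would come from human participants).
human_baselines = {
    "reasoning": [0.55, 0.61, 0.68, 0.72, 0.75, 0.80, 0.84],
    "memory": [0.60, 0.66, 0.70, 0.73, 0.78, 0.81, 0.85],
    "social_cognition": [0.58, 0.63, 0.69, 0.74, 0.77, 0.82, 0.88],
}

# Stage 3: map each AI score onto the human distribution for that ability.
for ability, score in ai_scores.items():
    pct = percentile_of(score, human_baselines[ability])
    print(f"{ability}: score {score:.2f} -> {pct:.0f}th percentile ({classify(pct)})")
```

The key design choice illustrated here is that the comparison is distributional: a model isn't simply marked "human-level" or not, but placed somewhere within the spread of human performance for each ability.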