A growing number of AI safety researchers are stepping back from doomsday predictions, arguing that the field has made enough progress on alignment that catastrophic outcomes are becoming less likely. Yet this optimism masks a deeper tension: the problems that remain unsolved may be the most dangerous ones of all, and whether humanity avoids disaster depends entirely on choices that companies and governments have yet to make.

The shift in tone is striking. In January 2026, David Dalrymple, programme director at the UK's Advanced Research and Invention Agency, dropped his probability estimate for AI-caused human extinction from 40 to 50 percent down to just 5 to 8 percent, even assuming no further progress on alignment. Around the same time, researcher Adrià Garriga-Alonso, who previously worked at FAR.AI and Redwood Research, quit his AI safety job in December, concluding there was "no point" doing more speculative alignment work because current strategies would be sufficient.

This represents a dramatic reversal from the 2010s, when many AI researchers viewed alignment as a problem that might have to be solved perfectly on the first attempt, with extinction-level consequences for failure. The change reflects genuine technical progress in how AI systems learn human values.

What Changed in How AI Systems Learn Values?

The breakthrough came from an unexpected direction. For years, AI luminaries including Yoshua Bengio, Stuart Russell, and Yann LeCun debated whether AI systems could ever truly understand human values. The dominant approach at the time, reinforcement learning, involved training AI through trial and error in simulated environments, a process fundamentally alien to how humans learn. But the field shifted when large language models (LLMs), AI systems trained on vast amounts of text, became the dominant paradigm. "We do this pretraining on human data, and then we get something that understands human values fairly innately now," Garriga-Alonso explained.
Anthropic CEO Dario Amodei agrees, noting that "models inherit a vast range of humanlike motivations from pretraining." This understanding underpins current alignment efforts like Anthropic's constitution for Claude, which guides the model using written principles like "helpful" and "harmless".

The second reason for optimism is that progress in AI development has been more gradual than feared. Rather than sudden leaps in capability, the field has seen multiple frontier models, successive versions, and continuous experimentation. This means researchers can iterate and learn from mistakes rather than needing to get everything right immediately.

How Are Researchers Building Safety Infrastructure for Future AI Systems?

Safety researchers are developing concrete tools and approaches to manage increasingly powerful AI systems:

- Model Organisms Research: Scientists create toy environments where misalignment can be studied in controlled settings, testing whether AI systems demonstrate misaligned behavior and how well safety strategies work when success and failure are measurable.
- Scalable Oversight Methods: Researchers are developing approaches that use aligned but weaker AI models to monitor stronger ones, extending human ability to oversee systems that are smarter than humans.
- Iterative Development: Jan Leike, an AI alignment researcher now leading the Alignment Science team at Anthropic, argues that "we can evolve our mitigations and safeguards incrementally with our models," reducing the pressure to solve everything at once.

Ryan Greenblatt, chief scientist at Redwood Research, told Transformer that baseline scalable oversight methods have worked better than expected, though he noted less effort has been put toward developing these strategies than he would have hoped. These approaches could help ensure that smarter-than-human AI systems don't operate beyond human understanding.

Why Are Some Experts Still Deeply Concerned?
Despite the optimistic reassessments, Dalrymple's revised 5 to 8 percent extinction risk estimate remains sobering. A one-in-20 chance of human extinction would make AI the biggest threat humanity faces, potentially worse than estimates for nuclear war, climate change, and engineered pandemics combined.

More importantly, the hardest problems remain untouched. "We're still doing alignment on easy mode since our models aren't really superhuman yet," Jan Leike explained. The crucial test will come when AI systems become qualitatively superhuman, at which point "lots of stuff starts breaking down," according to Evan Hubinger, the alignment stress-testing team lead at Anthropic.

Warning signs are already emerging. Research designed to elicit misaligned behavior has uncovered blackmail, deception, and cheating in AI systems. Dario Amodei writes that such problems "seem particularly likely to occur when AI systems pass a threshold from less powerful than humans to more powerful than humans, since the range of possible actions an AI system could engage in, including hiding its actions or deceiving humans about them, expands radically after that threshold." He estimates a 25 percent chance of things going "really, really badly".

The gap between solvability and actual solutions is critical. Renowned AI professor Stuart Russell said in December that while it is possible to make safe, aligned AI, companies "need to make the AI systems millions of times safer" to bring risk down to the levels deemed acceptable for other hazards, such as nuclear reactors or asteroid strikes. Without regulation, he argued, this won't happen.

Ryan Greenblatt has written that existential risk from misalignment could be reduced to 7 percent if there is political will for international coordination and significant investment in safety work. "It seems to me like risk is very elastic to how much people try," he noted. "If the world was trying very hard, risk would probably be lower."
The emerging consensus among safety researchers is neither complacent nor alarmist: alignment problems are solvable, but only if the world actually prioritizes solving them. As Leike put it, "Just because a problem is solvable, this doesn't mean it's solved. We have to actually keep doing the work to get it done."