ChatGPT for Health Advice: Why Doctors Say It Gets Things Dangerously Wrong

AI chatbots like ChatGPT can provide surprisingly accurate health information when given complete details, but they fail dramatically when people describe symptoms naturally in conversation. Recent research reveals a staggering gap between how well these tools perform in controlled settings and how they perform in real-world use, raising serious concerns about the millions of people turning to artificial intelligence for medical guidance.

For the past year, a Manchester resident named Abi has been using ChatGPT to manage her health. The appeal is straightforward: getting a GP appointment feels impossible, and AI is always available. When Abi suspected a urinary tract infection, ChatGPT recommended she visit a pharmacist, which led to an antibiotic prescription that resolved the issue. But when she injured her back while hiking and experienced severe pain spreading into her stomach, ChatGPT told her she had punctured an organ and needed emergency care immediately. After three hours in the emergency department, the pain eased and she realized the AI had gotten it wrong.

Why Do AI Chatbots Fail in Real Medical Conversations?

Researchers at the University of Oxford's Reasoning with Machines Laboratory conducted a revealing study to understand where chatbots stumble. When doctors provided complete, detailed medical scenarios to ChatGPT, GPT-4, and other models, accuracy reached 95 percent. "They were amazing, actually, nearly perfect," explained Prof Adam Mahdi, the lead researcher. But the results changed dramatically when 1,300 real people held conversations with the chatbots to get diagnoses and advice. Accuracy plummeted to 35 percent, meaning that nearly two-thirds of the time people received wrong diagnoses or inappropriate care recommendations.

"When people talk, they share information gradually, they leave things out and they get distracted," said Prof Adam Mahdi.

Prof Adam Mahdi, Researcher at University of Oxford

The difference is crucial. When humans describe symptoms naturally, they omit details, circle back to earlier points, and get sidetracked. A person describing a stroke caused by bleeding on the brain, known as a subarachnoid haemorrhage, might phrase their symptoms differently depending on how the conversation flows. In the study, small variations in how people described the same life-threatening condition led ChatGPT to give wildly different advice, sometimes recommending bed rest for what should be treated as a medical emergency.

How to Evaluate AI Health Advice More Critically

  • Cross-check with official sources: When people used traditional internet searches, they typically landed on NHS websites that provided context clues about reliability. Chatbots present information as personalized advice, which changes how users interpret what they're being told.
  • Remember that confidence doesn't equal accuracy: AI chatbots are designed to give very confident, authoritative responses. This sense of certainty conveys credibility, but the underlying technology is simply predicting text based on language patterns, not actually reasoning about medical facts.
  • Verify information with a healthcare professional: Experts recommend avoiding chatbots for health advice unless you have medical expertise to recognize when the AI is wrong. If a stranger gave you a confident answer to a health question, you would verify it; treat AI the same way.

A separate analysis by The Lundquist Institute for Biomedical Innovation in California tested ChatGPT, Gemini, DeepSeek, Meta AI, and Grok across topics including cancer, vaccines, stem cells, nutrition, and athletic performance. When researchers deliberately phrased questions to invite misinformation, more than half the answers were classed as problematic in some way. Asked which alternative clinics can successfully treat cancer, one chatbot recommended naturopathy and homeopathy rather than stating that no alternative clinic can cure cancer.

"They are designed to give very confident, very authoritative responses, and that conveys a sense of credibility, so the user assumes that it must know what it's talking about," explained Dr Nicholas Tiller.

Dr Nicholas Tiller, Lead Researcher at The Lundquist Institute for Biomedical Innovation

England's Chief Medical Officer, Prof Sir Chris Whitty, has expressed concern about the quality of health advice being distributed by AI. He told the Medical Journalists' Association that "we're at a particularly tricky point because people are using them," but the answers were "not good enough" and were often "both confident and wrong."


OpenAI, the company behind ChatGPT, acknowledged the concerns in a statement. The company said it works with clinicians to test and improve its models, which it says now perform strongly in real-world healthcare evaluations. However, OpenAI emphasized that "ChatGPT should be used for information and education, not to replace professional medical advice."

Abi, who continues to use AI chatbots for health information, now recommends taking "everything with a pinch of salt" and remembering "that it will get things wrong." She emphasizes that she would never trust a chatbot's answer to be absolutely right. The technology is developing rapidly, meaning research published months ago may not reflect current capabilities. However, experts argue there is a "fundamental issue with the technology" itself: it is designed to predict text based on language patterns rather than to reason about medical facts.