ChatGPT is truly awful at diagnosing medical conditions
The large language model gets medical calls wrong more often than not.
ChatGPT's medical diagnoses are accurate less than half of the time, a new study reveals.
Scientists asked the artificial intelligence (AI) chatbot to assess 150 case studies from the medical website Medscape and found that GPT-3.5 (the model that powered ChatGPT at its 2022 launch) gave a correct diagnosis only 49% of the time.
Previous research showed that the chatbot could scrape a pass in the United States Medical Licensing Exam (USMLE) — a finding hailed by its authors as "a notable milestone in AI maturation."
But in the new study, published July 31 in the journal PLOS ONE, scientists cautioned against relying on the chatbot for complex medical cases that require human discernment.
"If people are scared, confused, or just unable to access care, they may be reliant on a tool that seems to deliver medical advice that's 'tailor-made' for them," senior study author Dr. Amrit Kirpalani, a doctor in pediatric nephrology at the Schulich School of Medicine and Dentistry at Western University, Ontario, told Live Science. "I think as a medical community (and among the larger scientific community) we need to be proactive about educating the general population about the limitations of these tools in this respect. They should not replace your doctor yet."
ChatGPT's ability to dispense information is based on its training data. Scraped from the repository Common Crawl, the 570 gigabytes of text data fed into the 2022 model amounts to roughly 300 billion words, which were taken from books, online articles, Wikipedia and other web pages.
Related: Biased AI can make doctors' diagnoses less accurate
AI systems spot patterns in the words they were trained on and use those patterns to predict what comes next, which is how they produce answers to prompts and questions. In theory, this makes them helpful both for medical students and for patients seeking simplified answers to complex medical questions, but the bots' tendency to "hallucinate," making up responses entirely, limits their usefulness in medical diagnosis.
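For the curious, that "predict what comes next" idea can be sketched in a few lines of Python. This is a toy word-counting model, not how ChatGPT works internally (real models learn these patterns with neural networks trained on billions of words), but it illustrates the basic mechanism:

```python
from collections import Counter, defaultdict

# Toy next-word prediction: count which word follows which in a training
# text, then predict the most frequent follower. Purely illustrative.
training_text = (
    "the patient reports chest pain the patient reports shortness "
    "of breath the doctor orders an ecg"
)

followers = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("patient"))  # -> "reports"
print(predict_next("the"))      # -> "patient" (seen twice, vs. "doctor" once)
```

A model like this has no notion of whether its prediction is true, only of what is statistically likely to follow, which is why fluent but wrong answers are possible.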
To assess the accuracy of ChatGPT's medical advice, the researchers presented the model with 150 varied case studies — including patient history, physical exam findings and lab images — that were intended to challenge the diagnostic abilities of trainee doctors. The chatbot chose one of four multiple-choice outcomes before responding with its diagnosis and a treatment plan, which the researchers rated for accuracy and clarity.
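The study's actual grading pipeline isn't reproduced here, but a minimal hypothetical sketch shows how such a multiple-choice benchmark might be scored; the `Case` structure, `score` function and sample vignettes below are illustrative inventions, not the researchers' code:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    vignette: str        # patient history and exam findings
    options: list[str]   # four candidate diagnoses
    answer: int          # index of the correct option

def score(cases: list[Case], choose: Callable[[Case], int]) -> float:
    """Fraction of cases where the chooser picks the correct diagnosis."""
    return sum(choose(c) == c.answer for c in cases) / len(cases)

# Stand-in "model" that always picks the first option; in a real run this
# would be replaced by a call to the chatbot under test.
def always_first(case: Case) -> int:
    return 0

cases = [
    Case("45-year-old with crushing chest pain radiating to the left arm",
         ["myocardial infarction", "GERD", "panic attack", "costochondritis"], 0),
    Case("6-year-old with a barking cough and inspiratory stridor",
         ["croup", "asthma", "epiglottitis", "bronchiolitis"], 0),
    Case("30-year-old with sudden flank pain and blood in the urine",
         ["appendicitis", "kidney stone", "UTI", "hernia"], 1),
]
print(f"accuracy: {score(cases, always_first):.0%}")  # 67% on this toy set
```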
The results were lackluster: ChatGPT got more diagnoses wrong than right, and it gave complete and relevant answers only 52% of the time. Its overall accuracy across the multiple-choice options was higher, at 74%, meaning it was considerably better at identifying and ruling out the wrong answers than at settling on the right one.
The researchers said one reason for this poor performance could be that the AI wasn't trained on a large enough clinical dataset, leaving it less able than human doctors to weigh the results of multiple tests and to avoid dealing in absolutes.
Despite its shortcomings, the researchers said AI chatbots could still be useful for teaching patients and trainee doctors, provided the systems are supervised and their pronouncements are subjected to some healthy fact-checking.
"If you go back to medical journal publications from around 1995, you can see that the very same discourse was happening with 'the world wide web. There were new publications about interesting use cases and there were also papers that were skeptical as to whether this was just a fad." Kirpalani said. "I think with AI and chatbots specifically, the medical community will ultimately find that there's a huge potential to augment clinical decision-making, streamline administrative tasks, and enhance patient engagement."

Ben Turner is a U.K.-based writer and editor at Live Science. He covers physics and astronomy, tech and climate change. He graduated from University College London with a degree in particle physics before training as a journalist. When he's not writing, Ben enjoys reading literature, playing the guitar and embarrassing himself with chess.
