Biased AI can make doctors' diagnoses less accurate

A white woman wearing blue medical scrubs and a surgeon's head covering sits at a desktop computer, as if reviewing patient data. Clinicians may struggle to spot when an AI system is giving biased advice, and this could skew how they diagnose patients, a new study suggests. (Image credit: Portra via Getty Images)

Artificial intelligence (AI) has advanced, but it's still far from perfect. AI systems can make biased decisions, due to the data they're trained on or the way they're designed, and a new study suggests that clinicians using AI to help diagnose patients might not be able to spot signs of such bias.

The research, published Tuesday (Dec. 19) in the journal JAMA, tested a specific AI system designed to help doctors reach a diagnosis. The researchers found that it did indeed help clinicians more accurately diagnose patients, and if the AI "explained" how it made its decision, their accuracy increased even more.

But when the researchers tested an AI that was programmed to be intentionally biased toward giving specific diagnoses to patients with certain attributes, its use decreased the clinicians' accuracy. The researchers found that, even when the AI gave explanations that showed its results were obviously biased and filled with irrelevant information, this did little to offset the decrease in accuracy.

Although the bias in the study's AI was designed to be obvious, the research points to how hard it might be for clinicians to spot more-subtle bias in an AI they encounter outside of a research context.

"The paper just highlights how important it is to do our due diligence, in ensuring these models don't have any of these biases," Dr. Michael Sjoding, an associate professor of internal medicine at the University of Michigan and the senior author of the study, told Live Science.

For the study, the researchers created an online survey that presented doctors, nurse practitioners and physician assistants with realistic descriptions of patients who had been hospitalized with acute respiratory failure, a condition in which the lungs can't get enough oxygen into the blood. The descriptions included each patient's symptoms, the results of a physical exam, laboratory test results and a chest X-ray. Each patient had either pneumonia, heart failure, chronic obstructive pulmonary disease, several of these conditions or none of them.

During the survey, each clinician diagnosed two patients without the help of AI, six patients with AI and one with the help of a hypothetical colleague who always suggested the correct diagnosis and treatment.

Three of the AI's predictions were intentionally biased. For instance, one introduced an age-based bias, making it disproportionately more likely that a patient would be diagnosed with pneumonia if they were over age 80. Another gave patients with obesity a falsely high predicted likelihood of heart failure compared with patients of lower weights.

The AI ranked each potential diagnosis with a number from zero to 100, with 100 being the most certain. If a score was 50 or higher, the AI provided explanations of how it reached the score: Specifically, it generated "heatmaps" showing which areas of the chest X-ray the AI considered most important in making its decision.