An artificial intelligence program did as good a job diagnosing patients as human doctors, according to a new study from the National Institutes of Health. The AI model diagnosed clinical images on a medical quiz with high accuracy but made mistakes when explaining how it arrived at its answers.


What You Need To Know

  • An artificial intelligence program did as good a job diagnosing patients as human doctors, according to a new study from the National Institutes of Health

  • The AI model was able to diagnose clinical images on a medical quiz with high accuracy but made mistakes when explaining how it came up with the answers

  • For the study, the researchers used questions from the New England Journal of Medicine’s Image Challenge, which includes 207 real clinical images and short text descriptions of patients’ symptoms

  • While the AI model was more accurate than physicians who could not consult external resources, physicians who were allowed to look up information online outperformed the AI, especially on the most difficult questions

“Integration of AI into health care holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner,” National Library of Medicine Acting Director Stephen Sherry said in the study’s published results.

“However, as this study shows, AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis,” he said.

For the study, the researchers used questions from the New England Journal of Medicine’s Image Challenge, which includes 207 real clinical images and short text descriptions of patients’ symptoms. The online quiz asks users to pick the correct diagnosis from several possible answers and to provide a written justification for each answer, including a description of the image, a summary of relevant medical knowledge and step-by-step reasoning.

An AI model known as GPT-4V, which is capable of processing multiple types of data such as images and text, and nine human physicians took the quiz.

The NIH researchers found that both the AI model and the human doctors scored highly in correctly diagnosing the images. The AI was more accurate than physicians working without external resources, but physicians who were able to look up information online outperformed the AI model, especially on the most difficult questions.