Understanding aches, pains, and medical conditions is a challenging task. Doctors and other health professionals attend school for years to learn about symptoms, diagnostic methods, and treatments. Afterwards, most spend considerable time reading literature and staying up to date with developments in the field.
Suddenly, large language models are changing the way clinicians approach medical diagnostics. For years, machine learning and other forms of AI have helped decipher data, spot trends, and automate processes. Now, ChatGPT and other Generative AI (GenAI) tools are streamlining administrative tasks, enhancing clinical decision support, and augmenting patient education.
“Generative AI is rapidly moving into the mainstream of healthcare,” said Adam Rodman, an assistant professor of medicine at Beth Israel Deaconess Medical Center in Boston, and an AI researcher at Harvard Medical School. “It’s helping both doctors and patients understand events, communicate, and make important medical decisions.”
In fact, research conducted by Rodman and others demonstrates that AI delivers relatively accurate diagnoses, and it may even outperform humans at some tasks. Yet GenAI also introduces challenges, risks, and dangers. These include generating incorrect information, known as hallucinations, and the possible erosion of critical face-to-face contact between physicians and patients.
Doctors' Little Helper
Busy doctors, anxious patients, and an overtaxed healthcare system are the new normal. GenAI could help fill the gaps. “It is a disruptive technology that can process complex information quickly and serve as a real-time tool,” said Andrew S. Parsons, Associate Dean for Clinical Competency and an associate professor in the Division of Hospital Medicine at the University of Virginia School of Medicine.
A group of researchers, including Parsons and Rodman, recently embarked on a study to better understand how large language models affect diagnostic reasoning compared with conventional resources. They analyzed the behaviors and outcomes of 50 physicians across three diagnostic approaches: physicians who used only conventional resources, those who used ChatGPT-4 along with conventional resources, and the LLM operating alone, with no human interaction.
Remarkably, the LLM working alone scored 16 percentage points higher than the conventional group. The researchers reported the results in an October 2024 paper published in JAMA Network Open, “Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial.” “The study suggests that AI alone can outperform clinicians in specific diagnostic reasoning tasks when presented with structured vignettes,” Parsons said.
This doesn’t mean GenAI will replace human clinicians. “Real-life scenarios have nuances. They require an understanding of patient contexts, emotions, and dynamic decision-making,” Parsons noted. More likely, generative AI will complement current methods, which include widely used decision-support tools such as UpToDate and DynaMed. “ChatGPT introduces a dynamic, conversational interface,” he said.
Generative AI isn’t only a valuable diagnostic tool for clinicians; it can also help patients sort through complex medical information. An August 2024 study conducted by the Kaiser Family Foundation (KFF) found that 25% of adults under age 30 already use generative AI at least once a month to gather health information. Among all adults, 17% rely on AI chatbots. “Patients are increasingly using the technology to educate themselves and get a second opinion. People want to understand their health better,” Rodman said.
Large language models are a clear step up from search engines, Rodman said. “If a medical professional has collected and collated the information, a large language model can be remarkably good at checking symptoms and assessing general issues. Nevertheless, questions remain about how these models work and how effective and accurate they are when a patient interacts with them directly. It is too early to draw definitive conclusions,” he added.
Rx for Results
In the years ahead, ChatGPT and similar LLMs could profoundly reshape the medical field, said Ethan Goh, Clinical Research Fellow at the Stanford Clinical Excellence Research Center and a lead author of the JAMA Network Open paper. These tools increasingly help clinicians transcribe notes, identify correct billing codes, and capture important data. “They drive significant efficiency gains,” he said.
The next frontier is integrating large language models into mainstream diagnostics. This could fundamentally change the way doctors practice medicine and make clinical decisions, Goh said. “Leveraging GenAI technology is a way to ingest and synthesize large amounts of information quickly. The technology could prove valuable in open-ended environments where there isn’t a clear single-best answer. It could guide clinicians through a range of possibilities—including things that a human may have overlooked.”
Another benefit of GenAI, Rodman said, is that it can reduce human cognitive biases that sometimes lead to poor clinical decisions. “Oftentimes, humans anchor on early data, and they don’t question themselves after they have reached a conclusion,” he explained. “AI models don’t do that. If you instruct them to reexamine things or look at a situation in a different way, they will do so—and they are often quite good at it.”
While AI has the potential to transform medical diagnostics, it must gain the full trust of clinicians, patients, and regulators. Its lack of explainability is a concern. The most glaring issues are the potential for clinical errors and the risk of patients mistaking AI advice for definitive medical guidance. “Today’s AI lacks contextual awareness. It can’t handle ethical dilemmas or understand the unpredictability of clinical environments,” Parsons said.
AI ultimately will require a hybrid approach, Parsons added. Within this human-in-the-loop model, “AI serves as a tool and complements human clinicians rather than replacing them. Healthcare professionals must focus on maintaining their core clinical reasoning abilities while embracing AI as a supportive tool. The trade-off involves leveraging AI’s speed and breadth without losing sight of the deeper, patient-centered aspects of care.”
Beyond Bots
Integrating GenAI into everyday clinical practice will necessitate training, so medical professionals understand how to prompt chatbots and interpret results. Sets of predefined prompts could help guide doctors and others through decision-making, Parsons said. Another challenge is integrating AI into existing workflows—or revamping processes to incorporate chatbots and other forms of generative AI. This might require tight integration with electronic health records and connected medical devices, Goh said.
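To make the idea of predefined prompts concrete, the sketch below shows how one might be wrapped in code. It is a minimal illustration, not a clinically validated tool: it assumes the OpenAI Python client (openai>=1.0) with an API key in the environment, and the prompt wording, function name, and model name are all hypothetical.

# Minimal sketch of a predefined diagnostic-reasoning prompt.
# Assumptions: the OpenAI Python client (openai>=1.0) is installed and
# OPENAI_API_KEY is set in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

# Hypothetical predefined prompt: it asks for a ranked differential and
# explicit uncertainty rather than a single definitive answer.
DIFFERENTIAL_PROMPT = (
    "You are assisting a licensed clinician. For the case vignette below, "
    "list a ranked differential diagnosis, note findings that support or "
    "argue against each candidate, and state what additional information "
    "would help discriminate among them. Do not present any item as a "
    "definitive diagnosis.\n\nVignette:\n{vignette}"
)

def suggest_differential(vignette: str) -> str:
    """Send a structured case vignette to the model and return its reasoning."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute the model actually deployed
        messages=[{
            "role": "user",
            "content": DIFFERENTIAL_PROMPT.format(vignette=vignette),
        }],
    )
    return response.choices[0].message.content

print(suggest_differential(
    "54-year-old with acute chest pain radiating to the left arm, "
    "diaphoresis, and a history of hypertension."
))

In practice, such a template would be vetted by clinicians and embedded in existing workflows and electronic health records rather than called in isolation.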
Experts hope to conduct more research in this space. Rodman said that approaching chatbots and GenAI in a creative yet measured way is crucial; they could become valuable assistants. “The goal is to make everyone more informed and efficient,” he said. “Perhaps AI serves as the first point of contact, but we need to make sure that a care team is involved in all aspects of diagnosis and treatment. The goal is to deliver better and more humane healthcare.”
Samuel Greengard is an author and journalist based in West Linn, OR, USA.