Although artificial intelligence won’t replace your doctor any time soon, a new study has found that technologies such as ChatGPT could improve patients’ experience by answering their healthcare questions more accurately, and in a manner perceived as more empathetic, than human doctors do.
The study, appearing in JAMA Internal Medicine, compared written responses from human physicians with those from ChatGPT to real-world health questions. A panel of licensed healthcare professionals evaluating the responses preferred ChatGPT’s answers 79% of the time, and found them more empathetic and of higher quality.
“The demand for doctors to answer questions via electronic patient messaging these days is overwhelming, so it is not a surprise that physicians not only are experiencing burnout, but also that the quality of their answers sometimes suffers. This study is evidence that AI tools can make doctors more efficient and accurate, and patients happier and healthier,” said study co-author Mark Dredze, an associate professor of computer science at Johns Hopkins University’s Whiting School of Engineering who advised the research team on the capabilities of large language models. Dredze is also director of research for foundations of AI at the Johns Hopkins AI-X Foundry, which drives AI research and its applications in health, medicine, and safety with the goal of understanding and improving the human condition.
Study leader John W. Ayers, of the Qualcomm Institute at the University of California San Diego, says the results provide an early glimpse into the important role that AI assistants could play in healthcare.
“The opportunities for improving healthcare with AI are massive,” said Ayers, who is also vice chief of innovation in the UC San Diego School of Medicine Division of Infectious Disease and Global Public Health. “AI-augmented care is the future of medicine.”
The research team behind the study set out to answer the question: Can ChatGPT respond accurately to the types of questions that patients send to their doctors?
To obtain a large and diverse sample of healthcare questions and physician answers that did not include identifiable personal information, the team turned to Reddit’s AskDocs, a social media forum where patients publicly post medical questions to which doctors respond.
r/AskDocs is a subreddit with approximately 452,000 members. While anyone can respond to a question, moderators verify healthcare professionals’ credentials, and each response displays the respondent’s level of credentials. The result is a large and diverse set of patient medical questions and accompanying answers from licensed medical professionals.
While some may wonder whether question-answer exchanges posted on social media are a fair test of how ChatGPT would handle real patient messages, clinical team members noted that the exchanges reflected their clinical experience.
The team randomly sampled 195 exchanges from r/AskDocs where a verified physician responded to a public question. The team then provided the original question to ChatGPT and asked it to author a response. A panel of three licensed healthcare professionals assessed each question and the corresponding responses, blinded to whether the response originated from a physician or ChatGPT. They compared responses based on information quality and empathy, noting which one they preferred.
The result? The panel of healthcare professional evaluators preferred ChatGPT responses to physician responses almost 80% of the time.
“ChatGPT messages responded with nuanced and accurate information that often addressed more aspects of the patients’ questions than the physicians’ responses,” said study co-author Jessica Kelley, a nurse practitioner with San Diego firm Human Longevity.
Additionally, ChatGPT responses were rated significantly higher in quality than physician responses: The proportion of responses rated good or very good in quality was 3.6 times higher for ChatGPT than for physicians (physicians 22.1% versus ChatGPT 78.5%). The responses were also more empathetic: The proportion rated empathetic or very empathetic was 9.8 times higher for ChatGPT than for physicians (physicians 4.6% versus ChatGPT 45.1%).
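For readers who want to check the arithmetic, the "times higher" figures appear to be simple ratios of the two percentages reported above. The short Python sketch below reproduces them under that assumption; the function and variable names are illustrative only and do not come from the study.

```python
# Percentages as reported in the article (share of responses in the top two
# rating categories). The dictionary layout and names are illustrative.
reported = {
    "good or very good quality": {"physicians": 22.1, "chatgpt": 78.5},
    "empathetic or very empathetic": {"physicians": 4.6, "chatgpt": 45.1},
}


def prevalence_ratio(chatgpt_pct: float, physician_pct: float) -> float:
    """Return how many times higher the ChatGPT share is than the physician share."""
    return chatgpt_pct / physician_pct


# Round to one decimal place, matching the precision quoted in the article.
ratios = {
    label: round(prevalence_ratio(pcts["chatgpt"], pcts["physicians"]), 1)
    for label, pcts in reported.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio} times higher for ChatGPT")
```

Dividing 78.5 by 22.1 gives roughly 3.55, and dividing 45.1 by 4.6 gives roughly 9.80, which round to the 3.6 and 9.8 quoted above.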
“There have been several studies showing that these AI models can pass medical licensing questions, but that doesn’t mean they would provide good answers to questions from real people. This study shows that they can,” Dredze says. “We aren’t proposing that we build AI doctors, but our results suggest that doctors could be more effective when aided by AI.”
Aaron Goodman, an associate clinical professor at UC San Diego School of Medicine and study co-author, says, “I never imagined saying this, but ChatGPT is a prescription I’d like to give to my inbox. The tool will transform the way I support my patients.”
In addition to improving workflow, investments in AI assistant messaging could impact patient health and physician performance, the study authors say.
“We could use these technologies to train doctors in patient-centered communication, eliminate health disparities suffered by minority populations who often seek healthcare via messaging, build new medical safety systems, and assist doctors by delivering higher quality and more efficient care,” says Dredze. “When doctors are overwhelmed, empathy for their patients can be the first thing to go. But empathy is critical in care: A patient doesn’t listen to a doctor if they don’t feel heard. This study is evidence that AI could help doctors maintain empathetic and accurate communication with their patients.”