Letter to the Editor
Comment on: http://medinform.jmir.org/2025/1/e80987/
doi:10.2196/82057
We sincerely thank the author for the constructive commentary on our recent publication. Our study evaluated ChatGPT’s performance across multiple dimensions—including history taking, diagnostic accuracy, communication skills, and empathic expression—through a clinical performance examination using simulated patients combined with written examinations [ ].

In our study, the written examination was not intended solely to serve as a direct comparison of performance between ChatGPT and human physicians. Rather, it was included to support the interpretation of ChatGPT’s communication skills and empathic responses observed during simulated patient interactions by providing additional context on the model’s underlying clinical knowledge. A previous study has shown that patients may perceive ChatGPT’s responses as empathic or trustworthy even when those responses are clinically inappropriate [ ]. However, effective clinical communication is not merely a matter of verbal fluency or emotional tone; it must be grounded in adequate medical knowledge. For this reason, earlier studies evaluating artificial intelligence empathy have also assessed the clinical appropriateness of responses and compared them with those of human physicians [ , ].

Consistent with prior work, we assessed the simulated patient conversations in terms of both clinical accuracy and empathic engagement, as rated by an emergency medicine professor. We recognize, however, that physicians vary in their diagnostic styles and communication approaches, and that the evaluator’s subjective judgment may have influenced the ratings, especially given that the evaluated outputs were full conversations rather than single responses. To provide a complementary and more structured assessment, we incorporated a written test covering 3 key domains: diagnosis, investigation, and treatment planning. Performance on this test serves as a supporting element, helping to ensure that ChatGPT’s interpersonal strengths were matched by sound clinical reasoning.
As the author correctly pointed out, the questions in the written examination were adapted from a publicly available textbook published in 2018 [ ]. Given the limited transparency regarding ChatGPT’s training data, we cannot rule out the possibility that the model was exposed to this material or similar content during pretraining; part of its performance on the written test may therefore have been influenced by data contamination. We fully acknowledge this methodological limitation and agree that the results of the written examination should be interpreted with caution.

We are truly grateful for the author’s thoughtful engagement, which raises important considerations for future studies on the assessment of AI in clinical settings.
Acknowledgments
During the preparation of this manuscript, the authors used ChatGPT to assist with improving readability and correcting grammatical errors. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Conflicts of Interest
None declared.
References
- Park C, An MH, Hwang G, Park RW, An J. Clinical performance and communication skills of ChatGPT versus physicians in emergency medicine: simulated patient study. JMIR Med Inform. Jul 17, 2025;13:e68409. [FREE Full text] [CrossRef] [Medline]
- Armbruster J, Bussmann F, Rothhaas C, Titze N, Grützner PA, Freischmidt H. “Doctor ChatGPT, can you help me?” The patient’s perspective: cross-sectional study. J Med Internet Res. Oct 01, 2024;26:e58831. [FREE Full text] [CrossRef] [Medline]
- Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. Jun 01, 2023;183(6):589-596. [FREE Full text] [CrossRef] [Medline]
- Shamil E, Ravi P, Mistry D. 100 Cases in Emergency Medicine and Critical Care. Boca Raton, FL: CRC Press; 2018.
Edited by A Iannaccio; this is a non–peer-reviewed article. Submitted 15.Aug.2025; accepted 20.Aug.2025; published 29.Sep.2025.
Copyright © ChulHyoung Park, Min Ho An, Gyubeom Hwang, Rae Woong Park, Juho An. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 29.Sep.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.