JMIR Medical Informatics (JMIR Med Inform); ISSN 2291-9694

JMIR Publications, Toronto, Canada

Article ID: v13i1e82057; PMID: 41021270; DOI: 10.2196/82057

Letter to the Editor

Author’s Reply: "Data Contamination in AI Evaluation"

Amanda Iannaccio

ChulHyoung Park, MD 1,2,3; https://orcid.org/0000-0003-0531-9144

Min Ho An, MD 1,2,3; https://orcid.org/0000-0003-2773-9756

Gyubeom Hwang, MD 1,2,3; https://orcid.org/0000-0002-2293-4555

Rae Woong Park, MD, PhD 1,2,3,4; https://orcid.org/0000-0003-4989-3287

Juho An, MD 5; https://orcid.org/0000-0002-7407-426X; Department of Emergency Medicine, School of Medicine, Ajou University, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea; phone: 82 0312195016; email: ermd.jh@gmail.com

1 Department of Biomedical Informatics, School of Medicine, Ajou University, Suwon, Republic of Korea

2 Center for Biomedical Informatics Research, Ajou University Medical Center, Suwon, Republic of Korea

3 Department of Medical Sciences, Graduate School of Ajou University, Suwon, Republic of Korea

4 BK21 R&E Initiative for Advanced Precision Medicine, Suwon, Republic of Korea

5 Department of Emergency Medicine, School of Medicine, Ajou University, Suwon, Republic of Korea

Corresponding Author: Juho An ermd.jh@gmail.com

Published: 29.09.2025; JMIR Med Inform 2025;13:e82057

15 8 2025 20 8 2025

©ChulHyoung Park, Min Ho An, Gyubeom Hwang, Rae Woong Park, Juho An. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 29.09.2025.


This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

Related articles: https://medinform.jmir.org/2025/1/e68409/ and http://medinform.jmir.org/2025/1/e80987/

Keywords: artificial intelligence; large language model; ChatGPT; emergency medicine; clinical performance examination; history taking; clinical reasoning; empathy; patient experience

We sincerely thank the author for the constructive commentary on our recent publication. Our study evaluated ChatGPT’s performance across multiple dimensions, including history taking, diagnostic accuracy, communication skills, and empathic expression, through a clinical performance examination with simulated patients combined with a written examination [1].

In our study, the written examination was not intended solely as a direct comparison of performance between ChatGPT and human physicians. Rather, it was included to provide context on the model’s underlying clinical knowledge, which supports interpretation of the communication skills and empathic responses ChatGPT showed during simulated patient interactions. A previous study has shown that patients may perceive ChatGPT’s responses as empathic or trustworthy even when those responses are clinically inappropriate [2]. Effective clinical communication, however, is not merely a matter of verbal fluency or emotional tone; it must be grounded in adequate medical knowledge. For this reason, earlier studies evaluating the empathy of artificial intelligence (AI) responses have also assessed whether those responses were clinically appropriate and compared them with physicians’ responses [2,3].

Consistent with prior work, an emergency medicine professor assessed the simulated patient conversations for both clinical accuracy and empathic engagement. However, we recognize that physicians vary in their diagnostic styles and communication approaches. The evaluator’s subjective judgment may have influenced the ratings, particularly because the evaluated outputs were full conversations rather than single responses. To provide a complementary and more structured assessment, we incorporated a written examination focused on 3 key domains: diagnosis, investigation, and treatment planning. Performance on this examination can serve as supporting evidence that ChatGPT’s interpersonal strengths were not misaligned with its clinical reasoning.

As the author correctly pointed out, the questions in the written examination were adapted from a publicly available textbook published in 2018 [4]. Because the composition of ChatGPT’s training data is not transparent, we cannot rule out the possibility that the model was exposed to this material or similar content during pretraining. Therefore, part of the model’s performance on the written examination may reflect data contamination rather than independent clinical reasoning. We fully acknowledge this methodological limitation and agree that the results of the written examination should be interpreted with caution.

We are truly grateful for the author’s thoughtful engagement, which raises important considerations for future studies assessing AI in clinical settings.

During the preparation of this manuscript, the authors used ChatGPT to assist with improving readability and correcting grammatical errors. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest: None declared.

1. Park C, An MH, Hwang G, Park RW, An J. Clinical performance and communication skills of ChatGPT versus physicians in emergency medicine: simulated patient study. JMIR Med Inform. 2025 Jul 17;13:e68409. doi:10.2196/68409. PMID: 40674718. PMCID: PMC12289221.

2. Armbruster, Bussmann, Rothhaas, Titze, Grützner PA, Freischmidt. “Doctor ChatGPT, can you help me?” The patient’s perspective: cross-sectional study. J Med Internet Res. 2024 Oct 1;26:e58831. doi:10.2196/58831. PMID: 39352738. PMCID: PMC11480680.

3. Ayers, Poliak, Dredze, Leas, Zhu, Kelley, Faix, Goodman, Longhurst, Hogarth, Smith. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi:10.1001/jamainternmed.2023.1838. PMID: 37115527. PMCID: PMC10148230.

4. Shamil, Ravi, Mistry. 100 Cases in Emergency Medicine and Critical Care. Boca Raton, FL: CRC Press; 2018.