<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="letter" dtd-version="2.0">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JMI</journal-id>
      <journal-id journal-id-type="nlm-ta">JMIR Med Inform</journal-id>
      <journal-title>JMIR Medical Informatics</journal-title>
      <issn pub-type="epub">2291-9694</issn>
      <publisher>
        <publisher-name>JMIR Publications</publisher-name>
        <publisher-loc>Toronto, Canada</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">v13i1e80987</article-id>
      <article-id pub-id-type="pmid">41021280</article-id>
      <article-id pub-id-type="doi">10.2196/80987</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Letter to the Editor</subject>
        </subj-group>
        <subj-group subj-group-type="article-type">
          <subject>Letter to the Editor</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Data Contamination in AI Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Iannaccio</surname>
            <given-names>Amanda</given-names>
          </name>
        </contrib>
      </contrib-group>
      <contrib-group>
        <contrib id="contrib1" contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Acar</surname>
            <given-names>Alaeddin</given-names>
          </name>
          <degrees>MD</degrees>
          <xref rid="aff1" ref-type="aff">1</xref>
          <address>
            <institution>Department of Neurosurgery, Kulu State Hospital</institution>
            <addr-line>No 4, 139518 Street, Dinek, Kulu</addr-line>
            <addr-line>Konya, 42770</addr-line>
            <country>Turkey</country>
            <phone>90 542 472 37 23</phone>
            <email>alaeacar@gmail.com</email>
          </address>
          <ext-link ext-link-type="orcid">https://orcid.org/0009-0006-0417-6785</ext-link>
        </contrib>
      </contrib-group>
      <aff id="aff1">
        <label>1</label>
        <institution>Department of Neurosurgery, Kulu State Hospital</institution>
        <addr-line>Konya</addr-line>
        <country>Turkey</country>
      </aff>
      <author-notes>
        <corresp>Corresponding Author: Alaeddin Acar <email>alaeacar@gmail.com</email></corresp>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2025</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>29</day>
        <month>9</month>
        <year>2025</year>
      </pub-date>
      <volume>13</volume>
      <elocation-id>e80987</elocation-id>
      <history>
        <date date-type="received">
          <day>20</day>
          <month>7</month>
          <year>2025</year>
        </date>
        <date date-type="accepted">
          <day>20</day>
          <month>8</month>
          <year>2025</year>
        </date>
      </history>
      <copyright-statement>©Alaeddin Acar. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 29.09.2025.</copyright-statement>
      <copyright-year>2025</copyright-year>
      <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
        <p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.</p>
      </license>
      <self-uri xlink:href="https://medinform.jmir.org/2025/1/e80987" xlink:type="simple"/>
      <related-article related-article-type="commentary-article" id="v13i1e68409" ext-link-type="doi" xlink:href="10.2196/68409" vol="13" page="e68409" xlink:type="simple">https://medinform.jmir.org/2025/1/e68409/</related-article>
      <related-article related-article-type="commentary" id="v13i1e82057" ext-link-type="doi" xlink:href="10.2196/82057" vol="13" page="e82057" xlink:type="simple">http://medinform.jmir.org/2025/1/e82057/</related-article>
      <kwd-group>
        <kwd>artificial intelligence</kwd>
        <kwd>large language model</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>emergency medicine</kwd>
        <kwd>clinical performance examination</kwd>
        <kwd>history taking</kwd>
        <kwd>clinical reasoning</kwd>
        <kwd>empathy</kwd>
        <kwd>patient experience</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <p>This letter concerns the recently published article “Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study” by Park et al [<xref ref-type="bibr" rid="ref1">1</xref>]. The study makes a significant contribution to the growing field of artificial intelligence (AI) evaluation in medicine, and I congratulate the authors on their valuable work. However, I would like to highlight a potential methodological limitation in the written examination portion of the study. The authors state that their examination questions were taken from a 2018 textbook, <italic>100 Cases in Emergency Medicine and Critical Care</italic> [<xref ref-type="bibr" rid="ref2">2</xref>]. The AI model they tested, ChatGPT (OpenAI), was trained on large amounts of public text from the internet, which likely included this textbook. ChatGPT may therefore have seen exactly the same questions and answers during its training.</p>
    <p>This problem is known as “data contamination.” If the AI has already seen the test questions, its high scores may reflect memorization rather than medical reasoning. This makes the comparison with human doctors, who were seeing the questions for the first time, unfair. The study found that ChatGPT performed much better than doctors on this written test, but that result could stem from this methodological limitation.</p>
    <p>Other researchers in the field are aware of this problem and take steps to avoid it. For example, a radiology study by Busch et al [<xref ref-type="bibr" rid="ref3">3</xref>] used private, members-only cases that were unlikely to be in the AI’s training data, which minimizes this risk. Another study, by Noda et al [<xref ref-type="bibr" rid="ref4">4</xref>], on a Japanese medical examination used questions from an examination held after the AI’s training data cutoff date.</p>
    <p>These studies show the importance of using new and unseen questions when testing AI. Because the study by Park et al [<xref ref-type="bibr" rid="ref1">1</xref>] did not use this approach, I believe the results of their written examination should be viewed with caution. Future studies must use methods like those in the Busch et al [<xref ref-type="bibr" rid="ref3">3</xref>] and Noda et al [<xref ref-type="bibr" rid="ref4">4</xref>] papers to ensure a fair and valid test of AI’s abilities.</p>
  </body>
  <back>
    <app-group/>
    <glossary>
      <title>Abbreviations</title>
      <def-list>
        <def-item>
          <term id="abb1">AI</term>
          <def>
            <p>artificial intelligence</p>
          </def>
        </def-item>
      </def-list>
    </glossary>
    <ack>
      <p>Google Gemini was used for language editing.</p>
    </ack>
    <fn-group>
      <fn fn-type="conflict">
        <p>None declared.</p>
      </fn>
    </fn-group>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Park</surname>
              <given-names>C</given-names>
            </name>
            <name name-style="western">
              <surname>An</surname>
              <given-names>MH</given-names>
            </name>
            <name name-style="western">
              <surname>Hwang</surname>
              <given-names>G</given-names>
            </name>
            <name name-style="western">
              <surname>Park</surname>
              <given-names>RW</given-names>
            </name>
            <name name-style="western">
              <surname>An</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Clinical performance and communication skills of ChatGPT versus physicians in emergency medicine: simulated patient study</article-title>
          <source>JMIR Med Inform</source>
          <year>2025</year>
          <month>07</month>
          <day>17</day>
          <volume>13</volume>
          <fpage>e68409</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://medinform.jmir.org/2025/1/e68409/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/68409</pub-id>
          <pub-id pub-id-type="medline">40674718</pub-id>
          <pub-id pub-id-type="pii">v13i1e68409</pub-id>
          <pub-id pub-id-type="pmcid">PMC12289221</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <nlm-citation citation-type="book">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Shamil</surname>
              <given-names>E</given-names>
            </name>
            <name name-style="western">
              <surname>Ravi</surname>
              <given-names>P</given-names>
            </name>
            <name name-style="western">
              <surname>Mistry</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <source>100 Cases in Emergency Medicine and Critical Care</source>
          <year>2018</year>
          <publisher-loc>Boca Raton, FL</publisher-loc>
          <publisher-name>CRC Press</publisher-name>
        </nlm-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Busch</surname>
              <given-names>F</given-names>
            </name>
            <name name-style="western">
              <surname>Han</surname>
              <given-names>T</given-names>
            </name>
            <name name-style="western">
              <surname>Makowski</surname>
              <given-names>MR</given-names>
            </name>
            <name name-style="western">
              <surname>Truhn</surname>
              <given-names>D</given-names>
            </name>
            <name name-style="western">
              <surname>Bressem</surname>
              <given-names>KK</given-names>
            </name>
            <name name-style="western">
              <surname>Adams</surname>
              <given-names>L</given-names>
            </name>
          </person-group>
          <article-title>Integrating text and image analysis: exploring GPT-4V’s capabilities in advanced radiological applications across subspecialties</article-title>
          <source>J Med Internet Res</source>
          <year>2024</year>
          <month>05</month>
          <day>01</day>
          <volume>26</volume>
          <fpage>e54948</fpage>
          <pub-id pub-id-type="doi">10.2196/54948</pub-id>
          <pub-id pub-id-type="medline">38691404</pub-id>
          <pub-id pub-id-type="pii">v26i1e54948</pub-id>
          <pub-id pub-id-type="pmcid">PMC11097051</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Noda</surname>
              <given-names>M</given-names>
            </name>
            <name name-style="western">
              <surname>Ueno</surname>
              <given-names>T</given-names>
            </name>
            <name name-style="western">
              <surname>Koshu</surname>
              <given-names>R</given-names>
            </name>
            <name name-style="western">
              <surname>Takaso</surname>
              <given-names>Y</given-names>
            </name>
            <name name-style="western">
              <surname>Shimada</surname>
              <given-names>MD</given-names>
            </name>
            <name name-style="western">
              <surname>Saito</surname>
              <given-names>C</given-names>
            </name>
            <name name-style="western">
              <surname>Sugimoto</surname>
              <given-names>H</given-names>
            </name>
            <name name-style="western">
              <surname>Fushiki</surname>
              <given-names>H</given-names>
            </name>
            <name name-style="western">
              <surname>Ito</surname>
              <given-names>M</given-names>
            </name>
            <name name-style="western">
              <surname>Nomura</surname>
              <given-names>A</given-names>
            </name>
            <name name-style="western">
              <surname>Yoshizaki</surname>
              <given-names>T</given-names>
            </name>
          </person-group>
          <article-title>Performance of GPT-4V in answering the Japanese otolaryngology board certification examination questions: evaluation study</article-title>
          <source>JMIR Med Educ</source>
          <year>2024</year>
          <month>03</month>
          <day>28</day>
          <volume>10</volume>
          <fpage>e57054</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://mededu.jmir.org/2024/1/e57054/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/57054</pub-id>
          <pub-id pub-id-type="medline">38546736</pub-id>
          <pub-id pub-id-type="pii">v10i1e57054</pub-id>
          <pub-id pub-id-type="pmcid">PMC11009855</pub-id>
        </nlm-citation>
      </ref>
    </ref-list>
  </back>
</article>
