Published in Vol 13 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/66917.
Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study

Journals

  1. Omar M, Hijazi K, Omar M, Nadkarni G, Klang E. Performance of large language models on family medicine licensing exams. Family Practice 2025;42(4)
  2. Omar M, Glicksberg B, Nadkarni G, Klang E. Refining LLMs outputs with iterative consensus ensemble (ICE). Computers in Biology and Medicine 2025;196:110731
  3. Huang Y, Yang G, Shen Y, Chen H, Wu W, Li X, Wu Y, Zhang K, Xu J, Zhang J. Application of Large Language Models in Complex Clinical Cases: Cross-Sectional Evaluation Study. JMIR Medical Informatics 2025;13:e73941
  4. Fujita W, Sakamoto A, Sato E, Kaneko T, Kagiyama N. Transformative Impact of Artificial Intelligence on Internal Medicine: Current Applications, Challenges, and Future Horizons for Urban Health. Juntendo Medical Journal 2025