Published in Vol 13 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/66917.
Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study

Journals

  1. Omar M, Hijazi K, Omar M, Nadkarni G, Klang E. Performance of large language models on family medicine licensing exams. Family Practice 2025;42(4)
  2. Omar M, Glicksberg B, Nadkarni G, Klang E. Refining LLMs outputs with iterative consensus ensemble (ICE). Computers in Biology and Medicine 2025;196:110731
  3. Huang Y, Yang G, Shen Y, Chen H, Wu W, Li X, Wu Y, Zhang K, Xu J, Zhang J. Application of Large Language Models in Complex Clinical Cases: Cross-Sectional Evaluation Study. JMIR Medical Informatics 2025;13:e73941
  4. Fujita W, Sakamoto A, Sato E, Kaneko T, Kagiyama N. Transformative Impact of Artificial Intelligence on Internal Medicine: Current Applications, Challenges, and Future Horizons for Urban Health. Juntendo Medical Journal 2025
  5. Akinniranye O, Akinniranye O. Performance of Large Language Models and Top-Decile Doctors on an Undergraduate Ophthalmology Examination. Cureus 2025
  6. Thelwall M, Yang Y. Implicit and explicit research quality score probabilities from ChatGPT. Quantitative Science Studies 2025;6:1271

Conference Proceedings

  1. Meena Y, Mondal S, Potta M. Proceedings of the 16th International Conference of Human-Computer Interaction (HCI) Design & Research. Muteract: Interactive and Iterative Prompt Mutation Interface for LLM Developers and Evaluators