Published in Vol 12 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/57674.
Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation


Journals

  1. Rewthamrongsris P, Burapacheep J, Trachoo V, Porntaveetus T. Accuracy of Large Language Models for Infective Endocarditis Prophylaxis in Dental Procedures. International Dental Journal 2025;75(1):206
  2. Andrew A, Tizzard E. Large language models for improving cancer diagnosis and management in primary health care settings. Journal of Medicine, Surgery, and Public Health 2024:100157
  3. Chang Y, Yin J, Li J, Liu C, Cao L, Lin S. Applications and Future Prospects of Medical LLMs: A Survey Based on the M-KAT Conceptual Framework. Journal of Medical Systems 2024;48(1)
  4. Kreso A, Boban Z, Kabic S, Rada F, Batistic D, Barun I, Znaor L, Kumric M, Bozic J, Vrdoljak J. Using large language models as decision support tools in emergency ophthalmology. International Journal of Medical Informatics 2025:105886

Books/Policy Documents

  1. Xu H, Xue T, Liu D, Zhang F, Westin C, Kikinis R, O’Donnell L, Cai W. Foundation Models for General Medical AI.