Published in Vol 12 (2024)
Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55627.

Journals
- Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. DIGITAL HEALTH 2024;10
- Hirosawa T, Harada Y, Tokumasu K, Ito T, Suzuki T, Shimizu T. Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: An Experimental Study (Preprint). JMIR Medical Informatics 2024
- Liu C, Ho C, Wu T. Custom GPTs Enhancing Performance and Evidence Compared with GPT-3.5, GPT-4, and GPT-4o? A Study on the Emergency Medicine Specialist Examination. Healthcare 2024;12(17):1726
- Diniz‐Freitas M, Lago‐Méndez L, Limeres‐Posse J, Diz‐Dios P. Challenging ChatGPT‐4V for the Diagnosis of Oral Diseases and Conditions. Oral Diseases 2024
- Sun S, Chen K, Anavim S, Phillipi M, Yeh L, Huynh K, Cortes G, Tran J, Tran M, Yaghmai V, Houshyar R. Large Language Models with Vision on Diagnostic Radiology Board Exam Style Questions. Academic Radiology 2024
- Hiredesai A, Martinez C, Anderson M, Howlett C, Unadkat K, Noland S. Is Artificial Intelligence the Future of Radiology? Accuracy of ChatGPT in Radiologic Diagnosis of Upper Extremity Bony Pathology. HAND 2024
- Yang X, Li T, Su Q, Liu Y, Kang C, Lyu Y, Zhao L, Nie Y, Pan Y. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal 2025;138(2):130
- Saraiva M, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida M, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso L, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. Journal of Clinical Medicine 2025;14(2):572
- Noda M, Takahara S, Hayashi S, Inui A, Oe K, Matsushita T. Evaluating ChatGPT’s Performance in Classifying Pertrochanteric Fractures Based on Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association (AO/OTA) Standards. Cureus 2025
- Nguyen H, Dang H, Nguyen T, Hoang V, Nguyen V, Wu J. Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study. PLOS ONE 2025;20(1):e0317423
- Yang X, Li T, Wang H, Zhang R, Ni Z, Liu N, Zhai H, Zhao J, Meng F, Zhou Z, Tang S, Wang L, Wang X, Luo H, Ren G, Zhang L, Kang X, Wang J, Bo N, Yang X, Xue W, Zhang X, Chen N, Guo R, Li B, Li Y, Liu Y, Zhang T, Liang S, Lv Y, Nie Y, Fan D, Zhao L, Pan Y. Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms. npj Digital Medicine 2025;8(1)