Published on in Vol 11 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/48808, first published .
ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation


Journals

  1. Kaneda Y, Takita M, Hamaki T, Ozaki A, Tanimoto T. ChatGPT's Potential in Enhancing Physician Efficiency: A Japanese Case Study. Cureus 2023
  2. Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023
  3. Onder C, Koc G, Gokbulut P, Taskaldiran I, Kuskonmaz S. Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy. Scientific Reports 2024;14(1)
  4. Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Formative Research 2023;7:e51798
  5. Nacher M, Françoise U, Adenis A. ChatGPT neglects a neglected disease. The Lancet Infectious Diseases 2024;24(2):e76
  6. Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interactive Journal of Medical Research 2024;13:e54704
  7. Koga S, Du W. Integrating AI in medicine: Lessons from Chat-GPT's limitations in medical imaging. Digestive and Liver Disease 2024;56(6):1114
  8. Silva T, Andrade-Bortoletto M, Ocampo T, Alencar-Palha C, Bornstein M, Oliveira-Santos C, Oliveira M. Performance of a commercially available Generative Pre-trained Transformer (GPT) in describing radiolucent lesions in panoramic radiographs and establishing differential diagnoses. Clinical Oral Investigations 2024;28(3)
  9. Mizuta K, Hirosawa T, Harada Y, Shimizu T. Can ChatGPT-4 evaluate whether a differential diagnosis list contains the correct diagnosis as accurately as a physician?. Diagnosis 2024;11(3):321
  10. Koga S. The double-edged nature of ChatGPT in self-diagnosis. Wiener klinische Wochenschrift 2024;136(7-8):243
  11. Hudon A, Kiepura B, Pelletier M, Phan V. Using ChatGPT in Psychiatry to Design Script Concordance Tests in Undergraduate Medical Education: Mixed Methods Study. JMIR Medical Education 2024;10:e54067
  12. Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Evaluating ChatGPT-4’s Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases. JMIR Formative Research 2024;8:e59267
  13. Bridges J. Computerized diagnostic decision support systems – a comparative performance study of Isabel Pro vs. ChatGPT4. Diagnosis 2024;11(3):250
  14. Fabre B, Magalhaes Filho M, Aguiar P, da Costa F, Gutierres B, William W, Del Giglio A. Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature. ESMO Real World Data and Digital Oncology 2024;4:100042
  15. Shikino K, Shimizu T, Otsuka Y, Tago M, Takahashi H, Watari T, Sasaki Y, Iizuka G, Tamura H, Nakashima K, Kunitomo K, Suzuki M, Aoyama S, Kosaka S, Kawahigashi T, Matsumoto T, Orihara F, Morikawa T, Nishizawa T, Hoshina Y, Yamamoto Y, Matsuo Y, Unoki Y, Kimura H, Tokushima M, Watanuki S, Saito T, Otsuka F, Tokuda Y. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research. JMIR Medical Education 2024;10:e58758
  16. Harada Y, Sakamoto T, Sugimoto S, Shimizu T. Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study. JMIR Formative Research 2024;8:e53985
  17. Nasef H, Patel H, Amin Q, Baum S, Ratnasekera A, Ang D, Havron W, Nakayama D, Elkbuli A. Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons’ Perspectives. The American Surgeon™ 2025;91(3):325
  18. Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024;30(6):1017
  19. Koga S, Du W. From text to image: challenges in integrating vision into ChatGPT for medical image interpretation. Neural Regeneration Research 2025;20(2):487
  20. Takahashi H, Shikino K, Kondo T, Komori A, Yamada Y, Saita M, Naito T. Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Medical Education 2024;10:e59133
  21. Stalp J, Denecke A, Jentschke M, Hillemanns P, Klapdor R. Quality of ChatGPT-Generated Therapy Recommendations for Breast Cancer Treatment in Gynecology. Current Oncology 2024;31(7):3845
  22. Hirosawa T, Shimizu T. The potential, limitations, and future of diagnostics enhanced by generative artificial intelligence. Diagnosis 2024;11(4):446
  23. Hoppe J, Auer M, Strüven A, Massberg S, Stremmel C. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis. Journal of Medical Internet Research 2024;26:e56110
  24. Ono D, Dickson D, Koga S. Evaluating the efficacy of few‐shot learning for GPT‐4Vision in neurodegenerative disease histopathology: A comparative analysis with convolutional neural network model. Neuropathology and Applied Neurobiology 2024;50(4)
  25. Danesh A, Danesh A, Danesh F. Innovating dental diagnostics: ChatGPT's accuracy on diagnostic challenges. Oral Diseases 2025;31(3):911
  26. Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. DIGITAL HEALTH 2024;10
  27. Gargari O, Fatehi F, Mohammadi I, Firouzabadi S, Shafiee A, Habibi G. Diagnostic accuracy of large language models in psychiatry. Asian Journal of Psychiatry 2024;100:104168
  28. Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nature Medicine 2024;30(11):3098
  29. Shah-Mohammadi F, Finkelstein J. Accuracy Evaluation of GPT-Assisted Differential Diagnosis in Emergency Department. Diagnostics 2024;14(16):1779
  30. Chen J, Reddy A, Al-Sharif E, Shoji M, Kalaw F, Eslani M, Lang P, Arya M, Koretz Z, Bolo K, Arnett J, Roginiel A, Do J, Robbins S, Camp A, Scott N, Rudell J, Weinreb R, Baxter S, Granet D. Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think like an Ophthalmologist?. Ophthalmology Science 2025;5(1):100600
  31. Hwai H, Ho Y, Wang C, Huang C. Large language model application in emergency medicine and critical care. Journal of the Formosan Medical Association 2024
  32. Young C, Enichen E, Rivera C, Auger C, Grant N, Rao A, Succi M. Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports. American Journal of Medical Genetics Part A 2025;197(2)
  33. Radha Krishnan R, Hung E, Ashford M, Edillo C, Gardner C, Hatrick H, Kim B, Lai A, Li X, Zhao Y, Raubenheimer J. Evaluating the capability of ChatGPT in predicting drug–drug interactions: Real‐world evidence using hospitalized patient data. British Journal of Clinical Pharmacology 2024;90(12):3361
  34. Ghanta S, Al’Aref S, Lala-Trinidade A, Nadkarni G, Ganatra S, Dani S, Mehta J. Applications of ChatGPT in Heart Failure Prevention, Diagnosis, Management, and Research: A Narrative Review. Diagnostics 2024;14(21):2393
  35. Du W, Jin X, Harris J, Brunetti A, Johnson E, Leung O, Li X, Walle S, Yu Q, Zhou X, Bian F, McKenzie K, Kanathanavanich M, Ozcelik Y, El-Sharkawy F, Koga S. Large language models in pathology: A comparative study of ChatGPT and Bard with pathology trainees on multiple-choice questions. Annals of Diagnostic Pathology 2024;73:152392
  36. Hayat J, Lari M, AlHerz M, Lari A. The Utility and Limitations of Artificial Intelligence-Powered Chatbots in Healthcare. Cureus 2024
  37. Schmidt H, Rotgans J, Mamede S. Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents. Journal of General Internal Medicine 2025;40(4):790
  38. Puleio F, Lo Giudice G, Bellocchio A, Boschetti C, Lo Giudice R. Clinical, Research, and Educational Applications of ChatGPT in Dentistry: A Narrative Review. Applied Sciences 2024;14(23):10802
  39. Ho C, Tian T, Ayers A, Aaron R, Phillips V, Wolf R, Mathioudakis N, Dai T, Klonoff D. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Medical Informatics and Decision Making 2024;24(1)
  40. Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, Liang Q, Zhang J, Li X. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Medical Education 2024;24(1)
  41. Cuevas-Nunez M, Silberberg V, Arregui M, Jham B, Ballester-Victoria R, Koptseva I, de Tejada M, Posada-Caez R, Manich V, Bara-Casaus J, Fernández-Figueras M. Diagnostic performance of ChatGPT-4.0 in histopathological description analysis of oral and maxillofacial lesions: a comparative study with pathologists. Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology 2025;139(4):453
  42. Shiferaw M, Zheng T, Winter A, Mike L, Chan L. Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions. BMC Medical Informatics and Decision Making 2024;24(1)
  43. Farhadi Nia M, Ahmadi M, Irankhah E. Transforming dental diagnostics with artificial intelligence: advanced integration of ChatGPT and large language models for patient care. Frontiers in Dental Medicine 2025;5
  44. Nan D, Zhao X, Chen C, Sun S, Lee K, Kim J. Bibliometric Analysis on ChatGPT Research with CiteSpace. Information 2025;16(1):38
  45. Hu X, Xu D, Zhang H, Tang M, Gao Q. Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors. The Spine Journal 2025;25(6):1196
  46. Saraiva M, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida M, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso L, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. Journal of Clinical Medicine 2025;14(2):572
  47. Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen J, McKechnie T, Lee Y, Mayol J, Antoniou S, Thirunavukarasu A, Sanger S, Ramji K, Guyatt G. Large Language Models for Chatbot Health Advice Studies. JAMA Network Open 2025;8(2):e2457879
  48. Oh R, Gonsalves T. Lightweight Denoising Diffusion Implicit Model for Medical Segmentation. Electronics 2025;14(4):676
  49. Naeem A, Khan O, Baqir S, Jana K, Shankar P, Kaur A, Zaaya M, Sajid F, Mohsin F, Boadla M, Oo A, Wong V, Noor M, Sandhu S, Slobodyanuk K, Shetty V, Tokayer A. Language Artificial Intelligence Models as Pioneers in Diagnostic Medicine? A Retrospective Analysis on Real-Time Patients. Journal of Clinical Medicine 2025;14(4):1131
  50. Ford J, Pevy N, Grunewald R, Howell S, Reuber M. Can artificial intelligence diagnose seizures based on patients' descriptions? A study of GPT‐4. Epilepsia 2025;66(6):1959
  51. Bhasuran B, Jin Q, Xie Y, Yang C, Hanna K, Costa J, Shavor C, Han W, Lu Z, He Z. Preliminary analysis of the impact of lab results on large language model generated differential diagnoses. npj Digital Medicine 2025;8(1)
  52. Suga T, Uehara O, Abiko Y, Toyofuku A. Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis. Journal of Pain Research 2025;Volume 18:1387
  53. Takita H, Kabata D, Walston S, Tatekawa H, Saito K, Tsujimoto Y, Miki Y, Ueda D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine 2025;8(1)
  54. Mansoor M, Ibrahim A, Grindem D, Baig A. Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance. JMIRx Med 2025;6:e65263
  55. Shan G, Chen X, Wang C, Liu L, Gu Y, Jiang H, Shi T. Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis. JMIR Medical Informatics 2025;13:e64963
  56. Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
  57. Jin Z, Abola R, Bargnes V, Tsivitis A, Rahman S, Schwartz J, Bergese S, Schabel J. The utility of generative artificial intelligence Chatbot (ChatGPT) in generating teaching and learning material for anesthesiology residents. Frontiers in Artificial Intelligence 2025;8
  58. Wu X, Huang Y, He Q. A large language model improves clinicians’ diagnostic performance in complex critical illness cases. Critical Care 2025;29(1)
  59. Su H, Sun Y, Li R, Zhang A, Yang Y, Xiao F, Duan Z, Chen J, Hu Q, Yang T, Xu B, Zhang Q, Zhao J, Li Y, Li H. Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis. Journal of Medical Internet Research 2025;27:e72062
  60. Yao G, Zhang W, Zhu Y, Wong U, Zhang Y, Yang C, Shen G, Li Z, Gao H. Comparing the accuracy of large language models and prompt engineering in diagnosing realworld cases. International Journal of Medical Informatics 2025:106026
  61. Del Monte F, Barolo R, Circhetta M, Delmonaco A, Castagno E, Pivetta E, Bergamasco L, Franco M, Olmo G, Bondone C. Diagnostic efficacy of large language models in the pediatric emergency department: a pilot study. Frontiers in Digital Health 2025;7
  62. Hassanein F, El Barbary A, Hussein R, Ahmed Y, El‐Guindy J, Sarhan S, Abou‐Bakr A. Diagnostic Performance of ChatGPT‐4o and DeepSeek‐3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis. Oral Diseases 2025

Conference Proceedings

  1. Santos J, Santos H, Ulbrich A, Faccio D, Tabalipa F, Nogueira R, Costa M. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Evaluating LLMs for Diagnosis Summarization