ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation

doi:10.2196/48808

Published on 09.Oct.2023 in Vol 11 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/48808, first published 09.May.2023.

Takanobu Hirosawa¹; Ren Kawamura¹; Yukinori Harada¹; Kazuya Mizuta¹; Kazuki Tokumasu²; Yuki Kaji³; Tomoharu Suzuki⁴; Taro Shimizu¹

Cited by (107)

Journals

Kaneda Y, Takita M, Hamaki T, Ozaki A, Tanimoto T. ChatGPT's Potential in Enhancing Physician Efficiency: A Japanese Case Study. Cureus 2023
Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023
Onder C, Koc G, Gokbulut P, Taskaldiran I, Kuskonmaz S. Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy. Scientific Reports 2024;14(1)
Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Formative Research 2023;7:e51798
Nacher M, Françoise U, Adenis A. ChatGPT neglects a neglected disease. The Lancet Infectious Diseases 2024;24(2):e76
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interactive Journal of Medical Research 2024;13:e54704
Koga S, Du W. Integrating AI in medicine: Lessons from Chat-GPT's limitations in medical imaging. Digestive and Liver Disease 2024;56(6):1114
Silva T, Andrade-Bortoletto M, Ocampo T, Alencar-Palha C, Bornstein M, Oliveira-Santos C, Oliveira M. Performance of a commercially available Generative Pre-trained Transformer (GPT) in describing radiolucent lesions in panoramic radiographs and establishing differential diagnoses. Clinical Oral Investigations 2024;28(3)
Mizuta K, Hirosawa T, Harada Y, Shimizu T. Can ChatGPT-4 evaluate whether a differential diagnosis list contains the correct diagnosis as accurately as a physician? Diagnosis 2024;11(3):321
Koga S. The double-edged nature of ChatGPT in self-diagnosis. Wiener klinische Wochenschrift 2024;136(7-8):243
Hudon A, Kiepura B, Pelletier M, Phan V. Using ChatGPT in Psychiatry to Design Script Concordance Tests in Undergraduate Medical Education: Mixed Methods Study. JMIR Medical Education 2024;10:e54067
Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Evaluating ChatGPT-4’s Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases. JMIR Formative Research 2024;8:e59267
Bridges J. Computerized diagnostic decision support systems – a comparative performance study of Isabel Pro vs. ChatGPT4. Diagnosis 2024;11(3):250
Fabre B, Magalhaes Filho M, Aguiar P, da Costa F, Gutierres B, William W, Del Giglio A. Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature. ESMO Real World Data and Digital Oncology 2024;4:100042
Shikino K, Shimizu T, Otsuka Y, Tago M, Takahashi H, Watari T, Sasaki Y, Iizuka G, Tamura H, Nakashima K, Kunitomo K, Suzuki M, Aoyama S, Kosaka S, Kawahigashi T, Matsumoto T, Orihara F, Morikawa T, Nishizawa T, Hoshina Y, Yamamoto Y, Matsuo Y, Unoki Y, Kimura H, Tokushima M, Watanuki S, Saito T, Otsuka F, Tokuda Y. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research. JMIR Medical Education 2024;10:e58758
Harada Y, Sakamoto T, Sugimoto S, Shimizu T. Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study. JMIR Formative Research 2024;8:e53985
Nasef H, Patel H, Amin Q, Baum S, Ratnasekera A, Ang D, Havron W, Nakayama D, Elkbuli A. Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons’ Perspectives. The American Surgeon™ 2025;91(3):325
Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024;30(6):1017
Koga S, Du W. From text to image: challenges in integrating vision into ChatGPT for medical image interpretation. Neural Regeneration Research 2025;20(2):487
Takahashi H, Shikino K, Kondo T, Komori A, Yamada Y, Saita M, Naito T. Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Medical Education 2024;10:e59133
Stalp J, Denecke A, Jentschke M, Hillemanns P, Klapdor R. Quality of ChatGPT-Generated Therapy Recommendations for Breast Cancer Treatment in Gynecology. Current Oncology 2024;31(7):3845
Hirosawa T, Shimizu T. The potential, limitations, and future of diagnostics enhanced by generative artificial intelligence. Diagnosis 2024;11(4):446
Hoppe J, Auer M, Strüven A, Massberg S, Stremmel C. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis. Journal of Medical Internet Research 2024;26:e56110
Ono D, Dickson D, Koga S. Evaluating the efficacy of few‐shot learning for GPT‐4Vision in neurodegenerative disease histopathology: A comparative analysis with convolutional neural network model. Neuropathology and Applied Neurobiology 2024;50(4)
Danesh A, Danesh A, Danesh F. Innovating dental diagnostics: ChatGPT's accuracy on diagnostic challenges. Oral Diseases 2025;31(3):911
Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. DIGITAL HEALTH 2024;10
Gargari O, Fatehi F, Mohammadi I, Firouzabadi S, Shafiee A, Habibi G. Diagnostic accuracy of large language models in psychiatry. Asian Journal of Psychiatry 2024;100:104168
Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nature Medicine 2024;30(11):3098
Shah-Mohammadi F, Finkelstein J. Accuracy Evaluation of GPT-Assisted Differential Diagnosis in Emergency Department. Diagnostics 2024;14(16):1779
Chen J, Reddy A, Al-Sharif E, Shoji M, Kalaw F, Eslani M, Lang P, Arya M, Koretz Z, Bolo K, Arnett J, Roginiel A, Do J, Robbins S, Camp A, Scott N, Rudell J, Weinreb R, Baxter S, Granet D. Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think like an Ophthalmologist? Ophthalmology Science 2025;5(1):100600
Hwai H, Ho Y, Wang C, Huang C. Large language model application in emergency medicine and critical care. Journal of the Formosan Medical Association 2025;124(8):696
Young C, Enichen E, Rivera C, Auger C, Grant N, Rao A, Succi M. Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports. American Journal of Medical Genetics Part A 2025;197(2)
Radha Krishnan R, Hung E, Ashford M, Edillo C, Gardner C, Hatrick H, Kim B, Lai A, Li X, Zhao Y, Raubenheimer J. Evaluating the capability of ChatGPT in predicting drug–drug interactions: Real‐world evidence using hospitalized patient data. British Journal of Clinical Pharmacology 2024;90(12):3361
Ghanta S, Al’Aref S, Lala-Trinidade A, Nadkarni G, Ganatra S, Dani S, Mehta J. Applications of ChatGPT in Heart Failure Prevention, Diagnosis, Management, and Research: A Narrative Review. Diagnostics 2024;14(21):2393
Du W, Jin X, Harris J, Brunetti A, Johnson E, Leung O, Li X, Walle S, Yu Q, Zhou X, Bian F, McKenzie K, Kanathanavanich M, Ozcelik Y, El-Sharkawy F, Koga S. Large language models in pathology: A comparative study of ChatGPT and Bard with pathology trainees on multiple-choice questions. Annals of Diagnostic Pathology 2024;73:152392
Hayat J, Lari M, AlHerz M, Lari A. The Utility and Limitations of Artificial Intelligence-Powered Chatbots in Healthcare. Cureus 2024
Schmidt H, Rotgans J, Mamede S. Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents. Journal of General Internal Medicine 2025;40(4):790
Puleio F, Lo Giudice G, Bellocchio A, Boschetti C, Lo Giudice R. Clinical, Research, and Educational Applications of ChatGPT in Dentistry: A Narrative Review. Applied Sciences 2024;14(23):10802
Ho C, Tian T, Ayers A, Aaron R, Phillips V, Wolf R, Mathioudakis N, Dai T, Klonoff D. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Medical Informatics and Decision Making 2024;24(1)
Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, Liang Q, Zhang J, Li X. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Medical Education 2024;24(1)
Cuevas-Nunez M, Silberberg V, Arregui M, Jham B, Ballester-Victoria R, Koptseva I, de Tejada M, Posada-Caez R, Manich V, Bara-Casaus J, Fernández-Figueras M. Diagnostic performance of ChatGPT-4.0 in histopathological description analysis of oral and maxillofacial lesions: a comparative study with pathologists. Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology 2025;139(4):453
Shiferaw M, Zheng T, Winter A, Mike L, Chan L. Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions. BMC Medical Informatics and Decision Making 2024;24(1)
Farhadi Nia M, Ahmadi M, Irankhah E. Transforming dental diagnostics with artificial intelligence: advanced integration of ChatGPT and large language models for patient care. Frontiers in Dental Medicine 2025;5
Nan D, Zhao X, Chen C, Sun S, Lee K, Kim J. Bibliometric Analysis on ChatGPT Research with CiteSpace. Information 2025;16(1):38
Hu X, Xu D, Zhang H, Tang M, Gao Q. Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors. The Spine Journal 2025;25(6):1196
Saraiva M, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida M, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso L, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. Journal of Clinical Medicine 2025;14(2):572
Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen J, McKechnie T, Lee Y, Mayol J, Antoniou S, Thirunavukarasu A, Sanger S, Ramji K, Guyatt G. Large Language Models for Chatbot Health Advice Studies. JAMA Network Open 2025;8(2):e2457879
Oh R, Gonsalves T. Lightweight Denoising Diffusion Implicit Model for Medical Segmentation. Electronics 2025;14(4):676
Naeem A, Khan O, Baqir S, Jana K, Shankar P, Kaur A, Zaaya M, Sajid F, Mohsin F, Boadla M, Oo A, Wong V, Noor M, Sandhu S, Slobodyanuk K, Shetty V, Tokayer A. Language Artificial Intelligence Models as Pioneers in Diagnostic Medicine? A Retrospective Analysis on Real-Time Patients. Journal of Clinical Medicine 2025;14(4):1131
Ford J, Pevy N, Grunewald R, Howell S, Reuber M. Can artificial intelligence diagnose seizures based on patients' descriptions? A study of GPT‐4. Epilepsia 2025;66(6):1959
Bhasuran B, Jin Q, Xie Y, Yang C, Hanna K, Costa J, Shavor C, Han W, Lu Z, He Z. Preliminary analysis of the impact of lab results on large language model generated differential diagnoses. npj Digital Medicine 2025;8(1)
Suga T, Uehara O, Abiko Y, Toyofuku A. Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis. Journal of Pain Research 2025;18:1387
Takita H, Kabata D, Walston S, Tatekawa H, Saito K, Tsujimoto Y, Miki Y, Ueda D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine 2025;8(1)
Mansoor M, Ibrahim A, Grindem D, Baig A. Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance. JMIRx Med 2025;6:e65263
Shan G, Chen X, Wang C, Liu L, Gu Y, Jiang H, Shi T. Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis. JMIR Medical Informatics 2025;13:e64963
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
Jin Z, Abola R, Bargnes V, Tsivitis A, Rahman S, Schwartz J, Bergese S, Schabel J. The utility of generative artificial intelligence Chatbot (ChatGPT) in generating teaching and learning material for anesthesiology residents. Frontiers in Artificial Intelligence 2025;8
Wu X, Huang Y, He Q. A large language model improves clinicians’ diagnostic performance in complex critical illness cases. Critical Care 2025;29(1)
Su H, Sun Y, Li R, Zhang A, Yang Y, Xiao F, Duan Z, Chen J, Hu Q, Yang T, Xu B, Zhang Q, Zhao J, Li Y, Li H. Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis. Journal of Medical Internet Research 2025;27:e72062
Yao G, Zhang W, Zhu Y, Wong U, Zhang Y, Yang C, Shen G, Li Z, Gao H. Comparing the accuracy of large language models and prompt engineering in diagnosing realworld cases. International Journal of Medical Informatics 2025;203:106026
Del Monte F, Barolo R, Circhetta M, Delmonaco A, Castagno E, Pivetta E, Bergamasco L, Franco M, Olmo G, Bondone C. Diagnostic efficacy of large language models in the pediatric emergency department: a pilot study. Frontiers in Digital Health 2025;7
Hassanein F, El Barbary A, Hussein R, Ahmed Y, El‐Guindy J, Sarhan S, Abou‐Bakr A. Diagnostic Performance of ChatGPT‐4o and DeepSeek‐3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis. Oral Diseases 2025;31(12):3361
Danesh A, Danesh A, Danesh F. Advancing dental diagnostics with OpenAI's o1-preview. The Journal of the American Dental Association 2025;156(7):555
Nacher M, Françoise U, Adenis A, Siddig E. Large language models and their performance for the diagnosis of histoplasmosis. PLOS Neglected Tropical Diseases 2025;19(7):e0013151
Mazzucchelli M, Salzano S, Caltabiano R, Magro G, Certo F, Barbagallo G, Broggi G. Diagnostic Performance of ChatGPT‐4.0 in Histopathological Analysis of Gliomas: A Single Institution Experience. Neuropathology 2025;45(4)
Wu X, Huang Y, He Q. Diagnostic performance of newly developed large language models in critical illness cases: A comparative study. International Journal of Medical Informatics 2025;204:106088
Patel A, Cheung J. Artificial intelligence in sleep medicine: assessing the diagnostic precision of ChatGPT-4. Journal of Clinical Sleep Medicine 2025;21(9):1511
Hirosawa T, Mizuta K, Harada Y, Shimizu T. Comparative Evaluation of Diagnostic Accuracy Between Google Bard and Physicians. The American Journal of Medicine 2023;136(11):1119
Yilmaz B, Kahraman E, Brennan M, Grewal A, Aktas A. Accuracy of ChatGPT‐4 Plus in Providing Information on Oral Cancer Management. Oral Diseases 2026;32(2):422
Esposito E, Cardakli N, Christoff A, Kraus C. Diagnostic Accuracy and Counseling Quality of GPT-4o for Strabismus and Pseudostrabismus in Patient-Generated Mobile Photographs: A Preliminary Evaluation. Clinical Ophthalmology 2025;19:4077
Szydlak R, Kiyak Y, Hege I, Górski S, Linglart L, Shchudrova T, Torre D, Kononowicz A. ChatGPT versus human authors: A comparative study of concept maps for clinical reasoning training with virtual patients. Medical Teacher 2026;48(4):712
Haupt F, Rödig T, Liersch P. Evaluating ChatGPT-4o as an Educational Support Tool for the Emergency Management of Dental Trauma: Randomized Controlled Study Among Students. JMIR Medical Education 2025;11:e80576
Brooks J, Blankson P, Campbell P, Cowley R, Yang T, Oseni T, Rodriguez A, Idris M. Assessment of Physician Preferences for Large Language Model–Generated Responses Across Geographic Regions and Clinical Experience Levels: Preliminary Survey Study. JMIR Formative Research 2026;10:e82487
Lopes S, Mascarenhas M, Fonseca J, Fernandes M, Leite-Moreira A. Unveiling the Algorithm: The Role of Explainable Artificial Intelligence in Modern Surgery. Healthcare 2025;13(24):3208
Hassanein F, Hussein R, Almalahy H, Sarhan S, Ahmed Y, Abou-Bakr A. Vision-based diagnostic gain of ChatGPT-5 and gemini 2.5 pro compared with human experts in oral lesion assessment. Scientific Reports 2025;15(1)
Khalid M, El-Kefraoui C, Wang A, Wang J, Phang P, Brown C, Ghuman A, Raval M, Karimuddin A. Artificial intelligence takes on the multidisciplinary committee: A single-center study for rectal cancer management. Surgery 2026;195:109975
Kutbi D, Abou-Bakr E, Haidar H. Evaluating the Accuracy of Medical Information Generated by ChatGPT and Gemini and Its Alignment With International Clinical Guidelines From the Surviving Sepsis Campaign: Comparative Study. JMIR Formative Research 2025;9:e84251
Srinivasan S, Ai X, Lo T, Gilson A, Zou M, Zou K, Kim H, Yang M, Pushpanathan K, Yew S, Loke W, Goh J, Chen Y, Kong Y, Fu E, Ong M, Nwanyanwu K, Dave A, Li K, Sun C, Chia M, Yang G, Wong W, Chen D, Liu D, Singer M, Antaki F, Del Priore L, Jonas J, Adelman R, Chen Q, Tham Y. BEnchmarking Large Language Models for Ophthalmology (BELO): An Expert-Curated Data Set and Evaluation Framework for Knowledge and Reasoning. Ophthalmology Science 2026;6(3):101050
Jagadeesh Y, Rizvi N, Nair M. Evaluating Generative AI (Microsoft Copilot) as an Adjunctive Decision-Support System in Oral and Maxillofacial Radiology: A Retrospective Study. Oral 2026;6(1):10
Xu A, Speakman S, Piranio V, Medina R, Liu M, Lamprecht C, Abchee N, Brennan M. Medical Student Experiences With ChatGPT: National Cross-Sectional Study. JMIR Formative Research 2026;10:e76838
Patel A, Contractor H, Heninger H, Vallamchetla S, Li P, Tao C, Cheung J. Performance of successive generative pretrained transformers (GPT) models in medical cases and board style questions. Scientific Reports 2026;16(1)
Gu Y, Chen X, Shan G, Tao J, Xia Y, Gu Y, Huang P, Shi T. Development and validation of a deep learning-based emergency triage model: a feasibility and effectiveness study. BMC Emergency Medicine 2026;26(1)
Chen M, Wu Y, Ma J, Jia X, Gao C, Zhao F, Qiao Y. Independent and collaborative performance of large language models and healthcare professionals in diagnosis and triage. npj Digital Medicine 2026;9(1)
Kopka M, He L, Feufel M. Evaluating the accuracy of ChatGPT model versions for giving care-seeking advice. Communications Medicine 2026;6(1)
Kopka M, Feufel M. Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study. JMIR Biomedical Engineering 2026;11:e88053
Nacanabo M, Seghda A, Tall/Thiam A, Millogo G, Bayala Y, Yameogo V, Samadoulougou A, Zabsonre P. Evaluation of the Diagnostic Capabilities of Artificial Intelligence (GPT‐4) in a Cardiology Department in Sub‐Saharan Africa: Cross‐Sectional Study. Health Science Reports 2026;9(3)
Wutz M, Söling S, Köberlein-Neu J. Physicians’ expectations of the use of conversational agents in healthcare: a qualitative study. BMC Health Services Research 2026;26(1)
Hack S, Craig J, Lin C, Fu C, Kwiatkowska M, Kocum P, Allevi F, Saibene A. Retrieval-augmented generative AI enhances clinical reasoning in odontogenic sinusitis versus maxillary sinus mucositis. European Archives of Oto-Rhino-Laryngology 2026;283(4):2353
Gulen D, Gözden H, Ekin S, Ceylan I. Structural error asymmetry and harm-weighted analysis of ChatGPT versus ICU Physicians in acid–base interpretation: a prospective observational study. Scientific Reports 2026;16(1)
Sim J, Horan M, Huang X, Kim M, Srivastava D, Ness K, Hudson M, Baker J, Huang I. Optimizing prompting strategies improves large language model classification of pain- and fatigue-related functional impact in childhood cancer survivors. Communications Medicine 2026;6(1)
Hassanein F, Ibrahim S, Tomo S, Alsahhaf A, Abou-Bakr A. Prompt engineering shapes diagnostic accuracy and explanation quality of LLM in oral lesion diagnosis: a prospective, expert-blinded benchmark study. Odontology 2026
Bayala Y, Kaboré F, Sougué C, Ouedraogo A, Zongo Y, Zabsonré/Tiendrebeogo W, Ouedraogo D. Assessing the clinical reasoning of large language models on complex rheumatology cases: A multidimensional evaluation of four artificial intelligence. Health Informatics Journal 2026;32(2)
Çetin T, Pay L, Dereli Ş, Arter E, Hayıroglu M. Comparative Analysis of Artificial Intelligence Chatbots for Heart Failure Care. Muğla Sıtkı Koçman Üniversitesi Tıp Dergisi 2026;13(1):92
Chaudhry M, Hollman J, Hartstein A, Calley D. Diagnostic utility of artificial intelligence in musculoskeletal physical therapy: A comparison with physical therapists. Musculoskeletal Science and Practice 2026;84:103585
Doligalska-Dolina A, Ziółkowska K, Wróbel M. The Role of Artificial Intelligence Algorithms in Challenging Diagnostic Cases – Between Potential and Real Clinical Support. Otolaryngologia Polska 2026;80(2):39
Ronel D, Shapiro G, Ben Kiki T, Keren Y. ChatGPT in Orthopedic Trauma: Consistency, Accuracy, and Agreement With Textbook and Expert Opinion. Cureus 2026
de Araújo E, de Medeiros Carvalho L, de de Souza B, de Santana I, Martins H, de Pontes Santos H, Mélo C, dos Reis L, Dias L, Batista A, de Lucena E, Bonan P. Virtual assistants based on artificial intelligence for oral diagnosis: help for clinicians AI oral diagnosis helper. Diagnostic Pathology 2026;21(1)
Shujaat S, Gopinathan Pillai A, Riaz M, Alharbi W, Ganganna K, Alfadley A, Aboalela A, Abolemaaty W. Diagnostic Performance of Contemporary Large Language Models on Free-Text Histopathologic Descriptions in Oral and Maxillofacial Pathology. Head and Neck Pathology 2026;20(1)
Sun Y, Xu X, Liu D, Long Y, Luo L, Wu M, Ou Y, Zhang Y, Cui Y. Psychological Risk Assessment in Plastic Surgery via a DeepSeek Large Language Model: A Retrospective Cohort Study. Aesthetic Plastic Surgery 2026
Seo D, Chung J, Choi Y, Shin Y, Park W. Large Language Models for Endodontic Symptom Assessment and Treatment Planning Using Image-Free Clinical Records: A Comparative Evaluation Study (Preprint). JMIR Medical Informatics 2025
Yu C, Li F, Zhang N, Hu H, Huang H, Wang J, Tao Y, Wu Y. Effectiveness of Artificial Intelligence–Assisted Peer Teaching in Orthopedic Clinical Education: Historical Cohort Study. JMIR Medical Education 2026;12:e87959
Sağlam Gürmen E, Yorgancıoğlu M, Oral A. Agreement between ChatGPT and emergency physicians in laceration management: A prospective study. Injury 2026:113478
Mlika M, Zorgati M, Ben Ismail I, Cheikhrouhou S, Hofman P, Labbene I. Comparing Artificial intelligence to physicians’ competences in the domain of clinical reasoning: A systematic review and meta-analysis. Journal of Medical Education and Curricular Development 2026;13

Books/Policy Documents

Georgiadi M, Tomprou D, Inglezaki I, Meramveliotakis I, Plexousakis S. AI in Learning, Educational Leadership, and Special Education.
Herrera Montano I, Góngora Alonso S, Martínez Licort R, Sainz de Abajo B, de la Torre Díez I, Miramontes González J, Simón Pérez C, Briongos Figuero L, Corral Gudino L. Proceedings of 20th Iberian Conference on Information Systems and Technologies (CISTI 2025).
Balasubramanian N, Dakshit S. Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

Conference Proceedings

Santos J, Santos H, Ulbrich A, Faccio D, Tabalipa F, Nogueira R, Costa M. Evaluating LLMs for Diagnosis Summarization. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

Citation

Please cite as:

Hirosawa T, Kawamura R, Harada Y, Mizuta K, Tokumasu K, Kaji Y, Suzuki T, Shimizu T
ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation
JMIR Med Inform 2023;11:e48808
doi: 10.2196/48808 PMID: 37812468 PMCID: 10594139

This paper is in the following e-collection/theme issue:

Computer-Aided Diagnosis; Clinical Informatics; Clinical Information and Decision Making; Decision Support for Health Professionals; Advanced Data Analytics in eHealth; Ontologies, Classifications, and Coding; Chatbots and Conversational Agents