Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration

doi:10.2196/55627

Published on 09.Apr.2024 in Vol 12 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55627, first published 18.Dec.2023.



Takanobu Hirosawa¹; Yukinori Harada¹; Kazuki Tokumasu²; Takahiro Ito³; Tomoharu Suzuki⁴; Taro Shimizu¹

Cited by (41)

Journals

Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. DIGITAL HEALTH 2024;10
Hirosawa T, Harada Y, Tokumasu K, Ito T, Suzuki T, Shimizu T. Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: An Experimental Study (Preprint). JMIR Medical Informatics 2024
Liu C, Ho C, Wu T. Custom GPTs Enhancing Performance and Evidence Compared with GPT-3.5, GPT-4, and GPT-4o? A Study on the Emergency Medicine Specialist Examination. Healthcare 2024;12(17):1726
Diniz‐Freitas M, Lago‐Méndez L, Limeres‐Posse J, Diz‐Dios P. Challenging ChatGPT‐4V for the Diagnosis of Oral Diseases and Conditions. Oral Diseases 2025;31(2):701
Sun S, Chen K, Anavim S, Phillipi M, Yeh L, Huynh K, Cortes G, Tran J, Tran M, Yaghmai V, Houshyar R. Large Language Models with Vision on Diagnostic Radiology Board Exam Style Questions. Academic Radiology 2025;32(5):3096
Hiredesai A, Martinez C, Anderson M, Howlett C, Unadkat K, Noland S. Is Artificial Intelligence the Future of Radiology? Accuracy of ChatGPT in Radiologic Diagnosis of Upper Extremity Bony Pathology. HAND 2026;21(1):73
Yang X, Li T, Su Q, Liu Y, Kang C, Lyu Y, Zhao L, Nie Y, Pan Y. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal 2025;138(2):130
Saraiva M, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida M, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso L, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. Journal of Clinical Medicine 2025;14(2):572
Noda M, Takahara S, Hayashi S, Inui A, Oe K, Matsushita T. Evaluating ChatGPT’s Performance in Classifying Pertrochanteric Fractures Based on Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association (AO/OTA) Standards. Cureus 2025
Nguyen H, Dang H, Nguyen T, Hoang V, Nguyen V, Wu J. Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study. PLOS ONE 2025;20(1):e0317423
Yang X, Li T, Wang H, Zhang R, Ni Z, Liu N, Zhai H, Zhao J, Meng F, Zhou Z, Tang S, Wang L, Wang X, Luo H, Ren G, Zhang L, Kang X, Wang J, Bo N, Yang X, Xue W, Zhang X, Chen N, Guo R, Li B, Li Y, Liu Y, Zhang T, Liang S, Lv Y, Nie Y, Fan D, Zhao L, Pan Y. Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms. npj Digital Medicine 2025;8(1)
Chiesa-Estomba C, Andueza-Guembe M, Maniaci A, Mayo-Yanez M, Betances-Reinoso F, Vaira L, Saibene A, Lechien J. Accuracy of ChatGPT-4o in Text and Video Analysis of Laryngeal Malignant and Premalignant Diseases. Journal of Voice 2025
Aşar E, İpek İ, Bilge K. Customized GPT-4V(ision) for radiographic diagnosis: can large language model detect supernumerary teeth? BMC Oral Health 2025;25(1)
Alyanak B, Çakar İ, Dede B, Yıldızgören M, Bağcıer F. Artificial intelligence vs human expertise: A comparison of plantar fascia thickness measurements through MRI imaging. International Journal of Medical Informatics 2025;203:105999
Peng W, Cheng X, Deng J, Zhang X. ChatGPT Applications in Nursing: Current Status and Future Perspectives. Nursing Open 2025;12(6)
Nguyen D, Kim G, Bedayat A. Evaluating ChatGPT's performance across radiology subspecialties: A meta-analysis of board-style examination accuracy and variability. Clinical Imaging 2025;125:110551
Fukataki Y, Hayashi W, Nishimoto N, Ito Y, Kuo P. Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT’s accuracy and reproducibility. PLOS Digital Health 2025;4(6):e0000695
Altinbilek E, Az A, Sogut O, Dogan Y, Akdemir T, Belen E, Biter H, Saricicek T, Ozcomlekci M, Kilic N. Task-specific versus general-purpose AI models in ECG analysis: A comparative study with emergency medicine specialists. The American Journal of Emergency Medicine 2025;95:220
Kramer R. Comparing ChatGPT with human perceptions of illusory faces. Visual Cognition 2025;33(2):119
Tai H, Kovarik C. ChatGPT-4’s Level of Dermatological Knowledge Based on Board Examination Review Questions and Bloom’s Taxonomy. JMIR Dermatology 2025;8:e74085
Liu H, Ma C, Yang Y, Liao W, Wang Y. Strategies for enhancing PHC accessibility through mobile and capsule clinics: a spatial location allocation study in China. BMC Health Services Research 2025;25(1)
Liu J, Zhang T, Ma Y, Hu T, Lin F, Liang H, Yang D, Pan Y, Gao D, Qiu L, Gao T. Generative artificial intelligence perspectives on typical landscape types: Can ChatGPT compete with human insight? Landscape and Urban Planning 2025;264:105479
Attal L, Shvartz E, Gorenshtein A, Pincovich S, Bahir D. Comparative Assessment of Large Language Models in Optics and Refractive Surgery: Performance on Multiple-Choice Questions. Vision 2025;9(4):85
Al-Zoghby A, Ismail Ebada A, Saleh A, Abdelhay M, Awad W. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics. Computers, Materials & Continua 2025;84(3):4155
Erden Y, Dilek G, Temel M, Soylu H, Kalfaoğlu M, Bağcıer F. Evaluating the Performance of ChatGPT-4V in Detecting Inflammatory Magnetic Resonance Imaging Findings of Sacroiliitis: Potentials, Challenges, and Limitations. Journal of Imaging Informatics in Medicine 2025
Di Fabrizio D, Daziani G, Qose I, Bindi E, Ilari M, Filosa A, Busardò F, Goteri G, Cobellis G. Seeing Beyond the Microscope: Artificial Intelligence and Fluorescence Confocal Digital Imaging in Pediatric Surgical Pathology. Children 2025;12(12):1608
Demrozi F, Farmanbar M, Engan K. Multimodal AI (MMAI) for next-generation healthcare: data domains, algorithms, challenges, and future perspectives. Current Opinion in Biomedical Engineering 2026;37:100632
Angelico G, Spadola S, Santoro A, Mulè A, D’Aquila F, La Cava G, Marletta S, Valente M, Urtueta B, Addante F, Narducci N, Memeo L, Colarossi C, Rizzo A, Zannoni G. AI-assisted sentinel lymph node examination and metastatic detection in breast cancer: the potential of ChatGPT for digital pathology research. Pathologica 2025;117(5):468
Ma L, Liu Q, Li P, Huang J, Xu Y, Pang J, Xie H. Analysis of responses to comprehensive nursing exam questions by large language models: A comparison of ChatGPT, DeepSeek and students. Nurse Education in Practice 2026;91:104691
Olukanni E, Akanmu A, Jebelli H. Multimodal Large Language Models in Construction Education for Learning Human–Robot Collaboration: A Narrative Review. ASCE OPEN: Multidisciplinary Journal of Civil Engineering 2026;4(1)
Sussan T, Brawley R, Eckroth J, Mossell J, Weitao T. Diagnostic Accuracy of GPT-4 With Vision in Neuroradiology Board-Style Exam Questions: Cross-Sectional Case-Based Study. JMIR Neurotechnology 2026;5:e69708
Kirchhoff J, Berns F, Schieder C, Schobel J. Pricing models for diagnostic AI based on qualitative insights from healthcare decision makers. npj Digital Medicine 2026;9(1)
Ye X, Shen Y, Chen Q, He X, Lu X, Grzybowski A, Jin K, Xie W. Report Generation System for Slit-Lamp Image Interpretation Using Vision-Language Models. Ophthalmology and Therapy 2026;15(4):1509
Naidu G, Krishnan V. Artificial Intelligence-Powered Legal Document Processing for Medical Negligence Cases: A Critical Review. International Journal of Intelligence Science 2025;15(01):10
Campo-Beamud C, Adan Ruiz A, Bastante Quijano J, Campo Beamud E, Gómez-Romero F, Fernández Ruíz A, Copete S. Publicly available multimodal large language models for ocular surface infections: benchmarking against corneal specialists in triage, diagnosis and treatment. British Journal of Ophthalmology 2026:bjo-2025-328867
Alhankawi A, Braithwaite C, Holle A, Moore M, Tarabichi S, Chhabra A. ChatGPT, Claude Sonnet, and Grok Display Similarly Low Rates of Accuracy in Identifying Image‐Based Orthopaedic Sports Pathologies. Arthroscopy, Sports Medicine, and Rehabilitation 2026;8(2)
Holmstrom J, Braithwaite C, Alhankawi A, Moore M, Patel K, Miller B. Comparing the Efficacy Between ChatGPT 5, Grok 3, and Claude 4.5 Sonnet in Analyzing Orthopedic Trauma-Related Imaging. Journal of Orthopaedic Trauma 2026;40(7):360
Eauchai L, Otálora González L, Shi Y, McGinnis M, Yovchev A, Herasevich S, Pickering B, Herasevich V. Do Multimodal Vision-Language Models Enhance the Medical Diagnostic Process? A Systematic Review. Healthcare 2026;14(13):1877
He D, Jin L, Mei X, Jia M, Wang Z, Ren B, Cao L. Optimizing in-context exemplars for a generalist vision-language model to support region of interest selection in ovarian tumor grossing. iScience 2026;29(7):116518
Kalafat U, Mutlu H, Yazıcı R, Genç M, Bulut B, Öz M, Gür A, Yortanlı M, Şakar U, Zhang L. Artificial intelligence meets pediatric orthopedics: A comparative analysis of ChatGPT-4o, Gemini 2.0, and Claude 3.5 in detecting supracondylar humeral fractures. PLOS One 2026;21(7):e0353782
Scott I. Can AI assist in reducing diagnostic error? A narrative review. Diagnosis 2026
