Published in Vol 13 (2025)
Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/69485.

Journals
- Zhong D, Liang Y, Yan H, Chen X, Yang Q, Ma S, Su Y, Chen Y, Huang X, Wang M. A Comparative Study of Five Large Language Models’ Response for Liver Cancer Comprehensive Treatment. Journal of Hepatocellular Carcinoma 2025;12:1861
- Meretukov D, Grechukhina K, Evdokimov V, Didych D, Kondratieva S, Rakitina O, Gordeev A, Shilo P, Khatkov I, Zhukova L. Deriving Real-World Evidence from Non-English Electronic Medical Records in Hormone Receptor-Positive Breast Cancer Using Large Language Models. Cancers 2025;17(23):3836
- Kaleci A, Şahinbaş B, Ağadayı E, Çelikkaya S, Altun A, Kardan E. Performance of Large Language Models in Medical Exams: A Comparison Between ChatGPT and Medical Students. Tıp Eğitimi Dünyası 2025;24(74):135
- Zhou Y, Wang W, Wang P, Hu K. Diagnostic performance of large language models on the NEJM image challenge: a comparative study with human evaluators and the impact of prompt engineering. Frontiers in Medicine 2026;12
- Karampinis E, Zoumpourli C, Kontogianni C, Arkoumanis T, Koumaki D, Mantzaris D, Filippakis K, Papadopoulou M, Theofili M, Enechukwu N, Ouédraogo N, Katoulis A, Zafiriou E, Sgouros D. Dermatology “AI Babylon”: Cross-Language Evaluation of AI-Crafted Dermatology Descriptions. Medicina 2026;62(1):227
- Wang Y. Integrating large language models into medical undergraduate laboratory course to enhance bioethical competence: a quasi-experimental study. Frontiers in Medicine 2026;12
- Atlı Ş, Nalbant G, Beşparmak T, Türkyılmaz A, Erdemir A. Comparison of Artificial Intelligence Chatbots and Dental Students on Context of Dental Trauma. Dental Traumatology 2026
- Liu L, Ma K, Wang Y. Performance of five large language models in oral and maxillofacial surgery exam questions: a comparative study. BMC Oral Health 2026;26(1)
- Qi X, Fan L, Yao Y, Shen S, Yang Z, Zhu J, Yang D. Performance evaluation and comparison of ChatGPT, Gemini, Grok, and DeepSeek in the interpretation of tumor marker reports. Clinica Chimica Acta 2026;588:120984
- Kim M, Park J, Kang S. Comparative performance of recent and prior large language models and pediatric residents on pediatric in-training examination questions. Scientific Reports 2026
- İzci Çetinkaya F, Mirza A, Ekici H, Eryılmaz Eren E, Ture Z. Evaluation of Artificial Intelligence Chatbots in Providing Brucellosis‐Related Health Information: A Multidimensional Quality Assessment. Zoonoses and Public Health 2026
- Li Y, Chen X, Dolata M. LLM-as-a-Judge for mental support: A meta-evaluation using domain-specific platform data. Electronic Markets 2026;36(1)
Conference Proceedings
- Cheng S, Xu H, Meng S, Hao S, Yue C, Li Z. The Privacy Paradox of LLMs: User Perceptions and the Reality of PII Leakage. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
