Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation

doi:10.2196/57674

Published on 28.Jun.2024 in Vol 12 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/57674, first published 25.Feb.2024.



Jie Xu¹, DHM; Lu Lu¹, MA; Xinwei Peng¹, MM; Jiali Pang¹, MS; Jinru Ding¹, MEng; Lingrui Yang², MSc; Huan Song³,⁴, PhD; Kang Li³,⁴, PhD; Xin Sun², MD; Shaoting Zhang¹, PhD

¹ Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China

² Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

³ West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China

⁴ Med-X Center for Informatics, Sichuan University, Chengdu, China

Corresponding Author:

Shaoting Zhang, PhD
Shanghai Artificial Intelligence Laboratory
OpenMedLab
West Bank International Artificial Intelligence Center, 701 Yunjin Road
Shanghai 200032
China
Phone: 86 021-23537800
Email: zhangshaoting@pjlab.org.cn

Citation

Please cite as:

Xu J, Lu L, Peng X, Pang J, Ding J, Yang L, Song H, Li K, Sun X, Zhang S
Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation
JMIR Med Inform 2024;12:e57674
doi: 10.2196/57674 PMID: 38952020 PMCID: 11225096


This paper is in the following e-collection/theme issue:

Natural Language Processing; Formative Evaluation of Digital Health Interventions; Chatbots and Conversational Agents; Artificial Intelligence; Generative Language Models Including ChatGPT; AI Language Models in Health Care
