Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation
Jie Xu¹, DHM; Lu Lu¹, MA; Xinwei Peng¹, MM; Jiali Pang¹, MS; Jinru Ding¹, MEng; Lingrui Yang², MSc; Huan Song³,⁴, PhD; Kang Li³,⁴, PhD; Xin Sun², MD; Shaoting Zhang¹, PhD
¹ Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China
² Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
³ West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
⁴ Med-X Center for Informatics, Sichuan University, Chengdu, China
Corresponding Author:
Shaoting Zhang, PhD
Shanghai Artificial Intelligence Laboratory
OpenMedLab
West Bank International Artificial Intelligence Center, 701 Yunjin Road
Shanghai, 200032
China
Phone: 86 021-23537800
Email: zhangshaoting@pjlab.org.cn