Published in Vol 12 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/57674.
Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation

Jie Xu 1, DHM; Lu Lu 1, MA; Xinwei Peng 1, MM; Jiali Pang 1, MS; Jinru Ding 1, MEng; Lingrui Yang 2, MSc; Huan Song 3,4, PhD; Kang Li 3,4, PhD; Xin Sun 2, MD; Shaoting Zhang 1, PhD

1 Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China

2 Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

3 West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China

4 Med-X Center for Informatics, Sichuan University, Chengdu, China

Corresponding Author:

  • Shaoting Zhang, PhD
  • Shanghai Artificial Intelligence Laboratory
  • OpenMedLab
  • West Bank International Artificial Intelligence Center, 701 Yunjin Road
  • Shanghai, 200032
  • China
  • Phone: 86 021-23537800
  • Email: zhangshaoting@pjlab.org.cn