TY  - JOUR
AU  - Kim, Sunyoung
AU  - Park, Jaeyu
AU  - Son, Yejun
AU  - Lee, Hojae
AU  - Woo, Selin
AU  - Lee, Myeongcheol
AU  - Lee, Hayeon
AU  - Sang, Hyunji
AU  - Yon, Dong Keon
AU  - Rhee, Sang Youl
PY  - 2025
DA  - 2025/2/7
TI  - Development and Validation of a Machine Learning Algorithm for Predicting Diabetes Retinopathy in Patients With Type 2 Diabetes: Algorithm Development Study
JO  - JMIR Med Inform
SP  - e58107
VL  - 13
KW  - type 2 diabetes
KW  - diabetes retinopathy
KW  - algorithm
KW  - machine learning
KW  - prediction
KW  - comorbidities
KW  - retinal
KW  - ophthalmology
AB  - Background: Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. Machine learning (ML) systems can enhance DR in community-based screening. However, predictive power models for usability and performance are still being determined. Objective: This study used data from 3 university hospitals in South Korea to conduct a simple and accurate assessment of ML-based risk prediction for the development of DR that can be universally applied to adults with type 2 diabetes mellitus (T2DM). Methods: DR was predicted using data from 2 independent electronic medical records: a discovery cohort (one hospital, n=14,694) and a validation cohort (2 hospitals, n=1856). The primary outcome was the presence of DR at 3 years. Different ML-based models were selected through hyperparameter tuning in the discovery cohort, and the area under the receiver operating characteristic (ROC) curve was analyzed in both cohorts. Results: Among 14,694 patients screened for inclusion, 348 (2.37%) were diagnosed with DR. For DR, the extreme gradient boosting (XGBoost) system had an accuracy of 75.13% (95% CI 74.10‐76.17), a sensitivity of 71.00% (95% CI 66.83‐75.17), and a specificity of 75.23% (95% CI 74.16‐76.31) in the original dataset. Among the validation datasets, XGBoost had an accuracy of 65.14%, a sensitivity of 64.96%, and a specificity of 65.15%. The most common feature in the XGBoost model is dyslipidemia, followed by cancer, hypertension, chronic kidney disease, neuropathy, and cardiovascular disease. Conclusions: This approach shows the potential to enhance patient outcomes by enabling timely interventions in patients with T2DM, improving our understanding of contributing factors, and reducing DR-related complications. The proposed prediction model is expected to be both competitive and cost-effective, particularly for primary care settings in South Korea.
SN  - 2291-9694
UR  - https://medinform.jmir.org/2025/1/e58107
UR  - https://doi.org/10.2196/58107
DO  - 10.2196/58107
ID  - info:doi/10.2196/58107
ER  - 