%0 Journal Article
%@ 2291-9694
%I JMIR Publications
%V 13
%N
%P e58107
%T Development and Validation of a Machine Learning Algorithm for Predicting Diabetes Retinopathy in Patients With Type 2 Diabetes: Algorithm Development Study
%A Kim,Sunyoung
%A Park,Jaeyu
%A Son,Yejun
%A Lee,Hojae
%A Woo,Selin
%A Lee,Myeongcheol
%A Lee,Hayeon
%A Sang,Hyunji
%A Yon,Dong Keon
%A Rhee,Sang Youl
%K type 2 diabetes
%K diabetes retinopathy
%K algorithm
%K machine learning
%K prediction
%K comorbidities
%K retinal
%K ophthalmology
%D 2025
%7 7.2.2025
%9
%J JMIR Med Inform
%G English
%X Background: Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. Machine learning (ML) systems can enhance DR in community-based screening. However, predictive power models for usability and performance are still being determined. Objective: This study used data from 3 university hospitals in South Korea to conduct a simple and accurate assessment of ML-based risk prediction for the development of DR that can be universally applied to adults with type 2 diabetes mellitus (T2DM). Methods: DR was predicted using data from 2 independent electronic medical records: a discovery cohort (one hospital, n=14,694) and a validation cohort (2 hospitals, n=1856). The primary outcome was the presence of DR at 3 years. Different ML-based models were selected through hyperparameter tuning in the discovery cohort, and the area under the receiver operating characteristic (ROC) curve was analyzed in both cohorts. Results: Among 14,694 patients screened for inclusion, 348 (2.37%) were diagnosed with DR. For DR, the extreme gradient boosting (XGBoost) system had an accuracy of 75.13% (95% CI 74.10‐76.17), a sensitivity of 71.00% (95% CI 66.83‐75.17), and a specificity of 75.23% (95% CI 74.16‐76.31) in the original dataset. Among the validation datasets, XGBoost had an accuracy of 65.14%, a sensitivity of 64.96%, and a specificity of 65.15%. The most common feature in the XGBoost model is dyslipidemia, followed by cancer, hypertension, chronic kidney disease, neuropathy, and cardiovascular disease. Conclusions: This approach shows the potential to enhance patient outcomes by enabling timely interventions in patients with T2DM, improving our understanding of contributing factors, and reducing DR-related complications. The proposed prediction model is expected to be both competitive and cost-effective, particularly for primary care settings in South Korea.
%R 10.2196/58107
%U https://medinform.jmir.org/2025/1/e58107
%U https://doi.org/10.2196/58107