Prognostic Machine Learning Models for First-Year Mortality in Incident Hemodialysis Patients: Development and Validation Study

Background: The first-year survival rate among patients undergoing hemodialysis remains poor. Current mortality risk scores for patients undergoing hemodialysis employ regression techniques and have limited applicability and robustness. Objective: We aimed to develop a machine learning model utilizing clinical factors to predict first-year mortality in patients undergoing hemodialysis that could assist physicians in classifying high-risk patients. Methods: The training cohort consisted of 5351 incident hemodialysis patients from a single center, and the testing cohort consisted of 5828 incident hemodialysis patients from 97 renal centers. The outcome was all-cause mortality during the first year of dialysis. Extreme gradient boosting was used for algorithm training and validation. Two models were established based on the data obtained at dialysis initiation (model 1) and data 0-3 months after dialysis initiation (model 2), and 10-fold cross-validation was applied to each model. The area under the curve (AUC), sensitivity (recall), specificity, precision, balanced accuracy, and F1 score were used to assess the predictive ability of the models. Results: In the training and testing cohorts, 585 (10.93%) and 764 (13.11%) patients, respectively, died during the first-year follow-up. Of 42 candidate features, the 15 most important features were selected. The performance of model 1 (AUC 0.83, 95% CI 0.78-0.84) was similar to that of model 2 (AUC 0.85, 95% CI 0.81-0.86). Conclusions: We developed and validated 2 machine learning models to predict first-year mortality in patients undergoing hemodialysis. Both models could be used to stratify high-risk patients at the early stages of dialysis. (JMIR Med Inform)


Background
The overall prevalence of chronic kidney disease is 10.8% in China and 15% in the United States, imposing substantial economic, social, and medical burdens on patients and society [1][2][3]. According to the United States Renal Data System, approximately 120,000 patients with end-stage renal disease start chronic renal replacement therapy every year [2]. However, survival among incident hemodialysis patients remains poor, especially in the first year after the initiation of dialysis [4,5].
End-stage renal disease is a complex disease state with multiple associated comorbidities. Patients initiating hemodialysis often have acute complications, and some of them suffer from major comorbid conditions that are associated with poor short-term prognoses [6]. It is essential to stratify the risk of mortality according to clinical and laboratory findings of patients undergoing hemodialysis; therefore, the identification of patients undergoing hemodialysis who are at high risk of first-year mortality is of great clinical significance. It can inform patients of their survival prognosis in the early stages of dialysis and allow clinicians to make targeted intervention strategies to improve first-year outcomes. Previous studies [7][8][9][10][11] have identified many risk factors for early dialysis mortality, such as old age, chronic heart failure, catheter use, low albumin, low hemoglobin, and high estimated glomerular filtration rate at dialysis initiation. However, because of the heterogeneity of primary disorders and broad comorbidities, these risk factors alone are insufficient for conclusive decision making. In recent years, a number of clinical risk models have been developed to predict early mortality in the dialysis population, and most are based on linear models (logistic or Cox model) [12][13][14][15][16]. The performances of these models were not good enough in either the original population or the external validation: the area under the curve (AUC) of these models ranged from 0.710 to 0.752 [17]. In addition, no study has compared models based on predialysis data with models based on data collected after dialysis initiation.
In recent years, machine learning has proven to be a powerful approach in medical research [18][19][20][21]. Machine learning is useful for identifying the most important factors and for developing predictive models with the best performance. A recent study [22] reported on a random forest machine learning model used to predict first-year survival of incident hemodialysis patients. The model's AUC was 0.749 (95% CI 0.742-0.755), which was superior to those of traditional risk prediction models; however, this is not accurate enough for clinical application.

Objective
Therefore, in this study, we sought to develop and validate sufficiently accurate models based on machine learning techniques, utilizing readily available clinical factors to predict first-year mortality in incident dialysis patients.

Study Design
This study retrospectively collected data from the Zhejiang Dialysis System, a database of hemodialysis and peritoneal dialysis patients in East China. Training data were retrieved from the First Affiliated Hospital, College of Medicine, Zhejiang University between January 2007 and April 2019 (Figure 1). Testing data were collected from 97 renal centers between January 2010 and August 2018 for external validation (Figure 1). All follow-up data were updated to August 2019. Adult patients (aged ≥18 years) with end-stage renal disease and with follow-up exceeding 12 months who started maintenance hemodialysis were included. Patients who died within 12 months of follow-up were also included. The exclusion criteria were as follows: patients with a history of previous renal replacement therapy, patients whose kidney function recovered within 3 months, and patients who received renal transplantation or switched to peritoneal dialysis within 12 months after dialysis initiation. We also excluded patients with missing information on disease diagnoses or age at dialysis initiation.
This study followed the tenets of the Declaration of Helsinki and was approved by the ethics committee of the First Affiliated Hospital of Zhejiang University (IIT20200088A) in Hangzhou, China. Written informed consent was obtained from each participant.

Outcome and Predictors
The outcome of this study was all-cause mortality during the first year of dialysis. Outcome status and potential candidate variables for the prediction tool, including demographic information, disease diagnoses, comorbidities, and laboratory test results, were obtained from the Zhejiang Dialysis System.
Demographic information and type of vascular access were collected at the start of dialysis. Disease diagnoses, comorbid information, and laboratory test results were collected 0-3 months after dialysis initiation. The most recent serum creatinine measurements prior to the index date were used to estimate the glomerular filtration rate using the Chronic Kidney Disease Epidemiology Collaboration equation [23].
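As an illustration, the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) estimation described above can be sketched in Python. This is a minimal implementation of the 2009 creatinine equation; the function name and argument names are illustrative, and the race coefficient is omitted here given that all patients in this study were Chinese.

```python
import math

def ckd_epi_egfr(scr_mg_dl: float, age: float, female: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from serum creatinine (mg/dL)
    using the 2009 CKD-EPI creatinine equation (race coefficient omitted)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha      # applies below-threshold term
            * max(ratio, 1.0) ** -1.209     # applies above-threshold term
            * 0.993 ** age)                 # age decay factor
    if female:
        egfr *= 1.018
    return egfr
```

For example, a 60-year-old man with a serum creatinine of 1.0 mg/dL has an eGFR of roughly 81 mL/min/1.73 m^2 under this equation.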
A total of 42 variables were included as candidate features based on a review of the relevant literature and clinical experience. Only BMI and ferritin had missing data, and the proportion of missing values was less than 6% for each (Table 1).

Data Preprocessing
Before the baseline model was developed, missing data were imputed with the mean value for continuous variables and the mode value for categorical variables. By using one-hot encoding, all categorical features were transformed into numerical features. Box-Cox transformation was performed to normalize numerical features that were highly skewed [24].
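The preprocessing steps above (mean/mode imputation, one-hot encoding, and Box-Cox transformation) can be sketched in plain Python as follows. The function names and column names are illustrative; in practice, the Box-Cox parameter λ is estimated by maximum likelihood (e.g., with scipy.stats.boxcox) rather than fixed.

```python
import math
from statistics import mean, mode

def impute(rows, continuous, categorical):
    """Fill missing values (None): mean for continuous columns,
    mode for categorical columns."""
    for col in continuous:
        fill = mean(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    for col in categorical:
        fill = mode(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    return rows

def one_hot(rows, col):
    """Replace a categorical column with 0/1 indicator columns."""
    levels = sorted({r[col] for r in rows})
    for r in rows:
        for lv in levels:
            r[f"{col}={lv}"] = 1 if r[col] == lv else 0
        del r[col]
    return rows

def box_cox(x, lam):
    """Box-Cox transform of a positive value for a given lambda."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam
```

Applied to a small sample, a missing BMI would be replaced by the cohort mean and a missing vascular-access category by the most frequent level before encoding.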

Algorithm Development and Validation
An extreme gradient boosting machine learning algorithm was employed to build a model to predict the correlation between features and the outcome. Extreme gradient boosting is an integrated learning algorithm based on gradient boosted decision trees [25]. Using the Gini impurity index [26], we estimated the feature importance scores of candidate features after going through the training process. The feature importance scores showed how valuable each feature was in the construction of the boosted decision trees within the model.
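For illustration, the Gini impurity underlying these importance scores can be sketched as follows. The helper names are illustrative; gradient-boosting implementations accumulate the impurity decrease of each split, per feature, across all trees to produce importance scores like those used here.

```python
def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gain(parent, left, right):
    """Impurity decrease achieved by splitting parent into left/right children;
    summing such gains per splitting feature yields an importance score."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))
```

A perfectly separating split of a balanced binary node reduces the impurity from 0.5 to 0, giving the maximum possible gain of 0.5.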
The extreme gradient boosting algorithm was employed because (1) it has high efficiency and accuracy, (2) it can prevent overfitting via regularization, (3) it provides feature importance, and (4) it allows the use of a wide variety of computing environments.
Other popular machine learning algorithms (adaptive boosting, light gradient boosting machine, logistic regression, linear discriminant analysis, random forest, extra trees, gradient boosting, multilayer perceptron, k-nearest neighbors, and decision trees) were compared with extreme gradient boosting.
We developed 2 models that were based on the data obtained at dialysis initiation (model 1) and data 0-3 months after dialysis initiation (model 2); 10-fold cross-validation was used to avoid overfitting and to validate each model [27]. We measured AUC, sensitivity (recall), specificity, precision, balanced accuracy, and F1 score to assess the predictive ability of each model. The balanced accuracy was calculated as follows: balanced accuracy = (sensitivity + specificity) / 2. The F1 score was calculated as follows: F1 score = (2 × precision × recall) / (precision + recall). Shapley additive explanation (SHAP) values were used to measure the marginal contribution of each feature to the models [28].
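The evaluation metrics above follow directly from the confusion matrix; a minimal sketch (the function name is illustrative):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = death within 1 year)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

In practice, these quantities are averaged over the 10 cross-validation folds, and libraries such as scikit-learn provide equivalent functions.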

Demographic and Clinical Characteristics
The demographic and clinical characteristics of the training and testing cohorts indicated that most characteristics were similarly distributed (Table 1). All patients were Chinese. The mean age at dialysis initiation was 51.67 years (SD 16.48) in the training cohort and 62.53 years (SD 16.20) in the testing cohort; 61.58% (3295/5351) of patients in the training cohort and 60.47% (3524/5828) of patients in the testing cohort were men. During the first year of follow-up, 585 of 5351 patients (10.93%) in the training cohort and 764 of 5828 patients (13.11%) in the testing cohort died.

Model Performance
The ranks of features selected after training the extreme gradient boosting models are shown in Multimedia Appendix 1 and Multimedia Appendix 2. The same 15 most important features were chosen for both model 1 and model 2: age at dialysis initiation, vascular access, metastatic cancer, diabetic nephropathy, congestive heart failure, ischemic heart disease, cerebrovascular disease, albumin, hemoglobin, neutrophil, C-reactive protein, creatinine, estimated glomerular filtration rate, systolic blood pressure, and BMI.
SHAP value results are shown in Figure 3 (model 1) and Figure 4 (model 2). Each point represents a data sample for the feature.
History of congestive heart failure, albumin level, C-reactive protein level, and age at dialysis initiation were the most important factors affecting the prediction for first-year mortality in both model 1 and model 2. Figure 5 shows an example using model 2 that shows how features contribute to the probability for a single participant. This participant had a history of congestive heart failure, low creatinine level, a high C-reactive protein level, high neutrophil count, and old age at dialysis initiation, which contributed to a higher probability of mortality in the first year, although he had normal BMI and slightly high systolic blood pressure levels.

Principal Findings
In this study, by implementing advanced machine learning techniques, we developed and validated 2 clinical risk prediction models for first-year mortality in incident hemodialysis patients. The 2 extreme gradient boosting models were established based on the data available at dialysis initiation and data from 0-3 months after dialysis initiation. The performance of model 1 (AUC 0.83) was similar to that of model 2 (AUC 0.85), suggesting that we can predict first-year mortality in patients undergoing hemodialysis at dialysis initiation.
Mortality for patients undergoing hemodialysis during the first year of dialysis is high [4]. Therefore, early and precise individualized risk estimates are required for clinical decision making. Traditional strategies for building prediction models have contributed to quality improvement and decision support. Nevertheless, these models have some limitations that may lead to missing important predictors and relationships. Our prediction models (model 1: AUC 0.83, model 2: AUC 0.85), compared with previous models (AUC 0.710-0.752) [12][13][14][15][16][17], were more accurate in stratifying the risk of first-year mortality for patients undergoing hemodialysis. Our prediction models had several unique and important characteristics. First, many clinical features have been reported for the prediction of first-year mortality in incident hemodialysis patients, and some of these features interact with each other. Traditional prediction models do not account for interactions between input features. By using extreme gradient boosting, we selected the 15 most important features from 42 candidate features and then combined them nonlinearly. Second, missing data and data noise are inevitable in clinical data collected from the real world, which is a complex problem for traditional strategies. Machine learning techniques can deal with missing data and data noise automatically to improve model performance. Third, relationships within the data may change over time because of improvements in treatment and changing populations. For example, the rates of diabetic nephropathy and cardiovascular disease have been increasing yearly [1,2]. Traditional prediction models are typically static once built, whereas machine learning allows for continual updating of the model to incorporate new data and capture changes in the relationships between features.
Finally, compared with traditional predictive models, machine learning models are more complex and harder to interpret; it is not easy to determine how these models make decisions. Therefore, we used SHAP values to interpret the models in this study. SHAP values for a single patient can help physicians evaluate prognosis and make individualized treatment regimens.
Previous studies [8,15,29] have used data from distinct time periods. Floege et al [15], by using 90- to 180-day baseline and 0- to 90-day baseline data for the prediction of first-year mortality, revealed that 2 Cox regression models had similar performances. Some studies [8,29]

Limitations and Future Work
Despite the promising prospects demonstrated by our study, it had some limitations. First, our training data were retrospective data generated from a single center. Therefore, a possible center effect cannot be excluded. Second, although no restriction was placed on ethnicity, all included patients were Chinese. The primary causes of end-stage renal disease and the cardiovascular conditions of patients undergoing hemodialysis in China differ from those of patients undergoing hemodialysis in other regions [2,30]. Thus, the applicability of our models to other ethnic groups and regions needs to be confirmed. Third, we only assessed 1-year mortality, whereas long-term mortality is also important [31]. Therefore, we plan to establish a model to predict 2-year and 5-year mortality in future studies. Finally, therapeutic intervention data, such as dialysis dose and frequency, were not used in this study because therapeutic interventions were not always fixed until 1-2 months after dialysis initiation and varied across patients. We also plan to deploy the prediction models on the website of the Zhejiang Dialysis Quality Control Center and as a mobile app for better application.

Conclusions
To accurately predict first-year mortality in incident hemodialysis patients, we developed and validated 2 machine learning models based on data available at dialysis initiation and data 0-3 months after dialysis initiation. The overall diagnostic performances of the 2 models were similar. We hope our models may assist clinicians in stratifying the risk of mortality at the early stages of dialysis. Our models need to be evaluated in data sets of patients undergoing hemodialysis from other ethnic groups and regions before implementation in clinical practice. For future research, long-term mortality predictions for patients undergoing incident dialysis will be addressed.