Published on 29.10.2020 in Vol 8, No 10 (2020): October

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/20578.
Prognostic Machine Learning Models for First-Year Mortality in Incident Hemodialysis Patients: Development and Validation Study


Original Paper

Kidney Disease Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China

*these authors contributed equally

Corresponding Author:

Jianghua Chen, MD

Kidney Disease Center

The First Affiliated Hospital, Zhejiang University School of Medicine

#79 Qingchun Road

Hangzhou, 310003

China

Phone: 86 57187236992

Email: zjukidney@zju.edu.cn


Background: The first-year survival rate among patients undergoing hemodialysis remains poor. Current mortality risk scores for patients undergoing hemodialysis employ regression techniques and have limited applicability and robustness.

Objective: We aimed to develop a machine learning model utilizing clinical factors to predict first-year mortality in patients undergoing hemodialysis that could assist physicians in classifying high-risk patients.

Methods: Training and testing cohorts consisted of 5351 patients from a single center and 5828 patients from 97 renal centers undergoing hemodialysis (incident only). The outcome was all-cause mortality during the first year of dialysis. Extreme gradient boosting was used for algorithm training and validation. Two models were established based on the data obtained at dialysis initiation (model 1) and data 0-3 months after dialysis initiation (model 2), and 10-fold cross-validation was applied to each model. The area under the curve (AUC), sensitivity (recall), specificity, precision, balanced accuracy, and F1 score were used to assess the predictive ability of the models.

Results: In the training and testing cohorts, 585 (10.93%) and 764 (13.11%) patients, respectively, died during the first-year follow-up. Of 42 candidate features, the 15 most important features were selected. The performance of model 1 (AUC 0.83, 95% CI 0.78-0.84) was similar to that of model 2 (AUC 0.85, 95% CI 0.81-0.86).

Conclusions: We developed and validated 2 machine learning models to predict first-year mortality in patients undergoing hemodialysis. Both models could be used to stratify high-risk patients at the early stages of dialysis.

JMIR Med Inform 2020;8(10):e20578

doi:10.2196/20578


Background

The overall prevalence of chronic kidney disease is 10.8% in China and 15% in the United States, which has brought significant economic, social, and medical burdens on patients and society [1-3]. According to the United States Renal Data System, there are approximately 120,000 patients with end-stage renal disease starting chronic renal replacement therapy every year [2]. However, survival among incident hemodialysis patients remains poor, especially in the first year of the initiation of dialysis [4,5].

End-stage renal disease is a complex disease state with multiple associated comorbidities. Patients initiating hemodialysis often have acute complications, and some have major comorbid conditions that are associated with poor short-term prognoses [6]. Identifying patients at high risk of first-year mortality from their clinical and laboratory findings is therefore of great clinical significance: it can inform patients of their survival prognosis in the early stages of dialysis and allow clinicians to design targeted interventions to improve first-year outcomes. Previous studies [7-11] have identified many risk factors for early dialysis mortality, such as old age, chronic heart failure, catheter use, low albumin, low hemoglobin, and high estimated glomerular filtration rate at dialysis initiation. However, because of the heterogeneity of primary disorders and the broad range of comorbidities, these risk factors alone are insufficient for conclusive decision making. In recent years, a number of clinical risk models have been developed to predict early mortality in the dialysis population, most of them based on linear models (logistic or Cox regression) [12-16]. These models performed only moderately in both their original populations and external validation, with areas under the curve (AUCs) ranging from 0.710 to 0.752 [17]. In addition, no study has compared models based on predialysis data with models based on data collected after dialysis initiation.

In recent years, machine learning has proven to be a powerful method in medical research [18-21]. It is useful both for identifying the most important factors and for developing predictive models with the best performance. A recent study [22] reported a random forest machine learning model for predicting first-year survival of incident hemodialysis patients. The model’s AUC was 0.749 (95% CI 0.742-0.755), which was superior to those of traditional risk prediction models; however, this is not accurate enough for clinical application.

Objective

Therefore, in this study, we sought to develop and validate sufficiently accurate models based on machine learning techniques, utilizing readily available clinical factors to predict first-year mortality in incident dialysis patients.


Study Design

This study retrospectively collected data from the Zhejiang Dialysis System, a database of hemodialysis and peritoneal dialysis patients in East China. Training data were retrieved from the First Affiliated Hospital, Zhejiang University School of Medicine, between January 2007 and April 2019 (Figure 1). Testing data were collected from 97 renal centers between January 2010 and August 2018 for external validation (Figure 1). All follow-up data were updated to August 2019.

Figure 1. A workflow to develop the prediction models for first-year mortality in incident hemodialysis patients. XGBoost: extreme gradient boosting.

Adult patients (aged ≥18 years) with end-stage renal disease and with follow-up exceeding 12 months who started maintenance hemodialysis were included. Patients who died within 12 months of follow-up were also included.

The exclusion criteria were as follows: a history of previous renal replacement therapy, recovery of kidney function within 3 months, and renal transplantation or a switch to peritoneal dialysis within 12 months after dialysis initiation. We also excluded patients with missing information on disease diagnoses or age at dialysis initiation.

This study followed the tenets of the Declaration of Helsinki and was approved by the ethics committee of the First Affiliated Hospital of Zhejiang University (IIT20200088A) in Hangzhou, China. Written informed consent was obtained from each participant.

Outcome and Predictors

The outcome of this study was all-cause mortality during the first year of dialysis. Outcome status and potential candidate variables for the prediction tool, including demographic information, disease diagnoses, comorbidities, and laboratory test results, were obtained from the Zhejiang Dialysis System.

Demographic information and type of vascular access were collected at the start of dialysis. Disease diagnoses, comorbid information, and laboratory test results were collected 0-3 months after dialysis initiation. The most recent serum creatinine measurements prior to the index date were used to estimate the glomerular filtration rate using the Chronic Kidney Disease Epidemiology Collaboration equation [23].
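As an illustration of the estimation step above, the following is a minimal sketch of the 2009 CKD-EPI creatinine equation [23]. The function name and arguments are hypothetical, the conversion factor of 88.4 from μmol/L to mg/dL is an assumption about how the laboratory units would be handled, and the paper does not state whether any population-specific modification of the equation was applied.

```python
def ckd_epi_2009_egfr(creatinine_umol_l: float, age_years: float,
                      female: bool, black: bool = False) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Hypothetical helper for illustration; not taken from the study code.
    """
    scr_mg_dl = creatinine_umol_l / 88.4      # assumed unit conversion to mg/dL
    kappa = 0.7 if female else 0.9            # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411      # exponent used below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Illustrative input only: a 52-year-old man with serum creatinine 807 umol/L.
example_egfr = ckd_epi_2009_egfr(807.0, 52.0, female=False)
print(f"eGFR: {example_egfr:.1f} mL/min/1.73 m^2")
```

With creatinine values in the range seen at dialysis initiation, the estimate falls well below 15 mL/min/1.73 m², consistent with end-stage renal disease.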

A total of 42 variables were included as candidate features based on a review of the relevant literature and clinical experience. Only BMI and ferritin had missing data, and the missing rate was below 6% for both (Table 1).

Table 1. Baseline characteristics of the training and testing cohorts.
| Characteristics | At dialysis initiation: training cohort (n=5351) | At dialysis initiation: testing cohort (n=5828) | 0-3 months: training cohort (n=4425) | 0-3 months: testing cohort (n=3729) |
|---|---|---|---|---|
| Sex, n (%) | | | | |
| Male | 3295 (61.58) | 3524 (60.47) | 2744 (62.01) | 2264 (60.71) |
| Female | 2056 (38.42) | 2304 (39.53) | 1681 (37.99) | 1465 (39.29) |
| Body mass index (kg/m²), mean (SD) [a] | 22.09 (3.29) | 21.73 (3.07) | 22.19 (3.39) | 21.83 (3.04) |
| Age at dialysis initiation (years), mean (SD) | 51.67 (16.48) | 62.53 (16.20) | 52.61 (16.59) | 62.45 (15.9) |
| Systolic pressure (mmHg), mean (SD) | 137.49 (22.93) | 146.18 (24.58) | 138.52 (23.15) | 146.33 (24.68) |
| Diastolic pressure (mmHg), mean (SD) | 77.76 (12.26) | 78.95 (15.52) | 80.45 (12.15) | 79.02 (15.45) |
| Chronic kidney disease etiology, n (%) | | | | |
| Chronic glomerulonephritis | 2823 (52.76) | 3015 (51.73) | 2445 (55.25) | 2064 (55.35) |
| Diabetic nephropathy | 1120 (20.93) | 1191 (20.44) | 895 (20.23) | 818 (21.94) |
| Hypertensive nephropathy | 262 (4.90) | 557 (9.56) | 218 (4.93) | 370 (9.92) |
| Lupus nephritis | 68 (1.27) | 50 (0.86) | 57 (1.29) | 29 (0.78) |
| ANCA-associated [b] vasculitis | 57 (1.07) | 64 (1.10) | 53 (1.20) | 33 (0.88) |
| Gouty nephropathy | 32 (0.60) | 125 (2.14) | 26 (0.59) | 72 (1.93) |
| Polycystic kidney disease | 286 (5.34) | 214 (3.67) | 220 (4.97) | 150 (4.02) |
| Other | 703 (13.14) | 612 (11.07) | 511 (11.54) | 204 (5) |
| Comorbid conditions, n (%) | | | | |
| Cirrhosis | 86 (1.61) | 90 (1.54) | 81 (1.83) | 60 (1.61) |
| Multiple myeloma | 46 (0.86) | 90 (1.54) | 46 (1.04) | 51 (1.37) |
| Atrial fibrillation | 108 (2.02) | 109 (1.87) | 85 (1.92) | 72 (1.93) |
| Congestive heart failure | 969 (18.11) | 999 (17.14) | 794 (17.94) | 605 (16.22) |
| Ischemic heart disease | 1476 (27.58) | 1578 (27.08) | 1206 (27.25) | 983 (26.36) |
| Metastatic cancer | 86 (1.61) | 91 (1.56) | 74 (1.67) | 38 (1.02) |
| Lymphoma | 7 (0.13) | 7 (0.12) | 6 (0.14) | 1 (0.03) |
| Chronic obstructive pulmonary disease | 241 (4.50) | 165 (2.83) | 169 (3.82) | 78 (2.09) |
| Cerebrovascular disease | 322 (6.02) | 411 (7.05) | 244 (5.51) | 271 (7.27) |
| Laboratory data | | | | |
| Leukocyte (10⁹/L), mean (SD) | 7.32 (2.95) | 7.71 (3.79) | 7.40 (3.09) | 6.90 (3.22) |
| Neutrophil (10⁹/L), mean (SD) | 5.23 (2.68) | 5.06 (3.32) | 5.36 (2.78) | 4.22 (2.57) |
| Hemoglobin (g/L), mean (SD) | 94.82 (23.30) | 83.09 (19.12) | 91.05 (21.68) | 86.50 (14.67) |
| Platelet (10⁹/L), mean (SD) | 193.28 (93.47) | 182.47 (83.70) | 190.84 (88.13) | 184.36 (71.39) |
| Albumin (g/L), mean (SD) | 36.01 (6.75) | 33.27 (5.99) | 36.80 (6.59) | 33.98 (5.54) |
| Phosphorus (mmol/L), mean (SD) | 1.81 (0.62) | 1.70 (0.66) | 1.66 (0.52) | 1.54 (0.50) |
| Calcium (mmol/L), mean (SD) | 2.15 (0.28) | 2.02 (0.30) | 2.14 (0.22) | 2.08 (0.23) |
| Potassium (mmol/L), mean (SD) | 4.87 (1.11) | 4.52 (0.91) | 4.76 (0.96) | 4.42 (0.69) |
| Parathyroid hormone (pg/mL), mean (SD) | 334.71 (292.07) | 246.95 (193.61) | 315.98 (291.84) | 241.26 (206.48) |
| Creatinine (μmol/L), mean (SD) | 807.11 (352.04) | 718.84 (336.47) | 755.28 (315.95) | 661.5 (268.48) |
| Urea nitrogen (mmol/L), mean (SD) | 22.65 (12.07) | 23.61 (11.77) | 19.87 (8.72) | 20.01 (8.13) |
| Uric acid (μmol/L), mean (SD) | 436.84 (147.54) | 450.27 (157.44) | 392.87 (126.48) | 402.19 (113.46) |
| C-reactive protein, mean (SD) | 40.84 (44.09) | 25.65 (44.46) | 18.52 (35.01) | 20.23 (31.22) |
| Cholesterol (mmol/L), mean (SD) | 4.34 (1.30) | 4.30 (1.42) | 4.27 (1.23) | 4.34 (1.25) |
| Triglycerides (mmol/L), mean (SD) | 1.56 (1.00) | 1.60 (1.03) | 1.58 (0.96) | 1.63 (0.97) |
| High-density lipoprotein (mmol/L), mean (SD) | 1.14 (0.42) | 1.11 (0.43) | 1.12 (0.39) | 1.15 (0.38) |
| Low-density lipoprotein (mmol/L), mean (SD) | 2.36 (1.10) | 2.37 (1.02) | 2.31 (1.04) | 2.35 (0.92) |
| Very low-density lipoprotein (mmol/L), mean (SD) | 1.65 (1.55) | 2.11 (1.35) | 1.63 (1.54) | 1.60 (0.93) |
| Ferritin (ng/mL), mean (SD) [c] | 174.59 (126.34) | 328.25 (295.78) | 144.34 (144.87) | 305.42 (278.73) |
| eGFR [d] (mL/min/1.73 m²), mean (SD) | 6.75 (3.79) | 7.28 (3.93) | 7.23 (3.85) | 7.58 (3.44) |
| Vascular access at dialysis initiation, n (%) | | | | |
| Nontunneled catheter | 3295 (61.58) | 3388 (58.13) | 2495 (56.38) | 1893 (50.76) |
| Tunneled catheter | 1068 (19.96) | 1266 (21.72) | 1005 (22.71) | 938 (25.15) |
| Fistula or graft | 988 (18.46) | 1174 (20.14) | 925 (20.90) | 898 (24.08) |
| Death at 1-year follow-up, n (%) | 585 (10.93) | 764 (13.11) | 437 (9.88) | 477 (12.79) |

[a] The missing rates of body mass index in the 4 cohorts were 270 (5.04%), 298 (5.11%), 210 (4.74%), and 168 (4.50%), respectively.

[b] ANCA: antineutrophil cytoplasmic antibody.

[c] The missing rates of ferritin in the 4 cohorts were 0.36%, 3.00%, 0.36%, and 2.13%, respectively.

[d] eGFR: estimated glomerular filtration rate.

Data Preprocessing

Before the baseline model was developed, missing data were imputed with the mean value for continuous variables and the mode value for categorical variables. By using one-hot encoding, all categorical features were transformed into numerical features. Box-Cox transformation was performed to normalize numerical features that were highly skewed [24].
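The three preprocessing steps described above can be sketched as follows. This is an illustrative, standard-library-only version with hypothetical function names and made-up values; the study's actual implementation is not given in the paper. In particular, the Box-Cox parameter is passed in by the caller here, whereas the paper cites a goodness-of-fit approach for estimating it [24].

```python
import math
from statistics import mean, mode


def impute(values, categorical=False):
    """Fill None entries with the mode (categorical) or mean (continuous)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Made-up values for illustration only.
ferritin = impute([174.0, None, 328.0, 144.0])
access = impute(["nontunneled", "tunneled", None, "nontunneled"], categorical=True)
categories, indicators = one_hot(access)
ferritin_transformed = box_cox(ferritin, lam=0)  # lam=0 is the log transform
```

Mean imputation is only reasonable here because the missing rates are low; with heavier missingness, a model-based imputation would usually be preferred.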

Algorithm Development and Validation

An extreme gradient boosting machine learning algorithm was employed to build a model relating the candidate features to the outcome. Extreme gradient boosting is an ensemble learning algorithm based on gradient boosted decision trees [25]. Using the Gini impurity index [26], we estimated the importance score of each candidate feature after training. The feature importance scores show how valuable each feature was in the construction of the boosted decision trees within the model.
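To make the impurity-based notion of feature importance concrete, the sketch below computes the Gini impurity of a set of binary outcomes and the impurity decrease produced by one candidate split. The function names and the small data set are hypothetical. Note that extreme gradient boosting libraries commonly report importance as loss-based gain, weight, or cover rather than Gini impurity, so this illustrates the idea cited in [26] and not necessarily the exact quantity computed in the study.

```python
def gini_impurity(labels):
    """Gini impurity of binary labels (0 = survived, 1 = died)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def impurity_decrease(feature, labels, threshold):
    """Drop in Gini impurity when splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Made-up example: age separates the outcomes perfectly at 65 years.
age = [45, 50, 62, 70, 78, 81]
died = [0, 0, 0, 1, 1, 1]
best = impurity_decrease(age, died, threshold=65)
weak = impurity_decrease(age, died, threshold=47)
```

Summing such decreases over every split that uses a feature, across all trees, gives an importance score for that feature.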

The extreme gradient boosting algorithm was employed because (1) it has high efficiency and accuracy, (2) it can prevent overfitting via regularization, (3) it provides feature importance, and (4) it allows the use of a wide variety of computing environments.
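The regularization mentioned in point (2) can be illustrated with the leaf weight and split gain formulas from the extreme gradient boosting paper [25]. The sketch below is a simplified illustration for the logistic loss; the function names and the values of the L2 penalty (`reg_lambda`) and the split penalty (`gamma`) are assumptions, since the hyperparameters used in the study are not reported.

```python
def logistic_grad_hess(labels, probs):
    """First and second derivatives of the log loss with respect to the raw score."""
    grads = [p - y for y, p in zip(labels, probs)]
    hess = [p * (1.0 - p) for p in probs]
    return grads, hess


def leaf_weight(G, H, reg_lambda):
    """Optimal leaf value; a larger reg_lambda shrinks it toward zero."""
    return -G / (H + reg_lambda)


def split_gain(G_left, H_left, G_right, H_right, reg_lambda, gamma):
    """Loss reduction of a split; gamma penalizes every additional leaf."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right)) - gamma


# Made-up example: three deaths go left, three survivors go right,
# and every patient starts with a predicted probability of 0.5.
g_left, h_left = logistic_grad_hess([1, 1, 1], [0.5, 0.5, 0.5])
g_right, h_right = logistic_grad_hess([0, 0, 0], [0.5, 0.5, 0.5])
GL, HL, GR, HR = sum(g_left), sum(h_left), sum(g_right), sum(h_right)

unregularized = leaf_weight(GL, HL, reg_lambda=0.0)
regularized = leaf_weight(GL, HL, reg_lambda=1.0)
gain = split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0)
```

A split is kept only if its gain is positive, so raising `gamma` prunes weak splits, while raising `reg_lambda` makes each leaf's contribution more conservative.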

Ten other popular machine learning algorithms (adaptive boosting, light gradient boosting machine, logistic regression, linear discriminant analysis, random forest, extra trees, gradient boosting, multilayer perceptron, k-nearest neighbor, and decision tree) were compared with extreme gradient boosting.

We developed 2 models based on the data obtained at dialysis initiation (model 1) and on data from 0-3 months after dialysis initiation (model 2); 10-fold cross-validation was used to avoid overfitting and to validate each model [27]. We measured AUC, sensitivity (recall), specificity, precision, balanced accuracy, and F1 score to assess the predictive ability of each model. Balanced accuracy was calculated as (sensitivity + specificity) / 2, and the F1 score was calculated as (2 × precision × recall) / (precision + recall). Shapley additive explanation (SHAP) values were used to measure the marginal contribution of each feature to the models [28].
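The two formulas above can be written out directly. The sketch below uses hypothetical function names; the confusion-matrix counts are made up, while the second check uses the precision, sensitivity, and specificity reported for the extreme gradient boosting row of model 1 in Table 2.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts (positive class = death)."""
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def balanced_accuracy(sensitivity, specificity):
    return (sensitivity + specificity) / 2


def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration.
toy = classification_metrics(tp=80, fp=50, tn=850, fn=20)

# Rates reported for extreme gradient boosting, model 1 (Table 2).
reported_ba = balanced_accuracy(71.86, 97.18)   # in percent
reported_f1 = f1_score(0.7934, 0.7186)
print(round(reported_ba, 2), round(reported_f1, 4))
```

Because only about 1 in 10 patients died, plain accuracy is dominated by the survivors; balanced accuracy and the F1 score are less sensitive to that imbalance, which is presumably why they are reported alongside it.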


Demographic and Clinical Characteristics

The demographic and clinical characteristics of the training and testing cohorts indicated that most characteristics were similarly distributed (Table 1). All patients were Chinese. The mean ages at dialysis initiation were 51.67 years (SD 16.48) in the training cohort and 62.53 years (SD 16.20) in the testing cohort; 61.58% of the patients (3295/5351) in the training cohort and 60.47% of the patients (3524/5828) in the testing cohort were men; out of 5351 patients, 585 (10.93%) deaths were reported in the training cohort, and out of 5828 patients, 764 (13.11%) deaths were reported in the testing cohort.

Model Performance

The ranks of features selected after training the extreme gradient boosting models are shown in Multimedia Appendix 1 and Multimedia Appendix 2. The same 15 most important features were chosen for both model 1 and model 2: age at dialysis initiation, vascular access, metastatic cancer, diabetic nephropathy, congestive heart failure, ischemic heart disease, cerebrovascular disease, albumin, hemoglobin, neutrophil, C-reactive protein, creatinine, estimated glomerular filtration rate, systolic blood pressure, and BMI.

Among the 11 algorithms applied (Table 2), the extreme gradient boosting algorithm had the best generalized performance for both model 1 (AUC 0.83, 95% CI 0.78-0.84; balanced accuracy 84.52%; F1 score 0.75) and model 2 (AUC 0.85, 95% CI 0.81-0.86; balanced accuracy 89.21%; F1 score 0.78). As shown in Figure 2, the receiver operating characteristic curves of both models were similar.
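For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below implements that pairwise definition with a hypothetical function name and made-up scores; the paper does not describe how its AUCs or confidence intervals were computed, so this is not a reconstruction of the authors' procedure.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (death, survivor) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up predictions: one survivor (0.40) outranks one death (0.35).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
```

The double loop is quadratic in cohort size; a rank-based formulation gives the same value more efficiently for cohorts of several thousand patients.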

SHAP value results are shown in Figure 3 (model 1) and Figure 4 (model 2). Each point represents a data sample for the feature. History of congestive heart failure, albumin level, C-reactive protein level, and age at dialysis initiation were the most important factors affecting the prediction of first-year mortality in both model 1 and model 2. Figure 5 uses model 2 to show how individual features contribute to the predicted probability for a single participant. This participant had a history of congestive heart failure, a low creatinine level, a high C-reactive protein level, a high neutrophil count, and old age at dialysis initiation, all of which contributed to a higher probability of mortality in the first year, although he had a normal BMI and only slightly high systolic blood pressure.
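The idea behind these per-feature contributions can be shown with a brute-force Shapley computation on a toy model. Everything in the sketch below is hypothetical: the risk function, its coefficients, the patient, and the baseline are invented for illustration, and missing features are handled by substituting a single baseline value, which is simpler than the tree-specific estimators typically used with boosted trees.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # reveal feature i
            updated = predict(current)
            phi[i] += updated - previous   # its marginal contribution
            previous = updated
    return [v / len(orderings) for v in phi]


def toy_risk(features):
    """Invented risk score with an age x heart-failure interaction."""
    age, chf, albumin = features
    return 0.02 * age + 0.5 * chf - 0.03 * albumin + 0.01 * age * chf


patient = [70, 1, 30]     # age (years), congestive heart failure (0/1), albumin (g/L)
baseline = [50, 0, 36]    # reference patient
phi = shapley_values(toy_risk, patient, baseline)
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a plot such as Figure 5 decompose a single prediction. The interaction term is shared between age and heart failure rather than assigned to either one alone.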

Table 2. Performance of different algorithms trained on the testing data set.
| Models | Precision, % | Sensitivity, % | Specificity, % | F1 score | Balanced accuracy, % | AUC [a] (95% CI) | Accuracy, % |
|---|---|---|---|---|---|---|---|
| Model 1 | | | | | | | |
| Adaptive boosting | 43.34 | 55.37 | 89.29 | 0.4862 | 72.33 | 0.81 (0.77-0.82) | 84.92 |
| Decision tree | 68.61 | 35.47 | 97.55 | 0.4676 | 66.51 | 0.78 (0.76-0.80) | 89.41 |
| Extra trees | 78.56 | 59.95 | 97.53 | 0.6800 | 78.74 | 0.83 (0.77-0.83) | 92.60 |
| Gradient boosting | 52.58 | 49.35 | 93.29 | 0.5091 | 71.32 | 0.82 (0.77-0.83) | 87.53 |
| k-nearest neighbor | 47.32 | 50.92 | 91.45 | 0.4905 | 71.18 | 0.76 (0.76-0.84) | 86.14 |
| Linear discriminant analysis | 14.02 | 82.46 | 23.74 | 0.2397 | 53.10 | 0.75 (0.74-0.84) | 31.43 |
| Light gradient boosting | 91.76 | 62.70 | 99.15 | 0.7449 | 80.92 | 0.82 (0.77-0.83) | 94.37 |
| Logistic regression | 14.16 | 85.47 | 21.84 | 0.2430 | 53.66 | 0.68 (0.68-0.85) | 30.18 |
| Multilayer perceptron | 16.64 | 78.80 | 40.44 | 0.2748 | 59.62 | 0.80 (0.68-0.85) | 45.47 |
| Random forest | 90.62 | 40.45 | 99.37 | 0.5593 | 69.91 | 0.81 (0.78-0.83) | 91.64 |
| Extreme gradient boosting | 79.34 | 71.86 | 97.18 | 0.7541 | 84.52 | 0.83 (0.78-0.84) | 93.86 |
| Model 2 | | | | | | | |
| Adaptive boosting | 61.83 | 72.33 | 93.45 | 0.6667 | 82.89 | 0.83 (0.80-0.84) | 90.75 |
| Decision tree | 78.50 | 63.52 | 97.45 | 0.7022 | 80.48 | 0.81 (0.80-0.82) | 93.11 |
| Extra trees | 74.48 | 60.59 | 96.96 | 0.6682 | 78.77 | 0.84 (0.80-0.85) | 92.30 |
| Gradient boosting | 83.08 | 67.92 | 97.97 | 0.7474 | 82.95 | 0.84 (0.82-0.85) | 94.13 |
| k-nearest neighbor | 87.37 | 52.20 | 98.89 | 0.6535 | 75.55 | 0.82 (0.81-0.86) | 92.92 |
| Linear discriminant analysis | 16.33 | 82.81 | 37.76 | 0.2728 | 60.29 | 0.76 (0.76-0.86) | 43.52 |
| Light gradient boosting | 77.97 | 75.68 | 96.86 | 0.7681 | 86.27 | 0.85 (0.80-0.85) | 94.15 |
| Logistic regression | 16.12 | 81.76 | 37.58 | 0.2692 | 59.67 | 0.73 (0.73-0.86) | 43.23 |
| Multilayer perceptron | 16.19 | 80.08 | 39.21 | 0.2694 | 59.65 | 0.71 (0.71-0.86) | 44.44 |
| Random forest | 66.67 | 70.02 | 94.86 | 0.6830 | 82.44 | 0.82 (0.80-0.85) | 91.69 |
| Extreme gradient boosting | 78.95 | 78.62 | 96.92 | 0.7878 | 87.77 | 0.85 (0.81-0.86) | 94.58 |

[a] AUC: area under the curve.

Figure 2. Receiver-operating characteristic curves of model 1 and model 2. AUC: the area under the curve.
Figure 3. SHAP values illustrating how features contribute to model 1. Blue shows a negative contribution, and red shows a positive contribution. SHAP: Shapley additive explanation.
Figure 4. SHAP values illustrating how features contribute to model 2. Blue shows a negative contribution, and red shows a positive contribution. SHAP: Shapley additive explanation.
Figure 5. The SHAP value for a single data sample. BMI: body mass index, CHF: congestive heart failure, CRP: C-reactive protein, Cr: creatinine, NEU: neutrophil, SBP: systolic blood pressure.

Principal Findings

In this study, by implementing advanced machine learning techniques, we developed and validated 2 clinical risk prediction models for first-year mortality in incident hemodialysis patients. The 2 extreme gradient boosting models were established based on the data available at dialysis initiation and data from 0-3 months after dialysis initiation. The performance of model 1 (AUC 0.83) was similar to that of model 2 (AUC 0.85), suggesting that we can predict first-year mortality in patients undergoing hemodialysis at dialysis initiation.

Mortality for patients undergoing hemodialysis during the first year after dialysis initiation is high [4]. Therefore, early and precise individualized risk estimates are required for clinical decision making. Traditional strategies for building prediction models have contributed to quality improvement and decision support. Nevertheless, these models have limitations that may lead to missing important predictors and relationships. Our prediction models (model 1: AUC 0.83; model 2: AUC 0.85) were more accurate than previous models (AUC 0.710-0.752) [12-17] in stratifying the risk of first-year mortality for patients undergoing hemodialysis. Our prediction models had several unique and important characteristics. First, many clinical features have been reported for the prediction of first-year mortality in incident hemodialysis patients, and some of these features interact with each other. Traditional prediction models typically do not account for interactions between input features. By using extreme gradient boosting, we selected the 15 most important features from 42 candidate features and then combined them nonlinearly. Second, missing data and data noise are inevitable in clinical data collected from the real world, which is a complex problem for traditional strategies. Machine learning techniques can deal with missing data and data noise automatically to improve model performance. Third, relationships between data may change over time because of improvements in treatment and changing populations. For example, the rates of diabetic nephropathy and cardiovascular disease have been increasing yearly [1,2]. Traditional prediction models are rarely updated once they have been developed. Machine learning allows for continual updating of the model to incorporate new data and capture changes in the relationships between features. Finally, compared with traditional predictive models, machine learning models are more complex and harder to interpret; it is not easy to determine how these models make decisions. Therefore, we used SHAP values to interpret the models in this study. SHAP values for a single patient can help physicians evaluate prognosis and design individualized treatment regimens.

Previous studies [8,15,29] have used data from distinct time periods. Floege et al [15], by using 90- to 180-day baseline and 0- to 90-day baseline data for the prediction of first-year mortality, revealed that 2 Cox regression models had similar performances. Some studies [8,29] used data obtained at dialysis initiation to predict the 3- to 6-month mortality of patients undergoing hemodialysis. Akbilgic et al [22] developed a random forest model based on 49 predialysis patient features (AUC 0.75, 95% CI 0.74-0.76); however, it may not be feasible for all users because too many features are needed. Our models were based on 15 features that are easily available to clinicians. The performance of model 1 was satisfactory, suggesting that model 1 can be used to classify high-risk patients at the early stage of dialysis. The first-year mortality risk of dialysis patients may be reduced by personalized and targeted preventive therapies.

Limitations and Future Work

Despite the promising prospects demonstrated by our study, it had some limitations. First, our training data were retrospective data from a single center. Therefore, a possible center effect cannot be excluded. Second, although no restriction was placed on ethnicity, all patients included were Chinese. The primary causes of end-stage renal disease and the cardiovascular conditions of patients undergoing hemodialysis in China differ from those of patients undergoing hemodialysis in other regions [2,30]. Thus, the applicability of our models to other ethnic groups and regions needs to be confirmed. Third, we only assessed 1-year mortality, whereas long-term mortality is also important [31]. Therefore, we plan to establish a model to predict 2-year and 5-year mortality in future studies. Finally, therapeutic intervention data, such as dialysis dose and frequency, were not used in this study because therapeutic interventions were not always fixed until 1-2 months after dialysis initiation and varied between patients. We also plan to make the prediction models available on the website of the Zhejiang Dialysis Quality Control Center and as a mobile app to facilitate their use.

Conclusions

To accurately predict first-year mortality in incident hemodialysis patients, we developed and validated 2 machine learning models based on data available at dialysis initiation and data from 0-3 months after dialysis initiation. The overall predictive performances of the 2 models were similar. We hope our models may assist clinicians in stratifying the risk of mortality at the early stages of dialysis. Our models need to be evaluated in data sets of patients undergoing hemodialysis from other ethnic groups and regions before implementation in clinical practice. Future research will address long-term mortality prediction for incident dialysis patients.

Acknowledgments

This work was supported by National Key Research and Development Projects of China (2018YFC1314003). Study sponsors had no role in study design; collection, analysis, and interpretation of data; writing the report; and the decision to submit the report for publication.

Authors' Contributions

KS and JC conceptualized the study; JC acquired funding; KS, XY, JL, and YH collected data; KS developed methodology, analyzed the data, and wrote the first draft; and PZ reviewed and edited.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Importance rankings of 42 features based on data at dialysis initiation.

DOC File, 224 KB

Multimedia Appendix 2

Importance ranking of 42 features based on data 0-3 months after dialysis initiation.

DOC File, 224 KB

  1. Zhang L, Wang F, Wang L, Wang W, Liu B, Liu J, et al. Prevalence of chronic kidney disease in China: a cross-sectional survey. Lancet 2012 Mar;379(9818):815-822.
  2. Saran RR, Abbott KC. US Renal Data System 2018 Annual Data Report: Epidemiology of Kidney Disease in the United States. Am J Kidney Dis 2019;73(3S1). URL: https://www.usrds.org/annual-data-report/previous-adrs/ [accessed 2020-09-01]
  3. GBD Chronic Kidney Disease Collaboration. Global, regional, and national burden of chronic kidney disease, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 2020 Feb 29;395(10225):709-733.
  4. Robinson BM, Zhang J, Morgenstern H, Bradbury BD, Ng LJ, McCullough KP, et al. Worldwide, mortality risk is high soon after initiation of hemodialysis. Kidney Int 2014 Jan;85(1):158-165.
  5. Foley RN, Chen S, Solid CA, Gilbertson DT, Collins AJ. Early mortality in patients starting dialysis appears to go unregistered. Kidney Int 2014 Aug;86(2):392-398.
  6. Kovesdy CP, Naseer A, Sumida K, Molnar MZ, Potukuchi PK, Thomas F, et al. Abrupt Decline in Kidney Function Precipitating Initiation of Chronic Renal Replacement Therapy. Kidney Int Rep 2018 May;3(3):602-609.
  7. Jassal SV, Karaboyas A, Comment LA, Bieber BA, Morgenstern H, Sen A, et al. Functional Dependence and Mortality in the International Dialysis Outcomes and Practice Patterns Study (DOPPS). Am J Kidney Dis 2016 Feb;67(2):283-292.
  8. Wick JP, Turin TC, Faris PD, MacRae JM, Weaver RG, Tonelli M, et al. A Clinical Risk Prediction Tool for 6-Month Mortality After Dialysis Initiation Among Older Adults. Am J Kidney Dis 2017 May;69(5):568-575.
  9. Saleh T, Sumida K, Molnar MZ, Potukuchi PK, Thomas F, Lu JL, et al. Effect of Age on the Association of Vascular Access Type with Mortality in a Cohort of Incident End-Stage Renal Disease Patients. Nephron 2017 May 18;137(1):57-63.
  10. Karaboyas A, Morgenstern H, Li Y, Bieber BA, Hakim R, Hasegawa T, et al. Estimating the Fraction of First-Year Hemodialysis Deaths Attributable to Potentially Modifiable Risk Factors: Results from the DOPPS. CLEP 2020 Jan;12:51-60.
  11. Karaboyas A, Morgenstern H, Waechter S. Low hemoglobin at hemodialysis initiation: an international study of anemia management and mortality in the early dialysis period. Clin Kidney J 2020;13(3):425-433.
  12. Mauri JM, Clèries M, Vela E, Catalan Renal Registry. Design and validation of a model to predict early mortality in haemodialysis patients. Nephrol Dial Transplant 2008 May 26;23(5):1690-1696.
  13. Chua H, Lau T, Luo N, Ma V, Teo B, Haroon S, et al. Predicting first-year mortality in incident dialysis patients with end-stage renal disease - the UREA5 study. Blood Purif 2014 Feb 26;37(2):85-92.
  14. Doi T, Yamamoto S, Morinaga T, Sada KE, Kurita N, Onishi Y. Risk Score to Predict 1-Year Mortality after Haemodialysis Initiation in Patients with Stage 5 Chronic Kidney Disease under Predialysis Nephrology Care. PLoS One 2015;10(6):e0129180.
  15. Floege J, Gillespie IA, Kronenberg F, Anker SD, Gioni I, Richards S, et al. Development and validation of a predictive mortality risk score from a European hemodialysis cohort. Kidney Int 2015 May;87(5):996-1008.
  16. Quinn RR, Laupacis A, Hux JE, Oliver MJ, Austin PC. Predicting the Risk of 1-Year Mortality in Incident Dialysis Patients. Medical Care 2011;49(3):257-266.
  17. Ramspek C, Voskamp P, van Ittersum F, Krediet R, Dekker F, van Diepen M. Prediction models for the mortality risk in chronic dialysis patients: a systematic review and independent external validation study. CLEP 2017 Sep;9:451-464.
  18. Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat Med 2019 Jan 7;25(1):70-74.
  19. Chen T, Li X, Li Y, Xia E, Qin Y, Liang S, et al. Prediction and Risk Stratification of Kidney Outcomes in IgA Nephropathy. Am J Kidney Dis 2019 Sep;74(3):300-309.
  20. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019 Jan;25(1):30-36.
  21. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019 Jan 7;25(1):44-56.
  22. Akbilgic O, Obi Y, Potukuchi PK, Karabayir I, Nguyen DV, Soohoo M, et al. Machine Learning to Identify Dialysis Patients at High Death Risk. Kidney Int Rep 2019 Sep;4(9):1219-1229.
  23. Levey AS, Stevens LA, Schmid CH, Zhang Y, Castro AF, Feldman HI, CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration). A new equation to estimate glomerular filtration rate. Ann Intern Med 2009 May 05;150(9):604-612.
  24. Asar Ö, Ilk O, Dag O. Estimating Box-Cox power transformation parameter via goodness-of-fit tests. Communications in Statistics - Simulation and Computation 2014 Dec 12;46(1):91-105.
  25. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Presented at: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016; San Francisco. URL: https://dl.acm.org/doi/pdf/10.1145/2939672.2939785
  26. Louppe G, Wehenkel L, Sutera A, Geurts P. Understanding variable importances in forests of randomized trees. Advances in Neural Information Processing Systems. 2013. URL: http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized [accessed 2020-09-01]
  27. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics 2005 Aug 01;21(15):3301-3307.
  28. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017. URL: https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf [accessed 2020-09-01]
  29. Couchoud CG, Beuscart JR, Aldigier J, Brunet PJ, Moranne OP, REIN registry. Development of a risk stratification algorithm to improve patient-centered care and decision making for incident elderly patients with end-stage renal disease. Kidney Int 2015 Nov;88(5):1178-1186.
  30. Wang F, Yang C, Long J. Executive summary for the 2015 Annual Data Report of the China Kidney Disease Network. Kidney Int 2019 Aug;96(2):501-505.
  31. Arase H, Yamada S, Hiyamuta H, Taniguchi M, Tokumoto M, Tsuruya K, et al. Modified creatinine index and risk for long-term infection-related mortality in hemodialysis patients: ten-year outcomes of the Q-Cohort Study. Sci Rep 2020 Jan 27;10(1):1241.


Abbreviations

AUC: area under the curve
BMI: body mass index
SHAP: Shapley additive explanation


Edited by G Eysenbach; submitted 23.05.20; peer-reviewed by L Cilar, M Sokolova; comments to author 01.07.20; revised version received 15.08.20; accepted 16.08.20; published 29.10.20

Copyright

©Kaixiang Sheng, Ping Zhang, Xi Yao, Jiawei Li, Yongchun He, Jianghua Chen. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 29.10.2020.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.