Use of Deep Learning to Predict Acute Kidney Injury After Intravenous Contrast Media Administration: Prediction Model Development Study

Background: Precise prediction of contrast media-induced acute kidney injury (CIAKI) is an important issue because of its relationship with poor outcomes.
Objective: Herein, we examined whether a deep learning algorithm could predict the risk of intravenous CIAKI better than other machine learning and logistic regression models in patients undergoing computed tomography (CT).
Methods: A total of 14,185 patients who were administered intravenous contrast media for CT at the preventive and monitoring facility in Seoul National University Hospital were reviewed. CIAKI was defined as an increase in serum creatinine of ≥0.3 mg/dL within 2 days or ≥50% within 7 days. Using both time-varying and time-invariant features, machine learning models, such as the recurrent neural network (RNN), light gradient boosting machine (LGM), extreme gradient boosting machine (XGB), random forest (RF), decision tree (DT), support vector machine (SVM), k-nearest neighbors, and logistic regression, were developed using a training set, and their performance was compared using the area under the receiver operating characteristic curve (AUROC) in a test set.
Results: CIAKI developed in 261 cases (1.8%). The RNN model had the highest AUROC of 0.755 (95% confidence interval [CI] 0.708-0.802) for predicting CIAKI, which was superior to that obtained from other machine learning models. When CIAKI was alternatively defined as an increase in serum creatinine of ≥0.5 mg/dL or ≥25% within 3 days, the RNN model again achieved the highest performance, with an AUROC of 0.716 (95% CI 0.664-0.768). In feature ranking analysis, the albumin level was the most highly contributing factor to RNN performance, followed by time-varying kidney function.
Conclusions: Application of a deep learning algorithm improves the predictability of intravenous CIAKI after CT, representing a basis for future clinical alarming and preventive systems.


Introduction
Computed tomography (CT) using contrast media is necessary to clinically detect abnormalities, but the administration of contrast media can lead to acute kidney injury (known as contrast media-induced acute kidney injury [CIAKI]). This is a critical issue because of the subsequent risk of irreversible kidney dysfunction and increased mortality [1]. This adverse relationship is more pronounced with intra-arterial administration of contrast media than with intravenous administration [2]. Nevertheless, frequent use of CT scanning with intravenous contrast media increases the risk of nephrotoxicity, which requires prophylaxis and monitoring of kidney function [3]. Prediction of intravenous CIAKI after CT scanning may be clinically essential to prepare for intervention in advance, but most relevant studies have primarily focused on intra-arterial CIAKI [4]. Models generated in some studies have predicted intravenous CIAKI, but these models had limitations: model performance was evaluated using a training set (rather than a test set) [5-10], an updated definition of CIAKI was not used [5-12], a prophylaxis protocol was not described [5,10,11], cases with intra-arterial administration of contrast media were combined in the analysis of intravenous cases [6,9,10], and confounding factors were not sufficiently considered [6-10].
Deep learning algorithms have achieved successful prediction of patient outcomes [13,14], which will change the paradigm of clinical decision making from diagnosis to treatment. Among deep learning algorithms, the recurrent neural network (RNN) can learn and characterize a temporal data set. In the nephrology field, RNN models using a time-varying data set of kidney function and vital signs have improved the predictability of outcomes, such as acute kidney injury [15] and intradialytic complications, beyond that of other machine learning (eg, gradient boosting machine) [16] and discrete-time logistic regression [17] models. Precise prediction of intravenous CIAKI may be difficult because multiple conditions have interactive and complex effects on its risk, and heterogeneous patient features along with fluctuating dynamics of kidney function before CT scanning may further complicate prediction. Herein, we addressed whether an RNN model with a time-varying data set including kidney function could predict the risk of intravenous CIAKI better than other machine learning or conventional scoring models.

Data Source and Study Patients
A total of 19,628 patients underwent CT scanning with intravenous administration of contrast media at the 1-day-care facility of the Seoul National University Hospital between February 2007 and January 2019. This facility was built for the purpose of monitoring and preventing CIAKI in patients at risk, such as those with reduced kidney function or comorbidities. During admission, patients received hydration with 500 mL of 0.9% saline before and after intravenous administration of contrast media and 1200 mg of N-acetylcysteine for 3 days [18,19]. Kidney function was subsequently monitored for 2-7 days after CT scanning. Patients aged less than 18 years (n=5), those with end-stage kidney disease (n=335), and those with no information about serum creatinine levels 28 days before and 7 days after CT scanning (n=5103) were excluded. Accordingly, 14,185 cases were included in the analysis (Multimedia Appendix 1). The institutional review board of the Seoul National University Hospital approved the study design (no. H-1812-134-997), and the study was conducted in accordance with the principles of the Declaration of Helsinki.

Study Features and Outcomes
Baseline characteristics, such as age, sex, weight, height, comorbidities (eg, coronary artery disease, any cancer, liver cirrhosis, glomerulonephritis, kidney transplantation), protocol of CT scanning and volume of contrast media, vital signs (eg, systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and body temperature), and medications (eg, β-blocker, calcium channel blocker, angiotensin-converting enzyme inhibitor, angiotensin receptor blocker, hydrochlorothiazide, spironolactone, furosemide, statin, metformin, sodium-glucose cotransporter 2 inhibitor, dipeptidyl peptidase-4 inhibitor, other oral hypoglycemic agents, and insulin), were collected using the patients' electronic medical records. Vital signs were measured at the time of admission to the facility. Laboratory findings were measured up to 1 month before CT scanning, and variables such as white blood cell count, hemoglobin, hematocrit, platelet count, cholesterol, albumin, total bilirubin, alkaline phosphatase, aspartate transaminase, alanine transaminase, uric acid, blood urea nitrogen, glucose, calcium, phosphate, sodium, potassium, chloride, and bicarbonate were evaluated. The estimated glomerular filtration rate (eGFR) was calculated using the Chronic Kidney Disease Epidemiology Collaboration equation [20]. Time-varying features included serum creatinine, eGFR, and elapsed times before CT scanning, and time-invariant features included all the other features. The baseline characteristics are summarized in Table 1.
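The eGFR calculation mentioned above can be illustrated with a minimal sketch. It assumes that reference [20] denotes the 2009 CKD-EPI creatinine equation; the text does not state the version, and the function name and arguments are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age_years: float,
                      female: bool, black: bool = False) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumption: this is the equation cited as [20]; the study text only
    says "Chronic Kidney Disease Epidemiology Collaboration equation".
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

Because serum creatinine, eGFR, and elapsed time form the time-varying features, a function of this kind would be applied to every historical creatinine measurement rather than only the most recent one.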

CIAKI was defined as an increase in serum creatinine of ≥0.3 mg/dL within 2 days or ≥50% within 7 days according to the Kidney Disease Improving Global Outcomes guideline [21]. In a sensitivity analysis, the alternative definition recommended by the European Society of Urogenital Radiology was used, namely an increase in serum creatinine of ≥0.5 mg/dL or ≥25% within 3 days [22]. As long-term outcomes, information about kidney progression (ie, doubling of serum creatinine, >50% decrease in eGFR, and the need for dialysis and transplantation) and all-cause mortality was obtained using the patients' electronic medical records, the Korean end-stage renal disease registry, and the National Database of Statistics, Korea.
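The primary outcome definition (≥0.3 mg/dL within 2 days or ≥50% within 7 days) can be expressed as a short labeling rule. The sketch below is illustrative only: how the baseline creatinine was chosen is not described in this section, so `baseline_scr` is an assumed input, and the function name and data layout are hypothetical.

```python
from datetime import datetime, timedelta

EPS = 1e-9  # tolerance so that a rise of exactly 0.3 mg/dL is not lost to float rounding


def has_ciaki(baseline_scr, ct_time, followups):
    """Return True if any post-CT creatinine meets the KDIGO-based definition.

    baseline_scr: pre-CT serum creatinine in mg/dL (assumed to be given).
    ct_time:      datetime of the CT scan.
    followups:    iterable of (datetime, serum creatinine in mg/dL) pairs.
    """
    for measured_at, scr in followups:
        elapsed = measured_at - ct_time
        if elapsed <= timedelta(0):
            continue  # ignore values taken before or at the scan
        absolute_rise = scr - baseline_scr
        if elapsed <= timedelta(days=2) and absolute_rise >= 0.3 - EPS:
            return True
        if elapsed <= timedelta(days=7) and scr >= 1.5 * baseline_scr - EPS:
            return True
    return False
```

The sensitivity-analysis definition (≥0.5 mg/dL or ≥25% within 3 days) would follow the same pattern with a single 3-day window and different thresholds.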

Model Development
Patients were randomly assigned to a training set (70%) to develop the model and a test set (30%) to examine the performance of the model, wherein the occurrence of CIAKI was evenly distributed between the two sets. To develop the RNN model, we combined RNN and multilayer perceptron (MLP) components. As an RNN component, we used the long short-term memory (LSTM) architecture, which is composed of input, output, and forget gates [23]. The median number of time-varying serum creatinine/eGFR values was 16 during the median timeframe of 4 years (1-9 years) before CT scanning. Based on these results, 16 consecutive time-varying features were used in the RNN model. These features entered stacked cells and a subsequent dense layer (ie, RNN module), while time-invariant features were processed by 3 dense layers of the MLP module. The results were finally concatenated and then passed through 4 dense layers as a merging module. A dropout layer (rate=0.5) followed each dense layer, while internal LSTM layers used input dropout (rate=0.5) and recurrent dropout (rate=0.5) [24]. Batch normalization layers were located at the end of the RNN and MLP modules and after the first and third layers of the merging module. Binary cross-entropy was used as the loss function to calculate the difference between actual and predicted labels. The Adam method was used as the optimizer [25], and the best parameters were selected using 10-fold cross-validation. Figure 1 presents the schematic diagram of the RNN model. To document the model training process, we have added the Python code in Multimedia Appendix 2. The script includes data preprocessing, splitting, modeling, and training process information.
We also developed other machine learning models, such as a light gradient boosting machine (LGM), an extreme gradient boosting machine (XGB), a random forest (RF), a decision tree (DT), a support vector machine (SVM), a k-nearest neighbor model, and logistic regression, to compare their performance with that of the RNN model. These models could not handle time-varying features; therefore, only time-invariant features were included in the models. Tenfold cross-validation was used in the hyperparameter-tuning process, and candidate hyperparameters are listed in Multimedia Appendix 3.
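The 70%/30% assignment with CIAKI evenly distributed between the two sets corresponds to a stratified random split. The following is a minimal standard-library sketch of that idea, not the script in Multimedia Appendix 2; the function name, seed, and rounding rule are assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split case indices so each class keeps the same test fraction.

    labels: list of 0/1 outcome labels (1 = CIAKI), one per case.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test size
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With a rare outcome (about 1.8% here), splitting within each class avoids a test set that contains too few events by chance.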

Feature Importance
Feature importance in the performance of the RNN model was evaluated using SHapley Additive exPlanations (SHAP) [26]. This method explains the model outcome as a sum of values attributed to each input feature, allowing the SHAP value to be interpreted as feature importance. The gradient SHAP model was applied to calculate the SHAP value [26]. The sum of SHAP values was used in the case of time-varying features. For non-RNN models, LinearExplainer (logistic regression and SVM) and TreeExplainer (DT, RF, XGB, and LGM) were used [26].
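The additive property described above (the model output decomposes into per-feature contributions) can be seen in a toy example. The sketch below computes exact Shapley values by brute force against a fixed baseline; it is not the gradient SHAP approximation used in the study, and `toy_risk` is a hypothetical stand-in for a trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` set to x and the rest to baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return 0.5 * z[0] - 0.2 * z[1] + 0.1 * z[0] * z[2]
```

For any input, the attributions sum to `toy_risk(x) - toy_risk(baseline)`. Brute-force enumeration grows exponentially with the number of features, which is why approximations such as gradient SHAP are used for real models.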

Statistical Analysis
Categorical variables are expressed as proportions; continuous variables are expressed as means ± SD if they had a normal distribution and as medians with IQRs if they were non-normally distributed. Missing values of time-invariant features (4219 cases [28.5%] had at least 1 missing value) were imputed by the k-nearest neighbors imputer based on information in the training set [27]. If there were missing values in time-varying features (7031 cases [49.6%] had at least 1 missing value), masking was used during training of the RNN model. Model performance was evaluated in the test set using the area under the receiver operating characteristic curve (AUROC) and compared between models using the DeLong test. All P values were two-sided, and values less than 0.05 were defined as significant. Statistical analyses were performed using R software (version 4.0.2; The Comprehensive R Archive Network: http://cran.r-project.org) and Python (version 3.8.3; Python Software Foundation: http://www.python.org). TensorFlow 2.3.0 (Google Brain, Google Inc.) was used as a deep learning framework [28], and other machine learning algorithms were implemented with scikit-learn [29].
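The AUROC used for evaluation equals the probability that a randomly chosen CIAKI case receives a higher predicted risk than a randomly chosen non-CIAKI case, with ties counted as one half. A minimal sketch of that pairwise definition follows; it is for illustration only (the study used scikit-learn and R), and it does not compute confidence intervals or the DeLong test.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases.

    labels: iterable of 0/1 outcomes; scores: predicted risks, same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

This quadratic-time version is adequate for small examples; library implementations use sorting to reach the same value more efficiently.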

Baseline Characteristics
The mean age of cases was 67.5 (SD 11.1) years, and 22.8% (n=3233) were female. The median values of serum creatinine and eGFR were 1.4 mg/dL (IQR 1.3-1.7 mg/dL) and 47.1 mL/min/1.73 m² (IQR 38.9-56.1 mL/min/1.73 m²), respectively. The most common protocol was CT of the abdomen and pelvis (n=4360, 30.7%), followed by the liver (n=3323, 23.4%) and urogenital area (n=1330, 9.4%). Other baseline characteristics of the patients are presented in Table 1. The values of baseline characteristics did not differ between the training and test sets (Multimedia Appendix 4).

CIAKI and Long-Term Outcomes
Intravenous CIAKI occurred in 261 (1.8%) patients after CT scanning (1.8% in the training set and 2.0% in the test set). During the median follow-up period of 4 years (IQR 2-7 years), renal progression and all-cause mortality were identified in 3400 (24.0%) and 3762 (26.5%) patients, respectively. The CIAKI group had a higher risk of these outcomes compared with the non-CIAKI group (P<.001 for renal progression and P=.042 for all-cause mortality; see Multimedia Appendix 5).

Model Performance
When model performance was evaluated in the test set, the RNN model achieved the highest AUROC of 0.755 (95% confidence interval [CI] 0.708-0.802), followed by the RF (0.726 [95% CI 0.674-0.778]) and logistic regression (0.690 [95% CI 0.632-0.748]) (Table 2). The AUROC of the RNN model was greater than that obtained from other machine learning models (P<.05), except the RF, and the corresponding curves support these results (Figure 2). We further compared the performance of the RNN model with other published scoring models. Eight studies have developed models to predict intravenous CIAKI [5-12]. The flowchart of study selection and the associated study information are presented in Multimedia Appendix 6 and Table 3, respectively. Of these 8 models, 5 used specific features that were unavailable in the present data set, such as cystatin C [6-8,10], homocysteine [7], neutrophil gelatinase-associated lipocalin [10], β2-microglobulin [10], and urine output [9]. Accordingly, the 3 remaining models were compared with the RNN model: the Mehran score [30], which was originally developed for patients undergoing intra-arterial administration of contrast media during coronary angiography but was also applied to CT scanning in 1 study [11], and two logistic regression-based models that had not been tested in an independent data set [5,12]. The performance of these 3 models was lower than that of the RNN model, with the following AUROCs: 0.521 (P<.001) for the Mehran score and 0.539 (P<.001) and 0.645 (P=.022) for the other 2 logistic regression-based models.
k Included patients with both intravenous and intra-arterial administration of contrast media.

Sensitivity Analysis
For sensitivity analysis, another definition of CIAKI was used: an increase in serum creatinine of ≥0.5 mg/dL or ≥25% within 3 days [22]. The RNN model was the best model in predicting the risk of CIAKI, with an AUROC of 0.716 (95% CI 0.664-0.768), which was greater than that of most of the other machine learning models (Multimedia Appendix 7). The corresponding curves support these results (Multimedia Appendix 8).
Other machine learning models were trained after including 48 features (ie, 16 sets of serum creatinine, eGFR, and elapsed times) as independent features without temporal order. The results are summarized in Multimedia Appendix 9. Although these features were considered in the models, their performance was lower than that of the RNN model. Furthermore, the original pipeline was separated into 4 models (MLP alone, MLP plus merging, RNN alone, and RNN plus merging), and their performance was compared with that of the original pipeline (named the "default model"). The AUROC plots are presented in Multimedia Appendix 10. The deep learning model with the MLP module alone and the RNN module alone had AUROCs of 0.705 (95% CI 0.647-0.763) and 0.702 (95% CI 0.642-0.763), respectively. After adding the merging module to these models, the AUROCs were 0.710 (95% CI 0.653-0.768) in the MLP-plus-merging model and 0.675 (95% CI 0.610-0.740) in the RNN-plus-merging model. All these values were lower than the value from the original deep learning model.
To evaluate the effect of the model complexity on performance, we built other deep learning architectures, such as a simple model (ie, 1 less dense layer in the RNN module, MLP module, and merging module) and a complex model (ie, 1 more dense layer in the RNN module, MLP module, and merging module). The AUROCs were 0.751 (95% CI 0.702-0.801) and 0.734 (95% CI 0.678-0.791) in the simple and complex models, respectively. We also developed models with a single LSTM layer having a simpler RNN architecture (named "single model") and with two stacked bidirectional LSTM layers having a more complex RNN architecture (named "bidirectional model"). The single and bidirectional models had AUROCs of 0.746 (95% CI 0.696-0.795) and 0.717 (95% CI 0.656-0.777), respectively. The AUROC plots of these models compared to that of the original model (named "default model") are described in Multimedia Appendix 11.
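The sensitivity analysis in which non-RNN models received 48 untimed features (16 sets of serum creatinine, eGFR, and elapsed time) amounts to flattening each patient's history into a fixed-length vector. The sketch below shows one way this could be done; keeping the most recent 16 records, padding at the front, and returning a step mask are assumptions, and the function name is hypothetical.

```python
N_STEPS = 16  # number of time steps used for time-varying features


def flatten_history(history, n_steps=N_STEPS, pad_value=None):
    """Turn a variable-length history into 3 * n_steps flat features.

    history: list of (serum_creatinine, egfr, elapsed_days) tuples,
             ordered from oldest to most recent.
    Returns (flat_features, mask), where mask marks observed steps with 1.
    """
    recent = list(history[-n_steps:])           # keep the latest n_steps records
    n_missing = n_steps - len(recent)
    padded = [(pad_value, pad_value, pad_value)] * n_missing + recent
    mask = [0] * n_missing + [1] * len(recent)  # 0 = padded, 1 = observed
    flat = [value for step in padded for value in step]
    return flat, mask
```

The mask plays the role that masking has in the RNN model; for tabular models, padded entries would instead need imputation or a model that tolerates missing values.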

Principal Results
Intravenous CIAKI is a critical issue because it contributes to poor outcomes [31], as noted in its association with renal progression and increased mortality above. This study is the first to apply the RNN algorithm to predict intravenous CIAKI, achieving a greater AUROC than that obtained from other machine learning or conventional scoring models. These results indicate that the time-varying data of kidney function (ie, serum creatinine and eGFR) significantly contribute to the precise prediction of intravenous CIAKI. SHAP analysis demonstrated that feature importance could help clinicians understand how risk is estimated.
Because kidney function fluctuates over time, a single value of serum creatinine or eGFR may not perfectly represent the kidney function of patients. Certain attempts using time-varying kidney function by time-dependent Cox regression [32] and trajectory analysis [33] have improved the precise estimation of kidney function. Recently, deep learning with the RNN model showed favorable performance in predicting acute kidney injury [15], implying the additive benefit of time-varying kidney function to the model performance. Patients with comorbidities, including cancer, diabetes mellitus, and chronic kidney disease, are recommended for frequent follow-up of their kidney function because these data can be used to better predict the trend of kidney function than a single estimation. In this regard, the present RNN model achieved the highest performance in predicting intravenous CIAKI with time-varying features.
Deep learning architecture is complex and inherently difficult to interpret and is therefore referred to as a black box. To overcome this limitation, this study applied SHAP to concretely explain the model output. Using SHAP values, clinicians can comprehend how the risk probability is explained by the results of various features and decide whether the model output is plausible. If the model prediction seems to be imprecise, as in the lower case in Figure 3B, the SHAP values in features highly relevant to the model performance provide room for reconsideration.

Limitations
Despite these informative results, there are limitations to be discussed. The study design was retrospective and needs to be validated in future independent cohorts. Unidentified factors, such as urine output and heart function, may provide additional information about the risk of CIAKI, but the present data set included most clinically used features. The prophylaxis protocol may differ between centers, and thus, the present RNN model may need to be adjusted when applied externally.

Conclusions
Application of a deep learning algorithm improves the predictability of intravenous CIAKI, and our model performs better than other machine learning and conventional scoring models. These results may be attributable to the consideration of time-varying kidney function, in addition to time-invariant features, and corresponding SHAP values may maximize the utility of the model in clinics. If proactive management of intravenous CIAKI is possible via precise prediction, overall patient outcomes will improve. The study results represent the basis of this goal.

Multimedia Appendix 5
Kaplan-Meier curves of renal survival (A) and patient survival (B) according to intravenous CIAKI. CIAKI: contrast media-induced acute kidney injury.

Multimedia Appendix 6
Flowchart of the selection of studies on models predicting intravenous CIAKI. CIAKI: contrast media-induced acute kidney injury.

Multimedia Appendix 7
Table of AUROCs for predicting intravenous CIAKI, which was defined as an increase in serum creatinine ≥0.5 mg/dL or ≥25% within 3 days. AUROC: area under the receiver operating characteristic curve; CIAKI: contrast media-induced acute kidney injury.

a CIAKI: contrast media-induced acute kidney injury. b AUROC: area under the receiver operating characteristic curve. c CT: computed tomography. d N/A: not available. e CAG: coronary angiography. f sCr: serum creatinine. g CysC: cystatin C. h eGFR: estimated glomerular filtration rate. i Used the Mehran risk score. j RIFLE: Risk, Injury, Failure, Loss of kidney function, and End-stage kidney disease classification.

a CIAKI: contrast media-induced acute kidney injury. b P values were derived from the chi-square tests for categorical variables and the Student t-test or the Mann-Whitney U test for continuous variables. c CT: computed tomography. d N/A: not applicable. e eGFR: estimated glomerular filtration rate.
JMIR Med Inform 2021 | vol. 9 | iss. 10 | e27177 | https://medinform.jmir.org/2021/10/e27177

Table 2. AUROC of machine learning models in predicting intravenous CIAKI.

Table 3. Previous studies predicting intravenous CIAKI.