This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
Although there is a growing interest in prediction models based on electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited.
We aimed to develop and validate machine learning (ML) models by using diverse fields of EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery.
EMR data of 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (eg, demographics), dynamic time-series (eg, laboratory values), and cardiac-specific data (eg, coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment.
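The 3:1:1 random split and the 5-fold partitioning mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the fixed seed, and the choice to split at the patient level are assumptions for illustration, not details reported by the study.

```python
import random

def split_3_1_1(n_patients, seed=0):
    """Randomly assign patient indices to training, tuning, and testing sets (3:1:1)."""
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)
    n_train = (3 * n_patients) // 5
    n_tune = n_patients // 5
    train = idx[:n_train]
    tune = idx[n_train:n_train + n_tune]
    test = idx[n_train + n_tune:]  # remainder goes to the testing set
    return train, tune, test

def k_fold_indices(indices, k=5, seed=0):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out

# Cohort size taken from the text; everything else is illustrative.
train, tune, test = split_3_1_1(16793)
print(len(train), len(tune), len(test))
```

Splitting by patient rather than by record keeps all records of one patient in the same partition, which avoids leakage between sets; whether the study did this is not stated here.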
GBM showed the best performance with an area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and the AUPRCs of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as well-fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. Among the categories in the GBM model, time-series dynamic data demonstrated a high AUROC of >0.95 and contributed most to the excellent results.
Exploiting the diverse fields of the EMR data set, the ML-based 30-day adverse cardiac event prediction models demonstrated outstanding results, and the applied framework could be generalized for various health care prediction models.
Cardiovascular disease is the leading cause of mortality throughout the world and is associated with various morbidities [
With the increasing availability of large volumes of electronic medical record (EMR) data, there has been growing interest in using data-driven approaches to construct efficient tools for risk prediction [
The data for this study were obtained from Asan Medical Center, which provides quaternary medical care for people in South Korea. It has 55 departments, approximately 2700 beds, and >8000 employees; it sees approximately 3,000,000 outpatient clinic visits and 900,000 admissions per year. The Asan biomedical research environment is the data warehouse system of Asan Medical Center; it holds deidentified information on 4 million patients and is updated every 3 days [
For external validation, we used data obtained from the EMRs of Ulsan University Hospital, which is a tertiary hospital with approximately 900 beds that caters to a metropolitan city and its surrounding suburban area in the southern region of South Korea. The patients’ demographics, medical practice, and operating systems differ between the 2 hospitals, which would allow evaluation of the model in a different population.
The overall process for building the EMR-based database is presented in Figure S1 of
Study diagram. Database, machine learning, and validation. AMC: Asan Medical Center; CABG: coronary artery bypass grafting; EMR: electronic medical record; ML: machine learning; PCI: percutaneous coronary intervention.
A cohort of 16,793 patients who had undergone PCI (n=12,519) or CABG (n=4274) between January 1, 2006, and November 30, 2016, was identified in the Asan heart registry. As the majority of patients underwent the index PCI or CABG within 1 year after their first data were generated in the EMR, we uniformly used the data accumulated over the 1 year prior to the index procedure for the entire population. The total number of independent records in the data set was 5,184,565, derived from 3364 features.
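The 1-year lookback before the index procedure can be sketched as a simple date filter. This is a minimal illustration under stated assumptions: the record layout, the field names, the 365-day window length, and the decision to include records dated on the index day itself are all hypothetical and not taken from the study.

```python
from datetime import date, timedelta

def records_in_lookback(records, index_date, lookback_days=365):
    """Keep records dated within the lookback window, up to and including the index date."""
    start = index_date - timedelta(days=lookback_days)
    return [r for r in records if start <= r["date"] <= index_date]

# Hypothetical records for one patient; names and values are illustrative only.
records = [
    {"date": date(2014, 1, 10), "feature": "hemoglobin", "value": 13.1},  # older than 1 year
    {"date": date(2015, 3, 2), "feature": "hemoglobin", "value": 12.4},
    {"date": date(2015, 6, 1), "feature": "creatinine", "value": 1.1},
    {"date": date(2015, 6, 20), "feature": "hemoglobin", "value": 11.0},  # after the index date
]
index_date = date(2015, 6, 15)

kept = records_in_lookback(records, index_date)
print([r["date"].isoformat() for r in kept])
```

Dropping everything dated after the index procedure prevents information from the outcome period from leaking into the predictors.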
An example case incorporating serial and various electronic medical record data to predict adverse events. BP: blood pressure; BSA: body surface area; BUN: blood urea nitrogen; CAG: coronary angiography; CK-MB: creatine kinase myocardial band; Dia: diameter; EDD: end diastolic dimension; EF: ejection fraction; EKG: electrocardiogram; ESD: end systolic dimension; FFR: fractional flow reserve; GLS: global longitudinal strain; Hb: hemoglobin; HR: heart rate; LDL: low-density lipoprotein; Leng: length; Lp(a): lipoprotein(a); LV: left ventricle; PCI: percutaneous coronary intervention; pLAD: proximal left anterior descending; Pr: pressure; RR: respiratory rate.
We only used data generated until index PCI or CABG, whereas data obtained after the index procedure were excluded for developing ML algorithms (see
The hyperparameters for each model were determined using an empirical search and 5-fold cross-validation on the study population to determine the values that had the best performance (see
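Hyperparameters and their values for each model.
Model | Hyperparameter | Value |
Logistic regression | Solver | liblinear |
Logistic regression | Maximal iteration | 100 |
Random forest | Number of estimators | 100 |
Random forest | Maximal depth | 10 |
Gradient boosting machine | Objective | binary |
Gradient boosting machine | Estimators | 150 |
Gradient boosting machine | Boosting type | Gradient boosting decision tree |
Gradient boosting machine | Number of leaves | 15 |
Gradient boosting machine | Maximal depth | -1 (no limit) |
Gradient boosting machine | Learning rate | 0.025 |
Gradient boosting machine | Minimal number of data in child | 90 |
Feedforward neural network | Learning rate | 0.0002 |
Feedforward neural network | Hidden layer units | (64,64) |
Feedforward neural network | Batch size | 64 |
Feedforward neural network | Epoch | 40 |
Feedforward neural network | Dropout rate | 0.5 |
Feedforward neural network | Optimizer | Adam (beta1=.5, beta2=.999) |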
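For readers who want to set up a comparable experiment, the hyperparameters in the table can be transcribed into a plain configuration mapping. This is a sketch only: the key names follow the parameter names used by scikit-learn and LightGBM as an assumption, since this section does not name the libraries used, and the neural network keys are purely illustrative. The grouping of parameters under each model is inferred from the parameters themselves.

```python
# Hyperparameter values copied from the table; key names are assumed, not from the source.
HYPERPARAMETERS = {
    "logistic_regression": {
        "solver": "liblinear",
        "max_iter": 100,
    },
    "random_forest": {
        "n_estimators": 100,
        "max_depth": 10,
    },
    "gradient_boosting_machine": {
        "objective": "binary",
        "n_estimators": 150,
        "boosting_type": "gbdt",    # gradient boosting decision tree
        "num_leaves": 15,
        "max_depth": -1,            # -1 means no limit
        "learning_rate": 0.025,
        "min_child_samples": 90,    # minimal number of data in child
    },
    "feedforward_neural_network": {
        "learning_rate": 0.0002,
        "hidden_layer_units": (64, 64),
        "batch_size": 64,
        "epochs": 40,
        "dropout_rate": 0.5,
        "optimizer": {"name": "adam", "beta1": 0.5, "beta2": 0.999},
    },
}

def describe(config):
    """Return one 'model: key=value, ...' line per model for quick inspection."""
    return [
        f"{model}: " + ", ".join(f"{k}={v}" for k, v in params.items())
        for model, params in config.items()
    ]

for line in describe(HYPERPARAMETERS):
    print(line)
```

Keeping the values in one mapping makes it easy to log the exact configuration alongside each cross-validation run.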
The descriptive characteristics of the study population are provided as number (%) and mean (SD) for categorical and continuous variables, respectively. The discrimination performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, we evaluated model calibration (ie, the model’s ability to accurately predict the observed absolute risk) by using the Brier score, where 0 would indicate perfect calibration, and generated the calibration plots. A 2-sided
The baseline characteristics of the population in the development and internal validation groups are listed in
Baseline clinical characteristics of the development and internal validation set.
Characteristics | Total population (N=16,793) | Percutaneous coronary intervention (n=12,519) | Coronary artery bypass grafting surgery (n=4274) |
Age (years), mean (SD) | 62.7 (10.2) | 62.2 (10.5) | 64.1 (9.4) |
Male sex, n (%) | 12,465 (74.2) | 9312 (74.4) | 3153 (73.8) |
Body mass index (kg/m2), mean (SD) | 24.9 (3.1) | 25.0 (3.0) | 24.6 (3.1) |
Hypertension, n (%) | 10,697 (63.7) | 7758 (62) | 2939 (68.8) |
Diabetes mellitus, n (%) | 6084 (36.2) | 4127 (33) | 1957 (45.8) |
Hyperlipidemia, n (%) | 9200 (54.8) | 6932 (55.4) | 2268 (53.1) |
Current cigarette smoker, n (%) | 3009 (17.9) | 2424 (19.4) | 585 (13.7) |
Prior myocardial infarction, n (%) | 568 (3.4) | 394 (3.1) | 174 (4.1) |
Previous cerebrovascular accident, n (%) | 596 (3.5) | 420 (3.4) | 176 (4.1) |
History of congestive heart failure, n (%) | 243 (1.4) | 132 (1.1) | 111 (2.6) |
Peripheral vascular disease, n (%) | 278 (1.7) | 199 (1.6) | 79 (1.8) |
Valvular heart disease, n (%) | 387 (2.3) | 106 (0.8) | 281 (6.6) |
Chronic renal insufficiency, n (%) | 566 (3.4) | 363 (2.9) | 203 (4.7) |
Chronic lung disease, n (%) | 386 (2.3) | 306 (2.4) | 80 (1.9) |
Chronic liver disease, n (%) | 487 (2.9) | 396 (3.2) | 91 (2.1) |
History of malignancy, n (%) | 1019 (6.1) | 816 (6.5) | 203 (4.7) |
Presentation with acute myocardial infarction, n (%) | 3032 (18.1) | 2509 (20) | 523 (12.2) |
Admission via emergency department, n (%) | 5054 (30.1) | 3941 (31.5) | 1113 (26) |
Admission via outpatient clinics, n (%) | 11,739 (69.9) | 8578 (68.5) | 3161 (74) |
Baseline clinical characteristics of the external validation set.
Characteristics | Total population (n=4159) | Percutaneous coronary intervention (n=3950) | Coronary artery bypass grafting surgery (n=209) |
Age (years), mean (SD) | 61.7 (10.9) | 61.6 (9.4) | 62.7 (10.9) |
Male sex, n (%) | 2913 (70) | 2779 (70.3) | 134 (64.1) |
Body mass index (kg/m2), mean (SD) | 24.0 (5.4) | 24.0 (5.2) | 23.8 (6.4) |
Hypertension, n (%) | 1947 (46.8) | 1851 (46.8) | 96 (45.9) |
Diabetes mellitus, n (%) | 1278 (30.7) | 1195 (30.2) | 83 (39.7) |
Hyperlipidemia, n (%) | 1154 (27.7) | 1098 (27.7) | 56 (26.7) |
Current cigarette smoker, n (%) | 1285 (30.9) | 1234 (31.2) | 51 (24.4) |
Prior myocardial infarction, n (%) | 280 (6.7) | 265 (6.7) | 15 (7.1) |
Previous cerebrovascular accident, n (%) | 233 (5.6) | 220 (5.5) | 13 (6.2) |
History of congestive heart failure, n (%) | 76 (1.8) | 71 (1.7) | 5 (2.3) |
Peripheral vascular disease, n (%) | 49 (1.1) | 45 (1.1) | 4 (1.9) |
Valvular heart disease, n (%) | 27 (0.6) | 18 (0.4) | 9 (4.3) |
Chronic renal insufficiency, n (%) | 130 (3.1) | 123 (3.1) | 7 (3.3) |
Chronic lung disease, n (%) | 146 (3.5) | 143 (3.6) | 3 (1.4) |
Chronic liver disease, n (%) | 201 (4.8) | 193 (4.8) | 8 (3.8) |
History of malignancy, n (%) | 192 (4.6) | 183 (4.6) | 9 (4.3) |
Presentation with acute myocardial infarction, n (%) | 1357 (32.6) | 1314 (33.2) | 43 (20.5) |
Admission via emergency department, n (%) | 1706 (41) | 1634 (41.3) | 72 (34.4) |
Admission via outpatient clinics, n (%) | 2453 (58.9) | 2316 (58.6) | 137 (65.5) |
Five-fold cross-validation of the performance of each machine learning model in predicting 30-day mortality after invasive treatment. A. Area under the receiver operating characteristic curve, B. Area under the precision-recall curve, and C. Calibration plot with Brier score.
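The two discrimination metrics shown in panels A and B can be computed from first principles. The sketch below uses the pairwise definition of AUROC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) and average precision as a step-wise estimate of AUPRC. The choice of average precision as the AUPRC estimator is an assumption, the labels and scores are made up for illustration, and the average precision function assumes distinct scores.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recalled positive
    return total / n_pos

# Made-up labels (1 = event) and predicted risks, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.05, 0.10, 0.80, 0.20, 0.60, 0.30, 0.70, 0.90]

print("AUROC:", round(auroc(y_true, y_score), 3))
print("AUPRC:", round(average_precision(y_true, y_score), 3))
print("Event rate:", sum(y_true) / len(y_true))
```

A random classifier scores about 0.5 on AUROC but only about the event rate on AUPRC, which is why AUPRC values are typically much lower than AUROC values when the outcome is rare. The pairwise loop is quadratic in the number of cases and is meant for clarity, not for large cohorts.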
On external validation using the data set of Ulsan University Hospital, maximal predictive performance was observed with GBM (AUROC 0.90, 95% CI 0.86-0.95;
External validation of the performance of each machine learning model in predicting 30-day mortality after invasive treatment. A. Area under the receiver operating characteristic curve, B. Area under the precision-recall curve, and C. Calibration plot with Brier score.
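The calibration measures in panel C can be sketched in a few lines. The Brier score is the mean squared difference between predicted risk and observed outcome; a calibration plot compares mean predicted risk with the observed event rate within bins of predicted risk. The use of equal-width bins, the number of bins, and the example data are assumptions for illustration; the binning used in the study is not described in this section.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        k = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 falls in the last bin
        bins[k].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            obs_rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, obs_rate, len(members)))
    return rows

# Made-up labels (1 = event) and predicted risks, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.05, 0.10, 0.80, 0.20, 0.60, 0.30, 0.70, 0.90]

print("Brier score:", round(brier_score(y_true, y_prob), 4))
for mean_pred, obs_rate, count in calibration_bins(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={count}")
```

Points close to the diagonal (predicted equal to observed) indicate good calibration. With a rare outcome, a low Brier score alone can be misleading because predicting a low risk for everyone already yields a small squared error, so it is best read together with the plot.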
Prediction performance of the gradient boosting machine model assessed by the area under the receiver operating characteristic curve. A. Each data category, B. Combination of data categories. AUROC: area under the receiver operating characteristic curve.
The performance of the ML models for predicting 30-day MACE is demonstrated in
Performance of machine learning models for predicting major adverse cardiac events.
Model | Area under the receiver operating characteristic curve | 95% CI | P value | Area under the precision-recall curve | Brier score |
Logistic regression | 0.83 | 0.82-0.88 | <.001 | 0.37 | 0.06 |
Random forest | 0.85 | 0.83-0.88 | <.001 | 0.39 | 0.06 |
Gradient boosting machine | 0.88 | 0.85-0.90 | <.001 | 0.50 | 0.05 |
Feedforward neural network | 0.85 | 0.83-0.88 | <.001 | 0.41 | 0.06 |
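Confidence intervals such as those in the table are often obtained by resampling. The sketch below shows a percentile bootstrap for the AUROC as one common option. This is an assumption: this section does not state how the reported intervals were derived, and the data, the number of resamples, and the seed are illustrative.

```python
import random

def auroc(y_true, y_score):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling cases with replacement."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [y_score[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up labels (1 = event) and predicted risks, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_score = [0.05, 0.10, 0.80, 0.20, 0.60, 0.30, 0.70, 0.90, 0.15, 0.40, 0.35, 0.25]

lower, upper = bootstrap_auroc_ci(y_true, y_score, n_boot=500)
print("AUROC:", round(auroc(y_true, y_score), 3))
print("95% CI:", round(lower, 3), "to", round(upper, 3))
```

With few events the interval is wide, which is one reason to report it next to the point estimate when comparing models whose AUROCs differ by only a few hundredths.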
The rank of important variables in the models for predicting 30-day mortality is presented in
Top 10 important variables of each machine learning model.
Rank | Logistic regression | Random forest | Gradient boosting machine | Feedforward neural network |
1 | Systolic blood pressure | Serum aspartate aminotransferase | Serum protein | Serum phosphorus |
2 | Diastolic blood pressure | PaCO2 | Age | PaCO2 |
3 | Respiratory rate | Arterial pH | Serum phosphorus | Hemoglobin |
4 | PaCO2 | PaO2 | Systolic blood pressure | Systolic blood pressure |
5 | Arterial pH | Serum alanine aminotransferase | Platelet | Normal sinus rhythm in electrocardiogram |
6 | PaO2 | Total bilirubin | Serum aspartate aminotransferase | Estimated glomerular filtration rate |
7 | Aspartate aminotransferase | Creatine kinase-myocardial band | PaO2 | Serum glucose |
8 | Pulse rate | White blood cell | Serum albumin | Platelet |
9 | Blood urea nitrogen | Serum sodium | Pulse rate | PaO2 |
10 | Serum phosphorus | Platelet | Activated partial thromboplastin time | Arterial pH |
This was a retrospective study that applied ML to structured and unstructured patient data from the EMR of a large quaternary hospital to develop a risk prediction model for 30-day adverse cardiac events in patients who underwent PCI or CABG. We compared the performance of several models; all models demonstrated outstanding results, with AUROCs above 0.90 and excellent calibration. On external validation, the performance in predicting 30-day mortality decreased; however, it remained favorable. Dynamic time-series data, including laboratory values, vital signs, and medications, showed the best performance among the data categories and were the main contributor to the outstanding performance of the models.
Traditional risk prediction models are derived from a small set of risk factors selected for their significant univariate relationship with the end point on LR, which might degrade predictive performance. Moreover, it is difficult to incorporate new and more discriminatory risk factors into traditional models, which limits their extensibility [
In this study, we found that the algorithms developed from a large single-center EMR database were reliable for use in the population of a different hospital, albeit with relatively lower performance. Of note, different hospitals serve dissimilar patient populations and have divergent clinical practice patterns; therefore, the EMR data reflecting real-world clinical practice in each hospital have their own distinct characteristics. Hence, somewhat lower performance of the proposed prediction models in a different cohort can be anticipated. Ideally, a model that achieves the highest possible level of generalizability is desirable. However, there have been concerns about whether a model developed at 1 center can be applied to another center [
Predictive models with EMR data frequently rely on structured data. However, given the volume and richness of data available in unstructured clinical notes or reports, ML models might benefit from leveraging text mining tools to enhance the model [
Several limitations of this study should be noted. First, the cardiovascular event rates, including mortality, might be underestimated because events were captured only from a single-center EMR database. Linking the database with national claims or health insurance data might capture events more accurately. Second, although ease of interpretation is vital for evaluation of the models [
Exploiting the diverse parameters of EMR data sets, we developed and validated ML models for predicting the 30-day mortality risk following PCI or CABG. The ML algorithms showed excellent performance, and the applied framework can be generalized for various health care prediction models. This study suggests that ML using a real-world clinical data set can provide a useful approach for developing risk prediction models. Future studies are warranted to establish the clinical effectiveness of this approach and its real-time application at the point of care.
Supplementary figures and tables.
Time-series analysis.
AUPRC: area under the precision-recall curve
AUROC: area under the receiver operating characteristic curve
CABG: coronary artery bypass grafting
EMR: electronic medical record
FNN: feedforward neural network
GBM: gradient boosting machine
LR: logistic regression
MACE: major adverse cardiac events
ML: machine learning
PCI: percutaneous coronary intervention
RF: random forest
SPECT: single-photon emission computed tomography
This work was supported by the Institute for Information and Communications Technology Promotion grant funded by the Korean government (Ministry of Science and Information Communications Technology; 2018-0-00861, Intelligent Software Technology Development for Medical Data Analysis) and the Korea Medical Device Development Fund grant funded by the Korean government (Ministry of Science and Information Communications Technology, Ministry of Trade, Industry and Energy, Ministry of Health and Welfare, Ministry of Food and Drug Safety; Project: KMDF_PR_20200901_0097).
None declared.