Diagnostic Model of In-Hospital Mortality in Patients with Acute ST-Segment Elevation Myocardial Infarction Using Artificial Intelligence Methods: Algorithm Development and Validation

Background: Preventing in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI) is a crucial step. Objectives: The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods. Methods: Because our datasets were highly imbalanced, we evaluated the effect of down-sampling. Down-sampling was applied to the original dataset to create one balanced dataset, which ultimately yielded two datasets: original and down-sampled. We divided one American population non-randomly into a training set and a test set, and used another American population as the validation set. We used artificial intelligence methods, including logistic regression, decision tree, extreme gradient boosting (XGBoost), K-nearest neighbor classification, and multi-layer perceptron, to develop and externally validate the diagnostic model. We used the confusion matrix combined with the area under the receiver operating characteristic curve (AUC) to evaluate the pros and cons of these models.


Introduction
In the United States, an estimated 605,000 acute myocardial infarction (AMI) events occur each year. [1] In Europe, the in-hospital mortality of patients with ST-segment elevation myocardial infarction (STEMI) is between 4% and 12%. [2] Coronary heart disease, including STEMI, remains the main cause of death. [1] Preventing in-hospital mortality in STEMI is a crucial step, and a tool is needed to help identify early those patients at increased risk of in-hospital death.
The Global Registry of Acute Coronary Events (GRACE) risk score can be accessed via mobile devices, so it has enjoyed a high reputation among users. The Thrombolysis in Myocardial Infarction (TIMI) risk score predicts 30-day mortality at presentation in fibrinolytic-eligible patients with STEMI. [3] The ACTION (Acute Coronary Treatment and Intervention Outcomes Network) score [4] was established in 2011 as a model for predicting in-hospital mortality, using 65,668 AMI patients for development and 16,336 AMI patients for validation. The ACTION model updated in 2016 used more patients and added cardiac arrest as a risk factor. [5] Xiang Li used machine learning to build a prediction model of in-hospital mortality for STEMI patients. [6] Kwon JM used deep learning to establish a prediction model of in-hospital mortality in STEMI patients, which performed better than the GRACE and TIMI scores. [7]
The current prediction models have the following problems. The data set of in-hospital mortality is not sufficiently recognized as unbalanced data. The unbalanced data are not converted into balanced data. No confusion matrix is reported, and evaluating the prediction model with the area under the receiver operating characteristic curve (AUC) or C statistic alone is not comprehensive.

Methods
Data from the National (Nationwide) Inpatient Sample (NIS) were used for this study. Inclusion criteria: 1. all hospitalized STEMI patients; 2. all STEMI patients over 18 years of age. Exclusion criteria: none. This was a retrospective analysis, and informed consent was waived by the Ethics Committee of Beijing Anzhen Hospital, Capital Medical University. The outcome of interest was in-hospital mortality, defined as cardiogenic or non-cardiogenic death during hospitalization. The presence or absence of in-hospital mortality was determined from the medical record, blinded to the predictor variables.
We selected 8 predictors based on clinical relevance and baseline descriptive statistics. The candidate variables were age, female sex, cardiogenic shock, atrial fibrillation (AF), ventricular fibrillation (VF), in-hospital bleeding, and medical history (hypertension and old myocardial infarction). All of them were based on the medical record and blinded to the predictor variables. AF was defined as any type of AF during hospitalization. In-hospital bleeding was defined as any type of bleeding during hospitalization.
In the development dataset, 5,163 of 44,975 hospitalized patients (11.5%) experienced in-hospital mortality, which made the dataset imbalanced. We evaluated the effect of down-sampling, a common sampling method. Down-sampling was applied to the original dataset to create one balanced dataset: we randomly selected 13 percent of the surviving patients as the control group. This ultimately yielded two datasets: original and down-sampled.
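The down-sampling step described above can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `down_sample`, the `died` field, and the fixed seed are assumptions, and the records are synthetic placeholders that reproduce only the class counts reported for the training set.

```python
import random


def down_sample(records, label_key="died", fraction=0.13, seed=0):
    """Keep every death and a random `fraction` of the survivors.

    `label_key`, `fraction`, and `seed` are illustrative names chosen
    for this sketch; they do not come from the paper.
    """
    deaths = [r for r in records if r[label_key] == 1]
    survivors = [r for r in records if r[label_key] == 0]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    n_keep = round(len(survivors) * fraction)
    controls = rng.sample(survivors, n_keep)  # sampling without replacement
    balanced = deaths + controls
    rng.shuffle(balanced)  # avoid an ordering that separates the classes
    return balanced


# Synthetic stand-in: 5,163 deaths among 44,975 patients, as reported.
records = [{"died": 1}] * 5163 + [{"died": 0}] * (44975 - 5163)
balanced = down_sample(records)
n_deaths = sum(r["died"] for r in balanced)
n_controls = len(balanced) - n_deaths
print(n_deaths, n_controls)
```

With these counts, 13 percent of the 39,812 survivors is about 5,176 patients, which is close to the 5,163 deaths, so the two classes end up nearly balanced.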
To ensure the reliability of the data, we excluded patients who had missing information on predictors. We kept all continuous data continuous and on the original scale. Discrimination is the ability of the diagnostic model to differentiate between patients with and without in-hospital mortality. This measure was quantified by calculating the AUC. [8]
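As an illustration of what the AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `auc` and the five example scores are made up for this sketch; the pairwise loop is written for clarity and would be slow on data sets of the size used here.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for two deaths (1) and three survivors (0).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

In this toy example five of the six death-survivor pairs are ordered correctly, so the AUC is 5/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.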
In addition to the F1 score, the F2 and F0.5 scores are also widely used in statistics. The F2 score weights recall higher than precision, whereas the F0.5 score weights precision higher than recall. For mortality in STEMI patients, recall matters more than precision, so we used the F2 score combined with the AUC to evaluate the pros and cons of the above models.
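The F-beta family can be computed directly from the confusion matrix counts, using the standard definition F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP). The sketch below is illustrative: the function name `f_beta` and the example counts are assumptions and are not taken from the study's tables.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion matrix counts.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    Returns 0.0 when there are no true positives, false positives,
    or false negatives at all.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Made-up counts: precision = 80/120, recall = 80/100.
tp, fp, fn = 80, 40, 20
f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)
f05 = f_beta(tp, fp, fn, beta=0.5)
print(round(f05, 3), round(f1, 3), round(f2, 3))
```

Because recall exceeds precision in this example, the F2 score is the highest of the three and the F0.5 score the lowest, which shows how the choice of beta shifts the emphasis toward missed deaths (false negatives).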

Results
The study was approved by the ethics committee on October 25, 2019. Data collection started on November 6, 2019. As of submission of the manuscript, 129,021 people had been recruited for the study.
In the training dataset, 5,163 of 44,975 hospitalized patients (11.5%) experienced in-hospital mortality. The patients' baseline characteristics in the original and down-sampled datasets are shown in Tables 1 and 2. In the test set, 4,893 of 43,562 hospitalized patients (11.2%) experienced in-hospital mortality.

Discussion
Mortality in STEMI patients is affected by many factors, including advanced age and Killip class. [9] In this study, we investigated the predisposing factors of in-hospital mortality in patients with acute STEMI. Age, female sex, ventricular fibrillation, atrial fibrillation, cardiogenic shock, in-hospital bleeding, and medical history such as hypertension and old myocardial infarction were significant independent predictors of in-hospital mortality.
The F2 scores of logistic regression in the training set, the test set, and the validation set were 0.7, 0.7, and 0.54, respectively. The AUCs of logistic regression in the training set, the test set, and the validation set were 0.72, 0.73, and 0.76, respectively. The diagnostic model built by logistic regression was the best, so we used it.
Advanced age has been reported to be an independent risk factor for in-hospital mortality. Elderly patients had a higher risk of mechanical complications. [9][15][16][17][18][19][20][21][22] Cardiogenic shock is common and highly morbid. It is a clinical condition defined as the inability of the heart, generally as a result of impairment of its pumping function, to deliver an adequate amount of blood to the tissues to meet resting metabolic demands. [13][23, 24] Mounting evidence mandates a more nuanced view of cardiogenic shock that factors in the complex interactions between the ventricles and the systemic or pulmonary vasculature, the interdependence between the left and right ventricles, and the molecular and inflammatory milieu that often accompanies cardiogenic shock. [25]
Granger CB et al. observed that age, Killip class, systolic blood pressure, ST-segment deviation, cardiac arrest during presentation, serum creatinine level, positive initial cardiac enzyme findings, and heart rate were independent predictors of in-hospital mortality among 11,389 patients in GRACE. [26] Karen S. Pieper et al. generated the updated GRACE risk model and a nomogram. [27] The GRACE risk model has since been upgraded again [28] and simplified. [29] The TIMI risk score predicts 30-day mortality at presentation in fibrinolytic-eligible patients with STEMI. [3] C-ACS [30] is a simple four-variable score developed to enable risk stratification at first medical contact.
The Acute Coronary Treatment and Intervention Outcomes Network (ACTION) score [4] used 65,668 patients to develop and 16,336 patients to validate a model to predict in-hospital mortality. The ACTION model updated in 2016 used more patients (243,440) and added cardiac arrest as a risk factor. [5] This was a form of internal validation because the cohorts were randomly created. [8] So far, clinicians and researchers usually use GRACE or TIMI scores to guide treatment decisions.
Our diagnostic model of in-hospital mortality builds upon these studies in several ways. We converted the unbalanced data into balanced data. We used the confusion matrix combined with the AUC to evaluate the pros and cons of the above models. The model can be easily calculated at patient presentation. It includes only baseline factors, namely age, female sex, ventricular fibrillation, atrial fibrillation, cardiogenic shock, in-hospital bleeding, and medical history such as hypertension and old myocardial infarction.
Our study has several important limitations, including its retrospective nature. The F2 scores and AUCs of logistic regression in the training set, the test set, and the validation set were modest.

Conclusion
The strongest predictors of in-hospital mortality were age, female sex, cardiogenic shock, AF, VF, in-hospital bleeding, and medical history such as hypertension and old myocardial infarction.
Traditional statistical methods have difficulty dealing with the above problems; artificial intelligence methods are needed. The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods.
Methods: The training dataset comprised 44,975 patients with acute STEMI hospitalized from January 2016 to December 2016 in the United States. The test dataset comprised 43,562 hospitalized patients with acute STEMI from January 2017 to December 2017 in the United States. The validation dataset comprised 40,484 hospitalized patients with acute STEMI from January 2018 to December 2018 in the United States. The data came from the National (Nationwide) Inpatient Sample (NIS).

Eight variables (age, female sex, history of old myocardial infarction, history of hypertension, cardiogenic shock, VF, in-hospital bleeding, and AF) differed significantly between patients with and without in-hospital mortality (Tables 1 and 2).

Table 1.
Demographic and clinical characteristics of patients with and without in-hospital mortality in the training data set (original). a AF: atrial fibrillation. b VF: ventricular fibrillation.

Table 2.
Demographic and clinical characteristics of patients with and without in-hospital mortality in the training data set (down-sampling). a AF: atrial fibrillation. b VF: ventricular fibrillation.

Table 3.
Demographic and clinical characteristics of patients with and without in-hospital mortality in the test data set. a AF: atrial fibrillation. b VF: ventricular fibrillation.

In the validation data set, 4,001 of 40,484 hospitalized patients (9.9%) experienced in-hospital mortality. The baseline characteristics of the patients are shown in Table 4.

Table 4.
Demographic and clinical characteristics of patients with and without in-hospital mortality in the validation data set. a AF: atrial fibrillation. b VF: ventricular fibrillation.

Table 6.
Confusion matrix and AUC (original to original). a TP: true positive. b FN: false negative. c FP: false positive. d TN: true negative. e AUC: area under the receiver operating characteristic curve.

Table 7.
Confusion matrix and AUC (down-sampling to original).

Comparing the F2 scores and AUCs in Table 6 and Table 7 shows that the diagnostic models built on the down-sampled dataset are better than those built on the original dataset. Comparing the F2 scores and AUCs in Table 7 shows that the diagnostic model built by logistic regression is better than those built by decision tree, XGBoost, and K-nearest neighbor. The diagnostic model built by logistic regression is as good as the one built by multi-layer perceptron, but it is simpler. So we use the diagnostic model built by logistic regression("modellog.m).