A Deep-Learning Algorithm (ECG12Net) for Detecting Hypokalemia and Hyperkalemia by Electrocardiography: Algorithm Development

Background: The detection of dyskalemias (hypokalemia and hyperkalemia) currently depends on laboratory tests. Since cardiac tissue is very sensitive to dyskalemia, electrocardiography (ECG) may be able to uncover clinically important dyskalemias before laboratory results are available.
Objective: Our study aimed to develop a deep-learning model, ECG12Net, to detect dyskalemias based on ECG presentations and to evaluate the logic and performance of this model.
Methods: Spanning from May 2011 to December 2016, 66,321 ECG records with corresponding serum potassium (K+) concentrations were obtained from 40,180 patients admitted to the emergency department. ECG12Net is an 82-layer convolutional neural network that estimates serum K+ concentration. Six clinicians (three emergency physicians and three cardiologists) participated in a human-machine competition. Sensitivity, specificity, and balance accuracy were used to compare the performance of ECG12Net with that of these physicians.
Results: In a human-machine competition including 300 ECGs of different serum K+ concentrations, the area under the curve for detecting hypokalemia and hyperkalemia with ECG12Net was 0.926 and 0.958, respectively, which was significantly better than that of our best clinicians. Moreover, in detecting hypokalemia and hyperkalemia, the sensitivities were 96.7% and 83.3%,
Lin et al, JMIR Med Inform 2020 | vol. 8 | iss. 3 | e15931 | https://medinform.jmir.org/2020/3/e15931


Introduction
Dyskalemias (hyperkalemia and hypokalemia) are common causes of sudden cardiac death in clinical practice [1]. Prompt recognition and rapid correction of these potassium (K+) derangements are needed to prevent catastrophic outcomes [2]. Currently, the detection of dyskalemia relies on laboratory tests. Point-of-care blood testing provides rapid analysis of electrolyte levels; however, its accuracy and precision may not match those of a clinical central laboratory, mainly because of dilution, which leads to underestimation of plasma K+ concentration, and the inability to discern hemolysis from pseudohyperkalemia [3,4]. Electrocardiography (ECG) is universally needed in patients with emergent cardiac or noncardiac conditions, and because cardiac tissue is very sensitive to dyskalemia, the ECG may exhibit the typical changes seen in this condition. The main ECG changes associated with hypokalemia include a decreased T wave amplitude, ST-segment depression, T wave inversion, a prolonged PR interval, and an increased corrected QT interval (QTc) [5]. The typical ECG findings for hyperkalemia progress from tall peaked T waves and a shortened QT interval to a lengthened PR interval and a loss of the P wave, followed by a widening QRS complex and ultimately a sine wave morphology [5,6]. Although these morphologic changes are well known in dyskalemias, even experienced clinicians frequently do not notice all of these subtle details [7].
Previous researchers have developed ECG quantification algorithms to predict serum K+ concentration based on T wave morphology, mainly using the slope and width of T waves. Hyperkalemia is associated with tall, narrow, and symmetrical T waves, whereas hypokalemia is associated with flat T waves [8-12]. The algorithms were mostly derived from continuous patient monitoring, such as during hemodialysis, with homogeneous ECG morphologies from a limited set of patients [8-12]. Recently, manual processing of T wave morphologies has been used to improve the diagnosis of hyperkalemia [13]. Nevertheless, using T wave changes alone to detect dyskalemias is less sensitive and specific than a comprehensive ECG interpretation [14].
With the revolution in artificial intelligence (AI), several advanced deep-learning models, such as Oxford's VGGNet [15], Inception Net [16], ResNet [17], and DenseNet [18], have been developed, providing an unprecedented opportunity to improve health care; this was initiated by AlexNet's victory in the ImageNet Large Scale Visual Recognition Challenge in 2012 [19]. Existing deep-learning models have been shown to achieve human-level performance and be effective in medical applications when large annotated datasets are available [17,20-22]. This potential to improve diagnosis and patient care prompted us to develop a deep-learning model to assist emergency physicians in recognizing ECG changes associated with dyskalemias.
Our study aimed to train a deep-learning model, ECG12Net, to predict serum K+ concentration by ECG. The deep-learning model was an 82-layer convolutional neural network that underwent a series of training processes to optimize model performance. The AI system, which learned from more than 50,000 ECGs to identify critical morphologic changes, is intended to help reduce medical errors in emergency departments (EDs) that result from intense time pressure on staff during busy periods [23]. Facilitated by the system's powerful computing ability, the performance of the trained model was compared with that of emergency physicians and cardiologists. Finally, we visualized ECG12Net's calculation process to understand why and how it works.

Data Source
The data were obtained from Tri-Service General Hospital, Taiwan, and research approval was given by the Institutional Review Board (IRB) (IRB No. 1-107-05-047). From May 11, 2011, to December 31, 2016, 40,180 emergency patients were enrolled, contributing 66,321 ECG records, each obtained within 1 hour before or after a reference serum K+ measurement. Serum K+ concentrations were measured in the laboratory using indirect ion-selective electrode methods that had been accredited by the International Organization for Standardization (ISO) standard ISO-15189 and the College of American Pathologists' Laboratory Accreditation Program. All hemolyzed samples were excluded. Potential confounders, such as patients with chest pain or thyroid disorders, were not excluded from the study.
We divided the dataset into training (~70%), validation (~10%), and test (~20%) sets by date. Emergency patients presenting before April 30, 2016, were included in the training set; those presenting between May 1 and July 20, 2016, were in the validation set; and those presenting after July 21, 2016, were in the test set to assess model performance. All records included in the training set were excluded from the validation and test sets; thus, there was no overlap among the three datasets.
The ECG recordings were collected using a Philips 12-Lead ECG machine (PH080A). The ECG signal was recorded in a digital format. The sampling frequency was 500 Hz with 2.5 seconds recorded in each lead. The estimated K+ concentrations ranged from 1.5 mEq/L to 7.5 mEq/L. Predicted K+ concentrations less than 1.5 mEq/L or greater than 7.5 mEq/L were indicated accordingly without further detail (ie, as either <1.5 mEq/L or >7.5 mEq/L). Patient characteristics and laboratory results were collected using an electronic health record system. The estimated glomerular filtration rate was calculated using the Chronic Kidney Disease Epidemiology Collaboration formula [24].
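The date-based split and the reporting range for predictions described above can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of the boundary days (April 30 and July 21, 2016) is an assumption, since the text does not say whether those days are inclusive.

```python
from datetime import date

# Cut-off dates taken from the text; treating them as inclusive is an assumption.
TRAIN_END = date(2016, 4, 30)
VALIDATION_END = date(2016, 7, 20)


def assign_split(presentation_date: date) -> str:
    """Assign a record to a dataset by the date of presentation (hypothetical helper)."""
    if presentation_date <= TRAIN_END:
        return "training"
    if presentation_date <= VALIDATION_END:
        return "validation"
    return "test"


def format_prediction(k_pred: float) -> str:
    """Report a predicted K+ value, collapsing values outside 1.5-7.5 mEq/L."""
    if k_pred < 1.5:
        return "<1.5 mEq/L"
    if k_pred > 7.5:
        return ">7.5 mEq/L"
    return f"{k_pred:.1f} mEq/L"
```

Splitting by date rather than at random mimics prospective use: the model is only ever evaluated on records acquired after those it was trained on.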
Eight basic ECG morphology parameters (EMPs) were calculated by the Philips 12-Lead ECG machine: heart rate, PR interval, QRS duration, QT interval, QTc, P wave axis, RS wave axis, and T wave axis.

The Implementation of ECG12Net
We developed a 12-channel sequence-to-sequence model, which is modified from DenseNet [18]. The details are shown in Multimedia Appendix 1. The architecture of ECG12Net is shown in Figure 1. We designed an ECG lead block with 80 trainable layers whose architecture is shown in Figure 1A. This ECG lead block was used to extract 864 features from each ECG lead, making a basic output prediction based on each lead. Figure 1B shows how ECG12Net integrates all the information from the ECG leads to make an overall prediction. ECG12Net is composed of 12 of these ECG lead blocks, one for each lead sequence. We designed an attention mechanism based on a hierarchical attention network to concatenate these blocks, increasing the interpretive power of ECG12Net [25]. ECG12Net-1, which uses only ECG wave information, contains 82 trainable layers. To improve prediction performance, we added an EMPNet, which is a multilayer perceptron with two hidden layers containing eight EMPs, to ECG12Net-1 to create ECG12Net-2.
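To give an intuition for how per-lead outputs might be integrated, the following is a minimal sketch of a generic softmax attention pooling step. It is not the ECG12Net implementation, whose details are in Multimedia Appendix 1; the function names, the use of one scalar score per lead, and the weighted-sum combination are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_leads(lead_predictions, lead_scores):
    """Attention-weighted average of per-lead K+ predictions (illustrative only)."""
    if len(lead_predictions) != len(lead_scores):
        raise ValueError("one attention score is required per lead")
    weights = softmax(lead_scores)
    overall = sum(w * p for w, p in zip(weights, lead_predictions))
    return overall, weights
```

In a design of this kind, the weights indicate which leads contributed most to a given prediction, which is one way an attention mechanism can support interpretation.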

Human-Machine Competition
We evaluated the performance of practicing physicians using a subtest set. We divided the data into five categories based on the serum K+ concentration: (1) K+ ≤2.5 mEq/L, (2) 2.5< K+ ≤3.5 mEq/L, (3) 3.5< K+ <5.5 mEq/L, (4) 5.5≤ K+ <6.5 mEq/L, and (5) K+ ≥6.5 mEq/L. Stratified sampling was used to create the subtest set due to the rarity of cases in the first and fifth categories. Each category of K+ concentration comprised 60 cases, and a total of 300 cases were used in the test. The participating physicians included an emergency physician under training (second-year resident); two emergency physicians, one with 4 and the other with 13 years of experience; a chief resident in cardiology; and two cardiologists, one with 2 and the other with 9 years of experience. The physicians had no access to patient information and no knowledge of the data. The responses they provided were entered into an online standardized data entry program. We calculated their sensitivity and specificity and compared their results with those of ECG12Net.
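The five categories and the stratified draw of 60 cases per category can be sketched as follows. The thresholds come from the text; the function names, the record layout (a dictionary with a `"k"` key), and the fixed random seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def k_category(k: float) -> int:
    """Map a serum K+ concentration (mEq/L) to categories 1-5 defined in the text."""
    if k <= 2.5:
        return 1  # severe hypokalemia
    if k <= 3.5:
        return 2  # mild-to-moderate hypokalemia
    if k < 5.5:
        return 3  # normokalemia
    if k < 6.5:
        return 4  # mild-to-moderate hyperkalemia
    return 5      # severe hyperkalemia


def stratified_subtest(records, per_category=60, seed=0):
    """Draw the same number of cases from each category without replacement."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for record in records:
        by_category[k_category(record["k"])].append(record)
    sample = []
    for category in range(1, 6):
        # random.sample raises ValueError if a category has too few cases
        sample.extend(rng.sample(by_category[category], per_category))
    return sample
```

Equal-sized strata guarantee enough severe cases for a meaningful comparison, at the cost of no longer reflecting the true prevalence of each category.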

Statistical Analysis and Model Performance Assessment
The study cohort was divided into training, validation, and test sets. We presented their characteristics as the means and standard deviations, the numbers of patients, or the percentages, where appropriate. This information was compared using either analysis of variance or the chi-square test as appropriate. We then analyzed the EMP differences between the five serum K+ groups, and the EMPs were subjected to post hoc analysis. All the dyskalemia groups were compared to the normal group.
The primary analysis was done to evaluate the performance in dyskalemia prediction between ECG12Net and the clinicians in a machine-human competition. Receiver operating characteristic curves and the areas under the curve (AUCs) were applied to evaluate the competition results. Additionally, the sensitivity, specificity, and balance accuracy of dyskalemia prediction by ECG12Net and the clinical physicians were calculated. The balance accuracy is defined as the mean of the sensitivity and specificity obtained in the study. Due to the stratified sampling process destroying the original prevalence, the positive predictive value and negative predictive value for the competition results are not presented.
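The balance accuracy defined above, the mean of sensitivity and specificity, can be computed from binary labels as in this minimal sketch. The function names are hypothetical, and labels are assumed to be coded as 1 (dyskalemia present) and 0 (absent).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def balance_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity, as defined in the text."""
    return (sensitivity + specificity) / 2
```

Unlike the predictive values, both components are conditioned on the true class, so they are unaffected by the altered prevalence in the stratified subtest set.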
The secondary analyses were performed on our test set with the data obtained after July 21, 2016, which had not been used in the training process. This was a simulated prospective study to evaluate the performance of the AI models with the mean absolute error (MAE) as the major measurement index due to the continuous predictions. Moreover, categorized analyses are also presented. Sensitivity, specificity, positive predictive value, negative predictive value, and the squared weighted kappa were used to evaluate the performance of the models. Finally, we conducted a series of logistic models to identify the effects of patient demographic characteristics on the performance of our deep-learning model.
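The two headline measures of the secondary analysis can be sketched as follows. The kappa uses the standard quadratic (squared) weighting, with disagreement weight proportional to the squared distance between categories; the function names and the zero-based category coding are assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted K+."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_weighted_kappa(true_cat, pred_cat, n_classes):
    """Quadratic weighted kappa for ordinal categories coded 0..n_classes-1."""
    n = len(true_cat)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_cat, pred_cat):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2  # the 1/(n_classes-1)^2 factor cancels in the ratio
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if all cases fall into a single category.
    return 1.0 - weighted_observed / weighted_expected
```

With squared weights, confusing severe hypokalemia with severe hyperkalemia is penalized far more than confusing adjacent categories, which suits an ordinal scale such as K+ concentration.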
We used a significance level of P< throughout the analysis. Bootstrap 95% CIs were calculated and presented for all measure indexes based on 10,000 permutations. No additional adjustments for multiple comparisons were used because of the small number of planned comparisons. The statistical analysis was carried out using the software environment R, version 3.4.3 (The R Foundation).
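A bootstrap 95% CI of the kind mentioned above can be sketched as follows. The text does not state which bootstrap interval was used, so the percentile method here is an assumption, as are the function name and the fixed seed; the original analysis was carried out in R.

```python
import random


def bootstrap_ci(values, statistic, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` over `values`."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement to the original sample size
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

For example, passing a list of absolute prediction errors and a mean function yields an interval for the MAE; any other measure index can be plugged in through `statistic`.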

Cohort Description
The training, validation, and test sets comprised records from 28,183; 3993; and 8004 patients, respectively. Table 1

Performance of ECG12Net on the Test Set
The model performance on the test set is shown in Multimedia Appendix 1. The performance of ECG12Net was better than that of each lead. ECG12Net-1 had the lowest MAE (0.531). Including EMP information did not improve the prediction of K+ concentration (MAE ECG12Net-1: 0.531; MAE ECG12Net-2: 0.538). When the predictions were categorized into three classes (hypokalemia, normokalemia, and hyperkalemia) and into five classes (with the addition of severe hypokalemia and severe hyperkalemia), as described in Multimedia Appendix 1, ECG12Net-1 again performed best, with the highest squared weighted kappa: 0.354 in the three-class categorization and 0.396 in the five-class categorization. For the detection of hypokalemia, the sensitivity, specificity, positive predictive value, and negative predictive value of ECG12Net-1 were 50.7%, 81.6%, 44.7%, and 85.0%, respectively; for hyperkalemia, they were 50.8%, 96.0%, 26.9%, and 98.5%, respectively. The confusion scatter plots for the predictions by the two ECG12Nets are shown in Figure 3. Importantly, in detecting severe hypokalemia and hyperkalemia, ECG12Net-1 demonstrated a sensitivity of 95.6% and 84.5%, respectively. ECG12Net-2 exhibited similar prediction capabilities for severe hypokalemia and hyperkalemia as ECG12Net-1.
Figure 3. Confusion scatter plots of the predictions by ECG12Net-1 and ECG12Net-2 (n=13,222). The x-axis indicates the true K+ concentration from laboratory testing. The y-axis presents the predicted K+ concentration by ECG12Net-1 (A) and ECG12Net-2 (B). Red points represent the highest density, followed by yellow, green, light blue, and dark blue. Perfect model performance would fall only along the red diagonal line. We categorized the K+ concentration into five groups (K+ ≤2.5 mEq/L, 2.5< K+ ≤3.5 mEq/L, 3.5< K+ <5.5 mEq/L, 5.5≤ K+ <6.5 mEq/L, and K+ ≥6.5 mEq/L) and calculated the case counts in each grid.

Model Interpretation
A total of 58 severe hypokalemia cases were correctly detected by ECG12Net-1, of which 15 (26%) were overlooked by clinician consensus. The classical ECG findings of U wave and ST segment depression, especially in leads V2 and V3, were consistently recognized as severe hypokalemia by both the clinicians and ECG12Net-1 (see Figure 4 A). As shown in Figure 4 B, ECG12Net-1 predicted a case of severe hypokalemia from ST segment depression in the V3 lead; this case was misdiagnosed by all the clinicians. Two cases of severe hypokalemia were misclassified by ECG12Net-1 but diagnosed correctly by the clinicians (data not shown). These cases had severe noise in the presented ECG; however, the clinicians made the correct diagnosis based on the presence of a prolonged QTc.
A total of 50 severe hyperkalemia cases were correctly detected by ECG12Net-1, with 36 (72%) of these cases overlooked by clinician consensus. Figure 4 C shows a typical ECG presentation of severe hyperkalemia with tented T waves accompanied by a prolonged QRS complex duration, which was correctly diagnosed by all clinicians and ECG12Net-1. Figure 4 D shows a case of severe hyperkalemia correctly recognized by ECG12Net-1, with ST depression followed by a peaked T wave in lead V6, which was misdiagnosed as hypokalemia by all the clinicians. There were also 10 cases of severe hyperkalemia overlooked by ECG12Net-1 and all clinicians.

Discussion
In this study, we developed a deep-learning model, ECG12Net, to detect dyskalemias through ECG analysis. Using a deep convolutional network extracting many useful ECG features with a training set of more than 50,000 ECGs, ECG12Net performed better than clinicians in detecting dyskalemias. Notably, ECG12Net performed well with sensitivities of 95.6% and 84.5% in detecting severe hypokalemia and severe hyperkalemia, respectively. ECG interpretation is one of the most important skills in medical practice. Previous studies have analyzed morphological features, for instance, the R wave peak [26] and the QRS complex [27], combined with machine learning approaches for disease detection, such as atrial fibrillation [28]. These systems were relatively imprecise, making it troublesome to quantify specific rhythm morphologies [29]. Although some recent studies have used deep convolutional neural networks and recurrent neural networks mainly for arrhythmia detection [30-35], most of the data were collected from wearable devices without offering all the important information provided by a 12-lead ECG [11]. The clinical value of these findings is also dampened by the lack of laboratory-based diagnosis and annotation and the relatively small volumes of data. In contrast, our database was unprecedented, comprising 40,180 patients and 66,321 laboratory-annotated ECG records collected by standard 12-lead ECG machines.
Galloway et al recently developed a deep-learning model to screen for hyperkalemia in patients with chronic kidney disease, stage III or higher, using ECG [36]. We applied ECG12Net to a broad set of patients in the ED and developed a continuous prediction of both hypokalemia and hyperkalemia. Moreover, although the three-category classification task in our study is more difficult than the two-category classification task in theirs, our ECG12Net achieved an AUC greater than 0.9 in detecting hyperkalemia, which is similar to that of their model with an AUC of 0.85-0.88. This highlights the strength of ECG12Net.
The EMPs of different K+ concentration groups yielded several interesting findings. The EMPs, such as the PR and QTc intervals, and the data used for analysis were all collected from the original ECGs (see Multimedia Appendix 1). The impact of hyperkalemia on the T wave axis was more profound and substantial than its impact on the axes of the P and RS waves. Hypokalemia was actually associated with a widening of the QRS complex, which may be explained by the decrease in conduction velocity caused by reduced K+ concentrations after hemodialysis [37]. Although the longest QTc occurred in the severe hypokalemia group, a well-documented finding, the QTc was longer in patients with hyperkalemia as well. In fact, for most of the intervals and durations, the nadir was in normokalemia, with increases in both forms of dyskalemia. Although the underlying mechanisms are unclear, these findings uncovered by big data may guide directions for further research.
Interestingly, the algorithm focusing only on morphologic changes (ie, ECG12Net-1) performed slightly better than that with additional EMP information (ie, ECG12Net-2). That the addition of EMP information did not improve the model's predictive ability corroborates prior research that found that deep-learning models can automatically extract useful features for prediction without preprocessing [17,20,21]. This also highlights the importance of morphologic changes in ECG over EMPs in the detection of dyskalemias.
There are several clinical applications of ECG12Net shown in Multimedia Appendix 1. First, severe dyskalemia could be identified by ECG12Net within 5 minutes, much faster than laboratory testing, leading to more prompt management. Second, pseudodyskalemia, defined as an abnormal reported serum or plasma K+ concentration despite a normal in vivo K+ concentration, can be excluded early by ECG12Net to avoid inappropriate treatment. Third, the performance of ECG12Net is more than 10% better than that of the best cardiologist in our study, whose performance was similar to other experts in prior studies [38,39]. This means that emergency physicians could have access to a consistent, beyond cardiologist-level decision aid available 24 hours a day to help diagnose and manage dyskalemic patients. Fourth, the developed ECG12Net model can be included in a wearable device for dyskalemia detection, especially for patients with advanced chronic kidney disease or uremia on dialysis. Finally, the ECG12Net model could be incorporated into ECG machines in ambulances or remote areas to facilitate telemedicine.
Explainable AI plays a critical role in clinical practice [40,41]. The so-called "black box" approach in the deep-learning models often precludes the understanding of the decision-making process [42]. To increase the interpretability of our model, we established heatmaps to visualize the focus in the ECG by ECG12Net using class activation mappings [25,43], which can help physicians understand the logic of the AI decisions. Although our ECG12Net was approximately 3.85 times more likely to be correct when inconsistencies occurred between the AI and human predictions (see Multimedia Appendix 1), physicians who can integrate the AI suggestions with the symptoms and signs of patients should make the final decision to take appropriate action.
Some limitations of this study should be mentioned. First, the studied patients were only enrolled from one academic medical center, although the distribution of blood K+ concentration was similar to that in other large studies [44,45]. Multicenter validation is needed to confirm the value and application of this study. Second, only six clinicians competed against ECG12Net. Although their performance in severe hyperkalemia detection was consistent with that of the previous studies [38,39], comparisons should be made with more experts to confirm the superiority of ECG12Net. Third, only the patients in the ED with both an ECG and a serum K+ test were enrolled in this study, which may have caused selection bias and constrained the generalizability of the results. Fourth, although the sensitivity heatmap provides a glimpse into the basis for ECG12Net's prediction, the reason why the particular ECG segment was highlighted remains unclear. Finally, ECG12Net showed decreased sensitivity in detecting mild-to-moderate hypokalemia, which accounts for the majority of dyskalemias, leading to low weighted averages of the sensitivities. Hypokalemia-associated ECG changes usually occur when the serum K+ level falls below 3 mEq/L [46], which may explain why our algorithm failed to accurately distinguish the ECG morphologies of mild-to-moderate hypokalemia from normokalemia.
In conclusion, we established a deep-learning model called ECG12Net to detect dyskalemias in the ED. The collaboration between physicians and AI can lead to better health care for our patients. This model will help emergency physicians promptly recognize severe dyskalemias and potentially reduce sudden cardiac death.