TOP-Net Prediction Model Using Bidirectional Long Short-term Memory and Medical-Grade Wearable Multisensor System for Tachycardia Onset: Algorithm Development Study

Background: Without timely diagnosis and treatment, tachycardia, also called tachyarrhythmia, can cause serious complications such as heart failure, cardiac arrest, and even death. The predictive performance of conventional clinical diagnostic procedures needs improvement in order to assist physicians in detecting risk early on. Objective: We aimed to develop a deep tachycardia onset prediction (TOP-Net) model based on deep learning (ie, bidirectional long short-term memory) for early tachycardia diagnosis with easily accessible data. Methods: TOP-Net leverages 2 easily accessible data sources: vital signs, including heart rate, respiratory rate, and blood oxygen saturation (SpO2) acquired continuously by wearable embedded systems, and electronic health records, containing age, gender, admission type, first care unit, and cardiovascular disease history. The model was trained with a large data set from an intensive care unit and then transferred to a real-world scenario in the general ward. In this study, 3 experiments examined the merging of patients' personal information, temporal memory, and different feature combinations. Six metrics (area under the receiver operating characteristic curve [AUROC], sensitivity, specificity, accuracy, F1 score, and precision) were used to evaluate predictive performance.
JMIR Med Inform 2021 | vol. 9 | iss. 4 | e18803 | p. 1 https://medinform.jmir.org/2021/4/e18803 (page number not for citation purposes) Liu et al JMIR MEDICAL INFORMATICS


Introduction
Tachycardia, a heart rhythm disorder, is defined as an adult resting heart rate that exceeds 100 bpm [1]. According to its mechanisms, causes, expressions, and outcomes, tachycardia can be classified as sinus tachycardia, atrial fibrillation, atrial flutter, ventricular tachycardia, or ventricular fibrillation [2]. Spontaneous ventricular tachyarrhythmia is a major cause of sudden cardiac death; approximately 180,000 to 300,000 people suffer from this condition in the US yearly [3,4]. Atrial fibrillation is a risk factor for stroke, congestive heart failure, and premature death. Patients suffering from atrial fibrillation for the first time have a high rate of mortality [5,6]. In addition, tachycardia has been correlated with poor outcomes [7]. Conventional tachycardia detection depends on cardiologists or clinical experts reading electrocardiogram (ECG) signals. Due to limited numbers of measurements and the intermittent nature of the diseases, the symptoms of tachycardia might not be captured when ECGs are recorded in hospitals [8]. Continuous monitoring is therefore needed so that clinicians can diagnose and predict the disease early and have enough time to prevent patients from deteriorating.
Recently, several hospitals have attempted to utilize wearable devices for continuous monitoring of vital signs such as heart rate, respiratory rate, and oxygen saturation (SpO2) [9,10]. The adoption of wearable devices in hospitals facilitates the acquisition of patient status anywhere and anytime and reduces the workload of nurses. Compared with the use of single-threshold alarm monitoring devices and commonly used early warning scores defined by clinical experts [11], machine learning methods can automatically discover patterns and relationships within data without human instructions. Thus, machine learning has been shown to be an effective clinical tool to identify abnormal events or provide early warning of diseases based on electronic health record, biomarker, gene expression, and imaging data [12-14]. Forkan et al [15] leveraged a hidden Markov model to predict 7 clinical onsets, including tachycardia onset, and further improved performance by using random forest algorithms to forecast events within 1 to 2 hours [16]. Lee et al [17] developed an artificial neural network to predict ventricular tachycardia within 1 hour. Szep et al [18] utilized an archetypal cardiac monitoring system with regression and boosting models to detect arrhythmia and predict fatal arrhythmia several minutes before onset.
With nonlinear computation and flexible feature extraction, deep learning models show strong performance in representation learning and exploration of unknown information [19]. Researchers have recently used deep learning models for disease diagnosis and prediction based on physiological signals or electronic health records [20-22]. Since vital signs are easily measured and acquired and some open-source, labeled physiological signal (especially ECG signal) data sets are available [23,24], many studies have employed deep learning in cardiology [25]. Hannun et al [26] reported a convolutional neural network algorithm that detects heart arrhythmias using ECG signals acquired with a single-lead wearable sensor. Shashikumar et al [27] also presented a convolutional neural network model that detects and monitors atrial fibrillation. Teijeiro et al [28] introduced a long short-term memory (LSTM) network based on a set of features extracted from ECG records to classify normal sinus rhythm, atrial fibrillation, and anomalies. Gotlibovych et al [8] constructed a model combining a convolutional neural network and LSTM to achieve nearly real-time identification of atrial fibrillation. Cho et al [29] obtained a convolutional neural network model to predict atrial fibrillation within 4 to 6 minutes using ECG signals.
Cardiovascular diseases are complex and heterogeneous; multiple factors such as genetics, environment, age, and gender can affect the occurrence and severity of cardiovascular disease [30,31]. Age has been proven to be an independent risk factor, and being female is a greater risk factor for cardiovascular disease in the elderly [31]. Few studies have attempted to develop a tachycardia onset prediction model that accounts for the patient's personal information. Respiratory dysfunction and common lung diseases, such as asthma, chronic obstructive pulmonary disease, and lung fibrosis, are significantly more likely to cause cardiovascular disease [32]. Abnormal respiratory rate and its relative changes are a critical indicator for predicting cardiac arrest [33], and SpO2 has also been shown to be a diagnostic marker of acute heart failure [34]. However, this useful information has not been used effectively, though it can be easily acquired with wearable sensors.
The aim of this study was to develop a bidirectional long short-term memory (BiLSTM) model, TOP-Net, that is applicable to both intensive care units and general wards [35], leverages easily accessible data, enables real-time evaluation and early prediction of tachycardia onset with a long forecast range, and is based on vital signs and electronic health record data. The contributions are as follows: (1) combining electronic health record data (sparse records) and biosensor data (high-frequency records) to accomplish early prognosis and real-time prediction of tachycardia onset and to evaluate the performance of early prediction; (2) being the first to consider 2 other important vital signs (respiratory rate and SpO2) and to explore their different combinations with deep learning models to predict tachycardia onset, which can improve the precision of early forecasts; and (3) utilizing a large critical care data set and a model that is transferable to real clinical ward scenarios in which patients are monitored by medical-grade wearable embedded systems, for example, transferable between different countries (US to China), ethnicities (multiracial to Asian), and medical departments (intensive care unit to general ward).

Overview
We leveraged a large data set from the Medical Information Mart for Intensive Care III (MIMIC-III) [24] and its matched physiological waveform database (recorded with monitors) [36] to develop the TOP-Net model (codes available [37]). The pretrained model was transferred to a relatively small data set, from patients who were continuously monitored with a medical-grade wearable embedded system (SensEcho, Beijing SensEcho Science & Technology Co Ltd) in a real clinical environment [38]. The process is presented in Figure 1.

Methodology
We combined 2 types of data to develop TOP-Net: (1) information from biological sensors (wearable), including heart rate, respiratory rate, and SpO2; and (2) patients' personal information from electronic health records, which represents their individual health status when admitted to the hospital, including age, gender, admission type, first care unit, and history of cardiovascular disease.

Model Overview
BiLSTM [39], a sequential model, can capture the complex and multivariate dynamics in longitudinal electronic health record data and continuously collected physiological signals that are typically used in acute condition prediction, classification, and subphenotype identification [40]. We developed the model (Figure 2) using BiLSTM to take advantage of potential long-term and short-term changes and associated characteristics of physiological state.
Figure 2. An overview of TOP-Net using the cohort admission and personal measurement data in hospital. BiLSTM: bidirectional long short-term memory; EHR: electronic health record; HR: heart rate; RR: respiratory rate; SpO2: blood oxygen saturation.

Step 1: Calculate Statistical Features
We used a BiLSTM algorithm to represent the relationship between the multiple timeseries collected by biological sensors. Data from an observing window before tachycardia onset were used to train the model. Inspired by the convolutional-LSTM model [41], we designed the model to use the statistical features of the raw timeseries signals within a sliding sub-observing window as inputs. The results for all sub-observing windows were concatenated along the time axis and fed into the model.
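The absolute energy of the timeseries is calculated as

$$f_1 = \sum_{i=1}^{n} X_i^2$$

The correlation of a timeseries and its time lag is described by ƒ2, which is a similarity measurement index

$$f_2 = \frac{1}{(n-l)\,\sigma^2} \sum_{i=1}^{n-l} (X_i - \mu)(X_{i+l} - \mu)$$

where X_i is a timeseries value at one time point, n is the length of X, σ² and μ are estimations of the timeseries variance and mean, respectively, and l is the time lag [42].
The nonlinearity of a timeseries is quantized using

$$f_3 = \frac{1}{n - 2\,\mathrm{lag}} \sum_{i=1}^{n-2\,\mathrm{lag}} X_{i+2\,\mathrm{lag}} \cdot X_{i+\mathrm{lag}} \cdot X_i$$

where lag is a time delay operator (equal to l) [43].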
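The three statistics described in this step (absolute energy, lagged autocorrelation, and the nonlinearity measure) can be sketched in NumPy as follows. This is a minimal illustration that assumes the usual textbook forms of these statistics; the function names are hypothetical, and whether they match the authors' implementation exactly is an assumption.

```python
import numpy as np


def abs_energy(x):
    """Absolute energy: the sum of squared values of the series."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))


def autocorrelation(x, lag):
    """Autocorrelation at a given lag, normalized by the series variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if lag >= n:
        return float("nan")  # not enough points for this lag
    mu = x.mean()
    var = x.var()
    if var == 0:
        return float("nan")  # undefined for a constant series
    cov = np.sum((x[: n - lag] - mu) * (x[lag:] - mu))
    return float(cov / ((n - lag) * var))


def c3(x, lag):
    """Nonlinearity measure: mean of X[i+2*lag] * X[i+lag] * X[i]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if 2 * lag >= n:
        return 0.0  # assumed convention when the series is too short
    return float(np.mean(x[2 * lag:] * x[lag: n - lag] * x[: n - 2 * lag]))
```

For example, `c3(hr_window, lag=3)` would correspond to a feature such as hr_c3, assuming `hr_window` holds the per-minute heart rate values of one sub-observing window.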

Step 2: Fuse Patient Characteristics
We extracted the previously mentioned static patient information which was merged with the statistical features. The concatenated vectors were normalized and input to the BiLSTM model.
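One way to picture this fusion step is sketched below. The text does not specify how the static vector is merged or which normalization is applied, so repeating the static features at every time step and using z-score normalization are both assumptions, as are the function names.

```python
import numpy as np


def fuse_static(seq_features, static_features):
    """Append a static patient vector to every time step.

    seq_features: array of shape (T, F), one row per sub-observing window.
    static_features: array of shape (S,), eg, encoded age, gender, etc.
    Returns an array of shape (T, F + S).
    """
    seq = np.asarray(seq_features, dtype=float)
    static = np.asarray(static_features, dtype=float)
    tiled = np.tile(static, (seq.shape[0], 1))  # repeat static row T times
    return np.concatenate([seq, tiled], axis=1)


def zscore(batch, eps=1e-8):
    """Normalize each feature over samples and time steps (assumed scheme).

    batch: array of shape (N, T, D).
    """
    batch = np.asarray(batch, dtype=float)
    mean = batch.mean(axis=(0, 1), keepdims=True)
    std = batch.std(axis=(0, 1), keepdims=True)
    return (batch - mean) / (std + eps)
```

In practice the normalization statistics would be computed on the training set only and reused for the test set, so that no test information leaks into training.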

Step 3: Obtain Tachycardia Onset Risk Score
In this step, TOP-Net determines a real-time risk score that evaluates an individual risk probability of tachycardia onset.
When the risk score continuously exceeds the threshold set by the doctor for a period of time, the caregiver is alerted.
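The alerting rule can be illustrated with a short sketch. The function name and the `min_consecutive` parameter are hypothetical; the text only states that the score must exceed a doctor-defined threshold continuously for a period of time.

```python
def first_alert_index(risk_scores, threshold, min_consecutive):
    """Return the index at which an alert would fire, or None.

    An alert fires once the risk score has exceeded `threshold` for
    `min_consecutive` consecutive evaluations.
    """
    run = 0
    for i, score in enumerate(risk_scores):
        run = run + 1 if score > threshold else 0  # reset on any dip
        if run >= min_consecutive:
            return i
    return None
```

Requiring several consecutive exceedances, rather than a single one, trades a short delay for fewer alarms triggered by isolated spikes in the risk score.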

Medical Information Mart for Intensive Care (MIMIC)
MIMIC-III is a large, publicly available critical care database (version 1.4 [24]) that contains detailed hospital information, such as demographic information, laboratory test results, and diagnosis codes, for 38,557 adult patients (52,955 ICU admissions). Patients' multiple physiological signals (waveforms) and the corresponding vital signs in numeric format are stored in the MIMIC-III Waveform Database, which contains time alignment information for 10,282 patients and 22,247 numeric records that can be matched to the clinical database [36]. The basic information is stored in the following tables: admissions (patients' hospital admission information), icustays (ICU transfer [in and out] information), patients (individual birth and death dates), and diagnoses_icd (diagnosis codes during hospitalization). All of the tables can be associated through subject_id, a unique patient identifier. The waveform database includes the header files (name, unit, and recording frequency) and segments of recordings (numeric signals). Figure 3 presents the method used to link tables of information with the temporal waveforms.

Continuous Monitoring Database for the General Ward
The use of general ward data was approved by the ethics committee of the General Hospital of PLA (S2018-095-01). In the general ward, we utilized a SensEcho medical-grade monitoring system, which can monitor patients anytime and anywhere. SensEcho contains 3 parts (Figure 4): a wearable multisensor system unit, a wireless network and data transmission unit, and a central monitoring system [35,38]. The multisensors include a single-lead ECG sensor (200 Hz), a sensor for respiratory inductive plethysmography (25 Hz), a noninvasive photoplethysmogram sensor for SpO2 monitoring (1 Hz) based on near-infrared spectroscopy, and a posture recognition sensor using a 3-axis accelerometer. These signals are collected and stored in a data logger. The logger has an ultra-low power Wi-Fi module and supports long-term data transmission by relying upon hospital networks. The central monitoring system receives information, processes data, and delivers and displays information. The algorithms deployed on the system included signal quality evaluation, signal processing, real-time abnormal event monitoring and early prediction, and patients' health assessment, which were packaged as a toolkit (Midas). The accuracy, stability, and effectiveness of our system have been validated in previous studies [44-46].
Patients admitted to the hospital were assessed by a doctor using the system. Continuously monitored physiological signals were transmitted to the hospital server, and the data in numeric format were acquired using the waveform processing function in Midas. The clinical information was stored separately in the hospital information system. Data from the different sources were linked (Figure 5) using patient_id, a unique patient identifier similar to subject_id in MIMIC-III.

Tachycardia Onset Diagnostic Criteria
Diagnostic criteria for tachycardia onset were determined by 3 clinical experts from the emergency department, the general ward, and the surgical ICU. A tachycardia event was defined as any of the following: (1) heart rate above 100 bpm sustained over 30 minutes; (2) heart rate above 130 bpm sustained over 20 minutes; (3) heart rate above 150 bpm sustained over 5 minutes. The first time point at which any of these conditions was met was recognized as tachycardia onset.
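The three criteria above can be expressed as a small rule check on a per-minute heart rate series. The sketch below is illustrative only: the names are hypothetical, and it assumes that "sustained" means consecutive minutes above the threshold and that onset is taken as the start of the earliest qualifying run, which the text does not state explicitly.

```python
# (threshold in bpm, required duration in minutes), taken from the criteria
RULES = ((100, 30), (130, 20), (150, 5))


def tachycardia_onset(hr_per_minute, rules=RULES):
    """Return the minute index of tachycardia onset, or None.

    Onset is assumed to be the start of the earliest run of consecutive
    minutes that satisfies any (threshold, duration) rule.
    """
    onsets = []
    for threshold, duration in rules:
        run_start = None
        for t, hr in enumerate(hr_per_minute):
            if hr > threshold:
                if run_start is None:
                    run_start = t  # a new run above the threshold begins
                if t - run_start + 1 >= duration:
                    onsets.append(run_start)
                    break  # first qualifying run for this rule is enough
            else:
                run_start = None  # run interrupted, start over
    return min(onsets) if onsets else None
```

If onset were instead defined as the minute at which the duration requirement is first fulfilled, the function would return `t` rather than `run_start`.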

Data Set
In the ICU environment, we selected 5699 patients with the following criteria: age over 18 years, admitted to the hospital and ICU for the first time, and monitoring data longer than 14 hours with heart rate, respiratory rate, and SpO2 recordings. The size of the observing window, which was used to extract the statistical features, was chosen as 2 hours. The negative sample set was built by extracting information in the observing window with a 1-hour sliding step throughout monitoring for patients without tachycardia. The positive sample set was acquired by selecting the same features in the observing window before the occurrence of tachycardia with a forecast range. To balance the ratio of positive and negative samples, we kept extracting positive samples with a 5-minute delay based on the former (for target replication), which is a method used in a previous study [47]. The data were downsampled from per second to per minute by averaging. If more than 30% of the values of all variables at a certain time were null or 0, the missing values were filled using the forward interpolation method. We randomly picked a number of negative samples close to the number of positive samples to further decrease class imbalance. There were 2748 negative and 2130 positive samples.
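The downsampling and forward-filling steps can be sketched as below. This is a simplified illustration with hypothetical function names; it treats zeros as missing values, in line with the "null or 0" wording, and it does not reproduce the 30% rule or the sample-balancing procedure.

```python
import numpy as np


def downsample_to_minutes(per_second, zero_is_missing=True):
    """Average a 1 Hz series into per-minute values, ignoring missing points.

    A minute with no valid samples yields NaN. Trailing samples that do not
    fill a whole minute are dropped.
    """
    x = np.asarray(per_second, dtype=float).copy()
    if zero_is_missing:
        x[x == 0] = np.nan  # assumption: 0 encodes a missing reading
    n_minutes = x.size // 60
    blocks = x[: n_minutes * 60].reshape(n_minutes, 60)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=1)
    sums = np.where(valid, blocks, 0.0).sum(axis=1)
    out = np.full(n_minutes, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


def forward_fill(x):
    """Replace each NaN with the most recent earlier valid value."""
    x = np.asarray(x, dtype=float).copy()
    for i in range(1, x.size):
        if np.isnan(x[i]):
            x[i] = x[i - 1]
    return x
```

Note that a series starting with missing values keeps those leading NaNs after forward filling, so such segments would need separate handling.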
In the general ward, we deployed the medical-grade wearable monitoring system (Figure 6a) in a cardiovascular disease department in January 2018. We collected data from 367 patients for research. The inclusion criterion for monitoring duration was reduced from 14 hours to 4 hours to take into account patient length of stay. A total of 259 patients were included, and 2300 negative samples and 270 positive samples were extracted. Figure 6b shows a patient wearing a multisensor shirt, and Figure 6c shows an example of a patient encountering tachycardia.

Developing the Prediction Model
In the early prediction model, developed from the MIMIC-III data set, predictions (forecast ranges) with TOP-Net were explored from 0 hours to 6 hours with a 2-hour interval. A total of 21 statistical features were included (Table 1). The size of the sub-observing window and the sliding step were set to 20 minutes and 5 minutes, respectively. We calculated all statistical values in the sub-observing windows, concatenated them sequentially, and fed them into the model. The data set was randomly split into a training set (80%) and a test set (20%) according to the patient's hospitalization number. Considering the sample size, 5-fold cross-validation together with random search was used to tune the hyperparameters on the training set [48]. The hidden size was set to 32. We tested learning rates ranging from 10^-4 to 10^-2 with an interval of 10^-4 and training epochs from 5 to 100 with an interval of 10. The best hyperparameters were determined by minimizing validation loss. We retrained the model using the optimal hyperparameters on the training set, and the performance of the model was assessed on the test set.
Table 1 (only the last row survived extraction). Together (heart rate, respiratory rate, SpO2) (n=1): all_autocorrelation, the mean value of ƒ2 using all vital signs with the default l=40. SpO2: blood oxygen saturation.
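The sub-observing window scheme described in this section can be made concrete with a short sketch. With a 2-hour observing window at 1 sample per minute, a 20-minute sub-window, and a 5-minute step, there are (120 - 20) / 5 + 1 = 21 sub-windows. The function names and the choice of feature functions are hypothetical.

```python
import numpy as np


def sub_window_starts(observe_len=120, sub_len=20, step=5):
    """Start indices (in minutes) of each sliding sub-observing window."""
    return list(range(0, observe_len - sub_len + 1, step))


def windowed_features(signal, feature_fns, sub_len=20, step=5):
    """Compute each feature on every sub-window of a per-minute signal.

    Returns an array of shape (num_windows, num_features), ordered in time,
    which can serve as the input sequence of a recurrent model.
    """
    signal = np.asarray(signal, dtype=float)
    starts = sub_window_starts(signal.size, sub_len, step)
    return np.array(
        [[fn(signal[s: s + sub_len]) for fn in feature_fns] for s in starts]
    )
```

Under these settings each sample becomes a sequence of 21 time steps, and each time step carries the statistical features of one 20-minute slice.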

Comparison With Baseline Models
To further investigate the performance of TOP-Net, we designed subexperiments 1, 2, and 3 to obtain a comprehensive assessment. In subexperiment 1, models were built without personal information and without bidirectional memory. That is, LSTM and convolutional neural network models were trained on the total cohort without considering the personal information of patients. The structure of the LSTM was consistent with that of the BiLSTM, and the convolutional neural network model had 2 convolutional layers. In subexperiment 2, conventional machine learning methods with default model parameters, including extreme gradient boosting [49], multilayer perceptron, and random forest, were compared with TOP-Net. In subexperiment 3, different feature combinations were examined: (1) all vital signs, (2) heart rate, (3) heart rate and respiratory rate, and (4) heart rate and SpO2.

Performance Evaluation Metrics
Prediction performance was measured with 6 metrics: sensitivity, specificity, accuracy, F1 score, precision, and area under the receiver operating characteristic curve (AUROC).
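For reference, the 6 metrics can be computed from confusion-matrix counts and predicted scores as in the sketch below. The function names are hypothetical, and the AUROC is computed through its pairwise-ranking interpretation, which is practical only for small samples.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because F1 score depends on precision, it is sensitive to class imbalance, which is worth keeping in mind when comparing results across cohorts with different proportions of positive samples.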

Model Validation and Transfer to the General Ward
The performance of TOP-Net was validated using the data collected in the general ward (a small data set obtained within 1 year) by the SensEcho system. A transferable model suitable for non-ICU patients was acquired by fine-tuning the ICU scenario model. Because of the small sample size, the model performance was assessed with the same 6 metrics using 5-fold cross-validation.

Experimental Platform
We utilized PostgreSQL (version 9.6; PostgreSQL Global Development Group) to extract the clinical data. All data processing and analyses, model development, and result visualization were performed with Python (version 3.7.1) and CUDA (version 10.0).
Table 2 shows admission information summary statistics for the study cohorts. The patients' ages were slightly higher in the ICU cohort, and most of these patients were admitted to the hospital for emergencies. A large proportion of patients in the cardiovascular disease department of our hospital were admitted for elective reasons. Furthermore, a higher proportion of patients in the general ward had a history of cardiovascular diseases.

Evaluation Based on the ICU Cohort
We leveraged 5-fold cross-validation to select optimal hyperparameters with the training set and assessed the performance of the model on the test set. The hyperparameter values that we selected were learning rate=0.0002, epoch=20, and batch size=64. Figure 7 and Table 3 summarize the results from subexperiment 1 and subexperiment 2. The AUROC and F1 score for TOP-Net were consistently better than those of the other models, with one exception: for the 6-hour prediction, TOP-Net's F1 score was slightly lower than that of the LSTM model, though its sensitivity was slightly higher than that of the LSTM model.
In Table 4, the results for models using heart rate (n=10), heart rate and respiratory rate (n=15), heart rate and SpO2 (n=15), and statistical features of all vital signs (n=21) are shown. For 2- to 6-hour forecast ranges, the model with all of the features as input had the best performance, with the highest AUROC values. The performance was slightly reduced when inputting heart rate and respiratory rate, or heart rate and SpO2. The performance was the worst when including only heart rate statistical features.
The statistical characteristics of heart rate play a dominant role in real-time diagnosis. Furthermore, we employed the extreme gradient boosting algorithm to rank the importance of the 21 designed features for a forecast range of 6 hours. The top 8 features (Figure 8) were hr_abs_energy, hr_quantiles_01, hr_c3, hr_c2, hr_quantiles_03, resp_c3, hr_mean, and hr_quantiles_07. The nonlinearity features hr_c3 and hr_c2 (ƒ3 with lag=3 and lag=2) were ranked third and fourth, respectively. The respiratory feature resp_c3 was ranked sixth.

Model Validation in the General Ward
We assessed the performance of the model 2 hours before tachycardia onset because the interval between the tachycardia onset and the admission time to the department was short in our general ward scenario. Given the limited training data, we used the transfer learning method to fine-tune the model. The parameters were learning rate=0.0002, epoch=18, and batch size=32. The 5-fold cross-validation was also used to assess the performance and prevent possible overfitting. The retraining results can be seen in Table 5. TOP-Net had a stable outcome and outperformed the other 5 models (AUROC 0.965, accuracy 0.937, sensitivity 0.955, specificity 0.881, F1 score 0.793, and precision 0.680). Compared with the model in the ICU, the difference in prediction performance might be caused by the difference in the severity of the patients' disease. Although the convolutional neural network's F1 score was much higher, its sensitivity, to which clinicians pay more attention, was lower than that of TOP-Net. Figure 9 shows real-time risk scores of tachycardia onset and an example of early tachycardia onset prediction with TOP-Net. In Figure 9a, the patient encountered a tachycardia event from 675 to 725 minutes after admission. The risk probability was assessed every 5 minutes; Figure 9b presents the real-time risk. We set the alarm threshold to 0.40 as a trade-off between sensitivity and specificity. The risk score begins to rise after the 555th minute, showing that our model can predict the tachycardia event 125 minutes beforehand.

General
In this study, we developed a model using a publicly accessible data set and transferred it to a real clinical scenario. The performance of TOP-Net for predicting tachycardia onset 0 to 6 hours in advance was better than that of the baseline models (timeseries prognosis methods and conventional machine learning methods without timing characteristics); TOP-Net outperformed benchmarks of 2 deep learning models, 2 ensemble models, and 1 neural network model for predictions 6 hours in advance.
Many studies on continuous monitoring of physiological status have indicated that the deterioration of vital signs occurs more than 6 to 12 hours before serious adverse events [50]. Continuous monitoring, early prediction, and intervention for tachycardia can reduce the occurrence of heart failure, cardiac arrest, and death. This paper proposed TOP-Net, a tachycardia onset early prediction model leveraging the BiLSTM algorithm with 8 easily accessible inputs (3 vital signs and 5 types of personal information). TOP-Net was trained using a large ICU data set and transferred to the general ward scenario with patients monitored by wearable sensors. TOP-Net was consistently superior to the baseline models when predicting tachycardia onset from 0 to 6 hours in advance. Including patient characteristics allowed more accurate tachycardia onset prediction than that of other models without this information. Moreover, TOP-Net was able to forecast tachycardia onset 6 hours beforehand, and the transferred model also performed well in our clinical scenario.
In recent years, some novel models for early risk prediction of adverse events have been developed based on electronic health records or physiological signals. Pan et al [51] utilized a self-correcting deep learning approach to predict whether acute kidney injury would occur in the subsequent 6 hours. Futoma et al [52] developed a multitask Gaussian process recurrent neural network classifier to detect sepsis 4 hours in advance. Tonekaboni et al [53] trained a convolutional neural network and LSTM fusion model to predict cardiac arrest from physiological signals 24 hours in advance. For tachycardia onset prediction, Lee et al [17] used an artificial neural network-based model and 104 samples to predict ventricular tachycardia 1 hour before occurrence. Yoon et al [54] adopted a random forest-based model and 1494 samples to achieve detection 75 minutes in advance. Our real-time prediction model, using a deep neural architecture on 4878 samples, demonstrated better and more robust performance than multiple baseline models, which included artificial neural network and random forest models, when predicting tachycardia onset 0 to more than 6 hours beforehand.
It is necessary for clinicians to combine a patient's current symptoms, basic information, and past medical history to diagnose disease severity [55]. For example, the proportion who might have cardiovascular disease and the risk of sustained high heart rate is not the same for patients of different ages with different histories of disease. This useful information is usually recorded in electronic health records. Recently, several researchers have tried to combine the analysis of 2 kinds of materials to represent comprehensive information and improve the performance of the models: Xu et al [56] proposed a model to predict physiological decompensation and length of ICU stay by analyzing ECG and medical records data, and Nemati et al [57] employed high-resolution vital signs and electronic health records to achieve early sepsis prediction. However, little attention has been paid to tachycardia prognosis. In this paper, we integrated electronic health record and biosensor data to accomplish early prediction. The results of subexperiment 1 show that fusing electronic health record information can improve the accuracy of early prediction compared with the LSTM and convolutional neural network models.
Risk prediction is a core task in the artificial intelligence-assisted medical domain. Cardiovascular disease prediction models based on electronic health record analysis have been studied [58-60]. Doctor AI [58] requires diagnosis codes, medication codes, or procedure codes to achieve multilabel predictions including heart failure. Jin et al [60] utilized 1864 diagnostic events to train a sequential model to predict the risk of heart failure, but because the model requires this additional information, it cannot be used in hospitals with low information integration or in homes. Deep learning models using ECG signals have also been used for predictive health care tasks [61]. Although ECG signals are susceptible to interference from physical artifacts, sensors can obtain heart rate using photoplethysmography instead of ECG signals. Therefore, models based on core vital signs can easily be used to improve prediction performance. We selected 3 vital signs and 5 types of personal information that can easily be acquired from wearable sensors and hospital information systems, respectively. TOP-Net was developed using a large data set and transferred to our actual demand scenario. The results show that it has the potential to be used in the ICU and the general ward and that it could also be extended to home use. Table 6 presents a comparison between TOP-Net and other state-of-the-art approaches based on input information, model types, scenario for evaluating the model, sample sizes, and performance.

Limitations
This study had some limitations. Because SensEcho was deployed in the clinic for only 1 year after our research project began, the limited data collected prevented us from directly developing a general ward model. Moreover, interventions such as beta-blocker medication may affect the occurrence of tachycardia onset and cause it to not be captured by the input features. Electronic health records contain rich information such as laboratory tests, clinical orders, and nursing notes that can characterize a patient's health status and depict the trajectory of diseases. Further studies involving the integration of multivariate timeseries from electronic health records are expected to improve the prediction performance of tachycardia onset, and more data from the general ward for TOP-Net performance evaluation are required.

Conclusions
We developed TOP-Net for real-time evaluation and early prediction of the risk of tachycardia onset, which made it possible to achieve an early forecast of tachycardia onset 6 hours in advance with clinically acceptable performance. TOP-Net was assessed using 6 metrics, 3 subexperiments, and different prediction times from 0 to 6 hours. The comparison between TOP-Net and the other 5 approaches (2 deep learning models, 2 ensemble models, and 1 artificial neural network model) showed that TOP-Net was superior to the other models. The model with personal information from electronic health records had better performance than those without. The easily accessible input data of the model (3 vital signs and 5 types of personal information) and the good performance of the transferred model in the general ward indicate that early prediction of tachycardia onset using wearable sensors is possible in hospitals or homes.