This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Timely decisions during the initial stages of care are important for patients with acute illness. However, few studies have determined patient prognosis based on the limited laboratory data available during the initial stages of treatment.
This study aimed to develop and validate time adaptive prediction models to predict the severity of illness in the emergency department (ED) using highly sparse laboratory test data (test order status and test results) and a machine learning approach.
This retrospective study used ED data from a tertiary academic hospital in Seoul, Korea. Two different models were developed based on laboratory test data: order status only (OSO) and order status and results (OSR) models. A binary composite adverse outcome was used, including mortality or hospitalization in the intensive care unit. Both models were evaluated using various performance criteria, including the area under the receiver operating characteristic curve (AUC) and balanced accuracy (BA). Clinical usefulness was examined by determining the positive likelihood ratio (PLR) and negative likelihood ratio (NLR).
Of 9491 eligible patients in the ED (mean age 55.2 years, SD 17.7 years; 4839/9491, 51.0% women), the model development cohort and validation cohort included 6645 and 2846 patients, respectively. The OSR model generally exhibited better performance (AUC=0.88, BA=0.81) than the OSO model (AUC=0.80, BA=0.74). The OSR model was also more informative than the OSO model for identifying patients at low or high risk of adverse outcomes (OSR: PLR=4.22, NLR=0.23; OSO: PLR=3.10, NLR=0.39).
Early-stage adverse outcomes for febrile patients could be predicted using machine learning models of highly sparse data, including test order status and laboratory test results. This prediction tool could help medical professionals who are simultaneously treating the same patient share information, facilitate dynamic communication, and consequently prevent medical errors.
For time-sensitive diseases, timely decisions are essential; however, data availability is extremely limited in the early stages of medical care.
Biomarkers, especially those obtained from laboratory data, play a key role in clinical decisions in emergency settings.
Previous studies have focused on utilizing a sufficient amount of laboratory test data. Most predictive models have been developed over long intervals, such as predicting mortality within 24 or 48 hours rather than over earlier periods; with these longer windows, researchers are assured of adequate information from test results.
This study aimed to develop time adaptive models that predict adverse outcomes for febrile patients in the emergency department (ED) based on a machine learning approach and highly sparse data.
This retrospective study was conducted with ED data from a tertiary academic hospital in Seoul, Korea. The hospital has approximately 2000 beds. The outpatient department sees an average of approximately 9000 patients per day, while the ED sees approximately 220 patients per day. Since the opening of its comprehensive cancer center in 2003, the hospital has had a large proportion of oncology patients undergoing both surgical and medical procedures. This study was approved by the institutional review board of the study site (IRB File No: SMC 2018-08-125). This report follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.
Data were obtained from a clinical data warehouse containing medical data for research, which enables de-identification and retrieval of patient information from electronic medical records for research purposes. It uses global standard terminology and provides near real-time data through daily updates. In addition to basic patient demographics, it provides information on tests, medications, diagnoses, and operations.
Patients who visited the ED from March 2017 through February 2019 were included in the study. Then, only febrile patients (body temperature >38°C) were retained for analysis.
We used a binary composite outcome for severity, defined as death or admission to the intensive care unit after transfer from the ED.
Only laboratory test data were used as predictors, and the list of laboratory tests was selected by experts. Predictors were selected based on the typical ED process, in which all possible laboratory tests could be performed after the initial assessment by physicians.
The laboratory test data provide the order status and result for each laboratory test, and all the variables were categorized. Order status indicates whether a patient has an order for a laboratory test, and the test result reflects whether it was normal, abnormal, or not reported. When a test was repeatedly performed, only the first test data were included. We developed two predictive models using these laboratory test data: an order status only (OSO) model and an order status and results (OSR) model.
Process flow in the emergency department.
Representative example of the range of predictors for each model, where each row indicates a patient’s record of laboratory tests. Additional laboratory tests and patient records can be added. Order status, which indicates whether the test was ordered, was used in the OSO model. The OSR model was developed using order status and test results, which had three levels: normal, abnormal, and NA (not reported). CRP: C-reactive Protein; Hb: hemoglobin.
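As a concrete illustration of the two encodings, the sketch below (Python, purely illustrative; the study's analysis was done in R, and the test names and values here are hypothetical) derives OSO and OSR features from one patient's record. In the full OSR model, the result levels are used in addition to the order status variables.

```python
# One hypothetical patient's laboratory record; None means the test was
# never ordered, so no result exists.
labs = {"CRP": "abnormal", "Hb": "normal", "glucose": None}

# OSO encoding: only whether each test was ordered (1) or not (0).
oso_features = {test: int(result is not None) for test, result in labs.items()}

# OSR result encoding: the categorized result, with unreported tests
# mapped to the explicit level "NA" (not reported).
osr_features = {test: (result if result is not None else "NA")
                for test, result in labs.items()}

print(oso_features)  # {'CRP': 1, 'Hb': 1, 'glucose': 0}
print(osr_features)  # {'CRP': 'abnormal', 'Hb': 'normal', 'glucose': 'NA'}
```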
For laboratory tests that are ordered for only a few patients, per-test order status information causes severe data sparsity. Rather than using the order status of each of those tests individually, new aggregate variables were introduced. First, rarely ordered tests (ROTs) were identified as tests with an order rate <5%. The new variable ROT was defined as the number of ROTs ordered for each patient. Likewise, for tests that rarely yield abnormal results (RARs; <5% of results are abnormal), a new variable RAR was defined as the number of abnormal results obtained among those tests for each patient.
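The ROT aggregation can be sketched as follows (Python for illustration; the study's analysis was done in R, and the test names and 25-patient demo cohort are hypothetical). The RAR variable is built analogously from abnormal-result flags.

```python
def find_rots(order_matrix, threshold=0.05):
    """Identify rarely ordered tests: order rate below `threshold`."""
    n_patients = len(order_matrix)
    return [test for test in order_matrix[0]
            if sum(row[test] for row in order_matrix) / n_patients < threshold]

def collapse_rots(order_matrix, rots):
    """Replace the per-test order flags of ROTs with one count variable."""
    collapsed = []
    for row in order_matrix:
        kept = {test: flag for test, flag in row.items() if test not in rots}
        kept["ROT"] = sum(row[test] for test in rots)  # number of ROTs ordered
        collapsed.append(kept)
    return collapsed

# 25 hypothetical patients: ammonia is ordered for only 1/25 = 4% (<5%),
# so it qualifies as a ROT and is folded into the count.
patients = [{"CRP": 1, "Hb": 1, "ammonia": int(i == 0)} for i in range(25)]
rots = find_rots(patients)               # ['ammonia']
reduced = collapse_rots(patients, rots)
print(reduced[0])                        # {'CRP': 1, 'Hb': 1, 'ROT': 1}
```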
Patients were randomly assigned to two cohorts for model development (70%) and validation (30%), which had similar distributions with respect to the outcome. We applied and compared various machine learning methods, including random forest (RF), support vector machine, and logistic regression with least absolute shrinkage and selection operator, ridge, and elastic net (EN) regularization.
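A minimal sketch of such an outcome-stratified 70/30 split (Python, standard library only, for illustration; the authors worked in R, and the records below are synthetic):

```python
import random

def stratified_split(records, outcome_key, dev_frac=0.7, seed=42):
    """Randomly split records while keeping outcome rates similar."""
    rng = random.Random(seed)
    dev, val = [], []
    # Split each outcome class separately, then pool the pieces.
    for cls in sorted({r[outcome_key] for r in records}):
        group = [r for r in records if r[outcome_key] == cls]
        rng.shuffle(group)
        cut = round(len(group) * dev_frac)
        dev.extend(group[:cut])
        val.extend(group[cut:])
    return dev, val

# 1000 synthetic patients with a 5% adverse outcome rate.
records = [{"id": i, "adverse": int(i % 20 == 0)} for i in range(1000)]
dev, val = stratified_split(records, "adverse")
print(len(dev), len(val))                       # 700 300
print(sum(r["adverse"] for r in dev),
      sum(r["adverse"] for r in val))           # 35 15
```

Because each class is shuffled and cut independently, both cohorts keep the same 5% adverse outcome rate.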
The predictive models were evaluated with the validation cohort using various performance measures, such as the area under the receiver operating characteristic curve (AUC), area under the precision recall curve (AUPRC), balanced accuracy (BA), sensitivity, specificity, precision, positive likelihood ratio (PLR), and negative likelihood ratio (NLR).
Class imbalance existed in our outcome data. This can lead to poor classifier performance because it can create bias toward the majority class, and the classifier may not be able to distinguish noise from individuals in the minority class.
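The text does not name a specific rebalancing technique here; one common option, random undersampling of the majority class, can be sketched as below (Python, illustrative only; the synthetic cohort mirrors the study's 9059/432 outcome split, but the records themselves are fabricated for the demo).

```python
import random

def undersample_majority(records, outcome_key, ratio=1.0, seed=7):
    """Keep all minority cases; sample the majority down to ratio x minority."""
    rng = random.Random(seed)
    minority = [r for r in records if r[outcome_key] == 1]
    majority = [r for r in records if r[outcome_key] == 0]
    keep = min(len(majority), round(len(minority) * ratio))
    balanced = rng.sample(majority, keep) + minority
    rng.shuffle(balanced)
    return balanced

# Synthetic cohort matching the study's imbalance: 432 adverse of 9491.
records = [{"id": i, "adverse": int(i % 22 == 0)} for i in range(9491)]
balanced = undersample_majority(records, "adverse")
print(len(balanced), sum(r["adverse"] for r in balanced))  # 864 432
```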
Preprocessing was conducted using R version 3.4.4.
A total of 154,402 patients visited the ED between March 1, 2017, and February 28, 2019. Based on the inclusion and exclusion criteria, 9491 patients remained in the final dataset used for modeling.
The three most frequently ordered laboratory tests were C-reactive protein, chloride, and sodium. Among a total of 286 laboratory tests after preprocessing, 201 ROTs (order rate <5%) and 231 tests with RARs (abnormal rate <5%) were identified. The OSO model had 85 order status variables as well as the ROT variable. Similarly, the OSR model had 55 result variables, the RAR variable, and the variables in the OSO model.
Baseline characteristics of the total sample and comparisons between the two patient cohorts used to develop and validate the two models.
Characteristic | Total sample (N=9491) | Model development cohort (n=6645) | Model validation cohort (n=2846) | P value
Sex, n (%) | | | |
Female | 4839 (51.0) | 3399 (51.2) | 1440 (50.6) | .64
Male | 4652 (49.0) | 3246 (48.8) | 1406 (49.4) |
Age (years), mean (SD) | 55.2 (17.7) | 55.0 (17.8) | 55.6 (17.6) | .17
Mode of arrival, n (%) | | | |
Other | 7483 (78.8) | 5224 (78.6) | 2259 (79.4) | .42
Ambulance | 2008 (21.2) | 1421 (21.4) | 587 (20.6) |
Visit type, n (%) | | | |
Indirect | 1217 (12.8) | 830 (12.5) | 387 (13.6) | .15
Direct | 8274 (87.2) | 5815 (87.5) | 2459 (86.4) |
Mental status, n (%) | | | |
Alert | 9206 (97.0) | 6449 (97.1) | 2757 (96.9) | .69
Not alert | 285 (3.0) | 196 (2.9) | 89 (3.1) |
Heart rate, n (%) | | | |
Normal (60-120 beats per minute) | 7108 (75.1) | 4956 (74.8) | 2152 (75.7) | .35
Abnormal (<60 or >120 beats per minute) | 2361 (24.9) | 1671 (25.2) | 690 (24.3) |
Respiratory rate, n (%) | | | |
Normal (10-30 breaths per minute) | 9399 (99.2) | 6577 (99.2) | 2822 (99.2) | 1.00
Abnormal (<10 or >30 breaths per minute) | 73 (0.8) | 51 (0.8) | 22 (0.8) |
Systolic blood pressure, n (%) | | | |
Normal (90-140 mmHg) | 7212 (76.0) | 5044 (75.9) | 2168 (76.2) | .80
Abnormal (<90 or >140 mmHg) | 2279 (24.0) | 1601 (24.1) | 678 (23.8) |
Diastolic blood pressure, n (%) | | | |
Normal (60-90 mmHg) | 6790 (71.5) | 4738 (71.3) | 2052 (72.1) | .44
Abnormal (<60 or >90 mmHg) | 2701 (28.5) | 1907 (28.7) | 794 (27.9) |
SpO2a, n (%) | | | |
Normal (>90) | 9070 (97.4) | 6356 (97.5) | 2714 (97.2) | .48
Abnormal (<90) | 245 (2.6) | 166 (2.5) | 79 (2.8) |
Outcome, n (%) | | | |
Normal | 9059 (95.4) | 6342 (95.4) | 2717 (95.5) | .99
Composite adverse outcomeb | 432 (4.6) | 303 (4.6) | 129 (4.5) |
aSpO2 : peripheral oxygen saturation.
bDefined as death or admission to the intensive care unit.
The OSO and OSR models were each developed based on 5 different algorithms. The RF-based models were selected as the final predictive OSO and OSR models because they had better overall performance on most of the evaluation measures, including specificity and precision.
Internal validation of the models using different laboratory information, reported as the score and 95% CI.
Measurea | MEWSb | OSOc | OSRd | Differencee | Differencef
AUCg | 0.68 (0.63 to 0.73) | 0.80 (0.76 to 0.84) | 0.88 (0.85 to 0.91) | 0.12 (0.12 to 0.12) | 0.08 (0.08 to 0.08) |
AUPRCh | 0.14 (0.10 to 0.20) | 0.25 (0.18 to 0.33) | 0.39 (0.30 to 0.47) | 0.11 (0.11 to 0.11) | 0.14 (0.14 to 0.14) |
Sensitivity | 0.49 (0.42 to 0.61) | 0.70 (0.62 to 0.82) | 0.81 (0.76 to 0.89) | 0.22 (0.21 to 0.22) | 0.10 (0.10 to 0.10) |
Specificity | 0.82 (0.66 to 0.83) | 0.78 (0.66 to 0.83) | 0.81 (0.75 to 0.83) | –0.04 (–0.04 to –0.04) | 0.04 (0.04 to 0.04) |
Balanced accuracy | 0.65 (0.62 to 0.69) | 0.74 (0.71 to 0.77) | 0.81 (0.78 to 0.84) | 0.09 (0.09 to 0.09) | 0.07 (0.07 to 0.07) |
Precision | 0.11 (0.08 to 0.14) | 0.13 (0.10 to 0.16) | 0.17 (0.13 to 0.20) | 0.02 (0.02 to 0.02) | 0.04 (0.04 to 0.04) |
F1 score | 0.18 (0.14 to 0.22) | 0.22 (0.17 to 0.26) | 0.28 (0.23 to 0.32) | 0.04 (0.04 to 0.04) | 0.06 (0.06 to 0.06)
PLRi | 2.68 (1.76 to 3.27) | 3.10 (2.25 to 4.29) | 4.22 (2.92 to 4.94) | 0.49 (0.48 to 0.5) | 1.07 (1.06 to 1.08) |
NLRj | 0.63 (0.49 to 0.73) | 0.39 (0.24 to 0.49) | 0.23 (0.12 to 0.31) | –0.25 (–0.25 to –0.25) | –0.14 (–0.15 to –0.14) |
aCalculations were completed with the validation set, and 95% CIs were computed using 2000 bootstrap replicates for each performance measure.
bMEWS: Modified Early Warning Score.
cOSO: model with order status only.
dOSR: model with both order status and test result.
eDifference in each performance measure between the MEWS and OSO model.
fDifference in each performance measure between the OSO and OSR models.
gAUC: area under the receiver operating characteristic curve.
hAUPRC: area under the precision recall curve.
iPLR: positive likelihood ratio.
jNLR: negative likelihood ratio.
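Balanced accuracy and the likelihood ratios in the table above are simple functions of sensitivity and specificity; the sketch below (Python, illustrative only) recomputes them from the table's point estimates. Small gaps versus the tabulated PLR and NLR are expected, since the tabulated values come from bootstrap replicates and rounding.

```python
def derived_metrics(sensitivity, specificity):
    """Summary measures that follow directly from sensitivity/specificity."""
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "PLR": sensitivity / (1 - specificity),   # positive likelihood ratio
        "NLR": (1 - sensitivity) / specificity,   # negative likelihood ratio
    }

osr = derived_metrics(0.81, 0.81)   # OSR point estimates from the table
oso = derived_metrics(0.70, 0.78)   # OSO point estimates from the table

print(round(osr["balanced_accuracy"], 2))           # 0.81 (matches the table)
print(round(osr["PLR"], 2), round(osr["NLR"], 2))   # 4.26 0.23 (table: 4.22, 0.23)
print(round(oso["balanced_accuracy"], 2))           # 0.74 (matches the table)
```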
Compared with the OSO model, the OSR model showed significant improvement in the AUC (by 0.08) and BA (by 0.07). Additionally, the OSR model was more informative than the OSO model in identifying patients at low or high risk of adverse outcomes.
Important variables selected from the RF-based and EN-based models were moderately correlated in terms of their variable importance and odds ratios, respectively.
The data had severe outcome imbalance: 95.4% (9059/9491) in the majority class and 4.6% (432/9491) in the minority class. However, a sensitivity analysis that calibrated the imbalance with various reduction scenarios did not reveal any considerable improvement in prediction performance. Therefore, our models were not substantially affected by the imbalance problem.
The curves indicate how the actual outcomes developed over time when patients were divided into high-risk and low-risk categories as predicted by the OSR model. The graph was plotted using the Kaplan-Meier method, and the P value shows the log-rank test result.
In this study, we developed a time adaptive model to predict adverse outcomes for patients in the ED. These patients are likely to have insufficient and unconfirmed clinical information, especially in the early stages of the ED process. The OSO model, which only utilizes test order status, supports our hypothesis that it is feasible to predict patient prognosis based only on the fact that a laboratory test has been ordered and without the test results. Patient demographics or vital signs were also not required for the prediction.
A considerable number of laboratory tests must be considered for febrile patients. The ED receives patients with many different illnesses, and febrile patients in particular present with a variety of diseases. Fever is also the most common sign of potential sepsis.
The OSO model mimics the ED physician's clinical reasoning process in practical settings, whereas previously developed prediction models are limited by their use of confirmed results only.
In modern medicine, a multidisciplinary approach is a cornerstone of high-quality care.
This study could be expanded further by including vital signs, procedures, and medications for better prediction. In addition, the application could be broadened to include diagnosis as well as adverse outcomes, especially for diseases where the patient’s response over time after a particular treatment is important. Additionally, it can be extended to anticipate clinical decisions, which may be integrated as clinical decision support. The time variable is the most essential component for these predictions, and this model has successfully shown its feasibility.
This study has some limitations. First, the models were developed and internally validated using data from a single large hospital. Although cross-validation was performed with repetition and optimization over several candidate hyperparameter settings, along with survival analysis to increase clinical impact, further studies are required for external and prospective validation.
Second, the primary parameters such as laboratory results, which may vary across individuals and clinical fields, were from febrile patients. Therefore, it could be difficult to apply these to other populations, although we attempted to include as many tests as possible. However, we believe the important variables that were selected related to laboratory tests from the 2 models are clinically relevant for the outcome variables, so there is potential to extend the models to other target populations in future studies.
Third, the OSO and OSR models were not developed with a continuous time sequence. Instead of creating a continuous model, we tried to build representative models to reflect test order status and results. Further research will be required to create a continuous model for practical use, which can be applied to various time thresholds.
Last, the data imbalance could have affected the performance of models developed using raw data. Although various methods for dealing with imbalanced data were applied during model development and validation, only a few of the available algorithms for calibrating class imbalance were used. The use of other algorithms might have changed the results, even though the results in this study did not improve significantly after the imbalance was addressed. Therefore, additional algorithms for handling imbalanced data should be examined in future studies.
Adverse outcomes during the early stages for febrile patients could be predicted using a time adaptive model and machine learning approach based on the highly sparse data from test order status and laboratory test results.
Supplemental code for developing models (OSO).
Number of laboratory logs per person.
Performance of the OSO and OSR models.
Receiver operating characteristic (ROC) curves for models.
Important variables selected from developed models.
Model performance without and with applying imbalance-easing techniques.
AUC: area under the receiver operating characteristic curve
AUPRC: area under the precision recall curve
BA: balanced accuracy
CRP: C-reactive protein
ED: emergency department
EN: elastic net
Hb: hemoglobin
MEWS: Modified Early Warning Score
NLR: negative likelihood ratio
OSO: order status only
OSR: order status and result
PLR: positive likelihood ratio
RAR: number of rarely detected results
RF: random forest
ROT: number of rarely ordered tests
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI19C0275).
WCC and KK designed the concept of study, and WCC acquired the data. SL and KK conducted all data analysis, and all authors interpreted the results. SL, WCC, and KK wrote the draft, and all authors contributed to critical revision of the manuscript. WCC and KK contributed equally as corresponding authors.
None declared.