Evaluation of the Need for Intensive Care in Children With Pneumonia: Machine Learning Approach

Background: Timely decision-making regarding intensive care unit (ICU) admission for children with pneumonia is crucial for a better prognosis. Despite attempts to establish a guideline or triage system for evaluating ICU care needs, no clinically applicable paradigm is available.
Objective: The aim of this study was to develop machine learning (ML) algorithms to predict ICU care needs for pediatric pneumonia patients within 24 hours of admission, evaluate their performance, and identify clinical indices for making decisions for pediatric pneumonia patients.
Methods: Pneumonia patients admitted to National Taiwan University Hospital from January 2010 to December 2019 aged under 18 years were enrolled. Their underlying diseases, clinical manifestations, and laboratory data at admission were collected. The outcome of interest was ICU transfer within 24 hours of hospitalization. We compared clinically relevant features between early ICU transfer patients and patients without ICU care. ML algorithms were developed to predict ICU admission. The performance of the algorithms was evaluated using sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and average precision. The relative feature importance of the best-performing algorithm was compared with physician-rated feature importance for explainability.
Results: A total of 8464 pediatric hospitalizations due to pneumonia were recorded, and 1166 (1166/8464, 13.8%) hospitalized patients were transferred to the ICU within 24 hours. Early ICU transfer patients were younger (P<.001), had higher rates of underlying diseases (eg, cardiovascular, neuropsychological, and congenital anomaly/genetic disorders; P<.001), had abnormal laboratory data, had higher pulse rates (P<.001), had higher breath rates (P<.001), had lower oxygen saturation (P<.001), and had lower peak body temperature (P<.001) at admission than patients without ICU transfer. The random forest (RF) algorithm achieved the best performance (sensitivity 0.94, 95% CI 0.92-0.95; specificity 0.94, 95% CI 0.92-0.95; AUC 0.99, 95% CI 0.98-0.99; and average precision 0.93, 95% CI 0.90-0.96). The lowest systolic blood pressure and presence of cardiovascular and neuropsychological diseases ranked in the top 10 in both RF relative feature importance and clinician judgment.
Liu et al. JMIR Med Inform 2022 | vol. 10 | iss. 1 | e28934 | https://medinform.jmir.org/2022/1/e28934


Introduction
Despite recent advances in vaccine development, pneumonia remains a major cause of hospitalization and mortality in children in Taiwan and worldwide [1,2]. New pathogens, such as the recent coronavirus causing COVID-19, continue to cause outbreaks of pneumonia and other severe respiratory infections [3,4]. For hospitalized patients with critical conditions, the timely decision to admit them to the intensive care unit (ICU) is crucial for better prognosis and overall medical care quality [5]. The decision is usually made by doctors based on clinical criteria (eg, chief complaint, symptoms/signs, vital signs) and laboratory criteria (eg, microbiology tests, complete blood count, biochemical examinations). However, no well-structured or quantitative approach exists.
The community-acquired pneumonia management guidelines from the Pediatric Infectious Diseases Society and the Infectious Diseases Society of America [6] recommend that pediatric patients who need ventilation, have low blood pressure, or have low oxygen saturation be admitted to the ICU for pneumonia. Other risk factors, including white blood cell count and hemoglobin, have been associated with exacerbation among pneumonia patients during hospitalization [7]. Some studies have tried to develop clinical scoring systems to standardize prognosis and disease exacerbation evaluations. For example, a modified version of the Sequential Organ Failure Assessment score for children used vital signs (blood pressure, oxygen saturation), laboratory data (creatinine, platelet count), and medications to evaluate the risk of in-hospital mortality [8].
Other scoring systems, such as the Pediatric Early Warning Score (PEWS) and Pediatric Advanced Warning Score, have been proposed to assist in evaluating the deterioration of pediatric inpatients [9-11]. Gold et al [12] used a modified version of PEWS calculated at admission to predict ICU admission and reported an area under the receiver operating characteristic curve (AUC) of 0.86. Nevertheless, the varying sensitivity and specificity of these scores, and the degree of human effort they require, have limited their clinical application.
In the era of health data science, using large amounts of patient data to develop algorithms to solve clinical problems has become an important approach [13-18]. For example, Makino et al [19] applied a logistic regression model to predict aggravation of diabetic kidney disease 180 days after discharge using patient demographic data, lab tests, diagnosis codes, and medical history. Their model reached an AUC of 0.74 [19]. Studies conducted in the emergency service setting showed promising results in triaging patients with asthma and chronic obstructive pulmonary disease [20]. In the critical care setting, Zhang et al [16] developed an ensemble model for the prediction of agitation in invasive mechanical ventilation patients under light sedation. An automated electronic health records model to identify patients at high risk of acute respiratory failure or death was validated retrospectively and prospectively and was determined to be feasible for real-time risk identification [17]. Artificial intelligence technology is assisting us with interpreting complex data from critical patients, such as patients with acute respiratory distress syndrome (ARDS), and enables us to further improve the management of critically ill patients with individual treatment plans [18]. In these studies, machine learning (ML) algorithms were usually implemented because of their strength in incorporating large data sets and exploring the hidden relationships among features [13,14]. The most common type of clinical task (eg, determining whether the patient has a specific diagnosis, the clinical severity, and the prognosis, such as survival after a specific period) was classification. Decision tree-based models usually yield the most promising results in these clinical scenarios because of their strength in classification tasks [14,20,21].
A computer-aided prognosis prediction framework has also been applied to evaluate deterioration of pediatric inpatients. Zhai et al [22] used electronic health records in a single medical center to predict the need for pediatric intensive care within the first 24 hours of admission. Their logistic regression model reached an AUC of 0.91. Mayampurath et al [23] used 6 common vital signs (eg, temperature, pulse, blood pressure) to predict an ICU transfer event up to 36 hours in advance, reaching AUCs of 0.7-0.8. Rubin et al [24] applied a boosted trees model to electronic health records to predict pediatric ICU transfer at most 2 hours to 8 hours in advance with an AUC of 0.85. These deterioration evaluation models showed promising results with general pediatric patients.
Most ML studies for pneumonia patients have focused on using clinical imaging data for diagnosis or mortality [25-27]. Few studies have explored the possibility of developing an ML-based prediction framework to evaluate the need for intensive care among pediatric pneumonia patients and to yield clinically applicable performance. Therefore, we aimed to use clinical data from children with pneumonia to develop ML algorithms to predict the need for ICU transfer within 24 hours of admission, which could support physician decision-making.

Data Source
We enrolled pneumonia patients aged under 18 years admitted to the National Taiwan University Hospital from January 2010 to December 2019. The clinical data for enrolled patients were retrieved from the National Taiwan University Hospital-integrated Medical Database, and all data were de-identified before being analyzed. The institutional review board of the National Taiwan University Hospital approved this study and the use of de-identified electronic health records (201912131RINB).
The diagnosis of pneumonia was determined from the hospital records if both of the following criteria were met: (1) clinical manifestation of respiratory tract infection at admission, including symptoms (eg, dyspnea, rhinorrhea, cough, sputum), abnormal breath sounds (eg, rales, crackles, rhonchi), or a preliminary diagnosis recorded within 24 hours of admission (see Table S1 in Multimedia Appendix 1), and (2) the International Classification of Disease, ninth revision (ICD-9) and tenth revision (ICD-10) diagnostic codes related to pneumonia at discharge (see Table S2 in Multimedia Appendix 1).

Collection of Clinically Relevant Features
Data including demographics, underlying diseases, vital signs, pathogens, and laboratory data, which were available within 24 hours of hospitalization and prior to ICU transfer, were collected and included in the statistical analysis, model training, and performance evaluation, as seen in Table S3 in Multimedia Appendix 1. Underlying diseases were identified using ICD-9 and ICD-10 codes. The aforementioned clinically relevant features associated with pneumonia prognosis were also selected and ranked by 3 pediatricians specializing in pediatric infectious diseases, with 5, 10, and over 20 years of experience. Features with a missing rate greater than 30% in the cohort were excluded.
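The missing-rate rule described above can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes the features sit in a pandas DataFrame with missing values encoded as NaN, and the function name `drop_sparse_features` and the toy column names are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_sparse_features(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Keep only columns whose fraction of missing values is at most max_missing."""
    missing_rate = df.isna().mean()  # per-column fraction of NaN entries
    keep = missing_rate[missing_rate <= max_missing].index
    return df.loc[:, keep]

# Toy data; the column names are hypothetical, not the study's feature list.
toy = pd.DataFrame({
    "age_years": [1.0, 4.0, 7.0, 2.0, 9.0],                 # 0% missing, kept
    "lowest_spo2": [95.0, np.nan, 97.0, 92.0, 98.0],        # 20% missing, kept
    "procalcitonin": [np.nan, np.nan, 0.4, np.nan, 1.2],    # 60% missing, dropped
})
filtered = drop_sparse_features(toy)
print(list(filtered.columns))
```

A feature with a missing rate of exactly 30% is kept here, because the text excludes only rates greater than 30%.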

Outcome of Interest
The outcome of interest was ICU admission within 24 hours of hospitalization, including direct admission to the ICU from the emergency department, or death within 24 hours of hospitalization. Therefore, patients transferred to the ICU after 24 hours of admission were excluded. Readmissions due to pneumonia within 14 days or due to other conditions within 3 days were also excluded because they might be related to the previous admission. The cohort was thus categorized into 2 groups: early ICU transfer (ie, patients transferred to the ICU or who died within 24 hours of admission) and no ICU admission (ie, patients who were not admitted to the ICU through discharge).
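The grouping rule above can be written as a small labeling function. This is a sketch under stated assumptions: times are expressed as hours since admission, the 24-hour boundary is treated as inclusive, and a death after 24 hours without ICU care is counted as no ICU admission. The function and parameter names are hypothetical, and the readmission exclusion is passed in as a precomputed flag rather than derived here.

```python
from typing import Optional

EARLY_WINDOW_HOURS = 24  # outcome window described in the text

def label_admission(hours_to_icu: Optional[float],
                    hours_to_death: Optional[float],
                    is_excluded_readmission: bool = False) -> Optional[str]:
    """Return 'early_icu', 'no_icu', or None when the admission is excluded."""
    if is_excluded_readmission:
        return None  # readmission that may relate to a previous admission
    early_icu = hours_to_icu is not None and hours_to_icu <= EARLY_WINDOW_HOURS
    early_death = hours_to_death is not None and hours_to_death <= EARLY_WINDOW_HOURS
    if early_icu or early_death:
        return "early_icu"
    if hours_to_icu is not None:
        return None  # ICU transfer after 24 hours: excluded from the cohort
    return "no_icu"  # never admitted to the ICU through discharge

print(label_admission(0.0, None))    # direct ICU admission from the emergency department
print(label_admission(30.0, None))   # late ICU transfer, excluded
print(label_admission(None, None))   # ward stay only
```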

Statistical Analysis
In addition to descriptive analyses, we used chi-square tests for categorical variables to compare differences between the early ICU transfer group and the no ICU admission group. For numerical variables, the Shapiro-Wilk test was used to test normality, the Mann-Whitney U test was used for between-group comparisons if the data were not normally distributed, and the t test was used if data were normally distributed. The Benjamini-Hochberg procedure was applied to adjust for multiple comparisons. Adjusted P values <.05 were considered significant.
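The test-selection logic and the multiple-comparison adjustment can be sketched with SciPy and NumPy. Several details are assumptions because the text does not specify them: the t test is used only when both groups pass the Shapiro-Wilk test at the 0.05 level, the t test is the default equal-variance version, and the Benjamini-Hochberg adjustment is implemented by hand. The function names are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_numeric(a, b, alpha=0.05):
    """Choose the t test or Mann-Whitney U test from Shapiro-Wilk normality checks."""
    # Assumption: both groups must look normal for the t test to be used.
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "t", stats.ttest_ind(a, b).pvalue
    return "mannwhitney", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue

def compare_categorical(table):
    """Chi-square test on a contingency table (rows: groups, columns: categories)."""
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return p

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity, working down from the largest P value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Toy usage with deterministic, strongly skewed data (not study data).
group_a = np.exp(np.linspace(0.0, 5.0, 100))
group_b = np.exp(np.linspace(0.5, 5.5, 100))
test_name, p_numeric = compare_numeric(group_a, group_b)
p_categorical = compare_categorical([[30, 70], [55, 45]])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(test_name, adjusted)
```

Adjusted values below .05 would then be treated as significant, matching the threshold stated above.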

Model Training and Performance Evaluation
Based on previous research, we developed a logistic regression model as a baseline reference. Then, we trained random forest (RF) and eXtreme Gradient Boosting (XGB) models because of their promising performance on clinical classification tasks [14,16,17,20,28-31]. For model training, the data set was separated into development and validation sets at a 4:1 ratio via random selection. The ML models were trained using the development set with 5-fold cross-validation. The performance was then evaluated using the independent validation set. The accuracy, sensitivity (recall), specificity, positive predictive value (precision), negative predictive value, AUC, and average precision were calculated to compare different algorithms and thresholds.
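The split-and-validate workflow described above can be sketched with scikit-learn. This is an illustration on synthetic data, not the study's pipeline: the hyperparameters, the stratified split, and the random seeds are assumptions, and the XGB model is omitted to keep the example free of extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the clinical table (41 features, roughly 14% positives).
X, y = make_classification(n_samples=2000, n_features=41, n_informative=10,
                           weights=[0.86, 0.14], random_state=0)

# 4:1 development/validation split; stratifying by outcome is an assumption.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),  # baseline
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the development set only.
    cv_auc = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")
    # Refit on the full development set, then score the held-out validation set.
    model.fit(X_dev, y_dev)
    prob = model.predict_proba(X_val)[:, 1]
    results[name] = {
        "cv_auc_mean": float(cv_auc.mean()),
        "val_auc": float(roc_auc_score(y_val, prob)),
        "val_average_precision": float(average_precision_score(y_val, prob)),
    }

for name, metrics in results.items():
    print(name, metrics)
```

Keeping the validation set out of cross-validation, as here, is what makes it an independent check of the tuned model.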
We chose 3 points to operationalize the best performing model: the points with the highest Youden index [32], high specificity (0.99), and high sensitivity (0.99), which could be applied in different clinical scenarios. The CI was estimated using bootstrapping methods with 1000 samples.
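As a rough sketch of how such operating points and intervals can be derived, the code below picks the threshold that maximizes the Youden index (sensitivity + specificity - 1), picks a threshold that reaches a target sensitivity, and estimates a percentile bootstrap CI. The percentile method, the helper names, and the toy scores are assumptions; the text states only that bootstrapping with 1000 samples was used.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Return (threshold, sensitivity, specificity) at the maximum Youden index."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = int(np.argmax(tpr - fpr))  # J = sensitivity + specificity - 1
    return float(thresholds[best]), float(tpr[best]), float(1 - fpr[best])

def threshold_for_sensitivity(y_true, y_score, target=0.99):
    """Return the highest threshold whose sensitivity reaches the target."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = int(np.argmax(tpr >= target))  # first ROC point meeting the target
    return float(thresholds[idx])

def sensitivity(y_true, y_pred):
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")  # resample without positives: metric undefined
    return float((y_pred[positives] == 1).mean())

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Percentile 95% CI from n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        values.append(metric(y_true[idx], y_pred[idx]))
    low, high = np.nanpercentile(values, [2.5, 97.5])
    return float(low), float(high)

# Toy scores (not study data): 80 negatives and 20 positives.
rng = np.random.default_rng(0)
y_true = np.array([0] * 80 + [1] * 20)
y_score = np.where(y_true == 1, rng.normal(0.7, 0.15, 100), rng.normal(0.3, 0.15, 100))

thr, sens, spec = youden_threshold(y_true, y_score)
y_pred = (y_score >= thr).astype(int)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, sensitivity)
print(thr, sens, spec, (ci_low, ci_high))
```

A high-specificity point can be chosen the same way by scanning the ROC points for the lowest threshold whose specificity still meets the target.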

Comparison of Feature Importance Between the ML Model and Physicians
With the best-performing model selected using the aforementioned performance evaluation, we further generated the relative feature importance list using Tree Explainer based on Shapley Additive Explanations (SHAP) values [21]. The relative feature importance was also ranked by 3 physicians using a 5-point scale, and the list was generated by sorting clinical features according to the average of importance scores assessed by the physicians. Then, the relative feature importance list from the ML model was compared with the relative importance ranked by the physicians.
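The physician side of this comparison can be illustrated with a short sketch: average the 5-point ratings per feature, sort, and intersect the top of that list with the top of the model's list. All ratings and the model ranking below are invented for illustration, and the function names are hypothetical. In the study, the model ranking came from SHAP values, which this sketch does not compute.

```python
from statistics import mean

def rank_by_mean_score(scores_by_feature):
    """Sort features by mean rating, highest first; ties are broken alphabetically."""
    averaged = {feature: mean(scores) for feature, scores in scores_by_feature.items()}
    return sorted(averaged, key=lambda feature: (-averaged[feature], feature))

def top_k_overlap(ranking_a, ranking_b, k):
    """Features that appear in the top k of both rankings."""
    return sorted(set(ranking_a[:k]) & set(ranking_b[:k]))

# Hypothetical ratings from 3 physicians on a 5-point scale (not study data).
physician_scores = {
    "immunodeficiency": [5, 5, 4],
    "lowest_spo2": [5, 4, 4],
    "cardiovascular_disease": [4, 4, 4],
    "lowest_systolic_bp": [4, 3, 4],
    "age": [3, 3, 2],
}
physician_ranking = rank_by_mean_score(physician_scores)

# Hypothetical model ranking, standing in for a SHAP-based importance list.
model_ranking = ["lowest_pulse_rate", "age", "cardiovascular_disease",
                 "lowest_systolic_bp", "immunodeficiency"]

shared = top_k_overlap(physician_ranking, model_ranking, k=3)
print(physician_ranking)
print(shared)
```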

Software
All data were managed using the NumPy (version 1. 16

Cohort Description and Between-Group Comparison
A total of 6947 patients from 9065 hospitalizations due to pneumonia were included in the study based on their discharge diagnosis code and status at admission. The text mining algorithm correctly labeled 99.8% of admissions with clinical manifestations of a tentative diagnosis using admission notes, as examined by the authors using 1000 randomly sampled admissions. After 601 admissions were excluded based on the aforementioned exclusion criteria, the final cohort comprised 8464 admissions (Figure 1). The male-to-female ratio was 1
For the RF algorithm, at the point with the highest Youden index, the overall accuracy was 0.936 (95% CI 0.930-0.947), sensitivity was 0.940 (95% CI 0.919-0.954), and specificity was 0.935 (95% CI 0.924-0.952; Figure 2). At this threshold, there is approximately one false positive for every 3.1 positive predictions. At the point of highest sensitivity, which could include most patients with early ICU admission with some false alarms, the specificity was 0.868 (95% CI 0.642-0.917), and the negative predictive value was 0.998 (95% CI 0.995-1.000). At the point of highest specificity, which could avoid the most unnecessary ICU admissions, the sensitivity and positive predictive value (precision) for our RF algorithm were 0.835 (95% CI 0.779-0.886) and 0.897 (95% CI 0.883-0.933), respectively.
Figure 3 shows the top 20 features by relative importance from the RF algorithm based on SHAP values (see Figure S1 in Multimedia Appendix 2 for a complete list). The 5 most important features were lowest pulse rate, peak body temperature, age, lowest diastolic blood pressure, and presence of cardiovascular disease. For physician-rated relative feature importance, the presence of immunodeficiency; lowest oxygen saturation; and presence of solid neoplastic diseases, respiratory diseases, and cardiovascular diseases were considered the most important features (Figure S2 in Multimedia Appendix 3). The presence of cardiovascular diseases, the lowest systolic blood pressure, and the presence of neuropsychological diseases were ranked in the top 10 features with the highest importance measured by both SHAP values in the RF model and physicians' judgment.

Principal Findings
Using the clinical data from 8464 admissions of children with pneumonia, we trained 2 ML algorithms to predict the need for ICU care within 24 hours of admission. Our study showed that ML algorithms could be applied to accurately triage hospitalized pediatric patients with pneumonia and effectively identify those who may need early ICU transfer. The high specificity and sensitivity of our algorithms supported their potential application in real-world clinical scenarios, which could provide a disease-specific alarm for severe conditions with the need for ICU care in a timely manner based on individual patient conditions. Because we only included the available features at admission, this design was considered more practical in clinical use. In addition, the list of feature importance could be explained by the clinical reasoning of human physicians. The explainability further validates the use of the ML approach for the clinical classification task. To our knowledge, our study is the first to explore the possibility of applying ML methods to large clinical data sets for triaging pediatric patients with pneumonia for ICU care.
The identification of a patient with the need for ICU care in the emergency room or in the early stage of the disease might influence medical care quality and clinical outcomes [5,36]. Previous work has revealed the ability to use decision tree-based algorithms to perform classification tasks in clinical scenarios or triage, with some promising preliminary results [13,14,16,17,20,24]. However, applications in clinical classification usually focus on triaging patients with different clinical severities and more general clinical diagnoses, such as respiratory failure, other organ failures, or sepsis [13,14,20,37].
Our work is one of the few studies to focus on a large data set for a specific diagnosis, pediatric pneumonia. Our algorithm performed better than those in previous studies, which had AUCs ranging from approximately 0.7 to 0.9 [22-24,29], suggesting the advantage of an ML approach dedicated to children with pneumonia. With satisfactory performance, the ML algorithms we proposed can be applied to support physicians' decisions for ICU care based on individual patient conditions and further improve health care quality during hospitalization. They can also help reduce clinicians' burden during outbreaks of community-acquired pneumonia, such as the recent COVID-19 outbreak, or in hospitals with insufficient human resources.
Because we could set up different operational points for the algorithm, our algorithm could be applied in various clinical settings. For example, at the high sensitivity operational point, the specificity could be kept at 0.868 (95% CI 0.642-0.917) with a negative predictive value of 0.998 (95% CI 0.995-1.000), which could be used to rule out those who did not need ICU care. Medical centers that admit fewer than 10 inpatients with pneumonia per day can operate on this threshold. Using the high sensitivity point, we could help clinicians identify patients who might need ICU admission earlier and reduce the number of undertriaged patients. Although approximately one-quarter of the results were false positives, the burden is acceptable when the number of inpatients per day remains low, given that false negatives are more harmful. When we further examined the medical records of the false negative cases in the current data set, we found that older age might be associated with false negative results. Therefore, clinicians should be aware of false negative results in older children when applying the algorithm for decision support. In contrast, at the high specificity point (0.99), our algorithm maintained a sensitivity of 0.835 (95% CI 0.779-0.886) and a positive predictive value of 0.897 (95% CI 0.883-0.933). The high specificity with a high positive predictive value suggests that the algorithm could prevent unnecessary ICU admissions, so it may be applied when health care resources are limited or an outbreak happens. Therefore, the algorithm output could be customized according to the clinician's needs. In this way, the improved discriminability from ML algorithms could contribute to more accurate clinical decision-making and resource allocation. The ML model can not only provide automated estimation in clinical settings but also serve as a tool for training less experienced physicians or setting an alarm in hospitals with fewer human resources.
Although the model does not reflect 100% of human physician decisions, it could be considered as a second opinion in the clinical setting and serve as a reference instead of being the only guideline for the final medical decision.
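The trade-off between operating points discussed above depends on how sensitivity, specificity, and prevalence combine into predictive values. The sketch below applies the standard relationship; it uses the cohort-wide prevalence (1166/8464) as an assumption, whereas the reported values were computed on the validation set, so the outputs approximate rather than reproduce the reported figures. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)

# Assumption: cohort-wide prevalence of early ICU transfer.
prevalence = 1166 / 8464

# High sensitivity point: sensitivity 0.99 (target), specificity 0.868 (reported).
ppv, npv = predictive_values(0.99, 0.868, prevalence)
print(round(ppv, 3), round(npv, 3))
```

Because early ICU transfer is uncommon, the negative predictive value stays very high at this point even though a sizable share of positive predictions are false alarms, which is why the high sensitivity point suits ruling out rather than ruling in.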
Our study also revealed important clinical feature indices (such as younger age, underlying diseases, higher pulse rate, and lower blood pressure) for the need for early ICU transfer, but patients with positive results for influenza A, influenza B, and S.pneumoniae at admission were less likely to be transferred to the ICU within 24 hours. These important clinical red flags could help physicians manage critically ill patients. In addition, early detection of the pathogens causing pneumonia in children makes early optimal treatment possible and improves the patient's clinical condition.

Limitations
There are some limitations in our study. First, we did not include imaging data, such as chest X-ray images, in our data set. However, diagnosis using the ICD codes relied on the physicians' clinical judgments, and clinicians might have already considered other clinical clues. Although most pneumonia patients are diagnosed clinically without specific radiological findings, including imaging data might still improve the judgment of clinical severity and thus influence the risk stratification for ICU care. Second, some clinically relevant parameters, such as blood gas values and procalcitonin measures, were not included in our algorithm training because of the high proportion of missing data. Third, the reasons for ICU admission usually varied (eg, ARDS, sepsis, respiratory failure, or other organ failures). Our algorithm could only evaluate the possible needs for ICU admission instead of the clinical diagnosis. With more data collected, an individual algorithm for a specific diagnosis might be developed in the future. Lastly, the algorithms were trained using a data set from a single medical center. Generalizability might be an issue if we would like to apply the findings to other hospital settings. Clinical validation in real-world settings might be required at the next stage to examine the application of ML algorithms in daily clinical work.

Comparisons With Prior Work
Compared with prior work that evaluated the need for ICU admission for pediatric patients, our disease-specific model for children with pneumonia demonstrated better performance. Our study incorporated up to 41 features from different domains (eg, demographics, vital signs, microbiological tests, and laboratory examinations) with no human-rated components (eg, behavior rating, respiratory difficulty). The strength of our tree-based ML approach is its ability to capture both linear and nonlinear relationships in high-dimensional data [21]. With ML algorithms, we could integrate data with varying characteristics and solve complicated clinical questions (ie, predict the need for ICU care for hospitalized children with pneumonia). These characteristics enable the ML algorithm to include more clinical data and explore interactions among individual features, which is almost impossible to do with human intelligence or traditional statistical approaches, such as logistic regression.
To further validate the algorithm's explainability, we invited 3 experienced physicians to grade the importance of each feature for ICU transfer evaluation from a clinical perspective. The results showed that features that were considered to be of higher importance by ML algorithms, such as the lowest systolic blood pressure and the presence of cardiovascular and neuropsychological diseases, were also considered essential features in the physicians' clinical judgment. The results helped us explain the findings of ML algorithms without being accused of using a "black box" for clinical decision-making. However, some discrepancies were still found. For example, human doctors tend to consider immunodeficiency and solid tumor diseases to be high-risk factors for early ICU transfer, but the importance of these 2 features in the ML algorithms is very low. This discrepancy between machine and human intelligence might be the consequence of proactive management of immunocompromised patients in clinical settings, which in turn lowers the probability of early ICU admission. When applying the ML algorithm, we still have to consider this limitation in immunocompromised patients and combine the prediction of ML algorithms with clinical judgment. In this way, we could maximize support from machines without neglecting human intelligence.

Conclusions
In summary, we developed ML algorithms that could accurately classify the risk of early ICU transfer within 24 hours of admission for children with pneumonia. The clinical use of these algorithms might detect high-risk patients earlier and improve the quality of health care for pediatric pneumonia.