Developing a Model to Predict Hospital Encounters for Asthma in Asthmatic Patients: Secondary Analysis

Background: As a major chronic disease, asthma causes many emergency department (ED) visits and hospitalizations each year. Predictive modeling is a key technology to prospectively identify high-risk asthmatic patients and enroll them in care management for preventive care to reduce future hospital encounters, including inpatient stays and ED visits. However, existing models for predicting hospital encounters in asthmatic patients are inaccurate. Usually, they miss over half of the patients who will incur future hospital encounters and incorrectly classify many others who will not. This makes it difficult to match the limited resources of care management to the patients who will incur future hospital encounters, increasing health care costs and degrading patient outcomes.
Objective: The goal of this study was to develop a more accurate model for predicting hospital encounters in asthmatic patients.
Methods: Secondary analysis of 334,564 data instances from Intermountain Healthcare from 2005 to 2018 was conducted to build a machine learning classification model to predict the hospital encounters for asthma in the following year in asthmatic patients. The patient cohort included all asthmatic patients who resided in Utah or Idaho and visited Intermountain Healthcare facilities during 2005 to 2018. A total of 235 candidate features were considered for model building.
Results: The model achieved an area under the receiver operating characteristic curve of 0.859 (95% CI 0.846-0.871). When the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk, the model reached an accuracy of 90.31% (17,391/19,256; 95% CI 89.86-90.70), a sensitivity of 53.7% (436/812; 95% CI 50.12-57.18), and a specificity of 91.93% (16,955/18,444; 95% CI 91.54-92.31). To steer future research on this topic, we pinpointed several potential improvements to our model.
Conclusions: Our model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. After further refinement, the model could be integrated into a decision support tool to guide asthma care management allocation.


Background
In the United States, asthma affects 8.4% of the population and leads to 2.1 million emergency department (ED) visits, 479,300 hospitalizations, 3388 deaths, and US $50.3 billion in cost annually [1,2]. Reducing hospital encounters, including inpatient stays and ED visits, is highly desired for asthmatic patients. For this purpose, using prognostic predictive models to prospectively identify high-risk asthmatic patients and enroll them in care management for tailored preventive care is deemed state of the

Patient Population
Our patient cohort was based on the patients who visited Intermountain Healthcare facilities during 2005 to 2018. Intermountain Healthcare is the largest health care system in the Intermountain region (Utah and southeastern Idaho), with 185 clinics and 22 hospitals providing care for approximately 60% of the residents in that region. The patient cohort included asthmatic patients identified as residents of Utah or Idaho, with or without a specific home address. A patient was defined as having asthma in a given year if the patient had at least one diagnosis code of asthma (International Classification of Diseases, Ninth Revision [ICD-9]: 493.0x, 493.1x, 493.8x, and 493.9x; International Classification of Diseases, Tenth Revision [ICD-10]: J45.x) in that year in the encounter billing database [11,23,24]. Patients who died during that year were excluded. There were no other exclusions.
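As an illustration of the cohort definition above, the following minimal Python sketch checks whether a patient has at least one asthma diagnosis code in a given year. The function names and the input layout (a mapping from year to a list of code strings) are hypothetical and are not taken from the study. The patterns assume that "x" in the code lists stands for any trailing character, and the sketch does not handle the exclusion of patients who died during the year.

```python
import re

# Patterns follow the code lists quoted in the text (assumption: "x" is a wildcard).
# ICD-9: 493.0x, 493.1x, 493.8x, 493.9x; ICD-10: J45.x
_ICD9_ASTHMA = re.compile(r"^493\.[0189]\d?$")
_ICD10_ASTHMA = re.compile(r"^J45\.\w+$")


def is_asthma_code(code):
    """Return True if a diagnosis code string matches one of the asthma patterns."""
    code = code.strip().upper()
    return bool(_ICD9_ASTHMA.match(code) or _ICD10_ASTHMA.match(code))


def has_asthma_in_year(codes_by_year, year):
    """codes_by_year is a hypothetical mapping: year -> list of billing diagnosis codes."""
    return any(is_asthma_code(c) for c in codes_by_year.get(year, []))
```

For example, under these assumptions the code 493.92 qualifies, whereas 493.20 does not because 493.2x is not in the list.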

Prediction Target (Dependent Variable)
In the rest of this paper, we use hospital encounter for asthma to refer to inpatient stay or ED visit at Intermountain Healthcare with a principal diagnosis of asthma (ICD-9: 493.0x, 493.1x, 493.8x, and 493.9x; ICD-10: J45.x). For each patient meeting criteria for asthma in a given year, we looked at any hospital encounter for asthma in the following year as outcome. In our modeling, we used each asthmatic patient's data by the end of each year to predict the patient's outcome in the following year.
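The pairing of an index year with the outcome in the following year can be sketched as below. This is only an illustration: the function name, the set-of-tuples inputs, and the patient identifiers are hypothetical, and the actual data extraction used in the study is not described at this level of detail.

```python
def build_labels(asthma_years, encounter_years, last_outcome_year):
    """Label each (patient_id, index_year) instance with the outcome in index_year + 1.

    asthma_years: set of (patient_id, year) pairs in which the patient met asthma criteria.
    encounter_years: set of (patient_id, year) pairs with a hospital encounter for asthma.
    last_outcome_year: last year for which outcome data exist (2018 in the study period).
    """
    labels = {}
    for patient_id, index_year in sorted(asthma_years):
        if index_year + 1 > last_outcome_year:
            continue  # no following-year data, so this instance cannot be labeled
        outcome = (patient_id, index_year + 1) in encounter_years
        labels[(patient_id, index_year)] = int(outcome)
    return labels


# Toy example with made-up patients.
toy_labels = build_labels(
    asthma_years={("p1", 2016), ("p1", 2017), ("p2", 2017), ("p2", 2018)},
    encounter_years={("p1", 2017)},
    last_outcome_year=2018,
)
```

In the toy example, the 2018 instance is dropped because its outcome year falls outside the data, which mirrors why 14 years of data yield 13 years of usable instances.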

Dataset
The Intermountain Healthcare enterprise data warehouse provided a structured, clinical, and administrative dataset, including all visits of the patient cohort at Intermountain Healthcare facilities during 2005 to 2018.
The 235 candidate features are listed in the first table in Multimedia Appendix 1 [37][38][39], where each reference to the number of a specific type of items, such as medications, counts multiplicity, unless the word distinct appears. A major visit for asthma is defined as an outpatient visit with a primary diagnosis of asthma, an ED visit with an asthma diagnosis code, or an inpatient stay with an asthma diagnosis code. An outpatient visit with asthma as a secondary diagnosis is defined as a minor visit for asthma. Intuitively, all else being equal and compared with a patient with only minor visits for asthma, a patient with 1 or more major visits for asthma is more likely to incur future hospital encounters for asthma.
Each input data instance for the predictive model includes the 235 candidate features, targets the unique combination of an asthmatic patient and a year (index year), and is used to predict the patient's outcome in the following year. For that combination of patient and year, the patient's age, current primary care provider (PCP), and home address were determined based on the data available on the last day of the index year. The features of premature birth, bronchiolitis, duration of asthma, duration of chronic obstructive pulmonary disease, whether the patient had any drug or material allergy, whether the patient had any environmental allergy, whether the patient had any food allergy, and the number of allergies of the patient were derived from the historical data from 2005 to the index year. Furthermore, 1 feature was derived from the historical data in both the index year and the year before. This feature is as follows: the proportion who incurred hospital encounters for asthma in the index year out of all asthmatic patients of the patient's current PCP in the year before. The remaining 226 features were derived from the historical data in the index year.

Data Preparation
For every numerical feature, we checked the data distribution, adopted the following lower and upper bounds to spot invalid values, and replaced them with null values. Using the lower and upper bounds from the Guinness World Records [40], all body mass indexes <7.5 or >204, all weights <0.26 kg or >635 kg, and all heights <0.24 m or >2.72 m were deemed physiologically impossible and invalid. Using the lower and upper bounds provided by our team's clinical expert MDJ, all peripheral capillary oxygen saturation values >100%, all temperatures <80°F or >110°F, all systolic blood pressure values ≤0 mm Hg or >300 mm Hg, all diastolic blood pressure values ≤0 mm Hg or >300 mm Hg, all heart rates <30 beats per minute or >300 beats per minute, and all respiratory rates >120 breaths per minute were deemed physiologically impossible and invalid.
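The bounds listed above can be expressed as simple validity checks. The sketch below is illustrative only: the feature keys and the function name are hypothetical, and only the numeric bounds come from the text. Note that the blood pressure lower bound is exclusive (values ≤0 are invalid), whereas the other lower bounds are inclusive.

```python
# Validity checks built from the bounds stated in the text.
# The dictionary keys are hypothetical feature names chosen for this sketch.
VALID_RANGE = {
    "bmi": lambda v: 7.5 <= v <= 204,
    "weight_kg": lambda v: 0.26 <= v <= 635,
    "height_m": lambda v: 0.24 <= v <= 2.72,
    "spo2_pct": lambda v: v <= 100,
    "temperature_f": lambda v: 80 <= v <= 110,
    "systolic_mmhg": lambda v: 0 < v <= 300,
    "diastolic_mmhg": lambda v: 0 < v <= 300,
    "heart_rate_bpm": lambda v: 30 <= v <= 300,
    "respiratory_rate_bpm": lambda v: v <= 120,
}


def clean_value(feature, value):
    """Return None (null) for a physiologically impossible value, else the value itself."""
    if value is None:
        return None
    check = VALID_RANGE.get(feature)
    if check is not None and not check(value):
        return None
    return value
```

Features without an entry in the table pass through unchanged, which is a design choice of this sketch rather than a statement about the study's pipeline.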
To put all the numerical features on the same scale, we standardized every numerical feature by first subtracting its mean and then dividing by its standard deviation. As outcomes were from the following year, our dataset provided 13 years of effective data (2005-2017) over a total of 14 years (2005-2018). To reflect the model's use in practice, data from 2005 to 2016 were used to train predictive models. Data from 2017 were used to assess the model's performance.
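A minimal sketch of the temporal split and the standardization step follows. Several details are assumptions made for illustration and are not stated in the text: the mean and standard deviation are estimated on the training years only, null values are skipped, the population standard deviation is used, and the row layout and function names are hypothetical.

```python
from statistics import mean, pstdev


def split_by_year(instances, last_train_year=2016, test_year=2017):
    """Split instances by index year: 2005-2016 for training, 2017 for testing."""
    train = [r for r in instances if r["index_year"] <= last_train_year]
    test = [r for r in instances if r["index_year"] == test_year]
    return train, test


def fit_standardizer(rows, feature):
    """Estimate mean and standard deviation of a feature, skipping null values."""
    values = [r[feature] for r in rows if r[feature] is not None]
    return mean(values), pstdev(values)


def standardize(value, mu, sd):
    """Subtract the mean, then divide by the standard deviation; nulls stay null."""
    if value is None:
        return None
    if sd == 0:
        return 0.0  # constant feature: every value equals the mean
    return (value - mu) / sd


# Toy rows with a hypothetical "bmi" feature.
toy = [
    {"index_year": 2015, "bmi": 20.0},
    {"index_year": 2016, "bmi": 30.0},
    {"index_year": 2016, "bmi": None},
    {"index_year": 2017, "bmi": 35.0},
]
train_rows, test_rows = split_by_year(toy)
mu, sd = fit_standardizer(train_rows, "bmi")
z_test = standardize(test_rows[0]["bmi"], mu, sd)
```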

Performance Metrics
As shown in the formulas below and Table 1, we applied 6 standard metrics to gauge the model's performance: area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
The following formulas were used to calculate the standard metrics to gauge the model's performance: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), PPV = TP / (TP + FP), and NPV = TN / (TN + FN). Here, TP is true positive, TN is true negative, FP is false positive, and FN is false negative. For example, FN is the number of patients who will incur future hospital encounters for asthma and whom the model incorrectly projects to incur no future hospital encounter for asthma. Sensitivity shows the proportion of patients who will incur future hospital encounters for asthma found by the model. Specificity shows the proportion of patients who will incur no future hospital encounter for asthma found by the model.
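As a worked illustration of the 5 threshold-dependent metrics, the sketch below computes them from the 4 confusion matrix counts. The function name is hypothetical. The example counts are taken from the fractions reported in this paper for the top 10.00% cutoff: TP=436 and TN=16,955 are reported directly, whereas FN=812-436=376 and FP=18,444-16,955=1489 are derived here by subtraction.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the threshold-dependent metrics from confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts at the top 10.00% cutoff on the 2017 data (FN and FP derived by subtraction).
metrics_top10 = classification_metrics(tp=436, fp=1489, tn=16955, fn=376)
```

With these counts, the accuracy, sensitivity, specificity, and PPV agree with the reported 90.31%, 53.7%, 91.93%, and 22.65%, respectively.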
For the 6 performance metrics, we obtained their 95% CIs via 1000-fold bootstrap analysis [41]. We calculated our final model's performance metrics on every bootstrap sample of the 2017 data. For each performance metric, we got 1000 values, the 2.5th and 97.5th percentiles of which gave its 95% CI. We drew the receiver operating characteristic curve to exhibit the sensitivity-specificity trade-off.
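The percentile bootstrap described above can be sketched as follows. This is a simplified illustration, not the study's code: the function names, the toy data, the fixed random seed, and the nearest-rank rule used to pick the 2.5th and 97.5th percentiles are all assumptions.

```python
import random


def sensitivity(labels, flags):
    """Proportion of true positives among all actual positives (0.0 if there are none)."""
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f == 1)
    fn = sum(1 for y, f in zip(labels, flags) if y == 1 and f == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def bootstrap_ci(labels, flags, metric, n_boot=1000, seed=0):
    """Percentile 95% CI of a metric over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([labels[i] for i in idx], [flags[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * (n_boot - 1))]
    upper = values[int(0.975 * (n_boot - 1))]
    return lower, upper


# Toy data: 40 positives (25 flagged by the model) and 160 negatives (10 flagged).
toy_y = [1] * 40 + [0] * 160
toy_flag = [1] * 25 + [0] * 15 + [1] * 10 + [0] * 150
ci_low, ci_high = bootstrap_ci(toy_y, toy_flag, sensitivity)
```

The same routine can be reused for any of the other metrics by passing a different metric function.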

Classification Algorithms
We used Waikato Environment for Knowledge Analysis (Weka), version 3.9 [42], to construct machine learning classification models. Weka is a widely used, open-source machine learning and data mining package. It incorporates many standard machine learning algorithms and feature selection techniques. We considered the 39 native machine learning classification algorithms in Weka listed in Multimedia Appendix 1 as well as the extreme gradient boosting (XGBoost) classification algorithm [43] implemented in the XGBoost4J package [44]. An XGBoost model is an ensemble of decision trees formed in a stagewise manner. As a scalable and efficient implementation of gradient boosting, XGBoost adopts a more regularized model formulation to help avoid overfitting and improve classification accuracy. We used our previously developed automatic model selection method [45] and the 2005 to 2016 training data to automate the selection of the machine learning classification algorithm, feature selection technique, data balancing method for handling imbalanced data, and hyperparameter values among all the suitable ones. Our automatic model selection method [45] adopts the response surface methodology to automatically check many combinations of classification algorithm, feature selection technique, data balancing method, and hyperparameter values and conducts cross-validation to choose the final combination to maximize the AUC. AUC has no reliance on the cutoff threshold used for deciding between the projected future hospital encounters for asthma and the projected no future hospital encounter for asthma. This gives AUC an advantage over the other 5 performance metrics (accuracy, sensitivity, specificity, PPV, and NPV), whose values depend on the cutoff threshold used. For each classification algorithm, our automatic model selection method attempts to adjust all the related hyperparameters by testing many hyperparameter value combinations. To expedite the search, our method performs progressive sampling on the training set and uses test results on its subsets to quickly remove unpromising algorithms and hyperparameter value combinations. As a result, with no need to find near-optimal hyperparameter value combinations for almost all the algorithms, our method can return a good combination of the algorithm, feature selection technique, data balancing method, and hyperparameter values for building the final classification model. Compared with the Auto-WEKA automatic model selection method [46], our method can cut search time by 28-fold and model error rate by 11% simultaneously [45].
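To illustrate why AUC does not depend on a cutoff threshold, the sketch below computes it in its rank-based form: the probability that a randomly chosen positive instance receives a higher predicted risk than a randomly chosen negative instance, with ties counted as one half. This is a generic textbook formulation written for clarity rather than speed. It is not the implementation used by Weka or by the automatic model selection method, and the function name and toy scores are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_labels_auc = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels_auc, toy_scores)
# Rescaling every score by the same positive factor leaves the ranking, and hence AUC, unchanged.
toy_auc_rescaled = auc(toy_labels_auc, [0.01 * s for s in toy_scores])
```

Because only the ordering of the predicted risks matters, no cutoff threshold enters the calculation.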

Demographic Characteristics of Our Patient Cohort
Recall that each data instance targets a unique combination of an asthmatic patient and a year. Tables 2 and 3 exhibit the demographic characteristics of our patient cohort during 2005 to 2016 and 2017, respectively. For the 2017 data, the data instances linked to future hospital encounters for asthma and those linked to no future hospital encounter for asthma showed the same distribution for sinusitis occurrence (P=.91). On the basis of the Cochran-Armitage trend test [47], for both 2005 to 2016 and 2017 data, the data instances linked to future hospital encounters for asthma and those linked to no future hospital encounter for asthma showed different distributions for age (P<.001) and duration of asthma (P<.001).

Features and Classification Algorithm Used
After finishing the search process to maximize the AUC, our automatic model selection method [45] chose the XGBoost classification algorithm [43] and the hyperparameter values listed in Multimedia Appendix 1. XGBoost is based on decision trees and can deal with missing feature values naturally. As XGBoost only accepts numerical features as its inputs, each categorical feature was first converted into 1 or more binary features via one-hot encoding before being given to XGBoost. Our final model was constructed using XGBoost and the 142 features listed in the descending order of their importance values in the second table in Multimedia Appendix 1. Due to having no extra predictive power, the other features were automatically removed by XGBoost. As detailed in the book by Hastie et al [48], XGBoost automatically computed each feature's importance value as the mean of such values across all decision trees in the XGBoost model. In each tree, the feature's importance value was computed based on the performance improvement gained by the split at each internal node of the tree using the feature as the splitting variable, weighted by the number of data instances the node is responsible for.
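The one-hot encoding step mentioned above can be sketched as follows. The function name, the row layout, and the example feature "insurance" are hypothetical. Leaving the binary features null when the category is missing is an assumption of this sketch, motivated by the statement that XGBoost can deal with missing values; the text does not say how missing categories were encoded.

```python
def one_hot(rows, feature):
    """Replace one categorical feature with one binary feature per observed category."""
    categories = sorted({r[feature] for r in rows if r[feature] is not None})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != feature}
        for c in categories:
            # Assumption: a missing category yields missing (None) binary features.
            out[f"{feature}={c}"] = None if r[feature] is None else int(r[feature] == c)
        encoded.append(out)
    return encoded


toy_rows = [
    {"age": 40, "insurance": "public"},
    {"age": 7, "insurance": "private"},
    {"age": 12, "insurance": None},
]
toy_encoded = one_hot(toy_rows, "insurance")
```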

Performance Measures Achieved
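Our final model reached an AUC of 0.859 (95% CI 0.846-0.871). Figure 1 shows our final model's receiver operating characteristic curve. Table 5 shows our final model's confusion matrix when the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk.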
Recall that several features require more than 1 year of historical data to compute.If we exclude these features and use only those features computed on 1 year of historical data, the model's AUC degrades to 0.849.
Without excluding the features that require more than 1 year of historical data to compute, the model trained on both asthmatic adults' (age ≥18 years) and asthmatic children's (age <18 years) data reached an AUC of 0.856 on asthmatic adults and an AUC of 0.830 on asthmatic children. In comparison, the model trained only on asthmatic adults' data reached an AUC of 0.855 on asthmatic adults. The model trained only on asthmatic children's data reached an AUC of 0.821 on asthmatic children.

If we used only the top 21 features listed in the second table in Multimedia Appendix 1 with an importance value ≥0.01 and excluded the other 121 features, the model's AUC degraded from 0.859 to 0.855 (95% CI 0.842-0.867). When the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk, the model's accuracy degraded from 90.31% (17,

Principal Findings
We built a more accurate machine learning classification model to predict hospital encounters for asthma in the following year in asthmatic patients. Our final model achieved a higher AUC than what has been reported in the literature for this task [9][10][11][12][13][14][15][16][17][18][19][20][21][22]. After further refinement to improve its accuracy and to automatically explain its prediction results [49,50], our final model could be integrated into an electronic medical record system to guide care management allocation for asthmatic patients. This could better allocate a scarce and expensive resource and help improve asthma outcomes. Asthma in adults is different from asthma in children. Our final model reached a higher AUC on asthmatic adults than on asthmatic children. More work is needed to understand the reason for this difference. In addition, more work is needed to improve the prediction accuracy on asthmatic children compared with asthmatic adults.
We considered 235 features in total, about 60% of which appeared in our final model. If a feature is unused by our final model, it does not necessarily mean that this feature has no predictive power. Rather, it only shows that this feature offers no extra predictive power on our specific dataset beyond what the features used in our final model have. On a larger dataset with more asthmatic patients, it is possible that some of the excluded features will provide extra predictive power. This is particularly true with features whose nontrivial values occur on only a small portion of asthmatic patients, such as a comorbidity with a low prevalence rate. When too few data instances take nontrivial values, the features' predictive power may not appear.
In the second table in Multimedia Appendix 1, the 2 most important features, as well as several within the top 20, reflect overall instability of the patient's asthma. The instability could derive from physiologic characteristics of the patient's asthma, as reflected by the maximum blood eosinophil count, the maximum percentage of blood eosinophils, and the average respiratory rate. The instability could also result from treatment noncompliance, PCP changes, insurance changes, and socioeconomic issues for which data were unavailable.

Comparison With Prior Work
Researchers have developed multiple models to predict inpatient stays and ED visits in asthmatic patients [9][10][11][12][13][14][15][16][17][18][19][20][21][22]. Table 6 compares our final model with these models, which include all relevant ones mentioned in Loymans et al's recent systematic review [9]. None of these models obtained an AUC >0.81, whereas our final model's AUC is 0.859. In other words, compared with our final model, each of these models reached an AUC lower by at least 0.049. Compared with prior model building, our model building assessed more candidate features with predictive power, adopted a more advanced classification algorithm, and used data from more asthmatic patients. All of these helped boost our final model's accuracy. Our principle of considering extensive candidate features to help enhance the model's accuracy is general and can be applied to other diseases and outcomes such as health care cost [51].
Except for Yurk et al's model [17], all other prior models had a PPV ≤22% and a sensitivity ≤49%, which are lower than those achieved by our final model. Yurk et al's model [17] obtained better sensitivity and PPV primarily because the model used a different prediction target: hospital encounters or ≥1 day lost because of reduced activities or missed work for asthma. This prediction target occurs for more than half of the asthmatic patients, making it relatively easy to predict. If the prediction target were changed to hospital encounters for asthma, a rarer outcome that is harder to predict, we would expect the sensitivity and PPV reached by Yurk et al's model [17] to drop.

Considerations Regarding Potential Clinical Use
Despite being more accurate than the prior ones, our final model still reached a relatively low PPV of 22.65% (436/1925). However, this does not prevent our final model from being clinically useful because of the following reasons:

• The PPV depends highly on the outcome's prevalence rate [52]. A relatively rare outcome, such as future hospital encounters for asthma, will occur in only a finite number of patients. Hence, most patients projected to have the outcome will inevitably turn out to not have the outcome, causing even a good predictive model to have a low PPV [52]. For such an outcome, sensitivity is more important than PPV for assessing the model's performance and potential clinical impact. As shown in Table 4, by setting the cutoff threshold for conducting binary classification at the top 10.00% (1926/19,256) of patients with the highest predicted risk, our final model has already captured 53.7% (436/812) of the asthmatic patients who will incur future hospital encounters for asthma. If one is willing to increase the cutoff threshold to the top 25.00% (4814/19,256) of patients with the highest predicted risk, our final model would have captured 78.6% (638/812) of the asthmatic patients who will incur future hospital encounters for asthma, even though the PPV is only 13.25% (638/4814).
• Proprietary models with performance measures similar to those of the previously published models are being used at health care systems such as Intermountain Healthcare, University of Washington Medicine, and Kaiser Permanente Northern California [18] for allocating preventive interventions. Our final model is an improvement over those models. Table 6 shows that compared with the previously published models, our final model reached a sensitivity higher by 4.69% or more. If we could use our final model to find 4.69% more asthmatic patients who will incur future hospital encounters for asthma and enroll them in care management, we could improve outcomes and avoid up to 9239 inpatient stays and 33,768 ED visits each year [1,4-7]. Supporting the importance of relatively small improvements in the model's performance measures, Razavian et al [53] showed that by reaching a gain of 0.05 in AUC (from 0.75 to 0.8) and a PPV of 15%, a large health insurance company such as Independence Blue Cross would be willing to deploy a new predictive model to appropriately allocate preventive interventions.
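The dependence of PPV on the outcome's prevalence rate noted in the list above can be made concrete with a short calculation. The sketch applies the standard identity PPV = sensitivity × prevalence / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)) to the sensitivity, specificity, and prevalence reported for the 2017 data. The function name is hypothetical, and the 50% prevalence scenario is a made-up contrast, not a result from the study.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV implied by a model's sensitivity and specificity at a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


sens = 436 / 812        # sensitivity at the top 10.00% cutoff
spec = 16955 / 18444    # specificity at the same cutoff
prev = 812 / 19256      # prevalence of the outcome in the 2017 data (about 4.22%)

ppv_observed = ppv_from_rates(sens, spec, prev)    # about 0.2265, i.e., 436/1925
ppv_if_common = ppv_from_rates(sens, spec, 0.50)   # hypothetical 50% prevalence
```

With the same sensitivity and specificity, the PPV would exceed 85% if the outcome occurred in half of the patients, which shows that the low PPV here is driven largely by the rarity of the outcome.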
Our final model used 142 features. Reducing features used in the model could ease its clinical deployment. For this, one could use the top few features with the highest importance values (eg, ≥0.01) and exclude the others, if one is willing to accept a modest degradation in model accuracy. Ideally, one should first assess the features' importance values on a dataset from the target health care system before deciding which features should be kept for that system. A feature's importance value varies across different health care systems. A feature with a low importance value on the Intermountain Healthcare dataset might have a decent importance value on a dataset from another health care system. Similar to the case with many other complex machine learning models, an XGBoost model using a nontrivial number of features is difficult to interpret globally. As an interesting area for future work, we are in the process of investigating using the automatic explanation approach described in our prior papers [49,50] to automatically explain our final XGBoost model's prediction results on individual asthmatic patients.
Our final model was built using the XGBoost classification algorithm [43]. For binary classification with 2 unbalanced classes, XGBoost uses a hyperparameter scale_pos_weight to control the balance of the weights for the positive and negative classes [54]. One could set scale_pos_weight to the ratio of the number of negative data instances to the number of positive data instances [54], although the optimal value of scale_pos_weight often deviates from this value by a degree varying by the specific dataset. In our case, to maximize the model's AUC, our automatic model selection method [45] did a search of possible hyperparameter values and eventually set scale_pos_weight to a nondefault value to balance the 2 classes of future hospital encounters for asthma or not [55]. This has the side effect of making the model's predicted probabilities of incurring future hospital encounters for asthma very small and unaligned with the actual probabilities [55]. This side effect does not prevent us from selecting the top few percent of asthmatic patients with the highest predicted risk as candidates for receiving care management or other preventive interventions.
To avoid this side effect, we could set scale_pos_weight to its default value of 1, without balancing the 2 classes. However, that would degrade the model's AUC from 0.859 to 0.849 (95% CI 0.836-0.862).
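As a worked example of the negative-to-positive ratio heuristic for scale_pos_weight mentioned above, the sketch below uses the class counts reported for the 2005 to 2016 training data (11,332 positive instances out of 315,308). The function name is hypothetical. The resulting value is only the heuristic starting point; it is not the value that the automatic model selection method chose, which is listed in Multimedia Appendix 1 and is not reproduced here.

```python
def negative_to_positive_ratio(n_positive, n_total):
    """Heuristic scale_pos_weight: number of negative instances per positive instance."""
    return (n_total - n_positive) / n_positive


# Class counts of the 2005-2016 training instances as reported in this paper.
heuristic_weight = negative_to_positive_ratio(n_positive=11332, n_total=315308)

# Illustrative XGBoost parameter dictionary. Only the parameter names are standard;
# the values shown are assumptions for this sketch, not the study's settings.
example_params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "scale_pos_weight": heuristic_weight,  # about 26.8 under this heuristic
}
```

Because AUC depends only on the ranking of predicted risks, a weight chosen to maximize AUC can leave the predicted probabilities poorly calibrated while still ordering patients well.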

Limitations
This study has several limitations, all of which provide interesting areas for future work:
• We had no access to medication claim data. Consequently, we were unable to use as features the following major risk factors for hospital encounters for asthma in asthmatic patients: medication compliance reflected in refill frequency, the asthma medication ratio [56], the dose of inhaled corticosteroids [33], and the step number of the stepwise approach for managing asthma [33,57]. We are in the process of obtaining an asthmatic patient dataset from Kaiser Permanente Southern California including these attributes [58], so that we can investigate how much gain in prediction accuracy they can bring.

• Besides those considered in the study, other features could also help boost model accuracy. Our dataset missed some of these features, such as pulmonary function test results. An example of pulmonary function test results is the ratio of the forced expiratory volume in 1 second to the forced vital capacity, a known risk factor for hospital encounters for asthma in asthmatic patients. It would be interesting to find new predictive features from, but not limited to, the attributes available in our dataset.

• Our study considered only structured data and non-deep-learning machine learning classification algorithms. Adding features extracted from unstructured clinical notes and using deep learning may further improve the model's accuracy [50,58].
• Our dataset included no information on the patients' health care use at non-Intermountain Healthcare facilities. As a result, we computed features using incomplete clinical and administrative data of the patients [59][60][61][62]. In addition, instead of taking hospital encounters for asthma anywhere as the prediction target, we had to restrict it to hospital encounters for asthma at Intermountain Healthcare. It would be interesting to investigate how the model's accuracy would change if more complete clinical and administrative data of the patients are available [63].

• Our study used data from 1 health care system and did not assess our results' generalizability. After obtaining the asthmatic patient dataset from Kaiser Permanente Southern California, we plan to evaluate our final model's performance on that dataset and explore the process of customizing models to features available in specific datasets as part of the approach to generalization.

Conclusions
Our final model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. In particular, our final model reached an AUC of 0.859, which is higher than those previously reported in the literature for this task by ≥0.049. After further refinement, our final model could be integrated into an electronic medical record system to guide allocation of scarce care management resources for asthmatic patients. This could help improve the value equation for asthma care by improving asthma outcomes while also decreasing resource use and cost.
The demographic characteristics of our patient cohort, exhibited in Tables 2 and 3, are relatively similar between 2005 to 2016 and 2017. During 2005 to 2016 and 2017, about 3.59% (11,332/315,308) and 4.22% (812/19,256) of data instances were linked to hospital encounters for asthma in the following year, respectively. On the basis of the chi-square 2-sample test, for both 2005 to 2016 and 2017 data, the data instances linked to future hospital encounters for asthma and those linked to no future hospital encounter for asthma showed the same distribution for long-acting beta2-agonist prescription (P=.67 for the 2005 to 2016 data and P=.11 for the 2017 data), mast cell stabilizer prescription (P=.29 for the 2005 to 2016 data and P>.99 for the 2017 data), allergic rhinitis occurrence (P=.38 for the 2005 to 2016 data and P=.13 for the 2017 data), and cystic fibrosis occurrence (P=.21 for the 2005 to 2016 data and P=.20 for the 2017 data). They showed different distributions for gender (P<.001 for the 2005 to 2016 data and P=.002 for the 2017 data), race (P<.001), ethnicity (P<.001), insurance category (P<.001), inhaled corticosteroid prescription (P<.001), inhaled steroid and rapid-onset long-acting beta2-agonist combination prescription (P<.001 for the 2005 to 2016 data and P=.002 for the 2017 data), leukotriene modifier prescription (P<.001), inhaled short-acting beta2-agonist prescription (P<.001), systemic corticosteroid prescription (P<.001), anxiety or depression occurrence (P<.001 for the 2005 to 2016 data and P=.002 for the 2017 data), bronchopulmonary dysplasia occurrence (P<.001 for the 2005 to 2016 data and P=.02 for the 2017 data), chronic obstructive pulmonary disease occurrence (P<.001), eczema occurrence (P<.001), gastroesophageal reflux occurrence (P<.001), obesity occurrence (P<.001 for the 2005 to 2016 data and P=.004 for the 2017 data), premature birth occurrence (P<.001), sleep apnea occurrence (P<.001), and smoking status (P<.001). For the data from 2005 to 2016, different distributions were shown for sinusitis occurrence (P=.006).

• A PPV of 22.65% is reasonably good for identifying high-risk asthmatic patients as candidates for receiving relatively inexpensive preventive interventions. Four examples of such interventions are teaching the patient how to correctly use an asthma inhaler, teaching the patient how to correctly use a peak flow meter and giving it to the patient to use at home for self-monitoring, training the patient to keep an environmental trigger diary, and arranging for a nurse to make additional follow-up phone calls with the patient.

Table 1. The confusion matrix.

Table 2. Demographic characteristics of the asthmatic patients at Intermountain Healthcare during 2005 to 2016.

Table 3. Demographic characteristics of the asthmatic patients at Intermountain Healthcare in 2017.

Table 4. Our final model's performance metrics when differing top percentages of asthmatic patients with the highest predicted risk were used as the cutoff threshold for conducting binary classification.

Table 5. Our final model's confusion matrix when the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk.

Table 6. A comparison of our final model and multiple prior models for predicting inpatient stays and emergency department visits in asthmatic patients.