Published in Vol 9, No 9 (2021): September

Models Predicting Hospital Admission of Adult Patients Utilizing Prehospital Data: Systematic Review Using PROBAST and CHARMS

Authors of this article:

Ann Corneille Monahan1; Sue S Feldman2


1Department of Epidemiology & Public Health, School of Public Health, University College Cork, Cork, Ireland

2Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, United States

Corresponding Author:

Ann Corneille Monahan, MSHI, PhD

Department of Epidemiology & Public Health

School of Public Health

University College Cork

College Road

Cork, T12 K8AF
Ireland


Phone: 353 21 420 5860


Background: Emergency department boarding and hospital exit block are primary causes of emergency department crowding and have been conclusively associated with poor patient outcomes and major threats to patient safety. Boarding occurs when a patient is delayed or blocked from transitioning out of the emergency department because of dysfunctional transition or bed assignment processes. Predictive models for estimating the probability of an occurrence of this type could be useful in reducing or preventing emergency department boarding and hospital exit block, to reduce emergency department crowding.

Objective: The aim of this study was to identify and appraise the predictive performance, predictor utility, model application, and model utility of hospital admission prediction models that utilized prehospital, adult patient data and aimed to address emergency department crowding.

Methods: We searched multiple databases for studies, from inception to September 30, 2019, that evaluated models predicting adult patients’ imminent hospital admission, with prehospital patient data and regression analysis. We used PROBAST (Prediction Model Risk of Bias Assessment Tool) and CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) to critically assess studies.

Results: Potential biases were found in most studies, which suggested that each model’s predictive performance required further investigation. We found that select prehospital patient data contribute to the identification of patients requiring hospital admission. Biomarker predictors may add superior value and advantages to models. It is, however, important to note that no models had been integrated with an information system or workflow, operated independently as electronic devices, or operated in real time within the care environment. Several models could be used at the site-of-care in real time without digital devices, which would make them suitable for low-technology or no-electricity environments.

Conclusions: There is incredible potential for prehospital admission prediction models to improve patient care and hospital operations. Patient data can be utilized to act as predictors and as data-driven, actionable tools to identify patients likely to require imminent hospital admission and reduce patient boarding and crowding in emergency departments. Prediction models can be used to justify earlier patient admission and care, to lower morbidity and mortality, and models that utilize biomarker predictors offer additional advantages.

JMIR Med Inform 2021;9(9):e30022




The delivery of timely quality care in emergency departments has become increasingly challenging due to crowding [1,2]. Emergency department crowding is an international problem [3-5] that has been of continuing concern for the last two decades and is expected to become more problematic with population growth and an aging population whose life expectancy is increasing. The magnitude of the crowding problem has been demonstrated by decades of research into emergency department efficiency interventions that aimed to reduce crowding by improving throughput and processes, such as triage, diagnosis, and treatment, that affect the flow of care [6,7]. However, these measures primarily promoted efficiency in portions of the emergency department care continuum and had little effect in reducing crowding, because they did not address the source of the problem at a system level [8].

Rigorous analysis suggests that exit block and emergency department boarding are the main causes of emergency department crowding [6,9-12]. Boarding is the retention of patients who have already been admitted to the hospital in the emergency department because they await assignment to an inpatient hospital bed [5]. Exit block is the delay that occurs when patients cannot be transitioned into the hospital for admission or discharged (home, rehabilitation, etc) in a timely manner [5,8]. Exit block results in emergency department boarding and is a system issue [8,13]. Both boarding and the resulting overcrowding have been conclusively associated with poor patient outcomes and threats to patient safety [5,14-17].

Predictive Modeling

Predictive modeling that can be used to address emergency department crowding is an emerging field of study. Predictive modeling is used to anticipate which factors will bring about a particular outcome [18]. In health care, models use specific data to estimate the probability that a condition or disease is already present (a diagnostic model) or the probability that an outcome will occur in the future (a prognostic model) [18]. Recent studies [19-28] of models utilizing these techniques estimate patient risk for health conditions and patient–provider encounters, such as suicide attempts or intentional acts of self-harm [19], acute kidney injury (ie, sudden kidney failure or damage) [20], hospital readmissions (ie, readmission to a hospital within 30 days of discharge, regardless of cause) [23,24,26,27], perioperative mortality (ie, deaths within 30 days of surgery) [21], emergency department return visits (ie, return emergency department visits within 72 hours for any reason) [28], return visits after hospital discharge (ie, return emergency department visits within 30 days of hospital discharge for any reason) [25], and emergency department crowding or demand (ie, the availability of space for patients relative to the volume of patients that need to be seen) [22], to improve health care delivery and patient outcomes. A subsection of this area of study focuses on predicting which emergency department patients are likely to require imminent hospital admission. This area of research is important because of its direct and immediate potential to lower patient morbidity and mortality by helping emergency department patients receive care earlier in the emergency department care continuum.

While more prediction models have been developed in recent years [18], external validation studies of published prediction models have not kept pace [29]. There is often no consensus about the best, most effective model for a particular purpose, leaving providers and policy makers unable to choose a model with confidence. In the case of hospital admission prediction, most models have not been externally validated or tested in a live emergency department environment. Furthermore, systematic reviews have received scrutiny for their lack of rigor [30-32]. Hence, a rigorous systematic review of studies of admission prediction models is needed to synthesize findings that researchers and decision-makers can rely on with confidence to address localized emergency department boarding, crowding, and exit block, as well as system-wide implications.

Systematic Review Validation

Rigorous systematic reviews follow accepted approaches. PROBAST (Prediction Model Risk of Bias Assessment Tool) [33] can be used to identify potential sources of bias in individual prediction model studies, and CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) [34] can also be used to identify potential sources of bias, organize information, and identify relevant information used to evaluate prediction modeling studies. While the systematic review of clinical trials is a well-established field, health care prediction modeling and the systematic review of such studies are not as well established, despite growth in both. For example, a search of Google Scholar for “systematic review” AND “prediction” AND “healthcare” demonstrated a 410% increase in publications between decades (from n=45,900 in 2000-2010 to n=234,000 in 2010-2020). As the number of prediction modeling publications continues to grow, the same rigor must be applied to systematic reviews of health care–related prediction modeling as has been applied to clinical trial and other types of systematic reviews, through tools such as PROBAST and CHARMS that facilitate quality assessment of individual prediction model studies using standardized guidelines [30,33]. Only two systematic reviews [35,36] focused on increasing overall throughput by decreasing emergency department boarding and systemic exit block in health systems have applied the rigorous PROBAST and CHARMS methodologies; both reported a high degree of bias in the studies they examined.

Logistic Regression for Systematic Reviews

Logistic regression is a technique for understanding the relationships between predictor variables and outcomes and is one of the most commonly used methods for forecasting [37]. A variety of techniques can be used to model data; each is designed to accommodate particular types of data, numbers of predictors, and study aims, and each has advantages and disadvantages. Logistic regression is used only for data with a binary outcome and multiple predictors, and it accommodates predictors of multiple data types, such as continuous and categorical data; data types therefore do not need to be modified, a process that can introduce bias. Logistic regression produces a mathematical form: a weighted combination of variables that predicts the outcome variable [37].
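As a concrete illustration of the "weighted combination of variables" that logistic regression produces, the following sketch fits a small model by gradient descent on synthetic data. The predictors (age, triage acuity) and all coefficient values are invented for illustration and are not drawn from any of the reviewed studies:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit a logistic regression by gradient descent on the log-loss.

    Returns an intercept b and weight vector w; the model's predicted
    admission probability is sigmoid(b + X @ w)."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        z = b + X @ w
        prob = 1.0 / (1.0 + np.exp(-z))   # predicted probability per patient
        grad_w = X.T @ (prob - y) / n     # gradient of mean log-loss wrt weights
        grad_b = np.mean(prob - y)        # gradient wrt intercept
        w -= lr * grad_w
        b -= lr * grad_b
    return b, w

# Synthetic cohort: age in decades, triage acuity (1 = most urgent)
rng = np.random.default_rng(0)
age = rng.uniform(2, 9, 500)
acuity = rng.integers(1, 6, 500).astype(float)
true_logit = -1.0 + 0.4 * age - 0.6 * acuity          # generating model
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-true_logit))).astype(float)

X = np.column_stack([age, acuity])
b, w = fit_logistic(X, y)
print(f"intercept={b:.2f}, weights={w.round(2)}")     # the weighted combination
```

The fitted weights recover the direction of the generating model (risk rising with age, falling with less urgent acuity), which is the interpretable output that makes logistic regression attractive for admission prediction.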

We aimed to better understand the role of predictive modeling in addressing the emergency department crowding problem by examining models' predictive performance, the contribution of prehospital patient data to model prediction, and the application and utility of the models.

Study Design

We applied PROBAST and CHARMS to rigorously assess studies of models designed to predict adult patient imminent hospital admission using prehospital patient data collected early in the emergency department visit or during ambulance transport to the emergency department. We searched databases for papers published from inception through September 30, 2019. Data were organized and analyzed in Excel (version 2016, Microsoft Inc). This study did not require institutional review board authorization.

Data Sources and Search Strategy

We reviewed database content descriptions for 99 health science, public health, and medical databases to determine their relevance to our topic of interest, and 13 databases were found to be relevant: EBSCO Database (includes Medline database and Academic Search Complete database), CINAHL Plus with Full Text, Cochrane Library, Health and Safety Review, ProQuest Central, Scopus, BMJ Journals, JAMA, Journals at Ovid, PLOS, SAGE Journals, ScienceDirect, and NIHR/PROSPERO.

The Title, abstract, or keyword option was used with the following search string: “model or strategy and hospital* and predict* or risk.” (Asterisks were used to capture hospital, hospitalization, hospitalisation, hospitalised, and hospitalized, and predict, predicts, predicted, predictor, and predictive.) If no results were initially produced, the search was expanded by removing all filters and searching for the terms anywhere in the document. Sources that did not allow for truncation were searched multiple times with multiple word combinations. Additionally, the internet was searched with the following combined terms: “model predict hospital admission,” “risk of hospital admission,” “hospital admission model,” “admission risk,” “emergency model,” and “hospital admission.” Reference lists were also reviewed (Figure 1).

Figure 1. Search flow diagram of included studies.

Inclusion and Exclusion Criteria

We included full-text peer-reviewed English-language studies that evaluated strategies or models using prehospital patient data to predict imminent hospital admission of primarily adult general medicine patients with regression.

We excluded studies in which the setting was not an emergency department, data were not collected early in the emergency department visit, or neither models nor logistic regression were used, as well as studies that focused on pediatric (<16 years of age) or psychiatric patients or on specific health conditions.

Data Quality Assessment

We used PROBAST to assess risk of bias for each study. Shortcomings in a study’s design, conduct, or analysis can cause systematic errors that result in flawed or distorted results and hamper internal validity [18]. Assessment of the quality of studies, including risk of bias and model applicability to the target settings and populations, is an essential component of systematic reviews and their evidence synthesis. The first step in applying PROBAST was the identification of a clear and focused review question about the intended use of the model, targeted participants, predictors used in the modeling, and predicted outcome [33]. The second step was the identification and assessment of potential sources of bias in 4 domains (participants, predictors, outcomes, analysis). Key qualities assessed for each study included the appropriateness of the data source, whether predictors were similarly measured and defined, whether outcomes were measured similarly for all participants, and whether missing data were appropriately handled and reported.

Data Extraction and Data Synthesis

We used CHARMS to identify key items in 11 domains (eg, source of data, sample size, model development, model performance, results) in individual studies (and in their PROBAST reports) in order to evaluate potential sources of bias and issues that may affect the applicability of results in relation to the intended use of the model. Key information was organized by relevant domains (Multimedia Appendix 1).


Searches produced 1164 citations, from which 47 were selected for full review; 11 studies met inclusion criteria. Each model was critically assessed with PROBAST (Multimedia Appendix 2) and CHARMS.

CHARMS Study Characteristics

Data Source, Participants, and Outcome: CHARMS Domains 1, 2, and 3

Of the 11 studies, 3 used a prospective observational cohort [38-40], and the remaining 8 used a retrospective observational cohort [22,41-47]. The studies took place in a diverse set of countries (South Africa [38], Scotland [41], the United States [22,42,44,45], the Netherlands [40,43], Australia [39], and Singapore [46,47]). Sampling periods ranged from 14 days [40] to 10 years [46], with most study durations between 3 and 27 months [38,39,41-43,45,47]; 2 studies were 2 months in length [22,44].

Most studies utilized clinical and administrative patient information collected early in the emergency visit [22,38-43,46,47]; 2 studies used data collected during ambulance transport to the emergency department [44,45]. All studies evaluated 1 or more models' ability to predict patients' imminent need for hospital admission, defined the outcome event by patient final disposition, and measured the outcome by patient hospital admission or discharge from the emergency department. All studies therefore corresponded to the outcome definition of the systematic review question, which reduced the potential for bias from differing outcome definitions and measurement methods that can lead to differences in study results and would be a source of heterogeneity across studies [34].

Candidate Predictors: CHARMS Domain 4

Candidate predictors included all predictors investigated in a given study for predictive performance, not only the finalized predictors included in model analysis. The number of candidate predictors ranged from 5 to 14 per study (Multimedia Appendix 3): 4 studies evaluated fewer than 10 predictors [22,38,39,45], 6 evaluated more than 10 [40,41,43,44,46,47], and 1 did not report the number [42]. Overall, 52 candidate predictors were evaluated across all studies, and 34 were retained in models.

Sample Size: CHARMS Domain 5

Consideration of sample size is important to ensure that adequate numbers of outcome events are collected to achieve meaningful results. Sample sizes ranged from 401 to 864,246. No study reported a sample size calculation, estimation, or rationale, although 1 study [40] did perform a sample size calculation for its validation. All studies described efforts to avoid overfitting, which included model comparison to validation models [22,38,40,41,43,44,46,47], model comparison to multiple site outcomes [45], model comparison to published models [42], and model comparison to triage nurse prediction of patient final disposition [39]. Overfitting occurs when findings in the development sample do not exist in the relevant population, resulting in a model that fits the development data set too closely and produces findings that are not reproducible [37]. Overfitting is a primary concern in prediction model development and can be mitigated by performing sample size estimates during study design [34].

Missing Data: CHARMS Domain 6

Value is infrequently attributed to missing data in the missing state [48]; instead, missing values are either imputed or disregarded completely [49,50]. Four studies described a process for handling missing data: 3 used multiple imputation [39,41,43], and 1 reported that “missing predictors were replaced with missing values” [42]; it was unclear whether this referred to blank (ie, missing) identifiers or whether missing values were imputed. Of the remaining 7 studies, 1 reported that 30% of data were missing but did not describe how missing data were handled (ie, whether the patient events were included or excluded) [38], and 6 did not mention missing data at all [22,40,44-47].

Model Development: CHARMS Domain 7

In addition to logistic regression models, 2 studies also developed models using other techniques (gradient boosting and a deep neural network [42]; naive Bayes [22]). Most studies selected predictors using univariate analysis [22,39,40,42,43,46,47], while 4 studies used multivariate modeling [38,41,44,45].

Model Performance: CHARMS Domain 8

Model predictive performance was gauged via the percentage of patients actually admitted, the percentage of patients predicted to be admitted, and goodness of fit tests that assessed model discrimination and model calibration (Table 1).

Table 1. Model performance predicting patient hospital admission.

| Reference | Admission: actual, n (%) | Admission: predicted, % | Discrimination, AUROCa (95% CI) | Calibrationb |
|---|---|---|---|---|
| Burch et al [38] | 469 (59) | —c | —c | —c |
| Cameron et al [41] | —c | —c | 0.88 (0.88-0.88) | —c |
| Hong et al [42] | 60,277 (29.7) | —c | 0.86 (0.86-0.87) | —c |
| Kim et al [39] | 38,695 (38.6) | —c | 0.80 (0.80-0.80) | Performed, not reported |
| Kraaijvanger et al [40] | 400 (31.7) | 31.1 | 0.87 (0.85-0.89) | Reported to be good |
| Lucke et al [43] | 2912 (27) | 21.4 | 0.86 (0.85-0.87) | Reported to be good |
| Meisel et al [44] | 132 (33) | 32 | 0.80 (—) | Performed, not reported |
| Meisel et al [45] | 440 (24.8) | 39.8 | 0.83 (—) | —c |
| Parker et al [46] | 334,115 (38.7) | —c | 0.83 (0.82-0.83) | Reported to be good |
| Peck et al [22] | —c | —c | 0.89 (—) | r2=0.58, moderate to poor |
| Sun et al [47] | 95,909 (30.2) | 30 | 0.85 (0.85-0.85) | Reported to be good |

aAUROC: area under the receiver operating characteristic curve.

bStudies used several formulas to evaluate calibration, including Hosmer-Lemeshow, threshold probability, and r2.

cNot reported.

Discrimination is a model’s ability to distinguish between patients who do and do not experience the outcome of interest and is most commonly assessed with the area under the receiver operating characteristic curve (AUROC) [51]. The AUROC represents the performance of a classification model with a categorical outcome, producing a score representing the proportion of times the model correctly discriminated between groups, for example, those at high risk and those at low risk. The higher the AUROC, the better the model discriminates between the 2 groups (0.5-0.6 represents discrimination no better than chance; 0.6-0.7, poor; 0.7-0.8, fair; 0.8-0.9, good; and 0.9-1.0, excellent discrimination [52]). Eight studies reported good discrimination [22,40-43,45-47], 2 reported fair discrimination [39,44], and 1 study did not report any performance measurement [38].
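The AUROC can be computed directly as the probability that a randomly chosen admitted patient receives a higher predicted score than a randomly chosen discharged patient (the Mann-Whitney formulation). A minimal sketch, with made-up predicted probabilities:

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the proportion of
    (admitted, not-admitted) pairs that the score ranks correctly,
    counting ties as half-correct."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predicted admission probabilities (label 1 = admitted)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,   0,   0,   0]
print(auroc(scores, labels))  # 0.875, "good" on the scale above
```

Here 14 of the 16 admitted/not-admitted pairs are ranked correctly, giving 0.875, which falls in the 0.8-0.9 "good" band reported by most of the reviewed models.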

Calibration is the extent to which model-predicted risk compares to observed outcomes (ie, the difference between rates of observed events and predicted events for groups) [54]. Calibration is usually reported graphically by plotting observed against predicted event rates [55] and is commonly measured with the Hosmer-Lemeshow statistical test for binary categorical outcomes [54]. Most studies that measured calibration statistically reported good agreement between predicted and observed hospital admission. Seven studies evaluated calibration, using Hosmer-Lemeshow [39,43,44,47], threshold probability of admission [46], or R2 [22]; 1 of the 7 did not report which statistic was used [40], and 2 of the 7 did not report results [39,44]. Four studies did not measure calibration [38,41,42,45].
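The Hosmer-Lemeshow idea can be sketched in a few lines: sort patients by predicted risk, split them into risk groups, and compare observed with expected admissions in each group. This is a simplified illustration on synthetic data (a p-value would additionally be taken from a chi-square distribution with groups−2 degrees of freedom, omitted here):

```python
import numpy as np

def hosmer_lemeshow_stat(prob, y, groups=10):
    """Hosmer-Lemeshow statistic: split patients into `groups` bins by
    predicted risk and sum (observed - expected)^2 / variance per bin.
    Small values indicate good calibration."""
    order = np.argsort(prob)
    prob, y = np.asarray(prob)[order], np.asarray(y)[order]
    stat = 0.0
    for bin_p, bin_y in zip(np.array_split(prob, groups),
                            np.array_split(y, groups)):
        n = len(bin_y)
        expected = bin_p.sum()                 # expected admissions in the bin
        observed = bin_y.sum()                 # observed admissions in the bin
        var = expected * (1 - expected / n)    # binomial variance approximation
        stat += (observed - expected) ** 2 / var
    return stat

# Well-calibrated synthetic model: outcomes drawn from the model's own probabilities
rng = np.random.default_rng(1)
prob = rng.uniform(0.05, 0.95, 2000)
y = (rng.uniform(size=2000) < prob).astype(int)
hl = hosmer_lemeshow_stat(prob, y)
print(round(hl, 1))  # small relative to a chi-square with 8 df -> good calibration
```

Because the synthetic outcomes are generated from the predicted probabilities themselves, the statistic stays near its null expectation, which is what "reported to be good" calibration corresponds to in Table 1.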

Model Evaluation: CHARMS Domain 9

The utility of predictive models depends on their external validation, that is, performance evaluation on an independent data set. External validation took a variety of forms: different settings with different samples [40], the same locations with different samples [43,45,46], and nurse opinion on likely patient admission [22,39]. Five models were only internally validated [38,41,42,44,47].

Model Results: CHARMS Domain 10

Predictive accuracy and precision drive model performance and the extent to which it can estimate the probability of individual patient outcomes, as well as model suitability for clinical and administrative uses.

The models in the 11 studies were not operational (no apps had been developed, and there was no integration with information systems or workflow) and were not tested in the environments in which they would be used, which compromised evaluation of model feasibility. Operational models would identify patients likely to require hospital admission; thus, there is great utility and potential for models to improve patient care and hospital operations, including by reducing hospital exit block, emergency department boarding, and ultimately emergency department crowding.

Interpretation and Discussion: CHARMS Domain 11

The utility of select prehospital patient data to act as predictors and as data-driven, actionable tools to identify patients requiring hospital admission was shown. Models utilizing biomarker predictors (eg, blood pressure, heart rate) [38,43,45] may provide advantages due to the standardized definition, measurement, and interpretation of these biomarker measures. Models that use only biomarker predictors may be widely applicable and robust, and their results may generalize across populations and environments. Models that did not include patient history variables (eg, chronic conditions, number of prior emergency department visits) [22,38,40,47] may have greater applicability because they do not rely on the availability of medical record information or patient reports. The predictors in these models (prehospital patient data collected early in the emergency department visit or during ambulance transport) are not the only options for predicting patient admission but are likely the best options for making timely predictions using data collected in the early stages of an urgent care visit.

AUROC values suggested fair to good ability to distinguish between outcome groups (admitted, not admitted), and thus, to predict patient imminent need for hospital admission. Likewise, the utility of the variables as predictors for the identification of patients likely to require imminent hospital admission was shown.

Risk of Bias Assessment

Data transformation can increase risk of bias by satisfying assumptions without changing the scale of representation [56]. Five studies did not transform raw data [38,44-47]. The other 6 studies transformed predictors, for example, by categorizing or dichotomizing continuous variables [22,39-43].

Evaluating heterogeneous predictors across studies introduces bias if they are treated as identical. In 2 studies, bias was low because standardized, frequently calibrated equipment was used to measure predictors (eg, blood pressure, laboratory values), producing measurements that are comparable across studies, require no manipulation (eg, dichotomization, categorization), and are more likely to retain reliability when applied to new populations [38,43]. Age has been shown to inject bias; for example, the same model can appear to perform better when applied to a sample with a wide age range than when applied to a sample with a narrow age range [57]. Nine models included age [22,39-41,43-47], with only 2 studies indicating age >60 years [44,45].

Estimating sample size during study design minimizes model overfitting and includes calculating events-per-variable, the number of outcome events needed per predictor variable to achieve meaningful results [37]. Events-per-variable is generally poorly reported in prediction model studies [34] and was not reported in any of the included studies; however, it can be calculated from other study information to aid assessment of study quality, which allowed the appropriateness of most studies' sample sizes to be evaluated. This ratio was calculated using each study's limiting sample size, the smaller of the two outcome event groups (admitted or not admitted) [37]. The focus is on the smaller group because the total sample size is not directly relevant in binary models [37]. The limiting sample size is divided by the number of candidate predictors to produce the limiting events-per-variable ratio.

In 10 studies [22,39-47], the limiting sample size was the number of admitted patients, but in 1 study [38], it was the number of patients who were not admitted (ie, more patients were admitted than discharged). Limiting events-per-variable could not be calculated for 3 models because either the proportion of admitted patients or the number of candidate predictors was not reported [22,41,42]. The limiting sample sizes ranged from 132 to 334,115, producing limiting events-per-variable ratios of 9 to 30,374. The limiting events-per-variable was sufficient in most studies to obtain meaningful results and avoid bias from an overfitted model; however, at 9 events-per-variable, 1 model [44] was below the recommended 10 to 15 events-per-variable [42,58,59] and was in jeopardy of bias.
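The calculation itself is simple arithmetic. The sketch below uses 132 admitted patients (the smallest limiting sample size above) with an assumed 14 candidate predictors; the predictor count and the not-admitted count are illustrative, not figures reported by any reviewed study:

```python
def limiting_epv(n_admitted, n_not_admitted, n_candidate_predictors):
    """Limiting events-per-variable: the smaller outcome group divided
    by the number of candidate predictors evaluated."""
    limiting = min(n_admitted, n_not_admitted)  # smaller outcome group
    return limiting / n_candidate_predictors

# Illustrative: 132 admitted of 400 patients, 14 candidate predictors (assumed)
epv = limiting_epv(132, 400 - 132, 14)
print(round(epv, 1))  # 9.4 -- below the recommended 10-15, risking overfitting
```

A ratio below the recommended 10 to 15 signals that the development sample may be too small for the number of predictors screened, which is exactly the overfitting risk flagged for the 9 events-per-variable model above.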

The handling of missing data can inject bias. To mitigate this, 3 studies used multiple imputation [39,41,43], substituting missing observations with plausible estimated values derived from analysis of the available data, which is the preferred method for handling missing data in prediction research [34,60]. One study [42] reported replacing missing values but did not disclose how they were replaced, and the remaining 7 studies did not describe the handling of missing data [22,38,40,44-47], which suggested an element of risk of bias. Data are usually not missing at random; instead, missingness is related to other observed participant data, and, as a consequence, participants with complete data differ from those with incomplete data [34,61].
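A highly simplified sketch of the multiple-imputation workflow follows: build m complete data sets, analyze each, then pool the estimates. Real implementations (eg, chained equations, as used in the 3 studies above) draw imputations from regression models conditional on other variables; here, missing values are drawn from the observed values only to show the m-data-sets-then-pool structure, and the blood pressure variable and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative variable: systolic blood pressure with 30 of 200 values missing
sbp = rng.normal(130, 15, 200)
sbp[rng.choice(200, 30, replace=False)] = np.nan
observed = sbp[~np.isnan(sbp)]

m = 5                      # number of imputed data sets
estimates = []
for _ in range(m):
    filled = sbp.copy()
    n_miss = int(np.isnan(filled).sum())
    # Impute (crudely) by sampling from observed values, giving one complete data set
    filled[np.isnan(filled)] = rng.choice(observed, n_miss)
    estimates.append(filled.mean())            # analysis step on each data set

pooled = float(np.mean(estimates))             # pooled point estimate (Rubin's rules)
between_var = float(np.var(estimates, ddof=1)) # between-imputation variance component
print(round(pooled, 1))
```

The between-imputation variance is what single imputation discards: it carries the uncertainty due to the missing values into the pooled standard errors, which is why multiple imputation is preferred over filling in one value and treating it as observed.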

Per the PROBAST definition, a model that is only internally validated is a development-only study, not a development and validation study; a model must be externally validated to be considered a development and validation study. While 6 of the models were externally validated [22,39,40,43,45,46], 2 of these studies used nurses’ opinions [22,39] and were not validated with data.

Inclusion of false predictors increases the likelihood of model overfitting because the model corresponds too closely to its derivation data set and fails to fit other relevant data sets or predict future observations reliably [62], resulting in overly optimistic predictions of model performance for new data sets [34]. In univariate analysis, each predictor is tested individually for its association with the outcome, and the most statistically significant predictors are included in the model. However, univariate analysis is not the preferred method because it commonly introduces selection bias when predictors selected for model inclusion have a large but false association with the outcome [18,63]. In small samples, predictors could initially show no association with outcome, but after adjustment for other predictors, may show association with the outcome [34]. Conversely, multivariate modeling is preferred for predictor selection because there is no selection bias since all predictors are prespecified. Only 4 of the models used multivariate modeling for predictor selection [38,41,44,45], and the remaining models used univariate analysis [22,39,40,42,43,46,47].

Principal Findings

This study showed the utility of select prehospital patient data to act as predictors for model identification of patients likely to require hospital admission, and it showed that the models produced information that could be used to improve patient care and hospital operations. Ten studies reported model discrimination with AUROC: 8 reported values [22,40-43,45-47] that suggest good ability to distinguish between outcome groups (admitted, not admitted) and, thus, to predict patients’ imminent need for hospital admission. An example of model application is an earlier bed request for patients predicted to require admission, giving managers more time to secure a patient bed. This forewarning could inform operational procedures that decrease exit block and increase patient flow out of the emergency department [13].

Potential sources of bias that may cause flawed or distorted model predictions were found in every model, ranging from minor (not reporting the handling of missing values [38,39,43,44,47]; univariate predictor selection [39,47]) to potentially damaging (dichotomized continuous variables [22,41,43]; low events-per-variable [44]; no external validation [38,41,42,44,47]), which suggests that study reports of models’ abilities to predict outcomes have the potential to be flawed. This is consistent with other evaluations of prediction modeling studies [34], including evaluations applying CHARMS and PROBAST in the emergency department setting [35,36].

Overall, model performances were reportedly good, with most models showing good ability to discriminate between patients who do and do not require imminent hospital admission [22,40-43,45-47] and almost half reporting good calibration to detect differences between observed and predicted admission rates [40,43,46,47]. Although several studies did not measure calibration [38,41,42,45], the remainder did [22,39,40,43,44,46,47]; however, all but 1 study [22] reported its measurement poorly. Findings of neglected calibration measures, with an overreliance on discrimination measures, are consistent with those of other reports [34]. Assessing and reporting both discrimination and calibration are important in prediction model evaluation. No models were found to have operated through an app, and none had been integrated with an information system. However, to function as intended, most models required development of an electronic app to receive patient data, run the algorithm, and produce results; most also required app integration with an information system to produce real-time admission prediction. No study described a process to achieve app development or system integration.

Biomarker predictors may contribute superior value to a model because of their lack of variability in definition, measurement, and interpretation and their freedom from the confines of patient histories, resulting in wide applicability.

The quantity of candidate predictors demonstrated the breadth of potential influences on patients’ imminent need for hospital admission. However, the number of predictors across studies did not accurately reflect this quantity because, across studies, multiple names were used for the same predictor, identically named predictors were defined differently, data collection and evaluation methods varied, and predictors composed of multiple variables were not fully specified.

Models have the potential to facilitate hospital admission, subsequently reducing or ending hospital exit block, emergency department boarding, and emergency department crowding; however, none had been implemented or tested.

To develop models with the most potential, future investigations must address these deficiencies, avoid risk of bias in model design and investigation, verify the utility of biomarker predictors and the most useful predictor combinations, evaluate the real-time utility of admission prediction for hospital operations, compare the performance of technology-enabled prediction with clinician intuition, and verify the longitudinal impact of models on patient care and hospital operations.


Limitations

Although the findings of this review are valuable and add to the current literature on artificial intelligence models in the emergency department setting, this study has several limitations. First, this was a critique of the methodologies used in the models; we did not consider the feasibility of the models examined. Second, the selection of studies and PROBAST assessments were performed by one researcher, with a second researcher providing oversight; using multiple researchers would have ensured intercoder reliability and mitigated systematic errors. Additionally, only studies in English that were conducted with emergency department setting data were included. That being said, this study closely adhered to the CHARMS methodology for study evaluation.

Comparison With Prior Work

We applied both CHARMS and PROBAST to studies that used logistic regression and data from emergency department settings. Our findings are consistent with those of previous systematic reviews [35,36,64,65] that applied PROBAST and CHARMS methodologies to evaluate health care prediction models, in terms of risk of bias. We aimed to provide a focused, in-depth analysis by identifying and appraising hospital admission prediction models that utilized prehospital patient data in a defined setting (the emergency department). Four health care prediction model studies that applied PROBAST and CHARMS methodologies were reviewed; however, although 2 [35,36] were set in the emergency department, the evaluation variables and outcomes of interest differed across all 4 studies [35,36,64,65].


Authors' Contributions

ACM conceived the study design, conducted the literature review and analysis, and contributed to writing the manuscript. SSF provided oversight of the study design and literature analysis and contributed to writing the manuscript.

Conflicts of Interest

SSF receives consultancy fees from Guideway Cares (which are not in relation to this work).

Multimedia Appendix 1

Study characteristics by CHARMS domains.

DOCX File , 34 KB

Multimedia Appendix 2

Completed PROBAST.

DOCX File , 53 KB

Multimedia Appendix 3

Predictors evaluated by each study.

DOCX File , 29 KB

  1. Sinclair D. Emergency department overcrowding - implications for paediatric emergency medicine. Paediatr Child Health 2007 Jul;12(6):491-494 [FREE Full text] [CrossRef] [Medline]
  2. American College of Emergency Physicians (ACEP). Crowding. policy statement. Ann Emerg Med 2013 Jun;61(6):726-727. [CrossRef] [Medline]
  3. Di Somma S, Paladino L, Vaughan L, Lalle I, Magrini L, Magnanti M. Overcrowding in emergency department: an international issue. Intern Emerg Med 2014 Dec 2;10(2):171-175. [CrossRef] [Medline]
  4. Higginson I, Boyle A. What should we do about crowding in emergency departments? Br J Hosp Med (Lond) 2018 Sep 02;79(9):500-503 [FREE Full text] [CrossRef] [Medline]
  5. Richards JR, van der Linden C, Derlet RW. Providing Care in Emergency Department Hallways: Demands, Dangers, and Deaths. Adv Emerg Med 2014 Dec 25;2014:1-7 [FREE Full text] [CrossRef]
  6. Stead LG, Jain A, Decker WW. Emergency department over-crowding: a global perspective. Int J Emerg Med 2009 Sep 30;2(3):133-134 [FREE Full text] [CrossRef] [Medline]
  7. Kauppila T, Seppänen K, Mattila J, Kaartinen J. The effect on the patient flow in a local health care after implementing reverse triage in a primary care emergency department: a longitudinal follow-up study. Scand J Prim Health Care 2017 Jun 08;35(2):214-220 [FREE Full text] [CrossRef] [Medline]
  8. Henderson K, Boyle A. Exit block in the emergency department: recognition and consequences. Br J Hosp Med (Lond) 2014 Nov 02;75(11):623-626 [FREE Full text] [CrossRef] [Medline]
  9. Higginson I. Emergency department crowding. Emerg Med J 2012 Jun 04;29(6):437-443. [CrossRef] [Medline]
  10. Institute of Medicine. Hospital-Based Emergency Care: At the Breaking Point. Washington, DC: National Academies Press; 2006.
  11. Mason S, Knowles E, Boyle A. Exit block in emergency departments: a rapid evidence review. Emerg Med J 2017 Jan 27;34(1):46-51. [CrossRef] [Medline]
  12. Scott I, Sullivan C, Staib A, Bell A. Deconstructing the 4-h rule for access to emergency care and putting patients first. Aust Health Rev 2018;42(6):698. [CrossRef] [Medline]
  13. Orewa G, Feldman SS, Hearld KR, Kennedy KC, Hall AG. Using accountable care teams to improve timely discharge: a pilot study. Qual Manag Health Care 2021 Aug 03:1. [CrossRef] [Medline]
  14. Carter EJ, Pouch SM, Larson EL. The relationship between emergency department crowding and patient outcomes: a systematic review. J Nurs Scholarsh 2014 Mar 19;46(2):106-115 [FREE Full text] [CrossRef] [Medline]
  15. Reznek MA, Murray E, Youngren MN, Durham NT, Michael SS. Door-to-imaging time for acute stroke patients is adversely affected by emergency department crowding. Stroke 2017 Jan;48(1):49-54. [CrossRef] [Medline]
  16. Odom N, Babb M, Velez L, Cockerham Z. Stud Health Technol Inform 2018;250:178-181. [Medline]
  17. Eriksson CO, Stoner RC, Eden KB, Newgard CD, Guise J. The association between hospital capacity strain and inpatient outcomes in highly developed countries: a systematic review. J Gen Intern Med 2017 Jun 15;32(6):686-696 [FREE Full text] [CrossRef] [Medline]
  18. Moons KG, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019 Jan 01;170(1):w1-w33. [CrossRef] [Medline]
  19. Simon GE, Johnson E, Lawrence JM, Rossom RC, Ahmedani B, Lynch FL, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry 2018 Oct 01;175(10):951-960 [FREE Full text] [CrossRef] [Medline]
  20. Mohamadlou H, Lynn-Palevsky A, Barton C, Chettipally U, Shieh L, Calvert J, et al. Prediction of acute kidney injury with a machine learning algorithm using electronic health record data. Can J Kidney Health Dis 2018 Jun 08;5:2054358118776326 [FREE Full text] [CrossRef] [Medline]
  21. Garcea G, Ganga R, Neal CP, Ong SL, Dennison AR, Berry DP. Preoperative early warning scores can predict in-hospital mortality and critical care admission following emergency surgery. J Surg Res 2010 Apr;159(2):729-734. [CrossRef] [Medline]
  22. Peck J, Benneyan J, Nightingale D, Gaehde S. Predicting emergency department inpatient admissions to improve same-day patient flow. Acad Emerg Med 2012 Sep;19(9):E1045-E1054 [FREE Full text] [CrossRef] [Medline]
  23. Allaudeen N, Vidyarthi A, Maselli J, Auerbach A. Redefining readmission risk factors for general medicine patients. J Hosp Med 2011 Feb 12;6(2):54-60. [CrossRef] [Medline]
  24. Mudge AM, Kasper K, Clair A, Redfern H, Bell JJ, Barras MA, et al. Recurrent readmissions in medical patients: a prospective study. J Hosp Med 2011 Feb 12;6(2):61-67. [CrossRef] [Medline]
  25. Li C, Chang H, Wang H, Bai Y. Diabetes, functional ability, and self-rated health independently predict hospital admission within one year among older adults: a population based cohort study. Arch Gerontol Geriatr 2011 Mar;52(2):147-152. [CrossRef] [Medline]
  26. Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk prediction models for hospital readmission: a systematic review. JAMA 2011 Oct 19;306(15):1688-1698 [FREE Full text] [CrossRef] [Medline]
  27. Nguyen OK, Makam AN, Clark C, Zhang S, Das SR, Halm EA. Predicting 30‐day hospital readmissions in acute myocardial infarction: The AMI “READMITS” (renal function, elevated brain natriuretic peptide, age, diabetes mellitus, nonmale sex, intervention with timely percutaneous coronary intervention, and low systolic blood pressure) score. JAHA 2018 Apr 17;7(8):e008882. [CrossRef]
  28. Bergese I, Frigerio S, Clari M, Castagno E, De Clemente A, Ponticelli E, et al. An innovative model to predict pediatric emergency department return visits. Pediatr Emerg Care 2019 Mar;35(3):231-236. [CrossRef] [Medline]
  29. Steyerberg EW, Harrell FE. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol 2016 Jan;69:245-247 [FREE Full text] [CrossRef] [Medline]
  30. Brackett A, Batten J. Ensuring the rigor in systematic reviews: Part 1, the overview. Heart Lung 2020;49(5):660-661. [CrossRef] [Medline]
  31. Ioannidis JP. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q 2016 Sep 13;94(3):485-514 [FREE Full text] [CrossRef] [Medline]
  32. McIntosh GS, Steenstra I, Hogg-Johnson S, Carter T, Hall H. Lack of prognostic model validation in low back pain prediction studies: a systematic review. Clin J Pain 2018 Aug;34(8):748-754. [CrossRef] [Medline]
  33. Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019 Jan 01;170(1):51. [CrossRef] [Medline]
  34. Moons KM, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med 2014 Oct;11(10):e1001744 [FREE Full text] [CrossRef] [Medline]
  35. Miles J, Turner J, Jacques R, Williams J, Mason S. Using machine-learning risk prediction models to triage the acuity of undifferentiated patients entering the emergency care system: a systematic review. Diagn Progn Res 2020 Oct 02;4(1):16 [FREE Full text] [CrossRef] [Medline]
  36. Kareemi H, Vaillancourt C, Rosenberg H, Fournier K, Yadav K. Machine learning versus usual care for diagnostic and prognostic prediction in the emergency department: a systematic review. Acad Emerg Med 2021 Feb 02;28(2):184-196. [CrossRef] [Medline]
  37. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med 2004 May 01;66(3):411-421. [CrossRef] [Medline]
  38. Burch VC, Tarr G, Morroni C. Modified early warning score predicts the need for hospital admission and inhospital mortality. Emerg Med J 2008 Oct 01;25(10):674-678. [CrossRef] [Medline]
  39. Kim SW, Li JY, Hakendorf P, Teubner DJ, Ben-Tovim DI, Thompson CH. Predicting admission of patients by their presentation to the emergency department. Emerg Med Australas 2014 Aug 16;26(4):361-367. [CrossRef] [Medline]
  40. Kraaijvanger N, Rijpsma D, Roovers L, van Leeuwen H, Kaasjager K, van den Brand L, et al. Development and validation of an admission prediction tool for emergency departments in the Netherlands. Emerg Med J 2018 Aug 07;35(8):464-470. [CrossRef] [Medline]
  41. Cameron A, Rodgers K, Ireland A, Jamdar R, McKay GA. A simple tool to predict admission at the time of triage. Emerg Med J 2015 Mar 13;32(3):174-179 [FREE Full text] [CrossRef] [Medline]
  42. Hong WS, Haimovich AD, Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One 2018 Jul 20;13(7):e0201016 [FREE Full text] [CrossRef] [Medline]
  43. Lucke JA, de Gelder J, Clarijs F, Heringhaus C, de Craen AJM, Fogteloo AJ, et al. Early prediction of hospital admission for emergency department patients: a comparison between patients younger or older than 70 years. Emerg Med J 2018 Jan 16;35(1):18-27. [CrossRef] [Medline]
  44. Meisel ZF, Pollack CV, Mechem CC, Pines JM. Derivation and internal validation of a rule to predict hospital admission in prehospital patients. Prehosp Emerg Care 2008 Jul 02;12(3):314-319. [CrossRef] [Medline]
  45. Meisel Z, Mathew R, Wydro G, Crawford Mechem C, Pollack C, Katzer R, et al. Multicenter validation of the Philadelphia EMS admission rule (PEAR) to predict hospital admission in adult patients using out-of-hospital data. Acad Emerg Med 2009 Jun;16(6):519-525 [FREE Full text] [CrossRef] [Medline]
  46. Parker CA, Liu N, Wu SX, Shen Y, Lam SSW, Ong MEH. Predicting hospital admission at the emergency department triage: a novel prediction model. Am J Emerg Med 2019 Aug;37(8):1498-1504. [CrossRef] [Medline]
  47. Sun Y, Heng B, Tay S, Seow E. Predicting hospital admissions at emergency department triage using routine administrative data. Acad Emerg Med 2011 Aug;18(8):844-850 [FREE Full text] [CrossRef] [Medline]
  48. Feldman SS, Davlyatov G, Hall AG. Toward understanding the value of missing social determinants of health data in care transition planning. Appl Clin Inform 2020 Aug 26;11(4):556-563 [FREE Full text] [CrossRef] [Medline]
  49. Rubin DB. Inference and missing data. Biometrika 1976 Dec;63(3):581-592. [CrossRef]
  50. Lin W, Tsai C. Missing value imputation: a review and analysis of the literature (2006–2017). Artif Intell Rev 2019 Apr 5;53(2):1487-1509. [CrossRef]
  51. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics 2005 Mar;61(1):92-105. [CrossRef] [Medline]
  52. Tape TG. The area under an ROC curve. The University of Nebraska. URL: [accessed 2020-10-03]
  53. Wiens J, Price WN, Sjoding MW. Diagnosing bias in data-driven algorithms for healthcare. Nat Med 2020 Jan 13;26(1):25-26. [CrossRef] [Medline]
  54. Waljee A, Higgins P, Singal A. A primer on predictive models. Clin Transl Gastroenterol 2014 Jan 02;5(1):e44 [FREE Full text] [CrossRef] [Medline]
  55. Harrell JF. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression and Survival Analysis. New York: Springer; 2001.
  56. Rothery P. A cautionary note on data transformation: bias in back-transformed means. Bird Study 2009 Jun 24;35(3):219-221. [CrossRef]
  57. Pencina MJ, D'Agostino RB. Evaluating discrimination of risk prediction models: the C statistic. JAMA 2015 Sep 08;314(10):1063-1064. [CrossRef] [Medline]
  58. Peduzzi P, Concato J, Feinstein AR, Holford TR. Importance of events per independent variable in proportional hazards regression analysis. II. accuracy and precision of regression estimates. J Clin Epidemiol 1995 Dec;48(12):1503-1510. [CrossRef] [Medline]
  59. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996 Dec;49(12):1373-1379. [CrossRef] [Medline]
  60. Little R, Rubin D. Statistical Analysis With Missing Data. England: John Wiley & Sons, Inc; 2002.
  61. Vandenbroucke J, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, STROBE Initiative. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. Int J Surg 2014 Dec;12(12):1500-1524 [FREE Full text] [CrossRef] [Medline]
  62. Moons KM, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: I. development, internal validation, and assessing the incremental value of a new (bio)marker. Heart 2012 May 07;98(9):683-690. [CrossRef] [Medline]
  63. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009.
  64. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ 2020 Apr 07;369:m1328 [FREE Full text] [CrossRef] [Medline]
  65. Shamsoddin E. Can medical practitioners rely on prediction models for COVID-19? a systematic review. Evid Based Dent 2020 Sep;21(3):84-86 [FREE Full text] [CrossRef] [Medline]

AUROC: area under the receiver operating characteristics curve
CHARMS: Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies
PROBAST: Prediction Model Risk of Bias Assessment Tool

Edited by G Eysenbach, C Lovis; submitted 28.04.21; peer-reviewed by A Vagelatos; comments to author 19.05.21; revised version received 27.05.21; accepted 28.07.21; published 16.09.21


©Ann Corneille Monahan, Sue S Feldman. Originally published in JMIR Medical Informatics, 16.09.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.