Predicting Intensive Care Unit Length of Stay and Mortality Using Patient Vital Signs: Machine Learning Model Development and Validation

Patient monitoring is vital in all stages of care. We here report the development and validation of ICU length of stay and mortality prediction models. The models will be used in an intelligent ICU patient monitoring module of an Intelligent Remote Patient Monitoring (IRPM) framework that monitors the health status of patients, and generates timely alerts, maneuver guidance, or reports when adverse medical conditions are predicted. We utilized the publicly available Medical Information Mart for Intensive Care (MIMIC) database to extract ICU stay data for adult patients to build two prediction models: one for mortality prediction and another for ICU length of stay. For the mortality model, we applied six commonly used machine learning (ML) binary classification algorithms for predicting the discharge status (survived or not). For the length of stay model, we applied the same six ML algorithms for binary classification using the median patient population ICU stay of 2.64 days. For the regression-based classification, we used two ML algorithms for predicting the number of days. We built two variations of each prediction model: one using 12 baseline demographic and vital sign features, and the other based on our proposed quantiles approach, in which we use 21 extra features engineered from the baseline vital sign features, including their modified means, standard deviations, and quantile percentages. We could perform predictive modeling with minimal features while maintaining reasonable performance using the quantiles approach. The best accuracy achieved in the mortality model was approximately 89% using the random forest algorithm. The highest accuracy achieved in the length of stay model, based on the population median ICU stay (2.64 days), was approximately 65% using the random forest algorithm.


Background
Precision observation and assessment are crucial tasks for "achieving an early diagnosis, informed planning, reflecting on the suitability of treatment options, information exchanging, and designing better health interventions" [1]. The use of artificial intelligence-based solutions to improve health care services is increasing [2] and patient monitoring is now an integral part of clinical intelligence [3]. The intensive care unit (ICU) is one of the most critical and resource-intensive units in hospitals, and ICU patient monitoring and continuous clinical surveillance have the potential to reduce morbidity and improve the quality of care. Therefore, hospitals often seek solutions that reduce waste and wait times while increasing service efficiency, accuracy, and productivity [2]. One of the issues in current monitoring approaches is that the data are collected via sensing devices and sent to remote diagnostic testing facilities for further, often manual or semiautomated, interpretation by a health care professional. Thus, there is a need for intelligent solutions for ICU patient monitoring that require minimal human intervention and that can monitor the health status of patients, and generate timely alerts, maneuver guidance, and reports whenever adverse medical conditions are anticipated.
In our previous work [4], we proposed an Intelligent Remote Patient Monitoring (IRPM) framework (Figure 1) that consists of three modules: (i) an out-of-hospital module that utilizes data collected via wearable devices (eg, Apple Watch and SleepO2); (ii) a decision support module that generates reports; and (iii) an intelligent ICU patient monitoring module, which utilizes data collected from ICUs. We here focus on the last of these modules. The IRPM framework is intended to serve as a global web service interface that exposes the different framework functionalities to hospitals, hospital managers, insurance companies, and other decision makers, including the host organizations that operate and maintain the IRPM system. The intelligent ICU patient monitoring functionality of the service performs analytics of the data exchanged between ICUs and the core IRPM system, and provides the different stakeholders with the analysis results in the form of timely and early warnings.
Three main factors impact the quality of prediction models: (1) the target patient population [5], (2) methods used for data fusion, and (3) algorithm type. Different populations lead to different prediction results. Moreover, different ways of combining information from physiological variables lead to various outcome measures. The IRPM framework is intended to be hosted in the cloud since the intelligent ICU patient monitoring module aims at applying machine learning (ML) within an architecture that allows any user (regardless of whether or not they are sick) as well as any hospital system to use the framework. Since most of the physiological variables used can be obtained both inside and outside hospitals, the framework will enable continuous patient monitoring. Therefore, we built ML models by utilizing features that are easy to obtain outside the hospital setting, and we avoided features that are sophisticated and require high-level medical equipment.

Related Works
There has been some research effort toward developing ML models for predicting ICU-related outcomes [6][7][8]. McCarthy et al [9] performed a study on ICU mortality prediction in which they compared sliding-window predictors with recurrent predictors to classify patient state of health from ICU multivariate time-series data. They reported slightly improved performance for the recurrent neural network. Moreover, Zhu et al [10] proposed an ICU mortality prediction algorithm combining the bidirectional long short-term memory (LSTM) model with supervised learning. They trained and evaluated the LSTM model using 4000 ICU patients. They also performed a comparative analysis, which identified that their proposed method significantly outperformed several baseline methods.
A few studies have also focused on developing and validating ML models for predicting ICU-related outcomes using the Medical Information Mart for Intensive Care (MIMIC) database. Most of these works have used an exhaustive list of features to achieve higher accuracy in their models. Johnson and colleagues [11][12][13] developed models for predicting ICU mortality, achieving an area under the receiver operating characteristic (ROC) curve (AUROC) of 0.92 using a total of 148 features [12] and an AUROC of 0.86 using a range of features, including standard statistical descriptors [13]. Lehman et al [14] used basic physiological variables and applied the Simplified Acute Physiology Score (SAPS-I) algorithm to predict mortality, which achieved an AUROC of 0.72. Using the Cohen standardized mean and coefficient, Tyler et al [15] assessed the differences between ICU lab values, which were used to predict ICU length of stay (LOS) and mortality. Harutyunyan et al [16] selected 17 clinical variables to build a binary LOS model to predict whether a patient will stay in the ICU for a long (≥7 days) or short (<7 days) period with 84% accuracy. Gentimis et al [17] used several inputs from seven tables to build an LOS model to predict whether a patient will stay in the ICU for a long (>5 days) or short (≤5 days) period using neural networks, with around 80% accuracy; they removed patients who stayed in the ICU longer than 20 days. Bertsimas et al [18] used several static and dynamic variables (eg, general admission data, lab results, medical orders, pharmacy data, diagnosis codes, and notes) and different classification methods to predict different LOS with accuracy in the >80% range. Some works have focused on developing ML models to be used in clinical information systems that assist in ICU discharge planning. Badawi and Breslow [19] developed and validated two models for predicting risks of death and readmission within 48 hours of ICU discharge. They used eICU Research Institute data from more than 400 ICUs and performed multivariate logistic regression (MLR) with 59 different features, including patient demographics, ICU admission diagnosis, admission severity of illness, intensive care interventions, complications occurring during the ICU stay, lab values, and physiological variables recorded within the last 24 hours of the ICU stay. They calibrated their models across deciles of risk, and their mortality model accurately discriminated between patients who would and would not experience a complication as early as 4 days before ICU discharge. However, to the best of our knowledge, predicting the LOS based on the population's median ICU patient stay using only vital signs and demographic attributes from MIMIC data has not been studied to date.

Objective
We here propose a new approach that focuses on the most critical observations in a patient's profile. The novelty of the approach lies in its ability to predict outcomes with reasonable accuracy by utilizing only vital signs that exist in the patient's profile without having prior knowledge about a patient's medical conditions or diagnoses. The approach enriches the original vital sign measures by adding extra features pertaining to their modified means, modified SDs, and quantile percentages. We evaluated the proposed approach (ie, the quantiles approach) in comparison to a baseline approach that uses the entire range of observations. We then applied both approaches to develop and validate two prediction models using public data from the MIMIC database: (i) one focusing on classifying ICU mortality (survival or no survival), and (ii) another focusing on predicting the LOS in the ICU.

Study Population and Data Extraction
We used MIMIC-III (v1.4) [7], a publicly available ICU adult patient database that spans 11 years between 2001 and 2012. MIMIC-III has data for 53,423 distinct hospital admissions, including nearly 500 million rows in 26 tables. The database comprises features including patient demographics, laboratory test results, medical reports, and results from imaging studies. To meet Health Insurance Portability and Accountability Act requirements, approximate ages for patients older than 89 years are reported by shifting their date of birth.
Figure 2 illustrates the data extraction pipeline of ICU stay data from the MIMIC database. We started with 61,532 total ICU stay encounters. In each hospital admission, a patient could have stayed in the ICU more than once. We performed this study based on unique ICU stays rather than unique patient identifiers since our goal was to predict mortality and LOS without having prior knowledge about patients' medical conditions or diagnoses.
For patients who stayed in the ICU for at least 1 day, we considered their data for only the first day. The population's median ICU LOS was 2.64 days, and therefore we discarded data from patients who stayed in the ICU for less than 1 day, which resulted in a total of 45,254 unique ICU stays. For each ICU stay, we ran separate SQL queries to extract the patients' vital sign measurements, and height and weight features from the total 61,532 encounters. We focused on six vital sign features (body temperature, heart rate, respiration rate, systolic blood pressure, diastolic blood pressure, and oxygen saturation [SpO2]) along with glucose level. The total number of ICU stays for which vital sign features were available was 59,241. We extracted four demographic features (weight, height, age, and gender). We then performed consecutive inner joins between the results of the three queries; thus, the total ICU stays reduced to 44,626 unique ICU stays.
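The consecutive inner joins described above keep only the ICU stays that appear in every query result. The following is a minimal sketch in plain Python, assuming each query result is held as a dictionary keyed by an ICU stay identifier; the function name, field names, and toy records are hypothetical and are not taken from the study's SQL.

```python
def inner_join_on_stay(*tables):
    """Inner-join dict 'tables' keyed by ICU stay id.

    Only stay ids present in every table survive, mirroring
    consecutive SQL inner joins on the stay identifier.
    """
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    joined = {}
    for stay_id in sorted(common):
        row = {}
        for table in tables:
            row.update(table[stay_id])
        joined[stay_id] = row
    return joined


# Toy query results (hypothetical values, not MIMIC data).
vitals = {1: {"heart_rate": 82.0}, 2: {"heart_rate": 95.0}, 3: {"heart_rate": 70.0}}
body = {1: {"weight_kg": 80.0, "height_cm": 175.0},
        3: {"weight_kg": 62.0, "height_cm": 160.0}}
demographics = {1: {"age": 64, "gender": "M"},
                2: {"age": 71, "gender": "F"},
                3: {"age": 55, "gender": "F"}}

cohort = inner_join_on_stay(vitals, body, demographics)
print(sorted(cohort))  # stay 2 is dropped because it has no height/weight record
```

Because an inner join discards any stay missing from at least one result set, the cohort can only shrink at each step, which is consistent with the reduction in unique ICU stays reported above.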

Data Preprocessing
To enhance the accuracy of the predictive models, we eliminated extreme, trivial, and negative observations within each vital sign feature. The percentage of missing data was relatively low (less than 1% for heart rate, respiration rate, systolic blood pressure, diastolic blood pressure, SpO2, and glucose level, and less than 2% for body temperature). Given the low percentage of missing values and the fact that vital signs are numerical values that are typically normally distributed [20], we filled missing values of vital sign observations using the mean.
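The two preprocessing steps can be illustrated with a short sketch. It assumes missing readings are represented as None and that implausible values are identified with a simple plausibility range; the bounds used below (20 to 300 beats per minute) and the function names are illustrative assumptions, not the thresholds used in the study.

```python
from statistics import fmean


def drop_implausible(values, low, high):
    """Remove readings outside [low, high]; keep missing (None) entries."""
    return [v for v in values if v is None or low <= v <= high]


def fill_missing_with_mean(values):
    """Replace None with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


# Toy heart-rate series with a missing value, a negative value, and an extreme value.
heart_rate = [80.0, None, -5.0, 90.0, 400.0, 100.0]
cleaned = fill_missing_with_mean(drop_implausible(heart_rate, low=20.0, high=300.0))
print(cleaned)
```

Removing implausible values before imputing matters: otherwise the extreme readings would distort the mean that is used to fill the gaps.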

Model and Variable Selection
We built two main prediction models: in-hospital mortality and LOS for each ICU admission. Table 1 defines the outcome variables in both models. The outcome variable for the mortality model was in-hospital mortality, which reduces to a binary classification problem with two classes: predicting a patient to survive or not. The dataset has a classification imbalance problem since the in-hospital mortality percentage was 11.897%, whereas the patient survival percentage was 88.103%. The outcome for the LOS model was the number of days a patient stayed in the ICU. Half of the population spent 2.64 days or less in the ICU, which led us to follow two approaches for classification. In the first approach, we followed a binary classification strategy by defining two classes with an equal number of observations by considering 2.64 as a threshold. The first class predicts that a patient will stay in the ICU for 2.64 days or less, and the second class predicts that a patient will stay in the ICU for more than 2.64 days. In the second approach, we followed a regression-based classification strategy by considering the predicted outcome as a continuous variable.
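Thresholding at the population median is what makes the binary LOS problem balanced. A minimal sketch follows, using toy LOS values chosen so that their median is 2.64 days; the function name and data are hypothetical.

```python
from statistics import median


def label_long_stay(los_days, threshold=None):
    """Label each stay 1 if LOS exceeds the threshold, else 0.

    When no threshold is given, the median of the supplied stays is used,
    which splits the population into two (nearly) equal classes.
    """
    if threshold is None:
        threshold = median(los_days)
    labels = [1 if days > threshold else 0 for days in los_days]
    return labels, threshold


# Toy LOS values in days (not MIMIC data); the median is (2.5 + 2.78) / 2 = 2.64.
los = [1.2, 1.9, 2.5, 2.78, 4.0, 10.5]
labels, cutoff = label_long_stay(los)
print(labels, round(cutoff, 2))
```

With a median cutoff, a classifier that always predicts one class scores about 50% accuracy, whereas in the mortality task a classifier that always predicts survival already scores about 88%. This difference should be kept in mind when comparing accuracies across the two models.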
We built two variations of each model: one using the baseline approach and another using the proposed quantiles approach. The models built with the baseline approach used the six vital sign features, glucose, and the five demographic features as predictor variables (Table 2). The models built with the quantiles approach used the same 12 baseline predictor variables, and augmented them with extra modified features. We discuss each model variation separately below.

Baseline Approach
Table 2 shows the descriptive statistics for the predictor variables used in the baseline approach: the patients' vital signs for the first day and the demographic variables. The population had a slight majority of men, and the mean age was 64.35 years.
Pearson correlation coefficients among the vital sign variables in the baseline approach (Table 3) showed weak correlations between the variables, except between systolic and diastolic blood pressure.

Quantiles Approach
When dealing with sequential data, observations that are far from the median are often ignored. We argue that a patient's deteriorating condition is often accompanied by unusually high or low measurements. Thus, these observations are essential as they report the point at which the patient's health status changes dramatically. We propose the notion of the "quantiles approach," in which we perform feature engineering by emphasizing the high and low quantiles of a patient sample. Figure 3 demonstrates the steps performed in the feature engineering pipeline of the quantiles approach.
First, for each patient sample, we extracted values of the 7 vital sign features. Second, for each vital sign feature within that patient sample, we calculated the mean and SD. Third, we normalized the observations within each vital sign feature using the probability density function, and by passing the mean and SD calculated in step 2 as parameters to that function. The blue histograms in Figures 4 and 5 show the distribution of each vital sign feature before normalization, and the red curves show the distribution after normalization. Fourth, we applied the percent point function (PPF) to each normalized vital sign feature to calculate two discrete values corresponding to the low and high values of that feature. The low values correspond to observations of the feature that are less than a given probability (the 25th percentile in our case) and the high values correspond to observations of the feature that are greater than or equal to a given probability (the 75th percentile in our case). Thus, for each vital sign feature, we calculated the values at which each percentage occurs.
Fifth, we used the calculated low and high values from step 4 to extract the observations of the vital sign features that occur in only the first and fourth quantiles (ie, we ignored the second and third quantiles). Sixth, we calculated the mean and SD of the extracted observations. In the remainder of the paper, we refer to these metrics as the modified mean and modified SD to distinguish from the original mean and SD calculated in step 2.
The final step is to calculate the quantile percentage for the vital sign feature by dividing the number of observations extracted in step 5 (ie, those that occur in the first and fourth quantiles) by the original number of observations (in all quantiles in the entire patient sample). Note that since we normalized the observations in the vital sign feature (step 3), the number of observations in the first and fourth quantiles will vary and will not always be 50% of the original observations.
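The steps above can be sketched for a single vital sign of a single patient as follows. This is an illustrative reading of the procedure rather than the study's implementation: it assumes the normalization step amounts to fitting a normal distribution with the sample mean and SD, uses the sample (n-1) SD, and relies on Python's statistics.NormalDist.inv_cdf as the percent point function. The function name, dictionary keys, and toy readings are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def quantile_features(observations, low_p=0.25, high_p=0.75):
    """Engineer the three quantile-approach features for one vital sign.

    Requires at least two observations with nonzero spread.
    """
    # Step 2: original mean and SD of the patient's observations.
    mu, sigma = fmean(observations), stdev(observations)
    # Steps 3-4: fit a normal distribution and take its 25th/75th percentile points.
    fitted = NormalDist(mu, sigma)
    low_cut, high_cut = fitted.inv_cdf(low_p), fitted.inv_cdf(high_p)
    # Step 5: keep only observations in the low and high tails.
    tails = [x for x in observations if x < low_cut or x >= high_cut]
    # Steps 6-7: modified mean, modified SD, and quantile percentage.
    return {
        "modified_mean": fmean(tails) if tails else mu,
        "modified_sd": stdev(tails) if len(tails) > 1 else 0.0,
        "quantile_pct": len(tails) / len(observations),
    }


# Toy first-day heart-rate readings (not MIMIC data).
heart_rate = [60, 70, 75, 80, 80, 85, 90, 120]
features = quantile_features(heart_rate)
print(features)
```

In this toy sample, 3 of the 8 readings fall in the tails, so the quantile percentage is 37.5% rather than 50%. The cutoffs come from the fitted normal distribution rather than from the empirical quartiles, which is why the share of retained observations varies from patient to patient.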

Patient Use Case
To demonstrate the quantiles approach, we provide an example of a sample patient before and after applying the steps described above. Figures 4 and 5 show distributions of the 7 vital signs of the patient before (left) and after (right) applying the quantiles approach. The shaded areas in Figures 4 and 5 show where the vital sign measurements are neglected. The right side of each figure shows the patient's modified observations after removing the values in the shaded area. After applying the change, the SD of the observations increased in most cases, whereas the mean (the green vertical line) did not change significantly. Table 4 shows an example of individual patient data before applying the quantiles approach. Table 5 demonstrates the features that were engineered from the original 7 vital sign measures for that patient sample. Table 6 lists additional features that were engineered from the original 7 vital sign measures using the quantiles approach for the entire patient population.
Pearson correlation analysis among the means of vital signs samples after applying the quantiles approach (Table 7) showed that there was no significant difference compared to the baseline model (Table 3). This implies that the quantiles approach does not considerably change the correlation between the variables.

Inputs to the Baseline Approach Versus the Quantiles Approach
The models built using the baseline approach used 12 predictor variables (ie, 5 demographic attributes and 7 vital signs) (Table 2). The feature engineering step performed in the quantiles approach augmented the original set of vital sign features with 21 extra features (ie, 7 variables corresponding to the mean of each patient observation, 7 variables corresponding to the SD of each patient observation, and 7 variables corresponding to the quantile percentages). Thus, in addition to the original 12 variables used in the baseline, the models built through the quantiles approach used the 21 engineered features.
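The feature counts follow directly from the 7 vital signs and the 3 engineered statistics per vital sign. The short sketch below enumerates them; the column names are hypothetical labels chosen for illustration.

```python
# Seven vital sign inputs (six vital signs plus glucose), with hypothetical names.
VITAL_SIGNS = [
    "temperature", "heart_rate", "respiration_rate",
    "systolic_bp", "diastolic_bp", "spo2", "glucose",
]
# Three engineered statistics per vital sign in the quantiles approach.
ENGINEERED_STATS = ["modified_mean", "modified_sd", "quantile_pct"]

N_BASELINE_FEATURES = 12  # as reported for the baseline approach

engineered = [f"{vital}_{stat}" for vital in VITAL_SIGNS for stat in ENGINEERED_STATS]
total_quantile_features = N_BASELINE_FEATURES + len(engineered)
print(len(engineered), total_quantile_features)
```

The quantiles-approach models therefore see 33 inputs in total: the 12 baseline variables plus the 21 engineered ones.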

Models Applied
We used supervised learning techniques in both models for both variations because the model outputs were labeled accordingly. We split the dataset randomly into 75% as the training set (n=33,469 ICU stays) and 25% as the test set (n=11,157 ICU stays). To avoid overfitting, we used 10-fold cross-validation on the training set. We then trained both models using the training set and we validated the performance of both models using an unseen testing set.
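The split sizes and the cross-validation scheme can be reproduced with a small standard-library sketch. It assumes the test set size is rounded up (25% of 44,626 is 11,156.5), which yields the reported counts; the seed and function names are illustrative assumptions.

```python
import math
import random


def train_test_split_ids(ids, test_fraction=0.25, seed=0):
    """Randomly split stay ids; the test size is rounded up (an assumption)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(ids, k=10):
    """Yield (train, validation) id lists for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


# 44,626 ICU stays, represented here by placeholder integer ids.
train_ids, test_ids = train_test_split_ids(range(44626))
folds = list(k_fold(train_ids, k=10))
print(len(train_ids), len(test_ids), len(folds))
```

Cross-validation is run only on the training ids, so the held-out test ids are never used for model selection.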
We applied six commonly used ML algorithms for binary classification in both the mortality and LOS models: linear regression (LR), linear discriminant analysis, random forest (RF), k-nearest neighbors (kNN), support vector machine (SVM), and extreme gradient boosting (XGB). For the regression-based classification in the LOS model, we applied two ML algorithms to predict the number of days: MLR and support vector regression (SVR).
RF is an ensemble ML algorithm that generates bootstrapped samples from a dataset and uses the generated samples to construct several decision trees. Majority voting is then performed to decide the best classification of the generated samples. To avoid high correlation between the constructed trees, the algorithm uses a random subset of features to decide at each split point. This feature randomness increases the chances of having correct prediction results. Thus, one important parameter required by the algorithm is the number of features considered. In addition, choosing a high number of trees might increase the execution time with no considerable performance gain [21]. Therefore, another important parameter is the number of decision trees needed to compose the RF.

Parameter Tuning for Mortality Classifiers
For the RF algorithm, we set the maximum number of features to consider for finding a good split to 4, and we set the estimated number of trees in an RF to 500. For SVM, we used the radial basis function as a kernel type and we set the penalty parameter of error C to 1.60.

Parameter Tuning for LOS Classifiers
For the RF algorithm, we set the maximum number of features to consider in finding a good split to 4. We also set the estimated number of trees in the RF to 400. For SVM, we used the radial basis function as a kernel type and we set the penalty parameter of error C to 0.90.
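The reported settings for the mortality and LOS classifiers map naturally onto scikit-learn estimators. The sketch below assumes scikit-learn was the library used, which the text does not state; the helper name, the task keys, and the fixed random seed are hypothetical, and all other hyperparameters are left at library defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Reported settings per task: number of trees and SVM penalty parameter C.
REPORTED_SETTINGS = {
    "mortality": {"n_trees": 500, "C": 1.60},
    "los": {"n_trees": 400, "C": 0.90},
}


def build_classifiers(task):
    """Return untrained RF and SVM estimators configured for the given task."""
    settings = REPORTED_SETTINGS[task]
    return {
        # max_features=4: features considered when searching for each split.
        "RF": RandomForestClassifier(
            n_estimators=settings["n_trees"], max_features=4, random_state=0
        ),
        # Radial basis function kernel with the reported penalty parameter C.
        "SVM": SVC(kernel="rbf", C=settings["C"]),
    }


mortality_models = build_classifiers("mortality")
los_models = build_classifiers("los")
print(mortality_models["RF"].n_estimators, los_models["SVM"].C)
```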

Model Calibration
To assess the goodness of fit in our models, we compared the accuracy on the test set and the mean accuracy of the trained model. We also used five metrics (accuracy, sensitivity, specificity, negative predictive value, and positive predictive value, along with corresponding 95% CIs) to validate the classification models on an unseen test set from the same population. We examined the difference in AUROC values between the test and training sets. Finally, we examined calibration across deciles using the sigmoid test supported with a visual inspection of calibration curves.
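The five classification metrics are all derived from the confusion matrix. The following sketch computes them for binary labels, where 1 denotes the positive class (for example, no survival); the function name and the toy predictions are hypothetical, and CIs are omitted because the text does not specify how they were computed.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, PPV, and NPV for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Toy imbalanced example: 12 positives and 88 negatives (not study results).
y_true = [1] * 12 + [0] * 88
y_pred = [1] * 2 + [0] * 10 + [0] * 87 + [1] * 1
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.89 even though only 2 of the 12 positives are detected, which illustrates why sensitivity and specificity need to be read alongside accuracy when classes are imbalanced.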

Mortality Prediction Model
Table 8 shows the performance of the mortality models on both the training and test sets using the baseline and the quantiles approach with the six different ML algorithms.
The RF algorithm achieved the highest accuracy (88.61%) in predicting mortality on the test set using the quantiles approach, followed by the XGB algorithm with an accuracy of 88.22%. All models showed high specificity and low sensitivity, indicating that our models performed very well at identifying patients who will survive but poorly at identifying those who will not. XGB showed the highest sensitivity rate (0.16), demonstrating the advantage of using the XGB algorithm to identify patients who will not survive.
We observed relatively low improvement in model accuracy from the baseline approach to the quantiles approach. This can be explained by the imbalanced classification problem in the mortality model (ie, a low mortality rate of 11.89%). Another possible reason is that the sample size was reduced after applying the quantiles approach, which might have misled the classifier. The original sample size (44,626 ICU stays considering only the first day in the ICU) dropped by almost half since we included only the first and fourth quantiles for each patient observation. The algorithm uses the PPF function to return discrete values that are less than or equal to the given probability, and the best probabilities achieved in our case were at the 25th and 75th percentiles. We tried other probabilities, but due to the small sample size, varying the PPF percent did not significantly improve the results. Figure 6 shows a visual comparison between the accuracy of the six ML algorithms in the mortality model using the quantiles approach. The box plots on the left show the model accuracy on the training set using 10-fold cross-validation and the graph on the right shows the one-time model accuracy on the testing set. The ROC curve is commonly used to evaluate the performance of an ML model by showing the relationship between the false-positive and true-positive rates. The AUROC metric can be used as a basis for comparison; higher values indicate that a model can identify classes using a specific ML algorithm better than another. In the case of the mortality model, the ROC curve shows the relationship between survival cases that scored as no survival (false positives) and no survival cases that scored as no survival (true positives).
Table 9 shows the AUROC results of the mortality model on both the training and test sets using the baseline and quantile approaches for the different ML algorithms. Figure 7 shows the ROC curves for the six ML algorithms for both the baseline and the quantiles approach. XGB produced the highest AUROC (0.79) for predicting mortality on the test set using the quantiles approach (Table 9).
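The AUROC has a useful rank-based interpretation: it equals the probability that a randomly chosen positive case (here, no survival) receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which is adequate for small examples; the function name and toy scores are hypothetical.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy predicted risks (not study results).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, scores))  # 3 of the 4 positive-negative pairs are ranked correctly
```

Unlike accuracy, this quantity does not depend on a classification threshold or on the class proportions, which makes it a more informative basis for comparing algorithms on the imbalanced mortality task.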

Binary Classification Algorithms
Table 10 shows the performance of the binary classification models for the LOS model on both the training set and test set using the baseline and the quantiles approaches with the six different ML algorithms. The best accuracy of predicting ICU LOS on the test set was 64.64% using the RF algorithm in the quantiles approach, followed by the SVM algorithm with an accuracy of 63.86%. The improvement in model accuracy from the baseline approach to the quantiles approach was better when compared with that found for the mortality model (Table 8). For example, the difference in accuracy between the baseline and the quantiles approach for the LOS model on the test set was 2.68% using RF and was 2.45% using SVM. The RF algorithm achieved the highest sensitivity (0.64), which indicates that the model using the RF algorithm can identify patients who will stay in the ICU for more than 2.64 days better than the other algorithms. SVM achieved the highest specificity (0.68), which indicates that the model using the SVM algorithm is better at identifying patients who will stay in the ICU for 2.64 days or less compared with the other algorithms. Figure 8 shows a visual comparison of the accuracy of the six algorithms in the LOS model results using the quantiles approach. The box plots on the left show the model accuracy on the training set using 10-fold cross-validation and the graph on the right shows the one-time model accuracy on the testing set.
Table 11 shows the AUROC results of the LOS model on both the training and test sets using the baseline and the quantiles approach with the six ML algorithms. Figure 9 shows the ROC curves for the algorithms in the baseline approach and the quantiles approach, respectively. The RF algorithm using the quantiles approach produced the highest AUROC (0.697) for predicting the LOS on the test set (Table 11).

Regression-Based Classifiers
As for the regression-based classifiers of the LOS model, we report the error between the predicted values and actual values in the test set using both the mean absolute error (MAE) and the root mean squared error (RMSE) metrics. The minimum, median, and maximum LOS for the entire population were 1, 2.64, and 173.07 days, respectively. Table 12 shows the error value (in days) using both error metrics for the LOS model. The lowest error value obtained was 2.81 days using the MAE in the SVR algorithm with the quantiles approach.
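Both error metrics are expressed in days, but they weight large misses differently. A minimal sketch with toy values (not study results) is given below; the function names are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted LOS (days)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference (days)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy LOS values in days; the last stay is badly underestimated.
actual = [1.0, 2.0, 4.0, 10.0]
predicted = [2.0, 2.0, 3.0, 6.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, round(rmse, 4))
```

Because squaring amplifies large errors, the root mean squared error is always at least as large as the MAE, and the gap widens when a few very long stays are poorly predicted, as can be expected with a maximum LOS of 173.07 days.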

Principal Results
Our findings indicate that we can build prediction models for ICU LOS and mortality with better accuracy using a combination of ML and the quantiles approach including only vital signs. Little improvement in the accuracy of the mortality model was achieved, but improvement of approximately 2.7% was achieved in the LOS model using the proposed quantiles approach. We examined model calibration across deciles for all six algorithms in both models. Figure 10 shows the probability calibration curves of the mortality model using the six algorithms. The six plots show good calibration of the models, especially in the case of the RF algorithm. Figure 11 shows the probability calibration curves of the LOS model using the six algorithms. The six plots show good calibration of the models except for the kNN algorithm. One might argue that we included only the mean and not the SD of the vital signs in the baseline approach when the comparison was to a model including both the mean and SD in the quantiles approach. Both the baseline and the quantile approaches include the means of vital signs. The quantiles approach includes an extra 21 features corresponding to modified means and modified SDs of the original values in addition to the quantile percentages. Had we chosen to include both the mean and SD of the original vital sign observations in the baseline approach, we would have also needed to include the SD of the original vital sign observations in the quantiles approach. In this case, we do not expect that there will be a significant impact.
Moreover, based on the method of population selection, the same patient could be in the training as well as in the test set but for different ICU admissions at different time points. For this study, we considered unique ICU admissions as opposed to unique patient identifiers. The rationale for focusing on unique admissions is that we sought to predict mortality and LOS without having prior knowledge about a patient's medical conditions or diagnoses.

Qualitative Comparison With Other Approaches
For the mortality model, we were able to achieve approximately 89% accuracy and an AUROC of 0.78 using only 7 vital sign features and 4 demographic attributes, along with 21 features engineered from the original features. Other researchers have used considerably more features to achieve similar or better accuracies. For instance, Johnson et al [12] used a total of 148 features to achieve an AUROC of 0.92. Lehman et al [14] applied the SAPS-I algorithm on commonly used physiological data to predict mortality and achieved an AUROC of 0.72.
Johnson et al [13] used a range of features, including standard statistical descriptors, to achieve an AUROC of 0.86.
For LOS models, most researchers used an exhaustive list of features to achieve higher accuracy in their models, but they did not report on whether they had balanced classification problems. For example, Harutyunyan et al [16] achieved 84% accuracy using 17 clinical variables and by considering a target ICU LOS of 7 days. Gentimis et al [17] achieved 80% accuracy using several inputs from seven tables to build the LOS model with a target ICU stay of 5 days. Bertsimas et al [18] used several static and dynamic variables, and achieved accuracy in the >80% range. In our approach, we built balanced classification models (using the median LOS of the entire population) with minimal features. These two conditions made it harder to achieve high accuracy, which reached only 65% in the LOS model.
One contribution of our method is the unique combination of ML with the quantiles approach. Other researchers have used various techniques to assess a patient's deteriorating conditions. Tyler et al [15] found that the methods to normalize patients' abnormal values are not thoroughly correct and might affect the results negatively. Other researchers relied on scoring systems (eg, centile-based early warning score, National Early Warning Score, or SAPS) to estimate or recognize patients' deteriorating conditions. We avoided relying on existing early warning scoring systems since they vary from patient to patient, which may lead to uncertain results.

Sensitivity Analysis
Since we considered unique ICU stays rather than individual patients, the training set/testing set split was not performed at the patient level. This might raise the concern that the vital signs and LOS measured at different ICU visits for the same patient could be highly correlated. Thus, the mortality and the LOS models might risk overestimation in predictive performance. We mitigated this effect by performing a sensitivity analysis to compare the results of the models after excluding patient overlap to the results of the original model with the overlap included. In the original model, the population size was 44,626 (corresponding to ICU stays), the training set size was 33,469 ICU stays (75% of the population), and the test set size was 11,157 ICU stays (25% of the population).
The patient overlap between the training and test sets was 3886 ICU stays (34.83% of the test set). After removing these 3886 overlapping stays, 7271 ICU stays remained in the test set (65.17% of the original test set of 11,157). Table 13 shows the results of the mortality model after removing the overlap and Table 14 shows the results of the LOS model after removing the overlap. There were no significant changes compared to the model results shown in Table 8 and Table 10, respectively. The total number of ICU stays was 44,626 and the total number of patients was 33,466. We calculated the frequency of ICU stays for the entire patient population and found that 80% of the population visited the ICU only once and 20% visited the ICU more than once. Moreover, the MIMIC database includes data for patients who might have stayed in different ICU types (eg, general, cardiac) for different health conditions. In addition, a patient might have visited one ICU more frequently than another, and the time period between consecutive visits within a single ICU might be several years. The sensitivity analysis findings in our case might be explained by the fact that our approach focused on the visits rather than the patients and ignored the details mentioned above.
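The overlap removal described above can be sketched as follows. This is a minimal illustration under stated assumptions: each stay is represented as a hypothetical `(patient_id, stay_id)` tuple, the toy split is invented, and the function name `drop_patient_overlap` is ours. The arithmetic at the end only re-derives the percentages from the counts reported in the text.

```python
def drop_patient_overlap(train_stays, test_stays):
    """Keep only test stays whose patient does not appear in the training set.

    Each stay is a (patient_id, stay_id) tuple; this layout is an
    assumption made for the sketch, not the study's data model.
    """
    train_patients = {patient_id for patient_id, _ in train_stays}
    return [stay for stay in test_stays if stay[0] not in train_patients]

# Hypothetical toy split (not MIMIC data): patient 2 appears in both sets.
train_stays = [(1, "a"), (2, "b"), (3, "c")]
test_stays = [(2, "d"), (4, "e"), (5, "f")]
filtered_test = drop_patient_overlap(train_stays, test_stays)

# Arithmetic check using the counts reported in the text.
test_size = 11157
overlap = 3886
remaining = test_size - overlap
overlap_pct = round(100 * overlap / test_size, 2)
remaining_pct = round(100 * remaining / test_size, 2)
print(filtered_test, remaining, overlap_pct, remaining_pct)
```

An alternative design, not used in the study as described, would be to split by patient identifier from the outset so that no patient contributes stays to both sets.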

Limitations
Admittedly, this study lacks quantitative comparisons with previous research on the same topic owing to substantial differences between the research questions tackled previously, and the associated data extraction pipelines and assumptions. We mitigated this limitation by providing a qualitative comparison between our models and previous models.
Previous research based on data from the MIMIC database likely demonstrated higher accuracy because considerably more features were used than in this study. We believe that it is difficult to achieve high model accuracy using a limited number of features.
Additionally, as with any ML-based method, our approach has some limitations. In this study, we used the MIMIC database, which represents a patient population from a single hospital in Boston, so the results may not generalize to other populations or hospital systems in other areas across the United States or the rest of the world. Future research will focus on applying our approach to other patient populations.
Moreover, we ran the models using only the vital signs to measure the impact of the demographic attributes. We found that the effects of demographic attributes on the results were low. For example, age did not have a considerable effect since we were only using adult patient data in the MIMIC database. The accuracy of the mortality model without the age feature using RF in the quantiles approach was 88.536%, which is very close to the model result obtained when including age. The mortality model achieved an AUROC of 0.77 without using age and 0.78 with age included. The accuracy of the LOS model without the age feature using RF and the quantiles approach was 64.39%, which is very close to the result obtained with the age feature included. Table 15 also shows that, for both the mortality and LOS models using the RF algorithm and the quantiles approach, the differences in AUROC and positive predictive value between including and excluding the age feature were not significant. This would be different in pediatric and adolescent populations, for whom vital measurements are more age-sensitive. In addition, in the MIMIC database, the ages for patients older than 89 years are not accurate; we used 90 years as a dummy value for all of these patients. Another potential reason for the low impact of including the demographic attributes is the lack of variation in height due to missing values that had to be imputed using the population mean.
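The two preprocessing choices mentioned above (a dummy age of 90 years for patients older than 89, and mean imputation of missing heights) can be sketched as below. This is an illustration under assumptions, not the study's pipeline: the sample values, including the placeholder age of 300, are hypothetical, and the function names are ours. The sketch also shows why mean imputation reduces the spread of a feature, which is one way such a feature can end up contributing little to a model.

```python
from statistics import mean, pstdev

def cap_age(age, threshold=89, dummy=90):
    """Replace any recorded age above the threshold with a dummy value."""
    return dummy if age > threshold else age

def impute_with_mean(values):
    """Fill missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical records (not MIMIC data); 300 stands in for an inaccurate age.
ages = [45, 67, 300, 91]
heights_cm = [170.0, None, 160.0, None]

capped_ages = [cap_age(a) for a in ages]
imputed_heights = impute_with_mean(heights_cm)

# Spread of the observed heights versus the spread after imputation.
spread_before = pstdev([h for h in heights_cm if h is not None])
spread_after = pstdev(imputed_heights)
print(capped_ages, imputed_heights, spread_before, spread_after)
```

Because every imputed entry sits exactly at the mean, the standard deviation after imputation is smaller than that of the observed values alone.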

Clinical Implications
Health professionals (ie, physicians, nurses, ICU specialists) can benefit from the predictive capabilities of the intelligent ICU patient monitoring module to make better decisions regarding major challenges in health care, including bed management, patient flow, stock management, and effective provision of medical supplies. Poor bed management may result in the rejection of new patients, and a reduction in hospital revenue and overall quality of health services [22]. Patient flow involves making decisions regarding admissions, transfers, and referrals. Hospital administration needs solutions that reduce waste and wait times, and increase service efficiency and productivity. Such tools need to consider the uncertainty of patients' recovery status. Poor stock management results in resource shortage or expiration, especially in the ICU where care should be delivered promptly. Thus, integrating the predictive functionalities of the intelligent ICU patient monitoring module within existing decision support platforms and clinical workflows may have several practical implications for improving the quality of care and reducing costs.

Conclusions
In this article, we proposed a novel approach for predictive modeling with reasonable performance based on a combination of ML algorithms and the quantiles approach, which utilizes only vital signs available in the patient's profile without having to use external features. Using this quantiles approach, we engineered additional features by calculating the modified means, SDs, and quantile percentages from the baseline vital sign measures, which provided us with a richer dataset to achieve better predictive power in our models. We applied our approach to build two prediction models: one for mortality prediction and another for ICU LOS. Although the accuracy of the mortality model showed minimal improvement, the accuracy of the LOS model improved by around 2.7%.
Intelligent ICU patient monitoring is a promising solution that will improve clinical workflows and enable hospitals to deliver higher-quality, cost-effective patient care, and to improve the overall quality of medical services in the ICU. The solution will help ICUs stay a step ahead and "nudge" health care providers to prepare for unexpected general health conditions of patients and better manage ICU facilities [23]. By relying on a minimal set of features that can be continuously collected from both inside and outside hospital systems and without requiring sophisticated medical devices, our predictive models can be used in cloud-based IRPM systems (see Exhibit X [24], a short video demonstrating the tool in action).
Relying on fewer features will make it more feasible to deploy ML algorithms in real-world settings. Future directions of this research will involve adding more predictive modeling capabilities to the intelligent ICU patient monitoring module, including ICU readmission, severity level, and next-day patient vital sign measurements. We are currently working on applying this approach to a wider range of hospital systems in different geographic locations. Integrating intelligent ICU patient monitoring within existing clinical workflows and decision support platforms can support many hospitals in improving the quality of care and reducing costs.

Figure 2. Data extraction pipeline from the Medical Information Mart for Intensive Care (MIMIC) database. ICU: intensive care unit.

Figure 3. Feature engineering pipeline in the quantiles approach. MIMIC: Medical Information Mart for Intensive Care; ICU: intensive care unit; PDF: probability density function; PPF: percent point function.

Figure 4. Distribution of a sample patient observation before and after applying the quantiles approach.

Figure 6. Comparison of the mortality model results using the quantiles approach on the training set (left) and the test set (right). LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Figure 7. Comparison of receiver operating characteristic curves in the mortality model using the baseline (left) and the quantiles approach (right). LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Figure 8. Comparison of the length of stay model results using the quantiles approach on the training set (left) and the test set (right). LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Figure 9. Comparison of receiver operating characteristic curves in the length of stay model using the baseline (left) and quantiles (right) approaches. LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Figure 10. Probability calibration curves of the mortality model for the six classification algorithms. LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Figure 11. Probability calibration curves of the length of stay model for the six classification algorithms. LR: logistic regression; LDA: linear discriminant analysis; RF: random forest; KNN: k-nearest neighbor; SVM: support vector machine; XGB: extreme gradient boosting.

Table 1. Descriptive statistics for outcome variables in the two models.

Table 3. Pearson correlation coefficients among vital signs of the baseline model.

Table 4. Sample data from an individual patient before applying the quantiles approach.

Table 5. Sample data from an individual patient after applying the quantiles approach.

Table 6. Vital sign data after applying the quantiles approach for the entire patient population.

Table 7. Pearson correlation coefficients among the mean vital signs for a sample patient using the statistical model.

Table 8. Mortality model results for six algorithms using different performance metrics.
a NPV: negative predictive value. b PPV: positive predictive value. c LR: logistic regression. d LDA: linear discriminant analysis. e RF: random forest. f kNN: k-nearest neighbor. g SVM: support vector machine.

Table 9. Mortality model performance based on area under the receiver operating characteristic curve (AUROC).
b LDA: linear discriminant analysis. c RF: random forest. d kNN: k-nearest neighbor. e SVM: support vector machine.

Table 10. Length of stay model results for six algorithms using different performance metrics.
b PPV: positive predictive value. c LR: logistic regression. d LDA: linear discriminant analysis. e RF: random forest. f kNN: k-nearest neighbor. g SVM: support vector machine. h XGB: extreme gradient boosting.

Table 11. Performance of the length of stay model based on the area under the receiver operating characteristic curve (AUROC).
a LR: logistic regression.

Table 12. Regression error values of the length of stay model using the baseline and quantiles approaches.
b RMSE: root mean square error. c MLR: multivariate linear regression. d SVR: support vector regression.

Table 13. Mortality model results for six algorithms using different performance metrics.
b PPV: positive predictive value. c LR: logistic regression. d LDA: linear discriminant analysis. e RF: random forest. f kNN: k-nearest neighbor. g SVM: support vector machine. h XGB: extreme gradient boosting.

Table 14. Length of stay model results for six algorithms using different performance metrics.
a NPV: negative predictive value. b PPV: positive predictive value. c LR: logistic regression. d LDA: linear discriminant analysis. e RF: random forest. f kNN: k-nearest neighbor. g SVM: support vector machine. h XGB: extreme gradient boosting.

Table 15. Model results including and excluding the age feature.
a AUROC: area under the receiver operating characteristic curve. b PPV: positive predictive value.