This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Predictions in pregnancy care are complex because of interactions among multiple factors. Hence, pregnancy outcomes are not easily predicted by a single predictor using only one algorithm or modeling method.
This study aims to review and compare the predictive performances between logistic regression (LR) and other machine learning algorithms for developing or validating a multivariable prognostic prediction model for pregnancy care to inform clinicians’ decision making.
Research articles from MEDLINE, Scopus, Web of Science, and Google Scholar were reviewed following several guidelines for a prognostic prediction study, including a risk of bias (ROB) assessment. We report the results based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Studies were primarily framed as PICOTS (population, index, comparator, outcomes, timing, and setting): Population: men or women in procreative management, pregnant women, and fetuses or newborns; Index: multivariable prognostic prediction models using non-LR algorithms for risk classification to inform clinicians’ decision making; Comparator: models applying an LR; Outcomes: pregnancy-related outcomes of procreation or pregnancy outcomes for pregnant women and fetuses or newborns; Timing: pre-, inter-, and peri-pregnancy periods (predictors); at pregnancy, delivery, and either the puerperal or neonatal period (outcomes); and either short- or long-term prognoses (time interval); and Setting: primary care or hospital. The results were synthesized by reporting study characteristics and ROBs and by random effects modeling of the difference in the logit area under the receiver operating characteristic curve of each non-LR model compared with the LR model for the same pregnancy outcomes. We also reported between-study heterogeneity using the τ² and I² statistics.
Of the 2093 records, we included 142 studies for the systematic review and 62 studies for a meta-analysis. Most prediction models used LR (92/142, 64.8%) and, among non-LR algorithms, artificial neural networks (20/142, 14.1%). Only 16.9% (24/142) of studies had a low ROB. A total of 2 non-LR algorithms from low ROB studies significantly outperformed LR. The first was a random forest for preterm delivery (logit AUROC difference 2.51, 95% CI 1.49-3.53), and the second was gradient boosting for cesarean section (logit AUROC difference 2.26, 95% CI 1.39-3.13).
The best-performing prediction models across studies were not necessarily those using LR; models using random forest and gradient boosting also performed well. We recommend reanalyzing existing LR models for several pregnancy outcomes by comparing them with models using those algorithms, developed according to standard guidelines.
PROSPERO (International Prospective Register of Systematic Reviews) CRD42019136106; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=136106
Pregnancy is a common health condition that requires long-term rigorous care to anticipate adverse outcomes. Most pregnancy outcomes are identified after delivery; however, these result from interactions among multiple factors occurring for many weeks beforehand. The number of factors and their interactions, along with the time intervals, makes predicting pregnancy outcomes very complicated. Multiple or multivariable logistic regression (LR) is widely used to deal with similar multifactorial problems in health outcome research [
Despite improvements in maternal and neonatal mortality, conditions still differ between developing and developed countries or regions [
Machine learning algorithms have long been applied for clinical prediction purposes. A support vector machine demonstrated a summary of receiver operating characteristics (ROCs) of >90% for breast cancer prognostic prediction [
This review will allow investigators and clinicians in pregnancy care to consider the development or application of prediction models throughout the pregnancy period. This review demonstrates which algorithms have shown robust predictive performances for a particular pregnancy outcome using a similar set of predictors. Investigators in pregnancy care may also consider whether a reanalysis by another predictive algorithm is needed by using existing data previously analyzed by an algorithm including LR. Beyond the algorithm issue, the development of machine learning models also requires an adequate methodology and interpretable results [
By applying the standard guidelines, we aim to review machine learning models and compare their predictive performances between LRs and other machine learning algorithms. In this review, we focus on machine learning models either developed or validated for making prognostic predictions in pregnancy care intended to inform clinicians’ decision making.
We reported this study based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [
Before defining the eligibility criteria, we decided to view the LR as one of many algorithms in the machine learning field with respect to its use in statistics and data science. A prediction model development consisted of several elements: predictor selection, parameter fitting, and hyperparameter optimization [
By focusing on prediction algorithms, we defined eligibility criteria to screen studies by the title, abstract, and full text. We also assessed the applicability by examining the full text. These were the candidates we selected for the qualitative analysis. Key items of population, index, comparator, outcomes, timing, setting (PICOTS) [
Population: men or women in procreative management, pregnant women, and fetuses or newborns.
Index: multivariable prognostic prediction models applying non-LR algorithms for risk classification tasks intended to inform clinicians’ decision making.
Comparator: multivariable prognostic prediction models applying an LR algorithm, excluding scoring systems in which the parameters were determined by humans rather than estimated by LR, for risk classification tasks intended to inform clinicians’ decision making.
Outcomes: pregnancy-related outcomes of procreative management or pregnancy outcomes for pregnant women or fetuses or newborns.
Timing: predictors measured during the pre-, inter-, and peri-pregnancy periods; outcomes assessed at pregnancy, delivery, and either the puerperal or neonatal period; and both short- and long-term prognoses.
Setting: primary care or hospital.
Additional items were the availability of several reporting components as required by TRIPOD and MLPBIOM. These components included (1) data sources, (2) outcomes, (3) evaluation metrics, (4) predictors, (5) descriptive statistics, (6) event sample sizes, (7) modeling methods or algorithms, and (8) model validation.
After briefly screening studies by eligibility criteria, we conducted an applicability assessment by thoroughly examining the full texts. Using PROBAST guidelines, we assessed the applicability according to the review question framed by PICOTS. Low, high, or unclear criteria were determined for applicable, not applicable, or unclear applicability, respectively. The assessment covered 3 domains of participants, predictors, and outcomes. Only those fulfilling
For the quantitative analysis, studies had to report the AUROC. Studies were selected from those applicable for the qualitative analysis. If there were at least 3 LR models and a non-LR model from any studies for an outcome, all studies with that outcome were included in the meta-analysis. This threshold reflects the minimum number of data points required to calculate the variance as part of the meta-analytical procedure. If studies did not report the AUROC, we estimated it from the sensitivity and specificity using the trapezoidal rule (see
We searched the MEDLINE, Scopus, Web of Science, and Google Scholar databases up to May 2020. There was no limit on the publication period. However, considering the limitations of the search interface in Google Scholar, we only retrieved results from the last year with keywords in the abstract or from the entire period with those keywords in the title. We also limited the publication period to the last 10 years for search results by keywords including “logistic regression multivariable prediction.” This was because we expected an enormous number of studies applying LR, given the broad range of outcomes in this study. In contrast, we might lack studies using other machine learning models, although the outcomes were broad.
The initial search filter was limited to the title, abstract, keywords, or Medical Subject Heading (MeSH; MEDLINE only) using “machine learning” AND pregnancy. We also used “machine learning AND ([pregnancy outcome from initial search] NOT pregnancy).” Keywords for pregnancy outcomes were used based on MeSH to generalize a variety of terms for pregnancy outcomes from selected studies. If the MeSH term contained “pregnancy,” then we used the alternative entry terms in the webpage recorded for this MeSH term. If all entry terms also contained “pregnancy,” then we used the term without negating “pregnancy.” In addition, we also substituted the “machine learning” part with one of the keywords consisting of “decision tree,” “artificial neural network,” “support vector machine,” “random forest,” “artificial intelligence,” “deep learning,” and “logistic regression multivariable prediction.” All keywords are described in
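The keyword substitution scheme above can be sketched as follows. The algorithm list mirrors the one named in the text, but the outcome list here is an abbreviated, partly hypothetical example rather than the review's full keyword set:

```python
# Sketch of the search-filter construction described above.
# Outcome keywords are abbreviated examples, not the review's full set.
algorithms = [
    '"machine learning"',
    '"decision tree"',
    '"artificial neural network"',
    '"support vector machine"',
    '"random forest"',
    '"artificial intelligence"',
    '"deep learning"',
    '"logistic regression multivariable prediction"',
]
outcomes = ['"premature birth"', '"preeclampsia"', '"gestational diabetes"']

queries = []
for alg in algorithms:
    # Base filter: algorithm term combined with "pregnancy"
    queries.append(f"{alg} AND pregnancy")
    for out in outcomes:
        # Outcome-specific filter, negating "pregnancy" as described
        queries.append(f"{alg} AND ({out} NOT pregnancy)")

print(len(queries))  # 8 algorithms x (1 base + 3 outcomes) = 32 queries
```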
Duplicate records from multiple databases were removed. We refined the search results in the title or abstract using EndNote X8 (Clarivate Analytics) by “(supervised NOT unsupervised) OR prediction OR classification.” Records were screened by HS and AH, and the results were assessed by HS, AH, YC, CK, OS, TY, and YW. Disagreements were resolved by discussion with the last author (ES). Study selection was conducted in brief and thorough assessments. The brief assessments were intended to select studies by checking eligibility criteria from TRIPOD and MLPBIOM in the title, abstract, and briefly in the full-text article. A thorough assessment of applicability based on PROBAST was conducted later, before the ROB assessment.
We extracted data based on the CHARMS checklist, which includes (1) outcomes, (2) study design, (3) data sources, (4) data source design, (5) setting, (6) type of study, (7) modeling methods or algorithms, and (8) predictive performance. Outcomes were pooled as distinct MeSH terms. Study and data source designs were classified as prospective, retrospective, nested case-control, case-control, and cross-sectional. We defined the type of study based on the model validation, which might be development, validation, or both. Eligible studies were described as developing prediction models by applying LR, non-LR, or both algorithms. Predictive performances were only taken from studies that were eligible for the meta-analysis (see
We used PROBAST to assess the ROB [
We compared AUROCs from studies that reported this metric. Logit transformation was applied to the AUROCs. We computed logit AUROC differences between each non-LR and LR algorithm across studies. Summary measures from any eligible studies with all, low, or high ROB were pooled by random effects modeling, as previously described [
Pooled estimates of pairwise differences in logit AUROCs were described by points and the 95% CI [
If a study did not report the AUROC, we estimated this metric based on sensitivity and specificity. As a specificity of 0% implies a sensitivity of 100% and vice versa, we approximated the ROC curve by connecting the points (0, 0), (1 − specificity, sensitivity), and (1, 1), which gives:

AUROC = 0.5 × (1 − specificity) × sensitivity + specificity × sensitivity + 0.5 × (1 − sensitivity) × specificity

Logit(AUROC) = log(AUROC / (1 − AUROC))
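The two formulas above can be expressed as a short Python sketch (illustrative only; the review's actual computations were done in R):

```python
import math

def auroc_from_point(sensitivity: float, specificity: float) -> float:
    """Estimate AUROC by the trapezoidal rule from a single operating
    point, i.e., the area under the ROC curve connecting
    (0, 0), (1 - specificity, sensitivity), and (1, 1)."""
    return (
        0.5 * (1 - specificity) * sensitivity    # triangle left of the point
        + specificity * sensitivity              # rectangle under the point
        + 0.5 * (1 - sensitivity) * specificity  # triangle right of the point
    )

def logit(p: float) -> float:
    """Logit transformation applied to AUROCs before pooling."""
    return math.log(p / (1 - p))

# Example: sensitivity 80%, specificity 70%
auc = auroc_from_point(0.8, 0.7)
print(auc, logit(auc))
```

Note that the single-point trapezoidal estimate reduces algebraically to (sensitivity + specificity) / 2, the balanced accuracy.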
We used RStudio 1.2 (RStudio) with R 3.6.1 and an additional package, metafor 2.4.0, for random effects modeling. We applied the restricted maximum likelihood estimator method [
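As a rough illustration of the random effects pooling step, the sketch below uses the simpler DerSimonian-Laird estimator with hypothetical logit AUROC differences. The review itself used the REML estimator in R's metafor package, which generally yields somewhat different τ² estimates; this sketch only shows the mechanics of pooling, τ², and I²:

```python
import math

def random_effects(effects, variances):
    """Minimal DerSimonian-Laird random effects pooling.
    (Shown for illustration; the review used REML via R's metafor.)"""
    w = [1 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # absolute heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0        # relative heterogeneity
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    mean_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    ci = (mean_re - 1.96 * se_re, mean_re + 1.96 * se_re)
    return mean_re, ci, tau2, i2

# Hypothetical logit AUROC differences (non-LR minus LR) and their variances
mean, ci, tau2, i2 = random_effects([0.5, 0.8, 0.2], [0.04, 0.05, 0.03])
print(mean, ci, tau2, i2)
```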
We described the characteristics of the studies, consisting of population, study design, timing, and setting, as the number of studies per algorithm category used for prediction modeling. The algorithms were categorized into LR, non-LR, or both. We also show the proportion of each characteristic relative to all characteristics within the same algorithm category.
ROBs within studies were described as the number of studies with low, high, or unclear ROB. This was reported for the overall assessment and by domain for studies that used LR, non-LR, or both algorithms. ROBs across studies were described as the proportion of studies in which the answer to each signaling question led to a low, high, or unclear ROB. We intended to show what led most studies to be considered at high ROB.
Meta-analytical results were described by a forest plot faceted by outcome. Each facet showed differences in logit AUROCs for each random effects model of non-LR versus LR algorithms. This demonstrated which algorithms tended to outperform LR for each pregnancy outcome. Comparisons that included non-LR high ROB studies were color coded. The best predictive performance for each outcome was reported, as was the between-study heterogeneity for each random effects model.
We described predictors in the prediction models from the studies in the meta-analysis. For each outcome in the meta-analysis, we selected only the random effects models in which one algorithm significantly outperformed the other, as determined by the 95% CI of the difference in logit AUROCs between a non-LR and an LR model for an outcome. If available, we selected only those that included only non-LR low ROB studies. Only predictors in the final model were included. This was intended to elucidate predictor-outcome interactions that characterized an algorithm when it outperformed the others for a particular outcome.
We found 2093 records from 4 literature databases (
Study selection workflow.
Briefly, we collected studies that either developed or validated a prediction model applying either LR (77/142, 54.2%) [
Only a few studies had prediction timing up to the puerperal or neonatal period for LR (2/77, 3%) [
Characteristics of eligible studies.
Values are the number of studies, with percentages based on column totals.

| Variable | LR^a (n=77), n (%) | Non-LR (n=50), n (%) | Both (n=15), n (%) | Total (n=142), n (%) |
| --- | --- | --- | --- | --- |
| Population | | | | |
| Pregnant women | 50 (65) | 11 (22) | 6 (40) | 67 (47.2) |
| Fetuses or newborns | 19 (25) | 26 (52) | 7 (47) | 52 (36.6) |
| Men or women in procreative management | 8 (10) | 13 (26) | 2 (13) | 23 (16.2) |
| Study design | | | | |
| Retrospective | 53 (69) | 27 (54) | 9 (60) | 89 (62.7) |
| Nested case-control | 4 (5) | 14 (28) | 2 (13) | 20 (14.1) |
| Prospective | 13 (17) | 4 (8) | 0 (0) | 17 (12) |
| Cross-sectional | 3 (4) | 3 (6) | 3 (20) | 9 (6.3) |
| Case-control | 4 (5) | 2 (4) | 1 (7) | 7 (4.9) |
| Timing | | | | |
| At delivery | 28 (36) | 26 (52) | 7 (46.7) | 61 (42.9) |
| At pregnancy | 34 (44) | 21 (42) | 5 (33.3) | 60 (42.3) |
| Mixed timing | 13 (17) | 0 (0) | 1 (6.7) | 14 (9.9) |
| Puerperal or neonatal period | 2 (3) | 3 (6) | 2 (13.3) | 7 (4.9) |
| Setting | | | | |
| Hospital | 61 (79) | 43 (86) | 9 (60) | 113 (79.6) |
| Both | 10 (13) | 1 (2) | 5 (33) | 16 (11.3) |
| Primary care | 6 (8) | 6 (12) | 1 (7) | 13 (9.2) |

^a LR: logistic regression.
Most studies applied an LR (92/142, 64.8%) to develop a prediction model (
The characteristics of the study populations showed that pregnant women and fetuses or newborns were the populations of most studies developed using LR and non-LR models, respectively. Among pregnant women, the LR algorithm was mostly applied to develop predictions for outcome categories of obstetric labor (13/77, 17%) [
Machine learning algorithm and category of outcome.
Values are the number of studies, with percentages based on column totals.

| Variable | LR^a (n=77), n (%) | Non-LR (n=50), n (%) | Both (n=15), n (%) | Total (n=142), n (%) |
| --- | --- | --- | --- | --- |
| Algorithm | | | | |
| Logistic regression | 77 (100) | N/A^b | 15 (100) | 92 (64.8) |
| Artificial neural network | N/A | 15 (30) | 5 (33) | 20 (14.1) |
| Support vector machine | N/A | 9 (18) | 1 (7) | 10 (7.0) |
| Deep neural network | N/A | 8 (16) | 1 (7) | 9 (6.3) |
| Random forest | N/A | 7 (14) | 1 (7) | 8 (5.6) |
| Decision tree | N/A | 2 (4) | 5 (33) | 7 (4.9) |
| Gradient boosting | N/A | 3 (6) | 2 (13) | 5 (3.5) |
| Naïve Bayes | N/A | 4 (8) | 0 (0) | 4 (2.8) |
| Ensemble of algorithms | N/A | 2 (4) | 0 (0) | 2 (1.4) |
| Category of outcome | | | | |
| Premature birth | 9 (12) | 12 (24) | 3 (20) | 24 (16.9) |
| In vitro fertilization | 7 (9) | 13 (26) | 2 (13) | 22 (15.5) |
| Obstetric labor | 13 (17) | 1 (2) | 2 (13) | 16 (11.3) |
| Pregnancy-induced hypertension | 12 (16) | 4 (8) | 0 (0) | 16 (11.3) |
| Fetal distress | 1 (1) | 9 (18) | 0 (0) | 10 (7.0) |
| Gestational diabetes | 7 (9) | 2 (4) | 1 (7) | 10 (7.0) |
| Cesarean section | 4 (5) | 3 (6) | 2 (13) | 9 (6.3) |
| Fetal development | 4 (5) | 1 (2) | 0 (0) | 5 (3.5) |
| Small-for-gestational-age infant | 3 (4) | 1 (2) | 1 (7) | 5 (3.5) |
| Others | 17 (22) | 4 (8) | 4 (27) | 25 (17.6) |

^a LR: logistic regression.
^b N/A: not applicable.
ROB is described for each eligible study in
Risk of bias within studies.
Values are the number of studies by algorithm, with percentages based on column totals.

| Assessment by domain | LR^a (n=77), n (%) | Non-LR (n=50), n (%) | Both (n=15), n (%) | Total (n=142), n (%) |
| --- | --- | --- | --- | --- |
| Participants | | | | |
| Low | 60 (78) | 44 (88) | 11 (73) | 115 (80.9) |
| High | 15 (19) | 3 (6) | 4 (27) | 22 (15.5) |
| Unclear | 2 (3) | 3 (6) | 0 (0) | 5 (3.5) |
| Predictors | | | | |
| Low | 54 (70) | 43 (86) | 12 (80) | 109 (76.8) |
| High | 20 (26) | 5 (10) | 1 (7) | 26 (18.3) |
| Unclear | 3 (4) | 2 (4) | 2 (13) | 7 (4.9) |
| Outcomes | | | | |
| Low | 51 (66) | 40 (80) | 11 (74) | 102 (71.8) |
| High | 24 (31) | 4 (8) | 2 (13) | 30 (21.1) |
| Unclear | 2 (3) | 6 (12) | 2 (13) | 10 (7.1) |
| Analysis | | | | |
| Low | 8 (10) | 15 (30) | 3 (20) | 26 (18.3) |
| High | 69 (90) | 35 (70) | 12 (80) | 116 (81.7) |
| Unclear | 0 (0) | 0 (0) | 0 (0) | 0 (0.0) |
| Overall | | | | |
| Low | 7 (9) | 14 (28) | 3 (20) | 24 (16.9) |
| High | 70 (91) | 35 (70) | 12 (80) | 117 (82.4) |
| Unclear | 0 (0) | 1 (2) | 0 (0) | 1 (0.7) |

^a LR: logistic regression.
ROB is also described across the studies in
Signaling questions with respect to ROB domains across studies. Bars for low/high/unclear ROB are stacked to 100%. Domains are described on the right-hand side. The number on each bar is the number of low ROB studies (total LR/non-LR/both at top) based on a single signaling question, summarized as a term on the left-hand side. LR: logistic regression; ROB: risk of bias.
There were 62 studies in the meta-analysis whose outcomes were predicted by at least 1 non-LR and 3 LR models (see
Forest plot of random effects models for differences in logit AUROCs between non-LR and LR prediction models. Plots are grouped by outcome. The lines indicate the 95% CI, with diamonds whose sizes are determined by the number of pairwise comparisons (k). Absolute and relative values of between-study heterogeneity are denoted by τ² and I², respectively. Colors of the boxes and lines are determined by the existence of high ROB studies among those using non-LR algorithms. ANN: artificial neural network; AUROC: area under the receiver operating characteristic curve; DNN: deep neural network; DT: decision tree; Ens: ensemble of multiple algorithms; GB: gradient boosting; LR: logistic regression; NB: naïve Bayes; RF: random forest; ROB: risk of bias; SVM: support vector machine.
To determine the final random effects model for each comparison, we identified the studies responsible for the heterogeneity and removed their AUROCs from the random effects model. We excluded a non-LR [
The non-LR models significantly outperformed the LR models for preterm delivery (4/5 non-LR models), CS (3/5 non-LR models), preeclampsia (1/2 non-LR models), and gestational diabetes (2/3 non-LR models). Of those that examined preterm delivery, one prediction model did not include a non-LR high ROB study [
In contrast, a prediction model using a non-LR algorithm significantly underperformed those using an LR in one random effects model (−0.85, 95% CI −1.19 to −0.52). This model applied an artificial neural network to predict vaginal birth after a CS [
A random effects model developed for comparison of artificial neural networks and LR to predict preterm delivery had the highest heterogeneity by
However,
A random effects model developed for comparison of random forests and LR to predict ongoing pregnancy had the highest absolute value of heterogeneity (
For the random effects model with the lowest
In addition, we may need to know the
For each outcome except ongoing pregnancy, a random effects model fulfilling our criteria for describing the predictors was selected. For each outcome in the meta-analysis, we selected the random effects models in which a non-LR algorithm either significantly outperformed or was significantly underperformed by the LR. This was determined by the 95% CI of the difference in the logit AUROCs between the non-LR and LR models for an outcome. If available, we selected only those including only non-LR low ROB studies. The selected random effects models were random forest versus LR for preterm delivery, gradient boosting versus LR for CS, random forest versus LR for preeclampsia, gradient boosting versus LR for gestational diabetes, and artificial neural network versus LR for vaginal birth after a CS. As we only extracted the AUROC of either the best LR or non-LR model, only the predictors and outcomes of that model were considered when a study had multiple models for different subtypes of the outcome.
For preterm delivery, Despotovic et al [
For CS, Saleem et al [
For preeclampsia, Sufriyana et al [
For gestational diabetes, Artzi et al [
For vaginal birth after a CS, Macones et al [
Of the 2093 records from 4 literature databases using 144 keywords, we found 142 eligible studies, among which 24 had a low ROB. These eligible studies developed prediction models for outcome categories of premature birth,
There were 4 models with non-LR algorithms from low ROB studies that had significantly higher differences in logit AUROCs than those with LR algorithms. The models used a random forest to predict preterm delivery (2.51, 95% CI 1.49-3.53), gradient boosting to predict CS (2.26, 95% CI 1.39-3.13), a random forest to predict preeclampsia (1.2, 95% CI 0.72-1.67), and gradient boosting to predict gestational diabetes (1.03, 95% CI 0.69-1.37). The first model, applying a random forest, used only EHG records to predict preterm delivery. The second random forest model used only maternal demographics and medical histories, excluding obstetric histories, for preeclampsia prediction. Meanwhile, the first model applying gradient boosting used only CTG records to predict CS. The last model, developed by applying gradient boosting for gestational diabetes, used maternal demographics, medical histories, obstetric histories, clinical or obstetric examinations, routine laboratory tests, and medications.
We compared our systematic review and metaanalysis with prior works related to either machine learning algorithms or pregnancy outcomes similar to those in our study. A recent paper described applications of artificial intelligence in obstetrics and gynecology [
Nevertheless, the outcomes predicted by non-LR models in our review were still insufficient. Diseases that cause maternal deaths should receive higher priority than those causing neonatal deaths. The risks were higher for pregnant women with antepartum hemorrhage (incidence rate ratio [IRR] 3.5, 95% CI 2.0-6.1) or hypertension (IRR 1.5, 95% CI 1.1-2.2) compared with those without these diseases [
In our study, LR was the most frequently used algorithm to develop a prediction model in pregnancy care, including for the predicted outcomes that caused the most maternal deaths, followed by artificial (shallow) neural networks, support vector machines, and deep neural networks. This corresponded to a systematic review and meta-analysis [
Similar to a previous study [
We made a particular assumption to determine whether the interaction of predictors and an outcome may be best captured by a given prediction algorithm: if the same predictors and outcome were used by the best-performing algorithm in either non-LR or LR models, but not by the other outcomes in this meta-analysis, then that algorithm may be the best for the pregnancy outcome using those predictors. To predict preterm delivery with predictors that included EHG in either non-LR or LR models [
Interestingly, the random forest significantly outperformed the LR for almost all of the pregnancy outcomes included in the meta-analysis. Although gradient boosting, rather than random forest, significantly outperformed the LR for CS and gestational diabetes, gradient boosting also uses multiple decision trees, as does the random forest. For ongoing pregnancy predictions in
Comparing differences in AUROCs and focusing on multiple prediction algorithms, a study with individual participant data also compared LR and nonLR algorithms, particularly Poisson regression, random forest, gradient boosting, and an ensemble of a random forest with either LR or support vector machine [
Massive evaluation of 179 algorithms from 17 machine learning families was conducted using 121 data sets [
Of the pregnancy outcomes predicted by non-LR algorithms in this review, most outcomes were
For non-LR algorithms, the lack of shared data sets may explain the few prediction studies for maternal outcomes compared with those for neonatal outcomes in this systematic review. Meanwhile, pregnancy-induced hypertension was found in pregnant women whose newborns were born prematurely [
In addition, the sample sizes of the data sets used for model development may contribute to bias in predictive performance. For example, in our meta-analysis, prediction models of ongoing pregnancy in
A meta-analysis of multivariable LR was also previously conducted for premature birth from 4 studies [
Minimizing the bias of model performance is the first thing to consider when developing a clinical prediction model. Several concerns need to be addressed when developing prognostic machine learning predictions in pregnancy care. In our review, most studies had problems of insufficient EPV (both LR and non-LR studies), single imputation (mostly LR studies), and no assessment of calibration (mostly non-LR studies). This may expose the studies to high ROBs [
Our systematic review and meta-analysis will allow investigators or clinicians in pregnancy care to consider whether trying multiple machine learning models would benefit their studies. If more prediction models are needed for outcomes with more specific problems or subpopulations, then predictive modeling may consider comparisons of LR and non-LR algorithms for the specific outcomes compared in our meta-analysis. We also reported heterogeneity measures to interpret the predictive performances of algorithms across studies.
However, the diverse populations and hyperparameters caused substantial heterogeneity of predictive performance in our meta-analysis. Future meta-analyses will be needed as more machine learning models are developed for the same outcome using the same algorithm. We tried to minimize the heterogeneity by excluding several studies to ensure more homogeneous outcome definitions and normally distributed AUROCs. We also applied random effects modeling, as recommended [
Prediction models using non-LR machine learning algorithms significantly outperformed those using LR for several pregnancy outcomes. These non-LR algorithms were random forests for predicting preterm delivery and preeclampsia and gradient boosting for predicting CS and gestational diabetes. In our review, the studies that developed models using these algorithms had low ROBs. For predicting ongoing pregnancy in
On the basis of our meta-analysis, we recommend comparing multiple machine learning models, including both LR and non-LR algorithms, when developing a prediction model. In our systematic review, we also found that many studies had high ROBs in the analysis domain, often because they lacked sufficient EPV to develop a prediction model. Hence, we also recommend that future prediction model development pursue standard EPV and other guideline-based standards to minimize ROBs.
Details on forest plots, search filter, eligibility criteria, study selection, list of reviewed studies, risk of bias assessment, signaling questions and the answers, predictive performance and sample size, R code for metaanalysis, and records of studies.
AUROC: area under the receiver operating characteristic curve
CHARMS: checklist for critical appraisal and data extraction for systematic reviews of prediction modeling studies
CS: cesarean section
CTG: cardiotocogram
EHG: electrohysterogram
EPV: events per variable
IRR: incidence rate ratio
LR: logistic regression
MeSH: Medical Subject Heading
MLPBIOM: guidelines for developing and reporting machine learning predictive models in biomedical research
OR: odds ratio
PI: prediction interval
PICOTS: population, index, comparator, outcomes, timing, and setting
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PROBAST: prediction model risk of bias assessment tool
ROB: risk of bias
ROC: receiver operating characteristic
TRIPOD: transparent reporting of a multivariable prediction model for individual prognosis or diagnosis
This study was funded by the Ministry of Science and Technology of Taiwan under grant numbers MOST108-2221-E-038-018 and MOST109-2221-E-038-018 to ES. The sponsor had no role in the research design or the contents of the manuscript for publication.
None declared.