Published on in Vol 10, No 6 (2022): June

Preprints (earlier versions) of this paper are available at, first published .
The Prediction of Preterm Birth Using Time-Series Technology-Based Machine Learning: Retrospective Cohort Study

Original Paper

1Hangzhou Normal University, Hangzhou, China

2Department of Obstetrics and Gynecology, Hangzhou Women's Hospital, Hangzhou, China

3Department of Obstetrics and Gynecology, The Affiliated Hangzhou Women’s Hospital of Hangzhou Normal University, Hangzhou, China

*these authors contributed equally

Corresponding Author:

Zhenming Yuan, PhD

Hangzhou Normal University

2318 Yuhangtang Road

Hangzhou, 311121


Phone: 86 13588714850


Background: Globally, the preterm birth rate has tended to increase over time. Ultrasonography cervical-length assessment is considered to be the most effective screening method for preterm birth, but routine, universal cervical-length screening remains controversial because of its cost.

Objective: We used obstetric data to analyze and assess the risk of preterm birth. A machine learning model based on time-series technology was used to analyze regular, repeated obstetric examination records during pregnancy to improve the performance of the preterm birth screening model.

Methods: This study attempts to use continuous electronic medical record (EMR) data from pregnant women to construct a preterm birth prediction classifier based on long short-term memory (LSTM) networks. Clinical data were collected from 5187 pregnant Chinese women who gave birth by natural vaginal delivery. The data included more than 25,000 obstetric EMRs from the first trimester to 28 weeks of gestation. The area under the curve (AUC), accuracy, sensitivity, and specificity were used to assess the performance of the prediction model.

Results: Compared with a traditional cross-sectional study, the LSTM model in this time-series study had better overall prediction ability and a lower misdiagnosis rate at the same detection rate. Accuracy was 0.739, sensitivity was 0.407, specificity was 0.982, and the AUC was 0.651. Important-feature identification indicated that blood pressure, blood glucose, lipids, uric acid, and other metabolic factors were important factors related to preterm birth.

Conclusions: The results of this study will help in formulating guidelines for the prevention and treatment of preterm birth and will help clinicians make correct decisions during obstetric examinations. The time-series model has advantages for preterm birth prediction.

JMIR Med Inform 2022;10(6):e33835




Preterm birth, defined as birth before 37 completed weeks of gestation, is the primary cause of neonatal death and disability and affects the long-term health of newborns [1,2]. According to the World Health Organization global action report on preterm birth, approximately 15 million infants are born preterm worldwide every year, with an incidence of 5% to 18%; 1 million of these infants die [3]. China is the most populous country in the world, and the implementation of the two-child policy has increased the average age of first pregnancy and the incidence of preterm birth [4-6]. Compared with full-term birth, preterm birth has adverse effects on the health and safety of both the pregnant woman and the infant. Prematurity increases the incidence of congenital malformations, small-for-gestational-age status, and nervous system diseases associated with immature organs [7-9]. Therefore, early prediction of preterm birth and preventive measures have significant potential to reduce mortality and improve the survival rate of preterm infants [10,11].

Despite these serious clinical consequences, there are currently no effective methods for early screening of preterm birth. Ultrasonographic cervical-length assessment is generally considered the most effective screening method [11,12], but routine, universal cervical-length screening remains controversial because of its cost [13,14]. In China, cervical screening is not widely used and is performed only for pregnant women with cervical insufficiency [15]. Fetal fibronectin, an extracellular matrix glycoprotein, has also been studied extensively as a predictor of preterm birth; although it has high specificity, its detection rate is low [16]. Other biomarkers, including inflammatory factors, serum proteomics, and genetic factors, are associated with preterm birth [17], but each performs well only in a subset of cases, and few studies have shown that they are useful enough for clinical practice.

No single or combined screening method for preterm birth has high sensitivity and can reliably identify women at risk [11]. The etiological mechanism of preterm birth is elusive, and the interactions between risk factors are complex. Machine learning algorithms based on time-series technology can model nonlinear relationships among multidimensional variables and mine their temporal characteristics, and such models have been shown to be effective in predicting obstetric diseases [18,19]. Therefore, this paper proposes a time-series preterm birth prediction model based on a long short-term memory (LSTM) network.

Related Work

In the literature, various machine learning methods have been proposed to predict the risk of preterm birth. According to their data source, these methods can be broadly categorized into 2 types: those based on special examination data and those based on routine clinical data. Special examination data include findings from cervicovaginal fluid [20], electrohysterography [21], and whole-blood gene expression [22]. These data require special methods to obtain and are not suitable for large-scale initial screening; as a result, models based on them have shown good prediction performance only in small-sample data sets. Other research has built prediction models from routine clinical examination data and demographic data. Koivu et al [23] used a US Centers for Disease Control and Prevention (CDC) data set of almost 16 million observations to build a prediction model; their best-performing machine learning model achieved an area under the curve (AUC) of 0.64 for preterm birth on external New York City test data. Lee et al [24] used the same CDC and New York City data sets to build an artificial neural network prediction model, which also had an AUC of 0.64. Weber et al [25] assessed the prediction of early (<32 weeks) spontaneous preterm birth among non-Hispanic women by applying machine learning to multilevel data from a large birth cohort; their model had an AUC of 0.67.

Although the above prediction models have relatively reliable performance, they all rely on huge, complex data sets, and complete data sets of this size and complexity can be difficult to obtain because of privacy concerns. More importantly, these models ignore the influence of time-related factors. Time-series analysis and prediction methods forecast future developments from trends in past changes and emphasize the role of time in prediction. Obstetric examinations are in fact continuous, repeated time-series records that are considered to be related to pregnancy risk [26]. Previous studies have reported that time-series models perform well in obstetrics. For example, Tao et al [27] used maternal weight change trajectories during pregnancy to establish a time-series hybrid model to predict newborn birth weight, and Zhou et al [28] predicted the risk of postpartum hemorrhage using continuous data from prenatal physical examinations. Compared with other biological phenomena, the 280-day gestational cycle has a relatively fixed duration, and pregnant women show high compliance with obstetric outpatient examinations [29]. Therefore, time-series models that mine temporal characteristics from data obtained during pregnancy have high potential.

Few studies have described the interpretability of their models. Khatibi et al [30] used Iran’s national databank of maternal and neonatal records to design a parallel feature selection machine learning algorithm based on map/reduce phases to predict the risk of preterm birth. The map phase used parallel feature selection and classification methods to score features, while the reduce phase aggregated the feature scores to determine the contribution of each predictor to the model. Similar methods include frequency statistics, the Gini index, and other indicators that trace the decision-making process of tree models [31], as well as Shapley values for quantifying feature importance [32].

Although none of the above methods are suitable for time-series models, interpretable frameworks for time-series classification that can be used in different medical scenarios have encouragingly been proposed in recent work. For medical signals, Ivatur et al [33] proposed a post hoc explainability framework for deep learning models applied to quasi-periodic biomedical time-series classification that included 3 explanation techniques: ablation, permutation, and local interpretable model-agnostic explanations. Maweu et al [34] proposed a modular framework, the convolutional neural network (CNN) explainability framework for electrocardiogram signals, which explains the quality of deep learning models in terms of quantifiable metrics and feature visualization. Electronic medical records (EMRs) contain time-series and multimodal data, which further hinder interpretability. Nguyen-Du et al [35] proposed a deep electronic health record spotlight framework that transforms EMR data into pathways and 2D pathway images, which can then be analyzed with 2D CNN techniques to support visual interpretation. Viton et al [36] proposed an approach that uses heat maps as a visual means of highlighting significant variables over a temporal sequence, which can be applied to predicting the risk of in-hospital mortality.

This previous research motivated the current study, which makes the following key contributions: (1) we designed and implemented a complete process for preterm birth screening and early warning based on routine EMR data; (2) we used machine learning based on time-series technology to analyze obstetric examination data and improve the performance of the prediction model; and (3) we provide a preliminary quantitative interpretation of the model and explore time-series predictors of preterm birth.

Setting and Study Population

The data were collected from Hangzhou Women’s Hospital (Hangzhou Maternity and Child Health Care Hospital), Hangzhou, Zhejiang Province, China, between 2017 and 2020. This study included >25,000 pregnant women who received antenatal care at Hangzhou Women’s Hospital and eventually gave birth by natural vaginal delivery. The inclusion criterion was a first pregnancy test before 12 weeks of gestation. The exclusion criteria were multiple pregnancy, assisted reproduction, severe cardiovascular or cerebrovascular complications or comorbidities, and cervical cerclage during pregnancy. According to the Chinese guidelines for prenatal examination [37], pregnant women should have a monthly outpatient examination before 28 weeks of gestation. Figure 1 shows the filtering and processing flow chart used to select the study population. Some women were excluded because their data could not be obtained or their recorded pregnancy outcomes were implausible. Data from a final total of 5187 women were available for analysis.

Figure 1. Flow chart showing participant selection.

Clinical Measurements and Data Collection

Demographic data, physical examination data, ultrasound records, and laboratory data from the antenatal period were retrieved from EMRs. At pregnancy registration, maternal demographic characteristics (eg, age, education, and occupation), anthropometrics (eg, body weight, height, and blood pressure), and clinical history (eg, parity and disease history) were recorded. As shown in Table 1, repeated pregnancy data were obtained for each woman from the first pregnancy test to the final test, taken between 25 and 28 weeks. The clinical data included age, weight, uterine height, abdominal circumference, blood pressure, and ultrasonic examination findings. Laboratory tests (eg, routine blood examination and blood biochemistry, including blood lipids and glucose) were performed at 24 weeks of gestation.

Participants were asked to wear light clothing when their height and weight were measured. BMI was calculated as body weight in kilograms divided by the square of height in meters. Sitting blood pressure was measured after at least 10 minutes of rest using a standard mercury sphygmomanometer, with the patient’s right arm held at heart level. Maternal venous blood samples were drawn in the morning after an overnight fast of ≥8 hours.

Table 1. Description of data sources.

| Gestational age | Ultrasonic examination | Laboratory tests |
|---|---|---|
| Before 12 weeks | ✓a | N/Ab |
| 13 to 16 weeks | ✓ | N/A |
| 17 to 20 weeks | ✓ | N/A |
| 21 to 24 weeks | ✓ | N/A |
| 25 to 28 weeks | ✓ | ✓ |

a✓ indicates that the pregnant woman underwent the relevant clinical examination at this stage of pregnancy.

bN/A: not applicable.
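Before an LSTM can consume these records, each woman's irregularly timed visits have to be aligned to a fixed number of time steps. The paper does not describe this alignment step, so the following is only a hypothetical sketch: it assumes one time step per Table 1 window, keeps the latest visit in each window, and forward-fills windows without a visit. The `build_sequence` and `window_index` names and the dict-per-visit schema are illustrative, not the authors' code.

```python
import numpy as np

# Hypothetical: upper bound (inclusive, completed weeks) of each Table 1 window.
# Treating "before 12 weeks" as <= 12 is an assumption of this sketch.
WINDOW_UPPER_BOUNDS = [12, 16, 20, 24, 28]


def window_index(gest_week):
    """Return the index (0-4) of the window containing gest_week, or None if after 28 weeks."""
    for idx, upper in enumerate(WINDOW_UPPER_BOUNDS):
        if gest_week <= upper:
            return idx
    return None


def build_sequence(visits, feature_names):
    """Align one woman's visits to a (5, n_features) array.

    visits: list of dicts with a "gest_week" key plus feature values
    (hypothetical schema). Within a window the latest visit wins; windows
    without a visit are forward-filled, and leading gaps stay NaN.
    """
    seq = np.full((len(WINDOW_UPPER_BOUNDS), len(feature_names)), np.nan)
    for visit in sorted(visits, key=lambda v: v["gest_week"]):
        idx = window_index(visit["gest_week"])
        if idx is None:
            continue  # outside the prediction horizon
        for j, name in enumerate(feature_names):
            value = visit.get(name)
            if value is not None:
                seq[idx, j] = value  # later visits overwrite earlier ones
    # Forward-fill empty windows with the most recent observed values.
    for t in range(1, seq.shape[0]):
        missing = np.isnan(seq[t])
        seq[t, missing] = seq[t - 1, missing]
    return seq
```

Forward filling is only one option; masking or explicit missingness indicators are common alternatives when gaps are informative.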

Model Design

Based on the above-mentioned features, 2 machine learning models were constructed to predict preterm birth. The first was an early prediction model based on the cross-sectional data sources in Table 1: for each gestational age category, extreme gradient boosting (XGB) combined with decision trees was used to build the prediction model. XGB is an improvement on the gradient boosting algorithm and is widely used in obstetric auxiliary diagnosis [38]. The second model used temporal prediction techniques. Long short-term memory networks (LSTMs) are a type of recurrent neural network suited to processing and predicting events separated by relatively long intervals and delays in a time series [39]. LSTMs mitigate the vanishing gradient problem of conventional recurrent neural networks and are widely used in disease diagnosis [40].

LSTMs protect and control information through 3 gates: the input gate, the forget gate, and the output gate. The key component of an LSTM is the cell state. The forget gate determines whether the information from the previous time step is useful; only useful information is retained, and the rest is forgotten. Equations (1) through (5) describe the parameter update process, where σ is the sigmoid function, $x_t$ is the input at the current time step, $h_{t-1}$ is the output of the LSTM at the previous time step, and $h_t$ is the current output; i, f, and o denote the input gate, forget gate, and output gate of the LSTM unit, respectively; and W and b denote weight matrices and bias vectors. Equation (4) describes the state transition of the memory unit, where $c_t$ is the state of the memory unit at the current time step. The current state is computed from the previous state, $c_{t-1}$, weighted by the forget gate, plus new candidate information weighted by the input gate of the current LSTM unit.

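$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \qquad (1)\\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \qquad (2)\\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \qquad (3)\\
c_t &= f_t c_{t-1} + i_t \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \qquad (4)\\
h_t &= o_t \tanh\left(c_t\right) \qquad (5)
\end{aligned}
$$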
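To make Equations (1) through (5) concrete, here is a minimal NumPy sketch of a single LSTM cell run over a 5-step sequence. It is an illustration of the standard LSTM update, not the authors' PyTorch model: the dict-based parameter layout, the hidden size of 16, and the random initialization are assumptions; only the input size of 65 comes from Table 2.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (1)-(5).

    W maps names such as "xi" or "hf" to weight matrices; b maps gate names
    "i", "f", "o", "c" to bias vectors. This layout is illustrative only.
    Products between gate vectors are element-wise.
    """
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])  # (1) input gate
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])  # (2) forget gate
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])  # (3) output gate
    candidate = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])
    c_t = f_t * c_prev + i_t * candidate  # (4) cell-state update
    h_t = o_t * np.tanh(c_t)  # (5) output
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    W = {}
    for g in "ifoc":
        W["x" + g] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        W["h" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    b = {g: np.zeros(n_hidden) for g in "ifoc"}
    return W, b


def run_lstm(sequence, W, b, n_hidden):
    """Run the cell over a (T, n_in) sequence and return the final hidden state."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, b)
    return h
```

In a classifier, the final hidden state would typically feed a linear layer that outputs preterm and full-term scores; frameworks such as PyTorch fuse the four weight matrices for speed, but the arithmetic is the same.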

The parameters of the prediction models were determined by grid search, and the models were validated with 5-fold cross-validation. In 5-fold cross-validation, the data set is divided into 5 folds; in each round, 4 folds (80% of the data) are used for training and the remaining fold (20%) is used for testing, and the results are averaged over the 5 rounds. Because the incidence of preterm birth is only about 5% (221/5187, 4.3%, in this cohort), the classes were imbalanced and misclassification costs were unequal, so random oversampling was used to balance the data set. Random oversampling randomly duplicates minority-class samples until the minority class is the same size as the majority class.
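The following NumPy sketch shows stratified 5-fold splitting combined with random oversampling. The paper does not state whether oversampling was applied before or after splitting; this sketch assumes it is applied to each training fold only, so that duplicated preterm cases never appear in both the training and test folds. Function names are hypothetical, and the grid search itself is not shown.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(len(idx)) % n_splits  # round-robin assignment
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]
```

With the cohort's class sizes (4966 full-term, 221 preterm), each test fold holds about 1037 women, and each oversampled training fold ends up with about 3973 cases per class.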

Data analysis and visualization were performed in Python 3.6 (Python Software Foundation) using NumPy, Pandas, Matplotlib, Seaborn, and other libraries [41,42]. The machine learning model was implemented with the scikit-learn library, and the deep learning model with the PyTorch framework [43]. Given the size of the data set, the LSTM network could be trained on a personal computer. The Adam optimizer [44], which adapts the learning rate for each parameter, was used to accelerate the convergence of the LSTM model. Table 2 shows the parameter values for the 2 models.

Table 2. Summary of parameter values in each model.

| Model and parameter | Value |
|---|---|
| Extreme gradient boosting model | |
| Learning rate | 0.01 |
| Long short-term memory model | |
| Loss function | CrossEntropy |
| Input size | 65 |
| Learning rate | 0.001 |
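The Adam update used for the LSTM can be sketched in a few lines. This is a minimal NumPy version of the published Adam rule with the commonly used defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), which are assumptions here because the paper reports only the learning rate; lr = 0.001 matches Table 2. It is meant to show why the effective step adapts per parameter, not to reproduce the PyTorch optimizer exactly.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer for a NumPy parameter vector."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients (first moment)
        self.v = None  # running mean of squared gradients (second moment)
        self.t = 0  # step counter used for bias correction

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # correct the bias toward zero
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Each parameter's step is lr scaled by its own gradient statistics.
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Because the first update is normalized by the gradient's own magnitude, its size is close to the learning rate regardless of how large the raw gradient is; for example, minimizing (θ - 3)² from θ = 0 moves θ by almost exactly 0.001 on the first step.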

Model Evaluation

The characteristics were compared between the preterm birth and full-term birth groups. Statistical tests were 2-sided; P values <.05 were considered statistically significant. All analyses were performed using the statistical software SPSS 22.0 (IBM).

Predictive performance was an important criterion for evaluating the proposed model. The receiver operating characteristic (ROC) curve and the AUC were used to evaluate the model’s ability to predict preterm birth. In addition, confusion-matrix metrics, including accuracy, sensitivity, and specificity, were used to compare the actual and predicted outcomes for the risk of preterm birth. These metrics were calculated as follows: accuracy = (TN + TP) / (TN + TP + FN + FP); sensitivity = TP / (TP + FN); and specificity = TN / (TN + FP), where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
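The formulas above translate directly into code. This plain-Python sketch (hypothetical helper names) computes accuracy, sensitivity, and specificity from binary predictions, and computes the AUC through its rank interpretation: the probability that a randomly chosen preterm case receives a higher score than a randomly chosen full-term case, with ties counted as one-half. The pairwise loop is quadratic, which is acceptable for a cohort of this size.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = preterm birth."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity as defined in the text."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels [1, 1, 0, 0, 0, 0] and predictions [1, 0, 0, 0, 0, 1], sensitivity is 0.5, specificity is 0.75, and accuracy is 4/6.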

Feature importance reflects the contribution of each variable to classifying preterm birth and helps explain the model’s decisions. For the XGB model, feature importance was calculated as the sum of the decrease in error over all splits on a variable [31]. For the LSTM model, feature ablation was used, which provides the importance of each input feature at a given time step [45]: each feature is replaced with a baseline value, and its attribution is the resulting change in model output. A larger decrease in AUC after ablation indicates a more important feature.
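Feature ablation as described above can be sketched in NumPy for any model that maps an input array of shape (women, time steps, features) to risk scores. In this hypothetical sketch, `predict` stands in for the trained LSTM, and the baseline defaults to each feature's mean over all women and time steps; that baseline is an assumption, since other choices (such as zero) are also common. Setting `time_step` ablates a feature at a single visit window instead of across the whole sequence.

```python
import numpy as np


def auc(y, scores):
    """Rank-based AUC: share of positive-negative pairs ranked correctly (ties = 1/2)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def ablation_auc_drop(predict, X, y, baseline=None, time_step=None):
    """Return the AUC decrease caused by replacing each feature with its baseline.

    predict: callable mapping an (n, T, F) array to n scores (stand-in for the model).
    time_step: None ablates all time steps; an int ablates only that step.
    """
    base_auc = auc(y, predict(X))
    if baseline is None:
        baseline = X.mean(axis=(0, 1))  # assumed baseline: per-feature mean
    steps = slice(None) if time_step is None else time_step
    drops = np.empty(X.shape[2])
    for j in range(X.shape[2]):
        X_abl = X.copy()
        X_abl[:, steps, j] = baseline[j]
        drops[j] = base_auc - auc(y, predict(X_abl))
    return drops
```

Ranking features by the returned drops gives an importance order of the kind reported for the LSTM model; a feature the model ignores yields a drop of exactly zero.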

Ethics Approval

The study design was approved by the local Ethical and Research Committee (written permission, with approval number 2019-02-2). All medical procedures were performed following the relevant guidelines and regulations. The informed consent requirement for this study was waived by the board because the researchers only accessed the database for analysis purposes and all patient data were deidentified.

General Characteristics of the Study Participants

The data set used in this paper comes from a hospital in eastern China and is extensive, including maternal ultrasound records, prenatal examination reports, and laboratory data. Of the 5187 pregnant women enrolled in the present study, 4966 gave birth at full term and the remaining 221 gave birth preterm. The general characteristics of the participants are presented in Table 3. Table 4 summarizes the clinical characteristics of the study participants in the second trimester (25-28 weeks).

Table 3. General characteristics of the study population (N=5187).

| Characteristics | Mean (SD) |
|---|---|
| Age, years | 29.63 (3.52) |
| Prepregnancy weight, kg | 53.65 (8.15) |
| Height, cm | 161.45 (4.84) |
| Prepregnancy BMI, kg/m² | 20.57 (2.92) |
| Parity, number | 0.26 (0.46) |
| Gravidity, number | 1.71 (0.98) |
| Prepregnancy SBPa, mmHg | 106.12 (13.02) |
| Prepregnancy DBPb, mmHg | 67.29 (9.31) |
| Number of preterm births in reproductive history | 0.003 (0.05) |
| Age at menarche, years | 13.47 (1.22) |
| Menstrual period, days | 6.07 (3.03) |
| Menstrual cycle, days | 29.55 (7.06) |

aSystolic blood pressure.

bDiastolic blood pressure.

Table 4. Clinical characteristics and laboratory parameters in the second trimester.

| Characteristics | Full-term birth (n=4966), mean (SD) | Preterm birth (n=221), mean (SD) | P value |
|---|---|---|---|
| General characteristics | | | |
| Age, years | 29.61 (3.49) | 30.14 (3.64) | .02 |
| Prepregnancy weight, kg | 53.92 (7.16) | 53.74 (8.12) | .31 |
| Prepregnancy SBPa, mmHg | 106.70 (10.45) | 106.19 (11.98) | .48 |
| Prepregnancy DBPb, mmHg | 67.65 (7.96) | 67.47 (7.41) | .53 |
| Physical data | | | |
| Gestational age, weeks | 26.02 (1.17) | 26.09 (1.19) | .73 |
| Pulse rate, beats per minute | 77.63 (7.27) | 77.32 (6.82) | .56 |
| Maternal weight during pregnancy, kg | 61.16 (7.28) | 60.39 (8.29) | .29 |
| SBP, mmHg | 111.42 (10.62) | 113.19 (11.24) | .04 |
| DBP, mmHg | 65.29 (7.78) | 66.09 (8.20) | <.001 |
| Uterine height, cm | 24.48 (1.82) | 24.02 (2.28) | .45 |
| Maternal abdominal circumference, cm | 88.76 (5.45) | 86.98 (8.33) | .45 |
| Ultrasonic data | | | |
| Biparietal diameter, cm | 6.70 (0.23) | 6.84 (0.48) | .05 |
| Head circumference, cm | 24.60 (0.76) | 25.02 (1.47) | .13 |
| Femur length, cm | 4.83 (0.17) | 4.93 (0.34) | .06 |
| Fetal abdominal circumference, cm | 22.18 (0.86) | 22.94 (1.45) | .03 |
| Laboratory data | | | |
| Triglyceride, mmol/L | 2.15 (0.78) | 2.25 (0.79) | .02 |
| Total bile acid, µmol/L | 2.22 (1.75) | 2.17 (1.52) | .43 |
| Uric acid, µmol/L | 244.05 (49.69) | 246.05 (49.60) | .12 |
| Platelets, cells × 10⁹/L | 209.12 (45.24) | 212.26 (46.10) | .11 |
| Fasting blood glucose, mmol/L | 4.35 (0.38) | 4.40 (0.46) | .04 |
| Total cholesterol, mmol/L | 6.23 (1.01) | 6.19 (1.07) | .28 |
| Activated partial thromboplastin time, seconds | 26.25 (2.97) | 26.26 (3.31) | .75 |
| Fibrinogen, g/L | 3.77 (0.63) | 3.85 (0.64) | .03 |
| Hemoglobin, g/L | 115.96 (8.44) | 116.79 (8.61) | .04 |

aSystolic blood pressure.

bDiastolic blood pressure.

Model Performance

Based on the features in Table 3 and Table 4, 2 machine learning models were constructed to predict preterm birth: an XGB model for the cross-sectional analysis and an LSTM model for the time-series analysis. The optimal parameters were set for each predictive model and verified on test data derived from the training data set by 5-fold cross-validation. Table 5 shows the accuracy, sensitivity, specificity, and AUC of the models for predicting preterm birth and compares the performance of the 2 models on identical test data sets. Notably, the time-series LSTM model had the best overall prediction ability, with an accuracy of 0.739, sensitivity of 0.407, specificity of 0.982, and AUC of 0.651. Furthermore, the performance of the cross-sectional models improved gradually with gestational age and was best in the last gestational age group (weeks 25-28), with an accuracy of 0.689, sensitivity of 0.407, specificity of 0.979, and AUC of 0.601.

After validation on the training data set, an independent test data set was used to predict preterm birth. Figure 2 shows the confusion matrices and ROC curves of the predictive models on the test data set. Compared with the cross-sectional models, the LSTM model had a lower misdiagnosis rate at the same detection rate. Its high specificity allowed more true negatives to be correctly excluded, which can lower the cost of screening.

Table 5. Average prediction results of different methods after 5-fold cross-validation.

| Prediction results | Observation period: before 12 weeks | Weeks 13-16 | Weeks 17-20 | Weeks 21-24 | Weeks 25-28 | Time series |
|---|---|---|---|---|---|---|

aAUC: area under the receiver operating characteristic curve.

Figure 2. Receiver operating characteristic curves and confusion matrix of prediction models: (A) cross-sectional prediction of the extreme gradient boosting model at weeks 25 to 28; (B) prediction results of the long short-term memory model. ROC: receiver operating characteristic.

Influence of Variables on Predictions

Figure 3 shows the important features identified by the XGB and LSTM models. For XGB, feature importance was calculated as the sum of the decrease in error over all splits on a variable, which reflects each variable’s contribution to classification. Maternal age was the most important variable for predicting preterm birth, followed by triglyceride level, total bile acid level, systolic pressure during pregnancy, fundal height, uric acid level, platelet count, and prepregnancy weight. For the LSTM model, which achieved the best performance, feature ablation provided the importance of each time-series input feature, evaluated by the decrease in AUC after ablation. Ablating systolic blood pressure decreased the AUC by 2%, making it the most important time-series feature, followed by fetal abdominal circumference, head circumference, and maternal weight.

Figure 3. Importance of the variables: (A) identification of important features by the extreme gradient boosting model at weeks 25 to 28; (B) identification of important features by the long short-term memory model. AUC: area under the curve; SBP: systolic blood pressure.

Principal Findings

Preterm birth is widely recognized as an increasingly serious problem. In this study, 5 prenatal examination records from the first and second trimesters of pregnancy were used to construct a time-series model to predict preterm delivery. Compared with traditional machine learning models, the time-series model improved prediction performance for preterm birth and allowed the identification of important variables for predicting preterm birth.

The early prediction of preterm birth has always been challenging. Traditional prediction models have usually taken a specialized test item or a combination of tests as input, aiming to find new markers that contribute strongly to preterm birth prediction; most of these studies have not been clinically verified [11,17,46]. Many studies have tried to predict preterm birth effectively to enable early detection and prompt management. Cervical screening, fetal fibronectin measurement, or a combination of the two can effectively predict preterm birth [12-14,16,47]. However, these predictions still have flaws. For asymptomatic women, the performance of the fetal fibronectin test is too low to be clinically relevant [48]. Many studies have found that cervical status is an independent risk factor for preterm birth. China’s 2014 Clinical Diagnosis and Treatment Guidelines for Preterm Delivery [49] recommend transvaginal ultrasound measurement of cervical length before 24 weeks to predict preterm birth in high-risk patients, with a cervical length <25 mm indicating increased risk. However, cervical examination remains controversial for screening the general population. Some studies advocate dynamic cervical examination regardless of whether a woman is at high or low risk [50,51]. On the other hand, a larger number of studies oppose or do not recommend large-scale cervical screening, for reasons that include but are not limited to material cost, the time required, the lack of unified standards, and the professional training required of laboratory personnel [13,14,52-55], which may make such screening not cost-effective. The prediction model in this study predicts preterm birth early in pregnancy based on demographic factors and prenatal laboratory data, which are easy to obtain in routine clinical practice. Therefore, the proposed prediction model can be used as a practical screening method for preterm birth in the first and second trimesters of pregnancy.

Admittedly, earlier studies have reported similar or even higher accuracy than this study. Compared with the large national databases used in previous studies, the routine data used in this paper are relatively limited, particularly in lacking key information such as obstetric and gynecological history and family history. Nevertheless, this paper shows that a machine learning method based on time-series technology can improve the performance of preterm birth prediction models.

This study reveals various new factors that affect the prediction of preterm birth. In addition, parameters traditionally reported to be related to delivery date, such as age, prepregnancy weight, history of preterm birth, and menstrual cycle, were confirmed to be influential factors in preterm birth prediction [1,56]. Interestingly, blood pressure, blood glucose, lipids, uric acid, and other metabolic factors were also important factors related to preterm birth. Although it has not been thoroughly investigated, the relationship between metabolic risk factors and preterm birth has been preliminarily recognized in several previous studies [57,58]. In a recent observational study of 5535 deliveries, pregnant women with a cluster of metabolic risk factors in early pregnancy were more likely to give birth preterm [59]. The metabolic response during pregnancy normally meets the needs of fetal growth; however, an excessive metabolic stress response can lead to various pathologies in pregnancy [60]. Despite the controversy, changes in metabolic levels during pregnancy have been observed in women who give birth preterm.


This study has several limitations. First, laboratory examinations before 20 weeks of gestation were completed in the women’s respective communities and could not be included in the analysis because of differences in testing standards. In addition, prepregnancy characteristics were subject to recall bias, and most of the included women were primiparous, which limited the contribution of preterm birth history to the model. Second, although LSTM has great potential, the performance of the model still needs to be improved. Because this is a baseline model built on routine data, biochemical and biophysical markers could be added to increase its screening performance. Moreover, advanced maternal age is a clear confounding factor [61], and stratified analysis by age will be considered in a follow-up study. Third, this paper provides only a preliminary explanation of the interpretability of the machine learning model; future work will consider a more sophisticated post hoc explainability framework, especially for time-series problems. Finally, the study may have been affected by selection bias owing to its single-center design. The prediction model has not been widely used in clinical practice, and its accuracy and practicality should be verified in prospective studies with larger samples.


Preterm birth is the primary cause of neonatal death and disability, and early prediction of preterm birth has great potential to improve the survival rate of preterm infants. In this work, we analyzed obstetric medical data using time-series machine learning and evaluated the risk of preterm birth. Our approach can screen for women at high risk of preterm birth in the first and second trimesters of pregnancy. Compared with a traditional cross-sectional approach, the time-series LSTM model in this study had better overall prediction ability, with a lower misdiagnosis rate at the same detection rate. In future work, we will further improve the data set, particularly by adding key characteristics of preterm birth reported in previous research, and build a more sophisticated post hoc explainability framework for the time-series model.


This research was funded by the National Health Commission Scientific Research Fund—Major Science and Technology Program of Medicine and Health of Zhejiang Province (WKJ-ZJ-1911), “Pioneer” and “Leading Goose” research and development programs of Zhejiang (2022C03102), Zhejiang Public Welfare Technology Research (GF20F020063), Primary Research and Development Plan of Zhejiang Province in China (2020C03107), Scientific and Technological Research Projects in Key Fields of the Corps (2021AB034-2), Natural Science Foundation of Zhejiang Province in China (GF20F020009, LQ21H040001), National Natural Science Foundation of China (82173530), and the Science and Technology Program of Medicine and Health of Hangzhou (ZD20200035 and OO2019054). We would also like to thank all the pregnant women and health care professionals who participated in the different stages of the development of the prediction model.

Authors' Contributions

YW and YZ were responsible for the study design. WH extracted the data. YW completed the relevant experiments. YW, WH, SL, and YZ provided feedback on the analyses and interpretation of the results. YZ, YW, and ZY wrote the paper. All authors read and approved the final manuscript.

Conflicts of Interest

None declared.

References

  1. Vogel JP, Chawanpaiboon S, Moller A, Watananirun K, Bonet M, Lumbiganon P. The global epidemiology of preterm birth. Best Pract Res Clin Obstet Gynaecol 2018 Oct;52:3-12. [CrossRef] [Medline]
  2. Luu TM, Rehman Mian MO, Nuyt AM. Long-Term Impact of Preterm Birth: Neurodevelopmental and Physical Health Outcomes. Clin Perinatol 2017 Jun;44(2):305-314. [CrossRef] [Medline]
  3. Howson C, Kinney M, McDougall L, Lawn JE, Born Too Soon Preterm Birth Action Group. Born too soon: preterm birth matters. Reprod Health 2013;10 Suppl 1:S1 [FREE Full text] [CrossRef] [Medline]
  4. Xue Q, Shen F, Gao Y, Tong M, Zhao M, Chen Q. An analysis of the medical indications for preterm birth in an obstetrics and gynaecology teaching hospital in Shanghai, China. Midwifery 2016 Apr;35:17-21. [CrossRef] [Medline]
  5. Chen S, Zhang C, Chen Y. Analysis of factors influencing safety of department of obstetrics based on the second child policy and investigation of countermeasures. J Shanghai Jiaotong Univ (Med Sci) 2016;5:742-746. [CrossRef]
  6. Jing S, Chen C, Gan Y, Vogel J, Zhang J. Incidence and trend of preterm birth in China, 1990-2016: a systematic review and meta-analysis. BMJ Open 2020 Dec 12;10(12):e039303 [FREE Full text] [CrossRef] [Medline]
  7. Tokariev A, Stjerna S, Lano A, Metsäranta M, Palva JM, Vanhatalo S. Preterm Birth Changes Networks of Newborn Cortical Activity. Cereb Cortex 2019 Feb 01;29(2):814-826. [CrossRef] [Medline]
  8. Bensi C, Costacurta M, Belli S, Paradiso D, Docimo R. Relationship between preterm birth and developmental defects of enamel: A systematic review and meta-analysis. Int J Paediatr Dent 2020 Nov 02;30(6):676-686. [CrossRef] [Medline]
  9. Blencowe H, Cousens S, Oestergaard M, Chou D, Moller A, Narwal R, et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet 2012 Jun 09;379(9832):2162-2172. [CrossRef] [Medline]
  10. da Fonseca EB, Damião R, Moreira DA. Preterm birth prevention. Best Pract Res Clin Obstet Gynaecol 2020 Nov;69:40-49. [CrossRef] [Medline]
  11. Oskovi Kaplan ZA, Ozgu-Erdinc AS. Prediction of Preterm Birth: Maternal Characteristics, Ultrasound Markers, and Biomarkers: An Updated Overview. J Pregnancy 2018;2018:8367571 [FREE Full text] [CrossRef] [Medline]
  12. Reicher L, Fouks Y, Yogev Y. Cervical Assessment for Predicting Preterm Birth-Cervical Length and Beyond. J Clin Med 2021 Feb 07;10(4):627 [FREE Full text] [CrossRef] [Medline]
  13. Rosenbloom JI, Raghuraman N, Temming LA, Stout MJ, Tuuli MG, Dicke JM, et al. Predictive Value of Midtrimester Universal Cervical Length Screening Based on Parity. J Ultrasound Med 2020 Jan 08;39(1):147-154. [CrossRef] [Medline]
  14. Rozenberg P. Universal cervical length screening for singleton pregnancies with no history of preterm delivery, or the inverse of the Pareto principle. BJOG 2017 Jun 04;124(7):1038-1045. [CrossRef] [Medline]
  15. Lu C, Li Z, Wang Z, Guo H, Zhong C, Li Y. Methods of detecting cervical incompetence and their evidence-based evaluation. Journal of Practical Obstetrics and Gynecology 2018;34(5):347-351.
  16. Kuhrt K, Hezelgrave-Elliott N, Stock SJ, Tribe R, Seed PT, Shennan AH. Quantitative fetal fibronectin for prediction of preterm birth in asymptomatic twin pregnancy. Acta Obstet Gynecol Scand 2020 Sep 20;99(9):1191-1197 [FREE Full text] [CrossRef] [Medline]
  17. Glover AV, Manuck TA. Screening for spontaneous preterm birth and resultant therapies to reduce neonatal morbidity and mortality: A review. Semin Fetal Neonatal Med 2018 Apr;23(2):126-132 [FREE Full text] [CrossRef] [Medline]
  18. Jhee JH, Lee S, Park Y, Lee SE, Kim YA, Kang S, et al. Prediction model development of late-onset preeclampsia using machine learning-based methods. PLoS One 2019 Aug 23;14(8):e0221202 [FREE Full text] [CrossRef] [Medline]
  19. Maragatham G, Devi S. LSTM Model for Prediction of Heart Failure in Big Data. J Med Syst 2019 Mar 19;43(5):111. [CrossRef] [Medline]
  20. Park S, Oh D, Heo H, Lee G, Kim SM, Ansari A, et al. Prediction of preterm birth based on machine learning using bacterial risk score in cervicovaginal fluid. Am J Reprod Immunol 2021 Sep;86(3):e13435. [CrossRef] [Medline]
  21. Fergus P, Cheung P, Hussain A, Al-Jumeily D, Dobbins C, Iram S. Prediction of preterm deliveries from EHG signals using machine learning. PLoS One 2013 Oct 28;8(10):e77154 [FREE Full text] [CrossRef] [Medline]
  22. Tarca AL, Pataki, Romero R, Sirota M, Guan Y, Kutum R, DREAM Preterm Birth Prediction Challenge Consortium, et al. Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth. Cell Rep Med 2021 Jun 15;2(6):100323 [FREE Full text] [CrossRef] [Medline]
  23. Koivu A, Sairanen M. Predicting risk of stillbirth and preterm pregnancies with machine learning. Health Inf Sci Syst 2020 Dec 25;8(1):14 [FREE Full text] [CrossRef] [Medline]
  24. Lee K, Ahn KH. Application of Artificial Intelligence in Early Diagnosis of Spontaneous Preterm Labor and Birth. Diagnostics (Basel) 2020 Sep 22;10(9):733 [FREE Full text] [CrossRef] [Medline]
  25. Weber A, Darmstadt GL, Gruber S, Foeller ME, Carmichael SL, Stevenson DK, et al. Application of machine-learning to predict early spontaneous preterm birth among nulliparous non-Hispanic black and white women. Ann Epidemiol 2018 Nov;28(11):783-789.e1. [CrossRef] [Medline]
  26. Goldstein RF, Abell SK, Ranasinha S, Misso M, Boyle JA, Black MH, et al. Association of Gestational Weight Gain With Maternal and Infant Outcomes: A Systematic Review and Meta-analysis. JAMA 2017 Jun 06;317(21):2207-2225 [FREE Full text] [CrossRef] [Medline]
  27. Tao J, Yuan Z, Sun L, Yu K, Zhang Z. Fetal birthweight prediction with measured data by a temporal machine learning method. BMC Med Inform Decis Mak 2021 Jan 25;21(1):26 [FREE Full text] [CrossRef] [Medline]
  28. Zhou T, Yu K, Yuan Z, Lu S, Hu W. Predictive Analysis of Postpartum Hemorrhage Based on LSTM and XGBoost Hybrid Model. Computer Systems and Applications 2020;29(3):148-154.
  29. Miremberg H, Ben-Ari T, Betzer T, Raphaeli H, Gasnier R, Barda G, et al. The impact of a daily smartphone-based feedback system among women with gestational diabetes on compliance, glycemic control, satisfaction, and pregnancy outcome: a randomized controlled trial. Am J Obstet Gynecol 2018 Apr;218(4):453.e1-453.e7. [CrossRef] [Medline]
  30. Khatibi T, Kheyrikoochaksarayee N, Sepehri MM. Analysis of big data for prediction of provider-initiated preterm birth and spontaneous premature deliveries and ranking the predictive features. Arch Gynecol Obstet 2019 Dec 24;300(6):1565-1582. [CrossRef] [Medline]
  31. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care 2019 Apr 08;23(1):112 [FREE Full text] [CrossRef] [Medline]
  32. Fu Y, Gou W, Hu W, Mao Y, Tian Y, Liang X, et al. Integration of an interpretable machine learning algorithm to identify early life risk factors of childhood obesity among preterm infants: a prospective birth cohort. BMC Med 2020 Jul 10;18(1):184 [FREE Full text] [CrossRef] [Medline]
  33. Ivaturi P, Gadaleta M, Pandey AC, Pazzani M, Steinhubl SR, Quer G. A Comprehensive Explanation Framework for Biomedical Time Series Classification. IEEE J Biomed Health Inform 2021 Jul;25(7):2398-2408. [CrossRef] [Medline]
  34. Maweu BM, Dakshit S, Shamsuddin R, Prabhakaran B. CEFEs: A CNN Explainable Framework for ECG Signals. Artif Intell Med 2021 May;115:102059. [CrossRef] [Medline]
  35. Nguyen-Duc T, Mulligan N, Mannu GS, Bettencourt-Silva JH. Deep EHR Spotlight: a Framework and Mechanism to Highlight Events in Electronic Health Records for Explainable Predictions. AMIA Jt Summits Transl Sci Proc 2021;2021:475-484 [FREE Full text] [Medline]
  36. Viton F, Elbattah M, Guerin JL. Heatmaps for Visual Explainability of CNN-Based Predictions for Multivariate Time Series with Application to Healthcare. 2020 Presented at: 2020 IEEE International Conference on Healthcare Informatics (ICHI); Nov 30-Dec 3, 2020; Oldenburg, Germany. [CrossRef]
  37. Obstetrics group, Branch of Obstetrics and Gynecology, Chinese Medical Association. Pre pregnancy and pregnancy care guidelines (2018). Chinese Journal of Obstetrics and Gynecology 2018;53(1):7-13.
  38. Lu Y, Fu X, Chen F, Wong KK. Prediction of fetal weight at varying gestational age in the absence of ultrasound examination using ensemble learning. Artif Intell Med 2020 Jan;102:101748. [CrossRef] [Medline]
  39. Petrozziello A, Jordanov I, Papageorghiou AT. Deep Learning for Continuous Electronic Fetal Monitoring in Labor. 2018 Presented at: 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Jul 18-21, 2018; Honolulu, HI p. 5866-5869. [CrossRef]
  40. Sun L, Wang Y, He J, Li H, Peng D, Wang Y. A stacked LSTM for atrial fibrillation prediction based on multivariate ECGs. Health Inf Sci Syst 2020 Dec 21;8(1):19 [FREE Full text] [CrossRef] [Medline]
  41. McKinney W. Python for data analysis: data wrangling with Pandas, NumPy, and IPython. Sebastopol, CA: O'Reilly Media; 2017.
  42. Komer B, Bergstra J, Eliasmith C. Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn. In: Proceedings of the 13th Python in Science Conference. 2014 Presented at: SciPy 2014; Jul 6-12, 2014; Austin, TX p. 32-37. [CrossRef]
  43. Paszke A, Gross S, Massa F. PyTorch: An Imperative Style, High-Performance Deep Learning Library. 2019 Presented at: 2019 Conference on Neural Information Processing Systems; Dec 8-14, 2019; Vancouver, Canada   URL: https:/​/www.​​publication/​337756689_PyTorch_An_Imperative_Style_High-Performance_Deep_Learning_Library
  44. Kingma DP, Ba JL. Adam: A method for stochastic optimization. 2015 Presented at: The 3rd International Conference on Learning Representations; May 7-9, 2015; San Diego, CA.
  45. Ismail AA, Gunady M, Bravo HC. Benchmarking Deep Learning Interpretability in Time Series Predictions. ArXiv Preprint posted online on Oct 26, 2020. [CrossRef]
  46. Suff N, Story L, Shennan A. The prediction of preterm delivery: What is new? Semin Fetal Neonatal Med 2019 Feb;24(1):27-32. [CrossRef] [Medline]
  47. Son M, Miller ES. Predicting preterm birth: Cervical length and fetal fibronectin. Semin Perinatol 2017 Dec;41(8):445-451 [FREE Full text] [CrossRef] [Medline]
  48. Faron G, Balepa L, Parra J, Fils J, Gucciardo L. The fetal fibronectin test: 25 years after its development, what is the evidence regarding its clinical utility? A systematic review and meta-analysis. J Matern Fetal Neonatal Med 2020 Feb;33(3):493-523. [CrossRef] [Medline]
  49. Obstetrics Subgroup, Chinese Society of Obstetrics and Gynecology, Chinese Medical Association. [Diagnosis and therapy guideline of preterm birth (2014)]. Zhonghua Fu Chan Ke Za Zhi 2014 Jul;49(7):481-485. [Medline]
  50. Einerson BD, Grobman WA, Miller ES. Cost-effectiveness of risk-based screening for cervical length to prevent preterm birth. Am J Obstet Gynecol 2016 Jul;215(1):100.e1-100.e7. [CrossRef] [Medline]
  51. Werner EF, Han CS, Pettker CM, Buhimschi CS, Copel JA, Funai EF, et al. Universal cervical-length screening to prevent preterm birth: a cost-effectiveness analysis. Ultrasound Obstet Gynecol 2011 Jul 24;38(1):32-37 [FREE Full text] [CrossRef] [Medline]
  52. Rozenberg P. [Is universal screening for cervical length among singleton pregnancies with no history of preterm birth justified?]. J Gynecol Obstet Biol Reprod (Paris) 2016 Dec;45(10):1337-1345. [CrossRef] [Medline]
  53. Berghella V. Universal cervical length screening for prediction and prevention of preterm birth. Obstet Gynecol Surv 2012 Oct;67(10):653-658. [CrossRef] [Medline]
  54. Hermans FJ, Koullali B, van Os MA, van der Ven JE, Kazemier BM, Woiski MD, Triple P group. Repeated cervical length measurements for the verification of short cervical length. Int J Gynaecol Obstet 2017 Dec 28;139(3):318-323. [CrossRef] [Medline]
  55. Masters HR, Warshak C, Sinclair S, Rountree S, DeFranco E. Time required to complete transvaginal cervical length in women receiving universal cervical length screening for preterm birth prevention. J Matern Fetal Neonatal Med 2020 Aug 30:1-5. [CrossRef] [Medline]
  56. Torchin H, Ancel PY. [Epidemiology and risk factors of preterm birth]. J Gynecol Obstet Biol Reprod (Paris) 2016 Dec;45(10):1213-1230. [CrossRef] [Medline]
  57. Grieger JA, Bianco-Miotto T, Grzeskowiak LE, Leemaqz SY, Poston L, McCowan LM, et al. Metabolic syndrome in pregnancy and risk for adverse pregnancy outcomes: A prospective cohort of nulliparous women. PLoS Med 2018 Dec 4;15(12):e1002710 [FREE Full text] [CrossRef] [Medline]
  58. Le TM, Nguyen LH, Phan NL, Le DD, Nguyen HV, Truong VQ, et al. Maternal serum uric acid concentration and pregnancy outcomes in women with pre-eclampsia/eclampsia. Int J Gynaecol Obstet 2019 Jan 08;144(1):21-26 [FREE Full text] [CrossRef] [Medline]
  59. Lei Q, Niu J, Lv L, Duan D, Wen J, Lin X, et al. Clustering of metabolic risk factors and adverse pregnancy outcomes: a prospective cohort study. Diabetes Metab Res Rev 2016 Nov 10;32(8):835-842. [CrossRef] [Medline]
  60. Mouzon SH, Lassance L. Endocrine and metabolic adaptations to pregnancy; impact of obesity. Horm Mol Biol Clin Investig 2015 Oct;24(1):65-72. [CrossRef] [Medline]
  61. Frick AP. Advanced maternal age and adverse pregnancy outcomes. Best Pract Res Clin Obstet Gynaecol 2021 Jan;70:92-100. [CrossRef] [Medline]

Abbreviations

AUC: area under the curve
CDC: US Centers for Disease Control and Prevention
CNN: convolutional neural network
EMR: electronic medical records
FN: false negative
FP: false positive
LSTM: long short-term memory
RNN: recurrent neural network
ROC: receiver operating characteristic
TN: true negative
TP: true positive
XGB: extreme gradient boosting

Edited by C Lovis; submitted 26.09.21; peer-reviewed by M Elbattah, I Mircheva; comments to author 31.01.22; revised version received 21.04.22; accepted 25.04.22; published 13.06.22


©Yichao Zhang, Sha Lu, Yina Wu, Wensheng Hu, Zhenming Yuan. Originally published in JMIR Medical Informatics, 13.06.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.