This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Postpartum depression (PPD) is a serious public health problem. Building a predictive model for PPD using data during pregnancy can facilitate earlier identification and intervention.
The aims of this study are to compare the effects of four different machine learning models using data during pregnancy to predict PPD and explore which factors in the model are the most important for PPD prediction.
Information on the pregnancy period from a cohort of 508 women, including demographics, social environmental factors, and mental health, was used as predictors in the models. The Edinburgh Postnatal Depression Scale score within 42 days after delivery was used as the outcome indicator. Using two feature selection methods (expert consultation and random forest-based filter feature selection [FFS-RF]) and two algorithms (support vector machine [SVM] and random forest [RF]), we developed four different machine learning PPD prediction models and compared their prediction effects.
There was no significant difference in the effectiveness of the two feature selection methods in terms of model prediction performance, but 10 fewer factors were selected with the FFS-RF than with the expert consultation method. The model based on SVM and FFS-RF had the best prediction effects (sensitivity=0.69, area under the curve=0.78). In the feature importance ranking output by the RF algorithm, psychological resilience, depression during the third trimester, and income level were the most important predictors.
In contrast to the expert consultation method, FFS-RF was important in dimension reduction. When the sample size is small, the SVM algorithm is suitable for predicting PPD. In the prevention of PPD, more attention should be paid to the psychological resilience of mothers.
Postpartum depression (PPD) is a serious public health problem that affects 10% to 20% of pregnant women [
Machine learning (ML) may be useful in making accurate predictions based on data from multiple sources and has been applied in prediction studies in recent years [
These ML methods have not previously been compared for PPD prediction. This study is based on data drawn from a large, ongoing cohort study of pregnant women in Hunan province in south central China. We combined the two feature selection methods and the two ML algorithms described above to build four PPD prediction models from data collected during pregnancy, compare their predictive performance, identify the optimal model, and provide a reference for the development of ML in PPD.
This study was part of a larger cohort study. All the data included here is original and previously unpublished. Researchers in the study collected the following measures at a series of 7 visits conducted in the first trimester through 6 weeks postpartum: depression (using the Edinburgh Postnatal Depression Scale [EPDS]), social environment, and psychological and biological factors associated with depression. The study was approved by the institutional review board of the institute of clinical pharmacology of Central South University (ChiCTR-ROC-16009255).
Participants were recruited from two maternity and child care centers in the cities of Changsha and Yiyang in Hunan province. The former is a major provincial teaching hospital located in Changsha, a city with approximately 8.15 million residents. Yiyang is a less economically developed area of Hunan province, with approximately 4.39 million residents. Researchers recruited women in the obstetric clinics of the two hospitals from September 2016 to February 2017. The inclusion criteria were as follows: woman, age ≥18 years, and gestation period ≤13 weeks (gestational weeks were estimated from the first day of the last menstrual period). All participants signed informed consent. In total, 1126 women were recruited.
The following tools were used to collect data.
A purpose-built questionnaire, designed for this study and optimized through a pilot survey, was used to collect information including age, education, monthly income level, occupation, marital satisfaction, first pregnancy, folic acid intake, premenstrual syndrome, history of mental health concerns, family history of mental illness, mother's menopausal symptoms, childhood experiences, and life events.
The EPDS was used to self-report maternal symptoms of depression [
The Brief Resilience Scale (BRS) was used to determine the level of psychological resilience. The BRS is a 6-item questionnaire that reflects the respondent’s ability to bounce back or recover from stress. The score is the mean of the item scores. A higher score indicates greater resilience and adaptability [
The Pittsburgh Sleep Quality Index (PSQI) is a comprehensive scale that reflects the sleep quality of subjects. It is composed of 7 dimensions: “Sleep Quality”, “Sleep Latency”, “Sleep Duration”, “Sleep Efficiency”, “Sleep Disorders”, “Use of Sleep Medications”, and “Daytime Dysfunction”. The scores of each dimension are summed to obtain the total PSQI score. Higher scores indicate worse sleep quality. According to the total score, sleep quality can be divided into different grades: 6 to 10 indicates “good sleep quality”, 11 to 15 indicates “average sleep quality”, and 16 to 21 indicates “poor sleep quality” [
The Social Support Rating Scale (SSRS), which was designed by Shuiyuan Xiao [
The Generalized Anxiety Disorder-7 (GAD-7) was developed by Spitzer [
Seven time points were selected for depression screening, corresponding to the women’s routine obstetric examinations. We divided these into first trimester (gestational week 13 or earlier), second trimester (weeks 17-20 and 21-24), third trimester (weeks 31-32 and 35-40), and postpartum (7 days and 6 weeks postpartum). Except in the first trimester, EPDS screening for perinatal depression was performed twice in each period. If one or more of the EPDS scores within a grouped set of visits was 9.5 or higher, the participant was regarded as at risk for depression during that period. The study questionnaire, BRS, and GAD-7 were administered during the first trimester, the PSQI during the second trimester, and the SSRS during the third trimester. In total, 508 out of 1126 (45.12%) participants completed all screenings (
Participant recruitment and response condition.
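The period-level risk rule described above (at risk if any EPDS score in a grouped set of visits is 9.5 or higher) can be sketched as follows. This is a minimal illustration only: the visit labels and the example scores are hypothetical and are not taken from the study data.

```python
EPDS_CUTOFF = 9.5  # threshold stated in the text

# Grouping of the seven screening visits into periods, as described above.
# The visit labels are hypothetical names chosen for this sketch.
PERIODS = {
    "first_trimester": ["wk_le13"],
    "second_trimester": ["wk17_20", "wk21_24"],
    "third_trimester": ["wk31_32", "wk35_40"],
    "postpartum": ["pp_7d", "pp_6wk"],
}


def at_risk_by_period(epds_scores):
    """Return {period: bool}; True if any visit in the period scores >= cutoff."""
    return {
        period: any(epds_scores[visit] >= EPDS_CUTOFF for visit in visits)
        for period, visits in PERIODS.items()
    }


# Hypothetical scores for one participant (not real data).
example = {
    "wk_le13": 4, "wk17_20": 8, "wk21_24": 10,
    "wk31_32": 7, "wk35_40": 9, "pp_7d": 12, "pp_6wk": 6,
}
flags = at_risk_by_period(example)
```

In this made-up example the participant would be flagged in the second trimester and postpartum periods only.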
Two simple and easy to implement methods were used for feature selection, namely, the expert consultation and FFS-RF methods. The expert consultation method was used to select clinically relevant factors as appropriate predictors of pre-existing or potential PPD. This was accomplished by consulting experts in the area of obstetrics and gynecology as well as mental health practitioners. The FFS-RF was used to identify proper predictors for PPD. Under this approach, features within a certain bound value range (
Of the 508 participants, 75% (381) were randomly selected for model training. Data from the remaining 127 participants was held back for use in model testing and verification.
Names of the postpartum depression prediction models.
Machine learning modeling algorithm | Feature selection method: expert consultation | Feature selection method: FFS-RFa
Random forest | E-RFb | F-RFc
Support vector machine | E-SVMd | F-SVMe
aFFS-RF: filter feature selection based on random forest.
bE-RF: model built using the random forest algorithm and expert consultation method.
cF-RF: model built using the random forest algorithm and random forest-based filter feature selection method.
dE-SVM: model built using the support vector machine algorithm and expert consultation method.
eF-SVM: model built using the support vector machine algorithm and random forest-based filter feature selection method.
Optimal parameters for each model.
PPDa prediction model name | Parameter settings |
E-RFb | n_estimators=300, criterion=entropy, max_features=sqrt |
E-SVMc | kernel=linear |
F-RFd | n_estimators=300, max_features=auto, criterion=gini |
F-SVMe | kernel=linear |
aPPD: postpartum depression.
bE-RF: model built using the random forest algorithm and expert consultation method.
cE-SVM: model built using the support vector machine algorithm and expert consultation method.
dF-RF: model built using the random forest algorithm and random forest-based filter feature selection method.
eF-SVM: model built using the support vector machine algorithm and random forest-based filter feature selection method.
For the test set, we used the trained models to test and compare their prediction of PPD with real data and created a confusion matrix (
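Accuracy = $\frac{a + d}{a + b + c + d}$
Misclassification rate = $\frac{b + c}{a + b + c + d}$
Positive predictive value = $\frac{a}{a + c}$
Negative predictive value = $\frac{d}{b + d}$
Sensitivity (Sen) = $\frac{a}{a + b}$
Specificity (Spe) = $\frac{d}{c + d}$
Geometric mean = $\sqrt{\mathrm{Sen} \times \mathrm{Spe}}$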
Confusion matrix.
Predicted results | Real results: positive | Real results: negative
Positive | a | c
Negative | b | d
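As an illustration, the evaluation indices can be computed directly from the four confusion matrix cells. The sketch below uses the standard definitions of these indices with a, b, c, and d laid out as in the confusion matrix above (a = true positives, b = false negatives, c = false positives, d = true negatives). The function name and the example counts are hypothetical and are not results from the study.

```python
import math


def confusion_metrics(a, b, c, d):
    """Evaluation indices from confusion matrix cells.

    a: predicted positive, really positive (true positives)
    b: predicted negative, really positive (false negatives)
    c: predicted positive, really negative (false positives)
    d: predicted negative, really negative (true negatives)
    """
    total = a + b + c + d
    sensitivity = a / (a + b)
    specificity = d / (c + d)
    return {
        "accuracy": (a + d) / total,
        "misclassification_rate": (b + c) / total,
        "positive_predictive_value": a / (a + c),
        "negative_predictive_value": d / (b + d),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "geometric_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical counts for a 127-person test set (not the study's results).
metrics = confusion_metrics(a=30, b=14, c=12, d=71)
```

Accuracy and the misclassification rate always sum to 1, which gives a quick sanity check on any reported table.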
The sensitivity and the receiver operating characteristic curve-area under the curve (ROC-AUC) were used to evaluate the effects of each model and choose the best prediction model. To select the optimal model, we first retained the models with an ROC-AUC>0.75 to confirm that they had a good overall prediction effect. Among these, we then selected the model with the highest sensitivity as the best prediction model, thus ensuring that as many mothers as possible with a high risk of PPD would be detected.
This study used the REDCap system to build a database and SPSS version 18.0 to clean the data. The training and test sets were generated with the “sklearn.model_selection.train_test_split” function. The RF models were fitted with the “sklearn.ensemble.RandomForestClassifier” class and the SVM models with the “sklearn.svm.SVC” class. Cross-validation was performed using the “sklearn.cross_validation” module. All of these are part of the scikit-learn library and were run in Python 3.6.
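A minimal sketch of how such a pipeline might look in scikit-learn is given below. It is not the study's code: the data are synthetic (508 rows and 7 features, generated with make_classification), the 5-fold cross-validation and the random seeds are assumptions, and cross-validation uses sklearn.model_selection.cross_val_score because the older sklearn.cross_validation module has been removed from recent scikit-learn releases. The random forest uses max_features="sqrt", which is what "auto" meant for classifiers in older releases.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the cohort data (NOT the study data).
X, y = make_classification(
    n_samples=508, n_features=7, n_informative=4, n_redundant=0, random_state=0
)

# 75% of participants for training, the remaining 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "RF": RandomForestClassifier(
        n_estimators=300, criterion="gini", max_features="sqrt", random_state=0
    ),
    "SVM": SVC(kernel="linear"),
}

results = {}
for name, model in models.items():
    # Assumed 5-fold cross-validation on the training set only.
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=5).mean()
    model.fit(X_train, y_train)
    # SVC exposes decision_function; the random forest exposes predict_proba.
    if hasattr(model, "decision_function"):
        scores = model.decision_function(X_test)
    else:
        scores = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "cv_accuracy": cv_accuracy,
        "sensitivity": recall_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, scores),
    }
```

The numbers this produces describe the synthetic data only and say nothing about the models reported here.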
The predictive features obtained by the expert consultation and FFS-RF methods are shown in
Age
Education
Monthly income level
Husband’s education
Husband’s monthly income level
Marital satisfaction
Sexual, psychological, or physical spousal abuse
Childhood abuse history
Premenstrual syndrome-mood instability
Premenstrual syndrome-sleep changes
Depression history of woman
Depression history of family members
Other mental illness history of woman
Other mental illness history of family members
Mother’s menopausal symptoms
Level of psychological resilience
Depressive symptoms in the third trimester
Level of psychological resilience
Depressive symptoms in first trimester
Monthly income level
Husband’s monthly income level
Husband’s education
Education
Mother’s menopausal symptoms
PPD prediction models were established using the RF and SVM modeling applied to the training data set, using the feature sets constructed through our two feature selection methods. The optimal parameters of each model are shown in
The model evaluation index is shown in
Test data sets for each model evaluation index.
Items | E-RFa | E-SVMb | F-RFc | F-SVMd |
Misclassification rate | 0.28 | 0.20 | 0.27 | 0.22 |
Sensitivity | 0.48 | 0.68 | 0.48 | 0.69 |
Specificity | 0.86 | 0.87 | 0.86 | 0.83 |
Positive predictive value | 0.63 | 0.72 | 0.63 | 0.68 |
Negative predictive value | 0.76 | 0.84 | 0.76 | 0.84 |
Geometric mean | 0.84 | 0.76 | 0.64 | 0.76 |
ROC-AUCe | 0.75 | 0.81 | 0.70 | 0.78 |
aE-RF: model built using the random forest algorithm and expert consultation method.
bE-SVM: model built using the support vector machine algorithm and expert consultation method.
cF-RF: model built using the random forest algorithm and random forest-based filter feature selection method.
dF-SVM: model built using the support vector machine algorithm and random forest-based filter feature selection method.
eROC-AUC: receiver operating characteristic curve-area under the curve.
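The two-step selection rule from the Methods (keep models with ROC-AUC>0.75, then take the highest sensitivity) can be applied to the sensitivity and ROC-AUC values reported in the table above. The sketch below is illustrative only; the function name is hypothetical.

```python
# Sensitivity and ROC-AUC as reported in the table above.
reported = {
    "E-RF": {"sensitivity": 0.48, "roc_auc": 0.75},
    "E-SVM": {"sensitivity": 0.68, "roc_auc": 0.81},
    "F-RF": {"sensitivity": 0.48, "roc_auc": 0.70},
    "F-SVM": {"sensitivity": 0.69, "roc_auc": 0.78},
}


def select_best(metrics, auc_threshold=0.75):
    """Keep models with ROC-AUC strictly above the threshold, then pick
    the one with the highest sensitivity. Returns None if none qualify."""
    eligible = {
        name: m for name, m in metrics.items() if m["roc_auc"] > auc_threshold
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["sensitivity"])


best = select_best(reported)
```

With a strict threshold, E-RF (ROC-AUC=0.75) and F-RF are excluded, and F-SVM is chosen over E-SVM on sensitivity (0.69 vs 0.68).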
The receiver operating characteristic curve of E-RF. AUC: area under the curve; ROC: receiver operating characteristic.
The receiver operating characteristic curve of E-SVM. AUC: area under the curve; ROC: receiver operating characteristic.
The receiver operating characteristic curve of F-RF. AUC: area under the curve; ROC: receiver operating characteristic.
The receiver operating characteristic curve of F-SVM. AUC: area under the curve; ROC: receiver operating characteristic.
The features selected by the expert consultation method and FFS-RF method were put into the E-RF and F-RF models, respectively. The importance of the features was ranked as shown in
The relative feature importance rankings of the E-RF and the F-RF based on the two feature selection methods.
Level of psychological resilience
Depressive symptoms in the third trimester
Monthly income level
Husband’s education
Education
Husband’s monthly income level
Mother’s menopausal symptoms
Premenstrual syndrome-mood instability
Marital satisfaction
Age
Level of psychological resilience
Depressive symptoms in early pregnancy
Monthly income level
Husband’s monthly income level
Husband’s education
Education
Mother’s menopausal symptoms
We compared four PPD prediction models and provided a reference for the application of ML in PPD. Compared with the expert consultation method, the FFS-RF method identified fewer predictive factors. We found that the F-SVM model was the best model. The strongest predictive factor was the psychological resilience of pregnant women.
Of the two feature selection methods, FFS-RF selected far fewer predictive factors than expert consultation. Furthermore, there was no significant difference between the two methods in terms of their effects on model performance, indicating that the FFS-RF method could reduce dimensionality and improve the efficiency of the algorithm without changing model predictive performance. The reduction in the number of predictive factors means that the burden of collecting information is reduced, making the model easier to implement and popularize, especially in busy obstetric clinics.
The SVM was chosen as the better algorithm, as it showed higher sensitivity than the RF algorithm (E-SVM=0.68, F-SVM=0.69, E-RF=0.48, F-RF=0.48). SVM had a clear advantage over RF in processing our research data, and the small sample size may be the main reason for this finding. Previous research on depression suggested that sample size is a key factor affecting the performance of ML models. When the sample size is small, SVM can avoid overfitting while keeping computing time low and produces better prediction results in depression [
We found that the top 3 most important predictors in the models were psychological resilience, depression during the third trimester, and monthly income level. First, psychological resilience is the most important factor in the prediction of PPD, which can be attributed to its protective effect. Pregnancy and childbirth are a challenging time for women emotionally and physiologically, and the mother's body and mind are under greater stress [
The identification of these predictors also reveals the different aspects of PPD risk factors. A pregnant woman's psychological resilience may reflect her personality traits. Depression in the third trimester may be a special symptom accompanying pregnancy. The income of a pregnant woman and her partner reflects the stability and coping resources available to them. This indicates that PPD risk should be assessed based on a combination of individual long-term, short-term, and environmental characteristics.
This study has several limitations. First, there was potential selection bias. Women who were not lost to follow-up might have had a greater awareness of mental health services. Second, the roughly 55% loss to follow-up (618 of 1126 participants) and small sample size may have negatively affected the applicability of the PPD model, indicating that more extensive validation is required. Third, a larger number of potential predictive factors would have been useful. Further studies should develop different PPD models using other ML algorithms and data from different sources as well as incorporating additional cultural factors to expand the application of the PPD models.
Comparison of candidate predictors in the sample of pregnant women (N=508).
Comparison of demographic characteristics between the 618 pregnant women lost to follow-up and the 508 mothers who completed the cohort study after childbirth.
Definitions and coding of analyzed variables.
BRS: Brief Resilience Scale
E-RF: model built using the random forest algorithm and expert consultation method
E-SVM: model built using the support vector machine algorithm and expert consultation method
EPDS: Edinburgh Postnatal Depression Scale
F-RF: model built using the random forest algorithm and random forest-based filter feature selection method
F-SVM: model built using the support vector machine algorithm and random forest-based filter feature selection method
FFS-RF: random forest-based filter feature selection
GAD-7: Generalized Anxiety Disorder-7
ML: machine learning
PPD: postpartum depression
PSQI: Pittsburgh Sleep Quality Index
RF: random forest
ROC-AUC: receiver operating characteristic curve-area under the curve
SSRS: Social Support Rating Scale
SVM: support vector machine
We acknowledge the people who have contributed to this study, including Professor KK Cheng from the University of Birmingham and Liu Lu from Central South University. This project is funded by the National Natural Science Foundation of China (Grant No 81402690, 81773446), the National Natural Science Foundation of Hunan Province (Grant No 2019JJ40351), and the Graduate Research and Innovation Project of Central South University (Grant No 1053320183626).
WZ, as the first author, developed the initial manuscript, helped with recruitment of the participants, and collected the data. WZ and HL performed the statistical analysis. HL and VS contributed substantially to the revision and refinement of the final manuscript. WG and PQ guided the overall design of the study and supervised the model development and manuscript. WG and PQ contributed equally to this paper.
None declared.