Published on 23.03.20 in Vol 8, No 3 (2020): March
Preprints (earlier versions) of this paper are available at http://preprints.jmir.org/preprint/17110, first published Nov 19, 2019.
Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study
Background: Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has been reported to have better predictive performance than conventional statistical modeling.
Objective: We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan.
Methods: Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables.
Results: Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic-pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively.
Conclusions: Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
JMIR Med Inform 2020;8(3):e17110
Metabolic syndrome is a cluster of disorders, including insulin resistance or hyperglycemia, visceral adiposity (identified by a large waistline or overweight), atherogenic dyslipidemia (eg, raised triglycerides or reduced high-density lipoprotein [HDL]), and endothelial dysfunction (characterized by elevated blood pressure). Metabolic syndrome has significant impacts on the development and deterioration of several diseases and is a critical predictor of cardiovascular diseases. Numerous modifiable risk factors and practical intervention strategies regarding metabolic syndrome have been proposed. Identifying high-risk patients to prevent the incidence and deterioration of metabolic dysregulation and relevant diseases is therefore vital.
A recent study showed that nonalcoholic fatty liver disease (NAFLD) is closely correlated with metabolic syndrome. Patients with metabolic syndrome frequently show increased fat accumulation in the liver (steatosis) and hepatic insulin resistance. However, the gold-standard method for NAFLD diagnosis is liver biopsy, which is a highly invasive procedure. Several reports have demonstrated that ultrasound using FibroScan, also known as transient elastography, can accurately assess the staging of NAFLD in a noninvasive manner with results comparable to those of liver biopsy. The new models of FibroScan (marketed after 2013) can assess the staging of NAFLD using a liver stiffness score (E score) and a liver steatosis score (controlled attenuation parameter [CAP] score). Interestingly, the CAP score alone was found to be a useful indicator of the presence and severity of metabolic syndrome. Using traditional statistical modeling, we previously validated this finding, confirming that the CAP score alone can be used to detect metabolic syndrome with moderate accuracy (area under the receiver operating characteristic [ROC] curve of 0.79), and that this value improved to 0.88 when the CAP score was combined with other biomarkers.
Machine learning, whereby a computer algorithm learns from prior experience, was recently shown to perform better than traditional statistical modeling approaches. Various supervised machine learning models based on decision trees have been successfully applied to medical data for accurate prediction of a wide range of clinical conditions such as myocardial infarction, atrial fibrillation, trauma, breast cancer, Alzheimer disease, cardiac surgery, and others. However, each decision tree machine learning algorithm has its own strengths and weaknesses. Therefore, comparing different decision tree algorithms can reduce the bias in the results and provide a more robust outcome. Accordingly, the aim of this study was to determine whether decision tree algorithms can predict the state of metabolic syndrome among self-paid health examination subjects who were examined with FibroScan.
This was a single-center retrospective cohort study. The cohort comprised self-paid health examination subjects at the Health Management Center of Taipei Medical University Hospital who were examined with FibroScan from September 2015 to December 2018.
The electronic healthcare records of subjects examined with FibroScan were reviewed at Taipei Medical University Hospital, which is a private, tertiary-care, 800-bed teaching hospital in Taiwan. The Institutional Review Board of Taipei Medical University Hospital approved the study design for data collection (TMU-JIRB No.: N201903080) in accordance with the original and amended Declaration of Helsinki. The requirement for informed consent was waived owing to the retrospective nature of the study.
Population and Data Collection
The study included all Taiwanese adult patients aged >18 years who had undergone a self-paid health examination comprising an abdominal transient elastography inspection using FibroScan 502 Touch (Echosens, Paris, France). Individuals who underwent FibroScan examination on physician’s orders were excluded. The routine protocols of the Health Management Center were applied to all participants. The subjects were first interviewed by thoroughly trained personnel who verified the correctness of self-completed questionnaires on demographics, existing medical conditions, and medication use. In addition, the personnel confirmed adherence to health examination prerequisites (eg, overnight fasting for at least 8 hours) for the package chosen by the study participant. Those found to have not fulfilled the necessary prerequisites were advised to reschedule their appointment. Anthropometrics, including weight, height, waist circumference, and arterial pressure, were measured. Instruments were regularly calibrated per the manufacturer’s specifications. According to the chosen package, the required samples of blood, urine, and specimens were collected for laboratory tests. Regular laboratory test items included alpha-fetoprotein, glycated hemoglobin (HbA1c), serum glutamic oxaloacetic transaminase (GOT), serum glutamic pyruvic transaminase (GPT), uric acid, creatinine, blood urea nitrogen, red blood cell count, hemoglobin, hematocrit, mean corpuscular hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin concentration, platelet count, white blood cell count, percentage of neutrophils, lymphocytes, monocytes, eosinophils and basophils, total protein, albumin, globulin, albumin/globulin ratio, total bilirubin, direct bilirubin, alkaline phosphatase, gamma-glutamyl transpeptidase (γ-GT), total cholesterol, LDL cholesterol, high-density lipoprotein (HDL) cholesterol, LDL/HDL ratio, triglycerides, fasting blood sugar, and thyroid-stimulating hormone.
The estimated glomerular filtration rate (eGFR) was calculated using the Modification of Diet in Renal Disease equation modified for Chinese patients with chronic kidney disease (CKD): $\mathrm{eGFR} = 175 \times \mathrm{Scr}^{-1.234} \times \mathrm{Age}^{-0.179} \times 0.79\ \text{(if female)}$, where Scr is serum creatinine. CKD was defined as an eGFR of <60 mL/min per 1.73 m² of body surface area, according to the definition from the Kidney Disease Outcomes Quality Initiative for CKD ≥ stage 3. Body mass index categories were defined as follows: obesity, ≥27 kg/m²; overweight, 24-26.9 kg/m²; and normal weight, <23.9 kg/m², according to the ranges established for Asian populations by the Ministry of Health and Welfare of Taiwan.
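The eGFR equation and the CKD cutoff described above can be sketched in a few lines of Python. This is an illustration only: the study itself used R and SPSS, the function names are hypothetical, and serum creatinine is assumed to be in mg/dL (the unit listed for creatinine in the baseline table).

```python
def egfr_mdrd_chinese(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) from the modified MDRD equation quoted in the text.

    Assumption: serum creatinine is given in mg/dL and age in years.
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    egfr = 175.0 * scr_mg_dl ** -1.234 * age_years ** -0.179
    # The 0.79 correction factor applies to female subjects only.
    return egfr * 0.79 if female else egfr


def has_ckd_stage3_or_worse(egfr: float) -> bool:
    """CKD as defined in the text: eGFR below 60 mL/min/1.73 m^2."""
    return egfr < 60.0
```

Keeping the sex correction as a final multiplication mirrors the way the equation is written, which makes the code easy to check term by term against the formula.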
According to the National Cholesterol Education Program Adult Treatment Panel III consensus definition, metabolic syndrome was identified if at least three of the following five criteria were present: large waistline (≥80 cm for women and ≥90 cm for men), high triglycerides (≥150 mg/dL) or use of medication to control triglycerides, reduced HDL levels (<50 mg/dL for women and <40 mg/dL for men) or use of medication to control HDL, elevated blood pressure (systolic blood pressure ≥130 mmHg or diastolic blood pressure ≥85 mmHg) or use of relevant medication to control blood pressure, and increased fasting blood sugar (≥100 mg/dL) or use of relevant medication to control blood sugar. The cutoff points were adopted from the National Cholesterol Education Program Adult Treatment Panel III consensus definition, with ethnicity-specific cutoff points for waist circumference and an equality principle for the five disorders.
FibroScan is a noninvasive device that assesses the hardness of the liver using ultrasound-based elastography. Liver hardness is evaluated by measuring the velocity of a vibration wave, which is determined by measuring the time that the vibration wave takes to travel to a particular depth inside the liver from the skin. For each FibroScan inspection, two scores are reported: the CAP score and the E score. The dashboard of FibroScan provides a CAP score only when an E score derived from identical signals is validated as successfully computed; higher E scores indicate higher transmission velocity and liver stiffness levels, and higher CAP scores indicate faster wave amplitude attenuation and higher levels of liver steatosis. Notably, the choice of probe size (medium or extra large) is based on the recommendation of the instrument's autodetection function.
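The three-of-five rule described above can be expressed as a small Python sketch. The class and field names are hypothetical, the cutoffs are the standard Adult Treatment Panel III values with the Asian waist thresholds cited in the text, and the code is an illustration rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    female: bool
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mmhg: float
    diastolic_mmhg: float
    fasting_glucose_mg_dl: float
    # Medication flags: hypothetical names for the questionnaire items.
    triglyceride_medication: bool = False
    hdl_medication: bool = False
    blood_pressure_medication: bool = False
    glucose_medication: bool = False


def criteria_met(exam: Exam) -> list:
    """Return the names of the metabolic syndrome criteria the subject meets."""
    waist_cutoff = 80.0 if exam.female else 90.0
    hdl_cutoff = 50.0 if exam.female else 40.0
    checks = {
        "large waistline": exam.waist_cm >= waist_cutoff,
        "high triglycerides": exam.triglycerides_mg_dl >= 150.0
        or exam.triglyceride_medication,
        "reduced HDL": exam.hdl_mg_dl < hdl_cutoff or exam.hdl_medication,
        "elevated blood pressure": exam.systolic_mmhg >= 130.0
        or exam.diastolic_mmhg >= 85.0
        or exam.blood_pressure_medication,
        "increased fasting blood sugar": exam.fasting_glucose_mg_dl >= 100.0
        or exam.glucose_medication,
    }
    return [name for name, met in checks.items() if met]


def has_metabolic_syndrome(exam: Exam) -> bool:
    """Each criterion counts equally; three or more define the syndrome."""
    return len(criteria_met(exam)) >= 3
```

Returning the list of criteria rather than only a yes/no answer makes it possible to see which components drive the classification for a given subject.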
Machine Learning Technique
A decision tree is a widely used and effective nonparametric machine learning modeling technique for regression or classification purposes. To obtain solutions, a decision tree makes a sequential, hierarchical decision regarding outcome variables based on the predictor.
Classification and Regression Trees
Classification and regression trees (CART), the typical tree-based models, explore the structure of data while building decision rules that can be visualized for predicting a categorical (classification tree) or continuous (regression tree) outcome. The decision at each internal node is assessed by information gain or entropy to compare the value of attributes in the data from the root to each of the leaves. CART was generated through the “rpart” package in R.
C5.0 is derived from C4.5 and ID3, with improvements that address the disadvantages of its predecessors. The “C50” package was applied to implement the C5.0 tree.
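To make the split criterion concrete, the following Python sketch computes entropy and the information gain of one candidate threshold on a continuous predictor. It is a toy illustration with hypothetical function names, not the rpart implementation; note that rpart uses the Gini index by default for classification and switches to the information criterion only when requested through its `parms` argument.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted_children
```

A tree-growing routine would evaluate this gain for every candidate threshold of every predictor and keep the split with the largest value before recursing into the two child nodes.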
Chi-Square Automatic Interaction Detection
Chi-square automatic interaction detection (CHAID) is a specific decision tree using adjusted significance testing (Bonferroni testing) for prediction. An algorithm for recursive partitioning is implemented by maximizing the significance of a chi-square statistic for crosstabulations between the categorical dependent variable and the categorical predictors at each partition. Moreover, CHAID can create nonbinary trees since nominal, ordinal, and continuous data are used. The CHAID tree is available from the “CHAID” package in R.
Conditional Inference Trees
Conditional inference trees (ctrees) embed tree-structured regression models into a well-defined theory of conditional inference procedures. They use a significance test procedure to select variables instead of selecting the variable that maximizes any information measure. In addition, ctree is applicable to all types of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables, as well as arbitrary measurement scales of covariates. A flexible and extensible computational tool in the “partykit” package of R is suitable for fitting and visualizing ctrees.
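The chi-square statistic that CHAID maximizes at each partition can be illustrated with a short Python sketch. The example uses the sex-by-status counts reported in this article's baseline table; the function names are hypothetical, the statistic is the plain Pearson form without continuity correction, and the Bonferroni helper is a simplified stand-in for the adjustment that the CHAID package applies.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a crosstabulation given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def bonferroni_adjust(p_value, n_tests):
    """Simplified Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Rows: female, male. Columns: no metabolic syndrome, metabolic syndrome.
sex_by_status = [[564, 44], [576, 149]]
sex_statistic = chi_square_statistic(sex_by_status)
```

For this table the statistic is far above 10.83, the critical value for P=.001 with 1 degree of freedom, which agrees with the P<.001 reported for sex in the baseline comparison.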
Evolutionary Learning of Globally Optimal Trees
Evolutionary learning of globally optimal trees (evtree) is an alternative to recursive partitioning methods, which build models using a forward stepwise search and therefore yield only locally optimal trees. An evtree is learned using an evolutionary algorithm. First, a set of trees is initialized with random split rules in the root nodes. Mutation and crossover operators are then applied to modify the tree’s structure and the tests that are applied in the internal nodes. After each modification step, a survivor selection mechanism identifies the best candidate models for the next iteration, terminating when the quality of the best trees ceases to improve. The “evtree” package in R applies an evolutionary algorithm for learning globally optimal classification and regression trees.
Generalized Linear Model Trees
Generalized linear model trees (glmtree) involve model-based recursive partitioning based on generalized linear models. They are convenient for fitting model-based recursive partitions using “mob” functions in R. A glmtree internally sets up a model-fit function for mob using the negative log likelihood as the objective function. It is also implemented by the “partykit” package in R.
Random decision forests are an ensemble learning method for classification, regression, or other applications that builds many decision trees at the time of training. The idea of random forest is to create multiple decision trees (CART) and then combine the output generated by each of the decision trees. In the decision tree algorithm, the Gini index measures how often a randomly chosen element from the set would be incorrectly labeled. The Gini index is calculated by subtracting the sum of the squared probabilities of each class from 1. Combining many trees reduces the error (chiefly the tendency to overfit) that a single decision tree model might introduce to a system while considerably improving the predictive power. In addition, random forests can be used to rank the importance of variables in a regression or classification problem in a natural manner, which can be conducted in the R package “randomForest”.
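The Gini index calculation described above, together with the way a forest combines its trees for classification, can be sketched in Python as follows. The function names are hypothetical, and majority voting is shown as the usual combination rule for classification forests; this is an illustration, not the randomForest package code.

```python
from collections import Counter


def gini_index(labels):
    """1 minus the sum of squared class probabilities for a set of labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority_vote(tree_predictions):
    """Combine the class predicted by each tree into one forest prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Class mix of the study cohort before any split: 193 with and 1140 without
# metabolic syndrome (counts taken from the Results section).
cohort_gini = gini_index([1] * 193 + [0] * 1140)
```

A perfectly pure node has a Gini index of 0, and a balanced two-class node has 0.5; the unsplit cohort sits at roughly 0.25 because the classes are imbalanced.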
Statistical analysis was conducted using R (version 3.6.1; R Foundation for Statistical Computing, Vienna, Austria) or SPSS (version 17.0; SPSS Inc, Chicago, IL, USA) software.
Categorical variables were tested using the chi-square test or Fisher exact test. The nonparametric Mann-Whitney U-test was applied to determine differences in the median of continuous variables between the two groups. Multivariate logistic regression was employed to assess the significance of clinical data, and the variance inflation factor was also used to check for multicollinearity. P<.05 was considered statistically significant.
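For reference, the two quantities reported from the logistic regression can be written out explicitly. The notation below is introduced here for illustration (the article does not define symbols): $\beta_j$ is the fitted coefficient of predictor $j$, and $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all other predictors.

```latex
\mathrm{OR}_j = \exp(\beta_j), \qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

A predictor that is nearly a linear combination of the others has $R_j^2$ close to 1 and therefore a large VIF. Commonly cited rules of thumb flag values above 5 or 10; the article does not state which cutoff was applied.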
Principal Components Analysis
High-dimensional data were processed by principal components analysis (PCA), which uses an orthogonal transformation to convert a set of observations of correlated variables into uncorrelated principal components; the leading components were then used to provide a two-dimensional or three-dimensional visualization.
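The percentage of variability attributed to each principal component follows from the eigendecomposition of the covariance matrix. The notation is introduced here for illustration: $X_c$ is the $n \times p$ data matrix after centering each variable (whether the variables were also scaled to unit variance is not stated in the article and is left as an assumption).

```latex
S = \frac{1}{n-1} X_c^{\top} X_c, \qquad
S v_k = \lambda_k v_k \quad (\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p), \qquad
z_k = X_c v_k, \qquad
\text{explained variance of PC}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
```

A two-dimensional plot uses the scores $z_1$ and $z_2$ as coordinates, so the sum of the first two ratios indicates how much of the total variability the plot retains.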
Receiver Operating Characteristic Curve
ROC curves were used to illustrate the diagnostic ability of classification trees in the machine learning methodology. The ROC curve plots the true positive rate (also called sensitivity or recall) against the false positive rate (1 − specificity), and the area under the ROC curve (AUC) summarizes this plot. The F1 score, which constitutes the harmonic mean of precision and recall, was also evaluated. The F1 score has been widely used in the natural language processing literature and for machine learning.
Missing values were imputed using the expectation-maximization algorithm, which is an iterative procedure that preserves the relationship with other variables. Only 9 factors had missing values, and most of them accounted for less than 5% of the sample size. Direct bilirubin, which had the largest proportion of missing values (298/1333, 22.36%), was at high risk of multicollinearity; thus, it was not a crucial element in the model.
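The evaluation measures named above can be computed from a confusion matrix and a set of predicted scores as in the following Python sketch. The function names are hypothetical, and the AUC is obtained through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half), which is equivalent to the area under the ROC curve.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, false positive rate, precision, F1, accuracy."""
    sensitivity = tp / (tp + fn)          # true positive rate, recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc_from_scores(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size; a sort-based computation would be preferable for much larger datasets.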
Comparison of Decision Trees
To compare the performance of the aforementioned decision trees, the same training and testing sets were used for all of them. In addition, the height of each tree was limited to between 4 and 5 levels instead of pruning each decision tree according to its own criteria. Finally, outcomes from each decision tree were summarized to investigate common and reliable results supporting the conclusions.
After data cleaning, a total of 1333 individuals undergoing self-paid annual health examination were enrolled in this study. The baseline characteristics of the 193 patients diagnosed with metabolic syndrome and 1140 participants without metabolic syndrome are compared in Table 1. All categorical variables differed significantly between groups in the chi-square test. Among the continuous variables, most of the risk factors were highly significantly different between groups in the nonparametric test, although not enough evidence was found for age, alpha-fetoprotein, bilirubin, and thyroid-stimulating hormone to support rejection of the null hypothesis. However, the tendency of P values to become significant in large samples had to be considered owing to the numerous and complex data in this analysis. The foremost factors were therefore validated by a series of additional evaluations.
The visualization of the two groups was achieved by PCA with the advantage of dimensionality reduction. All factors with significant outcomes in the tests mentioned above (Table 1) produced an intermixed view because the two groups of patients overlapped, and the first two principal components, PC1 and PC2, had weak explanatory power at 27.7% and 13.2%, respectively. Three-dimensional PCA plots from a variety of viewpoints are also provided as a supplementary figure. The two groups could not be clearly discriminated, even when the coordinates were rotated in the three-dimensional graph.
|Risk factors||No metabolic syndrome (N=1140)||Metabolic syndrome (N=193)||P value|
|Chronic kidney diseasea, n (%)|
|Stage 1||585 (51.32)||71 (36.8)||.001|
|Stage 2||530 (46.49)||115 (59.6)|
|Stage 3||24 (2.11)||6 (3.1)|
|Stage 4||1 (0.09)||1 (0.5)|
|Sex, n (%)|
|female||564 (49.47)||44 (22.8)||<.001|
|male||576 (50.53)||149 (77.2)|
|Obesitya, n (%)|
|underweight||49 (4.30)||0 (0.0)||<.001|
|normal weight||667 (58.51)||22 (11.4)|
|overweight I||293 (25.70)||65 (33.7)|
|overweight II||131 (11.49)||106 (54.9)|
|Age (years), median (IQR)||44 (38-50)||45 (40-51)||.12|
|Hepatic indices, median (IQR)|
|Albumin (g/dL)||4.6 (4.4-4.8)||4.7 (4.5-4.8)||<.001|
|AFPb (ng/mL)||2.26 (1.637-3.12)||2.43 (1.71-3.14)||.15|
|ALKpc (IU/L)||58 (48-69)||62 (54-75)||<.001|
|GOTd (IU/L)||20 (17-24)||24 (19-31)||<.001|
|GPTe (IU/L)||19 (13-27)||33 (22-51)||<.001|
|Total bilirubin (mg/dL)||0.6 (0.5-0.8)||0.7 (0.5-0.9)||.08|
|Direct bilirubin (mg/dL)||0.2 (0.2-0.3)||0.2087 (0.2-0.3)||.66|
|γ-GTf (U/L)||16 (12-25)||29 (20-45)||<.001|
|CAPg score (dB/m)||239 (209-274)||311 (271-340)||<.001|
|E score (kPa)||4 (3.4-4.8)||4.9 (4.3-5.8)||<.001|
|Nephritic indices, median (IQR)|
|BUNh (mg/dL)||12 (10-15)||13 (11-15)||.03|
|Creatinine (mg/dL)||0.8 (0.6-0.9)||0.9 (0.7-1.0)||<.001|
|MDRDi||91.07 (81.3-105.17)||86.23 (75.05-98.82)||<.001|
|UAj (mg/dL)||5.2 (4.3-6.5)||6.3 (5.5-7.3)||<.001|
|Blood lipid and thyroid markers, median (IQR)|
|Cholesterol (mg/dL)||187 (165-208)||194 (165-220)||.03|
|LDLk (mg/dL)||121 (101-142)||136 (106-158)||<.001|
|HbA1cl (%)||5.4 (5.2-5.7)||5.7 (5.4-6.1)||<.001|
|TSHm (μIU/mL)||1.93 (1.30-2.52)||1.89 (1.38-2.51)||.83|
aProgressive discrete variables.
bAFP: alpha-fetoprotein.
cALKp: alkaline phosphatase.
dGOT: glutamic-oxaloacetic transaminase.
eGPT: glutamic-pyruvic transaminase.
fγ-GT: gamma-glutamyl transpeptidase.
gCAP: controlled attenuation parameter.
hBUN: blood urea nitrogen.
iMDRD: Modification of Diet in Renal Disease.
jUA: uric acid.
kLDL: low-density lipoprotein cholesterol.
lHbA1c: glycated hemoglobin.
mTSH: thyroid-stimulating hormone.
Next, we applied multivariate logistic regression to assess factors influencing metabolic syndrome. As shown in Table 2, the number of significant variables was reduced to 3, and included obesity, CAP score, and HbA1c. Among these, HbA1c was obtained from blood tests, whereas information on obesity and the CAP score was obtained through noninvasive means. Notably, obesity and HbA1c exhibited high odds ratios, exceeding 2. In addition, the variance inflation factor was taken into account for multicollinearity.
|Factor||Odds ratioa (95% CI)||VIFb||ΔVIFc||P value|
|Sex, Male/Female||0.742 (0.335-1.641)||3.590||1.630||.99|
|Age, years||1.025 (0.999-1.051)||1.622||1.506||.15|
|Albumin, g/dL||1.866 (0.821-4.239)||1.218||1.176||.12|
|AFPd, ng/mL||1.045 (0.915-1.193)||1.162||1.153||.48|
|ALKpe, IU/L||0.995 (0.983-1.007)||1.158||1.140||.52|
|GOTf, IU/L||0.959 (0.923-0.997)||7.226||-||-|
|GPTg, IU/L||1.023 (1.003-1.045)||7.747||1.555||.51|
|Total bilirubin, mg/dL||2.599 (0.562-12.015)||8.334||1.246||.39|
|Direct bilirubin, mg/dL||0.011 (0-2.507)||8.413||-||-|
|γ-GTh, U/L||1.002 (0.994-1.009)||1.414||1.379||.77|
|CAPi score, dB/m||1.011 (1.007-1.016)||1.455||1.398||<.001|
|E score, kPa||1.046 (0.926-1.182)||1.284||1.256||.61|
|BUNk, mg/dL||0.952 (0.893-1.016)||1.397||1.338||.13|
|Creatinine, mg/dL||4.288 (0.196-94.014)||10.957||-||-|
|UAm, mg/dL||1.127 (0.967-1.314)||1.642||1.596||.08|
|Cholesterol, mg/dL||1.003 (0.986-1.021)||9.855||-||-|
|LDLn, mg/dL||0.994 (0.976-1.012)||9.701||1.069||.82|
|HbA1co, %||2.170 (1.631-2.888)||1.236||1.230||<.001|
|TSHp, μIU/mL||0.876 (0.727-1.054)||1.086||1.078||.13|
aThe odds ratio represents the exp(β), which is the exponential of the estimator in logistic regression.
bVIF: variance inflation factor (to check multicollinearity); factors with high VIF values are italicized.
cΔVIF: variance inflation factor after removal of predictor variables with high VIF values; VIF values with a sharp decline are italicized.
dAFP: alpha-fetoprotein.
eALKp: alkaline phosphatase.
fGOT: glutamic-oxaloacetic transaminase.
gGPT: glutamic-pyruvic transaminase.
hγ-GT: gamma-glutamyl transpeptidase.
iCAP: controlled attenuation parameter.
jCKD: chronic kidney disease.
kBUN: blood urea nitrogen.
lMDRD: Modification of Diet in Renal Disease.
mUA: uric acid.
nLDL: low-density lipoprotein cholesterol.
oHbA1c: glycated hemoglobin.
pTSH: thyroid-stimulating hormone.
To inspect the potential indices of metabolic syndrome, several types of decision trees were applied to the health examination data for classification of metabolic syndrome. In general, obesity, CAP score, and HbA1c were found to be important predictive variables in the decision trees. Moreover, the important variables appearing in each node of the decision trees were recorded across 100 repetitions (Table 3). CAP score, obesity, and HbA1c were the dominant variables at the root, and E score, γ-GT, LDL, and GPT were secondary variables in the decision trees. The thresholds for factors classified as nodes are listed on the branches of each decision tree. In addition, a right-skewed pattern at the leaves was apparent and expected because the classification of metabolic syndrome was achieved efficiently and hierarchically by the decision trees.
|Decision tree||Rootb||Primary nodec (root included)||Secondary noded|
|CAP score (0.99)|
E score (0.05)v
E score (0.21)v
|Total bilirubin (0.19)|
CAP score (0.17)
|C5.0||CAP score (0.90)|
CAP score (0.94)
Total bilirubin (0.41)
E score (0.29)v
|CHAIDr||Obesity (1.00)||Obesity (1.00)|
CAP score (0.70)
|CAP score (1.21)|
E score (0.32)v
CAP score (0.04)
CAP score (0.85)
CAP score (0.67)
CAP score (0.17)
CAP score (0.53)
CAP score (0.44)
E score (0.25)v
|glmtreeu||CAP score (0.85)|
CAP score (0.92)
CAP score (0.22)
aMajor variables are listed with their weights as candidate nodes in each decision tree; since some variables may be considered candidate nodes in the decision tree more than once, the proportion of variables can be larger than 1.
bThe root shows factors appearing as the first classified node and their proportions.
cThe primary node (italicized) includes variables selected as the top three nodes (root included) with their proportions (>0.05); variables with lower weights as candidate nodes in the primary nodes are excluded.
dThe secondary node includes all remaining candidate nodes in each decision tree with their proportions; only candidate nodes with proportions >0.1 with a certain influence in the classification of metabolic syndrome are shown.
eCART: classification and regression trees.
fCAP: controlled attenuation parameter.
gHbA1c: glycated hemoglobin.
hγ-GT: gamma-glutamyl transpeptidase.
iGOT: glutamic-oxaloacetic transaminase.
jUA: uric acid.
lALKp: alkaline phosphatase.
mGPT: glutamic-pyruvic transaminase.
nLDL: low-density lipoprotein cholesterol.
oMDRD: Modification of Diet in Renal Disease.
pTSH: thyroid-stimulating hormone.
qCKD: chronic kidney disease.
rCHAID: chi-square automatic interaction detection.
sctree: conditional inference tree.
tevtree: evolutionary learning of globally optimal tree.
uglmtree: generalized linear model tree.
vSecondary variables for classification of metabolic syndrome in several decision tree algorithms.
PCA was then applied again to visualize the nonmetabolic syndrome and metabolic syndrome groups according to the prominent factors from the decision trees, which comprised the CAP score, obesity, and HbA1c. PC1 and PC2 explained greater variability, at 56.7% and 29.1%, respectively. With this analysis, discrimination between the two groups was evident, although the two groups still overlapped at their boundary.
Finally, the accuracies of various decision trees were determined using 500 rounds of random sampling from the entire health examination dataset with fixed-size divisions of training and testing sets. Independent training and testing sets were used for each evaluation to confirm the performance and reliability of each model. The AUC of the ROC curve was determined to evaluate the performance of each decision tree and random forest. Prominent variables were also obtained with random forest. In general, CAP score, obesity, HbA1c, GPT, and γ-GT were the leading variables in accuracy, whereas CAP score, HbA1c, obesity, GPT, γ-GT, and E score played essential roles in random forest for classification.
aAccuracy and F1-score were calculated from 500 machine learning trials with different training sets for comparison with the number of candidate trees from random forest. Accuracy is the probability of true positives and true negatives for all data, whereas F1-score is a measure of performance, which is the harmonic mean of precision and recall. The dataset was divided 80% as the training set and 20% as the testing set independently for each analysis with randomized sampling.
bCART: classification and regression trees.
cCHAID: chi-square automatic interaction detection.
dctree: conditional inference tree.
eevtree: evolutionary learning of globally optimal tree.
fglmtree: generalized linear model tree.
gThe terminal nodes of the R package glmtree are not a simple classification form to calculate the confusion matrix for accuracy; therefore, the area under the curve was used to reach a balance in comparison between the seven decision tree techniques on the same training and testing set.
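The repeated random 80%/20% splitting described in the footnotes above can be sketched in Python as follows. Everything in this example is an assumption made for illustration: the data are synthetic (a single made-up predictor, not study data), the classifier is a one-split decision stump rather than any of the tree packages used in the study, and the function names are hypothetical.

```python
import random
import statistics


def make_toy_data(n, rng):
    """Synthetic data only: one predictor that tends to be higher in class 1."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.15 else 0
        rows.append((rng.gauss(1.5 if label else 0.0, 1.0), label))
    return rows


def fit_stump(train):
    """Choose the threshold t maximising training accuracy of 'x >= t -> class 1'."""
    best_threshold, best_accuracy = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = sum((x >= t) == bool(y) for x, y in train) / len(train)
        if acc > best_accuracy:
            best_threshold, best_accuracy = t, acc
    return best_threshold


def accuracy(threshold, rows):
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def repeated_holdout(data, rounds=500, train_fraction=0.8, seed=0):
    """Mean and SD of test accuracy over independent random train/test splits."""
    rng = random.Random(seed)
    n_train = int(len(data) * train_fraction)
    scores = []
    for _ in range(rounds):
        shuffled = data[:]
        rng.shuffle(shuffled)          # a fresh random split in every round
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(accuracy(fit_stump(train), test))
    return statistics.mean(scores), statistics.stdev(scores)


toy = make_toy_data(150, random.Random(42))
mean_accuracy, sd_accuracy = repeated_holdout(toy, rounds=500)
```

Reporting the mean together with the spread over the rounds shows how sensitive a model is to the particular split, which a single train/test division cannot reveal.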
The use of artificial intelligence in health care, particularly machine learning methods, can help to discover underlying patterns and correlations through the learning of data-driven prediction models. We applied various machine learning techniques to visualize and investigate predictive variables leading to metabolic syndrome, which revealed that obesity, serum GOT, serum GPT, CAP score, and HbA1c are the most important predictive variables.
Among these predictive variables, the predictive power of the CAP score was similar to that of other key indices such as obesity. Despite the significance of the CAP score, these factors make sense cumulatively rather than as exclusive alternatives. In other words, more research is required to determine whether the CAP score can be used as a standalone test method to screen for metabolic syndrome, and whether a minimum set of nonblood test variables can be combined with the CAP score to improve the accuracy of predicting metabolic syndrome. Such future research may help subjects who are resistant to the inconvenience of overnight starvation or painful blood assays.
Metabolic syndrome demonstrates a spectrum of physiological manifestations with groups of pathologies that are complicated and progressive. Traditional diagnostic criteria often dichotomize the population into those with metabolic syndrome and those without. However, based on the results of our PCA, such a sharp distinction may be inappropriate. We found that CAP score, obesity, and HbA1c were the principal factors predicting metabolic syndrome, although E score, γ-GT, LDL, and GPT also considerably affected the predictions. Notably, GPT had more predictive power than GOT. We consider this difference to be related to aspartate aminotransferase as a relatively less specific indicator of liver damage than alanine aminotransferase, which is common in patients with fatty liver. Our study suggests that current diagnostic criteria for metabolic syndrome fail to capture its wide range of presentations, and should thus be expanded to include hepatic and nephritic indices.
Liver-related indices such as γ-GT, GPT, and E score ranked among the highest predictors in our models. A previous study also showed a strong correlation between liver function tests and metabolic syndrome based on Pearson correlation coefficients . HbA1c is reported to be more closely associated with several chronic diseases than fasting plasma glucose. In addition, although fasting glucose levels are commonly believed to be reproducible across days, acute perturbations of glucose homeostasis due to stress and other factors have been described. By contrast, HbA1c is not influenced by acute perturbations or insufficient fasting; thus, it can be measured at any time. Accordingly, HbA1c might prove to be a more suitable predictor of metabolic syndrome [ ].
Multivariate logistic regression has been extensively utilized in medical research, and its many biases have been well documented. One of the drawbacks we observed in our models was the multicollinearity problem. To avoid multicollinearity (), GOT, direct bilirubin, creatinine, and cholesterol were eliminated from the regression model. By contrast, the decision trees had few such disadvantages and offered more intuitive visualizations. The trained decision tree models could also be more easily interpreted by human experts, which is vital for establishing various important pathways to metabolic syndrome. In general, our result that random forest has the best accuracy in detecting metabolic syndrome agrees with previous research [ ]. One of the reasons for the better accuracy of a random forest model is that it creates multiple decision trees and then combines the output generated by each tree; each tree is built from a sample drawn with replacement from the training set. This approach therefore removes the bias that a decision tree model might introduce in the system, thus substantially improving the predictive power.
This study has several limitations. First, this was a retrospective study, and therefore a sufficiently powered prospective cohort study is needed to conclusively address the usefulness of supervised machine learning models to diagnose metabolic syndrome. Second, this study included only health-conscious Taiwanese participants that underwent a self-paid health examination; therefore, this study should be replicated and validated in other populations. Third, this study failed to include some new obesity biomarkers (such as leptin and adiponectin) that may further improve the prediction of metabolic syndrome .
To the best of our knowledge, this is the first study to apply machine learning algorithms to identify metabolic syndrome in subjects examined with FibroScan. We found that decision tree learning algorithms identified metabolic syndrome in self-paid health examination subjects with high accuracy, and obesity, serum GOT, serum GPT, CAP score, and HbA1c emerged as important predictive variables. More research is required to validate the CAP score as a standalone test method to screen for metabolic syndrome, and to determine whether a minimum set of nonblood test variables can be combined with the CAP score to improve the accuracy of predicting metabolic syndrome.
This study is supported by the Taiwan National Science Foundation (grant NSC108-2314-B-038-073). The funding body had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conflicts of Interest
Three-dimensional principal components analysis (PCA) of all risk factors. The three-dimensional PCA plots provide different viewpoints from which to observe the scatter of both the metabolic syndrome and nonmetabolic syndrome groups. All factors in Table 1 are considered in this analysis. The leading principal components PC1, PC2, and PC3, which explain the most variability among the samples, are shown in all three-dimensional graphs. The aggregation of the two groups is obvious when rotating the coordinates in the three-dimensional graph. PNG File, 330 KB
Three-dimensional principal components analysis of only the major risk factors. In this case, the distinction between the metabolic syndrome and nonmetabolic syndrome groups is apparent because only the major variables obtained from Table 3 are included, although the borders of the two groups still overlap. PNG File, 317 KB
Receiver operating characteristic curves and area under the curve (AUC) values of six decision trees. The specificity is revealed by the color bar, and the diagonal line is presented as a dashed line. Most AUC values exceed 0.80 except for that of the C5.0 tree. PNG File, 388 KB
- Huang PL. A comprehensive definition for metabolic syndrome. Dis Model Mech 2009;2(5-6):231-237 [FREE Full text] [CrossRef] [Medline]
- Gurusamy J, Gandhi S, Damodharan D, Ganesan V, Palaniappan M. Exercise, diet and educational interventions for metabolic syndrome in persons with schizophrenia: A systematic review. Asian J Psychiatr 2018 Aug;36:73-85. [CrossRef] [Medline]
- Bassi N, Karagodin I, Wang S, Vassallo P, Priyanath A, Massaro E, et al. Lifestyle modification for metabolic syndrome: a systematic review. Am J Med 2014 Dec;127(12):1242. [CrossRef] [Medline]
- de Lédinghen V, Vergniol J. Transient elastography (FibroScan). Gastroentérol Clin Biol 2008 Sep;32(6):58-67. [CrossRef]
- Chang PE, Goh GB, Ngu JH, Tan HK, Tan CK. Clinical applications, limitations and future role of transient elastography in the management of liver disease. World J Gastrointest Pharmacol Ther 2016 Feb 06;7(1):91-106 [FREE Full text] [CrossRef] [Medline]
- Chivinge A, Harris R, Guha N, Aithal G, James M, Ryder S, et al. Risk-stratified screening for chronic liver disease using vibration-controlled transient elastography (Fibroscan). Gastrointest Nurs 2018 Jun 02;16(5):S15-S22. [CrossRef]
- Wong GL. Transient elastography: Kill two birds with one stone? World J Hepatol 2013 May 27;5(5):264-274 [FREE Full text] [CrossRef] [Medline]
- Recio E, Cifuentes C, Macías J, Mira JA, Parra-Sánchez M, Rivero-Juárez A, et al. Interobserver concordance in controlled attenuation parameter measurement, a novel tool for the assessment of hepatic steatosis on the basis of transient elastography. Eur J Gastroenterol Hepatol 2013 Aug;25(8):905-911. [CrossRef] [Medline]
- Fraquelli M, Rigamonti C, Casazza G, Conte D, Donato MF, Ronchi G, et al. Reproducibility of transient elastography in the evaluation of liver fibrosis in patients with chronic liver disease. Gut 2007 Jul;56(7):968-973 [FREE Full text] [CrossRef] [Medline]
- Boursier J, Konaté A, Gorea G, Reaud S, Quemener E, Oberti F, et al. Reproducibility of liver stiffness measurement by ultrasonographic elastometry. Clin Gastroenterol Hepatol 2008 Nov;6(11):1263-1269. [CrossRef] [Medline]
- Li Y, Huang Y, Wang Z, Yang Z, Sun F, Zhan S, et al. Systematic review with meta-analysis: the diagnostic accuracy of transient elastography for the staging of liver fibrosis in patients with chronic hepatitis B. Aliment Pharmacol Ther 2016 Feb;43(4):458-469. [CrossRef] [Medline]
- Pavlov CS, Casazza G, Nikolova D, Tsochatzis E, Burroughs AK, Ivashkin VT, et al. Transient elastography for diagnosis of stages of hepatic fibrosis and cirrhosis in people with alcoholic liver disease. Cochrane Database Syst Rev 2015 Jan 22;1:CD010542. [CrossRef] [Medline]
- Hartl J, Denzer U, Ehlken H, Zenouzi R, Peiseler M, Sebode M, et al. Transient elastography in autoimmune hepatitis: Timing determines the impact of inflammation and fibrosis. J Hepatol 2016 Oct;65(4):769-775. [CrossRef] [Medline]
- Roulot D, Costes J, Buyck J, Warzocha U, Gambier N, Czernichow S, et al. Transient elastography as a screening tool for liver fibrosis and cirrhosis in a community-based population aged over 45 years. Gut 2011 Jul;60(7):977-984. [CrossRef] [Medline]
- Kotronen A, Westerbacka J, Bergholm R, Pietiläinen KH, Yki-Järvinen H. Liver fat in the metabolic syndrome. J Clin Endocrinol Metab 2007 Sep;92(9):3490-3497. [CrossRef] [Medline]
- Vuppalanchi R, Siddiqui MS, Van Natta ML, Hallinan E, Brandman D, Kowdley K, NASH Clinical Research Network. Performance characteristics of vibration-controlled transient elastography for evaluation of nonalcoholic fatty liver disease. Hepatology 2018 Jan;67(1):134-144 [FREE Full text] [CrossRef] [Medline]
- Eddowes PJ, Sasso M, Allison M, Tsochatzis E, Anstee QM, Sheridan D, et al. Accuracy of FibroScan Controlled Attenuation Parameter and Liver Stiffness Measurement in Assessing Steatosis and Fibrosis in Patients With Nonalcoholic Fatty Liver Disease. Gastroenterology 2019 May;156(6):1717-1730. [CrossRef] [Medline]
- Hu Y, Dong N, Qu Q, Zhao X, Yang H. The correlation between controlled attenuation parameter and metabolic syndrome and its components in middle-aged and elderly nonalcoholic fatty liver disease patients. Medicine (Baltimore) 2018 Oct;97(43):e12931. [CrossRef] [Medline]
- Lin YJ, Lin CH, Wang ST, Lin SY, Chang SS. Noninvasive and Convenient Screening of Metabolic Syndrome Using the Controlled Attenuation Parameter Technology: An Evaluation Based on Self-Paid Health Examination Participants. J Clin Med 2019 Oct 24;8(11):E1775 [FREE Full text] [CrossRef] [Medline]
- Beam AL, Kohane IS. Big Data and Machine Learning in Health Care. JAMA 2018 Apr 03;319(13):1317-1318. [CrossRef] [Medline]
- Chen JH, Asch SM. Machine Learning and Prediction in Medicine - Beyond the Peak of Inflated Expectations. N Engl J Med 2017 Jun 29;376(26):2507-2509. [CrossRef] [Medline]
- Goldstein BA, Navar AM, Carter RE. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges. Eur Heart J 2017 Jun 14;38(23):1805-1814. [CrossRef] [Medline]
- Koh HC, Tan G. Data mining applications in healthcare. J Healthc Inf Manag 2005;19(2):64-72. [Medline]
- Lisboa PJ. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Networks 2002 Jan;15(1):11-39. [CrossRef]
- Bertsimas D, Bjarnadóttir MV, Kane MA, Kryder JC, Pandey R, Vempala S, et al. Algorithmic Prediction of Health-Care Costs. Operations Res 2008 Dec;56(6):1382-1392. [CrossRef]
- Bell LM, Grundmeier R, Localio R, Zorc J, Fiks AG, Zhang X, et al. Electronic health record-based decision support to improve asthma care: a cluster-randomized trial. Pediatrics 2010 Apr;125(4):e770-e777. [CrossRef] [Medline]
- Tomar D, Agarwal S. A survey on Data Mining approaches for Healthcare. Int J Bioci Biotechnol 2013 Oct 31;5(5):241-266. [CrossRef]
- Fei Y, Hu J, Gao K, Tu J, Li W, Wang W. Predicting risk for portal vein thrombosis in acute pancreatitis patients: A comparison of radical basis function artificial neural network and logistic regression models. J Crit Care 2017 Jun;39:115-123. [CrossRef] [Medline]
- Santelices LC, Wang Y, Severyn D, Druzdzel MJ, Kormos RL, Antaki JF. Development of a hybrid decision support model for optimal ventricular assist device weaning. Ann Thorac Surg 2010 Sep;90(3):713-720 [FREE Full text] [CrossRef] [Medline]
- Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med 1991 Dec 01;115(11):843-848. [CrossRef] [Medline]
- Artis SG, Mark RG, Moody GB. Detection of atrial fibrillation using artificial neural networks. In: Proceedings Computers in Cardiology. IEEE; 1991 Presented at: Computers in Cardiology (CinC); September 23-26, 1991; Venice, Italy p. 23-26 URL: https://ieeexplore.ieee.org/document/169073 [CrossRef]
- Eftekhar B, Mohammad K, Ardebili HE, Ghodsi M, Ketabchi E. Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data. BMC Med Inform Decis Mak 2005 Feb 15;5:3 [FREE Full text] [CrossRef] [Medline]
- Belciug S, Salem A, Gorunescu F, Gorunescu M. Clustering-based approach for detecting breast cancer recurrence. IEEE; 2010 Presented at: 10th International Conference on Intelligent Systems Design and Applications; 2010; Cairo, Egypt p. 533-538. [CrossRef]
- Salama GI, Abdelhalim MB, Zeid M. Breast Cancer Diagnosis on Three Different Datasets using Multi-classifiers. Int J Comput Inf Technol 2012;1(1):36-43 [FREE Full text]
- Ayer T, Chhatwal J, Alagoz O, Kahn CE, Woods RW, Burnside ES. Informatics in radiology: comparison of logistic regression and artificial neural network models in breast cancer risk estimation. Radiographics 2010 Jan;30(1):13-22 [FREE Full text] [CrossRef] [Medline]
- Joshi S, Shenoy DG, G.G VS, Rrashmi PL, Venugopal KR, Patnaik LM. Classification of Alzheimer's Disease and Parkinson's Disease by Using Machine Learning and Neural Network Methods. IEEE: IEEE Computer Society; 2010 Presented at: ICMLC 2010: The 2nd International Conference on Machine Learning and Computing; February 9-11, 2010; Bangalore, India p. 218-222. [CrossRef]
- Escudero J, Zajicek JP, Ifeachor E. Early detection and characterization of Alzheimer's disease in clinical scenarios using Bioprofile concepts and K-means. In: Conf Proc IEEE Eng Med Biol Soc.: IEEE; 2011 Presented at: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2011; Boston. [CrossRef]
- Ramani RG, Sivagami G. Parkinson Disease Classification using Data Mining Algorithms. Int J Comput Appl 2011;32(9):17-22 [FREE Full text] [CrossRef]
- Nilsson J, Ohlsson M, Thulin L, Höglund P, Nashef SA, Brandt J. Risk factor identification and mortality prediction in cardiac surgery using artificial neural networks. J Thorac Cardiovasc Surg 2006 Jul;132(1):12-19 [FREE Full text] [CrossRef] [Medline]
- Kazemi Y, Mirroshandel SA. A novel method for predicting kidney stone type using ensemble learning. Artif Intell Med 2018 Jan;84:117-126. [CrossRef] [Medline]
- Edelstein P. Emerging directions in analytics. Predictive analytics will play an indispensable role in healthcare transformation and reform. Health Manag Technol 2013 Jan;34(1):16-17. [Medline]
- Moradi M, Ghadiri N. Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artif Intell Med 2018 Jan;84:101-116. [CrossRef] [Medline]
- Ma YC, Zuo L, Chen JH, Luo Q, Yu XQ, Li Y, et al. Modified glomerular filtration rate estimating equation for Chinese patients with chronic kidney disease. J Am Soc Nephrol 2006 Oct;17(10):2937-2944 [FREE Full text] [CrossRef] [Medline]
- National Kidney Foundation. K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Am J Kidney Dis 2002 Feb;39(2 Suppl 1):S1-266. [Medline]
- Choo V. WHO reassesses appropriate body-mass index for Asian populations. Lancet 2002 Jul 20;360(9328):235. [CrossRef] [Medline]
- Alberti KG, Zimmet P, Shaw J. Metabolic syndrome--a new world-wide definition. A Consensus Statement from the International Diabetes Federation. Diabet Med 2006 May;23(5):469-480. [CrossRef] [Medline]
- Alberti KG, Zimmet P, Shaw J, IDF Epidemiology Task Force Consensus Group. The metabolic syndrome--a new worldwide definition. Lancet 2005;366(9491):1059-1062. [CrossRef] [Medline]
- Quinlan JR. Simplifying decision trees. Int J Man-Machine Stud 1987 Sep;27(3):221-234. [CrossRef]
- Breiman L, Friedman J, Stone CJ, Olshen R. Classification and Regression Trees (Wadsworth Statistics/Probability). Boca Raton, Florida: Chapman and Hall; 1984.
- Therneau T, Atkinson B, Ripley B. The rpart Package. 2010. URL: https://cran.r-project.org/web/packages/rpart/rpart.pdf [accessed 2019-10-15]
- Quinlan JR. C4.5: Programs For Machine Learning (Morgan Kaufmann Series In Machine Learning). San Francisco, CA: Morgan Kaufmann; 1993.
- Kuhn M, Weston S, Culp M, Coulter N, Quinlan R. Package ‘C50’. 2018. URL: https://cran.r-project.org/web/packages/C50/C50.pdf [accessed 2019-10-01]
- Kass GV. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Appl Stat 1980;29(2):119. [CrossRef]
- Hothorn T, Zeileis A. partykit: A modular toolkit for recursive partytioning in R. J Machine Learn Res 2015;16(1):3905-3909 [FREE Full text]
- Hothorn T, Hornik K, Zeileis A. Unbiased Recursive Partitioning: A Conditional Inference Framework. J Comput Graph Stat 2006 Sep;15(3):651-674. [CrossRef]
- Grubinger T, Zeileis A, Pfeiffer K. evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R. J Stat Soft 2014;61(1):1-29. [CrossRef]
- Zeileis A, Hothorn T, Hornik K. Model-Based Recursive Partitioning. J Comput Graph Stat 2008 Jun;17(2):492-514. [CrossRef]
- Breiman L. Random forests. Machine learning 2001;45(1):5-32 [FREE Full text] [CrossRef]
- Fox J, Monette G. Generalized Collinearity Diagnostics. J Am Stat Assoc 1992 Mar;87(417):178. [CrossRef]
- Fox J, Bates D, Firth D, Friendly M, Gorjanc G, Graves S. The car Package. 2007. URL: ftp://ftp.uni-bayreuth.de/pub/math/statlib/R/CRAN/doc/packages/car.pdf [accessed 2019-10-20]
- Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett 2006 Jun;27(8):861-874. [CrossRef]
- Sasaki Y. The truth of the F-measure. 2007 Jun. URL: https://www.cs.odu.edu/~mukka/cs795sum09dm/Lecturenotes/Day3/F-measure-YS-26Oct07.pdf [accessed 2019-10-15]
- Powers DMW. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2011;2(1):37-63 [FREE Full text] [CrossRef]
- von Hippel PT. Biases in SPSS 12.0 Missing Value Analysis. Am Stat 2004 May;58(2):160-164. [CrossRef]
- Lin M, Lucas HC, Shmueli G. Research commentary—Too big to fail: large Samples and the p-value problem. Inf Syst Res 2013 Dec;24(4):906-917. [CrossRef]
- Wang S, Zhang J, Zhu L, Song L, Meng Z, Jia Q, et al. Association between liver function and metabolic syndrome in Chinese men and women. Sci Rep 2017 Mar 20;7(1):44844. [CrossRef] [Medline]
- Vijayakumar P, Nelson RG, Hanson RL, Knowler WC, Sinha M. HbA1c and the Prediction of Type 2 Diabetes in Children and Adults. Diabetes Care 2017 Jan;40(1):16-21 [FREE Full text] [CrossRef] [Medline]
- Alam MZ, Rahman MS, Rahman MS. A Random Forest based predictor for medical data classification using feature ranking. Informat Med Unlocked 2019;15:100180. [CrossRef]
- Srikanthan K, Feyh A, Visweshwar H, Shapiro JI, Sodhi K. Systematic Review of Metabolic Syndrome Biomarkers: A Panel for Early Detection, Management, and Risk Stratification in the West Virginian Population. Int J Med Sci 2016;13(1):25-38 [FREE Full text] [CrossRef] [Medline]
|ALKp: alkaline phosphatase|
|AUC: area under the curve|
|BUN: blood urea nitrogen|
|CAP score: controlled attenuation parameter score|
|CART: classification and regression tree|
|CHAID: Chi-square automatic interaction detection|
|CKD: chronic kidney disease|
|ctree: conditional inference tree|
|eGFR: estimated glomerular filtration rate|
|E score: liver stiffness score|
|evtree: evolutionary learning of globally optimal trees|
|γ-GT: gamma-glutamyl transpeptidase|
|GOT: serum glutamic oxaloacetic transaminase|
|GPT: serum glutamic pyruvic transaminase|
|HbA1c: glycated hemoglobin|
|HDL: high-density lipoprotein|
|LDL: low-density lipoprotein|
|MDRD: Modification of Diet in Renal Disease|
|NAFLD: nonalcoholic fatty liver disease|
|PCA: principal components analysis|
|ROC: receiver operating characteristic|
|TSH: thyroid-stimulating hormone|
|UA: uric acid|
|VIF: variance inflation factor|
Edited by G Eysenbach; submitted 19.11.19; peer-reviewed by MT Lee, JY Wu, J Meyer; comments to author 14.12.19; revised version received 07.02.20; accepted 05.03.20; published 23.03.20
©Cheng-Sheng Yu, Yu-Jiun Lin, Chang-Hsien Lin, Sen-Te Wang, Shiyng-Yu Lin, Sanders H Lin, Jenny L Wu, Shy-Shin Chang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 23.03.2020.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.