This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research on predicting metabolic syndrome in subjects examined with FibroScan has been based mainly on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling.
We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan.
Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the patterns and relationships between metabolic syndrome and several risk variables.
Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic-pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in the multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively.
Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
Metabolic syndrome is a cluster of disorders, including insulin resistance or hyperglycemia, visceral adiposity (identified by a large waistline or overweight), atherogenic dyslipidemia (eg, raised triglycerides or reduced high-density lipoprotein [HDL]), and endothelial dysfunction (characterized by elevated blood pressure) [
A recent study showed that nonalcoholic fatty liver disease (NAFLD) is closely correlated with metabolic syndrome. Patients with metabolic syndrome frequently show increased fat accumulation in the liver (steatosis) and hepatic insulin resistance [
Machine learning, whereby a computer algorithm learns from prior experience, was recently shown to have better performance than traditional statistical modeling approaches [
This was a single-center retrospective cohort study. The cohort comprised self-paid health examination subjects at the Health Management Center of Taipei Medical University Hospital who were examined with FibroScan from September 2015 to December 2018.
The electronic healthcare records of subjects examined with FibroScan were reviewed at Taipei Medical University Hospital, which is a private, tertiary-care, 800-bed teaching hospital in Taiwan. The Institutional Review Board of Taipei Medical University Hospital approved the study design for data collection (TMUJIRB No.: N201903080) in accordance with the original and amended Declaration of Helsinki. The requirement for informed consent was waived owing to the retrospective nature of the study.
The study included all Taiwanese adult patients aged >18 years who had undergone a self-paid health examination comprising an abdominal transient elastography inspection using FibroScan 502 Touch (Echosens, Paris, France). Individuals who underwent FibroScan examination on a physician's orders were excluded. The routine protocols of the Health Management Center were applied to all participants. The subjects were first interviewed by thoroughly trained personnel who verified the correctness of self-completed questionnaires on demographics, existing medical conditions, and medication use. In addition, the personnel confirmed adherence to health examination prerequisites (eg, overnight fasting for at least 8 hours) for the package chosen by the study participant. Those found to have not fulfilled the necessary prerequisites were advised to reschedule their appointment. Anthropometrics, including weight, height, waist circumference, and arterial pressure, were measured. Instruments were regularly calibrated per the manufacturer's specifications. According to the chosen package, the required samples of blood, urine, and specimens were collected for laboratory tests. Regular laboratory test items included alpha-fetoprotein, glycated hemoglobin (HbA_{1c}), serum glutamic-oxaloacetic transaminase (GOT), serum glutamic-pyruvic transaminase (GPT), uric acid, creatinine, blood urea nitrogen, red blood cell count, hemoglobin, hematocrit, mean corpuscular hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin concentration, platelet count, white blood cell count, percentages of neutrophils, lymphocytes, monocytes, eosinophils, and basophils, total protein, albumin, globulin, albumin/globulin ratio, total bilirubin, direct bilirubin, alkaline phosphatase, gamma-glutamyl transpeptidase (γGT), total cholesterol, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, LDL/HDL ratio, triglycerides, fasting blood sugar, and thyroid-stimulating hormone.
The estimated glomerular filtration rate (eGFR) was calculated using equations for the Modification of Diet in Renal Disease for Chinese patients [
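For illustration only, one commonly cited Chinese re-expression of the MDRD equation can be sketched in code; the coefficients below are an assumption for the sketch and may differ from the equation actually used in this study:

```python
def egfr_mdrd_chinese(scr_mg_dl, age_years, female):
    """Estimated GFR (mL/min/1.73 m^2) from serum creatinine, age, and sex.

    The coefficients follow one commonly cited Chinese re-expression of the
    MDRD equation and are an assumption for illustration; the study's own
    equation is the one given in its cited reference.
    """
    egfr = 175.0 * (scr_mg_dl ** -1.234) * (age_years ** -0.179)
    return egfr * 0.79 if female else egfr
```

For example, a 44-year-old man with a serum creatinine of 0.8 mg/dL would fall near the cohort's median eGFR reported in the Results.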
According to the National Cholesterol Education Program Adult Treatment Panel III consensus definition of metabolic syndrome, metabolic syndrome was identified if at least three of the following five criteria were present: large waistline (≥80 cm for women and ≥90 cm for men), high triglycerides (≥150 mg/dL) or use of medication to control triglycerides, reduced HDL levels (<50 mg/dL for women and <40 mg/dL for men) or use of medication to control HDL, elevated blood pressure (systolic blood pressure ≥130 mmHg or diastolic blood pressure ≥85 mmHg) or use of relevant medication to control blood pressure, and increased fasting blood sugar (≥100 mg/dL) or use of relevant medication to control blood sugar. The classification of cutoff points was adopted from the National Cholesterol Education Program Adult Treatment Panel III consensus definition with ethnicity-specific cutoff points for waist circumference [
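The counting rule above translates directly into code. A minimal sketch (Python, for illustration; the function and parameter names are ours, with lab values in mg/dL and pressures in mmHg as quoted in the text):

```python
def metabolic_syndrome_atp3(waist_cm, triglycerides, hdl, sbp, dbp, glucose,
                            female, on_tg_med=False, on_hdl_med=False,
                            on_bp_med=False, on_glucose_med=False):
    """NCEP ATP III rule as quoted in the text: metabolic syndrome if at
    least 3 of 5 criteria are met (waist cutoffs are the ethnicity-specific
    values used for this population)."""
    criteria = [
        waist_cm >= (80 if female else 90),           # large waistline
        triglycerides >= 150 or on_tg_med,            # high triglycerides
        hdl < (50 if female else 40) or on_hdl_med,   # reduced HDL
        sbp >= 130 or dbp >= 85 or on_bp_med,         # elevated blood pressure
        glucose >= 100 or on_glucose_med,             # raised fasting glucose
    ]
    return sum(criteria) >= 3
```

For instance, a man with a 95 cm waist, triglycerides of 160 mg/dL, and HDL of 35 mg/dL meets three criteria and is classified as having metabolic syndrome.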
FibroScan is a noninvasive device that assesses the hardness of the liver using ultrasound-based elastography. Liver hardness is evaluated by measuring the velocity of a vibration wave, which is determined by measuring the time the vibration wave takes to travel from the skin to a particular depth inside the liver (
Illustration of the FibroScan device: liver diagnosis by ultrasound-based elastography. FibroScan measures fibrosis and steatosis in the liver. Measurements are performed by scanning the right liver lobe through the right intercostal space. The fibrosis result is measured in kilopascals (kPa) and is normally between 2.5 and 6 kPa; the highest possible result is 75 kPa. Fibrosis score: F0 to F1, no or mild liver scarring; F2, moderate liver scarring; F3, severe liver scarring; F4, advanced liver scarring (cirrhosis). The steatosis result is measured in decibels per meter (dB/m) and is normally between 100 and 400 dB/m. Steatosis is graded from S0 to S3, corresponding to the severity of fatty liver from "0-10%" to "67% or more".
A decision tree is a widely used, effective nonparametric machine learning technique for regression or classification. To obtain solutions, a decision tree makes sequential, hierarchical decisions about the outcome variable based on the predictors [
Classification and regression trees (CART), the archetypal tree-based models, explore the structure of the data and yield visualizable decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome [
C5.0 is derived from C4.5 and ID3, with improvements addressing the disadvantages of its predecessor trees. The “C50” package was used to implement the C5.0 tree [
Chi-square automatic interaction detection (CHAID) is a specific decision tree that uses adjusted significance testing (Bonferroni correction) for prediction. An algorithm for recursive partitioning is implemented by maximizing the significance of a chi-square statistic for cross-tabulations between the categorical dependent variable and the categorical predictors at each partition. Moreover, CHAID can create nonbinary trees since nominal, ordinal, and continuous data are used. The CHAID tree is available from the “CHAID” package in R [
Conditional inference trees (ctrees) embed tree-structured regression models into a well-defined theory of conditional inference procedures. They use a significance test procedure to select variables instead of selecting the variable that maximizes an information measure. In addition, ctree is applicable to all types of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables, as well as arbitrary measurement scales of covariates. A flexible and extensible computational tool in the “partykit” package of R is suitable for fitting and visualizing ctrees [
Evolutionary learning of globally optimal trees (evtree) addresses a limitation of conventional recursive partitioning methods, which create models using a greedy forward stepwise search. An evtree is instead learned using an evolutionary algorithm. Notably, a set of trees is initialized with random split rules in the root nodes. Mutation and crossover operators are then applied to modify the trees' structure and the tests applied in the internal nodes. After each modification step, a survivor selection mechanism identifies the best candidate models for the next iteration, terminating when the quality of the best trees ceases to improve. The “evtree” package in R applies an evolutionary algorithm for learning globally optimal classification and regression trees [
Generalized linear model trees (glmtree) involve model-based recursive partitioning based on generalized linear models. They are convenient for fitting model-based recursive partitions using “mob” functions in R. A glmtree internally sets up a model-fit function for mob using the negative log-likelihood as the objective function. It is also implemented in the “partykit” package in R [
Random decision forests are an ensemble learning method for classification, regression, and other applications, built from decision tree structures at training time. The idea of a random forest is to create multiple decision trees (CART) and then combine the outputs generated by the individual trees. In the decision tree algorithm, the Gini index measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. The Gini index is calculated by subtracting the sum of the squared probabilities of each class from 1. This approach removes the bias that a single decision tree model might introduce while considerably improving the predictive power. In addition, random forests can rank the importance of variables in a regression or classification problem in a natural manner, which can be conducted in the R package “randomForest” [
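The bagging idea behind random forests (bootstrap resampling plus majority voting over trees grown on each resample) can be illustrated with a deliberately tiny sketch. The study itself used the R package “randomForest”; this pure-Python version with single-split Gini stumps is a toy illustration under our own assumptions, not the authors' implementation:

```python
import random
from collections import Counter

def best_stump(rows):
    """Fit a one-split CART-style stump: choose the threshold minimizing the
    weighted Gini impurity (1 minus the sum of squared class probabilities)."""
    def gini(labels):
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            maj_left = Counter(left).most_common(1)[0][0]
            maj_right = Counter(right).most_common(1)[0][0] if right else maj_left
            best = (score, t, maj_left, maj_right)
    return best[1:]  # threshold and the majority label on each side

def random_forest_predict(rows, x, n_trees=25, seed=42):
    """Majority vote over stumps fit on bootstrap resamples of the data."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap resample
        t, maj_left, maj_right = best_stump(sample)
        votes.append(maj_left if x <= t else maj_right)
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: (CAP-like score, label); higher values lean toward
# metabolic syndrome ("MS"), mimicking the direction seen in the Results
data = [(220, "no"), (240, "no"), (250, "no"), (260, "no"),
        (300, "MS"), (310, "MS"), (320, "MS"), (330, "MS")]
```

Averaging over many resampled trees is what removes the single-tree bias mentioned above.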
Statistical analysis was conducted using R (version 3.6.1; R Foundation for Statistical Computing, Vienna, Austria) or SPSS (version 17.0; SPSS Inc, Chicago, IL, USA) software.
Categorical variables were tested using the chi-square test or Fisher exact test. The nonparametric Mann-Whitney U test was applied to determine differences in the medians of continuous variables between the two groups. Multivariate logistic regression was employed to assess the significance of clinical variables, and the variance inflation factor was used to check for multicollinearity.
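For a 2x2 contingency table, the Pearson chi-square statistic has a well-known margin shortcut; a sketch (Python, for illustration, with the sex-by-group counts reported in the Results as a worked example):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
    via the margin shortcut n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Worked example: sex distribution from Table 1
# (no metabolic syndrome: 564 female / 576 male; metabolic syndrome: 44 / 149)
stat = chi_square_2x2(564, 576, 44, 149)  # a large statistic, consistent with P<.001
```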
High-dimensional data were processed by principal components analysis (PCA), which applies an orthogonal transformation to a set of observations of correlated variables and provides a two-dimensional or three-dimensional visualization based on the leading principal components.
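The core of PCA, finding the leading eigenvector of the covariance matrix, can be sketched with power iteration. This pure-Python toy is an illustration only (the study's analyses were performed in R):

```python
def leading_pc(data, iters=200):
    """First principal component via power iteration on the sample
    covariance matrix of mean-centered data (rows = observations)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v  # unit vector along the direction of greatest variance

# Two strongly correlated toy variables: PC1 should point along the diagonal
pc1 = leading_pc([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]])
```

Projecting each observation onto the first two or three such components gives the 2D/3D scatter plots shown in the Results.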
ROC curves were used to illustrate the diagnostic ability of classification trees in the machine learning methodology. The area under the ROC curve (AUC) summarizes a graphical plot of the true positive rate (also called sensitivity or recall) against the false positive rate (1 - specificity) [
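The AUC also has an equivalent pairwise interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counting one half. A minimal sketch (Python, for illustration):

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count 1/2); equivalent to the area under the ROC
    curve, and closely related to the Mann-Whitney U statistic."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

Perfectly separated scores give an AUC of 1.0; random scores give about 0.5.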
Data with missing values were imputed using the expectation-maximization algorithm, an iterative procedure that preserves the relationships with other variables. Only 9 factors had missing values, and for most of them the missing values accounted for less than 5% of the sample size. Direct bilirubin, which had the largest proportion of missing values (298/1333, 22.36%), was at high risk of multicollinearity and thus was not a crucial element in the model [
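The idea of imputation that preserves the relationship with other variables can be shown in a drastically simplified form, filling missing values of one variable from a least-squares fit on another. The study used a full expectation-maximization algorithm; this one-step sketch is only an analogy:

```python
def impute_from_line(xs, ys):
    """Fill in missing ys (None) using a least-squares line fitted on the
    complete (x, y) pairs, so imputed values respect the x-y relationship.
    A one-step caricature of EM-style regression imputation."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(obs)
    mx = sum(x for x, _ in obs) / n
    my = sum(y for _, y in obs) / n
    slope = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]

# The missing middle value is reconstructed from the linear trend
filled = impute_from_line([1, 2, 3, 4, 5], [2.0, 4.0, None, 8.0, 10.0])
```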
To compare the performance of the aforementioned decision trees, the same setting for the training set and testing set was used. In addition, the depth of each tree was limited to between 4 and 5 instead of pruning each decision tree according to its own criteria. Finally, the outcomes from each decision tree were summarized to identify common and reliable results supporting the conclusions.
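The shared evaluation protocol can be sketched as a repeated random holdout loop. The 80/20 split and the 500 rounds are taken from the Results section; the toy majority-class learner below merely stands in for the actual tree learners:

```python
import random
from collections import Counter

def repeated_holdout(rows, fit, predict, rounds=500, train_frac=0.8, seed=7):
    """Mean test accuracy over repeated random train/test splits, mirroring
    the fixed-size 80/20, 500-round protocol described for the trees."""
    rng = random.Random(seed)
    accs = []
    for _ in range(rounds):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        model = fit(train)
        accs.append(sum(predict(model, x) == y for x, y in test) / len(test))
    return sum(accs) / len(accs)

# Stand-in learner: always predict the training set's majority class
majority_fit = lambda rows: Counter(y for _, y in rows).most_common(1)[0][0]
majority_predict = lambda model, _x: model
```

Any of the tree learners can be plugged in via `fit` and `predict`, which is what allows a like-for-like comparison on identical splits.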
After data cleaning, a total of 1333 individuals undergoing a self-paid annual health examination were enrolled in this study. The baseline characteristics of the 193 patients diagnosed with metabolic syndrome and the 1140 participants without metabolic syndrome are compared in
The visualization of the two groups was achieved by PCA with the advantage of dimensionality reduction (
Descriptive statistics and testing of risk factors in health examination data with potential metabolic syndrome as the dependent variable.
Risk factors  No metabolic syndrome (N=1140)  Metabolic syndrome (N=193)  P value
CKD stage^{a}, n (%)
  Stage 1  585 (51.32)  71 (36.8)  .001
  Stage 2  530 (46.49)  115 (59.6)
  Stage 3  24 (2.11)  6 (3.1)
  Stage 4  1 (0.09)  1 (0.5)
Sex, n (%)
  Female  564 (49.47)  44 (22.8)  <.001
  Male  576 (50.53)  149 (77.2)
Obesity^{a}, n (%)
  Underweight  49 (4.30)  0 (0.0)  <.001
  Normal weight  667 (58.51)  22 (11.4)
  Overweight I  293 (25.70)  65 (33.7)
  Overweight II  131 (11.49)  106 (54.9)
Age (years), median (IQR)  44 (38-50)  45 (40-51)  .12
Albumin (g/dL)  4.6 (4.4-4.8)  4.7 (4.5-4.8)  <.001
AFP^{b} (ng/mL)  2.26 (1.637-3.12)  2.43 (1.71-3.14)  .15
ALKp^{c} (IU/L)  58 (48-69)  62 (54-75)  <.001
GOT^{d} (IU/L)  20 (17-24)  24 (19-31)  <.001
GPT^{e} (IU/L)  19 (13-27)  33 (22-51)  <.001
Total bilirubin (mg/dL)  0.6 (0.5-0.8)  0.7 (0.5-0.9)  .08
Direct bilirubin (mg/dL)  0.2 (0.2-0.3)  0.2087 (0.2-0.3)  .66
γGT^{f} (U/L)  16 (12-25)  29 (20-45)  <.001
CAP^{g} score (dB/m)  239 (209-274)  311 (271-340)  <.001
E score (kPa)  4 (3.4-4.8)  4.9 (4.3-5.8)  <.001
BUN^{h} (mg/dL)  12 (10-15)  13 (11-15)  .03
Creatinine (mg/dL)  0.8 (0.6-0.9)  0.9 (0.7-1.0)  <.001
MDRD^{i}  91.07 (81.3-105.17)  86.23 (75.05-98.82)  <.001
UA^{j} (mg/dL)  5.2 (4.3-6.5)  6.3 (5.5-7.3)  <.001
Cholesterol (mg/dL)  187 (165-208)  194 (165-220)  .03
LDL^{k} (mg/dL)  121 (101-142)  136 (106-158)  <.001
HbA_{1c}^{l} (%)  5.4 (5.2-5.7)  5.7 (5.4-6.1)  <.001
TSH^{m} (μIU/mL)  1.93 (1.30-2.52)  1.89 (1.38-2.51)  .83
^{a}Progressive discrete variables.
^{b}AFP: alpha-fetoprotein.
^{c}ALKp: alkaline phosphatase.
^{d}GOT: glutamic-oxaloacetic transaminase.
^{e}GPT: glutamic-pyruvic transaminase.
^{f}γGT: gamma-glutamyl transpeptidase.
^{g}CAP: controlled attenuation parameter.
^{h}BUN: blood urea nitrogen.
^{i}MDRD: Modification of Diet in Renal Disease.
^{j}UA: uric acid.
^{k}LDL: low-density lipoprotein cholesterol.
^{l}HbA_{1c}: glycated hemoglobin.
^{m}TSH: thyroid-stimulating hormone.
Principal components analysis (PCA) of the metabolic and nonmetabolic groups by two-dimensional and three-dimensional visualization. (a) PCA with 95% CIs shown as ellipses for all risk factors in
Next, we applied multivariate logistic regression to assess factors influencing metabolic syndrome. As shown in
Multivariate logistic regression analysis of risk factors related to metabolic syndrome.
Factor  Odds ratio^{a} (95% CI)  VIF^{b}  ΔVIF^{c}  P value
Sex, male/female  0.742 (0.335-1.641)  3.590  1.630  .99
Age, years  1.025 (0.999-1.051)  1.622  1.506  .15
Obesity  2.915 (2.175-3.907)  1.429  1.406  <.001
Albumin, g/dL  1.866 (0.821-4.239)  1.218  1.176  .12
AFP^{d}, ng/mL  1.045 (0.915-1.193)  1.162  1.153  .48
ALKp^{e}, IU/L  0.995 (0.983-1.007)  1.158  1.140  .52
GOT^{f}, IU/L  0.959 (0.923-0.997)  -  -  -
GPT^{g}, IU/L  1.023 (1.003-1.045)  -  -  .51
Total bilirubin, mg/dL  2.599 (0.562-12.015)  -  -  .39
Direct bilirubin, mg/dL  0.011 (0-2.507)  -  -  -
γGT^{h}, U/L  1.002 (0.994-1.009)  1.414  1.379  .77
CAP^{i} score, dB/m  1.011 (1.007-1.016)  1.455  1.398  <.001
E score, kPa  1.046 (0.926-1.182)  1.284  1.256  .61
CKD^{j}  1.135 (0.615-2.097)  3.387  2.550  .27
BUN^{k}, mg/dL  0.952 (0.893-1.016)  1.397  1.338  .13
Creatinine, mg/dL  4.288 (0.196-94.014)  -  -  -
MDRD^{l}  1.012 (0.994-1.031)  4.860  2.863  .39
UA^{m}, mg/dL  1.127 (0.967-1.314)  1.642  1.596  .08
Cholesterol, mg/dL  1.003 (0.986-1.021)  -  -  -
LDL^{n}, mg/dL  0.994 (0.976-1.012)  -  -  .82
HbA_{1c}^{o}, %  2.170 (1.631-2.888)  1.236  1.230  <.001
TSH^{p}, μIU/mL  0.876 (0.727-1.054)  1.086  1.078  .13
^{a}The odds ratio represents exp(β), the exponential of the estimated coefficient in the logistic regression.
^{b}VIF: variance inflation factor (to check multicollinearity); factors with high VIF values are italicized.
^{c}ΔVIF: variance inflation factor after removal of predictor variables with high VIF values; VIF values with a sharp decline are italicized.
^{d}AFP: alpha-fetoprotein.
^{e}ALKp: alkaline phosphatase.
^{f}GOT: glutamic-oxaloacetic transaminase.
^{g}GPT: glutamic-pyruvic transaminase.
^{h}γGT: gamma-glutamyl transpeptidase.
^{i}CAP: controlled attenuation parameter.
^{j}CKD: chronic kidney disease.
^{k}BUN: blood urea nitrogen.
^{l}MDRD: Modification of Diet in Renal Disease.
^{m}UA: uric acid.
^{n}LDL: low-density lipoprotein cholesterol.
^{o}HbA_{1c}: glycated hemoglobin.
^{p}TSH: thyroid-stimulating hormone.
To inspect potential indices for metabolic syndrome, several types of decision trees were applied to the health examination data for the classification of metabolic syndrome (
Metabolic syndrome prediction by various decision tree models. A decision tree takes on a flowchart-like structure. The six most commonly used decision trees are shown: (a) classification and regression tree (CART); (b) C5.0 classification tree, modified from the C4.5 and ID3 trees; (c) chi-square automatic interaction detection (CHAID); (d) conditional inference tree (ctree); (e) evolutionary learning of globally optimal tree (evtree); and (f) generalized linear model tree (glmtree). Each decision tree is applied for the prediction of metabolic syndrome to explore the factors with the greatest influence as an index to distinguish metabolic syndrome.
Major factors as classified nodes in decision trees^{a}.
Decision tree  Root^{b}  Primary node^{c} (root included)  Secondary node^{d}
CART^{e}  -  -  Total bilirubin (0.19)
C5.0  -  -  Sex (0.28)
CHAID^{r}  -  -  AFP (0.12)
ctree^{s}  -  -  AFP (0.16)
evtree^{t}  GOT (0.19)  -  -
glmtree^{u}  -  -  LDL (0.23)^{v}
^{a}Major variables are listed with their weights as candidate nodes in each decision tree; since some variables may be considered candidate nodes in the decision tree more than once, the proportion of variables can be larger than 1.
^{b}The root shows factors appearing as the first classified node and their proportions.
^{c}The primary node (italicized) includes variables selected as the top three nodes (root included) with their proportions (>0.05); variables with lower weights as candidate nodes in the primary nodes are excluded.
^{d}The secondary node includes all remaining candidate nodes in each decision tree with their proportions; only candidate nodes with proportions >0.1 with a certain influence in the classification of metabolic syndrome are shown.
^{e}CART: classification and regression trees.
^{f}CAP: controlled attenuation parameter.
^{g}HbA_{1c}: glycated hemoglobin.
^{h}γGT: gamma-glutamyl transpeptidase.
^{i}GOT: glutamic-oxaloacetic transaminase.
^{j}UA: uric acid.
^{k}AFP: alpha-fetoprotein.
^{l}ALKp: alkaline phosphatase.
^{m}GPT: glutamic-pyruvic transaminase.
^{n}LDL: low-density lipoprotein cholesterol.
^{o}MDRD: Modification of Diet in Renal Disease.
^{p}TSH: thyroid-stimulating hormone.
^{q}CKD: chronic kidney disease.
^{r}CHAID: chi-square automatic interaction detection.
^{s}ctree: conditional inference tree.
^{t}evtree: evolutionary learning of globally optimal tree.
^{u}glmtree: generalized linear model tree.
^{v}Secondary variables for classification of metabolic syndrome in several decision tree algorithms.
PCA was then applied again to visualize the nonmetabolic syndrome and metabolic syndrome groups according to the prominence of factors from the decision trees, comprising the CAP score, obesity, and HbA_{1c} (
Finally, the accuracies of the various decision trees were determined using 500 rounds of random sampling from the entire health examination dataset with fixed-size divisions of training and testing sets (
Accuracy^{a} and area under the curve (AUC) values of various decision trees in receiver operating characteristic curve analysis.
Decision tree  Accuracy (min / mean / max)  F1 score (min / mean / max)  AUC
CART^{b}  0.797 / 0.857 / 0.914  0.888 / 0.919 / 0.948  0.831
C5.0  0.805 / 0.861 / 0.921  0.884 / 0.922 / 0.951  0.769
CHAID^{c}  0.823 / 0.873 / 0.917  0.894 / 0.930 / 0.956  0.867
ctree^{d}  0.801 / 0.864 / 0.914  0.883 / 0.923 / 0.954  0.896
evtree^{e}  0.805 / 0.857 / 0.906  0.880 / 0.920 / 0.953  0.815
glmtree^{f}  -^{g}  -  0.889
Random forest  0.812 / 0.870 / 0.940  0.888 / 0.928 / 0.959  0.904
^{a}Accuracy and F1 score were calculated from 500 machine learning trials with different training sets for comparison with the number of candidate trees from random forest. Accuracy is the probability of true positives and true negatives among all data, whereas the F1 score is a performance measure defined as the harmonic mean of precision and recall. For each analysis, the dataset was randomly divided into a training set (80%) and a testing set (20%) with independent randomized sampling.
^{b}CART: classification and regression trees.
^{c}CHAID: chisquare automatic interaction detection.
^{d}ctree: conditional inference tree.
^{e}evtree: evolutionary learning of globally optimal tree.
^{f}glmtree: generalized linear model tree.
^{g}The terminal nodes of the R package glmtree do not take a simple classification form from which a confusion matrix (and hence accuracy) can be calculated; therefore, the AUC was used for a balanced comparison of the seven decision tree techniques on the same training and testing sets.
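The accuracy and F1 definitions in footnote a can be computed from a confusion matrix as follows (a sketch for illustration; the label names are ours):

```python
def accuracy_and_f1(y_true, y_pred, positive="MS"):
    """Accuracy over all cases, and F1 score as the harmonic mean of
    precision and recall for the positive class (here labeled "MS")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```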
Random forest model for predicting classification performance and variable importance. Receiver operating characteristic (ROC) curves with area under the curve values for (a) the classification and regression tree and (b) the random forest. The color bar indicates the specificity corresponding to the false positive rate. (c) Variable importance ordered by mean decrease in accuracy in the random forest. (d) Variable importance ordered by mean decrease in the Gini index in the random forest. The leading variables obtained by the random forest are shown in darker blue, and less important variables are in lighter blue.
The use of artificial intelligence in health care, particularly machine learning methods, can help to discover underlying patterns and correlations through the learning of datadriven prediction models. We applied various machine learning techniques to visualize and investigate predictive variables leading to metabolic syndrome, which revealed that obesity, serum GOT, serum GPT, CAP score, and HbA_{1c} are the most important predictive variables.
Among these predictive variables, the predictive power of the CAP score was similar to that of other key indices such as obesity. Despite the significance of the CAP score, these factors contribute cumulatively rather than as exclusive alternatives. In other words, more research is required to determine whether the CAP score can be used as a standalone test to screen for metabolic syndrome, and whether a minimum set of non-blood-test variables can be combined with the CAP score to improve the accuracy of predicting metabolic syndrome. Such research may help subjects who are deterred by the inconvenience of overnight fasting or by painful blood sampling.
Metabolic syndrome demonstrates a spectrum of physiological manifestations with groups of pathologies that are complicated and progressive. Traditional diagnostic criteria often dichotomize the population into those with metabolic syndrome and those without. However, based on the results of our PCA, such a sharp distinction may be inappropriate. We found that CAP score, obesity, and HbA_{1c} were the principal factors predicting metabolic syndrome, although E score, γGT, LDL, and GPT also considerably affected the predictions. Notably, GPT had more predictive power than GOT. We consider this difference to reflect the fact that aspartate aminotransferase (GOT) is a less specific indicator of liver damage than alanine aminotransferase (GPT), elevation of which is common in patients with fatty liver. Our study suggests that the current diagnostic criteria for metabolic syndrome fail to capture its wide range of presentations, and should thus be expanded to include hepatic and renal indices.
Liverrelated indices such as γGT, GPT, and E score ranked among the highest predictors in our models. A previous study also showed a strong correlation between liver function tests and metabolic syndrome based on Pearson correlation coefficients [
Multivariate logistic regression has been extensively utilized in medical research, and its many biases have been well documented. One of the drawbacks we observed in our models was the multicollinearity problem. To avoid multicollinearity (
This study has several limitations. First, this was a retrospective study, and therefore a sufficiently powered prospective cohort study is needed to conclusively address the usefulness of supervised machine learning models for diagnosing metabolic syndrome. Second, this study included only health-conscious Taiwanese participants who underwent a self-paid health examination; therefore, the study should be replicated and validated in other populations. Third, this study did not include some newer obesity biomarkers (such as leptin and adiponectin) that may further improve the prediction of metabolic syndrome [
To the best of our knowledge, this is the first study to apply machine learning algorithms to identify metabolic syndrome in subjects examined with FibroScan. We found that decision tree learning algorithms identified metabolic syndrome in self-paid health examination subjects with high accuracy, and that obesity, serum GOT, serum GPT, CAP score, and HbA_{1c} emerged as important predictive variables. More research is required to validate the CAP score as a standalone test to screen for metabolic syndrome, and to determine whether a minimum set of non-blood-test variables can be combined with the CAP score to improve the accuracy of predicting metabolic syndrome.
Three-dimensional principal components analysis (PCA) of all risk factors. The three-dimensional PCA plots provide different viewpoints for observing the scatter of the metabolic syndrome and nonmetabolic syndrome groups. All factors in Table 1 are considered in this analysis. The leading principal components PC1, PC2, and PC3, which explain the most variability among the samples, are shown in all three-dimensional graphs. The aggregation of the two groups is obvious when rotating the coordinates in the three-dimensional graph.
Three-dimensional principal components analysis of only the major risk factors. In this case, the distinction between the metabolic syndrome and nonmetabolic syndrome groups is apparent because only the major variables obtained from Table 3 are included, although the borders of the two groups still overlap.
Receiver operating characteristic curves and area under the curve (AUC) values of six decision trees. The specificity is indicated by the color bar, and the diagonal is presented as a dashed line. Most AUC values exceed 0.80, except that of the C5.0 tree.
AFP: alpha-fetoprotein
ALKp: alkaline phosphatase
AUC: area under the curve
BUN: blood urea nitrogen
CAP: controlled attenuation parameter score
CART: classification and regression tree
CHAID: chi-square automatic interaction detection
CKD: chronic kidney disease
ctree: conditional inference tree
eGFR: estimated glomerular filtration rate
E score: liver stiffness score
evtree: evolutionary learning of globally optimal trees
γGT: gamma-glutamyl transpeptidase
GOT: serum glutamic-oxaloacetic transaminase
GPT: serum glutamic-pyruvic transaminase
HbA_{1c}: glycated hemoglobin
HDL: high-density lipoprotein
LDL: low-density lipoprotein
MDRD: Modification of Diet in Renal Disease
NAFLD: nonalcoholic fatty liver disease
PCA: principal components analysis
ROC: receiver operating characteristic
TSH: thyroid-stimulating hormone
UA: uric acid
VIF: variance inflation factor
This study was supported by the Taiwan National Science Foundation (grant NSC108-2314-B-038-073). The funding body had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
None declared.