TY - JOUR
AU - Choudhary, Soumya
AU - Thomas, Nikita
AU - Alshamrani, Sultan
AU - Srinivasan, Girish
AU - Ellenberger, Janine
AU - Nawaz, Usman
AU - Cohen, Roy
PY - 2022
DA - 2022/8/30
TI - A Machine Learning Approach for Continuous Mining of Nonidentifiable Smartphone Data to Create a Novel Digital Biomarker Detecting Generalized Anxiety Disorder: Prospective Cohort Study
JO - JMIR Med Inform
SP - e38943
VL - 10
IS - 8
KW - digital phenotyping
KW - machine learning
KW - mental health
KW - profiling metric
KW - smartphone data
KW - anxiety assessment
KW - mining technique
KW - algorithm prediction
KW - digital marker
KW - behavioral marker
KW - anxiety
AB - Background: Anxiety is one of the leading causes of mental health disability around the world. Currently, a majority of the population who experience anxiety go undiagnosed or untreated. New and innovative ways of diagnosing and monitoring anxiety have emerged using smartphone sensor–based monitoring as a metric for the management of anxiety. This study is novel in that it uses nonidentifiable smartphone usage data to help detect and monitor anxiety remotely, continuously, and passively. Objective: This study aims to evaluate the accuracy of a novel mental behavioral profiling metric derived from smartphone usage for the identification and tracking of generalized anxiety disorder (GAD). Methods: Smartphone data and self-reported 7-item GAD anxiety assessments were collected from 229 participants using an Android operating system smartphone in an observational study over an average of 14 days (SD 29.8). A total of 34 features were mined from continuous smartphone usage data to construct a potential digital phenotyping marker. We further analyzed the correlation of these digital behavioral markers with each item of the 7-item Generalized Anxiety Disorder Scale (GAD-7) and their influence on the predictions of machine learning algorithms. Results: A total of 229 participants who had completed the GAD-7 assessment and had at least one set of passive digital data collected within a 24-hour period were recruited in this study. The mean GAD-7 score was 11.8 (SD 5.7). Regression modeling was compared with classification modeling, and the highest prediction accuracy was achieved by a binary XGBoost classification model (precision of 73%-81%; recall of 68%-87%; F1-score of 71%-79%; accuracy of 76%; area under the curve of 80%). Nonparametric permutation testing with Pearson correlation results indicated that the proposed metric (Mental Health Similarity Score [MHSS]) had a colinear relationship with GAD-7 items 1, 3, and 7. Conclusions: The proposed MHSS metric demonstrates the feasibility of using passively collected nonintrusive smartphone data and machine learning–based data mining techniques to track an individual’s daily anxiety levels with 76% accuracy that directly relates to the GAD-7 scale.
SN - 2291-9694
UR - https://medinform.jmir.org/2022/8/e38943
UR - https://doi.org/10.2196/38943
UR - http://www.ncbi.nlm.nih.gov/pubmed/36040777
DO - 10.2196/38943
ID - info:doi/10.2196/38943
ER -