@Article{info:doi/10.2196/38943, author="Choudhary, Soumya and Thomas, Nikita and Alshamrani, Sultan and Srinivasan, Girish and Ellenberger, Janine and Nawaz, Usman and Cohen, Roy", title="A Machine Learning Approach for Continuous Mining of Nonidentifiable Smartphone Data to Create a Novel Digital Biomarker Detecting Generalized Anxiety Disorder: Prospective Cohort Study", journal="JMIR Med Inform", year="2022", month="Aug", day="30", volume="10", number="8", pages="e38943", keywords="digital phenotyping; machine learning; mental health; profiling metric; smartphone data; anxiety assessment; mining technique; algorithm prediction; digital marker; behavioral marker; anxiety", abstract="Background: Anxiety is one of the leading causes of mental health disability around the world. Currently, a majority of the population who experience anxiety go undiagnosed or untreated. New and innovative ways of diagnosing and monitoring anxiety have emerged using smartphone sensor--based monitoring as a metric for the management of anxiety. This is a novel study as it adds to the field of research through the use of nonidentifiable smartphone usage to help detect and monitor anxiety remotely and in a continuous and passive manner. Objective: This study aims to evaluate the accuracy of a novel mental behavioral profiling metric derived from smartphone usage for the identification and tracking of generalized anxiety disorder (GAD). Methods: Smartphone data and self-reported 7-item GAD anxiety assessments were collected from 229 participants using an Android operating system smartphone in an observational study over an average of 14 days (SD 29.8). A total of 34 features were mined to be constructed as a potential digital phenotyping marker from continuous smartphone usage data. 
We further analyzed the correlation of these digital behavioral markers against each item of the 7-item Generalized Anxiety Disorder Scale (GAD-7) and its influence on the predictions of machine learning algorithms. Results: A total of 229 participants who had completed the GAD-7 assessment and had at least one set of passive digital data collected within a 24-hour period were recruited in this study. The mean GAD-7 score was 11.8 (SD 5.7). Regression modeling was tested against classification modeling, and the highest prediction accuracy was achieved from a binary XGBoost classification model (precision of 73{\%}--81{\%}; recall of 68{\%}--87{\%}; F1-score of 71{\%}--79{\%}; accuracy of 76{\%}; area under the curve of 80{\%}). Nonparametric permutation testing with Pearson correlation results indicated that the proposed metric (Mental Health Similarity Score [MHSS]) had a collinear relationship with GAD-7 items 1, 3, and 7. Conclusions: The proposed MHSS metric demonstrates the feasibility of using passively collected nonintrusive smartphone data and machine learning--based data mining techniques to track an individual's daily anxiety levels with 76{\%} accuracy that directly relates to the GAD-7 scale.", issn="2291-9694", doi="10.2196/38943", url="https://medinform.jmir.org/2022/8/e38943", url="https://doi.org/10.2196/38943", url="http://www.ncbi.nlm.nih.gov/pubmed/36040777" }