@Article{info:doi/10.2196/medinform.8611, author="Zheng, Jiaping and Yu, Hong", title="Assessing the Readability of Medical Documents: A Ranking Approach", journal="JMIR Med Inform", year="2018", month="Mar", day="23", volume="6", number="1", pages="e17", keywords="electronic health records", keywords="readability", keywords="comprehension", keywords="machine learning", abstract="Background: The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. Objective: Our objective was to develop a machine learning--based system to assess readability levels of complex documents such as EHR notes. Methods: We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. Results: Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). Conclusions: We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. 
It is not constrained to a predefined set of readability levels, a common design in many machine learning--based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge. ", doi="10.2196/medinform.8611", url="http://medinform.jmir.org/2018/1/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/29572199" } @Article{info:doi/10.2196/medinform.8286, author="Sadat, Nazmus Md and Jiang, Xiaoqian and Aziz, Al Md Momin and Wang, Shuang and Mohammed, Noman", title="Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation", journal="JMIR Med Inform", year="2018", month="Mar", day="05", volume="6", number="1", pages="e14", keywords="privacy-preserving regression analysis", keywords="Intel SGX", keywords="somewhat homomorphic encryption", abstract="Background: Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Objective: Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Methods: Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. 
We designed, developed, and evaluated a hybrid cryptographic framework, which can securely perform regression analysis, a fundamental machine learning algorithm using somewhat homomorphic encryption and a newly introduced secure hardware component of Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. Results: Experimental results demonstrate that our proposed method provides a better trade-off in terms of security and efficiency than solely secure hardware-based methods. Besides, there is no approximation error. Computed model parameters are exactly similar to plaintext results. Conclusions: To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, which leverages both somewhat homomorphic encryption and Intel SGX, is not proposed or evaluated to this date. Our proposed framework ensures data security and computational efficiency at the same time. ", doi="10.2196/medinform.8286", url="http://medinform.jmir.org/2018/1/e14/", url="http://www.ncbi.nlm.nih.gov/pubmed/29506966" } @Article{info:doi/10.2196/publichealth.9138, author="Fukuoka, Yoshimi and Zhou, Mo and Vittinghoff, Eric and Haskell, William and Goldberg, Ken and Aswani, Anil", title="Objectively Measured Baseline Physical Activity Patterns in Women in the mPED Trial: Cluster Analysis", journal="JMIR Public Health Surveill", year="2018", month="Feb", day="01", volume="4", number="1", pages="e10", keywords="accelerometer", keywords="physical activity", keywords="cluster analysis", keywords="women", keywords="randomized controlled trial", keywords="machine learning", keywords="body mass index", keywords="metabolism", keywords="primary prevention", keywords="mHealth", abstract="Background: Determining patterns of physical activity throughout the day could assist in developing more personalized interventions or physical activity guidelines in general and, in particular, for women who are less likely to be 
physically active than men. Objective: The aims of this report are to identify clusters of women based on accelerometer-measured baseline raw metabolic equivalent of task (MET) values and a normalized version of the METs $\geq$3 data, and to compare sociodemographic and cardiometabolic risks among these identified clusters. Methods: A total of 215 women who were enrolled in the Mobile Phone Based Physical Activity Education (mPED) trial and wore an accelerometer for at least 8 hours per day for the 7 days prior to the randomization visit were analyzed. The k-means clustering method and the Lloyd algorithm were used on the data. We used the elbow method to choose the number of clusters, looking at the percentage of variance explained as a function of the number of clusters. Results: The results of the k-means cluster analyses of raw METs revealed three different clusters. The unengaged group (n=102) had the highest depressive symptoms score compared with the afternoon engaged (n=65) and morning engaged (n=48) groups (overall P<.001). Based on a normalized version of the METs $\geq$3 data, the moderate-to-vigorous physical activity (MVPA) evening peak group (n=108) had a higher body mass index (P=.03), waist circumference (P=.02), and hip circumference (P=.03) than the MVPA noon peak group (n=61). Conclusions: Categorizing physically inactive individuals into more specific activity patterns could aid in creating timing, frequency, duration, and intensity of physical activity interventions for women. Further research is needed to confirm these cluster groups using a large national dataset. 
Trial Registration: ClinicalTrials.gov NCT01280812; https://clinicaltrials.gov/ct2/show/NCT01280812 (Archived by WebCite at http://www.webcitation.org/6vVyLzwft) ", doi="10.2196/publichealth.9138", url="http://publichealth.jmir.org/2018/1/e10/", url="http://www.ncbi.nlm.nih.gov/pubmed/29391341" } @Article{info:doi/10.2196/jmir.9268, author="Ye, Chengyin and Fu, Tianyun and Hao, Shiying and Zhang, Yan and Wang, Oliver and Jin, Bo and Xia, Minjie and Liu, Modi and Zhou, Xin and Wu, Qian and Guo, Yanting and Zhu, Chunqing and Li, Yu-Ming and Culver, S. Devore and Alfreds, T. Shaun and Stearns, Frank and Sylvester, G. Karl and Widen, Eric and McElhinney, Doff and Ling, Xuefeng", title="Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning", journal="J Med Internet Res", year="2018", month="Jan", day="30", volume="20", number="1", pages="e22", keywords="hypertension", keywords="risk assessment", keywords="electronic health records", keywords="multiple chronic conditions", keywords="mental disorders", keywords="social determinants of health", abstract="Background: As a high-prevalence health condition, hypertension is clinically costly, difficult to manage, and often leads to severe and life-threatening diseases such as cardiovascular disease (CVD) and stroke. Objective: The aim of this study was to develop and validate prospectively a risk prediction model of incident essential hypertension within the following year. Methods: Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. Retrospective (N=823,627, calendar year 2013) and prospective (N=680,810, calendar year 2014) cohorts were formed. A machine learning algorithm, XGBoost, was adopted in the process of feature selection and model building. It generated an ensemble of classification trees and assigned a final predictive risk score to each individual. 
Results: The 1-year incident hypertension risk model attained areas under the curve (AUCs) of 0.917 and 0.870 in the retrospective and prospective cohorts, respectively. Risk scores were calculated and stratified into five risk categories, with 4526 out of 381,544 patients (1.19\%) in the lowest risk category (score 0-0.05) and 21,050 out of 41,329 patients (50.93\%) in the highest risk category (score 0.4-1) receiving a diagnosis of incident hypertension in the following 1 year. Type 2 diabetes, lipid disorders, CVDs, mental illness, clinical utilization indicators, and socioeconomic determinants were recognized as driving or associated features of incident essential hypertension. The very high risk population mainly comprised elderly (age>50 years) individuals with multiple chronic conditions, especially those receiving medications for mental disorders. Disparities were also found in social determinants, including some community-level factors associated with higher risk and others that were protective against hypertension. Conclusions: With statewide EHR datasets, our study prospectively validated an accurate 1-year risk prediction model for incident essential hypertension. Our real-time predictive analytic model has been deployed in the state of Maine, providing implications in interventions for hypertension and related diseases and hopefully enhancing hypertension care. 
", doi="10.2196/jmir.9268", url="http://www.jmir.org/2018/1/e22/", url="http://www.ncbi.nlm.nih.gov/pubmed/29382633" } @Article{info:doi/10.2196/mhealth.9117, author="Zhou, Mo and Fukuoka, Yoshimi and Mintz, Yonatan and Goldberg, Ken and Kaminsky, Philip and Flowers, Elena and Aswani, Anil", title="Evaluating Machine Learning--Based Automated Personalized Daily Step Goals Delivered Through a Mobile Phone App: Randomized Controlled Trial", journal="JMIR Mhealth Uhealth", year="2018", month="Jan", day="25", volume="6", number="1", pages="e28", keywords="physical activity", keywords="cell phone", keywords="fitness tracker", keywords="clinical trial", abstract="Background: Growing evidence shows that fixed, nonpersonalized daily step goals can discourage individuals, resulting in unchanged or even reduced physical activity. Objective: The aim of this randomized controlled trial (RCT) was to evaluate the efficacy of an automated mobile phone--based personalized and adaptive goal-setting intervention using machine learning as compared with an active control with steady daily step goals of 10,000. Methods: In this 10-week RCT, 64 participants were recruited via email announcements and were required to attend an initial in-person session. The participants were randomized into either the intervention or active control group with a one-to-one ratio after a run-in period for data collection. A study-developed mobile phone app (which delivers daily step goals using push notifications and allows real-time physical activity monitoring) was installed on each participant's mobile phone, and participants were asked to keep their phone in a pocket throughout the entire day. Through the app, the intervention group received fully automated adaptively personalized daily step goals, and the control group received constant step goals of 10,000 steps per day. Daily step count was objectively measured by the study-developed mobile phone app. 
Results: The mean (SD) age of participants was 41.1 (11.3) years, and 83\% (53/64) of participants were female. The baseline demographics between the 2 groups were similar (P>.05). Participants in the intervention group (n=34) had a decrease in mean (SD) daily step count of 390 (490) steps between run-in and 10 weeks, compared with a decrease of 1350 (420) steps among control participants (n=30; P=.03). The net difference in daily steps between the groups was 960 steps (95\% CI 90-1830 steps). Both groups had a decrease in daily step count between run-in and 10 weeks because interventions were also provided during run-in and no natural baseline was collected. Conclusions: The results showed the short-term efficacy of this intervention, which should be formally evaluated in a full-scale RCT with a longer follow-up period. Trial Registration: ClinicalTrials.gov: NCT02886871; https://clinicaltrials.gov/ct2/show/NCT02886871 (Archived by WebCite at http://www.webcitation.org/6wM1Be1Ng). ", doi="10.2196/mhealth.9117", url="http://mhealth.jmir.org/2018/1/e28/", url="http://www.ncbi.nlm.nih.gov/pubmed/29371177" } @Article{info:doi/10.2196/medinform.8751, author="Kim, Seongsoon and Park, Donghyeon and Choi, Yonghwa and Lee, Kyubum and Kim, Byounggun and Jeon, Minji and Kim, Jihye and Tan, Choon Aik and Kang, Jaewoo", title="A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis", journal="JMIR Med Inform", year="2018", month="Jan", day="05", volume="6", number="1", pages="e2", keywords="machine comprehension", keywords="biomedical text comprehension", keywords="deep learning", keywords="machine comprehension dataset", abstract="Background: With the development of artificial intelligence (AI) technology centered on deep-learning, the computer has evolved to a point where it can read a given text and answer a question based on the context of the text. 
Such a specific task is known as the task of machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elementary school-level storybooks. However, no attempt has been made to determine whether an up-to-date deep learning-based machine comprehension model can also process scientific literature containing expert-level knowledge, especially in the biomedical domain. Objective: This study aims to investigate whether a machine comprehension model can process biomedical articles as well as general texts. Since there is no dataset for the biomedical literature comprehension task, our work includes generating a large-scale question answering dataset using PubMed and manually evaluating the generated dataset. Methods: We present an attention-based deep neural model tailored to the biomedical domain. To further enhance the performance of our model, we used a pretrained word vector and biomedical entity type embedding. We also developed an ensemble method of combining the results of several independent models to reduce the variance of the answers from the models. Results: The experimental results showed that our proposed deep neural network model outperformed the baseline model by more than 7\% on the new dataset. We also evaluated human performance on the new dataset. The human evaluation result showed that our deep neural model outperformed humans in comprehension by 22\% on average. Conclusions: In this work, we introduced a new task of machine comprehension in the biomedical domain using a deep neural model. Since there was no large-scale dataset for training deep neural models in the biomedical domain, we created the new cloze-style datasets Biomedical Knowledge Comprehension Title (BMKC\_T) and Biomedical Knowledge Comprehension Last Sentence (BMKC\_LS) (together referred to as BioMedical Knowledge Comprehension) using the PubMed corpus. 
The experimental results showed that the performance of our model is much higher than that of humans. We observed that our model performed consistently better regardless of the degree of difficulty of a text, whereas humans have difficulty when performing biomedical literature comprehension tasks that require expert-level knowledge. ", doi="10.2196/medinform.8751", url="http://medinform.jmir.org/2018/1/e2/", url="http://www.ncbi.nlm.nih.gov/pubmed/29305341" } @Article{info:doi/10.2196/medinform.9170, author="P Tafti, Ahmad and Badger, Jonathan and LaRose, Eric and Shirzadi, Ehsan and Mahnke, Andrea and Mayer, John and Ye, Zhan and Page, David and Peissig, Peggy", title="Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure", journal="JMIR Med Inform", year="2017", month="Dec", day="08", volume="5", number="4", pages="e51", keywords="adverse drug event", keywords="adverse drug reaction", keywords="drug side effects", keywords="machine learning", keywords="text mining", abstract="Background: The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. Objective: The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. Methods: We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. 
Results: The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7\%, 93.6\%, 93.0\%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. Conclusions: To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis. ", doi="10.2196/medinform.9170", url="http://medinform.jmir.org/2017/4/e51/", url="http://www.ncbi.nlm.nih.gov/pubmed/29222076" } @Article{info:doi/10.2196/medinform.8680, author="Wellner, Ben and Grand, Joan and Canzone, Elizabeth and Coarr, Matt and Brady, W. Patrick and Simmons, Jeffrey and Kirkendall, Eric and Dean, Nathan and Kleinman, Monica and Sylvester, Peter", title="Predicting Unplanned Transfers to the Intensive Care Unit: A Machine Learning Approach Leveraging Diverse Clinical Elements", journal="JMIR Med Inform", year="2017", month="Nov", day="22", volume="5", number="4", pages="e45", keywords="clinical deterioration", keywords="machine learning", keywords="data mining", keywords="electronic health record", keywords="patient acuity", keywords="vital signs", keywords="nursing assessment", keywords="clinical laboratory techniques", abstract="Background: Early warning scores aid in the detection of pediatric clinical deteriorations but include limited data inputs, rarely include data trends over time, and have limited validation. Objective: Machine learning methods that make use of large numbers of predictor variables are now commonplace. 
This work examines how different types of predictor variables derived from the electronic health record affect the performance of predicting unplanned transfers to the intensive care unit (ICU) at three large children's hospitals. Methods: We trained separate models with data from three different institutions from 2011 through 2013 and evaluated models with 2014 data. Cases consisted of patients who transferred from the floor to the ICU and met one or more of 5 different a priori defined criteria for suspected unplanned transfers. Controls were patients who were never transferred to the ICU. Predictor variables for the models were derived from vitals, labs, acuity scores, and nursing assessments. Classification models consisted of L1 and L2 regularized logistic regression and neural network models. We evaluated model performance over prediction horizons ranging from 1 to 16 hours. Results: Across the three institutions, the c-statistic values for our best models were 0.892 (95\% CI 0.875-0.904), 0.902 (95\% CI 0.880-0.923), and 0.899 (95\% CI 0.879-0.919) for the task of identifying unplanned ICU transfer 6 hours before its occurrence and achieved 0.871 (95\% CI 0.855-0.888), 0.872 (95\% CI 0.850-0.895), and 0.850 (95\% CI 0.825-0.875) for a prediction horizon of 16 hours. For our first model at 80\% sensitivity, this resulted in a specificity of 80.5\% (95\% CI 77.4-83.7) and a positive predictive value of 5.2\% (95\% CI 4.5-6.2). Conclusions: Feature-rich models with many predictor variables allow for patient deterioration to be predicted accurately, even up to 16 hours in advance. ", doi="10.2196/medinform.8680", url="http://medinform.jmir.org/2017/4/e45/", url="http://www.ncbi.nlm.nih.gov/pubmed/29167089" } @Article{info:doi/10.2196/mhealth.8201, author="Shawen, Nicholas and Lonini, Luca and Mummidisetty, Krishna Chaithanya and Shparii, Ilona and Albert, V. 
Mark and Kording, Konrad and Jayaraman, Arun", title="Fall Detection in Individuals With Lower Limb Amputations Using Mobile Phones: Machine Learning Enhances Robustness for Real-World Applications", journal="JMIR Mhealth Uhealth", year="2017", month="Oct", day="11", volume="5", number="10", pages="e151", keywords="fall detection", keywords="lower limb amputation", keywords="mobile phones", keywords="machine learning", abstract="Background: Automatically detecting falls with mobile phones provides an opportunity for rapid response to injuries and better knowledge of what precipitated the fall and its consequences. This is beneficial for populations that are prone to falling, such as people with lower limb amputations. Prior studies have focused on fall detection in able-bodied individuals using data from a laboratory setting. Such approaches may provide a limited ability to detect falls in amputees and in real-world scenarios. Objective: The aim was to develop a classifier that uses data from able-bodied individuals to detect falls in individuals with a lower limb amputation, while they freely carry the mobile phone in different locations and during free-living. Methods: We obtained 861 simulated indoor and outdoor falls from 10 young control (non-amputee) individuals and 6 individuals with a lower limb amputation. In addition, we recorded a broad database of activities of daily living, including data from three participants' free-living routines. Sensor readings (accelerometer and gyroscope) from a mobile phone were recorded as participants freely carried it in three common locations---on the waist, in a pocket, and in the hand. A set of 40 features were computed from the sensors data and four classifiers were trained and combined through stacking to detect falls. 
We compared the performance of two population-specific models, trained and tested on either able-bodied or amputee participants, with that of a model trained on able-bodied participants and tested on amputees. A simple threshold-based classifier was used to benchmark our machine-learning classifier. Results: The accuracy of fall detection in amputees for a model trained on control individuals (sensitivity: mean 0.989, 1.96*standard error of the mean [SEM] 0.017; specificity: mean 0.968, SEM 0.025) was not statistically different (P=.69) from that of a model trained on the amputee population (sensitivity: mean 0.984, SEM 0.016; specificity: mean 0.965, SEM 0.022). Detection of falls in control individuals yielded similar results (sensitivity: mean 0.979, SEM 0.022; specificity: mean 0.991, SEM 0.012). A mean 2.2 (SD 1.7) false alarms per day were obtained when evaluating the model (vs mean 122.1, SD 166.1 based on thresholds) on data recorded as participants carried the phone during their daily routine for two or more days. Machine-learning classifiers outperformed the threshold-based one (P<.001). Conclusions: A mobile phone-based fall detection model can use data from non-amputee individuals to detect falls in individuals walking with a prosthesis. We successfully detected falls when the mobile phone was carried across multiple locations and without a predetermined orientation. Furthermore, the number of false alarms yielded by the model over a longer period of time was reasonably low. This moves the application of mobile phone-based fall detection systems closer to a real-world use case scenario. 
", doi="10.2196/mhealth.8201", url="http://mhealth.jmir.org/2017/10/e151/", url="http://www.ncbi.nlm.nih.gov/pubmed/29021127" } @Article{info:doi/10.2196/medinform.8076, author="Luo, Gang and Sward, Katherine", title="A Roadmap for Optimizing Asthma Care Management via Computational Approaches", journal="JMIR Med Inform", year="2017", month="Sep", day="26", volume="5", number="3", pages="e32", keywords="patient care management", keywords="clinical decision support", keywords="machine learning", doi="10.2196/medinform.8076", url="http://medinform.jmir.org/2017/3/e32/", url="http://www.ncbi.nlm.nih.gov/pubmed/28951380" } @Article{info:doi/10.2196/resprot.7757, author="Luo, Gang and Stone, L. Bryan and Johnson, D. Michael and Tarczy-Hornoch, Peter and Wilcox, B. Adam and Mooney, D. Sean and Sheng, Xiaoming and Haug, J. Peter and Nkoy, L. Flory", title="Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods", journal="JMIR Res Protoc", year="2017", month="Aug", day="29", volume="6", number="8", pages="e175", keywords="machine learning", keywords="automated temporal aggregation", keywords="automatic model selection", keywords="care management", keywords="clinical big data", abstract="Background: To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15\% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. 
Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. Objective: This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. Methods: This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers; and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes. Results: We are currently writing Auto-ML's design document. We intend to finish our study by around the year 2022. Conclusions: Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, health care researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in health care and improve patient outcomes. 
", doi="10.2196/resprot.7757", url="http://www.researchprotocols.org/2017/8/e175/", url="http://www.ncbi.nlm.nih.gov/pubmed/28851678" } @Article{info:doi/10.2196/jmir.7956, author="Birnbaum, L. Michael and Ernala, Kiranmai Sindhu and Rizvi, F. Asra and De Choudhury, Munmun and Kane, M. John", title="A Collaborative Approach to Identifying Social Media Markers of Schizophrenia by Employing Machine Learning and Clinical Appraisals", journal="J Med Internet Res", year="2017", month="Aug", day="14", volume="19", number="8", pages="e289", keywords="schizophrenia", keywords="psychotic disorders", keywords="online social networks", keywords="machine learning", keywords="linguistic analysis", keywords="Twitter", abstract="Background: Linguistic analysis of publicly available Twitter feeds have achieved success in differentiating individuals who self-disclose online as having schizophrenia from healthy controls. To date, limited efforts have included expert input to evaluate the authenticity of diagnostic self-disclosures. Objective: This study aims to move from noisy self-reports of schizophrenia on social media to more accurate identification of diagnoses by exploring a human-machine partnered approach, wherein computational linguistic analysis of shared content is combined with clinical appraisals. Methods: Twitter timeline data, extracted from 671 users with self-disclosed diagnoses of schizophrenia, was appraised for authenticity by expert clinicians. Data from disclosures deemed true were used to build a classifier aiming to distinguish users with schizophrenia from healthy controls. Results from the classifier were compared to expert appraisals on new, unseen Twitter users. Results: Significant linguistic differences were identified in the schizophrenia group including greater use of interpersonal pronouns (P<.001), decreased emphasis on friendship (P<.001), and greater emphasis on biological processes (P<.001). 
The resulting classifier distinguished users with disclosures of schizophrenia deemed genuine from control users with a mean accuracy of 88\% using linguistic data alone. Compared to clinicians on new, unseen users, the classifier's precision, recall, and accuracy measures were 0.27, 0.77, and 0.59, respectively. Conclusions: These data reinforce the need for ongoing collaborations integrating expertise from multiple fields to strengthen our ability to accurately identify and effectively engage individuals with mental illness online. These collaborations are crucial to overcome some of mental illnesses' biggest challenges by using digital technology. ", doi="10.2196/jmir.7956", url="http://www.jmir.org/2017/8/e289/", url="http://www.ncbi.nlm.nih.gov/pubmed/28807891" } @Article{info:doi/10.2196/mhealth.7521, author="Dominguez Veiga, Juan Jose and O'Reilly, Martin and Whelan, Darragh and Caulfield, Brian and Ward, E. Tomas", title="Feature-Free Activity Classification of Inertial Sensor Data With Machine Vision Techniques: Method, Development, and Evaluation", journal="JMIR Mhealth Uhealth", year="2017", month="Aug", day="04", volume="5", number="8", pages="e115", keywords="machine learning", keywords="exercise", keywords="biofeedback", abstract="Background: Inertial sensors are one of the most commonly used sources of data for human activity recognition (HAR) and exercise detection (ED) tasks. The time series produced by these sensors are generally analyzed through numerical methods. Machine learning techniques such as random forests or support vector machines are popular in this field for classification efforts, but they need to be supported through the isolation of a potentially large number of additionally crafted features derived from the raw data. This feature preprocessing step can involve nontrivial digital signal processing (DSP) techniques. 
However, in many cases, the researchers interested in this type of activity recognition problem do not possess the necessary technical background for this feature-set development. Objective: The study aimed to present a novel application of established machine vision methods to provide interested researchers with an easier entry path into the HAR and ED fields. This can be achieved by removing the need for deep DSP skills through the use of transfer learning. This can be done by using a pretrained convolutional neural network (CNN) developed for machine vision purposes for an exercise classification effort. The new method should simply require researchers to generate plots of the signals that they would like to build classifiers with, store them as images, and then place them in folders according to their training label before retraining the network. Methods: We applied a CNN, an established machine vision technique, to the task of ED. TensorFlow, a high-level framework for machine learning, was used to facilitate infrastructure needs. Simple time series plots generated directly from accelerometer and gyroscope signals are used to retrain an openly available neural network (Inception), originally developed for machine vision tasks. Data from 82 healthy volunteers, performing 5 different exercises while wearing a lumbar-worn inertial measurement unit (IMU), were collected. The ability of the proposed method to automatically classify the exercise being completed was assessed using this dataset. For comparative purposes, classification using the same dataset was also performed using the more conventional approach of feature extraction and classification using random forest classifiers. Results: With the collected dataset and the proposed method, the different exercises could be recognized with a 95.89\% (3827/3991) accuracy, which is competitive with current state-of-the-art techniques in ED. 
Conclusions: The high level of accuracy attained with the proposed approach indicates that the waveform morphologies in the time-series plots for each of the exercises are sufficiently distinct among the participants to allow the use of machine vision approaches. The use of high-level machine learning frameworks, coupled with the novel use of machine vision techniques instead of complex manually crafted features, may facilitate access to research in the HAR field for individuals without extensive digital signal processing or machine learning backgrounds. ", doi="10.2196/mhealth.7521", url="http://mhealth.jmir.org/2017/8/e115/", url="http://www.ncbi.nlm.nih.gov/pubmed/28778851" } @Article{info:doi/10.2196/medinform.7779, author="Tapi Nzali, Donald Mike and Bringay, Sandra and Lavergne, Christian and Mollevi, Caroline and Opitz, Thomas", title="What Patients Can Tell Us: Topic Analysis for Social Media on Breast Cancer", journal="JMIR Med Inform", year="2017", month="Jul", day="31", volume="5", number="3", pages="e23", keywords="breast cancer", keywords="text mining", keywords="social media", keywords="unsupervised learning", abstract="Background: Social media dedicated to health are increasingly used by patients and health professionals. They are rich textual resources with content generated through free exchange between patients. We are proposing a method to tackle the problem of retrieving clinically relevant information from such social media in order to analyze the quality of life of patients with breast cancer. Objective: Our aim was to detect the different topics discussed by patients on social media and to relate them to functional and symptomatic dimensions assessed in the internationally standardized self-administered questionnaires used in cancer clinical trials (European Organization for Research and Treatment of Cancer [EORTC] Quality of Life Questionnaire Core 30 [QLQ-C30] and breast cancer module [QLQ-BR23]). 
Methods: First, we applied a classic text mining technique, latent Dirichlet allocation (LDA), to detect the different topics discussed on social media dealing with breast cancer. We applied the LDA model to 2 datasets composed of messages extracted from public Facebook groups and from a public health forum (cancerdusein.org, a French breast cancer forum) with relevant preprocessing. Second, we applied a customized Jaccard coefficient to automatically compute similarity distance between the topics detected with LDA and the questions in the self-administered questionnaires used to study quality of life. Results: Among the 23 topics present in the self-administered questionnaires, 22 matched with the topics discussed by patients on social media. Interestingly, these topics corresponded to 95\% (22/23) of the forum and 86\% (20/23) of the Facebook group topics. These figures underline that topics related to quality of life are an important concern for patients. However, 5 social media topics had no corresponding topic in the questionnaires, which do not cover all of the patients' concerns. Of these 5 topics, 2 could potentially be used in the questionnaires, and these 2 topics corresponded to a total of 3.10\% (523/16,868) of topics in the cancerdusein.org corpus and 4.30\% (3014/70,092) of the Facebook corpus. Conclusions: We found a good correspondence between detected topics on social media and topics covered by the self-administered questionnaires, which substantiates the sound construction of such questionnaires. We detected new emerging topics from social media that can be used to complete current self-administered questionnaires. Moreover, we confirmed that social media mining is an important source of information for complementary analysis of quality of life. 
", doi="10.2196/medinform.7779", url="http://medinform.jmir.org/2017/3/e23/", url="http://www.ncbi.nlm.nih.gov/pubmed/28760725" } @Article{info:doi/10.2196/medinform.7140, author="Elmessiry, Adel and Cooper, O. William and Catron, F. Thomas and Karrass, Jan and Zhang, Zhe and Singh, P. Munindar", title="Triaging Patient Complaints: Monte Carlo Cross-Validation of Six Machine Learning Classifiers", journal="JMIR Med Inform", year="2017", month="Jul", day="31", volume="5", number="3", pages="e19", keywords="natural language processing", keywords="NLP", keywords="machine learning", keywords="patient complaints", abstract="Background: Unsolicited patient complaints can be a useful service recovery tool for health care organizations. Some patient complaints contain information that may necessitate further action on the part of the health care organization and/or the health care professional. Current approaches depend on the manual processing of patient complaints, which can be costly, slow, and challenging in terms of scalability. Objective: The aim of this study was to evaluate automatic patient triage, which can potentially improve response time and provide much-needed scale, thereby enhancing opportunities to encourage physicians to self-regulate. Methods: We implemented a comparison of several well-known machine learning classifiers to detect whether a complaint was associated with a physician or his/her medical practice. We compared these classifiers using a real-life dataset containing 14,335 patient complaints associated with 768 physicians that was extracted from patient complaints collected by the Patient Advocacy Reporting System developed at Vanderbilt University and associated institutions. We conducted a 10-splits Monte Carlo cross-validation to validate our results. Results: We achieved an accuracy of 82\% and F-score of 81\% in correctly classifying patient complaints with sensitivity and specificity of 0.76 and 0.87, respectively. 
Conclusions: We demonstrate that natural language processing methods based on modeling patient complaint text can be effective in identifying those patient complaints requiring physician action. ", doi="10.2196/medinform.7140", url="http://medinform.jmir.org/2017/3/e19/", url="http://www.ncbi.nlm.nih.gov/pubmed/28760726" } @Article{info:doi/10.2196/medinform.7954, author="Hao, Shiying and Fu, Tianyun and Wu, Qian and Jin, Bo and Zhu, Chunqing and Hu, Zhongkai and Guo, Yanting and Zhang, Yan and Yu, Yunxian and Fouts, Terry and Ng, Phillip and Culver, S. Devore and Alfreds, T. Shaun and Stearns, Frank and Sylvester, G. Karl and Widen, Eric and McElhinney, B. Doff and Ling, B. Xuefeng", title="Estimating One-Year Risk of Incident Chronic Kidney Disease: Retrospective Development and Validation Study Using Electronic Medical Record Data From the State of Maine", journal="JMIR Med Inform", year="2017", month="Jul", day="26", volume="5", number="3", pages="e21", keywords="electronic medical record", keywords="chronic kidney disease", keywords="risk model", keywords="retrospective study", abstract="Background: Chronic kidney disease (CKD) is a major public health concern in the United States with high prevalence, growing incidence, and serious adverse outcomes. Objective: We aimed to develop and validate a model to identify patients at risk of receiving a new diagnosis of CKD (incident CKD) during the next 1 year in a general population. Methods: The study population consisted of patients who had visited any care facility in the Maine Health Information Exchange network any time between January 1, 2013, and December 31, 2015, and had no history of CKD diagnosis. Two retrospective cohorts of electronic medical records (EMRs) were constructed for model derivation (N=1,310,363) and validation (N=1,430,772). 
The model was derived using a gradient tree-based boost algorithm to assign a score to each individual that measured the probability of receiving a new diagnosis of CKD from January 1, 2014, to December 31, 2014, based on the preceding 1-year clinical profile. A feature selection process was conducted to reduce the dimension of the data from 14,680 EMR features to 146 as predictors in the final model. Relative risk was calculated by the model to gauge the risk ratio of the individual to population mean of receiving a CKD diagnosis in the next 1 year. The model was tested on the validation cohort to predict risk of CKD diagnosis in the period from January 1, 2015, to December 31, 2015, using the preceding 1-year clinical profile. Results: The final model had a c-statistic of 0.871 in the validation cohort. It stratified patients into low-risk (score 0-0.005), intermediate-risk (score 0.005-0.05), and high-risk (score $\geq$ 0.05) levels. The incidence of CKD in the high-risk patient group was 7.94\%, 13.7 times higher than the incidence in the overall cohort (0.58\%). Survival analysis showed that patients in the 3 risk categories had significantly different CKD outcomes as a function of time (P<.001), indicating an effective classification of patients by the model. Conclusions: We developed and validated a model that is able to identify patients at high risk of having CKD in the next 1 year by statistically learning from the EMR-based clinical history in the preceding 1 year. Identification of these patients indicates care opportunities such as monitoring and adopting intervention plans that may benefit the quality of care and outcomes in the long term. 
", doi="10.2196/medinform.7954", url="http://medinform.jmir.org/2017/3/e21/", url="http://www.ncbi.nlm.nih.gov/pubmed/28747298" } @Article{info:doi/10.2196/jmir.7276, author="Cheng, Qijin and Li, MH Tim and Kwok, Chi-Leung and Zhu, Tingshao and Yip, SF Paul", title="Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study", journal="J Med Internet Res", year="2017", month="Jul", day="10", volume="19", number="7", pages="e243", keywords="suicide", keywords="psychological stress", keywords="social media", keywords="Chinese", keywords="natural language", keywords="machine learning", abstract="Background: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. Objective: The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one's suicide risk and emotional distress in Chinese social media. Methods: A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants' Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. Results: A total of 974 Weibo users participated in the survey. 
Those with high suicide probability were marked by a higher usage of pronouns (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verbs (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words were positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. Conclusions: SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life. ", doi="10.2196/jmir.7276", url="http://www.jmir.org/2017/7/e243/", url="http://www.ncbi.nlm.nih.gov/pubmed/28694239" } @Article{info:doi/10.2196/medinform.7123, author="Duz, Marco and Marshall, F. 
John and Parkin, Tim", title="Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records", journal="JMIR Med Inform", year="2017", month="Jun", day="29", volume="5", number="2", pages="e17", keywords="text mining", keywords="data mining", keywords="electronic medical record", keywords="validation studies", abstract="Background: The use of electronic medical records (EMRs) offers opportunity for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used. Objective: The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs. Methods: The dataset used for the validation process included life-long EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, {\textasciitilde}2.6 million rows of data) and obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was the definition of inclusion dictionaries to identify cases, including terms identifying a condition of interest. Words in inclusion dictionaries were selected from the list of all words in the dataset obtained in SS/WS. The second step consisted of defining an exclusion dictionary, including combinations of words to remove cases erroneously classified by the inclusion dictionary alone. The third step was the definition of a reinclusion dictionary to reinclude cases that had been erroneously classified by the exclusion dictionary. 
Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were subsequently reincluded using R v3.0.2 (R Foundation for Statistical Computing, Vienna, Austria). Manual analysis was performed as a separate process by a single experienced clinician reading through the dataset once and classifying each row of data based on the interpretation of the free-text notes. Validation was performed by comparison of the computer-assisted method with manual analysis, which was used as the gold standard. Sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and F values of the computer-assisted process were calculated by comparing them with the manual classification. Results: Lowest sensitivity, specificity, PPVs, NPVs, and F values were 99.82\% (1128/1130), 99.88\% (16410/16429), 94.6\% (223/239), 100.00\% (16410/16412), and 99.0\% (100{\texttimes}2{\texttimes}0.983{\texttimes}0.998/[0.983+0.998]), respectively. The computer-assisted process required a few seconds to run, although an estimated 30 h were required for dictionary creation. Manual classification required approximately 80 man-hours. Conclusions: The critical step in this work is the creation of accurate and inclusive dictionaries to ensure that no potential cases are missed. It is significantly easier to remove false positive terms from a SS/WS selected subset of a large database than to search the original database for potential false negatives. The benefits of using this method are proportional to the size of the dataset to be analyzed. 
", doi="10.2196/medinform.7123", url="http://medinform.jmir.org/2017/2/e17/", url="http://www.ncbi.nlm.nih.gov/pubmed/28663163" } @Article{info:doi/10.2196/jmir.7156, author="Guo, Haihong and Na, Xu and Hou, Li and Li, Jiao", title="Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet", journal="J Med Internet Res", year="2017", month="Jun", day="20", volume="19", number="6", pages="e220", keywords="classification", keywords="natural language processing", keywords="hypertension", keywords="consumer health information", abstract="Background: In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging. Objective: This study aimed to classify health care--related questions posted by the general public (Chinese speakers) on the Internet. Methods: A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure the interrater reliability of multiple annotation results. Using the above corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology. Results: The consumer health question schema was developed with a four-hierarchical-level of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we expressed the Chinese questions as a set of lexical, grammatical, and semantic features. Furthermore, the effective features were selected to improve the question classification performance. 
From the 6-category classification results, we achieved an average precision of 91.41\%, recall of 89.62\%, and F1 score of 90.24\%. Conclusions: In this study, we developed an automatic method to classify questions related to Chinese health care posted by the general public. It enables Artificial Intelligence (AI) agents to understand Internet users' information needs on health care. ", doi="10.2196/jmir.7156", url="http://www.jmir.org/2017/6/e220/", url="http://www.ncbi.nlm.nih.gov/pubmed/28634156" } @Article{info:doi/10.2196/medinform.7235, author="Zheng, Shuai and Lu, J. James and Ghasemzadeh, Nima and Hayek, S. Salim and Quyyumi, A. Arshed and Wang, Fusheng", title="Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies", journal="JMIR Med Inform", year="2017", month="May", day="09", volume="5", number="2", pages="e12", keywords="information extraction", keywords="natural language processing", keywords="controlled vocabulary", keywords="electronic medical records", abstract="Background: Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedbacks for improving the extraction algorithm in real time. Objective: Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods: A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedbacks to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. 
Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results: Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports---each combines history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95\%. Conclusions: IDEAL-X adopts a unique online machine learning--based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. ", doi="10.2196/medinform.7235", url="http://medinform.jmir.org/2017/2/e12/", url="http://www.ncbi.nlm.nih.gov/pubmed/28487265" } @Article{info:doi/10.2196/jmir.7092, author="Park, Eunjeong and Chang, Hyuk-Jae and Nam, Suk Hyo", title="Use of Machine Learning Classifiers and Sensor Data to Detect Neurological Deficit in Stroke Patients", journal="J Med Internet Res", year="2017", month="Apr", day="18", volume="19", number="4", pages="e120", keywords="medical informatics", keywords="machine learning", keywords="motor", keywords="neurological examination", keywords="stroke", abstract="Background: The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. Objective: The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. 
Methods: We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Results: Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3\%. Conclusions: Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. 
", doi="10.2196/jmir.7092", url="http://www.jmir.org/2017/4/e120/", url="http://www.ncbi.nlm.nih.gov/pubmed/28420599" } @Article{info:doi/10.2196/jmir.6533, author="Gibbons, Chris and Richards, Suzanne and Valderas, Maria Jose and Campbell, John", title="Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy", journal="J Med Internet Res", year="2017", month="Mar", day="15", volume="19", number="3", pages="e65", keywords="machine learning", keywords="surveys and questionnaires", keywords="feedback", keywords="data mining", keywords="work performance", abstract="Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75\% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results: Individual algorithm performance was high (range F score=.68 to .83). 
Interrater agreement between the algorithms and the human coder was highest for codes relating to ``popular'' (recall=.97), ``innovator'' (recall=.98), and ``respected'' (recall=.87) codes and was lower for the ``interpersonal'' (recall=.80) and ``professional'' (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as ``respected,'' ``professional,'' and ``interpersonal'' related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Conclusions: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor's performance. ", doi="10.2196/jmir.6533", url="http://www.jmir.org/2017/3/e65/", url="http://www.ncbi.nlm.nih.gov/pubmed/28298265" } @Article{info:doi/10.2196/jmir.7207, author="Shah, Ahmar Syed and Velardo, Carmelo and Farmer, Andrew and Tarassenko, Lionel", title="Exacerbations in Chronic Obstructive Pulmonary Disease: Identification and Prediction Using a Digital Health System", journal="J Med Internet Res", year="2017", month="Mar", day="07", volume="19", number="3", pages="e69", keywords="COPD", keywords="disease exacerbation", keywords="mobile health", keywords="self-management", keywords="pulse oximetry", keywords="respiratory rate", keywords="clinical prediction rule", keywords="algorithms", abstract="Background: Chronic obstructive pulmonary disease (COPD) is a progressive, chronic respiratory disease with a significant socioeconomic burden. 
Exacerbations, the sudden and sustained worsening of symptoms, can lead to hospitalization and reduce quality of life. Major limitations of previous telemonitoring interventions for COPD include low compliance, lack of consensus on what constitutes an exacerbation, limited numbers of patients, and short monitoring periods. We developed a telemonitoring system based on a digital health platform that was used to collect data from the 1-year EDGE (Self Management and Support Programme) COPD clinical trial aiming at daily monitoring in a heterogeneous group of patients with moderate to severe COPD. Objective: The objectives of the study were as follows: first, to develop a systematic and reproducible approach to exacerbation identification and to track the progression of patient condition during remote monitoring; and second, to develop a robust algorithm able to predict COPD exacerbation, based on vital signs acquired from a pulse oximeter. Methods: We used data from 110 patients, with a combined monitoring period of more than 35,000 days. We propose a finite-state machine--based approach for modeling COPD exacerbation to gain a deeper insight into COPD patient condition during home monitoring to take account of the time course of symptoms. A robust algorithm based on short-period trend analysis and logistic regression using vital signs derived from a pulse oximeter is also developed to predict exacerbations. Results: On the basis of 27,260 sessions recorded during the clinical trial (average usage of 5.3 times per week for 12 months), there were 361 exacerbation events. There was considerable variation in the length of exacerbation events, with a mean length of 8.8 days. The mean value of oxygen saturation was lower, and both the pulse rate and respiratory rate were higher before an impending exacerbation episode, compared with stable periods. 
On the basis of the classifier developed in this work, prediction of COPD exacerbation episodes with 60\%-80\% sensitivity will result in 68\%-36\% specificity. Conclusions: All 3 vital signs acquired from a pulse oximeter (pulse rate, oxygen saturation, and respiratory rate) are predictive of COPD exacerbation events, with oxygen saturation being the most predictive, followed by respiratory rate and pulse rate. Combination of these vital signs with a robust algorithm based on machine learning leads to further improvement in positive predictive accuracy. Trial Registration: International Standard Randomized Controlled Trial Number (ISRCTN): 40367841; http://www.isrctn.com/ISRCTN40367841 (Archived by WebCite at http://www.webcitation.org/6olpMWNpc) ", doi="10.2196/jmir.7207", url="http://www.jmir.org/2017/3/e69/", url="http://www.ncbi.nlm.nih.gov/pubmed/28270380" } @Article{info:doi/10.2196/resprot.5948, author="Luther, L. Stephen and Thomason, S. Susan and Sabharwal, Sunil and Finch, K. Dezon and McCart, James and Toyinbo, Peter and Bouayad, Lina and Matheny, E. Michael and Gobbel, T. Glenn and Powell-Cope, Gail", title="Leveraging Electronic Health Care Record Information to Measure Pressure Ulcer Risk in Veterans With Spinal Cord Injury: A Longitudinal Study Protocol", journal="JMIR Res Protoc", year="2017", month="Jan", day="19", volume="6", number="1", pages="e3", keywords="natural language processing", keywords="pressure ulcer", keywords="risk assessment", keywords="spinal cord injury", keywords="text mining", abstract="Background: Pressure ulcers (PrUs) are a frequent, serious, and costly complication for veterans with spinal cord injury (SCI). The health care team should periodically identify PrU risk, although there is no tool in the literature that has been found to be reliable, valid, and sensitive enough to assess risk in this vulnerable population. 
Objective: The immediate goal is to develop a risk assessment model that validly estimates the probability of developing a PrU. The long-term goal is to assist veterans with SCI and their providers in preventing PrUs through an automated system of risk assessment integrated into the veteran's electronic health record (EHR). Methods: This 5-year longitudinal, retrospective, cohort study targets 12,344 veterans with SCI who were cared for in the Veterans Health Administration (VHA) in fiscal year (FY) 2009 and had no record of a PrU in the prior 12 months. Potential risk factors identified in the literature were reviewed by an expert panel that prioritized factors and determined whether these appeared as structured data or in unstructured form in narrative clinical notes for FY 2009-2013. These data come from the VHA enterprise Corporate Data Warehouse, which is derived from the EHR and comprises structured (ie, coded in database/table) and narrative (ie, text in clinical notes) data for FY 2009-2013. Results: This study is ongoing, and final results are expected in 2017. Thus far, the expert panel has reviewed the initial list of risk factors extracted from the literature; the panel recommended additions and omissions and provided insights about the format in which documentation of the risk factors might exist in the EHR. This list was then iteratively refined through review and discussion with individual experts in the field. The cohort for the study was then identified, and all structured, unstructured, and semistructured data were extracted. Annotation schemas were developed, samples of documents were extracted, and annotation is ongoing. Operational definitions of structured data elements have been created, and steps to create an analytic dataset are underway. Conclusions: To our knowledge, this is the largest cohort employed to identify PrU risk factors in the United States. 
It also represents the first time natural language processing and statistical text mining will be used to expand the number of variables available for analysis. A major strength of this quantitative study is that all VHA SCI centers were included in the analysis, reducing potential for selection bias and providing increased power for complex statistical analyses. This longitudinal study will eventually result in a risk prediction tool to assess PrU risk that is reliable and valid, and that is sensitive to this vulnerable population. ", doi="10.2196/resprot.5948", url="http://www.researchprotocols.org/2017/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/28104580" } @Article{info:doi/10.2196/medinform.6690, author="Lee, Joon", title="Patient-Specific Predictive Modeling Using Random Forests: An Observational Study for the Critically Ill", journal="JMIR Med Inform", year="2017", month="Jan", day="17", volume="5", number="1", pages="e3", keywords="forecasting", keywords="critical care", keywords="predictive analytics", keywords="patient similarity", keywords="random forest", abstract="Background: With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling. Objective: The objective of the study is to investigate the random forest (RF) proximity measure as a PSM in the context of personalized mortality prediction for intensive care unit (ICU) patients. Methods: A total of 17,152 ICU admissions were extracted from the Multiparameter Intelligent Monitoring in Intensive Care II database. A number of predictor variables were extracted from the first 24 hours in the ICU. 
The outcome to be predicted was 30-day mortality. A patient-specific predictive model was trained for each ICU admission using an RF PSM inspired by the RF proximity measure. Death counting, logistic regression, decision tree, and RF models were studied, with a hard threshold applied to RF PSM values to include only the M most similar patients in model training, where M was varied. In addition, case-specific random forests (CSRFs), which use RF proximity for weighted bootstrapping, were trained. Results: Compared with our previous study that investigated a cosine similarity PSM, the RF PSM resulted in superior or comparable predictive performance. RF and CSRF exhibited the best performances (in terms of mean area under the receiver operating characteristic curve [95\% confidence interval], RF: 0.839 [0.835-0.844]; CSRF: 0.832 [0.821-0.843]). RF and CSRF did not benefit from personalization via the use of the RF PSM, while the other models did. Conclusions: The RF PSM led to good mortality prediction performance for several predictive models, although it failed to induce improved performance in RF and CSRF. The distinction between predictor and similarity variables is an important issue arising from the present study. RFs present a promising method for patient-specific outcome prediction. ", doi="10.2196/medinform.6690", url="http://medinform.jmir.org/2017/1/e3/", url="http://www.ncbi.nlm.nih.gov/pubmed/28096065" }