TY - JOUR AU - Haun, N. Jolie AU - Alman, C. Amy AU - Melillo, Christine AU - Standifer, Maisha AU - McMahon-Grenz, Julie AU - Shin, Marlena AU - Lapcevic, A. W. AU - Patel, Nitin AU - Elwy, Rani A. PY - 2020/6/26 TI - Using Electronic Data Collection Platforms to Assess Complementary and Integrative Health Patient-Reported Outcomes: Feasibility Project JO - JMIR Med Inform SP - e15609 VL - 8 IS - 6 KW - integrative medicine KW - health information technology KW - health services research KW - mobile phone KW - patient-reported outcomes KW - veteran N2 - Background: The Veteran Administration (VA) Office of Patient-Centered Care and Cultural Transformation is invested in improving veteran health through a whole-person approach while taking advantage of the electronic resources suite available through the VA. Currently, there is no standardized process to collect and integrate electronic patient-reported outcomes (ePROs) of complementary and integrative health (CIH) into clinical care using a web-based survey platform. This quality improvement project enrolled veterans attending CIH appointments within a VA facility and used web-based technologies to collect ePROs. Objective: This study aimed to (1) determine a practical process for collecting ePROs using patient email services and a web-based survey platform and (2) conduct analyses of survey data using repeated measures to estimate the effects of CIH on patient outcomes. Methods: In total, 100 veterans from one VA facility, comprising 11 cohorts, agreed to participate. The VA patient email services (Secure Messaging) were used to manually send links to a 16-item web-based survey stored on a secure web-based survey storage platform (Qualtrics). Each survey included questions about patient outcomes from CIH programs. Each cohort was sent survey links via Secure Messaging (SM) at 6 time points: weeks 1 through 4, week 8, and week 12. 
Process evaluation interviews were conducted with five primary care providers to assess barriers and facilitators to using the patient-reported outcome survey in usual care. Results: This quality improvement project demonstrated the usability of SM and Qualtrics for ePRO collection. However, SM for ePROs was labor intensive for providers. Descriptive statistics on health competence (2-item Perceived Health Competence Scale), physical and mental health (Patient-Reported Outcomes Measurement Information System Global-10), and stress (4-item Perceived Stress Scale) indicated that scores did not significantly change over time. Survey response rates varied (18/100, 18.0%-42/100, 42.0%) across each of the 12 weekly survey periods. In total, 74 of 100 participants provided ≥1 survey, and 90% (66/74) were female. Most participants who reported using any CIH modality (33/53, 62%) reported using two or more unique modalities. Primary care providers highlighted specific challenges with SM and offered solutions regarding staff involvement in survey implementation. Conclusions: This quality improvement project informs our understanding of the processes currently available for using SM and web-based data platforms to collect ePROs. The study results indicate that although it is possible to use SM and web-based survey platforms for ePROs, automating scheduled administration will be necessary to reduce provider burden. The lack of significant change in ePROs may be due to standard measures taking a biomedical approach to wellness. Future work should focus on identifying ideal ePRO processes that would include standardized, whole-person measures of wellness. 
UR - http://medinform.jmir.org/2020/6/e15609/ UR - http://dx.doi.org/10.2196/15609 UR - http://www.ncbi.nlm.nih.gov/pubmed/32589163 ID - info:doi/10.2196/15609 ER - TY - JOUR AU - Becker, Linda AU - Ganslandt, Thomas AU - Prokosch, Hans-Ulrich AU - Newe, Axel PY - 2020/6/16 TI - Applied Practice and Possible Leverage Points for Information Technology Support for Patient Screening in Clinical Trials: Qualitative Study JO - JMIR Med Inform SP - e15749 VL - 8 IS - 6 KW - clinical trial KW - patient screening KW - electronic support KW - clinical information systems KW - inclusion criteria KW - exclusion criteria KW - feasibility studies KW - mobile phone N2 - Background: Clinical trials are one of the most challenging and meaningful designs in medical research. One essential step before starting a clinical trial is screening, that is, to identify patients who fulfill the inclusion criteria and do not fulfill the exclusion criteria. The screening step for clinical trials might be supported by modern information technology (IT). Objective: This explorative study aimed (1) to obtain insights into which tools for feasibility estimations and patient screening are actually used in clinical routine and (2) to determine which method and type of IT support could benefit clinical staff. Methods: Semistandardized interviews were conducted in 5 wards (cardiology, gynecology, gastroenterology, nephrology, and palliative care) in a German university hospital. Of the 5 interviewees, 4 were directly involved in patient screening. Three of them were clinicians, 1 was a study nurse, and 1 was a research assistant. Results: The existing state of study feasibility estimation and the screening procedure were dominated by human communication and estimations from memory, although there were many possibilities for IT support. Success mostly depended on the experience and personal motivation of the clinical staff. Electronic support has been used but with little importance so far. 
Searches in ward-specific patient registers (databases) and searches in clinical information systems were reported. Furthermore, free-text searches in medical reports were mentioned. For potential future applications, a preference for either proactive or passive systems was not expressed. Most of the interviewees saw the potential for the improvement of the existing systems, but they were also largely satisfied with the outcomes of the current approach. Most of the interviewees were interested in learning more about the various ways in which IT could support and relieve them in their clinical routine. Conclusions: Overall, IT support currently plays a minor role in the screening step for clinical trials. The lack of IT usage and the estimations made from memory reported by all the participants might constrain cognitive resources, which might distract from clinical routine. We conclude that electronic support for the screening step for clinical trials is still a challenge and that education of the staff about the possibilities for electronic support in clinical trials is necessary. UR - http://medinform.jmir.org/2020/6/e15749/ UR - http://dx.doi.org/10.2196/15749 UR - http://www.ncbi.nlm.nih.gov/pubmed/32442156 ID - info:doi/10.2196/15749 ER - TY - JOUR AU - Si, Yan AU - Wu, Hong AU - Liu, Qing PY - 2020/6/29 TI - Factors Influencing Doctors' Participation in the Provision of Medical Services Through Crowdsourced Health Care Information Websites: Elaboration-Likelihood Perspective Study JO - JMIR Med Inform SP - e16704 VL - 8 IS - 6 KW - crowdsourcing KW - crowdsourced medical services KW - online health communities KW - doctors' participation KW - elaboration-likelihood model N2 - Background: Web-based crowdsourcing effectively achieves goals by obtaining solutions from public groups via the internet, and it has gained extensive attention in both business and academia. 
As a new mode of sourcing, crowdsourcing has been proven to improve efficiency, quality, and diversity of tasks. However, little attention has been given to crowdsourcing in the health sector. Objective: Crowdsourced health care information websites enable patients to post their questions in the question pool, which is accessible to all doctors, and the patients wait for doctors to respond to their questions. Since the sustainable development of crowdsourced health care information websites depends on the participation of the doctors, we aimed to investigate the factors influencing doctors' participation in providing health care information in these websites from the perspective of the elaboration-likelihood model. Methods: We collected 1524 questions with complete patient-doctor interaction processes from an online health community in China to test all the hypotheses. We divided the doctors into 2 groups based on the sequence of the answers: (1) the doctor who answered the patient's question first and (2) the doctors who answered that question after the first doctor. All analyses were conducted using the ordinary least squares method. Results: First, the ability of the doctor who first answered the health-related question was found to positively influence the participation of the following doctors who answered after the first doctor responded to the question (βoffline1=.177, P<.001; βoffline2=.063, P=.048; βonline=.418, P<.001). Second, the reward that the patient offered for the best answer showed a positive effect on doctors' participation (β=.019, P<.001). Third, the question's complexity was found to positively moderate the relationship between the ability of the first doctor who answered and the participation of the following doctors (β=.186, P=.05) and to attenuate the effect of the reward on the participation of the following doctors (β=-.003, P=.10). Conclusions: This study makes both theoretical and practical contributions. 
Online health community managers can build effective incentive mechanisms to encourage highly competent doctors to participate in the provision of medical services in crowdsourced health care information websites, and they can increase the reward incentives for each question to increase the participation of the doctors. UR - http://medinform.jmir.org/2020/6/e16704/ UR - http://dx.doi.org/10.2196/16704 UR - http://www.ncbi.nlm.nih.gov/pubmed/32597787 ID - info:doi/10.2196/16704 ER - TY - JOUR AU - Yu, Yue AU - Ruddy, Kathryn AU - Mansfield, Aaron AU - Zong, Nansu AU - Wen, Andrew AU - Tsuji, Shintaro AU - Huang, Ming AU - Liu, Hongfang AU - Shah, Nilay AU - Jiang, Guoqian PY - 2020/6/12 TI - Detecting and Filtering Immune-Related Adverse Events Signal Based on Text Mining and Observational Health Data Sciences and Informatics Common Data Model: Framework Development Study JO - JMIR Med Inform SP - e17353 VL - 8 IS - 6 KW - immunotherapy/adverse effects KW - drug-related side effects and adverse reactions KW - pharmacovigilance KW - adverse drug reaction reporting systems/standards KW - text mining N2 - Background: Immune checkpoint inhibitors are associated with unique immune-related adverse events (irAEs). As most of the immune checkpoint inhibitors are new to the market, it is important to conduct studies using real-world data sources to investigate their safety profiles. Objective: The aim of the study was to develop a framework for signal detection and filtration of novel irAEs for 6 Food and Drug Administration-approved immune checkpoint inhibitors. Methods: In our framework, we first used the Food and Drug Administration's Adverse Event Reporting System (FAERS) standardized in an Observational Health Data Sciences and Informatics (OHDSI) common data model (CDM) to collect immune checkpoint inhibitor-related event data and conducted irAE signal detection. 
OHDSI CDM is a standard-driven data model that focuses on transforming different databases into a common format and standardizing medical terms to a common representation. We then filtered those already known irAEs from drug labels and literature by using a customized text-mining pipeline based on clinical text analysis and knowledge extraction system with Medical Dictionary for Regulatory Activities (MedDRA) as a dictionary. Finally, we classified the irAE detection results into three different categories to discover potentially new irAE signals. Results: By our text-mining pipeline, 490 irAE terms were identified from drug labels, and 918 terms were identified from the literature. In addition, of the 94 positive signals detected using CDM-based FAERS, 53 signals (56%) were labeled signals, 10 (11%) were unlabeled published signals, and 31 (33%) were potentially new signals. Conclusions: We demonstrated that our approach is effective for irAE signal detection and filtration. Moreover, our CDM-based framework could facilitate adverse drug events detection and filtration toward the goal of next-generation pharmacovigilance that seamlessly integrates electronic health record data for improved signal detection. UR - http://medinform.jmir.org/2020/6/e17353/ UR - http://dx.doi.org/10.2196/17353 UR - http://www.ncbi.nlm.nih.gov/pubmed/32530430 ID - info:doi/10.2196/17353 ER - TY - JOUR AU - Chuang, Li-Yeh AU - Yang, Cheng-San AU - Yang, Huai-Shuo AU - Yang, Cheng-Hong PY - 2020/6/17 TI - Identification of High-Order Single-Nucleotide Polymorphism Barcodes in Breast Cancer Using a Hybrid Taguchi-Genetic Algorithm: Case-Control Study JO - JMIR Med Inform SP - e16886 VL - 8 IS - 6 KW - genetic algorithm KW - single-nucleotide polymorphism KW - breast cancer KW - case-control study N2 - Background: Breast cancer has a major disease burden in the female population, and it is a highly genome-associated human disease. 
However, in genetic studies of complex diseases, modern geneticists face challenges in detecting interactions among loci. Objective: This study aimed to investigate whether variations of single-nucleotide polymorphisms (SNPs) are associated with histopathological tumor characteristics in breast cancer patients. Methods: A hybrid Taguchi-genetic algorithm (HTGA) was proposed to identify the high-order SNP barcodes in a breast cancer case-control study. A Taguchi method was used to enhance a genetic algorithm (GA) for identifying high-order SNP barcodes. The Taguchi method was integrated into the GA after the crossover operations in order to optimize the generated offspring systematically for enhancing the GA search ability. Results: The proposed HTGA effectively converged to a promising region within the problem space and provided excellent SNP barcode identification. Regression analysis was used to validate the association between breast cancer and the identified high-order SNP barcodes. The maximum OR was less than 1 (range 0.870-0.755) for two- to seven-order SNP barcodes. Conclusions: We systematically evaluated the interaction effects of 26 SNPs within growth factor-related genes for breast carcinogenesis pathways. The HTGA could successfully identify relevant high-order SNP barcodes by evaluating the differences between cases and controls. The validation results showed that the HTGA can provide better fitness values as compared with other methods for the identification of high-order SNP barcodes using breast cancer case-control data sets. UR - https://medinform.jmir.org/2020/6/e16886 UR - http://dx.doi.org/10.2196/16886 UR - http://www.ncbi.nlm.nih.gov/pubmed/32554381 ID - info:doi/10.2196/16886 ER - TY - JOUR AU - Li, Xin AU - Rousseau, F. 
Justin AU - Ding, Ying AU - Song, Min AU - Lu, Wei PY - 2020/6/16 TI - Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin JO - JMIR Med Inform SP - e16739 VL - 8 IS - 6 KW - drug repurposing KW - biomedical entities KW - entitymetrics KW - bibliometrics KW - aspirin KW - acetylsalicylic acid N2 - Background: Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted significant attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research. Objective: The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution. Methods: In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P1), Promising Index (P2), Prestige Index (P3), and Collaboration Index (CI). Results: We found that the maxima of P1, P3, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. 
P1 and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P2 was more sensitive to immediate changes. P3 reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug. Conclusions: In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development. UR - https://medinform.jmir.org/2020/6/e16739 UR - http://dx.doi.org/10.2196/16739 UR - http://www.ncbi.nlm.nih.gov/pubmed/32543442 ID - info:doi/10.2196/16739 ER - TY - JOUR AU - Faruqui, Akhter Syed Hasib AU - Alaeddini, Adel AU - Chang, C. Mike AU - Shirinkam, Sara AU - Jaramillo, Carlos AU - NajafiRad, Peyman AU - Wang, Jing AU - Pugh, Jo Mary PY - 2020/6/17 TI - Summarizing Complex Graphical Models of Multiple Chronic Conditions Using the Second Eigenvalue of Graph Laplacian: Algorithm Development and Validation JO - JMIR Med Inform SP - e16372 VL - 8 IS - 6 KW - graphical models KW - graph summarization KW - graph Laplacian KW - disease network KW - multiple chronic conditions N2 - Background: It is important but challenging to understand the interactions of multiple chronic conditions (MCC) and how they develop over time in patients and populations. Clinical data on MCC can now be represented using graphical models to study their interaction and identify the path toward the development of MCC. However, the current graphical models representing MCC are often complex and difficult to analyze. Therefore, it is necessary to develop improved methods for generating these models. 
Objective: This study aimed to summarize the complex graphical models of MCC interactions to improve comprehension and aid analysis. Methods: We examined the emergence of 5 chronic medical conditions (ie, traumatic brain injury [TBI], posttraumatic stress disorder [PTSD], depression [Depr], substance abuse [SuAb], and back pain [BaPa]) over 5 years among 257,633 veteran patients. We developed 3 algorithms that utilize the second eigenvalue of the graph Laplacian to summarize the complex graphical models of MCC by removing less significant edges. The first algorithm learns a sparse probabilistic graphical model of MCC interactions directly from the data. The second algorithm summarizes an existing probabilistic graphical model of MCC interactions when a supporting data set is available. The third algorithm, which is a variation of the second algorithm, summarizes the existing graphical model of MCC interactions with no supporting data. Finally, we examined the coappearance of the 100 most common terms in the literature of MCC to validate the performance of the proposed model. Results: The proposed summarization algorithms demonstrate considerable performance in extracting major connections among MCC without reducing the predictive accuracy of the resulting graphical models. For the model learned directly from the data, the area under the curve (AUC) performance for predicting TBI, PTSD, BaPa, SuAb, and Depr during the next 4 years is as follows: year 2: 79.91%, 84.04%, 78.83%, 82.50%, and 81.47%; year 3: 76.23%, 80.61%, 73.51%, 79.84%, and 77.13%; year 4: 72.38%, 78.22%, 72.96%, 77.92%, and 72.65%; and year 5: 69.51%, 76.15%, 73.04%, 76.72%, and 69.99%, respectively. This demonstrates an overall 12.07% increase in the cumulative sum of AUC in comparison with the classic multilevel temporal Bayesian network. Conclusions: Using graph summarization can improve the interpretability and the predictive power of the complex graphical models of MCC. 
UR - http://medinform.jmir.org/2020/6/e16372/ UR - http://dx.doi.org/10.2196/16372 UR - http://www.ncbi.nlm.nih.gov/pubmed/32554376 ID - info:doi/10.2196/16372 ER - TY - JOUR AU - Her, Qoua AU - Malenfant, Jessica AU - Zhang, Zilu AU - Vilk, Yury AU - Young, Jessica AU - Tabano, David AU - Hamilton, Jack AU - Johnson, Ron AU - Raebel, Marsha AU - Boudreau, Denise AU - Toh, Sengwee PY - 2020/6/4 TI - Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance JO - JMIR Med Inform SP - e15073 VL - 8 IS - 6 KW - distributed regression analysis KW - distributed data networks KW - privacy-protecting analytics KW - pharmacoepidemiology KW - PopMedNet N2 - Background: A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limited. Objective: This study aimed to assess the precision and operational performance of a DRA application comprising a SAS-based DRA package and a file transfer workflow developed within the open-source distributed networking software PopMedNet in a horizontally partitioned distributed data network. Methods: We executed the SAS-based DRA package to perform distributed linear, logistic, and Cox proportional hazards regression analysis on a real-world test case with 3 data partners. We used PopMedNet to iteratively and automatically transfer highly summarized information between the data partners and the analysis center. We compared the DRA results with the results from standard SAS procedures executed on the pooled individual-level dataset to evaluate the precision of the SAS-based DRA package. We computed the execution time of each step in the workflow to evaluate the operational performance of the PopMedNet-driven file transfer workflow. 
Results: All DRA results were precise (<10⁻¹²), and DRA model fit curves were identical or similar to those obtained from the corresponding pooled individual-level data analyses. All regression models required less than 20 min for full end-to-end execution. Conclusions: We integrated a SAS-based DRA package with PopMedNet and successfully tested the new capability within an active distributed data network. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies. UR - https://medinform.jmir.org/2020/6/e15073 UR - http://dx.doi.org/10.2196/15073 UR - http://www.ncbi.nlm.nih.gov/pubmed/32496200 ID - info:doi/10.2196/15073 ER - TY - JOUR AU - Ye, Qing AU - Zhou, Jin AU - Wu, Hong PY - 2020/6/8 TI - Using Information Technology to Manage the COVID-19 Pandemic: Development of a Technical Framework Based on Practical Experience in China JO - JMIR Med Inform SP - e19515 VL - 8 IS - 6 KW - COVID-19 KW - pandemic KW - health informatics KW - health information technology KW - technical framework KW - privacy protection N2 - Background: The coronavirus disease (COVID-19) epidemic poses an enormous challenge to the global health system, and governments have taken active preventive and control measures. The health informatics community in China has actively taken action to leverage health information technologies for epidemic monitoring, detection, early warning, prevention and control, and other tasks. Objective: The aim of this study was to develop a technical framework to respond to the COVID-19 epidemic from a health informatics perspective. Methods: In this study, we collected health information technology-related information to understand the actions taken by the health informatics community in China during the COVID-19 outbreak and developed a health information technology framework for epidemic response based on health information technology-related measures and methods. 
Results: Based on the framework, we review specific health information technology practices for managing the outbreak in China, describe the highlights of their application in detail, and discuss critical issues to consider when using health information technology. Technologies employed include mobile and web-based services such as internet hospitals and WeChat, big data analyses (including digital contact tracing through QR codes or epidemic prediction), cloud computing, the internet of things, artificial intelligence (including the use of drones, robots, and intelligent diagnoses), 5G telemedicine, and clinical information systems to facilitate clinical management for COVID-19. Conclusions: Practical experience in China shows that health information technologies play a pivotal role in responding to the COVID-19 epidemic. UR - http://medinform.jmir.org/2020/6/e19515/ UR - http://dx.doi.org/10.2196/19515 UR - http://www.ncbi.nlm.nih.gov/pubmed/32479411 ID - info:doi/10.2196/19515 ER - TY - JOUR AU - Massonnaud, R. Clément AU - Kerdelhué, Gaétan AU - Grosjean, Julien AU - Lelong, Romain AU - Griffon, Nicolas AU - Darmoni, J. Stefan PY - 2020/6/4 TI - Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study JO - JMIR Med Inform SP - e12799 VL - 8 IS - 6 KW - bibliographic database KW - information retrieval KW - literature search KW - Medical Subject Headings KW - MEDLINE KW - PubMed KW - precision KW - recall KW - search strategy KW - thesaurus N2 - Background: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval. 
Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR "synonym 2"[TIAB] OR ..., for each of the 28,313 Medical Subject Heading (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure). Methods: Three semantic expansion strategies were assessed. They differed by the synonyms used to build the queries as follows: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). The precision, recall, and F-measure metrics were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. The method to automatically compute the metrics involved computing the number of all relevant citations (A), using National Library of Medicine indexing as the gold standard ("preferred term"[MH]), the number of citations retrieved by the added terms ("synonym 1"[TIAB] OR "synonym 2"[TIAB] OR ...) (B), and the number of relevant citations retrieved by the added terms (combining the previous two queries with an "AND" operator) (C). It was possible to programmatically compute the metrics for each strategy using each of the 28,313 MeSH descriptors as a "preferred term," corresponding to 239,724 different queries built and sent to the PubMed application program interface. The four search strategies were ranked and compared for each metric. Results: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). 
CISMeF had the second best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors. Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm that there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user's objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure). UR - https://medinform.jmir.org/2020/6/e12799 UR - http://dx.doi.org/10.2196/12799 UR - http://www.ncbi.nlm.nih.gov/pubmed/32496201 ID - info:doi/10.2196/12799 ER - TY - JOUR AU - Myneni, Sahiti AU - Lewis, Brittney AU - Singh, Tavleen AU - Paiva, Kristi AU - Kim, Min Seon AU - Cebula, V. Adrian AU - Villanueva, Gloria AU - Wang, Jing PY - 2020/6/30 TI - Diabetes Self-Management in the Age of Social Media: Large-Scale Analysis of Peer Interactions Using Semiautomated Methods JO - JMIR Med Inform SP - e18441 VL - 8 IS - 6 KW - diabetes KW - self-management KW - social media KW - digital health N2 - Background: Online communities have been gaining popularity as support venues for chronic disease management. User engagement, information exposure, and social influence mechanisms can play a significant role in the utility of these platforms. Objective: In this paper, we characterize peer interactions in an online community for chronic disease management. Our objective is to identify key communications and study their prevalence in online social interactions. 
Methods: The American Diabetes Association Online community is an online social network for diabetes self-management. We analyzed 80,481 randomly selected deidentified peer-to-peer messages from 1212 members, posted between June 1, 2012, and May 30, 2019. Our mixed methods approach comprised qualitative coding and automated text analysis to identify, visualize, and analyze content-specific communication patterns underlying diabetes self-management. Results: Qualitative analysis revealed that "social support" was the most prevalent theme (84.9%), followed by "readiness to change" (18.8%), "teachable moments" (14.7%), "pharmacotherapy" (13.7%), and "progress" (13.3%). The support vector machine classifier resulted in reasonable accuracy with a recall of 0.76 and a precision of 0.78 and allowed us to extend our thematic codes to the entire data set. Conclusions: Modeling health-related communication through high-throughput methods can enable the identification of specific content related to sustainable chronic disease management, which facilitates targeted health promotion. UR - https://medinform.jmir.org/2020/6/e18441 UR - http://dx.doi.org/10.2196/18441 UR - http://www.ncbi.nlm.nih.gov/pubmed/32602843 ID - info:doi/10.2196/18441 ER - TY - JOUR AU - Montvida, Olga AU - Dibato, Epoh John AU - Paul, Sanjoy PY - 2020/6/3 TI - Evaluating the Representativeness of US Centricity Electronic Medical Records With Reports From the Centers for Disease Control and Prevention: Comparative Study on Office Visits and Cardiometabolic Conditions JO - JMIR Med Inform SP - e17174 VL - 8 IS - 6 KW - electronic medical records KW - observational study KW - epidemiology KW - population health N2 - Background: Electronic medical record (EMR)-based clinical and epidemiological research has dramatically increased over the last decade, although establishing the generalizability of such big databases for conducting epidemiological studies has been an ongoing challenge. 
To draw meaningful inferences from such studies, it is essential to fully understand the characteristics of the underlying population and potential biases in EMRs. Objective: This study aimed to assess the generalizability and representativeness of the widely used US Centricity Electronic Medical Record (CEMR), a primary and ambulatory care EMR for population health research, using data from the National Ambulatory Medical Care Surveys (NAMCS) and the National Health and Nutrition Examination Surveys (NHANES). Methods: The number of office visits reported in the NAMCS, designed to meet the need for objective and reliable information about the provision and the use of ambulatory medical care services, was compared with similar data from the CEMR. The distribution of major cardiometabolic diseases in the NHANES, designed to assess the health and nutritional status of adults and children in the United States, was compared with similar data from the CEMR. Results: Gender and ethnicity distributions were similar between the NAMCS and the CEMR. Younger patients (aged <15 years) were underrepresented in the CEMR compared with the NAMCS. The number of office visits per 100 persons per year was similar: 277.9 (95% CI 259.3-296.5) in the NAMCS and 284.6 (95% CI 284.4-284.7) in the CEMR. However, the number of visits for males was significantly higher in the CEMR (CEMR: 270.8 and NAMCS: 239.0). West and South regions were underrepresented and overrepresented, respectively, in the CEMR. The overall prevalence of diabetes along with age and gender distribution was similar in the CEMR and the NHANES: overall prevalence, 10.1% and 9.7%; male, 11.5% and 10.8%; female, 9.1% and 8.8%; age 20 to 40 years, 2.5% and 1.8%; and age 40 to 60 years, 9.4% and 11.1%, respectively. The prevalence of obesity was similar: 42.1% and 39.6%, with similar age and female distribution (41.5% and 41.1%) but different male distribution (42.7% and 37.9%).
The overall prevalence of high cholesterol along with age and female distribution was similar in the CEMR and the NHANES: overall prevalence, 12.4% and 12.4%; and female, 14.8% and 13.2%, respectively. The overall prevalence of hypertension was significantly higher in the CEMR (33.5%) than in the NHANES (95% CI 27.0%-31.0%). Conclusions: The distribution of major cardiometabolic diseases in the CEMR is comparable with the national survey results. The CEMR represents the general US population well in terms of office visits and major chronic conditions, although potential subgroup differences in age and gender distribution and in prevalence should be carefully accounted for in future studies. UR - https://medinform.jmir.org/2020/6/e17174 UR - http://dx.doi.org/10.2196/17174 UR - http://www.ncbi.nlm.nih.gov/pubmed/32490850 ID - info:doi/10.2196/17174 ER - TY - JOUR AU - Tarekegn, Adane AU - Ricceri, Fulvio AU - Costa, Giuseppe AU - Ferracin, Elisa AU - Giacobini, Mario PY - 2020/6/4 TI - Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches JO - JMIR Med Inform SP - e16678 VL - 8 IS - 6 KW - predictive modeling KW - frailty KW - machine learning KW - genetic programming KW - imbalanced dataset KW - elderly people KW - classification N2 - Background: Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not yet been agreed on. There is a wide range of studies on the detection of frailty and its association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population, while predicting who will be at increased risk of frailty is still overlooked in clinical settings.
Objective: The objective of our study was to develop predictive models for frailty conditions in older people using different machine learning methods based on a database of clinical characteristics and socioeconomic factors. Methods: An administrative health database containing 1,095,612 elderly people aged 65 years or older, with 58 input variables and 6 output variables, was used. We first identified and defined six problems/outputs as surrogates of frailty. We then addressed the imbalanced nature of the data through a resampling process, and a comparative study between the different machine learning (ML) algorithms (artificial neural network [ANN], genetic programming [GP], support vector machine [SVM], random forest [RF], logistic regression [LR], and decision tree [DT]) was carried out. The performance of each model was evaluated using a separate unseen dataset. Results: Predicting the mortality outcome showed higher performance with ANN (TPR 0.81, TNR 0.76, accuracy 0.78, F1-score 0.79) and SVM (TPR 0.77, TNR 0.80, accuracy 0.79, F1-score 0.78) than predicting the other outcomes. On average, over the six problems, the DT classifier showed the lowest accuracy, while the other models (GP, LR, RF, ANN, and SVM) performed better. All models showed lower accuracy in predicting an emergency admission with red code than in predicting fracture and disability. In predicting urgent hospitalization, only SVM achieved better performance (TPR 0.75, TNR 0.77, accuracy 0.73, F1-score 0.76) with 10-fold cross-validation compared with the other models in all evaluation metrics. Conclusions: We developed machine learning models for predicting frailty conditions (mortality, urgent hospitalization, disability, fracture, and emergency admission). The results show that the prediction performance of machine learning models varies significantly from problem to problem in terms of different evaluation metrics.
With further improvement, the best-performing model could serve as a basis for developing decision-support tools to improve the early identification and prediction of frail older adults. UR - http://medinform.jmir.org/2020/6/e16678/ UR - http://dx.doi.org/10.2196/16678 UR - http://www.ncbi.nlm.nih.gov/pubmed/32442149 ID - info:doi/10.2196/16678 ER - TY - JOUR AU - Wongvibulsin, Shannon AU - Wu, C. Katherine AU - Zeger, L. Scott PY - 2020/6/9 TI - Improving Clinical Translation of Machine Learning Approaches Through Clinician-Tailored Visual Displays of Black Box Algorithms: Development and Validation JO - JMIR Med Inform SP - e15791 VL - 8 IS - 6 KW - machine learning KW - interpretability KW - clinical translation KW - prediction models KW - visualization N2 - Background: Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and black box nature of these algorithms. Objective: The study aimed to demonstrate a general and simple framework for generating clinically relevant and interpretable visualizations of black box predictions to aid in the clinical translation of ML. Methods: To obtain improved transparency of ML, simplified models and visual displays can be generated using common methods from clinical practice such as decision trees and effect plots. We illustrated the approach based on postprocessing of ML predictions, in this case random forest predictions, and applied the method to data from the Left Ventricular (LV) Structural Predictors of Sudden Cardiac Death (SCD) Registry for individualized risk prediction of SCD, a leading cause of death. Results: With the LV Structural Predictors of SCD Registry data, SCD risk predictions are obtained from a random forest algorithm that identifies the most important predictors, nonlinearities, and interactions among a large number of variables while naturally accounting for missing data.
The black box predictions are postprocessed using classification and regression trees into a clinically relevant and interpretable visualization. The method also quantifies the relative importance of an individual or a combination of predictors. Several risk factors (heart failure hospitalization, cardiac magnetic resonance imaging indices, and serum concentration of systemic inflammation) can be clearly visualized as branch points of a decision tree to discriminate between low-, intermediate-, and high-risk patients. Conclusions: Through a clinically important example, we illustrate a general and simple approach to increase the clinical translation of ML through clinician-tailored visual displays of results from black box algorithms. We illustrate this general model-agnostic framework by applying it to SCD risk prediction. Although we illustrate the methods using SCD prediction with random forest, the methods presented are applicable more broadly to improving the clinical translation of ML, regardless of the specific ML algorithm or clinical application. As any trained predictive model can be summarized in this manner to a prespecified level of precision, we encourage the use of simplified visual displays as an adjunct to the complex predictive model. Overall, this framework can allow clinicians to peek inside the black box and develop a deeper understanding of the most important features from a model to gain trust in the predictions and confidence in applying them to clinical care. 
UR - https://medinform.jmir.org/2020/6/e15791 UR - http://dx.doi.org/10.2196/15791 UR - http://www.ncbi.nlm.nih.gov/pubmed/32515746 ID - info:doi/10.2196/15791 ER - TY - JOUR AU - Hou, Can AU - Zhong, Xiaorong AU - He, Ping AU - Xu, Bin AU - Diao, Sha AU - Yi, Fang AU - Zheng, Hong AU - Li, Jiayuan PY - 2020/6/8 TI - Predicting Breast Cancer in Chinese Women Using Machine Learning Techniques: Algorithm Development JO - JMIR Med Inform SP - e17364 VL - 8 IS - 6 KW - machine learning KW - XGBoost KW - random forest KW - deep neural network KW - breast cancer N2 - Background: Risk-based breast cancer screening is a cost-effective intervention for controlling breast cancer in China, but the successful implementation of such an intervention requires an accurate breast cancer prediction model for Chinese women. Objective: This study aimed to evaluate and compare the performance of four machine learning algorithms in predicting breast cancer among Chinese women using 10 breast cancer risk factors. Methods: A dataset consisting of 7127 breast cancer cases and 7127 matched healthy controls was used for model training and testing. We used repeated 5-fold cross-validation and calculated AUC, sensitivity, specificity, and accuracy as the measures of model performance. Results: The three novel machine learning algorithms (XGBoost, random forest, and deep neural network) all achieved significantly higher areas under the receiver operating characteristic curve (AUCs), sensitivity, and accuracy than logistic regression. Among the three novel machine learning algorithms, XGBoost (AUC 0.742) outperformed the deep neural network (AUC 0.728) and random forest (AUC 0.728). Main residence, number of live births, menopause status, age, and age at first birth were the top-ranked variables in the three novel machine learning algorithms.
Conclusions: The novel machine learning algorithms, especially XGBoost, can be used to develop breast cancer prediction models to help identify women at high risk for breast cancer in developing countries. UR - http://medinform.jmir.org/2020/6/e17364/ UR - http://dx.doi.org/10.2196/17364 UR - http://www.ncbi.nlm.nih.gov/pubmed/32510459 ID - info:doi/10.2196/17364 ER - TY - JOUR AU - Chen, Weijia AU - Lu, Zhijun AU - You, Lijue AU - Zhou, Lingling AU - Xu, Jie AU - Chen, Ken PY - 2020/6/15 TI - Artificial Intelligence-Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study JO - JMIR Med Inform SP - e18186 VL - 8 IS - 6 KW - surgical site infection KW - machine learning KW - deep learning KW - natural language processing KW - artificial intelligence KW - risk assessment model KW - routinely collected data KW - electronic medical record KW - neural network KW - word embedding N2 - Background: Surgical site infection (SSI) is one of the most common types of health care-associated infections. It increases mortality, prolongs hospital length of stay, and raises health care costs. Many institutions developed risk assessment models for SSI to help surgeons preoperatively identify high-risk patients and guide clinical intervention. However, most of these models had low accuracies. Objective: We aimed to provide a solution in the form of an Artificial intelligence-based Multimodal Risk Assessment Model for Surgical site infection (AMRAMS) for inpatients undergoing operations, using routinely collected clinical data. We internally and externally validated the discriminations of the models, which combined various machine learning and natural language processing techniques, and compared them with the National Nosocomial Infections Surveillance (NNIS) risk index.
Methods: We retrieved inpatient records between January 1, 2014, and June 30, 2019, from the electronic medical record (EMR) system of Rui Jin Hospital, Luwan Branch, Shanghai, China. We used data from before July 1, 2018, as the development set for internal validation and the remaining data as the test set for external validation. We included patient demographics, preoperative lab results, and free-text preoperative notes as our features. We used word-embedding techniques to encode text information, and we trained the LASSO (least absolute shrinkage and selection operator) model, random forest model, gradient boosting decision tree (GBDT) model, convolutional neural network (CNN) model, and self-attention network model using the combined data. Surgeons manually scored the NNIS risk index values. Results: For internal bootstrapping validation, CNN yielded the highest mean area under the receiver operating characteristic curve (AUROC) of 0.889 (95% CI 0.886-0.892), and the paired-sample t test revealed statistically significant advantages as compared with other models (P<.001). The self-attention network yielded the second-highest mean AUROC of 0.882 (95% CI 0.878-0.886), but the AUROC was only numerically higher than the AUROC of the third-best model, GBDT with text embeddings (mean AUROC 0.881, 95% CI 0.878-0.884, P=.47). The AUROCs of LASSO, random forest, and GBDT models using text embeddings were statistically higher than the AUROCs of models not using text embeddings (P<.001). For external validation, the self-attention network yielded the highest AUROC of 0.879. CNN was the second-best model (AUROC 0.878), and GBDT with text embeddings was the third-best model (AUROC 0.872). The NNIS risk index scored by surgeons had an AUROC of 0.651. 
Conclusions: Our AMRAMS, based on EMR data and deep learning methods (CNN and self-attention network), had significant advantages in terms of accuracy compared with other conventional machine learning methods and the NNIS risk index. Moreover, the semantic embeddings of preoperative notes improved the model performance further. Our models could replace the NNIS risk index to provide personalized guidance for the preoperative intervention of SSIs. Through this case, we offered an easy-to-implement solution for building multimodal RAMs for other similar scenarios. UR - http://medinform.jmir.org/2020/6/e18186/ UR - http://dx.doi.org/10.2196/18186 UR - http://www.ncbi.nlm.nih.gov/pubmed/32538798 ID - info:doi/10.2196/18186 ER - TY - JOUR AU - Yang, Tianzhou AU - Zhang, Li AU - Yi, Liwei AU - Feng, Huawei AU - Li, Shimeng AU - Chen, Haoyu AU - Zhu, Junfeng AU - Zhao, Jian AU - Zeng, Yingyue AU - Liu, Hongsheng PY - 2020/6/18 TI - Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation JO - JMIR Med Inform SP - e15431 VL - 8 IS - 6 KW - type 2 diabetes KW - screening KW - non-invasive attributes KW - machine learning N2 - Background: Early diabetes screening can effectively reduce the burden of disease. However, natural population-based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes. Objective: The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner. Methods: The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016.
After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), a test set (20%, 2011-2014), and a validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The DeLong test (2-sided) was used to test the performance differences between the models. Results: We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded highly predictive models with areas under the curve (AUCs) over 0.800, and the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set. Conclusions: This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention.
UR - https://medinform.jmir.org/2020/6/e15431 UR - http://dx.doi.org/10.2196/15431 UR - http://www.ncbi.nlm.nih.gov/pubmed/32554386 ID - info:doi/10.2196/15431 ER - TY - JOUR AU - Ding, Xiaodong AU - Cheng, Feng AU - Morris, Robert AU - Chen, Cong AU - Wang, Yiqin PY - 2020/6/22 TI - Machine Learning-Based Signal Quality Evaluation of Single-Period Radial Artery Pulse Waves: Model Development and Validation JO - JMIR Med Inform SP - e18134 VL - 8 IS - 6 KW - pulse wave KW - quality evaluation KW - single period KW - segmentation KW - machine learning N2 - Background: The radial artery pulse wave is a widely used physiological signal for disease diagnosis and personal health monitoring because it provides insight into the overall health of the heart and blood vessels. Periodic radial artery pulse signals are subsequently decomposed into single pulse wave periods (segments) for physiological parameter evaluations. However, abnormal periods frequently arise due to external interference, the inherent imperfections of current segmentation methods, and the quality of the pulse wave signals. Objective: The objective of this paper was to develop a machine learning model to detect abnormal pulse periods in real clinical data. Methods: Various machine learning models, such as k-nearest neighbor, logistic regression, and support vector machines, were applied to classify the normal and abnormal periods in 8561 segments extracted from the radial pulse waves of 390 outpatients. The recursive feature elimination method was used to simplify the classifier. Results: It was found that a logistic regression model with only four input features can achieve a satisfactory result. The area under the receiver operating characteristic curve from the test set was 0.9920. In addition, these classifiers can be easily interpreted. Conclusions: We expect that this model can be applied in smart sport watches and watchbands to accurately evaluate human health status.
UR - http://medinform.jmir.org/2020/6/e18134/ UR - http://dx.doi.org/10.2196/18134 UR - http://www.ncbi.nlm.nih.gov/pubmed/32568091 ID - info:doi/10.2196/18134 ER - TY - JOUR AU - Singh, Mark AU - Murthy, Akansh AU - Singh, Shridhar PY - 2020/6/23 TI - Correction: Prioritization of Free-Text Clinical Documents: A Novel Use of a Bayesian Classifier JO - JMIR Med Inform SP - e21379 VL - 8 IS - 6 UR - https://medinform.jmir.org/2020/6/e21379 UR - http://dx.doi.org/10.2196/21379 UR - http://www.ncbi.nlm.nih.gov/pubmed/32574150 ID - info:doi/10.2196/21379 ER - TY - JOUR AU - Hane, A. Christopher AU - Nori, S. Vijay AU - Crown, H. William AU - Sanghavi, M. Darshak AU - Bleicher, Paul PY - 2020/6/3 TI - Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study JO - JMIR Med Inform SP - e17819 VL - 8 IS - 6 KW - Alzheimer disease KW - dementia KW - health information systems KW - machine learning KW - natural language processing KW - health information interoperability N2 - Background: Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also assist patients with financial planning for long-term care. Clinical notes are an important, underutilized source of information in machine learning models because of the cost of collection and complexity of analysis. Objective: This study aimed to investigate the use of deidentified clinical notes from multiple hospital systems collected over 10 years to augment retrospective machine learning models of the risk of developing ADRD. Methods: We used 2 years of data to predict the future outcome of ADRD onset. Clinical notes are provided in a deidentified format with specific terms and sentiments. Terms in clinical notes are embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations that differ across hospital systems and individual clinicians. 
Results: When using clinical notes, the area under the curve (AUC) improved from 0.85 to 0.94, and the positive predictive value (PPV) increased from 45.07% (25,245/56,018) to 68.32% (14,153/20,717) in the model at disease onset. Models with clinical notes improved in both AUC and PPV in years 3-6, when the notes' volume was largest; results are mixed in years 7 and 8 with the smallest cohorts. Conclusions: Although clinical notes helped in the short term, the presence of ADRD symptomatic terms years earlier than onset adds evidence to other studies that clinicians undercode diagnoses of ADRD. Deidentified clinical notes increase the accuracy of risk models. Clinical notes collected across multiple hospital systems via natural language processing can be merged using postprocessing techniques to aid model accuracy. UR - https://medinform.jmir.org/2020/6/e17819 UR - http://dx.doi.org/10.2196/17819 UR - http://www.ncbi.nlm.nih.gov/pubmed/32490841 ID - info:doi/10.2196/17819 ER - TY - JOUR AU - Zhang, Hong AU - Ni, Wandong AU - Li, Jing AU - Zhang, Jiajun PY - 2020/6/15 TI - Artificial Intelligence-Based Traditional Chinese Medicine Assistive Diagnostic System: Validation Study JO - JMIR Med Inform SP - e17608 VL - 8 IS - 6 KW - traditional Chinese medicine KW - TCM KW - disease diagnosis KW - syndrome prediction KW - syndrome differentiation KW - natural language processing KW - NLP KW - artificial intelligence KW - AI KW - assistive diagnostic system KW - convolutional neural network KW - CNN KW - machine learning KW - ML KW - BiLSTM-CRF N2 - Background: Artificial intelligence-based assistive diagnostic systems imitate the deductive reasoning process of a human physician in biomedical disease diagnosis and treatment decision making. While impressive progress in this area has been reported, most of the reported successes are applications of artificial intelligence in Western medicine.
The application of artificial intelligence in traditional Chinese medicine has lagged mainly because traditional Chinese medicine practitioners need to perform syndrome differentiation as well as biomedical disease diagnosis before a treatment decision can be made. Syndrome, a concept unique to traditional Chinese medicine, is an abstraction of a variety of signs and symptoms. The fact that the relationship between diseases and syndromes is not one-to-one but rather many-to-many makes it very challenging for a machine to perform syndrome predictions. So far, only a handful of artificial intelligence-based assistive traditional Chinese medicine diagnostic models have been reported, and they are limited in application to a single disease-type. Objective: The objective was to develop an artificial intelligence-based assistive diagnostic system capable of diagnosing multiple types of diseases that are common in traditional Chinese medicine, given a patient's electronic health record notes. The system was designed to simultaneously diagnose the disease and produce a list of corresponding syndromes. Methods: Unstructured freestyle electronic health record notes were processed by natural language processing techniques to extract clinical information such as signs and symptoms, which were represented by named entities. Natural language processing used a recurrent neural network model called the bidirectional long short-term memory network-conditional random field (BiLSTM-CRF). A convolutional neural network was then used to predict the disease-type out of 187 diseases in traditional Chinese medicine. A novel traditional Chinese medicine syndrome prediction method, an integrated learning model, was used to produce a corresponding list of probable syndromes.
By following a majority-rule voting method, the integrated learning model for syndrome prediction can take advantage of four existing prediction methods (back propagation, random forest, extreme gradient boosting, and support vector classifier) while avoiding their respective weaknesses, resulting in a consistently high prediction accuracy. Results: A data set consisting of 22,984 electronic health records from Guanganmen Hospital of the China Academy of Chinese Medical Sciences, collected between January 1, 2017, and September 7, 2018, was used. The data set contained a total of 187 diseases that are commonly diagnosed in traditional Chinese medicine. The diagnostic system was designed to be able to detect any one of the 187 disease-types. The data set was partitioned into a training set, a validation set, and a testing set in a ratio of 8:1:1. Test results suggested that the proposed system had good diagnostic accuracy and a strong capability for generalization. The disease-type prediction accuracies of the top one, top three, and top five were 80.5%, 91.6%, and 94.2%, respectively. Conclusions: The main contributions of the artificial intelligence-based traditional Chinese medicine assistive diagnostic system proposed in this paper are that 187 commonly known traditional Chinese medicine diseases can be diagnosed and a novel prediction method called an integrated learning model is demonstrated. This new prediction method outperformed all four existing methods in our preliminary experimental results. With further improvement of the algorithms and the availability of additional electronic health record data, it is expected that a wider range of traditional Chinese medicine disease-types could be diagnosed and that better diagnostic accuracies could be achieved.
UR - http://medinform.jmir.org/2020/6/e17608/ UR - http://dx.doi.org/10.2196/17608 UR - http://www.ncbi.nlm.nih.gov/pubmed/32538797 ID - info:doi/10.2196/17608 ER - TY - JOUR AU - Liu, Ziqing AU - He, Haiyang AU - Yan, Shixing AU - Wang, Yong AU - Yang, Tao AU - Li, Guo-Zheng PY - 2020/6/16 TI - End-to-End Models to Imitate Traditional Chinese Medicine Syndrome Differentiation in Lung Cancer Diagnosis: Model Development and Validation JO - JMIR Med Inform SP - e17821 VL - 8 IS - 6 KW - traditional Chinese medicine KW - syndrome differentiation KW - lung cancer KW - medical record KW - deep learning KW - model fusion N2 - Background: Traditional Chinese medicine (TCM) has been shown to be an efficient mode to manage advanced lung cancer, and accurate syndrome differentiation is crucial to treatment. Documented evidence of TCM treatment cases and the progress of artificial intelligence technology are enabling the development of intelligent TCM syndrome differentiation models. This is expected to expand the benefits of TCM to lung cancer patients. Objective: The objective of this work was to establish end-to-end TCM diagnostic models to imitate lung cancer syndrome differentiation. The proposed models used unstructured medical records as inputs to capitalize on data collected for practical TCM treatment cases by lung cancer experts. The resulting models were expected to be more efficient than approaches that leverage structured TCM datasets. Methods: We approached lung cancer TCM syndrome differentiation as a multilabel text classification problem. First, entity representation was conducted with Bidirectional Encoder Representations from Transformers and conditional random field models. Then, five deep learning-based text classification models were applied to the construction of a medical record multilabel classifier, during which two data augmentation strategies were adopted to address overfitting issues.
Finally, a fusion model approach was used to elevate the performance of the models. Results: The F1 score of the recurrent convolutional neural network (RCNN) model with augmentation was 0.8650, a 2.41% improvement over the unaugmented model. The Hamming loss for RCNN with augmentation was 0.0987, which is 1.8% lower than that of the same model without augmentation. Among the models, the text-hierarchical attention network (Text-HAN) model achieved the highest F1 scores of 0.8676 and 0.8751. The mean average precision for the word encoding-based RCNN was 10% higher than that of the character encoding-based representation. A fusion model of the text-convolutional neural network, text-recurrent neural network, and Text-HAN models achieved an F1 score of 0.8884, which showed the best performance among the models. Conclusions: Medical records could be used more productively by constructing end-to-end models to facilitate TCM diagnosis. With the aid of entity-level representation, data augmentation, and model fusion, deep learning-based multilabel classification approaches can better imitate TCM syndrome differentiation in complex cases such as advanced lung cancer. UR - https://medinform.jmir.org/2020/6/e17821 UR - http://dx.doi.org/10.2196/17821 UR - http://www.ncbi.nlm.nih.gov/pubmed/32543445 ID - info:doi/10.2196/17821 ER - TY - JOUR AU - Su, Longxiang AU - Liu, Chun AU - Li, Dongkai AU - He, Jie AU - Zheng, Fanglan AU - Jiang, Huizhen AU - Wang, Hao AU - Gong, Mengchun AU - Hong, Na AU - Zhu, Weiguo AU - Long, Yun PY - 2020/6/22 TI - Toward Optimal Heparin Dosing by Comparing Multiple Machine Learning Methods: Retrospective Study JO - JMIR Med Inform SP - e17648 VL - 8 IS - 6 KW - heparin KW - dosing KW - machine learning KW - optimization KW - intensive care unit N2 - Background: Heparin is one of the most commonly used medications in intensive care units.
In clinical practice, the use of a weight-based heparin dosing nomogram is standard practice for the treatment of thrombosis. Recently, machine learning techniques have dramatically improved the ability of computers to provide clinical decision support and have allowed for the possibility of computer-generated, algorithm-based heparin dosing recommendations. Objective: The objective of this study was to predict the effects of heparin treatment using machine learning methods to optimize heparin dosing in intensive care units based on the predictions. Patient state predictions were based upon activated partial thromboplastin time in 3 different ranges: subtherapeutic, normal therapeutic, and supratherapeutic. Methods: Retrospective data from 2 intensive care unit research databases (Multiparameter Intelligent Monitoring in Intensive Care III, MIMIC-III; e-Intensive Care Unit Collaborative Research Database, eICU) were used for the analysis. Candidate machine learning models (random forest, support vector machine, adaptive boosting, extreme gradient boosting, and shallow neural network) were compared in 3 patient groups to evaluate the classification performance for predicting the subtherapeutic, normal therapeutic, and supratherapeutic patient states. The model results were evaluated using precision, recall, F1 score, and accuracy. Results: Data from the MIMIC-III database (n=2789 patients) and from the eICU database (n=575 patients) were used. In 3-class classification, the shallow neural network algorithm performed the best (F1 scores of 87.26%, 85.98%, and 87.55% for data sets 1, 2, and 3, respectively).
The shallow neural network algorithm achieved the highest F1 scores in each patient therapeutic state group: subtherapeutic (data set 1: 79.35%; data set 2: 83.67%; data set 3: 83.33%), normal therapeutic (data set 1: 93.15%; data set 2: 87.76%; data set 3: 84.62%), and supratherapeutic (data set 1: 88.00%; data set 2: 86.54%; data set 3: 95.45%). Conclusions: The most appropriate model for predicting the effects of heparin treatment was found by comparing multiple machine learning models and can be used to further guide optimal heparin dosing. Using multicenter intensive care unit data, our study demonstrates the feasibility of predicting the outcomes of heparin treatment using data-driven methods and, thus, how machine learning-based models can be used to optimize and personalize heparin dosing to improve patient safety. Manual analysis and validation suggested that the model outperformed standard-practice heparin dosing. UR - http://medinform.jmir.org/2020/6/e17648/ UR - http://dx.doi.org/10.2196/17648 UR - http://www.ncbi.nlm.nih.gov/pubmed/32568089 ID - info:doi/10.2196/17648 ER - TY - JOUR AU - Li, Genghao AU - Li, Bing AU - Huang, Langlin AU - Hou, Sibing PY - 2020/6/23 TI - Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study JO - JMIR Med Inform SP - e17650 VL - 8 IS - 6 KW - depression detection KW - depression diagnosis KW - social media KW - automatic construction KW - domain-specific lexicon KW - depression lexicon KW - label propagation N2 - Background: According to a World Health Organization report in 2017, there was almost one patient with depression among every 20 people in China. However, the diagnosis of depression is usually difficult in terms of clinical detection owing to slow observation, high cost, and patient resistance.
Meanwhile, with the rapid emergence of social networking sites, people tend to share their daily lives and disclose inner feelings online frequently, making it possible to effectively identify mental conditions using the rich text information. Much has been achieved with English web-based corpora, but research in China on extracting language features from web-based depression signals is still at a relatively early stage. Objective: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. This lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared the performance of detection with and without our lexicon. Methods: We autoconstructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. Combined, these methods performed well in a specific corpus during construction. The lexicon was obtained based on 111,052 Weibo microblogs from 1868 users who were depressed or nondepressed. During depression detection, we considered six features, and we used five classification methods to test the detection performance. Results: The experiment results showed that in terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and steadier. When applied to detection models such as logistic regression and support vector machine, our lexicon helped the models outperform baselines by 2% to 9% and was able to improve the final accuracy of potential depression detection. Conclusions: Our depression-domain lexicon was proven to be a meaningful input for classification algorithms, providing linguistic insights into the depressive status of test subjects. We believe that this lexicon will enhance early depression detection in people on social media.
Future work will need to be carried out on a larger corpus and with more complex methods. UR - http://medinform.jmir.org/2020/6/e17650/ UR - http://dx.doi.org/10.2196/17650 UR - http://www.ncbi.nlm.nih.gov/pubmed/32574151 ID - info:doi/10.2196/17650 ER - TY - JOUR AU - Fu, Weifeng PY - 2020/6/3 TI - Application of an Isolated Word Speech Recognition System in the Field of Mental Health Consultation: Development and Usability Study JO - JMIR Med Inform SP - e18677 VL - 8 IS - 6 KW - speech recognition KW - isolated words KW - mental health KW - small vocabulary KW - HMM KW - hidden Markov model KW - programming N2 - Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat for psychological counseling. The software uses speech recognition technology to decode the user's voice information. The software system analyzes and processes the user's voice information using several related internal databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides them with psychological education. Results: The speech recognition system included features such as speech extraction, endpoint detection, feature value extraction, training data, and speech recognition. Conclusions: A hidden Markov model was adopted, based on multithreaded programming under a VC2005 compilation environment, to realize the parallel operation of the algorithm and improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system.
UR - https://medinform.jmir.org/2020/6/e18677 UR - http://dx.doi.org/10.2196/18677 UR - http://www.ncbi.nlm.nih.gov/pubmed/32384054 ID - info:doi/10.2196/18677 ER - TY - JOUR AU - Du, Lin PY - 2020/6/25 TI - Medical Emergency Resource Allocation Model in Large-Scale Emergencies Based on Artificial Intelligence: Algorithm Development JO - JMIR Med Inform SP - e19202 VL - 8 IS - 6 KW - medical emergency KW - resource allocation model KW - distribution model KW - large-scale emergencies KW - artificial intelligence N2 - Background: Before major emergencies occur, the government needs to prepare various emergency supplies in advance. To do this, it should consider the coordinated storage of different types of materials while ensuring that emergency materials are neither omitted nor superfluous. Objective: This paper aims to improve the dispatch and transportation efficiency of emergency materials under a model in which the government makes full use of Internet of Things technology and artificial intelligence technology. Methods: The paper established a model for emergency material preparation and dispatch based on queueing theory and further established a workflow system for emergency material preparation, dispatch, and transportation based on a Petri net, resulting in a highly efficient emergency material preparation and dispatch simulation system framework. Results: A decision support platform was designed to integrate all the algorithms and principles proposed. Conclusions: The resulting framework can effectively coordinate the workflow of emergency material preparation and dispatch, helping to shorten the total time of emergency material preparation, dispatch, and transportation. UR - http://medinform.jmir.org/2020/6/e19202/ UR - http://dx.doi.org/10.2196/19202 UR - http://www.ncbi.nlm.nih.gov/pubmed/32584262 ID - info:doi/10.2196/19202 ER -