TY - JOUR AU - Roytman, V. Maya AU - Lu, Layna AU - Soyemi, Elizabeth AU - Leziak, Karolina AU - Niznik, M. Charlotte AU - Yee, M. Lynn PY - 2025/4/21 TI - Exploring Psychosocial Burdens of Diabetes in Pregnancy and the Feasibility of Technology-Based Support: Qualitative Study JO - JMIR Diabetes SP - e53854 VL - 10 KW - digital health KW - mHealth KW - pregnancy KW - psychosocial KW - social determinants KW - technology KW - diabetes KW - burdens KW - qualitative analysis KW - mobile apps KW - feasibility N2 - Background: Gestational diabetes mellitus and type 2 diabetes mellitus impose psychosocial burdens on pregnant individuals. As there is little evidence about the experience and management of psychosocial burdens of diabetes mellitus during pregnancy, we sought to identify these psychosocial burdens and understand how a novel smartphone app may alleviate them. The app was designed to provide supportive, educational, motivational, and logistical support content, delivered through interactive messages. Objective: The study aimed to analyze the qualitative data generated in a feasibility randomized controlled trial of a novel mobile app designed to promote self-management skills, motivate healthy behaviors, and inform low-income pregnant individuals with diabetes. Methods: Individuals receiving routine clinical care at a single, large academic medical center in Chicago, Illinois, were randomized to use of the SweetMama app (n=30) or usual care (n=10) from diagnosis of diabetes until 6 weeks post partum. All individuals completed exit interviews at delivery about their experience of having diabetes during pregnancy. Interviews were guided by a semistructured interview guide and were conducted by a single interviewer extensively trained in empathic, culturally sensitive qualitative interviewing of pregnant and postpartum people. SweetMama users were also queried about their perspectives on the app. Interview data were audio-recorded and professionally transcribed. 
Data were analyzed by 2 researchers independently using grounded theory constant comparative techniques. Results: Of the 40 participants, the majority had gestational diabetes mellitus (n=25, 63%), publicly funded prenatal care (n=33, 83%), and identified as non-Hispanic Black (n=25, 63%) or Hispanic (n=14, 35%). Participants identified multiple psychosocial burdens, including challenges taking action, negative affectivity regarding diagnosis, diet guilt, difficulties managing other responsibilities, and reluctance to use insulin. External factors, such as taking care of children or navigating the COVID-19 pandemic, affected participant self-perception and motivation to adhere to clinical recommendations. SweetMama participants largely agreed that the use of the app helped mitigate these burdens by enhancing self-efficacy, capitalizing on external motivation, validating efforts, maintaining medical nutrition therapy, extending clinical care, and building a sense of community. Participants expressed that SweetMama supported the goals they established with their clinical team and helped them harness motivating factors for self-care. Conclusions: Psychosocial burdens of diabetes during pregnancy present challenges with diabetes self-management. Mobile health support may be an effective tool to provide motivation, behavioral cues, and access to educational and social network resources to alleviate psychosocial burdens during pregnancy. Future incorporation of machine learning and language processing models in the app may provide further personalization of recommendations and education for individuals with DM during pregnancy. 
Trial Registration: ClinicalTrials.gov NCT03240874; https://clinicaltrials.gov/study/NCT03240874 UR - https://diabetes.jmir.org/2025/1/e53854 UR - http://dx.doi.org/10.2196/53854 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53854 ER - TY - JOUR AU - Draugelis, Sarah AU - Hunnewell, Jessica AU - Bishop, Sam AU - Goswami, Reena AU - Smith, G. Sean AU - Sutherland, Philip AU - Hickman, Justin AU - Donahue, A. Donald AU - Yendewa, A. George AU - Mohareb, M. Amir PY - 2025/4/17 TI - Leveraging Electronic Health Records in International Humanitarian Clinics for Population Health Research: Cross-Sectional Study JO - JMIR Public Health Surveill SP - e66223 VL - 11 KW - refugee KW - population health KW - disaster medicine KW - humanitarian clinic KW - electronic health record KW - Fast Electronic Medical Record KW - fEMR N2 - Background: As more humanitarian relief organizations are beginning to use electronic medical records in their operations, data from clinical encounters can be leveraged for public health planning. Currently, medical data from humanitarian medical workers are infrequently available in a format that can be analyzed, interpreted, and used for public health. Objectives: This study aims to develop and test a methodology by which diagnosis and procedure codes can be derived from free-text medical encounters by medical relief practitioners for the purposes of data analysis. Methods: We conducted a cross-sectional study of clinical encounters from humanitarian clinics for displaced persons in Mexico between August 3, 2021, and December 5, 2022. We developed and tested a method by which free-text encounters were reviewed by medical billing coders and assigned codes from the International Classification of Diseases, Tenth Revision (ICD-10) and the Current Procedural Terminology (CPT). Each encounter was independently reviewed in duplicate and assigned ICD-10 and CPT codes in a blinded manner. 
Encounters with discordant codes were reviewed and arbitrated by a more experienced medical coder, whose decision was used to determine the final ICD-10 and CPT codes. We used chi-square tests of independence to compare the ICD-10 codes for concordance across single-diagnosis and multidiagnosis encounters and across patient characteristics, such as age, sex, and country of origin. Results: We analyzed 8460 encounters representing 5623 unique patients and 2774 unique diagnosis codes. These free-text encounters had a mean of 20.5 words per encounter in the clinical documentation. There were 58.78% (4973/8460) encounters where both coders assigned 1 diagnosis code, 18.56% (1570/8460) encounters where both coders assigned multiple diagnosis codes, and 22.66% (1917/8460) encounters with a mixed number of codes assigned. Of the 4973 encounters with a single code, only 11.82% (n=588) had a unique diagnosis assigned by the arbitrator that was not assigned by either of the initial 2 coders. Of the 1570 encounters with multiple diagnosis codes, only 3.38% (n=53) had unique diagnosis codes assigned by the arbitrator that were not initially assigned by either coder. The frequency of complete concordance across diagnosis codes was similar across sex categories and ranged from 30.43% to 46.05% across age groups and countries of origin. Conclusions: Free-text electronic medical records from humanitarian relief clinics can be used to develop a database of diagnosis and procedure codes. The method developed in this study, which used multiple independent reviews of clinical encounters, appears to reliably assign diagnosis codes across a diverse patient population in a resource-limited setting. 
UR - https://publichealth.jmir.org/2025/1/e66223 UR - http://dx.doi.org/10.2196/66223 ID - info:doi/10.2196/66223 ER - TY - JOUR AU - Riou, Christine AU - El Azzouzi, Mohamed AU - Hespel, Anne AU - Guillou, Emeric AU - Coatrieux, Gouenou AU - Cuggia, Marc PY - 2025/4/17 TI - Ensuring General Data Protection Regulation Compliance and Security in a Clinical Data Warehouse From a University Hospital: Implementation Study JO - JMIR Med Inform SP - e63754 VL - 13 KW - clinical data warehouse KW - privacy KW - personal data protection KW - legislation KW - security KW - compliance KW - personal data KW - applicability KW - experiential analysis KW - university hospitals KW - French KW - France KW - data hub KW - operational challenge N2 - Background: The European Union's General Data Protection Regulation (GDPR) has profoundly influenced health data management, with significant implications for clinical data warehouses (CDWs). In 2021, France pioneered a national framework for GDPR-compliant CDW implementation, established by its data protection authority (Commission Nationale de l'Informatique et des Libertés). This framework provides detailed guidelines for health care institutions, offering a unique opportunity to assess practical GDPR implementation in health data management. Objective: This study evaluates the real-world applicability of France's CDW framework through its implementation at a major university hospital. It identifies practical challenges for its implementation by health institutions and proposes adaptations relevant to regulatory authorities in order to facilitate research in secondary use data domains. Methods: A systematic assessment was conducted in May 2023 at the University Hospital of Rennes, which manages data for over 2 million patients through the eHOP CDW system. The evaluation examined 116 criteria across 13 categories using a dual-assessment approach validated by information security and data protection officers. 
Compliance was rated as met, unmet, or not applicable, with criteria classified as software-related (n=25) or institution-related (n=91). Results: Software-related criteria showed 60% (n=15) compliance, with 28% (n=7) noncompliant or partially compliant and 12% (n=3) not applicable. Institution-related criteria achieved 72% (n=28) compliance for security requirements. Key challenges included managing genetic data, implementing automated archiving, and controlling data exports. The findings revealed effective privacy protection measures but also highlighted areas requiring regulatory adjustments to better support research. Conclusions: This first empirical assessment of a national CDW compliance framework offers valuable insights for health care institutions implementing GDPR requirements. While the framework establishes robust privacy protections, certain provisions may overly constrain research activities. The study identifies opportunities for framework evolution, balancing data protection with research imperatives. 
UR - https://medinform.jmir.org/2025/1/e63754 UR - http://dx.doi.org/10.2196/63754 ID - info:doi/10.2196/63754 ER - TY - JOUR AU - Wiesmüller, Fabian AU - Prenner, Andreas AU - Ziegl, Andreas AU - El-Moazen, Gihan AU - Modre-Osprian, Robert AU - Baumgartner, Martin AU - Brodmann, Marianne AU - Seinost, Gerald AU - Silbernagel, Günther AU - Schreier, Günter AU - Hayn, Dieter PY - 2025/4/10 TI - Support of Home-Based Structured Walking Training and Prediction of the 6-Minute Walk Test Distance in Patients With Peripheral Arterial Disease Based on Telehealth Data: Prospective Cohort Study JO - JMIR Form Res SP - e65721 VL - 9 KW - mHealth KW - telehealth KW - peripheral arterial disease KW - home-based structured walking training KW - trend estimation KW - predictive modeling KW - continuous data KW - walking KW - walking training KW - prediction KW - prediction model KW - cardiovascular disease KW - stroke KW - heart failure KW - physical fitness KW - telehealth system N2 - Background: Telehealth has been effective in managing cardiovascular diseases like stroke and heart failure and has shown promising results in managing patients with peripheral arterial disease. However, more work is needed to fully understand the effect of telehealth-based predictive modeling on the physical fitness of patients with peripheral arterial disease. Objective: For this work, data from the Keep Pace study were analyzed in depth to gain insights on temporal developments of patients' conditions and to develop models to predict the patients' total walking distance at the study end. This could help to determine patients who are likely to benefit from the telehealth program and to continuously provide estimations to the patients as a motivating factor. Methods: This work analyzes continuous patient-reported telehealth data, in combination with in-clinic data from 19 Fontaine stage II patients with peripheral arterial disease who underwent a 12-week telehealth-based walking program. 
This analysis granted insights into the increase of the total walking distance of the 6-minute walk tests (6MWT) as a measure for physical fitness, the steady decrease in the patients' pain, and the positive correlation between well-being and the total walking distance measured by the 6MWT. Results: This work analyzed trends of and correlations between continuous patient-generated data. Findings of this study include a significant decrease of the patients' pain sensation over time (P=.006), a low but highly significant correlation between pain sensation and steps taken on the same day (r=−0.11; P<.001) and the walking distance of the independently performed 6MWTs (r=−0.39; P<.001). Despite the reported pain, adherence to the 6MWT measurement protocol was high (85.53%). Additionally, patients significantly improved their timed-up-and-go test times during the study (P=.002). Predicting the total walking distance at the study end measured by the 6MWT worked well at study baseline (root mean squared error of 30 meters; 7.04% of the mean total walking distance at the study end of 425 meters) and continuously improved by adding further telehealth data. Future work should validate these findings in a larger cohort and in a prospective setting based on a clinical outcome. Conclusions: We conclude that the prototypical trend estimation has great potential for an integration in the telehealth system to be used in future work to provide tailored patient-specific advice based on these predictions. Continuous data from the telehealth system grant a deeper insight and a better understanding of the patients' status concerning well-being and level of pain as well as their current physical fitness level and the progress toward reaching set goals. 
Trial Registration: ClinicalTrials.gov Identifier: NCT05619835; https://tinyurl.com/mrxt7y9u UR - https://formative.jmir.org/2025/1/e65721 UR - http://dx.doi.org/10.2196/65721 ID - info:doi/10.2196/65721 ER - TY - JOUR AU - Sumsion, Daniel AU - Davis, Elijah AU - Fernandes, Marta AU - Wei, Ruoqi AU - Milde, Rebecca AU - Veltink, Malou Jet AU - Kong, Wan-Yee AU - Xiong, Yiwen AU - Rao, Samvrit AU - Westover, Tara AU - Petersen, Lydia AU - Turley, Niels AU - Singh, Arjun AU - Buss, Stephanie AU - Mukerji, Shibani AU - Zafar, Sahar AU - Das, Sudeshna AU - Junior, Moura Valdery AU - Ghanta, Manohar AU - Gupta, Aditya AU - Kim, Jennifer AU - Stone, Katie AU - Mignot, Emmanuel AU - Hwang, Dennis AU - Trotti, Marie Lynn AU - Clifford, D. Gari AU - Katwa, Umakanth AU - Thomas, Robert AU - Westover, Brandon M. AU - Sun, Haoqi PY - 2025/4/10 TI - Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study JO - JMIR Med Inform SP - e64113 VL - 13 KW - electronic health record KW - machine learning KW - artificial intelligence KW - phenotype KW - congestive heart failure KW - medication KW - claims database KW - International Classification of Diseases KW - effectiveness KW - natural language processing KW - model performance KW - logistic regression KW - validity N2 - Background: Congestive heart failure (CHF) is a common cause of hospital admissions. Medical records contain valuable information about CHF, but manual chart review is time-consuming. Claims databases (using International Classification of Diseases [ICD] codes) provide a scalable alternative but are less accurate. Automated analysis of medical records through natural language processing (NLP) enables more efficient adjudication but has not yet been validated across multiple sites. 
Objective: We seek to accurately classify the diagnosis of CHF based on structured and unstructured data from each patient, including medications, ICD codes, and information extracted through NLP of notes left by providers, by comparing the effectiveness of several machine learning models. Methods: We developed an NLP model to identify CHF from medical records using electronic health records (EHRs) from two hospitals (Mass General Hospital and Beth Israel Deaconess Medical Center; from 2010 to 2023), with 2800 clinical visit notes from 1821 patients. We trained and compared the performance of logistic regression, random forests, and RoBERTa models. We measured model performance using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). These models were also externally validated by training the data on one hospital sample and testing on the other, and an overall estimated error was calculated using a completely random sample from both hospitals. Results: The average age of the patients was 66.7 (SD 17.2) years; 978 (54.3%) out of 1821 patients were female. The logistic regression model achieved the best performance using a combination of ICD codes, medications, and notes, with an AUROC of 0.968 (95% CI 0.940-0.982) and an AUPRC of 0.921 (95% CI 0.835-0.969). The models that only used ICD codes or medications had lower performance. The estimated overall error rate in a random EHR sample was 1.6%. The model also showed high external validity from training on Mass General Hospital data and testing on Beth Israel Deaconess Medical Center data (AUROC 0.927, 95% CI 0.908-0.944) and vice versa (AUROC 0.968, 95% CI 0.957-0.976). Conclusions: The proposed EHR-based phenotyping model for CHF achieved excellent performance, external validity, and generalization across two institutions. 
The model enables multiple downstream uses, paving the way for large-scale studies of CHF treatment effectiveness, comorbidities, outcomes, and mechanisms. UR - https://medinform.jmir.org/2025/1/e64113 UR - http://dx.doi.org/10.2196/64113 UR - http://www.ncbi.nlm.nih.gov/pubmed/40208662 ID - info:doi/10.2196/64113 ER - TY - JOUR AU - Remaki, Adam AU - Ung, Jacques AU - Pages, Pierre AU - Wajsburt, Perceval AU - Liu, Elise AU - Faure, Guillaume AU - Petit-Jean, Thomas AU - Tannier, Xavier AU - Gérardin, Christel PY - 2025/4/9 TI - Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study JO - JMIR Med Inform SP - e68704 VL - 13 KW - secondary use of clinical data for research and surveillance KW - clinical informatics KW - clinical data warehouse KW - electronic health record KW - data science KW - artificial intelligence KW - AI KW - natural language processing KW - ontologies KW - classifications KW - coding KW - tools KW - programs and algorithms KW - immune-mediated inflammatory diseases N2 - Background: Valuable insights gathered by clinicians during their inquiries and documented in textual reports are often unavailable in the structured data recorded in electronic health records (EHRs). Objective: This study aimed to highlight that mining unstructured textual data with natural language processing techniques complements the available structured data and enables more comprehensive patient phenotyping. A proof-of-concept for patients diagnosed with specific autoimmune diseases is presented, in which the extraction of information on laboratory tests and drug treatments is performed. 
Methods: We collected EHRs available in the clinical data warehouse of the Greater Paris University Hospitals from 2012 to 2021 for patients hospitalized and diagnosed with 1 of 4 immune-mediated inflammatory diseases: systemic lupus erythematosus, systemic sclerosis, antiphospholipid syndrome, and Takayasu arteritis. Then, we built, trained, and validated natural language processing algorithms on 103 discharge summaries selected from the cohort and annotated by a clinician. Finally, all discharge summaries in the cohort were processed with the algorithms, and the extracted data on laboratory tests and drug treatments were compared with the structured data. Results: Named entity recognition followed by normalization yielded F1-scores of 71.1 (95% CI 63.6-77.8) for the laboratory tests and 89.3 (95% CI 85.9-91.6) for the drugs. Application of the algorithms to 18,604 EHRs increased the detection of antibody results and drug treatments. For instance, among patients in the systemic lupus erythematosus cohort with positive antinuclear antibodies, the rate increased from 18.34% (752/4102) to 71.87% (2949/4102), making the results more consistent with the literature. Conclusions: While challenges remain in standardizing laboratory tests, particularly with abbreviations, this work, based on secondary use of clinical data, demonstrates that automated processing of discharge summaries enriched the information available in structured data and facilitated more comprehensive patient profiling. 
UR - https://medinform.jmir.org/2025/1/e68704 UR - http://dx.doi.org/10.2196/68704 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/68704 ER - TY - JOUR AU - Han, Seunghoon AU - Song, Jihong AU - Han, Sungpil AU - Choi, Suein AU - Lim, Jonghyuk AU - Oh, Yeob Byeong AU - Shin, Dongoh PY - 2025/4/7 TI - Participant Adherence in Repeated-Dose Clinical Studies Using Video-Based Observation: Retrospective Data Analysis JO - JMIR Mhealth Uhealth SP - e65668 VL - 13 KW - adherence KW - mobile health KW - self-administration KW - repeated-dose clinical trials KW - video-based monitoring KW - mobile phone N2 - Background: Maintaining accurate medication records in clinical trials is essential to ensure data validity. Traditional methods such as direct observation, self-reporting, and pill counts have shown limitations that make them inaccurate or impractical. Video-based monitoring systems, available as commercial or proprietary mobile applications for smartphones and tablets, offer a promising solution to these traditional limitations. In Korea, a system applicable to the clinical trial context has been developed and used. Objective: This study aimed to evaluate the usefulness of an asynchronous video-based self-administration of the investigational medicinal product (SAI) monitoring system (VSMS) in ensuring accurate dosing and validating participant adherence to planned dosing times in repeated-dose clinical trials. Methods: A retrospective analysis was conducted using data from 17,619 SAI events in repeated-dose clinical trials using the VSMS between February 2020 and March 2023. The SAI events were classified into four categories: (1) Verified on-time dosing, (2) Verified deviated dosing, (3) Unverified dosing, and (4) Missed dosing. Analysis methods included calculating the success rate for verified SAI events and analyzing trends in difference between planned and actual dosing times (PADEV) over the dosing period and by push notification type. 
The mean PADEV for each subsequent dosing period was compared with the initial period using either a paired t test or a Wilcoxon signed-rank test to assess any differences. Results: A comprehensive analysis of 17,619 scheduled SAI events across 14 cohorts demonstrated a high success rate of 97% (17,151/17,619), with only 3% (468/17,619) unsuccessful due to issues like unclear video recordings or technical difficulties. Of the successful events, 99% (16,975/17,151) were verified as on-time dosing, confirming that the dosing occurred within the designated SAI time window with appropriate recorded behavior. In addition, over 90% (367/407) of participants consistently reported dosing videos on all analyzed SAI days, with most days showing over 90% objective dosing data, underscoring the system's effectiveness in supporting accurate SAI. There were cohort differences in the tendency to dose earlier or later, but no associated cohort characteristics were identified. The initial SAI behaviors were generally sustained during the whole period of participation, with only 16% (13/79) of study days showing significant shifts in actual dosing times. Earlier deviations in SAI times were observed when only dosing notifications were used, compared with using reminders together or no notifications. Conclusions: VSMS has proven to be an effective tool for obtaining dosing information with accuracy comparable to direct observation, even in remote settings. The use of various alarm features and appropriate intervention by the investigator or observer was identified as a way to minimize adherence deterioration. It is expected that the usage and usefulness of VSMS will be continuously improved through the accumulation of experience in various medical fields. 
UR - https://mhealth.jmir.org/2025/1/e65668 UR - http://dx.doi.org/10.2196/65668 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65668 ER - TY - JOUR AU - Karschuck, Philipp AU - Groeben, Christer AU - Koch, Rainer AU - Krones, Tanja AU - Neisius, Andreas AU - von Ahn, Sven AU - Klopf, Peter Christian AU - Weikert, Steffen AU - Siebels, Michael AU - Haseke, Nicolas AU - Weissflog, Christian AU - Baunacke, Martin AU - Thomas, Christian AU - Liske, Peter AU - Tosev, Georgi AU - Benusch, Thomas AU - Schostak, Martin AU - Stein, Joachim AU - Spiegelhalder, Philipp AU - Ihrig, Andreas AU - Huber, Johannes PY - 2025/4/7 TI - Urologists' Estimation of Online Support Group Utilization Behavior of Their Patients With Newly Diagnosed Nonmetastatic Prostate Cancer in Germany: Predefined Secondary Analysis of a Randomized Controlled Trial JO - J Med Internet Res SP - e56092 VL - 27 KW - peer support KW - prostate cancer KW - online support KW - health services research KW - randomized controlled trial KW - decision aid N2 - Background: Due to its high incidence, prostate cancer (PC) imposes a burden on Western societies. Individualized treatment decision for nonmetastatic PC (eg, surgery, radiation, focal therapy, active surveillance, watchful waiting) is challenging. The range of options might make affected persons seek peer-to-peer counseling. Besides traditional face-to-face support groups (F2FGs), online support groups (OSGs) became important, especially during COVID-19. Objective: This study aims to investigate utilization behavior and physician advice concerning F2FGs and OSGs for patients with newly diagnosed PC. We hypothesized greater importance of OSGs to support treatment decisions. We assumed that this form of peer-to-peer support is underestimated by the treating physicians. We also considered the effects of the COVID-19 pandemic. 
Methods: This was a secondary analysis of data from a randomized controlled trial comparing an online decision aid versus a printed brochure for patients with nonmetastatic PC. We investigated 687 patients from 116 urological practices throughout Germany before primary treatment. Of these, 308 were included before and 379 during the COVID-19 pandemic. At the 1-year follow-up visit, patients completed an online questionnaire about their use of traditional or online self-help, including consultation behaviors or attitudes concerning initial treatment decisions. We measured secondary outcomes with validated questionnaires such as the Distress Thermometer and the Patient Health Questionnaire-4 to assess distress, anxiety, and depression. Physicians were asked in a paper-based questionnaire whether patients had accessed peer-to-peer support. Group comparisons were made using chi-square or McNemar tests for nominal variables and 2-sided t tests for ordinally scaled data. Results: Before COVID-19, 2.3% (7/308) of the patients attended an F2FG versus none thereafter. The frequency of OSG use did not change significantly: OSGs were used by 24.7% (76/308) and 23.5% (89/379) of the patients before and during COVID-19, respectively. OSG users had higher levels of anxiety and depression; 38% (46/121) reported OSG as helpful for decision-making. Although 4% (19/477) of OSG nonusers regretted treatment decisions, only 0.7% (1/153) of OSG users did (P=.03). More users than nonusers reported that OSGs were mentioned by physicians (P<.001). Patients and physicians agreed that F2FGs and OSGs were not mentioned in conversations or visited by patients. For 86% (6/7) of the patients, the physician was not aware of F2FG attendance. Physicians estimated OSG usage at 2.6% (18/687), whereas actual use was 24% (165/687; P<.001). Conclusions: Physicians are more aware of F2FGs than OSGs. Before COVID-19, F2FGs played a minor role. One out of 4 patients used OSGs. 
One-third considered them helpful for treatment decision-making. OSG use rarely affects the final treatment decision. Urologists significantly underestimate OSG use by their patients. Peer-to-peer support is more likely to be received by patients with anxiety and depression. Comparative interventional trials are needed to recommend peer-to-peer interventions for suitable patients. Trial Registration: German Clinical Trials Register DRKS-ID DRKS00014627; https://drks.de/search/en/trial/DRKS00014627 UR - https://www.jmir.org/2025/1/e56092 UR - http://dx.doi.org/10.2196/56092 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56092 ER - TY - JOUR AU - Männikkö, Viljami AU - Tommola, Janne AU - Tikkanen, Emmi AU - Hätinen, Olli-Pekka AU - Åberg, Fredrik PY - 2025/4/2 TI - Large-Scale Evaluation and Liver Disease Risk Prediction in Finland's National Electronic Health Record System: Feasibility Study Using Real-World Data JO - JMIR Med Inform SP - e62978 VL - 13 KW - Kanta archive KW - national patient data repository KW - real world data KW - risk prediction KW - chronic liver disease KW - mortality KW - risk detection KW - alcoholic liver KW - prediction KW - obesity KW - overweight KW - electronic health record KW - wearables KW - smartwatch N2 - Background: Globally, the incidence and mortality of chronic liver disease are escalating. Early detection of liver disease remains a challenge, often occurring at symptomatic stages when preventative measures are less effective. The Chronic Liver Disease score (CLivD) is a predictive risk model developed using Finnish health care data, aiming to forecast an individual's risk of developing chronic liver disease in subsequent years. The Kanta Service is a national electronic health record system in Finland that stores comprehensive health care data including patient medical histories, prescriptions, and laboratory results, to facilitate health care delivery and research. 
Objective: This study aimed to evaluate the feasibility of implementing an automatic CLivD score with the current Kanta platform and identify and suggest improvements for Kanta that would enable accurate automatic risk detection. Methods: In this study, a real-world data repository (Kanta) was used as a data source for the CLivD score risk calculation model. Our dataset consisted of 96,200 individuals' whole medical history from Kanta. For real-world data use, we designed processes to handle missing input in the calculation process. Results: We found that Kanta currently lacks many CLivD risk model input parameters in the structured format required to calculate precise risk scores. However, the risk scores can be improved by using the unstructured text in patient reports and by approximating variables by using other health data, such as diagnosis information. Using structured data, we were able to identify only 33 out of 51,275 individuals in the "low risk" category and 308 out of 51,275 individuals (<1%) in the "moderate risk" category. By adding diagnosis information approximation and free text use, we were able to identify 18,895 out of 51,275 (37%) individuals in the "low risk" category and 2125 out of 51,275 (4%) individuals in the "moderate risk" category. In both cases, we were not able to identify any individuals in the "high risk" category because of the missing waist-hip ratio measurement. We evaluated 3 scenarios to improve the coverage of waist-hip ratio data in Kanta, and these yielded the most substantial improvement in prediction accuracy. Conclusions: We conclude that the current structured Kanta data are not sufficient for precise risk calculation for CLivD or other diseases where obesity, smoking, and alcohol use are important risk factors. Our simulations show up to 14% improvement in risk detection when adding support for missing input variables. 
Kanta shows the potential for implementing nationwide automated risk detection models that could result in improved disease prevention and public health. UR - https://medinform.jmir.org/2025/1/e62978 UR - http://dx.doi.org/10.2196/62978 UR - http://www.ncbi.nlm.nih.gov/pubmed/40172947 ID - info:doi/10.2196/62978 ER - TY - JOUR AU - Yang, Hao AU - Li, Jiaxi AU - Zhang, Chi AU - Sierra, Pazos Alejandro AU - Shen, Bairong PY - 2025/3/27 TI - Large Language Model-Driven Knowledge Graph Construction in Sepsis Care Using Multicenter Clinical Databases: Development and Usability Study JO - J Med Internet Res SP - e65537 VL - 27 KW - sepsis KW - knowledge graph KW - large language models KW - prompt engineering KW - real-world KW - GPT-4.0 N2 - Background: Sepsis is a complex, life-threatening condition characterized by significant heterogeneity and vast amounts of unstructured data, posing substantial challenges for traditional knowledge graph construction methods. The integration of large language models (LLMs) with real-world data offers a promising avenue to address these challenges and enhance the understanding and management of sepsis. Objective: This study aims to develop a comprehensive sepsis knowledge graph by leveraging the capabilities of LLMs, specifically GPT-4.0, in conjunction with multicenter clinical databases. The goal is to improve the understanding of sepsis and provide actionable insights for clinical decision-making. We also established a multicenter sepsis database (MSD) to support this effort. Methods: We collected clinical guidelines, public databases, and real-world data from 3 major hospitals in Western China, encompassing 10,544 patients diagnosed with sepsis. Using GPT-4.0, we applied advanced prompt engineering techniques for entity recognition and relationship extraction, which facilitated the construction of a nuanced sepsis knowledge graph.
Results: We established a sepsis database with 10,544 patient records, including 8497 from West China Hospital, 690 from Shangjin Hospital, and 357 from Tianfu Hospital. The sepsis knowledge graph comprises 1894 nodes and 2021 distinct relationships, encompassing 9 entity concepts (diseases, symptoms, biomarkers, imaging examinations, etc) and 8 semantic relationships (complications, recommended medications, laboratory tests, etc). GPT-4.0 demonstrated superior performance in entity recognition and relationship extraction, achieving an F1-score of 76.76 on a sepsis-specific dataset, outperforming other models such as Qwen2 (43.77) and Llama3 (48.39). On the CMeEE dataset, GPT-4.0 achieved an F1-score of 65.42 using few-shot learning, surpassing traditional models such as BERT-CRF (62.11) and Med-BERT (60.66). Building upon this, we compiled a comprehensive sepsis knowledge graph, comprising 1894 nodes and 2021 distinct relationships. Conclusions: This study represents a pioneering effort in using LLMs, particularly GPT-4.0, to construct a comprehensive sepsis knowledge graph. The innovative application of prompt engineering, combined with the integration of multicenter real-world data, has significantly enhanced the efficiency and accuracy of knowledge graph construction. The resulting knowledge graph provides a robust framework for understanding sepsis, supporting clinical decision-making, and facilitating further research. The success of this approach underscores the potential of LLMs in medical research and sets a new benchmark for future studies in sepsis and other complex medical conditions. UR - https://www.jmir.org/2025/1/e65537 UR - http://dx.doi.org/10.2196/65537 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65537 ER - TY - JOUR AU - Acitores Cortina, Miguel Jose AU - Fatapour, Yasaman AU - Brown, LaRow Kathleen AU - Gisladottir, Undina AU - Zietz, Michael AU - Bear Don't Walk IV, John Oliver AU - Peter, Danner AU - Berkowitz, S.
Jacob AU - Friedrich, A. Nadine AU - Kivelson, Sophia AU - Kuchi, Aditi AU - Liu, Hongyu AU - Srinivasan, Apoorva AU - Tsang, K. Kevin AU - Tatonetti, P. Nicholas PY - 2025/3/27 TI - Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for "Complete Data": Observational Clinical Data Analysis JO - JMIR Med Inform SP - e67591 VL - 13 KW - health disparities KW - data quality KW - observational research KW - electronic health records KW - racial and ethnic biases N2 - Background: Integrated clinical databases from national biobanks have advanced the capacity for disease research. Data quality and completeness filters are used when building clinical cohorts to address limitations of data missingness. However, these filters may unintentionally introduce systemic biases when they are correlated with race and ethnicity. Objective: In this study, we examined the race and ethnicity biases introduced by applying common filters to 4 clinical records databases. Specifically, we evaluated whether these filters introduce biases that disproportionately exclude minoritized groups. Methods: We applied 19 commonly used data filters to electronic health record datasets from 4 geographically varied locations comprising close to 12 million patients to understand how using these filters introduces sample bias along racial and ethnic groupings. These filters covered a range of information, including demographics, medication records, visit details, and observation periods. We observed the variation in sample drop-off between self-reported ethnic and racial groups for each site as we applied each filter individually. Results: Applying the observation period filter substantially reduced data availability across all races and ethnicities in all 4 datasets. However, among those examined, the availability of data in the white group remained consistently higher compared to other racial groups after applying each filter.
Conversely, the Black or African American group was the most impacted by each filter in these 3 datasets: Cedars-Sinai dataset, UK Biobank, and Columbia University dataset. Among the 4 distinct datasets, only applying the filters to the All of Us dataset resulted in minimal deviation from the baseline, with most racial and ethnic groups following a similar pattern. Conclusions: Our findings underscore the importance of using only necessary filters, as they might disproportionately affect data availability for minoritized racial and ethnic populations. Researchers must consider these unintentional biases when performing data-driven research and explore techniques to minimize the impact of these filters, such as probabilistic methods or adjusted cohort selection methods. Additionally, we recommend disclosing sample sizes for racial and ethnic groups both before and after data filters are applied to aid the reader in understanding the generalizability of the results. Future work should focus on exploring the effects of filters on downstream analyses. UR - https://medinform.jmir.org/2025/1/e67591 UR - http://dx.doi.org/10.2196/67591 ID - info:doi/10.2196/67591 ER - TY - JOUR AU - Schmit, D. Cason AU - O'Connell, Curry Meghan AU - Shewbrooks, Sarah AU - Abourezk, Charles AU - Cochlin, J. Fallon AU - Doerr, Megan AU - Kum, Hye-Chung PY - 2025/3/26 TI - Dying in Darkness: Deviations From Data Sharing Ethics in the US Public Health System and the Data Genocide of American Indian and Alaska Native Communities JO - J Med Internet Res SP - e70983 VL - 27 KW - ethics KW - information dissemination KW - indigenous peoples KW - public health surveillance KW - privacy KW - data sharing KW - deidentification KW - data anonymization KW - public health ethics KW - data governance UR - https://www.jmir.org/2025/1/e70983 UR - http://dx.doi.org/10.2196/70983 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/70983 ER - TY - JOUR AU - Ehrig, Molly AU - Bullock, S.
Garrett AU - Leng, Iris Xiaoyan AU - Pajewski, M. Nicholas AU - Speiser, Lynn Jaime PY - 2025/3/13 TI - Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data JO - JMIR Med Inform SP - e64354 VL - 13 KW - missing indicator method KW - missing data KW - imputation KW - longitudinal data KW - electronic health record data KW - electronic health records KW - EHR KW - simulation study KW - clinical prediction model KW - prediction model KW - older adults KW - falls KW - logistic regression KW - prediction modeling N2 - Background: Missing data in electronic health records are highly prevalent and result in analytical concerns such as heterogeneous sources of bias and loss of statistical power. One simple analytic method for addressing missing or unknown covariate values is to treat missingness for a particular variable as a category unto itself, which we refer to as the missing indicator method. For cross-sectional analyses, recent work suggested that there was minimal benefit to the missing indicator method; however, it is unclear how this approach performs in the setting of longitudinal data, in which correlation among clustered repeated measures may be leveraged for potentially improved model performance. Objectives: This study aims to conduct a simulation study to evaluate whether the missing indicator method improved model performance and imputation accuracy for longitudinal data mimicking an application of developing a clinical prediction model for falls in older adults based on electronic health record data. Methods: We simulated a longitudinal binary outcome using mixed effects logistic regression that emulated a falls assessment at annual follow-up visits. Using multivariate imputation by chained equations, we simulated time-invariant predictors such as sex and medical history, as well as dynamic predictors such as physical function, BMI, and medication use.
We induced missing data in predictors under scenarios that had both random (missing at random) and dependent missingness (missing not at random). We evaluated aggregate performance using the area under the receiver operating characteristic curve (AUROC) for models with and without missing indicators as predictors, as well as complete case analysis, across simulation replicates. We evaluated imputation quality using normalized root-mean-square error for continuous variables and percent falsely classified for categorical variables. Results: Independent of the mechanism used to simulate missing data (missing at random or missing not at random), overall model performance via AUROC was similar regardless of whether missing indicators were included in the model. The root-mean-square error and percent falsely classified measures were similar for models including missing indicators versus those without missing indicators. Model performance and imputation quality were similar regardless of whether the outcome was related to missingness. Imputation with or without missing indicators had similar mean values of AUROC compared with complete case analysis, although complete case analysis had the largest range of values. Conclusions: The results of this study suggest that the inclusion of missing indicators in longitudinal data modeling neither improves nor worsens overall performance or imputation accuracy. Future research is needed to address whether the inclusion of missing indicators is useful in prediction modeling with longitudinal data in different settings, such as high-dimensional data analysis.
UR - https://medinform.jmir.org/2025/1/e64354 UR - http://dx.doi.org/10.2196/64354 ID - info:doi/10.2196/64354 ER - TY - JOUR AU - Yang, Zhongbao AU - Xu, Shan-Shan AU - Liu, Xiaozhu AU - Xu, Ningyuan AU - Chen, Yuqing AU - Wang, Shuya AU - Miao, Ming-Yue AU - Hou, Mengxue AU - Liu, Shuai AU - Zhou, Yi-Min AU - Zhou, Jian-Xin AU - Zhang, Linlin PY - 2025/3/12 TI - Large Language Model-Based Critical Care Big Data Deployment and Extraction: Descriptive Analysis JO - JMIR Med Inform SP - e63216 VL - 13 KW - big data KW - critical care-related databases KW - database deployment KW - large language model KW - database extraction KW - intensive care unit KW - ICU KW - GPT KW - artificial intelligence KW - AI KW - LLM N2 - Background: Publicly accessible critical care-related databases contain enormous amounts of clinical data, but their utilization often requires advanced programming skills. The growing complexity of large databases and unstructured data presents challenges for clinicians who need programming or data analysis expertise to utilize these systems directly. Objective: This study aims to simplify critical care-related database deployment and extraction via large language models. Methods: The development of this platform was a 2-step process. First, we enabled automated database deployment using Docker container technology, incorporating the web-based analytics interfaces Metabase and Superset. Second, we developed the intensive care unit-generative pretrained transformer (ICU-GPT), a large language model fine-tuned on intensive care unit (ICU) data that integrated LangChain and Microsoft AutoGen. Results: The automated deployment platform was designed with user-friendliness in mind, enabling clinicians to deploy 1 or multiple databases in local, cloud, or remote environments without the need for manual setup.
After successfully overcoming GPT's token limit and supporting multischema data, ICU-GPT could generate Structured Query Language (SQL) queries and extract insights from ICU datasets based on request input. A front-end user interface was developed for clinicians to achieve code-free SQL generation on the web-based client. Conclusions: By harnessing the power of our automated deployment platform and ICU-GPT model, clinicians are empowered to easily visualize, extract, and arrange critical care-related databases more efficiently and flexibly than with manual methods. Our research could decrease the time and effort spent on complex bioinformatics methods and advance clinical research. UR - https://medinform.jmir.org/2025/1/e63216 UR - http://dx.doi.org/10.2196/63216 ID - info:doi/10.2196/63216 ER - TY - JOUR AU - Mast, H. Nicholas AU - Oeste, L. Clara AU - Hens, Dries PY - 2025/3/12 TI - Assessing Total Hip Arthroplasty Outcomes and Generating an Orthopedic Research Outcome Database via a Natural Language Processing Pipeline: Development and Validation Study JO - JMIR Med Inform SP - e64705 VL - 13 KW - total hip arthroplasty KW - THA KW - direct anterior approach KW - electronic health records KW - EHR KW - natural language processing KW - NLP KW - complication rate KW - single-surgeon registry KW - hip arthroplasty KW - orthopedic KW - validation KW - surgeon KW - outpatient visits KW - hospitalizations KW - surgery N2 - Background: Processing data from electronic health records (EHRs) to build research-grade databases is a lengthy and expensive process. Modern arthroplasty practice commonly uses multiple sites of care, including clinics and ambulatory care centers. However, most private data systems prevent obtaining usable insights for clinical practice.
Objective: This study aims to create an automated natural language processing (NLP) pipeline for extracting clinical concepts from EHRs related to orthopedic outpatient visits, hospitalizations, and surgeries in a multicenter, single-surgeon practice. The pipeline was also used to assess therapies and complications after total hip arthroplasty (THA). Methods: EHRs of 1290 patients undergoing primary THA from January 1, 2012, to December 31, 2019 (operated on and followed by the same surgeon) were processed using artificial intelligence (AI)-based models (NLP and machine learning). In addition, 3 independent medical reviewers generated a gold standard using 100 randomly selected EHRs. The algorithm processed the entire database from different EHR systems, generating an aggregated clinical data warehouse. An additional manual control arm was used for data quality control. Results: The algorithm was as accurate as human reviewers (0.95 vs 0.94; P=.01), achieving a database-wide average F1-score of 0.92 (SD 0.09; range 0.67-0.99), validating its use as an automated data extraction tool. During the first year after direct anterior THA, 92.1% (1188/1290) of our population had a complication-free recovery. In the 7.9% (102/1290) of cases where surgery or recovery was not uneventful, lateral femoral cutaneous nerve sensitivity (47/1290, 3.6%), intraoperative fractures (13/1290, 1%), and hematoma (9/1290, 0.7%) were the most common complications. Conclusions: Algorithm evaluation of this dataset accurately represented key clinical information swiftly compared with human reviewers. This technology may provide substantial value for future surgeon practice and patient counseling. Furthermore, the low early complication rate of direct anterior THA in this surgeon's hands was supported by the dataset, which included data from all treated patients in a multicenter practice.
UR - https://medinform.jmir.org/2025/1/e64705 UR - http://dx.doi.org/10.2196/64705 ID - info:doi/10.2196/64705 ER - TY - JOUR AU - Dekel, Dana AU - Marchant, Amanda AU - Del Pozo Banos, Marcos AU - Mhereeg, Mohamed AU - Lee, Chim Sze AU - John, Ann PY - 2025/3/12 TI - Exploring the Views of Young People, Including Those With a History of Self-Harm, on the Use of Their Routinely Generated Data for Mental Health Research: Web-Based Cross-Sectional Survey Study JO - JMIR Ment Health SP - e60649 VL - 12 KW - self-harm KW - mental health KW - big data KW - survey KW - youth N2 - Background: Secondary use of routinely collected health care data has great potential benefits in epidemiological studies, primarily due to the large scale of preexisting data. Objective: This study aimed to engage respondents with and without a history of self-harm, gain insight into their views on the use of their data for research, and determine whether there were any differences in opinions between the 2 groups. Methods: We examined young people's views on the use of their routinely collected data for mental health research through a web-based survey, evaluating any differences between those with and without a history of self-harm. Results: A total of 1765 respondents aged 16 to 24 years were included. Respondents' views were mostly positive toward the use and linkage of their data for research purposes for public benefit, particularly with regard to the use of health care data (mental health or otherwise), and generally echoed existing evidence on the opinions of older age groups. Individuals who reported a history of self-harm and subsequently contacted health services more often reported being "extremely likely" or "likely"
to share mental health data (contacted: 209/609, 34.3%; 95% CI 28.0-41.2; not contacted: 169/782, 21.6%; 95% CI 15.8-28.7) and physical health data (contacted: 117/609, 19.2%; 95% CI 12.7-27.8; not contacted: 96/782, 12.3%; 95% CI 6.7-20.9) compared with those who had not contacted services. Respondents were overall less likely to want to share their social media data, which they considered to be more personal compared to their health care data. Respondents stressed the importance of anonymity and the need for an appropriate ethical framework. Conclusions: Young people are aware, and they care about how their data are being used and for what purposes, irrespective of having a history of self-harm. They are largely positive about the use of health care data (mental or physical) for research and generally echo the opinions of older age groups, raising issues around data security and the use of data for the public interest. UR - https://mental.jmir.org/2025/1/e60649 UR - http://dx.doi.org/10.2196/60649 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60649 ER - TY - JOUR AU - Spotnitz, Matthew AU - Giannini, John AU - Ostchega, Yechiam AU - Goff, L. Stephanie AU - Anandan, Priya Lakshmi AU - Clark, Emily AU - Litwin, R. Tamara AU - Berman, Lew PY - 2025/3/11 TI - Assessing the Data Quality Dimensions of Partial and Complete Mastectomy Cohorts in the All of Us Research Program: Cross-Sectional Study JO - JMIR Cancer SP - e59298 VL - 11 KW - data quality KW - electronic health record KW - breast cancer KW - breast-conserving surgery KW - total mastectomy KW - modified radical mastectomy KW - public health informatics KW - cohort KW - assessment KW - women KW - United States KW - American KW - nonmetastatic disease KW - treatment KW - breast cancer surgery KW - real-world evidence KW - data KW - mastectomy KW - female KW - data quality framework KW - therapy N2 - Background: Breast cancer is prevalent among females in the United States.
Nonmetastatic disease is treated by partial or complete mastectomy procedures. However, the rates of those procedures vary across practices. Generating real-world evidence on breast cancer surgery could lead to improved and consistent practices. We investigated the quality of data from the All of Us Research Program, which is a precision medicine initiative that collected real-world electronic health care data from different sites in the United States both retrospectively and prospectively relative to participant enrollment. Objective: This paper aims to determine whether All of Us data are fit for use in generating real-world evidence on mastectomy procedures. Methods: Our mastectomy phenotype consisted of adult female participants who had CPT4 (Current Procedural Terminology 4), ICD-9 (International Classification of Diseases, Ninth Revision) procedure, or SNOMED (Systematized Nomenclature of Medicine) codes for a partial or complete mastectomy procedure that mapped to Observational Medical Outcomes Partnership Common Data Model concepts. We evaluated the phenotype with a data quality dimensions (DQD) framework that consisted of 5 elements: conformance, completeness, concordance, plausibility, and temporality. Also, we applied a previously developed DQD checklist to evaluate concept selection, internal verification, and external validation for each dimension. We compared the DQD of our cohort to a control group of adult women who did not have a mastectomy procedure. Our subgroup analysis compared partial to complete mastectomy procedure phenotypes. Results: There were 4175 female participants aged 18 years or older in the partial or complete mastectomy cohort, and 168,226 participants in the control cohort. The geospatial distribution of our cohort varied across states. For example, our cohort consisted of 835 (20%) participants from Massachusetts, but multiple other states contributed fewer than 20 participants.
We compared the sociodemographic characteristics of the partial (n=2607) and complete (n=1568) mastectomy subgroups. Those groups differed in the distribution of age at procedure (P<.001), education (P=.02), and income (P=.03) levels, as per chi-square analysis. A total of 367 (9.9%) participants in our cohort had overlapping CPT4 and SNOMED codes for a mastectomy, and 63 (1.5%) had overlapping ICD-9 procedure and SNOMED codes. The prevalence of breast cancer-related concepts was higher in our cohort compared to the control group (P<.001). In both the partial and complete mastectomy subgroups, the correlations among concepts were consistent with the clinical management of breast cancer. The median time between biopsy and mastectomy was 5.5 (IQR 3.5-11.2) weeks. Although we did not have external benchmark comparisons, we were able to evaluate concept selection and internal verification for all domains. Conclusions: Our data quality framework was implemented successfully on a mastectomy phenotype. Our systematic approach identified data missingness. Moreover, the framework allowed us to differentiate breast-conserving therapy and complete mastectomy subgroups in the All of Us data. UR - https://cancer.jmir.org/2025/1/e59298 UR - http://dx.doi.org/10.2196/59298 ID - info:doi/10.2196/59298 ER - TY - JOUR AU - Malik, Salma AU - Dorothea, Pana Zoi AU - Argyropoulos, D. Christos AU - Themistocleous, Sophia AU - Macken, J. Alan AU - Valdenmaiier, Olena AU - Scheckenbach, Frank AU - Bardach, Elena AU - Pfeiffer, Andrea AU - Loens, Katherine AU - Ochando, Cano Jordi AU - Cornely, A.
Oliver AU - Demotes-Mainard, Jacques AU - Contrino, Sergio AU - Felder, Gerd PY - 2025/3/7 TI - Data Interoperability in COVID-19 Vaccine Trials: Methodological Approach in the VACCELERATE Project JO - JMIR Med Inform SP - e65590 VL - 13 KW - interoperability KW - metadata KW - data management KW - clinical trials KW - protocol KW - harmonization KW - adult KW - pediatric KW - systems KW - standards N2 - Background: Data standards are not only key to making data processing efficient but also fundamental to ensuring data interoperability. When clinical trial data are structured according to international standards, they become significantly easier to analyze, reducing the efforts required for data cleaning, preprocessing, and secondary use. A common language and a shared set of expectations facilitate interoperability between systems and devices. Objective: The main objectives of this study were to identify commonalities and differences in clinical trial metadata, protocols, and data collection systems/items within the VACCELERATE project. Methods: To assess the degree of interoperability achieved in the project and suggest methodological improvements, interoperable points were identified based on the core outcome areas: immunogenicity, safety, and efficacy (clinical/physiological). These points were emphasized in the development of the master protocol template and were manually compared in the following ways: (1) summaries, objectives, and end points in the protocols of 3 VACCELERATE clinical trials (EU-COVAT-1_AGED, EU-COVAT-2_BOOSTAVAC, and EU-COVPT-1_CoVacc) against the master protocol template; (2) metadata of all 3 clinical trials; and (3) evaluations from a questionnaire survey regarding differences in data management systems and structures that enabled data exchange within the VACCELERATE network.
Results: The noncommonalities identified in the protocols and metadata were attributed to differences in populations, variations in protocol design, and vaccination patterns. The detailed metadata released for all 3 vaccine trials were clearly structured using internal standards, terminology, and the general approach of Clinical Data Acquisition Standards Harmonisation (CDASH) for data collection (eg, on electronic case report forms). VACCELERATE benefited significantly from the selection of the Clinical Trials Centre Cologne as the sole data management provider. With system database development coordinated by a single individual and no need for coordination among different trial units, a high degree of uniformity was achieved automatically. The harmonized transfer of data to all sites, using well-established methods, enabled quick exchanges and provided a relatively secure means of data transfer. Conclusions: This study demonstrated that using master protocols can significantly enhance trial operational efficiency and data interoperability, provided that similar infrastructure and data management procedures are adopted across multiple trials. To further improve data interoperability and facilitate interpretation and analysis, shared data should be structured, described, formatted, and stored using widely recognized data and metadata standards. 
Trial Registration: EudraCT 2021-004526-29; https://www.clinicaltrialsregister.eu/ctr-search/trial/2021-004526-29/DE/; 2021-004889-35; https://www.clinicaltrialsregister.eu/ctr-search/search?query=eudract_number:2021-004889-35; and 2021-004526-29; https://www.clinicaltrialsregister.eu/ctr-search/search?query=eudract_number:2021-004526-29 UR - https://medinform.jmir.org/2025/1/e65590 UR - http://dx.doi.org/10.2196/65590 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65590 ER - TY - JOUR AU - El Kababji, Samer AU - Mitsakakis, Nicholas AU - Jonker, Elizabeth AU - Beltran-Bless, Ana-Alicia AU - Pond, Gregory AU - Vandermeer, Lisa AU - Radhakrishnan, Dhenuka AU - Mosquera, Lucy AU - Paterson, Alexander AU - Shepherd, Lois AU - Chen, Bingshu AU - Barlow, William AU - Gralow, Julie AU - Savard, Marie-France AU - Fesl, Christian AU - Hlauschek, Dominik AU - Balic, Marija AU - Rinnerthaler, Gabriel AU - Greil, Richard AU - Gnant, Michael AU - Clemons, Mark AU - El Emam, Khaled PY - 2025/3/5 TI - Augmenting Insufficiently Accruing Oncology Clinical Trials Using Generative Models: Validation Study JO - J Med Internet Res SP - e66821 VL - 27 KW - generative models KW - study accrual KW - recruitment KW - clinical trial replication KW - oncology KW - validation KW - simulated patient KW - simulation KW - retrospective KW - dataset KW - patient KW - artificial intelligence KW - machine learning N2 - Background: Insufficient patient accrual is a major challenge in clinical trials and can result in underpowered studies, as well as exposing study participants to toxicity and additional costs, with limited scientific benefit. Real-world data can provide external controls, but insufficient accrual affects all arms of a study, not just controls. Studies that used generative models to simulate more patients were limited in the accrual scenarios considered, replicability criteria, number of generative models, and number of clinical trials evaluated. 
Objective: This study aimed to perform a comprehensive evaluation of the extent to which generative models can be used to simulate additional patients to compensate for insufficient accrual in clinical trials. Methods: We performed a retrospective analysis using 10 datasets from 9 fully accrued, completed, and published cancer trials. For each trial, we removed the latest recruited patients (from 10% to 50%), trained a generative model on the remaining patients, and simulated additional patients to replace the removed ones using the generative model to augment the available data. We then replicated the published analysis on this augmented dataset to determine if the findings remained the same. Four different generative models were evaluated: sequential synthesis with decision trees, Bayesian network, generative adversarial network, and a variational autoencoder. These generative models were compared to sampling with replacement (ie, bootstrap) as a simple alternative. Replication of the published analyses used 4 metrics: decision agreement, estimate agreement, standardized difference, and CI overlap. Results: Sequential synthesis performed well on the 4 replication metrics for the removal of up to 40% of the last recruited patients (decision agreement: 88% to 100% across datasets, estimate agreement: 100%, cannot reject standardized difference null hypothesis: 100%, and CI overlap: 0.8-0.92). Sampling with replacement was the next most effective approach, with decision agreement varying from 78% to 89% across all datasets. There was no evidence of a monotonic relationship in the estimated effect size with recruitment order across these studies. This suggests that patients recruited earlier in a trial were not systematically different from those recruited later, at least partially explaining why generative models trained on early data can effectively simulate patients recruited later in a trial.
The fidelity of the generated data relative to the training data, as measured by the Hellinger distance, was high in all cases. Conclusions: For an oncology study with insufficient accrual with as few as 60% of target recruitment, sequential synthesis can enable the simulation of the full dataset had the study continued accruing patients and can be an alternative to drawing conclusions from an underpowered study. These results provide evidence demonstrating the potential for generative models to rescue poorly accruing clinical trials, but additional studies are needed to confirm these findings and to generalize them for other diseases. UR - https://www.jmir.org/2025/1/e66821 UR - http://dx.doi.org/10.2196/66821 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053790 ID - info:doi/10.2196/66821 ER - TY - JOUR AU - Mandel, L. Hannah AU - Shah, N. Shruti AU - Bailey, Charles L. AU - Carton, Thomas AU - Chen, Yu AU - Esquenazi-Karonika, Shari AU - Haendel, Melissa AU - Hornig, Mady AU - Kaushal, Rainu AU - Oliveira, R. Carlos AU - Perlowski, A. Alice AU - Pfaff, Emily AU - Rao, Suchitra AU - Razzaghi, Hanieh AU - Seibert, Elle AU - Thomas, L. Gelise AU - Weiner, G. Mark AU - Thorpe, E.
Lorna AU - Divers, Jasmin PY - 2025/3/5 TI - Opportunities and Challenges in Using Electronic Health Record Systems to Study Postacute Sequelae of SARS-CoV-2 Infection: Insights From the NIH RECOVER Initiative JO - J Med Internet Res SP - e59217 VL - 27 KW - COVID-19 KW - SARS-CoV-2 KW - Long COVID KW - post-acute COVID-19 syndrome KW - electronic health records KW - machine learning KW - public health surveillance KW - post-infection syndrome KW - medical informatics KW - electronic medical record KW - electronic health record network KW - electronic health record data KW - clinical research network KW - clinical data research network KW - common data model KW - digital health KW - infection KW - respiratory KW - infectious KW - epidemiological KW - pandemic UR - https://www.jmir.org/2025/1/e59217 UR - http://dx.doi.org/10.2196/59217 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053748 ID - info:doi/10.2196/59217 ER - TY - JOUR AU - Ohno, Yukiko AU - Aomori, Tohru AU - Nishiyama, Tomohiro AU - Kato, Riri AU - Fujiki, Reina AU - Ishikawa, Haruki AU - Kiyomiya, Keisuke AU - Isawa, Minae AU - Mochizuki, Mayumi AU - Aramaki, Eiji AU - Ohtani, Hisakazu PY - 2025/3/4 TI - Performance Improvement of a Natural Language Processing Tool for Extracting Patient Narratives Related to Medical States From Japanese Pharmaceutical Care Records by Increasing the Amount of Training Data: Natural Language Processing Analysis and Validation Study JO - JMIR Med Inform SP - e68863 VL - 13 KW - natural language processing KW - NLP KW - named entity recognition KW - NER KW - deep learning KW - pharmaceutical care record KW - electronic medical record KW - EMR KW - Japanese N2 - Background: Patients' oral expressions serve as valuable sources of clinical information to improve pharmacotherapy. Natural language processing (NLP) is a useful approach for analyzing unstructured text data, such as patient narratives. 
However, few studies have focused on using NLP for narratives in the Japanese language. Objective: We aimed to develop a high-performance NLP system for extracting clinical information from patient narratives by examining the performance progression with a gradual increase in the amount of training data. Methods: We used subjective texts from the pharmaceutical care records of Keio University Hospital from April 1, 2018, to March 31, 2019, comprising 12,004 records from 6559 cases. After preprocessing, we annotated diseases and symptoms within the texts. We then trained and evaluated a deep learning model (bidirectional encoder representations from transformers combined with a conditional random field [BERT-CRF]) through 10-fold cross-validation. The annotated data were divided into 10 subsets, and the amount of training data was progressively increased over 10 steps. We also analyzed the causes of errors. Finally, we applied the developed system to the analysis of case report texts to evaluate its usability for texts from other sources. Results: The F1-score of the system improved from 0.67 to 0.82 as the amount of training data increased from 1200 to 12,004 records. The F1-score reached 0.78 with 3600 records and was largely similar thereafter. As performance improved, errors from incorrect extractions decreased significantly, which resulted in an increase in precision. For case reports, the F1-score also increased from 0.34 to 0.41 as the training dataset expanded from 1200 to 12,004 records. Performance was lower for extracting symptoms from case report texts compared with pharmaceutical care records, suggesting that this system is more specialized for analyzing subjective data from pharmaceutical care records. Conclusions: We successfully developed a high-performance system specialized in analyzing subjective data from pharmaceutical care records by training on a large dataset, with near-complete saturation of system performance at about 3600 training records. 
This system will be useful for monitoring symptoms, offering benefits for both clinical practice and research. UR - https://medinform.jmir.org/2025/1/e68863 UR - http://dx.doi.org/10.2196/68863 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053805 ID - info:doi/10.2196/68863 ER - TY - JOUR AU - Ohlsen, Tessa AU - Hofer, Viola AU - Ingenerf, Josef PY - 2025/2/28 TI - A Validation Tool (VaPCE) for Postcoordinated SNOMED CT Expressions: Development and Usability Study JO - JMIR Med Inform SP - e67984 VL - 13 KW - SNOMED CT KW - PCE KW - postcoordination KW - FHIR KW - validation KW - postcoordinated expression KW - Fast Healthcare Interoperability Resources N2 - Background: The digitalization of health care has increased the demand for efficient data exchange, emphasizing semantic interoperability. SNOMED Clinical Terms (SNOMED CT), a comprehensive terminology with over 360,000 medical concepts, supports this need. However, it cannot cover all medical scenarios, particularly in complex cases. To address this, SNOMED CT allows postcoordination, where users combine precoordinated concepts with new expressions. Despite SNOMED CT's potential, the creation and validation of postcoordinated expressions (PCEs) remain challenging due to complex syntactic and semantic rules. Objective: This work aims to develop a tool that validates postcoordinated SNOMED CT expressions, focusing on providing users with detailed, automated correction instructions for syntactic and semantic errors. The goal is not just validation, but also offering user-friendly, actionable suggestions for improving PCEs. Methods: A tool was created using the Fast Healthcare Interoperability Resources (FHIR) $validate-code operation and the terminology server Ontoserver to check the correctness of PCEs. When errors are detected, the tool processes the SNOMED CT Concept Model in JSON format and applies predefined error categories. For each error type, specific correction suggestions are generated and displayed to users. 
The tool was integrated into a web application, where users can validate individual PCEs or bulk-upload files. The tool was tested with real existing PCEs, which were used as input and validated. In the event of errors, appropriate error messages were generated as output. Results: In the validation of 136 PCEs from 304 FHIR Questionnaires, 18 (13.2%) PCEs were invalid, with the most common errors being invalid attribute values. Additionally, 868 OncoTree codes were evaluated, resulting in 161 (20.9%) PCEs containing inactive concepts, which were successfully replaced with valid alternatives. A user survey reflects a favorable evaluation of the tool's functionality. Participants found the error categorization and correction suggestions to be precise, offering clear guidance for addressing issues. However, there is potential for enhancement, particularly regarding the level of detail in the error messages. Conclusions: The validation tool significantly improves the accuracy of postcoordinated SNOMED CT expressions by not only identifying errors but also offering detailed correction instructions. This approach supports health care professionals in ensuring that their PCEs are syntactically and semantically valid, enhancing data quality and interoperability across systems. UR - https://medinform.jmir.org/2025/1/e67984 UR - http://dx.doi.org/10.2196/67984 ID - info:doi/10.2196/67984 ER - TY - JOUR AU - Ram, Sharan AU - Corbin, Marine AU - 't Mannetje, Andrea AU - Eng, Amanda AU - Kvalsvig, Amanda AU - Baker, G. 
Michael AU - Douwes, Jeroen PY - 2025/2/28 TI - Antibiotic Use In Utero and Early Life and Risk of Chronic Childhood Conditions in New Zealand: Protocol for a Data Linkage Retrospective Cohort Study JO - JMIR Res Protoc SP - e66184 VL - 14 KW - early childhood KW - chronic childhood conditions KW - antibiotics KW - data linkage KW - study protocol KW - routine data N2 - Background: The incidence of many common chronic childhood conditions has increased globally in the past few decades, which may be attributable to antibiotic overuse leading to dysbiosis in the gut microbiome. Objective: This linkage study will assess the role of antibiotic use in utero and in early life in the development of type 1 diabetes (T1D), attention-deficit/hyperactivity disorder (ADHD), and inflammatory bowel disease. Methods: The study design involves several retrospective cohort studies using linked administrative health and social data from Statistics New Zealand's Integrated Data Infrastructure. It uses data from all children who were born in New Zealand between October 2005 and December 2010 (N=334,204) and their mothers. Children's antibiotic use is identified for 4 time periods (during pregnancy, at ≤1 year, at ≤2 years, and at ≤5 years), and the development of T1D, ADHD, and inflammatory bowel disease is measured from the end of the antibiotic use periods until death, emigration, or the end of the follow-up period (2021), whichever came first. Children who emigrated or died before the end of the antibiotic use period are excluded. Cox proportional hazards regression models are used while adjusting for a range of potential confounders. Results: As of September 2024, data linkage has been completed, involving the integration of antibiotic exposure and outcome variables for 315,789 children. Preliminary analyses show that both prenatal and early life antibiotic consumption is associated with T1D. 
Full analyses for all 3 outcomes will be completed by the end of 2025. Conclusions: This series of linked cohort studies using detailed, complete, and systematically collected antibiotic prescription data will provide critical new knowledge regarding the role of antibiotics in the development of common chronic childhood conditions. Thus, this study has the potential to contribute to the development of primary prevention strategies through, for example, targeted changes in antibiotic use. International Registered Report Identifier (IRRID): DERR1-10.2196/66184 UR - https://www.researchprotocols.org/2025/1/e66184 UR - http://dx.doi.org/10.2196/66184 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053783 ID - info:doi/10.2196/66184 ER - TY - JOUR AU - Tang, Wen-Zhen AU - Zhu, Sheng-Rui AU - Mo, Shu-Tian AU - Xie, Yuan-Xi AU - Tan, Zheng-Ke-Ke AU - Teng, Yan-Juan AU - Jia, Kui PY - 2025/2/27 TI - Predictive Value of Frailty on Outcomes of Patients With Cirrhosis: Systematic Review and Meta-Analysis JO - JMIR Med Inform SP - e60683 VL - 13 KW - frailty KW - cirrhosis KW - diagnostic efficiency KW - survival KW - meta-analysis KW - prognostic factor KW - systematic review N2 - Background: Frailty is one of the most common symptoms in patients with cirrhosis. Many researchers have identified it as a prognostic factor for patients with cirrhosis. However, no quantitative meta-analysis has evaluated the prognostic value of frailty in patients with cirrhosis. Objective: This systematic review and meta-analysis aimed to assess the prognostic significance of frailty in patients with cirrhosis. Methods: The systematic review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) recommendations. 
We conducted a comprehensive search of the literature using databases such as PubMed, Cochrane Library, Embase, and Web of Science, as well as China National Knowledge Infrastructure, encompassing the period from inception to 22 December 2023. Data were extracted for frailty to predict adverse outcomes in patients with cirrhosis. RevMan (version 5.3) and R (version 4.2.2) were used to assess the extracted data. Results: A total of 26 studies with 9597 patients with cirrhosis were included. Compared with patients having low or no frailty, the frail group had a higher mortality rate (risk ratio [RR]=2.07, 95% CI 1.82-2.34, P<.001), higher readmission rate (RR=1.50, 95% CI 1.22-1.84, P<.001), and lower quality of life (RR=5.78, 95% CI 2.25-14.82, P<.001). The summary receiver operating characteristic (SROC) curve of frailty for mortality in patients with cirrhosis showed that the false positive rate (FPR) was 0.25 (95% CI 0.17-0.34), diagnostic odds ratio (DOR) was 4.17 (95% CI 2.93-5.93), sensitivity was 0.54 (95% CI 0.39-0.69), and specificity was 0.73 (95% CI 0.64-0.81). The SROC curve of readmission showed that the FPR, DOR, sensitivity, and specificity were 0.39 (95% CI 0.17-0.66), 1.38 (95% CI 0.64-2.93), 0.46 (95% CI 0.28-0.64), and 0.60 (95% CI 0.28-0.85), respectively. Conclusions: This meta-analysis demonstrated that frailty is a reliable prognostic predictor of outcomes in patients with cirrhosis. To enhance the prognosis of patients with cirrhosis, more studies on frailty screening are required. 
Trial Registration: PROSPERO CRD42024497698; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=497698 UR - https://medinform.jmir.org/2025/1/e60683 UR - http://dx.doi.org/10.2196/60683 ID - info:doi/10.2196/60683 ER - TY - JOUR AU - Selcuk, Yesim AU - Kim, Eunhui AU - Ahn, Insung PY - 2025/2/10 TI - InfectA-Chat, an Arabic Large Language Model for Infectious Diseases: Comparative Analysis JO - JMIR Med Inform SP - e63881 VL - 13 KW - large language model KW - Arabic large language models KW - AceGPT KW - multilingual large language model KW - infectious disease monitoring KW - public health N2 - Background: Infectious diseases have consistently been a significant concern in public health, requiring proactive measures to safeguard societal well-being. In this regard, regular monitoring activities play a crucial role in mitigating the adverse effects of diseases on society. To monitor disease trends, various organizations, such as the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC), collect diverse surveillance data and make them publicly accessible. However, these platforms primarily present surveillance data in English, which creates language barriers for non?English-speaking individuals and global public health efforts to accurately observe disease trends. This challenge is particularly noticeable in regions such as the Middle East, where specific infectious diseases, such as Middle East respiratory syndrome coronavirus (MERS-CoV), have seen a dramatic increase. For such regions, it is essential to develop tools that can overcome language barriers and reach more individuals to alleviate the negative impacts of these diseases. Objective: This study aims to address these issues; therefore, we propose InfectA-Chat, a cutting-edge large language model (LLM) specifically designed for the Arabic language but also incorporating English for question and answer (Q&A) tasks. 
InfectA-Chat leverages its deep understanding of the language to provide users with information on the latest trends in infectious diseases based on their queries. Methods: This comprehensive study was achieved by instruction tuning the AceGPT-7B and AceGPT-7B-Chat models on a Q&A task, using a dataset of 55,400 Arabic and English domain-specific instruction-following data. The performance of these fine-tuned models was evaluated using 2770 domain-specific Arabic and English instruction-following data, using the GPT-4 evaluation method. A comparative analysis was then performed against Arabic LLMs and state-of-the-art models, including AceGPT-13B-Chat, Jais-13B-Chat, Gemini, GPT-3.5, and GPT-4. Furthermore, to ensure the model had access to the latest information on infectious diseases by regularly updating the data without additional fine-tuning, we used the retrieval-augmented generation (RAG) method. Results: InfectA-Chat demonstrated good performance in answering questions about infectious diseases by the GPT-4 evaluation method. Our comparative analysis revealed that it outperforms the AceGPT-7B-Chat and InfectA-Chat (based on AceGPT-7B) models by a margin of 43.52%. It also surpassed other Arabic LLMs such as AceGPT-13B-Chat and Jais-13B-Chat by 48.61%. Among the state-of-the-art models, InfectA-Chat achieved a leading performance of 23.78%, competing closely with the GPT-4 model. Furthermore, the RAG method in InfectA-Chat significantly improved document retrieval accuracy. Notably, RAG retrieved more accurate documents based on queries when the top-k parameter value was increased. Conclusions: Our findings highlight the shortcomings of general Arabic LLMs in providing up-to-date information about infectious diseases. With this study, we aim to empower individuals and public health efforts by offering a bilingual Q&A system for infectious disease monitoring. 
UR - https://medinform.jmir.org/2025/1/e63881 UR - http://dx.doi.org/10.2196/63881 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63881 ER - TY - JOUR AU - Studer, Joseph AU - Cunningham, A. John AU - Schmutz, Elodie AU - Gaume, Jacques AU - Adam, Angéline AU - Daeppen, Jean-Bernard AU - Bertholet, Nicolas PY - 2025/2/6 TI - Smartphone-Based Intervention Targeting Norms and Risk Perception Among University Students with Unhealthy Alcohol Use: Secondary Mediation Analysis of a Randomized Controlled Trial JO - J Med Internet Res SP - e55541 VL - 27 KW - brief intervention KW - alcohol use KW - mechanism of action KW - mediation analysis KW - personalized feedback KW - smartphone app KW - students KW - Switzerland KW - mobile phone KW - mediation KW - feedback KW - student KW - health risk KW - drinking KW - drinker KW - support KW - feedback intervention N2 - Background: Many digital interventions for unhealthy alcohol use are based on personalized normative feedback (PNF) and personalized feedback on risks for health (PFR). The hypothesis is that PNF and PFR affect drinkers' perceptions of drinking norms and risks, resulting in changes in drinking behaviors. This study is a follow-up mediation analysis of the primary and secondary outcomes of a randomized controlled trial testing the effect of a smartphone-based intervention to reduce alcohol use. Objective: This study aimed to investigate whether perceptions of drinking norms and risks mediated the effects of a smartphone-based intervention to reduce alcohol use. Methods: A total of 1770 students from 4 higher education institutions in Switzerland (mean age 22.35, SD 3.07 years) who screened positive for unhealthy alcohol use were randomized to receive access to a smartphone app or to the no-intervention control condition. The smartphone app provided PNF and PFR. 
Outcomes were drinking volume (DV) in standard drinks per week and the number of heavy drinking days (HDDs) assessed at baseline and 6 months. Mediators were perceived drinking norms and perceived risks for health measured at baseline and 3 months. Parallel mediation analyses and moderated mediation analyses were conducted to test whether (1) the intervention effect was indirectly related to lower DV and HDDs at 6 months (adjusting for baseline values) through perceived drinking norms and perceived risks for health at 3 months (adjusting for baseline values) and (2) the indirect effects through perceived drinking norms differed between participants who overestimated or who did not overestimate other people's drinking at baseline. Results: The intervention's total effects were significant (DV: b=-0.85, 95% bootstrap CI -1.49 to -0.25; HDD: b=-0.44, 95% bootstrap CI -0.72 to -0.16), indicating less drinking at 6 months in the intervention group than in the control group. The direct effects (ie, controlling for mediators) were significant though smaller (DV: b=-0.73, 95% bootstrap CI -1.33 to -0.16; HDD: b=-0.39, 95% bootstrap CI -0.66 to -0.12). For DV, the indirect effect was significant through perceived drinking norms (b=-0.12, 95% bootstrap CI -0.25 to -0.03). The indirect effects through perceived risk (for DV and HDD) and perceived drinking norms (for HDD) were not significant. Results of moderated mediation analyses showed that the indirect effects through perceived drinking norms were significant among participants overestimating other people's drinking (DV: b=-0.17, 95% bootstrap CI -0.32 to -0.05; HDD: b=-0.08, 95% bootstrap CI -0.15 to -0.01) but not significant among those not overestimating. Conclusions: Perceived drinking norms, but not perceived risks, partially mediated the intervention's effect on alcohol use, confirming one of its hypothesized mechanisms of action. 
These findings lend support to using normative feedback interventions to discourage unhealthy alcohol use. Trial Registration: ISRCTN Registry 10007691; https://doi.org/10.1186/ISRCTN10007691 UR - https://www.jmir.org/2025/1/e55541 UR - http://dx.doi.org/10.2196/55541 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55541 ER - TY - JOUR AU - Mitsutake, Seigo AU - Ishizaki, Tatsuro AU - Yano, Shohei AU - Hirata, Takumi AU - Ito, Kae AU - Furuta, Ko AU - Shimazaki, Yoshitomo AU - Ito, Hideki AU - Mudge, Alison AU - Toba, Kenji PY - 2025/2/6 TI - Predictive Validity of Hospital-Associated Complications of Older People Identified Using Diagnosis Procedure Combination Data From an Acute Care Hospital in Japan: Observational Study JO - JMIR Aging SP - e68267 VL - 8 KW - delirium KW - functional decline KW - Japan KW - older adult KW - routinely collected health data KW - elder KW - hospital complication KW - HAC-OP KW - incontinence KW - pressure injury KW - inpatient care KW - diagnosis procedure combination KW - predictive validity KW - hospital length of stay KW - administrative data KW - acute care KW - index hospitalization KW - diagnostic code KW - linear regression KW - logistic regression KW - long-term care KW - retrospective cohort KW - observational study KW - patient care KW - gerontology KW - hospital care KW - patient complication N2 - Background: A composite outcome of hospital-associated complications of older people (HAC-OP; comprising functional decline, delirium, incontinence, falls, and pressure injuries) has been proposed as an outcome measure reflecting quality of acute hospital care. Estimating HAC-OP from routinely collected administrative data could facilitate the rapid and standardized evaluation of interventions in the clinical setting, thereby supporting the development, improvement, and wider implementation of effective interventions. 
Objective: This study aimed to create a Diagnosis Procedure Combination (DPC) data version of the HAC-OP measure (HAC-OP-DPC) and demonstrate its predictive validity by assessing its associations with hospital length of stay (LOS) and discharge destination. Methods: This retrospective cohort study acquired DPC data (routinely collected administrative data) from a general acute care hospital in Tokyo, Japan. We included data from index hospitalizations for patients aged ≥65 years hospitalized for ≥3 days and discharged between July 2016 and March 2021. HAC-OP-DPC were identified using diagnostic codes for functional decline, incontinence, delirium, pressure injury, and falls occurring during the index hospitalization. Generalized linear regression models were used to examine the associations between HAC-OP-DPC and LOS, and logistic regression models were used to examine the associations between HAC-OP-DPC and discharge to other hospitals and long-term care facilities (LTCFs). Results: Among 15,278 patients, 3610 (23.6%) patients had coding evidence of one or more HAC-OP-DPC (1: 18.8% and ≥2: 4.8%). Using "no HAC-OP-DPC" as the reference category, the analysis showed a significant and graded association with longer LOS (adjusted risk ratio for patients with one HAC-OP-DPC 1.29, 95% CI 1.25-1.33; adjusted risk ratio for ≥2 HAC-OP-DPC 1.97, 95% CI 1.87-2.08), discharge to another hospital (adjusted odds ratio [AOR] for one HAC-OP-DPC 2.36, 95% CI 2.10-2.65; AOR for ≥2 HAC-OP-DPC 6.96, 95% CI 5.81-8.35), and discharge to LTCFs (AOR for one HAC-OP-DPC 1.35, 95% CI 1.09-1.67; AOR for ≥2 HAC-OP-DPC 1.68, 95% CI 1.18-2.39). Each individual HAC-OP was also significantly associated with longer LOS and discharge to another hospital, but only delirium was associated with discharge to LTCF. Conclusions: This study demonstrated the predictive validity of the HAC-OP-DPC measure for longer LOS and discharge to other hospitals and LTCFs. 
To attain a more robust understanding of these relationships, additional studies are needed to verify our findings in other hospitals and regions. The clinical implementation of HAC-OP-DPC, which is identified using routinely collected administrative data, could support the evaluation of integrated interventions aimed at optimizing inpatient care for older adults. UR - https://aging.jmir.org/2025/1/e68267 UR - http://dx.doi.org/10.2196/68267 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/68267 ER - TY - JOUR AU - Santiago-Torres, Margarita AU - Mull, E. Kristin AU - Sullivan, M. Brianna AU - Cupertino, Paula Ana AU - Salloum, G. Ramzi AU - Triplette, Matthew AU - Zvolensky, J. Michael AU - Bricker, B. Jonathan PY - 2025/1/31 TI - Evaluating the Impact of Pharmacotherapy in Augmenting Quit Rates Among Hispanic Adults in an App-Delivered Smoking Cessation Intervention: Secondary Analysis of a Randomized Controlled Trial JO - JMIR Form Res SP - e69311 VL - 9 KW - acceptance and commitment therapy KW - Hispanic or Latino KW - iCanQuit KW - QuitGuide KW - smartphone apps KW - smoking cessation KW - mobile phone N2 - Background: Hispanic adults receive less advice to quit smoking and use fewer evidence-based smoking cessation treatments compared to their non-Hispanic counterparts. Digital smoking cessation interventions, such as those delivered via smartphone apps, provide a feasible and within-reach treatment option for Hispanic adults who smoke and want to quit smoking. While the combination of pharmacotherapy and behavioral interventions is considered best practice for smoking cessation, its efficacy among Hispanic adults, especially alongside smartphone app-based interventions, is uncertain. 
Objective: This secondary analysis used data from a randomized controlled trial that compared the efficacy of 2 smoking cessation apps, iCanQuit (based on acceptance and commitment therapy) and QuitGuide (following US clinical practice guidelines), to explore the association between pharmacotherapy use and smoking cessation outcomes among the subsample of 173 Hispanic participants who reported on pharmacotherapy use. Given the randomized design, we first tested the potential interaction of pharmacotherapy use and intervention arm on 12-month cigarette smoking abstinence. We then examined whether the use of any pharmacotherapy (ie, nicotine replacement therapy [NRT], varenicline, or bupropion) and NRT alone augmented each app-based intervention efficacy. Methods: Participants reported using pharmacotherapy on their own during the 3-month follow-up and cigarette smoking abstinence at the 12-month follow-up via web-based surveys. These data were used (1) to test the interaction effect of using pharmacotherapy to aid smoking cessation and intervention arm (iCanQuit vs QuitGuide) on smoking cessation at 12 months and (2) to test whether the use of pharmacotherapy to aid smoking cessation augmented the efficacy of each intervention arm to help participants successfully quit smoking. Results: The subsample of Hispanic participants was recruited from 30 US states. They were on average 34.5 (SD 9.3) years of age, 50.9% (88/173) were female, and 56.1% (97/173) reported smoking at least 10 cigarettes daily. Approximately 22% (38/173) of participants reported using pharmacotherapy to aid smoking cessation at the 3-month follow-up, including NRT, varenicline, or bupropion, with no difference between intervention arms. There was an interaction between pharmacotherapy use and intervention arm that marginally influenced quit rates at 12 months (P for interaction=.053). 
In the iCanQuit arm, 12-month missing-as-smoking quit rates were 43.8% (7/16) for pharmacotherapy users versus 28.8% (19/66) for nonusers (odds ratio 2.21, 95% CI 0.66-7.48; P=.20). In the QuitGuide arm, quit rates were 9.1% (2/22) for pharmacotherapy users versus 21.7% (15/69) for nonusers (odds ratio 0.36, 95% CI 0.07-1.72; P=.20). Results were similar for the use of NRT only. Conclusions: Combining pharmacotherapy to aid smoking cessation with a smartphone app-based behavioral intervention that teaches acceptance of cravings to smoke (iCanQuit) shows promise in improving quit rates among Hispanic adults. However, this combined approach was not effective with the US clinical guideline-based app (QuitGuide). Trial Registration: ClinicalTrials.gov NCT02724462; https://clinicaltrials.gov/study/NCT02724462 International Registered Report Identifier (IRRID): RR2-10.1001/jamainternmed.2020.4055 UR - https://formative.jmir.org/2025/1/e69311 UR - http://dx.doi.org/10.2196/69311 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/69311 ER - TY - JOUR AU - Reuther, Christina AU - von Essen, Louise AU - Mustafa, Imran Mudassir AU - Saarijärvi, Markus AU - Woodford, Joanne PY - 2025/1/28 TI - Engagement With an Internet-Administered, Guided, Low-Intensity Cognitive Behavioral Therapy Intervention for Parents of Children Treated for Cancer: Analysis of Log-Data From the ENGAGE Feasibility Trial JO - JMIR Form Res SP - e67171 VL - 9 KW - childhood cancer survivor KW - cognitive behavioral therapy KW - engagement KW - internet-administered intervention KW - log-data KW - parents N2 - Background: Parents of children treated for cancer may experience psychological difficulties including depression, anxiety, and posttraumatic stress. Digital interventions, such as internet-administered cognitive behavioral therapy, offer an accessible and flexible means to support parents. 
However, engagement with and adherence to digital interventions remain a significant challenge, potentially limiting efficacy. Understanding factors influencing user engagement and adherence is crucial for enhancing the acceptability, feasibility, and efficacy of these interventions. We developed an internet-administered, guided, low-intensity cognitive behavioral therapy (LICBT)-based self-help intervention for parents of children treated for cancer (EJDeR [internetbaserad självhjälp för föräldrar till barn som avslutat en behandling mot cancer, or internet-based self-help for parents of children who have completed cancer treatment]). EJDeR included 2 LICBT techniques: behavioral activation and worry management. Subsequently, we conducted the ENGAGE feasibility trial and EJDeR was found to be acceptable and feasible. However, intervention adherence rates were marginally below progression criteria. Objective: This study aimed to (1) describe user engagement with the EJDeR intervention and examine whether (2) sociodemographic characteristics differed between adherers and nonadherers, (3) depression and anxiety scores differed between adherers and nonadherers at baseline, (4) user engagement differed between adherers and nonadherers, and (5) user engagement differed between fathers and mothers. Methods: We performed a secondary analysis of ENGAGE data, including 71 participants. User engagement data were collected through log-data tracking, for example, communication with e-therapists, homework submissions, log-ins, minutes working with EJDeR, and modules completed. Chi-square tests examined differences between adherers and nonadherers and fathers and mothers concerning categorical data. Independent-samples t tests examined differences regarding continuous variables. Results: Module completion rates were higher among those who worked with behavioral activation as their first LICBT module versus worry management. 
Of the 20 nonadherers who opened the first LICBT module allocated, 30% (n=6) opened behavioral activation and 70% (n=14) opened worry management. No significant differences in sociodemographic characteristics were found. Nonadherers who opened behavioral activation as the first LICBT module allocated had a significantly higher level of depression symptoms at baseline than adherers. No other differences in depression and anxiety scores between adherers and nonadherers were found. Minutes working with EJDeR, number of log-ins, days using EJDeR, number of written messages sent to e-therapists, number of written messages sent to participants, and total number of homework exercises submitted were significantly higher among adherers than among nonadherers. There were no significant differences between fathers and mothers regarding user engagement variables. Conclusions: Straightforward techniques, such as behavioral activation, may be well-suited for digital delivery, and more complex techniques, such as worry management, may require modifications to improve user engagement. User engagement was measured behaviorally, for example, through log-data tracking, and future research should measure emotional and cognitive components of engagement. 
Trial Registration: ISRCTN Registry 57233429; https://doi.org/10.1186/ISRCTN57233429 UR - https://formative.jmir.org/2025/1/e67171 UR - http://dx.doi.org/10.2196/67171 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/67171 ER - TY - JOUR AU - Wang, Jiao AU - Chen, Jianrong AU - Liu, Ying AU - Xu, Jixiong PY - 2025/1/28 TI - Use of the FHTHWA Index as a Novel Approach for Predicting the Incidence of Diabetes in a Japanese Population Without Diabetes: Data Analysis Study JO - JMIR Med Inform SP - e64992 VL - 13 KW - prediction KW - diabetes KW - risk KW - index KW - population without diabetes N2 - Background: Many tools have been developed to predict the risk of diabetes in a population without diabetes; however, these tools have shortcomings that include the omission of race, inclusion of variables that are not readily available to patients, and low sensitivity or specificity. Objective: We aimed to develop and validate an easy, systematic index for predicting diabetes risk in the Asian population. Methods: We collected the data from the NAGALA (NAfld [nonalcoholic fatty liver disease] in the Gifu Area, Longitudinal Analysis) database. The least absolute shrinkage and selection operator model was used to select potentially relevant features. Multiple Cox proportional hazard analysis was used to develop a model based on the training set. Results: The final study population of 15,464 participants had a mean age of 42 (range 18-79) years; 54.5% (8430) were men. The mean follow-up duration was 6.05 (SD 3.78) years. A total of 373 (2.41%) participants showed progression to diabetes during the follow-up period. Then, we established a novel parameter (the FHTHWA index) to evaluate the incidence of diabetes in a population without diabetes, comprising 6 parameters based on the training set. 
After multivariable adjustment, individuals in tertile 3 had a significantly higher rate of diabetes compared with those in tertile 1 (hazard ratio 32.141, 95% CI 11.545-89.476). Time-dependent receiver operating characteristic curve analyses showed that the FHTHWA index had high accuracy, with the area under the curve value being around 0.9 during the more than 12 years of follow-up. Conclusions: This research successfully developed a diabetes risk assessment index tailored for the Japanese population by utilizing an extensive dataset and a wide range of indices. By categorizing the diabetes risk levels among Japanese individuals, this study offers a novel predictive tool for identifying potential patients, while also delivering valuable insights into diabetes prevention strategies for the healthy Japanese populace. UR - https://medinform.jmir.org/2025/1/e64992 UR - http://dx.doi.org/10.2196/64992 ID - info:doi/10.2196/64992 ER - TY - JOUR AU - Cabon, Sandie AU - Brihi, Sarra AU - Fezzani, Riadh AU - Pierre-Jean, Morgane AU - Cuggia, Marc AU - Bouzillé, Guillaume PY - 2025/1/22 TI - Combining a Risk Factor Score Designed From Electronic Health Records With a Digital Cytology Image Scoring System to Improve Bladder Cancer Detection: Proof-of-Concept Study JO - J Med Internet Res SP - e56946 VL - 27 KW - bladder cancer KW - clinical data reuse KW - multimodal data fusion KW - clinical decision support KW - machine learning KW - risk factors KW - electronic health records KW - detection KW - mortality KW - therapeutic intervention KW - diagnostic tools KW - digital cytology KW - image-based model KW - clinical data KW - algorithms KW - patient KW - biological information N2 - Background: To reduce the mortality related to bladder cancer, efforts need to be concentrated on early detection of the disease for more effective therapeutic intervention. 
Strong risk factors (eg, smoking status, age, occupational exposure) have been identified, and some diagnostic tools (eg, by way of cystoscopy) have been proposed. However, to date, no fully satisfactory (noninvasive, inexpensive, high-performance) solution for widespread deployment has been proposed. Some new models based on cytology image classification were recently developed and show promise, but there are still avenues to explore to improve their performance. Objective: Our team aimed to evaluate the benefit of combining the reuse of massive clinical data to build a risk factor model and a digital cytology image-based model (VisioCyt) for bladder cancer detection. Methods: The first step relied on designing a predictive model based on clinical data (ie, risk factors identified in the literature) extracted from the clinical data warehouse of the Rennes Hospital and machine learning algorithms (logistic regression, random forest, and support vector machine). It provides a score corresponding to the risk of developing bladder cancer based on the patient's clinical profile. Second, we investigated 3 strategies (logistic regression, decision tree, and a custom strategy based on score interpretation) to combine the model's score with the score from an image-based model to produce a robust bladder cancer scoring system. Results: We collected 2 data sets. The first set, including clinical data for 5422 patients extracted from the clinical data warehouse, was used to design the risk factor-based model. The second set was used to measure the models' performances and was composed of data for 620 patients from a clinical trial for which cytology images and clinicobiological features were collected. With this second data set, the combination of both models obtained areas under the curve of 0.82 on the training set and 0.83 on the test set, demonstrating the value of combining risk factor-based and image-based models. 
This combination offers a higher associated risk of cancer than VisioCyt alone for all classes, especially for low-grade bladder cancer. Conclusions: These results demonstrate the value of combining clinical and biological information, especially to improve detection of low-grade bladder cancer. Some improvements will need to be made to the automatic extraction of clinical features to make the risk factor-based model more robust. However, as of now, the results support the assumption that this type of approach will be of benefit to patients. UR - https://www.jmir.org/2025/1/e56946 UR - http://dx.doi.org/10.2196/56946 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56946 ER - TY - JOUR AU - Fukushima, Takuya AU - Manabe, Masae AU - Yada, Shuntaro AU - Wakamiya, Shoko AU - Yoshida, Akiko AU - Urakawa, Yusaku AU - Maeda, Akiko AU - Kan, Shigeyuki AU - Takahashi, Masayo AU - Aramaki, Eiji PY - 2025/1/16 TI - Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset JO - JMIR Med Inform SP - e65047 VL - 13 KW - large language models KW - genetic counseling KW - medical KW - health KW - artificial intelligence KW - machine learning KW - domain adaptation KW - retrieval-augmented generation KW - instruction tuning KW - prompt engineering KW - question-answer KW - dialogue KW - ethics KW - safety KW - low-rank adaptation KW - Japanese KW - expert evaluation N2 - Background: Advances in genetics have underscored a strong association between genetic factors and health outcomes, leading to an increased demand for genetic counseling services. However, a shortage of qualified genetic counselors poses a significant challenge. Large language models (LLMs) have emerged as a potential solution for augmenting support in genetic counseling tasks. Despite this potential, Japanese genetic counseling LLMs (JGCLLMs) are underexplored. 
To advance a JGCLLM-based dialogue system for genetic counseling, effective domain adaptation methods require investigation. Objective: This study aims to evaluate the current capabilities and identify challenges in developing a JGCLLM-based dialogue system for genetic counseling. The primary focus is to assess the effectiveness of prompt engineering, retrieval-augmented generation (RAG), and instruction tuning within the context of genetic counseling. Furthermore, we will establish an expert-evaluated dataset of responses generated by LLMs adapted to Japanese genetic counseling for the future development of JGCLLMs. Methods: Two primary datasets were used in this study: (1) a question-answer (QA) dataset for LLM adaptation and (2) a genetic counseling question dataset for evaluation. The QA dataset included 899 QA pairs covering medical and genetic counseling topics, while the evaluation dataset contained 120 curated questions across 6 genetic counseling categories. Three LLM enhancement techniques (instruction tuning, RAG, and prompt engineering) were applied to a lightweight Japanese LLM to enhance its ability for genetic counseling. The performance of the adapted LLM was evaluated on the 120-question dataset by 2 certified genetic counselors and 1 ophthalmologist (SK, YU, and AY). Evaluation focused on 4 metrics: (1) inappropriateness of information, (2) sufficiency of information, (3) severity of harm, and (4) alignment with medical consensus. Results: The evaluation by certified genetic counselors and an ophthalmologist revealed varied outcomes across different methods. RAG showed potential, particularly in enhancing critical aspects of genetic counseling. In contrast, instruction tuning and prompt engineering produced less favorable outcomes. This evaluation process facilitated the creation of an expert-evaluated dataset of responses generated by LLMs adapted with different combinations of these methods. 
Error analysis identified key ethical concerns, including inappropriate promotion of prenatal testing, criticism of relatives, and inaccurate probability statements. Conclusions: RAG demonstrated notable improvements across all evaluation metrics, suggesting potential for further enhancement through the expansion of RAG data. The expert-evaluated dataset developed in this study provides valuable insights for future optimization efforts. However, the ethical issues observed in JGCLLM responses underscore the critical need for ongoing refinement and thorough ethical evaluation before these systems can be implemented in health care settings. UR - https://medinform.jmir.org/2025/1/e65047 UR - http://dx.doi.org/10.2196/65047 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65047 ER - TY - JOUR AU - Williams, P. Michael AU - Manjourides, Justin AU - Smith, H. Louisa AU - Rainer, B. Crissi AU - Hightow-Weidman, B. Lisa AU - Haley, F. Danielle PY - 2025/1/13 TI - Studying the Digital Intervention Engagement-Mediated Relationship Between Intrapersonal Measures and Pre-Exposure Prophylaxis Adherence in Sexual and Gender Minority Youth: Secondary Analysis of a Randomized Controlled Trial JO - J Med Internet Res SP - e57619 VL - 27 KW - engagement KW - pre-exposure prophylaxis KW - PrEP KW - digital health intervention KW - adherence KW - men who have sex with men KW - sexual orientation KW - gender minority KW - youth KW - adolescent KW - teenager KW - HIV KW - randomized controlled trial KW - mental health KW - sociodemographic KW - logistic regression KW - health information KW - health behavior KW - sexual health N2 - Background: Improving adherence to pre-exposure prophylaxis (PrEP) via digital health interventions (DHIs) for young sexual and gender minority men who have sex with men (YSGMMSM) is promising for reducing the HIV burden. Measuring and achieving effective engagement (sufficient to solicit PrEP adherence) in YSGMMSM is challenging. 
Objective: This study is a secondary analysis of the primary efficacy randomized controlled trial (RCT) of Prepared, Protected, Empowered (P3), a digital PrEP adherence intervention. Causal mediation analysis was used to quantify whether and to what extent intrapersonal behavioral, mental health, and sociodemographic measures were related to effective engagement for PrEP adherence in YSGMMSM. Methods: In May 2019, 264 YSGMMSM were recruited for the primary RCT via social media, community sites, and clinics from 9 study sites across the United States. For this secondary analysis, 140 participants were eligible (retained at follow-up, received the DHI condition in the primary RCT, and completed trial data). Participants earned US currency for daily use of P3 and lost US currency for nonuse. Dollars accrued at the 3-month follow-up were used to measure engagement. PrEP nonadherence was defined as blood serum concentrations of tenofovir-diphosphate and emtricitabine-triphosphate that correlated with <4 doses weekly at the 3-month follow-up. Logistic regression was used to estimate the total effect of baseline intrapersonal measures on PrEP nonadherence, represented as odds ratios (ORs) with a null value of 1. The total OR for each intrapersonal measure was decomposed into direct and indirect effects. Results: For every US $1 earned above the mean (US $96, SD US $35.1), participants had 2% (OR 0.98, 95% CI 0.97-0.99) lower odds of PrEP nonadherence. Frequently using phone apps to track health information was associated with a 71% (OR 0.29, 95% CI 0.06-0.96) lower odds of PrEP nonadherence. This was overwhelmingly a direct effect, not mediated by engagement, with a percentage mediated (PM) of 1%. Non-Hispanic White participants had 83% lower odds of PrEP nonadherence (OR 0.17, 95% CI 0.05-0.48) and had a direct effect (PM=4%). 
Participants with depressive symptoms and anxiety symptoms had 3.4 (OR 3.42, 95% CI 0.95-12) and 3.5 (OR 3.51, 95% CI 1.06-11.55) times higher odds of PrEP nonadherence, respectively. Anxious symptoms largely operated through P3 engagement (PM=51%). Conclusions: P3 engagement (dollars accrued) was strongly related to lower odds of PrEP nonadherence. Intrapersonal measures operating through P3 engagement (indirect effect, eg, anxious symptoms) suggest possible pathways to improve PrEP adherence DHI efficacy in YSGMMSM via effective engagement. Conversely, the direct effects observed in this study may reflect existing structural disparity (eg, race and ethnicity) or behavioral dispositions toward technology (eg, tracking health via phone apps). Evaluating effective engagement in DHIs with causal mediation approaches provides a clarifying and mechanistic view of how DHIs impact health behavior. Trial Registration: ClinicalTrials.gov; NCT03320512; https://clinicaltrials.gov/study/NCT03320512 UR - https://www.jmir.org/2025/1/e57619 UR - http://dx.doi.org/10.2196/57619 UR - http://www.ncbi.nlm.nih.gov/pubmed/39804696 ID - info:doi/10.2196/57619 ER - TY - JOUR AU - Petit, Pascal AU - Vuillerme, Nicolas PY - 2025/1/9 TI - Leveraging Administrative Health Databases to Address Health Challenges in Farming Populations: Scoping Review and Bibliometric Analysis (1975-2024) JO - JMIR Public Health Surveill SP - e62939 VL - 11 KW - farming population KW - digital public health KW - digital epidemiology KW - administrative health database KW - farming exposome KW - review KW - bibliometric analysis KW - data reuse N2 - Background: Although agricultural health has gained importance, to date, much of the existing research relies on traditional epidemiological approaches that often face limitations related to sample size, geographic scope, temporal coverage, and the range of health events examined. 
To address these challenges, a complementary approach involves leveraging and reusing data beyond their original purpose. Administrative health databases (AHDs) are increasingly reused in population-based research and digital public health, especially for populations such as farmers, who face distinct environmental risks. Objective: We aimed to explore the reuse of AHDs in addressing health issues within farming populations by summarizing the current landscape of AHD-based research and identifying key areas of interest, research gaps, and unmet needs. Methods: We conducted a scoping review and bibliometric analysis using PubMed and Web of Science. Building upon previous reviews of AHD-based public health research, we conducted a comprehensive literature search using 72 terms related to the farming population and AHDs. To identify research hot spots, directions, and gaps, we used keyword frequency, co-occurrence, and thematic mapping. We also explored the bibliometric profile of the farming exposome by mapping keyword co-occurrences between environmental factors and health outcomes. Results: Between 1975 and April 2024, 296 publications across 118 journals, predominantly from high-income countries, were identified. Nearly one-third of these publications were associated with well-established cohorts, such as the Agriculture and Cancer cohort and the Agricultural Health Study. The most frequently used AHDs included disease registers (158/296, 53.4%), electronic health records (124/296, 41.9%), insurance claims (106/296, 35.8%), population registers (95/296, 32.1%), and hospital discharge databases (41/296, 13.9%). Fifty (16.9%) of 296 studies involved >1 million participants. Although a broad range of exposure proxies were used, most studies (254/296, 85.8%) relied on broad proxies, which failed to capture the specifics of farming tasks. Research on the farming exposome remains underexplored, with a predominant focus on the specific external exposome, particularly pesticide exposure. 
A limited range of health events has been examined, primarily cancer, mortality, and injuries. Conclusions: The increasing use of AHDs holds major potential to advance public health research within farming populations. However, substantial research gaps persist, particularly in low-income regions and among underrepresented farming subgroups, such as women, children, and contingent workers. Emerging issues, including exposure to per- and polyfluoroalkyl substances, biological agents, microbiome, microplastics, and climate change, warrant further research. Major gaps also persist in understanding various health conditions, including cardiovascular, reproductive, ocular, sleep-related, age-related, and autoimmune diseases. Addressing these overlooked areas is essential for comprehending the health risks faced by farming communities and guiding public health policies. Within this context, promoting AHD-based research, in conjunction with other digital data sources (eg, mobile health, social health data, and wearables) and artificial intelligence approaches, represents a promising avenue for future exploration. UR - https://publichealth.jmir.org/2025/1/e62939 UR - http://dx.doi.org/10.2196/62939 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/62939 ER - TY - JOUR AU - Sang, Ling AU - Zheng, Bixin AU - Zeng, Xianzheng AU - Liu, Huizhen AU - Jiang, Qing AU - Liu, Maotong AU - Zhu, Chenyu AU - Wang, Maoying AU - Yi, Zengwei AU - Song, Keyu AU - Song, Li PY - 2024/12/30 TI - Effectiveness of Outpatient Chronic Pain Management for Middle-Aged Patients by Internet Hospitals: Retrospective Cohort Study JO - JMIR Med Inform SP - e54975 VL - 12 KW - chronic pain management KW - internet hospital KW - physical hospital KW - quality of life KW - outpatient care KW - telemedicine KW - digital health N2 - Background: Chronic pain is widespread and carries a heavy disease burden, and there is a lack of effective outpatient pain management. 
As an emerging internet medical platform in China, internet hospitals have been successfully applied for the management of chronic diseases. There are also a certain number of patients with chronic pain that use internet hospitals for pain management. However, no studies have investigated the effectiveness of pain management via internet hospitals. Objective: The aim of this retrospective cohort study was to explore the effectiveness of chronic pain management by internet hospitals and their advantages and disadvantages compared to traditional physical hospital visits. Methods: This was a retrospective cohort study. Demographic information such as the patient's sex, age, and number of visits was obtained from the IT center. During the first and last patient visits, information on outcome variables such as the Brief Pain Inventory (BPI), medical satisfaction, medical costs, and adverse drug events was obtained through a telephone follow-up. All patients with chronic pain who had 3 or more visits (internet or offline) between September 2021 and February 2023 were included. The patients were divided into an internet hospital group and a physical hospital group, according to whether they had web-based or in-person consultations, respectively. To control for confounding variables, propensity score matching was used to match the two groups. Matching variables included age, sex, diagnosis, and number of clinic visits. Results: A total of 122 people in the internet hospital group and 739 people in the physical hospital group met the inclusion criteria. After propensity score matching, 77 patients in each of the two groups were included in the analysis. There was no significant difference in the quality of life (QOL; QOL assessment was part of the BPI scale) between the internet hospital group and the physical hospital group (P=.80), but the QOL of both groups of patients improved after pain management (internet hospital group: P<.001; physical hospital group: P=.001). 
There were no significant differences in the pain relief rate (P=.25) or the incidence of adverse events (P=.60) between the two groups. The total cost (P<.001) and treatment-related cost (P<.001) of the physical hospital group were higher than those of the internet hospital group. In addition, the degree of satisfaction in the internet hospital group was greater than that in the physical hospital group (P=.01). Conclusions: Internet hospitals are an effective way of managing chronic pain. They can improve patients' QOL and satisfaction, reduce treatment costs, and can be used as part of a multimodal strategy for chronic pain self-management. UR - https://medinform.jmir.org/2024/1/e54975 UR - http://dx.doi.org/10.2196/54975 ID - info:doi/10.2196/54975 ER - TY - JOUR AU - Knight, Jo AU - Chandrabalan, Vardhan Vishnu AU - Emsley, A. Hedley C. PY - 2024/12/24 TI - Visualizing Patient Pathways and Identifying Data Repositories in a UK Neurosciences Center: Exploratory Study JO - JMIR Med Inform SP - e60017 VL - 12 KW - health data KW - business process modeling notation KW - neurology KW - process monitoring KW - patient pathway KW - clinical pathway KW - patient care KW - EHR KW - electronic health record KW - dataset KW - questionnaire KW - patient data KW - NHS KW - National Health Service N2 - Background: Health and clinical activity data are a vital resource for research, improving patient care and service efficiency. Health care data are inherently complex, and their acquisition, storage, retrieval, and subsequent analysis require a thorough understanding of the clinical pathways underpinning such data. Better use of health care data could lead to improvements in patient care and service delivery. However, this depends on the identification of relevant datasets. 
Objective: We aimed to demonstrate the application of business process modeling notation (BPMN) to represent clinical pathways at a UK neurosciences center and map the clinical activity to corresponding data flows into electronic health records and other nonstandard data repositories. Methods: We used BPMN to map and visualize a patient journey and the subsequent movement and storage of patient data. After identifying several datasets that were being held outside of the standard applications, we collected information about these datasets using a questionnaire. Results: We identified 13 standard applications where neurology clinical activity was captured as part of the patient's electronic health record, including applications and databases for managing referrals, outpatient activity, laboratory data, imaging data, and clinic letters. We also identified 22 distinct datasets not within standard applications that were created and managed within the neurosciences department, either by individuals or teams. These were being used to deliver direct patient care and included datasets for tracking patient blood results, recording home visits, and tracking triage status. Conclusions: Mapping patient data flows and repositories allowed us to identify areas wherein the current electronic health record does not fulfill the needs of day-to-day patient care. Data that are being stored outside of standard applications represent a potential duplication of effort and risk being overlooked. Future work should identify unmet data needs to inform correct data capture and centralization within appropriate data architectures. UR - https://medinform.jmir.org/2024/1/e60017 UR - http://dx.doi.org/10.2196/60017 ID - info:doi/10.2196/60017 ER - TY - JOUR AU - Hermansen, Anna AU - Pollard, Samantha AU - McGrail, Kimberlyn AU - Bansback, Nick AU - Regier, A. 
Dean PY - 2024/12/17 TI - Heuristics Identified in Health Data-Sharing Preferences of Patients With Cancer: Qualitative Focus Group Study JO - J Med Internet Res SP - e63155 VL - 26 KW - heuristics KW - health data sharing KW - cancer patients KW - decision-making KW - real-world data KW - altruism KW - trust KW - control KW - data sharing KW - focus group KW - precision medicine KW - clinical data KW - exploratory study KW - qualitative KW - Canada KW - thematic analysis KW - informed consent KW - patient education KW - information technology KW - healthcare KW - medical informatics N2 - Background: Evaluating precision oncology outcomes requires access to real-world and clinical trial data. Access is based on consent, and consent is based on patients' informed preferences when deciding to share their data. Decision-making is often modeled using utility theory, but a complex decision context calls for a consideration of how heuristic, intuitive thought processes interact with rational utility maximization. Data-sharing decision-making has been studied using heuristic theory, but almost no heuristic research exists in the health data context. This study explores this evidence gap, applying a qualitative approach to probe for evidence of heuristic mechanisms behind the health data-sharing preferences of those who have experienced cancer. Exploring qualitative decision-making reveals the types of heuristics used and how they are related to the process of decision-making to better understand whether consent mechanisms should consider nonrational processes to better serve patient decision-making. Objective: This study aimed to explore how patients with cancer use heuristics when deciding whether to share their data for research. Methods: The researchers conducted a focus group study of Canadians who have experienced cancer. 
We recruited participants through an online advertisement, screening individuals based on their ability to increase demographic diversity in the sample. We reviewed the literature on data-sharing platforms to develop a semistructured topic guide on concerns about data sharing, incentives to share, and consent and control. Focus group facilitators led the open-ended discussions about data-sharing preferences that revealed underlying heuristics. Two qualitative analysts coded transcripts using a heuristic framework developed from a review of the literature. Transcripts were analyzed for heuristic instances, which were grouped according to sociocultural categories. Using thematic analysis, the analysts generated reflexive themes through norming sessions and consultations. Results: A total of 3 focus groups were held with 19 participants in total. The analysis identified 12 heuristics underlying intentions to share data. From the thematic analysis, we identified how the heuristics of social norms and community building were expressed through altruism; the recognition, reputation, and authority heuristics led to (dis)trust in certain institutions; the need for security prompted the illusion of control and transparency heuristics; and the availability and affect heuristics influenced attitudes around risk and benefit. These thematic relationships all had impacts on the participants' intentions to share their health data. Conclusions: The findings provide a novel qualitative understanding of how health data-sharing decisions and preferences may be based on heuristic processing. As patients consider the extent of risks and benefits, heuristic processes influence their assessment of anticipated outcomes, which may not result in rational, truly informed consent. This study shows how considering heuristic processing when designing current consent mechanisms opens up the opportunity for more meaningful and realistic interactions with the complex decision-making context. 
UR - https://www.jmir.org/2024/1/e63155 UR - http://dx.doi.org/10.2196/63155 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63155 ER - TY - JOUR AU - Hans, Patricius Felix AU - Kleinekort, Jan AU - Boerries, Melanie AU - Nieters, Alexandra AU - Kindle, Gerhard AU - Rautenberg, Micha AU - Bühler, Laura AU - Weiser, Gerda AU - Röttger, Clemens Michael AU - Neufischer, Carolin AU - Kühn, Matthias AU - Wehrle, Julius AU - Slagman, Anna AU - Fischer-Rosinsky, Antje AU - Eienbröker, Larissa AU - Hanses, Frank AU - Teepe, Wilhelm Gisbert AU - Busch, Hans-Jörg AU - Benning, Leo PY - 2024/12/17 TI - Information Mode-Dependent Success Rates of Obtaining German Medical Informatics Initiative-Compliant Broad Consent in the Emergency Department: Single-Center Prospective Observational Study JO - JMIR Med Inform SP - e65646 VL - 12 KW - biomedical research KW - delivery of health care KW - informed consent KW - medical informatics KW - digital health KW - emergency medical services KW - routinely collected health data KW - data science KW - secondary data analysis KW - data analysis KW - biomedical KW - emergency KW - Germany KW - Europe KW - prospective observational study KW - broad consent KW - inpatient stay KW - logistic regression analysis KW - health care delivery KW - inpatients N2 - Background: The broad consent (BC) developed by the German Medical Informatics Initiative is a pivotal national strategy for obtaining patient consent to use routinely collected data from electronic health records, insurance companies, contact information, and biomaterials for research. Emergency departments (EDs) are ideal for enrolling diverse patient populations in research activities. Despite regulatory and ethical challenges, obtaining BC from patients in ED with varying demographic, socioeconomic, and disease characteristics presents a promising opportunity to expand the availability of ED data. 
Objective: This study aimed to evaluate the success rate of obtaining BC through different consenting approaches in a tertiary ED and to explore factors influencing consent and dropout rates. Methods: A single-center prospective observational study was conducted in a German tertiary ED from September to December 2022. Every 30th patient was screened for eligibility. Eligible patients were informed via one of three modalities: (1) directly in the ED, (2) during their inpatient stay on the ward, or (3) via telephone after discharge. The primary outcome was the success rate of obtaining BC within 30 days of ED presentation. Secondary outcomes included analyzing potential influences on the success and dropout rates based on patient characteristics, information mode, and the interaction time required for patients to make an informed decision. Results: Of 11,842 ED visits, 419 patients were screened for BC eligibility, with 151 meeting the inclusion criteria. Of these, 68 (45%) consented to at least 1 BC module, while 24 (15.9%) refused participation. The dropout rate was 39.1% (n=59) and was highest in the telephone-based group (57/109, 52.3%) and lowest in the ED group (1/14, 7.1%). Patients informed face-to-face during their inpatient stay following the ED treatment had the highest consent rate (23/27, 85.2%), while those approached in the ED or by telephone had consent rates of 69.2% (9/13 and 36/52). Logistic regression analysis indicated that longer interaction time significantly improved consent rates (P=.03), while female sex was associated with higher dropout rates (P=.02). Age, triage category, billing details (inpatient treatment), or diagnosis did not significantly influence the primary outcome (all P>.05). Conclusions: Obtaining BC in an ED environment is feasible, enabling representative inclusion of ED populations. However, discharge from the ED and female sex negatively affected consent rates to the BC. 
Face-to-face interaction proved most effective, particularly for inpatients, while telephone-based approaches resulted in higher dropout rates despite comparable consent rates to direct consenting in the ED. The findings underscore the importance of tailored consent strategies and maintaining consenting staff in EDs and on the wards to enhance BC information delivery and consent processes for eligible patients. Trial Registration: German Clinical Trials Register DRKS00028753; https://drks.de/search/de/trial/DRKS00028753 UR - https://medinform.jmir.org/2024/1/e65646 UR - http://dx.doi.org/10.2196/65646 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65646 ER - TY - JOUR AU - Miller, I. Joshua AU - Hassell, L. Kathryn AU - Kellar-Guenther, Yvonne AU - Quesada, Stacey AU - West, Rhonda AU - Sontag, Marci PY - 2024/12/9 TI - The Prevalence of Sickle Cell Disease in Colorado and Methodologies of the Colorado Sickle Cell Data Collection Program: Public Health Surveillance Study JO - JMIR Public Health Surveill SP - e64995 VL - 10 KW - sickle cell disease KW - public health surveillance KW - prevalence KW - birth prevalence KW - Colorado KW - sickle cell KW - surveillance KW - SCD KW - USA KW - data collection KW - blood disorder KW - policy development KW - hematology KW - United States N2 - Background: Sickle cell disease (SCD) is a genetic blood disorder that affects approximately 100,000 individuals in the United States, with the highest prevalence among Black or African American populations. While advances in care have improved survival, comprehensive state-level data on the prevalence of SCD remain limited, which hampers efforts to optimize health care services. To address this gap, the Colorado Sickle Cell Data Collection (CO-SCDC) program was established in 2021 as part of the Centers for Disease Control and Prevention's initiative to enhance surveillance and public health efforts for SCD.
Objective: The objectives of this study were to describe the establishment of the CO-SCDC program and to provide updated estimates of the prevalence and birth prevalence of SCD in Colorado, including geographic dispersion. Additional objectives include evaluating the accuracy of case identification methods and leveraging surveillance activities to inform public health initiatives. Methods: Data were collected from Health Data Compass (a multi-institutional data warehouse) containing electronic health records from the University of Colorado Health and Children's Hospital Colorado for the years 2012-2020. Colorado newborn screening program data were included for confirmed SCD diagnoses from 2001 to 2020. Records were linked using the Colorado University Record Linkage tool and deidentified for analysis. Case definitions, adapted from the Centers for Disease Control and Prevention's Registry and Surveillance System for Hemoglobinopathies project, classified cases as possible, probable, or definite SCD. Clinical validation by hematologists was performed to ensure accuracy, and prevalence rates were calculated using 2020 US Census population estimates. Results: In 2019, 435 individuals were identified as living with SCD in Colorado, an increase of 16%-40% over previous estimates, with the majority (n=349, 80.2%) identifying as Black or African American. The median age of individuals was 19 years. The prevalence of SCD was highest in urban counties, with concentrations in Arapahoe, Denver, and El Paso counties. Birth prevalence of SCD increased from 11.9 per 100,000 live births between 2010 and 2014 to 20.1 per 100,000 live births between 2015 and 2019, with 58.5% (n=38) of cases being hemoglobin (Hb) SS or HbSβ0 thalassemia subtypes. The study highlighted a 67% (n=26) increase in SCD births over the decade, correlating with the growth of the Black or African American population in the state.
Conclusions: The CO-SCDC program successfully established the capacity to perform SCD surveillance and, in doing so, identified baseline prevalence estimates for SCD in Colorado. The findings show geographic dispersion across Colorado counties, highlighting the need for equitable access to specialty care, particularly for rural populations. The combination of automated data linkage and clinical validation improved case identification accuracy. Future efforts will expand surveillance to include claims data to better capture health care use and address potential underreporting. These results will guide public health interventions aimed at improving care for individuals with SCD in Colorado. UR - https://publichealth.jmir.org/2024/1/e64995 UR - http://dx.doi.org/10.2196/64995 ID - info:doi/10.2196/64995 ER - TY - JOUR AU - Qiu, Yuanbo AU - Huang, Huang AU - Gai, Junjie AU - De Leo, Gianluca PY - 2024/12/4 TI - The Effects of the COVID-19 Pandemic on Age-Based Disparities in Digital Health Technology Use: Secondary Analysis of the 2017-2022 Health Information National Trends Survey JO - J Med Internet Res SP - e65541 VL - 26 KW - age-based disparities KW - health equity KW - digital health technology use KW - digital divide KW - health policy KW - COVID-19 KW - mobile phone N2 - Background: The COVID-19 pandemic accelerated the adoption of digital health technology, but existing studies suggest it may also have affected age-based disparities. Whether the rapid digitalization of the health care system during the pandemic widened age-based disparities over the long term, relative to the pre-pandemic period, remains unclear. Objective: This study aimed to analyze the long-term effects of the COVID-19 pandemic on the multifaceted landscape of digital health technology use across diverse age groups among US citizens.
Methods: We conducted a retrospective observational study using the 2017-2022 Health Information National Trends Survey to identify the influence of the COVID-19 pandemic on a wide range of digital health technology use outcomes across various age groups. The sample included 15,505 respondents, who were categorized into 3 age groups: adults (18-44 years), middle-aged adults (45-64 years), and older adults (more than 65 years). We also designated the time point of March 11, 2020, to divide the pre- and post-pandemic periods. Based on these categorizations, multivariate linear probability models were used to assess pre-post changes in digital health technology use, controlling for demographic, socioeconomic, and health-related variables among different age groups. Results: Overall, older adults were found to be significantly less likely to use digital health technology compared with adults, with a 26.28% lower likelihood of using the internet for health information (P<.001) and a 32.63% lower likelihood of using health apps (P<.001). Use of digital health technology increased significantly across all age groups after the onset of the pandemic, and the age-based disparities narrowed in terms of using the internet to look for health information. However, the disparities widened for older adults in using the internet to look up test results (11.21%, P<.001) and make appointments (10.03%, P=.006) and in using wearable devices to track health (8.31%, P=.01). Conclusions: Our study reveals a significant increase in the use of digital health technology among all age groups during the pandemic. However, while the disparities in accessing online information have narrowed, age-based disparities, particularly for older adults, have widened in most areas, such as looking up test results and making appointments with doctors. Therefore, older adults are more likely to be left behind by the rapidly digitalized US health care system during the pandemic.
Policy makers and health care providers should focus on addressing these disparities to ensure equitable access to digital health resources for US baby boomers. UR - https://www.jmir.org/2024/1/e65541 UR - http://dx.doi.org/10.2196/65541 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65541 ER - TY - JOUR AU - Chen, Xiao AU - Shen, Zhiyun AU - Guan, Tingyu AU - Tao, Yuchen AU - Kang, Yichen AU - Zhang, Yuxia PY - 2024/11/29 TI - Analyzing Patient Experience on Weibo: Machine Learning Approach to Topic Modeling and Sentiment Analysis JO - JMIR Med Inform SP - e59249 VL - 12 KW - patient experience KW - experience KW - attitude KW - opinion KW - perception KW - perspective KW - machine learning KW - natural language process KW - NLP KW - social media KW - free-text KW - unstructured KW - Weibo KW - spatiotemporal KW - topic modeling KW - sentiment N2 - Background: Social media platforms allow individuals to openly gather, communicate, and share information about their interactions with health care services, becoming an essential supplemental means of understanding patient experience. Objective: We aimed to identify common discussion topics related to health care experience from the public's perspective and to determine areas of concern from patients' perspectives that health care providers should act on. Methods: This study conducted a spatiotemporal analysis of the volume, sentiment, and topic of patient experience-related posts on the Weibo platform developed by Sina Corporation. We applied a supervised machine learning approach including human annotation and machine learning-based models for topic modeling and sentiment analysis of the public discourse. A multiclassifier voting method based on logistic regression, multinomial naïve Bayes, and random forest was used. Results: A total of 4008 posts were manually classified into patient experience topics. A patient experience theme framework was developed.
The accuracy, precision, recall, and F-measure of the method integrating logistic regression, multinomial naïve Bayes, and random forest for patient experience themes were 0.93, 0.95, 0.80, 0.77, and 0.84, respectively, indicating a satisfactory prediction. The sentiment analysis revealed that negative sentiment posts constituted the highest proportion (3319/4008, 82.81%). Twenty patient experience themes were discussed on the social media platform. The majority of the posts described the interpersonal aspects of care (2947/4008, 73.53%); the five most frequently discussed topics were "health care professionals' attitude," "access to care," "communication, information, and education," "technical competence," and "efficacy of treatment." Conclusions: Hospital administrators and clinicians should consider the value of social media and pay attention to what patients and their family members are communicating on social media. To increase the utility of these data, a machine learning algorithm can be used for topic modeling. The results of this study highlighted the interpersonal and functional aspects of care, especially the interpersonal aspects, which are often the "moment of truth" during a service encounter in which patients make a critical evaluation of hospital services.
UR - https://medinform.jmir.org/2024/1/e59249 UR - http://dx.doi.org/10.2196/59249 ID - info:doi/10.2196/59249 ER - TY - JOUR AU - Bialke, Martin AU - Stahl, Dana AU - Leddig, Torsten AU - Hoffmann, Wolfgang PY - 2024/11/29 TI - The University Medicine Greifswald's Trusted Third Party Dispatcher: State-of-the-Art Perspective Into Comprehensive Architectures and Complex Research Workflows JO - JMIR Med Inform SP - e65784 VL - 12 KW - architecture KW - scalability KW - trusted third party KW - application KW - security KW - consent KW - identifying data KW - infrastructure KW - modular KW - software KW - implementation KW - user interface KW - health platform KW - data management KW - data privacy KW - health record KW - electronic health record KW - EHR KW - pseudonymization UR - https://medinform.jmir.org/2024/1/e65784 UR - http://dx.doi.org/10.2196/65784 ID - info:doi/10.2196/65784 ER - TY - JOUR AU - Wündisch, Eric AU - Hufnagl, Peter AU - Brunecker, Peter AU - Meier zu Ummeln, Sophie AU - Träger, Sarah AU - Prasser, Fabian AU - Weber, Joachim PY - 2024/11/29 TI - Authors'
Reply: The University Medicine Greifswald's Trusted Third Party Dispatcher: State-of-the-Art Perspective Into Comprehensive Architectures and Complex Research Workflows JO - JMIR Med Inform SP - e67429 VL - 12 KW - architecture KW - scalability KW - trusted third party KW - application KW - security KW - consent KW - identifying data KW - infrastructure KW - modular KW - software KW - implementation KW - user interface KW - health platform KW - data management KW - data privacy KW - health record KW - electronic health record KW - EHR KW - pseudonymization UR - https://medinform.jmir.org/2024/1/e67429 UR - http://dx.doi.org/10.2196/67429 ID - info:doi/10.2196/67429 ER - TY - JOUR AU - Lee, Haeun AU - Kim, Seok AU - Moon, Hui-Woun AU - Lee, Ho-Young AU - Kim, Kwangsoo AU - Jung, Young Se AU - Yoo, Sooyoung PY - 2024/11/22 TI - Hospital Length of Stay Prediction for Planned Admissions Using Observational Medical Outcomes Partnership Common Data Model: Retrospective Study JO - J Med Internet Res SP - e59260 VL - 26 KW - length of stay KW - machine learning KW - Observational Medical Outcomes Partnership Common Data Model KW - allocation of resources KW - reproducibility of results KW - hospital KW - admission KW - retrospective study KW - prediction model KW - electronic health record KW - EHR KW - South Korea KW - logistic regression KW - algorithm KW - Shapley Additive Explanation KW - health care KW - clinical informatics N2 - Background: Accurate hospital length of stay (LoS) prediction enables efficient resource management. Conventional LoS prediction models with limited covariates and nonstandardized data have limited reproducibility when applied to the general population. Objective: In this study, we developed and validated a machine learning (ML)-based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM).
Methods: Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital. Results: The final sample included 129,938 patient entry events in the planned admissions. The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95% CI 0.887-0.894) and an AUPRC of 0.819 (95% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed the best in the multiclassification for predicting an LoS of 3 days or more, with an AUROC of 0.901 (95% CI 0.898-0.904) and an AUPRC of 0.770 (95% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. 
The random forest (RF) model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95% CI 0.802-0.807). Conclusions: The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings. UR - https://www.jmir.org/2024/1/e59260 UR - http://dx.doi.org/10.2196/59260 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59260 ER - TY - JOUR AU - van Aubel, Evelyne AU - Vaessen, Thomas AU - Uyttebroek, Lotte AU - Steinhart, Henrietta AU - Beijer-Klippel, Annelie AU - Batink, Tim AU - van Winkel, Ruud AU - de Haan, Lieuwe AU - van der Gaag, Mark AU - van Amelsvoort, Thérèse AU - Marcelis, Machteld AU - Schirmbeck, Frederike AU - Reininghaus, Ulrich AU - Myin-Germeys, Inez PY - 2024/11/21 TI - Engagement and Acceptability of Acceptance and Commitment Therapy in Daily Life in Early Psychosis: Secondary Findings From a Multicenter Randomized Controlled Trial JO - JMIR Form Res SP - e57109 VL - 8 KW - acceptance and commitment therapy KW - ACT KW - first episode of psychosis KW - FEP KW - ultrahigh risk for psychosis KW - UHR KW - ecological momentary intervention KW - EMI KW - mobile health KW - mHealth KW - blended care KW - mobile phone N2 - Background: Acceptance and commitment therapy (ACT) is promising in the treatment of early psychosis. Augmenting face-to-face ACT with mobile health ecological momentary interventions may increase its treatment effects and empower clients to take treatment into their own hands.
Objective: This study aimed to investigate and predict treatment engagement with and acceptability of acceptance and commitment therapy in daily life (ACT-DL), a novel ecological momentary intervention for people with an ultrahigh risk state and a first episode of psychosis. Methods: In the multicenter randomized controlled trial, 148 individuals with ultrahigh risk or first-episode psychosis aged 15-65 years were randomized to treatment as usual only (control) or to ACT-DL combined with treatment as usual (experimental), consisting of 8 face-to-face sessions augmented with an ACT-based smartphone app, delivering ACT skills and techniques in daily life. For individuals in the intervention arm, we collected data on treatment engagement with and acceptability of ACT-DL during and after the intervention. Predictors of treatment engagement and acceptability included baseline demographic, clinical, and functional outcomes. Results: Participants who received ACT-DL in addition to treatment as usual (n=71) completed a mean of 6 (SD 3) sessions, with 59% (n=42) of participants completing all sessions. App engagement data (n=58) show that, on a weekly basis, participants used the app 13 times and were compliant with 6 of 24 (25%) notifications. Distribution plots of debriefing scores (n=46) show that 85%-96% of participants reported usefulness on all acceptability items to at least some extent (scores ≥2; 1=no usefulness) and that 91% (n=42) of participants reported perceived burden by number and length of notifications (scores ≥2; 1=no burden). Multiple linear regression models were fitted to predict treatment engagement and acceptability. Ethnic minority backgrounds predicted lower notification response compliance (B=−4.37; P=.01), yet higher app usefulness (B=1.25; P=.049). Negative (B=−0.26; P=.01) and affective (B=0.14; P=.04) symptom severity predicted lower and higher ACT training usefulness, respectively.
Being female (B=−1.03; P=.005) predicted lower usefulness of the ACT metaphor images on the app. Conclusions: Our results corroborate good treatment engagement with and acceptability of ACT-DL in early psychosis. We provide recommendations for future intervention optimization. Trial Registration: OMON NL46439.068.13; https://onderzoekmetmensen.nl/en/trial/24803 UR - https://formative.jmir.org/2024/1/e57109 UR - http://dx.doi.org/10.2196/57109 UR - http://www.ncbi.nlm.nih.gov/pubmed/39570655 ID - info:doi/10.2196/57109 ER - TY - JOUR AU - Jefferson, Emily AU - Milligan, Gordon AU - Johnston, Jenny AU - Mumtaz, Shahzad AU - Cole, Christian AU - Best, Joseph AU - Giles, Charles Thomas AU - Cox, Samuel AU - Masood, Erum AU - Horban, Scott AU - Urwin, Esmond AU - Beggs, Jillian AU - Chuter, Antony AU - Reilly, Gerry AU - Morris, Andrew AU - Seymour, David AU - Hopkins, Susan AU - Sheikh, Aziz AU - Quinlan, Philip PY - 2024/11/20 TI - The Challenges and Lessons Learned Building a New UK Infrastructure for Finding and Accessing Population-Wide COVID-19 Data for Research and Public Health Analysis: The CO-CONNECT Project JO - J Med Internet Res SP - e50235 VL - 26 KW - COVID-19 KW - infrastructure KW - trusted research environments KW - safe havens KW - feasibility analysis KW - cohort discovery KW - federated analytics KW - federated discovery KW - lessons learned KW - population wide KW - data KW - public health KW - analysis KW - CO-CONNECT KW - challenges KW - data transformation UR - https://www.jmir.org/2024/1/e50235 UR - http://dx.doi.org/10.2196/50235 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/50235 ER - TY - JOUR AU - Seo, Junhyuk AU - Choi, Dasol AU - Kim, Taerim AU - Cha, Chul Won AU - Kim, Minha AU - Yoo, Haanju AU - Oh, Namkee AU - Yi, YongJin AU - Lee, Hwa Kye AU - Choi, Edward PY - 2024/11/20 TI - Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study JO - J Med Internet Res SP - e58329 VL
- 26 KW - large language models KW - health care documentation KW - clinical evaluation KW - emergency department KW - artificial intelligence KW - medical record accuracy N2 - Background: The advancement of large language models (LLMs) offers significant opportunities for health care, particularly in the generation of medical documentation. However, challenges related to ensuring the accuracy and reliability of LLM outputs, coupled with the absence of established quality standards, have raised concerns about their clinical application. Objective: This study aimed to develop and validate an evaluation framework for assessing the accuracy and clinical applicability of LLM-generated emergency department (ED) records, aiming to enhance artificial intelligence integration in health care documentation. Methods: We organized the Healthcare Prompt-a-thon, a competitive event designed to explore the capabilities of LLMs in generating accurate medical records. The event involved 52 participants who generated 33 initial ED records using HyperCLOVA X, a Korean-specialized LLM. We applied a dual evaluation approach. First, clinical evaluation: 4 medical professionals evaluated the records using a 5-point Likert scale across 5 criteria: appropriateness, accuracy, structure/format, conciseness, and clinical validity. Second, quantitative evaluation: We developed a framework to categorize and count errors in the LLM outputs, identifying 7 key error types. Statistical methods, including Pearson correlation and intraclass correlation coefficients (ICC), were used to assess consistency and agreement among evaluators. Results: The clinical evaluation demonstrated strong interrater reliability, with ICC values ranging from 0.653 to 0.887 (P<.001), and a test-retest reliability Pearson correlation coefficient of 0.776 (P<.001).
Quantitative analysis revealed that invalid generation errors were the most common, constituting 35.38% of total errors, while structural malformation errors had the most significant negative impact on the clinical evaluation score (Pearson r=−0.654; P<.001). A strong negative correlation was found between the number of quantitative errors and clinical evaluation scores (Pearson r=−0.633; P<.001), indicating that higher error rates corresponded to lower clinical acceptability. Conclusions: Our research provides robust support for the reliability and clinical acceptability of the proposed evaluation framework. It underscores the framework's potential to mitigate clinical burdens and foster the responsible integration of artificial intelligence technologies in health care, suggesting a promising direction for future research and practical applications in the field. UR - https://www.jmir.org/2024/1/e58329 UR - http://dx.doi.org/10.2196/58329 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58329 ER - TY - JOUR AU - Wang, Yipei AU - Zhang, Pei AU - Xing, Yan AU - Shi, Huifeng AU - Cui, Yunpu AU - Wei, Yuan AU - Zhang, Ke AU - Wu, Xinxia AU - Ji, Hong AU - Xu, Xuedong AU - Dong, Yanhui AU - Jin, Changxiao PY - 2024/11/19 TI - Telemedicine Integrated Care Versus In-Person Care Mode for Patients With Short Stature: Comprehensive Comparison of a Retrospective Cohort Study JO - J Med Internet Res SP - e57814 VL - 26 KW - telemedicine KW - telemedicine integrated care mode KW - short stature KW - clinical outcomes KW - health-seeking behaviors KW - cost analysis KW - in-person care KW - mobile health KW - mHealth KW - telehealth KW - eHealth KW - video virtual visit KW - access to care KW - children KW - pediatrics KW - China KW - accessibility KW - temporal KW - spatial constraints KW - chronic disease N2 - Background: Telemedicine has demonstrated efficacy as a supplement to traditional in-person care when treating certain diseases.
Nevertheless, more investigation is needed to comprehensively assess its potential as an alternative to in-person care and its influence on access to care. The successful treatment of short stature relies on timely and regular intervention, particularly in rural and economically disadvantaged regions where the disease is more prevalent. Objective: This study evaluated the clinical outcomes, health-seeking behaviors, and cost of telemedicine integrated into care for children with short stature in China. Methods: Our study involved 1241 individuals diagnosed with short stature at the pediatric outpatient clinic of Peking University Third Hospital between 2012 and 2023. Patients were divided into in-person care (IPC; 1183 patients receiving only in-person care) and telemedicine integrated care (TIC; 58 patients receiving both in-person and virtual care) groups. For both groups, the initial 71.43% (average of 58 percentages, with each percentage representing the ratio of patients in the treatment group) of visits were categorized into the pretelemedicine phase. We used propensity score matching to select individuals with similar baseline conditions. We used 7 variables such as age, gender, and medical insurance for the 1:5 closest neighbor match. Eventually, 115 patients in the IPC group and 54 patients in the TIC group were selected. The primary clinical outcome was the change in the standard height percentage. Health-seeking behavior was described by visit intervals in the pre- and post-telemedicine phases. The cost analysis compared costs both between different groups and between different visit modalities of the TIC group in the post-telemedicine phase. 
Results: In terms of clinical effectiveness, we demonstrated that the increase in height among the TIC group (ΔzTIC=0.74) was more substantial than that for the IPC group (ΔzIPC=0.51, P=.01; paired t test), while no unfavorable changes in other endpoints such as BMI or insulin-like growth factor 1 (IGF-1) levels were observed. As for health-seeking behaviors, the results showed that, during the post-telemedicine phase, the IPC group had a visit interval of 71.08 (IQR 50.75-90.73) days, significantly longer than the prior period (51.25 [IQR 34.75-82.00] days, P<.001; U test), whereas the TIC group's visit interval remained unchanged. As for the cost per visit, there was no difference in the average cost per visit between the 2 groups nor between the pre- and post-telemedicine phases. During the post-telemedicine phase, within the TIC group, in-person visits had a higher average total cost, elevated medical and labor expenses, and greater medical cost compared with virtual visits. Conclusions: We contend that the rise in medical visits facilitated by integrating telemedicine into care effectively restored the previously constrained number of medical visits to their usual levels, without increasing costs. Our research underscores that administering prompt treatment may enable physicians to seize a crucial treatment opportunity for children with short stature, thus attaining superior results. UR - https://www.jmir.org/2024/1/e57814 UR - http://dx.doi.org/10.2196/57814 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57814 ER - TY - JOUR AU - Liu, Shuimei AU - Guo, Raymond L.
PY - 2024/11/19 TI - Data Ownership in the AI-Powered Integrative Health Care Landscape JO - JMIR Med Inform SP - e57754 VL - 12 KW - data ownership KW - integrative healthcare KW - artificial intelligence KW - AI KW - ownership KW - data science KW - governance KW - consent KW - privacy KW - security KW - access KW - model KW - framework KW - transparency UR - https://medinform.jmir.org/2024/1/e57754 UR - http://dx.doi.org/10.2196/57754 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57754 ER - TY - JOUR AU - Mayito, Jonathan AU - Tumwine, Conrad AU - Galiwango, Ronald AU - Nuwamanya, Elly AU - Nakasendwa, Suzan AU - Hope, Mackline AU - Kiggundu, Reuben AU - Byonanebye, M. Dathan AU - Dhikusooka, Flavia AU - Twemanye, Vivian AU - Kambugu, Andrew AU - Kakooza, Francis PY - 2024/11/8 TI - Combating Antimicrobial Resistance Through a Data-Driven Approach to Optimize Antibiotic Use and Improve Patient Outcomes: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e58116 VL - 13 KW - antimicrobial resistance KW - AMR database KW - AMR KW - machine learning KW - antimicrobial use KW - artificial intelligence KW - antimicrobial KW - data-driven KW - mixed-method KW - patient outcome KW - drug-resistant infections KW - drug resistant KW - surveillance data KW - economic KW - antibiotic N2 - Background: It is projected that drug-resistant infections will lead to 10 million deaths annually by 2050 if left unabated. Despite this threat, surveillance data from resource-limited settings are scarce and often lack antimicrobial resistance (AMR)-related clinical outcomes and economic burden. We aim to build an AMR and antimicrobial use (AMU) data warehouse, describe the trends of resistance and antibiotic use, determine the economic burden of AMR in Uganda, and develop a machine learning algorithm to predict AMR-related clinical outcomes.
Objective: The overall objective of the study is to use data-driven approaches to optimize antibiotic use and combat antimicrobial-resistant infections in Uganda. We aim to (1) build a dynamic AMR and antimicrobial use and consumption (AMUC) data warehouse to support research in AMR and AMUC to inform AMR-related interventions and public health policy, (2) evaluate the trends in AMR and antibiotic use based on annual antibiotic and point prevalence survey data collected at 9 regional referral hospitals over a 5-year period, (3) develop a machine learning model to predict the clinical outcomes of patients with bacterial infectious syndromes due to drug-resistant pathogens, and (4) estimate the annual economic burden of AMR in Uganda using the cost-of-illness approach. Methods: We will conduct a study involving data curation, machine learning-based modeling, and cost-of-illness analysis using AMR and AMU data abstracted from procurement, human resources, and clinical records of patients with bacterial infectious syndromes at 9 regional referral hospitals in Uganda collected between 2018 and 2026. We will use data curation procedures, FLAIR (Findable, Linkable, Accessible, Interactable and Repeatable) principles, and role-based access control to build a robust and dynamic AMR and AMU data warehouse. We will also apply machine learning algorithms to model AMR-related clinical outcomes, advanced statistical analysis to study AMR and AMU trends, and cost-of-illness analysis to determine the AMR-related economic burden. Results: The study received funding from the Wellcome Trust through the Centers for Antimicrobial Optimisation Network (CAMO-Net) in April 2023.
As of October 28, 2024, we completed data warehouse development, which is now under testing; completed data curation of the historical Fleming Fund surveillance data (2020-2023); and collected retrospective AMR records for 599 patients that contained clinical outcomes and cost-of-illness economic burden data across 9 surveillance sites for objectives 3 and 4, respectively. Conclusions: The data warehouse will promote access to rich and interlinked AMR and AMU data sets to answer AMR program and research questions using a wide evidence base. The AMR-related clinical outcomes model and cost data will facilitate improvement in the clinical management of AMR patients and guide resource allocation to support AMR surveillance and interventions. International Registered Report Identifier (IRRID): PRR1-10.2196/58116 UR - https://www.researchprotocols.org/2024/1/e58116 UR - http://dx.doi.org/10.2196/58116 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58116 ER - TY - JOUR AU - Bhavaraju, L. Vasudha AU - Panchanathan, Sarada AU - Willis, C. Brigham AU - Garcia-Filion, Pamela PY - 2024/11/6 TI - Leveraging the Electronic Health Record to Measure Resident Clinical Experiences and Identify Training Gaps: Development and Usability Study JO - JMIR Med Educ SP - e53337 VL - 10 KW - clinical informatics KW - electronic health record KW - pediatric resident KW - COVID-19 KW - competence-based medical education KW - pediatric KW - children KW - SARS-CoV-2 KW - clinic KW - urban KW - diagnosis KW - health informatics KW - EHR KW - individualized learning plan N2 - Background: Competence-based medical education requires robust data to link competence with clinical experiences. The SARS-CoV-2 (COVID-19) pandemic abruptly altered the standard trajectory of clinical exposure in medical training programs. Residency program directors were tasked with identifying and addressing the resultant gaps in each trainee's experiences using existing tools. 
Objective: This study aims to demonstrate a feasible and efficient method to capture electronic health record (EHR) data that measure the volume and variety of pediatric resident clinical experiences from a continuity clinic; generate individual-, class-, and graduate-level benchmark data; and create a visualization for learners to quickly identify gaps in clinical experiences. Methods: This pilot was conducted in a large, urban pediatric residency program from 2016 to 2022. Through consensus, 5 pediatric faculty identified diagnostic groups that pediatric residents should see to be competent in outpatient pediatrics. Information technology consultants used International Classification of Diseases, Tenth Revision (ICD-10) codes corresponding with each diagnostic group to extract EHR patient encounter data as an indicator of exposure to the specific diagnosis. The frequency (volume) and diagnosis types (variety) seen by active residents (classes of 2020-2022) were compared with class and graduated resident (classes of 2016-2019) averages. These data were converted to percentages and translated to a radar chart visualization for residents to quickly compare their current clinical experiences with peers and graduates. Residents were surveyed on the use of these data and the visualization to identify training gaps. Results: Patient encounter data about clinical experiences for 102 residents (N=52 graduates) were extracted. Active residents (n=50) received data reports with radar graphs biannually: 3 for the classes of 2020 and 2021 and 2 for the class of 2022. Radar charts distinctly demonstrated gaps in diagnosis exposure compared with classmates and graduates. Residents found the visualization useful in setting clinical and learning goals. 
Conclusions: This pilot describes an innovative method of capturing and presenting data about resident clinical experiences, compared with peer and graduate benchmarks, to identify learning gaps that may result from disruptions or modifications in medical training. This methodology can be aggregated across specialties and institutions and potentially inform competence-based medical education. UR - https://mededu.jmir.org/2024/1/e53337 UR - http://dx.doi.org/10.2196/53337 ID - info:doi/10.2196/53337 ER - TY - JOUR AU - Subramanian, Hemang AU - Sengupta, Arijit AU - Xu, Yilin PY - 2024/11/6 TI - Patient Health Record Protection Beyond the Health Insurance Portability and Accountability Act: Mixed Methods Study JO - J Med Internet Res SP - e59674 VL - 26 KW - security KW - privacy KW - security breach KW - breach report KW - health care KW - health care infrastructure KW - regulatory KW - law enforcement KW - Omnibus Rule KW - qualitative analysis KW - AI-generated data KW - artificial intelligence KW - difference-in-differences KW - best practice KW - data privacy KW - safe practice N2 - Background: The security and privacy of health care information are crucial for maintaining the societal value of health care as a public good. However, governance over electronic health care data has proven inefficient, despite robust enforcement efforts. Both federal (HIPAA [Health Insurance Portability and Accountability Act]) and state regulations, along with the ombudsman rule, have not effectively reduced the frequency or impact of data breaches in the US health care system. While legal frameworks have bolstered data security, recent years have seen a concerning increase in breach incidents. This paper investigates common breach types and proposes best practices derived from the data as potential solutions. 
Objective: The primary aim of this study is to analyze health care and hospital breach data, comparing it against HIPAA compliance levels across states (spatial analysis) and the impact of the Omnibus Rule over time (temporal analysis). The goal is to establish guidelines for best practices in handling sensitive information within hospitals and clinical environments. Methods: The study used data from the Department of Health and Human Services on reported breaches, assessing the severity and impact of each breach type. We then analyzed secondary data to examine whether HIPAA's storage and retention rule amendments have influenced security and privacy incidents across all 50 states. Finally, we conducted a qualitative analysis of textual data from vulnerability and breach reports to identify actionable best practices for health care settings. Results: Our findings indicate that hacking or IT incidents have the most significant impact on the number of individuals affected, highlighting this as a primary breach category. The overall difference-in-differences trend reveals no significant reduction in breach rates (P=.50), despite state-level regulations exceeding HIPAA requirements and the introduction of the ombudsman rule. This persistence in breach trends implies that even strengthened protections and additional guidelines have not effectively curbed the rising number of affected individuals. Through qualitative analysis, we identified 15 unique values and associated best practices from industry standards. Conclusions: Combining quantitative and qualitative insights, we propose the "SecureSphere framework" to enhance data security in health care institutions. This framework presents key security values structured in concentric circles: core values at the center and peripheral values around them. The core values include employee management, policy, procedures, and IT management. 
Peripheral values encompass the remaining security attributes that support these core elements. This structured approach provides a comprehensive security strategy for protecting patient health information and is designed to help health care organizations develop sustainable practices for data security. UR - https://www.jmir.org/2024/1/e59674 UR - http://dx.doi.org/10.2196/59674 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59674 ER - TY - JOUR AU - Brehmer, Alexander AU - Sauer, Martin Christopher AU - Salazar Rodríguez, Jayson AU - Herrmann, Kelsey AU - Kim, Moon AU - Keyl, Julius AU - Bahnsen, Hendrik Fin AU - Frank, Benedikt AU - Köhrmann, Martin AU - Rassaf, Tienush AU - Mahabadi, Amir-Abbas AU - Hadaschik, Boris AU - Darr, Christopher AU - Herrmann, Ken AU - Tan, Susanne AU - Buer, Jan AU - Brenner, Thorsten AU - Reinhardt, Christian Hans AU - Nensa, Felix AU - Gertz, Michael AU - Egger, Jan AU - Kleesiek, Jens PY - 2024/10/31 TI - Establishing Medical Intelligence – Leveraging Fast Healthcare Interoperability Resources to Improve Clinical Management: Retrospective Cohort and Clinical Implementation Study JO - J Med Internet Res SP - e55148 VL - 26 KW - clinical informatics KW - FHIR KW - real-world evidence KW - medical intelligence KW - interoperability KW - data exchange KW - clinical management KW - clinical decision-making KW - electronic health records KW - quality of care KW - quality improvement N2 - Background: FHIR (Fast Healthcare Interoperability Resources) has been proposed to enable health data interoperability. So far, its applicability has been demonstrated for selected research projects with limited data. Objective: This study aimed to design and implement a conceptual medical intelligence framework to leverage real-world care data for clinical decision-making. 
Methods: A Python package for the use of multimodal FHIR data (FHIRPACK [FHIR Python Analysis Conversion Kit]) was developed and pioneered in 5 real-world clinical use cases, that is, myocardial infarction, stroke, diabetes, sepsis, and prostate cancer. Patients were identified based on the ICD-10 (International Classification of Diseases, Tenth Revision) codes, and outcomes were derived from laboratory tests, prescriptions, procedures, and diagnostic reports. Results were provided as browser-based dashboards. Results: For 2022, a total of 1,302,988 patient encounters were analyzed. (1) Myocardial infarction: in 72.7% (261/359) of cases, medication regimens fulfilled guideline recommendations. (2) Stroke: out of 1277 patients, 165 received thrombolysis and 108 thrombectomy. (3) Diabetes: in 443,866 serum glucose and 16,180 glycated hemoglobin A1c measurements from 35,494 unique patients, the prevalence of dysglycemic findings was 39% (13,887/35,494). Among those with dysglycemia, diagnosis was coded in 44.2% (6138/13,887) of the patients. (4) Sepsis: in 1803 patients, Staphylococcus epidermidis was the primarily isolated pathogen (773/2672, 28.9%) and piperacillin and tazobactam was the primarily prescribed antibiotic (593/1593, 37.2%). (5) Prostate cancer (PC): out of 54 patients who received radical prostatectomy, 3 were identified as cases with prostate-specific antigen persistence or biochemical recurrence. Conclusions: Leveraging FHIR data through large-scale analytics can enhance health care quality and improve patient outcomes across 5 clinical specialties. We identified (1) patients with sepsis requiring less broad antibiotic therapy, (2) patients with myocardial infarction who could benefit from statin and antiplatelet therapy, (3) patients who had a stroke with longer than recommended times to intervention, (4) patients with hyperglycemia who could benefit from specialist referral, and (5) patients with PC with early increases in cancer markers. 
UR - https://www.jmir.org/2024/1/e55148 UR - http://dx.doi.org/10.2196/55148 UR - http://www.ncbi.nlm.nih.gov/pubmed/39240144 ID - info:doi/10.2196/55148 ER - TY - JOUR AU - Wang, Xuan AU - Plantinga, M. Anna AU - Xiong, Xin AU - Cromer, J. Sara AU - Bonzel, Clara-Lea AU - Panickan, Vidul AU - Duan, Rui AU - Hou, Jue AU - Cai, Tianxi PY - 2024/10/22 TI - Comparing Insulin Against Glucagon-Like Peptide-1 Receptor Agonists, Dipeptidyl Peptidase-4 Inhibitors, and Sodium-Glucose Cotransporter 2 Inhibitors on 5-Year Incident Heart Failure Risk for Patients With Type 2 Diabetes Mellitus: Real-World Evidence Study Using Insurance Claims JO - JMIR Diabetes SP - e58137 VL - 9 KW - type 2 diabetes mellitus KW - diabetes KW - diabetes complications KW - heart failure KW - antidiabetic drug KW - diabetes pharmacotherapy KW - insulin KW - GLP-1 RA KW - DPP-4I KW - SGLT2I KW - real-world data KW - insurance data KW - claims data KW - glucagon-like peptide-1 receptor agonist KW - dipeptidyl peptidase-4 inhibitor KW - sodium-glucose cotransporter 2 inhibitor N2 - Background: Type 2 diabetes mellitus (T2DM) is a common health issue, with heart failure (HF) being a common and lethal long-term complication. Although insulin is widely used for the treatment of T2DM, evidence regarding the efficacy of insulin compared to noninsulin therapies on incident HF risk is missing among randomized controlled trials. Real-world evidence on insulin's effect on long-term HF risk may supplement existing guidelines on the management of T2DM. Objective: This study aimed to compare insulin therapy against other medications on HF risk among patients with T2DM using real-world data extracted from insurance claims. Methods: A retrospective, observational study was conducted based on insurance claims data from a single health care network. The study period was from January 1, 2016, to August 11, 2021. The cohort was defined as patients having a T2DM diagnosis code. 
The inclusion criteria were patients who had at least 1 record of a glycated hemoglobin laboratory test result; full insurance for at least 1 year (either commercial or Medicare Part D); and received glucose-lowering therapy belonging to 1 of the following groups: insulin, glucagon-like peptide 1 receptor agonists (GLP-1 RAs), dipeptidyl peptidase-4 inhibitors (DPP-4Is), or sodium-glucose cotransporter-2 inhibitors (SGLT2Is). The main outcome was the 5-year incident HF rate. Baseline covariates, including demographic characteristics, comorbidities, and laboratory test results, were adjusted to correct for confounding. Results: After adjusting for a broad list of confounders, patients receiving insulin had an 11.8% (95% CI 11.0%-12.7%), 12.0% (95% CI 11.5%-12.4%), and 15.1% (95% CI 14.3%-16.0%) higher 5-year HF rate compared to those using GLP-1 RAs, DPP-4Is, and SGLT2Is, respectively. Subgroup analysis showed that the association of insulin with a higher HF rate was significant in the subgroup with high HF risk but not significant in the subgroup with low HF risk. Conclusions: This study generated real-world evidence on the association of insulin therapy with a higher 5-year HF rate compared to GLP-1 RAs, DPP-4Is, and SGLT2Is based on insurance claims data. These findings also demonstrated the value of real-world data for comparative effectiveness studies to complement established guidelines. On the other hand, the study shares the common limitations of observational studies. Even though high-dimensional confounders are adjusted, remaining confounding may exist and induce bias in the analysis. 
UR - https://diabetes.jmir.org/2024/1/e58137 UR - http://dx.doi.org/10.2196/58137 ID - info:doi/10.2196/58137 ER - TY - JOUR AU - Liu, Shengyu AU - Wang, Anran AU - Xiu, Xiaolei AU - Zhong, Ming AU - Wu, Sizhu PY - 2024/10/17 TI - Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study JO - JMIR Med Inform SP - e59782 VL - 12 KW - natural language processing KW - NLP KW - model evaluation KW - macrofactors KW - medical named entity recognition models N2 - Background: Named entity recognition (NER) models are essential for extracting structured information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, enhancing clinical decision-making and research. Innovations in machine learning, particularly those involving Bidirectional Encoder Representations From Transformers (BERT)-based deep learning and large language models, have significantly advanced NER capabilities. However, their performance varies across medical datasets due to the complexity and diversity of medical terminology. Previous studies have often focused on overall performance, neglecting specific challenges in medical contexts and the impact of macrofactors like lexical composition on prediction accuracy. These gaps hinder the development of optimized NER models for medical applications. Objective: This study aims to meticulously evaluate the performance of various NER models in the context of medical text analysis, focusing on how complex medical terminology affects entity recognition accuracy. Additionally, we explored the influence of macrofactors on model performance, seeking to provide insights for refining NER models and enhancing their reliability for medical applications. 
Methods: This study comprehensively evaluated 7 NER models (hidden Markov models, conditional random fields, BERT for Biomedical Text Mining, Big Transformer Models for Efficient Long-Sequence Attention, Decoding-enhanced BERT with Disentangled Attention, Robustly Optimized BERT Pretraining Approach, and Gemma) across 3 medical datasets: Revised Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), BioCreative V CDR, and Anatomical Entity Mention (AnatEM). The evaluation focused on prediction accuracy, resource use (eg, central processing unit and graphics processing unit use), and the impact of fine-tuning hyperparameters. The macrofactors affecting model performance were also screened using the multilevel factor elimination algorithm. Results: The fine-tuned BERT for Biomedical Text Mining, with balanced resource use, generally achieved the highest prediction accuracy across the Revised JNLPBA and AnatEM datasets, with microaverage (AVG_MICRO) scores of 0.932 and 0.8494, respectively, highlighting its superior proficiency in identifying medical entities. Gemma, fine-tuned using the low-rank adaptation technique, achieved the highest accuracy on the BioCreative V CDR dataset with an AVG_MICRO score of 0.9962 but exhibited variability across the other datasets (AVG_MICRO scores of 0.9088 on the Revised JNLPBA and 0.8029 on AnatEM), indicating a need for further optimization. In addition, our analysis revealed that 2 macrofactors, entity phrase length and the number of entity words in each entity phrase, significantly influenced model performance. Conclusions: This study highlights the essential role of NER models in medical informatics, emphasizing the imperative for model optimization via precise data targeting and fine-tuning. The insights from this study will notably improve clinical decision-making and facilitate the creation of more sophisticated and effective medical NER models. 
UR - https://medinform.jmir.org/2024/1/e59782 UR - http://dx.doi.org/10.2196/59782 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59782 ER - TY - JOUR AU - Yusuf, K. Zainab AU - Dixon, G. William AU - Sharp, Charlotte AU - Cook, Louise AU - Holm, Søren AU - Sanders, Caroline PY - 2024/10/15 TI - Building and Sustaining Public Trust in Health Data Sharing for Musculoskeletal Research: Semistructured Interview and Focus Group Study JO - J Med Internet Res SP - e53024 VL - 26 KW - data sharing KW - public trust KW - musculoskeletal KW - marginalized communities KW - underserved communities N2 - Background: Although many people are supportive of their deidentified health care data being used for research, concerns about privacy, safety, and security of health care data remain. There is low awareness about how data are used for research and related governance. Transparency about how health data are used for research is crucial for building public trust. One proposed solution is to ensure that affected communities are notified, particularly marginalized communities where there has previously been a lack of engagement and mistrust. Objective: This study aims to explore patient and public perspectives on the use of deidentified data from electronic health records for musculoskeletal research and to explore ways to build and sustain public trust in health data sharing for a research program (known as "the Data Jigsaw") piloting new ways of using and analyzing electronic health data. Views and perspectives about how best to engage with local communities informed the development of a public notification campaign about the research. Methods: Qualitative data were generated from 20 semistructured interviews and 8 focus groups, comprising 48 participants in total with musculoskeletal conditions or symptoms, including 3 carers. 
A presentation about the use of health data for research and examples from the specific research projects within the program were used to trigger discussion. We worked in partnership with a patient and public involvement group throughout the research and cofacilitated wider community engagement. Results: Respondents were supportive of their health care data being shared for research purposes, but there was low awareness about how electronic health records are used for research. Security and governance concerns about data sharing were noted, including collaborations with external companies and accessing social care records. Project examples from the Data Jigsaw program were viewed positively after respondents knew more about how their data were being used to improve patient care. A range of different methods to build and sustain trust were deemed necessary by participants. Information was requested about: data management; individuals with access to the data (including any collaboration with external companies); the National Health Service's national data opt-out; and research outcomes. It was considered important to enable in-person dialogue with affected communities in addition to other forms of information. Conclusions: The findings have emphasized the need for transparency and awareness about health data sharing for research, and the value of tailoring this to reflect current and local research where residents might feel more invested in the focus of research and the use of local records. Thus, the provision for targeted information within affected communities with accessible messages and community-based dialogue could help to build and sustain public trust. These findings can also be extrapolated to other conditions beyond musculoskeletal conditions, making the findings relevant to a much wider community. 
UR - https://www.jmir.org/2024/1/e53024 UR - http://dx.doi.org/10.2196/53024 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53024 ER - TY - JOUR AU - Rosenau, Lorenz AU - Gruendner, Julian AU - Kiel, Alexander AU - Köhler, Thomas AU - Schaffer, Bastian AU - Majeed, W. Raphael PY - 2024/10/14 TI - Bridging Data Models in Health Care With a Novel Intermediate Query Format for Feasibility Queries: Mixed Methods Study JO - JMIR Med Inform SP - e58541 VL - 12 KW - feasibility KW - FHIR KW - CQL KW - eligibility criteria KW - clinical research KW - intermediate query format KW - healthcare interoperability KW - cohort definition KW - query KW - queries KW - interoperability KW - interoperable KW - informatics KW - portal KW - portals KW - implementation KW - develop KW - development KW - ontology KW - ontologies KW - JSON N2 - Background: To advance research with clinical data, it is essential to make access to the available data as fast and easy as possible for researchers, which is especially challenging for data from different source systems within and across institutions. Over the years, many research repositories and data standards have been created. One of these is the Fast Healthcare Interoperability Resources (FHIR) standard, used by the German Medical Informatics Initiative (MII) to harmonize and standardize data across university hospitals in Germany. One of the first steps to make these data available is to allow researchers to create feasibility queries to determine the data availability for a specific research question. Given the heterogeneity of different query languages to access different data across and even within standards such as FHIR (eg, CQL and FHIR Search), creating an intermediate query syntax for feasibility queries reduces the complexity of query translation and improves interoperability across different research repositories and query languages. 
Objective: This study describes the creation and implementation of an intermediate query syntax for feasibility queries and how it integrates into the federated German health research portal (Forschungsdatenportal Gesundheit) and the MII. Methods: We analyzed the requirements for feasibility queries and the feasibility tools that are currently available in research repositories. Based on this analysis, we developed an intermediate query syntax that can be easily translated into different research repository-specific query languages. Results: The resulting Clinical Cohort Definition Language (CCDL) for feasibility queries combines inclusion criteria in a conjunctive normal form and exclusion criteria in a disjunctive normal form, allowing for additional filters like time or numerical restrictions. The inclusion and exclusion results are combined via an expression to specify feasibility queries. We defined a JSON schema for the CCDL, generated an ontology, and demonstrated the use and translatability of the CCDL across multiple studies and real-world use cases. Conclusions: We developed and evaluated a structured query syntax for feasibility queries and demonstrated its use in a real-world example as part of a research platform across 39 German university hospitals. 
UR - https://medinform.jmir.org/2024/1/e58541 UR - http://dx.doi.org/10.2196/58541 ID - info:doi/10.2196/58541 ER - TY - JOUR AU - Mao, Lijun AU - Yu, Zhen AU - Lin, Luotao AU - Sharma, Manoj AU - Song, Hualing AU - Zhao, Hailei AU - Xu, Xianglong PY - 2024/10/9 TI - Determinants of Visual Impairment Among Chinese Middle-Aged and Older Adults: Risk Prediction Model Using Machine Learning Algorithms JO - JMIR Aging SP - e59810 VL - 7 KW - visual impairment KW - China KW - middle-aged and elderly adults KW - machine learning KW - prediction model N2 - Background: Visual impairment (VI) is a prevalent global health issue, affecting over 2.2 billion people worldwide, with nearly half of the Chinese population aged 60 years and older being affected. Early detection of high-risk VI is essential for preventing irreversible vision loss among Chinese middle-aged and older adults. While machine learning (ML) algorithms exhibit significant predictive advantages, their application in predicting VI risk among the general middle-aged and older adult population in China remains limited. Objective: This study aimed to predict VI and identify its determinants using ML algorithms. Methods: We used 19,047 participants from 4 waves of the China Health and Retirement Longitudinal Study (CHARLS) that were conducted between 2011 and 2018. To envisage the prevalence of VI, we generated a geographical distribution map. Additionally, we constructed a model using indicators of a self-reported questionnaire, a physical examination, and blood biomarkers as predictors. Multiple ML algorithms, including gradient boosting machine, distributed random forest, the generalized linear model, deep learning, and stacked ensemble, were used for prediction. We plotted receiver operating characteristic and calibration curves to assess the predictive performance. Variable importance analysis was used to identify key predictors. Results: Among all participants, 33.9% (6449/19,047) had VI. 
Qinghai, Chongqing, Anhui, and Sichuan showed the highest VI rates, while Beijing and Xinjiang had the lowest. The generalized linear model, gradient boosting machine, and stacked ensemble achieved acceptable area under curve values of 0.706, 0.710, and 0.715, respectively, with the stacked ensemble performing best. Key predictors included hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, high-density lipoprotein cholesterol, and arthritis or rheumatism. Conclusions: Nearly one-third of middle-aged and older adults in China had VI. The prevalence of VI shows regional variations, but there are no distinct east-west or north-south distribution differences. ML algorithms demonstrate accurate predictive capabilities for VI. The combination of prediction models and variable importance analysis provides valuable insights for the early identification and intervention of VI among Chinese middle-aged and older adults. UR - https://aging.jmir.org/2024/1/e59810 UR - http://dx.doi.org/10.2196/59810 ID - info:doi/10.2196/59810 ER - TY - JOUR AU - Choi, Kyungseon AU - Park, Jun Sang AU - Yoon, Hyuna AU - Choi, Seoyoon AU - Mun, Yongseok AU - Kim, Seok AU - Yoo, Sooyoung AU - Woo, Joon Se AU - Park, Hyung Kyu AU - Na, Junghyun AU - Suh, Sun Hae PY - 2024/10/8 TI - Patient-Centered Economic Burden of Diabetic Macular Edema: Retrospective Cohort Study JO - JMIR Public Health Surveill SP - e56741 VL - 10 KW - diabetic macular edema KW - economic burden KW - cost of illness KW - retrospective cohort study KW - patient-centered care KW - Observational Medical Outcomes Partnership Common Data Model N2 - Background: Diabetic macular edema (DME), a leading cause of blindness, requires treatment with costly drugs, such as anti-vascular endothelial growth factor (VEGF) agents. 
The prolonged use of these effective but expensive drugs results in an incremental economic burden for patients with DME compared with those with diabetes mellitus (DM) without DME. However, there are no studies on the long-term patient-centered economic burden of DME after reimbursement for anti-VEGFs. Objective: This retrospective cohort study aims to estimate the 3-year patient-centered economic burden of DME compared with DM without DME, using the Common Data Model. Methods: We used medical data from 1,903,603 patients (2003-2020), transformed and validated using the Observational Medical Outcomes Partnership Common Data Model from Seoul National University Bundang Hospital. We defined the group with DME as patients aged >18 years with nonproliferative diabetic retinopathy and intravitreal anti-VEGF or steroid prescriptions. As control, we defined the group with DM without DME as patients aged >18 years with DM or diabetic retinopathy without intravitreal anti-VEGF or steroid prescriptions. Propensity score matching, performed using a regularized logistic regression with a Laplace prior, addressed selection bias. We estimated direct medical costs over 3 years categorized into total costs, reimbursement costs, nonreimbursement costs, out-of-pocket costs, and costs covered by insurance, as well as healthcare resource utilization. An exponential conditional model and a count model estimated unbiased incremental patient-centered economic burden using generalized linear models and a zero-inflation model. Results: In a cohort of 454 patients with DME matched with 1640 patients with DM, the economic burden of DME was significantly higher than that of DM, with total costs over 3 years being 2.09 (95% CI 1.78-2.47) times higher. Reimbursement costs were 1.89 (95% CI 1.57-2.28) times higher in the group with DME than with the group with DM, while nonreimbursement costs were 2.54 (95% CI 2.12-3.06) times higher. 
Out-of-pocket costs and costs covered by insurance were also higher by a factor of 2.11 (95% CI 1.58-2.59) and a factor of 2.01 (95% CI 1.85-2.42), respectively. Patients with DME had a significantly higher number of outpatient (1.87-fold) and inpatient (1.99-fold) visits compared with those with DM (P<.001 in all cases). Conclusions: Patients with DME experience a heightened economic burden compared with diabetic patients without DME. The substantial and enduring economic impact observed in real-world settings underscores the need to alleviate patients' burden through preventive measures, effective management, appropriate reimbursement policies, and the development of innovative treatments. Strategies to mitigate the economic impact of DME should include proactive approaches such as expanding anti-VEGF reimbursement criteria, approving and reimbursing cost-effective drugs such as bevacizumab, advocating for proactive eye examinations, and embracing early diagnosis by ophthalmologists facilitated by cutting-edge methodologies such as artificial intelligence for patients with DM. 
UR - https://publichealth.jmir.org/2024/1/e56741 UR - http://dx.doi.org/10.2196/56741 UR - http://www.ncbi.nlm.nih.gov/pubmed/39378098 ID - info:doi/10.2196/56741 ER - TY - JOUR AU - Sang, Hyunji AU - Lee, Hojae AU - Park, Jaeyu AU - Kim, Sunyoung AU - Woo, Geol Ho AU - Koyanagi, Ai AU - Smith, Lee AU - Lee, Sihoon AU - Hwang, You-Cheol AU - Park, Sun Tae AU - Lim, Hyunjung AU - Yon, Keon Dong AU - Rhee, Youl Sang PY - 2024/10/3 TI - Machine Learning-Based Prediction of Neurodegenerative Disease in Patients With Type 2 Diabetes by Derivation and Validation in 2 Independent Korean Cohorts: Model Development and Validation Study JO - J Med Internet Res SP - e56922 VL - 26 KW - machine learning KW - neurodegenerative disease KW - diabetes mellitus KW - prediction KW - AdaBoost N2 - Background: Several machine learning (ML) prediction models for neurodegenerative diseases (NDs) in type 2 diabetes mellitus (T2DM) have recently been developed. However, the predictive power of these models is limited by the lack of multiple risk factors. Objective: This study aimed to assess the validity and use of an ML model for predicting the 3-year incidence of ND in patients with T2DM. Methods: We used data from 2 independent cohorts, the discovery cohort (1 hospital; n=22,311) and the validation cohort (2 hospitals; n=2915), to predict ND. The outcome of interest was the presence or absence of ND at 3 years. We selected different ML-based models with hyperparameter tuning in the discovery cohort and conducted an area under the receiver operating characteristic curve (AUROC) analysis in the validation cohort. Results: The study dataset included 22,311 (discovery) and 2915 (validation) patients with T2DM recruited between 2008 and 2022. ND was observed in 133 (0.6%) and 15 patients (0.5%) in the discovery and validation cohorts, respectively. The AdaBoost model had a mean AUROC of 0.82 (95% CI 0.79-0.85) in the discovery dataset. 
When this result was applied to the validation dataset, the AdaBoost model exhibited the best performance among the models, with an AUROC of 0.83 (accuracy of 78.6%, sensitivity of 78.6%, specificity of 78.6%, and balanced accuracy of 78.6%). The most influential factors in the AdaBoost model were age and cardiovascular disease. Conclusions: This study shows the use and feasibility of ML for assessing the incidence of ND in patients with T2DM and suggests its potential for use in screening patients. Further international studies are required to validate these findings. UR - https://www.jmir.org/2024/1/e56922 UR - http://dx.doi.org/10.2196/56922 UR - http://www.ncbi.nlm.nih.gov/pubmed/39361401 ID - info:doi/10.2196/56922 ER - TY - JOUR AU - Huang, Yanqun AU - Chen, Siyuan AU - Wang, Yongfeng AU - Ou, Xiaohong AU - Yan, Huanhuan AU - Gan, Xin AU - Wei, Zhixiao PY - 2024/10/3 TI - Analyzing Comorbidity Patterns in Patients With Thyroid Disease Using Large-Scale Electronic Medical Records: Network-Based Retrospective Observational Study JO - Interact J Med Res SP - e54891 VL - 13 KW - thyroid disease KW - comorbidity patterns KW - prevalence KW - network analysis KW - electronic medical records N2 - Background: Thyroid disease (TD) is a prominent endocrine disorder that raises global health concerns; however, its comorbidity patterns remain unclear. Objective: This study aims to apply a network-based method to comprehensively analyze the comorbidity patterns of TD using large-scale real-world health data. Methods: In this retrospective observational study, we extracted the comorbidities of adult patients with TD from both private and public data sets. All comorbidities were identified using ICD-10 (International Classification of Diseases, 10th Revision) codes at the 3-digit level, and those with a prevalence greater than 2% were analyzed. Patients were categorized into several subgroups based on sex, age, and disease type. 
A phenotypic comorbidity network (PCN) was constructed, where comorbidities served as nodes and their significant correlations were represented as edges, encompassing all patients with TD and various subgroups. The associations and differences in comorbidities within the PCN of each subgroup were analyzed and compared. The PageRank algorithm was used to identify key comorbidities. Results: The final cohorts included 18,311 and 50,242 patients with TD in the private and public data sets, respectively. Patients with TD demonstrated complex comorbidity patterns, with coexistence relationships differing by sex, age, and type of TD. The number of comorbidities increased with age. The most prevalent TDs were nontoxic goiter, hypothyroidism, hyperthyroidism, and thyroid cancer, while hypertension, diabetes, and lipoprotein metabolism disorders had the highest prevalence and PageRank values among comorbidities. Males and patients with benign TD exhibited a greater number of comorbidities, increased disease diversity, and stronger comorbidity associations compared with females and patients with thyroid cancer. Conclusions: Patients with TD exhibited complex comorbidity patterns, particularly with cardiocerebrovascular diseases and diabetes. The associations among comorbidities varied across different TD subgroups. This study aims to enhance the understanding of comorbidity patterns in patients with TD and improve the integrated management of these individuals. UR - https://www.i-jmr.org/2024/1/e54891 UR - http://dx.doi.org/10.2196/54891 UR - http://www.ncbi.nlm.nih.gov/pubmed/39361379 ID - info:doi/10.2196/54891 ER - TY - JOUR AU - Conderino, Sarah AU - Anthopolos, Rebecca AU - Albrecht, S. Sandra AU - Farley, M. Shannon AU - Divers, Jasmin AU - Titus, R. Andrea AU - Thorpe, E. 
Lorna PY - 2024/10/1 TI - Addressing Information Biases Within Electronic Health Record Data to Improve the Examination of Epidemiologic Associations With Diabetes Prevalence Among Young Adults: Cross-Sectional Study JO - JMIR Med Inform SP - e58085 VL - 12 KW - information bias KW - electronic health record KW - EHR KW - epidemiologic method KW - confounding factor KW - diabetes KW - epidemiology KW - young adult KW - cross-sectional study KW - risk factor KW - asthma KW - race KW - ethnicity KW - diabetic KW - diabetic adult N2 - Background: Electronic health records (EHRs) are increasingly used for epidemiologic research to advance public health practice. However, key variables are susceptible to missing data or misclassification within EHRs, including demographic information or disease status, which could affect the estimation of disease prevalence or risk factor associations. Objective: In this paper, we applied methods from the literature on missing data and causal inference to assess whether we could mitigate information biases when estimating measures of association between potential risk factors and diabetes among a patient population of New York City young adults. Methods: We estimated the odds ratio (OR) for diabetes by race or ethnicity and asthma status using EHR data from NYU Langone Health. Methods from the missing data and causal inference literature were then applied to assess the ability to control for misclassification of health outcomes in the EHR data. We compared EHR-based associations with associations observed from 2 national health surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health and Nutrition Examination Survey, representing traditional public health surveillance systems. 
Results: Observed EHR-based associations between race or ethnicity and diabetes were comparable to health survey-based estimates, but the association between asthma and diabetes was significantly overestimated (OR_EHR 3.01, 95% CI 2.86-3.18 vs OR_BRFSS 1.23, 95% CI 1.09-1.40). Missing data and causal inference methods reduced information biases in these estimates, yielding relative differences from traditional estimates below 50% (OR_MissingData 1.79, 95% CI 1.67-1.92 and OR_Causal 1.42, 95% CI 1.34-1.51). Conclusions: Findings suggest that without bias adjustment, EHR analyses may yield biased measures of association, driven in part by subgroup differences in health care use. However, applying missing data or causal inference frameworks can help control for and, importantly, characterize residual information biases in these estimates. UR - https://medinform.jmir.org/2024/1/e58085 UR - http://dx.doi.org/10.2196/58085 ID - info:doi/10.2196/58085 ER - TY - JOUR AU - Prakash, Ravi AU - Dupre, E. Matthew AU - Østbye, Truls AU - Xu, Hanzhang PY - 2024/9/24 TI - Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study JO - JMIR Aging SP - e57926 VL - 7 KW - electronic health record KW - EHR KW - electric medical record KW - EMR KW - patient record KW - health record KW - personal health record KW - PHR KW - unstructured data KW - rule based analysis KW - artificial intelligence KW - AI KW - large language model KW - LLM KW - natural language processing KW - NLP KW - deep learning KW - Alzheimer's disease and related dementias KW - AD KW - ADRD KW - Alzheimer's disease KW - dementia KW - geriatric syndromes N2 - Background: The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). 
Although this information is important for clinical monitoring and decision-making, it is often undocumented or "hidden" in unstructured text fields and not readily available for clinicians to act upon. Objective: We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data. Methods: We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini-Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, "mild dementia" and "advanced Alzheimer disease"). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we used rule-based matching in Python using standard open-source data analysis libraries to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm. Results: We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93% (2090/9115) of patients were documented with mild ADRD, 20.87% (1902/9115) were documented with moderate or severe ADRD, and 56.20% (5123/9115) did not have any documentation of the severity of their ADRD. 
For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95%, specificity of >95%, sensitivity of >90%, and an F1-score of >83%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well with an accuracy of >91%, specificity of >80%, sensitivity of >88%, and F1-score of >92%. Compared with patients with mild ADRD, those with more advanced ADRD tended to be older, more likely to be female and Black, and more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence. Conclusions: Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems. 
UR - https://aging.jmir.org/2024/1/e57926 UR - http://dx.doi.org/10.2196/57926 UR - http://www.ncbi.nlm.nih.gov/pubmed/39316421 ID - info:doi/10.2196/57926 ER - TY - JOUR AU - Brahma, Arindam AU - Chatterjee, Samir AU - Seal, Kala AU - Fitzpatrick, Ben AU - Tao, Youyou PY - 2024/9/24 TI - Development of a Cohort Analytics Tool for Monitoring Progression Patterns in Cardiovascular Diseases: Advanced Stochastic Modeling Approach JO - JMIR Med Inform SP - e59392 VL - 12 KW - healthcare analytics KW - eHealth KW - disease monitoring KW - cardiovascular disease KW - disease progression model KW - myocardial KW - stroke KW - decision support KW - continuous-time Markov chain model KW - stochastic model KW - stochastic KW - Markov KW - cardiology KW - cardiovascular KW - heart KW - monitoring KW - progression N2 - Background: The World Health Organization (WHO) reported that cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs are chronic, with complex progression patterns involving episodes of comorbidities and multimorbidities. When dealing with chronic diseases, physicians often adopt a "watchful waiting" strategy, and actions are postponed until more information is available. Population-level transition probabilities and progression patterns can be revealed by applying time-variant stochastic modeling methods to longitudinal patient data from cohort studies. Inputs from CVD practitioners indicate that tools to generate and visualize cohort transition patterns have many impactful clinical applications. The resultant computational model can be embedded in digital decision support tools for clinicians. However, to date, no study has attempted to accomplish this for CVDs. 
Objective: This study aims to apply advanced stochastic modeling methods to uncover the transition probabilities and progression patterns from longitudinal episodic data of patient cohorts with CVD and thereafter use the computational model to build a digital clinical cohort analytics artifact demonstrating the actionability of such models. Methods: Our data were sourced from 9 epidemiological cohort studies by the National Heart, Lung, and Blood Institute and comprised chronological records of 1274 patients associated with 4839 CVD episodes across 16 years. We then used the continuous-time Markov chain method to develop our model, which offers a robust approach to time-variant transitions between disease states in chronic diseases. Results: Our study presents time-variant transition probabilities of CVD state changes, revealing patterns of CVD progression over time. We found that the transition from myocardial infarction (MI) to stroke has the fastest transition rate (mean transition time 3, SD 0 days, because only 1 patient had an MI-to-stroke transition in the dataset), and the transition from MI to angina is the slowest (mean transition time 1457, SD 1449 days). Congestive heart failure is the most probable first episode (371/840, 44.2%), followed by stroke (216/840, 25.7%). The resultant artifact is actionable as it can act as an eHealth cohort analytics tool, helping physicians gain insights into treatment and intervention strategies. Through expert panel interviews and surveys, we found 9 application use cases of our model. Conclusions: Past research does not provide actionable cohort-level decision support tools based on a comprehensive, 10-state, continuous-time Markov chain model to unveil complex CVD progression patterns from real-world patient data and support clinical decision-making. This paper aims to address this crucial limitation. 
Our stochastic model-embedded artifact can help clinicians in efficient disease monitoring and intervention decisions, guided by objective data-driven insights from real patient data. Furthermore, the proposed model can unveil progression patterns of any chronic disease of interest by inputting only 3 data elements: a synthetic patient identifier, episode name, and episode time in days from a baseline date. UR - https://medinform.jmir.org/2024/1/e59392 UR - http://dx.doi.org/10.2196/59392 UR - http://www.ncbi.nlm.nih.gov/pubmed/39316426 ID - info:doi/10.2196/59392 ER - TY - JOUR AU - Tabari, Parinaz AU - Costagliola, Gennaro AU - De Rosa, Mattia AU - Boeker, Martin PY - 2024/9/24 TI - State-of-the-Art Fast Healthcare Interoperability Resources (FHIR)-Based Data Model and Structure Implementations: Systematic Scoping Review JO - JMIR Med Inform SP - e58445 VL - 12 KW - data model KW - Fast Healthcare Interoperability Resources KW - FHIR KW - interoperability KW - modeling KW - PRISMA N2 - Background: Data models are crucial for clinical research as they enable researchers to fully use the vast amount of clinical data stored in medical systems. Standardized data and well-defined relationships between data points are necessary to guarantee semantic interoperability. Using the Fast Healthcare Interoperability Resources (FHIR) standard for clinical data representation would be a practical methodology to enhance and accelerate interoperability and data availability for research. Objective: This research aims to provide a comprehensive overview of the state-of-the-art and current landscape in FHIR-based data models and structures. In addition, we intend to identify and discuss the tools, resources, limitations, and other critical aspects mentioned in the selected research papers. 
Methods: To ensure the extraction of reliable results, we followed the instructions of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. We analyzed the indexed articles in PubMed, Scopus, Web of Science, IEEE Xplore, the ACM Digital Library, and Google Scholar. After identifying, extracting, and assessing the quality and relevance of the articles, we synthesized the extracted data to identify common patterns, themes, and variations in the use of FHIR-based data models and structures across different studies. Results: On the basis of the reviewed articles, we could identify 2 main themes: dynamic (pipeline-based) and static data models. The articles were also categorized into health care use cases, including chronic diseases, COVID-19 and infectious diseases, cancer research, acute or intensive care, random and general medical notes, and other conditions. Furthermore, we summarized the important or common tools and approaches of the selected papers. These items included FHIR-based tools and frameworks, machine learning approaches, and data storage and security. The most common resource was "Observation," followed by "Condition" and "Patient." The limitations and challenges of developing data models were categorized based on the issues of data integration, interoperability, standardization, performance, and scalability or generalizability. Conclusions: FHIR serves as a highly promising interoperability standard for developing real-world health care apps. The implementation of FHIR modeling for electronic health record data facilitates the integration, transmission, and analysis of data while also advancing translational research and phenotyping. Generally, FHIR-based exports of local data repositories improve data interoperability for systems and data warehouses across different settings. 
However, ongoing efforts to address existing limitations and challenges are essential for the successful implementation and integration of FHIR data models. UR - https://medinform.jmir.org/2024/1/e58445 UR - http://dx.doi.org/10.2196/58445 UR - http://www.ncbi.nlm.nih.gov/pubmed/39316433 ID - info:doi/10.2196/58445 ER - TY - JOUR AU - Stanhope, Victoria AU - Yoo, Nari AU - Matthews, Elizabeth AU - Baslock, Daniel AU - Hu, Yuanyuan PY - 2024/9/20 TI - The Impact of Collaborative Documentation on Person-Centered Care: Textual Analysis of Clinical Notes JO - JMIR Med Inform SP - e52678 VL - 12 KW - person-centered care KW - collaborative documentation KW - natural language processing KW - concurrent documentation KW - clinical documentations KW - visit notes KW - community KW - health center KW - mental health center KW - textual analysis KW - clinical informatics KW - behavioral health KW - mental health KW - linguistic KW - linguistic inquiry KW - dictionary-based KW - sentence fragment KW - psychology KW - psychological KW - clinical information KW - decision-making KW - mental health services KW - clinical notes KW - NLP N2 - Background: Collaborative documentation (CD) is a behavioral health practice involving shared writing of clinic visit notes by providers and consumers. Despite widespread dissemination of CD, research on its effectiveness or impact on person-centered care (PCC) has been limited. Principles of PCC planning, a recovery-based approach to service planning that operationalizes PCC, can inform the measurement of person-centeredness within clinical documentation. Objective: This study aims to use the clinical informatics approach of natural language processing (NLP) to examine the impact of CD on person-centeredness in clinic visit notes. Using a dictionary-based approach, this study conducts a textual analysis of clinic notes from a community mental health center before and after staff were trained in CD. 
Methods: This study used visit notes (n=1981) from 10 providers in a community mental health center 6 months before and after training in CD. LIWC-22 was used to assess all notes using the Linguistic Inquiry and Word Count (LIWC) dictionary, which categorizes over 5000 linguistic and psychological words. Twelve LIWC categories were selected and mapped onto PCC planning principles through the consensus of 3 domain experts. The LIWC-22 contextualizer was used to extract sentence fragments from notes corresponding to LIWC categories. Then, fixed-effects modeling was used to identify differences in notes before and after CD training while accounting for nesting within the provider. Results: Sentence fragments identified by the contextualizing process illustrated how visit notes demonstrated PCC. The fixed effects analysis found a significant positive shift toward person-centeredness; this was observed in 6 of the selected LIWC categories after CD training. Specifically, there was a notable increase in words associated with achievement (β=.774, P<.001), power (β=.831, P<.001), money (β=.204, P<.001), and physical health (β=.427, P=.03), while leisure words decreased (β=-.166, P=.002). Conclusions: By using a dictionary-based approach, the study identified how CD might influence the integration of PCC principles within clinical notes. Although the results were mixed, the findings highlight the potential effectiveness of CD in enhancing person-centeredness in clinic notes. By leveraging NLP techniques, this research illuminated the value of narrative clinical notes in assessing the quality of care in behavioral health contexts. These findings underscore the promise of NLP for quality assurance in health care settings and emphasize the need for refining algorithms to more accurately measure PCC. UR - https://medinform.jmir.org/2024/1/e52678 UR - http://dx.doi.org/10.2196/52678 ID - info:doi/10.2196/52678 ER - TY - JOUR AU - Jones, S. Brie AU - DeWitt, E. Michael AU - Wenner, J. 
Jennifer AU - Sanders, W. John PY - 2024/9/12 TI - Lyme Disease Under-Ascertainment During the COVID-19 Pandemic in the United States: Retrospective Study JO - JMIR Public Health Surveill SP - e56571 VL - 10 KW - surveillance KW - ascertainment KW - Lyme diseases KW - vector-borne diseases KW - vector-borne disease KW - vector-borne pathogens KW - public health KW - Lyme disease KW - United States KW - North Carolina KW - COVID-19 KW - pandemic KW - hospital KW - hospitals KW - clinic-based KW - surveillance program KW - geospatial model KW - spatiotemporal N2 - Background: The COVID-19 pandemic resulted in a massive disruption in access to care and thus passive, hospital- and clinic-based surveillance programs. In 2020, the reported cases of Lyme disease were the lowest both across the United States and North Carolina in recent years. During this period, human contact patterns began to shift with higher rates of greenspace utilization and outdoor activities, putting more people into contact with potential vectors and associated vector-borne diseases. Lyme disease reporting relies on passive surveillance systems, which were likely disrupted by changes in health care-seeking behavior during the pandemic. Objective: This study aimed to quantify the likely under-ascertainment of cases of Lyme disease during the COVID-19 pandemic in the United States and North Carolina. Methods: We fitted publicly available, reported Lyme disease cases for both the United States and North Carolina prior to the year 2020 to predict the number of anticipated Lyme disease cases in the absence of the pandemic using a Bayesian modeling approach. We then compared the ratio of reported cases divided by the predicted cases to quantify the number of likely under-ascertained cases. We then fitted geospatial models to further quantify the spatial distribution of the likely under-ascertained cases and characterize spatial dynamics at local scales. 
Results: Reported cases of Lyme disease were lower in 2020 in both the United States and North Carolina than in prior years. Our findings suggest that roughly 14,200 cases may have gone undetected given historical trends prior to the pandemic. Furthermore, we estimate that only 40% to 80% of Lyme disease cases were detected in North Carolina between August 2020 and February 2021, the peak months of the COVID-19 pandemic in both the United States and North Carolina, with ascertainment rates returning to normal levels after this period. Our models suggest both strong temporal effects, with higher numbers of cases reported in the summer months, as well as strong geographic effects. Conclusions: Ascertainment rates of Lyme disease were highly variable during the pandemic period at both national and subnational scales. Our findings suggest that there may have been a substantial number of unreported Lyme disease cases despite an apparent increase in greenspace utilization. The use of counterfactual modeling using spatial and historical trends can provide insight into the likely numbers of missed cases. Variable ascertainment of cases has implications for passive surveillance programs, especially in the trending of disease morbidity and outbreak detection, suggesting that other methods may be appropriate for outbreak detection during disturbances to these passive surveillance systems. UR - https://publichealth.jmir.org/2024/1/e56571 UR - http://dx.doi.org/10.2196/56571 ID - info:doi/10.2196/56571 ER - TY - JOUR AU - Zheng, Chengyi AU - Ackerson, Bradley AU - Qiu, Sijia AU - Sy, S. Lina AU - Daily, Vega Leticia I. AU - Song, Jeannie AU - Qian, Lei AU - Luo, Yi AU - Ku, H. 
Jennifer AU - Cheng, Yanjun AU - Wu, Jun AU - Tseng, Fu Hung PY - 2024/9/10 TI - Natural Language Processing Versus Diagnosis Code-Based Methods for Postherpetic Neuralgia Identification: Algorithm Development and Validation JO - JMIR Med Inform SP - e57949 VL - 12 KW - postherpetic neuralgia KW - herpes zoster KW - natural language processing KW - electronic health record KW - real-world data KW - artificial intelligence KW - development KW - validation KW - diagnosis KW - EHR KW - algorithm KW - EHR data KW - sensitivity KW - specificity KW - validation data KW - neuralgia KW - recombinant zoster vaccine N2 - Background: Diagnosis codes and prescription data are used in algorithms to identify postherpetic neuralgia (PHN), a debilitating complication of herpes zoster (HZ). Because of the questionable accuracy of codes and prescription data, manual chart review is sometimes used to identify PHN in electronic health records (EHRs), which can be costly and time-consuming. Objective: This study aims to develop and validate a natural language processing (NLP) algorithm for automatically identifying PHN from unstructured EHR data and to compare its performance with that of code-based methods. Methods: This retrospective study used EHR data from Kaiser Permanente Southern California, a large integrated health care system that serves over 4.8 million members. The source population included members aged ≥50 years who received an incident HZ diagnosis and accompanying antiviral prescription between 2018 and 2020 and had ≥1 encounter within 90-180 days of the incident HZ diagnosis. The study team manually reviewed the EHR and identified PHN cases. For NLP development and validation, 500 and 800 random samples from the source population were selected, respectively. 
The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, and Matthews correlation coefficient (MCC) of NLP and the code-based methods were evaluated using chart-reviewed results as the reference standard. Results: The NLP algorithm identified PHN cases with a 90.9% sensitivity, 98.5% specificity, 82% PPV, and 99.3% NPV. The composite scores of the NLP algorithm were 0.89 (F-score) and 0.85 (MCC). The prevalences of PHN in the validation data were 6.9% (reference standard), 7.6% (NLP), and 5.4%-13.1% (code-based). The code-based methods achieved a 52.7%-61.8% sensitivity, 89.8%-98.4% specificity, 27.6%-72.1% PPV, and 96.3%-97.1% NPV. The F-scores and MCCs ranged between 0.45 and 0.59 and between 0.32 and 0.61, respectively. Conclusions: The automated NLP-based approach identified PHN cases from the EHR with good accuracy. This method could be useful in population-based PHN research. UR - https://medinform.jmir.org/2024/1/e57949 UR - http://dx.doi.org/10.2196/57949 ID - info:doi/10.2196/57949 ER - TY - JOUR AU - van der Meijden, Lise Siri AU - van Boekel, M. Anna AU - van Goor, Harry AU - Nelissen, GHH Rob AU - Schoones, W. Jan AU - Steyerberg, W. Ewout AU - Geerts, F. Bart AU - de Boer, GJ Mark AU - Arbous, Sesmu M. 
PY - 2024/9/10 TI - Automated Identification of Postoperative Infections to Allow Prediction and Surveillance Based on Electronic Health Record Data: Scoping Review JO - JMIR Med Inform SP - e57195 VL - 12 KW - postoperative infections KW - surveillance KW - prediction KW - surgery KW - artificial intelligence KW - chart review KW - electronic health record KW - scoping review KW - postoperative KW - surgical KW - infection KW - infections KW - predictions KW - predict KW - predictive KW - bacterial KW - machine learning KW - record KW - records KW - EHR KW - EHRs KW - synthesis KW - review methods KW - review methodology KW - search KW - searches KW - searching KW - scoping N2 - Background: Postoperative infections remain a crucial challenge in health care, resulting in high morbidity, mortality, and costs. Accurate identification and labeling of patients with postoperative bacterial infections is crucial for developing prediction models, validating biomarkers, and implementing surveillance systems in clinical practice. Objective: This scoping review aimed to explore methods for identifying patients with postoperative infections using electronic health record (EHR) data to go beyond the reference standard of manual chart review. Methods: We performed a systematic search strategy across PubMed, Embase, Web of Science (Core Collection), the Cochrane Library, and Emcare (Ovid), targeting studies addressing the prediction and fully automated surveillance (ie, without manual check) of diverse bacterial infections in the postoperative setting. For prediction modeling studies, we assessed the labeling methods used, categorizing them as either manual or automated. We evaluated the different types of EHR data needed for the surveillance and labeling of postoperative infections, as well as the performance of fully automated surveillance systems compared with manual chart review. 
Results: We identified 75 different methods and definitions used to identify patients with postoperative infections in studies published between 2003 and 2023. Manual labeling was the predominant method in prediction modeling research; 65% (49/75) of the identified methods used structured data, and 45% (34/75) used free text and clinical notes as one of their data sources. Fully automated surveillance systems should be used with caution because the reported positive predictive values are between 0.31 and 0.76. Conclusions: There is currently no evidence to support fully automated labeling and identification of patients with infections based solely on structured EHR data. Future research should focus on establishing uniform definitions, as well as prioritizing the development of more scalable, automated methods for infection detection using structured EHR data. UR - https://medinform.jmir.org/2024/1/e57195 UR - http://dx.doi.org/10.2196/57195 UR - http://www.ncbi.nlm.nih.gov/pubmed/39255011 ID - info:doi/10.2196/57195 ER - TY - JOUR AU - Wen, Andrew AU - Wang, Liwei AU - He, Huan AU - Fu, Sunyang AU - Liu, Sijia AU - Hanauer, A. David AU - Harris, R. Daniel AU - Kavuluru, Ramakanth AU - Zhang, Rui AU - Natarajan, Karthik AU - Pavinkurve, P. Nishanth AU - Hajagos, Janos AU - Rajupet, Sritha AU - Lingam, Veena AU - Saltz, Mary AU - Elowsky, Corey AU - Moffitt, A. Richard AU - Koraishy, M. Farrukh AU - Palchuk, B. Matvey AU - Donovan, Jordan AU - Lingrey, Lora AU - Stone-DerHagopian, Garo AU - Miller, T. Robert AU - Williams, E. Andrew AU - Leese, J. Peter AU - Kovach, I. Paul AU - Pfaff, R. Emily AU - Zemmel, Mikhail AU - Pates, D. Robert AU - Guthe, Nick AU - Haendel, A. Melissa AU - Chute, G.
Christopher AU - Liu, Hongfang PY - 2024/9/9 TI - A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation JO - JMIR Med Inform SP - e49997 VL - 12 KW - natural language processing KW - clinical information extraction KW - clinical phenotyping KW - extract KW - extraction KW - NLP KW - phenotype KW - phenotyping KW - narratives KW - unstructured KW - PASC KW - COVID KW - COVID-19 KW - SARS-CoV-2 KW - OHNLP KW - Open Health Natural Language Processing N2 - Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). Objective: This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. Methods: We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm.
The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. Results: An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. Conclusions: The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement. 
UR - https://medinform.jmir.org/2024/1/e49997 UR - http://dx.doi.org/10.2196/49997 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/49997 ER - TY - JOUR AU - Victoria-Castro, Maria Angela AU - Arora, Tanima AU - Simonov, Michael AU - Biswas, Aditya AU - Alausa, Jameel AU - Subair, Labeebah AU - Gerber, Brett AU - Nguyen, Andrew AU - Hsiao, Allen AU - Hintz, Richard AU - Yamamoto, Yu AU - Soufer, Robert AU - Desir, Gary AU - Wilson, Perry Francis AU - Villanueva, Merceditas PY - 2024/9/3 TI - Promoting Collaborative Scholarship During the COVID-19 Pandemic Through an Innovative COVID-19 Data Explorer and Repository at Yale School of Medicine: Development and Usability Study JO - JMIR Form Res SP - e52120 VL - 8 KW - COVID-19 KW - database KW - data access KW - interdepartmental communication KW - collaborative scholarship KW - clinical data KW - repository KW - researchers KW - large-scale database KW - innovation N2 - Background: The COVID-19 pandemic sparked a surge of research publications spanning epidemiology, basic science, and clinical science. Thanks to the digital revolution, large data sets are now accessible, which also enables real-time epidemic tracking. Despite this, academic faculty and their trainees have been struggling to access comprehensive clinical data. To tackle this issue, we have devised a clinical data repository that streamlines research processes and promotes interdisciplinary collaboration. Objective: This study aimed to present an easily accessible, up-to-date database that promotes access to local COVID-19 clinical data, thereby increasing efficiency, streamlining, and democratizing the research enterprise. By providing a robust database, a broad range of researchers (faculty and trainees) and clinicians from different areas of medicine are encouraged to explore and collaborate on novel clinically relevant research questions.
Methods: A research platform, called the Yale Department of Medicine COVID-19 Explorer and Repository (DOM-CovX), was constructed to house cleaned, highly granular, deidentified, and continually updated data from over 18,000 patients hospitalized with COVID-19 from January 2020 to January 2023, across the Yale New Haven Health System. Data across several key domains were extracted, including demographics, past medical history, laboratory values during hospitalization, vital signs, medications, imaging, procedures, and outcomes. Given the time-varying nature of several data domains, summary statistics were constructed to limit the computational size of the database and provide a reasonable data file that the broader research community could use for basic statistical analyses. The initiative also included a front-end user interface, the DOM-CovX Explorer, for simple data visualization of aggregate data. The detailed clinical data sets were made available for researchers after a review board process. Results: As of January 2023, the DOM-CovX Explorer has received 38 requests from different groups of scientists at Yale and the repository has expanded research capability to a diverse group of stakeholders including clinical and research-based faculty and trainees within 15 different surgical and nonsurgical specialties. A dedicated DOM-CovX team guides access and use of the database, which has enhanced interdepartmental collaborations, resulting in the publication of 16 peer-reviewed papers, 2 projects available in preprint servers, and 8 presentations in scientific conferences. Currently, the DOM-CovX Explorer continues to expand and improve its interface. The repository includes up to 3997 variables across 7 different clinical domains, with continued growth in response to researchers' requests and data availability.
Conclusions: The DOM-CovX Data Explorer and Repository is a user-friendly tool for analyzing data and accessing a consistently updated, standardized, and large-scale database. Its innovative approach fosters collaboration and diversity of scholarly pursuits, and expands medical education. In addition, it can be applied to other diseases beyond COVID-19. UR - https://formative.jmir.org/2024/1/e52120 UR - http://dx.doi.org/10.2196/52120 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52120 ER - TY - JOUR AU - Valdivieso-Martinez, Bernardo AU - Lopez-Sanchez, Victoria AU - Sauri, Inma AU - Diaz, Javier AU - Calderon, Miguel Jose AU - Gas-Lopez, Eugenia Maria AU - Lidon, Laura AU - Philibert, Juliette AU - Lopez-Hontangas, Luis Jose AU - Navarro, David AU - Cuenca, Llanos AU - Forner, Jose Maria AU - Redon, Josep PY - 2024/9/3 TI - Impact of Long SARS-CoV-2 Omicron Infection on the Health Care Burden: Comparative Case-Control Study Between Omicron and Pre-Omicron Waves JO - JMIR Public Health Surveill SP - e53580 VL - 10 KW - Omicron KW - long COVID KW - post-COVID-19 KW - diagnostics KW - primary care KW - specialist KW - emergency department KW - hospitalization N2 - Background: Following the initial acute phase of COVID-19, health care resource use has escalated among individuals with SARS-CoV-2 infection. Objective: This study aimed to compare new diagnoses of long COVID and the demand for health services in the general population after the Omicron wave with those observed during the pre-Omicron waves, using similar assessment protocols for both periods, and to analyze the influence of vaccination. Methods: This matched retrospective case-control study included patients of both sexes diagnosed with acute SARS-CoV-2 infection using reverse transcription polymerase chain reaction or antigen tests in the hospital microbiology laboratory during the pandemic period regardless of whether the patients were hospitalized.
We included patients of all ages from 2 health care departments that cover 604,000 subjects. The population was stratified into 2 groups: youths (<18 years) and adults (≥18 years). Patients were followed up for 6 months after SARS-CoV-2 infection. Previous vaccination, new diagnoses, and the use of health care resources were recorded. Patients were compared with controls selected using a propensity score matched for age, sex, and the Charlson index. Results: A total of 41,577 patients with a history of prior COVID-19 infection were included, alongside an equivalent number of controls. This cohort encompassed 33,249 (80%) adults aged ≥18 years and 8328 (20%) youths aged <18 years. Our analysis identified 40 new diagnoses during the observation period. The incidence rate per 100 patients over a 6-month period was 27.2 for vaccinated and 25.1 for unvaccinated adults (P=.09), while among youths, the corresponding rates were 25.7 for vaccinated and 36.7 for unvaccinated individuals (P<.001). Overall, the incidence of new diagnoses was notably higher in patients compared to matched controls. Additionally, vaccinated patients exhibited a reduced incidence of new diagnoses, particularly among women (P<.001) and younger patients (P<.001) irrespective of the number of vaccine doses administered and the duration since the last dose. Furthermore, an increase in the use of health care resources was observed in both adult and youth groups, albeit with lower figures noted in vaccinated individuals. In the comparative analysis between the pre-Omicron and Omicron waves, the incidence of new diagnoses was higher in the former; however, distinct patterns of diagnosis were evident. Specifically, depressed mood (P=.03), anosmia (P=.003), hair loss (P<.001), dyspnea (P<.001), chest pain (P=.04), dysmenorrhea (P<.001), myalgia (P=.011), weakness (P<.001), and tachycardia (P=.015) were more common in the pre-Omicron period.
Similarly, health care resource use, encompassing primary care, specialist, and emergency services, was more pronounced in the pre-Omicron wave. Conclusions: The rise in new diagnoses following SARS-CoV-2 infection warrants attention due to its potential implications for health systems, which may necessitate the allocation of supplementary resources. The absence of vaccination protection presents a challenge to the health care system. UR - https://publichealth.jmir.org/2024/1/e53580 UR - http://dx.doi.org/10.2196/53580 UR - http://www.ncbi.nlm.nih.gov/pubmed/39226091 ID - info:doi/10.2196/53580 ER - TY - JOUR AU - Jimenez-Garcia, Rodrigo AU - Lopez-de-Andres, Ana AU - Hernandez-Barrera, Valentin AU - Zamorano-Leon, J. Jose AU - Cuadrado-Corrales, Natividad AU - de Miguel-Diez, Javier AU - del-Barrio, L. Jose AU - Jimenez-Sierra, Ana AU - Carabantes-Alarcon, David PY - 2024/8/27 TI - Hospitalizations for Food-Induced Anaphylaxis Between 2016 and 2021: Population-Based Epidemiologic Study JO - JMIR Public Health Surveill SP - e57340 VL - 10 KW - food-induced anaphylaxis KW - epidemiology KW - hospitalizations KW - in-hospital mortality N2 - Background: Food-induced anaphylaxis (FIA) is a major public health problem resulting in serious clinical complications, emergency department visits, hospitalization, and death. Objective: This study aims to assess the epidemiology and the trends in hospitalizations because of FIA in Spain between 2016 and 2021. Methods: An observational descriptive study was conducted using data from the Spanish National Hospital discharge database. Information was coded based on the International Classification of Diseases, Tenth Revision. The study population was analyzed by gender and age group and according to food triggers, clinical characteristics, admission to the intensive care unit, severity, and in-hospital mortality. 
The annual incidence of hospitalizations because of FIA per 100,000 person-years was estimated and analyzed using Poisson regression models. Multivariable logistic regression models were constructed to identify which variables were associated with severe FIA. Results: A total of 2161 hospital admissions for FIA were recorded in Spain from 2016 to 2021. The overall incidence rate was 0.77 cases per 100,000 person-years. The highest incidence was found in those aged <15 years (3.68), with lower figures among those aged 15 to 59 years (0.25) and ≥60 years (0.29). Poisson regression showed a significant increase in incidence from 2016 to 2021 only among children (3.78 per 100,000 person-years vs 5.02 per 100,000 person-years; P=.04). The most frequent food triggers were "milk and dairy products" (419/2161, 19.39% of cases) and "peanuts or tree nuts and seeds" (409/2161, 18.93%). Of the 2161 patients hospitalized because of FIA, 256 (11.85%) required admission to the intensive care unit, and 11 (0.51%) patients died in the hospital. Among children, the most severe cases of FIA appeared in patients aged 0 to 4 years (40/99, 40%). Among adults, 69.4% (111/160) of cases occurred in those aged 15 to 59 years. Multivariable logistic regression showed the variables associated with severe FIA to be age 15 to 59 years (odds ratio 5.1, 95% CI 3.11-8.36), age ≥60 years (odds ratio 3.87, 95% CI 1.99-7.53), and asthma (odds ratio 1.71, 95% CI 1.12-2.58). Conclusions: In Spain, the incidence of hospitalization because of FIA increased slightly, although the only significant increase (P=.04) was among children. Although in-hospital mortality remains low and stable, the proportion of severe cases is high and has not improved from 2016 to 2021, with older age and asthma being risk factors for severity. Surveillance must be improved, and preventive strategies must be implemented to reduce the burden of FIA.
UR - https://publichealth.jmir.org/2024/1/e57340 UR - http://dx.doi.org/10.2196/57340 UR - http://www.ncbi.nlm.nih.gov/pubmed/38940759 ID - info:doi/10.2196/57340 ER - TY - JOUR AU - Tabaie, Azade AU - Tran, Alberta AU - Calabria, Tony AU - Bennett, S. Sonita AU - Milicia, Arianna AU - Weintraub, William AU - Gallagher, James William AU - Yosaitis, John AU - Schubel, C. Laura AU - Hill, A. Mary AU - Smith, Michelle Kelly AU - Miller, Kristen PY - 2024/8/26 TI - Evaluation of a Natural Language Processing Approach to Identify Diagnostic Errors and Analysis of Safety Learning System Case Review Data: Retrospective Cohort Study JO - J Med Internet Res SP - e50935 VL - 26 KW - diagnostic error KW - electronic health records KW - machine learning KW - natural language processing KW - NLP KW - mortality KW - hospital KW - risk KW - length of stay KW - patient harm KW - diagnostic KW - EHR N2 - Background: Diagnostic errors are an underappreciated cause of preventable mortality in hospitals and pose a risk for severe patient harm and increase hospital length of stay. Objective: This study aims to explore the potential of machine learning and natural language processing techniques in improving diagnostic safety surveillance. We conducted a rigorous evaluation of the feasibility and potential to use electronic health records clinical notes and existing case review data. Methods: Safety Learning System case review data from 1 large health system composed of 10 hospitals in the mid-Atlantic region of the United States from February 2016 to September 2021 were analyzed. The case review outcome included opportunities for improvement including diagnostic opportunities for improvement. To supplement case review data, electronic health record clinical notes were extracted and analyzed. 
A simple logistic regression model along with 3 forms of logistic regression models (ie, Least Absolute Shrinkage and Selection Operator, Ridge, and Elastic Net) with regularization functions was trained on this data to compare classification performances in classifying patients who experienced diagnostic errors during hospitalization. Further, statistical tests were conducted to find significant differences between female and male patients who experienced diagnostic errors. Results: In total, 126 (7.4%) patients (of 1704) had been identified by case reviewers as having experienced at least 1 diagnostic error. Patients who had experienced diagnostic error were grouped by sex: 59 (7.1%) of the 830 women and 67 (7.7%) of the 874 men. Among the patients who experienced a diagnostic error, female patients were older (median 72, IQR 66-80 vs median 67, IQR 57-76; P=.02), had higher rates of being admitted through general or internal medicine (69.5% vs 47.8%; P=.01), lower rates of cardiovascular-related admitted diagnosis (11.9% vs 28.4%; P=.02), and lower rates of being admitted through neurology department (2.3% vs 13.4%; P=.04). The Ridge model achieved the highest area under the receiver operating characteristic curve (0.885), specificity (0.797), positive predictive value (PPV; 0.24), and F1-score (0.369) in classifying patients who were at higher risk of diagnostic errors among hospitalized patients. Conclusions: Our findings demonstrate that natural language processing can be a potential solution to more effectively identifying and selecting potential diagnostic error cases for review and therefore reducing the case review burden. UR - https://www.jmir.org/2024/1/e50935 UR - http://dx.doi.org/10.2196/50935 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/50935 ER - TY - JOUR AU - Kar, Debasish AU - Taylor, S. Kathryn AU - Joy, Mark AU - Venkatesan, Sudhir AU - Meeraus, Wilhelmine AU - Taylor, Sylvia AU - Anand, N. 
Sneha AU - Ferreira, Filipa AU - Jamie, Gavin AU - Fan, Xuejuan AU - de Lusignan, Simon PY - 2024/8/26 TI - Creating a Modified Version of the Cambridge Multimorbidity Score to Predict Mortality in People Older Than 16 Years: Model Development and Validation JO - J Med Internet Res SP - e56042 VL - 26 KW - pandemics KW - COVID-19 KW - multimorbidity KW - prevalence KW - predictive model KW - discrimination KW - calibration KW - systematized nomenclature of medicine KW - computerized medical records KW - systems N2 - Background: No single multimorbidity measure is validated for use in NHS (National Health Service) England's General Practice Extraction Service Data for Pandemic Planning and Research (GDPPR), the nationwide primary care data set created for COVID-19 pandemic research. The Cambridge Multimorbidity Score (CMMS) is a validated tool for predicting mortality risk, with 37 conditions defined by Read Codes. The GDPPR uses the more internationally used Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We previously developed a modified version of the CMMS using SNOMED CT, but the number of terms for the GDPPR data set is limited, making it impossible to use this version. Objective: We aimed to develop and validate a modified version of CMMS using the clinical terms available for the GDPPR. Methods: We used pseudonymized data from the Oxford-Royal College of General Practitioners Research and Surveillance Centre (RSC), which has an extensive SNOMED CT list. From the 37 conditions in the original CMMS model, we selected conditions with either (1) a high prevalence ratio (≥85%), calculated as the prevalence in the RSC data set but using the GDPPR set of SNOMED CT codes, divided by the prevalence included in the RSC SNOMED CT codes, or (2) lower prevalence ratios but high predictive value.
The resulting set of conditions was included in Cox proportional hazard models to determine the 1-year mortality risk in a development data set (n=500,000) and construct a new CMMS model, following the methods for the original CMMS study, with variable reduction and parsimony, achieved by backward elimination and the Akaike information stopping criterion. Model validation involved obtaining 1-year mortality estimates for a synchronous data set (n=250,000) and 1-year and 5-year mortality estimates for an asynchronous data set (n=250,000). We compared the performance with that of the original CMMS and the modified CMMS that we previously developed using RSC data. Results: The initial model contained 22 conditions and our final model included 17 conditions. The conditions overlapped with those of the modified CMMS using the more extensive SNOMED CT list. For 1-year mortality, discrimination was high in both the derivation and validation data sets (Harrell C=0.92) and 5-year mortality was slightly lower (Harrell C=0.90). Calibration was reasonable following an adjustment for overfitting. The performance was similar to that of both the original and previous modified CMMS models. Conclusions: The new modified version of the CMMS can be used on the GDPPR, a nationwide primary care data set of 54 million people, to enable adjustment for multimorbidity in predicting mortality in people in real-world vaccine effectiveness, pandemic planning, and other research studies. It requires 17 variables to produce a comparable performance with our previous modification of CMMS to enable it to be used in routine data using SNOMED CT. 
UR - https://www.jmir.org/2024/1/e56042 UR - http://dx.doi.org/10.2196/56042 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56042 ER - TY - JOUR AU - Liu, Chao AU - Jiao, Yuanshi AU - Su, Licong AU - Liu, Wenna AU - Zhang, Haiping AU - Nie, Sheng AU - Gong, Mengchun PY - 2024/8/20 TI - Effective Privacy Protection Strategies for Pregnancy and Gestation Information From Electronic Medical Records: Retrospective Study in a National Health Care Data Network in China JO - J Med Internet Res SP - e46455 VL - 26 KW - pregnancy KW - electronic medical record KW - privacy protection KW - risk stratification KW - rule-based N2 - Background: Pregnancy and gestation information is routinely recorded in electronic medical record (EMR) systems across China in various data sets. The combination of data on the number of pregnancies and gestations can imply occurrences of abortions and other pregnancy-related issues, which is important for clinical decision-making and personal privacy protection. However, the distribution of this information inside EMRs is variable due to inconsistent IT structures across different EMR systems. A large-scale quantitative evaluation of the potential exposure of this sensitive information has not been previously performed. Ensuring the protection of personal information is a priority, as emphasized in Chinese laws and regulations. Objective: This study aims to perform the first nationwide quantitative analysis of the identification sites and exposure frequency of sensitive pregnancy and gestation information. The goal is to propose strategies for effective information extraction and privacy protection related to women's health. Methods: This study was conducted in a national health care data network. Rule-based protocols for extracting pregnancy and gestation information were developed by a committee of experts. A total of 6 different sub-data sets of EMRs were used as schemas for data analysis and strategy proposal.
The identification sites and frequencies of identification in different sub-data sets were calculated. Manual quality inspections of the extraction process were performed by 2 independent groups of reviewers on 1000 randomly selected records. Based on these statistics, strategies for effective information extraction and privacy protection were proposed. Results: The data network covered hospitalized patients from 19 hospitals in 10 provinces of China, encompassing 15,245,055 patients over an 11-year period (January 1, 2010-December 12, 2020). Among women aged 14-50 years, 70% were randomly selected from each hospital, resulting in a total of 1,110,053 patients. Of these, 688,268 female patients with sensitive reproductive information were identified. The frequencies of identification were variable, with the marriage history in admission medical records being the most frequent at 63.24%. Notably, more than 50% of female patients were identified with pregnancy and gestation history in nursing records, which is not generally considered a sub-data set rich in reproductive information. During the manual curation and review process, 1000 cases were randomly selected, and the precision and recall rates of the information extraction method both exceeded 99.5%. The privacy-protection strategies were designed with clear technical directions. Conclusions: Significant amounts of critical information related to women's health are recorded in Chinese routine EMR systems and are distributed in various parts of the records with different frequencies. This requires a comprehensive protocol for extracting and protecting the information, which has been demonstrated to be technically feasible. Implementing a data-based strategy will enhance the protection of women's privacy and improve the accessibility of health care services.
UR - https://www.jmir.org/2024/1/e46455 UR - http://dx.doi.org/10.2196/46455 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/46455 ER - TY - JOUR AU - Tanaka, L. Hideaki AU - Rees, R. Judy AU - Zhang, Ziyin AU - Ptak, A. Judy AU - Hannigan, M. Pamela AU - Silverman, M. Elaine AU - Peacock, L. Janet AU - Buckey, C. Jay PY - 2024/8/20 TI - Emerging Indications for Hyperbaric Oxygen Treatment: Registry Cohort Study JO - Interact J Med Res SP - e53821 VL - 13 KW - hyperbaric oxygen KW - inflammatory bowel disease KW - calciphylaxis KW - post-COVID-19 condition KW - PCC KW - postacute sequelae of COVID-19 KW - PASC KW - infected implanted hardware KW - hypospadias KW - frostbite KW - facial filler KW - pyoderma gangrenosum N2 - Background: Hyperbaric oxygen (HBO2) treatment is used across a range of medical specialties for a variety of applications, particularly where hypoxia and inflammation are important contributors. Because of its hypoxia-relieving and anti-inflammatory effects, HBO2 may be useful for new indications not currently approved by the Undersea and Hyperbaric Medical Society. Identifying these new applications for HBO2 is difficult because individual centers may only treat a few cases and not track the outcomes consistently. The web-based International Multicenter Registry for Hyperbaric Oxygen Therapy captures prospective outcome data for patients treated with HBO2 therapy. These data can then be used to identify new potential applications for HBO2, which has relevance for a range of medical specialties. Objective: Although hyperbaric medicine has established indications, new ones continue to emerge. One objective of this registry study was to identify cases where HBO2 has been used for conditions falling outside of current Undersea and Hyperbaric Medical Society-approved indications and present outcome data for them.
Methods: This descriptive study used data from a web-based, multicenter, international registry of patients treated with HBO2. Participating centers agree to collect data on all patients treated using standard outcome measures, and individual centers send deidentified data to the central registry. HBO2 treatment programs in the United States, the United Kingdom, and Australia participate. Demographic, outcome, complication, and treatment data, including pre- and posttreatment quality of life questionnaires (EQ-5D-5L), were collected for individuals referred for HBO2 treatment. Results: Out of 9726 patient entries, 378 (3.89%) individuals were treated for 45 emerging indications. Post-COVID-19 condition (PCC; also known as postacute sequelae of COVID-19; 149/378, 39.4%), ulcerative colitis (47/378, 12.4%), and Crohn disease (40/378, 10.6%) accounted for 62.4% (n=236) of the total cases. Calciphylaxis (20/378, 5.3%), frostbite (18/378, 4.8%), and peripheral vascular disease-related wounds (12/378, 3.2%) accounted for a further 13.2% (n=50). Patients with PCC reported significant improvement on the Neurobehavioral Symptom Inventory (NSI score: pretreatment=30.6; posttreatment=14.4; P<.001). Patients with Crohn disease reported significantly improved quality of life (EQ-5D score: pretreatment=53.8; posttreatment=68.8), and 5 (13%) reported closing a fistula. Patients with ulcerative colitis and complete pre- and post-HBO2 data reported improved quality of life and lower scores on a bowel questionnaire examining frequency, blood, pain, and urgency. A subset of patients with calciphylaxis and arterial ulcers also reported improvement. Conclusions: HBO2 is being used for a wide range of possible applications across various medical specialties for its hypoxia-relieving and anti-inflammatory effects. Results show statistically significant improvements in patient-reported outcomes for inflammatory bowel disease and PCC.
HBO2 is also being used for frostbite, pyoderma gangrenosum, pterygium, hypospadias repair, and facial filler procedures. Other indications show evidence for improvement, and the case series for all indications is growing in the registry. International Registered Report Identifier (IRRID): RR2-10.2196/18857 UR - https://www.i-jmr.org/2024/1/e53821 UR - http://dx.doi.org/10.2196/53821 UR - http://www.ncbi.nlm.nih.gov/pubmed/39078624 ID - info:doi/10.2196/53821 ER - TY - JOUR AU - Kamdje Wabo, Gaetan AU - Moorthy, Preetha AU - Siegel, Fabian AU - Seuchter, A. Susanne AU - Ganslandt, Thomas PY - 2024/8/19 TI - Evaluating and Enhancing the Fitness-for-Purpose of Electronic Health Record Data: Qualitative Study on Current Practices and Pathway to an Automated Approach Within the Medical Informatics for Research and Care in University Medicine Consortium JO - JMIR Med Inform SP - e57153 VL - 12 KW - data quality KW - fitness-for-purpose KW - secondary use KW - thematic analysis KW - EHR data KW - electronic health record KW - data integration center KW - Medical Informatics Initiative KW - MIRACUM consortium KW - Medical Informatics for Research and Care in University Medicine KW - data science KW - integration KW - data use KW - visualization KW - visualizations KW - record KW - records KW - EHR KW - EHRs KW - survey KW - surveys KW - medical informatics N2 - Background: Leveraging electronic health record (EHR) data for clinical or research purposes heavily depends on data fitness. However, there is a lack of standardized frameworks to evaluate EHR data suitability, leading to inconsistent quality in data use projects (DUPs). This research focuses on the Medical Informatics for Research and Care in University Medicine (MIRACUM) Data Integration Centers (DICs) and examines empirical practices on assessing and automating the fitness-for-purpose of clinical data in German DIC settings. 
Objective: The study aims (1) to capture and discuss how MIRACUM DICs evaluate and enhance the fitness-for-purpose of observational health care data and examine the alignment with existing recommendations and (2) to identify the requirements for designing and implementing a computer-assisted solution to evaluate EHR data fitness within MIRACUM DICs. Methods: A qualitative approach was followed using an open-ended survey across DICs of 10 German university hospitals affiliated with MIRACUM. Data were analyzed using thematic analysis following an inductive qualitative method. Results: All 10 MIRACUM DICs participated, with 17 participants revealing various approaches to assessing data fitness, including the 4-eyes principle and data consistency checks such as cross-system data value comparison. Common practices included a DUP-related feedback loop on data fitness and using self-designed dashboards for monitoring. Most experts had a computer science background and a master's degree, suggesting strong technological proficiency but potentially lacking clinical or statistical expertise. Nine key requirements for a computer-assisted solution were identified, including flexibility, understandability, extendibility, and practicability. Participants used heterogeneous data repositories for evaluating data quality criteria and practical strategies to communicate with research and clinical teams. Conclusions: The study identifies gaps between current practices in MIRACUM DICs and existing recommendations, offering insights into the complexities of assessing and reporting clinical data fitness. Additionally, a tripartite modular framework for fitness-for-purpose assessment was introduced to streamline the forthcoming implementation. It provides valuable input for developing and integrating an automated solution across multiple locations. 
This may range from statistical comparisons to advanced machine learning algorithms for operationalizing frameworks such as the 3×3 data quality assessment framework. These findings provide foundational evidence for future design and implementation studies to enhance data quality assessments for specific DUPs in observational health care settings. UR - https://medinform.jmir.org/2024/1/e57153 UR - http://dx.doi.org/10.2196/57153 UR - http://www.ncbi.nlm.nih.gov/pubmed/39158950 ID - info:doi/10.2196/57153 ER - TY - JOUR AU - Butzin-Dozier, Zachary AU - Ji, Yunwen AU - Li, Haodong AU - Coyle, Jeremy AU - Shi, Junming AU - Phillips, V. Rachael AU - Mertens, N. Andrew AU - Pirracchio, Romain AU - van der Laan, J. Mark AU - Patel, C. Rena AU - Colford, M. John AU - Hubbard, E. Alan PY - 2024/8/15 TI - Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner: Cohort Study JO - JMIR Public Health Surveill SP - e53322 VL - 10 KW - long COVID KW - COVID-19 KW - machine learning KW - respiratory KW - infectious KW - SARS-CoV-2 KW - sequelae KW - chronic KW - long term KW - covariate KW - covariates KW - risk KW - risks KW - predict KW - prediction KW - predictions KW - predictive KW - Super Learner KW - ensemble KW - stacking N2 - Background: Postacute sequelae of COVID-19 (PASC), also known as long COVID, is a broad grouping of a range of long-term symptoms following acute COVID-19. These symptoms can occur across a range of biological systems, leading to challenges in determining risk factors for PASC and the causal etiology of this disorder. An understanding of characteristics that are predictive of future PASC is valuable, as this can inform the identification of high-risk individuals and future preventative efforts. However, current knowledge regarding PASC risk factors is limited. 
Objective: Using a sample of 55,257 patients (at a ratio of 1 patient with PASC to 4 matched controls) from the National COVID Cohort Collaborative, as part of the National Institutes of Health Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. The National COVID Cohort Collaborative includes electronic health records for more than 22 million patients from 84 sites across the United States. Methods: We predicted individual PASC status, given covariate information, using Super Learner (an ensemble machine learning algorithm also known as stacking) to learn the optimal combination of gradient boosting and random forest algorithms to maximize the area under the receiver operating characteristic curve. We evaluated variable importance (Shapley values) based on 3 levels: individual features, temporal windows, and clinical domains. We externally validated these findings using a holdout set of randomly selected study sites. Results: We were able to predict individual PASC diagnoses accurately (area under the curve 0.874). The individual features of the length of observation period, number of health care interactions during acute COVID-19, and viral lower respiratory infection were the most predictive of subsequent PASC diagnosis. Temporally, we found that baseline characteristics were the most predictive of future PASC diagnosis, compared with characteristics immediately before, during, or after acute COVID-19. We found that the clinical domains of health care use, demographics or anthropometry, and respiratory factors were the most predictive of PASC diagnosis. Conclusions: The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status using electronic health record data, which can be replicated across a variety of settings. 
Across individual predictors and clinical domains, we consistently found that factors related to health care use were the strongest predictors of PASC diagnosis. This indicates that any observational studies using PASC diagnosis as a primary outcome must rigorously account for heterogeneous health care use. Our temporal findings support the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients before acute COVID-19 diagnosis, which could improve early interventions and preventive care. Our findings also highlight the importance of respiratory characteristics in PASC risk assessment. International Registered Report Identifier (IRRID): RR2-10.1101/2023.07.27.23293272 UR - https://publichealth.jmir.org/2024/1/e53322 UR - http://dx.doi.org/10.2196/53322 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53322 ER - TY - JOUR AU - Fruchart, Mathilde AU - Quindroit, Paul AU - Jacquemont, Chloé AU - Beuscart, Jean-Baptiste AU - Calafiore, Matthieu AU - Lamer, Antoine PY - 2024/8/13 TI - Transforming Primary Care Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study JO - JMIR Med Inform SP - e49542 VL - 12 KW - data reuse KW - Observational Medical Outcomes Partnership KW - common data model KW - data warehouse KW - reproducible research KW - primary care KW - dashboard KW - electronic health record KW - patient tracking system KW - patient monitoring KW - EHR KW - primary care data N2 - Background: Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research. Objective: This study aimed to transform primary care data into the OMOP CDM format. 
Methods: We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Local French vocabulary concepts were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data into the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard. Results: Data from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data. Conclusions: Primary care data from a French health care facility have been implemented into the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice. 
UR - https://medinform.jmir.org/2024/1/e49542 UR - http://dx.doi.org/10.2196/49542 ID - info:doi/10.2196/49542 ER - TY - JOUR AU - Metsallik, Janek AU - Draheim, Dirk AU - Sabic, Zlatan AU - Novak, Thomas AU - Ross, Peeter PY - 2024/8/8 TI - Assessing Opportunities and Barriers to Improving the Secondary Use of Health Care Data at the National Level: Multicase Study in the Kingdom of Saudi Arabia and Estonia JO - J Med Internet Res SP - e53369 VL - 26 KW - health data governance KW - secondary use KW - health information sharing maturity KW - large-scale interoperability KW - health data stewardship KW - health data custodianship KW - health information purpose KW - health data policy N2 - Background: Digitization shall improve the secondary use of health care data. The Government of the Kingdom of Saudi Arabia ordered a project to compile the National Master Plan for Health Data Analytics, while the Government of Estonia ordered a project to compile the Person-Centered Integrated Hospital Master Plan. Objective: This study aims to map these 2 distinct projects' problems, approaches, and outcomes to find the matching elements for reuse in similar cases. Methods: We assessed both health care systems' abilities for secondary use of health data by exploratory case studies with purposive sampling and data collection via semistructured interviews and documentation review. The collected content was analyzed qualitatively and coded according to a predefined framework. The analytical framework consisted of data purpose, flow, and sharing. The Estonian project used the Health Information Sharing Maturity Model from the Mitre Corporation as an additional analytical framework. The data collection and analysis in the Kingdom of Saudi Arabia took place in 2019 and covered health care facilities, public health institutions, and health care policy. 
The project in Estonia collected its inputs in 2020 and covered health care facilities, patient engagement, public health institutions, health care financing, health care policy, and health technology innovations. Results: In both cases, the assessments resulted in a set of recommendations focusing on the governance of health care data. In the Kingdom of Saudi Arabia, the health care system consists of multiple isolated sectors, and there is a need for an overarching body coordinating data sets, indicators, and reports at the national level. The National Master Plan of Health Data Analytics proposed a set of organizational agreements for proper stewardship. Despite Estonia's national Digital Health Platform, the requirements remain uncoordinated between various data consumers. We recommended reconfiguring the stewardship of the national health data to include multipurpose data use into the scope of interoperability standardization. Conclusions: Proper data governance is the key to improving the secondary use of health data at the national level. The data flows from data providers to data consumers shall be coordinated by overarching stewardship structures and supported by interoperable data custodians. UR - https://www.jmir.org/2024/1/e53369 UR - http://dx.doi.org/10.2196/53369 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53369 ER - TY - JOUR AU - Zhu, Ziyu AU - Liu, Lan AU - Du, Min AU - Ye, Mao AU - Xu, Ximing AU - Xu, Ying PY - 2024/8/7 TI - Pediatric Sedation Assessment and Management System (PSAMS) for Pediatric Sedation in China: Development and Implementation Report JO - JMIR Med Inform SP - e53427 VL - 12 KW - electronic data capture KW - information systems KW - pediatric sedation KW - sedation management KW - workflow optimization N2 - Background: Recently, the growing demand for pediatric sedation services outside the operating room has imposed a heavy burden on pediatric centers in China. 
There is an urgent need to develop a novel system for improved sedation services. Objective: This study aimed to develop and implement a computerized system, the Pediatric Sedation Assessment and Management System (PSAMS), to streamline pediatric sedation services at a major children's hospital in Southwest China. Methods: PSAMS was designed to reflect the actual workflow of pediatric sedation. It consists of 3 main components: server-hosted software; client applications on tablets and computers; and specialized devices like gun-type scanners, desktop label printers, and pulse oximeters. With the participation of a multidisciplinary team, PSAMS was developed and refined during its application in the sedation process. This study analyzed data from the first 2 years after the system's deployment. Implementation (Results): From January 2020 to December 2021, a total of 127,325 sedations were performed on 85,281 patients using the PSAMS database. Besides basic variables imported from Hospital Information Systems (HIS), the PSAMS database currently contains 33 additional variables that capture comprehensive information from presedation assessment to postprocedural recovery. The recorded data from PSAMS indicates a one-time sedation success rate of 97.1% (50,752/52,282) in 2020 and 97.5% (73,184/75,043) in 2021. The observed adverse events rate was 3.5% (95% CI 3.4%-3.7%) in 2020 and 2.8% (95% CI 2.7%-2.9%) in 2021. Conclusions: PSAMS streamlined the entire sedation workflow, reduced the burden of data collection, and laid a foundation for future cooperation of multiple pediatric health care centers. 
UR - https://medinform.jmir.org/2024/1/e53427 UR - http://dx.doi.org/10.2196/53427 ID - info:doi/10.2196/53427 ER - TY - JOUR AU - Subramanian, Devika AU - Sonabend, Rona AU - Singh, Ila PY - 2024/8/7 TI - A Machine Learning Model for Risk Stratification of Postdiagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: Retrospective Study JO - JMIR Diabetes SP - e53338 VL - 9 KW - pediatric type 1 diabetes KW - postdiagnosis diabetic ketoacidosis KW - risk prediction and stratification KW - XGBoost KW - Shapley values KW - ketoacidosis KW - risks KW - predict KW - prediction KW - predictive KW - gradient-boosted ensemble model KW - diabetes KW - pediatrics KW - children KW - machine learning N2 - Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model's predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. 
We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05), respectively, using a relatively short history of data from routine clinic follow-ups postdiagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors. 
UR - https://diabetes.jmir.org/2024/1/e53338 UR - http://dx.doi.org/10.2196/53338 UR - http://www.ncbi.nlm.nih.gov/pubmed/39110490 ID - info:doi/10.2196/53338 ER - TY - JOUR AU - Amadi, David AU - Kiwuwa-Muyingo, Sylvia AU - Bhattacharjee, Tathagata AU - Taylor, Amelia AU - Kiragga, Agnes AU - Ochola, Michael AU - Kanjala, Chifundo AU - Gregory, Arofan AU - Tomlin, Keith AU - Todd, Jim AU - Greenfield, Jay PY - 2024/8/1 TI - Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study JO - Online J Public Health Inform SP - e56237 VL - 16 KW - FAIR data principles KW - metadata KW - machine-readable metadata KW - DDI KW - Data Documentation Initiative KW - standardization KW - JSON-LD KW - JavaScript Object Notation for Linked Data KW - OMOP CDM KW - Observational Medical Outcomes Partnership Common Data Model KW - data science KW - data models N2 - Background: Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse. Objective: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications. Methods: The framework implements a 3-stage approach. 
The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya. Results: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website. Conclusions: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. 
By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide. UR - https://ojphi.jmir.org/2024/1/e56237 UR - http://dx.doi.org/10.2196/56237 UR - http://www.ncbi.nlm.nih.gov/pubmed/39088253 ID - info:doi/10.2196/56237 ER - TY - JOUR AU - Chen, Yu AU - Chen, Shouhang AU - Shen, Yuanfang AU - Li, Zhi AU - Li, Xiaolong AU - Zhang, Yaodong AU - Zhang, Xiaolong AU - Wang, Fang AU - Jin, Yuefei PY - 2024/7/31 TI - Molecular Evolutionary Dynamics of Coxsackievirus A6 Causing Hand, Foot, and Mouth Disease From 2021 to 2023 in China: Genomic Epidemiology Study JO - JMIR Public Health Surveill SP - e59604 VL - 10 KW - coxsackievirus A6 KW - hand, foot, and mouth disease KW - evolution KW - molecular epidemiology KW - China KW - CV-A6 KW - HFMD N2 - Background: Hand, foot, and mouth disease (HFMD) is a global public health concern, notably within the Asia-Pacific region. Recently, the primary pathogen causing HFMD outbreaks across numerous countries, including China, has been coxsackievirus (CV) A6, one of the most prevalent enteroviruses in the world. It is a new variant that has undergone genetic recombination and evolution, which might not only induce modifications in the clinical manifestations of HFMD but also heighten its pathogenicity because of nucleotide mutation accumulation. Objective: The study assessed the epidemiological characteristics of HFMD in China and characterized the molecular epidemiology of the major pathogen (CV-A6) causing HFMD. We attempted to establish the association between disease progression and viral genetic evolution through a molecular epidemiological study. 
Methods: Surveillance data from the Chinese Center for Disease Control and Prevention from 2021 to 2023 were used to analyze the epidemiological seasons and peaks of HFMD in Henan, China, and capture the results of HFMD pathogen typing. We analyzed the evolutionary characteristics of all full-length CV-A6 sequences in the NCBI database and the isolated sequences in Henan. To characterize the molecular evolution of CV-A6, time-scaled tree and historical population dynamics regarding CV-A6 sequences were estimated. Additionally, we analyzed the isolated strains for mutated or missing amino acid sites compared to the prototype CV-A6 strain. Results: The 2021-2023 epidemic seasons for HFMD in Henan usually lasted from June to August, with peaks around June and July. The monthly case reporting rate during the peak period ranged from 20.7% (4854/23,440) to 35% (12,135/34,706) of the total annual number of cases. Analysis of the pathogen composition of 2850 laboratory-confirmed cases identified 8 enterovirus serotypes, among which CV-A6 accounted for the highest proportion (652/2850, 22.88%). CV-A6 emerged as the major pathogen for HFMD in 2022 (203/732, 27.73%) and 2023 (262/708, 37.01%). We analyzed all CV-A6 full-length sequences in the NCBI database and the evolutionary features of viruses isolated in Henan. In China, the D3 subtype gradually appeared from 2011, and by 2019, all CV-A6 virus strains belonged to the D3 subtype. The VP1 sequences analyzed in Henan showed that their subtypes were consistent with the national subtypes. Furthermore, we analyzed the molecular evolutionary features of CV-A6 using Bayesian phylogeny and found that the most recent common ancestor of CV-A6 D3 dates back to 2006 in China, earlier than the 2011 HFMD outbreak. Moreover, the strains isolated in 2023 had mutations at several amino acid sites compared to the original strain. 
Conclusions: The CV-A6 virus may have been introduced and circulating covertly within China prior to the large-scale HFMD outbreak. Our laboratory testing data confirmed the fluctuation and periodic patterns of CV-A6 prevalence. Our study provides valuable insights into understanding the evolutionary dynamics of CV-A6. UR - https://publichealth.jmir.org/2024/1/e59604 UR - http://dx.doi.org/10.2196/59604 ID - info:doi/10.2196/59604 ER - TY - JOUR AU - Ben Yehuda, Ori AU - Itelman, Edward AU - Vaisman, Adva AU - Segal, Gad AU - Lerner, Boaz PY - 2024/7/30 TI - Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study JO - J Med Internet Res SP - e48595 VL - 26 KW - pulmonary embolism KW - deep vein thrombosis KW - venous thromboembolism KW - imbalanced data KW - clustering KW - risk factors KW - Wells score KW - revised Geneva score KW - hospital admission KW - machine learning N2 - Background: Under- or late identification of pulmonary embolism (PE), a thrombosis of 1 or more pulmonary arteries that seriously threatens patients' lives, is a major challenge confronting modern medicine. Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records. Methods: We collected demographics, comorbidities, and medications data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained an ML random forest algorithm to detect PE at the earliest possible time during a patient's hospitalization: at the time of his or her admission. 
We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which causes misdiagnosis of PE. Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups with more than 61% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) positive patients for PE. One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE who were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia, compared with patients of the other subgroups in this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia. Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite the highly imbalanced scenario undermining accurate PE prediction and using information available only from the patient?s medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. 
The fact that we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Geneva scores) enabled us to accurately assess the application of ML on raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations. UR - https://www.jmir.org/2024/1/e48595 UR - http://dx.doi.org/10.2196/48595 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/48595 ER - TY - JOUR AU - Ghasemi, Peyman AU - Lee, Joon PY - 2024/7/26 TI - Unsupervised Feature Selection to Identify Important ICD-10 and ATC Codes for Machine Learning on a Cohort of Patients With Coronary Heart Disease: Retrospective Study JO - JMIR Med Inform SP - e52896 VL - 12 KW - unsupervised feature selection KW - ICD-10 KW - International Classification of Diseases KW - ATC KW - Anatomical Therapeutic Chemical KW - concrete autoencoder KW - Laplacian score KW - unsupervised feature selection for multicluster data KW - autoencoder-inspired unsupervised feature selection KW - principal feature analysis KW - machine learning KW - artificial intelligence KW - case study KW - coronary artery disease KW - artery disease KW - patient cohort KW - artery KW - mortality prediction KW - mortality KW - data set KW - interpretability KW - International Classification of Diseases, Tenth Revision N2 - Background: The application of machine learning in health care often necessitates the use of hierarchical codes such as the International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the "curse of dimensionality" and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. 
Techniques for unsupervised feature selection, such as filter, wrapper, and embedded methods, are implemented to select the most important features with the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems. Objective: The objective of this study was to compare several unsupervised feature selection methods for ICD and ATC code databases of patients with coronary artery disease in different aspects of performance and complexity and select the best set of features representing these patients. Methods: We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis. Results: In feature space reconstruction and mortality prediction, the concrete autoencoder-based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (P<.05). Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. 
Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives. Conclusions: This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features. UR - https://medinform.jmir.org/2024/1/e52896 UR - http://dx.doi.org/10.2196/52896 ID - info:doi/10.2196/52896 ER - TY - JOUR AU - Bellmann, Louis AU - Wiederhold, Johannes Alexander AU - Trübe, Leona AU - Twerenbold, Raphael AU - Ückert, Frank AU - Gottfried, Karl PY - 2024/7/24 TI - Introducing Attribute Association Graphs to Facilitate Medical Data Exploration: Development and Evaluation Using Epidemiological Study Data JO - JMIR Med Inform SP - e49865 VL - 12 KW - data exploration KW - cohort studies KW - data visualization KW - big data KW - statistical models KW - medical knowledge KW - data analysis KW - cardiovascular diseases KW - usability N2 - Background: Interpretability and intuitive visualization facilitate medical knowledge generation through big data. In addition, robustness to high-dimensional and missing data is a requirement for statistical approaches in the medical domain. A method tailored to the needs of physicians must meet all the abovementioned criteria. Objective: This study aims to develop an accessible tool for visual data exploration without the need for programming knowledge, adjusting complex parameterizations, or handling missing data. 
We sought to base the statistical analysis on the setting of disease and control cohorts familiar to clinical researchers. We aimed to guide the user by identifying and highlighting data patterns associated with disease and reveal relations between attributes within the data set. Methods: We introduce the attribute association graph, a novel graph structure designed for visual data exploration using robust statistical metrics. The nodes capture frequencies of participant attributes in disease and control cohorts as well as deviations between groups. The edges represent conditional relations between attributes. The graph is visualized using the Neo4j (Neo4j, Inc) data platform and can be interactively explored without the need for technical knowledge. Nodes with high deviations between cohorts and edges of noticeable conditional relationships are highlighted to guide the user during the exploration. The graph is accompanied by a dashboard visualizing variable distributions. For evaluation, we applied the graph and dashboard to the Hamburg City Health Study data set, a large cohort study conducted in the city of Hamburg, Germany. All data structures can be accessed freely by researchers, physicians, and patients. In addition, we developed a user test conducted with physicians incorporating the System Usability Scale, individual questions, and user tasks. Results: We evaluated the attribute association graph and dashboard through an exemplary data analysis of participants with a general cardiovascular disease in the Hamburg City Health Study data set. All results extracted from the graph structure and dashboard are in accordance with findings from the literature, except for unusually low cholesterol levels in participants with cardiovascular disease, which could be induced by medication. In addition, 95% CIs of Pearson correlation coefficients were calculated for all associations identified during the data analysis, confirming the results.
In addition, a user test with 10 physicians assessing the usability of the proposed methods was conducted. A System Usability Scale score of 70.5% and average successful task completion of 81.4% were reported. Conclusions: The proposed attribute association graph and dashboard enable intuitive visual data exploration. They are robust to high-dimensional as well as missing data and require no parameterization. The usability for clinicians was confirmed via a user test, and the validity of the statistical results was confirmed by associations known from literature and standard statistical inference. UR - https://medinform.jmir.org/2024/1/e49865 UR - http://dx.doi.org/10.2196/49865 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/49865 ER - TY - JOUR AU - Kizaki, Hayato AU - Satoh, Hiroki AU - Ebara, Sayaka AU - Watabe, Satoshi AU - Sawada, Yasufumi AU - Imai, Shungo AU - Hori, Satoko PY - 2024/7/23 TI - Construction of a Multi-Label Classifier for Extracting Multiple Incident Factors From Medication Incident Reports in Residential Care Facilities: Natural Language Processing Approach JO - JMIR Med Inform SP - e58141 VL - 12 KW - residential facilities KW - incidents KW - non-medical staff KW - natural language processing KW - risk management N2 - Background: Medication safety in residential care facilities is a critical concern, particularly when nonmedical staff provide medication assistance. The complex nature of medication-related incidents in these settings, coupled with the psychological impact on health care providers, underscores the need for effective incident analysis and preventive strategies. A thorough understanding of the root causes, typically through incident-report analysis, is essential for mitigating medication-related incidents. 
Objective: We aimed to develop and evaluate a multilabel classifier using natural language processing to identify factors contributing to medication-related incidents using incident report descriptions from residential care facilities, with a focus on incidents involving nonmedical staff. Methods: We analyzed 2143 incident reports, comprising 7121 sentences, from residential care facilities in Japan between April 1, 2015, and March 31, 2016. The incident factors were annotated using sentences based on an established organizational factor model and previous research findings. The following 9 factors were defined: procedure adherence, medicine, resident, resident family, nonmedical staff, medical staff, team, environment, and organizational management. To assess the label criteria, 2 researchers with relevant medical knowledge annotated a subset of 50 reports; the interannotator agreement was measured using Cohen κ. The entire data set was subsequently annotated by 1 researcher. Multiple labels were assigned to each sentence. A multilabel classifier was developed using deep learning models, including 2 Bidirectional Encoder Representations From Transformers (BERT)-type models (Tohoku-BERT and a University of Tokyo Hospital BERT pretrained with Japanese clinical text: UTH-BERT) and an Efficiently Learning Encoder That Classifies Token Replacements Accurately (ELECTRA), pretrained on Japanese text. Both sentence- and report-level training were performed; the performance was evaluated by the F1-score and exact match accuracy through 5-fold cross-validation. Results: Among all 7121 sentences, 1167, 694, 2455, 23, 1905, 46, 195, 1104, and 195 included "procedure adherence," "medicine," "resident," "resident family," "nonmedical staff," "medical staff," "team," "environment," and "organizational management," respectively. Owing to limited labels, "resident family" and "medical staff" were omitted from the model development process.
The interannotator agreement values were higher than 0.6 for each label. A total of 10, 278, and 1855 reports contained no, 1, and multiple labels, respectively. The models trained using the report data outperformed those trained using sentences, with macro F1-scores of 0.744, 0.675, and 0.735 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. The report-trained models also demonstrated better exact match accuracy, with 0.411, 0.389, and 0.399 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. Notably, the accuracy was consistent even when the analysis was confined to reports containing multiple labels. Conclusions: The multilabel classifier developed in our study demonstrated potential for identifying various factors associated with medication-related incidents using incident reports from residential care facilities. Thus, this classifier can facilitate prompt analysis of incident factors, thereby contributing to risk management and the development of preventive strategies. UR - https://medinform.jmir.org/2024/1/e58141 UR - http://dx.doi.org/10.2196/58141 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58141 ER - TY - JOUR AU - Levinson, T. Rebecca AU - Paul, Cinara AU - Meid, D. Andreas AU - Schultz, Jobst-Hendrik AU - Wild, Beate PY - 2024/7/23 TI - Identifying Predictors of Heart Failure Readmission in Patients From a Statutory Health Insurance Database: Retrospective Machine Learning Study JO - JMIR Cardio SP - e54994 VL - 8 KW - statutory health insurance KW - readmission KW - machine learning KW - heart failure KW - heart KW - cardiology KW - cardiac KW - hospitalization KW - insurance KW - predict KW - predictive KW - prediction KW - predictions KW - predictor KW - predictors KW - all cause N2 - Background: Patients with heart failure (HF) are the most commonly readmitted group of adult patients in Germany. Most patients with HF are readmitted for noncardiovascular reasons. 
Understanding the relevance of HF management outside the hospital setting is critical to understanding HF and factors that lead to readmission. Application of machine learning (ML) on data from statutory health insurance (SHI) allows the evaluation of large longitudinal data sets representative of the general population to support clinical decision-making. Objective: This study aims to evaluate the ability of ML methods to predict 1-year all-cause and HF-specific readmission after initial HF-related admission of patients with HF in outpatient SHI data and identify important predictors. Methods: We identified individuals with HF using outpatient data from 2012 to 2018 from the AOK Baden-Württemberg SHI in Germany. We then trained and applied regression and ML algorithms to predict the first all-cause and HF-specific readmission in the year after the first admission for HF. We fitted a random forest, an elastic net, a stepwise regression, and a logistic regression to predict readmission by using diagnosis codes, drug exposures, demographics (age, sex, nationality, and type of coverage within SHI), degree of rurality for residence, and participation in disease management programs for common chronic conditions (diabetes mellitus type 1 and 2, breast cancer, chronic obstructive pulmonary disease, and coronary heart disease). We then evaluated the predictors of HF readmission according to their importance and direction to predict readmission. Results: Our final data set consisted of 97,529 individuals with HF, and 78,044 (80%) were readmitted within the observation period. Of the tested modeling approaches, the random forest approach best predicted 1-year all-cause and HF-specific readmission with a C-statistic of 0.68 and 0.69, respectively. 
Important predictors for 1-year all-cause readmission included prescription of pantoprazole, chronic obstructive pulmonary disease, atherosclerosis, sex, rurality, and participation in disease management programs for type 2 diabetes mellitus and coronary heart disease. Relevant features for HF-specific readmission included a large number of canonical HF comorbidities. Conclusions: While many of the predictors we identified were known to be relevant comorbidities for HF, we also uncovered several novel associations. Disease management programs have widely been shown to be effective at managing chronic disease; however, our results indicate that in the short term they may be useful for targeting patients with HF with comorbidity at increased risk of readmission. Our results also show that living in a more rural location increases the risk of readmission. Overall, factors beyond comorbid disease were relevant for risk of HF readmission. This finding may impact how outpatient physicians identify and monitor patients at risk of HF readmission. UR - https://cardio.jmir.org/2024/1/e54994 UR - http://dx.doi.org/10.2196/54994 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54994 ER - TY - JOUR AU - Turzhitsky, Vladimir AU - Bash, D. Lori AU - Urman, D. 
Richard AU - Kattan, Michael AU - Hofer, Ira PY - 2024/7/22 TI - Factors Influencing Neuromuscular Blockade Reversal Choice in the United States Before and During the COVID-19 Pandemic: Retrospective Longitudinal Analysis JO - JMIR Perioper Med SP - e52278 VL - 7 KW - neuromuscular blockade KW - sugammadex KW - neostigmine KW - rocuronium KW - vecuronium KW - intubation KW - counterfactual KW - anesthesia KW - anesthetic KW - anesthesiologist KW - anesthesiologists KW - surgery KW - surgical KW - preference KW - preferences KW - retrospective KW - utilization KW - pattern KW - patterns KW - trend KW - trends KW - national KW - healthcare database KW - healthcare databases KW - COVID-19 KW - time-trend analysis KW - neuromuscular KW - longitudinal analysis KW - longitudinal KW - neuromuscular blockade agent KW - clinical KW - surgical procedure KW - inpatient KW - inpatient surgery KW - retrospective analysis KW - USA KW - United States N2 - Background: Neuromuscular blockade (NMB) agents are a critical component of balanced anesthesia. NMB reversal methods can include spontaneous reversal, sugammadex, or neostigmine and the choice of reversal strategy can depend on various factors. Unanticipated changes to clinical practice emerged due to the COVID-19 pandemic, and a better understanding of how NMB reversal trends were affected by the pandemic may help provide insight into how providers view the tradeoffs in the choice of NMB reversal agents. Objective: We aim to analyze NMB reversal agent use patterns for US adult inpatient surgeries before and after the COVID-19 outbreak to determine whether pandemic-related practice changes affected use trends. Methods: A retrospective longitudinal analysis of a large all-payer national electronic US health care database (PINC AI Healthcare Database) was conducted to identify the use patterns of NMB reversal during early, middle, and late COVID-19 (EC, MC, and LC, respectively) time periods.
Factors associated with NMB reversal choices in inpatient surgeries were assessed before and after the COVID-19 pandemic reached the United States. Multivariate logistic regression assessed the impact of the pandemic on NMB reversal, accounting for patient, clinical, procedural, and site characteristics. A counterfactual framework was used to understand if patient characteristics affected how COVID-19-era patients would have been treated before the pandemic. Results: More than 3.2 million inpatients experiencing over 3.6 million surgical procedures across 931 sites that met all inclusion criteria were identified between March 1, 2017, and December 31, 2021. NMB reversal trends showed a steady increase in reversal with sugammadex over time, with the trend from January 2018 onwards being linear with time (R2>0.99). Multivariate analysis showed that the post-COVID-19 time periods had a small but statistically significant effect on the trend, as measured by the interaction terms of the COVID-19 time periods and the time trend in NMB reversal. A slight increase in the likelihood of sugammadex reversal was observed during EC relative to the pre-COVID-19 trend (odds ratio [OR] 1.008, 95% CI 1.003-1.014; P=.003), followed by negation of that increase during MC (OR 0.992, 95% CI 0.987-0.997; P<.001), and no significant interaction identified during LC (OR 1.001, 95% CI 0.996-1.005; P=.81). Conversely, active reversal (using either sugammadex or neostigmine) did not show a significant association relative to spontaneous reversal, or a change in trend, during EC or MC (P>.05), though a slight decrease in the active reversal trend was observed during LC (OR 0.987, 95% CI 0.983-0.992; P<.001). Conclusions: We observed a steady increase in NMB active reversal overall, and specifically with sugammadex compared to neostigmine, during periods before and after the COVID-19 outbreak.
Small, transitory alterations in the NMB reversal trends were observed during the height of the COVID-19 pandemic, though these alterations were independent of the underlying NMB reversal time trends. UR - https://periop.jmir.org/2024/1/e52278 UR - http://dx.doi.org/10.2196/52278 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52278 ER - TY - JOUR AU - Lamer, Antoine AU - Saint-Dizier, Chloé AU - Paris, Nicolas AU - Chazard, Emmanuel PY - 2024/7/17 TI - Data Lake, Data Warehouse, Datamart, and Feature Store: Their Contributions to the Complete Data Reuse Pipeline JO - JMIR Med Inform SP - e54590 VL - 12 KW - data reuse KW - data lake KW - data warehouse KW - feature extraction KW - datamart KW - feature store UR - https://medinform.jmir.org/2024/1/e54590 UR - http://dx.doi.org/10.2196/54590 ID - info:doi/10.2196/54590 ER - TY - JOUR AU - Ji, Hyerim AU - Kim, Seok AU - Sunwoo, Leonard AU - Jang, Sowon AU - Lee, Ho-Young AU - Yoo, Sooyoung PY - 2024/7/12 TI - Integrating Clinical Data and Medical Imaging in Lung Cancer: Feasibility Study Using the Observational Medical Outcomes Partnership Common Data Model Extension JO - JMIR Med Inform SP - e59187 VL - 12 KW - DICOM KW - OMOP KW - CDM KW - lung cancer KW - medical imaging KW - data integration KW - data quality KW - Common Data Model KW - Digital Imaging and Communications in Medicine KW - Observational Medical Outcomes Partnership N2 - Background: Digital transformation, particularly the integration of medical imaging with clinical data, is vital in personalized medicine. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardizes health data. However, integrating medical imaging remains a challenge. Objective: This study proposes a method for combining medical imaging data with the OMOP CDM to improve multimodal research. 
Methods: Our approach included the analysis and selection of Digital Imaging and Communications in Medicine (DICOM) header tags, validation of data formats, and alignment according to the OMOP CDM framework. The Fast Healthcare Interoperability Resources ImagingStudy profile guided our consistency in column naming and definitions. The Imaging Common Data Model (I-CDM), constructed using the entity-attribute-value model, facilitates scalable and efficient medical imaging data management. For patients with lung cancer diagnosed between 2010 and 2017, we introduced 4 new tables (IMAGING_STUDY, IMAGING_SERIES, IMAGING_ANNOTATION, and FILEPATH) to standardize various imaging-related data and link to clinical data. Results: This framework underscores the effectiveness of I-CDM in enhancing our understanding of lung cancer diagnostics and treatment strategies. The implementation of the I-CDM tables enabled the structured organization of a comprehensive data set, including 282,098 IMAGING_STUDY, 5,674,425 IMAGING_SERIES, and 48,536 IMAGING_ANNOTATION records, illustrating the extensive scope and depth of the approach. A scenario-based analysis using actual data from patients with lung cancer underscored the feasibility of our approach. A data quality check applying 44 specific rules confirmed the high integrity of the constructed data set, with all checks successfully passed, underscoring the reliability of our findings. Conclusions: These findings indicate that I-CDM can improve the integration and analysis of medical imaging and clinical data. By addressing the challenges in data standardization and management, our approach contributes toward enhancing diagnostics and treatment strategies. Future research should expand the application of I-CDM to diverse disease populations and explore its wide-ranging utility for medical conditions.
UR - https://medinform.jmir.org/2024/1/e59187 UR - http://dx.doi.org/10.2196/59187 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59187 ER - TY - JOUR AU - Suh, Jungyo AU - Lee, Garam AU - Kim, Woo Jung AU - Shin, Junbum AU - Kim, Yi-Jun AU - Lee, Sang-Wook AU - Kim, Sulgi PY - 2024/7/5 TI - Privacy-Preserving Prediction of Postoperative Mortality in Multi-Institutional Data: Development and Usability Study JO - JMIR Med Inform SP - e56893 VL - 12 KW - machine learning KW - privacy KW - in-hospital mortality KW - homomorphic encryption KW - multi-institutional system N2 - Background: To circumvent regulatory barriers that limit medical data exchange due to personal information security concerns, we use homomorphic encryption (HE) technology, enabling computation on encrypted data and enhancing privacy. Objective: This study explores whether using HE to integrate encrypted multi-institutional data enhances predictive power in research, focusing on the integration feasibility across institutions and determining the optimal size of hospital data sets for improved prediction models. Methods: We used data from 341,007 individuals aged 18 years and older who underwent noncardiac surgeries across 3 medical institutions. The study focused on predicting in-hospital mortality within 30 days postoperatively, using secure logistic regression based on HE as the prediction model. We compared the predictive performance of this model using plaintext data from a single institution against a model using encrypted data from multiple institutions. Results: The predictive model using encrypted data from all 3 institutions exhibited the best performance based on area under the receiver operating characteristic curve (0.941); the model combining Asan Medical Center (AMC) and Seoul National University Hospital (SNUH) data exhibited the best predictive performance based on area under the precision-recall curve (0.132). 
Both Ewha Womans University Medical Center and SNUH demonstrated improvement in predictive power for their own institutions upon their respective data's addition to the AMC data. Conclusions: Prediction models using multi-institutional data sets processed with HE outperformed those using single-institution data sets, especially when our model adaptation approach was applied, which was further validated on a smaller host hospital with a limited data set. UR - https://medinform.jmir.org/2024/1/e56893 UR - http://dx.doi.org/10.2196/56893 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56893 ER - TY - JOUR AU - Yuan, Yannan AU - Mei, Yun AU - Zhao, Shuhua AU - Dai, Shenglong AU - Liu, Xiaohong AU - Sun, Xiaojing AU - Fu, Zhiying AU - Zhou, Liheng AU - Ai, Jie AU - Ma, Liheng AU - Jiang, Min PY - 2024/6/27 TI - Data Flow Construction and Quality Evaluation of Electronic Source Data in Clinical Trials: Pilot Study Based on Hospital Electronic Medical Records in China JO - JMIR Med Inform SP - e52934 VL - 12 KW - clinical trials KW - electronic source data KW - EHRs KW - electronic data capture systems KW - data quality KW - electronic health records N2 - Background: The traditional clinical trial data collection process requires a clinical research coordinator who is authorized by the investigators to read from the hospital's electronic medical record. Using electronic source data opens a new path to extract patients' data from electronic health records (EHRs) and transfer them directly to an electronic data capture (EDC) system; this method is often referred to as eSource. eSource technology in a clinical trial data flow can improve data quality without compromising timeliness. At the same time, improved data collection efficiency reduces clinical trial costs.
Objective: This study aims to explore how to extract clinical trial-related data from hospital EHR systems, transform the data into a format required by the EDC system, and transfer it into sponsors' environments, and to evaluate the transferred data sets to validate the availability, completeness, and accuracy of building an eSource dataflow. Methods: A prospective clinical trial study registered on the Drug Clinical Trial Registration and Information Disclosure Platform was selected, and the following data modules were extracted from the structured data of 4 case report forms: demographics, vital signs, local laboratory data, and concomitant medications. The extracted data was mapped and transformed, deidentified, and transferred to the sponsor's environment. Data validation was performed based on availability, completeness, and accuracy. Results: In a secure and controlled data environment, clinical trial data was successfully transferred from a hospital EHR to the sponsor's environment with 100% transcriptional accuracy, but the availability and completeness of the data could be improved. Conclusions: Data availability was low due to some required fields in the EDC system not being available directly in the EHR. Some data is also still in an unstructured or paper-based format. The top-level design of the eSource technology and the construction of hospital electronic data standards should help lay a foundation for a full electronic data flow from EHRs to EDC systems in the future.
UR - https://medinform.jmir.org/2024/1/e52934 UR - http://dx.doi.org/10.2196/52934 ID - info:doi/10.2196/52934 ER - TY - JOUR AU - Karakis, Ioannis AU - Kostandini, Genti AU - Tsamakis, Konstantinos AU - Zahirovic-Herbert, Velma PY - 2024/6/26 TI - The Association of Broadband Internet Use With Drug Overdose Mortality Rates in the United States: Cross-Sectional Analysis JO - Online J Public Health Inform SP - e52686 VL - 16 KW - opioids KW - broadband internet KW - mortality KW - public health KW - digital divide KW - access KW - availability KW - causal KW - association KW - correlation KW - overdose KW - drug abuse KW - addiction KW - substance abuse KW - demographic KW - United States KW - population N2 - Background: The availability and use of broadband internet play an increasingly important role in health care and public health. Objective: This study examined the associations of broadband internet availability and use with drug overdose deaths in the United States. Methods: We linked 2019 county-level drug overdose death data in restricted-access multiple causes of death files from the National Vital Statistics System at the US Centers for Disease Control and Prevention with the 2019 county-level broadband internet rollout data from the Federal Communications Commission and the 2019 county-level broadband usage data available from Microsoft's Airband Initiative. Cross-sectional analysis was performed with the fixed-effects regression method to assess the association of broadband internet availability and usage with opioid overdose deaths. Our model also controlled for county-level socioeconomic characteristics and county-level health policy variables. Results: Overall, a 1% increase in broadband internet use was linked with a 1.2% increase in overall drug overdose deaths. No significant association was observed for broadband internet availability.
Although similar positive associations were found for both male and female populations, the association varied across different age subgroups. The positive association on overall drug overdose deaths was the greatest among Hispanic and Non-Hispanic White populations. Conclusions: Broadband internet use was positively associated with increased drug overdose deaths among the overall US population and some subpopulations, even after controlling for broadband availability, sociodemographic characteristics, unemployment, and median household income. UR - https://ojphi.jmir.org/2024/1/e52686 UR - http://dx.doi.org/10.2196/52686 UR - http://www.ncbi.nlm.nih.gov/pubmed/38922664 ID - info:doi/10.2196/52686 ER - TY - JOUR AU - Soni, Hiral AU - Ivanova, Julia AU - Wilczewski, Hattie AU - Ong, Triton AU - Ross, Nalubega J. AU - Bailey, Alexandra AU - Cummins, Mollie AU - Barrera, Janelle AU - Bunnell, Brian AU - Welch, Brandon PY - 2024/6/25 TI - User Preferences and Needs for Health Data Collection Using Research Electronic Data Capture: Survey Study JO - JMIR Med Inform SP - e49785 VL - 12 KW - Research Electronic Data Capture KW - REDCap KW - user experience KW - electronic data collection KW - health data KW - personal health information KW - clinical research KW - mobile phone N2 - Background: Self-administered web-based questionnaires are widely used to collect health data from patients and clinical research participants. REDCap (Research Electronic Data Capture; Vanderbilt University) is a global, secure web application for building and managing electronic data capture. Unfortunately, stakeholder needs and preferences of electronic data collection via REDCap have rarely been studied. Objective: This study aims to survey REDCap researchers and administrators to assess their experience with REDCap, especially their perspectives on the advantages, challenges, and suggestions for the enhancement of REDCap as a data collection tool. 
Methods: We conducted a web-based survey with representatives of REDCap member organizations in the United States. The survey captured information on respondent demographics, quality of patient-reported data collected via REDCap, patient experience of data collection with REDCap, and open-ended questions focusing on the advantages, challenges, and suggestions to enhance REDCap's data collection experience. Descriptive and inferential analysis measures were used to analyze quantitative data. Thematic analysis was used to analyze open-ended responses focusing on the advantages, disadvantages, and enhancements in data collection experience. Results: A total of 207 respondents completed the survey. Respondents strongly agreed or agreed that the data collected via REDCap are accurate (188/207, 90.8%), reliable (182/207, 87.9%), and complete (166/207, 80.2%). More than half of respondents strongly agreed or agreed that patients find REDCap easy to use (165/207, 79.7%), could successfully complete tasks without help (151/207, 72.9%), and could do so in a timely manner (163/207, 78.7%). Thematic analysis of open-ended responses yielded 8 major themes: survey development, user experience, survey distribution, survey results, training and support, technology, security, and platform features. The user experience category included more than half of the advantage codes (307/594, 51.7% of codes); meanwhile, respondents reported higher challenges in survey development (169/516, 32.8% of codes), also suggesting the highest enhancement suggestions for the category (162/439, 36.9% of codes). Conclusions: Respondents indicated that REDCap is a valued, low-cost, secure resource for clinical research data collection. REDCap's data collection experience was generally positive among clinical research and care staff members and patients.
However, with the advancements in data collection technologies and the availability of modern, intuitive, and mobile-friendly data collection interfaces, there is a critical opportunity to enhance the REDCap experience to meet the needs of researchers and patients. UR - https://medinform.jmir.org/2024/1/e49785 UR - http://dx.doi.org/10.2196/49785 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/49785 ER - TY - JOUR AU - Xu, Stanley AU - Sy, S. Lina AU - Hong, Vennis AU - Holmquist, J. Kimberly AU - Qian, Lei AU - Farrington, Paddy AU - Bruxvoort, J. Katia AU - Klein, P. Nicola AU - Fireman, Bruce AU - Han, Bing AU - Lewin, J. Bruno PY - 2024/6/25 TI - Ischemic Stroke After Bivalent COVID-19 Vaccination: Self-Controlled Case Series Study JO - JMIR Public Health Surveill SP - e53807 VL - 10 KW - ischemic stroke KW - bivalent COVID-19 vaccine KW - influenza vaccine KW - self-controlled case series KW - coadministration KW - ischemic KW - stroke KW - TIA KW - transient ischemic attack KW - ischemia KW - cardiovascular KW - COVID-19 KW - SARS-CoV-2 KW - vaccine KW - vaccines KW - vaccination KW - association KW - correlation KW - risk KW - risks KW - adverse KW - side effect KW - subgroup analyses KW - subgroup analysis KW - bivalent KW - influenza KW - infectious KW - respiratory KW - incidence KW - case series N2 - Background: The potential association between bivalent COVID-19 vaccination and ischemic stroke remains uncertain, despite several studies conducted thus far. Objective: This study aimed to evaluate the risk of ischemic stroke following bivalent COVID-19 vaccination during the 2022-2023 season. Methods: A self-controlled case series study was conducted among members aged 12 years and older who experienced ischemic stroke between September 1, 2022, and March 31, 2023, in a large health care system. 
Ischemic strokes were identified using International Classification of Diseases, Tenth Revision codes in emergency departments and inpatient settings. Exposures were Pfizer-BioNTech or Moderna bivalent COVID-19 vaccination. Risk intervals were prespecified as 1-21 days and 1-42 days after bivalent vaccination; all non-risk-interval person-time served as the control interval. The incidence of ischemic stroke was compared in the risk interval and control interval using conditional Poisson regression. We conducted overall and subgroup analyses by age, history of SARS-CoV-2 infection, and coadministration of influenza vaccine. When an elevated risk was detected, we performed a chart review of ischemic strokes and analyzed the risk of chart-confirmed ischemic stroke. Results: With 4933 ischemic stroke events, we found no increased risk within the 21-day risk interval for the 2 vaccines, overall and by subgroup. However, risk of ischemic stroke was elevated within the 42-day risk interval among individuals aged younger than 65 years with coadministration of Pfizer-BioNTech bivalent and influenza vaccines on the same day; the relative incidence (RI) was 2.13 (95% CI 1.01-4.46). Among those who also had a history of SARS-CoV-2 infection, the RI was 3.94 (95% CI 1.10-14.16). After chart review, the RIs were 2.34 (95% CI 0.97-5.65) and 4.27 (95% CI 0.97-18.85), respectively. Among individuals aged younger than 65 years who received Moderna bivalent vaccine and had a history of SARS-CoV-2 infection, the RI was 2.62 (95% CI 1.13-6.03) before chart review and 2.24 (95% CI 0.78-6.47) after chart review. Stratified analyses by sex did not show a significantly increased risk of ischemic stroke after bivalent vaccination. 
Conclusions: While the point estimate for the risk of chart-confirmed ischemic stroke was elevated in a risk interval of 1-42 days among individuals younger than 65 years with coadministration of Pfizer-BioNTech bivalent and influenza vaccines on the same day and among individuals younger than 65 years who received Moderna bivalent vaccine and had a history of SARS-CoV-2 infection, the risk was not statistically significant. The potential association between bivalent vaccination and ischemic stroke in the 1-42-day analysis warrants further investigation among individuals younger than 65 years with influenza vaccine coadministration and prior SARS-CoV-2 infection. Furthermore, the findings on ischemic stroke risk after bivalent COVID-19 vaccination underscore the need to evaluate monovalent COVID-19 vaccine safety during the 2023-2024 season. UR - https://publichealth.jmir.org/2024/1/e53807 UR - http://dx.doi.org/10.2196/53807 UR - http://www.ncbi.nlm.nih.gov/pubmed/38916940 ID - info:doi/10.2196/53807 ER - TY - JOUR AU - Karakachoff, Matilde AU - Goronflot, Thomas AU - Coudol, Sandrine AU - Toublant, Delphine AU - Bazoge, Adrien AU - Constant Dit Beaufils, Pacôme AU - Varey, Emilie AU - Leux, Christophe AU - Mauduit, Nicolas AU - Wargny, Matthieu AU - Gourraud, Pierre-Antoine PY - 2024/6/24 TI - Implementing a Biomedical Data Warehouse From Blueprint to Bedside in a Regional French University Hospital Setting: Unveiling Processes, Overcoming Challenges, and Extracting Clinical Insight JO - JMIR Med Inform SP - e50194 VL - 12 KW - data warehouse KW - biomedical data warehouse KW - clinical data repository KW - electronic health records KW - data reuse KW - secondary use KW - clinical routine data KW - real-world data KW - implementation report N2 - Background: Biomedical data warehouses (BDWs) have become an essential tool to facilitate the reuse of health data for both research and decisional applications. 
Beyond technical issues, the implementation of BDWs requires strong institutional data governance and operational knowledge of the European and national legal framework for the management of research data access and use. Objective: In this paper, we describe the compound process of implementation and the contents of a regional university hospital BDW. Methods: We present the actions and challenges regarding organizational changes, technical architecture, and shared governance that took place to develop the Nantes BDW. We describe the process to access clinical contents, give details about patient data protection, and use examples to illustrate merging clinical insights. Implementation (Results): More than 68 million textual documents and 543 million pieces of coded information concerning approximately 1.5 million patients admitted to CHUN between 2002 and 2022 can be queried and transformed to be made available to investigators. Since its creation in 2018, 269 projects have benefited from the Nantes BDW. Access to data is organized according to data use and regulatory requirements. Conclusions: Data use is entirely determined by the scientific question posed. It is the vector of legitimacy of data access for secondary use. Enabling access to a BDW is a game changer for research and all operational situations in need of data. Finally, data governance must prevail over technical issues in institution data strategy vis-à-vis care professionals and patients alike. 
UR - https://medinform.jmir.org/2024/1/e50194 UR - http://dx.doi.org/10.2196/50194 ID - info:doi/10.2196/50194 ER - TY - JOUR AU - Miller, Morven AU - McCann, Lisa AU - Lewis, Liane AU - Miaskowski, Christine AU - Ream, Emma AU - Darley, Andrew AU - Harris, Jenny AU - Kotronoulas, Grigorios AU - V Berg, Geir AU - Lubowitzki, Simone AU - Armes, Jo AU - Patiraki, Elizabeth AU - Furlong, Eileen AU - Fox, Patricia AU - Gaiger, Alexander AU - Cardone, Antonella AU - Orr, Dawn AU - Flowerday, Adrian AU - Katsaragakis, Stylianos AU - Skene, Simon AU - Moore, Margaret AU - McCrone, Paul AU - De Souza, Nicosha AU - Donnan, T. Peter AU - Maguire, Roma PY - 2024/6/20 TI - Patients' and Clinicians' Perceptions of the Clinical Utility of Predictive Risk Models for Chemotherapy-Related Symptom Management: Qualitative Exploration Using Focus Groups and Interviews JO - J Med Internet Res SP - e49309 VL - 26 KW - chemotherapy KW - digital health KW - education KW - predictive risk models KW - qualitative research methods KW - symptoms KW - symptom cluster KW - tailored information N2 - Background: Interest in the application of predictive risk models (PRMs) in health care to identify people most likely to experience disease and treatment-related complications is increasing. In cancer care, these techniques are focused primarily on the prediction of survival or life-threatening toxicities (eg, febrile neutropenia). Fewer studies focus on the use of PRMs for symptoms or supportive care needs. The application of PRMs to chemotherapy-related symptoms (CRS) would enable earlier identification and initiation of prompt, personalized, and tailored interventions. While some PRMs exist for CRS, few were translated into clinical practice, and human factors associated with their use were not reported. Objective: We aim to explore patients' and clinicians' perspectives of the utility and real-world application of PRMs to improve the management of CRS. 
Methods: Focus groups (N=10) and interviews (N=5) were conducted with patients (N=28) and clinicians (N=26) across 5 European countries. Interactions were audio-recorded, transcribed verbatim, and analyzed thematically. Results: Both clinicians and patients recognized the value of having individualized risk predictions for CRS and appreciated how this type of information would facilitate the provision of tailored preventative treatments or supportive care interactions. However, cautious and skeptical attitudes toward the use of PRMs in clinical care were noted by both groups, particularly in relation to the uncertainty regarding how the information would be generated. Visualization and presentation of PRM information in a usable and useful format for both patients and clinicians was identified as a challenge to their successful implementation in clinical care. Conclusions: Findings from this study provide information on clinicians' and patients' perspectives on the clinical use of PRMs for the management of CRS. These international perspectives are important because they provide insight into the risks and benefits of using PRMs to evaluate CRS. In addition, they highlight the need to find ways to more effectively present and use this information in clinical practice. Further research that explores the best ways to incorporate this type of information while maintaining the human side of care is warranted. 
Trial Registration: ClinicalTrials.gov NCT02356081; https://clinicaltrials.gov/study/NCT02356081 UR - https://www.jmir.org/2024/1/e49309 UR - http://dx.doi.org/10.2196/49309 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/49309 ER - TY - JOUR AU - Xiao, Yi-Zhen AU - Chen, Xiao-Jia AU - Sun, Xiao-Ling AU - Chen, Huan AU - Luo, Yu-Xia AU - Chen, Yuan AU - Liang, Ye-Mei PY - 2024/6/19 TI - Effect of Implementing an Informatization Case Management Model on the Management of Chronic Respiratory Diseases in a General Hospital: Retrospective Controlled Study JO - JMIR Med Inform SP - e49978 VL - 12 KW - chronic disease management KW - chronic respiratory disease KW - hospital information system KW - informatization KW - information system KW - respiratory KW - pulmonary KW - breathing KW - implementation KW - care management KW - disease management KW - chronic obstructive pulmonary disease KW - case management N2 - Background: The use of chronic disease information systems in hospitals and communities plays a significant role in disease prevention, control, and monitoring. However, there are several limitations to these systems, including that the platforms are generally isolated, the patient health information and medical resources are not effectively integrated, and the "Internet Plus Healthcare" technology model is not implemented throughout the patient consultation process. Objective: The aim of this study was to evaluate the efficiency of the application of a hospital case management information system in a general hospital in the context of chronic respiratory diseases as a model case. Methods: A chronic disease management information system was developed for use in general hospitals based on internet technology, a chronic disease case management model, and an overall quality management model. Using this system, the case managers provided sophisticated inpatient, outpatient, and home medical services for patients with chronic respiratory diseases. 
Chronic respiratory disease case management quality indicators (number of managed cases, number of patients accepting routine follow-up services, follow-up visit rate, pulmonary function test rate, admission rate for acute exacerbations, chronic respiratory diseases knowledge awareness rate, and patient satisfaction) were evaluated before (2019-2020) and after (2021-2022) implementation of the chronic disease management information system. Results: Before implementation of the chronic disease management information system, 1808 cases were managed in the general hospital, and an average of 603 (SD 137) people were provided with routine follow-up services. After use of the information system, 5868 cases were managed and 2056 (SD 211) patients were routinely followed up, representing a significant increase of 3.2 and 3.4 times the respective values before use (U=342.779; P<.001). With respect to the quality of case management, compared to the indicators measured before use, the achievement rate of follow-up examination increased by 50.2%, the achievement rate of the pulmonary function test increased by 26.2%, the awareness rate of chronic respiratory disease knowledge increased by 20.1%, the retention rate increased by 16.3%, and the patient satisfaction rate increased by 9.6% (all P<.001), while the admission rate of acute exacerbation decreased by 42.4% (P<.001) after use of the chronic disease management information system. Conclusions: Use of a chronic disease management information system improves the quality of chronic respiratory disease case management and reduces the admission rate of patients owing to acute exacerbations of their diseases. 
UR - https://medinform.jmir.org/2024/1/e49978 UR - http://dx.doi.org/10.2196/49978 ID - info:doi/10.2196/49978 ER - TY - JOUR AU - Abdullahi, Tassallah AU - Mercurio, Laura AU - Singh, Ritambhara AU - Eickhoff, Carsten PY - 2024/6/19 TI - Retrieval-Based Diagnostic Decision Support: Mixed Methods Study JO - JMIR Med Inform SP - e50209 VL - 12 KW - clinical decision support KW - rare diseases KW - ensemble learning KW - retrieval-augmented learning KW - machine learning KW - electronic health records KW - natural language processing KW - retrieval augmented generation KW - RAG KW - electronic health record KW - EHR KW - data sparsity KW - information retrieval N2 - Background: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. Objective: This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity to facilitate broader diagnostic decision support. Methods: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. 
Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. Results: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. Conclusions: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases. UR - https://medinform.jmir.org/2024/1/e50209 UR - http://dx.doi.org/10.2196/50209 UR - http://www.ncbi.nlm.nih.gov/pubmed/38896468 ID - info:doi/10.2196/50209 ER - TY - JOUR AU - Richter, Gesine AU - Krawczak, Michael PY - 2024/6/18 TI - How to Elucidate Consent-Free Research Use of Medical Data: A Case for "Health Data Literacy" 
JO - JMIR Med Inform SP - e51350 VL - 12 KW - health data literacy KW - informed consent KW - broad consent KW - data sharing KW - data collection KW - data donation KW - data linkage KW - personal health data UR - https://medinform.jmir.org/2024/1/e51350 UR - http://dx.doi.org/10.2196/51350 ID - info:doi/10.2196/51350 ER - TY - JOUR AU - Zheng, Yue AU - Zhao, Ailin AU - Yang, Yuqi AU - Wang, Laduona AU - Hu, Yifei AU - Luo, Ren AU - Wu, Yijun PY - 2024/6/12 TI - Real-World Survival Comparisons Between Radiotherapy and Surgery for Metachronous Second Primary Lung Cancer and Predictions of Lung Cancer-Specific Outcomes Using Machine Learning: Population-Based Study JO - JMIR Cancer SP - e53354 VL - 10 KW - metachronous second primary lung cancer KW - radiotherapy KW - surgical resection KW - propensity score matching analysis KW - machine learning N2 - Background: Metachronous second primary lung cancer (MSPLC) is not rare but is seldom studied. Objective: We aim to compare real-world survival outcomes between different surgery strategies and radiotherapy for MSPLC. Methods: This retrospective study analyzed data collected from patients with MSPLC between 1988 and 2012 in the Surveillance, Epidemiology, and End Results (SEER) database. Propensity score matching (PSM) analyses and machine learning were performed to compare variables between patients with MSPLC. Survival curves were plotted using the Kaplan-Meier method and were compared using log-rank tests. Results: A total of 2451 MSPLC patients were categorized into the following treatment groups: 864 (35.3%) received radiotherapy, 759 (31%) underwent surgery, 89 (3.6%) had surgery plus radiotherapy, and 739 (30.2%) had neither treatment. After PSM, 470 pairs each for radiotherapy and surgery were generated. The surgery group had significantly better survival than the radiotherapy group (P<.001) and the untreated group (563 pairs; P<.001). 
Further analysis revealed that both wedge resection (85 pairs; P=.004) and lobectomy (71 pairs; P=.002) outperformed radiotherapy in overall survival for MSPLC patients. Machine learning models (extreme gradient boosting, random forest classifier, adaptive boosting) demonstrated high predictive performance based on area under the curve (AUC) values. Least absolute shrinkage and selection operator (LASSO) regression analysis identified 9 significant variables impacting cancer-specific survival, emphasizing surgery's consistent influence across 1 year to 10 years. These variables encompassed age at diagnosis, sex, year of diagnosis, radiotherapy of initial primary lung cancer (IPLC), primary site, histology, surgery, chemotherapy, and radiotherapy of MSPLC. Competing risk analysis highlighted lower mortality for female MSPLC patients (hazard ratio [HR]=0.79, 95% CI 0.71-0.87) and recent IPLC diagnoses (HR=0.79, 95% CI 0.73-0.85), while radiotherapy for IPLC increased mortality (HR=1.31, 95% CI 1.16-1.50). Surgery alone had the lowest cancer-specific mortality (HR=0.83, 95% CI 0.81-0.85), with sublevel resection having the lowest mortality rate among the surgical approaches (HR=0.26, 95% CI 0.21-0.31). The findings provide valuable insights into the factors that influence cumulative cancer-specific mortality. Conclusions: Surgical resections such as wedge resection and lobectomy confer better survival than radiation therapy for MSPLC, but radiation can be a valid alternative for the treatment of MSPLC. 
UR - https://cancer.jmir.org/2024/1/e53354 UR - http://dx.doi.org/10.2196/53354 UR - http://www.ncbi.nlm.nih.gov/pubmed/38865182 ID - info:doi/10.2196/53354 ER - TY - JOUR AU - Stellmach, Caroline AU - Hopff, Marie Sina AU - Jaenisch, Thomas AU - Nunes de Miranda, Marina Susana AU - Rinaldi, Eugenia AU - PY - 2024/6/10 TI - Creation of Standardized Common Data Elements for Diagnostic Tests in Infectious Disease Studies: Semantic and Syntactic Mapping JO - J Med Internet Res SP - e50049 VL - 26 KW - core data element KW - CDE KW - case report form KW - CRF KW - interoperability KW - semantic standards KW - infectious disease KW - diagnostic test KW - covid19 KW - COVID-19 KW - mpox KW - ZIKV KW - patient data KW - data model KW - syntactic interoperability KW - clinical data KW - FHIR KW - SNOMED CT KW - LOINC KW - virus infection KW - common element N2 - Background: It is necessary to harmonize and standardize data variables used in case report forms (CRFs) of clinical studies to facilitate the merging and sharing of the collected patient data across several clinical studies. This is particularly true for clinical studies that focus on infectious diseases. Public health may be highly dependent on the findings of such studies. Hence, there is an elevated urgency to generate meaningful, reliable insights, ideally based on a high sample number and quality data. The implementation of core data elements and the incorporation of interoperability standards can facilitate the creation of harmonized clinical data sets. Objective: This study's objective was to compare, harmonize, and standardize variables focused on diagnostic tests used as part of CRFs in 6 international clinical studies of infectious diseases in order to ultimately make available the panstudy common data elements (CDEs) for ongoing and future studies to foster interoperability and comparability of collected data across trials. 
Methods: We reviewed and compared the metadata that comprised the CRFs used for data collection in and across all 6 infectious disease studies under consideration in order to identify CDEs. We examined the availability of international semantic standard codes within the Systematized Nomenclature of Medicine Clinical Terms, the National Cancer Institute Thesaurus, and the Logical Observation Identifiers Names and Codes system for the unambiguous representation of diagnostic testing information that makes up the CDEs. We then proposed 2 data models that incorporate semantic and syntactic standards for the identified CDEs. Results: Of 216 variables that were considered in the scope of the analysis, we identified 11 CDEs to describe diagnostic tests (in particular, serology and sequencing) for infectious diseases: viral lineage/clade; test date, type, performer, and manufacturer; target gene; quantitative and qualitative results; and specimen identifier, type, and collection date. Conclusions: The identification of CDEs for infectious diseases is the first step in facilitating the exchange and possible merging of a subset of data across clinical studies (and with that, large research projects) for possible shared analysis to increase the power of findings. The path to harmonization and standardization of clinical study data in the interest of interoperability can be paved in 2 ways. First, a map to standard terminologies ensures that each data element's (variable's) definition is unambiguous and that it has a single, unique interpretation across studies. Second, the exchange of these data is assisted by "wrapping" them in a standard exchange format, such as Fast Healthcare Interoperability Resources or the Clinical Data Interchange Standards Consortium's Clinical Data Acquisition Standards Harmonization Model. 
UR - https://www.jmir.org/2024/1/e50049 UR - http://dx.doi.org/10.2196/50049 UR - http://www.ncbi.nlm.nih.gov/pubmed/38857066 ID - info:doi/10.2196/50049 ER - TY - JOUR AU - Lehmann, Marco AU - Jones, Lucy AU - Schirmann, Felix PY - 2024/6/7 TI - App Engagement as a Predictor of Weight Loss in Blended-Care Interventions: Retrospective Observational Study Using Large-Scale Real-World Data JO - J Med Internet Res SP - e45469 VL - 26 KW - obesity KW - weight loss KW - blended-care KW - digital health KW - real-world data KW - app engagement KW - mHealth KW - mobile health KW - technology engagement KW - weight management KW - mobile phone N2 - Background: Early weight loss is an established predictor for treatment outcomes in weight management interventions for people with obesity. However, there is a paucity of additional, reliable, and clinically actionable early predictors in weight management interventions. Novel blended-care weight management interventions combine coach and app support and afford new means of structured, continuous data collection, informing research on treatment adherence and outcome prediction. Objective: Against this backdrop, this study analyzes app engagement as a predictor for weight loss in large-scale, real-world, blended-care interventions. We hypothesize that patients who engage more frequently in app usage in blended-care treatment (eg, higher logging activity) lose more weight than patients who engage comparably less frequently at 3 and 6 months of intervention. Methods: Real-world data from 19,211 patients in obesity treatment were analyzed retrospectively. Patients were treated with 3 different blended-care weight management interventions, offered in Switzerland, the United Kingdom, and Germany by a digital behavior change provider. The principal component analysis identified an overarching metric for app engagement based on app usage. A median split informed a distinction in higher and lower engagers among the patients. 
Both groups were matched through optimal propensity score matching for relevant characteristics (eg, gender, age, and start weight). A linear regression model, combining patient characteristics and app-derived data, was applied to identify predictors for weight loss outcomes. Results: For the entire sample (N=19,211), mean weight loss was −3.24% (SD 4.58%) at 3 months and −5.22% (SD 6.29%) at 6 months. Across countries, higher app engagement yielded more weight loss than lower engagement after 3 but not after 6 months of intervention (3 months: P<.001; 6 months: P=.59). Early app engagement within the first 3 months predicted percentage weight loss in Switzerland and Germany, but not in the United Kingdom (Switzerland: P<.001; United Kingdom: P=.12; Germany: P=.005). Higher age was associated with stronger weight loss in the 3-month period (Switzerland: P=.001; United Kingdom: P=.002; Germany: P<.001) and, for Germany, also in the 6-month period (Switzerland: P=.09; United Kingdom: P=.46; Germany: P=.03). In Switzerland, higher numbers of patients' messages to coaches were associated with higher weight loss (3 months: P<.001; 6 months: P<.001). Messages from coaches were not significantly associated with weight loss (all P>.05). Conclusions: Early app engagement is a predictor of weight loss, with higher engagement yielding more weight loss than lower engagement in this analysis. This new predictor lends itself to automated monitoring and as a digital indicator for needed or adapted clinical action. Further research needs to establish the reliability of early app engagement as a predictor for treatment adherence and outcomes. In general, the obtained results testify to the potential of app-derived data to inform clinical monitoring practices and intervention design. 
UR - https://www.jmir.org/2024/1/e45469 UR - http://dx.doi.org/10.2196/45469 UR - http://www.ncbi.nlm.nih.gov/pubmed/38848556 ID - info:doi/10.2196/45469 ER - TY - JOUR AU - Ohno, Yukiko AU - Kato, Riri AU - Ishikawa, Haruki AU - Nishiyama, Tomohiro AU - Isawa, Minae AU - Mochizuki, Mayumi AU - Aramaki, Eiji AU - Aomori, Tohru PY - 2024/6/4 TI - Using the Natural Language Processing System Medical Named Entity Recognition-Japanese to Analyze Pharmaceutical Care Records: Natural Language Processing Analysis JO - JMIR Form Res SP - e55798 VL - 8 KW - natural language processing KW - NLP KW - named entity recognition KW - pharmaceutical care records KW - machine learning KW - cefazolin sodium KW - electronic medical record KW - EMR KW - extraction KW - Japanese N2 - Background: Large language models have propelled recent advances in artificial intelligence technology, facilitating the extraction of medical information from unstructured data such as medical records. Although named entity recognition (NER) is used to extract data from physicians' records, it has yet to be widely applied to pharmaceutical care records. Objective: In this study, we aimed to investigate the feasibility of automatic extraction of the information regarding patients' diseases and symptoms from pharmaceutical care records. The verification was performed using Medical Named Entity Recognition-Japanese (MedNER-J), a Japanese disease-extraction system designed for physicians' records. Methods: MedNER-J was applied to subjective, objective, assessment, and plan data from the care records of 49 patients who received cefazolin sodium injection at Keio University Hospital between April 2018 and March 2019. The performance of MedNER-J was evaluated in terms of precision, recall, and F1-score. Results: The F1-scores of NER for subjective, objective, assessment, and plan data were 0.46, 0.70, 0.76, and 0.35, respectively. 
In NER and positive-negative classification, the F1-scores were 0.28, 0.39, 0.64, and 0.077, respectively. The F1-scores of NER for objective (0.70) and assessment data (0.76) were higher than those for subjective and plan data, which supported the superiority of NER performance for objective and assessment data. This might be because objective and assessment data contained many technical terms, similar to the training data for MedNER-J. Meanwhile, the F1-score of NER and positive-negative classification was high for assessment data alone (F1-score=0.64), which was attributed to the similarity of its description format and contents to those of the training data. Conclusions: MedNER-J successfully read pharmaceutical care records and showed the best performance for assessment data. However, challenges remain in analyzing records other than assessment data. Therefore, it will be necessary to reinforce the training data for subjective data in order to apply the system to pharmaceutical care records. UR - https://formative.jmir.org/2024/1/e55798 UR - http://dx.doi.org/10.2196/55798 UR - http://www.ncbi.nlm.nih.gov/pubmed/38833694 ID - info:doi/10.2196/55798 ER - TY - JOUR AU - Huang, Jiaoling AU - Qian, Ying AU - Yan, Yuge AU - Liang, Hong AU - Zhao, Laijun PY - 2024/6/3 TI - Addressing Hospital Overwhelm During the COVID-19 Pandemic by Using a Primary Health Care-Based Integrated Health System: Modeling Study JO - JMIR Med Inform SP - e54355 VL - 12 KW - hospital overwhelm KW - primary health care KW - modeling study KW - policy mix KW - pandemic KW - model KW - simulation KW - simulations KW - integrated KW - health system KW - hospital KW - hospitals KW - management KW - service KW - services KW - health systems KW - develop KW - development KW - bed KW - beds KW - overwhelm KW - death KW - deaths KW - mortality KW - primary care N2 - Background: After strict COVID-19-related restrictions were lifted, health systems globally were overwhelmed. 
Much has been discussed about how health systems could better prepare for future pandemics; however, primary health care (PHC) has been largely ignored. Objective: We aimed to investigate what combined policies PHC could apply to strengthen the health care system via a bottom-up approach, so as to better respond to a public health emergency. Methods: We developed a system dynamics model to replicate Shanghai's response when COVID-19-related restrictions were lifted. We then simulated an alternative PHC-based integrated health system and tested the following three interventions: first contact in PHC with telemedicine services, recommendation to secondary care, and return to PHC for recovery. Results: The simulation results showed that each selected intervention could alleviate hospital overwhelm. Increasing the rate of first contact in PHC with telemedicine increased hospital bed availability by 6% to 12% and reduced the cumulative number of deaths by 35%. More precise recommendations had a limited impact on hospital overwhelm (<1%), but the simulation results showed that underrecommendation (rate: 80%) would result in a 19% increase in cumulative deaths. Increasing the rate of return to PHC from 5% to 20% improved hospital bed availability by 6% to 16% and reduced the cumulative number of deaths by 46%. Moreover, combining all 3 interventions had a multiplier effect; bed availability increased by 683%, and the cumulative number of deaths dropped by 75%. Conclusions: Rather than focusing on the allocation of medical resources in secondary care, we determined that an optimal PHC-based integrated strategy would be to have a 60% rate of first contact in PHC, a 110% recommendation rate, and a 20% rate of return to PHC. This could increase health system resilience during public health emergencies. 
UR - https://medinform.jmir.org/2024/1/e54355 UR - http://dx.doi.org/10.2196/54355 ID - info:doi/10.2196/54355 ER - TY - JOUR AU - Yoon, Dukyong AU - Han, Changho AU - Kim, Won Dong AU - Kim, Songsoo AU - Bae, SungA AU - Ryu, An Jee AU - Choi, Yujin PY - 2024/5/31 TI - Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange JO - J Med Internet Res SP - e56614 VL - 26 KW - health care interoperability KW - large language models KW - medical data transformation KW - data standardization KW - text-based N2 - Background: Efficient data exchange and health care interoperability are impeded by medical records often being in nonstandardized or unstructured natural language format. Advanced language models, such as large language models (LLMs), may help overcome current challenges in information exchange. Objective: This study aims to evaluate the capability of LLMs in transforming and transferring health care data to support interoperability. Methods: Using data from the Medical Information Mart for Intensive Care III and UK Biobank, the study conducted 3 experiments. Experiment 1 assessed the accuracy of transforming structured laboratory results into unstructured format. Experiment 2 explored the conversion of diagnostic codes between the coding frameworks of the ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification), and Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) using a traditional mapping table and a text-based approach facilitated by the LLM ChatGPT. Experiment 3 focused on extracting targeted information from unstructured records that included comprehensive clinical information (discharge notes). 
Results: The text-based approach showed a high conversion accuracy in transforming laboratory results (experiment 1) and an enhanced consistency in diagnostic code conversion, particularly for frequently used diagnostic names, compared with the traditional mapping approach (experiment 2). In experiment 3, the LLM showed a positive predictive value of 87.2% in extracting generic drug names. Conclusions: This study highlighted the potential role of LLMs in significantly improving health care data interoperability, demonstrated by their high accuracy and efficiency in data transformation and exchange. The LLMs hold vast potential for enhancing medical data exchange without complex standardization for medical terms and data structure. UR - https://www.jmir.org/2024/1/e56614 UR - http://dx.doi.org/10.2196/56614 UR - http://www.ncbi.nlm.nih.gov/pubmed/38819879 ID - info:doi/10.2196/56614 ER - TY - JOUR AU - Heider, M. Paul AU - Meystre, M. Stéphane PY - 2024/5/28 TI - An Extensible Evaluation Framework Applied to Clinical Text Deidentification Natural Language Processing Tools: Multisystem and Multicorpus Study JO - J Med Internet Res SP - e55676 VL - 26 KW - natural language processing KW - evaluation methodology KW - deidentification KW - privacy protection KW - de-identification KW - secondary use KW - patient privacy N2 - Background: Clinical natural language processing (NLP) researchers need access to directly comparable evaluation results for applications such as text deidentification across a range of corpus types and the means to easily test new systems or corpora within the same framework. Current systems, reported metrics, and the personally identifiable information (PII) categories evaluated are not easily comparable. Objective: This study presents an open-source and extensible end-to-end framework for comparing clinical NLP system performance across corpora even when the annotation categories do not align. 
Methods: As a use case for this framework, we use 6 off-the-shelf text deidentification systems (ie, CliniDeID, deid from PhysioNet, MITRE Identity Scrubber Toolkit [MIST], NeuroNER, National Library of Medicine [NLM] Scrubber, and Philter) across 3 standard clinical text corpora for the task (2 of which are publicly available) and 1 private corpus (all in English), with annotation categories that are not directly analogous. The framework is built on shell scripts that can be extended to include new systems, corpora, and performance metrics. We present this open tool, multiple means for aligning PII categories during evaluation, and our initial timing and performance metric findings. Code for running this framework, with all settings needed to run all pairs, is available via Codeberg and GitHub. Results: From this case study, we found large differences in processing speed between systems. The fastest system (ie, MIST) processed an average of 24.57 (SD 26.23) notes per second, while the slowest (ie, CliniDeID) processed an average of 1.00 notes per second. No system uniformly outperformed the others at identifying PII across corpora and categories. Instead, a rich tapestry of performance trade-offs emerged for PII categories. CliniDeID and Philter prioritize recall over precision (with an average recall 6.9 and 11.2 points higher, respectively, for partially matching spans of text matching any PII category), while the other 4 systems consistently have higher precision (with MIST's precision scoring 20.2 points higher, NLM Scrubber scoring 4.4 points higher, NeuroNER scoring 7.2 points higher, and deid scoring 17.1 points higher). The macroaverage recall across corpora for identifying names, one of the more sensitive PII categories, included deid (48.8%) and MIST (66.9%) at the low end and NeuroNER (84.1%), NLM Scrubber (88.1%), and CliniDeID (95.9%) at the high end. 
A variety of metrics across categories and corpora are reported with a wider variety (eg, F2-score) available via the tool. Conclusions: NLP systems in general and deidentification systems and corpora in our use case tend to be evaluated in stand-alone research articles that only include a limited set of comparators. We hold that a single evaluation pipeline across multiple systems and corpora allows for more nuanced comparisons. Our open pipeline should reduce barriers to evaluation and system advancement. UR - https://www.jmir.org/2024/1/e55676 UR - http://dx.doi.org/10.2196/55676 UR - http://www.ncbi.nlm.nih.gov/pubmed/38805692 ID - info:doi/10.2196/55676 ER - TY - JOUR AU - Hoang, Uy AU - Delanerolle, Gayathri AU - Fan, Xuejuan AU - Aspden, Carole AU - Byford, Rachel AU - Ashraf, Mansoor AU - Haag, Mendel AU - Elson, William AU - Leston, Meredith AU - Anand, Sneha AU - Ferreira, Filipa AU - Joy, Mark AU - Hobbs, Richard AU - de Lusignan, Simon PY - 2024/5/24 TI - A Profile of Influenza Vaccine Coverage for 2019-2020: Database Study of the English Primary Care Sentinel Cohort JO - JMIR Public Health Surveill SP - e39297 VL - 10 KW - medical records systems KW - computerize KW - influenza KW - influenza vaccines KW - sentinel surveillance KW - vocabulary controlled KW - general practitioners KW - general practice KW - primary health care KW - vaccine KW - public health KW - surveillance KW - uptake N2 - Background: Innovation in seasonal influenza vaccine development has resulted in a wider range of formulations becoming available. Understanding vaccine coverage across populations including the timing of administration is important when evaluating vaccine benefits and risks. Objective: This study aims to report the representativeness, uptake of influenza vaccines, different formulations of influenza vaccines, and timing of administration within the English Primary Care Sentinel Cohort (PCSC). 
Methods: We used the PCSC of the Oxford-Royal College of General Practitioners Research and Surveillance Centre. We included patients of all ages registered with PCSC member general practices, reporting influenza vaccine coverage between September 1, 2019, and January 29, 2020. We identified influenza vaccination recipients and characterized them by age, clinical risk groups, and vaccine type. We reported the date of influenza vaccination within the PCSC by International Organization for Standardization (ISO) week. The representativeness of the PCSC population was compared with population data provided by the Office for National Statistics. PCSC influenza vaccine coverage was compared with published UK Health Security Agency's national data. We used paired t tests to compare populations, reported with 95% CI. Results: The PCSC comprised 7,010,627 people from 693 general practices. The study population included a greater proportion of people aged 18-49 years (2,982,390/7,010,627, 42.5%; 95% CI 42.5%-42.6%) compared with the Office for National Statistics 2019 midyear population estimates (23,219,730/56,286,961, 41.3%; 95% CI 41.2%-41.3%; P<.001). People who are more deprived were underrepresented and those in the least deprived quintile were overrepresented. Within the study population, 24.7% (1,731,062/7,010,627; 95% CI 24.7%-24.7%) of people of all ages received an influenza vaccine compared with 24.2% (14,468,665/59,764,928; 95% CI 24.2%-24.2%; P<.001) in national data. The highest coverage was in people aged ≥65 years (913,695/1,264,700, 72.3%; 95% CI 72.2%-72.3%). The proportion of people in risk groups who received an influenza vaccine was also higher; for example, 69.8% (284,280/407,228; 95% CI 69.7%-70%) of people with diabetes in the PCSC received an influenza vaccine compared with 61.2% (983,727/1,607,996; 95% CI 61.1%-61.3%; P<.001) in national data. 
In the PCSC, vaccine type and brand information were available for 71.8% (358,365/498,923; 95% CI 71.7%-72%) of people aged 16-64 years and 81.9% (748,312/913,695; 95% CI 81.8%-82%) of people aged ≥65 years, compared with 23.6% (696,880/2,900,000) and 17.8% (1,385,888/7,700,000), respectively, of the same age groups in national data. Vaccination commenced during ISO week 35, continued until ISO week 3, and peaked during ISO week 41. The in-week peak in vaccination administration was on Saturdays. Conclusions: The PCSC's sociodemographic profile was similar to the national population and captured more data about risk groups, vaccine brands, and batches. This may reflect higher data quality. Its capabilities included reporting precise dates of administration. The PCSC is suitable for undertaking studies of influenza vaccine coverage. UR - https://publichealth.jmir.org/2024/1/e39297 UR - http://dx.doi.org/10.2196/39297 UR - http://www.ncbi.nlm.nih.gov/pubmed/38787605 ID - info:doi/10.2196/39297 ER - TY - JOUR AU - Bilotta, Isabel AU - Tonidandel, Scott AU - Liaw, R. Winston AU - King, Eden AU - Carvajal, N. Diana AU - Taylor, Ayana AU - Thamby, Julie AU - Xiang, Yang AU - Tao, Cui AU - Hansen, Michael PY - 2024/5/23 TI - Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis JO - JMIR Med Inform SP - e50428 VL - 12 KW - bias KW - sociodemographic factors KW - health care disparities KW - natural language processing KW - sentiment analysis KW - diabetes KW - electronic health record KW - racial KW - ethnic KW - diversity KW - Hispanic KW - medical interaction N2 - Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias. 
Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias. Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino. We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (ie, negative adjectives, positive adjectives, joy words, fear and disgust words, politics words, respect words, trust verbs, and well-being words) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes. Results: We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. 
Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient -0.02, SE 0.007), trust verbs (coefficient -0.009, SE 0.004), and joy words (coefficient -0.03, SE 0.01) than those for White non-Hispanic patients. Conclusions: This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias. UR - https://medinform.jmir.org/2024/1/e50428 UR - http://dx.doi.org/10.2196/50428 ID - info:doi/10.2196/50428 ER - TY - JOUR AU - Stevens, R. Elizabeth AU - Xu, Lynn AU - Kwon, JaeEun AU - Tasneem, Sumaiya AU - Henning, Natalie AU - Feldthouse, Dawn AU - Kim, Ji Eun AU - Hess, Rachel AU - Dauber-Decker, L. Katherine AU - Smith, D. Paul AU - Halm, Wendy AU - Gautam-Goyal, Pranisha AU - Feldstein, A. David AU - Mann, M. 
Devin PY - 2024/5/23 TI - Barriers to Implementing Registered Nurse-Driven Clinical Decision Support for Antibiotic Stewardship: Retrospective Case Study JO - JMIR Form Res SP - e54996 VL - 8 KW - integrated clinical prediction rules KW - EHR KW - electronic health record KW - implementation KW - barriers KW - acute respiratory infections KW - antibiotics KW - CDS KW - clinical decision support KW - decision support KW - antibiotic KW - prescribe KW - prescription KW - acute respiratory infection KW - barrier KW - effectiveness KW - registered nurse KW - RN KW - RN-driven intervention KW - personnel availability KW - workflow variability KW - infrastructure KW - infrastructures KW - law KW - laws KW - policy KW - policies KW - clinical-care setting KW - clinical setting KW - electronic health records KW - RN-driven KW - antibiotic stewardship KW - retrospective analysis KW - Consolidated Framework for Implementation Research KW - CFIR KW - CDS-based intervention KW - urgent care KW - New York KW - chart review KW - interview KW - interviews KW - staff change KW - staff changes KW - RN shortage KW - RN shortages KW - turnover KW - health system KW - nurse KW - nurses KW - researcher KW - researchers N2 - Background: Up to 50% of antibiotic prescriptions for upper respiratory infections (URIs) are inappropriate. Clinical decision support (CDS) systems to mitigate unnecessary antibiotic prescriptions have been implemented into electronic health records, but their use by providers has been limited. Objective: As a delegation protocol, we adapted a validated electronic health record-integrated clinical prediction rule (iCPR) CDS-based intervention for registered nurses (RNs), consisting of triage to identify patients with low-acuity URI followed by CDS-guided RN visits. It was implemented in February 2022 as a randomized controlled stepped-wedge trial in 43 primary and urgent care practices within 4 academic health systems in New York, Wisconsin, and Utah. 
While issues were pragmatically addressed as they arose, a systematic assessment of the barriers to implementation is needed to better understand and address these barriers. Methods: We performed a retrospective case study, collecting quantitative and qualitative data regarding clinical workflows and triage-template use from expert interviews, study surveys, routine check-ins with practice personnel, and chart reviews over the first year of implementation of the iCPR intervention. Guided by the updated CFIR (Consolidated Framework for Implementation Research), we characterized the initial barriers to implementing a URI iCPR intervention for RNs in ambulatory care. CFIR constructs were coded as missing, neutral, weak, or strong implementation factors. Results: Barriers were identified within all implementation domains. The strongest barriers were found in the outer setting, with those factors trickling down to impact the inner setting. Local conditions driven by COVID-19 served as one of the strongest barriers, impacting attitudes among practice staff and ultimately contributing to a work infrastructure characterized by staff changes, RN shortages and turnover, and competing responsibilities. Policies and laws regarding scope of practice of RNs varied by state and institutional application of those laws, with some allowing more clinical autonomy for RNs. This necessitated different study procedures at each study site to meet practice requirements, increasing innovation complexity. Similarly, institutional policies led to varying levels of compatibility with existing triage, rooming, and documentation workflows. These workflow conflicts were compounded by limited available resources, as well as an implementation climate of optional participation, few participation incentives, and thus low relative priority compared to other clinical duties. 
Conclusions: Both between and within health care systems, significant variability existed in workflows for patient intake and triage. Even in a relatively straightforward clinical workflow, workflow and cultural differences appreciably impacted intervention adoption. Takeaways from this study can be applied to other RN delegation protocol implementations of new and innovative CDS tools within existing workflows to support integration and improve uptake. When implementing a system-wide clinical care intervention, considerations must be made for variability in culture and workflows at the state, health system, practice, and individual levels. Trial Registration: ClinicalTrials.gov NCT04255303; https://clinicaltrials.gov/ct2/show/NCT04255303 UR - https://formative.jmir.org/2024/1/e54996 UR - http://dx.doi.org/10.2196/54996 UR - http://www.ncbi.nlm.nih.gov/pubmed/38781006 ID - info:doi/10.2196/54996 ER - TY - JOUR AU - Yardley, Elizabeth AU - Davis, Alice AU - Eldridge, Chris AU - Vasilakis, Christos PY - 2024/5/21 TI - Data-Driven Exploration of National Health Service Talking Therapies Care Pathways Using Process Mining: Retrospective Cohort Study JO - JMIR Ment Health SP - e53894 VL - 11 KW - electronic health record KW - EHR KW - electronic health records KW - EHRs KW - health record KW - data science KW - secondary data analysis KW - mental health services KW - mental health KW - health information system KW - HIS KW - information system KW - information systems KW - process mining KW - flow KW - flows KW - path KW - pathway KW - pathways KW - delivery KW - visualization N2 - Background: The National Health Service (NHS) Talking Therapies program treats people with common mental health problems in England according to "stepped care," in which lower-intensity interventions are offered in the first instance, where clinically appropriate. 
Limited resources and pressure to achieve service standards mean that program providers are exploring all opportunities to evaluate and improve the flow of patients through their service. Existing research has found variation in clinical performance and stepped care implementation across sites and has identified associations between service delivery and patient outcomes. Process mining offers a data-driven approach to analyzing and evaluating health care processes and systems, enabling comparison of presumed models of service delivery and their actual implementation in practice. The value and utility of applying process mining to NHS Talking Therapies data for the analysis of care pathways have not been studied. Objective: A better understanding of systems of service delivery will support improvements and planned program expansion. Therefore, this study aims to demonstrate the value and utility of applying process mining to NHS Talking Therapies care pathways using electronic health records. Methods: Routine collection of a wide variety of data regarding activity and patient outcomes underpins the Talking Therapies program. In our study, anonymized individual patient referral records from two sites over a 2-year period were analyzed using process mining to visualize the care pathway process by mapping the care pathway and identifying common pathway routes. Results: Process mining enabled the identification and visualization of patient flows directly from routinely collected data. These visualizations illustrated waiting periods and identified potential bottlenecks, such as the wait for higher-intensity cognitive behavioral therapy (CBT) at site 1. Furthermore, we observed that patients discharged from treatment waiting lists appeared to experience longer wait durations than those who started treatment. 
Process mining allowed analysis of treatment pathways, showing that patients commonly experienced treatment routes that involved either low- or high-intensity interventions alone. Of the most common routes, >5 times as many patients experienced direct access to high-intensity treatment rather than stepped care. Overall, 3.32% (site 1: 1507/45,401) and 4.19% (site 2: 527/12,590) of all patients experienced stepped care. Conclusions: Our findings demonstrate how process mining can be applied to Talking Therapies care pathways to evaluate pathway performance, explore relationships among performance issues, and highlight systemic issues, such as stepped care being relatively uncommon within a stepped care system. Integration of process mining capability into routine monitoring will enable NHS Talking Therapies service stakeholders to explore such issues from a process perspective. These insights will provide value to services by identifying areas for service improvement, providing evidence for capacity planning decisions, and facilitating better quality analysis into how health systems can affect patient outcomes. UR - https://mental.jmir.org/2024/1/e53894 UR - http://dx.doi.org/10.2196/53894 UR - http://www.ncbi.nlm.nih.gov/pubmed/38771630 ID - info:doi/10.2196/53894 ER - TY - JOUR AU - Manz, R. Christopher AU - Schriver, Emily AU - Ferrell, J. William AU - Williamson, Joelle AU - Wakim, Jonathan AU - Khan, Neda AU - Kopinsky, Michael AU - Balachandran, Mohan AU - Chen, Jinbo AU - Patel, S. Mitesh AU - Takvorian, U. Samuel AU - Shulman, N. Lawrence AU - Bekelman, E. Justin AU - Barnett, J. Ian AU - Parikh, B. 
Ravi PY - 2024/5/17 TI - Association of Remote Patient-Reported Outcomes and Step Counts With Hospitalization or Death Among Patients With Advanced Cancer Undergoing Chemotherapy: Secondary Analysis of the PROStep Randomized Trial JO - J Med Internet Res SP - e51059 VL - 26 KW - wearables KW - accelerometers KW - patient-reported outcomes KW - step counts KW - oncology KW - accelerometer KW - patient-generated health data KW - cancer KW - death KW - chemotherapy KW - symptoms KW - gastrointestinal cancer KW - lung cancer KW - monitoring KW - symptom burden KW - risk KW - hospitalization KW - mobile phone N2 - Background: Patients with advanced cancer undergoing chemotherapy experience significant symptoms and declines in functional status, which are associated with poor outcomes. Remote monitoring of patient-reported outcomes (PROs; symptoms) and step counts (functional status) may proactively identify patients at risk of hospitalization or death. Objective: The aim of this study is to evaluate the association of (1) longitudinal PROs with step counts and (2) PROs and step counts with hospitalization or death. Methods: The PROStep randomized trial enrolled 108 patients with advanced gastrointestinal or lung cancers undergoing cytotoxic chemotherapy at a large academic cancer center. Patients were randomized to weekly text-based monitoring of 8 PROs plus continuous step count monitoring via Fitbit (Google) versus usual care. This preplanned secondary analysis included 57 of 75 patients randomized to the intervention who had PRO and step count data. We analyzed the associations between PROs and mean daily step counts and the associations of PROs and step counts with the composite outcome of hospitalization or death using bootstrapped generalized linear models to account for longitudinal data. 
Results: Among 57 patients, the mean age was 57 (SD 10.9) years, 24 (42%) were female, 43 (75%) had advanced gastrointestinal cancer, 14 (25%) had advanced lung cancer, and 25 (44%) were hospitalized or died during follow-up. A 1-point weekly increase (on a 32-point scale) in aggregate PRO score was associated with 247 fewer mean daily steps (95% CI -277 to -213; P<.001). PROs most strongly associated with step count decline were patient-reported activity (daily step change -892), nausea score (-677), and constipation score (-524). A 1-point weekly increase in aggregate PRO score was associated with 20% greater odds of hospitalization or death (adjusted odds ratio [aOR] 1.2, 95% CI 1.1-1.4; P=.01). PROs most strongly associated with hospitalization or death were pain (aOR 3.2, 95% CI 1.6-6.5; P<.001), decreased activity (aOR 3.2, 95% CI 1.4-7.1; P=.01), dyspnea (aOR 2.6, 95% CI 1.2-5.5; P=.02), and sadness (aOR 2.1, 95% CI 1.1-4.3; P=.03). A decrease of 1000 steps was associated with 16% greater odds of hospitalization or death (aOR 1.2, 95% CI 1.0-1.3; P=.03). Compared with baseline, mean daily step count decreased 7% (n=274 steps), 9% (n=351 steps), and 16% (n=667 steps) in the 3, 2, and 1 weeks before hospitalization or death, respectively. Conclusions: In this secondary analysis of a randomized trial among patients with advanced cancer, higher symptom burden and decreased step count were independently associated with hospitalization or death and worsened predictably in the weeks preceding these outcomes. Future interventions should leverage longitudinal PRO and step count data to target interventions toward patients at risk for poor outcomes. 
Trial Registration: ClinicalTrials.gov NCT04616768; https://clinicaltrials.gov/study/NCT04616768 International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2021-054675 UR - https://www.jmir.org/2024/1/e51059 UR - http://dx.doi.org/10.2196/51059 UR - http://www.ncbi.nlm.nih.gov/pubmed/38758583 ID - info:doi/10.2196/51059 ER - TY - JOUR AU - Yanovitzky, Itzhak AU - Stahlman, Gretchen AU - Quow, Justine AU - Ackerman, Matthew AU - Perry, Yehuda AU - Kim, Miriam PY - 2024/5/16 TI - National Public Health Dashboards: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e52843 VL - 13 KW - dashboard KW - scoping review KW - public health KW - design KW - development KW - implementation KW - evaluation KW - user need KW - protocol KW - data dashboards KW - audiences KW - audience KW - systematic treatment KW - public health data dashboards KW - PRISMA-ScR KW - snowballing techniques KW - gray literature sources KW - evidence-informed framework KW - framework KW - COVID-19 KW - pandemic N2 - Background: The COVID-19 pandemic highlighted the importance of robust public health data systems and the potential utility of data dashboards for ensuring access to critical public health data for diverse groups of stakeholders and decision makers. As dashboards are becoming ubiquitous, it is imperative to consider how they may be best integrated with public health data systems and the decision-making routines of diverse audiences. However, additional progress on the continued development, improvement, and sustainability of these tools requires the integration and synthesis of a largely fragmented scholarship regarding the purpose, design principles and features, successful implementation, and decision-making supports provided by effective public health data dashboards across diverse users and applications. 
Objective: This scoping review aims to provide a descriptive and thematic overview of national public health data dashboards including their purpose, intended audiences, health topics, design elements, impact, and underlying mechanisms of use and usefulness of these tools in decision-making processes. It seeks to identify gaps in the current literature on the topic and provide the first-of-its-kind systematic treatment of actionability as a critical design element of public health data dashboards. Methods: The scoping review follows the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. The review considers English-language, peer-reviewed journal papers, conference proceedings, book chapters, and reports that describe the design, implementation, and evaluation of a public health dashboard published between 2000 and 2023. The search strategy covers scholarly databases (CINAHL, PubMed, Medline, and Web of Science) and gray literature sources and uses snowballing techniques. An iterative process of testing for and improving intercoder reliability was implemented to ensure that coders are properly trained to screen documents according to the inclusion criteria prior to beginning the full review of relevant papers. Results: The search process initially identified 2544 documents, including papers located via databases, gray literature searching, and snowballing. Following the removal of duplicate documents (n=1416), nonrelevant items (n=839), and items classified as literature reviews and background information (n=73), 216 documents met the inclusion criteria: US case studies (n=90) and non-US case studies (n=126). Data extraction will focus on key variables, including public health data characteristics; dashboard design elements and functionalities; intended users, usability, logistics, and operation; and indicators of usefulness and impact reported. 
Conclusions: The scoping review will analyze the goals, design, use, usefulness, and impact of public health data dashboards. The review will also inform the continued development and improvement of these tools by analyzing and synthesizing current practices and lessons emerging from the literature on the topic and proposing a theory-grounded and evidence-informed framework for designing, implementing, and evaluating public health data dashboards. International Registered Report Identifier (IRRID): DERR1-10.2196/52843 UR - https://www.researchprotocols.org/2024/1/e52843 UR - http://dx.doi.org/10.2196/52843 UR - http://www.ncbi.nlm.nih.gov/pubmed/38753428 ID - info:doi/10.2196/52843 ER - TY - JOUR AU - Gandrup, Julie AU - Selby, A. David AU - Dixon, G. William PY - 2024/5/14 TI - Classifying Self-Reported Rheumatoid Arthritis Flares Using Daily Patient-Generated Data From a Smartphone App: Exploratory Analysis Applying Machine Learning Approaches JO - JMIR Form Res SP - e50679 VL - 8 KW - rheumatoid arthritis KW - flare KW - patient-generated health data KW - smartphone KW - mobile health KW - machine learning KW - arthritis KW - rheumatic KW - rheumatism KW - joint KW - joints KW - arthritic KW - musculoskeletal KW - flares KW - classify KW - classification KW - symptom KW - symptoms KW - mobile phone N2 - Background: The ability to predict rheumatoid arthritis (RA) flares between clinic visits based on real-time, longitudinal patient-generated data could potentially allow for timely interventions to avoid disease worsening. Objective: This exploratory study aims to investigate the feasibility of using machine learning methods to classify self-reported RA flares based on a small data set of daily symptom data collected on a smartphone app. Methods: Daily symptoms and weekly flares reported on the Remote Monitoring of Rheumatoid Arthritis (REMORA) smartphone app from 20 patients with RA over 3 months were used. 
Predictors were several summary features of the daily symptom scores (eg, pain and fatigue) collected in the week leading up to the flare question. We fitted 3 binary classifiers: logistic regression with and without elastic net regularization, a random forest, and naive Bayes. Performance was evaluated according to the area under the curve (AUC) of the receiver operating characteristic curve. For the best-performing model, we considered sensitivity and specificity for different thresholds to illustrate different ways in which the predictive model could behave in a clinical setting. Results: The data comprised an average of 60.6 daily reports and 10.5 weekly reports per participant. Participants reported a median of 2 (IQR 0.75-4.25) flares each over a median follow-up time of 81 (IQR 79-82) days. AUCs were broadly similar between models, but logistic regression with elastic net regularization had the highest AUC of 0.82. At a cutoff requiring specificity to be 0.80, the corresponding sensitivity to detect flares was 0.60 for this model. The positive predictive value (PPV) in this population was 53%, and the negative predictive value (NPV) was 85%. Given the prevalence of flares, the best PPV achieved meant only around 2 of every 3 positive predictions were correct (PPV 0.65). By prioritizing a higher NPV, the model correctly predicted over 9 in every 10 non-flare weeks, but the accuracy of predicted flares fell to only 1 in 2 being correct (NPV and PPV of 0.92 and 0.51, respectively). Conclusions: Predicting self-reported flares based on daily symptom scorings in the preceding week using machine learning methods was feasible. The observed predictive accuracy might improve as we obtain more data, and these exploratory results need to be validated in an external cohort. In the future, analysis of frequently collected patient-generated data may allow us to predict flares before they unfold, opening opportunities for just-in-time adaptive interventions. 
Depending on the nature and implications of an intervention, different cutoff values for an intervention decision need to be considered, as well as the level of predictive certainty required. UR - https://formative.jmir.org/2024/1/e50679 UR - http://dx.doi.org/10.2196/50679 UR - http://www.ncbi.nlm.nih.gov/pubmed/38743480 ID - info:doi/10.2196/50679 ER - TY - JOUR AU - Senior, Rashaud AU - Tsai, Timothy AU - Ratliff, William AU - Nadler, Lisa AU - Balu, Suresh AU - Malcolm, Elizabeth AU - McPeek Hinz, Eugenia PY - 2024/5/9 TI - Evaluation of SNOMED CT Grouper Accuracy and Coverage in Organizing the Electronic Health Record Problem List by Clinical System: Observational Study JO - JMIR Med Inform SP - e51274 VL - 12 KW - electronic health record KW - problem list KW - problem list organization KW - problem list management KW - SNOMED CT KW - SNOMED CT Groupers KW - Systematized Nomenclature of Medicine KW - clinical term KW - ICD-10 KW - International Classification of Diseases N2 - Background: The problem list (PL) is a repository of diagnoses for patients' medical conditions and health-related issues. Unfortunately, over time, our PLs have become overloaded with duplications, conflicting entries, and no-longer-valid diagnoses. The lack of a standardized structure for review adds to the challenges of clinical use. Previously, our default electronic health record (EHR) organized the PL primarily via alphabetization, with other options available, for example, organization by clinical systems or priority settings. The system's PL was built with limited groupers, resulting in many diagnoses that were inconsistent with the expected clinical systems or not associated with any clinical systems at all. As a consequence of these limited EHR configuration options, our PL organization has poorly supported clinical use over time, particularly as the number of diagnoses on the PL has increased. 
Objective: We aimed to measure the accuracy of sorting PL diagnoses into PL system groupers based on Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concept groupers implemented in our EHR. Methods: We transformed and developed 21 system- or condition-based groupers, using 1211 SNOMED CT hierarchical concepts refined with Boolean logic, to reorganize the PL in our EHR. To evaluate the clinical utility of our new groupers, we extracted all diagnoses on the PLs from a convenience sample of 50 patients with 3 or more encounters in the previous year. To provide a spectrum of clinical diagnoses, we included patients from all ages and divided them by sex in a deidentified format. Two physicians independently determined whether each diagnosis was correctly attributed to the expected clinical system grouper. Discrepancies were discussed, and if no consensus was reached, they were adjudicated by a third physician. Descriptive statistics and Cohen κ statistics for interrater reliability were calculated. Results: Our 50-patient sample had a total of 869 diagnoses (range 4-59; median 12, IQR 9-24). The reviewers initially agreed on 821 system attributions. Of the remaining 48 items, 16 required adjudication with the tie-breaking third physician. The calculated κ statistic was 0.7. The PL groupers appropriately associated diagnoses to the expected clinical system with a sensitivity of 97.6%, a specificity of 58.7%, a positive predictive value of 96.8%, and an F1-score of 0.972. Conclusions: We found that PL organization by clinical specialty or condition using SNOMED CT concept groupers accurately reflects clinical systems. Our system groupers were subsequently adopted by our vendor EHR in their foundation system for PL organization. 
UR - https://medinform.jmir.org/2024/1/e51274 UR - http://dx.doi.org/10.2196/51274 ID - info:doi/10.2196/51274 ER - TY - JOUR AU - Trojan, Andreas AU - Kühne, Christian AU - Kiessling, Michael AU - Schumacher, Johannes AU - Dröse, Stefan AU - Singer, Christian AU - Jackisch, Christian AU - Thomssen, Christoph AU - Kullak-Ublick, A. Gerd PY - 2024/5/6 TI - Impact of Electronic Patient-Reported Outcomes on Unplanned Consultations and Hospitalizations in Patients With Cancer Undergoing Systemic Therapy: Results of a Patient-Reported Outcome Study Compared With Matched Retrospective Data JO - JMIR Form Res SP - e55917 VL - 8 KW - systemic cancer therapy KW - electronic patient-reported outcome KW - ePRO KW - ePROs KW - Consilium Care KW - medidux KW - unplanned consultation KW - hospitalization KW - hospitalizations KW - hospitalized KW - cancer KW - oncology KW - side effect KW - side effects KW - adverse KW - chemotherapy KW - patient reported outcome KW - PRO KW - PROs KW - mobile health KW - mHealth KW - app KW - apps KW - application KW - applications KW - mobile phone N2 - Background: The evaluation of electronic patient-reported outcomes (ePROs) is increasingly being used in clinical studies of patients with cancer and enables structured and standardized data collection in patients' everyday lives. So far, few studies or analyses have focused on the medical benefit of ePROs for patients. Objective: The current exploratory analysis aimed to obtain an initial indication of whether the use of the Consilium Care app (recently renamed medidux; mobile Health AG) for structured and regular self-assessment of side effects by ePROs had a recognizable effect on incidences of unplanned consultations and hospitalizations of patients with cancer compared to a control group in a real-world care setting without app use. 
To analyze this, the incidences of unplanned consultations and hospitalizations of patients with cancer using the Consilium Care app that were recorded by the treating physicians as part of the patient-reported outcome (PRO) study were compared retrospectively to corresponding data from a comparable population of patients with cancer collected at 2 Swiss oncology centers during standard-of-care treatment. Methods: Patients with cancer in the PRO study (178 included in this analysis) receiving systemic therapy in a neoadjuvant or noncurative setting performed a self-assessment of side effects via the Consilium Care app over an observational period of 90 days. In this period, unplanned (emergency) consultations and hospitalizations were documented by the participating physicians. The incidence of these events was compared with retrospective data obtained from 2 Swiss tumor centers for a matched cohort of patients with cancer. Results: Both patient groups were comparable in terms of age and gender ratio, as well as the distribution of cancer entities and American Joint Committee on Cancer stages. In total, 139 patients from each group were treated with chemotherapy and 39 with other therapies. Looking at all patients, no significant difference in events per patient was found between the Consilium group and the control group (odds ratio 0.742, 90% CI 0.455-1.206). However, a multivariate regression model revealed that the interaction term between the Consilium group and the factor "chemotherapy" was significant at the 5% level (P=.048). This motivated a corresponding subgroup analysis that indicated a relevant reduction of the risk for the intervention group in the subgroup of patients who underwent chemotherapy. The corresponding odds ratio of 0.53 (90% CI 0.288-0.957) is equivalent to a halving of the risk for patients in the Consilium group and suggests a clinically relevant effect that is significant at a 2-sided 10% level (P=.08, Fisher exact test). 
Conclusions: A comparison of unplanned consultations and hospitalizations from the PRO study with retrospective data from a comparable cohort of patients with cancer suggests a positive effect of regular app-based ePROs for patients receiving chemotherapy. These data are to be verified in the ongoing randomized PRO2 study (registered on ClinicalTrials.gov; NCT05425550). Trial Registration: ClinicalTrials.gov NCT03578731; https://www.clinicaltrials.gov/ct2/show/NCT03578731 International Registered Report Identifier (IRRID): RR2-10.2196/29271 UR - https://formative.jmir.org/2024/1/e55917 UR - http://dx.doi.org/10.2196/55917 UR - http://www.ncbi.nlm.nih.gov/pubmed/38710048 ID - info:doi/10.2196/55917 ER - TY - JOUR AU - Green, Shaw Sara AU - Lee, Sung-Jae AU - Chahin, Samantha AU - Pooler-Burgess, Meardith AU - Green-Jones, Monique AU - Gurung, Sitaji AU - Outlaw, Y. Angulique AU - Naar, Sylvie PY - 2024/5/2 TI - Regulatory Issues in Electronic Health Records for Adolescent HIV Research: Strategies and Lessons Learned JO - JMIR Form Res SP - e46420 VL - 8 KW - electronic health record KW - HIV KW - pragmatic trial KW - regulatory KW - EHR KW - pre-exposure prophylaxis KW - retention KW - attrition KW - dropout KW - legal KW - regulation KW - adherence KW - ethic KW - review board KW - implementation KW - data use KW - privacy N2 - Background: Electronic health records (EHRs) are a cost-effective approach to provide the necessary foundations for clinical trial research. The ability to use EHRs in real-world clinical settings allows for pragmatic approaches to intervention studies with the emerging adult HIV population within these settings; however, the regulatory components related to the use of EHR data in multisite clinical trials pose unique challenges that researchers may find themselves unprepared to address, which may delay study implementation, adversely impact study timelines, and risk noncompliance with established guidance. 
Objective: As part of the larger Adolescent Trials Network (ATN) for HIV/AIDS Interventions Protocol 162b (ATN 162b) study, which evaluated clinical-level outcomes of an intervention including HIV treatment and pre-exposure prophylaxis services to improve retention within the emerging adult HIV population, the objective of this study is to highlight the regulatory process and challenges in the implementation of a multisite pragmatic trial using EHRs, to assist future researchers conducting similar studies in navigating the often time-consuming regulatory process while ensuring adherence to study timelines and compliance with institutional and sponsor guidelines. Methods: Eight sites were engaged in research activities: 4 sites, selected from participant recruitment venues as part of the ATN, participated in the intervention and data extraction activities, and an additional 4 sites were engaged in data management and analysis. The ATN 162b protocol team worked with site personnel to establish the necessary regulatory infrastructure to collect EHR data to evaluate retention in care and viral suppression, as well as paradata on the intervention component to assess the feasibility and acceptability of the mobile health intervention. Methods to develop this infrastructure included site-specific training activities and the development of both institutional reliance and data use agreements. Results: Due to variations in site-specific activities, and the associated regulatory implications, the study team used a phased approach, with the data extraction sites as phase 1 and intervention sites as phase 2. This phased approach was intended to address the unique regulatory needs of all participating sites to ensure that all sites were properly onboarded and all regulatory components were in place. 
Across all sites, the regulatory process spanned 6 months for the 4 data extraction and intervention sites, and up to 10 months for the data management and analysis sites. Conclusions: The process for engaging in multisite clinical trial studies using EHR data is a multistep, collaborative effort that requires proper advance planning from the proposal stage to adequately implement the necessary training and infrastructure. Planning, training, and understanding the various regulatory aspects, including the necessity of data use agreements, reliance agreements, external institutional review board review, and engagement with clinical sites, are foremost considerations to ensure successful implementation and adherence to pragmatic trial timelines and outcomes. UR - https://formative.jmir.org/2024/1/e46420 UR - http://dx.doi.org/10.2196/46420 UR - http://www.ncbi.nlm.nih.gov/pubmed/38696775 ID - info:doi/10.2196/46420 ER - TY - JOUR AU - Gao, Zhenyue AU - Liu, Xiaoli AU - Kang, Yu AU - Hu, Pan AU - Zhang, Xiu AU - Yan, Wei AU - Yan, Muyang AU - Yu, Pengming AU - Zhang, Qing AU - Xiao, Wendong AU - Zhang, Zhengbo PY - 2024/5/2 TI - Improving the Prognostic Evaluation Precision of Hospital Outcomes for Heart Failure Using Admission Notes and Clinical Tabular Data: Multimodal Deep Learning Model JO - J Med Internet Res SP - e54363 VL - 26 KW - heart failure KW - multimodal deep learning KW - mortality prediction KW - admission notes KW - clinical tabular data KW - tabular KW - notes KW - deep learning KW - machine learning KW - cardiology KW - heart KW - cardiac KW - documentation KW - prognostic KW - prognosis KW - prognoses KW - predict KW - prediction KW - predictions KW - predictive N2 - Background: Clinical notes contain contextualized information beyond structured data related to patients' past and current health status. 
Objective: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data. Methods: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors. Results: The study included 9989 (16.4%) patients in the development set, 2497 (14.1%) patients in the internal validation set, 1896 (18.3%) in the prospective validation set, and 7432 (15%) patients in the external validation set. The area under the receiver operating characteristic curve of the models was 0.838 (95% CI 0.827-0.851), 0.849 (95% CI 0.841-0.856), and 0.767 (95% CI 0.762-0.772), for the internal, prospective, and external validation sets, respectively. The area under the receiver operating characteristic curve of the multimodal model outperformed that of the unimodal models in all test sets, and tabular data contributed to higher discrimination. The medical history and physical examination were more useful than other factors in early assessments. 
Conclusions: The multimodal deep learning model for combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method in evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support. UR - https://www.jmir.org/2024/1/e54363 UR - http://dx.doi.org/10.2196/54363 UR - http://www.ncbi.nlm.nih.gov/pubmed/38696251 ID - info:doi/10.2196/54363 ER - TY - JOUR AU - Resendez, Skyler AU - Brown, H. Steven AU - Ruiz Ayala, Sebastian Hugo AU - Rangan, Prahalad AU - Nebeker, Jonathan AU - Montella, Diane AU - Elkin, L. Peter PY - 2024/4/30 TI - Defining the Subtypes of Long COVID and Risk Factors for Prolonged Disease: Population-Based Case-Crossover Study JO - JMIR Public Health Surveill SP - e49841 VL - 10 KW - long COVID KW - PASC KW - postacute sequelae of COVID-19 KW - public health KW - policy initiatives KW - pandemic KW - diagnosis KW - COVID-19 treatment KW - long COVID cause KW - health care support KW - public safety KW - COVID-19 KW - Veterans Affairs KW - United States KW - COVID-19 testing KW - clinician KW - mobile phone N2 - Background: There have been over 772 million confirmed cases of COVID-19 worldwide. A significant portion of these infections will lead to long COVID (post–COVID-19 condition) and its attendant morbidities and costs. Numerous life-altering complications have already been associated with the development of long COVID, including chronic fatigue, brain fog, and dangerous heart rhythms. Objective: We aim to derive an actionable long COVID case definition consisting of significantly increased signs, symptoms, and diagnoses to support pandemic-related clinical, public health, research, and policy initiatives. 
Methods: This research employs a case-crossover population-based study using International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) data generated at Veterans Affairs medical centers nationwide between January 1, 2020, and August 18, 2022. In total, 367,148 individuals with ICD-10-CM data both before and after a positive COVID-19 test were selected for analysis. We compared ICD-10-CM codes assigned 1 to 7 months following each patient's positive test with those assigned up to 6 months prior. Further, 350,315 patients had novel codes assigned during this window of time. We defined signs, symptoms, and diagnoses as being associated with long COVID if they had a novel case frequency of ≥1:1000, and they significantly increased in our entire cohort after a positive test. We present odds ratios with CIs for long COVID signs, symptoms, and diagnoses, organized by ICD-10-CM functional groups and medical specialty. We used our definition to assess long COVID risk based on a patient's demographics, Elixhauser score, vaccination status, and COVID-19 disease severity. Results: We developed a long COVID definition consisting of 323 ICD-10-CM diagnosis codes grouped into 143 ICD-10-CM functional groups that were significantly increased in our 367,148-patient post–COVID-19 population. We defined 17 medical-specialty long COVID subtypes such as cardiology long COVID. Patients who were COVID-19–positive developed signs, symptoms, or diagnoses included in our long COVID definition at a proportion of at least 59.7% (268,320/449,450, based on a denominator of all patients who were COVID-19–positive). The long COVID cohort was 8 years older with more comorbidities (2-year Elixhauser score 7.97 in the patients with long COVID vs 4.21 in the patients with non–long COVID). Patients who had a more severe bout of COVID-19, as judged by their minimum oxygen saturation level, were also more likely to develop long COVID. 
Conclusions: An actionable, data-driven definition of long COVID can help clinicians screen for and diagnose long COVID, allowing identified patients to be admitted into appropriate monitoring and treatment programs. This long COVID definition can also support public health, research, and policy initiatives. Patients with COVID-19 who are older or have low oxygen saturation levels during their bout of COVID-19, or those who have multiple comorbidities, should be preferentially watched for the development of long COVID. UR - https://publichealth.jmir.org/2024/1/e49841 UR - http://dx.doi.org/10.2196/49841 UR - http://www.ncbi.nlm.nih.gov/pubmed/38687984 ID - info:doi/10.2196/49841 ER - TY - JOUR AU - Pilgram, Lisa AU - Meurers, Thierry AU - Malin, Bradley AU - Schaeffner, Elke AU - Eckardt, Kai-Uwe AU - Prasser, Fabian PY - 2024/4/24 TI - The Costs of Anonymization: Case Study Using Clinical Data JO - J Med Internet Res SP - e49445 VL - 26 KW - data sharing KW - anonymization KW - deidentification KW - privacy-utility trade-off KW - privacy-enhancing technologies KW - medical informatics KW - privacy KW - anonymized KW - security KW - identification KW - confidentiality KW - data science N2 - Background: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. 
Objective: The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. Methods: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case–specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case–specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. Results: Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case–specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. Conclusions: Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case–specific anonymization can provide greater utility. 
This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. Trial Registration: German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971 International Registered Report Identifier (IRRID): RR2-10.1093/ndt/gfr456 UR - https://www.jmir.org/2024/1/e49445 UR - http://dx.doi.org/10.2196/49445 UR - http://www.ncbi.nlm.nih.gov/pubmed/38657232 ID - info:doi/10.2196/49445 ER - TY - JOUR AU - Nourallah, Abdulnaser AU - Alshehri, Abdulrahman AU - Alhejazi, Ayman AU - Usman, Binyam AU - ElGohary, Ghada AU - Malhan, Hafiz AU - Motabi, Ibraheem AU - Al Farsi, Khalil AU - Alshuaibi, Mohammed AU - Siddiqui, Mustaqeem AU - Ghonema, Rasha AU - Taha, Yasin Ruba AU - Abouzeid, Tarek AU - Ahmed, Wesam AU - Diab, Mohanad AU - Alhuraiji, Ahmad AU - Rabea, Magdy AU - Chouikrat, Zahir Mohamed PY - 2024/4/24 TI - Real-World Registry on the Pharmacotherapy of Multiple Myeloma and Associated Renal and Pulmonary Impairments in the Greater Gulf Region: Protocol for a Retrospective Real-World Data Study JO - JMIR Res Protoc SP - e49861 VL - 13 KW - Greater Gulf region KW - multiple myeloma KW - pulmonary dysfunction KW - renal impairment KW - RRMM KW - Real-world data N2 - Background: Multiple myeloma (MM) is the second-most common cancer among hematological malignancies. Patients with active disease may experience several comorbidities, including renal insufficiency and asthma, which may lead to treatment failure. The treatment of relapsed or refractory MM (RRMM) has been associated with multiple factors, causing a decline in progression-free survival as well as overall survival with subsequent lines of therapy. Data about the characteristics of this group of patients in the Greater Gulf region are lacking. 
Objective: The primary objective of this study is to describe the disease characteristics and various treatment approaches or regimens used in the management of patients with RRMM in the Greater Gulf region. Methods: We will conduct a regional, retrospective study collecting real-world and epidemiological data on patients with MM in countries of the Greater Gulf region. Medical records will be used to obtain the required data. Around 150 to 170 patients' records are planned to be retrospectively reviewed over 6 months without any cross-sectional or prospective intervention. Cases will be collected from Saudi Arabia, the United Arab Emirates, Kuwait, Oman, and Qatar. Descriptive as well as analytical statistics will be performed on the extracted data. The calculated sample size will allow us to estimate the percentages of RRMM cases with acceptable precision while accounting for the challenges posed by data scarcity. We will obtain a comprehensive description of the demographic profile of patients with MM; treatment outcomes; the proportion of patients with MM with renal impairment and asthma, chronic obstructive pulmonary disease, or both at the time of diagnosis and any subsequent point; and data related to treatment lines, regimens, and MM-associated morbidities. Results: Patient medical records were reviewed between June 2022 and January 2023 for eligibility and data extraction. A total of 148 patients were eligible for study inclusion, of whom 64.2% (n=95) were male and 35.8% (n=53) were female. The study is currently in its final stages of data analysis. The final manuscript is expected to be published in 2024. Conclusions: Although MM is a predominant hematological disease, data on its prevalence and patients' characteristics in the Greater Gulf region are scarce. Therefore, this study will give us real-world insights into disease characteristics and various management approaches of patients with MM in the Greater Gulf region. 
International Registered Report Identifier (IRRID): DERR1-10.2196/49861 UR - https://www.researchprotocols.org/2024/1/e49861 UR - http://dx.doi.org/10.2196/49861 UR - http://www.ncbi.nlm.nih.gov/pubmed/38657230 ID - info:doi/10.2196/49861 ER - TY - JOUR AU - Abu Attieh, Hammam AU - Neves, Telmo Diogo AU - Guedes, Mariana AU - Mirandola, Massimo AU - Dellacasa, Chiara AU - Rossi, Elisa AU - Prasser, Fabian PY - 2024/4/23 TI - A Scalable Pseudonymization Tool for Rapid Deployment in Large Biomedical Research Networks: Development and Evaluation Study JO - JMIR Med Inform SP - e49646 VL - 12 KW - biomedical research KW - research network KW - data sharing KW - data protection KW - privacy KW - pseudonymization N2 - Background: The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share, and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially for handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data and biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows the necessary linkage and reidentification. Objective: Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions. Moreover, using centralized services is often not possible, for example, when data are reused and consent for this type of data processing is lacking. 
We present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of the rapid pandemic response in Europe. Methods: To overcome challenges caused by the heterogeneity of IT infrastructures across institutions, the OPT was developed based on programmable runtime environments available at practically every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. Special care has been taken to ensure that the algorithms implemented are efficient so that the OPT can be used to pseudonymize large data sets, which we demonstrate through a comprehensive evaluation. Results: The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux, and MacOS. It provides multiuser support and is configurable to meet the needs of different types of research projects. Within the ORCHESTRA research network, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. As of June 2023, the software manages data about more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, pseudonymizing 100,000 subjects in 10 seconds using Microsoft Excel and in 54 seconds using LibreOffice. Conclusions: Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. 
The tool is highly configurable and available as open-source software. UR - https://medinform.jmir.org/2024/1/e49646 UR - http://dx.doi.org/10.2196/49646 ID - info:doi/10.2196/49646 ER - TY - JOUR AU - Karimian Sichani, Elnaz AU - Smith, Aaron AU - El Emam, Khaled AU - Mosquera, Lucy PY - 2024/4/22 TI - Creating High-Quality Synthetic Health Data: Framework for Model Development and Validation JO - JMIR Form Res SP - e53241 VL - 8 KW - synthetic data KW - tensor decomposition KW - data sharing KW - data utility KW - data privacy KW - electronic health record KW - longitudinal KW - model development KW - model validation KW - generative models N2 - Background: Electronic health records are a valuable source of patient information that must be properly deidentified before being shared with researchers. This process requires expertise and time. In addition, synthetic data have considerably reduced the restrictions on the use and sharing of real data, allowing researchers to access it more rapidly with far fewer privacy constraints. Therefore, there has been a growing interest in establishing a method to generate synthetic data that protects patients' privacy while properly reflecting the data. Objective: This study aims to develop and validate a model that generates valuable synthetic longitudinal health data while protecting the privacy of the patients whose data are collected. Methods: We investigated the best model for generating synthetic health data, with a focus on longitudinal observations. We developed a generative model that relies on the generalized canonical polyadic (GCP) tensor decomposition. This model also involves sampling from a latent factor matrix of GCP decomposition, which contains patient factors, using sequential decision trees, copula, and Hamiltonian Monte Carlo methods. We applied the proposed model to samples from the MIMIC-III (version 1.4) data set. 
Numerous analyses and experiments were conducted with different data structures and scenarios. We assessed the similarity between our synthetic data and the real data by conducting utility assessments. These assessments evaluate the structure and general patterns present in the data, such as dependency structure, descriptive statistics, and marginal distributions. Regarding privacy disclosure, our model preserves privacy by preventing the direct sharing of patient information and eliminating the one-to-one link between the observed and model tensor records. This was achieved by simulating and modeling a latent factor matrix of GCP decomposition associated with patients. Results: The findings show that our model is a promising method for generating synthetic longitudinal health data that is similar enough to real data. It can preserve the utility and privacy of the original data while also handling various data structures and scenarios. In certain experiments, all simulation methods used in the model produced the same high level of performance. Our model is also capable of addressing the challenge of sampling patients from electronic health records. This means that we can simulate a variety of patients in the synthetic data set, which may differ in number from the patients in the original data. Conclusions: We have presented a generative model for producing synthetic longitudinal health data. The model is formulated by applying the GCP tensor decomposition. We have provided 3 approaches for the synthesis and simulation of a latent factor matrix following the process of factorization. In brief, we have reduced the challenge of synthesizing massive longitudinal health data to synthesizing a nonlongitudinal and significantly smaller data set. 
UR - https://formative.jmir.org/2024/1/e53241 UR - http://dx.doi.org/10.2196/53241 UR - http://www.ncbi.nlm.nih.gov/pubmed/38648097 ID - info:doi/10.2196/53241 ER - TY - JOUR AU - Siepe, Sebastian Björn AU - Sander, Christian AU - Schultze, Martin AU - Kliem, Andreas AU - Ludwig, Sascha AU - Hegerl, Ulrich AU - Reich, Hanna PY - 2024/4/18 TI - Time-Varying Network Models for the Temporal Dynamics of Depressive Symptomatology in Patients With Depressive Disorders: Secondary Analysis of Longitudinal Observational Data JO - JMIR Ment Health SP - e50136 VL - 11 KW - depression KW - time series analysis KW - network analysis KW - experience sampling KW - idiography KW - time varying KW - mobile phone N2 - Background: As depression is highly heterogeneous, an increasing number of studies investigate person-specific associations of depressive symptoms in longitudinal data. However, most studies in this area of research conceptualize symptom interrelations to be static and time invariant, which may lead to important temporal features of the disorder being missed. Objective: To reveal the dynamic nature of depression, we aimed to use a recently developed technique to investigate whether and how associations among depressive symptoms change over time. Methods: Using daily data (mean length 274, SD 82 d) of 20 participants with depression, we modeled idiographic associations among depressive symptoms, rumination, sleep, and quantity and quality of social contacts as dynamic networks using time-varying vector autoregressive models. Results: The resulting models showed marked interindividual and intraindividual differences. For some participants, associations among variables changed in the span of some weeks, whereas they stayed stable over months for others. Our results further indicated nonstationarity in all participants. 
Conclusions: Idiographic symptom networks can provide insights into the temporal course of mental disorders and open new avenues of research for the study of the development and stability of psychopathological processes. UR - https://mental.jmir.org/2024/1/e50136 UR - http://dx.doi.org/10.2196/50136 UR - http://www.ncbi.nlm.nih.gov/pubmed/38635978 ID - info:doi/10.2196/50136 ER - TY - JOUR AU - Wang, Echo H. AU - Weiner, P. Jonathan AU - Saria, Suchi AU - Kharrazi, Hadi PY - 2024/4/18 TI - Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis JO - J Med Internet Res SP - e47125 VL - 26 KW - algorithmic bias KW - model bias KW - predictive models KW - model fairness KW - health disparity KW - hospital readmission KW - retrospective analysis N2 - Background: The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited. Objective: This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics. Methods: We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. 
Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations. Results: The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations. Conclusions: Caution must be taken when interpreting fairness measures' face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities. 
UR - https://www.jmir.org/2024/1/e47125 UR - http://dx.doi.org/10.2196/47125 UR - http://www.ncbi.nlm.nih.gov/pubmed/38422347 ID - info:doi/10.2196/47125 ER - TY - JOUR AU - Wündisch, Eric AU - Hufnagl, Peter AU - Brunecker, Peter AU - Meier zu Ummeln, Sophie AU - Träger, Sarah AU - Kopp, Marcus AU - Prasser, Fabian AU - Weber, Joachim PY - 2024/4/17 TI - Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study JO - JMIR Med Inform SP - e53075 VL - 12 KW - pseudonymisation KW - architecture KW - scalability KW - trusted third party KW - application KW - security KW - consent KW - identifying data KW - infrastructure KW - modular KW - software KW - implementation KW - user interface KW - health platform KW - data management KW - data privacy KW - health record KW - electronic health record KW - EHR KW - pseudonymization N2 - Background: Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organizational units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent. Objective: Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. 
The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité – Universitätsmedizin Berlin. Methods: The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational state transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights. Results: By the end of 2022, the TTP had already supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently. Conclusions: With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. 
We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures. UR - https://medinform.jmir.org/2024/1/e53075 UR - http://dx.doi.org/10.2196/53075 ID - info:doi/10.2196/53075 ER - TY - JOUR AU - Taye, Kefiyalew Biniam AU - Gezie, Derseh Lemma AU - Atnafu, Asmamaw AU - Mengiste, Anagaw Shegaw AU - Kaasbøll, Jens AU - Gullslett, Knudsen Monika AU - Tilahun, Binyam PY - 2024/4/5 TI - Effect of Performance-Based Nonfinancial Incentives on Data Quality in Individual Medical Records of Institutional Births: Quasi-Experimental Study JO - JMIR Med Inform SP - e54278 VL - 12 KW - individual medical records KW - data quality KW - completeness KW - consistency KW - nonfinancial incentives KW - institutional birth KW - health care quality KW - quasi-experimental design KW - Ethiopia N2 - Background: Despite the potential of routine health information systems in tackling persistent maternal deaths stemming from poor service quality at health facilities during and around childbirth, research has demonstrated their suboptimal performance, evident from the incomplete and inaccurate data unfit for practical use. There is a consensus that nonfinancial incentives can enhance health care providers' commitment toward achieving the desired health care quality. However, there is limited evidence regarding the effectiveness of nonfinancial incentives in improving the data quality of institutional birth services in Ethiopia. Objective: This study aimed to evaluate the effect of performance-based nonfinancial incentives on the completeness and consistency of data in the individual medical records of women who availed institutional birth services in northwest Ethiopia. Methods: We used a quasi-experimental design with a comparator group in the pre-post period, using a sample of 1969 women's medical records. The study was conducted in the "Wegera" and "Tach-armacheho" 
districts, which served as the intervention and comparator districts, respectively. The intervention comprised a multicomponent nonfinancial incentive, including smartphones, flash disks, power banks, certificates, and scholarships. Personal records of women who gave birth within 6 months before (April to September 2020) and after (February to July 2021) the intervention were included. Three distinct women's birth records were examined: the integrated card, integrated individual folder, and delivery register. The completeness of the data was determined by examining the presence of data elements, whereas the consistency check involved evaluating the agreement of data elements among women's birth records. The average treatment effect on the treated (ATET), with 95% CIs, was computed using a difference-in-differences model. Results: In the intervention district, data completeness in women's personal records was nearly 4 times higher (ATET 3.8, 95% CI 2.2-5.5; P=.02), and consistency was approximately 12 times more likely (ATET 11.6, 95% CI 4.18-19; P=.03) than in the comparator district. Conclusions: This study indicates that performance-based nonfinancial incentives enhance data quality in the personal records of institutional births. Health care planners can adapt these incentives to improve the data quality of comparable medical records, particularly pregnancy-related data within health care facilities. Future research is needed to assess the effectiveness of nonfinancial incentives across diverse contexts to support successful scale-up. UR - https://medinform.jmir.org/2024/1/e54278 UR - http://dx.doi.org/10.2196/54278 UR - http://www.ncbi.nlm.nih.gov/pubmed/38578684 ID - info:doi/10.2196/54278 ER - TY - JOUR AU - McMurry, J. Andrew AU - Zipursky, R. Amy AU - Geva, Alon AU - Olson, L. Karen AU - Jones, R. James AU - Ignatov, Vladimir AU - Miller, A. Timothy AU - Mandl, D. 
Kenneth PY - 2024/4/4 TI - Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study JO - J Med Internet Res SP - e53367 VL - 26 KW - natural language processing KW - COVID-19 KW - artificial intelligence KW - AI KW - public health KW - biosurveillance KW - surveillance KW - respiratory KW - infectious KW - pulmonary KW - SARS-CoV-2 KW - symptom KW - symptoms KW - detect KW - detection KW - pipeline KW - pipelines KW - clinical note KW - clinical notes KW - documentation KW - emergency KW - urgent KW - pediatric KW - pediatrics KW - paediatric KW - paediatrics KW - child KW - children KW - youth KW - adolescent KW - adolescents KW - teen KW - teens KW - teenager KW - teenagers KW - diagnose KW - diagnosis KW - diagnostic KW - diagnostics N2 - Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. Objective: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. Methods: Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. 
For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. Results: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients who had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. Conclusions: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance. UR - https://www.jmir.org/2024/1/e53367 UR - http://dx.doi.org/10.2196/53367 UR - http://www.ncbi.nlm.nih.gov/pubmed/38573752 ID - info:doi/10.2196/53367 ER - TY - JOUR AU - Gu, Xinchun AU - Watson, Conall AU - Agrawal, Utkarsh AU - Whitaker, Heather AU - Elson, H. 
William AU - Anand, Sneha AU - Borrow, Ray AU - Buckingham, Anna AU - Button, Elizabeth AU - Curtis, Lottie AU - Dunn, Dominic AU - Elliot, J. Alex AU - Ferreira, Filipa AU - Goudie, Rosalind AU - Hoang, Uy AU - Hoschler, Katja AU - Jamie, Gavin AU - Kar, Debasish AU - Kele, Beatrix AU - Leston, Meredith AU - Linley, Ezra AU - Macartney, Jack AU - Marsden, L. Gemma AU - Okusi, Cecilia AU - Parvizi, Omid AU - Quinot, Catherine AU - Sebastianpillai, Praveen AU - Sexton, Vanashree AU - Smith, Gillian AU - Suli, Timea AU - Thomas, B. Nicholas P. AU - Thompson, Catherine AU - Todkill, Daniel AU - Wimalaratna, Rashmi AU - Inada-Kim, Matthew AU - Andrews, Nick AU - Tzortziou-Brown, Victoria AU - Byford, Rachel AU - Zambon, Maria AU - Lopez-Bernal, Jamie AU - de Lusignan, Simon PY - 2024/4/3 TI - Postpandemic Sentinel Surveillance of Respiratory Diseases in the Context of the World Health Organization Mosaic Framework: Protocol for a Development and Evaluation Study Involving the English Primary Care Network 2023-2024 JO - JMIR Public Health Surveill SP - e52047 VL - 10 KW - sentinel surveillance KW - pandemic KW - COVID-19 KW - human influenza KW - influenza vaccines KW - respiratory tract infections KW - vaccination KW - World Health Organization KW - respiratory syncytial virus KW - phenotype KW - computerized medical record system N2 - Background: Prepandemic sentinel surveillance focused on improved management of winter pressures, with influenza-like illness (ILI) being the key clinical indicator. The World Health Organization (WHO) global standards for influenza surveillance include monitoring acute respiratory infection (ARI) and ILI. The WHO's mosaic framework recommends that the surveillance strategies of countries include the virological monitoring of respiratory viruses with pandemic potential such as influenza. 
The Oxford-Royal College of General Practitioners Research and Surveillance Centre (RSC), in collaboration with the UK Health Security Agency (UKHSA), has provided sentinel surveillance since 1967, including virology since 1993. Objective: We aim to describe the RSC's plans for sentinel surveillance in the 2023-2024 season and evaluate these plans against the WHO mosaic framework. Methods: Our approach, which includes patient and public involvement, contributes to surveillance objectives across all 3 domains of the mosaic framework. We will generate an ARI phenotype to enable reporting of this indicator in addition to ILI. These data will support UKHSA's sentinel surveillance, including vaccine effectiveness and burden of disease studies. The panel of virology tests analyzed in UKHSA's reference laboratory will remain unchanged, with additional plans for point-of-care testing, pneumococcus testing, and asymptomatic screening. Our sampling framework for serological surveillance will provide greater representativeness and more samples from younger people. We will create a biomedical resource that enables linkage between clinical data held in the RSC and virology data, including sequencing data, held by the UKHSA. We describe the governance framework for the RSC. Results: We are co-designing our communication about data sharing and sampling, contextualized by the mosaic framework, with national and general practice patient and public involvement groups. We present our ARI digital phenotype and the key data RSC network members are requested to include in computerized medical records. We will share data with the UKHSA to report vaccine effectiveness for COVID-19 and influenza, assess the disease burden of respiratory syncytial virus, and perform syndromic surveillance. Virological surveillance will include COVID-19, influenza, respiratory syncytial virus, and other common respiratory viruses. 
We plan to pilot point-of-care testing for group A streptococcus, urine tests for pneumococcus, and asymptomatic testing. We will integrate test requests and results with the laboratory-computerized medical record system. A biomedical resource will enable research linking clinical data to virology data. The legal basis for the RSC's pseudonymized data extract is The Health Service (Control of Patient Information) Regulations 2002, and all nonsurveillance uses require research ethics approval. Conclusions: The RSC extended its surveillance activities to meet more but not all of the mosaic framework's objectives. We have introduced an ARI indicator. We seek to expand our surveillance scope and could do more around transmissibility and the benefits and risks of nonvaccine therapies. UR - https://publichealth.jmir.org/2024/1/e52047 UR - http://dx.doi.org/10.2196/52047 UR - http://www.ncbi.nlm.nih.gov/pubmed/38569175 ID - info:doi/10.2196/52047 ER - TY - JOUR AU - von Wolff, Michael AU - Germeyer, Ariane AU - Böttcher, Bettina AU - Magaton, Martha Isotta AU - Marcu, Irene AU - Pape, Janna AU - Sänger, Nicole AU - Nordhoff, Verena AU - Roumet, Marie AU - Weidlinger, Susanna PY - 2024/3/20 TI - Evaluation of the Gonadotoxicity of Cancer Therapies to Improve Counseling of Patients About Fertility and Fertility Preservation Measures: Protocol for a Retrospective Systematic Data Analysis and a Prospective Cohort Study JO - JMIR Res Protoc SP - e51145 VL - 13 KW - fertility KW - fertility preservation KW - cancer KW - gonadotoxicity KW - FertiPROTEKT KW - FertiTOX KW - data analysis KW - cohort study KW - internet KW - platform KW - internet-based KW - data N2 - Background: Cytotoxic treatments such as chemo- and radiotherapy and immune therapies are required in cancer diseases. These therapies have the potential to cure patients but may also have an impact on gonadal function and, therefore, on fertility. 
Consequently, fertility preservation treatments such as freezing of gametes and gonadal tissue might be required. However, as detailed data about the necessity to perform fertility preservation treatment are very limited, this study was designed to fill this data gap. Objective: The primary objective of this study is to analyze the impact of cancer therapies and chemotherapies on the ovarian reserve and sperm quality. Secondary objectives are to analyze the (1) impact of cancer therapies and chemotherapies on other fertility parameters and (2) probability of undergoing fertility preservation treatments in relation to specific cancer diseases and treatment protocols and the probability of using the frozen gametes and gonadal tissue to achieve pregnancies. Methods: First, previously published studies on the gonadotoxicity of chemo- and radiotherapies among patients with cancer will be systematically analyzed. Second, a prospective cohort study set up by approximately 70 centers in Germany, Switzerland, and Austria will collect the following data: ovarian function by analyzing anti-Müllerian hormone (AMH) concentrations and testicular function by analyzing sperm parameters and total testosterone immediately before and around 1 year after gonadotoxic therapies (short-term fertility). A follow-up of these fertility parameters, including history of conceptions, will be performed 5 and 10 years after gonadotoxic therapies (long-term fertility). Additionally, the proportion of patients undergoing fertility-preserving procedures, their satisfaction with these procedures, and the amount of gametes and gonadal tissue and the children achieved by using the frozen material will be analyzed. Third, the data will be merged to create the internet-based data platform FertiTOX. The platform will be structured in accordance with the ICD (International Classification of Diseases) classification of cancer diseases and will be easily accessible using a specific app. 
Results: Several funding bodies have funded this study. Ten systematic reviews are in progress and the first one has been accepted for publication. All Swiss and many German and Austrian ethics committees have provided their approval for the prospective cohort study. The study registry has been set up, and a study website has been created. In total, 50 infertility centers have already been prepared for data collection, which started on December 1, 2023. Conclusions: The study can be expected to bridge the data gap regarding the gonadotoxicity of cancer therapies to better counsel patients about their infertility risk and their need to undergo fertility preservation procedures. Initial data are expected to be uploaded on the FertiTOX platform in 2026. Trial Registration: ClinicalTrials.gov NCT05885048; https://clinicaltrials.gov/study/NCT05885048 International Registered Report Identifier (IRRID): DERR1-10.2196/51145 UR - https://www.researchprotocols.org/2024/1/e51145 UR - http://dx.doi.org/10.2196/51145 UR - http://www.ncbi.nlm.nih.gov/pubmed/38506900 ID - info:doi/10.2196/51145 ER - TY - JOUR AU - Hatef, Elham AU - Chang, Hsien-Yen AU - Richards, M. Thomas AU - Kitchen, Christopher AU - Budaraju, Janya AU - Foroughmand, Iman AU - Lasser, C. Elyse AU - Weiner, P. 
Jonathan PY - 2024/3/12 TI - Development of a Social Risk Score in the Electronic Health Record to Identify Social Needs Among Underserved Populations: Retrospective Study JO - JMIR Form Res SP - e54732 VL - 8 KW - AI KW - algorithms KW - artificial intelligence KW - community health KW - deep learning KW - EHR KW - electronic health record KW - machine learning KW - ML KW - population demographics KW - population health KW - practical models KW - predictive analytics KW - predictive modeling KW - predictive modelling KW - predictive models KW - predictive system KW - public health KW - public surveillance KW - SDOH KW - social determinants of health KW - social needs KW - social risks N2 - Background: Patients with unmet social needs and social determinants of health (SDOH) challenges continue to face a disproportionate risk of increased prevalence of disease, health care use, higher health care costs, and worse outcomes. Some existing predictive models have used the available data on social needs and SDOH challenges to predict health-related social needs or the need for various social service referrals. Despite these one-off efforts, the work to date suggests that many technical and organizational challenges must be surmounted before SDOH-integrated solutions can be implemented on an ongoing, wide-scale basis within most US-based health care organizations. Objective: We aimed to retrieve available information in the electronic health record (EHR) relevant to the identification of persons with social needs and to develop a social risk score for use within clinical practice to better identify patients at risk of having future social needs. Methods: We conducted a retrospective study using EHR data (2016-2021) and data from the US Census American Community Survey. We developed a prospective model using current year-1 risk factors to predict future year-2 outcomes within four 2-year cohorts. 
Predictors of interest included demographics, previous health care use, comorbidity, previously identified social needs, and neighborhood characteristics as reflected by the area deprivation index. The outcome variable was a binary indicator reflecting the likelihood of the presence of a patient with social needs. We applied a generalized estimating equation approach, adjusting for patient-level risk factors, the possible effect of geographically clustered data, and the effect of multiple visits for each patient. Results: The study population of 1,852,228 patients included middle-aged (mean age range 53.76-55.95 years), White (range 324,279/510,770, 63.49% to 290,688/488,666, 64.79%), and female (range 314,741/510,770, 61.62% to 278,488/448,666, 62.07%) patients from neighborhoods with high socioeconomic status (mean area deprivation index percentile range 28.76-30.31). Between 8.28% (37,137/448,666) and 11.55% (52,037/450,426) of patients across the study cohorts had at least 1 social need documented in their EHR, with safety issues and economic challenges (ie, financial resource strain, employment, and food insecurity) being the most common documented social needs (87,152/1,852,228, 4.71% and 58,242/1,852,228, 3.14% of overall patients, respectively). The model had an area under the curve of 0.702 (95% CI 0.699-0.705) in predicting prospective social needs in the overall study population. Previous social needs (odds ratio 3.285, 95% CI 3.237-3.335) and emergency department visits (odds ratio 1.659, 95% CI 1.634-1.684) were the strongest predictors of future social needs. Conclusions: Our model provides an opportunity to make use of available EHR data to help identify patients with high social needs. Our proposed social risk score could help identify the subset of patients who would most benefit from further social needs screening and data collection to avoid potentially more burdensome primary data collection on all patients in a target population of interest. 
UR - https://formative.jmir.org/2024/1/e54732 UR - http://dx.doi.org/10.2196/54732 UR - http://www.ncbi.nlm.nih.gov/pubmed/38470477 ID - info:doi/10.2196/54732 ER - TY - JOUR AU - Declerck, Jens AU - Kalra, Dipak AU - Vander Stichele, Robert AU - Coorevits, Pascal PY - 2024/3/6 TI - Frameworks, Dimensions, Definitions of Aspects, and Assessment Methods for the Appraisal of Quality of Health Data for Secondary Use: Comprehensive Overview of Reviews JO - JMIR Med Inform SP - e51560 VL - 12 KW - data quality KW - data quality dimensions KW - data quality assessment KW - secondary use KW - data quality framework KW - fit for purpose N2 - Background: Health care has not reached the full potential of the secondary use of health data because of, among other issues, concerns about the quality of the data being used. The shift toward digital health has led to an increase in the volume of health data. However, this increase in quantity has not been matched by a proportional improvement in the quality of health data. Objective: This review aims to offer a comprehensive overview of the existing frameworks for data quality dimensions and assessment methods for the secondary use of health data. In addition, it aims to consolidate the results into a unified framework. Methods: A review of reviews was conducted, including reviews describing frameworks of data quality dimensions and their assessment methods, specifically from a secondary use perspective. Reviews were excluded if they were not related to the health care ecosystem, lacked relevant information related to our research objective, or were published in languages other than English. Results: A total of 22 reviews were included, comprising 22 frameworks, with 23 different terms for dimensions, and 62 definitions of dimensions. All dimensions were mapped toward the data quality framework of the European Institute for Innovation through Health Data. 
In total, 8 reviews mentioned 38 different assessment methods, pertaining to 31 definitions of the dimensions. Conclusions: The findings in this review revealed a lack of consensus in the literature regarding the terminology, definitions, and assessment methods for data quality dimensions. This creates ambiguity and difficulties in developing specific assessment methods. This study goes a step further by assigning all observed definitions to a consolidated framework of 9 data quality dimensions. UR - https://medinform.jmir.org/2024/1/e51560 UR - http://dx.doi.org/10.2196/51560 UR - http://www.ncbi.nlm.nih.gov/pubmed/38446534 ID - info:doi/10.2196/51560 ER - TY - JOUR AU - Peng, Yuan AU - Bathelt, Franziska AU - Gebler, Richard AU - Gött, Robert AU - Heidenreich, Andreas AU - Henke, Elisa AU - Kadioglu, Dennis AU - Lorenz, Stephan AU - Vengadeswaran, Abishaa AU - Sedlmayr, Martin PY - 2024/2/14 TI - Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review JO - JMIR Med Inform SP - e52967 VL - 12 KW - ETL KW - ELT KW - Extract-Load-Transform KW - Extract-Transform-Load KW - interoperability KW - metadata-driven KW - medical domain KW - data harmonization N2 - Background: Multisite clinical studies are increasingly using real-world data to gain real-world evidence. However, due to the heterogeneity of source data, it is difficult to analyze such data in a unified way across clinics. Therefore, the implementation of Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes for harmonizing local health data is necessary, in order to guarantee the data quality for research. However, the development of such processes is time-consuming and unsustainable. A promising way to ease this is the generalization of ETL/ELT processes. Objective: In this work, we investigate existing possibilities for the development of generic ETL/ELT processes. 
Particularly, we focus on approaches with low development complexity by using descriptive metadata and structural metadata. Methods: We conducted a literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We used 4 publication databases (ie, PubMed, IEEE Xplore, Web of Science, and BioMed Central) to search for relevant publications from 2012 to 2022. The PRISMA flow was then visualized using an R-based tool (Evidence Synthesis Hackathon). All relevant contents of the publications were extracted into a spreadsheet for further analysis and visualization. Results: Following the PRISMA guidelines, we included 33 publications in this literature review. All included publications were categorized into 7 different focus groups (ie, medicine, data warehouse, big data, industry, geoinformatics, archaeology, and military). Based on the extracted data, ontology-based and rule-based approaches were the 2 most used approaches in different thematic categories. Different approaches and tools were chosen to achieve different purposes within the use cases. Conclusions: Our literature review shows that using metadata-driven (MDD) approaches to develop an ETL/ELT process can serve different purposes in different thematic categories. The results show that it is promising to implement an ETL/ELT process by applying an MDD approach to automate the data transformation from Fast Healthcare Interoperability Resources to Observational Medical Outcomes Partnership Common Data Model. However, determining an appropriate MDD approach and tool to implement such an ETL/ELT process remains a challenge. This is due to the lack of comprehensive insight into the characterizations of the MDD approaches presented in this study. Therefore, our next step is to evaluate the MDD approaches presented in this study and to determine the most appropriate MDD approaches and the way to integrate them into the ETL/ELT process. 
This could verify the ability of using MDD approaches to generalize the ETL process for harmonizing medical data. UR - https://medinform.jmir.org/2024/1/e52967 UR - http://dx.doi.org/10.2196/52967 UR - http://www.ncbi.nlm.nih.gov/pubmed/38354027 ID - info:doi/10.2196/52967 ER - TY - JOUR AU - Blasini, Romina AU - Strantz, Cosima AU - Gulden, Christian AU - Helfer, Sven AU - Lidke, Jakub AU - Prokosch, Hans-Ulrich AU - Sohrabi, Keywan AU - Schneider, Henning PY - 2024/1/31 TI - Evaluation of Eligibility Criteria Relevance for the Purpose of IT-Supported Trial Recruitment: Descriptive Quantitative Analysis JO - JMIR Form Res SP - e49347 VL - 8 KW - CTRSS KW - clinical trial recruitment support system KW - PRS KW - patient recruitment system KW - clinical trials KW - classifications KW - data groups KW - data elements KW - data classification KW - criteria KW - relevance KW - automated clinical trials KW - participants KW - clinical trial N2 - Background: Clinical trials (CTs) are crucial for medical research; however, they frequently fall short of the requisite number of participants who meet all eligibility criteria (EC). A clinical trial recruitment support system (CTRSS) is developed to help identify potential participants by performing a search on a specific data pool. The accuracy of the search results is directly related to the quality of the data used for comparison. Data accessibility can present challenges, making it crucial to identify the necessary data for a CTRSS to query. Prior research has examined the data elements frequently used in CT EC but has not evaluated which criteria are actually used to search for participants. Although all EC must be met to enroll a person in a CT, not all criteria have the same importance when searching for potential participants in an existing data pool, such as an electronic health record, because some of the criteria are only relevant at the time of enrollment. 
Objective: In this study, we investigated which groups of data elements are relevant in practice for finding suitable participants and whether there are typical elements that are not relevant and can therefore be omitted. Methods: We asked trial experts and CTRSS developers to first categorize the EC of their CTs according to data element groups and then to classify them into 1 of 3 categories: necessary, complementary, and irrelevant. In addition, the experts assessed whether a criterion was documented (on paper or digitally) or whether it was information known only to the treating physicians or patients. Results: We reviewed 82 CTs with 1132 unique EC. Of these 1132 EC, 350 (30.9%) were considered necessary, 224 (19.8%) complementary, and 341 (30.1%) totally irrelevant. To identify the most relevant data elements, we introduced the data element relevance index (DERI). This describes the percentage of studies in which the corresponding data element occurs and is also classified as necessary or complementary. We found that the query of "diagnosis" was relevant for finding participants in 79 (96.3%) of the CTs. This group was followed by "date of birth/age" with a DERI of 85.4% (n=70) and "procedure" with a DERI of 35.4% (n=29). Conclusions: The distribution of data element groups in CTs has been heterogeneously described in previous works. Therefore, we recommend identifying the percentage of CTs in which data element groups can be found as a more reliable way to determine the relevance of EC. Only necessary and complementary criteria should be included in this DERI. UR - https://formative.jmir.org/2024/1/e49347 UR - http://dx.doi.org/10.2196/49347 UR - http://www.ncbi.nlm.nih.gov/pubmed/38294862 ID - info:doi/10.2196/49347 ER - TY - JOUR AU - Qian, Lei AU - Sy, S. Lina AU - Hong, Vennis AU - Glenn, C. Sungching AU - Ryan, S. Denison AU - Nelson, C. Jennifer AU - Hambidge, J. Simon AU - Crane, Bradley AU - Zerbo, Ousseny AU - DeSilva, B. Malini AU - Glanz, M. 
Jason AU - Donahue, G. James AU - Liles, Elizabeth AU - Duffy, Jonathan AU - Xu, Stanley PY - 2024/1/23 TI - Impact of the COVID-19 Pandemic on Health Care Utilization in the Vaccine Safety Datalink: Retrospective Cohort Study JO - JMIR Public Health Surveill SP - e48159 VL - 10 KW - COVID-19 pandemic KW - health care utilization KW - telehealth KW - inpatient KW - emergency department KW - outpatient KW - vaccine safety KW - electronic health record KW - resource allocation KW - difference-in-difference KW - interrupted time series analysis N2 - Background: Understanding the long-term impact of the COVID-19 pandemic on health care utilization is important to health care organizations and policy makers for strategic planning, as well as to researchers when designing studies that use observational electronic health record data during the pandemic period. Objective: This study aimed to evaluate the changes in health care utilization across all care settings among a large, diverse, and insured population in the United States during the COVID-19 pandemic. Methods: We conducted a retrospective cohort study within 8 health care organizations participating in the Vaccine Safety Datalink Project using electronic health record data from members of all ages from January 1, 2017, to December 31, 2021. The visit rates per person-year were calculated monthly during the study period for 4 health care settings combined as well as by inpatient, emergency department (ED), outpatient, and telehealth settings, both among all members and members without COVID-19. Difference-in-difference analysis and interrupted time series analysis were performed to assess the changes in visit rates from the prepandemic period (January 2017 to February 2020) to the early pandemic period (April-December 2020) and the later pandemic period (July-December 2021), respectively. 
An exploratory analysis was also conducted to assess trends through June 2023 at one of the largest sites, Kaiser Permanente Southern California. Results: The study included more than 11 million members from 2017 to 2021. Compared with the prepandemic period, we found reductions in visit rates during the early pandemic period for all in-person care settings. During the later pandemic period, overall use reached 8.36 visits per person-year, exceeding the prepandemic level of 7.49 visits per person-year in 2019 (adjusted percent change 5.1%, 95% CI 0.6%-9.9%); inpatient and ED visits returned to prepandemic levels among all members, although they remained low at 0.095 and 0.241 visits per person-year, indicating a 7.5% and 8% decrease compared to prepandemic levels among members without COVID-19, respectively. Telehealth visits, which were approximately 42% of the volume of outpatient visits during the later pandemic period, increased by 97.5% (95% CI 86.0%-109.7%) from 0.865 visits per person-year in 2019 to 2.35 visits per person-year in the later pandemic period. The trends in Kaiser Permanente Southern California were similar to those of the entire study population. Visit rates from January 2022 to June 2023 were stable and appeared to be a continuation of the use levels observed at the end of 2021. Conclusions: Telehealth services became a mainstay of the health care system during the late COVID-19 pandemic period. Inpatient and ED visits returned to prepandemic levels, although they remained low among members without evidence of COVID-19. Our findings provide valuable information for strategic resource allocation for postpandemic patient care and for designing observational studies involving the pandemic period. UR - https://publichealth.jmir.org/2024/1/e48159 UR - http://dx.doi.org/10.2196/48159 UR - http://www.ncbi.nlm.nih.gov/pubmed/38091476 ID - info:doi/10.2196/48159 ER - TY - JOUR AU - Valvi, Nimish AU - McFarlane, Timothy AU - Allen, S. 
Katie AU - Gibson, Joseph P. AU - Dixon, Edward Brian PY - 2023/12/27 TI - Identification of Hypertension in Electronic Health Records Through Computable Phenotype Development and Validation for Use in Public Health Surveillance: Retrospective Study JO - JMIR Form Res SP - e46413 VL - 7 KW - computable phenotypes KW - electronic health records KW - health information exchange KW - hypertension KW - population surveillance KW - public health informatics N2 - Background: Electronic health record (EHR) systems are widely used in the United States to document care delivery and outcomes. Health information exchange (HIE) networks, which integrate EHR data from the various health care providers treating patients, are increasingly used to analyze population-level data. Existing methods for population health surveillance of essential hypertension by public health authorities may be complemented using EHR data from HIE networks to characterize disease burden at the community level. Objective: We aimed to derive and validate computable phenotypes (CPs) to estimate hypertension prevalence for population-based surveillance using an HIE network. Methods: Using existing data available from an HIE network, we developed 6 candidate CPs for essential (primary) hypertension in an adult population from a medium-sized Midwestern metropolitan area in the United States. A total of 2 independent clinician reviewers validated the phenotypes through a manual chart review of 150 randomly selected patient records. We assessed the precision of CPs by calculating sensitivity, specificity, positive predictive value (PPV), F1-score, and validity of chart reviews using prevalence-adjusted bias-adjusted κ. We further used the most balanced CP to estimate the prevalence of hypertension in the population. Results: Among a cohort of 548,232 adults, 6 CPs produced PPVs ranging from 71% (95% CI 64.3%-76.9%) to 95.7% (95% CI 84.9%-98.9%). The F1-score ranged from 0.40 to 0.91. 
The prevalence-adjusted bias-adjusted κ revealed a high percentage agreement of 0.88 for hypertension. Similarly, interrater agreement for individual phenotype determination demonstrated substantial agreement (range 0.70-0.88) for all 6 phenotypes examined. A phenotype based solely on diagnostic codes possessed reasonable performance (F1-score=0.63; PPV=95.1%) but was imbalanced with low sensitivity (47.6%). The most balanced phenotype (F1-score=0.91; PPV=83.5%) included diagnosis, blood pressure measurements, and medications and identified 210,764 (38.4%) individuals with hypertension during the study period (2014-2015). Conclusions: We identified several high-performing phenotypes to estimate essential hypertension prevalence for local public health surveillance using EHR data. Given the increasing availability of EHR systems in the United States and other nations, leveraging EHR data has the potential to enhance surveillance of chronic disease in health systems and communities. Yet given variability in performance, public health authorities will need to decide whether to seek optimal balance or declare a preference for algorithms that lean toward sensitivity or specificity to estimate population prevalence of disease. 
UR - https://formative.jmir.org/2023/1/e46413 UR - http://dx.doi.org/10.2196/46413 UR - http://www.ncbi.nlm.nih.gov/pubmed/38150296 ID - info:doi/10.2196/46413 ER - TY - JOUR AU - Matsuzaki, Keiichi AU - Kitayama, Megumi AU - Yamamoto, Keiichi AU - Aida, Rei AU - Imai, Takumi AU - Ishida, Mami AU - Katafuchi, Ritsuko AU - Kawamura, Tetsuya AU - Yokoo, Takashi AU - Narita, Ichiei AU - Suzuki, Yusuke PY - 2023/12/21 TI - A Pragmatic Method to Integrate Data From Preexisting Cohort Studies Using the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model: Case Study JO - JMIR Med Inform SP - e46725 VL - 11 KW - data warehousing KW - data management KW - database integration KW - integrate multiple data sets KW - Study Data Tabulation Model KW - SDTM KW - Clinical Data Interchange Standards Consortium KW - CDISC N2 - Background: In recent years, many researchers have focused on the use of legacy data, such as pooled analyses that collect and reanalyze data from multiple studies. However, the methodology for the integration of preexisting databases whose data were collected for different purposes has not been established. Previously, we developed a tool to efficiently generate Study Data Tabulation Model (SDTM) data from hypothetical clinical trial data using the Clinical Data Interchange Standards Consortium (CDISC) SDTM. Objective: This study aimed to design a practical model for integrating preexisting databases using the CDISC SDTM. Methods: Data integration was performed in three phases: (1) the confirmation of the variables, (2) SDTM mapping, and (3) the generation of the SDTM data. In phase 1, the definitions of the variables in detail were confirmed, and the data sets were converted to a vertical structure. In phase 2, the items derived from the SDTM format were set as mapping items. 
Three types of metadata (domain name, variable name, and test code), based on the CDISC SDTM, were embedded in the Research Electronic Data Capture (REDCap) field annotation. In phase 3, the data dictionary, including the SDTM metadata, was outputted in the Operational Data Model (ODM) format. Finally, the mapped SDTM data were generated using REDCap2SDTM version 2. Results: SDTM data were generated as a comma-separated values file for each of the 7 domains defined in the metadata. A total of 17 items were commonly mapped to 3 databases. Because the SDTM data were set in each database correctly, we were able to integrate 3 independently preexisting databases into 1 database in the CDISC SDTM format. Conclusions: Our project suggests that the CDISC SDTM is useful for integrating multiple preexisting databases. UR - https://medinform.jmir.org/2023/1/e46725 UR - http://dx.doi.org/10.2196/46725 ID - info:doi/10.2196/46725 ER - TY - JOUR AU - Bazoge, Adrien AU - Morin, Emmanuel AU - Daille, Béatrice AU - Gourraud, Pierre-Antoine PY - 2023/12/15 TI - Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review JO - JMIR Med Inform SP - e42477 VL - 11 KW - natural language processing KW - data warehousing KW - clinical data warehouse KW - artificial intelligence KW - AI N2 - Background: In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. Objective: The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. 
It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. Methods: This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. Results: We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%). Conclusions: CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice. 
UR - https://medinform.jmir.org/2023/1/e42477 UR - http://dx.doi.org/10.2196/42477 UR - http://www.ncbi.nlm.nih.gov/pubmed/38100200 ID - info:doi/10.2196/42477 ER - TY - JOUR AU - Gierend, Kerstin AU - Freiesleben, Sherry AU - Kadioglu, Dennis AU - Siegel, Fabian AU - Ganslandt, Thomas AU - Waltemath, Dagmar PY - 2023/11/8 TI - The Status of Data Management Practices Across German Medical Data Integration Centers: Mixed Methods Study JO - J Med Internet Res SP - e48809 VL - 25 KW - data management KW - provenance KW - traceability KW - metadata KW - data integration center KW - maturity model N2 - Background: In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are of importance throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce the confidence and quality of the processed data. The need to implement maintainable data management practices is undisputed, but there is a great lack of clarity on the status. Objective: Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and present recommendations to enable a trustful dissemination and reuse of routine health care data. Methods: In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire that we tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist. 
Results: Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to reach more clarity and to help define enhanced data management strategies. Conclusions: The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer and maintained health research data of high quality. UR - https://www.jmir.org/2023/1/e48809 UR - http://dx.doi.org/10.2196/48809 UR - http://www.ncbi.nlm.nih.gov/pubmed/37938878 ID - info:doi/10.2196/48809 ER - TY - JOUR AU - Yang, Xulin AU - Qiu, Hang AU - Wang, Liya AU - Wang, Xiaodong PY - 2023/10/26 TI - Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study JO - J Med Internet Res SP - e44417 VL - 25 KW - colorectal cancer KW - survival prediction KW - machine learning KW - time-to-event KW - SHAP KW - SHapley Additive exPlanations N2 - Background: Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling. Objective: This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival. 
Methods: The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models, including random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (or Cox-Time), and neural multitask logistic regression (N-MTLR) in predicting CRC-specific survival. Multiple imputation by chained equations method was applied to handle missing values in variables. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. Model performance was evaluated in stratified 5-fold cross-validation repeated 5 times by using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance. Results: A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95% CI 0.779-0.799) and the RSF model produced better-calibrated survival estimates (integrated Brier score 0.096, 95% CI 0.094-0.099), but these are not statistically significant. Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models have comparable predictive accuracy to the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than default strategies of treating all or no patients at a range of clinically reasonable risk thresholds. 
The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival. Conclusions: This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. The transparent time-to-event ML models help clinicians to more accurately predict the survival rate for these patients and improve patient outcomes by enabling personalized treatment plans that are informed by explainable ML models. UR - https://www.jmir.org/2023/1/e44417 UR - http://dx.doi.org/10.2196/44417 UR - http://www.ncbi.nlm.nih.gov/pubmed/37883174 ID - info:doi/10.2196/44417 ER - TY - JOUR AU - Zhang, Wang AU - Zhu, Zhu AU - Zhao, Yonggen AU - Li, Zheming AU - Chen, Lingdong AU - Huang, Jian AU - Li, Jing AU - Yu, Gang PY - 2023/9/20 TI - Analyzing and Forecasting Pediatric Fever Clinic Visits in High Frequency Using Ensemble Time-Series Methods After the COVID-19 Pandemic in Hangzhou, China: Retrospective Study JO - JMIR Med Inform SP - e45846 VL - 11 KW - time-series forecasting KW - outpatient visits KW - hospital management KW - pediatric fever clinic KW - long sequence KW - visits in high frequency KW - COVID-19 N2 - Background: The COVID-19 pandemic has significantly altered the global health and medical landscape. In response to the outbreak, Chinese hospitals have established 24-hour fever clinics to serve patients with COVID-19. The emergence of these clinics and the impact of successive epidemics have led to a surge in visits, placing pressure on hospital resource allocation and scheduling. Therefore, accurate prediction of outpatient visits is essential for informed decision-making in hospital management. 
Objective: Hourly visits to fever clinics can be characterized as a long-sequence time series in high frequency, which also exhibits distinct patterns due to the particularity of pediatric treatment behavior in an epidemic context. This study aimed to build models to forecast fever clinic visits with outstanding prediction accuracy and robust generalization across forecast horizons. In addition, this study hopes to provide a research paradigm for time-series forecasting problems, which involves an exploratory analysis revealing data patterns before model development. Methods: An exploratory analysis, including graphical analysis, autocorrelation analysis, and seasonal-trend decomposition, was conducted to reveal the seasonality and structural patterns of the retrospective fever clinic visit data. The data were found to exhibit multiseasonality and nonlinearity. On the basis of these results, an ensemble of time-series analysis methods, including individual models and their combinations, was validated on the data set. Root mean square error and mean absolute error were used as accuracy metrics, with the cross-validation of rolling forecasting origin conducted across different forecast horizons. Results: Hybrid models generally outperformed individual models across most forecast horizons. A novel model combination, the hybrid neural network autoregressive (NNAR)-seasonal and trend decomposition using Loess forecasting (STLF), was identified as the optimal model for our forecasting task, with the best performance in all accuracy metrics (root mean square error=20.1, mean absolute error=14.3) for the 15-days-ahead forecasts and an overall advantage for forecast horizons that were 1 to 30 days ahead. 
Conclusions: Although forecast accuracy tends to decline with an increasing forecast horizon, the hybrid NNAR-STLF model is applicable for short-, medium-, and long-term forecasts owing to its ability to fit multiseasonality (captured by the STLF component) and nonlinearity (captured by the NNAR component). The model identified in this study is also applicable to hospitals in other regions with similar epidemic outpatient configurations or forecasting tasks whose data conform to long-sequence time series in high frequency exhibiting multiseasonal and nonlinear patterns. However, as external variables and disruptive events were not accounted for, the model performance declined slightly following changes in the COVID-19 containment policy in China. Future work may seek to improve accuracy by incorporating external variables that characterize moving events or other factors as well as by adding data from different organizations to enhance algorithm generalization. UR - https://medinform.jmir.org/2023/1/e45846 UR - http://dx.doi.org/10.2196/45846 UR - http://www.ncbi.nlm.nih.gov/pubmed/37728972 ID - info:doi/10.2196/45846 ER - TY - JOUR AU - Esmaeilzadeh, Pouyan AU - Mirzaei, Tala PY - 2023/8/18 TI - Role of Incentives in the Use of Blockchain-Based Platforms for Sharing Sensitive Health Data: Experimental Study JO - J Med Internet Res SP - e41805 VL - 25 KW - blockchain technology KW - data sharing KW - health data KW - clinical research KW - incentive mechanisms N2 - Background: Blockchain is an emerging technology that enables secure and decentralized approaches to reduce technical risks and governance challenges associated with sharing data. Although blockchain-based solutions have been suggested for sharing health information, it is still unclear whether a suitable incentive mechanism (intrinsic or extrinsic) can be identified to encourage individuals to share their sensitive data for research purposes. 
Objective: This study aimed to investigate how important extrinsic incentives are and what type of incentive is the best option in blockchain-based platforms designed for sharing sensitive health information. Methods: In this study, we conducted 3 experiments with 493 individuals to investigate the role of extrinsic incentives (ie, cryptocurrency, money, and recognition) in data sharing with research organizations. Results: The findings highlight that offering different incentives is insufficient to encourage individuals to use blockchain technology or to change their perceptions about the technology's premise for sharing sensitive health data. The results demonstrate that individuals still attribute serious risks to blockchain-based platforms. Privacy and security concerns, trust issues, lack of knowledge about the technology, lack of public acceptance, and lack of regulations are reported as top risks. In terms of attracting people to use blockchain-based platforms for data sharing in health care, we show that the effects of extrinsic motivations (cryptoincentives, money, and status) are significantly overshadowed by inhibitors to technology use. Conclusions: We suggest that before emphasizing the use of various types of extrinsic incentives, the users must be educated about the capabilities and benefits offered by this technology. Thus, an essential first step for shifting from an institution-based data exchange to a patient-centric data exchange (using blockchain) is addressing technology inhibitors to promote patient-driven data access control. This study shows that extrinsic incentives alone are inadequate to change users' perceptions, increase their trust, or encourage them to use technology for sharing health data. UR - https://www.jmir.org/2023/1/e41805 UR - http://dx.doi.org/10.2196/41805 UR - http://www.ncbi.nlm.nih.gov/pubmed/37594783 ID - info:doi/10.2196/41805 ER - TY - JOUR AU - Areias, C. 
Anabela AU - Janela, Dora AU - Molinos, Maria AU - Moulder, G. Robert AU - Bento, Virgílio AU - Yanamadala, Vijay AU - Cohen, P. Steven AU - Correia, Dias Fernando AU - Costa, Fabíola PY - 2023/8/15 TI - Managing Musculoskeletal Pain in Older Adults Through a Digital Care Solution: Secondary Analysis of a Prospective Clinical Study JO - JMIR Rehabil Assist Technol SP - e49673 VL - 10 KW - aged KW - digital therapy KW - eHealth KW - musculoskeletal conditions KW - older adults KW - pain KW - physical therapy KW - telehealth KW - telerehabilitation N2 - Background: Aging is closely associated with an increased prevalence of musculoskeletal conditions. Digital musculoskeletal care interventions emerged to deliver timely and proper rehabilitation; however, older adults frequently face specific barriers and concerns with digital care programs (DCPs). Objective: This study aims to investigate whether known barriers and concerns of older adults impacted their participation in or engagement with a DCP or the observed clinical outcomes in comparison with younger individuals. Methods: We conducted a secondary analysis of a single-arm investigation assessing the recovery of patients with musculoskeletal conditions following a DCP for up to 12 weeks. Patients were categorized according to age: ≤44 years old (young adults), 45-64 years old (middle-aged adults), and ≥65 years old (older adults). DCP access and engagement were evaluated by assessing starting proportions, completion rates, ability to perform exercises autonomously, assistance requests, communication with their physical therapist, and program satisfaction. Clinical outcomes included change between baseline and program end for pain (including response rate to a minimal clinically important difference of 30%), analgesic usage, mental health, work productivity, and non-work-related activity impairment. 
Results: Of 16,229 patients, 12,082 started the program: 38.3% (n=4629) were young adults, 55.7% (n=6726) were middle-aged adults, and 6% (n=727) were older adults. Older patients were more likely to start the intervention and to complete the program compared to young adults (odds ratio [OR] 1.72, 95% CI 1.45-2.06; P<.001 and OR 2.40, 95% CI 1.97-2.92; P<.001, respectively) and middle-aged adults (OR 1.22, 95% CI 1.03-1.45; P=.03 and OR 1.38, 95% CI 1.14-1.68; P=.001, respectively). Whereas older patients requested more technical assistance and exhibited a slower learning curve in exercise performance, their engagement was higher, as reflected by higher adherence to both exercise and education pieces. Older patients interacted more with the physical therapist (mean 12.6, SD 18.4 vs mean 10.7, SD 14.7 of young adults) and showed higher satisfaction scores (mean 8.7, SD 1.9). Significant improvements were observed in all clinical outcomes and were similar between groups, including pain response rates (young adults: 949/1516, 62.6%; middle-aged adults: 1848/2834, 65.2%; and older adults: 241/387, 62.3%; P=.17). Conclusions: Older adults showed high adherence, engagement, and satisfaction with the DCP, which were greater than in their younger counterparts, together with significant clinical improvements in all studied outcomes. This suggests DCPs can successfully address and overcome some of the barriers surrounding the participation and adequacy of digital models in the older adult population. 
UR - https://rehab.jmir.org/2023/1/e49673 UR - http://dx.doi.org/10.2196/49673 UR - http://www.ncbi.nlm.nih.gov/pubmed/37465960 ID - info:doi/10.2196/49673 ER - TY - JOUR AU - Ahmadi, Najia AU - Zoch, Michele AU - Kelbert, Patricia AU - Noll, Richard AU - Schaaf, Jannik AU - Wolfien, Markus AU - Sedlmayr, Martin PY - 2023/8/3 TI - Methods Used in the Development of Common Data Models for Health Data: Scoping Review JO - JMIR Med Inform SP - e45116 VL - 11 KW - common data model KW - common data elements KW - health data KW - electronic health record KW - Observational Medical Outcomes Partnership KW - stakeholder involvement KW - Data harmonisation KW - Interoperability KW - Standardized Data Repositories KW - Suggestive Development Process KW - Healthcare KW - Medical Informatics N2 - Background: Common data models (CDMs) are essential tools for data harmonization, which can lead to significant improvements in the health domain. CDMs unite data from disparate sources and ease collaborations across institutions, resulting in the generation of large standardized data repositories across different entities. An overview of existing CDMs and methods used to develop these data sets may assist in the development process of future models for the health domain, such as for decision support systems. Objective: This scoping review investigates methods used in the development of CDMs for health data. We aim to provide a broad overview of approaches and guidelines that are used in the development of CDMs (ie, common data elements or common data sets) for different health domains on an international level. Methods: This scoping review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. We conducted the literature search in prominent databases, namely, PubMed, Web of Science, Science Direct, and Scopus, from January 2000 to March 2022. We identified and screened 1309 articles. 
The included articles were evaluated based on the type of adopted method, which was used in the conception, users' needs collection, implementation, and evaluation phases of CDMs, and whether stakeholders (such as medical experts, patients' representatives, and IT staff) were involved during the process. Moreover, the models were grouped into iterative or linear types based on the imperativeness of the stages during development. Results: We finally identified 59 articles that fit our eligibility criteria. Of these articles, 45 specifically focused on common medical conditions, 10 focused on rare medical conditions, and the remaining 4 focused on both conditions. The development process usually involved stakeholders but in different ways (eg, working group meetings, Delphi approaches, interviews, and questionnaires). Twenty-two models followed an iterative process. Conclusions: The included articles showed the diversity of methods used to develop a CDM in different domains of health. We highlight the need for more specialized CDM development methods in the health domain and propose a suggestive development process that might ease the development of CDMs in the health domain in the future. UR - https://medinform.jmir.org/2023/1/e45116 UR - http://dx.doi.org/10.2196/45116 UR - http://www.ncbi.nlm.nih.gov/pubmed/37535410 ID - info:doi/10.2196/45116 ER - TY - JOUR AU - Kusejko, Katharina AU - Smith, Daniel AU - Scherrer, Alexandra AU - Paioni, Paolo AU - Kohns Vasconcelos, Malte AU - Aebi-Popp, Karoline AU - Kouyos, D. Roger AU - Günthard, F. Huldrych AU - Kahlert, R. 
Christian PY - 2023/5/31 TI - Migrating a Well-Established Longitudinal Cohort Database From Oracle SQL to Research Electronic Data Entry (REDCap): Data Management Research and Design Study JO - JMIR Form Res SP - e44567 VL - 7 KW - REDCap KW - cohort study KW - data collection KW - electronic case report forms KW - eCRF KW - software KW - digital solution KW - electronic data entry KW - HIV N2 - Background: Providing user-friendly electronic data collection tools for large multicenter studies is key for obtaining high-quality research data. Research Electronic Data Capture (REDCap) is a software solution developed for setting up research databases with integrated graphical user interfaces for electronic data entry. The Swiss Mother and Child HIV Cohort Study (MoCHiV) is a longitudinal cohort study with around 2 million data entries dating back to the early 1980s. Until 2022, data collection in MoCHiV was paper-based. Objective: The objective of this study was to provide a user-friendly graphical interface for electronic data entry for physicians and study nurses reporting MoCHiV data. Methods: MoCHiV collects information on obstetric events among women living with HIV and children born to mothers living with HIV. Until 2022, MoCHiV data were stored in an Oracle SQL relational database. In this project, R and REDCap were used to develop an electronic data entry platform for MoCHiV with migration of already collected data. Results: The key steps for providing an electronic data entry option for MoCHiV were (1) design, (2) data cleaning and formatting, (3) migration and compliance, and (4) add-on features. In the first step, the database structure was defined in REDCap, including the specification of primary and foreign keys, definition of study variables, and the hierarchy of questions (termed "branching logic"). In the second step, data stored in Oracle were cleaned and formatted to adhere to the defined database structure. 
Systematic data checks ensured compliance with all branching logic and levels of categorical variables. REDCap-specific variables and numbering of repeated events for enabling a relational data structure in REDCap were generated using R. In the third step, data were imported to REDCap and then systematically compared to the original data. In the last step, add-on features, such as data access groups, redirections, and summary reports, were integrated to facilitate data entry in the multicenter MoCHiV study. Conclusions: By combining different software tools (Oracle SQL, R, and REDCap) and building a systematic pipeline for data cleaning, formatting, and comparing, we were able to migrate a multicenter longitudinal cohort study from Oracle SQL to REDCap. REDCap offers a flexible way for developing customized study designs, even in the case of longitudinal studies with different study arms (ie, obstetric events, women, and mother-child pairs). However, REDCap does not offer built-in tools for preprocessing large data sets before data import. Additional software is needed (eg, R) for data formatting and cleaning to achieve the predefined REDCap data structure. UR - https://formative.jmir.org/2023/1/e44567 UR - http://dx.doi.org/10.2196/44567 UR - http://www.ncbi.nlm.nih.gov/pubmed/37256686 ID - info:doi/10.2196/44567 ER - TY - JOUR AU - Li, Xi-liang AU - Huang, Hang AU - Lu, Ying AU - Stafford, S. Randall AU - Lima, Maria Simone AU - Mota, Caroline AU - Shi, Xin PY - 2023/5/30 TI - Prediction of Multimorbidity in Brazil: Latest Fifth of a Century Population Study JO - JMIR Public Health Surveill SP - e44647 VL - 9 KW - Brazil KW - demographic factors KW - logistic regression analysis KW - multimorbidity KW - nomogram prediction KW - prevalence N2 - Background: Multimorbidity is characterized by the co-occurrence of 2 or more chronic diseases and has been a focus of the health care sector and health policy makers due to its severe adverse effects. 
Objective: This paper aims to use the latest 2 decades of national health data in Brazil to analyze the effects of demographic factors and predict the impact of various risk factors on multimorbidity. Methods: Data analysis methods include descriptive analysis, logistic regression, and nomogram prediction. The study makes use of a set of national cross-sectional data with a sample size of 877,032. The study used data from 1998, 2003, and 2008 from the Brazilian National Household Sample Survey, and from 2013 and 2019 from the Brazilian National Health Survey. We developed a logistic regression model to assess the influence of risk factors on multimorbidity and predict the influence of the key risk factors in the future, based on the prevalence of multimorbidity in Brazil. Results: Overall, females were 1.7 times more likely to experience multimorbidity than males (odds ratio [OR] 1.72, 95% CI 1.69-1.74). The prevalence of multimorbidity among unemployed individuals was 1.5 times that of employed individuals (OR 1.51, 95% CI 1.49-1.53). Multimorbidity prevalence increased significantly with age. People over 60 years of age were about 20 times more likely to have multiple chronic diseases than those between 18 and 29 years of age (OR 19.6, 95% CI 19.15-20.07). The prevalence of multimorbidity in illiterate individuals was 1.2 times that in literate ones (OR 1.26, 95% CI 1.24-1.28). The subjective well-being of seniors without multimorbidity was 15 times that of people with multimorbidity (OR 15.29, 95% CI 14.97-15.63). Adults with multimorbidity were more than 1.5 times more likely to be hospitalized than those without (OR 1.53, 95% CI 1.50-1.56) and 1.9 times more likely to need medical care (OR 1.94, 95% CI 1.91-1.97). These patterns were similar in all 5 cohort studies and remained stable for over 21 years. A nomogram model was used to predict multimorbidity prevalence under the influence of various risk factors. 
The prediction results were consistent with the effects of logistic regression; older age and poorer participant well-being had the strongest correlation with multimorbidity. Conclusions: Our study shows that multimorbidity prevalence varied little in the past 2 decades but varies widely across social groups. Identifying populations with higher rates of multimorbidity prevalence may improve policy making around multimorbidity prevention and management. The Brazilian government can create public health policies targeting these groups and provide more medical treatment and health services to support and protect people with multimorbidity. UR - https://publichealth.jmir.org/2023/1/e44647 UR - http://dx.doi.org/10.2196/44647 UR - http://www.ncbi.nlm.nih.gov/pubmed/37252771 ID - info:doi/10.2196/44647 ER - TY - JOUR AU - Ma, E. Jessica AU - Lowe, Jared AU - Berkowitz, Callie AU - Kim, Azalea AU - Togo, Ira AU - Musser, Clayton R. AU - Fischer, Jonathan AU - Shah, Kevin AU - Ibrahim, Salam AU - Bosworth, B. Hayden AU - Totten, M. Annette AU - Dolor, Rowena PY - 2023/5/12 TI - Provider Interaction With an Electronic Health Record Notification to Identify Eligible Patients for a Cluster Randomized Trial of Advance Care Planning in Primary Care: Secondary Analysis JO - J Med Internet Res SP - e41884 VL - 25 KW - advance care planning KW - electronic health record KW - notification KW - EHR KW - provider interaction KW - primary care KW - clinical study KW - referral KW - notifications KW - alerts N2 - Background: Advance care planning (ACP) improves patient-provider communication and aligns care to patient values, preferences, and goals. Within a multisite Meta-network Learning and Research Center ACP study, one health system deployed an electronic health record (EHR) notification and algorithm to alert providers about patients potentially appropriate for ACP and the clinical study. 
Objective: The aim of the study is to describe the implementation and usage of an EHR notification for referring patients to an ACP study, evaluate the association of notifications with study referrals and engagement in ACP, and assess provider interactions with and perspectives on the notifications. Methods: A secondary analysis assessed provider usage and their response to the notification (eg, acknowledge, dismiss, or engage patient in ACP conversation and refer patient to the clinical study). We evaluated all patients identified by the EHR algorithm during the Meta-network Learning and Research Center ACP study. Descriptive statistics compared patients referred to the study to those who were not referred to the study. Health care utilization, hospice referrals, and mortality as well as documentation and billing for ACP and related legal documents are reported. We evaluated associations between notifications and provider actions (ie, referral to the study, ACP note documentation, and ACP billing). Provider free-text comments in the notifications were summarized qualitatively. Providers were surveyed on their satisfaction with the notification. Results: Among the 2877 patients identified by the EHR algorithm over 20 months, 17,047 unique notifications were presented to 45 providers in 6 clinics, who then referred 290 (10%) patients. Providers had a median of 269 (IQR 65-552) total notifications, and patients had a median of 4 (IQR 2-8). Patients with more (over 5) notifications were less likely to be referred to the study than those with fewer notifications (57/1092, 5.2% vs 233/1785, 13.1%; P<.001). The most common free-text comment on the notification was lack of time. Providers who referred patients to the study were more likely to document ACP and submit ACP billing codes (P<.001). Among the 11 providers who completed the survey, 7 (64%) would recommend the notification; however, providers also reported that the notification impacted clinical workflow (n=9, 82%) and was difficult to navigate (n=6, 55%). 
Conclusions: An EHR notification can be implemented to remind providers to both perform ACP conversations and refer patients to a clinical study. There were diminishing returns after the fifth EHR notification, beyond which additional notifications did not lead to more trial referrals, ACP documentation, or ACP billing. Creation and optimization of EHR notifications for study referrals and ACP should consider the provider user, their workflow, and alert fatigue to improve implementation and adoption. Trial Registration: ClinicalTrials.gov NCT03577002; https://clinicaltrials.gov/ct2/show/NCT03577002 UR - https://www.jmir.org/2023/1/e41884 UR - http://dx.doi.org/10.2196/41884 UR - http://www.ncbi.nlm.nih.gov/pubmed/37171856 ID - info:doi/10.2196/41884 ER - TY - JOUR AU - Fitzpatrick, K. Natalie AU - Dobson, Richard AU - Roberts, Angus AU - Jones, Kerina AU - Shah, D. Anoop AU - Nenadic, Goran AU - Ford, Elizabeth PY - 2023/5/3 TI - Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders JO - JMIR Med Inform SP - e45534 VL - 11 KW - consent KW - databank KW - electronic health records KW - free text KW - governance KW - natural language processing KW - public involvement KW - unstructured text N2 - Background: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. 
However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. Objective: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. Methods: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). Results: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. Conclusions: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery. UR - https://medinform.jmir.org/2023/1/e45534 UR - http://dx.doi.org/10.2196/45534 UR - http://www.ncbi.nlm.nih.gov/pubmed/37133927 ID - info:doi/10.2196/45534 ER - TY - JOUR AU - Vestesson, Maria Emma AU - De Corte, An Kaat Lieve AU - Crellin, Elizabeth AU - Ledger, Jean AU - Bakhai, Minal AU - Clarke, M. 
Geraldine PY - 2023/5/2 TI - Consultation Rate and Mode by Deprivation in English General Practice From 2018 to 2022: Population-Based Study JO - JMIR Public Health Surveill SP - e44944 VL - 9 KW - primary care KW - deprivation KW - England KW - remote consultations KW - pandemic KW - COVID-19 KW - CPRD Aurum KW - Clinical Practice Research Datalink KW - general practice KW - health inequalities KW - consultation KW - electronic health records KW - age KW - sex KW - care delivery KW - database KW - information management KW - population data KW - Index of Multiple Deprivation KW - IMD KW - cohort KW - longitudinal KW - data linkage KW - data link KW - random KW - consult KW - person-year N2 - Background: The COVID-19 pandemic has had a significant impact on primary care service delivery with an increased use of remote consultations. With general practice delivering record numbers of appointments and rising concerns around access, funding, and staffing in the UK National Health Service, we assessed contemporary trends in consultation rate and modes (ie, face-to-face versus remote). Objective: This paper describes trends in consultation rates in general practice in England for key demographics before and during the COVID-19 pandemic. We explore the use of remote and face-to-face consultations with regard to socioeconomic deprivation to understand the possible effect of changes in consultation modes on health inequalities. Methods: We did a retrospective analysis of 9,429,919 consultations by general practitioners, nurses, or other health care professionals between March 2018 and February 2022 for patients registered at 397 general practices in England. We used routine electronic health records from Clinical Practice Research Datalink Aurum with linkage to national data sets. Negative binomial models were used to predict consultation rates and modes (ie, remote versus face-to-face) by age, sex, and socioeconomic deprivation over time. 
Results: Overall consultation rates increased by 15% from 4.92 in 2018-2019 to 5.66 in 2021-2022 with some fluctuation during the start of the COVID-19 pandemic. The breakdown into face-to-face and remote consultations shows that the pandemic precipitated a rapid increase in remote consultations across all groups, but the extent varies by age. Consultation rates increased with increasing levels of deprivation. Socioeconomic differences in consultation rates, adjusted for sex and age, halved during the pandemic (from 0.36 to 0.18, indicating more consultations in the most deprived), effectively narrowing relative differences between deprivation quintiles. This trend remains when stratified by sex, but the difference across deprivation quintiles is smaller for men. The most deprived saw a relatively larger increase in remote and decrease in face-to-face consultation rates compared to the least deprived. Conclusions: The substantial increases in consultation rates observed in this study imply an increased pressure on general practice. The narrowing of consultation rates between deprivation quintiles is cause for concern, given ample evidence that health needs are greater in more deprived areas. UR - https://publichealth.jmir.org/2023/1/e44944 UR - http://dx.doi.org/10.2196/44944 UR - http://www.ncbi.nlm.nih.gov/pubmed/37129943 ID - info:doi/10.2196/44944 ER - TY - JOUR AU - Hopcroft, EM Lisa AU - Massey, Jon AU - Curtis, J. Helen AU - Mackenna, Brian AU - Croker, Richard AU - Brown, D. Andrew AU - O'Dwyer, Thomas AU - Macdonald, Orla AU - Evans, David AU - Inglesby, Peter AU - Bacon, CJ Sebastian AU - Goldacre, Ben AU - Walker, J. 
Alex PY - 2023/4/19 TI - Data-Driven Identification of Unusual Prescribing Behavior: Analysis and Use of an Interactive Data Tool Using 6 Months of Primary Care Data From 6500 Practices in England JO - JMIR Med Inform SP - e44237 VL - 11 KW - dashboard KW - data science KW - EHR KW - electronic health records KW - general practice KW - outliers KW - prescribing KW - primary care N2 - Background: Approaches to addressing unwarranted variation in health care service delivery have traditionally relied on the prospective identification of activities and outcomes, based on a hypothesis, with subsequent reporting against defined measures. Practice-level prescribing data in England are made publicly available by the National Health Service (NHS) Business Services Authority for all general practices. There is an opportunity to adopt a more data-driven approach to capture variability and identify outliers by applying hypothesis-free, data-driven algorithms to national data sets. Objective: This study aimed to develop and apply a hypothesis-free algorithm to identify unusual prescribing behavior in primary care data at multiple administrative levels in the NHS in England and to visualize these results using organization-specific interactive dashboards, thereby demonstrating proof of concept for prioritization approaches. Methods: Here we report a new data-driven approach to quantify how "unusual" the prescribing rates of a particular chemical within an organization are compared to peer organizations, over a period of 6 months (June-December 2021). This is followed by a ranking to identify which chemicals are the most notable outliers in each organization. These outlying chemicals are calculated for all practices, primary care networks, clinical commissioning groups, and sustainability and transformation partnerships in England. Our results are presented via organization-specific interactive dashboards, the iterative development of which has been informed by user feedback. 
Results: We developed interactive dashboards for every practice (n=6476) in England, highlighting the unusual prescribing of 2369 chemicals (dashboards are also provided for 42 sustainability and transformation partnerships, 106 clinical commissioning groups, and 1257 primary care networks). User feedback and internal review of case studies demonstrate that our methodology identifies prescribing behavior that sometimes warrants further investigation or is a known issue. Conclusions: Data-driven approaches have the potential to overcome existing biases with regard to the planning and execution of audits, interventions, and policy making within NHS organizations, potentially revealing new targets for improved health care service delivery. We present our dashboards as a proof of concept for generating candidate lists to aid expert users in their interpretation of prescribing data and prioritize further investigations and qualitative research in terms of potential targets for improved performance. UR - https://medinform.jmir.org/2023/1/e44237 UR - http://dx.doi.org/10.2196/44237 UR - http://www.ncbi.nlm.nih.gov/pubmed/37074763 ID - info:doi/10.2196/44237 ER - TY - JOUR AU - Weinert, Lina AU - Klass, Maximilian AU - Schneider, Gerd AU - Heinze, Oliver PY - 2023/4/18 TI - Exploring Stakeholder Requirements to Enable Research and Development of Artificial Intelligence Algorithms in a Hospital-Based Generic Infrastructure: Results of a Multistep Mixed Methods Study JO - JMIR Form Res SP - e43958 VL - 7 KW - artificial intelligence KW - requirements analysis KW - mixed-methods KW - data availability KW - qualitative research N2 - Background: Legal, controlled, and regulated access to high-quality data from academic hospitals currently poses a barrier to the development and testing of new artificial intelligence (AI) algorithms. To overcome this barrier, the German Federal Ministry of Health supports the "pAItient" 
(Protected Artificial Intelligence Innovation Environment for Patient Oriented Digital Health Solutions for developing, testing and evidence-based evaluation of clinical value) project, with the goal of establishing an AI Innovation Environment at the Heidelberg University Hospital, Germany. It is designed as a proof-of-concept extension to the preexisting Medical Data Integration Center. Objective: The first part of the pAItient project aims to explore stakeholders' requirements for developing AI in partnership with an academic hospital and granting AI experts access to anonymized personal health data. Methods: We designed a multistep mixed methods approach. First, researchers and employees from stakeholder organizations were invited to participate in semistructured interviews. In the following step, questionnaires were developed based on the participants' answers and distributed among the stakeholders' organizations. In addition, patients and physicians were interviewed. Results: The identified requirements covered a wide range and were sometimes conflicting. Relevant patient requirements included adequate provision of necessary information for data use, a clear medical objective for the research and development activities, trustworthiness of the organization collecting the patient data, and assurance that data cannot be reidentified. Requirements of AI researchers and developers encompassed contact with clinical users, an acceptable user interface (UI) for shared data platforms, a stable connection to the planned infrastructure, relevant use cases, and assistance in dealing with data privacy regulations. In a next step, a requirements model was developed, which depicts the identified requirements in different layers. This developed model will be used to communicate stakeholder requirements within the pAItient project consortium.
Conclusions: The study led to the identification of necessary requirements for the development, testing, and validation of AI applications within a hospital-based generic infrastructure. A requirements model was developed, which will inform the next steps in the development of an AI innovation environment at our institution. Results from our study replicate previous findings from other contexts and will add to the emerging discussion on the use of routine medical data for the development of AI applications. International Registered Report Identifier (IRRID): RR2-10.2196/42208 UR - https://formative.jmir.org/2023/1/e43958 UR - http://dx.doi.org/10.2196/43958 UR - http://www.ncbi.nlm.nih.gov/pubmed/37071450 ID - info:doi/10.2196/43958 ER - TY - JOUR AU - Frid, Santiago AU - Pastor Duran, Xavier AU - Bracons Cucó, Guillem AU - Pedrera-Jiménez, Miguel AU - Serrano-Balazote, Pablo AU - Muñoz Carrero, Adolfo AU - Lozano-Rubí, Raimundo PY - 2023/3/8 TI - An Ontology-Based Approach for Consolidating Patient Data Standardized With European Norm/International Organization for Standardization 13606 (EN/ISO 13606) Into Joint Observational Medical Outcomes Partnership (OMOP) Repositories: Description of a Methodology JO - JMIR Med Inform SP - e44547 VL - 11 KW - health information interoperability KW - health research KW - health information standards KW - dual model KW - secondary use of health data KW - Observational Medical Outcomes Partnership Common Data Model KW - European Norm/International Organization for Standardization 13606 KW - health records KW - ontologies KW - clinical data N2 - Background: To discover new knowledge from data, the data must be correct and in a consistent format. OntoCR, a clinical repository developed at Hospital Clínic de Barcelona, uses ontologies to represent clinical knowledge and map locally defined variables to health information standards and common data models.
Objective: The aim of the study is to design and implement a scalable methodology based on the dual-model paradigm and the use of ontologies to consolidate clinical data from different organizations in a standardized repository for research purposes without loss of meaning. Methods: First, the relevant clinical variables are defined, and the corresponding European Norm/International Organization for Standardization (EN/ISO) 13606 archetypes are created. Data sources are then identified, and an extract, transform, and load process is carried out. Once the final data set is obtained, the data are transformed to create EN/ISO 13606–normalized electronic health record (EHR) extracts. Afterward, ontologies that represent archetyped concepts and map them to EN/ISO 13606 and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) standards are created and uploaded to OntoCR. Data stored in the extracts are inserted into their corresponding places in the ontology, thus obtaining instantiated patient data in the ontology-based repository. Finally, data can be extracted via SPARQL queries as OMOP CDM–compliant tables. Results: Using this methodology, EN/ISO 13606–standardized archetypes that allow for the reuse of clinical information were created, and the knowledge representation of our clinical repository by modeling and mapping ontologies was extended. Furthermore, EN/ISO 13606–compliant EHR extracts of patients (6803), episodes (13,938), diagnoses (190,878), administered medication (222,225), cumulative drug dose (222,225), prescribed medication (351,247), movements between units (47,817), clinical observations (6,736,745), laboratory observations (3,392,873), limitation of life-sustaining treatment (1,298), and procedures (19,861) were created.
Since the creation of the application that inserts data from extracts into the ontologies is not yet finished, the queries were tested and the methodology was validated by importing data from a random subset of patients into the ontologies using a locally developed Protégé plugin ("OntoLoad"). In total, 10 OMOP CDM–compliant tables ("Condition_occurrence," 864 records; "Death," 110; "Device_exposure," 56; "Drug_exposure," 5609; "Measurement," 2091; "Observation," 195; "Observation_period," 897; "Person," 922; "Visit_detail," 772; and "Visit_occurrence," 971) were successfully created and populated. Conclusions: This study proposes a methodology for standardizing clinical data, thus allowing its reuse without any changes in the meaning of the modeled concepts. Although this paper focuses on health research, our methodology suggests that the data be initially standardized per EN/ISO 13606 to obtain EHR extracts with a high level of granularity that can be used for any purpose. Ontologies constitute a valuable approach for knowledge representation and standardization of health information in a standard-agnostic manner. With the proposed methodology, institutions can go from local raw data to standardized, semantically interoperable EN/ISO 13606 and OMOP repositories.
UR - https://medinform.jmir.org/2023/1/e44547 UR - http://dx.doi.org/10.2196/44547 UR - http://www.ncbi.nlm.nih.gov/pubmed/36884279 ID - info:doi/10.2196/44547 ER - TY - JOUR AU - Waithira, Naomi AU - Kestelyn, Evelyne AU - Chotthanawathit, Keitcheya AU - Osterrieder, Anne AU - Mukaka, Mavuto AU - Lang, Trudie AU - Cheah, Yeong Phaik PY - 2023/3/6 TI - Investigating the Secondary Use of Clinical Research Data: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e44875 VL - 12 KW - data reuse KW - data sharing KW - secondary data use KW - clinical trials data KW - artificial intelligence KW - machine learning KW - individual patient data KW - clinical research KW - barriers KW - online survey KW - mixed methods KW - low- and middle-income country N2 - Background: The increasing emphasis on sharing patient data from clinical research has resulted in substantial investments in data repositories and infrastructure. However, it is unclear how shared data are used and whether anticipated benefits are being realized. Objective: The purpose of our study is to examine the current utilization of shared clinical research data sets and assess the effects on both scientific research and public health outcomes. Additionally, the study seeks to identify the factors that hinder or facilitate the ethical and efficient use of existing data based on the perspectives of data users. Methods: The study will utilize a mixed methods design, incorporating a cross-sectional survey and in-depth interviews. The survey will involve at least 400 clinical researchers, while the in-depth interviews will include 20 to 40 participants who have utilized data from repositories or institutional data access committees. The survey will target a global sample, while the in-depth interviews will focus on individuals who have used data collected from low- and middle-income countries.
Quantitative data will be summarized by using descriptive statistics, while multivariable analyses will be used to assess the relationships between variables. Qualitative data will be analyzed through thematic analysis, and the findings will be reported in accordance with the COREQ (Consolidated Criteria for Reporting Qualitative Research) guidelines. The study received ethical approval from the Oxford Tropical Research Ethics Committee in 2020 (reference number: 568-20). Results: The results of the analysis, including both quantitative data and qualitative data, will be available in 2023. Conclusions: The outcomes of our study will offer crucial understanding into the current status of data reuse in clinical research, serving as a basis for guiding future endeavors to enhance the utilization of shared data for the betterment of public health outcomes and for scientific progress. Trial Registration: Thai Clinical Trials Registry TCTR20210301006; https://tinyurl.com/2p9atzhr International Registered Report Identifier (IRRID): DERR1-10.2196/44875 UR - https://www.researchprotocols.org/2023/1/e44875 UR - http://dx.doi.org/10.2196/44875 UR - http://www.ncbi.nlm.nih.gov/pubmed/36877564 ID - info:doi/10.2196/44875 ER - TY - JOUR AU - de Man, Yvonne AU - Wieland-Jorna, Yvonne AU - Torensma, Bart AU - de Wit, Koos AU - Francke, L. Anneke AU - Oosterveld-Vlug, G. Mariska AU - Verheij, A. 
Robert PY - 2023/2/28 TI - Opt-In and Opt-Out Consent Procedures for the Reuse of Routinely Recorded Health Data in Scientific Research and Their Consequences for Consent Rate and Consent Bias: Systematic Review JO - J Med Internet Res SP - e42131 VL - 25 KW - real-world data KW - secondary data use KW - electronic health records KW - routine health data KW - consent bias KW - consent procedure KW - opt-in KW - opt-out KW - consent rate KW - representativeness N2 - Background: Scientific researchers who wish to reuse health data pertaining to individuals can obtain consent through an opt-in procedure or opt-out procedure. The choice of procedure may have consequences for the consent rate and representativeness of the study sample and the quality of the research, but these consequences are not well known. Objective: This review aimed to provide insight into the consequences for the consent rate and consent bias of the study sample of opt-in procedures versus opt-out procedures for the reuse of routinely recorded health data for scientific research purposes. Methods: A systematic review was performed based on searches in PubMed, Embase, CINAHL, PsycINFO, Web of Science Core Collection, and the Cochrane Library. Two reviewers independently included studies based on predefined eligibility criteria and assessed whether the statistical methods used in the reviewed literature were appropriate for describing the differences between consenters and nonconsenters. Statistical pooling was conducted, and a description of the results was provided. Results: A total of 15 studies were included in this meta-analysis. Of the 15 studies, 13 (87%) implemented an opt-in procedure, 1 (7%) implemented an opt-out procedure, and 1 (7%) implemented both procedures. The average weighted consent rate was 84% (60,800/72,418) among the studies that used an opt-in procedure and 96.8% (2384/2463) in the single study that used an opt-out procedure.
In the single study that described both procedures, the consent rate was 21% in the opt-in group and 95.6% in the opt-out group. Opt-in procedures resulted in more consent bias compared with opt-out procedures. In studies with an opt-in procedure, consenting individuals were more likely to be male and to have a higher level of education, a higher income, and a higher socioeconomic status. Conclusions: Consent rates are generally lower when using an opt-in procedure compared with using an opt-out procedure. Furthermore, in studies with an opt-in procedure, participants are less representative of the study population. However, both the study populations and the way in which opt-in or opt-out procedures were organized varied widely between the studies, which makes it difficult to draw general conclusions regarding the desired balance between patient control over data and learning from health data. The reuse of routinely recorded health data for scientific research purposes may be hampered by administrative burdens and the risk of bias. UR - https://www.jmir.org/2023/1/e42131 UR - http://dx.doi.org/10.2196/42131 UR - http://www.ncbi.nlm.nih.gov/pubmed/36853745 ID - info:doi/10.2196/42131 ER - TY - JOUR AU - Sotoodeh, Mani AU - Zhang, Wenhui AU - Simpson, L. Roy AU - Hertzberg, Stover Vicki AU - Ho, C. Joyce PY - 2023/2/23 TI - A Comprehensive and Improved Definition for Hospital-Acquired Pressure Injury Classification Based on Electronic Health Records: Comparative Study JO - JMIR Med Inform SP - e40672 VL - 11 KW - pressure ulcer KW - decubitus ulcer KW - electronic medical records KW - bedsore KW - nursing KW - data mining KW - electronic health record KW - EHR KW - nursing assessment KW - pressure ulcer care KW - pressure ulcer prevention KW - EHR data KW - EHR systems KW - nursing quality N2 - Background: Patients develop pressure injuries (PIs) in the hospital owing to low mobility, exposure to localized pressure, circulatory conditions, and other predisposing factors.
Over 2.5 million Americans develop PIs annually. The Centers for Medicare & Medicaid Services considers hospital-acquired PIs (HAPIs) the most frequent preventable event, and they are the second most common claim in lawsuits. With the growing use of electronic health records (EHRs) in hospitals, an opportunity exists to build machine learning models to identify and predict HAPI rather than relying on occasional manual assessments by human experts. However, accurate computational models rely on high-quality HAPI data labels. Unfortunately, the different data sources within EHRs can provide conflicting information on HAPI occurrence in the same patient. Furthermore, the existing definitions of HAPI disagree with each other, even within the same patient population. The inconsistent criteria make it impossible to benchmark machine learning methods to predict HAPI. Objective: The objective of this project was threefold. We aimed to identify discrepancies in HAPI sources within EHRs, to develop a comprehensive definition for HAPI classification using data from all EHR sources, and to illustrate the importance of an improved HAPI definition. Methods: We assessed the congruence among HAPI occurrences documented in clinical notes, diagnosis codes, procedure codes, and chart events from the Medical Information Mart for Intensive Care III database. We analyzed the criteria used for the 3 existing HAPI definitions and their adherence to the regulatory guidelines. We proposed the Emory HAPI (EHAPI), which is an improved and more comprehensive HAPI definition. We then evaluated the importance of the labels in training a HAPI classification model using tree-based and sequential neural network classifiers. Results: We illustrate the complexity of defining HAPI, with <13% of hospital stays having at least 3 PI indications documented across 4 data sources. Although chart events were the most common indicator, they were the only PI documentation for >49% of the stays.
We demonstrate a lack of congruence across existing HAPI definitions and EHAPI, with only 219 stays having a consensus positive label. Our analysis highlights the importance of our improved HAPI definition, with classifiers trained using our labels outperforming others on a small manually labeled set from nurse annotators and a consensus set in which all definitions agreed on the label. Conclusions: Standardized HAPI definitions are important for accurately assessing the HAPI nursing quality metric and determining HAPI incidence for preventive measures. We demonstrate the complexity of defining an occurrence of HAPI, given the conflicting and incomplete EHR data. Our EHAPI definition has favorable properties, making it a suitable candidate for HAPI classification tasks. UR - https://medinform.jmir.org/2023/1/e40672 UR - http://dx.doi.org/10.2196/40672 UR - http://www.ncbi.nlm.nih.gov/pubmed/36649481 ID - info:doi/10.2196/40672 ER - TY - JOUR AU - Vuokko, Riikka AU - Vakkuri, Anne AU - Palojoki, Sari PY - 2023/2/6 TI - Systematized Nomenclature of Medicine–Clinical Terminology (SNOMED CT) Clinical Use Cases in the Context of Electronic Health Record Systems: Systematic Literature Review JO - JMIR Med Inform SP - e43750 VL - 11 KW - clinical KW - electronic health record KW - EHR KW - review method KW - literature review KW - SNOMED CT KW - Systematized Nomenclature for Medicine KW - use case KW - terminology KW - terminologies KW - SNOMED N2 - Background: The Systematized Nomenclature of Medicine–Clinical Terminology (SNOMED CT) is a clinical terminology system that provides a standardized and scientifically validated way of representing clinical information captured by clinicians. It can be integrated into electronic health records (EHRs) to increase the possibilities for effective data use and ensure a better quality of documentation that supports continuity of care, thus enabling better quality in the care process.
Even though SNOMED CT consists of extensively studied clinical terminology, previous research has repeatedly documented a lack of scientific evidence for SNOMED CT in the form of reported clinical use cases in electronic health record systems. Objective: The aim of this study was to explore evidence in previous literature reviews of clinical use cases of SNOMED CT integrated into EHR systems or other clinical applications during the last 5 years of continued development. The study sought to identify the main clinical use purposes, use phases, and key clinical benefits documented in SNOMED CT use cases. Methods: The Cochrane review protocol was applied for the study design. The application of the protocol was modified step-by-step to fit the research problem by first defining the search strategy, identifying the articles for the review by isolating the exclusion and inclusion criteria for assessing the search results, and lastly, evaluating and summarizing the review results. Results: In total, 17 research articles illustrating SNOMED CT clinical use cases were reviewed. The use purpose of SNOMED CT was documented in all the articles, with the terminology as a standard in EHR being the most common (8/17). The clinical use phase was documented in all the articles. The most common category of use phases was SNOMED CT in development (6/17). Core benefits achieved by applying SNOMED CT in a clinical context were identified by the researchers. These were related to terminology use outcomes, that is, to data quality in general or to enabling a consistent way of indexing, storing, retrieving, and aggregating clinical data (8/17). Additional benefits were linked to the productivity of coding or to advances in the quality and continuity of care. Conclusions: While the SNOMED CT use categories were well supported by previous research, this review demonstrates that further systematic research on clinical use cases is needed to promote the scalability of the review results. 
To get the most out of use case reports, more emphasis is suggested on describing the contextual factors, such as the electronic health care system, and on the use of previous frameworks to enable comparability of results. A lesson to be drawn from our study is that SNOMED CT is essential for structuring clinical data; however, research is needed to gather more evidence of how SNOMED CT benefits clinical care and patient safety. UR - https://medinform.jmir.org/2023/1/e43750 UR - http://dx.doi.org/10.2196/43750 UR - http://www.ncbi.nlm.nih.gov/pubmed/36745498 ID - info:doi/10.2196/43750 ER - TY - JOUR AU - Kozak, Karol AU - Seidel, André AU - Matvieieva, Nataliia AU - Neupetsch, Constanze AU - Teicher, Uwe AU - Lemme, Gordon AU - Ben Achour, Anas AU - Barth, Martin AU - Ihlenfeldt, Steffen AU - Drossel, Welf-Guntram PY - 2023/1/27 TI - Unique Device Identification–Based Linkage of Hierarchically Accessible Data Domains in Prospective Surgical Hospital Data Ecosystems: User-Centered Design Approach JO - JMIR Med Inform SP - e41614 VL - 11 KW - electronic health record KW - unique device identification KW - cyber-physical production systems KW - mHealth KW - data integration ecosystem KW - hierarchical data access KW - shell embedded role model N2 - Background: The electronic health record (EHR) targets systematized collection of patient-specific, electronically stored health data. The EHR is an evolving concept driven by ongoing developments and open or unclear legal issues concerning medical technologies, cross-domain data integration, and unclear access roles. Consequently, an interdisciplinary discourse based on representative pilot scenarios is required to connect previously unconnected domains. Objective: We address cross-domain data integration including access control using the specific example of a unique device identification (UDI)–expanded hip implant.
In fact, the integration of technical focus data into the hospital information system (HIS) is considered based on surgically relevant information. Moreover, the acquisition of social focus data based on mobile health (mHealth) is addressed, covering data integration and networking with therapeutic intervention and acute diagnostics data. Methods: In addition to the additive manufacturing of a hip implant with the integration of a UDI, we built a database that combines database technology and a wrapper layer known from extract, transform, load systems and brings it into a SQL database, web application programming interface (API) layer (back end), interface layer (REST API), and front end. It also provides semantic integration through connection mechanisms between data elements. Results: A hip implant is approached by design, production, and verification while linking operation-relevant specifics like implant-bone fit by merging patient-specific image material (computed tomography, magnetic resonance imaging, or a biomodel) and the digital implant twin for well-founded selection pairing. This decision-facilitating linkage, which improves surgical planning, relates to patient-specific postoperative influencing factors during the healing phase. A unique product identification approach is presented, allowing a postoperative read-out with state-of-the-art hospital technology while enabling future access scenarios for patient and implant data. The latter was considered from the manufacturing perspective using the process manufacturing chain for a (patient-specific) implant to identify quality-relevant data for later access. In addition, sensor concepts were identified to monitor the patient-implant interaction during the healing phase using wearables, for example. A data aggregation and integration concept for heterogeneous data sources from the considered focus domains is also presented.
Finally, a hierarchical data access concept is shown, protecting sensitive patient data from misuse using existing scenarios. Conclusions: Personalized medicine requires cross-domain linkage of data, which, in turn, requires an appropriate data infrastructure and adequate hierarchical data access solutions in a shared and federated data space. The hip implant is used as an example for the usefulness of cross-domain data linkage since it bundles social, medical, and technical aspects of the implantation. It is necessary to open existing databases using interfaces for secure integration of data from end devices and to assure availability through suitable access models while guaranteeing long-term, independent data persistence. A suitable strategy requires the combination of technical solutions from the areas of identity and trust, federated data storage, cryptographic procedures, and software engineering as well as organizational changes. UR - https://medinform.jmir.org/2023/1/e41614 UR - http://dx.doi.org/10.2196/41614 UR - http://www.ncbi.nlm.nih.gov/pubmed/36705946 ID - info:doi/10.2196/41614 ER - TY - JOUR AU - Reinecke, Ines AU - Siebel, Joscha AU - Fuhrmann, Saskia AU - Fischer, Andreas AU - Sedlmayr, Martin AU - Weidner, Jens AU - Bathelt, Franziska PY - 2023/1/25 TI - Assessment and Improvement of Drug Data Structuredness From Electronic Health Records: Algorithm Development and Validation JO - JMIR Med Inform SP - e40312 VL - 11 KW - secondary usage KW - Observational Medical Outcomes Partnership KW - OMOP KW - drug data KW - data quality KW - Anatomical Therapeutic Chemical KW - ATC KW - RxNorm KW - interoperability N2 - Background: Digitization offers a multitude of opportunities to gain insights into current diagnostics and therapies from retrospective data. In this context, real-world data and their accessibility are of increasing importance to support unbiased and reliable research on big data.
However, routinely collected data are not readily usable for research owing to the unstructured nature of health care systems and a lack of interoperability between these systems. This challenge is evident in drug data. Objective: This study aimed to present an approach that identifies and increases the structuredness of drug data while ensuring standardization according to Anatomical Therapeutic Chemical (ATC) classification. Methods: Our approach was based on available drug prescriptions and a drug catalog and consisted of 4 steps. First, we performed an initial analysis of the structuredness of local drug data to define a point of comparison for the effectiveness of the overall approach. Second, we applied 3 algorithms to unstructured data that translated text into ATC codes based on string comparisons in terms of ingredients and product names and performed similarity comparisons based on Levenshtein distance. Third, we validated the results of the 3 algorithms with expert knowledge based on the 1000 most frequently used prescription texts. Fourth, we performed a final validation to determine the increased degree of structuredness. Results: Initially, 47.73% (n=843,980) of 1,768,153 drug prescriptions were classified as structured. With the application of the 3 algorithms, we were able to increase the degree of structuredness to 85.18% (n=1,506,059) based on the 1000 most frequent medication prescriptions. In this regard, the combination of algorithms 1, 2, and 3 resulted in a correctness level of 100% (with 57,264 ATC codes identified), algorithms 1 and 3 resulted in 99.6% (with 152,404 codes identified), and algorithms 1 and 2 resulted in 95.9% (with 39,472 codes identified). Conclusions: As shown in the first analysis steps of our approach, the availability of a product catalog to select during the documentation process is not sufficient to generate structured data. 
Our 4-step approach reduces these problems and reliably increases structuredness automatically. Similarity matching shows promising results, particularly for entries with no connection to a product catalog. However, further enhancement of the correctness of such a similarity matching algorithm needs to be investigated in future work. UR - https://medinform.jmir.org/2023/1/e40312 UR - http://dx.doi.org/10.2196/40312 UR - http://www.ncbi.nlm.nih.gov/pubmed/36696159 ID - info:doi/10.2196/40312 ER - TY - JOUR AU - Miyaji, Atsuko AU - Watanabe, Kaname AU - Takano, Yuuki AU - Nakasho, Kazuhisa AU - Nakamura, Sho AU - Wang, Yuntao AU - Narimatsu, Hiroto PY - 2022/12/30 TI - A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System JO - JMIR Med Inform SP - e38922 VL - 10 IS - 12 KW - data linkage KW - data security KW - secure data integration KW - privacy-preserving linkage KW - secure matching privacy-preserving linkage KW - private set intersection KW - PSI KW - privacy-preserving distributed data integration KW - PDDI KW - big data KW - medical informatics KW - cancer prevention KW - cancer epidemiology KW - epidemiological survey N2 - Background: Big data useful for epidemiological research can be obtained by integrating data corresponding to individuals between databases managed by different institutions. Privacy information must be protected while performing efficient, high-level data matching. Objective: Privacy-preserving distributed data integration (PDDI) enables data matching between multiple databases without moving privacy information; however, its actual implementation requires matching security, accuracy, and performance. Moreover, identifying the optimal data item in the absence of a unique matching key is necessary. We aimed to conduct a basic matching experiment using a model to assess the accuracy of cancer screening.
Methods: To experiment with actual data, we created a data set mimicking the cancer screening and registration data in Japan and conducted a matching experiment using a PDDI system between geographically distant institutions. Errors similar to those found empirically in data sets recorded in Japanese were artificially introduced into the data set. The matching-key error rate of the data common to both data sets was set sufficiently higher than expected in the actual database: 85.0% and 59.0% for the data simulating colorectal and breast cancers, respectively. Various combinations of name, gender, date of birth, and address were used for the matching key. To evaluate the matching accuracy, the matching sensitivity and specificity were calculated based on the number of cancer-screening data points, and the effect of matching accuracy on the sensitivity and specificity of cancer screening was estimated based on the obtained values. To evaluate the performance, we measured central processing unit use, memory use, and network traffic. Results: For combinations with a specificity ≥99% and high sensitivity, the date of birth and first name were used in the data simulating colorectal cancer, and the matching sensitivity and specificity were 55.00% and 99.85%, respectively. In the data simulating breast cancer, the date of birth and family name were used, and the matching sensitivity and specificity were 88.71% and 99.98%, respectively. Assuming the sensitivity and specificity of cancer screening at 90%, the apparent values decreased to 74.90% and 89.93%, respectively. A trial calculation was performed using a combination with the same data set and 100% specificity. When the matching sensitivity was 82.26%, the apparent screening sensitivity was maintained at 90%, and the screening specificity decreased to 89.89%.
For 214 data points, the execution time was 82 minutes and 26 seconds without parallelization and 11 minutes and 38 seconds with parallelization; 19.33% of the calculation time was for the data-holding institutions. Memory use was 3.4 GB for the PDDI server and 2.7 GB for the data-holding institutions. Conclusions: We demonstrated the rudimentary feasibility of introducing a PDDI system for cancer-screening accuracy assessment. We plan to conduct matching experiments based on actual data and compare them with the existing methods. UR - https://medinform.jmir.org/2022/12/e38922 UR - http://dx.doi.org/10.2196/38922 UR - http://www.ncbi.nlm.nih.gov/pubmed/36583931 ID - info:doi/10.2196/38922 ER - TY - JOUR AU - MacKenna, Brian AU - Curtis, J. Helen AU - Hopcroft, M. Lisa E. AU - Walker, J. Alex AU - Croker, Richard AU - Macdonald, Orla AU - Evans, W. Stephen J. AU - Inglesby, Peter AU - Evans, David AU - Morley, Jessica AU - Bacon, J. Sebastian C. AU - Goldacre, Ben PY - 2022/12/20 TI - Identifying Patterns of Clinical Interest in Clinicians' Treatment Preferences: Hypothesis-free Data Science Approach to Prioritizing Prescribing Outliers for Clinical Review JO - JMIR Med Inform SP - e41200 VL - 10 IS - 12 KW - prescribing KW - NHS England KW - antipsychotics KW - promazine hydrochloride KW - pericyazine KW - clinical audit KW - data science N2 - Background: Data analysis is used to identify signals suggestive of variation in treatment choice or clinical outcome. Analyses to date have generally focused on a hypothesis-driven approach. Objective: This study aimed to develop a hypothesis-free approach to identify unusual prescribing behavior in primary care data. We aimed to apply this methodology to a national data set in a cross-sectional study to identify chemicals with significant variation in use across Clinical Commissioning Groups (CCGs) for further clinical review, thereby demonstrating proof of concept for prioritization approaches.
Methods: Here we report a new data-driven approach to identify unusual prescribing behavior in primary care data. This approach first applies a set of filtering steps to identify chemicals with prescribing rate distributions likely to contain outliers and then applies 2 ranking approaches to identify the most extreme outliers among those candidates. This methodology has been applied to 3 months of national prescribing data (June-August 2017). Results: Our methodology provides rankings for all chemicals by administrative region. We provide illustrative results for 2 antipsychotic drugs of particular clinical interest: promazine hydrochloride and pericyazine, which rank highly by outlier metrics. Specifically, our method identifies that, while promazine hydrochloride and pericyazine are barely used by most clinicians (with national prescribing rates of 11.1 and 6.2 per 1000 antipsychotic prescriptions, respectively), they make up a substantial proportion of antipsychotic prescribing in 2 small geographic regions in England during the study period (with maximum regional prescribing rates of 298.7 and 241.1 per 1000 antipsychotic prescriptions, respectively). Conclusions: Our hypothesis-free approach is able to identify candidates for audit and review in clinical practice. To illustrate this, we provide 2 examples of very unusual antipsychotics used disproportionately in 2 small geographic areas of England. UR - https://medinform.jmir.org/2022/12/e41200 UR - http://dx.doi.org/10.2196/41200 UR - http://www.ncbi.nlm.nih.gov/pubmed/36538350 ID - info:doi/10.2196/41200 ER - TY - JOUR AU - Leston, Meredith AU - Elson, H. William AU - Watson, Conall AU - Lakhani, Anissa AU - Aspden, Carole AU - Bankhead, R. Clare AU - Borrow, Ray AU - Button, Elizabeth AU - Byford, Rachel AU - Elliot, J. Alex AU - Fan, Xuejuan AU - Hoang, Uy AU - Linley, Ezra AU - Macartney, Jack AU - Nicholson, D. 
Brian AU - Okusi, Cecilia AU - Ramsay, Mary AU - Smith, Gillian AU - Smith, Sue AU - Thomas, Mark AU - Todkill, Dan AU - Tsang, SM Ruby AU - Victor, William AU - Williams, J. Alice AU - Williams, John AU - Zambon, Maria AU - Howsam, Gary AU - Amirthalingam, Gayatri AU - Lopez-Bernal, Jamie AU - Hobbs, Richard F. D. AU - de Lusignan, Simon PY - 2022/12/19 TI - Representativeness, Vaccination Uptake, and COVID-19 Clinical Outcomes 2020-2021 in the UK Oxford-Royal College of General Practitioners Research and Surveillance Network: Cohort Profile Summary JO - JMIR Public Health Surveill SP - e39141 VL - 8 IS - 12 KW - cohort profile KW - computerized medical record systems KW - general practice KW - influenza KW - COVID-19 KW - sentinel surveillance KW - syndromic surveillance KW - serology KW - virology KW - public health KW - digital surveillance KW - vaccination KW - primary care data KW - health data KW - cohort KW - virus KW - immunology KW - surveillance KW - representation KW - uptake KW - outcome KW - hospital KW - sampling KW - monitoring N2 - Background: The Oxford-Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) is one of Europe's oldest sentinel systems, working with the UK Health Security Agency (UKHSA) and its predecessor bodies for 55 years. Its surveillance report now runs twice weekly, supplemented by online observatories. In addition to conducting sentinel surveillance from a nationally representative group of practices, the RSC is now also providing data for syndromic surveillance. Objective: The aim of this study was to describe the cohort profile at the start of the 2021-2022 surveillance season and recent changes to our surveillance practice. Methods: The RSC's pseudonymized primary care data, linked to hospital and other data, are held in the Oxford-RCGP Clinical Informatics Digital Hub, a Trusted Research Environment. 
We describe the RSC's cohort profile as of September 2021, divided into a Primary Care Sentinel Cohort (PCSC), which collects virological and serological specimens, and a larger group of syndromic surveillance general practices (SSGPs). We report changes to our sampling strategy that bring the RSC into alignment with European Centre for Disease Control guidance and then compare our cohort's sociodemographic characteristics with Office for National Statistics data. We further describe influenza and COVID-19 vaccine coverage for the 2020-2021 season (week 40 of 2020 to week 39 of 2021), with the latter differentiated by vaccine brand. Finally, we report COVID-19-related outcomes in terms of hospitalization, intensive care unit (ICU) admission, and death. Results: As a response to COVID-19, the RSC grew from just over 500 PCSC practices in 2019 to 1879 practices in 2021 (PCSC, n=938; SSGP, n=1203). This represents 28.6% of English general practices and 30.59% (17,299,780/56,550,136) of the population. In the reporting period, the PCSC collected >8000 virology and >23,000 serology samples. The RSC population was broadly representative of the national population in terms of age, gender, ethnicity, National Health Service Region, socioeconomic status, obesity, and smoking habit. The RSC captured vaccine coverage data for influenza (n=5.4 million) and COVID-19, reporting dose one (n=11.9 million), two (n=11 million), and three (n=0.4 million) for the latter as well as brand-specific uptake data (AstraZeneca vaccine, n=11.6 million; Pfizer, n=10.8 million; and Moderna, n=0.7 million). The median (IQR) number of COVID-19 hospitalizations and ICU admissions was 1181 (559-1559) and 115 (50-174) per week, respectively. Conclusions: The RSC is broadly representative of the national population; its PCSC is geographically representative and its SSGPs are newly supporting UKHSA syndromic surveillance efforts. 
The network captures vaccine coverage and has expanded from reporting primary care attendances to providing data on onward hospital outcomes and deaths. The challenge remains to increase virological and serological sampling to monitor the effectiveness and waning of all vaccines available in a timely manner. UR - https://publichealth.jmir.org/2022/12/e39141 UR - http://dx.doi.org/10.2196/39141 UR - http://www.ncbi.nlm.nih.gov/pubmed/36534462 ID - info:doi/10.2196/39141 ER - TY - JOUR AU - Meaney, Christopher AU - Escobar, Michael AU - Stukel, A. Therese AU - Austin, C. Peter AU - Jaakkimainen, Liisa PY - 2022/12/19 TI - Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study JO - JMIR Med Inform SP - e40102 VL - 10 IS - 12 KW - clinical text data KW - temporal topic model KW - nonnegative matrix factorization KW - latent Dirichlet allocation KW - structural topic model KW - BERTopic KW - text mining N2 - Background: Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large unstructured document collections. Objective: We aimed to comparatively evaluate several methods for estimating temporal topic models using clinical notes obtained from primary care electronic medical records from Ontario, Canada. Methods: We used a retrospective closed cohort design. The study spanned from January 01, 2011, through December 31, 2015, discretized into 20 quarterly periods. Patients were included in the study if they generated at least 1 primary care clinical note in each of the 20 quarterly periods. These patients represented a unique cohort of individuals engaging in high-frequency use of the primary care system. 
The following temporal topic modeling algorithms were fitted to the clinical note corpus: nonnegative matrix factorization, latent Dirichlet allocation, the structural topic model, and the BERTopic model. Results: Temporal topic models consistently identified latent topical patterns in the clinical note corpus. The learned topical bases identified meaningful activities conducted by the primary health care system. Latent topics displaying near-constant temporal dynamics were consistently estimated across models (eg, pain, hypertension, diabetes, sleep, mood, anxiety, and depression). Several topics displayed predictable seasonal patterns over the study period (eg, respiratory disease and influenza immunization programs). Conclusions: Nonnegative matrix factorization, latent Dirichlet allocation, structural topic model, and BERTopic are based on different underlying statistical frameworks (eg, linear algebra and optimization, Bayesian graphical models, and neural embeddings), require tuning unique hyperparameters (optimizers, priors, etc), and have distinct computational requirements (data structures, computational hardware, etc). Despite the heterogeneity in statistical methodology, the learned latent topical summarizations and their temporal evolution over the study period were consistently estimated. Temporal topic models represent an interesting class of models for characterizing and monitoring the primary health care system. 
UR - https://medinform.jmir.org/2022/12/e40102 UR - http://dx.doi.org/10.2196/40102 UR - http://www.ncbi.nlm.nih.gov/pubmed/36534443 ID - info:doi/10.2196/40102 ER - TY - JOUR AU - Kosowan, Leanne AU - Singer, Alexander AU - Zulkernine, Farhana AU - Zafari, Hasan AU - Nesca, Marcello AU - Muthumuni, Dhasni PY - 2022/12/13 TI - Pan-Canadian Electronic Medical Record Diagnostic and Unstructured Text Data for Capturing PTSD: Retrospective Observational Study JO - JMIR Med Inform SP - e41312 VL - 10 IS - 12 KW - electronic health records KW - EHR KW - natural language processing KW - NLP KW - medical informatics KW - primary health care KW - stress disorders, posttraumatic KW - posttraumatic stress disorder KW - PTSD N2 - Background: The availability of electronic medical record (EMR) free-text data for research varies. However, access to short diagnostic text fields is more widely available. Objective: This study assesses agreement between free-text and short diagnostic text data from primary care EMR for identification of posttraumatic stress disorder (PTSD). Methods: This retrospective cross-sectional study used EMR data from a pan-Canadian repository representing 1574 primary care providers at 265 clinics using 11 EMR vendors. Medical record review using free text and short diagnostic text fields of the EMR produced reference standards for PTSD. Agreement was assessed with sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Results: Our reference set contained 327 patients with free text and short diagnostic text. Among these patients, agreement between free text and short diagnostic text had an accuracy of 93.6% (CI 90.4%-96.0%). In a single Canadian province, case definitions 1 and 4 had a sensitivity of 82.6% (CI 74.4%-89.0%) and specificity of 99.5% (CI 97.4%-100%). 
However, when the reference set was expanded to a pan-Canada reference (n=12,104 patients), case definition 4 had the strongest agreement (sensitivity: 91.1%, CI 90.1%-91.9%; specificity: 99.1%, CI 98.9%-99.3%). Conclusions: Inclusion of free-text encounter notes during medical record review did not lead to improved capture of PTSD cases, nor did it lead to significant changes in case definition agreement. Within this pan-Canadian database, jurisdictional differences in diagnostic codes and EMR structure suggested the need to supplement diagnostic codes with natural language processing to capture PTSD. When unavailable, short diagnostic text can supplement free-text data for reference set creation and case validation. Application of the PTSD case definition can inform PTSD prevalence and characteristics. UR - https://medinform.jmir.org/2022/12/e41312 UR - http://dx.doi.org/10.2196/41312 UR - http://www.ncbi.nlm.nih.gov/pubmed/36512389 ID - info:doi/10.2196/41312 ER - TY - JOUR AU - Wang, Kai AU - Luan, Zemin AU - Guo, Zihao AU - Ran, Jinjun AU - Tian, Maozai AU - Zhao, Shi PY - 2022/11/18 TI - The Association Between Clinical Severity and Incubation Period of SARS-CoV-2 Delta Variants: Retrospective Observational Study JO - JMIR Public Health Surveill SP - e40751 VL - 8 IS - 11 KW - COVID-19 KW - Delta variant KW - incubation period KW - clinical severity KW - China N2 - Background: As of August 25, 2021, Jiangsu province experienced the largest COVID-19 outbreak in eastern China that was seeded by SARS-CoV-2 Delta variants. As one of the key epidemiological parameters characterizing the transmission dynamics of COVID-19, the incubation period plays an essential role in informing public health measures for epidemic control. The incubation period of COVID-19 could vary by different age, sex, disease severity, and study settings. However, the impacts of these factors on the incubation period of Delta variants remains uninvestigated. 
Objective: The objective of this study is to characterize the incubation period of the Delta variant using detailed contact tracing data. The effects of age, sex, and disease severity on the incubation period were investigated by multivariate regression analysis and subgroup analysis. Methods: We extracted contact tracing data of 353 laboratory-confirmed cases of SARS-CoV-2 Delta variant infection in Jiangsu province, China, from July to August 2021. The distribution of the incubation period of Delta variants was estimated by using a likelihood-based approach with adjustment for interval-censored observations. The effects of age, sex, and disease severity on the incubation period were estimated by using a multivariate logistic regression model with interval censoring. Results: The mean incubation period of the Delta variant was estimated at 6.64 days (95% credible interval: 6.27-7.00). We found that female cases and cases with severe symptoms had relatively longer mean incubation periods than male cases and those with nonsevere symptoms, respectively. A 1-day increase in the incubation period of Delta variants was associated with a weak decrease in the probability of having severe illness, with an adjusted odds ratio of 0.88 (95% credible interval: 0.71-1.07). Conclusions: In this study, the incubation period was found to vary across different levels of sex, age, and disease severity of COVID-19. These findings provide additional information on the incubation period of Delta variants and highlight the importance of continuing surveillance and monitoring of the epidemiological characteristics of emerging SARS-CoV-2 variants as they evolve. 
UR - https://publichealth.jmir.org/2022/11/e40751 UR - http://dx.doi.org/10.2196/40751 UR - http://www.ncbi.nlm.nih.gov/pubmed/36346940 ID - info:doi/10.2196/40751 ER - TY - JOUR AU - Chen, Pei-Fu AU - He, Tai-Liang AU - Lin, Sheng-Che AU - Chu, Yuan-Chia AU - Kuo, Chen-Tsung AU - Lai, Feipei AU - Wang, Ssu-Ming AU - Zhu, Wan-Xuan AU - Chen, Kuan-Chih AU - Kuo, Lu-Cheng AU - Hung, Fang-Ming AU - Lin, Yu-Cheng AU - Tsai, I-Chang AU - Chiu, Chi-Hao AU - Chang, Shu-Chih AU - Yang, Chi-Yu PY - 2022/11/10 TI - Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study JO - JMIR Med Inform SP - e41342 VL - 10 IS - 11 KW - federated learning KW - International Classification of Diseases KW - machine learning KW - natural language processing KW - multilabel text classification N2 - Background: The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase a model's performance and external validity, the privacy of clinical documents should be protected. We used federated learning to train a model with multicenter data, without sharing data per se. Objective: This study aims to train a classification model via federated learning for ICD-10 multilabel classification. Methods: Text data from discharge notes in electronic medical records were collected from the following three medical centers: Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital. 
After comparing the performance of different variants of bidirectional encoder representations from transformers (BERT), PubMedBERT was chosen for the word embeddings. With regard to preprocessing, the nonalphanumeric characters were retained because the model's performance decreased after the removal of these characters. To explain the outputs of our model, we added a label attention mechanism to the model architecture. The model was trained with data from each of the three hospitals separately and via federated learning. The models trained via federated learning and the models trained with local data were compared on a testing set that was composed of data from the three hospitals. The micro F1 score was used to evaluate model performance across all 3 centers. Results: The F1 scores of PubMedBERT, RoBERTa (Robustly Optimized BERT Pretraining Approach), ClinicalBERT, and BioBERT (BERT for Biomedical Text Mining) were 0.735, 0.692, 0.711, and 0.721, respectively. The F1 score of the model that retained nonalphanumeric characters was 0.8120, whereas the F1 score after removing these characters was 0.7875, a decrease of 0.0245 (3.11%). The F1 scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital models, respectively. The explainable predictions were displayed with highlighted input words via the label attention architecture. Conclusions: Federated learning was used to train the ICD-10 classification model on multicenter clinical text while protecting data privacy. The model's performance was better than that of models that were trained locally. 
UR - https://medinform.jmir.org/2022/11/e41342 UR - http://dx.doi.org/10.2196/41342 UR - http://www.ncbi.nlm.nih.gov/pubmed/36355417 ID - info:doi/10.2196/41342 ER - TY - JOUR AU - Dehesh, Paria AU - Baradaran, Reza Hamid AU - Eshrati, Babak AU - Motevalian, Abbas Seyed AU - Salehi, Masoud AU - Donyavi, Tahereh PY - 2022/11/8 TI - The Relationship Between Population-Level SARS-CoV-2 Cycle Threshold Values and Trend of COVID-19 Infection: Longitudinal Study JO - JMIR Public Health Surveill SP - e36424 VL - 8 IS - 11 KW - cycle threshold value KW - COVID-19 KW - trend KW - surveillance KW - epidemiology KW - disease surveillance KW - digital surveillance KW - prediction model KW - epidemic modeling KW - health system KW - infectious disease N2 - Background: The distribution of population-level real-time reverse transcription-polymerase chain reaction (RT-PCR) cycle threshold (Ct) values as a proxy of viral load may be a useful indicator for predicting COVID-19 dynamics. Objective: The aim of this study was to determine the relationship between the daily trend of average Ct values and COVID-19 dynamics, calculated as the daily number of hospitalized patients with COVID-19, daily number of new positive tests, daily number of COVID-19 deaths, and number of hospitalized patients with COVID-19 by age. We further sought to determine the lag between these data series. Methods: The samples included in this study were collected from March 21, 2021, to December 1, 2021. Daily Ct values of all patients who were referred to the Molecular Diagnostic Laboratory of Iran University of Medical Sciences in Tehran, Iran, for RT-PCR tests were recorded. The daily number of positive tests and the number of hospitalized patients by age group were extracted from the COVID-19 patient information registration system in Tehran province, Iran. An autoregressive integrated moving average (ARIMA) model was constructed for the time series of variables. 
Cross-correlation analysis was then performed to determine the best lag and correlations between the average daily Ct value and other COVID-19 dynamics-related variables. Finally, the best-selected lag of Ct identified through cross-correlation was incorporated as a covariate into the autoregressive integrated moving average with exogenous variables (ARIMAX) model to calculate the coefficients. Results: Daily average Ct values showed a significant negative correlation (23-day time delay) with the daily number of newly hospitalized patients (P=.02), 30-day time delay with the daily number of new positive tests (P=.02), and daily number of COVID-19 deaths (P=.02). The daily average Ct value with a 30-day delay could impact the daily number of positive tests for COVID-19 (β=−16.87, P<.001) and the daily number of deaths from COVID-19 (β=−1.52, P=.03). There was a significant association between Ct lag (23 days) and the number of COVID-19 hospitalizations (β=−24.12, P=.005). Cross-correlation analysis showed significant time delays between the average Ct values and the daily number of hospitalized patients aged 18-59 years (23-day time delay, P=.02) and over 60 years old (23-day time delay, P<.001). No statistically significant relation was detected for the daily number of hospitalized patients under 5 years old (9-day time delay, P=.27) or aged 5-17 years (13-day time delay, P=.39). Conclusions: For COVID-19 surveillance, it is important to find a good indicator that can predict epidemic surges in the community. Our results suggest that the average daily Ct value with a 30-day delay can predict increases in the number of positive confirmed COVID-19 cases, which may be a useful indicator for the health system. 
UR - https://publichealth.jmir.org/2022/11/e36424 UR - http://dx.doi.org/10.2196/36424 UR - http://www.ncbi.nlm.nih.gov/pubmed/36240022 ID - info:doi/10.2196/36424 ER - TY - JOUR AU - Guardiolle, Vianney AU - Bazoge, Adrien AU - Morin, Emmanuel AU - Daille, Béatrice AU - Toublant, Delphine AU - Bouzillé, Guillaume AU - Merel, Youenn AU - Pierre-Jean, Morgane AU - Filiot, Alexandre AU - Cuggia, Marc AU - Wargny, Matthieu AU - Lamer, Antoine AU - Gourraud, Pierre-Antoine PY - 2022/11/1 TI - Linking Biomedical Data Warehouse Records With the National Mortality Database in France: Large-scale Matching Algorithm JO - JMIR Med Inform SP - e36711 VL - 10 IS - 11 KW - data warehousing KW - clinical data warehouse KW - medical informatics applications KW - medical record linkage KW - French National Mortality Database KW - data reuse KW - open data, R KW - clinical informatics N2 - Background: Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDW records with the FNMD combines multiple challenges: absence of unique common identifiers between the 2 databases, names changing over life, clerical errors, and the exponential growth of the number of comparisons to compute. Objective: We aimed to develop a new algorithm for matching BDW records to the FNMD and evaluated its performance. Methods: We developed a deterministic algorithm based on advanced data cleaning and knowledge of the naming system and the Damerau-Levenshtein distance (DLD). The algorithm's performance was independently assessed using BDW data of 3 university hospitals: Lille, Nantes, and Rennes. Specificity was evaluated with living patients on January 1, 2016 (ie, patients with at least 1 hospital encounter before and after this date). 
Sensitivity was evaluated with patients recorded as deceased between January 1, 2001, and December 31, 2020. The DLD-based algorithm was compared to a direct matching algorithm with minimal data cleaning as a reference. Results: All centers combined, sensitivity was 11% higher for the DLD-based algorithm (93.3%, 95% CI 92.8-93.9) than for the direct algorithm (82.7%, 95% CI 81.8-83.6; P<.001). Sensitivity was superior for men at 2 centers (Nantes: 87%, 95% CI 85.1-89 vs 83.6%, 95% CI 81.4-85.8; P=.006; Rennes: 98.6%, 95% CI 98.1-99.2 vs 96%, 95% CI 94.9-97.1; P<.001) and for patients born in France at all centers (Nantes: 85.8%, 95% CI 84.3-87.3 vs 74.9%, 95% CI 72.8-77.0; P<.001). The DLD-based algorithm revealed significant differences in sensitivity among centers (Nantes, 85.3% vs Lille and Rennes, 97.3%; P<.001). Specificity was >98% in all subgroups. Our algorithm matched tens of millions of death records from BDWs, with parallel computing capabilities and low RAM requirements. We used the Inseehop open-source R script for this measurement. Conclusions: Overall, sensitivity/recall was 11% higher using the DLD-based algorithm than using the direct algorithm. This shows the importance of advanced data cleaning and of knowledge of a naming system through DLD use. Statistically significant differences in sensitivity between groups could be found and must be considered when performing an analysis to avoid differential biases. Our algorithm, originally conceived for linking a BDW with the FNMD, can be used to match any large-scale databases. While matching operations using names are considered sensitive computational operations, the Inseehop package released here is easy to run on premises, thereby facilitating compliance with local cybersecurity frameworks. The use of an advanced deterministic matching algorithm such as the DLD-based algorithm is an insightful example of combining open-source external data to improve the usage value of BDWs. 
UR - https://medinform.jmir.org/2022/11/e36711 UR - http://dx.doi.org/10.2196/36711 UR - http://www.ncbi.nlm.nih.gov/pubmed/36318244 ID - info:doi/10.2196/36711 ER - TY - JOUR AU - Pickering, Gisèle AU - Mezouar, Linda AU - Kechemir, Hayet AU - Ebel-Bitoun, Caty PY - 2022/10/27 TI - Paracetamol Use in Patients With Osteoarthritis and Lower Back Pain: Infodemiology Study and Observational Analysis of Electronic Medical Record Data JO - JMIR Public Health Surveill SP - e37790 VL - 8 IS - 10 KW - osteoarthritis KW - lower back pain KW - general practice KW - rheumatology KW - paracetamol KW - real-world evidence N2 - Background: Lower back pain (LBP) and osteoarthritis (OA) are common musculoskeletal disorders and account for around 17.0% of years lived with disability worldwide; however, there is a lack of real-world data on these conditions. Paracetamol brands are frequently prescribed in France for musculoskeletal pain and include Doliprane, Dafalgan, and Ixprim (tramadol-paracetamol). Objective: The objective of this retrospective study was to understand the journey of patients with LBP or OA when treated with paracetamol. Methods: Three studies were undertaken. Two studies analyzed electronic medical records from general practitioners (GPs) and rheumatologists of patients with OA or LBP, who had received at least one paracetamol prescription between 2013 and 2018 in France. Data were extracted, anonymized, and stratified by gender, age, and provider specialty. The third study, an infodemiology study, analyzed associations between terms used on public medical forums and Twitter in France and the United States for OA only. Results: In the first 2 studies, among patients with LBP (98,998), most (n=92,068, 93.0%) saw a GP, and Doliprane was a first-line therapy for 87.0% (n=86,128) of patients (71.0% [n=61,151] in combination with nonsteroidal anti-inflammatory drugs [NSAIDs] or opioids). 
Among patients with OA (99,997), most (n=84,997, 85.0%) saw a GP, and Doliprane was a first-line therapy for 83.0% (n=82,998) of patients (62.0% [n=51,459] in combination). Overall, paracetamol monotherapy prescriptions decreased as episodes increased. In the third study, in line with available literature, the data confirmed that the prevalence of OA increases with age (91.5% [212,875/232,650] above 41 years), OA is more predominant in females (46,530/232,650, 20.0%), and paracetamol use varies between GPs and rheumatologists. Conclusions: This health surveillance analysis provides a better understanding of the journey for patients with LBP or OA. These data confirmed that although paracetamol remains the most common first-line analgesic for patients with LBP and OA, usage varies among patients and health care specialists, and there are concerns over efficacy. UR - https://publichealth.jmir.org/2022/10/e37790 UR - http://dx.doi.org/10.2196/37790 UR - http://www.ncbi.nlm.nih.gov/pubmed/36301591 ID - info:doi/10.2196/37790 ER - TY - JOUR AU - Rosario, Bedda AU - Zhang, Andrew AU - Patel, Mehool AU - Rajmane, Amol AU - Xie, Ning AU - Weeraratne, Dilhan AU - Alterovitz, Gil PY - 2022/10/21 TI - Characterizing Thrombotic Complication Risk Factors Associated With COVID-19 via Heterogeneous Patient Data: Retrospective Observational Study JO - J Med Internet Res SP - e35860 VL - 24 IS - 10 KW - COVID-19 KW - thrombotic complications KW - logistic regression KW - EHR KW - electronic health record KW - insurance claims data N2 - Background: COVID-19 has been observed to be associated with venous and arterial thrombosis. The inflammatory disease prolongs hospitalization, and preexisting comorbidities can intensify the thrombotic burden in patients with COVID-19. However, venous thromboembolism, arterial thrombosis, and other vascular complications may go unnoticed in critical care settings. 
Early risk stratification is paramount in the COVID-19 patient population for proactive monitoring of thrombotic complications. Objective: The aim of this exploratory research was to characterize thrombotic complication risk factors associated with COVID-19 using information from electronic health record (EHR) and insurance claims databases. The goal is to develop an approach for analysis using real-world data evidence that can be generalized to characterize thrombotic complications and additional conditions in other clinical settings as well, such as pneumonia or acute respiratory distress syndrome in COVID-19 patients or in the intensive care unit. Methods: We extracted deidentified patient data from the insurance claims database IBM MarketScan, and formulated hypotheses on thrombotic complications in patients with COVID-19 with respect to patient demographic and clinical factors using logistic regression. The hypotheses were then verified with analysis of deidentified patient data from the Research Patient Data Registry (RPDR) Mass General Brigham (MGB) patient EHR database. Data were analyzed according to odds ratios, 95% CIs, and P values. Results: The analysis identified significant predictors (P<.001) for thrombotic complications in 184,831 COVID-19 patients out of the millions of records from IBM MarketScan and the MGB RPDR. With respect to age groups, patients 60 years and older had higher odds of thrombotic complications (odds ratios of 4.866 in MarketScan and 6.357 in RPDR) than those under 60 years old. In terms of gender, men were more likely (odds ratio of 1.245 in MarketScan and 1.693 in RPDR) to have thrombotic complications than women. Among the preexisting comorbidities, patients with heart disease, cerebrovascular diseases, hypertension, and personal history of thrombosis all had significantly higher odds of developing a thrombotic complication. Cancer and obesity were also associated with odds ratios greater than 1. 
The results from RPDR validated the IBM MarketScan findings, as they were largely consistent and afforded mutual enrichment. Conclusions: The analysis approach adopted in this study can work across heterogeneous databases from diverse organizations and thus facilitates collaboration. Searching through millions of patient records, the analysis helped to identify factors influencing a phenotype. Use of thrombotic complications in COVID-19 patients represents only a case study; however, the same design can be used across other disease areas by extracting corresponding disease-specific patient data from available databases. UR - https://www.jmir.org/2022/10/e35860 UR - http://dx.doi.org/10.2196/35860 UR - http://www.ncbi.nlm.nih.gov/pubmed/36044652 ID - info:doi/10.2196/35860 ER - TY - JOUR AU - Maletzky, Alexander AU - Böck, Carl AU - Tschoellitsch, Thomas AU - Roland, Theresa AU - Ludwig, Helga AU - Thumfart, Stefan AU - Giretzlehner, Michael AU - Hochreiter, Sepp AU - Meier, Jens PY - 2022/10/21 TI - Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities JO - JMIR Med Inform SP - e38557 VL - 10 IS - 10 KW - electronic health record KW - medical data preparation KW - machine learning KW - retrospective data analysis UR - https://medinform.jmir.org/2022/10/e38557 UR - http://dx.doi.org/10.2196/38557 UR - http://www.ncbi.nlm.nih.gov/pubmed/36269654 ID - info:doi/10.2196/38557 ER - TY - JOUR AU - Karystianis, George AU - Cabral, Carines Rina AU - Adily, Armita AU - Lukmanjaya, Wilson AU - Schofield, Peter AU - Buchan, Iain AU - Nenadic, Goran AU - Butler, Tony PY - 2022/10/20 TI - Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study JO - JMIR Form Res SP - e39373 VL - 6 IS - 10 KW - data linkage KW - mental health KW - domestic violence KW - police records KW - hospital records KW - text mining N2 - Background: To better understand domestic violence, data 
sources from multiple sectors such as police, justice, health, and welfare are needed. Linking police data to data collections from other agencies could provide unique insights and promote an all-of-government response to domestic violence. The New South Wales Police Force attends domestic violence events and records information in the form of both structured data and a free-text narrative, with the latter shown to be a rich source of information on the mental health status of persons of interest (POIs) and victims, abuse types, and sustained injuries. Objective: This study aims to examine the concordance (ie, matching) between mental illness mentions extracted from the police's event narratives and mental health diagnoses from hospital and emergency department records. Methods: We applied a rule-based text mining method on 416,441 domestic violence police event narratives between December 2005 and January 2016 to identify mental illness mentions for POIs and victims. Using different window periods (1, 3, 6, and 12 months) before and after a domestic violence event, we linked the extracted mental illness mentions of victims and POIs to clinical records from the Emergency Department Data Collection and the Admitted Patient Data Collection in New South Wales, Australia, using a unique identifier for each individual in the same cohort. Results: Using a 2-year window period (ie, 12 months before and after the domestic violence event), less than 1% (3020/416,441, 0.73%) of events had a mental illness mention and also a corresponding hospital record. About 16% of domestic violence events for both POIs (382/2395, 15.95%) and victims (101/631, 16.01%) had an agreement between hospital records and police narrative mentions of mental illness. A total of 51,025/416,441 (12.25%) events for POIs and 14,802/416,441 (3.55%) events for victims had mental illness mentions in their narratives but no hospital record. 
Only 841 events for POIs and 919 events for victims had a documented hospital record within 48 hours of the domestic violence event. Conclusions: Our findings suggest that current surveillance systems used to report on domestic violence may be enhanced by accessing rich information (ie, mental illness) contained in police text narratives, made available for both POIs and victims through the application of text mining. Additional insights can be gained by linkage to other health and welfare data collections. UR - https://formative.jmir.org/2022/10/e39373 UR - http://dx.doi.org/10.2196/39373 UR - http://www.ncbi.nlm.nih.gov/pubmed/36264613 ID - info:doi/10.2196/39373 ER - TY - JOUR AU - Lamer, Antoine AU - Fruchart, Mathilde AU - Paris, Nicolas AU - Popoff, Benjamin AU - Payen, Anaïs AU - Balcaen, Thibaut AU - Gacquer, William AU - Bouzillé, Guillaume AU - Cuggia, Marc AU - Doutreligne, Matthieu AU - Chazard, Emmanuel PY - 2022/10/17 TI - Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study JO - JMIR Med Inform SP - e38936 VL - 10 IS - 10 KW - feature extraction KW - data reuse KW - data warehouse KW - database KW - algorithm KW - Observational Medical Outcomes Partnership N2 - Background: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and often needs to be computed afterward from raw data by defining an algorithm. Objective: The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse. 
Methods: This study involved the following 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of raw data, steps, and transformations, which were common to the study cases; and (3) the identification of an appropriate table to store the features in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Results: We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective and observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. "Track" is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). "Feature" is time-independent high-level information with dimensionality identical to the statistical unit of the study, defined by a label and a value. The time dimension has become implicit in the value or name of the variable. We propose the 2 tables "TRACK" and "FEATURE" to store variables obtained in feature extraction and extend the OMOP CDM. Conclusions: We propose a standardized description of the feature extraction process. The process combined the 2 steps of track definition and track aggregation. By dividing the feature extraction into these 2 steps, difficulty was managed during track definition. The standardization of tracks requires great expertise with regard to the data, but allows the application of an infinite number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies. 
UR - https://medinform.jmir.org/2022/10/e38936 UR - http://dx.doi.org/10.2196/38936 UR - http://www.ncbi.nlm.nih.gov/pubmed/36251369 ID - info:doi/10.2196/38936 ER - TY - JOUR AU - Bae, Kyung Woo AU - Cho, Jihoon AU - Kim, Seok AU - Kim, Borham AU - Baek, Hyunyoung AU - Song, Wongeun AU - Yoo, Sooyoung PY - 2022/10/13 TI - Coronary Artery Computed Tomography Angiography for Preventing Cardio-Cerebrovascular Disease: Observational Cohort Study Using the Observational Health Data Sciences and Informatics' Common Data Model JO - JMIR Med Inform SP - e41503 VL - 10 IS - 10 KW - cardiovascular diseases KW - coronary artery computed tomography angiography KW - observational study KW - common data model KW - population level estimation KW - cardiology KW - vascular disease KW - medical informatics KW - computed tomography KW - angiography KW - electronic health record KW - risk score KW - health data science KW - data modeling N2 - Background: Cardio-cerebrovascular diseases (CVDs) result in 17.5 million deaths annually worldwide, accounting for 46.2% of noncommunicable causes of death, and are the leading cause of death, followed by cancer, respiratory disease, and diabetes mellitus. Coronary artery computed tomography angiography (CCTA), which detects calcification in the coronary arteries, can be used to detect asymptomatic but serious vascular disease. It allows for noninvasive and quick testing despite involving radiation exposure. Objective: The objective of our study was to investigate the effectiveness of CCTA screening on CVD outcomes by using the Observational Health Data Sciences and Informatics' Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) data and the population-level estimation method. 
Methods: Using electronic health record–based OMOP-CDM data, including health questionnaire responses, adults (aged 30-74 years) without a history of CVD were selected, and 5-year CVD outcomes were compared between patients undergoing CCTA (target group) and a comparison group via 1:1 propensity score matching. Participants were stratified into low-risk and high-risk groups based on the American College of Cardiology/American Heart Association atherosclerotic cardiovascular disease (ASCVD) risk score and Framingham risk score (FRS) for subgroup analyses. Results: The 2-year and 5-year risk scores were compared as secondary outcomes between the two groups. In total, 8787 participants were included in both the target group and comparison group. No significant differences (calibration P=.37) were found between the hazard ratios of the groups at 5 years. The subgroup analysis also revealed no significant differences between the ASCVD risk scores and FRSs of the groups at 5 years (ASCVD risk score: P=.97; FRS: P=.85). However, the CCTA group showed a significantly lower increase in risk scores at 2 years (ASCVD risk score: P=.03; FRS: P=.02). Conclusions: Although we could not confirm a significant difference in the preventive effects of CCTA screening for CVDs over a long period of 5 years, it may have a beneficial effect on risk score management over 2 years. UR - https://medinform.jmir.org/2022/10/e41503 UR - http://dx.doi.org/10.2196/41503 UR - http://www.ncbi.nlm.nih.gov/pubmed/36227638 ID - info:doi/10.2196/41503 ER - TY - JOUR AU - Al-Otaibi, Jawaher AU - Tolma, Eleni AU - Alali, Walid AU - Alhuwail, Dari AU - Aljunid, Mohamed Syed PY - 2022/10/7 TI - The Factors Contributing to Physicians' 
Current Use of and Satisfaction With Electronic Health Records in Kuwait's Public Health Care: Cross-sectional Questionnaire Study JO - JMIR Med Inform SP - e36313 VL - 10 IS - 10 KW - health informatics KW - information systems adoption KW - electronic health record KW - EHR KW - public health informatics N2 - Background: The electronic health record (EHR) has emerged as a backbone of health care organizations, aiming to integrate health care records and automate clinical workflow. With the adoption of the eHealth care system, health information communication technologies and EHRs are offering significant health care advantages in the form of error reduction, improved communication, and patient satisfaction. Objective: This study aimed to (1) investigate factors associated with physicians' EHR adoption status and prevalence of EHRs in Kuwait and (2) identify factors predicting physician satisfaction with EHRs in public hospitals in Kuwait. Methods: This study was conducted at Kuwait's public Al-Jahra hospital from May to September 2019, using quantitative research methods. Primary data were gathered via questionnaires distributed among 295 physicians recruited using convenience sampling. Data were analyzed in SPSS using descriptive, bivariate, and multivariate linear regression, adjusted for demographics. Results: Results of the study revealed that the controlled variable of gender (β=−.197; P=.02) along with explanatory variables, such as training quality (β=.068; P=.005), perception of barriers (β=−.107; P=.04), and effect on physician (β=.521; P<.001) have a significant statistical relationship with physicians' EHR adoption status. Furthermore, findings also suggested that controlled variables of gender (β=−.193; P=.02), education (β=−.164; P=.03), effect on physician (β=.417; P<.001), and level of ease of use (β=.254; P<.001) are significant predictors of the degree of physician satisfaction with the EHR system. 
Conclusions: The findings of this study have significant managerial and practical implications for creating an inductive environment for the acceptance of EHR systems across a broad spectrum of the health care system in Kuwait. UR - https://medinform.jmir.org/2022/10/e36313 UR - http://dx.doi.org/10.2196/36313 UR - http://www.ncbi.nlm.nih.gov/pubmed/36206039 ID - info:doi/10.2196/36313 ER - TY - JOUR AU - Cheng, Christina AU - Gearon, Emma AU - Hawkins, Melanie AU - McPhee, Crystal AU - Hanna, Lisa AU - Batterham, Roy AU - Osborne, H. Richard PY - 2022/9/16 TI - Digital Health Literacy as a Predictor of Awareness, Engagement, and Use of a National Web-Based Personal Health Record: Population-Based Survey Study JO - J Med Internet Res SP - e35772 VL - 24 IS - 9 KW - eHealth KW - mobile health KW - mHealth KW - health literacy KW - health equity KW - electronic health records KW - vulnerable populations KW - disadvantaged populations N2 - Background: Web-based personal health records (PHRs) have the potential to improve the quality, accuracy, and timeliness of health care. However, the international uptake of web-based PHRs has been slow. Populations experiencing disadvantages are less likely to use web-based PHRs, potentially widening health inequities within and among countries. Objective: With limited understanding of the predictors of community uptake and use of web-based PHRs, the aim of this study was to identify the predictors of awareness, engagement, and use of the Australian national web-based PHR, My Health Record (MyHR). Methods: A population-based survey of adult participants residing in regional Victoria, Australia, was conducted in 2018 using telephone interviews. Logistic regression, adjusted for age, was used to assess the relationship among digital health literacy, health literacy, and demographic characteristics, and the 3 dependent variables of MyHR: awareness, engagement, and use. 
Digital health literacy and health literacy were measured using multidimensional tools, using all 7 scales of the eHealth Literacy Questionnaire and 4 out of the 9 scales of the Health Literacy Questionnaire. Results: A total of 998 responses were analyzed. Many elements of digital health literacy were strongly associated with MyHR awareness, engagement, and use. A 1-unit increase in each of the 7 eHealth Literacy Questionnaire scales was associated with a 2- to 4-fold increase in the odds of using MyHR: using technology to process health information (odds ratio [OR] 4.14, 95% CI 2.34-7.31), understanding of health concepts and language (OR 2.25, 95% CI 1.08-4.69), ability to actively engage with digital services (OR 4.44, 95% CI 2.55-7.75), feel safe and in control (OR 2.36, 95% CI 1.43-3.88), motivated to engage with digital services (OR 4.24, 95% CI 2.36-7.61), access to digital services that work (OR 2.49, 95% CI 1.32-4.69), and digital services that suit individual needs (OR 3.48, 95% CI 1.97-6.15). The Health Literacy Questionnaire scales of health care support, actively managing health, and social support were also associated with a 1- to 2-fold increase in the odds of using MyHR. Using the internet to search for health information was another strong predictor; however, older people and those with less education were less likely to use MyHR. Conclusions: This study revealed strong and consistent patterns of association between digital health literacy and the use of a web-based PHR. The results indicate potential actions for promoting PHR uptake, including improving digital technology and skill experiences that may improve digital health literacy and willingness to engage in web-based PHR. Uptake may also be improved through more responsive digital services, strengthened health care, and better social support. 
A holistic approach, including targeted solutions, is needed to ensure that web-based PHRs can realize their full potential to help reduce health inequities. UR - https://www.jmir.org/2022/9/e35772 UR - http://dx.doi.org/10.2196/35772 UR - http://www.ncbi.nlm.nih.gov/pubmed/36112404 ID - info:doi/10.2196/35772 ER - TY - JOUR AU - Muller, A. Sam H. AU - van Thiel, W. Ghislaine J. M. AU - Vrana, Marilena AU - Mostert, Menno AU - van Delden, M. Johannes J. PY - 2022/9/7 TI - Patients' and Publics' Preferences for Data-Intensive Health Research Governance: Survey Study JO - JMIR Hum Factors SP - e36797 VL - 9 IS - 3 KW - data-intensive health research KW - big data KW - data sharing KW - patient and public preferences KW - health data sharing conditions KW - ethics KW - governance KW - policy KW - patient and public involvement KW - research participants KW - trust N2 - Background: Patients and publics are generally positive about data-intensive health research. However, conditions need to be fulfilled for their support. Ensuring confidentiality, security, and privacy of patients' health data is pivotal. Patients and publics have concerns about secondary use of data by commercial parties and the risk of data misuse, reasons for which they favor personal control of their data. Yet, the prospect of public benefit highlights the potential of building trust to attenuate these perceptions of harm and risk. Nevertheless, empirical evidence on how conditions for support of data-intensive health research can be operationalized to that end remains scant. Objective: This study aims to inform efforts to design governance frameworks for data-intensive health research, by gaining insight into the preferences of patients and publics for governance policies and measures. Methods: We distributed a digital questionnaire among a purposive sample of patients and publics. 
Data were analyzed using descriptive statistics and nonparametric inferential statistics to compare group differences and explore associations between policy preferences. Results: Study participants (N=987) strongly favored sharing their health data for scientific health research. Personal decision-making about which research projects health data are shared with (346/980, 35.3%), which researchers/organizations can have access (380/978, 38.9%), and the provision of information (458/981, 46.7%) were found highly important. Health data–sharing policies strengthening direct personal control, like being able to decide under which conditions health data are shared (538/969, 55.5%), were found highly important. Policies strengthening collective governance, like reliability checks (805/967, 83.2%) and security safeguards (787/976, 80.6%), were also found highly important. Further analysis revealed that participants willing to share health data demanded policies strengthening direct personal control to a lesser extent than participants who were reluctant to share health data. This was the case for the option to have health data deleted at any time (P<.001) and the ability to decide the conditions under which health data can be shared (P<.001). Overall, policies and measures enforcing conditions for support at the collective level of governance, like having an independent committee to evaluate requests for access to health data (P=.02), were most strongly favored. This also applied to participants who explicitly stressed that it was important to be able to decide the conditions under which health data can be shared, for instance, whether sanctions on data misuse are in place (P=.03). Conclusions: This study revealed that both a positive attitude toward health data sharing and demand for personal decision-making abilities were associated with policies and measures strengthening control at the collective level of governance. 
We recommend pursuing the development of this type of governance policy. More importantly, further study is required to understand how governance policies and measures can contribute to the trustworthiness of data-intensive health research. UR - https://humanfactors.jmir.org/2022/3/e36797 UR - http://dx.doi.org/10.2196/36797 UR - http://www.ncbi.nlm.nih.gov/pubmed/36069794 ID - info:doi/10.2196/36797 ER - TY - JOUR AU - Cook, Lily AU - Espinoza, Juan AU - Weiskopf, G. Nicole AU - Mathews, Nisha AU - Dorr, A. David AU - Gonzales, L. Kelly AU - Wilcox, Adam AU - Madlock-Brown, Charisse AU - PY - 2022/9/6 TI - Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave JO - JMIR Med Inform SP - e39235 VL - 10 IS - 9 KW - social determinants of health KW - health equity KW - bias KW - data quality KW - data harmonization KW - data standards KW - terminology KW - data aggregation N2 - Background: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations. Objective: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database. Methods: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. 
We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross-tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as "Declined" were grouped with "Refused," and "Multiple Race" was grouped with "Two or more races" and "Multiracial." Results: "No matching concept" was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category. Conclusions: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. 
Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy. UR - https://medinform.jmir.org/2022/9/e39235 UR - http://dx.doi.org/10.2196/39235 UR - http://www.ncbi.nlm.nih.gov/pubmed/35917481 ID - info:doi/10.2196/39235 ER - TY - JOUR AU - Kiser, C. Amber AU - Eilbeck, Karen AU - Ferraro, P. Jeffrey AU - Skarda, E. David AU - Samore, H. Matthew AU - Bucher, Brian PY - 2022/8/30 TI - Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection JO - JMIR Med Inform SP - e39057 VL - 10 IS - 8 KW - standard vocabularies KW - machine learning KW - electronic health records KW - model transferability KW - data heterogeneity N2 - Background: With the widespread adoption of electronic health records (EHRs) by US hospitals, there is an opportunity to leverage these data for the development of predictive algorithms to improve clinical care. A key barrier in model development and implementation includes the external validation of model discrimination, which is rare and often results in worse performance. One reason why machine learning models are not externally generalizable is data heterogeneity. A potential solution to address the substantial data heterogeneity between health care systems is to use standard vocabularies to map EHR data elements. The advantage of these vocabularies is a hierarchical relationship between elements, which allows the aggregation of specific clinical features to more general grouped concepts. Objective: This study aimed to evaluate grouping EHR data using standard vocabularies to improve the transferability of machine learning models for the detection of postoperative health care–associated infections across institutions with different EHR systems. 
Methods: Patients who underwent surgery from the University of Utah Health and Intermountain Healthcare from July 2014 to August 2017 with complete follow-up data were included. The primary outcome was a health care–associated infection within 30 days of the procedure. EHR data from 0-30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the area under the receiver operating characteristic curve (AUC) and F1-score in internal and external validations. To evaluate model transferability, a difference-in-difference metric was defined as the difference in performance drop between internal and external validations for the baseline and grouped models. Results: A total of 5775 patients from the University of Utah and 15,434 patients from Intermountain Healthcare were included. The prevalence of selected outcomes ranged from 4.9% (761/15,434) to 5% (291/5775) for surgical site infections, from 0.8% (44/5775) to 1.1% (171/15,434) for pneumonia, from 2.6% (400/15,434) to 3% (175/5775) for sepsis, and from 0.8% (125/15,434) to 0.9% (50/5775) for urinary tract infections. In all outcomes, the grouping of data using standard vocabularies resulted in a reduced drop in AUC and F1-score in external validation compared to baseline features (all P<.001, except urinary tract infection AUC: P=.002). The difference-in-difference metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F1-score. Conclusions: We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between data sets across 2 institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the health care system. 
UR - https://medinform.jmir.org/2022/8/e39057 UR - http://dx.doi.org/10.2196/39057 UR - http://www.ncbi.nlm.nih.gov/pubmed/36040784 ID - info:doi/10.2196/39057 ER - TY - JOUR AU - Wang, Peng AU - Li, Yong AU - Yang, Liang AU - Li, Simin AU - Li, Linfeng AU - Zhao, Zehan AU - Long, Shaopei AU - Wang, Fei AU - Wang, Hongqian AU - Li, Ying AU - Wang, Chengliang PY - 2022/8/30 TI - An Efficient Method for Deidentifying Protected Health Information in Chinese Electronic Health Records: Algorithm Development and Validation JO - JMIR Med Inform SP - e38154 VL - 10 IS - 8 KW - EHR KW - PHI KW - personal information KW - protected data KW - protected information KW - patient information KW - health information KW - de-identification KW - de-identify KW - privacy KW - TinyBert KW - model KW - development KW - algorithm KW - machine learning KW - CRF KW - data augmentation KW - health record KW - medical record N2 - Background: With the popularization of electronic health records in China, the utilization of digitalized data has great potential for the development of real-world medical research. However, the data usually contain a great deal of protected health information, and the direct usage of these data may cause privacy issues. The task of deidentifying protected health information in electronic health records can be regarded as a named entity recognition problem. Existing rule-based, machine learning–based, and deep learning–based methods have been proposed to solve this problem. However, these methods still face the difficulties of insufficient Chinese electronic health record data and the complex features of the Chinese language. Objective: This paper proposes a method to overcome the difficulties of overfitting and a lack of training data for deep neural networks to enable Chinese protected health information deidentification. 
Methods: We propose a new model that merges TinyBERT (bidirectional encoder representations from transformers) as a text feature extraction module and the conditional random field method as a prediction module for deidentifying protected health information in Chinese medical electronic health records. In addition, a hybrid data augmentation method that integrates a sentence generation strategy and a mention-replacement strategy is proposed for overcoming insufficient Chinese electronic health records. Results: We compare our method with 5 baseline methods that utilize different BERT models as their feature extraction modules. Experimental results on the Chinese electronic health records that we collected demonstrate that our method had better performance (microprecision: 98.7%, microrecall: 99.13%, and micro-F1 score: 98.91%) and higher efficiency (40% faster) than all the BERT-based baseline methods. Conclusions: Compared to baseline methods, the efficiency advantage of TinyBERT on our proposed augmented data set was kept while the performance improved for the task of Chinese protected health information deidentification. UR - https://medinform.jmir.org/2022/8/e38154 UR - http://dx.doi.org/10.2196/38154 UR - http://www.ncbi.nlm.nih.gov/pubmed/36040774 ID - info:doi/10.2196/38154 ER - TY - JOUR AU - Maré, Adele Irma AU - Kramer, Beverley AU - Hazelhurst, Scott AU - Nhlapho, Dorcus Mapule AU - Zent, Roy AU - Harris, A. 
Paul AU - Klipin, Michael PY - 2022/8/30 TI - Electronic Data Capture System (REDCap) for Health Care Research and Training in a Resource-Constrained Environment: Technology Adoption Case Study JO - JMIR Med Inform SP - e33402 VL - 10 IS - 8 KW - electronic data capture KW - implementation science KW - Research Electronic Data Capture KW - REDCap KW - biomedical informatics KW - South Africa N2 - Background: Electronic data capture (EDC) in academic health care organizations provides an opportunity for the management, aggregation, and secondary use of research and clinical data. It is especially important in resource-constrained environments such as the South African public health care sector, where paper records are still the main form of clinical record keeping. Objective: The aim of this study was to describe the strategies followed by the University of the Witwatersrand Faculty of Health Sciences (Wits FHS) during the period from 2013 to 2021 to overcome resistance to, and encourage the adoption of, the REDCap (Research Electronic Data Capture; Vanderbilt University) system by academic and clinical staff. REDCap has found wide use in varying domains, including clinical studies and research projects as well as administrative, financial, and human resource applications. Given REDCap's global footprint in >5000 institutions worldwide and potential for future growth, the strategies followed by the Wits FHS to support users and encourage adoption may be of importance to others using the system, particularly in resource-constrained settings. 
Methods: The strategies to support users and encourage adoption included top-down organizational support; secure and reliable application, hosting infrastructure, and systems administration; an enabling and accessible REDCap support team; regular hands-on training workshops covering REDCap project setup and data collection instrument design techniques; annual local symposia to promote networking and awareness of all the latest software features and best practices for using them; participation in REDCap Consortium activities; and regular and ongoing mentorship from members of the Vanderbilt University Medical Center. Results: During the period from 2013 to 2021, the use of the REDCap EDC system by individuals at the Wits FHS increased from 129 active user accounts to 3447 active user accounts. The number of REDCap projects increased from 149 in 2013 to 12,865 in 2021. REDCap at Wits also supported various publications and research outputs, including journal articles and postgraduate monographs. As of 2020, a total of 233 journal articles and 87 postgraduate monographs acknowledged the use of the Wits REDCap system. Conclusions: By providing reliable infrastructure and accessible support resources, we were able to successfully implement and grow the REDCap EDC system at the Wits FHS and its associated academic medical centers. We believe that the increase in the use of REDCap was driven by offering a dependable, secure service with a strong end-user training and support model. This model may be applied by other academic and health care organizations in resource-constrained environments planning to implement EDC technology. 
UR - https://medinform.jmir.org/2022/8/e33402 UR - http://dx.doi.org/10.2196/33402 UR - http://www.ncbi.nlm.nih.gov/pubmed/36040763 ID - info:doi/10.2196/33402 ER - TY - JOUR AU - Noor, Kawsar AU - Roguski, Lukasz AU - Bai, Xi AU - Handy, Alex AU - Klapaukh, Roman AU - Folarin, Amos AU - Romao, Luis AU - Matteson, Joshua AU - Lea, Nathan AU - Zhu, Leilei AU - Asselbergs, W. Folkert AU - Wong, Keong Wai AU - Shah, Anoop AU - Dobson, JB Richard PY - 2022/8/24 TI - Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals JO - JMIR Med Inform SP - e38122 VL - 10 IS - 8 KW - natural language processing KW - text mining KW - information retrieval KW - electronic health record system KW - clinical support N2 - Background: As more health care organizations transition to using electronic health record (EHR) systems, it is important for these organizations to maximize the secondary use of their data to support service improvement and clinical research. These organizations will find it challenging to have systems capable of harnessing the unstructured data fields in the record (clinical notes, letters, etc) and more practically have such systems interact with all of the hospital data systems (legacy and current). Objective: We describe the deployment of the EHR interfacing information extraction and retrieval platform CogStack at University College London Hospitals (UCLH). Methods: At UCLH, we have deployed the CogStack platform, an information retrieval platform with natural language processing capabilities. The platform addresses the problem of data ingestion and harmonization from multiple data sources using the Apache NiFi module for managing complex data flows. The platform also facilitates the extraction of structured data from free-text records through use of the MedCAT natural language processing library. 
Finally, data science tools are made available to support data scientists and the development of downstream applications dependent upon data ingested and analyzed by CogStack. Results: The platform has been deployed at the hospital, and in particular, it has facilitated a number of research and service evaluation projects. To date, we have processed over 30 million records, and the insights produced from CogStack have informed a number of clinical research use cases at the hospital. Conclusions: The CogStack platform can be configured to handle the data ingestion and harmonization challenges faced by a hospital. More importantly, the platform enables the hospital to unlock important clinical information from the unstructured portion of the record using natural language processing technology. UR - https://medinform.jmir.org/2022/8/e38122 UR - http://dx.doi.org/10.2196/38122 UR - http://www.ncbi.nlm.nih.gov/pubmed/36001371 ID - info:doi/10.2196/38122 ER - TY - JOUR AU - Shi, Jianlin AU - Morgan, L. Keaton AU - Bradshaw, L. Richard AU - Jung, Se-Hee AU - Kohlmann, Wendy AU - Kaphingst, A. Kimberly AU - Kawamoto, Kensaku AU - Fiol, Del Guilherme PY - 2022/8/11 TI - Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach JO - JMIR Med Inform SP - e37842 VL - 10 IS - 8 KW - clinical natural language processing KW - family health history extraction KW - cohort identification KW - genetic testing of hereditary cancers N2 - Background: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive. 
Objective: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP. Methods: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR. Results: Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria. Conclusions: Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers. 
UR - https://medinform.jmir.org/2022/8/e37842 UR - http://dx.doi.org/10.2196/37842 UR - http://www.ncbi.nlm.nih.gov/pubmed/35969459 ID - info:doi/10.2196/37842 ER - TY - JOUR AU - Li, Jili AU - Liu, Siru AU - Hu, Yundi AU - Zhu, Lingfeng AU - Mao, Yujia AU - Liu, Jialin PY - 2022/8/9 TI - Predicting Mortality in Intensive Care Unit Patients With Heart Failure Using an Interpretable Machine Learning Model: Retrospective Cohort Study JO - J Med Internet Res SP - e38082 VL - 24 IS - 8 KW - heart failure KW - mortality KW - intensive care unit KW - prediction KW - XGBoost KW - SHAP KW - SHapley Additive exPlanation N2 - Background: Heart failure (HF) is a common disease and a major public health problem. HF mortality prediction is critical for developing individualized prevention and treatment plans. However, due to their lack of interpretability, most HF mortality prediction models have not yet reached clinical practice. Objective: We aimed to develop an interpretable model to predict the mortality risk for patients with HF in intensive care units (ICUs) and used the SHapley Additive exPlanation (SHAP) method to explain the extreme gradient boosting (XGBoost) model and explore prognostic factors for HF. Methods: In this retrospective cohort study, we achieved model development and performance comparison on the eICU Collaborative Research Database (eICU-CRD). We extracted data during the first 24 hours of each ICU admission, and the data set was randomly divided, with 70% used for model training and 30% used for model validation. The prediction performance of the XGBoost model was compared with three other machine learning models by the area under the curve. We used the SHAP method to explain the XGBoost model. Results: A total of 2798 eligible patients with HF were included in the final cohort for this study. The observed in-hospital mortality of patients with HF was 9.97%. 
Comparatively, the XGBoost model had the highest predictive performance among the four models with an area under the curve (AUC) of 0.824 (95% CI 0.7766-0.8708), whereas support vector machine had the poorest generalization ability (AUC=0.701, 95% CI 0.6433-0.7582). The decision curve showed that the net benefit of the XGBoost model surpassed those of other machine learning models at 10%-28% threshold probabilities. The SHAP method reveals the top 20 predictors of HF according to the importance ranking, and the average blood urea nitrogen level was recognized as the most important predictor variable. Conclusions: The interpretable predictive model helps physicians more accurately predict the mortality risk in ICU patients with HF, and therefore, provides better treatment plans and optimal resource allocation for their patients. In addition, the interpretable framework can increase the transparency of the model and facilitate understanding the reliability of the predictive model for the physicians. UR - https://www.jmir.org/2022/8/e38082 UR - http://dx.doi.org/10.2196/38082 UR - http://www.ncbi.nlm.nih.gov/pubmed/35943767 ID - info:doi/10.2196/38082 ER - TY - JOUR AU - Wendelboe, Aaron AU - Saber, Ibrahim AU - Dvorak, Justin AU - Adamski, Alys AU - Feland, Natalie AU - Reyes, Nimia AU - Abe, Karon AU - Ortel, Thomas AU - Raskob, Gary PY - 2022/8/5 TI - Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study JO - JMIR Bioinform Biotech SP - e36877 VL - 3 IS - 1 KW - venous thromboembolism KW - public health surveillance KW - machine learning KW - natural language processing KW - medical imaging review KW - public health N2 - Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. 
VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools have the ability to access electronic medical records, identify patients that meet the VTE case definition, and subsequently enter the relevant information into a database for hospital review. Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University), an NLP tool, in automatically classifying cases of VTE by "reading" unstructured text from diagnostic imaging records collected from 2012 to 2014. Methods: After accessing imaging records from pilot surveillance systems for VTE from Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified. Experts reviewed the technicians' comments in each record to determine if a VTE event occurred. The performance measures calculated (with 95% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05. Results: The VTE model of IDEAL-X "read" 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI 93.7%-93.8%), 96.3% sensitivity (95% CI 96.2%-96.4%), 92% specificity (95% CI 91.9%-92%), an 89.1% positive predictive value (95% CI 89%-89.2%), and a 97.3% negative predictive value (95% CI 97.3%-97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI 97.8%-98%) than at the OUHSC (93.3%, 95% CI 93.1%-93.4%; P<.001), but the specificity was higher at the OUHSC (95.9%, 95% CI 95.8%-96%) than at Duke University (86.5%, 95% CI 86.4%-86.7%; P<.001). 
Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X in a medical record system could further automate the surveillance process. UR - https://bioinform.jmir.org/2022/1/e36877 UR - http://dx.doi.org/10.2196/36877 UR - http://www.ncbi.nlm.nih.gov/pubmed/37206160 ID - info:doi/10.2196/36877 ER - TY - JOUR AU - Krzyzanowski, Brittany AU - Manson, M. Steven PY - 2022/8/3 TI - Twenty Years of the Health Insurance Portability and Accountability Act Safe Harbor Provision: Unsolved Challenges and Ways Forward JO - JMIR Med Inform SP - e37756 VL - 10 IS - 8 KW - Health Insurance Portability and Accountability Act KW - HIPAA KW - data privacy KW - health KW - maps KW - safe harbor KW - visualization KW - patient privacy UR - https://medinform.jmir.org/2022/8/e37756 UR - http://dx.doi.org/10.2196/37756 UR - http://www.ncbi.nlm.nih.gov/pubmed/35921140 ID - info:doi/10.2196/37756 ER - TY - JOUR AU - Fan, Bi AU - Peng, Jiaxuan AU - Guo, Hainan AU - Gu, Haobin AU - Xu, Kangkang AU - Wu, Tingting PY - 2022/7/20 TI - Accurate Forecasting of Emergency Department Arrivals With Internet Search Index and Machine Learning Models: Model Development and Performance Evaluation JO - JMIR Med Inform SP - e34504 VL - 10 IS - 7 KW - emergency department KW - internet search index KW - machine learning KW - nonlinear model KW - patient arrival forecasting N2 - Background: Emergency department (ED) overcrowding is a concerning global health care issue, which is mainly caused by the uncertainty of patient arrivals, especially 
during the pandemic. Accurate forecasting of patient arrivals can allow health resource allocation in advance to reduce overcrowding. Currently, traditional data, such as historical patient visits, weather, holiday, and calendar, are primarily used to create forecasting models. However, data from an internet search engine (eg, Google) are less studied, although they can provide pivotal real-time surveillance information. The internet data can be employed to improve forecasting performance and provide early warning, especially during the epidemic. Moreover, possible nonlinearities between patient arrivals and these variables are often ignored. Objective: This study aims to develop an intelligent forecasting system with machine learning models and internet search index to provide an accurate prediction of ED patient arrivals, to verify the effectiveness of the internet search index, and to explore whether nonlinear models can improve the forecasting accuracy. Methods: Data on ED patient arrivals were collected from July 12, 2009, to June 27, 2010, the period of the 2009 H1N1 pandemic. These included 139,910 ED visits in our collaborative hospital, which is one of the biggest public hospitals in Hong Kong. Traditional data were also collected during the same period. The internet search index was generated from 268 search queries on Google to comprehensively capture the information about potential patients. The relationship between the index and patient arrivals was verified by Pearson correlation coefficient, Johansen cointegration, and Granger causality. Linear and nonlinear models were then developed with the internet search index to predict patient arrivals. The accuracy and robustness were also examined. Results: All models could accurately predict patient arrivals. The causality test indicated internet search index as a strong predictor of ED patient arrivals. 
With the internet search index, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the linear model reduced from 5.3% to 5.0% and from 24.44 to 23.18, respectively, whereas the MAPE and RMSE of the nonlinear model decreased even more, from 3.5% to 3% and from 16.72 to 14.55, respectively. The experimental results revealed that the forecasting system with extreme learning machine, as well as the internet search index, had the best performance in both forecasting accuracy and robustness analysis. Conclusions: The proposed forecasting system can make accurate, real-time prediction of ED patient arrivals. Compared with the static traditional variables, the internet search index significantly improves forecasting as a reliable predictor monitoring continuous behavior trend and sudden changes during the epidemic (P=.002). The nonlinear model performs better than the linear counterparts by capturing the dynamic relationship between the index and patient arrivals. Thus, the system can facilitate staff planning and workflow monitoring. UR - https://medinform.jmir.org/2022/7/e34504 UR - http://dx.doi.org/10.2196/34504 UR - http://www.ncbi.nlm.nih.gov/pubmed/35857360 ID - info:doi/10.2196/34504 ER - TY - JOUR AU - Ma, E. Jessica AU - Grubber, Janet AU - Coffman, J. Cynthia AU - Wang, Virginia AU - Hastings, Nicole S. AU - Allen, D. Kelli AU - Shepherd-Banigan, Megan AU - Decosimo, Kasey AU - Dadolf, Joshua AU - Sullivan, Caitlin AU - Sperber, R. Nina AU - Van Houtven, H. Courtney PY - 2022/7/18 TI - Identifying Family and Unpaid Caregivers in Electronic Health Records: Descriptive Analysis JO - JMIR Form Res SP - e35623 VL - 6 IS - 7 KW - veterans KW - caregivers KW - electronic health record N2 - Background: Most efforts to identify caregivers for research use passive approaches such as self-nomination. 
We describe an approach in which electronic health records (EHRs) can help identify, recruit, and increase diverse representations of family and other unpaid caregivers. Objective: Few health systems have implemented systematic processes for identifying caregivers. This study aimed to develop and evaluate an EHR-driven process for identifying veterans likely to have unpaid caregivers in a caregiver survey study. We additionally examined whether there were EHR-derived veteran characteristics associated with veterans having unpaid caregivers. Methods: We selected EHR home- and community-based referrals suggestive of veterans' need for supportive care from friends or family. We identified veterans with these referrals across the 8 US Department of Veterans Affairs medical centers enrolled in our study. Phone calls to a subset of these veterans confirmed whether they had a caregiver, specifically an unpaid caregiver. We calculated the screening contact rate for unpaid caregivers of veterans using attempted phone screening and for those who completed phone screening. The veteran characteristics from the EHR were compared across referral and screening groups using descriptive statistics, and logistic regression was used to compare the likelihood of having an unpaid caregiver among veterans who completed phone screening. Results: During the study period, our EHR-driven process identified 12,212 veterans with home- and community-based referrals; 2134 (17.47%) veteran households were called for phone screening. Among the 2134 veterans called, 1367 (64.06%) answered the call, and 813 (38.1%) veterans had a caregiver based on self-report of the veteran, their caregiver, or another person in the household. The unpaid caregiver identification rate was 38.1% and 59.5% among those with an attempted phone screening and completed phone screening, respectively. 
Veterans had increased odds of having an unpaid caregiver if they were married (adjusted odds ratio [OR] 2.69, 95% CI 1.68-4.34), had respite care (adjusted OR 2.17, 95% CI 1.41-3.41), or had adult day health care (adjusted OR 3.69, 95% CI 1.60-10.00). Veterans with a dementia diagnosis (adjusted OR 1.37, 95% CI 1.00-1.89) or veteran-directed care referral (adjusted OR 1.95, 95% CI 0.97-4.20) were also suggestive of an association with having an unpaid caregiver. Conclusions: The EHR-driven process to identify veterans likely to have unpaid caregivers is systematic and resource intensive. Approximately 60% (813/1367) of veterans who were successfully screened had unpaid caregivers. In the absence of discrete fields in the EHR, our EHR-driven process can be used to identify unpaid caregivers; however, incorporating caregiver identification fields into the EHR would support a more efficient and systematic identification of caregivers. Trial Registration: ClinicalTrials.gov NCT03474380; https://clinicaltrials.gov/ct2/show/NCT03474380 UR - https://formative.jmir.org/2022/7/e35623 UR - http://dx.doi.org/10.2196/35623 UR - http://www.ncbi.nlm.nih.gov/pubmed/35849430 ID - info:doi/10.2196/35623 ER - TY - JOUR AU - Jing, Xia AU - Patel, L. Vimla AU - Cimino, J. James AU - Shubrook, H. Jay AU - Zhou, Yuchun AU - Liu, Chang AU - De Lacalle, Sonsoles PY - 2022/7/18 TI - The Roles of a Secondary Data Analytics Tool and Experience in Scientific Hypothesis Generation in Clinical Research: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e39414 VL - 11 IS - 7 KW - clinical research KW - observational study KW - scientific hypothesis generation KW - secondary data analytics tool KW - think-aloud method N2 - Background: Scientific hypothesis generation is a critical step in scientific research that determines the direction and impact of any investigation. 
Despite its vital role, we have limited knowledge of the process itself, thus hindering our ability to address some critical questions. Objective: This study aims to answer the following questions: To what extent can secondary data analytics tools facilitate the generation of scientific hypotheses during clinical research? Are the processes similar in developing clinical diagnoses during clinical practice and developing scientific hypotheses for clinical research projects? Furthermore, this study explores the process of scientific hypothesis generation in the context of clinical research. It was designed to compare the role of VIADS, a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies, and the experience levels of study participants during the scientific hypothesis generation process. Methods: This manuscript introduces a study design. Experienced and inexperienced clinical researchers have been recruited since July 2021 to take part in this 2×2 factorial study, in which all participants use the same data sets during scientific hypothesis-generation sessions and follow predetermined scripts. The clinical researchers are separated into experienced or inexperienced groups based on predetermined criteria and are then randomly assigned into groups that use and do not use VIADS via block randomization. The study sessions, screen activities, and audio recordings of participants are captured. Participants use the think-aloud protocol during the study sessions. After each study session, every participant is given a follow-up survey, with participants using VIADS completing an additional modified System Usability Scale survey. A panel of clinical research experts will assess the scientific hypotheses generated by participants based on predeveloped metrics. All data will be anonymized, transcribed, aggregated, and analyzed. Results: Data collection for this study began in July 2021. 
Recruitment uses a brief online survey. The preliminary results showed that study participants can generate a few to over a dozen scientific hypotheses during a 2-hour study session, regardless of whether they used VIADS or other analytics tools. A metric to more accurately, comprehensively, and consistently assess scientific hypotheses within a clinical research context has been developed. Conclusions: The scientific hypothesis-generation process is an advanced cognitive activity and a complex process. Our results so far show that clinical researchers can quickly generate initial scientific hypotheses based on data sets and prior experience. However, refining these scientific hypotheses is a much more time-consuming activity. To uncover the fundamental mechanisms underlying the generation of scientific hypotheses, we need breakthroughs that can capture thinking processes more precisely. International Registered Report Identifier (IRRID): DERR1-10.2196/39414 UR - https://www.researchprotocols.org/2022/7/e39414 UR - http://dx.doi.org/10.2196/39414 UR - http://www.ncbi.nlm.nih.gov/pubmed/35736798 ID - info:doi/10.2196/39414 ER - TY - JOUR AU - Chen, Pei-Fu AU - Chen, Kuan-Chih AU - Liao, Wei-Chih AU - Lai, Feipei AU - He, Tai-Liang AU - Lin, Sheng-Che AU - Chen, Wei-Jen AU - Yang, Chi-Yu AU - Lin, Yu-Cheng AU - Tsai, I-Chang AU - Chiu, Chi-Hao AU - Chang, Shu-Chih AU - Hung, Fang-Ming PY - 2022/6/29 TI - Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches JO - JMIR Med Inform SP - e37557 VL - 10 IS - 6 KW - deep learning KW - International Classification of Diseases KW - medical records KW - multilabel text classification KW - natural language processing KW - coding system KW - algorithm KW - electronic health record KW - data mining N2 - Background: The tenth revision of the International Classification of Diseases (ICD-10) is widely used for epidemiological research and health management. 
The clinical modification (CM) and procedure coding system (PCS) of ICD-10 were developed to describe more clinical details with increasing diagnosis and procedure codes and applied in disease-related groups for reimbursement. The expansion of codes made the coding time-consuming and less accurate. The state-of-the-art model using deep contextual word embeddings was used for automatic multilabel text classification of ICD-10. In addition to input discharge diagnoses (DD), the performance can be improved by appropriate preprocessing methods for the text from other document types, such as medical history, comorbidity and complication, surgical method, and special examination. Objective: This study aims to establish a contextual language model with rule-based preprocessing methods to develop the model for ICD-10 multilabel classification. Methods: We retrieved electronic health records from a medical center. We first compared different word embedding methods. Second, we compared the preprocessing methods using the best-performing embeddings. We compared biomedical bidirectional encoder representations from transformers (BioBERT), clinical generalized autoregressive pretraining for language understanding (Clinical XLNet), label tree-based attention-aware deep model for high-performance extreme multilabel text classification (AttentionXLM), and word-to-vector (Word2Vec) to predict ICD-10-CM. To compare different preprocessing methods for ICD-10-CM, we included DD, medical history, and comorbidity and complication as inputs. We compared the performance of ICD-10-CM prediction using different preprocesses, including definition training, external cause code removal, number conversion, and combination code filtering. For the ICD-10 PCS, the model was trained using different combinations of DD, surgical method, and key words of special examination. 
The micro F1 score and the micro area under the receiver operating characteristic curve were used to compare the model's performance with that of different preprocessing methods. Results: BioBERT had an F1 score of 0.701 and outperformed other models such as Clinical XLNet, AttentionXLM, and Word2Vec. For the ICD-10-CM, the model had an F1 score that significantly increased from 0.749 (95% CI 0.744-0.753) to 0.769 (95% CI 0.764-0.773) with the ICD-10 definition training, external cause code removal, number conversion, and combination code filter. For the ICD-10-PCS, the model had an F1 score that significantly increased from 0.670 (95% CI 0.663-0.678) to 0.726 (95% CI 0.719-0.732) with a combination of discharge diagnoses, surgical methods, and key words of special examination. With our preprocessing methods, the model had the highest area under the receiver operating characteristic curve of 0.853 (95% CI 0.849-0.855) and 0.831 (95% CI 0.827-0.834) for ICD-10-CM and ICD-10-PCS, respectively. Conclusions: The performance of our model with the pretrained contextualized language model and rule-based preprocessing method is better than that of the state-of-the-art model for ICD-10-CM or ICD-10-PCS. This study highlights the importance of rule-based preprocessing methods based on coder coding rules. 
UR - https://medinform.jmir.org/2022/6/e37557 UR - http://dx.doi.org/10.2196/37557 UR - http://www.ncbi.nlm.nih.gov/pubmed/35767353 ID - info:doi/10.2196/37557 ER - TY - JOUR AU - McLennan, Stuart AU - Rachut, Sarah AU - Lange, Johannes AU - Fiske, Amelia AU - Heckmann, Dirk AU - Buyx, Alena PY - 2022/6/27 TI - Practices and Attitudes of Bavarian Stakeholders Regarding the Secondary Use of Health Data for Research Purposes During the COVID-19 Pandemic: Qualitative Interview Study JO - J Med Internet Res SP - e38754 VL - 24 IS - 6 KW - COVID-19 KW - data sharing KW - General Data Protection Regulation KW - GDPR KW - research exemption KW - public health KW - research KW - digital health KW - electronic health records N2 - Background: The COVID-19 pandemic is a threat to global health and requires collaborative health research efforts across organizations and countries to address it. Although routinely collected digital health data are a valuable source of information for researchers, benefiting from these data requires accessing and sharing the data. Health care organizations focusing on individual risk minimization threaten to undermine COVID-19 research efforts, and it has been argued that there is an ethical obligation to use the European Union's General Data Protection Regulation (GDPR) scientific research exemption during the COVID-19 pandemic to support collaborative health research. Objective: This study aims to explore the practices and attitudes of stakeholders in the German federal state of Bavaria regarding the secondary use of health data for research purposes during the COVID-19 pandemic, with a specific focus on the GDPR scientific research exemption. 
Methods: Individual semistructured qualitative interviews were conducted between December 2020 and January 2021 with a purposive sample of 17 stakeholders from 3 different groups in Bavaria: researchers involved in COVID-19 research (n=5, 29%), data protection officers (n=6, 35%), and research ethics committee representatives (n=6, 35%). The transcripts were analyzed using conventional content analysis. Results: Participants identified systemic challenges in conducting collaborative secondary-use health data research in Bavaria; secondary health data research generally only happens when patient consent has been obtained, or the data have been fully anonymized. The GDPR research exemption has not played a significant role during the pandemic and is currently seldom and restrictively used. Participants identified 3 key groups of barriers that led to difficulties: the wider ecosystem at many Bavarian health care organizations, legal uncertainty that leads to risk-averse approaches, and ethical positions that patient consent ought to be obtained whenever possible to respect patient autonomy. To improve health data research in Bavaria and across Germany, participants wanted greater legal certainty regarding the use of pseudonymized data for research purposes without the patient's consent. Conclusions: The current balance between enabling the positive goals of health data research and avoiding associated data protection risks is heavily skewed toward avoiding risks; so much so that it makes reaching the goals of health data research extremely difficult. This is important, as it is widely recognized that there is an ethical imperative to use health data to improve care. The current approach also creates a problematic conflict with the ambitions of Germany, and the federal state of Bavaria, to be a leader in artificial intelligence. 
A recent development in the field of German public administration known as norm screening (Normenscreening) could potentially provide a systematic approach to minimize legal barriers. This approach would likely be beneficial to other countries. UR - https://www.jmir.org/2022/6/e38754 UR - http://dx.doi.org/10.2196/38754 UR - http://www.ncbi.nlm.nih.gov/pubmed/35696598 ID - info:doi/10.2196/38754 ER - TY - JOUR AU - Ge, Wendong AU - Alabsi, Haitham AU - Jain, Aayushee AU - Ye, Elissa AU - Sun, Haoqi AU - Fernandes, Marta AU - Magdamo, Colin AU - Tesh, A. Ryan AU - Collens, I. Sarah AU - Newhouse, Amy AU - MVR Moura, Lidia AU - Zafar, Sahar AU - Hsu, John AU - Akeju, Oluwaseun AU - Robbins, K. Gregory AU - Mukerji, S. Shibani AU - Das, Sudeshna AU - Westover, Brandon M. PY - 2022/6/24 TI - Identifying Patients With Delirium Based on Unstructured Clinical Notes: Observational Study JO - JMIR Form Res SP - e33834 VL - 6 IS - 6 KW - delirium KW - electronic health records KW - clinical notes KW - machine learning KW - natural language processing N2 - Background: Delirium in hospitalized patients is a syndrome of acute brain dysfunction. Diagnostic (International Classification of Diseases [ICD]) codes are often used in studies using electronic health records (EHRs), but they are inaccurate. Objective: We sought to develop a more accurate method using natural language processing (NLP) to detect delirium episodes on the basis of unstructured clinical notes. Methods: We collected 1.5 million notes from >10,000 patients from among 9 hospitals. Seven experts iteratively labeled 200,471 sentences. Using these, we trained three NLP classifiers: Support Vector Machine, Recurrent Neural Networks, and Transformer. Testing was performed using an external data set. We also evaluated associations with delirium billing (ICD) codes, medications, orders for restraints and sitters, direct assessments (Confusion Assessment Method [CAM] scores), and in-hospital mortality. 
F1 scores, confusion matrices, and areas under the receiver operating characteristic curve (AUCs) were used to compare NLP models. We used the φ coefficient to measure associations with other delirium indicators. Results: The transformer NLP performed best on the following parameters: micro F1=0.978, macro F1=0.918, positive AUC=0.984, and negative AUC=0.992. NLP detections exhibited higher correlations (φ) than ICD codes with deliriogenic medications (0.194 vs 0.073 for ICD codes), restraints and sitter orders (0.358 vs 0.177), mortality (0.216 vs 0.000), and CAM scores (0.256 vs −0.028). Conclusions: Clinical notes are an attractive alternative to ICD codes for EHR delirium studies but require automated methods. Our NLP model detects delirium with high accuracy, similar to manual chart review. Our NLP approach can provide more accurate determination of delirium for large-scale EHR-based studies regarding delirium, quality improvement, and clinical trials. UR - https://formative.jmir.org/2022/6/e33834 UR - http://dx.doi.org/10.2196/33834 UR - http://www.ncbi.nlm.nih.gov/pubmed/35749214 ID - info:doi/10.2196/33834 ER - TY - JOUR AU - Yang, Hao AU - Li, Jiaxi AU - Liu, Siru AU - Yang, Xiaoling AU - Liu, Jialin PY - 2022/6/16 TI - Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record-Based Machine Learning: Development and Validation JO - JMIR Med Inform SP - e36958 VL - 10 IS - 6 KW - diabetes KW - type 2 diabetes KW - hypoglycemia KW - learning KW - machine learning model KW - EHR KW - electronic health record KW - XGBoost KW - natural language processing N2 - Background: Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed. Objective: The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes. 
Methods: We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance. Results: We included 29,843 patients with type 2 diabetes, of whom 2804 patients (9.4%) developed hypoglycemia. In this study, the embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost3 were 0.82 and 0.93, respectively. XGBoost3 was also superior to other models in DCA. Conclusions: The Paragraph Vector-Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes. UR - https://medinform.jmir.org/2022/6/e36958 UR - http://dx.doi.org/10.2196/36958 UR - http://www.ncbi.nlm.nih.gov/pubmed/35708754 ID - info:doi/10.2196/36958 ER - TY - JOUR AU - Lam, Carson AU - Thapa, Rahul AU - Maharjan, Jenish AU - Rahmani, Keyvan AU - Tso, Foon Chak AU - Singh, Preet Navan AU - Casie Chetty, Satish AU - Mao, Qingqing PY - 2022/6/15 TI - Multitask Learning With Recurrent Neural Networks for Acute Respiratory Distress Syndrome Prediction Using Only Electronic Health Record Data: Model Development and Validation Study JO - JMIR Med Inform SP - e36202 VL - 10 IS - 6 KW - deep learning KW - neural networks KW - ARDS KW - health care KW - multitask learning KW - clinical decision support KW - prediction model KW - COVID-19 KW - electronic health record KW - risk outcome KW - respiratory distress KW - diagnostic criteria KW - recurrent neural network N2 - Background: Acute respiratory distress syndrome (ARDS) is a condition that is often considered to have broad and subjective 
diagnostic criteria and is associated with significant mortality and morbidity. Early and accurate prediction of ARDS and related conditions such as hypoxemia and sepsis could allow timely administration of therapies, leading to improved patient outcomes. Objective: The aim of this study is to perform an exploration of how multilabel classification in the clinical setting can take advantage of the underlying dependencies between ARDS and related conditions to improve early prediction of ARDS in patients. Methods: The electronic health record data set included 40,703 patient encounters from 7 hospitals from April 20, 2018, to March 17, 2021. A recurrent neural network (RNN) was trained using data from 5 hospitals, and external validation was conducted on data from 2 hospitals. In addition to ARDS, 12 target labels for related conditions such as sepsis, hypoxemia, and COVID-19 were used to train the model to classify a total of 13 outputs. As a comparator, XGBoost models were developed for each of the 13 target labels. Model performance was assessed using the area under the receiver operating characteristic curve. Heat maps to visualize attention scores were generated to provide interpretability to the neural networks. Finally, cluster analysis was performed to identify potential phenotypic subgroups of patients with ARDS. Results: The single RNN model trained to classify 13 outputs outperformed the individual XGBoost models for ARDS prediction, achieving an area under the receiver operating characteristic curve of 0.842 on the external test sets. Models trained on an increasing number of tasks resulted in improved performance. Earlier prediction of ARDS nearly doubled the rate of in-hospital survival. Cluster analysis revealed distinct ARDS subgroups, some of which had similar mortality rates but different clinical presentations. 
Conclusions: The RNN model presented in this paper can be used as an early warning system to stratify patients who are at risk of developing one of the multiple risk outcomes, hence providing practitioners with the means to take early action. UR - https://medinform.jmir.org/2022/6/e36202 UR - http://dx.doi.org/10.2196/36202 UR - http://www.ncbi.nlm.nih.gov/pubmed/35704370 ID - info:doi/10.2196/36202 ER - TY - JOUR AU - Zhang, Yichao AU - Lu, Sha AU - Wu, Yina AU - Hu, Wensheng AU - Yuan, Zhenming PY - 2022/6/13 TI - The Prediction of Preterm Birth Using Time-Series Technology-Based Machine Learning: Retrospective Cohort Study JO - JMIR Med Inform SP - e33835 VL - 10 IS - 6 KW - preterm birth prediction KW - temporal data mining KW - electronic medical records KW - pregnant healthcare N2 - Background: Globally, the preterm birth rate has tended to increase over time. Ultrasonography cervical-length assessment is considered to be the most effective screening method for preterm birth, but routine, universal cervical-length screening remains controversial because of its cost. Objective: We used obstetric data to analyze and assess the risk of preterm birth. A machine learning model based on time-series technology was used to analyze regular, repeated obstetric examination records during pregnancy to improve the performance of the preterm birth screening model. Methods: This study attempts to use continuous electronic medical record (EMR) data from pregnant women to construct a preterm birth prediction classifier based on long short-term memory (LSTM) networks. Clinical data were collected from 5187 pregnant Chinese women who gave birth with natural vaginal delivery. The data included more than 25,000 obstetric EMRs from the early trimester to 28 weeks of gestation. The area under the curve (AUC), accuracy, sensitivity, and specificity were used to assess the performance of the prediction model. 
Results: Compared with a traditional cross-sectional study, the LSTM model in this time-series study had better overall prediction ability and a lower misdiagnosis rate at the same detection rate. Accuracy was 0.739, sensitivity was 0.407, specificity was 0.982, and the AUC was 0.651. Important-feature identification indicated that blood pressure, blood glucose, lipids, uric acid, and other metabolic factors were important factors related to preterm birth. Conclusions: The results of this study will be helpful to the formulation of guidelines for the prevention and treatment of preterm birth, and will help clinicians make correct decisions during obstetric examinations. The time-series model has advantages for preterm birth prediction. UR - https://medinform.jmir.org/2022/6/e33835 UR - http://dx.doi.org/10.2196/33835 UR - http://www.ncbi.nlm.nih.gov/pubmed/35700004 ID - info:doi/10.2196/33835 ER - TY - JOUR AU - Davidson, Lena AU - Canelón, P. Silvia AU - Boland, Regina Mary PY - 2022/6/7 TI - Medication-Wide Association Study Using Electronic Health Record Data of Prescription Medication Exposure and Multifetal Pregnancies: Retrospective Study JO - JMIR Med Inform SP - e32229 VL - 10 IS - 6 KW - pregnancy KW - pregnancy, multiple KW - assisted reproductive technique KW - electronic health record N2 - Background: Medication-wide association studies (MWAS) have been applied to assess the risk of individual prescription use and a wide range of health outcomes, including cancer, acute myocardial infarction, acute liver failure, acute renal failure, and upper gastrointestinal ulcers. Current literature on the use of preconception and periconception medication and its association with the risk of multiple gestation pregnancies (eg, monozygotic and dizygotic) is largely based on assisted reproductive technology (ART) cohorts. However, among non-ART pregnancies, it is unknown whether other medications increase the risk of multifetal pregnancies. 
Objective: This study aimed to investigate the risk of multiple gestational births (eg, twins and triplets) following preconception and periconception exposure to prescription medications in patients who delivered at Penn Medicine. Methods: We used electronic health record data between 2010 and 2017 on patients who delivered babies at Penn Medicine, a health care system in the Greater Philadelphia area. We explored 3 logistic regression models: model 1 (no adjustment); model 2 (adjustment for maternal age); and model 3, our final logistic regression model (adjustment for maternal age, ART use, and infertility diagnosis). In all models, multiple births (MBs) were our outcome of interest (binary outcome), and each medication was assessed separately as a binary variable. To assess our MWAS model performance, we defined ART medications as our gold standard, given that these medications are known to increase the risk of MB. Results: Of the 63,334 distinct deliveries in our cohort, only 1877 pregnancies (2.96%) were prescribed any medication during the preconception and first trimester period. Of the 123 medications prescribed, we found 26 (21.1%) medications associated with MB (using nominal P values) and 10 (8.1%) medications associated with MB (using Bonferroni adjustment) in fully adjusted model 3. We found that our model 3 algorithm had an accuracy of 85% (using nominal P values) and 89% (using Bonferroni-adjusted P values). Conclusions: Our work demonstrates the opportunities in applying the MWAS approach with electronic health record data to explore associations between preconception and periconception medication exposure and the risk of MB while identifying novel candidate medications for further study. Overall, we found 3 novel medications linked with MB that could be explored in further work; this demonstrates the potential of our method to be used for hypothesis generation. 
UR - https://medinform.jmir.org/2022/6/e32229 UR - http://dx.doi.org/10.2196/32229 UR - http://www.ncbi.nlm.nih.gov/pubmed/35671076 ID - info:doi/10.2196/32229 ER - TY - JOUR AU - Wu, Yonghui AU - Yang, Xi AU - Morris, L. Heather AU - Gurka, J. Matthew AU - Shenkman, A. Elizabeth AU - Cusi, Kenneth AU - Bril, Fernando AU - Donahoo, T. William PY - 2022/6/6 TI - Noninvasive Diagnosis of Nonalcoholic Steatohepatitis and Advanced Liver Fibrosis Using Machine Learning Methods: Comparative Study With Existing Quantitative Risk Scores JO - JMIR Med Inform SP - e36997 VL - 10 IS - 6 KW - machine learning KW - nonalcoholic fatty liver disease KW - nonalcoholic steatohepatitis KW - fatty liver KW - liver fibrosis N2 - Background: Nonalcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation; however, they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications and high expenses. Knowing the difference between the more benign isolated steatosis and the more severe NASH and cirrhosis informs the physician regarding the need for more aggressive management. Objective: We intend to explore the feasibility of using machine learning methods for noninvasive diagnosis of NASH and advanced liver fibrosis and compare machine learning methods with existing quantitative risk scores. Methods: We conducted a retrospective analysis of clinical data from a cohort of 492 patients with biopsy-proven nonalcoholic fatty liver disease (NAFLD), NASH, or advanced fibrosis. We systematically compared 5 widely used machine learning algorithms for the prediction of NAFLD, NASH, and fibrosis using 2 variable encoding strategies. 
Then, we compared the machine learning methods with 3 existing quantitative scores and identified the important features for prediction using the SHapley Additive exPlanations method. Results: The best machine learning method, gradient boosting (GB), achieved the best area under the curve scores of 0.9043, 0.8166, and 0.8360 for NAFLD, NASH, and advanced fibrosis, respectively. GB also outperformed 3 existing risk scores for fibrosis. Among the variables, alanine aminotransferase (ALT), triglyceride (TG), and BMI were the important risk factors for the prediction of NAFLD, whereas aspartate transaminase (AST), ALT, and TG were the important variables for the prediction of NASH, and AST, hyperglycemia (A1c), and high-density lipoprotein were the important variables for predicting advanced fibrosis. Conclusions: It is feasible to use machine learning methods for predicting NAFLD, NASH, and advanced fibrosis using routine clinical data, which potentially can be used to better identify patients who still need liver biopsy. Additionally, understanding the relative importance and differences in predictors could lead to improved understanding of the disease process as well as support for identifying novel treatment options. UR - https://medinform.jmir.org/2022/6/e36997 UR - http://dx.doi.org/10.2196/36997 UR - http://www.ncbi.nlm.nih.gov/pubmed/35666557 ID - info:doi/10.2196/36997 ER - TY - JOUR AU - Alvarez-Romero, Celia AU - Martinez-Garcia, Alicia AU - Ternero Vega, Jara AU - Díaz-Jimènez, Pablo AU - Jimènez-Juan, Carlos AU - Nieto-Martín, Dolores María AU - Román Villarán, Esther AU - Kovacevic, Tomi AU - Bokan, Darijo AU - Hromis, Sanja AU - Djekic Malbasa, Jelena AU - Besla?, Suzana AU - Zaric, Bojan AU - Gencturk, Mert AU - Sinaci, Anil A. 
AU - Ollero Baturone, Manuel AU - Parra Calderón, Luis Carlos PY - 2022/6/2 TI - Predicting 30-Day Readmission Risk for Patients With Chronic Obstructive Pulmonary Disease Through a Federated Machine Learning Architecture on Findable, Accessible, Interoperable, and Reusable (FAIR) Data: Development and Validation Study JO - JMIR Med Inform SP - e35307 VL - 10 IS - 6 KW - FAIR principles KW - research data management KW - clinical validation KW - chronic obstructive pulmonary disease KW - privacy-preserving distributed data mining KW - early predictive model N2 - Background: Owing to the nature of health data, their sharing and reuse for research are limited by legal, technical, and ethical implications. In this sense, to address that challenge and facilitate and promote the discovery of scientific knowledge, the Findable, Accessible, Interoperable, and Reusable (FAIR) principles help organizations to share research data in a secure, appropriate, and useful way for other researchers. Objective: The objective of this study was to FAIRify existing health research data sets and to apply a federated machine learning architecture on top of the FAIRified data sets of different health research performing organizations. The entire FAIR4Health solution was validated through the assessment of a federated model for real-time prediction of 30-day readmission risk in patients with chronic obstructive pulmonary disease (COPD). Methods: The application of the FAIR principles on health research data sets in 3 different health care settings enabled a retrospective multicenter study for the development of specific federated machine learning models for the early prediction of 30-day readmission risk in patients with COPD. This predictive model was generated upon the FAIR4Health platform. Finally, an observational prospective study with 30-day follow-up was conducted in 2 health care centers from different countries. 
The same inclusion and exclusion criteria were used in both retrospective and prospective studies. Results: Clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified data sets from different health research performing organizations. The federated model for predicting the 30-day hospital readmission risk was trained using retrospective data from 4,944 patients with COPD. The assessment of the predictive model was performed using the data of 100 recruited patients (22 from Spain and 78 from Serbia) out of 2070 observed (records viewed) patients during the observational prospective study, which was executed from April 2021 to September 2021. Significant accuracy (0.98) and precision (0.25) of the predictive model generated upon the FAIR4Health platform were observed. Therefore, the generated prediction of 30-day readmission risk was confirmed in 87% (87/100) of cases. Conclusions: Implementing a FAIR data policy in health research performing organizations to facilitate data sharing and reuse is relevant and needed, following the discovery, access, integration, and analysis of health research data. The FAIR4Health project proposes a technological solution in the health domain to facilitate alignment with the FAIR principles. UR - https://medinform.jmir.org/2022/6/e35307 UR - http://dx.doi.org/10.2196/35307 UR - http://www.ncbi.nlm.nih.gov/pubmed/35653170 ID - info:doi/10.2196/35307 ER - TY - JOUR AU - Gruendner, Julian AU - Deppenwiese, Noemi AU - Folz, Michael AU - Köhler, Thomas AU - Kroll, Björn AU - Prokosch, Hans-Ulrich AU - Rosenau, Lorenz AU - Rühle, Mathias AU - Scheidl, Marc-Anton AU - Schüttler, Christina AU - Sedlmayr, Brita AU - Twrdik, Alexander AU - Kiel, Alexander AU - Majeed, W. 
Raphael PY - 2022/5/25 TI - The Architecture of a Feasibility Query Portal for Distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) Patient Data Repositories: Design and Implementation Study JO - JMIR Med Inform SP - e36709 VL - 10 IS - 5 KW - federated feasibility queries KW - FHIR KW - distributed analysis KW - feasibility study KW - HL7 FHIR KW - FHIR Search KW - CQL KW - COVID-19 KW - pandemic KW - health data KW - query KW - patient data KW - consensus data set KW - medical informatics KW - Fast Healthcare Interoperability Resources N2 - Background: An essential step in any medical research project after identifying the research question is to determine if there are sufficient patients available for a study and where to find them. Pursuing digital feasibility queries on available patient data registries has proven to be an excellent way of reusing existing real-world data sources. To support multicentric research, these feasibility queries should be designed and implemented to run across multiple sites and securely access local data. Working across hospitals usually involves working with different data formats and vocabularies. Recently, the Fast Healthcare Interoperability Resources (FHIR) standard was developed by Health Level Seven to address this concern and describe patient data in a standardized format. The Medical Informatics Initiative in Germany has committed to this standard and created data integration centers, which convert existing data into the FHIR format at each hospital. This partially solves the interoperability problem; however, a distributed feasibility query platform for the FHIR standard is still missing. Objective: This study described the design and implementation of the components involved in creating a cross-hospital feasibility query platform for researchers based on FHIR resources. This effort was part of a large COVID-19 data exchange platform and was designed to be scalable for a broad range of patient data. 
Methods: We analyzed and designed the abstract components necessary for a distributed feasibility query. This included a user interface for creating the query, a backend with an ontology and terminology service, middleware for query distribution, and a FHIR feasibility query execution service. Results: We implemented the components described in the Methods section. The resulting solution was distributed to 33 German university hospitals. The functionality of the comprehensive network infrastructure was demonstrated using a test data set based on the German Corona Consensus Data Set. A performance test using specifically created synthetic data revealed the applicability of our solution to data sets containing millions of FHIR resources. The solution can be easily deployed across hospitals and supports feasibility queries, combining multiple inclusion and exclusion criteria using standard Health Level Seven query languages such as Clinical Quality Language and FHIR Search. Developing a platform based on multiple microservices allowed us to create an extendable platform and support multiple Health Level Seven query languages and middleware components to allow integration with future directions of the Medical Informatics Initiative. Conclusions: We designed and implemented a feasibility platform for distributed feasibility queries, which works directly on FHIR-formatted data and distributed it across 33 university hospitals in Germany. We showed that developing a feasibility platform directly on the FHIR standard is feasible. UR - https://medinform.jmir.org/2022/5/e36709 UR - http://dx.doi.org/10.2196/36709 UR - http://www.ncbi.nlm.nih.gov/pubmed/35486893 ID - info:doi/10.2196/36709 ER - TY - JOUR AU - Zheng, Chengyi AU - Duffy, Jonathan AU - Liu, Amy In-Lu AU - Sy, S. Lina AU - Navarro, A. Ronald AU - Kim, S. Sunhea AU - Ryan, S. Denison AU - Chen, Wansu AU - Qian, Lei AU - Mercado, Cheryl AU - Jacobsen, J. 
Steven PY - 2022/5/24 TI - Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method JO - JMIR Public Health Surveill SP - e30426 VL - 8 IS - 5 KW - health KW - informatics KW - shoulder injury related to vaccine administration KW - SIRVA KW - natural language processing KW - NLP KW - causal relation KW - temporal relation KW - pharmacovigilance KW - electronic health records KW - EHR KW - vaccine safety KW - artificial intelligence KW - big data KW - population health KW - real-world data KW - vaccines N2 - Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce. Objective: The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes. Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated cases. 
We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases. Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 cases without SIRVA. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively. Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation. UR - https://publichealth.jmir.org/2022/5/e30426 UR - http://dx.doi.org/10.2196/30426 UR - http://www.ncbi.nlm.nih.gov/pubmed/35608886 ID - info:doi/10.2196/30426 ER - TY - JOUR AU - Abaza, Haitham AU - Kadioglu, Dennis AU - Martin, Simona AU - Papadopoulou, Andri AU - dos Santos Vieira, Bruna AU - Schaefer, Franz AU - Storf, Holger PY - 2022/5/20 TI - Domain-Specific Common Data Elements for Rare Disease Registration: Conceptual Approach of a European Joint Initiative Toward Semantic Interoperability in Rare Disease Research JO - JMIR Med Inform SP - e32158 VL - 10 IS - 5 KW - semantic interoperability KW - common data elements KW - standardization KW - data collection KW - data discoverability KW - rare diseases KW - EJP RD KW - EU RD Platform KW - ERNs KW - FAIRification KW - health infrastructure KW - industry KW - medical informatics KW - health platforms KW - health registries KW - health and research platforms KW - health domains N2 - Background: With hundreds of registries across Europe, rare diseases (RDs) suffer from fragmented 
knowledge, expertise, and research. A joint initiative of the European Commission Joint Research Center and its European Platform on Rare Disease Registration (EU RD Platform), the European Reference Networks (ERNs), and the European Joint Programme on Rare Diseases (EJP RD) was launched in 2020. The purpose was to extend the set of common data elements (CDEs) for RD registration by defining domain-specific CDEs (DCDEs). Objective: This study aims to introduce and assess the feasibility of the concept of a joint initiative that unites the efforts of the European Platform on Rare Disease Registration, ERNs, and European Joint Programme on Rare Diseases toward extending RD CDEs, aiming to improve the semantic interoperability of RD registries and enhance the quality of RD research. Methods: A joint conference was conducted in December 2020. All 24 ERNs were invited. Before the conference, a survey was communicated to all ERNs, proposing 18 medical domains and requesting them to identify highly relevant choices. After the conference, a 3-phase plan for defining and modeling DCDEs was drafted. Expected outcomes included harmonized lists of DCDEs. Results: All ERNs attended the conference. The survey results indicated that genetic, congenital, pediatric, and cancer were the most overlapping domains. Accordingly, the proposed list was reorganized into 10 domain groups and recommunicated to all ERNs, aiming at a smaller number of domains. Conclusions: The approach described for defining DCDEs appears to be feasible. However, it remains dynamic and should be repeated regularly based on arising research needs. UR - https://medinform.jmir.org/2022/5/e32158 UR - http://dx.doi.org/10.2196/32158 UR - http://www.ncbi.nlm.nih.gov/pubmed/35594066 ID - info:doi/10.2196/32158 ER - TY - JOUR AU - Zheng, Yaguang AU - Dickson, Vaughan Victoria AU - Blecker, Saul AU - Ng, M. Jason AU - Rice, Campbell Brynne AU - Melkus, D'Eramo Gail AU - Shenkar, Liat AU - Mortejo, R. 
Marie Claire AU - Johnson, B. Stephen PY - 2022/5/16 TI - Identifying Patients With Hypoglycemia Using Natural Language Processing: Systematic Literature Review JO - JMIR Diabetes SP - e34681 VL - 7 IS - 2 KW - hypoglycemia KW - natural language processing KW - electronic health records KW - diabetes N2 - Background: Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. Natural language processing (NLP), a form of artificial intelligence, uses computational algorithms to extract information from text data. NLP is a scalable, efficient, and quick method to extract hypoglycemia-related information when using electronic health record data sources from a large population. Objective: The objective of this systematic review was to synthesize the literature on the application of NLP to extract hypoglycemia from electronic health record clinical notes. Methods: Literature searches were conducted electronically in PubMed, Web of Science Core Collection, CINAHL (EBSCO), PsycINFO (Ovid), IEEE Xplore, Google Scholar, and ACL Anthology. Keywords included hypoglycemia, low blood glucose, NLP, and machine learning. Inclusion criteria included studies that applied NLP to identify hypoglycemia, reported the outcomes related to hypoglycemia, and were published in English as full papers. Results: This review (n=8 studies) revealed heterogeneity of the reported results related to hypoglycemia. Of the 8 included studies, 4 (50%) reported that the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%. The use of NLP to analyze clinical notes improved the capture of undocumented or missed hypoglycemic events using International Classification of Diseases, Ninth Revision (ICD-9), and International Classification of Diseases, Tenth Revision (ICD-10), and laboratory testing. 
The combination of NLP and ICD-9 or ICD-10 codes significantly increased the identification of hypoglycemic events compared with individual methods; for example, the prevalence rates of hypoglycemia were 12.4% for International Classification of Diseases codes, 25.1% for an NLP algorithm, and 32.2% for combined algorithms. All the reviewed studies applied rule-based NLP algorithms to identify hypoglycemia. Conclusions: The findings provided evidence that the application of NLP to analyze clinical notes improved the capture of hypoglycemic events, particularly when combined with the ICD-9 or ICD-10 codes and laboratory testing. UR - https://diabetes.jmir.org/2022/2/e34681 UR - http://dx.doi.org/10.2196/34681 UR - http://www.ncbi.nlm.nih.gov/pubmed/35576579 ID - info:doi/10.2196/34681 ER - TY - JOUR AU - Cuenca-Zaldívar, Nicolás Juan AU - Torrente-Regidor, Maria AU - Martín-Losada, Laura AU - Fernández-De-Las-Peñas, César AU - Florencio, Lima Lidiane AU - Sousa, Alexandre Pedro AU - Palacios-Ceña, Domingo PY - 2022/5/12 TI - Exploring Sentiment and Care Management of Hospitalized Patients During the First Wave of the COVID-19 Pandemic Using Electronic Nursing Health Records: Descriptive Study JO - JMIR Med Inform SP - e38308 VL - 10 IS - 5 KW - electronic health records KW - COVID-19 KW - pandemic KW - content text analysis N2 - Background: The COVID-19 pandemic has changed the usual working of many hospitalization units (or wards). Few studies have used electronic nursing clinical notes (ENCN) and their unstructured text to identify alterations in patients' feelings and therapeutic procedures of interest. Objective: This study aimed to analyze positive or negative sentiments through inspection of the free text of the ENCN, compare the sentiments of ENCN from hospitalized patients with and without COVID-19, carry out temporal analysis of the sentiments of the patients during the start of the first wave of the COVID-19 pandemic, and identify the topics in ENCN. 
Methods: This is a descriptive study with analysis of the text content of ENCN. All ENCNs between January and June 2020 at Guadarrama Hospital (Madrid, Spain) extracted from the CGM Selene Electronic Health Records System were included. Two groups of ENCNs were analyzed: one from hospitalized patients in post–intensive care units for COVID-19 and a second group from hospitalized patients without COVID-19. A sentiment analysis was performed on the lemmatized text, using the National Research Council of Canada, Affin, and Bing dictionaries. A polarity analysis of the sentences was performed using the Bing dictionary, SO Dictionaries V1.11, and Spa dictionary as amplifiers and decrementators. Machine learning techniques were applied to evaluate the presence of significant differences in the ENCN in groups of patients with and those without COVID-19. Finally, a structural analysis of thematic models was performed to study the abstract topics that occur in the ENCN, using Latent Dirichlet Allocation topic modeling. Results: A total of 37,564 electronic health records were analyzed. Sentiment analysis in ENCN showed that patients with subacute COVID-19 have a higher proportion of positive sentiments than those without COVID-19. Also, there are significant differences in polarity between both groups (Z=5.532, P<.001), with a polarity of 0.108 (SD 0.299) in patients with COVID-19 versus 0.09 (SD 0.301) in those without COVID-19. Machine learning modeling showed that, although all models presented high values, the neural network presented the best indicators (>0.8), with significant P values between both groups. Through Structural Topic Modeling analysis, the final model containing 10 topics was selected.
High correlations were noted among topics 2, 5, and 8 (pressure ulcer and pharmacotherapy treatment), topics 1, 4, 7, and 9 (incidences related to fever and well-being state, and baseline oxygen saturation) and topics 3 and 10 (blood glucose level and pain). Conclusions: The ENCN may help in the development and implementation of more effective programs, which may allow patients with COVID-19 to adapt to their prepandemic lifestyle faster. Topic modeling could help identify specific clinical problems in patients and better target the care they receive. UR - https://medinform.jmir.org/2022/5/e38308 UR - http://dx.doi.org/10.2196/38308 UR - http://www.ncbi.nlm.nih.gov/pubmed/354869 ID - info:doi/10.2196/38308 ER - TY - JOUR AU - Chen, Uan-I AU - Xu, Hua AU - Krause, Millard Trudy AU - Greenberg, Raymond AU - Dong, Xiao AU - Jiang, Xiaoqian PY - 2022/5/12 TI - Factors Associated With COVID-19 Death in the United States: Cohort Study JO - JMIR Public Health Surveill SP - e29343 VL - 8 IS - 5 KW - COVID-19 KW - risk factors KW - survival analysis KW - cohort studies KW - EHR data N2 - Background: Since the initial COVID-19 cases were identified in the United States in February 2020, the United States has experienced a high incidence of the disease. Understanding the risk factors for severe outcomes identifies the most vulnerable populations and helps in decision-making. Objective: This study aims to assess the factors associated with COVID-19–related deaths from a large, national, individual-level data set. Methods: A cohort study was conducted using data from the Optum de-identified COVID-19 electronic health record (EHR) data set; 1,271,033 adult participants were observed from February 1, 2020, to August 31, 2020, until their deaths due to COVID-19, deaths due to other reasons, or the end of the study. Cox proportional hazards models were constructed to evaluate the risks for each patient characteristic.
Results: A total of 1,271,033 participants (age: mean 52.6, SD 17.9 years; male: 507,574/1,271,033, 39.93%) were included in the study, and 3315 (0.26%) deaths were attributed to COVID-19. Factors associated with COVID-19–related death included older age (≥80 vs 50-59 years old: hazard ratio [HR] 13.28, 95% CI 11.46-15.39), male sex (HR 1.68, 95% CI 1.57-1.80), obesity (BMI ≥40 vs <30 kg/m2: HR 1.71, 95% CI 1.50-1.96), race (Hispanic White, African American, Asian vs non-Hispanic White: HR 2.46, 95% CI 2.01-3.02; HR 2.27, 95% CI 2.06-2.50; HR 2.06, 95% CI 1.65-2.57), region (South, Northeast, Midwest vs West: HR 1.62, 95% CI 1.33-1.98; HR 2.50, 95% CI 2.06-3.03; HR 1.35, 95% CI 1.11-1.64), chronic respiratory disease (HR 1.21, 95% CI 1.12-1.32), cardiac disease (HR 1.10, 95% CI 1.01-1.19), diabetes (HR 1.92, 95% CI 1.75-2.10), recent diagnosis of lung cancer (HR 1.70, 95% CI 1.14-2.55), severely reduced kidney function (HR 1.92, 95% CI 1.69-2.19), stroke or dementia (HR 1.25, 95% CI 1.15-1.36), other neurological diseases (HR 1.77, 95% CI 1.59-1.98), organ transplant (HR 1.35, 95% CI 1.09-1.67), and other immunosuppressive conditions (HR 1.21, 95% CI 1.01-1.46). Conclusions: This is one of the largest national cohort studies in the United States; we identified several patient characteristics associated with COVID-19–related deaths, and the results can serve as the basis for policy making. The study also offered directions for future studies, including the effect of other socioeconomic factors on the increased risk for minority groups. UR - https://publichealth.jmir.org/2022/5/e29343 UR - http://dx.doi.org/10.2196/29343 UR - http://www.ncbi.nlm.nih.gov/pubmed/35377319 ID - info:doi/10.2196/29343 ER - TY - JOUR AU - Attipoe, Selasi AU - Hoffman, Jeffrey AU - Rust, Steve AU - Huang, Yungui AU - Barnard, A. John AU - Schweikhart, Sharon AU - Hefner, L. Jennifer AU - Walker, M.
Daniel AU - Linwood, Simon PY - 2022/5/12 TI - Characterization of Electronic Health Record Use Outside Scheduled Clinic Hours Among Primary Care Pediatricians: Retrospective Descriptive Task Analysis of Electronic Health Record Access Log Data JO - JMIR Med Inform SP - e34787 VL - 10 IS - 5 KW - electronic health records KW - access log analysis KW - pediatrics KW - primary care physicians KW - work outside work KW - work outside scheduled clinic hours N2 - Background: Many of the benefits of electronic health records (EHRs) have not been achieved at expected levels because of a variety of unintended negative consequences such as documentation burden. Previous studies have characterized EHR use during and outside work hours, with many reporting that physicians spend considerable time on documentation-related tasks. These studies characterized EHR use during and outside work hours using clock time versus actual physician clinic schedules to define the outside work time. Objective: This study aimed to characterize EHR work outside scheduled clinic hours among primary care pediatricians using a retrospective descriptive task analysis of EHR access log data and actual physician clinic schedules to define work time. Methods: We conducted a retrospective, exploratory, descriptive task analysis of EHR access log data from primary care pediatricians in September 2019 at a large Midwestern pediatric health center to quantify and identify actions completed outside scheduled clinic hours. Mixed-effects statistical modeling was used to investigate the effects of age, sex, clinical full-time equivalent status, and EHR work during scheduled clinic hours on the use of EHRs outside scheduled clinic hours. 
Results: Primary care pediatricians (n=56) in this study generated 1,523,872 access log data points (across 1069 physician workdays) and spent an average of 4.4 (SD 2.0) hours and 0.8 (SD 0.8) hours per physician per workday engaged in EHRs during and outside scheduled clinic hours, respectively. Approximately three-quarters of the time working in EHR during or outside scheduled clinic hours was spent reviewing data and reports. Mixed-effects regression revealed no associations of age, sex, or clinical full-time equivalent status with EHR use during or outside scheduled clinic hours. Conclusions: For every hour primary care pediatricians spent engaged with the EHR during scheduled clinic hours, they spent approximately 10 minutes interacting with the EHR outside scheduled clinic hours. Most of their time (during and outside scheduled clinic hours) was spent reviewing data, records, and other information in EHR. UR - https://medinform.jmir.org/2022/5/e34787 UR - http://dx.doi.org/10.2196/34787 UR - http://www.ncbi.nlm.nih.gov/pubmed/35551055 ID - info:doi/10.2196/34787 ER - TY - JOUR AU - Kwon, Osung AU - Na, Wonjun AU - Kang, Heejun AU - Jun, Joon Tae AU - Kweon, Jihoon AU - Park, Gyung-Min AU - Cho, YongHyun AU - Hur, Cinyoung AU - Chae, Jungwoo AU - Kang, Do-Yoon AU - Lee, Hyung Pil AU - Ahn, Jung-Min AU - Park, Duk-Woo AU - Kang, Soo-Jin AU - Lee, Seung-Whan AU - Lee, Whan Cheol AU - Park, Seong-Wook AU - Park, Seung-Jung AU - Yang, Hyun Dong AU - Kim, Young-Hak PY - 2022/5/11 TI - Electronic Medical Record–Based Machine Learning Approach to Predict the Risk of 30-Day Adverse Cardiac Events After Invasive Coronary Treatment: Machine Learning Model Development and Validation JO - JMIR Med Inform SP - e26801 VL - 10 IS - 5 KW - big data KW - electronic medical record KW - machine learning KW - mortality KW - adverse cardiac event KW - coronary artery disease KW - prediction N2 - Background: Although there is a growing interest in prediction models based on
electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited. Objective: We aimed to develop and validate machine learning (ML) models by using diverse fields of EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. Methods: EMR data of 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (eg, demographics), dynamic time-series (eg, laboratory values), and cardiac-specific data (eg, coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. Results: GBM showed the best performance with area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and the AUPRCs of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as highly fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. 
Among the categories in the GBM model, time-series dynamic data demonstrated a high AUROC of >0.95, contributing majorly to the excellent results. Conclusions: Exploiting the diverse fields of the EMR data set, the ML-based 30-day adverse cardiac event prediction models demonstrated outstanding results, and the applied framework could be generalized for various health care prediction models. UR - https://medinform.jmir.org/2022/5/e26801 UR - http://dx.doi.org/10.2196/26801 UR - http://www.ncbi.nlm.nih.gov/pubmed/35544292 ID - info:doi/10.2196/26801 ER - TY - JOUR AU - Ackermann, Khalia AU - Baker, Jannah AU - Festa, Marino AU - McMullan, Brendan AU - Westbrook, Johanna AU - Li, Ling PY - 2022/5/6 TI - Computerized Clinical Decision Support Systems for the Early Detection of Sepsis Among Pediatric, Neonatal, and Maternal Inpatients: Scoping Review JO - JMIR Med Inform SP - e35061 VL - 10 IS - 5 KW - sepsis KW - early detection of disease KW - computerized clinical decision support KW - patient safety KW - electronic health records KW - sepsis care pathway N2 - Background: Sepsis is a severe condition associated with extensive morbidity and mortality worldwide. Pediatric, neonatal, and maternal patients represent a considerable proportion of the sepsis burden. Identifying sepsis cases as early as possible is a key pillar of sepsis management and has prompted the development of sepsis identification rules and algorithms that are embedded in computerized clinical decision support (CCDS) systems. Objective: This scoping review aimed to systematically describe studies reporting on the use and evaluation of CCDS systems for the early detection of pediatric, neonatal, and maternal inpatients at risk of sepsis. 
Methods: MEDLINE, Embase, CINAHL, Cochrane, Latin American and Caribbean Health Sciences Literature (LILACS), Scopus, Web of Science, OpenGrey, ClinicalTrials.gov, and ProQuest Dissertations and Theses Global (PQDT) were searched by using a search strategy that incorporated terms for sepsis, clinical decision support, and early detection. Title, abstract, and full-text screening was performed by 2 independent reviewers, who consulted a third reviewer as needed. One reviewer performed data charting with a sample of data. This was checked by a second reviewer and via discussions with the review team, as necessary. Results: A total of 33 studies were included in this review: 13 (39%) pediatric studies, 18 (55%) neonatal studies, and 2 (6%) maternal studies. All studies were published after 2011, and 27 (82%) were published from 2017 onward. The most common outcome investigated in pediatric studies was the accuracy of sepsis identification (9/13, 69%). Pediatric CCDS systems used different combinations of 18 diverse clinical criteria to detect sepsis across the 13 identified studies. In neonatal studies, 78% (14/18) of the studies investigated the Kaiser Permanente early-onset sepsis risk calculator. All studies investigated sepsis treatment and management outcomes, with 83% (15/18) reporting on antibiotics-related outcomes. Usability and cost-related outcomes were each reported in only 2 (6%) of the 31 pediatric or neonatal studies. Both studies on maternal populations were short abstracts. Conclusions: This review found limited research investigating CCDS systems to support the early detection of sepsis among pediatric, neonatal, and maternal patients, despite the high burden of sepsis in these vulnerable populations. We have highlighted the need for a consensus definition for pediatric and neonatal sepsis and the study of usability and cost-related outcomes as critical areas for future research.
International Registered Report Identifier (IRRID): RR2-10.2196/24899 UR - https://medinform.jmir.org/2022/5/e35061 UR - http://dx.doi.org/10.2196/35061 UR - http://www.ncbi.nlm.nih.gov/pubmed/35522467 ID - info:doi/10.2196/35061 ER - TY - JOUR AU - Fränti, Pasi AU - Sieranoja, Sami AU - Wikström, Katja AU - Laatikainen, Tiina PY - 2022/5/4 TI - Clustering Diagnoses From 58 Million Patient Visits in Finland Between 2015 and 2018 JO - JMIR Med Inform SP - e35422 VL - 10 IS - 5 KW - multimorbidity KW - cluster analysis KW - disease co-occurrence KW - multimorbidity network KW - health care data analysis KW - graph clustering KW - k-means KW - data analysis KW - cluster KW - machine learning KW - comorbidity KW - register KW - big data KW - Finland KW - Europe KW - health record N2 - Background: Multiple chronic diseases in patients are a major burden on the health service system. Currently, diseases are mostly treated separately without paying sufficient attention to their relationships, which results in the fragmentation of the care process. The better integration of services can lead to the more effective organization of the overall health care system. Objective: This study aimed to analyze the connections between diseases based on their co-occurrences to support decision-makers in better organizing health care services. Methods: We performed a cluster analysis of diagnoses by using data from the Finnish Health Care Registers for primary and specialized health care visits and inpatient care. The target population of this study comprised those 3.8 million individuals (3,835,531/5,487,308, 69.90% of the whole population) aged ≥18 years who used health care services from the years 2015 to 2018. They had a total of 58 million visits. Clustering was performed based on the co-occurrence of diagnoses. The more the same pair of diagnoses appeared in the records of the same patients, the more the diagnoses correlated with each other.
On the basis of the co-occurrences, we calculated the relative risk of each pair of diagnoses and clustered the data by using a graph-based clustering algorithm called the M-algorithm, a variant of k-means. Results: The results revealed multimorbidity clusters, of which some were expected (eg, one representing hypertensive and cardiovascular diseases). Other clusters were more unexpected, such as the cluster containing lower respiratory tract diseases and systemic connective tissue disorders. The annual cost of all clusters was €10.0 billion, and the costliest cluster was cardiovascular and metabolic problems, costing €2.3 billion. Conclusions: The method and the achieved results provide new insights into identifying key multimorbidity groups, especially those resulting in burden and costs in health care services. UR - https://medinform.jmir.org/2022/5/e35422 UR - http://dx.doi.org/10.2196/35422 UR - http://www.ncbi.nlm.nih.gov/pubmed/35507390 ID - info:doi/10.2196/35422 ER - TY - JOUR AU - Ye, Jiancheng AU - Wang, Zidan AU - Hai, Jiarui PY - 2022/4/29 TI - Social Networking Service, Patient-Generated Health Data, and Population Health Informatics: National Cross-sectional Study of Patterns and Implications of Leveraging Digital Technologies to Support Mental Health and Well-being JO - J Med Internet Res SP - e30898 VL - 24 IS - 4 KW - patient-generated health data KW - social network KW - population health informatics KW - mental health KW - social determinants of health KW - health data sharing KW - technology acceptability KW - mobile phone KW - mobile health N2 - Background: The emerging health technologies and digital services provide effective ways of collecting health information and gathering patient-generated health data (PGHD), which provide a more holistic view of a patient's health and quality of life over time, increase visibility into a patient's adherence to a treatment plan or study protocol, and enable timely intervention before a costly care
episode. Objective: Through a national cross-sectional survey in the United States, we aimed to describe and compare the characteristics of populations with and without mental health issues (depression or anxiety disorders), including physical health, sleep, and alcohol use. We also examined the patterns of social networking service use, PGHD, and attitudes toward health information sharing and activities among the participants, which provided nationally representative estimates. Methods: We drew data from the 2019 Health Information National Trends Survey of the National Cancer Institute. The participants were divided into 2 groups according to mental health status. Then, we described and compared the characteristics of the social determinants of health, health status, sleeping and drinking behaviors, and patterns of social networking service use and health information data sharing between the 2 groups. Multivariable logistic regression models were applied to assess the predictors of mental health. All the analyses were weighted to provide nationally representative estimates. Results: Participants with mental health issues were significantly more likely to be younger, White, female, and lower-income; have a history of chronic diseases; and be less capable of taking care of their own health. Regarding behavioral health, they slept <6 hours on average, had worse sleep quality, and consumed more alcohol. In addition, they were more likely to visit and share health information on social networking sites, write online diary blogs, participate in online forums or support groups, and watch health-related videos. Conclusions: This study illustrates that individuals with mental health issues have inequitable social determinants of health, poor physical health, and poor behavioral health. However, they are more likely to use social networking platforms and services, share their health information, and actively engage with PGHD. 
Leveraging these digital technologies and services could be beneficial for developing tailored and effective strategies for self-monitoring and self-management. UR - https://www.jmir.org/2022/4/e30898 UR - http://dx.doi.org/10.2196/30898 UR - http://www.ncbi.nlm.nih.gov/pubmed/35486428 ID - info:doi/10.2196/30898 ER - TY - JOUR AU - Ronaldson, Amy AU - Freestone, Mark AU - Zhang, Haoyuan AU - Marsh, William AU - Bhui, Kamaldeep PY - 2022/4/27 TI - Using Structural Equation Modelling in Routine Clinical Data on Diabetes and Depression: Observational Cohort Study JO - JMIRx Med SP - e22912 VL - 3 IS - 2 KW - depression KW - diabetes KW - electronic health records KW - acute care KW - PLS-SEM KW - path analysis KW - equation modelling KW - accident KW - emergency care KW - emergency KW - structural equation modelling KW - clinical data N2 - Background: Large data sets comprising routine clinical data are becoming increasingly available for use in health research. These data sets contain many clinical variables that might not lend themselves to use in research. Structural equation modelling (SEM) is a statistical technique that might allow for the creation of "research-friendly" clinical constructs from these routine clinical variables and therefore could be an appropriate analytic method to apply more widely to routine clinical data. Objective: SEM was applied to a large data set of routine clinical data developed in East London to model well-established clinical associations. Depression is common among patients with type 2 diabetes, and is associated with poor diabetic control, increased diabetic complications, increased health service utilization, and increased health care costs. Evidence from trial data suggests that integrating psychological treatment into diabetes care can improve health status and reduce costs. Attempting to model these known associations using SEM will test the utility of this technique in routine clinical data sets.
Methods: Data were cleaned extensively prior to analysis. SEM was used to investigate associations between depression, diabetic control, diabetic care, mental health treatment, and Accident & Emergency (A&E) use in patients with type 2 diabetes. The creation of the latent variables and the direction of association between latent variables in the model was based upon established clinical knowledge. Results: The results provided partial support for the application of SEM to routine clinical data. Overall, 19% (3106/16,353) of patients with type 2 diabetes had received a diagnosis of depression. In line with known clinical associations, depression was associated with worse diabetic control (β=.034, P<.001) and increased A&E use (β=.071, P<.001). However, contrary to expectation, worse diabetic control was associated with lower A&E use (β=−.055, P<.001) and receipt of mental health treatment did not impact upon diabetic control (P=.39). Receipt of diabetes care was associated with better diabetic control (β=−.072, P<.001), having depression (β=.018, P=.007), and receiving mental health treatment (β=.046, P<.001), which might suggest that comprehensive integrated care packages are being delivered in East London. Conclusions: Some established clinical associations were successfully modelled in a sample of patients with type 2 diabetes in a way that made clinical sense, providing partial evidence for the utility of SEM in routine clinical data. Several issues relating to data quality emerged. Data improvement would have likely enhanced the utility of SEM in this data set.
UR - https://med.jmirx.org/2022/2/e22912 UR - http://dx.doi.org/10.2196/22912 UR - http://www.ncbi.nlm.nih.gov/pubmed/37725546 ID - info:doi/10.2196/22912 ER - TY - JOUR AU - Hu, Danqing AU - Li, Shaolei AU - Zhang, Huanyao AU - Wu, Nan AU - Lu, Xudong PY - 2022/4/25 TI - Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non–Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study JO - JMIR Med Inform SP - e35475 VL - 10 IS - 4 KW - non–small cell lung cancer KW - lymph node metastasis prediction KW - natural language processing KW - electronic medical records KW - lung cancer KW - prediction models KW - decision making KW - machine learning KW - algorithm KW - forest modeling N2 - Background: Lymph node metastasis (LNM) is critical for treatment decision making of patients with resectable non–small cell lung cancer, but it is difficult to precisely diagnose preoperatively. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use. Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms. Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (the maximum short axis diameter of lymph node >10 mm was regarded as a metastatic node) and clinician's evaluation.
Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction. Results: Experimental results show that the random forest models achieved the best performances, with a 0.792 area under the receiver operating characteristic curve (AUC) value and 0.456 average precision (AP) value for pN2 LNM prediction and a 0.768 AUC value and 0.524 AP value for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features is 0.950 and improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features. Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data, such as CT reports and tumor markers, in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
UR - https://medinform.jmir.org/2022/4/e35475 UR - http://dx.doi.org/10.2196/35475 UR - http://www.ncbi.nlm.nih.gov/pubmed/35468085 ID - info:doi/10.2196/35475 ER - TY - JOUR AU - Mudaranthakam, Pal Dinesh AU - Gajewski, Byron AU - Krebill, Hope AU - Coulter, James AU - Springer, Michelle AU - Calhoun, Elizabeth AU - Hughes, Dorothy AU - Mayo, Matthew AU - Doolittle, Gary PY - 2022/4/21 TI - Barriers to Clinical Trial Participation: Comparative Study Between Rural and Urban Participants JO - JMIR Cancer SP - e33240 VL - 8 IS - 2 KW - rural residents KW - clinical trials KW - screening KW - cancer KW - patients KW - lung cancer KW - health policy epidemiology KW - cancer patients KW - electronic screening logs KW - electronic screening N2 - Background: The National Clinical Trials Network program conducts phase 2 or phase 3 treatment trials across all National Cancer Institute's designated cancer centers. Participant accrual across these clinical trials is a critical factor in deciding their success. Cancer centers that cater to rural populations, such as The University of Kansas Cancer Center, have an additional responsibility to ensure rural residents have access and are well represented across these studies. Objective: There are scant data available regarding the factors that act as barriers to the accrual of rural residents in these clinical trials. This study aims to use electronic screening logs that were used to gather patient data at several participating sites in The University of Kansas Cancer Center's catchment area. Methods: Screening log data were used to assess what clinical trial participation barriers are faced by these patients. Additionally, the differences in clinical trial participation barriers were compared between rural and urban participating sites.
Results: Analysis revealed that the hospital location category (rural vs urban) had a medium effect on enrollment of patients in breast cancer and lung cancer trials (Cohen d=0.7). Additionally, the hospital location category had a medium effect on the proportion of recurrent lung cancer cases at the time of screening (d=0.6). Conclusions: In consideration of the financially hostile nature of cancer treatment as well as geographical and transportation barriers, clinical trials extended to rural communities are uniquely positioned to alleviate the burden of nonmedical costs in trial participation. However, these options can be far less feasible for patients in rural settings. Since the number of patients with cancer who are eligible for a clinical trial is already limited by the stringent eligibility criteria required of such a complex disease, improving accessibility for rural patients should be a greater focus in health policy. UR - https://cancer.jmir.org/2022/2/e33240 UR - http://dx.doi.org/10.2196/33240 UR - http://www.ncbi.nlm.nih.gov/pubmed/35451964 ID - info:doi/10.2196/33240 ER - TY - JOUR AU - Sharifi-Heris, Zahra AU - Laitala, Juho AU - Airola, Antti AU - Rahmani, M. Amir AU - Bender, Miriam PY - 2022/4/20 TI - Machine Learning Approach for Preterm Birth Prediction Using Health Records: Systematic Review JO - JMIR Med Inform SP - e33875 VL - 10 IS - 4 KW - preterm birth KW - prediction model KW - machine learning approach KW - artificial intelligence N2 - Background: Preterm birth (PTB), a common pregnancy complication, is responsible for 35% of the 3.1 million pregnancy-related deaths each year and significantly affects around 15 million children annually worldwide. Conventional approaches to predict PTB lack reliable predictive power, leaving >50% of cases undetected.
Recently, machine learning (ML) models have shown potential as an appropriate complementary approach for PTB prediction using health records (HRs). Objective: This study aimed to systematically review the literature concerned with PTB prediction using HR data and the ML approach. Methods: This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. A comprehensive search was performed in 7 bibliographic databases until May 15, 2021. The quality of the studies was assessed, and descriptive information, including descriptive characteristics of the data, ML modeling processes, and model performance, was extracted and reported. Results: A total of 732 papers were screened through title and abstract. Of these 732 studies, 23 (3.1%) were screened by full text, resulting in 13 (1.8%) papers that met the inclusion criteria. The sample size varied from a minimum value of 274 to a maximum of 1,400,000. The time length for which data were extracted varied from 1 to 11 years, and the oldest and newest data dated from 1988 and 2018, respectively. Population, data set, and ML models' characteristics were assessed, and the performance of the model was often reported based on metrics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve. Conclusions: Various ML models used for different HR data indicated potential for PTB prediction. However, evaluation metrics, software and packages used, data size and type, selected features, and, importantly, the data management method often remain unjustified, threatening the reliability, performance, and internal or external validity of the model. To clarify the usefulness of ML in covering the existing gap, future studies should also compare it with a conventional method on the same data set.
UR - https://medinform.jmir.org/2022/4/e33875 UR - http://dx.doi.org/10.2196/33875 UR - http://www.ncbi.nlm.nih.gov/pubmed/35442214 ID - info:doi/10.2196/33875 ER - TY - JOUR AU - Fitzer, Kai AU - Haeuslschmid, Renate AU - Blasini, Romina AU - Altun, Betül Fatma AU - Hampf, Christopher AU - Freiesleben, Sherry AU - Macho, Philipp AU - Prokosch, Hans-Ulrich AU - Gulden, Christian PY - 2022/4/20 TI - Patient Recruitment System for Clinical Trials: Mixed Methods Study About Requirements at Ten University Hospitals JO - JMIR Med Inform SP - e28696 VL - 10 IS - 4 KW - patient recruitment system KW - clinical trial recruitment support system KW - recruitment KW - patient screening KW - requirements KW - user needs KW - clinical trial KW - interview KW - survey KW - electronic support KW - clinical information systems KW - eHealth N2 - Background: Clinical trials are the gold standard for advancing medical knowledge and improving patient outcomes. For their success, an appropriately sized cohort is required. However, patient recruitment remains one of the most challenging aspects of clinical trials. Information technology (IT) support systems (for instance, patient recruitment systems) may help overcome existing challenges and improve recruitment rates when customized to the user needs and environment. Objective: The goal of our study is to describe the status quo of patient recruitment processes and to identify user requirements for the development of a patient recruitment system. Methods: We conducted a web-based survey with 56 participants as well as semistructured interviews with 33 participants from 10 German university hospitals. Results: We here report the recruitment procedures and challenges of 10 university hospitals. The recruitment process was influenced by diverse factors such as the ward, use of software, and the study inclusion criteria.
Overall, clinical staff seemed more involved in patient identification, while the research staff focused on screening tasks. Ad hoc and planned screenings were common. Identifying eligible patients was still associated with significant manual effort. The recruitment staff used the Microsoft Office suite because tailored software was not available. To implement such software, data from disparate sources will need to be made available. We discussed concrete technical challenges concerning patient recruitment systems, including requirements for features, data, infrastructure, and workflow integration, thereby contributing to the development of a successful system. Conclusions: Identifying eligible patients is still associated with significant manual effort. To fully make use of the high potential of IT in patient recruitment, many technical and process challenges have to be solved first. We contribute and discuss concrete technical challenges for patient recruitment systems, including requirements for features, data, infrastructure, and workflow integration. UR - https://medinform.jmir.org/2022/4/e28696 UR - http://dx.doi.org/10.2196/28696 UR - http://www.ncbi.nlm.nih.gov/pubmed/35442203 ID - info:doi/10.2196/28696 ER - TY - JOUR AU - Fong, Allan AU - Iscoe, Mark AU - Sinsky, A. Christine AU - Haimovich, D. Adrian AU - Williams, Brian AU - O'Connell, T. Ryan AU - Goldstein, Richard AU - Melnick, Edward PY - 2022/4/15 TI - Cluster Analysis of Primary Care Physician Phenotypes for Electronic Health Record Use: Retrospective Cohort Study JO - JMIR Med Inform SP - e34954 VL - 10 IS - 4 KW - electronic health record KW - phenotypes KW - cluster analysis KW - unsupervised machine learning KW - machine learning KW - EHR KW - primary care N2 - Background: Electronic health records (EHRs) have become ubiquitous in US office-based physician practices. However, the different ways in which users engage with EHRs remain poorly characterized.
Objective: The aim of this study is to explore EHR use phenotypes among ambulatory care physicians. Methods: In this retrospective cohort analysis, we applied affinity propagation, an unsupervised clustering machine learning technique, to identify EHR user types among primary care physicians. Results: We identified 4 distinct phenotype clusters generalized across internal medicine, family medicine, and pediatrics specialties. Total EHR use varied for physicians in 2 clusters with above-average ratios of work outside of scheduled hours. This finding suggested that one cluster of physicians may have worked outside of scheduled hours out of necessity, whereas the other preferred ad hoc work hours. The two remaining clusters represented physicians with below-average EHR time and physicians who spend the largest proportion of their EHR time on documentation. Conclusions: These findings demonstrate the utility of cluster analysis for exploring EHR use phenotypes and may offer opportunities for interventions to improve interface design to better support users' needs.
UR - https://medinform.jmir.org/2022/4/e34954 UR - http://dx.doi.org/10.2196/34954 UR - http://www.ncbi.nlm.nih.gov/pubmed/35275070 ID - info:doi/10.2196/34954 ER - TY - JOUR AU - Wang, Miye AU - Li, Sheyu AU - Zheng, Tao AU - Li, Nan AU - Shi, Qingke AU - Zhuo, Xuejun AU - Ding, Renxin AU - Huang, Yong PY - 2022/4/13 TI - Big Data Health Care Platform With Multisource Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application JO - JMIR Med Inform SP - e36481 VL - 10 IS - 4 KW - big data platform in health care KW - multisource KW - heterogeneous KW - data integration KW - data governance KW - data application KW - data security KW - data quality control KW - big data KW - data science KW - medical informatics KW - health care N2 - Background: With the advent of data-intensive science, a full integration of big data science and health care will bring a cross-field revolution to the medical community in China. The concept of big data represents not only a technology but also a resource and a method. Big data are regarded as an important strategic resource both at the national level and at the medical institutional level; thus, great importance has been attached to the construction of a big data platform for health care. Objective: We aimed to develop and implement a big data platform for a large hospital, to overcome difficulties in integrating, calculating, storing, and governing multisource heterogeneous data in a standardized way, as well as to ensure health care data security. Methods: The project to build a big data platform at West China Hospital of Sichuan University was launched in 2017. The West China Hospital of Sichuan University big data platform has extracted, integrated, and governed data from different departments and sections of the hospital since January 2008.
A master-slave mode was implemented to realize the real-time integration of multisource heterogeneous massive data, and an environment that separates heterogeneous characteristic data storage and calculation processes was built. A business-based metadata model was improved for data quality control, and a standardized health care data governance system and scientific closed-loop data security ecology were established. Results: After 3 years of design, development, and testing, the West China Hospital of Sichuan University big data platform was formally brought online in November 2020. It has formed a massive multidimensional data resource database, with more than 12.49 million patients, 75.67 million visits, and 8475 data variables. Along with hospital operations data, newly generated data are entered into the platform in real time. Since its launch, the platform has supported more than 20 major projects and provided data service, storage, and computing power support to many scientific teams, facilitating a shift in the data support model, from conventional manual extraction to self-service retrieval (which has reached 8561 retrievals per month). Conclusions: The platform can combine operation systems data from all departments and sections in a hospital to form a massive high-dimensional high-quality health care database that allows electronic medical records to be used effectively and taps into the value of data to fully support clinical services, scientific research, and operations management. The West China Hospital of Sichuan University big data platform can successfully generate multisource heterogeneous data storage and computing power. By effectively governing massive multidimensional data gathered from multiple sources, the West China Hospital of Sichuan University big data platform provides highly available data assets and thus has a high application value in the health care field.
The West China Hospital of Sichuan University big data platform facilitates simpler and more efficient utilization of electronic medical record data for real-world research. UR - https://medinform.jmir.org/2022/4/e36481 UR - http://dx.doi.org/10.2196/36481 UR - http://www.ncbi.nlm.nih.gov/pubmed/35416792 ID - info:doi/10.2196/36481 ER - TY - JOUR AU - El Emam, Khaled AU - Mosquera, Lucy AU - Fang, Xi AU - El-Hussuna, Alaa PY - 2022/4/7 TI - Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study JO - JMIR Med Inform SP - e35734 VL - 10 IS - 4 KW - synthetic data KW - data utility KW - data privacy KW - generative models KW - utility metric KW - synthetic data generation KW - logistic regression KW - model validation KW - medical informatics KW - binary prediction model KW - prediction model N2 - Background: A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. Objective: This study evaluates the ability of common utility metrics to rank SDG methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research. Methods: We evaluated 6 utility metrics on 30 different health data sets and 3 different SDG methods (a Bayesian network, a Generative Adversarial Network, and sequential tree synthesis). These metrics were computed by averaging across 20 synthetic data sets from the same generative model. The metrics were then tested on their ability to rank the SDG methods based on prediction performance. 
Prediction performance was defined as the difference in area under the receiver operating characteristic curve and area under the precision-recall curve values between logistic regression prediction models built on synthetic data and those built on real data. Results: The utility metric best able to rank SDG methods was the multivariate Hellinger distance based on a Gaussian copula representation of real and synthetic joint distributions. Conclusions: This study has validated a generative model utility metric, the multivariate Hellinger distance, which can be used to reliably rank competing SDG methods on the same data set. The Hellinger distance metric can be used to evaluate and compare alternate SDG methods. UR - https://medinform.jmir.org/2022/4/e35734 UR - http://dx.doi.org/10.2196/35734 UR - http://www.ncbi.nlm.nih.gov/pubmed/35389366 ID - info:doi/10.2196/35734 ER - TY - JOUR AU - Nicolet, Anna AU - Assouline, Dan AU - Le Pogam, Marie-Annick AU - Perraudin, Clémence AU - Bagnoud, Christophe AU - Wagner, Joël AU - Marti, Joachim AU - Peytremann-Bridevaux, Isabelle PY - 2022/4/4 TI - Exploring Patient Multimorbidity and Complexity Using Health Insurance Claims Data: A Cluster Analysis Approach JO - JMIR Med Inform SP - e34274 VL - 10 IS - 4 KW - multimorbidity KW - pharmacy cost groups KW - cluster analysis KW - claims data KW - patient complexity KW - health claims KW - informatics N2 - Background: Although the trend of progressing morbidity is widely recognized, there are numerous challenges when studying multimorbidity and patient complexity. For multimorbid or complex patients, prone to fragmented care and high health care use, novel estimation approaches need to be developed. Objective: This study aims to investigate the patient multimorbidity and complexity of Swiss residents aged ≥50 years using clustering methodology in claims data.
Methods: We adopted a clustering methodology based on random forests and used 34 pharmacy-based cost groups as the only input feature for the procedure. To detect clusters, we applied hierarchical density-based spatial clustering of applications with noise. Hyperparameters were chosen based on various metrics embedded in the algorithms (out-of-bag misclassification error, normalized stress, and cluster persistence) and on the clinical relevance of the obtained clusters. Results: Based on cluster analysis output for 18,732 individuals, we identified an outlier group and 7 clusters: individuals without diseases, patients with only hypertension-related diseases, patients with only mental diseases, complex high-cost high-need patients, slightly complex patients with inexpensive low-severity pharmacy-based cost groups, patients with 1 costly disease, and older high-risk patients. Conclusions: Our study demonstrated that cluster analysis based on pharmacy-based cost group information from claims-based data is feasible and highlights clinically relevant clusters. Such an approach allows expanding the understanding of multimorbidity beyond simple disease counts and can identify the population profiles with increased health care use and costs. This study may foster the development of integrated and coordinated care, which is high on the agenda in policy making, care planning, and delivery.
UR - https://medinform.jmir.org/2022/4/e34274 UR - http://dx.doi.org/10.2196/34274 UR - http://www.ncbi.nlm.nih.gov/pubmed/35377334 ID - info:doi/10.2196/34274 ER - TY - JOUR AU - Shen, Nelson AU - Kassam, Iman AU - Zhao, Haoyu AU - Chen, Sheng AU - Wang, Wei AU - Wickham, Sarah AU - Strudwick, Gillian AU - Carter-Langford, Abigail PY - 2022/3/31 TI - Foundations for Meaningful Consent in Canada's Digital Health Ecosystem: Retrospective Study JO - JMIR Med Inform SP - e30986 VL - 10 IS - 3 KW - consent KW - eConsent KW - privacy KW - trust KW - digital health KW - health information exchange KW - patient perspective KW - health informatics KW - Canada N2 - Background: Canadians are increasingly gaining web-based access to digital health services, and they expect to access their data from these services through a central patient access channel. Implementing data sharing between these services will require patient trust that is fostered through meaningful consent and consent management. Understanding user consent requirements and information needs is necessary for developing a trustworthy and transparent consent management system. Objective: The objective of this study is to explore consent management preferences and information needs to support meaningful consent. Methods: A secondary analysis of a national survey was conducted using a retrospective descriptive study design. The 2019 cross-sectional survey used a series of vignettes and consent scenarios to explore Canadians' privacy perspectives and preferences regarding consent management. Nonparametric tests and logistic regression analyses were conducted to identify the differences and associations between various factors. Results: Of the 1017 total responses, 716 (70.4%) participants self-identified as potential users.
Of the potential users, almost all (672/716, 93.8%) felt that the ability to control their data was important, whereas some (385/716, 53.8%) believed that an all or none control at the data source level was adequate. Most potential users preferred new data sources to be accessible by health care providers (546/716, 76.3%) and delegated parties (389/716, 54.3%) by default. Prior digital health use was associated with greater odds of granting default access when compared with no prior use, with the greatest odds of granting default access to digital health service providers (odds ratio 2.17, 95% CI 1.36-3.46). From a list of 9 information elements found in consent forms, potential users selected an average of 5.64 (SD 2.68) and 5.54 (SD 2.85) items to feel informed in consenting to data access by care partners and commercial digital health service providers, respectively. There was no significant difference in the number of items selected between the 2 scenarios (P>.05); however, there were significant differences (P<.05) in information types that were selected between the scenarios. Conclusions: A majority of survey participants reported that they would register and use a patient access channel and believed that the ability to control data access was important, especially as it pertains to access by those outside their care. These findings suggest that a broad all or none approach based on data source may be accepted; however, approximately one-fifth of potential users were unable to decide. Although vignettes were used to introduce the questions, this study showed that more context is required for potential users to make informed consent decisions. Understanding their information needs will be critical, as these needs vary with the use case, highlighting the importance of prioritizing and tailoring information to enable meaningful consent. 
UR - https://medinform.jmir.org/2022/3/e30986 UR - http://dx.doi.org/10.2196/30986 UR - http://www.ncbi.nlm.nih.gov/pubmed/35357318 ID - info:doi/10.2196/30986 ER - TY - JOUR AU - Jung, Christian AU - Mamandipoor, Behrooz AU - Fjølner, Jesper AU - Bruno, Romano Raphael AU - Wernly, Bernhard AU - Artigas, Antonio AU - Bollen Pinto, Bernardo AU - Schefold, C. Joerg AU - Wolff, Georg AU - Kelm, Malte AU - Beil, Michael AU - Sviri, Sigal AU - van Heerden, V. Peter AU - Szczeklik, Wojciech AU - Czuczwar, Miroslaw AU - Elhadi, Muhammed AU - Joannidis, Michael AU - Oeyen, Sandra AU - Zafeiridis, Tilemachos AU - Marsh, Brian AU - Andersen, H. Finn AU - Moreno, Rui AU - Cecconi, Maurizio AU - Leaver, Susannah AU - De Lange, W. Dylan AU - Guidet, Bertrand AU - Flaatten, Hans AU - Osmani, Venet PY - 2022/3/31 TI - Disease-Course Adapting Machine Learning Prognostication Models in Elderly Patients Critically Ill With COVID-19: Multicenter Cohort Study With External Validation JO - JMIR Med Inform SP - e32949 VL - 10 IS - 3 KW - machine-based learning KW - outcome prediction KW - COVID-19 KW - pandemic KW - machine learning KW - prediction models KW - clinical informatics KW - patient data KW - elderly population N2 - Background: The COVID-19 pandemic caused by SARS-CoV-2 is challenging health care systems globally. The disease disproportionately affects the elderly population, both in terms of disease severity and mortality risk. Objective: The aim of this study was to evaluate machine learning-based prognostication models for critically ill elderly COVID-19 patients, which dynamically incorporated multifaceted clinical information on evolution of the disease. Methods: This multicenter cohort study (COVIP study) obtained patient data from 151 intensive care units (ICUs) from 26 countries.
Different models based on the Sequential Organ Failure Assessment (SOFA) score, logistic regression (LR), random forest (RF), and extreme gradient boosting (XGB) were derived as baseline models that included admission variables only. We subsequently included clinical events and time-to-event as additional variables to derive the final models using the same algorithms and compared their performance with that of the baseline group. Furthermore, we derived baseline and final models on a European patient cohort, which were externally validated on a non-European cohort that included Asian, African, and US patients. Results: In total, 1432 elderly (≥70 years old) COVID-19-positive patients admitted to an ICU were included for analysis. Of these, 809 (56.49%) patients survived up to 30 days after admission. The average length of stay was 21.6 (SD 18.2) days. Final models that incorporated clinical events and time-to-event information provided superior performance (area under the receiver operating characteristic curve of 0.81; 95% CI 0.804-0.811) compared with both the baseline models that used admission variables only and conventional ICU prediction models (SOFA score, P<.001). The average precision increased from 0.65 (95% CI 0.650-0.655) to 0.77 (95% CI 0.759-0.770). Conclusions: Integrating important clinical events and time-to-event information led to a superior accuracy of 30-day mortality prediction compared with models based on the admission information and conventional ICU prediction models. This study shows that machine-learning models provide additional information and may support complex decision-making in critically ill elderly COVID-19 patients.
Trial Registration: ClinicalTrials.gov NCT04321265; https://clinicaltrials.gov/ct2/show/NCT04321265 UR - https://medinform.jmir.org/2022/3/e32949 UR - http://dx.doi.org/10.2196/32949 UR - http://www.ncbi.nlm.nih.gov/pubmed/35099394 ID - info:doi/10.2196/32949 ER - TY - JOUR AU - Cooper, Drew AU - Ubben, Tebbe AU - Knoll, Christine AU - Ballhausen, Hanne AU - O'Donnell, Shane AU - Braune, Katarina AU - Lewis, Dana PY - 2022/3/31 TI - Open-source Web Portal for Managing Self-reported Data and Real-world Data Donation in Diabetes Research: Platform Feasibility Study JO - JMIR Diabetes SP - e33213 VL - 7 IS - 1 KW - diabetes KW - type 1 diabetes KW - automated insulin delivery KW - diabetes technology KW - open-source KW - patient-reported outcomes KW - real-world data KW - research methods KW - mixed methods KW - insulin KW - digital health KW - web portal N2 - Background: People with diabetes and their support networks have developed open-source automated insulin delivery systems to help manage their diabetes therapy, as well as to improve their quality of life and glycemic outcomes. Under the hashtag #WeAreNotWaiting, a wealth of knowledge and real-world data have been generated by users of these systems but have been left largely untapped by research; opportunities for such multimodal studies remain open. Objective: We aimed to evaluate the feasibility of several aspects of open-source automated insulin delivery systems including challenges related to data management and security across multiple disparate web-based platforms and challenges related to implementing follow-up studies. Methods: We developed a mixed methods study to collect questionnaire responses and anonymized diabetes data donated by participants (adults and children with diabetes and their partners or caregivers) recruited through multiple diabetes online communities.
We managed both front-end participant interactions and back-end data management with our web portal (called the Gateway). Participant questionnaire data from electronic data capture (REDCap) and personal device data aggregation (Open Humans) platforms were pseudonymously and securely linked and stored within a custom-built database that used both open-source and commercial software. Participants were later given the option to include their health care providers in the study to validate their questionnaire responses; the database architecture was designed specifically with this kind of extensibility in mind. Results: Of 1052 visitors to the study landing page, 930 participated and completed at least one questionnaire. After the implementation of health care professional validation of self-reported clinical outcomes to the study, an additional 164 individuals visited the landing page, with 142 completing at least one questionnaire. Of the optional study elements, 7 participant-health care professional dyads participated in the survey, and 97 participants who completed the survey donated their anonymized medical device data. Conclusions: The platform was accessible to participants while maintaining compliance with data regulations. The Gateway formalized a system of automated data matching between multiple data sets, which was a major benefit to researchers. Scalability of the platform was demonstrated with the later addition of self-reported data validation. This study demonstrated the feasibility of custom software solutions in addressing complex study designs. The Gateway portal code has been made available open-source and can be leveraged by other research groups.
UR - https://diabetes.jmir.org/2022/1/e33213 UR - http://dx.doi.org/10.2196/33213 UR - http://www.ncbi.nlm.nih.gov/pubmed/35357312 ID - info:doi/10.2196/33213 ER - TY - JOUR AU - Lerner, Ivan AU - Serret-Larmande, Arnaud AU - Rance, Bastien AU - Garcelon, Nicolas AU - Burgun, Anita AU - Chouchana, Laurent AU - Neuraz, Antoine PY - 2022/3/30 TI - Mining Electronic Health Records for Drugs Associated With 28-day Mortality in COVID-19: Pharmacopoeia-wide Association Study (PharmWAS) JO - JMIR Med Inform SP - e35190 VL - 10 IS - 3 KW - COVID-19 KW - drug repurposing KW - wide association studies KW - clinical data KW - pharmacopeia KW - electronic medical records KW - health data KW - mortality rate KW - hospitalization KW - patient data N2 - Background: Patients hospitalized for a given condition may be receiving other treatments for other contemporary conditions or comorbidities. The use of such observational clinical data for pharmacological hypothesis generation is appealing in the context of an emerging disease but particularly challenging due to the presence of drug indication bias. Objective: With this study, our main objective was the development and validation of a fully data-driven pipeline that would address this challenge. Our secondary objective was to generate pharmacological hypotheses in patients with COVID-19 and demonstrate the clinical relevance of the pipeline. Methods: We developed a pharmacopeia-wide association study (PharmWAS) pipeline inspired by the PheWAS methodology, which systematically screens for associations between the whole pharmacopeia and a clinical phenotype. First, a fully data-driven procedure based on adaptive least absolute shrinkage and selection operator (LASSO) determined drug-specific adjustment sets. Second, we computed several measures of association, including robust methods based on propensity scores (PSs) to control indication bias.
Finally, we applied the Benjamini-Hochberg false discovery rate (FDR) procedure. We applied this method in a multicenter retrospective cohort study using electronic medical records from 16 university hospitals of the Greater Paris area. We included all adult patients between 18 and 95 years old hospitalized in conventional wards for COVID-19 between February 1, 2020, and June 15, 2021. We investigated the association between drug prescription within 48 hours from admission and 28-day mortality. We validated our data-driven pipeline against a knowledge-based pipeline on 3 treatments of reference, for which experts agreed on the expected association with mortality. We then demonstrated its clinical relevance by screening all drugs prescribed in more than 100 patients to generate pharmacological hypotheses. Results: A total of 5783 patients were included in the analysis. The median age at admission was 69.2 (IQR 56.7-81.1) years, and 3390 (58.62%) of the patients were male. The performance of our automated pipeline in controlling bias was comparable to or better than that of the knowledge-based adjustment set for the 3 reference drugs: dexamethasone, phloroglucinol, and paracetamol. After correction for multiple testing, 4 drugs were associated with increased in-hospital mortality. Among these, diazepam and tramadol were the only ones not discarded by automated diagnostics, with adjusted odds ratios of 2.51 (95% CI 1.52-4.16, Q=.01) and 1.94 (95% CI 1.32-2.85, Q=.02), respectively. Conclusions: Our innovative approach proved useful in generating pharmacological hypotheses in an outbreak setting, without requiring a priori knowledge of the disease. Our systematic analysis of early prescribed treatments from patients hospitalized for COVID-19 showed that diazepam and tramadol are associated with increased 28-day mortality. Whether these drugs could worsen COVID-19 needs to be further assessed.
UR - https://medinform.jmir.org/2022/3/e35190 UR - http://dx.doi.org/10.2196/35190 UR - http://www.ncbi.nlm.nih.gov/pubmed/35275837 ID - info:doi/10.2196/35190 ER - TY - JOUR AU - de Lusignan, Simon AU - Tsang, M. Ruby S. AU - Akinyemi, Oluwafunmi AU - Lopez Bernal, Jamie AU - Amirthalingam, Gayatri AU - Sherlock, Julian AU - Smith, Gillian AU - Zambon, Maria AU - Howsam, Gary AU - Joy, Mark PY - 2022/3/28 TI - Adverse Events of Interest Following Influenza Vaccination in the First Season of Adjuvanted Trivalent Immunization: Retrospective Cohort Study JO - JMIR Public Health Surveill SP - e25803 VL - 8 IS - 3 KW - influenza KW - influenza vaccines KW - adverse events of interest KW - computerized medical record systems KW - sentinel surveillance N2 - Background: Vaccination is the most effective form of prevention of seasonal influenza; the United Kingdom has a national influenza vaccination program to cover targeted population groups. Influenza vaccines are known to be associated with some common minor adverse events of interest (AEIs), but it is not known if the adjuvanted trivalent influenza vaccine (aTIV), first offered in the 2018/2019 season, would be associated with more AEIs than other types of vaccines. Objective: We aim to compare the incidence of AEIs associated with different types of seasonal influenza vaccines offered in the 2018/2019 season. Methods: We carried out a retrospective cohort study using computerized medical record data from the Royal College of General Practitioners Research and Surveillance Centre sentinel network database. We extracted data on vaccine exposure and consultations for European Medicines Agency-specified AEIs for the 2018/2019 influenza season. We used a self-controlled case series design; computed relative incidence (RI) of AEIs following vaccination; and compared the incidence of AEIs associated with aTIV, the quadrivalent influenza vaccine, and the live attenuated influenza vaccine.
We also compared the incidence of AEIs for vaccinations that took place in a practice with those that took place elsewhere. Results: A total of 1,024,160 individuals received a seasonal influenza vaccine, of whom 165,723 reported a total of 283,355 compatible symptoms in the 2018/2019 season. Most AEIs occurred within 7 days following vaccination, with a seasonal effect observed. Using aTIV as the reference group, the quadrivalent influenza vaccine was associated with a higher incidence of AEIs (RI 1.46, 95% CI 1.41-1.52), whereas the live attenuated influenza vaccine was associated with a lower incidence of AEIs (RI 0.79, 95% CI 0.73-0.83). No effect of vaccination setting on the incidence of AEIs was observed. Conclusions: Routine sentinel network data offer an opportunity to make comparisons between safety profiles of different vaccines. Evidence that supports the safety of newer types of vaccines may be reassuring for patients and could help improve uptake in the future. UR - https://publichealth.jmir.org/2022/3/e25803 UR - http://dx.doi.org/10.2196/25803 UR - http://www.ncbi.nlm.nih.gov/pubmed/35343907 ID - info:doi/10.2196/25803 ER - TY - JOUR AU - Ryan, Irene AU - Herrick, Cynthia AU - Ebeling, E. Mary F. AU - Foraker, Randi PY - 2022/3/25 TI - Constructing an Adapted Cascade of Diabetes Care Using Inpatient Admissions Data: Cross-sectional Study JO - JMIR Diabetes SP - e27486 VL - 7 IS - 1 KW - diabetes mellitus KW - cascade of care KW - EHR data KW - health care monitoring KW - inpatient care N2 - Background: The diabetes mellitus cascade of care has been constructed to evaluate diabetes care at a population level by determining the percentage of individuals diagnosed and linked to care as well as their reported glycemic control. Objective: We sought to adapt the cascade of care to an inpatient-only setting using the electronic health record (EHR) data of 81,633 patients with type 2 diabetes.
Methods: In this adaptation, linkage to care was defined as prescription of diabetes medications within 3 months of discharge, and control was defined as hemoglobin A1c (HbA1c) below individual target levels, as these are the most reliably captured items in the inpatient setting. We applied the cascade model to assess differences in demographics and percent loss at each stage of the cascade; we then conducted two-sample chi-square equality of proportions tests for each demographic. Based on findings in the previous literature, we hypothesized that women, Black patients, younger patients (<45 years old), uninsured patients, and patients living in an economically deprived area called the Promise Zone would be disproportionately unlinked and uncontrolled. We also predicted that patients who received inpatient glycemic care would be more likely to reach glycemic control. Results: We found that out of 81,633 patients, 28,716 (35.2%) were linked to care via medication prescription. Women and younger patients were slightly less likely to be linked to care than their male and older counterparts, while Black patients (n=19,141, 23.4% of diagnosed population vs n=6741, 23.5% of the linked population) were as proportionately part of the linked population as White patients (n=58,291, 71.4% of diagnosed population vs n=20,402, 71.0% of the linked population). Those living in underserved communities (ie, the Promise Zone) and uninsured patients were slightly overrepresented (n=6789, 8.3% of diagnosed population vs n=2773, 9.7% of the linked population) in the linked population as compared to patients living in wealthier zip codes and those who were insured. Similar patterns were observed among those more likely to reach glycemic control via HbA1c. However, conclusions are limited by the relatively large amount of missing glycemic data. Conclusions: We conclude that inpatient EHR data do not adequately capture the care cascade as defined in the outpatient setting. 
In particular, missing data in this setting may preclude assessment of glycemic control. Future work should integrate inpatient and outpatient data sources to complete the picture of diabetes care. UR - https://diabetes.jmir.org/2022/1/e27486 UR - http://dx.doi.org/10.2196/27486 UR - http://www.ncbi.nlm.nih.gov/pubmed/35333182 ID - info:doi/10.2196/27486 ER - TY - JOUR AU - Liu, Nan AU - Xie, Feng AU - Siddiqui, Javaid Fahad AU - Ho, Wah Andrew Fu AU - Chakraborty, Bibhas AU - Nadarajan, Devi Gayathri AU - Tan, Kiat Kenneth Boon AU - Ong, Hock Marcus Eng PY - 2022/3/25 TI - Leveraging Large-Scale Electronic Health Records and Interpretable Machine Learning for Clinical Decision Making at the Emergency Department: Protocol for System Development and Validation JO - JMIR Res Protoc SP - e34201 VL - 11 IS - 3 KW - electronic health records KW - machine learning KW - clinical decision making KW - emergency department N2 - Background: There is a growing demand globally for emergency department (ED) services. An increase in ED visits has resulted in overcrowding and longer waiting times. The triage process plays a crucial role in assessing and stratifying patients' risks and ensuring that the critically ill promptly receive appropriate priority and emergency treatment. A substantial amount of research has been conducted on the use of machine learning tools to construct triage and risk prediction models; however, the black box nature of these models has limited their clinical application and interpretation. Objective: In this study, we plan to develop an innovative, dynamic, and interpretable System for Emergency Risk Triage (SERT) for risk stratification in the ED by leveraging large-scale electronic health records (EHRs) and machine learning. Methods: To achieve this objective, we will conduct a retrospective, single-center study based on a large, longitudinal data set obtained from the EHRs of the largest tertiary hospital in Singapore. 
Study outcomes include adverse events experienced by patients, such as the need for intensive care unit admission and inpatient death. With preidentified candidate variables drawn from expert opinions and relevant literature, we will apply an interpretable machine learning-based AutoScore to develop 3 SERT scores. These 3 scores can be used at different times in the ED, that is, on arrival, during ED stay, and at admission. Furthermore, we will compare our novel SERT scores with established clinical scores and previously described black box machine learning models as baselines. Receiver operating characteristic analysis will be conducted on the testing cohorts for performance evaluation. Results: The study is currently being conducted. The extracted data indicate approximately 1.8 million ED visits by over 810,000 unique patients. Modelling results are expected to be published in 2022. Conclusions: The SERT scoring system proposed in this study will be unique and innovative because of its dynamic nature and modelling transparency. If successfully validated, our proposed solution will establish a standard for data processing and modelling by taking advantage of large-scale EHRs and interpretable machine learning tools. International Registered Report Identifier (IRRID): DERR1-10.2196/34201 UR - https://www.researchprotocols.org/2022/3/e34201 UR - http://dx.doi.org/10.2196/34201 UR - http://www.ncbi.nlm.nih.gov/pubmed/35333179 ID - info:doi/10.2196/34201 ER - TY - JOUR AU - Bukten, Anne AU - Lokdam, Toresen Nicoline AU - Skjærvø, Ingeborg AU - Ugelvik, Thomas AU - Skurtveit, Svetlana AU - Gabrhelík, Roman AU - Skardhamar, Torbjørn AU - Lund, Olea Ingunn AU - Havnes, Amalia Ingrid AU - Rognli, Borger Eline AU - Chang, Zheng AU - Fazel, Seena AU - Friestad, Christine AU - Hesse, Morten AU - Lothe, Johan AU - Ploeg, Gerhard AU - Dirkzwager, E. Anja J. 
AU - Clausen, Thomas AU - Tjagvad, Christian AU - Stavseth, Riksheim Marianne PY - 2022/3/23 TI - PriSUD-Nordic–Diagnosing and Treating Substance Use Disorders in the Prison Population: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e35182 VL - 11 IS - 3 KW - substance use disorders KW - prison KW - criminal justice KW - epidemiology KW - mixed methods KW - harm reduction KW - treatment N2 - Background: A large proportion of the prison population experiences substance use disorders (SUDs), which are associated with poor physical and mental health, social marginalization, and economic disadvantage. Despite the global situation characterized by the incarceration of large numbers of people with SUD and the health problems associated with SUD, people in prison are underrepresented in public health research. Objective: The overall objective of the PriSUD (Diagnosing and Treating Substance Use Disorders in Prison)-Nordic project is to develop new knowledge that will contribute to better mental and physical health, improved quality of life, and better life expectancies among people with SUD in prison. Methods: PriSUD-Nordic is based on a multidisciplinary mixed methods approach, including the methodological perspectives of both quantitative and qualitative methods. The qualitative part includes ethnographic fieldwork and semistructured interviews. The quantitative part is a registry-based cohort study including national registry data from Norway, Denmark, and Sweden. The national prison cohorts will comprise approximately 500,000 individuals and include all people imprisoned in Norway, Sweden, and Denmark during the period from 2000 to 2019. The project will investigate the prison population during three different time periods: before imprisonment, during imprisonment, and after release. Results: PriSUD-Nordic was funded by The Research Council of Norway in December 2019, and funding started in 2020. 
Data collection is ongoing and will be completed in the first quarter of 2022. Data will be analyzed in spring 2022 and the results will be disseminated in 2022-2023. The PriSUD-Nordic project has formal ethical approval related to all work packages. Conclusions: PriSUD-Nordic will be the first research project to investigate the epidemiology and the lived experiences of people with SUD in the Nordic prison population. Successful research in this field will have the potential to identify significant areas of benefit and will have important implications for ongoing policy related to interventions for SUD in the prison population. International Registered Report Identifier (IRRID): DERR1-10.2196/35182 UR - https://www.researchprotocols.org/2022/3/e35182 UR - http://dx.doi.org/10.2196/35182 UR - http://www.ncbi.nlm.nih.gov/pubmed/35320114 ID - info:doi/10.2196/35182 ER - TY - JOUR AU - Zirikly, Ayah AU - Desmet, Bart AU - Newman-Griffis, Denis AU - Marfeo, E. Elizabeth AU - McDonough, Christine AU - Goldman, Howard AU - Chan, Leighton PY - 2022/3/18 TI - Information Extraction Framework for Disability Determination Using a Mental Functioning Use-Case JO - JMIR Med Inform SP - e32245 VL - 10 IS - 3 KW - natural language processing KW - text mining KW - bioinformatics KW - health informatics KW - machine learning KW - disability KW - mental health KW - functioning KW - NLP KW - electronic health record KW - framework KW - EHR KW - automation KW - eHealth KW - decision support KW - functional status KW - whole-person function UR - https://medinform.jmir.org/2022/3/e32245 UR - http://dx.doi.org/10.2196/32245 UR - http://www.ncbi.nlm.nih.gov/pubmed/35302510 ID - info:doi/10.2196/32245 ER - TY - JOUR AU - Hek, Karin AU - Rolfes, Leàn AU - van Puijenbroek, P. Eugène AU - Flinterman, E. Linda AU - Vorstenbosch, Saskia AU - van Dijk, Liset AU - Verheij, A. 
Robert PY - 2022/3/16 TI - Electronic Health Record–Triggered Research Infrastructure Combining Real-world Electronic Health Record Data and Patient-Reported Outcomes to Detect Benefits, Risks, and Impact of Medication: Development Study JO - JMIR Med Inform SP - e33250 VL - 10 IS - 3 KW - adverse drug reaction KW - general practice KW - patient-reported outcome KW - electronic health record KW - overactive bladder KW - research infrastructure KW - learning health systems N2 - Background: Real-world data from electronic health records (EHRs) represent a wealth of information for studying the benefits and risks of medical treatment. However, they are limited in scope and should be complemented by information from the patient perspective. Objective: The aim of this study is to develop an innovative research infrastructure that combines information from EHRs with patient experiences reported in questionnaires to monitor the risks and benefits of medical treatment. Methods: We focused on the treatment of overactive bladder (OAB) in general practice as a use case. To develop the Benefit, Risk, and Impact of Medication Monitor (BRIMM) infrastructure, we first performed a requirement analysis. BRIMM's starting point is routinely recorded general practice EHR data that are sent to the Dutch Nivel Primary Care Database weekly. Patients with OAB were flagged weekly on the basis of diagnoses and prescriptions. They were subsequently invited for participation by their general practitioner (GP), via a trusted third party. Patients received a series of questionnaires on disease status, pharmacological and nonpharmacological treatments, adverse drug reactions, drug adherence, and quality of life. The questionnaires and a dedicated feedback portal were developed in collaboration with a patient association for pelvic-related diseases, Bekkenbodem4All. Participating patients and GPs received feedback. 
An expert meeting was organized to assess the strengths, weaknesses, opportunities, and threats of the new research infrastructure. Results: The BRIMM infrastructure was developed and implemented. In the Nivel Primary Care Database, 2933 patients with OAB from 27 general practices were flagged. GPs selected 1636 (55.78%) patients who were eligible for the study, of whom 295 (18.0% of eligible patients) completed the first questionnaire. A total of 288 (97.6%) patients consented to the linkage of their questionnaire data with their EHR data. According to experts, the strengths of the infrastructure were the linkage of patient-reported outcomes with EHR data, comparison of pharmacological and nonpharmacological treatments, flexibility of the infrastructure, and low registration burden for GPs. Methodological weaknesses, such as susceptibility to bias, patient selection, and low participation rates among GPs and patients, were seen as weaknesses and threats. Opportunities represent usefulness for policy makers and health professionals, conditional approval of medication, data linkage to other data sources, and feedback to patients. Conclusions: The BRIMM research infrastructure has the potential to assess the benefits and safety of (medical) treatment in real-life situations using a unique combination of EHRs and patient-reported outcomes. As patient involvement is an important aspect of the treatment process, generating knowledge from clinical and patient perspectives is valuable for health care providers, patients, and policy makers. The developed methodology can easily be applied to other treatments and health problems. UR - https://medinform.jmir.org/2022/3/e33250 UR - http://dx.doi.org/10.2196/33250 UR - http://www.ncbi.nlm.nih.gov/pubmed/35293877 ID - info:doi/10.2196/33250 ER - TY - JOUR AU - Moghisi, Reihaneh AU - El Morr, Christo AU - Pace, T. 
Kenneth AU - Hajiha, Mohammad AU - Huang, Jimmy PY - 2022/3/16 TI - A Machine Learning Approach to Predict the Outcome of Urinary Calculi Treatment Using Shock Wave Lithotripsy: Model Development and Validation Study JO - Interact J Med Res SP - e33357 VL - 11 IS - 1 KW - lithotripsy KW - urolithiasis KW - machine learning KW - treatment outcome KW - ensemble learning KW - AdaBoost KW - renal stones KW - kidney disease N2 - Background: Shock wave lithotripsy (SWL), ureteroscopy, and percutaneous nephrolithotomy are established treatments for renal stones. Historically, SWL has been a predominant and commonly used procedure for treating upper tract renal stones smaller than 20 mm in diameter due to its noninvasive nature. However, the reported failure rate of SWL after one treatment session ranges from 30% to 89%. The failure rate can be reduced by identifying candidates likely to benefit from SWL and managing patients who are likely to fail SWL with other treatment modalities. This would enhance and optimize treatment results for SWL candidates. Objective: We proposed to develop a machine learning model that can predict SWL outcomes to assist practitioners in the decision-making process when considering patients for stone treatment. Methods: A data set including 58,349 SWL procedures performed during 31,569 patient visits for SWL to a single hospital between 1990 and 2016 was used to construct and validate the predictive model. The AdaBoost algorithm was applied to a data set with 17 predictive attributes related to patient demographics and stone characteristics, with success or failure as an outcome. The AdaBoost algorithm was also applied to a training data set. The generated model's performance was compared to that of 5 other machine learning algorithms, namely C4.5 decision tree, naïve Bayes, Bayesian network, K-nearest neighbors, and multilayer perceptron. 
Results: The developed model was validated with a testing data set and performed significantly better than the models generated by the other 5 predictive algorithms. The sensitivity and specificity of the model were 0.875 and 0.653, respectively, while its positive predictive value was 0.7159 and negative predictive value was 0.839. The C-statistic of the receiver operating characteristic (ROC) analysis was 0.843, which reflects an excellent test. Conclusions: We have developed a rigorous machine learning model to assist physicians and decision-makers to choose patients with renal stones who are most likely to have successful SWL treatment based on their demographics and stone characteristics. The proposed machine learning model can assist physicians and decision-makers in planning for SWL treatment and allow for more effective use of limited health care resources and improve patient prognoses. UR - https://www.i-jmr.org/2022/1/e33357 UR - http://dx.doi.org/10.2196/33357 UR - http://www.ncbi.nlm.nih.gov/pubmed/35293872 ID - info:doi/10.2196/33357 ER - TY - JOUR AU - Almowil, Zahra AU - Zhou, Shang-Ming AU - Brophy, Sinead AU - Croxall, Jodie PY - 2022/3/15 TI - Concept Libraries for Repeatable and Reusable Research: Qualitative Study Exploring the Needs of Users JO - JMIR Hum Factors SP - e31021 VL - 9 IS - 1 KW - electronic health records KW - record linkage KW - reproducible research KW - clinical codes KW - concept libraries N2 - Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research. 
Objective: This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library). Methods: This was a qualitative study using interviews and focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion with researchers (N=14) working with the Secured Anonymized Information Linkage databank, a national eHealth data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis for the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. The participants mentioned several facilitators that would stimulate them to share their work and reuse the work of others, and they pointed out several barriers that could inhibit them from sharing their work and reusing the work of others. The participants suggested some developments that they would like to see to improve reproducible research output using routine data. Conclusions: The study indicated that most interviewees valued a concept library for phenotypes. 
However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library. UR - https://humanfactors.jmir.org/2022/1/e31021 UR - http://dx.doi.org/10.2196/31021 UR - http://www.ncbi.nlm.nih.gov/pubmed/35289755 ID - info:doi/10.2196/31021 ER - TY - JOUR AU - Lu, Sheng-Chieh AU - Xu, Cai AU - Nguyen, H. Chandler AU - Geng, Yimin AU - Pfob, André AU - Sidey-Gibbons, Chris PY - 2022/3/14 TI - Machine Learning–Based Short-Term Mortality Prediction Models for Patients With Cancer Using Electronic Health Record Data: Systematic Review and Critical Appraisal JO - JMIR Med Inform SP - e33182 VL - 10 IS - 3 KW - machine learning KW - cancer mortality KW - artificial intelligence KW - clinical prediction models KW - end-of-life care N2 - Background: In the United States, national guidelines suggest that aggressive cancer care should be avoided in the final months of life. However, guideline compliance currently requires clinicians to make judgments based on their experience as to when a patient is nearing the end of their life. Machine learning (ML) algorithms may facilitate improved end-of-life care provision for patients with cancer by identifying patients at risk of short-term mortality. Objective: This study aims to summarize the evidence for applying ML in ≤1-year cancer mortality prediction to assist with the transition to end-of-life care for patients with cancer. Methods: We searched MEDLINE, Embase, Scopus, Web of Science, and IEEE to identify relevant articles. We included studies describing ML algorithms predicting ≤1-year mortality in patients with cancer. 
We used the prediction model risk of bias assessment tool to assess the quality of the included studies. Results: We included 15 articles involving 110,058 patients in the final synthesis. Of the 15 studies, 12 (80%) had a high or unclear risk of bias. The model performance was good: the area under the receiver operating characteristic curve ranged from 0.72 to 0.92. We identified common issues leading to biased models, including using a single performance metric, incomplete reporting of or inappropriate modeling practice, and small sample size. Conclusions: We found encouraging signs of ML performance in predicting short-term cancer mortality. Nevertheless, no included ML algorithms are suitable for clinical practice at the current stage because of the high risk of bias and uncertainty regarding real-world performance. Further research is needed to develop ML models using the modern standards of algorithm development and reporting. UR - https://medinform.jmir.org/2022/3/e33182 UR - http://dx.doi.org/10.2196/33182 UR - http://www.ncbi.nlm.nih.gov/pubmed/35285816 ID - info:doi/10.2196/33182 ER - TY - JOUR AU - Jung, Hyesil AU - Yoo, Sooyoung AU - Kim, Seok AU - Heo, Eunjeong AU - Kim, Borham AU - Lee, Ho-Young AU - Hwang, Hee PY - 2022/3/11 TI - Patient-Level Fall Risk Prediction Using the Observational Medical Outcomes Partnership's Common Data Model: Pilot Feasibility Study JO - JMIR Med Inform SP - e35104 VL - 10 IS - 3 KW - common data model KW - accidental falls KW - Observational Medical Outcomes Partnership KW - nursing records KW - medical informatics KW - health data KW - electronic health record KW - data model KW - prediction model KW - risk prediction KW - fall risk N2 - Background: Falls in acute care settings threaten patients' safety. 
Researchers have been developing fall risk prediction models and exploring risk factors to provide evidence-based fall prevention practices; however, such efforts are hindered by insufficient samples, limited covariates, and a lack of standardized methodologies that aid study replication. Objective: The objectives of this study were to (1) convert fall-related electronic health record data into the standardized Observational Medical Outcomes Partnership (OMOP) common data model format and (2) develop models that predict fall risk during 2 time periods. Methods: As a pilot feasibility test, we converted fall-related electronic health record data (nursing notes, fall risk assessment sheet, patient acuity assessment sheet, and clinical observation sheet) into standardized OMOP common data model format using an extraction, transformation, and load process. We developed fall risk prediction models for 2 time periods (within 7 days of admission and during the entire hospital stay) using 2 algorithms (least absolute shrinkage and selection operator logistic regression and random forest). Results: In total, 6277 nursing statements, 747,049,486 clinical observation sheet records, 1,554,775 fall risk scores, and 5,685,011 patient acuity scores were converted into OMOP common data model format. All our models (area under the receiver operating characteristic curve 0.692-0.726) performed better than the Hendrich II Fall Risk Model. Patient acuity score, fall history, age ≥60 years, movement disorder, and central nervous system agents were the most important predictors in the logistic regression models. Conclusions: To enhance model performance further, we are currently converting all nursing records into the OMOP common data model data format, which will then be included in the models. Thus, in the near future, the performance of fall risk prediction models could be improved through the application of abundant nursing records and external validation. 
UR - https://medinform.jmir.org/2022/3/e35104 UR - http://dx.doi.org/10.2196/35104 UR - http://www.ncbi.nlm.nih.gov/pubmed/35275076 ID - info:doi/10.2196/35104 ER - TY - JOUR AU - Gao, Chuang AU - McGilchrist, Mark AU - Mumtaz, Shahzad AU - Hall, Christopher AU - Anderson, Ann Lesley AU - Zurowski, John AU - Gordon, Sharon AU - Lumsden, Joanne AU - Munro, Vicky AU - Wozniak, Artur AU - Sibley, Michael AU - Banks, Christopher AU - Duncan, Chris AU - Linksted, Pamela AU - Hume, Alastair AU - Stables, L. Catherine AU - Mayor, Charlie AU - Caldwell, Jacqueline AU - Wilde, Katie AU - Cole, Christian AU - Jefferson, Emily PY - 2022/3/9 TI - A National Network of Safe Havens: Scottish Perspective JO - J Med Internet Res SP - e31684 VL - 24 IS - 3 KW - electronic health records KW - Safe Haven KW - data governance UR - https://www.jmir.org/2022/3/e31684 UR - http://dx.doi.org/10.2196/31684 UR - http://www.ncbi.nlm.nih.gov/pubmed/35262495 ID - info:doi/10.2196/31684 ER - TY - JOUR AU - Yang, Ting-Ya AU - Chien, Tsair-Wei AU - Lai, Feng-Jie PY - 2022/3/9 TI - Web-Based Skin Cancer Assessment and Classification Using Machine Learning and Mobile Computerized Adaptive Testing in a Rasch Model: Development Study JO - JMIR Med Inform SP - e33006 VL - 10 IS - 3 KW - skin cancer assessment KW - computerized adaptive testing KW - naïve Bayes KW - k-nearest neighbors KW - logistic regression KW - Rasch partial credit model KW - receiver operating characteristic curve KW - mobile phone N2 - Background: Web-based computerized adaptive testing (CAT) implementation of the skin cancer (SC) risk scale could substantially reduce participant burden without compromising measurement precision. However, CAT for SC classification has not been reported in the academic literature thus far. Objective: We aim to build a CAT-based model using machine learning to develop an app for automatic classification of SC to help patients assess the risk at an early stage. 
Methods: We extracted data from a population-based Australian cohort study of SC risk (N=43,794) using the Rasch simulation scheme. All 30 feature items were calibrated using the Rasch partial credit model. A total of 1000 cases following a normal distribution (mean 0, SD 1) based on the item and threshold difficulties were simulated using three techniques of machine learning (naïve Bayes, k-nearest neighbors, and logistic regression) to compare the model accuracy in training and testing data sets with a proportion of 70:30, where the former was used to predict the latter. We calculated the sensitivity, specificity, receiver operating characteristic curve (area under the curve [AUC]), and CIs along with the accuracy and precision across the proposed models for comparison. An app that classifies the SC risk of the respondent was developed. Results: We observed that the 30-item k-nearest neighbors model yielded higher AUC values of 99% and 91% for the 700 training and 300 testing cases, respectively, than its 2 counterparts using the hold-out validation but had lower AUC values of 85% (95% CI 83%-87%) in the k-fold cross-validation and that an app that predicts SC classification for patients was successfully developed and demonstrated in this study. Conclusions: The 30-item SC prediction model, combined with the Rasch web-based CAT, is recommended for classifying SC in patients. An app we developed to help patients self-assess SC risk at an early stage is required for application in the future. 
UR - https://medinform.jmir.org/2022/3/e33006 UR - http://dx.doi.org/10.2196/33006 UR - http://www.ncbi.nlm.nih.gov/pubmed/35262505 ID - info:doi/10.2196/33006 ER - TY - JOUR AU - Jamieson Gilmore, Kendall AU - Bonciani, Manila AU - Vainieri, Milena PY - 2022/3/4 TI - A Comparison of Census and Cohort Sampling Models for the Longitudinal Collection of User-Reported Data in the Maternity Care Pathway: Mixed Methods Study JO - JMIR Med Inform SP - e25477 VL - 10 IS - 3 KW - longitudinal studies KW - mothers KW - pregnancy KW - survival analysis KW - patient-reported outcome measures KW - patient-reported experience measures KW - surveys KW - maternity KW - postpartum KW - online KW - digital health KW - digital collection N2 - Background: Typical measures of maternity performance remain focused on the technical elements of birth, especially pathological elements, with insufficient measurement of nontechnical measures and those collected pre- and postpartum. New technologies allow for patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) to be collected from large samples at multiple time points, which can be considered alongside existing administrative sources; however, such models are not widely implemented or evaluated. Since 2018, a longitudinal, personalized, and integrated user-reported data collection process for the maternal care pathway has been used in Tuscany, Italy. This model has been through two methodological iterations. Objective: The aim of this study was to compare and contrast two sampling models of longitudinal user-reported data for the maternity care pathway, exploring factors influencing participation, cost, and suitability of the models for different stakeholders. Methods: Data were collected by two modes: (1) "cohort" recruitment at the birth hospital of a predetermined sample size and (2) continuous, ongoing "census" recruitment of women at the first midwife appointment. 
Surveys were used to collect experiential and outcome data related to existing services. Women were included once 12 months had passed since initial enrollment, meaning that they either received the surveys issued after that interval or dropped out in the intervening period. Data were collected from women in Tuscany, Italy, between September 2018 and July 2020. The total sample included 7784 individuals with 38,656 observations. The two models of longitudinal collection of user-reported data were analyzed using descriptive statistics, survival analysis, cost comparison, and a qualitative review. Results: Cohort sampling provided lower initial participation than census sampling, although very high subsequent response rates (87%) were obtained 1 year after enrollment. Census sampling had higher initial participation, but greater dropout (up to 45% at 1 year). Both models showed high response rates for online surveys. There were nonproportional dropout hazards over time. There were higher rates of dropout for women with foreign nationality (hazard ratio [HR] 1.88, P<.001), and lower rates of dropout for those who had a higher level of education (HR 0.77 and 0.61 for women completing high school and college, respectively; P<.001), were employed (HR 0.87, P=.01), in a relationship (HR 0.84, P=.04), and with previous pregnancies (HR 0.86, P=.002). The census model was initially more expensive, albeit with lower repeat costs, and could become cheaper if repeated more than six times. Conclusions: The digital collection of user-reported data enables high response rates to targeted surveys in the maternity care pathway. The point at which pregnant women or mothers are recruited is relevant for response rates and sample bias. The census model of continuous enrollment and real-time data availability offers a wider set of potential benefits, but at an initially higher cost and with the requirement for more substantial data translation and managerial capacity to make use of such data. 
UR - https://medinform.jmir.org/2022/3/e25477 UR - http://dx.doi.org/10.2196/25477 UR - http://www.ncbi.nlm.nih.gov/pubmed/35254268 ID - info:doi/10.2196/25477 ER - TY - JOUR AU - Ip, Wui AU - Prahalad, Priya AU - Palma, Jonathan AU - Chen, H. Jonathan PY - 2022/3/3 TI - A Data-Driven Algorithm to Recommend Initial Clinical Workup for Outpatient Specialty Referral: Algorithm Development and Validation Using Electronic Health Record Data and Expert Surveys JO - JMIR Med Inform SP - e30104 VL - 10 IS - 3 KW - recommender system KW - electronic health records KW - clinical decision support KW - specialty consultation KW - machine learning KW - EHR KW - algorithm KW - algorithm development KW - algorithm validation KW - automation KW - prediction KW - patient needs N2 - Background: Millions of people have limited access to specialty care. The problem is exacerbated by ineffective specialty visits due to incomplete prereferral workup, leading to delays in diagnosis and treatment. Existing processes to guide prereferral diagnostic workup are labor-intensive (ie, building a consensus guideline between primary care doctors and specialists) and require the availability of the specialists (ie, electronic consultation). Objective: Using pediatric endocrinology as an example, we develop a recommender algorithm to anticipate patients' initial workup needs at the time of specialty referral and compare it to a reference benchmark using the most common workup orders. We also evaluate the clinical appropriateness of the algorithm recommendations. Methods: Electronic health record data were extracted from 3424 pediatric patients with new outpatient endocrinology referrals at an academic institution from 2015 to 2020. Using item co-occurrence statistics, we predicted the initial workup orders that would be entered by specialists and assessed the recommender's performance in a holdout data set based on what the specialists actually ordered. 
We surveyed endocrinologists to assess the clinical appropriateness of the predicted orders and to understand the initial workup process. Results: Specialists (n=12) indicated that <50% of new patient referrals arrive with complete initial workup for common referral reasons. The algorithm achieved an area under the receiver operating characteristic curve of 0.95 (95% CI 0.95-0.96). Compared to a reference benchmark using the most common orders, precision and recall improved from 37% to 48% (P<.001) and from 27% to 39% (P<.001) for the top 4 recommendations, respectively. The top 4 recommendations generated for common referral conditions (abnormal thyroid studies, obesity, amenorrhea) were considered clinically appropriate the majority of the time by specialists surveyed and practice guidelines reviewed. Conclusions: An item association-based recommender algorithm can predict appropriate specialists' workup orders with high discriminatory accuracy. This could support future clinical decision support tools to increase effectiveness and access to specialty referrals. Our study demonstrates important first steps toward a data-driven paradigm for outpatient specialty consultation with a tier of automated recommendations that proactively enable initial workup that would otherwise be delayed by awaiting an in-person visit. 
UR - https://medinform.jmir.org/2022/3/e30104 UR - http://dx.doi.org/10.2196/30104 UR - http://www.ncbi.nlm.nih.gov/pubmed/35238788 ID - info:doi/10.2196/30104 ER - TY - JOUR AU - Wang, Liya AU - Qiu, Hang AU - Luo, Li AU - Zhou, Li PY - 2022/2/25 TI - Age- and Sex-Specific Differences in Multimorbidity Patterns and Temporal Trends on Assessing Hospital Discharge Records in Southwest China: Network-Based Study JO - J Med Internet Res SP - e27146 VL - 24 IS - 2 KW - multimorbidity pattern KW - temporal trend KW - network analysis KW - multimorbidity prevalence KW - administrative data KW - longitudinal study KW - regional research N2 - Background: Multimorbidity represents a global health challenge, which requires a more global understanding of multimorbidity patterns and trends. However, the majority of studies completed to date have often relied on self-reported conditions, and a simultaneous assessment of the entire spectrum of chronic disease co-occurrence, especially in developing regions, has not yet been performed. Objective: We attempted to provide a multidimensional approach to understand the full spectrum of chronic disease co-occurrence among general inpatients in southwest China, in order to investigate multimorbidity patterns and temporal trends, and assess their age and sex differences. Methods: We conducted a retrospective cohort analysis based on 8.8 million hospital discharge records of about 5.0 million individuals of all ages from 2015 to 2019 in a megacity in southwest China. We examined all chronic diagnoses using the ICD-10 (International Classification of Diseases, 10th revision) codes at 3 digits and focused on chronic diseases with ≥1% prevalence for each of the age and sex strata, which resulted in a total of 149 and 145 chronic diseases in males and females, respectively. We constructed multimorbidity networks in the general population based on sex and age, and used the cosine index to measure the co-occurrence of chronic diseases. 
Then, we divided the networks into communities and assessed their temporal trends. Results: The results showed complex interactions among chronic diseases, with more intensive connections among males and inpatients ≥40 years old. A total of 9 chronic diseases were simultaneously classified as central diseases, hubs, and bursts in the multimorbidity networks. Among them, 5 diseases were common to both males and females, including hypertension, chronic ischemic heart disease, cerebral infarction, other cerebrovascular diseases, and atherosclerosis. The earliest leaps (degree leaps ≥6) appeared at a disorder of glycoprotein metabolism that happened at 25-29 years in males, about 15 years earlier than in females. The number of chronic diseases in the community increased over time, but the new entrants did not replace the root of the community. Conclusions: Our multimorbidity network analysis identified specific differences in the co-occurrence of chronic diagnoses by sex and age, which could help in the design of clinical interventions for inpatient multimorbidity. UR - https://www.jmir.org/2022/2/e27146 UR - http://dx.doi.org/10.2196/27146 UR - http://www.ncbi.nlm.nih.gov/pubmed/35212632 ID - info:doi/10.2196/27146 ER - TY - JOUR AU - Ellis, A. Louise AU - Sarkies, Mitchell AU - Churruca, Kate AU - Dammery, Genevieve AU - Meulenbroeks, Isabelle AU - Smith, L. 
Carolynn AU - Pomare, Chiara AU - Mahmoud, Zeyad AU - Zurynski, Yvonne AU - Braithwaite, Jeffrey PY - 2022/2/23 TI - The Science of Learning Health Systems: Scoping Review of Empirical Research JO - JMIR Med Inform SP - e34907 VL - 10 IS - 2 KW - learning health systems KW - learning health care systems KW - implementation science KW - evaluation KW - health system KW - health care system KW - empirical research KW - medical informatics KW - review N2 - Background: The development and adoption of a learning health system (LHS) has been proposed as a means to address key challenges facing current and future health care systems. The first review of the LHS literature was conducted 5 years ago, identifying only a small number of published papers that had empirically examined the implementation or testing of an LHS. It is timely to look more closely at the published empirical research and to ask the question, Where are we now, 5 years on from that early LHS review? Objective: This study performed a scoping review of empirical research within the LHS domain. Taking an "implementation science" lens, the review aims to map out the empirical research that has been conducted to date, identify limitations, and identify future directions for the field. Methods: Two academic databases (PubMed and Scopus) were searched using the terms "learning health* system*" for papers published between January 1, 2016, and January 31, 2021, that had an explicit empirical focus on LHSs. Study information was extracted relevant to the review objective, including each study's publication details; primary concern or focus; context; design; data type; implementation framework, model, or theory used; and implementation determinants or outcomes examined. Results: A total of 76 studies were included in this review. Over two-thirds of the studies were concerned with implementing a particular program, system, or platform (53/76, 69.7%) designed to contribute to achieving an LHS. 
Most of these studies focused on a particular clinical context or patient population (37/53, 69.8%), with far fewer studies focusing on whole hospital systems (4/53, 7.5%) or on other broad health care systems encompassing multiple facilities (12/53, 22.6%). Over two-thirds of the program-specific studies utilized quantitative methods (37/53, 69.8%), with a smaller number utilizing qualitative methods (10/53, 18.9%) or mixed-methods designs (6/53, 11.3%). The remaining 23 studies were classified into 1 of 3 key areas: ethics, policies, and governance (10/76, 13.2%); stakeholder perspectives of LHSs (5/76, 6.6%); or LHS-specific research strategies and tools (8/76, 10.5%). Overall, relatively few studies were identified that incorporated an implementation science framework. Conclusions: Although there has been considerable growth in empirical applications of LHSs within the past 5 years, paralleling the recent emergence of LHS-specific research strategies and tools, there are few high-quality studies. Comprehensive reporting of implementation and evaluation efforts is an important step to moving the LHS field forward. In particular, the routine use of implementation determinant and outcome frameworks will improve the assessment and reporting of barriers, enablers, and implementation outcomes in this field and will enable comparison and identification of trends across studies. UR - https://medinform.jmir.org/2022/2/e34907 UR - http://dx.doi.org/10.2196/34907 UR - http://www.ncbi.nlm.nih.gov/pubmed/35195529 ID - info:doi/10.2196/34907 ER - TY - JOUR AU - Montoto, Carmen AU - Gisbert, P. 
Javier AU - Guerra, Iván AU - Plaza, Rocío AU - Pajares Villarroya, Ramón AU - Moreno Almazán, Luis AU - López Martín, Carmen María Del AU - Domínguez Antonaya, Mercedes AU - Vera Mendoza, Isabel AU - Aparicio, Jesús AU - Martínez, Vicente AU - Tagarro, Ignacio AU - Fernandez-Nistal, Alonso AU - Canales, Lea AU - Menke, Sebastian AU - Gomollón, Fernando AU - PY - 2022/2/18 TI - Evaluation of Natural Language Processing for the Identification of Crohn Disease-Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project JO - JMIR Med Inform SP - e30345 VL - 10 IS - 2 KW - natural language processing KW - linguistic validation KW - artificial intelligence KW - electronic health records KW - Crohn disease KW - inflammatory bowel disease N2 - Background: The exploration of clinically relevant information in the free text of electronic health records (EHRs) holds the potential to positively impact clinical practice as well as knowledge regarding Crohn disease (CD), an inflammatory bowel disease that may affect any segment of the gastrointestinal tract. The EHRead technology, a clinical natural language processing (cNLP) system, was designed to detect and extract clinical information from narratives in the clinical notes contained in EHRs. Objective: The aim of this study is to validate the performance of the EHRead technology in identifying information of patients with CD. Methods: We used the EHRead technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the output of the EHRead technology with a manually curated gold standard to assess the quality of our cNLP system in detecting records containing any reference to CD and its related variables. Results: The validation metrics for the main variable (CD) were a precision of 0.88, a recall of 0.98, and an F1 score of 0.93. 
Regarding the secondary variables, we obtained a precision of 0.91, a recall of 0.71, and an F1 score of 0.80 for CD flare, while for the variable vedolizumab (treatment), a precision, recall, and F1 score of 0.86, 0.94, and 0.90 were obtained, respectively. Conclusions: This evaluation demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs. To the best of our knowledge, this study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish. UR - https://medinform.jmir.org/2022/2/e30345 UR - http://dx.doi.org/10.2196/30345 UR - http://www.ncbi.nlm.nih.gov/pubmed/35179507 ID - info:doi/10.2196/30345 ER - TY - JOUR AU - Kilgallon, L. John AU - Tewarie, Ashwini Ishaan AU - Broekman, D. Marike L. AU - Rana, Aakanksha AU - Smith, R. Timothy PY - 2022/2/15 TI - Passive Data Use for Ethical Digital Public Health Surveillance in a Postpandemic World JO - J Med Internet Res SP - e30524 VL - 24 IS - 2 KW - passive data KW - public health surveillance KW - digital public health surveillance KW - pandemic response KW - data privacy KW - digital phenotyping KW - smartphone KW - mobile phone KW - mHealth KW - digital health KW - informed consent KW - data equity KW - data ownership UR - https://www.jmir.org/2022/2/e30524 UR - http://dx.doi.org/10.2196/30524 UR - http://www.ncbi.nlm.nih.gov/pubmed/35166676 ID - info:doi/10.2196/30524 ER - TY - JOUR AU - Matsui, Hiroki AU - Yamana, Hayato AU - Fushimi, Kiyohide AU - Yasunaga, Hideo PY - 2022/2/11 TI - Development of Deep Learning Models for Predicting In-Hospital Mortality Using an Administrative Claims Database: Retrospective Cohort Study JO - JMIR Med Inform SP - e27936 VL - 10 IS - 2 KW - prognostic model KW - deep learning KW - real-world data KW - acute care KW - claims data KW - myocardial infarction KW - heart failure KW - stroke KW - pneumonia N2 - Background: Administrative claims databases have been used widely in 
studies because they have large sample sizes and are easily available. However, studies using administrative databases lack information on disease severity, so a risk adjustment method needs to be developed. Objective: We aimed to develop and validate deep learning-based prediction models for in-hospital mortality of acute care patients. Methods: The main model was developed using only administrative claims data (age, sex, diagnoses, and procedures on the day of admission). We also constructed disease-specific models for acute myocardial infarction, heart failure, stroke, and pneumonia using common severity indices for these diseases. Using the Japanese Diagnosis Procedure Combination data from July 2010 to March 2017, we identified 46,665,933 inpatients and divided them into derivation and validation cohorts in a ratio of 95:5. The main model was developed using a 9-layer deep neural network with 4 hidden dense layers that had 1000 nodes and were fully connected to adjacent layers. We evaluated model discrimination ability by an area under the receiver operating characteristic curve (AUC) and calibration ability by calibration plot. Results: Among the eligible patients, 2,005,035 (4.3%) died. Discrimination and calibration of the models were satisfactory. The AUC of the main model in the validation cohort was 0.954 (95% CI 0.954-0.955). The main model had higher discrimination ability than the disease-specific models. Conclusions: Our deep learning-based model using diagnoses and procedures produced valid predictions of in-hospital mortality. UR - https://medinform.jmir.org/2022/2/e27936 UR - http://dx.doi.org/10.2196/27936 UR - http://www.ncbi.nlm.nih.gov/pubmed/34997958 ID - info:doi/10.2196/27936 ER - TY - JOUR AU - Shara, Nawar AU - Anderson, M. Kelley AU - Falah, Noor AU - Ahmad, F. Maryam AU - Tavazoei, Darya AU - Hughes, M. 
Justin AU - Talmadge, Bethany AU - Crovatt, Samantha AU - Dempers, Ramon PY - 2022/2/10 TI - Early Identification of Maternal Cardiovascular Risk Through Sourcing and Preparing Electronic Health Record Data: Machine Learning Study JO - JMIR Med Inform SP - e34932 VL - 10 IS - 2 KW - electronic health record KW - maternal health KW - machine learning KW - maternal morbidity and mortality KW - cardiovascular risk KW - data transformation KW - extract KW - transform KW - load KW - artificial intelligence KW - electronic medical record N2 - Background: Health care data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes. However, the differences that exist in each individual's health records, combined with the lack of health data standards, in addition to systemic issues that render the data unreliable and that fail to create a single view of each patient, create challenges for ML. Although these problems exist throughout health care, they are especially prevalent within maternal health and exacerbate the maternal morbidity and mortality crisis in the United States. Objective: This study aims to demonstrate that patient records extracted from the electronic health records (EHRs) of a large tertiary health care system can be made actionable for the goal of effectively using ML to identify maternal cardiovascular risk before evidence of diagnosis or intervention within the patient's record. Maternal patient records were extracted from the EHRs of a large tertiary health care system and made into patient-specific, complete data sets through a systematic method. 
Methods: We outline the effort that was required to define the specifications of the computational systems, the data set, and access to relevant systems, while ensuring that data security, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for their use by a proprietary risk stratification algorithm designed to establish patient-specific baselines to identify and establish cardiovascular risk based on deviations from the patient's baselines to inform early interventions. Results: Patient records can be made actionable for the goal of effectively using ML, specifically to identify cardiovascular risk in pregnant patients. Conclusions: Upon acquiring data, including their concatenation, anonymization, and normalization across multiple EHRs, the use of an ML-based tool can provide early identification of cardiovascular risk in pregnant patients. UR - https://medinform.jmir.org/2022/2/e34932 UR - http://dx.doi.org/10.2196/34932 UR - http://www.ncbi.nlm.nih.gov/pubmed/35142637 ID - info:doi/10.2196/34932 ER - TY - JOUR AU - Pičulin, Matej AU - Smole, Tim AU - Žunkovič, Bojan AU - Kokalj, Enja AU - Robnik-Šikonja, Marko AU - Kukar, Matjaž AU - Fotiadis, I. Dimitrios AU - Pezoulas, C. Vasileios AU - Tachos, S. Nikolaos AU - Barlocco, Fausto AU - Mazzarotto, Francesco AU - Popović, Dejana AU - Maier, S. Lars AU - Velicki, Lazar AU - Olivotto, Iacopo AU - MacGowan, A. Guy AU - Jakovljević, G. 
Djordje AU - Filipović, Nenad AU - Bosnić, Zoran PY - 2022/2/2 TI - Disease Progression of Hypertrophic Cardiomyopathy: Modeling Using Machine Learning JO - JMIR Med Inform SP - e30483 VL - 10 IS - 2 KW - hypertrophic cardiomyopathy KW - disease progression KW - machine learning KW - artificial intelligence KW - AI KW - ML KW - cardiomyopathy KW - cardiovascular disease KW - sudden cardiac death KW - SCD KW - prediction KW - prediction model KW - validation N2 - Background: Cardiovascular disorders in general are responsible for 30% of deaths worldwide. Among them, hypertrophic cardiomyopathy (HCM) is a genetic cardiac disease that is present in about 1 of 500 young adults and can cause sudden cardiac death (SCD). Objective: Although the current state-of-the-art methods model the risk of SCD for patients, to the best of our knowledge, no methods are available for modeling the patient's clinical status up to 10 years ahead. In this paper, we propose a novel machine learning (ML)-based tool for predicting disease progression for patients diagnosed with HCM in terms of adverse remodeling of the heart during a 10-year period. Methods: The method consisted of 6 predictive regression models that independently predict future values of 6 clinical characteristics: left atrial size, left atrial volume, left ventricular ejection fraction, New York Heart Association functional classification, left ventricular internal diastolic diameter, and left ventricular internal systolic diameter. We supplemented each prediction with the explanation that is generated using the Shapley additive explanation method. Results: The final experiments showed that predictive error is lower on 5 of the 6 constructed models in comparison to experts (on average, by 0.34) or a consortium of experts (on average, by 0.22). The experiments revealed that semisupervised learning and the artificial data from virtual patients help improve predictive accuracies. 
The best-performing random forest model improved R2 from 0.3 to 0.6. Conclusions: By engaging medical experts to provide interpretation and validation of the results, we determined the models' favorable performance compared to the performance of experts for 5 of 6 targets. UR - https://medinform.jmir.org/2022/2/e30483 UR - http://dx.doi.org/10.2196/30483 UR - http://www.ncbi.nlm.nih.gov/pubmed/35107432 ID - info:doi/10.2196/30483 ER - TY - JOUR AU - Liu, Yun-Chung AU - Cheng, Hao-Yuan AU - Chang, Tu-Hsuan AU - Ho, Te-Wei AU - Liu, Ting-Chi AU - Yen, Ting-Yu AU - Chou, Chia-Ching AU - Chang, Luan-Yin AU - Lai, Feipei PY - 2022/1/27 TI - Evaluation of the Need for Intensive Care in Children With Pneumonia: Machine Learning Approach JO - JMIR Med Inform SP - e28934 VL - 10 IS - 1 KW - child pneumonia KW - intensive care KW - machine learning KW - decision making KW - clinical index N2 - Background: Timely decision-making regarding intensive care unit (ICU) admission for children with pneumonia is crucial for a better prognosis. Despite attempts to establish a guideline or triage system for evaluating ICU care needs, no clinically applicable paradigm is available. Objective: The aim of this study was to develop machine learning (ML) algorithms to predict ICU care needs for pediatric pneumonia patients within 24 hours of admission, evaluate their performance, and identify clinical indices for making decisions for pediatric pneumonia patients. Methods: Pneumonia patients admitted to National Taiwan University Hospital from January 2010 to December 2019 aged under 18 years were enrolled. Their underlying diseases, clinical manifestations, and laboratory data at admission were collected. The outcome of interest was ICU transfer within 24 hours of hospitalization. We compared clinically relevant features between early ICU transfer patients and patients without ICU care. ML algorithms were developed to predict ICU admission. 
The performance of the algorithms was evaluated using sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and average precision. The relative feature importance of the best-performing algorithm was compared with physician-rated feature importance for explainability. Results: A total of 8464 pediatric hospitalizations due to pneumonia were recorded, and 1166 (1166/8464, 13.8%) hospitalized patients were transferred to the ICU within 24 hours. Early ICU transfer patients were younger (P<.001), had higher rates of underlying diseases (eg, cardiovascular, neuropsychological, and congenital anomaly/genetic disorders; P<.001), had abnormal laboratory data, had higher pulse rates (P<.001), had higher breath rates (P<.001), had lower oxygen saturation (P<.001), and had lower peak body temperature (P<.001) at admission than patients without ICU transfer. The random forest (RF) algorithm achieved the best performance (sensitivity 0.94, 95% CI 0.92-0.95; specificity 0.94, 95% CI 0.92-0.95; AUC 0.99, 95% CI 0.98-0.99; and average precision 0.93, 95% CI 0.90-0.96). The lowest systolic blood pressure and presence of cardiovascular and neuropsychological diseases ranked in the top 10 in both RF relative feature importance and clinician judgment. Conclusions: The ML approach could provide a clinically applicable triage algorithm and identify important clinical indices, such as age, underlying diseases, abnormal vital signs, and laboratory data for evaluating the need for intensive care in children with pneumonia. UR - https://medinform.jmir.org/2022/1/e28934 UR - http://dx.doi.org/10.2196/28934 UR - http://www.ncbi.nlm.nih.gov/pubmed/35084358 ID - info:doi/10.2196/28934 ER - TY - JOUR AU - Facile, Rhonda AU - Muhlbradt, Elizabeth Erin AU - Gong, Mengchun AU - Li, Qingna AU - Popat, Vaishali AU - Pétavy, Frank AU - Cornet, Ronald AU - Ruan, Yaoping AU - Koide, Daisuke AU - Saito, I. 
Toshiki AU - Hume, Sam AU - Rockhold, Frank AU - Bao, Wenjun AU - Dubman, Sue AU - Jauregui Wurst, Barbara PY - 2022/1/27 TI - Use of Clinical Data Interchange Standards Consortium (CDISC) Standards for Real-world Data: Expert Perspectives From a Qualitative Delphi Survey JO - JMIR Med Inform SP - e30363 VL - 10 IS - 1 KW - real-world data KW - real-world evidence KW - clinical trials KW - Delphi survey KW - clinical data standards KW - regulatory submission KW - academic research KW - public health data KW - registry data KW - electronic health records KW - observational data KW - data integration KW - FAIR principles N2 - Background: Real-world data (RWD) and real-world evidence (RWE) are playing increasingly important roles in clinical research and health care decision-making. To leverage RWD and generate reliable RWE, data should be well defined and structured in a way that is semantically interoperable and consistent across stakeholders. The adoption of data standards is one of the cornerstones supporting high-quality evidence for the development of clinical medicine and therapeutics. Clinical Data Interchange Standards Consortium (CDISC) data standards are mature, globally recognized, and heavily used by the pharmaceutical industry for regulatory submissions. The CDISC RWD Connect Initiative aims to better understand the barriers to implementing CDISC standards for RWD and to identify the tools and guidance needed to more easily implement them. Objective: The aim of this study is to understand the barriers to implementing CDISC standards for RWD and to identify the tools and guidance that may be needed to implement CDISC standards more easily for this purpose. Methods: We conducted a qualitative Delphi survey involving an expert advisory board with multiple key stakeholders, with 3 rounds of input and review. Results: Overall, 66 experts participated in round 1, 56 in round 2, and 49 in round 3 of the Delphi survey. 
Their inputs were collected and analyzed, culminating in group statements. It was widely agreed that the standardization of RWD is highly necessary, and the primary focus should be on its ability to improve data sharing and the quality of RWE. The priorities for RWD standardization included electronic health records, such as data shared using Health Level 7 Fast Healthcare Interoperability Resources (FHIR), and the data stemming from observational studies. With different standardization efforts already underway in these areas, a gap analysis should be performed to identify the areas where synergies and efficiencies are possible and then collaborate with stakeholders to create or extend existing mappings between CDISC and other standards, controlled terminologies, and models to represent data originating across different sources. Conclusions: There are many ongoing data standardization efforts around human health data-related activities, each with different definitions, levels of granularity, and purpose. Among these, CDISC has been successful in standardizing clinical trial-based data for regulation worldwide. However, the complexity of the CDISC standards and the fact that they were developed for different purposes, combined with the lack of awareness and incentives to use a new standard and insufficient training and implementation support, are significant barriers to setting up the use of CDISC standards for RWD. The collection and dissemination of use cases, development of tools and support systems for the RWD community, and collaboration with other standards development organizations are potential steps forward. Using CDISC will help link clinical trial data and RWD and promote innovation in health data science. 
UR - https://medinform.jmir.org/2022/1/e30363 UR - http://dx.doi.org/10.2196/30363 UR - http://www.ncbi.nlm.nih.gov/pubmed/35084343 ID - info:doi/10.2196/30363 ER - TY - JOUR AU - Triep, Karen AU - Leichtle, Benedikt Alexander AU - Meister, Martin AU - Fiedler, Martin Georg AU - Endrich, Olga PY - 2022/1/25 TI - Real-world Health Data and Precision for the Diagnosis of Acute Kidney Injury, Acute-on-Chronic Kidney Disease, and Chronic Kidney Disease: Observational Study JO - JMIR Med Inform SP - e31356 VL - 10 IS - 1 KW - acute kidney injury KW - chronic kidney disease KW - acute-on-chronic KW - real-world health data KW - clinical decision support KW - KDIGO KW - ICD coding N2 - Background: The criteria for the diagnosis of kidney disease outlined in the Kidney Disease: Improving Global Outcomes guidelines are based on a patient's current, historical, and baseline data. The diagnosis of acute kidney injury, chronic kidney disease, and acute-on-chronic kidney disease requires previous measurements of creatinine, back-calculation, and the interpretation of several laboratory values over a certain period. Diagnoses may be hindered by unclear definitions of the individual creatinine baseline and rough ranges of normal values that are set without adjusting for age, ethnicity, comorbidities, and treatment. The classification of correct diagnoses and sufficient staging improves coding, data quality, reimbursement, the choice of therapeutic approach, and a patient's outcome. Objective: In this study, we aim to apply a data-driven approach to assign diagnoses of acute, chronic, and acute-on-chronic kidney diseases with the help of a complex rule engine. Methods: Real-time and retrospective data from the hospital's clinical data warehouse of inpatient and outpatient cases treated between 2014 and 2019 were used. Delta serum creatinine, baseline values, and admission and discharge data were analyzed. 
A Kidney Disease: Improving Global Outcomes-based SQL algorithm applied specific diagnosis-based International Classification of Diseases (ICD) codes to inpatient stays. Text mining on discharge documentation was also conducted to measure the effects on diagnosis. Results: We show that this approach yielded an increased number of diagnoses (4491 cases in 2014 vs 11,124 cases of ICD-coded kidney disease and injury in 2019) and higher precision in documentation and coding. Among the kidney disease codes generated, the percentage of unspecific ICD N19-coded diagnoses dropped from 19.71% (1544/7833) in 2016 to 4.38% (416/9501) in 2019, while the percentage of specific ICD N18-coded diagnoses increased from 50.1% (3924/7833) in 2016 to 62.04% (5894/9501) in 2019. Conclusions: Our data-driven method supports the process and reliability of diagnosis and staging and improves the quality of documentation and data. Measuring patient outcomes will be the next step in this project. UR - https://medinform.jmir.org/2022/1/e31356 UR - http://dx.doi.org/10.2196/31356 UR - http://www.ncbi.nlm.nih.gov/pubmed/35076410 ID - info:doi/10.2196/31356 ER - TY - JOUR AU - Yu, Jia-Ruei AU - Chen, Chun-Hsien AU - Huang, Tsung-Wei AU - Lu, Jang-Jih AU - Chung, Chia-Ru AU - Lin, Ting-Wei AU - Wu, Min-Hsien AU - Tseng, Yi-Ju AU - Wang, Hsin-Yao PY - 2022/1/25 TI - Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study JO - J Med Internet Res SP - e28036 VL - 24 IS - 1 KW - medical informatics KW - machine learning KW - algorithms KW - energy consumption KW - artificial intelligence KW - energy efficient KW - medical domain KW - medical data sets KW - informatics N2 - Background: The use of artificial intelligence (AI) in the medical domain has attracted considerable research interest. Inference applications in the medical domain require energy-efficient AI models. 
In contrast to other types of data in visual AI, data from medical laboratories usually comprise features with strong signals. Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy a complex learning model. However, the energy efficiency levels of different AI models used for medical applications have not been studied. Objective: The aim of this study was to explore and compare the energy efficiency levels of commonly used machine learning algorithms, namely the logistic regression (LR), k-nearest neighbor, support vector machine, random forest (RF), and extreme gradient boosting (XGB) algorithms, as well as four different variants of neural network (NN) algorithms, when applied to clinical laboratory data sets. Methods: We applied the aforementioned algorithms to two distinct clinical laboratory data sets: a mass spectrometry data set regarding Staphylococcus aureus for predicting methicillin resistance (3338 cases; 268 features) and a urinalysis data set for predicting Trichomonas vaginalis infection (839,164 cases; 9 features). We compared the performance of the nine inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. The time and power consumption levels were determined using performance counter data from Intel Power Gadget 3.5. Results: The experimental results indicated that the RF and XGB algorithms achieved the two highest AUROC values for both data sets (84.7% and 83.9%, respectively, for the mass spectrometry data set; 91.1% and 91.4%, respectively, for the urinalysis data set). The XGB and LR algorithms exhibited the shortest inference time for both data sets (0.47 milliseconds for both in the mass spectrometry data set; 0.39 and 0.47 milliseconds, respectively, for the urinalysis data set).
Compared with the RF algorithm, the XGB and LR algorithms exhibited a 45% and 53%-60% reduction in inference time for the mass spectrometry and urinalysis data sets, respectively. In terms of energy efficiency, the XGB algorithm exhibited the lowest power consumption for the mass spectrometry data set (9.42 watts) and the LR algorithm exhibited the lowest power consumption for the urinalysis data set (9.98 watts). Compared with a five-hidden-layer NN, the XGB and LR algorithms achieved 16%-24% and 9%-13% lower power consumption levels for the mass spectrometry and urinalysis data sets, respectively. In all experiments, the XGB algorithm exhibited the best performance in terms of accuracy, run time, and energy efficiency. Conclusions: The XGB algorithm achieved balanced performance levels in terms of AUROC, run time, and energy efficiency for the two clinical laboratory data sets. Considering the energy constraints in real-world scenarios, the XGB algorithm is ideal for medical AI applications. UR - https://www.jmir.org/2022/1/e28036 UR - http://dx.doi.org/10.2196/28036 UR - http://www.ncbi.nlm.nih.gov/pubmed/35076405 ID - info:doi/10.2196/28036 ER - TY - JOUR AU - Kumar, Sajit AU - Nanelia, Alicia AU - Mariappan, Ragunathan AU - Rajagopal, Adithya AU - Rajan, Vaibhav PY - 2022/1/20 TI - Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study JO - JMIR Med Inform SP - e28842 VL - 10 IS - 1 KW - representation learning KW - deep collective matrix factorization KW - electronic medical records KW - knowledge graphs KW - multiview learning KW - graph embeddings KW - clinical decision support N2 - Background: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models.
This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network-based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals. Objective: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks. Methods: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set. Results: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods.
When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance. Conclusions: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations. UR - https://medinform.jmir.org/2022/1/e28842 UR - http://dx.doi.org/10.2196/28842 UR - http://www.ncbi.nlm.nih.gov/pubmed/35049514 ID - info:doi/10.2196/28842 ER - TY - JOUR AU - Ulrich, Hannes AU - Kock-Schoppenhauer, Ann-Kristin AU - Deppenwiese, Noemi AU - Gött, Robert AU - Kern, Jori AU - Lablans, Martin AU - Majeed, W. Raphael AU - Stöhr, R. Mark AU - Stausberg, Jürgen AU - Varghese, Julian AU - Dugas, Martin AU - Ingenerf, Josef PY - 2022/1/11 TI - Understanding the Nature of Metadata: Systematic Review JO - J Med Internet Res SP - e25440 VL - 24 IS - 1 KW - metadata KW - metadata definition KW - systematic review KW - data integration KW - data identification KW - data classification N2 - Background: Metadata are created to describe the corresponding data in a detailed and unambiguous way and are used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use are not always unambiguous.
Objective: This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse. Methods: A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used. Results: The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The following literature review was conducted by 10 reviewers with different backgrounds and using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas. Conclusions: Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context. 
UR - https://www.jmir.org/2022/1/e25440 UR - http://dx.doi.org/10.2196/25440 UR - http://www.ncbi.nlm.nih.gov/pubmed/35014967 ID - info:doi/10.2196/25440 ER - TY - JOUR AU - Wang, Ni AU - Wang, Muyu AU - Zhou, Yang AU - Liu, Honglei AU - Wei, Lan AU - Fei, Xiaolu AU - Chen, Hui PY - 2022/1/6 TI - Sequential Data-Based Patient Similarity Framework for Patient Outcome Prediction: Algorithm Development JO - J Med Internet Res SP - e30720 VL - 24 IS - 1 KW - patient similarity KW - electronic medical records KW - time series KW - acute myocardial infarction KW - natural language processing KW - machine learning KW - deep learning KW - outcome prediction KW - informatics KW - health data N2 - Background: Sequential information in electronic medical records is valuable and helpful for patient outcome prediction but is rarely used for patient similarity measurement because of its unevenness, irregularity, and heterogeneity. Objective: We aimed to develop a patient similarity framework for patient outcome prediction that makes use of sequential and cross-sectional information in electronic medical record systems. Methods: Sequence similarity was calculated from timestamped event sequences using edit distance, and trend similarity was calculated from time series using dynamic time warping and Haar decomposition. We also extracted cross-sectional information, namely, demographic, laboratory test, and radiological report data, for additional similarity calculations. We validated the effectiveness of the framework by constructing k-nearest neighbors classifiers to predict mortality and readmission for acute myocardial infarction patients, using data from (1) a public data set and (2) a private data set, at 3 time points (at admission, on Day 7, and at discharge) to provide early warning of patient outcomes.
We also constructed state-of-the-art Euclidean-distance k-nearest neighbor, logistic regression, random forest, long short-term memory network, and recurrent neural network models, which were used for comparison. Results: With all available information during a hospitalization episode, predictive models using the similarity model outperformed baseline models based on both public and private data sets. For mortality predictions, all models except for the logistic regression model showed improved performances over time. There were no such increasing trends in predictive performances for readmission predictions. The random forest and logistic regression models performed best for mortality and readmission predictions, respectively, when using information from the first week after admission. Conclusions: For patient outcome predictions, the patient similarity framework facilitated sequential similarity calculations for uneven electronic medical record data and helped improve predictive performance. UR - https://www.jmir.org/2022/1/e30720 UR - http://dx.doi.org/10.2196/30720 UR - http://www.ncbi.nlm.nih.gov/pubmed/34989682 ID - info:doi/10.2196/30720 ER - TY - JOUR AU - Yao, Li-Hung AU - Leung, Ka-Chun AU - Tsai, Chu-Lin AU - Huang, Chien-Hua AU - Fu, Li-Chen PY - 2021/12/27 TI - A Novel Deep Learning-Based System for Triage in the Emergency Department Using Electronic Medical Records: Retrospective Cohort Study JO - J Med Internet Res SP - e27008 VL - 23 IS - 12 KW - emergency department KW - triage system KW - deep learning KW - hospital admission KW - data to text KW - electronic health record N2 - Background: Emergency department (ED) crowding has resulted in delayed patient treatment and has become a universal health care problem.
Although a triage system, such as the 5-level emergency severity index, somewhat improves the process of ED treatment, it still heavily relies on the nurse's subjective judgment and triages too many patients to emergency severity index level 3 in current practice. Hence, a system that can help clinicians accurately triage a patient's condition is imperative. Objective: This study aims to develop a deep learning-based triage system using patients' ED electronic medical records to predict clinical outcomes after ED treatments. Methods: We conducted a retrospective study using data from an open data set from the National Hospital Ambulatory Medical Care Survey from 2012 to 2016 and data from a local data set from the National Taiwan University Hospital from 2009 to 2015. In this study, we transformed structured data into text form and used convolutional neural networks combined with recurrent neural networks and attention mechanisms to accomplish the classification task. We evaluated our performance using area under the receiver operating characteristic curve (AUROC). Results: A total of 118,602 patients from the National Hospital Ambulatory Medical Care Survey were included in this study for predicting hospitalization, and the accuracy and AUROC were 0.83 and 0.87, respectively. In addition, an external experiment used our own data set from the National Taiwan University Hospital, which included 745,441 patients; the accuracy and AUROC were similar, at 0.83 and 0.88, respectively. Moreover, to effectively evaluate the prediction quality of our proposed system, we also applied the model to other clinical outcomes, including mortality and admission to the intensive care unit, and the results showed that our proposed method was approximately 3% to 5% higher in accuracy than other conventional methods.
Conclusions: Our proposed method achieved better performance than the traditional method; its implementation is relatively easy, it includes commonly used variables, and it is well suited for real-world clinical settings. Future work will validate this novel deep learning-based triage algorithm in prospective clinical trials, with the aim of using it to guide resource allocation in a busy ED once validation succeeds. UR - https://www.jmir.org/2021/12/e27008 UR - http://dx.doi.org/10.2196/27008 UR - http://www.ncbi.nlm.nih.gov/pubmed/34958305 ID - info:doi/10.2196/27008 ER - TY - JOUR AU - Chua, Horng-Ruey AU - Zheng, Kaiping AU - Vathsala, Anantharaman AU - Ngiam, Kee-Yuan AU - Yap, Hui-Kim AU - Lu, Liangjian AU - Tiong, Ho-Yee AU - Mukhopadhyay, Amartya AU - MacLaren, Graeme AU - Lim, Shir-Lynn AU - Akalya, K. AU - Ooi, Beng-Chin PY - 2021/12/24 TI - Health Care Analytics With Time-Invariant and Time-Variant Feature Importance to Predict Hospital-Acquired Acute Kidney Injury: Observational Longitudinal Study JO - J Med Internet Res SP - e30805 VL - 23 IS - 12 KW - acute kidney injury KW - artificial intelligence KW - biomarkers KW - clinical deterioration KW - electronic health records KW - hospital medicine KW - machine learning N2 - Background: Acute kidney injury (AKI) develops in 4% of hospitalized patients and is a marker of clinical deterioration and nephrotoxicity. AKI onset is highly variable in hospitals, which makes it difficult to time biomarker assessment in all patients for preemptive care. Objective: The study sought to apply machine learning techniques to electronic health records and predict hospital-acquired AKI by a 48-hour lead time, with the aim to create an AKI surveillance algorithm that is deployable in real time. Methods: The data were sourced from 20,732 case admissions in 16,288 patients over 1 year in our institution.
We enhanced the bidirectional recurrent neural network model with a novel time-invariant and time-variant aggregated module to capture important clinical features temporal to AKI in every patient. Time-series features included laboratory parameters that preceded a 48-hour prediction window before AKI onset; the latter's corresponding reference was the final in-hospital serum creatinine performed in case admissions without AKI episodes. Results: The cohort was of mean age 53 (SD 25) years, of whom 29%, 12%, 12%, and 53% had diabetes, ischemic heart disease, cancers, and baseline eGFR <90 mL/min/1.73 m2, respectively. There were 911 AKI episodes in 869 patients. We derived and validated an algorithm in the testing dataset with an AUROC of 0.81 (0.78-0.85) for predicting AKI. At a 15% prediction threshold, our model generated 699 AKI alerts with 2 false positives for every true AKI and predicted 26% of AKIs. A lowered 5% prediction threshold improved the recall to 60% but generated 3746 AKI alerts with 6 false positives for every true AKI. Representative interpretation results produced by our model alluded to the top-ranked features that predicted AKI that could be categorized in association with sepsis, acute coronary syndrome, nephrotoxicity, or multiorgan injury, specific to every case at risk. Conclusions: We generated an accurate algorithm from electronic health records through machine learning that predicted AKI by a lead time of at least 48 hours. The prediction threshold could be adjusted during deployment to optimize recall and minimize alert fatigue, while its precision could potentially be augmented by targeted AKI biomarker assessment in the high-risk cohort identified. UR - https://www.jmir.org/2021/12/e30805 UR - http://dx.doi.org/10.2196/30805 UR - http://www.ncbi.nlm.nih.gov/pubmed/34951595 ID - info:doi/10.2196/30805 ER - TY - JOUR AU - Chopard, Daphne AU - Treder, S.
Matthias AU - Corcoran, Padraig AU - Ahmed, Nagheen AU - Johnson, Claire AU - Busse, Monica AU - Spasic, Irena PY - 2021/12/24 TI - Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach JO - JMIR Med Inform SP - e28632 VL - 9 IS - 12 KW - natural language processing KW - deep learning KW - machine learning KW - classification N2 - Background: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. Objective: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. Methods: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases, 10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. Results: The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. Conclusions: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible.
Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion. UR - https://medinform.jmir.org/2021/12/e28632 UR - http://dx.doi.org/10.2196/28632 UR - http://www.ncbi.nlm.nih.gov/pubmed/34951601 ID - info:doi/10.2196/28632 ER - TY - JOUR AU - Paris, Nicolas AU - Lamer, Antoine AU - Parrot, Adrien PY - 2021/12/14 TI - Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study JO - JMIR Med Inform SP - e30970 VL - 9 IS - 12 KW - data reuse KW - open data KW - OMOP KW - common data model KW - critical care KW - machine learning KW - big data KW - health informatics KW - health data KW - health database KW - electronic health records KW - open access database KW - digital health KW - intensive care KW - health care N2 - Background: In the era of big data, the intensive care unit (ICU) is likely to benefit from real-time computer analysis and modeling based on close patient monitoring and electronic health record data. The Medical Information Mart for Intensive Care (MIMIC) is the first open access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools, and experience to be shared. The Observational Medical Outcomes Partnership (OMOP) CDM is spreading all over the world. Objective: The objective was to transform MIMIC into an OMOP database and to evaluate the benefits of this transformation for analysts. Methods: We transformed MIMIC (version 1.4.21) into OMOP format (version 5.3.3.1) through semantic and structural mapping. The structural mapping aimed at moving the MIMIC data into the right place in OMOP, with some data transformations. The mapping was divided into 3 phases: conception, implementation, and evaluation. The conceptual mapping aimed at aligning the MIMIC local terminologies to OMOP's standard ones. 
It consisted of 3 phases: integration, alignment, and evaluation. A documented, tested, versioned, exemplified, and open repository was set up to support the transformation and improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. Results: With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of 1 minute. Conclusions: The resulting MIMIC-OMOP data set is the first MIMIC-OMOP data set available free of charge with real deidentified data ready for replicable intensive care research. This approach can be generalized to any medical field. UR - https://medinform.jmir.org/2021/12/e30970 UR - http://dx.doi.org/10.2196/30970 UR - http://www.ncbi.nlm.nih.gov/pubmed/34904958 ID - info:doi/10.2196/30970 ER - TY - JOUR AU - Bannay, Aurélie AU - Bories, Mathilde AU - Le Corre, Pascal AU - Riou, Christine AU - Lemordant, Pierre AU - Van Hille, Pascal AU - Chazard, Emmanuel AU - Dode, Xavier AU - Cuggia, Marc AU - Bouzillé, Guillaume PY - 2021/12/13 TI - Leveraging National Claims and Hospital Big Data: Cohort Study on a Statin-Drug Interaction Use Case JO - JMIR Med Inform SP - e29286 VL - 9 IS - 12 KW - drug interactions KW - statins KW - administrative claims KW - health care KW - big data KW - data linking KW - data warehousing N2 - Background: Linking different sources of medical data is a promising approach to analyze care trajectories.
The aim of the INSHARE (Integrating and Sharing Health Big Data for Research) project was to provide the blueprint for a technological platform that facilitates integration, sharing, and reuse of data from 2 sources: the clinical data warehouse (CDW) of the Rennes academic hospital, called eHOP (entrepôt Hôpital), and a data set extracted from the French national claim data warehouse (Système National des Données de Santé [SNDS]). Objective: This study aims to demonstrate how the INSHARE platform can support big data analytic tasks in the health field using a pharmacovigilance use case based on statin consumption and statin-drug interactions. Methods: A Spark distributed cluster-computing framework was used for the record linkage procedure and all analyses. A semideterministic record linkage method based on the common variables between the chosen data sources was developed to identify all patients discharged after at least one hospital stay at the Rennes academic hospital between 2015 and 2017. The use-case study focused on a cohort of patients treated with statins prescribed by their general practitioner or during their hospital stay. Results: The whole process (record linkage procedure and use-case analyses) required 88 minutes. Of the 161,532 and 164,316 patients from the SNDS and eHOP CDW data sets, respectively, 159,495 patients were successfully linked (98.74% and 97.07% of patients from SNDS and eHOP CDW, respectively). Of the 16,806 patients with at least one statin delivery, 8293 patients started the consumption before and continued during the hospital stay, 6382 patients stopped statin consumption at hospital admission, and 2131 patients initiated statins in hospital. Statin-drug interactions occurred more frequently during hospitalization than in the community (3800/10,424, 36.45% and 3253/14,675, 22.17%, respectively; P<.001). Only 121 patients had the most severe level of statin-drug interaction. 
Hospital stay burden (length of stay and in-hospital mortality) was more severe in patients with statin-drug interactions during hospitalization. Conclusions: This study demonstrates the added value of combining and reusing clinical and claim data to provide large-scale measures of drug-drug interaction prevalence and care pathways outside hospitals. It builds a path to move the current health care system toward a Learning Health System using knowledge generated from research on real-world health data. UR - https://medinform.jmir.org/2021/12/e29286 UR - http://dx.doi.org/10.2196/29286 UR - http://www.ncbi.nlm.nih.gov/pubmed/34898457 ID - info:doi/10.2196/29286 ER - TY - JOUR AU - Lajonchere, Clara AU - Naeim, Arash AU - Dry, Sarah AU - Wenger, Neil AU - Elashoff, David AU - Vangala, Sitaram AU - Petruse, Antonia AU - Ariannejad, Maryam AU - Magyar, Clara AU - Johansen, Liliana AU - Werre, Gabriela AU - Kroloff, Maxwell AU - Geschwind, Daniel PY - 2021/12/8 TI - An Integrated, Scalable, Electronic Video Consent Process to Power Precision Health Research: Large, Population-Based, Cohort Implementation and Scalability Study JO - J Med Internet Res SP - e31121 VL - 23 IS - 12 KW - biobanking KW - precision medicine KW - electronic consent KW - privacy KW - consent KW - patient privacy KW - clinical data KW - eHealth KW - recruitment KW - population health KW - data collection KW - research methods KW - video KW - research KW - validation KW - scalability N2 - Background: Obtaining explicit consent from patients to use their remnant biological samples and deidentified clinical data for research is essential for advancing precision medicine. Objective: We aimed to describe the operational implementation and scalability of an electronic universal consent process that was used to power an institutional precision health biobank across a large academic health system. 
Methods: The University of California, Los Angeles, implemented the use of innovative electronic consent videos as the primary recruitment tool for precision health research. The consent videos targeted patients aged ≥18 years across ambulatory clinical laboratories, perioperative settings, and hospital settings. Each of these major areas had slightly different workflows and patient populations. Sociodemographic information, comorbidity data, health utilization data (ambulatory visits, emergency room visits, and hospital admissions), and consent decision data were collected. Results: The consenting approach proved scalable across 22 clinical sites (hospital and ambulatory settings). Over 40,000 participants completed the consent process at a rate of 800 to 1000 patients per week over a 2-year time period. Participants were representative of the adult University of California, Los Angeles, Health population. The opt-in rates in the perioperative (16,500/22,519, 73.3%) and ambulatory clinics (2308/3390, 68.1%) were higher than those in clinical laboratories (7506/14,235, 52.7%; P<.001). Patients with higher medical acuity were more likely to opt in. The multivariate analyses showed that African American (odds ratio [OR] 0.53, 95% CI 0.49-0.58; P<.001), Asian (OR 0.72, 95% CI 0.68-0.77; P<.001), and multiple-race populations (OR 0.73, 95% CI 0.69-0.77; P<.001) were less likely to participate than White individuals. Conclusions: This is one of the few large-scale, electronic video-based consent implementation programs that reports a 65.5% (26,314/40,144) average overall opt-in rate across a large academic health system. This rate is higher than those previously reported for email (3.6%) and electronic biobank (50%) informed consent rates. This study demonstrates a scalable recruitment approach for population health research.
UR - https://www.jmir.org/2021/12/e31121 UR - http://dx.doi.org/10.2196/31121 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889741 ID - info:doi/10.2196/31121 ER - TY - JOUR AU - Pan, Youcheng AU - Wang, Chenghao AU - Hu, Baotian AU - Xiang, Yang AU - Wang, Xiaolong AU - Chen, Qingcai AU - Chen, Junjie AU - Du, Jingcheng PY - 2021/12/8 TI - A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation JO - JMIR Med Inform SP - e32698 VL - 9 IS - 12 KW - electronic medical record KW - text-to-SQL generation KW - BERT KW - grammar-based decoding KW - tree-structured intermediate representation N2 - Background: Electronic medical records (EMRs) are usually stored in relational databases that require SQL queries to retrieve information of interest. Effectively completing such queries can be a challenging task for medical experts due to the barriers in expertise. Existing text-to-SQL generation studies have not been fully embraced in the medical domain. Objective: The objective of this study was to propose a neural generation model that can jointly consider the characteristics of medical text and the SQL structure to automatically transform medical texts to SQL queries for EMRs. Methods: We proposed a medical text-to-SQL model (MedTS), which employed a pretrained Bidirectional Encoder Representations From Transformers model as the encoder and leveraged a grammar-based long short-term memory network as the decoder to predict the intermediate representation that can easily be transformed into the final SQL query. We adopted the syntax tree as the intermediate representation rather than directly regarding the SQL query as an ordinary word sequence, which is more in line with the tree-structure nature of SQL and can also effectively reduce the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared.
Results: Experimental results demonstrated that MedTS achieved accuracies of 0.784 and 0.899 on the test set in terms of logic form and execution, respectively, which significantly outperformed the existing state-of-the-art methods. Further analyses proved that the performance on each component of the generated SQL was relatively balanced and offered substantial improvements. Conclusions: The proposed MedTS was effective and robust for improving the performance of medical text-to-SQL generation, indicating strong potential to be applied in real-world medical scenarios. UR - https://medinform.jmir.org/2021/12/e32698 UR - http://dx.doi.org/10.2196/32698 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889749 ID - info:doi/10.2196/32698 ER - TY - JOUR AU - Singh, Janmajay AU - Sato, Masahiro AU - Ohkuma, Tomoko PY - 2021/12/8 TI - On Missingness Features in Machine Learning Models for Critical Care: Observational Study JO - JMIR Med Inform SP - e25022 VL - 9 IS - 12 KW - electronic health records KW - informative missingness KW - machine learning KW - missing data KW - hospital mortality KW - sepsis N2 - Background: Missing data in electronic health records is inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient's health and advocate for their inclusion in clinical prediction models. However, their effectiveness has not been comprehensively evaluated. Objective: The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and explore robustness of these features across patient subgroups and task settings. Methods: A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter dataset was multicenter, allowing external validation.
Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration. Results: Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (area under the receiver operating characteristic curve [AUROC] improved from 1.2% to 7.7%) and even the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite their leading to earlier detection of disease (true positives), since including these features also led to a concomitant rise in false positive detections. Conclusions: This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks like length of stay prediction, where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.
UR - https://medinform.jmir.org/2021/12/e25022 UR - http://dx.doi.org/10.2196/25022 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889756 ID - info:doi/10.2196/25022 ER - TY - JOUR AU - Liu, Dianbo AU - Zheng, Ming AU - Sepulveda, Andres Nestor PY - 2021/12/8 TI - Using Artificial Neural Network Condensation to Facilitate Adaptation of Machine Learning in Medical Settings by Reducing Computational Burden: Model Design and Evaluation Study JO - JMIR Form Res SP - e20767 VL - 5 IS - 12 KW - artificial neural network KW - electronic medical records KW - parameter pruning KW - machine learning KW - computational burden N2 - Background: Machine learning applications in the health care domain can have a great impact on people's lives. At the same time, medical data is usually big, requiring a significant number of computational resources. Although this might not be a problem for the wide adoption of machine learning tools in high-income countries, the availability of computational resources can be limited in low-income countries and on mobile devices. This can limit many people from benefiting from the advancement in machine learning applications in the field of health care. Objective: In this study, we explore three methods to increase the computational efficiency and reduce the model sizes of either recurrent neural networks (RNNs) or feedforward deep neural networks (DNNs) without compromising their accuracy. Methods: We used inpatient mortality prediction as our case analysis upon review of an intensive care unit dataset. We reduced the size of the RNN and DNN by pruning "unused" neurons. Additionally, we modified the RNN structure by adding a hidden layer to the RNN cell but reducing the total number of recurrent layers, to accomplish a reduction of the total parameters used in the network. Finally, we implemented quantization on the DNN by forcing the weights to be 8 bits instead of 32 bits.
Results: We found that all methods increased implementation efficiency, including training speed, memory size, and inference speed, without reducing the accuracy of mortality prediction. Conclusions: Our findings suggest that neural network condensation allows for the implementation of sophisticated neural network algorithms on devices with lower computational resources. UR - https://formative.jmir.org/2021/12/e20767 UR - http://dx.doi.org/10.2196/20767 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889747 ID - info:doi/10.2196/20767 ER - TY - JOUR AU - Izadi, Neda AU - Etemad, Koorosh AU - Mehrabi, Yadollah AU - Eshrati, Babak AU - Hashemi Nazari, Saeed Seyed PY - 2021/12/7 TI - The Standardization of Hospital-Acquired Infection Rates Using Prediction Models in Iran: Observational Study of National Nosocomial Infection Registry Data JO - JMIR Public Health Surveill SP - e33296 VL - 7 IS - 12 KW - hospital-acquired infections KW - standardized infection ratio KW - prediction model KW - Iran N2 - Background: Many factors contribute to the spreading of hospital-acquired infections (HAIs). Objective: This study aimed to standardize the HAI rate using prediction models in Iran based on the National Healthcare Safety Network (NHSN) method. Methods: In this study, the Iranian nosocomial infections surveillance system (INIS) was used to gather data on patients with HAIs (126,314 infections). In addition, the hospital statistics and information system (AVAB) was used to collect data on hospital characteristics. First, well-performing hospitals, including 357 hospitals from all over the country, were selected. Data were randomly split into training (70%) and testing (30%) sets. Finally, the standardized infection ratio (SIR) and the corrected SIR were calculated for the HAIs. Results: The mean age of the 100,110 patients with an HAI was 40.02 (SD 23.56) years. 
The corrected SIRs based on the observed and predicted infections for respiratory tract infections (RTIs), urinary tract infections (UTIs), surgical site infections (SSIs), and bloodstream infections (BSIs) were 0.03 (95% CI 0-0.09), 1.02 (95% CI 0.95-1.09), 0.93 (95% CI 0.85-1.007), and 0.91 (95% CI 0.54-1.28), respectively. Moreover, the corrected SIRs for RTIs in the infectious disease, burn, obstetrics and gynecology, and internal medicine wards; UTIs in the burn, infectious disease, internal medicine, and intensive care unit wards; SSIs in the burn and infectious disease wards; and BSIs in most wards were >1, indicating that more HAIs were observed than expected. Conclusions: The results of this study can help to promote preventive measures based on scientific evidence. They can also lead to the continuous improvement of the monitoring system by collecting and systematically analyzing data on HAIs and encourage the hospitals to better control their infection rates by establishing a benchmarking system. 
UR - https://publichealth.jmir.org/2021/12/e33296 UR - http://dx.doi.org/10.2196/33296 UR - http://www.ncbi.nlm.nih.gov/pubmed/34879002 ID - info:doi/10.2196/33296 ER - TY - JOUR AU - Allam, Ahmed AU - Feuerriegel, Stefan AU - Rebhan, Michael AU - Krauthammer, Michael PY - 2021/12/3 TI - Analyzing Patient Trajectories With Artificial Intelligence JO - J Med Internet Res SP - e29812 VL - 23 IS - 12 KW - patient trajectories KW - longitudinal data KW - digital medicine KW - artificial intelligence KW - machine learning UR - https://www.jmir.org/2021/12/e29812 UR - http://dx.doi.org/10.2196/29812 UR - http://www.ncbi.nlm.nih.gov/pubmed/34870606 ID - info:doi/10.2196/29812 ER - TY - JOUR AU - Mahajan, Abhishaike AU - Deonarine, Andrew AU - Bernal, Axel AU - Lyons, Genevieve AU - Norgeot, Beau PY - 2021/11/26 TI - Developing the Total Health Profile, a Generalizable Unified Set of Multimorbidity Risk Scores Derived From Machine Learning for Broad Patient Populations: Retrospective Cohort Study JO - J Med Internet Res SP - e32900 VL - 23 IS - 11 KW - multimorbidity KW - clinical risk score KW - outcome research KW - machine learning KW - electronic health record KW - clinical informatics KW - morbidity KW - risk KW - outcome KW - population data KW - diagnostic KW - demographic KW - decision making KW - cohort KW - prediction N2 - Background: Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often for recommendation to care management programs. However, these scores are limited by several issues: existing multimorbidity scores (1) are generally limited to one data group (eg, diagnoses, labs) and may be missing vital information, (2) are usually limited to specific demographic groups (eg, age), and (3) do not formally provide any granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention. 
Objective: Using diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs), we developed a physiologically diverse and generalizable set of multimorbidity risk scores. Methods: Using EHR data from a nationwide cohort of patients, we developed the total health profile, a set of six integrated risk scores reflecting five distinct organ systems and overall health. We selected the occurrence of an inpatient hospital visitation over a 2-year follow-up window, attributable to specific organ systems, as our risk endpoint. Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of the aforementioned endpoint, producing risk scores for heart, lung, neuro, kidney, and digestive functions and a sixth score for combined risk. We evaluated the scores using a held-out test cohort of 198,574 patients. Results: Study patients closely matched national census averages, with a median age of 41 years, a median income of $66,829, and racial averages by zip code of 73.8% White, 5.9% Asian, and 11.9% African American. All models were well calibrated and demonstrated strong performance with areas under the receiver operating characteristic curve (AUROCs) of 0.83 for the total health score (THS), 0.89 for heart, 0.86 for lung, 0.84 for neuro, 0.90 for kidney, and 0.83 for digestive functions. There was consistent performance of this scoring system across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate clinically relevant patient features, such as heart-related hospitalizations and chronic hypertension diagnosis for the heart model. The THS outperformed the other commonly used multimorbidity scoring systems, specifically the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) overall (AUROCs: THS=0.823, CCI=0.735, ECI=0.649) as well as for every age, sex, and income bracket.
Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of feature groups has the best predictive performance, though only marginally better than the diagnosis-only model on at-risk groups. Conclusions: Massive retrospective EHR data sets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalizable across diverse patient populations. UR - https://www.jmir.org/2021/11/e32900 UR - http://dx.doi.org/10.2196/32900 UR - http://www.ncbi.nlm.nih.gov/pubmed/34842542 ID - info:doi/10.2196/32900 ER - TY - JOUR AU - Chang, David AU - Lin, Eric AU - Brandt, Cynthia AU - Taylor, Andrew Richard PY - 2021/11/26 TI - Incorporating Domain Knowledge Into Language Models by Using Graph Convolutional Networks for Assessing Semantic Textual Similarity: Model Development and Performance Comparison JO - JMIR Med Inform SP - e23101 VL - 9 IS - 11 KW - natural language processing KW - graph neural networks KW - National NLP Clinical Challenges KW - bidirectional encoder representation from transformers N2 - Background: Although electronic health record systems have facilitated clinical documentation in health care, they have also introduced new challenges, such as the proliferation of redundant information through the use of copy and paste commands or templates. One approach to trimming down bloated clinical documentation and improving clinical summarization is to identify highly similar text snippets with the goal of removing such text. Objective: We developed a natural language processing system for the task of assessing clinical semantic textual similarity. The system assigns scores to pairs of clinical text snippets based on their clinical semantic similarity. 
Methods: We leveraged recent advances in natural language processing and graph representation learning to create a model that combines linguistic and domain knowledge information from the MedSTS data set to assess clinical semantic textual similarity. We used bidirectional encoder representation from transformers (BERT)-based models as text encoders for the sentence pairs in the data set and graph convolutional networks (GCNs) as graph encoders for corresponding concept graphs that were constructed based on the sentences. We also explored techniques, including data augmentation, ensembling, and knowledge distillation, to improve the model's performance, as measured by the Pearson correlation coefficient (r). Results: Fine-tuning the BERT_base and ClinicalBERT models on the MedSTS data set provided a strong baseline (Pearson correlation coefficients: 0.842 and 0.848, respectively) compared to those of the previous year's submissions. Our data augmentation techniques yielded moderate gains in performance, and adding a GCN-based graph encoder to incorporate the concept graphs also boosted performance, especially when the node features were initialized with pretrained knowledge graph embeddings of the concepts (r=0.868). As expected, ensembling improved performance, and performing multisource ensembling by using different language model variants, conducting knowledge distillation with the multisource ensemble model, and taking a final ensemble of the distilled models further improved the system's performance (Pearson correlation coefficients: 0.875, 0.878, and 0.882, respectively). Conclusions: This study presents a system for the MedSTS clinical semantic textual similarity benchmark task, which was created by combining BERT-based text encoders and GCN-based graph encoders in order to incorporate domain knowledge into the natural language processing pipeline.
We also experimented with other techniques involving data augmentation, pretrained concept embeddings, ensembling, and knowledge distillation to further increase our system's performance. Although the task and its benchmark data set are in the early stages of development, this study, as well as the results of the competition, demonstrates the potential of modern language model-based systems to detect redundant information in clinical notes. UR - https://medinform.jmir.org/2021/11/e23101 UR - http://dx.doi.org/10.2196/23101 UR - http://www.ncbi.nlm.nih.gov/pubmed/34842531 ID - info:doi/10.2196/23101 ER - TY - JOUR AU - Ramachandran, Raghav AU - McShea, J. Michael AU - Howson, N. Stephanie AU - Burkom, S. Howard AU - Chang, Hsien-Yen AU - Weiner, P. Jonathan AU - Kharrazi, Hadi PY - 2021/11/25 TI - Assessing the Value of Unsupervised Clustering in Predicting Persistent High Health Care Utilizers: Retrospective Analysis of Insurance Claims Data JO - JMIR Med Inform SP - e31442 VL - 9 IS - 11 KW - persistent high users KW - persistent high utilizers KW - latent class analysis KW - comorbidity patterns KW - utilization prediction KW - unsupervised clustering KW - population health analytics KW - health care KW - prediction models KW - health care services KW - health care costs N2 - Background: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve health care system efficiencies and improve the overall quality of patient care. Objective: The aim of this study was to detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs.
Methods: This study was a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring health care costs in the top 20% of all patients' costs for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied latent class analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns to differentiate variations in health care utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. Predictive powers of the regression models were assessed and compared using standard metrics. Results: We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9% were women, 3.3% had ≥1 hospitalization, and 19.1% had 10+ outpatient visits in 2013. A total of 8359 (5.09%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probability disease/medication classes. Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: acute upper respiratory infection (URI) (n=53,232; 4.6% PHUs), mental health (n=34,456; 12.8% PHUs), otitis media (n=24,992; 4.5% PHUs), and musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score classification metric was lower using a parsimonious model that included LCA categories (F1=38.62%) compared to that of a complex risk stratification model with a full set of predictors (F1=48.20%).
However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the mental health and musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than those of the complex model when the LCA-enabled models were limited to the otitis media and acute URI subpopulations (45.77% and 43.05%, respectively). Conclusions: Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other health care settings. UR - https://medinform.jmir.org/2021/11/e31442 UR - http://dx.doi.org/10.2196/31442 UR - http://www.ncbi.nlm.nih.gov/pubmed/34592712 ID - info:doi/10.2196/31442 ER - TY - JOUR AU - Pankhurst, Tanya AU - Evison, Felicity AU - Atia, Jolene AU - Gallier, Suzy AU - Coleman, Jamie AU - Ball, Simon AU - McKee, Deborah AU - Ryan, Steven AU - Black, Ruth PY - 2021/11/23 TI - Introduction of Systematized Nomenclature of Medicine-Clinical Terms Coding Into an Electronic Health Record and Evaluation of its Impact: Qualitative and Quantitative Study JO - JMIR Med Inform SP - e29532 VL - 9 IS - 11 KW - coding standards KW - clinical decision support KW - clinician-led design KW - clinician-reported experience KW - clinical usability KW - data sharing KW - diagnoses KW - electronic health records KW - electronic health record standards KW - health data exchange KW - health data research KW - International Classification of Diseases version 10 (ICD-10) KW - National Health Service Blueprint KW - patient diagnoses KW - population health KW - problem list KW - research KW - Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) KW - use of electronic health data KW - user-led design N2 - Background: This study describes the
conversion within an existing electronic health record (EHR) from the International Classification of Diseases, Tenth Revision coding system to the SNOMED-CT (Systematized Nomenclature of Medicine-Clinical Terms) for the collection of patient histories and diagnoses. The setting is a large acute hospital that is designing and building its own EHR. Well-designed EHRs create opportunities for continuous data collection, which can be used in clinical decision support rules to drive patient safety. Collected data can be exchanged across health care systems to support patients in all health care settings. Data can be used for research to prevent diseases and protect future populations. Objective: The aim of this study was to migrate a current EHR, with all relevant patient data, to the SNOMED-CT coding system to optimize clinical use and clinical decision support, facilitate data sharing across organizational boundaries for national programs, and enable remodeling of medical pathways. Methods: The study used qualitative and quantitative data to understand the successes and gaps in the project, clinician attitudes toward the new tool, and the future use of the tool. Results: The new coding system (tool) was well received and immediately widely used in all specialties. This resulted in increased, accurate, and clinically relevant data collection. Clinicians appreciated the increased depth and detail of the new coding, welcomed the potential for both data sharing and research, and provided extensive feedback for further development. Conclusions: Successful implementation of the new system aligned the University Hospitals Birmingham NHS Foundation Trust with national strategy and can be used as a blueprint for similar projects in other health care settings.
UR - https://medinform.jmir.org/2021/11/e29532 UR - http://dx.doi.org/10.2196/29532 UR - http://www.ncbi.nlm.nih.gov/pubmed/34817387 ID - info:doi/10.2196/29532 ER - TY - JOUR AU - Gierend, Kerstin AU - Krüger, Frank AU - Waltemath, Dagmar AU - Fünfgeld, Maximilian AU - Ganslandt, Thomas AU - Zeleke, Alamirrew Atinkut PY - 2021/11/22 TI - Approaches and Criteria for Provenance in Biomedical Data Sets and Workflows: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e31750 VL - 10 IS - 11 KW - provenance KW - biomedical KW - workflow KW - data sharing KW - lineage KW - scoping review KW - data genesis KW - scientific data KW - digital objects KW - healthcare data N2 - Background: Provenance supports the understanding of data genesis, and it is a key factor to ensure the trustworthiness of digital objects containing (sensitive) scientific data. Provenance information contributes to a better understanding of scientific results and fosters collaboration on existing data as well as data sharing. This encompasses defining comprehensive concepts and standards for transparency and traceability, reproducibility, validity, and quality assurance during clinical and scientific data workflows and research. Objective: The aim of this scoping review is to investigate existing evidence regarding approaches and criteria for provenance tracking as well as disclosing current knowledge gaps in the biomedical domain. This review covers modeling aspects as well as metadata frameworks for meaningful and usable provenance information during creation, collection, and processing of (sensitive) scientific biomedical data. This review also covers the examination of quality aspects of provenance criteria. Methods: This scoping review will follow the methodological framework by Arksey and O'Malley. Relevant publications will be obtained by querying PubMed and Web of Science. All papers in English language will be included, published between January 1, 2006 and March 23, 2021. 
Data retrieval will be accompanied by a manual search for grey literature. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterwards, the obtained set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 4 independent reviewers. A majority vote will determine the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading will be performed independently by 2 reviewers, and in the last step, key information will be extracted onto a pretested template. If agreement cannot be reached, the conflict will be resolved by a domain expert. Charted data will be analyzed by categorizing and summarizing the individual data items based on the research questions. Tabular or graphical overviews will be given, if applicable. Results: The reporting follows the extension of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statements for Scoping Reviews. Electronic database searches in PubMed and Web of Science resulted in 469 matches after deduplication. As of September 2021, the scoping review is in the full-text screening stage. The data extraction using the pretested charting template will follow the full-text screening stage. We expect the scoping review report to be completed by February 2022. Conclusions: Information about the origin of healthcare data has a major impact on the quality and the reusability of scientific results as well as follow-up activities. This protocol outlines plans for a scoping review that will provide information about current approaches, challenges, or knowledge gaps with provenance tracking in biomedical sciences.
International Registered Report Identifier (IRRID): DERR1-10.2196/31750 UR - https://www.researchprotocols.org/2021/11/e31750 UR - http://dx.doi.org/10.2196/31750 UR - http://www.ncbi.nlm.nih.gov/pubmed/34813494 ID - info:doi/10.2196/31750 ER - TY - JOUR AU - Kasturi, N. Suranga AU - Park, Jeremy AU - Wild, David AU - Khan, Babar AU - Haggstrom, A. David AU - Grannis, Shaun PY - 2021/11/15 TI - Predicting COVID-19-Related Health Care Resource Utilization Across a Statewide Patient Population: Model Development Study JO - J Med Internet Res SP - e31337 VL - 23 IS - 11 KW - COVID-19 KW - machine learning KW - population health KW - health care utilization KW - health disparities KW - health information KW - epidemiology KW - public health KW - digital health KW - health data KW - pandemic KW - decision models KW - health informatics KW - healthcare resources N2 - Background: The COVID-19 pandemic has highlighted the inability of health systems to leverage existing system infrastructure in order to rapidly develop and apply broad analytical tools that could inform state- and national-level policymaking, as well as patient care delivery in hospital settings. The COVID-19 pandemic has also highlighted systemic disparities in health outcomes and access to care based on race or ethnicity, gender, income level, and urban-rural divide. Although the United States seems to be recovering from the COVID-19 pandemic owing to widespread vaccination efforts and increased public awareness, there is an urgent need to address the aforementioned challenges. Objective: This study aims to inform the feasibility of leveraging broad, statewide datasets for population health-driven decision-making by developing robust analytical models that predict COVID-19-related health care resource utilization across patients served by Indiana's statewide Health Information Exchange.
Methods: We leveraged comprehensive datasets obtained from the Indiana Network for Patient Care to train decision forest-based models that can predict patient-level need for health care resource utilization. To assess these models for potential biases, we tested model performance against subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural). Results: For model development, we identified a cohort of 96,026 patients from across 957 zip codes in Indiana, United States. We trained the decision models that predicted health care resource utilization by using approximately 100 of the most impactful features from a total of 1172 features created. Each model and stratified subpopulation under test reported precision scores >70%, accuracy and area under the receiver operating characteristic curve scores >80%, and sensitivity scores approximately >90%. We noted statistically significant variations in model performance across stratified subpopulations identified by age, race or ethnicity, gender, and residence (urban vs rural). Conclusions: This study presents the possibility of developing decision models capable of predicting patient-level health care resource utilization across a broad, statewide region with considerable predictive performance. However, our models present statistically significant variations in performance across stratified subpopulations of interest. Further efforts are necessary to identify root causes of these biases and to rectify them.
UR - https://www.jmir.org/2021/11/e31337 UR - http://dx.doi.org/10.2196/31337 UR - http://www.ncbi.nlm.nih.gov/pubmed/34581671 ID - info:doi/10.2196/31337 ER - TY - JOUR AU - Murtas, Rossella AU - Morici, Nuccia AU - Cogliati, Chiara AU - Puoti, Massimo AU - Omazzi, Barbara AU - Bergamaschi, Walter AU - Voza, Antonio AU - Rovere Querini, Patrizia AU - Stefanini, Giulio AU - Manfredi, Grazia Maria AU - Zocchi, Teresa Maria AU - Mangiagalli, Andrea AU - Brambilla, Vittoria Carla AU - Bosio, Marco AU - Corradin, Matteo AU - Cortellaro, Francesca AU - Trivelli, Marco AU - Savonitto, Stefano AU - Russo, Giampiero Antonio PY - 2021/11/15 TI - Algorithm for Individual Prediction of COVID-19-Related Hospitalization Based on Symptoms: Development and Implementation Study JO - JMIR Public Health Surveill SP - e29504 VL - 7 IS - 11 KW - COVID-19 KW - severe outcome KW - prediction KW - monitoring system KW - symptoms KW - risk prediction KW - risk KW - algorithms KW - prediction models KW - pandemic KW - digital data KW - health records N2 - Background: The COVID-19 pandemic has placed a huge strain on the health care system globally. The metropolitan area of Milan, Italy, was one of the regions most impacted by the COVID-19 pandemic worldwide. Risk prediction models developed by combining administrative databases and basic clinical data are needed to stratify individual patient risk for public health purposes. Objective: This study aims to develop a stratification tool aimed at improving COVID-19 patient management and health care organization. Methods: A predictive algorithm was developed and applied to 36,834 patients with COVID-19 in Italy between March 8 and October 9, 2020, in order to predict their risk of hospitalization. Exposures considered were age, sex, comorbidities, and symptoms associated with COVID-19 (eg, vomiting, cough, fever, diarrhea, myalgia, asthenia, headache, anosmia, ageusia, and dyspnea).
The outcome was hospitalization or emergency department admission for COVID-19. Discrimination and calibration of the model were also assessed. Results: The predictive model showed a good fit for predicting COVID-19 hospitalization (C-index 0.79) and a good overall prediction accuracy (Brier score 0.14). The model was well calibrated (intercept -0.0028, slope 0.9970). Based on these results, 118,804 patients diagnosed with COVID-19 from October 25 to December 11, 2020, were stratified into low, medium, and high risk for COVID-19 severity. Among the overall study population, 67,030 (56.42%) were classified as low-risk patients; 43,886 (36.94%), as medium-risk patients; and 7888 (6.64%), as high-risk patients. In all, 89.37% (106,179/118,804) of the overall study population was being assisted at home, 9% (10,695/118,804) was hospitalized, and 1.62% (1930/118,804) died. Among those assisted at home, most people (63,983/106,179, 60.26%) were classified as low risk, whereas only 3.63% (3858/106,179) were classified as high risk. According to ordinal logistic regression, the odds ratio (OR) of being hospitalized or dead was 5.0 (95% CI 4.6-5.4) among high-risk patients and 2.7 (95% CI 2.6-2.9) among medium-risk patients, as compared to low-risk patients. Conclusions: A simple monitoring system, based on primary care data sets linked to COVID-19 testing results, hospital admissions data, and death records, may assist in the proper planning and allocation of patients and resources during the ongoing COVID-19 pandemic.
UR - https://publichealth.jmir.org/2021/11/e29504 UR - http://dx.doi.org/10.2196/29504 UR - http://www.ncbi.nlm.nih.gov/pubmed/34543227 ID - info:doi/10.2196/29504 ER - TY - JOUR AU - Hammam, Nevin AU - Izadi, Zara AU - Li, Jing AU - Evans, Michael AU - Kay, Julia AU - Shiboski, Stephen AU - Schmajuk, Gabriela AU - Yazdany, Jinoos PY - 2021/11/12 TI - The Relationship Between Electronic Health Record System and Performance on Quality Measures in the American College of Rheumatology's Rheumatology Informatics System for Effectiveness (RISE) Registry: Observational Study JO - JMIR Med Inform SP - e31186 VL - 9 IS - 11 KW - rheumatoid arthritis KW - electronic health record KW - patient-reported outcomes KW - quality measures KW - disease activity KW - quality of care KW - performance reporting KW - medical informatics KW - clinical informatics N2 - Background: Routine collection of disease activity (DA) and patient-reported outcomes (PROs) in rheumatoid arthritis (RA) are nationally endorsed quality measures and critical components of a treat-to-target approach. However, little is known about the role electronic health record (EHR) systems play in facilitating performance on these measures. Objective: Using the American College of Rheumatology's (ACR's) RISE registry, we analyzed the relationship between EHR system and performance on DA and functional status (FS) quality measures. Methods: We analyzed data collected in 2018 from practices enrolled in RISE. We assessed practice-level performance on quality measures that require DA and FS documentation. Multivariable linear regression and zero-inflated negative binomial models were used to examine the independent effect of EHR system on practice-level quality measure performance, adjusting for practice characteristics and patient case-mix. Results: In total, 220 included practices cared for 314,793 patients with RA. NextGen was the most commonly used EHR system (34.1%).
We found wide variation in performance on DA and FS quality measures by EHR system (median 30.1, IQR 0-74.8, and median 9.0, IQR 0-74.2, respectively). Even after adjustment, NextGen practices performed significantly better than Allscripts on the DA measure (51.4% vs 5.0%; P<.05) and significantly better than eClinicalWorks and eMDs on the FS measure (49.3% vs 29.0% and 10.9%; P<.05). Conclusions: Performance on national RA quality measures was associated with the EHR system, even after adjusting for practice and patient characteristics. These findings suggest that future efforts to improve quality of care in RA should focus not only on provider performance reporting but also on developing and implementing rheumatology-specific standards across EHRs. UR - https://medinform.jmir.org/2021/11/e31186 UR - http://dx.doi.org/10.2196/31186 UR - http://www.ncbi.nlm.nih.gov/pubmed/34766910 ID - info:doi/10.2196/31186 ER - TY - JOUR AU - McKenzie, Jordan AU - Rajapakshe, Rasika AU - Shen, Hua AU - Rajapakshe, Shan AU - Lin, Angela PY - 2021/11/12 TI - A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study JO - JMIR Med Inform SP - e29241 VL - 9 IS - 11 KW - chart review KW - natural language processing KW - text extraction KW - radiation pneumonitis KW - lung cancer KW - radiation therapy KW - python KW - electronic medical record KW - accuracy N2 - Background: Health research frequently requires manual chart reviews to identify patients in a study-specific cohort and examine their clinical outcomes. Manual chart review is a labor-intensive process that requires significant time investment for clinical researchers.
Objective: This study aims to evaluate the feasibility and accuracy of an assisted chart review program, using an in-house rule-based text-extraction program written in Python, to identify patients who developed radiation pneumonitis (RP) after receiving curative radiotherapy. Methods: A retrospective manual chart review was completed for patients who received curative radiotherapy for stage 2-3 lung cancer from January 1, 2013 to December 31, 2015, at British Columbia Cancer, Kelowna Centre. In the manual chart review, RP diagnosis and grading were recorded using the Common Terminology Criteria for Adverse Events version 5.0. From the charts of 50 sample patients, a total of 1413 clinical documents were obtained for review from the electronic medical record system. The text-extraction program was built using the Natural Language Toolkit Python platform and regular expressions (RegEx). Python version 3.7.2 was used to run the text-extraction program. The output of the text-extraction program was a list of the full sentences containing the key terms, document IDs, and dates from which these sentences were extracted. The results from the manual review were used as the gold standard in this study, with which the results of the text-extraction program were compared. Results: Fifty percent (25/50) of the sample patients developed grade ≥1 RP; the natural language processing program was able to ascertain 92% (23/25) of these patients (sensitivity 0.92, 95% CI 0.74-0.99; specificity 0.36, 95% CI 0.18-0.57). Furthermore, the text-extraction program was able to correctly identify all 9 patients with grade ≥2 RP, which are patients with clinically significant symptoms (sensitivity 1.0, 95% CI 0.66-1.0; specificity 0.27, 95% CI 0.14-0.43). The program was useful for distinguishing patients with RP from those without RP.
The text-extraction program in this study avoided unnecessary manual review of 22% (11/50) of the sample patients, as these patients were identified as grade 0 RP and would not require further manual review in subsequent studies. Conclusions: This feasibility study showed that the text-extraction program was able to assist with the identification of patients who developed RP after curative radiotherapy. The program streamlines the manual chart review further by identifying the key sentences of interest. This work has the potential to improve future clinical research, as the text-extraction program shows promise in performing chart review in a more time-efficient manner, compared with the traditional labor-intensive manual chart review. UR - https://medinform.jmir.org/2021/11/e29241 UR - http://dx.doi.org/10.2196/29241 UR - http://www.ncbi.nlm.nih.gov/pubmed/34766919 ID - info:doi/10.2196/29241 ER - TY - JOUR AU - Elkin, L. Peter AU - Mullin, Sarah AU - Mardekian, Jack AU - Crowner, Christopher AU - Sakilay, Sylvester AU - Sinha, Shyamashree AU - Brady, Gary AU - Wright, Marcia AU - Nolen, Kimberly AU - Trainer, JoAnn AU - Koppel, Ross AU - Schlegel, Daniel AU - Kaushik, Sashank AU - Zhao, Jane AU - Song, Buer AU - Anand, Edwin PY - 2021/11/9 TI - Using Artificial Intelligence With Natural Language Processing to Combine Electronic Health Record's Structured and Free Text Data to Identify Nonvalvular Atrial Fibrillation to Decrease Strokes and Death: Evaluation and Case-Control Study JO - J Med Internet Res SP - e28946 VL - 23 IS - 11 KW - afib KW - atrial fibrillation KW - artificial intelligence KW - NVAF KW - natural language processing KW - stroke risk KW - bleed risk KW - CHA2DS2-VASc KW - HAS-BLED KW - bio-surveillance N2 - Background: Nonvalvular atrial fibrillation (NVAF) affects almost 6 million Americans and is a major contributor to stroke but is significantly undiagnosed and undertreated despite explicit guidelines for oral anticoagulation.
Objective: The aim of this study is to investigate whether the use of semisupervised natural language processing (NLP) of electronic health record's (EHR) free-text information combined with structured EHR data improves NVAF discovery and treatment and perhaps offers a method to prevent thousands of deaths and save billions of dollars. Methods: We abstracted 96,681 participants from the University of Buffalo faculty practice's EHR. NLP was used to index the notes and compare the ability to identify NVAF, congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, stroke or transient ischemic attack, vascular disease, age 65 to 74 years, sex category (CHA2DS2-VASc), and Hypertension, Abnormal liver/renal function, Stroke history, Bleeding history or predisposition, Labile INR, Elderly, Drug/alcohol usage (HAS-BLED) scores using structured data (International Classification of Diseases codes) versus structured and unstructured data from clinical notes. In addition, we analyzed data from 63,296,120 participants in the Optum and Truven databases to determine the NVAF frequency, rates of CHA2DS2-VASc ≥2, and no contraindications to oral anticoagulants, rates of stroke and death in the untreated population, and first year's costs after stroke. Results: The structured-plus-unstructured method would have identified 3,976,056 additional true NVAF cases (P<.001) and improved sensitivity for CHA2DS2-VASc and HAS-BLED scores compared with the structured data alone (P=.002 and P<.001, respectively), causing a 32.1% improvement. For the United States, this method would prevent an estimated 176,537 strokes, save 10,575 lives, and save >US $13.5 billion. Conclusions: Artificial intelligence–informed bio-surveillance combining NLP of free-text information with structured EHR data improves data completeness, prevents thousands of strokes, and saves lives and funds. This method is applicable to many disorders with profound public health consequences.
UR - https://www.jmir.org/2021/11/e28946 UR - http://dx.doi.org/10.2196/28946 UR - http://www.ncbi.nlm.nih.gov/pubmed/34751659 ID - info:doi/10.2196/28946 ER - TY - JOUR AU - Gong, Jianxia AU - Sihag, Vikrant AU - Kong, Qingxia AU - Zhao, Lindu PY - 2021/11/1 TI - Visualizing Knowledge Evolution Trends and Research Hotspots of Personal Health Data Research: Bibliometric Analysis JO - JMIR Med Inform SP - e31142 VL - 9 IS - 11 KW - knowledge evolution trends KW - research hotspots KW - personal health data KW - bibliometrics N2 - Background: The recent surge in clinical and nonclinical health-related data has been accompanied by a concomitant increase in personal health data (PHD) research across multiple disciplines such as medicine, computer science, and management. There is now a need to synthesize the dynamic knowledge of PHD in various disciplines to spot potential research hotspots. Objective: The aim of this study was to reveal the knowledge evolutionary trends in PHD and detect potential research hotspots using bibliometric analysis. Methods: We collected 8281 articles published between 2009 and 2018 from the Web of Science database. The knowledge evolution analysis (KEA) framework was used to analyze the evolution of PHD research. The KEA framework is a bibliometric approach that is based on 3 knowledge networks: reference co-citation, keyword co-occurrence, and discipline co-occurrence. Results: The findings show that the focus of PHD research has evolved from medicine centric to technology centric to human centric since 2009. The most active PHD knowledge cluster is developing knowledge resources and allocating scarce resources. The field of computer science, especially the topic of artificial intelligence (AI), has been the focal point of recent empirical studies on PHD. Topics related to psychology and human factors (eg, attitude, satisfaction, education) are also receiving more attention. 
Conclusions: Our analysis shows that PHD research has the potential to provide value-based health care in the future. All stakeholders should be educated about AI technology to promote value generation through PHD. Moreover, technology developers and health care institutions should consider human factors to facilitate the effective adoption of PHD-related technology. These findings indicate opportunities for interdisciplinary cooperation in several PHD research areas: (1) AI applications for PHD; (2) regulatory issues and governance of PHD; (3) education of all stakeholders about AI technology; and (4) value-based health care including "allocative value," "technology value," and "personalized value." UR - https://medinform.jmir.org/2021/11/e31142 UR - http://dx.doi.org/10.2196/31142 UR - http://www.ncbi.nlm.nih.gov/pubmed/34723823 ID - info:doi/10.2196/31142 ER - TY - JOUR AU - Zanotto, Stella Bruna AU - Beck da Silva Etges, Paula Ana AU - dal Bosco, Avner AU - Cortes, Gabriel Eduardo AU - Ruschel, Renata AU - De Souza, Claudia Ana AU - Andrade, V. Claudio M. AU - Viegas, Felipe AU - Canuto, Sergio AU - Luiz, Washington AU - Ouriques Martins, Sheila AU - Vieira, Renata AU - Polanczyk, Carisi AU - André Gonçalves, Marcos PY - 2021/11/1 TI - Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers JO - JMIR Med Inform SP - e29120 VL - 9 IS - 11 KW - natural language processing KW - stroke KW - outcomes KW - electronic medical records KW - EHR KW - electronic health records KW - text processing KW - data mining KW - text classification KW - patient outcomes N2 - Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management.
Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. 
The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations. 
UR - https://medinform.jmir.org/2021/11/e29120 UR - http://dx.doi.org/10.2196/29120 UR - http://www.ncbi.nlm.nih.gov/pubmed/34723829 ID - info:doi/10.2196/29120 ER - TY - JOUR AU - Teramoto, Kei AU - Takeda, Toshihiro AU - Mihara, Naoki AU - Shimai, Yoshie AU - Manabe, Shirou AU - Kuwata, Shigeki AU - Kondoh, Hiroshi AU - Matsumura, Yasushi PY - 2021/11/1 TI - Detecting Adverse Drug Events Through the Chronological Relationship Between the Medication Period and the Presence of Adverse Reactions From Electronic Medical Record Systems: Observational Study JO - JMIR Med Inform SP - e28763 VL - 9 IS - 11 KW - real world data KW - electronic medical record KW - adverse drug event N2 - Background: Medicines may cause various adverse reactions. An enormous amount of money and effort is spent investigating adverse drug events (ADEs) in clinical trials and postmarketing surveillance. Real-world data from multiple electronic medical records (EMRs) can make it easy to understand the ADEs that occur in actual patients. Objective: In this study, we generated a patient medication history database from physician orders recorded in EMRs, which allowed the period of medication to be clearly identified. Methods: We developed a method for detecting ADEs based on the chronological relationship between the presence of an adverse event and the medication period. To verify our method, we detected ADEs with alanine aminotransferase (ALT) elevation in patients receiving aspirin, clopidogrel, and ticlopidine. The accuracy of the detection was evaluated with a chart review and by comparison with the Roussel Uclaf Causality Assessment Method (RUCAM), which is a standard method for detecting drug-induced liver injury. Results: The calculated rates of ADE with ALT elevation in patients receiving aspirin, clopidogrel, and ticlopidine were 3.33% (868/26,059 patients), 3.70% (188/5076 patients), and 5.69% (226/3974 patients), respectively, which were in line with the rates of previous reports.
We reviewed the medical records of the patients in whom ADEs were detected. Our method accurately predicted ADEs in 90% (27/30 patients) treated with aspirin, 100% (9/9 patients) treated with clopidogrel, and 100% (4/4 patients) treated with ticlopidine. Only 3 ADEs that were detected by the RUCAM were not detected by our method. Conclusions: These findings demonstrate that the present method is effective for detecting ADEs based on EMR data. UR - https://medinform.jmir.org/2021/11/e28763 UR - http://dx.doi.org/10.2196/28763 UR - http://www.ncbi.nlm.nih.gov/pubmed/33993103 ID - info:doi/10.2196/28763 ER - TY - JOUR AU - Nunes Vilaza, Giovanna AU - Coyle, David AU - Bardram, Eyvind Jakob PY - 2021/10/29 TI - Public Attitudes to Digital Health Research Repositories: Cross-sectional International Survey JO - J Med Internet Res SP - e31294 VL - 23 IS - 10 KW - digital medicine KW - health informatics KW - health data repositories KW - personal sensing KW - technology acceptance KW - willingness to share data KW - human-centered computing KW - ethics N2 - Background: Digital health research repositories propose sharing longitudinal streams of health records and personal sensing data between multiple projects and researchers. Motivated by the prospect of personalizing patient care (precision medicine), these initiatives demand broad public acceptance and large numbers of data contributors, both of which are challenging. Objective: This study investigates public attitudes toward possibly contributing to digital health research repositories to identify factors for their acceptance and to inform future developments. Methods: A cross-sectional online survey was conducted from March 2020 to December 2020. Because of the funded project scope and a multicenter collaboration, study recruitment targeted young adults in Denmark and Brazil, allowing an analysis of the differences between 2 very contrasting national contexts.
Through closed-ended questions, the survey examined participants' willingness to share different data types, data access preferences, reasons for concern, and motivations to contribute. The survey also collected information about participants' demographics, level of interest in health topics, previous participation in health research, awareness of examples of existing research data repositories, and current attitudes about digital health research repositories. Data analysis consisted of descriptive frequency measures and statistical inferences (bivariate associations and logistic regressions). Results: The sample comprises 1017 respondents living in Brazil (1017/1600, 63.56%) and 583 in Denmark (583/1600, 36.44%). The demographics do not differ substantially between participants of these countries. The majority is aged between 18 and 27 years (933/1600, 58.31%), is highly educated (992/1600, 62.00%), uses smartphones (1562/1600, 97.63%), and is in good health (1407/1600, 87.94%). The analysis shows a vast majority were very motivated by helping future patients (1366/1600, 85.38%) and researchers (1253/1600, 78.31%), yet very concerned about unethical projects (1219/1600, 76.19%), profit making without consent (1096/1600, 68.50%), and cyberattacks (1055/1600, 65.94%). Participants' willingness to share data is lower when sharing personal sensing data, such as the content of calls and texts (1206/1600, 75.38%), in contrast to more traditional health research information. Only 13.44% (215/1600) find it desirable to grant data access to private companies, and most would like to stay informed about which projects use their data (1334/1600, 83.38%) and control future data access (1181/1600, 73.81%).
Findings indicate that favorable attitudes toward digital health research repositories are related to a personal interest in health topics (odds ratio [OR] 1.49, 95% CI 1.10-2.02; P=.01), previous participation in health research studies (OR 1.70, 95% CI 1.24-2.35; P=.001), and awareness of examples of research repositories (OR 2.78, 95% CI 1.83-4.38; P<.001). Conclusions: This study reveals essential factors for acceptance and willingness to share personal data with digital health research repositories. Implications include the importance of being more transparent about the goals and beneficiaries of research projects using and re-using data from repositories, providing participants with greater autonomy for choosing who gets access to which parts of their data, and raising public awareness of the benefits of data sharing for research. In addition, future developments should engage with and reduce risks for those unwilling to participate. UR - https://www.jmir.org/2021/10/e31294 UR - http://dx.doi.org/10.2196/31294 UR - http://www.ncbi.nlm.nih.gov/pubmed/34714253 ID - info:doi/10.2196/31294 ER - TY - JOUR AU - Lamer, Antoine AU - Abou-Arab, Osama AU - Bourgeois, Alexandre AU - Parrot, Adrien AU - Popoff, Benjamin AU - Beuscart, Jean-Baptiste AU - Tavernier, Benoît AU - Moussa, Djahoum Mouhamed PY - 2021/10/29 TI - Transforming Anesthesia Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study JO - J Med Internet Res SP - e29259 VL - 23 IS - 10 KW - data reuse KW - common data model KW - Observational Medical Outcomes Partnership KW - anesthesia KW - data warehouse KW - reproducible research N2 - Background: Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. 
To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM. Objective: The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM. Methods: Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards. Results: We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care. Conclusions: Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies.
However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse. UR - https://www.jmir.org/2021/10/e29259 UR - http://dx.doi.org/10.2196/29259 UR - http://www.ncbi.nlm.nih.gov/pubmed/34714250 ID - info:doi/10.2196/29259 ER - TY - JOUR AU - Agrawal, Lavlin AU - Ndabu, Theophile AU - Mulgund, Pavankumar AU - Sharman, Raj PY - 2021/10/28 TI - Factors Affecting the Extent of Patients' Electronic Medical Record Use: An Empirical Study Focusing on System and Patient Characteristics JO - J Med Internet Res SP - e30637 VL - 23 IS - 10 KW - electronic medical record KW - patient safety KW - caregiver KW - chronic conditions KW - HINTS dataset KW - patient technology acceptance model N2 - Background: Patients' access to and use of electronic medical records (EMRs) places greater information in their hands, which helps them better comanage their health, leading to better clinical outcomes. Despite numerous benefits that promote health and well-being, patients' acceptance and use of EMRs remains low. We study the impact of predictors that affect the use of EMR by patients to understand better the underlying causal factors for the lower use of EMR. Objective: This study aims to examine the critical system (eg, performance expectancy and effort expectancy) and patient characteristics (eg, health condition, issue involvement, preventive health behaviors, and caregiving status) that influence the extent of patients' EMR use. Methods: We used secondary data collected by Health Information National Trends Survey 5 cycle 3 and performed survey data analysis using structural equation modeling technique to test our hypotheses. Structural equation modeling is a technique commonly used to measure and analyze the relationships of observed and latent variables.
We also addressed common method bias to understand if there was any systematic effect on the observed correlation between the measures for the predictor and predicted variables. Results: The statistically significant drivers of the extent of EMR use were performance expectancy (β=.253; P<.001), perceived behavior control (β=.236; P<.001), health knowledge (β=−.071; P=.007), caregiving status (β=.059; P=.013), issue involvement (β=.356; P<.001), chronic conditions (β=.071; P=.016), and preventive health behavior (β=.076; P=.005). The model accounted for 32.9% of the variance in the extent of EMR use. Conclusions: The study found that health characteristics, such as chronic conditions and patient disposition (eg, preventive health behavior and issue involvement), directly affect the extent of EMR use. The study also revealed that issue involvement mediates the impact of preventive health behaviors and the presence of chronic conditions on the extent of patients' EMR use. UR - https://www.jmir.org/2021/10/e30637 UR - http://dx.doi.org/10.2196/30637 UR - http://www.ncbi.nlm.nih.gov/pubmed/34709181 ID - info:doi/10.2196/30637 ER - TY - JOUR AU - Borysowski, Jan AU - Górski, Andrzej PY - 2021/10/28 TI - ClinicalTrials.gov as a Source of Information About Expanded Access Programs: Cohort Study JO - J Med Internet Res SP - e26890 VL - 23 IS - 10 KW - ClinicalTrials.gov KW - expanded access KW - expanded access program KW - compassionate use KW - unapproved drug KW - investigational drug N2 - Background: ClinicalTrials.gov (CT.gov) is the most comprehensive internet-based register of different types of clinical studies. Expanded access is the use of unapproved drugs, biologics, or medical devices outside of clinical trials. One of the key problems in expanded access is the availability to both health care providers and patients of information about unapproved treatments.
Objective: We aimed to evaluate CT.gov as a potential source of information about expanded access programs. Methods: We assessed the completeness of information in the records of 228 expanded access programs registered with CT.gov from February 2017 through May 2020. Moreover, we examined what percentage of published expanded access studies has been registered with CT.gov. Logistic regression (univariate and multivariate) and mediation analyses were used to identify the predictors of the absence of some information and a study's nonregistration. Results: We found that some important data were missing from the records of many programs. Information that was missing most often included a detailed study description, facility information, central contact person, and eligibility criteria (55.3%, 54.0%, 41.7%, and 17.5% of the programs, respectively). Multivariate analysis showed that information about central contact person was more likely to be missing from records of studies registered in 2017 (adjusted OR 21.93; 95% CI 4.42-172.29; P<.001). This finding was confirmed by mediation analysis (P=.02). Furthermore, 14% of the programs were registered retrospectively. We also showed that only 33 of 77 (42.9%) expanded access studies performed in the United States and published from 2014 through 2019 were registered with CT.gov. However, multivariate logistic regression analysis showed no significant association between any of the variables related to the studies and the odds of study nonregistration (P>.01). Conclusions: Currently, CT.gov is a quite fragmentary source of data on expanded access programs. This problem is important because CT.gov is the only publicly available primary source of information about specific programs. We suggest the actions that should be taken by different stakeholders to fully exploit this register as a source of information about expanded access.
UR - https://www.jmir.org/2021/10/e26890 UR - http://dx.doi.org/10.2196/26890 UR - http://www.ncbi.nlm.nih.gov/pubmed/34709189 ID - info:doi/10.2196/26890 ER - TY - JOUR AU - Abid, Leila AU - Kammoun, Ikram AU - Ben Halima, Manel AU - Charfeddine, Salma AU - Ben Slima, Hedi AU - Drissa, Meriem AU - Mzoughi, Khadija AU - Mbarek, Dorra AU - Riahi, Leila AU - Antit, Saoussen AU - Ben Halima, Afef AU - Ouechtati, Wejdene AU - Allouche, Emna AU - Mechri, Mehdi AU - Yousfi, Chedi AU - Khorchani, Ali AU - Abid, Omar AU - Sammoud, Kais AU - Ezzaouia, Khaled AU - Gtif, Imen AU - Ouali, Sana AU - Triki, Feten AU - Hamdi, Sonia AU - Boudiche, Selim AU - Chebbi, Marwa AU - Hentati, Mouna AU - Farah, Amani AU - Triki, Habib AU - Ghardallou, Houda AU - Raddaoui, Haythem AU - Zayed, Sofien AU - Azaiez, Fares AU - Omri, Fadwa AU - Zouari, Akram AU - Ben Ali, Zine AU - Najjar, Aymen AU - Thabet, Houssem AU - Chaker, Mouna AU - Mohamed, Samar AU - Chouaieb, Marwa AU - Ben Jemaa, Abdelhamid AU - Tangour, Haythem AU - Kammoun, Yassmine AU - Bouhlel, Mahmoud AU - Azaiez, Seifeddine AU - Letaief, Rim AU - Maskhi, Salah AU - Amri, Aymen AU - Naanaa, Hela AU - Othmani, Raoudha AU - Chahbani, Iheb AU - Zargouni, Houcine AU - Abid, Syrine AU - Ayari, Mokdad AU - ben Ameur, Ines AU - Gasmi, Ali AU - ben Halima, Nejeh AU - Haouala, Habib AU - Boughzela, Essia AU - Zakhama, Lilia AU - ben Youssef, Soraya AU - Nasraoui, Wided AU - Boujnah, Rachid Mohamed AU - Barakett, Nadia AU - Kraiem, Sondes AU - Drissa, Habiba AU - Ben Khalfallah, Ali AU - Gamra, Habib AU - Kachboura, Salem AU - Bezdah, Leila AU - Baccar, Hedi AU - Milouchi, Sami AU - Sdiri, Wissem AU - Ben Omrane, Skander AU - Abdesselem, Salem AU - Kanoun, Alifa AU - Hezbri, Karima AU - Zannad, Faiez AU - Mebazaa, Alexandre AU - Kammoun, Samir AU - Mourali, Sami Mohamed AU - Addad, Faouzi PY - 2021/10/27 TI - Design and Rationale of the National Tunisian Registry of Heart Failure (NATURE-HF): Protocol for a Multicenter Registry Study JO - 
JMIR Res Protoc SP - e12262 VL - 10 IS - 10 KW - heart failure KW - acute heart failure KW - chronic heart failure KW - diagnosis KW - prognosis KW - treatment N2 - Background: The frequency of heart failure (HF) in Tunisia is on the rise and has now become a public health concern. This is mainly due to an aging Tunisian population (Tunisia has one of the oldest populations in Africa as well as the highest life expectancy on the continent) and an increase in coronary artery disease and hypertension. However, no extensive data are available on demographic characteristics, prognosis, and quality of care of patients with HF in Tunisia (nor in North Africa). Objective: The aim of this study was to analyze, follow, and evaluate patients with HF in a large nationwide multicenter trial. Methods: A total of 1700 patients with HF diagnosed by the investigator will be included in the National Tunisian Registry of Heart Failure study (NATURE-HF). Patients must visit the cardiology clinic 1, 3, and 12 months after study inclusion. This follow-up is provided by the investigator. All data are collected via the DACIMA Clinical Suite web interface. Results: At the end of the study, we will note the occurrence of cardiovascular death (sudden death, coronary artery disease, refractory HF, stroke), death from any cause (cardiovascular and noncardiovascular), and the occurrence of a rehospitalization episode for an HF relapse during the follow-up period. Based on these data, we will evaluate the demographic characteristics of the study patients, the characteristics of pathological antecedents, and symptomatic and clinical features of HF. In addition, we will report the paraclinical examination findings such as the laboratory standard parameters and brain natriuretic peptides, electrocardiogram or 24-hour Holter monitoring, echocardiography, and coronarography. 
We will also provide a description of the therapeutic environment and therapeutic changes that occur during the 1-year follow-up of patients, adverse events following medical treatment and intervention during the 3- and 12-month follow-up, the evaluation of left ventricular ejection fraction during the 3- and 12-month follow-up, the overall rate of rehospitalization over the 1-year follow-up for an HF relapse, and the rate of rehospitalization during the first 3 months after inclusion into the study. Conclusions: The NATURE-HF study will fill a significant gap in the dynamic landscape of HF care and research. It will provide unique and necessary data on the management and outcomes of patients with HF. This study will yield the largest contemporary longitudinal cohort of patients with HF in Tunisia. Trial Registration: ClinicalTrials.gov NCT03262675; https://clinicaltrials.gov/ct2/show/NCT03262675 International Registered Report Identifier (IRRID): DERR1-10.2196/12262 UR - https://www.researchprotocols.org/2021/10/e12262 UR - http://dx.doi.org/10.2196/12262 UR - http://www.ncbi.nlm.nih.gov/pubmed/34704958 ID - info:doi/10.2196/12262 ER - TY - JOUR AU - Li, Mengyang AU - Cai, Hailing AU - Nan, Shan AU - Li, Jialin AU - Lu, Xudong AU - Duan, Huilong PY - 2021/10/21 TI - A Patient-Screening Tool for Clinical Research Based on Electronic Health Records Using OpenEHR: Development Study JO - JMIR Med Inform SP - e33192 VL - 9 IS - 10 KW - openEHR KW - patient screening KW - electronic health record KW - clinical research N2 - Background: The widespread adoption of electronic health records (EHRs) has facilitated the secondary use of EHR data for clinical research. However, screening eligible patients from EHRs is a challenging task. The concepts in eligibility criteria are not completely matched with EHRs, especially derived concepts. The lack of high-level expression of Structured Query Language (SQL) makes it difficult and time consuming to express them. 
The openEHR Expression Language (EL) as a domain-specific language based on clinical information models shows promise for representing complex eligibility criteria. Objective: The study aims to develop a patient-screening tool based on EHRs for clinical research using openEHR to solve concept mismatch and improve query performance. Methods: A patient-screening tool based on EHRs using openEHR was proposed. It uses the advantages of information models and EL in openEHR to provide high-level expressions and improve query performance. First, openEHR archetypes and templates were chosen to define concepts called simple concepts directly from EHRs. Second, openEHR EL was used to generate derived concepts by combining simple concepts and constraints. Third, a hierarchical index corresponding to archetypes in Elasticsearch (ES) was generated to improve query performance for subqueries and join queries related to the derived concepts. Finally, we realized a patient-screening tool for clinical research. Results: In total, 500 sentences randomly selected from 4691 eligibility criteria in 389 clinical trials on stroke from the Chinese Clinical Trial Registry (ChiCTR) were evaluated. An openEHR-based clinical data repository (CDR) in a grade A tertiary hospital in China served as the experimental environment. Based on these, 589 medical concepts were found in the 500 sentences. Of them, 513 (87.1%) concepts could be represented, while the others could not, owing to a lack of information models and coarse-grained requirements. In addition, our case study on 6 queries demonstrated that our tool achieved better query performance in 4 of the 6 cases (66.67%). Conclusions: We developed a patient-screening tool using openEHR. It not only helps solve concept mismatch but also improves query performance to reduce the burden on researchers. In addition, we demonstrated a promising solution for secondary use of EHR data using openEHR, which can be referenced by other researchers. 
UR - https://medinform.jmir.org/2021/10/e33192 UR - http://dx.doi.org/10.2196/33192 UR - http://www.ncbi.nlm.nih.gov/pubmed/34673526 ID - info:doi/10.2196/33192 ER - TY - JOUR AU - Doyle, Riccardo PY - 2021/10/15 TI - Machine Learning–Based Prediction of COVID-19 Mortality With Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study JO - JMIRx Med SP - e29392 VL - 2 IS - 4 KW - COVID-19 KW - coronavirus KW - medical informatics KW - machine learning KW - artificial intelligence KW - dimensionality reduction KW - automation KW - model development KW - prediction KW - hospital KW - resource management KW - mortality KW - prognosis KW - triage KW - comorbidities KW - public data KW - epidemiology KW - pre-existing conditions N2 - Background: The onset and development of the COVID-19 pandemic have placed pressure on hospital resources and staff worldwide. The integration of more streamlined predictive modeling in prognosis and triage-related decision-making can partly ease this pressure. Objective: The objective of this study is to assess the performance impact of dimensionality reduction on COVID-19 mortality prediction models, demonstrating the high impact of a limited number of features to limit the need for complex variable gathering before reaching meaningful risk labelling in clinical settings. Methods: Standard machine learning classifiers were employed to predict an outcome of either death or recovery using 25 patient-level variables, spanning symptoms, comorbidities, and demographic information, from a geographically diverse sample representing 17 countries. The effects of feature reduction on the data were tested by running classifiers on a high-quality data set of 212 patients with populated entries for all 25 available features. The full data set was compared to two reduced variations with 7 features and 1 feature, respectively, extracted using univariate mutual information and chi-square testing. 
Classifier performance on each data set was then assessed on the basis of accuracy, sensitivity, specificity, and receiver operating characteristic–derived area under the curve metrics to quantify benefit or loss from reduction. Results: The performance of the classifiers on the 212-patient sample resulted in strong mortality detection, with the highest-performing model achieving specificity of 90.7% (95% CI 89.1%-92.3%) and sensitivity of 92.0% (95% CI 91.0%-92.9%). Dimensionality reduction provided strong benefits for performance. The baseline accuracy of a random forest classifier increased from 89.2% (95% CI 88.0%-90.4%) to 92.5% (95% CI 91.9%-93.0%) when training on 7 chi-square–extracted features and to 90.8% (95% CI 89.8%-91.7%) when training on 7 mutual information–extracted features. Reduction impact on a separate logistic classifier was mixed; however, when present, losses were marginal compared to the extent of feature reduction, altogether showing that reduction either improves performance or can reduce the variable-sourcing burden at hospital admission with little performance loss. Extreme feature reduction to a single most salient feature, often age, demonstrated large standalone explanatory power, with the best-performing model achieving an accuracy of 81.6% (95% CI 81.1%-82.1%); this demonstrates the relatively marginal improvement that additional variables bring to the tested models. Conclusions: Predictive statistical models have promising performance in early prediction of death among patients with COVID-19. Strong dimensionality reduction was shown to further improve baseline performance on selected classifiers and only marginally reduce it in others, highlighting the importance of feature reduction in future model construction and the feasibility of deprioritizing large, hard-to-source, and nonessential feature sets in real-world settings. 
UR - https://med.jmirx.org/2021/4/e29392 UR - http://dx.doi.org/10.2196/29392 UR - http://www.ncbi.nlm.nih.gov/pubmed/34843609 ID - info:doi/10.2196/29392 ER - TY - JOUR AU - Zuo, Zheming AU - Watson, Matthew AU - Budgen, David AU - Hall, Robert AU - Kennelly, Chris AU - Al Moubayed, Noura PY - 2021/10/15 TI - Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study JO - JMIR Med Inform SP - e29871 VL - 9 IS - 10 KW - healthcare KW - privacy-preserving KW - GDPR KW - DPA 2018 KW - EHR KW - SLM KW - data science KW - anonymization KW - reidentification risk KW - usability N2 - Background: Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union's General Data Protection Regulation (2016) and the United Kingdom's Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. 
In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy. Objective: In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care. Methods: We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization. Results: We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning–based privacy models; 4 and 19 papers included 7 and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we also evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization. Conclusions: This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable; yet, it requires more research efforts in practical implementations to balance privacy preservation and usability to ensure more reliable health care applications. 
UR - https://medinform.jmir.org/2021/10/e29871 UR - http://dx.doi.org/10.2196/29871 UR - http://www.ncbi.nlm.nih.gov/pubmed/34652278 ID - info:doi/10.2196/29871 ER - TY - JOUR AU - Gaudet-Blavignac, Christophe AU - Rudaz, Andrea AU - Lovis, Christian PY - 2021/10/13 TI - Building a Shared, Scalable, and Sustainable Source for the Problem-Oriented Medical Record: Developmental Study JO - JMIR Med Inform SP - e29174 VL - 9 IS - 10 KW - medical records KW - problem-oriented KW - electronic health records KW - semantics N2 - Background: Since the creation of the problem-oriented medical record, the building of problem lists has been the focus of many studies. To date, this issue is not well resolved, and building an appropriate contextualized problem list is still a challenge. Objective: This paper aims to present the process of building a shared multipurpose common problem list at the Geneva University Hospitals. This list aims to bridge the gap between clinicians' language expressed in free text and secondary uses requiring structured information. Methods: We focused on the needs of clinicians by building a list of uniquely identified expressions to support their daily activities. In the second stage, these expressions were connected to additional information to build a complex graph of information. A list of 45,946 expressions manually extracted from clinical documents was manually curated and encoded in multiple semantic dimensions, such as International Classification of Diseases, 10th revision; International Classification of Primary Care 2nd edition; Systematized Nomenclature of Medicine Clinical Terms; or dimensions dictated by specific usages, such as identifying expressions specific to a domain, a gender, or an intervention. The list was progressively deployed for clinicians with an iterative process of quality control, maintenance, and improvements, including the addition of new expressions or dimensions for specific needs. 
The problem management of the electronic health record allowed the measurement and correction of encoding based on real-world use. Results: The list was deployed in production in January 2017 and was regularly updated and deployed in new divisions of the hospital. Over 4 years, 684,102 problems were created using the list. The proportion of free-text entries decreased progressively from 37.47% (8321/22,206) in December 2017 to 18.38% (4547/24,738) in December 2020. In the last version of the list, over 14 dimensions were mapped to expressions, among which 5 were international classifications and 8 were other classifications for specific uses. The list became a central axis in the electronic health record, being used for many different purposes linked to care, such as surgical planning or emergency wards, or in research, for various predictions using machine learning techniques. Conclusions: This study breaks with common approaches primarily by focusing on real clinicians' language when expressing patients' problems and secondarily by mapping whatever is required, including controlled vocabularies to answer specific needs. This approach improves the quality of the expression of patients' problems while allowing the building of as many structured dimensions as needed to convey semantics according to specific contexts. The method is shown to be scalable, sustainable, and efficient at hiding the complexity of semantics or the burden of constraint-structured problem list entry for clinicians. Ongoing work is analyzing the impact of this approach on how clinicians express patients' problems. 
UR - https://medinform.jmir.org/2021/10/e29174 UR - http://dx.doi.org/10.2196/29174 UR - http://www.ncbi.nlm.nih.gov/pubmed/34643542 ID - info:doi/10.2196/29174 ER - TY - JOUR AU - Berenspöhler, Sarah AU - Minnerup, Jens AU - Dugas, Martin AU - Varghese, Julian PY - 2021/10/12 TI - Common Data Elements for Meaningful Stroke Documentation in Routine Care and Clinical Research: Retrospective Data Analysis JO - JMIR Med Inform SP - e27396 VL - 9 IS - 10 KW - common data elements KW - stroke KW - documentation N2 - Background: Medical information management for stroke patients is currently a very time-consuming endeavor. There are clear guidelines and procedures to treat patients having acute stroke, but it is not known how well these established practices are reflected in patient documentation. Objective: This study compares a variety of documentation processes regarding stroke. The main objective of this work is to provide an overview of the most commonly occurring medical concepts in stroke documentation and identify overlaps between different documentation contexts to allow for the definition of a core data set that could be used in potential data interfaces. Methods: Medical source documentation forms from different documentation contexts, including hospitals, clinical trials, registries, and international standards, regarding stroke treatment followed by rehabilitation were digitized in the operational data model. Each source data element was semantically annotated using the Unified Medical Language System. The concept codes were analyzed for semantic overlaps. A concept was considered common if it appeared in at least two documentation contexts. The resulting common concepts were extended with implementation details, including data types and permissible values based on frequent patterns of source data elements, using an established expert-based and semiautomatic approach. 
Results: In total, 3287 data elements were identified, and 1051 of these emerged as unique medical concepts. The 100 most frequent medical concepts cover 9.51% (100/1051) of all concept occurrences in stroke documentation, and the 50 most frequent concepts cover 4.75% (50/1051). A list of common data elements was implemented in different standardized machine-readable formats on a public metadata repository for interoperable reuse. Conclusions: Standardization of medical documentation is a prerequisite for data exchange as well as the transferability and reuse of data. In the long run, standardization would save time and money and extend the capabilities for which such data could be used. In the context of this work, a lack of standardization was observed regarding current information management. Free-form text fields and intricate questions complicate automated data access and transfer between institutions. This work also revealed the potential of a unified documentation process as a core data set of the 50 most frequent common data elements, accounting for 34% of the documentation in medical information management. Such a data set offers a starting point for standardized and interoperable data collection in routine care, quality management, and clinical research. UR - https://medinform.jmir.org/2021/10/e27396 UR - http://dx.doi.org/10.2196/27396 UR - http://www.ncbi.nlm.nih.gov/pubmed/34636733 ID - info:doi/10.2196/27396 ER - TY - JOUR AU - Weber, M. Griffin AU - Zhang, G. Harrison AU - L'Yi, Sehi AU - Bonzel, Clara-Lea AU - Hong, Chuan AU - Avillach, Paul AU - Gutiérrez-Sacristán, Alba AU - Palmer, P. Nathan AU - Tan, Min Amelia Li AU - Wang, Xuan AU - Yuan, William AU - Gehlenborg, Nils AU - Alloni, Anna AU - Amendola, F. Danilo AU - Bellasi, Antonio AU - Bellazzi, Riccardo AU - Beraghi, Michele AU - Bucalo, Mauro AU - Chiovato, Luca AU - Cho, Kelly AU - Dagliati, Arianna AU - Estiri, Hossein AU - Follett, W. Robert AU - García Barrio, Noelia AU - Hanauer, A. 
David AU - Henderson, W. Darren AU - Ho, Yuk-Lam AU - Holmes, H. John AU - Hutch, R. Meghan AU - Kavuluru, Ramakanth AU - Kirchoff, Katie AU - Klann, G. Jeffrey AU - Krishnamurthy, K. Ashok AU - Le, T. Trang AU - Liu, Molei AU - Loh, Will Ne Hooi AU - Lozano-Zahonero, Sara AU - Luo, Yuan AU - Maidlow, Sarah AU - Makoudjou, Adeline AU - Malovini, Alberto AU - Martins, Roberto Marcelo AU - Moal, Bertrand AU - Morris, Michele AU - Mowery, L. Danielle AU - Murphy, N. Shawn AU - Neuraz, Antoine AU - Ngiam, Yuan Kee AU - Okoshi, P. Marina AU - Omenn, S. Gilbert AU - Patel, P. Lav AU - Pedrera Jiménez, Miguel AU - Prudente, A. Robson AU - Samayamuthu, Jebathilagam Malarkodi AU - Sanz Vidorreta, J. Fernando AU - Schriver, R. Emily AU - Schubert, Petra AU - Serrano Balazote, Pablo AU - Tan, WL Byorn AU - Tanni, E. Suzana AU - Tibollo, Valentina AU - Visweswaran, Shyam AU - Wagholikar, B. Kavishwar AU - Xia, Zongqi AU - Zöller, Daniela AU - AU - Kohane, S. Isaac AU - Cai, Tianxi AU - South, M. Andrew AU - Brat, A. Gabriel PY - 2021/10/11 TI - International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study JO - J Med Internet Res SP - e31400 VL - 23 IS - 10 KW - SARS-CoV-2 KW - electronic health records KW - federated study KW - retrospective cohort study KW - meta-analysis KW - COVID-19 KW - severe COVID-19 KW - laboratory trajectory N2 - Background: Many countries have experienced 2 predominant waves of COVID-19–related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. Objective: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. 
We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. Methods: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. Results: Data were available for 79,613 patients, of whom 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and Black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. 
Conclusions: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve. UR - https://www.jmir.org/2021/10/e31400 UR - http://dx.doi.org/10.2196/31400 UR - http://www.ncbi.nlm.nih.gov/pubmed/34533459 ID - info:doi/10.2196/31400 ER - TY - JOUR AU - Tong, Yao AU - Liao, C. Zachary AU - Tarczy-Hornoch, Peter AU - Luo, Gang PY - 2021/10/7 TI - Using a Constraint-Based Method to Identify Chronic Disease Patients Who Are Apt to Obtain Care Mostly Within a Given Health Care System: Retrospective Cohort Study JO - JMIR Form Res SP - e26314 VL - 5 IS - 10 KW - asthma KW - chronic kidney disease KW - chronic obstructive pulmonary disease KW - data analysis KW - diabetes mellitus KW - emergency department KW - health care system KW - inpatients KW - patient care management N2 - Background: For several major chronic diseases including asthma, chronic obstructive pulmonary disease, chronic kidney disease, and diabetes, a state-of-the-art method to avert poor outcomes is to use predictive models to identify future high-cost patients for preemptive care management interventions. Frequently, an American patient obtains care from multiple health care systems, each managed by a distinct institution. As the patient's medical data are spread across these health care systems, none has complete medical data for the patient. The task of building models to predict an individual patient's cost is currently thought to be impractical with incomplete data, which limits the use of care management to improve outcomes. 
Recently, we developed a constraint-based method to identify patients who are apt to obtain care mostly within a given health care system. Our method was shown to work well for the cohort of all adult patients at the University of Washington Medicine for a 6-month follow-up period. It is unknown how well our method works for patients with various chronic diseases and over follow-up periods of different lengths, and subsequently, whether it is reasonable to perform this predictive modeling task on the subset of patients pinpointed by our method. Objective: To understand our method's potential to enable this predictive modeling task on incomplete medical data, this study assesses our method's performance at the University of Washington Medicine on 5 subgroups of adult patients with major chronic diseases and over follow-up periods of 2 different lengths. Methods: We used University of Washington Medicine data for all adult patients who obtained care at the University of Washington Medicine in 2018 and PreManage data containing usage information from all hospitals in Washington state in 2019. We evaluated our method's performance over the follow-up periods of 6 months and 12 months on 5 patient subgroups separately: asthma, chronic kidney disease, type 1 diabetes, type 2 diabetes, and chronic obstructive pulmonary disease. Results: Our method identified 21.81% (3194/14,644) of University of Washington Medicine adult patients with asthma. Around 66.75% (797/1194) and 67.13% (1997/2975) of their emergency department visits and inpatient stays took place within the University of Washington Medicine system in the subsequent 6 months and in the subsequent 12 months, respectively, approximately double the corresponding percentage for all University of Washington Medicine adult patients with asthma. 
The performance for adult patients with chronic kidney disease, adult patients with chronic obstructive pulmonary disease, adult patients with type 1 diabetes, and adult patients with type 2 diabetes was reasonably similar to that for adult patients with asthma. Conclusions: For each of the 5 chronic diseases most relevant to care management, our method can pinpoint a reasonably large subset of patients who are apt to obtain care mostly within the University of Washington Medicine system. This opens the door to building models to predict an individual patient's cost on incomplete data, which was formerly deemed impractical. International Registered Report Identifier (IRRID): RR2-10.2196/13783 UR - https://formative.jmir.org/2021/10/e26314 UR - http://dx.doi.org/10.2196/26314 UR - http://www.ncbi.nlm.nih.gov/pubmed/34617906 ID - info:doi/10.2196/26314 ER - TY - JOUR AU - Foraker, Randi AU - Guo, Aixia AU - Thomas, Jason AU - Zamstein, Noa AU - Payne, RO Philip AU - Wilcox, Adam AU - PY - 2021/10/4 TI - The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data JO - J Med Internet Res SP - e30697 VL - 23 IS - 10 KW - synthetic data KW - protected health information KW - COVID-19 KW - electronic health records and systems KW - data analysis N2 - Background: Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic. Objective: We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes. Methods: We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). 
We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19–positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19–related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data. Results: For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts. Conclusions: This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights. UR - https://www.jmir.org/2021/10/e30697 UR - http://dx.doi.org/10.2196/30697 UR - http://www.ncbi.nlm.nih.gov/pubmed/34559671 ID - info:doi/10.2196/30697 ER - TY - JOUR AU - Braunack-Mayer, Annette AU - Fabrianesi, Belinda AU - Street, Jackie AU - O'Shaughnessy, Pauline AU - Carter, M. 
Stacy AU - Engelen, Lina AU - Carolan, Lucy AU - Bosward, Rebecca AU - Roder, David AU - Sproston, Kylie PY - 2021/10/1 TI - Sharing Government Health Data With the Private Sector: Community Attitudes Survey JO - J Med Internet Res SP - e24200 VL - 23 IS - 10 KW - big data KW - health information systems KW - health data KW - private sector KW - data linkage KW - public opinion KW - consent KW - trust KW - public interest KW - social license N2 - Background: The use of government health data for secondary purposes, such as monitoring the quality of hospital services, researching the health needs of populations, and testing how well new treatments work, is increasing. This increase in the secondary uses of health data has led to increased interest in what the public thinks about data sharing, in particular, the possibilities of sharing with the private sector for research and development. Although international evidence demonstrates broad public support for the secondary use of health data, this support does not extend to sharing health data with the private sector. If governments intend to share health data with the private sector, knowing what the public thinks will be important. This paper reports a national survey to explore public attitudes in Australia toward sharing health data with private companies for research on and development of therapeutic drugs and medical devices. Objective: This study aims to explore public attitudes in Australia toward sharing government health data with the private sector. Methods: A web-based survey tool was developed to assess attitudes about sharing government health data with the private sector. A market research company was employed to administer the web-based survey in June 2019. Results: The survey was completed by 2537 individuals residing in Australia. 
Between 51.8% and 57.98% of all participants were willing to share their data, with slightly fewer in favor of sharing to improve health services (51.99%) and a slightly higher proportion in favor of sharing for research and development (57.98%). There was a preference for opt-in consent (53.44%) and broad support for placing conditions on sharing health information with private companies (62% to 91.99%). Wide variability was also observed in participants' views about the extent to which the private sector could be trusted and how well they would behave if entrusted with people's health information. In their qualitative responses, the participants noted concerns about private sector corporate interests, corruption, and profit making and expressed doubt about the Australian government's capacity to manage data sharing safely. The percentages presented are adjusted against the Australian population. Conclusions: This nationally representative survey provides preliminary evidence that Australians are uncertain about sharing their health data with the private sector. Although just over half of all the respondents supported sharing health data with the private sector, there was also strong support for strict conditions on sharing data and for opt-in consent and significant concerns about how well the private sector would manage government health data. Addressing public concern about sharing government health data with the private sector will require more and better engagement to build community understanding about how agencies can collect, share, protect, and use their personal data. 
UR - https://www.jmir.org/2021/10/e24200 UR - http://dx.doi.org/10.2196/24200 UR - http://www.ncbi.nlm.nih.gov/pubmed/34596573 ID - info:doi/10.2196/24200 ER - TY - JOUR AU - Lee, Junghwan AU - Kim, Hyun Jae AU - Liu, Cong AU - Hripcsak, George AU - Natarajan, Karthik AU - Ta, Casey AU - Weng, Chunhua PY - 2021/9/30 TI - Columbia Open Health Data for COVID-19 Research: Database Analysis JO - J Med Internet Res SP - e31122 VL - 23 IS - 9 KW - COVID-19 KW - open data KW - electronic health record KW - data science KW - research KW - data KW - access KW - database KW - symptom KW - cohort KW - prevalence N2 - Background: COVID-19 has threatened the health of tens of millions of people all over the world. Massive research efforts have been made in response to the COVID-19 pandemic. Utilization of clinical data can accelerate these research efforts to combat the pandemic since important characteristics of the patients are often found by examining the clinical data. Publicly accessible clinical data on COVID-19, however, remain limited despite the immediate need. Objective: To provide shareable clinical data to catalyze COVID-19 research, we present Columbia Open Health Data for COVID-19 Research (COHD-COVID), a publicly accessible database providing clinical concept prevalence, clinical concept co-occurrence, and clinical symptom prevalence for hospitalized patients with COVID-19. COHD-COVID also provides data on hospitalized patients with influenza and general hospitalized patients as comparator cohorts. Methods: The data used in COHD-COVID were obtained from NewYork-Presbyterian/Columbia University Irving Medical Center's electronic health records database. Condition, drug, and procedure concepts were obtained from the visits of identified patients from the cohorts. Rare concepts were excluded, and the true concept counts were perturbed using Poisson randomization to protect patient privacy. 
Concept prevalence, concept prevalence ratio, concept co-occurrence, and symptom prevalence were calculated using the obtained concepts. Results: Concept prevalence and concept prevalence ratio analyses showed the clinical characteristics of the COVID-19 cohorts, confirming the well-known characteristics of COVID-19 (eg, acute lower respiratory tract infection and cough). The concepts related to the well-known characteristics of COVID-19 recorded high prevalence and high prevalence ratio in the COVID-19 cohort compared to the hospitalized influenza cohort and general hospitalized cohort. Concept co-occurrence analyses showed potential associations between specific concepts. In case of acute lower respiratory tract infection in the COVID-19 cohort, a high co-occurrence ratio was obtained with COVID-19-related concepts and commonly used drugs (eg, disease due to coronavirus and acetaminophen). Symptom prevalence analysis indicated symptom-level characteristics of the cohorts and confirmed that well-known symptoms of COVID-19 (eg, fever, cough, and dyspnea) showed higher prevalence than in the hospitalized influenza cohort and the general hospitalized cohort. Conclusions: We present COHD-COVID, a publicly accessible database providing useful clinical data for hospitalized patients with COVID-19, hospitalized patients with influenza, and general hospitalized patients. We expect COHD-COVID to provide researchers and clinicians quantitative measures of COVID-19-related clinical features to better understand and combat the pandemic. UR - https://www.jmir.org/2021/9/e31122 UR - http://dx.doi.org/10.2196/31122 UR - http://www.ncbi.nlm.nih.gov/pubmed/34543225 ID - info:doi/10.2196/31122 ER - TY - JOUR AU - Sankaranarayanan, Saranya AU - Balan, Jagadheshwar AU - Walsh, R. Jesse AU - Wu, Yanhong AU - Minnich, Sara AU - Piazza, Amy AU - Osborne, Collin AU - Oliver, R. Gavin AU - Lesko, Jessica AU - Bates, L. Kathy AU - Khezeli, Kia AU - Block, R. 
Darci AU - DiGuardo, Margaret AU - Kreuter, Justin AU - O'Horo, C. John AU - Kalantari, John AU - Klee, W. Eric AU - Salama, E. Mohamed AU - Kipp, Benjamin AU - Morice, G. William AU - Jenkinson, Garrett PY - 2021/9/28 TI - COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation JO - J Med Internet Res SP - e30157 VL - 23 IS - 9 KW - COVID-19 KW - mortality KW - prediction KW - recurrent neural networks KW - missing data KW - time series KW - deep learning KW - machine learning KW - neural network KW - electronic health record KW - EHR KW - algorithm KW - development KW - validation N2 - Background: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment. Objective: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. Methods: We retrospectively constructed one of the largest reported and most geographically diverse laboratory information system and electronic health record of COVID-19 data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. 
Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks utilized the TensorFlow Keras framework. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient's first positive COVID-19 nucleic acid test result. Results: The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938 (SE 0.004) as the area under the receiver operating characteristic (AUROC) curve. This model retained strong performance by reducing the follow-up time to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106). Conclusions: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19-positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result. 
UR - https://www.jmir.org/2021/9/e30157 UR - http://dx.doi.org/10.2196/30157 UR - http://www.ncbi.nlm.nih.gov/pubmed/34449401 ID - info:doi/10.2196/30157 ER - TY - JOUR AU - Daniels, Helen AU - Jones, Helen Kerina AU - Heys, Sharon AU - Ford, Vincent David PY - 2021/9/24 TI - Exploring the Use of Genomic and Routinely Collected Data: Narrative Literature Review and Interview Study JO - J Med Internet Res SP - e15739 VL - 23 IS - 9 KW - genomic data KW - routine data KW - electronic health records KW - health data science KW - genome KW - data regulation KW - case study KW - eHealth N2 - Background: Advancing the use of genomic data with routinely collected health data holds great promise for health care and research. Increasing the use of these data is a high priority to understand and address the causes of disease. Objective: This study aims to provide an outline of the use of genomic data alongside routinely collected data in health research to date. As this field prepares to move forward, it is important to take stock of the current state of play in order to highlight new avenues for development, identify challenges, and ensure that adequate data governance models are in place for safe and socially acceptable progress. Methods: We conducted a literature review to draw information from past studies that have used genomic and routinely collected data and conducted interviews with individuals who use these data for health research. We collected data on the following: the rationale of using genomic data in conjunction with routinely collected data, types of genomic and routinely collected data used, data sources, project approvals, governance and access models, and challenges encountered. Results: The main purpose of using genomic and routinely collected data was to conduct genome-wide and phenome-wide association studies. Routine data sources included electronic health records, disease and death registries, health insurance systems, and deprivation indices. 
The types of genomic data included polygenic risk scores, single nucleotide polymorphisms, and measures of genetic activity, and biobanks generally provided these data. Although the literature search showed that biobanks released data to researchers, the case studies revealed a growing tendency for use within a data safe haven. Challenges of working with these data revolved around data collection, data storage, technical, and data privacy issues. Conclusions: Using genomic and routinely collected data holds great promise for progressing health research. Several challenges are involved, particularly in terms of privacy. Overcoming these barriers will ensure that the use of these data to progress health research can be exploited to its full potential. UR - https://www.jmir.org/2021/9/e15739 UR - http://dx.doi.org/10.2196/15739 UR - http://www.ncbi.nlm.nih.gov/pubmed/34559060 ID - info:doi/10.2196/15739 ER - TY - JOUR AU - Alaqra, Sarah Ala AU - Kane, Bridget AU - Fischer-Hübner, Simone PY - 2021/9/16 TI - Machine Learning-Based Analysis of Encrypted Medical Data in the Cloud: Qualitative Study of Expert Stakeholders' Perspectives JO - JMIR Hum Factors SP - e21810 VL - 8 IS - 3 KW - medical data analysis KW - encryption KW - privacy-enhancing technologies KW - machine learning KW - stakeholders KW - tradeoffs KW - perspectives KW - eHealth KW - interviews N2 - Background: Third-party cloud-based data analysis applications are proliferating in electronic health (eHealth) because of the expertise offered and their monetary advantage. However, privacy and security are critical concerns when handling sensitive medical data in the cloud. Technical advances based on "crypto magic" in privacy-preserving machine learning (ML) enable data analysis in encrypted form for maintaining confidentiality. 
Such privacy-enhancing technologies (PETs) could be counterintuitive to relevant stakeholders in eHealth, which could in turn hinder adoption; thus, more attention is needed on human factors for establishing trust and transparency. Objective: The aim of this study was to analyze eHealth expert stakeholders' perspectives and the perceived tradeoffs in regard to data analysis on encrypted medical data in the cloud, and to derive user requirements for development of a privacy-preserving data analysis tool. Methods: We used semistructured interviews and report on 14 interviews with individuals having medical, technical, or research expertise in eHealth. We used thematic analysis for analyzing interview data. In addition, we conducted a workshop for eliciting requirements. Results: Our results show differences in the understanding of and in trusting the technology; caution is advised by technical experts, whereas patient safety assurances are required by medical experts. Themes were identified with general perspectives on data privacy and practices (eg, acceptance of using external services), as well as themes highlighting specific perspectives (eg, data protection drawbacks and concerns of the data analysis on encrypted data). The latter themes result in requiring assurances and conformance testing for trusting tools such as the proposed ML-based tool. Communicating privacy and utility benefits and tradeoffs with stakeholders is essential for trust. Furthermore, stakeholders and their organizations share accountability of patient data. Finally, stakeholders stressed the importance of informing patients about the privacy of their data. Conclusions: Understanding the benefits and risks of using eHealth PETs is crucial, and collaboration among diverse stakeholders is essential. Assurances of the tool's privacy, accuracy, and patient safety should be in place for establishing trust of ML-based PETs, especially if used in the cloud. 
UR - https://humanfactors.jmir.org/2021/3/e21810 UR - http://dx.doi.org/10.2196/21810 UR - http://www.ncbi.nlm.nih.gov/pubmed/34528892 ID - info:doi/10.2196/21810 ER - TY - JOUR AU - Chi, Chien-Yu AU - Ao, Shuang AU - Winkler, Adrian AU - Fu, Kuan-Chun AU - Xu, Jie AU - Ho, Yi-Lwun AU - Huang, Chien-Hua AU - Soltani, Rohollah PY - 2021/9/13 TI - Predicting the Mortality and Readmission of In-Hospital Cardiac Arrest Patients With Electronic Health Records: A Machine Learning Approach JO - J Med Internet Res SP - e27798 VL - 23 IS - 9 KW - in-hospital cardiac arrest KW - 30-day mortality KW - 30-day readmission KW - machine learning KW - imbalanced dataset N2 - Background: In-hospital cardiac arrest (IHCA) is associated with high mortality and health care costs in the recovery phase. Predicting adverse outcome events, including readmission, improves the chance for appropriate interventions and reduces health care costs. However, studies related to the early prediction of adverse events of IHCA survivors are rare. Therefore, we used a deep learning model for prediction in this study. Objective: This study aimed to demonstrate that with the proper data set and learning strategies, we can predict the 30-day mortality and readmission of IHCA survivors based on their historical claims. Methods: National Health Insurance Research Database claims data, including 168,693 patients who had experienced IHCA at least once and 1,569,478 clinical records, were obtained to generate a data set for outcome prediction. We predicted the 30-day mortality/readmission after each current record (ALL-mortality/ALL-readmission) and 30-day mortality/readmission after IHCA (cardiac arrest [CA]-mortality/CA-readmission). We developed a hierarchical vectorizer (HVec) deep learning model to extract patients' information and predict mortality and readmission. 
To embed the textual medical concepts of the clinical records into our deep learning model, we used Text2Node to compute the distributed representations of all medical concept codes as a 128-dimensional vector. Along with the patient's demographic information, our novel HVec model generated embedding vectors to hierarchically describe the health status at the record-level and patient-level. Multitask learning involving two main tasks and auxiliary tasks was proposed. As CA-mortality and CA-readmission were rare, person upsampling of patients with CA and weighting of CA records were used to improve prediction performance. Results: With the multitask learning setting in the model learning process, we achieved an area under the receiver operating characteristic of 0.752 for CA-mortality, 0.711 for ALL-mortality, 0.852 for CA-readmission, and 0.889 for ALL-readmission. The area under the receiver operating characteristic was improved to 0.808 for CA-mortality and 0.862 for CA-readmission after solving the extremely imbalanced issue for CA-mortality/CA-readmission by upsampling and weighting. Conclusions: This study demonstrated the potential of predicting future outcomes for IHCA survivors by machine learning. The results showed that our proposed approach could effectively alleviate data imbalance problems and train a better model for outcome prediction. UR - https://www.jmir.org/2021/9/e27798 UR - http://dx.doi.org/10.2196/27798 UR - http://www.ncbi.nlm.nih.gov/pubmed/34515639 ID - info:doi/10.2196/27798 ER - TY - JOUR AU - Geva, A. Gil AU - Ketko, Itay AU - Nitecki, Maya AU - Simon, Shoham AU - Inbar, Barr AU - Toledo, Itay AU - Shapiro, Michael AU - Vaturi, Barak AU - Votta, Yoni AU - Filler, Daniel AU - Yosef, Roey AU - Shpitzer, A. Sagi AU - Hir, Nabil AU - Peri Markovich, Michal AU - Shapira, Shachar AU - Fink, Noam AU - Glasberg, Elon AU - Furer, Ariel PY - 2021/9/10 TI - Data Empowerment of Decision-Makers in an Era of a Pandemic: Intersection of "Classic" 
and Artificial Intelligence in the Service of Medicine JO - J Med Internet Res SP - e24295 VL - 23 IS - 9 KW - COVID-19 KW - medical informatics KW - decision-making KW - pandemic KW - data KW - policy KW - validation KW - accuracy KW - data analysis N2 - Background: The COVID-19 outbreak required prompt action by health authorities around the world in response to a novel threat. With enormous amounts of information originating in sources with uncertain degree of validation and accuracy, it is essential to provide executive-level decision-makers with the most actionable, pertinent, and updated data analysis to enable them to adapt their strategy swiftly and competently. Objective: We report here the origination of a COVID-19 dedicated response in the Israel Defense Forces with the assembly of an operational Data Center for the Campaign against Coronavirus. Methods: Spearheaded by directors with clinical, operational, and data analytics orientation, a multidisciplinary team utilized existing and newly developed platforms to collect and analyze large amounts of information on an individual level in the context of SARS-CoV-2 contraction and infection. Results: Nearly 300,000 responses to daily questionnaires were recorded and were merged with other data sets to form a unified data lake. By using basic as well as advanced analytic tools ranging from simple aggregation and display of trends to data science application, we provided commanders and clinicians with access to trusted, accurate, and personalized information and tools that were designed to foster operational changes and mitigate the propagation of the pandemic. The developed tools aided in the identification of high-risk individuals for severe disease and resulted in a 30% decline in their attendance to their units. 
Moreover, the queue for laboratory examination for COVID-19 was optimized using a predictive model and resulted in a high true-positive rate of 20%, which is more than twice as high as the baseline rate (2.28%, 95% CI 1.63%-3.19%). Conclusions: In times of ambiguity and uncertainty, along with an unprecedented flux of information, health organizations may find multidisciplinary teams working to provide intelligence from diverse and rich data a key factor in providing executives relevant and actionable support for decision-making. UR - https://www.jmir.org/2021/9/e24295 UR - http://dx.doi.org/10.2196/24295 UR - http://www.ncbi.nlm.nih.gov/pubmed/34313589 ID - info:doi/10.2196/24295 ER - TY - JOUR AU - Stamenova, Vess AU - Chu, Cherry AU - Pang, Andrea AU - Tadrous, Mina AU - Bhatia, Sacha R. AU - Cram, Peter PY - 2021/9/7 TI - Using Administrative Data to Explore Potentially Aberrant Provision of Virtual Care During COVID-19: Retrospective Cohort Study of Ontario Provincial Data JO - J Med Internet Res SP - e29396 VL - 23 IS - 9 KW - telemedicine KW - virtual care KW - COVID-19 KW - pandemic KW - virtual health KW - telehealth KW - ambulatory visits KW - physicians KW - patients KW - digital health N2 - Background: The COVID-19 pandemic has led to a rapid increase in virtual care use across the globe. Many health care systems have responded by creating virtual care billing codes that allow physicians to see their patients over telephone or video. This rapid liberalization of billing requirements, both in Canada and other countries, has led to concerns about potential abuse, but empirical data are limited. Objective: The objectives of this study were to examine whether there were substantial changes in physicians' ambulatory visit volumes coinciding with the liberalization of virtual care billing rules and to describe the characteristics of physicians who significantly increased their ambulatory visit volumes during this period. 
We also sought to describe the relationship between visit volume changes in 2020 and the volumes of virtual care use among individual physicians and across specialties. Methods: We conducted a population-based, retrospective cohort study using health administrative data from the Ontario Health Insurance Plan, which was linked to the ICES Physician Database. We identified a unique cohort of providers based on physicians' billings and calculated the ratio of total in-person and virtual ambulatory visits over the period from January to June 2020 (virtual predominating) relative to that over the period from January to June 2019 (in-person predominating) for each physician. Based on these ratios, we then stratified physicians into four groups: low-, same-, high-, and very high-use physicians. We then calculated various demographic and practice characteristics of physicians in each group. Results: Among 28,383 eligible physicians in 2020, the mean ratio of ambulatory visits in January to June 2020:2019 was 0.99 (SD 2.53; median 0.81, IQR 0.59-1.0). Out of 28,383 physicians, only 2672 (9.4%) fell into the high-use group and only 291 (1.0%) fell into the very high-use group. High-use physicians were younger, more recent graduates, more likely female, and less likely to be international graduates. They also had, on average, lower-volume practices. There was a significant positive correlation between percent virtual care and the 2020:2019 ratio only in the group of physicians who maintained their practice (R=0.35, P<.001). There was also a significant positive correlation between the 2020:2019 ratio and the percent virtual care per specialty (R=0.59, P<.01). Conclusions: During the early stages of the pandemic, the introduction of virtual care did not lead to significant increases in visit volume. 
Our results provide reassuring evidence that relaxation of billing requirements early in the COVID-19 pandemic in Ontario were not associated with widespread and aberrant billing behaviors. Furthermore, the strong relationship between the ability to maintain practice volumes and the use of virtual care suggests that the introduction of virtual care allowed for continued access to care for patients. UR - https://www.jmir.org/2021/9/e29396 UR - http://dx.doi.org/10.2196/29396 UR - http://www.ncbi.nlm.nih.gov/pubmed/34313590 ID - info:doi/10.2196/29396 ER - TY - JOUR AU - Li, Patrick AU - Chen, Bob AU - Rhodes, Evan AU - Slagle, Jason AU - Alrifai, Wael Mhd AU - France, Daniel AU - Chen, You PY - 2021/9/3 TI - Measuring Collaboration Through Concurrent Electronic Health Record Usage: Network Analysis Study JO - JMIR Med Inform SP - e28998 VL - 9 IS - 9 KW - collaboration KW - electronic health records KW - audit logs KW - health care workers KW - neonatal intensive care unit KW - network analysis KW - clustering KW - visualization KW - concurrent interaction KW - human-computer interaction KW - survey instrument KW - informatics framework KW - secondary data analysis N2 - Background: Collaboration is vital within health care institutions, and it allows for the effective use of collective health care worker (HCW) expertise. Human-computer interactions involving electronic health records (EHRs) have become pervasive and act as an avenue for quantifying these collaborations using statistical and network analysis methods. Objective: We aimed to measure HCW collaboration and its characteristics by analyzing concurrent EHR usage. Methods: By extracting concurrent EHR usage events from audit log data, we defined concurrent sessions. For each HCW, we established a metric called concurrent intensity, which was the proportion of EHR activities in concurrent sessions over all EHR activities. 
Statistical models were used to test the differences in the concurrent intensity between HCWs. For each patient visit, starting from admission to discharge, we measured concurrent EHR usage across all HCWs, which we called temporal patterns. Again, we applied statistical models to test the differences in temporal patterns of the admission, discharge, and intermediate days of hospital stay between weekdays and weekends. Network analysis was leveraged to measure collaborative relationships among HCWs. We surveyed experts to determine if they could distinguish collaborative relationships between high and low likelihood categories derived from concurrent EHR usage. Clustering was used to aggregate concurrent activities to describe concurrent sessions. We gathered 4 months of EHR audit log data from a large academic medical center's neonatal intensive care unit (NICU) to validate the effectiveness of our framework. Results: There was a significant difference (P<.001) in the concurrent intensity (proportion of concurrent activities: ranging from mean 0.07, 95% CI 0.06-0.08, to mean 0.36, 95% CI 0.18-0.54; proportion of time spent on concurrent activities: ranging from mean 0.32, 95% CI 0.20-0.44, to mean 0.76, 95% CI 0.51-1.00) between the top 13 HCW specialties who had the largest amount of time spent in EHRs. Temporal patterns between weekday and weekend periods were significantly different on admission (number of concurrent intervals per hour: 11.60 vs 0.54; P<.001) and discharge days (4.72 vs 1.54; P<.001), but not during intermediate days of hospital stay. Neonatal nurses, fellows, frontline providers, neonatologists, consultants, respiratory therapists, and ancillary and support staff had collaborative relationships. NICU professionals could distinguish high likelihood collaborative relationships from low ones at significant rates (3.54, 95% CI 3.31-4.37 vs 2.64, 95% CI 2.46-3.29; P<.001). We identified 50 clusters of concurrent activities. 
Over 87% of concurrent sessions could be described by a single cluster, with the remaining 13% of sessions comprising multiple clusters. Conclusions: Leveraging concurrent EHR usage workflow through audit logs to analyze HCW collaboration may improve our understanding of collaborative patient care. HCW collaboration using EHRs could potentially influence the quality of patient care, discharge timeliness, and clinician workload, stress, or burnout. UR - https://medinform.jmir.org/2021/9/e28998 UR - http://dx.doi.org/10.2196/28998 UR - http://www.ncbi.nlm.nih.gov/pubmed/34477566 ID - info:doi/10.2196/28998 ER - TY - JOUR AU - Patterson, Rees Jenny AU - Shaw, Donna AU - Thomas, R. Sharita AU - Hayes, A. Julie AU - Daley, R. Christopher AU - Knight, Stefania AU - Aikat, Jay AU - Mieczkowska, O. Joanna AU - Ahalt, C. Stanley AU - Krishnamurthy, K. Ashok PY - 2021/9/2 TI - COVID-19 Data Utilization in North Carolina: Qualitative Analysis of Stakeholder Experiences JO - JMIR Public Health Surveill SP - e29310 VL - 7 IS - 9 KW - qualitative research KW - interview KW - COVID-19 KW - SARS-CoV-2 KW - pandemic KW - data collection KW - data reporting KW - data KW - public health KW - coronavirus disease 2019 N2 - Background: As the world faced the pandemic caused by the novel coronavirus disease 2019 (COVID-19), medical professionals, technologists, community leaders, and policy makers sought to understand how best to leverage data for public health surveillance and community education. With this complex public health problem, North Carolinians relied on data from state, federal, and global health organizations to increase their understanding of the pandemic and guide decision-making. Objective: We aimed to describe the role that stakeholders involved in COVID-19-related data played in managing the pandemic in North Carolina. The study investigated the processes used by organizations throughout the state in using, collecting, and reporting COVID-19 data. 
Methods: We used an exploratory qualitative study design to investigate North Carolina's COVID-19 data collection efforts. To better understand these processes, key informant interviews were conducted with employees from organizations that collected COVID-19 data across the state. We developed an interview guide, and open-ended semistructured interviews were conducted during the period from June through November 2020. Interviews lasted between 30 and 45 minutes and were conducted by data scientists by videoconference. Data were subsequently analyzed using qualitative data analysis software. Results: Results indicated that electronic health records were primary sources of COVID-19 data. Often, data were also used to create dashboards to inform the public or other health professionals, to aid in decision-making, or for reporting purposes. Cross-sector collaboration was cited as a major success. Consistency among metrics and data definitions, data collection processes, and contact tracing were cited as challenges. Conclusions: Findings suggest that, during future outbreaks, organizations across regions could benefit from data centralization and data governance. Data should be publicly accessible and in a user-friendly format. Additionally, established cross-sector collaboration networks are demonstrably beneficial for public health professionals across the state as these established relationships facilitate a rapid response to evolving public health challenges. UR - https://publichealth.jmir.org/2021/9/e29310 UR - http://dx.doi.org/10.2196/29310 UR - http://www.ncbi.nlm.nih.gov/pubmed/34298500 ID - info:doi/10.2196/29310 ER - TY - JOUR AU - Gonzales, Aldren AU - Smith, R. 
Scott AU - Dullabh, Prashila AU - Hovey, Lauren AU - Heaney-Huls, Krysta AU - Robichaud, Meagan AU - Boodoo, Roger PY - 2021/8/27 TI - Potential Uses of Blockchain Technology for Outcomes Research on Opioids JO - JMIR Med Inform SP - e16293 VL - 9 IS - 8 KW - blockchain KW - distributed ledger KW - opioid crisis KW - outcomes research KW - patient-centered outcomes research KW - mobile phone UR - https://medinform.jmir.org/2021/8/e16293 UR - http://dx.doi.org/10.2196/16293 UR - http://www.ncbi.nlm.nih.gov/pubmed/34448721 ID - info:doi/10.2196/16293 ER - TY - JOUR AU - Aggarwal, Ravi AU - Farag, Soma AU - Martin, Guy AU - Ashrafian, Hutan AU - Darzi, Ara PY - 2021/8/26 TI - Patient Perceptions on Data Sharing and Applying Artificial Intelligence to Health Care Data: Cross-sectional Survey JO - J Med Internet Res SP - e26162 VL - 23 IS - 8 KW - artificial intelligence KW - patient perception KW - data sharing KW - health data KW - privacy N2 - Background: Considerable research is being conducted as to how artificial intelligence (AI) can be effectively applied to health care. However, for the successful implementation of AI, large amounts of health data are required for training and testing algorithms. As such, there is a need to understand the perspectives and viewpoints of patients regarding the use of their health data in AI research. Objective: We surveyed a large sample of patients for identifying current awareness regarding health data research, and for obtaining their opinions and views on data sharing for AI research purposes, and on the use of AI technology on health care data. Methods: A cross-sectional survey with patients was conducted at a large multisite teaching hospital in the United Kingdom. Data were collected on patient and public views about sharing health data for research and the use of AI on health data. Results: A total of 408 participants completed the survey. The respondents had generally low levels of prior knowledge about AI. 
Most were comfortable with sharing health data with the National Health Service (NHS) (318/408, 77.9%) or universities (268/408, 65.7%), but far fewer with commercial organizations such as technology companies (108/408, 26.4%). The majority endorsed AI research on health care data (357/408, 87.4%) and health care imaging (353/408, 86.4%) in a university setting, provided that concerns about privacy, reidentification of anonymized health care data, and consent processes were addressed. Conclusions: There were significant variations in the patient perceptions, levels of support, and understanding of health data research and AI. Greater public engagement levels and debates are necessary to ensure the acceptability of AI research and its successful integration into clinical practice in future. UR - https://www.jmir.org/2021/8/e26162 UR - http://dx.doi.org/10.2196/26162 UR - http://www.ncbi.nlm.nih.gov/pubmed/34236994 ID - info:doi/10.2196/26162 ER - TY - JOUR AU - Mishra, Ninad AU - Duke, Jon AU - Karki, Saugat AU - Choi, Myung AU - Riley, Michael AU - Ilatovskiy, V. Andrey AU - Gorges, Marla AU - Lenert, Leslie PY - 2021/8/11 TI - A Modified Public Health Automated Case Event Reporting Platform for Enhancing Electronic Laboratory Reports With Clinical Data: Design and Implementation Study JO - J Med Internet Res SP - e26388 VL - 23 IS - 8 KW - public health surveillance KW - sexually transmitted diseases KW - gonorrhea KW - chlamydia KW - electronic case reporting KW - electronic laboratory reporting KW - health information interoperability KW - fast healthcare interoperability resources KW - electronic health records KW - EHR N2 - Background: Public health reporting is the cornerstone of public health practices that inform prevention and control strategies. 
There is a need to leverage advances made in the past to implement an architecture that facilitates the timely and complete public health reporting of relevant case-related information that has previously not easily been available to the public health community. Electronic laboratory reporting (ELR) is a reliable method for reporting cases to public health authorities but contains very limited data. In an earlier pilot study, we designed the Public Health Automated Case Event Reporting (PACER) platform, which leverages existing ELR infrastructure as the trigger for creating an electronic case report. PACER is a FHIR (Fast Health Interoperability Resources)-based system that queries the electronic health record from where the laboratory test was requested to extract expanded additional information about a case. Objective: This study aims to analyze the pilot implementation of a modified PACER system for electronic case reporting and describe how this FHIR-based, open-source, and interoperable system allows health systems to conduct public health reporting while maintaining the appropriate governance of the clinical data. Methods: ELR to a simulated public health department was used as the trigger for a FHIR-based query. Predetermined queries were translated into Clinical Quality Language logics. Within the PACER environment, these Clinical Quality Language logical statements were managed and evaluated against the providers' FHIR servers. These predetermined logics were filtered, and only data relevant to that episode of the condition were extracted and sent to simulated public health agencies as an electronic case report. Design and testing were conducted at the Georgia Tech Research Institute, and the pilot was deployed at the Medical University of South Carolina. We evaluated this architecture by examining the completeness of additional information in the electronic case report, such as patient demographics, medications, symptoms, and diagnoses. 
This additional information is crucial for understanding disease epidemiology, but existing electronic case reporting and ELR architectures do not report them. Therefore, we used the completeness of these data fields as the metrics for enriching electronic case reports. Results: During the 8-week study period, we identified 117 positive test results for chlamydia. PACER successfully created an electronic case report for all 117 patients. PACER extracted demographics, medications, symptoms, and diagnoses from 99.1% (116/117), 72.6% (85/117), 70.9% (83/117), and 65% (76/117) of the cases, respectively. Conclusions: PACER deployed in conjunction with electronic laboratory reports can enhance public health case reporting with additional relevant data. The architecture is modular in design, thereby allowing it to be used for any reportable condition, including evolving outbreaks. PACER allows for the creation of an enhanced and more complete case report that contains relevant case information that helps us to better understand the epidemiology of a disease. UR - https://www.jmir.org/2021/8/e26388 UR - http://dx.doi.org/10.2196/26388 UR - http://www.ncbi.nlm.nih.gov/pubmed/34383669 ID - info:doi/10.2196/26388 ER - TY - JOUR AU - Bright, A. Roselie AU - Rankin, K. Summer AU - Dowdy, Katherine AU - Blok, V. Sergey AU - Bright, J. Susan AU - Palmer, M. 
Lee Anne PY - 2021/8/11 TI - Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method JO - JMIRx Med SP - e27017 VL - 2 IS - 3 KW - epidemiology KW - electronic health record KW - electronic health care record KW - big data KW - patient harm KW - patient safety KW - public health KW - product surveillance, postmarketing KW - natural language processing KW - proof-of-concept study KW - critical care N2 - Background: Big data tools provide opportunities to monitor adverse events (patient harm associated with medical care) (AEs) in the unstructured text of electronic health care records (EHRs). Writers may explicitly state an apparent association between treatment and adverse outcome ("attributed") or state the simple treatment and outcome without an association ("unattributed"). Many methods for finding AEs in text rely on predefining possible AEs before searching for prespecified words and phrases or manual labeling (standardization) by investigators. We developed a method to identify possible AEs, even if unknown or unattributed, without any prespecifications or standardization of notes. Our method was inspired by word-frequency analysis methods used to uncover the true authorship of disputed works credited to William Shakespeare. We chose two use cases, "transfusion" and "time-based." Transfusion was chosen because new transfusion AE types were becoming recognized during the study data period; therefore, we anticipated an opportunity to find unattributed potential AEs (PAEs) in the notes. With the time-based case, we wanted to simulate near real-time surveillance. We chose time periods in the hope of detecting PAEs due to contaminated heparin from mid-2007 to mid-2008 that were announced in early 2008. We hypothesized that the prevalence of contaminated heparin may have been widespread enough to manifest in EHRs through symptoms related to heparin AEs, independent of clinicians' 
documentation of attributed AEs. Objective: We aimed to develop a new method to identify attributed and unattributed PAEs using the unstructured text of EHRs. Methods: We used EHRs for adult critical care admissions at a major teaching hospital (2001-2012). For each case, we formed a group of interest and a comparison group. We concatenated the text notes for each admission into one document sorted by date, and deleted replicate sentences and lists. We identified statistically significant words in the group of interest versus the comparison group. Documents in the group of interest were filtered to those words, followed by topic modeling on the filtered documents to produce topics. For each topic, the three documents with the maximum topic scores were manually reviewed to identify PAEs. Results: Topics centered around medical conditions that were unique to or more common in the group of interest, including PAEs. In each use case, most PAEs were unattributed in the notes. Among the transfusion PAEs was unattributed evidence of transfusion-associated cardiac overload and transfusion-related acute lung injury. Some of the PAEs from mid-2007 to mid-2008 were increased unattributed events consistent with AEs related to heparin contamination. Conclusions: The Shakespeare method could be a useful supplement to AE reporting and surveillance of structured EHR data. Future improvements should include automation of the manual review process. 
UR - https://med.jmirx.org/2021/3/e27017 UR - http://dx.doi.org/10.2196/27017 UR - http://www.ncbi.nlm.nih.gov/pubmed/37725533 ID - info:doi/10.2196/27017 ER - TY - JOUR AU - Kummer, Benjamin AU - Shakir, Lubaina AU - Kwon, Rachel AU - Habboushe, Joseph AU - Jetté, Nathalie PY - 2021/8/2 TI - Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis JO - JMIR Med Inform SP - e28266 VL - 9 IS - 8 KW - medical informatics KW - clinical informatics KW - mhealth KW - digital health KW - cerebrovascular disease KW - medical calculators KW - health information KW - health information technology KW - information technology KW - economic health KW - clinical health KW - electronic health records N2 - Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app-based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6%) were related to stroke. 
Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5% of total and 32% of stroke-related page views), the Mean Arterial Pressure calculator (2.4% of total and 14.0% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9% of total and 11.4% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7% of total and 10.1% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4% of total and 8.1% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7%-91.2% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. UR - https://medinform.jmir.org/2021/8/e28266 UR - http://dx.doi.org/10.2196/28266 UR - http://www.ncbi.nlm.nih.gov/pubmed/34338647 ID - info:doi/10.2196/28266 ER - TY - JOUR AU - Barata, Carolina AU - Rodrigues, Maria Ana AU - Canhão, Helena AU - Vinga, Susana AU - Carvalho, M. 
Alexandra PY - 2021/7/30 TI - Predicting Biologic Therapy Outcome of Patients With Spondyloarthritis: Joint Models for Longitudinal and Survival Analysis JO - JMIR Med Inform SP - e26823 VL - 9 IS - 7 KW - data mining KW - survival analysis KW - joint models KW - spondyloarthritis KW - drug survival KW - rheumatic disease KW - electronic medical records KW - medical records N2 - Background: Rheumatic diseases are one of the most common chronic diseases worldwide. Among them, spondyloarthritis (SpA) is a group of highly debilitating diseases, with an early onset age, which significantly impacts patients' quality of life, health care systems, and society in general. Recent treatment options consist of using biologic therapies, and establishing the most beneficial option according to the patients' characteristics is a challenge that needs to be overcome. Meanwhile, the emerging availability of electronic medical records has made necessary the development of methods that can extract insightful information while handling all the challenges of dealing with complex, real-world data. Objective: The aim of this study was to achieve a better understanding of SpA patients' therapy responses and identify the predictors that affect them, thereby enabling the prognosis of therapy success or failure. Methods: A data mining approach based on joint models for the survival analysis of the biologic therapy failure is proposed, which considers the information of both baseline and time-varying variables extracted from the electronic medical records of SpA patients from the database, Reuma.pt. 
Results: Our results show that being a male, starting biologic therapy at an older age, having a larger time interval between disease start and initiation of the first biologic drug, and being human leukocyte antigen (HLA)-B27 positive are indicators of a good prognosis for the biological drug survival; meanwhile, having disease onset or biologic therapy initiation occur in more recent years, a larger number of education years, and higher values of C-reactive protein or Bath Ankylosing Spondylitis Functional Index (BASFI) at baseline are all predictors of a greater risk of failure of the first biologic therapy. Conclusions: Among this Portuguese subpopulation of SpA patients, those who were male, HLA-B27 positive, and with a later biologic therapy starting date or a larger time interval between disease start and initiation of the first biologic therapy showed longer therapy adherence. Joint models proved to be a valuable tool for the analysis of electronic medical records in the field of rheumatic diseases and may allow for the identification of potential predictors of biologic therapy failure. UR - https://medinform.jmir.org/2021/7/e26823 UR - http://dx.doi.org/10.2196/26823 UR - http://www.ncbi.nlm.nih.gov/pubmed/34328435 ID - info:doi/10.2196/26823 ER - TY - JOUR AU - Ahuja, Manik AU - Aseltine Jr, Robert PY - 2021/7/13 TI - Barriers to Dissemination of Local Health Data Faced by US State Agencies: Survey Study of Behavioral Risk Factor Surveillance System Coordinators JO - J Med Internet Res SP - e16750 VL - 23 IS - 7 KW - web-based data query systems, WDQS KW - health data KW - population health KW - dissemination of local health data N2 - Background: Advances in information technology have paved the way to facilitate accessibility to population-level health data through web-based data query systems (WDQSs). Despite these advances in technology, US state agencies face many challenges related to the dissemination of their local health data. 
It is essential for the public to have access to high-quality data that are easy to interpret, reliable, and trusted. These challenges have been at the forefront throughout the COVID-19 pandemic. Objective: The purpose of this study is to identify the most significant challenges faced by state agencies, from the perspective of the Behavioral Risk Factor Surveillance System (BRFSS) coordinator from each state, and to assess if the coordinators from states with a WDQS perceive these challenges differently. Methods: We surveyed BRFSS coordinators (N=43) across all 50 US states and the District of Columbia. We surveyed the participants about contextual factors and asked them to rate system aspects and challenges they faced with their health data system on a Likert scale. We used two-sample t tests to compare the means of the ratings by participants from states with and without a WDQS. Results: Overall, 41/43 states (95%) make health data available over the internet, while 65% (28/43) employ a WDQS. States with a WDQS reported greater challenges (P=.01) related to the cost of hardware and software (mean score 3.44/4, 95% CI 3.09-3.78) than states without a WDQS (mean score 2.63/4, 95% CI 2.25-3.00). The system aspect of standardization of vocabulary scored more favorably (P=.01) in states with a WDQS (mean score 3.32/5, 95% CI 2.94-3.69) than in states without a WDQS (mean score 2.85/5, 95% CI 2.47-3.22). Conclusions: Securing of adequate resources and commitment to standardization are vital in the dissemination of local-level health data. Factors such as receiving data in a timely manner, privacy, and political opposition are less significant barriers than anticipated. 
UR - https://www.jmir.org/2021/7/e16750 UR - http://dx.doi.org/10.2196/16750 UR - http://www.ncbi.nlm.nih.gov/pubmed/34255650 ID - info:doi/10.2196/16750 ER - TY - JOUR AU - Viberg Johansson, Jennifer AU - Bentzen, Beate Heidi AU - Shah, Nisha AU - Haraldsdóttir, Eik AU - Jónsdóttir, Andrea Guðbjörg AU - Kaye, Jane AU - Mascalzoni, Deborah AU - Veldwijk, Jorien PY - 2021/7/5 TI - Preferences of the Public for Sharing Health Data: Discrete Choice Experiment JO - JMIR Med Inform SP - e29614 VL - 9 IS - 7 KW - preferences KW - discrete choice experiment KW - health data KW - secondary use KW - willingness to share N2 - Background: Digital technological development in the last 20 years has led to significant growth in digital collection, use, and sharing of health data. To maintain public trust in the digital society and to enable acceptable policy-making in the future, it is important to investigate people's preferences for sharing digital health data. Objective: The aim of this study is to elicit the preferences of the public in different Northern European countries (the United Kingdom, Norway, Iceland, and Sweden) for sharing health information in different contexts. Methods: Respondents in this discrete choice experiment completed several choice tasks, in which they were asked if data sharing in the described hypothetical situation was acceptable to them. Latent class logistic regression models were used to determine attribute-level estimates and heterogeneity in preferences. We calculated the relative importance of the attributes and the predicted acceptability for different contexts in which the data were shared from the estimates. Results: In the final analysis, we used 37.83% (1967/5199) of the questionnaires. All attributes influenced the respondents' willingness to share health information (P<.001). The most important attribute was whether the respondents were informed about their data being shared. 
The possibility of opting out from sharing data was preferred over the opportunity to consent (opt-in). Four classes were identified in the latent class model, and the average probabilities of belonging were 27% for class 1, 32% for class 2, 23% for class 3, and 18% for class 4. The uptake probability varied between 14% and 85%, depending on the least to most preferred combination of levels. Conclusions: Respondents from different countries have different preferences for sharing their health data regarding the value of a review process and the reason for their new use. Offering respondents information about the use of their data and the possibility to opt out is the most preferred governance mechanism. UR - https://medinform.jmir.org/2021/7/e29614 UR - http://dx.doi.org/10.2196/29614 UR - http://www.ncbi.nlm.nih.gov/pubmed/36260402 ID - info:doi/10.2196/29614 ER - TY - JOUR AU - Schmit, Cason AU - Giannouchos, Theodoros AU - Ramezani, Mahin AU - Zheng, Qi AU - Morrisey, A. Michael AU - Kum, Hye-Chung PY - 2021/7/5 TI - US Privacy Laws Go Against Public Preferences and Impede Public Health and Research: Survey Study JO - J Med Internet Res SP - e25266 VL - 23 IS - 7 KW - privacy KW - law KW - medical informatics KW - conjoint analysis KW - surveys and questionnaires KW - public health KW - information dissemination KW - health policy KW - public policy KW - big data N2 - Background: Reaping the benefits from massive volumes of data collected in all sectors to improve population health, inform personalized medicine, and transform biomedical research requires the delicate balance between the benefits and risks of using individual-level data. There is a patchwork of US data protection laws that vary depending on the type of data, who is using it, and their intended purpose. Differences in these laws challenge big data projects using data from different sources. 
The decisions to permit or restrict data uses are determined by elected officials; therefore, constituent input is critical to finding the right balance between individual privacy and public benefits. Objective: This study explores the US public's preferences for using identifiable data for different purposes without their consent. Methods: We measured data use preferences of a nationally representative sample of 504 US adults by conducting a web-based survey in February 2020. The survey used a choice-based conjoint analysis. We selected choice-based conjoint attributes and levels based on 5 US data protection laws (Health Insurance Portability and Accountability Act, Family Educational Rights and Privacy Act, Privacy Act of 1974, Federal Trade Commission Act, and the Common Rule). There were 72 different combinations of attribute levels, representing different data use scenarios. Participants were given 12 pairs of data use scenarios and were asked to choose the scenario they were the most comfortable with. We then simulated the population preferences by using the hierarchical Bayes regression model using the ChoiceModelR package in R. Results: Participants strongly preferred data reuse for public health and research than for profit-driven, marketing, or crime-detection activities. Participants also strongly preferred data use by universities or nonprofit organizations over data use by businesses and governments. Participants were fairly indifferent about the different types of data used (health, education, government, or economic data). Conclusions: Our results show a notable incongruence between public preferences and current US data protection laws. Our findings appear to show that the US public favors data uses promoting social benefits over those promoting individual or organizational interests. This study provides strong support for continued efforts to provide safe access to useful data sets for research and public health. 
Policy makers should consider more robust public health and research data use exceptions to align laws with public preferences. In addition, policy makers who revise laws to enable data use for research and public health should consider more comprehensive protection mechanisms, including transparent use of data and accountability. UR - https://www.jmir.org/2021/7/e25266 UR - http://dx.doi.org/10.2196/25266 UR - http://www.ncbi.nlm.nih.gov/pubmed/36260399 ID - info:doi/10.2196/25266 ER - TY - JOUR AU - Gaudet-Blavignac, Christophe AU - Raisaro, Louis Jean AU - Touré, Vasundra AU - Österle, Sabine AU - Crameri, Katrin AU - Lovis, Christian PY - 2021/6/24 TI - A National, Semantic-Driven, Three-Pillar Strategy to Enable Health Data Secondary Usage Interoperability for Research Within the Swiss Personalized Health Network: Methodological Study JO - JMIR Med Inform SP - e27591 VL - 9 IS - 6 KW - interoperability KW - clinical data reuse KW - personalized medicine N2 - Background: Interoperability is a well-known challenge in medical informatics. Current trends in interoperability have moved from a data model technocentric approach to sustainable semantics, formal descriptive languages, and processes. Despite many initiatives and investments for decades, the interoperability challenge remains crucial. The need for data sharing for most purposes ranging from patient care to secondary uses, such as public health, research, and quality assessment, faces unmet problems. Objective: This work was performed in the context of a large Swiss Federal initiative aiming at building a national infrastructure for reusing consented data acquired in the health care and research system to enable research in the field of personalized medicine in Switzerland. The initiative is the Swiss Personalized Health Network (SPHN). This initiative is providing funding to foster use and exchange of health-related data for research. 
As part of the initiative, a national strategy to enable a semantically interoperable clinical data landscape was developed and implemented. Methods: A deep analysis of various approaches to address interoperability was performed at the start, including large frameworks in health care, such as Health Level Seven (HL7) and Integrating Healthcare Enterprise (IHE), and in several domains, such as regulatory agencies (eg, Clinical Data Interchange Standards Consortium [CDISC]) and research communities (eg, Observational Medical Outcome Partnership [OMOP]), to identify bottlenecks and assess sustainability. Based on this research, a strategy composed of three pillars was designed. It has strong multidimensional semantics, descriptive formal language for exchanges, and as many data models as needed to comply with the needs of various communities. Results: This strategy has been implemented stepwise in Switzerland since the middle of 2019 and has been adopted by all university hospitals and high research organizations. The initiative is coordinated by a central organization, the SPHN Data Coordination Center of the SIB Swiss Institute of Bioinformatics. The semantics is mapped by domain experts on various existing standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), Logical Observation Identifiers Names and Codes (LOINC), and International Classification of Diseases (ICD). The resource description framework (RDF) is used for storing and transporting data, and to integrate information from different sources and standards. Data transformers based on SPARQL query language are implemented to convert RDF representations to the numerous data models required by the research community or bridge with other systems, such as electronic case report forms. Conclusions: The SPHN strategy successfully implemented existing standards in a pragmatic and applicable way. It did not try to build any new standards but used existing ones in a nondogmatic way. 
It has now been funded for another 4 years, bringing the Swiss landscape into a new dimension to support research in the field of personalized medicine and large interoperable clinical data. UR - https://medinform.jmir.org/2021/6/e27591/ UR - http://dx.doi.org/10.2196/27591 UR - http://www.ncbi.nlm.nih.gov/pubmed/34185008 ID - info:doi/10.2196/27591 ER - TY - JOUR AU - van Allen, Zack AU - Bacon, L. Simon AU - Bernard, Paquito AU - Brown, Heather AU - Desroches, Sophie AU - Kastner, Monika AU - Lavoie, Kim AU - Marques, Marta AU - McCleary, Nicola AU - Straus, Sharon AU - Taljaard, Monica AU - Thavorn, Kednapa AU - Tomasone, R. Jennifer AU - Presseau, Justin PY - 2021/6/11 TI - Clustering of Unhealthy Behaviors: Protocol for a Multiple Behavior Analysis of Data From the Canadian Longitudinal Study on Aging JO - JMIR Res Protoc SP - e24887 VL - 10 IS - 6 KW - health behaviors KW - multiple behaviors KW - cluster analysis KW - network analysis KW - CLSA N2 - Background: Health behaviors such as physical inactivity, unhealthy eating, smoking tobacco, and alcohol use are leading risk factors for noncommunicable chronic diseases and play a central role in limiting health and life satisfaction. To date, however, health behaviors tend to be considered separately from one another, resulting in guidelines and interventions for healthy aging siloed by specific behaviors and often focused only on a given health behavior without considering the co-occurrence of family, social, work, and other behaviors of everyday life. Objective: The aim of this study is to understand how behaviors cluster and how such clusters are associated with physical and mental health, life satisfaction, and health care utilization, as this understanding may provide opportunities to leverage this co-occurrence to develop and evaluate interventions to promote multiple health behavior changes. 
Methods: Using cross-sectional baseline data from the Canadian Longitudinal Study on Aging, we will perform a predefined set of exploratory and hypothesis-generating analyses to examine the co-occurrence of health and everyday life behaviors. We will use agglomerative hierarchical cluster analysis to cluster individuals based on their behavioral tendencies. Multinomial logistic regression will then be used to model the relationships between clusters and demographic indicators, health care utilization, and general health and life satisfaction, and assess whether sex and age moderate these relationships. In addition, we will conduct network community detection analysis using the clique percolation algorithm to detect overlapping communities of behaviors based on the strength of relationships between variables. Results: Baseline data for the Canadian Longitudinal Study on Aging were collected from 51,338 participants aged between 45 and 85 years. Data were collected between 2010 and 2015. Secondary data analysis for this project was approved by the Ottawa Health Science Network Research Ethics Board (protocol ID #20190506-01H). Conclusions: This study will help to inform the development of interventions tailored to subpopulations of adults (eg, physically inactive smokers) defined by the multiple behaviors that describe their everyday life experiences. 
International Registered Report Identifier (IRRID): DERR1-10.2196/24887 UR - https://www.researchprotocols.org/2021/6/e24887 UR - http://dx.doi.org/10.2196/24887 UR - http://www.ncbi.nlm.nih.gov/pubmed/34114962 ID - info:doi/10.2196/24887 ER - TY - JOUR AU - Surodina, Svitlana AU - Lam, Ching AU - Grbich, Svetislav AU - Milne-Ives, Madison AU - van Velthoven, Michelle AU - Meinert, Edward PY - 2021/6/11 TI - Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study JO - JMIRx Med SP - e25560 VL - 2 IS - 2 KW - data collection KW - herpes simplex virus KW - registries KW - machine learning KW - risk assessment KW - artificial intelligence KW - medical information system KW - user-centered design KW - predictor KW - risk N2 - Background: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. Objective: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. Methods: The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants (14-49 years) and a wealth of demographic and health-related data. The questionnaires and data sets from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a model that used a random forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. 
Results: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input. Conclusions: This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals' levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements. UR - https://xmed.jmir.org/2021/2/e25560 UR - http://dx.doi.org/10.2196/25560 UR - http://www.ncbi.nlm.nih.gov/pubmed/37725536 ID - info:doi/10.2196/25560 ER - TY - JOUR AU - Blitz, Rogério AU - Storck, Michael AU - Baune, T. Bernhard AU - Dugas, Martin AU - Opel, Nils PY - 2021/6/9 TI - Design and Implementation of an Informatics Infrastructure for Standardized Data Acquisition, Transfer, Storage, and Export in Psychiatric Clinical Routine: Feasibility Study JO - JMIR Ment Health SP - e26681 VL - 8 IS - 6 KW - medical informatics KW - digital mental health KW - digital data collection KW - psychiatry KW - single-source metadata architecture transformation KW - mental health KW - design KW - implementation KW - feasibility KW - informatics KW - infrastructure KW - data N2 - Background: Empirically driven personalized diagnostic applications and treatment stratification are widely perceived as a major hallmark in psychiatry. 
However, data-based personalized decision making requires standardized data acquisition and data access, which are currently absent in psychiatric clinical routine. Objective: Here, we describe the informatics infrastructure implemented at the psychiatric Münster University Hospital, which allows standardized acquisition, transfer, storage, and export of clinical data for future real-time predictive modelling in psychiatric routine. Methods: We designed and implemented a technical architecture that includes an extension of the electronic health record (EHR) via scalable standardized data collection and data transfer between EHRs and research databases, thus allowing the pooling of EHRs and research data in a unified database and technical solutions for the visual presentation of collected data and analysis results in the EHR. The Single-source Metadata ARchitecture Transformation (SMA:T) was used as the software architecture. SMA:T is an extension of the EHR system and uses module-driven engineering to generate standardized applications and interfaces. The operational data model was used as the standard. Standardized data were entered on iPads via the Mobile Patient Survey (MoPat) and the web application Mopat@home, and the standardized transmission, processing, display, and export of data were realized via SMA:T. Results: The technical feasibility of the informatics infrastructure was demonstrated in the course of this study. We created 19 standardized documentation forms with 241 items. For 317 patients, 6451 instances were automatically transferred to the EHR system without errors. Moreover, 96,323 instances were automatically transferred from the EHR system to the research database for further analyses. Conclusions: In this study, we present the successful implementation of the informatics infrastructure enabling standardized data acquisition and data access for future real-time predictive modelling in clinical routine in psychiatry. 
The technical solution presented here might guide similar initiatives at other sites and thus help to pave the way toward future application of predictive models in psychiatric clinical routine. UR - https://mental.jmir.org/2021/6/e26681 UR - http://dx.doi.org/10.2196/26681 UR - http://www.ncbi.nlm.nih.gov/pubmed/34106072 ID - info:doi/10.2196/26681 ER - TY - JOUR AU - Castro, A. Lauren AU - Shelley, D. Courtney AU - Osthus, Dave AU - Michaud, Isaac AU - Mitchell, Jason AU - Manore, A. Carrie AU - Del Valle, Y. Sara PY - 2021/6/9 TI - How New Mexico Leveraged a COVID-19 Case Forecasting Model to Preemptively Address the Health Care Needs of the State: Quantitative Analysis JO - JMIR Public Health Surveill SP - e27888 VL - 7 IS - 6 KW - COVID-19 KW - forecasting KW - health care KW - prediction KW - forecast KW - model KW - quantitative KW - hospital KW - ICU KW - ventilator KW - intensive care unit KW - probability KW - trend KW - plan N2 - Background: Prior to the COVID-19 pandemic, US hospitals relied on static projections of future trends for long-term planning and were only beginning to consider forecasting methods for short-term planning of staffing and other resources. With the overwhelming burden imposed by COVID-19 on the health care system, an emergent need exists to accurately forecast hospitalization needs within an actionable timeframe. Objective: Our goal was to leverage an existing COVID-19 case and death forecasting tool to generate the expected number of concurrent hospitalizations, occupied intensive care unit (ICU) beds, and in-use ventilators 1 day to 4 weeks in the future for New Mexico and each of its five health regions. 
Methods: We developed a probabilistic model that took as input the number of new COVID-19 cases for New Mexico from Los Alamos National Laboratory's COVID-19 Forecasts Using Fast Evaluations and Estimation tool, and we used the model to estimate the number of new daily hospital admissions 4 weeks into the future based on current statewide hospitalization rates. The model estimated the number of new admissions that would require an ICU bed or use of a ventilator and then projected the individual lengths of hospital stays based on the resource need. By tracking the lengths of stay through time, we captured the projected simultaneous need for inpatient beds, ICU beds, and ventilators. We used a postprocessing method to adjust the forecasts based on the differences between prior forecasts and the subsequent observed data. Thus, we ensured that our forecasts could reflect a dynamically changing situation on the ground. Results: Forecasts made between September 1 and December 9, 2020, showed variable accuracy across time, health care resource needs, and forecast horizon. Forecasts made in October, when new COVID-19 cases were steadily increasing, had an average accuracy error of 20.0%, while the error in forecasts made in September, a month with low COVID-19 activity, was 39.7%. Across health care use categories, state-level forecasts were more accurate than those at the regional level. Although the accuracy declined as the forecast was projected further into the future, the stated uncertainty of the prediction improved. Forecasts were within 5% of their stated uncertainty at the 50% and 90% prediction intervals at the 3- to 4-week forecast horizon for state-level inpatient and ICU needs. However, uncertainty intervals were too narrow for forecasts of state-level ventilator need and all regional health care resource needs. 
Conclusions: Real-time forecasting of the burden imposed by a spreading infectious disease is a crucial component of decision support during a public health emergency. Our proposed methodology demonstrated utility in providing near-term forecasts, particularly at the state level. This tool can aid other stakeholders as they face COVID-19 population impacts now and in the future. UR - https://publichealth.jmir.org/2021/6/e27888 UR - http://dx.doi.org/10.2196/27888 UR - http://www.ncbi.nlm.nih.gov/pubmed/34003763 ID - info:doi/10.2196/27888 ER - TY - JOUR AU - Jungkunz, Martin AU - Köngeter, Anja AU - Mehlis, Katja AU - Winkler, C. Eva AU - Schickhardt, Christoph PY - 2021/6/8 TI - Secondary Use of Clinical Data in Data-Gathering, Non-Interventional Research or Learning Activities: Definition, Types, and a Framework for Risk Assessment JO - J Med Internet Res SP - e26631 VL - 23 IS - 6 KW - secondary use KW - risk assessment KW - clinical data KW - ethics KW - risk factors KW - risks KW - privacy KW - electronic health records KW - research KW - patient data N2 - Background: The secondary use of clinical data in data-gathering, non-interventional research or learning activities (SeConts) has great potential for scientific progress and health care improvement. At the same time, it poses relevant risks for the privacy and informational self-determination of patients whose data are used. Objective: Since the current literature lacks a tailored framework for risk assessment in SeConts as well as a clarification of the concept and practical scope of SeConts, we aim to fill this gap. 
Methods: In this study, we analyze each element of the concept of SeConts to provide a synthetic definition, investigate the practical relevance and scope of SeConts through a literature review, and operationalize the widespread definition of risk (as a harmful event of a certain magnitude that occurs with a certain probability) to conduct a tailored analysis of privacy risk factors typically implied in SeConts. Results: We offer a conceptual clarification and definition of SeConts and provide a list of types of research and learning activities that can be subsumed under the definition of SeConts. We also offer a proposal for the classification of SeConts types into the categories non-interventional (observational) clinical research, quality control and improvement, or public health research. In addition, we provide a list of risk factors that determine the probability or magnitude of harm implied in SeConts. The risk factors provide a framework for assessing the privacy-related risks for patients implied in SeConts. We illustrate the use of risk assessment by applying it to a concrete example. Conclusions: In the future, research ethics committees and data use and access committees will be able to rely on and apply the framework offered here when reviewing projects of secondary use of clinical data for learning and research purposes. 
UR - https://www.jmir.org/2021/6/e26631 UR - http://dx.doi.org/10.2196/26631 UR - http://www.ncbi.nlm.nih.gov/pubmed/34100760 ID - info:doi/10.2196/26631 ER - TY - JOUR AU - Oh, SeHee AU - Sung, MinDong AU - Rhee, Yumie AU - Hong, Namki AU - Park, Rang Yu PY - 2021/5/31 TI - Evaluation of the Privacy Risks of Personal Health Identifiers and Quasi-Identifiers in a Distributed Research Network: Development and Validation Study JO - JMIR Med Inform SP - e24940 VL - 9 IS - 5 KW - distributed research network KW - Observational Medical Outcomes Partnership common data model KW - privacy risk quantification KW - personal health identifier KW - quasi-identifier N2 - Background: Privacy should be protected in medical data that include patient information. A distributed research network (DRN) is one of the challenges in privacy protection and in the encouragement of multi-institutional clinical research. A DRN standardizes multi-institutional data into a common structure and terminology called a common data model (CDM), and it only shares analysis results. It is necessary to measure how a DRN protects patient information privacy even without sharing data in practice. Objective: This study aimed to quantify the privacy risk of a DRN by comparing different deidentification levels focusing on personal health identifiers (PHIs) and quasi-identifiers (QIs). Methods: We detected PHIs and QIs in an Observational Medical Outcomes Partnership (OMOP) CDM as threatening privacy, based on 18 Health Insurance Portability and Accountability Act of 1996 (HIPAA) identifiers and previous studies. To compare the privacy risk according to the different privacy policies, we generated limited and safe harbor data sets based on 16 PHIs and 12 QIs as threatening privacy from the Synthetic Public Use File 5 Percent (SynPUF5PCT) data set, which is a public data set of the OMOP CDM. 
With minimum cell size and equivalence class methods, we measured the privacy risk reduction with a trust differential gap obtained by comparing the two data sets. We also measured the gap in randomly sampled records from the two data sets to adjust the number of PHI or QI records. Results: The gaps averaged 31.448% and 73.798% for PHIs and QIs, respectively, with a minimum cell size of one, which represents a unique record in a data set. Among PHIs, the national provider identifier had the highest gap of 71.236% (71.244% and 0.007% in the limited and safe harbor data sets, respectively). The maximum size of the equivalence class, which has the largest size of an indistinguishable set of records, averaged 771. In 1000 random samples of PHIs, Device_exposure_start_date had the highest gap of 33.730% (87.705% and 53.975% in the data sets). Among QIs, Death had the highest gap of 99.212% (99.997% and 0.784% in the data sets). In 1000, 10,000, and 100,000 random samples of QIs, Device_treatment had the highest gaps of 12.980% (99.980% and 87.000% in the data sets), 60.118% (99.831% and 39.713%), and 93.597% (98.805% and 5.207%), respectively, and in 1 million random samples, Death had the highest gap of 99.063% (99.998% and 0.934% in the data sets). Conclusions: In this study, we verified and quantified the privacy risk of PHIs and QIs in the DRN. Although this study used limited PHIs and QIs for verification, the privacy limitations found in this study could be used as a quality measurement index for deidentification of multi-institutional collaboration research, thereby increasing DRN safety. UR - https://medinform.jmir.org/2021/5/e24940 UR - http://dx.doi.org/10.2196/24940 UR - http://www.ncbi.nlm.nih.gov/pubmed/34057426 ID - info:doi/10.2196/24940 ER - TY - JOUR AU - Zong, Nansu AU - Ngo, Victoria AU - Stone, J. 
Daniel AU - Wen, Andrew AU - Zhao, Yiqing AU - Yu, Yue AU - Liu, Sijia AU - Huang, Ming AU - Wang, Chen AU - Jiang, Guoqian PY - 2021/5/25 TI - Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study JO - JMIR Med Inform SP - e23586 VL - 9 IS - 5 KW - genetic reports KW - electronic health records KW - predicting primary cancers KW - Fast Healthcare Interoperability Resources KW - FHIR KW - Resource Description Framework KW - RDF N2 - Background: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. Objective: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries. Methods: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic's electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, the Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. 
Results: With 6 machine learning tasks designed in the experiment, we demonstrated the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation). To demonstrate the interpretability, 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review. Conclusions: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational values of incorporating genetic tests early at the diagnosis stage for patients with cancer. UR - https://medinform.jmir.org/2021/5/e23586 UR - http://dx.doi.org/10.2196/23586 UR - http://www.ncbi.nlm.nih.gov/pubmed/34032581 ID - info:doi/10.2196/23586 ER - TY - JOUR AU - Alhassan, Zakhriya AU - Watson, Matthew AU - Budgen, David AU - Alshammari, Riyad AU - Alessa, Ali AU - Al Moubayed, Noura PY - 2021/5/24 TI - Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records JO - JMIR Med Inform SP - e25237 VL - 9 IS - 5 KW - glycated hemoglobin HbA1c KW - prediction KW - machine learning KW - deep learning KW - neural network KW - multilayer perceptron KW - electronic health records KW - time series data KW - longitudinal data KW - diabetes N2 - Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. 
Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes. Objective: Our study investigated the performance of predictive models to forecast HbA1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models. Methods: This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron) to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records. Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an area under the receiver operating characteristic curve of 83.22% when used with historical data. All models showed a close level of agreement on the contribution of random blood sugar and age variables with and without longitudinal data. Conclusions: This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7% or less). Using patients' 
longitudinal data improved the performance and affected the relative importance for the predictors used. The models showed results that are consistent with comparable studies. UR - https://medinform.jmir.org/2021/5/e25237 UR - http://dx.doi.org/10.2196/25237 UR - http://www.ncbi.nlm.nih.gov/pubmed/34028357 ID - info:doi/10.2196/25237 ER - TY - JOUR AU - Lee, Haeyun AU - Chai, Jun Young AU - Joo, Hyunjin AU - Lee, Kyungsu AU - Hwang, Youn Jae AU - Kim, Seok-Mo AU - Kim, Kwangsoon AU - Nam, Inn-Chul AU - Choi, Young June AU - Yu, Won Hyeong AU - Lee, Myung-Chul AU - Masuoka, Hiroo AU - Miyauchi, Akira AU - Lee, Eun Kyu AU - Kim, Sungwan AU - Kong, Hyoun-Joong PY - 2021/5/18 TI - Federated Learning for Thyroid Ultrasound Image Analysis to Protect Personal Information: Validation Study in a Real Health Care Environment JO - JMIR Med Inform SP - e25869 VL - 9 IS - 5 KW - deep learning KW - federated learning KW - thyroid nodules KW - ultrasound image N2 - Background: Federated learning is a decentralized approach to machine learning; it is a training strategy that overcomes medical data privacy regulations and generalizes deep learning algorithms. Federated learning mitigates many systemic privacy risks by sharing only the model and parameters for training, without the need to export existing medical data sets. In this study, we performed ultrasound image analysis using federated learning to predict whether thyroid nodules were benign or malignant. Objective: The goal of this study was to evaluate whether the performance of federated learning was comparable with that of conventional deep learning. Methods: A total of 8457 (5375 malignant, 3082 benign) ultrasound images were collected from 6 institutions and used for federated learning and conventional deep learning. Five deep learning networks (VGG19, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) were used. 
Using stratified random sampling, we selected 20% (1075 malignant, 616 benign) of the total images for internal validation. For external validation, we used 100 ultrasound images (50 malignant, 50 benign) from another institution. Results: For internal validation, the area under the receiver operating characteristic (AUROC) curve for federated learning was between 78.88% and 87.56%, and the AUROC for conventional deep learning was between 82.61% and 91.57%. For external validation, the AUROC for federated learning was between 75.20% and 86.72%, and the AUROC curve for conventional deep learning was between 73.04% and 91.04%. Conclusions: We demonstrated that the performance of federated learning using decentralized data was comparable to that of conventional deep learning using pooled data. Federated learning might be potentially useful for analyzing medical images while protecting patients' personal information. UR - https://medinform.jmir.org/2021/5/e25869 UR - http://dx.doi.org/10.2196/25869 UR - http://www.ncbi.nlm.nih.gov/pubmed/33858817 ID - info:doi/10.2196/25869 ER - TY - JOUR AU - Taushanov, Zhivko AU - Verloo, Henk AU - Wernli, Boris AU - Di Giovanni, Saviana AU - von Gunten, Armin AU - Pereira, Filipa PY - 2021/5/11 TI - Transforming a Patient Registry Into a Customized Data Set for the Advanced Statistical Analysis of Health Risk Factors and for Medication-Related Hospitalization Research: Retrospective Hospital Patient Registry Study JO - JMIR Med Inform SP - e24205 VL - 9 IS - 5 KW - cluster analysis KW - hierarchical 2-step clustering KW - registry KW - raw data KW - hospital KW - retrospective KW - population based KW - multidimensional N2 - Background: Hospital patient registries provide substantial longitudinal data sets describing the clinical and medical health statuses of inpatients and their pharmacological prescriptions. 
Despite the multiple advantages of routinely collecting multidimensional longitudinal data, those data sets are rarely suitable for advanced statistical analysis and they require customization and synthesis. Objective: The aim of this study was to describe the methods used to transform and synthesize a raw, multidimensional, hospital patient registry data set into an exploitable database for the further investigation of risk profiles and predictive and survival health outcomes among polymorbid, polymedicated, older inpatients in relation to their medicine prescriptions at hospital discharge. Methods: A raw, multidimensional data set from a public hospital was extracted from the hospital registry in a CSV (.csv) file and imported into the R statistical package for cleaning, customization, and synthesis. Patients fulfilling the criteria for inclusion were home-dwelling, polymedicated, older adults with multiple chronic conditions aged ≥65 years who became hospitalized. The patient data set covered 140 variables from 20,422 hospitalizations of polymedicated, home-dwelling older adults from 2015 to 2018. Each variable, according to type, was explored and computed to describe distributions, missing values, and associations. Different clustering methods, expert opinion, recoding, and missing-value techniques were used to customize and synthesize these multidimensional data sets. Results: Sociodemographic data showed no missing values. Average age, hospital length of stay, and frequency of hospitalization were computed. Discharge details were recoded and summarized. Clinical data were cleaned up and best practices for managing missing values were applied. Seven clusters of medical diagnoses, surgical interventions, somatic, cognitive, and medicines data were extracted using empirical and statistical best practices, with each presenting the health status of the patients included in it as accurately as possible. Medical, comorbidity, and drug data were recoded and summarized. 
Conclusions: A cleaner, better-structured data set was obtained, combining empirical and best-practice statistical approaches. The overall strategy delivered an exploitable, population-based database suitable for an advanced analysis of the descriptive, predictive, and survival statistics relating to polymedicated, home-dwelling older adults admitted as inpatients. More research is needed to develop best practices for customizing and synthesizing large, multidimensional, population-based registries. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-030030 UR - https://medinform.jmir.org/2021/5/e24205 UR - http://dx.doi.org/10.2196/24205 UR - http://www.ncbi.nlm.nih.gov/pubmed/33973865 ID - info:doi/10.2196/24205 ER - TY - JOUR AU - Chen, Hong AU - Yu, Ping AU - Hailey, David AU - Cui, Tingru PY - 2021/5/10 TI - Validation of 4D Components for Measuring Quality of the Public Health Data Collection Process: Elicitation Study JO - J Med Internet Res SP - e17240 VL - 23 IS - 5 KW - data quality KW - data collection KW - HIV/AIDS KW - public health informatics KW - health information systems KW - component validation KW - expert elicitation KW - public health KW - health informatics N2 - Background: Identification of the essential components of the quality of the data collection process is the starting point for designing effective data quality management strategies for public health information systems. An inductive analysis of the global literature on the quality of the public health data collection process has led to the formation of a preliminary 4D component framework, that is, data collection management, data collection personnel, data collection system, and data collection environment. It is necessary to empirically validate the framework for its use in future research and practice. Objective: This study aims to obtain empirical evidence to confirm the components of the framework and, if needed, to further develop this framework. 
Methods: Expert elicitation was used to evaluate the preliminary framework in the context of the Chinese National HIV/AIDS Comprehensive Response Information Management System. The research processes included the development of an interview guide and data collection form, data collection, and analysis. A total of 3 public health administrators, 15 public health workers, and 10 health care practitioners participated in the elicitation session. A framework qualitative data analysis approach and a quantitative comparative analysis were followed to elicit themes from the interview transcripts and to map them to the elements of the preliminary 4D framework. Results: A total of 302 codes were extracted from interview transcripts. After iterative and recursive comparison, classification, and mapping, 46 new indicators emerged; 24.8% (37/149) of the original indicators were deleted because of a lack of evidence support and another 28.2% (42/149) were merged. The validated 4D component framework consists of 116 indicators (82 facilitators and 34 barriers). The first component, data collection management, includes data collection protocols and quality assurance. It was measured by 41 indicators, decreased from the original 49% (73/149) to 35.3% (41/116). The second component, data collection environment, was measured by 37 indicators, increased from the original 13.4% (20/149) to 31.9% (37/116). It comprised leadership, training, funding, organizational policy, high-level management support, and collaboration among parallel organizations. The third component, data collection personnel, includes the perception of data collection, skills and competence, communication, and staffing patterns. There was no change in the proportion for data collection personnel (19.5% vs 19.0%), although the number of its indicators was reduced from 29 to 22. 
The fourth component, the data collection system, was measured using 16 indicators, with a slight decrease in percentage points from 18.1% (27/149) to 13.8% (16/116). It comprised functions, system integration, technical support, and data collection devices. Conclusions: This expert elicitation study validated and improved the 4D framework. The framework can be useful in developing a questionnaire survey instrument for measuring the quality of the public health data collection process after validation of psychometric properties and item reduction. UR - https://www.jmir.org/2021/5/e17240 UR - http://dx.doi.org/10.2196/17240 UR - http://www.ncbi.nlm.nih.gov/pubmed/33970112 ID - info:doi/10.2196/17240 ER - TY - JOUR AU - Her, Qoua AU - Kent, Thomas AU - Samizo, Yuji AU - Slavkovic, Aleksandra AU - Vilk, Yury AU - Toh, Sengwee PY - 2021/4/23 TI - Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study JO - JMIR Med Inform SP - e21459 VL - 9 IS - 4 KW - distributed regression analysis KW - distributed data networks KW - privacy-protecting analytics KW - vertically partitioned data KW - informatics KW - data networks KW - data N2 - Background: In clinical research, important variables may be collected from multiple data sources. Physical pooling of patient-level data from multiple sources often raises several challenges, including proper protection of patient privacy and proprietary interests. We previously developed an SAS-based package to perform distributed regression?a suite of privacy-protecting methods that perform multivariable-adjusted regression analysis using only summary-level information?with horizontally partitioned data, a setting where distinct cohorts of patients are available from different data sources. 
We integrated the package with PopMedNet, an open-source file transfer software, to facilitate secure file transfer between the analysis center and the data-contributing sites. The feasibility of using PopMedNet to facilitate distributed regression analysis (DRA) with vertically partitioned data, a setting where the data attributes from a cohort of patients are available from different data sources, was unknown. Objective: The objective of the study was to describe the feasibility of using PopMedNet and enhancements to PopMedNet to facilitate automatable vertical DRA (vDRA) in real-world settings. Methods: We gathered the statistical and informatic requirements of using PopMedNet to facilitate automatable vDRA. We enhanced PopMedNet based on these requirements to improve its technical capability to support vDRA. Results: PopMedNet can enable automatable vDRA. We identified and implemented two enhancements to PopMedNet that improved its technical capability to perform automatable vDRA in real-world settings. The first was the ability to simultaneously upload and download multiple files, and the second was the ability to directly transfer summary-level information between the data-contributing sites without a third-party analysis center. Conclusions: PopMedNet can be used to facilitate automatable vDRA to protect patient privacy and support clinical research in real-world settings. UR - https://medinform.jmir.org/2021/4/e21459 UR - http://dx.doi.org/10.2196/21459 UR - http://www.ncbi.nlm.nih.gov/pubmed/33890866 ID - info:doi/10.2196/21459 ER - TY - JOUR AU - Domínguez-Olmedo, L. 
Juan AU - Gragera-Martínez, Álvaro AU - Mata, Jacinto AU - Pachón Álvarez, Victoria PY - 2021/4/14 TI - Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation JO - J Med Internet Res SP - e26211 VL - 23 IS - 4 KW - COVID-19 KW - electronic health record KW - machine learning KW - mortality KW - prediction N2 - Background: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain's health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate, and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others. Objective: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model to predict the severity of infection and mortality from among clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. Methods: After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. Results: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected.
On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. Conclusions: Our predictive model yielded excellent results in differentiating patients who died of COVID-19, primarily from laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality. UR - https://www.jmir.org/2021/4/e26211 UR - http://dx.doi.org/10.2196/26211 UR - http://www.ncbi.nlm.nih.gov/pubmed/33793407 ID - info:doi/10.2196/26211 ER - TY - JOUR AU - Borges do Nascimento, Júnior Israel AU - Marcolino, Soriano Milena AU - Abdulazeem, Mohamed Hebatullah AU - Weerasekara, Ishanka AU - Azzopardi-Muscat, Natasha AU - Gonçalves, André Marcos AU - Novillo-Ortiz, David PY - 2021/4/13 TI - Impact of Big Data Analytics on People's Health: Overview of Systematic Reviews and Recommendations for Future Studies JO - J Med Internet Res SP - e27275 VL - 23 IS - 4 KW - public health KW - big data KW - health status KW - evidence-based medicine KW - big data analytics KW - secondary data analysis KW - machine learning KW - systematic review KW - overview KW - World Health Organization N2 - Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health.
Objective: The aim of this study was to assess the impact of the use of big data analytics on people's health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2-related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people's health. Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist. Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. "Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease" and "suicide mortality rate" were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases.
Confidence in the results was rated as "critically low" for 25 reviews, as "low" for 7 reviews, and as "moderate" for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data. Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes. Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048 UR - https://www.jmir.org/2021/4/e27275 UR - http://dx.doi.org/10.2196/27275 UR - http://www.ncbi.nlm.nih.gov/pubmed/33847586 ID - info:doi/10.2196/27275 ER - TY - JOUR AU - Staffini, Alessio AU - Svensson, Kishi Akiko AU - Chung, Ung-Il AU - Svensson, Thomas PY - 2021/4/6 TI - An Agent-Based Model of the Local Spread of SARS-CoV-2: Modeling Study JO - JMIR Med Inform SP - e24192 VL - 9 IS - 4 KW - computational epidemiology KW - COVID-19 KW - SARS-CoV-2 KW - agent-based modeling KW - public health KW - computational models KW - modeling KW - agent KW - spread KW - computation KW - epidemiology KW - policy N2 - Background: The spread of SARS-CoV-2, originating in Wuhan, China, was classified as a pandemic by the World Health Organization on March 11, 2020. The governments of affected countries have implemented various measures to limit the spread of the virus. The starting point of this paper is the different government approaches, in terms of promulgating new legislative regulations to limit the virus diffusion and to contain negative effects on the populations.
Objective: This paper aims to study how the spread of SARS-CoV-2 is linked to government policies and to analyze how different policies have produced different results on public health. Methods: Considering the official data provided by 4 countries (Italy, Germany, Sweden, and Brazil) and from the measures implemented by each government, we built an agent-based model to study the effects that these measures will have over time on different variables such as the total number of COVID-19 cases, intensive care unit (ICU) bed occupancy rates, and recovery and case-fatality rates. The model we implemented provides the possibility of modifying some starting variables, and it was thus possible to study the effects that some policies (eg, keeping the national borders closed or increasing the ICU beds) would have had on the spread of the infection. Results: The 4 considered countries have adopted different containment measures for COVID-19, and the forecasts provided by the model for the considered variables have given different results. Italy and Germany seem to be able to limit the spread of the infection and any eventual second wave, while Sweden and Brazil do not seem to have the situation under control. This situation is also reflected in the forecasts of pressure on the National Health Services, which see Sweden and Brazil with a high occupancy rate of ICU beds in the coming months, with a consequent high number of deaths. Conclusions: In line with what we expected, the obtained results showed that the countries that have taken restrictive measures in terms of limiting the population mobility have managed more successfully than others to contain the spread of COVID-19. Moreover, the model demonstrated that herd immunity cannot be reached even in countries that have relied on a strategy without strict containment measures. 
UR - https://medinform.jmir.org/2021/4/e24192 UR - http://dx.doi.org/10.2196/24192 UR - http://www.ncbi.nlm.nih.gov/pubmed/33750735 ID - info:doi/10.2196/24192 ER - TY - JOUR AU - Park, Ae Ji AU - Sung, Dong Min AU - Kim, Heon Ho AU - Park, Rang Yu PY - 2021/4/5 TI - Weight-Based Framework for Predictive Modeling of Multiple Databases With Noniterative Communication Without Data Sharing: Privacy-Protecting Analytic Method for Multi-Institutional Studies JO - JMIR Med Inform SP - e21043 VL - 9 IS - 4 KW - multi-institutional study KW - distributed data KW - data sharing KW - privacy-protecting methods N2 - Background: Securing the representativeness of study populations is crucial in biomedical research to ensure high generalizability. In this regard, using multi-institutional data have advantages in medicine. However, combining data physically is difficult as the confidential nature of biomedical data causes privacy issues. Therefore, a methodological approach is necessary when using multi-institution medical data for research to develop a model without sharing data between institutions. Objective: This study aims to develop a weight-based integrated predictive model of multi-institutional data, which does not require iterative communication between institutions, to improve average predictive performance by increasing the generalizability of the model under privacy-preserving conditions without sharing patient-level data. Methods: The weight-based integrated model generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed 3 simulations to show the weight characteristics and to determine the number of repetitions of the weight required to obtain stable values. We also conducted an experiment using real multi-institutional data to verify the developed weight-based integrated model. 
We selected 10 hospitals (2845 intensive care unit [ICU] stays in total) from the electronic intensive care unit Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model, compared with a centralized model, which was developed by combining all the data of 10 hospitals, we used proportional overlap (ie, 0.5 or less indicates a significant difference at a level of .05; and 2 indicates 2 CIs overlapping completely). Standard and Firth logistic regression models were applied for the 2 simulations and the experiment. Results: The results of these simulations indicate that the weight of each institution is determined by 2 factors (ie, the data size of each institution and how well each institutional model fits into the overall institutional data) and that repeatedly generating 200 weights is necessary per institution. In the experiment, the estimated area under the receiver operating characteristic curve (AUC) and 95% CIs were 81.36% (79.37%-83.36%) and 81.95% (80.03%-83.87%) in the centralized model and weight-based integrated model, respectively. The proportional overlap of the CIs for AUC in both the weight-based integrated model and the centralized model was approximately 1.70, and the proportional overlap of the 11 estimated odds ratios was over 1, except for 1 case. Conclusions: In the experiment where real multi-institutional data were used, our model showed similar results to the centralized model without iterative communication between institutions. In addition, our weight-based integrated model provided a weighted average model by integrating 10 models overfitted or underfitted, compared with the centralized model. The proposed weight-based integrated model is expected to provide an efficient distributed research approach as it increases the generalizability of the model and does not require iterative communication.
UR - https://medinform.jmir.org/2021/4/e21043 UR - http://dx.doi.org/10.2196/21043 UR - http://www.ncbi.nlm.nih.gov/pubmed/33818396 ID - info:doi/10.2196/21043 ER - TY - JOUR AU - Tran, Linh AU - Chi, Lianhua AU - Bonti, Alessio AU - Abdelrazek, Mohamed AU - Chen, Phoebe Yi-Ping PY - 2021/4/1 TI - Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study JO - JMIR Med Inform SP - e25000 VL - 9 IS - 4 KW - mortality KW - cardiovascular KW - medical claims data KW - imbalanced data KW - machine learning KW - deep learning N2 - Background: Cardiovascular disease (CVD) is the greatest health problem in Australia, killing more people than any other disease and incurring enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting the mortality rate of patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because we use a smaller number of features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. Objective: This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit. Methods: The data set was obtained from the Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information in the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients.
A total of five AI algorithms, including four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and a deep learning algorithm, which is a densely connected neural network (DNN), were developed and compared in this study. In addition, because of the minority of deceased patients in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data. Results: Regarding model performance, in terms of discrimination, GBT and RF were the models with the highest area under the receiver operating characteristic curve (97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), whereas DNN was the least discriminative (95.3%). In terms of reliability, LR predictions were the least calibrated compared with the other four algorithms. In this study, despite increasing the training time, SMOTE was proven to further improve the model performance of LR, whereas other algorithms, especially GBT and DNN, worked well with class imbalanced data. Conclusions: Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. 
UR - https://medinform.jmir.org/2021/4/e25000 UR - http://dx.doi.org/10.2196/25000 UR - http://www.ncbi.nlm.nih.gov/pubmed/33792549 ID - info:doi/10.2196/25000 ER - TY - JOUR AU - Park, Jimyung AU - You, Chan Seng AU - Jeong, Eugene AU - Weng, Chunhua AU - Park, Dongsu AU - Roh, Jin AU - Lee, Yun Dong AU - Cheong, Youn Jae AU - Choi, Wook Jin AU - Kang, Mira AU - Park, Woong Rae PY - 2021/3/30 TI - A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study JO - JMIR Med Inform SP - e23983 VL - 9 IS - 3 KW - natural language processing KW - search engine KW - data curation KW - data management KW - common data model N2 - Background: Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions. Objective: This study aimed to develop a framework for processing unstructured clinical documents of EHRs and integration with standardized structured data. Methods: We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinctive patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP)-common data model (CDM) database. 
The annotations were used for creating Cox proportional hazard models with different settings of clinical analyses to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission. Results: Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we identified that node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among colon and rectum cancer patients (both P=.02). The other analyses involving measuring thyroid cancer recurrence using radiology reports and 30-day hospital readmission using admission notes in depressive disorder patients also showed results consistent with previous findings. Conclusions: We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research. UR - https://medinform.jmir.org/2021/3/e23983 UR - http://dx.doi.org/10.2196/23983 UR - http://www.ncbi.nlm.nih.gov/pubmed/33783361 ID - info:doi/10.2196/23983 ER - TY - JOUR AU - Peterson, S. Kelly AU - Lewis, Julia AU - Patterson, V. Olga AU - Chapman, B. Alec AU - Denhalter, W. Daniel AU - Lye, A. Patricia AU - Stevens, W. Vanessa AU - Gamage, D. Shantini AU - Roselle, A. Gary AU - Wallace, S. 
Katherine AU - Jones, Makoto PY - 2021/3/24 TI - Automated Travel History Extraction From Clinical Notes for Informing the Detection of Emergent Infectious Disease Events: Algorithm Development and Validation JO - JMIR Public Health Surveill SP - e26719 VL - 7 IS - 3 KW - natural language processing KW - machine learning KW - travel history KW - COVID-19 KW - Zika KW - infectious disease surveillance KW - surveillance applications KW - biosurveillance KW - electronic health record N2 - Background: Patient travel history can be crucial in evaluating evolving infectious disease events. Such information can be challenging to acquire in electronic health records, as it is often available only in unstructured text. Objective: This study aims to assess the feasibility of annotating and automatically extracting travel history mentions from unstructured clinical documents in the Department of Veterans Affairs across disparate health care facilities and among millions of patients. Information about travel exposure augments existing surveillance applications for increased preparedness in responding quickly to public health threats. Methods: Clinical documents related to arboviral disease were annotated following selection using a semiautomated bootstrapping process. Using annotated instances as training data, models were developed to extract from unstructured clinical text any mention of affirmed travel locations outside of the continental United States. Automated text processing models were evaluated, involving machine learning and neural language models for extraction accuracy. Results: Among 4584 annotated instances, 2659 (58%) contained an affirmed mention of travel history, while 347 (7.6%) were negated. Interannotator agreement resulted in a document-level Cohen kappa of 0.776. Automated text processing accuracy (F1 85.6, 95% CI 82.5-87.9) and computational burden were acceptable such that the system can provide a rapid screen for public health events. 
Conclusions: Automated extraction of patient travel history from clinical documents is feasible for enhanced passive surveillance public health systems. Without such a system, it would usually be necessary to manually review charts to identify recent travel or lack of travel, use an electronic health record that enforces travel history documentation, or ignore this potential source of information altogether. The development of this tool was initially motivated by emergent arboviral diseases. More recently, this system was used in the early phases of response to COVID-19 in the United States, although its utility was limited to a relatively brief window due to the rapid domestic spread of the virus. Such systems may aid future efforts to prevent and contain the spread of infectious diseases. UR - https://publichealth.jmir.org/2021/3/e26719 UR - http://dx.doi.org/10.2196/26719 UR - http://www.ncbi.nlm.nih.gov/pubmed/33759790 ID - info:doi/10.2196/26719 ER - TY - JOUR AU - Zhao, Peng AU - Yoo, Illhoi AU - Naqvi, H. Syed PY - 2021/3/23 TI - Early Prediction of Unplanned 30-Day Hospital Readmission: Model Development and Retrospective Data Analysis JO - JMIR Med Inform SP - e16306 VL - 9 IS - 3 KW - patient readmission KW - risk factors KW - unplanned KW - early detection KW - all-cause KW - predictive model KW - 30-day KW - machine learning N2 - Background: Existing readmission reduction solutions tend to focus on complementing inpatient care with enhanced care transition and postdischarge interventions. These solutions are initiated near or after discharge, when clinicians' impact on inpatient care is ending. Preventive intervention during hospitalization is an underexplored area that holds potential for reducing readmission risk. However, it is challenging to predict readmission risk at the early stage of hospitalization because few data are available.
Objective: The objective of this study was to build an early prediction model of unplanned 30-day hospital readmission using a large and diverse sample. We were also interested in identifying novel readmission risk factors and protective factors. Methods: We extracted the medical records of 96,550 patients in 205 participating Cerner client hospitals across four US census regions in 2016 from the Health Facts database. The model was built with index admission data that can become available within 24 hours and data from previous encounters up to 1 year before the index admission. The candidate models were evaluated for performance, timeliness, and generalizability. Multivariate logistic regression analysis was used to identify readmission risk factors and protective factors. Results: We developed six candidate readmission models with different machine learning algorithms. The best performing model of extreme gradient boosting (XGBoost) achieved an area under the receiver operating characteristic curve of 0.753 on the development data set and 0.742 on the validation data set. By multivariate logistic regression analysis, we identified 14 risk factors and 2 protective factors of readmission that have never been reported. Conclusions: The performance of our model is better than that of the most widely used models in US health care settings. This model can help clinicians identify readmission risk at the early stage of hospitalization so that they can pay extra attention during the care process of high-risk patients. The 14 novel risk factors and 2 novel protective factors can aid understanding of the factors associated with readmission. UR - https://medinform.jmir.org/2021/3/e16306 UR - http://dx.doi.org/10.2196/16306 UR - http://www.ncbi.nlm.nih.gov/pubmed/33755027 ID - info:doi/10.2196/16306 ER - TY - JOUR AU - Kohane, S. Isaac AU - Aronow, J. Bruce AU - Avillach, Paul AU - Beaulieu-Jones, K. Brett AU - Bellazzi, Riccardo AU - Bradford, L. Robert AU - Brat, A. 
Gabriel AU - Cannataro, Mario AU - Cimino, J. James AU - García-Barrio, Noelia AU - Gehlenborg, Nils AU - Ghassemi, Marzyeh AU - Gutiérrez-Sacristán, Alba AU - Hanauer, A. David AU - Holmes, H. John AU - Hong, Chuan AU - Klann, G. Jeffrey AU - Loh, Will Ne Hooi AU - Luo, Yuan AU - Mandl, D. Kenneth AU - Daniar, Mohamad AU - Moore, H. Jason AU - Murphy, N. Shawn AU - Neuraz, Antoine AU - Ngiam, Yuan Kee AU - Omenn, S. Gilbert AU - Palmer, Nathan AU - Patel, P. Lav AU - Pedrera-Jiménez, Miguel AU - Sliz, Piotr AU - South, M. Andrew AU - Tan, Min Amelia Li AU - Taylor, M. Deanne AU - Taylor, W. Bradley AU - Torti, Carlo AU - Vallejos, K. Andrew AU - Wagholikar, B. Kavishwar AU - Weber, M. Griffin AU - Cai, Tianxi PY - 2021/3/2 TI - What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask JO - J Med Internet Res SP - e22219 VL - 23 IS - 3 KW - COVID-19 KW - electronic health records KW - real-world data KW - literature KW - publishing KW - quality KW - data quality KW - reporting standards KW - reporting checklist KW - review KW - statistics UR - https://www.jmir.org/2021/3/e22219 UR - http://dx.doi.org/10.2196/22219 UR - http://www.ncbi.nlm.nih.gov/pubmed/33600347 ID - info:doi/10.2196/22219 ER - TY - JOUR AU - Lounsbury, Olivia AU - Roberts, Lily AU - Goodman, R. Jonathan AU - Batey, Philippa AU - Naar, Lenny AU - Flott, M. Kelsey AU - Lawrence-Jones, Anna AU - Ghafur, Saira AU - Darzi, Ara AU - Neves, Luisa Ana PY - 2021/2/22 TI - Opening a "Can of Worms" to Explore the Public's Hopes and Fears About Health Care Data Sharing: Qualitative Study JO - J Med Internet Res SP - e22744 VL - 23 IS - 2 KW - electronic health records KW - patient participation KW - data sharing KW - patient safety KW - data security N2 - Background: Evidence suggests that health care data sharing may strengthen care coordination, improve quality and safety, and reduce costs.
However, to achieve efficient and meaningful adoption of health care data-sharing initiatives, it is necessary to engage all stakeholders, from health care professionals to patients. Although previous work has assessed health care professionals' perceptions of data sharing, perspectives of the general public and particularly of seldom heard groups have yet to be fully assessed. Objective: This study aims to explore the views of the public, particularly their hopes and concerns, around health care data sharing. Methods: An original, immersive public engagement interactive experience, The Can of Worms installation, was developed, in which participants were prompted to reflect on data sharing through listening to individual stories around health care data sharing. A multidisciplinary team with expertise in research, public involvement, and human-centered design developed this concept. The installation took place in three separate events between November 2018 and November 2019. A combination of convenience and snowball sampling was used in this study. Participants were asked to fill in self-administered feedback cards and to describe their hopes and fears about the meaningful use of data in health care. The transcripts were compiled verbatim and systematically reviewed by four independent reviewers using the thematic analysis method to identify emerging themes. Results: Our approach exemplifies the potential of using interdisciplinary expertise in research, public involvement, and human-centered design to tell stories, collect perspectives, and spark conversations around complex topics in participatory digital medicine. A total of 352 qualitative feedback cards were collected, each reflecting participants' hopes and fears for health care data sharing.
Thematic analyses identified six themes under hopes: enablement of personal access and ownership, increased interoperability and collaboration, generation of evidence for better and safer care, improved timeliness and efficiency, delivery of more personalized care, and equality. The five main fears identified included inadequate security and exploitation, data inaccuracy, distrust, discrimination and inequality, and less patient-centered care. Conclusions: This study sheds new light on the main hopes and fears of the public regarding health care data sharing. Importantly, our results highlight novel concerns from the public, particularly in terms of the impact on health disparities, both at international and local levels, and on delivering patient-centered care. Incorporating the knowledge generated and focusing on co-designing solutions to tackle these concerns is critical to engage the public as active contributors and to fully leverage the potential of health care data use. UR - https://www.jmir.org/2021/2/e22744 UR - http://dx.doi.org/10.2196/22744 UR - http://www.ncbi.nlm.nih.gov/pubmed/33616532 ID - info:doi/10.2196/22744 ER - TY - JOUR AU - Vaid, Akhil AU - Jaladanki, K. Suraj AU - Xu, Jie AU - Teng, Shelly AU - Kumar, Arvind AU - Lee, Samuel AU - Somani, Sulaiman AU - Paranjpe, Ishan AU - De Freitas, K. Jessica AU - Wanyan, Tingyi AU - Johnson, W. Kipp AU - Bicak, Mesude AU - Klang, Eyal AU - Kwon, Joon Young AU - Costa, Anthony AU - Zhao, Shan AU - Miotto, Riccardo AU - Charney, W. Alexander AU - Böttinger, Erwin AU - Fayad, A. Zahi AU - Nadkarni, N. Girish AU - Wang, Fei AU - Glicksberg, S. 
Benjamin PY - 2021/1/27 TI - Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach JO - JMIR Med Inform SP - e24207 VL - 9 IS - 1 KW - federated learning KW - COVID-19 KW - machine learning KW - electronic health records N2 - Background: Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability. Objective: We aimed to use federated learning, a machine learning technique that avoids locally aggregating raw clinical data across multiple institutions, to predict mortality in hospitalized patients with COVID-19 within 7 days. Methods: Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization/least absolute shrinkage and selection operator (LASSO) and multilayer perceptron (MLP) models were trained by using local data at each site. We developed a pooled model with combined data from all 5 sites, and a federated model that only shared parameters with a central aggregator. Results: The federated LASSO model outperformed the local LASSO model at 3 hospitals, and the federated MLP model performed better than the local MLP model at all 5 hospitals, as determined by the area under the receiver operating characteristic curve. The pooled LASSO model outperformed the federated LASSO model at all hospitals, and the federated MLP model outperformed the pooled MLP model at 2 hospitals. Conclusions: The federated learning of COVID-19 electronic health record data shows promise in developing robust predictive models without compromising patient privacy. 
UR - http://medinform.jmir.org/2021/1/e24207/ UR - http://dx.doi.org/10.2196/24207 UR - http://www.ncbi.nlm.nih.gov/pubmed/33400679 ID - info:doi/10.2196/24207 ER - TY - JOUR AU - Gaudet-Blavignac, Christophe AU - Foufi, Vasiliki AU - Bjelogrlic, Mina AU - Lovis, Christian PY - 2021/1/26 TI - Use of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for Processing Free Text in Health Care: Systematic Scoping Review JO - J Med Internet Res SP - e24594 VL - 23 IS - 1 KW - SNOMED CT KW - natural language processing KW - scoping review KW - terminology N2 - Background: Interoperability and secondary use of data is a challenge in health care. Specifically, the reuse of clinical free text remains an unresolved problem. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) has become the universal language of health care and presents characteristics of a natural language. Its use to represent clinical free text could constitute a solution to improve interoperability. Objective: Although the use of SNOMED and SNOMED CT has already been reviewed, its specific use in processing and representing unstructured data such as clinical free text has not. This review aims to better understand SNOMED CT's use for representing free text in medicine. Methods: A scoping review was performed on the topic by searching MEDLINE, Embase, and Web of Science for publications featuring free-text processing and SNOMED CT. A recursive reference review was conducted to broaden the scope of research. The review covered the type of processed data, the targeted language, the goal of the terminology binding, the method used and, when appropriate, the specific software used. Results: In total, 76 publications were selected for an extensive study. English was the target language in 91% (n=69) of publications. The most frequent types of documents for which the terminology was used were complementary exam reports (n=18, 24%) and narrative notes (n=16, 21%). 
Mapping to SNOMED CT was the final goal of the research in 21% (n=16) of publications and a part of the final goal in 33% (n=25). The main objectives of mapping are information extraction (n=44, 39%), feature in a classification task (n=26, 23%), and data normalization (n=23, 20%). The method used was rule-based in 70% (n=53) of publications, hybrid in 11% (n=8), and machine learning in 5% (n=4). In total, 12 different software packages were used to map text to SNOMED CT concepts, the most frequent being Medtex, Mayo Clinic Vocabulary Server, and Medical Text Extraction Reasoning and Mapping System. Full terminology was used in 64% (n=49) of publications, whereas only a subset was used in 30% (n=23) of publications. Postcoordination was proposed in 17% (n=13) of publications, and only 5% (n=4) of publications specifically mentioned the use of the compositional grammar. Conclusions: SNOMED CT has been largely used to represent free-text data, most frequently with rule-based approaches, in English. However, currently, there is no easy solution for mapping free text to this terminology and to perform automatic postcoordination. Most solutions conceive SNOMED CT as a simple terminology rather than as a compositional bag of ontologies. Since 2012, the number of publications on this subject per year has decreased. However, the need for formal semantic representation of free text in health care is high, and automatic encoding into a compositional ontology could be a solution. UR - http://www.jmir.org/2021/1/e24594/ UR - http://dx.doi.org/10.2196/24594 UR - http://www.ncbi.nlm.nih.gov/pubmed/33496673 ID - info:doi/10.2196/24594 ER - TY - JOUR AU - Schmit, Cason AU - Ajayi, V. Kobi AU - Ferdinand, O. Alva AU - Giannouchos, Theodoros AU - Ilangovan, Gurudev AU - Nowell, Benjamin W. 
AU - Kum, Hye-Chung PY - 2020/12/15 TI - Communicating With Patients About Software for Enhancing Privacy in Secondary Database Research Involving Record Linkage: Delphi Study JO - J Med Internet Res SP - e20783 VL - 22 IS - 12 KW - Delphi technique KW - privacy KW - communication barriers KW - medical record linkage KW - research subjects KW - big data N2 - Background: There is substantial prior research on the perspectives of patients on the use of health information for research. Numerous communication barriers challenge transparency between researchers and data participants in secondary database research (eg, waiver of informed consent and knowledge gaps). Individual concerns and misconceptions challenge the trust in researchers among patients despite efforts to protect data. Technical software used to protect research data can further complicate the public's understanding of research. For example, MiNDFIRL (Minimum Necessary Disclosure For Interactive Record Linkage) is a prototype software that can be used to enhance the confidentiality of data sets by restricting disclosures of identifying information during the record linkage process. However, software, such as MiNDFIRL, which is used to protect data, must overcome the aforementioned communication barriers. One proposed solution is the creation of an interactive web-based frequently asked question (FAQ) template that can be adapted and used to communicate research issues to data subjects. Objective: This study aims to improve communication with patients and transparency about how complex software, such as MiNDFIRL, is used to enhance privacy in secondary database studies to maintain the public's trust in researchers. Methods: A Delphi technique with 3 rounds of the survey was used to develop the FAQ document to communicate privacy issues related to a generic secondary database study using the MiNDFIRL software. The Delphi panel consisted of 38 patients with chronic health conditions. 
We revised the FAQ between Delphi rounds and provided participants with a summary of the feedback. We adopted a conservative consensus threshold of less than 10% negative feedback per FAQ section. Results: We developed a consensus language for 21 of the 24 FAQ sections. Participant feedback demonstrated preference differences (eg, brevity vs comprehensiveness). We adapted the final FAQ into an interactive web-based format that 94% (31/33) of the participants found helpful or very helpful. The template FAQ and MiNDFIRL source code are available on GitHub. The results indicate the following patient communication considerations: patients have diverse and varied preferences; the tone is important but challenging; and patients want information on security, identifiers, and final disposition of information. Conclusions: The findings of this study provide insights into what research-related information is useful to patients and how researchers can communicate such information. These findings align with the current understanding of health literacy and its challenges. Communication is essential to transparency and ethical data use, yet it is exceedingly challenging. Developing FAQ template language to accompany a complex software may enable researchers to provide greater transparency when informed consent is not possible. 
UR - http://www.jmir.org/2020/12/e20783/ UR - http://dx.doi.org/10.2196/20783 UR - http://www.ncbi.nlm.nih.gov/pubmed/33320097 ID - info:doi/10.2196/20783 ER - TY - JOUR AU - Jeon, Seungho AU - Seo, Jeongeun AU - Kim, Sukyoung AU - Lee, Jeongmoon AU - Kim, Jong-Ho AU - Sohn, Wook Jang AU - Moon, Jongsub AU - Joo, Joon Hyung PY - 2020/11/26 TI - Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models JO - J Med Internet Res SP - e19597 VL - 22 IS - 11 KW - de-identification KW - privacy KW - anonymization KW - common data model KW - Observational Health Data Sciences and Informatics N2 - Background: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. Objective: This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database. Methods: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. 
Results: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one "highest risk" value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the "source values" (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification. Conclusions: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers. UR - http://www.jmir.org/2020/11/e19597/ UR - http://dx.doi.org/10.2196/19597 UR - http://www.ncbi.nlm.nih.gov/pubmed/33177037 ID - info:doi/10.2196/19597 ER - TY - JOUR AU - El Emam, Khaled AU - Mosquera, Lucy AU - Bass, Jason PY - 2020/11/16 TI - Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation JO - J Med Internet Res SP - e23139 VL - 22 IS - 11 KW - synthetic data KW - privacy KW - data sharing KW - data access KW - de-identification KW - open data N2 - Background: There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them. 
Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data. Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this "meaningful identity disclosure risk." The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data. Results: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively. Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data. UR - http://www.jmir.org/2020/11/e23139/ UR - http://dx.doi.org/10.2196/23139 UR - http://www.ncbi.nlm.nih.gov/pubmed/33196453 ID - info:doi/10.2196/23139 ER - TY - JOUR AU - Oliveira, R. Carlos AU - Niccolai, Patrick AU - Ortiz, Michelle Anette AU - Sheth, S. Sangini AU - Shapiro, D. Eugene AU - Niccolai, M. Linda AU - Brandt, A. 
Cynthia PY - 2020/11/3 TI - Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study JO - JMIR Med Inform SP - e20826 VL - 8 IS - 11 KW - natural language processing KW - automated data extraction KW - human papillomavirus KW - surveillance KW - pathology reporting KW - cervical cancer KW - anal cancer KW - precancer KW - cancer KW - HPV KW - accuracy N2 - Background: Accurate identification of new diagnoses of human papillomavirus-associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research. Objective: This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus. Methods: A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm's classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. 
Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm's performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure. Results: The natural language processing algorithm's performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87). Conclusions: This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology. 
UR - https://medinform.jmir.org/2020/11/e20826 UR - http://dx.doi.org/10.2196/20826 UR - http://www.ncbi.nlm.nih.gov/pubmed/32469840 ID - info:doi/10.2196/20826 ER - TY - JOUR AU - Fan, Yunzhou AU - Wu, Yanyan AU - Cao, Xiongjing AU - Zou, Junning AU - Zhu, Ming AU - Dai, Di AU - Lu, Lin AU - Yin, Xiaoxv AU - Xiong, Lijuan PY - 2020/10/23 TI - Automated Cluster Detection of Health Care-Associated Infection Based on the Multisource Surveillance of Process Data in the Area Network: Retrospective Study of Algorithm Development and Validation JO - JMIR Med Inform SP - e16901 VL - 8 IS - 10 KW - health care-associated infection KW - cluster detection KW - early warning KW - multisource surveillance KW - process data N2 - Background: The cluster detection of health care-associated infections (HAIs) is crucial for identifying HAI outbreaks in the early stages. Objective: We aimed to verify whether multisource surveillance based on the process data in an area network can be effective in detecting HAI clusters. Methods: We retrospectively analyzed the incidence of HAIs and 3 indicators of process data relative to infection, namely, antibiotic utilization rate in combination, inspection rate of bacterial specimens, and positive rate of bacterial specimens, from 4 independent high-risk units in a tertiary hospital in China. We utilized the Shewhart warning model to detect the peaks of the time-series data. Subsequently, we designed 5 surveillance strategies based on the process data for the HAI cluster detection: (1) antibiotic utilization rate in combination only, (2) inspection rate of bacterial specimens only, (3) positive rate of bacterial specimens only, (4) antibiotic utilization rate in combination + inspection rate of bacterial specimens + positive rate of bacterial specimens in parallel, and (5) antibiotic utilization rate in combination + inspection rate of bacterial specimens + positive rate of bacterial specimens in series. 
We used the receiver operating characteristic (ROC) curve and Youden index to evaluate the warning performance of these surveillance strategies for the detection of HAI clusters. Results: The ROC curves of the 5 surveillance strategies were located above the standard line, and the area under the curve of the ROC was larger in the parallel strategy than in the series strategy and the single-indicator strategies. The optimal Youden indexes were 0.48 (95% CI 0.29-0.67) at a threshold of 1.5 in the antibiotic utilization rate in combination-only strategy, 0.49 (95% CI 0.45-0.53) at a threshold of 0.5 in the inspection rate of bacterial specimens-only strategy, 0.50 (95% CI 0.28-0.71) at a threshold of 1.1 in the positive rate of bacterial specimens-only strategy, 0.63 (95% CI 0.49-0.77) at a threshold of 2.6 in the parallel strategy, and 0.32 (95% CI 0.00-0.65) at a threshold of 0.0 in the series strategy. The warning performance of the parallel strategy was greater than that of the single-indicator strategies when the threshold exceeded 1.5. Conclusions: The multisource surveillance of process data in the area network is an effective method for the early detection of HAI clusters. The combination of multisource data and the threshold of the warning model are 2 important factors that influence the performance of the model. 
UR - http://medinform.jmir.org/2020/10/e16901/ UR - http://dx.doi.org/10.2196/16901 UR - http://www.ncbi.nlm.nih.gov/pubmed/32965228 ID - info:doi/10.2196/16901 ER - TY - JOUR AU - Wu, Jun AU - Wang, Jian AU - Nicholas, Stephen AU - Maitland, Elizabeth AU - Fan, Qiuyan PY - 2020/10/9 TI - Application of Big Data Technology for COVID-19 Prevention and Control in China: Lessons and Recommendations JO - J Med Internet Res SP - e21980 VL - 22 IS - 10 KW - big data KW - COVID-19 KW - disease prevention and control N2 - Background: In the prevention and control of infectious diseases, previous research on the application of big data technology has mainly focused on the early warning and early monitoring of infectious diseases. Although the application of big data technology for COVID-19 warning and monitoring remains an important task, prevention of the disease's rapid spread and reduction of its impact on society are currently the most pressing challenges for the application of big data technology during the COVID-19 pandemic. After the outbreak of COVID-19 in Wuhan, the Chinese government and nongovernmental organizations actively used big data technology to prevent, contain, and control the spread of COVID-19. Objective: The aim of this study is to discuss the application of big data technology to prevent, contain, and control COVID-19 in China; draw lessons; and make recommendations. Methods: We discuss the data collection methods and key data information that existed in China before the outbreak of COVID-19 and how these data contributed to the prevention and control of COVID-19. Next, we discuss China's new data collection methods and new information assembled after the outbreak of COVID-19. Based on the data and information collected in China, we analyzed the application of big data technology from the perspectives of data sources, data application logic, data application level, and application results. 
In addition, we analyzed the issues, challenges, and responses encountered by China in the application of big data technology from four perspectives: data access, data use, data sharing, and data protection. Suggestions for improvements are made for data collection, data circulation, data innovation, and data security to help understand China's response to the epidemic and to provide lessons for other countries' prevention and control of COVID-19. Results: In the process of the prevention and control of COVID-19 in China, big data technology has played an important role in personal tracking, surveillance and early warning, tracking of the virus's sources, drug screening, medical treatment, resource allocation, and production recovery. The data used included location and travel data, medical and health data, news media data, government data, online consumption data, data collected by intelligent equipment, and epidemic prevention data. We identified a number of big data problems including low efficiency of data collection, difficulty in guaranteeing data quality, low efficiency of data use, lack of timely data sharing, and data privacy protection issues. To address these problems, we suggest unified data collection standards, innovative use of data, accelerated exchange and circulation of data, and a detailed and rigorous data protection system. Conclusions: China has used big data technology to prevent and control COVID-19 in a timely manner. To prevent and control infectious diseases, countries must collect, clean, and integrate data from a wide range of sources; use big data technology to analyze a wide range of big data; create platforms for data analyses and sharing; and address privacy issues in the collection and use of big data. 
UR - http://www.jmir.org/2020/10/e21980/ UR - http://dx.doi.org/10.2196/21980 UR - http://www.ncbi.nlm.nih.gov/pubmed/33001836 ID - info:doi/10.2196/21980 ER - TY - JOUR AU - Xiu, Xiaolei AU - Qian, Qing AU - Wu, Sizhu PY - 2020/10/7 TI - Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study JO - JMIR Med Inform SP - e18287 VL - 8 IS - 10 KW - Chinese electronic medical records KW - knowledge graph KW - digestive system tumor KW - graph evaluation N2 - Background: With the increasing incidences and mortality of digestive system tumor diseases in China, ways to use clinical experience data in Chinese electronic medical records (CEMRs) to determine potentially effective relationships between diagnosis and treatment have become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization that provides an ideal means to solve this problem. Objective: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics. Methods: This paper focuses on the knowledge graph schema and semantic relationships that were the main challenges for constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. Then, this research built a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships and accomplished the DSTKG by knowledge extraction, named entity linking, and drawing the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer. Results: Experts agreed that the DSTKG was good overall (mean score 4.20). Especially for the aspects of "rationality of schema structure," "scalability," 
and "readability of results," the DSTKG performed well, with scores of 4.72, 4.67, and 4.69, respectively, which were much higher than the average. However, the small amount of data in the DSTKG negatively affected its "practicability" score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing personalized customization to meet the designer's focus on specific interests in the digestive system tumor. Conclusions: We constructed a granular semantic DSTKG. It could provide guidance for the construction of a tumor knowledge graph and provide a preliminary step for the intelligent application of knowledge graphs based on CEMRs. Additional data sources and stronger research on assertion classification are needed to gain insight into the DSTKG's potential. UR - http://medinform.jmir.org/2020/10/e18287/ UR - http://dx.doi.org/10.2196/18287 UR - http://www.ncbi.nlm.nih.gov/pubmed/33026359 ID - info:doi/10.2196/18287 ER - TY - JOUR AU - Scheible, Raphael AU - Kadioglu, Dennis AU - Ehl, Stephan AU - Blum, Marco AU - Boeker, Martin AU - Folz, Michael AU - Grimbacher, Bodo AU - Göbel, Jens AU - Klein, Christoph AU - Nieters, Alexandra AU - Rusch, Stephan AU - Kindle, Gerhard AU - Storf, Holger PY - 2020/10/7 TI - Enabling External Inquiries to an Existing Patient Registry by Using the Open Source Registry System for Rare Diseases: Demonstration of the System Using the European Society for Immunodeficiencies Registry JO - JMIR Med Inform SP - e17420 VL - 8 IS - 10 KW - registry interoperability KW - collaboration in research KW - data findability KW - registry software N2 - Background: The German Network on Primary Immunodeficiency Diseases (PID-NET) utilizes the European Society for Immunodeficiencies (ESID) registry as a platform for collecting data. 
In the context of PID-NET data, we show how registries based on custom software can be made interoperable for better collaborative access to precollected data. The Open Source Registry System for Rare Diseases (Open-Source-Registersystem für Seltene Erkrankungen [OSSE], in German) provides patient organizations, physicians, scientists, and other parties with open source software for the creation of patient registries. In addition, the necessary interoperability between different registries based on the OSSE, as well as existing registries, is supported, which allows those registries to be confederated at both the national and international levels. Objective: Data from the PID-NET registry should be made available in an interoperable manner without losing data sovereignty by extending the existing custom software of the registry using the OSSE registry framework. Methods: This paper describes the following: (1) the installation and configuration of the OSSE bridgehead, (2) an approach using a free toolchain to set up the required interfaces to connect a registry with the OSSE bridgehead, and (3) the decentralized search, which allows the formulation of inquiries that are sent to a selected set of registries of interest. Results: PID-NET uses the established and highly customized ESID registry software. By setting up a so-called OSSE bridgehead, PID-NET data are made interoperable according to a federated approach, and centrally formulated inquiries for data can be received. As the first registry to use the OSSE bridgehead, the authors introduce an approach using a free toolchain to efficiently implement and maintain the required interfaces. Finally, to test and demonstrate the system, two inquiries are realized using the graphical query builder. 
By establishing and interconnecting an OSSE bridgehead with the underlying ESID registry, confederated queries for data can be received and, if desired, the inquirer can be contacted to further discuss any requirements for cooperation. Conclusions: The OSSE offers an infrastructure that provides the possibility of more collaborative and transparent research. The decentralized search functionality includes registries into one search application while still maintaining data sovereignty. The OSSE bridgehead enables any registry software to be integrated into the OSSE network. The proposed toolchain to set up the required interfaces consists of freely available software components that are well documented. The decentralized search is uncomplicated to use and offers a well-structured, yet still improvable, graphical user interface to formulate queries. UR - http://medinform.jmir.org/2020/10/e17420/ UR - http://dx.doi.org/10.2196/17420 UR - http://www.ncbi.nlm.nih.gov/pubmed/33026355 ID - info:doi/10.2196/17420 ER - TY - JOUR AU - Schwab, Patrick AU - DuMont Schütte, August AU - Dietz, Benedikt AU - Bauer, Stefan PY - 2020/10/6 TI - Clinical Predictive Models for COVID-19: Systematic Study JO - J Med Internet Res SP - e21439 VL - 22 IS - 10 KW - SARS-CoV-2 KW - COVID-19 KW - machine learning KW - clinical prediction KW - prediction KW - infectious disease KW - clinical data KW - testing KW - hospitalization KW - intensive care N2 - Background: COVID-19 is a rapidly emerging respiratory disease caused by SARS-CoV-2. Due to the rapid human-to-human transmission of SARS-CoV-2, many health care systems are at risk of exceeding their health care capacities, in particular in terms of SARS-CoV-2 tests, hospital and intensive care unit (ICU) beds, and mechanical ventilators. 
Predictive algorithms could potentially ease the strain on health care systems by identifying those who are most likely to receive a positive SARS-CoV-2 test, be hospitalized, or admitted to the ICU. Objective: The aim of this study is to develop, study, and evaluate clinical predictive models that estimate, using machine learning and based on routinely collected clinical data, which patients are likely to receive a positive SARS-CoV-2 test or require hospitalization or intensive care. Methods: Using a systematic approach to model development and optimization, we trained and compared various types of machine learning models, including logistic regression, neural networks, support vector machines, random forests, and gradient boosting. To evaluate the developed models, we performed a retrospective evaluation on demographic, clinical, and blood analysis data from a cohort of 5644 patients. In addition, we determined which clinical features were predictive to what degree for each of the aforementioned clinical tasks using causal explanations. Results: Our experimental results indicate that our predictive models identified patients that test positive for SARS-CoV-2 a priori at a sensitivity of 75% (95% CI 67%-81%) and a specificity of 49% (95% CI 46%-51%), patients who are SARS-CoV-2 positive that require hospitalization with 0.92 area under the receiver operator characteristic curve (AUC; 95% CI 0.81-0.98), and patients who are SARS-CoV-2 positive that require critical care with 0.98 AUC (95% CI 0.95-1.00). Conclusions: Our results indicate that predictive models trained on routinely collected clinical data could be used to predict clinical pathways for COVID-19 and, therefore, help inform care and prioritize resources. 
UR - http://www.jmir.org/2020/10/e21439/ UR - http://dx.doi.org/10.2196/21439 UR - http://www.ncbi.nlm.nih.gov/pubmed/32976111 ID - info:doi/10.2196/21439 ER - TY - JOUR AU - Lim, Cherry AU - Miliya, Thyl AU - Chansamouth, Vilada AU - Aung, Thazin Myint AU - Karkey, Abhilasha AU - Teparrukkul, Prapit AU - Rahul, Batra AU - Lan, Huong Nguyen Phu AU - Stelling, John AU - Turner, Paul AU - Ashley, Elizabeth AU - van Doorn, Rogier H. AU - Lin, Naing Htet AU - Ling, Clare AU - Hinjoy, Soawapak AU - Iamsirithaworn, Sopon AU - Dunachie, Susanna AU - Wangrangsimakul, Tri AU - Hantrakun, Viriya AU - Schilling, William AU - Yen, Minh Lam AU - Tan, Van Le AU - Hlaing, Htay Htay AU - Mayxay, Mayfong AU - Vongsouvath, Manivanh AU - Basnyat, Buddha AU - Edgeworth, Jonathan AU - Peacock, J. Sharon AU - Thwaites, Guy AU - Day, PJ Nicholas AU - Cooper, S. Ben AU - Limmathurotsakul, Direk PY - 2020/10/2 TI - Automating the Generation of Antimicrobial Resistance Surveillance Reports: Proof-of-Concept Study Involving Seven Hospitals in Seven Countries JO - J Med Internet Res SP - e19762 VL - 22 IS - 10 KW - antimicrobial resistance KW - surveillance KW - report KW - data analysis KW - application N2 - Background: Reporting cumulative antimicrobial susceptibility testing data on a regular basis is crucial to inform antimicrobial resistance (AMR) action plans at local, national, and global levels. However, analyzing data and generating a report are time consuming and often require trained personnel. Objective: This study aimed to develop and test an application that can support a local hospital to analyze routinely collected electronic data independently and generate AMR surveillance reports rapidly. Methods: An offline application to generate standardized AMR surveillance reports from routinely available microbiology and hospital data files was written in the R programming language (R Project for Statistical Computing). 
The application can be run by double clicking on the application file without any further user input. The data analysis procedure and report content were developed based on the recommendations of the World Health Organization Global Antimicrobial Resistance Surveillance System (WHO GLASS). The application was tested on Microsoft Windows 10 and 7 using open access example data sets. We then independently tested the application in seven hospitals in Cambodia, Lao People's Democratic Republic, Myanmar, Nepal, Thailand, the United Kingdom, and Vietnam. Results: We developed the AutoMated tool for Antimicrobial resistance Surveillance System (AMASS), which can support clinical microbiology laboratories to analyze their microbiology and hospital data files (in CSV or Excel format) onsite and promptly generate AMR surveillance reports (in PDF and CSV formats). The data files could be those exported from WHONET or other laboratory information systems. The automatically generated reports contain only summary data without patient identifiers. The AMASS application is downloadable from https://www.amass.website/. The participating hospitals tested the application and deposited their AMR surveillance reports in an open access data repository. Conclusions: The AMASS is a useful tool to support the generation and sharing of AMR surveillance reports. UR - https://www.jmir.org/2020/10/e19762 UR - http://dx.doi.org/10.2196/19762 UR - http://www.ncbi.nlm.nih.gov/pubmed/33006570 ID - info:doi/10.2196/19762 ER - TY - JOUR AU - Brown, Paul Adrian AU - Randall, M.
Sean PY - 2020/9/23 TI - Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model JO - JMIR Med Inform SP - e18920 VL - 8 IS - 9 KW - cloud computing KW - medical record linkage KW - confidentiality KW - data science N2 - Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high risk by data custodians. Objective: This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local. Methods: A new hybrid cloud model was developed, including privacy-preserving record linkage techniques and container-based batch processing. An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data. Results: The cloud model keeps identifiers on premises and uses privacy-preserved identifiers to run all linkage computations on cloud infrastructure. Our prototype used a managed container cluster in Amazon Web Services to distribute the computation using existing linkage software. Although the cost of computation was relatively low, the use of existing software resulted in a processing overhead of 35.7% (149/417 min execution time). Conclusions: The results of our experimental evaluation show the operational feasibility of such a model and the exciting opportunities for advancing the analysis of linkage outputs. UR - http://medinform.jmir.org/2020/9/e18920/ UR - http://dx.doi.org/10.2196/18920 UR - http://www.ncbi.nlm.nih.gov/pubmed/32965236 ID - info:doi/10.2196/18920 ER - TY - JOUR AU - Gagalova, K. Kristina AU - Leon Elizalde, Angelica M.
AU - Portales-Casamar, Elodie AU - Görges, Matthias PY - 2020/8/27 TI - What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions JO - JMIR Form Res SP - e17687 VL - 4 IS - 8 KW - database KW - data warehousing KW - data aggregation KW - information storage and retrieval KW - data analytics KW - health informatics N2 - Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. 
These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning. UR - http://formative.jmir.org/2020/8/e17687/ UR - http://dx.doi.org/10.2196/17687 UR - http://www.ncbi.nlm.nih.gov/pubmed/32852280 ID - info:doi/10.2196/17687 ER - TY - JOUR AU - Weissler, Hope Elizabeth AU - Lippmann, J. Steven AU - Smerek, M. Michelle AU - Ward, A. Rachael AU - Kansal, Aman AU - Brock, Adam AU - Sullivan, C. Robert AU - Long, Chandler AU - Patel, R. Manesh AU - Greiner, A. Melissa AU - Hardy, Chantelle N. AU - Curtis, H. Lesley AU - Jones, Schuyler W. PY - 2020/8/19 TI - Model-Based Algorithms for Detecting Peripheral Artery Disease Using Administrative Data From an Electronic Health Record Data System: Algorithm Development Study JO - JMIR Med Inform SP - e18542 VL - 8 IS - 8 KW - peripheral artery disease KW - patient selection KW - electronic health records KW - cardiology KW - health data N2 - Background: Peripheral artery disease (PAD) affects 8 to 10 million Americans, who face significantly elevated risks of both mortality and major limb events such as amputation. Unfortunately, PAD is relatively underdiagnosed, undertreated, and underresearched, leading to wide variations in treatment patterns and outcomes. Efforts to improve PAD care and outcomes have been hampered by persistent difficulties identifying patients with PAD for clinical and investigatory purposes. 
Objective: The aim of this study is to develop and validate a model-based algorithm to detect patients with peripheral artery disease (PAD) using data from an electronic health record (EHR) system. Methods: An initial query of the EHR in a large health system identified all patients with PAD-related diagnosis codes for any encounter during the study period. Clinical adjudication of PAD diagnosis was performed by chart review on a random subgroup. A binary logistic regression to predict PAD was built and validated using a least absolute shrinkage and selection operator (LASSO) approach in the adjudicated patients. The algorithm was then applied to the nonsampled records to further evaluate its performance. Results: The initial EHR data query using 406 diagnostic codes yielded 15,406 patients. Overall, 2500 patients were randomly selected for ground truth PAD status adjudication. In the end, 108 code flags remained after removing rarely- and never-used codes. We entered these code flags plus administrative encounter, imaging, procedure, and specialist flags into a LASSO model. The area under the curve for this model was 0.862. Conclusions: The algorithm we constructed has two main advantages over other approaches to the identification of patients with PAD. First, it was derived from a broad population of patients with many different PAD manifestations and treatment pathways across a large health system. Second, our model does not rely on clinical notes and can be applied in situations in which only administrative billing data (eg, large administrative data sets) are available. A combination of diagnosis codes and administrative flags can accurately identify patients with PAD in large cohorts. UR - http://medinform.jmir.org/2020/8/e18542/ UR - http://dx.doi.org/10.2196/18542 UR - http://www.ncbi.nlm.nih.gov/pubmed/32663152 ID - info:doi/10.2196/18542 ER - TY - JOUR AU - A'mar, Teresa AU - Beatty, David J. 
AU - Fedorenko, Catherine AU - Markowitz, Daniel AU - Corey, Thomas AU - Lange, Jane AU - Schwartz, M. Stephen AU - Huang, Bin AU - Chubak, Jessica AU - Etzioni, Ruth PY - 2020/8/17 TI - Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study JO - JMIR Cancer SP - e18143 VL - 6 IS - 2 KW - cancer registries KW - medical claims KW - cancer recurrence event KW - statistical learning KW - breast cancer KW - medical informatics KW - data mining N2 - Background: There is a need for automated approaches to incorporate information on cancer recurrence events into population-based cancer registries. Objective: The aim of this study is to determine the accuracy of a novel data mining algorithm to extract information from linked registry and medical claims data on the occurrence and timing of second breast cancer events (SBCE). Methods: We used supervised data from 3092 stage I and II breast cancer cases (with 394 recurrences), diagnosed between 1993 and 2006 inclusive, of patients at Kaiser Permanente Washington and cases in the Puget Sound Cancer Surveillance System. Our goal was to classify each month after primary treatment as pre- versus post-SBCE. The prediction feature set for a given month consisted of registry variables on disease and patient characteristics related to the primary breast cancer event, as well as features based on monthly counts of diagnosis and procedure codes for the current, prior, and future months. A month was classified as post-SBCE if the predicted probability exceeded a probability threshold (PT); the predicted time of the SBCE was taken to be the month of maximum increase in the predicted probability between adjacent months. Results: The Kaplan-Meier net probability of SBCE was 0.25 at 14 years. The month-level receiver operating characteristic curve on test data (20% of the data set) had an area under the curve of 0.986. 
The person-level predictions (at a monthly PT of 0.5) had a sensitivity of 0.89, a specificity of 0.98, a positive predictive value of 0.85, and a negative predictive value of 0.98. The corresponding median difference between the observed and predicted months of recurrence was 0 and the mean difference was 0.04 months. Conclusions: Data mining of medical claims holds promise for the streamlining of cancer registry operations to feasibly collect information about second breast cancer events. UR - http://cancer.jmir.org/2020/2/e18143/ UR - http://dx.doi.org/10.2196/18143 UR - http://www.ncbi.nlm.nih.gov/pubmed/32804084 ID - info:doi/10.2196/18143 ER - TY - JOUR AU - Adly, Sedky Aya AU - Adly, Sedky Afnan AU - Adly, Sedky Mahmoud PY - 2020/8/10 TI - Approaches Based on Artificial Intelligence and the Internet of Intelligent Things to Prevent the Spread of COVID-19: Scoping Review JO - J Med Internet Res SP - e19104 VL - 22 IS - 8 KW - SARS-CoV-2 KW - COVID-19 KW - novel coronavirus KW - artificial intelligence KW - internet of things KW - telemedicine KW - machine learning KW - modeling KW - simulation KW - robotics N2 - Background: Artificial intelligence (AI) and the Internet of Intelligent Things (IIoT) are promising technologies to prevent the concerningly rapid spread of coronavirus disease (COVID-19) and to maximize safety during the pandemic. With the exponential increase in the number of COVID-19 patients, it is highly possible that physicians and health care workers will not be able to treat all cases. Thus, computer scientists can contribute to the fight against COVID-19 by introducing more intelligent solutions to achieve rapid control of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes the disease. 
Objective: The objectives of this review were to analyze the current literature, discuss the applicability of reported ideas for using AI to prevent and control COVID-19, and build a comprehensive view of how current systems may be useful in particular areas. This may be of great help to many health care administrators, computer scientists, and policy makers worldwide. Methods: We conducted an electronic search of articles in the MEDLINE, Google Scholar, Embase, and Web of Knowledge databases to formulate a comprehensive review that summarizes different categories of the most recently reported AI-based approaches to prevent and control the spread of COVID-19. Results: Our search identified the 10 most recent AI approaches that were suggested to provide the best solutions for maximizing safety and preventing the spread of COVID-19. These approaches included detection of suspected cases, large-scale screening, monitoring, interactions with experimental therapies, pneumonia screening, use of the IIoT for data and information gathering and integration, resource allocation, predictions, modeling and simulation, and robotics for medical quarantine. Conclusions: We found few or almost no studies regarding the use of AI to examine COVID-19 interactions with experimental therapies, the use of AI for resource allocation to COVID-19 patients, or the use of AI and the IIoT for COVID-19 data and information gathering/integration. Moreover, the adoption of other approaches, including use of AI for COVID-19 prediction, use of AI for COVID-19 modeling and simulation, and use of AI robotics for medical quarantine, should be further emphasized by researchers because these important approaches lack sufficient numbers of studies. Therefore, we recommend that computer scientists focus on these approaches, which are still not being adequately addressed. 
UR - https://www.jmir.org/2020/8/e19104 UR - http://dx.doi.org/10.2196/19104 UR - http://www.ncbi.nlm.nih.gov/pubmed/32584780 ID - info:doi/10.2196/19104 ER - TY - JOUR AU - Cheng, Hao-Yuan AU - Wu, Yu-Chun AU - Lin, Min-Hau AU - Liu, Yu-Lun AU - Tsai, Yue-Yang AU - Wu, Jo-Hua AU - Pan, Ke-Han AU - Ke, Chih-Jung AU - Chen, Chiu-Mei AU - Liu, Ding-Ping AU - Lin, I-Feng AU - Chuang, Jen-Hsiang PY - 2020/8/5 TI - Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study JO - J Med Internet Res SP - e15394 VL - 22 IS - 8 KW - influenza KW - Influenza-like illness KW - forecasting KW - machine learning KW - artificial intelligence KW - epidemic forecasting KW - surveillance N2 - Background: Variable seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. In addition to timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective: We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods: Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks.
We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. Results: All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) (ρ=0.802-0.965; MAPE: 5.2%-9.2%; hit rate: 0.577-0.756), 1-week (ρ=0.803-0.918; MAPE: 8.3%-11.8%; hit rate: 0.643-0.747), 2-week (ρ=0.783-0.867; MAPE: 10.1%-15.3%; hit rate: 0.669-0.734), and 3-week forecasts (ρ=0.676-0.801; MAPE: 12.0%-18.9%; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts (ρ=0.875-0.969; MAPE: 5.3%-8.0%; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts (ρ=0.721-0.908; MAPE: 7.6%-13.5%; hit rate: 0.596-0.904). Conclusions: This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making. UR - https://www.jmir.org/2020/8/e15394 UR - http://dx.doi.org/10.2196/15394 UR - http://www.ncbi.nlm.nih.gov/pubmed/32755888 ID - info:doi/10.2196/15394 ER - TY - JOUR AU - Bhardwaj, Niharika AU - Cecchetti, A.
Alfred AU - Murughiyan, Usha AU - Neitch, Shirley PY - 2020/8/4 TI - Analysis of Benzodiazepine Prescription Practices in Elderly Appalachians with Dementia via the Appalachian Informatics Platform: Longitudinal Study JO - JMIR Med Inform SP - e18389 VL - 8 IS - 8 KW - dementia KW - Alzheimer disease KW - benzodiazepines KW - Appalachia KW - geriatrics KW - informatics platform KW - interactive visualization KW - eHealth KW - clinical data N2 - Background: Caring for the growing dementia population with complex health care needs in West Virginia has been challenging due to its large, sizably rural-dwelling geriatric population and limited resource availability. Objective: This paper aims to illustrate the application of an informatics platform to drive dementia research and quality care through a preliminary study of benzodiazepine (BZD) prescription patterns and its effects on health care use by geriatric patients. Methods: The Maier Institute Data Mart, which contains clinical and billing data on patients aged 65 years and older (N=98,970) seen within our clinics and hospital, was created. Relevant variables were analyzed to identify BZD prescription patterns and calculate related charges and emergency department (ED) use. Results: Nearly one-third (4346/13,910, 31.24%) of patients with dementia received at least one BZD prescription, 20% more than those without dementia. More women than men received at least one BZD prescription. On average, patients with dementia and at least one BZD prescription sustained higher charges and visited the ED more often than those without one. Conclusions: The Appalachian Informatics Platform has the potential to enhance dementia care and research through a deeper understanding of dementia, data enrichment, risk identification, and care gap analysis. 
UR - https://medinform.jmir.org/2020/8/e18389 UR - http://dx.doi.org/10.2196/18389 UR - http://www.ncbi.nlm.nih.gov/pubmed/32749226 ID - info:doi/10.2196/18389 ER - TY - JOUR AU - Mangin, Dee AU - Lawson, Jennifer AU - Adamczyk, Krzysztof AU - Guenter, Dale PY - 2020/7/27 TI - Embedding "Smart" Disease Coding Within Routine Electronic Medical Record Workflow: Prospective Single-Arm Trial JO - JMIR Med Inform SP - e16764 VL - 8 IS - 7 KW - chronic disease management KW - comorbidity KW - problem list KW - disease coding KW - disease registry KW - data improvement KW - electronic medical record KW - electronic health record KW - practice-based research network KW - population health KW - primary care KW - family medicine N2 - Background: Electronic medical record (EMR) chronic disease measurement can help direct primary care prevention and treatment strategies and plan health services resource management. Incomplete data and poor consistency of coded disease values within EMR problem lists are widespread issues that limit primary and secondary uses of these data. These issues were shared by the McMaster University Sentinel and Information Collaboration (MUSIC), a primary care practice-based research network (PBRN) located in Hamilton, Ontario, Canada. Objective: We sought to develop and evaluate the effectiveness of new EMR interface tools aimed at improving the quantity and the consistency of disease codes recorded within the disease registry across the MUSIC PBRN. Methods: We used a single-arm prospective trial design with preintervention and postintervention data analysis to assess the effect of the intervention on disease recording volume and quality. The MUSIC network holds data on over 75,080 patients, 37,212 currently rostered. There were 4 MUSIC network clinician champions involved in gap analysis of the disease coding process and in the iterative design of new interface tools.
We leveraged terminology standards and factored EMR workflow and usability into a new interface solution that aimed to optimize code selection volume and quality while minimizing physician time burden. The intervention was integrated as part of usual clinical workflow during routine billing activities. Results: After implementation of the new interface (June 25, 2017), we assessed the disease registry codes at 3 and 6 months (intervention period) to compare their volume and quality to preintervention levels (baseline period). A total of 17,496 International Classification of Diseases, 9th Revision (ICD9) code values were recorded in the disease registry during the 11.5-year (2006 to mid-2017) baseline period. A large gain in disease recording occurred in the intervention period (8516/17,496, 48.67% over baseline), resulting in a total of 26,774 codes. The coding rate increased by a factor of 11.2, averaging 1419 codes per month over the baseline average rate of 127 codes per month. The proportion of preferred ICD9 codes increased by 17.03% in the intervention period (11,007/17,496, 62.91% vs 7417/9278, 79.94%; χ21=819.4; P<.001). A total of 45.03% (4178/9278) of disease codes were entered by way of the new screen prompt tools, with significant increases between quarters (Jul-Sep: 2507/6140, 40.83% vs Oct-Dec: 1671/3148, 53.08%; χ21=126.2; P<.001). Conclusions: The introduction of clinician co-designed, workflow-embedded disease coding tools is a very effective solution to the issues of poor disease coding and quality in EMRs. The substantial effectiveness in a routine care environment demonstrates usability, and the intervention detail described here should be generalizable to any setting. Significant improvements in problem list coding within primary care EMRs can be realized with minimal disruption to routine clinical workflow.
UR - http://medinform.jmir.org/2020/7/e16764/ UR - http://dx.doi.org/10.2196/16764 UR - http://www.ncbi.nlm.nih.gov/pubmed/32716304 ID - info:doi/10.2196/16764 ER - TY - JOUR AU - Li, Sixuan AU - Zhang, Liang AU - Liu, Shiwei AU - Hubbard, Richard AU - Li, Hui PY - 2020/7/23 TI - Surveillance of Noncommunicable Disease Epidemic Through the Integrated Noncommunicable Disease Collaborative Management System: Feasibility Pilot Study Conducted in the City of Ningbo, China JO - J Med Internet Res SP - e17340 VL - 22 IS - 7 KW - surveillance KW - noncommunicable diseases KW - regional health information platform KW - electronic health records. N2 - Background: Noncommunicable diseases (NCDs) have become the main public health concern worldwide. With rapid economic development and changes in lifestyles, the burden of NCDs in China is increasing dramatically every year. Monitoring is a critical measure for NCDs control and prevention. However, because of the lack of regional representativeness, unsatisfactory data quality, and inefficient data sharing and utilization, the existing surveillance systems and surveys in China cannot track the status and transition of NCDs epidemic. Objective: To efficaciously track NCDs epidemic in China, this pilot program conducted in Ningbo city by the Chinese Center for Disease Control and Prevention (CDC) aimed to develop an innovative model for NCDs surveillance and management: the integrated noncommunicable disease collaborative management system (NCDCMS). Methods: This Ningbo model was designed and developed through a 3-level (county/district, municipal, and provincial levels) direct reporting system based on the regional health information platform. The uniform data standards and interface specifications were established to connect different platforms and conduct data exchanges. The performance of the system was evaluated based on the 9 attributes of surveillance system evaluation framework recommended by the US CDC. 
Results: NCDCMS allows automatic NCD data exchange and sharing via a 3-level public health data exchange platform in China. It currently covers 201 medical institutions throughout the city. Compared with previous systems, the automatic pop-up of the report card, automatic patient information extraction, and real-time data exchange have greatly improved the simplicity and timeliness of the system. The data quality meets the requirements to monitor the incidence trend of NCDs accurately, and the comprehensive data types obtained from the database (ie, directly from the 3-level platform on the data warehouse) also provide useful information for scientific studies. So far, 98.1% (201/205) of medical institutions across Ningbo have been involved in data exchanges with the model. Evaluations of the system performance showed that NCDCMS has high levels of simplicity, data quality, acceptability, representativeness, and timeliness. Conclusions: NCDCMS completely reshaped the process of NCD surveillance reporting and has unique advantages, including reducing the work burden of different stakeholders through data sharing and exchange, eliminating unnecessary redundancies, reducing underreporting, and structuring population-based cohorts. Following the success of this pilot project, the Ningbo model will be gradually promoted elsewhere and is expected to be a milestone in NCD surveillance, control, and prevention in China.
UR - http://www.jmir.org/2020/7/e17340/ UR - http://dx.doi.org/10.2196/17340 UR - http://www.ncbi.nlm.nih.gov/pubmed/32706706 ID - info:doi/10.2196/17340 ER - TY - JOUR AU - Sun, Haixia AU - Xiao, Jin AU - Zhu, Wei AU - He, Yilong AU - Zhang, Sheng AU - Xu, Xiaowei AU - Hou, Li AU - Li, Jiao AU - Ni, Yuan AU - Xie, Guotong PY - 2020/7/23 TI - Medical Knowledge Graph to Enhance Fraud, Waste, and Abuse Detection on Claim Data: Model Development and Performance Evaluation JO - JMIR Med Inform SP - e17653 VL - 8 IS - 7 KW - medical knowledge graph KW - FWA detection N2 - Background: Fraud, Waste, and Abuse (FWA) detection is a significant yet challenging problem in the health insurance industry. An essential step in FWA detection is to check whether the medication is clinically reasonable with respect to the diagnosis. Currently, human experts with sufficient medical knowledge are required to perform this task. To reduce the cost, insurance inspectors tend to build an intelligent system to detect suspicious claims with inappropriate diagnoses/medications automatically. Objective: The aim of this study was to develop an automated method for making use of a medical knowledge graph to identify clinically suspected claims for FWA detection. Methods: First, we identified the medical knowledge that is required to assess the clinical rationality of the claims. We then searched for data sources that contain information to build such knowledge. In this study, we focused on Chinese medical knowledge. Second, we constructed a medical knowledge graph using unstructured knowledge. We used a deep learning-based method to extract the entities and relationships from the knowledge sources and developed a multilevel similarity matching approach to conduct the entity linking. To guarantee the quality of the medical knowledge graph, we involved human experts to review the entity and relationships with lower confidence.
These reviewed results could be used to further improve the machine-learning models. Finally, we developed the rules to identify the suspected claims by reasoning according to the medical knowledge graph. Results: We collected 185,796 drug labels from the China Food and Drug Administration, 3390 types of disease information from medical textbooks (eg, symptoms, diagnosis, treatment, and prognosis), and information from 5272 examinations as the knowledge sources. The final medical knowledge graph includes 1,616,549 nodes and 5,963,444 edges. We designed three knowledge graph reasoning rules to identify three kinds of inappropriate diagnosis/medications. The experimental results showed that the medical knowledge graph helps to detect 70% of the suspected claims. Conclusions: The medical knowledge graph-based method successfully identified suspected cases of FWA (such as fraud diagnosis, excess prescription, and irrational prescription) from the claim documents, which helped to improve the efficiency of claim processing. UR - http://medinform.jmir.org/2020/7/e17653/ UR - http://dx.doi.org/10.2196/17653 UR - http://www.ncbi.nlm.nih.gov/pubmed/32706714 ID - info:doi/10.2196/17653 ER - TY - JOUR AU - Palzes, A. Vanessa AU - Weisner, Constance AU - Chi, W. Felicia AU - Kline-Simon, H. Andrea AU - Satre, D. Derek AU - Hirschtritt, E. 
Matthew AU - Ghadiali, Murtuza AU - Sterling, Stacy PY - 2020/7/22 TI - The Kaiser Permanente Northern California Adult Alcohol Registry, an Electronic Health Records-Based Registry of Patients With Alcohol Problems: Development and Implementation JO - JMIR Med Inform SP - e19081 VL - 8 IS - 7 KW - electronic health records KW - alcohol KW - registry KW - unhealthy alcohol use KW - alcohol use disorder KW - recovery KW - secondary data N2 - Background: Electronic health record (EHR)-based disease registries have aided health care professionals and researchers in increasing their understanding of chronic illnesses, including identifying patients with (or at risk of developing) conditions and tracking treatment progress and recovery. Despite excessive alcohol use being a major contributor to the global burden of disease and disability, no registries of alcohol problems exist. EHR-based data in Kaiser Permanente Northern California (KPNC), an integrated health system that conducts systematic alcohol screening, provides specialty addiction medicine treatment internally, and has a membership of over 4 million that is highly representative of the US population with access to care, provide a unique opportunity to develop such a registry. Objective: Our objectives were to describe the development and implementation of a protocol for assembling the KPNC Adult Alcohol Registry, which may be useful to other researchers and health systems, and to characterize the registry cohort descriptively, including underlying health conditions. Methods: Inclusion criteria were adult members with unhealthy alcohol use (using National Institute on Alcohol Abuse and Alcoholism guidelines), an alcohol use disorder (AUD) diagnosis, or an alcohol-related health problem between June 1, 2013, and May 31, 2019. We extracted patients' longitudinal, multidimensional EHR data from 1 year before their date of eligibility through May 31, 2019, and conducted descriptive analyses. 
Results: We identified 723,604 adult patients who met the registry inclusion criteria at any time during the study period: 631,780 with unhealthy alcohol use, 143,690 with an AUD diagnosis, and 18,985 with an alcohol-related health problem. We identified 65,064 patients who met two or more criteria. Of the 4,973,195 adult patients with at least one encounter with the health system during the study period, the prevalence of unhealthy alcohol use was 13% (631,780/4,973,195), the prevalence of AUD diagnoses was 3% (143,690/4,973,195), and the prevalence of alcohol-related health problems was 0.4% (18,985/4,973,195). The registry cohort was 60% male (n=432,847) and 41% non-White (n=295,998) and had a median age of 41 years (IQR=27). About 48% (n=346,408) had a chronic medical condition, 18% (n=130,031) had a mental health condition, and 4% (n=30,429) had a drug use disorder diagnosis. Conclusions: We demonstrated that EHR-based data collected during clinical care within an integrated health system could be leveraged to develop a registry of patients with alcohol problems that is flexible and can be easily updated. The registry's comprehensive patient-level data over multiyear periods provides a strong foundation for robust research addressing critical public health questions related to the full course and spectrum of alcohol problems, including recovery, which would complement other methods used in alcohol research (eg, population-based surveys, clinical trials). UR - http://medinform.jmir.org/2020/7/e19081/ UR - http://dx.doi.org/10.2196/19081 UR - http://www.ncbi.nlm.nih.gov/pubmed/32706676 ID - info:doi/10.2196/19081 ER - TY - JOUR AU - de Lusignan, Simon AU - Jones, Nicholas AU - Dorward, Jienchi AU - Byford, Rachel AU - Liyanage, Harshana AU - Briggs, John AU - Ferreira, Filipa AU - Akinyemi, Oluwafunmi AU - Amirthalingam, Gayatri AU - Bates, Chris AU - Lopez Bernal, Jamie AU - Dabrera, Gavin AU - Eavis, Alex AU - Elliot, J. 
Alex AU - Feher, Michael AU - Krajenbrink, Else AU - Hoang, Uy AU - Howsam, Gary AU - Leach, Jonathan AU - Okusi, Cecilia AU - Nicholson, Brian AU - Nieri, Philip AU - Sherlock, Julian AU - Smith, Gillian AU - Thomas, Mark AU - Thomas, Nicholas AU - Tripathy, Manasa AU - Victor, William AU - Williams, John AU - Wood, Ian AU - Zambon, Maria AU - Parry, John AU - O'Hanlon, Shaun AU - Joy, Mark AU - Butler, Chris AU - Marshall, Martin AU - Hobbs, Richard F. D. PY - 2020/7/2 TI - The Oxford Royal College of General Practitioners Clinical Informatics Digital Hub: Protocol to Develop Extended COVID-19 Surveillance and Trial Platforms JO - JMIR Public Health Surveill SP - e19773 VL - 6 IS - 3 KW - primary health care KW - general practice KW - medical record systems, computerized KW - sentinel surveillance KW - public health surveillance KW - clinical trials as a topic KW - adaptive clinical trials KW - severe acute respiratory syndrome coronavirus 2 KW - COVID-19 N2 - Background: Routinely recorded primary care data have been used for many years by sentinel networks for surveillance. More recently, real world data have been used for a wider range of research projects to support rapid, inexpensive clinical trials. Because the partial national lockdown in the United Kingdom due to the coronavirus disease (COVID-19) pandemic has resulted in decreasing community disease incidence, much larger numbers of general practices are needed to deliver effective COVID-19 surveillance and contribute to in-pandemic clinical trials. Objective: The aim of this protocol is to describe the rapid design and development of the Oxford Royal College of General Practitioners Clinical Informatics Digital Hub (ORCHID) and its first two platforms. The Surveillance Platform will provide extended primary care surveillance, while the Trials Platform is a streamlined clinical trials platform that will be integrated into routine primary care practice. 
Methods: We will apply the FAIR (Findable, Accessible, Interoperable, and Reusable) metadata principles to a new, integrated digital health hub that will extract routinely collected general practice electronic health data for use in clinical trials and provide enhanced communicable disease surveillance. The hub will be findable through membership in Health Data Research UK and European metadata repositories. Accessibility through an online application system will provide access to study-ready data sets or custom-developed data sets. Interoperability will be facilitated by fixed linkage to other key sources such as Hospital Episode Statistics and the Office for National Statistics using pseudonymized data. All semantic descriptors (ie, ontologies) and code used for analysis will be made available to accelerate analyses. We will also make data available using common data models, starting with the US Food and Drug Administration Sentinel and Observational Medical Outcomes Partnership approaches, to facilitate international studies. The Surveillance Platform will provide access to data for health protection and promotion work as authorized through agreements between Oxford, the Royal College of General Practitioners, and Public Health England. All studies using the Trials Platform will go through appropriate ethical and other regulatory approval processes. Results: The hub will be a bottom-up, professionally led network that will provide benefits for member practices, our health service, and the population served. Data will only be used for SQUIRE (surveillance, quality improvement, research, and education) purposes. We have already received positive responses from practices, and the number of practices in the network has doubled to over 1150 since February 2020. 
COVID-19 surveillance has resulted in a tripling of the number of virology sites to 293 (target 300), which has aided the collection of the largest ever weekly total of surveillance swabs in the United Kingdom as well as over 3000 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) serology samples. Practices are recruiting to the PRINCIPLE (Platform Randomised trial of INterventions against COVID-19 In older PeopLE) trial, and these participants will be followed up through ORCHID. These initial outputs demonstrate the feasibility of ORCHID to provide an extended national digital health hub. Conclusions: ORCHID will provide equitable and innovative use of big data through a professionally led national primary care network and the application of FAIR principles. The secure data hub will host routinely collected general practice data linked to other key health care repositories for clinical trials and support enhanced in situ surveillance without always requiring large volume data extracts. ORCHID will support rapid data extraction, analysis, and dissemination with the aim of improving future research and development in general practice to positively impact patient care. 
International Registered Report Identifier (IRRID): DERR1-10.2196/19773 UR - https://publichealth.jmir.org/2020/3/e19773 UR - http://dx.doi.org/10.2196/19773 UR - http://www.ncbi.nlm.nih.gov/pubmed/32484782 ID - info:doi/10.2196/19773 ER - TY - JOUR AU - Nguyen, Long AU - Stoové, Mark AU - Boyle, Douglas AU - Callander, Denton AU - McManus, Hamish AU - Asselin, Jason AU - Guy, Rebecca AU - Donovan, Basil AU - Hellard, Margaret AU - El-Hayek, Carol PY - 2020/6/24 TI - Privacy-Preserving Record Linkage of Deidentified Records Within a Public Health Surveillance System: Evaluation Study JO - J Med Internet Res SP - e16757 VL - 22 IS - 6 KW - medical record linkage KW - public health surveillance KW - sentinel surveillance KW - sensitivity and specificity KW - data linkage KW - confidentiality KW - evaluation studies as a topic N2 - Background: The Australian Collaboration for Coordinated Enhanced Sentinel Surveillance (ACCESS) was established to monitor national testing and test outcomes for blood-borne viruses (BBVs) and sexually transmissible infections (STIs) in key populations. ACCESS extracts deidentified data from sentinel health services that include general practice, sexual health, and infectious disease clinics, as well as public and private laboratories that conduct a large volume of BBV/STI testing. An important attribute of ACCESS is the ability to accurately link individual-level records within and between the participating sites, as this enables the system to produce reliable epidemiological measures. Objective: The aim of this study was to evaluate the use of GRHANITE software in ACCESS to extract and link deidentified data from participating clinics and laboratories. GRHANITE generates irreversible hashed linkage keys based on patient-identifying data captured in the patient electronic medical records (EMRs) at the site. 
The algorithms to produce the data linkage keys use probabilistic linkage principles to account for variability and completeness of the underlying patient identifiers, producing up to four linkage key types per EMR. Errors in the linkage process can arise from imperfect or missing identifiers, impacting the system's integrity. Therefore, it is important to evaluate the quality of the linkages created and evaluate the outcome of the linkage for ongoing public health surveillance. Methods: Although ACCESS data are deidentified, we created two gold-standard datasets where the true match status could be confirmed in order to compare against record linkage results arising from different approaches of the GRHANITE Linkage Tool. We reported sensitivity, specificity, and positive and negative predictive values where possible and estimated specificity by comparing a history of HIV and hepatitis C antibody results for linked EMRs. Results: Sensitivity ranged from 96% to 100%, and specificity was 100% when applying the GRHANITE Linkage Tool to a small gold-standard dataset of 3700 clinical medical records. Medical records in this dataset contained a very high level of data completeness by having the name, date of birth, post code, and Medicare number available for use in record linkage. In a larger gold-standard dataset containing 86,538 medical records across clinics and pathology services, with a lower level of data completeness, sensitivity ranged from 94% to 95% and estimated specificity ranged from 91% to 99% in 4 of the 6 different record linkage approaches. Conclusions: This study's findings suggest that the GRHANITE Linkage Tool can be used to link deidentified patient records accurately and can be confidently used for public health surveillance in systems such as ACCESS. 
UR - https://www.jmir.org/2020/6/e16757 UR - http://dx.doi.org/10.2196/16757 UR - http://www.ncbi.nlm.nih.gov/pubmed/32579128 ID - info:doi/10.2196/16757 ER - TY - JOUR AU - Sung, Sheng-Feng AU - Hsieh, Cheng-Yang AU - Hu, Ya-Han PY - 2020/6/16 TI - Two Decades of Research Using Taiwan's National Health Insurance Claims Data: Bibliometric and Text Mining Analysis on PubMed JO - J Med Internet Res SP - e18457 VL - 22 IS - 6 KW - administrative claims data KW - bibliometric analysis KW - National Health Insurance KW - text mining KW - open access journals KW - PubMed N2 - Background: Studies using Taiwan's National Health Insurance (NHI) claims data expanded rapidly both in quantity and quality during the first decade following the first study published in 2000. However, some of these studies were criticized for being merely data-dredging studies rather than hypothesis-driven research. In addition, the use of claims data without explicit authorization from individual patients has incurred litigation. Objective: This study aimed to investigate whether the research output during the second decade after the release of the NHI claims database continues growing, to explore how the emergence of open access mega journals (OAMJs) and a lawsuit against the use of this database affected the research topics and publication volume, and to discuss the underlying reasons. Methods: PubMed was used to locate publications based on NHI claims data between 1996 and 2017. Concept extraction using MetaMap was employed to mine research topics from article titles. Research trends were analyzed from various aspects, including publication amount, journals, research topics and types, and cooperation between authors. Results: A total of 4473 articles were identified. A rapid growth in publications was witnessed from 2000 to 2015, followed by a plateau. 
Diabetes, stroke, and dementia were the top 3 most popular research topics, whereas statin therapy, metformin, and Chinese herbal medicine were the most investigated interventions. Approximately one-third of the articles were published in open access journals. Studies with two or more medical conditions, but without any intervention, were the most common study type. Studies of this type tended to be contributed by prolific authors and published in OAMJs. Conclusions: The growth in publication volume during the second decade after the release of the NHI claims database was different from that during the first decade. OAMJs appeared to provide fertile soil for the rapid growth of research based on NHI claims data, in particular for those studies with two or more medical conditions in the article title. A halt in the growth of publication volume was observed after the use of NHI claims data for research purposes had been restricted in response to legal controversy. More efforts are needed to improve the impact of knowledge gained from NHI claims data on medical decisions and policy making. UR - http://www.jmir.org/2020/6/e18457/ UR - http://dx.doi.org/10.2196/18457 UR - http://www.ncbi.nlm.nih.gov/pubmed/32543443 ID - info:doi/10.2196/18457 ER - TY - JOUR AU - Ye, Qing AU - Zhou, Jin AU - Wu, Hong PY - 2020/6/8 TI - Using Information Technology to Manage the COVID-19 Pandemic: Development of a Technical Framework Based on Practical Experience in China JO - JMIR Med Inform SP - e19515 VL - 8 IS - 6 KW - COVID-19 KW - pandemic KW - health informatics KW - health information technology KW - technical framework KW - privacy protection N2 - Background: The coronavirus disease (COVID-19) epidemic poses an enormous challenge to the global health system, and governments have taken active preventive and control measures. 
The health informatics community in China has actively taken action to leverage health information technologies for epidemic monitoring, detection, early warning, prevention and control, and other tasks. Objective: The aim of this study was to develop a technical framework to respond to the COVID-19 epidemic from a health informatics perspective. Methods: In this study, we collected health information technology-related information to understand the actions taken by the health informatics community in China during the COVID-19 outbreak and developed a health information technology framework for epidemic response based on health information technology-related measures and methods. Results: Based on the framework, we review specific health information technology practices for managing the outbreak in China, describe the highlights of their application in detail, and discuss critical issues to consider when using health information technology. Technologies employed include mobile and web-based services such as Internet hospitals and WeChat, big data analyses (including digital contact tracing through QR codes or epidemic prediction), cloud computing, the Internet of Things, artificial intelligence (including the use of drones, robots, and intelligent diagnoses), 5G telemedicine, and clinical information systems to facilitate clinical management for COVID-19. Conclusions: Practical experience in China shows that health information technologies play a pivotal role in responding to the COVID-19 epidemic. 
UR - http://medinform.jmir.org/2020/6/e19515/ UR - http://dx.doi.org/10.2196/19515 UR - http://www.ncbi.nlm.nih.gov/pubmed/32479411 ID - info:doi/10.2196/19515 ER - TY - JOUR AU - Her, Qoua AU - Malenfant, Jessica AU - Zhang, Zilu AU - Vilk, Yury AU - Young, Jessica AU - Tabano, David AU - Hamilton, Jack AU - Johnson, Ron AU - Raebel, Marsha AU - Boudreau, Denise AU - Toh, Sengwee PY - 2020/6/4 TI - Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance JO - JMIR Med Inform SP - e15073 VL - 8 IS - 6 KW - distributed regression analysis KW - distributed data networks KW - privacy-protecting analytics KW - pharmacoepidemiology KW - PopMedNet N2 - Background: A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limited. Objective: This study aimed to assess the precision and operational performance of a DRA application comprising a SAS-based DRA package and a file transfer workflow developed within the open-source distributed networking software PopMedNet in a horizontally partitioned distributed data network. Methods: We executed the SAS-based DRA package to perform distributed linear, logistic, and Cox proportional hazards regression analysis on a real-world test case with 3 data partners. We used PopMedNet to iteratively and automatically transfer highly summarized information between the data partners and the analysis center. We compared the DRA results with the results from standard SAS procedures executed on the pooled individual-level dataset to evaluate the precision of the SAS-based DRA package. We computed the execution time of each step in the workflow to evaluate the operational performance of the PopMedNet-driven file transfer workflow. 
Results: All DRA results were precise (differences <10^-12), and DRA model fit curves were identical or similar to those obtained from the corresponding pooled individual-level data analyses. All regression models required less than 20 min for full end-to-end execution. Conclusions: We integrated a SAS-based DRA package with PopMedNet and successfully tested the new capability within an active distributed data network. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies. UR - https://medinform.jmir.org/2020/6/e15073 UR - http://dx.doi.org/10.2196/15073 UR - http://www.ncbi.nlm.nih.gov/pubmed/32496200 ID - info:doi/10.2196/15073 ER - TY - JOUR AU - Montvida, Olga AU - Dibato, Epoh John AU - Paul, Sanjoy PY - 2020/6/3 TI - Evaluating the Representativeness of US Centricity Electronic Medical Records With Reports From the Centers for Disease Control and Prevention: Comparative Study on Office Visits and Cardiometabolic Conditions JO - JMIR Med Inform SP - e17174 VL - 8 IS - 6 KW - electronic medical records KW - observational study KW - epidemiology KW - population health N2 - Background: Electronic medical record (EMR)-based clinical and epidemiological research has dramatically increased over the last decade, although establishing the generalizability of such big databases for conducting epidemiological studies has been an ongoing challenge. To draw meaningful inferences from such studies, it is essential to fully understand the characteristics of the underlying population and potential biases in EMRs. Objective: This study aimed to assess the generalizability and representativeness of the widely used US Centricity Electronic Medical Record (CEMR), a primary and ambulatory care EMR for population health research, using data from the National Ambulatory Medical Care Surveys (NAMCS) and the National Health and Nutrition Examination Surveys (NHANES). 
Methods: The number of office visits reported in the NAMCS, designed to meet the need for objective and reliable information about the provision and the use of ambulatory medical care services, was compared with similar data from the CEMR. The distribution of major cardiometabolic diseases in the NHANES, designed to assess the health and nutritional status of adults and children in the United States, was compared with similar data from the CEMR. Results: Gender and ethnicity distributions were similar between the NAMCS and the CEMR. Younger patients (aged <15 years) were underrepresented in the CEMR compared with the NAMCS. The number of office visits per 100 persons per year was similar: 277.9 (95% CI 259.3-296.5) in the NAMCS and 284.6 (95% CI 284.4-284.7) in the CEMR. However, the number of visits for males was significantly higher in the CEMR (CEMR: 270.8 and NAMCS: 239.0). West and South regions were underrepresented and overrepresented, respectively, in the CEMR. The overall prevalence of diabetes along with age and gender distribution was similar in the CEMR and the NHANES: overall prevalence, 10.1% and 9.7%; male, 11.5% and 10.8%; female, 9.1% and 8.8%; age 20 to 40 years, 2.5% and 1.8%; and age 40 to 60 years, 9.4% and 11.1%, respectively. The prevalence of obesity was similar: 42.1% and 39.6%, with similar age and female distribution (41.5% and 41.1%) but different male distribution (42.7% and 37.9%). The overall prevalence of high cholesterol along with age and female distribution was similar in the CEMR and the NHANES: overall prevalence, 12.4% and 12.4%; and female, 14.8% and 13.2%, respectively. The overall prevalence of hypertension was significantly higher in the CEMR (33.5%) than in the NHANES (95% CI: 27.0%-31.0%). Conclusions: The distribution of major cardiometabolic diseases in the CEMR is comparable with the national survey results. 
The CEMR represents the general US population well in terms of office visits and major chronic conditions, although subgroup differences in age and gender distribution and in prevalence may exist and should therefore be carefully accounted for in future studies. UR - https://medinform.jmir.org/2020/6/e17174 UR - http://dx.doi.org/10.2196/17174 UR - http://www.ncbi.nlm.nih.gov/pubmed/32490850 ID - info:doi/10.2196/17174 ER - TY - JOUR AU - Liu, Zhike AU - Zhang, Liang AU - Yang, Yu AU - Meng, Ruogu AU - Fang, Ting AU - Dong, Ying AU - Li, Ning AU - Xu, Guozhang AU - Zhan, Siyan PY - 2020/6/1 TI - Active Surveillance of Adverse Events Following Human Papillomavirus Vaccination: Feasibility Pilot Study Based on the Regional Health Care Information Platform in the City of Ningbo, China JO - J Med Internet Res SP - e17446 VL - 22 IS - 6 KW - safety KW - HPV KW - human papillomavirus KW - vaccine KW - active surveillance N2 - Background: Comprehensive safety data for vaccines from post-licensure surveillance, especially active surveillance, could guide administrations and individuals to make reasonable decisions on vaccination. Therefore, we designed a pilot study to assess the capability of a regional health care information platform to actively monitor the safety of a newly licensed vaccine. Objective: This study aimed to conduct active surveillance of human papillomavirus (HPV) vaccine safety based on this information platform. Methods: In 2017, one of China's most mature information platforms with superior data linkage was selected. A structured questionnaire and open-ended interview guidelines were developed to investigate the feasibility of active surveillance following HPV vaccination using the regional health care information platform in Ningbo. The questionnaire was sent to participants via email, and a face-to-face interview was conducted to confirm details or resolve discrepancies. 
Results: Five databases that could be considered essential to active surveillance of vaccine safety were integrated into the platform starting in 2015. Except for residents' health records, which had a coverage rate of 87%, the data sources covered more than 95% of the records that were documented in Ningbo. All the data could be inherently linked using the national identity card. There were 19,328 women who received the HPV vaccine, and 37,988 doses were administered in 2017 and 2018. Women aged 30-40 years accounted for the largest proportion. Quadrivalent vaccination accounted for 73.1% of total vaccination, a much higher proportion than that of bivalent vaccination. Of the first doses, 60 (60/19,328, 0.31%) occurred outside Ningbo. There were no missing data for vaccination-relevant variables, such as identity card, vaccine name, vaccination doses, vaccination date, and manufacturer. ICD-10 coding could be used to identify 9,180 cases using a predefined list of the outcomes of interest, and 1.88% of these cases were missing the identity card. During the 90 days following HPV vaccination, 4 incident cases were found through the linked vaccination history and electronic medical records. The combined incident rate of rheumatoid arthritis, optic neuritis, and Henoch-Schonlein purpura was 8.84/100,000 doses of bivalent HPV, and the incidence rate of rheumatoid arthritis was 3.75/100,000 doses of quadrivalent HPV. Conclusions: This study presents an available approach to initiate an active surveillance system for adverse events following HPV vaccination, based on a regional health care information platform in China. An extended observation period or the inclusion of additional functional sites is warranted to conduct future hypothesis-generating and hypothesis-confirming studies for vaccine safety concerns. 
UR - https://www.jmir.org/2020/6/e17446 UR - http://dx.doi.org/10.2196/17446 UR - http://www.ncbi.nlm.nih.gov/pubmed/32234696 ID - info:doi/10.2196/17446 ER - TY - JOUR AU - McLennan, Stuart AU - Celi, Anthony Leo AU - Buyx, Alena PY - 2020/5/29 TI - COVID-19: Putting the General Data Protection Regulation to the Test JO - JMIR Public Health Surveill SP - e19279 VL - 6 IS - 2 KW - COVID-19 KW - data sharing KW - GDPR KW - research exemption KW - global health KW - public health KW - research KW - digital health KW - electronic health records KW - EHR UR - http://publichealth.jmir.org/2020/2/e19279/ UR - http://dx.doi.org/10.2196/19279 UR - http://www.ncbi.nlm.nih.gov/pubmed/32449686 ID - info:doi/10.2196/19279 ER - TY - JOUR AU - Huang, Yihao AU - Li, Mingtao PY - 2020/5/27 TI - Optimization of Precontrol Methods and Analysis of a Dynamic Model for Brucellosis: Model Development and Validation JO - JMIR Med Inform SP - e18664 VL - 8 IS - 5 KW - brucellosis KW - dynamic model KW - protective measures KW - precontrol methods N2 - Background: Brucella is a gram-negative, nonmotile bacterium without a capsule. Brucella can infect a wide range of hosts; the major sources of infection are mammals such as cattle, sheep, goats, pigs, and dogs. Brucella is not currently transmitted between humans. Humans may develop brucellosis when they eat Brucella-contaminated food or come into contact with infected animals or their secretions and excretions. Although brucellosis does not originate in humans, it is very difficult to diagnose and cure and thus has a huge impact on humans. Even with the rapid development of medical science, brucellosis is still a major problem for Chinese people. Currently, the number of patients with brucellosis in China is 100,000 per year. 
In addition, due to the ongoing improvement in the living standards of Chinese people, the demand for meat products has gradually increased, and increased meat transactions have greatly promoted the spread of brucellosis. Therefore, many researchers are concerned with investigating the transmission of Brucella as well as the diagnosis and treatment of brucellosis. Mathematical models have become an important tool for the study of infectious diseases. Mathematical models can reflect the spread of infectious diseases and be used to study the effects of different inhibition methods. Determining which control measures achieve effective suppression can provide theoretical support for the suppression of infectious diseases. Therefore, it is the objective of this study to build a suitable mathematical model for brucellosis infection. Objective: We aimed to study the optimized precontrol methods of brucellosis using a dynamic threshold-based microcomputer model and to provide critical theoretical support for the prevention and control of brucellosis. Methods: By studying the transmission characteristics of Brucella and building a Brucella transmission model, the precontrol methods were designed and presented to the key populations (Brucella-susceptible populations). We investigated the utilization of protective tools by the key populations before and after precontrol methods. Results: An improvement in the amount of glove-wearing was evident and significant (P<.001), increasing from 51.01% before the precontrol methods to 66.22% after the precontrol methods, an increase of 15.21 percentage points. However, the amount of hat-wearing did not improve significantly (P=.95). Hat-wearing among the key populations increased from 57.3% before the precontrol methods to 58.6% after the precontrol methods, an increase of 1.3 percentage points. 
Conclusions: By demonstrating the optimized precontrol methods for a brucellosis model built on a dynamic threshold-based microcomputer model, this study provides theoretical support for the suppression of Brucella and the improved usage of protective measures by key populations. UR - https://medinform.jmir.org/2020/5/e18664 UR - http://dx.doi.org/10.2196/18664 UR - http://www.ncbi.nlm.nih.gov/pubmed/32459180 ID - info:doi/10.2196/18664 ER - TY - JOUR AU - Huang, Yihao AU - Li, Mingtao PY - 2020/5/27 TI - Application of a Mathematical Model in Determining the Spread of the Rabies Virus: Simulation Study JO - JMIR Med Inform SP - e18627 VL - 8 IS - 5 KW - rabies KW - computer model KW - suppression measures KW - basic reproductive number N2 - Background: Rabies is an acute infectious disease of the central nervous system caused by the rabies virus. The mortality rate of rabies is almost 100%. For some countries with poor sanitation, the spread of rabies among dogs is very serious. Objective: The objective of this paper was to study the ecological transmission mode of rabies to make theoretical contributions to the suppression of rabies in China. Methods: A mathematical model of the transmission mode of rabies was constructed using relevant data from the literature and officially published figures in China. Using this model, we fitted the data of the number of patients with rabies and predicted the future number of patients with rabies. In addition, we studied the effectiveness of different rabies suppression measures. Results: The results of the study indicated that the number of people infected with rabies will rise in the first stage, and then decrease. The model forecasted that in about 10 years, the number of rabies cases will be controlled within a relatively stable range. According to the prediction results of the model reported in this paper, the number of rabies cases will eventually plateau at approximately 500 people every year. 
Relatively effective rabies suppression measures include controlling the birth rate of domestic and wild dogs as well as increasing the level of rabies immunity in domestic dogs. Conclusions: The basic reproductive number of rabies in China is still greater than 1. That is, China currently has insufficient measures to control rabies. The research on the transmission mode of rabies and control measures in this paper can provide theoretical support for rabies control in China. UR - http://medinform.jmir.org/2020/5/e18627/ UR - http://dx.doi.org/10.2196/18627 UR - http://www.ncbi.nlm.nih.gov/pubmed/32459185 ID - info:doi/10.2196/18627 ER - TY - JOUR AU - Jones, Kerina AU - Daniels, Helen AU - Heys, Sharon AU - Lacey, Arron AU - Ford, V. David PY - 2020/5/15 TI - Toward a Risk-Utility Data Governance Framework for Research Using Genomic and Phenotypic Data in Safe Havens: Multifaceted Review JO - J Med Internet Res SP - e16346 VL - 22 IS - 5 KW - genomic data KW - data safe havens KW - data governance N2 - Background: Research using genomic data opens up new insights into health and disease. Being able to use the data in association with health and administrative record data held in safe havens can multiply the benefits. However, there is much discussion about the use of genomic data with perceptions of particular challenges in doing so safely and effectively. Objective: This study aimed to work toward a risk-utility data governance framework for research using genomic and phenotypic data in an anonymized form for research in safe havens. Methods: We carried out a multifaceted review drawing upon data governance arrangements in published research, case studies of organizations working with genomic and phenotypic data, public views and expectations, and example studies using genomic and phenotypic data in combination. The findings were contextualized against a backdrop of legislative and regulatory requirements and used to create recommendations. 
Results: We proposed recommendations toward a risk-utility model with a flexible suite of controls to safeguard privacy and retain data utility for research. These were presented as overarching principles aligned to the core elements in the data sharing framework produced by the Global Alliance for Genomics and Health and as practical control measures distilled from published literature and case studies of operational safe havens to be applied as required at a project-specific level. Conclusions: The recommendations presented can be used to contribute toward a proportionate data governance framework to promote the safe, socially acceptable use of genomic and phenotypic data in safe havens. They do not purport to eradicate risk but propose case-by-case assessment with transparency and accountability. If the risks are adequately understood and mitigated, there should be no reason that linked genomic and phenotypic data should not be used in an anonymized form for research in safe havens. UR - https://www.jmir.org/2020/5/e16346 UR - http://dx.doi.org/10.2196/16346 UR - http://www.ncbi.nlm.nih.gov/pubmed/32412420 ID - info:doi/10.2196/16346 ER - TY - JOUR AU - Avoundjian, Tigran AU - Dombrowski, C. Julia AU - Golden, R. Matthew AU - Hughes, P. James AU - Guthrie, L. Brandon AU - Baseman, Janet AU - Sadinle, Mauricio PY - 2020/4/30 TI - Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study JO - JMIR Public Health Surveill SP - e15917 VL - 6 IS - 2 KW - medical record linkage KW - public health surveillance KW - public health practice KW - data management N2 - Background: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. 
In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities. Objective: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice. Methods: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic record linkage algorithms (fastLink and beta record linkage [BRL]) using simulations and a real-world scenario. We simulated pairs of datasets with varying numbers of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In a real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm compared with a composite standard based on the agreement in matching decisions across all the algorithms and manual review. Results: In simulations, BRL and fastLink maintained a high recall at nearly all data quality levels, while being comparable with deterministic algorithms in terms of precision. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). 
In the real-world scenario, BRL had the lowest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%). Conclusions: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action. UR - http://publichealth.jmir.org/2020/2/e15917/ UR - http://dx.doi.org/10.2196/15917 UR - http://www.ncbi.nlm.nih.gov/pubmed/32352389 ID - info:doi/10.2196/15917 ER - TY - JOUR AU - Grundstrom, Casandra AU - Korhonen, Olli AU - Väyrynen, Karin AU - Isomursu, Minna PY - 2020/3/26 TI - Insurance Customers' Expectations for Sharing Health Data: Qualitative Survey Study JO - JMIR Med Inform SP - e16102 VL - 8 IS - 3 KW - data sharing KW - qualitative research KW - survey KW - health insurance KW - insurance KW - medical informatics KW - health services N2 - Background: Insurance organizations are essential stakeholders in health care ecosystems. For addressing future health care needs, insurance companies require access to health data to deliver preventative and proactive digital health services to customers. However, extant research is limited in examining the conditions that incentivize health data sharing. Objective: This study aimed to (1) identify the expectations of insurance customers when sharing health data, (2) determine the perceived intrinsic value of health data, and (3) explore the conditions that aid in incentivizing health data sharing in the relationship between an insurance organization and its customer. Methods: A Web-based survey was distributed to randomly selected customers from a Finnish insurance organization through email. A single open-text answer was used for a qualitative data analysis through inductive coding, followed by a thematic analysis. Furthermore, the 4 constructs of commitment, power, reciprocity, and trust from the social exchange theory (SET) were applied as a framework. 
Results: From the 5000 customers invited to participate, we received 452 surveys (response rate: 9.0%). Customer characteristics were found to reflect customer demographics. Of the 452 surveys, 48 (10.6%) open-text responses were skipped by the customer, 57 (12.6%) customers had no expectations from sharing health data, and 44 (9.7%) customers preferred to abstain from a data sharing relationship. Using the SET framework, we found that customers expected different conditions to be fulfilled by their insurance provider based on the commitment, power, reciprocity, and trust constructs. Of the 452 customers who completed the surveys, 64 (14.2%) customers required that the insurance organization meets their data treatment expectations (commitment). Overall, 4.9% (22/452) of customers were concerned about their health data being used against them to profile their health, to increase insurance prices, or to deny health insurance claims (power). A total of 28.5% (129/452) of customers expected some form of benefit, such as personalized digital health services, and 29.9% (135/452) of customers expected finance-related compensation (reciprocity). Furthermore, 7.5% (34/452) of customers expected some form of empathy from the insurance organization through enhanced transparency or an emotional connection (trust). Conclusions: To aid in the design and development of digital health services, insurance organizations need to address the customers' expectations when sharing their health data. We established the expectations of customers in the social exchange of health data and explored the perceived values of data as intangible goods. Actions by the insurance organization should aim to increase trust through a culture of transparency, commitment to treat health data in a prescribed manner, provide reciprocal benefits through digital health services that customers deem valuable, and assuage fears of health data being used to prevent providing insurance coverage or increase costs. 
UR - http://medinform.jmir.org/2020/3/e16102/ UR - http://dx.doi.org/10.2196/16102 UR - http://www.ncbi.nlm.nih.gov/pubmed/32213467 ID - info:doi/10.2196/16102 ER - TY - JOUR AU - Park, Rang Yu AU - Koo, HaYeong AU - Yoon, Young-Kwang AU - Park, Sumi AU - Lim, Young-Suk AU - Baek, Seunghee AU - Kim, Reong Hae AU - Kim, Won Tae PY - 2020/2/27 TI - Expedited Safety Reporting Through an Alert System for Clinical Trial Management at an Academic Medical Center: Retrospective Design Study JO - JMIR Med Inform SP - e14379 VL - 8 IS - 2 KW - clinical trial KW - adverse event KW - early detection KW - patient safety N2 - Background: Early detection or notification of adverse event (AE) occurrences during clinical trials is essential to ensure patient safety. Clinical trials take advantage of innovative strategies, clinical designs, and state-of-the-art technologies to evaluate efficacy and safety, however, early awareness of AE occurrences by investigators still needs to be systematically improved. Objective: This study aimed to build a system to promptly inform investigators when clinical trial participants make unscheduled visits to the emergency room or other departments within the hospital. Methods: We developed the Adverse Event Awareness System (AEAS), which promptly informs investigators and study coordinators of AE occurrences by automatically sending text messages when study participants make unscheduled visits to the emergency department or other clinics at our center. We established the AEAS in July 2015 in the clinical trial management system. We compared the AE reporting timeline data of 305 AE occurrences from 74 clinical trials between the preinitiative period (December 2014-June 2015) and the postinitiative period (July 2015-June 2016) in terms of three AE awareness performance indicators: onset to awareness, awareness to reporting, and onset to reporting. Results: A total of 305 initial AE reports from 74 clinical trials were included. 
All three AE awareness performance indicators were significantly lower in the postinitiative period. Specifically, the onset-to-reporting times were significantly shorter in the postinitiative period (median 1 day [IQR 0-1], mean rank 140.04 [SD 75.35]) than in the preinitiative period (median 1 day [IQR 0-4], mean rank 173.82 [SD 91.07], P≤.001). In the phase subgroup analysis, the awareness-to-reporting and onset-to-reporting indicators of phase 1 studies were significantly lower in the postinitiative than in the preinitiative period (preinitiative: median 1 day, mean rank of awareness to reporting 47.94, vs postinitiative: median 0 days, mean rank of awareness to reporting 35.75, P=.01; and preinitiative: median 1 day, mean rank of onset to reporting 47.4, vs postinitiative: median 1 day, mean rank of onset to reporting 35.99, P=.03). The risk-level subgroup analysis found that the onset-to-reporting time for low- and high-risk studies significantly decreased postinitiative (preinitiative: median 4 days, mean rank of low-risk studies 18.73, vs postinitiative: median 1 day, mean rank of low-risk studies 11.76, P=.02; and preinitiative: median 1 day, mean rank of high-risk studies 117.36, vs postinitiative: median 1 day, mean rank of high-risk studies 97.27, P=.01). In particular, onset to reporting was reduced more in the low-risk trials (median 4 days to 0 days) than in the high-risk trials (median 1 day to 1 day). Conclusions: We demonstrated that a real-time automatic alert system can effectively improve safety reporting timelines. The improvements were prominent in phase 1 and in low- and high-risk clinical trials. These findings suggest that an information technology-driven automatic alert system effectively improves safety reporting timelines, which may enhance patient safety. 
UR - http://medinform.jmir.org/2020/2/e14379/ UR - http://dx.doi.org/10.2196/14379 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/14379 ER - TY - JOUR AU - Reiner Benaim, Anat AU - Almog, Ronit AU - Gorelik, Yuri AU - Hochberg, Irit AU - Nassar, Laila AU - Mashiach, Tanya AU - Khamaisi, Mogher AU - Lurie, Yael AU - Azzam, S. Zaher AU - Khoury, Johad AU - Kurnik, Daniel AU - Beyar, Rafael PY - 2020/2/20 TI - Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies JO - JMIR Med Inform SP - e16492 VL - 8 IS - 2 KW - synthetic data KW - electronic medical records KW - MDClone KW - validation study KW - big data analysis N2 - Background: Privacy restrictions limit access to protected patient-derived health information for research purposes. Consequently, data anonymization is required to allow researchers data access for initial analysis before granting institutional review board approval. A system installed and activated at our institution enables synthetic data generation that mimics data from real electronic medical records, wherein only fictitious patients are listed. Objective: This paper aimed to validate the results obtained when analyzing synthetic structured data for medical research. A comprehensive validation process concerning meaningful clinical questions and various types of data was conducted to assess the accuracy and precision of statistical estimates derived from synthetic patient data. Methods: A cross-hospital project was conducted to validate results obtained from synthetic data produced for five contemporary studies on various topics. For each study, results derived from synthetic data were compared with those based on real data. In addition, repeatedly generated synthetic datasets were used to estimate the bias and stability of results obtained from synthetic data. 
Results: This study demonstrated that results derived from synthetic data were predictive of results from real data. When the number of patients was large relative to the number of variables used, highly accurate and strongly consistent results were observed between synthetic and real data. For studies based on smaller populations that accounted for confounders and modifiers by multivariate models, predictions were of moderate accuracy, yet clear trends were correctly observed. Conclusions: The use of synthetic structured data provides a close estimate to real data results and is thus a powerful tool in shaping research hypotheses and accessing estimated analyses, without risking patient privacy. Synthetic data enable broad access to data (eg, for out-of-organization researchers), and rapid, safe, and repeatable analysis of data in hospitals or other health organizations where patient privacy is a primary value. UR - http://medinform.jmir.org/2020/2/e16492/ UR - http://dx.doi.org/10.2196/16492 UR - http://www.ncbi.nlm.nih.gov/pubmed/32130148 ID - info:doi/10.2196/16492 ER - TY - JOUR AU - Bacon, Seb AU - Goldacre, Ben PY - 2020/1/13 TI - Barriers to Working With National Health Service England's Open Data JO - J Med Internet Res SP - e15603 VL - 22 IS - 1 KW - informatics KW - health services KW - software KW - access to information UR - https://www.jmir.org/2020/1/e15603 UR - http://dx.doi.org/10.2196/15603 UR - http://www.ncbi.nlm.nih.gov/pubmed/31929101 ID - info:doi/10.2196/15603 ER - TY - JOUR AU - de Lusignan, Simon AU - Correa, Ana AU - Dos Santos, Gaël AU - Meyer, Nadia AU - Haguinet, François AU - Webb, Rebecca AU - McGee, Christopher AU - Byford, Rachel AU - Yonova, Ivelina AU - Pathirannehelage, Sameera AU - Ferreira, Matos Filipa AU - Jones, Simon PY - 2019/11/14 TI - Enhanced Safety Surveillance of Influenza Vaccines in General Practice, Winter 2015-16: Feasibility Study JO - JMIR Public Health Surveill SP - e12016 VL - 5 IS - 4 KW - vaccines KW - 
safety management KW - medical records systems, computerized KW - drug-related side effects and adverse reactions KW - influenza, human KW - influenza vaccines KW - general practice KW - England N2 - Background: The European Medicines Agency (EMA) requires vaccine manufacturers to conduct enhanced real-time surveillance of seasonal influenza vaccination. The EMA has specified a list of adverse events of interest to be monitored. The EMA sets out 3 different ways to conduct such surveillance: (1) active surveillance, (2) enhanced passive surveillance (EPS), or (3) electronic health record data mining (EHR-DM). English general practice (GP) is a suitable setting to implement enhanced passive surveillance and EHR-DM. Objective: This study aimed to test the feasibility of conducting enhanced passive surveillance in GP using the yellow card scheme (adverse events of interest reporting cards) to determine if it has any advantages over EHR-DM alone. Methods: A total of 9 GPs in England participated, of which 3 tested the feasibility of enhanced passive surveillance and the other 6 EHR-DM alone. The 3 that tested EPS provided patients with yellow (adverse event) cards to report any adverse events. Data were extracted from all 9 GPs' EHRs between weeks 35 and 49 (08/24/2015 to 12/06/2015), the main period of influenza vaccination. We conducted weekly analysis and end-of-study analyses. Results: Our GPs were largely distributed across England with a registered population of 81,040. In the week 49 report, 15,863/81,040 people (19.57% of the registered practice population) were vaccinated. In the EPS practices, staff managed to hand out the cards to 61.25% (4150/6776) of the vaccinees, and of these cards, 1.98% (82/4150) were returned to the GP offices. Adverse events of interest were reported by 113/7223 people (1.56%) in the enhanced passive surveillance practices, compared with 322/8640 people (3.73%) in the EHR-DM practices. 
Conclusions: Overall, we demonstrated that GPs' EHR-DM was an appropriate method of enhanced surveillance. However, the use of yellow cards, in enhanced passive surveillance practices, did not enhance the collection of adverse events of interest as demonstrated in this study. Their return rate was poor, data entry from them was not straightforward, and there were issues with data reconciliation. We concluded that customized cards prespecifying the EMA's adverse events of interest, combined with EHR-DM, were needed to maximize data collection. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2016-015469 UR - http://publichealth.jmir.org/2019/4/e12016/ UR - http://dx.doi.org/10.2196/12016 UR - http://www.ncbi.nlm.nih.gov/pubmed/31724955 ID - info:doi/10.2196/12016 ER - TY - JOUR AU - Karnoe, Astrid AU - Kayser, Lars AU - Skovgaard, Lasse PY - 2019/10/9 TI - Identification of Factors That Motivate People With Multiple Sclerosis to Participate in Digital Data Collection in Research: Sequential Mixed Methods Study JO - JMIR Hum Factors SP - e13295 VL - 6 IS - 4 KW - health literacy KW - computer literacy KW - mobile apps KW - patient participation KW - research design KW - multiple sclerosis N2 - Background: Digital data collection has the potential to reduce participant burden in research projects that require extensive registrations from participants. To achieve this, a digital data collection tool needs to address potential barriers and motivations for participation. Objective: This study aimed to identify factors that may affect motivation for participation and adoption of a digital data collection tool in a research project on nutrition and multiple sclerosis (MS). Methods: The study was designed as a sequential mixed methods study with 3 phases. In phase 1, 15 semistructured interviews were conducted in a Danish population of individuals with MS. 
Interview guide frameworks were based on dimensions from the electronic health literacy framework and the Health Education Impact Questionnaire. Data from phase 1 were analyzed in a content analysis, and findings were used to inform the survey design in phase 2 that validates the results from the content analysis in a larger population. The survey consisted of 14 items, and it was sent to 1000 individuals with MS (response rate 42.5%). In phase 3, participants in 3 focus group interviews discussed how findings from phases 1 and 2 might affect motivation for participation and adoption of the digital tool. Results: The following 3 categories related to barriers and incentives for participation were identified in the content analysis of the 15 individual interviews: (1) life with MS, (2) use of technology, and (3) participation and incentives. Phase 1 findings were tested in phase 2's survey in a larger population (n=1000). The majority of participants were comfortable using smartphone technologies and participated actively on social media platforms. MS symptoms did cause limitations in the use of Web pages and apps when the given pages had screen clutter, too many colors, or too small buttons. Life with MS meant that most participants had to ration their energy levels. Support from family and friends was important to participants, but support could also come in the form of physical aids (walking aids and similar) and digital aids (reminders, calendar functions, and medication management). Factors that could discourage participation were particularly related to the time it would take every day. The biggest motivations for participation were to contribute to research in MS, to learn more about one's own MS and what affects it, and to be able to exchange experiences with other people with MS. Conclusions: MS causes limitations that put demands on tools developed for digital data collection. 
A digital data collection tool can increase chances of high adoption rates, but it needs to be supplemented with a clear and simple project design and continuous communication with participants. Motivational factors should be considered in both study design and the development of a digital data collection tool for research. UR - https://humanfactors.jmir.org/2019/4/e13295 UR - http://dx.doi.org/10.2196/13295 UR - http://www.ncbi.nlm.nih.gov/pubmed/31599738 ID - info:doi/10.2196/13295 ER - TY - JOUR AU - Neves, Luísa Ana AU - Poovendran, Dilkushi AU - Freise, Lisa AU - Ghafur, Saira AU - Flott, Kelsey AU - Darzi, Ara AU - Mayer, K. Erik PY - 2019/9/26 TI - Health Care Professionals' Perspectives on the Secondary Use of Health Records to Improve Quality and Safety of Care in England: Qualitative Study JO - J Med Internet Res SP - e14135 VL - 21 IS - 9 KW - electronic health records KW - information technology KW - health policy KW - safety culture N2 - Background: Health care professionals (HCPs) are often patients' first point of contact in what concerns the communication of the purposes, benefits, and risks of sharing electronic health records (EHRs) for nondirect care purposes. Their engagement is fundamental to ensure patients' buy-in and a successful implementation of health care data sharing schemes. However, their views on this subject are seldom evaluated. Objective: This study aimed to explore HCPs' perspectives on the secondary uses of health care data in England. Specifically, we aimed to assess their knowledge on its purposes and the main concerns about data sharing processes. Methods: A total of 30 interviews were conducted between March 27, 2017, and April 7, 2017, using a Web-based interview platform and following a topic guide with open-ended questions. 
The participants represented a variety of geographic locations across England (London, West Midlands, East of England, North East England, and Yorkshire and the Humber), covering both primary and secondary care services. The transcripts were compiled verbatim and systematically reviewed by 2 independent reviewers using the framework analysis method to identify emerging themes. Results: HCPs were knowledgeable about the possible secondary uses of data and highlighted its importance for patient profiling and tailored care, research, quality assurance, public health, and service delivery planning purposes. Main concerns toward data sharing included data accuracy, patients' willingness to share their records, challenges on obtaining free and informed consent, data security, lack of adequacy or understanding of current policies, and potential patient exposure and exploitation. Conclusions: These results suggest a high level of HCPs' understanding about the purposes of data sharing for secondary purposes; however, some concerns still remain. A better understanding of HCPs' knowledge and concerns could inform national communication policies and improve tailoring to maximize efficiency and improve patients' buy-in. UR - https://www.jmir.org/2019/9/e14135 UR - http://dx.doi.org/10.2196/14135 UR - http://www.ncbi.nlm.nih.gov/pubmed/31573898 ID - info:doi/10.2196/14135 ER - TY - JOUR AU - Jones, H. Kerina AU - Daniels, Helen AU - Squires, Emma AU - Ford, V. 
David PY - 2019/08/21 TI - Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study JO - J Med Internet Res SP - e14384 VL - 21 IS - 8 KW - human genome KW - genetic databases KW - public opinion KW - data storage KW - data linkage N2 - Background: The literature abounds with increasing numbers of research studies using genomic data in combination with health data (eg, health records and phenotypic and lifestyle data), with great potential for large-scale research and precision medicine. However, concerns have been raised about social acceptability and risks posed for individuals and their kin. Although there has been public engagement on various aspects of this topic, there is a lack of information about public views on data access models. Objective: This study aimed to address the lack of information on the social acceptability of access models for reusing genomic data collected for research in conjunction with health data. Models considered were open web-based access, released externally to researchers, and access within a data safe haven. Methods: Views were ascertained using a series of 8 public workshops (N=116). The workshops included an explanation of benefits and risks in using genomic data with health data, a facilitated discussion, and an exit questionnaire. The resulting quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed for emerging themes. Results: Respondents placed a high value on the reuse of genomic data but raised concerns including data misuse, information governance, and discrimination. They showed a preference for giving consent and use of data within a safe haven over external release or open access. Perceived risks with open access included data being used by unscrupulous parties, with external release included data security, and with safe havens included the need for robust safeguards. 
Conclusions: This is the first known study exploring public views of access models for reusing anonymized genomic and health data in research. It indicated that people are generally amenable but prefer data safe havens because of perceived sensitivities. We recommend that public views be incorporated into guidance on models for the reuse of genomic and health data. UR - http://www.jmir.org/2019/8/e14384/ UR - http://dx.doi.org/10.2196/14384 UR - http://www.ncbi.nlm.nih.gov/pubmed/31436163 ID - info:doi/10.2196/14384 ER - TY - JOUR AU - Kim, Heon Ho AU - Kim, Bora AU - Joo, Segyeong AU - Shin, Soo-Yong AU - Cha, Soung Hyo AU - Park, Rang Yu PY - 2019/08/06 TI - Why Do Data Users Say Health Care Data Are Difficult to Use? A Cross-Sectional Survey Study JO - J Med Internet Res SP - e14126 VL - 21 IS - 8 KW - data anonymization KW - privacy act KW - data sharing KW - data protection KW - data linking KW - health care data demand N2 - Background: There has been significant effort in attempting to use health care data. However, laws that protect patients' privacy have restricted data use because health care data contain sensitive information. Thus, discussions on privacy laws now focus on the active use of health care data beyond protection. However, current literature does not clarify the obstacles that make data usage and deidentification processes difficult or elaborate on users' needs for data linking from practical perspectives. Objective: The objective of this study is to investigate (1) the current status of data use in each medical area, (2) institutional efforts and difficulties in deidentification processes, and (3) users' data linking needs. Methods: We conducted a cross-sectional online survey. To recruit people who have used health care data, we publicized the promotion campaign and sent official documents to an academic society encouraging participation in the online survey. 
Results: In total, 128 participants responded to the online survey; 10 participants were excluded for either inconsistent responses or lack of demand for health care data. Finally, 118 participants' responses were analyzed. The majority of participants worked in general hospitals or universities (62/118, 52.5% and 51/118, 43.2%, respectively, multiple-choice answers). More than half of participants responded that they have a need for clinical data (82/118, 69.5%) and public data (76/118, 64.4%). Furthermore, 85.6% (101/118) of respondents conducted deidentification measures when using data, and they considered rigid social culture as an obstacle for deidentification (28/101, 27.7%). In addition, they required data linking (98/118, 83.1%), and they noted deregulation and data standardization to allow access to health care data linking (33/98, 33.7% and 38/98, 38.8%, respectively). There were no significant differences in the proportion of responded data needs and linking in groups that used health care data for either public purposes or commercial purposes. Conclusions: This study provides a cross-sectional view from a practical, user-oriented perspective on the kinds of data users want to utilize, efforts and difficulties in deidentification processes, and the needs for data linking. Most users want to use clinical and public data, and most participants conduct deidentification processes and express a desire to conduct data linking. Our study confirmed that they noted regulation as a primary obstacle whether their purpose is commercial or public. A legal system based on both data utilization and data protection needs is required. UR - https://www.jmir.org/2019/8/e14126/ UR - http://dx.doi.org/10.2196/14126 UR - http://www.ncbi.nlm.nih.gov/pubmed/31389335 ID - info:doi/10.2196/14126 ER - TY - JOUR AU - Bracha, Yiscah AU - Bagwell, Jacqueline AU - Furberg, Robert AU - Wald, S.
Jonathan PY - 2019/06/03 TI - Consumer-Mediated Data Exchange for Research: Current State of US Law, Technology, and Trust JO - JMIR Med Inform SP - e12348 VL - 7 IS - 2 KW - health records, personal KW - electronic health records KW - patient access to records KW - research KW - trust KW - data collection KW - consumer health information UR - http://medinform.jmir.org/2019/2/e12348/ UR - http://dx.doi.org/10.2196/12348 UR - http://www.ncbi.nlm.nih.gov/pubmed/30946692 ID - info:doi/10.2196/12348 ER - TY - JOUR AU - Chevrier, Raphaël AU - Foufi, Vasiliki AU - Gaudet-Blavignac, Christophe AU - Robert, Arnaud AU - Lovis, Christian PY - 2019/05/31 TI - Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review JO - J Med Internet Res SP - e13484 VL - 21 IS - 5 KW - anonymization KW - anonymisation KW - de-identification KW - deidentification KW - pseudonymization KW - privacy KW - confidentiality KW - secondary use KW - data protection KW - scoping review N2 - Background: The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients' privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects'
privacy on one side, and the benefit of scientific advances on the other. Objective: This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers. Methods: Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently. Results: After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. 
(5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data. Conclusions: Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several laws, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. Using the definitions they provide could help address the variable use of these two concepts in the research community. UR - http://www.jmir.org/2019/5/e13484/ UR - http://dx.doi.org/10.2196/13484 UR - http://www.ncbi.nlm.nih.gov/pubmed/31152528 ID - info:doi/10.2196/13484 ER - TY - JOUR AU - Dankar, K. Fida AU - Madathil, Nisha AU - Dankar, K. Samar AU - Boughorbel, Sabri PY - 2019/04/29 TI - Privacy-Preserving Analysis of Distributed Biomedical Data: Designing Efficient and Secure Multiparty Computations Using Distributed Statistical Learning Theory JO - JMIR Med Inform SP - e12702 VL - 7 IS - 2 KW - data analytics KW - data aggregation KW - personal genetic information KW - patient data privacy N2 - Background: Biomedical research often requires large cohorts and necessitates the sharing of biomedical data with researchers around the world, which raises many privacy, ethical, and legal concerns. In the face of these concerns, privacy experts are trying to explore approaches to analyzing the distributed data while protecting its privacy.
Many of these approaches are based on secure multiparty computations (SMCs). SMC is an attractive approach allowing multiple parties to collectively carry out calculations on their datasets without having to reveal their own raw data; however, it incurs heavy computation time and requires extensive communication between the involved parties. Objective: This study aimed to develop usable and efficient SMC applications that meet the needs of the potential end-users and to raise general awareness about SMC as a tool that supports data sharing. Methods: We have introduced distributed statistical computing (DSC) into the design of secure multiparty protocols, which allows us to conduct computations on each of the parties' sites independently and then combine these computations to form 1 estimator for the collective dataset, thus limiting communication to the final step and reducing complexity. The effectiveness of our privacy-preserving model is demonstrated through a linear regression application. Results: Our secure linear regression algorithm was tested for accuracy and performance using real and synthetic datasets. The results showed no loss of accuracy (over nonsecure regression) and very good performance (20 min for 100 million records). Conclusions: We used DSC to securely calculate a linear regression model over multiple datasets. Our experiments showed very good performance (in terms of the number of records it can handle). We plan to extend our method to other estimators such as logistic regression.
UR - http://medinform.jmir.org/2019/2/e12702/ UR - http://dx.doi.org/10.2196/12702 UR - http://www.ncbi.nlm.nih.gov/pubmed/31033449 ID - info:doi/10.2196/12702 ER - TY - JOUR AU - Devoe, Connor AU - Gabbidon, Harriett AU - Schussler, Nina AU - Cortese, Lauren AU - Caplan, Emily AU - Gorman, Colin AU - Jethwani, Kamal AU - Kvedar, Joseph AU - Agboola, Stephen PY - 2019/04/26 TI - Use of Electronic Health Records to Develop and Implement a Silent Best Practice Alert Notification System for Patient Recruitment in Clinical Research: Quality Improvement Initiative JO - JMIR Med Inform SP - e10020 VL - 7 IS - 2 KW - recruitment KW - silent BPA notifications KW - research KW - enrollment KW - innovation KW - electronic medical record KW - COPD N2 - Background: Participant recruitment, especially for frail, elderly, hospitalized patients, remains one of the greatest challenges for many research groups. Traditional recruitment methods such as chart reviews are often inefficient, low-yielding, time consuming, and expensive. Best Practice Alert (BPA) systems have previously been used to improve clinical care and inform provider decision making, but the system has not been widely used in the setting of clinical research. Objective: The primary objective of this quality-improvement initiative was to develop, implement, and refine a silent Best Practice Alert (sBPA) system that could maximize recruitment efficiency. Methods: The captured duration of the screening sessions for both methods combined with the allotted research coordinator hours in the Emerald-COPD (chronic obstructive pulmonary disease) study budget enabled research coordinators to estimate the cost-efficiency. Results: Prior to implementation, the sBPA system underwent three primary stages of development. Ultimately, the final iteration produced a system that provided similar results as the manual Epic Reporting Workbench method of screening. 
A total of 559 potential participants who met the basic prescreen criteria were identified through the two screening methods. Of those, 418 potential participants were identified by both methods simultaneously, 99 were identified only by the Epic Reporting Workbench Method, and 42 were identified only by the sBPA method. Of those identified by the Epic Reporting Workbench, only 12 (of 99, 12.12%) were considered eligible. Of those identified by the sBPA method, 30 (of 42, 71.43%) were considered eligible. Using a side-by-side comparison of the sBPA and the traditional Epic Reporting Workbench method of screening, the sBPA screening method was shown to be approximately four times faster than our previous screening method and estimated a projected 442.5 hours saved over the duration of the study. Additionally, since implementation, the sBPA system identified the equivalent of three additional potential participants per week. Conclusions: Automation of the recruitment process allowed us to identify potential participants in real time and find more potential participants who meet basic eligibility criteria. sBPA screening is a considerably faster method that allows for more efficient use of resources. This innovative and instrumental functionality can be modified to the needs of other research studies aiming to use the electronic medical records system for participant recruitment. 
UR - http://medinform.jmir.org/2019/2/e10020/ UR - http://dx.doi.org/10.2196/10020 UR - http://www.ncbi.nlm.nih.gov/pubmed/31025947 ID - info:doi/10.2196/10020 ER - TY - JOUR AU - Yang, Cheng-Yi AU - Chen, Ray-Jade AU - Chou, Wan-Lin AU - Lee, Yuarn-Jang AU - Lo, Yu-Sheng PY - 2019/02/01 TI - An Integrated Influenza Surveillance Framework Based on National Influenza-Like Illness Incidence and Multiple Hospital Electronic Medical Records for Early Prediction of Influenza Epidemics: Design and Evaluation JO - J Med Internet Res SP - e12341 VL - 21 IS - 2 KW - influenza KW - epidemics KW - influenza surveillance KW - electronic disease surveillance KW - electronic medical records KW - electronic health records KW - public health N2 - Background: Influenza is a leading cause of death worldwide and contributes to heavy economic losses to individuals and communities. Therefore, the early prediction of and interventions against influenza epidemics are crucial to reduce mortality and morbidity because of this disease. Similar to other countries, the Taiwan Centers for Disease Control and Prevention (TWCDC) has implemented influenza surveillance and reporting systems, which primarily rely on influenza-like illness (ILI) data reported by health care providers, for the early prediction of influenza epidemics. However, these surveillance and reporting systems show at least a 2-week delay in prediction, indicating the need for improvement. Objective: We aimed to integrate the TWCDC ILI data with electronic medical records (EMRs) of multiple hospitals in Taiwan. Our ultimate goal was to develop a national influenza trend prediction and reporting tool more accurate and efficient than the current influenza surveillance and reporting systems. Methods: First, the influenza expertise team at Taipei Medical University Health Care System (TMUHcS) identified surveillance variables relevant to the prediction of influenza epidemics. 
Second, we developed a framework for integrating the EMRs of multiple hospitals with the ILI data from the TWCDC website to proactively provide results of influenza epidemic monitoring to hospital infection control practitioners. Third, using the TWCDC ILI data as the gold standard for influenza reporting, we calculated Pearson correlation coefficients to measure the strength of the linear relationship between TMUHcS EMRs and regional and national TWCDC ILI data for 2 weekly time series datasets. Finally, we used the Moving Epidemic Method analyses to evaluate each surveillance variable for its predictive power for influenza epidemics. Results: Using this framework, we collected the EMRs and TWCDC ILI data of the past 3 influenza seasons (October 2014 to September 2017). On the basis of the EMRs of multiple hospitals, 3 surveillance variables, TMUHcS-ILI, TMUHcS-rapid influenza laboratory tests with positive results (RITP), and TMUHcS-influenza medication use (IMU), which reflected patients with ILI, those with positive results from rapid influenza diagnostic tests, and those treated with antiviral drugs, respectively, showed strong correlations with the TWCDC regional and national ILI data (r=.86-.98). The 2 surveillance variables, TMUHcS-RITP and TMUHcS-IMU, showed predictive power for influenza epidemics 3 to 4 weeks before the increase noted in the TWCDC ILI reports. Conclusions: Our framework periodically integrated and compared surveillance data from multiple hospitals and the TWCDC website to maintain a certain prediction quality and proactively provide monitored results. Our results can be extended to other infectious diseases, mitigating the time and effort required for data collection and analysis. Furthermore, this approach may be developed as a cost-effective electronic surveillance tool for the early and accurate prediction of epidemics of influenza and other infectious diseases in densely populated regions and nations.
UR - http://www.jmir.org/2019/2/e12341/ UR - http://dx.doi.org/10.2196/12341 UR - http://www.ncbi.nlm.nih.gov/pubmed/30707099 ID - info:doi/10.2196/12341 ER - TY - JOUR AU - Essay, Patrick AU - Shahin, B. Tala AU - Balkan, Baran AU - Mosier, Jarrod AU - Subbian, Vignesh PY - 2019/01/24 TI - The Connected Intensive Care Unit Patient: Exploratory Analyses and Cohort Discovery From a Critical Care Telemedicine Database JO - JMIR Med Inform SP - e13006 VL - 7 IS - 1 KW - telemedicine KW - critical care KW - medical informatics applications KW - intensive care units N2 - Background: Many intensive care units (ICUs) utilize telemedicine in response to an expanding critical care patient population, off-hours coverage, and intensivist shortages, particularly in rural facilities. Advances in digital health technologies, among other reasons, have led to the integration of active, well-networked critical care telemedicine (tele-ICU) systems across the United States, which in turn, provide the ability to generate large-scale remote monitoring data from critically ill patients. Objective: The objective of this study was to explore opportunities and challenges of utilizing multisite, multimodal data acquired through critical care telemedicine. Using a publicly available tele-ICU, or electronic ICU (eICU), database, we illustrated the quality and potential uses of remote monitoring data, including cohort discovery for secondary research. Methods: Exploratory analyses were performed on the eICU Collaborative Research Database that includes deidentified clinical data collected from adult patients admitted to ICUs between 2014 and 2015. Patient and ICU characteristics, top admission diagnoses, and predictions from clinical scoring systems were extracted and analyzed. Additionally, a case study on respiratory failure patients was conducted to demonstrate research prospects using tele-ICU data. 
Results: The eICU database spans more than 200 hospitals and over 139,000 ICU patients across the United States with wide-ranging clinical data and diagnoses. Although mixed medical-surgical ICU was the most common critical care setting, patients with cardiovascular conditions accounted for more than 20% of ICU stays, and those with neurological or respiratory illness accounted for nearly 15% of ICU unit stays. The case study on respiratory failure patients showed that cohort discovery using the eICU database can be highly specific, albeit potentially limiting in terms of data provenance and sparsity for certain types of clinical questions. Conclusions: Large-scale remote monitoring data sources, such as the eICU database, have a strong potential to advance the role of critical care telemedicine by serving as a testbed for secondary research as well as for developing and testing tools, including predictive and prescriptive analytical solutions and decision support systems. The resulting tools will also inform coordination of care for critically ill patients, intensivist coverage, and the overall process of critical care telemedicine. UR - http://medinform.jmir.org/2019/1/e13006/ UR - http://dx.doi.org/10.2196/13006 UR - http://www.ncbi.nlm.nih.gov/pubmed/30679148 ID - info:doi/10.2196/13006 ER - TY - JOUR AU - Sharafoddini, Anis AU - Dubin, A. Joel AU - Maslove, M. David AU - Lee, Joon PY - 2019/01/08 TI - A New Insight Into Missing Data in Intensive Care Unit Patient Profiles: Observational Study JO - JMIR Med Inform SP - e11605 VL - 7 IS - 1 KW - electronic health records KW - clinical laboratory tests KW - machine learning KW - hospital mortality N2 - Background: The data missing from patient profiles in intensive care units (ICUs) are substantial and unavoidable. However, this incompleteness is not always random or because of imperfections in the data collection process. 
Objective: This study aimed to investigate the potential hidden information in data missing from electronic health records (EHRs) in an ICU and examine whether the presence or missingness of a variable itself can convey information about the patient health status. Methods: Daily retrieval of laboratory test (LT) measurements from the Medical Information Mart for Intensive Care III database was set as our reference for defining complete patient profiles. Missingness indicators were introduced as a way of representing presence or absence of the LTs in a patient profile. Thereafter, various feature selection methods (filter and embedded feature selection methods) were used to examine the predictive power of missingness indicators. Finally, a set of well-known prediction models (logistic regression [LR], decision tree, and random forest) were used to evaluate whether the absence status itself of a variable recording can provide predictive power. We also examined the utility of missingness indicators in improving predictive performance when used with observed laboratory measurements as model input. The outcome of interest was in-hospital mortality and mortality at 30 days after ICU discharge. Results: Regardless of mortality type or ICU day, more than 40% of the predictors selected by feature selection methods were missingness indicators. Notably, employing missingness indicators as the only predictors achieved reasonable mortality prediction on all days and for all mortality types (for instance, in 30-day mortality prediction with LR, we achieved area under the curve of the receiver operating characteristic [AUROC] of 0.6836±0.012). Including indicators with observed measurements in the prediction models also improved the AUROC; the maximum improvement was 0.0426. 
Indicators also improved the AUROC for the Simplified Acute Physiology Score II model (a well-known ICU severity of illness score), confirming the additive information of the indicators (AUROC of 0.8045±0.0109 for 30-day mortality prediction for LR). Conclusions: Our study demonstrated that the presence or absence of LT measurements is informative and can be considered a potential predictor of in-hospital and 30-day mortality. The comparative analysis of prediction models also showed statistically significant prediction improvement when indicators were included. Moreover, missing data might reflect the opinions of examining clinicians. Therefore, the absence of measurements can be informative in ICUs and has predictive power beyond the measured data themselves. This initial case study shows promise for more in-depth analysis of missing data and its informativeness in ICUs. Future studies are needed to generalize these results. UR - http://medinform.jmir.org/2019/1/e11605/ UR - http://dx.doi.org/10.2196/11605 UR - http://www.ncbi.nlm.nih.gov/pubmed/30622091 ID - info:doi/10.2196/11605 ER - TY - JOUR AU - Rahman, Nabilah AU - Wang, D. Debby AU - Ng, Hui-Xian Sheryl AU - Ramachandran, Sravan AU - Sridharan, Srinath AU - Khoo, Astrid AU - Tan, Seng Chuen AU - Goh, Wei-Ping AU - Tan, Quan Xin PY - 2018/12/21 TI - Processing of Electronic Medical Records for Health Services Research in an Academic Medical Center: Methods and Validation JO - JMIR Med Inform SP - e10933 VL - 6 IS - 4 KW - health services KW - electronic medical records KW - data curation KW - validation studies N2 - Background: Electronic medical records (EMRs) contain a wealth of information that can support data-driven decision making in health care policy design and service planning. Although research using EMRs has become increasingly prevalent, challenges such as coding inconsistency, data validity, and lack of suitable measures in important domains still hinder the progress.
Objective: The objective of this study was to design a structured way to process records in administrative EMR systems for health services research and assess validity in selected areas. Methods: On the basis of a local hospital EMR system in Singapore, we developed a structured framework for EMR data processing, including standardization and phenotyping of diagnosis codes, construction of cohort with multilevel views, and generation of variables and proxy measures to supplement primary data. Disease complexity was estimated by Charlson Comorbidity Index (CCI) and Polypharmacy Score (PPS), whereas socioeconomic status (SES) was estimated by housing type. Validity of modified diagnosis codes and derived measures was investigated. Results: Visit-level (N=7,778,761) and patient-level records (n=549,109) were generated. The International Classification of Diseases, Tenth Revision, Australian Modification (ICD-10-AM) codes were standardized to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) with a mapping rate of 87.1%. In all, 97.4% of the ICD-9-CM codes were phenotyped successfully using Clinical Classification Software by Agency for Healthcare Research and Quality. Diagnosis codes that underwent modification (truncation or zero addition) in standardization and phenotyping procedures had the modification validated by physicians, with validity rates of more than 90%. Disease complexity measures (CCI and PPS) and SES were found to be valid and robust after a correlation analysis and a multivariate regression analysis. CCI and PPS were correlated with each other and positively correlated with health care utilization measures. Larger housing type was associated with lower government subsidies received, suggesting association with higher SES. Profile of constructed cohorts showed differences in disease prevalence, disease complexity, and health care utilization in those aged above 65 years and those aged 65 years or younger.
Conclusions: The framework proposed in this study would be useful for other researchers working with EMR data for health services research. Further analyses would be needed to better understand differences observed in the cohorts. UR - http://medinform.jmir.org/2018/4/e10933/ UR - http://dx.doi.org/10.2196/10933 UR - http://www.ncbi.nlm.nih.gov/pubmed/30578188 ID - info:doi/10.2196/10933 ER - TY - JOUR AU - Tang, Chunlei AU - Plasek, M. Joseph AU - Bates, W. David PY - 2018/11/22 TI - Rethinking Data Sharing at the Dawn of a Health Data Economy: A Viewpoint JO - J Med Internet Res SP - e11519 VL - 20 IS - 11 KW - economics, hospital KW - machine learning KW - models, economic KW - precision medicine UR - http://www.jmir.org/2018/11/e11519/ UR - http://dx.doi.org/10.2196/11519 UR - http://www.ncbi.nlm.nih.gov/pubmed/30467103 ID - info:doi/10.2196/11519 ER - TY - JOUR AU - Rumbold, John AU - Pierscionek, Barbara PY - 2018/11/22 TI - Contextual Anonymization for Secondary Use of Big Data in Biomedical Research: Proposal for an Anonymization Matrix JO - JMIR Med Inform SP - e47 VL - 6 IS - 4 KW - anonymization matrix KW - big data KW - data protection KW - privacy KW - research ethics N2 - Background: The current law on anonymization sets the same standard across all situations, which poses a problem for biomedical research. Objective: We propose a matrix for setting different standards, which is responsive to context and public expectations. Methods: The law and ethics applicable to anonymization were reviewed in a scoping study. Social science on public attitudes and research on technical methods of anonymization were applied to formulate a matrix. Results: The matrix adjusts anonymization standards according to the sensitivity of the data and the safety of the place, people, and projects involved. Conclusions: The matrix offers a tool with context-specific standards for anonymization in data research. 
UR - http://medinform.jmir.org/2018/4/e47/ UR - http://dx.doi.org/10.2196/medinform.7096 UR - http://www.ncbi.nlm.nih.gov/pubmed/30467101 ID - info:doi/10.2196/medinform.7096 ER - TY - JOUR AU - Delvaux, Nicolas AU - Aertgeerts, Bert AU - van Bussel, CH Johan AU - Goderis, Geert AU - Vaes, Bert AU - Vermandere, Mieke PY - 2018/11/19 TI - Health Data for Research Through a Nationwide Privacy-Proof System in Belgium: Design and Implementation JO - JMIR Med Inform SP - e11428 VL - 6 IS - 4 KW - electronic health records KW - health information exchange KW - health information interoperability KW - learning health systems KW - medical record linkage N2 - Background: Health data collected during routine care have important potential for reuse for other purposes, especially as part of a learning health system to advance the quality of care. Many sources of bias have been identified through the lifecycle of health data that could compromise the scientific integrity of these data. New data protection legislation requires research facilities to improve safety measures and, thus, ensure privacy. Objective: This study aims to address the question on how health data can be transferred from various sources and using multiple systems to a centralized platform, called Healthdata.be, while ensuring the accuracy, validity, safety, and privacy. In addition, the study demonstrates how these processes can be used in various research designs relevant for learning health systems. Methods: The Healthdata.be platform urges uniformity of the data registration at the primary source through the use of detailed clinical models. Data retrieval and transfer are organized through end-to-end encrypted electronic health channels, and data are encoded using token keys. In addition, patient identifiers are pseudonymized so that health data from the same patient collected across various sources can still be linked without compromising the deidentification. 
Results: The Healthdata.be platform currently collects data for >150 clinical registries in Belgium. We demonstrated how the data collection for the Belgian primary care morbidity register INTEGO is organized and how the Healthdata.be platform can be used for a cluster randomized trial. Conclusions: Collecting health data in various sources and linking these data to a single patient is a promising feature that can potentially address important concerns on the validity and quality of health data. Safe methods of data transfer without compromising privacy are capable of transporting these data from the primary data provider or clinician to a research facility. More research is required to demonstrate that these methods improve the quality of data collection, allowing researchers to rely on electronic health records as a valid source for scientific data. UR - http://medinform.jmir.org/2018/4/e11428/ UR - http://dx.doi.org/10.2196/11428 UR - http://www.ncbi.nlm.nih.gov/pubmed/30455164 ID - info:doi/10.2196/11428 ER - TY - JOUR AU - Leroy, Gondy AU - Gu, Yang AU - Pettygrove, Sydney AU - Galindo, K. Maureen AU - Arora, Ananyaa AU - Kurzius-Spencer, Margaret PY - 2018/11/07 TI - Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application JO - J Med Internet Res SP - e10497 VL - 20 IS - 11 KW - parser KW - natural language processing KW - complex entity extraction KW - Autism Spectrum Disorder KW - DSM KW - electronic health records KW - decision tree KW - machine learning N2 - Background: Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year old children in 11 US states for the presence of ASD criteria. 
The work is time-consuming and expensive. Objective: Our objective was to automatically extract from EHRs the description of behaviors noted by the clinicians in evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data. Methods: We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves are encompassed in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms. Results: We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. 
We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs. Conclusions: Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets. UR - https://www.jmir.org/2018/11/e10497/ UR - http://dx.doi.org/10.2196/10497 UR - http://www.ncbi.nlm.nih.gov/pubmed/30404767 ID - info:doi/10.2196/10497 ER - TY - JOUR AU - Luo, Gang AU - Johnson, D. Michael AU - Nkoy, L. Flory AU - He, Shan AU - Stone, L. Bryan PY - 2018/11/05 TI - Appropriateness of Hospital Admission for Emergency Department Patients with Bronchiolitis: Secondary Analysis JO - JMIR Med Inform SP - e10498 VL - 6 IS - 4 KW - appropriate hospital admission KW - bronchiolitis KW - emergency department KW - operational definition N2 - Background: Bronchiolitis is the leading cause of hospitalization in children under 2 years of age. Each year in the United States, bronchiolitis results in 287,000 emergency department visits, 32%-40% of which end in hospitalization. Frequently, emergency department disposition decisions (to discharge or hospitalize) are made subjectively because of the lack of evidence and objective criteria for bronchiolitis management, leading to significant practice variation, wasted health care use, and suboptimal outcomes. At present, no operational definition of appropriate hospital admission for emergency department patients with bronchiolitis exists. 
Yet, such a definition is essential for assessing care quality and building a predictive model to guide and standardize disposition decisions. Our prior work provided a framework of such a definition using 2 concepts, one on safe versus unsafe discharge and another on necessary versus unnecessary hospitalization. Objective: The goal of this study was to determine the 2 threshold values used in the 2 concepts, with 1 value per concept. Methods: Using Intermountain Healthcare data from 2005-2014, we examined distributions of several relevant attributes of emergency department visits by children under 2 years of age for bronchiolitis. Via a data-driven approach, we determined the 2 threshold values. Results: We completed the first operational definition of appropriate hospital admission for emergency department patients with bronchiolitis. Appropriate hospital admissions include actual admissions with exposure to major medical interventions for more than 6 hours, as well as actual emergency department discharges, followed by an emergency department return within 12 hours ending in admission for bronchiolitis. Based on the definition, 0.96% (221/23,125) of the emergency department discharges were deemed unsafe. Moreover, 14.36% (432/3008) of the hospital admissions from the emergency department were deemed unnecessary. Conclusions: Our operational definition can define the prediction target for building a predictive model to guide and improve emergency department disposition decisions for bronchiolitis in the future. UR - http://medinform.jmir.org/2018/4/e10498/ UR - http://dx.doi.org/10.2196/10498 UR - http://www.ncbi.nlm.nih.gov/pubmed/30401659 ID - info:doi/10.2196/10498 ER - TY - JOUR AU - Lofters, K. 
Aisha AU - Telner, Deanna AU - Kalia, Sumeet AU - Slater, Morgan PY - 2018/11/01 TI - Association Between Adherence to Cancer Screening and Knowledge of Screening Guidelines: Feasibility Study Linking Self-Reported Survey Data With Medical Records JO - JMIR Cancer SP - e10529 VL - 4 IS - 2 KW - cancer screening KW - electronic medical records KW - electronic survey KW - health literacy KW - self-reported data N2 - Background: It is possible that patients who are more aware of cancer screening guidelines may be more likely to adhere to them. Objective: The aim of this study was to determine whether screening knowledge was associated with documented screening participation. We also assessed the feasibility and acceptability of linking electronic survey data with clinical data in the primary care setting. Methods: We conducted an electronic survey at 2 sites in Toronto, Canada. At one site, eligible patients were approached in the waiting room to complete the survey; at the second site, eligible patients were sent an email inviting them to participate. All participants were asked to consent to the linkage of their survey results with their electronic medical record. Results: Overall, 1683 participants responded to the survey: 247 responded in the waiting room (response rate, 247/366, 67.5%), whereas 1436 responded through email (response rate, 1436/5779, 24.8%). More than 80% (199/247 and 1245/1436) of participants consented to linking their survey data to their medical record. Knowledge of cancer screening guidelines was generally low. Although the majority of participants were able to identify the recommended tests for breast and cervical screening, very few participants correctly identified the recommended age and frequency of screening, with a maximum of 22% (21/95) of screen-eligible women correctly answering all 3 questions for breast cancer screening.
However, this low level of knowledge among patients was not significantly associated with screening uptake, particularly after adjustment for sociodemographic characteristics. Conclusions: Although knowledge of screening guidelines was low among patients in our study, this was not associated with screening participation. Participants were willing to link self-reported data with their medical record data, which has substantial implications for future research. UR - http://cancer.jmir.org/2018/2/e10529/ UR - http://dx.doi.org/10.2196/10529 UR - http://www.ncbi.nlm.nih.gov/pubmed/30389655 ID - info:doi/10.2196/10529 ER - TY - JOUR AU - Hardjojo, Antony AU - Gunachandran, Arunan AU - Pang, Long AU - Abdullah, Bin Mohammed Ridzwan AU - Wah, Win AU - Chong, Chen Joash Wen AU - Goh, Hui Ee AU - Teo, Huang Sok AU - Lim, Gilbert AU - Lee, Li Mong AU - Hsu, Wynne AU - Lee, Vernon AU - Chen, I-Cheng Mark AU - Wong, Franco AU - Phang, King Jonathan Siung PY - 2018/6/11 TI - Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore JO - JMIR Med Inform SP - e36 VL - 6 IS - 2 KW - natural language processing KW - communicable diseases KW - epidemiology KW - surveillance KW - syndromic surveillance KW - electronic health records N2 - Background: Free-text clinical records provide a source of information that complements traditional disease surveillance. To electronically harness these records, they need to be transformed into codified fields by natural language processing algorithms. Objective: The aim of this study was to develop, train, and validate Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm to extract clinical information from free-text primary care records.
Methods: CHESS is a keyword-based natural language processing algorithm to extract 48 signs and symptoms suggesting respiratory infections, gastrointestinal infections, constitutional symptoms, and other signs and symptoms potentially associated with infectious diseases. The algorithm also captured the assertion status (affirmed, negated, or suspected) and symptom duration. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by 2 human reviewers, with a third reviewer as the adjudicator. The algorithm was evaluated based on 1680 notes against the human-coded result as the reference standard, with half of the data used for training and the other half for validation. Results: The symptoms most commonly present within the 1680 clinical records at the episode level were those typically present in respiratory infections such as cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), rhinorrhea (552/7703, 7.17%), and fever (928/7703, 12.04%). At the episode level, CHESS had an overall performance of 96.7% precision and 97.6% recall on the training dataset and 96.0% precision and 93.1% recall on the validation dataset. Symptoms suggesting respiratory and gastrointestinal infections were all detected with more than 90% precision and recall. CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status. Conclusions: We have developed a natural language processing algorithm dubbed CHESS that achieves good performance in extracting signs and symptoms from primary care free-text clinical records.
In addition to the presence of symptoms, our algorithm can also accurately distinguish affirmed, negated, and suspected assertion statuses and extract symptom durations. UR - http://medinform.jmir.org/2018/2/e36/ UR - http://dx.doi.org/10.2196/medinform.8204 UR - http://www.ncbi.nlm.nih.gov/pubmed/29907560 ID - info:doi/10.2196/medinform.8204 ER - TY - JOUR AU - Musy, N. Sarah AU - Ausserhofer, Dietmar AU - Schwendimann, René AU - Rothen, Ulrich Hans AU - Jeitziner, Marie-Madlen AU - Rutjes, WS Anne AU - Simon, Michael PY - 2018/05/30 TI - Trigger Tool-Based Automated Adverse Event Detection in Electronic Health Records: Systematic Review JO - J Med Internet Res SP - e198 VL - 20 IS - 5 KW - patient safety KW - electronic health records KW - patient harm KW - review, systematic N2 - Background: Adverse events (AEs) in health care entail substantial burdens to health care systems, institutions, and patients. Retrospective trigger tools are often manually applied to detect AEs, although automated approaches using electronic health records may offer real-time adverse event detection, allowing timely corrective interventions. Objective: The aim of this systematic review was to describe current study methods and challenges regarding the use of automatic trigger tool-based adverse event detection methods in electronic health records. In addition, we aimed to appraise the applied studies' designs and to synthesize estimates of adverse event prevalence and diagnostic test accuracy of automatic detection methods using manual trigger tool as a reference standard. Methods: PubMed, EMBASE, CINAHL, and the Cochrane Library were queried. We included observational studies, applying trigger tools in acute care settings, and excluded studies using nonhospital and outpatient settings. Eligible articles were divided into diagnostic test accuracy studies and prevalence studies. We derived the study prevalence and estimates for the positive predictive value.
We assessed bias risks and applicability concerns using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool for diagnostic test accuracy studies and an in-house developed tool for prevalence studies. Results: A total of 11 studies met all criteria: 2 concerned diagnostic test accuracy and 9 prevalence. We judged several studies to be at high bias risks for their automated detection method, definition of outcomes, and type of statistical analyses. Across all the 11 studies, adverse event prevalence ranged from 0% to 17.9%, with a median of 0.8%. The positive predictive value of all triggers to detect adverse events ranged from 0% to 100% across studies, with a median of 40%. Some triggers had wide-ranging positive predictive values: (1) in 6 studies, hypoglycemia had a positive predictive value ranging from 15.8% to 60%; (2) in 5 studies, naloxone had a positive predictive value ranging from 20% to 91%; (3) in 4 studies, flumazenil had a positive predictive value ranging from 38.9% to 83.3%; and (4) in 4 studies, protamine had a positive predictive value ranging from 0% to 60%. We were unable to determine the adverse event prevalence, positive predictive value, preventability, and severity in 40.4%, 10.5%, 71.1%, and 68.4% of the studies, respectively. These studies did not report the overall number of records analyzed, triggers, or adverse events; or the studies did not conduct the analysis. Conclusions: We observed broad interstudy variation in reported adverse event prevalence and positive predictive value. The lack of sufficiently described methods led to difficulties regarding interpretation. To improve quality, we see the need for a set of recommendations to endorse optimal use of research designs and adequate reporting of future adverse event detection studies.
UR - http://www.jmir.org/2018/5/e198/ UR - http://dx.doi.org/10.2196/jmir.9901 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/jmir.9901 ER - TY - JOUR AU - Verheij, A. Robert AU - Curcin, Vasa AU - Delaney, C. Brendan AU - McGilchrist, M. Mark PY - 2018/05/29 TI - Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse JO - J Med Internet Res SP - e185 VL - 20 IS - 5 KW - electronic health record KW - data accuracy KW - data sharing KW - health information interoperability KW - health care systems KW - health information systems KW - medical informatics N2 - Background: Enormous amounts of data are recorded routinely in health care as part of the care process, primarily for managing individual patient care. There are significant opportunities to use these data for other purposes, many of which would contribute to establishing a learning health system. This is particularly true for data recorded in primary care settings, as in many countries, these are the first place patients turn to for most health problems. Objective: In this paper, we discuss whether data that are recorded routinely as part of the health care process in primary care are actually fit to use for other purposes such as research and quality of health care indicators, how the original purpose may affect the extent to which the data are fit for another purpose, and the mechanisms behind these effects. In doing so, we want to identify possible sources of bias that are relevant for the use and reuse of these types of data. Methods: This paper is based on the authors' experience as users of electronic health records data, as general practitioners, health informatics experts, and health services researchers.
It is a product of the discussions they had during the Translational Research and Patient Safety in Europe (TRANSFoRm) project, which was funded by the European Commission and sought to develop, pilot, and evaluate a core information architecture for the learning health system in Europe, based on primary care electronic health records. Results: We first describe the different stages in the processing of electronic health record data, as well as the different purposes for which these data are used. Given the different data processing steps and purposes, we then discuss the possible mechanisms for each individual data processing step that can generate biased outcomes. We identified 13 possible sources of bias. Four of them are related to the organization of a health care system, whereas some are of a more technical nature. Conclusions: There are a substantial number of possible sources of bias; very little is known about the size and direction of their impact. However, anyone that uses or reuses data that were recorded as part of the health care process (such as researchers and clinicians) should be aware of the associated data collection process and environmental influences that can affect the quality of the data. Our stepwise, actor- and purpose-oriented approach may help to identify these possible sources of bias. Unless data quality issues are better understood and unless adequate controls are embedded throughout the data lifecycle, data-driven health care will not live up to its expectations. We need a data quality research agenda to devise the appropriate instruments needed to assess the magnitude of each of the possible sources of bias, and then start measuring their impact. The possible sources of bias described in this paper serve as a starting point for this research agenda. 
UR - http://www.jmir.org/2018/5/e185/ UR - http://dx.doi.org/10.2196/jmir.9134 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/jmir.9134 ER - TY - JOUR AU - Huang, Yingxiang AU - Lee, Junghye AU - Wang, Shuang AU - Sun, Jimeng AU - Liu, Hongfang AU - Jiang, Xiaoqian PY - 2018/05/16 TI - Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources JO - JMIR Med Inform SP - e33 VL - 6 IS - 2 KW - interoperability KW - contextual embedding KW - predictive models KW - patient data privacy N2 - Background: Data sharing has been a big challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a very strong representative capability to describe medical concepts (and their context), and they have shown promise as an alternative way to support deep-learning applications without the need to disclose original data. However, contextual embedding models acquired from individual hospitals cannot be directly combined because their embedding spaces are different, and naive pooling renders combined embeddings useless. Objective: The aim of this study was to present a novel approach to address these issues and to promote sharing representation without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned from local private data and synchronize information from multiple sources. Methods: We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes to fuse different vector models into one common space by using a list of corresponding pairs as anchor points. We performed prediction analysis with harmonized embeddings. 
Results: We used sequential medical events extracted from the Medical Information Mart for Intensive Care III database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured data or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves a more accurate prediction than local models and global models built from naive pooling. Conclusions: Such aggregation of local models using our unique harmonization can serve as the proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care. UR - http://medinform.jmir.org/2018/2/e33/ UR - http://dx.doi.org/10.2196/medinform.9455 UR - http://www.ncbi.nlm.nih.gov/pubmed/29769172 ID - info:doi/10.2196/medinform.9455 ER - TY - JOUR AU - Ho, Kiki Hoi Ki AU - Görges, Matthias AU - Portales-Casamar, Elodie PY - 2018/05/14 TI - Data Access and Usage Practices Across a Cohort of Researchers at a Large Tertiary Pediatric Hospital: Qualitative Survey Study JO - JMIR Med Inform SP - e32 VL - 6 IS - 2 KW - clinical data sharing, research barriers, data linkage, data sources, data management, environmental scan, research facilitation N2 - Background: Health and health-related data collected as part of clinical care is a foundational component of quality improvement and research. While the importance of these data is widely recognized, there are many challenges faced by researchers attempting to use such data. It is crucial to acknowledge and identify barriers to improve data sharing and access practices and ultimately optimize research capacity. 
Objective: To better understand the current state, explore opportunities, and identify barriers, an environmental scan of investigators at BC Children's Hospital Research Institute (BCCHR) was conducted to elucidate current local practices around data access and usage. Methods: The Clinical and Community Data, Analytics and Informatics group at BCCHR comprises over 40 investigators with diverse expertise and interest in data who share a common goal of facilitating data collection, usage, and access across the community. Semistructured interviews with 35 of these researchers were conducted, and data were summarized qualitatively. A total impact score, considering both the frequency with which a problem occurs and the impact of the problem, was calculated for each item to prioritize and rank barriers. Results: Three main themes for barriers emerged: the lengthy turnaround time before data access (18/35, 51%), inconsistent and opaque data access processes (16/35, 46%), and the inability to link data effectively (15/35, 43%). Less frequent themes included quality and usability of data, ethics and privacy review barriers, lack of awareness of data sources, and duplicated efforts in data extraction and linkage. The two main opportunities for improvement were data access facilitation (14/32, 44%) and migration toward a single data platform (10/32, 31%). Conclusions: By identifying the current state and needs of the data community onsite, this study enables us to focus our resources on combating the challenges having the greatest impact on researchers. The current state parallels that of the national landscape. By ensuring protection of privacy while achieving efficient data access, research institutions will be able to maximize their research capacity, a crucial step toward achieving the goal shared by all stakeholders: better health outcomes.
UR - http://medinform.jmir.org/2018/2/e32/ UR - http://dx.doi.org/10.2196/medinform.8724 UR - http://www.ncbi.nlm.nih.gov/pubmed/29759958 ID - info:doi/10.2196/medinform.8724 ER - TY - JOUR AU - Munkhdalai, Tsendsuren AU - Liu, Feifan AU - Yu, Hong PY - 2018/04/25 TI - Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning JO - JMIR Public Health Surveill SP - e29 VL - 4 IS - 2 KW - medical informatics applications KW - drug-related side effects and adverse reactions KW - neural networks KW - natural language processing KW - electronic health records N2 - Background: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. Objective: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. Methods: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. 
For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. Results: Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. Conclusions: This shows that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate great potential for significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community. UR - http://publichealth.jmir.org/2018/2/e29/ UR - http://dx.doi.org/10.2196/publichealth.9361 UR - http://www.ncbi.nlm.nih.gov/pubmed/29695376 ID - info:doi/10.2196/publichealth.9361 ER - TY - JOUR AU - Bidargaddi, Niranjan AU - van Kasteren, Yasmin AU - Musiat, Peter AU - Kidd, Michael PY - 2018/04/24 TI - Developing a Third-Party Analytics Application Using Australia's National Personal Health Records System: Case Study JO - JMIR Med Inform SP - e28 VL - 6 IS - 2 KW - computer software applications KW - electronic health record KW - software design KW - medication compliance N2 - Background: My Health Record (MyHR) is Australia's national electronic health record (EHR) system.
Poor usability and functionality have resulted in low utility, affecting enrollment and participation rates by both patients and clinicians alike. Similar to apps on mobile phone app stores, innovative third-party applications of MyHR platform data can enhance the usefulness of the platform, but there is a paucity of research into the processes involved in developing third-party applications that integrate and use data from EHR systems. Objective: The research describes the challenges involved in pioneering the development of a patient and clinician Web-based software application for MyHR and insights resulting from this experience. Methods: This research uses a case study approach, investigating the development and implementation of Actionable Intime Insights (AI2), a third-party application for MyHR, which translates Medicare claims records stored in MyHR into a clinically meaningful timeline visualization of health data for both patients and clinicians. This case study identifies the challenges encountered by the Personal Health Informatics team from Flinders University in the MyHR third-party application development environment. Results: The study presents a nuanced understanding of different data types and quality of data in MyHR and the complexities associated with developing secondary-use applications. Regulatory requirements associated with utilization of MyHR data, restrictions on visualizations of data, and processes of testing third-party applications were encountered during the development of the application. Conclusions: This study identified several processes, technical and regulatory barriers which, if addressed, can make MyHR a thriving ecosystem of health applications. It clearly identifies opportunities and considerations for the Australian Digital Health Agency and other national bodies wishing to encourage the development of new and innovative use cases for national EHRs. 
UR - http://medinform.jmir.org/2018/2/e28/ UR - http://dx.doi.org/10.2196/medinform.7710 UR - http://www.ncbi.nlm.nih.gov/pubmed/29691211 ID - info:doi/10.2196/medinform.7710 ER - TY - JOUR AU - Lin, Fong-Ci AU - Wang, Chen-Yu AU - Shang, Ji Rung AU - Hsiao, Fei-Yuan AU - Lin, Mei-Shu AU - Hung, Kuan-Yu AU - Wang, Jui AU - Lin, Zhen-Fang AU - Lai, Feipei AU - Shen, Li-Jiuan AU - Huang, Chih-Fen PY - 2018/04/24 TI - Identifying Unmet Treatment Needs for Patients With Osteoporotic Fracture: Feasibility Study for an Electronic Clinical Surveillance System JO - J Med Internet Res SP - e142 VL - 20 IS - 4 KW - information systems KW - public health surveillance KW - osteoporotic fractures KW - pharmacovigilance KW - guideline adherence N2 - Background: Traditional clinical surveillance relied on the results from clinical trials and observational studies of administrative databases. However, these studies not only required many valuable resources but also faced a very long time lag. Objective: This study aimed to illustrate a practical application of the National Taiwan University Hospital Clinical Surveillance System (NCSS) in the identification of patients with an osteoporotic fracture and to provide a high reusability infrastructure for longitudinal clinical data. Methods: The NCSS integrates electronic medical records in the National Taiwan University Hospital (NTUH) with a data warehouse and is equipped with a user-friendly interface. The NCSS was developed using professional insight from multidisciplinary experts, including clinical practitioners, epidemiologists, and biomedical engineers. The practical example identifying the unmet treatment needs for patients encountering major osteoporotic fractures described herein was mainly achieved by adopting the computerized workflow in the NCSS. Results: We developed the infrastructure of the NCSS, including an integrated data warehouse and an automatic surveillance workflow. 
By applying the NCSS, we efficiently identified 2193 patients who were newly diagnosed with a hip or vertebral fracture between 2010 and 2014 at NTUH. By adopting the filter function, we identified 1808 (1808/2193, 82.44%) patients who continued their follow-up at NTUH, and 464 (464/2193, 21.16%) patients who were prescribed anti-osteoporosis medications, within 3 and 12 months after the index date of their fracture, respectively. Conclusions: The NCSS can integrate the workflow of cohort identification to accelerate the survey process of clinically relevant problems and provide decision support in the daily practice of clinical physicians, thereby making the benefit of evidence-based medicine a reality. UR - http://www.jmir.org/2018/4/e142/ UR - http://dx.doi.org/10.2196/jmir.9477 UR - http://www.ncbi.nlm.nih.gov/pubmed/29691201 ID - info:doi/10.2196/jmir.9477 ER - TY - JOUR AU - Kim, Miran AU - Song, Yongsoo AU - Wang, Shuang AU - Xia, Yuhou AU - Jiang, Xiaoqian PY - 2018/04/17 TI - Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation JO - JMIR Med Inform SP - e19 VL - 6 IS - 2 KW - homomorphic encryption KW - machine learning KW - logistic regression KW - gradient descent N2 - Background: Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective: The goal of this study is to provide practical support for mainstream learning models (eg, logistic regression).
Methods: We adapted a novel homomorphic encryption scheme optimized for real-number computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, to reduce computation cost) and (2) new packing and parallelization techniques. Results: Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions: We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plane stays still. UR - http://medinform.jmir.org/2018/2/e19/ UR - http://dx.doi.org/10.2196/medinform.8805 UR - http://www.ncbi.nlm.nih.gov/pubmed/29666041 ID - info:doi/10.2196/medinform.8805 ER - TY - JOUR AU - Tully, P. Mary AU - Bozentko, Kyle AU - Clement, Sarah AU - Hunn, Amanda AU - Hassan, Lamiece AU - Norris, Ruth AU - Oswald, Malcolm AU - Peek, Niels PY - 2018/03/28 TI - Investigating the Extent to Which Patients Should Control Access to Patient Records for Research: A Deliberative Process Using Citizens' Juries JO - J Med Internet Res SP - e112 VL - 20 IS - 3 KW - public participation KW - patient engagement KW - public opinion KW - medical research KW - confidentiality KW - privacy KW - national health services KW - data linkage KW - public policy, decision making, organizational N2 - Background: The secondary use of health data for research raises complex questions of privacy and governance. Such questions are ill-suited to opinion polling, where citizens must choose quickly between multiple-choice answers based on little information.
Objective: The aim of this project was to extend knowledge about what control informed citizens would seek over the use of health records for research after participating in a deliberative process using citizens' juries. Methods: Two 3-day citizens' juries, of 17 citizens each, were convened to reflect UK national demographics from 355 eligible applicants. Each jury addressed the mission "To what extent should patients control access to patient records for secondary use?" Jurors heard from and questioned 5 expert witnesses (chosen either to inform the jury or to argue for and against the secondary use of data), interspersed with structured opportunities to deliberate among themselves, including discussion and role-play. Jurors voted on a series of questions associated with the jury mission, giving their rationale. Individual views were polled using questionnaires at the beginning and end of the process. Results: At the end of the process, 33 of 34 jurors voted in support of the secondary use of data for research, with 24 wanting individuals to be able to opt out, 6 favoring opt in, and 3 voting that all records should be available without any consent process. When considering who should get access to data, both juries had very similar rationales. Both thought that public benefit was a key justification for access. Jury 1 was more strongly supportive of sharing patient records for public benefit, whereas jury 2 was more cautious and sought to give patients more control. Many jurors changed their opinion about who should get access to health records: 17 people became more willing to support wider information sharing of health data for public benefit, whereas 2 moved toward more patient control over patient records. Conclusions: The findings highlight that, when informed of both the risks and opportunities associated with data sharing, citizens believe an individual's right to privacy should not prevent research that can benefit the general public.
The juries also concluded that patients should be notified of any such scheme and have the right to opt out if they so choose. Many jurors changed their minds about this complex policy question when they became more informed. Many, but not all, jurors became less skeptical about health data sharing, as they became better informed of its benefits and risks. UR - http://www.jmir.org/2018/3/e112/ UR - http://dx.doi.org/10.2196/jmir.7763 UR - http://www.ncbi.nlm.nih.gov/pubmed/29592847 ID - info:doi/10.2196/jmir.7763 ER - TY - JOUR AU - Sadat, Nazmus Md AU - Jiang, Xiaoqian AU - Aziz, Al Md Momin AU - Wang, Shuang AU - Mohammed, Noman PY - 2018/03/05 TI - Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation JO - JMIR Med Inform SP - e14 VL - 6 IS - 1 KW - privacy-preserving regression analysis KW - Intel SGX KW - somewhat homomorphic encryption N2 - Background: Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Objective: Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Methods: Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. 
We designed, developed, and evaluated a hybrid cryptographic framework that can securely perform regression analysis, a fundamental machine learning algorithm, using somewhat homomorphic encryption and a newly introduced secure hardware component, Intel Software Guard Extensions (Intel SGX), to ensure both privacy and efficiency at the same time. Results: Experimental results demonstrate that our proposed method provides a better trade-off between security and efficiency than solely secure hardware-based methods. Moreover, there is no approximation error: the computed model parameters are identical to the plaintext results. Conclusions: To the best of our knowledge, this kind of secure computation model, using a hybrid cryptographic framework that leverages both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated to date. Our proposed framework ensures data security and computational efficiency at the same time. UR - http://medinform.jmir.org/2018/1/e14/ UR - http://dx.doi.org/10.2196/medinform.8286 UR - http://www.ncbi.nlm.nih.gov/pubmed/29506966 ID - info:doi/10.2196/medinform.8286 ER - TY - JOUR AU - Ng, Charmaine Kamela AU - Meehan, Joseph Conor AU - Torrea, Gabriela AU - Goeminne, Léonie AU - Diels, Maren AU - Rigouts, Leen AU - de Jong, Catherine Bouke AU - André, Emmanuel PY - 2018/02/27 TI - Potential Application of Digitally Linked Tuberculosis Diagnostics for Real-Time Surveillance of Drug-Resistant Tuberculosis Transmission: Validation and Analysis of Test Results JO - JMIR Med Inform SP - e12 VL - 6 IS - 1 KW - tuberculosis KW - drug resistance KW - rifampicin-resistant tuberculosis KW - rapid diagnostic tests KW - Xpert MTB/RIF KW - Genotype MTBDRplus v2.0 KW - Genoscholar NTM + MDRTB II KW - RDT probe reactions KW - rpoB mutations KW - validation and analysis KW - real-time detection N2 - Background: Tuberculosis (TB) is the highest-mortality infectious disease in the world and the main cause of death related to
antimicrobial resistance, yet its surveillance is still paper-based. Rifampicin-resistant TB (RR-TB) is an urgent public health crisis. The World Health Organization has, since 2010, endorsed a series of rapid diagnostic tests (RDTs) that enable rapid detection of drug-resistant strains and produce large volumes of data. In parallel, most high-burden countries have adopted connectivity solutions that allow linking of diagnostics, real-time capture, and shared repository of these test results. However, these connected diagnostics and readily available test results are not used to their full capacity, as we have yet to capitalize on fully understanding the relationship between test results and specific rpoB mutations to elucidate its potential application to real-time surveillance. Objective: We aimed to validate and analyze RDT data in detail, and propose the potential use of connected diagnostics and associated test results for real-time evaluation of RR-TB transmission. Methods: We selected 107 RR-TB strains harboring 34 unique rpoB mutations, including 30 within the rifampicin resistance-determining region (RRDR), from the Belgian Coordinated Collections of Microorganisms, Antwerp, Belgium. We subjected these strains to Xpert MTB/RIF, GenoType MTBDRplus v2.0, and Genoscholar NTM + MDRTB II, the results of which were validated against the strains' available rpoB gene sequences. We determined the reproducibility of the results, analyzed and visualized the probe reactions, and proposed these for potential use in evaluating transmission. Results: The RDT probe reactions detected most RRDR mutations tested, although we found a few critical discrepancies between observed results and manufacturers' claims.
Based on published frequencies of probe reactions and RRDR mutations, we found specific probe reactions with high potential use in transmission studies: Xpert MTB/RIF probes A, B delayed, C, and E delayed; Genotype MTBDRplus v2.0 WT2, WT5, and WT6; and Genoscholar NTM + MDRTB II S1 and S3. Inspection of probe reactions of disputed mutations may potentially resolve discordance between genotypic and phenotypic test results. Conclusions: We propose a novel approach for potential real-time detection of RR-TB transmission by fully using digitally linked TB diagnostics and a shared repository of test results. To our knowledge, this is the first pragmatic and scalable work in response to the consensus of world-renowned TB experts in 2016 on the potential of diagnostic connectivity to accelerate efforts to eliminate TB. This is evidenced by the ability of our proposed approach to facilitate comparison of probe reactions between different RDTs used in the same setting. Integrating this proposed approach as a plug-in module to a connectivity platform will increase the usefulness of connected TB diagnostics for RR-TB outbreak detection through real-time investigation of suspected RR-TB transmission cases based on epidemiologic linking. UR - http://medinform.jmir.org/2018/1/e12/ UR - http://dx.doi.org/10.2196/medinform.9309 UR - http://www.ncbi.nlm.nih.gov/pubmed/29487047 ID - info:doi/10.2196/medinform.9309 ER - TY - JOUR AU - Beaulieu-Jones, K. Brett AU - Lavage, R. Daniel AU - Snyder, W. John AU - Moore, H. Jason AU - Pendergrass, A. Sarah AU - Bauer, R. Christopher PY - 2018/02/23 TI - Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis JO - JMIR Med Inform SP - e11 VL - 6 IS - 1 KW - imputation KW - missing data KW - clinical laboratory test results KW - electronic health records N2 - Background: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses.
Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results. Objective: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. Methods: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling). Results: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation. Conclusions: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available. 
UR - http://medinform.jmir.org/2018/1/e11/ UR - http://dx.doi.org/10.2196/medinform.8960 UR - http://www.ncbi.nlm.nih.gov/pubmed/29475824 ID - info:doi/10.2196/medinform.8960 ER - TY - JOUR AU - Wellner, Ben AU - Grand, Joan AU - Canzone, Elizabeth AU - Coarr, Matt AU - Brady, W. Patrick AU - Simmons, Jeffrey AU - Kirkendall, Eric AU - Dean, Nathan AU - Kleinman, Monica AU - Sylvester, Peter PY - 2017/11/22 TI - Predicting Unplanned Transfers to the Intensive Care Unit: A Machine Learning Approach Leveraging Diverse Clinical Elements JO - JMIR Med Inform SP - e45 VL - 5 IS - 4 KW - clinical deterioration KW - machine learning KW - data mining KW - electronic health record KW - patient acuity KW - vital signs KW - nursing assessment KW - clinical laboratory techniques N2 - Background: Early warning scores aid in the detection of pediatric clinical deteriorations but include limited data inputs, rarely include data trends over time, and have limited validation. Objective: Machine learning methods that make use of large numbers of predictor variables are now commonplace. This work examines how different types of predictor variables derived from the electronic health record affect the performance of predicting unplanned transfers to the intensive care unit (ICU) at three large children's hospitals. Methods: We trained separate models with data from three different institutions from 2011 through 2013 and evaluated the models with 2014 data. Cases consisted of patients who transferred from the floor to the ICU and met one or more of 5 different a priori defined criteria for suspected unplanned transfers. Controls were patients who were never transferred to the ICU. Predictor variables for the models were derived from vitals, labs, acuity scores, and nursing assessments. Classification models consisted of L1- and L2-regularized logistic regression and neural network models. We evaluated model performance over prediction horizons ranging from 1 to 16 hours.
Results: Across the three institutions, the c-statistic values for our best models were 0.892 (95% CI 0.875-0.904), 0.902 (95% CI 0.880-0.923), and 0.899 (95% CI 0.879-0.919) for the task of identifying unplanned ICU transfer 6 hours before its occurrence and achieved 0.871 (95% CI 0.855-0.888), 0.872 (95% CI 0.850-0.895), and 0.850 (95% CI 0.825-0.875) for a prediction horizon of 16 hours. For our first model at 80% sensitivity, this resulted in a specificity of 80.5% (95% CI 77.4-83.7) and a positive predictive value of 5.2% (95% CI 4.5-6.2). Conclusions: Feature-rich models with many predictor variables allow for patient deterioration to be predicted accurately, even up to 16 hours in advance. UR - http://medinform.jmir.org/2017/4/e45/ UR - http://dx.doi.org/10.2196/medinform.8680 UR - http://www.ncbi.nlm.nih.gov/pubmed/29167089 ID - info:doi/10.2196/medinform.8680 ER - TY - JOUR AU - Li, Peiyao AU - Xie, Chen AU - Pollard, Tom AU - Johnson, William Alistair Edward AU - Cao, Desen AU - Kang, Hongjun AU - Liang, Hong AU - Zhang, Yuezhou AU - Liu, Xiaoli AU - Fan, Yong AU - Zhang, Yuan AU - Xue, Wanguo AU - Xie, Lixin AU - Celi, Anthony Leo AU - Zhang, Zhengbo PY - 2017/11/14 TI - Promoting Secondary Analysis of Electronic Medical Records in China: Summary of the PLAGH-MIT Critical Data Conference and Health Datathon JO - JMIR Med Inform SP - e43 VL - 5 IS - 4 KW - electronic health record KW - datathon KW - database KW - intensive care units UR - http://medinform.jmir.org/2017/4/e43/ UR - http://dx.doi.org/10.2196/medinform.7380 UR - http://www.ncbi.nlm.nih.gov/pubmed/29138126 ID - info:doi/10.2196/medinform.7380 ER - TY - JOUR AU - Lea, Christopher Nathan AU - Nicholls, Jacqueline AU - Dobbs, Christine AU - Sethi, Nayha AU - Cunningham, James AU - Ainsworth, John AU - Heaven, Martin AU - Peacock, Trevor AU - Peacock, Anthony AU - Jones, Kerina AU - Laurie, Graeme AU - Kalra, Dipak PY - 2016/06/21 TI - Data Safe Havens and Trust: Toward a Common Understanding of 
Trusted Research Platforms for Governing Secure and Ethical Health Research JO - JMIR Med Inform SP - e22 VL - 4 IS - 2 KW - trusted research platforms KW - data safe havens KW - trusted researchers KW - legislative and regulatory compliance KW - public engagement KW - public involvement KW - clinical research support KW - health record linkage supported research KW - genomics research support UR - http://medinform.jmir.org/2016/2/e22/ UR - http://dx.doi.org/10.2196/medinform.5571 UR - http://www.ncbi.nlm.nih.gov/pubmed/27329087 ID - info:doi/10.2196/medinform.5571 ER - TY - JOUR AU - Spencer, Karen AU - Sanders, Caroline AU - Whitley, A. Edgar AU - Lund, David AU - Kaye, Jane AU - Dixon, Gregory William PY - 2016/04/15 TI - Patient Perspectives on Sharing Anonymized Personal Health Data Using a Digital System for Dynamic Consent and Research Feedback: A Qualitative Study JO - J Med Internet Res SP - e66 VL - 18 IS - 4 KW - eHealth KW - data sharing KW - public trust KW - consent N2 - Background: Electronic health records are widely acknowledged to provide an important opportunity to anonymize patient-level health care data and collate across populations to support research. Nonetheless, in the wake of public and policy concerns about security and inappropriate use of data, conventional approaches toward data governance may no longer be sufficient to respect and protect individual privacy. One proposed solution to improve transparency and public trust is known as Dynamic Consent, which uses information technology to facilitate a more explicit and accessible opportunity to opt out. In this case, patients can tailor preferences about whom they share their data with and can change their preferences reliably at any time. Furthermore, electronic systems provide opportunities for informing patients about data recipients and the results of research to which their data have contributed. 
Objective: To explore patient perspectives on the use of anonymized health care data for research purposes. To evaluate patient perceptions of a Dynamic Consent model and electronic system to enable and implement ongoing communication and collaboration between patients and researchers. Methods: A total of 26 qualitative interviews and three focus groups were conducted that included a video presentation explaining the reuse of anonymized electronic patient records for research. Slides and tablet devices were used to introduce the Dynamic Consent system for discussion. A total of 35 patients with chronic rheumatic disease with varying levels of illness and social deprivation were recruited from a rheumatology outpatient clinic; 5 participants were recruited from a patient and public involvement health research network. Results: Patients were supportive of sharing their anonymized electronic patient record for research, but noted a lack of transparency and awareness around the use of data, making it difficult to secure public trust. While there were general concerns about detrimental consequences of data falling into the wrong hands, such as insurance companies, 39 out of 40 (98%) participants generally considered that the altruistic benefits of sharing health care data outweighed the risks. Views were mostly positive about the use of an electronic interface to enable greater control over consent choices, although some patients were happy to share their data without further engagement. Participants were particularly enthusiastic about the system as a means of enabling feedback regarding data recipients and associated research results, noting that this would improve trust and public engagement in research. This underlines the importance of patient and public involvement and engagement throughout the research process, including the reuse of anonymized health care data for research. 
More than half of patients found the touch screen interface easy to use, although a significant minority, especially those with limited access to technology, expressed some trepidation and felt they may need support to use the system. Conclusions: Patients from a range of socioeconomic backgrounds viewed a digital system for Dynamic Consent positively, in particular, feedback about data recipients and research results. Implementation of a digital Dynamic Consent system would require careful interface design and would need to be located within a robust data infrastructure; it has the potential to improve trust and engagement in electronic medical record research. UR - http://www.jmir.org/2016/4/e66/ UR - http://dx.doi.org/10.2196/jmir.5011 UR - http://www.ncbi.nlm.nih.gov/pubmed/27083521 ID - info:doi/10.2196/jmir.5011 ER - TY - JOUR AU - Williams, Hawys AU - Spencer, Karen AU - Sanders, Caroline AU - Lund, David AU - Whitley, A. Edgar AU - Kaye, Jane AU - Dixon, G. William PY - 2015/01/13 TI - Dynamic Consent: A Possible Solution to Improve Patient Confidence and Trust in How Electronic Patient Records Are Used in Medical Research JO - JMIR Med Inform SP - e3 VL - 3 IS - 1 KW - dynamic consent KW - electronic patient record (EPR) KW - medical research KW - confidentiality KW - privacy KW - governance KW - NHS KW - data linkage KW - care.data UR - http://medinform.jmir.org/2015/1/e3/ UR - http://dx.doi.org/10.2196/medinform.3525 UR - http://www.ncbi.nlm.nih.gov/pubmed/25586934 ID - info:doi/10.2196/medinform.3525 ER -