%0 Journal Article %@ 2563-6316 %I JMIR Publications %V 6 %N %P e60866 %T Improved Alzheimer Disease Diagnosis With a Machine Learning Approach and Neuroimaging: Case Study Development %A Lazli,Lilia %K Alzheimer disease %K computer-aided diagnosis system %K machine learning %K principal component analysis %K linear discriminant analysis %K t-distributed stochastic neighbor embedding %K feedforward neural network %K vision transformer architecture %K support vector machines %K magnetic resonance imaging %K positron emission tomography imaging %K Open Access Series of Imaging Studies %K Alzheimer's Disease Neuroimaging Initiative %K OASIS %K ADNI %D 2025 %7 21.4.2025 %9 %J JMIRx Med %G English %X Background: Alzheimer disease (AD) is a severe neurological brain disorder. Although AD is not curable, earlier detection can help improve symptoms substantially. Machine learning (ML) models are popular and well suited for medical image processing tasks such as computer-aided diagnosis. These techniques can improve the process of making an accurate diagnosis of AD. Objective: This paper presents a complete computer-aided diagnosis system for AD. We investigate the performance of some of the most widely used ML techniques for AD detection and classification using neuroimages from the Open Access Series of Imaging Studies (OASIS) and Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets. Methods: The system uses artificial neural networks (ANNs) and support vector machines (SVMs) as classifiers, and dimensionality reduction techniques as feature extractors. To retrieve features from the neuroimages, we used principal component analysis (PCA), linear discriminant analysis, and t-distributed stochastic neighbor embedding. These features are fed into feedforward neural networks (FFNNs) and SVM-based ML classifiers. Furthermore, we applied the vision transformer (ViT)–based ANNs in conjunction with data augmentation to distinguish patients with AD from healthy controls. Results: Experiments were performed on magnetic resonance imaging and positron emission tomography scans. The OASIS dataset included a total of 300 patients, while the ADNI dataset included 231 patients. For OASIS, 90 (30%) patients were healthy and 210 (70%) were severely impaired by AD. Likewise for the ADNI database, a total of 149 (64.5%) patients with AD were detected and 82 (35.5%) patients were used as healthy controls. A significant difference was found between healthy patients and patients with AD (P=.02). We examined the effectiveness of the three feature extractors and classifiers using 5-fold cross-validation and confusion matrix–based standard classification metrics, namely, accuracy, sensitivity, specificity, precision, F1-score, and area under the receiver operating characteristic curve (AUROC). Compared with state-of-the-art methods, the success rate was satisfactory for all the created ML models, but SVM and FFNN performed best with the PCA extractor, while the ViT classifier performed best with more data. The data augmentation/ViT approach worked better overall, achieving accuracies of 93.2% (sensitivity=87.2, specificity=90.5, precision=87.6, F1-score=88.7, and AUROC=92) for OASIS and 90.4% (sensitivity=85.4, specificity=88.6, precision=86.9, F1-score=88, and AUROC=90) for ADNI. Conclusions: Effective ML models using neuroimaging data could help physicians working on AD diagnosis and will assist them in prescribing timely treatment to patients with AD.
Good results were obtained on the OASIS and ADNI datasets with all the proposed classifiers, namely, SVM, FFNN, and ViTs. However, the results show that the ViT model is much better at predicting AD than the other models when a sufficient amount of data are available to perform the training. This highlights that the data augmentation process could impact the overall performance of the ViT model. %R 10.2196/60866 %U https://xmed.jmir.org/2025/1/e60866 %U https://doi.org/10.2196/60866 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e60442 %T Daily Automated Prediction of Delirium Risk in Hospitalized Patients: Model Development and Validation %A Shaw,Kendrick Matthew %A Shao,Yu-Ping %A Ghanta,Manohar %A Junior,Valdery Moura %A Kimchi,Eyal Y %A Houle,Timothy T %A Akeju,Oluwaseun %A Westover,Michael Brandon %+ Department of Anesthesia, Pain, and Critical care Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA, 02114, United States, 1 6177262000, kmshaw@mgh.harvard.edu %K delirium %K prediction model %K machine learning %K boosted trees %K model development %K validation %K AI %K artificial intelligence %K screening %K prevention %K develop %K logistic regression %K vitals %K vital signs %K gender %K age %K prevent %D 2025 %7 18.4.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Delirium is common in hospitalized patients and is correlated with increased morbidity and mortality. Despite this, delirium is underdiagnosed, and many institutions do not have sufficient resources to consistently apply effective screening and prevention. Objective: This study aims to develop a machine learning algorithm to identify patients at the highest risk of delirium in the hospital each day in an automated fashion based on data available in the electronic medical record, reducing the barrier to large-scale delirium screening. Methods: We developed and compared multiple machine learning models on a retrospective dataset of all hospitalized adult patients with recorded Confusion Assessment Method (CAM) screens at a major academic medical center from April 2, 2016, to January 16, 2019, comprising 23,006 patients. The patient’s age, gender, and all available laboratory values, vital signs, prior CAM screens, and medication administrations were used as potential predictors. Four machine learning approaches were investigated: logistic regression with L1-regularization, multilayer perceptrons, random forests, and boosted trees. Model development used 80% of the patients; the remaining 20% was reserved for testing the final models. Laboratory values, vital signs, medications, gender, and age were used to predict a positive CAM screen in the next 24 hours. Results: The boosted tree model achieved the greatest predictive power, with an area under the receiver operating characteristic curve (AUROC) of 0.92 (95% CI 0.913-0.922), followed by the random forest (AUROC 0.91, 95% CI 0.909-0.918), multilayer perceptron (AUROC 0.86, 95% CI 0.850-0.861), and logistic regression (AUROC 0.85, 95% CI 0.841-0.852). These AUROCs decreased to 0.78-0.82 and 0.74-0.80 when limited to patients who did not currently have delirium and patients who had never had delirium, respectively. Conclusions: A boosted tree machine learning model was able to identify hospitalized patients at elevated risk for delirium in the next 24 hours. This may allow for automated delirium risk screening and more precise targeting of proven and investigational interventions to prevent delirium.
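The boosted-tree approach summarized in the preceding record maps naturally onto standard gradient-boosting tooling. The sketch below is purely illustrative and is not the authors' code: the features (age, gender, a few vitals and laboratory values, a prior CAM result) and the simulated labels are hypothetical stand-ins for the predictors and the next-24-hour CAM outcome described in the abstract.

```python
# Illustrative sketch only; the features and labels are simulated and are not
# taken from the cited delirium study.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(68, 15, n),    # age in years (hypothetical)
    rng.integers(0, 2, n),    # gender, encoded 0/1 (hypothetical)
    rng.normal(85, 12, n),    # heart rate
    rng.normal(120, 18, n),   # systolic blood pressure
    rng.normal(137, 4, n),    # serum sodium
    rng.integers(0, 2, n),    # prior positive CAM screen
])
# Simulated label standing in for "positive CAM screen in the next 24 hours".
logit = 0.04 * (X[:, 0] - 68) + 1.5 * X[:, 5] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```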
%M 39721068 %R 10.2196/60442 %U https://medinform.jmir.org/2025/1/e60442 %U https://doi.org/10.2196/60442 %U http://www.ncbi.nlm.nih.gov/pubmed/39721068 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66530 %T Diagnosis Test Accuracy of Artificial Intelligence for Endometrial Cancer: Systematic Review and Meta-Analysis %A Wang,Longyun %A Wang,Zeyu %A Zhao,Bowei %A Wang,Kai %A Zheng,Jingying %A Zhao,Lijing %+ Department of Gynecology and Obstetrics, The Second Hospital of Jilin University, No.4026, Yatai Street, Changchun, 130000, China, 86 15704313636, zheng_jy@jlu.edu.cn %K artificial intelligence %K endometrial cancer %K diagnostic test accuracy %K systematic review %K meta-analysis %K machine learning %K deep learning %D 2025 %7 18.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Endometrial cancer is one of the most common gynecological tumors, and early screening and diagnosis are crucial for its treatment. Research on the application of artificial intelligence (AI) in the diagnosis of endometrial cancer is increasing, but there is currently no comprehensive meta-analysis to evaluate the diagnostic accuracy of AI in screening for endometrial cancer. Objective: This paper presents a systematic review of AI-based endometrial cancer screening, which is needed to clarify its diagnostic accuracy and provide evidence for the application of AI technology in screening for endometrial cancer. Methods: A search was conducted across PubMed, Embase, Cochrane Library, Web of Science, and Scopus databases to include studies published in English, which evaluated the performance of AI in endometrial cancer screening. A total of 2 independent reviewers screened the titles and abstracts, and the quality of the selected studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies—2 (QUADAS-2) tool. The certainty of the diagnostic test evidence was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. Results: A total of 13 studies were included, and the hierarchical summary receiver operating characteristic model used for the meta-analysis showed that the overall sensitivity of AI-based endometrial cancer screening was 86% (95% CI 79%-90%) and specificity was 92% (95% CI 87%-95%). Subgroup analysis revealed similar results across AI type, study region, publication year, and study type, but the overall quality of evidence was low. Conclusions: AI-based endometrial cancer screening can effectively detect patients with endometrial cancer, but large-scale population studies are needed in the future to further clarify the diagnostic accuracy of AI in screening for endometrial cancer. 
Trial Registration: PROSPERO CRD42024519835; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024519835 %M 40249940 %R 10.2196/66530 %U https://www.jmir.org/2025/1/e66530 %U https://doi.org/10.2196/66530 %U http://www.ncbi.nlm.nih.gov/pubmed/40249940 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66491 %T Artificial Intelligence Models for Pediatric Lung Sound Analysis: Systematic Review and Meta-Analysis %A Park,Ji Soo %A Park,Sa-Yoon %A Moon,Jae Won %A Kim,Kwangsoo %A Suh,Dong In %+ Department of Pediatrics, Seoul National University College of Medicine, 101, Daehak-Ro Jongno-Gu, Seoul, 03080, Republic of Korea, 82 2 2072 362, dongins0@snu.ac.kr %K machine learning %K respiratory disease classification %K wheeze detection %K auscultation %K mel-spectrogram %K abnormal lung sound detection %K artificial intelligence %K pediatric %K lung sound analysis %K systematic review %K asthma %K pneumonia %K children %K morbidity %K mortality %K diagnostic %K respiratory pathology %D 2025 %7 18.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Pediatric respiratory diseases, including asthma and pneumonia, are major causes of morbidity and mortality in children. Auscultation of lung sounds is a key diagnostic tool but is prone to subjective variability. The integration of artificial intelligence (AI) and machine learning (ML) with electronic stethoscopes offers a promising approach for automated and objective lung sound analysis. Objective: This systematic review and meta-analysis assesses the performance of ML models in pediatric lung sound analysis. The study evaluates the methodologies, model performance, and database characteristics while identifying limitations and future directions for clinical implementation. Methods: A systematic search was conducted in Medline via PubMed, Embase, Web of Science, OVID, and IEEE Xplore for studies published between January 1, 1990, and December 16, 2024. Inclusion criteria are as follows: studies developing ML models for pediatric lung sound classification with a defined database, physician-labeled reference standard, and reported performance metrics. Exclusion criteria are as follows: studies focusing on adults, cardiac auscultation, validation of existing models, or lacking performance metrics. Risk of bias was assessed using a modified Quality Assessment of Diagnostic Accuracy Studies (version 2) framework. Data were extracted on study design, dataset, ML methods, feature extraction, and classification tasks. Bivariate meta-analysis was performed for binary classification tasks, including wheezing and abnormal lung sound detection. Results: A total of 41 studies met the inclusion criteria. The most common classification task was binary detection of abnormal lung sounds, particularly wheezing. Pooled sensitivity and specificity for wheeze detection were 0.902 (95% CI 0.726-0.970) and 0.955 (95% CI 0.762-0.993), respectively. For abnormal lung sound detection, pooled sensitivity was 0.907 (95% CI 0.816-0.956) and specificity 0.877 (95% CI 0.813-0.921). The most frequently used feature extraction methods were Mel-spectrogram, Mel-frequency cepstral coefficients, and short-time Fourier transform. Convolutional neural networks were the predominant ML model, often combined with recurrent neural networks or residual network architectures. However, high heterogeneity in dataset size, annotation methods, and evaluation criteria was observed. Most studies relied on small, single-center datasets, limiting generalizability.
Conclusions: ML models show high accuracy in pediatric lung sound analysis, but face limitations due to dataset heterogeneity, lack of standard guidelines, and limited external validation. Future research should focus on standardized protocols and the development of large-scale, multicenter datasets to improve model robustness and clinical implementation. %M 40249944 %R 10.2196/66491 %U https://www.jmir.org/2025/1/e66491 %U https://doi.org/10.2196/66491 %U http://www.ncbi.nlm.nih.gov/pubmed/40249944 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e70752 %T Implementation of an Integrated, Clinical Decision Support Tool at the Point of Antihypertensive Medication Refill Request to Improve Hypertension Management: Controlled Pre-Post Study %A Matulis 3rd,John Charles %A Greenwood,Jason %A Eberle,Michele %A Anderson,Benjamin %A Blair,David %A Chaudhry,Rajeev %K clinical decision support systems %K population health %K hypertension %K electronic health records %D 2025 %7 11.4.2025 %9 %J JMIR Med Inform %G English %X Background: Improving processes regarding the management of electronic health record (EHR) requests for chronic antihypertensive medication renewals may represent an opportunity to enhance blood pressure (BP) management at the individual and population level. Objective: This study aimed to evaluate the effectiveness of the eRx HTN Chart Check, an integrated clinical decision support tool available at the point of antihypertensive medication refill request, in facilitating enhanced provider management of chronic hypertension. Methods: The study was conducted at two Mayo Clinic sites—Northwest Wisconsin Family Medicine and Rochester Community Internal Medicine practices—with control groups in comparable Mayo Clinic practices. The intervention integrated structured clinical data, including recent BP readings, laboratory results, and visit dates, into the electronic prescription renewal interface to facilitate prescriber decision-making regarding hypertension management. A difference-in-differences (DID) design compared pre- and postintervention hypertension control rates between the intervention and control groups. Data were collected from the Epic EHR system and analyzed using linear regression models. Results: The baseline BP control rates were slightly higher in intervention clinics. Postimplementation, no significant improvement in population-level hypertension control was observed (DID estimate: 0.07%, 95% CI −4.0% to 4.1%; P=.97). Of the 19,968 refill requests processed, 46% met all monitoring criteria. However, clinician approval rates remained high (90%), indicating minimal impact on prescribing behavior. Conclusions: Despite successful implementation, the tool did not significantly improve hypertension control, possibly due to competing quality initiatives and high in-basket volumes. Future iterations should focus on enhanced integration with other decision support tools and strategies to improve clinician engagement and patient outcomes. Further research is needed to optimize chronic disease management through EHR-integrated decision support systems. 
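The difference-in-differences (DID) analysis mentioned in the preceding record can be illustrated with a simple linear probability model in which the group-by-period interaction term is the DID estimate. The sketch below is a hypothetical illustration on simulated data; the variable names and effect sizes are assumptions, not values from the cited study.

```python
# Illustrative difference-in-differences sketch; the data are simulated and the
# variable names are hypothetical, not taken from the cited hypertension study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 8000
df = pd.DataFrame({
    "intervention": rng.integers(0, 2, n),  # 1 = clinic with the refill tool
    "post": rng.integers(0, 2, n),          # 1 = after implementation
})
# Simulated binary outcome (BP controlled), with a small baseline difference
# between groups and essentially no intervention effect.
p = 0.62 + 0.02 * df["intervention"] + 0.01 * df["post"]
df["bp_controlled"] = rng.binomial(1, p)

# Linear probability model: the interaction coefficient is the DID estimate.
fit = smf.ols("bp_controlled ~ intervention * post", data=df).fit()
print("DID estimate:", fit.params["intervention:post"])
print("95% CI:", fit.conf_int().loc["intervention:post"].tolist())
```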
%R 10.2196/70752 %U https://medinform.jmir.org/2025/1/e70752 %U https://doi.org/10.2196/70752 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 8 %N %P e65629 %T Model-Based Feature Extraction and Classification for Parkinson Disease Screening Using Gait Analysis: Development and Validation Study %A Lim,Ming De %A Connie,Tee %A Goh,Michael Kah Ong %A Saedon,Nor ‘Izzati %+ Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, 75450, Malaysia, 60 62523592, tee.connie@mmu.edu.my %K model-based features %K gait analysis %K Parkinson disease %K computer vision %K support vector machine %D 2025 %7 8.4.2025 %9 Original Paper %J JMIR Aging %G English %X Background: Parkinson disease (PD) is a progressive neurodegenerative disorder that affects motor coordination, leading to gait abnormalities. Early detection of PD is crucial for effective management and treatment. Traditional diagnostic methods often require invasive procedures or are performed when the disease has significantly progressed. Therefore, there is a need for noninvasive techniques that can identify early motor symptoms, particularly those related to gait. Objective: The study aimed to develop a noninvasive approach for the early detection of PD by analyzing model-based gait features. The primary focus is on identifying subtle gait abnormalities associated with PD using kinematic characteristics. Methods: Data were collected through controlled video recordings of participants performing the timed up and go (TUG) assessment, with particular emphasis on the turning phase. The kinematic features analyzed include shoulder distance, step length, stride length, knee and hip angles, leg and arm symmetry, and trunk angles. These features were processed using advanced filtering techniques and analyzed through machine learning methods to distinguish between normal and PD-affected gait patterns. Results: The analysis of kinematic features during the turning phase of the TUG assessment revealed that individuals with PD exhibited subtle gait abnormalities, such as freezing of gait, reduced step length, and asymmetrical movements. The model-based features proved effective in differentiating between normal and PD-affected gait, demonstrating the potential of this approach in early detection. Conclusions: This study presents a promising noninvasive method for the early detection of PD by analyzing specific gait features during the turning phase of the TUG assessment. The findings suggest that this approach could serve as a sensitive and accurate tool for diagnosing and monitoring PD, potentially leading to earlier intervention and improved patient outcomes. 
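The model-based gait features and machine learning classification described in the preceding record can be illustrated as follows. This is a hypothetical sketch, not the authors' pipeline: the joint-angle helper, the four per-subject features, and the simulated labels are assumptions made for illustration only.

```python
# Hypothetical sketch of model-based gait features plus an SVM classifier; the
# joint-angle helper, feature set, and labels are illustrative assumptions and
# are not taken from the cited Parkinson disease study.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: knee angle from hip, knee, and ankle keypoints (2D coordinates).
hip, knee, ankle = np.array([0.0, 1.0]), np.array([0.0, 0.5]), np.array([0.1, 0.0])
print("knee angle:", round(joint_angle(hip, knee, ankle), 1), "degrees")

# Simulated per-subject features: step length, stride length, mean knee angle
# during turning, and a left/right symmetry ratio.
rng = np.random.default_rng(2)
n = 200
X = rng.normal([0.55, 1.10, 150.0, 0.95], [0.08, 0.15, 8.0, 0.05], size=(n, 4))
y = rng.integers(0, 2, n)  # 0 = control, 1 = PD (labels simulated)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```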
%M 40198116 %R 10.2196/65629 %U https://aging.jmir.org/2025/1/e65629 %U https://doi.org/10.2196/65629 %U http://www.ncbi.nlm.nih.gov/pubmed/40198116 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62732 %T Investigating Clinicians’ Intentions and Influencing Factors for Using an Intelligence-Enabled Diagnostic Clinical Decision Support System in Health Care Systems: Cross-Sectional Survey %A Zheng,Rui %A Jiang,Xiao %A Shen,Li %A He,Tianrui %A Ji,Mengting %A Li,Xingyi %A Yu,Guangjun %+ , Shanghai Children's Hospital, No 355 Luding Road, Shanghai, 200062, China, 86 18917762998, gjyu@shchildren.com.cn %K artificial intelligence %K clinical decision support systems %K task-technology fit %K technology acceptance model %K perceived risk %K performance expectations %K intention to use %D 2025 %7 7.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: An intelligence-enabled clinical decision support system (CDSS) is a computerized system that integrates medical knowledge, patient data, and clinical guidelines to assist health care providers in making clinical decisions. Research studies have shown that CDSS utilization rates have not met expectations. Clinicians’ intentions and their attitudes determine the use and promotion of CDSS in clinical practice. Objective: The aim of this study was to enhance the successful utilization of CDSS by analyzing the pivotal factors that influence clinicians’ intentions to adopt it and by putting forward targeted management recommendations. Methods: This study proposed a research model grounded in the task-technology fit model and the technology acceptance model, which was then tested through a cross-sectional survey. The measurement instrument comprised demographic characteristics, multi-item scales, and an open-ended query regarding areas where clinicians perceived the system required improvement. We leveraged structural equation modeling to assess the direct and indirect effects of “task-technology fit” and “perceived ease of use” on clinicians’ intentions to use the CDSS when mediated by “performance expectation” and “perceived risk.” We collated and analyzed the responses to the open-ended question. Results: We collected a total of 247 questionnaires. The model explained 65.8% of the variance in use intention. Performance expectations (β=0.228; P<.001) and perceived risk (β=–0.579; P<.001) were both significant predictors of use intention. Task-technology fit (β=–0.281; P<.001) and perceived ease of use (β=–0.377; P<.001) negatively affected perceived risk. Perceived risk (β=–0.308; P<.001) negatively affected performance expectations. Task-technology fit positively affected perceived ease of use (β=0.692; P<.001) and performance expectations (β=0.508; P<.001). Task characteristics (β=0.168; P<.001) and technology characteristics (β=0.749; P<.001) positively affected task-technology fit. Contrary to expectations, perceived ease of use (β=0.108; P=.07) did not have a significant impact on use intention. From the open-ended question, 3 main themes emerged regarding clinicians’ perceived deficiencies in CDSS: system security risks, personalized interaction, and seamless integration. Conclusions: Perceived risk and performance expectations were direct determinants of clinicians’ adoption of CDSS, significantly influenced by task-technology fit and perceived ease of use. In the future, increasing transparency within CDSS and fostering trust between clinicians and technology should be prioritized.
Furthermore, focusing on personalized interactions and ensuring seamless integration into clinical workflows are crucial steps moving forward. %M 40194276 %R 10.2196/62732 %U https://www.jmir.org/2025/1/e62732 %U https://doi.org/10.2196/62732 %U http://www.ncbi.nlm.nih.gov/pubmed/40194276 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63105 %T Modernizing the Staging of Parkinson Disease Using Digital Health Technology %A Templeton,John Michael %A Poellabauer,Christian %A Schneider,Sandra %A Rahimi,Morteza %A Braimoh,Taofeek %A Tadamarry,Fhaheem %A Margolesky,Jason %A Burke,Shanna %A Al Masry,Zeina %+ , Bellini College of Artificial Intelligence, Cybersecurity, and Computing, University of South Florida, 4202 E Fowler Ave, Tampa, FL, 33620, United States, 1 813 396 0962, jtemplet@usf.edu %K digital health %K Parkinson disease %K disease classification %K wearables %K personalized medicine %K neurocognition %K artificial intelligence %K AI %D 2025 %7 4.4.2025 %9 Viewpoint %J J Med Internet Res %G English %X Due to the complicated nature of Parkinson disease (PD), a number of subjective considerations (eg, staging schemes, clinical assessment tools, or questionnaires) on how best to assess clinical deficits and monitor clinical progression have been published; however, none of these considerations include a comprehensive, objective assessment of all functional areas of neurocognition affected by PD (eg, motor, memory, speech, language, executive function, autonomic function, sensory function, behavior, and sleep). This paper highlights the increasing use of digital health technology (eg, smartphones, tablets, and wearable devices) for the classification, staging, and monitoring of PD. Furthermore, this Viewpoint proposes a foundation for a new staging schema that builds from multiple clinically implemented scales (eg, Hoehn and Yahr Scale and Berg Balance Scale) for ease and homogeneity, while also implementing digital health technology to expand current staging protocols. This proposed staging system foundation aims to provide an objective, symptom-specific assessment of all functional areas of neurocognition via inherent device capabilities (eg, device sensors and human-device interactions). As individuals with PD may manifest different symptoms at different times across the spectrum of neurocognition, the modernization of assessments that include objective, symptom-specific monitoring is imperative for providing personalized medicine and maintaining individual quality of life. %M 40184612 %R 10.2196/63105 %U https://www.jmir.org/2025/1/e63105 %U https://doi.org/10.2196/63105 %U http://www.ncbi.nlm.nih.gov/pubmed/40184612 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59591 %T Public Awareness of and Attitudes Toward the Use of AI in Pathology Research and Practice: Mixed Methods Study %A Lewis,Claire %A Groarke,Jenny %A Graham-Wisener,Lisa %A James,Jacqueline %+ School of Medicine Dentistry and Biomedical Sciences, Queen's University Belfast, University Road, Belfast, BT7 1NN, United Kingdom, 44 2890972804, claire.lewis@qub.ac.uk %K artificial intelligence %K AI %K public opinion %K pathology %K health care %K public awareness %K survey %D 2025 %7 2.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The last decade has witnessed major advances in the development of artificial intelligence (AI) technologies for use in health care. 
One of the most promising areas of research that has potential clinical utility is the use of AI in pathology to aid cancer diagnosis and management. While the value of using AI to improve the efficiency and accuracy of diagnosis cannot be overstated, there are challenges in the development and implementation of such technologies. Notably, questions remain about public support for the use of AI to assist in pathological diagnosis and for the use of health care data, including data obtained from tissue samples, to train algorithms. Objective: This study aimed to investigate public awareness of and attitudes toward AI in pathology research and practice. Methods: A nationally representative, cross-sectional, web-based mixed methods survey (N=1518) was conducted to assess the UK public’s awareness of and views on the use of AI in pathology research and practice. Respondents were recruited via Prolific, an online research platform. To be eligible for the study, participants had to be aged >18 years, be UK residents, and have the capacity to express their own opinion. Respondents answered 30 closed-ended questions and 2 open-ended questions. Sociodemographic information and previous experience with cancer were collected. Descriptive and inferential statistics were used to analyze quantitative data; qualitative data were analyzed thematically. Results: Awareness was low, with only 23.19% (352/1518) of the respondents somewhat or moderately aware of AI being developed for use in pathology. Most did not support a diagnosis of cancer (908/1518, 59.82%) or a diagnosis based on biomarkers (694/1518, 45.72%) being made using AI only. However, most (1478/1518, 97.36%) supported diagnoses made by pathologists with AI assistance. The adjusted odds ratio (aOR) for supporting AI in cancer diagnosis and management was higher for men (aOR 1.34, 95% CI 1.02-1.75). Greater awareness (aOR 1.25, 95% CI 1.10-1.42), greater trust in data security and privacy protocols (aOR 1.04, 95% CI 1.01-1.07), and more positive beliefs (aOR 1.27, 95% CI 1.20-1.36) also increased support, whereas identifying more risks reduced the likelihood of support (aOR 0.80, 95% CI 0.73-0.89). In total, 3 main themes emerged from the qualitative data: bringing the public along, the human in the loop, and more hard evidence needed, indicating conditional support for AI in pathology with human decision-making oversight, robust measures for data handling and protection, and evidence for AI benefit and effectiveness. Conclusions: Awareness of AI’s potential use in pathology was low, but attitudes were positive, with high but conditional support. Challenges remain, particularly among women, regarding AI use in cancer diagnosis and management. Apprehension persists about the access to and use of health care data by private organizations. %M 40173441 %R 10.2196/59591 %U https://www.jmir.org/2025/1/e59591 %U https://doi.org/10.2196/59591 %U http://www.ncbi.nlm.nih.gov/pubmed/40173441 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67922 %T AI-Derived Blood Biomarkers for Ovarian Cancer Diagnosis: Systematic Review and Meta-Analysis %A Xu,He-Li %A Li,Xiao-Ying %A Jia,Ming-Qian %A Ma,Qi-Peng %A Zhang,Ying-Hua %A Liu,Fang-Hua %A Qin,Ying %A Chen,Yu-Han %A Li,Yu %A Chen,Xi-Yang %A Xu,Yi-Lin %A Li,Dong-Run %A Wang,Dong-Dong %A Huang,Dong-Hui %A Xiao,Qian %A Zhao,Yu-Hong %A Gao,Song %A Qin,Xue %A Tao,Tao %A Gong,Ting-Ting %A Wu,Qi-Jun %+ Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, No.
36, San Hao Street, ShenYang, 110004, China, 86 024 96615 13652, wuqj@sj-hospital.org %K artificial intelligence %K AI %K blood biomarker %K ovarian cancer %K diagnosis %K PRISMA %D 2025 %7 24.3.2025 %9 Review %J J Med Internet Res %G English %X Background: Emerging evidence underscores the potential application of artificial intelligence (AI) in discovering noninvasive blood biomarkers. However, the diagnostic value of AI-derived blood biomarkers for ovarian cancer (OC) remains inconsistent. Objective: We aimed to evaluate the research quality and the validity of AI-based blood biomarkers in OC diagnosis. Methods: A systematic search was performed in the MEDLINE, Embase, IEEE Xplore, PubMed, Web of Science, and the Cochrane Library databases. Studies examining the diagnostic accuracy of AI in discovering OC blood biomarkers were identified. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies–AI tool. Pooled sensitivity, specificity, and area under the curve (AUC) were estimated using a bivariate model for the diagnostic meta-analysis. Results: A total of 40 studies were ultimately included. Most (n=31, 78%) included studies were evaluated as low risk of bias. Overall, the pooled sensitivity, specificity, and AUC were 85% (95% CI 83%-87%), 91% (95% CI 90%-92%), and 0.95 (95% CI 0.92-0.96), respectively. For contingency tables with the highest accuracy, the pooled sensitivity, specificity, and AUC were 95% (95% CI 90%-97%), 97% (95% CI 95%-98%), and 0.99 (95% CI 0.98-1.00), respectively. Stratification by AI algorithms revealed higher sensitivity and specificity in studies using machine learning (sensitivity=85% and specificity=92%) compared to those using deep learning (sensitivity=77% and specificity=85%). In addition, studies using serum reported substantially higher sensitivity (94%) and specificity (96%) than those using plasma (sensitivity=83% and specificity=91%). Stratification by external validation demonstrated significantly higher specificity in studies with external validation (specificity=94%) compared to those without external validation (specificity=89%), while the reverse was observed for sensitivity (74% vs 90%). No publication bias was detected in this meta-analysis. Conclusions: AI algorithms demonstrate satisfactory performance in the diagnosis of OC using blood biomarkers and are anticipated to become an effective diagnostic modality in the future, potentially avoiding unnecessary surgeries. Future research is warranted to incorporate external validation into AI diagnostic models, as well as to prioritize the adoption of deep learning methodologies. 
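The pooled sensitivity estimates reported in the preceding record come from a bivariate hierarchical model; a full reproduction is beyond a short sketch, but the general idea of pooling logit-transformed sensitivities across studies can be illustrated with a simplified univariate DerSimonian-Laird random-effects model. The 2×2 counts below are invented for illustration and do not come from the cited review.

```python
# Simplified meta-analysis sketch: DerSimonian-Laird random-effects pooling of
# logit-transformed sensitivities. The cited review fitted a bivariate model;
# this univariate version and the 2x2 counts are illustrative only.
import numpy as np

tp = np.array([45, 120, 30, 62, 80])   # true positives per hypothetical study
fn = np.array([5, 18, 6, 9, 12])       # false negatives per hypothetical study

# Logit sensitivity and within-study variance (0.5 continuity correction).
p = (tp + 0.5) / (tp + fn + 1.0)
theta = np.log(p / (1 - p))
var = 1.0 / (tp + 0.5) + 1.0 / (fn + 0.5)

# DerSimonian-Laird estimate of the between-study variance tau^2.
w = 1.0 / var
theta_fe = np.sum(w * theta) / np.sum(w)
q = np.sum(w * (theta - theta_fe) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(tp) - 1)) / c)

# Random-effects pooled logit sensitivity, back-transformed with a 95% CI.
w_re = 1.0 / (var + tau2)
pooled = np.sum(w_re * theta) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

print(f"pooled sensitivity {expit(pooled):.2f} "
      f"(95% CI {expit(pooled - 1.96 * se):.2f}-{expit(pooled + 1.96 * se):.2f})")
```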
Trial Registration: PROSPERO CRD42023481232; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023481232 %M 40126546 %R 10.2196/67922 %U https://www.jmir.org/2025/1/e67922 %U https://doi.org/10.2196/67922 %U http://www.ncbi.nlm.nih.gov/pubmed/40126546 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e60215 %T Customizing Computerized Adaptive Test Stopping Rules for Clinical Settings Using the Negative Affect Subdomain of the NIH Toolbox Emotion Battery: Simulation Study %A Amagai,Saki %A Kaat,Aaron J %A Fox,Rina S %A Ho,Emily H %A Pila,Sarah %A Kallen,Michael A %A Schalet,Benjamin D %A Nowinski,Cindy J %A Gershon,Richard C %K computerized adaptive testing %K CAT %K stopping rules %K NIH Toolbox %K reliability %K test burden %K clinical setting %K patient-reported outcome %K clinician %D 2025 %7 21.3.2025 %9 %J JMIR Form Res %G English %X Background: Patient-reported outcome measures are crucial for informed medical decisions and evaluating treatments. However, they can be burdensome for patients and sometimes lack the reliability clinicians need for clear clinical interpretations. Objective: We aimed to assess the extent to which applying alternative stopping rules can increase reliability for clinical use while minimizing the burden of computerized adaptive tests (CATs). Methods: CAT simulations were conducted on 3 adult item banks in the NIH Toolbox for Assessment of Neurological and Behavioral Function Emotion Battery; the item banks were in the Negative Affect subdomain (ie, Anger Affect, Fear Affect, and Sadness) and contained at least 8 items. In the originally applied NIH Toolbox CAT stopping rules, the CAT was stopped if the score SE reached <0.3 before 12 items were administered. We first contrasted this with a SE-change rule in a planned simulation analysis. We then contrasted the original rules with fixed-length CATs (4‐12 items), a reduction of the maximum number of items to 8, and other modifications in post hoc analyses. Burden was measured by the number of items administered per simulation, precision by the percentage of assessments yielding reliability cutoffs (0.85, 0.90, and 0.95), and accurate score recovery by the root mean squared error between the generating θ and the CAT-estimated “expected a posteriori”–based θ. Results: In general, relative to the original rules, the alternative stopping rules slightly decreased burden while also increasing the proportion of assessments achieving high reliability for the adult banks; however, the SE-change rule and fixed-length CATs with 8 or fewer items also notably increased assessments yielding reliability <0.85. Among the alternative rules explored, the reduced maximum stopping rule best balanced precision and parsimony, presenting another option beyond the original rules. Conclusions: Our findings demonstrate the challenges in attempting to reduce test burden while also achieving score precision for clinical use. Stopping rules should be modified in accordance with the context of the study population and the purpose of the study. 
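The computerized adaptive testing (CAT) stopping rules examined in the preceding record can be illustrated with a schematic simulation. The sketch below uses dichotomous 2PL items, expected a posteriori (EAP) scoring on a quadrature grid, and a stop rule of SE < 0.3 or a maximum of 12 items; the actual NIH Toolbox banks use polytomous graded-response items, so this is a simplified illustration rather than a reproduction of the cited simulations.

```python
# Schematic CAT simulation: dichotomous 2PL items, EAP scoring on a grid, and a
# stopping rule of SE < 0.3 or a maximum of 12 items. Item parameters are
# hypothetical; this is illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n_items = 40
a = rng.uniform(1.0, 2.5, n_items)   # discriminations (hypothetical bank)
b = rng.normal(0.0, 1.0, n_items)    # difficulties (hypothetical bank)
grid = np.linspace(-4, 4, 121)       # quadrature grid for EAP scoring
prior = np.exp(-0.5 * grid ** 2)
prior /= prior.sum()

def p_correct(theta, j):
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

def run_cat(true_theta, max_items=12, se_stop=0.3):
    posterior = prior.copy()
    administered, eap, se = [], 0.0, np.inf
    while len(administered) < max_items and se >= se_stop:
        remaining = [j for j in range(n_items) if j not in administered]
        # Select the item with maximum Fisher information at the current EAP.
        info = [a[j] ** 2 * p_correct(eap, j) * (1 - p_correct(eap, j)) for j in remaining]
        j = remaining[int(np.argmax(info))]
        resp = rng.random() < p_correct(true_theta, j)   # simulated response
        like = p_correct(grid, j) if resp else 1.0 - p_correct(grid, j)
        posterior *= like
        posterior /= posterior.sum()
        eap = float(np.sum(grid * posterior))
        se = float(np.sqrt(np.sum((grid - eap) ** 2 * posterior)))
        administered.append(j)
    return eap, se, len(administered)

eap, se, used = run_cat(true_theta=0.5)
print(f"EAP={eap:.2f}, SE={se:.2f}, items administered={used}")
```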
%R 10.2196/60215 %U https://formative.jmir.org/2025/1/e60215 %U https://doi.org/10.2196/60215 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e67299 %T Assessing the Diagnostic Accuracy of ChatGPT-4 in Identifying Diverse Skin Lesions Against Squamous and Basal Cell Carcinoma %A Chetla,Nitin %A Chen,Matthew %A Chang,Joseph %A Smith,Aaron %A Hage,Tamer Rajai %A Patel,Romil %A Gardner,Alana %A Bryer,Bridget %K chatbot %K ChatGPT %K ChatGPT-4 %K squamous cell carcinoma %K basal cell carcinoma %K skin cancer %K skin cancer detection %K dermatoscopic image analysis %K skin lesion differentiation %K dermatologist %K machine learning %K ML %K artificial intelligence %K AI %K AI in dermatology %K algorithm %K model %K analytics %K diagnostic accuracy %D 2025 %7 21.3.2025 %9 %J JMIR Dermatol %G English %X Our study evaluates the diagnostic accuracy of ChatGPT-4o in classifying various skin lesions, highlighting its limitations in distinguishing squamous cell carcinoma from basal cell carcinoma using dermatoscopic images. %R 10.2196/67299 %U https://derma.jmir.org/2025/1/e67299 %U https://doi.org/10.2196/67299 %0 Journal Article %@ 2563-6316 %I JMIR Publications %V 6 %N %P e65263 %T Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance %A Mansoor,Masab %A Ibrahim,Andrew F %A Grindem,David %A Baig,Asad %K natural language processing %K NLP %K machine learning %K ML %K artificial intelligence %K language model %K large language model %K LLM %K generative pretrained transformer %K GPT %K pediatrics %D 2025 %7 19.3.2025 %9 %J JMIRx Med %G English %X Background: Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. Objective: This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings. Methods: This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0‐18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. Results: The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%‐88%), and specificity of 90% (95% CI 87%‐93%), comparable to pediatricians’ accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0‐5 years: 54/62, 87%; 6‐12 years: 47/53, 89%; 13‐18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80%) but comparable to pediatricians (17/20, 85%; P=.62). Conclusions: This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. 
Further validation in diverse populations is necessary before clinical implementation. %R 10.2196/65263 %U https://xmed.jmir.org/2025/1/e65263 %U https://doi.org/10.2196/65263 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63962 %T Emotion Forecasting: A Transformer-Based Approach %A Paz-Arbaizar,Leire %A Lopez-Castroman,Jorge %A Artés-Rodríguez,Antonio %A Olmos,Pablo M %A Ramírez,David %+ Signal Theory and Communications Department, Universidad Carlos III de Madrid, Av. de la Universidad, 30, Leganés, 28911, Spain, 34 91 624 9157, lpaz@pa.uc3m.es %K affect %K emotional valence %K machine learning %K mental disorder %K monitoring %K mood %K passive data %K Patient Health Questionnaire-9 %K PHQ-9 %K psychological distress %K time-series forecasting %D 2025 %7 18.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Monitoring the emotional states of patients with psychiatric problems has always been challenging due to the noncontinuous nature of clinical assessments, the effect of the health care environment, and the inherent subjectivity of evaluation instruments. However, mental states in psychiatric disorders exhibit substantial variability over time, making real-time monitoring crucial for preventing risky situations and ensuring appropriate treatment. Objective: This study aimed to leverage new technologies and deep learning techniques to enable more objective, real-time monitoring of patients. This was achieved by passively monitoring variables such as step count, patient location, and sleep patterns using mobile devices. We aimed to predict patient self-reports and detect sudden variations in their emotional valence, identifying situations that may require clinical intervention. Methods: Data for this project were collected using the Evidence-Based Behavior (eB2) app, which records both passive and self-reported variables daily. Passive data refer to behavioral information gathered via the eB2 app through sensors embedded in mobile devices and wearables. These data were obtained from studies conducted in collaboration with hospitals and clinics that used eB2. We used hidden Markov models (HMMs) to address missing data and transformer deep neural networks for time-series forecasting. Finally, classification algorithms were applied to predict several variables, including emotional state and responses to the Patient Health Questionnaire-9. Results: Through real-time patient monitoring, we demonstrated the ability to accurately predict patients’ emotional states and anticipate changes over time. Specifically, our approach achieved high accuracy (0.93) and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.98 for emotional valence classification. For predicting emotional state changes 1 day in advance, we obtained an ROC AUC of 0.87. Furthermore, we demonstrated the feasibility of forecasting responses to the Patient Health Questionnaire-9, with particularly strong performance for certain questions. For example, in question 9, related to suicidal ideation, our model achieved an accuracy of 0.9 and an ROC AUC of 0.77 for predicting the next day’s response. Moreover, we illustrated the enhanced stability of multivariate time-series forecasting when HMM preprocessing was combined with a transformer model, as opposed to other time-series forecasting methods, such as recurrent neural networks or long short-term memory cells. 
Conclusions: The stability of multivariate time-series forecasting improved when HMM preprocessing was combined with a transformer model, as opposed to other time-series forecasting methods (eg, recurrent neural network and long short-term memory), leveraging the attention mechanisms to capture longer time dependencies and gain interpretability. We showed the potential to assess the emotional state of a patient and the scores of psychiatric questionnaires from passive variables in advance. This allows real-time monitoring of patients and hence better risk detection and treatment adjustment. %M 40101216 %R 10.2196/63962 %U https://www.jmir.org/2025/1/e63962 %U https://doi.org/10.2196/63962 %U http://www.ncbi.nlm.nih.gov/pubmed/40101216 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63895 %T The Perceptions of Potential Prerequisites for Artificial Intelligence in Danish General Practice: Vignette-Based Interview Study Among General Practitioners %A Jørgensen,Natasha Lee %A Merrild,Camilla Hoffmann %A Jensen,Martin Bach %A Moeslund,Thomas B %A Kidholm,Kristian %A Thomsen,Janus Laust %K general practice %K general practitioners %K GPs %K artificial intelligence %K AI %K prerequisites %K interviews %K vignettes %K qualitative study %K thematic analysis %D 2025 %7 12.3.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) has been deemed revolutionary in medicine; however, no AI tools have been implemented or validated in Danish general practice. General practice in Denmark has an excellent digitization system for developing and using AI. Nevertheless, there is a lack of involvement of general practitioners (GPs) in developing AI. The perspectives of GPs as end users are essential for facilitating the next stage of AI development in general practice. Objective: This study aimed to identify the essential prerequisites that GPs perceive as necessary to realize the potential of AI in Danish general practice. Methods: This study used semistructured interviews and vignettes among GPs to gain perspectives on the potential of AI in general practice. A total of 12 GPs interested in the potential of AI in general practice were interviewed in 2019 and 2021. The interviews were transcribed verbatim and thematic analysis was conducted to identify the dominant themes throughout the data. Results: In the data analysis, four main themes were identified as essential prerequisites for GPs when considering the potential of AI in general practice: (1) AI must begin with the low-hanging fruit, (2) AI must be meaningful in the GP’s work, (3) the GP-patient relationship must be maintained despite AI, and (4) AI must be a free, active, and integrated option in the electronic health record (EHR). These 4 themes suggest that the development of AI should initially focus on low-complexity tasks that do not influence patient interactions but facilitate GPs’ work in a meaningful manner as an integrated part of the EHR. Examples of this include routine and administrative tasks. Conclusions: The research findings outline the participating GPs’ perceptions of the essential prerequisites to consider when exploring the potential applications of AI in primary care settings. We believe that these perceptions of potential prerequisites can support the initial stages of future development and assess the suitability of existing AI tools for general practice. 
%R 10.2196/63895 %U https://medinform.jmir.org/2025/1/e63895 %U https://doi.org/10.2196/63895 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66760 %T Investigating Whether AI Will Replace Human Physicians and Understanding the Interplay of the Source of Consultation, Health-Related Stigma, and Explanations of Diagnoses on Patients’ Evaluations of Medical Consultations: Randomized Factorial Experiment %A Guo,Weiqi %A Chen,Yang %+ , School of Journalism and Communication, Renmin University of China, Room 720, Mingde Xinwen Building, No.59 Zhongguancun St., Haidian, Beijing, 100872, China, 86 1062514835, 20050022@ruc.edu.cn %K artificial intelligence %K AI %K medical artificial intelligence %K medical AI %K human–artificial intelligence interaction %K human-AI interaction %K medical consultation %K health-related stigma %K diagnosis explanation %K health communication %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The increasing use of artificial intelligence (AI) in medical diagnosis and consultation promises benefits such as greater accuracy and efficiency. However, there is little evidence to systematically test whether the ideal technological promises translate into an improved evaluation of the medical consultation from the patient’s perspective. This perspective is significant because AI as a technological solution does not necessarily improve patient confidence in diagnosis and adherence to treatment at the functional level, create meaningful interactions between the medical agent and the patient at the relational level, evoke positive emotions, or reduce the patient’s pessimism at the emotional level. Objective: This study aims to investigate, from a patient-centered perspective, whether AI or human-involved AI can replace the role of human physicians in diagnosis at the functional, relational, and emotional levels as well as how some health-related differences between human-AI and human-human interactions affect patients’ evaluations of the medical consultation. Methods: A 3 (consultation source: AI vs human-involved AI vs human) × 2 (health-related stigma: low vs high) × 2 (diagnosis explanation: without vs with explanation) factorial experiment was conducted with 249 participants. The main effects and interaction effects of the variables were examined on individuals’ functional, relational, and emotional evaluations of the medical consultation. Results: Functionally, people trusted the diagnosis of the human physician (mean 4.78-4.85, SD 0.06-0.07) more than medical AI (mean 4.34-4.55, SD 0.06-0.07) or human-involved AI (mean 4.39-4.56, SD 0.06-0.07; P<.001), but at the relational and emotional levels, there was no significant difference between human-AI and human-human interactions (P>.05). Health-related stigma had no significant effect on how people evaluated the medical consultation or contributed to preferring AI-powered systems over humans (P>.05); however, providing explanations of the diagnosis significantly improved the functional (P<.001), relational (P<.05), and emotional (P<.05) evaluations of the consultation for all 3 medical agents. Conclusions: The findings imply that at the current stage of AI development, people trust human expertise more than accurate AI, especially for decisions traditionally made by humans, such as medical diagnosis, supporting the algorithm aversion theory. 
Surprisingly, even for highly stigmatized diseases such as AIDS, where we assume anonymity and privacy are preferred in medical consultations, the dehumanization of AI does not contribute significantly to the preference for AI-powered medical agents over humans, suggesting that instrumental needs of diagnosis override patient privacy concerns. Furthermore, explaining the diagnosis effectively improves treatment adherence, strengthens the physician-patient relationship, and fosters positive emotions during the consultation. This provides insights for the design of AI medical agents, which have long been criticized for lacking transparency while making highly consequential decisions. This study concludes by outlining theoretical contributions to research on health communication and human-AI interaction and discusses the implications for the design and application of medical AI. %M 40053785 %R 10.2196/66760 %U https://www.jmir.org/2025/1/e66760 %U https://doi.org/10.2196/66760 %U http://www.ncbi.nlm.nih.gov/pubmed/40053785 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e54625 %T Predicting Overall Survival in Patients with Male Breast Cancer: Nomogram Development and External Validation Study %A Tang,Wen-Zhen %A Mo,Shu-Tian %A Xie,Yuan-Xi %A Wei,Tian-Fu %A Chen,Guo-Lian %A Teng,Yan-Juan %A Jia,Kui %K male breast cancer %K specific survival %K prediction model %K nomogram %K Surveillance, Epidemiology, and End Results database %K SEER database %D 2025 %7 4.3.2025 %9 %J JMIR Cancer %G English %X Background: Male breast cancer (MBC) is an uncommon disease. Few studies have discussed the prognosis of MBC due to its rarity. Objective: This study aimed to develop a nomogram to predict the overall survival of patients with MBC and externally validate it using cases from China. Methods: Based on the Surveillance, Epidemiology, and End Results (SEER) database, male patients who were diagnosed with breast cancer between January 2010, and December 2015, were enrolled. These patients were randomly assigned to either a training set (n=1610) or a validation set (n=713) in a 7:3 ratio. Additionally, 22 MBC cases diagnosed at the First Affiliated Hospital of Guangxi Medical University between January 2013 and June 2021 were used for external validation, with the follow-up endpoint being June 10, 2023. Cox regression analysis was performed to identify significant risk variables and construct a nomogram to predict the overall survival of patients with MBC. Information collected from the test set was applied to validate the model. The concordance index (C-index), receiver operating characteristic (ROC) curve, decision curve analysis (DCA), and a Kaplan-Meier survival curve were used to evaluate the accuracy and reliability of the model. Results: A total of 2301 patients with MBC in the SEER database and 22 patients with MBC from the study hospital were included. The predictive model included 7 variables: age (hazard ratio [HR] 1.89, 95% CI 1.50‐2.38), surgery (HR 0.38, 95% CI 0.29‐0.51), marital status (HR 0.75, 95% CI 0.63‐0.89), tumor stage (HR 1.17, 95% CI 1.05‐1.29), clinical stage (HR 1.41, 95% CI 1.15‐1.74), chemotherapy (HR 0.62, 95% CI 0.50‐0.75), and HER2 status (HR 2.68, 95% CI 1.20‐5.98). The C-index was 0.72, 0.747, and 0.981 in the training set, internal validation set, and external validation set, respectively. The nomogram showed accurate calibration, and the ROC curve confirmed the advantage of the model in clinical validity. 
The DCA analysis indicated that the model had good clinical applicability. Furthermore, the nomogram classification allowed for more accurate differentiation of risk subgroups, and patients with low-risk MBC demonstrated substantially improved survival outcomes compared with medium- and high-risk patients (P<.001). Conclusions: A survival prognosis prediction nomogram with 7 variables for patients with MBC was constructed in this study. The model can predict the survival outcome of these patients and provide a scientific basis for clinical diagnosis and treatment. %R 10.2196/54625 %U https://cancer.jmir.org/2025/1/e54625 %U https://doi.org/10.2196/54625 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e53892 %T Future Use of AI in Diagnostic Medicine: 2-Wave Cross-Sectional Survey Study %A Cabral,Bernardo Pereira %A Braga,Luiza Amara Maciel %A Conte Filho,Carlos Gilbert %A Penteado,Bruno %A Freire de Castro Silva,Sandro Luis %A Castro,Leonardo %A Fornazin,Marcelo %A Mota,Fabio %+ Cellular Communication Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Avenida Brasil, 4365, Manguinhos, Rio de Janeiro, 21040-900, Brazil, 55 2125984220, fabio.mota@fiocruz.br %K artificial intelligence %K AI %K diagnostic medicine %K survey research %K researcher opinion %K future %D 2025 %7 27.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The rapid evolution of artificial intelligence (AI) presents transformative potential for diagnostic medicine, offering opportunities to enhance diagnostic accuracy, reduce costs, and improve patient outcomes. Objective: This study aimed to assess the expected future impact of AI on diagnostic medicine by comparing global researchers’ expectations using 2 cross-sectional surveys. Methods: The surveys were conducted in September 2020 and February 2023. Each survey captured a 10-year projection horizon, gathering insights from >3700 researchers with expertise in AI and diagnostic medicine from all over the world. The survey sought to understand the perceived benefits, integration challenges, and evolving attitudes toward AI use in diagnostic settings. Results: Results indicated a strong expectation among researchers that AI will substantially influence diagnostic medicine within the next decade. Key anticipated benefits include enhanced diagnostic reliability, reduced screening costs, improved patient care, and decreased physician workload, addressing the growing demand for diagnostic services outpacing the supply of medical professionals. Specifically, x-ray diagnosis, heart rhythm interpretation, and skin malignancy detection were identified as the diagnostic tools most likely to be integrated with AI technologies due to their maturity and existing AI applications. The surveys highlighted the growing optimism regarding AI’s ability to transform traditional diagnostic pathways and enhance clinical decision-making processes. Furthermore, the study identified barriers to the integration of AI in diagnostic medicine. The primary challenges cited were the difficulties of embedding AI within existing clinical workflows, ethical and regulatory concerns, and data privacy issues. Respondents emphasized uncertainties around legal responsibility and accountability for AI-supported clinical decisions, data protection challenges, and the need for robust regulatory frameworks to ensure safe AI deployment. 
Ethical concerns, particularly those related to algorithmic transparency and bias, were noted as increasingly critical, reflecting a heightened awareness of the potential risks associated with AI adoption in clinical settings. Differences between the 2 survey waves indicated a growing focus on ethical and regulatory issues, suggesting an evolving recognition of these challenges over time. Conclusions: Despite these barriers, there was notable consistency in researchers’ expectations across the 2 survey periods, indicating a stable and sustained outlook on AI’s transformative potential in diagnostic medicine. The findings show the need for interdisciplinary collaboration among clinicians, AI developers, and regulators to address ethical and practical challenges while maximizing AI’s benefits. This study offers insights into the projected trajectory of AI in diagnostic medicine, guiding stakeholders, including health care providers, policy makers, and technology developers, on navigating the opportunities and challenges of AI integration. %M 40053779 %R 10.2196/53892 %U https://www.jmir.org/2025/1/e53892 %U https://doi.org/10.2196/53892 %U http://www.ncbi.nlm.nih.gov/pubmed/40053779 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67010 %T Stroke Diagnosis and Prediction Tool Using ChatGLM: Development and Validation Study %A Song,Xiaowei %A Wang,Jiayi %A He,Feifei %A Yin,Wei %A Ma,Weizhi %A Wu,Jian %+ Department of Neurology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, No.168 of Litang Road, Beijing, 102218, China, 86 01056118918, wujianxuanwu@126.com %K stroke %K diagnosis %K large language model %K ChatGLM %K generative language model %K primary care %K acute stroke %K prediction tool %K stroke detection %K treatment %K electronic health records %K noncontrast computed tomography %D 2025 %7 26.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Stroke is a globally prevalent disease that imposes a significant burden on health care systems and national economies. Accurate and rapid stroke diagnosis can substantially increase reperfusion rates, mitigate disability, and reduce mortality. However, there are considerable discrepancies in the diagnosis and treatment of acute stroke. Objective: The aim of this study is to develop and validate a stroke diagnosis and prediction tool using ChatGLM-6B, which uses free-text information from electronic health records in conjunction with noncontrast computed tomography (NCCT) reports to enhance stroke detection and treatment. Methods: A large language model (LLM) using ChatGLM-6B was proposed to facilitate stroke diagnosis by identifying optimal input combinations, using external tools, and applying instruction tuning and low-rank adaptation (LoRA) techniques. A dataset containing details of 1885 patients with and those without stroke from 2016 to 2024 was used for training and internal validation; another 335 patients from two hospitals were used as an external test set, including 230 patients from the training hospital but admitted at different periods, and 105 patients from another hospital. Results: The LLM, which is based on clinical notes and NCCT, demonstrates exceptionally high accuracy in stroke diagnosis, achieving 99% in the internal validation dataset and 95.5% and 79.1% in two external test cohorts. It effectively distinguishes between ischemia and hemorrhage, with an accuracy of 100% in the validation dataset and 99.1% and 97.1% in the other test cohorts. 
In addition, it identifies large vessel occlusions (LVO) with an accuracy of 80% in the validation dataset and 88.6% and 83.3% in the other test cohorts. Furthermore, it screens patients eligible for intravenous thrombolysis (IVT) with an accuracy of 89.4% in the validation dataset and 60% and 80% in the other test cohorts. Conclusions: We developed an LLM that leverages clinical text and NCCT to identify strokes and guide recanalization therapy. While our results necessitate validation through widespread deployment, they hold the potential to enhance stroke identification and reduce reperfusion time. %M 40009850 %R 10.2196/67010 %U https://www.jmir.org/2025/1/e67010 %U https://doi.org/10.2196/67010 %U http://www.ncbi.nlm.nih.gov/pubmed/40009850 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e55492 %T Complete Blood Count and Monocyte Distribution Width–Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study %A Campagner,Andrea %A Agnello,Luisa %A Carobene,Anna %A Padoan,Andrea %A Del Ben,Fabio %A Locatelli,Massimo %A Plebani,Mario %A Ognibene,Agostino %A Lorubbio,Maria %A De Vecchi,Elena %A Cortegiani,Andrea %A Piva,Elisa %A Poz,Donatella %A Curcio,Francesco %A Cabitza,Federico %A Ciaccio,Marcello %+ Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Piazza dell'Ateneo Nuovo, 1, Milano, 20126, Italy, 39 0264487888, federico.cabitza@unimib.it %K sepsis %K medical machine learning %K external validation %K complete blood count %K controllable AI %K machine learning %K artificial intelligence %K development study %K validation study %K organ %K organ dysfunction %K detection %K clinical signs %K clinical symptoms %K biomarker %K diagnostic %K machine learning model %K sepsis detection %K early detection %K data distribution %D 2025 %7 26.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving the patient outcome. Laboratory medicine can play a crucial role by providing biomarkers whose alteration can be detected before the onset of clinical signs and symptoms. In particular, the relevance of monocyte distribution width (MDW) as a sepsis biomarker has emerged in the previous decade. However, despite encouraging results, MDW has poor sensitivity and positive predictive value when compared to other biomarkers. Objective: This study aims to investigate the use of machine learning (ML) to overcome the limitations mentioned earlier by combining different parameters and therefore improving sepsis detection. However, making ML models function in clinical practice may be problematic, as their performance may suffer when deployed in contexts other than the research environment. In fact, even widely used commercially available models have been demonstrated to generalize poorly in out-of-distribution scenarios. Methods: In this multicentric study, we developed ML models whose intended use is the early detection of sepsis on the basis of MDW and complete blood count parameters. In total, data from 6 patient cohorts (encompassing 5344 patients) collected at 5 different Italian hospitals were used to train and externally validate ML models. 
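As an illustration of the train-on-one-cohort, externally-validate-on-others setup just described, the following is a minimal hypothetical sketch (not the authors' code): the cohort data are synthetic, the feature layout is invented, and logistic regression is used purely as a stand-in classifier.

```python
# Hypothetical sketch: train a CBC/MDW-based classifier on one cohort and
# externally validate it on other cohorts, reporting AUROC per cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Synthetic cohort: 5 CBC-like features plus MDW, with an optional covariate shift."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 6))
    logits = 1.5 * X[:, 5] + 0.5 * X[:, 0] - 0.3 * X[:, 1]  # MDW-like column drives risk here
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

X_train, y_train = make_cohort(2000)                 # training cohort (eg, emergency department)
external = {"cohort_A": make_cohort(800, 0.3),       # external cohorts with distribution shifts
            "cohort_B": make_cohort(600, -0.2)}

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

for name, (X_ext, y_ext) in external.items():
    auroc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")
```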
The models were trained on a cohort of patients enrolled at the emergency department and externally validated on 5 different cohorts encompassing patients enrolled at both the emergency department and the intensive care unit. The cohorts were selected to exhibit a variety of data distribution shifts compared to the training set, including label, covariate, and missing data shifts, enabling a conservative validation of the developed models. To improve generalizability and robustness to different types of distribution shifts, the developed ML models combine traditional methodologies with advanced techniques inspired by controllable artificial intelligence (AI), namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides health operators with useful information about the models’ functioning. Results: The developed models achieved good performance on the internal validation (area under the receiver operating characteristic curve between 0.91 and 0.98), as well as consistent generalization performance across the external validation datasets (area under the receiver operating characteristic curve between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques were further able to improve performance and were used to derive an interpretable set of diagnostic rules. Conclusions: Our findings demonstrate how controllable AI approaches based on complete blood count and MDW may be used for the early detection of sepsis while also demonstrating how the proposed methodology can be used to develop ML models that are more resistant to different types of data distribution shifts. %M 40009841 %R 10.2196/55492 %U https://www.jmir.org/2025/1/e55492 %U https://doi.org/10.2196/55492 %U http://www.ncbi.nlm.nih.gov/pubmed/40009841 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62851 %T Artificial Intelligence in Lymphoma Histopathology: Systematic Review %A Fu,Yao %A Huang,Zongyao %A Deng,Xudong %A Xu,Linna %A Liu,Yang %A Zhang,Mingxing %A Liu,Jinyi %A Huang,Bin %+ Department of Pathology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, South Renmin Road, Chengdu, 610041, China, 86 18236170185, 18236170185@163.com %K lymphoma %K artificial intelligence %K bias %K histopathology %K tumor %K hematological %K lymphatic disease %K public health %K pathologists %K pathology %K immunohistochemistry %K diagnosis %K prognosis %D 2025 %7 14.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included studies applying AI to human lymphoma tissue pathology images for diagnosis, prognosis, gene mutation prediction, and related tasks. 
The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all the articles presented an unclear risk of bias in at least one domain, with the most frequent being participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. In the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3% to 100%. In models with external validation results, the AUC ranged from 0.93 to 0.99. Conclusions: From a methodological perspective, all models exhibited biases. The enhancement of the accuracy of AI models and the acceleration of their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. %M 39951716 %R 10.2196/62851 %U https://www.jmir.org/2025/1/e62851 %U https://doi.org/10.2196/62851 %U http://www.ncbi.nlm.nih.gov/pubmed/39951716 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59015 %T A New Computer-Based Cognitive Measure for Early Detection of Dementia Risk (Japan Cognitive Function Test): Validation Study %A Shimada,Hiroyuki %A Doi,Takehiko %A Tsutsumimoto,Kota %A Makino,Keitaro %A Harada,Kenji %A Tomida,Kouki %A Morikawa,Masanori %A Makizako,Hyuma %+ Department of Preventive Gerontology, Centre for Gerontology and Social Science, National Center for Geriatrics and Gerontology, 7-430, Morioka-cho, Ōbu, 474-8511, Japan, 81 0562 46 2311, shimada@ncgg.go.jp %K cognition %K neurocognitive test %K dementia %K Alzheimer disease %K aged %K MMSE %K cognitive impairment %K Mini-Mental State Examination %K monitoring %K eHealth %D 2025 %7 14.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The emergence of disease-modifying treatment options for Alzheimer disease is creating a paradigm shift in strategies to identify patients with mild symptoms in primary care settings. 
Systematic reviews on digital cognitive tests reported that most showed diagnostic performance comparable with that of paper-and-pencil tests for mild cognitive impairment and dementia. However, most studies have small sample sizes, with fewer than 100 individuals, and are based on case-control or cross-sectional designs. Objective: This study aimed to examine the predictive validity of the Japanese Cognitive Function Test (J-Cog), a new computerized cognitive battery test, for dementia development. Methods: We randomly assigned 2520 older adults (average age 72.7, SD 6.7 years) to derivation and validation groups to determine and validate cutoff points for the onset of dementia. The Mini-Mental State Examination (MMSE) was used for comparison purposes. The J-Cog consists of 12 tasks that assess orientation, designation, attention and calculation, mental rotation, verbal fluency, sentence completion, working memory, logical reasoning, attention, common knowledge, word memory recall, and episodic memory recall. The onset of dementia was monitored for 60 months. In the derivation group, receiver operating characteristic curves were plotted to determine the MMSE and J-Cog cutoff points that best discriminated between the groups with and without dementia. In the validation group, Cox proportional hazards regression models were developed to predict the associations of the group classified using the cutoff points of the J-Cog or MMSE with dementia incidence. The Harrell C-statistic was estimated to summarize how well a predicted risk score described an observed sequence of events. The Akaike information criterion was calculated for relative goodness of fit, where lower absolute values indicate a better model fit. Results: Significant hazard ratios (HRs) for dementia incidence were found using the MMSE cutoff between 23 and 24 points (HR 1.93, 95% CI 1.13-3.27) and the J-Cog cutoff between 43 and 44 points (HR 2.42, 95% CI 1.50-3.93). In the total validation group, the C-statistic was above 0.8 for all cutoff points. The Akaike information criterion, with the MMSE cutoff between 23 and 24 points as a reference, showed a poor fit for the MMSE cutoff between 28 and 29 points, and a good fit for the J-Cog cutoff between 43 and 44 points. Conclusions: The J-Cog has higher accuracy in predicting the development of dementia than the MMSE and has advantages for use in the community as a test of cognitive function, which can be administered by nonprofessionals. %M 39951718 %R 10.2196/59015 %U https://www.jmir.org/2025/1/e59015 %U https://doi.org/10.2196/59015 %U http://www.ncbi.nlm.nih.gov/pubmed/39951718 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 4 %N %P e64624 %T Exploring Speech Biosignatures for Traumatic Brain Injury and Neurodegeneration: Pilot Machine Learning Study %A Rubaiat,Rahmina %A Templeton,John Michael %A Schneider,Sandra L %A De Silva,Upeka %A Madanian,Samaneh %A Poellabauer,Christian %K speech biosignatures %K speech feature analysis %K amyotrophic lateral sclerosis %K ALS %K neurodegenerative disease %K Parkinson's disease %K detection %K speech %K neurological %K traumatic brain injury %K concussion %K mobile device %K digital health %K machine learning %K mobile health %K diagnosis %K mobile phone %D 2025 %7 12.2.2025 %9 %J JMIR Neurotech %G English %X Background: Speech features are increasingly linked to neurodegenerative and mental health conditions, offering the potential for early detection and differentiation between disorders. 
As interest in speech analysis grows, distinguishing between conditions becomes critical for reliable diagnosis and assessment. Objective: This pilot study explores speech biosignatures in two distinct neurological conditions: (1) mild traumatic brain injuries (eg, concussions) and (2) Parkinson disease (PD) as the neurodegenerative condition. Methods: The study included speech samples from 235 participants (97 concussed and 94 age-matched healthy controls, 29 PD and 15 healthy controls) for the PaTaKa test and 239 participants (91 concussed and 104 healthy controls, 29 PD and 15 healthy controls) for the Sustained Vowel (/ah/) test. Age-matched healthy controls were used: young controls for the concussion group and respective age-matched controls for the neurodegenerative participants (15 healthy samples for both tests). Data augmentation with noise was applied to balance small datasets for neurodegenerative and healthy controls. Machine learning models (support vector machine, decision tree, random forest, and Extreme Gradient Boosting) were employed using 37 temporal and spectral speech features. A 5-fold stratified cross-validation was used to evaluate classification performance. Results: For the PaTaKa test, classifiers performed well, achieving F1-scores above 0.9 for concussed versus healthy and concussed versus neurodegenerative classifications across all models. Initial tests using the original dataset for neurodegenerative versus healthy classification yielded very poor results, with F1-scores below 0.2 and accuracy under 30% (eg, below 12 out of 44 correctly classified samples) across all models. This underscored the need for data augmentation, which significantly improved accuracy to 60%-70% (eg, 26-31 out of 44 samples correctly classified). In contrast, the Sustained Vowel test showed mixed results; F1-scores remained high (more than 0.85 across all models) for concussed versus neurodegenerative classifications but were significantly lower for concussed versus healthy (0.59-0.62) and neurodegenerative versus healthy (0.33-0.77), depending on the model. Conclusions: This study highlights the potential of speech features as biomarkers for neurodegenerative conditions. The PaTaKa test exhibited strong discriminative ability, especially for concussed versus neurodegenerative and concussed versus healthy tasks, whereas challenges remain for neurodegenerative versus healthy classification. These findings emphasize the need for further exploration of speech-based tools for differential diagnosis and early identification in neurodegenerative health. 
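A minimal sketch of the kind of pipeline described in this abstract, assuming scikit-learn: the 37 speech features are simulated, Gaussian noise augmentation is applied only to the training folds (the authors' exact augmentation procedure is not specified here), and GradientBoostingClassifier stands in for Extreme Gradient Boosting.

```python
# Hypothetical sketch: noise-based augmentation of a small dataset inside a
# stratified 5-fold cross-validation loop, with several classifiers.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 37))          # simulated temporal/spectral speech features
y = np.array([0] * 30 + [1] * 30)      # 0 = healthy, 1 = neurodegenerative (toy labels)

def augment_with_noise(X, y, copies=3, sigma=0.05):
    """Create jittered copies of the training samples to enlarge a small dataset."""
    X_aug = [X] + [X + rng.normal(scale=sigma, size=X.shape) for _ in range(copies)]
    return np.vstack(X_aug), np.tile(y, copies + 1)

models = {"SVM": SVC(), "DT": DecisionTreeClassifier(),
          "RF": RandomForestClassifier(), "GB": GradientBoostingClassifier()}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        # Augment the training fold only, so the held-out fold stays untouched.
        X_tr, y_tr = augment_with_noise(X[train_idx], y[train_idx])
        model.fit(X_tr, y_tr)
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    print(f"{name}: mean F1 = {np.mean(scores):.2f}")
```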
%R 10.2196/64624 %U https://neuro.jmir.org/2025/1/e64624 %U https://doi.org/10.2196/64624 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e63149 %T Harnessing Internet Search Data as a Potential Tool for Medical Diagnosis: Literature Review %A Downing,Gregory J %A Tramontozzi,Lucas M %A Garcia,Jackson %A Villanueva,Emma %+ Innovation Horizons, Inc, 2819 27th Street, NW, Washington, DC, 20008, United States, 1 (301) 675 1346, gregory.downing@innovationhorizons.net %K health %K informatics %K internet search data %K early diagnosis %K web search %K information technology %K internet %K machine learning %K medical records %K diagnosis %K health care %K self-diagnosis %K detection %K intervention %K patient education %K internet search %K health-seeking behavior %K artificial intelligence %K AI %D 2025 %7 11.2.2025 %9 Review %J JMIR Ment Health %G English %X Background: The integration of information technology into health care has created opportunities to address diagnostic challenges. Internet searches, representing a vast source of health-related data, hold promise for improving early disease detection. Studies suggest that patterns in search behavior can reveal symptoms before clinical diagnosis, offering potential for innovative diagnostic tools. Leveraging advancements in machine learning, researchers have explored linking search data with health records to enhance screening and outcomes. However, challenges like privacy, bias, and scalability remain critical to its widespread adoption. Objective: We aimed to explore the potential and challenges of using internet search data in medical diagnosis, with a specific focus on diseases and conditions such as cancer, cardiovascular disease, mental and behavioral health, neurodegenerative disorders, and nutritional and metabolic diseases. We examined ethical, technical, and policy considerations while assessing the current state of research, identifying gaps and limitations, and proposing future research directions to advance this emerging field. Methods: We conducted a comprehensive analysis of peer-reviewed literature and informational interviews with subject matter experts to examine the landscape of internet search data use in medical research. We searched for published peer-reviewed literature on the PubMed database between October and December 2023. Results: Systematic selection based on predefined criteria included 40 articles from the 2499 identified articles. The analysis revealed a nascent domain of internet search data research in medical diagnosis, marked by advancements in analytics and data integration. Despite challenges such as bias, privacy, and infrastructure limitations, emerging initiatives could reshape data collection and privacy safeguards. Conclusions: We identified signals correlating with diagnostic considerations in certain diseases and conditions, indicating the potential for such data to enhance clinical diagnostic capabilities. However, leveraging internet search data for improved early diagnosis and health care outcomes requires effectively addressing ethical, technical, and policy challenges. By fostering interdisciplinary collaboration, advancing infrastructure development, and prioritizing patient engagement and consent, researchers can unlock the transformative potential of internet search data in medical diagnosis to ultimately enhance patient care and advance health care practice and policy. 
%M 39813106 %R 10.2196/63149 %U https://mental.jmir.org/2025/1/e63149 %U https://doi.org/10.2196/63149 %U http://www.ncbi.nlm.nih.gov/pubmed/39813106 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e64414 %T Physician Perspectives on the Potential Benefits and Risks of Applying Artificial Intelligence in Psychiatric Medicine: Qualitative Study %A Stroud,Austin M %A Curtis,Susan H %A Weir,Isabel B %A Stout,Jeremiah J %A Barry,Barbara A %A Bobo,William V %A Athreya,Arjun P %A Sharp,Richard R %+ Biomedical Ethics Program, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 507 538 6502, sharp.richard@mayo.edu %K artificial intelligence %K machine learning %K digital health %K mental health %K psychiatry %K depression %K interviews %K family medicine %K physicians %K qualitative %K providers %K attitudes %K opinions %K perspectives %K ethics %D 2025 %7 10.2.2025 %9 Original Paper %J JMIR Ment Health %G English %X Background: As artificial intelligence (AI) tools are integrated more widely in psychiatric medicine, it is important to consider the impact these tools will have on clinical practice. Objective: This study aimed to characterize physician perspectives on the potential impact AI tools will have in psychiatric medicine. Methods: We interviewed 42 physicians (21 psychiatrists and 21 family medicine practitioners). These interviews used detailed clinical case scenarios involving the use of AI technologies in the evaluation, diagnosis, and treatment of psychiatric conditions. Interviews were transcribed and subsequently analyzed using qualitative analysis methods. Results: Physicians highlighted multiple potential benefits of AI tools, including potential support for optimizing pharmaceutical efficacy, reducing administrative burden, aiding shared decision-making, and increasing access to health services, and were optimistic about the long-term impact of these technologies. This optimism was tempered by concerns about potential near-term risks to both patients and themselves including misguiding clinical judgment, increasing clinical burden, introducing patient harms, and creating legal liability. Conclusions: Our results highlight the importance of considering specialist perspectives when deploying AI tools in psychiatric medicine. %M 39928397 %R 10.2196/64414 %U https://mental.jmir.org/2025/1/e64414 %U https://doi.org/10.2196/64414 %U http://www.ncbi.nlm.nih.gov/pubmed/39928397 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e64396 %T The Efficacy of Conversational AI in Rectifying the Theory-of-Mind and Autonomy Biases: Comparative Analysis %A Rządeczka,Marcin %A Sterna,Anna %A Stolińska,Julia %A Kaczyńska,Paulina %A Moskalewicz,Marcin %+ Institute of Philosophy, Maria Curie-Skłodowska University, Pl. Marii Curie-Skłodowskiej 4, pok. 204, Lublin, 20-031, Poland, 48 815375481, marcin.rzadeczka@umcs.pl %K cognitive bias %K conversational artificial intelligence %K artificial intelligence %K AI %K chatbots %K digital mental health %K bias rectification %K affect recognition %D 2025 %7 7.2.2025 %9 Original Paper %J JMIR Ment Health %G English %X Background: The increasing deployment of conversational artificial intelligence (AI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. 
These biases are particularly relevant in mental health contexts as they can exacerbate conditions such as depression and anxiety by reinforcing maladaptive thought patterns or unrealistic expectations in human-AI interactions. Objective: This study aimed to assess the effectiveness of therapeutic chatbots (Wysa and Youper) versus general-purpose language models (GPT-3.5, GPT-4, and Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods: This study used constructed case scenarios simulating typical user-bot interactions to examine how effectively chatbots address selected cognitive biases. The cognitive biases assessed included theory-of-mind biases (anthropomorphism, overtrust, and attribution) and autonomy biases (illusion of control, fundamental attribution error, and just-world hypothesis). Each chatbot response was evaluated based on accuracy, therapeutic quality, and adherence to cognitive behavioral therapy principles using an ordinal scale to ensure consistency in scoring. To enhance reliability, responses underwent a double review process by 2 cognitive scientists, followed by a secondary review by a clinical psychologist specializing in cognitive behavioral therapy, ensuring a robust assessment across interdisciplinary perspectives. Results: This study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, whereas the therapeutic bot Wysa scored the lowest. Notably, general-purpose bots showed more consistent accuracy and adaptability in recognizing and addressing bias-related cues across different contexts, suggesting a broader flexibility in handling complex cognitive patterns. In addition, in affect recognition tasks, general-purpose chatbots not only excelled but also demonstrated quicker adaptation to subtle emotional nuances, outperforming therapeutic bots in 67% (4/6) of the tested biases. Conclusions: This study shows that, while therapeutic chatbots hold promise for mental health support and cognitive bias intervention, their current capabilities are limited. Addressing cognitive biases in AI-human interactions requires systems that can both rectify and analyze biases as integral to human cognition, promoting precision and simulating empathy. The findings reveal the need for improved simulated emotional intelligence in chatbot design to provide adaptive, personalized responses that reduce overreliance and encourage independent coping skills. Future research should focus on enhancing affective response mechanisms and addressing ethical concerns such as bias mitigation and data privacy to ensure safe, effective AI-based mental health support. 
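As a small illustration of how double-reviewed ordinal ratings like those described above might be aggregated and checked for agreement (a hypothetical sketch only; the rating scale, scores, and chatbot names are invented and are not the authors' data):

```python
# Hypothetical sketch: mean ordinal score per chatbot plus quadratic-weighted
# kappa between two raters as an agreement check.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Ordinal ratings (0-3) given by two reviewers to each chatbot across 6 bias scenarios.
ratings = {
    "general_purpose_bot": {"rater1": [3, 2, 3, 3, 2, 3], "rater2": [3, 3, 3, 2, 2, 3]},
    "therapeutic_bot":     {"rater1": [1, 2, 1, 2, 1, 1], "rater2": [2, 2, 1, 1, 1, 2]},
}

for bot, r in ratings.items():
    mean_score = np.mean(r["rater1"] + r["rater2"])
    kappa = cohen_kappa_score(r["rater1"], r["rater2"], weights="quadratic")
    print(f"{bot}: mean rating = {mean_score:.2f}, weighted kappa = {kappa:.2f}")
```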
%M 39919295 %R 10.2196/64396 %U https://mental.jmir.org/2025/1/e64396 %U https://doi.org/10.2196/64396 %U http://www.ncbi.nlm.nih.gov/pubmed/39919295 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e53741 %T Mapping and Summarizing the Research on AI Systems for Automating Medical History Taking and Triage: Scoping Review %A Siira,Elin %A Johansson,Hanna %A Nygren,Jens %+ School of Health and Welfare, Halmstad University, Box 823, Halmstad, 301 18, Sweden, 46 70 692 46 13, elin.siira@hh.se %K scoping review %K artificial intelligence %K AI %K medical history taking %K triage %K health care %K automation %D 2025 %7 6.2.2025 %9 Review %J J Med Internet Res %G English %X Background: The integration of artificial intelligence (AI) systems for automating medical history taking and triage can significantly enhance patient flow in health care systems. Despite the promising performance of numerous AI studies, only a limited number of these systems have been successfully integrated into routine health care practice. To elucidate how AI systems can create value in this context, it is crucial to identify the current state of knowledge, including the readiness of these systems, the facilitators of and barriers to their implementation, and the perspectives of various stakeholders involved in their development and deployment. Objective: This study aims to map and summarize empirical research on AI systems designed for automating medical history taking and triage in health care settings. Methods: The study was conducted following the framework proposed by Arksey and O’Malley and adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guidelines. A comprehensive search of 5 databases—PubMed, CINAHL, PsycINFO, Scopus, and Web of Science—was performed. A detailed protocol was established before the review to ensure methodological rigor. Results: A total of 1248 research publications were identified and screened. Of these, 86 (6.89%) met the eligibility criteria. Notably, most (n=63, 73%) studies were published between 2020 and 2022, with a significant concentration on emergency care (n=32, 37%). Other clinical contexts included radiology (n=12, 14%) and primary care (n=6, 7%). Many (n=15, 17%) studies did not specify a clinical context. Retrospective designs were the most commonly specified (n=31, 36%), while other studies (n=34, 40%) did not specify their methodologies. The predominant type of AI system identified was the hybrid model (n=68, 79%), with forecasting (n=40, 47%) and recognition (n=36, 42%) being the most common tasks performed. While most (n=70, 81%) studies included patient populations, only 1 (1%) study investigated patients’ views on AI-based medical history taking and triage, and 2 (2%) studies considered health care professionals’ perspectives. Furthermore, only 6 (7%) studies validated or demonstrated AI systems in relevant clinical settings through real-time model testing, workflow implementation, clinical outcome evaluation, or integration into practice. Most (n=76, 88%) studies were concerned with the prototyping, development, or validation of AI systems. In total, 4 (5%) studies were reviews of several empirical studies conducted in different clinical settings. The facilitators and barriers to AI system implementation were categorized into 4 themes: technical aspects, contextual and cultural considerations, end-user engagement, and evaluation processes. 
Conclusions: This review highlights current trends, stakeholder perspectives, stages of innovation development, and key influencing factors related to implementing AI systems in health care. The identified literature gaps regarding stakeholder perspectives and the limited research on AI systems for automating medical history taking and triage indicate significant opportunities for further investigation and development in this evolving field. %M 39913918 %R 10.2196/53741 %U https://www.jmir.org/2025/1/e53741 %U https://doi.org/10.2196/53741 %U http://www.ncbi.nlm.nih.gov/pubmed/39913918 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e50124 %T Barriers and Facilitators to the Preadoption of a Computer-Aided Diagnosis Tool for Cervical Cancer: Qualitative Study on Health Care Providers’ Perspectives in Western Cameroon %A Jonnalagedda-Cattin,Magali %A Moukam Datchoua,Alida Manoëla %A Yakam,Virginie Flore %A Kenfack,Bruno %A Petignat,Patrick %A Thiran,Jean-Philippe %A Schönenberger,Klaus %A Schmidt,Nicole C %+ Signal Processing Laboratory LTS5, Swiss Federal Institute of Technology Lausanne (EPFL), EPFL-STI-IEL-LTS5, Station 11, Lausanne, 1015, Switzerland, 41 21 693 97 77, magali.cattin@epfl.ch %K qualitative research %K technology acceptance %K cervical cancer %K diagnosis %K computer-assisted %K decision support systems %K artificial intelligence %K health personnel attitudes %K Cameroon %K mobile phone %D 2025 %7 5.2.2025 %9 Original Paper %J JMIR Cancer %G English %X Background: Computer-aided detection and diagnosis (CAD) systems can enhance the objectivity of visual inspection with acetic acid (VIA), which is widely used in low- and middle-income countries (LMICs) for cervical cancer detection. VIA’s reliance on subjective health care provider (HCP) interpretation introduces variability in diagnostic accuracy. CAD tools can address some limitations; nonetheless, understanding the contextual factors affecting CAD integration is essential for effective adoption and sustained use, particularly in resource-constrained settings. Objective: This study investigated the barriers and facilitators perceived by HCPs in Western Cameroon regarding sustained CAD tool use for cervical cancer detection using VIA. The aim was to guide smooth technology adoption in similar settings by identifying specific barriers and facilitators and optimizing CAD’s potential benefits while minimizing obstacles. Methods: The perspectives of HCPs on adopting CAD for VIA were explored using a qualitative methodology. The study participants included 8 HCPs (6 midwives and 2 gynecologists) working in the Dschang district, Cameroon. Focus group discussions were conducted with midwives, while individual interviews were conducted with gynecologists to comprehend unique perspectives. Each interview was audio-recorded, transcribed, and independently coded by 2 researchers using the ATLAS.ti (Lumivero, LLC) software. The technology acceptance lifecycle framework guided the content analysis, focusing on the preadoption phases to examine the perceived acceptability and initial acceptance of the CAD tool in clinical workflows. The study findings were reported adhering to the COREQ (Consolidated Criteria for Reporting Qualitative Research) and SRQR (Standards for Reporting Qualitative Research) checklists. Results: Key elements influencing the sustained use of CAD tools for VIA by HCPs were identified, primarily within the technology acceptance lifecycle’s preadoption framework. 
Barriers included challenges with the system’s ease of use (particularly those associated with image acquisition), concerns over confidentiality and data security, limited infrastructure and resources such as internet connectivity and device quality, and potential workflow changes. Facilitators encompassed perceived improvements in patient care, the potential for enhanced diagnostic accuracy, and the integration of CAD tools into routine clinical practices, provided that infrastructure and training were adequate. The HCPs emphasized the importance of clinical validation, usability testing, and iterative feedback mechanisms to build trust in the CAD tool’s accuracy and utility. Conclusions: This study provides practical insights from HCPs in Western Cameroon regarding the adoption of CAD tools for VIA in clinical settings. CAD technology can aid diagnostic objectivity; however, data management, workflow adaptation, and infrastructure limitations must be addressed to avoid "pilotitis"—the failure of digital health tools to progress beyond the pilot phase. Effective implementation requires comprehensive technology management, including regulatory compliance, infrastructure support, and user-focused training. Involving end users can ensure that CAD tools are fully integrated and embraced in LMICs to aid cervical cancer screening. %M 39908553 %R 10.2196/50124 %U https://cancer.jmir.org/2025/1/e50124 %U https://doi.org/10.2196/50124 %U http://www.ncbi.nlm.nih.gov/pubmed/39908553 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e66330 %T Estimating the Prevalence of Schizophrenia in the General Population of Japan Using an Artificial Neural Network–Based Schizophrenia Classifier: Web-Based Cross-Sectional Survey %A Choomung,Pichsinee %A He,Yupeng %A Matsunaga,Masaaki %A Sakuma,Kenji %A Kishi,Taro %A Li,Yuanying %A Tanihara,Shinichi %A Iwata,Nakao %A Ota,Atsuhiko %K schizophrenia %K schizophrenic %K prevalence %K artificial neural network %K neural network %K neural networks %K ANN %K deep learning %K machine learning %K SZ classifier %K web-based survey %K epidemiology %K epidemiological %K Japan %K classifiers %K mental illness %K mental disorder %K mental health %D 2025 %7 29.1.2025 %9 %J JMIR Form Res %G English %X Background: Estimating the prevalence of schizophrenia in the general population remains a challenge worldwide, as well as in Japan. The few studies that have estimated schizophrenia prevalence in the Japanese population have often relied on reports from hospitals and self-reported physician diagnoses or typical schizophrenia symptoms. These approaches are likely to underestimate the true prevalence owing to stigma, poor insight, or lack of access to health care among respondents. To address these issues, we previously developed an artificial neural network (ANN)–based schizophrenia classification model (SZ classifier) using data from a large-scale Japanese web-based survey to enhance the comprehensiveness of schizophrenia case identification in the general population. In addition, we planned to introduce a population-based survey to collect general information and sample participants matching the population's demographic structure, thereby achieving a precise estimate of the prevalence of schizophrenia in Japan. Objective: This study aimed to estimate the prevalence of schizophrenia by applying the SZ classifier to random samples from the Japanese population. 
Methods: We randomly selected a sample of 750 participants from a large-scale Japanese web-based survey whose age, sex, and regional distributions were similar to Japan's demographic structure. Demographic data, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities were collected and applied to the SZ classifier, as this information was also used for developing the SZ classifier. The crude prevalence of schizophrenia was calculated as the proportion of positive cases detected by the SZ classifier. The crude estimate was further refined by excluding false-positive cases and including false-negative cases to determine the actual prevalence of schizophrenia. Results: Out of 750 participants, 62 were classified as schizophrenia cases by the SZ classifier, resulting in a crude prevalence of schizophrenia in the general population of Japan of 8.3% (95% CI 6.6%-10.1%). Of these 62 cases, 53 were presumed to be false positives; in addition, 3 participants classified as negative were presumed to be false negatives. After adjustment (62 − 53 + 3 = 12 presumed true cases; 12/750 = 1.6%), the actual prevalence of schizophrenia in the general population was estimated to be 1.6% (95% CI 0.7%-2.5%). Conclusions: This estimated prevalence was slightly higher than that reported in previous studies, possibly due to a more comprehensive disease classification methodology or, conversely, model limitations. This study demonstrates the capability of an ANN-based model to improve the estimation of schizophrenia prevalence in the general population, offering a novel approach to public health analysis. %R 10.2196/66330 %U https://formative.jmir.org/2025/1/e66330 %U https://doi.org/10.2196/66330 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e64188 %T Urgency Prediction for Medical Laboratory Tests Through Optimal Sparse Decision Tree: Case Study With Echocardiograms %A Jiang,Yiqun %A Li,Qing %A Huang,Yu-Li %A Zhang,Wenli %+ Department of Information Systems and Business Analytics, Iowa State University, 2167 Union Drive, Ames, IA, 50011-2027, United States, 1 5152942469, wlzhang@iastate.edu %K interpretable machine learning %K urgency prediction %K appointment scheduling %K echocardiogram %K health care management %D 2025 %7 29.1.2025 %9 Original Paper %J JMIR AI %G English %X Background: In the contemporary realm of health care, laboratory tests stand as cornerstone components, driving the advancement of precision medicine. These tests offer intricate insights into a variety of medical conditions, thereby facilitating diagnosis, prognosis, and treatment. However, the accessibility of certain tests is hindered by factors such as high costs, a shortage of specialized personnel, or geographic disparities, posing obstacles to achieving equitable health care. For example, an echocardiogram is a clinically important laboratory test that is not easily accessible. The increasing demand for echocardiograms underscores the imperative for more efficient scheduling protocols. Despite this pressing need, limited research has been conducted in this area. Objective: The study aims to develop an interpretable machine learning model for determining the urgency of patients requiring echocardiograms, thereby aiding in the prioritization of scheduling procedures. Furthermore, this study aims to glean insights into the pivotal attributes influencing the prioritization of echocardiogram appointments, leveraging the high interpretability of the machine learning model. 
Methods: Empirical and predictive analyses were conducted to assess the urgency of patients based on a large real-world echocardiogram appointment dataset (ie, 34,293 appointments) sourced from electronic health records encompassing administrative information, referral diagnosis, and underlying patient conditions. We used a state-of-the-art interpretable machine learning algorithm, the optimal sparse decision tree (OSDT), renowned for its high accuracy and interpretability, to investigate the attributes pertinent to echocardiogram appointments. Results: The method demonstrated satisfactory performance in comparison to the best-performing baseline model (F1-score=36.18%, an improvement of 1.7%, and F2-score=28.18%, an improvement of 0.79%). Moreover, due to its high interpretability, the results provide valuable medical insights regarding the identification of urgent patients for tests through the extraction of decision rules from the OSDT model. Conclusions: The method demonstrated state-of-the-art predictive performance, affirming its effectiveness. Furthermore, we validated the decision rules derived from the OSDT model by comparing them with established medical knowledge. These interpretable results (eg, attribute importance and decision rules from the OSDT model) underscore the potential of our approach in prioritizing patient urgency for echocardiogram appointments and can be extended to prioritize other laboratory test appointments using electronic health record data. %M 39879091 %R 10.2196/64188 %U https://ai.jmir.org/2025/1/e64188 %U https://doi.org/10.2196/64188 %U http://www.ncbi.nlm.nih.gov/pubmed/39879091 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63109 %T Diagnostic Decision-Making Variability Between Novice and Expert Optometrists for Glaucoma: Comparative Analysis to Inform AI System Design %A Ghaffar,Faisal %A Furtado,Nadine M. %A Ali,Imad %A Burns,Catherine %+ Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, EC4 2121, 295 Phillip St, Waterloo, ON, N2L 3W8, Canada, 1 519 888 4567 ext 33903, catherine.burns@uwaterloo.ca %K decision-making %K human-centered AI design %K human factors %K experts versus novices differences %K optometry %K glaucoma diagnosis %K experts versus novices %K glaucoma %K eye disease %K vision %K vision impairment %K comparative analysis %K methodology %K optometrist %K artificial intelligence %K AI %K diagnostic accuracy %K consistency %K clinical data %K risk assessment %K progression analysis %D 2025 %7 29.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: While expert optometrists tend to rely on a deep understanding of the disease and intuitive pattern recognition, those with less experience may depend more on extensive data, comparisons, and external guidance. Understanding these variations is important for developing artificial intelligence (AI) systems that can effectively support optometrists with varying degrees of experience and minimize decision inconsistencies. Objective: The main objective of this study is to identify and analyze the variations in diagnostic decision-making approaches between novice and expert optometrists. By understanding these variations, we aim to provide guidelines for the development of AI systems that can support optometrists with varying levels of expertise. 
These guidelines will assist in developing AI systems for glaucoma diagnosis, ultimately enhancing the diagnostic accuracy of optometrists and minimizing inconsistencies in their decisions. Methods: We conducted in-depth interviews with 14 optometrists using within-subject design, including both novices and experts, focusing on their approaches to glaucoma diagnosis. The responses were coded and analyzed using a mixed method approach incorporating both qualitative and quantitative analysis. Statistical tests such as Mann-Whitney U and chi-square tests were used to find significance in intergroup variations. These findings were further supported by themes extracted through qualitative analysis, which helped to identify decision-making patterns and understand variations in their approaches. Results: Both groups showed lower concordance rates with clinical diagnosis, with experts showing almost double (7/35, 20%) concordance rates with limited data in comparison to novices (7/69, 10%), highlighting the impact of experience and data availability on clinical judgment; this rate increased to nearly 40% for both groups (experts: 5/12, 42% and novices: 8/21, 42%) when they had access to complete historical data of the patient. We also found statistically significant intergroup differences between the first visits and subsequent visits with a P value of less than .05 on the Mann-Whitney U test in many assessments. Furthermore, approaches to the exam assessment and decision differed significantly: experts emphasized comprehensive risk assessments and progression analysis, demonstrating cognitive efficiency and intuitive decision-making, while novices relied more on structured, analytical methods and external references. Additionally, significant variations in patient follow-up times were observed, with a P value of <.001 on the chi-square test, showing a stronger influence of experience on follow-up time decisions. Conclusions: The study highlights significant variations in the decision-making process of novice and expert optometrists in glaucoma diagnosis, with experience playing a key role in accuracy, approach, and management. These findings demonstrate the critical need for AI systems tailored to varying levels of expertise. They also provide insights for the future design of AI systems aimed at enhancing the diagnostic accuracy of optometrists and consistency across different expertise levels, ultimately improving patient outcomes in optometric practice. %M 39879089 %R 10.2196/63109 %U https://medinform.jmir.org/2025/1/e63109 %U https://doi.org/10.2196/63109 %U http://www.ncbi.nlm.nih.gov/pubmed/39879089 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e62914 %T Preclinical Cognitive Markers of Alzheimer Disease and Early Diagnosis Using Virtual Reality and Artificial Intelligence: Literature Review %A Scribano Parada,María de la Paz %A González Palau,Fátima %A Valladares Rodríguez,Sonia %A Rincon,Mariano %A Rico Barroeta,Maria José %A García Rodriguez,Marta %A Bueno Aguado,Yolanda %A Herrero Blanco,Ana %A Díaz-López,Estela %A Bachiller Mayoral,Margarita %A Losada Durán,Raquel %K dementia %K Alzheimer disease %K mild cognitive impairment %K virtual reality %K artificial intelligence %K early detection %K qualitative review %K literature review %K AI %D 2025 %7 28.1.2025 %9 %J JMIR Med Inform %G English %X Background: This review explores the potential of virtual reality (VR) and artificial intelligence (AI) to identify preclinical cognitive markers of Alzheimer disease (AD). 
By synthesizing recent studies, it aims to advance early diagnostic methods to detect AD before significant symptoms occur. Objective: Research emphasizes the significance of early detection in AD during the preclinical phase, which does not involve cognitive impairment but nevertheless requires reliable biomarkers. Current biomarkers face challenges, prompting the exploration of cognitive behavior indicators beyond episodic memory. Methods: Using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we searched Scopus, PubMed, and Google Scholar for studies on neuropsychiatric disorders utilizing conversational data. Results: Following an analysis of 38 selected articles, we highlight verbal episodic memory as a sensitive preclinical AD marker, with supporting evidence from neuroimaging and genetic profiling. Executive functions precede memory decline, while processing speed is a significant correlate. The potential of VR remains underexplored, and AI algorithms offer a multidimensional approach to early neurocognitive disorder diagnosis. Conclusions: Emerging technologies like VR and AI show promise for preclinical diagnostics, but thorough validation and regulation for clinical safety and efficacy are necessary. Continued technological advancements are expected to enhance early detection and management of AD. %R 10.2196/62914 %U https://medinform.jmir.org/2025/1/e62914 %U https://doi.org/10.2196/62914 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e53928 %T Discrimination of Radiologists' Experience Level Using Eye-Tracking Technology and Machine Learning: Case Study %A Martinez,Stanford %A Ramirez-Tamayo,Carolina %A Akhter Faruqui,Syed Hasib %A Clark,Kal %A Alaeddini,Adel %A Czarnek,Nicholas %A Aggarwal,Aarushi %A Emamzadeh,Sahra %A Mock,Jeffrey R %A Golob,Edward J %+ Department of Mechanical Engineering, Southern Methodist University, 3101 Dyer Street, Dallas, TX, 75205, United States, 1 214 768 3050, aalaeddini@smu.edu %K machine learning %K eye-tracking %K experience level determination %K radiology education %K search pattern feature extraction %K search pattern %K radiology %K classification %K gaze %K fixation %K education %K experience %K spatio-temporal %K image %K x-ray %K eye movement %D 2025 %7 22.1.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Perception-related errors comprise most diagnostic mistakes in radiology. To mitigate this problem, radiologists use personalized and high-dimensional visual search strategies, otherwise known as search patterns. Qualitative descriptions of these search patterns, which involve the physician verbalizing or annotating the order in which he or she analyzes the image, can be unreliable due to discrepancies in what is reported versus the actual visual patterns. This discrepancy can interfere with quality improvement interventions and negatively impact patient care. Objective: The objective of this study is to provide an alternative method for distinguishing between radiologists by means of captured eye-tracking data such that the raw gaze (or processed fixation data) can be used to discriminate users based on subconscious behavior in visual inspection. Methods: We present a novel discretized feature encoding based on spatiotemporal binning of fixation data for efficient geometric alignment and temporal ordering of eye movement when reading chest x-rays. 
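As an illustration of what spatiotemporal binning of fixation data can look like in general, the following is a hypothetical sketch only: the grid size, number of time windows, and fixation format are assumptions for the example, not the encoding the authors actually used.

```python
# Hypothetical sketch: encode (x, y, t) fixations as a flattened histogram over
# a spatial grid crossed with temporal windows.
import numpy as np

def encode_fixations(fixations, img_w, img_h, grid=(4, 4), n_time_bins=5):
    """fixations: array-like of shape (n, 3) with columns x, y, t (t in seconds)."""
    fixations = np.asarray(fixations, dtype=float)
    x_bin = np.clip((fixations[:, 0] / img_w * grid[0]).astype(int), 0, grid[0] - 1)
    y_bin = np.clip((fixations[:, 1] / img_h * grid[1]).astype(int), 0, grid[1] - 1)
    t = fixations[:, 2]
    t_bin = np.clip(((t - t.min()) / (np.ptp(t) + 1e-9) * n_time_bins).astype(int),
                    0, n_time_bins - 1)
    counts = np.zeros((grid[0], grid[1], n_time_bins))
    for xb, yb, tb in zip(x_bin, y_bin, t_bin):
        counts[xb, yb, tb] += 1
    return counts.ravel()  # fixed-length feature vector for a downstream classifier

# Example: a short scanpath on a 1024x1024 chest x-ray.
scanpath = [(100, 200, 0.0), (400, 250, 0.4), (700, 800, 1.1), (720, 820, 1.5)]
features = encode_fixations(scanpath, img_w=1024, img_h=1024)
print(features.shape)  # (4 * 4 * 5,) = (80,)
```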
The encoded features of the eye-fixation data are used by machine learning classifiers to discriminate between faculty and trainee radiologists. A clinical trial case study was conducted using metrics such as the area under the curve, accuracy, F1-score, sensitivity, and specificity to evaluate the discriminability between the 2 groups regarding their level of experience. The classification performance was then compared with state-of-the-art methodologies. In addition, a repeatability experiment using a separate dataset, experimental protocol, and eye tracker was performed with 8 participants to evaluate the robustness of the proposed approach. Results: The numerical results from both experiments demonstrate that classifiers using the proposed feature encoding methods outperform the current state-of-the-art in differentiating between radiologists in terms of experience level. An average performance gain of 6.9% is observed compared with traditional features while classifying experience levels of radiologists. This gain in accuracy is also substantial across different eye tracker–collected datasets, with improvements of 6.41% using the Tobii eye tracker and 7.29% using the EyeLink eye tracker. These results signify the potential impact of the proposed method for identifying radiologists’ level of expertise and those who would benefit from additional training. Conclusions: The effectiveness of the proposed spatiotemporal discretization approach, validated across diverse datasets and various classification metrics, underscores its potential for objective evaluation, informing targeted interventions and training strategies in radiology. This research advances reliable assessment tools, addressing challenges in perception-related errors to enhance patient care outcomes. %M 39842001 %R 10.2196/53928 %U https://formative.jmir.org/2025/1/e53928 %U https://doi.org/10.2196/53928 %U http://www.ncbi.nlm.nih.gov/pubmed/39842001 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e57874 %T AI Machine Learning–Based Diabetes Prediction in Older Adults in South Korea: Cross-Sectional Analysis %A Lee,Hocheol %A Park,Myung-Bae %A Won,Young-Joo %K diabetes %K prediction model %K super-aging population %K extreme gradient boosting model %K geriatrics %K older adults %K aging %K artificial intelligence %K machine learning %D 2025 %7 21.1.2025 %9 %J JMIR Form Res %G English %X Background: Diabetes is prevalent in older adults, and machine learning algorithms could help predict diabetes in this population. Objective: This study determined diabetes risk factors among older adults aged ≥60 years using machine learning algorithms and selected an optimized prediction model. Methods: This cross-sectional study was conducted on 3084 older adults aged ≥60 years in Seoul from January to November 2023. Data were collected using a mobile app (Gosufit) that measured depression, stress, anxiety, basal metabolic rate, oxygen saturation, heart rate, and average daily step count. Health coordinators recorded data on diabetes, hypertension, hyperlipidemia, chronic obstructive pulmonary disease, percent body fat, and percent muscle. The presence of diabetes was the target variable, with various health indicators as predictors. Machine learning algorithms, including random forest, gradient boosting model, light gradient boosting model, extreme gradient boosting model, and k-nearest neighbors, were employed for analysis. The dataset was split into 70% training and 30% testing sets. 
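A minimal sketch, assuming the xgboost and shap packages, of the kind of 70/30 split, gradient-boosted classifier, and SHAP-based interpretation summarized in this abstract; the data are simulated and the feature names are illustrative, so this is not the authors' pipeline.

```python
# Hypothetical sketch: 70/30 split, gradient-boosted classifier, and SHAP-based
# feature attribution.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 3000
X = pd.DataFrame({                       # simulated predictors (names are illustrative)
    "hypertension": rng.integers(0, 2, n),
    "age": rng.integers(60, 90, n),
    "percent_body_fat": rng.normal(28, 6, n),
    "heart_rate": rng.normal(73, 8, n),
    "stress": rng.normal(41, 5, n),
})
logits = 1.2 * X["hypertension"] + 0.04 * (X["age"] - 70) + 0.05 * (X["heart_rate"] - 73) - 2.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
proba = model.predict_proba(X_te)[:, 1]
print("accuracy", accuracy_score(y_te, pred), "F1", f1_score(y_te, pred),
      "AUC", roc_auc_score(y_te, proba))

explainer = shap.TreeExplainer(model)     # feature attributions for the boosted trees
shap_values = explainer.shap_values(X_te)
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False))
```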
Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). Shapley additive explanations (SHAPs) were used for model interpretability. Results: Significant predictors of diabetes included hypertension (χ²1=197.294; P<.001), hyperlipidemia (χ²1=47.671; P<.001), age (mean: diabetes group 72.66 years vs nondiabetes group 71.81 years), stress (mean: diabetes group 42.68 vs nondiabetes group 41.47; t3082=−2.858; P=.004), and heart rate (mean: diabetes group 75.05 beats/min vs nondiabetes group 73.14 beats/min; t3082=−7.948; P<.001). The extreme gradient boosting model (XGBM) demonstrated the best performance, with an accuracy of 84.88%, precision of 77.92%, recall of 66.91%, F1 score of 72.00, and AUC of 0.7957. The SHAP analysis of the top-performing XGBM revealed key predictors for diabetes: hypertension, age, percent body fat, heart rate, hyperlipidemia, basal metabolic rate, stress, and oxygen saturation. Hypertension strongly increased diabetes risk, while advanced age and elevated stress levels also showed significant associations. Hyperlipidemia and higher heart rates further heightened diabetes probability. These results highlight the importance and directional impact of specific features in predicting diabetes, providing valuable insights for risk stratification and targeted interventions. Conclusions: This study focused on modifiable risk factors, providing crucial data for establishing a system for the automated collection of health information and lifelog data from older adults using digital devices at service facilities. %R 10.2196/57874 %U https://formative.jmir.org/2025/1/e57874 %U https://doi.org/10.2196/57874 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e54121 %T Machine Learning for the Early Prediction of Delayed Cerebral Ischemia in Patients With Subarachnoid Hemorrhage: Systematic Review and Meta-Analysis %A Zhang,Haofuzi %A Zou,Peng %A Luo,Peng %A Jiang,Xiaofan %+ Department of Neurosurgery, Xijing Hospital, Fourth Military Medical University, No. 127, Changle West Road, Xincheng District, Xi'an, , China, 86 186 0298 0377, jiangxf@fmmu.edu.cn %K machine learning %K subarachnoid hemorrhage %K delayed cerebral ischemia %K systematic review %D 2025 %7 20.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Delayed cerebral ischemia (DCI) is a primary contributor to death after subarachnoid hemorrhage (SAH), with significant incidence. Therefore, early determination of the risk of DCI is an urgent need. Machine learning (ML) has received much attention in clinical practice. Recently, some studies have attempted to apply ML models for early noninvasive prediction of DCI. However, systematic evidence for its predictive accuracy is still lacking. Objective: The aim of this study was to synthesize the prediction accuracy of ML models for DCI to provide evidence for the development or updating of intelligent detection tools. Methods: PubMed, Cochrane, Embase, and Web of Science databases were systematically searched up to May 18, 2023. The risk of bias in the included studies was assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). During the analysis, we discussed the performance of different models in the training and validation sets. Results: We finally included 48 studies containing 16,294 patients with SAH and 71 ML models with logistic regression as the main model type. 
In the training set, the pooled concordance index (C index), sensitivity, and specificity of all the models were 0.786 (95% CI 0.737-0.835), 0.77 (95% CI 0.69-0.84), and 0.83 (95% CI 0.75-0.89), respectively, while those of the logistic regression models were 0.770 (95% CI 0.724-0.817), 0.75 (95% CI 0.67-0.82), and 0.71 (95% CI 0.63-0.78), respectively. In the validation set, the pooled C index, sensitivity, and specificity of all the models were 0.767 (95% CI 0.741-0.793), 0.66 (95% CI 0.53-0.77), and 0.78 (95% CI 0.71-0.84), respectively, while those of the logistic regression models were 0.757 (95% CI 0.715-0.800), 0.59 (95% CI 0.57-0.80), and 0.80 (95% CI 0.71-0.87), respectively. Conclusions: ML models appear to have relatively desirable power for early noninvasive prediction of DCI after SAH. However, enhancing the prediction sensitivity of these models is challenging. Therefore, efficient, noninvasive, or minimally invasive low-cost predictors should be further explored in future studies to improve the prediction accuracy of ML models. Trial Registration: PROSPERO (CRD42023438399); https://tinyurl.com/yfuuudde %M 39832368 %R 10.2196/54121 %U https://www.jmir.org/2025/1/e54121 %U https://doi.org/10.2196/54121 %U http://www.ncbi.nlm.nih.gov/pubmed/39832368 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63004 %T Clinical Decision Support Using Speech Signal Analysis: Systematic Scoping Review of Neurological Disorders %A De Silva,Upeka %A Madanian,Samaneh %A Olsen,Sharon %A Templeton,John Michael %A Poellabauer,Christian %A Schneider,Sandra L %A Narayanan,Ajit %A Rubaiat,Rahmina %+ Department of Computer Science and Software Engineering, Auckland University of Technology, 55 Wellesley Street East, Auckland CBD, Auckland 1010, Auckland, 1010, New Zealand, 64 09 9219999 ext 6539, sam.madanian@aut.ac.nz %K digital health %K health informatics %K digital biomarker %K speech analytics %K artificial intelligence %K machine learning %D 2025 %7 13.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. Increasing efforts are being made to develop speech-based clinical decision support systems. Objective: This systematic scoping review investigated the technological revolution and recent digital clinical speech signal analysis trends to understand the key concepts and research processes from clinical and technical perspectives. Methods: A systematic scoping review was undertaken in 6 databases guided by a set of research questions. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower scope of studies investigating neurological diseases were analyzed using qualitative content analysis. Results: A total of 389 articles met the initial eligibility criteria, of which 72 (18.5%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. 
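Pooled estimates such as the C-index values reported for the delayed cerebral ischemia review above are typically obtained with a random-effects model. The sketch below applies the DerSimonian-Laird estimator to invented study-level inputs; it is a generic illustration, not the review's actual analysis.

```python
# Generic DerSimonian-Laird random-effects pooling of study-level C-indexes.
# The example inputs are invented; this is not the review's analysis code.
import numpy as np
from scipy import stats

c_index = np.array([0.74, 0.81, 0.78, 0.69, 0.85])    # hypothetical study estimates
se      = np.array([0.03, 0.05, 0.04, 0.06, 0.02])    # hypothetical standard errors

w_fixed = 1.0 / se**2
theta_fixed = np.sum(w_fixed * c_index) / np.sum(w_fixed)

# Between-study variance (DerSimonian-Laird estimator)
q = np.sum(w_fixed * (c_index - theta_fixed) ** 2)
df = len(c_index) - 1
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights, pooled estimate, and 95% CI
w_re = 1.0 / (se**2 + tau2)
pooled = np.sum(w_re * c_index) / np.sum(w_re)
se_pooled = np.sqrt(1.0 / np.sum(w_re))
z = stats.norm.ppf(0.975)
print(f"pooled C-index {pooled:.3f} (95% CI {pooled - z*se_pooled:.3f}-{pooled + z*se_pooled:.3f})")
```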
The literature explored the potential of speech feature analysis in diagnosing neurological conditions, differentiating between them, assessing their severity, and monitoring their treatment. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing–based speech features (such as wavelet transformation–based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and reliability of speech features. Model evaluations primarily focused on analytical validations. A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically. Conclusions: The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance. %M 39804693 %R 10.2196/63004 %U https://www.jmir.org/2025/1/e63004 %U https://doi.org/10.2196/63004 %U http://www.ncbi.nlm.nih.gov/pubmed/39804693 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63924 %T Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis %A Zhang,Yong %A Lu,Xiao %A Luo,Yan %A Zhu,Ying %A Ling,Wenwu %K chatbots %K ChatGPT %K ERNIE Bot %K performance %K accuracy rates %K ultrasound %K language %K examination %D 2025 %7 9.1.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence chatbots are being increasingly used for medical inquiries, particularly in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic. Objective: This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers. Methods: We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored based on accuracy rates, whereas subjective questions were rated by 5 experienced doctors using a Likert scale. The data were analyzed in Excel. Results: Of the 554 questions included in this study, single-choice questions comprised the largest share (354/554, 64%), followed by short answers (69/554, 12%) and noun explanations (63/554, 11%). The accuracy rates for objective questions ranged from 8.33% to 80%, with true or false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P<.05). Both models showed a performance decline in English, but ERNIE Bot’s decline was less significant. The models performed better in terms of basic knowledge, ultrasound methods, and diseases than in terms of ultrasound signs and diagnosis.
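As a rough illustration of the conventional speech features mentioned in the speech-analysis review above, the sketch below estimates a fundamental-frequency track, a simplified local-jitter measure, and a mel spectrogram "audio image" with librosa. The WAV file is hypothetical, and the jitter formula is an approximation of the Praat definition rather than a validated clinical implementation.

```python
# Rough sketch of extracting a few speech features with librosa; the WAV path is
# hypothetical and the jitter estimate is a simplified approximation.
import numpy as np
import librosa

y, sr = librosa.load("sustained_a.wav", sr=None)

# Fundamental frequency (f0) track over voiced frames
f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
f0_voiced = f0[voiced_flag & ~np.isnan(f0)]

# Approximate local jitter: mean absolute difference between consecutive
# pitch periods, relative to the mean period
periods = 1.0 / f0_voiced
jitter_local = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Mel spectrogram, usable as an "audio image" input to an image-based model
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

print(f"mean f0: {f0_voiced.mean():.1f} Hz, approx. local jitter: {jitter_local:.4f}")
print("spectrogram shape:", mel_db.shape)
```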
Conclusions: Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperforms ChatGPT. Users and developers should understand model performance characteristics and select appropriate models for different questions and languages to optimize chatbot use. %R 10.2196/63924 %U https://medinform.jmir.org/2025/1/e63924 %U https://doi.org/10.2196/63924 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63020 %T Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text %A Zhuang,Yan %A Zhang,Junyan %A Li,Xiuxing %A Liu,Chao %A Yu,Yue %A Dong,Wei %A He,Kunlun %+ Medical Big Data Research Center, Chinese PLA General Hospital, 28 Fuxing Road, Beijing, 100853, China, 86 13911232619, kunlunhe@plagh.org %K BERT %K bidirectional encoder representations from transformers %K pretrained language models %K prompt learning %K ICD %K International Classification of Diseases %K cardiovascular disease %K few-shot learning %K multicenter medical data %D 2025 %7 6.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby enhancing the efficiency of diagnosis and treatment. However, it faces challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, have limitations in handling missing values and understanding the context of medical texts, leading to a high error rate. We developed a fully automated pipeline based on the Key–bidirectional encoder representations from transformers (BERT) approach and large-scale medical records for continued pretraining, which effectively converts long free text into standard ICD codes. By adjusting parameter settings, such as mixed templates and soft verbalizers, the model can adapt flexibly to different requirements, enabling task-specific prompt learning. Objective: This study aims to propose a prompt learning real-time framework based on pretrained language models that can automatically label long free-text data with ICD-10 codes for cardiovascular diseases without the need for semiautomatic preprocessing. Methods: We integrated 4 components into our framework: a medical pretrained BERT, a keyword filtration BERT in a functional order, a fine-tuning phase, and task-specific prompt learning utilizing mixed templates and soft verbalizers. This framework was validated on a multicenter medical dataset for the automated ICD coding of 13 common cardiovascular diseases (584,969 records). Its performance was compared against robustly optimized BERT pretraining approach, extreme language network, and various BERT-based fine-tuning pipelines. Additionally, we evaluated the framework’s performance under different prompt learning and fine-tuning settings. Furthermore, few-shot learning experiments were conducted to assess the feasibility and efficacy of our framework in scenarios involving small- to mid-sized datasets. 
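The prompt learning idea described in the ICD coding study above, a template with a masked slot plus a verbalizer that maps label words to codes, can be sketched with a generic BERT checkpoint as follows. The template, the toy label set, and the manually specified verbalizer are simplifying assumptions; the authors' Key-BERT pipeline, mixed templates, and learned soft verbalizers are not reproduced here.

```python
# Minimal prompt-learning sketch with a manual verbalizer; the note, template, and
# label words are invented. A real verbalizer would map ICD codes to in-vocabulary
# label words (or learned soft-verbalizer embeddings) rather than these toy categories.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

note = "Patient admitted with crushing chest pain radiating to the left arm."
template = f"{note} This case concerns the {tokenizer.mask_token}."
verbalizer = {"cardiac": ["heart"], "respiratory": ["lung"], "renal": ["kidney"]}

inputs = tokenizer(template, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1].item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]        # scores over the vocabulary at [MASK]

scores = {}
for label, words in verbalizer.items():
    ids = tokenizer.convert_tokens_to_ids(words)
    scores[label] = logits[ids].mean().item()           # average score of the label words

print(max(scores, key=scores.get))
```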
Results: Compared with traditional pretraining and fine-tuning pipelines, our approach achieved a higher micro–F1-score of 0.838 and a macro–area under the receiver operating characteristic curve (macro-AUC) of 0.958, which is 10% higher than other methods. Among different prompt learning setups, the combination of mixed templates and soft verbalizers yielded the best performance. Few-shot experiments showed that performance stabilized and the AUC peaked at 500 shots. Conclusions: These findings underscore the effectiveness and superior performance of prompt learning and fine-tuning for subtasks within pretrained language models in medical practice. Our real-time ICD coding pipeline efficiently converts detailed medical free text into standardized labels, offering promising applications in clinical decision-making. It can assist doctors unfamiliar with the ICD coding system in organizing medical record information, thereby accelerating the medical process and enhancing the efficiency of diagnosis and treatment. %M 39761555 %R 10.2196/63020 %U https://medinform.jmir.org/2025/1/e63020 %U https://doi.org/10.2196/63020 %U http://www.ncbi.nlm.nih.gov/pubmed/39761555 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58812 %T Enhancing Clinical Decision Making by Predicting Readmission Risk in Patients With Heart Failure Using Machine Learning: Predictive Model Development Study %A Jiang,Xiangkui %A Wang,Bingquan %K prediction model %K heart failure %K hospital readmission %K machine learning %K cardiology %K admissions %K hospitalization %D 2024 %7 31.12.2024 %9 %J JMIR Med Inform %G English %X Background: Patients with heart failure frequently face the possibility of rehospitalization following an initial hospital stay, placing a significant burden on both patients and health care systems. Accurate predictive tools are crucial for guiding clinical decision-making and optimizing patient care. However, the effectiveness of existing models tailored specifically to the Chinese population is still limited. Objective: This study aimed to formulate a predictive model for assessing the likelihood of readmission among patients diagnosed with heart failure. Methods: In this study, we analyzed data from 1948 patients with heart failure in a hospital in Sichuan Province between 2016 and 2019. By applying 3 variable selection strategies, 29 relevant variables were identified. Subsequently, we constructed 6 predictive models using different algorithms: logistic regression, support vector machine, gradient boosting machine, Extreme Gradient Boosting, multilayer perceptron, and graph convolutional networks. Results: The graph convolutional network model showed the highest prediction accuracy with an area under the receiver operating characteristic curve of 0.831, accuracy of 75%, sensitivity of 52.12%, and specificity of 90.25%. Conclusions: The model crafted in this study proves its effectiveness in forecasting the likelihood of readmission among patients with heart failure, thus serving as a crucial reference for clinical decision-making.
%R 10.2196/58812 %U https://medinform.jmir.org/2024/1/e58812 %U https://doi.org/10.2196/58812 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56382 %T Leveraging Machine Learning to Identify Subgroups of Misclassified Patients in the Emergency Department: Multicenter Proof-of-Concept Study %A Wyatt,Sage %A Lunde Markussen,Dagfinn %A Haizoune,Mounir %A Vestbø,Anders Strand %A Sima,Yeneabeba Tilahun %A Sandboe,Maria Ilene %A Landschulze,Marcus %A Bartsch,Hauke %A Sauer,Christopher Martin %+ Institute for Artificial Intelligence in Medicine, University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany, 49 201 723 0, sauerc@mit.edu %K emergency department %K triage %K machine learning %K real world evidence %K random forest %K classification %K subgroup %K misclassification %K patient %K multi-center %K proof-of-concept %K hospital %K clinical feature %K Norway %K retrospective %K cohort study %K electronic health system %K electronic health record %D 2024 %7 31.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Hospitals use triage systems to prioritize the needs of patients within available resources. Misclassification of a patient can lead to either adverse outcomes in a patient who did not receive appropriate care in the case of undertriage or a waste of hospital resources in the case of overtriage. Recent advances in machine learning algorithms allow for the quantification of variables important to under- and overtriage. Objective: This study aimed to identify clinical features most strongly associated with triage misclassification using a machine learning classification model to capture nonlinear relationships. Methods: Multicenter retrospective cohort data from 2 large regional hospitals in Norway were extracted. The South African Triage System is used at Bergen University Hospital, and the Rapid Emergency Triage and Treatment System is used at Trondheim University Hospital. Variables retrieved included triage score, age, sex, arrival time, subject area affiliation, reason for emergency department contact, discharge location, level of care, and time of death. Random forest classification models were used to identify features with the strongest association with overtriage and undertriage in clinical practice in Bergen and Trondheim. We reported variable importance as SHAP (SHapley Additive exPlanations) values. Results: We collected data on 205,488 patient records from Bergen University Hospital and 304,997 patient records from Trondheim University Hospital. Overall, overtriage was very uncommon at both hospitals (all <0.1%), with undertriage differing between both locations, with 0.8% at Bergen and 0.2% at Trondheim University Hospital. Demographics were similar for both hospitals. However, the percentage given a high-priority triage score (red or orange) was higher in Bergen (24%) compared with 9% in Trondheim. The clinical referral department was found to be the variable with the strongest association with undertriage (mean SHAP +0.62 and +0.37 for Bergen and Trondheim, respectively). Conclusions: We identified subgroups of patients consistently undertriaged using 2 common triage systems. While the importance of clinical patient characteristics to triage misclassification varies by triage system and location, we found consistent evidence between the two locations that the clinical referral department is the most important variable associated with triage misclassification.
Replication of this approach at other centers could help to further improve triage scoring systems and improve patient care worldwide. %M 39451101 %R 10.2196/56382 %U https://www.jmir.org/2024/1/e56382 %U https://doi.org/10.2196/56382 %U http://www.ncbi.nlm.nih.gov/pubmed/39451101 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52914 %T Artificial Intelligence–Aided Diagnosis System for the Detection and Classification of Private-Part Skin Diseases: Decision Analytical Modeling Study %A Wang,Wei %A Chen,Xiang %A Xu,Licong %A Huang,Kai %A Zhao,Shuang %A Wang,Yong %+ School of Automation, Central South University, 932 South Lushan Road, Changsha, 410083, China, 86 18507313729, ywang@csu.edu.cn %K artificial intelligence-aided diagnosis %K private parts %K skin disease %K knowledge graph %K dermatology %K classification %K artificial intelligence %K AI %K diagnosis %D 2024 %7 27.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Private-part skin diseases (PPSDs) can cause a patient’s stigma, which may hinder the early diagnosis of these diseases. Artificial intelligence (AI) is an effective tool to improve the early diagnosis of PPSDs, especially in preventing the deterioration of skin tumors in private parts such as Paget disease. However, to our knowledge, there is currently no research on using AI to identify PPSDs due to the complex backgrounds of the lesion areas and the challenges in data collection. Objective: This study aimed to develop and evaluate an AI-aided diagnosis system for the detection and classification of PPSDs: aiding patients in self-screening and supporting dermatologists’ diagnostic enhancement. Methods: In this decision analytical modeling study, a 2-stage AI-aided diagnosis system was developed to classify PPSDs. In the first stage, a multitask detection network was trained to automatically detect and classify skin lesions (type, color, and shape). In the second stage, we proposed a knowledge graph based on dermatology expertise and constructed a decision network to classify seven PPSDs (condyloma acuminatum, Paget disease, eczema, pearly penile papules, genital herpes, syphilis, and Bowen disease). A reader study with 13 dermatologists of different experience levels was conducted. Dermatologists were asked to classify the testing cohort under reading room conditions, first without and then with system support. This AI-aided diagnostic study used the data of 635 patients from two institutes between July 2019 and April 2022. The data of Institute 1 contained 2701 skin lesion samples from 520 patients, which were used for the training of the multitask detection network in the first stage. In addition, the data of Institute 2 consisted of 115 clinical images and the corresponding medical records, which were used for the test of the whole 2-stage AI-aided diagnosis system. Results: On the test data of Institute 2, the proposed system achieved the average precision, recall, and F1-score of 0.81, 0.86, and 0.83, respectively, better than existing advanced algorithms. For the reader performance test, our system improved the average F1-score of the junior, intermediate, and senior dermatologists by 16%, 7%, and 4%, respectively. Conclusions: In this study, we constructed the first skin-lesion–based dataset and developed the first AI-aided diagnosis system for PPSDs. This system provides the final diagnosis result by simulating the diagnostic process of dermatologists. 
Compared with existing advanced algorithms, this system is more accurate in identifying PPSDs. Overall, our system can not only help patients achieve self-screening and alleviate their stigma but also assist dermatologists in diagnosing PPSDs. %M 39729353 %R 10.2196/52914 %U https://www.jmir.org/2024/1/e52914 %U https://doi.org/10.2196/52914 %U http://www.ncbi.nlm.nih.gov/pubmed/39729353 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60684 %T AI in Dental Radiology—Improving the Efficiency of Reporting With ChatGPT: Comparative Study %A Stephan,Daniel %A Bertsch,Annika %A Burwinkel,Matthias %A Vinayahalingam,Shankeeth %A Al-Nawas,Bilal %A Kämmerer,Peer W %A Thiem,Daniel GE %+ Department of Oral and Maxillofacial Surgery, Facial Plastic Surgery, University Medical Centre of the Johannes Gutenberg-University Mainz, Augustusplatz 2, Mainz, 55131, Germany, 49 6131177038, stephand@uni-mainz.de %K artificial intelligence %K ChatGPT %K radiology report %K dental radiology %K dental orthopantomogram %K panoramic radiograph %K dental %K radiology %K chatbot %K medical documentation %K medical application %K imaging %K disease detection %K clinical decision support %K natural language processing %K medical licensing %K dentistry %K patient care %D 2024 %7 23.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Structured and standardized documentation is critical for accurately recording diagnostic findings, treatment plans, and patient progress in health care. Manual documentation can be labor-intensive and error-prone, especially under time constraints, prompting interest in the potential of artificial intelligence (AI) to automate and optimize these processes, particularly in medical documentation. Objective: This study aimed to assess the effectiveness of ChatGPT (OpenAI) in generating radiology reports from dental panoramic radiographs, comparing the performance of AI-generated reports with those manually created by dental students. Methods: A total of 100 dental students were tasked with analyzing panoramic radiographs and generating radiology reports manually or assisted by ChatGPT using a standardized prompt derived from a diagnostic checklist. Results: Reports generated by ChatGPT showed a high degree of textual similarity to reference reports; however, they often lacked critical diagnostic information typically included in reports authored by students. Despite this, the AI-generated reports were consistent in being error-free and matched the readability of student-generated reports. Conclusions: The findings from this study suggest that ChatGPT has considerable potential for generating radiology reports, although it currently faces challenges in accuracy and reliability. This underscores the need for further refinement in the AI’s prompt design and the development of robust validation mechanisms to enhance its use in clinical settings. 
%M 39714078 %R 10.2196/60684 %U https://www.jmir.org/2024/1/e60684 %U https://doi.org/10.2196/60684 %U http://www.ncbi.nlm.nih.gov/pubmed/39714078 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51409 %T Longitudinal Model Shifts of Machine Learning–Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals %A Cabanillas Silva,Patricia %A Sun,Hong %A Rezk,Mohamed %A Roccaro-Waldmeyer,Diana M %A Fliegenschmidt,Janis %A Hulde,Nikolai %A von Dossow,Vera %A Meesseman,Laurent %A Depraetere,Kristof %A Stieg,Joerg %A Szymanowsky,Ralph %A Dahlweid,Fried-Michael %+ Dedalus HealthCare, Roderveldlaan 2, Antwerp, 2600, Belgium, 32 0784244010, mohamed.rezk@dedalus.com %K model shift %K model monitoring %K prediction models %K acute kidney injury %K AKI %K sepsis %K delirium %K decision curve analysis %K DCA %D 2024 %7 13.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: In recent years, machine learning (ML)–based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performances of such models heavily rely on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance. Objective: This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases—delirium, sepsis, and acute kidney injury (AKI)—from 2 hospitals (M and H) with different patient populations and investigate potential model deterioration during the COVID-19 pandemic period. Methods: We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve. Results: The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=–1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period. 
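Decision curve analysis, used in the model-shift study above to check clinical utility, reduces to a net-benefit calculation at each threshold probability pt: net benefit = TP/n − FP/n × pt/(1 − pt). The sketch below computes it for a simulated model and the treat-all and treat-none strategies; the predictions are invented, not the study's data.

```python
# Minimal net-benefit (decision curve) sketch; inputs are simulated.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                              # simulated outcomes
y_prob = np.clip(y_true * 0.3 + rng.uniform(0, 0.7, 1000), 0, 1)    # simulated risk scores

def net_benefit(y_true, y_prob, pt):
    pred_pos = y_prob >= pt
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)

for pt in (0.05, 0.10, 0.20, 0.30):
    nb_model = net_benefit(y_true, y_prob, pt)
    nb_treat_all = net_benefit(y_true, np.ones_like(y_prob), pt)
    print(f"pt={pt:.2f}  model={nb_model:.3f}  treat-all={nb_treat_all:.3f}  treat-none=0.000")
```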
Conclusions: Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making. %M 39671571 %R 10.2196/51409 %U https://www.jmir.org/2024/1/e51409 %U https://doi.org/10.2196/51409 %U http://www.ncbi.nlm.nih.gov/pubmed/39671571 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55986 %T Accuracy of Machine Learning in Detecting Pediatric Epileptic Seizures: Systematic Review and Meta-Analysis %A Zou,Zhuan %A Chen,Bin %A Xiao,Dongqiong %A Tang,Fajuan %A Li,Xihong %+ Department of Emergency, West China Second University Hospital, Sichuan University, No 20, Section 3, Renmin South Road, Wuhou District, Chengdu, 610000, China, 86 13551089846, lixihonghxey@163.com %K epileptic seizures %K machine learning %K deep learning %K electroencephalogram %K EEG %K children %K pediatrics %K epilepsy %K detection %D 2024 %7 11.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Real-time monitoring of pediatric epileptic seizures poses a significant challenge in clinical practice. In recent years, machine learning (ML) has attracted substantial attention from researchers for diagnosing and treating neurological diseases, leading to its application for detecting pediatric epileptic seizures. However, systematic evidence substantiating its feasibility remains limited. Objective: This systematic review aimed to consolidate the existing evidence regarding the effectiveness of ML in monitoring pediatric epileptic seizures with an effort to provide an evidence-based foundation for the development and enhancement of intelligent tools in the future. Methods: We conducted a systematic search of the PubMed, Cochrane, Embase, and Web of Science databases for original studies focused on the detection of pediatric epileptic seizures using ML, with a cutoff date of August 27, 2023. The risk of bias in eligible studies was assessed using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies–2). Meta-analyses were performed to evaluate the C-index and the diagnostic 4-grid table, using a bivariate mixed-effects model for the latter. We also examined publication bias for the C-index by using funnel plots and the Egger test. Results: This systematic review included 28 original studies, with 15 studies on ML and 13 on deep learning (DL). All these models were based on electroencephalography data of children. The pooled C-index, sensitivity, specificity, and accuracy of ML in the training set were 0.76 (95% CI 0.69-0.82), 0.77 (95% CI 0.73-0.80), 0.74 (95% CI 0.70-0.77), and 0.75 (95% CI 0.72-0.77), respectively. In the validation set, the pooled C-index, sensitivity, specificity, and accuracy of ML were 0.73 (95% CI 0.67-0.79), 0.88 (95% CI 0.83-0.91), 0.83 (95% CI 0.71-0.90), and 0.78 (95% CI 0.73-0.82), respectively. 
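The Egger test mentioned in the pediatric epilepsy review above checks funnel-plot asymmetry by regressing the standardized effect on precision; a nonzero intercept suggests small-study effects. The sketch below uses invented study estimates and is not the review's code.

```python
# Generic Egger regression test for funnel-plot asymmetry; study estimates are invented.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.72, 0.78, 0.81, 0.69, 0.88, 0.75])   # hypothetical C-indexes
se     = np.array([0.05, 0.04, 0.03, 0.07, 0.02, 0.06])   # hypothetical standard errors

standardized = effect / se          # standardized effect size
precision = 1.0 / se                # precision

model = sm.OLS(standardized, sm.add_constant(precision)).fit()
intercept, slope = model.params
print(f"Egger intercept = {intercept:.3f}, P = {model.pvalues[0]:.3f}")
```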
Meanwhile, the pooled C-index of DL in the validation set was 0.91 (95% CI 0.88-0.94), with sensitivity, specificity, and accuracy being 0.89 (95% CI 0.85-0.91), 0.91 (95% CI 0.88-0.93), and 0.89 (95% CI 0.86-0.92), respectively. Conclusions: Our systematic review demonstrates promising accuracy of artificial intelligence methods in epilepsy detection. DL appears to offer higher detection accuracy than ML. These findings support the development of DL-based early-warning tools in future research. Trial Registration: PROSPERO CRD42023467260; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42023467260 %M 39661965 %R 10.2196/55986 %U https://www.jmir.org/2024/1/e55986 %U https://doi.org/10.2196/55986 %U http://www.ncbi.nlm.nih.gov/pubmed/39661965 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58423 %T A Pathological Diagnosis Method for Fever of Unknown Origin Based on Multipath Hierarchical Classification: Model Design and Validation %A Du,Jianchao %A Ding,Junyao %A Wu,Yuan %A Chen,Tianyan %A Lian,Jianqi %A Shi,Lei %A Zhou,Yun %K fever of unknown origin %K FUO %K intelligent diagnosis %K machine learning %K hierarchical classification %K feature selection %K model design %K validation %K diagnostic %K prediction model %D 2024 %7 9.12.2024 %9 %J JMIR Form Res %G English %X Background: Fever of unknown origin (FUO) is a significant challenge for the medical community due to its association with a wide range of diseases, the complexity of diagnosis, and the likelihood of misdiagnosis. Machine learning can extract valuable information from the extensive data of patient indicators, aiding doctors in diagnosing the underlying cause of FUO. Objective: The study aims to design a multipath hierarchical classification algorithm to diagnose FUO due to the hierarchical structure of the etiology of FUO. In addition, to improve the diagnostic performance of the model, a mechanism for feature selection is added to the model. Methods: The case data of patients with FUO admitted to the First Affiliated Hospital of Xi’an Jiaotong University between 2011 and 2020 in China were used as the dataset for model training and validation. The hierarchical structure tree was then characterized according to etiology. The structure included 3 layers, with the top layer representing the FUO, the middle layer dividing the FUO into 5 categories of etiology (bacterial infection, viral infection, other infection, autoimmune diseases, and other noninfection), and the last layer further refining them to 16 etiologies. Finally, ablation experiments were set to determine the optimal structure of the proposed method, and comparison experiments were to verify the diagnostic performance. Results: According to ablation experiments, the model achieved the best performance with an accuracy of 76.08% when the number of middle paths was 3%, and 25% of the features were selected. According to comparison experiments, the proposed model outperformed the comparison methods, both from the perspective of feature selection methods and hierarchical classification methods. Specifically, brucellosis had an accuracy of 100%, and liver abscess, viral infection, and lymphoma all had an accuracy of more than 80%. Conclusions: In this study, a novel multipath feature selection and hierarchical classification model was designed for the diagnosis of FUO and was adequately evaluated quantitatively. Despite some limitations, this model enriches the exploration of FUO in machine learning and assists physicians in their work. 
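The three-layer hierarchy described in the fever of unknown origin study above can be approximated with a simple "local classifier per parent node" scheme: one model chooses the etiology category, and a per-category model refines it to a specific etiology. The sketch below uses synthetic data and a toy label subset, and it omits the paper's multipath routing and feature selection mechanism.

```python
# Minimal hierarchical-classification sketch; data, features, and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
hierarchy = {                          # category -> fine-grained etiologies (toy subset)
    "bacterial": ["brucellosis", "liver abscess", "tuberculosis"],
    "viral": ["EBV infection", "CMV infection"],
    "autoimmune": ["adult-onset Still disease", "vasculitis"],
}

# Synthetic training data: 60 indicator features per patient
X = rng.normal(size=(600, 60))
coarse_y = rng.choice(list(hierarchy), size=600)
fine_y = np.array([rng.choice(hierarchy[c]) for c in coarse_y])

# Level 1: coarse category; Level 2: one classifier per category
coarse_clf = LogisticRegression(max_iter=1000).fit(X, coarse_y)
fine_clfs = {c: LogisticRegression(max_iter=1000).fit(X[coarse_y == c], fine_y[coarse_y == c])
             for c in hierarchy}

def predict_etiology(x):
    category = coarse_clf.predict(x.reshape(1, -1))[0]
    return category, fine_clfs[category].predict(x.reshape(1, -1))[0]

print(predict_etiology(X[0]))
```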
%R 10.2196/58423 %U https://formative.jmir.org/2024/1/e58423 %U https://doi.org/10.2196/58423 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e67409 %T The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance %A Sorich,Michael Joseph %A Mangoni,Arduino Aleksander %A Bacchi,Stephen %A Menz,Bradley Douglas %A Hopkins,Ashley Mark %+ College of Medicine and Public Health, Flinders University, GPO Box 2100, Adelaide, 5001, Australia, 61 82013217, michael.sorich@flinders.edu.au %K generative artificial intelligence %K large language models %K triage %K diagnosis %K accuracy %K physician %K ChatGPT %K diagnostic %K primary care %K physicians %K prediction %K medical care %K internet %K LLMs %K AI %D 2024 %7 6.12.2024 %9 Research Letter %J J Med Internet Res %G English %X %M 39642373 %R 10.2196/67409 %U https://www.jmir.org/2024/1/e67409 %U https://doi.org/10.2196/67409 %U http://www.ncbi.nlm.nih.gov/pubmed/39642373 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58666 %T Facilitating Trust Calibration in Artificial Intelligence–Driven Diagnostic Decision Support Systems for Determining Physicians’ Diagnostic Accuracy: Quasi-Experimental Study %A Sakamoto,Tetsu %A Harada,Yukinori %A Shimizu,Taro %K trust calibration %K artificial intelligence %K diagnostic accuracy %K diagnostic decision support %K decision support %K diagnosis %K diagnostic %K chart %K history %K reliable %K reliability %K accurate %K accuracy %K AI %D 2024 %7 27.11.2024 %9 %J JMIR Form Res %G English %X Background: Diagnostic errors are significant problems in medical care. Despite the usefulness of artificial intelligence (AI)–based diagnostic decision support systems, the overreliance of physicians on AI-generated diagnoses may lead to diagnostic errors. Objective: We investigated the safe use of AI-based diagnostic decision support systems with trust calibration by adjusting trust levels to match the actual reliability of AI. Methods: A quasi-experimental study was conducted at Dokkyo Medical University, Japan, with physicians allocated (1:1) to the intervention and control groups. A total of 20 clinical cases were created based on the medical histories recorded by an AI-driven automated medical history–taking system from actual patients who visited a community-based hospital in Japan. The participants reviewed the medical histories of 20 clinical cases generated by an AI-driven automated medical history–taking system with an AI-generated list of 10 differential diagnoses and provided 1 to 3 possible diagnoses. Physicians were asked whether the final diagnosis was in the AI-generated list of 10 differential diagnoses in the intervention group, which served as the trust calibration. We analyzed the diagnostic accuracy of physicians and the correctness of the trust calibration in the intervention group. We also investigated the relationship between the accuracy of the trust calibration and the diagnostic accuracy of physicians, and the physicians’ confidence level regarding the use of AI. Results: Among the 20 physicians assigned to the intervention (n=10) and control (n=10) groups, the mean age was 30.9 (SD 3.9) years and 31.7 (SD 4.2) years, the proportion of men was 80% and 60%, and the mean postgraduate year was 5.8 (SD 2.9) and 7.2 (SD 4.6), respectively, with no significant differences. 
The physicians’ diagnostic accuracy was 41.5% in the intervention group and 46% in the control group, with no significant difference (95% CI −0.75 to 2.55; P=.27). The overall accuracy of the trust calibration was only 61.5%, and despite correct calibration, the diagnostic accuracy was 54.5%. In the multivariate logistic regression model, the accuracy of the trust calibration was a significant contributor to the diagnostic accuracy of physicians (adjusted odds ratio 5.90, 95% CI 2.93‐12.46; P<.001). The mean confidence level for AI was 72.5% in the intervention group and 45% in the control group, with no significant difference. Conclusions: Trust calibration did not significantly improve physicians’ diagnostic accuracy when considering the differential diagnoses generated by reading medical histories and the possible differential diagnosis lists of an AI-driven automated medical history–taking system. As this was a formative study, the small sample size and suboptimal trust calibration methods may have contributed to the lack of significant differences. This study highlights the need for a larger sample size and the implementation of supportive measures of trust calibration. %R 10.2196/58666 %U https://formative.jmir.org/2024/1/e58666 %U https://doi.org/10.2196/58666 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e52514 %T The Promise of AI for Image-Driven Medicine: Qualitative Interview Study of Radiologists’ and Pathologists’ Perspectives %A Drogt,Jojanneke %A Milota,Megan %A Veldhuis,Wouter %A Vos,Shoko %A Jongsma,Karin %K digital medicine %K computer vision %K medical AI %K image-driven specialisms %K qualitative interview study %K digital health ethics %K artificial intelligence %K AI %K imaging %K imaging informatics %K radiology %K pathology %D 2024 %7 21.11.2024 %9 %J JMIR Hum Factors %G English %X Background: Image-driven specialisms such as radiology and pathology are at the forefront of medical artificial intelligence (AI) innovation. Many believe that AI will lead to significant shifts in professional roles, so it is vital to investigate how professionals view the pending changes that AI innovation will initiate and incorporate their views in ongoing AI developments. Objective: Our study aimed to gain insights into the perspectives and wishes of radiologists and pathologists regarding the promise of AI. Methods: We have conducted the first qualitative interview study investigating the perspectives of both radiologists and pathologists regarding the integration of AI in their fields. The study design is in accordance with the consolidated criteria for reporting qualitative research (COREQ). Results: In total, 21 participants were interviewed for this study (7 pathologists, 10 radiologists, and 4 computer scientists). The interviews revealed a diverse range of perspectives on the impact of AI. Respondents discussed various task-specific benefits of AI; yet, both pathologists and radiologists agreed that AI had yet to live up to its hype. Overall, our study shows that AI could facilitate welcome changes in the workflows of image-driven professionals and eventually lead to better quality of care. At the same time, these professionals also admitted that many hopes and expectations for AI were unlikely to become a reality in the next decade. 
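The adjusted odds ratio reported for correct trust calibration in the study above comes from a multivariable logistic regression; a generic statsmodels sketch of that calculation is shown below. The covariates and the simulated data are invented for illustration and do not reflect the study's variables or effect sizes.

```python
# Generic sketch of an adjusted odds ratio from multivariable logistic regression;
# all data below are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "calibration_correct": rng.integers(0, 2, n),    # hypothetical exposure of interest
    "postgraduate_year":   rng.integers(1, 15, n),   # hypothetical covariate
    "confidence_in_ai":    rng.uniform(0, 100, n),   # hypothetical covariate
})
# Simulated outcome: correct final diagnosis, more likely when calibration was correct
logit = -0.5 + 1.7 * df["calibration_correct"] + 0.02 * df["postgraduate_year"]
df["diagnosis_correct"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["calibration_correct", "postgraduate_year", "confidence_in_ai"]])
fit = sm.Logit(df["diagnosis_correct"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)                     # adjusted odds ratios
ci = np.exp(fit.conf_int())                          # 95% CIs on the OR scale
print(odds_ratios["calibration_correct"], ci.loc["calibration_correct"].values)
```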
Conclusions: This study points to the importance of maintaining a “healthy skepticism” on the promise of AI in imaging specialisms and argues for more structural and inclusive discussions about whether AI is the right technology to solve current problems encountered in daily clinical practice. %R 10.2196/52514 %U https://humanfactors.jmir.org/2024/1/e52514 %U https://doi.org/10.2196/52514 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51477 %T Performance of a Full-Coverage Cervical Cancer Screening Program Using on an Artificial Intelligence– and Cloud-Based Diagnostic System: Observational Study of an Ultralarge Population %A Ji,Lu %A Yao,Yifan %A Yu,Dandan %A Chen,Wen %A Yin,Shanshan %A Fu,Yun %A Tang,Shangfeng %A Yao,Lan %+ School of Medicine and Health Management, Tongji Medical College of Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, 430030, China, 86 027 83692727, ylhuster@163.com %K full coverage %K cervical cancer screening %K artificial intelligence %K primary health institutions %K accessibility %K efficiency %D 2024 %7 20.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The World Health Organization has set a global strategy to eliminate cervical cancer, emphasizing the need for cervical cancer screening coverage to reach 70%. In response, China has developed an action plan to accelerate the elimination of cervical cancer, with Hubei province implementing China’s first provincial full-coverage screening program using an artificial intelligence (AI) and cloud-based diagnostic system. Objective: This study aimed to evaluate the performance of AI technology in this full-coverage screening program. The evaluation indicators included accessibility, screening efficiency, diagnostic quality, and program cost. Methods: Characteristics of 1,704,461 individuals screened from July 2022 to January 2023 were used to analyze accessibility and AI screening efficiency. A random sample of 220 individuals was used for external diagnostic quality control. The costs of different participating screening institutions were assessed. Results: Cervical cancer screening services were extended to all administrative districts, especially in rural areas. Rural women had the highest participation rate at 67.54% (1,147,839/1,699,591). Approximately 1.7 million individuals were screened, achieving a cumulative coverage of 13.45% in about 6 months. Full-coverage programs could be achieved by AI technology in approximately 1 year, which was 87.5 times more efficient than the manual reading of slides. The sample compliance rate was as high as 99.1%, and compliance rates for positive, negative, and pathology biopsy reviews exceeded 96%. The cost of this program was CN ¥49 (the average exchange rate in 2022 is as follows: US $1=CN ¥6.7261) per person, with the primary screening institution and the third-party testing institute receiving CN ¥19 and ¥27, respectively. Conclusions: AI-assisted diagnosis has proven to be accessible, efficient, reliable, and low cost, which could support the implementation of full-coverage screening programs, especially in areas with insufficient health resources. AI technology served as a crucial tool for rapidly and effectively increasing screening coverage, which would accelerate the achievement of the World Health Organization’s goals of eliminating cervical cancer. 
%M 39566061 %R 10.2196/51477 %U https://www.jmir.org/2024/1/e51477 %U https://doi.org/10.2196/51477 %U http://www.ncbi.nlm.nih.gov/pubmed/39566061 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e64844 %T Comparative Analysis of Diagnostic Performance: Differential Diagnosis Lists by LLaMA3 Versus LLaMA2 for Case Reports %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Shiraishi,Tatsuya %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 0282861111, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K clinical decision support system %K generative artificial intelligence %K large language models %K natural language processing %K NLP %K AI %K clinical decision making %K decision support %K decision making %K LLM: diagnostic %K case report %K diagnosis %K generative AI %K LLaMA %D 2024 %7 19.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Generative artificial intelligence (AI), particularly in the form of large language models, has rapidly developed. The LLaMA series are popular and recently updated from LLaMA2 to LLaMA3. However, the impacts of the update on diagnostic performance have not been well documented. Objective: We conducted a comparative evaluation of the diagnostic performance in differential diagnosis lists generated by LLaMA3 and LLaMA2 for case reports. Methods: We analyzed case reports published in the American Journal of Case Reports from 2022 to 2023. After excluding nondiagnostic and pediatric cases, we input the remaining cases into LLaMA3 and LLaMA2 using the same prompt and the same adjustable parameters. Diagnostic performance was defined by whether the differential diagnosis lists included the final diagnosis. Multiple physicians independently evaluated whether the final diagnosis was included in the top 10 differentials generated by LLaMA3 and LLaMA2. Results: In our comparative evaluation of the diagnostic performance between LLaMA3 and LLaMA2, we analyzed differential diagnosis lists for 392 case reports. The final diagnosis was included in the top 10 differentials generated by LLaMA3 in 79.6% (312/392) of the cases, compared to 49.7% (195/392) for LLaMA2, indicating a statistically significant improvement (P<.001). Additionally, LLaMA3 showed higher performance in including the final diagnosis in the top 5 differentials, observed in 63% (247/392) of cases, compared to LLaMA2’s 38% (149/392, P<.001). Furthermore, the top diagnosis was accurately identified by LLaMA3 in 33.9% (133/392) of cases, significantly higher than the 22.7% (89/392) achieved by LLaMA2 (P<.001). The analysis across various medical specialties revealed variations in diagnostic performance with LLaMA3 consistently outperforming LLaMA2. Conclusions: The results reveal that the LLaMA3 model significantly outperforms LLaMA2 per diagnostic performance, with a higher percentage of case reports having the final diagnosis listed within the top 10, top 5, and as the top diagnosis. Overall diagnostic performance improved almost 1.5 times from LLaMA2 to LLaMA3. These findings support the rapid development and continuous refinement of generative AI systems to enhance diagnostic processes in medicine. However, these findings should be carefully interpreted for clinical application, as generative AI, including the LLaMA series, has not been approved for medical applications such as AI-enhanced diagnostics. 
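Top-k inclusion rates such as those compared for LLaMA3 and LLaMA2 above are paired binary outcomes on the same case reports; one common way to compare them is a McNemar test. The sketch below uses simulated hit/miss vectors, and the original paper's exact statistical test is not assumed here.

```python
# Sketch of comparing paired binary outcomes (final diagnosis included in the top 10
# for model A vs model B on the same cases) with a McNemar test; arrays are simulated.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(3)
hit_a = rng.binomial(1, 0.80, 392)     # 1 = final diagnosis appeared in model A's top 10
hit_b = rng.binomial(1, 0.50, 392)     # 1 = final diagnosis appeared in model B's top 10

# 2x2 table of paired agreement/disagreement
table = np.array([
    [np.sum((hit_a == 1) & (hit_b == 1)), np.sum((hit_a == 1) & (hit_b == 0))],
    [np.sum((hit_a == 0) & (hit_b == 1)), np.sum((hit_a == 0) & (hit_b == 0))],
])
result = mcnemar(table, exact=False, correction=True)
print(f"top-10 rate A={hit_a.mean():.3f}, B={hit_b.mean():.3f}, P={result.pvalue:.4f}")
```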
%M 39561356 %R 10.2196/64844 %U https://formative.jmir.org/2024/1/e64844 %U https://doi.org/10.2196/64844 %U http://www.ncbi.nlm.nih.gov/pubmed/39561356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57641 %T Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis %A Zhu,Jinpu %A Yang,Fushuang %A Wang,Yang %A Wang,Zhongtian %A Xiao,Yao %A Wang,Lie %A Sun,Liping %+ Center of Children's Clinic, The Affiliated Hospital to Changchun University of Chinese Medicine, No. 185, Shenzhen Street, Economic and Technological Development Zone, Jilin, P.R.C., Changchun, 130022, China, 86 15948000551, slpcczyydx@sina.com %K machine learning %K artificial intelligence %K Kawasaki disease %K febrile illness %K coronary artery lesions %K systematic review %K meta-analysis %D 2024 %7 18.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Kawasaki disease (KD) is an acute pediatric vasculitis that can lead to coronary artery aneurysms and severe cardiovascular complications, often presenting with obvious fever in the early stages. In current clinical practice, distinguishing KD from other febrile illnesses remains a significant challenge. In recent years, some researchers have explored the potential of machine learning (ML) methods for the differential diagnosis of KD versus other febrile illnesses, as well as for predicting coronary artery lesions (CALs) in people with KD. However, there is still a lack of systematic evidence to validate their effectiveness. Therefore, we have conducted the first systematic review and meta-analysis to evaluate the accuracy of ML in differentiating KD from other febrile illnesses and in predicting CALs in people with KD, so as to provide evidence-based support for the application of ML in the diagnosis and treatment of KD. Objective: This study aimed to summarize the accuracy of ML in differentiating KD from other febrile illnesses and predicting CALs in people with KD. Methods: PubMed, Cochrane Library, Embase, and Web of Science were systematically searched until September 26, 2023. The risk of bias in the included original studies was appraised using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Stata (version 15.0; StataCorp) was used for the statistical analysis. Results: A total of 29 studies were incorporated. Of them, 20 used ML to differentiate KD from other febrile illnesses. These studies involved a total of 103,882 participants, including 12,541 people with KD. In the validation set, the pooled concordance index, sensitivity, and specificity were 0.898 (95% CI 0.874-0.922), 0.91 (95% CI 0.83-0.95), and 0.86 (95% CI 0.80-0.90), respectively. Meanwhile, 9 studies used ML for early prediction of the risk of CALs in children with KD. These studies involved a total of 6503 people with KD, of whom 986 had CALs. The pooled concordance index in the validation set was 0.787 (95% CI 0.738-0.835). Conclusions: The diagnostic and predictive factors used in the studies we included were primarily derived from common clinical data. The ML models constructed based on these clinical data demonstrated promising effectiveness in differentiating KD from other febrile illnesses and in predicting coronary artery lesions. Therefore, in future research, we can explore the use of ML methods to identify more efficient predictors and develop tools that can be applied on a broader scale for the differentiation of KD and the prediction of CALs. 
%M 39556821 %R 10.2196/57641 %U https://www.jmir.org/2024/1/e57641 %U https://doi.org/10.2196/57641 %U http://www.ncbi.nlm.nih.gov/pubmed/39556821 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49724 %T Task-Specific Transformer-Based Language Models in Health Care: Scoping Review %A Cho,Ha Na %A Jun,Tae Joon %A Kim,Young-Hak %A Kang,Heejun %A Ahn,Imjin %A Gwon,Hansle %A Kim,Yunha %A Seo,Jiahn %A Choi,Heejung %A Kim,Minkyoung %A Han,Jiye %A Kee,Gaeun %A Park,Seohyun %A Ko,Soyoung %+ Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea, 82 10 2956 6101, saigram89@gmail.com %K transformer-based language models %K medicine %K health care %K medical language model %D 2024 %7 18.11.2024 %9 Review %J JMIR Med Inform %G English %X Background: Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. Objective: This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. Methods: We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. Results: Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. Conclusions: This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics. 
%M 39556827 %R 10.2196/49724 %U https://medinform.jmir.org/2024/1/e49724 %U https://doi.org/10.2196/49724 %U http://www.ncbi.nlm.nih.gov/pubmed/39556827 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e53616 %T Benefits and Risks of AI in Health Care: Narrative Review %A Chustecki,Margaret %+ Department of Internal Medicine, Yale School of Medicine, 1952 Whitney Ave, 3rd Floor, New Haven, CT, 06510, United States, 1 2038091700, margaret.chustecki@imgnh.com %K artificial intelligence %K safety risks %K biases %K AI %K benefit %K risk %K health care %K safety %K ethics %K transparency %K data privacy %K accuracy %D 2024 %7 18.11.2024 %9 Review %J Interact J Med Res %G English %X Background: The integration of artificial intelligence (AI) into health care has the potential to transform the industry, but it also raises ethical, regulatory, and safety concerns. This review paper provides an in-depth examination of the benefits and risks associated with AI in health care, with a focus on issues like biases, transparency, data privacy, and safety. Objective: This study aims to evaluate the advantages and drawbacks of incorporating AI in health care. This assessment centers on the potential biases in AI algorithms, transparency challenges, data privacy issues, and safety risks in health care settings. Methods: Studies included in this review were selected based on their relevance to AI applications in health care, focusing on ethical, regulatory, and safety considerations. Inclusion criteria encompassed peer-reviewed articles, reviews, and relevant research papers published in English. Exclusion criteria included non–peer-reviewed articles, editorials, and studies not directly related to AI in health care. A comprehensive literature search was conducted across 8 databases: OVID MEDLINE, OVID Embase, OVID PsycINFO, EBSCO CINAHL Plus with Full Text, ProQuest Sociological Abstracts, ProQuest Philosopher’s Index, ProQuest Advanced Technologies & Aerospace, and Wiley Cochrane Library. The search was last updated on June 23, 2023. Results were synthesized using qualitative methods to identify key themes and findings related to the benefits and risks of AI in health care. Results: The literature search yielded 8796 articles. After removing duplicates and applying the inclusion and exclusion criteria, 44 studies were included in the qualitative synthesis. This review highlights the significant promise that AI holds in health care, such as enhancing health care delivery by providing more accurate diagnoses, personalized treatment plans, and efficient resource allocation. However, persistent concerns remain, including biases ingrained in AI algorithms, a lack of transparency in decision-making, potential compromises of patient data privacy, and safety risks associated with AI implementation in clinical settings. Conclusions: In conclusion, while AI presents the opportunity for a health care revolution, it is imperative to address the ethical, regulatory, and safety challenges linked to its integration. Proactive measures are required to ensure that AI technologies are developed and deployed responsibly, striking a balance between innovation and the safeguarding of patient well-being. 
%M 39556817 %R 10.2196/53616 %U https://www.i-jmr.org/2024/1/e53616 %U https://doi.org/10.2196/53616 %U http://www.ncbi.nlm.nih.gov/pubmed/39556817 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59607 %T Examining the Role of Large Language Models in Orthopedics: Systematic Review %A Zhang,Cheng %A Liu,Shanshan %A Zhou,Xingyu %A Zhou,Siyu %A Tian,Yinglun %A Wang,Shenglin %A Xu,Nanfang %A Li,Weishi %+ Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Beijing, 100191, China, 86 01082267360, puh3liweishi@163.com %K large language model %K LLM %K orthopedics %K generative pretrained transformer %K GPT %K ChatGPT %K digital health %K clinical practice %K artificial intelligence %K AI %K generative AI %K Bard %D 2024 %7 15.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent. Objective: The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges. Methods: PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. The terms, which included variants of “large language model,” “generative artificial intelligence,” “ChatGPT,” and “orthopaedics,” were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment. Results: A total of 68 studies were selected. The application of LLMs in orthopedics involved the fields of clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to the field of management. Of the 68 studies, only 8 (12%) recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of the LLMs’ performance across the different studies. For diagnostic tasks alone, the accuracy ranged from 55% to 93%. When performing disease classification tasks, ChatGPT with GPT-4’s accuracy ranged from 2% to 100%. With regard to answering questions in orthopedic examinations, the scores ranged from 45% to 73.6% due to differences in models and test selections. Conclusions: LLMs cannot replace orthopedic professionals in the short term. 
However, using LLMs as copilots could be a potential approach to effectively enhance work efficiency at present. More high-quality clinical trials are needed in the future, aiming to identify optimal applications of LLMs and advance orthopedics toward higher efficiency and precision. %M 39546795 %R 10.2196/59607 %U https://www.jmir.org/2024/1/e59607 %U https://doi.org/10.2196/59607 %U http://www.ncbi.nlm.nih.gov/pubmed/39546795 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e65277 %T Evaluating the Sensitivity of Wearable Devices in Posttranscatheter Aortic Valve Implantation Functional Assessment %A An,Jinghui %A Shi,Fengwu %A Wang,Huajun %A Zhang,Hang %A Liu,Su %K aortic valve %K implantation functional %K wearable devices %D 2024 %7 8.11.2024 %9 %J JMIR Mhealth Uhealth %G English %X %R 10.2196/65277 %U https://mhealth.jmir.org/2024/1/e65277 %U https://doi.org/10.2196/65277 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58466 %T Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study %A Lin,Yu-Chun %A Yan,Huang-Ting %A Lin,Chih-Hsueh %A Chang,Hen-Hong %+ Graduate Institute of Integrated Medicine, College of Chinese Medicine, China Medical University, No 91, Hsueh-Shih Road, North District, Taichung, 40402, Taiwan, 886 22053366 ext 3609, tcmchh55@gmail.com %K frailty phenotypes %K older adults %K successful aging %K vocal biomarkers %K frailty %K phenotype %K vocal biomarker %K cross-sectional %K gerontology %K geriatrics %K older adult %K Taiwan %K energy-based %K hybrid-based %K sarcopenia %D 2024 %7 8.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice. Objective: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers. Methods: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty–energy, and hybrid-based frailty–sarcopenia. Participants were asked to pronounce a sustained vowel “/a/” for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters—the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4)—were used for analyzing changes in voice. Logistic regression was used to elucidate the prediction model. Results: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty–sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty–energy (OR 1.43, 95% CI 1.02-2.01), respectively. Conclusions: Vocal biomarkers might be potentially useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype that is associated with insufficient energy or reduced muscle function. 
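The frailty study above reports odds ratios from logistic regression models relating acoustic parameters (A1-A4) to frailty phenotypes. The following is a minimal sketch of how such odds ratios and 95% CIs can be derived by exponentiating the coefficients of a fitted logistic regression; the use of statsmodels and the column names (a1, a2, energy_frailty) are assumptions for illustration, not the authors' analysis code.

```python
# Hypothetical sketch: odds ratios for a frailty phenotype from acoustic features.
# Column names (a1, a2, energy_frailty) are illustrative, not from the study.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a1": rng.normal(10, 2, 277),               # average number of zero crossings
    "a2": rng.normal(5, 1, 277),                # variation in local peaks/valleys
    "energy_frailty": rng.integers(0, 2, 277),  # 1 = energy-based frailty
})

X = sm.add_constant(df[["a1", "a2"]])
fit = sm.Logit(df["energy_frailty"], X).fit(disp=0)

# Odds ratios and 95% CIs are the exponentiated coefficients and CI bounds.
odds_ratios = np.exp(fit.params)
ci = np.exp(fit.conf_int())
print(pd.concat([odds_ratios.rename("OR"), ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```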
The assessment of A3 and A4 involves a complex frailty phenotype. %M 39515817 %R 10.2196/58466 %U https://www.jmir.org/2024/1/e58466 %U https://doi.org/10.2196/58466 %U http://www.ncbi.nlm.nih.gov/pubmed/39515817 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e22769 %T Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review %A Wang,Leyao %A Wan,Zhiyu %A Ni,Congning %A Song,Qingyuan %A Li,Yang %A Clayton,Ellen %A Malin,Bradley %A Yin,Zhijun %+ Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave Ste 1475, Nashville, TN, 37203, United States, 1 6159363690, zhijun.yin@vumc.org %K large language model %K ChatGPT %K artificial intelligence %K natural language processing %K health care %K summarization %K medical knowledge inquiry %K reliability %K bias %K privacy %D 2024 %7 7.11.2024 %9 Review %J J Med Internet Res %G English %X Background: The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios. Objective: This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field. Methods: We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns. Results: Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for either summarization or medical knowledge inquiry, or both, and there are 58 (89%) papers expressing concerns about either reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and providing general medical knowledge to patients with a relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. 
While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research. Conclusions: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms of how LLM applications bring bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in health care. %M 39509695 %R 10.2196/22769 %U https://www.jmir.org/2024/1/e22769 %U https://doi.org/10.2196/22769 %U http://www.ncbi.nlm.nih.gov/pubmed/39509695 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58776 %T A Deep Learning Model to Predict Breast Implant Texture Types Using Ultrasonography Images: Feasibility Development Study %A Kim,Ho Heon %A Jeong,Won Chan %A Pi,Kyungran %A Lee,Angela Soeun %A Kim,Min Soo %A Kim,Hye Jin %A Kim,Jae Hong %+ The W Clinic, 9F Kukdong B/D, 596 Gangnam-daero, Gangnam-gu, Seoul, 06038, Republic of Korea, 82 2 517 7617, stenkaracin@gmail.com %K breast implants %K mammoplasty %K ultrasonography: AI-assisted diagnosis %K cshell surface topography %K artificial intelligence %K deep learning %K machine learning %D 2024 %7 5.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Breast implants, including textured variants, have been widely used in aesthetic and reconstructive mammoplasty. However, the textured type, which is one of the shell texture types of breast implants, has been identified as a possible etiologic factor for lymphoma, specifically breast implant–associated anaplastic large cell lymphoma (BIA-ALCL). Identifying the shell texture type of the implant is critical to diagnosing BIA-ALCL. However, distinguishing the shell texture type can be difficult due to the loss of human memory and medical history. An alternative approach is to use ultrasonography, but this method also has limitations in quantitative assessment. Objective: This study aims to determine the feasibility of using a deep learning model to classify the shell texture type of breast implants and make robust predictions from ultrasonography images from heterogeneous sources. Methods: A total of 19,502 breast implant images were retrospectively collected from heterogeneous sources, including images captured from both Canon and GE devices, images of ruptured implants, and images without implants, as well as publicly available images. The Canon images were trained using ResNet-50. The model’s performance on the Canon dataset was evaluated using stratified 5-fold cross-validation. Additionally, external validation was conducted using the GE and publicly available datasets. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (PRAUC) were calculated based on the contribution of the pixels with Gradient-weighted Class Activation Mapping (Grad-CAM). To identify the significant pixels for classification, we masked the pixels that contributed less than 10%, up to a maximum of 100%. To assess the model’s robustness to uncertainty, Shannon entropy was calculated for 4 image groups: Canon, GE, ruptured implants, and without implants. Results: The deep learning model achieved an average AUROC of 0.98 and a PRAUC of 0.88 in the Canon dataset. 
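The uncertainty analysis described in the breast implant study above summarizes each image group by the Shannon entropy of the model's predicted probabilities. The snippet below is a minimal sketch of that calculation, assuming a (n_images, n_classes) matrix of softmax outputs and invented group data; it illustrates the entropy measure itself rather than the study's pipeline.

```python
# Hypothetical sketch: Shannon entropy of class probabilities as an uncertainty score.
import numpy as np

def shannon_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy per row of a (n_images, n_classes) probability matrix."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

# Fake softmax outputs for two illustrative groups (not study data).
rng = np.random.default_rng(1)
confident_probs = rng.dirichlet([8, 1], size=100)   # confident predictions -> low entropy
uncertain_probs = rng.dirichlet([1, 1], size=100)   # uncertain predictions -> high entropy

print("Confident group mean entropy:", shannon_entropy(confident_probs).mean())
print("Uncertain group mean entropy:", shannon_entropy(uncertain_probs).mean())
```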
The model achieved an AUROC of 0.985 and a PRAUC of 0.748 for images captured with GE devices. Additionally, the model predicted an AUROC of 0.909 and a PRAUC of 0.958 for the publicly available dataset. This model maintained the PRAUC values for quantitative validation when masking up to 90% of the least-contributing pixels and the remnant pixels in breast shell layers. Furthermore, the prediction uncertainty increased in the following order: Canon (0.066), GE (0.072), ruptured implants (0.371), and no implants (0.777). Conclusions: We have demonstrated the feasibility of using deep learning to predict the shell texture type of breast implants. This approach quantifies the shell texture types of breast implants, supporting the first step in the diagnosis of BIA-ALCL. %M 39499915 %R 10.2196/58776 %U https://formative.jmir.org/2024/1/e58776 %U https://doi.org/10.2196/58776 %U http://www.ncbi.nlm.nih.gov/pubmed/39499915 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58149 %T AI-Supported Digital Microscopy Diagnostics in Primary Health Care Laboratories: Protocol for a Scoping Review %A von Bahr,Joar %A Diwan,Vinod %A Mårtensson,Andreas %A Linder,Nina %A Lundin,Johan %+ Department of Global Public Health, Karolinska Institutet, Tomtebodavägen 18A, Stockholm, 17177, Sweden, 46 708561007, joar.von.bahr@ki.se %K AI %K artificial intelligence %K convolutional neural network %K deep learning %K diagnosis %K digital diagnostics %K machine learning %K pathology %K primary health care %K whole slide images %D 2024 %7 1.11.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Digital microscopy combined with artificial intelligence (AI) is increasingly being implemented in health care, predominantly in advanced laboratory settings. However, AI-supported digital microscopy could be especially advantageous in primary health care settings, since such methods could improve access to diagnostics via automation and lead to a decreased need for experts on site. To our knowledge, no scoping or systematic review had been published on the use of AI-supported digital microscopy within primary health care laboratories when this scoping review was initiated. A scoping review can guide future research by providing insights to help navigate the challenges of implementing these novel methods in primary health care laboratories. Objective: The objective of this scoping review is to map peer-reviewed studies on AI-supported digital microscopy in primary health care laboratories to generate an overview of the subject. Methods: A systematic search of the databases PubMed, Web of Science, Embase, and IEEE will be conducted. Only peer-reviewed articles in English will be considered, and no limit on publication year will be applied. The concept inclusion criteria in the scoping review include studies that have applied AI-supported digital microscopy with the aim of achieving a diagnosis on the subject level. In addition, the studies must have been performed in the context of primary health care laboratories, as defined by the criteria of not having a pathologist on site and using simple sample preparations. The study selection and data extraction will be performed by 2 independent researchers, and in the case of disagreements, a third researcher will be involved. The results will be presented in a table developed by the researchers, including information on investigated diseases, sample collection, preparation and digitization, AI model used, and results. 
Furthermore, the results will be described narratively to provide an overview of the studies included. The proposed methodology is in accordance with the JBI methodology for scoping reviews. Results: The scoping review was initiated in January 2023, and a protocol was published in the Open Science Framework in January 2024. The protocol was completed in March 2024, and the systematic search will be performed after the protocol has been peer reviewed. The scoping review is expected to be finalized by the end of 2024. Conclusions: A systematic review of studies on AI-supported digital microscopy in primary health care laboratories is anticipated to identify the diseases where these novel methods could be advantageous, along with the shared challenges encountered and approaches taken to address them. International Registered Report Identifier (IRRID): PRR1-10.2196/58149 %M 39486020 %R 10.2196/58149 %U https://www.researchprotocols.org/2024/1/e58149 %U https://doi.org/10.2196/58149 %U http://www.ncbi.nlm.nih.gov/pubmed/39486020 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58572 %T Automated Speech Analysis for Risk Detection of Depression, Anxiety, Insomnia, and Fatigue: Algorithm Development and Validation Study %A Riad,Rachid %A Denais,Martin %A de Gennes,Marc %A Lesage,Adrien %A Oustric,Vincent %A Cao,Xuan Nga %A Mouchabac,Stéphane %A Bourla,Alexis %+ Callyope, 5 Parvis Alan Turing, Paris, 75013, France, 33 666522141, rachid@callyope.com %K speech analysis %K voice detection %K voice analysis %K speech biomarkers %K speech-based systems %K computer-aided diagnosis %K mental health symptom detection %K machine learning %K mental health %K fatigue %K anxiety %K depression %D 2024 %7 31.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess the limitations of speech-based systems, such as uncertainty, or fairness for a safe clinical deployment. Objective: We investigated the predictive potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia, focusing on other factors than mere accuracy, in the general population. Methods: We included 865 healthy adults and recorded their answers regarding their perceived mental and sleep states. We asked how they felt and if they had slept well lately. Clinically validated questionnaires measuring depression, anxiety, insomnia, and fatigue severity were also used. We developed a novel speech and machine learning pipeline involving voice activity detection, feature extraction, and model training. We automatically modeled speech with pretrained deep learning models that were pretrained on a large, open, and free database, and we selected the best one on the validation set. Based on the best speech modeling approach, clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education) were evaluated. We used a train-validation-test split for all evaluations: to develop our models, select the best ones, and assess the generalizability of held-out data. Results: The best model was Whisper M with a max pooling and oversampling method. 
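The modeling step described above pools frame-level representations from a pretrained speech model (Whisper M) before classification. The sketch below illustrates only that pooling-and-classification pattern on placeholder embeddings; the embedding dimensions, the use of class weighting as a stand-in for oversampling, and the logistic regression classifier are assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: max-pool frame-level speech embeddings, then fit a classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
# Pretend each recording was already encoded to (n_frames, embedding_dim) by a pretrained model.
recordings = [rng.normal(size=(rng.integers(50, 200), 64)) for _ in range(300)]
labels = rng.integers(0, 2, 300)  # e.g., above/below a clinical threshold

# Max pooling over the time axis gives one fixed-length vector per recording.
X = np.stack([emb.max(axis=0) for emb in recordings])

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000, class_weight="balanced")  # stand-in for oversampling
clf.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```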
Our methods achieved good detection performance for all symptoms, depression (Patient Health Questionnaire-9: area under the curve [AUC]=0.76; F1-score=0.49 and Beck Depression Inventory: AUC=0.78; F1-score=0.65), anxiety (Generalized Anxiety Disorder 7-item scale: AUC=0.77; F1-score=0.50), insomnia (Athens Insomnia Scale: AUC=0.73; F1-score=0.62), and fatigue (Multidimensional Fatigue Inventory total score: AUC=0.68; F1-score=0.88). The system performed well when it needed to abstain from making predictions, as demonstrated by low abstention rates in depression detection with the Beck Depression Inventory and fatigue, with risk-coverage AUCs below 0.4. Individual symptom scores were accurately predicted (correlations were all significant with Pearson strengths between 0.31 and 0.49). Fairness analysis revealed that models were consistent for sex (average disparity ratio [DR] 0.86, SD 0.13), to a lesser extent for education level (average DR 0.47, SD 0.30), and worse for age groups (average DR 0.33, SD 0.30). Conclusions: This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating their severity. Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems. %M 39324329 %R 10.2196/58572 %U https://www.jmir.org/2024/1/e58572 %U https://doi.org/10.2196/58572 %U http://www.ncbi.nlm.nih.gov/pubmed/39324329 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e50451 %T AI in Psoriatic Disease: Scoping Review %A Barlow,Richard %A Bewley,Anthony %A Gkini,Maria Angeliki %+ Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, Whitechapel Road, London, E1 1FR, United Kingdom, 44 020 7377 700, gkinimargo@gmail.com %K artificial intelligence %K machine learning %K psoriasis %K psoriatic arthritis %K psoriatic disease %K biologics %K prognostic models %K mobile phone %D 2024 %7 16.10.2024 %9 Review %J JMIR Dermatol %G English %X Background: Artificial intelligence (AI) has many applications in numerous medical fields, including dermatology. Although the majority of AI studies in dermatology focus on skin cancer, there is growing interest in the applicability of AI models in inflammatory diseases, such as psoriasis. Psoriatic disease is a chronic, inflammatory, immune-mediated systemic condition with multiple comorbidities and a significant impact on patients’ quality of life. Advanced treatments, including biologics and small molecules, have transformed the management of psoriatic disease. Nevertheless, there are still considerable unmet needs. Globally, delays in the diagnosis of the disease and its severity are common due to poor access to health care systems. Moreover, despite the abundance of treatments, we are unable to predict which is the right medication for the right patient, especially in resource-limited settings. AI could be an additional tool to address those needs. In this way, we can improve rates of diagnosis, accurately assess severity, and predict outcomes of treatment. Objective: This study aims to provide an up-to-date literature review on the use of AI in psoriatic disease, including diagnostics and clinical management as well as addressing the limitations in applicability. 
Methods: We searched the databases MEDLINE, PubMed, and Embase using the keywords “AI AND psoriasis OR psoriatic arthritis OR psoriatic disease,” “machine learning AND psoriasis OR psoriatic arthritis OR psoriatic disease,” and “prognostic model AND psoriasis OR psoriatic arthritis OR psoriatic disease” until June 1, 2023. Reference lists of relevant papers were also cross-examined for other papers not detected in the initial search. Results: Our literature search yielded 38 relevant papers. AI has been identified as a key component in digital health technologies. Within this field, there is the potential to apply specific techniques such as machine learning and deep learning to address several aspects of managing psoriatic disease. This includes diagnosis, particularly useful for remote teledermatology via photographs taken by patients as well as monitoring and estimating severity. Similarly, AI can be used to synthesize the vast data sets already in place through patient registries which can help identify appropriate biologic treatments for future cohorts and those individuals most likely to develop complications. Conclusions: There are multiple advantageous uses for AI and digital health technologies in psoriatic disease. With wider implementation of AI, we need to be mindful of potential limitations, such as validation and standardization or generalizability of results in specific populations, such as patients with darker skin phototypes. %M 39413371 %R 10.2196/50451 %U https://derma.jmir.org/2024/1/e50451 %U https://doi.org/10.2196/50451 %U http://www.ncbi.nlm.nih.gov/pubmed/39413371 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 9 %N %P e60399 %T Trends in South Korean Medical Device Development for Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder: Narrative Review %A Cho,Yunah %A Talboys,Sharon L %+ Division of Public Health, Department of Family and Preventive Medicine, University of Utah Asia Campus, 119-3 Songdomunhwa-ro, Yeonsu-gu, Incheon, 21985, Republic of Korea, 82 032 626 6901, yunah.cho@utah.edu %K ADHD %K attention-deficit/hyperactivity disorder %K ASD %K autism spectrum disorder %K medical device %K digital therapeutics %D 2024 %7 15.10.2024 %9 Review %J JMIR Biomed Eng %G English %X Background: Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) are among the most prevalent mental disorders among school-aged youth in South Korea and may play a role in the increasing pressures on teachers and school-based special education programming. A lack of support for special education; tensions between teachers, students, and parents; and limited backup for teacher absences are common complaints among Korean educators. New innovations in technology to screen and treat ADHD and ASD may offer relief to students, parents, and teachers through earlier and efficient diagnosis; access to treatment options; and ultimately, better-managed care and expectations. Objective: This narrative literature review provides an account of medical device use and development in South Korea for the diagnosis and management of ADHD and ASD and highlights research gaps. Methods: A narrative review was conducted across 4 databases (PubMed, Korean National Assembly Library, Scopus, and PsycINFO). Journal articles, dissertations, and government research and development reports were included if they discussed medical devices for ADHD and ASD. Only Korean or English papers were included. 
Resources were excluded if they did not correspond to the research objective or did not discuss at least 1 topic about medical devices for ADHD and ASD. Journal articles were excluded if they were not peer reviewed. Resources were limited to publications between 2013 and July 22, 2024. Results: A total of 1794 records about trends in Korean medical device development were categorized into 2 major groups: digital therapeutics and traditional therapy. Digital therapeutics resulted in 5 subgroups: virtual reality and artificial intelligence, machine learning and robot, gaming and visual contents, eye-feedback and movement intervention, and electroencephalography and neurofeedback. Traditional therapy resulted in 3 subgroups: cognitive behavioral therapy and working memory; diagnosis and rating scale; and musical, literary therapy, and mindfulness-based stress reduction. Digital therapeutics using artificial intelligence, machine learning, and electroencephalography technologies account for the biggest portions of development in South Korea, rather than traditional therapies. Most resources, 94.15% (1689/1794), were from the Korean National Assembly Library. Conclusions: Limitations include small sizes of populations to conclude findings in many articles, a lower number of articles discussing medical devices for ASD, and a majority of articles being dissertations. Emerging digital medical devices and those integrated with traditional therapies are important solutions to reducing the prevalence rates of ADHD and ASD in South Korea by promoting early diagnosis and intervention. Furthermore, their application will relieve pressures on teachers and school-based special education programming by providing direct supporting resources to students with ADHD or ASD. Future development of medical devices for ADHD and ASD is predicted to heavily rely on digital technologies, such as those that sense people’s behaviors, eye movement, and brainwaves. %M 39405518 %R 10.2196/60399 %U https://biomedeng.jmir.org/2024/1/e60399 %U https://doi.org/10.2196/60399 %U http://www.ncbi.nlm.nih.gov/pubmed/39405518 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e59810 %T Determinants of Visual Impairment Among Chinese Middle-Aged and Older Adults: Risk Prediction Model Using Machine Learning Algorithms %A Mao,Lijun %A Yu,Zhen %A Lin,Luotao %A Sharma,Manoj %A Song,Hualing %A Zhao,Hailei %A Xu,Xianglong %K visual impairment %K China %K middle-aged and elderly adults %K machine learning %K prediction model %D 2024 %7 9.10.2024 %9 %J JMIR Aging %G English %X Background: Visual impairment (VI) is a prevalent global health issue, affecting over 2.2 billion people worldwide, with nearly half of the Chinese population aged 60 years and older being affected. Early detection of high-risk VI is essential for preventing irreversible vision loss among Chinese middle-aged and older adults. While machine learning (ML) algorithms exhibit significant predictive advantages, their application in predicting VI risk among the general middle-aged and older adult population in China remains limited. Objective: This study aimed to predict VI and identify its determinants using ML algorithms. Methods: We used 19,047 participants from 4 waves of the China Health and Retirement Longitudinal Study (CHARLS) that were conducted between 2011 and 2018. To envisage the prevalence of VI, we generated a geographical distribution map. 
Additionally, we constructed a model using indicators of a self-reported questionnaire, a physical examination, and blood biomarkers as predictors. Multiple ML algorithms, including gradient boosting machine, distributed random forest, the generalized linear model, deep learning, and stacked ensemble, were used for prediction. We plotted receiver operating characteristic and calibration curves to assess the predictive performance. Variable importance analysis was used to identify key predictors. Results: Among all participants, 33.9% (6449/19,047) had VI. Qinghai, Chongqing, Anhui, and Sichuan showed the highest VI rates, while Beijing and Xinjiang had the lowest. The generalized linear model, gradient boosting machine, and stacked ensemble achieved acceptable area under curve values of 0.706, 0.710, and 0.715, respectively, with the stacked ensemble performing best. Key predictors included hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, high-density lipoprotein cholesterol, and arthritis or rheumatism. Conclusions: Nearly one-third of middle-aged and older adults in China had VI. The prevalence of VI shows regional variations, but there are no distinct east-west or north-south distribution differences. ML algorithms demonstrate accurate predictive capabilities for VI. The combination of prediction models and variable importance analysis provides valuable insights for the early identification and intervention of VI among Chinese middle-aged and older adults. %R 10.2196/59810 %U https://aging.jmir.org/2024/1/e59810 %U https://doi.org/10.2196/59810 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56851 %T Development and Validation of a Computed Tomography–Based Model for Noninvasive Prediction of the T Stage in Gastric Cancer: Multicenter Retrospective Study %A Tao,Jin %A Liu,Dan %A Hu,Fu-Bi %A Zhang,Xiao %A Yin,Hongkun %A Zhang,Huiling %A Zhang,Kai %A Huang,Zixing %A Yang,Kun %+ Department of General Surgery and Laboratory of Gastric Cancer, State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Guo Xue street, Chengdu, 610041, China, 86 18980606729, yangkun068@163.com %K gastric cancer %K computed tomography %K radiomics %K T stage %K deep learning %K cancer %K multicenter study %K accuracy %K binary classification %K tumor %K hybrid model %K performance %K pathological stage %D 2024 %7 9.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: As part of the TNM (tumor-node-metastasis) staging system, T staging based on tumor depth is crucial for developing treatment plans. Previous studies have constructed a deep learning model based on computed tomographic (CT) radiomic signatures to predict the number of lymph node metastases and survival in patients with resected gastric cancer (GC). However, few studies have reported the combination of deep learning and radiomics in predicting T staging in GC. Objective: This study aimed to develop a CT-based model for automatic prediction of the T stage of GC via radiomics and deep learning. Methods: A total of 771 GC patients from 3 centers were retrospectively enrolled and divided into training, validation, and testing cohorts. Patients with GC were classified into mild (stage T1 and T2), moderate (stage T3), and severe (stage T4) groups. 
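Returning briefly to the visual impairment model above: the abstract reports that a stacked ensemble over learners such as gradient boosting, a random forest, and a generalized linear model performed best. The following is a minimal sketch of such a stack using scikit-learn with placeholder data; the specific base learners, hyperparameters, and features are assumptions and do not reproduce the CHARLS analysis.

```python
# Hypothetical sketch: stacking gradient boosting and a random forest with a GLM-style meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # GLM-style meta-learner
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacked ensemble AUC:", roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1]))
```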
Three predictive models based on the labeled CT images were constructed using the radiomics features (radiomics model), deep features (deep learning model), and a combination of both (hybrid model). Results: The overall classification accuracy of the radiomics model was 64.3% in the internal testing data set. The deep learning model and hybrid model showed better performance than the radiomics model, with overall classification accuracies of 75.7% (P=.04) and 81.4% (P=.001), respectively. On the subtasks of binary classification of tumor severity, the areas under the curve of the radiomics, deep learning, and hybrid models were 0.875, 0.866, and 0.886 in the internal testing data set and 0.820, 0.818, and 0.972 in the external testing data set, respectively, for differentiating mild (stage T1~T2) from nonmild (stage T3~T4) patients, and were 0.815, 0.892, and 0.894 in the internal testing data set and 0.685, 0.808, and 0.897 in the external testing data set, respectively, for differentiating nonsevere (stage T1~T3) from severe (stage T4) patients. Conclusions: The hybrid model integrating radiomics features and deep features showed favorable performance in diagnosing the pathological stage of GC. %M 39382960 %R 10.2196/56851 %U https://www.jmir.org/2024/1/e56851 %U https://doi.org/10.2196/56851 %U http://www.ncbi.nlm.nih.gov/pubmed/39382960 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e63010 %T Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: Cross-Sectional Study %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Ito,Takahiro %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 282861111, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K clinical decision support %K diagnostic excellence %K generative artificial intelligence %K large language models %K natural language processing %D 2024 %7 2.10.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Generative artificial intelligence (GAI) systems by Google have recently been updated from Bard to Gemini and Gemini Advanced as of December 2023. Gemini is a basic, free-to-use model after a user’s login, while Gemini Advanced operates on a more advanced model requiring a fee-based subscription. These systems have the potential to enhance medical diagnostics. However, the impact of these updates on comprehensive diagnostic accuracy remains unknown. Objective: This study aimed to compare the accuracy of the differential diagnosis lists generated by Gemini Advanced, Gemini, and Bard across comprehensive medical fields using case report series. Methods: We identified a case report series with relevant final diagnoses published in the American Journal Case Reports from January 2022 to March 2023. After excluding nondiagnostic cases and patients aged 10 years and younger, we included the remaining case reports. After refining the case parts as case descriptions, we input the same case descriptions into Gemini Advanced, Gemini, and Bard to generate the top 10 differential diagnosis lists. In total, 2 expert physicians independently evaluated whether the final diagnosis was included in the lists and its ranking. Any discrepancies were resolved by another expert physician. 
Bonferroni correction was applied to adjust the P values for the number of comparisons among 3 GAI systems, setting the corrected significance level at P value <.02. Results: In total, 392 case reports were included. The inclusion rates of the final diagnosis within the top 10 differential diagnosis lists were 73% (286/392) for Gemini Advanced, 76.5% (300/392) for Gemini, and 68.6% (269/392) for Bard. The top diagnoses matched the final diagnoses in 31.6% (124/392) for Gemini Advanced, 42.6% (167/392) for Gemini, and 31.4% (123/392) for Bard. Gemini demonstrated higher diagnostic accuracy than Bard both within the top 10 differential diagnosis lists (P=.02) and as the top diagnosis (P=.001). In addition, Gemini Advanced achieved significantly lower accuracy than Gemini in identifying the most probable diagnosis (P=.002). Conclusions: The results of this study suggest that Gemini outperformed Bard in diagnostic accuracy following the model update. However, Gemini Advanced requires further refinement to optimize its performance for future artificial intelligence–enhanced diagnostics. These findings should be interpreted cautiously and considered primarily for research purposes, as these GAI systems have not been adjusted for medical diagnostics nor approved for clinical use. %M 39357052 %R 10.2196/63010 %U https://medinform.jmir.org/2024/1/e63010 %U https://doi.org/10.2196/63010 %U http://www.ncbi.nlm.nih.gov/pubmed/39357052 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52323 %T Game-Based Assessment of Peripheral Neuropathy Combining Sensor-Equipped Insoles, Video Games, and AI: Proof-of-Concept Study %A Ming,Antao %A Clemens,Vera %A Lorek,Elisabeth %A Wall,Janina %A Alhajjar,Ahmad %A Galazky,Imke %A Baum,Anne-Katrin %A Li,Yang %A Li,Meng %A Stober,Sebastian %A Mertens,Nils David %A Mertens,Peter Rene %+ University Clinic for Nephrology and Hypertension, Diabetology and Endocrinology, Otto von Guericke University Magdeburg, Leipziger Straße 44, Magdeburg, 39120, Germany, 49 391 6713236, peter.mertens@med.ovgu.de %K diabetes mellitus %K metabolic syndrome %K peripheral neuropathy %K sensor-equipped insoles %K video games %K machine learning %K feature extraction %D 2024 %7 1.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Detecting peripheral neuropathy (PNP) is crucial in preventing complications such as foot ulceration. Clinical examinations for PNP are infrequently provided to patients at high risk due to restrictions on facilities, care providers, or time. A gamified health assessment approach combining wearable sensors holds the potential to address these challenges and provide individuals with instantaneous feedback on their health status. Objective: We aimed to develop and evaluate an application that assesses PNP through video games controlled by pressure sensor–equipped insoles. Methods: In the proof-of-concept exploratory cohort study, a complete game-based framework that allowed the study participant to play 4 video games solely by modulating plantar pressure values was established in an outpatient clinic setting. Foot plantar pressures were measured by the sensor-equipped insole and transferred via Bluetooth to an Android tablet for game control in real time. Game results and sensor data were delivered to the study server for visualization and analysis. Each session lasted about 15 minutes. In total, 299 patients with diabetes mellitus and 30 with metabolic syndrome were tested using the game application. 
Patients’ game performance was initially assessed by hypothesis-driven key capabilities that consisted of reaction time, sensation, skillfulness, balance, endurance, and muscle strength. Subsequently, specific game features were extracted from gaming data sets and compared with nerve conduction study findings, neuropathy symptoms, or disability scores. Multiple machine learning algorithms were applied to 70% (n=122) of acquired data to train predictive models for PNP, while the remaining data were held out for final model evaluation. Results: Overall, clinically evident PNP was present in 247 of 329 (75.1%) participants, with 88 (26.7%) individuals showing asymmetric nerve deficits. In a subcohort (n=37) undergoing nerve conduction study as the gold standard, sensory and motor nerve conduction velocities and nerve amplitudes in lower extremities significantly correlated with 79 game features (|R|>0.4, highest R value +0.65; P<.001; adjusted R2=0.36). Within another subcohort (n=173) with normal cognition and matched covariates (age, sex, BMI, etc), hypothesis-driven key capabilities and specific game features were significantly correlated with the presence of PNP. Predictive models using selected game features achieved 76.1% (left) and 81.7% (right foot) accuracy for PNP detection. Multiclass models yielded an area under the receiver operating characteristic curve of 0.76 (left foot) and 0.72 (right foot) for assessing nerve damage patterns (small, large, or mixed nerve fiber damage). Conclusions: The game-based application presents a promising avenue for PNP screening and classification. Evaluation in expanded cohorts may iteratively optimize artificial intelligence model efficacy. The integration of engaging motivational elements and automated data interpretation will support acceptance as a telemedical application. %M 39353184 %R 10.2196/52323 %U https://www.jmir.org/2024/1/e52323 %U https://doi.org/10.2196/52323 %U http://www.ncbi.nlm.nih.gov/pubmed/39353184 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 8 %N %P e60503 %T Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach %A Xie,Fagen %A Lee,Ming-sum %A Allahwerdy,Salam %A Getahun,Darios %A Wessler,Benjamin %A Chen,Wansu %+ Department of Research and Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 6265643294, fagen.xie@kp.org %K echocardiography report %K heart valve %K stenosis %K regurgitation %K natural language processing %K algorithm %D 2024 %7 30.9.2024 %9 Original Paper %J JMIR Cardio %G English %X Background: Valvular heart disease (VHD) is a leading cause of cardiovascular morbidity and mortality that poses a substantial health care and economic burden on health care systems. Administrative diagnostic codes for ascertaining VHD diagnosis are incomplete. Objective: This study aimed to develop a natural language processing (NLP) algorithm to identify patients with aortic, mitral, tricuspid, and pulmonic valve stenosis and regurgitation from transthoracic echocardiography (TTE) reports within a large integrated health care system. Methods: We used reports from echocardiograms performed in the Kaiser Permanente Southern California (KPSC) health care system between January 1, 2011, and December 31, 2022. 
Related terms/phrases of aortic, mitral, tricuspid, and pulmonic stenosis and regurgitation and their severities were compiled from the literature and enriched with input from clinicians. An NLP algorithm was iteratively developed and fine-trained via multiple rounds of chart review, followed by adjudication. The developed algorithm was applied to 200 annotated echocardiography reports to assess its performance and then the study echocardiography reports. Results: A total of 1,225,270 TTE reports were extracted from KPSC electronic health records during the study period. In these reports, valve lesions identified included 111,300 (9.08%) aortic stenosis, 20,246 (1.65%) mitral stenosis, 397 (0.03%) tricuspid stenosis, 2585 (0.21%) pulmonic stenosis, 345,115 (28.17%) aortic regurgitation, 802,103 (65.46%) mitral regurgitation, 903,965 (73.78%) tricuspid regurgitation, and 286,903 (23.42%) pulmonic regurgitation. Among the valves, 50,507 (4.12%), 22,656 (1.85%), 1685 (0.14%), and 1767 (0.14%) were identified as prosthetic aortic valves, mitral valves, tricuspid valves, and pulmonic valves, respectively. Mild and moderate were the most common severity levels of heart valve stenosis, while trace and mild were the most common severity levels of regurgitation. Males had a higher frequency of aortic stenosis and all 4 valvular regurgitations, while females had more mitral, tricuspid, and pulmonic stenosis. Non-Hispanic Whites had the highest frequency of all 4 valvular stenosis and regurgitations. The distribution of valvular stenosis and regurgitation severity was similar across race/ethnicity groups. Frequencies of aortic stenosis, mitral stenosis, and regurgitation of all 4 heart valves increased with age. In TTE reports with stenosis detected, younger patients were more likely to have mild aortic stenosis, while older patients were more likely to have severe aortic stenosis. However, mitral stenosis was opposite (milder in older patients and more severe in younger patients). In TTE reports with regurgitation detected, younger patients had a higher frequency of severe/very severe aortic regurgitation. In comparison, older patients had higher frequencies of mild aortic regurgitation and severe mitral/tricuspid regurgitation. Validation of the NLP algorithm against the 200 annotated TTE reports showed excellent precision, recall, and F1-scores. Conclusions: The proposed computerized algorithm could effectively identify heart valve stenosis and regurgitation, as well as the severity of valvular involvement, with significant implications for pharmacoepidemiological studies and outcomes research. 
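The valve disease algorithm above is built from curated terms and phrases for each lesion and severity level, applied to echocardiography report text. The snippet below is a deliberately simplified, rule-based sketch of that idea using regular expressions; the phrase lists and the sample report are invented and far smaller than what a validated algorithm would require.

```python
# Hypothetical sketch: rule-based extraction of valve lesions and severity from report text.
import re

SEVERITIES = ["trace", "mild", "moderate", "severe"]
VALVES = ["aortic", "mitral", "tricuspid", "pulmonic"]
LESIONS = ["stenosis", "regurgitation"]

pattern = re.compile(
    rf"(?P<severity>{'|'.join(SEVERITIES)})\s+(?P<valve>{'|'.join(VALVES)})\s+"
    rf"(?:valve\s+)?(?P<lesion>{'|'.join(LESIONS)})",
    re.IGNORECASE,
)

report = "Findings: Mild aortic stenosis. Moderate mitral regurgitation. Trace tricuspid regurgitation."
for m in pattern.finditer(report):
    print(m.group("valve").lower(), m.group("lesion").lower(), "->", m.group("severity").lower())
```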
%M 39348175 %R 10.2196/60503 %U https://cardio.jmir.org/2024/1/e60503 %U https://doi.org/10.2196/60503 %U http://www.ncbi.nlm.nih.gov/pubmed/39348175 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e49720 %T Assessing the Utility of a Patient-Facing Diagnostic Tool Among Individuals With Hypermobile Ehlers-Danlos Syndrome: Focus Group Study %A Goehringer,Jessica %A Kosmin,Abigail %A Laible,Natalie %A Romagnoli,Katrina %+ Department of Genomic Health, Geisinger, 100 North Academy Avenue, Dept of Genomic Health, Danville, PA, 17822, United States, 1 5702141005, jgoehringer@geisinger.edu %K diagnostic tool %K hypermobile Ehlers-Danlos syndrome %K patient experiences %K diagnostic odyssey %K affinity mapping %K mobile health app %K mobile phone %D 2024 %7 26.9.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Hypermobile Ehlers-Danlos syndrome (hEDS), characterized by joint hypermobility, skin laxity, and tissue fragility, is thought to be the most common inherited connective tissue disorder, with millions affected worldwide. Diagnosing this condition remains a challenge that can impact quality of life for individuals with hEDS. Many with hEDS describe extended diagnostic odysseys involving exorbitant time and monetary investment. This delay is due to the complexity of diagnosis, symptom overlap with other conditions, and limited access to providers. Many primary care providers are unfamiliar with hEDS, compounded by genetics clinics that do not accept referrals for hEDS evaluation and long waits for genetics clinics that do evaluate for hEDS, leaving patients without sufficient options. Objective: This study explored the user experience, quality, and utility of a prototype of a patient-facing diagnostic tool intended to support clinician diagnosis for individuals with symptoms of hEDS. The questions included within the prototype are aligned with the 2017 international classification of Ehlers-Danlos syndromes. This study explored how this tool may help patients communicate information about hEDS to their physicians, influencing the diagnosis of hEDS and affecting patient experience. Methods: Participants clinically diagnosed with hEDS were recruited from either a medical center or private groups on a social media platform. Interested participants provided verbal consent, completed questionnaires about their diagnosis, and were invited to join an internet-based focus group to share their thoughts and opinions on a diagnostic tool prototype. Participants were invited to complete the Mobile App Rating Scale (MARS) to evaluate their experience viewing the diagnostic tool. The MARS is a framework for evaluating mobile health apps across 4 dimensions: engagement, functionality, esthetics, and information quality. Qualitative data were analyzed using affinity mapping to organize information and inductively create themes that were categorized within the MARS framework dimensions to help identify strengths and weaknesses of the diagnostic tool prototype. Results: In total, 15 individuals participated in the internet-based focus groups; 3 (20%) completed the MARS. Through affinity diagramming, 2 main categories of responses were identified, including responses related to the user interface and responses related to the application of the tool. Each category included several themes and subthemes that mapped well to the 4 MARS dimensions. The analysis showed that the tool held value and utility among the participants diagnosed with hEDS. 
The shareable ending summary sheet provided by the tool stood out as a strength for facilitating communication between patient and provider during the diagnostic evaluation. Conclusions: The results provide insights on the perceived utility and value of the tool, including preferred phrasing, layout and design preferences, and tool accessibility. The participants expressed that the tool may improve the hEDS diagnostic odyssey and help educate providers about the diagnostic process. %M 39325533 %R 10.2196/49720 %U https://formative.jmir.org/2024/1/e49720 %U https://doi.org/10.2196/49720 %U http://www.ncbi.nlm.nih.gov/pubmed/39325533 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58202 %T Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies Using AI (QUADAS-AI): Protocol for a Qualitative Study %A Guni,Ahmad %A Sounderajah,Viknesh %A Whiting,Penny %A Bossuyt,Patrick %A Darzi,Ara %A Ashrafian,Hutan %+ Institute of Global Health Innovation, Imperial College London, 10th Floor QEQM Building, St Mary’s Hospital, Praed St, London, W2 1NY, United Kingdom, 44 2075895111, h.ashrafian@imperial.ac.uk %K artificial intelligence %K AI %K AI-specific quality assessment of diagnostic accuracy studies %K QUADAS-AI %K AI-driven %K diagnostics %K evidence synthesis %K quality assessment %K evaluation %K diagnostic %K accuracy %K bias %K translation %K clinical practice %K assessment tool %K diagnostic service %D 2024 %7 18.9.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Quality assessment of diagnostic accuracy studies (QUADAS), and more recently QUADAS-2, were developed to aid the evaluation of methodological quality within primary diagnostic accuracy studies. However, its current form, QUADAS-2 does not address the unique considerations raised by artificial intelligence (AI)–centered diagnostic systems. The rapid progression of the AI diagnostics field mandates suitable quality assessment tools to determine the risk of bias and applicability, and subsequently evaluate translational potential for clinical practice. Objective: We aim to develop an AI-specific QUADAS (QUADAS-AI) tool that addresses the specific challenges associated with the appraisal of AI diagnostic accuracy studies. This paper describes the processes and methods that will be used to develop QUADAS-AI. Methods: The development of QUADAS-AI can be distilled into 3 broad stages. Stage 1—a project organization phase had been undertaken, during which a project team and a steering committee were established. The steering committee consists of a panel of international experts representing diverse stakeholder groups. Following this, the scope of the project was finalized. Stage 2—an item generation process will be completed following (1) a mapping review, (2) a meta-research study, (3) a scoping survey of international experts, and (4) a patient and public involvement and engagement exercise. Candidate items will then be put forward to the international Delphi panel to achieve consensus for inclusion in the revised tool. A modified Delphi consensus methodology involving multiple online rounds and a final consensus meeting will be carried out to refine the tool, following which the initial QUADAS-AI tool will be drafted. A piloting phase will be carried out to identify components that are considered to be either ambiguous or missing. 
Stage 3—once the steering committee has finalized the QUADAS-AI tool, specific dissemination strategies will be aimed toward academic, policy, regulatory, industry, and public stakeholders, respectively. Results: As of July 2024, the project organization phase, as well as the mapping review and meta-research study, have been completed. We aim to complete the item generation, including the Delphi consensus, and finalize the tool by the end of 2024. Therefore, QUADAS-AI will be able to provide a consensus-derived platform upon which stakeholders may systematically appraise the methodological quality associated with AI diagnostic accuracy studies by the beginning of 2025. Conclusions: AI-driven systems comprise an increasingly significant proportion of research in clinical diagnostics. Through this process, QUADAS-AI will aid the evaluation of studies in this domain in order to identify bias and applicability concerns. As such, QUADAS-AI may form a key part of clinical, governmental, and regulatory evaluation frameworks for AI diagnostic systems globally. International Registered Report Identifier (IRRID): DERR1-10.2196/58202 %M 39293047 %R 10.2196/58202 %U https://www.researchprotocols.org/2024/1/e58202 %U https://doi.org/10.2196/58202 %U http://www.ncbi.nlm.nih.gov/pubmed/39293047 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e54655 %T Investigating Acoustic and Psycholinguistic Predictors of Cognitive Impairment in Older Adults: Modeling Study %A Badal,Varsha D %A Reinen,Jenna M %A Twamley,Elizabeth W %A Lee,Ellen E %A Fellows,Robert P %A Bilal,Erhan %A Depp,Colin A %+ IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY, United States, 1 9149453000, ebilal@us.ibm.com %K acoustic %K psycholinguistic %K speech %K speech marker %K speech markers %K cognitive impairment %K CI %K mild cognitive impairment %K MCI %K cognitive disability %K cognitive restriction %K cognitive limitation %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K early warning %K early detection %K NLP %K natural language processing %K Alzheimer %K dementia %K neurological decline %K neurocognition %K neurocognitive disorder %D 2024 %7 16.9.2024 %9 Original Paper %J JMIR Aging %G English %X Background: About one-third of older adults aged 65 years and older often have mild cognitive impairment or dementia. Acoustic and psycho-linguistic features derived from conversation may be of great diagnostic value because speech involves verbal memory and cognitive and neuromuscular processes. The relative decline in these processes, however, may not be linear and remains understudied. Objective: This study aims to establish associations between cognitive abilities and various attributes of speech and natural language production. To date, the majority of research has been cross-sectional, relying mostly on data from structured interactions and restricted to textual versus acoustic analyses. Methods: In a sample of 71 older (mean age 83.3, SD 7.0 years) community-dwelling adults who completed qualitative interviews and cognitive testing, we investigated the performance of both acoustic and psycholinguistic features associated with cognitive deficits contemporaneously and at a 1-2 years follow up (mean follow-up time 512.3, SD 84.5 days). 
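The study above models cognitive deficits from a combination of acoustic and psycholinguistic features. The sketch below shows the general pattern of concatenating two feature blocks and evaluating a classifier with cross-validated F1-scores; the feature counts, the random forest classifier, and the data are placeholders rather than the authors' specification.

```python
# Hypothetical sketch: concatenate acoustic and psycholinguistic feature blocks, then classify.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 71
acoustic = rng.normal(size=(n, 12))         # e.g., pauses, nonverbal vocalization rates
psycholinguistic = rng.normal(size=(n, 8))  # e.g., vocabulary richness, speech quantity
y = rng.integers(0, 2, n)                   # 1 = cognitive deficit in a given domain

X = np.hstack([acoustic, psycholinguistic])
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5, scoring="f1")
print("Cross-validated F1:", scores.mean().round(2))
```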
Results: Combined acoustic and psycholinguistic features achieved high performance (F1-scores 0.73-0.86) and sensitivity (up to 0.90) in estimating cognitive deficits across multiple domains. Performance remained high when acoustic and psycholinguistic features were used to predict follow-up cognitive performance. The psycholinguistic features that were most successful at classifying high cognitive impairment reflected vocabulary richness, the quantity of speech produced, and the fragmentation of speech, whereas the analogous top-ranked acoustic features reflected breathing and nonverbal vocalizations such as giggles or laughter. Conclusions: These results suggest that both acoustic and psycholinguistic features extracted from qualitative interviews may be reliable markers of cognitive deficits in late life. %M 39283659 %R 10.2196/54655 %U https://aging.jmir.org/2024/1/e54655 %U https://doi.org/10.2196/54655 %U http://www.ncbi.nlm.nih.gov/pubmed/39283659 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58347 %T Alarm Management in Provisional COVID-19 Intensive Care Units: Retrospective Analysis and Recommendations for Future Pandemics %A Wunderlich,Maximilian Markus %A Frey,Nicolas %A Amende-Wolf,Sandro %A Hinrichs,Carl %A Balzer,Felix %A Poncette,Akira-Sebastian %+ Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Invalidenstraße 90, Berlin, 10115 Berlin, Germany, 49 030 450 581 018, akira-sebastian.poncette@charite.de %K patient monitoring %K intensive care unit %K ICU %K alarm fatigue %K alarm management %K patient safety %K alarm system %K alarm system quality %K medical devices %K clinical alarms %K COVID-19 %D 2024 %7 9.9.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: In response to the high patient admission rates during the COVID-19 pandemic, provisional intensive care units (ICUs) were set up, equipped with temporary monitoring and alarm systems. We sought to find out whether the provisional ICU setting led to a higher alarm burden and more staff with alarm fatigue. Objective: We aimed to compare alarm situations between provisional COVID-19 ICUs and non–COVID-19 ICUs during the second COVID-19 wave in Berlin, Germany. The study focused on measuring alarms per bed per day, identifying medical devices with higher alarm frequencies in COVID-19 settings, evaluating the median duration of alarms in both types of ICUs, and assessing the level of alarm fatigue experienced by health care staff. Methods: Our approach involved a comparative analysis of alarm data from 2 provisional COVID-19 ICUs and 2 standard non–COVID-19 ICUs. Through interviews with medical experts, we formulated hypotheses about potential differences in alarm load, alarm duration, alarm types, and staff alarm fatigue between the 2 ICU types. We analyzed alarm log data from the patient monitoring systems of all 4 ICUs to inferentially assess the differences. In addition, we assessed staff alarm fatigue with a questionnaire, aiming to comprehensively understand the impact of the alarm situation on health care personnel. Results: COVID-19 ICUs had significantly more alarms per bed per day than non–COVID-19 ICUs (P<.001), and the majority of the staff lacked experience with the alarm system. The overall median alarm duration was similar in both ICU types. We found no COVID-19–specific alarm patterns. 
The alarm fatigue questionnaire results suggest that staff in both types of ICUs experienced alarm fatigue. However, physicians and nurses who were working in COVID-19 ICUs reported a significantly higher level of alarm fatigue (P=.04). Conclusions: Staff in COVID-19 ICUs were exposed to a higher alarm load, and the majority lacked experience with alarm management and the alarm system. We recommend training and educating ICU staff in alarm management, emphasizing the importance of alarm management training as part of the preparations for future pandemics. However, the limitations of our study design and the specific pandemic conditions warrant further studies to confirm these findings and to explore effective alarm management strategies in different ICU settings. %M 39250783 %R 10.2196/58347 %U https://medinform.jmir.org/2024/1/e58347 %U https://doi.org/10.2196/58347 %U http://www.ncbi.nlm.nih.gov/pubmed/39250783 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 9 %N %P e59867 %T Implementation of Artificial Intelligence–Based Diabetic Retinopathy Screening in a Tertiary Care Hospital in Quebec: Prospective Validation Study %A Antaki,Fares %A Hammana,Imane %A Tessier,Marie-Catherine %A Boucher,Andrée %A David Jetté,Maud Laurence %A Beauchemin,Catherine %A Hammamji,Karim %A Ong,Ariel Yuhan %A Rhéaume,Marc-André %A Gauthier,Danny %A Harissi-Dagher,Mona %A Keane,Pearse A %A Pomp,Alfons %+ Institute of Ophthalmology, University College London, 11-43 Bath St, London, EC1V 9EL, United Kingdom, 44 20 7608 6800, f.antaki@ucl.ac.uk %K artificial intelligence %K diabetic retinopathy %K screening %K clinical validation %K diabetic %K diabetes %K screening %K tertiary care hospital %K validation study %K Quebec %K Canada %K vision %K vision loss %K ophthalmological %K AI %K detection %K eye %D 2024 %7 3.9.2024 %9 Original Paper %J JMIR Diabetes %G English %X Background: Diabetic retinopathy (DR) affects about 25% of people with diabetes in Canada. Early detection of DR is essential for preventing vision loss. Objective: We evaluated the real-world performance of an artificial intelligence (AI) system that analyzes fundus images for DR screening in a Quebec tertiary care center. Methods: We prospectively recruited adult patients with diabetes at the Centre hospitalier de l’Université de Montréal (CHUM) in Montreal, Quebec, Canada. Patients underwent dual-pathway screening: first by the Computer Assisted Retinal Analysis (CARA) AI system (index test), then by standard ophthalmological examination (reference standard). We measured the AI system's sensitivity and specificity for detecting referable disease at the patient level, along with its performance for detecting any retinopathy and diabetic macular edema (DME) at the eye level, and potential cost savings. Results: This study included 115 patients. CARA demonstrated a sensitivity of 87.5% (95% CI 71.9-95.0) and specificity of 66.2% (95% CI 54.3-76.3) for detecting referable disease at the patient level. For any retinopathy detection at the eye level, CARA showed 88.2% sensitivity (95% CI 76.6-94.5) and 71.4% specificity (95% CI 63.7-78.1). For DME detection, CARA had 100% sensitivity (95% CI 64.6-100) and 81.9% specificity (95% CI 75.6-86.8). Potential yearly savings from implementing CARA at the CHUM were estimated at CAD $245,635 (US $177,643.23, as of July 26, 2024) considering 5000 patients with diabetes. 
Conclusions: Our study indicates that integrating a semiautomated AI system for DR screening demonstrates high sensitivity for detecting referable disease in a real-world setting. This system has the potential to improve screening efficiency and reduce costs at the CHUM, but more work is needed to validate it. %M 39226095 %R 10.2196/59867 %U https://diabetes.jmir.org/2024/1/e59867 %U https://doi.org/10.2196/59867 %U http://www.ncbi.nlm.nih.gov/pubmed/39226095 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58456 %T Impact of an Electronic Health Record–Based Interruptive Alert Among Patients With Headaches Seen in Primary Care: Cluster Randomized Controlled Trial %A Pradhan,Apoorva %A Wright,Eric A %A Hayduk,Vanessa A %A Berhane,Juliana %A Sponenberg,Mallory %A Webster,Leeann %A Anderson,Hannah %A Park,Siyeon %A Graham,Jove %A Friedenberg,Scott %K headache management %K migraine management %K electronic health record–based alerts %K primary care %K clinician decision support tools %K electronic health record %K EHR %D 2024 %7 29.8.2024 %9 %J JMIR Med Inform %G English %X Background: Headaches, including migraines, are one of the most common causes of disability and account for nearly 20%-30% of referrals from primary care to neurology. In primary care, electronic health record–based alerts offer a mechanism to influence health care provider behaviors, manage neurology referrals, and optimize headache care. Objective: This project aimed to evaluate the impact of an electronic alert implemented in primary care on patients’ overall headache management. Methods: We conducted a stratified cluster-randomized study across 38 primary care clinic sites between December 2021 and December 2022 at a large integrated health care delivery system in the United States. Clinics were stratified into 6 blocks based on region and patient-to–health care provider ratios and then 1:1 randomized within each block into either the control or intervention arm. Health care providers practicing at intervention clinics received an interruptive alert in the electronic health record. The primary end point was a change in headache burden, measured using the Headache Impact Test 6 scale, from baseline to 6 months. Secondary outcomes included changes in headache frequency and intensity, access to care, and resource use. We analyzed the difference-in-differences between the arms at follow-up at the individual patient level. Results: We enrolled 203 adult patients with a confirmed headache diagnosis. At baseline, the average Headache Impact Test 6 scores in each arm were not significantly different (intervention: mean 63, SD 6.9; control: mean 61.8, SD 6.6; P=.21). We observed a significant reduction in the headache burden only in the intervention arm at follow-up (3.5 points; P=.009). The reduction in the headache burden was not statistically different between groups (difference-in-differences estimate –1.89, 95% CI –5 to 1.31; P=.25). Similarly, secondary outcomes were not significantly different between groups. Only 11.32% (303/2677) of alerts were acted upon. Conclusions: The use of an interruptive electronic alert did not significantly improve headache outcomes. Low use of alerts by health care providers prompts future alterations of the alert and exploration of alternative approaches. 
Trial Registration: ClinicalTrials.gov NCT05067725; https://clinicaltrials.gov/study/NCT05067725 %R 10.2196/58456 %U https://medinform.jmir.org/2024/1/e58456 %U https://doi.org/10.2196/58456 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55476 %T Recognition of Forward Head Posture Through 3D Human Pose Estimation With a Graph Convolutional Network: Development and Feasibility Study %A Lee,Haedeun %A Oh,Bumjo %A Kim,Seung-Chan %+ Machine Learning Systems Laboratory, School of Sports Science, Sungkyunkwan University, Seoburo 2066, Suwon, Gyunggi-do, 16419, Republic of Korea, 82 31 299 6918, seungchan@ieee.org %K posture correction %K injury prediction %K human pose estimation %K forward head posture %K machine learning %K graph convolutional networks %K posture %K graph neural network %K graph %K pose %K postural %K deep learning %K neural network %K neural networks %K upper %K algorithms %D 2024 %7 26.8.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Prolonged improper posture can lead to forward head posture (FHP), causing headaches, impaired respiratory function, and fatigue. This is especially relevant in sedentary scenarios, where individuals often maintain static postures for extended periods—a significant part of daily life for many. The development of a system capable of detecting FHP is crucial, as it would not only alert users to correct their posture but also serve the broader goal of contributing to public health by preventing the progression of chronic injuries associated with this condition. However, despite significant advancements in estimating human poses from standard 2D images, most computational pose models do not include measurements of the craniovertebral angle, which involves the C7 vertebra and is crucial for diagnosing FHP. Objective: Accurate diagnosis of FHP typically requires dedicated devices, such as clinical postural assessments or specialized imaging equipment, but their use is impractical for continuous, real-time monitoring in everyday settings. Therefore, an accessible, efficient method for regular posture assessment that can be easily integrated into daily activities, provides real-time feedback, and promotes corrective action is necessary. Methods: The system sequentially estimates 2D and 3D human anatomical key points from a provided 2D image, using the Detectron2D and VideoPose3D algorithms, respectively. It then uses a graph convolutional network (GCN), explicitly crafted to analyze the spatial configuration and alignment of the upper body’s anatomical key points in 3D space. This GCN aims to implicitly learn the intricate relationship between the estimated 3D key points and the correct posture, specifically to identify FHP. Results: The test accuracy was 78.27% when inputs included all joints corresponding to the upper body key points. The GCN model demonstrated slightly superior balanced performance across classes with an F1-score (macro) of 77.54%, compared to the baseline feedforward neural network (FFNN) model’s 75.88%. Specifically, the GCN model showed a more balanced precision and recall between the classes, suggesting its potential for better generalization in FHP detection across diverse postures. Meanwhile, the baseline FFNN model demonstrated higher precision for FHP cases but at the cost of lower recall, indicating that while it is more accurate in confirming FHP when detected, it misses a significant number of actual FHP instances. 
This assertion is further substantiated by the examination of the latent feature space using t-distributed stochastic neighbor embedding, where the GCN model presented an isotropic distribution, unlike the FFNN model, which showed an anisotropic distribution. Conclusions: Using 3D human pose estimation joints derived from 2D image input, we found that it is possible to learn FHP-related features with the proposed GCN-based network and thereby develop a posture correction system. We conclude the paper by addressing the limitations of our current system and proposing potential avenues for future work in this area. %M 39186772 %R 10.2196/55476 %U https://formative.jmir.org/2024/1/e55476 %U https://doi.org/10.2196/55476 %U http://www.ncbi.nlm.nih.gov/pubmed/39186772 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55641 %T Comparing the Output of an Artificial Intelligence Algorithm in Detecting Radiological Signs of Pulmonary Tuberculosis in Digital Chest X-Rays and Their Smartphone-Captured Photos of X-Ray Films: Retrospective Study %A Ridhi,Smriti %A Robert,Dennis %A Soren,Pitamber %A Kumar,Manish %A Pawar,Saniya %A Reddy,Bhargava %+ Qure.ai, 2nd floor, Prestige Summit, Halasuru, Bangalore, 560042, India, 91 9611981003, dennis.robert.nm@gmail.com %K artificial intelligence %K AI %K deep learning %K early detection %K tuberculosis %K TB %K computer-aided detection %K diagnostic accuracy %K chest x-ray %K mobile phone %D 2024 %7 21.8.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI)–based computer-aided detection devices are recommended for screening and triaging of pulmonary tuberculosis (TB) using digital chest x-ray (CXR) images (soft copies). Most AI algorithms are trained using input data from digital CXR Digital Imaging and Communications in Medicine (DICOM) files. There can be scenarios when only digital CXR films (hard copies) are available for interpretation. A smartphone-captured photo of the digital CXR film may be used for AI to process in such a scenario. There is a gap in the literature investigating whether the performance of AI algorithms differs significantly when digital CXR DICOM files, rather than photos of the digital CXR films, are used as input. Objective: The primary objective was to compare the agreement of AI in detecting radiological signs of TB when using DICOM files (denoted as CXRd) as input versus when using smartphone-captured photos of digital CXR films (denoted as CXRp) with human readers. Methods: Pairs of CXRd and CXRp images were obtained retrospectively from patients screened for TB. AI results were obtained using both the CXRd and CXRp files. The majority consensus on the presence or absence of TB in CXR pairs was obtained from a panel of 3 independent radiologists. The positive and negative percent agreement of AI in detecting radiological signs of TB in CXRd and CXRp were estimated by comparing with the majority consensus. The distribution of AI probability scores was also compared. Results: A total of 1278 CXR pairs were analyzed. The positive percent agreement of AI was found to be 92.22% (95% CI 89.94-94.12) and 90.75% (95% CI 88.32-92.82), respectively, for CXRd and CXRp images (P=.09). The negative percent agreement of AI was 82.08% (95% CI 78.76-85.07) and 79.23% (95% CI 75.75-82.42), respectively, for CXRd and CXRp images (P=.06). 
The median of the AI probability score was 0.72 (IQR 0.11-0.97) in CXRd and 0.72 (IQR 0.14-0.96) in CXRp images (P=.75). Conclusions: We did not observe any statistically significant differences in the output of AI in digital CXRs and photos of digital CXR films. %M 39167435 %R 10.2196/55641 %U https://formative.jmir.org/2024/1/e55641 %U https://doi.org/10.2196/55641 %U http://www.ncbi.nlm.nih.gov/pubmed/39167435 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e46946 %T Exploring Computational Techniques in Preprocessing Neonatal Physiological Signals for Detecting Adverse Outcomes: Scoping Review %A Rahman,Jessica %A Brankovic,Aida %A Tracy,Mark %A Khanna,Sankalp %+ Commonwealth Scientific and Industrial Research Organisation (CSIRO) Australian e-Health Research Centre, Australia, 160 Hawkesbury Road, Sydney, 2145, Australia, 61 02 9325 3016, jessica.rahman@csiro.au %K physiological signals %K preterm %K neonatal intensive care unit %K morbidity %K signal processing %K signal analysis %K adverse outcomes %K predictive and diagnostic models %D 2024 %7 20.8.2024 %9 Review %J Interact J Med Res %G English %X Background: Computational signal preprocessing is a prerequisite for developing data-driven predictive models for clinical decision support. Thus, identifying the best practices that adhere to clinical principles is critical to ensure transparency and reproducibility to drive clinical adoption. It further fosters reproducible, ethical, and reliable conduct of studies. This procedure is also crucial for setting up a software quality management system to ensure regulatory compliance in developing software as a medical device aimed at early preclinical detection of clinical deterioration. Objective: This scoping review focuses on the neonatal intensive care unit setting and summarizes the state-of-the-art computational methods used for preprocessing neonatal clinical physiological signals; these signals are used for the development of machine learning models to predict the risk of adverse outcomes. Methods: Five databases (PubMed, Web of Science, Scopus, IEEE, and ACM Digital Library) were searched using a combination of keywords and MeSH (Medical Subject Headings) terms. A total of 3585 papers from 2013 to January 2023 were identified based on the defined search terms and inclusion criteria. After removing duplicates, 2994 (83.51%) papers were screened by title and abstract, and 81 (0.03%) were selected for full-text review. Of these, 52 (64%) were eligible for inclusion in the detailed analysis. Results: Of the 52 articles reviewed, 24 (46%) studies focused on diagnostic models, while the remainder (n=28, 54%) focused on prognostic models. The analysis conducted in these studies involved various physiological signals, with electrocardiograms being the most prevalent. Different programming languages were used, with MATLAB and Python being notable. The monitoring and capturing of physiological data used diverse systems, impacting data quality and introducing study heterogeneity. Outcomes of interest included sepsis, apnea, bradycardia, mortality, necrotizing enterocolitis, and hypoxic-ischemic encephalopathy, with some studies analyzing combinations of adverse outcomes. We found a partial or complete lack of transparency in reporting the setting and the methods used for signal preprocessing. 
This includes reporting methods to handle missing data, the segment size considered for analysis, and details regarding the modification of the state-of-the-art methods for physiological signal processing to align with the clinical principles for neonates. Only 7 (13%) of the 52 reviewed studies reported all the recommended preprocessing steps, which could affect the downstream analysis. Conclusions: The review found heterogeneity in the techniques used and inconsistent reporting of parameters and procedures used for preprocessing neonatal physiological signals; consistent reporting is necessary to confirm adherence to clinical and software quality management system practices, usefulness, and choice of best practices. Enhancing transparency in reporting and standardizing procedures will boost study interpretation and reproducibility and expedite clinical adoption, instilling confidence in the research findings and streamlining the translation of research outcomes into clinical practice, ultimately contributing to the advancement of neonatal care and patient outcomes. %M 39163610 %R 10.2196/46946 %U https://www.i-jmr.org/2024/1/e46946 %U https://doi.org/10.2196/46946 %U http://www.ncbi.nlm.nih.gov/pubmed/39163610 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57162 %T Evaluation of AI-Driven LabTest Checker for Diagnostic Accuracy and Safety: Prospective Cohort Study %A Szumilas,Dawid %A Ochmann,Anna %A Zięba,Katarzyna %A Bartoszewicz,Bartłomiej %A Kubrak,Anna %A Makuch,Sebastian %A Agrawal,Siddarth %A Mazur,Grzegorz %A Chudek,Jerzy %K LabTest Checker %K CDSS %K symptom checker %K laboratory testing %K AI %K assessment %K accuracy %K artificial intelligence %K health care %K medical fields %K clinical decision support systems %K application %K applications %K diagnoses %K patients %K patient %K medical history %K tool %K tools %D 2024 %7 14.8.2024 %9 %J JMIR Med Inform %G English %X Background: In recent years, the implementation of artificial intelligence (AI) in health care has been progressively transforming medical fields, with the use of clinical decision support systems (CDSSs) as a notable application. Laboratory tests are vital for accurate diagnoses, but the increasing reliance on them presents challenges. The need for effective strategies for managing laboratory test interpretation is evident from the millions of monthly searches on test results’ significance. As the potential role of CDSSs in laboratory diagnostics gains significance, however, more research is needed to explore this area. Objective: The primary objective of our study was to assess the accuracy and safety of LabTest Checker (LTC), a CDSS designed to support medical diagnoses by analyzing both laboratory test results and patients’ medical histories. Methods: This cohort study embraced a prospective data collection approach. A total of 101 patients aged ≥18 years, in stable condition, and requiring comprehensive diagnosis were enrolled. A panel of blood laboratory tests was conducted for each participant. Participants used LTC for test result interpretation. The accuracy and safety of the tool were assessed by comparing AI-generated suggestions to experienced doctor (consultant) recommendations, which are considered the gold standard. Results: The system achieved a 74.3% accuracy and 100% sensitivity for emergency safety and 92.3% sensitivity for urgent cases. It potentially reduced unnecessary medical visits by 41.6% (42/101) and achieved an 82.9% accuracy in identifying underlying pathologies. 
Conclusions: This study underscores the transformative potential of AI-based CDSSs in laboratory diagnostics, contributing to enhanced patient care, efficient health care systems, and improved medical outcomes. LTC’s performance evaluation highlights the advancements in AI’s role in laboratory medicine. Trial Registration: ClinicalTrials.gov NCT05813938; https://clinicaltrials.gov/study/NCT05813938 %R 10.2196/57162 %U https://medinform.jmir.org/2024/1/e57162 %U https://doi.org/10.2196/57162 %0 Journal Article %@ 2152-7202 %I JMIR Publications %V 16 %N %P e55705 %T Shifting Grounds—Facilitating Self-Care in Testing for Sexually Transmitted Infections Through the Use of Self-Test Technology: Qualitative Study %A Trettin,Bettina %A Skjøth,Mette Maria %A Munk,Nadja Trier %A Vestergaard,Tine %A Nielsen,Charlotte %+ Department of Dermatology and Allergy Centre, Odense University Hospital, J. B. Winsløws Vej 4, Odense, 5000, Denmark, 45 60494279, bettina.trettin@rsyd.dk %K chlamydia %K sexually transmitted diseases %K participatory design %K self-test %K qualitative %K Chlamydia trachomatis %K lymphogranuloma venereum %K participatory %K STD %K STDs %K sexually transmitted %K sexually transmitted illness %K sexually transmitted illnesses %K STI %K STIs %K participatory %K participation %K self-testing %K screening %K health screening %K asymptomatic screening %K testing uptake %D 2024 %7 14.8.2024 %9 Original Paper %J J Particip Med %G English %X Background: Chlamydia remains prevalent worldwide and is considered a global public health problem. However, testing rates among young sexually active people remain low. Effective clinical management relies on screening asymptomatic patients. However, attending face-to-face consultations of testing for sexually transmitted infections is associated with stigmatization and anxiety. Self-testing technology (STT) allows patients to test themselves for chlamydia and gonorrhea without the presence of health care professionals. This may result in wider access to testing and increase testing uptake. Therefore, the sexual health clinic at Odense University Hospital has designed and developed a technology that allows patients to get tested at the clinic through self-collected sampling without a face-to-face consultation. Objective: This study aimed to (1) pilot-test STT used in clinical practice and (2) investigate the experiences of patients who have completed a self-test for chlamydia and gonorrhea. Methods: The study was conducted as a qualitative study inspired by the methodology of participatory design. Ethnographic methods were applied in the feasibility study and the data analyzed were inspired by the action research spiral in iterative processes using steps, such as plan, act, observe, and reflect. The qualitative evaluation study used semistructured interviews and data were analyzed using a qualitative 3-level analytical model. Results: The findings from the feasibility study, such as lack of signposting and adequate information, led to the final modifications of the self-test technology and made it possible to implement it in clinical practice. The qualitative evaluation study found that self-testing was seen as more appealing than testing at a face-to-face consultation because it was an easy solution that both saved time and allowed for the freedom to plan the visit independently. Security was experienced when the instructions balanced between being detail-oriented while also being simple and illustrative. 
The anonymity and discretion contributed to preserving privacy and removed the fear of an awkward conversation or being judged by health care professionals thus leading to the reduction of intrusive feelings. Conclusions: Accessible health care services are crucial in preventing and reducing the impact of sexually transmitted infections and STT may have the potential to increase testing uptake as it takes into account some of the barriers that exist. The pilot test and evaluation have resulted in a fully functioning implementation of STT in clinical practice. %M 39141903 %R 10.2196/55705 %U https://jopm.jmir.org/2024/1/e55705 %U https://doi.org/10.2196/55705 %U http://www.ncbi.nlm.nih.gov/pubmed/39141903 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58599 %T Building Dual AI Models and Nomograms Using Noninvasive Parameters for Aiding Male Bladder Outlet Obstruction Diagnosis and Minimizing the Need for Invasive Video-Urodynamic Studies: Development and Validation Study %A Tsai,Chung-You %A Tian,Jing-Hui %A Lee,Chien-Cheng %A Kuo,Hann-Chorng %+ Department of Urology, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation and Tzu Chi University, 707, Section 3, Chung-Yang Road, Hualien, 970, Taiwan, 886 3 856 1825, hck@tzuchi.com.tw %K bladder outlet obstruction %K lower urinary tract symptoms %K machine learning %K nomogram %K artificial intelligence %K video urodynamic study %D 2024 %7 23.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Diagnosing underlying causes of nonneurogenic male lower urinary tract symptoms associated with bladder outlet obstruction (BOO) is challenging. Video-urodynamic studies (VUDS) and pressure-flow studies (PFS) are both invasive diagnostic methods for BOO. VUDS can more precisely differentiate etiologies of male BOO, such as benign prostatic obstruction, primary bladder neck obstruction, and dysfunctional voiding, potentially outperforming PFS. Objective: These examinations’ invasive nature highlights the need for developing noninvasive predictive models to facilitate BOO diagnosis and reduce the necessity for invasive procedures. Methods: We conducted a retrospective study with a cohort of men with medication-refractory, nonneurogenic lower urinary tract symptoms suspected of BOO who underwent VUDS from 2001 to 2022. In total, 2 BOO predictive models were developed—1 based on the International Continence Society’s definition (International Continence Society–defined bladder outlet obstruction; ICS-BOO) and the other on video-urodynamic studies–diagnosed bladder outlet obstruction (VBOO). The patient cohort was randomly split into training and test sets for analysis. A total of 6 machine learning algorithms, including logistic regression, were used for model development. During model development, we first performed development validation using repeated 5-fold cross-validation on the training set and then test validation to assess the model’s performance on an independent test set. Both models were implemented as paper-based nomograms and integrated into a web-based artificial intelligence prediction tool to aid clinical decision-making. Results: Among 307 patients, 26.7% (n=82) met the ICS-BOO criteria, while 82.1% (n=252) were diagnosed with VBOO. The ICS-BOO prediction model had a mean area under the receiver operating characteristic curve (AUC) of 0.74 (SD 0.09) and mean accuracy of 0.76 (SD 0.04) in development validation and AUC and accuracy of 0.86 and 0.77, respectively, in test validation. 
The VBOO prediction model yielded a mean AUC of 0.71 (SD 0.06) and mean accuracy of 0.77 (SD 0.06) internally, with AUC and accuracy of 0.72 and 0.76, respectively, externally. When both models’ predictions are applied to the same patient, their combined insights can significantly enhance clinical decision-making and simplify the diagnostic pathway. By the dual-model prediction approach, if both models positively predict BOO, suggesting all cases actually resulted from medication-refractory primary bladder neck obstruction or benign prostatic obstruction, surgical intervention may be considered. Thus, VUDS might be unnecessary for 100 (32.6%) patients. Conversely, when ICS-BOO predictions are negative but VBOO predictions are positive, indicating varied etiology, VUDS rather than PFS is advised for precise diagnosis and guiding subsequent therapy, accurately identifying 51.1% (47/92) of patients for VUDS. Conclusions: The 2 machine learning models predicting ICS-BOO and VBOO, based on 6 noninvasive clinical parameters, demonstrate commendable discrimination performance. Using the dual-model prediction approach, when both models predict positively, VUDS may be avoided, assisting in male BOO diagnosis and reducing the need for such invasive procedures. %M 39042442 %R 10.2196/58599 %U https://www.jmir.org/2024/1/e58599 %U https://doi.org/10.2196/58599 %U http://www.ncbi.nlm.nih.gov/pubmed/39042442 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55542 %T Diagnostic Accuracy of a Mobile AI-Based Symptom Checker and a Web-Based Self-Referral Tool in Rheumatology: Multicenter Randomized Controlled Trial %A Knitza,Johannes %A Tascilar,Koray %A Fuchs,Franziska %A Mohn,Jacob %A Kuhn,Sebastian %A Bohr,Daniela %A Muehlensiepen,Felix %A Bergmann,Christina %A Labinsky,Hannah %A Morf,Harriet %A Araujo,Elizabeth %A Englbrecht,Matthias %A Vorbrüggen,Wolfgang %A von der Decken,Cay-Benedict %A Kleinert,Stefan %A Ramming,Andreas %A Distler,Jörg H W %A Bartz-Bazzanella,Peter %A Vuillerme,Nicolas %A Schett,Georg %A Welcker,Martin %A Hueber,Axel %+ Institute for Digital Medicine, University Hospital Giessen-Marburg, Philipps University Marburg, Baldingerstrasse, Marburg, 35043, Germany, 1 49 6421 ext 58, johannes.knitza@uni-marburg.de %K symptom checker %K artificial intelligence %K eHealth %K diagnostic decision support system %K rheumatology %K decision support %K decision %K diagnostic %K tool %K rheumatologists %K symptom assessment %K resources %K randomized controlled trial %K diagnosis %K decision support system %K support system %K support %D 2024 %7 23.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The diagnosis of inflammatory rheumatic diseases (IRDs) is often delayed due to unspecific symptoms and a shortage of rheumatologists. Digital diagnostic decision support systems (DDSSs) have the potential to expedite diagnosis and help patients navigate the health care system more efficiently. Objective: The aim of this study was to assess the diagnostic accuracy of a mobile artificial intelligence (AI)–based symptom checker (Ada) and a web-based self-referral tool (Rheport) regarding IRDs. Methods: A prospective, multicenter, open-label, crossover randomized controlled trial was conducted with patients newly presenting to 3 rheumatology centers. Participants were randomly assigned to complete a symptom assessment using either Ada or Rheport. 
The primary outcome was the correct identification of IRDs by the DDSSs, defined as the presence of any IRD in the list of suggested diagnoses by Ada or achieving a prespecified threshold score with Rheport. The gold standard was the diagnosis made by rheumatologists. Results: A total of 600 patients were included, among whom 214 (35.7%) were diagnosed with an IRD. The most frequent IRD was rheumatoid arthritis, with 69 (11.5%) patients. Rheport’s disease suggestion and Ada’s top 1 (D1) and top 5 (D5) disease suggestions demonstrated overall diagnostic accuracies of 52%, 63%, and 58%, respectively, for IRDs. Rheport showed a sensitivity of 62% and a specificity of 47% for IRDs. Ada’s D1 and D5 disease suggestions showed a sensitivity of 52% and 66%, respectively, and a specificity of 68% and 54%, respectively, concerning IRDs. Ada’s diagnostic accuracy regarding individual diagnoses was heterogeneous, and Ada performed considerably better in identifying rheumatoid arthritis in comparison to other diagnoses (D1: 42%; D5: 64%). The Cohen κ statistic of Rheport for agreement on any rheumatic disease diagnosis with Ada D1 was 0.15 (95% CI 0.08-0.18) and with Ada D5 was 0.08 (95% CI 0.00-0.16), indicating poor agreement for the presence of any rheumatic disease between the 2 DDSSs. Conclusions: To our knowledge, this is the largest comparative DDSS trial with actual use of DDSSs by patients. The diagnostic accuracies of both DDSSs for IRDs were not promising in this high-prevalence patient population. DDSSs may lead to a misuse of scarce health care resources. Our results underscore the need for stringent regulation and drastic improvements to ensure the safety and efficacy of DDSSs. Trial Registration: German Register of Clinical Trials DRKS00017642; https://drks.de/search/en/trial/DRKS00017642 %M 39042425 %R 10.2196/55542 %U https://www.jmir.org/2024/1/e55542 %U https://doi.org/10.2196/55542 %U http://www.ncbi.nlm.nih.gov/pubmed/39042425 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58141 %T Construction of a Multi-Label Classifier for Extracting Multiple Incident Factors From Medication Incident Reports in Residential Care Facilities: Natural Language Processing Approach %A Kizaki,Hayato %A Satoh,Hiroki %A Ebara,Sayaka %A Watabe,Satoshi %A Sawada,Yasufumi %A Imai,Shungo %A Hori,Satoko %+ Division of Drug Informatics, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, Japan, 81 354002799, hayatokizaki625@keio.jp %K residential facilities %K incidents %K non-medical staff %K natural language processing %K risk management %D 2024 %7 23.7.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Medication safety in residential care facilities is a critical concern, particularly when nonmedical staff provide medication assistance. The complex nature of medication-related incidents in these settings, coupled with the psychological impact on health care providers, underscores the need for effective incident analysis and preventive strategies. A thorough understanding of the root causes, typically through incident-report analysis, is essential for mitigating medication-related incidents. Objective: We aimed to develop and evaluate a multilabel classifier using natural language processing to identify factors contributing to medication-related incidents using incident report descriptions from residential care facilities, with a focus on incidents involving nonmedical staff. 
Methods: We analyzed 2143 incident reports, comprising 7121 sentences, from residential care facilities in Japan between April 1, 2015, and March 31, 2016. The incident factors were annotated using sentences based on an established organizational factor model and previous research findings. The following 9 factors were defined: procedure adherence, medicine, resident, resident family, nonmedical staff, medical staff, team, environment, and organizational management. To assess the label criteria, 2 researchers with relevant medical knowledge annotated a subset of 50 reports; the interannotator agreement was measured using Cohen κ. The entire data set was subsequently annotated by 1 researcher. Multiple labels were assigned to each sentence. A multilabel classifier was developed using deep learning models, including 2 Bidirectional Encoder Representations From Transformers (BERT)–type models (Tohoku-BERT and a University of Tokyo Hospital BERT pretrained with Japanese clinical text: UTH-BERT) and an Efficiently Learning Encoder That Classifies Token Replacements Accurately (ELECTRA), pretrained on Japanese text. Both sentence- and report-level training were performed; the performance was evaluated by the F1-score and exact match accuracy through 5-fold cross-validation. Results: Among all 7121 sentences, 1167, 694, 2455, 23, 1905, 46, 195, 1104, and 195 included “procedure adherence,” “medicine,” “resident,” “resident family,” “nonmedical staff,” “medical staff,” “team,” “environment,” and “organizational management,” respectively. Owing to limited labels, “resident family” and “medical staff” were omitted from the model development process. The interannotator agreement values were higher than 0.6 for each label. A total of 10, 278, and 1855 reports contained no, 1, and multiple labels, respectively. The models trained using the report data outperformed those trained using sentences, with macro F1-scores of 0.744, 0.675, and 0.735 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. The report-trained models also demonstrated better exact match accuracy, with 0.411, 0.389, and 0.399 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. Notably, the accuracy was consistent even when the analysis was confined to reports containing multiple labels. Conclusions: The multilabel classifier developed in our study demonstrated potential for identifying various factors associated with medication-related incidents using incident reports from residential care facilities. Thus, this classifier can facilitate prompt analysis of incident factors, thereby contributing to risk management and the development of preventive strategies. 
%M 39042454 %R 10.2196/58141 %U https://medinform.jmir.org/2024/1/e58141 %U https://doi.org/10.2196/58141 %U http://www.ncbi.nlm.nih.gov/pubmed/39042454 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50130 %T Human-AI Teaming in Critical Care: A Comparative Analysis of Data Scientists’ and Clinicians’ Perspectives on AI Augmentation and Automation %A Bienefeld,Nadine %A Keller,Emanuela %A Grote,Gudela %+ Department of Management, Technology, and Economics, ETH Zurich, , Zurich, Switzerland, 41 44 633 45 95, nbienefeld@ethz.ch %K AI in health care %K human-AI teaming %K sociotechnical systems %K intensive care %K ICU %K AI adoption %K AI implementation %K augmentation %K automation, health care policy and regulatory foresight %K explainable AI %K explainable %K human-AI %K human-computer %K human-machine %K ethical implications of AI in health care %K ethical %K ethic %K ethics %K artificial intelligence %K policy %K foresight %K policies %K recommendation %K recommendations %K policy maker %K policy makers %K Delphi %K sociotechnical %D 2024 %7 22.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) holds immense potential for enhancing clinical and administrative health care tasks. However, slow adoption and implementation challenges highlight the need to consider how humans can effectively collaborate with AI within broader socio-technical systems in health care. Objective: In the example of intensive care units (ICUs), we compare data scientists’ and clinicians’ assessments of the optimal utilization of human and AI capabilities by determining suitable levels of human-AI teaming for safely and meaningfully augmenting or automating 6 core tasks. The goal is to provide actionable recommendations for policy makers and health care practitioners regarding AI design and implementation. Methods: In this multimethod study, we combine a systematic task analysis across 6 ICUs with an international Delphi survey involving 19 health data scientists from the industry and academia and 61 ICU clinicians (25 physicians and 36 nurses) to define and assess optimal levels of human-AI teaming (level 1=no performance benefits; level 2=AI augments human performance; level 3=humans augment AI performance; level 4=AI performs without human input). Stakeholder groups also considered ethical and social implications. Results: Both stakeholder groups chose level 2 and 3 human-AI teaming for 4 out of 6 core tasks in the ICU. For one task (monitoring), level 4 was the preferred design choice. For the task of patient interactions, both data scientists and clinicians agreed that AI should not be used regardless of technological feasibility due to the importance of the physician-patient and nurse-patient relationship and ethical concerns. Human-AI design choices rely on interpretability, predictability, and control over AI systems. If these conditions are not met and AI performs below human-level reliability, a reduction to level 1 or shifting accountability away from human end users is advised. If AI performs at or beyond human-level reliability and these conditions are not met, shifting to level 4 automation should be considered to ensure safe and efficient human-AI teaming. Conclusions: By considering the sociotechnical system and determining appropriate levels of human-AI teaming, our study showcases the potential for improving the safety and effectiveness of AI usage in ICUs and broader health care settings. 
Regulatory measures should prioritize interpretability, predictability, and control if clinicians hold full accountability. Ethical and social implications must be carefully evaluated to ensure effective collaboration between humans and AI, particularly considering the most recent advancements in generative AI. %M 39038285 %R 10.2196/50130 %U https://www.jmir.org/2024/1/e50130 %U https://doi.org/10.2196/50130 %U http://www.ncbi.nlm.nih.gov/pubmed/39038285 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e56361 %T Diagnostic Accuracy of Artificial Intelligence in Endoscopy: Umbrella Review %A Zha,Bowen %A Cai,Angshu %A Wang,Guiqi %K endoscopy %K artificial intelligence %K umbrella review %K meta-analyses %K AI %K diagnostic %K researchers %K researcher %K tools %K tool %K assessment %D 2024 %7 15.7.2024 %9 %J JMIR Med Inform %G English %X Background: Some research has already reported the diagnostic value of artificial intelligence (AI) in different endoscopy outcomes. However, the evidence is confusing and of varying quality. Objective: This review aimed to comprehensively evaluate the credibility of the evidence of AI’s diagnostic accuracy in endoscopy. Methods: Before the study began, the protocol was registered on PROSPERO (CRD42023483073). First, 2 researchers searched PubMed, Web of Science, Embase, and Cochrane Library using comprehensive search terms. Then, researchers screened the articles and extracted information. We used A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2) to evaluate the quality of the articles. When there were multiple studies aiming at the same result, we chose the study with higher-quality evaluations for further analysis. To ensure the reliability of the conclusions, we recalculated each outcome. Finally, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) was used to evaluate the credibility of the outcomes. Results: A total of 21 studies were included for analysis. Through AMSTAR2, it was found that 8 research methodologies were of moderate quality, while other studies were regarded as having low or critically low quality. The sensitivity and specificity of 17 different outcomes were analyzed. There were 4 studies on the esophagus, 4 studies on the stomach, and 4 studies on colorectal regions. Two studies were associated with capsule endoscopy, two were related to laryngoscopy, and one was related to ultrasonic endoscopy. In terms of sensitivity, gastroesophageal reflux disease had the highest accuracy rate, reaching 97%, while the invasion depth of colon neoplasia, with 71%, had the lowest accuracy rate. On the other hand, the specificity of colorectal cancer was the highest, reaching 98%, while the gastrointestinal stromal tumor, with only 80%, had the lowest specificity. The GRADE evaluation suggested that the reliability of most outcomes was low or very low. Conclusions: AI proved valuable in endoscopic diagnoses, especially in esophageal and colorectal diseases. These findings provide a theoretical basis for developing and evaluating AI-assisted systems, which are aimed at assisting endoscopists in carrying out examinations, leading to improved patient health outcomes. However, further high-quality research is needed in the future to fully validate AI’s effectiveness. 
%R 10.2196/56361 %U https://medinform.jmir.org/2024/1/e56361 %U https://doi.org/10.2196/56361 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52139 %T Artificial Intelligence–Based Electrocardiographic Biomarker for Outcome Prediction in Patients With Acute Heart Failure: Prospective Cohort Study %A Cho,Youngjin %A Yoon,Minjae %A Kim,Joonghee %A Lee,Ji Hyun %A Oh,Il-Young %A Lee,Chan Joo %A Kang,Seok-Min %A Choi,Dong-Ju %+ Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, 82 Gumi-ro 173 Beon-gil, Bundang-gu, Seongnam, Gyeonggi-do, 13620, Republic of Korea, 82 317877007, djchoi@snubh.org %K acute heart failure %K electrocardiography %K artificial intelligence %K deep learning %D 2024 %7 3.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although several biomarkers exist for patients with heart failure (HF), their use in routine clinical practice is often constrained by high costs and limited availability. Objective: We examined the utility of an artificial intelligence (AI) algorithm that analyzes printed electrocardiograms (ECGs) for outcome prediction in patients with acute HF. Methods: We retrospectively analyzed prospectively collected data of patients with acute HF at two tertiary centers in Korea. Baseline ECGs were analyzed using a deep-learning system called Quantitative ECG (QCG), which was trained to detect several urgent clinical conditions, including shock, cardiac arrest, and reduced left ventricular ejection fraction (LVEF). Results: Among the 1254 patients enrolled, in-hospital cardiac death occurred in 53 (4.2%) patients, and the QCG score for critical events (QCG-Critical) was significantly higher in these patients than in survivors (mean 0.57, SD 0.23 vs mean 0.29, SD 0.20; P<.001). The QCG-Critical score was an independent predictor of in-hospital cardiac death after adjustment for age, sex, comorbidities, HF etiology/type, atrial fibrillation, and QRS widening (adjusted odds ratio [OR] 1.68, 95% CI 1.47-1.92 per 0.1 increase; P<.001), and remained a significant predictor after additional adjustments for echocardiographic LVEF and N-terminal prohormone of brain natriuretic peptide level (adjusted OR 1.59, 95% CI 1.36-1.87 per 0.1 increase; P<.001). During long-term follow-up, patients with higher QCG-Critical scores (>0.5) had higher mortality rates than those with low QCG-Critical scores (<0.25) (adjusted hazard ratio 2.69, 95% CI 2.14-3.38; P<.001). Conclusions: Predicting outcomes in patients with acute HF using the QCG-Critical score is feasible, indicating that this AI-based ECG score may be a novel biomarker for these patients. 
Trial Registration: ClinicalTrials.gov NCT01389843; https://clinicaltrials.gov/study/NCT01389843 %M 38959500 %R 10.2196/52139 %U https://www.jmir.org/2024/1/e52139 %U https://doi.org/10.2196/52139 %U http://www.ncbi.nlm.nih.gov/pubmed/38959500 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e48811 %T Efficacy of an Artificial Intelligence App (Aysa) in Dermatological Diagnosis: Cross-Sectional Analysis %A Marri,Shiva Shankar %A Albadri,Warood %A Hyder,Mohammed Salman %A Janagond,Ajit B %A Inamadar,Arun C %+ Department of Dermatology, Venereology and Leprosy, Shri B M Patil Medical College, Hospital and Research Centre, BLDE (Deemed to be) University, Bangaramma Sajjan Campus, Vijayapura, Karnataka, 586103, India, 91 9448102920, aruninamadar@gmail.com %K artificial intelligence %K AI %K AI-aided diagnosis %K dermatology %K mobile app %K application %K neural network %K machine learning %K dermatological %K skin %K computer-aided diagnosis %K diagnostic %K imaging %K lesion %D 2024 %7 2.7.2024 %9 Original Paper %J JMIR Dermatol %G English %X Background: Dermatology is an ideal specialty for artificial intelligence (AI)–driven image recognition to improve diagnostic accuracy and patient care. Lack of dermatologists in many parts of the world and the high frequency of cutaneous disorders and malignancies highlight the increasing need for AI-aided diagnosis. Although AI-based applications for the identification of dermatological conditions are widely available, research assessing their reliability and accuracy is lacking. Objective: The aim of this study was to analyze the efficacy of the Aysa AI app as a preliminary diagnostic tool for various dermatological conditions in a semiurban town in India. Methods: This observational cross-sectional study included patients over the age of 2 years who visited the dermatology clinic. Images of lesions from individuals with various skin disorders were uploaded to the app after obtaining informed consent. The app was used to make a patient profile, identify lesion morphology, plot the location on a human model, and answer questions regarding duration and symptoms. The app presented eight differential diagnoses, which were compared with the clinical diagnosis. The model’s performance was evaluated using sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F1-score. Comparison of categorical variables was performed with the χ2 test and statistical significance was considered at P<.05. Results: A total of 700 patients were part of the study. A wide variety of skin conditions were grouped into 12 categories. The AI model had a mean top-1 sensitivity of 71% (95% CI 61.5%-74.3%), top-3 sensitivity of 86.1% (95% CI 83.4%-88.6%), and all-8 sensitivity of 95.1% (95% CI 93.3%-96.6%). The top-1 sensitivities for diagnosis of skin infestations, disorders of keratinization, other inflammatory conditions, and bacterial infections were 85.7%, 85.7%, 82.7%, and 81.8%, respectively. In the case of photodermatoses and malignant tumors, the top-1 sensitivities were 33.3% and 10%, respectively. Each category had a strong correlation between the clinical diagnosis and the probable diagnoses (P<.001). Conclusions: The Aysa app showed promising results in identifying most dermatoses. 
%M 38954807 %R 10.2196/48811 %U https://derma.jmir.org/2024/1/e48811 %U https://doi.org/10.2196/48811 %U http://www.ncbi.nlm.nih.gov/pubmed/38954807 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58491 %T AI: Bridging Ancient Wisdom and Modern Innovation in Traditional Chinese Medicine %A Lu,Linken %A Lu,Tangsheng %A Tian,Chunyu %A Zhang,Xiujun %+ School of Psychology and Mental Health, North China University of Science and Technology, 21 Bohai Avenue, Caofeidian New Town, Tangshan, Hebei Province, 063210, China, 86 0315 8805970, zhxj@ncst.edu.cn %K traditional Chinese medicine %K TCM %K artificial intelligence %K AI %K diagnosis %D 2024 %7 28.6.2024 %9 Viewpoint %J JMIR Med Inform %G English %X The pursuit of groundbreaking health care innovations has led to the convergence of artificial intelligence (AI) and traditional Chinese medicine (TCM), thus marking a new frontier that demonstrates the promise of combining the advantages of ancient healing practices with cutting-edge advancements in modern technology. TCM, which is a holistic medical system with >2000 years of empirical support, uses unique diagnostic methods such as inspection, auscultation and olfaction, inquiry, and palpation. AI is the simulation of human intelligence processes by machines, especially via computer systems. TCM is experience oriented, holistic, and subjective, and its combination with AI has beneficial effects, which presumably arise in terms of diagnostic accuracy, treatment efficacy, and prognostic veracity. The role of AI in TCM is highlighted by its use in diagnostics, with machine learning enhancing the precision of treatment through complex pattern recognition. This is exemplified by the greater accuracy of TCM syndrome differentiation via tongue images that are analyzed by AI. However, integrating AI into TCM also presents multifaceted challenges, such as data quality and ethical issues; thus, a unified strategy, such as the use of standardized data sets, is required to improve AI understanding and application of TCM principles. The evolution of TCM through the integration of AI is a key factor for elucidating new horizons in health care. As research continues to evolve, it is imperative that technologists and TCM practitioners collaborate to drive innovative solutions that push the boundaries of medical science and honor the profound legacy of TCM. We can chart a future course wherein AI-augmented TCM practices contribute to more systematic, effective, and accessible health care systems for all individuals. 
%M 38941141 %R 10.2196/58491 %U https://medinform.jmir.org/2024/1/e58491 %U https://doi.org/10.2196/58491 %U http://www.ncbi.nlm.nih.gov/pubmed/38941141 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e51614 %T Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review %A Kale,Aditya U %A Hogg,Henry David Jeffry %A Pearson,Russell %A Glocker,Ben %A Golder,Su %A Coombe,April %A Waring,Justin %A Liu,Xiaoxuan %A Moore,David J %A Denniston,Alastair K %+ Institute of Inflammation and Ageing, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom, 44 1213713264, a.denniston@bham.ac.uk %K patient safety %K adverse events %K randomized controlled trials %K medical device %K systematic review %K algorithmic %K artificial intelligence %K AI %K AI health technology %K safety %K algorithm error %D 2024 %7 28.6.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making such as drug dosing. There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report. Objective: This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed including whether the analysis includes the investigation of subgroup-level outcomes. Methods: This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB2). Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. Results: The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing and data collection and analysis began in April 2024. Conclusions: Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. 
Detection, analysis, and reporting of performance errors and patient harms is vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. Trial Registration: PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747 International Registered Report Identifier (IRRID): PRR1-10.2196/51614 %M 38941147 %R 10.2196/51614 %U https://www.researchprotocols.org/2024/1/e51614 %U https://doi.org/10.2196/51614 %U http://www.ncbi.nlm.nih.gov/pubmed/38941147 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58157 %T A Symptom-Checker for Adult Patients Visiting an Interdisciplinary Emergency Care Center and the Safety of Patient Self-Triage: Real-Life Prospective Evaluation %A Meer,Andreas %A Rahm,Philipp %A Schwendinger,Markus %A Vock,Michael %A Grunder,Bettina %A Demurtas,Jacopo %A Rutishauser,Jonas %+ In4medicine Inc, Monbijoustrasse 23, Bern, 3011, Switzerland, 41 313701330, a.meer@in4medicine.ch %K safety %K telemedicine %K teletriage %K symptom-checker %K self-triage %K self-assessment %K triage %K triaging %K symptom %K symptoms %K validation %K validity %K telehealth %K mHealth %K mobile health %K app %K apps %K application %K applications %K diagnosis %K diagnoses %K diagnostic %K diagnostics %K checker %K checkers %K check %K web %K neural network %K neural networks %D 2024 %7 27.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Symptom-checkers have become important tools for self-triage, assisting patients to determine the urgency of medical care. To be safe and effective, these tools must be validated, particularly to avoid potentially hazardous undertriage without leading to inefficient overtriage. Only limited safety data from studies including small sample sizes have been available so far. Objective: The objective of our study was to prospectively investigate the safety of patients’ self-triage in a large patient sample. We used SMASS (Swiss Medical Assessment System; in4medicine, Inc) pathfinder, a symptom-checker based on a computerized transparent neural network. Methods: We recruited 2543 patients into this single-center, prospective clinical trial conducted at the cantonal hospital of Baden, Switzerland. Patients with an Emergency Severity Index of 1-2 were treated by the team of the emergency department, while those with an index of 3-5 were seen at the walk-in clinic by general physicians. We compared the triage recommendation obtained by the patients’ self-triage with the assessment of clinical urgency made by 3 successive interdisciplinary panels of physicians (panels A, B, and C). Using the Clopper-Pearson CI, we assumed that to confirm the symptom-checkers’ safety, the upper confidence bound for the probability of a potentially hazardous undertriage should lie below 1%. A potentially hazardous undertriage was defined as a triage in which either all (consensus criterion) or the majority (majority criterion) of the experts of the last panel (panel C) rated the triage of the symptom-checker to be “rather likely” or “likely” life-threatening or harmful. Results: Of the 2543 patients, 1227 (48.25%) were female and 1316 (51.75%) male. 
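For readers unfamiliar with the exact (Clopper-Pearson) binomial bound used as the safety criterion in the symptom-checker record above, a minimal Python sketch follows. The function and the example counts are illustrative only; the study's exact denominators, and therefore its reported bounds, may differ slightly.

```python
from scipy.stats import beta

def clopper_pearson_upper(events: int, n: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper confidence bound for a binomial proportion."""
    if events >= n:
        return 1.0
    # Exact upper bound from the beta distribution; for 0 events this
    # reduces to 1 - (1 - confidence) ** (1 / n).
    return beta.ppf(confidence, events + 1, n - events)

# Illustrative call: 0 undertriage events among 2543 self-triaged patients
# yields an upper bound well below a prespecified 1% safety margin.
print(f"{clopper_pearson_upper(0, 2543):.4%}")
```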
None of the patients reached the prespecified consensus criterion for a potentially hazardous undertriage. This resulted in an upper 95% confidence bound of 0.1184%. Further, 4 cases met the majority criterion. This resulted in an upper 95% confidence bound for the probability of a potentially hazardous undertriage of 0.3616%. The 2-sided 95% Clopper-Pearson CI for the probability of overtriage (n=450 cases,17.69%) was 16.23% to 19.24%, which is considerably lower than the figures reported in the literature. Conclusions: The symptom-checker proved to be a safe triage tool, avoiding potentially hazardous undertriage in a real-life clinical setting of emergency consultations at a walk-in clinic or emergency department without causing undesirable overtriage. Our data suggest the symptom-checker may be safely used in clinical routine. Trial Registration: ClinicalTrials.gov NCT04055298; https://clinicaltrials.gov/study/NCT04055298 %M 38809606 %R 10.2196/58157 %U https://www.jmir.org/2024/1/e58157 %U https://doi.org/10.2196/58157 %U http://www.ncbi.nlm.nih.gov/pubmed/38809606 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e48777 %T Detection of Mild Cognitive Impairment Through Hand Motor Function Under Digital Cognitive Test: Mixed Methods Study %A Li,Aoyu %A Li,Jingwen %A Chai,Jiali %A Wu,Wei %A Chaudhary,Suamn %A Zhao,Juanjuan %A Qiang,Yan %+ College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, No. 209, University Street, Yuji District, Shanxi Province, Jinzhong, 030024, China, 86 18635168680, qiangyan@tyut.edu.cn %K mild cognitive impairment %K movement kinetics %K digital cognitive test %K dual task %K mobile phone %D 2024 %7 26.6.2024 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Early detection of cognitive impairment or dementia is essential to reduce the incidence of severe neurodegenerative diseases. However, currently available diagnostic tools for detecting mild cognitive impairment (MCI) or dementia are time-consuming, expensive, or not widely accessible. Hence, exploring more effective methods to assist clinicians in detecting MCI is necessary. Objective: In this study, we aimed to explore the feasibility and efficiency of assessing MCI through movement kinetics under tablet-based “drawing and dragging” tasks. Methods: We iteratively designed “drawing and dragging” tasks by conducting symposiums, programming, and interviews with stakeholders (neurologists, nurses, engineers, patients with MCI, healthy older adults, and caregivers). Subsequently, stroke patterns and movement kinetics were evaluated in healthy control and MCI groups by comparing 5 categories of features related to hand motor function (ie, time, stroke, frequency, score, and sequence). Finally, user experience with the overall cognitive screening system was investigated using structured questionnaires and unstructured interviews, and their suggestions were recorded. Results: The “drawing and dragging” tasks can detect MCI effectively, with an average accuracy of 85% (SD 2%). Using statistical comparison of movement kinetics, we discovered that the time- and score-based features are the most effective among all the features. Specifically, compared with the healthy control group, the MCI group showed a significant increase in the time they took for the hand to switch from one stroke to the next, with longer drawing times, slow dragging, and lower scores. 
In addition, patients with MCI had poorer decision-making strategies and visual perception of drawing sequence features, as evidenced by adding auxiliary information and losing more local details in the drawing. Feedback from user experience indicates that our system is user-friendly and facilitates screening for deficits in self-perception. Conclusions: The tablet-based MCI detection system quantitatively assesses hand motor function in older adults and further elucidates the cognitive and behavioral decline phenomenon in patients with MCI. This innovative approach serves to identify and measure digital biomarkers associated with MCI or Alzheimer dementia, enabling the monitoring of changes in patients’ executive function and visual perceptual abilities as the disease advances. %M 38924786 %R 10.2196/48777 %U https://mhealth.jmir.org/2024/1/e48777 %U https://doi.org/10.2196/48777 %U http://www.ncbi.nlm.nih.gov/pubmed/38924786 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e56646 %T In Silico Approaches to Polyherbal Synergy: Protocol for a Scoping Review %A Chandhiruthil Sathyan,Anjana %A Yadav,Pramod %A Gupta,Prashant %A Mahapathra,Arun Kumar %A Galib,Ruknuddin %+ Department of Rasa Shastra and Bhaishajya Kalpana, All India Institute of Ayurveda, Sarita Vihar, Delhi, 110076, India, 91 011 26950402, anjana.bobo@gmail.com %K polyherbal formulation %K Ayurveda system %K Ayurveda %K Ayurvedic medicine %K Ayurvedic treatment %K herbal %K herbal drug %K pharmacodynamic %K pharmacology %K computer-aided drug design %K in silico methodology %K scoping review %D 2024 %7 10.6.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: According to the World Health Organization, more than 80% of the world’s population relies on traditional medicine. Traditional medicine is typically based on the use of single herbal drugs or polyherbal formulations (PHFs) to manage diseases. However, the probable mode of action of these formulations is not well studied or documented. Over the past few decades, computational methods have been used to study the molecular mechanism of phytochemicals in single herbal drugs. However, the in silico methods applied to study PHFs remain unclear. Objective: The aim of this protocol is to develop a search strategy for a scoping review to map the in silico approaches applied in understanding the activity of PHFs used as traditional medicines worldwide. Methods: The scoping review will be conducted based on the methodology developed by Arksey and O’Malley and the recommendations of the Joanna Briggs Institute (JBI). A set of predetermined keywords will be used to identify the relevant studies from five databases: PubMed, Embase, Science Direct, Web of Science, and Google Scholar. Two independent reviewers will conduct the search to yield a list of relevant studies based on the inclusion and exclusion criteria. Mendeley version 1.19.8 will be used to remove duplicate citations, and title and abstract screening will be performed with Rayyan software. The JBI System for the Unified Management, Assessment, and Review of Information tool will be used for data extraction. The scoping review will be reported based on the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Results: Based on the core areas of the scoping review, a 3-step search strategy was developed. The initial search produced 3865 studies. After applying filters, 875 studies were short-listed for further review. 
Keywords were further refined to yield more relevant studies on the topic. Conclusions: The findings are expected to determine the extent of the knowledge gap in the applications of computational methods in PHFs for any traditional medicine across the world. The study can provide answers to open research questions related to the phytochemical identification of PHFs, criteria for target identification, strategies applied for in silico studies, software used, and challenges in adopting in silico methods for understanding the mechanisms of action of PHFs. This study can thus provide a better understanding of the application and types of in silico methods for investigating PHFs. International Registered Report Identifier (IRRID): PRR1-10.2196/56646 %M 38857494 %R 10.2196/56646 %U https://www.researchprotocols.org/2024/1/e56646 %U https://doi.org/10.2196/56646 %U http://www.ncbi.nlm.nih.gov/pubmed/38857494 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e54428 %T Event Analysis for Automated Estimation of Absent and Persistent Medication Alerts: Novel Methodology %A Bittmann,Janina A %A Scherkl,Camilo %A Meid,Andreas D %A Haefeli,Walter E %A Seidling,Hanna M %K clinical decision support system %K CDSS %K medication alert system %K alerting %K alert acceptance %K event analysis %D 2024 %7 4.6.2024 %9 %J JMIR Med Inform %G English %X Background: Event analysis is a promising approach to estimate the acceptance of medication alerts issued by computerized physician order entry (CPOE) systems with an integrated clinical decision support system (CDSS), particularly when alerts cannot be interactively confirmed in the CPOE-CDSS due to its system architecture. Medication documentation is then reviewed for documented evidence of alert acceptance, which can be a time-consuming process, especially when performed manually. Objective: We present a new automated event analysis approach, which was applied to a large data set generated in a CPOE-CDSS with passive, noninterruptive alerts. Methods: Medication and alert data generated over 3.5 months within the CPOE-CDSS at Heidelberg University Hospital were divided into 24-hour time intervals in which the alert display was correlated with associated prescription changes. Alerts were considered “persistent” if they were displayed in every consecutive 24-hour time interval due to a respective active prescription until patient discharge and were considered “absent” if they were no longer displayed during continuous prescriptions in the subsequent interval. Results: Overall, 1670 patient cases with 11,428 alerts were analyzed. Alerts were displayed for a median of 3 (IQR 1-7) consecutive 24-hour time intervals, with the shortest alerts displayed for drug-allergy interactions and the longest alerts displayed for potentially inappropriate medication for the elderly (PIM). Among the total 11,428 alerts, 56.1% (n=6413) became absent, most commonly among alerts for drug-drug interactions (1915/2366, 80.9%) and least commonly among PIM alerts (199/499, 39.9%). Conclusions: This new approach to estimate alert acceptance based on event analysis can be flexibly adapted to the automated evaluation of passive, noninterruptive alerts. This enables large data sets of longitudinal patient cases to be processed, allows for the derivation of the ratios of persistent and absent alerts, and facilitates the comparison and prospective monitoring of these alerts. 
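The persistent/absent alert classification described in the event-analysis record above lends itself to a simple interval-based grouping. Below is a minimal sketch assuming a hypothetical DataFrame layout; the column names and example rows are invented and are not the study's actual data model.

```python
import pandas as pd

# Assumed layout: one row per (case_id, alert_id, interval) in which the
# triggering prescription was active; 'displayed' marks whether the alert
# was shown in that 24-hour interval.
def classify_alert(group: pd.DataFrame) -> str:
    g = group.sort_values("interval")
    if g["displayed"].all():
        return "persistent"   # shown in every consecutive interval until discharge
    return "absent"           # disappeared despite a continuing prescription

alerts = pd.DataFrame({
    "case_id":   [1, 1, 1, 2, 2],
    "alert_id":  ["DDI-7", "DDI-7", "DDI-7", "PIM-3", "PIM-3"],
    "interval":  [1, 2, 3, 1, 2],
    "displayed": [True, True, False, True, True],
})
status = alerts.groupby(["case_id", "alert_id"]).apply(classify_alert)
print(status)  # e.g. (1, DDI-7) -> absent, (2, PIM-3) -> persistent
```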
%R 10.2196/54428 %U https://medinform.jmir.org/2024/1/e54428 %U https://doi.org/10.2196/54428 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53642 %T Development of a Subjective Visual Vertical Test System Using a Smartphone With Virtual Reality Goggles for Screening of Otolithic Dysfunction: Observational Study %A Umibe,Akiko %A Fushiki,Hiroaki %A Tsunoda,Reiko %A Kuroda,Tatsuaki %A Kuroda,Kazuhiro %A Tanaka,Yasuhiro %+ Department of Otorhinolaryngology, Head and Neck Surgery, Dokkyo Medical University Saitama Medical Center, 2-1-50, Minami-Koshigaya, Koshigaya-shi, Saitama, 343-8555, Japan, 81 489651111, aumibe@dokkyomed.ac.jp %K vestibular function tests %K telemedicine %K smartphone %K virtual reality %K otolith dysfunction screening tool %K vestibular evoked myogenic potential %K iPhone %K mobile phone %D 2024 %7 4.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The subjective visual vertical (SVV) test can evaluate otolith function and spatial awareness and is performed in dedicated vertigo centers using specialized equipment; however, it is not otherwise widely used because of the specific equipment and space requirements. An SVV test smartphone app was developed to easily perform assessments in outpatient facilities. Objective: This study aimed to verify whether the SVV test smartphone app with commercially available virtual reality goggles can be used in a clinical setting. Methods: The reference range was calculated for 15 healthy participants. We included 14 adult patients with unilateral vestibular neuritis, sudden sensorineural hearing loss with vertigo, and Meniere disease and investigated the correlation between the SVV test results and vestibular evoked myogenic potential (VEMP) results. Results: The SVV reference range of healthy participants for the sitting front-facing position was small, ranging from –2.6º to 2.3º. Among the 14 patients, 6 (43%) exceeded the reference range for healthy participants. The SVV of patients with vestibular neuritis and sudden sensorineural hearing loss tended to deviate to the affected side. A total of 9 (64%) had abnormal cervical VEMP (cVEMP) values and 6 (43%) had abnormal ocular VEMP (oVEMP) values. No significant difference was found between the presence or absence of abnormal SVV values and the presence or absence of abnormal cVEMP and oVEMP values; however, the odds ratios (ORs) suggested a higher likelihood of abnormal SVV values among those with abnormal cVEMP and oVEMP responses (OR 2.40, 95% CI 0.18-32.88; P>.99; and OR 2, 95% CI 0.90-4.45; P=.46, respectively). Conclusions: The SVV app can be used anywhere and in a short period while reducing directional bias by using virtual reality goggles, thus making it highly versatile and useful as a practical otolith dysfunction screening tool. 
%M 38833295 %R 10.2196/53642 %U https://formative.jmir.org/2024/1/e53642 %U https://doi.org/10.2196/53642 %U http://www.ncbi.nlm.nih.gov/pubmed/38833295 %0 Journal Article %@ 2291-5222 %I %V 12 %N %P e53964 %T Cardiac Health Assessment Using a Wearable Device Before and After Transcatheter Aortic Valve Implantation: Prospective Study %A Eerdekens,Rob %A Zelis,Jo %A ter Horst,Herman %A Crooijmans,Caia %A van 't Veer,Marcel %A Keulards,Danielle %A Kelm,Marcus %A Archer,Gareth %A Kuehne,Titus %A Brueren,Guus %A Wijnbergen,Inge %A Johnson,Nils %A Tonino,Pim %K aortic valve stenosis %K health watch %K quality of life %K heart %K cardiology %K cardiac %K aortic %K valve %K stenosis %K watch %K smartwatch %K wearables %K 6MWT %K walking %K test %K QoL %K WHOQOL-BREF %K 6-minute walking test %D 2024 %7 3.6.2024 %9 %J JMIR Mhealth Uhealth %G English %X Background: Due to aging of the population, the prevalence of aortic valve stenosis will increase drastically in upcoming years. Consequently, transcatheter aortic valve implantation (TAVI) procedures will also expand worldwide. Optimal selection of patients who benefit with improved symptoms and prognoses is key, since TAVI is not without its risks. Currently, we are not able to adequately predict functional outcomes after TAVI. Quality of life measurement tools and traditional functional assessment tests do not always agree and can depend on factors unrelated to heart disease. Activity tracking using wearable devices might provide a more comprehensive assessment. Objective: This study aimed to identify objective parameters (eg, change in heart rate) associated with improvement after TAVI for severe aortic stenosis from a wearable device. Methods: In total, 100 patients undergoing routine TAVI wore a Philips Health Watch device for 1 week before and after the procedure. Watch data were analyzed offline—before TAVI for 97 patients and after TAVI for 75 patients. Results: Parameters such as the total number of steps and activity time did not change, in contrast to improvements in the 6-minute walking test (6MWT) and physical limitation domain of the transformed WHOQOL-BREF questionnaire. Conclusions: These findings, in an older TAVI population, show that watch-based parameters, such as the number of steps, do not change after TAVI, unlike traditional 6MWT and QoL assessments. Basic wearable device parameters might be less appropriate for measuring treatment effects from TAVI. %R 10.2196/53964 %U https://mhealth.jmir.org/2024/1/e53964 %U https://doi.org/10.2196/53964 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e57292 %T Evaluation of Artificial Intelligence Algorithms for Diabetic Retinopathy Detection: Protocol for a Systematic Review and Meta-Analysis %A Sesgundo III,Jaime Angeles %A Maeng,David Collin %A Tukay,Jumelle Aubrey %A Ascano,Maria Patricia %A Suba-Cohen,Justine %A Sampang,Virginia %+ Office of Medical Research, University of Nevada, Reno School of Medicine, 1664 N. 
Virginia St, Reno, NV, 89557, United States, 1 7757841110, jsesgundo@med.unr.edu %K artificial intelligence %K diabetic retinopathy %K deep learning %K ophthalmology %K accuracy %K imaging %K AI %K DR %K complication %K retinopathy %K Optha %K AI algorithms %K detection %K management %K ophthalmologists %K early detection %K screening %K meta-analysis %K diabetes mellitus %K DM %K diabetes %K systematic review %D 2024 %7 27.5.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Diabetic retinopathy (DR) is one of the most common complications of diabetes mellitus. The global burden is immense with a worldwide prevalence of 8.5%. Recent advancements in artificial intelligence (AI) have demonstrated the potential to transform the landscape of ophthalmology with earlier detection and management of DR. Objective: This study seeks to provide an update and evaluate the accuracy and current diagnostic ability of AI in detecting DR versus ophthalmologists. Additionally, this review will highlight the potential of AI integration to enhance DR screening, management, and disease progression. Methods: A systematic review of the current landscape of AI’s role in DR will be undertaken, guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) model. Relevant peer-reviewed papers published in English will be identified by searching 4 international databases: PubMed, Embase, CINAHL, and the Cochrane Central Register of Controlled Trials. Eligible studies will include randomized controlled trials, observational studies, and cohort studies published on or after 2022 that evaluate AI’s performance in retinal imaging detection of DR in diverse adult populations. Studies that focus on specific comorbid conditions, nonimage-based applications of AI, or those lacking a direct comparison group or clear methodology will be excluded. Selected papers will be independently assessed for bias by 2 review authors (JS and DM) using the Quality Assessment of Diagnostic Accuracy Studies tool for systematic reviews. Upon systematic review completion, if it is determined that there are sufficient data, a meta-analysis will be performed. Data synthesis will use a quantitative model. Statistical software such as RevMan and STATA will be used to produce a random-effects meta-regression model to pool data from selected studies. Results: Using selected search queries across multiple databases, we accumulated 3494 studies regarding our topic of interest, of which 1588 were duplicates, leaving 1906 unique research papers to review and analyze. Conclusions: This systematic review and meta-analysis protocol outlines a comprehensive evaluation of AI for DR detection. This active study is anticipated to assess the current accuracy of AI methods in detecting DR. 
International Registered Report Identifier (IRRID): DERR1-10.2196/57292 %M 38801771 %R 10.2196/57292 %U https://www.researchprotocols.org/2024/1/e57292 %U https://doi.org/10.2196/57292 %U http://www.ncbi.nlm.nih.gov/pubmed/38801771 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e56909 %T Generalization of a Deep Learning Model for Continuous Glucose Monitoring–Based Hypoglycemia Prediction: Algorithm Development and Validation Study %A Shao,Jian %A Pan,Ying %A Kou,Wei-Bin %A Feng,Huyi %A Zhao,Yu %A Zhou,Kaixin %A Zhong,Shao %K hypoglycemia prediction %K hypoglycemia %K hypoglycemic %K blood sugar %K prediction %K predictive %K deep learning %K generalization %K machine learning %K glucose %K diabetes %K continuous glucose monitoring %K type 1 diabetes %K type 2 diabetes %K LSTM %K long short-term memory %D 2024 %7 24.5.2024 %9 %J JMIR Med Inform %G English %X Background: Predicting hypoglycemia while maintaining a low false alarm rate is a challenge for the wide adoption of continuous glucose monitoring (CGM) devices in diabetes management. One small study suggested that a deep learning model based on the long short-term memory (LSTM) network had better performance in hypoglycemia prediction than traditional machine learning algorithms in European patients with type 1 diabetes. However, given that many well-recognized deep learning models perform poorly outside the training setting, it remains unclear whether the LSTM model could be generalized to different populations or patients with other diabetes subtypes. Objective: The aim of this study was to validate LSTM hypoglycemia prediction models in more diverse populations and across a wide spectrum of patients with different subtypes of diabetes. Methods: We assembled two large data sets of patients with type 1 and type 2 diabetes. The primary data set including CGM data from 192 Chinese patients with diabetes was used to develop the LSTM, support vector machine (SVM), and random forest (RF) models for hypoglycemia prediction with a prediction horizon of 30 minutes. Hypoglycemia was categorized into mild (glucose=54-70 mg/dL) and severe (glucose<54 mg/dL) levels. The validation data set of 427 patients of European-American ancestry in the United States was used to validate the models and examine their generalizations. The predictive performance of the models was evaluated according to the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results: For the difficult-to-predict mild hypoglycemia events, the LSTM model consistently achieved AUC values greater than 97% in the primary data set, with a less than 3% AUC reduction in the validation data set, indicating that the model was robust and generalizable across populations. AUC values above 93% were also achieved when the LSTM model was applied to both type 1 and type 2 diabetes in the validation data set, further strengthening the generalizability of the model. Under different satisfactory levels of sensitivity for mild and severe hypoglycemia prediction, the LSTM model achieved higher specificity than the SVM and RF models, thereby reducing false alarms. Conclusions: Our results demonstrate that the LSTM model is robust for hypoglycemia prediction and is generalizable across populations or diabetes subtypes. Given its additional advantage of false-alarm reduction, the LSTM model is a strong candidate to be widely implemented in future CGM devices for hypoglycemia prediction. 
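A minimal sketch of a sequence model of the kind described in the hypoglycemia-prediction record above, assuming fixed-length continuous glucose monitoring windows and a binary label for hypoglycemia within the 30-minute horizon. The window length, layer sizes, and data below are placeholders, not the authors' architecture or dataset.

```python
import numpy as np
from tensorflow import keras

# Assumed setup: each sample is a sequence of preceding CGM readings
# (e.g. 24 readings at 5-minute intervals); the label is whether glucose
# falls below 70 mg/dL within the next 30 minutes.
timesteps, n_features = 24, 1

model = keras.Sequential([
    keras.layers.Input(shape=(timesteps, n_features)),
    keras.layers.LSTM(64),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # probability of hypoglycemia
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])

# Dummy data purely to show the expected shapes.
X = np.random.rand(256, timesteps, n_features).astype("float32")
y = (np.random.rand(256) < 0.1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```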
%R 10.2196/56909 %U https://medinform.jmir.org/2024/1/e56909 %U https://doi.org/10.2196/56909 %0 Journal Article %@ 2369-2529 %I %V 11 %N %P e54939 %T Clinical Utility and Usability of the Digital Box and Block Test: Mixed Methods Study %A Prochaska,Eveline %A Ammenwerth,Elske %K assessment %K clinical utility %K digital Box and Block Test %K dBBT %K hand dexterity %K dexterity %K usability %D 2024 %7 23.5.2024 %9 %J JMIR Rehabil Assist Technol %G English %X Background: The Box and Block Test (BBT) is a clinical tool used to measure hand dexterity, which is often used for tracking disease progression or the effectiveness of therapy, particularly benefiting older adults and those with neurological conditions. Digitizing the measurement of hand function may enhance the quality of data collection. We have developed and validated a prototype that digitizes this test, known as the digital BBT (dBBT), which automatically measures time and determines and displays the test result. Objective: This study aimed to investigate the clinical utility and usability of the newly developed dBBT and to collect suggestions for future improvements. Methods: A total of 4 occupational therapists participated in our study. To evaluate the clinical utility, we compared the dBBT to the BBT across dimensions such as acceptance, portability, energy and effort, time, and costs. We observed therapists using the dBBT as a dexterity measurement tool and conducted a quantitative usability questionnaire using the System Usability Scale (SUS), along with a focus group. Evaluative, structured, and qualitative content analysis was used for the qualitative data, whereas quantitative analysis was applied to questionnaire data. The qualitative and quantitative data were merged and analyzed using a convergent mixed methods approach. Results: Overall, the results of the evaluative content analysis suggested that the dBBT had a better clinical utility than the original BBT, with ratings of all collected participant statements for the dBBT being 45% (45/99) equal to, 48% (48/99) better than, and 6% (6/99) lesser than the BBT. Particularly in the subcategories “acceptance,” “time required for evaluation,” and “purchase costs,” the dBBT was rated as being better than the original BBT. The dBBT achieved a mean SUS score of 83 (95% CI 76-96). Additionally, several suggested changes to the system were identified. Conclusions: The study demonstrated an overall positive evaluation of the clinical utility and usability of the dBBT. Valuable insights were gathered for future system iterations. These pioneering results highlight the potential of digitizing hand dexterity assessments. 
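The System Usability Scale score reported in the digital Box and Block Test record above follows a fixed published formula (odd items scored as response minus 1, even items as 5 minus response, summed and multiplied by 2.5). A small sketch of that calculation, with an invented set of item responses:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 item responses."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)   # odd items positive, even items reversed
    return total * 2.5

# Invented example rating; the mean score reported in the study was 83.
print(sus_score([5, 2, 4, 1, 5, 2, 5, 1, 4, 2]))   # -> 87.5
```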
Trial Registration: Open Science Framework qv2d9; https://osf.io/qv2d9 %R 10.2196/54939 %U https://rehab.jmir.org/2024/1/e54939 %U https://doi.org/10.2196/54939 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 3 %N %P e51822 %T Direct Clinical Applications of Natural Language Processing in Common Neurological Disorders: Scoping Review %A Lefkovitz,Ilana %A Walsh,Samantha %A Blank,Leah J %A Jetté,Nathalie %A Kummer,Benjamin R %+ Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave Levy Place, Box 1137, New York, NY, 10029, United States, 1 212 241 5050, benjamin.kummer@mountsinai.org %K natural language processing %K NLP %K unstructured %K text %K machine learning %K deep learning %K neurology %K headache disorders %K migraine %K Parkinson disease %K cerebrovascular disease %K stroke %K transient ischemic attack %K epilepsy %K multiple sclerosis %K cardiovascular %K artificial intelligence %K Parkinson %K neurological %K neurological disorder %K scoping review %K diagnosis %K treatment %K prediction %D 2024 %7 22.5.2024 %9 Review %J JMIR Neurotech %G English %X Background: Natural language processing (NLP), a branch of artificial intelligence that analyzes unstructured language, is being increasingly used in health care. However, the extent to which NLP has been formally studied in neurological disorders remains unclear. Objective: We sought to characterize studies that applied NLP to the diagnosis, prediction, or treatment of common neurological disorders. Methods: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) standards. The search was conducted using MEDLINE and Embase on May 11, 2022. Studies of NLP use in migraine, Parkinson disease, Alzheimer disease, stroke and transient ischemic attack, epilepsy, or multiple sclerosis were included. We excluded conference abstracts, review papers, as well as studies involving heterogeneous clinical populations or indirect clinical uses of NLP. Study characteristics were extracted and analyzed using descriptive statistics. We did not aggregate measurements of performance in our review due to the high variability in study outcomes, which is the main limitation of the study. Results: In total, 916 studies were identified, of which 41 (4.5%) met all eligibility criteria and were included in the final review. Of the 41 included studies, the most frequently represented disorders were stroke and transient ischemic attack (n=20, 49%), followed by epilepsy (n=10, 24%), Alzheimer disease (n=6, 15%), and multiple sclerosis (n=5, 12%). We found no studies of NLP use in migraine or Parkinson disease that met our eligibility criteria. The main objective of NLP was diagnosis (n=20, 49%), followed by disease phenotyping (n=17, 41%), prognostication (n=9, 22%), and treatment (n=4, 10%). In total, 18 (44%) studies used only machine learning approaches, 6 (15%) used only rule-based methods, and 17 (41%) used both. Conclusions: We found that NLP was most commonly applied for diagnosis, implying a potential role for NLP in augmenting diagnostic accuracy in settings with limited access to neurological expertise. We also found several gaps in neurological NLP research, with few to no studies addressing certain disorders, which may suggest additional areas of inquiry. 
Trial Registration: Prospective Register of Systematic Reviews (PROSPERO) CRD42021228703; https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=228703 %R 10.2196/51822 %U https://neuro.jmir.org/2024/1/e51822 %U https://doi.org/10.2196/51822 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53985 %T Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study %A Harada,Yukinori %A Sakamoto,Tetsu %A Sugimoto,Shu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Shimotsuga, 321-0293, Japan, 81 282 86 1111, yharada@dokkyomed.ac.jp %K atypical presentations %K diagnostic accuracy %K diagnosis %K diagnostics %K symptom checker %K uncommon diseases %K symptom checkers %K uncommon %K rare %K artificial intelligence %D 2024 %7 17.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. Objective: This study aimed to assess the longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. Methods: This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We only included patients who underwent an AI-based symptom checkup at the index visit, and the diagnosis was finally confirmed during follow-up. Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as the final diagnosis in a list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker’s diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: from May 1, 2019, to April 30, 2020 (first year); from May 1, 2020, to April 30, 2021 (second year); and from May 1, 2021, to April 30, 2022 (third year). Results: A total of 381 patients were included. Common diseases comprised 257 (67.5%) cases, and typical presentations were observed in 298 (78.2%) cases. Overall, the accuracy of the differential diagnosis list created by the AI-based symptom checker was 172 (45.1%), which did not differ across the 3 years (first year: 97/219, 44.3%; second year: 32/72, 44.4%; and third year: 43/90, 47.7%; P=.85). The accuracy of the differential diagnosis list created by the symptom checker was low in those with uncommon diseases (30/124, 24.2%) and atypical presentations (12/83, 14.5%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list created by the symptom checker. 
Conclusions: A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker, which has been implemented in real-world clinical practice settings, showed no improvement over time. Uncommon diseases and atypical presentations were independently associated with a lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions. %M 38758588 %R 10.2196/53985 %U https://formative.jmir.org/2024/1/e53985 %U https://doi.org/10.2196/53985 %U http://www.ncbi.nlm.nih.gov/pubmed/38758588 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e57026 %T Ventilator-Associated Pneumonia Prediction Models Based on AI: Scoping Review %A Zhang,Jinbo %A Yang,Pingping %A Zeng,Lu %A Li,Shan %A Zhou,Jiamei %K artificial intelligence %K machine learning %K ventilator-associated pneumonia %K prediction %K scoping %K PRISMA %K Preferred Reporting Items for Systematic Reviews and Meta-Analyses %D 2024 %7 14.5.2024 %9 %J JMIR Med Inform %G English %X Background: Ventilator-associated pneumonia (VAP) is a serious complication of mechanical ventilation therapy that affects patients’ treatments and prognoses. Owing to its excellent data mining capabilities, artificial intelligence (AI) has been increasingly used to predict VAP. Objective: This paper reviews VAP prediction models that are based on AI, providing a reference for the early identification of high-risk groups in future clinical practice. Methods: A scoping review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The Wanfang database, the Chinese Biomedical Literature Database, Cochrane Library, Web of Science, PubMed, MEDLINE, and Embase were searched to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. The data extracted from the included studies were synthesized narratively. Results: Of the 137 publications retrieved, 11 were included in this scoping review. The included studies reported the use of AI for predicting VAP. All 11 studies predicted VAP occurrence, and studies on VAP prognosis were excluded. Further, these studies used text data, and none of them involved imaging data. Public databases were the primary sources of data for model building (studies: 6/11, 55%), and 5 studies had sample sizes of <1000. Machine learning was the primary algorithm for studying the VAP prediction models. However, deep learning and large language models were not used to construct VAP prediction models. The random forest model was the most commonly used model (studies: 5/11, 45%). All studies only performed internal validations, and none of them addressed how to implement and apply the final model in real-life clinical settings. Conclusions: This review presents an overview of studies that used AI to predict and diagnose VAP. AI models have better predictive performance than traditional methods and are expected to provide indispensable tools for VAP risk prediction in the future. However, the current research is in the model construction and validation stage, and the implementation of and guidance for clinical VAP prediction require further research. 
%R 10.2196/57026 %U https://medinform.jmir.org/2024/1/e57026 %U https://doi.org/10.2196/57026 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e56884 %T The Roles of NOTCH3 p.R544C and Thrombophilia Genes in Vietnamese Patients With Ischemic Stroke: Study Involving a Hierarchical Cluster Analysis %A Bui,Huong Thi Thu %A Nguyễn Thị Phương,Quỳnh %A Cam Tu,Ho %A Nguyen Phuong,Sinh %A Pham,Thuy Thi %A Vu,Thu %A Nguyen Thi Thu,Huyen %A Khanh Ho,Lam %A Nguyen Tien,Dung %+ Department of Internal Medicine, Thai Nguyen University of Medicine and Pharmacy, 284 Luong Ngoc Quyen, Quang Trung, Thai Nguyen, 250000, Vietnam, 84 913516863, dung.nt@tnmc.edu.vn %K Glasgow Coma Scale %K ischemic stroke %K hierarchical cluster analysis %K clustering %K machine learning %K MTHFR %K NOTCH3 %K modified Rankin scale %K National Institutes of Health Stroke Scale %K prothrombin %K thrombophilia %K mutations %K genetics %K genomics %K ischemia %K risk %K risk analysis %D 2024 %7 7.5.2024 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: The etiology of ischemic stroke is multifactorial. Several gene mutations have been identified as leading causes of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a hereditary disease that causes stroke and other neurological symptoms. Objective: We aimed to identify the variants of NOTCH3 and thrombophilia genes, and their complex interactions with other factors. Methods: We conducted a hierarchical cluster analysis (HCA) on the data of 100 patients diagnosed with ischemic stroke. The variants of NOTCH3 and thrombophilia genes were identified by polymerase chain reaction with confronting 2-pair primers and real-time polymerase chain reaction. The overall preclinical characteristics, cumulative cutpoint values, and factors associated with these somatic mutations were analyzed in unidimensional and multidimensional scaling models. Results: We identified the following optimal cutpoints: creatinine, 83.67 (SD 9.19) µmol/L; age, 54 (SD 5) years; prothrombin (PT) time, 13.25 (SD 0.17) seconds; and international normalized ratio (INR), 1.02 (SD 0.03). Using the Nagelkerke method, cutpoint 50% values of the Glasgow Coma Scale score; modified Rankin scale score; and National Institutes of Health Stroke Scale scores at admission, after 24 hours, and at discharge were 12.77, 2.86 (SD 1.21), 9.83 (SD 2.85), 7.29 (SD 2.04), and 6.85 (SD 2.90), respectively. Conclusions: The variants of MTHFR (C677T and A1298C) and NOTCH3 p.R544C may influence the stroke severity under specific conditions of PT, creatinine, INR, and BMI, with risk ratios of 4.8 (95% CI 1.53-15.04) and 3.13 (95% CI 1.60-6.11), respectively (Pfisher<.05). It is interesting that although there are many genes linked to increased atrial fibrillation risk, not all of them are associated with ischemic stroke risk. With the detection of stroke risk loci, more information can be gained on their impacts and interconnections, especially in young patients. 
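A generic sketch of a hierarchical cluster analysis over standardized clinical variables, in the spirit of the HCA described in the ischemic stroke record above. The feature matrix, the Ward linkage, and the three-cluster cut are placeholders, not the authors' settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.preprocessing import StandardScaler

# Placeholder matrix: rows are patients, columns are clinical variables
# (e.g. age, creatinine, prothrombin time, INR, stroke-scale scores).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

Z = linkage(StandardScaler().fit_transform(X), method="ward")
clusters = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 groups
print(np.bincount(clusters)[1:])                    # cluster sizes
```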
%M 38935968 %R 10.2196/56884 %U https://bioinform.jmir.org/2024/1/e56884 %U https://doi.org/10.2196/56884 %U http://www.ncbi.nlm.nih.gov/pubmed/38935968 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54538 %T Integrating Biomarkers From Virtual Reality and Magnetic Resonance Imaging for the Early Detection of Mild Cognitive Impairment Using a Multimodal Learning Approach: Validation Study %A Park,Bogyeom %A Kim,Yuwon %A Park,Jinseok %A Choi,Hojin %A Kim,Seong-Eun %A Ryu,Hokyoung %A Seo,Kyoungwon %+ Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Sangsang Hall, 4th Fl, Gongneung-ro, Gongneung-dong, Nowon-gu, Seoul, 01811, Republic of Korea, 82 010 5668 8660, kwseo@seoultech.ac.kr %K magnetic resonance imaging %K MRI %K virtual reality %K VR %K early detection %K mild cognitive impairment %K multimodal learning %K hand movement %K eye movement %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Early detection of mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer disease, is crucial for preventing the progression of dementia. Virtual reality (VR) biomarkers have proven to be effective in capturing behaviors associated with subtle deficits in instrumental activities of daily living, such as challenges in using a food-ordering kiosk, for early detection of MCI. On the other hand, magnetic resonance imaging (MRI) biomarkers have demonstrated their efficacy in quantifying observable structural brain changes that can aid in early MCI detection. Nevertheless, the relationship between VR-derived and MRI biomarkers remains an open question. In this context, we explored the integration of VR-derived and MRI biomarkers to enhance early MCI detection through a multimodal learning approach. Objective: We aimed to evaluate and compare the efficacy of VR-derived and MRI biomarkers in the classification of MCI while also examining the strengths and weaknesses of each approach. Furthermore, we focused on improving early MCI detection by leveraging multimodal learning to integrate VR-derived and MRI biomarkers. Methods: The study encompassed a total of 54 participants, comprising 22 (41%) healthy controls and 32 (59%) patients with MCI. Participants completed a virtual kiosk test to collect 4 VR-derived biomarkers (hand movement speed, scanpath length, time to completion, and the number of errors), and T1-weighted MRI scans were performed to collect 22 MRI biomarkers from both hemispheres. Analyses of covariance were used to compare these biomarkers between healthy controls and patients with MCI, with age considered as a covariate. Subsequently, the biomarkers that exhibited significant differences between the 2 groups were used to train and validate a multimodal learning model aimed at early screening for patients with MCI among healthy controls. Results: The support vector machine (SVM) using only VR-derived biomarkers achieved a sensitivity of 87.5% and specificity of 90%, whereas the MRI biomarkers showed a sensitivity of 90.9% and specificity of 71.4%. Moreover, a correlation analysis revealed a significant association between MRI-observed brain atrophy and impaired performance in instrumental activities of daily living in the VR environment. 
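A minimal sketch of feature-level fusion followed by SVM classification, the general approach the VR/MRI record above describes for combining VR-derived and MRI biomarkers. The array shapes, kernel choice, and synthetic data are assumptions for illustration, not the study's implementation.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 54
vr_features = rng.normal(size=(n, 4))    # e.g. hand speed, scanpath length, time, errors
mri_features = rng.normal(size=(n, 22))  # e.g. regional volumes/thicknesses
y = rng.integers(0, 2, size=n)           # 0 = healthy control, 1 = MCI

X = np.hstack([vr_features, mri_features])   # simple feature-level (early) fusion
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_validate(clf, X, y, cv=5, scoring=["accuracy", "recall", "precision"])
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```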
Notably, the integration of both VR-derived and MRI biomarkers into a multimodal SVM model yielded superior results compared to unimodal SVM models, achieving higher accuracy (94.4%), sensitivity (100%), specificity (90.9%), precision (87.5%), and F1-score (93.3%). Conclusions: The results indicate that VR-derived biomarkers, characterized by their high specificity, can be valuable as a robust, early screening tool for MCI in a broader older adult population. On the other hand, MRI biomarkers, known for their high sensitivity, excel at confirming the presence of MCI. Moreover, the multimodal learning approach introduced in our study provides valuable insights into the improvement of early MCI detection by integrating a diverse set of biomarkers. %M 38631021 %R 10.2196/54538 %U https://www.jmir.org/2024/1/e54538 %U https://doi.org/10.2196/54538 %U http://www.ncbi.nlm.nih.gov/pubmed/38631021 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56655 %T Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study %A He,Zhe %A Bhasuran,Balu %A Jin,Qiao %A Tian,Shubo %A Hanna,Karim %A Shavor,Cindy %A Arguello,Lisbeth Garcia %A Murray,Patrick %A Lu,Zhiyong %+ School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306, United States, 1 8506445775, zhe@fsu.edu %K large language models %K generative artificial intelligence %K generative AI %K ChatGPT %K laboratory test results %K patient education %K natural language processing %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. Objective: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test–related questions asked by patients and identify potential issues that can be mitigated using augmentation approaches. Methods: We collected laboratory test result–related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. 
Results: Regarding the similarity of the responses from 4 LLMs; the GPT-4 output was used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from Yahoo data were scored the lowest and, thus, as the least similar to GPT-4–generated answers. The results of the win rate and medical expert evaluation both showed that GPT-4’s responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from lack of interpretation in one’s medical context, incorrect statements, and lack of references. Conclusions: By evaluating LLMs in generating responses to patients’ laboratory test result–related questions, we found that, compared to other 4 LLMs and human answers from a Q&A website, GPT-4’s responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4 responses were inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation. %M 38630520 %R 10.2196/56655 %U https://www.jmir.org/2024/1/e56655 %U https://doi.org/10.2196/56655 %U http://www.ncbi.nlm.nih.gov/pubmed/38630520 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 16 %N %P e50771 %T Machine Learning for Prediction of Tuberculosis Detection: Case Study of Trained African Giant Pouched Rats %A Jonathan,Joan %A Barakabitze,Alcardo Alex %A Fast,Cynthia D %A Cox,Christophe %+ Department of Informatics and Information Technology, Sokoine University of Agriculture, PO Box 3038, Morogoro, United Republic of Tanzania, 255 763 630 054, joanjonathan@sua.ac.tz %K machine learning %K African giant pouched rat %K diagnosis %K tuberculosis %K health care %D 2024 %7 16.4.2024 %9 Original Paper %J Online J Public Health Inform %G English %X Background: Technological advancement has led to the growth and rapid increase of tuberculosis (TB) medical data generated from different health care areas, including diagnosis. Prioritizing better adoption and acceptance of innovative diagnostic technology to reduce the spread of TB significantly benefits developing countries. Trained TB-detection rats are used in Tanzania and Ethiopia for operational research to complement other TB diagnostic tools. This technology has increased new TB case detection owing to its speed, cost-effectiveness, and sensitivity. Objective: During the TB detection process, rats produce vast amounts of data, providing an opportunity to identify interesting patterns that influence TB detection performance. This study aimed to develop models that predict if the rat will hit (indicate the presence of TB within) the sample or not using machine learning (ML) techniques. The goal was to improve the diagnostic accuracy and performance of TB detection involving rats. Methods: APOPO (Anti-Persoonsmijnen Ontmijnende Product Ontwikkeling) Center in Morogoro provided data for this study from 2012 to 2019, and 366,441 observations were used to build predictive models using ML techniques, including decision tree, random forest, naïve Bayes, support vector machine, and k-nearest neighbor, by incorporating a variety of variables, such as the diagnostic results from partner health clinics using methods endorsed by the World Health Organization (WHO). 
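A minimal scikit-learn sketch comparing the five classifier families named in the tuberculosis-detection record above, run on synthetic stand-in data. The feature encoding, sample size, and hyperparameters are placeholders, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded evaluation records; y = 1 if the rat
# indicated ("hit") the sample, 0 otherwise.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```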
Results: The support vector machine technique yielded the highest accuracy of 83.39% for prediction compared to other ML techniques used. Furthermore, this study found that the inclusion of variables related to whether the sample contained TB or not increased the performance accuracy of the predictive model. Conclusions: The inclusion of variables related to the diagnostic results of TB samples may improve the detection performance of the trained rats. The study results may be of importance to TB-detection rat trainers and TB decision-makers as the results may prompt them to take action to maintain the usefulness of the technology and increase the TB detection performance of trained rats. %M 38625737 %R 10.2196/50771 %U https://ojphi.jmir.org/2024/1/e50771 %U https://doi.org/10.2196/50771 %U http://www.ncbi.nlm.nih.gov/pubmed/38625737 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 9 %N %P e56246 %T Impact of Audio Data Compression on Feature Extraction for Vocal Biomarker Detection: Validation Study %A Oreskovic,Jessica %A Kaufman,Jaycee %A Fossat,Yan %+ Klick Labs, 175 Bloor St E #300, 3rd floor, Toronto, ON, M4W3R8, Canada, 1 6472068717, yfossat@klick.com %K vocal biomarker %K biomarker %K biomarkers %K sound %K sounds %K audio %K compression %K voice %K acoustic %K acoustics %K audio compression %K feature extraction %K Python %K speech %K detect %K detection %K algorithm %K algorithms %D 2024 %7 15.4.2024 %9 Original Paper %J JMIR Biomed Eng %G English %X Background: Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care. Objective: The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection. Methods: The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis. Results: In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. Differences between the audio conversion software, MH, WS, and FFmpeg, were notable, impacting compression outcomes such as constant or variable bitrates. Analysis encompassed diverse data compression formats and a wide array of voice features and MFCCs. 
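A minimal Parselmouth sketch of the kind of voice feature extraction described in the vocal biomarker record above (pitch, intensity, jitter, MFCCs). The file path is a placeholder and the Praat parameter values are common defaults, not the study's exact settings.

```python
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sample.wav")                 # placeholder path
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
mean_f0 = np.nanmean(np.where(f0 == 0, np.nan, f0))   # ignore unvoiced frames

intensity = snd.to_intensity()
mean_intensity = call(intensity, "Get mean", 0, 0, "energy")

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

mfcc = snd.to_mfcc(number_of_coefficients=12).to_array()   # (coefficients + energy) x frames
print(mean_f0, mean_intensity, jitter_local, mfcc.shape)
```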
Wilcoxon signed rank tests yielded P values, with those below the Bonferroni-corrected significance level indicating significant alterations due to compression. The results indicated feature-specific impacts of compression across formats and bitrates. MH-converted files exhibited greater resilience compared to WS-converted files. Bitrate also influenced feature stability, with 38 cases affected uniquely by a single bitrate. Notably, voice features showed greater stability than MFCCs across conversion methods. Conclusions: Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. Considering the implementation of vocal biomarkers in health care, finding features that remain consistent through compression for data storage or transmission purposes is valuable. Focused on specific features and formats, future research could broaden the scope to include diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression’s influence on voice features and MFCCs, providing insights for developing applications across fields. The research underscores the significance of feature stability in working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes. %M 38875677 %R 10.2196/56246 %U https://biomedeng.jmir.org/2024/1/e56246 %U https://doi.org/10.2196/56246 %U http://www.ncbi.nlm.nih.gov/pubmed/38875677 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51250 %T Application of AI in Multilevel Pain Assessment Using Facial Images: Systematic Review and Meta-Analysis %A Huo,Jian %A Yu,Yan %A Lin,Wei %A Hu,Anmin %A Wu,Chaoran %+ Department of Anesthesia, Shenzhen People's Hospital, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen Key Medical Discipline, No 1017, Dongmen North Road, Shenzhen, 518020, China, 86 18100282848, wu.chaoran@szhospital.com %K computer vision %K facial image %K monitoring %K multilevel pain assessment %K pain %K postoperative %K status %D 2024 %7 12.4.2024 %9 Review %J J Med Internet Res %G English %X Background: The continuous monitoring and recording of patients’ pain status is a major problem in current research on postoperative pain management. In the large number of original or review articles focusing on different approaches for pain assessment, many researchers have investigated how computer vision (CV) can help by capturing facial expressions. However, there is a lack of proper comparison of results between studies to identify current research gaps. Objective: The purpose of this systematic review and meta-analysis was to investigate the diagnostic performance of artificial intelligence models for multilevel pain assessment from facial images. Methods: The PubMed, Embase, IEEE, Web of Science, and Cochrane Library databases were searched for related publications before September 30, 2023. Studies that used facial images alone to estimate multiple pain values were included in the systematic review. A study quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies, 2nd edition tool. The performance of these studies was assessed by metrics including sensitivity, specificity, log diagnostic odds ratio (LDOR), and area under the curve (AUC). 
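A generic worked example of computing per-study log diagnostic odds ratios and pooling them with a DerSimonian-Laird random-effects model, the kind of synthesis the pain-assessment review above reports. The 2x2 counts below are invented, and the review itself may have used different software and settings.

```python
import numpy as np

def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio with a 0.5 continuity correction, plus its variance."""
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    ldor = np.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return ldor, var

# Invented per-study counts (TP, FP, FN, TN), purely for illustration.
studies = [(90, 5, 10, 95), (80, 8, 12, 88), (70, 6, 9, 85)]
ldors, variances = zip(*(log_dor(*s) for s in studies))
ldors, variances = np.array(ldors), np.array(variances)

# DerSimonian-Laird random-effects pooling.
w = 1 / variances
fixed = np.sum(w * ldors) / np.sum(w)
q = np.sum(w * (ldors - fixed) ** 2)
tau2 = max(0.0, (q - (len(studies) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_star = 1 / (variances + tau2)
pooled = np.sum(w_star * ldors) / np.sum(w_star)
print(f"pooled log DOR: {pooled:.2f}")
```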
The intermodal variability was assessed and presented by forest plots. Results: A total of 45 reports were included in the systematic review. The reported test accuracies ranged from 0.27-0.99, and the other metrics, including the mean standard error (MSE), mean absolute error (MAE), intraclass correlation coefficient (ICC), and Pearson correlation coefficient (PCC), ranged from 0.31-4.61, 0.24-2.8, 0.19-0.83, and 0.48-0.92, respectively. In total, 6 studies were included in the meta-analysis. Their combined sensitivity was 98% (95% CI 96%-99%), specificity was 98% (95% CI 97%-99%), LDOR was 7.99 (95% CI 6.73-9.31), and AUC was 0.99 (95% CI 0.99-1). The subgroup analysis showed that the diagnostic performance was acceptable, although imbalanced data were still emphasized as a major problem. All studies had at least one domain with a high risk of bias, and for 20% (9/45) of studies, there were no applicability concerns. Conclusions: This review summarizes recent evidence in automatic multilevel pain estimation from facial expressions and compared the test accuracy of results in a meta-analysis. Promising performance for pain estimation from facial images was established by current CV algorithms. Weaknesses in current studies were also identified, suggesting that larger databases and metrics evaluating multiclass classification performance could improve future studies. Trial Registration: PROSPERO CRD42023418181; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=418181 %M 38607660 %R 10.2196/51250 %U https://www.jmir.org/2024/1/e51250 %U https://doi.org/10.2196/51250 %U http://www.ncbi.nlm.nih.gov/pubmed/38607660 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e55627 %T Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Ito,Takahiro %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 282 87 2498, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K large language model %K LLM %K LLMs %K language model %K language models %K ChatGPT %K GPT %K ChatGPT-4V %K ChatGPT-4 Vision %K clinical decision support %K natural language processing %K decision support %K NLP %K diagnostic excellence %K diagnosis %K diagnoses %K diagnose %K diagnostic %K diagnostics %K image %K images %K imaging %D 2024 %7 9.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the evolving field of health care, multimodal generative artificial intelligence (AI) systems, such as ChatGPT-4 with vision (ChatGPT-4V), represent a significant advancement, as they integrate visual data with text data. This integration has the potential to revolutionize clinical diagnostics by offering more comprehensive analysis capabilities. However, the impact on diagnostic accuracy of using image data to augment ChatGPT-4 remains unclear. Objective: This study aims to assess the impact of adding image data on ChatGPT-4’s diagnostic accuracy and provide insights into how image data integration can enhance the accuracy of multimodal AI in medical diagnostics. Specifically, this study endeavored to compare the diagnostic accuracy between ChatGPT-4V, which processed both text and image data, and its counterpart, ChatGPT-4, which only uses text data. Methods: We identified a total of 557 case reports published in the American Journal of Case Reports from January 2022 to March 2023. 
After excluding cases that were nondiagnostic, pediatric, and lacking image data, we included 363 case descriptions with their final diagnoses and associated images. We compared the diagnostic accuracy of ChatGPT-4V and ChatGPT-4 without vision based on their ability to include the final diagnoses within differential diagnosis lists. Two independent physicians evaluated their accuracy, with a third resolving any discrepancies, ensuring a rigorous and objective analysis. Results: The integration of image data into ChatGPT-4V did not significantly enhance diagnostic accuracy, showing that final diagnoses were included in the top 10 differential diagnosis lists at a rate of 85.1% (n=309), comparable to the rate of 87.9% (n=319) for the text-only version (P=.33). Notably, ChatGPT-4V’s performance in correctly identifying the top diagnosis was inferior, at 44.4% (n=161), compared with 55.9% (n=203) for the text-only version (P=.002, χ2 test). Additionally, ChatGPT-4’s self-reports showed that image data accounted for 30% of the weight in developing the differential diagnosis lists in more than half of cases. Conclusions: Our findings reveal that currently, ChatGPT-4V predominantly relies on textual data, limiting its ability to fully use the diagnostic potential of visual information. This study underscores the need for further development of multimodal generative AI systems to effectively integrate and use clinical image data. Enhancing the diagnostic performance of such AI systems through improved multimodal data integration could significantly benefit patient care by providing more accurate and comprehensive diagnostic insights. Future research should focus on overcoming these limitations, paving the way for the practical application of advanced AI in medicine. %M 38592758 %R 10.2196/55627 %U https://medinform.jmir.org/2024/1/e55627 %U https://doi.org/10.2196/55627 %U http://www.ncbi.nlm.nih.gov/pubmed/38592758 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e48862 %T Interpretable Deep Learning System for Identifying Critical Patients Through the Prediction of Triage Level, Hospitalization, and Length of Stay: Prospective Study %A Lin,Yu-Ting %A Deng,Yuan-Xiang %A Tsai,Chu-Lin %A Huang,Chien-Hua %A Fu,Li-Chen %+ Department of Computer Science and Information Engineering, National Taiwan University, CSIE Der Tian Hall No. 1, Sec. 4, Roosevelt Road, Taipei, 106319, Taiwan, 886 935545846, lichen@ntu.edu.tw %K emergency department %K triage system %K hospital admission %K length of stay %K multimodal integration %D 2024 %7 1.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Triage is the process of accurately assessing patients’ symptoms and providing them with proper clinical treatment in the emergency department (ED). While many countries have developed their triage process to stratify patients’ clinical severity and thus distribute medical resources, there are still some limitations of the current triage process. Since the triage level is mainly identified by experienced nurses based on a mix of subjective and objective criteria, mis-triage often occurs in the ED. It can not only cause adverse effects on patients, but also impose an undue burden on the health care delivery system. Objective: Our study aimed to design a prediction system based on triage information, including demographics, vital signs, and chief complaints. 
The proposed system can not only handle heterogeneous data, including tabular data and free-text data, but also provide interpretability for better acceptance by the ED staff in the hospital. Methods: In this study, we proposed a system comprising 3 subsystems, each handling a single task: triage level prediction, hospitalization prediction, and length of stay prediction. We used a large amount of retrospective data to pretrain the model and then fine-tuned it on a prospective data set with gold-standard labels. The proposed deep learning framework was built with TabNet and MacBERT (Chinese version of bidirectional encoder representations from transformers [BERT]). Results: The performance of our proposed model was evaluated on data collected from the National Taiwan University Hospital (901 patients were included). The model achieved promising results on the collected data set, with accuracy values of 63%, 82%, and 71% for triage level prediction, hospitalization prediction, and length of stay prediction, respectively. Conclusions: Our system improved the prediction of 3 different medical outcomes when compared with other machine learning methods. With the pretrained vital sign encoder and re-pretrained masked language modeling MacBERT encoder, our multimodality model can provide a deeper insight into the characteristics of electronic health records. Additionally, by providing interpretability, we believe that the proposed system can assist nursing staff and physicians in making appropriate medical decisions. %M 38557661 %R 10.2196/48862 %U https://medinform.jmir.org/2024/1/e48862 %U https://doi.org/10.2196/48862 %U http://www.ncbi.nlm.nih.gov/pubmed/38557661 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e55802 %T Clinical Decision Support Requirements for Ventricular Tachycardia Diagnosis Within the Frameworks of Knowledge and Practice: Survey Study %A Hu,Zhao %A Wang,Min %A Zheng,Si %A Xu,Xiaowei %A Zhang,Zhuxin %A Ge,Qiaoyue %A Li,Jiao %A Yao,Yan %+ Arrhythmia Center, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College/National Center for Cardiovascular Diseases, Beilishi Road 167, Beijing, 100037, China, 86 10 88322401, ianyao@263.net.cn %K clinical decision support system %K requirements analysis %K ventricular tachycardia %K knowledge %K clinical practice %K questionnaires %D 2024 %7 26.3.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Ventricular tachycardia (VT) diagnosis is challenging due to the similarity between VT and some forms of supraventricular tachycardia, complexity of clinical manifestations, heterogeneity of underlying diseases, and potential for life-threatening hemodynamic instability. Clinical decision support systems (CDSSs) have emerged as promising tools to augment the diagnostic capabilities of cardiologists. However, a requirements analysis is acknowledged to be vital for the success of a CDSS, especially for complex clinical tasks such as VT diagnosis. Objective: The aims of this study were to analyze the requirements for a VT diagnosis CDSS within the frameworks of knowledge and practice and to determine the clinical decision support (CDS) needs. Methods: Our multidisciplinary team first conducted semistructured interviews with 7 cardiologists regarding the clinical challenges of VT and expected decision support. A questionnaire was designed by the multidisciplinary team based on the results of the interviews.
The questionnaire was divided into four sections: demographic information, knowledge assessment, practice assessment, and CDS needs. The practice section consisted of two simulated cases for a total score of 10 marks. Online questionnaires were disseminated to registered cardiologists across China from December 2022 to February 2023. The scores for the practice section were summarized as continuous variables, using the mean, median, and range. The knowledge and CDS needs sections were assessed using a 4-point Likert scale without a neutral option. Kruskal-Wallis tests were performed to investigate the relationship between scores and practice years or specialty. Results: Of the 687 cardiologists who completed the questionnaire, 567 responses were eligible for further analysis. The results of the knowledge assessment showed that 383 cardiologists (68%) lacked knowledge in diagnostic evaluation. The overall average score of the practice assessment was 6.11 (SD 0.55); the etiological diagnosis section had the highest overall scores (mean 6.74, SD 1.75), whereas the diagnostic evaluation section had the lowest scores (mean 5.78, SD 1.19). A majority of cardiologists (344/567, 60.7%) reported the need for a CDSS. There was a significant difference in practice competency scores between general cardiologists and arrhythmia specialists (P=.02). Conclusions: There was a notable deficiency in the knowledge and practice of VT among Chinese cardiologists. Specific knowledge and practice support requirements were identified, which provide a foundation for further development and optimization of a CDSS. Moreover, it is important to consider clinicians’ specialization levels and years of practice for effective and personalized support. %M 38530337 %R 10.2196/55802 %U https://humanfactors.jmir.org/2024/1/e55802 %U https://doi.org/10.2196/55802 %U http://www.ncbi.nlm.nih.gov/pubmed/38530337 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e52602 %T AI as a Medical Device for Ophthalmic Imaging in Europe, Australia, and the United States: Protocol for a Systematic Scoping Review of Regulated Devices %A Ong,Ariel Yuhan %A Hogg,Henry David Jeffry %A Kale,Aditya U %A Taribagil,Priyal %A Kras,Ashley %A Dow,Eliot %A Macdonald,Trystan %A Liu,Xiaoxuan %A Keane,Pearse A %A Denniston,Alastair K %+ Institute of Inflammation and Ageing, University of Birmingham, College of Medical and Dental Sciences, Edgbaston, Birmingham, B15 2TT, United Kingdom, 44 01213716905, a.denniston@bham.ac.uk %K AIaMD %K artificial intelligence as a medical device %K artificial intelligence %K deep learning %K machine learning %K ophthalmic imaging %K regulatory approval %D 2024 %7 14.3.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence as a medical device (AIaMD) has the potential to transform many aspects of ophthalmic care, such as improving accuracy and speed of diagnosis, addressing capacity issues in high-volume areas such as screening, and detecting novel biomarkers of systemic disease in the eye (oculomics). In order to ensure that such tools are safe for the target population and achieve their intended purpose, it is important that these AIaMD have adequate clinical evaluation to support any regulatory decision. Currently, the evidential requirements for regulatory approval are less clear for AIaMD compared to more established interventions such as drugs or medical devices. 
There is therefore value in understanding the level of evidence that underpins AIaMD currently on the market, as a step toward identifying what the best practices might be in this area. In this systematic scoping review, we will focus on AIaMD that contributes to clinical decision-making (relating to screening, diagnosis, prognosis, and treatment) in the context of ophthalmic imaging. Objective: This study aims to identify regulator-approved AIaMD for ophthalmic imaging in Europe, Australia, and the United States; report the characteristics of these devices and their regulatory approvals; and report the available evidence underpinning these AIaMD. Methods: The Food and Drug Administration (United States), the Australian Register of Therapeutic Goods (Australia), the Medicines and Healthcare products Regulatory Agency (United Kingdom), and the European Database on Medical Devices (European Union) regulatory databases will be searched for ophthalmic imaging AIaMD through a snowballing approach. PubMed and clinical trial registries will be systematically searched, and manufacturers will be directly contacted for studies investigating the effectiveness of eligible AIaMD. Preliminary regulatory database searches, evidence searches, screening, data extraction, and methodological quality assessment will be undertaken by 2 independent review authors and arbitrated by a third at each stage of the process. Results: Preliminary searches were conducted in February 2023. Data extraction, data synthesis, and assessment of methodological quality commenced in October 2023. The review is on track to be completed and submitted for peer review by April 2024. Conclusions: This systematic review will provide greater clarity on ophthalmic imaging AIaMD that have achieved regulatory approval as well as the evidence that underpins them. This should help adopters understand the range of tools available and whether they can be safely incorporated into their clinical workflow, and it should also support developers in navigating regulatory approval more efficiently. 
International Registered Report Identifier (IRRID): DERR1-10.2196/52602 %M 38483456 %R 10.2196/52602 %U https://www.researchprotocols.org/2024/1/e52602 %U https://doi.org/10.2196/52602 %U http://www.ncbi.nlm.nih.gov/pubmed/38483456 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e49965 %T Oral Cannabidiol for Seborrheic Dermatitis in Patients With Parkinson Disease: Randomized Clinical Trial %A Weber,Isaac %A Zagona-Prizio,Caterina %A Sivesind,Torunn E %A Adelman,Madeline %A Szeto,Mindy D %A Liu,Ying %A Sillau,Stefan H %A Bainbridge,Jacquelyn %A Klawitter,Jost %A Sempio,Cristina %A Dunnick,Cory A %A Leehey,Maureen A %A Dellavalle,Robert P %+ Dermatology Service, Rocky Mountain Regional Veterans Affairs Medical Center, 1700 N Wheeling St, Rm E1-342, Aurora, CO, 80045, United States, 1 720 857 5562, Robert.dellavalle@ucdenver.edu %K cannabidiol %K cannabis %K CBD treatment %K CBD %K image %K photograph %K photographs %K imaging %K sebum %K clinical trials %K seborrheic dermatitis %K Parkinson disease %K clinical trial %K RCT %K randomized %K controlled trial %K drug response %K SEDASI %K drug %K Parkinson %K dermatitis %K skin %K dermatology %K seborrheic dermatitis %K treatment %K outcome %K cannabis %K chi-square %D 2024 %7 11.3.2024 %9 Original Paper %J JMIR Dermatol %G English %X Background: Seborrheic dermatitis (SD) affects 18.6%-59% of persons with Parkinson disease (PD), and recent studies provide evidence that oral cannabidiol (CBD) therapy could reduce sebum production in addition to improving motor and psychiatric symptoms in PD. Therefore, oral CBD could be useful for improving symptoms of both commonly co-occurring conditions. Objective: This study investigates whether oral CBD therapy is associated with a decrease in SD severity in PD. Methods: Facial photographs were collected as a component of a randomized (1:1 CBD vs placebo), parallel, double-blind, placebo-controlled trial assessing the efficacy of a short-term 2.5 mg per kg per day oral sesame solution CBD-rich cannabis extract (formulated to 100 mg/mL CBD and 3.3 mg/mL THC) for reducing motor symptoms in PD. Participants took 1.25 mg per kg per day each morning for 4 ±1 days and then twice daily for 10 ±4 days. Reviewers analyzed the photographs independently and provided a severity ranking based on the Seborrheic Dermatitis Area and Severity Index (SEDASI) scale. Baseline demographic and disease characteristics, as well as posttreatment SEDASI averages and the presence of SD, were analyzed with 2-tailed t tests and Pearson χ2 tests. SEDASI was analyzed with longitudinal regression, and SD was analyzed with generalized estimating equations. Results: A total of 27 participants received a placebo and 26 received CBD for 16 days. SD severity was low in both groups at baseline, and there was no treatment effect. The risk ratio for patients receiving CBD, post versus pre, was 0.69 (95% CI 0.41-1.18; P=.15), compared to 1.20 (95% CI 0.88-1.65; P=.26) for the patients receiving the placebo. The within-group pre-post change was not statistically significant for either group, but they differed from each other (P=.07) because there was an estimated improvement for the CBD group and an estimated worsening for the placebo group. Conclusions: This study does not provide solid evidence that oral CBD therapy reduces the presence of SD among patients with PD. 
While this study was sufficiently powered to detect the primary outcome (efficacy of CBD on PD motor symptoms), it was underpowered for the secondary outcomes of detecting changes in the presence and severity of SD. Multiple mechanisms exist through which CBD can exert beneficial effects on SD pathogenesis. Larger studies, including participants with increased disease severity and longer treatment periods, may better elucidate treatment effects and are needed to determine CBD’s true efficacy for affecting SD severity. Trial Registration: ClinicalTrials.gov NCT03582137; https://clinicaltrials.gov/ct2/show/NCT03582137 %M 38466972 %R 10.2196/49965 %U https://derma.jmir.org/2024/1/e49965 %U https://doi.org/10.2196/49965 %U http://www.ncbi.nlm.nih.gov/pubmed/38466972 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e47803 %T Optimization of Using Multiple Machine Learning Approaches in Atrial Fibrillation Detection Based on a Large-Scale Data Set of 12-Lead Electrocardiograms: Cross-Sectional Study %A Chuang,Beau Bo-Sheng %A Yang,Albert C %+ Digital Medicine and Smart Healthcare Research Center, National Yang Ming Chiao Tung University, No 155, Li-Nong St, Sec.2, Beitou District, Taipei, 112304, Taiwan, 886 228267995, accyang@nycu.edu.tw %K machine learning %K atrial fibrillation %K light gradient boosting machine %K power spectral density %K digital health %K electrocardiogram %K machine learning algorithm %K atrial fibrillation detection %K real-time %K detection %K electrocardiography leads %K clinical outcome %D 2024 %7 11.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Atrial fibrillation (AF) represents a hazardous cardiac arrhythmia that significantly elevates the risk of stroke and heart failure. Despite its severity, its diagnosis largely relies on the proficiency of health care professionals. At present, the real-time identification of paroxysmal AF is hindered by the lack of automated techniques. Consequently, a highly effective machine learning algorithm specifically designed for AF detection could offer substantial clinical benefits. We hypothesized that machine learning algorithms have the potential to identify and extract features of AF with a high degree of accuracy, given the intricate and distinctive patterns present in electrocardiogram (ECG) recordings of AF. Objective: This study aims to develop a clinically valuable machine learning algorithm that can accurately detect AF and compare different leads’ performances of AF detection. Methods: We used 12-lead ECG recordings sourced from the 2020 PhysioNet Challenge data sets. The Welch method was used to extract power spectral features of the 12-lead ECGs within a frequency range of 0.083 to 24.92 Hz. Subsequently, various machine learning techniques were evaluated and optimized to classify sinus rhythm (SR) and AF based on these power spectral features. Furthermore, we compared the effects of different frequency subbands and different lead selections on machine learning performances. Results: The light gradient boosting machine (LightGBM) was found to be the most effective in classifying AF and SR, achieving an average F1-score of 0.988 across all ECG leads. Among the frequency subbands, the 0.083 to 4.92 Hz range yielded the highest F1-score of 0.985. In interlead comparisons, aVR had the highest performance (F1=0.993), with minimal differences observed between leads. 
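As an illustrative aside (a minimal sketch under stated assumptions, not the study's code): the abstract above describes extracting Welch power spectral features in the 0.083 to 24.92 Hz band from 12-lead ECGs and classifying atrial fibrillation versus sinus rhythm with LightGBM. The snippet below sketches that pipeline on synthetic single-lead signals; the sampling rate, window length, and model hyperparameters are assumptions.

# Illustrative sketch: Welch power spectral density features from a single ECG
# lead, then LightGBM classification of AF vs sinus rhythm.
import numpy as np
from scipy.signal import welch
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

FS = 500  # assumed sampling frequency in Hz

def psd_features(ecg_lead, fmin=0.083, fmax=24.92):
    """Welch power spectral density restricted to the band used in the study."""
    freqs, pxx = welch(ecg_lead, fs=FS, nperseg=FS * 4)
    band = (freqs >= fmin) & (freqs <= fmax)
    return pxx[band]

# Synthetic stand-in data: 200 ten-second single-lead recordings with labels.
rng = np.random.default_rng(0)
signals = rng.standard_normal((200, FS * 10))
labels = rng.integers(0, 2, size=200)        # 0 = sinus rhythm, 1 = AF

X = np.vstack([psd_features(s) for s in signals])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels)

clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X_train, y_train)
print("F1-score:", f1_score(y_test, clf.predict(X_test)))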
Conclusions: This study successfully used machine learning methodologies, particularly the LightGBM model, to differentiate SR and AF based on power spectral features derived from 12-lead ECGs. The performance, marked by an average F1-score of 0.988 and minimal interlead variation, underscores the potential of machine learning algorithms to bolster real-time AF detection. This advancement could significantly improve patient care in intensive care units as well as facilitate remote monitoring through wearable devices, ultimately enhancing clinical outcomes. %M 38466973 %R 10.2196/47803 %U https://formative.jmir.org/2024/1/e47803 %U https://doi.org/10.2196/47803 %U http://www.ncbi.nlm.nih.gov/pubmed/38466973 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e46817 %T Comparison of the Discrimination Performance of AI Scoring and the Brixia Score in Predicting COVID-19 Severity on Chest X-Ray Imaging: Diagnostic Accuracy Study %A Tenda,Eric Daniel %A Yunus,Reyhan Eddy %A Zulkarnaen,Benny %A Yugo,Muhammad Reynalzi %A Pitoyo,Ceva Wicaksono %A Asaf,Moses Mazmur %A Islamiyati,Tiara Nur %A Pujitresnani,Arierta %A Setiadharma,Andry %A Henrina,Joshua %A Rumende,Cleopas Martin %A Wulani,Vally %A Harimurti,Kuntjoro %A Lydia,Aida %A Shatri,Hamzah %A Soewondo,Pradana %A Yusuf,Prasandhya Astagiri %+ Department of Medical Physiology and Biophysics/ Medical Technology Cluster IMERI, Faculty of Medicine, Universitas Indonesia, Jalan Salemba Raya No.6, Jakarta, 10430, Indonesia, 62 812 8459 4272, prasandhya.a.yusuf@ui.ac.id %K artificial intelligence %K Brixia %K chest x-ray %K COVID-19 %K CAD4COVID %K pneumonia %K radiograph %K artificial intelligence scoring system %K AI scoring system %K prediction %K disease severity %D 2024 %7 7.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The artificial intelligence (AI) analysis of chest x-rays can increase the precision of binary COVID-19 diagnosis. However, it is unknown if AI-based chest x-rays can predict who will develop severe COVID-19, especially in low- and middle-income countries. Objective: The study aims to compare the performance of human radiologist Brixia scores versus 2 AI scoring systems in predicting the severity of COVID-19 pneumonia. Methods: We performed a cross-sectional study of 300 patients with suspected and confirmed COVID-19 infection in Jakarta, Indonesia. A total of 2 AI scores were generated using CAD4COVID x-ray software. Results: The AI probability score had slightly lower discrimination than the other 2 scores (area under the curve [AUC] 0.787, 95% CI 0.722-0.852). The AI score for the affected lung area (AUC 0.857, 95% CI 0.809-0.905) was almost as good as the human Brixia score (AUC 0.863, 95% CI 0.818-0.908). Conclusions: The AI score for the affected lung area and the human radiologist Brixia score had similar and good discrimination performance in predicting COVID-19 severity. Our study demonstrated that using AI-based diagnostic tools is possible, even in low-resource settings. However, before they are widely adopted in daily practice, larger, prospective studies are needed to confirm our findings. 
%M 38451633 %R 10.2196/46817 %U https://formative.jmir.org/2024/1/e46817 %U https://doi.org/10.2196/46817 %U http://www.ncbi.nlm.nih.gov/pubmed/38451633 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 9 %N %P e58911 %T Enhancing Ultrasound Image Quality Across Disease Domains: Application of Cycle-Consistent Generative Adversarial Network and Perceptual Loss %A Athreya,Shreeram %A Radhachandran,Ashwath %A Ivezić,Vedrana %A Sant,Vivek R %A Arnold,Corey W %A Speier,William %+ Department of Electrical and Computer Engineering, University of California Los Angeles, 924 Westwood Boulevard, Suite 600, Los Angeles, CA, 90024, United States, 1 4244206158, shreeram@ucla.edu %K generative networks %K cycle generative adversarial network %K image enhancement %K perceptual loss %K ultrasound scans %K ultrasound images %K imaging %K machine learning %K portable handheld devices %D 2024 %7 17.12.2024 %9 Original Paper %J JMIR Biomed Eng %G English %X Background: Numerous studies have explored image processing techniques aimed at enhancing ultrasound images to narrow the performance gap between low-quality portable devices and high-end ultrasound equipment. These investigations often use registered image pairs created by modifying the same image through methods like down sampling or adding noise, rather than using separate images from different machines. Additionally, they rely on organ-specific features, limiting the models’ generalizability across various imaging conditions and devices. The challenge remains to develop a universal framework capable of improving image quality across different devices and conditions, independent of registration or specific organ characteristics. Objective: This study aims to develop a robust framework that enhances the quality of ultrasound images, particularly those captured with compact, portable devices, which are often constrained by low quality due to hardware limitations. The framework is designed to effectively process nonregistered ultrasound image pairs, a common challenge in medical imaging, across various clinical settings and device types. By addressing these challenges, the research seeks to provide a more generalized and adaptable solution that can be widely applied across diverse medical scenarios, improving the accessibility and quality of diagnostic imaging. Methods: A retrospective analysis was conducted by using a cycle-consistent generative adversarial network (CycleGAN) framework enhanced with perceptual loss to improve the quality of ultrasound images, focusing on nonregistered image pairs from various organ systems. The perceptual loss was integrated to preserve anatomical integrity by comparing deep features extracted from pretrained neural networks. The model’s performance was evaluated against corresponding high-resolution images, ensuring that the enhanced outputs closely mimic those from high-end ultrasound devices. The model was trained and validated using a publicly available, diverse dataset to ensure robustness and generalizability across different imaging scenarios. Results: The advanced CycleGAN framework, enhanced with perceptual loss, significantly outperformed the previous state-of-the-art, stable CycleGAN, in multiple evaluation metrics. Specifically, our method achieved a structural similarity index of 0.2889 versus 0.2502 (P<.001), a peak signal-to-noise ratio of 15.8935 versus 14.9430 (P<.001), and a learned perceptual image patch similarity score of 0.4490 versus 0.5005 (P<.001). 
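As an illustrative aside (a minimal sketch, not the authors' implementation): the abstract above describes adding a perceptual loss, computed from deep features of a pretrained network, to a CycleGAN so that anatomy is preserved when enhancing nonregistered ultrasound images. The PyTorch snippet below sketches such a loss with a frozen VGG16 feature extractor; the layer cutoff, the L1 criterion, and the placeholder generator output are assumptions.

# Illustrative sketch: VGG16 feature-based perceptual loss for an image
# enhancement generator (ImageNet input normalization omitted for brevity).
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen VGG16 feature extractor, pretrained on ImageNet.
        self.features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.criterion = nn.L1Loss()

    def forward(self, generated, reference):
        # Grayscale ultrasound frames are repeated across 3 channels for VGG.
        g = generated.repeat(1, 3, 1, 1) if generated.shape[1] == 1 else generated
        r = reference.repeat(1, 3, 1, 1) if reference.shape[1] == 1 else reference
        return self.criterion(self.features(g), self.features(r))

# Usage in a hypothetical training step: penalize feature-space differences
# between the low-quality input and its enhanced translation so anatomical
# content is preserved even though image pairs are not registered.
perc_loss = PerceptualLoss()
low_quality = torch.rand(4, 1, 256, 256)    # stand-in batch of input frames
enhanced = low_quality                      # placeholder for generator(low_quality)
print(perc_loss(enhanced, low_quality).item())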
These results demonstrate the model’s superior ability to enhance image quality while preserving critical anatomical details, thereby improving diagnostic usefulness. Conclusions: This study presents a significant advancement in ultrasound imaging by leveraging a CycleGAN model enhanced with perceptual loss to bridge the quality gap between images from different devices. By processing nonregistered image pairs, the model not only enhances visual quality but also ensures the preservation of essential anatomical structures, crucial for accurate diagnosis. This approach holds the potential to democratize high-quality ultrasound imaging, making it accessible through low-cost portable devices, thereby improving health care outcomes, particularly in resource-limited settings. Future research will focus on further validation and optimization for clinical use. %M 39689310 %R 10.2196/58911 %U https://biomedeng.jmir.org/2024/1/e58911 %U https://doi.org/10.2196/58911 %U http://www.ncbi.nlm.nih.gov/pubmed/39689310 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48544 %T Economic Evaluations and Equity in the Use of Artificial Intelligence in Imaging Exams for Medical Diagnosis in People With Skin, Neurological, and Pulmonary Diseases: Protocol for a Systematic Review %A Santana,Giulia Osório %A Couto,Rodrigo de Macedo %A Loureiro,Rafael Maffei %A Furriel,Brunna Carolinne Rocha Silva %A Rother,Edna Terezinha %A de Paiva,Joselisa Péres Queiroz %A Correia,Lucas Reis %+ PROADI-SUS, Hospital Israelita Albert Einstein, Madre Cabrini Street, 462, Tower A, 5th Floor, São Paulo, Brazil, 55 11 97444 8995, giulia.santana@einstein.br %K artificial intelligence %K economic evaluation %K equity %K medical diagnosis %K health care system %K technology %K systematic review %K cost-effectiveness %K imaging exam %K intervention %D 2023 %7 28.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Traditional health care systems face long-standing challenges, including patient diversity, geographical disparities, and financial constraints. The emergence of artificial intelligence (AI) in health care offers solutions to these challenges. AI, a multidisciplinary field, enhances clinical decision-making. However, imbalanced AI models may enhance health disparities. Objective: This systematic review aims to investigate the economic performance and equity impact of AI in diagnostic imaging for skin, neurological, and pulmonary diseases. The research question is “To what extent does the use of AI in imaging exams for diagnosing skin, neurological, and pulmonary diseases result in improved economic outcomes, and does it promote equity in health care systems?” Methods: The study is a systematic review of economic and equity evaluations following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and CHEERS (Consolidated Health Economic Evaluation Reporting Standards) guidelines. Eligibility criteria include articles reporting on economic evaluations or equity considerations related to AI-based diagnostic imaging for specified diseases. Data will be collected from PubMed, Embase, Scopus, Web of Science, and reference lists. Data quality and transferability will be assessed according to CHEC (Consensus on Health Economic Criteria), EPHPP (Effective Public Health Practice Project), and Welte checklists. Results: This systematic review began in March 2023. The literature search identified 9,526 publications and, after full-text screening, 9 publications were included in the study. 
We plan to submit a manuscript to a peer-reviewed journal once it is finalized, with an expected completion date in January 2024. Conclusions: AI in diagnostic imaging offers potential benefits but also raises concerns about equity and economic impact. Bias in algorithms and disparities in access may hinder equitable outcomes. Evaluating the economic viability of AI applications is essential for resource allocation and affordability. Policy makers and health care stakeholders can benefit from this review’s insights to make informed decisions. Limitations, including study variability and publication bias, will be considered in the analysis. This systematic review will provide valuable insights into the economic and equity implications of AI in diagnostic imaging. It aims to inform evidence-based decision-making and contribute to more efficient and equitable health care systems. International Registered Report Identifier (IRRID): DERR1-10.2196/48544 %M 38153775 %R 10.2196/48544 %U https://www.researchprotocols.org/2023/1/e48544 %U https://doi.org/10.2196/48544 %U http://www.ncbi.nlm.nih.gov/pubmed/38153775 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e53058 %T Risk Prediction of Emergency Department Visits in Patients With Lung Cancer Using Machine Learning: Retrospective Observational Study %A Lee,Ah Ra %A Park,Hojoon %A Yoo,Aram %A Kim,Seok %A Sunwoo,Leonard %A Yoo,Sooyoung %+ Office of eHealth Research and Business, Seoul National University Bundang Hospital, 172, Dolma-ro, Bundang-gu, Seongnam-si, 13605, Republic of Korea, 82 31 787 8980, yoosoo0@snubh.org %K emergency department %K lung cancer %K risk prediction %K machine learning %K common data model %K emergency %K hospitalization %K hospitalizations %K lung %K cancer %K oncology %K lungs %K pulmonary %K respiratory %K predict %K prediction %K predictions %K predictive %K algorithm %K algorithms %K risk %K risks %K model %K models %D 2023 %7 6.12.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patients with lung cancer are among the most frequent visitors to emergency departments due to cancer-related problems, and the prognosis for those who seek emergency care is dismal. Given that patients with lung cancer frequently visit health care facilities for treatment or follow-up, the ability to predict emergency department visits based on clinical information gleaned from their routine visits would enhance hospital resource utilization and patient outcomes. Objective: This study proposed a machine learning–based prediction model to identify risk factors for emergency department visits by patients with lung cancer. Methods: This was a retrospective observational study of patients with lung cancer diagnosed at Seoul National University Bundang Hospital, a tertiary general hospital in South Korea, between January 2010 and December 2017. The primary outcome was an emergency department visit within 30 days of an outpatient visit. This study developed a machine learning–based prediction model using a common data model. In addition, the importance of features that influenced the decision-making of the model output was analyzed to identify significant clinical factors. Results: The model with the best performance demonstrated an area under the receiver operating characteristic curve of 0.73 in its ability to predict the attendance of patients with lung cancer in emergency departments. 
The frequency of recent visits to the emergency department and several laboratory test results that are typically collected during cancer treatment follow-up visits were revealed as influencing factors for the model output. Conclusions: This study developed a machine learning–based risk prediction model using a common data model and identified influencing factors for emergency department visits by patients with lung cancer. The predictive model contributes to the efficiency of resource utilization and health care service quality by facilitating the identification and early intervention of high-risk patients. This study demonstrated the possibility of collaborative research among different institutions using the common data model for precision medicine in lung cancer. %M 38055320 %R 10.2196/53058 %U https://medinform.jmir.org/2023/1/e53058 %U https://doi.org/10.2196/53058 %U http://www.ncbi.nlm.nih.gov/pubmed/38055320 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48142 %T Developing and Evaluating an AI-Based Computer-Aided Diagnosis System for Retinal Disease: Diagnostic Study for Central Serous Chorioretinopathy %A Yoon,Jeewoo %A Han,Jinyoung %A Ko,Junseo %A Choi,Seong %A Park,Ji In %A Hwang,Joon Seo %A Han,Jeong Mo %A Hwang,Daniel Duck-Jin %+ Department of Ophthalmology, Hangil Eye Hospital, 35 Bupyeong-daero, Bupyeong-gu, Incheon, Incheon, 21388, Republic of Korea, 82 327175808, daniel.dj.hwang@gmail.com %K computer aided diagnosis %K ophthalmology %K deep learning %K artificial intelligence %K computer vision %K imaging informatics %K retinal disease %K central serous chorioretinopathy %K diagnostic study %D 2023 %7 29.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Although previous research has made substantial progress in developing high-performance artificial intelligence (AI)–based computer-aided diagnosis (AI-CAD) systems in various medical domains, little attention has been paid to developing and evaluating AI-CAD system in ophthalmology, particularly for diagnosing retinal diseases using optical coherence tomography (OCT) images. Objective: This diagnostic study aimed to determine the usefulness of a proposed AI-CAD system in assisting ophthalmologists with the diagnosis of central serous chorioretinopathy (CSC), which is known to be difficult to diagnose, using OCT images. Methods: For the training and evaluation of the proposed deep learning model, 1693 OCT images were collected and annotated. The data set included 929 and 764 cases of acute and chronic CSC, respectively. In total, 66 ophthalmologists (2 groups: 36 retina and 30 nonretina specialists) participated in the observer performance test. To evaluate the deep learning algorithm used in the proposed AI-CAD system, the training, validation, and test sets were split in an 8:1:1 ratio. Further, 100 randomly sampled OCT images from the test set were used for the observer performance test, and the participants were instructed to select a CSC subtype for each of these images. Each image was provided under different conditions: (1) without AI assistance, (2) with AI assistance with a probability score, and (3) with AI assistance with a probability score and visual evidence heatmap. The sensitivity, specificity, and area under the receiver operating characteristic curve were used to measure the diagnostic performance of the model and ophthalmologists. 
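As an illustrative aside (hypothetical labels and scores, not study data): the sensitivity, specificity, and area under the receiver operating characteristic curve named in the abstract can be computed from per-image predictions as sketched below.

# Illustrative sketch: confusion-matrix metrics and AUROC with scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])    # assumed coding: 1 = chronic CSC, 0 = acute CSC
y_score = np.array([0.9, 0.2, 0.8, 0.6, 0.3, 0.1, 0.7, 0.4, 0.95, 0.05])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auroc = roc_auc_score(y_true, y_score)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, AUROC={auroc:.2f}")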
Results: The proposed system achieved a high detection performance (99% of the area under the curve) for CSC, outperforming the 66 ophthalmologists who participated in the observer performance test. In both groups, ophthalmologists with the support of AI assistance with a probability score and visual evidence heatmap achieved the highest mean diagnostic performance compared with that of those subjected to other conditions (without AI assistance or with AI assistance with a probability score). Nonretina specialists achieved expert-level diagnostic performance with the support of the proposed AI-CAD system. Conclusions: Our proposed AI-CAD system improved the diagnosis of CSC by ophthalmologists, which may support decision-making regarding retinal disease detection and alleviate the workload of ophthalmologists. %M 38019564 %R 10.2196/48142 %U https://www.jmir.org/2023/1/e48142 %U https://doi.org/10.2196/48142 %U http://www.ncbi.nlm.nih.gov/pubmed/38019564 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48754 %T Wearable Artificial Intelligence for Detecting Anxiety: Systematic Review and Meta-Analysis %A Abd-alrazaq,Alaa %A AlSaad,Rawan %A Harfouche,Manale %A Aziz,Sarah %A Ahmed,Arfan %A Damseh,Rafat %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Cornell University, Qatar Foundation - Education City, Ezdan Street, Doha, M343A8, Qatar, 974 44928812, aaa4027@qatar-med.cornell.edu %K anxiety %K artificial intelligence %K wearable devices %K machine learning %K systematic review %K mobile phone %D 2023 %7 8.11.2023 %9 Review %J J Med Internet Res %G English %X Background: Anxiety disorders rank among the most prevalent mental disorders worldwide. Anxiety symptoms are typically evaluated using self-assessment surveys or interview-based assessment methods conducted by clinicians, which can be subjective, time-consuming, and challenging to repeat. Therefore, there is an increasing demand for using technologies capable of providing objective and early detection of anxiety. Wearable artificial intelligence (AI), the combination of AI technology and wearable devices, has been widely used to detect and predict anxiety disorders automatically, objectively, and more efficiently. Objective: This systematic review and meta-analysis aims to assess the performance of wearable AI in detecting and predicting anxiety. Methods: Relevant studies were retrieved by searching 8 electronic databases and backward and forward reference list checking. In total, 2 reviewers independently carried out study selection, data extraction, and risk-of-bias assessment. The included studies were assessed for risk of bias using a modified version of the Quality Assessment of Diagnostic Accuracy Studies–Revised. Evidence was synthesized using a narrative (ie, text and tables) and statistical (ie, meta-analysis) approach as appropriate. Results: Of the 918 records identified, 21 (2.3%) were included in this review. A meta-analysis of results from 81% (17/21) of the studies revealed a pooled mean accuracy of 0.82 (95% CI 0.71-0.89). Meta-analyses of results from 48% (10/21) of the studies showed a pooled mean sensitivity of 0.79 (95% CI 0.57-0.91) and a pooled mean specificity of 0.92 (95% CI 0.68-0.98). Subgroup analyses demonstrated that the performance of wearable AI was not moderated by algorithms, aims of AI, wearable devices used, status of wearable devices, data types, data sources, reference standards, and validation methods. 
Conclusions: Although wearable AI has the potential to detect anxiety, it is not yet advanced enough for clinical use. Until further evidence shows an ideal performance of wearable AI, it should be used along with other clinical assessments. Wearable device companies need to develop devices that can promptly detect anxiety and identify specific time points during the day when anxiety levels are high. Further research is needed to differentiate types of anxiety, compare the performance of different wearable devices, and investigate the impact of the combination of wearable device data and neuroimaging data on the performance of wearable AI. Trial Registration: PROSPERO CRD42023387560; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387560 %M 37938883 %R 10.2196/48754 %U https://www.jmir.org/2023/1/e48754 %U https://doi.org/10.2196/48754 %U http://www.ncbi.nlm.nih.gov/pubmed/37938883 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e44732 %T Physician- and Patient-Elicited Barriers and Facilitators to Implementation of a Machine Learning–Based Screening Tool for Peripheral Arterial Disease: Preimplementation Study With Physician and Patient Stakeholders %A Ho,Vy %A Brown Johnson,Cati %A Ghanzouri,Ilies %A Amal,Saeed %A Asch,Steven %A Ross,Elsie %+ Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, 500 Pasteur Drive, Stanford, CA, 94043, United States, 1 6507232185, vivianho@stanford.edu %K artificial intelligence %K cardiovascular disease %K machine learning %K peripheral arterial disease %K preimplementation study %D 2023 %7 6.11.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Peripheral arterial disease (PAD) is underdiagnosed, partially due to a high prevalence of atypical symptoms and a lack of physician and patient awareness. Implementing clinical decision support tools powered by machine learning algorithms may help physicians identify high-risk patients for diagnostic workup. Objective: This study aims to evaluate barriers and facilitators to the implementation of a novel machine learning–based screening tool for PAD among physician and patient stakeholders using the Consolidated Framework for Implementation Research (CFIR). Methods: We performed semistructured interviews with physicians and patients from the Stanford University Department of Primary Care and Population Health, Division of Cardiology, and Division of Vascular Medicine. Participants answered questions regarding their perceptions toward machine learning and clinical decision support for PAD detection. Rapid thematic analysis was performed using templates incorporating codes from CFIR constructs. Results: A total of 12 physicians (6 primary care physicians and 6 cardiovascular specialists) and 14 patients were interviewed. Barriers to implementation arose from 6 CFIR constructs: complexity, evidence strength and quality, relative priority, external policies and incentives, knowledge and beliefs about intervention, and individual identification with the organization. Facilitators arose from 5 CFIR constructs: intervention source, relative advantage, learning climate, patient needs and resources, and knowledge and beliefs about intervention. Physicians felt that a machine learning–powered diagnostic tool for PAD would improve patient care but cited limited time and authority in asking patients to undergo additional screening procedures. 
Patients were interested in having their physicians use this tool but raised concerns about such technologies replacing human decision-making. Conclusions: Patient- and physician-reported barriers toward the implementation of a machine learning–powered PAD diagnostic tool followed four interdependent themes: (1) low familiarity or urgency in detecting PAD; (2) concerns regarding the reliability of machine learning; (3) differential perceptions of responsibility for PAD care among primary care versus specialty physicians; and (4) patient preference for physicians to remain primary interpreters of health care data. Facilitators followed two interdependent themes: (1) enthusiasm for clinical use of the predictive model and (2) willingness to incorporate machine learning into clinical care. Implementation of machine learning–powered diagnostic tools for PAD should leverage provider support while simultaneously educating stakeholders on the importance of early PAD diagnosis. High predictive validity is necessary for machine learning models but not sufficient for implementation. %M 37930755 %R 10.2196/44732 %U https://cardio.jmir.org/2023/1/e44732 %U https://doi.org/10.2196/44732 %U http://www.ncbi.nlm.nih.gov/pubmed/37930755 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e47532 %T The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study %A Ito,Naoki %A Kadomatsu,Sakina %A Fujisawa,Mineto %A Fukaguchi,Kiyomitsu %A Ishizawa,Ryo %A Kanda,Naoki %A Kasugai,Daisuke %A Nakajima,Mikio %A Goto,Tadahiro %A Tsugawa,Yusuke %+ TXP Medical Co Ltd, 41-1 H¹O Kanda 706, Tokyo, 101-0042, Japan, 81 03 5615 8433, tag695@mail.harvard.edu %K GPT-4 %K racial and ethnic bias %K typical clinical vignettes %K diagnosis %K triage %K artificial intelligence %K AI %K race %K clinical vignettes %K physician %K efficiency %K decision-making %K bias %K GPT %D 2023 %7 2.11.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. Objective: We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. Methods: We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as “correct” or “incorrect.” Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes. Results: The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). 
The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients’ race and ethnicity information was added. The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. Conclusions: GPT-4’s ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage. %M 37917120 %R 10.2196/47532 %U https://mededu.jmir.org/2023/1/e47532 %U https://doi.org/10.2196/47532 %U http://www.ncbi.nlm.nih.gov/pubmed/37917120 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e51375 %T AI Algorithm to Predict Acute Coronary Syndrome in Prehospital Cardiac Care: Retrospective Cohort Study %A de Koning,Enrico %A van der Haas,Yvette %A Saguna,Saguna %A Stoop,Esmee %A Bosch,Jan %A Beeres,Saskia %A Schalij,Martin %A Boogers,Mark %+ Cardiology Department, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZA, Netherlands, 31 715269111, j.m.j.boogers@lumc.nl %K cardiology %K acute coronary syndrome %K Hollands Midden Acute Regional Triage–cardiology %K prehospital %K triage %K artificial intelligence %K natural language processing %K angina %K algorithm %K overcrowding %K emergency department %K clinical decision-making %K emergency medical service %K paramedics %D 2023 %7 31.10.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Overcrowding of hospitals and emergency departments (EDs) is a growing problem. However, not all ED consultations are necessary. For example, 80% of patients in the ED with chest pain do not have an acute coronary syndrome (ACS). Artificial intelligence (AI) is useful in analyzing (medical) data, and might aid health care workers in prehospital clinical decision-making before patients are presented to the hospital. Objective: The aim of this study was to develop an AI model which would be able to predict ACS before patients visit the ED. The model retrospectively analyzed prehospital data acquired by emergency medical services' nurse paramedics. Methods: Patients presenting to the emergency medical services with symptoms suggestive of ACS between September 2018 and September 2020 were included. An AI model using a supervised text classification algorithm was developed to analyze data. Data were analyzed for all 7458 patients (mean 68, SD 15 years, 54% men). Specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for control and intervention groups. At first, a machine learning (ML) algorithm (or model) was chosen; afterward, the features needed were selected and then the model was tested and improved using iterative evaluation and in a further step through hyperparameter tuning. 
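As an illustrative aside (a minimal sketch, not the study's model): the abstract above describes a supervised text classification algorithm refined through iterative evaluation and hyperparameter tuning and assessed with sensitivity, specificity, PPV, and NPV. The snippet below sketches one such pipeline, using TF-IDF features with logistic regression and cross-validated grid search; the example notes, labels, and parameter grid are assumptions.

# Illustrative sketch: supervised text classification of prehospital notes with
# hyperparameter tuning, then screening metrics from the confusion matrix.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import confusion_matrix

# Hypothetical training data: paramedic free-text entries and ACS labels.
notes = ["chest pain radiating to left arm, diaphoresis",
         "sharp pain on inspiration, recent fall",
         "pressure on chest during exertion, nausea",
         "anxiety, palpitations, no chest pain"] * 50
labels = [1, 0, 1, 0] * 50

X_train, X_test, y_train, y_test = train_test_split(
    notes, labels, test_size=0.25, random_state=0, stratify=labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Iterative improvement via hyperparameter tuning, favouring sensitivity
# (recall) because a missed ACS is the costly error in this setting.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]},
                      scoring="recall", cv=5)
search.fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, search.predict(X_test)).ravel()
print(f"sensitivity={tp/(tp+fn):.2f}, specificity={tn/(tn+fp):.2f}, "
      f"PPV={tp/(tp+fp):.2f}, NPV={tn/(tn+fn):.2f}")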
Finally, a method was selected to explain the final AI model. Results: The AI model had a specificity of 11% and a sensitivity of 99.5% whereas usual care had a specificity of 1% and a sensitivity of 99.5%. The PPV of the AI model was 15% and the NPV was 99%. The PPV of usual care was 13% and the NPV was 94%. Conclusions: The AI model was able to predict ACS based on retrospective data from the prehospital setting. It led to an increase in specificity (from 1% to 11%) and NPV (from 94% to 99%) when compared to usual care, with a similar sensitivity. Due to the retrospective nature of this study and the singular focus on ACS it should be seen as a proof-of-concept. Other (possibly life-threatening) diagnoses were not analyzed. Future prospective validation is necessary before implementation. %M 37906226 %R 10.2196/51375 %U https://cardio.jmir.org/2023/1/e51375 %U https://doi.org/10.2196/51375 %U http://www.ncbi.nlm.nih.gov/pubmed/37906226 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50448 %T Clinical Decision Support System for All Stages of Gastric Carcinogenesis in Real-Time Endoscopy: Model Establishment and Validation Study %A Gong,Eun Jeong %A Bang,Chang Seok %A Lee,Jae Jun %A Jeong,Hae Min %A Baik,Gwang Ho %A Jeong,Jae Hoon %A Dick,Sigmund %A Lee,Gi Hun %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, 24253, Republic of Korea, 82 1052657810, csbang@hallym.ac.kr %K atrophy %K intestinal metaplasia %K metaplasia %K deep learning %K endoscopy %K gastric neoplasms %K neoplasm %K neoplasms %K internal medicine %K cancer %K oncology %K decision support %K real time %K gastrointestinal %K gastric %K intestinal %K machine learning %K clinical decision support system %K CDSS %K computer aided %K diagnosis %K diagnostic %K carcinogenesis %D 2023 %7 30.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Our research group previously established a deep-learning–based clinical decision support system (CDSS) for real-time endoscopy-based detection and classification of gastric neoplasms. However, preneoplastic conditions, such as atrophy and intestinal metaplasia (IM) were not taken into account, and there is no established model that classifies all stages of gastric carcinogenesis. Objective: This study aims to build and validate a CDSS for real-time endoscopy for all stages of gastric carcinogenesis, including atrophy and IM. Methods: A total of 11,868 endoscopic images were used for training and internal testing. The primary outcomes were lesion classification accuracy (6 classes: advanced gastric cancer, early gastric cancer, dysplasia, atrophy, IM, and normal) and atrophy and IM lesion segmentation rates for the segmentation model. The following tests were carried out to validate the performance of lesion classification accuracy: (1) external testing using 1282 images from another institution and (2) evaluation of the classification accuracy of atrophy and IM in real-world procedures in a prospective manner. To estimate the clinical utility, 2 experienced endoscopists were invited to perform a blind test with the same data set. A CDSS was constructed by combining the established 6-class lesion classification model and the preneoplastic lesion segmentation model with the previously established lesion detection model. Results: The overall lesion classification accuracy (95% CI) was 90.3% (89%-91.6%) in the internal test. 
For the performance validation, the CDSS achieved 85.3% (83.4%-97.2%) overall accuracy. The per-class external test accuracies for atrophy and IM were 95.3% (92.6%-98%) and 89.3% (85.4%-93.2%), respectively. CDSS-assisted endoscopy showed an accuracy of 92.1% (88.8%-95.4%) for atrophy and 95.5% (92%-99%) for IM in the real-world application of 522 consecutive screening endoscopies. There was no significant difference in the overall accuracy between the invited endoscopists and established CDSS in the prospective real-clinic evaluation (P=.23). The CDSS demonstrated a segmentation rate of 93.4% (95% CI 92.4%-94.4%) for atrophy or IM lesion segmentation in the internal testing. Conclusions: The CDSS achieved high performance in terms of computer-aided diagnosis of all stages of gastric carcinogenesis and demonstrated real-world application potential. %M 37902818 %R 10.2196/50448 %U https://www.jmir.org/2023/1/e50448 %U https://doi.org/10.2196/50448 %U http://www.ncbi.nlm.nih.gov/pubmed/37902818 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47590 %T A Web-Based Calculator to Predict Early Death Among Patients With Bone Metastasis Using Machine Learning Techniques: Development and Validation Study %A Lei,Mingxing %A Wu,Bing %A Zhang,Zhicheng %A Qin,Yong %A Cao,Xuyong %A Cao,Yuncen %A Liu,Baoge %A Su,Xiuyun %A Liu,Yaosheng %+ Department of Orthopedics, The Fifth Medical Center of PLA General Hospital, 8 Fengtaidongda Rd, Fengtai District, Beijing, China, 86 15810069346, liuyaosheng@301hospital.com.cn %K bone metastasis %K early death %K machine learning %K prediction model %K local interpretable model–agnostic explanation %D 2023 %7 23.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients with bone metastasis often experience a significantly limited survival time, and a life expectancy of <3 months is generally regarded as a contraindication for extensive invasive surgeries. In this context, the accurate prediction of survival becomes very important since it serves as a crucial guide in making clinical decisions. Objective: This study aimed to develop a machine learning–based web calculator that can provide an accurate assessment of the likelihood of early death among patients with bone metastasis. Methods: This study analyzed a large cohort of 118,227 patients diagnosed with bone metastasis between 2010 and 2019 using the data obtained from a national cancer database. The entire cohort of patients was randomly split 9:1 into a training group (n=106,492) and a validation group (n=11,735). Six approaches—logistic regression, extreme gradient boosting machine, decision tree, random forest, neural network, and gradient boosting machine—were implemented in this study. The performance of these approaches was evaluated using 11 measures, and each approach was ranked based on its performance in each measure. Patients (n=332) from a teaching hospital were used as the external validation group, and external validation was performed using the optimal model. Results: In the entire cohort, a substantial proportion of patients (43,305/118,227, 36.63%) experienced early death. Among the different approaches evaluated, the gradient boosting machine exhibited the highest score of prediction performance (54 points), followed by the neural network (52 points) and extreme gradient boosting machine (50 points). The gradient boosting machine demonstrated a favorable discrimination ability, with an area under the curve of 0.858 (95% CI 0.851-0.865). 
In addition, the calibration slope was 1.02, and the intercept-in-large value was −0.02, indicating good calibration of the model. Patients were divided into 2 risk groups using a threshold of 37% based on the gradient boosting machine. Patients in the high-risk group (3105/4315, 71.96%) were found to be 4.5 times more likely to experience early death compared with those in the low-risk group (1159/7420, 15.62%). External validation of the model demonstrated a high area under the curve of 0.847 (95% CI 0.798-0.895), indicating its robust performance. The model developed by the gradient boosting machine has been deployed on the internet as a calculator. Conclusions: This study develops a machine learning–based calculator to assess the probability of early death among patients with bone metastasis. The calculator has the potential to guide clinical decision-making and improve the care of patients with bone metastasis by identifying those at a higher risk of early death. %M 37870889 %R 10.2196/47590 %U https://www.jmir.org/2023/1/e47590 %U https://doi.org/10.2196/47590 %U http://www.ncbi.nlm.nih.gov/pubmed/37870889 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48093 %T Digital Marker for Early Screening of Mild Cognitive Impairment Through Hand and Eye Movement Analysis in Virtual Reality Using Machine Learning: First Validation Study %A Kim,Se Young %A Park,Jinseok %A Choi,Hojin %A Loeser,Martin %A Ryu,Hokyoung %A Seo,Kyoungwon %+ Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Sangsang hall, 4th Fl., Gongneung-ro, Gongneung-dong, Nowon-gu, Seoul, 01811, Republic of Korea, 82 010 5668 8660, kwseo@seoultech.ac.kr %K Alzheimer disease %K biomarkers %K dementia %K digital markers %K eye movement %K hand movement %K machine learning %K mild cognitive impairment %K screening %K virtual reality %D 2023 %7 20.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: With the global rise in Alzheimer disease (AD), early screening for mild cognitive impairment (MCI), which is a preclinical stage of AD, is of paramount importance. Although biomarkers such as cerebrospinal fluid amyloid level and magnetic resonance imaging have been studied, they have limitations, such as high cost and invasiveness. Digital markers to assess cognitive impairment by analyzing behavioral data collected from digital devices in daily life can be a new alternative. In this context, we developed a “virtual kiosk test” for early screening of MCI by analyzing behavioral data collected when using a kiosk in a virtual environment. Objective: We aimed to investigate key behavioral features collected from a virtual kiosk test that could distinguish patients with MCI from healthy controls with high statistical significance. Also, we focused on developing a machine learning model capable of early screening of MCI based on these behavioral features. Methods: A total of 51 participants comprising 20 healthy controls and 31 patients with MCI were recruited by 2 neurologists from a university hospital. The participants performed a virtual kiosk test—developed by our group—where we recorded various behavioral data such as hand and eye movements. Based on these time series data, we computed the following 4 behavioral features: hand movement speed, proportion of fixation duration, time to completion, and the number of errors. 
To compare these behavioral features between healthy controls and patients with MCI, independent-samples 2-tailed t tests were used. Additionally, we used these behavioral features to train and validate a machine learning model for early screening of patients with MCI from healthy controls. Results: In the virtual kiosk test, all 4 behavioral features showed statistically significant differences between patients with MCI and healthy controls. Compared with healthy controls, patients with MCI had slower hand movement speed (t49=3.45; P=.004), lower proportion of fixation duration (t49=2.69; P=.04), longer time to completion (t49=–3.44; P=.004), and a greater number of errors (t49=–3.77; P=.001). All 4 features were then used to train a support vector machine to distinguish between healthy controls and patients with MCI. Our machine learning model achieved 93.3% accuracy, 100% sensitivity, 83.3% specificity, 90% precision, and 94.7% F1-score. Conclusions: Our research preliminarily suggests that analyzing hand and eye movements in the virtual kiosk test holds potential as a digital marker for early screening of MCI. In contrast to conventional biomarkers, this digital marker in virtual reality is advantageous as it can collect ecologically valid data at an affordable cost and in a short period (5-15 minutes), making it a suitable means for early screening of MCI. We call for further studies to confirm the reliability and validity of this approach. %M 37862101 %R 10.2196/48093 %U https://www.jmir.org/2023/1/e48093 %U https://doi.org/10.2196/48093 %U http://www.ncbi.nlm.nih.gov/pubmed/37862101 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47346 %T Use of Artificial Intelligence in the Identification and Diagnosis of Frailty Syndrome in Older Adults: Scoping Review %A Velazquez-Diaz,Daniel %A Arco,Juan E %A Ortiz,Andres %A Pérez-Cabezas,Verónica %A Lucena-Anton,David %A Moral-Munoz,Jose A %A Galán-Mercant,Alejandro %+ MOVE-IT Research Group, Department of Nursing and Physiotherapy, Faculty of Health Sciences, University of Cádiz, Ana de Viya, 52, Cádiz, 11003, Spain, 34 676 719 119, veronica.perezcabezas@uca.es %K frail older adult %K identification %K diagnosis %K artificial intelligence %K review %K frailty %K older adults %K aging %K biological variability %K detection %K accuracy %K sensitivity %K screening %K tool %D 2023 %7 20.10.2023 %9 Review %J J Med Internet Res %G English %X Background: Frailty syndrome (FS) is one of the most common noncommunicable diseases, which is associated with lower physical and mental capacities in older adults. FS diagnosis is mostly focused on biological variables; however, it is likely that this diagnosis could fail owing to the high biological variability in this syndrome. Therefore, artificial intelligence (AI) could be a potential strategy to identify and diagnose this complex and multifactorial geriatric syndrome. Objective: The objective of this scoping review was to analyze the existing scientific evidence on the use of AI for the identification and diagnosis of FS in older adults, as well as to identify which model provides enhanced accuracy, sensitivity, specificity, and area under the curve (AUC). Methods: A search was conducted using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines on various databases: PubMed, Web of Science, Scopus, and Google Scholar. 
The search strategy followed Population/Problem, Intervention, Comparison, and Outcome (PICO) criteria with the population being older adults; intervention being AI; comparison being compared or not to other diagnostic methods; and outcome being FS with reported sensitivity, specificity, accuracy, or AUC values. The results were synthesized through information extraction and are presented in tables. Results: We identified 26 studies that met the inclusion criteria, 6 of which had a data set over 2000 and 3 with data sets below 100. Machine learning was the most widely used type of AI, employed in 18 studies. Moreover, of the 26 included studies, 9 used clinical data, with clinical histories being the most frequently used data type in this category. The remaining 17 studies used nonclinical data, most frequently involving activity monitoring using an inertial sensor in clinical and nonclinical contexts. Regarding the performance of each AI model, 10 studies achieved a value of precision, sensitivity, specificity, or AUC ≥90. Conclusions: The findings of this scoping review clarify the overall status of recent studies using AI to identify and diagnose FS. Moreover, the findings show that the combined use of AI using clinical data along with nonclinical information such as the kinematics of inertial sensors that monitor activities in a nonclinical context could be an appropriate tool for the identification and diagnosis of FS. Nevertheless, some possible limitations of the evidence included in the review could be small sample sizes, heterogeneity of study designs, and lack of standardization in the AI models and diagnostic criteria used across studies. Future research is needed to validate AI systems with diverse data sources for diagnosing FS. AI should be used as a decision support tool for identifying FS, with data quality and privacy addressed, and the tool should be regularly monitored for performance after being integrated in clinical practice. %M 37862082 %R 10.2196/47346 %U https://www.jmir.org/2023/1/e47346 %U https://doi.org/10.2196/47346 %U http://www.ncbi.nlm.nih.gov/pubmed/37862082 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e48808 %T ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation %A Hirosawa,Takanobu %A Kawamura,Ren %A Harada,Yukinori %A Mizuta,Kazuya %A Tokumasu,Kazuki %A Kaji,Yuki %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, Tochigi, 321-0293, Japan, 81 282861111, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K AI chatbot %K ChatGPT %K large language models %K clinical decision support %K natural language processing %K diagnostic excellence %K language model %K vignette %K case study %K diagnostic %K accuracy %K decision support %K diagnosis %D 2023 %7 9.10.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown. Objective: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan. 
Methods: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and condensed them into clinical vignettes. Physicians then entered the text of each clinical vignette into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and the top diagnosis. Results: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and within the top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43), although the differences were not significant. The ChatGPT models’ diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022). Conclusions: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making. %M 37812468 %R 10.2196/48808 %U https://medinform.jmir.org/2023/1/e48808 %U https://doi.org/10.2196/48808 %U http://www.ncbi.nlm.nih.gov/pubmed/37812468 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e49898 %T Parkinson Disease Recognition Using a Gamified Website: Machine Learning Development and Usability Study %A Parab,Shubham %A Boster,Jerry %A Washington,Peter %+ Department of Information & Computer Sciences, University of Hawaii at Manoa, 2500 Campus Rd, Honolulu, HI, 96822, United States, 1 1 512 680 0926, pyw@hawaii.edu %K Parkinson disease %K digital health %K machine learning %K remote screening %K accessible screening %D 2023 %7 29.9.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Parkinson disease (PD) affects millions globally, causing motor function impairments. Early detection is vital, and diverse data sources aid diagnosis. We focus on lower arm movements during keyboard and trackpad or touchscreen interactions, which serve as reliable indicators of PD.
Previous works explore keyboard tapping and unstructured device monitoring; we attempt to extend these works with structured tests that take into account 2D hand movement in addition to finger tapping. Our feasibility study uses keystroke and mouse movement data from a remotely conducted, structured, web-based test combined with self-reported PD status to create a predictive model for detecting the presence of PD. Objective: Analysis of finger tapping speed and accuracy through keyboard input and analysis of 2D hand movement through mouse input allowed differentiation between participants with and without PD. This comparative analysis enables us to establish clear distinctions between the two groups and explore the feasibility of using motor behavior to predict the presence of the disease. Methods: Participants were recruited via email by the Hawaii Parkinson Association (HPA) and directed to a web application for the tests. The 2023 HPA symposium was also used as a forum to recruit participants and spread information about our study. The application recorded participant demographics, including age, gender, and race, as well as PD status. We conducted a series of tests to assess finger tapping, using on-screen prompts to request key presses of constant and random keys. Response times, accuracy, and unintended movements resulting in accidental presses were recorded. Participants performed a hand movement test consisting of tracing straight and curved on-screen ribbons using a trackpad or mouse, allowing us to evaluate stability and precision of 2D hand movement. From this tracing, the test collected and stored insights concerning lower arm motor movement. Results: Our formative study included 31 participants, 18 without PD and 13 with PD, and analyzed their lower arm movement data collected from keyboards and computer mice. From the data set, we extracted 28 features and evaluated their importance using an extra trees classifier. A random forest model was trained using the 6 most important features identified by this classifier. These selected features provided insights into precision and movement speed derived from keyboard tapping and mouse tracing tests. This final model achieved an average F1-score of 0.7311 (SD 0.1663) and an average accuracy of 0.7429 (SD 0.1400) over 20 runs for predicting the presence of PD. Conclusions: This preliminary feasibility study suggests the possibility of using technology-based limb movement data to predict the presence of PD, demonstrating the practicality of implementing this approach in a cost-effective and accessible manner. In addition, this study demonstrates that structured mouse movement tests can be used in combination with finger tapping to detect PD.
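The feature-ranking and evaluation procedure reported above (28 candidate features scored with an extra trees classifier, a random forest trained on the top 6, and F1-score averaged over 20 runs) can be outlined roughly as follows; this is a hypothetical sketch on synthetic data with arbitrary parameters, not the authors' implementation.

# Hypothetical sketch: feature ranking with extra trees, then repeated
# evaluation of a random forest on the top-ranked features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=31, n_features=28, random_state=0)

# Rank features by impurity-based importance and keep the top 6.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
top6 = np.argsort(ranker.feature_importances_)[-6:]

scores = []
for run in range(20):  # repeated random splits, as in the reported 20 runs
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, top6], y, test_size=0.3,
                                              stratify=y, random_state=run)
    model = RandomForestClassifier(n_estimators=100, random_state=run).fit(X_tr, y_tr)
    scores.append(f1_score(y_te, model.predict(X_te)))

print(f"mean F1 = {np.mean(scores):.4f} (SD {np.std(scores):.4f})")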
%M 37773607 %R 10.2196/49898 %U https://formative.jmir.org/2023/1/e49898 %U https://doi.org/10.2196/49898 %U http://www.ncbi.nlm.nih.gov/pubmed/37773607 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e47095 %T Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study %A Deng,Yuhan %A Ma,Yuan %A Fu,Jingzhu %A Wang,Xiaona %A Yu,Canqing %A Lv,Jun %A Man,Sailimai %A Wang,Bo %A Li,Liming %+ Meinian Institute of Health, 13 Floor, Health Work, Huayuan Road, Haidian District, Beijing, 100083, China, 86 010 82097560, paul@meinianresearch.com %K machine learning %K carotid plaque %K health check-up %K prediction %K fatty liver %K risk assessment %K risk stratification %K cardiovascular %K logistic regression %D 2023 %7 7.9.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Carotid plaque can progress into stroke, myocardial infarction, etc, which are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, unlike the high detection rate of fatty liver disease, screening for carotid plaque in the asymptomatic population is not yet prevalent due to cost-effectiveness reasons, resulting in a large number of patients with undetected carotid plaques, especially among those with fatty liver disease. Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model among the population with fatty liver disease to identify individuals at risk of carotid plaque. Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest, elastic net (EN), and extreme gradient boosting ML algorithms to select important features from potential predictors. Features acknowledged by all 3 models were enrolled in logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated based on the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis both in a randomly split internal validation data set, and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set. Results: Among the participants, 26.23% (1,421,970/5,420,640) were diagnosed with carotid plaque in the development data set, and 21.64% (7074/32,682) were diagnosed in the external validation data set. A total of 6 features, including age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI) were collectively selected by all 3 ML models out of 27 predictors. After eliminating the issue of collinearity between features, the logistic regression model established with the 5 independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and showed good calibration capability graphically. 
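The consensus feature-selection step described above (keeping only predictors acknowledged by the random forest, elastic net, and extreme gradient boosting models before fitting a logistic regression) might be outlined as below. This is a loose sketch on synthetic data: scikit-learn's GradientBoostingClassifier stands in for the extreme gradient boosting implementation, and the top-10 cutoff is arbitrary.

# Hypothetical sketch: keep predictors ranked highly by all 3 models, then fit
# a plain logistic regression on the consensus set (synthetic data throughout).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=27, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def top_k(scores, k=10):
    """Indices of the k highest-scoring features."""
    return set(np.argsort(scores)[-k:])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)   # XGBoost stand-in
en = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                        max_iter=5000).fit(X_tr, y_tr)            # elastic net

# Features acknowledged by all 3 models.
consensus = sorted(top_k(rf.feature_importances_)
                   & top_k(gb.feature_importances_)
                   & top_k(np.abs(en.coef_[0])))

final = LogisticRegression(max_iter=1000).fit(X_tr[:, consensus], y_tr)
auc = roc_auc_score(y_te, final.predict_proba(X_te[:, consensus])[:, 1])
print("consensus features:", consensus, "AUC:", round(auc, 3))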
Its predictive performance was comprehensively competitive compared with the single use of either logistic regression or ML algorithms. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque. Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model, and was of great public health implications in the early identification and risk assessment of carotid plaque among individuals with fatty liver. %M 37676713 %R 10.2196/47095 %U https://publichealth.jmir.org/2023/1/e47095 %U https://doi.org/10.2196/47095 %U http://www.ncbi.nlm.nih.gov/pubmed/37676713 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42638 %T Smartphone App–Based and Paper-Based Patient-Reported Outcomes Using a Disease-Specific Questionnaire for Dry Eye Disease: Randomized Crossover Equivalence Study %A Nagino,Ken %A Okumura,Yuichi %A Akasaki,Yasutsugu %A Fujio,Kenta %A Huang,Tianxiang %A Sung,Jaemyoung %A Midorikawa-Inomata,Akie %A Fujimoto,Keiichi %A Eguchi,Atsuko %A Hurramhon,Shokirova %A Yee,Alan %A Miura,Maria %A Ohno,Mizu %A Hirosawa,Kunihiko %A Morooka,Yuki %A Murakami,Akira %A Kobayashi,Hiroyuki %A Inomata,Takenori %+ Department of Ophthalmology, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 1130033, Japan, 81 338133111, tinoma@juntendo.ac.jp %K dry eye syndrome %K mobile app %K equivalence trial %K Ocular Surface Disease Index %K patient-reported outcome measures %K mobile health %K reliability %K validity %K telemedicine %K precision medicine %D 2023 %7 3.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Using traditional patient-reported outcomes (PROs), such as paper-based questionnaires, is cumbersome in the era of web-based medical consultation and telemedicine. Electronic PROs may reduce the burden on patients if implemented widely. Considering promising reports of DryEyeRhythm, our in-house mHealth smartphone app for investigating dry eye disease (DED) and the electronic and paper-based Ocular Surface Disease Index (OSDI) should be evaluated and compared to determine their equivalency. Objective: The purpose of this study is to assess the equivalence between smartphone app–based and paper-based questionnaires for DED. Methods: This prospective, nonblinded, randomized crossover study enrolled 34 participants between April 2022 and June 2022 at a university hospital in Japan. The participants were allocated randomly into 2 groups in a 1:1 ratio. The paper-app group initially responded to the paper-based Japanese version of the OSDI (J-OSDI), followed by the app-based J-OSDI. The app-paper group responded to similar questionnaires but in reverse order. We performed an equivalence test based on minimal clinically important differences to assess the equivalence of the J-OSDI total scores between the 2 platforms (paper-based vs app-based). A 95% CI of the mean difference between the J-OSDI total scores within the ±7.0 range between the 2 platforms indicated equivalence. The internal consistency and agreement of the app-based J-OSDI were assessed with Cronbach α coefficients and intraclass correlation coefficient values. Results: A total of 33 participants were included in this study. The total scores for the app- and paper-based J-OSDI indicated satisfactory equivalence per our study definition (mean difference 1.8, 95% CI –1.4 to 5.0). 
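The equivalence criterion used above (the 95% CI of the mean difference in J-OSDI total scores between platforms must lie within ±7.0 points) reduces to a short paired-difference calculation; the sketch below uses invented scores purely to illustrate the arithmetic.

# Hypothetical sketch: equivalence check via the 95% CI of the mean paired
# difference against a +/-7.0 margin (invented J-OSDI totals).
import numpy as np
from scipy import stats

paper = np.array([12.5, 30.0, 45.8, 8.3, 22.9, 16.7])   # paper-based J-OSDI totals
app   = np.array([14.6, 27.1, 47.9, 10.4, 25.0, 18.8])  # app-based J-OSDI totals
margin = 7.0                                             # minimal clinically important difference

diff = app - paper
mean_diff = diff.mean()
sem = stats.sem(diff)
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1, loc=mean_diff, scale=sem)

equivalent = (ci_low > -margin) and (ci_high < margin)
print(f"mean difference {mean_diff:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}, "
      f"equivalent: {equivalent}")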
Moreover, the app-based J-OSDI total score demonstrated good internal consistency and agreement (Cronbach α=.958; intraclass correlation=0.919; 95% CI 0.842 to 0.959) and was significantly correlated with its paper-based counterpart (Pearson correlation=0.932, P<.001). Conclusions: This study demonstrated the equivalence of PROs between the app- and paper-based J-OSDI. Implementing the app-based J-OSDI in various scenarios, including telehealth, may have implications for the early diagnosis of DED and longitudinal monitoring of PROs. %M 37535409 %R 10.2196/42638 %U https://www.jmir.org/2023/1/e42638 %U https://doi.org/10.2196/42638 %U http://www.ncbi.nlm.nih.gov/pubmed/37535409 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e49034 %T Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence–Driven Automated History–Taking System: Pilot Cross-Sectional Study %A Harada,Yukinori %A Tomiyama,Shusaku %A Sakamoto,Tetsu %A Sugimoto,Shu %A Kawamura,Ren %A Yokose,Masashi %A Hayashi,Arisa %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu, Shimotsugagun, 321-0293, Japan, 81 282 86 1111, yharada@dokkyomed.ac.jp %K collective intelligence %K differential diagnosis generator %K diagnostic accuracy %K automated medical history taking system %K artificial intelligence %K AI %D 2023 %7 2.8.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Low diagnostic accuracy is a major concern in automated medical history–taking systems with differential diagnosis (DDx) generators. One possible solution is to extend the concept of collective intelligence to the field of DDx generators, such that accepting an integrated diagnosis list drawn from multiple sources yields higher judgment accuracy than accepting a diagnosis list from a single source. Objective: The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. Methods: We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)–driven automated medical history–taking system from 103 patients with confirmed diagnoses. Two research physicians independently created the other top 10 DDx lists (second and third DDx lists) per case by inputting key information into the other 2 DDx generators based on the medical history generated by the automated medical history–taking system without reading the index lists generated by the automated medical history–taking system. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to the three types of combined DDx lists: (1) simply combining DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists with only shared diagnoses among DDx lists from the index, second, and third lists. We treated the data generated by 2 research physicians from the same patient as independent cases. Therefore, the number of cases included in the analyses using the 2 additional lists was 206 (103 cases × 2 physicians’ input). Results: The diagnostic accuracy of the index lists was 46% (47/103).
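The McNemar test mentioned in the Methods above compares paired correct/incorrect outcomes on the same cases for the index list versus a combined list. The sketch below shows how such a paired comparison is typically computed with statsmodels; the 2x2 counts are invented for illustration.

# Hypothetical sketch: McNemar test on paired per-case hits (index list vs
# combined list), using an invented 2x2 discordance table.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: index list correct / incorrect; columns: combined list correct / incorrect.
table = [[90, 5],    # correct with both, correct only with the index list
         [43, 68]]   # correct only with the combined list, correct with neither

result = mcnemar(table, exact=False, correction=True)
print(f"McNemar chi-square = {result.statistic:.2f}, P = {result.pvalue:.3f}")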
Diagnostic accuracy was improved by simply combining the index list with the 2 additional DDx lists (133/206, 65%, P<.001), whereas the other 2 combination approaches did not improve the diagnostic accuracy of the DDx lists (106/206, 52%, P=.05 for the collective list created with the 1/n weighting rule and 29/206, 14%, P<.001 for the list restricted to diagnoses shared among the 3 DDx lists). Conclusions: Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20%, suggesting that the combinational use of DDx generators early in the diagnostic process is beneficial. %M 37531164 %R 10.2196/49034 %U https://formative.jmir.org/2023/1/e49034 %U https://doi.org/10.2196/49034 %U http://www.ncbi.nlm.nih.gov/pubmed/37531164 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43154 %T Machine and Deep Learning for Tuberculosis Detection on Chest X-Rays: Systematic Literature Review %A Hansun,Seng %A Argha,Ahmadreza %A Liaw,Siaw-Teng %A Celler,Branko G %A Marks,Guy B %+ South West Sydney (SWS), School of Clinical Medicine, University of New South Wales, Burnside Drive, Warwick Farm, New South Wales, Sydney, 2170, Australia, 61 456541224, s.hansun@unsw.edu.au %K chest x-rays %K convolutional neural networks %K diagnostic test accuracy %K machine and deep learning %K PRISMA guidelines %K risk of bias %K QUADAS-2 %K sensitivity and specificity %K systematic literature review %K tuberculosis detection %D 2023 %7 3.7.2023 %9 Review %J J Med Internet Res %G English %X Background: Tuberculosis (TB) was the leading infectious cause of mortality globally prior to COVID-19, and chest radiography has an important role in the detection, and subsequent diagnosis, of patients with this disease. Conventional expert reading has substantial within- and between-observer variability, indicating poor reliability of human readers. Substantial efforts have been made in utilizing various artificial intelligence–based algorithms to address the limitations of human reading of chest radiographs for diagnosing TB. Objective: This systematic literature review (SLR) aims to assess the performance of machine learning (ML) and deep learning (DL) in the detection of TB using chest radiography (chest x-ray [CXR]). Methods: In conducting and reporting the SLR, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A total of 309 records were identified from Scopus, PubMed, and IEEE (Institute of Electrical and Electronics Engineers) databases. We independently screened, reviewed, and assessed all available records and included 47 studies that met the inclusion criteria in this SLR. We also performed the risk of bias assessment using Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) and meta-analysis of 10 included studies that provided confusion matrix results. Results: Various CXR data sets have been used in the included studies, with 2 of the most popular ones being Montgomery County (n=29) and Shenzhen (n=36) data sets. DL (n=34) was more commonly used than ML (n=7) in the included studies. Most studies used a human radiologist’s report as the reference standard. Support vector machine (n=5), k-nearest neighbors (n=3), and random forest (n=2) were the most popular ML approaches. Meanwhile, convolutional neural networks were the most commonly used DL techniques, with the 4 most popular applications being ResNet-50 (n=11), VGG-16 (n=8), VGG-19 (n=7), and AlexNet (n=6).
Four performance metrics were most commonly used, namely, accuracy (n=35), area under the curve (AUC; n=34), sensitivity (n=27), and specificity (n=23). In terms of the performance results, ML showed higher accuracy (mean ~93.71%) and sensitivity (mean ~92.55%), while on average DL models achieved better AUC (mean ~92.12%) and specificity (mean ~91.54%). Based on data from 10 studies that provided confusion matrix results, we estimated the pooled sensitivity and specificity of ML and DL methods to be 0.9857 (95% CI 0.9477-1.00) and 0.9805 (95% CI 0.9255-1.00), respectively. From the risk of bias assessment, 17 studies were regarded as having unclear risks for the reference standard aspect and 6 studies were regarded as having unclear risks for the flow and timing aspect. Only 2 included studies had built applications based on the proposed solutions. Conclusions: Findings from this SLR confirm the high potential of both ML and DL for TB detection using CXR. Future studies need to pay close attention to 2 aspects of risk of bias, namely, the reference standard and the flow and timing aspects. Trial Registration: PROSPERO CRD42021277155; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=277155 %M 37399055 %R 10.2196/43154 %U https://www.jmir.org/2023/1/e43154 %U https://doi.org/10.2196/43154 %U http://www.ncbi.nlm.nih.gov/pubmed/37399055 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44642 %T Accuracy of a Standalone Atrial Fibrillation Detection Algorithm Added to a Popular Wristband and Smartwatch: Prospective Diagnostic Accuracy Study %A Selder,Jasper L %A Te Kolste,Henryk Jan %A Twisk,Jos %A Schijven,Marlies %A Gielen,Willem %A Allaart,Cornelis P %+ Department of Cardiology, Amsterdam University Medical Center, De Boelelaan 1117, Amsterdam, 1081 HV, Netherlands, 31 645256921, j.selder@amsterdamumc.nl %K smartwatch %K atrial fibrillation %K algorithm %K fibrillation detection %K wristband %K diagnose %K heart rhythm %K cardioversion %K environment %K software algorithm %K artificial intelligence %K AI %K electrocardiography %K ECG %K EKG %D 2023 %7 26.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Silent paroxysmal atrial fibrillation (AF) may be difficult to diagnose, and AF burden is hard to establish. In contrast to conventional diagnostic devices, photoplethysmography (PPG)–driven smartwatches or wristbands allow for long-term continuous heart rhythm assessment. However, most smartwatches lack an integrated PPG-AF algorithm. Adding a standalone PPG-AF algorithm to these wrist devices might open new possibilities for AF screening and burden assessment. Objective: The aim of this study was to assess the accuracy of a well-known standalone PPG-AF detection algorithm added to a popular wristband and smartwatch, with regard to discriminating AF and sinus rhythm, in a group of patients with AF before and after cardioversion (CV). Methods: Consecutive consenting patients with AF admitted for CV in a large academic hospital in Amsterdam, the Netherlands, were asked to wear a Biostrap wristband or Fitbit Ionic smartwatch with the Fibricheck algorithm add-on surrounding the procedure. A set of 1-min PPG measurements and 12-lead reference electrocardiograms was obtained before and after CV. Rhythm assessment by the PPG device-software combination was compared with the 12-lead electrocardiogram.
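Several studies in this list, including the smartwatch study above, report sensitivity, specificity, PPV, NPV, and accuracy computed from a confusion matrix of index-test results against a reference standard. As a reminder of how those quantities relate, with invented counts only:

# Hypothetical sketch: standard diagnostic accuracy metrics from a 2x2
# confusion matrix of index-test results against the reference standard.
tp, fp, fn, tn = 70, 3, 1, 63   # invented counts

sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)            # true negative rate
ppv = tp / (tp + fp)                    # positive predictive value
npv = tn / (tn + fn)                    # negative predictive value
accuracy = (tp + tn) / (tp + fp + fn + tn)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("accuracy", accuracy)]:
    print(f"{name}: {100 * value:.1f}%")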
Results: A total of 78 patients were included in the Biostrap-Fibricheck cohort (156 measurement sets) and 73 patients in the Fitbit-Fibricheck cohort (143 measurement sets). Of the measurement sets, 19/156 (12%) and 7/143 (5%), respectively, were not classifiable by the PPG algorithm due to bad quality. The diagnostic performance in terms of sensitivity, specificity, positive predictive value, negative predictive value, and accuracy was 98%, 96%, 96%, 99%, 97%, and 97%, 100%, 100%, 97%, and 99%, respectively, at an AF prevalence of ~50%. Conclusions: This study demonstrates that the addition of a well-known standalone PPG-AF detection algorithm to a popular PPG smartwatch and wristband without integrated algorithm yields a high accuracy for the detection of AF, with an acceptable unclassifiable rate, in a semicontrolled environment. %M 37234033 %R 10.2196/44642 %U https://www.jmir.org/2023/1/e44642 %U https://doi.org/10.2196/44642 %U http://www.ncbi.nlm.nih.gov/pubmed/37234033 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e40887 %T Optometrists' Perspectives Regarding Artificial Intelligence Aids and Contributing Retinal Images to a Repository: Web-Based Interview Study %A Constantin,Aurora %A Atkinson,Malcolm %A Bernabeu,Miguel Oscar %A Buckmaster,Fiona %A Dhillon,Baljean %A McTrusty,Alice %A Strang,Niall %A Williams,Robin %+ Department of Vision Sciences, Glasgow Caledonian University, Cowcaddens Road, Glasgow, G40BA, United Kingdom, 44 07794835467, n.strang@gcu.ac.uk %K AI in optometry %K repository of ocular images %K user studies %K AI decision support tools %K perspectives of optometrists and ophthalmologists %K AI %K research %K medical %K decision support %K tool %K digital tool %K digital %D 2023 %7 25.5.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: A repository of retinal images for research is being established in Scotland. It will permit researchers to validate, tune, and refine artificial intelligence (AI) decision-support algorithms to accelerate safe deployment in Scottish optometry and beyond. Research demonstrates the potential of AI systems in optometry and ophthalmology, though they are not yet widely adopted. Objective: In this study, 18 optometrists were interviewed to (1) identify their expectations and concerns about the national image research repository and their use of AI decision support and (2) gather their suggestions for improving eye health care. The goal was to clarify attitudes among optometrists delivering primary eye care with respect to contributing their patients’ images and to using AI assistance. These attitudes are less well studied in primary care contexts. Five ophthalmologists were interviewed to discover their interactions with optometrists. Methods: Between March and August 2021, 23 semistructured interviews were conducted online lasting for 30-60 minutes. Transcribed and pseudonymized recordings were analyzed using thematic analysis. Results: All optometrists supported contributing retinal images to form an extensive and long-running research repository. Our main findings are summarized as follows. Optometrists were willing to share images of their patients’ eyes but expressed concern about technical difficulties, lack of standardization, and the effort involved. Those interviewed thought that sharing digital images would improve collaboration between optometrists and ophthalmologists, for example, during referral to secondary health care. 
Optometrists welcomed an expanded primary care role in diagnosis and management of diseases by exploiting new technologies and anticipated significant health benefits. Optometrists welcomed AI assistance but insisted that it should not reduce their role and responsibilities. Conclusions: Our investigation focusing on optometrists is novel because most similar studies on AI assistance were performed in hospital settings. Our findings are consistent with those of studies with professionals in ophthalmology and other medical disciplines: showing near universal willingness to use AI to improve health care, alongside concerns over training, costs, responsibilities, skill retention, data sharing, and disruptions to professional practices. Our study on optometrists’ willingness to contribute images to a research repository introduces a new aspect; they hope that a digital image sharing infrastructure will facilitate service integration. %M 37227761 %R 10.2196/40887 %U https://humanfactors.jmir.org/2023/1/e40887 %U https://doi.org/10.2196/40887 %U http://www.ncbi.nlm.nih.gov/pubmed/37227761 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e47564 %T User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study %A Shahsavar,Yeganeh %A Choudhury,Avishek %+ Industrial and Management Systems Engineering, Benjamin M Statler College of Engineering and Mineral Resources, West Virginia University, 1306 Evansdale Drive, 321 Engineering Sciences Building, Morgantown, WV, 26506, United States, 1 3042934970, avishek.choudhury@mail.wvu.edu %K human factors %K behavioral intention %K chatbots %K health care %K integrated diagnostics %K use %K ChatGPT %K artificial intelligence %K users %K self-diagnosis %K decision-making %K integration %K willingness %K policy %D 2023 %7 17.5.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have emerged as potential tools for various applications, including health care. However, ChatGPT is not specifically designed for health care purposes, and its use for self-diagnosis raises concerns regarding its adoption’s potential risks and benefits. Users are increasingly inclined to use ChatGPT for self-diagnosis, necessitating a deeper understanding of the factors driving this trend. Objective: This study aims to investigate the factors influencing users’ perception of decision-making processes and intentions to use ChatGPT for self-diagnosis and to explore the implications of these findings for the safe and effective integration of AI chatbots in health care. Methods: A cross-sectional survey design was used, and data were collected from 607 participants. The relationships between performance expectancy, risk-reward appraisal, decision-making, and intention to use ChatGPT for self-diagnosis were analyzed using partial least squares structural equation modeling (PLS-SEM). Results: Most respondents were willing to use ChatGPT for self-diagnosis (n=476, 78.4%). The model demonstrated satisfactory explanatory power, accounting for 52.4% of the variance in decision-making and 38.1% in the intent to use ChatGPT for self-diagnosis. 
The results supported all 3 hypotheses: The higher performance expectancy of ChatGPT (β=.547, 95% CI 0.474-0.620) and positive risk-reward appraisals (β=.245, 95% CI 0.161-0.325) were positively associated with the improved perception of decision-making outcomes among users, and enhanced perception of decision-making processes involving ChatGPT positively impacted users’ intentions to use the technology for self-diagnosis (β=.565, 95% CI 0.498-0.628). Conclusions: Our research investigated factors influencing users’ intentions to use ChatGPT for self-diagnosis and health-related purposes. Even though the technology is not specifically designed for health care, people are inclined to use ChatGPT in health care contexts. Instead of solely focusing on discouraging its use for health care purposes, we advocate for improving the technology and adapting it for suitable health care applications. Our study highlights the importance of collaboration among AI developers, health care providers, and policy makers in ensuring AI chatbots’ safe and responsible use in health care. By understanding users’ expectations and decision-making processes, we can develop AI chatbots, such as ChatGPT, that are tailored to human needs, providing reliable and verified health information sources. This approach not only enhances health care accessibility but also improves health literacy and awareness. As the field of AI chatbots in health care continues to evolve, future research should explore the long-term effects of using AI chatbots for self-diagnosis and investigate their potential integration with other digital health interventions to optimize patient care and outcomes. In doing so, we can ensure that AI chatbots, including ChatGPT, are designed and implemented to safeguard users’ well-being and support positive health outcomes in health care settings. %M 37195756 %R 10.2196/47564 %U https://humanfactors.jmir.org/2023/1/e47564 %U https://doi.org/10.2196/47564 %U http://www.ncbi.nlm.nih.gov/pubmed/37195756 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e44650 %T Clinical Validation of an Artificial Intelligence–Based Tool for Automatic Estimation of Left Ventricular Ejection Fraction and Strain in Echocardiography: Protocol for a Two-Phase Prospective Cohort Study %A Hadjidimitriou,Stelios %A Pagourelias,Efstathios %A Apostolidis,Georgios %A Dimaridis,Ioannis %A Charisis,Vasileios %A Bakogiannis,Constantinos %A Hadjileontiadis,Leontios %A Vassilikos,Vassilios %+ Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, University Campus, Faculty of Engineering, Building D, 6th Fl., Thessaloniki, GR-54124, Greece, 30 2310996319, stelios.hadjidimitriou@gmail.com %K artificial intelligence %K clinical validation %K computer-aided diagnosis %K echocardiography %K ejection fraction %K global longitudinal strain %K left ventricle %K prospective cohort design %K ultrasound %D 2023 %7 13.3.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Echocardiography (ECHO) is a type of ultrasonographic procedure for examining the cardiac function and morphology, with functional parameters of the left ventricle (LV), such as the ejection fraction (EF) and global longitudinal strain (GLS), being important indicators. 
Estimation of LV-EF and LV-GLS is performed either manually or semiautomatically by cardiologists and requires a nonnegligible amount of time, while estimation accuracy depends on scan quality and the clinician’s experience in ECHO, leading to considerable measurement variability. Objective: The aim of this study is to externally validate the clinical performance of a trained artificial intelligence (AI)–based tool that automatically estimates LV-EF and LV-GLS from transthoracic ECHO scans and to produce preliminary evidence regarding its utility. Methods: This is a prospective cohort study conducted in 2 phases. ECHO scans will be collected from 120 participants referred for ECHO examination based on routine clinical practice in the Hippokration General Hospital, Thessaloniki, Greece. During the first phase, 60 scans will be processed by 15 cardiologists of different experience levels and the AI-based tool to determine whether the latter is noninferior in LV-EF and LV-GLS estimation accuracy (primary outcomes) compared to cardiologists. Secondary outcomes include the time required for estimation and Bland-Altman plots and intraclass correlation coefficients to assess measurement reliability for both the AI and cardiologists. In the second phase, the rest of the scans will be examined by the same cardiologists with and without the AI-based tool to primarily evaluate whether the combination of the cardiologist and the tool is superior in terms of correctness of LV function diagnosis (normal or abnormal) to the cardiologist’s routine examination practice, accounting for the cardiologist’s level of ECHO experience. Secondary outcomes include time to diagnosis and the system usability scale score. Reference LV-EF and LV-GLS measurements and LV function diagnoses will be provided by a panel of 3 expert cardiologists. Results: Recruitment started in September 2022, and data collection is ongoing. The results of the first phase are expected to be available by summer 2023, while the study will conclude in May 2024, with the end of the second phase. Conclusions: This study will provide external evidence regarding the clinical performance and utility of the AI-based tool based on prospectively collected ECHO scans in the routine clinical setting, thus reflecting real-world clinical scenarios. The study protocol may be useful to investigators conducting similar research. 
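The reliability analyses planned above include Bland-Altman plots; the underlying quantities (bias and 95% limits of agreement for paired LV-EF measurements) can be computed as in the short sketch below, which uses invented values and is not part of the study protocol.

# Hypothetical sketch: Bland-Altman bias and 95% limits of agreement for
# paired LV-EF estimates (AI tool vs expert panel reference).
import numpy as np

ai_ef     = np.array([55.0, 48.2, 61.5, 35.4, 59.8, 42.1])  # invented AI estimates (%)
expert_ef = np.array([57.1, 47.0, 60.2, 38.0, 58.5, 44.3])  # invented reference values (%)

diff = ai_ef - expert_ef
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)            # half-width of the limits of agreement

print(f"bias = {bias:.2f} percentage points")
print(f"95% limits of agreement: {bias - loa:.2f} to {bias + loa:.2f}")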
International Registered Report Identifier (IRRID): DERR1-10.2196/44650 %M 36912875 %R 10.2196/44650 %U https://www.researchprotocols.org/2023/1/e44650 %U https://doi.org/10.2196/44650 %U http://www.ncbi.nlm.nih.gov/pubmed/36912875 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 11 %P e35622 %T Discovery and Analytical Validation of a Vocal Biomarker to Monitor Anosmia and Ageusia in Patients With COVID-19: Cross-sectional Study %A Higa,Eduardo %A Elbéji,Abir %A Zhang,Lu %A Fischer,Aurélie %A Aguayo,Gloria A %A Nazarov,Petr V %A Fagherazzi,Guy %+ Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, Strassen, L1445, Luxembourg, 1 26970 457, guy.fagherazzi@gmail.com %K vocal biomarker %K COVID-19 %K ageusia %K anosmia %K loss of smell %K loss of taste %K digital assessment tool %K digital health %K medical informatics %K telehealth %K telemonitoring %K biomarker %K pandemic %K symptoms %K tool %K disease %K noninvasive %K AI %K artificial intelligence %K digital %K device %D 2022 %7 8.11.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: The COVID-19 disease has multiple symptoms, with anosmia and ageusia being the most prevalent, varying from 75% to 95% and from 50% to 80% of infected patients, respectively. An automatic assessment tool for these symptoms will help monitor the disease in a fast and noninvasive manner. Objective: We hypothesized that people with COVID-19 experiencing anosmia and ageusia had different voice features than those without such symptoms. Our objective was to develop an artificial intelligence pipeline to identify and internally validate a vocal biomarker of these symptoms for remotely monitoring them. Methods: This study used population-based data. Participants were assessed daily through a web-based questionnaire and asked to register 2 different types of voice recordings. They were adults (aged >18 years) who were confirmed by a polymerase chain reaction test to be positive for COVID-19 in Luxembourg and met the inclusion criteria. Statistical methods such as recursive feature elimination for dimensionality reduction, multiple statistical learning methods, and hypothesis tests were used throughout this study. The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) Prediction Model Development checklist was used to structure the research. Results: This study included 259 participants. Younger (aged <35 years) and female participants showed higher rates of ageusia and anosmia. Participants were aged 41 (SD 13) years on average, and the data set was balanced for sex (female: 134/259, 51.7%; male: 125/259, 48.3%). The analyzed symptom was present in 94 (36.3%) out of 259 participants and in 450 (27.5%) out of 1636 audio recordings. In all, 2 machine learning models were built, one for Android and one for iOS devices, and both had high accuracy—88% for Android and 85% for iOS. The final biomarker was then calculated using these models and internally validated. Conclusions: This study demonstrates that people with COVID-19 who have anosmia and ageusia have different voice features from those without these symptoms. Upon further validation, these vocal biomarkers could be nested in digital devices to improve symptom assessment in clinical practice and enhance the telemonitoring of COVID-19–related symptoms. 
Trial Registration: Clinicaltrials.gov NCT04380987; https://clinicaltrials.gov/ct2/show/NCT04380987 %M 36265042 %R 10.2196/35622 %U https://medinform.jmir.org/2022/11/e35622 %U https://doi.org/10.2196/35622 %U http://www.ncbi.nlm.nih.gov/pubmed/36265042 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e36877 %T Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study %A Wendelboe,Aaron %A Saber,Ibrahim %A Dvorak,Justin %A Adamski,Alys %A Feland,Natalie %A Reyes,Nimia %A Abe,Karon %A Ortel,Thomas %A Raskob,Gary %+ Department of Biostatistics and Epidemiology, Hudson College of Public Health, University of Oklahoma Health Sciences Center, CHB Room 301, 801 NE 13th Street, Oklahoma City, OK, 73104, United States, 1 405 271 2229 ext 57897, Aaron-Wendelboe@ouhsc.edu %K venous thromboembolism %K public health surveillance %K machine learning %K natural language processing %K medical imaging review %K public health %D 2022 %7 5.8.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools have the ability to access electronic medical records, identify patients that meet the VTE case definition, and subsequently enter the relevant information into a database for hospital review. Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University)—an NLP tool—in automatically classifying cases of VTE by “reading” unstructured text from diagnostic imaging records collected from 2012 to 2014. Methods: After accessing imaging records from pilot surveillance systems for VTE from Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified. Experts reviewed the technicians’ comments in each record to determine if a VTE event occurred. The performance measures calculated (with 95% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05. Results: The VTE model of IDEAL-X “read” 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI 93.7%-93.8%), 96.3% sensitivity (95% CI 96.2%-96.4%), 92% specificity (95% CI 91.9%-92%), an 89.1% positive predictive value (95% CI 89%-89.2%), and a 97.3% negative predictive value (95% CI 97.3%-97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI 97.8%-98%) than at the OUHSC (93.3%, 95% CI 93.1%-93.4%; P<.001), but the specificity was higher at the OUHSC (95.9%, 95% CI 95.8%-96%) than at Duke University (86.5%, 95% CI 86.4%-86.7%; P<.001). Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. 
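The between-site comparisons above rely on chi-square tests of homogeneity. A toy version of such a comparison for sensitivity, built from invented site-level true-positive and false-negative counts rather than the study's data, might look like this:

# Hypothetical sketch: chi-square test of homogeneity for sensitivity at
# two sites, built from invented true-positive / false-negative counts.
from scipy.stats import chi2_contingency

#                 TP   FN
site_counts = [[470,  10],   # site A
               [420,  30]]   # site B

chi2, p_value, dof, expected = chi2_contingency(site_counts)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p_value:.4f}")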
NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X in a medical record system could further automate the surveillance process. %M 37206160 %R 10.2196/36877 %U https://bioinform.jmir.org/2022/1/e36877 %U https://doi.org/10.2196/36877 %U http://www.ncbi.nlm.nih.gov/pubmed/37206160 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e34126 %T A Questionnaire-Based Ensemble Learning Model to Predict the Diagnosis of Vertigo: Model Development and Validation Study %A Yu,Fangzhou %A Wu,Peixia %A Deng,Haowen %A Wu,Jingfang %A Sun,Shan %A Yu,Huiqian %A Yang,Jianming %A Luo,Xianyang %A He,Jing %A Ma,Xiulan %A Wen,Junxiong %A Qiu,Danhong %A Nie,Guohui %A Liu,Rizhao %A Hu,Guohua %A Chen,Tao %A Zhang,Cheng %A Li,Huawei %+ Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, Room 611, Building 9, No. 83, Fenyang Road, Xuhui District, Shanghai, 200031, China, 86 021 64377134 ext 2669, hwli@shmu.edu.cn %K vestibular disorders %K machine learning %K diagnostic model %K vertigo %K ENT %K questionnaire %D 2022 %7 3.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Questionnaires have been used in the past 2 decades to predict the diagnosis of vertigo and assist clinical decision-making. A questionnaire-based machine learning model is expected to improve the efficiency of diagnosis of vestibular disorders. Objective: This study aims to develop and validate a questionnaire-based machine learning model that predicts the diagnosis of vertigo. Methods: In this multicenter prospective study, patients presenting with vertigo entered a consecutive cohort at their first visit to the ENT and vertigo clinics of 7 tertiary referral centers from August 2019 to March 2021, with a follow-up period of 2 months. All participants completed a diagnostic questionnaire after eligibility screening. Patients who received only 1 final diagnosis by their treating specialists for their primary complaint were included in model development and validation. The data of patients enrolled before February 1, 2021 were used for modeling and cross-validation, while patients enrolled afterward entered external validation. Results: A total of 1693 patients were enrolled, with a response rate of 96.2% (1693/1760). The median age was 51 (IQR 38-61) years, with 991 (58.5%) females; 1041 (61.5%) patients received the final diagnosis during the study period. Among them, 928 (54.8%) patients were included in model development and validation, and 113 (6.7%) patients who enrolled later were used as a test set for external validation. They were classified into 5 diagnostic categories. We compared 9 candidate machine learning methods, and the recalibrated model of light gradient boosting machine achieved the best performance, with an area under the curve of 0.937 (95% CI 0.917-0.962) in cross-validation and 0.954 (95% CI 0.944-0.967) in external validation. Conclusions: The questionnaire-based light gradient boosting machine was able to predict common vestibular disorders and assist decision-making in ENT and vertigo clinics. Further studies with a larger sample size and the participation of neurologists will help assess the generalization and robustness of this machine learning method. 
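The vertigo study above selects a recalibrated light gradient boosting machine as its best questionnaire-based classifier. A minimal sketch of cross-validated training of such a model is given below, assuming the LightGBM package is available; the synthetic questionnaire items, five-category outcome, and hyperparameters are placeholders rather than the authors' settings.

```python
# Illustrative sketch: a light gradient boosting machine trained on questionnaire
# answers with cross-validated one-vs-rest AUC. Data and settings are synthetic.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(928, 60)).astype(float)   # 60 questionnaire items
y = rng.integers(0, 5, size=928)                       # 5 diagnostic categories

model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc_ovr")
print(f"cross-validated one-vs-rest AUC: {auc.mean():.3f}")
```

Probability recalibration (for example, with scikit-learn's CalibratedClassifierCV) would be layered on top of such a model, since the study reports a recalibrated version performing best.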
%M 35921135 %R 10.2196/34126 %U https://www.jmir.org/2022/8/e34126 %U https://doi.org/10.2196/34126 %U http://www.ncbi.nlm.nih.gov/pubmed/35921135 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 2 %P e36825 %T A Computerized Cognitive Test Battery for Detection of Dementia and Mild Cognitive Impairment: Instrument Validation Study %A Ye,Siao %A Sun,Kevin %A Huynh,Duong %A Phi,Huy Q %A Ko,Brian %A Huang,Bin %A Hosseini Ghomi,Reza %+ BrainCheck, Inc, 5616 Kirby Dr. # 690, Houston, TX, 77005, United States, 1 888 416 0004, bin@braincheck.com %K cognitive test %K mild cognitive impairment %K dementia %K cognitive decline %K repeatable battery %K discriminant analysis %D 2022 %7 15.4.2022 %9 Original Paper %J JMIR Aging %G English %X Background: Early detection of dementia is critical for intervention and care planning but remains difficult. Computerized cognitive testing provides an accessible and promising solution to address these current challenges. Objective: The aim of this study was to evaluate a computerized cognitive testing battery (BrainCheck) for its diagnostic accuracy and ability to distinguish the severity of cognitive impairment. Methods: A total of 99 participants diagnosed with dementia, mild cognitive impairment (MCI), or normal cognition (NC) completed the BrainCheck battery. Statistical analyses compared participant performances on BrainCheck based on their diagnostic group. Results: BrainCheck battery performance showed significant differences between the NC, MCI, and dementia groups, achieving 88% or higher sensitivity and specificity (ie, true positive and true negative rates) for separating dementia from NC, and 77% or higher sensitivity and specificity in separating the MCI group from the NC and dementia groups. Three-group classification found true positive rates of 80% or higher for the NC and dementia groups and true positive rates of 64% or higher for the MCI group. Conclusions: BrainCheck was able to distinguish between diagnoses of dementia, MCI, and NC, providing a potentially reliable tool for early detection of cognitive impairment. %M 35436212 %R 10.2196/36825 %U https://aging.jmir.org/2022/2/e36825 %U https://doi.org/10.2196/36825 %U http://www.ncbi.nlm.nih.gov/pubmed/35436212 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e31106 %T A Digital Screening System for Alzheimer Disease Based on a Neuropsychological Test and a Convolutional Neural Network: System Development and Validation %A Cheah,Wen-Ting %A Hwang,Jwu-Jia %A Hong,Sheng-Yi %A Fu,Li-Chen %A Chang,Yu-Ling %A Chen,Ta-Fu %A Chen,I-An %A Chou,Chun-Chen %+ Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd, Taipei, 10617, Taiwan, 886 935545846, lichen@ntu.edu.tw %K Alzheimer disease %K mild cognitive impairment %K screening system %K convolutional neural network %K Rey-Osterrieth Complex Figure %D 2022 %7 9.3.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Alzheimer disease (AD) and other types of dementia are now considered one of the world’s most pressing health problems for aging people worldwide. It was the seventh-leading cause of death, globally, in 2019. With a growing number of patients with dementia and increasing costs for treatment and care, early detection of the disease at the stage of mild cognitive impairment (MCI) will prevent the rapid progression of dementia. 
In addition to reducing the physical and psychological stress of patients’ caregivers in the long term, it will also improve the everyday quality of life of patients. Objective: The aim of this study was to design a digital screening system to discriminate between patients with MCI and AD and healthy controls (HCs), based on the Rey-Osterrieth Complex Figure (ROCF) neuropsychological test. Methods: The study took place at National Taiwan University between 2018 and 2019. In order to develop the system, pretraining was performed using, and features were extracted from, an open sketch data set using a data-driven deep learning approach through a convolutional neural network. Later, the learned features were transferred to our collected data set to further train the classifier. The first data set was collected using pen and paper for the traditional method. The second data set used a tablet and smart pen for data collection. The system’s performance was then evaluated using the data sets. Results: The performance of the designed system when using the data set that was collected using the traditional pen and paper method resulted in a mean area under the receiver operating characteristic curve (AUROC) of 0.913 (SD 0.004) when distinguishing between patients with MCI and HCs. On the other hand, when discriminating between patients with AD and HCs, the mean AUROC was 0.950 (SD 0.003) when using the data set that was collected using the digitalized method. Conclusions: The automatic ROCF test scoring system that we designed showed satisfying results for differentiating between patients with AD and MCI and HCs. Comparatively, our proposed network architecture provided better performance than our previous work, which did not include data augmentation and dropout techniques. In addition, it also performed better than other existing network architectures, such as AlexNet and Sketch-a-Net, with transfer learning techniques. The proposed system can be incorporated with other tests to assist clinicians in the early diagnosis of AD and to reduce the physical and mental burden on patients’ family and friends. %M 35262497 %R 10.2196/31106 %U https://medinform.jmir.org/2022/3/e31106 %U https://doi.org/10.2196/31106 %U http://www.ncbi.nlm.nih.gov/pubmed/35262497 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e28781 %T State of the Art of Machine Learning–Enabled Clinical Decision Support in Intensive Care Units: Literature Review %A Hong,Na %A Liu,Chun %A Gao,Jianwei %A Han,Lin %A Chang,Fengxiang %A Gong,Mengchun %A Su,Longxiang %+ Department of Critical Care Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, No.1 Shuaifuyuan Wangfujing Dongcheng District, Beijing, China, 86 10 69152308, sulongxiang@vip.163.com %K machine learning %K intensive care units %K clinical decision support %K prediction model %K artificial intelligence %K electronic health records %D 2022 %7 3.3.2022 %9 Review %J JMIR Med Inform %G English %X Background: Modern clinical care in intensive care units is full of rich data, and machine learning has great potential to support clinical decision-making. The development of intelligent machine learning–based clinical decision support systems is facing great opportunities and challenges. 
Clinical decision support systems may directly help clinicians accurately diagnose, predict outcomes, identify risk events, or decide treatments at the point of care. Objective: We aimed to review the research and application of machine learning–enabled clinical decision support studies in intensive care units to help clinicians, researchers, developers, and policy makers better understand the advantages and limitations of machine learning–supported diagnosis, outcome prediction, risk event identification, and intensive care unit point-of-care recommendations. Methods: We searched papers published in the PubMed database between January 1980 and October 2020. We defined selection criteria to identify papers that focused on machine learning–enabled clinical decision support studies in intensive care units and reviewed the following aspects: research topics, study cohorts, machine learning models, analysis variables, and evaluation metrics. Results: A total of 643 papers were collected, and using our selection criteria, 97 studies were found. Studies were categorized into 4 topics—monitoring, detection, and diagnosis (13/97, 13.4%), early identification of clinical events (32/97, 33.0%), outcome prediction and prognosis assessment (46/97, 47.6%), and treatment decision (6/97, 6.2%). Of the 97 papers, 82 (84.5%) studies used data from adult patients, 9 (9.3%) studies used data from pediatric patients, and 6 (6.2%) studies used data from neonates. We found that 65 (67.0%) studies used data from a single center, and 32 (33.0%) studies used a multicenter data set; 88 (90.7%) studies used supervised learning, 3 (3.1%) studies used unsupervised learning, and 6 (6.2%) studies used reinforcement learning. Clinical variable categories, starting with the most frequently used, were demographic (n=74), laboratory values (n=59), vital signs (n=55), scores (n=48), ventilation parameters (n=43), comorbidities (n=27), medications (n=18), outcome (n=14), fluid balance (n=13), nonmedicine therapy (n=10), symptoms (n=7), and medical history (n=4). The most frequently adopted evaluation metrics for clinical data modeling studies included area under the receiver operating characteristic curve (n=61), sensitivity (n=51), specificity (n=41), accuracy (n=29), and positive predictive value (n=23). Conclusions: Early identification of clinical events and outcome prediction and prognosis assessment together accounted for approximately 80% of the studies included in this review. Developing reinforcement learning, active learning, and time-series analysis methods to solve intensive care unit clinical problems will offer greater prospects for clinical decision support in the future.
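The review above lists AUROC, sensitivity, specificity, accuracy, and positive predictive value as the most frequently adopted evaluation metrics. As a brief illustration (with toy labels and scores, not data from any reviewed study), these metrics can be derived from a confusion matrix as follows.

```python
# Illustrative sketch: computing the evaluation metrics most often reported in the
# reviewed ICU studies from a set of predictions. Labels and scores are toy values.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUROC      :", roc_auc_score(y_true, y_score))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("accuracy   :", (tp + tn) / (tp + tn + fp + fn))
print("PPV        :", tp / (tp + fp))
```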
%M 35238790 %R 10.2196/28781 %U https://medinform.jmir.org/2022/3/e28781 %U https://doi.org/10.2196/28781 %U http://www.ncbi.nlm.nih.gov/pubmed/35238790 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 1 %P e34475 %T The Use of a Computerized Cognitive Assessment to Improve the Efficiency of Primary Care Referrals to Memory Services: Protocol for the Accelerating Dementia Pathway Technologies (ADePT) Study %A Kalafatis,Chris %A Modarres,Mohammad Hadi %A Apostolou,Panos %A Tabet,Naji %A Khaligh-Razavi,Seyed-Mahdi %+ Cognetivity Neurosciences Ltd, 3 Waterhouse Square, London, EC1N 2SW, United Kingdom, 44 020 3002 362, seyed@cognetivity.com %K primary health care %K general practice %K dementia %K cognitive assessment %K artificial intelligence %K early diagnosis %K cognition %K assessment %K efficiency %K diagnosis %K COVID-19 %K memory %K mental health %K impairment %K screening %K detection %K efficiency %D 2022 %7 27.1.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Existing primary care cognitive assessment tools are crude or time-consuming screening instruments which can only detect cognitive impairment when it is well established. Due to the COVID-19 pandemic, memory services have adapted to the new environment by moving to remote patient assessments to continue meeting service user demand. However, the remote use of cognitive assessments has been variable while there has been scant evaluation of the outcome of such a change in clinical practice. Emerging research in remote memory clinics has highlighted computerized cognitive tests, such as the Integrated Cognitive Assessment (ICA), as prominent candidates for adoption in clinical practice both during the pandemic and for post-COVID-19 implementation as part of health care innovation. Objective: The aim of the Accelerating Dementia Pathway Technologies (ADePT) study is to develop a real-world evidence basis to support the adoption of ICA as an inexpensive screening tool for the detection of cognitive impairment to improve the efficiency of the dementia care pathway. Methods: Patients who have been referred to a memory clinic by a general practitioner (GP) are recruited. Participants complete the ICA either at home or in the clinic along with medical history and usability questionnaires. The GP referral and ICA outcome are compared with the specialist diagnosis obtained at the memory clinic. The clinical outcomes as well as National Health Service reference costing data will be used to assess the potential health and economic benefits of the use of the ICA in the dementia diagnosis pathway. Results: The ADePT study was funded in January 2020 by Innovate UK (Project Number 105837). As of September 2021, 86 participants have been recruited in the study, with 23 participants also completing a retest visit. Initially, the study was designed for in-person visits at the memory clinic; however, in light of the COVID-19 pandemic, the study was amended to allow remote as well as face-to-face visits. The study was also expanded from a single site to 4 sites in the United Kingdom. We expect results to be published by the second quarter of 2022. Conclusions: The ADePT study aims to improve the efficiency of the dementia care pathway at its very beginning and supports systems integration at the intersection between primary and secondary care. 
The introduction of a standardized, self-administered, digital assessment tool for the timely detection of neurodegeneration as part of a decision support system that can signpost accordingly can reduce unnecessary referrals, service backlog, and assessment variability. Trial Registration: ISRCTN 16596456; https://www.isrctn.com/ISRCTN16596456 International Registered Report Identifier (IRRID): DERR1-10.2196/34475 %M 34932495 %R 10.2196/34475 %U https://www.researchprotocols.org/2022/1/e34475 %U https://doi.org/10.2196/34475 %U http://www.ncbi.nlm.nih.gov/pubmed/34932495 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e25328 %T Can Real-time Computer-Aided Detection Systems Diminish the Risk of Postcolonoscopy Colorectal Cancer? %A Madalinski,Mariusz %A Prudham,Roger %+ Northern Care Alliance, Royal Oldham Hospital, Rochdale Rd, Oldham, OL1 2JH, United Kingdom, 44 01616240420, mariusz.madalinski@googlemail.com %K artificial intelligence %K colonoscopy %K adenoma %K real-time computer-aided detection %K colonic polyp %D 2021 %7 24.12.2021 %9 Viewpoint %J JMIR Med Inform %G English %X The adenoma detection rate is the constant subject of research and the main marker of quality in bowel cancer screening. However, by improving the quality of endoscopy via artificial intelligence methods, all polyps, including those with the potential for malignancy, can be removed, thereby reducing interval colorectal cancer rates. As such, the removal of all polyps may become the best marker of endoscopy quality. Thus, we present a viewpoint on integrating the computer-aided detection (CADe) of polyps with high-accuracy, real-time colonoscopy to challenge quality improvements in the performance of colonoscopy. Colonoscopy for bowel cancer screening involving the integration of a deep learning methodology (ie, integrating artificial intelligence with CADe systems) has been assessed in an effort to increase the adenoma detection rate. In this viewpoint, a few studies are described, and their results show that CADe systems are able to increase screening sensitivity. The detection of adenomatous polyps, which are associated with a potential risk of progression to colorectal cancer, and their removal are expected to reduce cancer incidence and mortality rates. However, so far, artificial intelligence methods do not increase the detection of cancer or large adenomatous polyps but contribute to the detection of small precancerous polyps. %M 34571490 %R 10.2196/25328 %U https://medinform.jmir.org/2021/12/e25328 %U https://doi.org/10.2196/25328 %U http://www.ncbi.nlm.nih.gov/pubmed/34571490 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e30798 %T Artificial Intelligence in Predicting Cardiac Arrest: Scoping Review %A Alamgir,Asma %A Mousa,Osama %A Shah,Zubair %+ College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Education City, PO BOX 34110, Street 2731, Al Luqta St, Ar-Rayyan, Doha, Qatar, 974 5074 4851, zshah@hbku.edu.qa %K artificial intelligence %K machine learning %K deep learning %K cardiac arrest %K predict %D 2021 %7 17.12.2021 %9 Review %J JMIR Med Inform %G English %X Background: Cardiac arrest is a life-threatening cessation of activity in the heart. Early prediction of cardiac arrest is important, as it allows for the necessary measures to be taken to prevent or intervene during the onset. 
Artificial intelligence (AI) technologies and big data have been increasingly used to enhance the ability to predict and prepare for the patients at risk. Objective: This study aims to explore the use of AI technology in predicting cardiac arrest as reported in the literature. Methods: A scoping review was conducted in line with the guidelines of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extension for scoping reviews. Scopus, ScienceDirect, Embase, the Institute of Electrical and Electronics Engineers, and Google Scholar were searched to identify relevant studies. Backward reference list checks of the included studies were also conducted. Study selection and data extraction were independently conducted by 2 reviewers. Data extracted from the included studies were synthesized narratively. Results: Out of 697 citations retrieved, 41 studies were included in the review, and 6 were added after backward citation checking. The included studies reported the use of AI in the prediction of cardiac arrest. Of the 47 studies, we were able to classify the approaches taken by the studies into 3 different categories: 26 (55%) studies predicted cardiac arrest by analyzing specific parameters or variables of the patients, whereas 16 (34%) studies developed an AI-based warning system. The remaining 11% (5/47) of studies focused on distinguishing patients at high risk of cardiac arrest from patients who were not at risk. Two studies focused on the pediatric population, and the rest focused on adults (45/47, 96%). Most of the studies used data sets with a size of <10,000 samples (32/47, 68%). Machine learning models were the most prominent branch of AI used in the prediction of cardiac arrest in the studies (38/47, 81%), and the most used algorithm was the neural network (23/47, 49%). K-fold cross-validation was the most used algorithm evaluation tool reported in the studies (24/47, 51%). Conclusions: AI is extensively used to predict cardiac arrest in different patient settings. Technology is expected to play an integral role in improving cardiac medicine. There is a need for more reviews to learn the obstacles to the implementation of AI technologies in clinical settings. Moreover, research focusing on how to best provide clinicians with support to understand, adapt, and implement this technology in their practice is also necessary. %M 34927595 %R 10.2196/30798 %U https://medinform.jmir.org/2021/12/e30798 %U https://doi.org/10.2196/30798 %U http://www.ncbi.nlm.nih.gov/pubmed/34927595 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e33540 %T How Clinicians Perceive Artificial Intelligence–Assisted Technologies in Diagnostic Decision Making: Mixed Methods Approach %A Hah,Hyeyoung %A Goldin,Deana Shevit %+ Information Systems and Business Analytics, College of Business, Florida International University, 11200 SW 8th Street, Miami, FL, 33199, United States, 1 3053484342, hhah@fiu.edu %K artificial intelligence algorithms %K AI %K diagnostic capability %K virtual care %K multilevel modeling %K human-AI teaming %K natural language understanding %D 2021 %7 16.12.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: With the rapid development of artificial intelligence (AI) and related technologies, AI algorithms are being embedded into various health information technologies that assist clinicians in clinical decision making. 
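The scoping review above identifies neural networks as the most used algorithm family and k-fold cross-validation as the most used evaluation approach. A minimal sketch of that combination is shown below; the synthetic predictors, outcome, and network size are illustrative assumptions, not a model from any included study.

```python
# Illustrative sketch: stratified k-fold cross-validation of a small neural network
# risk model, echoing the model family and evaluation scheme most common in the
# review. All data below are synthetic placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 12))     # e.g., vital signs and laboratory values
y = rng.integers(0, 2, size=1000)   # 1 = cardiac arrest within the prediction horizon

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"5-fold AUROC: {auc.mean():.3f}")
```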
Objective: This study aimed to explore how clinicians perceive AI assistance in diagnostic decision making and suggest the paths forward for AI-human teaming for clinical decision making in health care. Methods: This study used a mixed methods approach, utilizing hierarchical linear modeling and sentiment analysis through natural language understanding techniques. Results: A total of 114 clinicians participated in online simulation surveys in 2020 and 2021. These clinicians studied family medicine and used AI algorithms to aid in patient diagnosis. Their overall sentiment toward AI-assisted diagnosis was positive and comparable with diagnoses made without the assistance of AI. However, AI-guided decision making was not congruent with the way clinicians typically made decisions in diagnosing illnesses. In a quantitative survey, clinicians reported perceiving current AI assistance as not likely to enhance diagnostic capability and negatively influenced their overall performance (β=–0.421, P=.02). Instead, clinicians’ diagnostic capabilities tended to be associated with well-known parameters, such as education, age, and daily habit of technology use on social media platforms. Conclusions: This study elucidated clinicians’ current perceptions and sentiments toward AI-enabled diagnosis. Although the sentiment was positive, the current form of AI assistance may not be linked with efficient decision making, as AI algorithms are not well aligned with subjective human reasoning in clinical diagnosis. Developers and policy makers in health could gather behavioral data from clinicians in various disciplines to help align AI algorithms with the unique subjective patterns of reasoning that humans employ in clinical diagnosis. %M 34924356 %R 10.2196/33540 %U https://www.jmir.org/2021/12/e33540 %U https://doi.org/10.2196/33540 %U http://www.ncbi.nlm.nih.gov/pubmed/34924356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e33267 %T Computer-Aided Diagnosis of Gastrointestinal Ulcer and Hemorrhage Using Wireless Capsule Endoscopy: Systematic Review and Diagnostic Test Accuracy Meta-analysis %A Bang,Chang Seok %A Lee,Jae Jun %A Baik,Gwang Ho %+ Department of Internal Medicine, Hallym University College of Medicine, 77 Sakju-ro, Chuncheon, 24253, Republic of Korea, 82 33 240 5821, csbang@hallym.ac.kr %K artificial intelligence %K computer-aided diagnosis %K capsule endoscopy %K ulcer %K hemorrhage %K gastrointestinal %K endoscopy %K review %K accuracy %K meta-analysis %K diagnostic %K performance %K machine learning %K prediction models %D 2021 %7 14.12.2021 %9 Review %J J Med Internet Res %G English %X Background: Interpretation of capsule endoscopy images or movies is operator-dependent and time-consuming. As a result, computer-aided diagnosis (CAD) has been applied to enhance the efficacy and accuracy of the review process. Two previous meta-analyses reported the diagnostic performance of CAD models for gastrointestinal ulcers or hemorrhage in capsule endoscopy. However, insufficient systematic reviews have been conducted, which cannot determine the real diagnostic validity of CAD models. Objective: To evaluate the diagnostic test accuracy of CAD models for gastrointestinal ulcers or hemorrhage using wireless capsule endoscopic images. Methods: We conducted core databases searching for studies based on CAD models for the diagnosis of ulcers or hemorrhage using capsule endoscopy and presenting data on diagnostic performance. 
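The study above applies hierarchical (multilevel) linear modeling to repeated clinician responses. A minimal sketch of a random-intercept model of this general kind, using statsmodels, is given below; the variable names, simulated data, and model formula are hypothetical and are not the authors' specification.

```python
# Illustrative sketch: a mixed-effects (hierarchical) linear model with repeated
# observations nested within clinicians. Data frame and variables are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 114 * 4   # several simulated responses per clinician
df = pd.DataFrame({
    "clinician": np.repeat(np.arange(114), 4),
    "ai_assisted": rng.integers(0, 2, size=n),
    "age": rng.integers(25, 65, size=n),
    "diagnostic_score": rng.normal(70, 10, size=n),
})

# Random intercept per clinician; fixed effects for AI assistance and age
model = smf.mixedlm("diagnostic_score ~ ai_assisted + age", df, groups=df["clinician"])
result = model.fit()
print(result.summary())
```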
Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Overall, 39 studies were included. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of ulcers (or erosions) were .97 (95% confidence interval, .95–.98), .93 (.89–.95), .92 (.89–.94), and 138 (79–243), respectively. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of hemorrhage (or angioectasia) were .99 (.98–.99), .96 (.94–0.97), .97 (.95–.99), and 888 (343–2303), respectively. Subgroup analyses showed robust results. Meta-regression showed that published year, number of training images, and target disease (ulcers vs erosions, hemorrhage vs angioectasia) was found to be the source of heterogeneity. No publication bias was detected. Conclusions: CAD models showed high performance for the optical diagnosis of gastrointestinal ulcer and hemorrhage in wireless capsule endoscopy. %M 34904949 %R 10.2196/33267 %U https://www.jmir.org/2021/12/e33267 %U https://doi.org/10.2196/33267 %U http://www.ncbi.nlm.nih.gov/pubmed/34904949 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e27363 %T Machine Learning Algorithms to Detect Subclinical Keratoconus: Systematic Review %A Maile,Howard %A Li,Ji-Peng Olivia %A Gore,Daniel %A Leucci,Marcello %A Mulholland,Padraig %A Hau,Scott %A Szabo,Anita %A Moghul,Ismail %A Balaskas,Konstantinos %A Fujinami,Kaoru %A Hysi,Pirro %A Davidson,Alice %A Liskova,Petra %A Hardcastle,Alison %A Tuft,Stephen %A Pontikos,Nikolas %+ UCL Institute of Ophthalmology, University College London, 11-43 Bath Street, London, EC1V 9EL, United Kingdom, 44 (0)207608 ext 6800, n.pontikos@ucl.ac.uk %K artificial intelligence %K machine learning %K cornea %K keratoconus %K corneal tomography %K subclinical %K corneal imaging %K decision support systems %K corneal disease %K keratometry %D 2021 %7 13.12.2021 %9 Review %J JMIR Med Inform %G English %X Background: Keratoconus is a disorder characterized by progressive thinning and distortion of the cornea. If detected at an early stage, corneal collagen cross-linking can prevent disease progression and further visual loss. Although advanced forms are easily detected, reliable identification of subclinical disease can be problematic. Several different machine learning algorithms have been used to improve the detection of subclinical keratoconus based on the analysis of multiple types of clinical measures, such as corneal imaging, aberrometry, or biomechanical measurements. Objective: The aim of this study is to survey and critically evaluate the literature on the algorithmic detection of subclinical keratoconus and equivalent definitions. Methods: For this systematic review, we performed a structured search of the following databases: MEDLINE, Embase, and Web of Science and Cochrane Library from January 1, 2010, to October 31, 2020. We included all full-text studies that have used algorithms for the detection of subclinical keratoconus and excluded studies that did not perform validation. This systematic review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) recommendations. Results: We compared the measured parameters and the design of the machine learning algorithms reported in 26 papers that met the inclusion criteria. 
All salient information required for detailed comparison, including diagnostic criteria, demographic data, sample size, acquisition system, validation details, parameter inputs, machine learning algorithm, and key results are reported in this study. Conclusions: Machine learning has the potential to improve the detection of subclinical keratoconus or early keratoconus in routine ophthalmic practice. Currently, there is no consensus regarding the corneal parameters that should be included for assessment and the optimal design for the machine learning algorithm. We have identified avenues for further research to improve early detection and stratification of patients for early treatment to prevent disease progression. %M 34898463 %R 10.2196/27363 %U https://medinform.jmir.org/2021/12/e27363 %U https://doi.org/10.2196/27363 %U http://www.ncbi.nlm.nih.gov/pubmed/34898463 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e33049 %T Differential Biases and Variabilities of Deep Learning–Based Artificial Intelligence and Human Experts in Clinical Diagnosis: Retrospective Cohort and Survey Study %A Cha,Dongchul %A Pae,Chongwon %A Lee,Se A %A Na,Gina %A Hur,Young Kyun %A Lee,Ho Young %A Cho,A Ra %A Cho,Young Joon %A Han,Sang Gil %A Kim,Sung Huhn %A Choi,Jae Young %A Park,Hae-Jeong %+ Center for Systems and Translational Brain Sciences, Institute of Human Complexity and Systems Science, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Republic of Korea, 82 2 2228 2363, parkhj@yuhs.ac %K human-machine cooperation %K convolutional neural network %K deep learning, class imbalance problem %K otoscopy %K eardrum %K artificial intelligence %K otology %K computer-aided diagnosis %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Deep learning (DL)–based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis. As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias to DL than clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative usage of DL in clinical application. Objective: This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians exemplified by the class imbalance problem and guide clinicians when utilizing decision support systems. Methods: We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models. 
Results: Our ML models had consistently high accuracies (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%), equivalent to those of otolaryngologists (balanced: mean 71.17%, SD 3.37%; imbalanced: mean 72.84%, SD 6.41%) and far better than those of nonotolaryngologists (balanced: mean 45.63%, SD 7.89%; imbalanced: mean 44.08%, SD 15.83%). However, ML models suffered from class imbalance problems (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%). This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07). Conclusions: Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation. %M 34889764 %R 10.2196/33049 %U https://medinform.jmir.org/2021/12/e33049 %U https://doi.org/10.2196/33049 %U http://www.ncbi.nlm.nih.gov/pubmed/34889764 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e29212 %T Prediction Algorithms for Blood Pressure Based on Pulse Wave Velocity Using Health Checkup Data in Healthy Korean Men: Algorithm Development and Validation %A Park,Dohyun %A Cho,Soo Jin %A Kim,Kyunga %A Woo,Hyunki %A Kim,Jee Eun %A Lee,Jin-Young %A Koh,Janghyun %A Lee,JeanHyoung %A Choi,Jong Soo %A Chang,Dong Kyung %A Choi,Yoon-Ho %A Chung,Ji In %A Cha,Won Chul %A Jeong,Ok Soon %A Jekal,Se Yong %A Kang,Mira %+ Department of Digital Health, Samsung Advanced Institute of Health Sciences and Technology, Sungkyunkwan University, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 1099336838, mira90.kang@samsung.com %K blood pressure %K pulse transit time %K pulse wave velocity %K prediction model %K algorithms %K medical informatics %K wearable devices %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Pulse transit time and pulse wave velocity (PWV) are related to blood pressure (BP), and there were continuous attempts to use these to predict BP through wearable devices. However, previous studies were conducted on a small scale and could not confirm the relative importance of each variable in predicting BP. Objective: This study aims to predict systolic blood pressure and diastolic blood pressure based on PWV and to evaluate the relative importance of each clinical variable used in BP prediction models. Methods: This study was conducted on 1362 healthy men older than 18 years who visited the Samsung Medical Center. The systolic blood pressure and diastolic blood pressure were estimated using the multiple linear regression method. Models were divided into two groups based on age: younger than 60 years and 60 years or older; 200 seeds were repeated in consideration of partition bias. 
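The comparison above relies on per-class accuracy and kappa statistics to contrast ML models with physicians. A small illustrative computation of both quantities is sketched below; the simulated raters, six-category labels, and agreement rates are arbitrary assumptions.

```python
# Illustrative sketch: Cohen's kappa between two raters and per-class accuracy
# against a reference label, over 6 hypothetical ear-disease categories.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

classes = np.arange(6)
rng = np.random.default_rng(4)
y_true = rng.integers(0, 6, size=300)
y_rater_a = np.where(rng.random(300) < 0.8, y_true, rng.integers(0, 6, size=300))
y_rater_b = np.where(rng.random(300) < 0.6, y_true, rng.integers(0, 6, size=300))

print("kappa (A vs B):", cohen_kappa_score(y_rater_a, y_rater_b))

cm = confusion_matrix(y_true, y_rater_a, labels=classes)
per_class_acc = cm.diagonal() / cm.sum(axis=1)   # recall per class
print("per-class accuracy of rater A:", np.round(per_class_acc, 2))
```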
Mean of error, absolute error, and root mean square error were used as performance metrics. Results: The model divided into two age groups (younger than 60 years and 60 years and older) performed better than the model without division. The performance difference between the model using only three variables (PWV, BMI, age) and the model using 17 variables was not significant. Our final model using PWV, BMI, and age met the criteria presented by the American Association for the Advancement of Medical Instrumentation. The prediction errors were within the range of about 9 to 12 mmHg that can occur with a gold standard mercury sphygmomanometer. Conclusions: Dividing age based on the age of 60 years showed better BP prediction performance, and it could show good performance even if only PWV, BMI, and age variables were included. Our final model with the minimal number of variables (PWV, BMI, age) would be efficient and feasible for predicting BP. %M 34889753 %R 10.2196/29212 %U https://medinform.jmir.org/2021/12/e29212 %U https://doi.org/10.2196/29212 %U http://www.ncbi.nlm.nih.gov/pubmed/34889753 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e32507 %T Assessing the Performance of a New Artificial Intelligence–Driven Diagnostic Support Tool Using Medical Board Exam Simulations: Clinical Vignette Study %A Ben-Shabat,Niv %A Sloma,Ariel %A Weizman,Tomer %A Kiderman,David %A Amital,Howard %+ Department of Medicine ‘B’, Sheba Medical Center, Sheba Road 2, Ramat Gan, 52621, Israel, 972 3 530 2652, nivben7@gmail.com %K diagnostic decision support systems %K diagnostic support %K medical decision-making %K medical informatics %K artificial intelligence %K Kahun %K decision support %D 2021 %7 30.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Diagnostic decision support systems (DDSS) are computer programs aimed to improve health care by supporting clinicians in the process of diagnostic decision-making. Previous studies on DDSS demonstrated their ability to enhance clinicians’ diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs. Despite the potential benefits, their utilization in clinical practice is limited, emphasizing the need for new and improved products. Objective: The aim of this study was to conduct a preliminary analysis of the diagnostic performance of “Kahun,” a new artificial intelligence-driven diagnostic tool. Methods: Diagnostic performance was evaluated based on the program’s ability to “solve” clinical cases from the United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations that were drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. For each case, the presence and the rank of the correct diagnoses within the generated differential diagnoses list were recorded. Each diagnostic performance was measured in two ways: first, as diagnostic sensitivity, and second, as case-specific success rates that represent diagnostic comprehensiveness. Results: The study included 91 clinical cases with 78 different chief complaints and a mean number of 38 (SD 8) findings for each case. The total number of expected diagnoses was 272, of which 174 were different (some appeared more than once). 
Of the 272 expected diagnoses, 231 (87.5%; 95% CI 76-99) diagnoses were suggested within the top 20 listed diagnoses, 209 (76.8%; 95% CI 66-87) were suggested within the top 10, and 168 (61.8%; 95% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, 62 (68%; 95% CI 59-78) had all 3 expected diagnoses suggested within the top 20 listed diagnoses, 44 (48%; 95% CI 38-59) within the top 10, and 24 (26%; 95% CI 17-35) within the top 5. In 87 (96%; 95% CI 91-100) of the 91 cases, at least 2 out of 3 of the expected diagnoses were suggested within the top 20 listed diagnoses; in 78 (86%; 95% CI 79-93), within the top 10; and in 61 (67%; 95% CI 57-77), within the top 5. Conclusions: The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness; it also had the ability to manage a wide range of clinical findings. %M 34672262 %R 10.2196/32507 %U https://medinform.jmir.org/2021/11/e32507 %U https://doi.org/10.2196/32507 %U http://www.ncbi.nlm.nih.gov/pubmed/34672262 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e29554 %T A Markerless 2D Video, Facial Feature Recognition–Based, Artificial Intelligence Model to Assist With Screening for Parkinson Disease: Development and Usability Study %A Hou,Xinyao %A Zhang,Yu %A Wang,Yanping %A Wang,Xinyi %A Zhao,Jiahao %A Zhu,Xiaobo %A Su,Jianbo %+ Department of Automation, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai, 200240, China, 86 21 34204276, jbsu@sjtu.edu.cn %K Parkinson disease %K facial features %K artificial intelligence %K diagnosis %D 2021 %7 19.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Masked face is a characteristic clinical manifestation of Parkinson disease (PD), but subjective evaluations from different clinicians often show low consistency owing to a lack of accurate detection technology. Hence, it is of great significance to develop methods to make monitoring easier and more accessible. Objective: The study aimed to develop a markerless 2D video, facial feature recognition–based, artificial intelligence (AI) model to assess facial features of PD patients and investigate how AI could help neurologists improve the performance of early PD diagnosis. Methods: We collected 140 videos of facial expressions from 70 PD patients and 70 matched controls from 3 hospitals using a single 2D video camera. We developed and tested an AI model that performs masked face recognition of PD patients based on the acquisition and evaluation of facial features including geometric and texture features. Random forest, support vector machines, and k-nearest neighbor were used to train the model. The diagnostic performance of the AI model was compared with that of 5 neurologists. Results: The experimental results showed that our AI models can achieve feasible and effective facial feature recognition ability to assist with PD diagnosis. The accuracy of PD diagnosis can reach 83% using geometric features. With the model trained by random forest, the accuracy using texture features is up to 86%. When these 2 features are combined, an F1 value of 88% can be reached when the random forest algorithm is used. Further, the facial features of patients with PD were not associated with the motor and nonmotor symptoms of PD. Conclusions: PD patients commonly exhibit masked facial features. 
Videos of a facial feature recognition–based AI model can provide a valuable tool to assist with PD diagnosis and the potential of realizing remote monitoring of the patient’s condition, especially during the COVID-19 pandemic. %M 34806994 %R 10.2196/29554 %U https://www.jmir.org/2021/11/e29554 %U https://doi.org/10.2196/29554 %U http://www.ncbi.nlm.nih.gov/pubmed/34806994 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e26480 %T Willingness of Chinese Men Who Have Sex With Men to Use Smartphone-Based Electronic Readers for HIV Self-testing: Web-Based Cross-sectional Study %A Marley,Gifty %A Fu,Gengfeng %A Zhang,Ye %A Li,Jianjun %A Tucker,Joseph D %A Tang,Weiming %A Yu,Rongbin %+ School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing, 211166, China, 86 13851545125, rongbinyu@njmu.edu.cn %K smartphone-based electronic reader %K electronic readers %K HIV self-testing %K HIVST %K self-testing %K cellular phone–based readers %K mHealth %D 2021 %7 19.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The need for strategies to encourage user-initiated reporting of results after HIV self-testing (HIVST) persists. Smartphone-based electronic readers (SERs) have been shown capable of reading diagnostics results accurately in point-of-care diagnostics and could bridge the current gaps between HIVST and linkage to care. Objective: Our study aimed to assess the willingness of Chinese men who have sex with men (MSM) in the Jiangsu province to use an SER for HIVST through a web-based cross-sectional study. Methods: From February to April 2020, we conducted a convenience web-based survey among Chinese MSM by using a pretested structured questionnaire. Survey items were adapted from previous HIVST feasibility studies and modified as required. Prior to answering reader-related questions, participants watched a video showcasing a prototype SER. Statistical analysis included descriptive analysis, chi-squared test, and multivariable logistic regression. P values less than .05 were deemed statistically significant. Results: Of 692 participants, 369 (53.3%) were aged 26-40 years, 456 (65.9%) had ever self-tested for HIV, and 493 (71.2%) were willing to use an SER for HIVST. Approximately 98% (483/493) of the willing participants, 85.3% (459/538) of ever self-tested and never self-tested, and 40% (46/115) of unwilling participants reported that SERs would increase their HIVST frequency. Engaging in unprotected anal intercourse with regular partners compared to consistently using condoms (adjusted odds ratio [AOR] 3.04, 95% CI 1.19-7.74) increased the odds of willingness to use an SER for HIVST. Participants who had ever considered HIVST at home with a partner right before sex compared to those who had not (AOR 2.99, 95% CI 1.13-7.90) were also more willing to use an SER for HIVST. Playing receptive roles during anal intercourse compared to playing insertive roles (AOR 0.05, 95% CI 0.02-0.14) was associated with decreased odds of being willing to use an SER for HIVST. The majority of the participants (447/608, 73.5%) preferred to purchase readers from local Centers of Disease Control and Prevention offices and 51.2% (311/608) of the participants were willing to pay less than US $4.70 for a reader device. Conclusions: The majority of the Chinese MSM, especially those with high sexual risk behaviors, were willing to use an SER for HIVST. Many MSM were also willing to self-test more frequently for HIV with an SER. 
Further research is needed to ascertain the diagnostic and real-time data-capturing capacity of prototype SERs during HIVST. %M 34806988 %R 10.2196/26480 %U https://www.jmir.org/2021/11/e26480 %U https://doi.org/10.2196/26480 %U http://www.ncbi.nlm.nih.gov/pubmed/34806988 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e29749 %T The Role of Machine Learning in Diagnosing Bipolar Disorder: Scoping Review %A Jan,Zainab %A Al-Ansari,Noor %A Mousa,Osama %A Abd-alrazaq,Alaa %A Ahmed,Arfan %A Alam,Tanvir %A Househ,Mowafa %+ Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Education City, Al Luqta St, Ar-Rayyan, Doha, 5825, Qatar, 974 55708549, mhouseh@hbku.edu.qa %K machine learning %K bipolar disorder %K diagnosis %K support vector machine %K clinical data %K mental health %K scoping review %D 2021 %7 19.11.2021 %9 Review %J J Med Internet Res %G English %X Background: Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and has triggered morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of normal people. BD is a predominant mental disorder, but it can be misdiagnosed as depressive disorder, which leads to difficulties in treating affected patients. Approximately 60% of patients with BD are treated for depression. However, machine learning provides advanced skills and techniques for better diagnosis of BD. Objective: This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes. Methods: The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed. Results: We retrieved 573 potential articles from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58%). We identified different machine learning models used in the selected studies, including classification models (18, 55%), regression models (5, 16%), model-based clustering methods (2, 6%), natural language processing (1, 3%), clustering algorithms (1, 3%), and deep learning–based models (3, 9%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34%), whereas microarray expression data sets and genomic data were the least commonly used. The maximum reported accuracy was 98%, whereas the minimum was 64%. Conclusions: This scoping review provides an overview of recent studies based on machine learning models used to diagnose patients with BD regardless of their demographics or if they were compared to patients with psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry. 
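Classification models, with support vector machines among them, dominate the studies in the scoping review above. The sketch below shows a generic SVM classification workflow of that kind; the synthetic features, binary outcome, and hyperparameters are placeholders rather than any reviewed study's setup.

```python
# Illustrative sketch: a support vector machine classifier on stand-in tabular
# features (for example, clinical or imaging-derived variables). Data are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 30))
y = rng.integers(0, 2, size=400)   # 1 = bipolar disorder, 0 = comparison group

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```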
%M 34806996 %R 10.2196/29749 %U https://www.jmir.org/2021/11/e29749 %U https://doi.org/10.2196/29749 %U http://www.ncbi.nlm.nih.gov/pubmed/34806996 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e30066 %T Deep Learning Techniques for Fatty Liver Using Multi-View Ultrasound Images Scanned by Different Scanners: Development and Validation Study %A Kim,Taewoo %A Lee,Dong Hyun %A Park,Eun-Kee %A Choi,Sanghun %+ School of Mechanical Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, Republic of Korea, 82 53 950 5578, s-choi@knu.ac.kr %K fatty liver %K deep learning %K transfer learning %K classification %K regression %K magnetic resonance imaging–proton density fat fraction %K multi-view ultrasound images %K artificial intelligence %K machine imaging %K imaging %K informatics %K fatty liver disease %K detection %K diagnosis %D 2021 %7 18.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Fat fraction values obtained from magnetic resonance imaging (MRI) can be used to obtain an accurate diagnosis of fatty liver diseases. However, MRI is expensive and cannot be performed for everyone. Objective: In this study, we aim to develop multi-view ultrasound image–based convolutional deep learning models to detect fatty liver disease and yield fat fraction values. Methods: We extracted 90 ultrasound images of the right intercostal view and 90 ultrasound images of the right intercostal view containing the right renal cortex from 39 cases of fatty liver (MRI–proton density fat fraction [MRI–PDFF] ≥ 5%) and 51 normal subjects (MRI–PDFF < 5%), with MRI–PDFF values obtained from Good Gang-An Hospital. We obtained combined liver and kidney-liver (CLKL) images to train the deep learning models and developed classification and regression models based on the VGG19 model to classify fatty liver disease and yield fat fraction values. We employed the data augmentation techniques such as flip and rotation to prevent the deep learning model from overfitting. We determined the deep learning model with performance metrics such as accuracy, sensitivity, specificity, and coefficient of determination (R2). Results: In demographic information, all metrics such as age and sex were similar between the two groups—fatty liver disease and normal subjects. In classification, the model trained on CLKL images achieved 80.1% accuracy, 86.2% precision, and 80.5% specificity to detect fatty liver disease. In regression, the predicted fat fraction values of the regression model trained on CLKL images correlated with MRI–PDFF values (R2=0.633), indicating that the predicted fat fraction values were moderately estimated. Conclusions: With deep learning techniques and multi-view ultrasound images, it is potentially possible to replace MRI–PDFF values with deep learning predictions for detecting fatty liver disease and estimating fat fraction values. 
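The fatty liver study above builds its classifier on a VGG19 backbone with transfer learning and augmentation. A minimal Keras sketch of that architecture pattern is given below, assuming TensorFlow is installed and pretrained ImageNet weights can be downloaded; the input size, head layers, and training configuration are illustrative choices, not the authors' exact model.

```python
# Illustrative sketch: VGG19-based transfer learning for binary fatty liver
# classification from ultrasound images. Input shape and head layers are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False   # freeze the pretrained convolutional features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # fatty liver vs normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Augmentation (flips, rotations) could be added with Keras preprocessing layers,
# and a separate regression head (linear output, MSE loss) could estimate fat fraction.
model.summary()
```

Freezing the backbone and training only the small head is the usual transfer-learning compromise when, as here, only a few hundred images are available.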
%M 34792476 %R 10.2196/30066 %U https://medinform.jmir.org/2021/11/e30066 %U https://doi.org/10.2196/30066 %U http://www.ncbi.nlm.nih.gov/pubmed/34792476 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e25192 %T Developing and Demonstrating the Viability and Availability of the Multilevel Implementation Strategy for Syncope Optimal Care Through Engagement (MISSION) Syncope App: Evidence-Based Clinical Decision Support Tool %A Amin,Shiraz %A Gupta,Vedant %A Du,Gaixin %A McMullen,Colleen %A Sirrine,Matthew %A Williams,Mark V %A Smyth,Susan S %A Chadha,Romil %A Stearley,Seth %A Li,Jing %+ Department of Medicine, Washington University School of Medicine, 600 S Taylor Ave, 00155K, Campus Box 8005, St. Louis, MO, 63110, United States, 1 314 273 9386, l.jing@wustl.edu %K cardiology %K medical diagnosis %K medicine %K mobile applications %K prognostics and health %K syncope %D 2021 %7 16.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Syncope evaluation and management is associated with testing overuse and unnecessary hospitalizations. The 2017 American College of Cardiology/American Heart Association (ACC/AHA) Syncope Guideline aims to standardize clinical practice and reduce unnecessary services. The use of clinical decision support (CDS) tools offers the potential to successfully implement evidence-based clinical guidelines. However, CDS tools that provide an evidence-based differential diagnosis (DDx) of syncope at the point of care are currently lacking. Objective: With input from diverse health systems, we developed and demonstrated the viability of a mobile app, the Multilevel Implementation Strategy for Syncope optImal care thrOugh eNgagement (MISSION) Syncope, as a CDS tool for syncope diagnosis and prognosis. Methods: Development of the app had three main goals: (1) reliable generation of an accurate DDx, (2) incorporation of an evidence-based clinical risk tool for prognosis, and (3) user-based design and technical development. To generate a DDx that incorporated assessment recommendations, we reviewed guidelines and the literature to determine clinical assessment questions (variables) and likelihood ratios (LHRs) for each variable in predicting etiology. The creation and validation of the app diagnosis occurred through an iterative clinician review and application to actual clinical cases. The review of available risk score calculators focused on identifying an easily applied and valid evidence-based clinical risk stratification tool. The review and decision-making factors included characteristics of the original study, clinical variables, and validation studies. App design and development relied on user-centered design principles. We used observations of the emergency department workflow, storyboard demonstration, multiple mock review sessions, and beta-testing to optimize functionality and usability. Results: The MISSION Syncope app is consistent with guideline recommendations on evidence-based practice (EBP), and its user interface (UI) reflects steps in a real-world patient evaluation: assessment, DDx, risk stratification, and recommendations. The app provides flexible clinical decision making, while emphasizing a care continuum; it generates recommendations for diagnosis and prognosis based on user input. The DDx in the app is deemed a pragmatic model that more closely aligns with real-world clinical practice and was validated using actual clinical cases. 
The beta-testing of the app demonstrated well-accepted functionality and usability of this syncope CDS tool. Conclusions: The MISSION Syncope app development integrated the current literature and clinical expertise to provide an evidence-based DDx, a prognosis using a validated scoring system, and recommendations based on clinical guidelines. This app demonstrates the importance of using research literature in the development of a CDS tool and applying clinical experience to fill the gaps in available research. It is essential for a successful app to be deliberate in pursuing a practical clinical model instead of striving for a perfect mathematical model, given available published evidence. This hybrid methodology can be applied to similar CDS tool development. %M 34783669 %R 10.2196/25192 %U https://www.jmir.org/2021/11/e25192 %U https://doi.org/10.2196/25192 %U http://www.ncbi.nlm.nih.gov/pubmed/34783669 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e29241 %T A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study %A McKenzie,Jordan %A Rajapakshe,Rasika %A Shen,Hua %A Rajapakshe,Shan %A Lin,Angela %+ Radiation Oncology, BC Cancer, 399 Royal Avenue, Kelowna, BC, V1Y 5L3, Canada, 1 250 712 3979, angela.lin@bccancer.bc.ca %K chart review %K natural language processing %K text extraction %K radiation pneumonitis %K lung cancer %K radiation therapy %K python %K electronic medical record %K accuracy %D 2021 %7 12.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Health research frequently requires manual chart reviews to identify patients in a study-specific cohort and examine their clinical outcomes. Manual chart review is a labor-intensive process that requires significant time investment for clinical researchers. Objective: This study aims to evaluate the feasibility and accuracy of an assisted chart review program, using an in-house rule-based text-extraction program written in Python, to identify patients who developed radiation pneumonitis (RP) after receiving curative radiotherapy. Methods: A retrospective manual chart review was completed for patients who received curative radiotherapy for stage 2-3 lung cancer from January 1, 2013 to December 31, 2015, at British Columbia Cancer, Kelowna Centre. In the manual chart review, RP diagnosis and grading were recorded using the Common Terminology Criteria for Adverse Events version 5.0. From the charts of 50 sample patients, a total of 1413 clinical documents were obtained for review from the electronic medical record system. The text-extraction program was built using the Natural Language Toolkit Python platform (and regular expressions, also known as RegEx). Python version 3.7.2 was used to run the text-extraction program. The output of the text-extraction program was a list of the full sentences containing the key terms, document IDs, and dates from which these sentences were extracted. The results from the manual review were used as the gold standard in this study, with which the results of the text-extraction program were compared. Results: Fifty percent (25/50) of the sample patients developed grade ≥1 RP; the natural language processing program was able to ascertain 92% (23/25) of these patients (sensitivity 0.92, 95% CI 0.74-0.99; specificity 0.36, 95% CI 0.18-0.57). 
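The rule-based extraction step described in the Methods above (NLTK sentence tokenization plus regular expressions, returning key sentences with their document IDs and dates) might look roughly like the sketch below; the key-term pattern and document fields are illustrative assumptions rather than the authors' actual rule set.

```python
# Rough sketch of rule-based key-sentence extraction with NLTK sentence
# tokenization and regular expressions. The key-term pattern and document fields
# are illustrative assumptions, not the authors' actual rules.
import re
import nltk

nltk.download("punkt", quiet=True)

# Assumed pattern for radiation pneumonitis mentions and grading language.
KEY_TERMS = re.compile(r"\b(radiation pneumonitis|pneumonitis|grade\s*[1-5]\s*RP)\b",
                       re.IGNORECASE)

def extract_key_sentences(documents):
    """documents: iterable of dicts with 'doc_id', 'date', and 'text' keys."""
    hits = []
    for doc in documents:
        for sentence in nltk.sent_tokenize(doc["text"]):
            if KEY_TERMS.search(sentence):
                hits.append({"doc_id": doc["doc_id"],
                             "date": doc["date"],
                             "sentence": sentence})
    return hits

sample = [{"doc_id": "note-001", "date": "2014-03-02",
           "text": "Mild cough since radiotherapy. Imaging suggests grade 2 RP."}]
print(extract_key_sentences(sample))
```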
Furthermore, the text-extraction program was able to correctly identify all 9 patients with grade ≥2 RP, which are patients with clinically significant symptoms (sensitivity 1.0, 95% CI 0.66-1.0; specificity 0.27, 95% CI 0.14-0.43). The program was useful for distinguishing patients with RP from those without RP. The text-extraction program in this study avoided unnecessary manual review of 22% (11/50) of the sample patients, as these patients were identified as grade 0 RP and would not require further manual review in subsequent studies. Conclusions: This feasibility study showed that the text-extraction program was able to assist with the identification of patients who developed RP after curative radiotherapy. The program streamlines the manual chart review further by identifying the key sentences of interest. This work has the potential to improve future clinical research, as the text-extraction program shows promise in performing chart review in a more time-efficient manner, compared with the traditional labor-intensive manual chart review. %M 34766919 %R 10.2196/29241 %U https://medinform.jmir.org/2021/11/e29241 %U https://doi.org/10.2196/29241 %U http://www.ncbi.nlm.nih.gov/pubmed/34766919 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27414 %T Accuracy of Using Generative Adversarial Networks for Glaucoma Detection: Systematic Review and Bibliometric Analysis %A Saeed,Ali Q %A Sheikh Abdullah,Siti Norul Huda %A Che-Hamzah,Jemaima %A Abdul Ghani,Ahmad Tarmizi %+ Center for Cyber Security, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi Street, Bangi, Selangor, 43600, Malaysia, 60 7740870504, ali.qasim@ntu.edu.iq %K glaucoma %K generative adversarial network %K deep learning %K systematic literature review %K retinal disease %K blood vessels %K optic disc %D 2021 %7 21.9.2021 %9 Review %J J Med Internet Res %G English %X Background: Glaucoma leads to irreversible blindness. Globally, it is the second most common retinal disease that leads to blindness, slightly less common than cataracts. Therefore, there is a great need to avoid the silent growth of this disease using recently developed generative adversarial networks (GANs). Objective: This paper aims to introduce a GAN technology for the diagnosis of eye disorders, particularly glaucoma. This paper illustrates deep adversarial learning as a potential diagnostic tool and the challenges involved in its implementation. This study describes and analyzes many of the pitfalls and problems that researchers will need to overcome to implement this kind of technology. Methods: To organize this review comprehensively, articles and reviews were collected using the following keywords: (“Glaucoma,” “optic disc,” “blood vessels”) and (“receptive field,” “loss function,” “GAN,” “Generative Adversarial Network,” “Deep learning,” “CNN,” “convolutional neural network” OR encoder). The records were identified from 5 highly reputed databases: IEEE Xplore, Web of Science, Scopus, ScienceDirect, and PubMed. These libraries broadly cover the technical and medical literature. Publications within the last 5 years, specifically 2015-2020, were included because the target GAN technique was invented only in 2014 and the publishing date of the collected papers was not earlier than 2016. Duplicate records were removed, and irrelevant titles and abstracts were excluded. In addition, we excluded papers that used optical coherence tomography and visual field images, except for those with 2D images. 
A large-scale systematic analysis was performed, and then a summarized taxonomy was generated. Furthermore, the results of the collected articles were summarized and a visual representation of the results was presented on a T-shaped matrix diagram. This study was conducted between March 2020 and November 2020. Results: We found 59 articles after conducting a comprehensive survey of the literature. Among the 59 articles, 30 present actual attempts to synthesize images and provide accurate segmentation/classification using single/multiple landmarks or share certain experiences. The other 29 articles discuss the recent advances in GANs, do practical experiments, and contain analytical studies of retinal disease. Conclusions: Recent deep learning techniques, namely GANs, have shown encouraging performance in retinal disease detection. Although this methodology involves an extensive computing budget and optimization process, it saturates the greedy nature of deep learning techniques by synthesizing images and solves major medical issues. This paper contributes to this research field by offering a thorough analysis of existing works, highlighting current limitations, and suggesting alternatives to support other researchers and participants in further improving and strengthening future work. Finally, new directions for this research have been identified. %M 34236992 %R 10.2196/27414 %U https://www.jmir.org/2021/9/e27414 %U https://doi.org/10.2196/27414 %U http://www.ncbi.nlm.nih.gov/pubmed/34236992 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 9 %P e28378 %T Diagnostic Accuracy of Smartphone-Based Audiometry for Hearing Loss Detection: Meta-analysis %A Chen,Chih-Hao %A Lin,Heng-Yu Haley %A Wang,Mao-Che %A Chu,Yuan-Chia %A Chang,Chun-Yu %A Huang,Chii-Yuan %A Cheng,Yen-Fu %+ Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital, Taiwan, No.201, Sec. 2, Shipai Rd., Beitou District, Taipei City, 11217, Taiwan, 886 2 2871 2121 ext 1292, yfcheng2@vghtpe.gov.tw %K audiometry %K hearing loss %K hearing test %K mhealth %K mobile health %K digital health %K meta-analysis %K mobile phone %K smartphone diagnostic test accuracy %D 2021 %7 10.9.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Hearing loss is one of the most common disabilities worldwide and affects both individual and public health. Pure tone audiometry (PTA) is the gold standard for hearing assessment, but it is often not available in many settings, given its high cost and demand for human resources. Smartphone-based audiometry may be equally effective and can improve access to adequate hearing evaluations. Objective: The aim of this systematic review is to synthesize the current evidence of the role of smartphone-based audiometry in hearing assessments and further explore the factors that influence its diagnostic accuracy. Methods: Five databases—PubMed, Embase, Cochrane Library, Web of Science, and Scopus—were queried to identify original studies that examined the diagnostic accuracy of hearing loss measurement using smartphone-based devices with conventional PTA as a reference test. A bivariate random-effects meta-analysis was performed to estimate the pooled sensitivity and specificity. The factors associated with diagnostic accuracy were identified using a bivariate meta-regression model. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. 
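For intuition, the sketch below pools logit-transformed sensitivities with a DerSimonian-Laird random-effects model. It is a simplified univariate stand-in for the bivariate random-effects model used in this meta-analysis, and the study counts are made-up example data.

```python
# Simplified illustration of random-effects pooling of logit-transformed
# sensitivities (DerSimonian-Laird), a univariate stand-in for the bivariate
# model used above. True-positive / false-negative counts are made-up examples.
import numpy as np

tp = np.array([45, 80, 120])   # hypothetical true positives per study
fn = np.array([5, 12, 10])     # hypothetical false negatives per study

# Logit-transformed sensitivities and their within-study variances
# (0.5 continuity correction to avoid division by zero).
sens = (tp + 0.5) / (tp + fn + 1.0)
y = np.log(sens / (1 - sens))
v = 1.0 / (tp + 0.5) + 1.0 / (fn + 0.5)

# DerSimonian-Laird between-study variance (tau^2).
w = 1.0 / v
y_fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fixed) ** 2)
tau2 = max(0.0, (Q - (len(y) - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Random-effects pooled estimate, transformed back to a sensitivity.
w_re = 1.0 / (v + tau2)
pooled_logit = np.sum(w_re * y) / np.sum(w_re)
pooled_sens = 1.0 / (1.0 + np.exp(-pooled_logit))
print(f"Pooled sensitivity: {pooled_sens:.3f}")
```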
Results: In all, 25 studies with a total of 4470 patients were included in the meta-analysis. The overall sensitivity, specificity, and area under the receiver operating characteristic curve for smartphone-based audiometry were 89% (95% CI 83%-93%), 93% (95% CI 87%-97%), and 0.96 (95% CI 0.93-0.97), respectively; the corresponding values for the smartphone-based speech recognition test were 91% (95% CI 86%-94%), 88% (95% CI 75%-94%), and 0.93 (95% CI 0.90-0.95), respectively. Meta-regression analysis revealed that patient age, equipment used, and the presence of soundproof booths were significantly related to diagnostic accuracy. Conclusions: We have presented comprehensive evidence regarding the effectiveness of smartphone-based tests in diagnosing hearing loss. Smartphone-based audiometry may serve as an accurate and accessible approach to hearing evaluations, especially in settings where conventional PTA is unavailable. %M 34515644 %R 10.2196/28378 %U https://mhealth.jmir.org/2021/9/e28378/ %U https://doi.org/10.2196/28378 %U http://www.ncbi.nlm.nih.gov/pubmed/34515644 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e27247 %T Lateralization and Bodily Patterns of Segmental Signs and Spontaneous Pain in Acute Visceral Disease: Observational Study %A Shaballout,Nour %A Aloumar,Anas %A Manuel,Jorge %A May,Marcus %A Beissner,Florian %+ Insula Institute for Integrative Therapy Research, Brabeckstraße 177e, Hannover, 30539, Germany, 49 16095543423, f.beissner@insula-institut.org %K digital pain drawings %K visceral referred pain %K referred pain %K head zones %K mydriasis %K chest pain %K clinical examination %K differential diagnosis %K digital health %K digital drawings %K pain %K health technology %K image analysis %D 2021 %7 27.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The differential diagnosis of acute visceral diseases is a challenging clinical problem. Older literature suggests that patients with acute visceral problems show segmental signs such as hyperalgesia, skin resistance, or muscular defense as manifestations of referred visceral pain in somatic or visceral tissues with overlapping segmental innervation. According to these sources, the lateralization and segmental distribution of such signs may be used for differential diagnosis. Segmental signs and symptoms may be accompanied by spontaneous (visceral) pain, which, however, shows a nonsegmental distribution. Objective: This study aimed to investigate the lateralization (ie, localization on one side of the body, in preference to the other) and segmental distribution (ie, surface ratio of the affected segments) of spontaneous pain and (referred) segmental signs in acute visceral diseases using digital pain drawing technology. Methods: We recruited 208 emergency room patients that were presenting for acute medical problems considered by triage as related to internal organ disease. All patients underwent a structured 10-minute bodily examination to test for various segmental signs and spontaneous visceral pain. They were further asked their segmental symptoms such as nausea, meteorism, and urinary retention. We collected spontaneous pain and segmental signs as digital drawings and segmental symptoms as binary values on a tablet PC. After the final diagnosis, patients were divided into groups according to the organ affected. 
Using statistical image analysis, we calculated mean distributions of pain and segmental signs for the heart, lungs, stomach, liver/gallbladder, and kidneys/ureters, analyzing the segmental distribution of these signs and the lateralization. Results: Of the 208 recruited patients, 110 (52.9%) were later diagnosed with a single-organ problem. These recruited patients had a mean age of 57.3 (SD 17.2) years, and 40.9% (85/208) were female. Of these 110 patients, 85 (77.3%) reported spontaneous visceral pain. Of the 110, 81 (73.6%) had at least 1 segmental sign, and the most frequent signs were hyperalgesia (46/81, 57%) and muscle resistance (39/81, 48%). While pain was distributed along the body midline, segmental signs for the heart, stomach, and liver/gallbladder appeared mostly ipsilateral to the affected organ. An unexpectedly high number of patients (37/110, 33.6%) further showed ipsilateral mydriasis. Conclusions: This study underlines the usefulness of including digitally recorded segmental signs in bodily examinations of patients with acute medical problems. %M 34448718 %R 10.2196/27247 %U https://www.jmir.org/2021/8/e27247 %U https://doi.org/10.2196/27247 %U http://www.ncbi.nlm.nih.gov/pubmed/34448718 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 8 %P e25290 %T Screening Diabetic Retinopathy Using an Automated Retinal Image Analysis System in Independent and Assistive Use Cases in Mexico: Randomized Controlled Trial %A Noriega,Alejandro %A Meizner,Daniela %A Camacho,Dalia %A Enciso,Jennifer %A Quiroz-Mercado,Hugo %A Morales-Canton,Virgilio %A Almaatouq,Abdullah %A Pentland,Alex %+ Prosperia Salud, 58D Secretaria de Marina 1206, Lomas del Chamizal, Mexico City, 05129, Mexico, 52 617 982 47, noriega@mit.edu %K diabetic retinopathy %K automated diagnosis %K retina %K fundus image analysis %D 2021 %7 26.8.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: The automated screening of patients at risk of developing diabetic retinopathy represents an opportunity to improve their midterm outcome and lower the public expenditure associated with direct and indirect costs of common sight-threatening complications of diabetes. Objective: This study aimed to develop and evaluate the performance of an automated deep learning–based system to classify retinal fundus images as referable and nonreferable diabetic retinopathy cases, from international and Mexican patients. In particular, we aimed to evaluate the performance of the automated retina image analysis (ARIA) system under an independent scheme (ie, only ARIA screening) and 2 assistive schemes (ie, hybrid ARIA plus ophthalmologist screening), using a web-based platform for remote image analysis to determine and compare the sensitivity and specificity of the 3 schemes. Methods: A randomized controlled experiment was performed where 17 ophthalmologists were asked to classify a series of retinal fundus images under 3 different conditions. The conditions were to (1) screen the fundus image by themselves (solo); (2) screen the fundus image after exposure to the retina image classification of the ARIA system (ARIA answer); and (3) screen the fundus image after exposure to the classification of the ARIA system, as well as its level of confidence and an attention map highlighting the most important areas of interest in the image according to the ARIA system (ARIA explanation). 
The ophthalmologists’ classification in each condition and the result from the ARIA system were compared against a gold standard generated by consulting and aggregating the opinion of 3 retina specialists for each fundus image. Results: The ARIA system was able to classify referable vs nonreferable cases with an area under the receiver operating characteristic curve of 98%, a sensitivity of 95.1%, and a specificity of 91.5% for international patient cases. There was an area under the receiver operating characteristic curve of 98.3%, a sensitivity of 95.2%, and a specificity of 90% for Mexican patient cases. The ARIA system performance was more successful than the average performance of the 17 ophthalmologists enrolled in the study. Additionally, the results suggest that the ARIA system can be useful as an assistive tool, as sensitivity was significantly higher in the experimental condition where ophthalmologists were exposed to the ARIA system’s answer prior to their own classification (93.3%), compared with the sensitivity of the condition where participants assessed the images independently (87.3%; P=.05). Conclusions: These results demonstrate that both independent and assistive use cases of the ARIA system present, for Latin American countries such as Mexico, a substantial opportunity toward expanding the monitoring capacity for the early detection of diabetes-related blindness. %M 34435963 %R 10.2196/25290 %U https://formative.jmir.org/2021/8/e25290 %U https://doi.org/10.2196/25290 %U http://www.ncbi.nlm.nih.gov/pubmed/34435963 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e28266 %T Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis %A Kummer,Benjamin %A Shakir,Lubaina %A Kwon,Rachel %A Habboushe,Joseph %A Jetté,Nathalie %+ Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave Levy Pl, Box 1137, New York, NY, 10029, United States, 1 2122415050, benjamin.kummer@mountsinai.org %K medical informatics %K clinical informatics %K mhealth %K digital health %K cerebrovascular disease %K medical calculators %K health information %K health information technology %K information technology %K economic health %K clinical health %K electronic health records %D 2021 %7 2.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app–based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc’s calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. 
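A sketch of how relative page views and the top 5 stroke-related calculators could be derived from usage metadata is shown below; the CSV file and column names are assumptions about how such data might be laid out, not MDCalc's actual schema.

```python
# Sketch of computing relative page views and the top 5 stroke-related
# calculators from usage metadata. The CSV file and column names are
# illustrative assumptions about how such data might be organized.
import pandas as pd

views = pd.read_csv("calculator_page_views.csv")  # assumed columns: calculator, tag, page_views

total_views = views["page_views"].sum()
stroke = views[views["tag"] == "stroke"].copy()

stroke["pct_of_total"] = 100 * stroke["page_views"] / total_views
stroke["pct_of_stroke"] = 100 * stroke["page_views"] / stroke["page_views"].sum()

top5 = stroke.sort_values("page_views", ascending=False).head(5)
print(top5[["calculator", "pct_of_total", "pct_of_stroke"]])
```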
Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6%) were related to stroke. Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5% of total and 32% of stroke-related page views), the Mean Arterial Pressure calculator (2.4% of total and 14.0% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9% of total and 11.4% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7% of total and 10.1% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4% of total and 8.1% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7%-91.2% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. %M 34338647 %R 10.2196/28266 %U https://medinform.jmir.org/2021/8/e28266 %U https://doi.org/10.2196/28266 %U http://www.ncbi.nlm.nih.gov/pubmed/34338647 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e23863 %T Performance and Limitation of Machine Learning Algorithms for Diabetic Retinopathy Screening: Meta-analysis %A Wu,Jo-Hsuan %A Liu,T Y Alvin %A Hsu,Wan-Ting %A Ho,Jennifer Hui-Chun %A Lee,Chien-Chang %+ Department of Emergency Medicine, National Taiwan University Hospital, No 7, Chung-Shan South Road, Taipei, 100, Taiwan, 886 2 23123456 ext 63485, hit3transparency@gmail.com %K machine learning %K diabetic retinopathy %K diabetes %K deep learning %K neural network %K diagnostic accuracy %D 2021 %7 5.7.2021 %9 Review %J J Med Internet Res %G English %X Background: Diabetic retinopathy (DR), whose standard diagnosis is performed by human experts, has high prevalence and requires a more efficient screening method. Although machine learning (ML)–based automated DR diagnosis has gained attention due to recent approval of IDx-DR, performance of this tool has not been examined systematically, and the best ML technique for use in a real-world setting has not been discussed. Objective: The aim of this study was to systematically examine the overall diagnostic accuracy of ML in diagnosing DR of different categories based on color fundus photographs and to determine the state-of-the-art ML approach. Methods: Published studies in PubMed and EMBASE were searched from inception to June 2020. Studies were screened for relevant outcomes, publication types, and data sufficiency, and a total of 60 out of 2128 (2.82%) studies were retrieved after study selection. Extraction of data was performed by 2 authors according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and the quality assessment was performed according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). 
Meta-analysis of diagnostic accuracy was pooled using a bivariate random effects model. The main outcomes included diagnostic accuracy, sensitivity, and specificity of ML in diagnosing DR based on color fundus photographs, as well as the performances of different major types of ML algorithms. Results: The primary meta-analysis included 60 color fundus photograph studies (445,175 interpretations). Overall, ML demonstrated high accuracy in diagnosing DR of various categories, with a pooled area under the receiver operating characteristic (AUROC) ranging from 0.97 (95% CI 0.96-0.99) to 0.99 (95% CI 0.98-1.00). The performance of ML in detecting more-than-mild DR was robust (sensitivity 0.95; AUROC 0.97), and by subgroup analyses, we observed that robust performance of ML was not limited to benchmark data sets (sensitivity 0.92; AUROC 0.96) but could be generalized to images collected in clinical practice (sensitivity 0.97; AUROC 0.97). Neural network was the most widely used method, and the subgroup analysis revealed a pooled AUROC of 0.98 (95% CI 0.96-0.99) for studies that used neural networks to diagnose more-than-mild DR. Conclusions: This meta-analysis demonstrated high diagnostic accuracy of ML algorithms in detecting DR on color fundus photographs, suggesting that state-of-the-art, ML-based DR screening algorithms are likely ready for clinical applications. However, a significant portion of the earlier published studies had methodology flaws, such as the lack of external validation and presence of spectrum bias. The results of these studies should be interpreted with caution. %M 34407500 %R 10.2196/23863 %U https://www.jmir.org/2021/7/e23863 %U https://doi.org/10.2196/23863 %U http://www.ncbi.nlm.nih.gov/pubmed/34407500 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25247 %T Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study %A Hu,Hao-Chun %A Chang,Shyue-Yih %A Wang,Chuen-Heng %A Li,Kai-Jun %A Cho,Hsiao-Yun %A Chen,Yi-Ting %A Lu,Chang-Jung %A Tsai,Tzu-Pei %A Lee,Oscar Kuang-Sheng %+ Institute of Clinical Medicine, National Yang Ming Chiao Tung University, No 155, Section 2, Li-Nong Street, Beitou District, Taipei, 11221, Taiwan, 886 2 28757391, oscarlee9203@gmail.com %K artificial intelligence %K convolutional neural network %K dysphonia %K pathological voice %K vocal fold disease %K voice pathology identification %D 2021 %7 8.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. 
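A spectrogram-plus-CNN pipeline of the general kind used for pathological voice classification can be sketched as follows. The file handling, log-mel parameters, and network architecture are illustrative assumptions, not the authors' model.

```python
# Minimal sketch of a spectrogram-based CNN for multiclass voice classification.
# The labels, spectrogram parameters, and architecture are illustrative
# assumptions, not the authors' actual pipeline.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

CLASSES = ["normal", "vocal_atrophy", "unilateral_paralysis",
           "organic_lesion", "spasmodic_dysphonia"]

def to_logmel(path, sr=16000, n_mels=64, frames=128):
    """Load a voice sample and return a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    logmel = librosa.util.fix_length(logmel, size=frames, axis=1)
    return logmel[..., np.newaxis]  # add a channel dimension

model = tf.keras.Sequential([
    layers.Input(shape=(64, 128, 1)),
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20)  # X_train: stacked log-mel spectrograms
```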
Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing for invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies. %M 34100770 %R 10.2196/25247 %U https://www.jmir.org/2021/6/e25247 %U https://doi.org/10.2196/25247 %U http://www.ncbi.nlm.nih.gov/pubmed/34100770 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e22664 %T Neural Network–Based Retinal Nerve Fiber Layer Profile Compensation for Glaucoma Diagnosis in Myopia: Model Development and Validation %A Li,Lei %A Zhu,Haogang %A Zhang,Zhenyu %A Zhao,Liang %A Xu,Liang %A Jonas,Rahul A %A Garway-Heath,David F %A Jonas,Jost B %A Wang,Ya Xing %+ Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing Ophthalmology and Visual Sciences Key Laboratory, 17 Hougou Lane, Beijing, 100005, China, 86 18600059315, yaxingw@gmail.com %K retinal nerve fiber layer thickness %K radial basis neural network %K neural network %K glaucoma %K optic nerve head %K optical coherence tomography %K myopia %K optic nerve %D 2021 %7 18.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Due to the axial elongation–associated changes in the optic nerve and retina in high myopia, traditional methods like optic disc evaluation and visual field are not able to correctly differentiate glaucomatous lesions. It has been clinically challenging to detect glaucoma in highly myopic eyes. Objective: This study aimed to develop a neural network to adjust for the dependence of the peripapillary retinal nerve fiber layer (RNFL) thickness (RNFLT) profile on age, gender, and ocular biometric parameters and to evaluate the network’s performance for glaucoma diagnosis, especially in high myopia. Methods: RNFLT with 768 points on the circumferential 3.4-mm scan was measured using spectral-domain optical coherence tomography. A fully connected network and a radial basis function network were trained for vertical (scaling) and horizontal (shift) transformation of the RNFLT profile with adjustment for age, axial length (AL), disc-fovea angle, and distance in a test group of 2223 nonglaucomatous eyes. The performance of RNFLT compensation was evaluated in an independent group of 254 glaucoma patients and 254 nonglaucomatous participants. 
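A heavily simplified stand-in for the compensation idea is sketched below: a small fully connected network learns the expected RNFLT from covariates in nonglaucomatous eyes, and each measured profile is rescaled toward the population average (vertical scaling only, ignoring the horizontal shift network). All variable names and data are assumptions.

```python
# Simplified sketch of covariate-based compensation of the RNFLT profile.
# A small fully connected network learns expected RNFLT from age, gender, axial
# length, disc-fovea angle, and disc-fovea distance, and measured profiles are
# rescaled toward the population average. This is a stand-in for the study's
# combined scaling/shift networks; the data below are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(55, 10, n),       # age (years)
    rng.integers(0, 2, n),       # gender (0/1)
    rng.normal(23.5, 1.2, n),    # axial length (mm)
    rng.normal(-7, 3, n),        # disc-fovea angle (deg)
    rng.normal(4.4, 0.3, n),     # disc-fovea distance (mm)
])
profiles = rng.normal(100, 10, (n, 768))   # hypothetical 768-point RNFLT profiles (um)
mean_rnflt = profiles.mean(axis=1)         # per-eye mean thickness

net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
net.fit(X, mean_rnflt)

def compensate(profile, covariates, reference=mean_rnflt.mean()):
    """Rescale a measured profile by expected-vs-reference mean thickness."""
    expected = net.predict(covariates.reshape(1, -1))[0]
    return profile * (reference / expected)

adjusted = compensate(profiles[0], X[0])
```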
Results: By applying the RNFL compensation algorithm, the area under the receiver operating characteristic curve for detecting glaucoma increased from 0.70 to 0.84, from 0.75 to 0.89, from 0.77 to 0.89, and from 0.78 to 0.87 for eyes in the highest 10% percentile subgroup of the AL distribution (mean 26.0, SD 0.9 mm), highest 20% percentile subgroup of the AL distribution (mean 25.3, SD 1.0 mm), highest 30% percentile subgroup of the AL distribution (mean 24.9, SD 1.0 mm), and any AL (mean 23.5, SD 1.2 mm), respectively, in comparison with unadjusted RNFLT. The difference between uncompensated and compensated RNFLT values increased with longer axial length, with enlargement of 19.8%, 18.9%, 16.2%, and 11.3% in the highest 10% percentile subgroup, highest 20% percentile subgroup, highest 30% percentile subgroup, and all eyes, respectively. Conclusions: In a population-based study sample, an algorithm-based adjustment for age, gender, and ocular biometric parameters improved the diagnostic precision of the RNFLT profile for glaucoma detection particularly in myopic and highly myopic eyes. %M 34003137 %R 10.2196/22664 %U https://medinform.jmir.org/2021/5/e22664 %U https://doi.org/10.2196/22664 %U http://www.ncbi.nlm.nih.gov/pubmed/34003137 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25884 %T Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development %A Aktar,Sakifa %A Ahamad,Md Martuza %A Rashed-Al-Mahfuz,Md %A Azad,AKM %A Uddin,Shahadat %A Kamal,AHM %A Alyami,Salem A %A Lin,Ping-I %A Islam,Sheikh Mohammed Shariful %A Quinn,Julian MW %A Eapen,Valsamma %A Moni,Mohammad Ali %+ WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, University of New South Wales, Kensington, Sydney, NSW 2052, Australia, 61 414701759, m.moni@unsw.edu.au %K COVID-19 %K blood samples %K machine learning %K statistical analysis %K prediction %K severity %K mortality %K morbidity %K risk %K blood %K testing %K outcome %K data set %D 2021 %7 13.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective: Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. Methods: We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results: Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19–positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90% for disease severity prediction. 
Conclusions: We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment. %M 33779565 %R 10.2196/25884 %U https://medinform.jmir.org/2021/4/e25884 %U https://doi.org/10.2196/25884 %U http://www.ncbi.nlm.nih.gov/pubmed/33779565 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 3 %P e19408 %T e-Learning for Instruction and to Improve Reproducibility of Scoring Tumor-Stroma Ratio in Colon Carcinoma: Performance and Reproducibility Assessment in the UNITED Study %A Smit,Marloes A %A van Pelt,Gabi W %A Dequeker,Elisabeth MC %A Al Dieri,Raed %A Tollenaar,Rob AEM %A van Krieken,J Han JM %A Mesker,Wilma E %A , %+ Department of Surgery, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZA, Netherlands, 31 715264005, w.e.mesker@lumc.nl %K colon cancer %K tumor-stroma ratio %K validation %K e-Learning %K reproducibility study %K cancer %K tumor %K colon %K reproducibility %K carcinoma %K prognosis %K diagnostic %K implementation %K online learning %D 2021 %7 19.3.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: The amount of stroma in the primary tumor is an important prognostic parameter. The tumor-stroma ratio (TSR) was previously validated by international research groups as a robust parameter with good interobserver agreement. Objective: The Uniform Noting for International Application of the Tumor-Stroma Ratio as an Easy Diagnostic Tool (UNITED) study was developed to bring the TSR to clinical implementation. As part of the study, an e-Learning module was constructed to confirm the reproducibility of scoring the TSR after proper instruction. Methods: The e-Learning module consists of an autoinstruction for TSR determination (instruction video or written protocol) and three sets of 40 cases (training, test, and repetition sets). Scoring the TSR is performed on hematoxylin and eosin–stained sections and takes only 1-2 minutes. Cases are considered stroma-low if the amount of stroma is ≤50%, whereas a stroma-high case is defined as >50% stroma. Inter- and intraobserver agreements were determined based on the Cohen κ score after each set to evaluate the reproducibility. Results: Pathologists and pathology residents (N=63) with special interest in colorectal cancer participated in the e-Learning. Forty-nine participants started the e-Learning and 31 (63%) finished the whole cycle (3 sets). A significant improvement was observed from the training set to the test set; the median κ score improved from 0.72 to 0.77 (P=.002). Conclusions: e-Learning is an effective method to instruct pathologists and pathology residents for scoring the TSR. The reliability of scoring improved from the training to the test set and did not fall back with the repetition set, confirming the reproducibility of the TSR scoring method. 
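Interobserver agreement on stroma-low versus stroma-high calls, as evaluated in the e-Learning sets above, is quantified with the Cohen κ statistic; a minimal example with made-up ratings from two observers is shown below.

```python
# Small illustration of the Cohen kappa agreement statistic used to evaluate
# TSR scoring. The stroma-low (0) / stroma-high (1) calls below are made-up
# example data for two observers.
from sklearn.metrics import cohen_kappa_score

observer_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
observer_b = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

kappa = cohen_kappa_score(observer_a, observer_b)
print(f"Cohen kappa: {kappa:.2f}")
```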
Trial Registration: The Netherlands Trial Registry NTR7270; https://www.trialregister.nl/trial/7072 International Registered Report Identifier (IRRID): RR2-10.2196/13464 %M 33739293 %R 10.2196/19408 %U https://formative.jmir.org/2021/3/e19408 %U https://doi.org/10.2196/19408 %U http://www.ncbi.nlm.nih.gov/pubmed/33739293 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e23456 %T Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use Among People Living with HIV: Retrospective Cohort Study %A Ridgway,Jessica P %A Uvin,Arno %A Schmitt,Jessica %A Oliwa,Tomasz %A Almirol,Ellen %A Devlin,Samantha %A Schneider,John %+ Department of Medicine, University of Chicago, 5841 S Maryland Ave, MC 5065, Chicago, IL, 60637, United States, 1 7737029185, jessica.ridgway@uchospitals.edu %K natural language processing %K HIV %K substance use %K mental illness %K electronic medical records %D 2021 %7 10.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Mental illness and substance use are prevalent among people living with HIV and often lead to poor health outcomes. Electronic medical record (EMR) data are increasingly being utilized for HIV-related clinical research and care, but mental illness and substance use are often underdocumented in structured EMR fields. Natural language processing (NLP) of unstructured text of clinical notes in the EMR may more accurately identify mental illness and substance use among people living with HIV than structured EMR fields alone. Objective: The aim of this study was to utilize NLP of clinical notes to detect mental illness and substance use among people living with HIV and to determine how often these factors are documented in structured EMR fields. Methods: We collected both structured EMR data (diagnosis codes, social history, Problem List) as well as the unstructured text of clinical HIV care notes for adults living with HIV. We developed NLP algorithms to identify words and phrases associated with mental illness and substance use in the clinical notes. The algorithms were validated based on chart review. We compared numbers of patients with documentation of mental illness or substance use identified by structured EMR fields with those identified by the NLP algorithms. Results: The NLP algorithm for detecting mental illness had a positive predictive value (PPV) of 98% and a negative predictive value (NPV) of 98%. The NLP algorithm for detecting substance use had a PPV of 92% and an NPV of 98%. The NLP algorithm for mental illness identified 54.0% (420/778) of patients as having documentation of mental illness in the text of clinical notes. Among the patients with mental illness detected by NLP, 58.6% (246/420) had documentation of mental illness in at least one structured EMR field. Sixty-three patients had documentation of mental illness in structured EMR fields that was not detected by NLP of clinical notes. The NLP algorithm for substance use detected substance use in the text of clinical notes in 18.1% (141/778) of patients. Among patients with substance use detected by NLP, 73.8% (104/141) had documentation of substance use in at least one structured EMR field. Seventy-six patients had documentation of substance use in structured EMR fields that was not detected by NLP of clinical notes. Conclusions: Among patients in an urban HIV care clinic, NLP of clinical notes identified high rates of mental illness and substance use that were often not documented in structured EMR fields. 
This finding has important implications for epidemiologic research and clinical care for people living with HIV. %M 33688848 %R 10.2196/23456 %U https://medinform.jmir.org/2021/3/e23456 %U https://doi.org/10.2196/23456 %U http://www.ncbi.nlm.nih.gov/pubmed/33688848 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 2 %P e24061 %T Designing a Personalized Health Dashboard: Interdisciplinary and Participatory Approach %A Weijers,Miriam %A Bastiaenen,Caroline %A Feron,Frans %A Schröder,Kay %+ Department of Social Medicine, Faculty of Health, Medicine and Life Sciences, Care and Public Health Research Institute, Maastricht University, Postbus 616, Maastricht, 6200 MD, Netherlands, 31 +31646442957, miriam.weijers@ggdzl.nl %K visualization design model %K dashboard %K evaluation %K personalized health care %K International Classification of Functioning, Disability and Health (ICF) %K patient access to records %K human–computer interaction %K health information visualization %D 2021 %7 9.2.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Within the Dutch Child Health Care (CHC), an online tool (360° CHILD-profile) is designed to enhance prevention and transformation toward personalized health care. From a personalized preventive perspective, it is of fundamental importance to timely identify children with emerging health problems interrelated to multiple health determinants. While digitalization of children’s health data is now realized, the accessibility of data remains a major challenge for CHC professionals, let alone for parents/youth. Therefore, the idea was initiated from CHC practice to develop a novel approach to make relevant information accessible at a glance. Objective: This paper describes the stepwise development of a dashboard, as an example of using a design model to achieve visualization of a comprehensive overview of theoretically structured health data. Methods: Developmental process is based on the nested design model with involvement of relevant stakeholders in a real-life context. This model considers immediate upstream validation within 4 cascading design levels: Domain Problem and Data Characterization, Operation and Data Type Abstraction, Visual Encoding and Interaction Design, and Algorithm Design. This model also includes impact-oriented downstream validation, which can be initiated after delivering the prototype. Results: A comprehensible 360° CHILD-profile is developed: an online accessible visualization of CHC data based on the theoretical concept of the International Classification of Functioning, Disability and Health. This dashboard provides caregivers and parents/youth with a holistic view on children’s health and “entry points” for preventive, individualized health plans. Conclusions: Describing this developmental process offers guidance on how to utilize the nested design model within a health care context. 
%M 33560229 %R 10.2196/24061 %U https://formative.jmir.org/2021/2/e24061 %U https://doi.org/10.2196/24061 %U http://www.ncbi.nlm.nih.gov/pubmed/33560229 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 2 %P e25184 %T Preliminary Screening for Hereditary Breast and Ovarian Cancer Using a Chatbot Augmented Intelligence Genetic Counselor: Development and Feasibility Study %A Sato,Ann %A Haneda,Eri %A Suganuma,Nobuyasu %A Narimatsu,Hiroto %+ Department of Genetic Medicine, Kanagawa Cancer Center, 2-3-2 Nakao, Asahi-ku, Yokohama, Kanagawa, 241-8515, Japan, 81 045 520 2222, hiroto-narimatsu@umin.org %K artificial intelligence %K augmented intelligence %K hereditary cancer %K familial cancer %K IBM Watson %K preliminary screening %K cancer %K genetics %K chatbot %K screening %K feasibility %D 2021 %7 5.2.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Breast cancer is the most common form of cancer in Japan; genetic background and hereditary breast and ovarian cancer (HBOC) are implicated. The key to HBOC diagnosis involves screening to identify high-risk individuals. However, genetic medicine is still developing; thus, many patients who may potentially benefit from genetic medicine have not yet been identified. Objective: This study’s objective is to develop a chatbot system that uses augmented intelligence for HBOC screening to determine whether patients meet the National Comprehensive Cancer Network (NCCN) BRCA1/2 testing criteria. Methods: The system was evaluated by a doctor specializing in genetic medicine and certified genetic counselors. We prepared 3 scenarios and created a conversation with the chatbot to reflect each one. Then we evaluated chatbot feasibility, the required time, the medical accuracy of conversations and family history, and the final result. Results: The times required for the conversation were 7 minutes for scenario 1, 15 minutes for scenario 2, and 16 minutes for scenario 3. Scenarios 1 and 2 met the BRCA1/2 testing criteria, but scenario 3 did not, and this result was consistent with the findings of 3 experts who retrospectively reviewed conversations with the chatbot according to the 3 scenarios. A family history comparison ascertained by the chatbot with the actual scenarios revealed that each result was consistent with each scenario. From a genetic medicine perspective, no errors were noted by the 3 experts. Conclusions: This study demonstrated that chatbot systems could be applied to preliminary genetic medicine screening for HBOC. 
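After the conversation, this kind of chatbot screening reduces to a rule check of structured personal and family history against testing criteria. The sketch below shows the general shape of such a check; the rules encoded are simplified illustrative placeholders and are not the actual NCCN BRCA1/2 testing criteria.

```python
# Highly simplified sketch of a rule check over structured personal and family
# history, of the kind an HBOC screening chatbot could apply after its
# conversation. The rules below are illustrative placeholders and are NOT the
# actual NCCN BRCA1/2 testing criteria.
from dataclasses import dataclass
from typing import Optional

@dataclass
class History:
    personal_breast_cancer: bool = False
    age_at_breast_cancer_dx: Optional[int] = None
    personal_ovarian_cancer: bool = False
    male_breast_cancer_in_family: bool = False
    relatives_with_breast_cancer: int = 0

def meets_placeholder_criteria(h: History) -> bool:
    """Return True if any simplified placeholder rule fires."""
    if h.personal_ovarian_cancer:
        return True
    if (h.personal_breast_cancer and h.age_at_breast_cancer_dx is not None
            and h.age_at_breast_cancer_dx <= 45):
        return True
    if h.male_breast_cancer_in_family:
        return True
    return h.relatives_with_breast_cancer >= 2

print(meets_placeholder_criteria(History(personal_breast_cancer=True,
                                         age_at_breast_cancer_dx=40)))  # True
```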
%M 33544084 %R 10.2196/25184 %U https://formative.jmir.org/2021/2/e25184 %U https://doi.org/10.2196/25184 %U http://www.ncbi.nlm.nih.gov/pubmed/33544084 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 2 %P e18837 %T Comparison of Endoscopy First and Laparoscopic Cholecystectomy First Strategies for Patients With Gallstone Disease and Intermediate Risk of Choledocholithiasis: Protocol for a Clinical Randomized Controlled Trial %A Aleknaite,Ausra %A Simutis,Gintaras %A Stanaitis,Juozas %A Jucaitis,Tomas %A Drungilas,Mantas %A Valantinas,Jonas %A Strupas,Kestutis %+ Center of Hepatology, Gastroenterology and Dietetics, Vilnius University Hospital Santaros Klinikos, Santariskiu 2, Vilnius, 08406, Lithuania, 370 61818076, ausra.aleknaite@santa.lt %K choledocholithiasis %K endoscopic ultrasound %K intraoperative cholangiography %K common bile duct stone %K endoscopic retrograde cholangiopancreatography %K laparoscopic cholecystectomy %D 2021 %7 4.2.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: The optimal approach for patients with gallbladder stones and intermediate risk of choledocholithiasis remains undetermined. The use of endoscopic retrograde cholangiopancreatography for diagnosis should be minimized as it carries considerable risk of postprocedural complications, and nowadays, less invasive and safer techniques are available. Objective: This study compares the two management strategies of endoscopic ultrasound before laparoscopic cholecystectomy and intraoperative cholangiography for patients with symptomatic cholecystolithiasis and intermediate risk of choledocholithiasis. Methods: This is a randomized, active-controlled, single-center clinical trial enrolling adult patients undergoing laparoscopic cholecystectomy for symptomatic gallbladder stones with intermediate risk of choledocholithiasis. The risk of choledocholithiasis is calculated using an original prognostic score (the Vilnius University Hospital Index). This index in a retrospective evaluation showed better prognostic performance than the score proposed by the American Society for Gastrointestinal Endoscopy in 2010. A total of 106 participants will be included and randomized into two groups. Evaluation of bile ducts using endoscopic ultrasound and endoscopic retrograde cholangiography on demand will be performed before laparoscopic cholecystectomy for one arm (“endoscopy first”). Intraoperative cholangiography during laparoscopic cholecystectomy and postoperative endoscopic retrograde cholangiopancreatography on demand will be performed in another arm (“cholecystectomy first”). Postoperative follow-up is 6 months. The primary endpoint is the length of hospital stay. The secondary endpoints are accuracy of the different management strategies, adverse events of the interventions, duct clearance and technical success of the interventions (intraoperative cholangiography, endoscopic ultrasound, and endoscopic retrograde cholangiography), and cost of treatment. Results: The trial protocol was approved by the Vilnius Regional Biomedical Research Ethics Committee in December 2017. Enrollment of patients was started in January 2018. As of June 2020, 66 patients have been enrolled. Conclusions: This trial is planned to determine the superior strategy for patients with intermediate risk of common bile duct stones and to define a simple and safe algorithm for managing choledocholithiasis. Trial Registration: ClinicalTrials.gov NCT03658863; https://clinicaltrials.gov/ct2/show/NCT03658863. 
International Registered Report Identifier (IRRID): DERR1-10.2196/18837 %M 33538700 %R 10.2196/18837 %U https://www.researchprotocols.org/2021/2/e18837 %U https://doi.org/10.2196/18837 %U http://www.ncbi.nlm.nih.gov/pubmed/33538700 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 1 %P e19739 %T An Application of Machine Learning to Etiological Diagnosis of Secondary Hypertension: Retrospective Study Using Electronic Medical Records %A Diao,Xiaolin %A Huo,Yanni %A Yan,Zhanzheng %A Wang,Haibin %A Yuan,Jing %A Wang,Yuxin %A Cai,Jun %A Zhao,Wei %+ Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, 167 Beilishi Road, Beijing, 100037, China, 86 1 333 119 2899, zw@fuwai.com %K secondary hypertension %K etiological diagnosis %K machine learning %K prediction model %D 2021 %7 25.1.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Secondary hypertension is a kind of hypertension with a definite etiology and may be cured. Patients with suspected secondary hypertension can benefit from timely detection and treatment and, conversely, will have a higher risk of morbidity and mortality than those with primary hypertension. Objective: The aim of this study was to develop and validate machine learning (ML) prediction models of common etiologies in patients with suspected secondary hypertension. Methods: The analyzed data set was retrospectively extracted from electronic medical records of patients discharged from Fuwai Hospital between January 1, 2016, and June 30, 2019. A total of 7532 unique patients were included and divided into 2 data sets by time: 6302 patients in 2016-2018 as the training data set for model building and 1230 patients in 2019 as the validation data set for further evaluation. Extreme Gradient Boosting (XGBoost) was adopted to develop 5 models to predict 4 etiologies of secondary hypertension and occurrence of any of them (named as composite outcome), including renovascular hypertension (RVH), primary aldosteronism (PA), thyroid dysfunction, and aortic stenosis. Both univariate logistic analysis and Gini Impurity were used for feature selection. Grid search and 10-fold cross-validation were used to select the optimal hyperparameters for each model. Results: Validation of the composite outcome prediction model showed good performance with an area under the receiver-operating characteristic curve (AUC) of 0.924 in the validation data set, while the 4 prediction models of RVH, PA, thyroid dysfunction, and aortic stenosis achieved AUC of 0.938, 0.965, 0.959, and 0.946, respectively, in the validation data set. A total of 79 clinical indicators were identified in all and finally used in our prediction models. The result of subgroup analysis on the composite outcome prediction model demonstrated high discrimination with AUCs all higher than 0.890 among all age groups of adults. Conclusions: The ML prediction models in this study showed good performance in detecting 4 etiologies of patients with suspected secondary hypertension; thus, they may potentially facilitate clinical diagnosis decision making of secondary hypertension in an intelligent way. 
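One of the binary etiology models described above (eg, primary aldosteronism vs not) could be trained roughly as follows with XGBoost, grid search, and 10-fold cross-validation; the feature matrix, labels, and hyperparameter grid are illustrative assumptions.

```python
# Sketch of one binary etiology model with XGBoost, grid search, and 10-fold
# cross-validation, mirroring the modeling approach described above. The feature
# matrix, labels, and the searched hyperparameter grid are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 79))        # 79 clinical indicators (placeholder data)
y = rng.integers(0, 2, size=600)      # 1 = etiology present, 0 = absent

param_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid, scoring="roc_auc", cv=cv, n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```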
%M 33492233 %R 10.2196/19739 %U http://medinform.jmir.org/2021/1/e19739/ %U https://doi.org/10.2196/19739 %U http://www.ncbi.nlm.nih.gov/pubmed/33492233 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 1 %P e25535 %T Accurately Differentiating Between Patients With COVID-19, Patients With Other Viral Infections, and Healthy Individuals: Multimodal Late Fusion Learning Approach %A Xu,Ming %A Ouyang,Liu %A Han,Lei %A Sun,Kai %A Yu,Tingting %A Li,Qian %A Tian,Hua %A Safarnejad,Lida %A Zhang,Hengdong %A Gao,Yue %A Bao,Forrest Sheng %A Chen,Yuanfang %A Robinson,Patrick %A Ge,Yaorong %A Zhu,Baoli %A Liu,Jie %A Chen,Shi %+ Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1277 Jiefang Avenue, Wuhan, China, 86 13469981699, liu_jie0823@163.com %K COVID-19 %K machine learning %K deep learning %K multimodal %K feature fusion %K biomedical imaging %K diagnosis support %K diagnosis %K imaging %K differentiation %K testing %K diagnostic %D 2021 %7 6.1.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Effectively identifying patients with COVID-19 using nonpolymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding in various biomedical features and appropriate analytical approaches for enabling the early detection and effective diagnosis of patients with COVID-19. Objective: We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection. Methods: In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants’ clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor, random forest, and support vector machine models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere, severe, healthy, and viral pneumonia. Results: Multimodal features provided substantial performance gain from the use of any single feature modality. All 3 machine learning models had high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%). Conclusions: Compared to the existing binary classification benchmarks that are often focused on single-feature modality, this study’s hybrid deep learning-machine learning framework provided a novel and effective breakthrough for clinical applications. Our findings, which come from a relatively large sample size, and analytical workflow will supplement and assist with clinical decision support for current COVID-19 diagnostic methods and other clinical applications with high-dimensional multimodal biomedical features. 
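The late-fusion idea can be sketched as follows: a CNN (trained separately in the study) compresses the CT input to a 10-feature representation, which is concatenated with the 23 clinical and 10 laboratory features before a classical classifier is trained on the combined 43 features. The encoder architecture, arrays, and shapes below are illustrative assumptions.

```python
# Minimal sketch of late fusion: a CNN compresses CT input to 10 high-level
# features, which are concatenated with clinical and lab features before
# training a classical classifier on the combined 43 features. In the study the
# CT encoder is a trained deep model; here it is untrained and only shows the
# data flow. All arrays and shapes are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

ct_encoder = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="relu"),       # 10-feature CT representation
])

n = 200
ct_slices = np.random.rand(n, 128, 128, 1).astype("float32")   # placeholder CT data
clinical = np.random.rand(n, 23)                               # 23 clinical features
labs = np.random.rand(n, 10)                                   # 10 lab-test features
labels = np.random.randint(0, 4, size=n)  # nonsevere / severe / healthy / viral pneumonia

ct_features = ct_encoder.predict(ct_slices, verbose=0)         # shape (n, 10)
fused = np.hstack([clinical, labs, ct_features])               # shape (n, 43), late fusion

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(clf, fused, labels, cv=5).mean())
```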
%M 33404516 %R 10.2196/25535 %U http://www.jmir.org/2021/1/e25535/ %U https://doi.org/10.2196/25535 %U http://www.ncbi.nlm.nih.gov/pubmed/33404516 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 1 %P e22637 %T Young Adults’ Perspectives on the Use of Symptom Checkers for Self-Triage and Self-Diagnosis: Qualitative Study %A Aboueid,Stephanie %A Meyer,Samantha %A Wallace,James R %A Mahajan,Shreya %A Chaurasia,Ashok %+ School of Public Health and Health Systems, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada, 966 0530468122, seaboueid@uwaterloo.ca %K self-assessment %K symptom checkers %K self-triage %K self-diagnosis %K young adults %K digital platforms %K internet %K user experience %K Google search %D 2021 %7 6.1.2021 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Young adults often browse the internet for self-triage and diagnosis. More sophisticated digital platforms such as symptom checkers have recently become pervasive; however, little is known about their use. Objective: The aim of this study was to understand young adults’ (18-34 years old) perspectives on the use of the Google search engine versus a symptom checker, as well as to identify the barriers and enablers for using a symptom checker for self-triage and self-diagnosis. Methods: A qualitative descriptive case study research design was used. Semistructured interviews were conducted with 24 young adults enrolled in a university in Ontario, Canada. All participants were given a clinical vignette and were asked to use a symptom checker (WebMD Symptom Checker or Babylon Health) while thinking out loud, and were asked questions regarding their experience. Interviews were audio-recorded, transcribed, and imported into the NVivo software program. Inductive thematic analysis was conducted independently by two researchers. Results: Using the Google search engine was perceived to be faster and more customizable (ie, ability to enter symptoms freely in the search engine) than a symptom checker; however, a symptom checker was perceived to be useful for a more personalized assessment. After having used a symptom checker, most of the participants believed that the platform needed improvement in the areas of accuracy, security and privacy, and medical jargon used. Given these limitations, most participants believed that symptom checkers could be more useful for self-triage than for self-diagnosis. Interestingly, more than half of the participants were not aware of symptom checkers prior to this study and most believed that this lack of awareness about the existence of symptom checkers hindered their use. Conclusions: Awareness related to the existence of symptom checkers and their integration into the health care system are required to maximize benefits related to these platforms. Addressing the barriers identified in this study is likely to increase the acceptance and use of symptom checkers by young adults. 
%M 33404515 %R 10.2196/22637 %U https://publichealth.jmir.org/2021/1/e22637 %U https://doi.org/10.2196/22637 %U http://www.ncbi.nlm.nih.gov/pubmed/33404515 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e21983 %T Artificial Intelligence for the Prediction of Helicobacter pylori Infection in Endoscopic Images: Systematic Review and Meta-Analysis of Diagnostic Test Accuracy %A Bang,Chang Seok %A Lee,Jae Jun %A Baik,Gwang Ho %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, , Republic of Korea, 82 33 240 5000, csbang@hallym.ac.kr %K artificial intelligence %K convolutional neural network %K deep learning %K machine learning %K endoscopy %K Helicobacter pylori %D 2020 %7 16.9.2020 %9 Review %J J Med Internet Res %G English %X Background: Helicobacter pylori plays a central role in the development of gastric cancer, and prediction of H pylori infection by visual inspection of the gastric mucosa is an important function of endoscopy. However, there are currently no established methods of optical diagnosis of H pylori infection using endoscopic images. Definitive diagnosis requires endoscopic biopsy. Artificial intelligence (AI) has been increasingly adopted in clinical practice, especially for image recognition and classification. Objective: This study aimed to evaluate the diagnostic test accuracy of AI for the prediction of H pylori infection using endoscopic images. Methods: Two independent evaluators searched core databases. The inclusion criteria included studies with endoscopic images of H pylori infection and with application of AI for the prediction of H pylori infection presenting diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Ultimately, 8 studies were identified. Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve of AI for the prediction of H pylori infection were 0.87 (95% CI 0.72-0.94), 0.86 (95% CI 0.77-0.92), 40 (95% CI 15-112), and 0.92 (95% CI 0.90-0.94), respectively, in the 1719 patients (385 patients with H pylori infection vs 1334 controls). Meta-regression showed that methodological quality and the number of patients included in each study accounted for the heterogeneity. There was no evidence of publication bias. The accuracy of the AI algorithm reached 82% for discrimination between noninfected images and posteradication images. Conclusions: An AI algorithm is a reliable tool for endoscopic diagnosis of H pylori infection. The limitations of lacking external validation and of having been conducted only in Asia should be overcome. 
Trial Registration: PROSPERO CRD42020175957; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=175957 %M 32936088 %R 10.2196/21983 %U http://www.jmir.org/2020/9/e21983/ %U https://doi.org/10.2196/21983 %U http://www.ncbi.nlm.nih.gov/pubmed/32936088 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 9 %P e20995 %T Identifying Key Predictors of Cognitive Dysfunction in Older People Using Supervised Machine Learning Techniques: Observational Study %A Rankin,Debbie %A Black,Michaela %A Flanagan,Bronac %A Hughes,Catherine F %A Moore,Adrian %A Hoey,Leane %A Wallace,Jonathan %A Gill,Chris %A Carlin,Paul %A Molloy,Anne M %A Cunningham,Conal %A McNulty,Helene %+ School of Computing, Engineering and Intelligent Systems, Ulster University, Northland Road, Derry~Londonderry, BT48 7JL, United Kingdom, 44 287167 ext 5841, d.rankin1@ulster.ac.uk %K classification %K supervised machine learning %K cognition %K diet %K aging %K geriatric assessment %D 2020 %7 16.9.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Machine learning techniques, specifically classification algorithms, may be effective to help understand key health, nutritional, and environmental factors associated with cognitive function in aging populations. Objective: This study aims to use classification techniques to identify the key patient predictors that are considered most important in the classification of poorer cognitive performance, which is an early risk factor for dementia. Methods: Data were used from the Trinity-Ulster and Department of Agriculture study, which included detailed information on sociodemographic, clinical, biochemical, nutritional, and lifestyle factors in 5186 older adults recruited from the Republic of Ireland and Northern Ireland, a proportion of whom (987/5186, 19.03%) were followed up 5-7 years later for reassessment. Cognitive function at both time points was assessed using a battery of tests, including the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), with a score <70 classed as poorer cognitive performance. This study trained 3 classifiers—decision trees, Naïve Bayes, and random forests—to classify the RBANS score and to identify key health, nutritional, and environmental predictors of cognitive performance and cognitive decline over the follow-up period. It assessed their performance, taking note of the variables that were deemed important for the optimized classifiers for their computational diagnostics. Results: In the classification of a low RBANS score (<70), our models performed well (F1 score range 0.73-0.93), all highlighting the individual’s score from the Timed Up and Go (TUG) test, the age at which the participant stopped education, and whether or not the participant’s family reported memory concerns to be of key importance. The classification models performed well in classifying a greater rate of decline in the RBANS score (F1 score range 0.66-0.85), also indicating the TUG score to be of key importance, followed by blood indicators: plasma homocysteine, vitamin B6 biomarker (plasma pyridoxal-5-phosphate), and glycated hemoglobin. Conclusions: The results suggest that it may be possible for a health care professional to make an initial evaluation, with a high level of confidence, of the potential for cognitive dysfunction using only a few short, noninvasive questions, thus providing a quick, efficient, and noninvasive way to help them decide whether or not a patient requires a full cognitive evaluation. 
This approach has the potential benefits of making time and cost savings for health service providers and avoiding stress created through unnecessary cognitive assessments in low-risk patients. %M 32936084 %R 10.2196/20995 %U http://medinform.jmir.org/2020/9/e20995/ %U https://doi.org/10.2196/20995 %U http://www.ncbi.nlm.nih.gov/pubmed/32936084 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 9 %P e18689 %T An Intelligent Mobile-Enabled System for Diagnosing Parkinson Disease: Development and Validation of a Speech Impairment Detection System %A Zhang,Liang %A Qu,Yue %A Jin,Bo %A Jing,Lu %A Gao,Zhan %A Liang,Zhanhua %+ Department of Neurology, The First Affiliated Hospital of Dalian Medical University, No.222 Zhongshan Road, Dalian, 116011, China, 86 18098876262, jinglu131129@126.com %K Parkinson disease %K speech disorder %K remote diagnosis %K artificial intelligence %K mobile phone app %K mobile health %D 2020 %7 16.9.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Parkinson disease (PD) is one of the most common neurological diseases. At present, because the exact cause is still unclear, accurate diagnosis and progression monitoring remain challenging. In recent years, exploring the relationship between PD and speech impairment has attracted widespread attention in the academic world. Most of the studies successfully validated the effectiveness of some vocal features. Moreover, the noninvasive nature of speech signal–based testing has pioneered a new way for telediagnosis and telemonitoring. In particular, there is an increasing demand for artificial intelligence–powered tools in the digital health era. Objective: This study aimed to build a real-time speech signal analysis tool for PD diagnosis and severity assessment. Further, the underlying system should be flexible enough to integrate any machine learning or deep learning algorithm. Methods: At its core, the system we built consists of two parts: (1) speech signal processing: both traditional and novel speech signal processing technologies have been employed for feature engineering, which can automatically extract a few linear and nonlinear dysphonia features, and (2) application of machine learning algorithms: some classical regression and classification algorithms from the machine learning field have been tested; we then chose the most efficient algorithms and relevant features. Results: Experimental results showed that our system had an outstanding ability to both diagnose and assess severity of PD. By using both linear and nonlinear dysphonia features, the accuracy reached 88.74% and recall reached 97.03% in the diagnosis task. Meanwhile, mean absolute error was 3.7699 in the assessment task. The system has already been deployed within a mobile app called No Pa. Conclusions: This study performed diagnosis and severity assessment of PD from the perspective of speech disorder detection. The efficiency and effectiveness of the algorithms indirectly validated the practicality of the system. In particular, the system reflects the necessity of a publicly accessible PD diagnosis and assessment system that can perform telediagnosis and telemonitoring of PD. This system can also optimize doctors’ decision-making processes regarding treatments. 
%M 32936086 %R 10.2196/18689 %U http://medinform.jmir.org/2020/9/e18689/ %U https://doi.org/10.2196/18689 %U http://www.ncbi.nlm.nih.gov/pubmed/32936086 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e21573 %T An Innovative Artificial Intelligence–Based App for the Diagnosis of Gestational Diabetes Mellitus (GDM-AI): Development Study %A Shen,Jiayi %A Chen,Jiebin %A Zheng,Zequan %A Zheng,Jiabin %A Liu,Zherui %A Song,Jian %A Wong,Sum Yi %A Wang,Xiaoling %A Huang,Mengqi %A Fang,Po-Han %A Jiang,Bangsheng %A Tsang,Winghei %A He,Zonglin %A Liu,Taoran %A Akinwunmi,Babatunde %A Wang,Chi Chiu %A Zhang,Casper J P %A Huang,Jian %A Ming,Wai-Kit %+ Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, Guangzhou, China, 86 14715485116, wkming@connect.hku.hk %K AI %K application %K disease diagnosis %K maternal health care %K artificial intelligence %K app %K women %K rural %K innovation %K diabetes %K gestational diabetes %K diagnosis %D 2020 %7 15.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Gestational diabetes mellitus (GDM) can cause adverse consequences to both mothers and their newborns. However, pregnant women living in low- and middle-income areas or countries often fail to receive early clinical interventions at local medical facilities due to restricted availability of GDM diagnosis. The outstanding performance of artificial intelligence (AI) in disease diagnosis in previous studies demonstrates its promising applications in GDM diagnosis. Objective: This study aims to investigate the implementation of a well-performing AI algorithm for GDM diagnosis in a setting that requires less medical equipment and fewer staff, and to establish an app based on the AI algorithm. This study also explores possible progress if our app is widely used. Methods: An AI model that included 9 algorithms was trained, with their consent, on 12,304 pregnant outpatients who received a test for GDM in the obstetrics and gynecology department of the First Affiliated Hospital of Jinan University, a local hospital in South China, between November 2010 and October 2017. GDM was diagnosed according to American Diabetes Association (ADA) 2011 diagnostic criteria. Age and fasting blood glucose were chosen as critical parameters. For validation, we performed k-fold cross-validation (k=5) for the internal dataset and used an external validation dataset that included 1655 cases from the Prince of Wales Hospital, the affiliated teaching hospital of the Chinese University of Hong Kong, a non-local hospital. Accuracy, sensitivity, and other criteria were calculated for each algorithm. Results: The areas under the receiver operating characteristic curve (AUROC) on the external validation dataset for support vector machine (SVM), random forest, AdaBoost, k-nearest neighbors (kNN), naive Bayes (NB), decision tree, logistic regression (LR), eXtreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT) were 0.780, 0.657, 0.736, 0.669, 0.774, 0.614, 0.769, 0.742, and 0.757, respectively. SVM also maintained high performance on the other criteria. The specificity for SVM remained 100% in the external validation set, with an accuracy of 88.7%. Conclusions: Our prospective and multicenter study is the first clinical study that supports GDM diagnosis for pregnant women in resource-limited areas, using only a fasting blood glucose value, patients’ age, and a smartphone connected to the internet. 
Our study proved that SVM can achieve accurate diagnosis at lower operational cost and with higher efficacy. Our study (referred to as the GDM-AI study, ie, the study of AI-based diagnosis of GDM) also shows that our app has a promising future in improving the quality of maternal health for pregnant women, precision medicine, and long-distance medical care. We recommend that future work expand the dataset scope and replicate the process to validate the performance of the AI algorithms. %M 32930674 %R 10.2196/21573 %U https://www.jmir.org/2020/9/e21573 %U https://doi.org/10.2196/21573 %U http://www.ncbi.nlm.nih.gov/pubmed/32930674
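For readers who want a concrete sense of the workflow the GDM-AI abstract above describes (an SVM classifier evaluated with 5-fold cross-validation on two predictors, age and fasting blood glucose), the following minimal sketch uses scikit-learn on synthetic, hypothetical data. It is not the authors' implementation; the cohort size, feature distributions, kernel, and scoring choices are assumptions made only for illustration. Feature scaling is included because SVMs are sensitive to feature scale; the study's actual preprocessing is not specified in the abstract.

```python
# Minimal sketch, not the GDM-AI authors' code: it imitates the workflow the
# abstract describes (SVM classifier, 5-fold cross-validation, two predictors:
# age and fasting blood glucose) on synthetic, hypothetical data.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 500                                       # hypothetical cohort size
age = rng.normal(30, 5, n)                    # years
glucose = rng.normal(4.8, 0.6, n)             # fasting blood glucose, mmol/L
# Hypothetical labels: GDM risk loosely increases with age and glucose.
risk = 0.08 * (age - 30) + 1.5 * (glucose - 4.8) + rng.normal(0, 0.5, n)
y = (risk > 0.6).astype(int)
X = np.column_stack([age, glucose])

# Scale features, then fit an RBF-kernel SVM; score each fold by AUROC.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"5-fold AUROC: {scores.mean():.3f} (SD {scores.std():.3f})")
```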