TY - JOUR AU - Popoff, Benjamin AU - Cabon, Sandie AU - Cuggia, Marc AU - Bouzillé, Guillaume AU - Clavier, Thomas PY - 2025/4/23 TI - Expectations of Intensive Care Physicians Regarding an AI-Based Decision Support System for Weaning From Continuous Renal Replacement Therapy: Predevelopment Survey Study JO - JMIR Med Inform SP - e63709 VL - 13 KW - clinical decision support system KW - artificial intelligence KW - decision support KW - decision making KW - clinical decision making KW - survey study KW - intensive care physicians KW - renal replacement therapy KW - therapeutic KW - ICU KW - user-centered design KW - cross-sectional survey KW - survey KW - French KW - physician KW - questionnaire KW - AI tools KW - user-centered N2 - Background: Critically ill patients in intensive care units (ICUs) require continuous monitoring, generating vast amounts of data. Clinical decision support systems (CDSS) leveraging artificial intelligence (AI) technologies have shown promise in improving diagnostic, prognostic, and therapeutic decision-making. However, these models are rarely implemented in clinical practice. Objective: The aim of this study was to survey ICU physicians to understand their expectations, opinions, and level of knowledge regarding a proposed AI-based CDSS for continuous renal replacement therapy (CRRT) weaning, a clinical decision-making process that is still complex and lacking in guidelines. This will be used to guide the development of an AI-based CDSS on which our team is working to ensure user-centered design and successful integration into clinical practice. Methods: A prospective cross-sectional survey of French-speaking physicians with clinical activity in intensive care was conducted between December 2023 and April 2024. 
The questionnaire consisted of 20 questions structured around 4 axes: overview of the problem and current practices concerning weaning from CRRT, opinion on AI-based CDSS, implementation in daily clinical practice, and real-life operation and willingness to adopt the CDSS in everyday practice. Statistical analyses included Wilcoxon rank sum tests for quantitative variables and χ2 or Fisher exact tests for qualitative variables, with multivariate analyses performed using ordinal logistic regression. Results: A total of 171 complete responses were received. Physicians expressed an interest in a CDSS for CRRT weaning, with 70.2% (120/171) viewing AI-based CDSS favorably. Opinions were split regarding the difficulty of the weaning decision itself, with 46.2% (79/171) disagreeing that it is challenging, while 31.6% (54/171) agreed. However, 66.1% (113/171) of respondents supported the value of an AI-based CDSS to assist them in this decision, with younger physicians showing stronger support (81.8%, 27/33 vs 62.3%, 86/138; P=.01). Most respondents (163/171, 95.3%) emphasized the importance of understanding the criteria used by the model to make its predictions. Conclusions: Our findings highlight an optimistic attitude among ICU physicians toward AI-based CDSS for CRRT weaning, emphasizing the need for transparency, integration into existing workflows, and alignment with clinicians' decision-making processes. Actionable recommendations include incorporating key variables such as urine output and biological parameters, defining probability thresholds for recommendations, and ensuring model transparency to facilitate the successful adoption and integration into clinical practice. The methodology of this survey may inform further predevelopment studies accompanying AI-based CDSS projects. 
UR - https://medinform.jmir.org/2025/1/e63709 UR - http://dx.doi.org/10.2196/63709 ID - info:doi/10.2196/63709 ER - TY - JOUR AU - Lazli, Lilia PY - 2025/4/21 TI - Improved Alzheimer Disease Diagnosis With a Machine Learning Approach and Neuroimaging: Case Study Development JO - JMIRx Med SP - e60866 VL - 6 KW - Alzheimer disease KW - computer-aided diagnosis system KW - machine learning KW - principal component analysis KW - linear discriminant analysis KW - t-distributed stochastic neighbor embedding KW - feedforward neural network KW - vision transformer architecture KW - support vector machines KW - magnetic resonance imaging KW - positron emission tomography imaging KW - Open Access Series of Imaging Studies KW - Alzheimer's Disease Neuroimaging Initiative KW - OASIS KW - ADNI N2 - Background: Alzheimer disease (AD) is a severe neurological brain disorder. While not curable, earlier detection can help improve symptoms substantially. Machine learning (ML) models are popular and well suited for medical image processing tasks such as computer-aided diagnosis. These techniques can improve the process for an accurate diagnosis of AD. Objective: In this paper, a complete computer-aided diagnosis system for the diagnosis of AD is presented. We investigate the performance of some of the most used ML techniques for AD detection and classification using neuroimages from the Open Access Series of Imaging Studies (OASIS) and Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets. Methods: The system uses artificial neural networks (ANNs) and support vector machines (SVMs) as classifiers, and dimensionality reduction techniques as feature extractors. To retrieve features from the neuroimages, we used principal component analysis (PCA), linear discriminant analysis, and t-distributed stochastic neighbor embedding. These features are fed into feedforward neural networks (FFNNs) and SVM-based ML classifiers. 
Furthermore, we applied the vision transformer (ViT)-based ANNs in conjunction with data augmentation to distinguish patients with AD from healthy controls. Results: Experiments were performed on magnetic resonance imaging and positron emission tomography scans. The OASIS dataset included a total of 300 patients, while the ADNI dataset included 231 patients. For OASIS, 90 (30%) patients were healthy and 210 (70%) were severely impaired by AD. Likewise for the ADNI database, a total of 149 (64.5%) patients with AD were detected and 82 (35.5%) patients were used as healthy controls. A significant difference was established between healthy patients and patients with AD (P=.02). We examined the effectiveness of the three feature extractors and classifiers using 5-fold cross-validation and confusion matrix-based standard classification metrics, namely, accuracy, sensitivity, specificity, precision, F1-score, and area under the receiver operating characteristic curve (AUROC). Compared with state-of-the-art methods, the success rate was satisfactory for all the created ML models, but SVM and FFNN performed best with the PCA extractor, while the ViT classifier performed best with more data. The data augmentation/ViT approach worked better overall, achieving accuracies of 93.2% (sensitivity=87.2, specificity=90.5, precision=87.6, F1-score=88.7, and AUROC=92) for OASIS and 90.4% (sensitivity=85.4, specificity=88.6, precision=86.9, F1-score=88, and AUROC=90) for ADNI. Conclusions: Effective ML models using neuroimaging data could help physicians working on AD diagnosis and will assist them in prescribing timely treatment to patients with AD. Good results were obtained on the OASIS and ADNI datasets with all the proposed classifiers, namely, SVM, FFNN, and ViTs. However, the results show that the ViT model is much better at predicting AD than the other models when a sufficient amount of data are available to perform the training. 
This highlights that the data augmentation process could impact the overall performance of the ViT model. UR - https://xmed.jmir.org/2025/1/e60866 UR - http://dx.doi.org/10.2196/60866 ID - info:doi/10.2196/60866 ER - TY - JOUR AU - Shaw, Matthew Kendrick AU - Shao, Yu-Ping AU - Ghanta, Manohar AU - Junior, Moura Valdery AU - Kimchi, Y. Eyal AU - Houle, T. Timothy AU - Akeju, Oluwaseun AU - Westover, Brandon Michael PY - 2025/4/18 TI - Daily Automated Prediction of Delirium Risk in Hospitalized Patients: Model Development and Validation JO - JMIR Med Inform SP - e60442 VL - 13 KW - delirium KW - prediction model KW - machine learning KW - boosted trees KW - model development KW - validation KW - AI KW - artificial intelligence KW - screening KW - prevention KW - develop KW - logistic regression KW - vitals KW - vital signs KW - gender KW - age KW - prevent N2 - Background: Delirium is common in hospitalized patients and is correlated with increased morbidity and mortality. Despite this, delirium is underdiagnosed, and many institutions do not have sufficient resources to consistently apply effective screening and prevention. Objective: This study aims to develop a machine learning algorithm to identify patients at the highest risk of delirium in the hospital each day in an automated fashion based on data available in the electronic medical record, reducing the barrier to large-scale delirium screening. Methods: We developed and compared multiple machine learning models on a retrospective dataset of all hospitalized adult patients with recorded Confusion Assessment Method (CAM) screens at a major academic medical center from April 2, 2016, to January 16, 2019, comprising 23,006 patients. The patient's age, gender, and all available laboratory values, vital signs, prior CAM screens, and medication administrations were used as potential predictors. 
Four machine learning approaches were investigated: logistic regression with L1-regularization, multilayer perceptrons, random forests, and boosted trees. Model development used 80% of the patients; the remaining 20% was reserved for testing the final models. Laboratory values, vital signs, medications, gender, and age were used to predict a positive CAM screen in the next 24 hours. Results: The boosted tree model achieved the greatest predictive power, with an area under the receiver operating characteristic curve (AUROC) of 0.92 (95% CI 0.913-0.922), followed by the random forest (AUROC 0.91, 95% CI 0.909-0.918), multilayer perceptron (AUROC 0.86, 95% CI 0.850-0.861), and logistic regression (AUROC 0.85, 95% CI 0.841-0.852). These AUROCs decreased to 0.78-0.82 and 0.74-0.80 when limited to patients who did not currently have delirium or had never had delirium, respectively. Conclusions: A boosted tree machine learning model was able to identify hospitalized patients at elevated risk for delirium in the next 24 hours. This may allow for automated delirium risk screening and more precise targeting of proven and investigational interventions to prevent delirium. UR - https://medinform.jmir.org/2025/1/e60442 UR - http://dx.doi.org/10.2196/60442 UR - http://www.ncbi.nlm.nih.gov/pubmed/39721068 ID - info:doi/10.2196/60442 ER - TY - JOUR AU - Wang, Longyun AU - Wang, Zeyu AU - Zhao, Bowei AU - Wang, Kai AU - Zheng, Jingying AU - Zhao, Lijing PY - 2025/4/18 TI - Diagnosis Test Accuracy of Artificial Intelligence for Endometrial Cancer: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e66530 VL - 27 KW - artificial intelligence KW - endometrial cancer KW - diagnostic test accuracy KW - systematic review KW - meta-analysis KW - machine learning KW - deep learning N2 - Background: Endometrial cancer is one of the most common gynecological tumors, and early screening and diagnosis are crucial for its treatment. 
Research on the application of artificial intelligence (AI) in the diagnosis of endometrial cancer is increasing, but there is currently no comprehensive meta-analysis to evaluate the diagnostic accuracy of AI in screening for endometrial cancer. Objective: This paper presents a systematic review of AI-based endometrial cancer screening, which is needed to clarify its diagnostic accuracy and provide evidence for the application of AI technology in screening for endometrial cancer. Methods: A search was conducted across PubMed, Embase, Cochrane Library, Web of Science, and Scopus databases to include studies published in English, which evaluated the performance of AI in endometrial cancer screening. A total of 2 independent reviewers screened the titles and abstracts, and the quality of the selected studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. The certainty of the diagnostic test evidence was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. Results: A total of 13 studies were included, and the hierarchical summary receiver operating characteristic model used for the meta-analysis showed that the overall sensitivity of AI-based endometrial cancer screening was 86% (95% CI 79%-90%) and specificity was 92% (95% CI 87%-95%). Subgroup analysis revealed similar results across AI type, study region, publication year, and study type, but the overall quality of evidence was low. Conclusions: AI-based endometrial cancer screening can effectively detect patients with endometrial cancer, but large-scale population studies are needed in the future to further clarify the diagnostic accuracy of AI in screening for endometrial cancer. 
Trial Registration: PROSPERO CRD42024519835; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024519835 UR - https://www.jmir.org/2025/1/e66530 UR - http://dx.doi.org/10.2196/66530 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66530 ER - TY - JOUR AU - Park, Soo Ji AU - Park, Sa-Yoon AU - Moon, Won Jae AU - Kim, Kwangsoo AU - Suh, In Dong PY - 2025/4/18 TI - Artificial Intelligence Models for Pediatric Lung Sound Analysis: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e66491 VL - 27 KW - machine learning KW - respiratory disease classification KW - wheeze detection KW - auscultation KW - mel-spectrogram KW - abnormal lung sound detection KW - artificial intelligence KW - pediatric KW - lung sound analysis KW - systematic review KW - asthma KW - pneumonia KW - children KW - morbidity KW - mortality KW - diagnostic KW - respiratory pathology N2 - Background: Pediatric respiratory diseases, including asthma and pneumonia, are major causes of morbidity and mortality in children. Auscultation of lung sounds is a key diagnostic tool but is prone to subjective variability. The integration of artificial intelligence (AI) and machine learning (ML) with electronic stethoscopes offers a promising approach for automated and objective lung sound analysis. Objective: This systematic review and meta-analysis assesses the performance of ML models in pediatric lung sound analysis. The study evaluates the methodologies, model performance, and database characteristics while identifying limitations and future directions for clinical implementation. Methods: A systematic search was conducted in Medline via PubMed, Embase, Web of Science, OVID, and IEEE Xplore for studies published between January 1, 1990, and December 16, 2024. Inclusion criteria are as follows: studies developing ML models for pediatric lung sound classification with a defined database, physician-labeled reference standard, and reported performance metrics. 
Exclusion criteria are as follows: studies focusing on adults, cardiac auscultation, validation of existing models, or lacking performance metrics. Risk of bias was assessed using a modified Quality Assessment of Diagnostic Accuracy Studies (version 2) framework. Data were extracted on study design, dataset, ML methods, feature extraction, and classification tasks. Bivariate meta-analysis was performed for binary classification tasks, including wheezing and abnormal lung sound detection. Results: A total of 41 studies met the inclusion criteria. The most common classification task was binary detection of abnormal lung sounds, particularly wheezing. Pooled sensitivity and specificity for wheeze detection were 0.902 (95% CI 0.726-0.970) and 0.955 (95% CI 0.762-0.993), respectively. For abnormal lung sound detection, pooled sensitivity was 0.907 (95% CI 0.816-0.956) and specificity 0.877 (95% CI 0.813-0.921). The most frequently used feature extraction methods were Mel-spectrogram, Mel-frequency cepstral coefficients, and short-time Fourier transform. Convolutional neural networks were the predominant ML model, often combined with recurrent neural networks or residual network architectures. However, high heterogeneity in dataset size, annotation methods, and evaluation criteria was observed. Most studies relied on small, single-center datasets, limiting generalizability. Conclusions: ML models show high accuracy in pediatric lung sound analysis but face limitations due to dataset heterogeneity, lack of standard guidelines, and limited external validation. Future research should focus on standardized protocols and the development of large-scale, multicenter datasets to improve model robustness and clinical implementation. 
UR - https://www.jmir.org/2025/1/e66491 UR - http://dx.doi.org/10.2196/66491 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66491 ER - TY - JOUR AU - Matulis 3rd, Charles John AU - Greenwood, Jason AU - Eberle, Michele AU - Anderson, Benjamin AU - Blair, David AU - Chaudhry, Rajeev PY - 2025/4/11 TI - Implementation of an Integrated, Clinical Decision Support Tool at the Point of Antihypertensive Medication Refill Request to Improve Hypertension Management: Controlled Pre-Post Study JO - JMIR Med Inform SP - e70752 VL - 13 KW - clinical decision support systems KW - population health KW - hypertension KW - electronic health records N2 - Background: Improving processes regarding the management of electronic health record (EHR) requests for chronic antihypertensive medication renewals may represent an opportunity to enhance blood pressure (BP) management at the individual and population level. Objective: This study aimed to evaluate the effectiveness of the eRx HTN Chart Check, an integrated clinical decision support tool available at the point of antihypertensive medication refill request, in facilitating enhanced provider management of chronic hypertension. Methods: The study was conducted at two Mayo Clinic sites (Northwest Wisconsin Family Medicine and Rochester Community Internal Medicine practices), with control groups in comparable Mayo Clinic practices. The intervention integrated structured clinical data, including recent BP readings, laboratory results, and visit dates, into the electronic prescription renewal interface to facilitate prescriber decision-making regarding hypertension management. A difference-in-differences (DID) design compared pre- and postintervention hypertension control rates between the intervention and control groups. Data were collected from the Epic EHR system and analyzed using linear regression models. Results: The baseline BP control rates were slightly higher in intervention clinics. 
Postimplementation, no significant improvement in population-level hypertension control was observed (DID estimate: 0.07%, 95% CI -4.0% to 4.1%; P=.97). Of the 19,968 refill requests processed, 46% met all monitoring criteria. However, clinician approval rates remained high (90%), indicating minimal impact on prescribing behavior. Conclusions: Despite successful implementation, the tool did not significantly improve hypertension control, possibly due to competing quality initiatives and high in-basket volumes. Future iterations should focus on enhanced integration with other decision support tools and strategies to improve clinician engagement and patient outcomes. Further research is needed to optimize chronic disease management through EHR-integrated decision support systems. UR - https://medinform.jmir.org/2025/1/e70752 UR - http://dx.doi.org/10.2196/70752 ID - info:doi/10.2196/70752 ER - TY - JOUR AU - Lim, De Ming AU - Connie, Tee AU - Goh, Ong Michael Kah AU - Saedon, 'Izzati Nor PY - 2025/4/8 TI - Model-Based Feature Extraction and Classification for Parkinson Disease Screening Using Gait Analysis: Development and Validation Study JO - JMIR Aging SP - e65629 VL - 8 KW - model-based features KW - gait analysis KW - Parkinson disease KW - computer vision KW - support vector machine N2 - Background: Parkinson disease (PD) is a progressive neurodegenerative disorder that affects motor coordination, leading to gait abnormalities. Early detection of PD is crucial for effective management and treatment. Traditional diagnostic methods often require invasive procedures or are performed when the disease has significantly progressed. Therefore, there is a need for noninvasive techniques that can identify early motor symptoms, particularly those related to gait. Objective: The study aimed to develop a noninvasive approach for the early detection of PD by analyzing model-based gait features. 
The primary focus is on identifying subtle gait abnormalities associated with PD using kinematic characteristics. Methods: Data were collected through controlled video recordings of participants performing the timed up and go (TUG) assessment, with particular emphasis on the turning phase. The kinematic features analyzed include shoulder distance, step length, stride length, knee and hip angles, leg and arm symmetry, and trunk angles. These features were processed using advanced filtering techniques and analyzed through machine learning methods to distinguish between normal and PD-affected gait patterns. Results: The analysis of kinematic features during the turning phase of the TUG assessment revealed that individuals with PD exhibited subtle gait abnormalities, such as freezing of gait, reduced step length, and asymmetrical movements. The model-based features proved effective in differentiating between normal and PD-affected gait, demonstrating the potential of this approach in early detection. Conclusions: This study presents a promising noninvasive method for the early detection of PD by analyzing specific gait features during the turning phase of the TUG assessment. The findings suggest that this approach could serve as a sensitive and accurate tool for diagnosing and monitoring PD, potentially leading to earlier intervention and improved patient outcomes. UR - https://aging.jmir.org/2025/1/e65629 UR - http://dx.doi.org/10.2196/65629 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65629 ER - TY - JOUR AU - Zheng, Rui AU - Jiang, Xiao AU - Shen, Li AU - He, Tianrui AU - Ji, Mengting AU - Li, Xingyi AU - Yu, Guangjun PY - 2025/4/7 TI - Investigating Clinicians' 
Intentions and Influencing Factors for Using an Intelligence-Enabled Diagnostic Clinical Decision Support System in Health Care Systems: Cross-Sectional Survey JO - J Med Internet Res SP - e62732 VL - 27 KW - artificial intelligence KW - clinical decision support systems KW - task-technology fit KW - technology acceptance model KW - perceived risk KW - performance expectations KW - intention to use N2 - Background: An intelligence-enabled clinical decision support system (CDSS) is a computerized system that integrates medical knowledge, patient data, and clinical guidelines to assist health care providers in making clinical decisions. Research studies have shown that CDSS utilization rates have not met expectations. Clinicians' intentions and their attitudes determine the use and promotion of CDSS in clinical practice. Objective: The aim of this study was to enhance the successful utilization of CDSS by analyzing the pivotal factors that influence clinicians' intentions to adopt it and by putting forward targeted management recommendations. Methods: This study proposed a research model grounded in the task-technology fit model and the technology acceptance model, which was then tested through a cross-sectional survey. The measurement instrument comprised demographic characteristics, multi-item scales, and an open-ended query regarding areas where clinicians perceived the system required improvement. We leveraged structural equation modeling to assess the direct and indirect effects of "task-technology fit" and "perceived ease of use" on clinicians' intentions to use the CDSS when mediated by "performance expectation" and "perceived risk." We collated and analyzed the responses to the open-ended question. Results: We collected a total of 247 questionnaires. The model explained 65.8% of the variance in use intention. Performance expectations (β=0.228; P<.001) and perceived risk (β=-0.579; P<.001) were both significant predictors of use intention. 
Task-technology fit (β=-0.281; P<.001) and perceived ease of use (β=-0.377; P<.001) negatively affected perceived risk. Perceived risk (β=-0.308; P<.001) negatively affected performance expectations. Task-technology fit positively affected perceived ease of use (β=0.692; P<.001) and performance expectations (β=0.508; P<.001). Task characteristics (β=0.168; P<.001) and technology characteristics (β=0.749; P<.001) positively affected task-technology fit. Contrary to expectations, perceived ease of use (β=0.108; P=.07) did not have a significant impact on use intention. From the open-ended question, 3 main themes emerged regarding clinicians' perceived deficiencies in CDSS: system security risks, personalized interaction, and seamless integration. Conclusions: Perceived risk and performance expectations were direct determinants of clinicians' adoption of CDSS, significantly influenced by task-technology fit and perceived ease of use. In the future, increasing transparency within CDSS and fostering trust between clinicians and technology should be prioritized. Furthermore, focusing on personalized interactions and ensuring seamless integration into clinical workflows are crucial steps moving forward. 
UR - https://www.jmir.org/2025/1/e62732 UR - http://dx.doi.org/10.2196/62732 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/62732 ER - TY - JOUR AU - Templeton, Michael John AU - Poellabauer, Christian AU - Schneider, Sandra AU - Rahimi, Morteza AU - Braimoh, Taofeek AU - Tadamarry, Fhaheem AU - Margolesky, Jason AU - Burke, Shanna AU - Al Masry, Zeina PY - 2025/4/4 TI - Modernizing the Staging of Parkinson Disease Using Digital Health Technology JO - J Med Internet Res SP - e63105 VL - 27 KW - digital health KW - Parkinson disease KW - disease classification KW - wearables KW - personalized medicine KW - neurocognition KW - artificial intelligence KW - AI UR - https://www.jmir.org/2025/1/e63105 UR - http://dx.doi.org/10.2196/63105 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63105 ER - TY - JOUR AU - Lewis, Claire AU - Groarke, Jenny AU - Graham-Wisener, Lisa AU - James, Jacqueline PY - 2025/4/2 TI - Public Awareness of and Attitudes Toward the Use of AI in Pathology Research and Practice: Mixed Methods Study JO - J Med Internet Res SP - e59591 VL - 27 KW - artificial intelligence KW - AI KW - public opinion KW - pathology KW - health care KW - public awareness KW - survey N2 - Background: The last decade has witnessed major advances in the development of artificial intelligence (AI) technologies for use in health care. One of the most promising areas of research that has potential clinical utility is the use of AI in pathology to aid cancer diagnosis and management. While the value of using AI to improve the efficiency and accuracy of diagnosis cannot be underestimated, there are challenges in the development and implementation of such technologies. Notably, questions remain about public support for the use of AI to assist in pathological diagnosis and for the use of health care data, including data obtained from tissue samples, to train algorithms. 
Objective: This study aimed to investigate public awareness of and attitudes toward AI in pathology research and practice. Methods: A nationally representative, cross-sectional, web-based mixed methods survey (N=1518) was conducted to assess the UK public's awareness of and views on the use of AI in pathology research and practice. Respondents were recruited via Prolific, an online research platform. To be eligible for the study, participants had to be aged >18 years, be UK residents, and have the capacity to express their own opinion. Respondents answered 30 closed-ended questions and 2 open-ended questions. Sociodemographic information and previous experience with cancer were collected. Descriptive and inferential statistics were used to analyze quantitative data; qualitative data were analyzed thematically. Results: Awareness was low, with only 23.19% (352/1518) of the respondents somewhat or moderately aware of AI being developed for use in pathology. Most did not support a diagnosis of cancer (908/1518, 59.82%) or a diagnosis based on biomarkers (694/1518, 45.72%) being made using AI only. However, most (1478/1518, 97.36%) supported diagnoses made by pathologists with AI assistance. The adjusted odds ratio (aOR) for supporting AI in cancer diagnosis and management was higher for men (aOR 1.34, 95% CI 1.02-1.75). Greater awareness (aOR 1.25, 95% CI 1.10-1.42), greater trust in data security and privacy protocols (aOR 1.04, 95% CI 1.01-1.07), and more positive beliefs (aOR 1.27, 95% CI 1.20-1.36) also increased support, whereas identifying more risks reduced the likelihood of support (aOR 0.80, 95% CI 0.73-0.89). In total, 3 main themes emerged from the qualitative data: bringing the public along, the human in the loop, and more hard evidence needed, indicating conditional support for AI in pathology with human decision-making oversight, robust measures for data handling and protection, and evidence for AI benefit and effectiveness. 
Conclusions: Awareness of AI's potential use in pathology was low, but attitudes were positive, with high but conditional support. Challenges remain, particularly among women, regarding AI use in cancer diagnosis and management. Apprehension persists about the access to and use of health care data by private organizations. UR - https://www.jmir.org/2025/1/e59591 UR - http://dx.doi.org/10.2196/59591 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59591 ER - TY - JOUR AU - Xu, He-Li AU - Li, Xiao-Ying AU - Jia, Ming-Qian AU - Ma, Qi-Peng AU - Zhang, Ying-Hua AU - Liu, Fang-Hua AU - Qin, Ying AU - Chen, Yu-Han AU - Li, Yu AU - Chen, Xi-Yang AU - Xu, Yi-Lin AU - Li, Dong-Run AU - Wang, Dong-Dong AU - Huang, Dong-Hui AU - Xiao, Qian AU - Zhao, Yu-Hong AU - Gao, Song AU - Qin, Xue AU - Tao, Tao AU - Gong, Ting-Ting AU - Wu, Qi-Jun PY - 2025/3/24 TI - AI-Derived Blood Biomarkers for Ovarian Cancer Diagnosis: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e67922 VL - 27 KW - artificial intelligence KW - AI KW - blood biomarker KW - ovarian cancer KW - diagnosis KW - PRISMA N2 - Background: Emerging evidence underscores the potential application of artificial intelligence (AI) in discovering noninvasive blood biomarkers. However, the diagnostic value of AI-derived blood biomarkers for ovarian cancer (OC) remains inconsistent. Objective: We aimed to evaluate the research quality and the validity of AI-based blood biomarkers in OC diagnosis. Methods: A systematic search was performed in the MEDLINE, Embase, IEEE Xplore, PubMed, Web of Science, and the Cochrane Library databases. Studies examining the diagnostic accuracy of AI in discovering OC blood biomarkers were identified. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-AI tool. Pooled sensitivity, specificity, and area under the curve (AUC) were estimated using a bivariate model for the diagnostic meta-analysis. 
Results: A total of 40 studies were ultimately included. Most (n=31, 78%) included studies were evaluated as low risk of bias. Overall, the pooled sensitivity, specificity, and AUC were 85% (95% CI 83%-87%), 91% (95% CI 90%-92%), and 0.95 (95% CI 0.92-0.96), respectively. For contingency tables with the highest accuracy, the pooled sensitivity, specificity, and AUC were 95% (95% CI 90%-97%), 97% (95% CI 95%-98%), and 0.99 (95% CI 0.98-1.00), respectively. Stratification by AI algorithms revealed higher sensitivity and specificity in studies using machine learning (sensitivity=85% and specificity=92%) compared to those using deep learning (sensitivity=77% and specificity=85%). In addition, studies using serum reported substantially higher sensitivity (94%) and specificity (96%) than those using plasma (sensitivity=83% and specificity=91%). Stratification by external validation demonstrated significantly higher specificity in studies with external validation (specificity=94%) compared to those without external validation (specificity=89%), while the reverse was observed for sensitivity (74% vs 90%). No publication bias was detected in this meta-analysis. Conclusions: AI algorithms demonstrate satisfactory performance in the diagnosis of OC using blood biomarkers and are anticipated to become an effective diagnostic modality in the future, potentially avoiding unnecessary surgeries. Future research is warranted to incorporate external validation into AI diagnostic models, as well as to prioritize the adoption of deep learning methodologies. Trial Registration: PROSPERO CRD42023481232; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023481232 UR - https://www.jmir.org/2025/1/e67922 UR - http://dx.doi.org/10.2196/67922 UR - http://www.ncbi.nlm.nih.gov/pubmed/40126546 ID - info:doi/10.2196/67922 ER - TY - JOUR AU - Amagai, Saki AU - Kaat, J. Aaron AU - Fox, S. Rina AU - Ho, H. Emily AU - Pila, Sarah AU - Kallen, A. Michael AU - Schalet, D. Benjamin AU - Nowinski, J. 
Cindy AU - Gershon, C. Richard PY - 2025/3/21 TI - Customizing Computerized Adaptive Test Stopping Rules for Clinical Settings Using the Negative Affect Subdomain of the NIH Toolbox Emotion Battery: Simulation Study JO - JMIR Form Res SP - e60215 VL - 9 KW - computerized adaptive testing KW - CAT KW - stopping rules KW - NIH Toolbox KW - reliability KW - test burden KW - clinical setting KW - patient-reported outcome KW - clinician N2 - Background: Patient-reported outcome measures are crucial for informed medical decisions and evaluating treatments. However, they can be burdensome for patients and sometimes lack the reliability clinicians need for clear clinical interpretations. Objective: We aimed to assess the extent to which applying alternative stopping rules can increase reliability for clinical use while minimizing the burden of computerized adaptive tests (CATs). Methods: CAT simulations were conducted on 3 adult item banks in the NIH Toolbox for Assessment of Neurological and Behavioral Function Emotion Battery; the item banks were in the Negative Affect subdomain (ie, Anger Affect, Fear Affect, and Sadness) and contained at least 8 items. In the originally applied NIH Toolbox CAT stopping rules, the CAT was stopped if the score SE reached <0.3 before 12 items were administered. We first contrasted this with an SE-change rule in a planned simulation analysis. We then contrasted the original rules with fixed-length CATs (4-12 items), a reduction of the maximum number of items to 8, and other modifications in post hoc analyses. Burden was measured by the number of items administered per simulation, precision by the percentage of assessments yielding reliability cutoffs (0.85, 0.90, and 0.95), and accurate score recovery by the root mean squared error between the generating θ and the CAT-estimated "expected a posteriori"-based θ. 
Results: In general, relative to the original rules, the alternative stopping rules slightly decreased burden while also increasing the proportion of assessments achieving high reliability for the adult banks; however, the SE-change rule and fixed-length CATs with 8 or fewer items also notably increased assessments yielding reliability <0.85. Among the alternative rules explored, the reduced maximum stopping rule best balanced precision and parsimony, presenting another option beyond the original rules. Conclusions: Our findings demonstrate the challenges in attempting to reduce test burden while also achieving score precision for clinical use. Stopping rules should be modified in accordance with the context of the study population and the purpose of the study. UR - https://formative.jmir.org/2025/1/e60215 UR - http://dx.doi.org/10.2196/60215 ID - info:doi/10.2196/60215 ER - TY - JOUR AU - Chetla, Nitin AU - Chen, Matthew AU - Chang, Joseph AU - Smith, Aaron AU - Hage, Rajai Tamer AU - Patel, Romil AU - Gardner, Alana AU - Bryer, Bridget PY - 2025/3/21 TI - Assessing the Diagnostic Accuracy of ChatGPT-4 in Identifying Diverse Skin Lesions Against Squamous and Basal Cell Carcinoma JO - JMIR Dermatol SP - e67299 VL - 8 KW - chatbot KW - ChatGPT KW - ChatGPT-4 KW - squamous cell carcinoma KW - basal cell carcinoma KW - skin cancer KW - skin cancer detection KW - dermatoscopic image analysis KW - skin lesion differentiation KW - dermatologist KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - AI in dermatology KW - algorithm KW - model KW - analytics KW - diagnostic accuracy UR - https://derma.jmir.org/2025/1/e67299 UR - http://dx.doi.org/10.2196/67299 ID - info:doi/10.2196/67299 ER - TY - JOUR AU - Mansoor, Masab AU - Ibrahim, F. 
Andrew AU - Grindem, David AU - Baig, Asad PY - 2025/3/19 TI - Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance JO - JMIRx Med SP - e65263 VL - 6 KW - natural language processing KW - NLP KW - machine learning KW - ML KW - artificial intelligence KW - language model KW - large language model KW - LLM KW - generative pretrained transformer KW - GPT KW - pediatrics N2 - Background: Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. Objective: This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings. Methods: This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. Results: The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%-88%), and specificity of 90% (95% CI 87%-93%), comparable to pediatricians' accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87%; 6-12 years: 47/53, 89%; 13-18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). 
For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80%) but comparable to pediatricians (17/20, 85%; P=.62). Conclusions: This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation. UR - https://xmed.jmir.org/2025/1/e65263 UR - http://dx.doi.org/10.2196/65263 ID - info:doi/10.2196/65263 ER - TY - JOUR AU - Paz-Arbaizar, Leire AU - Lopez-Castroman, Jorge AU - Artés-Rodríguez, Antonio AU - Olmos, M. Pablo AU - Ramírez, David PY - 2025/3/18 TI - Emotion Forecasting: A Transformer-Based Approach JO - J Med Internet Res SP - e63962 VL - 27 KW - affect KW - emotional valence KW - machine learning KW - mental disorder KW - monitoring KW - mood KW - passive data KW - Patient Health Questionnaire-9 KW - PHQ-9 KW - psychological distress KW - time-series forecasting N2 - Background: Monitoring the emotional states of patients with psychiatric problems has always been challenging due to the noncontinuous nature of clinical assessments, the effect of the health care environment, and the inherent subjectivity of evaluation instruments. However, mental states in psychiatric disorders exhibit substantial variability over time, making real-time monitoring crucial for preventing risky situations and ensuring appropriate treatment. Objective: This study aimed to leverage new technologies and deep learning techniques to enable more objective, real-time monitoring of patients. This was achieved by passively monitoring variables such as step count, patient location, and sleep patterns using mobile devices. We aimed to predict patient self-reports and detect sudden variations in their emotional valence, identifying situations that may require clinical intervention. 
Methods: Data for this project were collected using the Evidence-Based Behavior (eB2) app, which records both passive and self-reported variables daily. Passive data refer to behavioral information gathered via the eB2 app through sensors embedded in mobile devices and wearables. These data were obtained from studies conducted in collaboration with hospitals and clinics that used eB2. We used hidden Markov models (HMMs) to address missing data and transformer deep neural networks for time-series forecasting. Finally, classification algorithms were applied to predict several variables, including emotional state and responses to the Patient Health Questionnaire-9. Results: Through real-time patient monitoring, we demonstrated the ability to accurately predict patients' emotional states and anticipate changes over time. Specifically, our approach achieved high accuracy (0.93) and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.98 for emotional valence classification. For predicting emotional state changes 1 day in advance, we obtained an ROC AUC of 0.87. Furthermore, we demonstrated the feasibility of forecasting responses to the Patient Health Questionnaire-9, with particularly strong performance for certain questions. For example, in question 9, related to suicidal ideation, our model achieved an accuracy of 0.9 and an ROC AUC of 0.77 for predicting the next day's response. Moreover, we illustrated the enhanced stability of multivariate time-series forecasting when HMM preprocessing was combined with a transformer model, as opposed to other time-series forecasting methods, such as recurrent neural networks or long short-term memory cells. 
Conclusions: The stability of multivariate time-series forecasting improved when HMM preprocessing was combined with a transformer model, as opposed to other time-series forecasting methods (eg, recurrent neural network and long short-term memory), leveraging the attention mechanisms to capture longer time dependencies and gain interpretability. We showed the potential to assess the emotional state of a patient and the scores of psychiatric questionnaires from passive variables in advance. This allows real-time monitoring of patients and hence better risk detection and treatment adjustment. UR - https://www.jmir.org/2025/1/e63962 UR - http://dx.doi.org/10.2196/63962 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63962 ER - TY - JOUR AU - Jørgensen, Lee Natasha AU - Merrild, Hoffmann Camilla AU - Jensen, Bach Martin AU - Moeslund, B. Thomas AU - Kidholm, Kristian AU - Thomsen, Laust Janus PY - 2025/3/12 TI - The Perceptions of Potential Prerequisites for Artificial Intelligence in Danish General Practice: Vignette-Based Interview Study Among General Practitioners JO - JMIR Med Inform SP - e63895 VL - 13 KW - general practice KW - general practitioners KW - GPs KW - artificial intelligence KW - AI KW - prerequisites KW - interviews KW - vignettes KW - qualitative study KW - thematic analysis N2 - Background: Artificial intelligence (AI) has been deemed revolutionary in medicine; however, no AI tools have been implemented or validated in Danish general practice. General practice in Denmark has an excellent digitization system for developing and using AI. Nevertheless, there is a lack of involvement of general practitioners (GPs) in developing AI. The perspectives of GPs as end users are essential for facilitating the next stage of AI development in general practice. Objective: This study aimed to identify the essential prerequisites that GPs perceive as necessary to realize the potential of AI in Danish general practice. 
Methods: This study used semistructured interviews and vignettes among GPs to gain perspectives on the potential of AI in general practice. A total of 12 GPs interested in the potential of AI in general practice were interviewed in 2019 and 2021. The interviews were transcribed verbatim and thematic analysis was conducted to identify the dominant themes throughout the data. Results: In the data analysis, four main themes were identified as essential prerequisites for GPs when considering the potential of AI in general practice: (1) AI must begin with the low-hanging fruit, (2) AI must be meaningful in the GP's work, (3) the GP-patient relationship must be maintained despite AI, and (4) AI must be a free, active, and integrated option in the electronic health record (EHR). These 4 themes suggest that the development of AI should initially focus on low-complexity tasks that do not influence patient interactions but facilitate GPs' work in a meaningful manner as an integrated part of the EHR. Examples of this include routine and administrative tasks. Conclusions: The research findings outline the participating GPs' perceptions of the essential prerequisites to consider when exploring the potential applications of AI in primary care settings. We believe that these perceptions of potential prerequisites can support the initial stages of future development and assess the suitability of existing AI tools for general practice. UR - https://medinform.jmir.org/2025/1/e63895 UR - http://dx.doi.org/10.2196/63895 ID - info:doi/10.2196/63895 ER - TY - JOUR AU - Guo, Weiqi AU - Chen, Yang PY - 2025/3/5 TI - Investigating Whether AI Will Replace Human Physicians and Understanding the Interplay of the Source of Consultation, Health-Related Stigma, and Explanations of Diagnoses on Patients' 
Evaluations of Medical Consultations: Randomized Factorial Experiment JO - J Med Internet Res SP - e66760 VL - 27 KW - artificial intelligence KW - AI KW - medical artificial intelligence KW - medical AI KW - human-artificial intelligence interaction KW - human-AI interaction KW - medical consultation KW - health-related stigma KW - diagnosis explanation KW - health communication N2 - Background: The increasing use of artificial intelligence (AI) in medical diagnosis and consultation promises benefits such as greater accuracy and efficiency. However, there is little evidence to systematically test whether the ideal technological promises translate into an improved evaluation of the medical consultation from the patient's perspective. This perspective is significant because AI as a technological solution does not necessarily improve patient confidence in diagnosis and adherence to treatment at the functional level, create meaningful interactions between the medical agent and the patient at the relational level, evoke positive emotions, or reduce the patient's pessimism at the emotional level. Objective: This study aims to investigate, from a patient-centered perspective, whether AI or human-involved AI can replace the role of human physicians in diagnosis at the functional, relational, and emotional levels as well as how some health-related differences between human-AI and human-human interactions affect patients' evaluations of the medical consultation. Methods: A 3 (consultation source: AI vs human-involved AI vs human) × 2 (health-related stigma: low vs high) × 2 (diagnosis explanation: without vs with explanation) factorial experiment was conducted with 249 participants. The main effects and interaction effects of the variables were examined on individuals' functional, relational, and emotional evaluations of the medical consultation. 
Results: Functionally, people trusted the diagnosis of the human physician (mean 4.78-4.85, SD 0.06-0.07) more than medical AI (mean 4.34-4.55, SD 0.06-0.07) or human-involved AI (mean 4.39-4.56, SD 0.06-0.07; P<.001), but at the relational and emotional levels, there was no significant difference between human-AI and human-human interactions (P>.05). Health-related stigma had no significant effect on how people evaluated the medical consultation or contributed to preferring AI-powered systems over humans (P>.05); however, providing explanations of the diagnosis significantly improved the functional (P<.001), relational (P<.05), and emotional (P<.05) evaluations of the consultation for all 3 medical agents. Conclusions: The findings imply that at the current stage of AI development, people trust human expertise more than accurate AI, especially for decisions traditionally made by humans, such as medical diagnosis, supporting the algorithm aversion theory. Surprisingly, even for highly stigmatized diseases such as AIDS, where we assume anonymity and privacy are preferred in medical consultations, the dehumanization of AI does not contribute significantly to the preference for AI-powered medical agents over humans, suggesting that instrumental needs of diagnosis override patient privacy concerns. Furthermore, explaining the diagnosis effectively improves treatment adherence, strengthens the physician-patient relationship, and fosters positive emotions during the consultation. This provides insights for the design of AI medical agents, which have long been criticized for lacking transparency while making highly consequential decisions. This study concludes by outlining theoretical contributions to research on health communication and human-AI interaction and discusses the implications for the design and application of medical AI. 
UR - https://www.jmir.org/2025/1/e66760 UR - http://dx.doi.org/10.2196/66760 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053785 ID - info:doi/10.2196/66760 ER - TY - JOUR AU - Tang, Wen-Zhen AU - Mo, Shu-Tian AU - Xie, Yuan-Xi AU - Wei, Tian-Fu AU - Chen, Guo-Lian AU - Teng, Yan-Juan AU - Jia, Kui PY - 2025/3/4 TI - Predicting Overall Survival in Patients with Male Breast Cancer: Nomogram Development and External Validation Study JO - JMIR Cancer SP - e54625 VL - 11 KW - male breast cancer KW - specific survival KW - prediction model KW - nomogram KW - Surveillance, Epidemiology, and End Results database KW - SEER database N2 - Background: Male breast cancer (MBC) is an uncommon disease. Few studies have discussed the prognosis of MBC due to its rarity. Objective: This study aimed to develop a nomogram to predict the overall survival of patients with MBC and externally validate it using cases from China. Methods: Based on the Surveillance, Epidemiology, and End Results (SEER) database, male patients who were diagnosed with breast cancer between January 2010 and December 2015 were enrolled. These patients were randomly assigned to either a training set (n=1610) or a validation set (n=713) in a 7:3 ratio. Additionally, 22 MBC cases diagnosed at the First Affiliated Hospital of Guangxi Medical University between January 2013 and June 2021 were used for external validation, with the follow-up endpoint being June 10, 2023. Cox regression analysis was performed to identify significant risk variables and construct a nomogram to predict the overall survival of patients with MBC. Information collected from the test set was applied to validate the model. The concordance index (C-index), receiver operating characteristic (ROC) curve, decision curve analysis (DCA), and a Kaplan-Meier survival curve were used to evaluate the accuracy and reliability of the model. 
Results: A total of 2301 patients with MBC in the SEER database and 22 patients with MBC from the study hospital were included. The predictive model included 7 variables: age (hazard ratio [HR] 1.89, 95% CI 1.50-2.38), surgery (HR 0.38, 95% CI 0.29-0.51), marital status (HR 0.75, 95% CI 0.63-0.89), tumor stage (HR 1.17, 95% CI 1.05-1.29), clinical stage (HR 1.41, 95% CI 1.15-1.74), chemotherapy (HR 0.62, 95% CI 0.50-0.75), and HER2 status (HR 2.68, 95% CI 1.20-5.98). The C-index was 0.72, 0.747, and 0.981 in the training set, internal validation set, and external validation set, respectively. The nomogram showed accurate calibration, and the ROC curve confirmed the advantage of the model in clinical validity. The DCA analysis indicated that the model had good clinical applicability. Furthermore, the nomogram classification allowed for more accurate differentiation of risk subgroups, and patients with low-risk MBC demonstrated substantially improved survival outcomes compared with medium- and high-risk patients (P<.001). Conclusions: A survival prognosis prediction nomogram with 7 variables for patients with MBC was constructed in this study. The model can predict the survival outcome of these patients and provide a scientific basis for clinical diagnosis and treatment. 
UR - https://cancer.jmir.org/2025/1/e54625 UR - http://dx.doi.org/10.2196/54625 ID - info:doi/10.2196/54625 ER - TY - JOUR AU - Cabral, Pereira Bernardo AU - Braga, Maciel Luiza Amara AU - Conte Filho, Gilbert Carlos AU - Penteado, Bruno AU - Freire de Castro Silva, Luis Sandro AU - Castro, Leonardo AU - Fornazin, Marcelo AU - Mota, Fabio PY - 2025/2/27 TI - Future Use of AI in Diagnostic Medicine: 2-Wave Cross-Sectional Survey Study JO - J Med Internet Res SP - e53892 VL - 27 KW - artificial intelligence KW - AI KW - diagnostic medicine KW - survey research KW - researcher opinion KW - future N2 - Background: The rapid evolution of artificial intelligence (AI) presents transformative potential for diagnostic medicine, offering opportunities to enhance diagnostic accuracy, reduce costs, and improve patient outcomes. Objective: This study aimed to assess the expected future impact of AI on diagnostic medicine by comparing global researchers' expectations using 2 cross-sectional surveys. Methods: The surveys were conducted in September 2020 and February 2023. Each survey captured a 10-year projection horizon, gathering insights from >3700 researchers with expertise in AI and diagnostic medicine from all over the world. The survey sought to understand the perceived benefits, integration challenges, and evolving attitudes toward AI use in diagnostic settings. Results: Results indicated a strong expectation among researchers that AI will substantially influence diagnostic medicine within the next decade. Key anticipated benefits include enhanced diagnostic reliability, reduced screening costs, improved patient care, and decreased physician workload, addressing the growing demand for diagnostic services outpacing the supply of medical professionals. 
Specifically, x-ray diagnosis, heart rhythm interpretation, and skin malignancy detection were identified as the diagnostic tools most likely to be integrated with AI technologies due to their maturity and existing AI applications. The surveys highlighted the growing optimism regarding AI's ability to transform traditional diagnostic pathways and enhance clinical decision-making processes. Furthermore, the study identified barriers to the integration of AI in diagnostic medicine. The primary challenges cited were the difficulties of embedding AI within existing clinical workflows, ethical and regulatory concerns, and data privacy issues. Respondents emphasized uncertainties around legal responsibility and accountability for AI-supported clinical decisions, data protection challenges, and the need for robust regulatory frameworks to ensure safe AI deployment. Ethical concerns, particularly those related to algorithmic transparency and bias, were noted as increasingly critical, reflecting a heightened awareness of the potential risks associated with AI adoption in clinical settings. Differences between the 2 survey waves indicated a growing focus on ethical and regulatory issues, suggesting an evolving recognition of these challenges over time. Conclusions: Despite these barriers, there was notable consistency in researchers' expectations across the 2 survey periods, indicating a stable and sustained outlook on AI's transformative potential in diagnostic medicine. The findings show the need for interdisciplinary collaboration among clinicians, AI developers, and regulators to address ethical and practical challenges while maximizing AI's benefits. This study offers insights into the projected trajectory of AI in diagnostic medicine, guiding stakeholders, including health care providers, policy makers, and technology developers, on navigating the opportunities and challenges of AI integration. 
UR - https://www.jmir.org/2025/1/e53892 UR - http://dx.doi.org/10.2196/53892 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053779 ID - info:doi/10.2196/53892 ER - TY - JOUR AU - Song, Xiaowei AU - Wang, Jiayi AU - He, Feifei AU - Yin, Wei AU - Ma, Weizhi AU - Wu, Jian PY - 2025/2/26 TI - Stroke Diagnosis and Prediction Tool Using ChatGLM: Development and Validation Study JO - J Med Internet Res SP - e67010 VL - 27 KW - stroke KW - diagnosis KW - large language model KW - ChatGLM KW - generative language model KW - primary care KW - acute stroke KW - prediction tool KW - stroke detection KW - treatment KW - electronic health records KW - noncontrast computed tomography N2 - Background: Stroke is a globally prevalent disease that imposes a significant burden on health care systems and national economies. Accurate and rapid stroke diagnosis can substantially increase reperfusion rates, mitigate disability, and reduce mortality. However, there are considerable discrepancies in the diagnosis and treatment of acute stroke. Objective: The aim of this study is to develop and validate a stroke diagnosis and prediction tool using ChatGLM-6B, which uses free-text information from electronic health records in conjunction with noncontrast computed tomography (NCCT) reports to enhance stroke detection and treatment. Methods: A large language model (LLM) using ChatGLM-6B was proposed to facilitate stroke diagnosis by identifying optimal input combinations, using external tools, and applying instruction tuning and low-rank adaptation (LoRA) techniques. A dataset containing details of 1885 patients with and without stroke from 2016 to 2024 was used for training and internal validation; another 335 patients from two hospitals were used as an external test set, including 230 patients from the training hospital but admitted at different periods, and 105 patients from another hospital. 
Results: The LLM, which is based on clinical notes and NCCT, demonstrates exceptionally high accuracy in stroke diagnosis, achieving 99% in the internal validation dataset and 95.5% and 79.1% in two external test cohorts. It effectively distinguishes between ischemia and hemorrhage, with an accuracy of 100% in the validation dataset and 99.1% and 97.1% in the other test cohorts. In addition, it identifies large vessel occlusions (LVO) with an accuracy of 80% in the validation dataset and 88.6% and 83.3% in the other test cohorts. Furthermore, it screens patients eligible for intravenous thrombolysis (IVT) with an accuracy of 89.4% in the validation dataset and 60% and 80% in the other test cohorts. Conclusions: We developed an LLM that leverages clinical text and NCCT to identify strokes and guide recanalization therapy. While our results necessitate validation through widespread deployment, they hold the potential to enhance stroke identification and reduce reperfusion time. UR - https://www.jmir.org/2025/1/e67010 UR - http://dx.doi.org/10.2196/67010 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/67010 ER - TY - JOUR AU - Campagner, Andrea AU - Agnello, Luisa AU - Carobene, Anna AU - Padoan, Andrea AU - Del Ben, Fabio AU - Locatelli, Massimo AU - Plebani, Mario AU - Ognibene, Agostino AU - Lorubbio, Maria AU - De Vecchi, Elena AU - Cortegiani, Andrea AU - Piva, Elisa AU - Poz, Donatella AU - Curcio, Francesco AU - Cabitza, Federico AU - Ciaccio, Marcello PY - 2025/2/26 TI - Complete Blood Count and Monocyte Distribution Width?Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study JO - J Med Internet Res SP - e55492 VL - 27 KW - sepsis KW - medical machine learning KW - external validation KW - complete blood count KW - controllable AI KW - machine learning KW - artificial intelligence KW - development study KW - validation study KW - organ KW - organ dysfunction KW - detection KW - clinical 
signs KW - clinical symptoms KW - biomarker KW - diagnostic KW - machine learning model KW - sepsis detection KW - early detection KW - data distribution N2 - Background: Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving the patient outcome. Laboratory medicine can play a crucial role by providing biomarkers whose alteration can be detected before the onset of clinical signs and symptoms. In particular, the relevance of monocyte distribution width (MDW) as a sepsis biomarker has emerged in the previous decade. However, despite encouraging results, MDW has poor sensitivity and positive predictive value when compared to other biomarkers. Objective: This study aims to investigate the use of machine learning (ML) to overcome the limitations mentioned earlier by combining different parameters and therefore improving sepsis detection. However, making ML models function in clinical practice may be problematic, as their performance may suffer when deployed in contexts other than the research environment. In fact, even widely used commercially available models have been demonstrated to generalize poorly in out-of-distribution scenarios. Methods: In this multicentric study, we developed ML models whose intended use is the early detection of sepsis on the basis of MDW and complete blood count parameters. In total, data from 6 patient cohorts (encompassing 5344 patients) collected at 5 different Italian hospitals were used to train and externally validate ML models. The models were trained on a patient cohort encompassing patients enrolled at the emergency department, and it was externally validated on 5 different cohorts encompassing patients enrolled at both the emergency department and the intensive care unit. 
The cohorts were selected to exhibit a variety of data distribution shifts compared to the training set, including label, covariate, and missing data shifts, enabling a conservative validation of the developed models. To improve generalizability and robustness to different types of distribution shifts, the developed ML models combine traditional methodologies with advanced techniques inspired by controllable artificial intelligence (AI), namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides health operators with useful information about the models' functioning. Results: The developed models achieved good performance on the internal validation (area under the receiver operating characteristic curve between 0.91 and 0.98), as well as consistent generalization performance across the external validation datasets (area under the receiver operating characteristic curve between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques were further able to improve performance and were used to derive an interpretable set of diagnostic rules. Conclusions: Our findings demonstrate how controllable AI approaches based on complete blood count and MDW may be used for the early detection of sepsis while also demonstrating how the proposed methodology can be used to develop ML models that are more resistant to different types of data distribution shifts. 
UR - https://www.jmir.org/2025/1/e55492 UR - http://dx.doi.org/10.2196/55492 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55492 ER - TY - JOUR AU - Fu, Yao AU - Huang, Zongyao AU - Deng, Xudong AU - Xu, Linna AU - Liu, Yang AU - Zhang, Mingxing AU - Liu, Jinyi AU - Huang, Bin PY - 2025/2/14 TI - Artificial Intelligence in Lymphoma Histopathology: Systematic Review JO - J Med Internet Res SP - e62851 VL - 27 KW - lymphoma KW - artificial intelligence KW - bias KW - histopathology KW - tumor KW - hematological KW - lymphatic disease KW - public health KW - pathologists KW - pathology KW - immunohistochemistry KW - diagnosis KW - prognosis N2 - Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included the use of AI for prognosis involving human lymphoma tissue pathology images, diagnosis, gene mutation prediction, etc. The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. 
A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all the articles presented an unclear risk of bias in at least one domain, with the most frequent being participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. In the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3% to 100%. In models with external validation results, the AUC ranged from 0.93 to 0.99. Conclusions: From a methodological perspective, all models exhibited biases. The enhancement of the accuracy of AI models and the acceleration of their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. 
UR - https://www.jmir.org/2025/1/e62851 UR - http://dx.doi.org/10.2196/62851 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/62851 ER - TY - JOUR AU - Shimada, Hiroyuki AU - Doi, Takehiko AU - Tsutsumimoto, Kota AU - Makino, Keitaro AU - Harada, Kenji AU - Tomida, Kouki AU - Morikawa, Masanori AU - Makizako, Hyuma PY - 2025/2/14 TI - A New Computer-Based Cognitive Measure for Early Detection of Dementia Risk (Japan Cognitive Function Test): Validation Study JO - J Med Internet Res SP - e59015 VL - 27 KW - cognition KW - neurocognitive test KW - dementia KW - Alzheimer disease KW - aged KW - MMSE KW - cognitive impairment KW - Mini-Mental State Examination KW - monitoring KW - eHealth N2 - Background: The emergence of disease-modifying treatment options for Alzheimer disease is creating a paradigm shift in strategies to identify patients with mild symptoms in primary care settings. Systematic reviews on digital cognitive tests reported that most showed diagnostic performance comparable with that of paper-and-pencil tests for mild cognitive impairment and dementia. However, most studies have small sample sizes, with fewer than 100 individuals, and are based on case-control or cross-sectional designs. Objective: This study aimed to examine the predictive validity of the Japanese Cognitive Function Test (J-Cog), a new computerized cognitive battery test, for dementia development. Methods: We randomly assigned 2520 older adults (average age 72.7, SD 6.7 years) to derivation and validation groups to determine and validate cutoff points for the onset of dementia. The Mini-Mental State Examination (MMSE) was used for comparison purposes. The J-Cog consists of 12 tasks that assess orientation, designation, attention and calculation, mental rotation, verbal fluency, sentence completion, working memory, logical reasoning, attention, common knowledge, word memory recall, and episodic memory recall. The onset of dementia was monitored for 60 months. 
In the derivation group, receiver operating characteristic curves were plotted to determine the MMSE and J-Cog cutoff points that best discriminated between the groups with and without dementia. In the validation group, Cox proportional regression models were developed to predict the associations of the group classified using the cutoff points of the J-Cog or MMSE with dementia incidence. The Harrell C-statistic was estimated to summarize how well a predicted risk score described an observed sequence of events. The Akaike information criterion was calculated for relative goodness of fit, where lower absolute values indicate a better model fit. Results: Significant hazard ratios (HRs) for dementia incidence were found using the MMSE cutoff between 23 and 24 points (HR 1.93, 95% CI 1.13-3.27) and the J-Cog cutoff between 43 and 44 points (HR 2.42, 95% CI 1.50-3.93). In the total validation group, the C-statistic was above 0.8 for all cutoff points. Akaike information criterion with MMSE cutoff between 23 and 24 points as a reference showed a poor fit for MMSE cutoff between 28 and 29 points, and a good fit for the J-Cog cutoff between 43 and 44 points. Conclusions: The J-Cog has higher accuracy in predicting the development of dementia than the MMSE and has advantages for use in the community as a test of cognitive function, which can be administered by nonprofessionals. UR - https://www.jmir.org/2025/1/e59015 UR - http://dx.doi.org/10.2196/59015 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59015 ER - TY - JOUR AU - Rubaiat, Rahmina AU - Templeton, Michael John AU - Schneider, L. 
Sandra AU - De Silva, Upeka AU - Madanian, Samaneh AU - Poellabauer, Christian PY - 2025/2/12 TI - Exploring Speech Biosignatures for Traumatic Brain Injury and Neurodegeneration: Pilot Machine Learning Study JO - JMIR Neurotech SP - e64624 VL - 4 KW - speech biosignatures KW - speech feature analysis KW - amyotrophic lateral sclerosis KW - ALS KW - neurodegenerative disease KW - Parkinson's disease KW - detection KW - speech KW - neurological KW - traumatic brain injury KW - concussion KW - mobile device KW - digital health KW - machine learning KW - mobile health KW - diagnosis KW - mobile phone N2 - Background: Speech features are increasingly linked to neurodegenerative and mental health conditions, offering the potential for early detection and differentiation between disorders. As interest in speech analysis grows, distinguishing between conditions becomes critical for reliable diagnosis and assessment. Objective: This pilot study explores speech biosignatures in two distinct neurological conditions: (1) mild traumatic brain injuries (eg, concussions) and (2) Parkinson disease (PD) as the neurodegenerative condition. Methods: The study included speech samples from 235 participants (97 concussed and 94 age-matched healthy controls, 29 PD and 15 healthy controls) for the PaTaKa test and 239 participants (91 concussed and 104 healthy controls, 29 PD and 15 healthy controls) for the Sustained Vowel (/ah/) test. Age-matched healthy controls were used: young controls for the concussion group and respective age-matched controls for the neurodegenerative participants (15 healthy samples for both tests). Data augmentation with noise was applied to balance small datasets for neurodegenerative and healthy controls. Machine learning models (support vector machine, decision tree, random forest, and Extreme Gradient Boosting) were employed using 37 temporal and spectral speech features. 
A 5-fold stratified cross-validation was used to evaluate classification performance. Results: For the PaTaKa test, classifiers performed well, achieving F1-scores above 0.9 for concussed versus healthy and concussed versus neurodegenerative classifications across all models. Initial tests using the original dataset for neurodegenerative versus healthy classification yielded very poor results, with F1-scores below 0.2 and accuracy under 30% (eg, below 12 out of 44 correctly classified samples) across all models. This underscored the need for data augmentation, which significantly improved performance to 60%-70% (eg, 26-31 out of 44 samples) accuracy. In contrast, the Sustained Vowel test showed mixed results; F1-scores remained high (more than 0.85 across all models) for concussed versus neurodegenerative classifications but were significantly lower for concussed versus healthy (0.59-0.62) and neurodegenerative versus healthy (0.33-0.77), depending on the model. Conclusions: This study highlights the potential of speech features as biomarkers for neurodegenerative conditions. The PaTaKa test exhibited strong discriminative ability, especially for concussed versus neurodegenerative and concussed versus healthy tasks, whereas challenges remain for neurodegenerative versus healthy classification. These findings emphasize the need for further exploration of speech-based tools for differential diagnosis and early identification in neurodegenerative health. UR - https://neuro.jmir.org/2025/1/e64624 UR - http://dx.doi.org/10.2196/64624 ID - info:doi/10.2196/64624 ER - TY - JOUR AU - Downing, J. Gregory AU - Tramontozzi, M. 
Lucas AU - Garcia, Jackson AU - Villanueva, Emma PY - 2025/2/11 TI - Harnessing Internet Search Data as a Potential Tool for Medical Diagnosis: Literature Review JO - JMIR Ment Health SP - e63149 VL - 12 KW - health KW - informatics KW - internet search data KW - early diagnosis KW - web search KW - information technology KW - internet KW - machine learning KW - medical records KW - diagnosis KW - health care KW - self-diagnosis KW - detection KW - intervention KW - patient education KW - internet search KW - health-seeking behavior KW - artificial intelligence KW - AI N2 - Background: The integration of information technology into health care has created opportunities to address diagnostic challenges. Internet searches, representing a vast source of health-related data, hold promise for improving early disease detection. Studies suggest that patterns in search behavior can reveal symptoms before clinical diagnosis, offering potential for innovative diagnostic tools. Leveraging advancements in machine learning, researchers have explored linking search data with health records to enhance screening and outcomes. However, challenges like privacy, bias, and scalability remain critical to its widespread adoption. Objective: We aimed to explore the potential and challenges of using internet search data in medical diagnosis, with a specific focus on diseases and conditions such as cancer, cardiovascular disease, mental and behavioral health, neurodegenerative disorders, and nutritional and metabolic diseases. We examined ethical, technical, and policy considerations while assessing the current state of research, identifying gaps and limitations, and proposing future research directions to advance this emerging field. Methods: We conducted a comprehensive analysis of peer-reviewed literature and informational interviews with subject matter experts to examine the landscape of internet search data use in medical research. 
We searched for published peer-reviewed literature on the PubMed database between October and December 2023. Results: Systematic selection based on predefined criteria included 40 articles from the 2499 identified articles. The analysis revealed a nascent domain of internet search data research in medical diagnosis, marked by advancements in analytics and data integration. Despite challenges such as bias, privacy, and infrastructure limitations, emerging initiatives could reshape data collection and privacy safeguards. Conclusions: We identified signals correlating with diagnostic considerations in certain diseases and conditions, indicating the potential for such data to enhance clinical diagnostic capabilities. However, leveraging internet search data for improved early diagnosis and health care outcomes requires effectively addressing ethical, technical, and policy challenges. By fostering interdisciplinary collaboration, advancing infrastructure development, and prioritizing patient engagement and consent, researchers can unlock the transformative potential of internet search data in medical diagnosis to ultimately enhance patient care and advance health care practice and policy. UR - https://mental.jmir.org/2025/1/e63149 UR - http://dx.doi.org/10.2196/63149 UR - http://www.ncbi.nlm.nih.gov/pubmed/39813106 ID - info:doi/10.2196/63149 ER - TY - JOUR AU - Stroud, M. Austin AU - Curtis, H. Susan AU - Weir, B. Isabel AU - Stout, J. Jeremiah AU - Barry, A. Barbara AU - Bobo, V. William AU - Athreya, P. Arjun AU - Sharp, R. 
Richard PY - 2025/2/10 TI - Physician Perspectives on the Potential Benefits and Risks of Applying Artificial Intelligence in Psychiatric Medicine: Qualitative Study JO - JMIR Ment Health SP - e64414 VL - 12 KW - artificial intelligence KW - machine learning KW - digital health KW - mental health KW - psychiatry KW - depression KW - interviews KW - family medicine KW - physicians KW - qualitative KW - providers KW - attitudes KW - opinions KW - perspectives KW - ethics N2 - Background: As artificial intelligence (AI) tools are integrated more widely in psychiatric medicine, it is important to consider the impact these tools will have on clinical practice. Objective: This study aimed to characterize physician perspectives on the potential impact AI tools will have in psychiatric medicine. Methods: We interviewed 42 physicians (21 psychiatrists and 21 family medicine practitioners). These interviews used detailed clinical case scenarios involving the use of AI technologies in the evaluation, diagnosis, and treatment of psychiatric conditions. Interviews were transcribed and subsequently analyzed using qualitative analysis methods. Results: Physicians highlighted multiple potential benefits of AI tools, including potential support for optimizing pharmaceutical efficacy, reducing administrative burden, aiding shared decision-making, and increasing access to health services, and were optimistic about the long-term impact of these technologies. This optimism was tempered by concerns about potential near-term risks to both patients and themselves including misguiding clinical judgment, increasing clinical burden, introducing patient harms, and creating legal liability. Conclusions: Our results highlight the importance of considering specialist perspectives when deploying AI tools in psychiatric medicine. 
UR - https://mental.jmir.org/2025/1/e64414 UR - http://dx.doi.org/10.2196/64414 UR - http://www.ncbi.nlm.nih.gov/pubmed/39928397 ID - info:doi/10.2196/64414 ER - TY - JOUR AU - Rządeczka, Marcin AU - Sterna, Anna AU - Stolińska, Julia AU - Kaczyńska, Paulina AU - Moskalewicz, Marcin PY - 2025/2/7 TI - The Efficacy of Conversational AI in Rectifying the Theory-of-Mind and Autonomy Biases: Comparative Analysis JO - JMIR Ment Health SP - e64396 VL - 12 KW - cognitive bias KW - conversational artificial intelligence KW - artificial intelligence KW - AI KW - chatbots KW - digital mental health KW - bias rectification KW - affect recognition N2 - Background: The increasing deployment of conversational artificial intelligence (AI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. These biases are particularly relevant in mental health contexts as they can exacerbate conditions such as depression and anxiety by reinforcing maladaptive thought patterns or unrealistic expectations in human-AI interactions. Objective: This study aimed to assess the effectiveness of therapeutic chatbots (Wysa and Youper) versus general-purpose language models (GPT-3.5, GPT-4, and Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods: This study used constructed case scenarios simulating typical user-bot interactions to examine how effectively chatbots address selected cognitive biases. The cognitive biases assessed included theory-of-mind biases (anthropomorphism, overtrust, and attribution) and autonomy biases (illusion of control, fundamental attribution error, and just-world hypothesis). Each chatbot response was evaluated based on accuracy, therapeutic quality, and adherence to cognitive behavioral therapy principles using an ordinal scale to ensure consistency in scoring. 
To enhance reliability, responses underwent a double review process by 2 cognitive scientists, followed by a secondary review by a clinical psychologist specializing in cognitive behavioral therapy, ensuring a robust assessment across interdisciplinary perspectives. Results: This study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, whereas the therapeutic bot Wysa scored the lowest. Notably, general-purpose bots showed more consistent accuracy and adaptability in recognizing and addressing bias-related cues across different contexts, suggesting a broader flexibility in handling complex cognitive patterns. In addition, in affect recognition tasks, general-purpose chatbots not only excelled but also demonstrated quicker adaptation to subtle emotional nuances, outperforming therapeutic bots in 67% (4/6) of the tested biases. Conclusions: This study shows that, while therapeutic chatbots hold promise for mental health support and cognitive bias intervention, their current capabilities are limited. Addressing cognitive biases in AI-human interactions requires systems that can both rectify and analyze biases as integral to human cognition, promoting precision and simulating empathy. The findings reveal the need for improved simulated emotional intelligence in chatbot design to provide adaptive, personalized responses that reduce overreliance and encourage independent coping skills. Future research should focus on enhancing affective response mechanisms and addressing ethical concerns such as bias mitigation and data privacy to ensure safe, effective AI-based mental health support. 
UR - https://mental.jmir.org/2025/1/e64396 UR - http://dx.doi.org/10.2196/64396 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/64396 ER - TY - JOUR AU - Siira, Elin AU - Johansson, Hanna AU - Nygren, Jens PY - 2025/2/6 TI - Mapping and Summarizing the Research on AI Systems for Automating Medical History Taking and Triage: Scoping Review JO - J Med Internet Res SP - e53741 VL - 27 KW - scoping review KW - artificial intelligence KW - AI KW - medical history taking KW - triage KW - health care KW - automation N2 - Background: The integration of artificial intelligence (AI) systems for automating medical history taking and triage can significantly enhance patient flow in health care systems. Despite the promising performance of numerous AI studies, only a limited number of these systems have been successfully integrated into routine health care practice. To elucidate how AI systems can create value in this context, it is crucial to identify the current state of knowledge, including the readiness of these systems, the facilitators of and barriers to their implementation, and the perspectives of various stakeholders involved in their development and deployment. Objective: This study aims to map and summarize empirical research on AI systems designed for automating medical history taking and triage in health care settings. Methods: The study was conducted following the framework proposed by Arksey and O'Malley and adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guidelines. A comprehensive search of 5 databases (PubMed, CINAHL, PsycINFO, Scopus, and Web of Science) was performed. A detailed protocol was established before the review to ensure methodological rigor. Results: A total of 1248 research publications were identified and screened. Of these, 86 (6.89%) met the eligibility criteria. 
Notably, most (n=63, 73%) studies were published between 2020 and 2022, with a significant concentration on emergency care (n=32, 37%). Other clinical contexts included radiology (n=12, 14%) and primary care (n=6, 7%). Many (n=15, 17%) studies did not specify a clinical context. Most (n=31, 36%) studies used retrospective designs, while others (n=34, 40%) did not specify their methodologies. The predominant type of AI system identified was the hybrid model (n=68, 79%), with forecasting (n=40, 47%) and recognition (n=36, 42%) being the most common tasks performed. While most (n=70, 81%) studies included patient populations, only 1 (1%) study investigated patients? views on AI-based medical history taking and triage, and 2 (2%) studies considered health care professionals? perspectives. Furthermore, only 6 (7%) studies validated or demonstrated AI systems in relevant clinical settings through real-time model testing, workflow implementation, clinical outcome evaluation, or integration into practice. Most (n=76, 88%) studies were concerned with the prototyping, development, or validation of AI systems. In total, 4 (5%) studies were reviews of several empirical studies conducted in different clinical settings. The facilitators and barriers to AI system implementation were categorized into 4 themes: technical aspects, contextual and cultural considerations, end-user engagement, and evaluation processes. Conclusions: This review highlights current trends, stakeholder perspectives, stages of innovation development, and key influencing factors related to implementing AI systems in health care. The identified literature gaps regarding stakeholder perspectives and the limited research on AI systems for automating medical history taking and triage indicate significant opportunities for further investigation and development in this evolving field. 
UR - https://www.jmir.org/2025/1/e53741 UR - http://dx.doi.org/10.2196/53741 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53741 ER - TY - JOUR AU - Jonnalagedda-Cattin, Magali AU - Moukam Datchoua, Manoëla Alida AU - Yakam, Flore Virginie AU - Kenfack, Bruno AU - Petignat, Patrick AU - Thiran, Jean-Philippe AU - Schönenberger, Klaus AU - Schmidt, C. Nicole PY - 2025/2/5 TI - Barriers and Facilitators to the Preadoption of a Computer-Aided Diagnosis Tool for Cervical Cancer: Qualitative Study on Health Care Providers' Perspectives in Western Cameroon JO - JMIR Cancer SP - e50124 VL - 11 KW - qualitative research KW - technology acceptance KW - cervical cancer KW - diagnosis KW - computer-assisted KW - decision support systems KW - artificial intelligence KW - health personnel attitudes KW - Cameroon KW - mobile phone N2 - Background: Computer-aided detection and diagnosis (CAD) systems can enhance the objectivity of visual inspection with acetic acid (VIA), which is widely used in low- and middle-income countries (LMICs) for cervical cancer detection. VIA's reliance on subjective health care provider (HCP) interpretation introduces variability in diagnostic accuracy. CAD tools can address some limitations; nonetheless, understanding the contextual factors affecting CAD integration is essential for effective adoption and sustained use, particularly in resource-constrained settings. Objective: This study investigated the barriers and facilitators perceived by HCPs in Western Cameroon regarding sustained CAD tool use for cervical cancer detection using VIA. The aim was to guide smooth technology adoption in similar settings by identifying specific barriers and facilitators and optimizing CAD's potential benefits while minimizing obstacles. Methods: The perspectives of HCPs on adopting CAD for VIA were explored using a qualitative methodology. 
The study participants included 8 HCPs (6 midwives and 2 gynecologists) working in the Dschang district, Cameroon. Focus group discussions were conducted with midwives, while individual interviews were conducted with gynecologists to comprehend unique perspectives. Each interview was audio-recorded, transcribed, and independently coded by 2 researchers using the ATLAS.ti (Lumivero, LLC) software. The technology acceptance lifecycle framework guided the content analysis, focusing on the preadoption phases to examine the perceived acceptability and initial acceptance of the CAD tool in clinical workflows. The study findings were reported adhering to the COREQ (Consolidated Criteria for Reporting Qualitative Research) and SRQR (Standards for Reporting Qualitative Research) checklists. Results: Key elements influencing the sustained use of CAD tools for VIA by HCPs were identified, primarily within the technology acceptance lifecycle's preadoption framework. Barriers included the system's ease of use, particularly challenges associated with image acquisition, concerns over confidentiality and data security, limited infrastructure and resources such as the internet and device quality, and potential workflow changes. Facilitators encompassed the perceived improved patient care, the potential for enhanced diagnostic accuracy, and the integration of CAD tools into routine clinical practices, provided that infrastructure and training were adequate. The HCPs emphasized the importance of clinical validation, usability testing, and iterative feedback mechanisms to build trust in the CAD tool's accuracy and utility. Conclusions: This study provides practical insights from HCPs in Western Cameroon regarding the adoption of CAD tools for VIA in clinical settings. 
CAD technology can aid diagnostic objectivity; however, data management, workflow adaptation, and infrastructure limitations must be addressed to avoid "pilotitis," the failure of digital health tools to progress beyond the pilot phase. Effective implementation requires comprehensive technology management, including regulatory compliance, infrastructure support, and user-focused training. Involving end users can ensure that CAD tools are fully integrated and embraced in LMICs to aid cervical cancer screening. UR - https://cancer.jmir.org/2025/1/e50124 UR - http://dx.doi.org/10.2196/50124 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/50124 ER - TY - JOUR AU - Choomung, Pichsinee AU - He, Yupeng AU - Matsunaga, Masaaki AU - Sakuma, Kenji AU - Kishi, Taro AU - Li, Yuanying AU - Tanihara, Shinichi AU - Iwata, Nakao AU - Ota, Atsuhiko PY - 2025/1/29 TI - Estimating the Prevalence of Schizophrenia in the General Population of Japan Using an Artificial Neural Network-Based Schizophrenia Classifier: Web-Based Cross-Sectional Survey JO - JMIR Form Res SP - e66330 VL - 9 KW - schizophrenia KW - schizophrenic KW - prevalence KW - artificial neural network KW - neural network KW - neural networks KW - ANN KW - deep learning KW - machine learning KW - SZ classifier KW - web-based survey KW - epidemiology KW - epidemiological KW - Japan KW - classifiers KW - mental illness KW - mental disorder KW - mental health N2 - Background: Estimating the prevalence of schizophrenia in the general population remains a challenge worldwide, as well as in Japan. Few studies have estimated schizophrenia prevalence in the Japanese population and have often relied on reports from hospitals and self-reported physician diagnoses or typical schizophrenia symptoms. These approaches are likely to underestimate the true prevalence owing to stigma, poor insight, or lack of access to health care among respondents. 
To address these issues, we previously developed an artificial neural network (ANN)-based schizophrenia classification model (SZ classifier) using data from a large-scale Japanese web-based survey to enhance the comprehensiveness of schizophrenia case identification in the general population. In addition, we also plan to introduce a population-based survey to collect general information and sample participants matching the population's demographic structure, thereby achieving a precise estimate of the prevalence of schizophrenia in Japan. Objective: This study aimed to estimate the prevalence of schizophrenia by applying the SZ classifier to random samples from the Japanese population. Methods: We randomly selected a sample of 750 participants where the age, sex, and regional distributions were similar to Japan's demographic structure from a large-scale Japanese web-based survey. Demographic data, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities were collected and applied to the SZ classifier, as this information was also used for developing the SZ classifier. The crude prevalence of schizophrenia was calculated through the proportion of positive cases detected by the SZ classifier. The crude estimate was further refined by excluding false-positive cases and including false-negative cases to determine the actual prevalence of schizophrenia. Results: Out of 750 participants, 62 were classified as schizophrenia cases by the SZ classifier, resulting in a crude prevalence of schizophrenia in the general population of Japan of 8.3% (95% CI 6.6%-10.1%). Among these 62 cases, 53 were presumed to be false positives, and 3 were presumed to be false negatives. After adjustment, the actual prevalence of schizophrenia in the general population was estimated to be 1.6% (95% CI 0.7%-2.5%). 
Conclusions: This estimated prevalence was slightly higher than that reported in previous studies, possibly due to a more comprehensive disease classification methodology or, conversely, model limitations. This study demonstrates the capability of an ANN-based model to improve the estimation of schizophrenia prevalence in the general population, offering a novel approach to public health analysis. UR - https://formative.jmir.org/2025/1/e66330 UR - http://dx.doi.org/10.2196/66330 ID - info:doi/10.2196/66330 ER - TY - JOUR AU - Jiang, Yiqun AU - Li, Qing AU - Huang, Yu-Li AU - Zhang, Wenli PY - 2025/1/29 TI - Urgency Prediction for Medical Laboratory Tests Through Optimal Sparse Decision Tree: Case Study With Echocardiograms JO - JMIR AI SP - e64188 VL - 4 KW - interpretable machine learning KW - urgency prediction KW - appointment scheduling KW - echocardiogram KW - health care management N2 - Background: In the contemporary realm of health care, laboratory tests stand as cornerstone components, driving the advancement of precision medicine. These tests offer intricate insights into a variety of medical conditions, thereby facilitating diagnosis, prognosis, and treatments. However, the accessibility of certain tests is hindered by factors such as high costs, a shortage of specialized personnel, or geographic disparities, posing obstacles to achieving equitable health care. For example, an echocardiogram is a type of laboratory test that is extremely important and not easily accessible. The increasing demand for echocardiograms underscores the imperative for more efficient scheduling protocols. Despite this pressing need, limited research has been conducted in this area. Objective: The study aims to develop an interpretable machine learning model for determining the urgency of patients requiring echocardiograms, thereby aiding in the prioritization of scheduling procedures. 
Furthermore, this study aims to glean insights into the pivotal attributes influencing the prioritization of echocardiogram appointments, leveraging the high interpretability of the machine learning model. Methods: Empirical and predictive analyses have been conducted to assess the urgency of patients based on a large real-world echocardiogram appointment dataset (ie, 34,293 appointments) sourced from electronic health records encompassing administrative information, referral diagnosis, and underlying patient conditions. We used a state-of-the-art interpretable machine learning algorithm, the optimal sparse decision tree (OSDT), renowned for its high accuracy and interpretability, to investigate the attributes pertinent to echocardiogram appointments. Results: The method demonstrated satisfactory performance in comparison to the best-performing baseline model (F1-score=36.18%, an improvement of 1.7%, and F2-score=28.18%, an improvement of 0.79%). Moreover, due to its high interpretability, the results provide valuable medical insights regarding the identification of urgent patients for tests through the extraction of decision rules from the OSDT model. Conclusions: The method demonstrated state-of-the-art predictive performance, affirming its effectiveness. Furthermore, we validate the decision rules derived from the OSDT model by comparing them with established medical knowledge. These interpretable results (eg, attribute importance and decision rules from the OSDT model) underscore the potential of our approach in prioritizing patient urgency for echocardiogram appointments and can be extended to prioritize other laboratory test appointments using electronic health record data. UR - https://ai.jmir.org/2025/1/e64188 UR - http://dx.doi.org/10.2196/64188 UR - http://www.ncbi.nlm.nih.gov/pubmed/39879091 ID - info:doi/10.2196/64188 ER - TY - JOUR AU - Ghaffar, Faisal AU - Furtado, M. 
Nadine AU - Ali, Imad AU - Burns, Catherine PY - 2025/1/29 TI - Diagnostic Decision-Making Variability Between Novice and Expert Optometrists for Glaucoma: Comparative Analysis to Inform AI System Design JO - JMIR Med Inform SP - e63109 VL - 13 KW - decision-making KW - human-centered AI design KW - human factors KW - experts versus novices differences KW - optometry KW - glaucoma diagnosis KW - experts versus novices KW - glaucoma KW - eye disease KW - vision KW - vision impairment KW - comparative analysis KW - methodology KW - optometrist KW - artificial intelligence KW - AI KW - diagnostic accuracy KW - consistency KW - clinical data KW - risk assessment KW - progression analysis N2 - Background: While expert optometrists tend to rely on a deep understanding of the disease and intuitive pattern recognition, those with less experience may depend more on extensive data, comparisons, and external guidance. Understanding these variations is important for developing artificial intelligence (AI) systems that can effectively support optometrists with varying degrees of experience and minimize decision inconsistencies. Objective: The main objective of this study is to identify and analyze the variations in diagnostic decision-making approaches between novice and expert optometrists. By understanding these variations, we aim to provide guidelines for the development of AI systems that can support optometrists with varying levels of expertise. These guidelines will assist in developing AI systems for glaucoma diagnosis, ultimately enhancing the diagnostic accuracy of optometrists and minimizing inconsistencies in their decisions. Methods: We conducted in-depth interviews with 14 optometrists using within-subject design, including both novices and experts, focusing on their approaches to glaucoma diagnosis. The responses were coded and analyzed using a mixed method approach incorporating both qualitative and quantitative analysis. 
Statistical tests such as Mann-Whitney U and chi-square tests were used to find significance in intergroup variations. These findings were further supported by themes extracted through qualitative analysis, which helped to identify decision-making patterns and understand variations in their approaches. Results: Both groups showed low concordance rates with clinical diagnosis, with experts showing almost double (7/35, 20%) the concordance rate with limited data in comparison to novices (7/69, 10%), highlighting the impact of experience and data availability on clinical judgment; this rate increased to nearly 40% for both groups (experts: 5/12, 42% and novices: 8/21, 38%) when they had access to complete historical data of the patient. We also found statistically significant intergroup differences between the first visits and subsequent visits with a P value of less than .05 on the Mann-Whitney U test in many assessments. Furthermore, approaches to the exam assessment and decision differed significantly: experts emphasized comprehensive risk assessments and progression analysis, demonstrating cognitive efficiency and intuitive decision-making, while novices relied more on structured, analytical methods and external references. Additionally, significant variations in patient follow-up times were observed, with a P value of <.001 on the chi-square test, showing a stronger influence of experience on follow-up time decisions. Conclusions: The study highlights significant variations in the decision-making process of novice and expert optometrists in glaucoma diagnosis, with experience playing a key role in accuracy, approach, and management. These findings demonstrate the critical need for AI systems tailored to varying levels of expertise. They also provide insights for the future design of AI systems aimed at enhancing the diagnostic accuracy of optometrists and consistency across different expertise levels, ultimately improving patient outcomes in optometric practice. 
UR - https://medinform.jmir.org/2025/1/e63109 UR - http://dx.doi.org/10.2196/63109 UR - http://www.ncbi.nlm.nih.gov/pubmed/39879089 ID - info:doi/10.2196/63109 ER - TY - JOUR AU - Scribano Parada, Paz María de la AU - González Palau, Fátima AU - Valladares Rodríguez, Sonia AU - Rincon, Mariano AU - Rico Barroeta, José Maria AU - García Rodriguez, Marta AU - Bueno Aguado, Yolanda AU - Herrero Blanco, Ana AU - Díaz-López, Estela AU - Bachiller Mayoral, Margarita AU - Losada Durán, Raquel PY - 2025/1/28 TI - Preclinical Cognitive Markers of Alzheimer Disease and Early Diagnosis Using Virtual Reality and Artificial Intelligence: Literature Review JO - JMIR Med Inform SP - e62914 VL - 13 KW - dementia KW - Alzheimer disease KW - mild cognitive impairment KW - virtual reality KW - artificial intelligence KW - early detection KW - qualitative review KW - literature review KW - AI N2 - Background: This review explores the potential of virtual reality (VR) and artificial intelligence (AI) to identify preclinical cognitive markers of Alzheimer disease (AD). By synthesizing recent studies, it aims to advance early diagnostic methods to detect AD before significant symptoms occur. Objective: Research emphasizes the significance of early detection in AD during the preclinical phase, which does not involve cognitive impairment but nevertheless requires reliable biomarkers. Current biomarkers face challenges, prompting the exploration of cognitive behavior indicators beyond episodic memory. Methods: Using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we searched Scopus, PubMed, and Google Scholar for studies on neuropsychiatric disorders utilizing conversational data. Results: Following an analysis of 38 selected articles, we highlight verbal episodic memory as a sensitive preclinical AD marker, with supporting evidence from neuroimaging and genetic profiling. 
Executive functions precede memory decline, while processing speed is a significant correlate. The potential of VR remains underexplored, and AI algorithms offer a multidimensional approach to early neurocognitive disorder diagnosis. Conclusions: Emerging technologies like VR and AI show promise for preclinical diagnostics, but thorough validation and regulation for clinical safety and efficacy are necessary. Continued technological advancements are expected to enhance early detection and management of AD. UR - https://medinform.jmir.org/2025/1/e62914 UR - http://dx.doi.org/10.2196/62914 ID - info:doi/10.2196/62914 ER - TY - JOUR AU - Martinez, Stanford AU - Ramirez-Tamayo, Carolina AU - Akhter Faruqui, Hasib Syed AU - Clark, Kal AU - Alaeddini, Adel AU - Czarnek, Nicholas AU - Aggarwal, Aarushi AU - Emamzadeh, Sahra AU - Mock, R. Jeffrey AU - Golob, J. Edward PY - 2025/1/22 TI - Discrimination of Radiologists' Experience Level Using Eye-Tracking Technology and Machine Learning: Case Study JO - JMIR Form Res SP - e53928 VL - 9 KW - machine learning KW - eye-tracking KW - experience level determination KW - radiology education KW - search pattern feature extraction KW - search pattern KW - radiology KW - classification KW - gaze KW - fixation KW - education KW - experience KW - spatio-temporal KW - image KW - x-ray KW - eye movement N2 - Background: Perception-related errors comprise most diagnostic mistakes in radiology. To mitigate this problem, radiologists use personalized and high-dimensional visual search strategies, otherwise known as search patterns. Qualitative descriptions of these search patterns, which involve the physician verbalizing or annotating the order in which he or she analyzes the image, can be unreliable due to discrepancies between what is reported and the actual visual patterns. This discrepancy can interfere with quality improvement interventions and negatively impact patient care. 
Objective: The objective of this study is to provide an alternative method for distinguishing between radiologists by means of captured eye-tracking data such that the raw gaze (or processed fixation data) can be used to discriminate users based on subconscious behavior in visual inspection. Methods: We present a novel discretized feature encoding based on spatiotemporal binning of fixation data for efficient geometric alignment and temporal ordering of eye movement when reading chest x-rays. The encoded features of the eye-fixation data are used by machine learning classifiers to discriminate between faculty and trainee radiologists. A clinical trial case study was conducted using metrics such as the area under the curve, accuracy, F1-score, sensitivity, and specificity to evaluate the discriminability between the 2 groups regarding their level of experience. The classification performance was then compared with state-of-the-art methodologies. In addition, a repeatability experiment using a separate dataset, experimental protocol, and eye tracker was performed with 8 participants to evaluate the robustness of the proposed approach. Results: The numerical results from both experiments demonstrate that classifiers using the proposed feature encoding methods outperform the current state-of-the-art in differentiating between radiologists in terms of experience level. An average performance gain of 6.9% is observed compared with traditional features while classifying experience levels of radiologists. This gain in accuracy is also substantial across different eye tracker–collected datasets, with improvements of 6.41% using the Tobii eye tracker and 7.29% using the EyeLink eye tracker. These results signify the potential impact of the proposed method for identifying radiologists' level of expertise and those who would benefit from additional training. 
Conclusions: The effectiveness of the proposed spatiotemporal discretization approach, validated across diverse datasets and various classification metrics, underscores its potential for objective evaluation, informing targeted interventions and training strategies in radiology. This research advances reliable assessment tools, addressing challenges in perception-related errors to enhance patient care outcomes. UR - https://formative.jmir.org/2025/1/e53928 UR - http://dx.doi.org/10.2196/53928 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53928 ER - TY - JOUR AU - Lee, Hocheol AU - Park, Myung-Bae AU - Won, Young-Joo PY - 2025/1/21 TI - AI Machine Learning–Based Diabetes Prediction in Older Adults in South Korea: Cross-Sectional Analysis JO - JMIR Form Res SP - e57874 VL - 9 KW - diabetes KW - prediction model KW - super-aging population KW - extreme gradient boosting model KW - geriatrics KW - older adults KW - aging KW - artificial intelligence KW - machine learning N2 - Background: Diabetes is prevalent in older adults, and machine learning algorithms could help predict diabetes in this population. Objective: This study determined diabetes risk factors among older adults aged ≥60 years using machine learning algorithms and selected an optimized prediction model. Methods: This cross-sectional study was conducted on 3084 older adults aged ≥60 years in Seoul from January to November 2023. Data were collected using a mobile app (Gosufit) that measured depression, stress, anxiety, basal metabolic rate, oxygen saturation, heart rate, and average daily step count. Health coordinators recorded data on diabetes, hypertension, hyperlipidemia, chronic obstructive pulmonary disease, percent body fat, and percent muscle. The presence of diabetes was the target variable, with various health indicators as predictors. 
Machine learning algorithms, including random forest, gradient boosting model, light gradient boosting model, extreme gradient boosting model, and k-nearest neighbors, were employed for analysis. The dataset was split into 70% training and 30% testing sets. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). Shapley additive explanations (SHAPs) were used for model interpretability. Results: Significant predictors of diabetes included hypertension (χ²1=197.294; P<.001), hyperlipidemia (χ²1=47.671; P<.001), age (mean: diabetes group 72.66 years vs nondiabetes group 71.81 years), stress (mean: diabetes group 42.68 vs nondiabetes group 41.47; t3082=−2.858; P=.004), and heart rate (mean: diabetes group 75.05 beats/min vs nondiabetes group 73.14 beats/min; t3082=−7.948; P<.001). The extreme gradient boosting model (XGBM) demonstrated the best performance, with an accuracy of 84.88%, precision of 77.92%, recall of 66.91%, F1 score of 72.00, and AUC of 0.7957. The SHAP analysis of the top-performing XGBM revealed key predictors for diabetes: hypertension, age, percent body fat, heart rate, hyperlipidemia, basal metabolic rate, stress, and oxygen saturation. Hypertension strongly increased diabetes risk, while advanced age and elevated stress levels also showed significant associations. Hyperlipidemia and higher heart rates further heightened diabetes probability. These results highlight the importance and directional impact of specific features in predicting diabetes, providing valuable insights for risk stratification and targeted interventions. Conclusions: This study focused on modifiable risk factors, providing crucial data for establishing a system for the automated collection of health information and lifelog data from older adults using digital devices at service facilities. 
UR - https://formative.jmir.org/2025/1/e57874 UR - http://dx.doi.org/10.2196/57874 ID - info:doi/10.2196/57874 ER - TY - JOUR AU - Zhang, Haofuzi AU - Zou, Peng AU - Luo, Peng AU - Jiang, Xiaofan PY - 2025/1/20 TI - Machine Learning for the Early Prediction of Delayed Cerebral Ischemia in Patients With Subarachnoid Hemorrhage: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e54121 VL - 27 KW - machine learning KW - subarachnoid hemorrhage KW - delayed cerebral ischemia KW - systematic review N2 - Background: Delayed cerebral ischemia (DCI) is a primary contributor to death after subarachnoid hemorrhage (SAH), with significant incidence. Therefore, early determination of the risk of DCI is an urgent need. Machine learning (ML) has received much attention in clinical practice. Recently, some studies have attempted to apply ML models for early noninvasive prediction of DCI. However, systematic evidence for its predictive accuracy is still lacking. Objective: The aim of this study was to synthesize the prediction accuracy of ML models for DCI to provide evidence for the development or updating of intelligent detection tools. Methods: PubMed, Cochrane, Embase, and Web of Science databases were systematically searched up to May 18, 2023. The risk of bias in the included studies was assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). During the analysis, we discussed the performance of different models in the training and validation sets. Results: We finally included 48 studies containing 16,294 patients with SAH and 71 ML models with logistic regression as the main model type. In the training set, the pooled concordance index (C index), sensitivity, and specificity of all the models were 0.786 (95% CI 0.737-0.835), 0.77 (95% CI 0.69-0.84), and 0.83 (95% CI 0.75-0.89), respectively, while those of the logistic regression models were 0.770 (95% CI 0.724-0.817), 0.75 (95% CI 0.67-0.82), and 0.71 (95% CI 0.63-0.78), respectively. 
In the validation set, the pooled C index, sensitivity, and specificity of all the models were 0.767 (95% CI 0.741-0.793), 0.66 (95% CI 0.53-0.77), and 0.78 (95% CI 0.71-0.84), respectively, while those of the logistic regression models were 0.757 (95% CI 0.715-0.800), 0.59 (95% CI 0.57-0.80), and 0.80 (95% CI 0.71-0.87), respectively. Conclusions: ML models appear to have relatively desirable power for early noninvasive prediction of DCI after SAH. However, enhancing the prediction sensitivity of these models is challenging. Therefore, efficient, noninvasive, or minimally invasive low-cost predictors should be further explored in future studies to improve the prediction accuracy of ML models. Trial Registration: PROSPERO (CRD42023438399); https://tinyurl.com/yfuuudde UR - https://www.jmir.org/2025/1/e54121 UR - http://dx.doi.org/10.2196/54121 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54121 ER - TY - JOUR AU - De Silva, Upeka AU - Madanian, Samaneh AU - Olsen, Sharon AU - Templeton, Michael John AU - Poellabauer, Christian AU - Schneider, L. Sandra AU - Narayanan, Ajit AU - Rubaiat, Rahmina PY - 2025/1/13 TI - Clinical Decision Support Using Speech Signal Analysis: Systematic Scoping Review of Neurological Disorders JO - J Med Internet Res SP - e63004 VL - 27 KW - digital health KW - health informatics KW - digital biomarker KW - speech analytics KW - artificial intelligence KW - machine learning N2 - Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. 
Increasing efforts are being made to develop speech-based clinical decision support systems. Objective: This systematic scoping review investigated the technological revolution and recent digital clinical speech signal analysis trends to understand the key concepts and research processes from clinical and technical perspectives. Methods: A systematic scoping review was undertaken in 6 databases guided by a set of research questions. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower scope of studies investigating neurological diseases was analyzed using qualitative content analysis. Results: A total of 389 articles met the initial eligibility criteria, of which 72 (18.5%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. The literature explored the potential of speech feature analysis in diagnosing neurological conditions, differentiating between them, assessing their severity, and monitoring their treatment. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing–based speech features (such as wavelet transformation–based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and reliability of speech features. Model evaluations primarily focused on analytical validations. 
A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically. Conclusions: The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance. UR - https://www.jmir.org/2025/1/e63004 UR - http://dx.doi.org/10.2196/63004 UR - http://www.ncbi.nlm.nih.gov/pubmed/39804693 ID - info:doi/10.2196/63004 ER - TY - JOUR AU - Zhang, Yong AU - Lu, Xiao AU - Luo, Yan AU - Zhu, Ying AU - Ling, Wenwu PY - 2025/1/9 TI - Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis JO - JMIR Med Inform SP - e63924 VL - 13 KW - chatbots KW - ChatGPT KW - ERNIE Bot KW - performance KW - accuracy rates KW - ultrasound KW - language KW - examination N2 - Background: Artificial intelligence chatbots are being increasingly used for medical inquiries, particularly in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic. Objective: This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers. Methods: We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored based on accuracy rates, whereas subjective questions were rated by 5 experienced doctors using a Likert scale. The data were analyzed in Excel. 
Results: Of the 554 questions included in this study, single-choice questions comprised the largest share (354/554, 64%), followed by short answers (69/554, 12%) and noun explanations (63/554, 11%). The accuracy rates for objective questions ranged from 8.33% to 80%, with true or false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P<.05). Both models showed a performance decline in English, but ERNIE Bot's decline was less significant. The models performed better in terms of basic knowledge, ultrasound methods, and diseases than in terms of ultrasound signs and diagnosis. Conclusions: Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperforms ChatGPT. Users and developers should understand model performance characteristics and select appropriate models for different questions and languages to optimize chatbot use. 
UR - https://medinform.jmir.org/2025/1/e63924 UR - http://dx.doi.org/10.2196/63924 ID - info:doi/10.2196/63924 ER - TY - JOUR AU - Zhuang, Yan AU - Zhang, Junyan AU - Li, Xiuxing AU - Liu, Chao AU - Yu, Yue AU - Dong, Wei AU - He, Kunlun PY - 2025/1/6 TI - Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text JO - JMIR Med Inform SP - e63020 VL - 13 KW - BERT KW - bidirectional encoder representations from transformers KW - pretrained language models KW - prompt learning KW - ICD KW - International Classification of Diseases KW - cardiovascular disease KW - few-shot learning KW - multicenter medical data N2 - Background: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby enhancing the efficiency of diagnosis and treatment. However, this task faces challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, have limitations in handling missing values and understanding the context of medical texts, leading to a high error rate. We developed a fully automated pipeline based on the Key–bidirectional encoder representations from transformers (BERT) approach and large-scale medical records for continued pretraining, which effectively converts long free text into standard ICD codes. By adjusting parameter settings, such as mixed templates and soft verbalizers, the model can adapt flexibly to different requirements, enabling task-specific prompt learning. 
Objective: This study aims to propose a prompt learning real-time framework based on pretrained language models that can automatically label long free-text data with ICD-10 codes for cardiovascular diseases without the need for semiautomatic preprocessing. Methods: We integrated 4 components into our framework: a medical pretrained BERT, a keyword filtration BERT in a functional order, a fine-tuning phase, and task-specific prompt learning utilizing mixed templates and soft verbalizers. This framework was validated on a multicenter medical dataset for the automated ICD coding of 13 common cardiovascular diseases (584,969 records). Its performance was compared against robustly optimized BERT pretraining approach, extreme language network, and various BERT-based fine-tuning pipelines. Additionally, we evaluated the framework's performance under different prompt learning and fine-tuning settings. Furthermore, few-shot learning experiments were conducted to assess the feasibility and efficacy of our framework in scenarios involving small- to mid-sized datasets. Results: Compared with traditional pretraining and fine-tuning pipelines, our approach achieved a higher micro–F1-score of 0.838 and a macro–area under the receiver operating characteristic curve (macro-AUC) of 0.958, which is 10% higher than other methods. Among different prompt learning setups, the combination of mixed templates and soft verbalizers yielded the best performance. Few-shot experiments showed that performance stabilized and the AUC peaked at 500 shots. Conclusions: These findings underscore the effectiveness and superior performance of prompt learning and fine-tuning for subtasks within pretrained language models in medical practice. Our real-time ICD coding pipeline efficiently converts detailed medical free text into standardized labels, offering promising applications in clinical decision-making. 
It can assist doctors unfamiliar with the ICD coding system in organizing medical record information, thereby accelerating the medical process and enhancing the efficiency of diagnosis and treatment. UR - https://medinform.jmir.org/2025/1/e63020 UR - http://dx.doi.org/10.2196/63020 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63020 ER - TY - JOUR AU - Jiang, Xiangkui AU - Wang, Bingquan PY - 2024/12/31 TI - Enhancing Clinical Decision Making by Predicting Readmission Risk in Patients With Heart Failure Using Machine Learning: Predictive Model Development Study JO - JMIR Med Inform SP - e58812 VL - 12 KW - prediction model KW - heart failure KW - hospital readmission KW - machine learning KW - cardiology KW - admissions KW - hospitalization N2 - Background: Patients with heart failure frequently face the possibility of rehospitalization following an initial hospital stay, placing a significant burden on both patients and health care systems. Accurate predictive tools are crucial for guiding clinical decision-making and optimizing patient care. However, the effectiveness of existing models tailored specifically to the Chinese population is still limited. Objective: This study aimed to formulate a predictive model for assessing the likelihood of readmission among patients diagnosed with heart failure. Methods: In this study, we analyzed data from 1948 patients with heart failure in a hospital in Sichuan Province between 2016 and 2019. By applying 3 variable selection strategies, 29 relevant variables were identified. Subsequently, we constructed 6 predictive models using different algorithms: logistic regression, support vector machine, gradient boosting machine, Extreme Gradient Boosting, multilayer perceptron, and graph convolutional networks. 
Results: The graph convolutional network model showed the highest prediction accuracy with an area under the receiver operating characteristic curve of 0.831, accuracy of 75%, sensitivity of 52.12%, and specificity of 90.25%. Conclusions: The model crafted in this study proves its effectiveness in forecasting the likelihood of readmission among patients with heart failure, thus serving as a crucial reference for clinical decision-making. UR - https://medinform.jmir.org/2024/1/e58812 UR - http://dx.doi.org/10.2196/58812 ID - info:doi/10.2196/58812 ER - TY - JOUR AU - Wyatt, Sage AU - Lunde Markussen, Dagfinn AU - Haizoune, Mounir AU - Vestbø, Strand Anders AU - Sima, Tilahun Yeneabeba AU - Sandboe, Ilene Maria AU - Landschulze, Marcus AU - Bartsch, Hauke AU - Sauer, Martin Christopher PY - 2024/12/31 TI - Leveraging Machine Learning to Identify Subgroups of Misclassified Patients in the Emergency Department: Multicenter Proof-of-Concept Study JO - J Med Internet Res SP - e56382 VL - 26 KW - emergency department KW - triage KW - machine learning KW - real world evidence KW - random forest KW - classification KW - subgroup KW - misclassification KW - patient KW - multi-center KW - proof-of-concept KW - hospital KW - clinical feature KW - Norway KW - retrospective KW - cohort study KW - electronic health system KW - electronic health record N2 - Background: Hospitals use triage systems to prioritize the needs of patients within available resources. Misclassification of a patient can lead to either adverse outcomes in a patient who did not receive appropriate care in the case of undertriage or a waste of hospital resources in the case of overtriage. Recent advances in machine learning algorithms allow for the quantification of variables important to under- and overtriage. Objective: This study aimed to identify clinical features most strongly associated with triage misclassification using a machine learning classification model to capture nonlinear relationships. 
Methods: Multicenter retrospective cohort data from 2 large regional hospitals in Norway were extracted. The South African Triage System is used at Bergen University Hospital, and the Rapid Emergency Triage and Treatment System is used at Trondheim University Hospital. Variables retrieved included triage score, age, sex, arrival time, subject area affiliation, reason for emergency department contact, discharge location, level of care, and time of death. Random forest classification models were used to identify features with the strongest association with overtriage and undertriage in clinical practice in Bergen and Trondheim. We reported variable importance as SHAP (SHapley Additive exPlanations)-values. Results: We collected data on 205,488 patient records from Bergen University Hospital and 304,997 patient records from Trondheim University Hospital. Overall, overtriage was very uncommon at both hospitals (all <0.1%), with undertriage differing between both locations, with 0.8% at Bergen and 0.2% at Trondheim University Hospital. Demographics were similar for both hospitals. However, the percentage given a high-priority triage score (red or orange) was higher in Bergen (24%) compared with 9% in Trondheim. The clinical referral department was found to be the variable with the strongest association with undertriage (mean SHAP +0.62 and +0.37 for Bergen and Trondheim, respectively). Conclusions: We identified subgroups of patients consistently undertriaged using 2 common triage systems. While the importance of clinical patient characteristics to triage misclassification varies by triage system and location, we found consistent evidence between the two locations that the clinical referral department is the most important variable associated with triage misclassification. Replication of this approach at other centers could help to further improve triage scoring systems and improve patient care worldwide. 
UR - https://www.jmir.org/2024/1/e56382 UR - http://dx.doi.org/10.2196/56382 UR - http://www.ncbi.nlm.nih.gov/pubmed/39451101 ID - info:doi/10.2196/56382 ER - TY - JOUR AU - Wang, Wei AU - Chen, Xiang AU - Xu, Licong AU - Huang, Kai AU - Zhao, Shuang AU - Wang, Yong PY - 2024/12/27 TI - Artificial Intelligence–Aided Diagnosis System for the Detection and Classification of Private-Part Skin Diseases: Decision Analytical Modeling Study JO - J Med Internet Res SP - e52914 VL - 26 KW - artificial intelligence-aided diagnosis KW - private parts KW - skin disease KW - knowledge graph KW - dermatology KW - classification KW - artificial intelligence KW - AI KW - diagnosis N2 - Background: Private-part skin diseases (PPSDs) can cause a patient's stigma, which may hinder the early diagnosis of these diseases. Artificial intelligence (AI) is an effective tool to improve the early diagnosis of PPSDs, especially in preventing the deterioration of skin tumors in private parts such as Paget disease. However, to our knowledge, there is currently no research on using AI to identify PPSDs due to the complex backgrounds of the lesion areas and the challenges in data collection. Objective: This study aimed to develop and evaluate an AI-aided diagnosis system for the detection and classification of PPSDs: aiding patients in self-screening and supporting dermatologists' diagnostic enhancement. Methods: In this decision analytical modeling study, a 2-stage AI-aided diagnosis system was developed to classify PPSDs. In the first stage, a multitask detection network was trained to automatically detect and classify skin lesions (type, color, and shape). In the second stage, we proposed a knowledge graph based on dermatology expertise and constructed a decision network to classify seven PPSDs (condyloma acuminatum, Paget disease, eczema, pearly penile papules, genital herpes, syphilis, and Bowen disease). A reader study with 13 dermatologists of different experience levels was conducted. 
Dermatologists were asked to classify the testing cohort under reading room conditions, first without and then with system support. This AI-aided diagnostic study used the data of 635 patients from 2 institutes between July 2019 and April 2022. The data of Institute 1 contained 2701 skin lesion samples from 520 patients, which were used for the training of the multitask detection network in the first stage. In addition, the data of Institute 2 consisted of 115 clinical images and the corresponding medical records, which were used to test the whole 2-stage AI-aided diagnosis system. Results: On the test data of Institute 2, the proposed system achieved an average precision, recall, and F1-score of 0.81, 0.86, and 0.83, respectively, outperforming existing advanced algorithms. For the reader performance test, our system improved the average F1-score of the junior, intermediate, and senior dermatologists by 16%, 7%, and 4%, respectively. Conclusions: In this study, we constructed the first skin-lesion-based dataset and developed the first AI-aided diagnosis system for PPSDs. This system provides the final diagnosis result by simulating the diagnostic process of dermatologists. Compared with existing advanced algorithms, this system is more accurate in identifying PPSDs. Overall, our system can not only help patients achieve self-screening and alleviate their stigma but also assist dermatologists in diagnosing PPSDs. UR - https://www.jmir.org/2024/1/e52914 UR - http://dx.doi.org/10.2196/52914 UR - http://www.ncbi.nlm.nih.gov/pubmed/39729353 ID - info:doi/10.2196/52914 ER - TY - JOUR AU - Stephan, Daniel AU - Bertsch, Annika AU - Burwinkel, Matthias AU - Vinayahalingam, Shankeeth AU - Al-Nawas, Bilal AU - Kämmerer, W. 
Peer AU - Thiem, GE Daniel PY - 2024/12/23 TI - AI in Dental Radiology: Improving the Efficiency of Reporting With ChatGPT: Comparative Study JO - J Med Internet Res SP - e60684 VL - 26 KW - artificial intelligence KW - ChatGPT KW - radiology report KW - dental radiology KW - dental orthopantomogram KW - panoramic radiograph KW - dental KW - radiology KW - chatbot KW - medical documentation KW - medical application KW - imaging KW - disease detection KW - clinical decision support KW - natural language processing KW - medical licensing KW - dentistry KW - patient care N2 - Background: Structured and standardized documentation is critical for accurately recording diagnostic findings, treatment plans, and patient progress in health care. Manual documentation can be labor-intensive and error-prone, especially under time constraints, prompting interest in the potential of artificial intelligence (AI) to automate and optimize these processes, particularly in medical documentation. Objective: This study aimed to assess the effectiveness of ChatGPT (OpenAI) in generating radiology reports from dental panoramic radiographs, comparing the performance of AI-generated reports with those manually created by dental students. Methods: A total of 100 dental students were tasked with analyzing panoramic radiographs and generating radiology reports either manually or assisted by ChatGPT using a standardized prompt derived from a diagnostic checklist. Results: Reports generated by ChatGPT showed a high degree of textual similarity to reference reports; however, they often lacked critical diagnostic information typically included in reports authored by students. Despite this, the AI-generated reports were consistently error-free and matched the readability of student-generated reports. Conclusions: The findings from this study suggest that ChatGPT has considerable potential for generating radiology reports, although it currently faces challenges in accuracy and reliability. 
This underscores the need for further refinement in the AI's prompt design and the development of robust validation mechanisms to enhance its use in clinical settings. UR - https://www.jmir.org/2024/1/e60684 UR - http://dx.doi.org/10.2196/60684 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60684 ER - TY - JOUR AU - Cabanillas Silva, Patricia AU - Sun, Hong AU - Rezk, Mohamed AU - Roccaro-Waldmeyer, M. Diana AU - Fliegenschmidt, Janis AU - Hulde, Nikolai AU - von Dossow, Vera AU - Meesseman, Laurent AU - Depraetere, Kristof AU - Stieg, Joerg AU - Szymanowsky, Ralph AU - Dahlweid, Fried-Michael PY - 2024/12/13 TI - Longitudinal Model Shifts of Machine Learning-Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals JO - J Med Internet Res SP - e51409 VL - 26 KW - model shift KW - model monitoring KW - prediction models KW - acute kidney injury KW - AKI KW - sepsis KW - delirium KW - decision curve analysis KW - DCA N2 - Background: In recent years, machine learning (ML)-based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performances of such models heavily rely on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance. Objective: This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases (delirium, sepsis, and acute kidney injury [AKI]) from 2 hospitals (M and H) with different patient populations and investigate potential model deterioration during the COVID-19 pandemic period. 
Methods: We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve. Results: The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=-1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period. Conclusions: Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. 
Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Nevertheless, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making. UR - https://www.jmir.org/2024/1/e51409 UR - http://dx.doi.org/10.2196/51409 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/51409 ER - TY - JOUR AU - Zou, Zhuan AU - Chen, Bin AU - Xiao, Dongqiong AU - Tang, Fajuan AU - Li, Xihong PY - 2024/12/11 TI - Accuracy of Machine Learning in Detecting Pediatric Epileptic Seizures: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e55986 VL - 26 KW - epileptic seizures KW - machine learning KW - deep learning KW - electroencephalogram KW - EEG KW - children KW - pediatrics KW - epilepsy KW - detection N2 - Background: Real-time monitoring of pediatric epileptic seizures poses a significant challenge in clinical practice. In recent years, machine learning (ML) has attracted substantial attention from researchers for diagnosing and treating neurological diseases, leading to its application for detecting pediatric epileptic seizures. However, systematic evidence substantiating its feasibility remains limited. Objective: This systematic review aimed to consolidate the existing evidence regarding the effectiveness of ML in monitoring pediatric epileptic seizures in an effort to provide an evidence-based foundation for the development and enhancement of intelligent tools in the future. Methods: We conducted a systematic search of the PubMed, Cochrane, Embase, and Web of Science databases for original studies focused on the detection of pediatric epileptic seizures using ML, with a cutoff date of August 27, 2023. 
The risk of bias in eligible studies was assessed using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2). Meta-analyses were performed to evaluate the C-index and the diagnostic fourfold (2×2) table, using a bivariate mixed-effects model for the latter. We also examined publication bias for the C-index by using funnel plots and the Egger test. Results: This systematic review included 28 original studies, with 15 studies on ML and 13 on deep learning (DL). All these models were based on electroencephalography data of children. The pooled C-index, sensitivity, specificity, and accuracy of ML in the training set were 0.76 (95% CI 0.69-0.82), 0.77 (95% CI 0.73-0.80), 0.74 (95% CI 0.70-0.77), and 0.75 (95% CI 0.72-0.77), respectively. In the validation set, the pooled C-index, sensitivity, specificity, and accuracy of ML were 0.73 (95% CI 0.67-0.79), 0.88 (95% CI 0.83-0.91), 0.83 (95% CI 0.71-0.90), and 0.78 (95% CI 0.73-0.82), respectively. Meanwhile, the pooled C-index of DL in the validation set was 0.91 (95% CI 0.88-0.94), with sensitivity, specificity, and accuracy being 0.89 (95% CI 0.85-0.91), 0.91 (95% CI 0.88-0.93), and 0.89 (95% CI 0.86-0.92), respectively. Conclusions: Our systematic review demonstrates promising accuracy of artificial intelligence methods in epilepsy detection. DL appears to offer higher detection accuracy than ML. These findings support the development of DL-based early-warning tools in future research. 
Trial Registration: PROSPERO CRD42023467260; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42023467260 UR - https://www.jmir.org/2024/1/e55986 UR - http://dx.doi.org/10.2196/55986 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55986 ER - TY - JOUR AU - Du, Jianchao AU - Ding, Junyao AU - Wu, Yuan AU - Chen, Tianyan AU - Lian, Jianqi AU - Shi, Lei AU - Zhou, Yun PY - 2024/12/9 TI - A Pathological Diagnosis Method for Fever of Unknown Origin Based on Multipath Hierarchical Classification: Model Design and Validation JO - JMIR Form Res SP - e58423 VL - 8 KW - fever of unknown origin KW - FUO KW - intelligent diagnosis KW - machine learning KW - hierarchical classification KW - feature selection KW - model design KW - validation KW - diagnostic KW - prediction model N2 - Background: Fever of unknown origin (FUO) is a significant challenge for the medical community due to its association with a wide range of diseases, the complexity of diagnosis, and the likelihood of misdiagnosis. Machine learning can extract valuable information from the extensive data of patient indicators, aiding doctors in diagnosing the underlying cause of FUO. Objective: The study aims to design a multipath hierarchical classification algorithm to diagnose FUO due to the hierarchical structure of the etiology of FUO. In addition, to improve the diagnostic performance of the model, a mechanism for feature selection is added to the model. Methods: The case data of patients with FUO admitted to the First Affiliated Hospital of Xi'an Jiaotong University between 2011 and 2020 in China were used as the dataset for model training and validation. The hierarchical structure tree was then characterized according to etiology. 
The structure included 3 layers, with the top layer representing the FUO, the middle layer dividing the FUO into 5 categories of etiology (bacterial infection, viral infection, other infection, autoimmune diseases, and other noninfection), and the last layer further refining them to 16 etiologies. Finally, ablation experiments were conducted to determine the optimal structure of the proposed method, and comparison experiments to verify its diagnostic performance. Results: According to ablation experiments, the model achieved the best performance with an accuracy of 76.08% when the number of middle paths was 3 and 25% of the features were selected. According to comparison experiments, the proposed model outperformed the comparison methods, in terms of both feature selection and hierarchical classification. Specifically, brucellosis had an accuracy of 100%, and liver abscess, viral infection, and lymphoma all had an accuracy of more than 80%. Conclusions: In this study, a novel multipath feature selection and hierarchical classification model was designed for the diagnosis of FUO and was adequately evaluated quantitatively. Despite some limitations, this model enriches the exploration of FUO in machine learning and assists physicians in their work. 
UR - https://formative.jmir.org/2024/1/e58423 UR - http://dx.doi.org/10.2196/58423 ID - info:doi/10.2196/58423 ER - TY - JOUR AU - Sorich, Joseph Michael AU - Mangoni, Aleksander Arduino AU - Bacchi, Stephen AU - Menz, Douglas Bradley AU - Hopkins, Mark Ashley PY - 2024/12/6 TI - The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance JO - J Med Internet Res SP - e67409 VL - 26 KW - generative artificial intelligence KW - large language models KW - triage KW - diagnosis KW - accuracy KW - physician KW - ChatGPT KW - diagnostic KW - primary care KW - physicians KW - prediction KW - medical care KW - internet KW - LLMs KW - AI UR - https://www.jmir.org/2024/1/e67409 UR - http://dx.doi.org/10.2196/67409 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/67409 ER - TY - JOUR AU - Sakamoto, Tetsu AU - Harada, Yukinori AU - Shimizu, Taro PY - 2024/11/27 TI - Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians' Diagnostic Accuracy: Quasi-Experimental Study JO - JMIR Form Res SP - e58666 VL - 8 KW - trust calibration KW - artificial intelligence KW - diagnostic accuracy KW - diagnostic decision support KW - decision support KW - diagnosis KW - diagnostic KW - chart KW - history KW - reliable KW - reliability KW - accurate KW - accuracy KW - AI N2 - Background: Diagnostic errors are significant problems in medical care. Despite the usefulness of artificial intelligence (AI)-based diagnostic decision support systems, the overreliance of physicians on AI-generated diagnoses may lead to diagnostic errors. Objective: We investigated the safe use of AI-based diagnostic decision support systems with trust calibration by adjusting trust levels to match the actual reliability of AI. Methods: A quasi-experimental study was conducted at Dokkyo Medical University, Japan, with physicians allocated (1:1) to the intervention and control groups. 
A total of 20 clinical cases were created based on the medical histories recorded by an AI-driven automated medical history-taking system for actual patients who visited a community-based hospital in Japan. The participants reviewed these 20 case histories together with an AI-generated list of 10 differential diagnoses and provided 1 to 3 possible diagnoses. In the intervention group, physicians were asked whether the final diagnosis was in the AI-generated list of 10 differential diagnoses, which served as the trust calibration. We analyzed the diagnostic accuracy of physicians and the correctness of the trust calibration in the intervention group. We also investigated the relationship between the accuracy of the trust calibration and the diagnostic accuracy of physicians, and the physicians' confidence level regarding the use of AI. Results: Among the 20 physicians assigned to the intervention (n=10) and control (n=10) groups, the mean age was 30.9 (SD 3.9) years and 31.7 (SD 4.2) years, the proportion of men was 80% and 60%, and the mean postgraduate year was 5.8 (SD 2.9) and 7.2 (SD 4.6), respectively, with no significant differences. The physicians' diagnostic accuracy was 41.5% in the intervention group and 46% in the control group, with no significant difference (95% CI -0.75 to 2.55; P=.27). The overall accuracy of the trust calibration was only 61.5%, and despite correct calibration, the diagnostic accuracy was 54.5%. In the multivariate logistic regression model, the accuracy of the trust calibration was a significant contributor to the diagnostic accuracy of physicians (adjusted odds ratio 5.90, 95% CI 2.93-12.46; P<.001). The mean confidence level for AI was 72.5% in the intervention group and 45% in the control group, with no significant difference. Conclusions: Trust calibration did not significantly improve physicians' 
diagnostic accuracy when considering the differential diagnoses generated by reading medical histories and the possible differential diagnosis lists of an AI-driven automated medical history-taking system. As this was a formative study, the small sample size and suboptimal trust calibration methods may have contributed to the lack of significant differences. This study highlights the need for a larger sample size and the implementation of supportive measures of trust calibration. UR - https://formative.jmir.org/2024/1/e58666 UR - http://dx.doi.org/10.2196/58666 ID - info:doi/10.2196/58666 ER - TY - JOUR AU - Drogt, Jojanneke AU - Milota, Megan AU - Veldhuis, Wouter AU - Vos, Shoko AU - Jongsma, Karin PY - 2024/11/21 TI - The Promise of AI for Image-Driven Medicine: Qualitative Interview Study of Radiologists' and Pathologists' Perspectives JO - JMIR Hum Factors SP - e52514 VL - 11 KW - digital medicine KW - computer vision KW - medical AI KW - image-driven specialisms KW - qualitative interview study KW - digital health ethics KW - artificial intelligence KW - AI KW - imaging KW - imaging informatics KW - radiology KW - pathology N2 - Background: Image-driven specialisms such as radiology and pathology are at the forefront of medical artificial intelligence (AI) innovation. Many believe that AI will lead to significant shifts in professional roles, so it is vital to investigate how professionals view the pending changes that AI innovation will initiate and incorporate their views in ongoing AI developments. Objective: Our study aimed to gain insights into the perspectives and wishes of radiologists and pathologists regarding the promise of AI. Methods: We have conducted the first qualitative interview study investigating the perspectives of both radiologists and pathologists regarding the integration of AI in their fields. The study design is in accordance with the consolidated criteria for reporting qualitative research (COREQ). 
Results: In total, 21 participants were interviewed for this study (7 pathologists, 10 radiologists, and 4 computer scientists). The interviews revealed a diverse range of perspectives on the impact of AI. Respondents discussed various task-specific benefits of AI; yet both pathologists and radiologists agreed that AI had yet to live up to its hype. Overall, our study shows that AI could facilitate welcome changes in the workflows of image-driven professionals and eventually lead to better quality of care. At the same time, these professionals also admitted that many hopes and expectations for AI were unlikely to become a reality in the next decade. Conclusions: This study points to the importance of maintaining a "healthy skepticism" on the promise of AI in imaging specialisms and argues for more structural and inclusive discussions about whether AI is the right technology to solve current problems encountered in daily clinical practice. UR - https://humanfactors.jmir.org/2024/1/e52514 UR - http://dx.doi.org/10.2196/52514 ID - info:doi/10.2196/52514 ER - TY - JOUR AU - Ji, Lu AU - Yao, Yifan AU - Yu, Dandan AU - Chen, Wen AU - Yin, Shanshan AU - Fu, Yun AU - Tang, Shangfeng AU - Yao, Lan PY - 2024/11/20 TI - Performance of a Full-Coverage Cervical Cancer Screening Program Using an Artificial Intelligence- and Cloud-Based Diagnostic System: Observational Study of an Ultralarge Population JO - J Med Internet Res SP - e51477 VL - 26 KW - full coverage KW - cervical cancer screening KW - artificial intelligence KW - primary health institutions KW - accessibility KW - efficiency N2 - Background: The World Health Organization has set a global strategy to eliminate cervical cancer, emphasizing the need for cervical cancer screening coverage to reach 70%. 
In response, China has developed an action plan to accelerate the elimination of cervical cancer, with Hubei province implementing China's first provincial full-coverage screening program using an artificial intelligence (AI) and cloud-based diagnostic system. Objective: This study aimed to evaluate the performance of AI technology in this full-coverage screening program. The evaluation indicators included accessibility, screening efficiency, diagnostic quality, and program cost. Methods: Characteristics of 1,704,461 individuals screened from July 2022 to January 2023 were used to analyze accessibility and AI screening efficiency. A random sample of 220 individuals was used for external diagnostic quality control. The costs of different participating screening institutions were assessed. Results: Cervical cancer screening services were extended to all administrative districts, especially in rural areas. Rural women had the highest participation rate at 67.54% (1,147,839/1,699,591). Approximately 1.7 million individuals were screened, achieving a cumulative coverage of 13.45% in about 6 months. Full-coverage programs could be achieved by AI technology in approximately 1 year, which was 87.5 times more efficient than the manual reading of slides. The sample compliance rate was as high as 99.1%, and compliance rates for positive, negative, and pathology biopsy reviews exceeded 96%. The cost of this program was CN ¥49 (average exchange rate in 2022: US $1=CN ¥6.7261) per person, with the primary screening institution and the third-party testing institute receiving CN ¥19 and CN ¥27, respectively. Conclusions: AI-assisted diagnosis has proven to be accessible, efficient, reliable, and low cost, which could support the implementation of full-coverage screening programs, especially in areas with insufficient health resources. 
AI technology served as a crucial tool for rapidly and effectively increasing screening coverage, which would accelerate the achievement of the World Health Organization's goals of eliminating cervical cancer. UR - https://www.jmir.org/2024/1/e51477 UR - http://dx.doi.org/10.2196/51477 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/51477 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Harada, Yukinori AU - Tokumasu, Kazuki AU - Shiraishi, Tatsuya AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2024/11/19 TI - Comparative Analysis of Diagnostic Performance: Differential Diagnosis Lists by LLaMA3 Versus LLaMA2 for Case Reports JO - JMIR Form Res SP - e64844 VL - 8 KW - artificial intelligence KW - clinical decision support system KW - generative artificial intelligence KW - large language models KW - natural language processing KW - NLP KW - AI KW - clinical decision making KW - decision support KW - decision making KW - LLM KW - diagnostic KW - case report KW - diagnosis KW - generative AI KW - LLaMA N2 - Background: Generative artificial intelligence (AI), particularly in the form of large language models, has rapidly developed. The LLaMA series is popular and was recently updated from LLaMA2 to LLaMA3. However, the impacts of the update on diagnostic performance have not been well documented. Objective: We conducted a comparative evaluation of the diagnostic performance in differential diagnosis lists generated by LLaMA3 and LLaMA2 for case reports. Methods: We analyzed case reports published in the American Journal of Case Reports from 2022 to 2023. After excluding nondiagnostic and pediatric cases, we input the remaining cases into LLaMA3 and LLaMA2 using the same prompt and the same adjustable parameters. Diagnostic performance was defined by whether the differential diagnosis lists included the final diagnosis. Multiple physicians independently evaluated whether the final diagnosis was included in the top 10 differentials generated by LLaMA3 and LLaMA2. 
Results: In our comparative evaluation of the diagnostic performance between LLaMA3 and LLaMA2, we analyzed differential diagnosis lists for 392 case reports. The final diagnosis was included in the top 10 differentials generated by LLaMA3 in 79.6% (312/392) of the cases, compared to 49.7% (195/392) for LLaMA2, indicating a statistically significant improvement (P<.001). Additionally, LLaMA3 showed higher performance in including the final diagnosis in the top 5 differentials, observed in 63% (247/392) of cases, compared to LLaMA2's 38% (149/392, P<.001). Furthermore, the top diagnosis was accurately identified by LLaMA3 in 33.9% (133/392) of cases, significantly higher than the 22.7% (89/392) achieved by LLaMA2 (P<.001). The analysis across various medical specialties revealed variations in diagnostic performance with LLaMA3 consistently outperforming LLaMA2. Conclusions: The results reveal that the LLaMA3 model significantly outperforms LLaMA2 in diagnostic performance, with a higher percentage of case reports having the final diagnosis listed within the top 10, top 5, and as the top diagnosis. Overall diagnostic performance improved almost 1.5 times from LLaMA2 to LLaMA3. These findings support the rapid development and continuous refinement of generative AI systems to enhance diagnostic processes in medicine. However, these findings should be carefully interpreted for clinical application, as generative AI, including the LLaMA series, has not been approved for medical applications such as AI-enhanced diagnostics. 
UR - https://formative.jmir.org/2024/1/e64844 UR - http://dx.doi.org/10.2196/64844 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/64844 ER - TY - JOUR AU - Zhu, Jinpu AU - Yang, Fushuang AU - Wang, Yang AU - Wang, Zhongtian AU - Xiao, Yao AU - Wang, Lie AU - Sun, Liping PY - 2024/11/18 TI - Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e57641 VL - 26 KW - machine learning KW - artificial intelligence KW - Kawasaki disease KW - febrile illness KW - coronary artery lesions KW - systematic review KW - meta-analysis N2 - Background: Kawasaki disease (KD) is an acute pediatric vasculitis that can lead to coronary artery aneurysms and severe cardiovascular complications, often presenting with obvious fever in the early stages. In current clinical practice, distinguishing KD from other febrile illnesses remains a significant challenge. In recent years, some researchers have explored the potential of machine learning (ML) methods for the differential diagnosis of KD versus other febrile illnesses, as well as for predicting coronary artery lesions (CALs) in people with KD. However, there is still a lack of systematic evidence to validate their effectiveness. Therefore, we have conducted the first systematic review and meta-analysis to evaluate the accuracy of ML in differentiating KD from other febrile illnesses and in predicting CALs in people with KD, so as to provide evidence-based support for the application of ML in the diagnosis and treatment of KD. Objective: This study aimed to summarize the accuracy of ML in differentiating KD from other febrile illnesses and predicting CALs in people with KD. Methods: PubMed, Cochrane Library, Embase, and Web of Science were systematically searched until September 26, 2023. The risk of bias in the included original studies was appraised using the Prediction Model Risk of Bias Assessment Tool (PROBAST). 
Stata (version 15.0; StataCorp) was used for the statistical analysis. Results: A total of 29 studies were incorporated. Of them, 20 used ML to differentiate KD from other febrile illnesses. These studies involved a total of 103,882 participants, including 12,541 people with KD. In the validation set, the pooled concordance index, sensitivity, and specificity were 0.898 (95% CI 0.874-0.922), 0.91 (95% CI 0.83-0.95), and 0.86 (95% CI 0.80-0.90), respectively. Meanwhile, 9 studies used ML for early prediction of the risk of CALs in children with KD. These studies involved a total of 6503 people with KD, of whom 986 had CALs. The pooled concordance index in the validation set was 0.787 (95% CI 0.738-0.835). Conclusions: The diagnostic and predictive factors used in the studies we included were primarily derived from common clinical data. The ML models constructed based on these clinical data demonstrated promising effectiveness in differentiating KD from other febrile illnesses and in predicting coronary artery lesions. Therefore, in future research, we can explore the use of ML methods to identify more efficient predictors and develop tools that can be applied on a broader scale for the differentiation of KD and the prediction of CALs. 
UR - https://www.jmir.org/2024/1/e57641 UR - http://dx.doi.org/10.2196/57641 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57641 ER - TY - JOUR AU - Cho, Na Ha AU - Jun, Joon Tae AU - Kim, Young-Hak AU - Kang, Heejun AU - Ahn, Imjin AU - Gwon, Hansle AU - Kim, Yunha AU - Seo, Jiahn AU - Choi, Heejung AU - Kim, Minkyoung AU - Han, Jiye AU - Kee, Gaeun AU - Park, Seohyun AU - Ko, Soyoung PY - 2024/11/18 TI - Task-Specific Transformer-Based Language Models in Health Care: Scoping Review JO - JMIR Med Inform SP - e49724 VL - 12 KW - transformer-based language models KW - medicine KW - health care KW - medical language model N2 - Background: Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. Objective: This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. Methods: We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. 
Results: Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. Conclusions: This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics. UR - https://medinform.jmir.org/2024/1/e49724 UR - http://dx.doi.org/10.2196/49724 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/49724 ER - TY - JOUR AU - Chustecki, Margaret PY - 2024/11/18 TI - Benefits and Risks of AI in Health Care: Narrative Review JO - Interact J Med Res SP - e53616 VL - 13 KW - artificial intelligence KW - safety risks KW - biases KW - AI KW - benefit KW - risk KW - health care KW - safety KW - ethics KW - transparency KW - data privacy KW - accuracy N2 - Background: The integration of artificial intelligence (AI) into health care has the potential to transform the industry, but it also raises ethical, regulatory, and safety concerns. 
This review paper provides an in-depth examination of the benefits and risks associated with AI in health care, with a focus on issues like biases, transparency, data privacy, and safety. Objective: This study aims to evaluate the advantages and drawbacks of incorporating AI in health care. This assessment centers on the potential biases in AI algorithms, transparency challenges, data privacy issues, and safety risks in health care settings. Methods: Studies included in this review were selected based on their relevance to AI applications in health care, focusing on ethical, regulatory, and safety considerations. Inclusion criteria encompassed peer-reviewed articles, reviews, and relevant research papers published in English. Exclusion criteria included non-peer-reviewed articles, editorials, and studies not directly related to AI in health care. A comprehensive literature search was conducted across 8 databases: OVID MEDLINE, OVID Embase, OVID PsycINFO, EBSCO CINAHL Plus with Full Text, ProQuest Sociological Abstracts, ProQuest Philosopher's Index, ProQuest Advanced Technologies & Aerospace, and Wiley Cochrane Library. The search was last updated on June 23, 2023. Results were synthesized using qualitative methods to identify key themes and findings related to the benefits and risks of AI in health care. Results: The literature search yielded 8796 articles. After removing duplicates and applying the inclusion and exclusion criteria, 44 studies were included in the qualitative synthesis. This review highlights the significant promise that AI holds in health care, such as enhancing health care delivery by providing more accurate diagnoses, personalized treatment plans, and efficient resource allocation. However, persistent concerns remain, including biases ingrained in AI algorithms, a lack of transparency in decision-making, potential compromises of patient data privacy, and safety risks associated with AI implementation in clinical settings. 
Conclusions: While AI presents the opportunity for a health care revolution, it is imperative to address the ethical, regulatory, and safety challenges linked to its integration. Proactive measures are required to ensure that AI technologies are developed and deployed responsibly, striking a balance between innovation and the safeguarding of patient well-being. UR - https://www.i-jmr.org/2024/1/e53616 UR - http://dx.doi.org/10.2196/53616 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53616 ER - TY - JOUR AU - Zhang, Cheng AU - Liu, Shanshan AU - Zhou, Xingyu AU - Zhou, Siyu AU - Tian, Yinglun AU - Wang, Shenglin AU - Xu, Nanfang AU - Li, Weishi PY - 2024/11/15 TI - Examining the Role of Large Language Models in Orthopedics: Systematic Review JO - J Med Internet Res SP - e59607 VL - 26 KW - large language model KW - LLM KW - orthopedics KW - generative pretrained transformer KW - GPT KW - ChatGPT KW - digital health KW - clinical practice KW - artificial intelligence KW - AI KW - generative AI KW - Bard N2 - Background: Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent. Objective: The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges. 
Methods: PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. The terms, which included variants of "large language model," "generative artificial intelligence," "ChatGPT," and "orthopaedics," were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment. Results: A total of 68 studies were selected. The application of LLMs in orthopedics involved the fields of clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to the field of management. Of the 68 studies, only 8 (12%) recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of the LLMs' performance across the different studies. For diagnostic tasks alone, the accuracy ranged from 55% to 93%. When performing disease classification tasks, ChatGPT with GPT-4's accuracy ranged from 2% to 100%. With regard to answering questions in orthopedic examinations, the scores ranged from 45% to 73.6% due to differences in models and test selections. Conclusions: LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be a potential approach to effectively enhance work efficiency at present. 
More high-quality clinical trials are needed in the future, aiming to identify optimal applications of LLMs and advance orthopedics toward higher efficiency and precision. UR - https://www.jmir.org/2024/1/e59607 UR - http://dx.doi.org/10.2196/59607 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59607 ER - TY - JOUR AU - An, Jinghui AU - Shi, Fengwu AU - Wang, Huajun AU - Zhang, Hang AU - Liu, Su PY - 2024/11/8 TI - Evaluating the Sensitivity of Wearable Devices in Posttranscatheter Aortic Valve Implantation Functional Assessment JO - JMIR Mhealth Uhealth SP - e65277 VL - 12 KW - aortic valve KW - implantation functional KW - wearable devices UR - https://mhealth.jmir.org/2024/1/e65277 UR - http://dx.doi.org/10.2196/65277 ID - info:doi/10.2196/65277 ER - TY - JOUR AU - Lin, Yu-Chun AU - Yan, Huang-Ting AU - Lin, Chih-Hsueh AU - Chang, Hen-Hong PY - 2024/11/8 TI - Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study JO - J Med Internet Res SP - e58466 VL - 26 KW - frailty phenotypes KW - older adults KW - successful aging KW - vocal biomarkers KW - frailty KW - phenotype KW - vocal biomarker KW - cross-sectional KW - gerontology KW - geriatrics KW - older adult KW - Taiwan KW - energy-based KW - hybrid-based KW - sarcopenia N2 - Background: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice. Objective: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers. Methods: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty–energy, and hybrid-based frailty–sarcopenia. 
Participants were asked to pronounce a sustained vowel "/a/" for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters were used for analyzing changes in voice: the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and the spectral energy ratio (A4). Logistic regression was used to elucidate the prediction model. Results: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty–sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty–energy (OR 1.43, 95% CI 1.02-2.01), respectively. Conclusions: Vocal biomarkers might be potentially useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype that is associated with insufficient energy or reduced muscle function. The assessment of A3 and A4 involves a complex frailty phenotype. 
UR - https://www.jmir.org/2024/1/e58466 UR - http://dx.doi.org/10.2196/58466 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58466 ER - TY - JOUR AU - Wang, Leyao AU - Wan, Zhiyu AU - Ni, Congning AU - Song, Qingyuan AU - Li, Yang AU - Clayton, Ellen AU - Malin, Bradley AU - Yin, Zhijun PY - 2024/11/7 TI - Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review JO - J Med Internet Res SP - e22769 VL - 26 KW - large language model KW - ChatGPT KW - artificial intelligence KW - natural language processing KW - health care KW - summarization KW - medical knowledge inquiry KW - reliability KW - bias KW - privacy N2 - Background: The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios. Objective: This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field. Methods: We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns. Results: Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. 
The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for either summarization or medical knowledge inquiry, or both, and there were 58 (89%) papers expressing concerns about either reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and providing general medical knowledge to patients with a relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research. Conclusions: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms of how LLM applications bring bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in health care. 
UR - https://www.jmir.org/2024/1/e22769 UR - http://dx.doi.org/10.2196/22769 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/22769 ER - TY - JOUR AU - Kim, Heon Ho AU - Jeong, Chan Won AU - Pi, Kyungran AU - Lee, Soeun Angela AU - Kim, Soo Min AU - Kim, Jin Hye AU - Kim, Hong Jae PY - 2024/11/5 TI - A Deep Learning Model to Predict Breast Implant Texture Types Using Ultrasonography Images: Feasibility Development Study JO - JMIR Form Res SP - e58776 VL - 8 KW - breast implants KW - mammoplasty KW - ultrasonography KW - AI-assisted diagnosis KW - shell surface topography KW - artificial intelligence KW - deep learning KW - machine learning N2 - Background: Breast implants, including textured variants, have been widely used in aesthetic and reconstructive mammoplasty. However, the textured type, which is one of the shell texture types of breast implants, has been identified as a possible etiologic factor for lymphoma, specifically breast implant–associated anaplastic large cell lymphoma (BIA-ALCL). Identifying the shell texture type of the implant is critical to diagnosing BIA-ALCL. However, distinguishing the shell texture type can be difficult due to the loss of human memory and medical history. An alternative approach is to use ultrasonography, but this method also has limitations in quantitative assessment. Objective: This study aims to determine the feasibility of using a deep learning model to classify the shell texture type of breast implants and make robust predictions from ultrasonography images from heterogeneous sources. Methods: A total of 19,502 breast implant images were retrospectively collected from heterogeneous sources, including images captured from both Canon and GE devices, images of ruptured implants, and images without implants, as well as publicly available images. A ResNet-50 model was trained on the Canon images. The model's performance on the Canon dataset was evaluated using stratified 5-fold cross-validation. 
Additionally, external validation was conducted using the GE and publicly available datasets. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (PRAUC) were calculated based on the contribution of the pixels with Gradient-weighted Class Activation Mapping (Grad-CAM). To identify the significant pixels for classification, we masked the pixels that contributed less than 10%, up to a maximum of 100%. To assess the model's robustness to uncertainty, Shannon entropy was calculated for 4 image groups: Canon, GE, ruptured implants, and without implants. Results: The deep learning model achieved an average AUROC of 0.98 and a PRAUC of 0.88 in the Canon dataset. The model achieved an AUROC of 0.985 and a PRAUC of 0.748 for images captured with GE devices. Additionally, the model predicted an AUROC of 0.909 and a PRAUC of 0.958 for the publicly available dataset. This model maintained the PRAUC values for quantitative validation when masking up to 90% of the least-contributing pixels and the remnant pixels in breast shell layers. Furthermore, the prediction uncertainty increased in the following order: Canon (0.066), GE (0.072), ruptured implants (0.371), and no implants (0.777). Conclusions: We have demonstrated the feasibility of using deep learning to predict the shell texture type of breast implants. This approach quantifies the shell texture types of breast implants, supporting the first step in the diagnosis of BIA-ALCL. 
UR - https://formative.jmir.org/2024/1/e58776 UR - http://dx.doi.org/10.2196/58776 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58776 ER - TY - JOUR AU - von Bahr, Joar AU - Diwan, Vinod AU - Mårtensson, Andreas AU - Linder, Nina AU - Lundin, Johan PY - 2024/11/1 TI - AI-Supported Digital Microscopy Diagnostics in Primary Health Care Laboratories: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e58149 VL - 13 KW - AI KW - artificial intelligence KW - convolutional neural network KW - deep learning KW - diagnosis KW - digital diagnostics KW - machine learning KW - pathology KW - primary health care KW - whole slide images N2 - Background: Digital microscopy combined with artificial intelligence (AI) is increasingly being implemented in health care, predominantly in advanced laboratory settings. However, AI-supported digital microscopy could be especially advantageous in primary health care settings, since such methods could improve access to diagnostics via automation and lead to a decreased need for experts on site. To our knowledge, no scoping or systematic review had been published on the use of AI-supported digital microscopy within primary health care laboratories when this scoping review was initiated. A scoping review can guide future research by providing insights to help navigate the challenges of implementing these novel methods in primary health care laboratories. Objective: The objective of this scoping review is to map peer-reviewed studies on AI-supported digital microscopy in primary health care laboratories to generate an overview of the subject. Methods: A systematic search of the databases PubMed, Web of Science, Embase, and IEEE will be conducted. Only peer-reviewed articles in English will be considered, and no limit on publication year will be applied. 
The concept inclusion criteria in the scoping review include studies that have applied AI-supported digital microscopy with the aim of achieving a diagnosis on the subject level. In addition, the studies must have been performed in the context of primary health care laboratories, as defined by the criteria of not having a pathologist on site and using simple sample preparations. The study selection and data extraction will be performed by 2 independent researchers, and in the case of disagreements, a third researcher will be involved. The results will be presented in a table developed by the researchers, including information on investigated diseases, sample collection, preparation and digitization, AI model used, and results. Furthermore, the results will be described narratively to provide an overview of the studies included. The proposed methodology is in accordance with the JBI methodology for scoping reviews. Results: The scoping review was initiated in January 2023, and a protocol was published in the Open Science Framework in January 2024. The protocol was completed in March 2024, and the systematic search will be performed after the protocol has been peer reviewed. The scoping review is expected to be finalized by the end of 2024. Conclusions: A systematic review of studies on AI-supported digital microscopy in primary health care laboratories is anticipated to identify the diseases where these novel methods could be advantageous, along with the shared challenges encountered and approaches taken to address them. 
International Registered Report Identifier (IRRID): PRR1-10.2196/58149 UR - https://www.researchprotocols.org/2024/1/e58149 UR - http://dx.doi.org/10.2196/58149 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58149 ER - TY - JOUR AU - Riad, Rachid AU - Denais, Martin AU - de Gennes, Marc AU - Lesage, Adrien AU - Oustric, Vincent AU - Cao, Nga Xuan AU - Mouchabac, Stéphane AU - Bourla, Alexis PY - 2024/10/31 TI - Automated Speech Analysis for Risk Detection of Depression, Anxiety, Insomnia, and Fatigue: Algorithm Development and Validation Study JO - J Med Internet Res SP - e58572 VL - 26 KW - speech analysis KW - voice detection KW - voice analysis KW - speech biomarkers KW - speech-based systems KW - computer-aided diagnosis KW - mental health symptom detection KW - machine learning KW - mental health KW - fatigue KW - anxiety KW - depression N2 - Background: While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess the limitations of speech-based systems, such as uncertainty, or fairness for a safe clinical deployment. Objective: We investigated the predictive potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia, focusing on other factors than mere accuracy, in the general population. Methods: We included 865 healthy adults and recorded their answers regarding their perceived mental and sleep states. We asked how they felt and if they had slept well lately. Clinically validated questionnaires measuring depression, anxiety, insomnia, and fatigue severity were also used. We developed a novel speech and machine learning pipeline involving voice activity detection, feature extraction, and model training. 
We automatically modeled speech with deep learning models that were pretrained on a large, open, and free database, and we selected the best one on the validation set. Based on the best speech modeling approach, clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education) were evaluated. We used a train-validation-test split for all evaluations: to develop our models, select the best ones, and assess the generalizability on held-out data. Results: The best model was Whisper M with a max pooling and oversampling method. Our methods achieved good detection performance for all symptoms, depression (Patient Health Questionnaire-9: area under the curve [AUC]=0.76; F1-score=0.49 and Beck Depression Inventory: AUC=0.78; F1-score=0.65), anxiety (Generalized Anxiety Disorder 7-item scale: AUC=0.77; F1-score=0.50), insomnia (Athens Insomnia Scale: AUC=0.73; F1-score=0.62), and fatigue (Multidimensional Fatigue Inventory total score: AUC=0.68; F1-score=0.88). The system performed well when it needed to abstain from making predictions, as demonstrated by low abstention rates in depression detection with the Beck Depression Inventory and fatigue, with risk-coverage AUCs below 0.4. Individual symptom scores were accurately predicted (correlations were all significant with Pearson strengths between 0.31 and 0.49). Fairness analysis revealed that models were consistent for sex (average disparity ratio [DR] 0.86, SD 0.13), to a lesser extent for education level (average DR 0.47, SD 0.30), and worse for age groups (average DR 0.33, SD 0.30). Conclusions: This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating their severity. 
Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems. UR - https://www.jmir.org/2024/1/e58572 UR - http://dx.doi.org/10.2196/58572 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58572 ER - TY - JOUR AU - Barlow, Richard AU - Bewley, Anthony AU - Gkini, Angeliki Maria PY - 2024/10/16 TI - AI in Psoriatic Disease: Scoping Review JO - JMIR Dermatol SP - e50451 VL - 7 KW - artificial intelligence KW - machine learning KW - psoriasis KW - psoriatic arthritis KW - psoriatic disease KW - biologics KW - prognostic models KW - mobile phone N2 - Background: Artificial intelligence (AI) has many applications in numerous medical fields, including dermatology. Although the majority of AI studies in dermatology focus on skin cancer, there is growing interest in the applicability of AI models in inflammatory diseases, such as psoriasis. Psoriatic disease is a chronic, inflammatory, immune-mediated systemic condition with multiple comorbidities and a significant impact on patients' quality of life. Advanced treatments, including biologics and small molecules, have transformed the management of psoriatic disease. Nevertheless, there are still considerable unmet needs. Globally, delays in the diagnosis of the disease and its severity are common due to poor access to health care systems. Moreover, despite the abundance of treatments, we are unable to predict which is the right medication for the right patient, especially in resource-limited settings. AI could be an additional tool to address those needs. In this way, we can improve rates of diagnosis, accurately assess severity, and predict outcomes of treatment. Objective: This study aims to provide an up-to-date literature review on the use of AI in psoriatic disease, including diagnostics and clinical management as well as addressing the limitations in applicability. 
Methods: We searched the databases MEDLINE, PubMed, and Embase using the keywords "AI AND psoriasis OR psoriatic arthritis OR psoriatic disease," "machine learning AND psoriasis OR psoriatic arthritis OR psoriatic disease," and "prognostic model AND psoriasis OR psoriatic arthritis OR psoriatic disease" until June 1, 2023. Reference lists of relevant papers were also cross-examined for other papers not detected in the initial search. Results: Our literature search yielded 38 relevant papers. AI has been identified as a key component in digital health technologies. Within this field, there is the potential to apply specific techniques such as machine learning and deep learning to address several aspects of managing psoriatic disease. This includes diagnosis, particularly useful for remote teledermatology via photographs taken by patients as well as monitoring and estimating severity. Similarly, AI can be used to synthesize the vast data sets already in place through patient registries which can help identify appropriate biologic treatments for future cohorts and those individuals most likely to develop complications. Conclusions: There are multiple advantageous uses for AI and digital health technologies in psoriatic disease. With wider implementation of AI, we need to be mindful of potential limitations, such as validation and standardization or generalizability of results in specific populations, such as patients with darker skin phototypes. UR - https://derma.jmir.org/2024/1/e50451 UR - http://dx.doi.org/10.2196/50451 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/50451 ER - TY - JOUR AU - Cho, Yunah AU - Talboys, L. 
Sharon PY - 2024/10/15 TI - Trends in South Korean Medical Device Development for Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder: Narrative Review JO - JMIR Biomed Eng SP - e60399 VL - 9 KW - ADHD KW - attention-deficit/hyperactivity disorder KW - ASD KW - autism spectrum disorder KW - medical device KW - digital therapeutics N2 - Background: Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) are among the most prevalent mental disorders among school-aged youth in South Korea and may play a role in the increasing pressures on teachers and school-based special education programming. A lack of support for special education; tensions between teachers, students, and parents; and limited backup for teacher absences are common complaints among Korean educators. New innovations in technology to screen and treat ADHD and ASD may offer relief to students, parents, and teachers through earlier and efficient diagnosis; access to treatment options; and ultimately, better-managed care and expectations. Objective: This narrative literature review provides an account of medical device use and development in South Korea for the diagnosis and management of ADHD and ASD and highlights research gaps. Methods: A narrative review was conducted across 4 databases (PubMed, Korean National Assembly Library, Scopus, and PsycINFO). Journal articles, dissertations, and government research and development reports were included if they discussed medical devices for ADHD and ASD. Only Korean or English papers were included. Resources were excluded if they did not correspond to the research objective or did not discuss at least 1 topic about medical devices for ADHD and ASD. Journal articles were excluded if they were not peer reviewed. Resources were limited to publications between 2013 and July 22, 2024. 
Results: A total of 1794 records about trends in Korean medical device development were categorized into 2 major groups: digital therapeutics and traditional therapy. Digital therapeutics resulted in 5 subgroups: virtual reality and artificial intelligence, machine learning and robot, gaming and visual contents, eye-feedback and movement intervention, and electroencephalography and neurofeedback. Traditional therapy resulted in 3 subgroups: cognitive behavioral therapy and working memory; diagnosis and rating scale; and musical, literary therapy, and mindfulness-based stress reduction. Digital therapeutics using artificial intelligence, machine learning, and electroencephalography technologies account for the biggest portions of development in South Korea, rather than traditional therapies. Most resources, 94.15% (1689/1794), were from the Korean National Assembly Library. Conclusions: Limitations include small sizes of populations to conclude findings in many articles, a lower number of articles discussing medical devices for ASD, and a majority of articles being dissertations. Emerging digital medical devices and those integrated with traditional therapies are important solutions to reducing the prevalence rates of ADHD and ASD in South Korea by promoting early diagnosis and intervention. Furthermore, their application will relieve pressures on teachers and school-based special education programming by providing direct supporting resources to students with ADHD or ASD. Future development of medical devices for ADHD and ASD is predicted to heavily rely on digital technologies, such as those that sense people's behaviors, eye movement, and brainwaves. 
UR - https://biomedeng.jmir.org/2024/1/e60399 UR - http://dx.doi.org/10.2196/60399 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60399 ER - TY - JOUR AU - Mao, Lijun AU - Yu, Zhen AU - Lin, Luotao AU - Sharma, Manoj AU - Song, Hualing AU - Zhao, Hailei AU - Xu, Xianglong PY - 2024/10/9 TI - Determinants of Visual Impairment Among Chinese Middle-Aged and Older Adults: Risk Prediction Model Using Machine Learning Algorithms JO - JMIR Aging SP - e59810 VL - 7 KW - visual impairment KW - China KW - middle-aged and elderly adults KW - machine learning KW - prediction model N2 - Background: Visual impairment (VI) is a prevalent global health issue, affecting over 2.2 billion people worldwide, with nearly half of the Chinese population aged 60 years and older being affected. Early detection of high-risk VI is essential for preventing irreversible vision loss among Chinese middle-aged and older adults. While machine learning (ML) algorithms exhibit significant predictive advantages, their application in predicting VI risk among the general middle-aged and older adult population in China remains limited. Objective: This study aimed to predict VI and identify its determinants using ML algorithms. Methods: We used 19,047 participants from 4 waves of the China Health and Retirement Longitudinal Study (CHARLS) that were conducted between 2011 and 2018. To envisage the prevalence of VI, we generated a geographical distribution map. Additionally, we constructed a model using indicators of a self-reported questionnaire, a physical examination, and blood biomarkers as predictors. Multiple ML algorithms, including gradient boosting machine, distributed random forest, the generalized linear model, deep learning, and stacked ensemble, were used for prediction. We plotted receiver operating characteristic and calibration curves to assess the predictive performance. Variable importance analysis was used to identify key predictors. 
Results: Among all participants, 33.9% (6449/19,047) had VI. Qinghai, Chongqing, Anhui, and Sichuan showed the highest VI rates, while Beijing and Xinjiang had the lowest. The generalized linear model, gradient boosting machine, and stacked ensemble achieved acceptable area under curve values of 0.706, 0.710, and 0.715, respectively, with the stacked ensemble performing best. Key predictors included hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, high-density lipoprotein cholesterol, and arthritis or rheumatism. Conclusions: Nearly one-third of middle-aged and older adults in China had VI. The prevalence of VI shows regional variations, but there are no distinct east-west or north-south distribution differences. ML algorithms demonstrate accurate predictive capabilities for VI. The combination of prediction models and variable importance analysis provides valuable insights for the early identification and intervention of VI among Chinese middle-aged and older adults. UR - https://aging.jmir.org/2024/1/e59810 UR - http://dx.doi.org/10.2196/59810 ID - info:doi/10.2196/59810 ER - TY - JOUR AU - Tao, Jin AU - Liu, Dan AU - Hu, Fu-Bi AU - Zhang, Xiao AU - Yin, Hongkun AU - Zhang, Huiling AU - Zhang, Kai AU - Huang, Zixing AU - Yang, Kun PY - 2024/10/9 TI - Development and Validation of a Computed Tomography–Based Model for Noninvasive Prediction of the T Stage in Gastric Cancer: Multicenter Retrospective Study JO - J Med Internet Res SP - e56851 VL - 26 KW - gastric cancer KW - computed tomography KW - radiomics KW - T stage KW - deep learning KW - cancer KW - multicenter study KW - accuracy KW - binary classification KW - tumor KW - hybrid model KW - performance KW - pathological stage N2 - Background: As part of the TNM (tumor-node-metastasis) staging system, T staging based on tumor depth is crucial for developing treatment plans. 
Previous studies have constructed a deep learning model based on computed tomographic (CT) radiomic signatures to predict the number of lymph node metastases and survival in patients with resected gastric cancer (GC). However, few studies have reported the combination of deep learning and radiomics in predicting T staging in GC. Objective: This study aimed to develop a CT-based model for automatic prediction of the T stage of GC via radiomics and deep learning. Methods: A total of 771 GC patients from 3 centers were retrospectively enrolled and divided into training, validation, and testing cohorts. Patients with GC were classified into mild (stage T1 and T2), moderate (stage T3), and severe (stage T4) groups. Three predictive models based on the labeled CT images were constructed using the radiomics features (radiomics model), deep features (deep learning model), and a combination of both (hybrid model). Results: The overall classification accuracy of the radiomics model was 64.3% in the internal testing data set. The deep learning model and hybrid model showed better performance than the radiomics model, with overall classification accuracies of 75.7% (P=.04) and 81.4% (P=.001), respectively. On the subtasks of binary classification of tumor severity, the areas under the curve of the radiomics, deep learning, and hybrid models were 0.875, 0.866, and 0.886 in the internal testing data set and 0.820, 0.818, and 0.972 in the external testing data set, respectively, for differentiating mild (stage T1~T2) from nonmild (stage T3~T4) patients, and were 0.815, 0.892, and 0.894 in the internal testing data set and 0.685, 0.808, and 0.897 in the external testing data set, respectively, for differentiating nonsevere (stage T1~T3) from severe (stage T4) patients. Conclusions: The hybrid model integrating radiomics features and deep features showed favorable performance in diagnosing the pathological stage of GC. 
UR - https://www.jmir.org/2024/1/e56851 UR - http://dx.doi.org/10.2196/56851 UR - http://www.ncbi.nlm.nih.gov/pubmed/39382960 ID - info:doi/10.2196/56851 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Harada, Yukinori AU - Tokumasu, Kazuki AU - Ito, Takahiro AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2024/10/2 TI - Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: Cross-Sectional Study JO - JMIR Med Inform SP - e63010 VL - 12 KW - artificial intelligence KW - clinical decision support KW - diagnostic excellence KW - generative artificial intelligence KW - large language models KW - natural language processing N2 - Background: Generative artificial intelligence (GAI) systems by Google have recently been updated from Bard to Gemini and Gemini Advanced as of December 2023. Gemini is a basic, free-to-use model after a user's login, while Gemini Advanced operates on a more advanced model requiring a fee-based subscription. These systems have the potential to enhance medical diagnostics. However, the impact of these updates on comprehensive diagnostic accuracy remains unknown. Objective: This study aimed to compare the accuracy of the differential diagnosis lists generated by Gemini Advanced, Gemini, and Bard across comprehensive medical fields using case report series. Methods: We identified a case report series with relevant final diagnoses published in the American Journal of Case Reports from January 2022 to March 2023. After excluding nondiagnostic cases and patients aged 10 years and younger, we included the remaining case reports. After refining the case parts as case descriptions, we input the same case descriptions into Gemini Advanced, Gemini, and Bard to generate the top 10 differential diagnosis lists. In total, 2 expert physicians independently evaluated whether the final diagnosis was included in the lists and its ranking. 
Any discrepancies were resolved by another expert physician. Bonferroni correction was applied to adjust the P values for the number of comparisons among 3 GAI systems, setting the corrected significance level at P value <.02. Results: In total, 392 case reports were included. The inclusion rates of the final diagnosis within the top 10 differential diagnosis lists were 73% (286/392) for Gemini Advanced, 76.5% (300/392) for Gemini, and 68.6% (269/392) for Bard. The top diagnoses matched the final diagnoses in 31.6% (124/392) for Gemini Advanced, 42.6% (167/392) for Gemini, and 31.4% (123/392) for Bard. Gemini demonstrated higher diagnostic accuracy than Bard both within the top 10 differential diagnosis lists (P=.02) and as the top diagnosis (P=.001). In addition, Gemini Advanced achieved significantly lower accuracy than Gemini in identifying the most probable diagnosis (P=.002). Conclusions: The results of this study suggest that Gemini outperformed Bard in diagnostic accuracy following the model update. However, Gemini Advanced requires further refinement to optimize its performance for future artificial intelligence–enhanced diagnostics. These findings should be interpreted cautiously and considered primarily for research purposes, as these GAI systems have not been adjusted for medical diagnostics nor approved for clinical use. 
UR - https://medinform.jmir.org/2024/1/e63010 UR - http://dx.doi.org/10.2196/63010 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63010 ER - TY - JOUR AU - Ming, Antao AU - Clemens, Vera AU - Lorek, Elisabeth AU - Wall, Janina AU - Alhajjar, Ahmad AU - Galazky, Imke AU - Baum, Anne-Katrin AU - Li, Yang AU - Li, Meng AU - Stober, Sebastian AU - Mertens, David Nils AU - Mertens, Rene Peter PY - 2024/10/1 TI - Game-Based Assessment of Peripheral Neuropathy Combining Sensor-Equipped Insoles, Video Games, and AI: Proof-of-Concept Study JO - J Med Internet Res SP - e52323 VL - 26 KW - diabetes mellitus KW - metabolic syndrome KW - peripheral neuropathy KW - sensor-equipped insoles KW - video games KW - machine learning KW - feature extraction N2 - Background: Detecting peripheral neuropathy (PNP) is crucial in preventing complications such as foot ulceration. Clinical examinations for PNP are infrequently provided to patients at high risk due to restrictions on facilities, care providers, or time. A gamified health assessment approach combining wearable sensors holds the potential to address these challenges and provide individuals with instantaneous feedback on their health status. Objective: We aimed to develop and evaluate an application that assesses PNP through video games controlled by pressure sensor–equipped insoles. Methods: In the proof-of-concept exploratory cohort study, a complete game-based framework that allowed the study participant to play 4 video games solely by modulating plantar pressure values was established in an outpatient clinic setting. Foot plantar pressures were measured by the sensor-equipped insole and transferred via Bluetooth to an Android tablet for game control in real time. Game results and sensor data were delivered to the study server for visualization and analysis. Each session lasted about 15 minutes. In total, 299 patients with diabetes mellitus and 30 with metabolic syndrome were tested using the game application. 
Patients' game performance was initially assessed by hypothesis-driven key capabilities that consisted of reaction time, sensation, skillfulness, balance, endurance, and muscle strength. Subsequently, specific game features were extracted from gaming data sets and compared with nerve conduction study findings, neuropathy symptoms, or disability scores. Multiple machine learning algorithms were applied to 70% (n=122) of acquired data to train predictive models for PNP, while the remaining data were held out for final model evaluation. Results: Overall, clinically evident PNP was present in 247 of 329 (75.1%) participants, with 88 (26.7%) individuals showing asymmetric nerve deficits. In a subcohort (n=37) undergoing nerve conduction study as the gold standard, sensory and motor nerve conduction velocities and nerve amplitudes in lower extremities significantly correlated with 79 game features (|R|>0.4, highest R value +0.65; P<.001; adjusted R2=0.36). Within another subcohort (n=173) with normal cognition and matched covariates (age, sex, BMI, etc), hypothesis-driven key capabilities and specific game features were significantly correlated with the presence of PNP. Predictive models using selected game features achieved 76.1% (left) and 81.7% (right foot) accuracy for PNP detection. Multiclass models yielded an area under the receiver operating characteristic curve of 0.76 (left foot) and 0.72 (right foot) for assessing nerve damage patterns (small, large, or mixed nerve fiber damage). Conclusions: The game-based application presents a promising avenue for PNP screening and classification. Evaluation in expanded cohorts may iteratively optimize artificial intelligence model efficacy. The integration of engaging motivational elements and automated data interpretation will support acceptance as a telemedical application. 
UR - https://www.jmir.org/2024/1/e52323 UR - http://dx.doi.org/10.2196/52323 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52323 ER - TY - JOUR AU - Xie, Fagen AU - Lee, Ming-sum AU - Allahwerdy, Salam AU - Getahun, Darios AU - Wessler, Benjamin AU - Chen, Wansu PY - 2024/9/30 TI - Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach JO - JMIR Cardio SP - e60503 VL - 8 KW - echocardiography report KW - heart valve KW - stenosis KW - regurgitation KW - natural language processing KW - algorithm N2 - Background: Valvular heart disease (VHD) is a leading cause of cardiovascular morbidity and mortality that poses a substantial health care and economic burden on health care systems. Administrative diagnostic codes for ascertaining VHD diagnosis are incomplete. Objective: This study aimed to develop a natural language processing (NLP) algorithm to identify patients with aortic, mitral, tricuspid, and pulmonic valve stenosis and regurgitation from transthoracic echocardiography (TTE) reports within a large integrated health care system. Methods: We used reports from echocardiograms performed in the Kaiser Permanente Southern California (KPSC) health care system between January 1, 2011, and December 31, 2022. Related terms/phrases of aortic, mitral, tricuspid, and pulmonic stenosis and regurgitation and their severities were compiled from the literature and enriched with input from clinicians. An NLP algorithm was iteratively developed and fine-tuned via multiple rounds of chart review, followed by adjudication. The developed algorithm was applied to 200 annotated echocardiography reports to assess its performance and then to the study echocardiography reports. Results: A total of 1,225,270 TTE reports were extracted from KPSC electronic health records during the study period. 
In these reports, valve lesions identified included 111,300 (9.08%) aortic stenosis, 20,246 (1.65%) mitral stenosis, 397 (0.03%) tricuspid stenosis, 2585 (0.21%) pulmonic stenosis, 345,115 (28.17%) aortic regurgitation, 802,103 (65.46%) mitral regurgitation, 903,965 (73.78%) tricuspid regurgitation, and 286,903 (23.42%) pulmonic regurgitation. Among the valves, 50,507 (4.12%), 22,656 (1.85%), 1685 (0.14%), and 1767 (0.14%) were identified as prosthetic aortic valves, mitral valves, tricuspid valves, and pulmonic valves, respectively. Mild and moderate were the most common severity levels of heart valve stenosis, while trace and mild were the most common severity levels of regurgitation. Males had a higher frequency of aortic stenosis and all 4 valvular regurgitations, while females had more mitral, tricuspid, and pulmonic stenosis. Non-Hispanic Whites had the highest frequency of all 4 valvular stenosis and regurgitations. The distribution of valvular stenosis and regurgitation severity was similar across race/ethnicity groups. Frequencies of aortic stenosis, mitral stenosis, and regurgitation of all 4 heart valves increased with age. In TTE reports with stenosis detected, younger patients were more likely to have mild aortic stenosis, while older patients were more likely to have severe aortic stenosis. However, mitral stenosis was opposite (milder in older patients and more severe in younger patients). In TTE reports with regurgitation detected, younger patients had a higher frequency of severe/very severe aortic regurgitation. In comparison, older patients had higher frequencies of mild aortic regurgitation and severe mitral/tricuspid regurgitation. Validation of the NLP algorithm against the 200 annotated TTE reports showed excellent precision, recall, and F1-scores. 
Conclusions: The proposed computerized algorithm could effectively identify heart valve stenosis and regurgitation, as well as the severity of valvular involvement, with significant implications for pharmacoepidemiological studies and outcomes research. UR - https://cardio.jmir.org/2024/1/e60503 UR - http://dx.doi.org/10.2196/60503 UR - http://www.ncbi.nlm.nih.gov/pubmed/39348175 ID - info:doi/10.2196/60503 ER - TY - JOUR AU - Goehringer, Jessica AU - Kosmin, Abigail AU - Laible, Natalie AU - Romagnoli, Katrina PY - 2024/9/26 TI - Assessing the Utility of a Patient-Facing Diagnostic Tool Among Individuals With Hypermobile Ehlers-Danlos Syndrome: Focus Group Study JO - JMIR Form Res SP - e49720 VL - 8 KW - diagnostic tool KW - hypermobile Ehlers-Danlos syndrome KW - patient experiences KW - diagnostic odyssey KW - affinity mapping KW - mobile health app KW - mobile phone N2 - Background: Hypermobile Ehlers-Danlos syndrome (hEDS), characterized by joint hypermobility, skin laxity, and tissue fragility, is thought to be the most common inherited connective tissue disorder, with millions affected worldwide. Diagnosing this condition remains a challenge that can impact quality of life for individuals with hEDS. Many with hEDS describe extended diagnostic odysseys involving exorbitant time and monetary investment. This delay is due to the complexity of diagnosis, symptom overlap with other conditions, and limited access to providers. Many primary care providers are unfamiliar with hEDS, compounded by genetics clinics that do not accept referrals for hEDS evaluation and long waits for genetics clinics that do evaluate for hEDS, leaving patients without sufficient options. Objective: This study explored the user experience, quality, and utility of a prototype of a patient-facing diagnostic tool intended to support clinician diagnosis for individuals with symptoms of hEDS. 
The questions included within the prototype are aligned with the 2017 international classification of Ehlers-Danlos syndromes. This study explored how this tool may help patients communicate information about hEDS to their physicians, influencing the diagnosis of hEDS and affecting patient experience. Methods: Participants clinically diagnosed with hEDS were recruited from either a medical center or private groups on a social media platform. Interested participants provided verbal consent, completed questionnaires about their diagnosis, and were invited to join an internet-based focus group to share their thoughts and opinions on a diagnostic tool prototype. Participants were invited to complete the Mobile App Rating Scale (MARS) to evaluate their experience viewing the diagnostic tool. The MARS is a framework for evaluating mobile health apps across 4 dimensions: engagement, functionality, esthetics, and information quality. Qualitative data were analyzed using affinity mapping to organize information and inductively create themes that were categorized within the MARS framework dimensions to help identify strengths and weaknesses of the diagnostic tool prototype. Results: In total, 15 individuals participated in the internet-based focus groups; 3 (20%) completed the MARS. Through affinity diagramming, 2 main categories of responses were identified, including responses related to the user interface and responses related to the application of the tool. Each category included several themes and subthemes that mapped well to the 4 MARS dimensions. The analysis showed that the tool held value and utility among the participants diagnosed with hEDS. The shareable ending summary sheet provided by the tool stood out as a strength for facilitating communication between patient and provider during the diagnostic evaluation. 
Conclusions: The results provide insights on the perceived utility and value of the tool, including preferred phrasing, layout and design preferences, and tool accessibility. The participants expressed that the tool may improve the hEDS diagnostic odyssey and help educate providers about the diagnostic process. UR - https://formative.jmir.org/2024/1/e49720 UR - http://dx.doi.org/10.2196/49720 UR - http://www.ncbi.nlm.nih.gov/pubmed/39325533 ID - info:doi/10.2196/49720 ER - TY - JOUR AU - Guni, Ahmad AU - Sounderajah, Viknesh AU - Whiting, Penny AU - Bossuyt, Patrick AU - Darzi, Ara AU - Ashrafian, Hutan PY - 2024/9/18 TI - Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies Using AI (QUADAS-AI): Protocol for a Qualitative Study JO - JMIR Res Protoc SP - e58202 VL - 13 KW - artificial intelligence KW - AI KW - AI-specific quality assessment of diagnostic accuracy studies KW - QUADAS-AI KW - AI-driven KW - diagnostics KW - evidence synthesis KW - quality assessment KW - evaluation KW - diagnostic KW - accuracy KW - bias KW - translation KW - clinical practice KW - assessment tool KW - diagnostic service N2 - Background: Quality assessment of diagnostic accuracy studies (QUADAS), and more recently QUADAS-2, were developed to aid the evaluation of methodological quality within primary diagnostic accuracy studies. However, in its current form, QUADAS-2 does not address the unique considerations raised by artificial intelligence (AI)–centered diagnostic systems. The rapid progression of the AI diagnostics field mandates suitable quality assessment tools to determine the risk of bias and applicability, and subsequently evaluate translational potential for clinical practice. Objective: We aim to develop an AI-specific QUADAS (QUADAS-AI) tool that addresses the specific challenges associated with the appraisal of AI diagnostic accuracy studies. This paper describes the processes and methods that will be used to develop QUADAS-AI. 
Methods: The development of QUADAS-AI can be distilled into 3 broad stages. Stage 1: a project organization phase has been undertaken, during which a project team and a steering committee were established. The steering committee consists of a panel of international experts representing diverse stakeholder groups. Following this, the scope of the project was finalized. Stage 2: an item generation process will be completed following (1) a mapping review, (2) a meta-research study, (3) a scoping survey of international experts, and (4) a patient and public involvement and engagement exercise. Candidate items will then be put forward to the international Delphi panel to achieve consensus for inclusion in the revised tool. A modified Delphi consensus methodology involving multiple online rounds and a final consensus meeting will be carried out to refine the tool, following which the initial QUADAS-AI tool will be drafted. A piloting phase will be carried out to identify components that are considered to be either ambiguous or missing. Stage 3: once the steering committee has finalized the QUADAS-AI tool, specific dissemination strategies will be aimed toward academic, policy, regulatory, industry, and public stakeholders, respectively. Results: As of July 2024, the project organization phase, as well as the mapping review and meta-research study, have been completed. We aim to complete the item generation, including the Delphi consensus, and finalize the tool by the end of 2024. Therefore, QUADAS-AI will be able to provide a consensus-derived platform upon which stakeholders may systematically appraise the methodological quality associated with AI diagnostic accuracy studies by the beginning of 2025. Conclusions: AI-driven systems comprise an increasingly significant proportion of research in clinical diagnostics. Through this process, QUADAS-AI will aid the evaluation of studies in this domain in order to identify bias and applicability concerns. 
As such, QUADAS-AI may form a key part of clinical, governmental, and regulatory evaluation frameworks for AI diagnostic systems globally. International Registered Report Identifier (IRRID): DERR1-10.2196/58202 UR - https://www.researchprotocols.org/2024/1/e58202 UR - http://dx.doi.org/10.2196/58202 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58202 ER - TY - JOUR AU - Badal, D. Varsha AU - Reinen, M. Jenna AU - Twamley, W. Elizabeth AU - Lee, E. Ellen AU - Fellows, P. Robert AU - Bilal, Erhan AU - Depp, A. Colin PY - 2024/9/16 TI - Investigating Acoustic and Psycholinguistic Predictors of Cognitive Impairment in Older Adults: Modeling Study JO - JMIR Aging SP - e54655 VL - 7 KW - acoustic KW - psycholinguistic KW - speech KW - speech marker KW - speech markers KW - cognitive impairment KW - CI KW - mild cognitive impairment KW - MCI KW - cognitive disability KW - cognitive restriction KW - cognitive limitation KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - early warning KW - early detection KW - NLP KW - natural language processing KW - Alzheimer KW - dementia KW - neurological decline KW - neurocognition KW - neurocognitive disorder N2 - Background: About one-third of older adults aged 65 years and older often have mild cognitive impairment or dementia. Acoustic and psycholinguistic features derived from conversation may be of great diagnostic value because speech involves verbal memory and cognitive and neuromuscular processes. The relative decline in these processes, however, may not be linear and remains understudied. Objective: This study aims to establish associations between cognitive abilities and various attributes of speech and natural language production. 
To date, the majority of research has been cross-sectional, relying mostly on data from structured interactions and restricted to textual versus acoustic analyses. Methods: In a sample of 71 older (mean age 83.3, SD 7.0 years) community-dwelling adults who completed qualitative interviews and cognitive testing, we investigated the performance of both acoustic and psycholinguistic features associated with cognitive deficits contemporaneously and at a 1- to 2-year follow-up (mean follow-up time 512.3, SD 84.5 days). Results: Combined acoustic and psycholinguistic features achieved high performance (F1-scores 0.73-0.86) and sensitivity (up to 0.90) in estimating cognitive deficits across multiple domains. Performance remained high when acoustic and psycholinguistic features were used to predict follow-up cognitive performance. The psycholinguistic features that were most successful at classifying high cognitive impairment reflected vocabulary richness, the quantity of speech produced, and the fragmentation of speech, whereas the analogous top-ranked acoustic features reflected breathing and nonverbal vocalizations such as giggles or laughter. Conclusions: These results suggest that both acoustic and psycholinguistic features extracted from qualitative interviews may be reliable markers of cognitive deficits in late life. 
UR - https://aging.jmir.org/2024/1/e54655 UR - http://dx.doi.org/10.2196/54655 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54655 ER - TY - JOUR AU - Wunderlich, Markus Maximilian AU - Frey, Nicolas AU - Amende-Wolf, Sandro AU - Hinrichs, Carl AU - Balzer, Felix AU - Poncette, Akira-Sebastian PY - 2024/9/9 TI - Alarm Management in Provisional COVID-19 Intensive Care Units: Retrospective Analysis and Recommendations for Future Pandemics JO - JMIR Med Inform SP - e58347 VL - 12 KW - patient monitoring KW - intensive care unit KW - ICU KW - alarm fatigue KW - alarm management KW - patient safety KW - alarm system KW - alarm system quality KW - medical devices KW - clinical alarms KW - COVID-19 N2 - Background: In response to the high patient admission rates during the COVID-19 pandemic, provisional intensive care units (ICUs) were set up, equipped with temporary monitoring and alarm systems. We sought to find out whether the provisional ICU setting led to a higher alarm burden and more staff with alarm fatigue. Objective: We aimed to compare alarm situations between provisional COVID-19 ICUs and non–COVID-19 ICUs during the second COVID-19 wave in Berlin, Germany. The study focused on measuring alarms per bed per day, identifying medical devices with higher alarm frequencies in COVID-19 settings, evaluating the median duration of alarms in both types of ICUs, and assessing the level of alarm fatigue experienced by health care staff. Methods: Our approach involved a comparative analysis of alarm data from 2 provisional COVID-19 ICUs and 2 standard non–COVID-19 ICUs. Through interviews with medical experts, we formulated hypotheses about potential differences in alarm load, alarm duration, alarm types, and staff alarm fatigue between the 2 ICU types. We analyzed alarm log data from the patient monitoring systems of all 4 ICUs to inferentially assess the differences. 
In addition, we assessed staff alarm fatigue with a questionnaire, aiming to comprehensively understand the impact of the alarm situation on health care personnel. Results: COVID-19 ICUs had significantly more alarms per bed per day than non–COVID-19 ICUs (P<.001), and the majority of the staff lacked experience with the alarm system. The overall median alarm duration was similar in both ICU types. We found no COVID-19–specific alarm patterns. The alarm fatigue questionnaire results suggest that staff in both types of ICUs experienced alarm fatigue. However, physicians and nurses who were working in COVID-19 ICUs reported a significantly higher level of alarm fatigue (P=.04). Conclusions: Staff in COVID-19 ICUs were exposed to a higher alarm load, and the majority lacked experience with alarm management and the alarm system. We recommend training and educating ICU staff in alarm management, emphasizing the importance of alarm management training as part of the preparations for future pandemics. However, the limitations of our study design and the specific pandemic conditions warrant further studies to confirm these findings and to explore effective alarm management strategies in different ICU settings. UR - https://medinform.jmir.org/2024/1/e58347 UR - http://dx.doi.org/10.2196/58347 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58347 ER - TY - JOUR AU - Antaki, Fares AU - Hammana, Imane AU - Tessier, Marie-Catherine AU - Boucher, Andrée AU - David Jetté, Laurence Maud AU - Beauchemin, Catherine AU - Hammamji, Karim AU - Ong, Yuhan Ariel AU - Rhéaume, Marc-André AU - Gauthier, Danny AU - Harissi-Dagher, Mona AU - Keane, A. 
Pearse AU - Pomp, Alfons PY - 2024/9/3 TI - Implementation of Artificial Intelligence–Based Diabetic Retinopathy Screening in a Tertiary Care Hospital in Quebec: Prospective Validation Study JO - JMIR Diabetes SP - e59867 VL - 9 KW - artificial intelligence KW - diabetic retinopathy KW - screening KW - clinical validation KW - diabetic KW - diabetes KW - tertiary care hospital KW - validation study KW - Quebec KW - Canada KW - vision KW - vision loss KW - ophthalmological KW - AI KW - detection KW - eye N2 - Background: Diabetic retinopathy (DR) affects about 25% of people with diabetes in Canada. Early detection of DR is essential for preventing vision loss. Objective: We evaluated the real-world performance of an artificial intelligence (AI) system that analyzes fundus images for DR screening in a Quebec tertiary care center. Methods: We prospectively recruited adult patients with diabetes at the Centre hospitalier de l'Université de Montréal (CHUM) in Montreal, Quebec, Canada. Patients underwent dual-pathway screening: first by the Computer Assisted Retinal Analysis (CARA) AI system (index test), then by standard ophthalmological examination (reference standard). We measured the AI system's sensitivity and specificity for detecting referable disease at the patient level, along with its performance for detecting any retinopathy and diabetic macular edema (DME) at the eye level, and potential cost savings. Results: This study included 115 patients. CARA demonstrated a sensitivity of 87.5% (95% CI 71.9-95.0) and specificity of 66.2% (95% CI 54.3-76.3) for detecting referable disease at the patient level. For any retinopathy detection at the eye level, CARA showed 88.2% sensitivity (95% CI 76.6-94.5) and 71.4% specificity (95% CI 63.7-78.1). For DME detection, CARA had 100% sensitivity (95% CI 64.6-100) and 81.9% specificity (95% CI 75.6-86.8). 
Potential yearly savings from implementing CARA at the CHUM were estimated at CAD $245,635 (US $177,643.23, as of July 26, 2024) considering 5000 patients with diabetes. Conclusions: Our study indicates that integrating a semiautomated AI system for DR screening demonstrates high sensitivity for detecting referable disease in a real-world setting. This system has the potential to improve screening efficiency and reduce costs at the CHUM, but more work is needed to validate it. UR - https://diabetes.jmir.org/2024/1/e59867 UR - http://dx.doi.org/10.2196/59867 UR - http://www.ncbi.nlm.nih.gov/pubmed/39226095 ID - info:doi/10.2196/59867 ER - TY - JOUR AU - Pradhan, Apoorva AU - Wright, A. Eric AU - Hayduk, A. Vanessa AU - Berhane, Juliana AU - Sponenberg, Mallory AU - Webster, Leeann AU - Anderson, Hannah AU - Park, Siyeon AU - Graham, Jove AU - Friedenberg, Scott PY - 2024/8/29 TI - Impact of an Electronic Health Record–Based Interruptive Alert Among Patients With Headaches Seen in Primary Care: Cluster Randomized Controlled Trial JO - JMIR Med Inform SP - e58456 VL - 12 KW - headache management KW - migraine management KW - electronic health record–based alerts KW - primary care KW - clinician decision support tools KW - electronic health record KW - EHR N2 - Background: Headaches, including migraines, are one of the most common causes of disability and account for nearly 20%-30% of referrals from primary care to neurology. In primary care, electronic health record–based alerts offer a mechanism to influence health care provider behaviors, manage neurology referrals, and optimize headache care. Objective: This project aimed to evaluate the impact of an electronic alert implemented in primary care on patients' overall headache management. Methods: We conducted a stratified cluster-randomized study across 38 primary care clinic sites between December 2021 and December 2022 at a large integrated health care delivery system in the United States.
Clinics were stratified into 6 blocks based on region and patient-to-health care provider ratios and then 1:1 randomized within each block into either the control or intervention. Health care providers practicing at intervention clinics received an interruptive alert in the electronic health record. The primary end point was a change in headache burden, measured using the Headache Impact Test 6 scale, from baseline to 6 months. Secondary outcomes included changes in headache frequency and intensity, access to care, and resource use. We analyzed the difference-in-differences between the arms at follow-up at the individual patient level. Results: We enrolled 203 adult patients with a confirmed headache diagnosis. At baseline, the average Headache Impact Test 6 scores in each arm were not significantly different (intervention: mean 63, SD 6.9; control: mean 61.8, SD 6.6; P=.21). We observed a significant reduction in the headache burden only in the intervention arm at follow-up (3.5 points; P=.009). The reduction in the headache burden was not statistically different between groups (difference-in-differences estimate −1.89, 95% CI −5 to 1.31; P=.25). Similarly, secondary outcomes were not significantly different between groups. Only 11.32% (303/2677) of alerts were acted upon. Conclusions: The use of an interruptive electronic alert did not significantly improve headache outcomes. Low use of alerts by health care providers prompts future alterations of the alert and exploration of alternative approaches.
Trial Registration: ClinicalTrials.gov NCT05067725; https://clinicaltrials.gov/study/NCT05067725 UR - https://medinform.jmir.org/2024/1/e58456 UR - http://dx.doi.org/10.2196/58456 ID - info:doi/10.2196/58456 ER - TY - JOUR AU - Lee, Haedeun AU - Oh, Bumjo AU - Kim, Seung-Chan PY - 2024/8/26 TI - Recognition of Forward Head Posture Through 3D Human Pose Estimation With a Graph Convolutional Network: Development and Feasibility Study JO - JMIR Form Res SP - e55476 VL - 8 KW - posture correction KW - injury prediction KW - human pose estimation KW - forward head posture KW - machine learning KW - graph convolutional networks KW - posture KW - graph neural network KW - graph KW - pose KW - postural KW - deep learning KW - neural network KW - neural networks KW - upper KW - algorithms N2 - Background: Prolonged improper posture can lead to forward head posture (FHP), causing headaches, impaired respiratory function, and fatigue. This is especially relevant in sedentary scenarios, where individuals often maintain static postures for extended periods, a significant part of daily life for many. The development of a system capable of detecting FHP is crucial, as it would not only alert users to correct their posture but also serve the broader goal of contributing to public health by preventing the progression of chronic injuries associated with this condition. However, despite significant advancements in estimating human poses from standard 2D images, most computational pose models do not include measurements of the craniovertebral angle, which involves the C7 vertebra, crucial for diagnosing FHP. Objective: Accurate diagnosis of FHP typically requires dedicated devices, such as clinical postural assessments or specialized imaging equipment, but their use is impractical for continuous, real-time monitoring in everyday settings.
Therefore, developing an accessible, efficient method for regular posture assessment that can be easily integrated into daily activities, providing real-time feedback, and promoting corrective action, is necessary. Methods: The system sequentially estimates 2D and 3D human anatomical key points from a provided 2D image, using the Detectron2D and VideoPose3D algorithms, respectively. It then uses a graph convolutional network (GCN), explicitly crafted to analyze the spatial configuration and alignment of the upper body's anatomical key points in 3D space. This GCN aims to implicitly learn the intricate relationship between the estimated 3D key points and the correct posture, specifically to identify FHP. Results: The test accuracy was 78.27% when inputs included all joints corresponding to the upper body key points. The GCN model demonstrated slightly superior balanced performance across classes with an F1-score (macro) of 77.54%, compared to the baseline feedforward neural network (FFNN) model's 75.88%. Specifically, the GCN model showed a more balanced precision and recall between the classes, suggesting its potential for better generalization in FHP detection across diverse postures. Meanwhile, the baseline FFNN model demonstrates a higher precision for FHP cases but at the cost of lower recall, indicating that while it is more accurate in confirming FHP when detected, it misses a significant number of actual FHP instances. This assertion is further substantiated by the examination of the latent feature space using t-distributed stochastic neighbor embedding, where the GCN model presented an isotropic distribution, unlike the FFNN model, which showed an anisotropic distribution. Conclusions: Using 3D human pose estimation key points derived from 2D image input, the proposed GCN-based network can learn FHP-related features, supporting the development of a posture correction system.
We conclude the paper by addressing the limitations of our current system and proposing potential avenues for future work in this area. UR - https://formative.jmir.org/2024/1/e55476 UR - http://dx.doi.org/10.2196/55476 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55476 ER - TY - JOUR AU - Ridhi, Smriti AU - Robert, Dennis AU - Soren, Pitamber AU - Kumar, Manish AU - Pawar, Saniya AU - Reddy, Bhargava PY - 2024/8/21 TI - Comparing the Output of an Artificial Intelligence Algorithm in Detecting Radiological Signs of Pulmonary Tuberculosis in Digital Chest X-Rays and Their Smartphone-Captured Photos of X-Ray Films: Retrospective Study JO - JMIR Form Res SP - e55641 VL - 8 KW - artificial intelligence KW - AI KW - deep learning KW - early detection KW - tuberculosis KW - TB KW - computer-aided detection KW - diagnostic accuracy KW - chest x-ray KW - mobile phone N2 - Background: Artificial intelligence (AI) based computer-aided detection devices are recommended for screening and triaging of pulmonary tuberculosis (TB) using digital chest x-ray (CXR) images (soft copies). Most AI algorithms are trained using input data from digital CXR Digital Imaging and Communications in Medicine (DICOM) files. There can be scenarios when only digital CXR films (hard copies) are available for interpretation. A smartphone-captured photo of the digital CXR film may be used for AI to process in such a scenario. There is a gap in the literature investigating if there is a significant difference in the performance of AI algorithms when digital CXR DICOM files are used as input for AI to process as opposed to photos of the digital CXR films being used as input. Objective: The primary objective was to compare the agreement of AI in detecting radiological signs of TB when using DICOM files (denoted as CXRd) as input versus when using smartphone-captured photos of digital CXR films (denoted as CXRp) with human readers. 
Methods: Pairs of CXRd and CXRp images were obtained retrospectively from patients screened for TB. AI results were obtained using both the CXRd and CXRp files. The majority consensus on the presence or absence of TB in CXR pairs was obtained from a panel of 3 independent radiologists. The positive and negative percent agreement of AI in detecting radiological signs of TB in CXRd and CXRp were estimated by comparing with the majority consensus. The distribution of AI probability scores was also compared. Results: A total of 1278 CXR pairs were analyzed. The positive percent agreement of AI was found to be 92.22% (95% CI 89.94-94.12) and 90.75% (95% CI 88.32-92.82), respectively, for CXRd and CXRp images (P=.09). The negative percent agreement of AI was 82.08% (95% CI 78.76-85.07) and 79.23% (95% CI 75.75-82.42), respectively, for CXRd and CXRp images (P=.06). The median of the AI probability score was 0.72 (IQR 0.11-0.97) in CXRd and 0.72 (IQR 0.14-0.96) in CXRp images (P=.75). Conclusions: We did not observe any statistically significant differences in the output of AI in digital CXRs and photos of digital CXR films. UR - https://formative.jmir.org/2024/1/e55641 UR - http://dx.doi.org/10.2196/55641 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55641 ER - TY - JOUR AU - Rahman, Jessica AU - Brankovic, Aida AU - Tracy, Mark AU - Khanna, Sankalp PY - 2024/8/20 TI - Exploring Computational Techniques in Preprocessing Neonatal Physiological Signals for Detecting Adverse Outcomes: Scoping Review JO - Interact J Med Res SP - e46946 VL - 13 KW - physiological signals KW - preterm KW - neonatal intensive care unit KW - morbidity KW - signal processing KW - signal analysis KW - adverse outcomes KW - predictive and diagnostic models N2 - Background: Computational signal preprocessing is a prerequisite for developing data-driven predictive models for clinical decision support. 
Thus, identifying the best practices that adhere to clinical principles is critical to ensure transparency and reproducibility to drive clinical adoption. It further fosters reproducible, ethical, and reliable conduct of studies. This procedure is also crucial for setting up a software quality management system to ensure regulatory compliance in developing software as a medical device aimed at early preclinical detection of clinical deterioration. Objective: This scoping review focuses on the neonatal intensive care unit setting and summarizes the state-of-the-art computational methods used for preprocessing neonatal clinical physiological signals; these signals are used for the development of machine learning models to predict the risk of adverse outcomes. Methods: Five databases (PubMed, Web of Science, Scopus, IEEE, and ACM Digital Library) were searched using a combination of keywords and MeSH (Medical Subject Headings) terms. A total of 3585 papers from 2013 to January 2023 were identified based on the defined search terms and inclusion criteria. After removing duplicates, 2994 (83.51%) papers were screened by title and abstract, and 81 (2.71%) were selected for full-text review. Of these, 52 (64%) were eligible for inclusion in the detailed analysis. Results: Of the 52 articles reviewed, 24 (46%) studies focused on diagnostic models, while the remainder (n=28, 54%) focused on prognostic models. The analysis conducted in these studies involved various physiological signals, with electrocardiograms being the most prevalent. Different programming languages were used, with MATLAB and Python being notable. The monitoring and capturing of physiological data used diverse systems, impacting data quality and introducing study heterogeneity. Outcomes of interest included sepsis, apnea, bradycardia, mortality, necrotizing enterocolitis, and hypoxic-ischemic encephalopathy, with some studies analyzing combinations of adverse outcomes.
We found a partial or complete lack of transparency in reporting the setting and the methods used for signal preprocessing. This includes reporting methods to handle missing data, segment size for considered analysis, and details regarding the modification of the state-of-the-art methods for physiological signal processing to align with the clinical principles for neonates. Only 7 (13%) of the 52 reviewed studies reported all the recommended preprocessing steps, which could have impacts on the downstream analysis. Conclusions: The review found heterogeneity in the techniques used and inconsistent reporting of parameters and procedures used for preprocessing neonatal physiological signals, which is necessary to confirm adherence to clinical and software quality management system practices, usefulness, and choice of best practices. Enhancing transparency in reporting and standardizing procedures will boost study interpretation and reproducibility and expedite clinical adoption, instilling confidence in the research findings and streamlining the translation of research outcomes into clinical practice, ultimately contributing to the advancement of neonatal care and patient outcomes. 
UR - https://www.i-jmr.org/2024/1/e46946 UR - http://dx.doi.org/10.2196/46946 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/46946 ER - TY - JOUR AU - Szumilas, Dawid AU - Ochmann, Anna AU - Zięba, Katarzyna AU - Bartoszewicz, Bartłomiej AU - Kubrak, Anna AU - Makuch, Sebastian AU - Agrawal, Siddarth AU - Mazur, Grzegorz AU - Chudek, Jerzy PY - 2024/8/14 TI - Evaluation of AI-Driven LabTest Checker for Diagnostic Accuracy and Safety: Prospective Cohort Study JO - JMIR Med Inform SP - e57162 VL - 12 KW - LabTest Checker KW - CDSS KW - symptom checker KW - laboratory testing KW - AI KW - assessment KW - accuracy KW - artificial intelligence KW - health care KW - medical fields KW - clinical decision support systems KW - application KW - applications KW - diagnoses KW - patients KW - patient KW - medical history KW - tool KW - tools N2 - Background: In recent years, the implementation of artificial intelligence (AI) in health care is progressively transforming medical fields, with the use of clinical decision support systems (CDSSs) as a notable application. Laboratory tests are vital for accurate diagnoses, but their increasing reliance presents challenges. The need for effective strategies for managing laboratory test interpretation is evident from the millions of monthly searches on test results' significance. As the potential role of CDSSs in laboratory diagnostics gains significance, however, more research is needed to explore this area. Objective: The primary objective of our study was to assess the accuracy and safety of LabTest Checker (LTC), a CDSS designed to support medical diagnoses by analyzing both laboratory test results and patients' medical histories. Methods: This cohort study embraced a prospective data collection approach. A total of 101 patients aged ≥18 years, in stable condition, and requiring comprehensive diagnosis were enrolled. A panel of blood laboratory tests was conducted for each participant.
Participants used LTC for test result interpretation. The accuracy and safety of the tool were assessed by comparing AI-generated suggestions to experienced doctor (consultant) recommendations, which are considered the gold standard. Results: The system achieved a 74.3% accuracy and 100% sensitivity for emergency safety and 92.3% sensitivity for urgent cases. It potentially reduced unnecessary medical visits by 41.6% (42/101) and achieved an 82.9% accuracy in identifying underlying pathologies. Conclusions: This study underscores the transformative potential of AI-based CDSSs in laboratory diagnostics, contributing to enhanced patient care, efficient health care systems, and improved medical outcomes. LTC's performance evaluation highlights the advancements in AI's role in laboratory medicine. Trial Registration: ClinicalTrials.gov NCT05813938; https://clinicaltrials.gov/study/NCT05813938 UR - https://medinform.jmir.org/2024/1/e57162 UR - http://dx.doi.org/10.2196/57162 ID - info:doi/10.2196/57162 ER - TY - JOUR AU - Trettin, Bettina AU - Skjøth, Maria Mette AU - Munk, Trier Nadja AU - Vestergaard, Tine AU - Nielsen, Charlotte PY - 2024/8/14 TI - Shifting Grounds: Facilitating Self-Care in Testing for Sexually Transmitted Infections Through the Use of Self-Test Technology: Qualitative Study JO - J Particip Med SP - e55705 VL - 16 KW - chlamydia KW - sexually transmitted diseases KW - participatory design KW - self-test KW - qualitative KW - Chlamydia trachomatis KW - lymphogranuloma venereum KW - participatory KW - STD KW - STDs KW - sexually transmitted KW - sexually transmitted illness KW - sexually transmitted illnesses KW - STI KW - STIs KW - participation KW - self-testing KW - screening KW - health screening KW - asymptomatic screening KW - testing uptake N2 - Background: Chlamydia remains prevalent worldwide and is considered a global public health problem. However, testing rates among young sexually active people remain low.
Effective clinical management relies on screening asymptomatic patients. However, attending face-to-face consultations for sexually transmitted infection testing is associated with stigmatization and anxiety. Self-testing technology (STT) allows patients to test themselves for chlamydia and gonorrhea without the presence of health care professionals. This may result in wider access to testing and increase testing uptake. Therefore, the sexual health clinic at Odense University Hospital has designed and developed a technology that allows patients to get tested at the clinic through self-collected sampling without a face-to-face consultation. Objective: This study aimed to (1) pilot-test STT used in clinical practice and (2) investigate the experiences of patients who have completed a self-test for chlamydia and gonorrhea. Methods: The study was conducted as a qualitative study inspired by the methodology of participatory design. Ethnographic methods were applied in the feasibility study, and data were analyzed in iterative processes inspired by the action research spiral, using steps such as plan, act, observe, and reflect. The qualitative evaluation study used semistructured interviews, and data were analyzed using a qualitative 3-level analytical model. Results: The findings from the feasibility study, such as the lack of signposting and adequate information, led to the final modifications of the self-test technology and made it possible to implement it in clinical practice. The qualitative evaluation study found that self-testing was seen as more appealing than testing at a face-to-face consultation because it was an easy solution that both saved time and allowed for the freedom to plan the visit independently. A sense of security arose when the instructions balanced being detail-oriented with being simple and illustrative.
The anonymity and discretion contributed to preserving privacy and removed the fear of an awkward conversation or being judged by health care professionals, thus leading to the reduction of intrusive feelings. Conclusions: Accessible health care services are crucial in preventing and reducing the impact of sexually transmitted infections, and STT may have the potential to increase testing uptake as it takes into account some of the barriers that exist. The pilot test and evaluation have resulted in a fully functioning implementation of STT in clinical practice. UR - https://jopm.jmir.org/2024/1/e55705 UR - http://dx.doi.org/10.2196/55705 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55705 ER - TY - JOUR AU - Tsai, Chung-You AU - Tian, Jing-Hui AU - Lee, Chien-Cheng AU - Kuo, Hann-Chorng PY - 2024/7/23 TI - Building Dual AI Models and Nomograms Using Noninvasive Parameters for Aiding Male Bladder Outlet Obstruction Diagnosis and Minimizing the Need for Invasive Video-Urodynamic Studies: Development and Validation Study JO - J Med Internet Res SP - e58599 VL - 26 KW - bladder outlet obstruction KW - lower urinary tract symptoms KW - machine learning KW - nomogram KW - artificial intelligence KW - video urodynamic study N2 - Background: Diagnosing underlying causes of nonneurogenic male lower urinary tract symptoms associated with bladder outlet obstruction (BOO) is challenging. Video-urodynamic studies (VUDS) and pressure-flow studies (PFS) are both invasive diagnostic methods for BOO. VUDS can more precisely differentiate etiologies of male BOO, such as benign prostatic obstruction, primary bladder neck obstruction, and dysfunctional voiding, potentially outperforming PFS. Objective: These examinations' invasive nature highlights the need for developing noninvasive predictive models to facilitate BOO diagnosis and reduce the necessity for invasive procedures.
Methods: We conducted a retrospective study with a cohort of men with medication-refractory, nonneurogenic lower urinary tract symptoms suspected of BOO who underwent VUDS from 2001 to 2022. In total, 2 BOO predictive models were developed: 1 based on the International Continence Society's definition (International Continence Society–defined bladder outlet obstruction; ICS-BOO) and the other on video-urodynamic studies–diagnosed bladder outlet obstruction (VBOO). The patient cohort was randomly split into training and test sets for analysis. A total of 6 machine learning algorithms, including logistic regression, were used for model development. During model development, we first performed development validation using repeated 5-fold cross-validation on the training set and then test validation to assess the model's performance on an independent test set. Both models were implemented as paper-based nomograms and integrated into a web-based artificial intelligence prediction tool to aid clinical decision-making. Results: Among 307 patients, 26.7% (n=82) met the ICS-BOO criteria, while 82.1% (n=252) were diagnosed with VBOO. The ICS-BOO prediction model had a mean area under the receiver operating characteristic curve (AUC) of 0.74 (SD 0.09) and mean accuracy of 0.76 (SD 0.04) in development validation and AUC and accuracy of 0.86 and 0.77, respectively, in test validation. The VBOO prediction model yielded a mean AUC of 0.71 (SD 0.06) and mean accuracy of 0.77 (SD 0.06) internally, with AUC and accuracy of 0.72 and 0.76, respectively, externally. When both models' predictions are applied to the same patient, their combined insights can significantly enhance clinical decision-making and simplify the diagnostic pathway. By the dual-model prediction approach, if both models positively predict BOO, suggesting all cases actually resulted from medication-refractory primary bladder neck obstruction or benign prostatic obstruction, surgical intervention may be considered.
Thus, VUDS might be unnecessary for 100 (32.6%) patients. Conversely, when ICS-BOO predictions are negative but VBOO predictions are positive, indicating varied etiology, VUDS rather than PFS is advised for precise diagnosis and guiding subsequent therapy, accurately identifying 51.1% (47/92) of patients for VUDS. Conclusions: The 2 machine learning models predicting ICS-BOO and VBOO, based on 6 noninvasive clinical parameters, demonstrate commendable discrimination performance. Using the dual-model prediction approach, when both models predict positively, VUDS may be avoided, assisting in male BOO diagnosis and reducing the need for such invasive procedures. UR - https://www.jmir.org/2024/1/e58599 UR - http://dx.doi.org/10.2196/58599 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58599 ER - TY - JOUR AU - Knitza, Johannes AU - Tascilar, Koray AU - Fuchs, Franziska AU - Mohn, Jacob AU - Kuhn, Sebastian AU - Bohr, Daniela AU - Muehlensiepen, Felix AU - Bergmann, Christina AU - Labinsky, Hannah AU - Morf, Harriet AU - Araujo, Elizabeth AU - Englbrecht, Matthias AU - Vorbrüggen, Wolfgang AU - von der Decken, Cay-Benedict AU - Kleinert, Stefan AU - Ramming, Andreas AU - Distler, W. Jörg H. 
AU - Bartz-Bazzanella, Peter AU - Vuillerme, Nicolas AU - Schett, Georg AU - Welcker, Martin AU - Hueber, Axel PY - 2024/7/23 TI - Diagnostic Accuracy of a Mobile AI-Based Symptom Checker and a Web-Based Self-Referral Tool in Rheumatology: Multicenter Randomized Controlled Trial JO - J Med Internet Res SP - e55542 VL - 26 KW - symptom checker KW - artificial intelligence KW - eHealth KW - diagnostic decision support system KW - rheumatology KW - decision support KW - decision KW - diagnostic KW - tool KW - rheumatologists KW - symptom assessment KW - resources KW - randomized controlled trial KW - diagnosis KW - decision support system KW - support system KW - support N2 - Background: The diagnosis of inflammatory rheumatic diseases (IRDs) is often delayed due to unspecific symptoms and a shortage of rheumatologists. Digital diagnostic decision support systems (DDSSs) have the potential to expedite diagnosis and help patients navigate the health care system more efficiently. Objective: The aim of this study was to assess the diagnostic accuracy of a mobile artificial intelligence (AI)–based symptom checker (Ada) and a web-based self-referral tool (Rheport) regarding IRDs. Methods: A prospective, multicenter, open-label, crossover randomized controlled trial was conducted with patients newly presenting to 3 rheumatology centers. Participants were randomly assigned to complete a symptom assessment using either Ada or Rheport. The primary outcome was the correct identification of IRDs by the DDSSs, defined as the presence of any IRD in the list of suggested diagnoses by Ada or achieving a prespecified threshold score with Rheport. The gold standard was the diagnosis made by rheumatologists. Results: A total of 600 patients were included, among whom 214 (35.7%) were diagnosed with an IRD. The most frequent IRD was rheumatoid arthritis, with 69 (11.5%) patients.
Rheport's disease suggestion and Ada's top 1 (D1) and top 5 (D5) disease suggestions demonstrated overall diagnostic accuracies of 52%, 63%, and 58%, respectively, for IRDs. Rheport showed a sensitivity of 62% and a specificity of 47% for IRDs. Ada's D1 and D5 disease suggestions showed a sensitivity of 52% and 66%, respectively, and a specificity of 68% and 54%, respectively, concerning IRDs. Ada's diagnostic accuracy regarding individual diagnoses was heterogeneous, and Ada performed considerably better in identifying rheumatoid arthritis in comparison to other diagnoses (D1: 42%; D5: 64%). The Cohen κ statistic of Rheport for agreement on any rheumatic disease diagnosis with Ada D1 was 0.15 (95% CI 0.08-0.18) and with Ada D5 was 0.08 (95% CI 0.00-0.16), indicating poor agreement for the presence of any rheumatic disease between the 2 DDSSs. Conclusions: To our knowledge, this is the largest comparative DDSS trial with actual use of DDSSs by patients. The diagnostic accuracies of both DDSSs for IRDs were not promising in this high-prevalence patient population. DDSSs may lead to a misuse of scarce health care resources. Our results underscore the need for stringent regulation and drastic improvements to ensure the safety and efficacy of DDSSs.
Trial Registration: German Register of Clinical Trials DRKS00017642; https://drks.de/search/en/trial/DRKS00017642 UR - https://www.jmir.org/2024/1/e55542 UR - http://dx.doi.org/10.2196/55542 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55542 ER - TY - JOUR AU - Kizaki, Hayato AU - Satoh, Hiroki AU - Ebara, Sayaka AU - Watabe, Satoshi AU - Sawada, Yasufumi AU - Imai, Shungo AU - Hori, Satoko PY - 2024/7/23 TI - Construction of a Multi-Label Classifier for Extracting Multiple Incident Factors From Medication Incident Reports in Residential Care Facilities: Natural Language Processing Approach JO - JMIR Med Inform SP - e58141 VL - 12 KW - residential facilities KW - incidents KW - non-medical staff KW - natural language processing KW - risk management N2 - Background: Medication safety in residential care facilities is a critical concern, particularly when nonmedical staff provide medication assistance. The complex nature of medication-related incidents in these settings, coupled with the psychological impact on health care providers, underscores the need for effective incident analysis and preventive strategies. A thorough understanding of the root causes, typically through incident-report analysis, is essential for mitigating medication-related incidents. Objective: We aimed to develop and evaluate a multilabel classifier using natural language processing to identify factors contributing to medication-related incidents using incident report descriptions from residential care facilities, with a focus on incidents involving nonmedical staff. Methods: We analyzed 2143 incident reports, comprising 7121 sentences, from residential care facilities in Japan between April 1, 2015, and March 31, 2016. The incident factors were annotated using sentences based on an established organizational factor model and previous research findings. 
The following 9 factors were defined: procedure adherence, medicine, resident, resident family, nonmedical staff, medical staff, team, environment, and organizational management. To assess the label criteria, 2 researchers with relevant medical knowledge annotated a subset of 50 reports; the interannotator agreement was measured using Cohen κ. The entire data set was subsequently annotated by 1 researcher. Multiple labels were assigned to each sentence. A multilabel classifier was developed using deep learning models, including 2 Bidirectional Encoder Representations From Transformers (BERT)–type models (Tohoku-BERT and a University of Tokyo Hospital BERT pretrained with Japanese clinical text: UTH-BERT) and an Efficiently Learning Encoder That Classifies Token Replacements Accurately (ELECTRA), pretrained on Japanese text. Both sentence- and report-level training were performed; the performance was evaluated by the F1-score and exact match accuracy through 5-fold cross-validation. Results: Among all 7121 sentences, 1167, 694, 2455, 23, 1905, 46, 195, 1104, and 195 included "procedure adherence," "medicine," "resident," "resident family," "nonmedical staff," "medical staff," "team," "environment," and "organizational management," respectively. Owing to limited labels, "resident family" and "medical staff" were omitted from the model development process. The interannotator agreement values were higher than 0.6 for each label. A total of 10, 278, and 1855 reports contained no, 1, and multiple labels, respectively. The models trained using the report data outperformed those trained using sentences, with macro F1-scores of 0.744, 0.675, and 0.735 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. The report-trained models also demonstrated better exact match accuracy, with 0.411, 0.389, and 0.399 for Tohoku-BERT, UTH-BERT, and ELECTRA, respectively. Notably, the accuracy was consistent even when the analysis was confined to reports containing multiple labels.
Conclusions: The multilabel classifier developed in our study demonstrated potential for identifying various factors associated with medication-related incidents using incident reports from residential care facilities. Thus, this classifier can facilitate prompt analysis of incident factors, thereby contributing to risk management and the development of preventive strategies. UR - https://medinform.jmir.org/2024/1/e58141 UR - http://dx.doi.org/10.2196/58141 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58141 ER - TY - JOUR AU - Bienefeld, Nadine AU - Keller, Emanuela AU - Grote, Gudela PY - 2024/7/22 TI - Human-AI Teaming in Critical Care: A Comparative Analysis of Data Scientists' and Clinicians' Perspectives on AI Augmentation and Automation JO - J Med Internet Res SP - e50130 VL - 26 KW - AI in health care KW - human-AI teaming KW - sociotechnical systems KW - intensive care KW - ICU KW - AI adoption KW - AI implementation KW - augmentation KW - automation, health care policy and regulatory foresight KW - explainable AI KW - explainable KW - human-AI KW - human-computer KW - human-machine KW - ethical implications of AI in health care KW - ethical KW - ethic KW - ethics KW - artificial intelligence KW - policy KW - foresight KW - policies KW - recommendation KW - recommendations KW - policy maker KW - policy makers KW - Delphi KW - sociotechnical N2 - Background: Artificial intelligence (AI) holds immense potential for enhancing clinical and administrative health care tasks. However, slow adoption and implementation challenges highlight the need to consider how humans can effectively collaborate with AI within broader socio-technical systems in health care. Objective: In the example of intensive care units (ICUs), we compare data scientists' and clinicians' assessments of the optimal utilization of human and AI capabilities by determining suitable levels of human-AI teaming for safely and meaningfully augmenting or automating 6 core tasks.
The goal is to provide actionable recommendations for policy makers and health care practitioners regarding AI design and implementation. Methods: In this multimethod study, we combine a systematic task analysis across 6 ICUs with an international Delphi survey involving 19 health data scientists from the industry and academia and 61 ICU clinicians (25 physicians and 36 nurses) to define and assess optimal levels of human-AI teaming (level 1=no performance benefits; level 2=AI augments human performance; level 3=humans augment AI performance; level 4=AI performs without human input). Stakeholder groups also considered ethical and social implications. Results: Both stakeholder groups chose level 2 and 3 human-AI teaming for 4 out of 6 core tasks in the ICU. For one task (monitoring), level 4 was the preferred design choice. For the task of patient interactions, both data scientists and clinicians agreed that AI should not be used regardless of technological feasibility due to the importance of the physician-patient and nurse-patient relationship and ethical concerns. Human-AI design choices rely on interpretability, predictability, and control over AI systems. If these conditions are not met and AI performs below human-level reliability, a reduction to level 1 or shifting accountability away from human end users is advised. If AI performs at or beyond human-level reliability and these conditions are not met, shifting to level 4 automation should be considered to ensure safe and efficient human-AI teaming. Conclusions: By considering the sociotechnical system and determining appropriate levels of human-AI teaming, our study showcases the potential for improving the safety and effectiveness of AI usage in ICUs and broader health care settings. Regulatory measures should prioritize interpretability, predictability, and control if clinicians hold full accountability. 
Ethical and social implications must be carefully evaluated to ensure effective collaboration between humans and AI, particularly considering the most recent advancements in generative AI. UR - https://www.jmir.org/2024/1/e50130 UR - http://dx.doi.org/10.2196/50130 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/50130 ER - TY - JOUR AU - Zha, Bowen AU - Cai, Angshu AU - Wang, Guiqi PY - 2024/7/15 TI - Diagnostic Accuracy of Artificial Intelligence in Endoscopy: Umbrella Review JO - JMIR Med Inform SP - e56361 VL - 12 KW - endoscopy KW - artificial intelligence KW - umbrella review KW - meta-analyses KW - AI KW - diagnostic KW - researchers KW - researcher KW - tools KW - tool KW - assessment N2 - Background: Some research has already reported the diagnostic value of artificial intelligence (AI) in different endoscopy outcomes. However, the evidence is confusing and of varying quality. Objective: This review aimed to comprehensively evaluate the credibility of the evidence of AI's diagnostic accuracy in endoscopy. Methods: Before the study began, the protocol was registered on PROSPERO (CRD42023483073). First, 2 researchers searched PubMed, Web of Science, Embase, and Cochrane Library using comprehensive search terms. Then, researchers screened the articles and extracted information. We used A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2) to evaluate the quality of the articles. When there were multiple studies aiming at the same result, we chose the study with higher-quality evaluations for further analysis. To ensure the reliability of the conclusions, we recalculated each outcome. Finally, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) was used to evaluate the credibility of the outcomes. Results: A total of 21 studies were included for analysis. Through AMSTAR2, it was found that 8 research methodologies were of moderate quality, while other studies were regarded as having low or critically low quality.
The sensitivity and specificity of 17 different outcomes were analyzed. There were 4 studies on the esophagus, 4 studies on the stomach, and 4 studies on colorectal regions. Two studies were associated with capsule endoscopy, two were related to laryngoscopy, and one was related to ultrasonic endoscopy. In terms of sensitivity, gastroesophageal reflux disease had the highest accuracy rate, reaching 97%, while the invasion depth of colon neoplasia, with 71%, had the lowest accuracy rate. On the other hand, the specificity of colorectal cancer was the highest, reaching 98%, while the gastrointestinal stromal tumor, with only 80%, had the lowest specificity. The GRADE evaluation suggested that the reliability of most outcomes was low or very low. Conclusions: AI proved valuable in endoscopic diagnoses, especially in esophageal and colorectal diseases. These findings provide a theoretical basis for developing and evaluating AI-assisted systems, which are aimed at assisting endoscopists in carrying out examinations, leading to improved patient health outcomes. However, further high-quality research is needed in the future to fully validate AI's effectiveness. UR - https://medinform.jmir.org/2024/1/e56361 UR - http://dx.doi.org/10.2196/56361 ID - info:doi/10.2196/56361 ER - TY - JOUR AU - Cho, Youngjin AU - Yoon, Minjae AU - Kim, Joonghee AU - Lee, Hyun Ji AU - Oh, Il-Young AU - Lee, Joo Chan AU - Kang, Seok-Min AU - Choi, Dong-Ju PY - 2024/7/3 TI - Artificial Intelligence-Based Electrocardiographic Biomarker for Outcome Prediction in Patients With Acute Heart Failure: Prospective Cohort Study JO - J Med Internet Res SP - e52139 VL - 26 KW - acute heart failure KW - electrocardiography KW - artificial intelligence KW - deep learning N2 - Background: Although several biomarkers exist for patients with heart failure (HF), their use in routine clinical practice is often constrained by high costs and limited availability.
Objective: We examined the utility of an artificial intelligence (AI) algorithm that analyzes printed electrocardiograms (ECGs) for outcome prediction in patients with acute HF. Methods: We retrospectively analyzed prospectively collected data of patients with acute HF at two tertiary centers in Korea. Baseline ECGs were analyzed using a deep-learning system called Quantitative ECG (QCG), which was trained to detect several urgent clinical conditions, including shock, cardiac arrest, and reduced left ventricular ejection fraction (LVEF). Results: Among the 1254 patients enrolled, in-hospital cardiac death occurred in 53 (4.2%) patients, and the QCG score for critical events (QCG-Critical) was significantly higher in these patients than in survivors (mean 0.57, SD 0.23 vs mean 0.29, SD 0.20; P<.001). The QCG-Critical score was an independent predictor of in-hospital cardiac death after adjustment for age, sex, comorbidities, HF etiology/type, atrial fibrillation, and QRS widening (adjusted odds ratio [OR] 1.68, 95% CI 1.47-1.92 per 0.1 increase; P<.001), and remained a significant predictor after additional adjustments for echocardiographic LVEF and N-terminal prohormone of brain natriuretic peptide level (adjusted OR 1.59, 95% CI 1.36-1.87 per 0.1 increase; P<.001). During long-term follow-up, patients with higher QCG-Critical scores (>0.5) had higher mortality rates than those with low QCG-Critical scores (<0.25) (adjusted hazard ratio 2.69, 95% CI 2.14-3.38; P<.001). Conclusions: Predicting outcomes in patients with acute HF using the QCG-Critical score is feasible, indicating that this AI-based ECG score may be a novel biomarker for these patients. 
Trial Registration: ClinicalTrials.gov NCT01389843; https://clinicaltrials.gov/study/NCT01389843 UR - https://www.jmir.org/2024/1/e52139 UR - http://dx.doi.org/10.2196/52139 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52139 ER - TY - JOUR AU - Marri, Shankar Shiva AU - Albadri, Warood AU - Hyder, Salman Mohammed AU - Janagond, B. Ajit AU - Inamadar, C. Arun PY - 2024/7/2 TI - Efficacy of an Artificial Intelligence App (Aysa) in Dermatological Diagnosis: Cross-Sectional Analysis JO - JMIR Dermatol SP - e48811 VL - 7 KW - artificial intelligence KW - AI KW - AI-aided diagnosis KW - dermatology KW - mobile app KW - application KW - neural network KW - machine learning KW - dermatological KW - skin KW - computer-aided diagnosis KW - diagnostic KW - imaging KW - lesion N2 - Background: Dermatology is an ideal specialty for artificial intelligence (AI)-driven image recognition to improve diagnostic accuracy and patient care. Lack of dermatologists in many parts of the world and the high frequency of cutaneous disorders and malignancies highlight the increasing need for AI-aided diagnosis. Although AI-based applications for the identification of dermatological conditions are widely available, research assessing their reliability and accuracy is lacking. Objective: The aim of this study was to analyze the efficacy of the Aysa AI app as a preliminary diagnostic tool for various dermatological conditions in a semiurban town in India. Methods: This observational cross-sectional study included patients over the age of 2 years who visited the dermatology clinic. Images of lesions from individuals with various skin disorders were uploaded to the app after obtaining informed consent. The app was used to make a patient profile, identify lesion morphology, plot the location on a human model, and answer questions regarding duration and symptoms. The app presented eight differential diagnoses, which were compared with the clinical diagnosis.
The model's performance was evaluated using sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F1-score. Comparison of categorical variables was performed with the χ2 test and statistical significance was considered at P<.05. Results: A total of 700 patients were part of the study. A wide variety of skin conditions were grouped into 12 categories. The AI model had a mean top-1 sensitivity of 71% (95% CI 61.5%-74.3%), top-3 sensitivity of 86.1% (95% CI 83.4%-88.6%), and all-8 sensitivity of 95.1% (95% CI 93.3%-96.6%). The top-1 sensitivities for diagnosis of skin infestations, disorders of keratinization, other inflammatory conditions, and bacterial infections were 85.7%, 85.7%, 82.7%, and 81.8%, respectively. In the case of photodermatoses and malignant tumors, the top-1 sensitivities were 33.3% and 10%, respectively. Each category had a strong correlation between the clinical diagnosis and the probable diagnoses (P<.001). Conclusions: The Aysa app showed promising results in identifying most dermatoses. UR - https://derma.jmir.org/2024/1/e48811 UR - http://dx.doi.org/10.2196/48811 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/48811 ER - TY - JOUR AU - Lu, Linken AU - Lu, Tangsheng AU - Tian, Chunyu AU - Zhang, Xiujun PY - 2024/6/28 TI - AI: Bridging Ancient Wisdom and Modern Innovation in Traditional Chinese Medicine JO - JMIR Med Inform SP - e58491 VL - 12 KW - traditional Chinese medicine KW - TCM KW - artificial intelligence KW - AI KW - diagnosis UR - https://medinform.jmir.org/2024/1/e58491 UR - http://dx.doi.org/10.2196/58491 UR - http://www.ncbi.nlm.nih.gov/pubmed/38941141 ID - info:doi/10.2196/58491 ER - TY - JOUR AU - Kale, U. Aditya AU - Hogg, Jeffry Henry David AU - Pearson, Russell AU - Glocker, Ben AU - Golder, Su AU - Coombe, April AU - Waring, Justin AU - Liu, Xiaoxuan AU - Moore, J. David AU - Denniston, K.
Alastair PY - 2024/6/28 TI - Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review JO - JMIR Res Protoc SP - e51614 VL - 13 KW - patient safety KW - adverse events KW - randomized controlled trials KW - medical device KW - systematic review KW - algorithmic KW - artificial intelligence KW - AI KW - AI health technology KW - safety KW - algorithm error N2 - Background: Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making such as drug dosing. There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report. Objective: This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed including whether the analysis includes the investigation of subgroup-level outcomes. Methods: This systematic review will identify and select RCTs assessing AI medical devices. 
Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB2). Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. Results: The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing and data collection and analysis began in April 2024. Conclusions: Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms is vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. 
Trial Registration: PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747 International Registered Report Identifier (IRRID): PRR1-10.2196/51614 UR - https://www.researchprotocols.org/2024/1/e51614 UR - http://dx.doi.org/10.2196/51614 UR - http://www.ncbi.nlm.nih.gov/pubmed/38941147 ID - info:doi/10.2196/51614 ER - TY - JOUR AU - Meer, Andreas AU - Rahm, Philipp AU - Schwendinger, Markus AU - Vock, Michael AU - Grunder, Bettina AU - Demurtas, Jacopo AU - Rutishauser, Jonas PY - 2024/6/27 TI - A Symptom-Checker for Adult Patients Visiting an Interdisciplinary Emergency Care Center and the Safety of Patient Self-Triage: Real-Life Prospective Evaluation JO - J Med Internet Res SP - e58157 VL - 26 KW - safety KW - telemedicine KW - teletriage KW - symptom-checker KW - self-triage KW - self-assessment KW - triage KW - triaging KW - symptom KW - symptoms KW - validation KW - validity KW - telehealth KW - mHealth KW - mobile health KW - app KW - apps KW - application KW - applications KW - diagnosis KW - diagnoses KW - diagnostic KW - diagnostics KW - checker KW - checkers KW - check KW - web KW - neural network KW - neural networks N2 - Background: Symptom-checkers have become important tools for self-triage, assisting patients to determine the urgency of medical care. To be safe and effective, these tools must be validated, particularly to avoid potentially hazardous undertriage without leading to inefficient overtriage. Only limited safety data from studies including small sample sizes have been available so far. Objective: The objective of our study was to prospectively investigate the safety of patients' self-triage in a large patient sample. We used SMASS (Swiss Medical Assessment System; in4medicine, Inc) pathfinder, a symptom-checker based on a computerized transparent neural network.
Methods: We recruited 2543 patients into this single-center, prospective clinical trial conducted at the cantonal hospital of Baden, Switzerland. Patients with an Emergency Severity Index of 1-2 were treated by the team of the emergency department, while those with an index of 3-5 were seen at the walk-in clinic by general physicians. We compared the triage recommendation obtained by the patients' self-triage with the assessment of clinical urgency made by 3 successive interdisciplinary panels of physicians (panels A, B, and C). Using the Clopper-Pearson CI, we assumed that to confirm the symptom-checker's safety, the upper confidence bound for the probability of a potentially hazardous undertriage should lie below 1%. A potentially hazardous undertriage was defined as a triage in which either all (consensus criterion) or the majority (majority criterion) of the experts of the last panel (panel C) rated the triage of the symptom-checker to be "rather likely" or "likely" life-threatening or harmful. Results: Of the 2543 patients, 1227 (48.25%) were female and 1316 (51.75%) male. None of the patients reached the prespecified consensus criterion for a potentially hazardous undertriage. This resulted in an upper 95% confidence bound of 0.1184%. Further, 4 cases met the majority criterion. This resulted in an upper 95% confidence bound for the probability of a potentially hazardous undertriage of 0.3616%. The 2-sided 95% Clopper-Pearson CI for the probability of overtriage (n=450 cases, 17.69%) was 16.23% to 19.24%, which is considerably lower than the figures reported in the literature. Conclusions: The symptom-checker proved to be a safe triage tool, avoiding potentially hazardous undertriage in a real-life clinical setting of emergency consultations at a walk-in clinic or emergency department without causing undesirable overtriage. Our data suggest the symptom-checker may be safely used in clinical routine.
Trial Registration: ClinicalTrials.gov NCT04055298; https://clinicaltrials.gov/study/NCT04055298 UR - https://www.jmir.org/2024/1/e58157 UR - http://dx.doi.org/10.2196/58157 UR - http://www.ncbi.nlm.nih.gov/pubmed/38809606 ID - info:doi/10.2196/58157 ER - TY - JOUR AU - Li, Aoyu AU - Li, Jingwen AU - Chai, Jiali AU - Wu, Wei AU - Chaudhary, Suamn AU - Zhao, Juanjuan AU - Qiang, Yan PY - 2024/6/26 TI - Detection of Mild Cognitive Impairment Through Hand Motor Function Under Digital Cognitive Test: Mixed Methods Study JO - JMIR Mhealth Uhealth SP - e48777 VL - 12 KW - mild cognitive impairment KW - movement kinetics KW - digital cognitive test KW - dual task KW - mobile phone N2 - Background: Early detection of cognitive impairment or dementia is essential to reduce the incidence of severe neurodegenerative diseases. However, currently available diagnostic tools for detecting mild cognitive impairment (MCI) or dementia are time-consuming, expensive, or not widely accessible. Hence, exploring more effective methods to assist clinicians in detecting MCI is necessary. Objective: In this study, we aimed to explore the feasibility and efficiency of assessing MCI through movement kinetics under tablet-based "drawing and dragging" tasks. Methods: We iteratively designed "drawing and dragging" tasks by conducting symposiums, programming, and interviews with stakeholders (neurologists, nurses, engineers, patients with MCI, healthy older adults, and caregivers). Subsequently, stroke patterns and movement kinetics were evaluated in healthy control and MCI groups by comparing 5 categories of features related to hand motor function (ie, time, stroke, frequency, score, and sequence). Finally, user experience with the overall cognitive screening system was investigated using structured questionnaires and unstructured interviews, and their suggestions were recorded. Results: The "drawing and dragging" tasks can detect MCI effectively, with an average accuracy of 85% (SD 2%).
Using statistical comparison of movement kinetics, we discovered that the time- and score-based features are the most effective among all the features. Specifically, compared with the healthy control group, the MCI group showed a significant increase in the time they took for the hand to switch from one stroke to the next, with longer drawing times, slow dragging, and lower scores. In addition, patients with MCI had poorer decision-making strategies and visual perception of drawing sequence features, as evidenced by adding auxiliary information and losing more local details in the drawing. Feedback from user experience indicates that our system is user-friendly and facilitates screening for deficits in self-perception. Conclusions: The tablet-based MCI detection system quantitatively assesses hand motor function in older adults and further elucidates the cognitive and behavioral decline phenomenon in patients with MCI. This innovative approach serves to identify and measure digital biomarkers associated with MCI or Alzheimer dementia, enabling the monitoring of changes in patients' executive function and visual perceptual abilities as the disease advances. UR - https://mhealth.jmir.org/2024/1/e48777 UR - http://dx.doi.org/10.2196/48777 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/48777 ER - TY - JOUR AU - Chandhiruthil Sathyan, Anjana AU - Yadav, Pramod AU - Gupta, Prashant AU - Mahapathra, Kumar Arun AU - Galib, Ruknuddin PY - 2024/6/10 TI - In Silico Approaches to Polyherbal Synergy: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e56646 VL - 13 KW - polyherbal formulation KW - Ayurveda system KW - Ayurveda KW - Ayurvedic medicine KW - Ayurvedic treatment KW - herbal KW - herbal drug KW - pharmacodynamic KW - pharmacology KW - computer-aided drug design KW - in silico methodology KW - scoping review N2 - Background: According to the World Health Organization, more than 80% of the world's population relies on traditional medicine.
Traditional medicine is typically based on the use of single herbal drugs or polyherbal formulations (PHFs) to manage diseases. However, the probable mode of action of these formulations is not well studied or documented. Over the past few decades, computational methods have been used to study the molecular mechanism of phytochemicals in single herbal drugs. However, the in silico methods applied to study PHFs remain unclear. Objective: The aim of this protocol is to develop a search strategy for a scoping review to map the in silico approaches applied in understanding the activity of PHFs used as traditional medicines worldwide. Methods: The scoping review will be conducted based on the methodology developed by Arksey and O'Malley and the recommendations of the Joanna Briggs Institute (JBI). A set of predetermined keywords will be used to identify the relevant studies from five databases: PubMed, Embase, Science Direct, Web of Science, and Google Scholar. Two independent reviewers will conduct the search to yield a list of relevant studies based on the inclusion and exclusion criteria. Mendeley version 1.19.8 will be used to remove duplicate citations, and title and abstract screening will be performed with Rayyan software. The JBI System for the Unified Management, Assessment, and Review of Information tool will be used for data extraction. The scoping review will be reported based on the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Results: Based on the core areas of the scoping review, a 3-step search strategy was developed. The initial search produced 3865 studies. After applying filters, 875 studies were short-listed for further review. Keywords were further refined to yield more relevant studies on the topic.
Conclusions: The findings are expected to determine the extent of the knowledge gap in the applications of computational methods in PHFs for any traditional medicine across the world. The study can provide answers to open research questions related to the phytochemical identification of PHFs, criteria for target identification, strategies applied for in silico studies, software used, and challenges in adopting in silico methods for understanding the mechanisms of action of PHFs. This study can thus provide a better understanding of the application and types of in silico methods for investigating PHFs. International Registered Report Identifier (IRRID): PRR1-10.2196/56646 UR - https://www.researchprotocols.org/2024/1/e56646 UR - http://dx.doi.org/10.2196/56646 UR - http://www.ncbi.nlm.nih.gov/pubmed/38857494 ID - info:doi/10.2196/56646 ER - TY - JOUR AU - Bittmann, A. Janina AU - Scherkl, Camilo AU - Meid, D. Andreas AU - Haefeli, E. Walter AU - Seidling, M. Hanna PY - 2024/6/4 TI - Event Analysis for Automated Estimation of Absent and Persistent Medication Alerts: Novel Methodology JO - JMIR Med Inform SP - e54428 VL - 12 KW - clinical decision support system KW - CDSS KW - medication alert system KW - alerting KW - alert acceptance KW - event analysis N2 - Background: Event analysis is a promising approach to estimate the acceptance of medication alerts issued by computerized physician order entry (CPOE) systems with an integrated clinical decision support system (CDSS), particularly when alerts cannot be interactively confirmed in the CPOE-CDSS due to its system architecture. Medication documentation is then reviewed for documented evidence of alert acceptance, which can be a time-consuming process, especially when performed manually. Objective: We present a new automated event analysis approach, which was applied to a large data set generated in a CPOE-CDSS with passive, noninterruptive alerts. 
Methods: Medication and alert data generated over 3.5 months within the CPOE-CDSS at Heidelberg University Hospital were divided into 24-hour time intervals in which the alert display was correlated with associated prescription changes. Alerts were considered "persistent" if they were displayed in every consecutive 24-hour time interval due to a respective active prescription until patient discharge and were considered "absent" if they were no longer displayed during continuous prescriptions in the subsequent interval. Results: Overall, 1670 patient cases with 11,428 alerts were analyzed. Alerts were displayed for a median of 3 (IQR 1-7) consecutive 24-hour time intervals, with the shortest alerts displayed for drug-allergy interactions and the longest alerts displayed for potentially inappropriate medication for the elderly (PIM). Among the total 11,428 alerts, 56.1% (n=6413) became absent, most commonly among alerts for drug-drug interactions (1915/2366, 80.9%) and least commonly among PIM alerts (199/499, 39.9%). Conclusions: This new approach to estimate alert acceptance based on event analysis can be flexibly adapted to the automated evaluation of passive, noninterruptive alerts. This enables large data sets of longitudinal patient cases to be processed, allows for the derivation of the ratios of persistent and absent alerts, and facilitates the comparison and prospective monitoring of these alerts.
UR - https://medinform.jmir.org/2024/1/e54428 UR - http://dx.doi.org/10.2196/54428 ID - info:doi/10.2196/54428 ER - TY - JOUR AU - Umibe, Akiko AU - Fushiki, Hiroaki AU - Tsunoda, Reiko AU - Kuroda, Tatsuaki AU - Kuroda, Kazuhiro AU - Tanaka, Yasuhiro PY - 2024/6/4 TI - Development of a Subjective Visual Vertical Test System Using a Smartphone With Virtual Reality Goggles for Screening of Otolithic Dysfunction: Observational Study JO - JMIR Form Res SP - e53642 VL - 8 KW - vestibular function tests KW - telemedicine KW - smartphone KW - virtual reality KW - otolith dysfunction screening tool KW - vestibular evoked myogenic potential KW - iPhone KW - mobile phone N2 - Background: The subjective visual vertical (SVV) test can evaluate otolith function and spatial awareness and is performed in dedicated vertigo centers using specialized equipment; however, it is not otherwise widely used because of the specific equipment and space requirements. An SVV test smartphone app was developed to easily perform assessments in outpatient facilities. Objective: This study aimed to verify whether the SVV test smartphone app with commercially available virtual reality goggles can be used in a clinical setting. Methods: The reference range was calculated for 15 healthy participants. We included 14 adult patients with unilateral vestibular neuritis, sudden sensorineural hearing loss with vertigo, and Meniere disease and investigated the correlation between the SVV test results and vestibular evoked myogenic potential (VEMP) results. Results: The SVV reference range of healthy participants for the sitting front-facing position was small, ranging from -2.6° to 2.3°. Among the 14 patients, 6 (43%) exceeded the reference range for healthy participants. The SVV of patients with vestibular neuritis and sudden sensorineural hearing loss tended to deviate to the affected side. A total of 9 (64%) had abnormal cervical VEMP (cVEMP) values and 6 (43%) had abnormal ocular VEMP (oVEMP) values.
No significant difference was found between the presence or absence of abnormal SVV values and the presence or absence of abnormal cVEMP and oVEMP values; however, the odds ratios (ORs) suggested a higher likelihood of abnormal SVV values among those with abnormal cVEMP and oVEMP responses (OR 2.40, 95% CI 0.18-32.88; P>.99; and OR 2, 95% CI 0.90-4.45; P=.46, respectively). Conclusions: The SVV app can be used anywhere and in a short period while reducing directional bias by using virtual reality goggles, thus making it highly versatile and useful as a practical otolith dysfunction screening tool. UR - https://formative.jmir.org/2024/1/e53642 UR - http://dx.doi.org/10.2196/53642 UR - http://www.ncbi.nlm.nih.gov/pubmed/38833295 ID - info:doi/10.2196/53642 ER - TY - JOUR AU - Eerdekens, Rob AU - Zelis, Jo AU - ter Horst, Herman AU - Crooijmans, Caia AU - van 't Veer, Marcel AU - Keulards, Danielle AU - Kelm, Marcus AU - Archer, Gareth AU - Kuehne, Titus AU - Brueren, Guus AU - Wijnbergen, Inge AU - Johnson, Nils AU - Tonino, Pim PY - 2024/6/3 TI - Cardiac Health Assessment Using a Wearable Device Before and After Transcatheter Aortic Valve Implantation: Prospective Study JO - JMIR Mhealth Uhealth SP - e53964 VL - 12 KW - aortic valve stenosis KW - health watch KW - quality of life KW - heart KW - cardiology KW - cardiac KW - aortic KW - valve KW - stenosis KW - watch KW - smartwatch KW - wearables KW - 6MWT KW - walking KW - test KW - QoL KW - WHOQOL-BREF KW - 6-minute walking test N2 - Background: Due to aging of the population, the prevalence of aortic valve stenosis will increase drastically in upcoming years. Consequently, transcatheter aortic valve implantation (TAVI) procedures will also expand worldwide. Optimal selection of patients who benefit with improved symptoms and prognoses is key, since TAVI is not without its risks. Currently, we are not able to adequately predict functional outcomes after TAVI. 
Quality of life measurement tools and traditional functional assessment tests do not always agree and can depend on factors unrelated to heart disease. Activity tracking using wearable devices might provide a more comprehensive assessment. Objective: This study aimed to identify objective parameters (eg, change in heart rate) associated with improvement after TAVI for severe aortic stenosis from a wearable device. Methods: In total, 100 patients undergoing routine TAVI wore a Philips Health Watch device for 1 week before and after the procedure. Watch data were analyzed offline: before TAVI for 97 patients and after TAVI for 75 patients. Results: Parameters such as the total number of steps and activity time did not change, in contrast to improvements in the 6-minute walking test (6MWT) and physical limitation domain of the transformed WHOQOL-BREF questionnaire. Conclusions: These findings, in an older TAVI population, show that watch-based parameters, such as the number of steps, do not change after TAVI, unlike traditional 6MWT and QoL assessments. Basic wearable device parameters might be less appropriate for measuring treatment effects from TAVI. 
UR - https://mhealth.jmir.org/2024/1/e53964 UR - http://dx.doi.org/10.2196/53964 ID - info:doi/10.2196/53964 ER - TY - JOUR AU - Sesgundo III, Angeles Jaime AU - Maeng, Collin David AU - Tukay, Aubrey Jumelle AU - Ascano, Patricia Maria AU - Suba-Cohen, Justine AU - Sampang, Virginia PY - 2024/5/27 TI - Evaluation of Artificial Intelligence Algorithms for Diabetic Retinopathy Detection: Protocol for a Systematic Review and Meta-Analysis JO - JMIR Res Protoc SP - e57292 VL - 13 KW - artificial intelligence KW - diabetic retinopathy KW - deep learning KW - ophthalmology KW - accuracy KW - imaging KW - AI KW - DR KW - complication KW - retinopathy KW - Optha KW - AI algorithms KW - detection KW - management KW - ophthalmologists KW - early detection KW - screening KW - meta-analysis KW - diabetes mellitus KW - DM KW - diabetes KW - systematic review N2 - Background: Diabetic retinopathy (DR) is one of the most common complications of diabetes mellitus. The global burden is immense, with a worldwide prevalence of 8.5%. Recent advancements in artificial intelligence (AI) have demonstrated the potential to transform the landscape of ophthalmology with earlier detection and management of DR. Objective: This study seeks to provide an update and evaluate the accuracy and current diagnostic ability of AI in detecting DR versus ophthalmologists. Additionally, this review will highlight the potential of AI integration to enhance DR screening, management, and disease progression. Methods: A systematic review of the current landscape of AI's role in DR will be undertaken, guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) model. Relevant peer-reviewed papers published in English will be identified by searching 4 international databases: PubMed, Embase, CINAHL, and the Cochrane Central Register of Controlled Trials. 
Eligible studies will include randomized controlled trials, observational studies, and cohort studies published on or after 2022 that evaluate AI?s performance in retinal imaging detection of DR in diverse adult populations. Studies that focus on specific comorbid conditions, nonimage-based applications of AI, or those lacking a direct comparison group or clear methodology will be excluded. Selected papers will be independently assessed for bias by 2 review authors (JS and DM) using the Quality Assessment of Diagnostic Accuracy Studies tool for systematic reviews. Upon systematic review completion, if it is determined that there are sufficient data, a meta-analysis will be performed. Data synthesis will use a quantitative model. Statistical software such as RevMan and STATA will be used to produce a random-effects meta-regression model to pool data from selected studies. Results: Using selected search queries across multiple databases, we accumulated 3494 studies regarding our topic of interest, of which 1588 were duplicates, leaving 1906 unique research papers to review and analyze. Conclusions: This systematic review and meta-analysis protocol outlines a comprehensive evaluation of AI for DR detection. This active study is anticipated to assess the current accuracy of AI methods in detecting DR. 
International Registered Report Identifier (IRRID): DERR1-10.2196/57292 UR - https://www.researchprotocols.org/2024/1/e57292 UR - http://dx.doi.org/10.2196/57292 UR - http://www.ncbi.nlm.nih.gov/pubmed/38801771 ID - info:doi/10.2196/57292 ER - TY - JOUR AU - Shao, Jian AU - Pan, Ying AU - Kou, Wei-Bin AU - Feng, Huyi AU - Zhao, Yu AU - Zhou, Kaixin AU - Zhong, Shao PY - 2024/5/24 TI - Generalization of a Deep Learning Model for Continuous Glucose Monitoring–Based Hypoglycemia Prediction: Algorithm Development and Validation Study JO - JMIR Med Inform SP - e56909 VL - 12 KW - hypoglycemia prediction KW - hypoglycemia KW - hypoglycemic KW - blood sugar KW - prediction KW - predictive KW - deep learning KW - generalization KW - machine learning KW - glucose KW - diabetes KW - continuous glucose monitoring KW - type 1 diabetes KW - type 2 diabetes KW - LSTM KW - long short-term memory N2 - Background: Predicting hypoglycemia while maintaining a low false alarm rate is a challenge for the wide adoption of continuous glucose monitoring (CGM) devices in diabetes management. One small study suggested that a deep learning model based on the long short-term memory (LSTM) network had better performance in hypoglycemia prediction than traditional machine learning algorithms in European patients with type 1 diabetes. However, given that many well-recognized deep learning models perform poorly outside the training setting, it remains unclear whether the LSTM model could be generalized to different populations or patients with other diabetes subtypes. Objective: The aim of this study was to validate LSTM hypoglycemia prediction models in more diverse populations and across a wide spectrum of patients with different subtypes of diabetes. Methods: We assembled two large data sets of patients with type 1 and type 2 diabetes. 
The primary data set including CGM data from 192 Chinese patients with diabetes was used to develop the LSTM, support vector machine (SVM), and random forest (RF) models for hypoglycemia prediction with a prediction horizon of 30 minutes. Hypoglycemia was categorized into mild (glucose=54-70 mg/dL) and severe (glucose<54 mg/dL) levels. The validation data set of 427 patients of European-American ancestry in the United States was used to validate the models and examine their generalizations. The predictive performance of the models was evaluated according to the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results: For the difficult-to-predict mild hypoglycemia events, the LSTM model consistently achieved AUC values greater than 97% in the primary data set, with a less than 3% AUC reduction in the validation data set, indicating that the model was robust and generalizable across populations. AUC values above 93% were also achieved when the LSTM model was applied to both type 1 and type 2 diabetes in the validation data set, further strengthening the generalizability of the model. Under different satisfactory levels of sensitivity for mild and severe hypoglycemia prediction, the LSTM model achieved higher specificity than the SVM and RF models, thereby reducing false alarms. Conclusions: Our results demonstrate that the LSTM model is robust for hypoglycemia prediction and is generalizable across populations or diabetes subtypes. Given its additional advantage of false-alarm reduction, the LSTM model is a strong candidate to be widely implemented in future CGM devices for hypoglycemia prediction. 
UR - https://medinform.jmir.org/2024/1/e56909 UR - http://dx.doi.org/10.2196/56909 ID - info:doi/10.2196/56909 ER - TY - JOUR AU - Prochaska, Eveline AU - Ammenwerth, Elske PY - 2024/5/23 TI - Clinical Utility and Usability of the Digital Box and Block Test: Mixed Methods Study JO - JMIR Rehabil Assist Technol SP - e54939 VL - 11 KW - assessment KW - clinical utility KW - digital Box and Block Test KW - dBBT KW - hand dexterity KW - dexterity KW - usability N2 - Background: The Box and Block Test (BBT) is a clinical tool used to measure hand dexterity, which is often used for tracking disease progression or the effectiveness of therapy, particularly benefiting older adults and those with neurological conditions. Digitizing the measurement of hand function may enhance the quality of data collection. We have developed and validated a prototype that digitizes this test, known as the digital BBT (dBBT), which automatically measures time and determines and displays the test result. Objective: This study aimed to investigate the clinical utility and usability of the newly developed dBBT and to collect suggestions for future improvements. Methods: A total of 4 occupational therapists participated in our study. To evaluate the clinical utility, we compared the dBBT to the BBT across dimensions such as acceptance, portability, energy and effort, time, and costs. We observed therapists using the dBBT as a dexterity measurement tool and conducted a quantitative usability questionnaire using the System Usability Scale (SUS), along with a focus group. Evaluative, structured, and qualitative content analysis was used for the qualitative data, whereas quantitative analysis was applied to questionnaire data. The qualitative and quantitative data were merged and analyzed using a convergent mixed methods approach. 
Results: Overall, the results of the evaluative content analysis suggested that the dBBT had a better clinical utility than the original BBT, with ratings of all collected participant statements for the dBBT being 45% (45/99) equal to, 48% (48/99) better than, and 6% (6/99) worse than the BBT. Particularly in the subcategories "acceptance," "time required for evaluation," and "purchase costs," the dBBT was rated as being better than the original BBT. The dBBT achieved a mean SUS score of 83 (95% CI 76-96). Additionally, several suggested changes to the system were identified. Conclusions: The study demonstrated an overall positive evaluation of the clinical utility and usability of the dBBT. Valuable insights were gathered for future system iterations. These pioneering results highlight the potential of digitizing hand dexterity assessments. Trial Registration: Open Science Framework qv2d9; https://osf.io/qv2d9 UR - https://rehab.jmir.org/2024/1/e54939 UR - http://dx.doi.org/10.2196/54939 ID - info:doi/10.2196/54939 ER - TY - JOUR AU - Lefkovitz, Ilana AU - Walsh, Samantha AU - Blank, J. Leah AU - Jetté, Nathalie AU - Kummer, R. Benjamin PY - 2024/5/22 TI - Direct Clinical Applications of Natural Language Processing in Common Neurological Disorders: Scoping Review JO - JMIR Neurotech SP - e51822 VL - 3 KW - natural language processing KW - NLP KW - unstructured KW - text KW - machine learning KW - deep learning KW - neurology KW - headache disorders KW - migraine KW - Parkinson disease KW - cerebrovascular disease KW - stroke KW - transient ischemic attack KW - epilepsy KW - multiple sclerosis KW - cardiovascular KW - artificial intelligence KW - Parkinson KW - neurological KW - neurological disorder KW - scoping review KW - diagnosis KW - treatment KW - prediction N2 - Background: Natural language processing (NLP), a branch of artificial intelligence that analyzes unstructured language, is being increasingly used in health care. 
However, the extent to which NLP has been formally studied in neurological disorders remains unclear. Objective: We sought to characterize studies that applied NLP to the diagnosis, prediction, or treatment of common neurological disorders. Methods: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) standards. The search was conducted using MEDLINE and Embase on May 11, 2022. Studies of NLP use in migraine, Parkinson disease, Alzheimer disease, stroke and transient ischemic attack, epilepsy, or multiple sclerosis were included. We excluded conference abstracts and review papers, as well as studies involving heterogeneous clinical populations or indirect clinical uses of NLP. Study characteristics were extracted and analyzed using descriptive statistics. We did not aggregate measurements of performance in our review due to the high variability in study outcomes, which is the main limitation of the study. Results: In total, 916 studies were identified, of which 41 (4.5%) met all eligibility criteria and were included in the final review. Of the 41 included studies, the most frequently represented disorders were stroke and transient ischemic attack (n=20, 49%), followed by epilepsy (n=10, 24%), Alzheimer disease (n=6, 15%), and multiple sclerosis (n=5, 12%). We found no studies of NLP use in migraine or Parkinson disease that met our eligibility criteria. The main objective of NLP was diagnosis (n=20, 49%), followed by disease phenotyping (n=17, 41%), prognostication (n=9, 22%), and treatment (n=4, 10%). In total, 18 (44%) studies used only machine learning approaches, 6 (15%) used only rule-based methods, and 17 (41%) used both. Conclusions: We found that NLP was most commonly applied for diagnosis, implying a potential role for NLP in augmenting diagnostic accuracy in settings with limited access to neurological expertise. 
We also found several gaps in neurological NLP research, with few to no studies addressing certain disorders, which may suggest additional areas of inquiry. Trial Registration: Prospective Register of Systematic Reviews (PROSPERO) CRD42021228703; https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=228703 UR - https://neuro.jmir.org/2024/1/e51822 UR - http://dx.doi.org/10.2196/51822 ID - info:doi/10.2196/51822 ER - TY - JOUR AU - Harada, Yukinori AU - Sakamoto, Tetsu AU - Sugimoto, Shu AU - Shimizu, Taro PY - 2024/5/17 TI - Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study JO - JMIR Form Res SP - e53985 VL - 8 KW - atypical presentations KW - diagnostic accuracy KW - diagnosis KW - diagnostics KW - symptom checker KW - uncommon diseases KW - symptom checkers KW - uncommon KW - rare KW - artificial intelligence N2 - Background: Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. Objective: This study aimed to assess the longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. Methods: This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We only included patients who underwent an AI-based symptom checkup at the index visit and whose diagnosis was confirmed during follow-up. 
Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as the final diagnosis in a list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker's diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: from May 1, 2019, to April 30, 2020 (first year); from May 1, 2020, to April 30, 2021 (second year); and from May 1, 2021, to April 30, 2022 (third year). Results: A total of 381 patients were included. Common diseases comprised 257 (67.5%) cases, and typical presentations were observed in 298 (78.2%) cases. Overall, the accuracy of the differential diagnosis list created by the AI-based symptom checker was 45.1% (172/381), which did not differ across the 3 years (first year: 97/219, 44.3%; second year: 32/72, 44.4%; and third year: 43/90, 47.7%; P=.85). The accuracy of the differential diagnosis list created by the symptom checker was low in those with uncommon diseases (30/124, 24.2%) and atypical presentations (12/83, 14.5%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list created by the symptom checker. Conclusions: A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker, which has been implemented in real-world clinical practice settings, showed no improvement over time. Uncommon diseases and atypical presentations were independently associated with a lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions. 
UR - https://formative.jmir.org/2024/1/e53985 UR - http://dx.doi.org/10.2196/53985 UR - http://www.ncbi.nlm.nih.gov/pubmed/38758588 ID - info:doi/10.2196/53985 ER - TY - JOUR AU - Zhang, Jinbo AU - Yang, Pingping AU - Zeng, Lu AU - Li, Shan AU - Zhou, Jiamei PY - 2024/5/14 TI - Ventilator-Associated Pneumonia Prediction Models Based on AI: Scoping Review JO - JMIR Med Inform SP - e57026 VL - 12 KW - artificial intelligence KW - machine learning KW - ventilator-associated pneumonia KW - prediction KW - scoping KW - PRISMA KW - Preferred Reporting Items for Systematic Reviews and Meta-Analyses N2 - Background: Ventilator-associated pneumonia (VAP) is a serious complication of mechanical ventilation therapy that affects patients' treatments and prognoses. Owing to its excellent data mining capabilities, artificial intelligence (AI) has been increasingly used to predict VAP. Objective: This paper reviews VAP prediction models that are based on AI, providing a reference for the early identification of high-risk groups in future clinical practice. Methods: A scoping review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The Wanfang database, the Chinese Biomedical Literature Database, Cochrane Library, Web of Science, PubMed, MEDLINE, and Embase were searched to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. The data extracted from the included studies were synthesized narratively. Results: Of the 137 publications retrieved, 11 were included in this scoping review. The included studies reported the use of AI for predicting VAP. All 11 studies predicted VAP occurrence, and studies on VAP prognosis were excluded. Further, these studies used text data, and none of them involved imaging data. 
Public databases were the primary sources of data for model building (studies: 6/11, 55%), and 5 studies had sample sizes of <1000. Machine learning was the primary algorithm for studying the VAP prediction models. However, deep learning and large language models were not used to construct VAP prediction models. The random forest model was the most commonly used model (studies: 5/11, 45%). All studies only performed internal validations, and none of them addressed how to implement and apply the final model in real-life clinical settings. Conclusions: This review presents an overview of studies that used AI to predict and diagnose VAP. AI models have better predictive performance than traditional methods and are expected to provide indispensable tools for VAP risk prediction in the future. However, the current research is in the model construction and validation stage, and the implementation of and guidance for clinical VAP prediction require further research. UR - https://medinform.jmir.org/2024/1/e57026 UR - http://dx.doi.org/10.2196/57026 ID - info:doi/10.2196/57026 ER - TY - JOUR AU - Bui, Thu Huong Thi AU - Nguyễn Thị Phương, Quỳnh AU - Cam Tu, Ho AU - Nguyen Phuong, Sinh AU - Pham, Thi Thuy AU - Vu, Thu AU - Nguyen Thi Thu, Huyen AU - Khanh Ho, Lam AU - Nguyen Tien, Dung PY - 2024/5/7 TI - The Roles of NOTCH3 p.R544C and Thrombophilia Genes in Vietnamese Patients With Ischemic Stroke: Study Involving a Hierarchical Cluster Analysis JO - JMIR Bioinform Biotech SP - e56884 VL - 5 KW - Glasgow Coma Scale KW - ischemic stroke KW - hierarchical cluster analysis KW - clustering KW - machine learning KW - MTHFR KW - NOTCH3 KW - modified Rankin scale KW - National Institutes of Health Stroke Scale KW - prothrombin KW - thrombophilia KW - mutations KW - genetics KW - genomics KW - ischemia KW - risk KW - risk analysis N2 - Background: The etiology of ischemic stroke is multifactorial. 
Several gene mutations have been identified as leading causes of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a hereditary disease that causes stroke and other neurological symptoms. Objective: We aimed to identify the variants of NOTCH3 and thrombophilia genes, and their complex interactions with other factors. Methods: We conducted a hierarchical cluster analysis (HCA) on the data of 100 patients diagnosed with ischemic stroke. The variants of NOTCH3 and thrombophilia genes were identified by polymerase chain reaction with confronting 2-pair primers and real-time polymerase chain reaction. The overall preclinical characteristics, cumulative cutpoint values, and factors associated with these somatic mutations were analyzed in unidimensional and multidimensional scaling models. Results: We identified the following optimal cutpoints: creatinine, 83.67 (SD 9.19) µmol/L; age, 54 (SD 5) years; prothrombin (PT) time, 13.25 (SD 0.17) seconds; and international normalized ratio (INR), 1.02 (SD 0.03). Using the Nagelkerke method, cutpoint 50% values of the Glasgow Coma Scale score; modified Rankin scale score; and National Institutes of Health Stroke Scale scores at admission, after 24 hours, and at discharge were 12.77, 2.86 (SD 1.21), 9.83 (SD 2.85), 7.29 (SD 2.04), and 6.85 (SD 2.90), respectively. Conclusions: The variants of MTHFR (C677T and A1298C) and NOTCH3 p.R544C may influence the stroke severity under specific conditions of PT, creatinine, INR, and BMI, with risk ratios of 4.8 (95% CI 1.53-15.04) and 3.13 (95% CI 1.60-6.11), respectively (Fisher exact test, P<.05). It is interesting that although there are many genes linked to increased atrial fibrillation risk, not all of them are associated with ischemic stroke risk. With the detection of stroke risk loci, more information can be gained on their impacts and interconnections, especially in young patients. 
UR - https://bioinform.jmir.org/2024/1/e56884 UR - http://dx.doi.org/10.2196/56884 UR - http://www.ncbi.nlm.nih.gov/pubmed/38935968 ID - info:doi/10.2196/56884 ER - TY - JOUR AU - Park, Bogyeom AU - Kim, Yuwon AU - Park, Jinseok AU - Choi, Hojin AU - Kim, Seong-Eun AU - Ryu, Hokyoung AU - Seo, Kyoungwon PY - 2024/4/17 TI - Integrating Biomarkers From Virtual Reality and Magnetic Resonance Imaging for the Early Detection of Mild Cognitive Impairment Using a Multimodal Learning Approach: Validation Study JO - J Med Internet Res SP - e54538 VL - 26 KW - magnetic resonance imaging KW - MRI KW - virtual reality KW - VR KW - early detection KW - mild cognitive impairment KW - multimodal learning KW - hand movement KW - eye movement N2 - Background: Early detection of mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer disease, is crucial for preventing the progression of dementia. Virtual reality (VR) biomarkers have proven to be effective in capturing behaviors associated with subtle deficits in instrumental activities of daily living, such as challenges in using a food-ordering kiosk, for early detection of MCI. On the other hand, magnetic resonance imaging (MRI) biomarkers have demonstrated their efficacy in quantifying observable structural brain changes that can aid in early MCI detection. Nevertheless, the relationship between VR-derived and MRI biomarkers remains an open question. In this context, we explored the integration of VR-derived and MRI biomarkers to enhance early MCI detection through a multimodal learning approach. Objective: We aimed to evaluate and compare the efficacy of VR-derived and MRI biomarkers in the classification of MCI while also examining the strengths and weaknesses of each approach. Furthermore, we focused on improving early MCI detection by leveraging multimodal learning to integrate VR-derived and MRI biomarkers. 
Methods: The study encompassed a total of 54 participants, comprising 22 (41%) healthy controls and 32 (59%) patients with MCI. Participants completed a virtual kiosk test to collect 4 VR-derived biomarkers (hand movement speed, scanpath length, time to completion, and the number of errors), and T1-weighted MRI scans were performed to collect 22 MRI biomarkers from both hemispheres. Analyses of covariance were used to compare these biomarkers between healthy controls and patients with MCI, with age considered as a covariate. Subsequently, the biomarkers that exhibited significant differences between the 2 groups were used to train and validate a multimodal learning model aimed at early screening for patients with MCI among healthy controls. Results: The support vector machine (SVM) using only VR-derived biomarkers achieved a sensitivity of 87.5% and specificity of 90%, whereas the MRI biomarkers showed a sensitivity of 90.9% and specificity of 71.4%. Moreover, a correlation analysis revealed a significant association between MRI-observed brain atrophy and impaired performance in instrumental activities of daily living in the VR environment. Notably, the integration of both VR-derived and MRI biomarkers into a multimodal SVM model yielded superior results compared to unimodal SVM models, achieving higher accuracy (94.4%), sensitivity (100%), specificity (90.9%), precision (87.5%), and F1-score (93.3%). Conclusions: The results indicate that VR-derived biomarkers, characterized by their high specificity, can be valuable as a robust, early screening tool for MCI in a broader older adult population. On the other hand, MRI biomarkers, known for their high sensitivity, excel at confirming the presence of MCI. Moreover, the multimodal learning approach introduced in our study provides valuable insights into the improvement of early MCI detection by integrating a diverse set of biomarkers. 
UR - https://www.jmir.org/2024/1/e54538 UR - http://dx.doi.org/10.2196/54538 UR - http://www.ncbi.nlm.nih.gov/pubmed/38631021 ID - info:doi/10.2196/54538 ER - TY - JOUR AU - He, Zhe AU - Bhasuran, Balu AU - Jin, Qiao AU - Tian, Shubo AU - Hanna, Karim AU - Shavor, Cindy AU - Arguello, Garcia Lisbeth AU - Murray, Patrick AU - Lu, Zhiyong PY - 2024/4/17 TI - Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study JO - J Med Internet Res SP - e56655 VL - 26 KW - large language models KW - generative artificial intelligence KW - generative AI KW - ChatGPT KW - laboratory test results KW - patient education KW - natural language processing N2 - Background: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. Objective: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test–related questions asked by patients and identify potential issues that can be mitigated using augmentation approaches. Methods: We collected laboratory test result–related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. 
We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. Results: Regarding the similarity of the responses from the 4 LLMs (the GPT-4 output was used as the reference answer), the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from Yahoo data were scored the lowest and, thus, as the least similar to GPT-4–generated answers. The results of the win rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from lack of interpretation in one's medical context, incorrect statements, and lack of references. Conclusions: By evaluating LLMs in generating responses to patients' laboratory test result–related questions, we found that, compared to the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4 responses were inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation. 
UR - https://www.jmir.org/2024/1/e56655 UR - http://dx.doi.org/10.2196/56655 UR - http://www.ncbi.nlm.nih.gov/pubmed/38630520 ID - info:doi/10.2196/56655 ER - TY - JOUR AU - Jonathan, Joan AU - Barakabitze, Alex Alcardo AU - Fast, D. Cynthia AU - Cox, Christophe PY - 2024/4/16 TI - Machine Learning for Prediction of Tuberculosis Detection: Case Study of Trained African Giant Pouched Rats JO - Online J Public Health Inform SP - e50771 VL - 16 KW - machine learning KW - African giant pouched rat KW - diagnosis KW - tuberculosis KW - health care N2 - Background: Technological advancement has led to the growth and rapid increase of tuberculosis (TB) medical data generated from different health care areas, including diagnosis. Prioritizing better adoption and acceptance of innovative diagnostic technology to reduce the spread of TB significantly benefits developing countries. Trained TB-detection rats are used in Tanzania and Ethiopia for operational research to complement other TB diagnostic tools. This technology has increased new TB case detection owing to its speed, cost-effectiveness, and sensitivity. Objective: During the TB detection process, rats produce vast amounts of data, providing an opportunity to identify interesting patterns that influence TB detection performance. This study aimed to develop models that predict if the rat will hit (indicate the presence of TB within) the sample or not using machine learning (ML) techniques. The goal was to improve the diagnostic accuracy and performance of TB detection involving rats. 
Methods: APOPO (Anti-Persoonsmijnen Ontmijnende Product Ontwikkeling) Center in Morogoro provided data for this study from 2012 to 2019, and 366,441 observations were used to build predictive models using ML techniques, including decision tree, random forest, naïve Bayes, support vector machine, and k-nearest neighbor, by incorporating a variety of variables, such as the diagnostic results from partner health clinics using methods endorsed by the World Health Organization (WHO). Results: The support vector machine technique yielded the highest accuracy of 83.39% for prediction compared to other ML techniques used. Furthermore, this study found that the inclusion of variables related to whether the sample contained TB or not increased the performance accuracy of the predictive model. Conclusions: The inclusion of variables related to the diagnostic results of TB samples may improve the detection performance of the trained rats. The study results may be of importance to TB-detection rat trainers and TB decision-makers as the results may prompt them to take action to maintain the usefulness of the technology and increase the TB detection performance of trained rats. 
UR - https://ojphi.jmir.org/2024/1/e50771 UR - http://dx.doi.org/10.2196/50771 UR - http://www.ncbi.nlm.nih.gov/pubmed/38625737 ID - info:doi/10.2196/50771 ER - TY - JOUR AU - Oreskovic, Jessica AU - Kaufman, Jaycee AU - Fossat, Yan PY - 2024/4/15 TI - Impact of Audio Data Compression on Feature Extraction for Vocal Biomarker Detection: Validation Study JO - JMIR Biomed Eng SP - e56246 VL - 9 KW - vocal biomarker KW - biomarker KW - biomarkers KW - sound KW - sounds KW - audio KW - compression KW - voice KW - acoustic KW - acoustics KW - audio compression KW - feature extraction KW - Python KW - speech KW - detect KW - detection KW - algorithm KW - algorithms N2 - Background: Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care. Objective: The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection. Methods: The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. 
Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis. Results: In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. Differences between the audio conversion software, MH, WS, and FFmpeg, were notable, impacting compression outcomes such as constant or variable bitrates. Analysis encompassed diverse data compression formats and a wide array of voice features and MFCCs. Wilcoxon signed rank tests yielded P values, with those below the Bonferroni-corrected significance level indicating significant alterations due to compression. The results indicated feature-specific impacts of compression across formats and bitrates. MH-converted files exhibited greater resilience compared to WS-converted files. Bitrate also influenced feature stability, with 38 cases affected uniquely by a single bitrate. Notably, voice features showed greater stability than MFCCs across conversion methods. Conclusions: Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. Considering the implementation of vocal biomarkers in health care, finding features that remain consistent through compression for data storage or transmission purposes is valuable. Focused on specific features and formats, future research could broaden the scope to include diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression's influence on voice features and MFCCs, providing insights for developing applications across fields. 
The research underscores the significance of feature stability in working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes. UR - https://biomedeng.jmir.org/2024/1/e56246 UR - http://dx.doi.org/10.2196/56246 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875677 ID - info:doi/10.2196/56246 ER - TY - JOUR AU - Huo, Jian AU - Yu, Yan AU - Lin, Wei AU - Hu, Anmin AU - Wu, Chaoran PY - 2024/4/12 TI - Application of AI in Multilevel Pain Assessment Using Facial Images: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e51250 VL - 26 KW - computer vision KW - facial image KW - monitoring KW - multilevel pain assessment KW - pain KW - postoperative KW - status N2 - Background: The continuous monitoring and recording of patients' pain status is a major problem in current research on postoperative pain management. In the large number of original or review articles focusing on different approaches for pain assessment, many researchers have investigated how computer vision (CV) can help by capturing facial expressions. However, there is a lack of proper comparison of results between studies to identify current research gaps. Objective: The purpose of this systematic review and meta-analysis was to investigate the diagnostic performance of artificial intelligence models for multilevel pain assessment from facial images. Methods: The PubMed, Embase, IEEE, Web of Science, and Cochrane Library databases were searched for related publications before September 30, 2023. Studies that used facial images alone to estimate multiple pain values were included in the systematic review. A study quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies, 2nd edition tool. The performance of these studies was assessed by metrics including sensitivity, specificity, log diagnostic odds ratio (LDOR), and area under the curve (AUC). 
The intermodal variability was assessed and presented by forest plots. Results: A total of 45 reports were included in the systematic review. The reported test accuracies ranged from 0.27-0.99, and the other metrics, including the mean standard error (MSE), mean absolute error (MAE), intraclass correlation coefficient (ICC), and Pearson correlation coefficient (PCC), ranged from 0.31-4.61, 0.24-2.8, 0.19-0.83, and 0.48-0.92, respectively. In total, 6 studies were included in the meta-analysis. Their combined sensitivity was 98% (95% CI 96%-99%), specificity was 98% (95% CI 97%-99%), LDOR was 7.99 (95% CI 6.73-9.31), and AUC was 0.99 (95% CI 0.99-1). The subgroup analysis showed that the diagnostic performance was acceptable, although imbalanced data were still emphasized as a major problem. All studies had at least one domain with a high risk of bias, and for 20% (9/45) of studies, there were no applicability concerns. Conclusions: This review summarizes recent evidence in automatic multilevel pain estimation from facial expressions and compared the test accuracy of results in a meta-analysis. Promising performance for pain estimation from facial images was established by current CV algorithms. Weaknesses in current studies were also identified, suggesting that larger databases and metrics evaluating multiclass classification performance could improve future studies. 
Trial Registration: PROSPERO CRD42023418181; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=418181 UR - https://www.jmir.org/2024/1/e51250 UR - http://dx.doi.org/10.2196/51250 UR - http://www.ncbi.nlm.nih.gov/pubmed/38607660 ID - info:doi/10.2196/51250 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Harada, Yukinori AU - Tokumasu, Kazuki AU - Ito, Takahiro AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2024/4/9 TI - Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration JO - JMIR Med Inform SP - e55627 VL - 12 KW - artificial intelligence KW - large language model KW - LLM KW - LLMs KW - language model KW - language models KW - ChatGPT KW - GPT KW - ChatGPT-4V KW - ChatGPT-4 Vision KW - clinical decision support KW - natural language processing KW - decision support KW - NLP KW - diagnostic excellence KW - diagnosis KW - diagnoses KW - diagnose KW - diagnostic KW - diagnostics KW - image KW - images KW - imaging N2 - Background: In the evolving field of health care, multimodal generative artificial intelligence (AI) systems, such as ChatGPT-4 with vision (ChatGPT-4V), represent a significant advancement, as they integrate visual data with text data. This integration has the potential to revolutionize clinical diagnostics by offering more comprehensive analysis capabilities. However, the impact on diagnostic accuracy of using image data to augment ChatGPT-4 remains unclear. Objective: This study aims to assess the impact of adding image data on ChatGPT-4's diagnostic accuracy and provide insights into how image data integration can enhance the accuracy of multimodal AI in medical diagnostics. Specifically, this study endeavored to compare the diagnostic accuracy between ChatGPT-4V, which processed both text and image data, and its counterpart, ChatGPT-4, which only uses text data. Methods: We identified a total of 557 case reports published in the American Journal of Case Reports from January 2022 to March 2023. 
After excluding cases that were nondiagnostic, pediatric, and lacking image data, we included 363 case descriptions with their final diagnoses and associated images. We compared the diagnostic accuracy of ChatGPT-4V and ChatGPT-4 without vision based on their ability to include the final diagnoses within differential diagnosis lists. Two independent physicians evaluated their accuracy, with a third resolving any discrepancies, ensuring a rigorous and objective analysis. Results: The integration of image data into ChatGPT-4V did not significantly enhance diagnostic accuracy, showing that final diagnoses were included in the top 10 differential diagnosis lists at a rate of 85.1% (n=309), comparable to the rate of 87.9% (n=319) for the text-only version (P=.33). Notably, ChatGPT-4V's performance in correctly identifying the top diagnosis was inferior, at 44.4% (n=161), compared with 55.9% (n=203) for the text-only version (P=.002, χ2 test). Additionally, ChatGPT-4's self-reports showed that image data accounted for 30% of the weight in developing the differential diagnosis lists in more than half of cases. Conclusions: Our findings reveal that currently, ChatGPT-4V predominantly relies on textual data, limiting its ability to fully use the diagnostic potential of visual information. This study underscores the need for further development of multimodal generative AI systems to effectively integrate and use clinical image data. Enhancing the diagnostic performance of such AI systems through improved multimodal data integration could significantly benefit patient care by providing more accurate and comprehensive diagnostic insights. Future research should focus on overcoming these limitations, paving the way for the practical application of advanced AI in medicine. 
UR - https://medinform.jmir.org/2024/1/e55627 UR - http://dx.doi.org/10.2196/55627 UR - http://www.ncbi.nlm.nih.gov/pubmed/38592758 ID - info:doi/10.2196/55627 ER - TY - JOUR AU - Lin, Yu-Ting AU - Deng, Yuan-Xiang AU - Tsai, Chu-Lin AU - Huang, Chien-Hua AU - Fu, Li-Chen PY - 2024/4/1 TI - Interpretable Deep Learning System for Identifying Critical Patients Through the Prediction of Triage Level, Hospitalization, and Length of Stay: Prospective Study JO - JMIR Med Inform SP - e48862 VL - 12 KW - emergency department KW - triage system KW - hospital admission KW - length of stay KW - multimodal integration N2 - Background: Triage is the process of accurately assessing patients' symptoms and providing them with proper clinical treatment in the emergency department (ED). While many countries have developed their triage process to stratify patients' clinical severity and thus distribute medical resources, there are still some limitations of the current triage process. Since the triage level is mainly identified by experienced nurses based on a mix of subjective and objective criteria, mis-triage often occurs in the ED. It can not only cause adverse effects on patients, but also impose an undue burden on the health care delivery system. Objective: Our study aimed to design a prediction system based on triage information, including demographics, vital signs, and chief complaints. The proposed system can not only handle heterogeneous data, including tabular data and free-text data, but also provide interpretability for better acceptance by the ED staff in the hospital. Methods: In this study, we proposed a system comprising 3 subsystems, with each of them handling a single task, including triage level prediction, hospitalization prediction, and length of stay prediction. We used a large amount of retrospective data to pretrain the model, and then, we fine-tuned the model on a prospective data set with a golden label. 
The proposed deep learning framework was built with TabNet and MacBERT (Chinese version of bidirectional encoder representations from transformers [BERT]). Results: The performance of our proposed model was evaluated on data collected from the National Taiwan University Hospital (901 patients were included). The model achieved promising results on the collected data set, with accuracy values of 63%, 82%, and 71% for triage level prediction, hospitalization prediction, and length of stay prediction, respectively. Conclusions: Our system improved the prediction of 3 different medical outcomes when compared with other machine learning methods. With the pretrained vital sign encoder and the MacBERT encoder further pretrained with masked language modeling, our multimodality model can provide a deeper insight into the characteristics of electronic health records. Additionally, by providing interpretability, we believe that the proposed system can assist nursing staff and physicians in making appropriate medical decisions. UR - https://medinform.jmir.org/2024/1/e48862 UR - http://dx.doi.org/10.2196/48862 UR - http://www.ncbi.nlm.nih.gov/pubmed/38557661 ID - info:doi/10.2196/48862 ER - TY - JOUR AU - Hu, Zhao AU - Wang, Min AU - Zheng, Si AU - Xu, Xiaowei AU - Zhang, Zhuxin AU - Ge, Qiaoyue AU - Li, Jiao AU - Yao, Yan PY - 2024/3/26 TI - Clinical Decision Support Requirements for Ventricular Tachycardia Diagnosis Within the Frameworks of Knowledge and Practice: Survey Study JO - JMIR Hum Factors SP - e55802 VL - 11 KW - clinical decision support system KW - requirements analysis KW - ventricular tachycardia KW - knowledge KW - clinical practice KW - questionnaires N2 - Background: Ventricular tachycardia (VT) diagnosis is challenging due to the similarity between VT and some forms of supraventricular tachycardia, complexity of clinical manifestations, heterogeneity of underlying diseases, and potential for life-threatening hemodynamic instability. 
Clinical decision support systems (CDSSs) have emerged as promising tools to augment the diagnostic capabilities of cardiologists. However, a requirements analysis is acknowledged to be vital for the success of a CDSS, especially for complex clinical tasks such as VT diagnosis. Objective: The aims of this study were to analyze the requirements for a VT diagnosis CDSS within the frameworks of knowledge and practice and to determine the clinical decision support (CDS) needs. Methods: Our multidisciplinary team first conducted semistructured interviews with seven cardiologists related to the clinical challenges of VT and expected decision support. A questionnaire was designed by the multidisciplinary team based on the results of interviews. The questionnaire was divided into four sections: demographic information, knowledge assessment, practice assessment, and CDS needs. The practice section consisted of two simulated cases for a total score of 10 marks. Online questionnaires were disseminated to registered cardiologists across China from December 2022 to February 2023. The scores for the practice section were summarized as continuous variables, using the mean, median, and range. The knowledge and CDS needs sections were assessed using a 4-point Likert scale without a neutral option. Kruskal-Wallis tests were performed to investigate the relationship between scores and practice years or specialty. Results: Of the 687 cardiologists who completed the questionnaire, 567 responses were eligible for further analysis. The results of the knowledge assessment showed that 383 cardiologists (68%) lacked knowledge in diagnostic evaluation. The overall average score of the practice assessment was 6.11 (SD 0.55); the etiological diagnosis section had the highest overall scores (mean 6.74, SD 1.75), whereas the diagnostic evaluation section had the lowest scores (mean 5.78, SD 1.19). A majority of cardiologists (344/567, 60.7%) reported the need for a CDSS. 
There was a significant difference in practice competency scores between general cardiologists and arrhythmia specialists (P=.02). Conclusions: There was a notable deficiency in the knowledge and practice of VT among Chinese cardiologists. Specific knowledge and practice support requirements were identified, which provide a foundation for further development and optimization of a CDSS. Moreover, it is important to consider clinicians' specialization levels and years of practice for effective and personalized support. UR - https://humanfactors.jmir.org/2024/1/e55802 UR - http://dx.doi.org/10.2196/55802 UR - http://www.ncbi.nlm.nih.gov/pubmed/38530337 ID - info:doi/10.2196/55802 ER - TY - JOUR AU - Ong, Yuhan Ariel AU - Hogg, Jeffry Henry David AU - Kale, U. Aditya AU - Taribagil, Priyal AU - Kras, Ashley AU - Dow, Eliot AU - Macdonald, Trystan AU - Liu, Xiaoxuan AU - Keane, A. Pearse AU - Denniston, K. Alastair PY - 2024/3/14 TI - AI as a Medical Device for Ophthalmic Imaging in Europe, Australia, and the United States: Protocol for a Systematic Scoping Review of Regulated Devices JO - JMIR Res Protoc SP - e52602 VL - 13 KW - AIaMD KW - artificial intelligence as a medical device KW - artificial intelligence KW - deep learning KW - machine learning KW - ophthalmic imaging KW - regulatory approval N2 - Background: Artificial intelligence as a medical device (AIaMD) has the potential to transform many aspects of ophthalmic care, such as improving accuracy and speed of diagnosis, addressing capacity issues in high-volume areas such as screening, and detecting novel biomarkers of systemic disease in the eye (oculomics). In order to ensure that such tools are safe for the target population and achieve their intended purpose, it is important that these AIaMD have adequate clinical evaluation to support any regulatory decision. 
Currently, the evidential requirements for regulatory approval are less clear for AIaMD compared to more established interventions such as drugs or medical devices. There is therefore value in understanding the level of evidence that underpins AIaMD currently on the market, as a step toward identifying what the best practices might be in this area. In this systematic scoping review, we will focus on AIaMD that contributes to clinical decision-making (relating to screening, diagnosis, prognosis, and treatment) in the context of ophthalmic imaging. Objective: This study aims to identify regulator-approved AIaMD for ophthalmic imaging in Europe, Australia, and the United States; report the characteristics of these devices and their regulatory approvals; and report the available evidence underpinning these AIaMD. Methods: The Food and Drug Administration (United States), the Australian Register of Therapeutic Goods (Australia), the Medicines and Healthcare products Regulatory Agency (United Kingdom), and the European Database on Medical Devices (European Union) regulatory databases will be searched for ophthalmic imaging AIaMD through a snowballing approach. PubMed and clinical trial registries will be systematically searched, and manufacturers will be directly contacted for studies investigating the effectiveness of eligible AIaMD. Preliminary regulatory database searches, evidence searches, screening, data extraction, and methodological quality assessment will be undertaken by 2 independent review authors and arbitrated by a third at each stage of the process. Results: Preliminary searches were conducted in February 2023. Data extraction, data synthesis, and assessment of methodological quality commenced in October 2023. The review is on track to be completed and submitted for peer review by April 2024. 
Conclusions: This systematic review will provide greater clarity on ophthalmic imaging AIaMD that have achieved regulatory approval as well as the evidence that underpins them. This should help adopters understand the range of tools available and whether they can be safely incorporated into their clinical workflow, and it should also support developers in navigating regulatory approval more efficiently. International Registered Report Identifier (IRRID): DERR1-10.2196/52602 UR - https://www.researchprotocols.org/2024/1/e52602 UR - http://dx.doi.org/10.2196/52602 UR - http://www.ncbi.nlm.nih.gov/pubmed/38483456 ID - info:doi/10.2196/52602 ER - TY - JOUR AU - Weber, Isaac AU - Zagona-Prizio, Caterina AU - Sivesind, E. Torunn AU - Adelman, Madeline AU - Szeto, D. Mindy AU - Liu, Ying AU - Sillau, H. Stefan AU - Bainbridge, Jacquelyn AU - Klawitter, Jost AU - Sempio, Cristina AU - Dunnick, A. Cory AU - Leehey, A. Maureen AU - Dellavalle, P. Robert PY - 2024/3/11 TI - Oral Cannabidiol for Seborrheic Dermatitis in Patients With Parkinson Disease: Randomized Clinical Trial JO - JMIR Dermatol SP - e49965 VL - 7 KW - cannabidiol KW - cannabis KW - CBD treatment KW - CBD KW - image KW - photograph KW - photographs KW - imaging KW - sebum KW - clinical trials KW - seborrheic dermatitis KW - Parkinson disease KW - clinical trial KW - RCT KW - randomized KW - controlled trial KW - drug response KW - SEDASI KW - drug KW - Parkinson KW - dermatitis KW - skin KW - dermatology KW - treatment KW - outcome KW - chi-square N2 - Background: Seborrheic dermatitis (SD) affects 18.6%-59% of persons with Parkinson disease (PD), and recent studies provide evidence that oral cannabidiol (CBD) therapy could reduce sebum production in addition to improving motor and psychiatric symptoms in PD. Therefore, oral CBD could be useful for improving symptoms of both commonly co-occurring conditions. 
Objective: This study investigates whether oral CBD therapy is associated with a decrease in SD severity in PD. Methods: Facial photographs were collected as a component of a randomized (1:1 CBD vs placebo), parallel, double-blind, placebo-controlled trial assessing the efficacy of a short-term 2.5 mg per kg per day oral sesame solution CBD-rich cannabis extract (formulated to 100 mg/mL CBD and 3.3 mg/mL THC) for reducing motor symptoms in PD. Participants took 1.25 mg per kg per day each morning for 4 ±1 days and then twice daily for 10 ±4 days. Reviewers analyzed the photographs independently and provided a severity ranking based on the Seborrheic Dermatitis Area and Severity Index (SEDASI) scale. Baseline demographic and disease characteristics, as well as posttreatment SEDASI averages and the presence of SD, were analyzed with 2-tailed t tests and Pearson χ2 tests. SEDASI was analyzed with longitudinal regression, and SD was analyzed with generalized estimating equations. Results: A total of 27 participants received a placebo and 26 received CBD for 16 days. SD severity was low in both groups at baseline, and there was no treatment effect. The risk ratio for patients receiving CBD, post versus pre, was 0.69 (95% CI 0.41-1.18; P=.15), compared to 1.20 (95% CI 0.88-1.65; P=.26) for the patients receiving the placebo. The within-group pre-post change was not statistically significant for either group, but they differed from each other (P=.07) because there was an estimated improvement for the CBD group and an estimated worsening for the placebo group. Conclusions: This study does not provide solid evidence that oral CBD therapy reduces the presence of SD among patients with PD. While this study was sufficiently powered to detect the primary outcome (efficacy of CBD on PD motor symptoms), it was underpowered for the secondary outcomes of detecting changes in the presence and severity of SD. 
Multiple mechanisms exist through which CBD can exert beneficial effects on SD pathogenesis. Larger studies, including participants with increased disease severity and longer treatment periods, may better elucidate treatment effects and are needed to determine CBD's true efficacy for affecting SD severity. Trial Registration: ClinicalTrials.gov NCT03582137; https://clinicaltrials.gov/ct2/show/NCT03582137 UR - https://derma.jmir.org/2024/1/e49965 UR - http://dx.doi.org/10.2196/49965 UR - http://www.ncbi.nlm.nih.gov/pubmed/38466972 ID - info:doi/10.2196/49965 ER - TY - JOUR AU - Chuang, Bo-Sheng Beau AU - Yang, C. Albert PY - 2024/3/11 TI - Optimization of Using Multiple Machine Learning Approaches in Atrial Fibrillation Detection Based on a Large-Scale Data Set of 12-Lead Electrocardiograms: Cross-Sectional Study JO - JMIR Form Res SP - e47803 VL - 8 KW - machine learning KW - atrial fibrillation KW - light gradient boosting machine KW - power spectral density KW - digital health KW - electrocardiogram KW - machine learning algorithm KW - atrial fibrillation detection KW - real-time KW - detection KW - electrocardiography leads KW - clinical outcome N2 - Background: Atrial fibrillation (AF) represents a hazardous cardiac arrhythmia that significantly elevates the risk of stroke and heart failure. Despite its severity, its diagnosis largely relies on the proficiency of health care professionals. At present, the real-time identification of paroxysmal AF is hindered by the lack of automated techniques. Consequently, a highly effective machine learning algorithm specifically designed for AF detection could offer substantial clinical benefits. We hypothesized that machine learning algorithms have the potential to identify and extract features of AF with a high degree of accuracy, given the intricate and distinctive patterns present in electrocardiogram (ECG) recordings of AF. 
Objective: This study aims to develop a clinically valuable machine learning algorithm that can accurately detect AF and compare different leads' performances of AF detection. Methods: We used 12-lead ECG recordings sourced from the 2020 PhysioNet Challenge data sets. The Welch method was used to extract power spectral features of the 12-lead ECGs within a frequency range of 0.083 to 24.92 Hz. Subsequently, various machine learning techniques were evaluated and optimized to classify sinus rhythm (SR) and AF based on these power spectral features. Furthermore, we compared the effects of different frequency subbands and different lead selections on machine learning performances. Results: The light gradient boosting machine (LightGBM) was found to be the most effective in classifying AF and SR, achieving an average F1-score of 0.988 across all ECG leads. Among the frequency subbands, the 0.083 to 4.92 Hz range yielded the highest F1-score of 0.985. In interlead comparisons, aVR had the highest performance (F1=0.993), with minimal differences observed between leads. Conclusions: In conclusion, this study successfully used machine learning methodologies, particularly the LightGBM model, to differentiate SR and AF based on power spectral features derived from 12-lead ECGs. The performance marked by an average F1-score of 0.988 and minimal interlead variation underscores the potential of machine learning algorithms to bolster real-time AF detection. This advancement could significantly improve patient care in intensive care units as well as facilitate remote monitoring through wearable devices, ultimately enhancing clinical outcomes. 
UR - https://formative.jmir.org/2024/1/e47803 UR - http://dx.doi.org/10.2196/47803 UR - http://www.ncbi.nlm.nih.gov/pubmed/38466973 ID - info:doi/10.2196/47803 ER - TY - JOUR AU - Tenda, Daniel Eric AU - Yunus, Eddy Reyhan AU - Zulkarnaen, Benny AU - Yugo, Reynalzi Muhammad AU - Pitoyo, Wicaksono Ceva AU - Asaf, Mazmur Moses AU - Islamiyati, Nur Tiara AU - Pujitresnani, Arierta AU - Setiadharma, Andry AU - Henrina, Joshua AU - Rumende, Martin Cleopas AU - Wulani, Vally AU - Harimurti, Kuntjoro AU - Lydia, Aida AU - Shatri, Hamzah AU - Soewondo, Pradana AU - Yusuf, Astagiri Prasandhya PY - 2024/3/7 TI - Comparison of the Discrimination Performance of AI Scoring and the Brixia Score in Predicting COVID-19 Severity on Chest X-Ray Imaging: Diagnostic Accuracy Study JO - JMIR Form Res SP - e46817 VL - 8 KW - artificial intelligence KW - Brixia KW - chest x-ray KW - COVID-19 KW - CAD4COVID KW - pneumonia KW - radiograph KW - artificial intelligence scoring system KW - AI scoring system KW - prediction KW - disease severity N2 - Background: The artificial intelligence (AI) analysis of chest x-rays can increase the precision of binary COVID-19 diagnosis. However, it is unknown if AI-based chest x-rays can predict who will develop severe COVID-19, especially in low- and middle-income countries. Objective: The study aims to compare the performance of human radiologist Brixia scores versus 2 AI scoring systems in predicting the severity of COVID-19 pneumonia. Methods: We performed a cross-sectional study of 300 patients with suspected or confirmed COVID-19 infection in Jakarta, Indonesia. A total of 2 AI scores were generated using CAD4COVID x-ray software. Results: The AI probability score had slightly lower discrimination (area under the curve [AUC] 0.787, 95% CI 0.722-0.852). The AI score for the affected lung area (AUC 0.857, 95% CI 0.809-0.905) was almost as good as the human Brixia score (AUC 0.863, 95% CI 0.818-0.908). 
Conclusions: The AI score for the affected lung area and the human radiologist Brixia score had similar and good discrimination performance in predicting COVID-19 severity. Our study demonstrated that using AI-based diagnostic tools is possible, even in low-resource settings. However, before it is widely adopted in daily practice, more studies with a larger scale and that are prospective in nature are needed to confirm our findings. UR - https://formative.jmir.org/2024/1/e46817 UR - http://dx.doi.org/10.2196/46817 UR - http://www.ncbi.nlm.nih.gov/pubmed/38451633 ID - info:doi/10.2196/46817 ER - TY - JOUR AU - Athreya, Shreeram AU - Radhachandran, Ashwath AU - Ivezić, Vedrana AU - Sant, R. Vivek AU - Arnold, W. Corey AU - Speier, William PY - 2024/12/17 TI - Enhancing Ultrasound Image Quality Across Disease Domains: Application of Cycle-Consistent Generative Adversarial Network and Perceptual Loss JO - JMIR Biomed Eng SP - e58911 VL - 9 KW - generative networks KW - cycle generative adversarial network KW - image enhancement KW - perceptual loss KW - ultrasound scans KW - ultrasound images KW - imaging KW - machine learning KW - portable handheld devices N2 - Background: Numerous studies have explored image processing techniques aimed at enhancing ultrasound images to narrow the performance gap between low-quality portable devices and high-end ultrasound equipment. These investigations often use registered image pairs created by modifying the same image through methods like downsampling or adding noise, rather than using separate images from different machines. Additionally, they rely on organ-specific features, limiting the models' generalizability across various imaging conditions and devices. The challenge remains to develop a universal framework capable of improving image quality across different devices and conditions, independent of registration or specific organ characteristics. 
Objective: This study aims to develop a robust framework that enhances the quality of ultrasound images, particularly those captured with compact, portable devices, which are often constrained by low quality due to hardware limitations. The framework is designed to effectively process nonregistered ultrasound image pairs, a common challenge in medical imaging, across various clinical settings and device types. By addressing these challenges, the research seeks to provide a more generalized and adaptable solution that can be widely applied across diverse medical scenarios, improving the accessibility and quality of diagnostic imaging. Methods: A retrospective analysis was conducted by using a cycle-consistent generative adversarial network (CycleGAN) framework enhanced with perceptual loss to improve the quality of ultrasound images, focusing on nonregistered image pairs from various organ systems. The perceptual loss was integrated to preserve anatomical integrity by comparing deep features extracted from pretrained neural networks. The model's performance was evaluated against corresponding high-resolution images, ensuring that the enhanced outputs closely mimic those from high-end ultrasound devices. The model was trained and validated using a publicly available, diverse dataset to ensure robustness and generalizability across different imaging scenarios. Results: The advanced CycleGAN framework, enhanced with perceptual loss, significantly outperformed the previous state-of-the-art, stable CycleGAN, in multiple evaluation metrics. Specifically, our method achieved a structural similarity index of 0.2889 versus 0.2502 (P<.001), a peak signal-to-noise ratio of 15.8935 versus 14.9430 (P<.001), and a learned perceptual image patch similarity score of 0.4490 versus 0.5005 (P<.001). These results demonstrate the model's superior ability to enhance image quality while preserving critical anatomical details, thereby improving diagnostic usefulness.
Conclusions: This study presents a significant advancement in ultrasound imaging by leveraging a CycleGAN model enhanced with perceptual loss to bridge the quality gap between images from different devices. By processing nonregistered image pairs, the model not only enhances visual quality but also ensures the preservation of essential anatomical structures, crucial for accurate diagnosis. This approach holds the potential to democratize high-quality ultrasound imaging, making it accessible through low-cost portable devices, thereby improving health care outcomes, particularly in resource-limited settings. Future research will focus on further validation and optimization for clinical use. UR - https://biomedeng.jmir.org/2024/1/e58911 UR - http://dx.doi.org/10.2196/58911 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58911 ER - TY - JOUR AU - Santana, Osório Giulia AU - Couto, Macedo Rodrigo de AU - Loureiro, Maffei Rafael AU - Furriel, Silva Brunna Carolinne Rocha AU - Rother, Terezinha Edna AU - de Paiva, Queiroz Joselisa Péres AU - Correia, Reis Lucas PY - 2023/12/28 TI - Economic Evaluations and Equity in the Use of Artificial Intelligence in Imaging Exams for Medical Diagnosis in People With Skin, Neurological, and Pulmonary Diseases: Protocol for a Systematic Review JO - JMIR Res Protoc SP - e48544 VL - 12 KW - artificial intelligence KW - economic evaluation KW - equity KW - medical diagnosis KW - health care system KW - technology KW - systematic review KW - cost-effectiveness KW - imaging exam KW - intervention N2 - Background: Traditional health care systems face long-standing challenges, including patient diversity, geographical disparities, and financial constraints. The emergence of artificial intelligence (AI) in health care offers solutions to these challenges. AI, a multidisciplinary field, enhances clinical decision-making. However, imbalanced AI models may exacerbate health disparities.
Objective: This systematic review aims to investigate the economic performance and equity impact of AI in diagnostic imaging for skin, neurological, and pulmonary diseases. The research question is "To what extent does the use of AI in imaging exams for diagnosing skin, neurological, and pulmonary diseases result in improved economic outcomes, and does it promote equity in health care systems?" Methods: The study is a systematic review of economic and equity evaluations following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and CHEERS (Consolidated Health Economic Evaluation Reporting Standards) guidelines. Eligibility criteria include articles reporting on economic evaluations or equity considerations related to AI-based diagnostic imaging for specified diseases. Data will be collected from PubMed, Embase, Scopus, Web of Science, and reference lists. Data quality and transferability will be assessed according to CHEC (Consensus on Health Economic Criteria), EPHPP (Effective Public Health Practice Project), and Welte checklists. Results: This systematic review began in March 2023. The literature search identified 9,526 publications and, after full-text screening, 9 publications were included in the study. We plan to submit a manuscript to a peer-reviewed journal once it is finalized, with an expected completion date in January 2024. Conclusions: AI in diagnostic imaging offers potential benefits but also raises concerns about equity and economic impact. Bias in algorithms and disparities in access may hinder equitable outcomes. Evaluating the economic viability of AI applications is essential for resource allocation and affordability. Policy makers and health care stakeholders can benefit from this review's insights to make informed decisions. Limitations, including study variability and publication bias, will be considered in the analysis.
This systematic review will provide valuable insights into the economic and equity implications of AI in diagnostic imaging. It aims to inform evidence-based decision-making and contribute to more efficient and equitable health care systems. International Registered Report Identifier (IRRID): DERR1-10.2196/48544 UR - https://www.researchprotocols.org/2023/1/e48544 UR - http://dx.doi.org/10.2196/48544 UR - http://www.ncbi.nlm.nih.gov/pubmed/38153775 ID - info:doi/10.2196/48544 ER - TY - JOUR AU - Lee, Ra Ah AU - Park, Hojoon AU - Yoo, Aram AU - Kim, Seok AU - Sunwoo, Leonard AU - Yoo, Sooyoung PY - 2023/12/6 TI - Risk Prediction of Emergency Department Visits in Patients With Lung Cancer Using Machine Learning: Retrospective Observational Study JO - JMIR Med Inform SP - e53058 VL - 11 KW - emergency department KW - lung cancer KW - risk prediction KW - machine learning KW - common data model KW - emergency KW - hospitalization KW - hospitalizations KW - lung KW - cancer KW - oncology KW - lungs KW - pulmonary KW - respiratory KW - predict KW - prediction KW - predictions KW - predictive KW - algorithm KW - algorithms KW - risk KW - risks KW - model KW - models N2 - Background: Patients with lung cancer are among the most frequent visitors to emergency departments due to cancer-related problems, and the prognosis for those who seek emergency care is dismal. Given that patients with lung cancer frequently visit health care facilities for treatment or follow-up, the ability to predict emergency department visits based on clinical information gleaned from their routine visits would enhance hospital resource utilization and patient outcomes. Objective: This study proposed a machine learning–based prediction model to identify risk factors for emergency department visits by patients with lung cancer.
Methods: This was a retrospective observational study of patients with lung cancer diagnosed at Seoul National University Bundang Hospital, a tertiary general hospital in South Korea, between January 2010 and December 2017. The primary outcome was an emergency department visit within 30 days of an outpatient visit. This study developed a machine learning–based prediction model using a common data model. In addition, the importance of features that influenced the decision-making of the model output was analyzed to identify significant clinical factors. Results: The model with the best performance demonstrated an area under the receiver operating characteristic curve of 0.73 in its ability to predict the attendance of patients with lung cancer in emergency departments. The frequency of recent visits to the emergency department and several laboratory test results that are typically collected during cancer treatment follow-up visits were revealed as influencing factors for the model output. Conclusions: This study developed a machine learning–based risk prediction model using a common data model and identified influencing factors for emergency department visits by patients with lung cancer. The predictive model contributes to the efficiency of resource utilization and health care service quality by facilitating the identification and early intervention of high-risk patients. This study demonstrated the possibility of collaborative research among different institutions using the common data model for precision medicine in lung cancer.
UR - https://medinform.jmir.org/2023/1/e53058 UR - http://dx.doi.org/10.2196/53058 UR - http://www.ncbi.nlm.nih.gov/pubmed/38055320 ID - info:doi/10.2196/53058 ER - TY - JOUR AU - Yoon, Jeewoo AU - Han, Jinyoung AU - Ko, Junseo AU - Choi, Seong AU - Park, In Ji AU - Hwang, Seo Joon AU - Han, Mo Jeong AU - Hwang, Duck-Jin Daniel PY - 2023/11/29 TI - Developing and Evaluating an AI-Based Computer-Aided Diagnosis System for Retinal Disease: Diagnostic Study for Central Serous Chorioretinopathy JO - J Med Internet Res SP - e48142 VL - 25 KW - computer aided diagnosis KW - ophthalmology KW - deep learning KW - artificial intelligence KW - computer vision KW - imaging informatics KW - retinal disease KW - central serous chorioretinopathy KW - diagnostic study N2 - Background: Although previous research has made substantial progress in developing high-performance artificial intelligence (AI)–based computer-aided diagnosis (AI-CAD) systems in various medical domains, little attention has been paid to developing and evaluating AI-CAD systems in ophthalmology, particularly for diagnosing retinal diseases using optical coherence tomography (OCT) images. Objective: This diagnostic study aimed to determine the usefulness of a proposed AI-CAD system in assisting ophthalmologists with the diagnosis of central serous chorioretinopathy (CSC), which is known to be difficult to diagnose, using OCT images. Methods: For the training and evaluation of the proposed deep learning model, 1693 OCT images were collected and annotated. The data set included 929 and 764 cases of acute and chronic CSC, respectively. In total, 66 ophthalmologists (2 groups: 36 retina and 30 nonretina specialists) participated in the observer performance test. To evaluate the deep learning algorithm used in the proposed AI-CAD system, the training, validation, and test sets were split in an 8:1:1 ratio.
Further, 100 randomly sampled OCT images from the test set were used for the observer performance test, and the participants were instructed to select a CSC subtype for each of these images. Each image was provided under different conditions: (1) without AI assistance, (2) with AI assistance with a probability score, and (3) with AI assistance with a probability score and visual evidence heatmap. The sensitivity, specificity, and area under the receiver operating characteristic curve were used to measure the diagnostic performance of the model and ophthalmologists. Results: The proposed system achieved a high detection performance (area under the curve of 99%) for CSC, outperforming the 66 ophthalmologists who participated in the observer performance test. In both groups, ophthalmologists with the support of AI assistance with a probability score and visual evidence heatmap achieved the highest mean diagnostic performance compared with that of those subjected to other conditions (without AI assistance or with AI assistance with a probability score). Nonretina specialists achieved expert-level diagnostic performance with the support of the proposed AI-CAD system. Conclusions: Our proposed AI-CAD system improved the diagnosis of CSC by ophthalmologists, which may support decision-making regarding retinal disease detection and alleviate the workload of ophthalmologists.
UR - https://www.jmir.org/2023/1/e48142 UR - http://dx.doi.org/10.2196/48142 UR - http://www.ncbi.nlm.nih.gov/pubmed/38019564 ID - info:doi/10.2196/48142 ER - TY - JOUR AU - Abd-alrazaq, Alaa AU - AlSaad, Rawan AU - Harfouche, Manale AU - Aziz, Sarah AU - Ahmed, Arfan AU - Damseh, Rafat AU - Sheikh, Javaid PY - 2023/11/8 TI - Wearable Artificial Intelligence for Detecting Anxiety: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e48754 VL - 25 KW - anxiety KW - artificial intelligence KW - wearable devices KW - machine learning KW - systematic review KW - mobile phone N2 - Background: Anxiety disorders rank among the most prevalent mental disorders worldwide. Anxiety symptoms are typically evaluated using self-assessment surveys or interview-based assessment methods conducted by clinicians, which can be subjective, time-consuming, and challenging to repeat. Therefore, there is an increasing demand for using technologies capable of providing objective and early detection of anxiety. Wearable artificial intelligence (AI), the combination of AI technology and wearable devices, has been widely used to detect and predict anxiety disorders automatically, objectively, and more efficiently. Objective: This systematic review and meta-analysis aims to assess the performance of wearable AI in detecting and predicting anxiety. Methods: Relevant studies were retrieved by searching 8 electronic databases and backward and forward reference list checking. In total, 2 reviewers independently carried out study selection, data extraction, and risk-of-bias assessment. The included studies were assessed for risk of bias using a modified version of the Quality Assessment of Diagnostic Accuracy Studies–Revised. Evidence was synthesized using a narrative (ie, text and tables) and statistical (ie, meta-analysis) approach as appropriate. Results: Of the 918 records identified, 21 (2.3%) were included in this review.
A meta-analysis of results from 81% (17/21) of the studies revealed a pooled mean accuracy of 0.82 (95% CI 0.71-0.89). Meta-analyses of results from 48% (10/21) of the studies showed a pooled mean sensitivity of 0.79 (95% CI 0.57-0.91) and a pooled mean specificity of 0.92 (95% CI 0.68-0.98). Subgroup analyses demonstrated that the performance of wearable AI was not moderated by algorithms, aims of AI, wearable devices used, status of wearable devices, data types, data sources, reference standards, and validation methods. Conclusions: Although wearable AI has the potential to detect anxiety, it is not yet advanced enough for clinical use. Until further evidence shows an ideal performance of wearable AI, it should be used along with other clinical assessments. Wearable device companies need to develop devices that can promptly detect anxiety and identify specific time points during the day when anxiety levels are high. Further research is needed to differentiate types of anxiety, compare the performance of different wearable devices, and investigate the impact of the combination of wearable device data and neuroimaging data on the performance of wearable AI. 
Trial Registration: PROSPERO CRD42023387560; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387560 UR - https://www.jmir.org/2023/1/e48754 UR - http://dx.doi.org/10.2196/48754 UR - http://www.ncbi.nlm.nih.gov/pubmed/37938883 ID - info:doi/10.2196/48754 ER - TY - JOUR AU - Ho, Vy AU - Brown Johnson, Cati AU - Ghanzouri, Ilies AU - Amal, Saeed AU - Asch, Steven AU - Ross, Elsie PY - 2023/11/6 TI - Physician- and Patient-Elicited Barriers and Facilitators to Implementation of a Machine Learning–Based Screening Tool for Peripheral Arterial Disease: Preimplementation Study With Physician and Patient Stakeholders JO - JMIR Cardio SP - e44732 VL - 7 KW - artificial intelligence KW - cardiovascular disease KW - machine learning KW - peripheral arterial disease KW - preimplementation study N2 - Background: Peripheral arterial disease (PAD) is underdiagnosed, partially due to a high prevalence of atypical symptoms and a lack of physician and patient awareness. Implementing clinical decision support tools powered by machine learning algorithms may help physicians identify high-risk patients for diagnostic workup. Objective: This study aims to evaluate barriers and facilitators to the implementation of a novel machine learning–based screening tool for PAD among physician and patient stakeholders using the Consolidated Framework for Implementation Research (CFIR). Methods: We performed semistructured interviews with physicians and patients from the Stanford University Department of Primary Care and Population Health, Division of Cardiology, and Division of Vascular Medicine. Participants answered questions regarding their perceptions toward machine learning and clinical decision support for PAD detection. Rapid thematic analysis was performed using templates incorporating codes from CFIR constructs. Results: A total of 12 physicians (6 primary care physicians and 6 cardiovascular specialists) and 14 patients were interviewed.
Barriers to implementation arose from 6 CFIR constructs: complexity, evidence strength and quality, relative priority, external policies and incentives, knowledge and beliefs about intervention, and individual identification with the organization. Facilitators arose from 5 CFIR constructs: intervention source, relative advantage, learning climate, patient needs and resources, and knowledge and beliefs about intervention. Physicians felt that a machine learning–powered diagnostic tool for PAD would improve patient care but cited limited time and authority in asking patients to undergo additional screening procedures. Patients were interested in having their physicians use this tool but raised concerns about such technologies replacing human decision-making. Conclusions: Patient- and physician-reported barriers toward the implementation of a machine learning–powered PAD diagnostic tool followed 4 interdependent themes: (1) low familiarity or urgency in detecting PAD; (2) concerns regarding the reliability of machine learning; (3) differential perceptions of responsibility for PAD care among primary care versus specialty physicians; and (4) patient preference for physicians to remain primary interpreters of health care data. Facilitators followed 2 interdependent themes: (1) enthusiasm for clinical use of the predictive model and (2) willingness to incorporate machine learning into clinical care. Implementation of machine learning–powered diagnostic tools for PAD should leverage provider support while simultaneously educating stakeholders on the importance of early PAD diagnosis. High predictive validity is necessary for machine learning models but not sufficient for implementation.
UR - https://cardio.jmir.org/2023/1/e44732 UR - http://dx.doi.org/10.2196/44732 UR - http://www.ncbi.nlm.nih.gov/pubmed/37930755 ID - info:doi/10.2196/44732 ER - TY - JOUR AU - Ito, Naoki AU - Kadomatsu, Sakina AU - Fujisawa, Mineto AU - Fukaguchi, Kiyomitsu AU - Ishizawa, Ryo AU - Kanda, Naoki AU - Kasugai, Daisuke AU - Nakajima, Mikio AU - Goto, Tadahiro AU - Tsugawa, Yusuke PY - 2023/11/2 TI - The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study JO - JMIR Med Educ SP - e47532 VL - 9 KW - GPT-4 KW - racial and ethnic bias KW - typical clinical vignettes KW - diagnosis KW - triage KW - artificial intelligence KW - AI KW - race KW - clinical vignettes KW - physician KW - efficiency KW - decision-making KW - bias KW - GPT N2 - Background: Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. Objective: We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. Methods: We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as "correct" or "incorrect." Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes.
Results: The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients' race and ethnicity information was added. The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. Conclusions: GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
UR - https://mededu.jmir.org/2023/1/e47532 UR - http://dx.doi.org/10.2196/47532 UR - http://www.ncbi.nlm.nih.gov/pubmed/37917120 ID - info:doi/10.2196/47532 ER - TY - JOUR AU - de Koning, Enrico AU - van der Haas, Yvette AU - Saguna, Saguna AU - Stoop, Esmee AU - Bosch, Jan AU - Beeres, Saskia AU - Schalij, Martin AU - Boogers, Mark PY - 2023/10/31 TI - AI Algorithm to Predict Acute Coronary Syndrome in Prehospital Cardiac Care: Retrospective Cohort Study JO - JMIR Cardio SP - e51375 VL - 7 KW - cardiology KW - acute coronary syndrome KW - Hollands Midden Acute Regional Triage–cardiology KW - prehospital KW - triage KW - artificial intelligence KW - natural language processing KW - angina KW - algorithm KW - overcrowding KW - emergency department KW - clinical decision-making KW - emergency medical service KW - paramedics N2 - Background: Overcrowding of hospitals and emergency departments (EDs) is a growing problem. However, not all ED consultations are necessary. For example, 80% of patients in the ED with chest pain do not have an acute coronary syndrome (ACS). Artificial intelligence (AI) is useful in analyzing (medical) data, and might aid health care workers in prehospital clinical decision-making before patients are presented to the hospital. Objective: The aim of this study was to develop an AI model which would be able to predict ACS before patients visit the ED. The model retrospectively analyzed prehospital data acquired by emergency medical services' nurse paramedics. Methods: Patients presenting to the emergency medical services with symptoms suggestive of ACS between September 2018 and September 2020 were included. An AI model using a supervised text classification algorithm was developed to analyze data. Data were analyzed for all 7458 patients (mean 68, SD 15 years, 54% men). Specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for control and intervention groups.
First, a machine learning (ML) algorithm was chosen; next, the required features were selected; the model was then tested and improved through iterative evaluation and, in a further step, hyperparameter tuning. Finally, a method was selected to explain the final AI model. Results: The AI model had a specificity of 11% and a sensitivity of 99.5%, whereas usual care had a specificity of 1% and a sensitivity of 99.5%. The PPV of the AI model was 15% and the NPV was 99%. The PPV of usual care was 13% and the NPV was 94%. Conclusions: The AI model was able to predict ACS based on retrospective data from the prehospital setting. It led to an increase in specificity (from 1% to 11%) and NPV (from 94% to 99%) when compared to usual care, with a similar sensitivity. Due to the retrospective nature of this study and the singular focus on ACS, it should be seen as a proof-of-concept. Other (possibly life-threatening) diagnoses were not analyzed. Future prospective validation is necessary before implementation.
UR - https://cardio.jmir.org/2023/1/e51375 UR - http://dx.doi.org/10.2196/51375 UR - http://www.ncbi.nlm.nih.gov/pubmed/37906226 ID - info:doi/10.2196/51375 ER - TY - JOUR AU - Gong, Jeong Eun AU - Bang, Seok Chang AU - Lee, Jun Jae AU - Jeong, Min Hae AU - Baik, Ho Gwang AU - Jeong, Hoon Jae AU - Dick, Sigmund AU - Lee, Hun Gi PY - 2023/10/30 TI - Clinical Decision Support System for All Stages of Gastric Carcinogenesis in Real-Time Endoscopy: Model Establishment and Validation Study JO - J Med Internet Res SP - e50448 VL - 25 KW - atrophy KW - intestinal metaplasia KW - metaplasia KW - deep learning KW - endoscopy KW - gastric neoplasms KW - neoplasm KW - neoplasms KW - internal medicine KW - cancer KW - oncology KW - decision support KW - real time KW - gastrointestinal KW - gastric KW - intestinal KW - machine learning KW - clinical decision support system KW - CDSS KW - computer aided KW - diagnosis KW - diagnostic KW - carcinogenesis N2 - Background: Our research group previously established a deep learning–based clinical decision support system (CDSS) for real-time endoscopy-based detection and classification of gastric neoplasms. However, preneoplastic conditions, such as atrophy and intestinal metaplasia (IM), were not taken into account, and there is no established model that classifies all stages of gastric carcinogenesis. Objective: This study aims to build and validate a CDSS for real-time endoscopy for all stages of gastric carcinogenesis, including atrophy and IM. Methods: A total of 11,868 endoscopic images were used for training and internal testing. The primary outcomes were lesion classification accuracy (6 classes: advanced gastric cancer, early gastric cancer, dysplasia, atrophy, IM, and normal) and atrophy and IM lesion segmentation rates for the segmentation model.
The following tests were carried out to validate the performance of lesion classification accuracy: (1) external testing using 1282 images from another institution and (2) evaluation of the classification accuracy of atrophy and IM in real-world procedures in a prospective manner. To estimate the clinical utility, 2 experienced endoscopists were invited to perform a blind test with the same data set. A CDSS was constructed by combining the established 6-class lesion classification model and the preneoplastic lesion segmentation model with the previously established lesion detection model. Results: The overall lesion classification accuracy (95% CI) was 90.3% (89%-91.6%) in the internal test. For the performance validation, the CDSS achieved 85.3% (83.4%-97.2%) overall accuracy. The per-class external test accuracies for atrophy and IM were 95.3% (92.6%-98%) and 89.3% (85.4%-93.2%), respectively. CDSS-assisted endoscopy showed an accuracy of 92.1% (88.8%-95.4%) for atrophy and 95.5% (92%-99%) for IM in the real-world application of 522 consecutive screening endoscopies. There was no significant difference in the overall accuracy between the invited endoscopists and established CDSS in the prospective real-clinic evaluation (P=.23). The CDSS demonstrated a segmentation rate of 93.4% (95% CI 92.4%-94.4%) for atrophy or IM lesion segmentation in the internal testing. Conclusions: The CDSS achieved high performance in terms of computer-aided diagnosis of all stages of gastric carcinogenesis and demonstrated real-world application potential. 
UR - https://www.jmir.org/2023/1/e50448 UR - http://dx.doi.org/10.2196/50448 UR - http://www.ncbi.nlm.nih.gov/pubmed/37902818 ID - info:doi/10.2196/50448 ER - TY - JOUR AU - Lei, Mingxing AU - Wu, Bing AU - Zhang, Zhicheng AU - Qin, Yong AU - Cao, Xuyong AU - Cao, Yuncen AU - Liu, Baoge AU - Su, Xiuyun AU - Liu, Yaosheng PY - 2023/10/23 TI - A Web-Based Calculator to Predict Early Death Among Patients With Bone Metastasis Using Machine Learning Techniques: Development and Validation Study JO - J Med Internet Res SP - e47590 VL - 25 KW - bone metastasis KW - early death KW - machine learning KW - prediction model KW - local interpretable model-agnostic explanation N2 - Background: Patients with bone metastasis often experience a significantly limited survival time, and a life expectancy of <3 months is generally regarded as a contraindication for extensive invasive surgeries. In this context, the accurate prediction of survival becomes very important since it serves as a crucial guide in making clinical decisions. Objective: This study aimed to develop a machine learning–based web calculator that can provide an accurate assessment of the likelihood of early death among patients with bone metastasis. Methods: This study analyzed a large cohort of 118,227 patients diagnosed with bone metastasis between 2010 and 2019 using the data obtained from a national cancer database. The entire cohort of patients was randomly split 9:1 into a training group (n=106,492) and a validation group (n=11,735). Six approaches were implemented in this study: logistic regression, extreme gradient boosting machine, decision tree, random forest, neural network, and gradient boosting machine. The performance of these approaches was evaluated using 11 measures, and each approach was ranked based on its performance in each measure. Patients (n=332) from a teaching hospital were used as the external validation group, and external validation was performed using the optimal model.
Results: In the entire cohort, a substantial proportion of patients (43,305/118,227, 36.63%) experienced early death. Among the different approaches evaluated, the gradient boosting machine exhibited the highest score of prediction performance (54 points), followed by the neural network (52 points) and extreme gradient boosting machine (50 points). The gradient boosting machine demonstrated a favorable discrimination ability, with an area under the curve of 0.858 (95% CI 0.851-0.865). In addition, the calibration slope was 1.02, and the intercept-in-large value was −0.02, indicating good calibration of the model. Patients were divided into 2 risk groups using a threshold of 37% based on the gradient boosting machine. Patients in the high-risk group (3105/4315, 71.96%) were found to be 4.5 times more likely to experience early death compared with those in the low-risk group (1159/7420, 15.62%). External validation of the model demonstrated a high area under the curve of 0.847 (95% CI 0.798-0.895), indicating its robust performance. The model developed by the gradient boosting machine has been deployed on the internet as a calculator. Conclusions: This study develops a machine learning–based calculator to assess the probability of early death among patients with bone metastasis. The calculator has the potential to guide clinical decision-making and improve the care of patients with bone metastasis by identifying those at a higher risk of early death.
UR - https://www.jmir.org/2023/1/e47590 UR - http://dx.doi.org/10.2196/47590 UR - http://www.ncbi.nlm.nih.gov/pubmed/37870889 ID - info:doi/10.2196/47590 ER - TY - JOUR AU - Kim, Young Se AU - Park, Jinseok AU - Choi, Hojin AU - Loeser, Martin AU - Ryu, Hokyoung AU - Seo, Kyoungwon PY - 2023/10/20 TI - Digital Marker for Early Screening of Mild Cognitive Impairment Through Hand and Eye Movement Analysis in Virtual Reality Using Machine Learning: First Validation Study JO - J Med Internet Res SP - e48093 VL - 25 KW - Alzheimer disease KW - biomarkers KW - dementia KW - digital markers KW - eye movement KW - hand movement KW - machine learning KW - mild cognitive impairment KW - screening KW - virtual reality N2 - Background: With the global rise in Alzheimer disease (AD), early screening for mild cognitive impairment (MCI), which is a preclinical stage of AD, is of paramount importance. Although biomarkers such as cerebrospinal fluid amyloid level and magnetic resonance imaging have been studied, they have limitations, such as high cost and invasiveness. Digital markers to assess cognitive impairment by analyzing behavioral data collected from digital devices in daily life can be a new alternative. In this context, we developed a "virtual kiosk test" for early screening of MCI by analyzing behavioral data collected when using a kiosk in a virtual environment. Objective: We aimed to investigate key behavioral features collected from a virtual kiosk test that could distinguish patients with MCI from healthy controls with high statistical significance. Also, we focused on developing a machine learning model capable of early screening of MCI based on these behavioral features. Methods: A total of 51 participants comprising 20 healthy controls and 31 patients with MCI were recruited by 2 neurologists from a university hospital. The participants performed a virtual kiosk test, developed by our group, where we recorded various behavioral data such as hand and eye movements.
Based on these time series data, we computed the following 4 behavioral features: hand movement speed, proportion of fixation duration, time to completion, and the number of errors. To compare these behavioral features between healthy controls and patients with MCI, independent-samples 2-tailed t tests were used. Additionally, we used these behavioral features to train and validate a machine learning model for early screening of patients with MCI from healthy controls. Results: In the virtual kiosk test, all 4 behavioral features showed statistically significant differences between patients with MCI and healthy controls. Compared with healthy controls, patients with MCI had slower hand movement speed (t49=3.45; P=.004), lower proportion of fixation duration (t49=2.69; P=.04), longer time to completion (t49=−3.44; P=.004), and a greater number of errors (t49=−3.77; P=.001). All 4 features were then used to train a support vector machine to distinguish between healthy controls and patients with MCI. Our machine learning model achieved 93.3% accuracy, 100% sensitivity, 83.3% specificity, 90% precision, and 94.7% F1-score. Conclusions: Our research preliminarily suggests that analyzing hand and eye movements in the virtual kiosk test holds potential as a digital marker for early screening of MCI. In contrast to conventional biomarkers, this digital marker in virtual reality is advantageous as it can collect ecologically valid data at an affordable cost and in a short period (5-15 minutes), making it a suitable means for early screening of MCI. We call for further studies to confirm the reliability and validity of this approach. UR - https://www.jmir.org/2023/1/e48093 UR - http://dx.doi.org/10.2196/48093 UR - http://www.ncbi.nlm.nih.gov/pubmed/37862101 ID - info:doi/10.2196/48093 ER - TY - JOUR AU - Velazquez-Diaz, Daniel AU - Arco, E. Juan AU - Ortiz, Andres AU - Pérez-Cabezas, Verónica AU - Lucena-Anton, David AU - Moral-Munoz, A.
Jose AU - Galán-Mercant, Alejandro PY - 2023/10/20 TI - Use of Artificial Intelligence in the Identification and Diagnosis of Frailty Syndrome in Older Adults: Scoping Review JO - J Med Internet Res SP - e47346 VL - 25 KW - frail older adult KW - identification KW - diagnosis KW - artificial intelligence KW - review KW - frailty KW - older adults KW - aging KW - biological variability KW - detection KW - accuracy KW - sensitivity KW - screening KW - tool N2 - Background: Frailty syndrome (FS) is one of the most common noncommunicable diseases, which is associated with lower physical and mental capacities in older adults. FS diagnosis is mostly focused on biological variables; however, it is likely that this diagnosis could fail owing to the high biological variability in this syndrome. Therefore, artificial intelligence (AI) could be a potential strategy to identify and diagnose this complex and multifactorial geriatric syndrome. Objective: The objective of this scoping review was to analyze the existing scientific evidence on the use of AI for the identification and diagnosis of FS in older adults, as well as to identify which model provides enhanced accuracy, sensitivity, specificity, and area under the curve (AUC). Methods: A search was conducted using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines on various databases: PubMed, Web of Science, Scopus, and Google Scholar. The search strategy followed Population/Problem, Intervention, Comparison, and Outcome (PICO) criteria with the population being older adults; intervention being AI; comparison being compared or not to other diagnostic methods; and outcome being FS with reported sensitivity, specificity, accuracy, or AUC values. The results were synthesized through information extraction and are presented in tables. 
Results: We identified 26 studies that met the inclusion criteria, 6 of which had a data set over 2000 and 3 with data sets below 100. Machine learning was the most widely used type of AI, employed in 18 studies. Moreover, of the 26 included studies, 9 used clinical data, with clinical histories being the most frequently used data type in this category. The remaining 17 studies used nonclinical data, most frequently involving activity monitoring using an inertial sensor in clinical and nonclinical contexts. Regarding the performance of each AI model, 10 studies achieved a value of precision, sensitivity, specificity, or AUC ≥90. Conclusions: The findings of this scoping review clarify the overall status of recent studies using AI to identify and diagnose FS. Moreover, the findings show that the combined use of AI using clinical data along with nonclinical information such as the kinematics of inertial sensors that monitor activities in a nonclinical context could be an appropriate tool for the identification and diagnosis of FS. Nevertheless, some possible limitations of the evidence included in the review could be small sample sizes, heterogeneity of study designs, and lack of standardization in the AI models and diagnostic criteria used across studies. Future research is needed to validate AI systems with diverse data sources for diagnosing FS. AI should be used as a decision support tool for identifying FS, with data quality and privacy addressed, and the tool should be regularly monitored for performance after being integrated in clinical practice.
UR - https://www.jmir.org/2023/1/e47346 UR - http://dx.doi.org/10.2196/47346 UR - http://www.ncbi.nlm.nih.gov/pubmed/37862082 ID - info:doi/10.2196/47346 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Kawamura, Ren AU - Harada, Yukinori AU - Mizuta, Kazuya AU - Tokumasu, Kazuki AU - Kaji, Yuki AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2023/10/9 TI - ChatGPT-Generated Differential Diagnosis Lists for Complex Case-Derived Clinical Vignettes: Diagnostic Accuracy Evaluation JO - JMIR Med Inform SP - e48808 VL - 11 KW - artificial intelligence KW - AI chatbot KW - ChatGPT KW - large language models KW - clinical decision support KW - natural language processing KW - diagnostic excellence KW - language model KW - vignette KW - case study KW - diagnostic KW - accuracy KW - decision support KW - diagnosis N2 - Background: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown. Objective: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan. Methods: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and converted them into clinical vignettes. Physicians entered the text of the clinical vignettes into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes.
We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and the top diagnosis. Results: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and within the top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43) although the difference was not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022). Conclusions: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
UR - https://medinform.jmir.org/2023/1/e48808 UR - http://dx.doi.org/10.2196/48808 UR - http://www.ncbi.nlm.nih.gov/pubmed/37812468 ID - info:doi/10.2196/48808 ER - TY - JOUR AU - Parab, Shubham AU - Boster, Jerry AU - Washington, Peter PY - 2023/9/29 TI - Parkinson Disease Recognition Using a Gamified Website: Machine Learning Development and Usability Study JO - JMIR Form Res SP - e49898 VL - 7 KW - Parkinson disease KW - digital health KW - machine learning KW - remote screening KW - accessible screening N2 - Background: Parkinson disease (PD) affects millions globally, causing motor function impairments. Early detection is vital, and diverse data sources aid diagnosis. We focus on lower arm movements during keyboard and trackpad or touchscreen interactions, which serve as reliable indicators of PD. Previous works explore keyboard tapping and unstructured device monitoring; we attempt to further these works with structured tests taking into account 2D hand movement in addition to finger tapping. Our feasibility study uses keystroke and mouse movement data from a remotely conducted, structured, web-based test combined with self-reported PD status to create a predictive model for detecting the presence of PD. Objective: Analysis of finger tapping speed and accuracy through keyboard input and analysis of 2D hand movement through mouse input allowed differentiation between participants with and without PD. This comparative analysis enables us to establish clear distinctions between the two groups and explore the feasibility of using motor behavior to predict the presence of the disease. Methods: Participants were recruited via email by the Hawaii Parkinson Association (HPA) and directed to a web application for the tests. The 2023 HPA symposium was also used as a forum to recruit participants and spread information about our study. The application recorded participant demographics, including age, gender, and race, as well as PD status. 
We conducted a series of tests to assess finger tapping, using on-screen prompts to request key presses of constant and random keys. Response times, accuracy, and unintended movements resulting in accidental presses were recorded. Participants performed a hand movement test consisting of tracing straight and curved on-screen ribbons using a trackpad or mouse, allowing us to evaluate stability and precision of 2D hand movement. From this tracing, the test collected and stored insights concerning lower arm motor movement. Results: Our formative study included 31 participants, 18 without PD and 13 with PD, and analyzed their lower limb movement data collected from keyboards and computer mice. From the data set, we extracted 28 features and evaluated their significances using an extra tree classifier predictor. A random forest model was trained using the 6 most important features identified by the predictor. These selected features provided insights into precision and movement speed derived from keyboard tapping and mouse tracing tests. This final model achieved an average F1-score of 0.7311 (SD 0.1663) and an average accuracy of 0.7429 (SD 0.1400) over 20 runs for predicting the presence of PD. Conclusions: This preliminary feasibility study suggests the possibility of using technology-based limb movement data to predict the presence of PD, demonstrating the practicality of implementing this approach in a cost-effective and accessible manner. In addition, this study demonstrates that structured mouse movement tests can be used in combination with finger tapping to detect PD. 
UR - https://formative.jmir.org/2023/1/e49898 UR - http://dx.doi.org/10.2196/49898 UR - http://www.ncbi.nlm.nih.gov/pubmed/37773607 ID - info:doi/10.2196/49898 ER - TY - JOUR AU - Deng, Yuhan AU - Ma, Yuan AU - Fu, Jingzhu AU - Wang, Xiaona AU - Yu, Canqing AU - Lv, Jun AU - Man, Sailimai AU - Wang, Bo AU - Li, Liming PY - 2023/9/7 TI - Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study JO - JMIR Public Health Surveill SP - e47095 VL - 9 KW - machine learning KW - carotid plaque KW - health check-up KW - prediction KW - fatty liver KW - risk assessment KW - risk stratification KW - cardiovascular KW - logistic regression N2 - Background: Carotid plaque can progress into stroke, myocardial infarction, etc, which are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, unlike the high detection rate of fatty liver disease, screening for carotid plaque in the asymptomatic population is not yet prevalent due to cost-effectiveness reasons, resulting in a large number of patients with undetected carotid plaques, especially among those with fatty liver disease. Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model among the population with fatty liver disease to identify individuals at risk of carotid plaque. Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest, elastic net (EN), and extreme gradient boosting ML algorithms to select important features from potential predictors. Features acknowledged by all 3 models were enrolled in logistic regression analysis to develop a carotid plaque prediction model. 
Model performance was evaluated based on the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis both in a randomly split internal validation data set, and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set. Results: Among the participants, 26.23% (1,421,970/5,420,640) were diagnosed with carotid plaque in the development data set, and 21.64% (7074/32,682) were diagnosed in the external validation data set. A total of 6 features, including age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI) were collectively selected by all 3 ML models out of 27 predictors. After eliminating the issue of collinearity between features, the logistic regression model established with the 5 independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and showed good calibration capability graphically. Its predictive performance was comprehensively competitive compared with the single use of either logistic regression or ML algorithms. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque. Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model, and was of great public health implications in the early identification and risk assessment of carotid plaque among individuals with fatty liver. 
UR - https://publichealth.jmir.org/2023/1/e47095 UR - http://dx.doi.org/10.2196/47095 UR - http://www.ncbi.nlm.nih.gov/pubmed/37676713 ID - info:doi/10.2196/47095 ER - TY - JOUR AU - Nagino, Ken AU - Okumura, Yuichi AU - Akasaki, Yasutsugu AU - Fujio, Kenta AU - Huang, Tianxiang AU - Sung, Jaemyoung AU - Midorikawa-Inomata, Akie AU - Fujimoto, Keiichi AU - Eguchi, Atsuko AU - Hurramhon, Shokirova AU - Yee, Alan AU - Miura, Maria AU - Ohno, Mizu AU - Hirosawa, Kunihiko AU - Morooka, Yuki AU - Murakami, Akira AU - Kobayashi, Hiroyuki AU - Inomata, Takenori PY - 2023/8/3 TI - Smartphone App-Based and Paper-Based Patient-Reported Outcomes Using a Disease-Specific Questionnaire for Dry Eye Disease: Randomized Crossover Equivalence Study JO - J Med Internet Res SP - e42638 VL - 25 KW - dry eye syndrome KW - mobile app KW - equivalence trial KW - Ocular Surface Disease Index KW - patient-reported outcome measures KW - mobile health KW - reliability KW - validity KW - telemedicine KW - precision medicine N2 - Background: Using traditional patient-reported outcomes (PROs), such as paper-based questionnaires, is cumbersome in the era of web-based medical consultation and telemedicine. Electronic PROs may reduce the burden on patients if implemented widely. Considering promising reports of DryEyeRhythm, our in-house mHealth smartphone app for investigating dry eye disease (DED), the electronic and paper-based Ocular Surface Disease Index (OSDI) should be evaluated and compared to determine their equivalency. Objective: The purpose of this study is to assess the equivalence between smartphone app-based and paper-based questionnaires for DED. Methods: This prospective, nonblinded, randomized crossover study enrolled 34 participants between April 2022 and June 2022 at a university hospital in Japan. The participants were allocated randomly into 2 groups in a 1:1 ratio.
The paper-app group initially responded to the paper-based Japanese version of the OSDI (J-OSDI), followed by the app-based J-OSDI. The app-paper group responded to similar questionnaires but in reverse order. We performed an equivalence test based on minimal clinically important differences to assess the equivalence of the J-OSDI total scores between the 2 platforms (paper-based vs app-based). A 95% CI of the mean difference between the J-OSDI total scores within the ±7.0 range between the 2 platforms indicated equivalence. The internal consistency and agreement of the app-based J-OSDI were assessed with Cronbach α coefficients and intraclass correlation coefficient values. Results: A total of 33 participants were included in this study. The total scores for the app- and paper-based J-OSDI indicated satisfactory equivalence per our study definition (mean difference 1.8, 95% CI −1.4 to 5.0). Moreover, the app-based J-OSDI total score demonstrated good internal consistency and agreement (Cronbach α=.958; intraclass correlation=0.919; 95% CI 0.842 to 0.959) and was significantly correlated with its paper-based counterpart (Pearson correlation=0.932, P<.001). Conclusions: This study demonstrated the equivalence of PROs between the app- and paper-based J-OSDI. Implementing the app-based J-OSDI in various scenarios, including telehealth, may have implications for the early diagnosis of DED and longitudinal monitoring of PROs.
UR - https://www.jmir.org/2023/1/e42638 UR - http://dx.doi.org/10.2196/42638 UR - http://www.ncbi.nlm.nih.gov/pubmed/37535409 ID - info:doi/10.2196/42638 ER - TY - JOUR AU - Harada, Yukinori AU - Tomiyama, Shusaku AU - Sakamoto, Tetsu AU - Sugimoto, Shu AU - Kawamura, Ren AU - Yokose, Masashi AU - Hayashi, Arisa AU - Shimizu, Taro PY - 2023/8/2 TI - Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence-Driven Automated History-Taking System: Pilot Cross-Sectional Study JO - JMIR Form Res SP - e49034 VL - 7 KW - collective intelligence KW - differential diagnosis generator KW - diagnostic accuracy KW - automated medical history taking system KW - artificial intelligence KW - AI N2 - Background: Low diagnostic accuracy is a major concern in automated medical history-taking systems with differential diagnosis (DDx) generators. Extending the concept of collective intelligence to the field of DDx generators such that the accuracy of judgment becomes higher when accepting an integrated diagnosis list from multiple people than when accepting a diagnosis list from a single person may be a possible solution. Objective: The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. Methods: We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)-driven automated medical history-taking system from 103 patients with confirmed diagnoses. Two research physicians independently created the other top 10 DDx lists (second and third DDx lists) per case by inputting key information into the other 2 DDx generators based on the medical history generated by the automated medical history-taking system without reading the index lists generated by the automated medical history-taking system.
We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to the three types of combined DDx lists: (1) simply combining DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists with only shared diagnoses among DDx lists from the index, second, and third lists. We treated the data generated by 2 research physicians from the same patient as independent cases. Therefore, the number of cases included in analyses in the case using 2 additional lists was 206 (103 cases × 2 physicians' input). Results: The diagnostic accuracy of the index lists was 46% (47/103). Diagnostic accuracy was improved by simply combining the other 2 DDx lists (133/206, 65%, P<.001), whereas the other 2 combined DDx lists did not improve the diagnostic accuracy of the DDx lists (106/206, 52%, P=.05 in the collective list with the 1/n weighting rule and 29/206, 14%, P<.001 in the only shared diagnoses among the 3 DDx lists). Conclusions: Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20%, suggesting that the combinational use of DDx generators early in the diagnostic process is beneficial. UR - https://formative.jmir.org/2023/1/e49034 UR - http://dx.doi.org/10.2196/49034 UR - http://www.ncbi.nlm.nih.gov/pubmed/37531164 ID - info:doi/10.2196/49034 ER - TY - JOUR AU - Hansun, Seng AU - Argha, Ahmadreza AU - Liaw, Siaw-Teng AU - Celler, G. Branko AU - Marks, B.
Guy PY - 2023/7/3 TI - Machine and Deep Learning for Tuberculosis Detection on Chest X-Rays: Systematic Literature Review JO - J Med Internet Res SP - e43154 VL - 25 KW - chest x-rays KW - convolutional neural networks KW - diagnostic test accuracy KW - machine and deep learning KW - PRISMA guidelines KW - risk of bias KW - QUADAS-2 KW - sensitivity and specificity KW - systematic literature review KW - tuberculosis detection N2 - Background: Tuberculosis (TB) was the leading infectious cause of mortality globally prior to COVID-19, and chest radiography has an important role in the detection, and subsequent diagnosis, of patients with this disease. Conventional expert reading has substantial within- and between-observer variability, indicating poor reliability of human readers. Substantial efforts have been made in utilizing various artificial intelligence-based algorithms to address the limitations of human reading of chest radiographs for diagnosing TB. Objective: This systematic literature review (SLR) aims to assess the performance of machine learning (ML) and deep learning (DL) in the detection of TB using chest radiography (chest x-ray [CXR]). Methods: In conducting and reporting the SLR, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A total of 309 records were identified from Scopus, PubMed, and IEEE (Institute of Electrical and Electronics Engineers) databases. We independently screened, reviewed, and assessed all available records and included 47 studies that met the inclusion criteria in this SLR. We also performed the risk of bias assessment using Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) and meta-analysis of 10 included studies that provided confusion matrix results. Results: Various CXR data sets have been used in the included studies, with 2 of the most popular ones being Montgomery County (n=29) and Shenzhen (n=36) data sets.
DL (n=34) was more commonly used than ML (n=7) in the included studies. Most studies used a human radiologist's report as the reference standard. Support vector machine (n=5), k-nearest neighbors (n=3), and random forest (n=2) were the most popular ML approaches. Meanwhile, convolutional neural networks were the most commonly used DL techniques, with the 4 most popular applications being ResNet-50 (n=11), VGG-16 (n=8), VGG-19 (n=7), and AlexNet (n=6). Four performance metrics were popularly used, namely, accuracy (n=35), area under the curve (AUC; n=34), sensitivity (n=27), and specificity (n=23). In terms of the performance results, ML showed higher accuracy (mean ~93.71%) and sensitivity (mean ~92.55%), while on average DL models achieved better AUC (mean ~92.12%) and specificity (mean ~91.54%). Based on data from 10 studies that provided confusion matrix results, we estimated the pooled sensitivity and specificity of ML and DL methods to be 0.9857 (95% CI 0.9477-1.00) and 0.9805 (95% CI 0.9255-1.00), respectively. From the risk of bias assessment, 17 studies were regarded as having unclear risks for the reference standard aspect and 6 studies were regarded as having unclear risks for the flow and timing aspect. Only 2 included studies had built applications based on the proposed solutions. Conclusions: Findings from this SLR confirm the high potential of both ML and DL for TB detection using CXR. Future studies need to pay close attention to 2 aspects of risk of bias, namely, the reference standard and the flow and timing aspects. Trial Registration: PROSPERO CRD42021277155; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=277155 UR - https://www.jmir.org/2023/1/e43154 UR - http://dx.doi.org/10.2196/43154 UR - http://www.ncbi.nlm.nih.gov/pubmed/37399055 ID - info:doi/10.2196/43154 ER - TY - JOUR AU - Selder, L. Jasper AU - Te Kolste, Jan Henryk AU - Twisk, Jos AU - Schijven, Marlies AU - Gielen, Willem AU - Allaart, P.
Cornelis PY - 2023/5/26 TI - Accuracy of a Standalone Atrial Fibrillation Detection Algorithm Added to a Popular Wristband and Smartwatch: Prospective Diagnostic Accuracy Study JO - J Med Internet Res SP - e44642 VL - 25 KW - smartwatch KW - atrial fibrillation KW - algorithm KW - fibrillation detection KW - wristband KW - diagnose KW - heart rhythm KW - cardioversion KW - environment KW - software algorithm KW - artificial intelligence KW - AI KW - electrocardiography KW - ECG KW - EKG N2 - Background: Silent paroxysmal atrial fibrillation (AF) may be difficult to diagnose, and AF burden is hard to establish. In contrast to conventional diagnostic devices, photoplethysmography (PPG)-driven smartwatches or wristbands allow for long-term continuous heart rhythm assessment. However, most smartwatches lack an integrated PPG-AF algorithm. Adding a standalone PPG-AF algorithm to these wrist devices might open new possibilities for AF screening and burden assessment. Objective: The aim of this study was to assess the accuracy of a well-known standalone PPG-AF detection algorithm added to a popular wristband and smartwatch, with regard to discriminating AF and sinus rhythm, in a group of patients with AF before and after cardioversion (CV). Methods: Consecutive consenting patients with AF admitted for CV in a large academic hospital in Amsterdam, the Netherlands, were asked to wear a Biostrap wristband or Fitbit Ionic smartwatch with Fibricheck algorithm add-on surrounding the procedure. A set of 1-min PPG measurements and 12-lead reference electrocardiograms was obtained before and after CV. Rhythm assessment by the PPG device-software combination was compared with the 12-lead electrocardiogram. Results: A total of 78 patients were included in the Biostrap-Fibricheck cohort (156 measurement sets) and 73 patients in the Fitbit-Fibricheck cohort (143 measurement sets).
Of the measurement sets, 19/156 (12%) and 7/143 (5%), respectively, were not classifiable by the PPG algorithm due to bad quality. The diagnostic performance in terms of sensitivity, specificity, positive predictive value, negative predictive value, and accuracy was 98%, 96%, 96%, 99%, and 97% in the Biostrap-Fibricheck cohort and 97%, 100%, 100%, 97%, and 99% in the Fitbit-Fibricheck cohort, respectively, at an AF prevalence of ~50%. Conclusions: This study demonstrates that the addition of a well-known standalone PPG-AF detection algorithm to a popular PPG smartwatch and wristband without integrated algorithm yields a high accuracy for the detection of AF, with an acceptable unclassifiable rate, in a semicontrolled environment. UR - https://www.jmir.org/2023/1/e44642 UR - http://dx.doi.org/10.2196/44642 UR - http://www.ncbi.nlm.nih.gov/pubmed/37234033 ID - info:doi/10.2196/44642 ER - TY - JOUR AU - Constantin, Aurora AU - Atkinson, Malcolm AU - Bernabeu, Oscar Miguel AU - Buckmaster, Fiona AU - Dhillon, Baljean AU - McTrusty, Alice AU - Strang, Niall AU - Williams, Robin PY - 2023/5/25 TI - Optometrists' Perspectives Regarding Artificial Intelligence Aids and Contributing Retinal Images to a Repository: Web-Based Interview Study JO - JMIR Hum Factors SP - e40887 VL - 10 KW - AI in optometry KW - repository of ocular images KW - user studies KW - AI decision support tools KW - perspectives of optometrists and ophthalmologists KW - AI KW - research KW - medical KW - decision support KW - tool KW - digital tool KW - digital N2 - Background: A repository of retinal images for research is being established in Scotland. It will permit researchers to validate, tune, and refine artificial intelligence (AI) decision-support algorithms to accelerate safe deployment in Scottish optometry and beyond. Research demonstrates the potential of AI systems in optometry and ophthalmology, though they are not yet widely adopted.
Objective: In this study, 18 optometrists were interviewed to (1) identify their expectations and concerns about the national image research repository and their use of AI decision support and (2) gather their suggestions for improving eye health care. The goal was to clarify attitudes among optometrists delivering primary eye care with respect to contributing their patients' images and to using AI assistance. These attitudes are less well studied in primary care contexts. Five ophthalmologists were interviewed to discover their interactions with optometrists. Methods: Between March and August 2021, 23 semistructured interviews were conducted online, each lasting 30-60 minutes. Transcribed and pseudonymized recordings were analyzed using thematic analysis. Results: All optometrists supported contributing retinal images to form an extensive and long-running research repository. Our main findings are summarized as follows. Optometrists were willing to share images of their patients' eyes but expressed concern about technical difficulties, lack of standardization, and the effort involved. Those interviewed thought that sharing digital images would improve collaboration between optometrists and ophthalmologists, for example, during referral to secondary health care. Optometrists welcomed an expanded primary care role in diagnosis and management of diseases by exploiting new technologies and anticipated significant health benefits. Optometrists welcomed AI assistance but insisted that it should not reduce their role and responsibilities. Conclusions: Our investigation focusing on optometrists is novel because most similar studies on AI assistance were performed in hospital settings.
Our findings are consistent with those of studies with professionals in ophthalmology and other medical disciplines, showing near-universal willingness to use AI to improve health care, alongside concerns over training, costs, responsibilities, skill retention, data sharing, and disruptions to professional practices. Our study on optometrists' willingness to contribute images to a research repository introduces a new aspect: they hope that a digital image sharing infrastructure will facilitate service integration. UR - https://humanfactors.jmir.org/2023/1/e40887 UR - http://dx.doi.org/10.2196/40887 UR - http://www.ncbi.nlm.nih.gov/pubmed/37227761 ID - info:doi/10.2196/40887 ER - TY - JOUR AU - Shahsavar, Yeganeh AU - Choudhury, Avishek PY - 2023/5/17 TI - User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study JO - JMIR Hum Factors SP - e47564 VL - 10 KW - human factors KW - behavioral intention KW - chatbots KW - health care KW - integrated diagnostics KW - use KW - ChatGPT KW - artificial intelligence KW - users KW - self-diagnosis KW - decision-making KW - integration KW - willingness KW - policy N2 - Background: With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have emerged as potential tools for various applications, including health care. However, ChatGPT is not specifically designed for health care purposes, and its use for self-diagnosis raises concerns regarding its adoption's potential risks and benefits. Users are increasingly inclined to use ChatGPT for self-diagnosis, necessitating a deeper understanding of the factors driving this trend. Objective: This study aims to investigate the factors influencing users'
perception of decision-making processes and intentions to use ChatGPT for self-diagnosis and to explore the implications of these findings for the safe and effective integration of AI chatbots in health care. Methods: A cross-sectional survey design was used, and data were collected from 607 participants. The relationships between performance expectancy, risk-reward appraisal, decision-making, and intention to use ChatGPT for self-diagnosis were analyzed using partial least squares structural equation modeling (PLS-SEM). Results: Most respondents were willing to use ChatGPT for self-diagnosis (n=476, 78.4%). The model demonstrated satisfactory explanatory power, accounting for 52.4% of the variance in decision-making and 38.1% in the intent to use ChatGPT for self-diagnosis. The results supported all 3 hypotheses: The higher performance expectancy of ChatGPT (β=.547, 95% CI 0.474-0.620) and positive risk-reward appraisals (β=.245, 95% CI 0.161-0.325) were positively associated with the improved perception of decision-making outcomes among users, and enhanced perception of decision-making processes involving ChatGPT positively impacted users' intentions to use the technology for self-diagnosis (β=.565, 95% CI 0.498-0.628). Conclusions: Our research investigated factors influencing users' intentions to use ChatGPT for self-diagnosis and health-related purposes. Even though the technology is not specifically designed for health care, people are inclined to use ChatGPT in health care contexts. Instead of solely focusing on discouraging its use for health care purposes, we advocate for improving the technology and adapting it for suitable health care applications. Our study highlights the importance of collaboration among AI developers, health care providers, and policy makers in ensuring AI chatbots' safe and responsible use in health care. By understanding users'
expectations and decision-making processes, we can develop AI chatbots, such as ChatGPT, that are tailored to human needs, providing reliable and verified health information sources. This approach not only enhances health care accessibility but also improves health literacy and awareness. As the field of AI chatbots in health care continues to evolve, future research should explore the long-term effects of using AI chatbots for self-diagnosis and investigate their potential integration with other digital health interventions to optimize patient care and outcomes. In doing so, we can ensure that AI chatbots, including ChatGPT, are designed and implemented to safeguard users' well-being and support positive health outcomes in health care settings. UR - https://humanfactors.jmir.org/2023/1/e47564 UR - http://dx.doi.org/10.2196/47564 UR - http://www.ncbi.nlm.nih.gov/pubmed/37195756 ID - info:doi/10.2196/47564 ER - TY - JOUR AU - Hadjidimitriou, Stelios AU - Pagourelias, Efstathios AU - Apostolidis, Georgios AU - Dimaridis, Ioannis AU - Charisis, Vasileios AU - Bakogiannis, Constantinos AU - Hadjileontiadis, Leontios AU - Vassilikos, Vassilios PY - 2023/3/13 TI - Clinical Validation of an Artificial Intelligence–Based Tool for Automatic Estimation of Left Ventricular Ejection Fraction and Strain in Echocardiography: Protocol for a Two-Phase Prospective Cohort Study JO - JMIR Res Protoc SP - e44650 VL - 12 KW - artificial intelligence KW - clinical validation KW - computer-aided diagnosis KW - echocardiography KW - ejection fraction KW - global longitudinal strain KW - left ventricle KW - prospective cohort design KW - ultrasound N2 - Background: Echocardiography (ECHO) is a type of ultrasonographic procedure for examining the cardiac function and morphology, with functional parameters of the left ventricle (LV), such as the ejection fraction (EF) and global longitudinal strain (GLS), being important indicators.
Estimation of LV-EF and LV-GLS is performed either manually or semiautomatically by cardiologists and requires a nonnegligible amount of time, while estimation accuracy depends on scan quality and the clinician's experience in ECHO, leading to considerable measurement variability. Objective: The aim of this study is to externally validate the clinical performance of a trained artificial intelligence (AI)–based tool that automatically estimates LV-EF and LV-GLS from transthoracic ECHO scans and to produce preliminary evidence regarding its utility. Methods: This is a prospective cohort study conducted in 2 phases. ECHO scans will be collected from 120 participants referred for ECHO examination based on routine clinical practice in the Hippokration General Hospital, Thessaloniki, Greece. During the first phase, 60 scans will be processed by 15 cardiologists of different experience levels and the AI-based tool to determine whether the latter is noninferior in LV-EF and LV-GLS estimation accuracy (primary outcomes) compared to cardiologists. Secondary outcomes include the time required for estimation and Bland-Altman plots and intraclass correlation coefficients to assess measurement reliability for both the AI and cardiologists. In the second phase, the rest of the scans will be examined by the same cardiologists with and without the AI-based tool to primarily evaluate whether the combination of the cardiologist and the tool is superior in terms of correctness of LV function diagnosis (normal or abnormal) to the cardiologist's routine examination practice, accounting for the cardiologist's level of ECHO experience. Secondary outcomes include time to diagnosis and the system usability scale score. Reference LV-EF and LV-GLS measurements and LV function diagnoses will be provided by a panel of 3 expert cardiologists. Results: Recruitment started in September 2022, and data collection is ongoing.
The results of the first phase are expected to be available by summer 2023, while the study will conclude in May 2024, with the end of the second phase. Conclusions: This study will provide external evidence regarding the clinical performance and utility of the AI-based tool based on prospectively collected ECHO scans in the routine clinical setting, thus reflecting real-world clinical scenarios. The study protocol may be useful to investigators conducting similar research. International Registered Report Identifier (IRRID): DERR1-10.2196/44650 UR - https://www.researchprotocols.org/2023/1/e44650 UR - http://dx.doi.org/10.2196/44650 UR - http://www.ncbi.nlm.nih.gov/pubmed/36912875 ID - info:doi/10.2196/44650 ER - TY - JOUR AU - Higa, Eduardo AU - Elbéji, Abir AU - Zhang, Lu AU - Fischer, Aurélie AU - Aguayo, A. Gloria AU - Nazarov, V. Petr AU - Fagherazzi, Guy PY - 2022/11/8 TI - Discovery and Analytical Validation of a Vocal Biomarker to Monitor Anosmia and Ageusia in Patients With COVID-19: Cross-sectional Study JO - JMIR Med Inform SP - e35622 VL - 10 IS - 11 KW - vocal biomarker KW - COVID-19 KW - ageusia KW - anosmia KW - loss of smell KW - loss of taste KW - digital assessment tool KW - digital health KW - medical informatics KW - telehealth KW - telemonitoring KW - biomarker KW - pandemic KW - symptoms KW - tool KW - disease KW - noninvasive KW - AI KW - artificial intelligence KW - digital KW - device N2 - Background: The COVID-19 disease has multiple symptoms, with anosmia and ageusia being the most prevalent, varying from 75% to 95% and from 50% to 80% of infected patients, respectively. An automatic assessment tool for these symptoms will help monitor the disease in a fast and noninvasive manner. Objective: We hypothesized that people with COVID-19 experiencing anosmia and ageusia had different voice features than those without such symptoms. 
Our objective was to develop an artificial intelligence pipeline to identify and internally validate a vocal biomarker of these symptoms for remotely monitoring them. Methods: This study used population-based data. Participants were assessed daily through a web-based questionnaire and asked to register 2 different types of voice recordings. They were adults (aged >18 years) who were confirmed by a polymerase chain reaction test to be positive for COVID-19 in Luxembourg and met the inclusion criteria. Statistical methods such as recursive feature elimination for dimensionality reduction, multiple statistical learning methods, and hypothesis tests were used throughout this study. The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) Prediction Model Development checklist was used to structure the research. Results: This study included 259 participants. Younger (aged <35 years) and female participants showed higher rates of ageusia and anosmia. Participants were aged 41 (SD 13) years on average, and the data set was balanced for sex (female: 134/259, 51.7%; male: 125/259, 48.3%). The analyzed symptom was present in 94 (36.3%) out of 259 participants and in 450 (27.5%) out of 1636 audio recordings. In all, 2 machine learning models were built, one for Android and one for iOS devices, and both had high accuracy: 88% for Android and 85% for iOS. The final biomarker was then calculated using these models and internally validated. Conclusions: This study demonstrates that people with COVID-19 who have anosmia and ageusia have different voice features from those without these symptoms. Upon further validation, these vocal biomarkers could be nested in digital devices to improve symptom assessment in clinical practice and enhance the telemonitoring of COVID-19–related symptoms.
Trial Registration: Clinicaltrials.gov NCT04380987; https://clinicaltrials.gov/ct2/show/NCT04380987 UR - https://medinform.jmir.org/2022/11/e35622 UR - http://dx.doi.org/10.2196/35622 UR - http://www.ncbi.nlm.nih.gov/pubmed/36265042 ID - info:doi/10.2196/35622 ER - TY - JOUR AU - Wendelboe, Aaron AU - Saber, Ibrahim AU - Dvorak, Justin AU - Adamski, Alys AU - Feland, Natalie AU - Reyes, Nimia AU - Abe, Karon AU - Ortel, Thomas AU - Raskob, Gary PY - 2022/8/5 TI - Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study JO - JMIR Bioinform Biotech SP - e36877 VL - 3 IS - 1 KW - venous thromboembolism KW - public health surveillance KW - machine learning KW - natural language processing KW - medical imaging review KW - public health N2 - Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools have the ability to access electronic medical records, identify patients that meet the VTE case definition, and subsequently enter the relevant information into a database for hospital review. Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University), an NLP tool, in automatically classifying cases of VTE by "reading" unstructured text from diagnostic imaging records collected from 2012 to 2014. Methods: After accessing imaging records from pilot surveillance systems for VTE from Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified.
Experts reviewed the technicians' comments in each record to determine if a VTE event occurred. The performance measures calculated (with 95% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05. Results: The VTE model of IDEAL-X "read" 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI 93.7%-93.8%), 96.3% sensitivity (95% CI 96.2%-96.4%), 92% specificity (95% CI 91.9%-92%), an 89.1% positive predictive value (95% CI 89%-89.2%), and a 97.3% negative predictive value (95% CI 97.3%-97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI 97.8%-98%) than at the OUHSC (93.3%, 95% CI 93.1%-93.4%; P<.001), but the specificity was higher at the OUHSC (95.9%, 95% CI 95.8%-96%) than at Duke University (86.5%, 95% CI 86.4%-86.7%; P<.001). Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X in a medical record system could further automate the surveillance process.
UR - https://bioinform.jmir.org/2022/1/e36877 UR - http://dx.doi.org/10.2196/36877 UR - http://www.ncbi.nlm.nih.gov/pubmed/37206160 ID - info:doi/10.2196/36877 ER - TY - JOUR AU - Yu, Fangzhou AU - Wu, Peixia AU - Deng, Haowen AU - Wu, Jingfang AU - Sun, Shan AU - Yu, Huiqian AU - Yang, Jianming AU - Luo, Xianyang AU - He, Jing AU - Ma, Xiulan AU - Wen, Junxiong AU - Qiu, Danhong AU - Nie, Guohui AU - Liu, Rizhao AU - Hu, Guohua AU - Chen, Tao AU - Zhang, Cheng AU - Li, Huawei PY - 2022/8/3 TI - A Questionnaire-Based Ensemble Learning Model to Predict the Diagnosis of Vertigo: Model Development and Validation Study JO - J Med Internet Res SP - e34126 VL - 24 IS - 8 KW - vestibular disorders KW - machine learning KW - diagnostic model KW - vertigo KW - ENT KW - questionnaire N2 - Background: Questionnaires have been used in the past 2 decades to predict the diagnosis of vertigo and assist clinical decision-making. A questionnaire-based machine learning model is expected to improve the efficiency of diagnosis of vestibular disorders. Objective: This study aims to develop and validate a questionnaire-based machine learning model that predicts the diagnosis of vertigo. Methods: In this multicenter prospective study, patients presenting with vertigo entered a consecutive cohort at their first visit to the ENT and vertigo clinics of 7 tertiary referral centers from August 2019 to March 2021, with a follow-up period of 2 months. All participants completed a diagnostic questionnaire after eligibility screening. Patients who received only 1 final diagnosis by their treating specialists for their primary complaint were included in model development and validation. The data of patients enrolled before February 1, 2021 were used for modeling and cross-validation, while patients enrolled afterward entered external validation. Results: A total of 1693 patients were enrolled, with a response rate of 96.2% (1693/1760). 
The median age was 51 (IQR 38-61) years, with 991 (58.5%) females; 1041 (61.5%) patients received the final diagnosis during the study period. Among them, 928 (54.8%) patients were included in model development and validation, and 113 (6.7%) patients who enrolled later were used as a test set for external validation. They were classified into 5 diagnostic categories. We compared 9 candidate machine learning methods, and the recalibrated model of light gradient boosting machine achieved the best performance, with an area under the curve of 0.937 (95% CI 0.917-0.962) in cross-validation and 0.954 (95% CI 0.944-0.967) in external validation. Conclusions: The questionnaire-based light gradient boosting machine was able to predict common vestibular disorders and assist decision-making in ENT and vertigo clinics. Further studies with a larger sample size and the participation of neurologists will help assess the generalization and robustness of this machine learning method. UR - https://www.jmir.org/2022/8/e34126 UR - http://dx.doi.org/10.2196/34126 UR - http://www.ncbi.nlm.nih.gov/pubmed/35921135 ID - info:doi/10.2196/34126 ER - TY - JOUR AU - Ye, Siao AU - Sun, Kevin AU - Huynh, Duong AU - Phi, Q. Huy AU - Ko, Brian AU - Huang, Bin AU - Hosseini Ghomi, Reza PY - 2022/4/15 TI - A Computerized Cognitive Test Battery for Detection of Dementia and Mild Cognitive Impairment: Instrument Validation Study JO - JMIR Aging SP - e36825 VL - 5 IS - 2 KW - cognitive test KW - mild cognitive impairment KW - dementia KW - cognitive decline KW - repeatable battery KW - discriminant analysis N2 - Background: Early detection of dementia is critical for intervention and care planning but remains difficult. Computerized cognitive testing provides an accessible and promising solution to address these current challenges. 
Objective: The aim of this study was to evaluate a computerized cognitive testing battery (BrainCheck) for its diagnostic accuracy and ability to distinguish the severity of cognitive impairment. Methods: A total of 99 participants diagnosed with dementia, mild cognitive impairment (MCI), or normal cognition (NC) completed the BrainCheck battery. Statistical analyses compared participant performances on BrainCheck based on their diagnostic group. Results: BrainCheck battery performance showed significant differences between the NC, MCI, and dementia groups, achieving 88% or higher sensitivity and specificity (ie, true positive and true negative rates) for separating dementia from NC, and 77% or higher sensitivity and specificity in separating the MCI group from the NC and dementia groups. Three-group classification found true positive rates of 80% or higher for the NC and dementia groups and true positive rates of 64% or higher for the MCI group. Conclusions: BrainCheck was able to distinguish between diagnoses of dementia, MCI, and NC, providing a potentially reliable tool for early detection of cognitive impairment. UR - https://aging.jmir.org/2022/2/e36825 UR - http://dx.doi.org/10.2196/36825 UR - http://www.ncbi.nlm.nih.gov/pubmed/35436212 ID - info:doi/10.2196/36825 ER - TY - JOUR AU - Cheah, Wen-Ting AU - Hwang, Jwu-Jia AU - Hong, Sheng-Yi AU - Fu, Li-Chen AU - Chang, Yu-Ling AU - Chen, Ta-Fu AU - Chen, I-An AU - Chou, Chun-Chen PY - 2022/3/9 TI - A Digital Screening System for Alzheimer Disease Based on a Neuropsychological Test and a Convolutional Neural Network: System Development and Validation JO - JMIR Med Inform SP - e31106 VL - 10 IS - 3 KW - Alzheimer disease KW - mild cognitive impairment KW - screening system KW - convolutional neural network KW - Rey-Osterrieth Complex Figure N2 - Background: Alzheimer disease (AD) and other types of dementia are now considered one of the world's most pressing health problems for aging people worldwide.
It was the seventh-leading cause of death, globally, in 2019. With a growing number of patients with dementia and increasing costs for treatment and care, early detection of the disease at the stage of mild cognitive impairment (MCI) will prevent the rapid progression of dementia. In addition to reducing the physical and psychological stress of patients' caregivers in the long term, it will also improve the everyday quality of life of patients. Objective: The aim of this study was to design a digital screening system to discriminate between patients with MCI and AD and healthy controls (HCs), based on the Rey-Osterrieth Complex Figure (ROCF) neuropsychological test. Methods: The study took place at National Taiwan University between 2018 and 2019. In order to develop the system, pretraining was performed on an open sketch data set, from which features were extracted using a data-driven deep learning approach through a convolutional neural network. Later, the learned features were transferred to our collected data set to further train the classifier. The first data set was collected using pen and paper for the traditional method. The second data set used a tablet and smart pen for data collection. The system's performance was then evaluated using the data sets. Results: The performance of the designed system when using the data set that was collected using the traditional pen and paper method resulted in a mean area under the receiver operating characteristic curve (AUROC) of 0.913 (SD 0.004) when distinguishing between patients with MCI and HCs. On the other hand, when discriminating between patients with AD and HCs, the mean AUROC was 0.950 (SD 0.003) when using the data set that was collected using the digitalized method. Conclusions: The automatic ROCF test scoring system that we designed showed satisfying results for differentiating between patients with AD and MCI and HCs.
Comparatively, our proposed network architecture provided better performance than our previous work, which did not include data augmentation and dropout techniques. In addition, it also performed better than other existing network architectures, such as AlexNet and Sketch-a-Net, with transfer learning techniques. The proposed system can be incorporated with other tests to assist clinicians in the early diagnosis of AD and to reduce the physical and mental burden on patients' family and friends. UR - https://medinform.jmir.org/2022/3/e31106 UR - http://dx.doi.org/10.2196/31106 UR - http://www.ncbi.nlm.nih.gov/pubmed/35262497 ID - info:doi/10.2196/31106 ER - TY - JOUR AU - Hong, Na AU - Liu, Chun AU - Gao, Jianwei AU - Han, Lin AU - Chang, Fengxiang AU - Gong, Mengchun AU - Su, Longxiang PY - 2022/3/3 TI - State of the Art of Machine Learning–Enabled Clinical Decision Support in Intensive Care Units: Literature Review JO - JMIR Med Inform SP - e28781 VL - 10 IS - 3 KW - machine learning KW - intensive care units KW - clinical decision support KW - prediction model KW - artificial intelligence KW - electronic health records N2 - Background: Modern clinical care in intensive care units is full of rich data, and machine learning has great potential to support clinical decision-making. The development of intelligent machine learning–based clinical decision support systems is facing great opportunities and challenges. Clinical decision support systems may directly help clinicians accurately diagnose, predict outcomes, identify risk events, or decide treatments at the point of care.
Objective: We aimed to review the research and application of machine learning–enabled clinical decision support studies in intensive care units to help clinicians, researchers, developers, and policy makers better understand the advantages and limitations of machine learning–supported diagnosis, outcome prediction, risk event identification, and intensive care unit point-of-care recommendations. Methods: We searched papers published in the PubMed database between January 1980 and October 2020. We defined selection criteria to identify papers that focused on machine learning–enabled clinical decision support studies in intensive care units and reviewed the following aspects: research topics, study cohorts, machine learning models, analysis variables, and evaluation metrics. Results: A total of 643 papers were collected, and using our selection criteria, 97 studies were found. Studies were categorized into 4 topics: monitoring, detection, and diagnosis (13/97, 13.4%), early identification of clinical events (32/97, 33.0%), outcome prediction and prognosis assessment (46/97, 47.4%), and treatment decision (6/97, 6.2%). Of the 97 papers, 82 (84.5%) studies used data from adult patients, 9 (9.3%) studies used data from pediatric patients, and 6 (6.2%) studies used data from neonates. We found that 65 (67.0%) studies used data from a single center, and 32 (33.0%) studies used a multicenter data set; 88 (90.7%) studies used supervised learning, 3 (3.1%) studies used unsupervised learning, and 6 (6.2%) studies used reinforcement learning. Clinical variable categories, starting with the most frequently used, were demographic (n=74), laboratory values (n=59), vital signs (n=55), scores (n=48), ventilation parameters (n=43), comorbidities (n=27), medications (n=18), outcome (n=14), fluid balance (n=13), nonmedicine therapy (n=10), symptoms (n=7), and medical history (n=4).
The most frequently adopted evaluation metrics for clinical data modeling studies included area under the receiver operating characteristic curve (n=61), sensitivity (n=51), specificity (n=41), accuracy (n=29), and positive predictive value (n=23). Conclusions: Early identification of clinical events and outcome prediction and prognosis assessment together accounted for approximately 80% of the studies included in this review. Using new algorithms to solve intensive care unit clinical problems, by developing reinforcement learning, active learning, and time-series analysis methods for clinical decision support, will offer greater development prospects in the future. UR - https://medinform.jmir.org/2022/3/e28781 UR - http://dx.doi.org/10.2196/28781 UR - http://www.ncbi.nlm.nih.gov/pubmed/35238790 ID - info:doi/10.2196/28781 ER - TY - JOUR AU - Kalafatis, Chris AU - Modarres, Hadi Mohammad AU - Apostolou, Panos AU - Tabet, Naji AU - Khaligh-Razavi, Seyed-Mahdi PY - 2022/1/27 TI - The Use of a Computerized Cognitive Assessment to Improve the Efficiency of Primary Care Referrals to Memory Services: Protocol for the Accelerating Dementia Pathway Technologies (ADePT) Study JO - JMIR Res Protoc SP - e34475 VL - 11 IS - 1 KW - primary health care KW - general practice KW - dementia KW - cognitive assessment KW - artificial intelligence KW - early diagnosis KW - cognition KW - assessment KW - efficiency KW - diagnosis KW - COVID-19 KW - memory KW - mental health KW - impairment KW - screening KW - detection N2 - Background: Existing primary care cognitive assessment tools are crude or time-consuming screening instruments which can only detect cognitive impairment when it is well established. Due to the COVID-19 pandemic, memory services have adapted to the new environment by moving to remote patient assessments to continue meeting service user demand.
However, the remote use of cognitive assessments has been variable, and there has been scant evaluation of the outcome of such a change in clinical practice. Emerging research in remote memory clinics has highlighted computerized cognitive tests, such as the Integrated Cognitive Assessment (ICA), as prominent candidates for adoption in clinical practice both during the pandemic and for post-COVID-19 implementation as part of health care innovation. Objective: The aim of the Accelerating Dementia Pathway Technologies (ADePT) study is to develop a real-world evidence basis to support the adoption of ICA as an inexpensive screening tool for the detection of cognitive impairment to improve the efficiency of the dementia care pathway. Methods: Patients who have been referred to a memory clinic by a general practitioner (GP) are recruited. Participants complete the ICA either at home or in the clinic along with medical history and usability questionnaires. The GP referral and ICA outcome are compared with the specialist diagnosis obtained at the memory clinic. The clinical outcomes as well as National Health Service reference costing data will be used to assess the potential health and economic benefits of the use of the ICA in the dementia diagnosis pathway. Results: The ADePT study was funded in January 2020 by Innovate UK (Project Number 105837). As of September 2021, 86 participants have been recruited in the study, with 23 participants also completing a retest visit. Initially, the study was designed for in-person visits at the memory clinic; however, in light of the COVID-19 pandemic, the study was amended to allow remote as well as face-to-face visits. The study was also expanded from a single site to 4 sites in the United Kingdom. We expect results to be published by the second quarter of 2022.
Conclusions: The ADePT study aims to improve the efficiency of the dementia care pathway at its very beginning and supports systems integration at the intersection between primary and secondary care. The introduction of a standardized, self-administered, digital assessment tool for the timely detection of neurodegeneration as part of a decision support system that can signpost accordingly can reduce unnecessary referrals, service backlog, and assessment variability. Trial Registration: ISRCTN 16596456; https://www.isrctn.com/ISRCTN16596456 International Registered Report Identifier (IRRID): DERR1-10.2196/34475 UR - https://www.researchprotocols.org/2022/1/e34475 UR - http://dx.doi.org/10.2196/34475 UR - http://www.ncbi.nlm.nih.gov/pubmed/34932495 ID - info:doi/10.2196/34475 ER - TY - JOUR AU - Madalinski, Mariusz AU - Prudham, Roger PY - 2021/12/24 TI - Can Real-time Computer-Aided Detection Systems Diminish the Risk of Postcolonoscopy Colorectal Cancer? JO - JMIR Med Inform SP - e25328 VL - 9 IS - 12 KW - artificial intelligence KW - colonoscopy KW - adenoma KW - real-time computer-aided detection KW - colonic polyp UR - https://medinform.jmir.org/2021/12/e25328 UR - http://dx.doi.org/10.2196/25328 UR - http://www.ncbi.nlm.nih.gov/pubmed/34571490 ID - info:doi/10.2196/25328 ER - TY - JOUR AU - Alamgir, Asma AU - Mousa, Osama AU - Shah, Zubair PY - 2021/12/17 TI - Artificial Intelligence in Predicting Cardiac Arrest: Scoping Review JO - JMIR Med Inform SP - e30798 VL - 9 IS - 12 KW - artificial intelligence KW - machine learning KW - deep learning KW - cardiac arrest KW - predict N2 - Background: Cardiac arrest is a life-threatening cessation of activity in the heart. Early prediction of cardiac arrest is important, as it allows for the necessary measures to be taken to prevent or intervene during the onset. Artificial intelligence (AI) technologies and big data have been increasingly used to enhance the ability to predict and prepare for the patients at risk. 
Objective: This study aims to explore the use of AI technology in predicting cardiac arrest as reported in the literature. Methods: A scoping review was conducted in line with the guidelines of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extension for scoping reviews. Scopus, ScienceDirect, Embase, the Institute of Electrical and Electronics Engineers, and Google Scholar were searched to identify relevant studies. Backward reference list checks of the included studies were also conducted. Study selection and data extraction were independently conducted by 2 reviewers. Data extracted from the included studies were synthesized narratively. Results: Out of 697 citations retrieved, 41 studies were included in the review, and 6 were added after backward citation checking. The included studies reported the use of AI in the prediction of cardiac arrest. Of the 47 studies, we were able to classify the approaches taken by the studies into 3 different categories: 26 (55%) studies predicted cardiac arrest by analyzing specific parameters or variables of the patients, whereas 16 (34%) studies developed an AI-based warning system. The remaining 11% (5/47) of studies focused on distinguishing patients at high risk of cardiac arrest from patients who were not at risk. Two studies focused on the pediatric population, and the rest focused on adults (45/47, 96%). Most of the studies used data sets with a size of <10,000 samples (32/47, 68%). Machine learning models were the most prominent branch of AI used in the prediction of cardiac arrest in the studies (38/47, 81%), and the most used algorithm was the neural network (23/47, 49%). K-fold cross-validation was the most used algorithm evaluation tool reported in the studies (24/47, 51%). Conclusions: AI is extensively used to predict cardiac arrest in different patient settings. Technology is expected to play an integral role in improving cardiac medicine. 
There is a need for more reviews to identify the obstacles to the implementation of AI technologies in clinical settings. Moreover, research focusing on how to best provide clinicians with support to understand, adapt, and implement this technology in their practice is also necessary. UR - https://medinform.jmir.org/2021/12/e30798 UR - http://dx.doi.org/10.2196/30798 UR - http://www.ncbi.nlm.nih.gov/pubmed/34927595 ID - info:doi/10.2196/30798 ER - TY - JOUR AU - Hah, Hyeyoung AU - Goldin, Shevit Deana PY - 2021/12/16 TI - How Clinicians Perceive Artificial Intelligence-Assisted Technologies in Diagnostic Decision Making: Mixed Methods Approach JO - J Med Internet Res SP - e33540 VL - 23 IS - 12 KW - artificial intelligence algorithms KW - AI KW - diagnostic capability KW - virtual care KW - multilevel modeling KW - human-AI teaming KW - natural language understanding N2 - Background: With the rapid development of artificial intelligence (AI) and related technologies, AI algorithms are being embedded into various health information technologies that assist clinicians in clinical decision making. Objective: This study aimed to explore how clinicians perceive AI assistance in diagnostic decision making and suggest the paths forward for AI-human teaming for clinical decision making in health care. Methods: This study used a mixed methods approach, utilizing hierarchical linear modeling and sentiment analysis through natural language understanding techniques. Results: A total of 114 clinicians participated in online simulation surveys in 2020 and 2021. These clinicians studied family medicine and used AI algorithms to aid in patient diagnosis. Their overall sentiment toward AI-assisted diagnosis was positive and comparable with diagnoses made without the assistance of AI. However, AI-guided decision making was not congruent with the way clinicians typically made decisions in diagnosing illnesses.
In a quantitative survey, clinicians reported perceiving current AI assistance as unlikely to enhance diagnostic capability and as negatively influencing their overall performance (β=-.421, P=.02). Instead, clinicians' diagnostic capabilities tended to be associated with well-known parameters, such as education, age, and daily habit of technology use on social media platforms. Conclusions: This study elucidated clinicians' current perceptions and sentiments toward AI-enabled diagnosis. Although the sentiment was positive, the current form of AI assistance may not be linked with efficient decision making, as AI algorithms are not well aligned with subjective human reasoning in clinical diagnosis. Developers and policy makers in health could gather behavioral data from clinicians in various disciplines to help align AI algorithms with the unique subjective patterns of reasoning that humans employ in clinical diagnosis. UR - https://www.jmir.org/2021/12/e33540 UR - http://dx.doi.org/10.2196/33540 UR - http://www.ncbi.nlm.nih.gov/pubmed/34924356 ID - info:doi/10.2196/33540 ER - TY - JOUR AU - Bang, Seok Chang AU - Lee, Jun Jae AU - Baik, Ho Gwang PY - 2021/12/14 TI - Computer-Aided Diagnosis of Gastrointestinal Ulcer and Hemorrhage Using Wireless Capsule Endoscopy: Systematic Review and Diagnostic Test Accuracy Meta-analysis JO - J Med Internet Res SP - e33267 VL - 23 IS - 12 KW - artificial intelligence KW - computer-aided diagnosis KW - capsule endoscopy KW - ulcer KW - hemorrhage KW - gastrointestinal KW - endoscopy KW - review KW - accuracy KW - meta-analysis KW - diagnostic KW - performance KW - machine learning KW - prediction models N2 - Background: Interpretation of capsule endoscopy images or movies is operator-dependent and time-consuming. As a result, computer-aided diagnosis (CAD) has been applied to enhance the efficacy and accuracy of the review process.
Two previous meta-analyses reported the diagnostic performance of CAD models for gastrointestinal ulcers or hemorrhage in capsule endoscopy. However, the systematic reviews conducted to date have been insufficient to establish the real diagnostic validity of CAD models. Objective: To evaluate the diagnostic test accuracy of CAD models for gastrointestinal ulcers or hemorrhage using wireless capsule endoscopic images. Methods: We searched core databases for studies based on CAD models for the diagnosis of ulcers or hemorrhage using capsule endoscopy and presenting data on diagnostic performance. A systematic review and diagnostic test accuracy meta-analysis were performed. Results: Overall, 39 studies were included. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of ulcers (or erosions) were .97 (95% confidence interval, .95-.98), .93 (.89-.95), .92 (.89-.94), and 138 (79-243), respectively. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of hemorrhage (or angioectasia) were .99 (.98-.99), .96 (.94-.97), .97 (.95-.99), and 888 (343-2303), respectively. Subgroup analyses showed robust results. Meta-regression identified publication year, number of training images, and target disease (ulcers vs erosions, hemorrhage vs angioectasia) as sources of heterogeneity. No publication bias was detected. Conclusions: CAD models showed high performance for the optical diagnosis of gastrointestinal ulcer and hemorrhage in wireless capsule endoscopy.
UR - https://www.jmir.org/2021/12/e33267 UR - http://dx.doi.org/10.2196/33267 UR - http://www.ncbi.nlm.nih.gov/pubmed/34904949 ID - info:doi/10.2196/33267 ER - TY - JOUR AU - Maile, Howard AU - Li, Olivia Ji-Peng AU - Gore, Daniel AU - Leucci, Marcello AU - Mulholland, Padraig AU - Hau, Scott AU - Szabo, Anita AU - Moghul, Ismail AU - Balaskas, Konstantinos AU - Fujinami, Kaoru AU - Hysi, Pirro AU - Davidson, Alice AU - Liskova, Petra AU - Hardcastle, Alison AU - Tuft, Stephen AU - Pontikos, Nikolas PY - 2021/12/13 TI - Machine Learning Algorithms to Detect Subclinical Keratoconus: Systematic Review JO - JMIR Med Inform SP - e27363 VL - 9 IS - 12 KW - artificial intelligence KW - machine learning KW - cornea KW - keratoconus KW - corneal tomography KW - subclinical KW - corneal imaging KW - decision support systems KW - corneal disease KW - keratometry N2 - Background: Keratoconus is a disorder characterized by progressive thinning and distortion of the cornea. If detected at an early stage, corneal collagen cross-linking can prevent disease progression and further visual loss. Although advanced forms are easily detected, reliable identification of subclinical disease can be problematic. Several different machine learning algorithms have been used to improve the detection of subclinical keratoconus based on the analysis of multiple types of clinical measures, such as corneal imaging, aberrometry, or biomechanical measurements. Objective: The aim of this study is to survey and critically evaluate the literature on the algorithmic detection of subclinical keratoconus and equivalent definitions. Methods: For this systematic review, we performed a structured search of the following databases: MEDLINE, Embase, Web of Science, and the Cochrane Library, from January 1, 2010, to October 31, 2020. We included all full-text studies that have used algorithms for the detection of subclinical keratoconus and excluded studies that did not perform validation.
This systematic review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) recommendations. Results: We compared the measured parameters and the design of the machine learning algorithms reported in 26 papers that met the inclusion criteria. All salient information required for detailed comparison, including diagnostic criteria, demographic data, sample size, acquisition system, validation details, parameter inputs, machine learning algorithm, and key results, is reported in this study. Conclusions: Machine learning has the potential to improve the detection of subclinical keratoconus or early keratoconus in routine ophthalmic practice. Currently, there is no consensus regarding the corneal parameters that should be included for assessment and the optimal design for the machine learning algorithm. We have identified avenues for further research to improve early detection and stratification of patients for early treatment to prevent disease progression. UR - https://medinform.jmir.org/2021/12/e27363 UR - http://dx.doi.org/10.2196/27363 UR - http://www.ncbi.nlm.nih.gov/pubmed/34898463 ID - info:doi/10.2196/27363 ER - TY - JOUR AU - Cha, Dongchul AU - Pae, Chongwon AU - Lee, A. Se AU - Na, Gina AU - Hur, Kyun Young AU - Lee, Young Ho AU - Cho, Ra A. AU - Cho, Joon Young AU - Han, Gil Sang AU - Kim, Huhn Sung AU - Choi, Young Jae AU - Park, Hae-Jeong PY - 2021/12/8 TI - Differential Biases and Variabilities of Deep Learning-Based Artificial Intelligence and Human Experts in Clinical Diagnosis: Retrospective Cohort and Survey Study JO - JMIR Med Inform SP - e33049 VL - 9 IS - 12 KW - human-machine cooperation KW - convolutional neural network KW - deep learning KW - class imbalance problem KW - otoscopy KW - eardrum KW - artificial intelligence KW - otology KW - computer-aided diagnosis N2 - Background: Deep learning (DL)-based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis.
As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias in DL than in clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative use of DL in clinical application. Objective: This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians, exemplified by the class imbalance problem, and to guide clinicians when utilizing decision support systems. Methods: We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models. Results: Our ML models had consistently high accuracies (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%), equivalent to those of otolaryngologists (balanced: mean 71.17%, SD 3.37%; imbalanced: mean 72.84%, SD 6.41%) and far better than those of nonotolaryngologists (balanced: mean 45.63%, SD 7.89%; imbalanced: mean 44.08%, SD 15.83%). However, ML models suffered from class imbalance problems (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%).
This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07). Conclusions: Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation. 
UR - https://medinform.jmir.org/2021/12/e33049 UR - http://dx.doi.org/10.2196/33049 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889764 ID - info:doi/10.2196/33049 ER - TY - JOUR AU - Park, Dohyun AU - Cho, Jin Soo AU - Kim, Kyunga AU - Woo, Hyunki AU - Kim, Eun Jee AU - Lee, Jin-Young AU - Koh, Janghyun AU - Lee, JeanHyoung AU - Choi, Soo Jong AU - Chang, Kyung Dong AU - Choi, Yoon-Ho AU - Chung, In Ji AU - Cha, Chul Won AU - Jeong, Soon Ok AU - Jekal, Yong Se AU - Kang, Mira PY - 2021/12/8 TI - Prediction Algorithms for Blood Pressure Based on Pulse Wave Velocity Using Health Checkup Data in Healthy Korean Men: Algorithm Development and Validation JO - JMIR Med Inform SP - e29212 VL - 9 IS - 12 KW - blood pressure KW - pulse transit time KW - pulse wave velocity KW - prediction model KW - algorithms KW - medical informatics KW - wearable devices N2 - Background: Pulse transit time and pulse wave velocity (PWV) are related to blood pressure (BP), and there have been continuous attempts to use these to predict BP through wearable devices. However, previous studies were conducted on a small scale and could not confirm the relative importance of each variable in predicting BP. Objective: This study aims to predict systolic blood pressure and diastolic blood pressure based on PWV and to evaluate the relative importance of each clinical variable used in BP prediction models. Methods: This study was conducted on 1362 healthy men older than 18 years who visited the Samsung Medical Center. The systolic blood pressure and diastolic blood pressure were estimated using the multiple linear regression method. Models were divided into two groups based on age: younger than 60 years and 60 years or older; the analysis was repeated with 200 random seeds to account for partition bias. Mean error, absolute error, and root mean square error were used as performance metrics.
Results: The model divided into two age groups (younger than 60 years and 60 years and older) performed better than the model without division. The performance difference between the model using only three variables (PWV, BMI, age) and the model using 17 variables was not significant. Our final model using PWV, BMI, and age met the criteria presented by the American Association for the Advancement of Medical Instrumentation. The prediction errors were within the range of about 9 to 12 mmHg that can occur with a gold standard mercury sphygmomanometer. Conclusions: Dividing the population at the age of 60 years showed better BP prediction performance, and good performance could be achieved even if only the PWV, BMI, and age variables were included. Our final model with the minimal number of variables (PWV, BMI, age) would be efficient and feasible for predicting BP. UR - https://medinform.jmir.org/2021/12/e29212 UR - http://dx.doi.org/10.2196/29212 UR - http://www.ncbi.nlm.nih.gov/pubmed/34889753 ID - info:doi/10.2196/29212 ER - TY - JOUR AU - Ben-Shabat, Niv AU - Sloma, Ariel AU - Weizman, Tomer AU - Kiderman, David AU - Amital, Howard PY - 2021/11/30 TI - Assessing the Performance of a New Artificial Intelligence-Driven Diagnostic Support Tool Using Medical Board Exam Simulations: Clinical Vignette Study JO - JMIR Med Inform SP - e32507 VL - 9 IS - 11 KW - diagnostic decision support systems KW - diagnostic support KW - medical decision-making KW - medical informatics KW - artificial intelligence KW - Kahun KW - decision support N2 - Background: Diagnostic decision support systems (DDSS) are computer programs aimed at improving health care by supporting clinicians in the process of diagnostic decision-making. Previous studies on DDSS demonstrated their ability to enhance clinicians' diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs.
Despite the potential benefits, their utilization in clinical practice is limited, emphasizing the need for new and improved products. Objective: The aim of this study was to conduct a preliminary analysis of the diagnostic performance of "Kahun," a new artificial intelligence-driven diagnostic tool. Methods: Diagnostic performance was evaluated based on the program's ability to "solve" clinical cases from the United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations that were drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. For each case, the presence and the rank of the correct diagnoses within the generated differential diagnoses list were recorded. Diagnostic performance was measured in two ways: first, as diagnostic sensitivity, and second, as case-specific success rates that represent diagnostic comprehensiveness. Results: The study included 91 clinical cases with 78 different chief complaints and a mean number of 38 (SD 8) findings for each case. The total number of expected diagnoses was 272, of which 174 were different (some appeared more than once). Of the 272 expected diagnoses, 231 (87.5%; 95% CI 76-99) diagnoses were suggested within the top 20 listed diagnoses, 209 (76.8%; 95% CI 66-87) were suggested within the top 10, and 168 (61.8%; 95% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, in 62 (68%; 95% CI 59-78) all 3 expected diagnoses were suggested within the top 20 listed diagnoses, in 44 (48%; 95% CI 38-59) within the top 10, and in 24 (26%; 95% CI 17-35) within the top 5. In 87 of the 91 cases (96%; 95% CI 91-100), at least 2 out of 3 of the cases'
expected diagnoses were suggested within the top 20 listed diagnoses; 78 (86%; 95% CI 79-93) were suggested within the top 10; and 61 (67%; 95% CI 57-77) within the top 5. Conclusions: The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness; it also had the ability to manage a wide range of clinical findings. UR - https://medinform.jmir.org/2021/11/e32507 UR - http://dx.doi.org/10.2196/32507 UR - http://www.ncbi.nlm.nih.gov/pubmed/34672262 ID - info:doi/10.2196/32507 ER - TY - JOUR AU - Hou, Xinyao AU - Zhang, Yu AU - Wang, Yanping AU - Wang, Xinyi AU - Zhao, Jiahao AU - Zhu, Xiaobo AU - Su, Jianbo PY - 2021/11/19 TI - A Markerless 2D Video, Facial Feature Recognition-Based, Artificial Intelligence Model to Assist With Screening for Parkinson Disease: Development and Usability Study JO - J Med Internet Res SP - e29554 VL - 23 IS - 11 KW - Parkinson disease KW - facial features KW - artificial intelligence KW - diagnosis N2 - Background: Masked face is a characteristic clinical manifestation of Parkinson disease (PD), but subjective evaluations from different clinicians often show low consistency owing to a lack of accurate detection technology. Hence, it is of great significance to develop methods to make monitoring easier and more accessible. Objective: The study aimed to develop a markerless 2D video, facial feature recognition-based, artificial intelligence (AI) model to assess facial features of PD patients and investigate how AI could help neurologists improve the performance of early PD diagnosis. Methods: We collected 140 videos of facial expressions from 70 PD patients and 70 matched controls from 3 hospitals using a single 2D video camera. We developed and tested an AI model that performs masked face recognition of PD patients based on the acquisition and evaluation of facial features including geometric and texture features.
Random forest, support vector machines, and k-nearest neighbor were used to train the model. The diagnostic performance of the AI model was compared with that of 5 neurologists. Results: The experimental results showed that our AI models can achieve feasible and effective facial feature recognition ability to assist with PD diagnosis. The accuracy of PD diagnosis can reach 83% using geometric features, and with the model trained by random forest, the accuracy using texture features is up to 86%. When these 2 feature types are combined with the random forest algorithm, an F1 value of 88% can be reached. Further, the facial features of patients with PD were not associated with the motor and nonmotor symptoms of PD. Conclusions: PD patients commonly exhibit masked facial features. A 2D video, facial feature recognition-based AI model can provide a valuable tool to assist with PD diagnosis and has the potential to enable remote monitoring of the patient's condition, especially during the COVID-19 pandemic. UR - https://www.jmir.org/2021/11/e29554 UR - http://dx.doi.org/10.2196/29554 UR - http://www.ncbi.nlm.nih.gov/pubmed/34806994 ID - info:doi/10.2196/29554 ER - TY - JOUR AU - Marley, Gifty AU - Fu, Gengfeng AU - Zhang, Ye AU - Li, Jianjun AU - Tucker, D. Joseph AU - Tang, Weiming AU - Yu, Rongbin PY - 2021/11/19 TI - Willingness of Chinese Men Who Have Sex With Men to Use Smartphone-Based Electronic Readers for HIV Self-testing: Web-Based Cross-sectional Study JO - J Med Internet Res SP - e26480 VL - 23 IS - 11 KW - smartphone-based electronic reader KW - electronic readers KW - HIV self-testing KW - HIVST KW - self-testing KW - cellular phone-based readers KW - mHealth N2 - Background: The need for strategies to encourage user-initiated reporting of results after HIV self-testing (HIVST) persists.
Smartphone-based electronic readers (SERs) have been shown capable of accurately reading diagnostic results in point-of-care diagnostics and could bridge the current gaps between HIVST and linkage to care. Objective: Our study aimed to assess the willingness of Chinese men who have sex with men (MSM) in the Jiangsu province to use an SER for HIVST through a web-based cross-sectional study. Methods: From February to April 2020, we conducted a convenience web-based survey among Chinese MSM by using a pretested structured questionnaire. Survey items were adapted from previous HIVST feasibility studies and modified as required. Prior to answering reader-related questions, participants watched a video showcasing a prototype SER. Statistical analysis included descriptive analysis, chi-squared test, and multivariable logistic regression. P values less than .05 were deemed statistically significant. Results: Of 692 participants, 369 (53.3%) were aged 26-40 years, 456 (65.9%) had ever self-tested for HIV, and 493 (71.2%) were willing to use an SER for HIVST. Approximately 98% (483/493) of the willing participants, 85.3% (459/538) of those who had ever or never self-tested, and 40% (46/115) of unwilling participants reported that SERs would increase their HIVST frequency. Engaging in unprotected anal intercourse with regular partners compared to consistently using condoms (adjusted odds ratio [AOR] 3.04, 95% CI 1.19-7.74) increased the odds of willingness to use an SER for HIVST. Participants who had ever considered HIVST at home with a partner right before sex compared to those who had not (AOR 2.99, 95% CI 1.13-7.90) were also more willing to use an SER for HIVST. Playing receptive roles during anal intercourse compared to playing insertive roles (AOR 0.05, 95% CI 0.02-0.14) was associated with decreased odds of being willing to use an SER for HIVST.
The majority of the participants (447/608, 73.5%) preferred to purchase readers from local Centers of Disease Control and Prevention offices, and 51.2% (311/608) of the participants were willing to pay less than US $4.70 for a reader device. Conclusions: The majority of the Chinese MSM, especially those with high sexual risk behaviors, were willing to use an SER for HIVST. Many MSM were also willing to self-test more frequently for HIV with an SER. Further research is needed to ascertain the diagnostic and real-time data-capturing capacity of prototype SERs during HIVST. UR - https://www.jmir.org/2021/11/e26480 UR - http://dx.doi.org/10.2196/26480 UR - http://www.ncbi.nlm.nih.gov/pubmed/34806988 ID - info:doi/10.2196/26480 ER - TY - JOUR AU - Jan, Zainab AU - Al-Ansari, Noor AU - Mousa, Osama AU - Abd-alrazaq, Alaa AU - Ahmed, Arfan AU - Alam, Tanvir AU - Househ, Mowafa PY - 2021/11/19 TI - The Role of Machine Learning in Diagnosing Bipolar Disorder: Scoping Review JO - J Med Internet Res SP - e29749 VL - 23 IS - 11 KW - machine learning KW - bipolar disorder KW - diagnosis KW - support vector machine KW - clinical data KW - mental health KW - scoping review N2 - Background: Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and has triggered morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of normal people. BD is a predominant mental disorder, but it can be misdiagnosed as depressive disorder, which leads to difficulties in treating affected patients. Approximately 60% of patients with BD are treated for depression. However, machine learning provides advanced skills and techniques for better diagnosis of BD. Objective: This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes.
Methods: The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed. Results: We retrieved 573 potential articles from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58%). We identified different machine learning models used in the selected studies, including classification models (18, 55%), regression models (5, 16%), model-based clustering methods (2, 6%), natural language processing (1, 3%), clustering algorithms (1, 3%), and deep learning-based models (3, 9%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34%), whereas microarray expression data sets and genomic data were the least commonly used. The maximum reported accuracy was 98%, whereas the minimum was 64%. Conclusions: This scoping review provides an overview of recent studies based on machine learning models used to diagnose patients with BD regardless of their demographics or whether they were compared to patients with psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry.
UR - https://www.jmir.org/2021/11/e29749 UR - http://dx.doi.org/10.2196/29749 UR - http://www.ncbi.nlm.nih.gov/pubmed/34806996 ID - info:doi/10.2196/29749 ER - TY - JOUR AU - Kim, Taewoo AU - Lee, Hyun Dong AU - Park, Eun-Kee AU - Choi, Sanghun PY - 2021/11/18 TI - Deep Learning Techniques for Fatty Liver Using Multi-View Ultrasound Images Scanned by Different Scanners: Development and Validation Study JO - JMIR Med Inform SP - e30066 VL - 9 IS - 11 KW - fatty liver KW - deep learning KW - transfer learning KW - classification KW - regression KW - magnetic resonance imaging-proton density fat fraction KW - multi-view ultrasound images KW - artificial intelligence KW - machine imaging KW - imaging KW - informatics KW - fatty liver disease KW - detection KW - diagnosis N2 - Background: Fat fraction values obtained from magnetic resonance imaging (MRI) can be used to obtain an accurate diagnosis of fatty liver diseases. However, MRI is expensive and cannot be performed for everyone. Objective: In this study, we aim to develop multi-view ultrasound image-based convolutional deep learning models to detect fatty liver disease and yield fat fraction values. Methods: We extracted 90 ultrasound images of the right intercostal view and 90 ultrasound images of the right intercostal view containing the right renal cortex from 39 cases of fatty liver (MRI-proton density fat fraction [MRI-PDFF] ≥ 5%) and 51 normal subjects (MRI-PDFF < 5%), with MRI-PDFF values obtained from Good Gang-An Hospital. We obtained combined liver and kidney-liver (CLKL) images to train the deep learning models and developed classification and regression models based on the VGG19 model to classify fatty liver disease and yield fat fraction values. We employed data augmentation techniques such as flip and rotation to prevent the deep learning model from overfitting.
We evaluated the deep learning models with performance metrics such as accuracy, sensitivity, specificity, and coefficient of determination (R2). Results: In terms of demographic information, all metrics such as age and sex were similar between the two groups: fatty liver disease and normal subjects. In classification, the model trained on CLKL images achieved 80.1% accuracy, 86.2% precision, and 80.5% specificity in detecting fatty liver disease. In regression, the predicted fat fraction values of the regression model trained on CLKL images correlated with MRI-PDFF values (R2=0.633), indicating that the predicted fat fraction values were moderately estimated. Conclusions: With deep learning techniques and multi-view ultrasound images, it is potentially possible to replace MRI-PDFF values with deep learning predictions for detecting fatty liver disease and estimating fat fraction values. UR - https://medinform.jmir.org/2021/11/e30066 UR - http://dx.doi.org/10.2196/30066 UR - http://www.ncbi.nlm.nih.gov/pubmed/34792476 ID - info:doi/10.2196/30066 ER - TY - JOUR AU - Amin, Shiraz AU - Gupta, Vedant AU - Du, Gaixin AU - McMullen, Colleen AU - Sirrine, Matthew AU - Williams, V. Mark AU - Smyth, S. Susan AU - Chadha, Romil AU - Stearley, Seth AU - Li, Jing PY - 2021/11/16 TI - Developing and Demonstrating the Viability and Availability of the Multilevel Implementation Strategy for Syncope Optimal Care Through Engagement (MISSION) Syncope App: Evidence-Based Clinical Decision Support Tool JO - J Med Internet Res SP - e25192 VL - 23 IS - 11 KW - cardiology KW - medical diagnosis KW - medicine KW - mobile applications KW - prognostics and health KW - syncope N2 - Background: Syncope evaluation and management is associated with testing overuse and unnecessary hospitalizations. The 2017 American College of Cardiology/American Heart Association (ACC/AHA) Syncope Guideline aims to standardize clinical practice and reduce unnecessary services.
The use of clinical decision support (CDS) tools offers the potential to successfully implement evidence-based clinical guidelines. However, CDS tools that provide an evidence-based differential diagnosis (DDx) of syncope at the point of care are currently lacking. Objective: With input from diverse health systems, we developed and demonstrated the viability of a mobile app, the Multilevel Implementation Strategy for Syncope optImal care thrOugh eNgagement (MISSION) Syncope, as a CDS tool for syncope diagnosis and prognosis. Methods: Development of the app had three main goals: (1) reliable generation of an accurate DDx, (2) incorporation of an evidence-based clinical risk tool for prognosis, and (3) user-based design and technical development. To generate a DDx that incorporated assessment recommendations, we reviewed guidelines and the literature to determine clinical assessment questions (variables) and likelihood ratios (LHRs) for each variable in predicting etiology. The creation and validation of the app diagnosis occurred through an iterative clinician review and application to actual clinical cases. The review of available risk score calculators focused on identifying an easily applied and valid evidence-based clinical risk stratification tool. The review and decision-making factors included characteristics of the original study, clinical variables, and validation studies. App design and development relied on user-centered design principles. We used observations of the emergency department workflow, storyboard demonstration, multiple mock review sessions, and beta-testing to optimize functionality and usability. Results: The MISSION Syncope app is consistent with guideline recommendations on evidence-based practice (EBP), and its user interface (UI) reflects steps in a real-world patient evaluation: assessment, DDx, risk stratification, and recommendations. 
The app provides flexible clinical decision making, while emphasizing a care continuum; it generates recommendations for diagnosis and prognosis based on user input. The DDx in the app is deemed a pragmatic model that more closely aligns with real-world clinical practice and was validated using actual clinical cases. The beta-testing of the app demonstrated well-accepted functionality and usability of this syncope CDS tool. Conclusions: The MISSION Syncope app development integrated the current literature and clinical expertise to provide an evidence-based DDx, a prognosis using a validated scoring system, and recommendations based on clinical guidelines. This app demonstrates the importance of using research literature in the development of a CDS tool and applying clinical experience to fill the gaps in available research. It is essential for a successful app to be deliberate in pursuing a practical clinical model instead of striving for a perfect mathematical model, given available published evidence. This hybrid methodology can be applied to similar CDS tool development. UR - https://www.jmir.org/2021/11/e25192 UR - http://dx.doi.org/10.2196/25192 UR - http://www.ncbi.nlm.nih.gov/pubmed/34783669 ID - info:doi/10.2196/25192 ER - TY - JOUR AU - McKenzie, Jordan AU - Rajapakshe, Rasika AU - Shen, Hua AU - Rajapakshe, Shan AU - Lin, Angela PY - 2021/11/12 TI - A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study JO - JMIR Med Inform SP - e29241 VL - 9 IS - 11 KW - chart review KW - natural language processing KW - text extraction KW - radiation pneumonitis KW - lung cancer KW - radiation therapy KW - python KW - electronic medical record KW - accuracy N2 - Background: Health research frequently requires manual chart reviews to identify patients in a study-specific cohort and examine their clinical outcomes. 
Manual chart review is a labor-intensive process that requires significant time investment for clinical researchers. Objective: This study aims to evaluate the feasibility and accuracy of an assisted chart review program, using an in-house rule-based text-extraction program written in Python, to identify patients who developed radiation pneumonitis (RP) after receiving curative radiotherapy. Methods: A retrospective manual chart review was completed for patients who received curative radiotherapy for stage 2-3 lung cancer from January 1, 2013 to December 31, 2015, at British Columbia Cancer, Kelowna Centre. In the manual chart review, RP diagnosis and grading were recorded using the Common Terminology Criteria for Adverse Events version 5.0. From the charts of 50 sample patients, a total of 1413 clinical documents were obtained for review from the electronic medical record system. The text-extraction program was built using the Natural Language Toolkit Python platform (and regular expressions, also known as RegEx). Python version 3.7.2 was used to run the text-extraction program. The output of the text-extraction program was a list of the full sentences containing the key terms, document IDs, and dates from which these sentences were extracted. The results from the manual review were used as the gold standard in this study, with which the results of the text-extraction program were compared. Results: Fifty percent (25/50) of the sample patients developed grade ≥1 RP; the natural language processing program was able to ascertain 92% (23/25) of these patients (sensitivity 0.92, 95% CI 0.74-0.99; specificity 0.36, 95% CI 0.18-0.57). Furthermore, the text-extraction program was able to correctly identify all 9 patients with grade ≥2 RP, which are patients with clinically significant symptoms (sensitivity 1.0, 95% CI 0.66-1.0; specificity 0.27, 95% CI 0.14-0.43). The program was useful for distinguishing patients with RP from those without RP.
The text-extraction program in this study avoided unnecessary manual review of 22% (11/50) of the sample patients, as these patients were identified as grade 0 RP and would not require further manual review in subsequent studies. Conclusions: This feasibility study showed that the text-extraction program was able to assist with the identification of patients who developed RP after curative radiotherapy. The program streamlines the manual chart review further by identifying the key sentences of interest. This work has the potential to improve future clinical research, as the text-extraction program shows promise in performing chart review in a more time-efficient manner, compared with the traditional labor-intensive manual chart review. UR - https://medinform.jmir.org/2021/11/e29241 UR - http://dx.doi.org/10.2196/29241 UR - http://www.ncbi.nlm.nih.gov/pubmed/34766919 ID - info:doi/10.2196/29241 ER - TY - JOUR AU - Saeed, Q. Ali AU - Sheikh Abdullah, Huda Siti Norul AU - Che-Hamzah, Jemaima AU - Abdul Ghani, Tarmizi Ahmad PY - 2021/9/21 TI - Accuracy of Using Generative Adversarial Networks for Glaucoma Detection: Systematic Review and Bibliometric Analysis JO - J Med Internet Res SP - e27414 VL - 23 IS - 9 KW - glaucoma KW - generative adversarial network KW - deep learning KW - systematic literature review KW - retinal disease KW - blood vessels KW - optic disc N2 - Background: Glaucoma leads to irreversible blindness. Globally, it is the second most common retinal disease that leads to blindness, slightly less common than cataracts. Therefore, there is a great need to avoid the silent growth of this disease using recently developed generative adversarial networks (GANs). Objective: This paper aims to introduce a GAN technology for the diagnosis of eye disorders, particularly glaucoma. This paper illustrates deep adversarial learning as a potential diagnostic tool and the challenges involved in its implementation. 
This study describes and analyzes many of the pitfalls and problems that researchers will need to overcome to implement this kind of technology. Methods: To organize this review comprehensively, articles and reviews were collected using the following keywords: ("Glaucoma," "optic disc," "blood vessels") and ("receptive field," "loss function," "GAN," "Generative Adversarial Network," "Deep learning," "CNN," "convolutional neural network" OR encoder). The records were identified from 5 highly reputed databases: IEEE Xplore, Web of Science, Scopus, ScienceDirect, and PubMed. These libraries broadly cover the technical and medical literature. Publications within the last 5 years, specifically 2015-2020, were included because the target GAN technique was invented only in 2014 and the publishing date of the collected papers was not earlier than 2016. Duplicate records were removed, and irrelevant titles and abstracts were excluded. In addition, we excluded papers that used optical coherence tomography and visual field images, except for those with 2D images. A large-scale systematic analysis was performed, and then a summarized taxonomy was generated. Furthermore, the results of the collected articles were summarized and a visual representation of the results was presented on a T-shaped matrix diagram. This study was conducted between March 2020 and November 2020. Results: We found 59 articles after conducting a comprehensive survey of the literature. Among the 59 articles, 30 present actual attempts to synthesize images and provide accurate segmentation/classification using single/multiple landmarks or share certain experiences. The other 29 articles discuss the recent advances in GANs, do practical experiments, and contain analytical studies of retinal disease. Conclusions: Recent deep learning techniques, namely GANs, have shown encouraging performance in retinal disease detection.
Although this methodology involves an extensive computing budget and optimization process, it satisfies the data-hungry nature of deep learning techniques by synthesizing images and addresses major medical issues. This paper contributes to this research field by offering a thorough analysis of existing works, highlighting current limitations, and suggesting alternatives to support other researchers and participants in further improving and strengthening future work. Finally, new directions for this research have been identified. UR - https://www.jmir.org/2021/9/e27414 UR - http://dx.doi.org/10.2196/27414 UR - http://www.ncbi.nlm.nih.gov/pubmed/34236992 ID - info:doi/10.2196/27414 ER - TY - JOUR AU - Chen, Chih-Hao AU - Lin, Haley Heng-Yu AU - Wang, Mao-Che AU - Chu, Yuan-Chia AU - Chang, Chun-Yu AU - Huang, Chii-Yuan AU - Cheng, Yen-Fu PY - 2021/9/10 TI - Diagnostic Accuracy of Smartphone-Based Audiometry for Hearing Loss Detection: Meta-analysis JO - JMIR Mhealth Uhealth SP - e28378 VL - 9 IS - 9 KW - audiometry KW - hearing loss KW - hearing test KW - mhealth KW - mobile health KW - digital health KW - meta-analysis KW - mobile phone KW - smartphone KW - diagnostic test accuracy N2 - Background: Hearing loss is one of the most common disabilities worldwide and affects both individual and public health. Pure tone audiometry (PTA) is the gold standard for hearing assessment, but it is often not available in many settings, given its high cost and demand for human resources. Smartphone-based audiometry may be equally effective and can improve access to adequate hearing evaluations. Objective: The aim of this systematic review is to synthesize the current evidence on the role of smartphone-based audiometry in hearing assessments and further explore the factors that influence its diagnostic accuracy.
Methods: Five databases (PubMed, Embase, Cochrane Library, Web of Science, and Scopus) were queried to identify original studies that examined the diagnostic accuracy of hearing loss measurement using smartphone-based devices with conventional PTA as a reference test. A bivariate random-effects meta-analysis was performed to estimate the pooled sensitivity and specificity. The factors associated with diagnostic accuracy were identified using a bivariate meta-regression model. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Results: In all, 25 studies with a total of 4470 patients were included in the meta-analysis. The overall sensitivity, specificity, and area under the receiver operating characteristic curve for smartphone-based audiometry were 89% (95% CI 83%-93%), 93% (95% CI 87%-97%), and 0.96 (95% CI 0.93-0.97), respectively; the corresponding values for the smartphone-based speech recognition test were 91% (95% CI 86%-94%), 88% (95% CI 75%-94%), and 0.93 (95% CI 0.90-0.95), respectively. Meta-regression analysis revealed that patient age, equipment used, and the presence of soundproof booths were significantly related to diagnostic accuracy. Conclusions: We have presented comprehensive evidence regarding the effectiveness of smartphone-based tests in diagnosing hearing loss. Smartphone-based audiometry may serve as an accurate and accessible approach to hearing evaluations, especially in settings where conventional PTA is unavailable.
UR - https://mhealth.jmir.org/2021/9/e28378/ UR - http://dx.doi.org/10.2196/28378 UR - http://www.ncbi.nlm.nih.gov/pubmed/34515644 ID - info:doi/10.2196/28378 ER - TY - JOUR AU - Shaballout, Nour AU - Aloumar, Anas AU - Manuel, Jorge AU - May, Marcus AU - Beissner, Florian PY - 2021/8/27 TI - Lateralization and Bodily Patterns of Segmental Signs and Spontaneous Pain in Acute Visceral Disease: Observational Study JO - J Med Internet Res SP - e27247 VL - 23 IS - 8 KW - digital pain drawings KW - visceral referred pain KW - referred pain KW - head zones KW - mydriasis KW - chest pain KW - clinical examination KW - differential diagnosis KW - digital health KW - digital drawings KW - pain KW - health technology KW - image analysis N2 - Background: The differential diagnosis of acute visceral diseases is a challenging clinical problem. Older literature suggests that patients with acute visceral problems show segmental signs such as hyperalgesia, skin resistance, or muscular defense as manifestations of referred visceral pain in somatic or visceral tissues with overlapping segmental innervation. According to these sources, the lateralization and segmental distribution of such signs may be used for differential diagnosis. Segmental signs and symptoms may be accompanied by spontaneous (visceral) pain, which, however, shows a nonsegmental distribution. Objective: This study aimed to investigate the lateralization (ie, localization on one side of the body, in preference to the other) and segmental distribution (ie, surface ratio of the affected segments) of spontaneous pain and (referred) segmental signs in acute visceral diseases using digital pain drawing technology. Methods: We recruited 208 emergency room patients who presented with acute medical problems considered by triage to be related to internal organ disease. All patients underwent a structured 10-minute bodily examination to test for various segmental signs and spontaneous visceral pain.
They were further asked about segmental symptoms such as nausea, meteorism, and urinary retention. We collected spontaneous pain and segmental signs as digital drawings and segmental symptoms as binary values on a tablet PC. After the final diagnosis, patients were divided into groups according to the organ affected. Using statistical image analysis, we calculated mean distributions of pain and segmental signs for the heart, lungs, stomach, liver/gallbladder, and kidneys/ureters, analyzing the segmental distribution of these signs and the lateralization. Results: Of the 208 recruited patients, 110 (52.9%) were later diagnosed with a single-organ problem. The recruited patients had a mean age of 57.3 (SD 17.2) years, and 40.9% (85/208) were female. Of these 110 patients, 85 (77.3%) reported spontaneous visceral pain. Of the 110, 81 (73.6%) had at least 1 segmental sign, and the most frequent signs were hyperalgesia (46/81, 57%) and muscle resistance (39/81, 48%). While pain was distributed along the body midline, segmental signs for the heart, stomach, and liver/gallbladder appeared mostly ipsilateral to the affected organ. An unexpectedly high number of patients (37/110, 33.6%) further showed ipsilateral mydriasis. Conclusions: This study underlines the usefulness of including digitally recorded segmental signs in bodily examinations of patients with acute medical problems.
UR - https://www.jmir.org/2021/8/e27247 UR - http://dx.doi.org/10.2196/27247 UR - http://www.ncbi.nlm.nih.gov/pubmed/34448718 ID - info:doi/10.2196/27247 ER - TY - JOUR AU - Noriega, Alejandro AU - Meizner, Daniela AU - Camacho, Dalia AU - Enciso, Jennifer AU - Quiroz-Mercado, Hugo AU - Morales-Canton, Virgilio AU - Almaatouq, Abdullah AU - Pentland, Alex PY - 2021/8/26 TI - Screening Diabetic Retinopathy Using an Automated Retinal Image Analysis System in Independent and Assistive Use Cases in Mexico: Randomized Controlled Trial JO - JMIR Form Res SP - e25290 VL - 5 IS - 8 KW - diabetic retinopathy KW - automated diagnosis KW - retina KW - fundus image analysis N2 - Background: The automated screening of patients at risk of developing diabetic retinopathy represents an opportunity to improve their midterm outcome and lower the public expenditure associated with direct and indirect costs of common sight-threatening complications of diabetes. Objective: This study aimed to develop and evaluate the performance of an automated deep learning–based system to classify retinal fundus images as referable and nonreferable diabetic retinopathy cases, from international and Mexican patients. In particular, we aimed to evaluate the performance of the automated retina image analysis (ARIA) system under an independent scheme (ie, only ARIA screening) and 2 assistive schemes (ie, hybrid ARIA plus ophthalmologist screening), using a web-based platform for remote image analysis to determine and compare the sensitivity and specificity of the 3 schemes. Methods: A randomized controlled experiment was performed in which 17 ophthalmologists were asked to classify a series of retinal fundus images under 3 different conditions.
The conditions were to (1) screen the fundus image by themselves (solo); (2) screen the fundus image after exposure to the retina image classification of the ARIA system (ARIA answer); and (3) screen the fundus image after exposure to the classification of the ARIA system, as well as its level of confidence and an attention map highlighting the most important areas of interest in the image according to the ARIA system (ARIA explanation). The ophthalmologists' classification in each condition and the result from the ARIA system were compared against a gold standard generated by consulting and aggregating the opinion of 3 retina specialists for each fundus image. Results: The ARIA system was able to classify referable vs nonreferable cases with an area under the receiver operating characteristic curve of 98%, a sensitivity of 95.1%, and a specificity of 91.5% for international patient cases. There was an area under the receiver operating characteristic curve of 98.3%, a sensitivity of 95.2%, and a specificity of 90% for Mexican patient cases. The ARIA system performance was more successful than the average performance of the 17 ophthalmologists enrolled in the study. Additionally, the results suggest that the ARIA system can be useful as an assistive tool, as sensitivity was significantly higher in the experimental condition where ophthalmologists were exposed to the ARIA system's answer prior to their own classification (93.3%), compared with the sensitivity of the condition where participants assessed the images independently (87.3%; P=.05). Conclusions: These results demonstrate that both independent and assistive use cases of the ARIA system present, for Latin American countries such as Mexico, a substantial opportunity toward expanding the monitoring capacity for the early detection of diabetes-related blindness.
UR - https://formative.jmir.org/2021/8/e25290 UR - http://dx.doi.org/10.2196/25290 UR - http://www.ncbi.nlm.nih.gov/pubmed/34435963 ID - info:doi/10.2196/25290 ER - TY - JOUR AU - Kummer, Benjamin AU - Shakir, Lubaina AU - Kwon, Rachel AU - Habboushe, Joseph AU - Jetté, Nathalie PY - 2021/8/2 TI - Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis JO - JMIR Med Inform SP - e28266 VL - 9 IS - 8 KW - medical informatics KW - clinical informatics KW - mhealth KW - digital health KW - cerebrovascular disease KW - medical calculators KW - health information KW - health information technology KW - information technology KW - economic health KW - clinical health KW - electronic health records N2 - Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app–based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6%) were related to stroke.
Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5% of total and 32% of stroke-related page views), the Mean Arterial Pressure calculator (2.4% of total and 14.0% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9% of total and 11.4% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7% of total and 10.1% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4% of total and 8.1% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7%-91.2% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. UR - https://medinform.jmir.org/2021/8/e28266 UR - http://dx.doi.org/10.2196/28266 UR - http://www.ncbi.nlm.nih.gov/pubmed/34338647 ID - info:doi/10.2196/28266 ER - TY - JOUR AU - Wu, Jo-Hsuan AU - Liu, Alvin T. Y. 
AU - Hsu, Wan-Ting AU - Ho, Hui-Chun Jennifer AU - Lee, Chien-Chang PY - 2021/7/5 TI - Performance and Limitation of Machine Learning Algorithms for Diabetic Retinopathy Screening: Meta-analysis JO - J Med Internet Res SP - e23863 VL - 23 IS - 7 KW - machine learning KW - diabetic retinopathy KW - diabetes KW - deep learning KW - neural network KW - diagnostic accuracy N2 - Background: Diabetic retinopathy (DR), whose standard diagnosis is performed by human experts, has high prevalence and requires a more efficient screening method. Although machine learning (ML)–based automated DR diagnosis has gained attention due to the recent approval of IDx-DR, the performance of this tool has not been examined systematically, and the best ML technique for use in a real-world setting has not been discussed. Objective: The aim of this study was to systematically examine the overall diagnostic accuracy of ML in diagnosing DR of different categories based on color fundus photographs and to determine the state-of-the-art ML approach. Methods: Published studies in PubMed and EMBASE were searched from inception to June 2020. Studies were screened for relevant outcomes, publication types, and data sufficiency, and a total of 60 out of 2128 (2.82%) studies were retrieved after study selection. Extraction of data was performed by 2 authors according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and the quality assessment was performed according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). Meta-analysis of diagnostic accuracy was pooled using a bivariate random effects model. The main outcomes included diagnostic accuracy, sensitivity, and specificity of ML in diagnosing DR based on color fundus photographs, as well as the performances of different major types of ML algorithms. Results: The primary meta-analysis included 60 color fundus photograph studies (445,175 interpretations).
Overall, ML demonstrated high accuracy in diagnosing DR of various categories, with a pooled area under the receiver operating characteristic (AUROC) ranging from 0.97 (95% CI 0.96-0.99) to 0.99 (95% CI 0.98-1.00). The performance of ML in detecting more-than-mild DR was robust (sensitivity 0.95; AUROC 0.97), and by subgroup analyses, we observed that robust performance of ML was not limited to benchmark data sets (sensitivity 0.92; AUROC 0.96) but could be generalized to images collected in clinical practice (sensitivity 0.97; AUROC 0.97). Neural network was the most widely used method, and the subgroup analysis revealed a pooled AUROC of 0.98 (95% CI 0.96-0.99) for studies that used neural networks to diagnose more-than-mild DR. Conclusions: This meta-analysis demonstrated high diagnostic accuracy of ML algorithms in detecting DR on color fundus photographs, suggesting that state-of-the-art, ML-based DR screening algorithms are likely ready for clinical applications. However, a significant portion of the earlier published studies had methodology flaws, such as the lack of external validation and presence of spectrum bias. The results of these studies should be interpreted with caution. 
UR - https://www.jmir.org/2021/7/e23863 UR - http://dx.doi.org/10.2196/23863 UR - http://www.ncbi.nlm.nih.gov/pubmed/34407500 ID - info:doi/10.2196/23863 ER - TY - JOUR AU - Hu, Hao-Chun AU - Chang, Shyue-Yih AU - Wang, Chuen-Heng AU - Li, Kai-Jun AU - Cho, Hsiao-Yun AU - Chen, Yi-Ting AU - Lu, Chang-Jung AU - Tsai, Tzu-Pei AU - Lee, Kuang-Sheng Oscar PY - 2021/6/8 TI - Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study JO - J Med Internet Res SP - e25247 VL - 23 IS - 6 KW - artificial intelligence KW - convolutional neural network KW - dysphonia KW - pathological voice KW - vocal fold disease KW - voice pathology identification N2 - Background: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. 
Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This artificial intelligence approach could be clinically useful for screening general vocal fold disease using the voice, for example, as part of a quick survey or general health examination. It can be applied in telemedicine for areas whose primary care units lack laryngoscopic capabilities, and it could support physicians in prescreening, so that invasive examinations are performed only when automatic recognition or auditory assessment proves problematic, or when other clinical examination results raise doubts about the presence of pathology. UR - https://www.jmir.org/2021/6/e25247 UR - http://dx.doi.org/10.2196/25247 UR - http://www.ncbi.nlm.nih.gov/pubmed/34100770 ID - info:doi/10.2196/25247 ER - TY - JOUR AU - Li, Lei AU - Zhu, Haogang AU - Zhang, Zhenyu AU - Zhao, Liang AU - Xu, Liang AU - Jonas, A. Rahul AU - Garway-Heath, F. David AU - Jonas, B. Jost AU - Wang, Xing Ya PY - 2021/5/18 TI - Neural Network–Based Retinal Nerve Fiber Layer Profile Compensation for Glaucoma Diagnosis in Myopia: Model Development and Validation JO - JMIR Med Inform SP - e22664 VL - 9 IS - 5 KW - retinal nerve fiber layer thickness KW - radial basis neural network KW - neural network KW - glaucoma KW - optic nerve head KW - optical coherence tomography KW - myopia KW - optic nerve N2 - Background: Due to the axial elongation–associated changes in the optic nerve and retina in high myopia, traditional methods such as optic disc evaluation and visual field testing are not able to correctly differentiate glaucomatous lesions.
It has been clinically challenging to detect glaucoma in highly myopic eyes. Objective: This study aimed to develop a neural network to adjust for the dependence of the peripapillary retinal nerve fiber layer (RNFL) thickness (RNFLT) profile on age, gender, and ocular biometric parameters and to evaluate the network's performance for glaucoma diagnosis, especially in high myopia. Methods: RNFLT with 768 points on the circumferential 3.4-mm scan was measured using spectral-domain optical coherence tomography. A fully connected network and a radial basis function network were trained for vertical (scaling) and horizontal (shift) transformation of the RNFLT profile with adjustment for age, axial length (AL), disc-fovea angle, and distance in a test group of 2223 nonglaucomatous eyes. The performance of RNFLT compensation was evaluated in an independent group of 254 glaucoma patients and 254 nonglaucomatous participants. Results: By applying the RNFL compensation algorithm, the area under the receiver operating characteristic curve for detecting glaucoma increased from 0.70 to 0.84, from 0.75 to 0.89, from 0.77 to 0.89, and from 0.78 to 0.87 for eyes in the highest 10% percentile subgroup of the AL distribution (mean 26.0, SD 0.9 mm), highest 20% percentile subgroup of the AL distribution (mean 25.3, SD 1.0 mm), highest 30% percentile subgroup of the AL distribution (mean 24.9, SD 1.0 mm), and any AL (mean 23.5, SD 1.2 mm), respectively, in comparison with unadjusted RNFLT. The difference between uncompensated and compensated RNFLT values increased with longer axial length, with enlargement of 19.8%, 18.9%, 16.2%, and 11.3% in the highest 10% percentile subgroup, highest 20% percentile subgroup, highest 30% percentile subgroup, and all eyes, respectively.
Conclusions: In a population-based study sample, an algorithm-based adjustment for age, gender, and ocular biometric parameters improved the diagnostic precision of the RNFLT profile for glaucoma detection particularly in myopic and highly myopic eyes. UR - https://medinform.jmir.org/2021/5/e22664 UR - http://dx.doi.org/10.2196/22664 UR - http://www.ncbi.nlm.nih.gov/pubmed/34003137 ID - info:doi/10.2196/22664 ER - TY - JOUR AU - Aktar, Sakifa AU - Ahamad, Martuza Md AU - Rashed-Al-Mahfuz, Md AU - Azad, AKM AU - Uddin, Shahadat AU - Kamal, AHM AU - Alyami, A. Salem AU - Lin, Ping-I AU - Islam, Shariful Sheikh Mohammed AU - Quinn, MW Julian AU - Eapen, Valsamma AU - Moni, Ali Mohammad PY - 2021/4/13 TI - Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development JO - JMIR Med Inform SP - e25884 VL - 9 IS - 4 KW - COVID-19 KW - blood samples KW - machine learning KW - statistical analysis KW - prediction KW - severity KW - mortality KW - morbidity KW - risk KW - blood KW - testing KW - outcome KW - data set N2 - Background: Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective: Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. 
Methods: We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results: Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19–positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90% for disease severity prediction. Conclusions: We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment. UR - https://medinform.jmir.org/2021/4/e25884 UR - http://dx.doi.org/10.2196/25884 UR - http://www.ncbi.nlm.nih.gov/pubmed/33779565 ID - info:doi/10.2196/25884 ER - TY - JOUR AU - Smit, A. Marloes AU - van Pelt, W. Gabi AU - Dequeker, MC Elisabeth AU - Al Dieri, Raed AU - Tollenaar, AEM Rob AU - van Krieken, JM J. Han AU - Mesker, E.
Wilma PY - 2021/3/19 TI - e-Learning for Instruction and to Improve Reproducibility of Scoring Tumor-Stroma Ratio in Colon Carcinoma: Performance and Reproducibility Assessment in the UNITED Study JO - JMIR Form Res SP - e19408 VL - 5 IS - 3 KW - colon cancer KW - tumor-stroma ratio KW - validation KW - e-Learning KW - reproducibility study KW - cancer KW - tumor KW - colon KW - reproducibility KW - carcinoma KW - prognosis KW - diagnostic KW - implementation KW - online learning N2 - Background: The amount of stroma in the primary tumor is an important prognostic parameter. The tumor-stroma ratio (TSR) was previously validated by international research groups as a robust parameter with good interobserver agreement. Objective: The Uniform Noting for International Application of the Tumor-Stroma Ratio as an Easy Diagnostic Tool (UNITED) study was developed to bring the TSR to clinical implementation. As part of the study, an e-Learning module was constructed to confirm the reproducibility of scoring the TSR after proper instruction. Methods: The e-Learning module consists of an autoinstruction for TSR determination (instruction video or written protocol) and three sets of 40 cases (training, test, and repetition sets). Scoring the TSR is performed on hematoxylin and eosin–stained sections and takes only 1-2 minutes. Cases are considered stroma-low if the amount of stroma is ≤50%, whereas a stroma-high case is defined as >50% stroma. Inter- and intraobserver agreements were determined based on the Cohen κ score after each set to evaluate the reproducibility. Results: Pathologists and pathology residents (N=63) with special interest in colorectal cancer participated in the e-Learning. Forty-nine participants started the e-Learning and 31 (63%) finished the whole cycle (3 sets). A significant improvement was observed from the training set to the test set; the median κ score improved from 0.72 to 0.77 (P=.002).
Conclusions: e-Learning is an effective method to instruct pathologists and pathology residents for scoring the TSR. The reliability of scoring improved from the training to the test set and did not fall back with the repetition set, confirming the reproducibility of the TSR scoring method. Trial Registration: The Netherlands Trial Registry NTR7270; https://www.trialregister.nl/trial/7072 International Registered Report Identifier (IRRID): RR2-10.2196/13464 UR - https://formative.jmir.org/2021/3/e19408 UR - http://dx.doi.org/10.2196/19408 UR - http://www.ncbi.nlm.nih.gov/pubmed/33739293 ID - info:doi/10.2196/19408 ER - TY - JOUR AU - Ridgway, P. Jessica AU - Uvin, Arno AU - Schmitt, Jessica AU - Oliwa, Tomasz AU - Almirol, Ellen AU - Devlin, Samantha AU - Schneider, John PY - 2021/3/10 TI - Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use Among People Living with HIV: Retrospective Cohort Study JO - JMIR Med Inform SP - e23456 VL - 9 IS - 3 KW - natural language processing KW - HIV KW - substance use KW - mental illness KW - electronic medical records N2 - Background: Mental illness and substance use are prevalent among people living with HIV and often lead to poor health outcomes. Electronic medical record (EMR) data are increasingly being utilized for HIV-related clinical research and care, but mental illness and substance use are often underdocumented in structured EMR fields. Natural language processing (NLP) of unstructured text of clinical notes in the EMR may more accurately identify mental illness and substance use among people living with HIV than structured EMR fields alone. Objective: The aim of this study was to utilize NLP of clinical notes to detect mental illness and substance use among people living with HIV and to determine how often these factors are documented in structured EMR fields. 
Methods: We collected both structured EMR data (diagnosis codes, social history, Problem List) as well as the unstructured text of clinical HIV care notes for adults living with HIV. We developed NLP algorithms to identify words and phrases associated with mental illness and substance use in the clinical notes. The algorithms were validated based on chart review. We compared numbers of patients with documentation of mental illness or substance use identified by structured EMR fields with those identified by the NLP algorithms. Results: The NLP algorithm for detecting mental illness had a positive predictive value (PPV) of 98% and a negative predictive value (NPV) of 98%. The NLP algorithm for detecting substance use had a PPV of 92% and an NPV of 98%. The NLP algorithm for mental illness identified 54.0% (420/778) of patients as having documentation of mental illness in the text of clinical notes. Among the patients with mental illness detected by NLP, 58.6% (246/420) had documentation of mental illness in at least one structured EMR field. Sixty-three patients had documentation of mental illness in structured EMR fields that was not detected by NLP of clinical notes. The NLP algorithm for substance use detected substance use in the text of clinical notes in 18.1% (141/778) of patients. Among patients with substance use detected by NLP, 73.8% (104/141) had documentation of substance use in at least one structured EMR field. Seventy-six patients had documentation of substance use in structured EMR fields that was not detected by NLP of clinical notes. Conclusions: Among patients in an urban HIV care clinic, NLP of clinical notes identified high rates of mental illness and substance use that were often not documented in structured EMR fields. This finding has important implications for epidemiologic research and clinical care for people living with HIV. 
UR - https://medinform.jmir.org/2021/3/e23456 UR - http://dx.doi.org/10.2196/23456 UR - http://www.ncbi.nlm.nih.gov/pubmed/33688848 ID - info:doi/10.2196/23456 ER - TY - JOUR AU - Weijers, Miriam AU - Bastiaenen, Caroline AU - Feron, Frans AU - Schröder, Kay PY - 2021/2/9 TI - Designing a Personalized Health Dashboard: Interdisciplinary and Participatory Approach JO - JMIR Form Res SP - e24061 VL - 5 IS - 2 KW - visualization design model KW - dashboard KW - evaluation KW - personalized health care KW - International Classification of Functioning, Disability and Health (ICF) KW - patient access to records KW - human–computer interaction KW - health information visualization N2 - Background: Within the Dutch Child Health Care (CHC), an online tool (360° CHILD-profile) is designed to enhance prevention and transformation toward personalized health care. From a personalized preventive perspective, it is of fundamental importance to timely identify children with emerging health problems interrelated to multiple health determinants. While digitalization of children's health data is now realized, the accessibility of data remains a major challenge for CHC professionals, let alone for parents/youth. Therefore, an initiative arose from CHC practice to develop a novel approach that makes relevant information accessible at a glance. Objective: This paper describes the stepwise development of a dashboard, as an example of using a design model to achieve visualization of a comprehensive overview of theoretically structured health data. Methods: The developmental process is based on the nested design model with involvement of relevant stakeholders in a real-life context. This model considers immediate upstream validation within 4 cascading design levels: Domain Problem and Data Characterization, Operation and Data Type Abstraction, Visual Encoding and Interaction Design, and Algorithm Design.
This model also includes impact-oriented downstream validation, which can be initiated after delivering the prototype. Results: A comprehensible 360° CHILD-profile is developed: an online accessible visualization of CHC data based on the theoretical concept of the International Classification of Functioning, Disability and Health. This dashboard provides caregivers and parents/youth with a holistic view of children's health and "entry points" for preventive, individualized health plans. Conclusions: Describing this developmental process offers guidance on how to utilize the nested design model within a health care context. UR - https://formative.jmir.org/2021/2/e24061 UR - http://dx.doi.org/10.2196/24061 UR - http://www.ncbi.nlm.nih.gov/pubmed/33560229 ID - info:doi/10.2196/24061 ER - TY - JOUR AU - Sato, Ann AU - Haneda, Eri AU - Suganuma, Nobuyasu AU - Narimatsu, Hiroto PY - 2021/2/5 TI - Preliminary Screening for Hereditary Breast and Ovarian Cancer Using a Chatbot Augmented Intelligence Genetic Counselor: Development and Feasibility Study JO - JMIR Form Res SP - e25184 VL - 5 IS - 2 KW - artificial intelligence KW - augmented intelligence KW - hereditary cancer KW - familial cancer KW - IBM Watson KW - preliminary screening KW - cancer KW - genetics KW - chatbot KW - screening KW - feasibility N2 - Background: Breast cancer is the most common form of cancer in Japan; genetic background and hereditary breast and ovarian cancer (HBOC) are implicated. The key to HBOC diagnosis involves screening to identify high-risk individuals. However, genetic medicine is still developing; thus, many patients who may potentially benefit from genetic medicine have not yet been identified. Objective: This study's objective is to develop a chatbot system that uses augmented intelligence for HBOC screening to determine whether patients meet the National Comprehensive Cancer Network (NCCN) BRCA1/2 testing criteria.
Methods: The system was evaluated by a doctor specializing in genetic medicine and certified genetic counselors. We prepared 3 scenarios and created a conversation with the chatbot to reflect each one. Then we evaluated chatbot feasibility, the required time, the medical accuracy of conversations and family history, and the final result. Results: The times required for the conversation were 7 minutes for scenario 1, 15 minutes for scenario 2, and 16 minutes for scenario 3. Scenarios 1 and 2 met the BRCA1/2 testing criteria, but scenario 3 did not, and this result was consistent with the findings of 3 experts who retrospectively reviewed conversations with the chatbot according to the 3 scenarios. A comparison of the family history ascertained by the chatbot with the actual scenarios revealed that each result was consistent with its scenario. From a genetic medicine perspective, no errors were noted by the 3 experts. Conclusions: This study demonstrated that chatbot systems could be applied to preliminary genetic medicine screening for HBOC.
UR - https://formative.jmir.org/2021/2/e25184 UR - http://dx.doi.org/10.2196/25184 UR - http://www.ncbi.nlm.nih.gov/pubmed/33544084 ID - info:doi/10.2196/25184 ER - TY - JOUR AU - Aleknaite, Ausra AU - Simutis, Gintaras AU - Stanaitis, Juozas AU - Jucaitis, Tomas AU - Drungilas, Mantas AU - Valantinas, Jonas AU - Strupas, Kestutis PY - 2021/2/4 TI - Comparison of Endoscopy First and Laparoscopic Cholecystectomy First Strategies for Patients With Gallstone Disease and Intermediate Risk of Choledocholithiasis: Protocol for a Clinical Randomized Controlled Trial JO - JMIR Res Protoc SP - e18837 VL - 10 IS - 2 KW - choledocholithiasis KW - endoscopic ultrasound KW - intraoperative cholangiography KW - common bile duct stone KW - endoscopic retrograde cholangiopancreatography KW - laparoscopic cholecystectomy N2 - Background: The optimal approach for patients with gallbladder stones and intermediate risk of choledocholithiasis remains undetermined. The use of endoscopic retrograde cholangiopancreatography for diagnosis should be minimized as it carries considerable risk of postprocedural complications, and nowadays, less invasive and safer techniques are available. Objective: This study compares the two management strategies of endoscopic ultrasound before laparoscopic cholecystectomy and intraoperative cholangiography for patients with symptomatic cholecystolithiasis and intermediate risk of choledocholithiasis. Methods: This is a randomized, active-controlled, single-center clinical trial enrolling adult patients undergoing laparoscopic cholecystectomy for symptomatic gallbladder stones with intermediate risk of choledocholithiasis. The risk of choledocholithiasis is calculated using an original prognostic score (the Vilnius University Hospital Index). In a retrospective evaluation, this index showed better prognostic performance than the score proposed by the American Society for Gastrointestinal Endoscopy in 2010.
A total of 106 participants will be included and randomized into two groups. Evaluation of bile ducts using endoscopic ultrasound and endoscopic retrograde cholangiography on demand will be performed before laparoscopic cholecystectomy for one arm ("endoscopy first"). Intraoperative cholangiography during laparoscopic cholecystectomy and postoperative endoscopic retrograde cholangiopancreatography on demand will be performed in another arm ("cholecystectomy first"). Postoperative follow-up is 6 months. The primary endpoint is the length of hospital stay. The secondary endpoints are accuracy of the different management strategies, adverse events of the interventions, duct clearance and technical success of the interventions (intraoperative cholangiography, endoscopic ultrasound, and endoscopic retrograde cholangiography), and cost of treatment. Results: The trial protocol was approved by the Vilnius Regional Biomedical Research Ethics Committee in December 2017. Enrollment of patients was started in January 2018. As of June 2020, 66 patients have been enrolled. Conclusions: This trial is planned to determine the superior strategy for patients with intermediate risk of common bile duct stones and to define a simple and safe algorithm for managing choledocholithiasis. Trial Registration: ClinicalTrials.gov NCT03658863; https://clinicaltrials.gov/ct2/show/NCT03658863.
International Registered Report Identifier (IRRID): DERR1-10.2196/18837 UR - https://www.researchprotocols.org/2021/2/e18837 UR - http://dx.doi.org/10.2196/18837 UR - http://www.ncbi.nlm.nih.gov/pubmed/33538700 ID - info:doi/10.2196/18837 ER - TY - JOUR AU - Diao, Xiaolin AU - Huo, Yanni AU - Yan, Zhanzheng AU - Wang, Haibin AU - Yuan, Jing AU - Wang, Yuxin AU - Cai, Jun AU - Zhao, Wei PY - 2021/1/25 TI - An Application of Machine Learning to Etiological Diagnosis of Secondary Hypertension: Retrospective Study Using Electronic Medical Records JO - JMIR Med Inform SP - e19739 VL - 9 IS - 1 KW - secondary hypertension KW - etiological diagnosis KW - machine learning KW - prediction model N2 - Background: Secondary hypertension is hypertension with a definite etiology and may be curable. Patients with suspected secondary hypertension benefit from timely detection and treatment; left untreated, they have a higher risk of morbidity and mortality than those with primary hypertension. Objective: The aim of this study was to develop and validate machine learning (ML) prediction models of common etiologies in patients with suspected secondary hypertension. Methods: The analyzed data set was retrospectively extracted from electronic medical records of patients discharged from Fuwai Hospital between January 1, 2016, and June 30, 2019. A total of 7532 unique patients were included and divided into 2 data sets by time: 6302 patients in 2016-2018 as the training data set for model building and 1230 patients in 2019 as the validation data set for further evaluation. Extreme Gradient Boosting (XGBoost) was adopted to develop 5 models to predict 4 etiologies of secondary hypertension and the occurrence of any of them (termed the composite outcome): renovascular hypertension (RVH), primary aldosteronism (PA), thyroid dysfunction, and aortic stenosis. Both univariate logistic analysis and Gini Impurity were used for feature selection.
Grid search and 10-fold cross-validation were used to select the optimal hyperparameters for each model. Results: Validation of the composite outcome prediction model showed good performance with an area under the receiver-operating characteristic curve (AUC) of 0.924 in the validation data set, while the 4 prediction models of RVH, PA, thyroid dysfunction, and aortic stenosis achieved AUCs of 0.938, 0.965, 0.959, and 0.946, respectively, in the validation data set. A total of 79 clinical indicators were identified and ultimately used in our prediction models. The result of subgroup analysis on the composite outcome prediction model demonstrated high discrimination, with AUCs all higher than 0.890 among all adult age groups. Conclusions: The ML prediction models in this study showed good performance in detecting 4 etiologies in patients with suspected secondary hypertension; thus, they may potentially facilitate clinical diagnostic decision making for secondary hypertension in an intelligent way.
UR - http://medinform.jmir.org/2021/1/e19739/ UR - http://dx.doi.org/10.2196/19739 UR - http://www.ncbi.nlm.nih.gov/pubmed/33492233 ID - info:doi/10.2196/19739 ER - TY - JOUR AU - Xu, Ming AU - Ouyang, Liu AU - Han, Lei AU - Sun, Kai AU - Yu, Tingting AU - Li, Qian AU - Tian, Hua AU - Safarnejad, Lida AU - Zhang, Hengdong AU - Gao, Yue AU - Bao, Sheng Forrest AU - Chen, Yuanfang AU - Robinson, Patrick AU - Ge, Yaorong AU - Zhu, Baoli AU - Liu, Jie AU - Chen, Shi PY - 2021/1/6 TI - Accurately Differentiating Between Patients With COVID-19, Patients With Other Viral Infections, and Healthy Individuals: Multimodal Late Fusion Learning Approach JO - J Med Internet Res SP - e25535 VL - 23 IS - 1 KW - COVID-19 KW - machine learning KW - deep learning KW - multimodal KW - feature fusion KW - biomedical imaging KW - diagnosis support KW - diagnosis KW - imaging KW - differentiation KW - testing KW - diagnostic N2 - Background: Effectively identifying patients with COVID-19 using non–polymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of various biomedical features and appropriate analytical approaches for enabling the early detection and effective diagnosis of patients with COVID-19. Objective: We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection. Methods: In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants' clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities.
To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor, random forest, and support vector machine models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere, severe, healthy, and viral pneumonia. Results: Multimodal features provided substantial performance gain over the use of any single feature modality. All 3 machine learning models had high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%). Conclusions: Compared to the existing binary classification benchmarks that are often focused on single-feature modality, this study's hybrid deep learning–machine learning framework provided a novel and effective breakthrough for clinical applications. Our findings, which come from a relatively large sample size, and analytical workflow will supplement and assist with clinical decision support for current COVID-19 diagnostic methods and other clinical applications with high-dimensional multimodal biomedical features. UR - http://www.jmir.org/2021/1/e25535/ UR - http://dx.doi.org/10.2196/25535 UR - http://www.ncbi.nlm.nih.gov/pubmed/33404516 ID - info:doi/10.2196/25535 ER - TY - JOUR AU - Aboueid, Stephanie AU - Meyer, Samantha AU - Wallace, R. James AU - Mahajan, Shreya AU - Chaurasia, Ashok PY - 2021/1/6 TI - Young Adults' Perspectives on the Use of Symptom Checkers for Self-Triage and Self-Diagnosis: Qualitative Study JO - JMIR Public Health Surveill SP - e22637 VL - 7 IS - 1 KW - self-assessment KW - symptom checkers KW - self-triage KW - self-diagnosis KW - young adults KW - digital platforms KW - internet KW - user experience KW - Google search N2 - Background: Young adults often browse the internet for self-triage and diagnosis.
More sophisticated digital platforms such as symptom checkers have recently become pervasive; however, little is known about their use. Objective: The aim of this study was to understand young adults' (18-34 years old) perspectives on the use of the Google search engine versus a symptom checker, as well as to identify the barriers and enablers for using a symptom checker for self-triage and self-diagnosis. Methods: A qualitative descriptive case study research design was used. Semistructured interviews were conducted with 24 young adults enrolled in a university in Ontario, Canada. All participants were given a clinical vignette and were asked to use a symptom checker (WebMD Symptom Checker or Babylon Health) while thinking out loud, and were asked questions regarding their experience. Interviews were audio-recorded, transcribed, and imported into the NVivo software program. Inductive thematic analysis was conducted independently by two researchers. Results: Using the Google search engine was perceived to be faster and more customizable (ie, ability to enter symptoms freely in the search engine) than a symptom checker; however, a symptom checker was perceived to be useful for a more personalized assessment. After having used a symptom checker, most of the participants believed that the platform needed improvement in the areas of accuracy, security and privacy, and medical jargon used. Given these limitations, most participants believed that symptom checkers could be more useful for self-triage than for self-diagnosis. Interestingly, more than half of the participants were not aware of symptom checkers prior to this study and most believed that this lack of awareness about the existence of symptom checkers hindered their use. Conclusions: Awareness related to the existence of symptom checkers and their integration into the health care system are required to maximize benefits related to these platforms.
Addressing the barriers identified in this study is likely to increase the acceptance and use of symptom checkers by young adults. UR - https://publichealth.jmir.org/2021/1/e22637 UR - http://dx.doi.org/10.2196/22637 UR - http://www.ncbi.nlm.nih.gov/pubmed/33404515 ID - info:doi/10.2196/22637 ER - TY - JOUR AU - Bang, Seok Chang AU - Lee, Jun Jae AU - Baik, Ho Gwang PY - 2020/9/16 TI - Artificial Intelligence for the Prediction of Helicobacter Pylori Infection in Endoscopic Images: Systematic Review and Meta-Analysis Of Diagnostic Test Accuracy JO - J Med Internet Res SP - e21983 VL - 22 IS - 9 KW - artificial intelligence KW - convolutional neural network KW - deep learning KW - machine learning KW - endoscopy KW - Helicobacter pylori N2 - Background: Helicobacter pylori plays a central role in the development of gastric cancer, and prediction of H pylori infection by visual inspection of the gastric mucosa is an important function of endoscopy. However, there are currently no established methods of optical diagnosis of H pylori infection using endoscopic images. Definitive diagnosis requires endoscopic biopsy. Artificial intelligence (AI) has been increasingly adopted in clinical practice, especially for image recognition and classification. Objective: This study aimed to evaluate the diagnostic test accuracy of AI for the prediction of H pylori infection using endoscopic images. Methods: Two independent evaluators searched core databases. The inclusion criteria included studies with endoscopic images of H pylori infection and with application of AI for the prediction of H pylori infection presenting diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Ultimately, 8 studies were identified. 
Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve of AI for the prediction of H pylori infection were 0.87 (95% CI 0.72-0.94), 0.86 (95% CI 0.77-0.92), 40 (95% CI 15-112), and 0.92 (95% CI 0.90-0.94), respectively, in the 1719 patients (385 patients with H pylori infection vs 1334 controls). Meta-regression identified methodological quality and the number of patients included in each study as sources of heterogeneity. There was no evidence of publication bias. The accuracy of the AI algorithm reached 82% for discrimination between noninfected images and posteradication images. Conclusions: An AI algorithm is a reliable tool for endoscopic diagnosis of H pylori infection. The limitations of lacking external validation and being conducted only in Asia should be overcome. Trial Registration: PROSPERO CRD42020175957; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=175957 UR - http://www.jmir.org/2020/9/e21983/ UR - http://dx.doi.org/10.2196/21983 UR - http://www.ncbi.nlm.nih.gov/pubmed/32936088 ID - info:doi/10.2196/21983 ER - TY - JOUR AU - Rankin, Debbie AU - Black, Michaela AU - Flanagan, Bronac AU - Hughes, F. Catherine AU - Moore, Adrian AU - Hoey, Leane AU - Wallace, Jonathan AU - Gill, Chris AU - Carlin, Paul AU - Molloy, M. Anne AU - Cunningham, Conal AU - McNulty, Helene PY - 2020/9/16 TI - Identifying Key Predictors of Cognitive Dysfunction in Older People Using Supervised Machine Learning Techniques: Observational Study JO - JMIR Med Inform SP - e20995 VL - 8 IS - 9 KW - classification KW - supervised machine learning KW - cognition KW - diet KW - aging KW - geriatric assessment N2 - Background: Machine learning techniques, specifically classification algorithms, may be effective to help understand key health, nutritional, and environmental factors associated with cognitive function in aging populations.
Objective: This study aims to use classification techniques to identify the key patient predictors that are considered most important in the classification of poorer cognitive performance, which is an early risk factor for dementia. Methods: Data were used from the Trinity-Ulster and Department of Agriculture study, which included detailed information on sociodemographic, clinical, biochemical, nutritional, and lifestyle factors in 5186 older adults recruited from the Republic of Ireland and Northern Ireland, a proportion of whom (987/5186, 19.03%) were followed up 5-7 years later for reassessment. Cognitive function at both time points was assessed using a battery of tests, including the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), with a score <70 classed as poorer cognitive performance. This study trained 3 classifiers (decision trees, Naïve Bayes, and random forests) to classify the RBANS score and to identify key health, nutritional, and environmental predictors of cognitive performance and cognitive decline over the follow-up period. It assessed their performance, taking note of the variables that were deemed important for the optimized classifiers for their computational diagnostics. Results: In the classification of a low RBANS score (<70), our models performed well (F1 score range 0.73-0.93), all highlighting the individual's score from the Timed Up and Go (TUG) test, the age at which the participant stopped education, and whether or not the participant's family reported memory concerns to be of key importance. The classification models performed well in classifying a greater rate of decline in the RBANS score (F1 score range 0.66-0.85), also indicating the TUG score to be of key importance, followed by blood indicators: plasma homocysteine, vitamin B6 biomarker (plasma pyridoxal-5-phosphate), and glycated hemoglobin.
Conclusions: The results suggest that it may be possible for a health care professional to make an initial evaluation, with a high level of confidence, of the potential for cognitive dysfunction using only a few short, noninvasive questions, thus providing a quick, efficient, and noninvasive way to help them decide whether or not a patient requires a full cognitive evaluation. This approach has the potential benefits of making time and cost savings for health service providers and avoiding stress created through unnecessary cognitive assessments in low-risk patients. UR - http://medinform.jmir.org/2020/9/e20995/ UR - http://dx.doi.org/10.2196/20995 UR - http://www.ncbi.nlm.nih.gov/pubmed/32936084 ID - info:doi/10.2196/20995 ER - TY - JOUR AU - Zhang, Liang AU - Qu, Yue AU - Jin, Bo AU - Jing, Lu AU - Gao, Zhan AU - Liang, Zhanhua PY - 2020/9/16 TI - An Intelligent Mobile-Enabled System for Diagnosing Parkinson Disease: Development and Validation of a Speech Impairment Detection System JO - JMIR Med Inform SP - e18689 VL - 8 IS - 9 KW - Parkinson disease KW - speech disorder KW - remote diagnosis KW - artificial intelligence KW - mobile phone app KW - mobile health N2 - Background: Parkinson disease (PD) is one of the most common neurological diseases. At present, because the exact cause is still unclear, accurate diagnosis and progression monitoring remain challenging. In recent years, exploring the relationship between PD and speech impairment has attracted widespread attention in the academic world. Most of the studies successfully validated the effectiveness of some vocal features. Moreover, the noninvasive nature of speech signal–based testing has pioneered a new way for telediagnosis and telemonitoring. In particular, there is an increasing demand for artificial intelligence–powered tools in the digital health era. Objective: This study aimed to build a real-time speech signal analysis tool for PD diagnosis and severity assessment.
Further, the underlying system should be flexible enough to integrate any machine learning or deep learning algorithm. Methods: At its core, the system we built consists of two parts: (1) speech signal processing: both traditional and novel speech signal processing technologies have been employed for feature engineering, which can automatically extract a few linear and nonlinear dysphonia features, and (2) application of machine learning algorithms: some classical regression and classification algorithms from the machine learning field have been tested; we then chose the most efficient algorithms and relevant features. Results: Experimental results showed that our system had an outstanding ability to both diagnose and assess the severity of PD. By using both linear and nonlinear dysphonia features, the accuracy reached 88.74% and recall reached 97.03% in the diagnosis task. Meanwhile, the mean absolute error was 3.7699 in the assessment task. The system has already been deployed within a mobile app called No Pa. Conclusions: This study performed diagnosis and severity assessment of PD from the perspective of speech disorder detection. The efficiency and effectiveness of the algorithms indirectly validated the practicality of the system. In particular, the system reflects the need for a publicly accessible PD diagnosis and assessment system that can perform telediagnosis and telemonitoring of PD. This system can also optimize doctors' decision-making processes regarding treatments. UR - http://medinform.jmir.org/2020/9/e18689/ UR - http://dx.doi.org/10.2196/18689 UR - http://www.ncbi.nlm.nih.gov/pubmed/32936086 ID - info:doi/10.2196/18689 ER - TY - JOUR AU - Shen, Jiayi AU - Chen, Jiebin AU - Zheng, Zequan AU - Zheng, Jiabin AU - Liu, Zherui AU - Song, Jian AU - Wong, Yi Sum AU - Wang, Xiaoling AU - Huang, Mengqi AU - Fang, Po-Han AU - Jiang, Bangsheng AU - Tsang, Winghei AU - He, Zonglin AU - Liu, Taoran AU - Akinwunmi, Babatunde AU - Wang, Chiu Chi AU - Zhang, P.
Casper J. AU - Huang, Jian AU - Ming, Wai-Kit PY - 2020/9/15 TI - An Innovative Artificial Intelligence-Based App for the Diagnosis of Gestational Diabetes Mellitus (GDM-AI): Development Study JO - J Med Internet Res SP - e21573 VL - 22 IS - 9 KW - AI KW - application KW - disease diagnosis KW - maternal health care KW - artificial intelligence KW - app KW - women KW - rural KW - innovation KW - diabetes KW - gestational diabetes KW - diagnosis N2 - Background: Gestational diabetes mellitus (GDM) can cause adverse consequences to both mothers and their newborns. However, pregnant women living in low- and middle-income areas or countries often fail to receive early clinical interventions at local medical facilities due to restricted availability of GDM diagnosis. The outstanding performance of artificial intelligence (AI) in disease diagnosis in previous studies demonstrates its promising applications in GDM diagnosis. Objective: This study aims to investigate the implementation of a well-performing AI algorithm for GDM diagnosis in a setting that requires less medical equipment and fewer staff, and to establish an app based on the AI algorithm. This study also explores possible progress if our app is widely used. Methods: An AI model that included 9 algorithms was trained on 12,304 pregnant outpatients, with their consent, who received a test for GDM in the obstetrics and gynecology department of the First Affiliated Hospital of Jinan University, a local hospital in South China, between November 2010 and October 2017. GDM was diagnosed according to American Diabetes Association (ADA) 2011 diagnostic criteria. Age and fasting blood glucose were chosen as critical parameters. For validation, we performed k-fold cross-validation (k=5) for the internal dataset and an external validation dataset that included 1655 cases from the Prince of Wales Hospital, the affiliated teaching hospital of the Chinese University of Hong Kong, a non-local hospital.
Accuracy, sensitivity, and other criteria were calculated for each algorithm. Results: The areas under the receiver operating characteristic curve (AUROC) of the external validation dataset for support vector machine (SVM), random forest, AdaBoost, k-nearest neighbors (kNN), naive Bayes (NB), decision tree, logistic regression (LR), eXtreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT) were 0.780, 0.657, 0.736, 0.669, 0.774, 0.614, 0.769, 0.742, and 0.757, respectively. SVM also retained high performance on the other criteria. The specificity for SVM remained at 100% in the external validation set, with an accuracy of 88.7%. Conclusions: Our prospective and multicenter study is the first clinical study that supports GDM diagnosis for pregnant women in resource-limited areas, using only the fasting blood glucose value, patients' age, and a smartphone connected to the internet. Our study showed that SVM can achieve accurate diagnosis with lower operation cost and higher efficacy. Our study (referred to as the GDM-AI study, ie, the study of AI-based diagnosis of GDM) also shows that our app has a promising future in improving the quality of maternal health for pregnant women, precision medicine, and long-distance medical care. We recommend that future work expand the dataset scope and replicate the process to validate the performance of the AI algorithms. UR - https://www.jmir.org/2020/9/e21573 UR - http://dx.doi.org/10.2196/21573 UR - http://www.ncbi.nlm.nih.gov/pubmed/32930674 ID - info:doi/10.2196/21573 ER -