Journal Description

JMIR Medical Informatics (JMI, ISSN 2291-9694, Journal Impact Factor 3.8) (Editor-in-chief: Arriel Benis, PhD, FIAHSI) is an open-access journal that focuses on the challenges and impacts of clinical informatics, digitalization of care processes, clinical and health data pipelines from acquisition to reuse, including semantics, natural language processing, natural interactions, meaningful analytics and decision support, electronic health records, infrastructures, implementation, and evaluation (see Focus and Scope).

JMIR Medical Informatics adheres to rigorous quality standards, involving a rapid and thorough peer-review process, professional copyediting, and professional production of PDF, XHTML, and XML proofs.

The journal is indexed in MEDLINE, PubMed, PubMed Central, DOAJ, Scopus, and the Science Citation Index Expanded (SCIE).

JMIR Medical Informatics received a Journal Impact Factor of 3.8 (Source: Journal Citation Reports 2025 from Clarivate).

JMIR Medical Informatics received a Scopus CiteScore of 7.7 (2024), placing it in the 79th percentile (#32 of 153) as a Q1 journal in the field of Health Informatics.

 

Recent Articles:

  • Source: Freepik; Copyright: DC Studio; URL: https://www.freepik.com/free-photo/african-american-patient-signing-admission-form-into-private-clinic-hospital-front-desk-reception-healthcare-center-receptionist-holding-clipboard-man-sign-before-doctor-appointment_30472460.htm; License: Licensed by JMIR.

    An Artificial Intelligence–Based Framework for Predicting Emergency Department Overcrowding: Development and Evaluation Study

    Abstract:

    Background: Emergency department (ED) overcrowding remains a critical challenge, leading to delays in patient care and increased operational strain. Current hospital management strategies often rely on reactive decision-making, addressing congestion only after it occurs. However, effective patient flow management requires early identification of overcrowding risks to allow timely interventions. Machine learning (ML)–based predictive modeling offers a solution by forecasting key patient flow measures, such as waiting count, enabling proactive resource allocation and improved hospital efficiency. Objective: The aim of this study is to develop ML models that predict ED waiting room occupancy (waiting count) at 2 temporal resolutions. The first approach is the hourly prediction model, which estimates the waiting count exactly 6 hours ahead at each prediction time (eg, a 1 PM prediction forecasts 7 PM). The second approach is the daily prediction model, which forecasts the average waiting count for the next 24-hour period (eg, a 5 PM prediction estimates the following day’s average). These predictive tools support resource allocation and help mitigate overcrowding by enabling proactive interventions before congestion occurs. Methods: Data from a partner hospital’s ED in the southeastern United States were used, integrating internal and external sources. Eleven different ML algorithms, ranging from traditional approaches to deep learning architectures, were systematically trained and evaluated on both hourly and daily predictions to determine the models that achieved the lowest prediction error. Experiments optimized feature combinations, and the best models were tested under high patient volume and across different hours to assess temporal accuracy. Results: The best hourly prediction performance was achieved by time series vision transformer plus (TSiTPlus) with a mean absolute error (MAE) of 4.19 and a mean squared error (MSE) of 29.36. 
The overall hourly waiting count had a mean of 18.11 and an SD (σ) of 9.77. Prediction accuracy varied by time of day, with the lowest MAE at 11 PM (2.45) and the highest at 8 PM (5.45). Extreme case analysis at (mean + 1σ), (mean + 2σ), and (mean + 3σ) resulted in MAEs of 6.16, 10.16, and 15.59, respectively. For daily predictions, an explainable convolutional neural network plus (XCMPlus) achieved the best results with an MAE of 2.00 and an MSE of 6.64. The daily waiting count had a mean of 18.11 and an SD of 4.51. Both models outperformed traditional forecasting approaches across multiple evaluation metrics. Conclusions: The proposed prediction models effectively forecast ED waiting count at both hourly and daily intervals. The results demonstrate the value of integrating diverse data sources and applying advanced modeling techniques to support proactive resource allocation decisions. The implementation of these forecasting tools within hospital management systems has the potential to improve patient flow and reduce overcrowding in emergency care settings. The code is available in our GitHub repository.
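
The error metrics reported in this abstract can be made concrete with a small sketch. The function names and toy numbers below are illustrative assumptions and are not taken from the study's repository; the sketch only shows how MAE and MSE are computed and how a 6-hour-ahead target lines up with its prediction time.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted counts."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference; penalizes large misses more than MAE does."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pair_with_horizon(hourly_counts: Sequence[float], horizon: int = 6) -> List[Tuple[int, float]]:
    """Pair each prediction hour t with the count observed `horizon` hours later.

    Hypothetical helper: a prediction made at hour index t is scored
    against hourly_counts[t + horizon], mirroring the "1 PM forecasts 7 PM" example.
    """
    return [(t, hourly_counts[t + horizon]) for t in range(len(hourly_counts) - horizon)]
```

Because MSE squares each error, a few badly missed peak hours raise it much more than they raise MAE, which is one reason abstracts like this one often report both.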

  • Prompt: Generate a realistic image of a person with bipolar disorder wearing a smartwatch centered in the frame. Surround it with icons for heart rate, steps, and sleep. No text. Source: Image created by the authors with Copilot; Copyright: N/A (AI-generated image); URL: https://medinform.jmir.org/2025/1/e66277/; License: Public Domain (CC0).

    Using Wearable Device and Machine Learning to Predict Mood Symptoms in Bipolar Disorder: Development and Usability Study

    Abstract:

    Background: Bipolar disorder (BD) is a highly recurrent disorder. Early detection, early intervention, and prevention of recurrent bipolar mood symptoms are key for better prognosis. Objective: This study aims to build prediction models for BD with machine learning algorithms. Methods: This study recruited 24 participants with BD. The Beck Depression Inventory (BDI) and Young Mania Rating Scale (YMRS) were used to evaluate depressive and manic episodes, respectively. Using digital biomarkers collected from wearable devices as input, six machine learning algorithms (Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest, Adaptive Boosting, and Extreme Gradient Boosting) were used to build predictive models. Results: The prediction model for depressive symptoms achieved 83% accuracy, 0.89 Area Under the Receiver Operating Characteristic curve (AUROC), and 0.65 F1 score on testing data. The prediction model for manic symptoms achieved 91% accuracy, 0.88 AUROC, and 0.25 F1 score on testing data. Using the interpretability method SHapley Additive exPlanations (SHAP), we found that relatively high resting heart rate, low activity, and lack of sleep may predict depressive symptoms. Conclusions: This study demonstrated that digital biomarkers could be used to predict depressive and manic symptoms. This prediction model may be beneficial for the early detection of mood symptoms, facilitating timely treatment and helping to prevent BD recurrence.

  • Source: Freepik; Copyright: rawpixel.com; URL: https://www.freepik.com/free-photo/donation-community-service-volunteer-support_16473232.htm; License: Licensed by JMIR.

    Holistic Influence of Multimodal Medical Crowdfunding Affordances on Charitable Crowdfunding Outcome: Systematic Multimodel Analysis Study

    Abstract:

    Background: Medical crowdfunding has emerged as a critical tool to alleviate the financial burden of health care costs, particularly in regions where economic disparities limit access to medical treatment. Despite its potential, the success rates of medical crowdfunding projects remain low, with only 9% achieving their fundraising goals in China. Previous research has examined isolated factors influencing success, but a holistic understanding of how multimodal affordances—narrativity, visibility, and progress—collectively impact donor behavior and project outcomes is lacking. Objective: This study aims to investigate how medical crowdfunding affordances, as an integrated system, influence the success of charitable crowdfunding projects. Specifically, it explores the roles of narrativity (textual elements), visibility (visual elements), and progress (dynamic updates) affordances, and how these interact with patient demographics to shape donor engagement and fundraising outcomes. Methods: A multimodal analysis was conducted using 1261 medical crowdfunding projects from the Shuidichou platform in China. Machine learning techniques (eg, sentiment analysis via SnowNLP) and regression models were used to examine textual content, visual elements, and progress updates. Control variables included patient age, gender, and beneficiary type. Hypotheses were tested using both continuous (success ratio) and binary (success indicator) measures of project success. In total, 6 models were constructed to examine the influences of affordances. Results: The study found that narrativity affordances—longer titles (model 1a: P=.04; model 3a: P=.03) and detailed surplus fund descriptions (P=.03)—boosted success, while overly lengthy surplus fund explanations had diminishing returns (P=.005). Disease mentions in titles increased donations (model 1a: P=.01; model 3a: P=.003). A neutral tone in the project plan also improved success (P<.001). 
For visibility affordances, a moderate number of progress photos maximized project success, while excessive visuals reduced impact (P<.001). Progress affordances followed a similar pattern, with a moderate number of updates enhancing success (P<.001). Critically, when all affordances were considered, only progress update frequency retained a strong inverted U-shaped effect on success (P<.001). Demographics, particularly age, also influenced donations: patients at both ends of the age spectrum received greater support, while middle-aged individuals received less (model 1b: P=.02; model 2b: P=.005; model 3b: P=.02). Conclusions: This study advances medical crowdfunding affordance theory by demonstrating the interconnected effects of narrativity, visibility, and progress affordances on project success. Practically, results highlight the importance of strategically crafted titles, targeted demographic disclosures, and balanced progress updates (with moderate update frequency being crucial when controlling for all affordances) to enhance donor engagement. Platform designers and project organizers can apply these insights to optimize fundraising outcomes and effectively address health care inequalities. Future research should further investigate visual content analysis and donor psychology to refine engagement strategies.
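
An inverted U-shaped effect such as the one reported here is commonly modeled by adding a squared term to a regression. The abstract does not give the exact specification, so the sketch below assumes a simple quadratic form, success = b0 + b1*x + b2*x^2 with b2 < 0, and the coefficient values used with it are hypothetical.

```python
def predicted_effect(x: float, intercept: float, linear_coef: float, squared_coef: float) -> float:
    """Value of the assumed quadratic model b0 + b1*x + b2*x**2 at x."""
    return intercept + linear_coef * x + squared_coef * x ** 2


def inverted_u_peak(linear_coef: float, squared_coef: float) -> float:
    """Number of updates at which the assumed quadratic model peaks.

    Setting the derivative b1 + 2*b2*x to zero gives x = -b1 / (2*b2).
    A negative squared-term coefficient is what makes the curve an inverted U.
    """
    if squared_coef >= 0:
        raise ValueError("an inverted U requires a negative squared-term coefficient")
    return -linear_coef / (2.0 * squared_coef)
```

With hypothetical coefficients b1 = 4.0 and b2 = -0.5, the peak falls at 4 updates; fewer or more updates both give a lower predicted value, which is the "moderate is best" pattern the abstract describes.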

  • Source: Freepik; Copyright: fabrikasimf; URL: https://www.freepik.com/free-photo/stethoscope-laptop-closeup_21125897.htm; License: Licensed by JMIR.

    Evidence for the Use of Patient-Reported Outcome Measures in the Treatment of Patients With Noncommunicable Diseases: Systematic Review

    Abstract:

    Background: The use of patient-reported outcome measures (PROMs) as a clinical tool for screening and decision-making has gained widespread interest, with numerous implementation activities across specialties, even though the evidence has not been clear until now. Objective: The aim of this study was to assess the evidence for using PROMs in clinical practice for patients with diabetes, chronic obstructive pulmonary disease (COPD), heart disease, rheumatoid arthritis (RA), and inflammatory bowel disease (IBD). Additionally, we sought to determine the characteristics of the most effective PROM interventions. Methods: We conducted a systematic review of published randomized controlled trials (RCTs) on the use of PROMs for clinical purposes, such as systematic PROM assessment alone or with a predefined PROM-based decision-making method. Eligible studies included adult patients (>18 years) with diabetes, COPD, heart disease, RA, or IBD. We excluded studies using PROMs as an outcome measure or otherwise not meeting the inclusion criteria. We searched the PubMed/MEDLINE, CINAHL, EMBASE, and Web of Science databases until February 2023. Two investigators independently screened titles, abstracts, and relevant full texts. Three investigators completed data extraction and risk-of-bias assessment using version 2 of the Cochrane risk-of-bias tool for randomized trials (RoB 2). The data were presented in a narrative synthesis and in summarized form. Results: The search yielded 21,203 papers, 686 (3.2%) full-text papers were screened, and 56 (8.2%) original studies were included in the review. The studies included patients with heart disease (n=17, 30.4%), COPD (n=13, 23.2%), diabetes (n=10, 17.9%), IBD (n=9, 16.1%), and RA (n=6, 10.7%), as well as patients with mixed diagnoses (n=1, 1.8%). All interventions incorporated systematic PROM assessments. 
Some interventions additionally used a predefined method for PROM-based decision-making (n=19, 33.9%) or PROM-based dialogue (n=9, 16.1%), while 5 (8.9%) interventions aimed to substitute face-to-face consultations. The predominant mode of PROM administration was over the phone, followed by electronic devices and apps. Endpoints included disease activity, health care use, mortality, mental well-being, quality of life, self-efficacy, self-care, daily functioning, and other outcomes. Six studies with a low risk of bias demonstrated a positive effect or noninferiority of the PROM intervention. Conclusions: The evidence base for clinical use of PROMs is sparse, with few studies evaluated to have a low or a medium risk of bias. The clinical use of PROMs does not appear superior to usual care in the five included chronic diseases on any endpoint. To guide further research, we highlighted 6 (10.7%) studies with a low risk of bias and PROM interventions with a positive effect. These were characterized by symptom assessment with predefined cutoffs used for decision and dialogue support. Trial Registration: PROSPERO CRD42021226896; https://www.crd.york.ac.uk/PROSPERO/view/CRD42021226896

  • Source: Freepik; Copyright: prostooleh; URL: https://www.freepik.com/free-photo/skillful-nurse-is-doing-blood-test-man-clinic-man-medical-mask_17066434.htm; License: Licensed by JMIR.

    Real-Time Estimation of Arterial Partial Pressure of Carbon Dioxide in Patients Undergoing General Anesthesia: Predictive Modeling Study

    Abstract:

    Background: Adequate ventilation in mechanically ventilated patients during general anesthesia is contingent upon the monitoring of the arterial partial pressure of carbon dioxide (PaCO2). Despite its significance, continuous monitoring remains challenging due to the limitations of intermittent invasive measurements, such as arterial blood gas analysis (ABGA), and the imprecision of estimates based on end-tidal carbon dioxide (ETCO2). The often unpredictable gradient between PaCO2 and ETCO2 may lead to misinterpretation of a patient’s ventilatory status, potentially resulting in adverse outcomes. Objective: This study aimed to develop a machine learning model to continuously estimate PaCO2 in mechanically ventilated patients using a comprehensive set of readily available noninvasive parameters, thereby improving intraoperative monitoring accuracy under general anesthesia. Methods: This retrospective study used the VitalDB dataset, a public database from Seoul National University Hospital, containing records from 6,388 noncardiac surgery patients between August 2016 and June 2017. After applying inclusion and exclusion criteria, data from 2,304 surgical cases, yielding 4,651 PaCO2 measurement event points, were included in this analysis. The CatBoostRegressor model was employed to predict PaCO2. A total of 19 noninvasive features were used, comprising intraoperative vital signs such as ETCO2, body temperature, the SpO2/FiO2 ratio, respiratory rate, and airway pressures, along with preoperative clinical information including age, gender, and pulmonary function test results. The model’s performance was evaluated using a nested cross-validation scheme to ensure robust and generalizable results. Performance was assessed across hypocapnic (<35 mmHg), normocapnic (35-45 mmHg), and hypercapnic (>45 mmHg) subgroups and compared to two conventional baseline methods: a simple offset (ETCO2 + 5 mmHg) and linear regression with ETCO2 as the sole predictor. 
Results: The developed model demonstrated superior overall performance compared to both traditional estimation methods. It achieved a mean absolute error (MAE) of 2.38 mmHg and a root mean squared error (RMSE) of 3.26 mmHg. The model showed excellent agreement with actual PaCO2 measurements, with an average intraclass correlation coefficient (ICC) of 0.87 (95% CI: 0.86-0.87). In terms of clinical utility, 90.02% of the model’s estimations fell within the highly acceptable range of ±5 mmHg error, a substantial improvement from the 80.43% achieved by the linear regression baseline. Furthermore, clinically unacceptable errors (> ±10 mmHg) were reduced to 1.20%, less than half the rate of the baseline model (2.64%). These performance improvements were consistently observed across all PaCO2 subgroups, including the more challenging hypocapnic and hypercapnic ranges. Conclusions: The developed machine learning-based model provides more accurate and reliable estimates of PaCO2 than traditional ETCO2-based methods. This approach shows potential for enhancing continuous respiratory monitoring, facilitating timely and precise clinical interventions, and serving as a valuable supplementary tool for anesthetic management. Further validation, including prospective studies to assess its impact on clinical decision-making and patient outcomes, is necessary to fully realize its clinical integration.
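
The clinical-utility figures in this abstract (share of estimates within ±5 mmHg, share beyond ±10 mmHg) and the simple offset baseline can be sketched as follows. The function names and sample values are illustrative assumptions; only the offset rule (ETCO2 + 5 mmHg) and the two thresholds come from the abstract.

```python
from typing import Sequence


def offset_baseline(etco2_mmhg: float, offset_mmhg: float = 5.0) -> float:
    """Conventional estimate: PaCO2 is approximated as ETCO2 plus a fixed offset."""
    return etco2_mmhg + offset_mmhg


def share_within(actual: Sequence[float], estimated: Sequence[float], tolerance_mmhg: float) -> float:
    """Fraction of estimates whose absolute error is at most the tolerance."""
    if len(actual) == 0 or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) <= tolerance_mmhg)
    return hits / len(actual)


def share_beyond(actual: Sequence[float], estimated: Sequence[float], tolerance_mmhg: float) -> float:
    """Fraction of estimates whose absolute error exceeds the tolerance."""
    if len(actual) == 0 or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    misses = sum(1 for a, e in zip(actual, estimated) if abs(a - e) > tolerance_mmhg)
    return misses / len(actual)
```

Calling `share_within(measured, estimates, 5.0)` and `share_beyond(measured, estimates, 10.0)` on paired measured and estimated PaCO2 values yields the two kinds of percentages the abstract reports for the model and the baselines.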

  • Source: Freepik; Copyright: tirachardz; URL: https://www.freepik.com/free-photo/young-asian-pregnant-woman-drawing-baby-belly-notebook-mom-feeling-happy-smiling-positive-peaceful-while-take-care-child-lying-sofa-living-room-home_6139091.htm; License: Licensed by JMIR.

    Interpretable Machine Learning for Predicting Adverse Pregnancy Outcomes in Gestational Diabetes: Retrospective Cohort Study

    Abstract:

    Background: Gestational diabetes mellitus (GDM) affects over 5% of pregnancies globally, elevating risks of type 2 diabetes postpartum and complications such as fetal death, miscarriage, and congenital abnormalities. Effective GDM management is essential to balance glycemic control and pregnancy outcomes. Objective: To develop interpretable machine learning models using GDM datasets for predicting adverse pregnancy outcomes and identifying key factors through the SHAP algorithm, thus supporting improved maternal and infant health. Methods: Data preprocessing and feature selection were performed, with ADASYN used to address class imbalance. Classification models, including logistic regression, random forest, SVM, and XGBoost, were built and enhanced through a stacking method. Model interpretability was assessed with SHAP to quantify feature contributions. Results: Among 1,670 patients, 200 experienced adverse outcomes. The stacked model achieved 97.9% accuracy and an AUC of 0.96 in external validation. SHAP analysis highlighted key predictive factors, such as gestational age, fasting glucose, and blood pressure, supporting model reliability. Conclusions: This study underscores the potential of machine learning in predicting adverse outcomes in GDM, with interpretable features offering valuable clinical insights to enhance pregnancy management and maternal-infant health.

  • Source: Pexels; Copyright: Kaboompics.com; URL: https://www.pexels.com/photo/a-female-doctor-having-a-video-call-7195310/; License: Licensed by JMIR.

    Assessing Internet Quality Across Public Health Centers in Indonesia: Cross-Sectional Evaluation Study

    Abstract:

    Background: Primary health care centers (Puskesmas) serve as the cornerstone of Indonesia’s healthcare system, providing integrated services aimed at improving individual health through prevention, treatment, and health promotion. To fulfill these roles effectively, robust technological infrastructure, particularly reliable internet connectivity, is increasingly essential. Assessing the availability and quality of internet access in Puskesmas is therefore a critical step in understanding their readiness to implement digital health initiatives and fulfill their responsibilities in delivering accessible and effective healthcare services. Objective: This study provides a national baseline assessment of internet quality and the related information technology infrastructure in over 10,000 Puskesmas across Indonesia. Methods: A cross-sectional survey of all 10,382 Puskesmas in Indonesia’s 34 provinces was conducted using an online questionnaire. Responses were categorized by internet quality level. Results: A total of 10,378 Puskesmas (99.96%) participated in this study. Of these, 745 (7.18%) had no internet access, 1,487 (14.33%) had limited internet access, 5,567 (53.64%) had sufficient internet access, and 2,579 (24.85%) had sufficient and fast internet access. Moreover, 832 Puskesmas (8.02%) did not have 24-hour electricity, 44,196 (43.7%) had CPUs with i3 specifications, 43,044 (42.56%) had 512 GB of hard disk capacity, and 67,272 (66.5%) used antivirus software. Conclusions: Although 79% of Puskesmas in Indonesia already had sufficient internet access, 21% still had limited or no access. To ensure universal internet availability, it is essential to build collaborative support among internet providers and government to foster the availability and utilisation of internet satellites, high-quality computers, and electrical power to support internet connectivity.

  • Source: Freepik; Copyright: rawpixel.com; URL: https://www.freepik.com/free-photo/hand-medical-glove-pointing-virtual-screen-medical-technology_15606707.htm#; License: Licensed by JMIR.

    Lessons Learned From Building a Data Platform for Longitudinal, Analytical Use Cases and Scaling to 77 German Hospitals: Implementation Report

    Abstract:

    Background: Increasing adoption of electronic health records (EHR) enables research on real-world data. In Germany, this has been limited to university hospitals; data from acute care hospitals below university level are lacking. To address this gap, we initiated the Helios Safe Medical Data-Platform (HeSaMeDa), which aggregates and standardizes pseudonymised EHR data with patients’ consent. Objective: To report on the design, implementation, patient participation, and lessons learned during the scaling of a research platform to incorporate consented real-world data from 77 distinct hospitals into a unified data lake. Methods: Due to variations in EHR adoption, IT infrastructure, software vendors, interface availability, and regulatory requirements, we used an agile development cycle that involves constant, incremental standardization of data. We implemented a layered lambda infrastructure built on Apache Hadoop. Decentralized connectors ensure data minimization and pseudonymization. Results: We successfully scaled our data model both laterally and horizontally across 77 hospitals. However, we encountered issues during the scaling of real-time data pipelines and IHE interfaces. During the first 2 years, patients were asked to consent to secondary data use 1,475,244 times during inpatient admission. We registered 1,023,633 broad consents (consent rate: 70.2%). Conclusions: Patients are generally willing to provide consent for secondary use of their data, but obtaining consent requires considerable effort. Building a research data platform is not an end goal, but rather a necessary step in collecting and standardizing longitudinal data to enable research on real-world data. Through the combination of agile development, phased rollouts, and very high levels of automation, we have been able to achieve fast turnaround times for incorporating user feedback and are constantly improving data quality and standardization.

  • Source: Freepik; Copyright: rawpixel.com; URL: https://www.freepik.com/free-photo/medical-examination-report-history-history_17056893.htm; License: Licensed by JMIR.

    Performance of Natural Language Processing for Information Extraction From Electronic Health Records Within Cancer: Systematic Review

    Abstract:

    Background: Over the last decade, natural language processing (NLP) has provided various solutions for information extraction (IE) from textual clinical data. In recent years, the use of NLP in cancer research has gained considerable attention, with numerous studies exploring the effectiveness of various NLP techniques for identifying and extracting cancer-related entities from clinical text data. Objective: We aimed to summarize the performance differences between various NLP models for IE within the context of cancer to provide an overview of the relative performance of existing models. Methods: This systematic literature review was conducted using three databases (PubMed, Scopus, and Web of Science) to search for articles extracting cancer-related entities from clinical texts. Thirty-three articles were eligible for inclusion. We extracted NLP models and their performance as F1 scores. Each model was assigned to one of the following categories: Rule-based, Traditional Machine Learning, CRF-based, Neural Network, and Bidirectional transformer. The average performance difference for each pair of categories was calculated across all articles. Results: The articles covered various scenarios, with the best performance in each article ranging from 0.355 to 0.985 in F1 score. Looking at the overall relative performances, the bidirectional transformer category outperformed every other category (by between 0.0439 and 0.2335 on average F1 score). The percentage of articles implementing bidirectional transformers has increased over the years. Conclusions: NLP has demonstrated the ability to identify and extract cancer-related entities from unstructured textual data. Generally, more advanced models outperform less advanced ones. The bidirectional transformer category performed the best.
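
The comparison described in this abstract rests on two simple calculations: the F1 score itself and an average of between-category F1 differences across articles. The sketch below illustrates both; the data layout (one dictionary of category to F1 per article) and all numbers are assumptions for illustration, not the review's actual extraction sheet or procedure.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_differences(articles: List[Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Average F1 difference for each pair of categories.

    For every article that reports both categories a and b (a sorted before b),
    record F1[a] - F1[b], then average those differences over articles.
    """
    diffs: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for scores in articles:
        for a, b in combinations(sorted(scores), 2):
            diffs[(a, b)].append(scores[a] - scores[b])
    return {pair: sum(values) / len(values) for pair, values in diffs.items()}
```

Comparing categories only within the same article, as this sketch does, keeps the dataset and task fixed for each difference, so a category is not rewarded merely for having been tested on easier corpora.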

  • Source: Pixabay; Copyright: Pexels; URL: https://pixabay.com/photos/laptop-apple-computer-desk-macbook-1846277/; License: Licensed by JMIR.

    Extracting Symptoms of Complex Conditions From Online Discourse (Subreddit to Symptomatology): Lexicon-Based Approach

    Abstract:

    Background: Millions of people affected with complex medical conditions with diverse symptoms often turn to online discourse to share their experiences. While some studies have explored natural language processing methods and medical information extraction tools, these typically focus on generic symptoms in clinical notes and struggle to identify patient-reported, disease-specific, subtle symptoms from online health discourse. Objective: We aimed to extract patient-reported, disease-specific symptoms shared on social media reflecting the lived experiences of thousands of affected individuals and explore the characteristics, prevalence, and occurrence patterns of the symptoms. Methods: We propose a lexicon-based symptom extraction (LSE) method to identify a diverse list of disease-specific, patient-reported symptoms. We initially used a large language model to accelerate the extraction of symptom-related key phrases that formed the lexicon. We evaluated the effectiveness of lexicon extraction against human annotation using a Jaccard index score. We then leveraged BERT-Base, BioBERT, and Phrase-BERT–based embeddings to learn representations of these symptom-related key phrases and cluster similar symptoms using k-means and hierarchical density-based spatial clustering of applications with noise (HDBSCAN). Among the different options explored in our experiments, BioBERT-based k-means clustering was found to be the most effective. Finally, we applied symptom normalization to eliminate duplicate and redundant entries in the comprehensive symptom list. Results: In a real-world polycystic ovary syndrome (PCOS) subreddit dataset, we found that LSE significantly outperformed state-of-the-art baselines, achieving at least 41% and 20% higher F1-scores (mean 86.10) than automatic medical extraction tools and large language models, respectively. 
Notably, the comprehensive list of 64 PCOS symptoms generated via LSE ensured extensive coverage of symptoms reported in 7 reputable eHealth forums. Analyzing PCOS symptomatology revealed 28 potentially emerging symptoms and 8 self-reported comorbidities co-occurring with PCOS. Conclusions: The comprehensive patient-reported, disease-specific symptom list can help patients and health practitioners resolve uncertainties surrounding the disease, eliminating the variability of PCOS symptoms prevailing in the community. Analyzing PCOS symptomatology across varied dimensions provides valuable insights for public health research.
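
The Jaccard index used in this abstract to compare the extracted lexicon with human annotation is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the normalization step (lowercasing and trimming) and the example phrases are assumptions for illustration, not the authors' pipeline.

```python
from typing import Iterable, Set


def normalize(phrases: Iterable[str]) -> Set[str]:
    """Lowercase and trim phrases so trivial formatting differences do not count as mismatches."""
    return {p.strip().lower() for p in phrases if p.strip()}


def jaccard_index(extracted: Iterable[str], annotated: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two collections of key phrases.

    Returns 1.0 when both collections are empty, treating two empty sets as identical.
    """
    a, b = normalize(extracted), normalize(annotated)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)
```

For example, an extracted set of {"hair loss", "acne", "fatigue"} and an annotated set of {"acne", "fatigue", "weight gain"} share 2 of 4 distinct phrases, giving an index of 0.5.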

  • Source: The Authors; Copyright: The Authors; URL: https://medinform.jmir.org/2025/1/e64759/; License: Creative Commons Attribution (CC-BY).

    Enhancing Oral Health Diagnostics With Hyperspectral Imaging and Computer Vision: Clinical Dataset Study

    Abstract:

    Background: Diseases of the oral cavity, including oral squamous cell carcinoma (OSCC), pose major challenges to healthcare worldwide due to their late diagnosis and the complicated differentiation of oral tissues. Endoscopic hyperspectral imaging (eHSI) represents a promising approach to the demand for modern, non-invasive tissue diagnostics. This dataset is designed to enhance the performance of deep learning models by providing comprehensive spectral data essential for distinguishing between healthy and pathological oral tissue conditions. Objective: To develop and validate a clinical dataset of eHSI of the oral cavity and to evaluate the performance of deep learning-based semantic segmentation models for automated tissue classification. Methods: This clinical study included 226 participants (166 women, 60 men, aged 24-87). eHSI data were collected using an endoscopic hyperspectral sensor system, capturing spectral data in the range of 500-1000 nm. Each participant underwent five standardized intraoral hyperspectral scans of the cheek, palate, tongue, and teeth. RGB and eHSI images were archived in NPY format for Python analysis. Oral structures were annotated using RectLabel Pro©. DeepLabv3 with a ResNet-50 backbone was adapted for eHSI segmentation by modifying the first convolutional layer. The model was trained for 50 epochs on 70% of the dataset, with 30% for evaluation. Performance metrics (Precision, Recall, F1-score) confirmed its efficacy in distinguishing oral tissue types. Results: Preliminary analysis revealed that the Coefficient of Variation exceeded 15% for most spectral bands, indicating high variability in spectral signatures. DeepLabv3 (ResNet-101) achieved strong segmentation performance (F1-score: 0.856, Precision: 0.8506, Recall: 0.8634), excelling in key structures: mucosa (F1: 0.915), retractor (F1: 0.940), tooth (F1: 0.902), and palate (F1: 0.900). 
Moderate results for gingiva (F1: 0.760) and lip (F1: 0.706) suggest potential for further refinement. To assess dataset robustness, additional models were tested, including DeepLabv3 (ResNet-50), FCN (ResNet-50/101), PSPNet (ResNet-50/VGG16), and U-Net (EfficientNet-B0/ResNet-50). DeepLabv3 (ResNet-101) and U-Net (EfficientNet-B0) emerged as top performers, particularly in retractor, mucosa, and tooth segmentation. Conclusions: Conducting an in-depth hyperspectral analysis of oral tissue has facilitated the development of robust deep learning algorithms, thereby enhancing diagnostic accuracy and clinical applicability. This study integrates advanced imaging technologies with deep learning, offering significant potential to enhance non-invasive detection, classification, and therefore individualized treatment of oral diseases. Clinical Trial: Deutsche Forschungsgemeinschaft (DFG) - Projectnumber 516210826 https://gepris.dfg.de/gepris/projekt/516210826

  • Source: Freepik; Copyright: DC Studio; URL: https://www.freepik.com/free-photo/stomatolog-nurse-tooth-clinic-checking-patient-appointment-looking-computer-monitor-stomatology-assistant-teeth-doctor-discussing-reception-dental-office_17437513.htm; License: Licensed by JMIR.

    Using Machine Learning to Predict-Then-Optimize Elective Orthopedic Surgery Scheduling to Improve Operating Room Utilization: Retrospective Study

    Abstract:

    Background: Total knee and hip arthroplasty (TKA and THA) are among the most performed elective procedures. Rising demand and the resource-intensive nature of these procedures have contributed to longer wait times despite significant healthcare investment. Current scheduling methods often rely on average surgical durations, overlooking patient-specific variability. Objective: To determine the potential for improving elective surgery scheduling for total knee and hip arthroplasty (TKA and THA, respectively) by utilizing a two-stage approach that incorporates machine learning (ML) prediction of the duration of surgery (DOS) with scheduling optimization. Methods: Two ML models (one each for TKA and THA) were trained to predict DOS using patient factors based on 302,490 and 196,942 patients, respectively, from a large international database. Three optimization formulations based on varying surgeon flexibility were compared: Any (surgeons could operate in any operating room at any time), Split (limit of two surgeons per operating room per day), and MSSP (limit of one surgeon per operating room per day). Two years of daily scheduling simulations were performed for each optimization problem using ML-prediction or mean DOS over a range of schedule parameters. Constraints and resources were based on a high-volume arthroplasty hospital in Canada. Results: The TKA and THA prediction models achieved test accuracy (with a 30-minute buffer) of 78.1% (MSE 0.898) and 75.4% (MSE 0.916), respectively. The Any scheduling formulation performed significantly worse than the Split and MSSP formulations with respect to overtime and underutilization (P<.001). The latter two problems performed similarly (P>.05) over most schedule parameters. The ML-prediction schedules outperformed those generated using a mean DOS for most scheduling parameters, with overtime reduced on average by 300 to 500 minutes per week (12-20 minutes per operating room per day) (P<.001).
However, the ML-prediction schedules had more operating room (OR) underutilization, ranging from 70 to 192 additional minutes (P<.001). Using a 15-minute schedule granularity with a waitlist pool of at least one month generated the ML schedule that outperformed the mean schedule 97.1% of the time. Conclusions: Assuming a full waiting list, optimizing an individual surgeon’s elective operating room time using an ML-assisted predict-then-optimize scheduling system improves overall operating room efficiency, significantly decreasing overtime. This has significant potential implications for healthcare systems struggling with pressures of rising costs and growing operative waitlists.
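
The Results above report test accuracy "with a 30-minute buffer". One plausible reading, assumed here and not confirmed by the abstract, is that a predicted duration counts as correct when it lies within 30 minutes of the actual duration. A minimal Python sketch under that assumption, with a hypothetical function name and made-up durations:

```python
def buffered_accuracy(predicted, actual, buffer_minutes=30):
    """Fraction of cases where |predicted - actual| <= buffer_minutes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one case is required")
    hits = sum(abs(p - a) <= buffer_minutes for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Hypothetical durations of surgery in minutes (not study data).
predicted_dos = [95, 110, 80, 130]
actual_dos = [100, 150, 75, 105]

# Absolute errors are 5, 40, 5, and 25 minutes, so three of four cases are hits.
accuracy = buffered_accuracy(predicted_dos, actual_dos)
print(accuracy)
```

A tolerance-based accuracy like this is easier to relate to schedule slots than a squared-error figure, which may be why the abstract reports both.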

Latest Submissions Open for Peer-Review:

  • GPT-4o Powered Pre-Anesthetic AI: Development and Validation

    Date Submitted: Sep 10, 2025

    Open Peer Review Period: Sep 16, 2025 - Nov 11, 2025

    Background: Pre-anesthetic assessment is essential for identifying high-risk surgical patients and minimizing perioperative complications. However, conventional tools such as the American Society of Anesthesiologists (ASA) classification and postoperative nausea and vomiting (PONV) risk scores are limited by subjectivity and reliance on manual data input, reducing consistency and scalability. Objective: This study aimed to develop and retrospectively validate an artificial intelligence (AI)–enabled pre-anesthetic assessment system powered by GPT-4o. The system was designed to predict ASA physical status and PONV risk using structured and unstructured data from electronic medical records. Methods: A retrospective, single-center study was conducted at a medical center in Taiwan between January and May 2025. A total of 600 hospitalized surgical patients aged ≥18 years were selected using stratified random sampling. (For PONV, the primary analysis counted High risk as test-positive; a sensitivity analysis counted Moderate and High as positive.) Results: With National Health Insurance (NHI) data, agreement for ASA was near-perfect (κ=0.883); without NHI it was moderate (κ=0.518). For PONV (High=positive), the AI achieved sensitivity 34.7% (95% CI 22.9–48.7), specificity 99.1% (95% CI 97.9–99.6), and accuracy 93.8% (95% CI 91.6–95.5). The 2×3 association was significant (χ²(2)=169.25, p<0.001; Cramér’s V=0.531). Conclusions: The GPT-4o–powered AI system demonstrated robust validity in pre-anesthetic risk assessment. Incorporating comprehensive data sources, such as NHI datasets, significantly improved ASA prediction accuracy. These findings support the integration of large language model (LLM)–based tools into preoperative workflows, with potential to enhance decision support, optimize resource use, and advance smart healthcare delivery.
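
The reported effect size can be related to the reported chi-square statistic through the standard formula for Cramér's V, V = sqrt(χ² / (n · min(r − 1, c − 1))). The Python sketch below applies it to the 2×3 table, assuming that all 600 sampled patients contributed to that table (the abstract does not state the table's n explicitly).

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table with n observations."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * k))


# chi-square and table shape from the abstract; n = 600 is an assumption.
v = cramers_v(chi2=169.25, n=600, n_rows=2, n_cols=3)
print(round(v, 3))
```

Under that assumption the result rounds to the reported value of 0.531.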

  • User Satisfaction and Perceived Barriers to Implementation of Electronic Health Management Systems in Childcare in Ghana: A Systematic Review

    Date Submitted: Sep 7, 2025

    Open Peer Review Period: Sep 16, 2025 - Nov 11, 2025

    Background: In an era where technology is rapidly transforming healthcare, Electronic Health Management Information Systems (EHMIS) have emerged as critical tools for enhancing patient-centered care and healthcare management. Despite the potential benefits of EHMIS, including improved access to accurate patient information and reduced child morbidity and mortality rates, their implementation remains suboptimal due to barriers such as inadequate infrastructure, resistance from healthcare providers, and a lack of technical expertise. Objective: This systematic review aimed to evaluate user satisfaction with Electronic Health Management Information Systems (EHMIS) in pediatric healthcare settings in Ghana and identify implementation barriers. Methods: Following PRISMA guidelines, the review analyzed 17 studies (10 on user satisfaction, 7 on implementation barriers) selected from 626 initial records using the JBI critical appraisal checklist. Findings revealed moderate to high satisfaction levels among healthcare professionals (55-72%), with pediatric staff generally reporting higher satisfaction than other departments. Results: Key satisfaction factors included ease of retrieving child health information, improved data management, and enhanced system responsiveness. Implementation barriers encompassed technical challenges (unreliable power, poor connectivity, limited interoperability), human resource issues (insufficient training, staff resistance), system design limitations (inadequate pediatric-specific interfaces), and ethical concerns regarding child health data security. Conclusions: Recommendations include developing pediatric-specific EHMIS modules, enhancing user interfaces with input from pediatric professionals, implementing targeted training programs, improving infrastructure in pediatric departments, establishing data standards for interoperability, and developing mobile health extensions to improve child healthcare outcomes.

  • Health Knowledge Management Platform: A Requirement-based Evaluation of a Data-centric Approach for Patient Care and Research

    Date Submitted: Sep 5, 2025

    Open Peer Review Period: Sep 16, 2025 - Nov 11, 2025

    Background: In the evolving landscape of healthcare, data utilization plays an ever-increasing role in health care IT. However, data are often siloed, stored as uncoded free text, and distributed across several IT systems. This paper introduces a Health Knowledge Management Platform, designed to integrate, harmonize and enable re-use of health care and medical research data. The platform aims to bridge the gap between research and patient care, showcased through real-world scenarios, emphasizing data harmonization and knowledge management within a healthcare institution. The study is based at University Hospital Schleswig-Holstein (UKSH). Objective: The main objective of this project is to design, implement and evaluate a knowledge management platform that integrates health care and biomedical research to support use cases in both domains. Methods: The study describes the "health knowledge management platform" designed to access and gain knowledge from health care and medical research data. We performed several rounds of focus groups with stakeholders to elicit the platform requirements. In the process we identified key aspects of the platform. From the requirements we designed an architecture concept. The platform evaluation follows the Framework for Evaluation in Design Science Research (FEDS) and the ISO/IEC 25000 standard, with a focus on the key aspects identified and on real-world scenarios. Two application scenarios, cardiology and radiology, were selected for a naturalistic, qualitative evaluation. Results: We show that our Open Health Knowledge Management Platform is capable of integrating diverse data formats like HL7® V2 messages, CSV exports, and DICOM® imaging data. The platform is also capable of supporting different scenarios based on its five-layer architecture including a clinical data repository and services like Master Patient Index and Consent Management.
The evaluation showed our platform’s capability in certain real-world scenarios of cardiology and radiology. Our evaluation confirms the platform’s coverage of key points and requirements identified to support knowledge management in health care institutions. Conclusions: Our evaluation of the health knowledge management platform at UKSH reveals capabilities that may lead to better knowledge transfer between patient care and research. The platform's architecture and standardized data improve the quality of data and facilitate access to knowledge. Ongoing development and potential quantitative measures will further enhance its applicability and performance in dynamic health care landscapes. Clinical Trial: N/A

  • A Secure User Interface for Pre-Clinical Evaluation of Artificial Intelligence in Patient Portal Message Management

    Date Submitted: Sep 2, 2025

    Open Peer Review Period: Sep 15, 2025 - Nov 10, 2025

    Background: The growing use of artificial intelligence (AI) to support patient portal message management requires rigorous pre-clinical evaluation. Directly testing AI within electronic health record (EHR) systems poses significant safety, workflow, and data-governance risks. We developed and describe a secure user interface (UI) that enables clinical and technical teams to experiment with AI for portal messaging on de-identified data before clinical integration. Objective: We developed a secure user interface (UI) that enables clinical and technical teams to experiment with AI for portal messaging on de-identified data before clinical integration, which we describe here for others to adapt. Methods: We developed a web UI in Python 3 with a modular backend for data handling, de-identification, and AI task execution. The system runs in a secure research environment equipped with an NVIDIA GRID T4-1Q GPU and institutional access controls. An IRB-approved corpus of patient-portal messages was cleaned to a dementia-relevant subset of 6,941 Medical Advice Request messages. We designed a de-identification pipeline to remove or replace personal health identifiers. The platform supports single-message and batch workflows and exposes exemplar LLM-enabled tasks such as authorship identification, message categorization, criticality flagging, and response drafting using zero-, one-, and few-shot prompting. Only de-identified text is sent to the model endpoint. Results: The system successfully executed end-to-end workflows: ingest de-identified messages, run individual or batch AI analyses, and present outputs for review. PHI masking was consistently applied across the corpus. Exemplar runs demonstrated prompting strategies to yield interpretable outputs for authorship identification, categorization, and criticality flagging, while response drafting produced editable clinician starting points. 
A token-based cost readout provided transparent operating estimates for LLM-backed tasks. Conclusions: This framework offers a practical path to study AI behavior on real, de-identified messages without affecting live EHR workflows and thus supports exploratory testing, prompt iteration, and comparative analyses, including LLM prompts versus baseline models, while preserving governance boundaries. We discuss design choices, safety controls, and the limits of a sandbox approach. A secure, UI-based sandbox enables health-system teams to evaluate AI for patient-portal messaging before clinical integration.

  • Effects of Follow-Up Interval and Care Setting on Valve Severity Changes Between Successive Echocardiograms: Secondary Analysis of MIMIC-III EchoNotes Cohort

    Date Submitted: Aug 31, 2025

    Open Peer Review Period: Sep 15, 2025 - Nov 10, 2025

    Background: Echocardiography reports in routine care often encode clinical reasoning as ordinal valve-lesion severity. The EchoNotes database derived from MIMIC III harmonizes such report text into a reproducible ordinal schema across echocardiograms, enabling population-scale analysis of report-level change between successive exams. However, most prior work has focused on image-based automation (e.g., ejection fraction estimation) or label extraction from text, with far less attention to how these ordinal severities change over time. Critically, apparent change is confounded by follow-up interval (Δt) and care setting (inpatient vs outpatient), yet the literature offers few Δt-aware summaries or nonparametric, Δt-standardized setting comparisons. Objective: The study conducted a secondary data analysis of the de‑identified EchoNotes dataset to provide an evidence‑based description of how ordinal valve-lesion severity levels change between successive examinations and to determine how apparent change depends on Δt and care setting using observed data with uncertainty quantification. Methods: We analyzed the EchoNotes dataset, which contains 45,794 reports. For each patient and lesion of interest (aortic valve (AV) stenosis, AV regurgitation, mitral valve (MV) stenosis, and MV regurgitation), we chronologically ordered reports, formed successive within‑patient pairs, and retained evaluable ordinal states (0 = normal, 1 = mild, 2 = moderate, 3 = severe). We computed row‑normalized next‑visit transition matrices summarizing, for each baseline state, the distribution of the subsequent state. To make visit timing explicit, we stratified transitions by Δt (< 7, 7-30, 30-90, and ≥ 90 days). Care‑setting differences were estimated by assigning pairs to inpatient or outpatient status. Because settings differ in follow‑up timing, we produced a nonparametric Δt‑standardized contrast.
Within each baseline state, inpatient and outpatient pairs were reweighted to a shared Δt bin mix before computing difference matrices (inpatient minus outpatient). Uncertainty was quantified with a subject‑level bootstrap. Results: Left-sided lesion codes were frequently evaluable (AV regurgitation: 72.7%, AV stenosis: 65.2%, MV regurgitation: 67.8%, MV stenosis: 37.0%). The ordinal association analysis found the strongest correlation between right-ventricle dysfunction and chamber enlargement (ρ ≈ 0.62). In Δt-stratified next-visit transition matrices, diagonal persistence dominated short intervals, with off-diagonal movement rising at longer Δt (most for MV regurgitation and least for AV stenosis). Care-setting comparisons showed more movement among inpatients in the non-Δt-standardized matrices (e.g., 35% inpatient vs 53% outpatient for the moderate-to-moderate transition in AV stenosis), but Δt-standardization attenuated most differences, with residual contrasts largely confined to higher-severity AV regurgitation where inpatients were more likely to improve from worse states. Conclusions: We propose an evidence-based workflow that yields interpretable maps of how report-level echo severities behave in routine care. Our Δt-aware results separate visit scheduling from biology, support fairer comparisons across services, and provide practical priors for simulation, cohort design, and evaluation of future multimodal or AI systems against an observational baseline.
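
The row-normalized next-visit transition matrices described in the Methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: each patient's severities are assumed to be already in chronological order and restricted to evaluable states, the toy data are invented, and the stratification by follow-up interval and the bootstrap are omitted.

```python
from collections import Counter, defaultdict

STATES = (0, 1, 2, 3)  # 0 = normal, 1 = mild, 2 = moderate, 3 = severe


def transition_matrix(severity_by_patient):
    """Row-normalized next-visit transitions from per-patient ordered severities."""
    counts = defaultdict(Counter)
    for sequence in severity_by_patient.values():
        # Successive within-patient pairs: (visit k, visit k + 1).
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    matrix = {}
    for state in STATES:
        total = sum(counts[state].values())
        # A baseline state with no observed pairs yields a row of zeros.
        matrix[state] = [
            counts[state][nxt] / total if total else 0.0 for nxt in STATES
        ]
    return matrix


# Hypothetical severities for one lesion type (not study data).
toy = {"p1": [1, 1, 2], "p2": [1, 2, 2, 3], "p3": [0, 0]}
matrix = transition_matrix(toy)
for state in STATES:
    print(state, matrix[state])
```

Each row gives, for one baseline state, the observed distribution of the next state; the diagonal entries correspond to the "persistence" the Results mention.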

  • Patient Stratification for Improving Acute Chest Pain Management and Mitigate ED Crowding

    Date Submitted: Aug 27, 2025

    Open Peer Review Period: Sep 9, 2025 - Nov 4, 2025

    Objectives: Artificial intelligence (AI) models were developed to support clinical decision-making in the initial management of patients with acute chest pain. The models can accurately detect ACS, as well as rapidly and reliably identify low-risk patients based on a single hs-TnT test result. By integrating this AI-assisted strategy into clinical workflows, we aim to reduce ED length of stay, alleviate crowding, and ultimately improve healthcare efficiency while lowering medical expenditures. Methods: We conducted a retrospective study at a tertiary teaching hospital using data from 2015 to 2022. Models based on artificial neural networks (ANNs) were trained as multi-class classifiers that place patients into three classes: critical patients with ACS, critical patients without ACS, and non-critical patients. Model performance was evaluated using AUROC, AUPRC, and subgroup-specific metrics. Results: After excluding ED visits with missing triage data, incomplete medical histories, or unsuitable dispositions, 17,935 visits were included. All ANN models demonstrated strong AUROC performance, with one showing the highest AUPRC (0.946). Regarding multi-class classification, one classifier achieved the intended performance: it maintained high sensitivity (0.917) for identifying ACS patients, while achieving high PPV (0.901) and sensitivity (0.861) for identifying non-critical patients. Medically interpretable features that drive the predictive power were identified. Model performance supported the expansion of single-sample hs-TnT protocols. Conclusions: The ANN-based classifiers developed in this study offer effective clinical decision support for evaluating acute chest pain in the ED. By improving risk stratification and accurately identifying both high- and low-risk patients, the models can enhance diagnostic accuracy and help reduce ED overcrowding.