JMIR Publications

JMIR Medical Informatics

Clinical informatics, decision support for health professionals, electronic health records, and ehealth infrastructures.


Journal Description

JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and ehealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals.

Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2015: 4.532), JMIR Med Inform has a slightly different scope: it places more emphasis on applications for clinicians and health professionals than on consumers/citizens (the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.

JMIR Medical Informatics features a rapid and thorough peer-review process, professional copyediting, and professional production of PDF, XHTML, and XML proofs (ready for deposit in PubMed Central/PubMed). The site is optimized for mobile and iPad use.

JMIR Medical Informatics adheres to the same quality standards as JMIR, and all articles published here are also cross-listed in the Table of Contents of JMIR, the world's leading medical journal in health sciences / health services research and health informatics (


Recent Articles:

  • Clinician and patient viewing EMR data (Adobe stock photo).

    Progress in the Enhanced Use of Electronic Medical Records: Data From the Ontario Experience


    Background: This paper describes a change management strategy, including a self-assessment survey tool and electronic medical record (EMR) maturity model (EMM), developed to support the adoption and implementation of EMRs among community-based physicians in the province of Ontario, Canada. Objective: The aim of our study was to present an analysis of progress in EMR use in the province of Ontario based on data from surveys completed by over 4000 EMR users. Methods: The EMM and the EMR progress report (EPR) survey tool clarify levels of capability and expected benefits of improved use. Maturity is assessed on a 6-point scale (0-5) for 25 functions, across 7 functional areas, ranging from basic to more advanced. A total of 4214 clinicians completed EPR surveys between April 2013 and March 2016. Univariate and multivariate descriptive statistics were calculated to describe the survey results. Results: Physicians reported continual improvement over years of use, perceiving that the longer they used their EMR, the better patient care they provided. Those with at least two years of experience reported the greatest progress. Conclusions: From our analyses at this stage we identified: (1) a direct correlation between years of EMR use and EMR maturity as measured in our model, (2) a similar positive correlation between years of EMR use and the perception that these systems improve clinical care in at least four patient-centered areas, and (3) evidence of ongoing improvement even in advanced years of use. Future analyses will be supplemented by qualitative and quantitative data collected from field staff engagements as part of the new EMR practice enhancement program (EPEP).

  • Image sourced from and owned by the authors.

    Checking Questionable Entry of Personally Identifiable Information Encrypted by One-Way Hash Transformation


    Background: As one of the several effective solutions for personal privacy protection, a global unique identifier (GUID) is linked with hash codes that are generated from combinations of personally identifiable information (PII) by a one-way hash algorithm. On the GUID server, no PII is permitted to be stored, and only GUID and hash codes are allowed. The quality of PII entry is critical to the GUID system. Objective: The goal of our study was to explore a method of checking questionable entry of PII in this context without using or sending any portion of PII while registering a subject. Methods: According to the principle of GUID system, all possible combination patterns of PII fields were analyzed and used to generate hash codes, which were stored on the GUID server. Based on the matching rules of the GUID system, an error-checking algorithm was developed using set theory to check PII entry errors. We selected 200,000 simulated individuals with randomly-planted errors to evaluate the proposed algorithm. These errors were placed in the required PII fields or optional PII fields. The performance of the proposed algorithm was also tested in the registering system of study subjects. Results: There are 127,700 error-planted subjects, of which 114,464 (89.64%) can still be identified as the previous one and remaining 13,236 (10.36%, 13,236/127,700) are discriminated as new subjects. As expected, 100% of nonidentified subjects had errors within the required PII fields. The possibility that a subject is identified is related to the count and the type of incorrect PII field. For all identified subjects, their errors can be found by the proposed algorithm. The scope of questionable PII fields is also associated with the count and the type of the incorrect PII field. The best situation is to precisely find the exact incorrect PII fields, and the worst situation is to shrink the questionable scope only to a set of 13 PII fields. 
In the application, the proposed algorithm can give a hint of questionable PII entry and perform as an effective tool. Conclusions: The GUID system has high error tolerance and may correctly identify and associate a subject even with few PII field errors. Correct data entry, especially required PII fields, is critical to avoiding false splits. In the context of one-way hash transformation, the questionable input of PII may be identified by applying set theory operators based on the hash codes. The count and the type of incorrect PII fields play an important role in identifying a subject and locating questionable PII fields.
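
The set-based idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the field names, the use of SHA-256, the choice of three-field combinations, and the function names `hash_codes` and `questionable_fields` are all assumptions made for illustration. Only hash codes are compared, so no PII has to be sent or stored.

```python
import hashlib
from itertools import combinations

# Hypothetical PII fields; the real GUID system's fields are not listed here.
FIELDS = ("first_name", "last_name", "birth_date", "birth_city")


def hash_codes(record, size=3):
    """Map each combination of `size` fields to a one-way hash of its values."""
    codes = {}
    for combo in combinations(FIELDS, size):
        text = "|".join(str(record[f]).strip().lower() for f in combo)
        codes[combo] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return codes


def questionable_fields(stored_codes, entered_record, size=3):
    """Narrow down possibly mistyped fields using set operations.

    Fields that appear in at least one matching combination are presumed
    correct; every other field that appears in a mismatching combination
    stays in the questionable set.
    """
    entered = hash_codes(entered_record, size)
    matched = [c for c in entered if stored_codes.get(c) == entered[c]]
    mismatched = [c for c in entered if stored_codes.get(c) != entered[c]]
    trusted = set().union(*matched)
    suspects = set().union(*mismatched)
    return suspects - trusted
```

With a single mistyped field, the one combination that omits it still matches, so the questionable set shrinks to exactly that field. When too many fields are wrong, no combination matches and the whole field set remains questionable, which mirrors the best and worst cases described in the abstract.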

  • OVERT-MED visual interface.

    Ontology-Driven Search and Triage: Design of a Web-Based Visual Interface for MEDLINE


    Background: Diverse users need to search health and medical literature to satisfy open-ended goals such as making evidence-based decisions and updating their knowledge. However, doing so is challenging due to at least two major difficulties: (1) articulating information needs using accurate vocabulary and (2) dealing with large document sets returned from searches. Common search interfaces such as PubMed do not provide adequate support for exploratory search tasks. Objective: Our objective was to improve support for exploratory search tasks by combining two strategies in the design of an interactive visual interface by (1) using a formal ontology to help users build domain-specific knowledge and vocabulary and (2) providing multi-stage triaging support to help mitigate the information overload problem. Methods: We developed a Web-based tool, Ontology-Driven Visual Search and Triage Interface for MEDLINE (OVERT-MED), to test our design ideas. We implemented a custom searchable index of MEDLINE, which comprises approximately 25 million document citations. We chose a popular biomedical ontology, the Human Phenotype Ontology (HPO), to test our solution to the vocabulary problem. We implemented multistage triaging support in OVERT-MED, with the aid of interactive visualization techniques, to help users deal with large document sets returned from searches. Results: Formative evaluation suggests that the design features in OVERT-MED are helpful in addressing the two major difficulties described above. Using a formal ontology seems to help users articulate their information needs with more accurate vocabulary. In addition, multistage triaging combined with interactive visualizations shows promise in mitigating the information overload problem. Conclusions: Our strategies appear to be valuable in addressing the two major problems in exploratory search. 
Although we tested OVERT-MED with a particular ontology and document collection, we anticipate that our strategies can be transferred successfully to other contexts.

  • Predictive analytics. Image Source: Author: geralt. Copyright: CC0 Public Domain.

    Patient-Specific Predictive Modeling Using Random Forests: An Observational Study for the Critically Ill


    Background: With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling. Objective: The objective of the study is to investigate the random forest (RF) proximity measure as a PSM in the context of personalized mortality prediction for intensive care unit (ICU) patients. Methods: A total of 17,152 ICU admissions were extracted from the Multiparameter Intelligent Monitoring in Intensive Care II database. A number of predictor variables were extracted from the first 24 hours in the ICU. Outcome to be predicted was 30-day mortality. A patient-specific predictive model was trained for each ICU admission using an RF PSM inspired by the RF proximity measure. Death counting, logistic regression, decision tree, and RF models were studied with a hard threshold applied to RF PSM values to only include the M most similar patients in model training, where M was varied. In addition, case-specific random forests (CSRFs), which uses RF proximity for weighted bootstrapping, were trained. Results: Compared to our previous study that investigated a cosine similarity PSM, the RF PSM resulted in superior or comparable predictive performance. RF and CSRF exhibited the best performances (in terms of mean area under the receiver operating characteristic curve [95% confidence interval], RF: 0.839 [0.835-0.844]; CSRF: 0.832 [0.821-0.843]). RF and CSRF did not benefit from personalization via the use of the RF PSM, while the other models did. Conclusions: The RF PSM led to good mortality prediction performance for several predictive models, although it failed to induce improved performance in RF and CSRF. 
The distinction between predictor and similarity variables is an important issue arising from the present study. RFs present a promising method for patient-specific outcome prediction.
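
As a rough illustration of patient-specific modeling with a hard cut-off at the M most similar patients, the Python sketch below uses the cosine similarity metric from the authors' earlier work rather than the RF proximity measure studied here; swapping in another similarity function would follow the same pattern. The function names, and the reading of "death counting" as the share of deaths among the selected patients, are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index_patient, past_patients, m):
    """Return the indices of the m past patients most similar to the index patient."""
    scored = [
        (cosine_similarity(index_patient, p), i)
        for i, p in enumerate(past_patients)
    ]
    # Highest similarity first; ties are broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:m]]


def death_counting(index_patient, past_patients, died, m):
    """Estimate mortality risk as the share of deaths among the m nearest patients."""
    if m < 1:
        raise ValueError("m must be at least 1")
    neighbours = most_similar(index_patient, past_patients, m)
    return sum(died[i] for i in neighbours) / len(neighbours)
```

Here `died` is a list of 0/1 outcomes aligned with `past_patients`. Varying `m` trades off how personalized the training set is against how much data remains for estimation.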

  • Health care professionals. Image sourced and copyright owned by authors.

    The Value of Electronic Medical Record Implementation in Mental Health Care: A Case Study


    Background: Electronic medical records (EMR) have been implemented in many organizations to improve the quality of care. Evidence supporting the value added to a recovery-oriented mental health facility is lacking. Objective: The goal of this project was to implement and customize a fully integrated EMR system in a specialized, recovery-oriented mental health care facility. This evaluation examined the outcomes of quality improvement initiatives driven by the EMR to determine the value that the EMR brought to the organization. Methods: The setting was a tertiary-level mental health facility in Ontario, Canada. Clinical informatics and decision support worked closely with point-of-care staff to develop workflows and documentation tools in the EMR. The primary initiatives were implementation of modules for closed loop medication administration, collaborative plan of care, clinical practice guidelines for schizophrenia, restraint minimization, the infection prevention and control surveillance status board, drug of abuse screening, and business intelligence. Results: Medication and patient scan rates have been greater than 95% since April 2014, mitigating the adverse effects of medication errors. Specifically, between April 2014 and March 2015, only 1 moderately severe and 0 severe adverse drug events occurred. The number of restraint incidents decreased 19.7%, which resulted in cost savings of more than Can $1.4 million (US $1.0 million) over 2 years. Implementation of clinical practice guidelines for schizophrenia increased adherence to evidence-based practices, standardizing care across the facility. Improved infection prevention and control surveillance reduced the number of outbreak days from 47 in the year preceding implementation of the status board to 7 days in the year following. Decision support to encourage preferential use of the cost-effective drug of abuse screen when clinically indicated resulted in organizational cost savings. 
Conclusions: EMR implementation allowed Ontario Shores Centre for Mental Health Sciences to use data analytics to identify and select appropriate quality improvement initiatives, supporting patient-centered, recovery-oriented practices and providing value at the clinical, organizational, and societal levels.

  • Hand Holding A Stethoscope. Image Source: Author: Petr Kratochvil. Copyright: Public Domain.

    Email Between Patient and Provider: Assessing the Attitudes and Perspectives of 624 Primary Health Care Patients


    Background: Email between patients and their health care providers can serve as a continuous and collaborative forum to improve access to care, enhance convenience of communication, reduce administrative costs and missed appointments, and improve satisfaction with the patient-provider relationship. Objective: The main objective of this study was to investigate the attitudes of patients aged 16 years and older toward receiving email communication for health-related purposes from an academic inner-city family health team in Southern Ontario. In addition to exploring the proportion of patients with a functioning email address and interest in email communication with their health care provider, we also examined patient-level predictors of interest in email communication. Methods: A cross-sectional study was conducted using a self-administered, 1-page survey of attitudes toward electronic communication for health purposes. Participants were recruited from attending patients at the McMaster Family Practice in Hamilton, Ontario, Canada. These patients were aged 16 years and older and were approached consecutively to complete the self-administered survey (N=624). Descriptive analyses were conducted using the Pearson chi-square test to examine correlations between variables. A logistic regression analysis was conducted to determine statistically significant predictors of interest in email communication (yes or no). Results: The majority of respondents (73.2%, 457/624) reported that they would be willing to have their health care provider (from the McMaster Family Practice) contact them via email to communicate health-related information. Those respondents who checked their personal email more frequently were less likely to want to engage in this electronic communication. Among respondents who check their email less frequently (fewer than every 3 days), 46% (37/81) preferred to communicate with the McMaster Family Practice via email. 
Conclusions: Online applications, including email, are emerging as a viable avenue for patient communication. With increasing utility of mobile devices in the general population, the proportion of patients interested in email communication with their health care providers may continue to increase. When following best practices and appropriate guidelines, health care providers can use this resource to enhance patient-provider communication in their clinical work, ultimately leading to improved health outcomes and satisfaction with care among their patients.

  • EHR and FOCUS. Image sourced and copyright owned by authors.

    Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations


    Background: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients’ notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care. Objective: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients. Methods: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians’ agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learn-to-rank algorithm. We explored rich learning features, including distributed word representation, Unified Medical Language System semantic type, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems. Results: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen’s kappa annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. 
When including term identification, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P<.001). Rich learning features contributed to FOCUS’s performance substantially. Conclusions: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.
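
For readers unfamiliar with the agreement statistic reported above, the following Python sketch computes Cohen's kappa for two annotators. It assumes each candidate term receives one label per physician (for example 1 for important, 0 for not important); how the study actually aligned the two annotation sets is not stated in the abstract.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently
    # at their own observed category rates.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical category
    return (observed - expected) / (1 - expected)
```

For example, two annotators who agree on 8 of 10 binary labels, each marking half the items as important, have an observed agreement of 0.8 against a chance agreement of 0.5, giving a kappa of about 0.6.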

  • Creative abstract healthcare, medicine and cardiology tool concept: laptop or notebook computer with medical cardiologic diagnostic test software on screen and stethoscope isolated on white background. Image source: Image Author: Scanrail1. Image purchased by authors.

    A Predictive Model for Medical Events Based on Contextual Embedding of Temporal Sequences


    Background: Medical concepts are inherently ambiguous and error-prone due to human fallibility, which makes it hard for them to be fully used by classical machine learning methods (eg, for tasks like early stage disease prediction). Objective: Our work was to create a new machine-friendly representation that resembles the semantics of medical concepts. We then developed a sequential predictive model for medical events based on this new representation. Methods: We developed novel contextual embedding techniques to combine different medical events (eg, diagnoses, prescriptions, and labs tests). Each medical event is converted into a numerical vector that resembles its “semantics,” via which the similarity between medical events can be easily measured. We developed simple and effective predictive models based on these vectors to predict novel diagnoses. Results: We evaluated our sequential prediction model (and standard learning methods) in estimating the risk of potential diseases based on our contextual embedding representation. Our model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.79 on chronic systolic heart failure and an average AUC of 0.67 (over the 80 most common diagnoses) using the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Conclusions: We propose a general early prognosis predictor for 80 different diagnoses. Our method computes numeric representation for each medical event to uncover the potential meaning of those events. Our results demonstrate the efficiency of the proposed method, which will benefit patients and physicians by offering more accurate diagnosis.

  • Word cloud. Image created and copyright owned by authors.

    Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites


    Background: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. Objective: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A). Methods: We collected 2 types of social media data: (1) a total of 3711 blogs tagged with “diabetes” on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework Apache cTAKES and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV)—the top 2 UMLS source vocabularies that have the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets. Results: We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. 
It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as “Amino Acid, Peptide, or Protein,” “Body Part, Organ, or Organ Component,” and “Disease or Syndrome.” Conclusions: Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.

  • Health care challenges. Image sourced and copyright owned by authors.

    Challenges and Opportunities of Big Data in Health Care: A Systematic Review


    Background: Big data analytics offers promise in many business sectors, and health care is looking at big data to provide answers to many age-related issues, particularly dementia and chronic disease management. Objective: The purpose of this review was to summarize the challenges faced by big data analytics and the opportunities that big data opens in health care. Methods: A total of 3 searches were performed for publications between January 1, 2010 and January 1, 2016 (PubMed/MEDLINE, CINAHL, and Google Scholar), and an assessment was made on content germane to big data in health care. From the results of the searches in research databases and Google Scholar (N=28), the authors summarized content and identified 9 and 14 themes under the categories Challenges and Opportunities, respectively. We rank-ordered and analyzed the themes based on the frequency of occurrence. Results: The top challenges were issues of data structure, security, data standardization, storage and transfers, and managerial skills such as data governance. The top opportunities revealed were quality improvement, population management and health, early detection of disease, data quality, structure, and accessibility, improved decision making, and cost reduction. Conclusions: Big data analytics has the potential for positive impact and global implications; however, it must overcome some legitimate obstacles.

  • Visual Representations of Physiologic Data. Image sourced and copyright owned by authors.

    A Review of Visual Representations of Physiologic Data


    Background: Physiological data is derived from electrodes attached directly to patients. Modern patient monitors are capable of sampling data at frequencies in the range of several million bits every hour. Hence the potential for cognitive threat arising from information overload and diminished situational awareness becomes increasingly relevant. A systematic review was conducted to identify novel visual representations of physiologic data that address cognitive, analytic, and monitoring requirements in critical care environments. Objective: The aims of this review were to identify knowledge pertaining to (1) support for conveying event information via tri-event parameters; (2) identification of the use of visual variables across all physiologic representations; (3) aspects of effective design principles and methodology; (4) frequency of expert consultations; (5) support for user engagement and identifying heuristics for future developments. Methods: A review was completed of papers published as of August 2016. Titles were first collected and analyzed using an inclusion criteria. Abstracts resulting from the first pass were then analyzed to produce a final set of full papers. Each full paper was passed through a data extraction form eliciting data for comparative analysis. Results: In total, 39 full papers met all criteria and were selected for full review. Results revealed great diversity in visual representations of physiological data. Visual representations spanned 4 groups including tabular, graph-based, object-based, and metaphoric displays. The metaphoric display was the most popular (n=19), followed by waveform displays typical to the single-sensor-single-indicator paradigm (n=18), and finally object displays (n=9) that utilized spatiotemporal elements to highlight changes in physiologic status. 
Results obtained from experiments and evaluations suggest specifics related to the optimal use of visual variables, such as color, shape, size, and texture have not been fully understood. Relationships between outcomes and the users’ involvement in the design process also require further investigation. A very limited subset of visual representations (n=3) support interactive functionality for basic analysis, while only one display allows the user to perform analysis including more than one patient. Conclusions: Results from the review suggest positive outcomes when visual representations extend beyond the typical waveform displays; however, there remain numerous challenges. In particular, the challenge of extensibility limits their applicability to certain subsets or locations, challenge of interoperability limits its expressiveness beyond physiologic data, and finally the challenge of instantaneity limits the extent of interactive user engagement.

  • Diabetes. Image Source: Author: Tesaphotography. Copyright: CC0 Public Domain.

    Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language...


    Background: Diabetes case finding based on structured medical records does not fully identify diabetic patients whose medical histories related to diabetes are available in the form of free text. Manual chart reviews have been used but involve high labor costs and long latency. Objective: This study developed and tested a Web-based diabetes case finding algorithm using both structured and unstructured electronic medical records (EMRs). Methods: This study was based on the health information exchange (HIE) EMR database that covers almost all health facilities in the state of Maine, United States. Using narrative clinical notes, a Web-based natural language processing (NLP) case finding algorithm was retrospectively (July 1, 2012, to June 30, 2013) developed with a random subset of HIE-associated facilities, which was then blind tested with the remaining facilities. The NLP-based algorithm was subsequently integrated into the HIE database and validated prospectively (July 1, 2013, to June 30, 2014). Results: Of the 935,891 patients in the prospective cohort, 64,168 diabetes cases were identified using diagnosis codes alone. Our NLP-based case finding algorithm prospectively found an additional 5756 uncodified cases (5756/64,168, 8.97% increase) with a positive predictive value of .90. Of the 21,720 diabetic patients identified by both methods, 6616 patients (6616/21,720, 30.46%) were identified by the NLP-based algorithm before a diabetes diagnosis was noted in the structured EMR (mean time difference = 48 days). Conclusions: The online NLP algorithm was effective in identifying uncodified diabetes cases in real time, leading to a significant improvement in diabetes case finding. The successful integration of the NLP-based case finding algorithm into the Maine HIE database indicates a strong potential for application of this novel method to achieve a more complete ascertainment of diagnoses of diabetes mellitus.
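
The two percentages in the Results follow directly from the reported counts. The short Python sketch below reproduces them; the helper name `percent` is an illustrative assumption.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Additional uncodified cases relative to cases found by diagnosis codes alone.
extra_cases = percent(5756, 64168)

# Share of patients identified by both methods whom the NLP-based
# algorithm found before the structured EMR noted a diagnosis.
found_earlier = percent(6616, 21720)
```

These evaluate to 8.97 and 30.46, matching the figures reported in the abstract.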

Latest Submissions Open for Peer-Review:

  • A Review of Portable Digital Assistant (PDA) Implementation process in Low Resource Settings

    Date Submitted: Jan 10, 2017

    Open Peer Review Period: Jan 20, 2017 - Mar 17, 2017

    This paper presents the Portable Digital Assistant implementation in the Sene District of Ghana. The study sought to understand eHealth implementation in low resource settings through the lens of actor-network theory. Part of this theory is the sociology of translations, which was employed as a crucial framework for exploring the Portable Digital Assistant setup. Data were collected between January 2011 and June 2014 through triangulation of qualitative methods: interviews, participant observation, and document analysis. A total of 20 human and non-human actors were identified, and semi-structured interviews were conducted with the human actors face to face (10) and by telephone (2). The case suggests that the project champion was able to mobilise the various actors to ensure that the project succeeded. His ability to manage the various resources ensured that there were no lapses.