This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.
The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using “clinical notes,” “natural language processing,” and “chronic disease” and their variations as keywords to maximize coverage of the articles.
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the
Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
The burden of chronic diseases, such as cancers, diabetes, and hypertension, is widely accepted as one of the principal challenges of health care. While immense progress has been made in the discovery of new treatments and prevention strategies, this challenge not only persists, but its incidence is exhibiting an upward trend [
A promising direction is the secondary use of electronic health records (EHRs) to analyze patient data, advance medical research, and better inform clinical decision making. Methods based in analysis of EHRs [
However, EHRs are challenging to represent and model due to their high dimensionality, noise, heterogeneity, sparseness, incompleteness, random errors, and systematic biases. Moreover, a wealth of information about patient clinical history is generally locked behind free-text clinical narratives [
Clinically relevant information from clinical notes has been historically extracted via manual review by clinical experts, leading to scalability and cost issues. This is of particular relevance for chronic diseases since clinical notes dominate over structured data (for example, Wei et al [
Systematic reviews related to processing of clinical notes have been published in the past [
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [
In the initial queries we also included the following terms: “electronic health records,” “EHR,” “electronic medical records,” and “EMR.” This led to a total of 2652 retrieved articles. However, upon reviewing these articles, we noticed that the scope was too broad and returned results outside the focus of this review. Consequently, we narrowed the search strategy to the keywords specified in the previous section, obtaining a total of 478 articles, with 401 articles from Scopus, 58 from Web of Science (including PubMed), 13 from ACM Digital Library, and 6 added manually, including 4 conference papers. After removing 46 duplicates, 432 articles were retained, and two authors (MS and VO) reviewed their titles and abstracts (216 articles each). After this screening phase, 159 articles were retained for further analysis.
In the second screening stage, five authors independently reviewed the 159 full-text articles, resulting in 106 articles fulfilling our criteria that are discussed in this review. The most common reason for exclusion was that the work was not directly related to chronic diseases (n=32); another reason was the work was not topical (eg, the article was not a journal paper or we could not retrieve the text). A flowchart and description of the selection process are provided in
Preferred Reporting Items for Systematic Reviews and Meta-Analyses article selection flowchart. ACM: Association for Computing Machinery; NLP: natural language processing.
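The article counts reported for the selection process can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the figures quoted in the text; the variable names are illustrative and are not taken from the original study.

```python
# Counts reported in the text for the narrowed search strategy.
sources = {
    "Scopus": 401,
    "Web of Science (including PubMed)": 58,
    "ACM Digital Library": 13,
    "Added manually": 6,
}

retrieved = sum(sources.values())           # articles before deduplication
duplicates = 46
screened = retrieved - duplicates           # titles and abstracts screened
per_reviewer = screened // 2                # split evenly between two reviewers
full_text = 159                             # retained after the first screening
included = 106                              # retained after full-text review
excluded_full_text = full_text - included   # removed in the second stage
not_chronic = 32                            # reported most common exclusion reason
other_reasons = excluded_full_text - not_chronic

print(retrieved, screened, per_reviewer, excluded_full_text, other_reasons)
```

The totals agree with the text: 478 retrieved, 432 screened (216 per reviewer), and 53 full-text exclusions, of which 32 were not directly related to chronic diseases.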
The 106 articles reviewed were largely related to 43 unique chronic diseases (as shown in
Classifications of chronic conditions studied (n=102) and the corresponding number of papers found.
Classification of chronic condition | Studies, n (%) | Conditions included |
Diseases of the circulatory system | 38 (35.8) | Congestive heart disease (2), coronary artery disease (6), heart disease (6), heart failure (7), hypertension (5), peripheral arterial disease (3), pulmonary disease (4) |
Neoplasms | 34 (32.1) | Breast cancer (8), colorectal cancer (7), prostate cancer (4), lymphoma (2) |
Endocrine, nutritional, and metabolic diseases | 14 (13.2) | Type 2 diabetes mellitus (12), obesity (2) |
Other diseases | 16 (15.1) | Diseases of the digestive system (3), diseases of the genitourinary system (3), diseases of the musculoskeletal system and connective tissue (3), diseases of the respiratory system (2), mental and behavioral disorders (2), multidisease (3) |
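The percentages in the table can be reproduced from the study counts. The check below assumes that the denominator is the 106 included articles; this is our assumption, as the source does not state the denominator explicitly.

```python
# Study counts and reported percentages, copied from the table.
counts = {
    "Diseases of the circulatory system": 38,
    "Neoplasms": 34,
    "Endocrine, nutritional, and metabolic diseases": 14,
    "Other diseases": 16,
}
reported = {
    "Diseases of the circulatory system": 35.8,
    "Neoplasms": 32.1,
    "Endocrine, nutritional, and metabolic diseases": 13.2,
    "Other diseases": 15.1,
}

total_included = 106  # assumed denominator (number of included articles)

# Recompute each percentage to one decimal place.
computed = {
    name: round(100 * n / total_included, 1) for name, n in counts.items()
}

for name, n in counts.items():
    print(f"{name}: {n} ({computed[name]}%)")
print("Sum of rows:", sum(counts.values()))
```

Under this assumption the recomputed shares match the reported ones, while the four rows sum to the 102 studies given in the table caption.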
Relationship between chronic diseases (black sectors) and articles included in the review (for clarity we have included only diseases that are addressed by three or more articles).
The top three disease groups were (1) diseases of the circulatory system (n=38) (such as coronary artery disease [
An unexpected finding is that despite the higher incidence of metabolic diseases in the general population [
Most of the work in this area focused on using NLP to estimate the risk of heart disease. As an example, Chen et al [
Risk of stroke and major bleeding in patients with atrial fibrillation has been predicted using structured data and clinical notes [
Several studies used NLP to extract cases of peripheral arterial disease (PAD) and critical limb ischemia from clinical notes [
Work on hypertension has been principally focused on NLP to extract relevant indicators, comorbidities, and drug therapies [
Byrd et al [
Wang et al [
Topaz et al [
This section reviews a number of cancer-related studies, including detection of multiple types of cancer [
Kasthurirathne et al [
A number of studies have focused on different applications of NLP in pathology, histopathology, and radiology reports [
The three most common types of cancers found are breast cancer (n=8), colorectal cancer (n=7), and prostate cancer (n=4).
Carrell et al [
EHRs and NLP were used to identify patients in need of colorectal cancer screening [
Ping et al [
Applications of NLP in the domain of endocrine, nutritional, and metabolic diseases include negation detection and mention of family history in free-text notes [
Two support vector machines (SVMs) were combined to automatically identify obesity types by extracting obesity and diabetes-related concepts from clinical text [
The remaining 16 papers focused on processing clinical notes of different types of chronic diseases. Three studies concern diseases of the musculoskeletal system and connective tissue, in particular classification of snippets of text related to axial spondyloarthritis in the EMRs of US military veterans using NLP and SVM [
Two papers evaluated deep learning in a multidisease domain. In particular, Miotto et al [
Neural networks were also used to process clinical notes for phenotyping psychiatric diagnosis [
IE from clinical notes based on NLP was also used to (1) screen computed tomography reports for invasive pulmonary mold [
Last, Pivovarov and Elhadad [
To understand trends in NLP methods for chronic diseases, in this review we have analyzed papers with respect to the methods employed (machine learning vs rule-based). While there is an increasing use of machine learning methods in comparison to rule-based (as shown in
Our review identified 16 papers that employed hybrid approaches combining rule-based and machine learning methods. Out of these, 2 papers describe work to identify diseases, risk factors, medications, and time attributes. In particular, a hybrid pipeline based on CRFs, SVMs, and rule-based approaches was used to identify negation information and normalize temporal expressions [
We identified 24 papers that focused on comparison between performance of rule-based and machine learning methods. Typically, the rule-based methods were used as a baseline to test the performance against machine learning algorithms.
As for rule-based approaches, the methods in this review include dictionary lookup [
The most widely used machine learning approach is SVMs, having been used for predicting heart disease in medical records [
Naïve Bayes was the second most frequent approach, being used to predict heart disease in medical records [
Natural language processing rule-based methods versus machine learning for chronic diseases.
Most frequently used natural language processing methods and the corresponding number of papers.
Method | Papers (n) |
Support vector machine | 18 |
Naïve Bayes | 11 |
Conditional random fields | 7 |
Random forest | 4 |
Maximum entropy | 3 |
Decision tree | 3 |
Deep neural networks | 3 |
Logistic regression | 3 |
Rule-based methods | 74 |
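To make the naïve Bayes entry in the table concrete, the following is a minimal sketch of a multinomial naïve Bayes text classifier with add-one smoothing, written with the Python standard library only. It does not reproduce the implementation of any reviewed paper; the toy notes, the labels, and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def train(samples):
    """Collect per-label document and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy notes; real clinical corpora are far larger and noisier.
notes = [
    ("patient reports chest pain and shortness of breath", "cardiac"),
    ("history of heart failure with reduced ejection fraction", "cardiac"),
    ("elevated blood glucose and hba1c consistent with diabetes", "metabolic"),
    ("started metformin for type 2 diabetes", "metabolic"),
]
model = train(notes)
print(predict("follow up for heart failure and chest pain", model))
```

With these toy notes the example prints `cardiac`. The small number of parameters to estimate is one reason such a classifier can be trained on modest amounts of labeled text.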
It is interesting to note that there are only 3 papers using approaches based on deep learning [
The NLP works described in the reviewed papers and associated approaches reveal that the most frequently described tasks are text classification and entity recognition. The majority of the papers describe text classification tasks using standard approaches in NLP such as SVM (n=12) and naïve Bayes (n=4). Entity recognition approaches are based on manually developed resources (dictionary, regular expressions, handwritten rules) as well as methods based on machine learning. As for the former, there are dictionary-based approaches (n=5) and those relying on regular expressions (n=12). As for the latter, the approaches are mainly based on standard machine learning techniques such as CRF and deep learning. A few papers describe approaches to coreference resolution (n=2) and negation detection (n=3). Coreference resolution is addressed using SVM, while negation detection is based on SVM (n=2) or manual rules (n=1).
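As an illustration of the manually developed resources mentioned above, the sketch below combines a small dictionary, regular expressions, and a hand-written negation rule. It is a simplified stand-in and not the method of any reviewed paper: the concept list, the trigger list, the five-word window, and the function name `extract_concepts` are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon and negation triggers, invented for illustration.
CONCEPTS = ["chest pain", "diabetes", "hypertension", "heart failure"]
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]

CONCEPT_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CONCEPTS)) + r")\b", re.IGNORECASE
)
TRIGGER_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, NEGATION_TRIGGERS)) + r")\b", re.IGNORECASE
)


def extract_concepts(note, window=5):
    """Return (concept, is_negated) pairs for each mention in the note."""
    results = []
    # Split on sentence-like boundaries so triggers do not cross them.
    for sentence in re.split(r"[.;\n]+", note):
        for match in CONCEPT_RE.finditer(sentence):
            # Inspect only the few words immediately preceding the mention.
            preceding = sentence[:match.start()].split()[-window:]
            negated = bool(TRIGGER_RE.search(" ".join(preceding)))
            results.append((match.group(1).lower(), negated))
    return results


note = "Patient denies chest pain. History of hypertension; no diabetes."
print(extract_concepts(note))
```

For the invented note, "chest pain" and "diabetes" are flagged as negated and "hypertension" is not. Rules of this kind are easy to inspect, but they depend on the coverage of the hand-built lists.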
Regarding datasets, the majority of the papers describe experiments run on datasets that are not publicly available (typically clinical data collected at research-based health care institutions and exploited by in-house NLP teams). On the other hand, out of 16 papers involving publicly available corpora, 12 exploit the Informatics for Integrating Biology and the Bedside (i2b2) datasets. The other 4 public datasets used are MIMIC-II [
Interest in using NLP for the automated processing of medical records, and in particular of free-text clinical notes, is increasing, as exemplified by a number of recent reviews of the field. Yet none of these works focuses solely on chronic diseases, where the volume of patient clinical notes tends to be larger than in other domains, and none provides specific recommendations on how to advance the field toward clinical adoption that helps in treating people with chronic conditions. Here we briefly summarize previous works partially related to the work presented in this paper.
Ford et al [
The work by Shivade et al [
Abbe et al [
The review by Spasic et al [
The work by Pons et al [
Closest to our work is a systematic review by Wang et al [
The 106 articles considered in this review were published in 50 unique venues.
Categorization of the publication venues.
Distribution of included studies according to publication venues.
Our systematic review has shown that NLP has a wide range of applications for processing clinical notes of diverse chronic diseases (43 unique chronic diseases identified in the analysis). In this respect, there is a significant increase in the use of machine learning compared with rule-based methods. Despite the potential offered by deep learning, the majority of papers still rely on shallow classifiers. In fact, only a handful of studies (ie, 3 papers) made use of deep classifiers or general deep learning methods for NLP. This was unexpected, considering the potential of deep learning for text processing [
Another finding from our review is that the majority of papers reviewed identify risk factors for a particular disease and classify a clinical note by a certain disease phenotype. However, only a handful of papers extract comorbidities from free text or integrate clinical notes with structured data for prediction and longitudinal modeling of trajectories of patients with chronic diseases. Such an outcome could be related to the use of data analysis methods and algorithms (such as the shallow classifiers and rule-based approaches highlighted earlier) that cannot capture temporal and longitudinal relationships between clinical variables and, in turn, disease evolution. The tools (such as MetaMap) and methods (such as mapping n-grams to ontologies) used may have been other influencing factors. While these tools allow meaningful medical information to be extracted from the text, they inherently reduce the possibility of deriving more complex relationships, principally due to phrase structure (for example “breast and lung cancer” may be identified only as “breast” and “lung cancer” rather than both “breast cancer” and “lung cancer”). However, the use of relatively simple methods is advantageous in terms of interpretability of predictions, a highly important aspect in the clinical domain, whereas interpretability still represents a significant issue for more complex methods.
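The “breast and lung cancer” example can be reproduced with a few lines of code. The sketch below is a greedy longest-match lookup of word n-grams in a tiny dictionary; it is a simplified stand-in and not MetaMap's actual algorithm. The lexicon and the function name `lookup` are hypothetical.

```python
# Hypothetical mini-dictionary of known phrases, invented for illustration.
LEXICON = {"breast cancer", "lung cancer", "breast"}


def lookup(text, max_n=2):
    """Greedy longest-match dictionary lookup over word n-grams."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):
            window = tokens[i:i + n]
            phrase = " ".join(window)
            # Require a full-length window so short tails are not re-tested.
            if len(window) == n and phrase in LEXICON:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts at this token
    return found


print(lookup("breast and lung cancer"))
print(lookup("breast cancer and lung cancer"))
```

The first call returns “breast” and “lung cancer”, because the coordinated phrase never contains the contiguous n-gram “breast cancer”; the second call, where the phrase is spelled out, recovers both cancers.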
Our review has retrieved only a few studies on the topic of extracting word embeddings from clinical notes. This may be due to insufficient available data to train the algorithms as well as the fact that embedding methods have been developed only recently. The issue of insufficient training data could be addressed using transfer learning methods, while using precomputed embeddings for specific diseases or categories of diseases could be useful to effectively capture longitudinal relationships.
Our review has shown that SVM and naïve Bayes algorithms were most often used for machine learning–based tasks or in combination with rule-based methods. This may be due to the popularity of these algorithms and to the fact that naïve Bayes, being a relatively simple algorithm, requires a relatively small amount of training data (in comparison with deep classifiers, for example). Although it is not feasible to directly compare algorithmic performance of the studies that we considered (due to both diversity of data and challenges addressed), we have noted that the most commonly reported performance measures were sensitivity (recall), positive predictive value (precision), and
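For reference, sensitivity and positive predictive value can be computed from counts of true positives, false positives, and false negatives. The following is a minimal sketch using the standard definitions; the counts are hypothetical and do not come from any reviewed study.

```python
def sensitivity(tp, fn):
    """Recall: share of actual positive cases that the system found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_predictive_value(tp, fp):
    """Precision: share of system-flagged cases that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts from comparing system output with a manual review.
tp, fp, fn = 80, 20, 10

print(round(sensitivity(tp, fn), 3))
print(round(positive_predictive_value(tp, fp), 3))
```

With these invented counts, sensitivity is about 0.889 and positive predictive value is 0.8.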
Finally, our review has reinforced the fact that public datasets remain scarce. This outcome was largely expected given the sensitivity of clinical data and the associated legal and regulatory constraints, including the Health Insurance Portability and Accountability Act and the Data Protection Directive (Directive 95/46/EC) of the European Union (superseded by the General Data Protection Regulation 2016/679). As a result, the studies reviewed in this paper typically came from research-based health care institutions with in-house NLP teams having access to clinical data. Therefore, the need remains for shared tasks such as i2b2 and access to data that would increase participation in clinical NLP and contribute to improvements of NLP methods and algorithms targeting clinical applications.
This review has examined the last 11 years of clinical IE applications literature and may have the following limitations. The review is limited to journal articles written in the English language, and papers written in other languages, especially papers that consider clinical narratives, may provide additional results. In addition, papers using clinical articles from non-EHR systems have not been considered. Finally, focusing on the clinical domain may have introduced a bias with respect to the methods reviewed (rule-based vs machine learning), as rule-based methods are more prevalent in the clinical domain compared with other domains [
Our review has shown that there is a clear necessity for clinical NLP methods to evolve beyond extraction of clinical concepts and focus more on concept understanding (ie, not only understanding of relationships between concepts but incorporation of clinical facts, domain knowledge, and general knowledge in the reasoning process). In this review, we have not encountered work that attempts to bridge the gap between concept extraction and concept understanding.
We have devised the following specific recommendations:
Focus on recognition of relationships among clinical concepts and entities. While progress has been made in recognizing entities in textual narratives (such as diseases, drugs, procedures), further efforts must be focused on automatic inference of relationships between these entities (for example, drug A causes adverse event B for chronic disease C), which in turn would allow deeper understanding of clinical text.
Temporal extraction, that is, the automated mark-up and normalization of temporal information in natural language texts, is an important aspect. This is especially relevant for clinical text, as disease progression and clinical events are typically recorded chronologically, with specific events being significant only in a particular temporal context. As such, significant attention should be given to temporal extraction considering its implications in the clinical context, especially since the works in this review either did not deal with temporal extraction or used crude methods such as timestamps of clinical notes.
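To make the task concrete, the sketch below normalizes two simple kinds of temporal expression to ISO dates: relative expressions such as “3 days ago”, resolved against the note date, and absolute dates, which are assumed here to be written as MM/DD/YYYY. It is a toy illustration, far simpler than a real temporal extraction system; the patterns and the function name `normalize_times` are assumptions.

```python
import re
from datetime import date, timedelta

# Number of days per supported unit (only days and weeks in this sketch).
UNIT_DAYS = {"day": 1, "week": 7}

RELATIVE_RE = re.compile(r"\b(\d+)\s+(day|week)s?\s+ago\b", re.IGNORECASE)
ABSOLUTE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # assumes MM/DD/YYYY


def normalize_times(text, note_date):
    """Return (surface expression, ISO date) pairs found in the text."""
    found = []
    # Relative expressions are anchored to the date the note was written.
    for m in RELATIVE_RE.finditer(text):
        days = int(m.group(1)) * UNIT_DAYS[m.group(2).lower()]
        found.append((m.group(0), (note_date - timedelta(days=days)).isoformat()))
    # Absolute dates are parsed directly; invalid dates raise ValueError.
    for m in ABSOLUTE_RE.finditer(text):
        month, day, year = (int(g) for g in m.groups())
        found.append((m.group(0), date(year, month, day).isoformat()))
    return found


note = (
    "Chest pain began 3 days ago. Last echocardiogram on 01/15/2018. "
    "Admitted 2 weeks ago."
)
print(normalize_times(note, date(2018, 3, 10)))
```

For the invented note dated 2018-03-10, the three expressions resolve to 2018-03-07, 2018-02-24, and 2018-01-15. Placing events on a common timeline in this way is a precondition for reasoning about their order.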
Scarcity of annotated clinical corpora has raised the need to exploit alternative sources of domain knowledge. In addition to mainstream sources such as biomedical literature, encyclopedias, and textbooks, automatic diagnostic and decision support systems could be exploitable (such as DXplain [
Significant advances in effective clinical NLP will depend on large-scale corpora becoming available to researchers. While shared tasks such as i2b2 and its successor n2c2 are steps in the right direction, further incentives will be required such as developing mechanisms that would empower patients to donate their anonymized data or even providing algorithms that run on clinical text inside care institutions.
Search strategy.
Complete list of reviewed papers, chronic diseases and their classifications, algorithms used, publication venues, and excluded papers.
BI-RADS: Breast Imaging-Reporting and Data System
CHF: congestive heart failure
CRF: conditional random field
DeepPhe: Cancer Deep Phenotype Extraction
EHR: electronic health record
EMR: electronic medical record
HF: heart failure
i2b2: Informatics for Integrating Biology and the Bedside
ICD-10: International Classification of Diseases, 10th Revision
IE: information extraction
MIMIC-II: Multiparameter Intelligent Monitoring in Intensive Care II
NLP: natural language processing
PAD: peripheral arterial disease
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
SVM: support vector machine
THYME: Temporal Histories of Your Medical Event
This work was partially supported by the European Union's Horizon 2020 research and innovation program under grant agreement #769765.
None declared.