Background

JMI

JMIR Med Inform

JMIR Medical Informatics

2291-9694

JMIR Publications

Toronto, Canada

v5i4e42

29089288

10.2196/medinform.8531

Original Paper

Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach

Eysenbach

Gunther

Cui

Licong

Luo

Jake

Abdulai

Tanko

Fincham

Colin

Mircheva

Iskra

Chen

Jinying

PhD 1

Department of Quantitative Health Sicences University of Massachusetts Medical School

368 Plantation Street

Worcester, MA, 01605

United States 1 774 455 3527 1 508 856 8993 jinying.chen@umassmed.edu

http://orcid.org/0000-0001-7259-4301

Jagannatha

Abhyuday N

MSc 2

http://orcid.org/0000-0001-5334-5481

Fodeh

Samah J

PhD 3

http://orcid.org/0000-0003-4664-3143

Hong

FACMI, PhD 1 4

http://orcid.org/0000-0001-9263-5035

¹ Department of Quantitative Health Sicences University of Massachusetts Medical School

Worcester, MA

United States ² School of Computer Science University of Massachusetts

Amherst, MA

United States ³ Yale Center for Medical Informatics Yale University

New Haven, CT

United States ⁴ Bedford Veterans Affairs Medical Center

Bedford, MA

United States

Corresponding Author: Jinying Chen jinying.chen@umassmed.edu

Oct-Dec2017

31 10 2017

5 4

e42

22 7 2017 6 9 2017 19 9 2017 20 9 2017

©Jinying Chen, Abhyuday N Jagannatha, Samah J Fodeh, Hong Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 31.10.2017.

2017

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.

Background

Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first.

Objective

We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation—that is, creating lay definitions for these terms.

Methods

Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.

Results

The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS’s performance substantially.

Conclusions

ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS’s performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.

electronic health records natural language processing lexical entry selection transfer learning information extraction

Introduction Significance and Background

Online patient portals have been widely adopted in the United States in a nationwide effort to promote patient-centered care [1-3]. Many health organizations also allow patients to access their full electronic health record (EHR) notes through patient portals, with early evidence showing improved medical comprehension and health care outcomes [4-6]. However, medical terms—abundant in EHR notes—remain a major obstacle for patients to comprehend medical text, including EHRs [7-12]. In addition, an estimated 36% of adult Americans have limited health literacy [13]. Limited health literacy has been identified as one major barrier to patient use of EHRs [3,14-17]. Misinterpretation of EHR content may result in unintended increases in service utilization and change of patient-provider relationships.

Textbox 1 shows an excerpt from a typical clinical note. The medical terms that may hinder patients’ comprehension are italicized. Here we show a subset of medical terms identified by the Unified Medical Language System (UMLS) lexical tool MetaMap [18] for illustration purposes only.

Illustration of medical terms in a sample clinical note.

Her creatinine has shown a steady rise over the past four years. She does have nephrotic range proteinuria. The likely etiology of her nephrotic range proteinuria is her diabetes.

She was on an ACE inhibitor, which was just stopped in August due to the elevated creatinine of 4.41. Given the severity of her nephrotic syndrome, her chronic kidney disease is likely permanent; however, I will repeat a chem-8 now that she is off the ACE inhibitor. I will also get a renal duplex scan to make sure she does not have any renal artery stenosis.

There has been long-standing research interest in developing health information technologies that promote health literacy and consumer-centered communication of health information [19,20]. Natural language processing (NLP)-enabled interventions have also been developed to link medical terms in EHRs to lay terms [21,22] or definitions [23], showing improved comprehension [22,23]. Although there is a substantial amount of health information available on the Internet, many Internet users face challenges accessing and selecting relevant high-quality information [24-27]. The aforementioned NLP-enabled interventions have the advantage of reducing patients’ information-seeking burden by integrating authorized health-related information in a single place, and thereby helping patients easily read through and understand their EHR notes.

However, high-quality lay language resources—the cornerstone of such interventions—are very limited in the public domain. The readability levels of health educational materials on the Internet often exceed the level that is easily understood by the average patient [28-30]. Definitions of medical terms provided by controlled health vocabularies, such as those included in the UMLS, often themselves contain complex medical concepts. For example, the term “nephrotic syndrome” in Textbox 1 is defined in the US National Cancer Institute vocabulary as “A collection of symptoms that include severe edema, proteinuria, and hypoalbuminemia; it is indicative of renal dysfunction,” where the medical concepts “edema,” “proteinuria,” “hypoalbuminemia,” and “renal dysfunction” may not be familiar to patients.

The consumer health vocabulary (CHV) [31] is a valuable lay language resource that has been integrated into the UMLS and has also been used in EHR simplification [21,22]. CHV contains consumer health terms (which were used by lay people to query online health information) and maps these terms to UMLS concepts. As a result, it contains both lay terms and medical terms, and links between these 2 types of terms. In addition, it provides lay definitions for some medical terms. From our current work, however, we found that CHV alone is not sufficient for comprehending EHR notes, as many medical terms in EHRs do not exist in CHV, and many others exist in CHV but do not have lay terms or lay definitions. For example, among the 19,503 unique terms identified by MetaMap [18] from a corpus of 7839 EHR notes, 4680 (24.0%) terms do not appear in CHV, including “focal motor deficit,” “Hartmann procedure,” “titrate,” and “urethrorectal fistula” (see Multimedia Appendix 1 for more results).

We are building a lay language resource for EHR comprehension by including medical terms from EHRs and creating lay definitions for those terms. This is a time-consuming process that involves collecting candidate definitions from authorized health educational resources, and curating and simplifying these definitions by domain experts. Since the number of candidate terms mined from EHRs is large (hundreds of thousands of terms), we ranked candidate terms based on how important they are for patients’ comprehension of EHRs, and therefore prioritized the annotation effort of lexical entries based on those important terms.

The goal of this study was to develop an NLP system to automate the process of lexical entry selection. This task was challenging because the distinctions between important and nonimportant EHR terms in our task were more subtle than that between medical terms and nonmedical terms (detailed below in the Important Terms for Electronic Health Record Comprehension subsection). To achieve this goal, we developed a new NLP system, called adapted distant supervision (ADS), which uses distant supervision from the CHV and uses transfer learning to adapt itself to the target domain to rank terms from EHRs. We aimed to empirically show that ADS is effective in ranking EHR terms at the corpus level and outperforms supervised learning.

Related Work Natural Language Processing to Facilitate Creation of Lexical Entries

Previous studies have used both unsupervised and supervised learning methods to prioritize terms for inclusion in biomedical and health knowledge resources [32-35]. Term recognition methods, which are widely used unsupervised methods for term extraction, use rules and statistics (eg, corpus-level word and term frequencies) to prioritize technical terms from domain-specific text corpora. Since these methods do not use manually annotated training data, they have better domain portability but are less accurate than supervised learning [32]. The contribution of this study is to propose a new learning-based method for EHR term prioritization, which is more accurate than supervised learning while also having good domain portability.

Our work is also related to previous studies that have used distributional semantics for lexicon expansion [35-37]. In this work, we used word embedding, one technique for distributional semantics, to generate one type of learning features for the ADS system to rank EHR terms.

Ranking Terms in Electronic Health Records

We previously developed NLP systems to rank and identify important terms from each EHR note of individual patients [38,39]. This study is different in that it aimed to rank terms at the EHR corpus level for the purpose of expanding a lay language resource to improve health literacy and EHR comprehension of the general patient population. Notice that both types of work are important for building NLP-enabled interventions to support patient EHR comprehension. For example, a real-world application can link all medical jargon terms in a patient’s EHR note to lay terms or definitions, and then highlight the terms most important for this patient and provide detailed information for these important terms.

Distant Supervision

Our ADS system uses distant supervision from the CHV. Distant supervision refers to the learning framework that uses information from knowledge bases to create labeled data to train machine learning models [40-42]. Previous work often used this technique to address context-based classification problems such as named entity detection and relation detection. In contrast, we used it to rank terms without considering context. However, our work is similar in that it uses heuristic rules and knowledge bases to create training data. Although training data created this way often contain noise, distant supervision has been successfully applied to several biomedical NLP tasks to reduce human annotation efforts, including extraction of entities [40,41,43], relations [44-46], and important sentences [47] from the biomedical literature. In this study, we made novel use of the non-EHR-centric lexical resource CHV to create training data for ranking terms from EHRs. This approach has greater domain portability than conventional distant supervision methods due to fewer demands on the likeness between the knowledge base and the target-domain learning task. On the other hand, learning from the distantly labeled data with a mismatch to the target task is more challenging. We address this challenge by using transfer learning.

Transfer Learning

Transfer learning is a learning framework that transfers knowledge from the source domain D_S (the training data derived from the CHV, in our case) to the target domain D_T to help improve the learning of the target-domain task T_T [48]. We followed Pan and Yang [48] to distinguish between inductive transfer learning, where the source- and target-domain tasks are different, and domain adaptation, where the source- and target-domain tasks are the same but the source and target domains (ie, data distributions) are different. Our approach belongs to the first category because our source-domain and target-domain tasks define positive and negative examples in different ways. Transfer learning has been applied to important bioinformatics tasks such as DNA sequence analysis and gene interaction network analysis [49]. It has also been applied to several clinical and biomedical NLP tasks, including part-of-speech tagging [50] and key concept identification for clinical text [51], semantic role labeling for biomedical articles [52] and clinical text [53], and key sentence extraction from biomedical literature [47]. In this work, we investigated 2 state-of-the-art transfer learning algorithms that have shown superior performance in recent studies [47,53]. We aimed to empirically show that they, in combination with distant supervision, are effective in ranking EHR terms.

Methods Electronic Health Record Corpus and Candidate Terms

We used 7839 discharge summary notes (5.4 million words) from the University of Pittsburgh NLP Repository (using these data requires a license) [54], called EHR-Pittsburgh for convenience, for this study. We applied the linguistic filter of the Java Automatic Term Extraction (JATE) toolkit (version 1.11) [55] to EHR-Pittsburgh to extract candidate terms (see step 1 in Figure 1). JATE’s linguistic filter uses a word extractor, a noun phrase extractor, and a stop word list to select high-quality words and noun phrases as candidate terms. We extracted a total of 106,108 candidate terms and further used them to identify and rank medical terms.

Figure 1

Overview of development of the adapted distant supervision (ADS) natural language processing system to rank candidate terms mined from electronic health record (EHR) corpora: data extraction (steps 1 and 2), ADS (step 3), and evaluation (step 4). CHV: consumer health vocabulary.

Consumer Health Vocabulary

CHV was developed by collaborative research to address vocabulary discrepancies between lay people and health care professionals [56-59]. CHV incorporates terms extracted from various consumer health sites, including queries submitted to MedlinePlus, a consumer-oriented online knowledge resource maintained by the US National Library of Medicine [60,61]. CHV contains 152,338 terms, most of which are consumer health terms [60-62]. Zeng et al [60] mapped these terms to UMLS concepts by a semiautomatic approach. As a result of this work, CHV encompasses lay terms (eg, “low blood sugar level” and “heart attack”), as well as corresponding medical terms (eg, “hypoglycemia” and “myocardial infarction”). In this study, we used CHV to create distantly labeled training data for ADS.

Important Terms for Electronic Health Record Comprehension

We defined important terms as those terms that, if understood by the patients, would significantly improve their EHR comprehension. In practice, we used 4 criteria, unithood, termhood, unfamiliarity, and quality of compound term (defined with examples in Multimedia Appendix 2), to judge term importance.

Except for unithood, which is a general criterion for lexical entry selection, the other 3 criteria all measure term importance from the perspective of patient EHR comprehension (details in Multimedia Appendix 2). For example, familiar terms are not important because they are already known by the average patient. High-quality compound terms are those terms whose meanings are beyond the simple sum of their component words (eg, “community-acquired pneumonia”). These terms are important and should be annotated with lay definitions because otherwise patients would not understand them even if they know all the individual words in these terms.

Distant Supervision from Consumer Health Vocabulary

We used CHV to select positive examples to train ADS (see step 2 in Figure 1). Specifically, we assumed that medical terms that occur in both EHRs and CHV (called EHR-CHV terms) are important for patient EHR comprehension. We chose CHV for distant supervision for 3 reasons. First, terms in CHV have been curated and thus all satisfy the unithood criterion. Second, recall that medical terms existing in CHV are synonyms of consumer health terms initially identified from queries and postings generated by patients in online health forums. Therefore, we expect most of these terms to bear clear and significant clinical meanings for patients and thus satisfy the termhood criterion. Third, CHV assigns familiarity scores to 57.89% (88,189 out of 152,338) of its terms for extended usability, which can be used to distinguish between medical terms and lay terms. CHV familiarity scores estimate the likelihood that a term can be understood by an average reader [63] and take values between 0 and 1 (with 1 being most familiar and 0 being least familiar). CHV provides different types of familiarity scores [21]. Following Zeng-Treitler et al [21], we used the combined score and used a heuristic rule (ie, CHV familiarity score ≤0.6) to identify medical terms.

Despite the aforementioned merits, CHV is not perfect in labeling the training data. First, there is not a clear boundary between familiar and unfamiliar terms if their CHV familiarity scores are close to 0.6. For example, “congestive heart failure” and “atypical migraine” have familiarity scores of 0.64 and 0.61; therefore, they would be labeled as negative examples by CHV. However, these 2 terms were judged by domain experts as important terms that need lay definitions. Second, some compound terms in CHV (eg, “knee osteoarthritis,” “brain MRI,” “aspirin allergy”), although labeled as positive examples by CHV, were judged by domain experts as being not high-quality compound terms from the perspective of efficiently expanding a lay language resource and thus did not need immediate treatment for adding lay definitions.

Transfer Learning Algorithms Problem Formalization

Since CHV-labeled training data are noisy, we used transfer learning to adapt the system distantly supervised by CHV to the target-domain task. More formally, we defined the training data derived from CHV as the source-domain data D_S={(x^s₁, y^s₁), (x^s₂, y^s₂), …, (x^s_N, y^s_N)} and the target-domain data as D_T={(x^t₁, y^t₁), (x^t₂, y^t₂), …, (x^t_M, y^t_M)}, where N is the number of source-domain instances, (x^s, y^s) is the paired feature vector and class label of an instance in the source domain, and M and (x^t, y^t) are defined similarly for the target domain. Notice that we refer to CHV-labeled candidate terms as the source-domain data by following the convention of transfer learning, although these terms were extracted from EHRs. In our study, we used all the N source-domain instances and at most K (K « M) target-domain instances to train the model. The goal of transfer learning is to make an optimal use of the N+K training data to improve model performance on the M-K target-domain test data.

In this study, we investigated 2 state-of-the-art transfer learning methods: feature space augmentation (FSA) and supervised distant supervision (SDS).

Feature Space Augmentation

FSA [64] has shown the best performance in semantic role labeling for clinical text [53].

This approach assumes that D_S and D_T share the same feature space X= R^F (ie, each feature vector is an F-dimension real-valued vector) and defines an augmented feature space X⁺= R^3F. It then defines 2 feature mapping functions, Φ_S and Φ_T: X→ X⁺, by Equation 1 (Figure 2) to respectively map feature vectors from D_S and D_T to the augmented feature space. The motivation is to make the learning easier by separating the general features (ie, the first F dimensions in the augmented feature space, which are useful to learn examples in both D_S and D_T) and the domain-specific features (the second and third F dimensions in the augmented feature space). In addition, it allows a single model to regulate jointly the trade-off between the general and domain-specific feature weights.

Figure 2

Equations for feature mapping functions used in feature space augmentation (1), objective function used in supervised distant supervision (2), and average precision (3).

Supervised Distant Supervision

SDS is an extension of the algorithm recently proposed by Wallace et al [47]. It minimizes an objective function that combines empirical source-domain and target-domain errors, as defined in Equation 2 (Figure 2).

Our algorithm differs from that of Wallace et al [47] in that it does not assume that only positive examples in the source domain are unreliable and is therefore more generalizable.

Implementation Issues

We implemented 2 versions of the ADS system, ADS-fsa and ADS-sds, by incorporating the 2 transfer learning algorithms. We used the log-linear model as the base of all the models (including the baseline models introduced in the subsection Baseline Systems) and used L2 regularization for model training. The output from the log-linear models is probabilities of a candidate term being a positive example and can be used to rank candidate terms directly. We used grid search and cross-validation on the target-domain training data to set the hyperparameters α (the corpus weighting parameter in Equation 2; Figure 2) and C (the hyperparameter of the log-linear model to control the regularization strength; a small C corresponds to a strong regularization). In our experiments, we set α=β(K / N) (N and K are the size of the source- and target-domain training data) and searched β in [0.01, 0.1, 1, 10, 100]. We searched C in [1,0.1,0.001,0.0001].

Training and Evaluation Datasets Data Annotation

We derived the training and evaluation datasets from the 106,108 candidate terms extracted from EHR-Pittsburgh as follows.

First, 3 people with a postgraduate level of education in biology, public health, and biomedical informatics reviewed candidate terms among the terms ranked as high by the nonadapted distant supervision model (ie, among the top 10,000 terms) or by the term recognition algorithm C-value [65] (ie, among the top 5000 terms). We chose top-ranked terms, which were likely to contain more important terms than randomly sampled terms, to speed up the whole annotation process. We used the output from 2 methods to increase the diversity of terms used for evaluation and used more terms from the distant supervision model because a manual review suggested that it outperformed C-value. We adopted the expert annotation approach because nonexperts may lack sufficient knowledge to judge the domain relevance and quality of a candidate term, which could potentially introduce noise to the data and slow down the annotation process.

Each term was annotated by 1 primary reviewer and then reviewed by another reviewer based on the 4 criteria introduced in the subsection Important Terms for Electronic Health Record Comprehension (details in Multimedia Appendix 2). Difficult cases were discussed and resolved within the group. Using this procedure, we obtained 6038 annotated terms (3530 positive examples and 2508 negative examples) before starting this study and used all of them for our experiments. To compute the interannotator agreement, 2 reviewers independently annotated 500 candidate terms and achieved a .71 kappa coefficient on this dataset.

Target-Domain Training and Evaluation Sets

We used 1000 examples randomly sampled from the 6038 annotated terms as the target-domain training set and used the remaining 5038 terms as the evaluation set. We did not use stratified sampling because in practice we did not know the class distribution of the target-domain data or the test data. In transfer learning, the target-domain training data are critical to system performance. Therefore, we repeated the above procedure 100 times to obtain 100 pairs of <target training set, evaluation set> for system evaluation to take into account the variance of the target training set. To test the effects of the size of the target-domain training data, we reported system performance by using L (L=100, 200, 500, 1000) examples randomly selected from the full target training set.

Source-Domain Training Set

We first obtained 100,070 terms by removing the 6038 manually labeled terms from the 106,108 candidate terms. We then automatically labeled the 100,070 terms based on whether a term was an EHR-CHV medical term (ie, positive term) or not (ie, negative term). In this way, we obtained 4166 positive terms and 95,904 negative terms. Because we did not know the distribution of the target-domain data, we randomly sampled 3000 positive and 3000 negative terms from these data to form a balanced source-domain training set. We set the size of the source training set to 6000 by following previous work [66].

Baseline Systems

We employed 2 baselines commonly used to evaluate transfer learning methods [47,53,64]: SourceOnly or nonadapted distant supervision model, which was trained by using only source-domain training data, and TargetOnly, which was trained by using only target-domain training data.

Features Word Embedding

Word embedding is the distributed vector representation of words. It has emerged as a powerful technique for word representation and proved beneficial in a variety of biomedical and clinical NLP tasks. We used word2vec software to create the skip-gram word embeddings [67,68] and trained word2vec using a combined text corpus (over 3 billion words) of English Wikipedia, articles from PubMed Central Open Access Subset, and 99,735 EHR notes from the University of Pittsburgh NLP Repository [54]. We set the training parameters by following Jagannatha et al [37] and Pyysalo et al [69]. Specifically, we used 200-dimension vectors with a window size of 6 and used hierarchical soft-max with a subsampling threshold of 0.001 for training. We represented multiword terms (ie, compound terms) by the mean of the vectors of their component words by following Jagannatha et al [37] and Chen and colleagues [38,39].

Semantic Type

We mapped candidate terms to UMLS concepts and included semantic types for those concepts that had an exact match or a head-noun match as features. Each semantic type is a 0-1 binary feature. This type of feature has been used to identify domain-specific medical terms [23,33] and to rank medical terms from individual EHR notes [38].

Automatic Term Recognition

We used the confidence scores from 2 term-recognition algorithms: corpus-level term frequency-inverse document frequency [55] and C-value [65].

General-Domain Term Frequency

We generated 4 features from the Google Ngram corpus [70]: the average, minimum, and maximum frequencies of a term’s component words and the term frequency. Corpus frequency has proved to be a strong indicator for term familiarity [63,71]. The Google Ngram corpus is a database of unigram and n-gram counts of words collected from over 15 million books containing over 5 billion pages. We used the top 4.4 million high-frequency words from this corpus and their unigram, bigram, and trigram matches to derive our features.

Term Length

Term length is the number of words in a term. Because a long candidate term may not be a good compound term but rather a simple concatenation of shorter terms (eg, “left heart cardiac catheterization”), this feature may help the ADS system to identify and rank as low the low-quality compound terms.

Evaluation Metrics Average Precision

This metric averages precision P(k) at rank k as a function of recall r, as defined in Equation 3 (Figure 2).

Area Under the Receiver Operating Characteristic Curve

The area under the receiver operating characteristic curve (AUC-ROC) is computed; this curve plots the true positive rate (y-coordinate) against the false positive rate (x-coordinate) at various threshold settings.

Recall that we have 100 pairs of <target training set, evaluation set> randomly sampled from the 6038 labeled terms. When evaluating a system, we averaged its performance scores on the 100 pairs of datasets and report the averaged values.

We used sklearn.metrics to compute the average precision and AUC-ROC scores. Scikit-learn is an open source Python library widely used for machine learning [72]. In this study, we only reported the paired-samples t test results for performance difference between the ADS systems and the baselines because the baselines were expected to be better than a random classifier. The AUC-ROC score of each individual system tested in our experiments was significantly better than 0.5—that is, the AUC-ROC score of a random classifier (P<.001).

Statistical Analysis

We used the paired-samples t test to test the significance of the performance difference between a pair of systems. We used scipy.stats to conduct the paired t test. SciPy is an open source Python library widely used for scientific computing [73].

Results ADS Ranking Performance on Evaluation Set

Table 1 shows the evaluation results, where the 2 ADS systems outperformed the 2 baselines significantly (t₉₉ ranges from 4.84 to 133.31, P<.001) for AUC-ROC and average precision under all 4 conditions (ie, using 4 different sizes of target training data). The performance scores of the ADS systems continuously improved with increased size of target training data. When comparing the 2 ADS systems, ADS-fsa performed significantly better than ADS-sds when using 1000 target-domain training examples for transfer learning and performed worse than ADS-sds when using 100 or 200 target-domain training examples (see bottom 2 rows in Table 1 for t and P values).

Table 1

Performance of different natural language processing systems on the evaluation set under 4 conditions using 100, 200, 500, and 1000 target-domain training examples^a.

System		AUC-ROC^b				Average precision
		100	200	500	1000	100	200	500	1000
SourceOnly		0.739	0.739	0.739	0.739	0.811	0.811	0.811	0.811
TargetOnly		0.728	0.749	0.769	0.782	0.799	0.816	0.833	0.844
ADS-fsa^c		0.746	0.756	0.776	0.790	0.815	0.823	0.839	0.850
ADS-sds^d		0.751	0.759	0.775	0.786	0.819	0.826	0.838	0.847
ADS-fsa vs ADS-sds^e
	t₉₉	4.25	2.79		8.78	3.81	3.04		11.58
	P values	<.001	.01		<.001	<.001	.003		<.001

^aThe highest performance scores are italicized.

^bAUC-ROC: area under the receiver operating characteristic curve.

^cADS-fsa: adapted distant supervision-feature space augmentation.

^dADS-sds: adapted distant supervision-supervised distant supervision.

^eThe P values for difference between ADS-fsa and SourceOnly, ADS-sds and SourceOnly, ADS-fsa and TargetOnly, and ADS-sds and TargetOnly are <.001 (t₉₉ ranges from 4.84 to 133.31) for all metrics under all conditions. We report the P values (if the P value ≤.05) and the corresponding t₉₉ values for difference between ADS-fsa and ADS-sds.

The average familiarity level or score of top-ranked terms measures one important aspect of ranking quality. However, because many terms in the evaluation set did not have CHV familiarity scores, we could not compute this value directly. A manual review of the top 500 terms ranked by the best system—that is, ADS-fsa trained using 1000 target-domain training examples—did find many unfamiliar medical terms, including “autoimmune enteropathy,” “ileostomy,” “myasthenia gravis,” “nifedipine,” “parathyroid hormone,” and “phototherapy.”

Effects of Individual Features on ADS Ranking Performance

In addition to evaluating system performance, we tested the contribution of each individual feature to system performance by using feature ablation experiments. Table 2 shows that ADS-sds’s performance dropped significantly (P<.001 for both measures under all 4 conditions) when respectively dropping word embedding, general-domain term frequency, and term length. Dropping the semantic features had mixed results, slightly decreasing performance when the target-domain training set was large and increasing performance when the target-domain training set was small. Dropping features derived from automatic term recognition had no statistically significant effects. The effects of dropping individual features on ADS-fsa’s performance were similar (see the first table in Multimedia Appendix 3).

Table 2

Performance of different ADS-sds^a systems implemented by using all types of features or by dropping each individual type of feature, under 4 conditions using 100, 200, 500, and 1000 target-domain training examples^b.

ADS-sds system		AUC-ROC^c				Average precision
		100	200	500	1000	100	200	500	1000
ADS-sds-ALL^d		0.751	0.759	0.775	0.786	0.819	0.826	0.838	0.847
ADS-sds-woWE^e		0.711	0.718	0.726	0.733	0.780	0.785	0.793	0.799
ADS-sds-woWE vs ADS-sds-ALL
	t₉₉	30.37	32.74	59.92	112.25	36.61	39.63	81.04	124.15
	P value	<.001	<.001	<.001	<.001	<.001	<.001	<.001	<.001
ADS-sds-woSem^f		0.753	0.760	0.772	0.782	0.823	0.829	0.838	0.845
ADS-sds-woSem vs ADS-sds-ALL
	t₉₉			4.63	12.28	3.18	4.00		4.55
	P value			<.001	<.001	.002	<.001		<.001
ADS-sds-woATR^g		0.751	0.759	0.774	0.786	0.819	0.826	0.838	0.847
ADS-sds-woGTF^h		0.740	0.749	0.765	0.777	0.813	0.821	0.833	0.842
ADS-sds-woGTF vs ADS-sds-ALL
	t₉₉	13.04	9.50	14.85	22.55	8.12	6.49	11.52	23.07
	P value	<.001	<.001	<.001	<.001	<.001	<.001	<.001	<.001
ADS-sds-woTLⁱ		0.741	0.751	0.767	0.778	0.807	0.815	0.829	0.838
ADS-sds-woTL vs ADS-sds-ALL
	t₉₉	11.21	10.81	19.78	25.58	16.43	17.15	34.50	41.72
	P value	<.001	<.001	<.001	<.001	<.001	<.001	<.001	<.001

^aADS-sds: adapted distant supervision-supervised distant supervision.

^bWe report the P values (if the P value ≤.05) and the corresponding t₉₉ values for differences between each implementation and ADS-sds-ALL.

^cAUC-ROC: area under the receiver operating characteristic curve.

^dADS-sds-ALL: ADS-sds with all types of features.

^eADS-sds-woWE: ADS-sds without word embedding.

^fADS-sds-woSem: ADS-sds without semantic features.

^gADS-sds-woATR: ADS-sds without features derived from automatic term recognition.

^hADS-sds-woGTF: ADS-sds without general-domain term frequency.

ⁱADS-sds-woTL: ADS-sds without term length.

Discussion Principal Results

In an effort to build a lexical resource that provides lay definitions for medical terms in EHRs, we developed the ADS system to rank candidate terms mined from an EHR corpus and prioritized our efforts to collect and curate lay definitions for top-ranked terms. Given only 100 labeled target training examples, the best ADS system, ADS-sds, achieved 0.751 AUC-ROC and 0.819 average precision on the evaluation set, which are significantly better (P<.001) than the corresponding performance scores of supervised learning (Table 1, ADS-sds vs TargetOnly). When using 1000 target-domain training examples, the best ADS system, ADS-fsa, achieved 0.790 AUC-ROC and 0.850 average precision, also significantly better (P<.001) than that achieved by supervised learning (Table 1, ADS-fsa vs TargetOnly).

Our evaluation set was challenging, because terms included in this set had been prefiltered (ie, ranked as high) by 2 term-ranking methods (details in the Training and Evaluation Datasets subsection). In other words, we evaluated ADS on a set of candidate terms that had higher quality than the average candidate terms mined from EHRs, for which the boundaries between positive and negative examples were more subtle. For example, some candidate terms (eg, “metastatic carcinoid tumor,” “normal serum calcium,” and “acute cardiac ischemia”), although registered as medical terms in UMLS, were judged nonimportant or nonurgent for lay definition creation because their meanings could be easily inferred from their component words.

The evaluation results on this dataset suggest that our ADS system is effective in ranking EHR terms and can be used to facilitate the expansion of lexical resources that support EHR comprehension. In particular, it can be used to alleviate the data sparseness problem when there are very few target-domain training data and can be used to boost the performance of supervised learning when the size of the training data increases.

Effects of Target-Domain Training Data

Our evaluation results also suggested that using more target-domain training data is beneficial for system performance (rows 2-4 in Table 1). In an additional experiment (details in Multimedia Appendix 4), we found that the performance of ADS-fsa, the best system when using 1000 target training data, continued to improve with increased target training data and began to plateau when the number of target training examples reached 2500.

Effects of Individual Features

The results of our feature ablation experiment (Table 2) indicate that word embedding contributes mostly to system performance, followed by general-domain term frequency and term length. Although dropping semantic features had mixed effects, the results from further analysis indicate that semantic features are useful when excluding word embedding from the feature set. Specifically, adding semantic features on the 3 other types of features (ie, automatic term recognition, general-domain term frequency, and term length) significantly improved system performance (t₉₉ ranges from 12.74 to 128.11, P<.001 for 2 measures under 4 conditions; see the second table in Multimedia Appendix 3 for details). This suggests that most information provided by the semantic features for ranking terms is subsumed by that provided by word embedding (but not vice versa). Different from the semantic features, the automatic term recognition features had little additional effect on the performance even without counting word embedding. A likely reason is that our evaluation data set was created by including terms already ranked as high (top 5%) by the automatic term recognition algorithm C-value [65], which may have diminished the effect of this type of feature on this dataset.

Comparing Different Transfer Learning Methods

Although ADS-fsa and ADS-sds were both effective in ranking EHR terms (Table 1), ADS-fsa had small gains over ADS-sds when the size of target training data was large (1000 examples) and vice versa when the size of the target training data was small (100 and 200 examples). The 2 systems used different methods, SDS and FSA, to balance the source- and target-domain training data. Specifically, SDS allows fine-grained weighting of training data from source and target domains at the instance level; FSA, by using an augmented feature space, allows redistribution of feature weights for source, target, and “shared” domains. Our results suggest that instance weighting (ie, ADS-sds) can be more effective when the target-domain training data are very limited.

Error Analysis

We identified three major types of errors through error analysis on the top-rank and low-rank terms (using 300 as the rank threshold) that were ranked by the ADS-sds system that used 1000 target-domain training examples for transfer learning. Error analysis for ADS-fsa showed similar results. First, we found that most errors were caused by compound terms. Specifically, ADS-sds ranked some terms (such as “malignant cell,” “chronic rhinitis,” and “viral bronchitis”) as high, even though their meanings could be easily inferred from their component words. It also ranked certain good compound terms (eg, “community-acquired pneumonia,” “end-stage kidney failure,” and “left ventricular ejection fraction”) as low when these terms contained familiar words. This suggests that advanced features generated by a compound term detector may improve the system’s performance, which we may explore in the future. Second, ADS-sds missed certain terms that are lay terms in the general domain but bear unfamiliar clinical meanings (eg, “baseline,” “vehicle,” and “family history”). Third, ADS-sds ranked some common medical terms (eg, “aspirin,” “vitamin,” and “nerve”) as high, although these terms are likely to be already known by the average patient. The second and third types of errors may be reduced by including domain-specific knowledge about term familiarity as additional features, which we will study in the future.

Conclusion

We report a novel ADS system for ranking and identifying medical terms important for patient EHR comprehension. We empirically show that the ADS system outperforms strong baselines, including supervised learning, and transfer learning can effectively boost its performance even with only 100 target-domain training examples. The EHR terms prioritized by our model have been used to expand a comprehensive lay language lexical resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.

Multimedia Appendix 1

Analysis results of consumer health vocabulary’s coverage of terms in electronic health record notes.

Multimedia Appendix 2

Criteria used for manual selection of terms important for patient comprehension of electronic health record notes.

Multimedia Appendix 3

Effects of features on performance of adapted distant supervision.

Multimedia Appendix 4

Effects of increasing target-domain training data on system performance.

Abbreviations

ADS

adapted distant supervision

AUC-ROC

area under the receiver operating characteristic curve

CHV

consumer health vocabulary

EHR

electronic health record

FSA

feature space augmentation

JATE

Java Automatic Term Extraction

NLP

natural language processing

SDS

supervised distant supervision

UMLS

Unified Medical Language System

This work was supported by the Institutional National Research Service Award (T32) 5T32HL120823-02 from the US National Institutes of Health (NIH) and the Health Services Research & Development Service of the US Department of Veterans Affairs Investigator Initiated Research (1I01HX001457-01). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH, the US Department of Veterans Affairs, or the US Government.

We thank Weisong Liu, Elaine Freund, Emily Druhl, and Victoria Wang for technical support in data collection. We also thank the anonymous reviewers for their constructive comments and suggestions.

HY and JC designed the study. JC and ANJ collected the data. JC designed and developed the ADS system, conducted the experiments, and drafted the manuscript. ANJ contributed substantially to feature generation for ADS. HY and SJF provided important intellectual input into system evaluation and content organization. All authors contributed to the writing and revision of the manuscript.

None declared.

Steinbrook

Health care and the American Recovery and Reinvestment Act

N Engl J Med 2009 03 12 360 11 1057 60

10.1056/NEJMp0900665

19224738

NEJMp0900665

Wright

Feblowitz

Samal

McCoy

Sittig

The Medicare Electronic Health Record Incentive Program: provider performance on core and menu measures

Health Serv Res 2014 02 49 1 Pt 2 325 46

10.1111/1475-6773.12134

24359554

PMC3925405

Irizarry

DeVito

Curran

Patient portals and patient engagement: a state of the science review

J Med Internet Res 2015 17 6 e148

10.2196/jmir.4255

26104044

v17i6e148

PMC4526960

Delbanco

Walker

Bell

Darer

Elmore

Farag

Feldman

Mejilla

Ngo

Ralston

Ross

Trivedi

Vodicka

Leveille

Inviting patients to read their doctors' notes: a quasi-experimental study and a look ahead

Ann Intern Med 2012 10 2 157 7 461 70

10.7326/0003-4819-157-7-201210020-00002

23027317

1363511

PMC3908866

Nazi

Hogan

McInnes

Woods

Graham

Evaluating patient access to electronic health records: results from a survey of veterans

Med Care 2013 03 51 3 Suppl 1 S52 6

10.1097/MLR.0b013e31827808db

23407012

00005650-201303001-00011

Woods

Schwartz

Tuepker

Press

Nazi

Turvey

Nichol

Patient experiences with full electronic access to health records and clinical notes through the My HealtheVet Personal Health Record Pilot: qualitative study

J Med Internet Res 2013 15 3 e65

10.2196/jmir.2356

23535584

v15i3e65

PMC3636169

Pyper

Amery

Watson

Crook

Patients' experiences when accessing their on-line electronic patient records in primary care

Br J Gen Pract 2004 01 54 498 38 43

14965405

PMC1314776

Keselman

Slaughter

Smith

Kim

Divita

Browne

Tsai

Zeng-Treitler

Towards consumer-friendly PHRs: patients' experience with reviewing their health records

AMIA Annu Symp Proc 2007 399 403

18693866

PMC2655877

Chapman

Abraham

Jenkins

Fallowfield

Lay understanding of terms used in cancer consultations

Psychooncology 2003 09 12 6 557 66

10.1002/pon.673

12923796

Lerner

Jehle

Janicke

Moscati

Medical communication: do our patients understand?

Am J Emerg Med 2000 11 18 7 764 6

10.1053/ajem.2000.18040

11103725

S0735-6757(00)39827-8

Jones

McGhee

Patient on-line access to medical records in general practice

Health Bull (Edinb) 1992 03 50 2 143 50

1517087

Baldry

Cheal

Fisher

Gillett

Huet

Giving patients their own records in general practice: experience of patients and staff

Br Med J (Clin Res Ed) 1986 03 1 292 6520 596 8

3081187

PMC1339574

Kutner

Greenberg

Jin

Paulsen

The health literacy of America's adults: results from the 2003 National Assessment of Adult Literacy (NCES 2006-483) 2006 09

2016-07-15

Washington, DC

Department of Education, National Center for Educational Statistics (NCES)

http://nces.ed.gov/pubs2006/2006483.pdf

Sarkar

Karter

Liu

Adler

Nguyen

Lopez

Schillinger

The literacy divide: health literacy and the use of an internet-based patient portal in an integrated health system-results from the diabetes study of northern California (DISTANCE)

J Health Commun 2010 15 Suppl 2 183 96

10.1080/10810730.2010.499988

20845203

926943837

PMC3014858

Zarcadoolas

Vaughon

Czaja

Levy

Rockoff

Consumers' perceptions of patient-accessible electronic medical records

J Med Internet Res 2013 15 8 e168

10.2196/jmir.2507

23978618

v15i8e168

PMC3758049

Tieu

Sarkar

Schillinger

Ralston

Ratanawongsa

Pasick

Lyles

Barriers and facilitators to online portal use among patients and caregivers in a safety net health care system: a qualitative study

J Med Internet Res 2015 17 12 e275

10.2196/jmir.4847

26681155

v17i12e275

PMC4704882

Mackert

Mabry-Flynn

Champlin

Donovan

Pounders

Health literacy and health information technology adoption: the potential for a new digital divide

J Med Internet Res 2016 10 04 18 10 e264

10.2196/jmir.6349

27702738

v18i10e264

PMC5069402

Aronson

Lang

An overview of MetaMap: historical perspective and recent advances

J Am Med Inform Assoc 2010 17 3 229 36

10.1136/jamia.2009.002733

20442139

17/3/229

PMC2995713

McCray

Promoting health literacy

J Am Med Inform Assoc 2005 12 2 152 63

10.1197/jamia.M1687

15561782

M1687

PMC551547

Keselman

Logan

Smith

Leroy

Zeng-Treitler

Developing informatics tools and strategies for consumer-centered health communication

J Am Med Inform Assoc 2008 15 4 473 83

10.1197/jamia.M2744

18436895

M2744

PMC2442255

Zeng-Treitler

Goryachev

Kim

Keselman

Rosendale

Making texts in electronic health records comprehensible to consumers: a prototype translator

AMIA Annu Symp Proc 2007 846 50

18693956

PMC2655860

Kandula

Curtis

Zeng-Treitler

A semantic and syntactic text simplification tool for health content

AMIA Annu Symp Proc 2010 2010 366 70

21347002

PMC3041424

Polepalli

Houston

Brandt

Fang

Improving patients' electronic health record comprehension with NoteAid

Stud Health Technol Inform 2013 192 714 8

23920650

Neter

Brainin

eHealth literacy: extending the digital divide to the realm of health information

J Med Internet Res 2012 01 14 1 e19

10.2196/jmir.1619

22357448

v14i1e19

PMC3374546

Connolly

Crosby

Examining e-Health literacy and the digital divide in an underserved population in Hawai'i

Hawaii J Med Public Health 2014 02 73 2 44 8

24567867

PMC3931409

Diviani

van den Putte

Giani

van Weert

Low health literacy and evaluation of online health information: a systematic review of the literature

J Med Internet Res 2015 17 5 e112

10.2196/jmir.4018

25953147

v17i5e112

PMC4468598

Sbaffi

Rowley

Trust and credibility in web-based health information: a review and agenda for future research

J Med Internet Res 2017 06 19 19 6 e218

10.2196/jmir.7579

28630033

v19i6e218

Graber

Roller

Kaeble

Readability levels of patient education material on the World Wide Web

J Fam Pract 1999 01 48 1 58 61

9934385

Berland

Elliott

Morales

Algazy

Kravitz

Broder

Kanouse

Muñoz

Puyol

Lara

Watkins

Yang

McGlynn

Health information on the Internet: accessibility, quality, and readability in English and Spanish

JAMA 2001 285 20 2612 21

11368735

joc02274

PMC4182102

Kasabwala

Agarwal

Hansberry

Baredes

Eloy

Readability assessment of patient education materials from the American Academy of Otolaryngology--Head and Neck Surgery Foundation

Otolaryngol Head Neck Surg 2012 09 147 3 466 71

10.1177/0194599812442783

22473833

0194599812442783

Zeng

Tse

Exploring and developing consumer health vocabularies

J Am Med Inform Assoc 2006 13 1 24 9

10.1197/jamia.M1761

16221948

M1761

PMC1380193

Zeng

Tse

Divita

Keselman

Crowell

Browne

Goryachev

Ngo

Term identification methods for consumer health vocabulary development

J Med Internet Res 2007 02 28 9 1 e4

10.2196/jmir.9.1.e4

17478413

v9i1e4

PMC1874512

Spasić

Schober

Sansone

Rebholz-Schuhmann

Kell

Paton

Facilitating the development of controlled vocabularies for metabolomics technologies with text mining

BMC Bioinformatics 2008 04 29 9 Suppl 5 S5

10.1186/1471-2105-9-S5-S5

18460187

1471-2105-9-S5-S5

PMC2367623

Doing-Harris

Zeng-Treitler

Computer-assisted update of a consumer health vocabulary through mining of social network data

J Med Internet Res 2011 05 17 13 2 e37

10.2196/jmir.1636

21586386

v13i2e37

PMC3221384

Ahltorp

Skeppstedt

Kitajima

Henriksson

Rzepka

Araki

Expansion of medical vocabularies using distributional semantics on Japanese patient blogs

J Biomed Semantics 2016 09 26 7 1 58

10.1186/s13326-016-0093-x

27671202

10.1186/s13326-016-0093-x

PMC5037651

Henriksson

Conway

Duneld

Chapman

Identifying synonymy between SNOMED clinical terms of varying length using distributional analysis of electronic health records

AMIA Annu Symp Proc 2013 2013 600 9

24551362

PMC3900203

Jagannatha

Chen

Mining and ranking biomedical synonym candidates from Wikipedia

2015

Sixth International Workshop on Health Text Mining and Information Analysis (Louhi)

Sep 17-21, 2015

Lisbon, Portugal

142 151

Chen

Zheng

Finding important terms for patients in their electronic health records: a learning-to-rank approach using expert annotations

JMIR Med Inform 2016 11 30 4 4 e40

10.2196/medinform.6373

27903489

v4i4e40

PMC5156821

Chen

Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients

J Biomed Inform 2017 04 68 121 131

10.1016/j.jbi.2017.02.016

28267590

S1532-0464(17)30045-X

PMC5505865

Craven

Kumlien

Constructing biological knowledge bases by extracting information from text sources

Proc Int Conf Intell Syst Mol Biol 1999 77 86

10786289

Morgan

Hirschman

Colosimo

Yeh

Colombe

Gene name identification and normalization using a model organism database

J Biomed Inform 2004 12 37 6 396 410

10.1016/j.jbi.2004.08.010

15542014

S1532-0464(04)00087-5

Mintz

Bills

Snow

Jurafsky

Distant supervision for relation extraction without labeled data

2009

Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2

Aug 2-7, 2009

Suntec, Singapore

1003 1011

Usami

Cho

Okazaki

Tsujii

Automatic acquisition of huge training data for bio-medical named entity recognition

2011

2011 Workshop on Biomedical Natural Language Processing

Jun 23-24, 2011

Portland, OR, USA

65 73

Buyko

Beisswanger

Hahn

The extraction of pharmacogenetic and pharmacogenomic relations--a case study using PharmGKB

Pac Symp Biocomput 2012 376 87

22174293

9789814366496_0037

Bobic

Klinger

Thomas

Hofmann-Apitius

Improving distantly supervised extraction of drug-drugprotein-protein interactions

2012

Joint Workshop on UnsupervisedSemi-Supervised Learning in NLP

Apr 23-27, 2012

Avignon, France

35 43

Liu

Ling

Relation extraction from biomedical literature with minimal supervision and grouping strategy

2014

2014 IEEE International Conference on Bioinformatics and Biomedicine

Nov 2-5, 2014

Belfast, UK

444 449

10.1109/BIBM.2014.6999198

Wallace

Kuiper

Sharma

Zhu

Marshall

Extracting PICO sentences from clinical trial reports using supervised distant supervision

J Mach Learn Res 2016 17

27746703

PMC5065023

Pan

Yang

A survey on transfer learning

IEEE Trans Knowl Data Eng 2010 10 22 10 1345 1359

10.1109/TKDE.2009.191

Yang

A survey of transfer and multitask learning in bioinformatics

J Comput Sci Eng 2011 5 3 257 268

10.5626/JCSE.2011.5.3.257

Ferraro

Daumé

Duvall

Chapman

Harkema

Haug

Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

J Am Med Inform Assoc 2013 20 5 931 9

10.1136/amiajnl-2012-001453

23486109

amiajnl-2012-001453

PMC3756264

Zheng

Identifying key concepts from EHR notes using domain adaptation

2015

Sixth International Workshop on Health Text Mining and Information Analysis (Louhi)

Sep 17-21, 2015

Lisbon, Portugal

115 119

Dahlmeier

Domain adaptation for semantic role labeling in the biomedical domain

Bioinformatics 2010 04 15 26 8 1098 104

10.1093/bioinformatics/btq075

20179074

btq075

Zhang

Tang

Jiang

Wang

Domain adaptation for semantic role labeling of clinical text

J Am Med Inform Assoc 2015 09 22 5 967 79

10.1093/jamia/ocu048

26063745

ocu048

PMC4986662

Mowery

Wiebe

Visweswaran

Harkema

Chapman

Building an automated SOAP classifier for emergency department reports

J Biomed Inform 2012 02 45 1 71 81

10.1016/j.jbi.2011.08.020

21925286

S1532-0464(11)00147-X

PMC3267853

Zhang

Iria

Brewster

Ciravegna

A comparative evaluation of term recognition algorithms

2008

LREC 2008: Sixth International Conference on Language Resources Evaluation

May 26-Jun 1, 2008

Marrakech, Morocco

2108 2113

McCray

Loane

Browne

Bangalore

Terminology issues in user access to Web-based medical information

Proc AMIA Symp 1999 107 11

10566330

D005626

PMC2232498

Zeng

Kogan

Ash

Greenes

Patient and clinician vocabulary: how different are they?

Stud Health Technol Inform 2001 84 Pt 1 399 403

11604772

Patrick

Monga

Sievert

Houston

Longo

Evaluation of controlled vocabulary resources for development of a consumer entry vocabulary for diabetes

J Med Internet Res 2001 3 3 e24

10.2196/jmir.3.3.e24

11720966

PMC1761907

Zeng

Kogan

Ash

Greenes

Boxwala

Characteristics of consumer terminology for health information retrieval

Methods Inf Med 2002 41 4 289 98

12425240

02040289

Zeng

Tse

Crowell

Divita

Roth

Browne

Identifying consumer-friendly display (CFD) names for health concepts

AMIA Annu Symp Proc 2005 859 63

16779162

58480

PMC1560732

Keselman

Smith

Divita

Kim

Browne

Leroy

Zeng-Treitler

Consumer health concepts that do not map to the UMLS: where do they fit?

J Am Med Inform Assoc 2008 15 4 496 505

10.1197/jamia.M2599

18436906

M2599

PMC2442253

Tse

Soergel

Exploring medical expressions used by consumers and the media: an emerging view of consumer health vocabularies

AMIA Annu Symp Proc 2003 674 8

14728258

D030002918

PMC1479921

Zeng

Kim

Crowell

Tse

A text corpora-based estimation of the familiarity of health terminology

2005

Sixth International Conference on Biological and Medical Data Analysis

Nov 10-11, 2005

Aveiro, Portugal

184 192

10.1007/11573067_19

Daume

Frustratingly easy domain adaption

2004

45th Annual Meeting of the Association of Computational Linguistics

Jun 23-30, 2007

Prague, Czech Republic

256 263

Frantzi

Ananiadou

Mima

Automatic recognition of multi-word terms: the C-value/NC-value method

Int J Digit Libr 2000 8 1 3 2 115 130

10.1007/s007999900023

Jiang

Zhai

Instance weighting for domain adaption in NLP

2007

45th Annual Meeting of the Association of Computational Linguistics

Jun 23-30, 2007

Prague, Czech Republic

264 271

Mikolov

Sutskever

Chen

Corrado

Dean

Distributed representations of words and phrases and their compositionality

2013

Advances in Neural Information Processing Systems (NIPS 2013)

Dec 5-10, 2013

Lake Tahoe, NV, USA

3111 3119

Mikolov

Chen

Corrado

Dean

ArXiv13013781 Cs 2013 9 7

2017-06-18

Efficient estimation of word representations in vector space http://arxiv.org/abs/1301.3781

Pyysalo

Ginter

Moen

Salakoski

Ananiadou

Distributional semantics resources for biomedical text processing

2013

The 5th International Symposium on Languages in Biology and Medicine (LBM 2013)

December 12-13, 2013

Tokyo, Japan

39 43

Michel

Shen

Aiden

Veres

Gray

Google Books Team Pickett

Hoiberg

Clancy

Norvig

Orwant

Pinker

Nowak

Aiden

Quantitative analysis of culture using millions of digitized books

Science 2011 01 14 331 6014 176 82

10.1126/science.1199644

21163965

science.1199644

PMC3279742

Elhadad

Comprehending technical texts: predicting and defining unfamiliar terms

AMIA Annu Symp Proc 2006 239 43

17238339

86239

PMC1839621

Pedregosa

Varoquaux

Gramfort

Michel

Thirion

Grisel

Blondel

Prettenhofer

Weiss

Dubourg

Scikit-learn: machine learning in Python

J Mach Learn Res 2011 12 Oct 2825 2830

Jones

Oliphant

Peterson

SciPy: Open source scientific tools for Python. 2001-. SciPy.org 2017

2017-10-20

SciPy developers

https://www.scipy.org/