Published on 27.09.18 in Vol 6, No 3 (2018): Jul-Sep

Preprints (earlier versions) of this paper are available, first published May 12, 2018.

    Original Paper

    Extraction and Standardization of Patient Complaints from Electronic Medication Histories for Pharmacovigilance: Natural Language Processing Analysis in Japanese


    Background: Despite the growing number of studies using natural language processing for pharmacovigilance, there are few reports on manipulating free text patient information in Japanese.

    Objective: This study aimed to establish a method of extracting and standardizing patient complaints from electronic medication histories accumulated in a Japanese community pharmacy for the detection of possible adverse drug event (ADE) signals.

    Methods: Subjective information included in electronic medication history data provided by a Japanese pharmacy operating in Hiroshima, Japan from September 1, 2015 to August 31, 2016, was used as patients’ complaints. We formulated search rules based on morphological analysis and daily (nonmedical) speech and developed a system that automatically executes the search rules and annotates free text data with International Classification of Diseases, Tenth Revision (ICD-10) codes. The performance of the system was evaluated through comparisons with data manually annotated by health care workers for a data set of 5000 complaints.

    Results: Of 5000 complaints, the system annotated 2236 complaints with ICD-10 codes, whereas health care workers annotated 2348 complaints. The system and manual annotations matched for 1480 complaints. The system achieved a precision of .66, a recall of .63, and an F-measure of .65.

    Conclusions: Our results suggest that the system may be helpful in extracting and standardizing patients’ speech related to symptoms from massive amounts of free text data, replacing manual work. After improving the extraction accuracy, we expect to utilize this system to detect signals of possible ADEs from patients’ complaints in the future.

    JMIR Med Inform 2018;6(3):e11021





    Introduction

    Adverse drug events (ADEs) are any untoward injuries resulting from the use of a drug [1]. They occur in around 18% of inpatients [1-4] and are a significant burden on health care and society. ADEs are a cause of morbidity and mortality, and the associated economic loss is estimated at US $177.4 billion annually in the United States [5]. In the field of pharmacovigilance, postmarketing surveillance such as spontaneous reporting is important for the detection of ADEs because clinical trials have limitations including patient sample size, population, and administration period [6].

    The need to understand patients’ subjective complaints and to use other sources in pharmacovigilance has increased. Unlike health care providers, patients use various expressions and terminology to describe their situations. Direct reporting from patients is helpful in understanding their detailed symptoms and impacts on quality of life, which medical professionals tend to overlook [7-9]. For example, analysis of the content of comments posted on patients’ online community pages revealed unknown long-term symptoms of antidepressant withdrawal [10]. The Maintenance and Support Services Organization developed the Patient-Friendly Term List [11] based on the most frequent ADEs reported by patients and consumers to facilitate direct patient reporting of ADEs to regulators and the pharmaceutical industry. Despite their importance, patient records had been little explored until recently because their unstructured format makes them time-consuming to analyze.

    Natural language processing (NLP) is the automatic manipulation of natural language such as narrative text and speech for extraction and structuring [12]. Numerous attempts have been made to use NLP in electronic health records (EHRs), social media, medical literature, or existing reporting systems [13-29]. Those studies found that NLP could identify various points for the assessment of medications (eg, inactive medication, nonadherence, patients’ mentions of ADEs).

    In Japan, text analysis and automated detection of medical events from EHRs have been reported [30], and a tool for disease entity encoding was developed [31]. However, these 2 studies intended only to manipulate clinical text provided by health care professionals using medical terminology. No previous study dealt with patients’ complaints in their own words in Japanese.

    Prior Work

    Nikfarjam et al [25] introduced a machine learning-based extraction system using conditional random fields (CRFs) for user posts on DailyStrength (precision: .86, recall: .78, F-measure: .82) and Twitter (precision: .76, recall: .68, F-measure: .72) to detect adverse drug reaction (ADR) signals. Freifeld et al [28] classified Twitter posts (precision: .72, recall: .72) to compare product-event pairs with the US Food and Drug Administration Adverse Event Reporting System (FAERS) data.

    In the mining of patients’ reports, Topaz et al [26] used a linguistic-based approach comparing EHRs (clinicians’ reports) and social media (patients’ mentions) for 2 common drugs. White et al [27] used search log data to identify ADE signals; a comparison with FAERS data showed high concordance, with an area under the receiver operating characteristic curve of .82. Denecke et al [29] collected data from multiple media sites with keyword lists and classified texts as relevant/irrelevant using support vector machines.

    Although no previous study has addressed patients’ complaints in Japanese, Aramaki et al [30] reported on a system to extract medical event information from Japanese EHRs based on CRFs (precision: .85, recall: .77, F-measure: .81). The text source in their study was written in medical terminology, mainly by physicians. No lexicon for standardizing patients’ informal expressions, comparable to the Patient-Friendly Term List [11] or the work of Freifeld et al [28], has been published in Japanese.

    Study Aim

    This study aimed to establish a method for extracting and standardizing patient complaints from electronic medication history data (EMHD) accumulated in a Japanese community pharmacy for the detection of possible ADE signals.


    Methods

    Concept of the System

    We propose a system that automatically extracts and standardizes patient complaints (Figure 1). In this system, the input is the subjective information included in the medication histories collected from a pharmacy, and the output is the same data with International Classification of Diseases, Tenth Revision (ICD-10) codes attached to patient expressions. A dictionary-based method was adopted for extraction and standardization. The processing steps in the system are as follows. First, morphological analysis is performed on the input data. Next, the search rules are applied to the segmented data. Each line of the search rules describes a combination of morphemes from everyday expressions and the corresponding ICD-10 code, and exclusion rules are set for some ICD-10 codes. When a patient expression satisfies a search rule, the corresponding ICD-10 code is assigned. Procedures for creating the search and exclusion rules and system development procedures are detailed in “Search Rules” and “System Development.”

    Data Sources

    The EMHD stored in a community pharmacy were used as the source of patients’ comments. When pharmacists dispense prescription drugs to patients, they are required to record the results of medication instructions and patients’ queries/responses. A medication history in Japan is typically written in the “SOAP” format, which consists of 4 sections: “Subjective information” (complaints of the patient), “Objective information” (objective indicators such as laboratory findings or names of drugs prescribed), “Assessment” (the pharmacist’s findings on the occurrence of ADRs, interactions, or doubt about prescription instructions), and “Plan” (action plan of the pharmacist derived from the assessment).

    Although patients do not write the medication history, of those 4 sections the “Subjective information” appeared to be the most appropriate text source, because pharmacists complete that section in the patients’ own words.

    Patients’ comments were extracted from the EMHD of a community pharmacy operated by Holon Co, Ltd, Hiroshima, Japan. This company operates a chain of 14 pharmacies, and the data used in this study mainly came from a single one. The study period was from September 1, 2015 to August 31, 2016. Personal information such as patients’ names and birth dates was anonymized before analysis.

    Table 1 shows the hospitals and clinics that issued the prescriptions from which the subjective information used in this study was derived. The pharmacy filled a total of 42,120 prescriptions during the study period for the top 9 prescribing hospitals or clinics. The number of prescriptions from medical institution A was the highest (18,273/42,120, 43.5%). Medical institution A specializes in otolaryngology, and its patients are older adults who often complain of dizziness or hearing loss.

    Table 2 shows the items recorded in the EMHD, while Figure 2 is an example of a recording object.

    Figure 1. Concept of the system. ICD-10: International Classification of Diseases, Tenth Revision.
    Table 1. Backgrounds of the prescriptions used in this study (N=42,120).
    Table 2. Items recorded in the electronic medication history data.
    Figure 2. Example of a medication history.

    Search Rules

    We created search rules to identify the appropriate ICD-10 code from the free text in the “Subjective information” section and developed a coding system that annotates the ICD-10 codes within patient complaints. The ICD-10 was originally an English-based system but is also used in Japan. It was translated into Japanese by the World Health Organization, and a coding rulebook was published. For example, in Medis [32] the ICD-10 is given as the basic classification code, and coding matched as closely as possible to clinical interpretation is undertaken. Although it may be possible to use the Medical Dictionary for Regulatory Activities (MedDRA) or the International Classification of Primary Care as a medical code system, we adopted ICD-10 in this study because it is used for insurance claims in Japan and because many coders are familiar with ICD-10.

    In developing the system, a nurse with 10 years of experience in the field of terminal care and a medical coder with 20 years of experience created the search rules based on the expressions in the “Subjective information” section. A programmer read the search rules and developed a program to accommodate new expressions. Search rules were created by a combination of morphological analysis and common expressions.

    The search rules govern the pattern for analyzing comments included in the subjective information. The rules were saved in Microsoft Excel format with the corresponding disease entity category and ICD-10 codes. For example, to search for “D69.9: Hemorrhagic condition, unspecified,” the search strings are “(出血|しゅっけつ|血)+(傾向|けいこう|し易く|しやすく|し易い|しやすい|出やすい|止まりにくい|とまりにくい|止まらない|とまらない).” In English, this would translate to “(bleeding|blood)+(tendency|easy to|hard to stop|won’t stop|not stop).” Written Japanese utilizes 3 orthographic systems: Chinese characters, hiragana, and katakana. Therefore, the actual search strings are longer than in English. All rules are shown in Multimedia Appendix 1. The rule-making steps are shown in Textbox 1. We repeated this process 5 times over 1 month in order to refine the search rules.

    The nurse first checked the free text recorded in the “Subjective information” section and selected complaints referring to patients’ symptoms. Then words related to ICD-10 codes were manually extracted from the complaints. Finally, the extracted words were added sequentially to the search string for each ICD-10 code. The search strings consist of patterns of word combinations using “|” (logical sum) or “+” (logical product). At present, a maximum of 3 words/terms can be combined in a string separated by “+” signs. For example, from the text “blood pressure today was a little high,” the terms “blood pressure” and “a little high” were extracted, and the system annotated the text with the ICD-10 code “I10: hypertension.”
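
    The rule syntax described above can be illustrated with a minimal Python sketch. It is not the authors’ Perl implementation: the names SearchRule, parse_rule, and matches are hypothetical, and the sketch assumes that a rule is satisfied when at least one alternative from every “+”-joined group occurs anywhere in the complaint, regardless of order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchRule:
    code: str                 # ICD-10 code, e.g. "D69.9"
    entity: str               # disease entity label
    groups: List[List[str]]   # one inner list per "+"-joined group of "|" alternatives


def parse_rule(code, entity, pattern):
    """Parse a search string such as "(a|b)+(c|d)" into groups of alternatives."""
    groups = [part.strip("()").split("|") for part in pattern.split("+")]
    if len(groups) > 3:
        # The paper states that at most 3 words/terms can be combined with "+".
        raise ValueError("at most 3 terms can be combined with '+'")
    return SearchRule(code, entity, groups)


def matches(rule, text):
    """Assumption: every group must contribute at least one term found in the text."""
    return all(any(term in text for term in group) for group in rule.groups)


# Search string for D69.9 as quoted in the text.
d69_9 = parse_rule(
    "D69.9",
    "Hemorrhagic condition, unspecified",
    "(出血|しゅっけつ|血)+(傾向|けいこう|し易く|しやすく|し易い|しやすい"
    "|出やすい|止まりにくい|とまりにくい|止まらない|とまらない)",
)

print(matches(d69_9, "血がとまりにくい"))   # both groups are present
print(matches(d69_9, "血圧は正常"))         # only the first group is present
```

    Storing each rule as a list of alternative groups keeps the spreadsheet notation and the matching logic close together, so that new expressions can be appended to a group without touching the code.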

    However, some text found in the “Subjective information” section should not be annotated with an ICD-10 code even though it satisfied the search rules. Therefore, we set exclusion rules for some codes, which were created following the same procedure as for the search rules but were only applied when a health care worker could visually confirm the keyword for exclusion. For the previous example of “D69.9: Hemorrhagic condition, unspecified,” terms with “(-|ない|なし|無い|無し),” in English, “(-|no|none|negative|never|don’t)” were excluded even if they included search strings. For example, “(血がとまりにくい),” in English, “(the bleeding won’t stop),” was annotated as D69.9, but “(血がとまりにくいことはない),” in English, “(I never felt the bleeding wouldn’t stop),” was excluded.
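
    A minimal Python sketch of how an exclusion rule might interact with a search rule is given below. The function name is_annotated and the masking step are assumptions made for illustration; the paper does not describe how the exclusion keywords are located. Masking is used here because some search terms (eg, “とまらない”) themselves end in the negation cue “ない.”

```python
# Search groups and exclusion cues for D69.9, copied from the examples in the text.
D69_9_GROUPS = [
    ["出血", "しゅっけつ", "血"],
    ["傾向", "けいこう", "し易く", "しやすく", "し易い", "しやすい",
     "出やすい", "止まりにくい", "とまりにくい", "止まらない", "とまらない"],
]
EXCLUSION_CUES = ["-", "ない", "なし", "無い", "無し"]


def is_annotated(text, groups, exclusion_cues):
    """Return True if every search group matches and no exclusion cue remains.

    Assumption: the longest matched search term of each group is blanked out
    before looking for exclusion cues, so that a cue embedded in the search
    term itself is not mistaken for a negation of the symptom.
    """
    remainder = text
    for group in groups:
        found = [term for term in group if term in text]
        if not found:
            return False  # search rule not satisfied
        remainder = remainder.replace(max(found, key=len), " ")
    return not any(cue in remainder for cue in exclusion_cues)


print(is_annotated("血がとまりにくい", D69_9_GROUPS, EXCLUSION_CUES))
print(is_annotated("血がとまりにくいことはない", D69_9_GROUPS, EXCLUSION_CUES))
```

    Under these assumptions the first complaint is annotated and the second is excluded, in line with the two examples above.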

    System Development

    The system automatically extracts complaints related to patients’ symptoms from the “Subjective information” section of the EMHD and annotates each complaint with an ICD-10 code using the search rules above. We used Perl as the programming language and MeCab [33] as the morphological analyzer. The Microsoft Excel format was used for subsequent analysis.

    The development procedure can be summarized as follows:

    1. Subjective information was extracted from each saved Microsoft Excel file.
    2. Morphological analysis was performed on the extracted subjective information, separating the text with spaces into minimum meaningful units of words/terms.
    3. The segmented subjective information was copied back into a Microsoft Excel file. Search rules and exclusion rules were then applied to each complaint to find the matching ICD-10 code.
    4. If a matching ICD-10 code was found, the complaint was annotated with the ICD-10 code and the corresponding disease entity.

    The coding system applies the search rules (shown in Multimedia Appendix 1) in order from the top. If an applicable rule is found, the result of ICD-10 coding is output. If multiple rules are matched, all of them are output in the results.
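
    The top-to-bottom application of the rules, with every match reported, could look like the following Python sketch. The rule table uses simplified English toy expressions written for this illustration, not the actual entries of Multimedia Appendix 1, and code_complaint is a hypothetical name.

```python
# Ordered rule table: (ICD-10 code, disease entity, groups of alternatives).
# The codes and entity names appear in the paper; the English search terms
# are toy values chosen for this illustration.
RULES = [
    ("R42", "Dizziness and giddiness", [["dizzy", "dizziness"]]),
    ("I10", "Hypertension", [["blood pressure"], ["a little high", "high"]]),
    ("R52.9", "Pain, unspecified", [["pain", "hurts"]]),
]


def code_complaint(text, rules):
    """Apply the rules from the top and collect every rule that matches."""
    results = []
    for code, entity, groups in rules:
        if all(any(term in text for term in group) for group in groups):
            results.append((code, entity))
    return results


for code, entity in code_complaint(
    "blood pressure today was a little high and I felt dizzy", RULES
):
    print(code, entity)
```

    Because the loop never stops at the first hit, a complaint that mentions several symptoms receives several codes, listed in the order in which the rules appear in the table.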

    Textbox 1. Rule-making steps.
    Figure 3. System interface screenshot. ICD-10: International Classification of Diseases, Tenth Revision.

    Optimization of System Performance

    For optimal performance of the system, the system-annotated disease entities should ideally match the entities manually annotated by health care professionals. Because annotation depends entirely on the search rules, the more completely the rules cover patients’ expressions, the more accurate the system. Therefore, we reviewed the search rules multiple times to determine the most appropriate ones to improve the accuracy of the system.

    In this study, we did not attempt machine learning for the detection of relevant terms to match ICD-10 codes. By adding search rules as appropriate, free text can be automatically associated with ICD-10 codes via the system.


    Experiment

    An evaluation experiment was conducted to confirm the performance of the system. A total of 5000 complaints from the subjective information were processed, and 323 search rules were created. In the experiment, health care workers (1 nurse and 1 pharmacist) first independently annotated the 5000 complaints manually with ICD-10 codes. Second, 108 mismatched annotations were excluded, and the remaining 2348 annotated complaints were used as correct answers for the subsequent step. Finally, the system with 323 search rules was applied to the 5000 complaints.

    The subjective information used in this study consisted of multiple sentences, and thus several patient expressions could be obtained from one “Subjective information” section. Since each patient expression is linked to an ICD-10 code, multiple ICD-10 codes can be assigned to a single “Subjective information” section. In evaluating the system in this study, if any one of the multiple ICD-10 codes differed from the manual result, the entire entry was judged incorrect (unmatched). Figure 3 shows an actual system execution screen.
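
    This strict, entry-level matching criterion can be sketched in Python as follows. The function name tally and the four toy entries are hypothetical; the sketch assumes that an entry counts as matched only when the set of system codes equals the set of manual codes.

```python
def tally(system, manual):
    """Count matched, searched, and correct entries.

    system, manual: dict mapping an entry ID to the set of ICD-10 codes
    assigned to that "Subjective information" section (empty set = no code).
    """
    searched = sum(1 for codes in system.values() if codes)   # annotated by the system
    correct = sum(1 for codes in manual.values() if codes)    # annotated manually
    matched = sum(
        1
        for entry_id, codes in system.items()
        if codes and codes == manual.get(entry_id, set())     # all codes must agree
    )
    return matched, searched, correct


# Toy data: entry 2 has one extra system code, so the whole entry is unmatched.
system = {1: {"R42"}, 2: {"I10", "R42"}, 3: set(), 4: {"R52.9"}}
manual = {1: {"R42"}, 2: {"I10"}, 3: {"R26.0"}, 4: set()}

print(tally(system, manual))
```

    In this toy data only entry 1 is counted as matched, although entry 2 shares one correct code with the manual result; this illustrates why the criterion tends to lower the reported scores.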

    Based on the results of this experiment, the precision, recall, and F-measure of the system were calculated [34,35]. Precision was calculated by dividing the matched number (the number of “Subjective information” sections for which manual coding and system coding had the same result) by the searched number (the number of “Subjective information” sections that the system annotated with ICD-10 codes). Recall was calculated by dividing the matched number by the correct answers (the number of “Subjective information” sections manually coded). The F-measure was calculated as the harmonic mean of precision and recall.
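
    As a worked check of these definitions, the following Python sketch computes the three measures from the counts reported in this paper (1480 matched, 2236 searched, 2348 correct answers). The function name evaluate is hypothetical.

```python
def evaluate(matched, searched, correct):
    """Precision, recall, and F-measure as defined in the text."""
    precision = matched / searched   # matched / annotated by the system
    recall = matched / correct       # matched / annotated manually
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


precision, recall, f_measure = evaluate(matched=1480, searched=2236, correct=2348)
print(f"precision={precision:.3f} recall={recall:.3f} F-measure={f_measure:.3f}")
# prints: precision=0.662 recall=0.630 F-measure=0.646
```

    The harmonic mean simplifies to 2 × matched / (searched + correct), that is, 2960/4584, which gives the same F-measure.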

    Ethical Considerations

    This study was approved by the Ethics Committees on Human Research of the Faculty of Pharmacy, Keio University and Nara Institute of Science and Technology.


    Results

    Examples of correct answer data and system execution results are shown in Table 3. From 5000 complaints, 2348 ICD-10 codes were extracted by health care workers. The system extracted 2236 codes, 1480 of which matched the manual results. The system achieved a precision of .662, a recall of .630, and an F-measure of .646. Table 4 shows precision and recall for the 10 most frequent symptoms extracted by health care workers.

    Table 3. Comparison between manual and system extraction of ICD-10 codes for patient complaints from typical examples.
    Table 4. Precision and recall for the 10 most frequent symptoms.

    Textbox 2. Six reasons for unmatched results. ICD-10: International Classification of Diseases, Tenth Revision.

    The results indicated that the average performance of the system was .66 for precision, .63 for recall, and .65 for the F-measure. Comparing the performance for each symptom, the precision of “dizziness and giddiness,” “pain, unspecified,” and “ataxic gait” was especially low. We identified 6 reasons for the unmatched results for these 3 symptoms, as shown in Textbox 2.

    Table 5 details the unmatched results and typical examples for 3 symptoms. The main reason for discordance between manual and system coding was misdetection of negation or possible event in “R42: dizziness and giddiness” (79/108 results, 73.1%) and “R26.0: ataxic gait” (71/79 results, 90%), whereas misdetection of drug class name was the most common in “R52.9: pain, unspecified” (28/91 results, 31%).

    Table 5. Details of unmatched results and typical examples for 3 symptoms.


    Discussion

    Principal Results

    Nikfarjam et al [25] and Aramaki et al [30] used CRFs, and Freifeld et al [28] used a tree-based dictionary matching algorithm for extracting the terms. Our approach involved rule-based searching, which is much simpler but less tolerant of orthographic variants. Additionally, differences in linguistic features might have contributed to the gap between the results of the present study and non-Japanese ones [25,28]. In written Japanese, words are not separated by spaces, and therefore the accuracy of extraction is affected by the quality of morphological analysis. Considering these points, the results are at least adequate as the first step in possible ADE signal detection.

    This was the first attempt to standardize patients’ expressions with the Japanese version of ICD-10 and to use the “Subjective Information” section in the medication history as a source. The advantage of using the medication history is its structured format and data storability. The medication history is recorded for patient monitoring including side effects. Its features provide more specialized information relevant to possible ADEs than social media like Twitter or EHRs in hospitals. Moreover, the number of pharmacies in Japan is increasing [36,37], and pharmacists are required to record patient medication histories for health insurance claims. Thus, huge amounts of data on patients’ medication are available, making medication histories an appropriate source for ADE signal detection.

    It is not necessary for ADEs to have causal relationships with drugs, whereas ADRs must have a reasonable association with drug use. Using patient records to detect ADRs is a major challenge because causality cannot be readily assessed; however, it is also important to detect potential ADE signals.

    In this study, some text could not be annotated with ICD-10 codes. Compared with the annotations of health care professionals, our newly developed system achieved a precision of .66, a recall of .63, and an F-measure of .65. These values are lower than those reported in previous studies [25,28,30], likely due to differences in methodology. As explained in the Experiment section, if any one of the ICD-10 codes for an entry differed from the manual result, the entire entry was regarded as incorrect (unmatched). This is one reason why the F-measure was lower than in previous research.

    There was also insufficient specific information about the condition of each patient. Because the majority of patients are not medical experts, they describe their symptoms in everyday language, which is more equivocal and more inflected than medical terminology. Nikfarjam et al [25] reported similar aspects of ambiguity and lack of context in patients’ wording.

    The dialect spoken can affect the subjective information, although, of the 5000 complaints analyzed in this study, only 7 were recorded in a regional dialect. This is probably related to the nature of the text. Although it is recommended that pharmacists record patients’ statements exactly, it is possible that they replace dialect expressions with standard wording to make the information easier for others to understand later.

    Regarding standardization across languages, the present system could be applied to other languages to some extent by translating the morphemes used for the search rules or by adding or refining the search rules later.


    Limitations

    This study had several limitations. First, qualitative differences in the text data could have occurred. The “Subjective information” section is filled in by pharmacists, and therefore they may interpret and summarize patients’ comments when they record them. To ensure that the medication histories of all patients are recorded during the daily business hours of community pharmacies, pharmacists may in some cases rely on fixed-form set phrases and excerpts of comments to decrease the time needed to complete the “Subjective information” section. It is therefore possible that the finer nuances patients hope to convey are altered or lost during the process. Qualitative differences were also noted among pharmacists in the contents of the “Subjective information” section. Some wrote about symptoms using explicit medical terminology (eg, “back pain and knee pain were unabated”). Others included general information unrelated to symptoms (eg, greetings and general conversation transcribed word for word).

    Second, it was difficult for the system to determine whether the extracted keyword was related to patients’ symptoms or those of others. For example, from the sentence “My friend had hypertension,” the system may extract “hypertension,” although it is unrelated to the speaker’s condition. This point should be improved by revising the search rules after consultation with regulatory experts or using machine learning to deal with ambiguity.

    Also, since only 1 of 14 pharmacies in a single chain participated in this study, there is a possibility that the search rules were optimized for patients receiving prescriptions from specific medical departments. In the experimental results, the most frequent ICD-10 code was “dizziness and giddiness.” As shown in Table 1, the target pharmacy frequently dispenses prescriptions from otolaryngologists, and the results may reflect this potential bias. Before the practical application of the system, it is necessary to improve the search rules by considering a wider range of medication histories including data from other community pharmacies.

    ICD-10 codes were used as normalization terms for patients’ complaints regarding their symptoms because they are widely available and understood, but MedDRA is thought to be more suitable for extracting information on ADRs and for signal detection. We are currently enhancing the system to accommodate MedDRA terms.


    Conclusions

    In this study, we developed an automated system to extract terms related to symptoms from the verbal complaints of Japanese patients. As a result of an evaluation experiment comparing automated with manual extraction, the system performed at the level of .66 in precision, .63 in recall, and .65 for the F-measure. Although the accuracy of the system was not satisfactory, our results suggest that it might be useful in extracting and standardizing patients’ expressions related to symptoms from massive amounts of free text data instead of performing those procedures manually. After improving the extraction accuracy, we expect to utilize this system to detect the signals of ADRs from patients’ complaints in the future.


    Acknowledgments

    The authors wish to thank Holon Co, Ltd for agreeing to participate in this study by providing anonymized EMHD records from September 1, 2015, to August 31, 2016. This research was supported by AMED under grant #JP16mk0101058.

    Conflicts of Interest

    One of the authors (TS) is an employee of Holon Co, Ltd.

    Multimedia Appendix 1

    Search rules for the program.

    XLSX File (Microsoft Excel File), 40KB


    References

    1. Nebeker JR, Barach P, Samore MH. Clarifying adverse drug events: a clinician's guide to terminology, documentation, and reporting. Ann Intern Med 2004 May 18;140(10):795-801.
    2. Venulet J. Progress in Drug Research. Basel: Birkhauser Verlag; 1997:233-292.
    3. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA 1998 Apr 15;279(15):1200-1205.
    4. Budnitz DS, Lovegrove MC, Shehab N, Richards CL. Emergency hospitalizations for adverse drug events in older Americans. N Engl J Med 2011 Nov 24;365(21):2002-2012.
    5. Ernst FR, Grizzle AJ. Drug-related morbidity and mortality: updating the cost-of-illness model. J Am Pharm Assoc (Wash) 2001 Apr;41(2):192-199.
    6. Rogers AS. Adverse drug events: identification and attribution. Drug Intell Clin Pharm 1987 Nov;21(11):915-920.
    7. Basch E. The missing voice of patients in drug-safety reporting. N Engl J Med 2010 Mar 11;362(10):865-869.
    8. Barbara AM, Loeb M, Dolovich L, Brazil K, Russell M. Agreement between self-report and medical records on signs and symptoms of respiratory illness. Prim Care Respir J 2012 Jun;21(2):145-152.
    9. Avery AJ, Anderson C, Bond CM, Fortnum H, Gifford A, Hannaford PC, et al. Evaluation of patient reporting of adverse drug reactions to the UK 'Yellow Card Scheme': literature review, descriptive and qualitative analyses, and questionnaire surveys. Health Technol Assess 2011 May;15(20):1-234, iii.
    10. Belaise C, Gatti A, Chouinard V, Chouinard G. Patient online report of selective serotonin reuptake inhibitor-induced persistent postwithdrawal anxiety and mood disorders. Psychother Psychosom 2012;81(6):386-388.
    11. The MSSO. Patient-Friendly Term List (V2.0)   URL: [accessed 2018-07-07]
    12. Freund PR, Rowell LB, Murphy TM, Hobbs SF, Butler SH. Blockade of the pressor response to muscle ischemia by sensory nerve block in man. Am J Physiol 1979 Oct;237(4):H433-H439.
    13. Wong A, Plasek JM, Montecalvo SP, Zhou L. Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges. Pharmacotherapy 2018 Jun 09.
    14. Uzuner O, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc 2011;18(5):552-556.
    15. Li Y, Salmasian H, Harpaz R, Chase H, Friedman C. Determining the reasons for medication prescriptions in the EHR using knowledge and natural language processing. AMIA Annu Symp Proc 2011;2011:768-776.
    16. Rudd RA, Aleshire N, Zibbell JE, Gladden RM. Increases in Drug and Opioid Overdose Deaths--United States, 2000-2014. MMWR Morb Mortal Wkly Rep 2016 Jan 01;64(50-51):1378-1382.
    17. Shang N, Xu H, Rindflesch TC, Cohen T. Identifying plausible adverse drug reactions using knowledge extracted from the literature. J Biomed Inform 2014 Dec;52:293-310.
    18. Shetty KD, Dalal SR. Using information mining of the medical literature to improve drug safety. J Am Med Inform Assoc 2011;18(5):668-674.
    19. Hazell L, Shakir SAW. Under-reporting of adverse drug reactions: a systematic review. Drug Saf 2006;29(5):385-396.
    20. Botsis T, Nguyen MD, Woo EJ, Markatou M, Ball R. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. J Am Med Inform Assoc 2011;18(5):631-638.
    21. Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc 2010;17(1):19-24.
    22. Carrell DS, Cronkite D, Palmer RE, Saunders K, Gross DE, Masters ET, et al. Using natural language processing to identify problem usage of prescription opioids. Int J Med Inform 2015 Dec;84(12):1057-1064.
    23. Haerian K, Varn D, Vaidya S, Ena L, Chase HS, Friedman C. Detection of pharmacovigilance-related adverse events using electronic health records and automated methods. Clin Pharmacol Ther 2012 Aug;92(2):228-234.
    24. Gonzalez-Hernandez G, Sarker A, O'Connor K, Savova G. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform 2017 Aug;26(1):214-227.
    25. Nikfarjam A, Sarker A, O'Connor K, Ginn R, Gonzalez G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J Am Med Inform Assoc 2015 May;22(3):671-681.
    26. Topaz M, Lai K, Dhopeshwarkar N, Seger DL, Sa'adon R, Goss F, et al. Clinicians' Reports in Electronic Health Records Versus Patients' Concerns in Social Media: A Pilot Study of Adverse Drug Reactions of Aspirin and Atorvastatin. Drug Saf 2016 Mar;39(3):241-250.
    27. White RW, Wang S, Pant A, Harpaz R, Shukla P, Sun W, et al. Early identification of adverse drug reactions from search log data. J Biomed Inform 2016 Feb;59:42-48.
    28. Freifeld CC, Brownstein JS, Menone CM, Bao W, Filice R, Kass-Hout T, et al. Digital drug safety surveillance: monitoring pharmaceutical products in twitter. Drug Saf 2014 May;37(5):343-350.
    29. Denecke K, Krieck M, Otrusina L, Smrz P, Dolog P, Nejdl W, et al. How to exploit twitter for public health monitoring? Methods Inf Med 2013;52(4):326-339.
    30. Aramaki E, Miura Y, Tonoike M, Ohkuma T, Mashuichi H, Ohe K. TEXT2TABLE: Medical text summarization system based on named entity recognition and modality identification. 2009 Presented at: Workshop on BioNLP; June 4-5, 2009; Colorado p. 185-192.
    31. Aramaki E, Yano K, Wakamiya S. MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts. Stud Health Technol Inform 2017;245:285-288.
    32. Medis. Standard disease name master for ICD-10   URL: [accessed 2018-07-06]
    33. MeCab. MeCab: Yet Another Part-of-Speech and Morphological Analyzer   URL: [accessed 2018-05-10]
    34. Friedman C, Hripcsak G. Evaluating natural language processors in the clinical domain. Methods Inf Med 1998 Nov;37(4-5):334-344.
    35. Hripcsak G, Rothschild AS. Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc 2005;12(3):296-298.
    36. Ministry of Health, Labour and Welfare of Japan. Report on Public Health Administration and Services FY2016   URL: [accessed 2018-07-06]
    37. Japan Pharmaceutical Society. Report on insurance dispensing trends   URL: [accessed 2018-07-06]


    Abbreviations

    ADE: adverse drug event
    ADR: adverse drug reaction
    AMED: Japan Agency for Medical Research and Development
    CRF: conditional random fields
    EHR: electronic health record
    EMHD: electronic medication history data
    FAERS: US Food and Drug Administration Adverse Event Reporting System
    ICD-10: International Classification of Diseases, Tenth Revision
    MedDRA: Medical Dictionary for Regulatory Activities
    NLP: natural language processing

    Edited by G Eysenbach; submitted 12.05.18; peer-reviewed by MA Mayer, KNB Nor Aripin; comments to author 13.06.18; revised version received 07.08.18; accepted 25.08.18; published 27.09.18

    ©Misa Usui, Eiji Aramaki, Tomohide Iwao, Shoko Wakamiya, Tohru Sakamoto, Mayumi Mochizuki. Originally published in JMIR Medical Informatics, 27.09.2018.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.