Original Paper
Abstract
Background: Severe drug hypersensitivity reactions (DHRs) are allergic reactions caused by drugs and usually present with severe skin rashes and internal organ damage as the main symptoms. Reporting of severe DHRs in hospitals currently occurs solely through spontaneous reporting systems (SRSs), which the clinicians in charge operate. An automatic identification system that scrutinizes clinical notes and reports potential severe DHR cases could complement such reporting.
Objective: The goal of this research was to develop an automatic identification system for mining severe DHR cases and to discover more DHR cases for further study. The proposed method was applied to 9 years of data in the pediatric electronic health records (EHRs) of Beijing Children’s Hospital.
Methods: The phenotyping task was approached as a document classification problem. A DHR data set containing tagged documents was prepared for training. In this data set, each document contains all the clinical notes generated during 1 inpatient visit. Document-level tags correspond to the DHR types and a negative category. Strategies for long document classification were evaluated on the openly available National NLP Clinical Challenges 2016 smoking task. Four strategies were evaluated in this work: document truncation, hierarchy representation, efficient self-attention, and key sentence selection. In-domain and open-domain pretrained embeddings were evaluated on the DHR data set. An automatic grid search was performed to tune statistical classifiers for the best performance over the transformed data. The inference efficiency and memory requirements of the best performing models were analyzed. The most efficient model was then run to mine DHR cases from millions of documents in the EHR system.
Results: For long document classification, key sentence selection with guideline keywords achieved the best performance and was 9 times faster than hierarchy representation models for inference. The best model discovered 1155 candidate DHR cases in the Beijing Children’s Hospital EHR system. After double-checking by clinician experts, 357 cases of severe DHRs were finally identified. For the smoking challenge, our model matched state-of-the-art performance (94.1% vs 94.2%).
Conclusions: The proposed method discovered 357 positive DHR cases from a large archive of EHR records, about 90% of which had been missed by the SRS; the SRS reported only 36 cases during the same period. The case analysis also found more suspected drugs associated with severe DHRs in pediatrics.
doi:10.2196/37812
Keywords
Introduction
Drug hypersensitivity reactions (DHRs) are adverse drug reactions that resemble allergy. DHRs affect more than 7% of the population and are a significant cause of the postmarketing withdrawal of drugs [
]. Severe DHRs, such as anaphylactic shock, drug-induced hypersensitivity syndrome, Stevens-Johnson syndrome, and epidermolysis bullosa, have been observed worldwide with an annual incidence of 0.05 to 3 persons per million population and mortality rates varying between 5% and 30%. Severe DHRs in pediatric populations, including children, infants, and even newborns, comprise 10% to 20% of reported cases [ , ].

Reporting of severe DHRs in hospitals now solely occurs through spontaneous reporting systems (SRSs), which the clinicians in charge operate. Previous studies showed that only 10% to 30% of severe adverse drug reactions were reported in SRSs [
]. Even if the missed cases were properly handled and simply not logged into the SRS, more thorough reporting would have helped improve drug guidelines. Recently, routinely collected medical data such as electronic health records (EHRs) have increasingly been used to complement SRSs and enable active pharmacovigilance. EHR systems contain detailed data with timestamps for admissions, discharges, diagnoses, medications, and laboratory tests. However, the detection of severe DHRs relies on symptoms and signs, which often reside in the free-text areas of EHRs and require natural language processing to extract.

One of the most well-studied medical language processing applications is phenotyping (eg, the automatic evaluation of phenomic traits such as smoking status) [
]. Automatic identification of severe DHRs in patients can also be explored as a phenotyping task. When no structured data are available, the phenotyping of clinical notes can be formulated as a document classification task, which has been well studied in the natural language processing field.

Recent work [
- ] has reported that clinical documents are too long for contextualized language models to process. Our research group integrated the medical data from a hospital and established a vertical data warehouse at an early stage. Unlike previous works that only process discharge summaries [ - ], this DHR task deals with documents consisting of all clinical notes associated with 1 inpatient visit. The average length of a discharge summary is typically hundreds of words. In this DHR data set, however, the average document length is several thousand Chinese characters, and some documents contain tens of thousands. Therefore, picking the best strategy for long document classification is crucial for achieving our objective.

Methods
Pipeline Design
This work approaches the automatic identification of DHR cases as a long document classification problem. For training purposes, domain experts prepared a corpus containing document-level tags.
demonstrates the proposed system pipeline. First, 4 strategies for long document classification were compared and evaluated on the openly available smoking task. Second, the best strategy was applied to the DHR task. Pretrained embedding models of Chinese medical text were compared and evaluated on our own DHR task. A grid search was performed to tune machine learning classifiers for the best document classification performance on the DHR data set. Finally, the best pipeline was applied to 9 years of data in a pediatric EHR.
Ethics Approval
The study was reviewed and approved (2019-k-5) by the Institutional Ethics Committee of Beijing Children’s Hospital in China, with a waiver of informed consent.
Data Set and Metrics
Smoking Task
The smoking challenge [
] automatically determines patients’ smoking status from their discharge summaries. The 502 discharge summaries present 5 statuses: past smoker, current smoker, smoker, nonsmoker, and unknown. Following previous work, the class smoker was ignored. shows the training and test data distribution.

 | Past smoker | Current smoker | Nonsmoker | Unknown | Total |
Train data set | 36 | 35 | 66 | 252 | 389 |
Test data set | 11 | 11 | 16 | 63 | 101 |
Severe DHR Task
Data Source
Beijing Children’s Hospital’s information system allows a patient’s history and physician notes to be digitally recorded and made instantaneously available via the network to all patient departments. A vertical data warehouse was built at an early stage of this work by integrating the hospital’s medical data. It contains 431,972 hospitalization records of 315,608 patients from January 1, 2012, to December 31, 2020, including detailed diagnostic information, medication information, laboratory tests, disease course data, etc. Each hospitalization record represents one inpatient stay; a patient hospitalized multiple times has multiple hospitalization records.
Corpus Construction
Positive cases presenting severe DHRs were collected from 2 pools: the 31 positive cases logged in the National Medical Products Administration reporting system and the 183 positive cases discovered by chart review. After deduplication, 200 positive cases remained. Each positive case was assigned 1 of 4 subcategories. Furthermore, 400 negative cases were randomly sampled from Beijing Children’s Hospital’s EHR system. These cases were assigned a negative (NEG) tag and hand-checked by physicians to ensure they did not present severe DHRs.
The definitions of the 4 subtypes of severe DHR are shown in
as found in the Guidelines for Medical Nomenclature Use of Adverse Drug Reactions issued by the Center for Drug Reevaluation of the China National Medical Products Administration in 2016 [ ].

Training and Test Data Set
These 5 categories of documents were randomly sampled into the training and test data sets. The training and test data distribution is shown in
. The positive-to-negative ratio is close to the corresponding ratio in the smoking task.

 | SJSa | DIHSb | ASc | EBd | NEGe | Total |
Training data set | 56 | 44 | 18 | 32 | 323 | 473 |
Test data set | 18 | 3 | 5 | 7 | 77 | 110 |
aSJS: Stevens-Johnson syndrome.
bDIHS: drug-induced hypersensitivity syndrome.
cAS: anaphylactic shock.
dEB: epidermolysis bullosa.
eNEG: negative.
Evaluation Metrics
The micro-averaged F1 score was used to evaluate the performance of different models, following a previous study [
]. This metric is used for multiclass classification problems, measuring a balance between precision and recall; micro-averaging gives equal weight to each instance.

Strategies for Long Document Classification
Four strategies were evaluated and compared: document truncation [
], hierarchy representation [ , ], more efficient self-attention [ ], and key sentence selection [ , , , ]. The best strategy for long document classification was chosen based on results on the openly available National NLP Clinical Challenges 2016 smoking task [ ]. Results on this task can be compared fairly with other related work.

Document Truncation
The most straightforward way to apply a transformer model with a length limit is to truncate the input and keep only the first block of tokens. These models typically have a length limit of 512 word pieces.
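As a concrete illustration, truncation simply keeps the leading block of tokens (a sketch; real tokenizers operate on subword pieces rather than whitespace tokens):

```python
def truncate(tokens, limit=512):
    # A fixed-length transformer sees only the first `limit` (sub)word
    # pieces; everything after the limit is discarded, which can lose
    # decisive evidence buried deep in a long clinical note.
    return tokens[:limit]

# The longest smoking-task training document has 3025 words, so
# truncation keeps fewer than a fifth of them.
doc = [f"w{i}" for i in range(3025)]
print(len(truncate(doc)))  # 512
```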
More Efficient Self-Attention
Self-attention models, such as bidirectional encoder representation from transformer (BERT), require quadratic computational time and space with respect to the input sequence length. The Longformer model uses sparse self-attention instead of full self-attention to process longer documents (up to 4096 tokens).
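To make the contrast concrete, the sketch below builds a Longformer-style sparse attention mask (sliding window plus a few global tokens); the window size and global positions are illustrative, not the model's actual configuration:

```python
def sparse_attention_mask(n, window=2, global_positions=(0,)):
    """Boolean mask: entry [i][j] is True when token i may attend to token j.

    Each token attends to neighbors within `window` positions; designated
    global positions (eg, a [CLS]-like token) attend to, and are attended
    by, every position. Nonzero entries per row grow as O(window) rather
    than O(n), which is what lets sparse attention scale to long inputs.
    """
    mask = [[abs(i - j) <= window for j in range(n)] for i in range(n)]
    for g in global_positions:
        for j in range(n):
            mask[g][j] = mask[j][g] = True
    return mask
```

With full self-attention every row would have n entries set; here a non-global row has at most 2 × window + 1 local entries plus the global columns.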
Hierarchy Representation
In a hierarchy approach, sentence representations are built first and then aggregated into a document-level representation. In previous work on the phenotyping task of clinical notes, document representation is built by a sampling layer on top of the BERT blocks of each sentence [
].

Key Sentence Selection
A few key sentences could be enough for the document classification task. In previous works, unsupervised methods were explored to generate key sentences, which did not always perform well [
, ]. In this work, keywords extracted from task-specific guidelines were explored. Sentences containing keywords were selected as key sentences.

For the smoking task, unigrams and bigrams from previous work were taken as the keyword list: cigarette, smoke, smoked, smoker, smokes, smoking, tobacco [
]. For the DHR task, 2 sets of keywords were evaluated and compared. As an unsupervised method, the term frequency-inverse document frequency (TF-IDF) algorithm computed the top 2000 feature words. Words containing numbers, foreign alphabets, and special characters were removed. A total of 163 feature words with a score higher than zero were added to the keyword list.
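A stdlib sketch of this unsupervised keyword step, assuming documents are already tokenized; the paper's exact TF-IDF variant and filtering rules may differ:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=10):
    """Return the `top_k` terms ranked by their best TF-IDF score in any document.

    docs: list of tokenized documents (lists of terms).
    Terms that occur in every document get an IDF of 0 and thus rank last.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    best = Counter()                    # best TF-IDF score seen per term
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            score = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best[term], score)
    return [term for term, _ in best.most_common(top_k)]
```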
The parts of the clinical notes that make references to the corresponding guidelines are most relevant for differential classification. Each positive category in the DHR data set is well defined in the corresponding guideline [
- ]. Medical terms were hand-picked from the guidelines. No domain knowledge was required beyond distinguishing medical terms from general text. These keywords are shown in in Chinese and in English.

The guideline keywords for the severe drug hypersensitivity reaction task in Chinese. AS: anaphylactic shock; DIHS: drug-induced hypersensitivity syndrome; EB: epidermolysis bullosa; IVIG: intravenous immunoglobulin; SJS: Stevens-Johnson syndrome; TEN: toxic epidermal necrolysis.
- Stevens-Johnson综合征, 过敏性休克, 药物超敏反应综合征, 大疱表皮松解症, AS, EB, TEN, SJS, DIHS
- 过敏,超敏,黏膜,红斑,松解,喘鸣,支气管痉挛,发绀,呼气流量峰值下降,肌张力减退,荨麻疹,血管性水肿,紫绀,低血容量性低血压,斑疹,斑丘疹,无菌性脓疱,紫癜,剥脱性皮炎,融合成片,松弛性水疱,表皮松解,大疱,表皮剥脱,叶状鳞屑,表皮剥离,猩红热样,麻疹样,弥漫性,黏膜侵蚀,大疱
- 糖皮质激素,肾上腺素,甲基泼尼松龙,泼尼松,地塞米松, IVIG,甲泼尼龙
The guideline keywords for the severe drug hypersensitivity reaction task in English. AS: anaphylactic shock; DIHS: drug-induced hypersensitivity syndrome; EB: epidermolysis bullosa; IVIG: intravenous immunoglobulin; SJS: Stevens-Johnson syndrome; TEN: toxic epidermal necrolysis.
- Stevens-Johnson syndrome, anaphylactic shock, drug-induced hypersensitivity syndrome, epidermolysis bullosa, AS, EB, TEN, SJS, DIHS
- Allergy, hypersensitivity, mucous membrane, erythema, epidermolysis, wheezing, bronchospasm, cyanosis, decreased peak expiratory flow, dystonia, urticaria, angioedema, hypovolemic hypotension, macula, maculopapular, sterile pustules, purpura, confluent, flaccid blister, bulla, exfoliative, scales, Scarlet fever–like, measles, diffuse, mucosal erosion, IVIG
- glucocorticoid, adrenaline, prednisolone, prednisone, dexamethasone, IVIG, methylprednisolone
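Key sentence selection itself reduces to a keyword filter over sentences. The sketch below uses a small subset of the guideline keywords listed above; splitting on Chinese sentence-final punctuation is a simplification of whatever segmentation the actual pipeline used:

```python
import re

# Illustrative subset of the guideline keywords listed above.
GUIDELINE_KEYWORDS = ["过敏性休克", "大疱", "表皮松解", "荨麻疹", "糖皮质激素"]

def select_key_sentences(note, keywords=GUIDELINE_KEYWORDS):
    """Split a clinical note on sentence-final punctuation and keep only
    the sentences mentioning at least one guideline keyword."""
    sentences = [s for s in re.split(r"[。！？\n]", note) if s.strip()]
    return [s for s in sentences if any(k in s for k in keywords)]
```

Only the kept sentences are passed on to the embedding model, which is why key sentence selection shrinks the documents so sharply.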
Data Set With Selected Text
An oracle test was conducted to evaluate whether the strategy of key sentence selection affects performance. This oracle test was performed as follows: (1) for each document that contains any keyword, assign its gold tag, and (2) for all the documents that contain no keywords, assign the UNKNOWN tag (for the smoking task) or the NEG tag (for the DHR task).
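The oracle procedure above can be written directly; for single-label multiclass data, micro-averaged F1 reduces to accuracy, which is what this sketch computes:

```python
def oracle_tag(document, keywords, gold_tag, default_tag="NEG"):
    # Step 1: a document containing any keyword keeps its gold tag.
    # Step 2: a document with no keywords falls back to the default class
    # (UNKNOWN for the smoking task, NEG for the DHR task).
    return gold_tag if any(k in document for k in keywords) else default_tag

def oracle_micro_f1(documents, gold_tags, keywords, default_tag="NEG"):
    # With exactly one label per document, micro-averaged F1 equals accuracy.
    preds = [oracle_tag(d, keywords, g, default_tag)
             for d, g in zip(documents, gold_tags)]
    return sum(p == g for p, g in zip(preds, gold_tags)) / len(gold_tags)
```

An oracle score below 1.0 measures exactly the error floor the selection strategy introduces, independent of any downstream classifier.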
As shown in
, key sentence selection reduced the maximum word count and the average word count for both data sets of the smoking task. The oracle micro-F1 was 1.0 for both the training and test sets, which means that the key sentence selection strategy did not limit the achievable performance.

Two lists of keywords were evaluated for the DHR task: TF-IDF keywords and guideline keywords. As shown in
, key sentence selection reduced the maximum word count and the average word count for both the training and test data sets of the DHR task. The oracle test showed that with TF-IDF keywords, the oracle micro-F1 score was almost 1.0. With guideline keywords, this strategy introduced about 2% to 3% of the errors in the whole pipeline.

 | Maximum word count | Average word count | Oracle micro-F1 |
Train | ||||
Original | 3025 | 766 | —b | |
Selected | 194 | 18 | 1 | |
Test | ||||
Original | 2529 | 851 | — | |
Selected | 117 | 18 | 1 |
aFor word counting, all terms split by space delimiters were considered words.
bNot applicable.
Keywords | Maximum character count | Average character count | Oracle micro-F1 |
Train | |||||||||||
Original | 27198 | 4615 | —b | ||||||||
Selected | |||||||||||
TF-IDFc | 4681 | 770 | 0.99 | ||||||||
Guideline | 1926 | 199 | 0.98 | ||||||||
Test | |||||||||||
Original | 15454 | 3963 | — | ||||||||
Selected | |||||||||||
TF-IDF | 3210 | 687 | 1 | ||||||||
Guideline | 636 | 177 | 0.97 |
aFor the drug hypersensitivity reaction data set, Chinese characters were counted.
bNot applicable.
cTF-IDF: term frequency-inverse document frequency.
Transformers
In-domain and open-domain pretrained embeddings by contextualized language models were evaluated in this work. For implementation, the SBERT library [
] computes document embeddings with pretrained open-domain or domain-specific language models. No fine-tuning was conducted for these pretrained models.

This work evaluated the open-domain model bert-base-uncased [
] and the domain-specific models ClinicalBERT and DischargeBERT [ ] for English clinical notes.

This work evaluated the open-domain model bert-base-chinese [
] and the domain-specific model Medbert-kd-chinese [ ] for Chinese clinical notes.

Machine Learning Classifiers
Machine learning classifiers were stacked on top of deep learning transformers. Each machine learning classifier was tuned by 10-fold cross-validation on the training data set. An automatic grid search framework [
] searched for optimal hyperparameters. This work evaluated linear models trained with stochastic gradient descent (SGD) and libsvm-based support vector classification (SVC).

Results
Smoking Task: Strategies for Long Document Classification
Document Truncation
The library SBERT implemented this strategy with pretrained models BERT, ClinicalBERT, and DischargeBERT. As shown in
], these models performed poorly. When long documents were fed directly into the transformers, only the first 512 word pieces were retained.

Transformer | Classifier | Micro-averaged F1 (%) |
Original text | Selected text | ||
Longformer | SGDa | 63.37 | 78.22 |
Bert-base-uncased | SGD | 67.33 | 90.01 |
DischargeBERT | SGD | 63.37b | 91.09 |
ClinicalBERT | SGD | 60.40 | 94.06 |
aSGD: stochastic gradient descent.
bGiven the size of the data set, some models may have the same results.
More Efficient Self-Attention
The Longformer model uses sparse self-attention instead of full self-attention to process longer documents (up to 4096 tokens). However, as shown in
], it did not outperform the BERT baselines.

Key Sentence Selection
This work used unigrams and bigrams from Pedersen [
] to select key sentences. As shown in , each model performed better on the selected text. The domain-specific pretrained language models ClinicalBERT (94.06%) and DischargeBERT (91.09%) outperformed the open-domain model bert-base-uncased (90.01%).

Hierarchy Representation
In a hierarchy approach, sentence representations are built first and then aggregated into a document-level representation. For a fair comparison, we evaluated and reported the results of previous work [
] with our own evaluation script. As shown in , the fmean architecture in [ ] (94.2%) achieved state-of-the-art performance.

As shown in
, our method (94.1%) achieved performance comparable with the top-performing method. Other earlier work on the smoking task (F1 scores ranged from 77.0% to 90.0%) did not reach the same level of performance.

The key sentence selection and hierarchy representation strategies achieved comparable performance, so their efficiency and memory requirements were also compared. As summarized in
, a GPU was not required for training the machine learning classifiers in the proposed pipeline. The hierarchy representation model required a Tesla M40 GPU (Nvidia Corp) and 1 day to train. Our method was about 9 times faster than the hierarchy representation model for inference. With both the document truncation and key sentence selection strategies, only 1 block per document was processed by the transformer models, so key sentence selection did not further reduce inference time.

Transformer | Micro-averaged F1 (%)
ClinicalBERT (ours) | 94.1 |
fmean [ ] | 94.2
Shared task 1st place [ ] | 90.0
Majority label baseline [ ] | 81.0
CNNb [ ] | 77.0
aOur method and fmean were evaluated by the same script over the test data set. Other results were taken directly from their published reports. For comparison, results are reported to a precision of 0.1%.
bCNN: convolutional neural networks.
Model | Documents | Inference time on test data set (seconds) | Training time (hours) | GPU memory |
fmean [ ] | text | 35.52 | 24 | 16
ClinicalBERT | text | 0.46 | —a | —
+MLClassifier | selected text | 0.437 | 1 | — |
aNot applicable.
Severe DHR Task: Stacked Transformers and Classifiers
The smoking task showed that key sentence selection improved self-attention transformers with length limits. In the DHR task, this strategy was evaluated with various transformers and classifiers. As discussed in Methods, 2 kinds of keywords were evaluated and compared. As an unsupervised method, top TF-IDF [
] feature words were used for key sentence selection. Considering that clinical notes comply with guidelines, keywords were also drawn from the DHR guidelines.

As shown in
, the guideline keywords always improved performance, regardless of the stacked transformers and classifiers. The TF-IDF keywords helped only with the SVC classifier.

Transformers and classifiers | Micro-averaged F1 (%) |
Original text | Selected text | ||||||
TF-IDFa | Guideline |
Bert-base-chinese | |||||||
SVCb | 80.91 | 82.73 | 87.27 | ||||
SGDc | 80.00 | 77.27 | 86.36 | ||||
Medbert-kd-chinese | |||||||
SVC | 81.82 | 83.64 | 89.09 | ||||
SGD | 82.73 | 73.64 | 87.27 |
aTF-IDF: term frequency-inverse document frequency.
bSVC: support vector classification.
cSGD: stochastic gradient descent.
Applications in a 9-Year EHR
Finally, the best configuration was applied to the 9 years of data in Beijing Children’s Hospital’s EHRs. A total of 1155 cases were flagged. After double-checking by 2 clinicians and 2 pharmacists in pediatrics against the criteria for severe DHRs, 357 cases of severe DHRs in children were confirmed (
): anaphylactic shock (n=39), drug-induced hypersensitivity syndrome (n=178), Stevens-Johnson syndrome (n=86), and epidermolysis bullosa (n=54). Only 36 of the 357 severe DHRs had previously been reported to the SRS. About 89.92% of cases were underreported, resulting in insufficient attention from drug regulators and clinicians. This suggests that our method can actively identify severe DHRs, providing additional evidence for pharmacovigilance in children.

The case analysis indicated many suspected drugs that may cause severe DHRs in pediatrics. The suspected drugs leading to anaphylactic shock mainly included pegaspargase injection, L-asparaginase, cefoperazone sulbactam, etc. Phenobarbital, nimesulide, and cephalosporin antibiotics were the key suspected drugs leading to drug-induced hypersensitivity syndrome and Stevens-Johnson syndrome. In addition, lamotrigine, lysine acetylsalicylate, and meropenem were closely related to the occurrence of epidermolysis bullosa.
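As a quick cross-check of these percentages, using the subtype case counts reported in this section:

```python
# Expert-confirmed severe DHR cases by subtype, as reported in this section.
confirmed = {"AS": 39, "DIHS": 178, "SJS": 86, "EB": 54}
srs_reported = 36  # cases previously reported through the SRS

total = sum(confirmed.values())
reporting_rate = srs_reported / total
print(f"{total} confirmed cases; SRS reporting rate {reporting_rate:.2%}; "
      f"underreported {1 - reporting_rate:.2%}")
# → 357 confirmed cases; SRS reporting rate 10.08%; underreported 89.92%
```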
Severe DHRa | Reported in SRSb of BCHc, n | DHR cases confirmed by experts, n |
Diagnosed in BCH | Diagnosed in other hospitals | Total | ||
ASd | 4 | 26 | 13 | 39 |
DIHSe | 16 | 29 | 149 | 178 |
SJSf | 7 | 9 | 77 | 86 |
EBg | 9 | 8 | 46 | 54 |
Total | 36 | 72 | 285 | 357 |
aDHR: drug hypersensitivity reaction.
bSRS: spontaneous reporting system.
cBCH: Beijing Children’s Hospital.
dAS: anaphylactic shock.
eDIHS: drug-induced hypersensitivity syndrome.
fSJS: Stevens-Johnson syndrome.
gEB: epidermolysis bullosa.
Discussion
Principal Findings
The results showed that clinical documents are too long for standard document classification baselines. Among the 4 strategies for long document classification, hierarchy representation and key sentence selection performed best on the smoking task. Moreover, key sentence selection was 9 times faster than the hierarchy representation model for inference. Keywords extracted from task-specific guidelines performed better than the unsupervised method. Domain-specific language models consistently performed better than general embeddings.
A total of 1155 cases were flagged, among which clinicians and pharmacists identified 357 cases of severe DHRs in children. Only 36 of these cases had been reported through the SRS. This result suggests that the reporting rate of the SRS may be as low as 10.08%. The automatic pipeline that scrutinizes clinical notes and reports potential severe DHR cases can help decrease the number of missed positive DHR cases and reduce labor costs at the same time.
The case analysis also found more suspected drugs associated with severe DHRs in pediatrics. The analysis could help promote postmarketing drug risk assessment conducive to rational drug use and improve drug guidelines.
Comparison With Prior Work
Our method achieved comparable performance for the smoking task with the top-performing method (94.1% vs 94.2%). For the DHR task, our method discovered 357 positive cases, about 90% of which were missed by SRS.
Recent work has shown that clinical documents are too long for contextualized language models to process [
- ]. Unlike previous works that only process discharge summaries [ - ], this DHR task deals with documents consisting of all clinical notes associated with 1 inpatient visit. The average length of a discharge summary is typically hundreds of words. In the DHR data set, however, the average document length is several thousand Chinese characters, and some documents contain tens of thousands of Chinese characters.

This work evaluated and compared 4 strategies: document truncation [
], hierarchy representation [ , ], more efficient self-attention [ ], and key sentence selection [ , , , ]. None of these works considered the use of guidelines.Limitations
The proposed method required the annotation of about 200 positive cases for supervised training. When applied to the large archive of EHRs in hospital databases, certain preprocessing steps are still required to prevent malfunctions caused by badly formatted documents. Such preprocessing steps may vary across hospital systems.
Conclusions
Automatic identification of severe DHRs can be approached as a document classification problem. The best strategy for long document classification of clinical notes is key sentence selection with task-specific guideline keywords. The reporting of DHR cases cannot rely solely on the clinicians in charge: over the same period of data, the SRS reported 36 cases, whereas the automatic process discovered 357 cases. The case analysis also found more suspected drugs associated with severe DHRs in pediatrics.
Acknowledgments
This work was supported by grant CST2020CT108 from the Clinical Toxicology Program of Chinese Society of Toxicology, grant DSM2021004 from the Post-marketing Drug Risk Assessment Program of China Society for Drug Regulation, and grant CNHDRC-KJ-W-2021-58 from the Clinical Technology Training Program for Comprehensive Evaluation of Pediatric Medication of China National Health and Development Research Center. The funder had no role in conducting the study; collection, management, analysis, and interpretation of data; preparation, review, and approval of the manuscript; or decision to submit the manuscript for publication.
Authors' Contributions
XLW undertook the framework design and provided overall guidance for the research. YCY, XCW, WC, YML, and YFX took responsibility for the data collection. YCY and QYZ performed the data processing and wrote the article. QYZ and XLW provided data interpretation and methodological advice.
Conflicts of Interest
None declared.
Types of drug hypersensitivity reactions and criteria.
DOCX File, 15 KB

References
- Naisbitt DJ. Drug hypersensitivity reactions in skin: understanding mechanisms and the development of diagnostic and predictive tests. Toxicology 2004 Jan 15;194(3):179-196. [CrossRef] [Medline]
- Gomes ER, Brockow K, Kuyucu S, Saretta F, Mori F, Blanca-Lopez N, ENDA/EAACI Drug Allergy Interest Group. Drug hypersensitivity in children: report from the pediatric task force of the EAACI Drug Allergy Interest Group. Allergy 2016 Feb;71(2):149-161. [CrossRef] [Medline]
- Rukasin CRF, Norton AE, Broyles AD. Pediatric Drug Hypersensitivity. Curr Allergy Asthma Rep 2019 Feb 22;19(2):11. [CrossRef] [Medline]
- Lopez-Gonzalez E, Herdeiro MT, Figueiras A. Determinants of under-reporting of adverse drug reactions: a systematic review. Drug Saf 2009;32(1):19-31. [CrossRef] [Medline]
- Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008;15(1):14-24 [FREE Full text] [CrossRef] [Medline]
- Mulyar A, Schumacher E, Rouhizadeh M, Dredze M. Phenotyping of clinical notes with improved document classification models using contextualized neural language models. ArXiv. Preprint posted online on October 30, 2019 [FREE Full text]
- Huang K, Garapati S, Rich A. An interpretable end-to-end fine-tuning approach for long clinical text. ArXiv. Preprint posted online on November 12, 2020 [FREE Full text]
- Valmianski I, Goodwin C, Finn I. Evaluating robustness of language models for chief complaint extraction from patient-generated text. ArXiv. Preprint posted online on November 15, 2019 [FREE Full text]
- Guidelines for Medical Nomenclature Use of Adverse Drug Reactions. Beijing: National Medical Products Administration; 2016.
- Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using Siamese BERT-networks. ArXiv. Preprint posted online on August 27, 2019 [FREE Full text] [CrossRef]
- Pappagari R, Zelasko P, Villalba J, Carmiel Y, Dehak N. Hierarchical transformers for long document classification. ArXiv. Preprint posted online on October 23, 2019 [FREE Full text] [CrossRef]
- Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. ArXiv. Preprint posted online on April 10, 2020 [FREE Full text]
- Ding M, Zhou C, Yang H, Tang J. CogLTX: applying BERT to long texts. Adv Neural Inf Process Syst 2020;33. URL: https://proceedings.neurips.cc/paper/2020/file/96671501524948bc3937b4b30d0e57b9-Paper.pdf [accessed 2022-08-18]
- Fiok K, Karwowski W, Gutierrez-Franco E, Davahli MR, Wilamowski M, Ahram T, et al. Text guide: improving the quality of long text classification by a text selection method based on feature importance. IEEE Access 2021;9:105439-105450. [CrossRef]
- Park H, Vyas Y, Shah K. Efficient classification of long documents using transformers. ArXiv. Preprint posted online on March 21, 2022 [FREE Full text]
- Pedersen T. Determining smoker status using supervised and unsupervised learning with lexical features. URL: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.116.1948&rep=rep1&type=pdf [accessed 2022-08-18]
- Li X, Zhai S, Wang Q, Wang Y, Yin J, Chen Y. Recommendations in guideline for emergency management of anaphylaxis. Adverse Drug React J 2019;21(2):85-91. [CrossRef]
- Allergic Diseases Committee. Expert consensus on diagnosis and treatment of drug hypersensitivity syndrome. Chin J Dermatol 2018;51(11):787-790. [CrossRef]
- Adverse Drug Reaction Research Center of Chinese Society of Dermatology. Expert consensus on the diagnosis and treatment of Stevens-Johnson syndrome/toxic epidermal necrolysis. Chin J Dermatol 2021 May 15;54(5):376-381. [CrossRef]
- Alsentzer E, Murphy J, Boag W, Weng W, Jin D, Naumann T. Publicly available clinical BERT embeddings. ArXiv. Preprint posted online on April 6, 2019 [FREE Full text] [CrossRef]
- Turc I, Chang MW, Lee K, Toutanova K. Well-read students learn better: on the importance of pre-training compact models. ArXiv. Preprint posted online on August 23, 2019 [FREE Full text]
- trueto: research and application of BERT model in Chinese clinical natural language processing. 2021. URL: https://github.com/trueto/medbert [accessed 2021-03-01]
- Clark C, Good K, Jezierny L, Macpherson M, Wilson B, Chajewska U. Identifying smokers with a medical extraction system. J Am Med Inform Assoc 2008;15(1):36-39 [FREE Full text] [CrossRef] [Medline]
- Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, et al. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak 2019 Jan 07;19(1):1 [FREE Full text] [CrossRef] [Medline]
Abbreviations
BERT: bidirectional encoder representation from transformer
DHR: drug hypersensitivity reaction
EHR: electronic health record
NEG: negative
SGD: stochastic gradient descent
SRS: spontaneous reporting system
SVC: support vector classification
TF-IDF: term frequency-inverse document frequency
Edited by T Hao; submitted 08.03.22; peer-reviewed by J Luo, L Chen; comments to author 04.05.22; revised version received 06.07.22; accepted 12.08.22; published 13.09.22
Copyright©Yuncui Yu, Qiuye Zhao, Wang Cao, Xiaochuan Wang, Yanming Li, Yuefeng Xie, Xiaoling Wang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 13.09.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.