Published on 21.04.2021 in Vol 9, No 4 (2021): April

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/23587.
Novel Graph-Based Model With Biaffine Attention for Family History Extraction From Clinical Text: Modeling Study

Original Paper

1Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China

2Baidu International Technology (Shenzhen) Co, Ltd, Shenzhen, China

3Peng Cheng Laboratory, Shenzhen, China

Corresponding Author:

Buzhou Tang, PhD

Key Laboratory of Network Oriented Intelligent Computation

Harbin Institute of Technology

University Town

Shenzhen, 518055

China

Phone: 86 13725525983

Email: tangbuzhou@hit.edu.cn


Background: Family history information, including information on family members, the side of the family they belong to, their living status, and observations about them, plays an important role in disease diagnosis and treatment. Family history information extraction aims to extract this information from semistructured/unstructured text in electronic health records (EHRs). It is a challenging task that involves named entity recognition (NER) and relation extraction (RE), where named entities refer to family members, living status, and observations, and relations link family members to living status and to observations.

Objective: This study aimed to introduce the system we developed for the 2019 n2c2/OHNLP track on family history extraction, which can jointly extract entities and relations about family history information from clinical text.

Methods: We proposed a novel graph-based model with biaffine attention for family history extraction from clinical text. In this model, we first designed a graph to represent family history information, that is, to represent NER and RE regarding family history in a unified way, and then introduced a biaffine attention mechanism to extract family history information from clinical text. A convolutional neural network (CNN)-Bidirectional Long Short Term Memory network (BiLSTM) and Bidirectional Encoder Representation from Transformers (BERT) were used to encode the input sentence, and a biaffine classifier was used to extract family history information. In addition, we developed a postprocessing module to adjust the results. A system based on the proposed method was developed for the 2019 n2c2/OHNLP shared task track on family history information extraction.

Results: Our system ranked first in the challenge, and the F1 scores of the best system on the NER subtask and RE subtask were 0.8745 and 0.6810, respectively. After the challenge, we further fine-tuned the parameters and improved the F1 scores of the two subtasks to 0.8823 and 0.7048, respectively.

Conclusions: The experimental results showed that the system based on the proposed method can extract family history information from clinical text effectively.

JMIR Med Inform 2021;9(4):e23587

doi:10.2196/23587


Family history information plays an important role in the diagnosis and treatment of diseases, especially genetic disorders. Family history information is usually embedded in electronic health records (EHRs) in a semistructured/unstructured format and needs to be unlocked by natural language processing (NLP) technology.

To promote research on family history information extraction, Harvard Medical School and Mayo Clinic organized national NLP challenges on family history information extraction in 2018 and 2019. The family history information extraction task includes the following two subtasks: (1) recognizing family members, living status, and observations and (2) determining which family members the recognized living status and observations belong to. These correspond to two fundamental NLP tasks, namely named entity recognition (NER) and relation extraction (RE). The NER task is usually treated as a sequence labeling task and the RE task as a subsequent classification task, and the two are usually tackled by pipeline methods.

For the NER task, traditional machine learning methods, such as hidden Markov model (HMM), conditional random field (CRF) [1], and structured support vector machine (SSVM) [2], and deep learning methods, such as Bidirectional Long Short Term Memory network (BiLSTM) CRF [3] and its variants [4,5], have been widely applied. For the RE task, the typical machine learning methods include support vector machine (SVM) [6], convolutional neural network (CNN) [7], and recurrent neural network [8]. The methods mentioned above have also been applied to clinical entity recognition and RE, for example, in the NLP challenges organized by i2b2 in 2009 [9], 2010 [10], 2012 [11], and 2014 [12], the NLP challenges organized by SemEval in 2015 [13] and 2016 [14], the NLP challenges organized by ShARe/CLEF in 2013 [15] and 2014 [16], and the NLP challenges organized by BioCreative/OHNLP in 2018 [17]. Most of these methods process the NER and RE tasks in a pipeline, which can suffer from error propagation [18].

A number of joint learning methods have been proposed [18,19] for NER and RE subtasks to avoid error propagation from NER to RE. In the case of family history information extraction, Shi et al [17] developed deep joint learning based on the BiLSTM that won the 2018 BioCreative/OHNLP challenge [20]. Joint learning methods generally used pretrained neural language models. Neural language models pretrained on large-scale unlabeled text have recently been proven to be surprisingly effective in many downstream tasks, and Bidirectional Encoder Representation from Transformers (BERT) [21] is one of the most popular neural language models.

In this study, we proposed a novel graph-based model with biaffine attention. Inspired by the dependency parsing task [22,23], we designed a novel graph-based schema to represent family history information and introduced deep biaffine attention [22,23] to extract family history information from clinical text. A system based on the proposed method was developed for the 2019 n2c2/OHNLP challenge on family history information extraction, and it achieved the highest F1 scores of 0.8823 on subtask1 and 0.7048 on subtask2.


Task Description

There were two subtasks in the 2019 n2c2/OHNLP challenge on family history information extraction. For subtask1, we need to recognize family members with the side of the family, living status mentioned in clinical text, and observations in the family history. All family members can be normalized to standard forms in Table 1. The property of family members named “side of family” includes the following three possible values: NA (“not applicable”), maternal, and paternal. Following the work of Shi et al [17], we compared two different strategies. The first strategy recognized three types of entities (family member, observation, and living status) and determined the “side of family” property for each family member entity through a postprocessing module. The second strategy recognized five types of entities (NA, maternal, paternal, observation, and living status), directly determining the “side of family” property of family members.

For subtask2, we need to extract the relations between family members, observations, and living status. Living status is used to represent the health status of family members, and it has the two properties of “Alive” and “Healthy.” Each property was measured by a real-valued score (yes: 2, NA: 1, and no: 0). The total living status score of family members was their alive score multiplied by their health score. We also need to predict the negation information (Negated and Non_Negated) for each observation, that is, to judge whether the family members have certain diseases or not.

Table 1. Normalized family member names.
Degree | Normalized family member names
1 | Father, Mother, Parent, Sister, Brother, Daughter, Son, and Child
2 | Grandmother, Grandfather, Grandparent, Cousin, Sibling, Aunt, and Uncle

Data Statistics

We conducted experiments on the corpus provided by the 2018 and 2019 n2c2/OHNLP shared task tracks on family history information extraction. The training set of the 2019 n2c2/OHNLP shared task together with the test set of the 2018 BioCreative/OHNLP shared task was used as the final training set of 149 EHRs for model training. The test data set of the 2019 n2c2/OHNLP shared task, including 117 EHRs, was used for model testing. During model training, we randomly selected a development set of 14 EHRs from the training set for parameter optimization. The statistics of the corpus used in this study are shown in Table 2.

Table 2. Detailed data set statistics.
Item | Training set, n | Development set, n | Test set, n
Document | 149 | 14 | 117
Sentence | 770 | 71 | 644
FM^a: overall | 1128 | 94 | -^b
FM: NA^c | 631 | 55 | -
FM: maternal | 272 | 24 | -
FM: paternal | 225 | 15 | -
OB^d | 1439 | 127 | -
LS^e | 596 | 52 | -
FM-OB: overall | 1064 | 97 | -
FM-OB: NA-OB | 575 | 57 | -
FM-OB: maternal-OB | 265 | 23 | -
FM-OB: paternal-OB | 224 | 17 | -
FM-LS: overall | 605 | 53 | -
FM-LS: NA-LS | 334 | 29 | -
FM-LS: maternal-LS | 145 | 12 | -
FM-LS: paternal-LS | 126 | 12 | -

^a FM: family member.

^b Not available.

^c NA: not applicable.

^d OB: observation.

^e LS: living status.

Graph-Based Schema

Similar to the dependency parsing task, where each token has a head token, we transformed the family history information extraction task into a dependency parsing problem, where a dummy root (denoted by “ROOT”) was added at the beginning of each sentence and arcs denoted links between two tokens. In the “dependency parsing tree” of a sentence, tokens in each entity were connected by “app” arcs from right to left, two entities with a relation were connected by an arc between their rightmost tokens labeled with the entity type, and tokens not in any entity were connected to the “ROOT” node by “NULL” arcs. Figure 1 shows an example of using a “dependency parsing tree” to represent family history information, where the family member entity “children” was determined by the “Family Member” arc from “ROOT” to “children,” the living status entity “generally healthy” was determined by the “app” arc linking “healthy” and “generally,” and the relation between “children” and “generally healthy” was determined by the arc from “children” to “healthy.”

Figure 1. Example of using a graph-based schema to represent family history information.
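The schema can be illustrated with a small Python sketch. The sentence, the helper name `decode`, and the exact label strings are hypothetical; the sketch also assumes that the head of each “app” token is the next token to its right inside the entity and that a relation arc carries the entity type of its dependent, which is one reading of the description above rather than the authors' released implementation.

```python
ROOT, APP, NULL = 0, "app", "NULL"

# Hypothetical sentence; index 0 is the dummy ROOT token.
tokens = ["ROOT", "Her", "children", "are", "generally", "healthy"]
# heads[j] is the head index of token j; labels[j] is the label of the arc head -> j.
heads = [0, 0, 0, 0, 5, 2]
labels = [None, NULL, "Family Member", NULL, APP, "Living Status"]


def decode(tokens, heads, labels):
    """Turn per-token (head, label) pairs into entities and relations."""

    def last_token(j):
        # Follow "app" arcs until the rightmost token of the entity is reached.
        seen = set()
        while labels[j] == APP and j not in seen:
            seen.add(j)
            j = heads[j]
        return j

    spans = {}  # index of an entity's last token -> indices of its tokens
    for j in range(1, len(tokens)):
        if labels[j] == NULL:
            continue
        end = last_token(j)
        if end != ROOT and labels[end] not in (NULL, APP):
            spans.setdefault(end, []).append(j)

    owner = {j: end for end, members in spans.items() for j in members}
    entities = {
        end: (" ".join(tokens[j] for j in sorted(members)), labels[end])
        for end, members in spans.items()
    }
    # A non-"app" arc whose head lies inside another entity encodes a relation.
    relations = [
        (entities[owner[heads[end]]][0], entities[end][0])
        for end in sorted(spans)
        if heads[end] != ROOT and heads[end] in owner
    ]
    return [entities[end] for end in sorted(spans)], relations


entities, relations = decode(tokens, heads, labels)
print(entities)
print(relations)
```

Because every token has exactly one head in this encoding, an entity can be attached to at most one other entity, which keeps decoding a single pass over the tokens.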

Model Architecture

As shown in Figure 2, our model contained the following two main parts: (1) a representation module, which represented input text using BERT and CNN-BiLSTM and (2) a biaffine attention module to predict label score vectors, including unlabeled arc prediction (top left in Figure 2) and arc label prediction (top right in Figure 2). We have presented them in the following sections in detail.

Figure 2. Overview architecture of our model.

Representation Layer

Given a sentence $s = x_1 \ldots x_i \ldots x_n$, where $x_i$ is the $i$th token of $s$, we used BERT and CNN-BiLSTM to represent it separately as follows:

where CNN [4] is first used to get the character-level representation of each token, and BiLSTM is then used to get the contextual representation of each token in CNN-BiLSTM. The final representation of token xi is

Biaffine Attention Layer

Unlabeled Arc Prediction

Considering the ith token and the jth token, we fed their corresponding representations into a bilinear transformation extension called a biaffine function to get the score of the arc from token i (head) to j (dependent) as follows:
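Under the definitions given in the next paragraph, and following the deep biaffine parser of Dozat and Manning [22], the arc score would take the standard form below. This is a sketch consistent with the stated dimensions, not necessarily the exact expression used in this system.

```latex
s_{ij}^{(\mathrm{arc})}
  = \mathbf{r}_i^{(\mathrm{arc\text{-}head})\top} U^{(\mathrm{arc})} \mathbf{r}_j^{(\mathrm{arc\text{-}dep})}
  + \mathbf{r}_i^{(\mathrm{arc\text{-}head})\top} \mathbf{u}^{(\mathrm{arc})}
```

The first term scores the pair (head $i$, dependent $j$), while the second term acts as a prior on how likely token $i$ is to be a head at all.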

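where $\mathbf{r}_j^{(\mathrm{arc\text{-}dep})} \in \mathbb{R}^p$ and $\mathbf{r}_i^{(\mathrm{arc\text{-}head})} \in \mathbb{R}^p$ are the outputs of multilayer perceptrons, $U^{(\mathrm{arc})} \in \mathbb{R}^{p \times p}$ is a weight matrix controlling the strength of the arc from token $i$ to $j$, and $\mathbf{u}^{(\mathrm{arc})} \in \mathbb{R}^p$ is a bias vector.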

Assume that $\mathbf{s}_j^{(\mathrm{arc})} = [s_{1j}^{(\mathrm{arc})}; \ldots; s_{nj}^{(\mathrm{arc})}]$ is the score vector of all possible heads of the $j$th token. We adopted the softmax function to compute the probability distribution $\mathbf{d}_j$ of all possible heads of token $j$ and the cross-entropy between the predicted $\mathbf{d}_j$ and gold standard $\mathbf{d}_j^{(\mathrm{arc})}$ as the loss function as follows:
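With the symbols defined above, the softmax and cross-entropy steps can be written in the usual way. The sketch below assumes that the gold standard distribution is one-hot and that the loss is summed over the tokens of a sentence; whether the original formulation sums or averages is not stated.

```latex
\mathbf{d}_j = \mathrm{softmax}\!\left(\mathbf{s}_j^{(\mathrm{arc})}\right),
\qquad
\mathcal{L}^{(\mathrm{arc})} = -\sum_{j=1}^{n} \mathbf{d}_j^{(\mathrm{arc})\top} \log \mathbf{d}_j
```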

Thereafter, the best head of token j was determined according to
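In biaffine parsers of this kind, the predicted head is normally the candidate with the highest probability, that is, the argmax over $\mathbf{d}_j$. The following pure-Python sketch walks through arc scoring, softmax, head selection, and the arc loss on toy vectors. All function names and numbers are hypothetical, and greedy argmax decoding is an assumption rather than a confirmed detail of the system.

```python
import math


def arc_score(r_head, r_dep, U, u):
    """Biaffine score r_head^T U r_dep + r_head^T u for one (head, dependent) pair."""
    p = len(r_head)
    bilinear = sum(r_head[a] * U[a][b] * r_dep[b] for a in range(p) for b in range(p))
    bias = sum(r_head[a] * u[a] for a in range(p))
    return bilinear + bias


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_distribution(j, head_reprs, dep_reprs, U, u):
    """Probability of every token i being the head of token j."""
    scores = [arc_score(head_reprs[i], dep_reprs[j], U, u) for i in range(len(head_reprs))]
    return softmax(scores)


def best_head(dist):
    """Greedy decoding: index of the most probable head."""
    return max(range(len(dist)), key=dist.__getitem__)


def arc_loss(dist, gold_head):
    """Cross-entropy against a one-hot gold head."""
    return -math.log(dist[gold_head])


# Toy example with p = 2 and three tokens (index 0 plays the role of ROOT).
head_reprs = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
dep_reprs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
u = [0.0, 0.0]

dist = head_distribution(1, head_reprs, dep_reprs, U, u)
print(best_head(dist), arc_loss(dist, 0))
```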

Arc Label Prediction

For each unlabeled arc, we need to determine its label. Assume that $\mathbf{s}_{ij}^{(\mathrm{lab})} \in \mathbb{R}^{|L|}$ is the label score vector for each arc from token $i$ to $j$, where $|L|$ is the size of the label set. We can compute $\mathbf{s}_{ij}^{(\mathrm{lab})}$ as follows:
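Given the parameter shapes listed in the next paragraph (a third-order tensor, a weight matrix over the concatenated representations, and a bias vector), the label score would follow the standard biaffine label classifier of Dozat and Manning [22]. The form below is a sketch consistent with those shapes, not a verbatim copy of the original expression.

```latex
\mathbf{s}_{ij}^{(\mathrm{lab})}
  = \mathbf{r}_i^{(\mathrm{label\text{-}head})\top} \mathbf{U}^{(\mathrm{label})} \mathbf{r}_j^{(\mathrm{label\text{-}dep})}
  + W^{(\mathrm{label})} \left[\mathbf{r}_i^{(\mathrm{label\text{-}head})}; \mathbf{r}_j^{(\mathrm{label\text{-}dep})}\right]
  + \mathbf{u}^{(\mathrm{label})}
```

Here the bilinear term yields one score per label because $\mathbf{U}^{(\mathrm{label})}$ holds one $p \times p$ slice for each of the $|L|$ labels.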

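where $\mathbf{r}_j^{(\mathrm{label\text{-}dep})} \in \mathbb{R}^p$ and $\mathbf{r}_i^{(\mathrm{label\text{-}head})} \in \mathbb{R}^p$ are outputs of multilayer perceptrons, $\mathbf{U}^{(\mathrm{label})} \in \mathbb{R}^{|L| \times p \times p}$ is a third-order tensor, $W^{(\mathrm{label})} \in \mathbb{R}^{|L| \times 2p}$ is a weight matrix, and $\mathbf{u}^{(\mathrm{label})} \in \mathbb{R}^{|L|}$ is a bias vector.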

We also adopted the softmax function to compute the probability distribution $\mathbf{d}_{ij}$ of all possible labels of the arc from token $i$ to $j$ and the cross-entropy between the predicted $\mathbf{d}_{ij}$ and gold standard $\mathbf{d}_{ij}^{(\mathrm{label})}$ as the loss function as follows:

Thereafter, the best label of the arc from token i to j was determined by

The total loss function was set as
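A common choice in biaffine parsers [22] is the unweighted sum of the arc loss and the label loss. The pure-Python sketch below assumes that form; the sum is an assumption, since the exact combination used here is not recoverable from the text. The sketch also shows label scoring for a single arc with hypothetical names, a hypothetical three-label set, and toy parameters.

```python
import math


def label_scores(r_head, r_dep, U, W, u):
    """Score vector over labels for one arc.

    U has shape |L| x p x p, W has shape |L| x 2p, and u has length |L|.
    """
    p = len(r_head)
    concat = r_head + r_dep
    scores = []
    for l in range(len(u)):
        bilinear = sum(r_head[a] * U[l][a][b] * r_dep[b] for a in range(p) for b in range(p))
        linear = sum(W[l][k] * concat[k] for k in range(2 * p))
        scores.append(bilinear + linear + u[l])
    return scores


def cross_entropy(scores, gold):
    """Softmax cross-entropy against a one-hot gold index, computed stably."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[gold]


def total_loss(arc_scores, gold_head, lab_scores, gold_label):
    # ASSUMPTION: unweighted sum of the two losses; the original formula is not shown.
    return cross_entropy(arc_scores, gold_head) + cross_entropy(lab_scores, gold_label)


# Toy example: p = 2 and a hypothetical label set of size 3.
LABELS = ["NULL", "app", "Family Member"]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
U = [zeros, zeros, identity]
W = [[0.0] * 4 for _ in LABELS]
u = [0.0, 0.0, 0.0]

scores = label_scores([1.0, 0.0], [1.0, 0.0], U, W, u)
best_label = LABELS[max(range(len(scores)), key=scores.__getitem__)]
print(best_label, total_loss([2.0, 0.0, 1.0], 0, scores, 2))
```

As in the arc case, the predicted label is taken as the argmax over the label scores, which is again a greedy decoding assumption.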

Postprocessing Rules

We designed a rule-based postprocessing module to adjust the outputs of our model. It included the following five parts:

1. Converting the output to entities and relations.

(1) Combining all tokens connected by “app” arcs to form entities and assigning them the label of their last token.

(2) If there was an arc between two entities, but not an “app” arc, there was a relation between them.

2. Normalizing family members.

(1) Converting family member entities into normalized forms as shown in Table 1. For example, we converted the recognized “father’s father” into “grandfather” and “aunt’s son” into “cousin.”

(2) Excluding unnecessary family members, such as a patient’s nonblood relatives. For example, if the family member “father” appeared in the section “partner’s father,” we removed “father” since father-in-law was not in Table 1.

3. Determining the side of family members when using the strategy of three types of entities.

(1) If a family member was a first-degree relative, the side of the family was set as “NA.”

(2) If a family member was in the section “maternal family history” or “paternal family history,” the side of the family was set as maternal or paternal.

(3) If there was an indicator (“maternal” or “paternal”) near a family member, the side of the family was determined by the indicator.

(4) Otherwise, the side of the family of a family member was set as “NA.”

4. Determining the living status score of family members following the work of Shi et al [17].

(1) Determining the scores of the properties “Alive” and “Healthy” of a family member through searching the keywords listed in Table 3 from the family member’s living status. If a living status entity contained some keywords listed in Table 3, we assigned its property scores with the corresponding scores; otherwise, both its alive score and healthy score were set as NA=1.

(2) The total living status score was determined according to the alive score and healthy score. For a relative with “Alive=Yes” and “Healthy=Yes,” for example, the living status score should be 4.

5. Determining the negation information of observations.

(1) Determining the negation information of an observation through searching keywords (no, never, not, none, negative, neither, nor, unremarkable, and deny) from the observation’s context. If the context of an observation contained a keyword mentioned above, we set its negation information as “Negated;” otherwise, it was set as “Non_Negated.”

(2) Reversing the negation information of an observation if there were specific phrases, such as “apart from” and “except for,” in the observation’s context. For example, the negation information of the observation entity “Meniere disease” in “there is no history of hearing loss apart from the father's history of Meniere disease” was set as “Non_Negated” rather than “Negated.”
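The negation rules in part 5 can be sketched as follows. The function name is hypothetical, and several choices are assumptions: the whole sentence is used as the observation's context, cue words are matched as whole tokens, and the reversal is applied only when an exception phrase appears before the observation and a negation cue was found.

```python
import re

NEGATION_CUES = {"no", "never", "not", "none", "negative", "neither", "nor",
                 "unremarkable", "deny"}
EXCEPTION_PHRASES = ("apart from", "except for")


def negation_status(sentence, observation):
    """Return "Negated" or "Non_Negated" for an observation mentioned in a sentence."""
    text = sentence.lower()
    obs_start = text.find(observation.lower())
    if obs_start == -1:
        raise ValueError("observation not found in sentence")

    # ASSUMPTION: the context is the whole sentence, split into lowercase word tokens.
    tokens = re.findall(r"[a-z]+", text)
    negated = any(token in NEGATION_CUES for token in tokens)

    # ASSUMPTION: an exception phrase located before the observation cancels the negation.
    has_exception = any(text.rfind(phrase, 0, obs_start) != -1
                        for phrase in EXCEPTION_PHRASES)
    if negated and has_exception:
        negated = False
    return "Negated" if negated else "Non_Negated"


example = ("There is no history of hearing loss apart from "
           "the father's history of Meniere disease")
print(negation_status(example, "Meniere disease"))
print(negation_status(example, "hearing loss"))
```

Whole-token matching avoids false hits such as “no” inside “known,” at the cost of missing inflected forms such as “denies”; a real system would need to decide how to handle those.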

Table 3. Keywords used to determine the properties “Alive” and “Healthy.”
Property | Keywords
Alive: Yes=2 | Alive and living
Alive: No=0 | Dead, die, deceased, death, died, stillborn, and passed away
Healthy: Yes=2 | Good, health, without problems, healthy, and well
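The keyword lookup in Table 3 and the multiplication of the alive and healthy scores can be sketched in a few lines. The function names are hypothetical, and case-insensitive substring matching is an assumption, since the matching strategy is not specified.

```python
# Keyword lists copied from Table 3; scores follow yes = 2, NA = 1, no = 0.
ALIVE_KEYWORDS = {
    2: ["alive", "living"],
    0: ["dead", "die", "deceased", "death", "died", "stillborn", "passed away"],
}
HEALTHY_KEYWORDS = {
    2: ["good", "health", "without problems", "healthy", "well"],
}
NA_SCORE = 1


def property_score(text, keywords):
    """Return the score of the first keyword group found in the text, else NA = 1."""
    lowered = text.lower()
    for score, words in keywords.items():
        # ASSUMPTION: simple case-insensitive substring matching.
        if any(word in lowered for word in words):
            return score
    return NA_SCORE


def living_status_score(text):
    """Total living status score: alive score multiplied by healthy score."""
    return property_score(text, ALIVE_KEYWORDS) * property_score(text, HEALTHY_KEYWORDS)


for status in ["alive and well", "generally healthy", "deceased", "unknown"]:
    print(status, living_status_score(status))
```

With this scheme, a relative with “Alive=Yes” and “Healthy=Yes” receives 2 × 2 = 4, matching the example in part 4 of the postprocessing rules, while any mention of death drives the total to 0.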

Experimental Settings

The hyperparameters used in our experiments are listed in Table 4, and all other parameters were optimized on the development set. The pretrained BERT model we used was [BERT-Base, Uncased] [24].

We first investigated our model in the following two settings: (1) a pipeline model that tackled unlabeled arc prediction and arc label prediction separately and (2) a joint model that tackled unlabeled arc prediction and arc label prediction simultaneously. The joint model predicted the head and the arc label of each token jointly. The pipeline model first trained one model to predict the head of each token and then trained another model to predict the label of each arc according to the predicted head. Thereafter, we compared our model with a BERT-based model using the same architecture as that of the model by Shi et al [17], except that BERT was used instead of word embeddings in the input layer (denoted by BERT-2BiLSTM). Finally, we looked into the effect of the sentence representation based on CNN-BiLSTM on our model and the effect of different data sets on our model. The performance of all models for the two subtasks was measured by precision, recall, and F1 score (F1) as follows:
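In terms of the counts defined in the next paragraph, these metrics have their standard definitions:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```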

where TP denotes the number of true-positive samples, FP denotes the number of false-positive samples, and FN denotes the number of false-negative samples. We used the tool provided by the organizers [25] to calculate them. The tool accepted partial matching of the observations, for example, the recognized observation “diabetes” whose gold standard observation is “type 2 diabetes” was considered as a true-positive sample. The source code is available at GitHub [26].

Table 4. Major hyperparameters.
Parameter | Value
BiLSTM^a size | 256
Arc MLP^b size | 500
Label MLP size | 100
BERT^c size | 768
Char embedding size | 25
CNN^d kernel size | (3, 4, 5)
Char-level CNN size | 50
Dropout | 0.5
Optimizer | Adam
Learning rate | 2e-5
Batch size | 32
Max epoch | 100

^a BiLSTM: Bidirectional Long Short Term Memory network.

^b MLP: multilayer perceptron.

^c BERT: Bidirectional Encoder Representation from Transformers.

^d CNN: convolutional neural network.


As shown in Table 5, the model considering five types of entities performed better than the model considering three types of entities. The joint model considering five types of entities achieved the highest F1 scores of 0.8823 on the NER subtask and 0.7048 on the RE subtask, which were higher than those of the joint model considering three types of entities by 1.20 percentage points on the NER subtask and 1.87 percentage points on the RE subtask.

Compared to the pipeline model, the joint model performed better on both NER and RE. For example, when considering five types of entities, the joint model outperformed the pipeline model by 1.21 percentage points in the F1 score on the NER subtask and 1.97 percentage points in the F1 score on the RE subtask. This indicates that error propagation was partially alleviated in our joint model. When considering five types of entities, the joint model achieved higher F1 scores than BERT-2BiLSTM on the NER subtask and RE subtask by 1.18 and 0.39 percentage points, respectively.

Table 5. Performance of different models.
Subtask | Model | Three types: Precision | Three types: Recall | Three types: F1 score | Five types: Precision | Five types: Recall | Five types: F1 score
NER^a | Pipeline | 0.9254 | 0.8062 | 0.8617 | 0.9241 | 0.8223 | 0.8702
NER | Joint | 0.9012 | 0.8415 | 0.8703 | 0.9154 | 0.8514 | 0.8823
NER | BERT^b-2BiLSTM^c | -^d | - | - | 0.9096 | 0.8347 | 0.8705
RE^e | Pipeline | 0.7909 | 0.6005 | 0.6827 | 0.7895 | 0.6051 | 0.6851
RE | Joint | 0.7679 | 0.6200 | 0.6861 | 0.7717 | 0.6487 | 0.7048
RE | BERT-2BiLSTM | - | - | - | 0.7686 | 0.6441 | 0.7009

^a NER: named entity recognition.

^b BERT: Bidirectional Encoder Representation from Transformers.

^c BiLSTM: Bidirectional Long Short Term Memory network.

^d Not available.

^e RE: relation extraction.

The performance of our best model on each type of family member information and relation (except living status, which was not provided in the test set) is listed in Table 6. On the NER subtask, our model performed better on observations than on family members by 3.80 percentage points in terms of the F1 score. Among the three types of family members, our model achieved the highest F1 score of 0.8702 for maternal family members and the lowest F1 score of 0.8411 for paternal family members. On the RE subtask, the F1 score of our model on the family member-living status relation was nearly the same as that on the family member-observation relation. Among the family member-observation relations, our model performed worse on the maternal-observation relation than on the other two types of relations. Among the family member-living status relations, our model performed worse on the paternal-living status relation than on the other two types of relations.

Table 6. Performance of the best model on each type of family member information.
Subtask | Type | Precision | Recall | F1 score
NER^a | FM^b: overall | 0.8814 | 0.8386 | 0.8594
NER | FM: NA^c | 0.8699 | 0.8515 | 0.8606
NER | FM: maternal | 0.9185 | 0.8267 | 0.8702
NER | FM: paternal | 0.8738 | 0.8108 | 0.8411
NER | OB^d | 0.9385 | 0.8598 | 0.8974
NER | LS^e | -^f | - | -
NER | Overall | 0.9154 | 0.8514 | 0.8823
RE^g | FM-OB: overall | 0.7843 | 0.6397 | 0.7047
RE | FM-OB: NA-OB | 0.8595 | 0.6098 | 0.7134
RE | FM-OB: maternal-OB | 0.7067 | 0.6601 | 0.6826
RE | FM-OB: paternal-OB | 0.7077 | 0.7150 | 0.7113
RE | FM-LS: overall | 0.7627 | 0.6553 | 0.7050
RE | FM-LS: NA-LS | 0.7627 | 0.6553 | 0.7050
RE | FM-LS: maternal-LS | 0.7108 | 0.7375 | 0.7239
RE | FM-LS: paternal-LS | 0.6825 | 0.6825 | 0.6825
RE | Overall | 0.7717 | 0.6487 | 0.7048

^a NER: named entity recognition.

^b FM: family member.

^c NA: not applicable.

^d OB: observation.

^e LS: living status.

^f Not available.

^g RE: relation extraction.

As shown in Table 7, without the additional data from BioCreative/OHNLP 2018, our model considering five types of entities achieved an F1 score of 0.8648 on the NER subtask and 0.6612 on the RE subtask. These scores are 1.75 and 4.36 percentage points lower than those obtained with the additional data (0.8823 and 0.7048, respectively), showing the importance of the additional training data.

Table 7. Performance of our model with different data.
Subtask | Data set | Three types: Precision | Three types: Recall | Three types: F1 score | Five types: Precision | Five types: Recall | Five types: F1 score
NER^a | 2019 | 0.8767 | 0.8409 | 0.8584 | 0.8847 | 0.8458 | 0.8648
NER | 2018+2019^b | -^c | - | - | 0.9154 | 0.8372 | 0.8745
NER | 2018+2019^d | 0.9012 | 0.8415 | 0.8703 | 0.9154 | 0.8514 | 0.8823
RE^e | 2019 | 0.7240 | 0.5973 | 0.6545 | 0.7270 | 0.6064 | 0.6612
RE | 2018+2019^b | - | - | - | 0.7459 | 0.6265 | 0.6810
RE | 2018+2019^d | 0.7679 | 0.6200 | 0.6861 | 0.7717 | 0.6487 | 0.7048

^a NER: named entity recognition.

^b 2018+2019: the challenge submission performances of our model.

^c Not available.

^d 2018+2019: the performances of our best model after the challenge.

^e RE: relation extraction.


Effect of Sentence Representation

To investigate the effect of the sentence representation based on CNN-BiLSTM on our model, we evaluated the model without this representation and obtained an F1 score of 0.8802 on the NER subtask and an F1 score of 0.7059 on the RE subtask when considering five types of entities. The sentence representation based on CNN-BiLSTM thus brought a small improvement on the NER subtask (0.8823 vs 0.8802) but a small loss on the RE subtask (0.7048 vs 0.7059). A possible direction for further improvement is to share only BERT between NER and RE.

Impact of Different Decoders on the NER Subtask

Traditional approaches regarded the NER task as a sequence labeling task, in which each token was assigned with a combined label of entity boundary and type. The entity boundaries were represented by the BIO schema, where “B” indicates the beginning of an entity, “I” indicates the inside of an entity, and “O” indicates the outside of an entity. Using a graph schema, we can also convert NER into a graph in the following way: (1) connect all tokens with “ROOT,” that is, the heads of all tokens are set to 0 and (2) set the label of the nonentity token to “NULL,” set the label of the last token in the entity to the entity type, and set the label of the remaining token in the entity to “app.”
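The two conversion steps above can be sketched as a small function that maps BIO tags to the graph encoding. The function name, tag strings, and example sentence are hypothetical; only the mapping rules come from the description.

```python
def bio_to_graph(tags):
    """Convert BIO tags (ROOT excluded) into per-token heads and arc labels.

    Every token is attached to ROOT (head index 0). Nonentity tokens get the
    label "NULL", the last token of an entity gets the entity type, and the
    remaining tokens of the entity get "app".
    """
    heads = [0] * len(tags)
    labels = []
    for k, tag in enumerate(tags):
        if tag == "O":
            labels.append("NULL")
            continue
        entity_type = tag[2:]  # strip the "B-" or "I-" prefix
        next_tag = tags[k + 1] if k + 1 < len(tags) else "O"
        is_last = next_tag != "I-" + entity_type
        labels.append(entity_type if is_last else "app")
    return heads, labels


# Hypothetical tagging of "Her children are generally healthy".
tags = ["O", "B-FamilyMember", "O", "B-LivingStatus", "I-LivingStatus"]
heads, labels = bio_to_graph(tags)
print(heads)
print(labels)
```

Since all heads are fixed to ROOT in this NER-only setting, the arc prediction component has nothing to learn and all the information sits in the arc labels.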

We compared different decoders, that is, CRF for sequence labeling, biaffine for NER only (biaffine-NER), and biaffine for joint NER and RE (biaffine-Joint). As shown in Table 8, the performance of biaffine-NER was slightly better than that of CRF, while biaffine-Joint was considerably better than the other two models. Although the head prediction was not directly related to the NER task, the arcs of different types among tokens provided global information that was beneficial to the NER task.

Table 8. Comparison of different decoders on the named entity recognition subtask.
Decoder | Three types: Precision | Three types: Recall | Three types: F1 score | Five types: Precision | Five types: Recall | Five types: F1 score
CRF^a | 0.8989 | 0.8316 | 0.8639 | 0.9070 | 0.8390 | 0.8717
Biaffine-NER^b | 0.9001 | 0.8310 | 0.8641 | 0.8895 | 0.8570 | 0.8729
Biaffine-Joint | 0.9012 | 0.8415 | 0.8703 | 0.9154 | 0.8514 | 0.8823

^a CRF: conditional random field.

^b NER: named entity recognition.

Error Analysis

We performed error analysis on our model considering five types of entities in the development data set. In the case of the NER subtask, 88.24% of errors were boundary errors because of wrong “app” arc prediction, while the remaining 11.76% of errors were type errors that have a correct boundary but wrong entity type. For example, in the sentence “The paternal grandmother, age 53, has wind sucking attributed to not having intestinal during her life,” the paternal entity “grandmother” with the observation entity “wind sucking” was wrongly recognized as a family member entity. In the RE subtask, all errors were caused by incorrect entities. For example, in the sentence “The patient’s father is 43 years old and healthy. His father is 72 years old and was diagnosed with esophageal cancer at age 70,” the family member entity “grandfather” with the observation entity “esophageal cancer” was wrongly extracted as the family member entity “father” with the observation entity “esophageal cancer” as our model could not understand that “his” refers to “the patient’s father,” which needs strong indirect relative reasoning.

Limitations and Future Work

The rule-based postprocessing module in our system cannot handle all cases properly, as shown by the example in the error analysis section. In future work, we will try to solve indirect relative reasoning for further improvement.

Conclusions

In this study, we proposed a novel graph-based model with biaffine attention, where a graph-based schema was designed to represent entities and relations regarding family history in a unified way and deep biaffine attention was adopted to extract the entities and relations from clinical text. Our system based on the proposed model achieved the highest F1 score of the challenge to date.

Acknowledgments

This paper was supported in part by the following grants: National Natural Science Foundations of China (U1813215, 61876052, and 61573118), Special Foundation for Technology Research Program of Guangdong Province (2015B010131010), National Natural Science Foundations of Guangdong, China (2019A1515011158), Guangdong Province Covid-19 Pandemic Control Research Fund (2020KZDZX1222), Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20180306172232154 and JCYJ20170307150528934), and Innovation Fund of Harbin Institute of Technology (HIT.NSRIF.2017052).

Conflicts of Interest

None declared.

  1. Lafferty J, Mccallum A, Pereira F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the Eighteenth International Conference on Machine Learning. 2001 Presented at: Eighteenth International Conference on Machine Learning; June 28-July 1, 2001; Williamstown, MA, USA p. 282-289.
  2. Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Med Inform Decis Mak 2013;13(Suppl 1):S1.
  3. Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv. 2015. URL: http://arxiv.org/abs/1508.01991 [accessed 2021-03-29]
  4. Ma X, Hovy E. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016 Presented at: 54th Annual Meeting of the Association for Computational Linguistics; 2016; Berlin, Germany p. 1064-1074.
  5. Tang B, Hu J, Wang X, Chen Q. Recognizing Continuous and Discontinuous Adverse Drug Reaction Mentions from Social Media Using LSTM-CRF. Wireless Communications and Mobile Computing 2018;2018:1-8.
  6. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995 Sep;20(3):273-297.
  7. Sahu S, Anand A, Oruganty K. Relation extraction from clinical texts using domain invariant convolutional neural network. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing. 2016 Presented at: 15th Workshop on Biomedical Natural Language Processing; 2016; Berlin, Germany p. 206-215.
  8. Luo Y. Recurrent neural networks for classifying relations in clinical notes. J Biomed Inform 2017 Aug;72:85-95.
  9. Uzuner O, Solti I, Cadag E. Extracting medication information from clinical text. J Am Med Inform Assoc 2010;17(5):514-518.
  10. Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc 2013 Sep 01;20(5):828-835.
  11. Liu Z, Chen Y, Tang B, Wang X, Chen Q, Li H, et al. Automatic de-identification of electronic medical records using token-level and character-level conditional random fields. J Biomed Inform 2015 Dec;58 Suppl:S47-S52.
  12. Zhang Y, Wang J, Tang B. UTH_CCB: A report for SemEval 2014 – Task 7 Analysis of Clinical Text. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014 Presented at: 8th International Workshop on Semantic Evaluation (SemEval 2014); 2014; Dublin, Ireland p. 802-806.
  13. Bethard S, Derczynski L, Savova G, Pustejovsky J, Verhagen M. SemEval-2015 Task 6: Clinical TempEval. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015 Presented at: 9th International Workshop on Semantic Evaluation (SemEval 2015); June 4-5, 2015; Denver, Colorado p. 806-814.
  14. Kelly L, Goeuriot L, Suominen H, Névéol A, Palotti J, Zuccon G. Overview of the CLEF eHealth Evaluation Lab 2016. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Cham: Springer; 2016:255-266.
  15. Suominen H, Salantera S, Velupillai S, Chapman WW, Savova G, Elhadad N, et al. Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Berlin, Heidelberg: Springer; 2013:212-231.
  16. Goeuriot L, Kelly L, Li W, Palotti J, Pecina P, Zuccon G, et al. ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval. In: CLEF 2014 Online Working Notes. 2014 Presented at: CEUR Workshop; September 15-18, 2014; Sheffield, UK p. 43-61.
  17. Shi X, Jiang D, Huang Y, Wang X, Chen Q, Yan J, et al. Family history information extraction via deep joint learning. BMC Med Inform Decis Mak 2019 Dec 27;19(Suppl 10):277.
  18. Li Q, Ji H. Incremental Joint Extraction of Entity Mentions and Relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2014 Presented at: 52nd Annual Meeting of the Association for Computational Linguistics; June 22-27, 2014; Baltimore, MD, USA p. 402-412.
  19. Miwa M, Bansal M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016 Presented at: 54th Annual Meeting of the Association for Computational Linguistics; August 7-12, 2016; Berlin, Germany p. 1105-1116.
  20. Liu S, Mojarad MR, Wang Y, Wang L. Overview of the BioCreative/OHNLP 2018 Family History Extraction Task. In: Proceedings of the BioCreative Workshop. 2018 Presented at: BioCreative Workshop; 2018; Washington, DC, USA.
  21. Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019 Presented at: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); June 2-7, 2019; Minneapolis, MN, USA p. 4171-4186.
  22. Dozat T, Manning C. Deep Biaffine Attention for Neural Dependency Parsing. In: Proceedings of the 5th International Conference on Learning Representations. 2017 Presented at: 5th International Conference on Learning Representations; April 24-26, 2017; Toulon, France.
  23. Yan H, Qiu X, Huang X. A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing. Transactions of the Association for Computational Linguistics 2020 Dec;8:78-92.
  24. TensorFlow code and pre-trained models for BERT. GitHub. URL: https://github.com/google-research/bert [accessed 2021-03-05]
  25. OHNLP/BioCreative 2018 Task 1: Family History Extraction. GitHub. URL: https://github.com/ohnlp/fh_eval [accessed 2021-03-05]
  26. 2019 n2c2/OHNLP Track2 family history extraction. GitHub. URL: https://github.com/zkczjj/2019_n2c2_FHExtraction [accessed 2021-03-26]


BERT: Bidirectional Encoder Representation from Transformers
BiLSTM: Bidirectional Long Short Term Memory
CNN: convolutional neural network
CRF: conditional random field
EHR: electronic health record
NER: named entity recognition
NLP: natural language processing
RE: relation extraction


Edited by Y Wang, F Shen; submitted 17.08.20; peer-reviewed by I Mircheva, X Yang; comments to author 12.10.20; revised version received 18.01.21; accepted 09.02.21; published 21.04.21

Copyright

©Kecheng Zhan, Weihua Peng, Ying Xiong, Huhao Fu, Qingcai Chen, Xiaolong Wang, Buzhou Tang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 21.04.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.