This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
As the manual creation and maintenance of biomedical ontologies are labor-intensive, automatic aids are desirable in the lifecycle of ontology development.
Provided with a set of concept names in the Foundational Model of Anatomy (FMA), we propose an innovative method for automatically generating the taxonomy and partonomy structures among them.
Our approach comprises 2 main tasks: The first task is predicting the direct relation between 2 given concept names by utilizing word embedding methods and training 2 machine learning models, Convolutional Neural Networks (CNN) and Bidirectional Long Short-term Memory Networks (BiLSTM). The second task introduces an original granularity-based method that leverages these trained models to identify the semantic structures among a group of given concept names.
Results show that both CNN and BiLSTM perform well on the first task, with F1 measures above 0.91. For the second task, our approach achieves an average F1 measure of 0.79 on 100 case studies in the FMA using BiLSTM, which outperforms the primitive pairwise-based method.
We have investigated an automatic way of predicting a hierarchical relationship between 2 concept names; based on this, we have further devised a methodology to structure a group of concept names automatically. This study is an initial investigation that will shed light on further work on the automatic creation and enrichment of biomedical ontologies.
Biomedical ontologies are formalized representations of concepts and the relationships among these concepts for the biomedical domain, and they play a vital role in many medical settings [
In recent years, many ontology learning (OL) efforts have been made to automate the construction of ontologies from free text [
An important observation of biomedical ontologies is that the lexical patterns of the concepts often indicate, to a certain degree, the structural relations between them, especially for hierarchical relations. For instance, in the Foundational Model of Anatomy (FMA) [
In this study, we propose an automatic approach for structuring a given set of concept names based on their lexical granularity. We started by investigating an automatic way to predict the direct relation between 2 given concepts by employing machine learning (ML) algorithms. Since word embedding tools such as Word2Vec [
We selected the most common hierarchical relations in biomedical ontologies for experiments: the is-a and part-of relations.
Moving forward, provided with a group of concept names, we aimed to determine how to structure them automatically by utilizing the above ML classifiers. Intuitively, the relative positions of all the concepts can be achieved by pairwise comparisons. However, pairwise comparisons will not only increase the algorithm complexity but also tend to introduce false-positive relations. To deal with this problem, we deployed our previous work [
In the literature, automatic methods have been proposed to reduce human effort in different aspects of the ontology lifecycle. Many researchers have utilized automatic methods to facilitate semantic knowledge extraction for ontology enrichment. For example, Pembeci et al [
Our study differs from the above work mainly in the following aspects: (1) Instead of predicting the insertion place of a new concept or predicting the relation between a particular concept pair, we predict the whole hierarchical structure for a given set of concept names; (2) aside from names of the concepts, we do not need extra information to predict their positions in the whole group; and (3) instead of concatenating the child and the parent, we encode them separately and use their subtraction as an input instance for the ML models.
We tested our methodology in the FMA [
For our analysis, we used version 5.0.0 of the FMA (Structural Informatics Group at the University of Washington) [
We use the FMA to describe the data preparation process without loss of generality. We first extracted all the concept pairs directly related by is-a or part-of relations.
The
The reason that we chose these 2 kinds of
The data preparation process is illustrated in
The data preparation process. The final dataset D consists of 3 parts: (1) all of the direct
Our aim was to train an ML algorithm able to determine which of 3 relations holds between 2 given ordered terms, namely, an is-a relation, a part-of relation, or no direct relation.
We shuffled the input vectors along with their labels and used 80% of them as the training set, 10% of them as the validation set, and the remaining 10% as the testing set. Since FMA terms are all short texts, we selected the classic TextCNN proposed by Yoon Kim [
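The shuffle-and-split step above can be sketched as follows; this is a minimal illustration in which integer indices stand in for the actual (vector, label) instances.

```python
# Minimal sketch of the 80/10/10 shuffle-and-split of the dataset.
import random

def split_dataset(items, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle items and split into training/validation/testing sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for the sketch
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
```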
We ran the models using Keras [
In the CNN model, we used 3 Conv1D layers and 2 MaxPooling1D layers following the input layer. After flattening the last layer’s output, we added 2 dense layers such that the former had a
The second classification model we used was BiLSTM. After the input layer, we added a BiLSTM layer with 32 memory units in the middle. Then, we flattened the output of the last layer to add a dense layer which had a
For both models, we set the training data to batches of size 512 and set the
The testing set was used to evaluate the performance of each model. By comparing the predicted results with the actual relations in the FMA, we calculated precision, recall, and F1 scores for each model separately.
To demonstrate the robustness of our trained models, we repeated the above experiment 100 times and obtained the average precision, recall, and F1 values. The training set, validation set, and testing set were randomly divided each time, but the 8:1:1 ratio was maintained. In the following step, we selected a particular group of terms from each testing set and automatically obtained the taxonomy and partonomy structures among those terms.
The above ML models only predict whether 2 given terms are directly related by is-a or part-of.
An intuitive solution is to use pairwise comparison. Let the target term set be Q. Suppose the number of terms in Q is n; then all n×(n−1) ordered term pairs would need to be classified.
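The cost of the brute-force route can be made concrete with a short sketch: with n terms, n×(n−1) ordered pairs must be fed to the classifier. For the 57-term case examined later in this paper, that is 3192 pairs.

```python
# Ordered pairwise comparison: every ordered pair of distinct terms is a
# (child-candidate, parent-candidate) instance for the classifier.
from itertools import permutations

def ordered_pairs(terms):
    """All ordered pairs of distinct terms, i.e., n*(n-1) pairs for n terms."""
    return list(permutations(terms, 2))

pairs = ordered_pairs([f"t{i}" for i in range(57)])
```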
As such, to reduce the use of pairwise comparison, we deployed our previous work [
The use of lexical granularity to obtain the relative positions of terms. (1) Parallel concept sets (PCS) and PCS thread detection; 7 PCS nodes and 4 PCS threads were detected in this example. PCS: represented by dashed rectangles; Concept names: represented by circles; Substring relations: represented by dashed arrows. (2) Relation prediction.
A parallel concept set (PCS) is a set composed of concepts sharing the same level of conceptual knowledge [
in a single occurrence of the modifiers used [
In order to detect all the symmetric concept pairs in
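A rough sketch of symmetric-pair detection follows, under the assumption that symmetric concepts differ in exactly one occurrence of a paired modifier; the small modifier lexicon below is purely illustrative, not the actual list used in the study.

```python
# Illustrative lexicon of paired (symmetric) modifiers.
SYMMETRIC_MODIFIERS = [("left", "right"), ("superior", "inferior")]

def is_symmetric_pair(term_a, term_b):
    """True if the two terms differ in exactly one symmetric-modifier token."""
    a, b = term_a.lower().split(), term_b.lower().split()
    if len(a) != len(b):
        return False
    # Collect the token positions where the two terms disagree.
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return False
    x, y = diffs[0]
    return (x, y) in SYMMETRIC_MODIFIERS or (y, x) in SYMMETRIC_MODIFIERS
```

For example, "Left hand" and "Right hand" would be grouped into one PCS under this rule, while "Left hand" and "Left foot" would not.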
As noted, for hierarchical relations, the parent term is more general than the child term and is usually a substring of the child term. As a result, we can leverage the substring threads in Q to organize the PCSs identified from the above step. We used
The relative positions of nonroot PCS nodes were determined. Hence, we no longer looked for their parents elsewhere but only predicted relations between concepts in neighboring nodes. Specifically, we first paired each term in the PCS with its substring term in the parent PCS. If no substring term was found in the parent PCS, the term would be paired with every item in the parent. As illustrated in
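The pairing rule just described can be sketched as follows; the function and term names are illustrative, and PCSs are represented simply as lists of term strings.

```python
# Pairing rule between neighboring PCS nodes: each term in the child PCS is
# paired with its substring term(s) in the parent PCS; if none exists, it is
# paired with every term in the parent PCS.
def candidate_pairs(child_pcs, parent_pcs):
    """Return (child term, parent term) pairs to feed to the classifier."""
    pairs = []
    for term in child_pcs:
        substr_parents = [p for p in parent_pcs if p.lower() in term.lower()]
        targets = substr_parents if substr_parents else list(parent_pcs)
        pairs.extend((term, p) for p in targets)
    return pairs

pairs = candidate_pairs(
    ["Head of left first rib", "Head of right first rib"],
    ["Left first rib", "Right first rib"],
)
```

Here each child term has a substring parent, so only 2 pairs (rather than all 4 combinations) are sent to the classifier.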
Determination of term pairs to be fed into machine learning models for relation prediction. Parallel concept sets (PCSs): represented by rectangles; Concept names: represented by ovals. (a) A, B, and C are 3 PCS nodes; A, C and B, C are neighboring nodes along 2 PCS threads, respectively. Substring relations: represented by solid arrows. As the rightmost term C has no substring term in B, it is paired with every term in B, represented by dashed arrows. Each arrow (solid or dashed) connects 2 terms such that the relation between them is predicted using classification models. (b) A and B are 2 different PCS thread roots. Each root is paired with every PCS node in other threads under different roots; red dashed arrows are used to connect them. For instance, (C, A) is such a pair. (c) Classification models are used to predict the pairwise relations between concept names in C and A from the above step.
In regard to the PCS thread roots, if all the threads shared 1 root, no further treatment was needed. If there existed more than 1 different thread root, we still leveraged pairwise comparison to determine the parents for the roots: For each PCS thread root, we first paired it with every PCS node in other threads, as illustrated in
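The root-handling step can be sketched as follows; the representation of a thread as a list whose first element is its root is an assumption made for illustration.

```python
# Sketch of root handling: when several PCS threads have different roots,
# each root is paired with every node in the other threads for prediction.
def root_candidate_pairs(threads):
    """threads: list of threads; threads[i][0] is taken as the root of thread i."""
    pairs = []
    for i, thread in enumerate(threads):
        root = thread[0]
        for j, other in enumerate(threads):
            if i == j:
                continue  # a root is never paired with nodes of its own thread
            pairs.extend((root, node) for node in other)
    return pairs

threads = [["A", "A1", "A2"], ["B", "B1"]]
pairs = root_candidate_pairs(threads)
```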
Lastly, only
To test the generalizability of our method, we selected a group of terms from each of the testing sets in the 100 cross-validation experiments for automatic structuring. As mentioned, the most useful scenario occurs when the terms are closely related rather than semantically distant. Thus, we only selected terms that belong to the same tree for the experiments.
The process was as follows: Firstly, we collected all the term roots in the testing set and collected all the
For the selected 100 cases, we followed the steps described above to predict the whole semantic map among the concept names in the groups. Our experiments separately leveraged the 2 previously trained models for direct relation prediction. By comparing the predicted results with the real cases in the FMA, we evaluated the performance of our methodology by calculating the average precision, recall, and F1 values for all the cases.
For a more specific analysis of the results, we selected the largest case with root “First Rib” among the 100 cases. The set contained 57 concepts with 89 relations among them in the FMA, including 34 is-a relations and 55 part-of relations.
To demonstrate the advantage of our PCS-based method, we performed another group of experiments for the case study on “First Rib” based on primitive pairwise comparisons over the whole set of concept names. We fed 3192 (from 57×56) term pairs to the models for direct relation prediction. Then, we compared the PCS thread-based method with the primitive pairwise-based method for this case.
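The precision, recall, and F1 values used throughout the evaluation can be computed directly from relation counts, as this sketch shows. The example numbers are those reported for the “First Rib” case in this paper: 69 correctly predicted relations out of 83 predicted, against 89 relations in the FMA.

```python
# Precision/recall/F1 from relation counts: correctly predicted relations,
# total predicted relations, and gold-standard relations in the ontology.
def prf1(n_correct, n_predicted, n_gold):
    p = n_correct / n_predicted   # precision
    r = n_correct / n_gold        # recall
    f1 = 2 * p * r / (p + r)      # harmonic mean of precision and recall
    return p, r, f1

p, r, f1 = prf1(69, 83, 89)
```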
Using the remaining 10% of the data as the testing set in each of the cross-validation experiments, we evaluated the performances of the 2 models on direct relation prediction between 2 given concept names. The average results are shown in
Average performances of the 2 models on direct relation prediction (100 rounds).
Model        is-a           part-of        others         Overall
             P^{a}   R^{b}  P       R      P       R      P      R      F1
BiLSTM^{c}   0.93    0.91   0.90    0.91   0.97    0.93   0.95   0.92   0.93
CNN^{d}      0.91    0.90   0.89    0.90   0.94    0.92   0.92   0.91   0.91
^{a}P: precision.
^{b}R: recall.
^{c}BiLSTM: Bidirectional Long Short-term Memory Networks.
^{d}CNN: Convolutional Neural Networks.
In the 100 testing sets, we found that the sizes of all trees were less than 60, and the 100 term groups we selected had an average size of 25. The smallest group contained 20 terms and the largest group contained 57 terms.
We applied the PCS-based algorithm to the 100 cases and calculated the average precision, recall, and F1 values for the is-a and part-of relations, as well as overall.
Average performances of the parallel concept set (PCS) thread-based algorithm on the 100 term groups.
Model        is-a           part-of        Overall
             P^{a}   R^{b}  P       R      P      R      F1
BiLSTM^{c}   0.84    0.79   0.82    0.68   0.83   0.76   0.79
CNN^{d}      0.72    0.79   0.72    0.69   0.72   0.76   0.74
^{a}P: precision.
^{b}R: recall.
^{c}BiLSTM: Bidirectional Long Short-term Memory Networks.
^{d}CNN: Convolutional Neural Networks.
To analyze the influence of PCS nodes that contain at least 2 symmetric terms (ie, big PCS nodes) on the performances of the above algorithm, we calculated the proportion of big PCSs among all the PCSs for each study case and demonstrated the relation between the proportion and the F1 value (
Further, to demonstrate the usefulness of ML models in our approach, we collected all the
As the above results show, our proposed algorithm works well on both term pairs, with or without obvious lexical patterns.
The relation between the proportion of big parallel concept set (PCS) nodes and the F1 value for 100 cases.
For the specific case on “First Rib,” in the 57 concept names to be structured, we detected 1 symmetric modifier pair, (
The results of the automatic structuring of term groups based on PCS threads and pairwise comparisons are shown in
We analyzed the results of the “First Rib” case for our PCS thread-based algorithm with the BiLSTM model to demonstrate why some relations were wrongly predicted or missed.
The parallel concept set (PCS) thread-based algorithm versus the primitive pairwise-based algorithm on the “First Rib” case, using different models.
Model and algorithm     is-a                part-of             Overall
                        Precision  Recall   Precision  Recall   Precision  Recall   F1
BiLSTM^{a}
    Alg_{1}^{b}         1.0        1.0      0.71       0.63     0.83       0.78     0.80
    Alg_{2}^{c}         0.94       1.0      0.55       0.90     0.66       0.94     0.78
CNN^{d}
    Alg_{1}             0.97       1.0      0.73       0.64     0.83       0.78     0.80
    Alg_{2}             0.58       1.0      0.42       0.98     0.47       0.98     0.64
^{a}BiLSTM: Bidirectional Long Short-term Memory Networks.
^{b}Alg_{1}: PCS thread-based algorithm.
^{c}Alg_{2}: pairwise-based algorithm.
^{d}CNN: Convolutional Neural Networks.
Using the BiLSTM model, our approach predicted 83 relations among the group of concept names, including 34 is-a relations and 49 part-of relations.
Compared to the 55 real part-of relations in the FMA, 35 were correctly predicted, 14 unexpected relations were introduced, and 20 were missed.
The 14 unexpected part-of relations fell into 3 types.
The first type was detected
Types of unexpected part-of relations.
The second type was detected
The third type was detected
Although the above instances do not exist in the FMA, they are not all semantically wrong. For example, instances of the first type can be inferred from relation transitivity. Moreover, compared to the real cases in the FMA, the 6 instances of the second type were more reasonable because they show a finer granularity than their counterparts in the FMA. Also, the 2 instances of the third type were semantically correct.
On the other hand, the 20 missed
If those missed parentchild term pairs were fed into the BiLSTM model, could they be correctly detected?
If the group of concept names to be structured does not show much relevance in its linguistic features, the number of PCS thread roots will increase. Under that circumstance, as our algorithm pairs each root with every term in the other threads (
This study proposes an innovative approach to automatically structuring a given set of concept names with regard to is-a and part-of relations.
Some concepts may have multiple
It is not a simple transition from step 1 to step 2. As shown by our results, even though the performances of ML models on relation prediction for randomly selected pairs may be quite promising (
To improve the performance of our PCS-based method, we need to include more candidate pairs as inputs to the ML models for relation prediction. This requires a mechanism that can identify hierarchical relations between terms that are not lexically related while avoiding the introduction of false-positive results. The difficulty lies in the ability to distinguish the
Also, to make the methodology provided in this study scalable to more cases in diverse ontologies such as SNOMED CT [
The 100 cases we experimented with in the FMA are not large because the closely related term trees in the testing sets are relatively small. If the target group is much larger, the performance of the proposed algorithm may not be as strong since more terms will increase the number of
This study is an initial step toward automated ontology construction. As the training dataset is collected from the same ontology, the methodology we proposed in this study is applicable, provided that a part of the ontology is already known. To structure an ontology from scratch, the relations between entities will have to be learned from other knowledge sources such as the UMLS [
In this study, given a set of closely related concept names in the FMA, we investigated an automatic way to generate the taxonomy and partonomy structures for them. We trained machine learning models to predict if there exists a direct hierarchical relation between 2 given terms; based on this, we further proposed an innovative granularitybased method to automatically organize a given set of terms. The 100 cases that we studied in the FMA demonstrated that our method is effective for structuring ontology concepts automatically, provided that their names are given. We believe this pioneering study will shed light on future studies on automatic ontology creation and ontology maintenance.
The original hierarchy map for the concept group “First Rib” in the Foundational Model of Anatomy. Red arrows represent the 34 is-a relations.
Result for the automatic structuring of 57 concept names. The nodes represent the 37 parallel concept sets (PCSs); dashed rectangles represent PCSs with more than 1 term. The nodes in green are the 2 thread roots; arrows connect terms instead of PCSs. Gray arrows represent the 69 correctly predicted relations.
Bidirectional Long Short-term Memory Network
Convolutional Neural Networks
the Foundational Model of Anatomy
machine learning
ontology learning
Web Ontology Language
parallel concept set
Resource Description Framework
Systematized Nomenclature of Medicine-Clinical Terms
support vector machine
Unified Medical Language System
This work was supported in part by the Hunan Provincial Natural Science Foundation of China (No. 2019JJ50520), the National Natural Science Foundation of China (No. 61502221), and the Double First-Class construction program of the University of South China (No. 2017SYL16).
LL conceived the idea and designed the algorithm. JF performed the experiments. LL and JF drafted the manuscript. HY and JW contributed to and revised later versions of the manuscript. All authors read and approved the final manuscript.
None declared.