%0 Journal Article
%@ 2291-9694
%I JMIR Publications
%V 8
%N 8
%P e16948
%T Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases
%A Arguello Casteleiro,Mercedes
%A Des Diz,Julio
%A Maroto,Nava
%A Fernandez Prieto,Maria Jesus
%A Peters,Simon
%A Wroe,Chris
%A Sevillano Torrado,Carlos
%A Maseda Fernandez,Diego
%A Stevens,Robert
%+ Department of Computer Science, University of Manchester, Kilburn Building, Oxford Road, M13 9PL, Manchester, , United Kingdom, 44 161 275 6251, robert.stevens@manchester.ac.uk
%K evidence-based practice
%K artificial intelligence
%K deep learning
%K semantic deep learning
%K analogical reasoning
%K embedding analogies
%K PubMed
%D 2020
%7 6.8.2020
%9 Original Paper
%J JMIR Med Inform
%G English
%X Background: How to treat a disease remains to be the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state-of-the-art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen (“queen = −man +king +woman”). Objective: This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise). Methods: As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = −Romeo +die +died” (search query: −Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query −x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs from the cosine and validated with evidence (prior knowledge) are the input for the 3cosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to unified medical language system Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians. Results: We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments. Conclusions: Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited. 
%M 32759099
%R 10.2196/16948
%U https://medinform.jmir.org/2020/8/e16948
%U https://doi.org/10.2196/16948
%U http://www.ncbi.nlm.nih.gov/pubmed/32759099