This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Precision medicine (PM) is playing an increasingly important role in clinical practice. In recent years, the scale of PM research has grown rapidly. Many reviews have been published to facilitate a better understanding of the status of PM research. However, research on its intellectual structure in terms of topics is still lacking.
This study aimed to identify the intellectual structure and evolutionary trends of PM research through the application of various social network analysis and visualization methods.
The bibliographies of papers published between 2009 and 2018 were extracted from the Web of Science database. Based on keyword statistics for these papers, a coword network was generated and used to calculate network indicators for both the entire network and local networks. Communities were then detected to identify subdirections of PM research. Topological maps of the networks, including networks between communities and within each community, were drawn to reveal the correlation structure. Finally, an evolutionary graph and a strategic graph were produced to reveal research venations and trends in the discipline communities.
The results showed that PM research involves extensive themes and, overall, is not balanced. A minority of themes with high frequencies and network indicators, such as Biomarkers, Genomics, Cancer, Therapy, Genetics, Drug, Target Therapy, Pharmacogenomics, Pharmacogenetics, and Molecular, can be considered the core areas of PM research. However, there were five balanced theme directions with distinct status and tendencies: Cancer, Biomarkers, Genomics, Drug, and Therapy. These were shown to be the main branches, which were both focused and well developed. Therapy, though, was shown to be isolated and undeveloped.
The hotspots, structures, evolutions, and development trends of PM research in the past ten years were revealed using social network analysis and visualization. In general, PM research is unbalanced, but its subdirections are balanced. The clear evolutionary and developmental trend indicates that PM research has matured in recent years. The findings of this study can provide reasonable and effective support for researchers, funders, policymakers, and clinicians working on PM.
Precision medicine (PM), also called personalized medicine, is a new medical model aimed at providing precise diagnosis, therapy, prognosis prediction, and prevention strategies based on information in a patient’s genes, proteins, and their environment [
Great progress has been made in personalized treatment in the field of oncology. According to a meta-analysis of phase II clinical trials, a personalized treatment strategy across malignancies yields a better outcome and lower likelihood of death than nonpersonalized targeted therapies [
Owing to the potential importance of PM, several leading experts have reviewed this new medical approach with regard to its history, clinical applications, and associated interdisciplinary research, such as bioinformatics, artificial intelligence, and big data [
Coword analysis is a bibliometric method used to identify relationships between subfields within research areas and to measure the strength of the relationships [
Coword analysis has been widely used to illustrate the intellectual structure and developmental status of research areas [
PM is a new medical approach that classifies patients into different groups for diagnosis, treatment, and prevention based on individual gene, protein, or environmental information. Notably, the terms “precision medicine,” “personalized medicine,” “stratified medicine,” and “P4 [predictive, preventative, personalized, and participatory] medicine” are still used interchangeably by some organizations and scientists [
Every person has polymorphisms in their DNA, RNA, and proteins, as well as individual methylation patterns. Recent scientific methods have enabled the analysis of biomarkers using omics techniques, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, microbiome analysis, and immunomics. Beyond these techniques, pathologists, epidemiologists, and clinicians have also made many contributions to the discovery of links between biomarkers and clinical features [
PM also plays an important role in disease treatment and prevention. DNA information obtained from individual genotyping can lead to more effective and accurate treatment and prevention. For example, a woman’s high risk of developing breast cancer is strongly correlated with mutations in
With the pace of PM research rapidly increasing, a large number of studies have been performed from different perspectives. Many reviews have been published to facilitate a better understanding of the status of PM research and to clarify its concept, history, clinical applications, ethical concerns, and technological challenges. These efforts have helped raise awareness of the new clinical model among patients, clinicians, and even health policy makers, and they have played an important role in developing intelligent support for decision-making, clinical practice, and public health policies.
According to recent reviews, the features of PM research are as follows. First, some reviews by top experts in the field of PM research discuss the foundation, techniques, applications, and perspectives of this new discipline [
Research on PM is still increasing, and some important discoveries have already benefited patients. However, there is still a long way to go before PM is fully utilized. How can interdisciplinary researchers start studies? What types of public policy really make sense for the field? How can funders ensure that investments are effective? All these decisions should be based on knowledge of PM, which is why great efforts have been made to describe the nature of this new field. Our study aimed to address the following questions:
What is the distribution of topics in PM research?
What is the correlation structure of topics in PM research?
What are the evolutionary venations and development trends of PM research?
According to previous studies, papers in the Web of Science Core Collection (WOSCC) can represent the status of medical science, including PM; therefore, we chose WOSCC as our data source. Data processing is shown in
Papers covering the period from 1999 to 2018 were collected from the WOSCC. The initial retrieval used “precision medicine,” “personalized medicine,” “individualized medicine,” “P4 medicine,” and “stratified medicine” as terms in the Topic field to ensure a high recall ratio, and it was restricted to the document types Article, Review, and Proceedings Paper. The retrieval strategy was as follows: TOPIC: (“precision medicine”) or TOPIC: (“personalized medicine”) or TOPIC: (“individualized medicine”) or TOPIC: (“P4 medicine”) or TOPIC: (“stratified medicine”). Refined by: Document Types: (Article or Review or Proceedings Paper). Timespan: 1999 to 2018. Indices: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI, CCR-EXPANDED, and IC.
A total of 25,573 publications were retrieved, and their bibliographic records were downloaded through the Save to Other File Formats function provided by WOS, yielding a text file containing all records in Tab-delimited (Win) format. Records without keyword data were excluded. In addition, publications that did not contain any of the search terms above in the title (the TI field) or keywords (the DE field) were considered unrelated to PM research [
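The two exclusion rules above can be sketched as a small filter. This is a minimal illustration, not the authors' code: it assumes that the tab-delimited export has a header row with the standard WOS field tags TI (title) and DE (author keywords), that the text has already been decoded from the export's encoding, and that a case-insensitive substring match is an acceptable test for "contains the search term" (the actual matching rules, e.g., for plurals or hyphenated variants, are not stated).

```python
import csv
import io

# Search terms from the retrieval strategy described above, lowercased.
SEARCH_TERMS = [
    "precision medicine",
    "personalized medicine",
    "individualized medicine",
    "p4 medicine",
    "stratified medicine",
]


def filter_records(tsv_text):
    """Keep records that have keywords (DE) and mention a search term
    in the title (TI) or the keywords (DE)."""
    # WOS tab-delimited exports do not quote fields, so disable quoting.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    kept = []
    for row in reader:
        keywords = (row.get("DE") or "").strip()
        if not keywords:
            continue  # rule 1: records without keyword data are excluded
        text = ((row.get("TI") or "") + " " + keywords).lower()
        if any(term in text for term in SEARCH_TERMS):
            kept.append(row)  # rule 2: a search term appears in TI or DE
    return kept
```

Matching against TI and DE only, rather than the abstract, mirrors the stricter relevance check described in the text and trades some recall for precision.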
In this study, high-frequency mainstream keywords were selected for further analysis based on their coword network. The largest connected component extracted from the whole coword network represents the mainstream research directions of a field [
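Extracting the largest connected component is a standard graph traversal. The sketch below uses breadth-first search over an adjacency mapping (keyword to the set of keywords it co-occurs with); the data structure is an illustrative assumption, since the software used for this step is not named here.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the node set of the largest connected component.

    adjacency maps each keyword to the set of keywords it co-occurs with
    (an undirected graph, so the mapping is assumed to be symmetric).
    """
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best
```

Each node is visited once, so the cost is linear in the number of nodes plus edges, which is negligible for a network of a few hundred keywords.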
Search procedure for documents in precision medicine research. DE: descriptor; TI: title; WOSCC: Web of Science Core Collection.
Basic statistics of the sample papers from 2000 to 2018.
The keywords of a paper provide an adequate description of its contents, so studying the correlations between keywords can reveal the connotations of those contents [
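A coword network counts how often two keywords appear in the same paper. The sketch below is one plausible construction under stated assumptions: keywords are normalized by trimming and lowercasing (the authors' normalization, which merges variants such as "Biomarker" and "Biomarkers", is not described), each keyword is counted once per paper, and only keywords at or above a frequency cut-off are kept. The default of 21 mirrors the "frequency above 20" threshold reported later in the Results.

```python
from collections import Counter
from itertools import combinations


def build_coword_network(papers, min_frequency=21):
    """Build a weighted keyword co-occurrence (coword) network.

    papers: iterable of keyword lists, one list per paper.
    min_frequency: keep keywords occurring in at least this many papers.
    Returns (frequency Counter of kept keywords,
             edge-weight Counter keyed by alphabetically sorted pairs).
    """
    # Normalize and de-duplicate keywords within each paper.
    papers = [sorted({k.strip().lower() for k in kws if k.strip()})
              for kws in papers]
    frequency = Counter(k for kws in papers for k in kws)
    keep = {k for k, f in frequency.items() if f >= min_frequency}
    edges = Counter()
    for kws in papers:
        selected = [k for k in kws if k in keep]
        # Every pair of kept keywords in one paper co-occurs once.
        for a, b in combinations(selected, 2):
            edges[(a, b)] += 1
    return Counter({k: frequency[k] for k in keep}), edges
```

Because each paper's keywords are sorted before pairing, every edge key is stored in a single canonical order, so (a, b) and (b, a) are never counted separately.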
Many methods have been used to conduct coword analysis, including social network analysis. Derived from mathematical graph theory, social network analysis computes indicators of a coword network and identifies characteristics of both the whole network and individual nodes [
Because unconnected or uncorrelated keywords cannot reflect the main thematic subdomains, and because our focus was the largest component [
Centralization measures the overall characteristics of the global network. Degree centralization measures how strongly the network is concentrated around its most connected nodes, and closeness centralization measures the proximity between any 2 nodes in the network; a high value means that nodes are, on the whole, close to one another. Betweenness centralization indicates the degree to which any 2 nodes are correlated through a third node (a bridge); a high value means that correlations are likely to pass through such bridges. Similarly, centrality, the individual (node-level) indicator, measures the capacity of a single node in the network. A node with high degree centrality is central in the network and correlated with many other nodes, which also indicates its strong capacity for influence and control. A node with high closeness centrality correlates with other nodes directly or through paths that are as short as possible. A node with high betweenness centrality plays a powerful role as a bridge correlating 2 other nodes. Density measures the strength of correlation within the network [
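The node-level indicators and density described above can be sketched for an unweighted, undirected graph stored as an adjacency mapping. This is an illustrative implementation, not the authors' tool: edge weights (co-occurrence counts) are ignored, closeness assumes a connected graph such as the largest component, and betweenness uses Brandes' algorithm with the common normalization to [0, 1]. Packages such as UCINET or Pajek may normalize differently.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_centrality(adjacency):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adjacency)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) / (sum of distances to all other nodes); graph must be connected."""
    n = len(adjacency)
    return {v: (n - 1) / sum(bfs_distances(adjacency, v).values()) for v in adjacency}


def betweenness_centrality(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs, normalized to [0, 1]."""
    n = len(adjacency)
    raw = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order, preds = [], {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)  # dependency of source on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                raw[w] += delta[w]
    # Each unordered pair was counted from both endpoints, and a node can lie
    # between at most (n - 1)(n - 2) / 2 pairs, so divide by (n - 1)(n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in raw.items()}


def density(adjacency):
    """Observed edges divided by the n(n - 1) / 2 possible edges."""
    n = len(adjacency)
    edges = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return 2 * edges / (n * (n - 1))
```

On a star graph, the hub scores 1.0 on all three centralities, which matches the intuition in the text: it is correlated with every node, reaches every node in one step, and bridges every other pair.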
In addition, community detection is an effective method to discover research directions or subfields according to the correlation structure of the network [
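This section does not name the community detection algorithm. Many widely used methods (eg, the Louvain method) search for a partition that maximizes modularity, so a sketch of how modularity scores a given partition may help. It assumes an unweighted, undirected edge list; a weighted version would replace edge counts with summed co-occurrence weights.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs; communities: dict node -> community id.
    Q = sum over communities c of [L_c / m - (d_c / (2m)) ** 2], where
    m is the number of edges, L_c the number of edges inside c, and d_c
    the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        # Each edge adds 1 to the degree total of both endpoints' communities.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())
```

For example, two triangles joined by one bridge edge score Q = 5/14 (about 0.357) when each triangle is its own community, but 0 when all nodes are placed in a single community. A detection algorithm would prefer the 2-community split.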
Visualization is an important method to intuitively display the intellectual structure of coword correlation, the thematic evolution of a research field, and even the comparative development trends of subfields [
A strategic diagram indicates the comparative status and evolutionary trends of subfields of one research field. It is a two-dimensional (2D) map in which the x-axis represents centrality and the y-axis represents density [
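Reading a strategic diagram amounts to comparing each theme's centrality and density with a split point on each axis. The sketch below follows the common quadrant interpretation of strategic diagrams: upper right for motor themes, upper left for well-developed but isolated themes, lower left for emerging or declining themes, and lower right for basic, transversal themes. It splits the axes at the median, which is one convention (the mean is also used). The theme values in the test are hypothetical, not taken from this study.

```python
from statistics import median


def strategic_quadrants(themes):
    """Assign each theme to a strategic-diagram quadrant.

    themes maps a theme name to (centrality, density). Axes are split at
    the median of each indicator (an assumed convention). Points exactly
    on a split line count as "high".
    """
    c_split = median(c for c, _ in themes.values())
    d_split = median(d for _, d in themes.values())
    quadrants = {}
    for name, (centrality, density) in themes.items():
        high_c, high_d = centrality >= c_split, density >= d_split
        if high_c and high_d:
            quadrants[name] = "motor"  # central and well developed
        elif high_d:
            quadrants[name] = "isolated"  # well developed but peripheral
        elif high_c:
            quadrants[name] = "basic"  # central but undeveloped
        else:
            quadrants[name] = "emerging_or_declining"  # peripheral, undeveloped
    return quadrants
```

Because the split points are relative to the themes being compared, a quadrant label describes a theme's position among its peers rather than an absolute level of development.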
In this study, a total of 17,818 distinct keywords were extracted from the sample, with a total frequency of 47,883. The frequency distribution follows a power law with an exponent of –1.32 (
The distribution of the keyword frequency in PM research.
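The text does not say how the exponent of –1.32 was estimated or whether it refers to the rank-frequency curve. A minimal sketch, assuming a rank-frequency fit by ordinary least squares on log-log axes, is shown below. This is a simple method but a biased one; maximum-likelihood estimators are generally preferred for rigorous power-law fitting.

```python
import math


def fit_power_law_exponent(frequencies):
    """Estimate alpha in f(r) ~ C * r**alpha, where r is the frequency rank,
    by least squares on log(rank) versus log(frequency)."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope of the regression line = covariance / variance of log(rank).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Applied to synthetic frequencies generated as 1000 × r^(−1.32), the fit recovers an exponent of −1.32; real keyword counts will scatter around the line, especially in the long tail of rare keywords.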
Top 100 keywords in precision medicine research.
Number | Keywords | Frequency |
1 | Biomarkers | 1018 |
2 | Genomics | 970 |
3 | Cancer | 851 |
4 | Therapy | 731 |
5 | Genetics | 684 |
6 | Drug | 549 |
7 | Target Therapy | 510 |
8 | Pharmacogenomics | 508 |
9 | Pharmacogenetics | 475 |
10 | Molecular | 357 |
11 | Breast Cancer | 333 |
12 | NGSa | 314 |
13 | Tumor | 296 |
14 | Prediction | 287 |
15 | Mutation | 281 |
16 | Clinical Trials | 268 |
17 | Gene | 259 |
18 | Sequencing | 242 |
19 | Imaging | 239 |
20 | Diagnostics | 223 |
21 | Proteomics | 211 |
22 | Prognosis | 209 |
23 | DNA | 195 |
24 | Phenotype | 187 |
25 | Oncology | 185 |
26 | SNPb | 173 |
27 | Omics | 170 |
28 | Pharmacology | 165 |
29 | Metabolism | 160 |
30 | Lung Cancer | 160 |
31 | Bioinformatics | 151 |
32 | Asthma | 148 |
33 | Chemotherapy | 147 |
34 | Immunotherapy | 146 |
35 | Stem Cell | 143 |
36 | MicroRNA | 137 |
37 | Epigenetics | 137 |
38 | Prostate Cancer | 135 |
39 | Genetic Test | 132 |
40 | EGFRc | 129 |
41 | Risk | 129 |
42 | Inflammation | 126 |
43 | GWASd | 125 |
44 | Polymorphism | 124 |
45 | Colon Cancer | 123 |
46 | Immune | 123 |
47 | Nanotechnology | 122 |
48 | PETe | 122 |
49 | Translation Medicine | 120 |
50 | NSCLCf | 119 |
51 | Heterogeneity | 118 |
52 | Big Data | 118 |
53 | Systems Biology | 117 |
54 | Machine Learning | 117 |
55 | Protein | 115 |
56 | Pathology | 115 |
57 | Genotype | 114 |
58 | Ethics | 111 |
59 | Health Care | 110 |
60 | Drug Development | 110 |
61 | Pharmacokinetics | 109 |
62 | Drug Delivery | 108 |
63 | RNA | 105 |
64 | Diagnosis | 105 |
65 | Prevention | 103 |
66 | Biobank | 103 |
67 | Biology | 102 |
68 | Patients | 100 |
69 | Diabetes | 100 |
70 | Theranostics | 100 |
71 | Metabolomics | 97 |
72 | Liquid Biopsy | 97 |
73 | Screening | 97 |
74 | Depression | 96 |
75 | Classification | 95 |
76 | MRI | 93 |
77 | Molecular Imaging | 92 |
78 | Brain | 91 |
79 | Decision Support | 91 |
80 | Electronic Health Records | 89 |
81 | Systems Medicine | 89 |
82 | Resistance | 88 |
83 | Cardiology | 87 |
84 | Clinical Medicine | 87 |
85 | Circulating Tumor Cell | 86 |
86 | Pancreatic Cancer | 84 |
87 | Companion Diagnostics | 83 |
88 | Nanoparticle | 83 |
89 | Toxicity | 81 |
90 | Radiology | 81 |
91 | Mass Spectrometry | 79 |
92 | Drug Resistance | 77 |
93 | Clinical Practice | 75 |
94 | Microarray | 74 |
95 | Cell | 73 |
96 | Metastasis | 72 |
97 | Molecular Diagnostics | 70 |
98 | Education | 70 |
99 | Gastric Cancer | 70 |
100 | Gene Expression | 70 |
aNGS: next-generation sequencing.
bSNP: single nucleotide polymorphism.
cEGFR: epidermal growth factor receptor.
dGWAS: genome-wide association studies.
ePET: positron emission tomography.
fNSCLC: non–small cell lung cancer.
The 244 keywords with a frequency above 20 generate a total of 9178 edges, which constitute the keyword correlation network. This network is itself the largest connected component, that is, all 244 keywords are linked, indicating that a relatively consistent mainstream direction has formed in PM studies in recent years. As shown in
Similarly, the indices describing each keyword (degree centrality, closeness centrality, and betweenness centrality) represent its position and role in the network. As shown in
The statistics of the correlation network in precision medicine research.
Indicators | Value |
Number of nodes | 244 |
Number of edges | 9178 |
Average degree | 75.2295 |
Network all degree centralization | 0.6214 |
Network all closeness centralization | 0.6685 |
Network betweenness centralization | 0.0277 |
Network clustering coefficient | 0.4843 |
Density | 0.3096 |
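Several of the network-level values in the table above follow directly from the node and edge counts. As a check, assume an undirected simple graph, Freeman's degree centralization, and a maximum degree of 225 (Biomarkers, from the degree centrality table that follows). Under these assumptions, the formulas below reproduce the reported average degree, density, and all-degree centralization. The authors' software is not named, so this is a consistency check rather than their procedure.

```python
def network_summary(n_nodes, n_edges, max_degree):
    """Average degree, density, and Freeman degree centralization of an
    undirected simple graph, computed from counts alone."""
    degree_total = 2 * n_edges  # each edge adds 1 to two node degrees
    average_degree = degree_total / n_nodes
    density = degree_total / (n_nodes * (n_nodes - 1))
    # Freeman: sum(max_degree - d_i) / ((n - 1)(n - 2)), and
    # sum(max_degree - d_i) = n * max_degree - sum(d_i).
    centralization = (n_nodes * max_degree - degree_total) / (
        (n_nodes - 1) * (n_nodes - 2))
    return average_degree, density, centralization


avg, dens, cent = network_summary(244, 9178, 225)
print(round(avg, 4), round(dens, 4), round(cent, 4))  # 75.2295 0.3096 0.6214
```

The same density formula also matches the within-community densities reported later, for example, 2 × 1535 / (76 × 75) ≈ 0.5386 for C1-Cancer.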
Top 10 keywords in terms of degree centrality.
Ranking | Keywords | Degree |
1 | Biomarkers | 225 |
2 | Genomics | 222 |
3 | Therapy | 220 |
4 | Cancer | 215 |
5 | Genetics | 213 |
6 | Drug | 208 |
7 | Prediction | 184 |
8 | Pharmacogenomics | 183 |
9 | Target therapy | 177 |
10 | Molecular | 172 |
Top 10 keywords in terms of closeness centrality.
Ranking | Keywords | Closeness |
1 | Biomarkers | 0.9310 |
2 | Genomics | 0.9205 |
3 | Therapy | 0.9135 |
4 | Cancer | 0.8967 |
5 | Genetics | 0.8901 |
6 | Drug | 0.8741 |
7 | Prediction | 0.8046 |
8 | Pharmacogenomics | 0.8020 |
9 | Target therapy | 0.7864 |
10 | Molecular | 0.7739 |
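The closeness values above are consistent with a simple structural reading. Suppose every keyword not directly linked to a given keyword can be reached through exactly one intermediary (a diameter-2 assumption that the source does not state). Then, in a network of n = 244 nodes, a keyword with degree k has a distance sum of k + 2(n − 1 − k), so its closeness is (n − 1)/(2(n − 1) − k). For Biomarkers, this gives 243/261 ≈ 0.9310. A minimal sketch using the degrees from the preceding degree centrality table:

```python
def closeness_diameter_two(degree, n_nodes=244):
    """Closeness (n - 1) / sum of distances, assuming every node not
    adjacent to the focal node lies exactly 2 steps away."""
    others = n_nodes - 1
    total_distance = degree * 1 + (others - degree) * 2
    return others / total_distance


for keyword, degree in [("Biomarkers", 225), ("Cancer", 215), ("Molecular", 172)]:
    print(keyword, round(closeness_diameter_two(degree), 4))
```

Under this assumption, all ten closeness values in the table follow from the corresponding degrees. This suggests that, in this dense network, closeness adds little information beyond degree for the top keywords.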
Top 10 keywords in terms of betweenness centrality.
Ranking | Keywords | Betweenness |
1 | Therapy | 0.0305 |
2 | Biomarkers | 0.0304 |
3 | Genomics | 0.0304 |
4 | Drug | 0.0289 |
5 | Genetics | 0.0283 |
6 | Cancer | 0.0260 |
7 | Pharmacogenomics | 0.0182 |
8 | Prediction | 0.0166 |
9 | Target therapy | 0.0148 |
10 | Gene | 0.0135 |
Community detection in the coword network showed that PM research has focused on 5 theme communities, or research subdirections, in the last decade. These communities are visualized as
The structural characteristics of PM research were further assessed by visualizing its coword networks. As shown in
Correlation structure of theme communities in PM research.
Furthermore, in terms of the internal correlation of each research theme community (
Indicators of 5 theme communities in precision medicine research.
Community | Number of nodes | Number of edges | Total frequency | Average degree | Density |
C1-Cancer | 76 | 1535 | 8221 | 82.8026 | 0.5386 |
C2-Biomarkers | 53 | 652 | 5261 | 78.6792 | 0.4731 |
C3-Genomics | 45 | 469 | 4473 | 70.7778 | 0.4737 |
C4-Drug | 40 | 385 | 3741 | 68.375 | 0.4936 |
C5-Therapy | 30 | 211 | 2743 | 65.7667 | 0.4851 |
A total of 5 theme communities in precision medicine research. EGFR: epidermal growth factor receptor; NGS: next-generation sequencing; NSCLC: non–small cell lung cancer.
The bibliographic data were divided by publication year, and an evolution graph was generated to reveal the evolutionary patterns of PM research. In addition, the theme communities were plotted in a strategic graph (a 2D map) based on centrality and density, revealing the relative status and development trend of each theme community in PM research.
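The text does not describe how communities in consecutive years are linked in the evolution graph. A common approach in science-mapping tools is to connect themes whose keyword sets overlap strongly. The sketch below assumes a Jaccard index with an arbitrary threshold of 0.2; both choices, and the theme keyword sets in the test, are hypothetical.

```python
def link_themes(earlier, later, threshold=0.2):
    """Link theme communities in consecutive periods by keyword overlap.

    earlier, later: dict theme name -> set of keywords.
    Returns (earlier_theme, later_theme, jaccard) tuples, strongest first,
    for pairs whose Jaccard index |A & B| / |A | B| meets the threshold.
    """
    links = []
    for name_a, kw_a in earlier.items():
        for name_b, kw_b in later.items():
            union = kw_a | kw_b
            score = len(kw_a & kw_b) / len(union) if union else 0.0
            if score >= threshold:
                links.append((name_a, name_b, score))
    return sorted(links, key=lambda link: -link[2])
```

A theme with no outgoing link corresponds to an interrupted evolution, such as the Schizophrenia theme discussed below. A theme with no incoming link either starts a new venation or is isolated.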
To clearly show the development, the evolution of PM research was divided into 2 stages, namely, Stage 1 (2009-2013) and Stage 2 (2014-2018), as shown in
The evolution of theme communities of precision medicine research over time (2009-2013). ALK: ALK Receptor Tyrosine Kinase; BRAF: v-raf murine sarcoma viral oncogene homolog B1; BRCA: BReast CAncer gene; EGFR: Epidermal growth factor receptor; HER2: Receptor tyrosine-protein kinase erbB-2; NGS: Next-generation sequencing; KRAS: Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; mTOR: The mammalian target of rapamycin; NSCLC: Non–small cell lung cancer.
The evolution of theme communities of precision medicine research over time (2013-2018). ALK: ALK Receptor Tyrosine Kinase; BRAF: v-raf murine sarcoma viral oncogene homolog B1; CYP2C9: Cytochrome P450 2C9; EGFR: Epidermal growth factor receptor; GWAS: genome-wide association study; KRAS: Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; NGS: Next-generation sequencing; NSCLC: Non–small cell lung cancer; PIK3CA: phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha.
In Stage 1, there are 4 clear thematic venations: the Pharmacogenomics and Pharmacogenetics venation (including Pharmacogenomics, Genetics, Polymorphism, Adverse Drug Reactions, and CYP2C9), the epidermal growth factor receptor (EGFR) and v-raf murine sarcoma viral oncogene homolog B1 (BRAF) venation (including Molecular Imaging, Drug Delivery, non–small cell lung cancer [NSCLC], and Ki-ras2 Kirsten rat sarcoma viral oncogene homolog [KRAS]), the Proteomics and Metabolomics venation (including Sequencing, Bioinformatics, and Translation Medicine), and the Ethics and Cost-Effectiveness venation (including Health Care, Genetic Test, Health Policy, and Breast Cancer).
Each venation is independent and less differentiated, and the internal system of the theme communities is relatively mature. The Pharmacogenomics and Pharmacogenetics venation and the EGFR and BRAF venation are larger in scale and can thus be considered the 2 important research directions in this period. The evolution of some themes, such as Schizophrenia and Oncogenes, has been interrupted, possibly because of a lack of sustained attention to these subjects or their integration into other subjects. We also found a few isolated themes during different periods, such as Policy, Clinical Practice, Tumor, Chemotherapy, Organ, and NGS. Owing to strong internal correlation, these themes have been clustered as a research direction; however, such studies have not yet formed a systematic and continuous direction.
We performed an independent analysis for the years 2013 to 2018 to examine the continuity between 2013 and 2014. These 2 years share many thematic communities and research themes, such as EGFR and BRAF, Molecular Imaging and Drugs, and Pharmacogenomics and Pharmacogenetics, which exhibit good continuity. Overall, the sustainability and stability of PM research in this stage are better than those in Stage 1. PM research is also more thematically concentrated, indicating a more consistent and mature direction of progression.
According to the evolutionary graph, there are 3 major research themes at this stage: Molecular Imaging and Drug Delivery, EGFR and Mutation, and Pharmacogenomics and Pharmacogenetics. The Molecular Imaging and Drug Delivery venation includes Theranostics, Diagnostics, Immunotherapy, and Machine Learning. The EGFR and Mutation venation includes NSCLC, KRAS, Tumor, Target Therapy, NGS, DNA, and MicroRNA. The Pharmacogenomics and Pharmacogenetics venation includes Cytochrome P450, Epigenetics, Cardiovascular Disease, Omics, and Bioinformatics. In addition, Stratification and Prediction and related topics have formed an independent evolutionary venation that, although small in scale, has become a self-contained system. However, there are also discontinuous evolutions and isolated topics at this stage; for example, the evolution of Bipolar Disorder was interrupted in 2015. In this period, Parkinson Disease, Stem Cell, and Big Data ultimately became isolated research themes rather than evolutionary venations.
The theme communities of PM research are distributed in the strategic map according to centrality and density (
The relative development status and trends of 5 theme communities in the strategic diagram.
Based on the results, we can better understand the main research directions of PM research and more accurately evaluate their importance, maturity, and interactions. First, we determined that PM research is unbalanced overall but that its theme communities are balanced. Because PM has only recently emerged as an independent academic subject, researchers have paid most of their attention to a few popular keywords, such as Biomarkers, Genomics, Cancer, Therapy, Genetics, Drug, Target Therapy, Pharmacogenomics, Pharmacogenetics, and Molecular. These keywords can be classified into the following categories: the applied subject (Cancer), the associated technology and research (Biomarkers, Genomics, and Genetics), pharmacology (Pharmacogenomics), and clinical practice (Treatment, Risk Prediction, Molecular Target Treatment, and Diagnosis). They not only reflect areas of scientific concern but, more importantly, indicate the major research directions of PM. However, we also found that attention to most other research themes is relatively dispersed. We speculate that the current status of PM research may be as follows: (1) the most mature application of PM is in Oncology; (2) scientists are interested in discovering Biomarkers, mainly using genomic and genetic methods; (3) pharmacology is an important interdisciplinary field involved with PM, with the aim of making drug use safer and more efficient; and (4) PM is widely used in Clinical Medicine, including for consultation, diagnosis, and treatment (especially molecular targeted treatment).
With the visualization of the coword network, we found that the themes tended to cluster around a small number of other popular keywords. As a result, theme communities that are both well layered and balanced in scale were formed: C1-Cancer, C2-Biomarkers, C3-Genomics, C4-Drug, and C5-Therapy. From the analysis of correlations between the theme communities, we can draw the following inferences. C1-Cancer, the largest community, indicates that the application of PM in Clinical Oncology might already be mature; the other directions, such as technical studies and Clinical Medicine, are widely associated with Cancer. C2-Biomarkers is the second largest community and plays a key role as the basis of PM research. Scientists still strive to discover biomarkers with various techniques and to translate these discoveries into clinical therapeutics and the prediction of clinical outcomes [
Through the analysis of the evolution of theme communities over time, we found that PM research has a clear evolutionary and developmental trend. Across the 2 stages of evolution, we discovered a large number of well-concentrated evolutionary pathways, which indicates the maturity of PM. The theme communities in PM research are well structured and contain core and promising directions, such as Biomarkers, Pharmacogenomics, MicroRNA, Imaging, and even Machine Learning. We also identified dramatic development in the technique and pharmacology directions. Notably, PM in nononcology diseases has the potential to mature, and NSCLC could become an independent and mature venation, which indicates that the application of PM in NSCLC is relatively mature. Clinicians have applied PM strategies and technologies, such as Biomarkers, Molecular Imaging, and Pharmacogenomics, to achieve precise treatment [
Our study reveals the structure and developmental trends of PM research from the perspective of keywords and their relationships. To some extent, this study provides insight into PM research; however, there are still limitations to this work. Regarding the research sample, this study used the literature to reveal the development status of PM. This research method could be regarded as a reasonable and cost-effective strategy rather than a comprehensive and accurate way to evaluate the true status of PM research.
Our study reveals the hotspots, structures, evolutions, and developmental trends of PM research over the past 10 years by means of social network analysis and visualization. We also made the following valuable discoveries: (1) as a graph, the network can describe the development of PM research in detail; and (2) the network uncovers the relationships between themes and the intrinsic mechanisms of their interaction, which could provide insights into future research directions.
In the future, we will perform data mining on the content of PM-related literature (eg, reports and illness records) to better reveal the condition of the entire network from various perspectives. In terms of research methods, previous work has established the efficacy of coword analysis; our study also validates this method, which enabled us to obtain some valuable findings. In future studies, we aim to perform a further comprehensive assessment of PM research from various perspectives, such as interdisciplinary research and institutions.
2D: two-dimensional
BRAF: v-raf murine sarcoma viral oncogene homolog B1
EGFR: epidermal growth factor receptor
HER2: human epidermal growth factor receptor 2
KRAS: Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
NGS: next-generation sequencing
NSCLC: non–small cell lung cancer
P4: predictive, preventative, personalized, and participatory medicine
PET: positron emission tomography
PM: precision medicine
WOSCC: Web of Science Core Collection
This study was supported by the National Natural Science Foundation of China Funded Project (71874125), the Ministry of Education in China’s Project of Humanities and Social Sciences (18YJA870004), and the Wuhan University Scientific Research Project (2042014KF0164).
JH and XL conceptualized the study and collected and analyzed the data. They participated in all phases of the review. WD and XX assisted with study conception and design, as well as interpretation of data, drafting of the manuscript, and critical revision. All authors contributed to the writing of the manuscript and approved the final version.
None declared.