Characterizing Artificial Intelligence Applications in Cancer Research: A Latent Dirichlet Allocation Analysis

Background: Artificial intelligence (AI)–based therapeutics, devices, and systems are vital innovations in cancer control; particularly,


Background
Every year, over 200 million healthy life years are lost because of cancer, making it one of the largest health care burdens and a major cause of disability and mortality among men and women [1]. Fortunately, many types of cancer can be prevented or effectively treated if patients are diagnosed in a timely manner and offered optimal therapies. In many parts of the world, however, cancer control and prevention programs face multiple barriers, including limited health service infrastructure, limited availability of treatment options, and insufficient health worker capacity.
Artificial intelligence (AI) is considered a disruptive innovation in health and medicine. Over the past six decades, AI has been widely applied to many areas of medical research and clinical practice, and the number of published papers on AI and its impacts has grown rapidly over the past decade. A bibliometric study has shown that the number of studies on AI applications in medicine tripled in the past 3 years, with the highest interest in cancer research [2]. Various techniques, such as robotics, machine learning, and artificial neural networks, have been applied to the study of cancer and have shown promising improvements in clinical prediction, diagnosis, and treatment. For instance, applying machine learning techniques to proteomics and genomics could increase precision in estimating survival and inform the selection of therapies [3]. In large populations, AI also holds potential for cancer screening and for scaling up treatment services in a timely manner.

Literature Review
Many approaches and products have been developed to support cancer treatment and prevention at health facilities and within communities. However, the resulting evidence must be synthesized to inform decision making. Some authors have conducted systematic reviews of the performance and effectiveness of AI techniques and products in specific cancers [3][4][5][6][7][8][9][10]. Overall, these reviews found that almost all AI-assisted interventions were more effective than conventional approaches. However, these reviews also raised some important points for further exploration. Lisboa et al reviewed predictive models using artificial neural networks and suggested the need for rigorous evaluation of results [4]. In addition, Spelt et al emphasized the importance of justifying the complex structure of datasets and individual factors in these models [5]. Ray et al reviewed wearable systems for cancer detection and found that cloud computing and long-range communication paradigms are still lacking and that AI and machine learning should be applied to current products [8].
Other authors affirmed the superior performance of image-based AI applications in breast cancer diagnosis, but few studies have been supported by a high level of evidence. Further clinical research and health technology assessment are recommended.

Objectives
With the rapid development of technologies, AI-based therapeutics, devices, and systems will be vital innovations in cancer control. To accelerate research and development, it is critical to understand the current approaches to applying AI in cancer care, the multiple disciplines involved, and the trends and establishment of the research landscapes. To our knowledge, no previous study has systematically quantified the development of AI in the bibliographic literature of cancer studies. This study analyzes the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer studies.

Search Strategy
We searched for and retrieved all papers related to AI in cancer care from the Web of Science (WOS), a Web-based database that covers the largest proportion of peer-reviewed literature in this field. The full search strategy has been presented elsewhere [2]. In short, we used a set of predefined search terms related to artificial intelligence and to health and medicine to search the WOS for publications (inclusion step) and then excluded publications that did not meet our eligibility criteria, namely those published outside 1991 to 2018 and document types other than articles and reviews (exclusion step). For this analysis, we selected all retrieved documents on AI applications related to cancer care.
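The exclusion step amounts to a filter on publication year and document type. The sketch below illustrates only that filter; the `Record` class and the sample titles are hypothetical, and the inclusion step (the predefined search terms described in [2]) is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical minimal bibliographic record (real WOS exports carry many more fields)."""
    title: str
    year: int
    doc_type: str  # e.g. "Article", "Review", "Proceedings Paper"


ELIGIBLE_TYPES = {"Article", "Review"}
FIRST_YEAR, LAST_YEAR = 1991, 2018


def is_eligible(rec: Record) -> bool:
    """Keep articles and reviews published from 1991 to 2018 inclusive."""
    return FIRST_YEAR <= rec.year <= LAST_YEAR and rec.doc_type in ELIGIBLE_TYPES


def exclusion_step(records):
    """Drop every record that fails the eligibility criteria."""
    return [r for r in records if is_eligible(r)]


if __name__ == "__main__":
    sample = [
        Record("Neural network for breast cancer screening", 2017, "Article"),
        Record("Robotic prostatectomy outcomes", 1989, "Article"),
        Record("Deep learning in oncology", 2018, "Review"),
        Record("AI in radiotherapy planning", 2016, "Proceedings Paper"),
    ]
    print([r.title for r in exclusion_step(sample)])
```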

Data Extraction
We downloaded all data from the WOS database in .txt format, including author names, paper titles, journals, keywords, institutional affiliations, citation counts, categories, and abstracts. All data were converted to an Excel file (Microsoft Excel, Microsoft Corporation) to check for data errors. Two researchers standardized author names by merging the different variants of each author's name. We then filtered out records that were (1) not original articles or reviews, (2) not about cancer and AI, or (3) not in English. Any disagreements were resolved by discussion (Figure 1). The combined dataset was transferred into Stata (version 14.0, StataCorp) for further analysis.
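WOS plain-text exports consist of tagged records: each field starts with a two-letter code (for example, `AU` authors, `TI` title, `LA` language, `DT` document type), continuation lines are indented, and `ER` closes a record. The following is a minimal sketch of parsing such an export and applying automated versions of filters (1) to (3), assuming that layout. The `CANCER_TERMS` and `AI_TERMS` lists are hypothetical stand-ins for the manual screening described above, not the authors' criteria.

```python
def parse_wos_plaintext(text):
    """Parse a WOS tagged plain-text export into a list of {tag: [values]} dicts."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("   "):  # continuation of the previous field
            if tag is not None:
                current[tag].append(line.strip())
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":  # end of record
            records.append(current)
            current, tag = {}, None
        elif tag in {"FN", "VR", "EF"}:  # file-level header/footer lines
            tag = None
        else:
            current.setdefault(tag, []).append(value)
    return records


# Hypothetical keyword lists used only to illustrate filter (2).
CANCER_TERMS = ("cancer", "tumor", "tumour", "oncolog", "carcinoma")
AI_TERMS = ("artificial intelligence", "machine learning", "neural network", "robot")


def passes_filters(rec):
    """Article/review, in English, and mentioning both cancer and AI in title or abstract."""
    doc_types = {t.strip() for t in " ".join(rec.get("DT", [])).split(";")}
    language = " ".join(rec.get("LA", []))
    text = " ".join(rec.get("TI", []) + rec.get("AB", [])).lower()
    return (
        bool(doc_types & {"Article", "Review"})
        and language == "English"
        and any(t in text for t in CANCER_TERMS)
        and any(t in text for t in AI_TERMS)
    )


SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Smith, J
   Doe, A
TI Machine learning for lung cancer survival prediction
LA English
DT Article
PY 2017
ER

PT J
AU Roe, B
TI Robotic surgery costs in hospitals
LA German
DT Article
PY 2015
ER

EF"""

if __name__ == "__main__":
    for rec in parse_wos_plaintext(SAMPLE):
        print(rec["TI"][0], passes_filters(rec))
```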

Data Analysis
Data were analyzed based on basic publication indicators (number of authors, publication years, and main categories), keywords (most common keywords and keyword co-occurrence), citations, usage, and abstracts. After downloading and extracting the data, we applied descriptive statistical analysis in Stata to calculate citations by country and intercountry collaboration. We created a network graph of countries linked by co-authorship and an author keyword co-occurrence network; VOSviewer (version 1.6.8, Centre for Science and Technology Studies, Leiden University) was used to build both networks. The principles of the underlying clustering algorithms used by the software have been documented elsewhere [11][12][13][14]. For content analysis of the abstracts, we applied exploratory factor analysis to identify research domains emerging from the abstracts, using a factor loading cutoff of 0.4 [15]. The Jaccard similarity index was used to identify the research topics or terms that most frequently co-occur with each other [16]. Latent Dirichlet Allocation (LDA) was used to classify papers into corresponding topics [17][18][19][20][21]. The analytical techniques used for each data type are summarized in Table 1.
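The Jaccard similarity index between two terms is the number of abstracts containing both terms divided by the number of abstracts containing either term. The following is a minimal sketch that ranks term pairs by this index, assuming simple case-insensitive substring matching. The abstracts and term list are hypothetical, and the authors' exact preprocessing is not described.

```python
from itertools import combinations


def term_doc_sets(abstracts, terms):
    """Map each term to the set of abstract indices that contain it."""
    lowered = [a.lower() for a in abstracts]
    return {t: {i for i, a in enumerate(lowered) if t in a} for t in terms}


def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranked_pairs(abstracts, terms):
    """All term pairs sorted by Jaccard similarity, highest first."""
    docs = term_doc_sets(abstracts, terms)
    scores = {(s, t): jaccard(docs[s], docs[t]) for s, t in combinations(terms, 2)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical inputs for illustration.
ABSTRACTS = [
    "Neural network predicts survival after chemotherapy.",
    "Robotic surgery improves quality of life.",
    "Neural network model for chemotherapy response.",
    "Quality of life after robotic surgery and radiotherapy.",
]
TERMS = ["neural network", "chemotherapy", "robotic surgery", "quality of life"]

if __name__ == "__main__":
    for pair, score in ranked_pairs(ABSTRACTS, TERMS):
        print(pair, round(score, 2))
```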

The Number of Published Items and Publication Trend
The number of studies applying AI to cancer research increased rapidly from 1991 to 2018. In particular, papers published in the past 10 years accounted for 90.66% (3223/3555) of the total. Citation and usage rates are also growing fast: the mean usage (downloads) in the past 6 months of papers published in the past 1 to 2 years was twice that of papers published in the past 3 to 4 years (Table 2).
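The reported share is a direct ratio of the two counts given in the text. Here is a minimal check of that arithmetic; `share` is a helper defined only for this illustration.

```python
def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to 2 decimal places."""
    return round(100 * part / total, 2)


recent_10_years, all_papers = 3223, 3555  # counts reported in the text
print(f"{share(recent_10_years, all_papers)}%")  # prints 90.66%
```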
Table 3 shows the study settings mentioned in the abstracts of publications. Country settings were mentioned 749 times, 46.5% of which referred to the United States. Over 90% of all settings were in developed countries. Notably, 2 countries with large populations, China and India, accounted for only 3.3% and 4.4% of settings, respectively.
Analyses of keywords and abstract contents provide a better understanding of the scope of the studies and the development of the research landscapes. Figure 2 shows the co-occurrence of the most frequent keywords. Eight major clusters emerged from the 180 most frequent keywords, each with at least 30 co-occurrences. The major clusters included the following: Cluster 1 (red) refers to surgery and treatment outcomes; Cluster 2 (green) focuses on applications of AI techniques in specific cancers; Cluster 3 (yellow) describes therapies for colorectal cancer; and Cluster 4 (blue) covers applications of chemotherapy and radiotherapy. Node colors indicate principal components of the data structure, node size is scaled to keyword occurrences, and line thickness reflects the strength of the association between 2 keywords.
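VOSviewer builds the keyword network internally. The sketch below illustrates only the underlying counting logic: keyword occurrences per paper, a minimum-occurrence threshold, and pairwise link weights (number of papers in which both keywords appear). The `min_occurrences` parameter stands in for the cutoff of 30 used in this study. The example uses 2 because the keyword lists are small and hypothetical. This is not VOSviewer's actual implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(keyword_lists, min_occurrences):
    """Return (node occurrences, edge weights) for keywords meeting the threshold."""
    # Normalize case and drop duplicate keywords within a paper.
    papers = [sorted({k.strip().lower() for k in kws}) for kws in keyword_lists]
    occurrences = Counter(k for kws in papers for k in kws)
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}
    edges = Counter()
    for kws in papers:
        # Keywords are sorted, so each pair is stored once as (a, b) with a < b.
        for a, b in combinations([k for k in kws if k in kept], 2):
            edges[(a, b)] += 1
    return {k: occurrences[k] for k in kept}, dict(edges)


# Hypothetical author keywords for four papers.
KEYWORDS = [
    ["Machine learning", "Breast cancer"],
    ["machine learning", "breast cancer", "MRI"],
    ["Robotic surgery", "prostate cancer"],
    ["robotic surgery", "Prostate cancer", "machine learning"],
]

if __name__ == "__main__":
    nodes, edges = cooccurrence_network(KEYWORDS, min_occurrences=2)
    print(nodes)
    print(edges)
```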
As for the content analysis of abstracts, the top 50 emerging research domains are listed in Table 4. AI techniques have been applied to various aspects of cancer research, including therapies (radiotherapy, chemotherapy, and surgery), capacities (prediction, screening, and treatment), and factors associated with outcomes (physical, social, and economic).
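Once an exploratory factor analysis has produced a term-by-factor loading matrix, research domains can be read off by keeping, for each factor, the terms whose absolute loading reaches the 0.4 cutoff. The following is a minimal sketch of that last step only. It assumes the loadings were already estimated by a factor analysis routine, and the loading values below are hypothetical. A term that cross-loads on several factors is listed under each one.

```python
LOADING_CUTOFF = 0.4


def domains_from_loadings(loadings, cutoff=LOADING_CUTOFF):
    """Group terms by factor index, keeping terms with |loading| >= cutoff."""
    domains = {}
    for term, row in loadings.items():
        for factor, value in enumerate(row):
            if abs(value) >= cutoff:
                domains.setdefault(factor, []).append(term)
    return domains


# Hypothetical loadings on two factors.
LOADINGS = {
    "radiotherapy": [0.72, 0.05],
    "chemotherapy": [0.61, 0.12],
    "screening": [0.08, 0.66],
    "prediction": [0.15, 0.52],
    "cost": [0.21, 0.30],
}

if __name__ == "__main__":
    print(domains_from_loadings(LOADINGS))
```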
Table 5 presents the research topics constructed using LDA. Topic labels were manually annotated by scrutinizing the most frequent words and titles for each topic. The topics with the highest volume of publications were (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Notably, this classification revealed topics examining the incremental effectiveness of AI applications (Topic 2) and, more interestingly, the quality of life outcomes and functioning of patients receiving these innovations. Figure 4 illustrates the changes in research productivity over time, showing rapid growth of Topics 1, 2, 3, and 4, especially in recent years.
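The study does not state which LDA implementation or hyperparameters were used. The sketch below shows one standard way to fit LDA: collapsed Gibbs sampling with assumed priors `alpha=0.1` and `beta=0.01`. Each paper is then assigned to its dominant topic, and the most frequent words per topic are listed, which is the information used when labeling topics by hand. The tokenized abstracts are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, topic_word, vocab); theta[d][k] is the estimated share
    of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        doc_z = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assignments[d][i], index[w]
                # Remove the token, then resample its topic from the full conditional.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


def dominant_topic(theta_row):
    """Index of the topic with the largest share in one document."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)


def top_words(topic_word, vocab, n=3):
    """Most frequent words per topic, used when labeling topics manually."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical tokenized abstracts.
DOCS = [
    "machine learning neural network prediction survival".split(),
    "neural network machine learning screening prediction".split(),
    "robotic surgery quality life functioning outcomes".split(),
    "quality life robotic surgery survivorship outcomes".split(),
]
theta, topic_word, vocab = lda_gibbs(DOCS, n_topics=2)
labels = [dominant_topic(row) for row in theta]

if __name__ == "__main__":
    print(labels)
    print(top_words(topic_word, vocab))
```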

Principal Findings
By systematically synthesizing and analyzing the bibliography of AI applications in cancer studies, we characterized the development of this research landscape from 1991 to 2018. The findings illustrate rapidly growing research productivity and an expansion of multidisciplinary approaches, largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Our analysis highlights both the most frequent areas of research and the paucity of research in other areas. The constructed research topics and landscapes show that the development of AI in cancer care focuses on improving prediction in cancer screening, AI-assisted therapeutics, and the corresponding areas of precision and personalized medicine, all of which have grown rapidly over the past decade. Although studies of cancer outcomes, covering clinical and physical functioning, mental health, and quality of life, are on the rise, our analysis indicates a relative paucity of research on cancer outcomes and survivorship. This is especially relevant given the continuously growing population of cancer survivors [22].

Comparison With Past Work
This study supplements the previous global mapping of AI in medicine by analyzing the content and characteristics of studies on specific applications of AI in cancer research and clinical practice [2]. Compared with previous reviews, this study describes research trends more comprehensively by applying content analysis and topic modeling [4][5][6][7][8][9][10]. The findings can therefore help inform the design and priority setting of future studies. Classifying information sources and content into topics to identify priorities for interventions has been widely applied; for example, previous authors have analyzed newspaper and social media content to understand topics of interest related to breast cancer and secondhand smoke [23][24][25][26][27][28]. However, no previous study has analyzed the scientific bibliography to determine the development of research landscapes in AI applied to cancer care. Li et al proposed a text-mining framework that uses LDA to construct topics to support systematic reviews [29]; in this study, we applied this approach to classify each paper into a topic. We further analyzed the co-occurrence frequency of terms and their associated clusters using factor analysis. These term clusters enrich the understanding of the scope of each topic, especially for diseases whose research spans multiple disciplines.
The findings from this study help inform the future development of AI applications in cancer research and in clinical practices of cancer control and management. First, the difference in citation rates between very recent and older articles demonstrates the speed of knowledge accumulation in this area. Understanding the scope of research landscapes helps inform the selection of variables and topics when developing an application or conducting a study. Moreover, whereas the previous bibliometric analysis could only distinguish and determine trends in the applications of AI techniques in cancer care [2], this study showed that research has also expanded to encompass the comparative effectiveness of these innovations relative to traditional practices. In addition, research landscapes have expanded beyond the clinic to evaluate the functioning and performance of patients being treated, as well as their mental well-being and quality of life. To support this research topic, more studies should explore different study settings and incorporate individual characteristics to improve the validity of AI techniques. One important question is how to integrate and scale up AI-based applications for cancer care in clinical practice and community prevention. Little is currently known about the adaptation and integration of AI applications into health systems and communities, so future implementation research is needed.

Limitations
One shortcoming of this study is that we used only the WOS database. Although the WOS covers the greatest proportion of the literature in AI research, it might not fully represent the literature indexed in other databases. Another limitation is that only English-language documents were included. Finally, the content analysis included only abstracts rather than full texts. Nonetheless, this topic modeling serves to expand, improve, and supplement previous systematic reviews in this field.

Conclusions
In conclusion, AI applications have been growing rapidly in cancer clinical practice, including prediction, diagnosis, enhanced therapeutics, and optimal therapy selection. As interest in AI in medicine continues to grow, it will be increasingly critical to better understand the incremental effectiveness of these innovations and their validity in supporting the performance and quality of life of individuals after treatment.

Figure 1. The global network of 53 countries with at least five co-authorships, classified into 8 clusters.

Figure 1
Figure 1 presents the global network of 53 countries with at least five co-authorships with other countries. Node size represents each country's contribution to the total number of publications, and line thickness indicates the relative volume of collaboration. The countries were classified into 8 clusters according to their patterns of international collaboration.
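One plausible way to build such a network is to collect, for each paper, the set of author countries, count internationally co-authored papers per country, and keep countries that meet a threshold. Links are then weighted by the number of papers each pair of countries shares. This sketch assumes that reading of "at least five co-authorships." The example uses a threshold of 2 and hypothetical country lists. The clustering into 8 groups was done by VOSviewer and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def collaboration_network(paper_countries, min_collaborations=5):
    """Return (internationally co-authored paper counts, link weights) for kept countries."""
    papers = [sorted(set(cs)) for cs in paper_countries]
    # Count only papers with authors from more than one country.
    international = Counter(c for cs in papers if len(cs) > 1 for c in cs)
    kept = {c for c, n in international.items() if n >= min_collaborations}
    links = Counter()
    for cs in papers:
        for a, b in combinations([c for c in cs if c in kept], 2):
            links[(a, b)] += 1
    return {c: international[c] for c in kept}, dict(links)


# Hypothetical author countries for five papers.
PAPERS = [
    ["USA", "China"],
    ["USA", "China", "UK"],
    ["USA"],
    ["UK", "Germany"],
    ["USA", "UK"],
]

if __name__ == "__main__":
    nodes, links = collaboration_network(PAPERS, min_collaborations=2)
    print(nodes)
    print(links)
```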

Figure 2. Co-occurrence of the most frequent author keywords.
h MRI: magnetic resonance imaging. i SVM: support vector machine. j TORS: transoral robotic surgery. k HPV: human papillomavirus. l PT: prothrombin time. m AUC: area under the curve.

Figure 3. Co-occurrence of the most frequent topics that emerged from exploratory factor analysis of abstract contents.

Figure 4. Changes in the applications of artificial intelligence to cancer research during 1991-2018.

Figure 5
Figure 5 presents the hierarchical clustering of research disciplines used in AI and cancer research. The horizontal axis of the dendrogram represents the distance (dissimilarity) between clusters, and the vertical axis lists the research disciplines. AI applications in cancer care are rooted in the following disciplines: robotics, multidisciplinary engineering, and multidisciplinary sciences. Imaging science and photography was very close to oncology, obstetrics and gynecology, dentistry, radiology, and optics. These biomedical and clinical disciplines account for the major areas of AI application, whereas health service-focused areas, such as operations and management, are more distant.
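The source does not specify the distance measure or linkage behind the dendrogram. The following is a minimal sketch of agglomerative hierarchical clustering under stated assumptions. Each research area is represented by the set of papers classified under it, and the dissimilarity between two areas is their Jaccard distance. Average linkage is used for merging clusters. The returned merge steps, with their distances, are what a dendrogram plots on its distance axis. The area-to-paper mapping is hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |intersection| / |union| between two sets of paper IDs."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def average_linkage(area_papers):
    """Agglomerative clustering with average linkage.

    Returns merge steps as (members_a, members_b, distance).
    """
    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(area_papers[x], area_papers[y]) for x, y in pairs) / len(pairs)

    clusters = [frozenset([name]) for name in area_papers]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), round(cluster_distance(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical mapping of WOS research areas to paper IDs.
AREAS = {
    "Oncology": {1, 2, 3, 4},
    "Radiology": {2, 3, 4, 5},
    "Robotics": {6, 7, 8},
    "Engineering, Multidisciplinary": {6, 7, 9},
}

if __name__ == "__main__":
    for step in average_linkage(AREAS):
        print(step)
```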

Figure 5. Dendrogram of co-occurrence of research areas using the Web of Science classifications.

Table 1. Summary of data analytical techniques.
a WOS: Web of Science.

Table 2. General characteristics of publications.
a Mean citation rate per year=total citations/(total number of papers×[2018−that year]). b Total usage: total downloads. c Mean use rate for the last 6 months=total usage in the last 6 months/total number of papers. d Mean use rate for the last 5 years=total usage in the last 5 years/(total number of papers×5).
JMIR Med Inform 2019 | vol.7 | iss. 4 | e14401 | p. 4 https://medinform.jmir.org/2019/4/e14401 (page number not for citation purposes)
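The footnoted rates of Table 2 can be written as small functions. The sketch below assumes that the denominator of footnote a is the number of papers multiplied by the years since publication, by analogy with footnote d. For papers published in 2018, that denominator is zero, so the sketch returns NaN. The example arguments are hypothetical.

```python
import math


def mean_citation_rate(total_citations, n_papers, pub_year, ref_year=2018):
    """Footnote a: citations per paper per year since publication."""
    years = ref_year - pub_year
    return total_citations / (n_papers * years) if years > 0 else math.nan


def mean_use_6_months(usage_last_6_months, n_papers):
    """Footnote c: downloads in the last 6 months per paper."""
    return usage_last_6_months / n_papers


def mean_use_5_years(usage_last_5_years, n_papers):
    """Footnote d: downloads per paper per year over the last 5 years."""
    return usage_last_5_years / (n_papers * 5)


if __name__ == "__main__":
    print(mean_citation_rate(120, 10, 2016), mean_use_6_months(300, 10), mean_use_5_years(500, 10))
```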

Table 3. Number of papers by country as study setting (N=749).

Table 4. Top 50 research domains that emerged from the exploratory factor analysis of the content of all abstracts.
a RARP: robotic-assisted radical prostatectomy. b RALP: robot-assisted laparoscopic prostatectomy. c PSA: prostate-specific antigen. d SBRT: stereotactic body radiation therapy. e RARC: robot-assisted radical cystectomy. f ANN: artificial neural network. g CT: computed tomography.