Published in Vol 9, No 11 (2021): November

Visualizing Knowledge Evolution Trends and Research Hotspots of Personal Health Data Research: Bibliometric Analysis


Authors of this article:

Jianxia Gong1; Vikrant Sihag2; Qingxia Kong3; Lindu Zhao1

Original Paper

1School of Economics and Management, Southeast University, Nanjing, China

2Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, Netherlands

3Department of Technology and Operations Management, Erasmus University Rotterdam, Rotterdam, Netherlands

Corresponding Author:

Lindu Zhao, PhD

School of Economics and Management

Southeast University

No 2 Sipailou

Nanjing, 210096


Phone: 86 2583793776

Fax: 86 2583794731


Background: The recent surge in clinical and nonclinical health-related data has been accompanied by a concomitant increase in personal health data (PHD) research across multiple disciplines such as medicine, computer science, and management. There is now a need to synthesize the dynamic knowledge of PHD in various disciplines to spot potential research hotspots.

Objective: The aim of this study was to reveal the knowledge evolutionary trends in PHD and detect potential research hotspots using bibliometric analysis.

Methods: We collected 8281 articles published between 2009 and 2018 from the Web of Science database. The knowledge evolution analysis (KEA) framework was used to analyze the evolution of PHD research. The KEA framework is a bibliometric approach that is based on 3 knowledge networks: reference co-citation, keyword co-occurrence, and discipline co-occurrence.

Results: The findings show that since 2009 the focus of PHD research has evolved from medicine centric to technology centric to human centric. The most active PHD knowledge clusters concern developing knowledge resources and allocating scarce resources. Computer science, especially artificial intelligence (AI), has been the focal point of recent empirical studies on PHD. Topics related to psychology and human factors (eg, attitude, satisfaction, education) are also receiving more attention.

Conclusions: Our analysis shows that PHD research has the potential to provide value-based health care in the future. All stakeholders should be educated about AI technology to promote value generation through PHD. Moreover, technology developers and health care institutions should consider human factors to facilitate the effective adoption of PHD-related technology. These findings indicate opportunities for interdisciplinary cooperation in several PHD research areas: (1) AI applications for PHD; (2) regulatory issues and governance of PHD; (3) education of all stakeholders about AI technology; and (4) value-based health care including “allocative value,” “technology value,” and “personalized value.”

JMIR Med Inform 2021;9(11):e31142



Introduction

Over the past 20 years, the use of patient medical information has increased rapidly in both clinical practice and research [1,2]. Emerging technologies such as wearable devices and mobile phones have improved access to personal health data (PHD). This improved access has in turn improved health care delivery and physician–patient relationships, particularly for patients with noncommunicable chronic diseases [3]. PHD can help make health care patient centered rather than disease centered, because it enables health care providers to learn about an individual’s medical history and current health status [4-6]. This data-driven approach is also helping to provide cost-effective, high-quality health care, known as value-based health care [7]. PHD is expected to continue transforming the health care industry.

PHD includes both clinical data and nonclinical data [2]:

- Clinical data include electronic medical records [EMRs], electronic health records [EHRs], and personal health records [PHRs].
- Nonclinical data include sentiments, emotions, characteristics, and social media behavior.

Figure 1 shows the relationships between EMRs, EHRs, PHRs, and PHD.

EMRs are real-time electronic files that contain only clinical records. They have replaced paper files and are usually not sent to health care providers outside the treating hospital or clinic [8]. This shift to electronic records marks a major digital transition in the health care industry.

The standardization of EHRs has created a repository of health information that has greatly facilitated interoperability between institutions [2]. However, EHRs usually belong to health care organizations [9]. They cannot easily be transferred between organizations that use different data standards and health information systems. PHRs were developed to overcome this limitation [6]. PHRs are electronic records of health-related information that conform to national interoperability standards. They can draw on multiple sources (eg, EHRs, laboratory test results, smartphones, and wearable devices) and are managed, shared, and controlled by the individual [10].

Figure 1. PHR, EHR, and EMR relationships. EHR: electronic health record; EMR: electronic medical record; PHR: personal health record.

Health care providers now have access to clinical data from EHRs and to patients’ self-reported health data (eg, test results, medication lists, allergies) from PHRs. However, they do not have access to patients’ self-reported experiences, attitudes, feelings, and emotional states. With the development of the internet of things and wearable devices, PHD can also include nonclinical health-related data, such as daily physical activity and diet. Individuals increasingly share detailed health information on social media platforms such as Twitter and in online health communities such as PatientsLikeMe [11].

Definitions of PHD vary. Hill [12] defined PHD as any data related to an individual’s health condition, while Plastiras and O’Sullivan [13] viewed PHD as health data generated by patients in their daily lives. In this study, PHD is defined as data related to clinical and nonclinical well-being, including EMRs, EHRs, PHRs, and environmental and social media data. Incorporating broader nonclinical PHD such as emotions and feelings has been shown to enhance personalized health care delivery [14,15].

PHD research has gained attention in various fields, including computer science, bioinformatics, medicine, and public health. A search for the keyword “personal health data” on Web of Science shows that the number of articles on PHD has increased greatly (Multimedia Appendix 1).

Several systematic reviews have addressed topics associated with PHD (Table 1). These include:

- security and privacy problems associated with EHRs [16];
- data types and standardization [6];
- facilitators of and barriers to EHR use in the United States [17,18];
- barriers to data sharing [19];
- ethical issues in data collection [20].

Others have investigated factors affecting the use of PHRs and big data applications of PHD [11,21,22].

As the PHD literature grows, some scholars have recognized the value of giving researchers across disciplines a comprehensive landscape of PHD publications and of how their topics evolve. Bibliometrics, as a quantitative analysis method, is useful for this purpose. Several studies analyzed the status of EHR research and identified its high-frequency terms [23-26]. Wen et al [27] analyzed trends in EHR publication output by country from 2009 to 2015. Wang et al [28] used bibliometric methods to compare EHR publication hotspots across different periods in 6 countries. More recently, Qian et al [29] and Zhenni and Yuxing [30] applied social network analysis and topic modeling to examine EHR publications in depth, evaluating publication trends and detecting research frontiers. However, these studies focused on a single type of health data: EHRs.

Karampela et al [2] used a systematic mapping approach to describe the publication channels, publication years, and major research topics of PHD research, providing a more complete overview. However, it remains unclear:

- what phase each topic is in;
- how each topic is progressing;
- which knowledge trends are evolving;
- which topics will become research hotspots.

This study aims to examine the evolving trends and to detect the potential research hotspots of PHD by identifying, classifying, and clustering PHD research topics from 2009 to 2018. We used knowledge evolution analysis (KEA) with bibliometric techniques to review articles retrieved from the Web of Science database. This study traces the evolution of PHD using knowledge networks based on reference co-citation, keyword co-occurrence, and discipline co-occurrence. Revealing the interrelationships between PHD research topics will provide a solid framework for future research. Table 2 presents the key questions that will be answered in this study.

Table 1. Comparison of literature reviews.

| Study | Research question | Sample size | Time range | Method |
|---|---|---|---|---|
| Archer et al [17] | PHRs^a design, functionality, implementation, application, outcomes, and benefits | 130 | Unlimited-2010 | Systematic review |
| Fernández-Alemán et al [16] | Security and privacy in EHRs^b | 49 | 2006-2011 | Systematic review |
| Van Panhuis et al [19] | Barriers to data sharing | 65 | Unlimited-2013 | Systematic review |
| Kruse et al [18] | Adoption factors of EHRs | 31 | 2012-2015 | Systematic review |
| Roehrs et al [6] | Data types, standards, profiles, goals, methods, functions, and architecture with PHRs | 97 | 2008-2017 | Systematic review |
| Yin et al [11] | Machine learning in online personal health data | 103 | 2010-2018 | Systematic review |
| Maher et al [20] | Ethical issues in passive data collection | 48 | Unlimited-2018 | Systematic review |
| Abd-alrazaq et al [21] | Factors affecting the use of PHRs | 97 | 2000-2018 | Systematic review |
| Mehta and Pandit [22] | Big data analytics in PHD^c | 58 | 2013-2018 | Systematic review |
| Wang et al [28] | Evolution of publication hotspots in EHRs | 17,678 | 1957-2016 | Bibliometric method |
| Wen et al [27] | Production trends of EHR | 1803 | 1991-2005 | Bibliometric method |
| Guo et al [23] | Status, hotspots of EHR | 5095 | 2005-2010 | Bibliometric method |
| Liang et al [24] | Status, directions of EHR | 1262 | 1990-2013 | Bibliometric method |
| Ruixian et al [25] | Status of EMR^d in China | 262 | 1999-2004 | Bibliometric method |
| Zhenni and Yuxing [30] | Hot spots in EHR | 13,438 | 1900-2019 | Bibliometric method |
| Qian et al [29] | Landscape, hot topics, trends of EHRs | 13,438 | 1900-2019 | Bibliometric method |
| Lin et al [26] | Status of EMR research in China | 1752 | 1999-2012 | Bibliometric method |
| Karampela et al [2] | Publication source, publication year, research topic | 246 | Unlimited-2018 | Systematic mapping study |
| This study | Knowledge evolution trajectory of PHD, including EHR, PHR, and EMR | 8281 | 2009-2018 | Bibliometric method |

^a PHR: personal health record.

^b EHR: electronic health record.

^c PHD: personal health data.

^d EMR: electronic medical record.

Table 2. Mapping questions.

| Question and ID | Mapping question | Rationale |
|---|---|---|
| MQ1^a: References | | |
| MQ1.1 | What is the shape of the reference co-citation network? | To understand the main topics in PHD^b and how they developed. |
| MQ1.2 | How have the knowledge clusters evolved? | To identify which PHD topic has the greatest longevity and which is the newest hotspot. |
| MQ1.3 | What are the citation bursts in the reference network? | To explore emerging PHD research topics as characterized by articles. |
| MQ2: Keywords | | |
| MQ2.1 | What are the keyword bursts in recent years? | To explore emerging research interests in PHD as characterized by keywords. |
| MQ3: Disciplines | | |
| MQ3.1 | What is the shape of the discipline category co-occurrence network? | To identify trends in the discipline categories involved in PHD. |
| MQ3.2 | What are the discipline category bursts? | To explore the discipline categories that increased abruptly in PHD. |

^a MQ: mapping question.

^b PHD: personal health data.

Methods

Data Collection

In 2009, the American Health Information Management Association launched a foundation program, “Better health information for all” [2]. PHD research has developed substantially since then, so the retrieval period was set from 2009 to 2018 (data were collected on March 8, 2019). We relied on scholarly publications in the Web of Science Core Collection, which covers over 21,000 science and social science journals and provides access to multiple databases that index cross-disciplinary research. Web of Science has long been recognized as an ideal data source for bibliometric analysis.

To ensure the quality of the data set, we retrieved both original research articles and review articles from Science Citation Index Expanded and Social Science Citation Index. As there is no common definition for PHD, the following terms were searched in titles, abstracts, or keywords to identify PHD-related research in the Web of Science database: “personal health data”, “personal health record”, “electronic health record” or “electronic medical record”. In Web of Science, the “Topic Search” function returns results in titles, abstracts, or keywords. Thus, the search query was defined as follows:

TS(Topic)=(“personal health data” OR “personal health record” OR “electronic health record” OR “electronic medical record”) AND DT(Document Types)=(“Articles” OR “Review”) AND PY(Year Published)=(2009-2018).

This search yielded 8544 publications. After eliminating publications with replicated or incomplete retrieval data, 8281 records were left, 7855 (94.86%) of which were original articles and 426 (5.14%) review articles. The data set selection process follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow (Figure 2).
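The record counts above can be checked with simple arithmetic. The Python sketch below shows one way to deduplicate exported records and compute document-type shares. The record fields (`uid`, `doc_type`, `title`) and the rule for "incomplete" records are hypothetical assumptions; the text does not describe the authors' cleaning procedure beyond what is stated above.

```python
from collections import Counter

def clean_records(records):
    """Drop incomplete and duplicate records (hypothetical fields: uid, doc_type, title)."""
    seen, kept = set(), []
    for rec in records:
        # Assumption: a record is incomplete if any required field is missing or empty.
        if not all(rec.get(field) for field in ("uid", "doc_type", "title")):
            continue
        if rec["uid"] in seen:  # replicated record
            continue
        seen.add(rec["uid"])
        kept.append(rec)
    return kept

def type_shares(records):
    """Return {doc_type: (count, percentage rounded to 2 decimals)}."""
    counts = Counter(rec["doc_type"] for rec in records)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.items()}

# Synthetic records matching the reported totals: 7855 articles and 426 reviews.
reported = [{"uid": f"a{i}", "doc_type": "Article", "title": "t"} for i in range(7855)]
reported += [{"uid": f"r{i}", "doc_type": "Review", "title": "t"} for i in range(426)]
print(type_shares(reported))  # {'Article': (7855, 94.86), 'Review': (426, 5.14)}
```

The output reproduces the 94.86% and 5.14% shares reported above.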

Figure 2. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart of data selection.

Data Analysis


We used KEA to analyze the evolution of PHD research. KEA follows a bibliometric approach in which each article is viewed as a knowledge resource. The relationships among these resources form 3 knowledge networks: reference co-citation, keyword co-occurrence, and discipline co-occurrence. These networks can be analyzed along the 3 dimensions of references, keywords, and disciplines using similarity-based clustering [31,32].

Together, the 3 networks form a knowledge kernel: a three-dimensional space depicting the overall knowledge network of a research field (Figure 3). Tracking the 3 networks over time traces the evolution of this kernel, and this approach is referred to as KEA. In addition, burst detection was used to identify emerging research hotspots.

Figure 3. 3D attributions of knowledge kernel.

An article typically cites and is cited by many others. To identify the interrelationships between articles, reference co-citation analysis is commonly used. Co-citation analysis can only categorize part of the cited literature in a research field, so keyword co-occurrence and discipline co-occurrence techniques were also used to reveal information on other key topics. These 3 techniques can help analyze the dynamics of a research field over time and are discussed in detail below.

Reference Co-citation Network

Small [33] defined co-citation as “the frequency with which two items of earlier literature are cited together by the later literature.” The reference co-citation network was generated with a threshold of 4 or more co-citations [34]. It was then divided into several clusters, each labeled with terms extracted from the titles of its most representative citing articles [35]. This analysis shows how the focus of PHD research changes over time.
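Small's definition translates directly into pair counting over reference lists. The following is a minimal Python sketch. It assumes each citing article is represented by a list of cited-reference identifiers (the data are hypothetical), and it applies the threshold of 4 co-citations mentioned above. It ignores the time slicing and node selection that CiteSpace also performs.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists, threshold=4):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited-reference IDs per citing article.
    Returns {(ref_a, ref_b): count} for pairs co-cited at least `threshold` times.
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}

# Hypothetical reference lists of 5 citing articles.
citing = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["A", "B", "C"], ["C", "D"]]
print(cocitation_counts(citing))  # {('A', 'B'): 4}
```

Only the pair A-B reaches the threshold, so only that link would enter the network.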

Keyword Co-occurrence Network

A list of predefined keywords represents the core ideas of an article. Keyword co-occurrence refers to the statistical correlation between keywords that appear in the same article. A keyword co-occurrence network links keywords listed in the same article and presents the relationships between them as a network map. For 2 keywords that are not directly linked, the shortest distance between them is taken as a measure of their closeness [34]. A cluster of closely linked keywords represents a key subject domain of a research field. The burst detection algorithm identifies keywords whose frequency increases sharply over a period, which signals the most active PHD research hotspots over time [36].
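The network construction and the shortest-distance measure described above can be illustrated with a small example. This minimal sketch assumes each article is a list of keyword strings; the data are illustrative. Links are unweighted, and distance is measured with breadth-first search. CiteSpace's link weighting and clustering are not reproduced.

```python
from collections import defaultdict, deque
from itertools import combinations

def keyword_network(keyword_lists):
    """Undirected adjacency sets: 2 keywords are linked if they appear in the same article."""
    adj = defaultdict(set)
    for kws in keyword_lists:
        for a, b in combinations(set(kws), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def distance(adj, source, target):
    """Shortest-path length in links (breadth-first search); None if unreachable."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj[node]:
            if nxt == target:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

# Illustrative keyword lists of 3 articles.
articles = [["privacy", "ehr"], ["ehr", "interoperability"], ["interoperability", "usability"]]
adj = keyword_network(articles)
print(distance(adj, "privacy", "usability"))  # 3
```

"privacy" and "usability" never co-occur, but they are connected through 2 intermediate keywords, giving a distance of 3.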

Discipline Co-occurrence Network

In this technique, each scientific article is assigned to 1 or more disciplines to calculate the statistical correlation between disciplines. When an article is assigned to 2 disciplines, these disciplines are related, and related disciplines combine to form a discipline co-occurrence network [37]. A burst detection algorithm can be used to detect the most active disciplines in PHD articles [36,38].
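CiteSpace's burst detection is based on Kleinberg's algorithm. The following Python sketch implements a simplified 2-state, batched version for illustration; it is not CiteSpace's code. Each period has a baseline state and a "burst" state whose rate is `s` times higher, and entering the burst state costs `gamma * ln(n)`. Viterbi decoding then picks the cheapest state sequence. The parameter values and the counts are illustrative assumptions. The sketch also does not compute the burst strength that CiteSpace reports.

```python
import math

def _nll(r, d, p):
    """Negative log-likelihood of r relevant items out of d under Binomial(d, p)."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

def burst_states(r, d, s=2.0, gamma=1.0):
    """Two-state Kleinberg-style burst detection for batched counts.

    r[t]: items containing the keyword in period t; d[t]: all items in period t.
    Returns one state per period: 0 = baseline rate, 1 = burst (rate * s).
    """
    n = len(r)
    p0 = sum(r) / sum(d)          # baseline probability (must be > 0)
    p1 = min(s * p0, 0.9999)      # elevated burst probability
    up = gamma * math.log(n)      # penalty for entering the burst state
    # Viterbi: best cost and path ending in each state; the chain starts in state 0.
    cost = [_nll(r[0], d[0], p0), up + _nll(r[0], d[0], p1)]
    path = [[0], [1]]
    for t in range(1, n):
        c0, c1 = _nll(r[t], d[t], p0), _nll(r[t], d[t], p1)
        # Moving down to state 0 is free; moving up to state 1 costs `up`.
        prev0 = 0 if cost[0] <= cost[1] else 1
        prev1 = 1 if cost[1] <= cost[0] + up else 0
        new0 = cost[prev0] + c0
        new1 = cost[prev1] + (0 if prev1 == 1 else up) + c1
        path = [path[prev0] + [0], path[prev1] + [1]]
        cost = [new0, new1]
    return path[0] if cost[0] <= cost[1] else path[1]

# Hypothetical yearly counts: 100 articles per year; the keyword spikes in 2014-2015.
years = list(range(2011, 2019))
states = burst_states([2, 2, 2, 10, 12, 2, 2, 2], [100] * 8)
print([y for y, st in zip(years, states) if st])  # [2014, 2015]
```

A larger `gamma` makes the detector more conservative, because a short, weak spike no longer pays for the cost of entering the burst state.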

In this study, we used CiteSpace 5.2.R2, a bibliometric analysis tool, to analyze the PHD articles [39].

Results

In the following sections, we present the KEA of references, keywords, and disciplines in the published PHD research.

Reference Co-citation Network

We constructed a co-citation network from the top 100 most cited articles in each year from 2009 to 2018. Clustering was performed using the log-likelihood ratio method, and the analysis identified 15 major clusters. Silhouette values ≥0.7 indicate high similarity among articles within the same cluster, while a modularity Q value of 0.6662 indicates high differences between clusters [34].
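Modularity Q measures how much more densely nodes are linked within clusters than would be expected by chance. The Python sketch below computes the standard Newman-Girvan modularity for an unweighted, undirected graph with a given cluster assignment. It assumes that CiteSpace's Q follows this standard definition. Because CiteSpace works on weighted co-citation links, this toy function would not reproduce the value of 0.6662 reported here. The example graph is hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q for an unweighted, undirected graph.

    edges: list of (u, v) pairs; community: {node: cluster_id}.
    Q = sum over clusters c of [L_c / m - (d_c / (2m))**2], where L_c is the
    number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Two triangles joined by a single bridging edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, community), 4))  # 0.3571
```

Placing every node in 1 cluster gives Q = 0, so higher values indicate a partition that separates the network better than no partition at all.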

Figure 4 shows the evolution trajectory of the PHD knowledge kernel based on the reference co-citation network:

- The colored bars at the top of the figure represent different years.
- Curves of the corresponding color represent co-citations that occurred in that year.
- Each node is drawn as citation “tree rings,” and its size represents the number of times the article was cited [34].

The network is further decomposed into clusters of tightly coupled references. Each cluster is labeled using terms extracted from noun phrases in titles.

Figure 4. Co-citation clusters of references (Modularity Q=0.6662, Mean Silhouette=0.278, Selection Criteria=Top 100 per slice).

Figure 4 shows that the most popular PHD research topics changed over time. Before 2013, knowledge clusters mainly focused on medicine and technology, for example clusters 3 (clinical decision support), 5 (information technology diffusion), and 2 (EHR systems). From 2013 onward, the focus shifted to health care resource allocation, as reflected in clusters 8 (developing knowledge resource) and 9 (allocating scarce resource).

Multimedia Appendix 2 takes a closer look at clusters 8 and 9. It lists the citing articles with coverage ≥9%, where coverage is the percentage of a cluster's members that an article cites. To some extent, these are the most representative articles of each cluster.

- For cluster 8, the most representative articles focus on developing knowledge resources for precision medicine [40], using EHRs for clinical decisions [41,42], and reviewing an integrated clinical decision support system [43].
- For cluster 9, they focus on allocating scarce resources for heart disease [44], a population-level EHR cohort study [45], and data science applications in critical care [46].

The major clusters are described in detail in Table 3.
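The coverage criterion used to select the articles in Multimedia Appendix 2 can be written out directly. This minimal sketch follows the definition given in the text: the percentage of a cluster's members that a citing article cites. The article and reference identifiers are hypothetical, and this is not CiteSpace's implementation.

```python
def coverage(cited_refs, cluster_members):
    """Percentage of a cluster's members that a citing article cites."""
    members = set(cluster_members)
    return 100 * len(members & set(cited_refs)) / len(members)

def representative_citers(citing, cluster_members, min_coverage=9.0):
    """Citing articles whose coverage is >= min_coverage percent, highest first."""
    scored = [(art, coverage(refs, cluster_members)) for art, refs in citing.items()]
    return sorted([s for s in scored if s[1] >= min_coverage], key=lambda s: -s[1])

# Hypothetical cluster of 20 references and 3 citing articles.
members = [f"r{i}" for i in range(20)]
citing = {"X": ["r0", "r1"], "Y": ["r0", "other"], "Z": ["r0", "r1", "r2", "r3"]}
print(representative_citers(citing, members))  # [('Z', 20.0), ('X', 10.0)]
```

Article Y cites only 1 of the 20 members (5% coverage), so it falls below the 9% cutoff.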

Table 3. Description of co-citation clusters.^a

| Mean year^b | Cluster ID | Size^c | Silhouette | Label (LLR^d) |
|---|---|---|---|---|
| 2003 | 15 | 10 | 0.984 | User groups perspective |
| 2005 | 3 | 69 | 0.805 | Clinical decision support |
| 2005 | 5 | 62 | 0.815 | Information technology diffusion |
| 2005 | 10 | 31 | 0.777 | Clinical documentation |
| 2006 | 1 | 72 | 0.872 | Integrative review |
| 2006 | 13 | 18 | 0.947 | Medication reconciliation issue |
| 2007 | 4 | 64 | 0.838 | Quality requirement |
| 2007 | 6 | 61 | 0.809 | Contingency factor |
| 2010 | 2 | 70 | 0.804 | EHR^e systems |
| 2010 | 7 | 61 | 0.847 | Clinical decision support system |
| 2010 | 12 | 30 | 0.833 | Electronic health information exchange |
| 2011 | 0 | 95 | 0.849 | Genomic era |
| 2013 | 8 | 51 | 0.893 | Developing knowledge resource |
| 2013 | 9 | 43 | 0.950 | Allocating scarce resource |

^a The connected components in cluster 14 were fewer than the default value (K=25), so CiteSpace did not report cluster 14 [39].

^b The average year of the articles in a cluster.

^c The number of articles in each cluster.

^d LLR: log-likelihood ratio.

^e EHR: electronic health record.

Keyword Co-occurrence Network

Multimedia Appendix 3 shows the keyword co-occurrence network. Multimedia Appendix 4 shows the 56 keywords with the strongest bursts among the 100 most frequent keywords in each year between 2009 and 2018. These bursts were identified using the “burst detection” function in CiteSpace.

- In 2009, keywords with the strongest bursts mainly concerned basic PHD issues (eg, privacy, physician order entry, and standard) and medical issues (eg, diabetes mellitus, heart disease, and blood pressure).
- Between 2010 and 2013, the keywords clinical information system, database, ambulatory care, and personal health record had the strongest bursts.
- Since 2013, burst keywords have included attitude and satisfaction. This implies that PHD research has shifted from technology- and medicine-centered perspectives toward human-centered perspectives.
- The most recent burst keywords (eg, readmission, emergency department, and usability) are likely PHD research hotspots, focusing on the efficiency and quality of health care resources.

Discipline Co-occurrence Network

Figure 5 shows the evolution trajectory of the PHD knowledge kernel based on discipline co-occurrence networks. The size of a node represents the number of articles in a specific discipline. The links between nodes show interdisciplinary collaborations. The colors of links show when a connection was made for the first time. The tree rings represent the co-occurrence history of a discipline. The color of a circle ring denotes the time of corresponding citations. The largest node was health care sciences, followed by medical informatics, general and internal medicine, and computer science, indicating that these are the mainstream disciplines in PHD studies. Nodes with high betweenness centrality (indicated by the purple rim) [35], including health policy and services, psychology, and business and economics, may be pivotal to the paradigm shift of PHD research.
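Betweenness centrality counts how often a node lies on the shortest paths between other nodes. This is why high-centrality disciplines can act as bridges between otherwise separate areas. The Python sketch below implements Brandes' algorithm for an unweighted, undirected graph and returns unnormalized values. CiteSpace reports its own, likely normalized, centrality, so magnitudes from this sketch may differ in scale. The discipline graph is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj: {node: list of neighbors}. Returns unnormalized scores.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical discipline graph: one discipline bridges 3 others.
adj = {
    "Health Policy": ["Medicine", "Computer Science", "Psychology"],
    "Medicine": ["Health Policy"],
    "Computer Science": ["Health Policy"],
    "Psychology": ["Health Policy"],
}
print(betweenness(adj)["Health Policy"])  # 3.0
```

The bridging node lies on the single shortest path for each of the 3 pairs of other disciplines, giving a score of 3.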

Figure 5. Discipline co-occurrence network (2009–2018) (Pruning=Pathfinder, Node=91, Density=0.0576, Selection Criteria=Top 60 per slice).

Disciplines with the strongest bursts are shown in Multimedia Appendix 5.

- Management was at the top of the list, with a burst strength of 4.4358 between 2009 and 2011.
- Before 2013, most bursting disciplines were in medicine and biology, such as biochemistry and molecular biology, and dentistry, oral surgery, and medicine.
- From 2013 to 2016, various technologies were incorporated into PHD research, including computer science (artificial intelligence [AI]) and medical laboratory technology.
- Since 2016, substance abuse and psychology have become more popular disciplines in PHD research. Psychology had a relatively high burst strength (6.5215) and appears to be a significant discipline for future research.
- Social sciences also had a strong burst (4.8105) that lasted longer than any other, making social sciences a central focus of PHD research.

Discussion

Principal Findings

To the best of our knowledge, this is the first systematic review to show how PHD research has evolved and which research areas are potential hotspots. We examined the PHD knowledge kernel in 3 networks (reference co-citation, keyword co-occurrence, and discipline co-occurrence). These networks show how knowledge clusters evolved, which subjects are key, and which disciplines are involved in PHD research. The proposed KEA framework can be extended to other interdisciplinary research areas. This is also the first study to cover all types of PHD, including EMRs, EHRs, and PHRs; previous reviews focused on a single type of health data. Lastly, this study included a large number of articles (8281) and was not restricted to specific research questions or research types.

The reference co-citation network revealed that before 2013, PHD research mainly focused on medicine and technology issues (eg, clinical decision support systems). From 2013 onward, the focus shifted toward developing knowledge resources and allocating scarce health care resources. This suggests that research communities have been actively seeking ways to make meaningful use of PHD. The overall trend of EHR research mirrors the finding of Qian et al [29] that EHR research has evolved from the adoption of EHRs to their higher-level application and integration.

A highly cited publication by Blumenthal and Tavenner [47] describes how EHRs benefit patients and caregivers. Other studies have explored the benefits of EHR-based clinical decision support systems as well as barriers to using EHRs [18,48,49]. The application of PHD in medical research has also evolved with technological development. At first, EHR-based clinical decision support systems were mainly used to diagnose and treat specific diseases such as diabetes and heart disease [50]. Later, more effort was made to develop and systematically incorporate health care data to advance genomics and precision medicine [40].

The reference co-citation network also showed that the most active PHD knowledge clusters concern developing knowledge resources and allocating scarce resources. The keyword analysis supports this: PHD studies on emergency health care typically involve applying the latest knowledge and using scarce resources [44,46].

The co-citation analysis also demonstrated that the focus of PHD research is moving away from improving treatment decisions toward optimizing how resources are distributed among groups. This pertains to the allocative value of value-based health care, which aims to equalize resource allocation and improve health care outcomes across groups [51], thereby improving health care services.

Consistent with this, AI applications have proven effective, especially in image interpretation [52,53] and diagnosis [54,55]. During the COVID-19 pandemic, AI systems played an important role in rapid early detection and diagnosis [56,57]. AI can also help optimize treatment regimens, prevention strategies, and the allocation of scarce health resources. This can narrow inequalities in health care, especially in resource-poor settings with shortages of human resources and medical devices [58]. These findings suggest that equity in health resource allocation needs to be improved and that value-based health care and AI applications deserve more attention.

The keyword co-occurrence analysis revealed that technical issues were studied first, including data privacy, data standardization, data quality, and interoperability between information systems. This makes sense because these are initial and critical steps for using PHD:

- Data quality ensures the accuracy of the information provided.
- Interoperability between information systems enables information exchange.
- Privacy protection encourages people to share their health data.

The importance of these technical issues is well supported by other systematic reviews [6,16,59,60]. These findings suggest that adequate processes for collecting PHD are prerequisites for its use. More effort should therefore be devoted, at the initial stage, to data standardization and interoperability.

The bursts in topics related to psychology and human factors (eg, attitude, satisfaction, education) indicate a shift from technology-centric to human-centric issues in PHD studies. Studies by Blumenthal [4] and Meier [61] showed that meaningful use of PHD requires more attention to the education, attitudes, and satisfaction of all stakeholders. Patient satisfaction is critical for successful health care and depends on quality, communication, and interpersonal interactions with health care providers [62].

AI-based technologies, including machine learning, natural language processing, and artificial neural networks, are being integrated more deeply into health care. As this happens, “black box” algorithms have raised concerns about technology liability as well as patient and clinician trust [57,63]. Further research on the regulation and governance of PHD is therefore recommended.

Our findings also support the unified theory of acceptance and use of technology [64]. This theory comprises 4 key elements that influence how people use technology: performance expectancy, effort expectancy, social influence, and facilitating conditions. These elements concern how humans interact with technology. They help ensure that technology creates value for patients, physicians, and administrators, which eventually improves satisfaction. As technologies such as AI and the internet of things become widely used in health care, these issues are gaining importance [65].

These human factors reflect the notion of “personalized value,” another dimension of value-based health care, which emphasizes that every patient should be fully informed about the benefits and risks of treatments [66]. Therefore, technology developers and health care institutions need to consider these human factors for the effective adoption of PHD-related technology.

The discipline co-occurrence analysis revealed how PHD research evolved across disciplines over the past 10 years, with a more recent focus on computer science, including AI, machine learning, and deep learning. This agrees with the notion that computer science can increase the value of PHD [11,67].

- Yin et al [11] reviewed the effectiveness of machine learning in personal health investigations based on online PHD.
- Payrovnaziri et al [68] reviewed AI models that use EHR data.
- Hou et al [36] pointed out that AI could be used not only as a screening tool for radiology images but also to interpret these images more consistently than humans can.
- Tran et al [69] stated that AI technology leverages individual health data and data science to enhance prognosis, diagnosis, and rehabilitation. More broadly, AI-based technology has the potential to advance precision medicine.

Regardless of the specific technique or function, the general aim of these technologies is to ease shortages of human and device resources and to optimize the allocation of scarce health care resources. This effective application of technology within PHD research reflects another dimension of value-based health care known as “technology value” [70]. These findings suggest that all stakeholders should be educated about AI technology to promote value generation through PHD.

Overall, our results indicate that health data analytics should go beyond improving decision-making processes to providing better results for populations [71]. In line with this, PHD research is transitioning toward a more human-centric approach with a new focus on value-based health care: “allocative value,” “technology value,” and “personalized value” [70]. These findings indicate that PHD research has the potential to meet the triple aims of value-based health care in the future.


Limitations

This review has some limitations.

First, the scope of the data is limited by the data source (Web of Science) and the search terms used. This study did not use “sentiments,” “emotions,” or “social media data” as search terms because they are not well-defined terminology, which might bias the data set. Iterative query refinement would improve the quality of the data set, although the search strategy adequately met the study purpose.

Second, the results present an overview of how structure and knowledge have evolved in PHD research but lack detail on more specific research topics. Researchers need to explore these topics in detail using additional methods and other scholarly publications. Examples include health care inequity and cost-effective health care achieved through the joint efforts of professional health care networks and patient networks [72].

Third, the co-citation networks rely on citation relationships between articles. While some citations reflect strong connectedness, others might reflect weaker connectedness. Further research is needed to distinguish between different kinds of citations.

Conclusions

This study used KEA to review the evolution of PHD research and identify research hotspots. The results show that since 2009, the focus of PHD research has shifted from medicine centric to technology centric and then to human centric. PHD is applied to optimize the allocation of scarce health care resources and to improve the quality and efficiency of health care services. Moreover, AI-based technology is becoming more relevant in PHD research and may be used to ease the shortage of human and device resources. Furthermore, PHD research is now paying more attention to topics related to psychology and human factors, such as the education, attitudes, and satisfaction of stakeholders. These findings indicate opportunities for interdisciplinary cooperation in several PHD research areas: (1) AI applications for PHD; (2) regulatory issues and governance of PHD; (3) education of all stakeholders about AI technology; and (4) value-based health care, including “allocative value,” “technical value,” and “personalized value.”

Acknowledgments

This work was funded by the National Natural Science Foundation of China (No. 71671039).

Authors' Contributions

All authors have made a substantial intellectual contribution to this study. JG, VS, QK, and LZ designed the study together. JG performed the database searches and data analysis. JG wrote the first draft of the manuscript with the support of VS, QK, and LZ. QK and VS commented on the draft and added to the revisions of the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

None declared.

Multimedia Appendix 1

The annual number of published articles on personal health data in Web of Science (2009–2018).

DOCX File, 52 KB

Multimedia Appendix 2

A list of articles that contributed to clusters #8 and #9.

DOCX File, 19 KB

Multimedia Appendix 3

Keywords co-occurrence network.

DOCX File, 341 KB

Multimedia Appendix 4

The 56 keywords with the strongest burst (2009–2018).

DOCX File, 23 KB

Multimedia Appendix 5

The 15 disciplines with the strongest burst (2009–2018).

DOCX File, 18 KB

References

  1. Meier CA, Fitzgerald MC, Smith JM. eHealth: extending, enhancing, and evolving health care. Annu Rev Biomed Eng 2013;15:359-382. [CrossRef] [Medline]
  2. Karampela M, Ouhbi S, Isomursu M. Personal health data: A systematic mapping study. Int J Med Inform 2018 Oct;118:86-98. [CrossRef] [Medline]
  3. Bietz MJ, Bloss CS, Calvert S, Godino JG, Gregory J, Claffey MP, et al. Opportunities and challenges in the use of personal health data for health research. J Am Med Inform Assoc 2016 Apr;23(e1):e42-e48 [FREE Full text] [CrossRef] [Medline]
  4. Blumenthal D. Launching HITECH. N Engl J Med 2010 Feb 04;362(5):382-385. [CrossRef] [Medline]
  5. Liu L, Stroulia E, Nikolaidis I, Miguel-Cruz A, Rios Rincon A. Smart homes and home health monitoring technologies for older adults: A systematic review. Int J Med Inform 2016 Jul;91:44-59. [CrossRef] [Medline]
  6. Roehrs A, da Costa CA, Righi RDR, de Oliveira KSF. Personal Health Records: A Systematic Literature Review. J Med Internet Res 2017 Jan 06;19(1):e13 [FREE Full text] [CrossRef] [Medline]
  7. Betancourt JR. In pursuit of high-value healthcare: the case for improving quality and achieving equity in a time of healthcare transformation. Front Health Serv Manage 2014;30(3):16-31. [Medline]
  8. Miller RH, Sim I. Physicians' use of electronic medical records: barriers and solutions. Health Aff (Millwood) 2004;23(2):116-126. [CrossRef] [Medline]
  9. Madden JM, Lakoma MD, Rusinak D, Lu CY, Soumerai SB. Missing clinical and behavioral health data in a large electronic health record (EHR) system. J Am Med Inform Assoc 2016 Nov;23(6):1143-1149 [FREE Full text] [CrossRef] [Medline]
  10. Kahn JS, Aulakh V, Bosworth A. What it takes: characteristics of the ideal personal health record. Health Aff (Millwood) 2009;28(2):369-376. [CrossRef] [Medline]
  11. Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019 Jun 01;26(6):561-576 [FREE Full text] [CrossRef] [Medline]
  12. Segen JC. McGraw Hill Concise Medical Dictionary of Modern Medicine. New York, NY: McGraw-Hill Companies, Inc; 2002:353-354.
  13. Plastiras P, O'Sullivan D. Exchanging personal health data with electronic health records: A standardized information model for patient generated health data and observations of daily living. Int J Med Inform 2018 Dec;120:116-125. [CrossRef] [Medline]
  14. Moorhead SA, Hazlett DE, Harrison L, Carroll JK, Irwin A, Hoving C. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013 Apr 23;15(4):e85 [FREE Full text] [CrossRef] [Medline]
  15. Hassanalieragh M, Page A, Soyata T, Sharma G, Aktas M, Mateos G, et al. Health Monitoring and Management Using Internet-of-Things (IoT) Sensing with Cloud-Based Processing: Opportunities and Challenges. New York, NY: IEEE; 2015 Jun Presented at: SCC '15: Proceedings of the 2015 IEEE International Conference on Services Computing; June 27, 2015 to July 2, 2015; New York, NY p. 285-292. [CrossRef]
  16. Fernández-Alemán JL, Señor IC, Lozoya PÁO, Toval A. Security and privacy in electronic health records: a systematic literature review. J Biomed Inform 2013 Jun;46(3):541-562 [FREE Full text] [CrossRef] [Medline]
  17. Archer N, Fevrier-Thomas U, Lokker C, McKibbon KA, Straus SE. Personal health records: a scoping review. J Am Med Inform Assoc 2011;18(4):515-522 [FREE Full text] [CrossRef] [Medline]
  18. Kruse CS, Kothman K, Anerobi K, Abanaka L. Adoption Factors of the Electronic Health Record: A Systematic Review. JMIR Med Inform 2016 Jun 01;4(2):e19 [FREE Full text] [CrossRef] [Medline]
  19. van Panhuis WG, Paul P, Emerson C, Grefenstette J, Wilder R, Herbst AJ, et al. A systematic review of barriers to data sharing in public health. BMC Public Health 2014 Nov 05;14:1144 [FREE Full text] [CrossRef] [Medline]
  20. Maher NA, Senders JT, Hulsbergen AFC, Lamba N, Parker M, Onnela J, et al. Passive data collection and use in healthcare: A systematic review of ethical issues. Int J Med Inform 2019 Sep;129:242-247. [CrossRef] [Medline]
  21. Abd-Alrazaq AA, Bewick BM, Farragher T, Gardner P. Factors that affect the use of electronic personal health records among patients: A systematic review. Int J Med Inform 2019 Jun;126:164-175. [CrossRef] [Medline]
  22. Mehta N, Pandit A. Concurrence of big data analytics and healthcare: A systematic review. Int J Med Inform 2018 Dec;114:57-65. [CrossRef] [Medline]
  23. Guo H, Dai T, Hu H. Research Status, Hotspots and Trends of Electronic Health Records: Bibliometric Analysis Based on PubMed Database. China Digit Med 2011;8:e1. [CrossRef]
  24. Liang Z, Yong L, Rui Z, Tingting H, Jialin L. Bibliometrics on Electronic Health Records of Web of Science. Chinese J Evidence-Based Med 2013;13(11):1307-1312. [CrossRef]
  25. Ruixian Y, Yu C, Haoyu L. Bibliometric Analysis on Researches of Electronic Medical Records in China. Inf Res 2015;11(217):18-21. [CrossRef]
  26. Lin D, Liu J, Zhang R, Li Y, Huang T. [Application Status of Evaluation Methodology of Electronic Medical Record: Evaluation of Bibliometric Analysis]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2015 Apr;32(2):350-356. [Medline]
  27. Wen H, Ho Y, Jian W, Li H, Hsu YE. Scientific production of electronic health record research, 1991-2005. Comput Methods Programs Biomed 2007 May;86(2):191-196. [CrossRef] [Medline]
  28. Wang Y, Zhao Y, Dang W, Zheng J, Dong H. The Evolution of Publication Hotspots in Electronic Health Records from 1957 to 2016 and Differences Among Six Countries. Big Data 2020 Apr 01;8(2):89-106. [CrossRef] [Medline]
  29. Qian Y, Ni Z, Gui W, Liu Y. Exploring the Landscape, Hot Topics, and Trends of Electronic Health Records Literature with Topics Detection and Evolution Analysis. IJCIS 2021;14(1):744. [CrossRef]
  30. Zhenni N, Yuxing Q. The Status, Hot Topics in the Field of Electronic Health Records: A Literature Review Based on Lda2vec. New York, NY: Association for Computing Machinery; 2020 Presented at: JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020; August 1-5, 2020; Virtual Event p. 479-480. [CrossRef]
  31. Liu L, Mei S. Visualizing the GVC research: a co-occurrence network based bibliometric analysis. Scientometrics 2016 Aug 20;109(2):953-977. [CrossRef]
  32. Boyack KW, Small H, Klavans R. Improving the accuracy of co-citation clustering using full text. J Am Soc Inf Sci Tec 2013 Jul 19;64(9):1759-1767. [CrossRef]
  33. Small H. Co-citation in the scientific literature: A new measure of the relationship between two documents. J Am Soc Inf Sci 1973 Jul;24(4):265-269. [CrossRef]
  34. Chen C, Dubin R, Kim MC. Emerging trends and new developments in regenerative medicine: a scientometric update (2000 - 2014). Expert Opin Biol Ther 2014 Sep;14(9):1295-1317. [CrossRef] [Medline]
  35. Chen C. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J Am Soc Inf Sci Tec 2006 Feb 01;57(3):359-377. [CrossRef]
  36. Hou J, Yang X, Chen C. Emerging trends and new developments in information science: a document co-citation analysis (2009–2016). Scientometrics 2018 Mar 7;115(2):869-892. [CrossRef]
  37. Liu Z, Yin Y, Liu W, Dunford M. Visualizing the intellectual structure and evolution of innovation systems research: a bibliometric analysis. Scientometrics 2015 Jan 22;103(1):135-158. [CrossRef]
  38. Kleinberg J. Bursty and hierarchical structure in streams. Data Min Knowl Discov 2003;7(4):373-397. [CrossRef]
  39. Chen C. CiteSpace: A Practical Guide for Mapping Scientific Literature. Hauppauge, NY: Nova Science Publishers; 2016:26-39.
  40. Hoffman JM, Dunnenberger HM, Hicks JK, Caudle KE, Whirl-Carrillo M, Freimuth RR, et al. Developing knowledge resources to support precision medicine: principles from the Clinical Pharmacogenetics Implementation Consortium (CPIC). J Am Med Inform Assoc 2016 Jul;23(4):796-801 [FREE Full text] [CrossRef] [Medline]
  41. Hicks JK, Dunnenberger HM, Gumpper KF, Haidar CE, Hoffman JM. Integrating pharmacogenomics into electronic health records with clinical decision support. Am J Health Syst Pharm 2016 Dec 01;73(23):1967-1976 [FREE Full text] [CrossRef] [Medline]
  42. Caraballo P, Bielinski S, St Sauver JL, Weinshilboum R. Electronic Medical Record-Integrated Pharmacogenomics and Related Clinical Decision Support Concepts. Clin Pharmacol Ther 2017 Aug 26;102(2):254-264. [CrossRef] [Medline]
  43. Hinderer M, Boeker M, Wagner SA, Lablans M, Newe S, Hülsemann JL, et al. Integrating clinical decision support systems for pharmacogenomic testing into clinical routine - a scoping review of designs of user-system interactions in recent system development. BMC Med Inform Decis Mak 2017 Jun 06;17(1):81 [FREE Full text] [CrossRef] [Medline]
  44. Amarasingham R, Patel PC, Toto K, Nelson LL, Swanson TS, Moore BJ, et al. Allocating scarce resources in real-time to reduce heart failure readmissions: a prospective, controlled study. BMJ Qual Saf 2013 Dec;22(12):998-1005 [FREE Full text] [CrossRef] [Medline]
  45. Koudstaal S, Pujades-Rodriguez M, Denaxas S, Gho JMIH, Shah AD, Yu N, et al. Prognostic burden of heart failure recorded in primary care, acute hospital admissions, or both: a population-based linked electronic health record cohort study in 2.1 million people. Eur J Heart Fail 2017 Sep;19(9):1119-1127 [FREE Full text] [CrossRef] [Medline]
  46. Sanchez-Pinto LN, Luo Y, Churpek MM. Big Data and Data Science in Critical Care. Chest 2018 Nov;154(5):1239-1248 [FREE Full text] [CrossRef] [Medline]
  47. Blumenthal D, Tavenner M. The "meaningful use" regulation for electronic health records. N Engl J Med 2010 Aug 05;363(6):501-504. [CrossRef] [Medline]
  48. Kruse CS, Kristof C, Jones B, Mitchell E, Martinez A. Barriers to Electronic Health Record Adoption: a Systematic Literature Review. J Med Syst 2016 Dec;40(12):252 [FREE Full text] [CrossRef] [Medline]
  49. Middleton B, Bloomrosen M, Dente MA, Hashmat B, Koppel R, Overhage JM, American Medical Informatics Association. Enhancing patient safety and quality of care by improving the usability of electronic health record systems: recommendations from AMIA. J Am Med Inform Assoc 2013 Jun;20(e1):e2-e8 [FREE Full text] [CrossRef] [Medline]
  50. Bright TJ, Wong A, Dhurjati R, Bristow E, Bastian L, Coeytaux RR, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012 Jul 03;157(1):29-43 [FREE Full text] [CrossRef] [Medline]
  51. Gray M. Value based healthcare. BMJ 2017 Jan 27;356:j437. [CrossRef] [Medline]
  52. Lakhani P, Prater AB, Hutson RK, Andriole KP, Dreyer KJ, Morey J, et al. Machine Learning in Radiology: Applications Beyond Image Interpretation. J Am Coll Radiol 2018 Feb;15(2):350-359. [CrossRef] [Medline]
  53. Nichols JA, Herbert Chan HW, Baker MAB. Machine learning: applications of artificial intelligence to imaging and diagnosis. Biophys Rev 2019 Feb;11(1):111-118 [FREE Full text] [Medline]
  54. Huang S, Yang J, Fong S, Zhao Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett 2020 Feb 28;471:61-71. [CrossRef] [Medline]
  55. Szolovits P, Patil RS, Schwartz WB. Artificial intelligence in medical diagnosis. Ann Intern Med 1988 Jan 01;108(1):80-87. [CrossRef] [Medline]
  56. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat Commun 2020 Oct 09;11(1):5088 [FREE Full text] [CrossRef] [Medline]
  57. Vaishya R, Javaid M, Khan IH, Haleem A. Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr 2020;14(4):337-339 [FREE Full text] [CrossRef] [Medline]
  58. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health 2018;3(4):e000798 [FREE Full text] [CrossRef] [Medline]
  59. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013 Jan 01;20(1):144-151 [FREE Full text] [CrossRef] [Medline]
  60. Häyrinen K, Saranto K, Nykänen P. Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008 May;77(5):291-304. [CrossRef] [Medline]
  61. Meier C. A role for data: an observation on empowering stakeholders. Am J Prev Med 2013 Jan;44(1 Suppl 1):S5-11 [FREE Full text] [CrossRef] [Medline]
  62. Venkatesh V, Zhang X, Sykes TA. “Doctors Do Too Little Technology”: A Longitudinal Field Study of an Electronic Healthcare System Implementation. Information Systems Research 2011 Sep;22(3):523-546. [CrossRef]
  63. Rigby M. Ethical dimensions of using artificial intelligence in health care. AMA J Ethics 2019;21(2):121-124. [CrossRef]
  64. Venkatesh V, Thong JYL, Xu X. Consumer Acceptance and Use of Information Technology: Extending the Unified Theory of Acceptance and Use of Technology. MIS Quarterly 2012;36(1):157. [CrossRef]
  65. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019 Jan;25(1):30-36 [FREE Full text] [CrossRef] [Medline]
  66. Gray M, Jani A. Promoting Triple Value Healthcare in Countries with Universal Healthcare. Healthc Pap 2016;15(3):42-48. [Medline]
  67. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc 2018 Oct 01;25(10):1419-1428 [FREE Full text] [CrossRef] [Medline]
  68. Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, et al. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc 2020 Jul 01;27(7):1173-1185 [FREE Full text] [CrossRef] [Medline]
  69. Tran BX, Nghiem S, Sahin O, Vu TM, Ha GH, Vu GT, et al. Modeling Research Topics for Artificial Intelligence Applications in Medicine: Latent Dirichlet Allocation Application Study. J Med Internet Res 2019 Nov 01;21(11):e15511 [FREE Full text] [CrossRef] [Medline]
  70. Kerr DJ, Jani A, Gray SM. Strategies for Sustainable Cancer Care. Am Soc Clin Oncol Educ Book 2016;35:e11-e15 [FREE Full text] [CrossRef] [Medline]
  71. van de Klundert J. Healthcare analytics: big data, little evidence. In: Tutorials in Operations Research. Catonsville, MD: The Institute for Operations Research and the Management Sciences (INFORMS); Nov 2016:1-22.
  72. Patrício L, de Pinho NF, Teixeira JG, Fisk RP. Service Design for Value Networks: Enabling Value Cocreation Interactions in Healthcare. Service Science 2018 Mar;10(1):76-97. [CrossRef]

Abbreviations

AI: artificial intelligence
EHR: electronic health record
EMR: electronic medical record
KEA: knowledge evolution analysis
PHD: personal health data
PHR: personal health record
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Edited by G Eysenbach; submitted 11.06.21; peer-reviewed by Q Chen; comments to author 05.07.21; revised version received 17.08.21; accepted 17.09.21; published 01.11.21


©Jianxia Gong, Vikrant Sihag, Qingxia Kong, Lindu Zhao. Originally published in JMIR Medical Informatics, 01.11.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.