Abstract
Background: Chronic diseases present significant challenges in health care, requiring effective management to reduce morbidity and mortality. While digital technologies like wearable devices and mobile applications have been widely adopted, large language models (LLMs) such as ChatGPT are emerging as promising technologies with the potential to enhance chronic disease management. However, the scope of their current applications in chronic disease management and associated challenges remains underexplored.
Objective: This scoping review investigates LLM applications in chronic disease management, identifies challenges, and proposes actionable recommendations.
Methods: A systematic search for English-language primary studies on LLM use in chronic disease management was conducted across PubMed, IEEE Xplore, Scopus, and Google Scholar to identify articles published between January 1, 2023, and January 15, 2025. Of the 605 screened records, 29 studies met the inclusion criteria. Data on study objectives, LLMs used, health care settings, study designs, users, disease management tasks, and challenges were extracted and thematically analyzed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines.
Results: LLMs were primarily used for patient-centered tasks, including patient education and information provision (18/29, 62%) of studies, diagnosis and treatment (6/29, 21%), self-management and disease monitoring (8/29, 28%), and emotional support and therapeutic conversations (4/29, 14%). Practitioner-centered tasks included clinical decision support (8/29, 28%) and medical predictions (6/29, 21%). Challenges identified include inaccurate and inconsistent LLM responses (18/29, 62%), limited datasets (6/29, 21%), computational and technical (6/29, 21%), usability and accessibility (9/29, 31%), LLM evaluation (5/29, 17%), and legal, ethical, privacy, and regulatory (10/29, 35%). While models like ChatGPT, Llama, and Bard demonstrated use in diabetes management and mental health support, performance issues were evident across studies and use cases.
Conclusions: LLMs show promising potential for enhancing chronic disease management across patient and practitioner-centered tasks. However, challenges related to accuracy, data scarcity, usability, and ethical concerns must be addressed to ensure patient safety and equitable use. Future studies should prioritize the integration of LLMs with low-resource platforms, wearable and mobile technologies, developing culturally and age-appropriate interfaces, and establishing robust regulatory and evaluation frameworks to support safe, effective, and inclusive use in health care.
doi:10.2196/66905
Keywords
Introduction
Chronic diseases, such as diabetes, heart disease, asthma, lung disease, depression, hypertension, Alzheimer disorder, and cancer, are a significant global burden on health care systems [
- ]. These conditions often lead to long-term health issues and have profound physical, psychological, and social impacts on patients [ , ]. Chronic diseases demand continuous, personalized care, often resource-intensive and difficult to scale [ ]. Therefore, disease management, which encompasses screenings, regular check-ups, monitoring, coordination of treatment, medication adherence, lifestyle modifications, and patient education, is crucial for improving patient outcomes, enhancing quality of life, and reducing the overall burden on health care systems [ ]. However, the resource-intensive nature of personalized, continuous care often makes it inaccessible to many patients, particularly in underserved populations where limited access to health care professionals and resources creates significant barriers to effective disease management [ ].In recent years, the use of digital technologies such as wearable devices [
], mobile apps [ ], and chatbots [ , ] has grown significantly in the management of chronic diseases. These have mainly been used for health care tasks, including patient education, symptom monitoring, and medication management [ - ]. The recent emergence of large language models (LLMs) such as GPT, Palm, Llama, and LaMDA [ - ] has demonstrated growing potential in health care applications. These models have been applied in health care for tasks such as personalized treatment recommendations, medical diagnosis, medical record summarization, and interpretation of clinical data to support clinical decision-making and disease management [ - ].Chronic disease management requires continuous monitoring, patient education, treatment coordination, and personalized care strategies [
]. Recent advancements in LLMs have introduced new possibilities for improving these tasks. For instance, ChatGPT has been explored in providing personalized health advice, enhancing patient engagement, and supporting symptom monitoring [ , ]. In diabetes management, GPT-based models have been investigated for interpreting continuous glucose monitoring data, providing personalized lifestyle recommendations [ ], and assessing individualized risk profiles for complications such as retinopathy [ ]. Beyond diabetes, LLMs such as LLaMA and GPT have been investigated for mental health support [ ], blood pressure measurement using wearable bio signals [ ], management of sickle cell anemia [ ], and dissemination of information on inflammatory bowel diseases to patients and health care professionals [ ].Despite these applications, several challenges affect the effectiveness of LLMs in chronic disease management, including inaccurate responses, limited and biased datasets, and ethical concerns [
, , ]. These issues raise concerns regarding the accuracy, reliability, and clinical applicability of LLM-generated recommendations [ ]. Given these challenges, a comprehensive review is essential to assess the current applications, identify key limitations, and propose strategies to enhance the effectiveness and safety of LLMs in chronic disease care. While existing reviews explore LLMs in general health care, this scoping review uniquely focuses on their role in chronic disease management. It synthesizes evidence across patient- and practitioner-centered applications, domain-specific challenges, and provides actionable recommendations. Specifically, this review aims to evaluate the current applications of LLM in chronic disease management tasks, identify key challenges, and provide actionable recommendations to address identified challenges.Methods
Search Strategy and Information Sources
This scoping review explored the use of LLMs in chronic disease management following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) [
]. A comprehensive search was conducted in PubMed, Scopus, IEEE Xplore, and Google Scholar, selected for their coverage of peer-reviewed medical, technical, and AI-related research. Google Scholar was included to capture a broad range of academic publications, including preprints and conference papers that may not be covered by traditional databases. Search terms included combinations of “Large language models,” “LLMs,” “ChatGPT,” “chronic diseases,” and “chronic disease management.” The search targeted both published and unpublished English-language articles from January 1, 2023, to January 15, 2025, ensuring a focus on recent advancements in LLM applications in health care.Article Selection
Studies were included if they focused on applications of LLM in chronic disease management, provided full-text access, were published in English, and appeared between January 2023 and January 15, 2025. Exclusion criteria eliminated nonprimary research (reviews, editorials, viewpoints, and commentaries), abstracts without full text, non-English publications, and articles outside the date range. To capture emerging research and potentially studies, reputable non-peer–reviewed preprints from established repositories such as arXiv and medRxiv were included.
illustrates the eligibility screening process with a decision tree.
Data Extraction and Synthesis
The data from selected studies were extracted into a structured form that captured study objectives, the LLMs used, health care settings, study designs, disease management tasks, identified challenges, evaluation methods, and target users. These extracted data points were selected to ensure a structured and objective analysis aligned with the study’s aims. Study objectives provided insight into the intended applications of LLMs in chronic disease management, while health care settings contextualized their use across clinical and patient-centered environments. Study design and evaluation methods were included to assess methodological rigor and the validity of findings. In addition, data on LLM models, disease management tasks, and key challenges enabled a systematic evaluation of current applications, limitations, and areas for future research.
The Mixed Methods Appraisal Tool (MMAT, version 2018) [
] was used to perform a formal methodological quality assessment. The MMAT was selected due to its flexibility in evaluating diverse study designs included in the review. Two reviewers independently appraised each study using the 5 criteria relevant to its methodological design, with discrepancies resolved through discussion and reached consensus on final ratings. Consistent with scoping review methodology, the studies were not excluded based on quality, but results inform the interpretation of findings [ ]. The findings from the included studies were synthesized and presented in alignment with the study objectives. A thematic analysis approach was used to categorize qualitative insights, grouping findings into patient-centered tasks, practitioner-centered tasks, and challenges. Discrepancies in data interpretation were resolved through consensus among the reviewers. Reference management and citation generation were conducted using Mendeley.Results
Included Studies
The PRISMA flow diagram (
) outlines each stage of the study selection process. The PRISMA-ScR framework was followed to ensure transparency and reproducibility, incorporating detailed search strategies and clearly defined exclusion criteria. A total of 446 records were identified from Google Scholar, 75 from Scopus, 56 from PubMed, and 28 from IEEE Xplore. After removing duplicates, 242 unique records underwent title and abstract screening, resulting in 61 articles for full-text review. Following the application of eligibility criteria, 29 articles were included in the final analysis.
Characteristics of Included Articles
provides the characteristics of the 29 articles included in this scoping review. The articles were categorized based on publication type, with the majority of the studies being journal articles (n=18), followed by conference papers (n=8), and preprints (n=3). The studies used a variety of research designs, including experimental (n=10), qualitative research (n=3), comparative studies (n=4), cross-sectional study (n=2), case study (n=1), observational (n=3), retrospective cohort designs (n=3), mixed methods (n=1), pilot study (n=1), and prospective study (n=1). provides more details on the characteristics of the reviewed studies.
Study | Country | Article category | Study objective | Health care setting | Study design | Evaluation |
Montagna et al [ | ]Italy | Conference paper | To design and implement a system architecture for a chatbot-based home blood pressure monitoring solution. | Home care setting | Experimental (system design and prototype development) | Human evaluation |
Yang et al [ | ]China | Preprint | To explore the application of a fine-tuned model-based outpatient treatment support system for treating patients with diabetes and to evaluate its effectiveness and potential value. | Clinical (West China Hospital) home care | Experimental (fine-tuning) | Human evaluation and automated evaluation metrics |
Raghu et al [ | ]India | Journal article | To evaluate the ability of ChatGPT to predict the diabetic retinopathy risk. | — | Comparative study | Human evaluation |
Song et al [ | ]The Republic of Korea | Preprint | To investigate the experiences of individuals using LLM | chatbots for mental health support.Korea Advanced Institute of Science and Technology | Qualitative study | Human evaluation |
Liu et al [ | ]China | Conference paper | Explore the use of LLMs for cuffless blood pressure measurement using wearable bio signals | Home care settings | Experimental (cuffless blood pressure measurement using LLMs) | Human evaluation |
Ogundare et al [ | ]Nigeria | Conference paper | To investigate the potential of LLMs in ambulatory devices for sickle cell anemia management. | Home care setting | Case study | Automated evaluation metrics |
Cankurtaran et al [ | ]Turkey | Journal article | To evaluate the performance of ChatGPT within the context of inflammatory bowel disease. | — | Cross-sectional study | Human evaluation |
Wang et al [ | ]China | Conference paper | To enhance the diagnosis and treatment of depression. | Clinical and homecare settings | Experimental (pre-training and fine-tuning) | Human evaluation |
Abdullahi et al [ | ]Germany | Journal article | To explore the potential of three popular Large Language Models in medical education to enhance the diagnosis of rare and complex diseases. | Home care setting | Qualitative study | Human evaluation and automated evaluation metrics |
Al Anezi [ | ]Saudi Arabia | Journal article | To analyze the use of ChatGPT as a virtual health coach for chronic disease management. | Home care setting | Quasi-experimental design | Human evaluation |
Athavale et al [ | ]United States | Journal article | To assess whether chatbots could assist with answering patient questions and electronic health record inbox management | Clinical (Division of Vascular Surgery, Stanford University School of Medicine in Palo Alto) | Experimental (chatbot assistance in chronic venous disease management) | Human evaluation |
Soto-Chávez et al [ | ]Colombia | Journal article | To evaluate the reliability and readability of Spanish chronic disease information presented to ChatGPT | — | Cross-sectional study | Human evaluation |
Abbas et al [ | ]Pakistan | Journal article | To assess the predictive accuracy of ChatGPT-assisted machine learning models for various chronic diseases. | Clinical (Tertiary hospital) | Observational study | Automated evaluation metrics |
Anderson et al [ | ]United States | Conference paper | The study aims to discover and rank novel relationships between various aspects of this condition. | — | Experimental | Human evaluation and automated evaluation metrics |
Ding et al [ | ]Taiwan | Conference paper | To develop and evaluate Large Language Multimodal Models that integrate clinical notes and laboratory test results for predicting the risk of chronic diseases, particularly type 2 diabetes mellitus | Clinical (Eastern Memorial Hospital in Taiwan | Retrospective cohort study | Automated evaluation metrics |
Jairoun et al [ | ]Malaysia | Journal article | To investigate the benefits and risks associated with the application of ChatGPT in managing diabetes and metabolic illnesses | — | Qualitative study | Human evaluation |
Mondal and Naskar [ | ]India | Journal article | To evaluate GPT-4’s competency in reviewing diabetic patient management plans compared to expert reviews. | General medical setting | Comparative study | Human evaluation and automated evaluation metrics |
Liu et al [ | ]N/S | Journal article | To leverage LLMs and multi-prompt engineering for chronic disease management, specifically for detecting mental disorders through user-generated textual content. | Online platforms | Experimental (few-shot learning) | Automated evaluation metrics |
Liao et al [ | ]Taiwan | Conference paper | To develop an EHR-based chronic disease prediction platform using LLMs for diabetes, heart disease, and hypertension. | Clinical (Far Eastern Memorial Hospital, Taiwan) | Retrospective cohort study | Automated evaluation metrics |
Ding et al [ | ]Taiwan | Journal article | Predict new-onset type 2 diabetes using large language multimodal models with EHR data | Clinical (Far Eastern Memorial Hospital, Taiwan) | Retrospective cohort study | Automated evaluation metrics |
Dao et al [ | ]Ireland and Singapore | Conference paper | Design and evaluate an AI chatbot system using GPT-3.5 for proactive diabetes prevention | Community-based setting | Experimental (AI design and evaluation) | Automated evaluation metrics |
Khan [ | ]United States | Journal article | Assess the efficacy of ChatGPT in facilitating self-management strategies for diabetic patients. | Outpatient diabetes care | Observational study | Human evaluation |
Mondal et al [ | ]India | Journal article | To evaluate the effectiveness of ChatGPT, an LLM, in providing answers to queries related to lifestyle-related diseases or disorders | Clinical and academic settings | Observational study | Human evaluation |
Young et al [ | ]United States | Journal article | Assess LLMs’ capacity to deliver age-appropriate explanations of chronic pediatric conditions to enhance patient understanding. | Clinical (Boston Children’s Hospital) | Pilot study | Human evaluation |
Li et al [ | ]China | Journal article | Develop DeepDR-LLM, an integrated AI system for primary diabetes care and diabetic retinopathy (DR) screening | Low-resource primary care settings | Experimental | Human evaluation and automated evaluation metrics |
Ying et al [ | ]China | Preprint | To evaluate the feasibility and utility of ChatGPT in diabetes education using retrospective and real-world patient questions. | Outpatient setting | Mixed methods | Human evaluation |
Li et al [ | ]China | Journal article | To evaluate the performance of LLMs in diabetes-related queries and their potential to assist in diabetes training for primary care physicians | Primary diabetes care, endocrinology, and diabetes management. | Prospective study | Human evaluation and automated evaluation metrics |
Hussain and Grundy [ | ]Australia | Journal article | Evaluate the responses of ChatGPT models to queries from diabetes patients, assessing their accuracy, biases, and limitations in providing self-management advice. | Home care setting | Comparative study | Human evaluation |
Wang et al [ | ]China | Journal article | To evaluate the potential of the RISE framework to improve LLMs’ performance in accurately and safely responding to diabetes-related inquiries. | Home care setting | Comparative study | Human evaluation |
aNot available.
b LLM: large language model.
As shown in
, most studies were conducted in China (7/29, 24%) and the United States (4/29, 14%), with limited representation of low-resource settings. Experimental designs were predominant (10/29, 35%), and nearly half of the studies (14/29, 48%) focused on diabetes management. The studies primarily focused on adult populations (28/29, 96%), with only 1 study (1/29, 4%) specifically addressing pediatric applications. Human evaluation was the most common evaluation method (16/29, 55%), followed by automated evaluation metrics (10/29, 35%) used in prototype evaluation, with some studies using both approaches (3/29, 10%). Studies were carried out in diverse health care settings, with home care settings (10/29, 35%) and clinical settings (9/29, 31%) being the most common. This diversity in study characteristics reflects the broad application of LLMs across various health care contexts for chronic disease management.Methodological Quality Assessment
Using the Mixed Methods Appraisal Tool [
], the studies were categorized as quantitative descriptive studies (19/29, 66%), comprizing observational, cross-sectional, case study, system development, prototype evaluation, and comparative analyses. Quantitative nonrandomized studies (3/29, 10%) included retrospective cohort and quasi-experimental designs, qualitative studies (5/29, 17%), and 2 studies with mixed methods design (2/29, 7%). Of the 29 studies, 18 studies (62.1%) were classified as high quality (meeting ≥4 of 5 criteria), 9 studies (31.1%) as moderate quality (meeting 2‐3 criteria), and 2 studies (6.9%) as low quality (meeting ≤1 criterion). The most common methodological limitations identified across studies included inadequate sampling strategy descriptions, limited participant demographic reporting, use of synthetic clinical data, and lack of external validation. Detailed quality appraisal results for each study are provided in . provides an overview of the LLM types, users, tasks, and challenges identified across the included studies.Study | LLM | Users | Disease management tasks | Challenges |
Montagna et al [ | ]GPT-3 | Individuals with hypertension | Patient engagement blood pressure monitoring and management. |
|
Yang et al [ | ]ChatGLM-6B | Patients diagnosed with diabetes | Treatment recommendations, suggesting appropriate laboratory tests, and medication |
|
Raghu et al [ | ]ChatGPT | Practitioner (ophthalmologist) | Patient education, medical reports: diagnoses and predictions |
|
Song et al [ | ]ChatGPT Llama | Individuals who have used LLM chatbots for mental health support |
|
|
Liu et al [ | ]Gemma-7B, Mistral-7B, Yi-6B, MedAlpaca-7B, LLaMA2-7B, LLaMA3-8B, Qwen2-7B, PalmyraMed-20B, PMCLLaMA13B, OpenBioLLM-8B | Patients with hypertension | Cardiovascular disease management via cuffless blood pressure measurement. |
|
Ogundare et al [ | ]Unspecified | Sickle cell patients and clinicians | Assessing anemia severity in real-time, predicting time to vaso-occlusive episodes, and communicating with emergency personnel. |
|
Cankurtaran et al [ | ]ChatGPT | Inflammatory bowel disease patients and health care professionals. |
|
|
Wang et al [ | ]LLaMA-7B, ChatGLM-6B Alpaca. | Individuals with depression. | Diagnosis and treatment of depression |
|
Abdullahi et al [ | ]Bard ChatGPT 3.5 and GPT-4 |
|
|
|
Al Anezi [ | ]ChatGPT | Outpatients with chronic diseases |
|
|
Athavale et al [ | ]ChatGPT 4.0 | Patient | Answered administrative and non-complex medical questions well, and electronic health record inbox management. Answering complex medical questions |
|
Soto-Chávez et al [ | ]ChatGPT | Patients with chronic diseases using the Spanish language | Evaluating the reliability and readability of ChatGPT-generated patient information on chronic diseases in Spanish. |
|
Abbas et al [ | ]GPT-3.5 | Machine Learning engineers, clinical researchers | Chronic disease prediction |
|
Anderson et al [ | ]GPT (Generative Pre-trained Transformer) | Practitioners | Discover and rank novel relationships between various aspects of chronic lower back pain. |
|
Ding et al [ | ]MedAlpaca |
| Early prediction of diabetes Prediction of multiclass chronic diseases |
|
Jairoun et al [ | ]ChatGPT | diabetes and metabolic illnesses, endocrinologists and diabetologists | Patient support and education Tailored treatment |
|
Mondal et al [ | ]GPT-4 | Health care professionals | Reviewing and evaluating diabetes management plans for guideline adherence |
|
Liu et al [ | ]GPT-2 and T5 | Patients with mental disorders | Detection of mental disorders (depression, anorexia, pathological gambling, self-harm) through user-generated textual content. |
|
Liao et al [ | ]BERT BiomedBERT Flan-T5-large-770M GPT-2 | Physicians, health care providers | Prediction of chronic diseases (diabetes, heart disease, hypertension) using EHR data. |
|
Ding et al [ | ]BERT, Roberta, BiomedBERT, Flan-T5, GPT-2 | Researchers, health care professionals | Predict new-onset T2DM, early detection, and risk assessment |
|
Dao et al [ | ]GPT-3.5 | Individuals at risk of diabetes or with prediabetes | Instant Q&A and advice Personalized reminders Data analysis for tailored guidance Health resource aggregation Emotional support |
|
Khan [ | ]ChatGPT | Diabetic patients | Real-time education and support Blood glucose monitoring guidance Medication adherence advice Lifestyle/diet recommendations Emergency detection |
|
Mondal et al [ | ]ChatGPT-4 | Patients and health care professionals | Answering patient questions (causes, symptoms, treatment, diet) Providing information on managing Crohn’s disease (CD) and ulcerative colitis (UC). Addressing professional queries (classification, diagnosis, disease activity, prognostic markers, complications) |
|
Young et al [ | ]GPT-4 Gemini 1.0 Ultra | Pediatric patients, health care providers, and caregivers | Generating explanations for chronic conditions |
|
li et al [ | ]LLaMA | Primary care physicians | Individualized diabetes management recommendations |
|
Ying et al [ | ]GPT-3.5 | Physicians, laypersons, and type 2 diabetes patients | Diabetes education and personalized Q&A support |
|
Li et al [ | ]ChatGPT-3.5 ChatGPT-4.0 Google Bard MedGPT LlaMA2-7B | Researchers Primary care physicians | Answering diabetes-related exam questions and assisting in diabetes training. |
|
Hussain and Grundy [ | ]ChatGPT-3.5 ChatGPT-4 | Diabetes patients and health care providers | Patient education, treatment recommendations, insulin management, dietary advice |
|
Wang et al [ | ]GPT-4 Anthropic Claude 2 Google Bard |
| Responding to diabetes-related inquiries and providing accurate and comprehensive information for diabetes self-management. |
|
a LLM: large language model.
As shown in
, GPT models were the most commonly used (14/29, 48%), followed by LLaMA variants (5/29, 17%), the Bard model (3/29, 10%), and BERT-based models (2/29, 7%). LLMs were primarily used for patient education and information provision (18/29, 62%), with most studies targeting patients (18/29, 62%) rather than health care providers (11/29, 38%). Inaccurate and inconsistencies in responses (18/29, 62 %) were the most frequently reported challenge across studies.Objective 1: The Tasks in Chronic Disease Management Performed by LLMs
Our literature synthesis revealed that LLMs have significant potential to improve various chronic disease management tasks. The tasks identified have been broadly categorized into patient-centered tasks and practitioner-centered tasks.
Patient-Centered Tasks
Patient Education and Information Provision
Eighteen studies (n=18) delved into the use of LLMs in providing health information to enhance patient health literacy [
, , , , , , , , , - ]. Key applications included using ChatGPT to provide personalized guidance on diabetes management [ , , , ], educational content for diabetic retinopathy and inflammatory bowel disease patients [ , ], generating age-appropriate explanations of chronic pediatric conditions [ ], and supporting physician training in diabetes management [ ]. In addition, LLMs supported treatment adherence through tools like ChatGPT and GPT-3.5, offering tailored medication reminders, appointment scheduling, and strategies for behavior change [ , ]. For Spanish-speaking populations, ChatGPT was evaluated for reliability and readability of chronic disease information [ ], while multiple ChatGPT versions were assessed for regional variations in diabetes education quality [ ].Diagnosis and Treatment
Six studies (n=6) examined the role of LLMs in assisting with diagnosis and treatment recommendations. These studies explored the potential of LLMs to suggest appropriate laboratory tests, generating differential diagnoses and medication options tailored to the individual patient’s condition [
, , , ]. Notable applications included enhancing depression diagnosis and treatment through fine-tuning of models like LLaMA-7B and ChatGLM-6B [ ], supporting the diagnosis of rare and complex diseases [ ], and detecting mental disorder patterns through analysis of user-generated content [ ]. Furthermore, integrating AI-driven diagnostic and treatment capabilities with diabetes management systems showed particular promise in low-resource primary care settings [ ].Self-Management and Disease Monitoring
Eight studies (n=8) addressed using LLMs for self-management and disease monitoring. These studies explored how LLMs provide guidance on managing chronic conditions, promote patient engagement, and support home disease monitoring [
, , , , , , , ]. Key applications included developing chatbot architectures for home blood pressure monitoring [ ], creating cuffless blood pressure measurement systems using wearable biosignals [ ], and assessing real-time disease severity in sickle cell anemia [ ]. LLMs also demonstrated value in detecting emergencies such as hypoglycemic episodes in diabetic patients and guiding appropriate actions [ ]. Additional applications encompassed monitoring tools for inflammatory bowel disease management [ ], personalized reminders for diabetes prevention [ ], and comprehensive health parameter tracking with feedback based on patient-shared data [ ]. These implementations highlight the potential of LLMs to enhance patient self-management through continuous monitoring and timely intervention guidance.Emotional Support and Therapeutic Conversations
Four studies (n=4) explored the role of LLMs in providing emotional support and engaging in therapeutic conversations for patients managing chronic diseases. The review identified several key applications, including investigating LLM chatbots for mental health support with tailored recommendations addressing specific stressors [
], evaluating ChatGPT as a virtual health coach identifying barriers to behavior change [ ], assessing GPT-3.5’s emotional support capabilities in proactive diabetes prevention [ ], and examining ChatGPT’s ability to provide coping strategies for diabetic patients [ ].Practitioner-Centered Tasks
Clinical Decision Support
Eight studies (n=8) investigated the use of LLMs for clinical decision support. The review identified several key applications, including generating personalized medical reports with treatment options and diagnostic procedures for conditions like diabetic retinopathy [
] and inflammatory bowel disease [ ], assessing LLMs’ diagnostic accuracy compared with human experts in rare and complex diseases [ ], and exploring potential use for electronic health record inbox management [ ]. Other applications included using GPT to discover and rank novel relationships between aspects of chronic lower back pain [ ], evaluating diabetes management plans using GPT-4 [ ], disease classification and prognosis [ ], and evaluating LLMs’ competency in answering diabetes-related exam questions for physician training [ ].Medical Predictions
Six studies (n=6) explored the predictive capabilities of LLMs in chronic disease management. The review identified several key applications, including predicting diabetic retinopathy risk [
], developing ChatGPT-assisted machine learning models for chronic disease classification [ ], and integrating multimodal data from electronic health records and laboratory tests to predict new-onset type 2 diabetes [ , ]. Additional applications included creating an EHR-based prediction platform for diabetes, heart disease, and hypertension [ ] and implementing integrated AI systems for diabetes risk assessment in primary care settings [ ].LLMs in chronic disease management are predominantly utilized for patient education and information provision, accounting for (18/29) of reported applications. Self-management and disease monitoring and clinical decision support each account for 28% (8/29) of applications. Diagnosis and treatment tasks, along with medical predictions, both constitute 21% (6/29) of applications, while emotional support and therapeutic conversations account for 14% (4/29). Percentages exceed 100% due to thematic overlaps where individual studies addressed multiple tasks.
LLMs in chronic disease management are predominantly utilized for patient education and information provision, accounting for (18/29) of reported applications. Self-management and disease monitoring and clinical decision support each account for 28% (8/29) of applications. Diagnosis and treatment tasks, along with medical predictions, both constitute 21% (6/29) of applications, while emotional support and therapeutic conversations account for 14% (4/29). Percentages exceed 100% due to thematic overlaps where individual studies addressed multiple tasks.
Objective 2: Challenges Associated With Using LLMs for Chronic Disease Management
The challenges identified and presented in
were categorized as follows.Inaccurate and Inconsistencies in Responses
Eighteen studies (n=18) highlighted issues with hallucinations, diagnostic errors, and unreliable outputs [
, , , - , - , - , , , - ]. The review identified several key challenges, including hallucinations where models like ChatGPT and LLaMA generate reasonable but factually incorrect information [ , ], diagnostic errors in conditions ranging from depression to inflammatory bowel disease [ , , ], and inconsistent responses to identical prompts without indicating uncertainty levels [ , , ]. In addition, models demonstrated limited understanding of complex medical records [ ], struggled with regional variations in medical practice [ ], provided insufficient elaboration on medical treatments [ ], and showed difficulty distinguishing medical terminologies [ ]. Further challenges included lower reliability in the Spanish language for specific chronic conditions like heart failure and chronic kidney disease [ ] as well as limited generalizability due to restricted training populations [ , ]. These inaccuracies stem from erroneous input data, such as incomplete or incorrect test results [ , , , , ].Limited Datasets and Knowledge
Six studies (n=6) identified challenges related to limited datasets and knowledge cutoffs in LLM applications for chronic disease management. The review highlighted several key limitations, including scarcity of disease-specific datasets [
, ], dataset imbalances affecting predictions for conditions like hypertension [ ], and knowledge limitations that restrict LLM awareness of the current medical guidelines [ , , ].Computational and Technical Challenges
Six studies (n=6) highlight significant computational and technical challenges in deploying LLMs for chronic disease management. The review identified several key limitations, including resource-intensive processing that results in prolonged training time in resource-constrained environments [
]. In addition, technical challenges include integrating multimodal data from clinical notes and laboratory results [ , ], ensuring model explainability for early disease prediction [ ], and handling noisy user-generated content in mental health applications [ , ]. Further challenges involve difficulties in integrating LLMs into clinical workflows [ ] and managing complex clinical judgments, such as medication adjustments and treatment modifications [ ].Usability and Accessibility Concerns
Nine studies (n=9) identified usability and accessibility concerns surrounding LLMs in chronic disease management tasks. Notably, the restriction to textual inputs limits use for tasks involving multimodal diagnostic tasks [
, ] language and cultural misalignments [ , ], while age-inappropriate outputs pose challenges for pediatric care [ ]. In addition, poor interpretability of model predictions for early disease prediction and risk assessment [ , , ]. Additional challenges included a lack of empathy and ineffectiveness in emergencies [ , ], digital literacy gaps restricting adoption among older adults [ , ]. Furthermore, these studies also noted how insufficient transparency in model decision-making processes hindered trust and clinical acceptance [ , ].LLM Evaluation
Five studies (n=5) noted challenges involving LLM evaluation. Notable challenges identified included that automated evaluation metrics primarily focus on predictive performance and fail to assess the impact on patient treatment outcomes [
], difficulties in achieving consensus among human evaluators when assessing LLM outputs [ ], discrepancies between model performance in test environments versus real-world applications [ ], difficulties in consistently evaluating language models across different diabetes-related tasks [ ], and significant variations in age-appropriateness scoring between different LLM platforms [ ].Legal, Ethical, Privacy, and Regulatory Concerns
Ten studies (n=10) identified legal, ethical, privacy, and regulatory challenges of using LLMs in chronic disease management. The review highlighted several critical concerns, including privacy and data security vulnerabilities [
, , , , , ], absence of regulatory approval and standardized guidelines [ ], and compliance issues with health care laws across different jurisdictions [ , ]. In addition, language and cultural barriers posed additional challenges, particularly for non-English speakers [ , ], while bias and equity issues stemming from limited training data diversity raised concerns about health care disparities [ , , ]. Studies also noted ethical challenges around accountability for errors [ ], lack of transparency in decision-making [ , ], and limitations in addressing complex ethical dilemmas in clinical care [ , ].The most prevalent challenge identified was inaccurate and inconsistent responses, reported in 62% (18/29) of studies. Legal, ethical, privacy, and regulatory concerns followed, appearing in 35% (10/29) of studies. Usability and accessibility issues were noted in 31% (9/29) of studies. Computational and technical limitations, as well as dataset and knowledge constraints, were each reported in 21% (6/29) of studies. Additionally, 17% (5/29) of studies highlighted limitations in evaluation methodologies.
Discussion
Principal Findings
This scoping review presents 3 significant findings from the 29 included studies (n=29): (1) LLMs are mostly used for both patient-centered (18/29, 62%) and practitioner-centered (11/29, 38%) tasks, with patient education and information provision emerging as the major application (18/29, 62%); (2) despite promising applications, significant challenges still exist, particularly regarding LLM response accuracy (18/29, 62% of studies), ethical concerns (10/29, 35%), and usability issues (9/29, 31%); and (3) methodological quality varies considerably across studies, with journal articles demonstrating higher quality (13/18, 72%) compared with conference papers (3/8, 38%) and preprints (1/3, 33%). These findings highlight both the considerable promise and significant limitations of current LLM applications in chronic disease management, which are examined in detail below.
Chronic Disease Management Tasks
Chronic disease management is an approach to managing chronic illnesses involving screenings, regular check-ups, monitoring, coordination of treatment, medication adherence, lifestyle modifications, and patient education [
]. The findings from this scoping review reveal an increasing interest in leveraging LLMs like ChatGPT to support both patient-centered and practitioner-centered tasks [ , , - , , ].Patient-Centered Tasks
The majority of the studies (18/29, 62%) focused on patient-centered tasks, reflecting the emphasis on patient active engagement in chronic disease management [
, ]. Patients’ active engagement enables them to monitor their symptoms, disease progress, weight, and adverse drug effects, and adhere to medication and visits [ , ]. This review found that LLMs support various patient-centered applications, including patient education and information provision, disease monitoring and self-management, emotional support and therapeutic conversations, and diagnosis and treatment assistance.Patient education and information provision emerged as the most prominent application (18/29, 62%), with LLMs providing health information about conditions, treatment plans, and medication schedules [
, , , , , , , , , - ]. With the right health information, individuals with chronic diseases can easily self-manage their conditions. Diagnosis and treatment applications accounted for (6/29, 21%). Studies experimented using LLMs to suggest laboratory tests, for diagnosis, and for medication generation tailored to individual patient conditions [ , , , , , ]. However, using LLMs for diagnostic and treatment has been criticized due to concerns about hallucinations and misinterpretations of clinical guidelines [ , ], highlighting the need for continued research to ensure patient safety. Therefore, LLM usage should not replace health practitioners but instead serve as complementary tools.Self-management and disease monitoring applications (8/29, 28%) demonstrated how LLMs can facilitate home-based monitoring of various physiological parameters [
, - , , , , ]. Recent studies have highlighted that patient engagement with health monitoring technologies is crucial for improving health outcomes in chronic disease management [ ]. Studies have also shown that wearable technologies integrated with LLMs provide real-time patient-centered health data that can better inform self-management decision-making [ ]. However, ensuring consistent long-term engagement is still a challenge.Emotional support and therapeutic conversations (4/29, 14%) represented an emerging application area [
, , , ]. Studies showed that LLMs can provide psychological support through tailored recommendations addressing specific stressors [ ], identifying barriers to behavior change [ ], and offering coping strategies for patients with diabetes [ ]. Emotional support is increasingly recognized as essential in chronic disease management [ ], which helps patients overcome psychological barriers to treatment adherence and lifestyle modifications.Practitioner-Centered Tasks
Practitioner-centered tasks (11/29, 38%) mainly revolved around clinical decision support and medical predictions. Clinical decision support applications (8/29, 28%) provided health care practitioners with actionable information to enhance decision-making. Studies demonstrated that LLMs can generate personalized medical reports, generate treatment recommendations, and support diagnostic processes that assist health care specialists in making informed decisions [
, ]. However, while LLMs enhance diagnostic efficiency, concerns regarding inconsistent outputs pose barriers to clinical adoption. Medical prediction applications account for (7/29, 24%) of LLM use in chronic disease management, showing strong potential for early disease detection and risk stratification. By integrating structured and unstructured clinical data, such as lab results, clinical notes, and imaging, LLMs enable more comprehensive and accurate predictive models compared with traditional methods [ , , , - ].Notably, real-time risk assessment tools, like the ambulatory device developed for sickle cell anemia management [
], demonstrate how LLMs can predict complications before symptoms appear. However, challenges relating to medical accuracy still limit their seamless integration into clinical workflows [ , , , - , ]. A significant emerging trend is the development of retrieval-augmented generation (RAG) frameworks that enhance prediction accuracy by dynamically incorporating up-to-date medical knowledge [ , , - ]. Future advancements should focus on seamless integration of LLMs with existing medical systems, wearable health technologies, and mobile health applications to improve interoperability and trustworthiness.Methodological Quality and Strength of Evidence
The methodological quality assessment revealed notable trends that should be considered while interpreting the results of this scoping review. High-quality studies (18/29, 62%) were not uniformly distributed across LLM application tasks, studies examining medical prediction applications [
, , , ] and patient education and information provision [ , , , ] had significantly stronger methodological rigor. Studies investigating emotional support showed mixed quality, with some high-quality qualitative research [ , ], a quantitative descriptive study [ ] alongside a moderate-quality mixed study [ ]. Notably, self-management and disease monitoring studies showed significant methodological heterogeneity, with quality ratings ranging from low to high.Journal articles demonstrated a substantially higher proportion of high-quality studies (13/18, 72%) compared with conference papers (3/8, 38%) and preprints (1/3, 33%), suggesting that peer-review processes enhance methodological quality. The identified common limitations, including inadequate sampling strategies, limited participant demographic reporting, and insufficient methodological transparency, were more prevalent in lower-tier publication sources, including conference papers and preprints. Quantitative descriptive studies, especially those focused on system design and prototype testing (10/29, 34.5%), showed mixed quality ratings ranging from low to high, with a common limitation being the use of synthetic data, lack of clinical validation, as they commonly prioritized technical utility. These methodological patterns significantly influence the reliability of the findings.
Challenges and Corresponding Recommendations
This section discusses the key challenges identified in the review and presents corresponding recommendations to mitigate these issues. Each challenge is followed by practical solutions to enhance the applicability of LLMs in chronic disease management.
Inaccurate Data and Inconsistencies in Responses
Inaccurate and inconsistent responses emerged as a significant challenge (18/29, 62%), primarily due to poor data quality, inherent biases in training datasets, and the limited scope of knowledge constrained by the model’s training cutoff [
, , , ]. Given that LLM performance is intrinsically linked to data quality, flawed datasets inevitably propagate errors in outputs, a manifestation of the “garbage in, garbage out” principle [ - ]. The issue of biases in AI models has gathered significant attention in recent literature [ - ], prompting possible migration strategies [ - ]. These limitations carry critical implications for health care, as erroneous LLM outputs may lead to incorrect clinical decisions, posing significant risks to patient safety [ , ]. Hence, researchers are exploring technical solutions such as advancing domain-specific fine-tuning techniques [ , ], leveraging retrieval-augmented generation (RAG) frameworks [ , - ], and refining outputs through reinforcement learning (RL) and prompt engineering techniques [ - ]. In addition, implementing expert validation protocols has emerged as a crucial safeguard to ensure adherence to evidence-based practice.Limited Datasets
The scarcity of high-quality datasets for chronic diseases accounted for (6/29, 21%). Studies highlighted limitations and narrow coverage of publicly accessible clinical training datasets due to data privacy and institutional restrictions [
, , - , ]. Given that experimental studies often require substantial model training datasets, the absence of adequate data poses a significant challenge to the effectiveness and success of these studies. Therefore, to address this gap, synthetic datasets and data augmentation techniques have been explored by studies [ , ]. However, these methods risk amplifying pre-existing biases in source data [ - ]. Therefore, dataset validation is essential to ensure quality, collaborative partnerships with health care institutions to access real-world clinical datasets and knowledge distillation techniques, where smaller models can be trained on outputs from larger, clinically validated models, reducing dependency on raw data volume, can be explored [ ].Computational Resources and Technical Challenges
The computational demands of training LLMs for health care applications remain a significant challenge (6/29, 21%). Consequently, low computing resources approaches, such as quantization, parameter-efficient fine-tuning (PEFT) techniques like Low-Rank Adaptation (LORA), Quantized LoRA (QLORA), Weight-Decomposed Low-Rank Adaptation, and REFT [
- ], are evolving and are popularly used in fine-tuning LLMs in low-resource computing environments. In addition, adapter-based tuning methods provide lightweight alternatives by injecting trainable adapter layers into frozen pretrained models, enabling task-specific fine-tuning without updating the entire model parameters [ , ]. Building on these advances, the development of lightweight LLMs optimized for mobile devices presents a promising direction for extending AI-based chronic disease management to resource-constrained settings [ ].Usability and Accessibility Concerns
Usability and accessibility concerns accounted for (9/29, 31%), including issues with text-only interfaces for some LLMs, cultural misalignments, and outputs ill-suited for pediatric or elderly populations [
, , ]. While studies highlighted text-only interfaces as a critical limitation of LLMs in health care [ , ], recent advances in multimodal architectures have addressed this gap. These models now integrate and interpret diverse data modalities, including medical images, audio, and structured documents, while generating composite textual and visual outputs [ , - ]. In addition, having dynamic and simplified interfaces to accommodate low-digital-literacy users, pediatric users, and for different cultural and language settings could further improve LLM usability and adaptation.Legal, Ethical Issues, and Regulatory Issues
Legal, ethical, and regulatory concerns (10/29, 35%) remain a key issue in LLM adoption in health care. Studies identified data privacy, biases, misinformation, responsibility, and accountability for LLM-generated content as key concerns. Although several AI frameworks have been proposed, the absence of standardized guidelines and regulatory approvals creates significant gaps [
, - ]. This regulatory vacuum risks inconsistent model development and validation practices, unaddressed ethical dilemmas regarding accountability for AI-generated recommendations, and potential mismatches between rapidly evolving LLM capabilities and static health care regulations [ , ]. Addressing these ethical concerns requires standard guidelines and rules to ensure responsible use in health care settings[ ]. Therefore, future efforts must prioritize the development of a comprehensive regulatory framework.LLM Evaluation
Evaluation gaps accounted for (5/29, 17%), reflecting critical shortcomings in current methodologies for assessing LLM performance in clinical contexts. The existing automated evaluation metrics mainly focus on predictive performance using metrics such as accuracy and F1-scores that lack medical and treatment knowledge [
, ]. These metrics may produce misleading conclusions if not appropriately selected [ ]. Furthermore, human evaluation, although valuable, introduces subjectivity and interrater variability, yet the absence of a standardized LLM evaluation framework makes attaining consensus among human raters challenging [ , ]. A promising direction involves a hybrid evaluation approach that integrates human expert reviews with automated metrics. Future efforts should prioritize the development of standardized LLM evaluation frameworks tailored to health care settings. summarizes the challenges associated with applying LLMs in chronic disease management with proposed recommendations.Challenge | Key observations | Recommendation |
Inaccurate and inconsistent responses | Hallucinations, inconsistent responses, outdated knowledge | Adopting RAG frameworks, fine-tuning, and prompt engineering. RL, expert validation of LLM-generated recommendations |
Limited datasets | Scarcity of datasets, missing data, and dataset imbalances. | Use synthetic data, data augmentation, partnerships with health care institutions, knowledge distillation |
Computational demands | High computational demands | Adopt PEFT (LORA and QLORA), quantization use lightweight models for mobile devices |
Ethical and privacy concerns | Privacy concerns, language and cultural barriers, and lack of regulatory oversight | Develop a regulatory framework |
Usability issues | Restriction to textual inputs, lack of empathy, ineffectiveness in emergencies Age-appropriate interaction | Use multimodal LLMs. Dynamic interfaces to accommodate low-digital-literacy users, age-appropriate interaction modes customize to different cultural settings Integrate with wearable devices and mobile health apps |
Evaluation challenges | Predictive performance metrics, subjectivity, and interrater variability | Develop a standardized LLM evaluation framework for health care. |
Limitations of the Study
Although quality assessment was conducted using MMAT, studies were not excluded based on their methodological rigor. As a result, including moderate and low-quality studies may have influenced the reliability and consistency of the reviewer’s findings. The varied methodological designs across studies may have affected the interpretation of results and conclusions drawn. Furthermore, the review was limited to only English-language publications, which may have introduced language bias and limited the generalizability of our findings, particularly in contexts where LLMs are being adapted to local languages or integrated into culturally specific health care practices. The exclusion of databases such as Embase and Web of Science may have limited the comprehensiveness of the search. Future research can consider broader database coverage and non-English sources to enhance diversity and scope.
Implications for Practice and Future Research
This scoping review reveals several critical implications for the integration of LLMs in chronic disease management. First, it is essential to address the accuracy issues identified in 18/29, 62% of studies. This calls for both technical solutions (domain-specific fine-tuning, reinforcement learning (RL), and retrieval-augmented generation (RAG) frameworks [
, ]) and nontechnical solutions (expert validation and collaborative partnerships with health care institutions to access and use real-world clinical data). Second, enhancing accessibility across diverse patient populations requires developing culturally adapted LLM interfaces, implementing age-appropriate interaction modes [ ], and integrating with low-resource platforms such as SMS-based systems and lightweight mobile apps for various populations [ ]. Third, robust governance frameworks must be established to address the ethical, legal, and privacy concerns noted in 10/29, 35% of studies to ensure regulatory compliance.Finally, future research should focus on multimodal LLMs that can synthesize diverse data inputs from wearable devices and mobile health applications for holistic patient monitoring [
, , ] and develop resource-efficient deployment techniques. The predominance of diabetes-focused studies (14/29, 48%) highlights a potential research gap in addressing other prevalent chronic conditions. Similarly, there is a research gap in age-inclusive LLM design, given the overwhelming focus on adult populations (28/29, 96%) and the lack of pediatric studies. Addressing these gaps would enhance the clinical relevance and equitable application of LLMs across the full spectrum of chronic disease management.Conclusion
This scoping review highlights the growing potential of LLMs in supporting chronic disease management through patient education, diagnosis and treatment, emotional support, self-management support, decision support, and prediction tasks. While LLMs offer promising capabilities, their effective integration into health care still requires addressing key challenges related to accuracy, accessibility, usability, and ethical and privacy concerns. Future research should focus on integrating LLMs with mobile and wearable technologies, creating culturally and age-appropriate interfaces, and exploring integration with low-resource platforms. Addressing these research gaps will ensure equitable and safe use of LLMs across diverse health care settings.
Acknowledgments
The authors acknowledge the valuable inputs of the reviewers and the editor, and the support of Mr. Bazekketa Datson of Ndejje University, Faculty of Science and Computing, for illustrating the figures used in this scoping review. The authors did not receive any funding for this review study. During the preparation of this work, the authors used Grammarly to improve the readability and language of the paper. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.
Authors' Contributions
SHM contributed to conceptualization, methodology, data extraction and analysis, writing-original draft, and writing-review and editing. OJ managed conceptualization, methodology, data extraction and analysis, and writing-review and editing. HKN handled data extraction and analysis and review and editing. CGO, PS, PW, and NK contributed to review and critical input.
Conflicts of Interest
None declared.
Detailed methodological quality appraisal results for each included study using the Mixed Methods Appraisal tool (MMAT).
DOCX File, 78 KBPRISMA-ScR checklist.
PDF File, 154 KBReferences
- Bardhan I, Chen H, Karahanna E. Connecting systems, data, and people: a multidisciplinary research roadmap for chronic disease management. MIS Q. Mar 1, 2020;44(1):185-200. [CrossRef]
- Hacker K. The burden of chronic disease. Mayo Clinic Proceedings: Innovations, Quality & Outcomes. Feb 2024;8(1):112-119. [CrossRef]
- Holmen H, Larsen MH, Sallinen MH, et al. Working with patients suffering from chronic diseases can be a balancing act for health care professionals - a meta-synthesis of qualitative studies. BMC Health Serv Res. Feb 10, 2020;20(1):98. [CrossRef] [Medline]
- Xie Y, Lu L, Gao F, et al. Integration of artificial intelligence, blockchain, and wearable technology for chronic disease management: a new paradigm in smart healthcare. CURR MED SCI. Dec 2021;41(6):1123-1133. [CrossRef] [Medline]
- Jiménez-Muñoz L, Gutiérrez-Rojas L, Porras-Segovia A, Courtet P, Baca-García E. Mobile applications for the management of chronic physical conditions: a systematic review. Intern Med J. Jan 2022;52(1):21-29. [CrossRef] [Medline]
- Montagna S, Ferretti S, Klopfenstein LC, Florio A, Pengo MF. Data decentralisation of LLM-based chatbot systems in chronic disease self-management. GoodIT ’23: Proceedings of the 2023 ACM Conference on Information Technology for Social Good. Sep 6, 2023:205-212. [CrossRef]
- Haque A, Chowdhury M, Soliman H. Transforming chronic disease management with chatbots: key use cases for personalized and cost-effective care. 2023. Presented at: 2023 Sixth International Symposium on Computer, Consumer and Control (IS3C); Jun 30 to Jul 3, 2023:367-370; Taichung, Taiwan. [CrossRef]
- Reddy S. Evaluating large language models for use in healthcare: a framework for translational value assessment. Informatics in Medicine Unlocked. 2023;41(May):101304. [CrossRef]
- Huang X, Lian J, Lei Y, Yao J, Lian D, Xie X. Recommender AI agent: integrating large language models for interactive recommendations. arXiv. Preprint posted online on Jan 30, 2024. URL: https://arxiv.org/abs/2308.16505 [Accessed 2025-08-24]
- Snoswell CL, Snoswell AJ, Kelly JT, Caffery LJ, Smith AC. Artificial intelligence: augmenting telehealth with large language models. J Telemed Telecare. Jan 2025;31(1):150-154. [CrossRef] [Medline]
- Khan MA, Mohammad N. A review on large language models: architectures, applications, taxonomies, open issues and challenges. TechRxiv. Preprint posted online on Sep 27, 2023. [CrossRef]
- Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: development, applications, and challenges. Health Care Sci. Aug 2023;2(4):255-263. [CrossRef] [Medline]
- Sharma P, Parasa S. ChatGPT and large language models in gastroenterology. Nat Rev Gastroenterol Hepatol. Aug 2023;20(8):481-482. [CrossRef] [Medline]
- Karabacak M, Margetis K. Embracing large language models for medical applications: opportunities and challenges. Cureus. May 2023;15(5):e39305. [CrossRef] [Medline]
- Latif A, Kim J. Evaluation and analysis of large language models for clinical text augmentation and generation. IEEE Access. Apr 2024;12:1-1. [CrossRef]
- Yang H, li J, liu S, Liu J. Exploring the potential of large language models in personalized diabetes treatment strategies. In Review. Preprint posted online on 2023. [CrossRef]
- Raghu K, S T, S Devishamani C, M S, Rajalakshmi R, Raman R. The utility of ChatGPT in diabetic retinopathy risk assessment: a comparative study with clinical diagnosis [response to letter]. Clin Ophthalmol. 2024;18:313-314. [CrossRef] [Medline]
- Song I, Pendse SR, Kumar N. The typing cure: experiences with large language model chatbots for mental health support. arXiv. Preprint posted online on May 9, 2025. URL: https://arxiv.org/abs/2401.14362
- Liu Z, Chen C, Cao J, et al. Large language models for cuffless blood pressure measurement from wearable biosignals. BCB ’24: Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. :1-11. [CrossRef]
- Ogundare O, Sofolahan S. Large language models in ambulatory devices for home health diagnostics: a case study of sickle cell anemia management. arXiv. Preprint posted online on May 5, 2023. [CrossRef]
- Cankurtaran RE, Polat YH, Aydemir NG, Umay E, Yurekli OT. Reliability and usefulness of ChatGPT for inflammatory bowel diseases: an analysis for patients and healthcare professionals. Cureus. Oct 2023;15(10):e46736. [CrossRef] [Medline]
- Wang X, Liu K, Wang C. Knowledge-enhanced pre-training large language model for depression diagnosis and treatment. Presented at: 2023 IEEE 9th International Conference on Cloud Computing and Intelligent Systems (CCIS); Aug 12-13, 2023:532-536; Dali, China. [CrossRef]
- Abdullahi T, Singh R, Eickhoff C. Learning to make rare and complex diagnoses with generative AI assistance: qualitative study of popular large language models. JMIR Med Educ. Feb 13, 2024;10:e51391. [CrossRef] [Medline]
- Al-Anezi FM. Exploring the use of ChatGPT as a virtual health coach for chronic disease management. Learn Health Syst. Jul 2024;8(3):e10406. [CrossRef] [Medline]
- Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. Oct 2, 2018;169(7):467-473. [CrossRef] [Medline]
- Hong QN, Fàbregues S, Bartlett G, et al. The Mixed Methods Appraisal Tool (MMAT) version 2018 for information professionals and researchers. EFI. 2018;34(4):285-291. [CrossRef]
- Athavale A, Baier J, Ross E, Fukaya E. The potential of chatbots in chronic venous disease patient management. JVS Vasc Insights. 2023;1:100019. [CrossRef] [Medline]
- Soto-Chávez MJ, Bustos MM, Fernández-Ávila DG, Muñoz OM. Evaluation of information provided to patients by ChatGPT about chronic diseases in Spanish language. Digit Health. 2024;10:20552076231224603. [CrossRef] [Medline]
- Abbas S, Iftikhar M, Shah MM, Khan SJ. ChatGPT-assisted machine learning for chronic disease classification and prediction: a developmental and validation study. Cureus. Dec 2024;16(12):e75851. [CrossRef] [Medline]
- Anderson P, Lin D, Davidson J, et al. Bridging domains in chronic lower back pain: large language models and ontology-driven strategies for knowledge graph construction. bioRxiv. Preprint posted online on Mar 14, 2024. [CrossRef]
- Ding JE, Thao PNM, Peng WC, et al. Large language multimodal models for 5-year chronic disease cohort prediction using EHR data. arXiv. Preprint posted online on Aug 29, 2024. URL: https://arxiv.org/abs/2403.04785
- Jairoun AA, Al-Hemyari SS, Shahwan M, Al-Qirim T, Shahwan M. Benefit-risk assessment of ChatGPT applications in the field of diabetes and metabolic illnesses: a qualitative study. Clin Med Insights Endocrinol Diabetes. 2024;17:11795514241235514. [CrossRef] [Medline]
- Mondal A, Naskar A. Artificial intelligence in diabetes care: evaluating GPT-4’s competency in reviewing diabetic patient management plan in comparison to expert review. Endocrinology (including Diabetes Mellitus and Metabolic Disease). :2024. [CrossRef]
- Liu H, Zhang W, Xie J, et al. Few-shot learning for chronic disease management: leveraging large language models and multi-prompt engineering with medical knowledge injection. Proceedings of the 58th Hawaii International Conference on System Sciences. 2025. [CrossRef] [Medline]
- Liao C, Kuo WT, Hu IH, et al. EHR-based mobile and web platform for chronic disease risk prediction using large language multimodal models. CIKM ’24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. Oct 21, 2024:5244-5248. [CrossRef]
- Ding JE, Thao PNM, Peng WC, et al. Large language multimodal models for new-onset type 2 diabetes prediction using five-year cohort electronic health records. Sci Rep. Sep 6, 2024;14(1):20774. [CrossRef] [Medline]
- Dao D, Teo JYC, Wang W, Nguyen HD. LLM-powered multimodal AI conversations for diabetes prevention. 2024. Presented at: ICMR ’24: International Conference on Multimedia Retrieval:1-6; Phuket, Thailand. URL: https://dl.acm.org/doi/proceedings/10.1145/3643479 [Accessed 2025-09-02]
- Khan M. Assessing the efficacy of ChatGPT in facilitating self-management strategies among diabetic patients section. European Chemical Bulletin. 2023;12(10):10490-10502. URL: https://www.researchgate.net/publication/373420033_Assessing_the_Efficacy_of_ChatGPT_in_Facilitating_Self-Management_Strategies_among_Diabetic_Patients_Section_A-Research_paper_10490_Eur [Accessed 2025-09-02]
- Mondal H, Dash I, Mondal S, Behera JK. ChatGPT in answering queries related to lifestyle-related diseases and disorders. Cureus. Nov 2023. [CrossRef]
- Young CC, Enichen E, Rao A, et al. Pilot study of large language models as an age-appropriate explanatory tool for chronic pediatric conditions. medRxiv. Preprint posted online on Aug 7, 2024. [CrossRef]
- Li J, Guan Z, Wang J, et al. Integrated image-based deep learning and language models for primary diabetes care. Nat Med. Oct 2024;30(10):2886-2896. [CrossRef] [Medline]
- Ying Y, Wang Y, Yuan S, et al. Exploration of ChatGPT application in diabetes education based on multi-dataset. medRxiv. Preprint posted online on Sep 27, 2023. [CrossRef]
- Li H, Jiang Z, Guan Z, et al. Large language models for diabetes training: a prospective study. Sci Bull Sci Found Philipp. Mar 2025;70(6):934-942. [CrossRef]
- Hussain W, Grundy J. Advice for diabetes self-management by ChatGPT models: challenges and recommendations. arXiv. Preprint posted online on Jan 14, 2025. URL: https://arxiv.org/abs/2501.07931
- Wang D, Liang J, Ye J, et al. Enhancement of the performance of large language models in diabetes education through retrieval-augmented generation: comparative study. J Med Internet Res. Nov 8, 2024;26:e58041. [CrossRef] [Medline]
- Eaton C, Vallejo N, McDonald X, et al. User engagement with mHealth interventions to promote treatment adherence and self-management in people with chronic health conditions: systematic review. J Med Internet Res. Sep 24, 2024;26:e50508. [CrossRef] [Medline]
- Cheikh-Moussa K, Mira JJ, Orozco-Beltran D. Improving engagement among patients with chronic cardiometabolic conditions using mHealth: critical review of reviews. JMIR Mhealth Uhealth. Apr 8, 2020;8(4):e15446. [CrossRef] [Medline]
- Mattison G, Canfell O, Forrester D, et al. The influence of wearables on health care outcomes in chronic disease: systematic review. J Med Internet Res. Jul 1, 2022;24(7):e36690. [CrossRef] [Medline]
- Laranjo L, Dunn AG, Tong HL, et al. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. Sep 1, 2018;25(9):1248-1258. [CrossRef] [Medline]
- Lewis P, Perez E, Pictus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv. Preprint posted online on Apr 12, 2021. [CrossRef]
- Zakka C, Chaurasia A, Shad R, et al. Almanac: retrieval-augmented language models for clinical medicine. Res Sq. May 2, 2023:rs.3.rs-2883198. [CrossRef] [Medline]
- Wang Y, Ma X, Chen W. Augmenting black-box llms with medical textbooks for clinical question answering. arXiv. Preprint posted online on Feb 23, 2025. [CrossRef]
- Xiong G, Jin Q, Lu Z, Zhang A. Benchmarking retrieval-augmented generation for medicine. Findings of the Association for Computational Linguistics ACL 2024. 2024:6233-6251. [CrossRef]
- Li H, Moon JT, Purkayastha S, Celi LA, Trivedi H, Gichoya JW. Ethics of large language models in medicine and medical research. Lancet Digit Health. Jun 2023;5(6):e333-e335. [CrossRef] [Medline]
- Navigli R, Conia S, Ross B. Biases in large language models: origins, inventory, and discussion. J Data and Information Quality. Jun 30, 2023;15(2):1-21. [CrossRef]
- Liang PP, Wu C, Morency LP, Salakhutdinov R. Towards understanding and mitigating social biases in language models. arXiv. Preprint posted online on Jun 24, 2021. [CrossRef] [Medline]
- Verma S, Ernst M, Just R. Removing biased data to improve fairness and accuracy. arXiv. Preprint posted online on Feb 5, 2021. [CrossRef]
- Kim E, Lee J, Choo J. BiaSwap: removing dataset bias with bias-tailored swapping augmentation. 2021. Presented at: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); Oct 10-17, 2021:14972-14981; Montreal, QC, Canada. [CrossRef]
- Ernst JS, Marton S, Brinkmann J, et al. Bias mitigation for large language models using adversarial learning. 2023. Presented at: CEUR Workshop on Fairness and Bias in AI; Krakow, Poland. URL: https://ceur-ws.org/Vol-3523/paper11.pdf [Accessed 2025-08-24]
- Li Y, Li Z, Zhang K, Dan R, Jiang S, Zhang Y. ChatDoctor: a medical chat model fine-tuned on a Large Language Model Meta-AI (LLaMA) using medical domain knowledge. Cureus. Jun 2023;15(6):e40895. [CrossRef] [Medline]
- Alghanmi I, Espinosa-Anke L, Schockaert S. Self-supervised intermediate fine-tuning of biomedical language models for interpreting patient case descriptions. Proceedings of the 29th International Conference on Computational Linguistics. 2022. URL: https://aclanthology.org/2022.coling-1.123/ [Accessed 2025-09-02]
- Heston TF, Khun C. Prompt engineering in medical education. IME. 2023;2(3):198-205. [CrossRef]
- Meskó B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J Med Internet Res. Oct 4, 2023;25:e50638. [CrossRef] [Medline]
- Wang J, Shi E, Yu S, et al. Prompt engineering for healthcare: methodologies and applications. arXiv. Preprint posted online on Apr 23, 2024. [CrossRef]
- Sarker S, Qian L, Dong X. Medical data augmentation via chatgpt: a case study on medication identification and medication event classification. arXiv. Preprint posted online on Jun 10, 2023. URL: https://arxiv.org/abs/2306.07297
- Yang C, Zhu Y, Lu W, et al. Survey on knowledge distillation for large language models: methods, evaluation, and application. ACM Trans Intell Syst Technol. 2024. [CrossRef]
- Liao B, Meng Y, Monz C. Parameter-efficient fine-tuning without introducing new latency. 2023. Presented at: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1:4242-4260. [CrossRef]
- Ding N, Qin Y, Yang G, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat Mach Intell. 2023;5(3):220-235. [CrossRef]
- Liu H, Tam D, Muqeeth M, et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. arXiv. Preprint posted online on Aug 26, 2022. [CrossRef]
- Hu E, Shen Y, Wallis P, et al. LoRA: low-rank adaptation of large language models. arXiv. Preprint posted online on Oct 16, 2021. [CrossRef]
- Dettmers T, Pagnoni A, Holtzman A, Zettlemoyer L. QLoRA: efficient finetuning of quantized LLMs. arXiv. Preprint posted online on May 23, 2023. [CrossRef]
- Liu SY, Wang CY, Yin H, et al. DoRA: weight-decomposed low-rank adaptation. arXiv. Preprint posted online on Jul 9, 2024. [CrossRef]
- Wu Z, Arora A, Wang Z, et al. ReFT: representation finetuning for language models. arXiv. Preprint posted online on May 22, 2024. [CrossRef]
- Houlsby N, Giurgiu A, Jastrzebski S, et al. Parameter-efficient transfer learning for NLP. arXiv. Preprint posted online on Jun 13, 2019. [CrossRef]
- Li XL, Liang P. Prefix-tuning: optimizing continuous prompts for generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1). Preprint posted online on Jan, 2021. [CrossRef]
- Wang X, Dang T, Kostakos V, Jia H. Efficient and personalized mobile health event prediction via small language models. 2024. Presented at: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (MobiCom ’24):2353-2358. [CrossRef]
- Meskó B. The impact of multimodal large language models on health care’s future. J Med Internet Res. Nov 2, 2023;25:e52865. [CrossRef] [Medline]
- Belyaeva A, Cosentino J, Hormozdiari F, et al. Multimodal llms for health grounded in individual-specific data. arXiv. Preprint posted online on Jul 20, 2023. [CrossRef]
- Gemini Team Google. Gemini: a family of highly capable multimodal models. arXiv. Preprint posted online on May 9, 2025. [CrossRef]
- Ali H, Qadir J, Alam T, Househ M, Shah Z. ChatGPT and large language models in healthcare: opportunities and risks. 2023. Presented at: IEEE International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings). [CrossRef]
- Nasir S, Khan RA, Bai S. Ethical framework for harnessing the power of AI in healthcare and beyond. IEEE Access. 2024;12:31014-31035. [CrossRef]
- Goirand M, Austin E, Clay-Williams R. Implementing ethics in healthcare AI-based applications: a scoping review. Sci Eng Ethics. Sep 3, 2021;27(5):5. [CrossRef] [Medline]
- Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. Jul 6, 2023;6(1):120. [CrossRef] [Medline]
- Lareyre F, Raffort J. Ethical concerns regarding the use of large language models in healthcare. EJVES Vasc Forum. 2024;61:1. [CrossRef] [Medline]
- Reddy S, Rogers W, Makinen VP, et al. Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health Care Inform. Oct 2021;28(1):1-7. [CrossRef] [Medline]
Abbreviations
LLM: large language model |
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews |
RAG: retrieval augmented generation |
RL: reinforcement learning |
Edited by Alexandre Castonguay; submitted 26.Sep.2024; peer-reviewed by Ali Jafarizadeh, Soroosh Tayebi Arasteh; final revised version received 06.May.2025; accepted 23.May.2025; published 29.Sep.2025.
Copyright©Henry Mukalazi Serugunda, Ouyang Jianquan, Hasifah Kasujja Namatovu, Paul Ssemaluulu, Nasser Kimbugwe, Christopher Garimoi Orach, Peter Waiswa. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 29.Sep.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.