Background: Sensors are increasingly used in health interventions to unobtrusively and continuously capture participants’ physical activity in free-living conditions. The rich granularity of sensor data offers great potential for analyzing patterns and changes in physical activity behaviors. The use of specialized machine learning and data mining techniques to detect, extract, and analyze these patterns has increased, helping to better understand how participants’ physical activity evolves.
Objective: The aim of this systematic review was to identify and present the various data mining techniques employed to analyze changes in physical activity behaviors from sensors-derived data in health education and health promotion intervention studies. We addressed two main research questions: (1) What are the current techniques used for mining physical activity sensor data to detect behavior changes in health education or health promotion contexts? (2) What are the challenges and opportunities in mining physical activity sensor data for detecting physical activity behavior changes?
Methods: The systematic review was performed in May 2021 using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We queried the Association for Computing Machinery (ACM), IEEE Xplore, ProQuest, Scopus, Web of Science, Education Resources Information Center (ERIC), and Springer literature databases for peer-reviewed references related to wearable machine learning to detect physical activity changes in health education. A total of 4388 references were initially retrieved from the databases. After removing duplicates and screening titles and abstracts, 285 references were subjected to full-text review, resulting in 19 articles included for analysis.
Results: All studies used accelerometers, sometimes in combination with another sensor (37%). Data were collected over a period ranging from 4 days to 1 year (median 10 weeks) from a cohort size ranging between 10 and 11615 (median 74). Data preprocessing was mainly carried out using proprietary software, generally resulting in step counts and time spent in physical activity aggregated predominantly at the daily or minute level. The main features used as input for the data mining models were descriptive statistics of the preprocessed data. The most common data mining methods were classifiers, clusters, and decision-making algorithms, and these focused on personalization (58%) and analysis of physical activity behaviors (42%).
Conclusions: Mining sensor data offers great opportunities to analyze physical activity behavior changes, build models to better detect and interpret behavior changes, and allow for personalized feedback and support for participants, especially where larger sample sizes and longer recording times are available. Exploring different data aggregation levels can help detect subtle and sustained behavior changes. However, the literature suggests that there is still work remaining to improve the transparency, explicitness, and standardization of the data preprocessing and mining processes to establish best practices and make the detection methods easier to understand, scrutinize, and reproduce.
Wearable sensors are increasingly employed in health interventions because of their ability to track participants’ physical activity (PA) in an unobtrusive, continuous, and precise manner under free-living conditions . In the context of health promotion, sensor data are commonly used to objectively assess interventions by monitoring PA changes and progress toward compliance with public health PA guidelines [ ].
The rich data captured by activity sensors contain information about the participants’ PA, potentially unlocking valuable insights into PA behaviors and patterns . These insights can help to advance the understanding of how interventions affect PA behaviors and how behaviors change, thereby scaffolding the design of future interventions, and enhancing their outcomes, efficacy, and adherence.
In the last decade, a growing number of artificial intelligence and data mining models and techniques have been developed to detect and extract these latent PA patterns beyond the typical summaries of pre- and postintervention daily steps or time spent in various PA levels. In this systematic review, we aimed to describe the data mining models and techniques currently used to detect PA with a focus on behavior changes. We discuss their value, identify gaps or challenges, and highlight opportunities. The following research questions (RQs) guided this review:
RQ1: What are the current techniques used for mining PA sensor data to detect behavior changes in health education or health promotion contexts?
RQ1.1 What are the types of sensors used and what data are collected?
RQ1.2 How are data preprocessed?
RQ1.3 What features are used to detect behavior changes?
RQ1.4 What are the data mining models and techniques used to detect behavior changes?
RQ1.5 What are the interpretation of data mining models used for?
RQ2: What are the challenges and opportunities in mining PA sensor data for detecting PA behavior changes?
The RQ1 subquestions were established following the reasoning and order of the process of knowledge discovery in databases . summarizes this process and maps each step with the relevant RQ1 subquestion.
For this systematic review, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines  and used the Rayyan QCRI web application [ ] to manage the review process. We identified studies by searching the Association for Computing Machinery (ACM), IEEE Xplore, ProQuest, Scopus, Web of Science, Education Resources Information Center (ERIC), and Springer digital libraries. We also searched Google Scholar to identify grey literature and extracted the first 100 results. For this scholarly reference search, we used the following query: (education OR promotion OR “behaviour change”) AND (“data mining” OR “machine learning” OR “artificial intelligence”) AND (sensor OR accelerometer OR tracker OR wearable) AND “physical activity” AND health. All extracted scholarly references had been added to the database at the latest on the search day (May 28, 2021). The inclusion and exclusion criteria are presented in .
Inclusion and exclusion criteria for article selection in the review.
- Full-length articles
- Peer-reviewed articles in journals or conference papers
- Articles that used data mining techniques for data from physical activity (PA) wearable sensors
- Articles that included PA data
- Articles on applied health education/promotion or on behavior change scenarios
- Articles that used well-known data mining techniques such as classification, regression, clustering, association, and sequence algorithms, as well as specific algorithms to model PA data
- Use of analytics without data mining
- Studies on animals (eg, accelerometers on dogs)
- Self-quantification without a health education or health motivation component
- Dissertations and theses, due to lack of a peer review process
- Systematic reviews, reviews, and meta-analyses
- Health care applications without a health education or motivation for behavior change component
- Specific movement detection (abnormal gait, falls)
- Aid for sport training (eg, maintaining heart rate, postures, specific movements)
The number of references extracted from each electronic database is summarized in.
Following the PRISMA methodology, we retrieved 4388 references from the sources listed in. We then removed 415 duplicates, leaving 3973 unique references that were screened by reading their titles and abstracts. Using the inclusion/exclusion criteria ( ), we excluded 3688 references and selected 285 publications. After full-text reading, we excluded 266 references: 33 on activity recognition, 5 on data mining, 24 on systems, 31 on rehabilitation, 39 not on behavior changes, 54 without data mining, 51 not on health education/promotion, 13 not on PA, and 16 reviews. At the end of the selection process (summarized in ), we retained 19 references for this systematic review.
|Database||Query result, n|
|Web of Science||16|
aACM: Association for Computing Machinery.
bERIC: Education Resources Information Center.
The 19 included articles were published between 2013 and 2021. Their number per year increased from 1 in 2013 to 2 in 2017 and up to 5 in 2018. Subsequently, the number of publications decreased to a mean of 3 per year.
The selected articles were published in conferences and journals focused on five different themes (): medical and public health, medical and health informatics, human-computer interactions, physical human behavior, and engineering and science. The three most popular themes were medical and health informatics, human-computer interactions, and engineering and science (15/19, 79%). Among the included articles, four were published in JMIR publications: three in JMIR mHealth and uHealth and one in JMIR Public Health and Surveillance.
|Conference or journal||Reference|
|Medical and public health|
|BMJ Open||Aguilera et al |
|Public Health Nutrition||Lee et al |
|Medical and health informatics|
|JMIR mHealth and uHealth||Zhou et al , Rabbi et al [ ], Galy et al [ ]|
|JMIR Public Health and Surveillance||Fukuoka et al |
|Journal of Biomedical Informatics||Sprint et al |
|Proceedings of the ACM on Human-Computer Interaction||Zhu et al |
|User Modeling and User-Adapted Interaction||Gasparetti et al |
|Journal of Ambient Intelligence and Humanized Computing||Batool et al |
|Multimedia Tools and Applications||Angelides et al |
|Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization||Schäfer et al |
|Physical human behavior|
|Journal of Behavioral Medicine||Forman et al |
|Journal of Electromyography and Kinesiology||Hermens et al |
|Engineering and science|
|Applied Sciences||Chen et al |
|Sensors||Dijkhuis et al |
|Springer Proceedings in Complexity||Mollee et al |
|IEEE Access||Diaz et al |
|International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems||Mollee and Klein |
Sensor Types and Data Capture
The characteristics of the sensors (eg, number and type) used to capture PA behaviors and of the collected raw data are summarized in.
The length of data recordings varied between 4 days and 1 year, with a median of 70 days. Recording lasted ≤7 days in two studies, between 3 and 5 weeks in six studies, between 10 and 16 weeks in eight studies, and ≥6 months in three studies.
The number of participants varied between 10 and 11,615, with <30 in five studies, between 30 and 299 in 10 studies, and ≥300 participants in four studies.
All included studies used accelerometer sensors. We could categorize these devices into three groups: (1) commercial wrist-worn wearable accelerometers that are consumer-grade devices with a sample rate between 30 Hz and 60 Hz, such as Fitbits [, , , , ], Samsung Gear [ ], and Nokia [ ]; (2) smartphone accelerometers with a sample rate usually set to 50 Hz and up to 100 Hz, in which data were collected via an app installed in the smartphone [ , , , , ]; and (3) scientifically validated wearable accelerometers with a sample rate up to 100 Hz, such as ActiGraph [ ], GENEActiv [ , ], and other devices developed for health care [ , ].
In 7 out of the 19 (37%) selected studies, accelerometers were used with other sensors such as GPS tracking [, , ], compass position tracking [ , ], heart rate trackers [ , ], and smart scales [ , ].
The recorded raw data varied in function of the sensor characteristics, including sampling frequency, accuracy, and axis number. Moreover, other sensor features such as battery duration and storage capacity affected the recording length. For instance, a long battery life and high storage capacity enable longer recording without interruptions.summarizes the number of participants and data recording duration for the included studies.
|Sensor type||Device and model||Raw data||Reference|
|Accelerometer||ActiGraph GT1M uniaxial||Uniaxial accelerometry||Lee et al |
|Accelerometer||GENEActiv triaxial accelerometer||Gravity-subtracted signal vector magnitudes (SVMgs) per second||Galy et al , Diaz et al [ ]|
|Accelerometer||Generic device from the mobile phone||Acceleration (sample rate not specified)||Aguilera et al , Zhou et al [ ]|
|Accelerometer||Triaxial accelerometer (HJA-350IT, Active Style Pro, Omron Healthcare Co, Ltd)||Triaxial acceleration (6 Hz)||Fukuoka et al |
|Accelerometer||Fitbit||Triaxial acceleration (sample rate not specified)||Zhu et al |
|Accelerometer||Fitbit Flex||Triaxial acceleration (sample rate not specified)||Dijkhuis et al |
|Accelerometer||Fitbit Charge HR and Fitbit Flex||Triaxial acceleration (sample rate not specified)||Sprint et al |
|Accelerometer||Fitbit One||Triaxial acceleration (sample rate not specified)||Mollee and Klein |
|Accelerometer||Not specified||Not specified||Mollee et al |
|Accelerometer||Smartphone and Actigraph (GT3X model)||Triaxial acceleration (sample rate not specified)||Schäfer et al |
|Accelerometer and heart rate monitor||Mix of devices and models||Accelerometry, heart rate monitor, PAa information, and user information (sample rate not specified)||Chen et al |
|Accelerometer, GPS, self-log PA, and food||Smartphone||Smartphone accelerometry, GPS data, PA and food logs with sample rate specified||Rabbi et al |
|Activity tracker, smart scale, and smartphone (what they ate and drank in the Fitbit app)||Fitbit Flex 2 activity tracker, Yunmai smart scale, smartphone||Accelerometry, weight and food logs (sample rate not specified)||Forman et al |
|Accelerometer, gyroscope, and magnetic compass||ProMove-3D (developed by Inertia Technology)||Accelerometry (sample rate not specified)||Hermens et al |
|Accelerometer and GPS||Smartphone||Accelerometry and GPS (sample rate not specified)||Batool et al |
|Triaxial accelerometer, heart rate monitor, GPS, 3-axis gyroscope, digital compass, altimeter, light sensor||Samsung Gear Fit and Fitbit Surge||Accelerometry, heart rate data, GPS, 3-axis gyroscopes, digital compass, altimeter, light sensor (sample rate not specified)||Angelides et al |
|Accelerometer, heart rate, and smart scale||Nokia; models not specified||Accelerometer, heart rate data, and smart scale (sample rate not specified)||Gasparetti et al |
aPA: physical activity.
|Length of recording||Participants, n||Reference|
|1 to 7 days|
|4 days||1714||Lee et al |
|7 days||215 (women)||Fukuoka et al |
|1 to 5 weeks|
|3 weeks||17||Rabbi et al |
|3 weeks||48||Zhu et al |
|1 month||14||Angelides et al |
|4 weeks||24 (adolescents)||Galy et al |
|4 weeks||74 (children)||Schäfer et al |
|5 weeks||87 (children)||Diaz et al |
|6 to 20 weeks|
|10 weeks||11||Schäfer et al |
|10 weeks||64||Zhou et al |
|3 months||10||Hermens et al |
|12 weeks||48||Dijkhuis et al |
|12 weeks||108||Mollee and Klein |
|3 months||269||Chen et al |
|12 weeks||2472||Mollee et al |
|16 weeks||52||Forman et al |
|21 weeks to 1 year|
|6 months||276||Aguilera et al |
|6 months||500||Batool et al |
|1 year||11,615||Gasparetti et al |
Raw data extracted from sensors need to be transformed into variables that will contribute to generating the input features for data mining models to detect PA behavior changes.provides a summary of the initial transformation and the resulting preprocessed data.
The preprocessing of the raw data from sensors was carried out in two ways. The first approach was to use proprietary programs to transform the sensors’ data directly into the resulting preprocessed data, without specifying whether there was an initial preprocessing stage such as that used to generate steps, metabolic equivalents (METs), calories, heart rate, or exercise characteristics (type, duration, distance, or frequency). The second approach was to produce intermediate data that were then transformed in the resulting preprocessed data using a custom preprocessing tool. For instance, to generate PA levels (PALs), raw data were first transformed into MET, activity classes, or signal vector magnitudes.
The resulting preprocessed data were mainly activity characteristics (step count, PAL, integrals of the moduli of acceleration, activity types, duration, distance travelled, and frequency) and energy expenditure (MET and calories). Step count from smartphones and commercial wrist-worn devices was the most frequent, followed by PAL from research-grade devices.
The resulting preprocessed data were aggregated at different time levels (). Day and minutes were the most frequent time levels of aggregation. Generally, PAL and MET were aggregated per minute. Calories and step counts were calculated per day.
|Steps: unknown (proprietary program)||[, , - , , , , ]|
|Metabolic equivalents: unknown (proprietary program)|||
|Calories: unknown (proprietary program)||[, , ]|
|Exercise characteristics: unknown (proprietary program)a||[, , ]|
|Sleeping time: unknown (proprietary program)||[, ]|
|Weight: unknown (proprietary program)||[, ]|
|Heart rate: unknown (proprietary program)||[, ]|
|Physical activity (PA) levels|
|Signal vector magnitudes||[, ]|
|Integrals of the moduli of acceleration signals|||
|Actual activity level; not specifiedb|||
aType, duration, distance, frequency.
bDefinition of activity level was not specified.
|Angelides et al ||✓||✓||✓||✓|
|Zhou et al ||✓|
|Aguilera et al ||✓|
|Zhu et al ||✓|
|Mollee and Klein ||✓|
|Forman et al ||✓|
|Gasparetti et al ||✓|
|Chen et al ||✓||✓|
|Dijkhuis et al ||✓|
|Lee et al ||✓|
|Fukuoka et al ||✓|
|Sprint et al ||✓|
|Schäfer et al ||✓|
|Diaz et al ||✓|
|Galy et al ||✓|
|Mollee et al ||✓|
|Hermens et al ||✓|
|Rabbi et al ||✓|
|Batool et al ||✓|
Features Used to Detect and Extract Behavior Changes
The features of the data mining models were mostly generated from the sensors’ preprocessed data and, in some cases, from other sources (nonsensor data).provides the features categorized with respect to the function of their source: accelerometers, other sensors, and nonsensor devices.
Most of the included articles used descriptive statistics to present the preprocessed data as features, for instance total number of steps per day [, , , , ], mean number of steps per day [ ], or PA count per hour [ ]. Other studies created windows or segments of time to calculate PA characteristics, including segments of steps or sleep [ ] and PA bouts [ , ]. Other articles used the preprocessed data to calculate the participants’ step achievements such as whether they reached their step goal [ , , , ]. Zhu et al [ ] used more complex features such as the ratio between the most active and least active period or the circadian rhythm strength.
In addition to the features derived from sensors, others were created from measurements carried out during the intervention by scientists, such as the number of days that a person participated in the intervention  and anthropometric [ , ] or psychological [ , , ] characteristics. Data were collected through surveys/questionnaires or interviews with participants.
|Reference||Features derived from accelerometers||Features derived from other sensors||Features derived from nonsensor devices|
|Aguilera et al ||Number of minutes of activity in the last day, cumulative number of minutes of activity this week, fraction of activity goal, fraction versus expected activity goal at this point in the week||Number of days since each feedback message was sent||Age, gender, language, 8-item Patient Health Questionnaire (depression) score|
|Hermens et al ||Not specified||Not specified||Not specified|
|Chen et al ||Monthly mean metabolic equivalent of task, effective exercise time, type, frequency||Monthly mean exercise and resting heart rate||Gender, height, weight, age|
|Forman et al ||Days where PAa goal is met||Sum of days with self-monitored weight, days with self-monitored eating, days where calorie goal is met, weight loss in pounds||Number of days in the intervention period|
|Gasparetti et al ||Consecutive daily segments of steps, consecutive daily segments of sleep||—b||—|
|Batool et al ||Actual activity level||—||Desired activity level, intention (attitude, subjective norms, perceived behavioral control), habit, and 16 demographic features (eg, age, gender, marital status)|
|Dijkhuis et al ||Hour of the workday, number of steps for that hour, number of steps in the past hour, total number of steps up to that hour, mean number of steps of workdays||—||—|
|Rabbi et al ||PA frequency and calories||—||—|
|Zhou et al ||Daily steps and goal||—||—|
|Angelides et al ||Total and mean hourly, daily, weekly, and monthly sleep duration; sleep calories; exercise duration; exercise distance; exercise calories; step count; step distance; step calories; BMI; and basal metabolic rate||—||Height (cm), weight (kg), age, gender|
|Diaz et al ||Hourly and daily frequency, and mean time spent in moderate to vigorous PA bouts of at least 3, 10, and 30 seconds, and in sedentary bouts of at least 60, 120, and 300 seconds||—||—|
|Galy et al ||Total daily time spent in light/moderate/vigorous PA, total daily number of steps, and a binary goal achievement feature||—||—|
|Fukuoka et al ||Mean metabolic equivalent of tasks per minute, mean moderate-to-vigorous PA per minute||—||—|
|Lee et al ||24-hour mean PA count on weekdays and 24-hour mean PA count on weekends||—||—|
|Sprint et al ||Steps, PALc and bouts count, mean, percentages, ratios and SD. Circadian rhythm time-series statistics and texture features from an image-processing technique||—||—|
|Mollee et al ||Impact of online community (sharing my PAL with peers), target PAL and goal achievement||—||—|
|Schäfer ||PAL per minute||—||—|
|Mollee and Klein ||Daily steps||—||Psychological questionnaire scores for self-efficacy, barriers, social norm, long-term goals, intentions, satisfaction, outcome expectations|
|Zhu et al ||Daily steps||Motivation to exercise (Likert scale)||Iowa-Netherlands Comparison Orientation Measure-23 (INCOM-23) for social comparison (psychometrics)|
aPA: physical activity.
cPAL: physical activity level.
summarizes the data mining methods and specific algorithms used in the selected articles.
Clustering was the most used method, particularly the K-means algorithm. Indeed, in health interventions, the PA performed by each participant varies in duration, form, and intensity. Therefore, an algorithm that clusters PA behaviors is required to analyze them. The unsupervised K-means algorithm is suitable for this task. Indeed, due to its simplicity and ease of use, this is one of the most popular options for data mining . Decision-making algorithms and classifiers were the second most used methods. Both rely on supervised algorithms that use PA characteristics as a method for predicting when and/or what information must be delivered to individual participants for increasing their PA. Other algorithms were also tested to extract PA behaviors, such as social cognitive and contagion models, PA windows permutations, and recommendation algorithms.
|Data mining method and algorithm||Reference|
|K-nearest neighbor and support vector machine|||
|Random forest and weighted score|||
|Shallow neural networks|||
|K-means||[, , , ]|
|Partitioning around medoids and reinforcement learning|||
|Multiarmed bandit upper confidence bound|||
|Reinforcement learning multiarmed bandit||[, ]|
|Behavioral analytics algorithm|
|Social cognitive model for predicting exercise behavior change|||
|Social contagion model combined with a linear model|||
|Physical activity change detection: small window permutation-based change detection in activity routine|||
|Recommendation: genetic algorithms and Pareto optimality|||
aMAB: multiarmed bandit.
Hermens et al  used a k-nearest neighbor model and a support vector machine to determine whether a specific time of the day was suitable for sending a motivational message to optimize adherence to the intervention. Dijkhuis et al [ ] used a tree and tree-based ensemble algorithm classifiers to predict whether users will achieve their daily PA goal. On the basis of this prediction, a personalized PA coaching program was proposed. Forman et al [ ] developed gamified personalized feedback using a score model depending on the PA change detected from accelerometer data. Batool et al [ ] predicted the likelihood that the PA level of a given patient was too low. They also predicted which patients were at higher risk of not adhering to the prescribed therapy to optimize their PA.
Lee et al  grouped participants in two clusters on the basis of their step counts (one more active than the other), and analyzed them to better understand these PA patterns. Diaz et al [ ] used a clustering-based approach for a more insightful analysis of the participants’ PA behavior and of the nature of the PA behavior changes, if present. Galy et al [ ] clustered PA levels and daily step goal achievement to assess the adherence to a health program. Fukuoka et al [ ] identified PA clusters to analyze and compare sociodemographic features and cardiometabolic risks among participants belonging to these clusters. Chen et al [ ] clustered the participants’ PA, and then established a system to adapt the exercise program for the next week as a function of the individual PA behavior change. Gasparetti et al [ ] clustered the participants’ PA to generate groups of habits recommended by a system to the participants with the objective of changing their PA to obtain weight loss effects.
Rabbi et al  generated personalized suggestions in which users were asked to continue, avoid, or make small changes to their existing PA behaviors in order to help them reach their PA goals. Forman et al [ ] developed an algorithm that could personalize and optimize the PAL during the intervention as a function of the amount of PA performed. Aguilera et al [ ] generated personalized messages for participants in the intervention to increase their PA and consequently the intervention effectiveness. Zhou et al [ ] adapted the step goal settings of the intervention depending on the PA behavior change. Zhu et al [ ] personalized social comparison among participants to motivate them toward improving their PA behavior.
Social Cognitive Model
Mollee and Klein  developed a model that simulates changes in PALs over 2 to 12 weeks to optimize the participants’ health outcome.
Social Contagion Model
Mollee et al  used a social contagion model to explain the PAL dynamics in a community.
PA Windows Permutations
Sprint et al  proposed a window-based algorithm to detect changes in segments of users’ PA behavior to motivate progress toward their goals.
Angelides et al  used genetic algorithms and Pareto optimality to compare the participants’ and peer community’s data to help participants interpret the PA data and to generate personal lifestyle improvement recommendations.
Interpretation of the Data Mining Models
Overview of Models
The resulting data mining models detecting PA behavior changes were used for several purposes, as summarized inand below.
|Personalized feedback||[, , , , , ]|
|Personalized program||[, , , ]|
|Support for self-reflection|||
|Cohort analysis of the intervention impact on PAa||[, - , ]|
|Analysis of the social component effects on PA||[, , ]|
aPA: physical activity.
The PA behavior changes extracted from participants’ data were used to promote PA by creating and sending personalized messages that reported the behaviors and gave suggestions for achieving the previously established PA goals. For instance, Aguilera et al  built a system that detects the participants’ PA behavior changes and generates personalized daily text messages with custom timing, frequency, and feedback about their step count/goal and motivational content. Hermens et al [ ] built a system that chooses the best suitable time to send a message with personalized intention, content, and representation. Schäfer et al [ ] created an app with gamified feedback where different avatars are awarded based on the participant’s daily PA behavior. Gasparetti et al [ ] suggested personalized PA patterns based on the participants’ PA patterns. Batool et al [ ] detected the participants’ PA behavior while commuting and suggested how to increase it. Rabbi et al [ ] generated personalized simple PA suggestions (continue, avoid, or make small changes).
The PA intervention program and objectives are adapted to each participant’s needs. For instance, Chen et al  created a guided exercise prescription system that adapts as the participants’ PA behavior changes. Similarly, Forman et al [ ] changed the participant’s exercise intensity suggestion depending on their PA behavior achievements. On the basis of each participant’s step count progress, Dijkhuis et al [ ] suggested new daily step objectives. Zhou et al [ ] used push notifications to deliver daily step goals.
Support for Self-Reflection
Algorithms can help participants to interpret their PA behavior changes. For example, Angelides et al  used an algorithm to assist in the interpretation of the participant’s PA data by comparing them with those of the peer community and to generate personalized recommendations to achieve their daily goals.
Cohort Analysis of the Intervention Impact on PA
These algorithms detect PA behavior changes in participants that allow analyzing the intervention impact. For example, Fukuoka et al  determined PA patterns in women throughout the day that could help to develop more personalized interventions and guidelines. Diaz et al [ ] analyzed the changes in PA behavior (bouts and frequency) during an intervention. Galy et al [ ] tracked the participants’ adherence to the international recommendations during an intervention. Lee et al [ ] identified PA patterns associated with specific subgroups of people who participated in an intervention. Sprint et al [ ] analyzed the participants’ PA changes during an intervention by comparing multiple time windows.
Analysis of the Social Component Effect on PA
These algorithms analyze the psychosocial influences on the participants’ PA. For example, Mollee et al  analyzed the PA dynamics in a community using a social contagion model. Mollee and Klein [ ] analyzed the PA dynamics in a networked community using social cognitive theories, and Zhu et al [ ] personalized social comparison during an intervention to increase the participants’ PA.
The main uses can be classified in two groups. The first group, composed of 11 out of the 19 (58%) selected studies, aimed to generate personalized feedback/PA programs to scaffold and support PA behavior changes among participants. Indeed, researchers seem inclined to generate greater personalization because it increases the intervention efficiency, effectiveness, enjoyment, and reliability . The second group, composed of 8 out of the 19 (42%) selected studies, sought to analyze the impact of interventions on the participants’ PA. Specifically, these studies analyzed the intervention impact on PA at the cohort level to assess health education interventions, and analyzed participants’ PA to show them their behaviors and help to understand them. The main objective of both groups was to explore how PA behavior patterns relate to the intervention effectiveness, which can add new evidence on how to create more effective interventions [ ].
We found 19 articles about data mining models and techniques to detect PA behavior changes in health education or promotion studies, and their number has progressively increased over time. We here discuss the principal findings, identify opportunities and challenges for future research directions, and present the limitations of this systematic review. The Discussion is structured according to the RQs as a guide.
Opportunities and Challenges
Sensor Types and Data Capture
All selected studies used accelerometer sensors to capture PA behaviors. While 7 out of the 19 (37%) studies utilized accelerometers exclusively, the rest employed them with other sensors. Nonaccelerometer sensors capture additional information that may be relevant to PA (such as work/school schedule, itineraries, and sleep patterns ) and could yield auxiliary features for the data mining models. For instance, GPS sensors provide the number of kilometers and location of PA performed.
The median number of participants in the selected studies was 74, and participants were mainly young or middle-aged adults. This low number of participants and the skew toward adults may have generated biased data mining models that can detect and find behavior changes only in a specific population. Different population groups behave differently and should be studied independently. For instance, PA behaviors are different in children and adults . Some of the studies focused on groups with specific PA behaviors, such as children [ , ], adolescents [ ], and women [ ]. However, some population groups with distinctive PA patterns, such as pregnant women [ ] and people with health conditions or disabilities [ ], may need custom detection models.
In 15 out of the 19 (79%) included studies, data were recorded for less than 3 months. Therefore, the current methods for detecting PA behavior changes have been developed mostly for capturing short-term patterns, making the conclusions valid only for short periods. To detect medium- and long-term PA behavior changes, studies with more extended recording periods are needed, such as the study by Gasparetti et al  based on data collected during 1 year. Moreover, new methods to detect extended (eg, annual or seasonal) PA patterns are required to study how the participants’ behavior and habits change over time. An increase in the participants’ number and recording length will lead to new challenges related to big data analysis, such as efficient data management and data mining processing speeds.
Many of the selected studies used commercial accelerometers that allow only the retrieval of aggregated preprocessed data using proprietary software (ie, number of steps per minute), without being transparent on how data were preprocessed (ie, how steps were calculated from the accelerometry data). This data preprocessing black box makes it impossible to determine the quality of the captured PA data and makes the data mining results scientifically irreproducible. Conversely, in studies that used medical-grade accelerometers, the accelerometry data were explained in detail and the preprocessing steps were documented and referenced.
We found a lack of standard procedures for data preprocessing that made it challenging to compare the study results and conclusions. Indeed, if data are not preprocessed correctly, this could cause the transfer of incorrect information to the features and then to the data mining models. This could lead to the creation of inaccurate models, thus limiting the study validity. Data cleaning is a good example of this issue. Indeed, the best procedure to eliminate the nonwearing time remains unclear along with the impact on the accuracy of the resulting models. If nonwearing time is poorly removed, features can generate a PA underestimation by recognizing nonwearing time as sedentary behavior when it is not. Moreover, if sensor data concerning changes in accelerations while commuting by car or bus are not completely removed, they will be erroneously classified as steps, thereby overestimating PA in the model and in the conclusions. Similarly, sedentary activities could be overestimated if sleep time is not correctly removed.
Most of the selected studies aggregated information by day or minute. Although data aggregation is useful when comparing general features of PA behaviors, such as daily steps, this procedure may overlook subtle behavioral changes that can be crucial for detecting major PA behavior changes. For instance, if a person who walks every morning decides to change their behavior and starts to walk at night, the sum of daily steps will be the same, but this new behavior will not be detected. Conversely, it could be detected if the aggregation level is changed to the hour. To detect these and other subtle behavior changes, PA should be analyzed simultaneously at different aggregation levels, and new time frames should be created to match daily habits and behaviors, such as periods of the day (eg, morning, afternoon) or participants’ office hours.
Features Used to Detect and Extract Behavior Changes
Most of the preprocessed data were transformed into features that are simple descriptive statistics, such as the total time spent at a specific PAL or the mean number of steps. These features are valuable to detect behavior changes, but they mainly capture the PA intensity and the PA presence or absence. Yet, PA has more valuable characteristics that vary during PA behavior changes and that can help to detect such behavior changes, such as the length of PAL bouts or the amount of time spent doing PA. These PA characteristics can be extracted from current sensor data. For instance, Galy et al  explored different moderate-to-vigorous PA bout lengths and Sprint et al [ ] assessed the circadian rhythm. International PA guidelines can serve as inspiration to identify new PA features. For instance, according to World Health Organization recommendations, adults should perform muscle-strengthening activities (involving all major muscle groups) at moderate or higher intensity at least twice per week [ ]. This calls for the creation of features that capture the muscle activity type, intensity, and frequency. Moreover, most of the included studies used only PA-derived features to detect behavior changes, and did not consider relevant non-PA data associated with PA changes, such as the participants’ weight and quality of sleep. Some studies captured non-PA data, but they did not use them to detect PA changes. For instance, Rabbi et al [ ] used only PA-derived data (PA frequency and calories burned) to detect behavior changes, although they also recorded the participants’ food intake, thus excluding their caloric intake that is closely related to weight and the amount of PA participants are likely perform.
The use of simple descriptive statistics as features and the exclusion of non-PA data associated with behavior changes indicate that sensor data were underexploited and that the features used to detect PA behavior changes are still underdeveloped. Including new PA characteristics and new non-PA features could help to better understand the nature of PA changes and how these features influence PA behavior changes, ultimately increasing the model detection accuracy.
Data Mining Methods and Techniques
Most studies used off-the-shelf classifiers, clusters, and decision-making algorithms to detect PA behavior changes. We expected to find tailor-made algorithms because in health education settings, it is important to find specific PA patterns in participants of different classes who follow learning modules with different contents and with different PA goals. Moreover, we noticed that most authors did not explain how they chose the algorithms and did not specify the efficiency and accuracy of the models used for detecting PA behavior changes, raising uncertainty about how good they are at this task. This suggests that more efficient and accurate algorithms could be created and calls for more transparency in the algorithm choice process. Therefore, authors should explicitly describe the steps and methodology of new algorithms, and share their source codes to be scrutinized and to compare their detection accuracy. The creation of open accelerometry databases is also needed to enable benchmarking.
Interpretation of the Resulting Data Mining Models
The main uses of the data mining models focused on personalization, support for self-reflection, and analysis of PA behaviors. Model interpretation focused on generating personalization and support for promoting behavior changes. Personalized feedback and intervention programs were based mostly on the participants’ PA data. The inclusion of additional information that may influence behavior changes (eg, contexts, schedules, social constraints, motivation, and weather) would allow for better interpretation and use of the detected behavior changes. Systems could exploit these additional data to improve the feedback delivery time and content, with positive effects on the effectiveness of health education programs and interventions. For instance, with the current models, a participant could receive an automatized personalized behavior change message that suggests taking a short walk, although it is snowing outside. This would decrease the likelihood of following the suggestion. However, if the system could be aware of the weather, the participant would receive this suggestion only after the weather conditions have improved, or a different suggestion that is more likely to trigger a behavior change at that point in time. Moreover, as the models relied mainly on PA features to model and interpret the behavior changes, only the physical dimension of the learning process in health education was incorporated in the models and their interpretation, leaving aside the knowledge dimension of the learning process. Learning management systems and intelligent tutoring systems already capture the knowledge dimension. Their integration would help to understand, in a comprehensive way, how participants learn, and would enable the real-time monitoring of how PA behavior changes align with the intervention purpose. This would allow adapting each participant’s content and learning objectives in real time, thereby improving instructions and learning, ultimately increasing the program or intervention effectiveness.
Most of the included studies generated complex output models that require detailed knowledge of how they were created to interpret the resulting patterns, making them difficult to understand for health scientists and any other scientist not familiar with machine learning. This is a common problem in interdisciplinary teams; however, an effort can be made to create more readable, intuitive, and easy-to-understand algorithms and methods, a goal that exists in related machine learning areas such as explainable artificial intelligence .
Studies on wearable machine learning devices to detect changes in PA in health education have only started to be published in the last decade. As research is advancing, keywords are changing and new terms are created. Although we used a wide range of keywords in our query to include sensors, PA, and health education, we may have left some keywords out, and thus we may have missed some references. This may have also affected the initial reference screening process by title and abstract. We minimized this issue by testing several queries before starting our systematic review until we found the one we ultimately used. Another possible limitation in our search is that we might have omitted references listed only in other peer-reviewed databases (we searched only the most popular databases in engineering and computer science), such as medical databases (ie, PubMed). We mitigated this risk by including grey literature in our systematic review (see the Methods section).
Regarding the research subquestions and the review structure, we created research subquestions in line with the usual data mining process steps, but we certainly left some topics unaddressed. For instance, we did not address ethics, privacy, and security issues, or how data are filtered during preprocessing (eg, sleeping time or sensor nonuse). Although these are common substeps during the data mining process and including them would have made this systematic review more comprehensive, we preferred to limit this review only to the critical steps.
In the last 10 years, different methods have been developed to detect behavior changes in health education or health promotion contexts. These methods have been tested in small populations, are based on short data-recording periods, and rely mainly on accelerometry data. Incorporating information that is complementary to the participants’ PA data would allow for creating more precise detection models, better interpreting these models, and understanding how participants learn and what triggers new behaviors. Exploring other data aggregation levels, in addition to days and minutes, could help to detect more subtle and long-term behavior changes. Fully describing the data preprocessing methods and the efficiency and accuracy of the behavior change detection models would help to better understand, scrutinize, and compare studies. Detection models were mainly used to generate personalized feedback and to provide support for promoting or maintaining behavior changes, but did not integrate the knowledge dimension of the learning process. Adding the knowledge dimension and creating easier-to-understand models could facilitate the interpretation of participants’ behavior changes in a more comprehensive way, opening the way toward better and deeper analyses and personalization.
CD acknowledges the support of the Universidad Adolfo Ibáñez and Comisión Nacional de Investigación Científica y Tecnológica/Chilean National Commission for Scientific and Technological Research (CONICYT) “Becas Chile” Doctoral Fellowship program (grant number 72200111).
Conflicts of Interest
- Plasqui G, Bonomi AG, Westerterp KR. Daily physical activity assessment with accelerometers: new insights and validation studies. Obes Rev 2013 Jun 07;14(6):451-462. [CrossRef] [Medline]
- Bull FC, Al-Ansari SS, Biddle S, Borodulin K, Buman MP, Cardon G, et al. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br J Sports Med 2020 Dec 25;54(24):1451-1462 [FREE Full text] [CrossRef] [Medline]
- Rowlands AV. Moving forward with accelerometer-assessed physical activity: two strategies to ensure meaningful, interpretable, and comparable measures. Pediatr Exerc Sci 2018 Nov 01;30(4):450-456. [CrossRef] [Medline]
- Fayyad U, Piatetsky-Shapiro G, Smyth P. The KDD process for extracting useful knowledge from volumes of data. Commun ACM 1996 Nov;39(11):27-34. [CrossRef]
- Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol 2009 Oct;62(10):1-34 [FREE Full text] [CrossRef] [Medline]
- Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev 2016 Dec 05;5(1):210 [FREE Full text] [CrossRef] [Medline]
- Aguilera A, Figueroa CA, Hernandez-Ramos R, Sarkar U, Cemballi A, Gomez-Pathak L, et al. mHealth app using machine learning to increase physical activity in diabetes and depression: clinical trial protocol for the DIAMANTE Study. BMJ Open 2020 Aug 20;10(8):e034723 [FREE Full text] [CrossRef] [Medline]
- Lee PH, Yu Y, McDowell I, Leung GM, Lam T. A cluster analysis of patterns of objectively measured physical activity in Hong Kong. Public Health Nutr 2013 Aug;16(8):1436-1444. [CrossRef] [Medline]
- Zhou M, Fukuoka Y, Mintz Y, Goldberg K, Kaminsky P, Flowers E, et al. Evaluating machine learning-based automated personalized daily step goals delivered through a mobile phone app: randomized controlled trial. JMIR Mhealth Uhealth 2018 Jan 25;6(1):e28 [FREE Full text] [CrossRef] [Medline]
- Rabbi M, Pfammatter A, Zhang M, Spring B, Choudhury T. Automated personalized feedback for physical activity and dietary behavior change with mobile phones: a randomized controlled trial on adults. JMIR Mhealth Uhealth 2015 May 14;3(2):e42 [FREE Full text] [CrossRef] [Medline]
- Galy O, Yacef K, Caillaud C. Improving Pacific adolescents' physical activity toward international recommendations: exploratory study of a digital education app coupled with activity trackers. JMIR Mhealth Uhealth 2019 Dec 11;7(12):e14854 [FREE Full text] [CrossRef] [Medline]
- Fukuoka Y, Zhou M, Vittinghoff E, Haskell W, Goldberg K, Aswani A. Objectively measured baseline physical activity patterns in women in the mPED Trial: cluster analysis. JMIR Public Health Surveill 2018 Feb 01;4(1):e10 [FREE Full text] [CrossRef] [Medline]
- Sprint G, Cook DJ, Schmitter-Edgecombe M. Unsupervised detection and analysis of changes in everyday physical activity data. J Biomed Inform 2016 Oct;63:54-65 [FREE Full text] [CrossRef] [Medline]
- Zhu J, Dallal DH, Gray RC, Villareale J, Ontañón S, Forman EM, et al. Personalization Paradox in Behavior Change Apps. Proc ACM Hum Comput Interact 2021 Apr 22;5(CSCW1):1-21. [CrossRef]
- Gasparetti F, Aiello LM, Quercia D. Personalized weight loss strategies by mining activity tracker data. User Model User-Adap Inter 2019 Jul 26;30(3):447-476. [CrossRef]
- Batool T, Vanrompay Y, Neven A, Janssens D, Wets G. CTASS: an intelligent framework for personalized travel behaviour advice to cardiac patients. J Ambient Intell Human Comput 2018 May 21;10(12):4693-4705. [CrossRef]
- Angelides MC, Wilson LAC, Echeverría PLB. Wearable data analysis, visualisation and recommendations on the go using android middleware. Multimed Tools Appl 2018 Jun 4;77(20):26397-26448. [CrossRef]
- Schäfer H, Bachner J, Pretscher S, Groh G, Demetriou Y. Study on motivating physical activity in children with personalized gamified feedback. 2018 Presented at: UMAP 18 26th Conference on User Modeling, Adaptation and Personalization; July 8-11, 2018; Singapore. [CrossRef]
- Forman EM, Kerrigan SG, Butryn ML, Juarascio AS, Manasse SM, Ontañón S, et al. Can the artificial intelligence technique of reinforcement learning use continuously-monitored digital data to optimize treatment for weight loss? J Behav Med 2019 Apr 25;42(2):276-290 [FREE Full text] [CrossRef] [Medline]
- Hermens H, op den Akker H, Tabak M, Wijsman J, Vollenbroek M. Personalized Coaching Systems to support healthy behavior in people with chronic conditions. J Electromyogr Kinesiol 2014 Dec;24(6):815-826. [CrossRef] [Medline]
- Chen H, Chen F, Lin S. An AI-based exercise prescription recommendation system. Appl Sci 2021 Mar 16;11(6):2661. [CrossRef]
- Dijkhuis TB, Blaauw FJ, van Ittersum MW, Velthuijsen H, Aiello M. Personalized physical activity coaching: a machine learning approach. Sensors 2018 Feb 19;18(2):623 [FREE Full text] [CrossRef] [Medline]
- Mollee JS, Araújo EFM, Manzoor A, van Halteren AT, Klein MCA. Explaining changes in physical activity through a computational model of social contagion. In: Gonçalves B, Menezes R, Sinatra R, Zlatic V, editors. Complex Networks VIII. CompleNet 2017. Springer Proceedings in Complexity. Cham: Springer; 2017:213-223.
- Diaz C, Galy O, Caillaud C, Yacef K. A clustering approach for modeling and analyzing changes in physical activity behaviors from accelerometers. IEEE Access 2020;8:224123-224134. [CrossRef]
- Mollee JS, Klein MCA. Empirical validation of a computational model of influences on physical activity behavior. In: Benferhat S, Tabia K, Ali M, editors. Advances in artificial intelligence: from theory to practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Cham: Springer; 2017:353-363.
- Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, et al. Top 10 algorithms in data mining. Knowl Inf Syst 2007 Dec 4;14(1):1-37. [CrossRef]
- Cheung KL, Durusu D, Sui X, de Vries H. How recommender systems could support and enhance computer-tailored digital health programs: a scoping review. Digit Health 2019 Jan 24;5:2055207618824727 [FREE Full text] [CrossRef] [Medline]
- Williams SL, French DP. What are the most effective intervention techniques for changing physical activity self-efficacy and physical activity behaviour--and are they the same? Health Educ Res 2011 Apr 14;26(2):308-322. [CrossRef] [Medline]
- Semplonius T, Willoughby T. Long-term links between physical activity and sleep quality. Med Sci Sports Exerc 2018 Dec;50(12):2418-2424. [CrossRef] [Medline]
- Borodulin KM, Evenson KR, Wen F, Herring AH, Benson AM. Physical activity patterns during pregnancy. Med Sci Sports Exerc 2008 Nov;40(11):1901-1908 [FREE Full text] [CrossRef] [Medline]
- Temple VA, Walkley JW. Physical activity of adults with intellectual disability. J Intellec Dev Disabil 2009 Jul 10;28(4):342-353. [CrossRef]
- Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 2018;6:52138-52160. [CrossRef]
|ACM: Association for Computing Machinery|
|ERIC: Education Resources Information Center|
|MET: metabolic equivalent|
|PA: physical activity|
|PAL: physical activity level|
|PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses|
|RQ: research question|
Edited by C Lovis; submitted 17.07.22; peer-reviewed by B Hoyt; comments to author 18.09.22; revised version received 25.11.22; accepted 27.11.22; published 06.03.23Copyright
©Claudio Diaz, Corinne Caillaud, Kalina Yacef. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 06.03.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.