This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
Eating disorders affect an increasing number of people. Social networks provide information that can help.
We aimed to find machine learning models capable of efficiently categorizing tweets in the eating disorder domain.
We collected tweets related to eating disorders for 3 consecutive months. After preprocessing, a subset of 2000 tweets was labeled for 4 binary categorizations: (1) messages written by people suffering from eating disorders or not, (2) messages promoting eating disorders or not, (3) informative messages or not, and (4) scientific or nonscientific messages. Traditional machine learning and deep learning models were used to classify the tweets. We evaluated accuracy, F1 score, and computational time for each model.
A total of 1,058,957 tweets related to eating disorders were collected. The bidirectional encoder representations from transformer–based models had the best scores among the machine learning and deep learning techniques applied to the 4 categorization tasks (F1 scores 71.1%-86.4%).
In classifying eating disorder–related tweets, bidirectional encoder representations from transformer–based models performed better than traditional techniques, although their computational cost was significantly higher.
Physical appearance is an essential element for people in today's society. Although many studies corroborate that moderate physical activity and proper nutrition help to maintain a healthy body [
The prevalence of eating disorders has been increasing [
With the emergence of social media, studies [
Despite the increase in studies on eating disorders that have, for example, analyzed pro–eating disorder websites [
Our main objectives were to achieve accurate text classification for these 4 tasks and to compare the efficiency of text classification models based on traditional machine learning techniques with that of models based on newer techniques, such as pretrained bidirectional encoder representations from transformer (BERT)–based models. This comparison was intended to determine which approach offers the best combination of performance and computational cost for future research.
In our previous research [
Social media, specifically social networks, have become very important sources of information within the field of health informatics. Health informatics includes the design and application of innovations based on information technologies to solve problems related to public health and health services [
Health-related research using social media mainly focuses on 2 areas. The first is real-time monitoring and prediction of diseases (eg, influenza). In this area, messages that are geographically localized and on topics of interest can be collected and used, which makes analyzing user discussions a simple task. The second is determining perspectives on different health problems and conditions. Thus, social media are useful, easy-to-use, and very important tools for observational studies.
Twitter is a very popular and widely used social network within the field of health and social health research. Some studies [
Studies [
Social media facilitate a great deal of research in the field of health informatics, for example, sentiment analysis, behavioral analysis, or information dissemination analysis. These studies use machine learning or deep learning techniques to classify and predict content that has been prepared using natural language processing.
Supervised machine learning techniques learn from examples of input–output pairs to predict an outcome for a given input. The main goal is to build a model that can make accurate predictions on new data.
Tasks in the field of supervised machine learning include regression—the prediction of a real number—and classification—the prediction of a class label [
Classification techniques make it possible to categorize large text-based data sets efficiently. This approach has many advantages, such as more accurate predictions than those of humans and time savings [
Naïve Bayes classifiers have been used to predict Zika and dengue diseases using data obtained from Twitter [
Other studies [
It is also possible to combine different classification algorithms and compare their performance to use the best performing classifier for a given task [
There are a number of studies that make use of data related to eating disorders [
A previous social media study predicted depression from texts [
A tool (T-Hoarder [
T-Hoarder allowed us to obtain additional information about tweets for further analysis, such as ID, text, and author (among other fields). Tweets were identified by keywords [
By using a different Twitter account for each set, more tweets could be obtained without exceeding the Twitter platform's usage limit. English terms were used because more tweets are generated in English [
Study workflow: (A) data collection and preprocessing, (B) classification model training, and (C) evaluation. BERT: bidirectional encoder representations from transformer, ML: machine learning.
Preprocessing was conducted in Python (version 3.6). Data were loaded from the files generated by T-Hoarder, which writes its output in files of up to 100 MB each. As a result, 4 files were obtained for data set 1, 4 files for data set 2, and 2 files for data set 3. Some fields, such as location, name, and biography, contained line breaks or tabs, so these characters were removed using a function to avoid conflicts with delimiters. After preprocessing, the data frames were concatenated into a single data frame. To work with the data frame more efficiently, its memory usage was calculated and then optimized by converting numeric columns to numeric types, dates to datetime format, and the remaining object columns to categories. These steps reduced the data frame from 2.7 GB to 1.1 GB. We removed all retweets, duplicate tweets (because the unified data sets might contain common tweets), and non-English tweets.
To select the subset of 2000 tweets, manual filtering was performed to eliminate tweets unrelated to eating disorders. Some of our keywords were too generic and led to the collection of tweets that were not about eating disorders, for example, “food problem,” “inappetence,” “food issue,” and “bingeing.” However, to generate predictive models with greater accuracy and less bias, we kept a small sample of tweets (n=286) that did not belong to any of the categories but did contain some of the keywords of interest.
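The cleaning and filtering steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's code. The record fields (`id`, `text`, `lang`, `location`) and the `RT @` retweet test are assumptions, because the T-Hoarder column layout is not listed here. The pandas memory optimization is also left out.

```python
import re

def clean_field(value):
    """Replace tabs and line breaks with a space so they cannot clash with file delimiters."""
    return re.sub(r"[\t\r\n]+", " ", value).strip()

def preprocess(*datasets):
    """Concatenate data sets, clean string fields, and drop retweets, duplicates, and non-English tweets."""
    seen_ids = set()
    kept = []
    for tweet in (t for dataset in datasets for t in dataset):
        cleaned = {key: clean_field(val) if isinstance(val, str) else val
                   for key, val in tweet.items()}
        if cleaned["text"].startswith("RT @"):   # assumed retweet marker
            continue
        if cleaned["lang"] != "en":              # assumed language field
            continue
        if cleaned["id"] in seen_ids:            # same tweet present in two data sets
            continue
        seen_ids.add(cleaned["id"])
        kept.append(cleaned)
    return kept

# Hypothetical records standing in for two of the collected data sets.
dataset_1 = [
    {"id": 1, "text": "Recovery is possible\n#edtwt", "lang": "en", "location": "Madrid\t"},
    {"id": 2, "text": "RT @someone: eat well", "lang": "en", "location": ""},
]
dataset_2 = [
    {"id": 1, "text": "Recovery is possible\n#edtwt", "lang": "en", "location": "Madrid\t"},
    {"id": 3, "text": "Trastornos alimentarios", "lang": "es", "location": ""},
]
clean = preprocess(dataset_1, dataset_2)
print(clean)
```

In pandas, the memory step would correspond to conversions such as `pd.to_numeric`, `pd.to_datetime`, and `astype("category")`, with `DataFrame.memory_usage(deep=True)` used to measure the effect.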
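A simplified phrase matcher shows why generic keywords pull in unrelated tweets. This is a hypothetical sketch: Twitter's own keyword-matching rules differ, only the 4 generic keywords named above are used, and the example tweets are invented.

```python
import re

# Only the 4 generic keywords named in the text; the full keyword list is not reproduced here.
GENERIC_KEYWORDS = ["food problem", "inappetence", "food issue", "bingeing"]

def matched_keywords(text, keywords=GENERIC_KEYWORDS):
    """Return the keywords that occur in the text as whole words or phrases, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered)]

# Invented example tweets: only the first is about eating disorders.
examples = [
    "Bingeing again tonight, I hate this cycle #edtwt",
    "Our city has a food problem: too few affordable grocery stores",
    "Bingeing the new season of my favorite show this weekend",
]
for tweet in examples:
    print(matched_keywords(tweet), "|", tweet)
```

All 3 tweets match a keyword, but only the first belongs in an eating disorder data set. Keyword-triggered collection therefore has to be followed by manual filtering.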
Tweets in 4 different categories in the subset were manually labeled (
Categories of labeled tweets and examples.
Category topics | Tweet |
Written by someone who suffers from an eating disorder | i was stressed and ate a whole bowl of pasta, where’s my badge for being the worst anorexic #edtwt |
Written by someone who does not have an eating disorder | Is your #teenager not eating or eating a lot less than normal? She might be suffering from #anorexia. We can help; please come see us https://t.co/GfStM1IVGz #weightloss #losingweight https://t.co/z5NK0tjNIt |
Promotes eating disorders | Currently feeling like the best anorexic #edtwt |
Does not promote eating disorders | Higher-calorie diets could lead to a speedier recovery in patients with anorexia nervosa, study shows https://t.co/mipX3nrhHN |
Informative | #AnorexiaNervosa – A Father and Daughter Perspec- |
Noninformative | Binge eating makes me sad :( #eatingdisorder |
Scientific | The problem extends to Food and Drug Administration and National Institutes of Health data sets used in a recent study appearing in Reproductive Toxicology. #ai #technology #BigData #ML https://t.co/DFvh6gNA38 |
Nonscientific | Do not waste time thinking about what you could have done differently. Keep your eyes on the road ahead and do it differently now. #anorexia #eatingdisorder #recovery #nevergiveup #alwayskeepfighting |
Before training and validating the models, tweets in the labeled set with more than 80% similarity to another tweet were eliminated. This criterion targeted tweets containing the same text but different hashtags. From the remaining tweets, we removed stop words, punctuation, and symbols, because these hinder the application of machine learning techniques. Stop words are words that have no meaning on their own and that modify or accompany other words, for example, articles, pronouns, adverbs, prepositions, or some verbs.
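The 80% similarity filter and the stop word removal could look like the following sketch. The text does not say which similarity measure was used, so `difflib.SequenceMatcher.ratio()` is an assumption. The tiny stop word list is also an assumption. Stripping punctuation also removes the `#` from hashtags.

```python
import difflib
import string

# Small illustrative stop word list (assumption; the study's list is not given).
# "not" is deliberately left out because negation matters in these tweets.
STOP_WORDS = {"a", "an", "the", "i", "me", "my", "is", "it", "to", "of", "and", "for", "in", "on", "was"}

def similarity(a, b):
    """Similarity ratio in [0, 1] between two tweets (difflib is an assumed choice of measure)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def drop_near_duplicates(tweets, threshold=0.8):
    """Keep a tweet only if it is at most `threshold` similar to every tweet already kept."""
    kept = []
    for tweet in tweets:
        if all(similarity(tweet, other) <= threshold for other in kept):
            kept.append(tweet)
    return kept

def remove_stop_words_and_punctuation(text):
    """Strip punctuation and symbols, then drop stop words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.split() if w.lower() not in STOP_WORDS)

t1 = "Recovery is not linear, be kind to yourself today #recovery"
t2 = "Recovery is not linear, be kind to yourself today #recovery #edtwt"  # same text, extra hashtag
t3 = "Binge eating makes me sad :( #eatingdisorder"

kept = drop_near_duplicates([t1, t2, t3])
print(kept)
print(remove_stop_words_and_punctuation(t3))
```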
We used random forest, recurrent neural networks, bidirectional long short-term memory networks (ClassificationModel; simpletransformers [
Two models—CamemBERT [
For the random forest model, 5-fold cross-validation was used. For the neural networks, 5 different iterations were performed, and the mean F1 score and accuracy were obtained.
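Both evaluation schemes can be sketched with the standard library. `k_fold_indices` splits the 1877 labeled tweets into 5 folds so that every tweet is tested exactly once. Shuffling with a fixed seed is an assumption. `mean_over_runs` averages F1 score and accuracy over 5 runs that differ only in their random seed. The scorer passed to it below is a hypothetical placeholder, not a trained model.

```python
import random
import statistics

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs: each sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def mean_over_runs(train_and_score, n_runs=5):
    """Average (F1, accuracy) over independent runs that differ only in their random seed."""
    results = [train_and_score(seed) for seed in range(n_runs)]
    f1_scores, accuracies = zip(*results)
    return statistics.mean(f1_scores), statistics.mean(accuracies)

folds = list(k_fold_indices(1877, k=5))
print([len(test) for _, test in folds])  # [376, 376, 375, 375, 375]

# Placeholder scorer for illustration only; a real one would train and evaluate a network.
f1, acc = mean_over_runs(lambda seed: (0.80 + 0.01 * seed, 0.82))
```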
Random forest models [
One of the advantages of this type of model is the additional randomness it introduces as the trees are grown. When a node is split, the algorithm searches for the best feature among a random subset of features rather than among all features, which makes it possible to obtain models with better performance. Random thresholds can also be used for each feature, instead of searching for the best possible threshold, which adds further randomness.
In recurrent neural networks, connections between nodes form a directed graph along a temporal sequence, which gives these networks the capacity to show dynamic temporal behavior. Because they are derived from feedforward neural networks and can use memory (their internal state), they can process input sequences of varying lengths. This feature makes recurrent neural networks useful for tasks such as unsegmented, connected handwriting recognition or speech recognition [
There are 2 classes of recurrent neural networks: finite impulse and infinite impulse. The former are made up of a directed acyclic graph that can be unrolled and replaced by a strictly feedforward neural network. The latter are made up of a directed cyclic graph, which cannot be unrolled.
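The following sketch shows a single split search using the Gini criterion, which is the `criterion` value used in all 4 random forests. Each split looks only at a random subset of about sqrt(n) features. The optional `random_thresholds` flag draws one random threshold per feature. That variant is the idea behind extremely randomized trees (scikit-learn's `ExtraTreesClassifier`), not a standard random forest. This is an illustration, not scikit-learn's implementation.

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity of the two children from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels, rng, random_thresholds=False):
    """Find the lowest-impurity split among a random subset of sqrt(n_features) features."""
    n_features = len(rows[0])
    k = max(1, int(math.sqrt(n_features)))
    best = None
    for feature in rng.sample(range(n_features), k):
        values = sorted({x[feature] for x in rows})
        if random_thresholds:
            # Extra-Trees style: a single threshold drawn at random for this feature.
            thresholds = [rng.uniform(values[0], values[-1])]
        else:
            # Standard random forest: try every observed value as a threshold.
            thresholds = values
        for threshold in thresholds:
            score = split_impurity(rows, labels, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Toy data: 4 features, every feature separates the two classes perfectly.
rows = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
labels = [0, 0, 1, 1]
score, feature, threshold = best_split(rows, labels, random.Random(0))
print(score, feature, threshold)
```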
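The role of the internal state can be seen in a minimal Elman-style recurrent cell, h_t = tanh(W_x x_t + W_h h_(t-1) + b), written in plain Python. The weights are arbitrary illustrative values, and this is not the Keras architecture used in the study. The loop over time steps is what lets the same cell process sequences of any length.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, W_x, W_h, b):
    """Run an Elman-style RNN over a sequence and return the hidden state after each step."""
    h = [0.0] * len(b)                      # initial hidden state (the "memory")
    states = []
    for x_t in inputs:                      # the same cell is reused at every time step
        pre = [a + c + bias for a, c, bias in zip(matvec(W_x, x_t), matvec(W_h, h), b)]
        h = [math.tanh(v) for v in pre]     # new state depends on input and previous state
        states.append(h)
    return states

# Arbitrary illustrative weights: 2-dimensional inputs and hidden state.
W_x = [[0.5, -0.3], [0.2, 0.4]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

sequence = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
states = rnn_forward(sequence, W_x, W_h, b)
# The first two inputs are identical, but their hidden states differ because of the recurrence.
print(states[0], states[1])
```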
Bidirectional long short-term memory networks [
The bidirectional encoder representations from transformer framework is not a model in itself. According to Devlin et al [
In the BERT-based method, a neural network is trained to learn a language, similar to transfer learning in computer vision neural networks. It learns linguistic representations bidirectionally, looking at the words both before and after each word. It is the combination of these approaches that has made it a successful natural language processing method [
We used Jupyter Notebook with the TensorFlow and PyTorch libraries. Both libraries were needed because, in our setup, the BERT-based networks could only be generated through PyTorch (on which the simpletransformers library is built). TensorFlow is one of the most widely used libraries for generating recurrent neural network and bidirectional long short-term memory models. The random forest parameters were tuned with scikit-learn's GridSearchCV.
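The difference between left-to-right and bidirectional context can be shown on a toy sentence. In BERT's masked language model pretraining, a token is hidden behind a `[MASK]` token and predicted from the words on both sides. This sketch splits on whitespace for simplicity, whereas BERT models actually use subword tokenizers.

```python
def left_to_right_context(tokens, i):
    """Words a left-to-right language model can use when predicting tokens[i]."""
    return tokens[:i]

def bidirectional_context(tokens, i):
    """What a BERT-style masked language model sees: tokens[i] is hidden, both sides are visible."""
    return tokens[:i] + ["[MASK]"] + tokens[i + 1:]

tokens = "binge eating makes me sad".split()
print(left_to_right_context(tokens, 1))  # ['binge']
print(bidirectional_context(tokens, 1))  # ['binge', '[MASK]', 'makes', 'me', 'sad']
```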
We used a grid search (GridSearchCV) to select the random forest parameters (
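`GridSearchCV` exhaustively evaluates every combination in a parameter grid with cross-validation. The same idea can be written in plain Python under stated assumptions. The grid below is hypothetical, because the study reports only the selected values. `toy_evaluate` is a stand-in for training a forest and computing its mean cross-validated score.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Exhaustively try every parameter combination and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # in practice: mean score over cross-validation folds
        if score > best_score:    # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid; the study reports only the selected values, not the full search space.
PARAM_GRID = {
    "criterion": ["gini", "entropy"],
    "max_depth": [4, 6, 7, 8],
    "max_features": ["sqrt", "log2"],
    "n_estimators": [200, 500, 800, 1000],
}

def toy_evaluate(params):
    """Stand-in scorer for illustration only; a real one would train and cross-validate a forest."""
    return -abs(params["max_depth"] - 8) - abs(params["n_estimators"] - 1000) / 1000

best, score = grid_search(PARAM_GRID, toy_evaluate)
print(best, score)
```

The cost grows multiplicatively with each parameter: this grid already has 2 × 4 × 2 × 4 = 64 combinations, each of which would be cross-validated.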
To train recurrent neural networks (sklearn; keras) to perform the binary categorization tasks, the sigmoid activation function was used (
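For a binary task, a sigmoid output unit maps the network's final real-valued output to a probability between 0 and 1. The following plain-Python sketch assumes a 0.5 decision threshold and binary cross-entropy as the training loss. The text states neither.

```python
import math

def sigmoid(z):
    """Map a real-valued output to a probability between 0 and 1, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(z, threshold=0.5):
    """Binary decision from the sigmoid output (the 0.5 threshold is an assumption)."""
    return 1 if sigmoid(z) >= threshold else 0

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss commonly paired with a sigmoid output unit (assumed; not stated in the text)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

print(sigmoid(0.0))         # 0.5
print(predict_label(2.0))   # 1
print(predict_label(-2.0))  # 0
```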
For the 7 pretrained bidirectional encoder representations from transformer–based models, the hyperparameters were
All experiments and data are published in a repository accessible to anyone [
Random forest hyperparameters.
Category | criterion | max_depth | max_features | n_estimators |
Category 1 | gini | 7 | log2 | 200 |
Category 2 | gini | 8 | auto | 1000 |
Category 3 | gini | 8 | sqrt | 800 |
Category 4 | gini | 8 | auto | 1000 |
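In scikit-learn's `RandomForestClassifier`, `max_features` controls how many features are examined at each split. For classifiers, `"auto"` (used for categories 2 and 4) meant the same as `"sqrt"` (category 3). `"auto"` was later removed in scikit-learn 1.3, so current versions would need `"sqrt"` instead. The helper below mirrors that rule in plain Python. The vocabulary size of 5000 features is hypothetical, because the number of input features is not reported.

```python
import math

def n_features_per_split(max_features, n_features):
    """Features examined at each split by a scikit-learn random forest classifier."""
    if max_features in ("sqrt", "auto"):   # "auto" meant "sqrt" for classifiers
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    raise ValueError(f"unsupported max_features value: {max_features!r}")

N_FEATURES = 5000  # hypothetical vocabulary size; not reported in the study
for category, setting in [("Category 1", "log2"), ("Category 2", "auto"),
                          ("Category 3", "sqrt"), ("Category 4", "auto")]:
    print(category, setting, n_features_per_split(setting, N_FEATURES))
```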
Architecture of the recurrent neural network. LSTM: long short-term memory.
Architecture of the bidirectional long short-term memory (LSTM) network.
A total of 1,085,957 tweets, written and posted on Twitter between October 20, 2020, and December 26, 2020, were collected. After preprocessing, a total of 494,025 valid tweets were obtained. These tweets are shared and publicly available on the Kaggle platform [
Table of terms and frequencies of the 10 most repeated terms in the initial data set and in the labeled subset of data.
Term | Frequency, n |
Initial data set | |
hey mp | 230,013 |
healthy | 210,430 |
pltpinkmonday | 209,330 |
eat | 183,436 |
covid19 | 156,541 |
edtwt | 123,175 |
anorexia | 112,864 |
disorders | 102,063 |
endsars | 99,844 |
bachelorette | 48,370 |
problem | 45,959 |
Labeled subset | |
eat | 1132 |
disorder | 830 |
food | 410 |
recovery | 382 |
edtwt | 301 |
binge | 282 |
people | 245 |
anorexic | 244 |
research | 226 |
study | 202 |
problem | 199 |
In category 1, 50.2% (942/1877) of tweets were written by a person with an eating disorder, and 49.8% (935/1877) were written by a person without an eating disorder. In category 2, 23.8% (447/1877) of tweets encouraged people to suffer from an eating disorder, and 76.2% (1430/1877) did not.
In category 3, 37% (694/1877) of the tweets were informative, and 63% (1183/1877) were opinionated. In category 4, 23.3% (437/1877) of the tweets were scientific, and 76.7% (1440/1877) were of a nonscientific nature.
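Because several tasks are unbalanced, it helps to compare accuracy against a majority-class baseline. That baseline is the accuracy of a trivial classifier that always predicts the larger class. The sketch below derives each negative class as 1877 minus the positive count, using only the counts given above.

```python
TOTAL = 1877  # labeled tweets used for the 4 categorizations

# Number of tweets in the positive class of each binary categorization.
POSITIVES = {
    "category 1 (written by someone with an eating disorder)": 942,
    "category 2 (encourages eating disorders)": 447,
    "category 3 (informative)": 694,
    "category 4 (scientific)": 437,
}

def class_balance(positives, total=TOTAL):
    """Percentages of the positive and negative classes, rounded to 1 decimal place."""
    negatives = total - positives
    return round(100 * positives / total, 1), round(100 * negatives / total, 1)

def majority_baseline_accuracy(positives, total=TOTAL):
    """Accuracy (%) of a trivial classifier that always predicts the larger class."""
    return round(100 * max(positives, total - positives) / total, 1)

for name, positives in POSITIVES.items():
    print(name, class_balance(positives), majority_baseline_accuracy(positives))
```

For category 4, always answering "nonscientific" would already reach 76.7% accuracy. This is one reason to read accuracy together with F1 score.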
Performance (
Classification performance.
Model | Having eating disorders or not: F1 score, % | Having eating disorders or not: accuracy, % | Encouraging eating disorders or not: F1 score, % | Encouraging eating disorders or not: accuracy, % | Informative or not: F1 score, % | Informative or not: accuracy, % | Scientific or not: F1 score, % | Scientific or not: accuracy, % |
Random forest | 79.8 | 79.2 | 47 | 76.7 | 49.2 | 73.7 | 27.3 | 80.4 |
Recurrent neural network | 83.2 | 82.6 | 61 | 82.1 | 67.3 | 70.7 | 67.3 | 70.7 |
Bidirectional long short-term memory | 78.5 | 79.3 | 67.1 | 86.7 | 67.1 | 78.7 | 76.8 | 85.8 |
Bidirectional encoder representations from transformer–basedᵃ | 83.3 | 83 | 71.9 | 87.2 | 77.6 | 84.3 | 86 | 94.1 |
RoBERTaᵃ | 83.8 | 83.1 | 74.3 | 88.5 | 77.6 | 84.4 | 86.4 | 94.2 |
DistilBERTᵃ | 84 | 83.1 | 72.3 | 87.3 | 75 | 82.8 | 84.2 | 93.3 |
CamemBERTᵃ | 79.1 | 78.7 | 73.6 | 87.8 | 74.7 | 81.7 | 82.5 | 92.3 |
ALBERTᵃ | 81.2 | 80.4 | 74.3 | 88.2 | 73.8 | 81.5 | 83.3 | 93 |
FlauBERTᵃ | 82.6 | 81.7 | 72.9 | 87.5 | 72.2 | 80 | 83.4 | 92.7 |
RobBERTᵃ | 78.8 | 78.4 | 71.1 | 86.2 | 73.8 | 81.6 | 83 | 92.6 |
ᵃA pretrained model was used: bert-base-multilingual-cased for BERT, roberta-base for RoBERTa, distilbert-base-cased for DistilBERT, camembert-base for CamemBERT, albert-base-v1 for ALBERT, flaubert-base-cased for FlauBERT, and robbert-v2-dutch-base for RobBERT.
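Accuracy and F1 score can diverge sharply when one class is rare. The random forest results for the scientific category show this (F1 score 27.3%, accuracy 80.4%). A minimal sketch with invented confusion-matrix counts, which are not data from the study, illustrates the effect. The text does not say whether F1 was computed for the positive class or averaged over classes, so the positive-class F1 used here is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and positive-class F1 score from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Invented counts: 20 of 100 tweets are positive, and the model finds only 4 of them.
accuracy, f1 = accuracy_and_f1(tp=4, fp=0, fn=16, tn=80)
print(f"accuracy={accuracy:.2f}, F1={f1:.2f}")  # accuracy=0.84, F1=0.33
```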
Although the BERT-based models obtained better performance metrics in terms of accuracy, their training and validation times were much higher than those of the random forest, recurrent neural network, and bidirectional long short-term memory models. For example, BERT-based models take approximately 15 times longer than random forest models (
The relative improvements in accuracy of the best BERT-based model (Categorization 1: DistilBERT 83.1%; Categorization 2: RoBERTa 88.5%; Categorization 3: RoBERTa 84.4%; Categorization 4: RoBERTa 94.2%) over the best of the random forest, recurrent neural network, and bidirectional long short-term memory models (Categorization 1: recurrent neural network 82.6%; Categorization 2: bidirectional long short-term memory 86.7%; Categorization 3: bidirectional long short-term memory 78.7%; Categorization 4: bidirectional long short-term memory 85.8%) were 0.61%, 2.08%, 7.24%, and 9.79%, respectively.
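These percentages are relative improvements: the difference in accuracy divided by the accuracy of the best simpler model. In absolute terms, the gaps are 0.5, 1.8, 5.7, and 8.4 percentage points. A short check using only values from the classification performance table:

```python
# Best accuracies (%) taken from the classification performance table.
BEST_BERT_BASED = {1: 83.1, 2: 88.5, 3: 84.4, 4: 94.2}
BEST_SIMPLER = {1: 82.6, 2: 86.7, 3: 78.7, 4: 85.8}

def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a percentage of the baseline."""
    return 100 * (new - baseline) / baseline

for task in BEST_BERT_BASED:
    absolute = BEST_BERT_BASED[task] - BEST_SIMPLER[task]
    relative = relative_improvement(BEST_BERT_BASED[task], BEST_SIMPLER[task])
    print(f"Categorization {task}: {absolute:.1f} points, {relative:.2f}% relative")
```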
Implementation time.
Model | Having eating disorders or not: time, s | Encouraging eating disorders or not: time, s | Informative or not: time, s | Scientific or not: time, s |
Random forest | 1.74 | 12.8 | 10.4 | 12.9 |
Recurrent neural network | 152.1 | 163.1 | 151.5 | 153.7 |
Bidirectional long short-term memory | 163.2 | 175.3 | 164.8 | 167.9 |
Bidirectional encoder representations from transformer–based | 1257.4 | 1232.1 | 1292.7 | 1311.4 |
RoBERTa | 1116.2 | 1158.8 | 1142.5 | 1192.8 |
DistilBERT | 1343.3 | 1327.8 | 1332.0 | 1362.3 |
CamemBERT | 1472.3 | 1457.5 | 1462.0 | 1493.4 |
ALBERT | 1372.7 | 1352.3 | 1331.3 | 1392.5 |
FlauBERT | 1203.9 | 1207.1 | 1202.1 | 1235.1 |
RobBERT | 1234.4 | 1215.4 | 1319.7 | 1123.5 |
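The implementation time table shows directly how much slower the transformer models are than each simpler model. The sketch below copies those values. For each task, it reports the smallest and largest ratio across the 7 BERT-based models.

```python
# Training plus validation times in seconds, copied from the implementation time table.
# Column order: having eating disorders, encouraging, informative, scientific.
TIMES = {
    "Random forest": [1.74, 12.8, 10.4, 12.9],
    "Recurrent neural network": [152.1, 163.1, 151.5, 153.7],
    "Bidirectional long short-term memory": [163.2, 175.3, 164.8, 167.9],
    "BERT": [1257.4, 1232.1, 1292.7, 1311.4],
    "RoBERTa": [1116.2, 1158.8, 1142.5, 1192.8],
    "DistilBERT": [1343.3, 1327.8, 1332.0, 1362.3],
    "CamemBERT": [1472.3, 1457.5, 1462.0, 1493.4],
    "ALBERT": [1372.7, 1352.3, 1331.3, 1392.5],
    "FlauBERT": [1203.9, 1207.1, 1202.1, 1235.1],
    "RobBERT": [1234.4, 1215.4, 1319.7, 1123.5],
}
TRANSFORMERS = ["BERT", "RoBERTa", "DistilBERT", "CamemBERT", "ALBERT", "FlauBERT", "RobBERT"]

def slowdown_range(baseline):
    """Per task, the (min, max) ratio of transformer time to the baseline model's time."""
    ranges = []
    for task in range(4):
        ratios = [TIMES[m][task] / TIMES[baseline][task] for m in TRANSFORMERS]
        ranges.append((round(min(ratios), 1), round(max(ratios), 1)))
    return ranges

for baseline in ("Random forest", "Recurrent neural network", "Bidirectional long short-term memory"):
    print(baseline, slowdown_range(baseline))
```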
Practitioners and researchers can benefit from the use of social media data in the field of eating disorders. Although the model with the best accuracy was always one of the pretrained BERT-based models, their computational costs compared with those of simpler models may be excessive. The relative difference between the accuracy of the best BERT-based model and the best of the 3 simpler models (random forest, recurrent neural network, and bidirectional long short-term memory) did not exceed 9.79%.
Given the high computational cost, use of bidirectional encoder representations from transformer–based models in this instance may not be essential. The accuracy for the 4 different categorization tasks is relatively high even in the simplest models.
Despite the fact that we used only 1877 tweets (which is similar to the amounts used in previous studies: 2219 [
For the classification of tweets into informative or noninformative (categorization 3), our models obtained a higher accuracy (80%-84.4%) than those in previous studies (77.7% [
This research has several limitations: (1) it was limited to a single social media platform, (2) some categorization tasks were not balanced, which may lead to bias in the generated models, (3) the training set was sufficient but could be larger to obtain better results in a real environment, and (4) when labeling tweets, there may have been bias in determining whether a tweet was written by someone with an eating disorder because of the lack of information about the user.
Machine learning and deep learning models were used to classify eating disorder–related tweets into binary categories in 4 categorization tasks, with the best models reaching accuracies greater than 80% in every task. The best performing models were RoBERTa and DistilBERT, both BERT-based classification methods.
The computational cost was much higher for the BERT-based models than for the simpler models (random forest, recurrent neural network, or traditional bidirectional long short-term memory); the time invested in training and validation was greater by a factor of approximately 10.
Future work will include (1) increasing the training and validation data set, (2) applying natural language processing techniques that use ontologies, which make it possible to include automation and logical reasoning, (3) integrating predictive models in a real-world development project, such as a Twitter bot, and (4) validating the model using texts written by patients with eating disorders who are in treatment.
None declared.