The Evolution of Rumors on a Closed Social Networking Platform During COVID-19: Algorithm Development and Content Study

Background: In 2020, the COVID-19 pandemic put the world in a crisis regarding both physical and psychological health. Simultaneously, a myriad of unverified information flowed on social media and online outlets. The situation was so severe that the World Health Organization identified it as an infodemic in February 2020. Objective: The aim of this study was to examine the propagation patterns and textual transformation of COVID-19–related rumors on a closed social media platform. Methods: We obtained a data set of suspicious text messages collected on Taiwan’s most popular instant messaging platform, LINE, between January and July 2020. We proposed a classification-based clustering algorithm that could efficiently cluster messages into groups, with each group representing a rumor. For ease of understanding, a group is referred to as a “rumor group.” Messages in a rumor group could be identical or could have limited textual differences between them. Therefore, each message in a rumor group is a form of the rumor. Results: A total of 936 rumor groups with at least 10 messages each were discovered among 114,124 text messages collected from LINE. Among the 936 rumors, 396 (42.3%) were related to COVID-19. Of the 396 COVID-19–related rumors, 134 (33.8%) had been fact-checked by International Fact-Checking Network–certified agencies in Taiwan and determined to be false or misleading. By studying the prevalence of simplified Chinese characters or phrases in the messages that originated in China, we found that COVID-19–related messages, compared to non–COVID-19–related messages, were more likely to have been written by non-Taiwanese users. The association was statistically significant, with P<.001, as determined by the chi-square independence test. The qualitative investigation of the three most popular COVID-19 rumors revealed that key authoritative figures, mostly medical personnel, were often misquoted in the messages.
In addition, these rumors resurfaced multiple times after being fact-checked, usually preceded by major societal events or textual transformations. Conclusions: To fight the infodemic, it is crucial that we first understand why and how a rumor becomes popular. While social media has given rise to an unprecedented number of unverified rumors, it also provides a unique opportunity for us to study the propagation of rumors and their interactions with society. Therefore, we must put more effort into these areas.


Introduction
Online social media has democratized content. By creating a direct path from content producers to consumers, the power to produce and share information has been redistributed from a limited set of parties to the general population. However, social media platforms have also given rise to the proliferation of misinformation and enabled the fast dissemination of unverified rumors [1][2][3]. In 2020, the COVID-19 pandemic put the world in a crisis regarding both physical and psychological health. A myriad of unverified information flowed on social media. Rumors and claims of erroneous health practices even interfered with the control of COVID-19 in various parts of the world [4,5]. The World Health Organization (WHO) identified this situation as an infodemic in February 2020 [6], indicating its seriousness.
Previous studies revealed that people relied on social media to gather COVID-19 information and guidelines [7,8]. Efforts have, thus, been put into studies examining true and false rumors on social media [9][10][11]. For example, Cinelli et al [9] compared feedback to reliable and questionable COVID-19 information across five platforms, including Twitter, YouTube, and Gab. Gallotti et al [10] looked at how much unreliable COVID-19 information Twitter users were exposed to across countries.
Machine learning and deep learning techniques have been employed to study COVID-19 posts on social media, with much of the focus on topic modeling, sentiment analysis, and misinformation detection [12][13][14][15][16][17][18][19][20][21][22][23][24][25]. Both sentiment analysis and misinformation detection are supervised classification problems. Many studies have employed the Valence Aware Dictionary and Sentiment Reasoner (VADER) model or long short-term memory (LSTM) for sentiment analysis and ensemble machine learning models, such as Extreme Gradient Boosting (XGBoost), for misinformation detection [13][14][15][17][18][19][23][25]. Topic modeling, on the other hand, is an unsupervised clustering method. Among topic modeling studies, latent Dirichlet allocation (LDA) was the most widely used algorithm [12,15,17,19,21,22,24], and other favorites included k-means clustering [14,16]. For example, Chandrasekaran et al [15] utilized LDA to extract 26 topics among 13.9 million English COVID-19 Twitter posts; they then adopted the VADER model to compute sentiment scores for each topic. Jelodar et al [19] employed LDA to extract topics from 560,000 COVID-19 Twitter posts and then used the LSTM neural network to identify the sentiments of the posts. Kwok et al [21] employed LDA to extract topics and Stanford University's CoreNLP (natural language processing) toolkit to study the sentiments of Twitter posts regarding COVID-19 vaccinations from Australian Twitter accounts. Also, Chen et al [16] compared the COVID-19 discussions on Twitter and Weibo, using t-distributed stochastic neighbor embedding dimensionality reduction with the k-means clustering algorithm to extract topics.
Despite the instructive knowledge provided by the aforementioned machine learning studies, there are two identifiable gaps. First, most studies concentrated on public social media platforms, with the majority using Twitter as the data source [12][13][14][15][16][17][18][19][21][22][24][25]. Investigations on closed social media platforms, such as WhatsApp, WeChat, Telegram, or LINE, remain extremely scarce. Second, most studies looked at posts via their high-level theme, such as "misconceptions and complaints about COVID-19 control" [21], "psychological stress" [17], or "government response" [15]. Limited effort has been put into the study of individual narratives or rumors under a high-level theme, for example, rumors such as "protect yourself from coronavirus by putting bleach in your body" and "check for COVID-19 by holding your breath for 10 seconds or longer" under the theme of "erroneous health practices." While high-level themes and sentiments can give us an overview of the public discourse, the capability to efficiently identify individual narratives would be extremely helpful for picking up trending rumors and claims. Discussions on social media platforms are most likely not independent from each other. Thus, simply looking at billions of individual messages is not effective for identifying which rumors are receiving attention. Therefore, there is an apparent need for an efficient way to group and extract the narratives to recognize the popular ones.
Recognizing the limitations of previous studies and aiming to solve the aforementioned problem, our goal was to use machine learning to identify individual COVID-19 rumors from a pool of social media messages, as shown in Figure 1. After identifying the rumors, we then investigated the propagation patterns and textual transformation of those rumors on a closed platform. To achieve this, we proposed a classification-based clustering algorithm to efficiently group tens of thousands of messages according to their similarity. Then, we applied the algorithm to the suspicious messages on LINE, a popular messaging platform in Taiwan. Furthermore, according to the clustering results, we investigated how the messages evolved from temporal and cultural perspectives during the pandemic. To the best of our knowledge, this is the first study to examine COVID-19 rumor diffusion on a closed platform.

Data Collection
LINE is an instant messaging platform. According to the 2018 Taiwan Communication Survey (TCS), 98.5% of people in Taiwan used LINE as their primary messaging tool [26], making it the most popular closed messaging platform. In light of the increasing amount of unreliable information being exchanged through LINE, fact-checking agencies or groups, such as the Taiwan FactCheck Center, Cofacts, or MyGoPen, have developed LINE chatbots for users to voluntarily forward suspicious messages. These chatbots archive the messages and check them against their existing databases to reply with the fact-checked results.
We obtained a data set of suspicious messages forwarded by LINE users to a fact-checking LINE bot between January and July 2020. The data set included messages related to COVID-19 as well as other topics.
Along with the text content of each reported message, we also obtained the report time of each message and a unique identifier for the LINE user that reported the message. The user identifiers we received were scrambled; therefore, it was not possible for us to use the identifiers to attribute any reported message back to any actual LINE user.

Data Preprocessing
After obtaining the text messages, we preprocessed them using the following steps. First, we removed all characters that were neither simplified Chinese nor traditional Chinese. Second, we tokenized each message using the Jieba library [27] in Python (version 3.7; Python Software Foundation) and then removed tokens that were Chinese stop words from the token list. To focus on longer messages, we only kept messages with at least 20 tokens from our data set. Finally, the CountVectorizer module from Python's scikit-learn package [28] was used to create a binary word vector for each message.
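The preprocessing steps can be sketched as follows. This is a minimal illustration, not the exact study pipeline: `tokenize` stands in for the jieba tokenizer, and the stop word list and sample messages are placeholders.

```python
import re

def keep_chinese(text):
    # Step 1: keep only CJK characters (covers both simplified and
    # traditional Chinese); Latin letters, digits, punctuation, and
    # whitespace are all dropped.
    return "".join(re.findall(r"[\u4e00-\u9fff]+", text))

def preprocess(messages, tokenize, stop_words, min_tokens=20):
    """Tokenize each cleaned message, drop stop words, and keep only
    messages with at least `min_tokens` tokens (step 2 and the
    length filter)."""
    kept = []
    for msg in messages:
        tokens = [t for t in tokenize(keep_chinese(msg))
                  if t not in stop_words]
        if len(tokens) >= min_tokens:
            kept.append(tokens)
    return kept
```

In the actual pipeline, `tokenize` would be jieba's `lcut`, and the kept token lists would then be turned into binary word vectors with scikit-learn's `CountVectorizer(binary=True)`.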

Clustering Messages Into Rumor Groups by the Classification-Based Clustering Algorithm
In order to determine which messages belonged to the same rumor, we needed to define a distance between messages. We wanted two messages, A and B, to be close to each other if the overlapping text between the two constituted the majority of both messages. When the overlapping text makes up the majority of A but not B, it signals that message A constitutes only a portion of B, meaning that B is likely a combination of several other rumors. In this situation, A and B should be in different groups; therefore, we would like the distance between them to be larger. Based on this idea, we defined the distance between two messages, A and B, as follows:

d(A, B) = 1 − |tok(A) ∩ tok(B)| / max(|tok(A)|, |tok(B)|) (1)

where tok(·) is the set of tokens of one message and |·| denotes the number of elements in a set.
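As a concrete sketch, the distance in equation 1 can be computed directly on token sets; the sample tokens below are hypothetical.

```python
# Distance from equation 1, computed on the token sets of two messages:
# d(A, B) = 1 - |tok(A) ∩ tok(B)| / max(|tok(A)|, |tok(B)|).
def distance(tokens_a, tokens_b):
    ta, tb = set(tokens_a), set(tokens_b)
    return 1.0 - len(ta & tb) / max(len(ta), len(tb))

short = ["mask", "wash", "hands", "often"]
longer = short + ["drink", "salt", "water", "daily"]
# The overlap covers all of `short` but only half of `longer`, so
# `longer` looks like a combination of rumors and stays far from `short`.
print(distance(short, longer))  # 1 - 4/8 = 0.5
print(distance(short, short))   # identical messages -> 0.0
```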
While most prior work relied on the LDA or k-means algorithm to separate messages into groups, both algorithms require a predefined number of final groups. That is, users must tell the algorithm how many groups to separate the messages into before applying it. Since what we wanted to discover was precisely how many narratives, or rumors, there were among all the messages, as measured by the distance (equation 1) among all messages, such a requirement contradicted our needs. Hierarchical agglomerative clustering (HAC), on the other hand, starts by merging messages that are close to one another into clusters and then iteratively merges close clusters until the distance between every pair of clusters exceeds a predefined threshold. That is, instead of predefining a specific number of final groups as in LDA or k-means clustering, HAC determines the number of groups based on a predefined distance threshold. In addition, HAC has the advantage of accepting self-defined distance metrics. Therefore, HAC was the clustering algorithm that fit our needs.
However, HAC can be quite slow and memory consuming; it scales poorly to large data sets, especially in the case of social media messages. Therefore, we devised a classification-based clustering algorithm, one that combined the k-nearest neighbors (KNN) algorithm with HAC, to efficiently perform the clustering task. The idea was to randomly select a portion of messages on which to perform HAC; the result was then used to train a KNN classifier, which was subsequently used to predict the groups of the remaining messages. A detailed algorithm is outlined in Textbox 1, and a flowchart of the algorithm is presented in Figure 2. The experimentation details are outlined in Multimedia Appendix 1, and we demonstrate the efficiency and effectiveness of this algorithm in the following subsection. The algorithm was implemented with the KNeighborsClassifier and AgglomerativeClustering modules from the Python library scikit-learn [28]. The library gensim (version 3.8.3) [29] was also used in experiments to implement the LDA model for comparisons. We released the code to implement the model in a GitHub repository [30].

Textbox 1. The classification-based clustering algorithm (hierarchical agglomerative clustering plus k-nearest neighbors algorithm).

Notation:
1. (A)_j: the j-th element of set A.

Input:
1. D: the set of all documents to be grouped.
2. D_T: the set of tokenized documents, with the order preserved as in D.

Algorithm:
1. Randomly select u × |D_T| elements from D_T, denoted as D_T^u; denote the unselected rest as D_T^v.
2. Compute the pairwise distance matrix M of D_T^u, using the distance defined in equation 1.
3. Feed M into hierarchical clustering with a distance threshold of λ. We will get back a sequence of labels L_u, where (L_u)_i is the label of element (D_T^u)_i. Elements with the same label are in the same cluster. Since the label itself does not carry meaning, manipulate the labels so they are all nonnegative whole numbers.
4. For each unique label in L_u, apply this manipulation to obtain the adjusted label sequence L'_u.
5. Train a k-nearest neighbors classifier K using the training set (D_T^u, L'_u). Then use K to predict the labels of D_T^v. Denote the prediction as L_v.
6. Construct L by combining L'_u and L_v, where (L)_i is the label of (D_T)_i.

Comparing the Classification-Based Clustering Algorithm With Other Popular Algorithms
From Figure 3, we can see that the classification-based clustering algorithm, the HAC+KNN model, greatly reduced the runtime compared to using only HAC, especially when the train portion value u was less than 0.60. Furthermore, such a significant gain in speed did not compromise the clustering results. With the HAC model's results as the gold standard for comparison, the precision values (Figure 4), recall values (Figure 5), and F scores (Figure 6) from the HAC+KNN model remained greater than 99% when the train portion u was not lower than 0.40. The results demonstrated that the HAC+KNN model's assignments of groups were complete, as measured by recall, and that the use of KNN did not introduce too many errors in each group, as measured by precision.

We observed that the runtime of the k-means clustering was 10 times slower than that of the HAC algorithm, and the LDA model's runtime was the slowest among all models (Table 1). In addition, the precision of the LDA model was very low, meaning that its predicted groups had many false positives. While the precision of the k-means model was comparable to that of the HAC+KNN model, its recall was only 73%. This showed that the k-means model missed out on many messages.

Determining Whether a Rumor Is Related to COVID-19
A rumor group contains many messages. To determine if a rumor group is related to COVID-19, we first identified how many messages in the group contained any of the COVID-19 keywords from the list that we put together (Textbox 2). Next, rumor groups with more than 60% of messages containing COVID-19-related keywords were passed to the authors to decide if such a rumor was really about COVID-19. If a rumor was deemed COVID-19-related, then all messages in the group were also deemed COVID-19-related, regardless of whether that message itself contained the keywords. Recognizing COVID-19-relatedness by close neighbors of each message is a more inclusive approach, as there were messages without the keywords that were obviously related to the pandemic; see Multimedia Appendix 2 as an example.
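The keyword screening step can be sketched as follows; the group structure and keyword list here are placeholders (the actual keyword list is in Textbox 2), and the 60% threshold comes from the text above.

```python
def candidate_covid_groups(rumor_groups, keywords, threshold=0.6):
    """Flag rumor groups in which more than `threshold` of the messages
    contain at least one COVID-19 keyword; flagged groups then go to
    manual review. `rumor_groups` maps a group label to its messages."""
    flagged = []
    for label, messages in rumor_groups.items():
        hits = sum(any(kw in msg for kw in keywords) for msg in messages)
        if hits / len(messages) > threshold:
            flagged.append(label)
    return flagged
```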

Data Set
Our data set, after preprocessing, contained 114,124 messages. The character distribution is presented in Table 2, and the number of messages reported per date is shown in Figure 7.

Rumor Group Overview
By using the HAC+KNN algorithm, 114,124 messages were separated into 12,260 rumor groups. A total of 8529 rumor groups had only 1 message. Therefore, the rest of the 105,595 messages were separated into 3731 rumor groups. There were 936 rumor groups with at least 10 messages, with the largest one having 2546 messages. We present the statistics of the rumor group sizes in Table 3. Among 936 rumor groups with at least 10 messages, we identified 396 (42.3%) that were related to COVID-19; these consisted of a total of 42,829 messages. Among 396 COVID-19-related rumor groups, 134 (33.8%) were deemed false or misleading by either the Taiwan FactCheck Center or MyGoPen, two International Fact-Checking Network (IFCN)-certified fact-checking agencies in Taiwan.
After recognizing many messages containing simplified Chinese characters or phrases originating from China, we compared the prevalence of those characters and phrases between COVID-19-related and non-COVID-19-related messages. Compared to non-COVID-19-related messages, the pool of COVID-19-related messages had significantly more messages using simplified Chinese characters or phrases that originated from China (Table 4). The association was significant as determined by the chi-square independence test with Yates' continuity correction (χ²₁=1088.0, n=96,373; P<.001). The COVID-19-related rumor group sizes had a very long-tailed distribution (Figure 8). Most of the rumor groups contained only a few messages. In fact, only 15 rumor groups contained more than 1000 messages. In the following subsection, we discuss how we qualitatively analyzed the three COVID-19 rumor groups with the largest number of messages.
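A 2×2 test of this kind can be reproduced with SciPy; the counts below are purely illustrative, not the study's actual contingency table.

```python
from scipy.stats import chi2_contingency

# Rows: COVID-19-related vs non-COVID-19-related messages;
# columns: contains simplified-Chinese characters/phrases vs not.
# These counts are hypothetical, for illustration only.
table = [[9000, 33000],
         [4000, 50000]]

# For 2x2 tables, chi2_contingency applies Yates' continuity
# correction by default (correction=True).
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof})={chi2:.1f}, P={p:.3g}")
```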

Overview
We qualitatively analyzed the three rumor groups with the largest number of messages among the 936 COVID-19-related rumor groups. In fact, a total of 7523 messages from the three rumor groups made up 17.6% of all 42,829 COVID-19-related messages.
To study the interactions of the rumors' popularity with society, we picked out some major societal events, as shown in Table 5. While there were multiple important events regarding the pandemic every day, we picked out the incidents that were first occurrences of their kind.

Case 1: Do Not Go Outside!
The rumor content for Case 1 is presented in Textbox 3. This rumor first appeared in the data set on February 2, 2020. Over the course of 3.5 months, a total of 2119 messages were reported. The reported messages went viral at least four times: peaking on February 22 with 80 messages, on March 16 with 68 messages, reaching the highest number on April 2 with 205 messages, and peaking again on April 7 with 197 messages (Figure 9). During this period, we observed several content changes (Table 6).
English translation: Academician Zhong Nan-Shan emphasized again, "Do not go outside! At least wait until the Lantern Festival." Be warned that even if cured, you would suffer the rest of your life. This is a plague worse than SARS. The side effects of the drugs are more severe. Even if there is special medicine, it could only save your life, nothing more. Think about your family before stepping outside...This is a war, not a game...No one is an outsider in this war...Please share it with others. By Zhong Nan-Shan.
…and 1117 (53.3%) messages, respectively. Efforts were made to emphasize the authoritativeness of the quoted party as well. For example, titles for Zhong Nan-Shan became longer, from "Expert in Pandemic from Mainland China" and "Expert in Coronavirus" to "Expert in Coronavirus from Mainland China, 78-year-old Academician Zhong Nan-Shan." Starting from April 1, 2020, every reported message had Zhong replaced with Chen Shih-Chung (Figure 10). As the Minister of the Ministry of Health and Welfare (MOHW) and director of Taiwan's Central Epidemic Command Center (CECC), Chen's popularity skyrocketed during the pandemic through his daily press conferences. Due to the prevalence of this message spreading on the web and closed platforms, the MOHW and the CECC both sent out a press release [31] on April 2, 2020, reminding the public that this was misinformation. Nevertheless, this did not stop another viral spread of the same message at the end of a 4-day long weekend holiday in Taiwan, during which crowds were seen at every tourist attraction on the island. For days, people worried that the long weekend would lead to another outbreak of the pandemic, providing an explanation as to why the message bearing the key topic "do not go out" would become a big hit.

Case 2: Drinking Salt Water Can Prevent the Spread of COVID-19
This rumor promoted drinking salt water to prevent COVID-19. Interestingly, this rumor was actually the combination of two individual rumors (Table 7). Message B had a peak on March 27 with 265 messages, and Message A+B received the most attention on March 30 with 523 messages (Figure 11). The two constituent messages read as follows:

This is 100% accurate...Why did we see a huge decline of confirmed cases in China during the last few days? They simply forced their citizens to rinse mouths with salted water three times a day and then drink water for 5 minutes. The virus would attack throats before the lungs, and when getting in touch with salted water, the virus would die or get destroyed in lungs. This is the only way to prevent the spread of COVID-19. There is no need to buy medicine as there is nothing effective on the market.

Why did Mainland China show a huge decline of confirmed cases over the last few days? Besides wearing masks and washing hands, they simply rinse mouths with salted water three times a day and then drink water for 5 minutes...Dr Wang of Tung Hospital stated that the novel coronavirus would survive in throats for 4 days before reaching the lungs...If one can drink as much warm water with salt and vinegar as they can, the virus could be destroyed...

Figure 11. The number of Case 2 (ie, "Drinking salt water can prevent the spread of COVID-19") messages reported by date. The rumor had been fact-checked rather early; however, the information still received widespread attention. Message B peaked on March 27 with 265 messages, and the combined message peaked on March 30 with 523 messages. Refer to Table 5 for major societal events. Refer to Table 7 for the contents of Messages A, B, and A+B.

Among 3283 reported messages, 3093 (94.2%) misquoted medical professionals. The most popular misquoted parties were Dr Wang of Tung Hospital and the director of the Veteran Hospital, seen in 2340 (71.3%) and 753 (22.9%) messages, respectively.
Drinking salt water to prevent COVID-19 was a popular false claim about COVID-19 internationally. This rumor was fact-checked several times in March by Taiwan's fact-checking agencies [32,33], and even the WHO had fact-checked a similar claim about rinsing noses with saline [34]. However, this did not stop this piece of misinformation from receiving attention (Figure 11). In fact, several translations of the combined rumor (ie, Message A+B) were observed in April. The translations included English, Indonesian, Filipino, and Tibetan.
The lifespan of this "drink salt water" rumor was rather long. One famous fact-checking platform in Taiwan, MyGoPen, released an article to disprove this false medical advice again in October 2020 [35], 7 months after it was first seen in our data set.

Case 3: This Is a Critical Period; Here Are Some Suggestions
This rumor mentioned that Taiwan "entered a critical period of the pandemic" and provided a list of suggested measures for people to follow (Textbox 4). Some of the suggestions made sense in terms of personal hygiene, while others were without basis. This rumor first appeared in the data set on February 6 and included a total of 2121 messages. Over the 1.5 months of its most popular period, it went viral at least three times: on February 10 with 120 messages, on February 17 with 394 messages, and on March 19 with 543 messages (Figure 12).

10 days from now, Taiwan will be in a critical period to combat COVID-19. Here are some suggested measures.
1. Strictly prohibit going to public places.
2. Take out from restaurants.
3. Eat outside in open spaces.
4. Wash your hands the right way (extremely important).
5. When taking the subway or bus, choose the seats at the front half of the vehicle.
6. Do not wear contact lenses.
7. Eat warm food and more vegetables.
8. Avoid constipation.
9. Drink warm water.
10. Do not visit hair salons.
11. Hang the clothes you're wearing outside for two hours the first moment you get home.
12. Do not wear jewelry.
13. Wash your hands immediately after touching cash or coins. Put coins you just received inside a plastic bag for one day before using them.
14. Do not use a colleague's phone when working. If you have to, disinfect before using.
15. …

Figure 12. The number of Case 3 (ie, "This is a critical period; here are some suggestions") messages reported by date. The rumor was fact-checked several times in early February. However, higher peaks were still seen later on February 17 with 394 messages and, after a month, on March 19 with 543 messages. Refer to Table 5 for major societal events.
Among 2121 reported messages, 1778 (83.8%) misquoted authorities as endorsing the rumor. The Taiwan Medical Association and the CECC director, Chen Shih-Chung, were the most misquoted parties, seen in 1637 (77.2%) and 393 (18.5%) messages, respectively (Figure 13). A major revision of the rumor appeared on February 12 (Table 8), 6 days after the first message. In the revision, the original 18 bullets were pruned to 14, removing the ones that were perhaps more ridiculous or harder to follow. Strong words were also modified to a gentler tone. The Taiwan Medical Association, the most misquoted party, also first appeared in the message.

Figure 13. The Taiwan Medical Association (TMA) was quoted in almost every message in this rumor group, even though the TMA released a statement on February 12, 2020, saying that they did not endorse the material. Later, after Chen Shih-Chung went to the Legislative Yuan on March 18, the same rumor started misquoting him.

After almost a month with only a few messages circulating (Figure 12), on March 18, the CECC director, Chen Shih-Chung, went to the Legislative Yuan (similar to the US Congress) for interpellation about the pandemic. Chen started to be quoted in messages on the same day, making the "suggested measures" look as if they had been said by Chen during his interpellation (Figure 12). The Taiwan Medical Association, which was misquoted in 1637 out of 2121 (77.2%) messages, also released a clarifying statement on February 12 [39], stating explicitly that they did not endorse the material. However, similar to what we observed in the previous two cases, such fact-checking efforts did not prevent the rumor from receiving widespread attention later. Rather, societal events might have played a larger role in the popularity of the rumor. For example, the spike on February 17 (Figure 12) was preceded by the first COVID-19 death and a local cluster in Taiwan.
A taxi driver tested positive for the virus and died the same day on February 15. Over the next few days, four of the driver's family members also tested positive, forming the first local cluster of COVID-19 infection in Taiwan. The highest spike on March 19 was preceded by the CECC director's interpellation in the Legislative Yuan, the event after which the messages started misquoting the director.

Principal Findings
First, we demonstrated that by using a combination of HAC and KNN algorithms, we could efficiently separate a large number of social media text messages into fine-grained narratives, or rumors. The addition of the KNN classification algorithm enabled the speedup and, at the same time, achieved near-equivalent results compared to using HAC alone. Hence, this classification-based clustering algorithm could enable future large-scale studies of rumor transformation with social media post content.
We identified 396 rumors related to COVID-19 from the pool of 114,124 suspicious messages collected from the LINE platform between January and July 2020. Among the COVID-19-related rumors, more than one-third were deemed false or misleading by IFCN-certified fact-checking agencies in Taiwan. Compared to non-COVID-19-related messages, COVID-19-related messages were more likely to contain simplified Chinese characters or phrases originating from China. The association was statistically significant. As the official language in Taiwan is traditional Chinese, the result suggested that COVID-19-related messages were more likely to have originated from non-Taiwanese users than the non-COVID-19-related messages.
We qualitatively investigated the three COVID-19-related rumors with the highest number of messages and observed several commonalities among these highly popular rumors. First, a significant number of messages from all three rumor groups misquoted key authoritative figures. Given the nature of the pandemic, the authorities were usually medical personnel. At times, a change in the quoted authority figures signaled a paradigm shift, indicating whom the public looked up to, for example, from Zhong Nan-Shan to Chen Shih-Chung. At other times, the quoted party did not seem to make any sense. For example, Dr Wang in Case 2 was in fact an orthopedist, a specialty not directly related to COVID-19. Second, in all three rumors, we observed spikes in reported messages even after several fact-checking agencies released reports that deemed the content false or misleading. Echoing the findings of Wood and Porter [40], the current practice of fact-checking did not seem to effectively stop false information from receiving widespread attention later. In fact, by identifying major societal events preceding each resurfacing peak, we assert that resurfacing patterns were influenced more by major societal events and textual transformation. However, each peak of popularity did not last long, and there was often no good explanation of how one wave of attention ended.
Our work offers several insights into the landscape of misinformation on a closed platform, as well as into the behaviors of some popular COVID-19 rumors. These characteristics could serve as rules for the early detection of possible false information. Although we identified these characteristics manually in this study, it is quite possible to employ techniques such as NLP to automatically recognize these textual changes in the future, making it possible to build an automatic early warning system for possible misinformation that operates ahead of professional fact-checking efforts.

Comparison With Prior Work
Our work adds to the limited collection of COVID-19 infodemic studies in closed platforms [41]. Compared with other rumor diffusion studies, such as the study of 17 political rumors by Shin et al [42], this work provided an efficient machine learning algorithm that could enable large-scale rumor evolution studies on social media platforms in the future. In comparison to other machine learning applications in COVID-19 infodemic studies, this work focused on fine-grained narratives, or rumors, rather than high-level topics, in order to study individual rumor propagation. To the best of our knowledge, this is the first study to examine rumor diffusion and propagation patterns of COVID-19 misinformation on a closed platform.

Limitations
This study had several limitations. First, the data were collected by LINE users' reports. Therefore, it was impossible to infer the true distribution of messages without making some assumptions. For example, if there was more health-related misinformation in our data, it did not necessarily translate to more health-related rumors circulating in the platform. In fact, it could also be that people were more alert and skeptical of health-related information. Second, we only looked at text messages. Therefore, information distributed visually or in audio form was not covered. Lastly, our algorithm for grouping messages does not work well with short texts.

Conclusions
While social media may give rise to an unprecedented number of unverified rumors, it also provides a unique opportunity to study rumor propagation. In fact, to combat the infodemic, we need to first understand how and why some rumors become popular. In this study, we proposed an algorithm that enables the research community to perform large-scale studies on the evolution of text messages at the rumor level rather than at the topic level. Moreover, we showed textual commonalities among widespread rumors in Taiwan during COVID-19. We also showed that the attention a rumor received was connected to major societal events and content changes. To the best of our knowledge, this is one of the few studies that has examined COVID-19 misinformation on a closed messaging platform and the first to examine the textual evolution of COVID-19-related rumors during their propagation. We hope that this will spark further studies of rumor propagation patterns as part of the effort to fight the infodemic.