Authors:
(1) Anh V. Vu, University of Cambridge, Cambridge Cybercrime Centre ([email protected]);
(2) Alice Hutchings, University of Cambridge, Cambridge Cybercrime Centre ([email protected]);
(3) Ross Anderson, University of Cambridge, and University of Edinburgh ([email protected]).
Table of Links
Abstract and 1 Introduction
2. Deplatforming and the Impacts
2.1. Related Work
2.2. The Kiwi Farms Disruption
3. Methods, Datasets, and Ethics, and 3.1. Forum and Imageboard Discussions
3.2. Telegram Chats and 3.3. Web Traffic and Search Trends Analytics
3.4. Tweets Made by the Online Community and 3.5. Data Licensing
3.6. Ethical Considerations
4. The Impact on Forum Activity and Traffic, and 4.1. The Impact of Major Disruptions
4.2. Platform Displacement
4.3. Traffic Fragmentation
5. The Impacts on Relevant Stakeholders and 5.1. The Community that Started the Campaign
5.2. The Industry Responses
5.3. The Forum Operators
5.4. The Forum Members
6. Tensions, Challenges, and Implications and 6.1. The Efficacy of the Disruption
6.2. Censorship versus Free Speech
6.3. The Role of Industry in Content Moderation
6.4. Policy Implications
6.5. Limitations and Future Work
7. Conclusion, Acknowledgments, and References
Appendix A.
3. Methods, Datasets, and Ethics
Our primary method is data-driven, with findings supported by quantitative evidence derived from multiple longitudinal data sources, which we collect on a regular basis. Where quantitative measurements require enrichment – as when analysing relevant public statements of tech firms directly involved in the disruption, and announcements made by the forum operators – we use qualitative content analysis.
3.1. Forum and Imageboard Discussions
Besides common mainstream social media channels like Facebook and Twitter, independent platforms such as xenForo[4] and Infinity[5] have gained popularity as tools for building online communities. Despite being less visible and requiring more upkeep, these can offer greater resistance against external intervention as the operators have full control over the content and databases, thereby allowing easy backup and redeployment in case of disruption. These platforms typically share a hierarchical data structure ranging from bulletin boards down to threads linked to specific topics, each containing several posts. While facilitating free speech, these also increasingly nurture and disseminate hate and abusive speech. We have been scraping the two most active forums associated with online harassment for years due to their increasingly toxic content, as part of the EXTREMEBB dataset [62]: KIWI FARMS and LOLCOW FARM.
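The board, thread and post hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class and field names (`Board`, `Thread`, `Post`, `iter_posts`) are assumptions made for this example and do not reflect the actual xenForo or Infinity database schemas, nor the EXTREMEBB storage format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Post:
    post_id: int
    author: str  # a pseudonym, or 'Anonymous' on an imageboard
    body: str


@dataclass
class Thread:
    thread_id: int
    title: str
    posts: List[Post] = field(default_factory=list)


@dataclass
class Board:
    name: str
    threads: List[Thread] = field(default_factory=list)


def iter_posts(boards: List[Board]) -> Iterator[Post]:
    """Walk the hierarchy top-down: board -> thread -> post."""
    for board in boards:
        for thread in board.threads:
            yield from thread.posts
```

Because every post is reachable from a board through exactly one thread, a traversal of this shape visits each post once.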
Our collection includes not only posts but also associated metadata such as posting time, user profiles, reactions, and levels of toxicity, identity attack and threat measured by the Google Perspective API as of January 2023.[6] Perspective API also offers other measures such as insult and profanity [63], but we exclude these due to lack of relevance to the aim of this paper. This API uses crowdsourced annotations for model training and substantially outperforms the alternatives [64]. We strive to ensure data completeness by designing our scrapers to visit all sub-forums, threads, and posts while keeping track of every single crawl’s progress to resume incrementally in case of any interruption. A summary of the forum discussion data is shown in Table 1.
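To illustrate what tracking crawl progress and resuming incrementally can look like, here is a minimal sketch. It is not the scraper used for this dataset: the checkpoint file format, the function names, and the `fetch_page` callback (a hypothetical downloader that returns the posts on one page of a thread) are all assumptions made for the example.

```python
import json
import os
from typing import Callable, Dict, List


def load_checkpoint(path: str) -> Dict[str, int]:
    """Return {thread_id: last page fully crawled}, or {} on a first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, progress: Dict[str, int]) -> None:
    """Write to a temporary file, then rename, so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(progress, fh)
    os.replace(tmp, path)


def crawl(
    threads: Dict[str, int],
    fetch_page: Callable[[str, int], List[str]],
    checkpoint_path: str,
) -> List[str]:
    """Crawl every page of every thread, skipping pages already recorded.

    `threads` maps a thread id to its page count; `fetch_page` is the
    hypothetical downloader described in the lead-in.
    """
    progress = load_checkpoint(checkpoint_path)
    collected: List[str] = []
    for thread_id, n_pages in threads.items():
        start = progress.get(thread_id, 0) + 1
        for page in range(start, n_pages + 1):
            collected.extend(fetch_page(thread_id, page))
            # Record progress only after the page has been fetched successfully.
            progress[thread_id] = page
            save_checkpoint(checkpoint_path, progress)
    return collected
```

If a run is interrupted part-way through a thread, the next call to `crawl` with the same checkpoint path starts at the first page that was not recorded, rather than from the beginning.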
KIWI FARMS is built on xenForo, but the operators have maintained the forum themselves since late 2021, when xenForo officially revoked their license. Our data cover the entire history of the forum from early January 2013 to the end of 2022, with 10.1M posts in 48.3k threads made by 59.2k active users, providing a full picture of its evolution over time. While some extremist forums experienced fluctuating activity and rapid declines in recent years [62], KIWI FARMS showed stable growth until it was significantly disrupted in 2022 (see Figure 1). Our data precisely capture major reported suspensions, including those in 2017 and 2022.
The primary rival of KIWI FARMS is LOLCOW FARM, an imageboard built on Infinity [65], [66]. While KIWI FARMS discussions are largely text-based, LOLCOW FARM is centred on descriptive images. While KIWI FARMS users adopt pseudonyms, LOLCOW FARM users mostly remain hidden under the unified ‘Anonymous’ handle. We gathered a complete snapshot of LOLCOW FARM from its inception in June 2014 to the end of 2022, encompassing 4.6M posts made in 10.0k threads. LOLCOW FARM has far fewer threads, but each typically contains many posts. This collection brings the total number of posts for both forums to 14.7M (and still growing). We exclude LOLCOW, a smaller competitor to KIWI FARMS (also based on xenForo), as it vanished in mid-2022 and had fewer than 30k posts in total. As LOLCOW FARM is now the largest competitor, analysing it lets us estimate platform displacement when KIWI FARMS was down.
3.2. Telegram Chats
During periods of inaccessibility, the activity level increased in the Telegram groups associated with KIWI FARMS. There are two channels: one is primarily used by the forum operators to disseminate announcements and updates, particularly about where and when the forum could be accessed; the other is used by the forum users mainly for normal discussions. Both channels permit public access, allowing people to join and view historical messages. We used Telethon[7] to collect a snapshot of these channels during their entire lifespan until the end of 2022, encompassing 525k messages, 298k replies, and associated metadata such as view counts and 356k emoji reactions made by 2,502 active users. The data is likely complete as our scraper runs in near real time, and messages with metadata are fully captured through the use of official Telegram APIs. As the forum operators are highly incentivised to keep users quickly informed, their announcements provide a reliable incident and response timeline.
3.3. Web Traffic and Search Trends Analytics
We found from announcements in the Telegram group that KIWI FARMS could be accessed through six major domains: the primary one is kiwifarms.net and four alternatives are kiwifarms.ru, kiwifarms.top, kiwifarms.is, and kiwifarms.st, while a Pleroma decentralised web version is at kiwifarms.cc.[8] To investigate how users navigated across these domains when the forum experienced disruption, we analysed traffic analytics towards all six domains provided by Similarweb – the leading platform in the market providing insights and intelligence into web traffic and performance.[9] Their reports aggregate anonymous statistics from multiple inputs, including their own analytic services, data sharing from ISPs and other measurement companies, data crawled from billions of websites, and device traffic data (both website and app) such as plugins, add-ons and pixel tracking. Their algorithm then extrapolates the substantial aggregated data to the entire Internet space. Their estimation therefore may not be completely precise, but reliably reflects trends at both global and country levels. To test that reliability, we deployed our own infrastructure to collect over 19M ground-truth traffic records over six months, grouped them into 30-minute sessions, and compared the resulting counts with Similarweb visits. We find that, although Similarweb underestimates the amount of traffic because of how repeat pageviews are counted, it captures trends with a strong positive linear relationship (Pearson correlation coefficient r = 0.83). Our analysis in the next section also suggests a high correlation between the traffic data and the forum activity.
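The reliability check described above (grouping raw traffic records into 30-minute sessions and correlating the counts with third-party visit estimates) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: it assumes each record is a `(visitor_id, unix_timestamp)` pair, that a new session begins when the same visitor has been inactive for more than 30 minutes, and that sessions are counted per UTC day. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity gap that ends a session
SECONDS_PER_DAY = 86400


def count_sessions_per_day(records: Iterable[Tuple[str, int]]) -> Dict[int, int]:
    """Return {day index since the Unix epoch: number of sessions started that day}."""
    by_visitor: Dict[str, List[int]] = defaultdict(list)
    for visitor, ts in records:
        by_visitor[visitor].append(ts)

    sessions: Dict[int, int] = defaultdict(int)
    for stamps in by_visitor.values():
        stamps.sort()
        last = None
        for ts in stamps:
            # A request opens a new session if it is the visitor's first,
            # or if it follows more than 30 minutes of inactivity.
            if last is None or ts - last > SESSION_GAP_SECONDS:
                sessions[ts // SECONDS_PER_DAY] += 1
            last = ts
    return dict(sessions)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Given daily session counts from `count_sessions_per_day` and the corresponding daily visit estimates aligned on the same days, `pearson_r` measures how closely the two series move together. The coefficient is insensitive to a constant scaling factor, which is why an estimator can underestimate volume and still correlate strongly with ground truth.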
As Similarweb does not offer an academic license, we use a free trial account[10] to access longitudinal web traffic and engagement data covering the preceding three months. This includes information about total visits, unique visitors, visit duration, pages per visit, bounce rate, and page views. It also provides figures on search activity, data for marketing such as visit sources (e.g., direct, search, email, social, referral, ads), and non-temporal insight into audience geography and demographics. These data, covering both desktop and mobile traffic, provide valuable perspectives. They span from July to December 2022, two months before and four months after the disruption; this time frame is sufficient as there was no significant industry intervention against the forum in the past (as shown in Figure 1), and the disruption campaign mostly ended after a few months (see §4). In addition, we collected search trends by country and territory over time from Google Trends, covering the entire lifetime of the forum. Both of these datasets are likely to be complete as they were gathered directly from Similarweb and Google.
[4] The xenForo Platform: https://xenforo.com/
[5] The Infinity Imageboard: https://github.com/ctrlcctrlv/infinity/
[6] Google Perspective API: https://perspectiveapi.com/