Earlier this year, Facebook rolled back rules against some hate speech and abuse. Along with changes at X (formerly Twitter) that followed its purchase by Elon Musk, the shifts make it harder for social media users to avoid encountering toxic speech.
That does not mean all social networks and other online spaces have given up on the massive challenge of moderating content to protect users. One novel approach relies on artificial intelligence. AI screening tools can analyze content at scale while sparing human screeners the trauma of constant exposure to toxic speech.
Yet AI content moderation faces a dual challenge, according to Maria De-Arteaga, assistant professor of information, risk, and operations management at Texas McCombs: being fair as well as being accurate.
An algorithm may be accurate at detecting toxic speech overall, but it may not detect it equally well across all groups of people and all social contexts.
“If I just look at overall performance, I may say, oh, this model is performing really well, even though it may always be giving me the wrong answer for a small group,” De-Arteaga explains. For example, it might better detect speech that’s offensive to one ethnic group than to another.
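To make that gap concrete, here is a minimal sketch, not the researchers’ code, of how a model can post a respectable overall accuracy while failing badly for a smaller group. The labels, predictions, and group tags below are hypothetical.

```python
# Illustrative only: overall accuracy can hide poor performance on a small group.
from collections import defaultdict

labels      = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]   # 1 = toxic, 0 = nontoxic
predictions = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]   # hypothetical model output
groups      = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"]

correct = defaultdict(int)
total = defaultdict(int)
for y, y_hat, g in zip(labels, predictions, groups):
    total[g] += 1
    correct[g] += int(y == y_hat)

overall = sum(correct.values()) / len(labels)
print(f"overall accuracy: {overall:.2f}")                      # 0.70 -- looks passable
for g in sorted(total):
    print(f"group {g} accuracy: {correct[g] / total[g]:.2f}")  # A: 1.00, B: 0.25
```

In this toy example the model misses every toxic post aimed at group B, yet the single overall number still looks acceptable.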
In new research, De-Arteaga and her coauthors show it’s possible to achieve high levels of both accuracy and fairness. What’s more, they devise an algorithm that helps stakeholders balance the two, finding desirable combinations of accuracy and fairness for their particular situations. De-Arteaga worked with datasets of social media posts that previous researchers had already labeled “toxic” or “nontoxic” (safe). The sets totaled 114,000 posts.
The researchers used a fairness measure called Group Accuracy Parity (GAP), along with formulas that helped train a machine learning model to balance fairness with accuracy. When they applied their approach to the datasets:
- It performed up to 1.5% better than the next-best approaches at treating all groups fairly.
- It performed best at maximizing fairness and accuracy at the same time.
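The following sketch illustrates the general idea of this kind of fairness/accuracy trade-off. It is not the authors’ published algorithm: the candidate models, their scores, and the weighted selection rule are hypothetical. Each candidate is scored on overall accuracy and on a GAP-style parity gap (the difference between its best- and worst-served groups’ accuracy), dominated candidates are filtered out, and a stakeholder’s weighting picks a point on the remaining trade-off curve.

```python
# Illustrative only: choosing among models scored on overall accuracy and a
# GAP-style parity gap. Candidate numbers and the selection rule are
# hypothetical, not taken from the paper.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float    # overall accuracy, higher is better
    parity_gap: float  # best-group accuracy minus worst-group accuracy, lower is better

candidates = [
    Candidate("model_a", accuracy=0.91, parity_gap=0.12),
    Candidate("model_b", accuracy=0.88, parity_gap=0.03),
    Candidate("model_c", accuracy=0.87, parity_gap=0.10),  # dominated by model_b
    Candidate("model_d", accuracy=0.93, parity_gap=0.22),
]

def pareto_front(cands):
    """Keep candidates that no other candidate beats on both criteria."""
    front = []
    for c in cands:
        dominated = any(
            o.accuracy >= c.accuracy and o.parity_gap <= c.parity_gap
            and (o.accuracy > c.accuracy or o.parity_gap < c.parity_gap)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front

def pick(front, fairness_weight):
    """Stakeholder preference: weight group parity more or less heavily."""
    return max(
        front,
        key=lambda c: (1 - fairness_weight) * c.accuracy
                      - fairness_weight * c.parity_gap,
    )

front = pareto_front(candidates)
print([c.name for c in front])                # ['model_a', 'model_b', 'model_d']
print(pick(front, fairness_weight=0.1).name)  # model_d: leans toward raw accuracy
print(pick(front, fairness_weight=0.8).name)  # model_b: leans toward group parity
```

Sweeping the fairness weight from 0 to 1 traces out the kind of accuracy/fairness trade-off curve the paper’s title refers to.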
But GAP is not a one-size-fits-all solution for fairness, De-Arteaga notes. Different measures of fairness may be relevant for different stakeholders. The kinds of data needed to train the systems depend partly on the specific groups and contexts for which they’re being applied.
For example, different groups may have different opinions on what speech is toxic. In addition, standards on toxic speech can evolve over time.
Getting such nuances wrong could mislabel nontoxic speech as toxic and unjustly remove someone from a social space. At the other extreme, missteps could expose more people to hateful speech.
The challenge is compounded for platforms like Facebook and X, which operate globally and serve a wide spectrum of users.
“How do you incorporate fairness considerations in the design of the data and the algorithm in a way that is not just centered on what is relevant in the U.S.?” De-Arteaga says.
For that reason, the algorithms may require continual updating, and designers may need to adapt them to the circumstances and kinds of content they’re moderating, she says. To facilitate that, the researchers have made GAP’s code publicly available.
High levels of both fairness and accuracy are achievable, De-Arteaga says, if designers pay attention to both technical and cultural contexts.
“You need to care, and you need to have knowledge that is interdisciplinary,” she says. “You really need to take those considerations into account.”
The article “Finding Pareto Trade-Offs in Fair and Accurate Detection of Toxic Speech” is published in Information Research.