Case Study

Nesta Innovation Agency

Conducting counter-speech interventions powered by AI to reduce toxicity and community violations on Reddit

Proactive Moderation
1 December 2023
reading time: 5 minutes

Client

Nesta is a UK-based innovation agency whose purpose is to support innovation for social good. They fund practical experiments that help generate evidence on the best approaches to designing and employing collective intelligence to solve social challenges.

For more than two decades Nesta has been designing, testing and scaling new solutions to society’s biggest problems, changing millions of lives for the better.


Context

The main objective of our experiment was to test whether the level of cyberviolence on Reddit can be significantly decreased by community-driven, counter-speech interventions conducted by users in partnership with Artificial Intelligence.

Nowadays, the most common approach to reducing cyberviolence within a given community is a semi-automatic moderation system, where human moderators support themselves with automatic tools that sift through content to identify guideline violations and apply relevant sanctions. In most cases, such a moderator uses a negative motivation system – punishments for violating the community guidelines (e.g. warnings, blocking, banning). Our goal was to test a positive motivation system, in which unwanted behavior is reduced through empathic peer pressure and experiential learning of positive community norms.

A similar experiment was also conducted while working on the JamesWalker43 project.

Solution

We aimed to do so by developing a collective approach, in which Artificial Intelligence is used to detect cyberviolence and notify volunteers in real time so they can get actively involved in de-escalating toxic conversations. Such an approach served as a distributed, bottom-up, voluntary model of moderation.
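To make the mechanics concrete, here is a minimal sketch of such a detect-and-notify loop. It is illustrative only: the keyword heuristic stands in for the actual toxicity model, the print statement stands in for the real-time notification channel, and all names in the snippet are placeholders rather than parts of our production system.

```python
# Illustrative sketch of the detect-and-notify loop (not production code).
from dataclasses import dataclass

TOXICITY_THRESHOLD = 0.7  # assumed cut-off for alerting volunteers


@dataclass
class Comment:
    author: str
    body: str
    permalink: str


def toxicity_score(text: str) -> float:
    """Stand-in for the AI classifier: a crude keyword heuristic, not the real model."""
    insults = {"idiot", "stupid", "trash", "loser"}
    words = text.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in insults)
    return min(1.0, hits / max(len(words), 1) * 5)


def notify_volunteers(comment: Comment, score: float) -> None:
    """Stand-in for the real-time alert (dashboard, chat message, e-mail, etc.)."""
    print(f"[ALERT {score:.2f}] u/{comment.author}: {comment.permalink}")


def monitor(stream) -> None:
    for comment in stream:
        score = toxicity_score(comment.body)
        if score >= TOXICITY_THRESHOLD:
            notify_volunteers(comment, score)  # volunteers then step in to de-escalate


if __name__ == "__main__":
    demo_stream = [
        Comment("user_a", "Thanks, that was a really helpful answer!", "/r/demo/1"),
        Comment("user_b", "You are an idiot and your post is trash.", "/r/demo/2"),
    ]
    monitor(demo_stream)
```

The important design point is that the AI only flags and routes: the de-escalation itself stays in the hands of human volunteers.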

In the end, we were able to compare the effectiveness of the existing Reddit moderation system (predominantly grounded in a punitive, authoritarian paradigm enforced by moderators) with that of the existing system combined with collective intelligence – Artificial Intelligence supported by a crowd of volunteers – who introduced the element of positive peer pressure.

How did it work in practice?

We monitored user activity on Reddit for 30 days to select active users who, besides their regular activity, also published content that violated community standards and regulations.

After that, we sent each of these users at least one intervention to discourage them from violating the rules in the future. The interventions were divided into several categories: Empathy, Norm, Norm Hard, Group Pressure, and Control. The graphic below shows the specifics of each group.

After this period, we returned to passively monitoring the activity of the same users and verified whether the period after the intervention contained less toxic content than the corresponding period before the experiment.
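The comparison itself can be sketched roughly as below. The data, field names, and the simple difference of means are ours for illustration only; the actual analysis was more careful than this.

```python
# Toy sketch of the before/after comparison (illustrative data and names).
from statistics import mean


def toxicity_change(daily_scores: list[float], intervention_day: int, window: int = 30):
    """Return (pre_mean, post_mean, relative_change) around the intervention day."""
    pre = daily_scores[max(0, intervention_day - window):intervention_day]
    post = daily_scores[intervention_day:intervention_day + window]
    pre_mean, post_mean = mean(pre), mean(post)
    return pre_mean, post_mean, (post_mean - pre_mean) / pre_mean


if __name__ == "__main__":
    # Hypothetical per-day toxicity for one user (0 = none, 1 = highly toxic):
    # a brief spike right after the intervention, then a lasting drop.
    scores = [0.30] * 30 + [0.38, 0.35] + [0.26] * 28
    pre, post, change = toxicity_change(scores, intervention_day=30)
    print(f"before: {pre:.2f}  after: {post:.2f}  change: {change:+.0%}")
```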

“One of the many valuable insights I’ve gained throughout this experiment was that speaking out against online negativity requires a lot of resilience. It involves confronting harsh content regularly, a task that is far from easy. Even though the personal attacks were not directed at them, our volunteers found themselves deeply affected.

Many reported feelings of fatigue and a mood decrease due to exposure to aggressive content. This shows just how important it is to support people who are standing up against negativity and reaffirms what we already understand about the psychological toll of content moderation. 

Another valuable finding concerns the initial effect of interventions – they seemed to have a damaging impact at first. However, with time, they led to a significant, positive change in user behavior. Importantly, this pattern isn’t unique to our experiment. Kevin Munger observed a temporary increase in toxic behavior after interventions addressing racism on Twitter, and we also noted a similar trend in our previous intervention experiment.

Our results have shown that persistence is key, and it’s crucial not to give up. Initial attempts at counter-speech may bring frustration and demotivation but the more counter-speech is spread, the higher the likelihood and magnitude of change.

In conclusion, we should continue to explore new avenues to positively influence online discourse while mitigating the harmful effects of exposure to toxicity.”

Marysia Dowgiałło, Interventions Manager

Results

The short-term effect of interventions is damaging: users tend to be on average around 26% more aggressive the next day, but the effect does not last beyond two days.

However, the cumulative effect of interventions is helpful: each intervention (up to around 8-10 in total; beyond that, effectiveness tends to drop) decreases daily aggression by 4% on average, and these effects accumulate, balancing out the short-term effect in the long run.
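As a back-of-the-envelope illustration of how these two findings can coexist, the snippet below combines them under our own simplifying assumptions: the effects multiply, the roughly 26% rebound applies only on the day after an intervention, and each intervention (capped at ten) contributes a lasting 4% reduction.

```python
# Toy model of the reported effects; the way they combine is our assumption.
def expected_aggression(baseline: float, days_since_intervention: int, interventions: int) -> float:
    lasting = (1 - 0.04) ** min(interventions, 10)          # cumulative -4% per intervention
    spike = 1.26 if days_since_intervention <= 1 else 1.0   # short-lived +26% rebound
    return baseline * lasting * spike


if __name__ == "__main__":
    for n in (1, 3, 8):
        day_after = expected_aggression(1.0, days_since_intervention=1, interventions=n)
        long_run = expected_aggression(1.0, days_since_intervention=14, interventions=n)
        print(f"{n} intervention(s): next day {day_after:.2f}x, two weeks later {long_run:.2f}x")
```

Even in this toy model the next-day rebound shrinks as interventions accumulate, and the long-run level of aggression keeps falling, which mirrors the pattern described above.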

The effectiveness of normative interventions seems higher overall, except for less aggressive offenders and low-aggression users, for whom empathetic interventions might be equally or more useful.



Case Studies

We support various businesses in their efforts to moderate online speech. Read our recent case studies and find out how we help our clients all over the world.

Developing AI system that meets the goals of the National Strategy for Preventing Veteran Suicide

Suicide rates have historically been high among Veterans, with the estimated risk being 57% higher than that of the general population. In order to investigate the suicidal tendencies among this group, we collected over 41,000 posts from VA Disability Claims Community Forums – Hadit.com.

Keeping schools and online learning safe by monitoring spaces and 1:1 chats to detect hostile or inappropriate conversations.

The purpose of Samurai is to detect and prevent violence, making it an essential asset for educational institutions that use platforms such as Webex for remote learning and communication.

Moreover, it has a positive impact on the schools’ reputation. By showcasing a reduced incidence of aggressive behavior, institutions can earn parents’ preference and potentially enhance students’ performance in international educational rankings.

Keeping a top gaming community safe from toxicity and cyberbullying by identifying 30% more cases of community guidelines violations

● Over 130 violations of Community Guidelines were detected by Samurai Guardian each day

● Samurai Guardian detected, and would automatically remove, 30% more Community Guideline violations than human moderators.

● Less than 3% of Community Guideline violations were removed by moderators without being detected by Samurai Guardian.



