case study

James Walker – AI Guardian of Men’s Rights Subreddit

Reducing personal attacks and cultivating healthier communities with autonomous counterspeech on Reddit

Proactive Moderation
1 December 2023


Standard methods of moderating aggressive user behavior are warning, muting, and banning users. In other words, users are punished or excluded for antisocial behavior. Yet dozens of studies in the field of social impact have shown that even a simple, calm reply, one that reminds the author of an aggressive or offensive comment that there is a human on the other side of the screen, can have a beneficial effect on that user’s behavior.

At Samurai Labs we decided to find out if:

  • Online hate can be reduced by responding with kindness, respect, and understanding for the bad actor and reminding them about the norms and rules of the community
  • Peer pressure can change the negative behavior of users in online communities
  • AI bots can be used to reduce online hate on a large scale

Together with an entire team of developers, natural language processing engineers, and social scientists, we created James Walker – the ethical bot-guardian of a selected community on Reddit.

“Overcoming hate speech is one of the greatest challenges in the modern world. However, the knowledge of psychologists and sociologists is not sufficient to effectively confront it. We often feel helpless when we see the flood of online hate. The actions of Samurai Labs restore hope that we can do something. When we combine scientists’ knowledge of how to change human behavior with sophisticated AI technologies created by Samurai Labs, we can not only interpret reality – but also change it. Change the behavior of specific haters who destroy our public life. And that’s something.”

Prof. Michał Bilewicz, Psychology Department, University of Warsaw


The Samurai Labs team refined a neuro-symbolic AI module capable of detecting personal attacks (offensive phrases aimed directly at other users) with nearly 100% effectiveness. Social researchers, in turn, designed the experiment and defined three types of interventions to be sent in response to personal attacks:

  • Respectful Disapproval – Induction of Descriptive Norm

Example: “Howdy ho! I kind of understand your emotions. But most of us here express our points without hurtful language.”

  • Abstract Norms – Induction of Prescriptive Norm

Example: “Ability to express ourselves but with respect to others, is a wonderful sign of character and takes lots of courage.”

  • Induction of Empathy

Example: “Some behaviors might be hard to get for some people, but let’s keep in mind there are people of flesh and blood on the other side of the screen.”

On Reddit, we selected a community (subreddit) that stood out for its high percentage of personal attacks – r/mensrights*. Our system monitored the subreddit 24 hours a day, providing detailed real-time information about the hate being posted.

What is more, we created a bot named JamesWalker43 equipped with artificial intelligence and a database of 100,000 unique interventions.

We wanted James to blend into the selected community and come across as a real user, not a generative bot, so we attached the bot to an 8-year-old account and enriched its recent activity with posts in communities for jazz, octopus, and wood-carving fans. In this way, JamesWalker43 became a calm man in his fifties, delighted with nature and art and full of respect for other people, who responded to most posts containing personal attacks and reminded users to treat one another with respect.

What did the project look like?

In the early days of the experiment, many users thought James was just a common troll, which was counterproductive. Over time, their interpretations began to evolve, and eventually users got used to him, considering James a harmless, likable oddball focused on his mission.

We also varied the type of intervention according to how many interventions a user had already received. We created a chain of three interventions in total that could be sent to each user who broke the community’s rules. The first intervention was always compassionate and empathetic, but when the user continued to exhibit negative behavior, James used more forceful responses to convince the user to stop.
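The escalation chain described above can be sketched in a few lines of Python. This is only an illustration, not Samurai Labs’ implementation: the class name, the step labels, and the order of the second and third steps are assumptions (the case study states only that the first intervention was always empathetic and that each user could receive at most three).

```python
from collections import defaultdict

# Hypothetical labels for the three steps of the chain. Only "the first
# intervention is empathetic" comes from the case study; the order of the
# remaining two steps is assumed for illustration.
INTERVENTION_CHAIN = ("empathy", "respectful_disapproval", "abstract_norm")


class InterventionTracker:
    """Tracks how many interventions each user has already received."""

    def __init__(self, chain=INTERVENTION_CHAIN):
        self._chain = tuple(chain)
        self._sent = defaultdict(int)  # user name -> interventions sent so far

    def next_intervention(self, user):
        """Return the intervention type for a newly detected personal attack.

        Returns None once the user has received every step of the chain,
        so the bot stays silent instead of repeating itself.
        """
        step = self._sent[user]
        if step >= len(self._chain):
            return None
        self._sent[user] += 1
        return self._chain[step]
```

Calling `next_intervention` once per detected attack walks a given user through the chain one step at a time, while other users start from the first, empathetic step.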

For six months, James busily handed out sympathy and kind advice to all those who had lost their heads and forgotten the human on the other side. In that time, he not only received a warm welcome from a rather radical community but was also invited to serve as its official moderator.

What did James’ interventions look like?

Below you can find a series of interventions conducted by JamesWalker43 in the comment sections under the content posted by members of the r/mensrights community.

In this situation, a community member made a personal attack on another member by calling them an “asshole”. It must have been the second personal attack they had made during the experiment, as they received the Respectful Disapproval type of intervention.

In the situation above, the user made a personal attack, received an intervention, and then edited the personal attack out of their original comment. We know this because in some of the interventions James would quote the exact part of the message that he was referring to. In this case, the attacker originally said “you’re a cunt”, then received a normative intervention disapproving of calling others “cunts” in this community, and then edited their comment so that it no longer contained the offensive term (“cunt” → “nuisance”).

In this case, the user received a respectful-disapproval intervention after their second personal attack during the study. The purpose of the first part of this message (“i know people get angry sometimes”) was not only to express our sympathy with how they were feeling but also to shift their focus from the target of their anger to the emotion itself. In other words, it was meant to turn their attention inwards so that they would reflect on their emotions and their causes rather than project them outwards. In response, the recipient agreed with us and explained what had caused their strong emotions.


After analyzing the data, it turned out that without the help of penalties and bans, James reduced the subreddit’s aggression level by 19%. Moreover, the antisocial activity of those who received the interventions decreased in other Reddit groups as well.

There is reason to suspect that the interventions had a positive impact on overall user behavior rather than just motivating the recipients to find another place to attack others.

In six months, and without any involvement of Reddit moderators, Samurai Labs’ bot was able to:

Voluntarily reduce personal attacks by 19%

As people who received interventions went on to voluntarily edit their messages or change their tone in future conversations, the overall percentage of comments with personal attacks dropped by 19%

Identify repeat offenders responsible for 25% of all personal attacks

A subset of intervention recipients continued to engage in personal attacks after receiving multiple interventions. Banning these users would eliminate 25% of all personal attacks

Lower attacks on other threads

Users who received interventions on the Men’s Rights subreddit went on to deliver significantly fewer personal attacks on the other subreddits they frequented, where Samurai Labs monitored conversations but did not intervene

Outstanding precision

Samurai’s patented AI neuro-symbolic approach to detection delivers 95% fewer false positives and also identifies other important nuances, such as the object of aggression, which are used to generate conversational counter-speech interventions.

*This experiment was run on r/mensrights as a part of academic research published HERE. Neither the management of Reddit nor that of the Men’s Rights subreddit was involved in the design or execution of this project.


