From Theory to Tech: How Counterspeech Research Tackles Digital Abuse

DATE POSTED:May 26, 2025

:::info Authors:

(1) Yi-Ling Chung, The Alan Turing Institute ([email protected]);

(2) Gavin Abercrombie, The Interaction Lab, Heriot-Watt University ([email protected]);

(3) Florence Enock, The Alan Turing Institute ([email protected]);

(4) Jonathan Bright, The Alan Turing Institute ([email protected]);

(5) Verena Rieser, The Interaction Lab, Heriot-Watt University and now at Google DeepMind ([email protected]).

:::

Table of Links

Abstract and 1 Introduction

2 Background

3 Review Methodology

4 Defining counterspeech

4.1 Classifying counterspeech

5 The Impact of Counterspeech

6 Computational Approaches to Counterspeech and 6.1 Counterspeech Datasets

6.2 Approaches to Counterspeech Detection and 6.3 Approaches to Counterspeech Generation

7 Future Perspectives

8 Conclusion, Acknowledgements, and References

Abstract

Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in the development of large language models mean that the process of producing counterspeech could be made more efficient by automating its generation, which would enable large-scale online campaigns. However, we currently lack a systematic understanding of several important factors relating to the efficacy of counterspeech for hate mitigation, such as which types of counterspeech are most effective, what the optimal conditions for implementation are, and which specific effects of hate it can best ameliorate. This paper aims to fill this gap by systematically reviewing counterspeech research in the social sciences and comparing methodologies and findings with computer science efforts in automatic counterspeech generation. By taking this multi-disciplinary view, we identify promising future directions in both fields.

1 Introduction

The exposure of social media users to online hate and abuse continues to be a cause for public concern. Volumes of abuse on social media continue to be significant in absolute terms (Vidgen et al., 2019), and some claim they are rising on platforms such as Twitter, where content moderation simultaneously appears to be becoming less of a priority (Frenkel and Conger, 2022). Receiving abuse can have negative effects on the mental health of targets, and also on others witnessing it (Siegel, 2020; Saha et al., 2019). In the context of public figures, the impact on witnesses (bystanders) is arguably even more important, as the abuse is potentially witnessed by a large volume of people. In addition, politicians and other prominent actors are driven out of the public sphere precisely because of the vitriol they receive on a daily basis (News, 2018), raising concerns for the overall health of democracy.

Within this context, research on mechanisms for combating online abuse is becoming ever more important. One such research angle is the area of “counterspeech” (or counter-narratives): content that is designed to resist or contradict abusive or hateful content (Benesch, 2014a; Saltman and Russell, 2014; Bartlett and Krasodomski-Jones, 2015); see also Figure 1. Such counterspeech (as we will elaborate more fully below) is an important potential tool in the fight against online hate and abuse, as it does not require any intervention from the platform or from law enforcement, and may contribute to mitigating the effects of abuse (Munger, 2017; Buerger, 2021b; Hangartner et al., 2021; Bilewicz et al., 2021) without impinging on free speech. Several civil organisations have used counterspeech to directly challenge hate, and Facebook has launched campaigns with local communities and policymakers to promote accessibility to counterspeech tools.[2] Similarly, Moonshot and Jigsaw implemented The Redirect Method, presenting alternative counterspeech or counter videos when users enter search queries that may suggest an inclination towards extremist content or groups.[3]

The detection and generation of counterspeech is important because it underpins the promise of AI-powered assistive tools for hate mitigation. Identifying counterspeech is also vital for analytical research in the area: for instance, to disentangle the dynamics of perpetrators, victims and bystanders (Mathew et al., 2018; Garland et al., 2020, 2022), and to determine which responses are most effective in combating hate speech (Mathew et al., 2018, 2019; Chung et al., 2021a).

Automatically producing counterspeech is a timely and important task for two reasons. First, composing counterspeech is time-consuming and requires considerable expertise to be effective (Chung et al., 2021c). Recently, large language models have become able to produce fluent, personalised arguments tailored to user expectations across a variety of topics and tasks. Developing counterspeech tools is thus feasible and can support civil organisations, practitioners and stakeholders in hate intervention at scale. Second, by partially automating counterspeech writing, such assistive tools can lessen practitioners’ psychological strain resulting from prolonged exposure to harmful content (Riedl et al., 2020; Chung et al., 2021c).

However, despite the potential of counterspeech and the growing body of work in this area, the research agenda remains relatively new and is divided into disciplinary silos. Methodologically, social scientists studying the dynamics and impacts of counterspeech (e.g. Munger, 2017; Buerger, 2021b; Hangartner et al., 2021; Bilewicz et al., 2021) often do not engage with computer scientists developing models to detect and generate such speech (e.g. Chung et al., 2021b; Saha et al., 2022), and vice versa.

The aim of this review article is to fill this gap by providing a comprehensive, multi-disciplinary overview of the field of counterspeech, covering computer science and the social sciences over the last ten years. We make several contributions. Firstly, we outline a definition of counterspeech and a framework for understanding its use and impact, as well as a detailed taxonomy. We review research on the effectiveness of counterspeech, bringing together perspectives on the impact it makes when it is experienced. We also analyse technical work on counterspeech, looking specifically at the task of counterspeech generation, scalability, and the availability and methodology behind different datasets. Importantly, across all studies, we focus on commonalities and differences between computer science and the social sciences, including how the impact of counterspeech is evaluated and which specific effects of hate speech it best ameliorates.

We draw on our findings to discuss the challenges and directions of open science (and safe AI) for online hate mitigation. We provide evidence-based recommendations for automatic approaches to counterspeech tools using Natural Language Processing (NLP). Similarly, for social scientists, we set out future perspectives on interdisciplinary collaborations with AI researchers on mitigating online harms, including conducting large-scale analyses and evaluating the impact of automated interventions. Taken together, our work offers researchers, policy-makers and practitioners the tools to further understand the potential of automated counterspeech for online hate mitigation.


:::info This paper is available on arxiv under CC BY-SA 4.0 DEED license.

:::

[2] https://counterspeech.fb.com/en/

[3] https://moonshotteam.com/the-redirect-method/