Automatic Generation of Counter-narratives using Argumentative Information to Fight Hate Speech Online
Damian Furman (04/Jun/2025)

An industrial PhD

Advisors: 

Maria Vanina Martinez Posse

Laura Alonso Alemany

University: 

Abstract: 

In this work, we develop a dataset of hate tweets annotated with general and domain-specific argumentative components, as well as with different types of counter-narratives defined according to strategies based on those components. Our aim is to use this dataset to improve the performance of different language models on the task of automatic counter-narrative generation to combat xenophobia.

We show that, despite the subjective nature of the task, an acceptable level of inter-annotator agreement can be achieved by using an annotation manual defined through an iterative process involving the annotators, and that the proposed argumentative components can subsequently be identified automatically with satisfactory performance.
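
As a concrete illustration of the agreement measurement mentioned above, the sketch below computes a pairwise agreement coefficient over per-token argumentative-component labels. The choice of Cohen's kappa, the label names, and the toy annotations are illustrative assumptions; the abstract does not specify which coefficient or label set the thesis uses.

```python
# Minimal sketch of pairwise inter-annotator agreement, assuming
# Cohen's kappa over per-token argumentative-component labels.
# The label set and annotations are illustrative, not from the thesis.
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels: "O" (none), "CLAIM", "PREMISE", "PIVOT" (domain-specific)
annotator_a = ["O", "CLAIM", "PREMISE", "PREMISE", "O", "PIVOT"]
annotator_b = ["O", "CLAIM", "PREMISE", "O",       "O", "PIVOT"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")
```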

We study and elaborate on the shortcomings of the metrics used for the automatic evaluation of text generation in the task of counter-narrative generation, both those based on n-gram overlap and those based on embedding comparison. We then propose evaluation categories that make it possible to define a methodology for assigning numerical scores to counter-narratives, one that makes explicit the desirable characteristics a counter-narrative should have and defines what it means for a counter-narrative to be acceptable or good.
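
To make the two metric families concrete, the sketch below scores an invented counter-narrative against an invented reference with an n-gram-overlap metric (BLEU, via sacrebleu) and an embedding-based metric (BERTScore). The texts and the specific metric implementations are assumptions, not taken from the thesis; the point is that a valid paraphrase can score low on n-gram overlap while scoring high on embedding similarity.

```python
# Sketch of the two automatic-metric families discussed above:
# n-gram overlap (BLEU) vs. embedding comparison (BERTScore).
# Example texts are invented; requires: pip install sacrebleu bert-score
from sacrebleu import sentence_bleu
from bert_score import score as bert_score

reference = "Migrants contribute to the economy; studies show they create jobs."
candidate = "Evidence shows that immigrants create jobs and grow the economy."

bleu = sentence_bleu(candidate, [reference])
print(f"BLEU: {bleu.score:.1f}")  # low: the paraphrase shares few n-grams

P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1: {F1.item():.3f}")  # higher: the meaning is similar
```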

With this tool, we conduct an evaluation with human annotators from which we conclude that, for the Flan-T5 family of models, the factor that most improves performance is fine-tuning on a high-quality dataset, more so than increasing model size or even using argumentative information. Argumentative information does not significantly improve model performance, except for models fine-tuned on a single type of counter-narrative together with the argumentative information on which that strategy is based.
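
As a rough sketch of how argumentative information could be supplied to a Flan-T5 model, the snippet below prepends hypothetical argumentative components to the input before generation. The prompt template, component names, and example tweet are assumptions; the thesis's exact input format may differ.

```python
# Sketch: conditioning Flan-T5 on argumentative information by
# prepending it to the input. The prompt template is hypothetical.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

tweet = "They are taking our jobs."          # toy example, not from the dataset
components = "claim: they take jobs | premise: (none stated)"
prompt = f"Generate a counter-narrative. Argument: {components} Tweet: {tweet}"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```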

Finally, we use the human evaluation data to train models that perform automatic evaluation, and with them we assess the output produced by different Large Language Models under multiple generation configurations.
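
A minimal sketch of what training such an automatic evaluator might look like, assuming a regression model fine-tuned on (input, human score) pairs; the base checkpoint, score scale, and data below are all hypothetical.

```python
# Sketch: training a regression "judge" on human counter-narrative
# scores. Model choice, score scale, and data are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression")

# Hypothetical (tweet + counter-narrative, human score in [0, 1]) pairs
pairs = [("They take our jobs. </s> Studies show migrants create jobs.", 0.9),
         ("They take our jobs. </s> You are wrong.", 0.3)]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
for text, score in pairs:  # one illustrative pass; real training uses batches/epochs
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=torch.tensor([score])).loss  # MSE for regression
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```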
