Toward ethically aligned AI: moral value detection, agreement metrics, and formal reasoning in language models

The detection of moral values in natural language text poses a significant challenge for the development of ethically aligned artificial intelligence. The task is inherently subjective and frequently exhibits low inter-annotator agreement, which complicates both the annotation process and the design of robust evaluation methodologies. This seminar advances the field through two interrelated contributions: an assessment of state-of-the-art Large Language Models (LLMs) in detecting moral values in real-world textual data, and the introduction of a novel inter-annotator agreement metric, F1-kappa, that bridges the gap between human annotation and machine learning evaluation. This metric enables unified comparisons of human and model outputs in both binary and multi-label classification tasks. Empirical analyses on a benchmark dataset show that LLMs achieve performance comparable or superior to that of human annotators. The F1-kappa metric supports a more coherent and interpretable evaluation paradigm, particularly for tasks where subjectivity and label ambiguity are prevalent. Finally, the seminar will explore broader research directions in this field, including value alignment across different moral theories, the identification of social norms in text, and the possibility of incorporating formal moral reasoning frameworks into the constrained deployment of generative language models.
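The abstract names F1-kappa but does not define it, so the following is a minimal sketch of one plausible formulation, assuming the metric applies a Cohen's-kappa-style chance correction to the F1 score between two annotators' labels. The function names and the chance-level F1 estimate (derived from the annotators' positive-rate marginals under independence) are illustrative assumptions, not the authors' published definition.

```python
import numpy as np

def f1_score_binary(a: np.ndarray, b: np.ndarray) -> float:
    """Observed F1 between two binary label vectors.

    F1 = 2*TP / (pos_a + pos_b), which is symmetric in the two
    annotators, so neither needs to be treated as ground truth.
    """
    tp = np.sum((a == 1) & (b == 1))
    denom = np.sum(a == 1) + np.sum(b == 1)
    return 2.0 * tp / denom if denom > 0 else 0.0

def f1_kappa(a: np.ndarray, b: np.ndarray) -> float:
    """Chance-corrected F1 in the style of Cohen's kappa (assumed form).

    Under independence, with positive rates p and q, the expected
    F1 is 2pq / (p + q); the correction rescales the observed F1
    relative to that chance level: (F1_obs - F1_exp) / (1 - F1_exp).
    """
    p, q = float(np.mean(a)), float(np.mean(b))
    f1_obs = f1_score_binary(a, b)
    f1_exp = 2.0 * p * q / (p + q) if (p + q) > 0 else 0.0
    return (f1_obs - f1_exp) / (1.0 - f1_exp) if f1_exp < 1.0 else 0.0

# Hypothetical example: two annotators labelling the moral value
# "care" as present (1) or absent (0) in eight texts.
ann_a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
ann_b = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(f"observed F1 = {f1_score_binary(ann_a, ann_b):.3f}")  # 0.750
print(f"F1-kappa    = {f1_kappa(ann_a, ann_b):.3f}")         # 0.500
```

For multi-label settings such as moral value detection, the same correction could be applied per label and macro-averaged, which would allow a single score to compare pairs of human annotators and human-model pairs on equal footing.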

Luana Bulla is a PhD student in Computer Science at the University of Catania, affiliated with the Institute of Cognitive Sciences and Technologies (ISTC-CNR), where she focuses on Natural Language Processing for human-centered AI. With a background in linguistics and digital humanities, Luana specializes in semantic modeling, moral value detection, and machine learning, with a particular focus on generative models. She has contributed to several research projects, including IDEHA, SPICE, FAIR, and L4ALL. Her primary interests include AI-generated content detection, emotion and moral value classification, and moral reasoning in language models. Her current research spans sign language translation in low-resource settings, grammar-constrained natural language generation, and the ethical dimensions of AI.