Hate Speech Identification in Formal and Informal Social Media Text Using RoBERTa-Base and XLM-RoBERTa-Base Models

Authors

Husnain Saleem, Muhammad Javed, Junaid Khan

Keywords:

social media, hate speech identification, RoBERTa-Base, XLM-RoBERTa-Base

Abstract

Social media platforms enable users to interact and share content through text, images, videos, and links. Sentiment analysis, a key method in Natural Language Processing (NLP), evaluates the emotional tone of social media content and helps to identify hate speech, which includes harmful, offensive language directed at individuals or groups. In this study, we compare the performance of two transformer-based models, RoBERTa-Base and XLM-RoBERTa-Base, for hate speech identification in formal (English) and informal (Roman Urdu and mixed English-Roman Urdu) text. We fine-tune both models on multilingual datasets and evaluate their performance. XLM-RoBERTa-Base achieved 94.46% accuracy for English, 93.77% for Roman Urdu, and 88.32% for mixed English-Roman Urdu text, while RoBERTa-Base performed better in English (97.24%) and Roman Urdu (94.28%) but dropped to 88% for mixed English-Roman Urdu text. These results show that RoBERTa-Base excels in English but loses the most accuracy on code-switched mixed text, while XLM-RoBERTa-Base performs more consistently across all three settings.
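
The comparison in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: it takes the accuracies exactly as reported above and computes each model's spread (best minus worst accuracy) and the per-language gap between the two models. The names `reported`, `spread`, and `gap_by_language`, and the shorthand key "Mixed" for mixed English-Roman Urdu text, are assumptions introduced here.

```python
# Accuracies (%) copied from the abstract. The RoBERTa-Base figure for
# mixed text is reported there as 88% with no decimals.
reported = {
    "XLM-RoBERTa-Base": {"English": 94.46, "Roman Urdu": 93.77, "Mixed": 88.32},
    "RoBERTa-Base": {"English": 97.24, "Roman Urdu": 94.28, "Mixed": 88.00},
}


def spread(scores):
    """Gap between a model's best and worst accuracy, in percentage points."""
    return round(max(scores.values()) - min(scores.values()), 2)


def gap_by_language(a, b):
    """Per-language accuracy of model a minus model b, in percentage points."""
    return {lang: round(a[lang] - b[lang], 2) for lang in a}


spreads = {model: spread(scores) for model, scores in reported.items()}
gaps = gap_by_language(reported["RoBERTa-Base"], reported["XLM-RoBERTa-Base"])

for model, value in spreads.items():
    print(f"{model}: spread = {value} points")
for lang, value in gaps.items():
    print(f"{lang}: RoBERTa-Base minus XLM-RoBERTa-Base = {value:+} points")
```

On these figures, XLM-RoBERTa-Base varies by about 6 points across the three settings and RoBERTa-Base by about 9, which is the sense in which the former is the more consistent model. RoBERTa-Base is ahead on English and Roman Urdu, and the two are within half a point of each other on mixed text.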

Author Biographies

  • Husnain Saleem, Gomal University, D.I.Khan, KPK, Pakistan

    Gomal Research Institute of Computing (GRIC)
    Gomal University, D.I.Khan, KPK, Pakistan

  • Muhammad Javed, Gomal University, D.I.Khan, KPK, Pakistan

    Gomal Research Institute of Computing (GRIC)
    Gomal University, D.I.Khan, KPK, Pakistan

  • Junaid Khan, Gomal University, D.I.Khan, KPK, Pakistan

    Gomal Research Institute of Computing (GRIC)
    Gomal University, D.I.Khan, KPK, Pakistan

Published

2025-12-05

Issue

Section

Artificial Intelligence, Society & Digital Innovation
