Hate Speech Identification in Formal and Informal Social Media Text Using RoBERTa-Base and XLM-RoBERTa-Base Models

Husnain Saleem; Muhammad Javed; Junaid Khan

Authors

Husnain Saleem, Gomal University, D.I.Khan, KPK, Pakistan. ORCID: https://orcid.org/0009-0001-7513-1086
Muhammad Javed, Gomal University, D.I.Khan, KPK, Pakistan. ORCID: https://orcid.org/0000-0002-8115-0750
Junaid Khan, Gomal University, D.I.Khan, KPK, Pakistan. ORCID: https://orcid.org/0009-0004-9332-2208

Keywords:

social media, hate speech identification, RoBERTa-Base, XLM-RoBERTa-Base

Abstract

Social media platforms enable users to interact and share content through text, images, videos, and links. Sentiment analysis, a key method in Natural Language Processing (NLP), evaluates the emotional tone of social media content and helps to identify hate speech, that is, harmful or offensive language directed at individuals or groups. In this study, we compare the performance of two transformer-based models, RoBERTa-Base and XLM-RoBERTa-Base, for hate speech identification in formal (English) and informal (Roman Urdu and mixed English-Roman Urdu) text. We fine-tune both models on multilingual datasets and evaluate their performance. XLM-RoBERTa-Base achieved 94.46% accuracy for English, 93.77% for Roman Urdu, and 88.32% for mixed English-Roman Urdu text. RoBERTa-Base scored higher for English (97.24%) and Roman Urdu (94.28%), but dropped to 88% for mixed English-Roman Urdu text. These results show that RoBERTa-Base excels in English but loses accuracy on Roman Urdu and, more sharply, on code-switched mixed-language text, while XLM-RoBERTa-Base performs more consistently across all three language settings.
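
The comparison in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: it uses only the six accuracy figures reported above, and the names REPORTED_ACCURACY, accuracy_spread, and accuracy_gap are hypothetical helpers introduced here for illustration. "Spread" (highest minus lowest accuracy for one model) is our assumed proxy for the consistency the abstract describes.

```python
# Accuracies (%) exactly as reported in the abstract; nothing else is assumed.
REPORTED_ACCURACY = {
    "RoBERTa-Base": {"English": 97.24, "Roman Urdu": 94.28, "Mixed": 88.00},
    "XLM-RoBERTa-Base": {"English": 94.46, "Roman Urdu": 93.77, "Mixed": 88.32},
}


def accuracy_spread(model: str) -> float:
    """Highest minus lowest accuracy for one model, in percentage points."""
    scores = REPORTED_ACCURACY[model].values()
    return round(max(scores) - min(scores), 2)


def accuracy_gap(language: str) -> float:
    """RoBERTa-Base accuracy minus XLM-RoBERTa-Base accuracy for one language."""
    roberta = REPORTED_ACCURACY["RoBERTa-Base"][language]
    xlmr = REPORTED_ACCURACY["XLM-RoBERTa-Base"][language]
    return round(roberta - xlmr, 2)


if __name__ == "__main__":
    for model in REPORTED_ACCURACY:
        print(f"{model}: spread = {accuracy_spread(model)} points")
    for language in ("English", "Roman Urdu", "Mixed"):
        print(f"{language}: RoBERTa-Base minus XLM-RoBERTa-Base = {accuracy_gap(language)} points")
```

Under this reading, a smaller spread for XLM-RoBERTa-Base would correspond to the more even performance across languages that the abstract reports, while a positive gap marks the settings where RoBERTa-Base scores higher.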

Author Biographies

Husnain Saleem, Gomal University, D.I.Khan, KPK, Pakistan

Gomal Research Institute of Computing (GRIC)
Gomal University, D.I.Khan, KPK, Pakistan

Muhammad Javed, Gomal University, D.I.Khan, KPK, Pakistan

Gomal Research Institute of Computing (GRIC)
Gomal University, D.I.Khan, KPK, Pakistan

Junaid Khan, Gomal University, D.I.Khan, KPK, Pakistan

Gomal Research Institute of Computing (GRIC)
Gomal University, D.I.Khan, KPK, Pakistan
