Hate Speech Identification in Formal and Informal Social Media Text Using RoBERTa-Base and XLM-RoBERTa-Base Models
Keywords:
social media, hate speech identification, RoBERTa-Base, XLM-RoBERTa-Base
Abstract
Social media platforms enable users to interact and share content through text, images, videos, and links. Sentiment analysis, a key method in Natural Language Processing (NLP), evaluates the emotional tone of social media content and helps to identify hate speech, which includes harmful or offensive language directed at individuals or groups. In this study, we compare the performance of two transformer-based models, RoBERTa-Base and XLM-RoBERTa-Base, for hate speech identification in formal (English) and informal (Roman Urdu and mixed English-Roman Urdu) text. We fine-tune both models on multilingual datasets and evaluate their performance. XLM-RoBERTa-Base achieved 94.46% accuracy for English, 93.77% for Roman Urdu, and 88.32% for mixed English-Roman Urdu text. RoBERTa-Base scored higher on English (97.24%) and Roman Urdu (94.28%) but fell slightly behind, at 88%, on mixed English-Roman Urdu text. These results show that RoBERTa-Base excels in English but struggles with Roman Urdu and code-switched mixed-language text, while XLM-RoBERTa-Base performs consistently across all languages.
License
Copyright (c) 2025 Husnain Saleem, Muhammad Javed, Junaid Khan (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.