I got 4th position on leaderboard at PAN in this competetion.
Hate speech (HS) is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics. Given the huge amount of user-generated contents on Twitter, the problem of detecting, and therefore possibly contrasting the HS diffusion, is becoming fundamental, for instance for fighting against misogyny and xenophobia. To this end, in this task, we aim at identifying possible hate speech spreaders on Twitter as a first step towards preventing hate speech from being propagated among online users.
After having addressed several aspects of author profiling in social media from 2013 to 2020 (fake news spreaders, bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim at investigating if it is possbile to discriminate authors that have shared some hate speech in the past from those that, to the best of our knowledge, have never done it.
Data is taken from PAN website The data uploaded on this github repo is password protected. You need to replace it with your data.
In order to reproduce the result, run all notebooks presents in for a particualar task. Each notebook produce two csv files.
Put all csv files produced by all notebooks of a particular task in a folder such as submissions
in our case and run the script.py file to average the predictions.
Five fold cross validation result
English: 75%
Spanish: 85%
English: 72%
Spanish: 82%
@inproceedings{anwar2021identify,
title={Identify Hate Speech Spreaders on Twitter using Transformer Embeddings Features and AutoML Classifiers.},
author={Anwar, Talha},
booktitle={CLEF (Working Notes)},
pages={1808--1812},
year={2021}
}