A page that catalogues datasets annotated for hate speech, online abuse, and offensive language. The datasets may be useful, for example, for training a natural language processing system to detect such language.
HateCheck provides targeted insights into the performance of hate speech detection models. By revealing model strengths and weaknesses, it supports the creation of fairer and more accurate hate speech detection models.
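As a rough illustration of what such targeted insights might look like, the sketch below computes accuracy separately for each group of test cases instead of reporting one aggregate number. The test cases, the field names (`functionality`, `text`, `gold`), and the `toy_classifier` function are all hypothetical placeholders for illustration, not HateCheck's actual data or interface.

```python
from collections import defaultdict

# Hypothetical test cases; field names and contents are placeholders only.
TEST_CASES = [
    {"functionality": "direct_hate", "text": "I hate [GROUP].", "gold": "hateful"},
    {"functionality": "direct_hate", "text": "[GROUP] are disgusting.", "gold": "hateful"},
    {"functionality": "negated_hate", "text": "I do not hate [GROUP].", "gold": "non-hateful"},
    {"functionality": "negated_hate", "text": "Nobody should hate [GROUP].", "gold": "non-hateful"},
]


def toy_classifier(text):
    """Naive keyword matcher standing in for a real detection model."""
    lowered = text.lower()
    if "hate" in lowered or "disgusting" in lowered:
        return "hateful"
    return "non-hateful"


def accuracy_by_functionality(cases, classify):
    """Return a dict mapping each functionality name to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        total[case["functionality"]] += 1
        if classify(case["text"]) == case["gold"]:
            correct[case["functionality"]] += 1
    return {name: correct[name] / total[name] for name in total}


scores = accuracy_by_functionality(TEST_CASES, toy_classifier)
for name, acc in sorted(scores.items()):
    print(f"{name}: {acc:.2f}")
```

In this toy setup the keyword matcher gets every direct statement right and every negated statement wrong, a weakness that the aggregate accuracy of 0.50 would hide.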
A list of language-specific BERT models compiled by researchers from Bocconi University. The list can be used to help researchers understand and find the best BERT model for a given dataset, task and language.
Perspective uses machine learning models to identify abusive comments. The models score a phrase based on the perceived impact the text may have in a conversation.
Developers and publishers can use this score to give feedback to commenters, help moderators more easily review comments, or help readers filter out toxic language.
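A minimal sketch of how such a score might be requested and used to flag comments for moderator review. The endpoint URL, the request and response field names, and the `TOXICITY` attribute reflect Perspective's public documentation as best understood here and should be checked against the current API reference. The 0.8 threshold, the helper names, and the sample response are hypothetical. No network call is made; the sketch only builds the request body and parses a hand-written response.

```python
import json

# Assumed endpoint; verify against the current Perspective API reference.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attribute="TOXICITY"):
    """Build the JSON body for an analyze request (assumed schema)."""
    return {"comment": {"text": text}, "requestedAttributes": {attribute: {}}}


def extract_score(response, attribute="TOXICITY"):
    """Pull the summary score (0 to 1) out of a parsed response (assumed schema)."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


def needs_review(score, threshold=0.8):
    """Hypothetical moderation rule: send high-scoring comments to a moderator."""
    return score >= threshold


# Hand-written sample response for illustration, not real API output.
sample_response = json.loads(
    '{"attributeScores": {"TOXICITY": {"summaryScore": '
    '{"value": 0.91, "type": "PROBABILITY"}}}}'
)

payload = build_request("example comment")
score = extract_score(sample_response)
print(json.dumps(payload))
print(score, needs_review(score))
```

In practice the payload would be sent as a POST request to the endpoint with an API key, and the threshold would be tuned to the community being moderated.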
GATE Hate is a service that tags abusive utterances in English-language text. It includes a "type" feature indicating the type of abuse, if any (such as sexist or racist), and a "target" feature indicating whether the abuse was aimed at the addressee or at some other party.
The Abusive Content Classifier can be used to protect your forums or portals against abusive and offensive language. Accessible through an API, it identifies offensive language in English-language text.
HateSonar is a hate speech detection library for Python. It allows you to detect hate speech and offensive language in English language text.
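A small sketch of how the library's output might be consumed, assuming the `Sonar().ping(text=...)` interface and the result shape (`top_class` plus a `classes` list of `class_name` and `confidence` pairs) shown in the project's README. The helper names and the sample result are hypothetical. The library import is kept inside a function so the rest of the sketch runs without the package installed.

```python
def classify_with_hatesonar(text):
    """Run HateSonar on a text. Assumes `pip install hatesonar` has been done."""
    from hatesonar import Sonar  # local import: only needed when this is called
    return Sonar().ping(text=text)


def top_label(result):
    """Return (label, confidence) for the top class in a result dict."""
    label = result["top_class"]
    for entry in result["classes"]:
        if entry["class_name"] == label:
            return label, entry["confidence"]
    raise ValueError(f"top class {label!r} not found in classes")


# Hand-written sample in the assumed result shape, not real library output.
sample_result = {
    "text": "example text",
    "top_class": "neither",
    "classes": [
        {"class_name": "hate_speech", "confidence": 0.05},
        {"class_name": "offensive_language", "confidence": 0.15},
        {"class_name": "neither", "confidence": 0.80},
    ],
}

label, confidence = top_label(sample_result)
print(label, confidence)
```

With the package installed, `top_label(classify_with_hatesonar("some text"))` would be the equivalent call on real output.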
A monolingual model for hate speech classification of social media content in Dutch. The model was trained on 20,000 social media posts (YouTube, Twitter, Facebook) and tested on an independent test set of 2,000 posts. It is based on the pre-trained language model BERTje.
DALC - Dutch Abusive Language Corpus (v1.0 and v2.0) is composed of tweets in Dutch extracted using different strategies and covering different time windows. Fine-tuned models for offensive language, for abusive language, and for offensive and abusive language combined will be available soon.
A text classification model for determining whether a social media post in Danish or Norwegian contains a verbal attack. The model is based on north/t5_large_scand (by Per E. Kummervold, not publicly available), a Scandinavian language model pretrained for 1,700,000 steps, starting from the mT5 checkpoint, on a Scandinavian corpus (Bokmål, Nynorsk, Danish, Swedish and Icelandic, plus a small amount of Faroese).
Hatescan contains several machine learning models used for detecting toxic language in texts. The Hatescan API works for English and Swedish.
A monolingual model for hate speech classification of social media content in English. The model was trained on 103,190 YouTube comments and tested on an independent test set of 20,554 YouTube comments. It is based on the English BERT base pre-trained language model.
A monolingual model for hate speech classification of social media content in Italian. The model was trained on 119,670 YouTube comments and tested on an independent test set of 21,072 YouTube comments. It is based on the Italian ALBERTO pre-trained language model.
HATE-ITA is a binary hate speech classification model for Italian social media text. It is a multilingual model trained on a large set of English data and on the available Italian datasets. HATE-ITA performs better than monolingual models and seems to adapt well to language-specific slurs.
A monolingual model for hate speech classification of social media content in Slovenian. The model was trained on 50,000 Twitter comments and tested on an independent test set of 10,000 Twitter comments. It is based on the multilingual CroSloEngual BERT pre-trained language model.
The Racial Slur Database is a database that collects slurs denoting race, ethnicity, religion or country of origin.
Hatebase is a collaborative, regionalized repository of multilingual hate speech.
A lexicon developed to identify hate speech in Spanish.
The Weaponized Word uses dynamic dictionaries of known vocabulary, threats, phishing templates and disinformation sources, as well as an understanding of negative language patterns, to provide an unparalleled lexicographic defense to content threats.