Resources


Dataset Catalogue

A page that catalogues datasets annotated for hate speech, online abuse, and offensive language. The datasets can be used, for example, to train a natural language processing system to detect such language.

hatecheck.ai

HateCheck provides targeted insights into the performance of hate speech detection models. By revealing model strengths and weaknesses, it supports the creation of fairer and more accurate hate speech detection models.

BERT models

A list of language-specific BERT models compiled by researchers from Bocconi University. The list can be used to help researchers understand and find the best BERT model for a given dataset, task and language.

Technologies for detecting online hate


Perspective

Perspective uses machine learning models to identify abusive comments. The models score a phrase based on the perceived impact the text may have in a conversation.

Developers and publishers can use this score to give feedback to commenters, help moderators more easily review comments, or help readers filter out toxic language.
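As a rough illustration of how such a score might be used for moderation, the sketch below builds a request body, reads a score out of a response, and routes the comment by threshold. The endpoint URL and JSON field names follow the public Comment Analyzer API as commonly documented and should be checked against the official reference. The sample response, the thresholds, and the helper names (build_request, extract_score, route_comment) are assumptions made for this example, and no network call is made.

```python
import json

# Assumed endpoint; verify against the official Perspective documentation.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attribute="TOXICITY"):
    """Build the JSON body asking for a single attribute score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {attribute: {}},
    }


def extract_score(response, attribute="TOXICITY"):
    """Read the summary score (a value between 0 and 1) from a response dict."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


def route_comment(score, review_threshold=0.5, hide_threshold=0.9):
    """Hypothetical moderation policy: the thresholds are illustrative only."""
    if score >= hide_threshold:
        return "hide"
    if score >= review_threshold:
        return "review"
    return "publish"


# Hand-written sample response for illustration, not real API output.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.82, "type": "PROBABILITY"}}
    }
}

payload = json.dumps(build_request("example comment"))
decision = route_comment(extract_score(sample_response))
print(decision)
```

In practice the payload would be sent to the endpoint with an API key, and the thresholds would be tuned to the community being moderated.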

GATE Hate

GATE Hate is a service that tags abusive utterances in text. It includes a "type" feature indicating the type of abuse, if any (for example sexist or racist), and a "target" feature indicating whether the abuse was aimed at the addressee or at some other party. It can be run on any English-language text.

Abusive Content Classifier

The Abusive Content Classifier can be used to protect your forums or portals from abusive and offensive language. Accessible through an API, it identifies offensive language in English-language text.

HateSonar

HateSonar is a hate speech detection library for Python. It allows you to detect hate speech and offensive language in English language text.

Language models for identifying online hate


Dutch: Hate Speech Classifier for Social Media Content in Dutch (1)

A monolingual model for hate speech classification of social media content in Dutch. The model was trained on 20,000 social media posts (YouTube, Twitter, Facebook) and tested on an independent test set of 2,000 posts. It is based on the pre-trained language model BERTje.

Dutch: Offensive language and abusive language classification Dutch (2)

DALC - Dutch Abusive Language Corpus (v1.0 and v2.0) is composed of tweets in Dutch extracted using different strategies and covering different time windows. Fine-tuned models for offensive language, for abusive language, and for offensive and abusive language combined will be available soon.

Danish: Verbal attacks in Danish

A text classification model for determining if a social media post in Danish or Norwegian contains a verbal attack. The model is based on north/t5_large_scand (by Per E. Kummervold, not publicly available), a Scandinavian language model pretrained for 1,700,000 steps, starting from the mT5 checkpoint, on a Scandinavian corpus (Bokmål, Nynorsk, Danish, Swedish and Icelandic, plus a small amount of Faroese).

English: Hatescan for detection of toxic language in English

Hatescan contains several machine learning models used for detecting toxic language in texts. Hatescan API works for English and Swedish. More information about Hatescan can be found here.

English: Hate Speech Classifier for Social Media Content in English Language

A monolingual model for hate speech classification of social media content in English. The model was trained on 103,190 YouTube comments and tested on an independent test set of 20,554 YouTube comments. It is based on the English BERT base pre-trained language model.

Italian: Hate Speech Classifier for Social Media Content in Italian Language (1)

A monolingual model for hate speech classification of social media content in Italian. The model was trained on 119,670 YouTube comments and tested on an independent test set of 21,072 YouTube comments. It is based on the Italian ALBERTO pre-trained language model.

Italian: Hate Speech Classifier for Social Media Content in Italian Language (2)

HATE-ITA is a binary hate speech classification model for Italian social media text. It is a multilingual model trained on a large set of English data and the available Italian datasets. HATE-ITA performs better than monolingual models and seems to adapt well to language-specific slurs.

Norwegian: Verbal attacks in Norwegian

A text classification model for determining if a social media post in Danish or Norwegian contains a verbal attack. The model is based on north/t5_large_scand (by Per E. Kummervold, not publicly available), a Scandinavian language model pretrained for 1,700,000 steps, starting from the mT5 checkpoint, on a Scandinavian corpus (Bokmål, Nynorsk, Danish, Swedish and Icelandic, plus a small amount of Faroese).

Slovenian: Hate Speech Classifier for Social Media Content in Slovenian Language

A monolingual model for hate speech classification of social media content in Slovenian. The model was trained on 50,000 Twitter comments and tested on an independent test set of 10,000 Twitter comments. It is based on the multilingual CroSloEngual BERT pre-trained language model.

Swedish: Hatescan for detection of toxic language in Swedish

Hatescan contains several machine learning models used for detecting toxic language in texts. Hatescan API works for English and Swedish. More information about Hatescan can be found here.

Dictionaries for identifying online hate


Racial Slur Database

The Racial Slur Database is a database that collects slurs denoting race, ethnicity, religion or country of origin.

Hatebase

Hatebase is a collaborative, regionalized repository of multilingual hate speech.

Spanish Hate speech lexicon

A lexicon developed to identify hate speech in Spanish.

Weaponized Word

The Weaponized Word uses dynamic dictionaries of known vocabulary, threats, phishing templates and disinformation sources, as well as an understanding of negative language patterns, to provide an unparalleled lexicographic defense to content threats.
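One simple way to apply a dictionary such as those listed above is whole-word matching against incoming text. The sketch below is a minimal illustration under stated assumptions: the terms are harmless placeholders rather than entries from any of these resources, and the helper names (compile_lexicon, find_terms) are hypothetical.

```python
import re


def compile_lexicon(terms):
    """Compile a case-insensitive, whole-word pattern from a list of terms."""
    # Longer terms come first so multi-word entries win over their prefixes.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_terms(pattern, text):
    """Return every lexicon term found in the text, lower-cased."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Placeholder entries; a real lexicon would be loaded from one of the resources.
lexicon = compile_lexicon(["badword", "worse phrase"])
matches = find_terms(lexicon, "This has a BadWord and a worse phrase in it.")
print(matches)
```

Plain matching of this kind ignores context and misses deliberate misspellings, so it is probably best treated as a first filter ahead of a classifier or human review.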