ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

| Ask | Become a Library member

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2640464364

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

About this item

Full title

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Author / Creator

Hartvigsen, Thomas , Gabriel, Saadia , Palangi, Hamid , Sap, Maarten , Ray, Dipankar and Kamar, Ece

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2022-07

Language

English

Formats

Articles

Publication information

Publisher

Ithaca: Cornell University Library, arXiv.org

Subjects

Subjects and topics

More information

Scope and Contents

Contents

Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly-available datasets, we show that finetuning a toxicity classifier on our data improves its performance on human-written data substantially. We also demonstrate that ToxiGen can be used to fight machine-generated toxicity as finetuning improves the classifier significantly on our evaluation subset. Our code and data can be found at https://github.com/microsoft/ToxiGen....

Alternative Titles

Full title

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Authors, Artists and Contributors

Author / Creator

Hartvigsen, Thomas
Gabriel, Saadia
Palangi, Hamid
Sap, Maarten
Ray, Dipankar
Kamar, Ece

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2640464364

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2640464364

Other Identifiers

E-ISSN

2331-8422

How to access this item

Full text available

View in old catalogue

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

About this item

Publication information

Subjects

More information

Scope and Contents

Alternative Titles

Authors, Artists and Contributors

Identifiers

Primary Identifiers

Other Identifiers

How to access this item

Connecting people and collections

Indigenous engagement

Schools and teachers

Stories