Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

About this item

Full title

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2022-11

Language

English

Formats

Articles

More information

Scope and Contents

Contents

We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming across 3 model sizes (2.7B, 13B, and 52B parameters) and 4 model types: a plain language model (LM)...

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2715606750

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2715606750

Other Identifiers

E-ISSN

2331-8422

How to access this item

Full text available
