Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2871978049

About this item

Full title

Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2024-07

Language

English

More information

Scope and Contents

Contents

The primary challenge in deploying a Large Language Model (LLM) is ensuring its harmlessness. A red team can identify vulnerabilities by attacking the LLM, which helps attain safety. However, current efforts rely heavily on single-round prompt designs and unilateral red-team optimizations against fixed blue teams. These static approaches lead to significant reductio...

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2871978049

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2871978049

Other Identifiers

E-ISSN

2331-8422
