Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games
About this item
Author / Creator
Ma, Chengdong; Yang, Ziran; Ci, Hai; Gao, Jun; Gao, Minquan; Pan, Xuehai; Yang, Yaodong
Publisher
Ithaca: Cornell University Library, arXiv.org
Language
English
More information
Scope and Contents
Contents
A primary challenge in deploying large language models (LLMs) is ensuring their harmlessness. Red teams can identify vulnerabilities by attacking an LLM, which helps attain safety. However, current efforts rely heavily on single-round prompt designs and unilateral red-team optimization against fixed blue teams. These static approaches lead to significant reductio...
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2871978049
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2871978049
Other Identifiers
E-ISSN
2331-8422