Discovering Language Model Behaviors with Model-Written Evaluations

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2755992596

Publication information

Publisher

Ithaca: Cornell University Library, arXiv.org

More information

Scope and Contents

Contents

As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approach...

Alternative Titles

Full title

Discovering Language Model Behaviors with Model-Written Evaluations

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2755992596

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2755992596

Other Identifiers

E-ISSN

2331-8422

How to access this item