Towards Understanding Sycophancy in Language Models

About this item

Full title

Towards Understanding Sycophancy in Language Models

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2023-10

Language

English

More information

Scope and Contents

Contents

Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judg...

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2880585773

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2880585773

Other Identifiers

E-ISSN

2331-8422
