Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
About this item
Author / Creator
Bai, Yuntao, Jones, Andy, Ndousse, Kamal, Askell, Amanda, Chen, Anna, DasSarma, Nova, Drain, Dawn, Fort, Stanislav, Ganguli, Deep, Henighan, Tom, Joseph, Nicholas, Kadavath, Saurav, Kernion, Jackson, Conerly, Tom, El-Showk, Sheer, Elhage, Nelson, Hatfield-Dodds, Zac, Hernandez, Danny, Hume, Tristan, Johnston, Scott, Kravec, Shauna, Lovitt, Liane, Nanda, Neel, Olsson, Catherine, Amodei, Dario, Brown, Tom, Clark, Jack, McCandlish, Sam, Olah, Chris, Mann, Ben and Kaplan, Jared
Publisher
Ithaca: Cornell University Library, arXiv.org
Language
English
More information
Scope and Contents
Contents
We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore...
Alternative Titles
Full title
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Authors, Artists and Contributors
Author / Creator
Bai, Yuntao
Jones, Andy
Ndousse, Kamal
Askell, Amanda
Chen, Anna
DasSarma, Nova
Drain, Dawn
Fort, Stanislav
Ganguli, Deep
Henighan, Tom
Joseph, Nicholas
Kadavath, Saurav
Kernion, Jackson
Conerly, Tom
El-Showk, Sheer
Elhage, Nelson
Hatfield-Dodds, Zac
Hernandez, Danny
Hume, Tristan
Johnston, Scott
Kravec, Shauna
Lovitt, Liane
Nanda, Neel
Olsson, Catherine
Amodei, Dario
Brown, Tom
Clark, Jack
McCandlish, Sam
Olah, Chris
Mann, Ben
Kaplan, Jared
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2649832326
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2649832326
Other Identifiers
E-ISSN
2331-8422