On Proximal Policy Optimization's Heavy-tailed Gradients
About this item
Full title: On Proximal Policy Optimization's Heavy-tailed Gradients
Publisher: Ithaca: Cornell University Library, arXiv.org
Language: English
Scope and Contents
Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we...
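For reference, below is a minimal Python (PyTorch) sketch of the two heuristics the abstract names: PPO's clipped surrogate loss and gradient-norm clipping. This illustrates the standard technique only and is not the paper's implementation; the function name ppo_clipped_loss, the clip_eps value, and the max_norm value are assumptions chosen for the example.

import torch

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    # Illustrative sketch of standard PPO loss clipping (not the paper's code).
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s),
    # computed from log-probabilities for numerical stability.
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    # Loss clipping: bound the ratio to [1 - eps, 1 + eps].
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic minimum, negated to give a loss to minimize.
    return -torch.min(unclipped, clipped).mean()

# Gradient clipping, applied after loss.backward() and before optimizer.step();
# policy and the max_norm of 0.5 are assumed for this example:
# torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=0.5)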
Identifiers
Record Identifier: TN_cdi_proquest_journals_2492481799
Permalink: https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2492481799
E-ISSN: 2331-8422