On Proximal Policy Optimization's Heavy-tailed Gradients

About this item

Full title

On Proximal Policy Optimization's Heavy-tailed Gradients

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2021-07

Language

English

More information

Scope and Contents

Contents

Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we...
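The two heuristics named in the abstract, loss clipping and gradient clipping, can be sketched in a few lines. This is a minimal illustrative version, not the paper's implementation; the `eps` and `max_norm` defaults are assumed values commonly seen in practice, not taken from this work:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # PPO's clipped surrogate objective:
    #   L = -E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ]
    # where r is the probability ratio pi_new / pi_old and A the advantage.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def clip_grad_norm(grad, max_norm=0.5):
    # Global-norm gradient clipping: rescale the gradient vector when its
    # L2 norm exceeds max_norm, damping the effect of outlier gradients.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# Example: a ratio of 1.5 is clipped to 1.2 before multiplying the advantage.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))  # -> -1.2
g = clip_grad_norm(np.array([3.0, 4.0]))  # norm 5.0 rescaled to 0.5
```

Both operations act as crude robustifiers: the loss clip caps the influence of any one sample's probability ratio, and the norm clip caps the step size taken from any one batch, which is why the abstract likens them to robust-statistics techniques for heavy-tailed data.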

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_2492481799

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2492481799

Other Identifiers

E-ISSN

2331-8422
