PWM: Policy Learning with Large World Models

About this item

Full title

PWM: Policy Learning with Large World Models

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2024-07

Language

English

Scope and Contents

Contents

Reinforcement Learning (RL) has achieved impressive results on complex tasks but struggles in multi-task settings with different embodiments. World models offer scalability by learning a simulation of the environment, yet they often rely on inefficient gradient-free optimization methods. We introduce Policy learning with large World Models (PWM), a novel model-based RL algorithm that learns continuous control policies from large multi-task world models. By pre-training the world model on offline data and using it for first-order gradient policy learning, PWM effectively solves tasks with up to 152 action dimensions and outperforms methods using ground-truth dynamics. Additionally, PWM scales to an 80-task setting, achieving up to 27% higher rewards than existing baselines without the need for expensive online planning. Visualizations and code are available at https://www.imgeorgiev.com/pwm...

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_3075794151

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3075794151

Other Identifiers

E-ISSN

2331-8422
