Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile...

| Ask | Become a Library member

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile...

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3132697447

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

About this item

Full title

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Author / Creator

Le, Huy , Gabriel, Miroslav , Hoang, Tai , Neumann, Gerhard and Ngo Anh Vien

Publisher

Ithaca: Cornell University Library, arXiv.org

Journal title

arXiv.org, 2024-11

Language

English

Formats

Articles

Publication information

Publisher

Ithaca: Cornell University Library, arXiv.org

Subjects

Subjects and topics

More information

Scope and Contents

Contents

Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we model the continuous motion parameter policy as a diffusion model, and second, we incorporate this into a maximum entropy reinforcement learning framework that unifies both the discrete and continuous components. The discrete action space, such as contact point selection, is optimized through Q-value function maximization, while the continuous part is guided by a diffusion-based policy. This hybrid approach leads to a principled objective, where the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate its performance on both simulation and zero-shot sim2real tasks. Our results show that HyDo encourages more diverse behavior policies, leading to significantly improved success rates across tasks - for example, increasing from 53% to 72% on a real-world 6D pose alignment task. Project page: https://leh2rng.github.io/hydo...

Alternative Titles

Full title

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Authors, Artists and Contributors

Author / Creator

Le, Huy
Gabriel, Miroslav
Hoang, Tai
Neumann, Gerhard
Ngo Anh Vien

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_3132697447

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3132697447

Other Identifiers

E-ISSN

2331-8422

How to access this item

Full text available

View in old catalogue

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile...

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile...

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

About this item

Publication information

Subjects

More information

Scope and Contents

Alternative Titles

Authors, Artists and Contributors

Identifiers

Primary Identifiers

Other Identifiers

How to access this item

Connecting people and collections

Indigenous engagement

Learning

Stories