Sheng Zang, Zhiguang Cao, Bo An, Senthilnath Jayavelu, Xiaoli Li, "Enhancing Sub-Optimal Trajectory Stitching: Spatial Composition RvS for Offline RL," AAMAS 2025.
Abstract:
Reinforcement learning via supervised learning (RvS) has emerged as a
burgeoning paradigm for offline reinforcement learning (RL). While
return-conditioned RvS (RvS-R) predominates across a wide range of
offline RL datasets, recent findings suggest that goal-conditioned RvS
(RvS-G) performs better on certain sub-optimal datasets where
trajectory stitching is crucial
for achieving optimal performance. However, the underlying
reasons for this superiority remain insufficiently explored. In this
paper, employing didactic experiments and theoretical analysis,
we reveal that the proficiency of RvS-G in stitching trajectories
arises from its adeptness in generalizing to unknown goals during
evaluation. Building on this insight, we introduce a novel RvS-G
approach, Spatial Composition RvS (SC-RvS), that enhances this ability
to generalize to unknown goals and, in turn, improves trajectory-stitching
performance on sub-optimal datasets. Specifically, by
harnessing advantage weighting and maximum-entropy regularized
weighting, our approach balances the promotion of optimistic goal
sampling with the preservation of a nuanced level of pessimism in
action selection compared to existing RvS-G methods. Extensive
experimental results on D4RL benchmarks
show that our SC-RvS performs favorably against the baselines
in most cases, especially on the sub-optimal datasets that demand
trajectory stitching.
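
For illustration only, the following is a minimal, hypothetical sketch of how an advantage weight on relabelled goals can be combined with a max-entropy-style regularized weight in a goal-conditioned RvS training loss. It is not the authors' SC-RvS algorithm; policy, value_fn, adv_temp, and ent_temp are assumed placeholders.

    # Hypothetical sketch (not the paper's exact method): a goal-conditioned
    # behaviour-cloning loss with two exponential/softmax weights.
    import torch

    def weighted_rvs_g_loss(policy, value_fn, states, actions, goals,
                            adv_temp=1.0, ent_temp=1.0):
        """Illustrative weighted RvS-G objective.

        adv_weight: exp(A(s, g) / adv_temp) up-weights 'optimistic' goals,
                    i.e. relabelled goals with high estimated advantage.
        ent_weight: a softmax-normalised (max-entropy-style) weight over the
                    batch spreads mass across samples, retaining some
                    pessimism in action selection.
        Assumes value_fn(states, goals) returns (V(s, g), V(s)) and policy
        returns a factorised action distribution with .log_prob().
        """
        v_sg, v_s = value_fn(states, goals)
        advantage = (v_sg - v_s).detach()

        adv_weight = torch.exp(advantage / adv_temp).clamp(max=100.0)
        ent_weight = torch.softmax(advantage / ent_temp, dim=0) * len(advantage)

        # Log-likelihood of the dataset action under the goal-conditioned policy.
        log_prob = policy(states, goals).log_prob(actions).sum(-1)

        # Weighted negative log-likelihood: the two weights trade off optimism
        # in goal sampling against pessimism in action selection.
        return -(adv_weight * ent_weight * log_prob).mean()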
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the National Research Foundation - Industry Alignment Fund – Pre-positioning (IAF-PP) Funding
Grant Reference no.: Nil