Enhancing Sub-Optimal Trajectory Stitching: Spatial Composition RvS for Offline RL

Title:
Enhancing Sub-Optimal Trajectory Stitching: Spatial Composition RvS for Offline RL
Journal Title:
Proc. of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025)
DOI:
Keywords:
Publication Date:
23 May 2025
Citation:
Sheng Zang, Zhiguang Cao, Bo An, Senthilnath Jayavelu, Xiaoli Li, "Enhancing Sub-Optimal Trajectory Stitching: Spatial Composition RvS for Offline RL," AAMAS 2025.
Abstract:
Reinforcement learning via supervised learning (RvS) has emerged as a burgeoning paradigm for offline reinforcement learning (RL). While return-conditioned RvS (RvS-R) predominates across a wide range of offline RL datasets, recent findings suggest that goal-conditioned RvS (RvS-G) outperforms it on certain sub-optimal datasets where trajectory stitching is crucial for achieving optimal performance. However, the underlying reasons for this superiority remain insufficiently explored. In this paper, employing didactic experiments and theoretical analysis, we reveal that the proficiency of RvS-G in stitching trajectories arises from its adeptness in generalizing to unknown goals during evaluation. Building on this insight, we introduce a novel RvS-G approach, Spatial Composition RvS (SC-RvS), to enhance the ability to generalize to unknown goals, which in turn improves trajectory stitching performance on sub-optimal datasets. Specifically, by harnessing advantage weights and maximum-entropy regularized weights, our approach balances the promotion of optimistic goal sampling with the preservation of a measured level of pessimism in action selection, compared to existing RvS-G methods. Extensive experimental results on D4RL benchmarks show that SC-RvS performs favorably against the baselines in most cases, especially on the sub-optimal datasets that demand trajectory stitching.
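For intuition about the weighted goal-conditioned supervised objective the abstract alludes to, the sketch below shows one plausible form of such an update, assuming a PyTorch setup, a learned goal-conditioned value function value_fn, and a hindsight-relabeled goal in each batch. It is an illustration only, not the authors' SC-RvS code: it uses an exponentiated advantage weight alone and omits the maximum-entropy regularized goal-sampling weight described in the paper.

```python
# Minimal sketch of a goal-conditioned RvS (RvS-G) update with an
# advantage-style exponential weight. NOT the authors' SC-RvS code:
# network sizes, batch layout, `value_fn`, and the temperature are
# illustrative assumptions.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g): maps a state-goal pair to an action."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def rvs_g_step(policy, optimizer, batch, value_fn, gamma=0.99, temperature=1.0):
    """One supervised update: regress dataset actions, with each sample
    weighted by an exponentiated goal-conditioned advantage estimate
    A(s, a, g) ~ gamma * V(s', g) - V(s, g) (reward term omitted for brevity)."""
    state, action = batch["state"], batch["action"]
    next_state, goal = batch["next_state"], batch["goal"]
    with torch.no_grad():
        advantage = gamma * value_fn(next_state, goal) - value_fn(state, goal)
        weight = torch.exp(advantage / temperature).clamp(max=100.0)  # cap to avoid blow-up
    pred_action = policy(state, goal)
    per_sample_mse = ((pred_action - action) ** 2).mean(dim=-1)
    loss = (weight * per_sample_mse).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```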
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research/project is supported by the National Research Foundation – Industry Alignment Fund – Pre-positioning (IAF-PP) Funding.
Grant Reference no.: Nil
Description:
Copyright © 2025 by International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyright for components of this work owned by others than IFAAMAS must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISSN:
NIL
Files uploaded:

anbo-new.pdf (2.04 MB, PDF)