Sheng Zang, Zhiguang Cao, Bo An, Senthilnath Jayavelu, Xiaoli Li, "Enhancing Sub-Optimal Trajectory Stitching: Spatial Composition RvS for Offline RL," AAMAS 2025.
Abstract:
Reinforcement learning via supervised learning (RvS) has emerged as a
burgeoning paradigm for offline reinforcement learning (RL). While
return-conditioned RvS (RvS-R) predominates across a wide range of
offline RL datasets, recent findings suggest that goal-conditioned RvS
(RvS-G) performs better on certain sub-optimal datasets where
trajectory stitching is crucial
for achieving optimal performance. However, the underlying
reasons for this superiority remain insufficiently explored. In this
paper, employing didactic experiments and theoretical analysis,
we reveal that the proficiency of RvS-G in stitching trajectories
arises from its adeptness in generalizing to unknown goals during
evaluation. Building on this insight, we introduce a novel RvS-G
approach, Spatial Composition RvS (SC-RvS), that enhances this ability
to generalize to unknown goals and, in turn, improves trajectory-stitching
performance on sub-optimal datasets. Specifically, by
harnessing advantage weighting and maximum-entropy regularized
weighting, our approach balances the promotion of optimistic goal
sampling with the preservation of a nuanced level of pessimism in
action selection compared to existing RvS-G methods. Extensive
experimental results on D4RL benchmarks
show that our SC-RvS performs favorably against the baselines
in most cases, especially on the sub-optimal datasets that demand
trajectory stitching.
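
For illustration only, the following is a minimal, hypothetical sketch of how an advantage weight on relabelled goals can be combined with a max-entropy-style regularized weight in a goal-conditioned RvS training loss. It is not the authors' SC-RvS algorithm; policy, value_fn, adv_temp, and ent_temp are assumed placeholders.

    # Hypothetical sketch (not the paper's exact method): a goal-conditioned
    # behaviour-cloning loss with two exponential/softmax weights.
    import torch

    def weighted_rvs_g_loss(policy, value_fn, states, actions, goals,
                            adv_temp=1.0, ent_temp=1.0):
        """Illustrative weighted RvS-G objective.

        adv_weight: exp(A(s, g) / adv_temp) up-weights 'optimistic' goals,
                    i.e. relabelled goals with high estimated advantage.
        ent_weight: a softmax-normalised (max-entropy-style) weight over the
                    batch spreads mass across samples, retaining some
                    pessimism in action selection.
        Assumes value_fn(states, goals) returns (V(s, g), V(s)) and policy
        returns a factorised action distribution with .log_prob().
        """
        v_sg, v_s = value_fn(states, goals)
        advantage = (v_sg - v_s).detach()

        adv_weight = torch.exp(advantage / adv_temp).clamp(max=100.0)
        ent_weight = torch.softmax(advantage / ent_temp, dim=0) * len(advantage)

        # Log-likelihood of the dataset action under the goal-conditioned policy.
        log_prob = policy(states, goals).log_prob(actions).sum(-1)

        # Weighted negative log-likelihood: the two weights trade off optimism
        # in goal sampling against pessimism in action selection.
        return -(adv_weight * ent_weight * log_prob).mean()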
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the National Research Foundation - Industry Alignment Fund – Pre-positioning (IAF-PP) Funding
Grant Reference no.: Nil