Reinforcement learning under temporal logic constraints as a sequence modeling problem

Title:
Reinforcement learning under temporal logic constraints as a sequence modeling problem
Journal Title:
Robotics and Autonomous Systems
Publication Date:
28 December 2022
Citation:
Tian, D., Fang, H., Yang, Q., Yu, H., Liang, W., & Wu, Y. (2023). Reinforcement learning under temporal logic constraints as a sequence modeling problem. Robotics and Autonomous Systems, 161, 104351. https://doi.org/10.1016/j.robot.2022.104351
Abstract:
Reinforcement learning (RL) under temporal logic constraints typically suffers from slow credit assignment. Inspired by the recent trajectory transformer in machine learning, this paper models reinforcement learning under Temporal Logic (TL) as a sequence modeling problem, in which an agent uses a transformer to fit the optimal policy satisfying Finite Linear Temporal Logic (LTL_f) tasks. To combat the sparse-reward issue, dense reward functions for LTL_f are designed. To reduce computational complexity, a sparse transformer with local and global attention is constructed to perform credit assignment automatically, removing the time-consuming value iteration process. The optimal action is found by beam search performed in the transformer. The proposed method generates a series of policies fitted by sparse transformers, which achieve consistently high accuracy in fitting the demonstrations. Finally, the effectiveness of the proposed method is demonstrated by simulations in Mini-Grid environments.
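As an illustration of the attention pattern the abstract refers to, below is a minimal sketch (not the authors' implementation) of a causal attention mask combining a local window with a few global positions. The function name, window size, number of global tokens, and sequence length are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a "local + global" sparse attention mask, as one way to
# realize the sparse transformer described in the abstract. All parameter
# values here are illustrative assumptions, not taken from the paper.
import numpy as np

def local_global_mask(seq_len, window=8, n_global=2):
    """Boolean mask: mask[i, j] is True if position i may attend to position j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo = max(0, i - window + 1)
        mask[i, lo:i + 1] = True                 # local (recent-history) window
    mask[:, :n_global] = True                    # every token sees the global tokens
    mask &= np.tril(np.ones((seq_len, seq_len), dtype=bool))  # enforce causality
    return mask

if __name__ == "__main__":
    print(local_global_mask(seq_len=16).astype(int))
```

In the method described above, the transformer trained with such a sparse attention pattern would then be queried by beam search to select actions; the sketch only illustrates the attention structure, not the full pipeline.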
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research / project is supported by the A*STAR - Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering domain)
Grant Reference no.: A19E4a0101
Description:
This work was also supported by the Key Program of NSFC, China under Grant 62133002, the Joint Funds of NSFC, China under Grant U1913602, the NSFC, China under Grant 61903035, the National Key Research and Development Program of China under Grants 2022YFB4702000 and 2022YFA1004703, and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100).
ISSN:
0921-8890
Files uploaded:
tian2023reinforcement-amended.pdf (1.15 MB, PDF)