Tian, D., Fang, H., Yang, Q., Yu, H., Liang, W., & Wu, Y. (2023). Reinforcement learning under temporal logic constraints as a sequence modeling problem. Robotics and Autonomous Systems, 161, 104351. https://doi.org/10.1016/j.robot.2022.104351
Reinforcement learning (RL) under temporal logic constraints typically suffers from slow credit-assignment propagation. Inspired by the Trajectory Transformer, a recent advancement in machine learning, this paper models reinforcement learning under Temporal Logic (TL) as a sequence modeling problem, in which an agent uses a transformer to fit the optimal policy satisfying tasks specified in Linear Temporal Logic over finite traces (LTL_f). To combat the sparse-reward issue, dense reward functions for LTL_f are designed. To reduce computational complexity, a sparse transformer with local and global attention is constructed to perform credit assignment automatically, removing the time-consuming value iteration process. The optimal action is found by beam search over the transformer's predictions. The proposed method generates a series of policies fitted by sparse transformers, which achieve consistently high accuracy in fitting the demonstrations. Finally, the effectiveness of the proposed method is demonstrated by simulations in Mini-Grid environments.
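The local-plus-global attention pattern mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the window size, the choice of global-token positions, and the function names are assumptions made for the example.

```python
import numpy as np

def sparse_attention_mask(seq_len, window, global_idx):
    """Boolean mask (True = attention allowed) combining a local
    sliding window with a small set of global tokens that every
    position may attend to, and that attend to every position."""
    i = np.arange(seq_len)
    # Local band: each token sees neighbors within `window` steps.
    mask = np.abs(i[:, None] - i[None, :]) <= window
    mask[:, global_idx] = True   # all tokens attend to global tokens
    mask[global_idx, :] = True   # global tokens attend to all tokens
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the sparse mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block disallowed pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With a full mask this reduces to dense attention; the sparse mask cuts the number of attended pairs from O(n^2) to roughly O(n * (window + number of global tokens)), which is the complexity saving such patterns target.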
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
This research/project is supported by the A*STAR Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering domain), Grant Reference No. A19E4a0101.
This work was also supported in part by the Key Program of NSFC, China, under Grant 62133002; the Joint Funds of NSFC, China, under Grant U1913602; NSFC, China, under Grant 61903035; the National Key Research and Development Program of China under Grants 2022YFB4702000 and 2022YFA1004703; and the Shanghai Municipal Science and Technology Major Project.