Guo, H., Liu, Z., Shi, R., Yau, W.-Y., & Rus, D. (2023). Cross-Entropy Regularized Policy Gradient for Multirobot Nonadversarial Moving Target Search. IEEE Transactions on Robotics, 1–16. https://doi.org/10.1109/tro.2023.3263459
Abstract:
This article investigates the multirobot efficient search (MuRES) for a nonadversarial moving target problem from the multiagent reinforcement learning (MARL) perspective. MARL is deemed as a promising research field for cooperative multiagent applications. However, one of the main bottlenecks of applying MARL to the MuRES problem is the nonstationarity introduced by multiple learning agents. With learning agents simultaneously updating their policies, the environment cannot be modeled as a stationary Markov decision process, which results in the inapplicability of fundamental reinforcement learning techniques such as deep Q-network and policy gradient (PG). In view of that, we adopt the centralized training and decentralized execution scheme and thereby propose a cross-entropy regularized policy gradient (CE-PG) method to train the learning agents/robots. We let the robots commit to a predetermined policy during execution, collect the trajectories, and then perform centralized training for the corresponding policy improvement. In this way, the nonstationarity problem is overcome, in that the robots do not update their policies during execution. During the centralized training stage, we improve the canonical PG method to consider the interactions among robots by adding a cross-entropy regularization term, which essentially functions to “disperse” the robots in the environment. Extensive simulation results and comparisons with state of the art show CE-PG's superior performance, and we also validate the algorithm with a real multirobot system in an indoor moving target search scenario.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the A*STAR - 2022 HORIZONTAL TECHNOLOGY COORDINATING OFFICE SEED FUND
Grant Reference No.: C221518004