PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search

Title:
PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search
Journal Title:
IEEE Robotics and Automation Letters
Publication Date:
18 July 2022
Citation:
Sheng, W., Guo, H., Yau, W.-Y., & Zhou, Y. (2022). PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search. IEEE Robotics and Automation Letters, 7(4), 8869–8876. https://doi.org/10.1109/lra.2022.3188904
Abstract:
This paper presents a new class of multi-robot search problems for a non-adversarial moving target, namely multi-robot reliable search (MuRRS). The term "reliability" in MuRRS is defined as the expectation of a predefined utility function over the probability density function (PDF) of the target's capture time. We argue that MuRRS subsumes the canonical multi-robot efficient search (MuRES) problem, which minimizes the target's expected capture time, as a special case, and offers the end user a wide range of objective options. Since state-of-the-art algorithms usually target the MuRES problem and cannot deliver up-to-standard performance on the various MuRRS objectives, we propose a probability density factorized multi-agent distributional reinforcement learning method, namely PD-FAC, as a unified solution to the MuRRS problem. PD-FAC decomposes the PDF of the multi-robot system's overall value distribution into a set of individual value distributions and guarantees that any reliability objective defined as a function of the overall system's value distribution can be linearly approximated by the same reliability metric defined over the agents' individual value distributions. In this way, the individual global maximum (IGM) principle is satisfied for all the predefined reliability metrics: when each reinforcement learning agent executes the individual policy that maximizes its own reliability metric, the system's overall reliability is also maximized. We evaluate PD-FAC against state-of-the-art methods in a range of canonical multi-robot search environments, with satisfactory results, and also deploy PD-FAC on a real multi-robot system for non-adversarial moving target search.
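The abstract's definition of reliability as an expected utility over the capture-time distribution can be illustrated with a minimal sketch. This is not the paper's implementation: the discrete-time setting, the function names (`reliability`, `neg_capture_time`, `captured_within`), and the two example distributions are all assumptions made for illustration.

```python
import numpy as np


def reliability(capture_time_pmf, utility):
    """Expected utility of the capture time under a discrete distribution.

    capture_time_pmf[t] is the (assumed) probability that the target is
    captured at time step t; utility maps an array of steps to utilities.
    """
    pmf = np.asarray(capture_time_pmf, dtype=float)
    if pmf.ndim != 1 or np.any(pmf < 0) or not np.isclose(pmf.sum(), 1.0):
        raise ValueError("capture_time_pmf must be a 1-D probability vector")
    steps = np.arange(len(pmf))
    return float(np.sum(pmf * utility(steps)))


# Hypothetical utility 1: negative capture time. Maximizing its expectation
# minimizes the expected capture time, i.e. the MuRES objective.
def neg_capture_time(steps):
    return -steps.astype(float)


# Hypothetical utility 2: 1 if captured by the deadline, else 0.
# Its expectation is the probability of capture within the deadline.
def captured_within(deadline):
    return lambda steps: (steps <= deadline).astype(float)


# Two made-up capture-time distributions over steps 0..4.
pmf_a = [0.0, 0.5, 0.0, 0.0, 0.5]  # capture at step 1 or step 4
pmf_b = [0.0, 0.0, 0.1, 0.9, 0.0]  # capture at step 2 or step 3

expected_time_a = -reliability(pmf_a, neg_capture_time)
expected_time_b = -reliability(pmf_b, neg_capture_time)
within_3_a = reliability(pmf_a, captured_within(3))
within_3_b = reliability(pmf_b, captured_within(3))
```

In this made-up example, distribution A has the lower expected capture time (2.5 versus about 2.9 steps), while distribution B is certain to capture within three steps and A does so only half the time. The two objectives therefore rank the distributions differently, which is the sense in which the choice of utility function changes the search objective.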
License type:
Publisher Copyright
Funding Info:
This research is supported by the National Research Foundation - RIE2020 - Advanced Manufacturing and Engineering
Grant Reference No.: A1687b0033
Description:
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
2377-3774
2377-3766
Files uploaded:

ra-l-murrs.pdf (PDF, 4.00 MB)