Dynamic Bicycle Dispatching of Dockless Public Bicycle-sharing Systems Using Multi-objective Reinforcement Learning

Title:
Dynamic Bicycle Dispatching of Dockless Public Bicycle-sharing Systems Using Multi-objective Reinforcement Learning
Journal Title:
ACM Transactions on Cyber-Physical Systems
Publication Date:
22 September 2021
Citation:
Chen, J., Li, K., Li, K., Yu, P. S., & Zeng, Z. (2021). Dynamic Bicycle Dispatching of Dockless Public Bicycle-sharing Systems Using Multi-objective Reinforcement Learning. ACM Transactions on Cyber-Physical Systems, 5(4), 1–24. https://doi.org/10.1145/3447623
Abstract:
As a new generation of Public Bicycle-sharing Systems (PBS), the Dockless PBS (DL-PBS) is an important application of cyber-physical systems and intelligent transportation. How to use artificial intelligence to provide efficient bicycle dispatching solutions based on dynamic bicycle rental demand is an essential issue for DL-PBS. In this article, we propose MORL-BD, a dynamic bicycle dispatching algorithm based on multi-objective reinforcement learning to provide the optimal bicycle dispatching solution for DL-PBS. We model the DL-PBS system from the perspective of cyber-physical systems and use deep learning to predict the layout of bicycle parking spots and the dynamic demand of bicycle dispatching. We define the multi-route bicycle dispatching problem as a multi-objective optimization problem by considering the optimization objectives of dispatching costs, dispatch truck's initial load, workload balance among the trucks, and the dynamic balance of bicycle supply and demand. On this basis, the collaborative multi-route bicycle dispatching problem among multiple dispatch trucks is modeled as a multi-agent and multi-objective reinforcement learning model. All dispatch paths between parking spots are defined as state spaces, and the reciprocal of dispatching costs is defined as a reward. Each dispatch truck is equipped with an agent to learn the optimal dispatch path in the dynamic DL-PBS network. We create an elite list to store the Pareto optimal solutions of bicycle dispatch paths found in each action, and finally get the Pareto frontier. Experimental results on the actual DL-PBS show that compared with existing methods, MORL-BD can find a higher quality Pareto frontier with less execution time.
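Illustrative sketch:
The abstract mentions two mechanisms that a small example can make concrete: an elite list that keeps only Pareto optimal solutions, and a reward defined as the reciprocal of the dispatching cost. The Python sketch below is not the authors' MORL-BD implementation. It assumes every objective is minimized, and the function names (`dominates`, `update_elite`, `reward`) and the sample (cost, workload imbalance) pairs are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: each solution is a tuple of objective values, all minimized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_elite(elite: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the elite list after offering it one candidate solution."""
    if candidate in elite or any(dominates(e, candidate) for e in elite):
        return elite  # duplicate or dominated candidate: list is unchanged
    # Drop every stored solution that the candidate dominates, then add it.
    return [e for e in elite if not dominates(candidate, e)] + [candidate]


def reward(dispatch_cost: float) -> float:
    """Reciprocal-of-cost reward as described in the abstract; cost must be positive."""
    if dispatch_cost <= 0:
        raise ValueError("dispatch cost must be positive")
    return 1.0 / dispatch_cost


# Hypothetical (dispatch cost, workload imbalance) pairs, not data from the paper.
elite: List[Objectives] = []
for solution in [(10.0, 3.0), (8.0, 4.0), (12.0, 5.0), (7.0, 2.0)]:
    elite = update_elite(elite, solution)
```

The solutions remaining in `elite` after all candidates have been offered form the non-dominated set, which approximates the Pareto frontier for the solutions seen so far.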
License type:
Publisher Copyright
Funding Info:
This work was partially funded by the National Key R&D Program of China (Grant No. 2020YFB2104000), the National Outstanding Youth Science Program of National Natural Science Foundation of China (Grant No. 61625202), the Program of National Natural Science Foundation of China (Grant No. 61751204), the International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China (Grant No. 61860206011), the Natural Science Foundation of Hunan Province (Grant No. 2020JJ5084), and the International Postdoctoral Exchange Fellowship Program (Grant No. 20180024). This work was also supported in part by NSF under grants III-1763325, III-1909323, and SaTC-1930941.
Description:
© Author | ACM 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Cyber-Physical Systems, http://dx.doi.org/10.1145/3447623
ISSN:
2378-962X
2378-9638
Files uploaded:
18-acm-tcps.pdf (PDF, 3.75 MB)