Predicting the future actions of multiple pedestrians is an essential capability for autonomous robots operating in human-crowded environments. Estimating the unknown future paths is challenging due to the complex interactions occurring among pedestrians. Although recent developments in Graph Convolutional Networks (GCNs) allow for efficient encoding of such complex interactions, the encoded representations still lack the informative factors necessary to accurately predict pedestrians' future behavior. To address this, we introduce the Disentangled GCN (DGCN), which aims to better capture crowd interactions by decoupling spatial and temporal factors. More specifically, we propose to encode crowd interactions with two low-dimensional latent spaces: a spatial latent and a temporal latent. We propose a novel regularizer to train these latents in an unsupervised manner and condition the trajectory prediction on the learned latents using a spatially aware graph decoder. The proposed method is evaluated extensively on publicly available datasets consisting of pedestrians and vehicles. Our method improves mADE on the ETH/UCY pedestrian datasets and achieves new state-of-the-art mFDE results on the nuScenes vehicle dataset.
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the A*STAR - Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP)
Grant Reference no. : I2001E0063
This research / project is supported by the National Research Foundation - Centre for Advanced Robotics Technology Innovation (CARTIN)
Grant Reference no. : NA
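
The abstract above outlines a pipeline of a graph encoder that produces disentangled spatial and temporal latents and a decoder conditioned on both. The following is a minimal sketch of that idea in plain PyTorch; it is not the authors' implementation, and all module names, layer sizes, the simplified GCN layer, and the MLP decoder are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's code) of disentangled
# spatial/temporal latents for trajectory prediction.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: degree-normalized adjacency times node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                   # x: (N, in_dim), adj: (N, N)
        adj = adj + torch.eye(adj.size(0))       # add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj / deg) @ x))

class DisentangledEncoder(nn.Module):
    """Spatial latent from the per-frame interaction graph, temporal latent from each agent's history."""
    def __init__(self, pos_dim=2, hidden=32, latent=16):
        super().__init__()
        self.gcn = SimpleGCNLayer(pos_dim, hidden)
        self.to_spatial = nn.Linear(hidden, latent)
        self.temporal_rnn = nn.GRU(pos_dim, hidden, batch_first=True)
        self.to_temporal = nn.Linear(hidden, latent)

    def forward(self, history, adj):
        # history: (N, T_obs, 2) observed positions, adj: (N, N) interaction graph
        z_s = self.to_spatial(self.gcn(history[:, -1], adj))   # spatial latent
        _, h = self.temporal_rnn(history)
        z_t = self.to_temporal(h[-1])                           # temporal latent
        return z_s, z_t

class TrajectoryDecoder(nn.Module):
    """Predicts future displacements conditioned on the concatenated latents."""
    def __init__(self, latent=16, horizon=12):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * latent, 64), nn.ReLU(),
                                 nn.Linear(64, horizon * 2))
        self.horizon = horizon

    def forward(self, z_s, z_t):
        out = self.mlp(torch.cat([z_s, z_t], dim=-1))
        return out.view(-1, self.horizon, 2)

if __name__ == "__main__":
    N, T_obs = 5, 8
    history = torch.randn(N, T_obs, 2)                   # 5 pedestrians, 8 observed frames
    adj = (torch.rand(N, N) > 0.5).float()               # toy interaction graph
    enc, dec = DisentangledEncoder(), TrajectoryDecoder()
    z_s, z_t = enc(history, adj)
    print(dec(z_s, z_t).shape)                           # torch.Size([5, 12, 2])

The unsupervised regularizer on the latents and the spatially aware graph decoder described in the abstract are not reproduced here; the sketch only illustrates how separate spatial and temporal latents can condition a trajectory prediction.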