H. Zhang, M.C. Leong, L. Li, and W. Lin, “RD-Diff: RLTransformer-based Diffusion Model with Diversity-Inducing Modulator for Human Motion Prediction”, Proceedings of the Asian Conference on Computer Vision (ACCV), 2024.
Abstract:
Human Motion Prediction (HMP) is crucial for applications such as human-robot collaboration, surveillance, and autonomous driving. Recently, diffusion models have shown promising progress due to their ease of training and realistic generation capabilities. To enhance both the accuracy and diversity of diffusion models in HMP, we present RD-Diff: an RLTransformer-based Diffusion model with a Diversity-inducing modulator. First, to improve the transformer's effectiveness on the frequency representation of human motion obtained via the Discrete Cosine Transform (DCT), we introduce a novel Regulated Linear Transformer (RLTransformer) with a specially designed linear-attention mechanism. Next, to further enhance performance, we propose a Diversity-Inducing Modulator (DIM) that generates noise-modulated observation conditions for a pretrained diffusion model. Experimental results show that RD-Diff establishes new state-of-the-art performance in both accuracy and diversity compared with existing methods.
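As a minimal sketch of the DCT preprocessing step the abstract refers to (an illustration, not the authors' code): a motion sequence of per-frame joint coordinates is transformed along the time axis into frequency coefficients, and only the lowest frequencies are kept, giving a compact, smooth representation of the trajectory. The array shapes and the number of retained coefficients below are illustrative assumptions, not the paper's specification.

import numpy as np
from scipy.fft import dct, idct

def motion_to_dct(motion, n_coeffs=20):
    # motion: (T, J*3) array of T frames of flattened 3D joint positions
    # (shapes and n_coeffs are assumed for illustration).
    # Type-II DCT along time; norm="ortho" makes idct an exact inverse.
    coeffs = dct(motion, type=2, norm="ortho", axis=0)
    return coeffs[:n_coeffs]  # keep only the lowest-frequency coefficients

def dct_to_motion(coeffs, n_frames):
    # Zero-pad the truncated coefficients back to T rows, then invert.
    padded = np.zeros((n_frames, coeffs.shape[1]))
    padded[: coeffs.shape[0]] = coeffs
    return idct(padded, type=2, norm="ortho", axis=0)

# Example: 50 frames of a 22-joint skeleton (sizes chosen for illustration).
motion = np.random.randn(50, 22 * 3)
recon = dct_to_motion(motion_to_dct(motion), n_frames=50)

Truncating to the low-frequency coefficients is what produces the frequency representation on which the RLTransformer is described as operating.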
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the SERC Grant: Understanding from Unified Perceptual Grounding [Human Robot Collaborative AI for AME - WP1]
Grant Reference no.: NA
Description:
This is a post-peer-review, pre-copyedit version of the article published in Computer Vision – ACCV 2024. The final authenticated version is available online at https://link.springer.com/.