Controllable Video Generation With Text-Based Instructions

Journal Title:
IEEE Transactions on Multimedia
Publication Date:
29 March 2023
Köksal, A., Ak, K. E., Sun, Y., Rajan, D., & Lim, J. H. (2024). Controllable Video Generation With Text-Based Instructions. IEEE Transactions on Multimedia, 26, 190–201.
Most existing studies on controllable video generation either transfer disentangled motion to an appearance without detailed control over the motion, or generate videos of simple actions, such as the movement of arbitrary objects, conditioned on a control signal from users. In this study, we introduce the Controllable Video Generation with text-based Instructions (CVGI) framework, which allows text-based control over the action performed in a video. CVGI generates videos in which hands interact with objects to perform the desired action, producing hand motions that users control in detail through text-based instructions. By incorporating a motion estimation layer, we divide the task into two sub-tasks: (1) control signal estimation and (2) action generation. In control signal estimation, an encoder models actions as a set of simple motions by estimating low-level control signals from the text-based instructions and the given initial frames. In action generation, generative adversarial networks (GANs) generate realistic hand-based action videos as a combination of hand motions conditioned on the estimated low-level control signals. Evaluations on several datasets (EPIC-Kitchens-55, BAIR robot pushing, and Atari Breakout) show the effectiveness of CVGI both in generating realistic videos and in controlling actions.
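The two sub-tasks described in the abstract can be illustrated with a toy data-flow sketch. It is not the CVGI implementation: a hypothetical lookup table (`TOY_INSTRUCTION_TO_MOTIONS`) stands in for the learned encoder, a plain pixel shift stands in for the GAN generator, and every name below is an assumption introduced only for illustration.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]    # toy grayscale frame: rows of pixel values
Control = Tuple[int, int]  # toy low-level control signal: (dy, dx) shift for one step

# Hypothetical stand-in for the learned encoder: each instruction is
# modelled as a short sequence of simple motions.
TOY_INSTRUCTION_TO_MOTIONS: Dict[str, List[Control]] = {
    "move right": [(0, 1), (0, 1)],
    "move down": [(1, 0), (1, 0)],
    "move down then right": [(1, 0), (0, 1)],
}


def estimate_control_signals(instruction: str, initial_frame: Frame) -> List[Control]:
    """Stage 1 (toy): map a text instruction and an initial frame to control signals."""
    if not initial_frame or not initial_frame[0]:
        raise ValueError("initial_frame must be a non-empty 2D grid")
    # A real encoder would also condition on the frame content; the toy only validates it.
    return list(TOY_INSTRUCTION_TO_MOTIONS[instruction])


def shift_frame(frame: Frame, control: Control) -> Frame:
    """Shift every pixel by (dy, dx), filling uncovered positions with zeros."""
    height, width = len(frame), len(frame[0])
    dy, dx = control
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = frame[y][x]
    return shifted


def generate_action(initial_frame: Frame, controls: List[Control]) -> List[Frame]:
    """Stage 2 (toy): roll out one new frame per control signal."""
    frames = [initial_frame]
    for control in controls:
        frames.append(shift_frame(frames[-1], control))
    return frames


def generate_video(instruction: str, initial_frame: Frame) -> List[Frame]:
    """Chain both stages: instruction -> control signals -> frames."""
    controls = estimate_control_signals(instruction, initial_frame)
    return generate_action(initial_frame, controls)
```

Calling `generate_video("move down then right", frame)` on a small grid returns the initial frame followed by one frame per estimated motion. The point of the split is that the generator only ever sees low-level control signals, never the text itself.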
License type:
Publisher Copyright
Funding Info:
There was no specific funding for the research done
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Files uploaded:

File                    Size     Format
cvgi-final-version.pdf  1.79 MB  PDF