Talking Face Generation via Face Mesh - Controllability without Reference Videos

Title:
Talking Face Generation via Face Mesh - Controllability without Reference Videos
Journal Title:
IEEE Conference on Artificial Intelligence (CAI)
Publication Date:
27 June 2024
Citation:
Koksal, A., Xu, Q., Lim, J.H., Talking Face Generation via Face Mesh - Controllability without Reference Videos, IEEE Conference on Artificial Intelligence (CAI), 2024
Abstract:
Recent developments in audio-driven talking face generation strive to control facial features, including facial expression, head pose, and eye blinks, while maintaining accurate lip synchronization and generalizing to arbitrary subjects. Existing audio-visual models that can control facial features require encoders for driving videos, which is computationally expensive and limits these models to cases where such driving videos are available. In this paper, we address this limitation and aim to control facial features without encoding driving videos. We propose a cascaded GAN-based audio-visual model that incorporates a face mesh as an intermediate representation. Unlike existing cascaded methods that use facial landmarks, our method uses the face mesh as an informative representation of facial features. To the best of our knowledge, this is the first cascaded model that allows controllable talking face generation via face mesh. We train our audio-visual model on training samples from the MEAD dataset and benchmark it in extensive experiments on the MEAD and LRW datasets. The results show that our model outperforms existing ones, generating high-fidelity audio-driven talking faces on arbitrary subjects with realistic emotional expression patterns.
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the SMU-A*STAR Joint Lab Seed Grant under the Human-AI Synergy Pillar - SPASCA (EC-2023-022)
Grant Reference No.: C232918002
ISSN:
N.A
Files uploaded:

File Size Format
talkingfacegenerationviafacemesh.pdf 1.67 MB PDF