Talking Face Generation via Face Mesh - Controllability without Reference Videos

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/20366

Title:

Talking Face Generation via Face Mesh - Controllability without Reference Videos

Journal Title:

IEEE Conference on Artificial Intelligence (CAI)

DOI:

10.1109/CAI59869.2024.00246

Publication URL:

https://ieeecai.org/2024/wp-content/pdfs/540900b377/540900b377.pdf

Authors:

Ali Köksal, Qianli Xu, Joo Hwee Lim

Keywords:

face synthesis, talking face generation, facial animation, controllable generation, generative adversarial networks (GANs)

Publication Date:

27 June 2024

Citation:

Köksal, A., Xu, Q., Lim, J.H., Talking Face Generation via Face Mesh - Controllability without Reference Videos, IEEE Conference on Artificial Intelligence (CAI), 2024

Abstract:

Recent developments in audio-driven talking face generation aim to control facial features such as facial expression, head pose, and eye blinks, while also achieving accurate lip synchronization and generalizing to arbitrary subjects. Existing audio-visual models that can control facial features rely on encoders for driving videos, an approach that is computationally expensive and limited by the availability of suitable driving videos. In this paper, we address this limitation and aim to control facial features without encoding driving videos. We propose a cascaded GAN-based audio-visual model that incorporates a face mesh as an intermediate representation. Unlike existing cascaded methods that use facial landmarks, our method uses a face mesh as an informative representation of facial features. To the best of our knowledge, this is the first cascaded model that allows controllable talking face generation via a face mesh. We train our audio-visual model on samples from the MEAD dataset. We benchmark our model in extensive experiments on the MEAD and LRW datasets. The results show that our model outperforms existing ones, generating high-fidelity audio-driven talking faces for arbitrary subjects with realistic emotional expression patterns.
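The abstract describes a cascaded design in which audio first drives a face mesh and the mesh then conditions image generation, so control signals such as head pose can be applied directly to the mesh rather than extracted from a driving video. The toy sketch below only illustrates that data flow under stated assumptions: the mesh is a plain list of 3D vertices, the audio-to-mesh stage is a hypothetical rule that lowers mouth vertices in proportion to a per-frame audio energy value (standing in for a learned GAN stage), and head pose is controlled by a yaw rotation. None of the names (`audio_to_mesh`, `apply_head_yaw`, `driving_meshes`) come from the paper, and the mesh-to-image stage is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]


@dataclass
class Mesh:
    vertices: List[Vertex]


def audio_to_mesh(template: Mesh, mouth_ids: List[int], energy: float) -> Mesh:
    """Hypothetical stage 1 stand-in: open the mouth in proportion to audio energy.

    A real system would use a learned audio-to-mesh generator; here the
    energy is clamped to [0, 1] and scaled by an arbitrary 0.1 mesh units.
    """
    opening = 0.1 * max(0.0, min(1.0, energy))
    verts = list(template.vertices)  # copy so the template is not modified
    for i in mouth_ids:
        x, y, z = verts[i]
        verts[i] = (x, y - opening, z)  # move mouth vertices downward
    return Mesh(verts)


def apply_head_yaw(mesh: Mesh, yaw_deg: float) -> Mesh:
    """Explicit pose control: rotate every vertex about the vertical (y) axis."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return Mesh([(c * x + s * z, y, -s * x + c * z) for x, y, z in mesh.vertices])


def driving_meshes(template: Mesh, mouth_ids: List[int],
                   energies: List[float], yaws: List[float]) -> List[Mesh]:
    """Per frame: audio drives the mesh, then a user-chosen pose is applied.

    Each returned mesh would be passed to a mesh-to-image generator
    (not shown), so no driving video has to be encoded.
    """
    frames = []
    for energy, yaw in zip(energies, yaws):
        mesh = audio_to_mesh(template, mouth_ids, energy)
        frames.append(apply_head_yaw(mesh, yaw))
    return frames


# Usage: a two-vertex "face" whose second vertex belongs to the mouth.
template = Mesh([(0.0, 0.0, 1.0), (0.0, -0.5, 1.0)])
frames = driving_meshes(template, mouth_ids=[1], energies=[0.0, 1.0], yaws=[0.0, 90.0])
```

Keeping the pose edit as a separate geometric step on the mesh is what makes the control explicit in this sketch: changing `yaws` alters head pose without touching the audio-driven mouth motion.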

License type:

Publisher Copyright

Funding Info:

This research / project is supported by the SMU-A*STAR Joint Lab Seed Grant under Human-AI Synergy Pillar - SPASCA (EC-2023-022)
Grant Reference no. : C232918002

Description:

URI:

https://oar.a-star.edu.sg/communities-collections/articles/20366

ISSN:

N.A

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
talkingfacegenerationviafacemesh.pdf	1.67 MB	PDF