A Survey of Body and Face Motion: Datasets, Performance Evaluation Metrics and Generative Techniques

Nikhil Pakhale¹

Mudasir Ganaie¹

Abhinav Dhall²

¹ Indian Institute of Technology Ropar, India
² Monash University, Australia

[Paper]

Overview of the generic motion generation pipeline followed by existing state-of-the-art (SOTA) methods. Given input from one or more modalities, the methods generate the desired body or face motion using an appropriate representation technique.

Abstract

Body and face motion play an integral role in communication, conveying crucial information about the participants. Advances in generative modeling and multi-modal learning have enabled motion generation from signals such as speech, conversational context and visual cues. However, generating expressive and coherent face and body dynamics remains challenging due to the complex interplay of verbal and non-verbal cues and individual personality traits. This survey reviews body and face motion generation, covering core concepts, representation techniques, generative approaches, datasets and evaluation metrics. We highlight future directions to enhance the realism, coherence and expressiveness of avatars in dyadic settings. To the best of our knowledge, this work is the first comprehensive review to cover both body and face motion.

Representation Techniques

(Top) Given a body image, the body pose and geometry are reconstructed using body representation techniques. Source: Pavlakos et al. (2019)
(Bottom) Given face images, the face is parameterized and reconstructed using 3DMM frameworks, effectively capturing the pose and expression. Source: Retsinas et al. (2024)
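
As a rough illustration of the face parameterization mentioned above, a linear 3DMM is commonly written as a mean shape plus weighted identity and expression bases. The sketch below is a toy example under that assumption only: the function name `reconstruct_shape`, the single-vertex model and all numeric values are hypothetical, head pose (a rigid transform applied afterwards) is left out, and it does not reproduce the API of FaceVerse or of the cited method.

```python
from typing import List

Vector = List[float]


def reconstruct_shape(mean_shape: Vector,
                      identity_basis: List[Vector],
                      expression_basis: List[Vector],
                      identity_coeffs: Vector,
                      expression_coeffs: Vector) -> Vector:
    """Linear 3DMM: shape = mean + sum_i a_i * id_i + sum_j b_j * exp_j.

    Every vector stores flattened vertex coordinates (x0, y0, z0, x1, ...).
    """
    if len(identity_basis) != len(identity_coeffs):
        raise ValueError("one identity coefficient is needed per identity component")
    if len(expression_basis) != len(expression_coeffs):
        raise ValueError("one expression coefficient is needed per expression component")

    shape = list(mean_shape)  # copy so the mean shape is not modified
    components = list(zip(identity_coeffs, identity_basis))
    components += list(zip(expression_coeffs, expression_basis))
    for coeff, basis in components:
        if len(basis) != len(shape):
            raise ValueError("basis vectors must match the mean shape length")
        for k, offset in enumerate(basis):
            shape[k] += coeff * offset
    return shape


# Toy model (hypothetical values): one vertex, one identity and one
# expression component.
mean = [0.0, 0.0, 0.0]
id_basis = [[1.0, 0.0, 0.0]]
exp_basis = [[0.0, 2.0, 0.0]]

# Same identity coefficient, different expression coefficient.
neutral = reconstruct_shape(mean, id_basis, exp_basis, [0.5], [0.0])
expressive = reconstruct_shape(mean, id_basis, exp_basis, [0.5], [1.0])
```

Keeping the identity coefficients fixed while varying the expression coefficients over time is what lets such a representation describe facial motion for one person.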

Motion Visualizations

Body motion generated using SMPL

Face reconstruction using FaceVerse

Talking face generation using EDTalk (diffusion-based model)

Datasets

Dataset Category	Dataset	Year	Region
Text-Conditioned Motion Datasets	MotionFix	2024	Body
	Motion-X	2023	Body
	CelebV-Text	2023	Face
	HumanML3D	2022	Body
	HUMANISE	2022	Body
	BABEL	2021	Body
	KIT-ML	2016	Body
Audio-Conditioned Motion Datasets	MOSA	2024	Body
	AIOZ-GDANCE	2023	Body
	BEAT	2022	Body + Face
	PhantomDance	2022	Body
	AIST++	2021	Body
	Trinity	2020	Body + Face
	AniDance	2018	Body
Speech-Conditioned Motion Datasets	CHDTF	2025	Face
	Hallo3	2024	Body
	MultiTalk	2024	Body
	ZEGGS	2023	Body + Face
	BEAT	2022	Body + Face
	CelebV-HQ	2022	Face
	VFHQ	2022	Face
	ViCo	2022	Face
	TalkHead-1KH	2021	Face
	MEAD	2020	Face
	PATS	2020	Body + Face
	Trinity	2020	Body + Face
	Speech2Gesture	2019	Body + Face
	CelebV	2018	Face
	VoxCeleb2	2018	Face
	VoxCeleb1	2017	Face
	LRS	2017	Face
	MV-LRS	2017	Face
	LRW	2016	Face
Scene-Conditioned Motion Datasets	Habitat	2023	Body
	Circle	2023	Body
	HUMANISE	2022	Body
	COUCH	2022	Body
	SAMP	2021	Body
	GTA-IM	2020	Body + Face
	PROX	2019	Body
	JTA	2018	Body
	PiGraph	2016	Body
Action-Conditioned Motion Datasets	Motion-X	2023	Body
	BABEL	2021	Body
	EMOGAIT	2021	Body
	HuMMan	2021	Body
	HumanAct12	2020	Body
	NTU-RGB+D	2016	Body
	Penn Action	2013	Body
	UCF101	2012	Body
General Motion Capture Datasets	BioCV	2024	Body
	Motion-X	2023	Body
	HuMMan	2021	Body
	MoVi	2021	Body
	AMASS	2019	Body
	Human3.6M	2014	Body
Interaction Datasets	MMF2F	2025	Face
	DyConv	2025	Face
	CCDb+	2025	Face
	NoXi-J	2024	Body
	InterHuman	2024	Body
	Audio2Photoreal	2024	Body + Face
	RealTalk	2023	Face
	L2L	2023	Face
	UDIVA	2021	Face
	GRAB	2020	Body
	NoXi	2017	Body
	RECOLA	2013	Face

Generative Techniques

Roadmap of Motion Generation Techniques.

Facial Animation Methods

Approach	Model	Year
Diffusion	AV-Flow	2025
	AniPortrait	2024
	DAWN	2024
	EDTalk	2024
	EMO: Emote Portrait Alive	2024
	MEMO	2024
	Real3D-Portrait	2024
	EAT-Face	2024
GAN	ToonifyGB	2025
	G3FA	2024
	Style2Talker	2024
	VideoReTalking	2022
	EAMM	2022
	Talking Face Generation	2019
Neural Network and VAE-based	GSmoothFace	2025
	EmoHuman	2025
	DIM-Listener	2024
	CustomListener	2024
	Talk3D	2024
	FlowVQTalker	2024
	FreeAvatar	2024
	Can Language Models Learn to Listen?	2023
	MODA	2023
	SadTalker	2023
	VividTalk	2023
	ELP	2023
	Trans-VAE	2023
	Learning2Listen	2022
	RLHG	2022
	EVP	2021

Body Animation Methods

Approach	Model	Year
Diffusion	SMD	2025
	Goal-Driven Motion Synthesis	2025
	Light-T2M	2025
	AMUSE	2025
	SkeletonDiffusion	2025
	UniMuMo	2025
	EMDM	2024
	FlowMDM	2024
	MotionDiffuse	2024
	MoFusion	2023
	PhysDiff	2023
	GestureDiffuCLIP	2023
GAN	Conditional GAN for Enhancing Diffusion Models	2025
	MoDI	2023
	BelFusion	2023
	ActFormer	2023
	BiHMP-GAN	2019
Neural Network and VAE-based	SpeechAct	2025
	Audio2Moves	2025
	MotionGPT	2024
	MoMask	2024
	SATO	2024
	M3GPT	2024
	PhysMoP	2024
	Fg-T2M	2023
	T2M-GPT	2023
	MotionBERT	2023
	MotionClip	2022
	PoseGPT	2022
	PoseScript	2022

Paper

L. R. Sookha, N. Pakhale, M. Ganaie, A. Dhall
A Survey of Body and Face Motion Animation: Datasets, Metrics and Generative Techniques

[Bibtex]

Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.