Digital Human Motion Generation
Speech-Driven Facial Expression and Lip-Sync Generation
This group of publications focuses on driving digital human facial motion from audio signals, covering lip synchronization, emotional expression, and approaches for generating talking-head videos from a single image.
- Data-Driven Expressive 3D Facial Animation Synthesis for Digital Humans(Kazi Injamamul Haque, 2023, SIGGRAPH Asia 2023 Doctoral Consortium)
- VQ-VAE Based Audio-Driven Talking Face Generation from a Single Image(Yixin Li, Xizhong Shen, 2025, 2025 5th International Conference on Artificial Intelligence, Virtual Reality and Visualization (AIVRV))
- Audio-driven single image talking face animation with transformers(Yixin Li, Xizhong Shen, 2026, Scientific Reports)
- Virtual conversation with a real talking head(O. Gambino, A. Augello, A. Caronia, G. Pilato, R. Pirrone, S. Gaglio, 2008, 2008 Conference on Human System Interactions)
- Comparative Study of Digital Sibling Video AI Platform(Leonard Mars Kurniaputra, R. Ferdiana, L. Nugroho, 2025, 2025 International Conference on Metaverse Computing, Networking and Applications (MetaCom))
- Svara Rachana - Audio Driven Facial Expression Synthesis(Karan Khandelwal, Krishiv Pandita, Kshitij Priyankar, Kumar Parakram, T. K, 2024, International Journal for Research in Applied Science and Engineering Technology)
Full-Body Motion, Gesture, and Complex Behavior Synthesis
These papers explore full-body motion generation for digital humans, including semantics-driven motion stitching, dance motion control, grasping behavior simulation, and gesture generation during conversation, with an emphasis on naturalness and coherence of motion.
- Semantic-Driven 2D Pose Stitching for Low-Cost and Controllable Digital Human Animation(Ge Cheng, Yun Zhang, Pengyuan Xie, 2025, Proceedings of the 2025 International Conference on Generative Artificial Intelligence for Business)
- 3D Human Animation Synthesis based on a Temporal Diffusion Generative Model(Baoping Cheng, Wenke Feng, Qinghang Wu, Jie Chen, Zhibiao Cai, Yemeng Zhang, Sheng Wang, Bin Che, 2024, 2024 2nd International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA))
- Emotion control of unstructured dance movements(A. Aristidou, Qiong Zeng, E. Stavrakis, KangKang Yin, D. Cohen-Or, Y. Chrysanthou, Baoquan Chen, 2017, Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation)
- Target Pose Guided Whole-body Grasping Motion Generation for Digital Humans(Quanquan Shao, Yi Fang, 2024, 2024 International Conference on Advanced Robotics and Mechatronics (ICARM))
- SSGesture: Multimodal Gesture Generation Framework for Human Animation Synthesis(Xinyi Wang, Shiguang Liu, Xu Yang, 2025, IEEE Computer Graphics and Applications)
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss(Qifan Fu, Xiaohang Yang, Muhammad Asad, Changjae Oh, Shanxin Yuan, Gregory G. Slabaugh, 2024, No journal)
- A Virtual Modeling Method of Digital Media Image Synchronization Based on Motion Hybrid Algorithm(Yanyan Yang, Zhiping Wang, Caixia Yang, H. Zhu, 2021, Journal of Physics: Conference Series)
Interactive Systems Driven by Large Language Models and Multimodal Signals
This group studies how to combine large language models (e.g., ChatGPT/GPT-4) with digital human motion generation to build intelligent interactive digital humans with emotional responsiveness, real-time dialogue, and contextual understanding.
- Application of ChatGPT-Based Digital Human in Animation Creation(Chong-yu Lan, Yongsheng Wang, Chengze Wang, Shirong Song, Zheng Gong, 2023, Future Internet)
- Development of an Interactive Digital Human with Context-Sensitive Facial Expressions(Fan Yang, Lei Fang, R. Suo, Jing Zhang, Mincheol Whang, 2025, Sensors)
- Digital Human in an Integrated Physical-Digital World (IPhD)(Zhengyou Zhang, 2021, Proceedings of the 29th ACM International Conference on Multimedia)
- GenAI-Powered Multilingual Digital Human: An Intelligent Conversational Companion for Enhancing Elderly Mental and Emotional Well-being(Sanika Deshpande, Supriya Kelkar, 2025, 2025 International Conference on Sustainable Technologies for Humanity and Smart World (HSWTech))
- Toward Industry 5.0: Evaluating Multimodal Virtual Human Interaction for Smart Healthcare in Simulated VR Environments(Han Yang, Qiuyu Tian, Xiaowen Gu, 2025, Internet Technology Letters)
Animatable Digital Human Modeling and Dynamic Appearance Reconstruction
This group focuses on building digital human assets, including reconstructing animatable 3D human models from a single image or monocular video, handling garment deformation, and generating cartoon-stylized faces.
- Creative Cartoon Face Synthesis System for Mobile Entertainment(Junfa Liu, Yiqiang Chen, Wen Gao, Rong Fu, Renqin Zhou, 2005, Lecture Notes in Computer Science)
- Design of Virtual Digital Human Image and Interaction for Elementary School Students' Ecological Education(Mingyue Wang, Nahua Shi, Yuanrong Zhao, Han Li, Qian Liu, 2025, 2025 6th International Conference on Information Science and Education (ICISE-IE))
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis(Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, N. Mitra, 2021, 2021 International Conference on 3D Vision (3DV))
- Photo2Avatar: Single-Image to Animatable 3D Human Avatar via Multi-View Synthesis and Face-Aware Consistency Enhancement(Wengang Zhong, Yu Ni, Weimin Lei, Wei Zhang, 2025, 2025 International Conference on Virtual Reality and Visualization (ICVRV))
- DFGA: Digital Human Faces Generation and Animation from the RGB Video using Modern Deep Learning Technology(Diqiong Jiang, Li You, Jian Chang, Ruofeng Tong, 2022, Pacific Graphics Short Papers, Posters, and Work-in-Progress Papers)
- D3-Human: Dynamic Disentangled Digital Human from Monocular Video(Honghu Chen, Bo Peng, Yunfan Tao, Juyong Zhang, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Motion Generation Frameworks, Evaluation Methodology, and Industry Applications
These publications address optimization of the underlying algorithmic frameworks for digital human motion generation (e.g., diffusion models and VAEs), performance evaluation standards, and applied practice in specific domains such as education and healthcare.
- Efficient multi-constrained optimization for example-based synthesis(Stefan Hartmann, E. Trunz, Björn Krüger, Reinhard Klein, M. Hullin, 2015, The Visual Computer)
- Combining heterogeneous digital human simulations: presenting a novel co-simulation approach for incorporating different character animation technologies(Felix Gaisbauer, Eva Lampen, Philipp Agethen, E. Rukzio, 2020, The Visual Computer)
- Deep Learning-Driven Animation: Enhancing Real-Time Character Motion Synthesis(Qi Li, Tianyi Sun, Meidi Zhang, 2025, IEEE Access)
- “Wild West” of Evaluating Speech‐Driven 3D Facial Animation Synthesis: A Benchmark Study(Kazi Injamamul Haque, Alkiviadis Pavlou, Zerrin Yumak, 2025, Computer Graphics Forum)
- P‐3.10: Research on Key Technologies of Virtual Digital Human(Songzhen Sang, Wanlin Li, 2025, SID Symposium Digest of Technical Papers)
- Digital Human Technology in E-Learning: Custom Content Solutions(Sinan Chen, Liuyi Yang, Yue Zhang, Miao Zhang, Yangmei Xie, Zhiyi Zhu, Jialong Li, 2025, Applied Sciences)
Research on digital human motion generation is evolving from isolated lip synchronization toward deep multimodal integration. Current directions concentrate on: 1) using generative AI (e.g., diffusion models and Transformers) to improve the realism of body and facial motion; 2) combining large language models to build embodied, perceptive, and interactive intelligent digital humans; 3) exploring low-cost, high-quality monocular video/image 3D reconstruction and motion-driving techniques. At the same time, establishing standardized objective metrics and perceptual evaluation protocols has become a key requirement for further progress in the field.
A total of 30 related publications.
This doctoral research focuses on generating expressive 3D facial animation for digital humans by studying and employing data-driven techniques. The face is the first point of interest during human interaction, and interacting with digital humans is no different. Even minor inconsistencies in facial animation can disrupt user immersion. Traditional animation workflows produce realistic results but are time-consuming and labor-intensive, and cannot meet the ever-increasing demand for 3D content in recent years. Moreover, recent data-driven approaches focus on speech-driven lip synchrony, leaving out facial expressiveness that resides throughout the face. To address the emerging demand and reduce production effort, we explore data-driven deep learning techniques for generating controllable, emotionally expressive facial animation. We evaluate the proposed models against state-of-the-art methods and ground truth, quantitatively, qualitatively, and perceptually. We also emphasize the need for non-deterministic approaches in addition to deterministic methods in order to ensure natural randomness in the non-verbal cues of facial animation.
No abstract available
In recent years, the demand for realistic and responsive digital characters has grown rapidly, especially in areas like gaming, virtual reality, and interactive media. However, traditional animation methods often fail to balance realism, flexibility, and efficiency, particularly when generating complex human motion in real-time environments. To address these challenges, we introduce a comprehensive framework that synergizes deep learning architectures with domain-specific strategies to enhance real-time character motion synthesis. Our approach comprises three core components: a formalized problem setup that encapsulates the temporal and stylistic intricacies of animation sequences; the development of the Temporal-Stylistic Latent Animator (TSLA), a novel architecture that integrates variational latent inference with attention-enhanced recurrent dynamics and style-adaptive normalization to ensure high-fidelity synthesis; and the implementation of the Domain-Informed Animation Realignment Strategy (DIARS), which incorporates narrative graph embeddings and character role anchoring to maintain semantic consistency and stylistic coherence across sequences. Empirical evaluations demonstrate that the framework significantly outperforms existing methods in tasks such as animation completion, style transfer, and semantic editing, thereby contributing to the advancement of computational animation research within the realms of computer graphics and visualization.
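As a rough illustration of the style-adaptive normalization idea mentioned in the abstract above, the sketch below shows a generic AdaIN-style layer: features are normalized per channel and then re-scaled and re-shifted by parameters predicted from a style code. This is an assumption about the general mechanism, not the paper's exact layer.

```python
import numpy as np

def style_adaptive_norm(features, style_scale, style_shift, eps=1e-5):
    """AdaIN-style normalization: per-channel normalize, then apply style.

    features: (T, C) frame features; style_scale / style_shift: (C,) vectors,
    typically predicted from a style embedding by a small network (assumed here).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    normalized = (features - mean) / (std + eps)
    return style_scale * normalized + style_shift
```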
Reconstructing animatable 3D human avatars from minimal visual input is a challenging task in digital human modeling and virtual content creation. Existing methods predominantly rely on multi-view observations or monocular videos, limiting their applicability in image-sparse scenarios. We introduce Photo2Avatar, a unified pipeline that reconstructs a 3D human avatar from a single image while supporting SMPL-X-driven animation. The method first synthesizes a dense set of multi-view images guided by parametric body priors, enabling spatially consistent supervision from a monocular portrait. A face-aware consistency enhancement module is then applied to improve identity preservation and cross-view coherence, particularly in facial regions. These refined views are used to supervise an animatable avatar learner under a differentiable rendering objective, allowing motion-conditioned geometry and appearance learning with minimal input. Extensive experiments demonstrate that Photo2Avatar achieves superior identity consistency, visual quality, and animation controllability compared to existing single-image baselines. The proposed method offers a practical solution for one-shot digital human reconstruction and bridges the gap between static image perception and dynamic 3D avatar animation. Project page: https://photo2avatar.github.io/.
With the rapid development of digital technologies such as VR, AR, XR, and more importantly the almost ubiquitous mobile broadband coverage, we are entering an Integrated Physical-Digital World (IPhD), the tight integration of the virtual world with the physical world. The IPhD is characterized by four key technologies: virtualization of the physical world, realization of the virtual world, holographic internet, and intelligent agents. The internet will continue its development with faster speeds and broader bandwidth, and will eventually be able to communicate holographic content including 3D shape, appearance, spatial audio, touch sensing, and smell. Intelligent agents, such as digital humans and digital/physical robots, travel between the digital and physical worlds. In this talk, we will describe our work on digital humans for this IPhD world. This includes: computer vision techniques for building digital humans, multimodal text-to-speech synthesis (voice and lip shapes), speech-driven face animation, neural-network-based body motion control, human-digital-human interaction, and an emotional video game anchor.
Audio-driven talking-head video generation is a critical task in cross-modal expressive synthesis, with applications in virtual humans, digital content creation, and human-computer interaction. Existing methods, however, often suffer from unnatural lip movements and distortions in non-speech facial regions, especially under exaggerated expressions or emotional variations. These issues arise due to the entanglement of linguistic content, prosodic emotion, and speaker-specific attributes within the audio signal. To address these challenges, we propose ExpNet, a Transformer-based expression regression framework that decouples global head motion from local facial expressions using 3DMM coefficients. The method employs a conditional VAE for robust head pose coefficient generation, while a CNN-Transformer architecture regresses expression coefficients. ExpNet introduces ALiBi-based relative positional bias in the self-attention mechanism, which captures long-range dependencies while focusing on local temporal context. It also conditions on the first-frame expression coefficient to preserve identity and emotion consistency throughout the video. Experimental evaluations on multiple datasets, including HDTF, MEAD, and LRS3, demonstrate that the method presented in this paper outperforms existing methods in terms of expression realism, lip synchronization, and video quality. Ablation studies reveal that key components such as ALiBi, landmark supervision, and the Transformer module are crucial for improving temporal stability, reducing lip jitter, and enhancing overall facial animation consistency.
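The ALiBi-style relative positional bias described above can be illustrated with a small, generic sketch: a distance-proportional penalty is added to the self-attention logits so that nearby frames dominate while long-range context remains reachable. The symmetric bias, slope value, and tensor shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def alibi_self_attention(q, k, v, slope=0.5):
    """Scaled dot-product attention with an ALiBi-style linear distance penalty.

    q, k, v: arrays of shape (T, d). The bias penalizes attention between
    distant frames in proportion to |i - j|, keeping the focus on local
    temporal context while still allowing long-range dependencies.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                          # (T, T) raw logits
    dist = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])
    scores = scores - slope * dist                         # linear distance penalty
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 10 frames of 16-dimensional expression features.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 16))
print(alibi_self_attention(x, x, x).shape)  # (10, 16)
```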
To support the human‐centric goals of Industry 5.0, this paper proposes a modular framework for constructing low‐cost, high‐efficiency digital humans by combining retrieval‐augmented generation (RAG), large language models (LLMs), and AIGC (AI‐generated content). The framework enables embodied agents capable of reliable reasoning, contextual alignment, and expressive interaction across industrial environments. As a representative application in Industry 5.0 smart healthcare, we deploy three variants—scripted, LLM‐only, and LLM + RAG—in a VR‐based hospital triage simulation, integrating automatic speech recognition, semantic retrieval, neural speech synthesis, facial animation, and gesture generation. A within‐subject user study (n = 45) evaluates task accuracy, perceived naturalness, and response latency. Results show that the LLM + RAG agent significantly outperforms others in both task success (95.1%) and naturalness rating (4.61/5), as assessed via expert consensus and standardized Likert‐based user ratings. These findings demonstrate that retrieval‐enhanced digital humans can combine factual precision, real‐time responsiveness, and multimodal expressiveness—key requirements in high‐stakes, affect‐sensitive domains. While healthcare is the tested use case, the architecture and evaluation protocol offer a reusable foundation for Industry 5.0 applications more broadly, including frontline services, education, and multilingual teleconsultation. The study contributes both a validated design pathway and a repeatable evaluation method for deploying scalable, trustworthy virtual agents in real‐world industrial systems.
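A minimal sketch of the retrieval-augmented generation pattern the framework builds on: embed the user query, rank stored passages by similarity, and prepend the top matches to the prompt handed to the LLM. The `embed` function and the triage knowledge snippets are hypothetical placeholders, not APIs or data from the paper.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical text encoder; stands in for any sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by cosine similarity to the query and keep the top k."""
    q = embed(query)
    sims = [float(q @ embed(p)) for p in passages]
    order = np.argsort(sims)[::-1][:k]
    return [passages[i] for i in order]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the LLM response in the retrieved triage knowledge."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Context:\n{context}\n\nPatient says: {query}\nRespond as a triage assistant."

knowledge = ["Chest pain with shortness of breath is triaged as urgent.",
             "Minor abrasions can be handled at the nursing station.",
             "Fever above 39C in adults warrants a physician visit."]
print(build_prompt("I have chest pain and I can't breathe well", knowledge))
```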
Portrait animation is a hyper-realistic video synthesis technique that produces a talking human video from a static image and driving audio. Portrait animation technology can be implemented as the visualization for Digital Sibling AI, an AI-based platform that can replicate a person as a digital representation able to communicate like an actual human being. Realistic portrait animation is important for enhancing the level of immersion of a digital sibling, hence criteria need to be identified to grade a "good" portrait animation generator. This portrait animation generator is then applied as an integration design for the digital sibling platform. This research defines basic provisions, credibility, performance, and visualization as the main criteria that decide the feasibility of a portrait animation model as a digital-representation visualization. Based on the testing conducted and the analysis using Multi-Criteria Decision Making (MCDM), the SadTalker model is found to be the best portrait animation model according to the defined criteria. The model is then developed into an API and deployed as a container using Azure Container Apps services. The model is invoked via API requests, where the API returns a URL containing the portrait animation video output. Finally, this research suggests an ideal VM specification with an NVIDIA A100 GPU, 1 vCPU core, and 7 GB of memory.
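The Multi-Criteria Decision Making step can be illustrated with a simple weighted-sum ranking; the criterion weights, competitor models, and raw scores below are illustrative assumptions, not the paper's actual evaluation data.

```python
# Weighted-sum MCDM: normalize per-criterion scores, weight them, and rank candidates.
criteria_weights = {"credibility": 0.3, "performance": 0.3,
                    "visual_quality": 0.25, "basic_provisions": 0.15}

candidates = {   # illustrative raw scores per criterion (higher is better)
    "SadTalker":  {"credibility": 8, "performance": 7, "visual_quality": 9, "basic_provisions": 8},
    "Wav2Lip":    {"credibility": 7, "performance": 9, "visual_quality": 6, "basic_provisions": 7},
    "MakeItTalk": {"credibility": 6, "performance": 6, "visual_quality": 7, "basic_provisions": 7},
}

def weighted_score(scores: dict) -> float:
    """Normalize each criterion by the best candidate, then apply the weights."""
    best = {c: max(m[c] for m in candidates.values()) for c in criteria_weights}
    return sum(criteria_weights[c] * scores[c] / best[c] for c in criteria_weights)

ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
print(ranking)  # candidate with the highest weighted score first
```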
Svara Rachana is a fusion of artificial intelligence and facial animation that aims to revolutionize the field of digital communication. Harnessing the ever-evolving power of neural networks in the form of a Long Short-Term Memory (LSTM) model, Svara Rachana offers a cutting-edge, interactive web application designed to synchronize human speech with realistic 3D facial animation. Users can record or upload an audio file containing human speech to the web interface, with the core functionality being the generation of synchronized lip movements on a 3D avatar. The system places special emphasis on accuracy in order to generate reliable facial animation. By providing an interactive, human-like 3D model, Svara Rachana aims to make machine-to-human interaction a more impactful experience by blurring the lines between humans and machines.
Grasping manipulation is a fundamental mode of human interaction with everyday objects. The synthesis of grasping motion is also in great demand in many applications such as animation and robotics. In the object-grasping research field, most works focus on generating the final static grasping pose with a parallel gripper or dexterous hand. Grasping motion generation for the full arm, and especially for a full humanlike intelligent agent, is still under-explored. In this work, we propose a grasping motion generation framework for digital humans, anthropomorphic intelligent agents with high degrees of freedom in the virtual world. Given an object's known initial pose in 3D space, we first generate a target pose for the whole-body digital human based on off-the-shelf target grasping pose generation methods. With an initial pose and this generated target pose, a transformer-based neural network is used to generate the whole grasping trajectory, which connects the initial and target poses smoothly and naturally. Additionally, two post-optimization components are designed to mitigate foot skating and hand-object interpenetration, respectively. Experiments are conducted on the GRAB dataset to demonstrate the effectiveness of the proposed method for whole-body grasping motion generation with randomly placed unknown objects.
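The foot-skating mitigation can be illustrated with a generic contact-locking pass (an assumption about the typical approach, not the paper's exact post-optimization): frames where a foot joint is near the ground and nearly stationary are treated as contacts, and the foot is pinned in place for the duration of each contact run.

```python
import numpy as np

def remove_foot_skating(foot_pos, height_thresh=0.05, vel_thresh=0.01):
    """Pin the foot in place during detected contact phases.

    foot_pos: (T, 3) world-space trajectory of one foot joint (y is up).
    A frame is a contact frame when the foot is close to the ground and
    barely moving; within each contact run the position is replaced by the
    position at the start of the run, which removes visible sliding.
    """
    fixed = foot_pos.copy()
    vel = np.linalg.norm(np.diff(foot_pos, axis=0, prepend=foot_pos[:1]), axis=1)
    contact = (foot_pos[:, 1] < height_thresh) & (vel < vel_thresh)
    anchor = None
    for t in range(len(foot_pos)):
        if contact[t]:
            if anchor is None:
                anchor = foot_pos[t].copy()   # start of a new contact run
            fixed[t] = anchor
        else:
            anchor = None
    return fixed
```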
Audio-driven facial animation holds broad promise for applications in virtual avatars, human–computer interaction, and digital media production. This work introduces a new method: a single-image audio-driven facial animation model built upon a Vector Quantized Variational Autoencoder. The model maps voice features to expression parameters, enabling the synthesis of natural and smooth talking-face videos from a single static portrait. Specifically, the VQ-VAE discretizes continuous latent representations into a compact and expressive codebook, thereby improving the audio-to-expression mapping. To enhance temporal coherence, we introduce a temporal smoothing loss that explicitly constrains abrupt expression changes between consecutive frames, while a reconstruction loss is employed to ensure accurate recovery of expression parameters. Furthermore, a conditional VAE framework is adopted to generate diverse and stable head movements by mapping 3D motion coefficients to unsupervised keypoints, which are then used to produce dynamic facial animations. We conducted extensive experiments, and the results indicate that the proposed approach outperforms conventional models regarding expression naturalness and temporal stability, highlighting its potential for improving both the naturalness and controllability of speaker-driven facial animation.
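The reconstruction and temporal smoothing terms described above can be written compactly; a minimal sketch, assuming expression parameters are stacked as a (T, D) array and the weighting factor is a free hyperparameter:

```python
import numpy as np

def talking_face_loss(pred_expr, gt_expr, smooth_weight=0.1):
    """Reconstruction loss plus a penalty on frame-to-frame expression jumps.

    pred_expr, gt_expr: (T, D) expression parameters per frame.
    The smoothing term discourages abrupt changes between consecutive
    frames, which reduces jitter in the rendered talking face.
    """
    recon = np.mean((pred_expr - gt_expr) ** 2)
    smooth = np.mean((pred_expr[1:] - pred_expr[:-1]) ** 2)
    return recon + smooth_weight * smooth
```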
In virtual reality digital media, a motion capture system is used to build a basic motion library, and the captured motions are then processed with motion editing techniques. Motion blending is one of the most practical and complex of these techniques. This paper proposes a real-time synchronization algorithm based on motion blending, so as to blend motion dynamically, avoid unexpected effects, and create complex virtual digital media animation. The paper adopts a hybrid data- and model-driven strategy. Motion generation and control for virtual humans is studied from the following aspects: modeling and simulation of the motion system, virtual human grasping that accounts for changes in whole-body posture, fast generation of operational actions, automatic interactive generation of key-frame postures, and multi-priority editing, synthesis, and interactive control of whole-body motion. Corresponding control strategies and models are proposed, and the traditional algorithm is improved. Experimental results show that the method improves the efficiency and accuracy of digital media generation, providing a reference for research on motion-blending-based synchronous virtual modeling of digital media imagery.
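A minimal sketch of frame-wise motion blending: cross-fade two overlapping pose clips with a smooth weight curve so neither clip cuts in abruptly. The smoothstep weighting and positional (rather than rotational) blending are assumptions for illustration, not the paper's specific algorithm.

```python
import numpy as np

def blend_motions(clip_a, clip_b):
    """Cross-fade two pose clips of equal length, shape (T, J, 3).

    The blend weight rises from 0 to 1 along a smoothstep curve, so clip_a
    dominates at the start and clip_b at the end, avoiding a hard cut.
    Joint positions are blended linearly; rotations would typically use slerp.
    """
    T = clip_a.shape[0]
    t = np.linspace(0.0, 1.0, T)
    w = t * t * (3.0 - 2.0 * t)            # smoothstep easing
    w = w[:, None, None]                   # broadcast over joints and xyz
    return (1.0 - w) * clip_a + w * clip_b
```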
No abstract available
No abstract available
No abstract available
No abstract available
This paper proposes a semantic-driven 2D pose stitching method for low-cost and highly reusable digital human animation generation. The method converts natural language scripts into semantic action labels, retrieves and stitches action segments from a 2D pose fragment library (PoseDB), and uses transition frame interpolation and structural regularization to generate coherent and natural animation sequences. The method features low computational cost, structured control, and strong scalability, suitable for applications in education, advertising, and product demonstrations. Experimental results show that this method outperforms traditional approaches in naturalness, continuity, and controllability, providing an efficient and easily integrable solution for digital human animation generation.
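The transition-frame interpolation step can be sketched as follows: given the last pose of one retrieved segment and the first pose of the next, insert a few linearly interpolated in-between frames so the stitched sequence stays continuous. The number of transition frames and the plain linear interpolation are illustrative assumptions.

```python
import numpy as np

def stitch_segments(segments, n_transition=5):
    """Concatenate 2D pose segments with interpolated transition frames.

    segments: list of arrays, each of shape (T_i, K, 2) holding 2D keypoints.
    Between consecutive segments, n_transition in-between poses are generated
    by linear interpolation to avoid visible jumps at the segment boundaries.
    """
    out = [segments[0]]
    for nxt in segments[1:]:
        a, b = out[-1][-1], nxt[0]                        # boundary poses
        alphas = np.linspace(0.0, 1.0, n_transition + 2)[1:-1]
        transition = np.stack([(1 - t) * a + t * b for t in alphas])
        out.extend([transition, nxt])
    return np.concatenate(out, axis=0)
```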
Technological innovations are reshaping the development of animation production. As virtual characters are increasingly used in animation creation and smart assistants, a key challenge is how to automatically generate dialogue gestures. However, current approaches often overlook a wide range of modalities and their interactions, resulting in gestures that have low contextual variation and noticeable jitter. To address these issues, we propose SSGesture, a novel diffusion-based framework that effectively captures cross-modal associations. Our three-layer attention structure enhances multimodal processing. We propose the first method to automatically resolve style conflicts through interpolation-based gesture style control, while implementing a unified unmarked style prompt structure via the PAAN layer. Our framework is practically applied in the field of intelligent virtual assistants to generate gestures in human animation synthesis and to realize various new applications. Extensive experiments and user studies have demonstrated that our proposed framework provides substantial assistance in enhancing the efficiency of human animation production.
Three-dimensional human motion generation is an important branch of computer graphics and has broad application prospects. Traditional human animation synthesis technologies rely on professional simulation platforms with high labor and time costs. Existing learning-based methods usually generate human animations by giving the prior motion seed, which lacks generative ability and cannot generate a wide variety of human motions. On the other hand, established generative methods rely on a given prior sample distribution, and their creation capabilities are relatively limited. To that end, we propose a distribution-free human motion synthesis workflow based on a temporal diffusion model. By specifying the high-level semantic motion info, our method is able to generate a wide variety of human motions with diverse styles. Firstly, we construct our human motion dataset by selecting the human motion sequences that cover different motion types and labeling them with the corresponding motion semantics. Secondly, we use the temporal network Transformer to extract the motion semantic features of different kinds of human motion sequences and introduce the self-attention mechanism to ensure temporal continuity between adjacent motion frames. Then, we use the diffusion model to denoise the extracted motion semantic features to generate visually continuous, realistic, and delicate motion sequences. Finally, we conduct a series of experiments on HumanAct12 and UESTC datasets. The experimental results demonstrate that our method achieves better performance in motion reconstruction and generation, and has greater improvements on a few metrics including RMSE, STED, FID, diversity, etc.
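The diffusion component of such a workflow can be illustrated with the standard DDPM forward-noising step (a generic sketch, not the paper's exact schedule or network): x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, with a denoising network trained to predict ε.

```python
import numpy as np

T_STEPS = 1000
betas = np.linspace(1e-4, 0.02, T_STEPS)          # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Forward diffusion: noise a clean motion sequence x0 to timestep t.

    x0: (T_frames, D) motion features. Returns the noised sample and the
    noise, which a denoising network would be trained to predict.
    """
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(60, 72))                    # 60 frames of pose features (toy data)
xt, eps = q_sample(x0, t=500, rng=rng)
```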
Traditional 3D animation creation involves a process of motion acquisition, dubbing, and mouth movement data binding for each character. To streamline animation creation, we propose combining artificial intelligence (AI) with a motion capture system. This integration aims to reduce the time, workload, and cost associated with animation creation. By utilizing AI and natural language processing, the characters can engage in independent learning, generating their own responses and interactions, thus moving away from the traditional method of creating digital characters with pre-defined behaviors. In this paper, we present an approach that employs a digital person’s animation environment. We utilized Unity plug-ins to drive the character’s mouth Blendshape, synchronize the character’s voice and mouth movements in Unity, and connect the digital person to an AI system. This integration enables AI-driven language interactions within animation production. Through experimentation, we evaluated the correctness of the natural language interaction of the digital human in the animated scene, the real-time synchronization of mouth movements, the potential for singularity in guiding users during digital human animation creation, and its ability to guide user interactions through its own thought process.
Diffusion models have shown their remarkable ability to synthesize images, including the generation of humans in specific poses. However, current models face challenges in adequately expressing conditional control for detailed hand pose generation, leading to significant distortion in the hand regions. To tackle this problem, we first curate the How2Sign dataset to provide richer and more accurate hand pose annotations. In addition, we introduce adaptive, multi-modal fusion to integrate characters' physical features expressed in different modalities such as skeleton, depth, and surface normal. Furthermore, we propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region, resulting in improved quality of generated hand gestures. More specifically, the proposed RACL computes a weighted keypoint distance between the full-body pose keypoints from the generated image and the ground truth, to generate higher-quality hand poses while balancing overall pose accuracy. Moreover, we use two hand region metrics, named hand-PSNR and hand-Distance for hand pose generation evaluations. Our experimental evaluations demonstrate the effectiveness of our proposed approach in improving the quality of digital human pose generation using diffusion models, especially the quality of the hand region. The source code is available at https://github.com/fuqifan/Region-Aware-Cycle-Loss.
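The weighted keypoint distance at the heart of the Region-Aware Cycle Loss can be sketched as below; the keypoint indexing and the hand-weight value are illustrative assumptions.

```python
import numpy as np

def region_weighted_keypoint_loss(pred_kpts, gt_kpts, hand_idx, hand_weight=5.0):
    """Mean keypoint distance with extra weight on hand keypoints.

    pred_kpts, gt_kpts: (K, 2) full-body 2D keypoints from the generated image
    and the ground truth. hand_idx: indices of keypoints in the hand regions.
    Up-weighting the hand terms pushes training toward fixing hand distortion
    while the remaining keypoints keep the overall pose accurate.
    """
    weights = np.ones(len(pred_kpts))
    weights[hand_idx] = hand_weight
    dists = np.linalg.norm(pred_kpts - gt_kpts, axis=1)
    return float(np.sum(weights * dists) / np.sum(weights))
```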
Recent advancements in the field of audio-driven 3D facial animation have accelerated rapidly, with numerous papers being published in a short span of time. This surge in research has garnered significant attention from both academia and industry given its potential applications to digital humans. Various approaches, both deterministic and non-deterministic, have been explored based on foundational advancements in deep learning algorithms. However, there remains no consensus among researchers on standardized methods for evaluating these techniques. Additionally, rather than converging on a common set of datasets and objective metrics suited for specific methods, recent works exhibit considerable variation in experimental setups. This inconsistency complicates the research landscape, making it difficult to establish a streamlined evaluation process and rendering many cross-paper comparisons challenging. Moreover, the common practice of A/B testing in perceptual studies focuses on only two common metrics and is not sufficient for non-deterministic and emotion-enabled approaches. The lack of correlation between subjective and objective metrics points out that there is a need for critical analysis in this space. In this study, we address these issues by benchmarking state-of-the-art deterministic and non-deterministic models, utilizing a consistent experimental setup across a carefully curated set of objective metrics and datasets. We also conduct a perceptual user study to assess whether subjective perceptual metrics align with the objective metrics. Our findings indicate that model rankings do not necessarily generalize across datasets, and subjective metric ratings are not always consistent with their corresponding objective metrics. The supplementary video, edited code scripts for training on different datasets, and documentation related to this benchmark study are made publicly available: https://galib360.github.io/face-benchmark-project/.
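As one concrete example of the kind of objective metric such benchmarks rely on, lip vertex error is often computed as the maximal per-vertex L2 error over the lip region in each frame, averaged over frames. The sketch below assumes meshes are given as vertex arrays and that lip vertex indices are known; it illustrates a commonly used metric, not this study's specific protocol.

```python
import numpy as np

def lip_vertex_error(pred_verts, gt_verts, lip_idx):
    """Lip vertex error: max lip-vertex L2 error per frame, averaged over frames.

    pred_verts, gt_verts: (T, V, 3) animated mesh vertex positions.
    lip_idx: indices of the vertices belonging to the lip region.
    """
    diff = pred_verts[:, lip_idx] - gt_verts[:, lip_idx]      # (T, L, 3)
    per_vertex = np.linalg.norm(diff, axis=-1)                 # (T, L)
    return float(per_vertex.max(axis=1).mean())
```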
Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing. While many recent methods have been proposed to tackle this problem, handling loose garments with complex textures and highly dynamic motion still remains challenging. In this paper, we propose a video-based appearance synthesis method that tackles such challenges and demonstrates high-quality results for in-the-wild videos that have not been shown before. Specifically, we adapt a StyleGAN-based architecture to the task of person-specific, video-based motion retargeting. We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes, as well as regularizing the single-frame pose estimates to improve temporal coherency. We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
No abstract available
We introduce D3-Human, a method for reconstructing Dynamic Disentangled Digital Human geometry from monocular videos. Past monocular video human reconstruction primarily focuses on reconstructing undecoupled clothed human bodies or only reconstructing clothing, making it difficult to apply directly in applications such as animation production. The challenge in reconstructing decoupled clothing and body lies in the occlusion caused by clothing over the body. To this end, the details of the visible area and the plausibility of the invisible area must be ensured during the reconstruction process. Our proposed method combines explicit and implicit representations to model the decoupled clothed human body, leveraging the robustness of explicit representations and the flexibility of implicit representations. Specifically, we reconstruct the visible region as SDF and propose a novel human manifold signed distance field (hmSDF) to segment the visible clothing and visible body, and then merge the visible and invisible body. Extensive experimental results demonstrate that, compared with existing reconstruction schemes, D3-Human can achieve high-quality decoupled reconstruction of the human body wearing different clothing, and can be directly applied to clothing transfer and animation production. Code is available at https://ustc3dv.github.io/D3Human/.
At present, ecological education for primary school students faces an urgent need to enhance immersion and interaction in teaching methods and media. Traditional methods are insufficient for presenting the complexity of ecosystems and guiding students' emotional investment, and virtual digital human technology provides a new way to address this. This research aims to construct a complete technical framework from automatic character-image generation to intelligent interaction. For image generation, by optimizing the generator and classifier of CycleGAN, high-quality, identity-preserving translation from real faces to a specific archaic cartoon style is achieved. At the core of interaction driving, a hybrid driving model based on skeletal animation and blendshapes is proposed, and a low-latency phoneme-visual real-time synchronization algorithm is designed to keep the digital human's speech and behavior consistent and natural. For cross-platform deployment, a lightweight rendering engine based on WebGL is developed, which ensures smooth operation on mobile devices through level-of-detail (LOD), partial redrawing, and other optimizations. Finally, through system integration, an application prototype of "ecological intelligence" with knowledge explanation, intelligent question answering, and multimodal interaction functions was developed. Evaluation shows that the scheme is superior to traditional methods in image generation quality (FID improved by about 36%), real-time synchronization (latency ≤ 85 ms), and user acceptance (preference rate of 68.5%), providing a feasible technical scheme and practical example for building a highly interactive, approachable digital assistant for ecological education.
This paper introduces an innovative Generative AI (GenAI)-based digital human specifically built to mitigate the challenges of social isolation and mental health among the elderly population, making use of large language models and 3D animation technologies. The system has the capacity to combine multi-modal GenAI intelligence with real-time, photorealistic digital facial avatars to create emotionally responsive and context-aware conversational companions. A powerful natural language processing engine based on the Mistral 8x7B model lies at the fundamental level. A key innovation in this work is the realistic rendering of digital human faces, featuring synchronized lip movements that lead to emotionally engaging conversation. The smooth integration significantly enhances the user experience, especially for elderly users who may struggle with traditional interfaces. This research lays a strong foundation for the future of emotionally intelligent digital companions, which in turn emphasizes their role not just as assistive technologies, but as proactive agents in promoting psychosocial well-being and digital inclusion for aging populations.
With advances in digital transformation (DX) in education and digital technologies becoming more deeply integrated into educational settings, global demand for video-based learning materials continues to rise, resulting in substantial effort being required from teachers to create e-learning videos. Furthermore, while many existing services offer visual content, they primarily rely on templates, making it challenging to design custom content that addresses specific needs. In this study, we develop a web service that facilitates e-learning video creation through integrated artificial intelligence (AI) and digital human technology. This service enhances educational content by integrating digital human characters and voice synthesis technologies, aiming to create comprehensive e-learning videos by incorporating visual motion and synchronized audio into educational content. In addition, this service also aims to enable the creation of engaging content through advanced visuals and animations, effectively maintaining learner interest.
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, following a sequence of “speech semantic parsing—multimodal emotion recognition—Action Unit (AU)-level 3D facial expression control.” First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, Generative Pre-trained Transformer (GPT) is utilized for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system’s capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities.
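The mapping from an emotion weight vector to AU-level controls can be illustrated as a simple matrix blend; the emotion set, AU selection, and activation values below are illustrative assumptions, not the paper's calibrated tables.

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "surprise"]
ACTION_UNITS = ["AU6_cheek_raiser", "AU12_lip_corner_puller",
                "AU1_inner_brow_raiser", "AU15_lip_corner_depressor",
                "AU26_jaw_drop"]

# Rows: emotions, columns: AUs (hypothetical activation strengths in [0, 1]).
EMOTION_TO_AU = np.array([
    [0.9, 1.0, 0.0, 0.0, 0.1],   # happiness -> raised cheeks, lip corners up
    [0.0, 0.0, 0.6, 0.9, 0.0],   # sadness   -> inner brows up, lip corners down
    [0.0, 0.0, 0.8, 0.0, 0.9],   # surprise  -> brows up, jaw drop
])

def emotion_weights_to_au(weights):
    """Blend per-emotion AU patterns by the emotion weight vector."""
    weights = np.asarray(weights, dtype=float)
    au = weights @ EMOTION_TO_AU
    return dict(zip(ACTION_UNITS, np.clip(au, 0.0, 1.0)))

print(emotion_weights_to_au([0.7, 0.0, 0.3]))   # mostly happy, slightly surprised
```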
Virtual digital humans can not only express images and sounds, but also simulate the emotions and reactions of real humans through interaction with users. Their applications have spread widely into fields such as education, medical care, entertainment, and customer service. This paper focuses on two core technologies in digital human technology, speech generation (TTS, Text-to-Speech) and image generation and processing, and explores their development history, technical challenges, and future trends. First, the paper analyzes the evolution of TTS speech generation. Deep learning models such as Tacotron 2 and FastSpeech have significantly improved the naturalness, fluency, and emotional expression of speech synthesis by optimizing the model architecture. At the same time, the rise of multi-emotional speech synthesis and personalized voice customization has promoted the application of virtual digital humans in different scenarios, enabling them to show richer emotional levels and personality characteristics and further enhancing the user's immersion and interactive experience. Second, with the advancement of deep learning technologies such as generative adversarial networks (GANs) and deep convolutional neural networks (CNNs), the facial expressions, body movements, and detail processing of virtual digital humans have reached a high level. Through facial motion capture and posture estimation, the dynamic performance and real-time interactive capabilities of virtual digital humans have been greatly improved, making them more realistic and natural.