AI Interactive Film-Games
Multimodal Generation Technologies and Audiovisual Production Workflows
This group of literature focuses on the underlying technologies and engineering practice of AI for audio, video, and animation generation. It covers AIGC, 3D Gaussian splatting, end-to-end AI animation production, ultra-low-bitrate video coding, and multimodal video editing tools, aiming to restructure film and television production pipelines through technology and to raise the visual quality and generation efficiency of interactive film-games.
- AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control(Xinyue Guo, Xiaoran Yang, Lipan Zhang, Jianxuan Yang, Zhao Wang, Jian Luan, 2025, ArXiv)
- Neural Performance Toolset: AI-Powered Human Performance Synthesis(Jo Plaete, Matteo Olivieri-Dancey, Oriel Frigo, M. Anton, Sebastian Correa, Simon Deckers, Thomas Salama, Terrence Bannon, Tomas Koutsky, 2025, Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Talks)
- Study of Production Workflows in Interactive 3D Animation with AI Applications(E. Adriana, Balgum Song, 2024, International JOURNAL OF CONTENTS)
- From Script to Visualisation: Exploring and Implementing AI-Empowered Dynamic Storyboard Design in Education(Zheyuan Zhang, X. Zhong, 2025, US-China Education Review A)
- Large Generative Models Meet Multimodal Video Intelligence(Mike Zheng Shou, 2023, Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications)
- End-to-end Generative Pretraining for Multimodal Video Captioning(P. H. Seo, Arsha Nagrani, Anurag Arnab, C. Schmid, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Implementation Paths of Generative AI in Multimodal Learning(B. Tan, 2025, Journal of Computer Science and Artificial Intelligence)
- AI动画在融合中华优秀传统文化方面的探索——以《千秋诗颂》为例(冀芊凝, 2025, 新闻传播科学)
- Ultra-Low Bitrate Multimodal Generative Face Video Coding Framework(Zhi Liu, Shuo Liu, Hongyun Lu, H. Bai, Hongyuan Jing, Mengmeng Zhang, 2025, 2025 Picture Coding Symposium (PCS))
- CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation(Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, 2025, Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers)
- Generative Video Semantic Communication via Multimodal Semantic Fusion With Large Model(Hang Yin, Li Qiao, Yu Ma, Shuo Sun, Kan Li, Zhen Gao, D. Niyato, 2025, IEEE Transactions on Vehicular Technology)
- A Survey of Interactive Generative Video(Jiwen Yu, Yiran Qin, Haoxuan Che, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Hao Chen, Xihui Liu, 2025, ArXiv)
- QuickCut: An Interactive Tool for Editing Narrated Video(A. Truong, Floraine Berthouzoz, Wilmot Li, Maneesh Agrawala, 2016, Proceedings of the 29th Annual Symposium on User Interface Software and Technology)
- AI in Video Analysis, Production and Streaming Delivery(A. Jayanthiladevi, Arun Raj, R. Narmadha, Sajin S. Chandran, Sai Shaju, K. Krishna Prasad, 2020, Journal of Physics: Conference Series)
- Portrait Video Editing Empowered by Multimodal Generative Priors(Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, Juyong Zhang, 2024, SIGGRAPH Asia 2024 Conference Papers)
- AI-driven film and television production technology and intelligent teaching research(Jiaqi Bian, 2025, No journal)
- 人机共创视域下动画电影的生产机制与创作主体性研究(姚语瞳, 2026, 艺术研究快报)
Interactive Narrative Generation Models and Intelligent Authoring Frameworks
This line of research examines how large language models (LLMs) and automated systems can be used to construct dynamic narrative logic. Key concerns include generating coherent plots, balancing player agency against authorial narrative control, no-code authoring tools, and worldbuilding for interactive digital narratives (IDN) around specific themes such as environmental protection and fashion.
- From Emergence to Planning: A Triangle Framework for Scalable, Controllable Interactive Storytelling(Lasantha Senanayake, 2025, Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment)
- SceneCraft: Automating Interactive Narrative Scene Generation in Digital Games with Large Language Models(Vikram Kumaran, Jonathan Rowe, Bradford W. Mott, James C. Lester, 2023, No journal)
- Drama Llama: An LLM-Powered Storylets Framework for Authorable Responsiveness in Interactive Narrative(Yuqian Sun, Phoebe J. Wang, John Joon Young Chung, Melissa Roemmele, Taewook Kim, Max Kreminski, 2025, ArXiv)
- Narrative generation through characters' point of view(J. Porteous, M. Cavazza, Fred Charles, 2010, No journal)
- Structure, Agency, and Improvisation in Human-Led Digital Interactive Narrative Exercises(Mira Fisher, Molly Siler, Stephen G. Ware, 2025, Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment)
- Versu—A Simulationist Storytelling System(Richard Evans, Emily Short, 2014, IEEE Transactions on Computational Intelligence and AI in Games)
- Mind Stories: A Story Making Game - From Narrative Therapy to Interactive Narrative Therapy(M. Eladhari, H. Koenitz, 2023, No journal)
- Being Water: Collaborating with an LLM in an Interactive Digital Narrative (IDN) as Speculative Aesthetics(Rafaela Nunes, Terhi Marttila, Andrés Isaza-Giraldo, Paulo Bala, Pedro F. Campos, 2024, No journal)
- WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling(Zhuoran Lu, Qian Zhou, Yi Wang, 2025, Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems)
- Simonstown: An AI-facilitated Interactive Story of Love, Life, and Pandemic(Bingyuan Wang, P. Zhu, Hao Li, D. Yip, Zeyu Wang, 2023, Proceedings of the 16th International Symposium on Visual Information Communication and Interaction)
- Memory Remedy: An AI-Enhanced Interactive Story Exploring Human-Robot Interaction and Companionship(Lei Han, Yu Zhou, Qiongyan Chen, David Yip, 2024, Proceedings of the 17th International Symposium on Visual Information Communication and Interaction)
- Tinker Tales: Interactive Storytelling Framework for Early Childhood Narrative Development and AI Literacy(Nayoung Choi, Peace Cyebukayire, Jinho D. Choi, 2025, ArXiv)
- Exploring Eco-Narrative Interaction through AIGC: The Creative Journey of “Plast-ocean”(Jinlin Miao, Zhiyuan Zhou, Fanjing Meng, 2025, Companion Publication of the 2025 ACM Designing Interactive Systems Conference)
- The Application of Interactive AI-Based Narrative Translation Model in Fashion Design(Siwen Huang, Zhihan Zhang, 2024, 2024 International Conference on Interactive Intelligent Systems and Techniques (IIST))
- Exploring Collaborative Interactive Digital Narrative Creation in Higher Education Through Narrative Analysis: A Case Study on COVID-19 Storytelling(Dimitra Petousi, A. Katifori, Maria Boilé, Eirini Sifaki, 2024, No journal)
- Natural Language Understanding in Façade: Surface-Text Processing(Michael Mateas, A. Stern, 2004, No journal)
- A Conceptual Blending Approach to the Generation of Cognitive Scripts for Interactive Narrative(Justin Permar, Brian Magerko, 2021, Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment)
- ConnectVR: A Trigger-Action Interface for Creating Agent-based Interactive VR Stories(Mengyu Chen, Marko Peljhan, Misha Sra, 2024, 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR))
- Evolving Interactive Narrative Worlds(Justus Robertson, J. Heiden, R. E. Cardona-Rivera, 2023, No journal)
- Villanelle: An Authoring Tool for Autonomous Characters in Interactive Fiction(Chris Martens, Owais Iqbal, 2019, No journal)
- Emotional Characters for Automatic Plot Creation(M. Theune, S. Rensen, R. O. D. Akker, D. Heylen, A. Nijholt, 2004, No journal)
- Separating the autonomous behaviors and coordination regimes of non-player characters(G. Milla, Juan Fernández-Olivares, 2014, No journal)
- Interactive Digital Narrative and Structured Debate for Learning Ethics(Caroline Gerbaudo Nakazato, André Santanchè, 2025, No journal)
Affective Computing, Virtual Characters, and Deep Human-Computer Interaction Experiences
This group focuses on emotion modeling and interaction perception for AI virtual characters (NPCs). It covers multimodal affective computing, voice puppetry, emotion-aware music generation, and studies of the emotional bonds, psychological trust, and sense of companionship that users develop with AI agents in immersive (VR/AR) environments.
- Authoring vs. Configuring Affective Agents for Interactive Storytelling(Stefan Rank, Steve Hoffmann, H. Struck, Ulrike Spierling, Simon Mayr, P. Petta, 2014, Applied Artificial Intelligence)
- Before We Disappear: The New Faces of Interactive Media(Richard Ramchurn, Joanne Parkes, Callum Berger, Giovani Schiazza, 2024, 2024 International Conference on Electrical and Computer Engineering Researches (ICECER))
- “AI陪伴”现象的形成逻辑、情感形塑与应对策略(王三叶, 2026, 新闻传播科学)
- 基于情感的聊天机器人互动设计研究(王子扬, 2024, 设计进展)
- 被理解的错觉:情感识别AI中的拟人化回应与传播误读机制研究(彭琪琪, 2025, 新闻传播科学)
- Storytelling and VR: Inducing emotions through AI characters(Gabriela Maria Pyjas, Jonathan Weinel, Martyn Broadhead, 2022, Electronic Workshops in Computing)
- MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing(Shreya Ghosh, Zhixi Cai, A. Dhall, Dimitrios Kollias, Roland Goecke, T. Gedeon, 2024, Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing)
- Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting(Miaosen Luo, Jiesen Long, Zequn Li, Yunyi Yang, Yuncheng Jiang, Sijie Mai, 2025, ArXiv)
- Emotion-Aware Music Generation for Personalized Soundtracks in Digital Media Using Transformer-Based AI Models(S. G. Rao, D. Beula, Pooja Sahu, R. Al-Fatlawy, M. Gowtham, M. Manaa, Ahmed Abdulsalam Alhayaly, 2025, 2025 3rd International Conference on Cyber Resilience (ICCR))
- Voice Puppetry: Speech Synthesis Adventures in Human Centred AI(M. Aylett, Yolanda Vazquez-Alvarez, 2020, Companion Proceedings of the 25th International Conference on Intelligent User Interfaces)
- Agents with Emotional Intelligence for Storytelling(João Dias, Ana Paiva, 2011, No journal)
- Interacting with virtual characters in interactive storytelling(M. Cavazza, Fred Charles, Steven J. Mead, 2002, No journal)
- Creativity in Configuring Affective Agents for Interactive Storytelling(Stefan Rank, Steve Hoffmann, H. Struck, Ulrike Spierling, P. Petta, 2012, No journal)
- An Autonomous Real-Time Camera Agent for Interactive Narratives and Games(A. Sorkine-Hornung, G. Lakemeyer, G. Trogemann, 2003, No journal)
- MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding(Yi Liu, Haowen Hou, Fei Ma, Shiguang Ni, F. Yu, 2025, IEEE Signal Processing Letters)
- Making you matter: creating interactive VR narratives through experimentation and learning(Larry Cutler, Eric Darnell, N. Dirksen, Michael Hutchinson, Scott Peterson, R. Schiewe, Wei Wang, 2019, Proceedings of the 2019 Digital Production Symposium)
- Agence, A Dynamic Film about (and with) Artificial Intelligence(Pietro Gagliano, Casey Blustein, Dave Oppenheim, 2021, ACM SIGGRAPH 2021 Immersive Pavilion)
- NarrativePlay: Interactive Narrative Understanding(Runcong Zhao, Wenjia Zhang, Jiazheng Li, Lixing Zhu, Yanran Li, Yulan He, Lin Gui, 2023, ArXiv)
- Influence of Music and Sounds in an Agent-Based Storytelling Environment(A. Leonardo, António Brisson, Ana Paiva, 2009, No journal)
- 虚拟数字人在电商直播场景的设计——以相芯科技数字主播为例(涂化兰, 王全权, 2024, 设计进展)
- WILDWOOD: Southside Chicago Worldbuilding Through Interactive Digital Narrative(E. M. Alexander, 2024, No journal)
- A Multimodal Interactive Storytelling Agent Using the Anthropomorphic Robot Head Flobi(Lilian Schröder, Victoria Buchholz, Victoria Helmich, L. Hindemith, B. Wrede, Lars Schillingmann, 2017, Proceedings of the 5th International Conference on Human Agent Interaction)
Vertical-Domain Applications, Cultural Heritage, and Industry Practice
This group examines how AI interactive storytelling is deployed in specific social settings, including medical simulation (simulated patient conversations), children's education (creative storytelling), digital preservation of cultural and tourism heritage, museum guidance, and game localization. It emphasizes how AI technology can be adapted to different cultural contexts and industry needs.
- Culturally Adaptive Integration of Generative AI in Film Production and Education(Yeqiansui Yao, Gang Xu, 2025, 2025 7th International Conference on Internet of Things, Automation and Artificial Intelligence (IoTAAI))
- 生成式人工智能赋能“儿童文学”智慧教学的实践与探究(方灿灿, 开 健, 2025, 教育进展)
- Synthetic Patients: Simulating Difficult Conversations with Multimodal Generative AI for Medical Education(Simon N. Chu, Alex J. Goodell, 2024, ArXiv)
- 人机协作的游戏文化元素本地化新探——以《原神》活动对联英译为例(张翼璇, 2025, 现代语言学)
- A Culturally Sensitive Interactive Digital Narrative to Promote Bodily Awareness Among Afghan Women(Pakezea Anwar, Hartmut Koenitz, 2025, No journal)
- AI and Cultural Innovations in South Korea(S. Dongre, 2025, International Journal of Science, Architecture, Technology and Environment)
- Innovation and development strategy of interactive entertainment industry driven by artificial intelligence(Stanley Chen, 2024, Journal of Artificial Intelligence Practice)
- Young Children's Creative Storytelling with ChatGPT vs. Parent: Comparing Interactive Styles(Jenna H. Chin, Seungwook Lee, Mohsena Ashraf, Matt Zago, Yun Xie, Elizabeth A Wolfgram, Tom Yeh, Pilyoung Kim, 2024, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- StoryChat: An Interactive Storytelling System for Engaging with Agent Characters in a Story(Shouqian Sun, Zizhen Wang, Jiarong Zhang, Xiaoliang Zhao, Xianyue Qiao, 2024, 2024 17th International Symposium on Computational Intelligence and Design (ISCID))
- Exploration of the Integrated Application of Creative AI and Film Visual Effects Technology in Cultural Tourism(Peng Yan, Qing Li, Anjia Ma, Jin Zhang, 2025, Proceedings of the 2025 2nd International Conference on Digital Economy and Computer Science)
- “Hyper Photography” Artifact: An interactive aesthetic education experience device designed based on AIGC(Yishan Liu, Zhen Chen, Xin Xie, Yaodong Hu, Lie Zhang, Weiran Li, S. Li, 2024, 2024 IEEE World AI IoT Congress (AIIoT))
- 基于活动理论的AIGC观展交互设计研究——以“趣迹”APP为例(胡 琪, 2026, 设计进展)
- From film to simulation: Mixed-methods study of learner-driven artificial intelligence-generated ethical dilemma scenarios and role-play to enhance ethical competence in nursing students.(B. Hwang, Mi Yu, 2026, Nurse education in practice)
- “A Painless Way to Learn:” Designing an Interactive Storytelling Voice User Interface to Engage Older Adults in Informal Health Information Learning(Smit Desai, Morgan Lundy, Jessie Chin, 2023, Proceedings of the 5th International Conference on Conversational User Interfaces)
- Research on Interactive Innovation between Television Program Formats and Film Narratives in the Context of Integrated Media(Xiewei Wu, 2025, Proceedings of the 2025 International Conference on Generative AI and Digital Media Arts)
- 交际翻译理论视角下的游戏本地化翻译研究——以互动式电影游戏《底特律:成为人类》为例(王若冰, 2024, 现代语言学)
- Gaming the System: Case Study in Investigative Journalism and Playful Interactive Narrative Design to Explain Systemic Bias in Immigration Policy(Lindsay D. Grace, 2023, No journal)
Art Ontology, Aesthetic Psychology, and Ethical Regulation
This group reflects at a theoretical level on the media transformation brought about by AI. It covers the "process aesthetics" of digital art, the viewing experience seen through embodied cognition, polyphonic narrative theory, the legal challenges of AI authorship, the ethical dilemmas of digital life, and quality evaluation frameworks for multimodal generation.
- “未完成”的状态:数字艺术的交互、生成与虚拟性如何共构一种过程美学(廖梓全, 2025, 艺术研究快报)
- 近十年来电影叙事动态与趋势研究——基于CiteSpace的CNKI文献可视化分析(欧阳荣, 2026, 统计学与应用)
- Can AI Create an Interactive Digital Narrative? A Benchmarking Framework to Evaluate Generative AI Tools for the Design of IDNs(Hartmut Koenitz, M. Eladhari, J. Barbara, 2024, No journal)
- Streams of Consciousness in Interactive Digital Narrative Research Design(Colette Daiute, Jack Wright, John T. Murray, 2024, No journal)
- Interactive Film Recombination(Fabrizio Guerrini, N. Adami, Sergio Benini, Alberto Piacenza, J. Porteous, M. Cavazza, R. Leonardi, 2017, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM))
- 超验并建构:以复调叙事方式骈接交互电影与视觉游戏(曹 越, 2024, 设计进展)
- 具身认知视角下的互动电影《黑镜:潘达斯奈基》研究(邸敬梅, 李 燕, No journal)
- AI艺术设计的主体性和情感性研究(王兴珍, 2023, 设计进展)
- 从人际信任的角度探究人机信任的影响因素(曹云霞, 2018, 社会科学前沿)
- Using Self-Determination Theory to Explore Enjoyment of Educational Interactive Narrative Games: A Case Study of Academical(Katelyn M. Grasse, M. Kreminski, Noah Wardrip-Fruin, Michael Mateas, E. Melcer, 2022, No journal)
- How Do Users Adopt AI-Generated Content (AIGC)? An Exploration of Content Cues and Interactive Cues(Chenxi Li, Yixun Lin, Ruqing Chen, J. Chen, 2025, Technology in Society)
- 系统功能语言学视角下人工智能叙事生成中的主位结构——基于DeepSeek生成的虚拟故事(申 瑶, 2025, 现代语言学)
- From Expanded Cinema to Extended Reality: How AI Can Expand and Extend Cinematic Experiences(Junrong Song, Bingyuan Wang, Zeyu Wang, D. Yip, 2023, Proceedings of the 16th International Symposium on Visual Information Communication and Interaction)
- 全媒体时代影视IP的多维度开发与文化传承路径研究(盖亚兰, 辛 平, 2024, 可持续发展)
- Hey Siri, tell me a story: Digital storytelling and AI authorship(S. Thorne, 2020, Convergence: The International Journal of Research into New Media Technologies)
- And Yet… The Paradox of Generative AI Griefbots(S. O'Flynn, 2025, Interactive Film & Media Journal)
- Distant Coding and the Future of Interactive Digital Narrative Pedagogy(John Murray, Anastasia Salter, 2025, No journal)
- 生成式人工智能网络小说的性质认定和权属论证(吕 品, 2024, 社会科学前沿)
- The Second Organ Era: Exploring Human-AI Relationship Through Interactive Narrative(Yanru Qian, Ching Wen Lee, Adorey Shen, 2025, Proceedings of the 2025 ACM Designing Interactive Systems Conference)
- 从技术到价值:纪录片智能创作的应用与审思(王 俐, 2025, 新闻传播科学)
- From Code to Camera: The Making and Meaning of Prosomoíosi (Simulation), an AI Documentary Film(Shuai Liu, Mar Canet Solà, 2025, Proceedings of the 18th International Symposium on Visual Information Communication and Interaction)
- Research on the Diversity and Consistency Evaluation Method for Generative Tasks Based on Multimodal Large Models(Qianying Yang, Hua Zheng, Zhengguo Ren, Cheng Li, Junjian Liu, 2025, 2025 7th International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI))
- Analyzing Audience Comments: Improving Interactive Narrative with ChatGPT(Xiaoxu Li, Xiao You, Siyuan Chen, Pittawat Taveekitworachai, R. Thawonmas, 2023, No journal)
- From Simulation to Collaboration: Lessons from Designing an Interactive Narrative About a Character Living with Aphasia(N. Jarrett, 2025, No journal)
- The Importance of Representative Likeness: Why we Should Represent Interactive Digital Narrative with Interaction(H. Koenitz, Joshua A. Fisher, Anne Sullivan, Mirjam Palosaari Eladhari, Michael Cook, 2024, Journal of Interactive Narrative)
The final grouping covers the full chain of AI interactive film-games, from the "technical foundation" to the "artistic superstructure." The research map shows:
1. On the technology side, work has shifted from single-modality generation toward complex multimodal audiovisual coordination and controllable 3D generation.
2. On the creation side, the emphasis is on using LLMs to build automated narrative frameworks that preserve a strong sense of player agency.
3. On the experience side, research digs into affective computing and embodied cognition, trying to dispel the coldness of human-computer interaction.
4. On the application side, the field shows strong social reach, delivering value in vertical domains such as education, healthcare, and cultural tourism.
5. On the theory side, scholars are actively reconstructing the ontology of digital art and reflecting on the copyright, ethical, and aesthetic paradigm shifts that AI has triggered.
Overall, the field is developing in a way that gives equal weight to technical capability and humanistic value.
A total of 104 related publications.
Large language models are profoundly reshaping the development paradigm of the translation and language services industry. This change is increasingly evident in online game localization: game narrative texts are multimodal, interactive, and immersive, and their cultural elements blend the real and virtual worlds, challenging the transcreation abilities of localization practitioners. Taking the open-world role-playing game Genshin Impact (《原神》) as a case, this study constructs prompts to explore the potential of the DeepSeek model for localizing in-game cultural elements. The study shows that large language models can assist in generating multiple adapted versions of a text, effectively improving the narrative experience for different player groups and showing advantages in maintaining stylistic consistency. However, it also reveals the limitations of large models in the precision of cultural adaptation and in hallucination, indicating that current technology still requires human-machine collaboration with professional translators to achieve optimal localization. These findings provide an important methodological reference for game localization practice in the AI era.
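To make the prompt-construction step described above concrete, the sketch below shows one way such a culture-aware localization prompt could be assembled and passed to a chat-style model; the function names, prompt fields, and the llm callable are illustrative assumptions, not the study's actual setup.

```python
# Hypothetical sketch of culture-aware game-text localization via prompt construction.
# The prompt wording and the injected llm callable are assumptions for illustration.
from typing import Callable

def build_localization_prompt(source_text: str, cultural_notes: str, audience: str) -> str:
    """Assemble a transcreation prompt for one piece of game narrative text."""
    return (
        "You are a game localization specialist.\n"
        f"Target audience: {audience} players.\n"
        f"Cultural background of the source: {cultural_notes}\n"
        "Task: produce two English versions of the text below, one literal and one "
        "transcreated, while keeping the game's narrative style consistent.\n"
        f"Source text: {source_text}"
    )

def localize(source_text: str, cultural_notes: str, audience: str,
             llm: Callable[[str], str]) -> str:
    """Run the prompt through any chat-style LLM client supplied by the caller."""
    return llm(build_localization_prompt(source_text, cultural_notes, audience))
```

In this framing the model proposes candidate versions and a professional translator compares and revises them, which matches the human-machine collaboration the study recommends.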
Drawing on CNKI records of film narrative research from the past decade and using the CiteSpace visualization tool, this study conducts a bibliometric analysis along the dimensions of publication volume, disciplinary distribution, keyword clustering, and burst terms, providing a reference for future film narrative research in China. The findings show that film narrative research has developed rapidly over the past decade and is markedly interdisciplinary, with hotspots centered on narrative structure, spatial narrative, and metaverse narrative. Driven by digital technology, the field has given rise to new features such as the construction of local discourse, greater audience participation, and "media+" convergence. While this points to great potential for future film narrative research, there is an urgent need for an integrated cross-disciplinary, cross-media research framework to push film narrative toward immersive, participatory forms.
In the omnimedia era, film and television IP is a core element of the cultural industry, and its multidimensional development and role in cultural transmission have become a key path for driving cultural innovation and the high-quality development of the cultural industry. The article analyzes the current state of film and television IP development in the omnimedia era and the impact of omnimedia technology on IP development models, discusses multidimensional development strategies covering content innovation, technological innovation, and business-model innovation, and outlines the paths through which film and television IP contributes to cultural transmission. The study concludes that, through strategies such as deep and cross-sector integration, digital technology and interactive experiences, and diversified revenue channels, film and television IP can not only maximize its value but also promote the wide dissemination and innovative development of culture. As an important vehicle for cultural communication and education, film and television IP plays an irreplaceable role in cultural transmission and supports the protection, innovation, and diverse presentation of culture.
From the perspective of systemic functional linguistics, this study analyzes the characteristics and patterns of thematic structure in Chinese fictional narrative texts generated by DeepSeek. By building a dedicated corpus and conducting quantitative and qualitative analysis of 45 fictional stories, it identifies the thematic-structure characteristics of AI narrative generation, reveals the strengths and weaknesses of AI narratives in thematic clarity and logical progression, and proposes ways to improve the coherence and appeal of AI narratives, offering new insights for the theory and practice of AI narrative generation.
With the advance of Digital China and the rapid growth of the digital economy, digital art, as a frontier where culture and technology converge, is undergoing a fundamental shift in its aesthetic paradigm. This paper explores how three core qualities of digital art (interactivity, generativity, and virtuality) jointly dissolve the "closed and complete" form of the traditional artwork and co-construct a "process aesthetics" whose defining feature is the "unfinished." The paper first traces the theoretical lineage of process aesthetics, then examines the merging of author and viewer brought about by interactivity, the "algorithmic-aleatory" creative logic triggered by generativity, and the "presence-absence" experiential situation constructed by virtuality. On this basis it argues that the three are interwoven, together transforming the digital artwork from a static, determinate "object" into a dynamic, open "event." The study offers a new theoretical perspective for understanding the ontology of digital art and provides scholarly support from the frontier of artistic practice for China's strategy of digitizing the cultural industry and building a vibrant digital cultural ecosystem.
Young visitors to exhibitions commonly experience gaps of "not understanding," "not being interested," and "not remembering," which severely limit how effectively exhibition content is communicated. Drawing on activity theory and digital storytelling, and addressing contradictions such as visitors' limited interpretive capacity and the lagging effectiveness of existing tools, the article constructs an AIGC-enabled interaction system model. On this basis it designs the "趣迹" app: computer vision and large language model technologies convert obscure exhibit information into accessible knowledge cards in real time, lowering the cognitive threshold; speech synthesis gives exhibits anthropomorphic personas, and first-person narration reshapes the rules of interaction to strengthen emotional resonance; visual metaphors and a collection mechanism integrate fragmented viewing experiences into fluid digital memories. Through multimodal technology, the design optimizes the whole visiting process from perception to memory and offers a new practical path for the living transmission of exhibition content.
In human-computer interaction, many user behaviors carry rich and subtle emotional information. By taking emotion into account during text or voice dialogue, AI systems appear able to understand and respond to human feelings, which creates a psychological illusion of "being understood" in users' interactions with AI. This study focuses on this "illusion of being understood" in emotion-recognition AI, aiming to reveal how algorithms construct users' emotional perception through anthropomorphic language and empathy templates in communication practice. Using scenario simulation and cross-platform text analysis, the study selects six typical emotional scenarios (three positive and three negative) and codes and compares the responses of three mainstream AI platforms. The results show that current emotion-recognition AI generally relies on emotion-labeling feedback, templated empathic phrasing, and first-person anthropomorphic expression to create a sense of "emotional presence," thereby leading users to feel understood and to trust the medium. However, this mode of emotional feedback carries risks of emotional misreading, dependence, and blurred attribution of responsibility. The study argues that future algorithmic emotion design must balance approachability with authenticity and avoid excessive emotional mimicry, and it proposes governance measures such as making emotion-generation mechanisms transparent and improving users' media literacy.
With advances in artificial intelligence, documentary production is showing unprecedented expansiveness and diversity. AI can provide technical support for extending image content, synthesizing and replacing audiovisual elements, restoring footage, and creating immersive experiences, significantly improving documentaries' narrative expression, audience experience, and production efficiency, and driving innovation in visual expression. At the same time, AI's involvement also blurs the boundary of authenticity, challenges audience trust, and weakens creator agency, making media ethics and value awareness urgent concerns. Against this background, documentary creators need to uphold a humanistic spirit and use technology judiciously to assist visual presentation while safeguarding social responsibility and audience trust; the industry also needs to cultivate hybrid talent combining technical literacy with humanistic care, so as to balance technological innovation with the protection of values.
With the development of artificial intelligence, chatbots that imitate human thinking and are primarily social in character have emerged. While helping users solve problems, chatbots also meet users' social needs, providing real emotion and companionship and giving users a sense of positive interaction. By analyzing the current development and design characteristics of social chatbots and drawing on real cases, this paper explores the role of emotion in human-computer interaction and, from there, the impact of AI technology on human development.
With the growth of AI-generated content (AIGC) technology, the animated film industry is shifting from traditional "hand-drawing and live shooting" to "AI generation." Set in this era of "human-machine co-creation," this paper examines how AIGC technology is restructuring the production mechanisms of animated film and driving an evolution in creative agency. The study finds that the roles of directors and artists are migrating from laborious manual work toward generating core creative ideas, exercising aesthetic control, and shaping emotional arcs; AIGC technology has therefore not replaced human creators but has pushed the production process toward an intelligent collaborative model. This process not only redefines the identity of the "author" but also brings entirely new possibilities and challenges to the narrative aesthetics of animated film.
As technology transforms the film industry, interactive films built on interaction technology attempt to break down the barrier between the virtual and the real, creating situations that give viewers a different kind of cognitive viewing experience. Embodied cognition is a new orientation in contemporary cognitive psychology: compared with traditional cognitive theories, it stresses the agency of the body in cognition and treats bodily perception as a key factor in how cognition forms. Based on embodied cognition theory, this paper analyzes Black Mirror: Bandersnatch, discussing from three angles (embodiment-embedding, mirroring-empathy, and interactive construction) how the narrative mode of interactive film gives audiences a sense of involvement, and examining changes in film narrative and how technology refines interactive construction.
AI art and design extends AI technology into the domains of human aesthetics and design. Design exists to serve people, and art and design must take the subjectivity and emotionality of designers and audiences as their starting point while balancing human aesthetics with practical needs. But when the designer is a machine, can AI art and design possess subjectivity and emotionality as human design does? This paper argues that AI generates designs stochastically from databases supplied by humans, and that these databases are distillations of human design conventions, emotional symbols, and artistic styles; AI design works therefore fall within the "horizon of expectations" of human aesthetics and thus do possess subjectivity and emotionality. However, AI is highly rational and strongly logical, which means the subjectivity and emotionality of AI design are limited and cannot reach the same degree of sensibility as humans.
Polyphonic novel theory refers to multiple voices or ideas appearing simultaneously to construct a parallel yet unified objective artistic world. In theatre, film, and video games, polyphonic narrative techniques are mostly applied to character construction, driving plot development, interacting with the audience, and enriching the perspectives of viewers and players. Interactive film divides a complete narrative plot into segments that can be chosen or discarded, and polyphonic narration is its most common manifestation in character dialogue; the visual games that have emerged in recent years likewise apply polyphonic narrative techniques. Using typical-case analysis, the article studies popular interactive film-game cases individually, identifies what they share in narrative logic, and explores the innovative application of polyphonic novel theory to interactive film-game texts. Taking works such as 《隐形守护者》 (The Invisible Guardian) and Detroit: Become Human as points of departure, it examines the role of polyphonic narrative theory in interactive film games and proposes polyphonic narrative methods as an artistic practice for the coordinated creation of interactive film and visual games, so that the narrative and emotional core hidden beneath the image space is presented harmoniously to viewers and players.
The rapid global expansion of games has driven the growth of the game localization industry, and the Chinese market, which global developers compete for, underlines the importance of localization. From the perspective of communicative translation theory, this paper studies game localization using the interactive film game Detroit: Become Human as a case. It finds that, guided by communicative translation theory, translators can convey the content of the source text in concise, idiomatic renderings. Applying communicative translation theory to the localization of games of this type can convey their meaning fairly completely, allowing target-language audiences to understand the source culture and to experience the work much as source-language audiences do.
Trust plays an important role in interpersonal interaction, but does the mechanism of trust between humans and AI systems differ from interpersonal trust? Research shows that the performance of an AI system (such as reliability, false-alarm rate, and failure rate), its attributes (such as appearance and physical contact), and the user's cultural background all influence the establishment of human-machine trust. Fortunately, factors such as performance and attributes can be improved through deliberate design.
In the AGI era, e-commerce livestreaming is entering a new stage of development built around digital hosts. In livestream commerce, digital hosts offer user interactivity, diverse personas, continuous operation, and marketing innovation. At the same time, platforms represented by 相芯科技 have introduced large-model technology into digital hosts, bringing them closer to the effect of human hosts. This paper analyzes the design characteristics and methods of digital hosts in e-commerce livestreaming and, taking 相芯科技's digital hosts as an example, focuses on deployment strategies, discussing the importance of digital hosts in advancing the livestream commerce industry and its technological innovation.
As artificial intelligence develops explosively, its penetration into education deepens, and it shows distinctive advantages and broad prospects in smart teaching. This study explores the innovative application and effectiveness of generative AI tools in smart teaching of children's literature: students are guided to use such tools to design teaching plans covering key stages including lesson introduction, content explanation, interactive discussion, assignment setting, and assessment feedback, with emphasis on creating AI picture books and virtual digital humans and integrating these novel elements into teaching practice. Through teaching experiments, questionnaires and data analysis, evaluation of lesson-plan quality, and multiple assessment methods, the study examines the tools' actual effects. The results show that the tools significantly improve the efficiency and quality of smart teaching, greatly enrich teaching formats, and increase the fun and interactivity of lessons, thereby effectively raising students' interest and classroom participation. The study also clarifies concrete application models of generative AI tools in smart teaching and explores the mechanisms by which they optimize teaching, providing empirical evidence and a useful reference for innovation in children's literature teaching and in education more broadly, and helping to push the digitalization of higher education toward integration, intelligence, and internationalization.
Artificial intelligence subtly shapes people's lives in travel, healthcare, and entertainment, and generative AI represented by ChatGPT has also made web-novel writing easier. With this comes a conflict between the development of generative AI and the existing copyright system, centered on two questions: do web novels created by generative AI count as works under copyright law, and to which party do the rights and obligations in such works belong? These two contested and closely related questions are the basis for studying other issues. Starting from the balance of interests among the multiple parties involved in the operation of generative AI, the article analyzes the legal status of generative AI in light of judicial cases, determines an allocation from the perspectives of fairness and efficiency, and assigns copyright ownership in proportion to each party's substantive intellectual contribution to the AI-generated work. It also offers approaches to the copyright-infringement issues raised by data mining and analysis of existing works during text generation, promoting the lawful use and sustainable development of generative AI.
《千秋诗颂》 is China's first animated series produced with AI involved throughout the entire workflow, giving a fresh visual interpretation of classical poems taught in compulsory education. Its broadcast not only provides a practical template for the direction of AI in animation but also achieves an innovative fusion of fine traditional Chinese culture with frontier technology. Notably, although the series is pioneering in technical exploration and application, its final quality and public reception show that there is still room for improvement. The efficient and flexible use of AI in animation production will depend on practitioners continuing to combine technical breakthroughs with artistic innovation.
In the context of rapidly evolving integrated media, the boundaries between television formats and cinematic narratives are increasingly blurred, with cross-media interaction driving content innovation. This study explores the complementarity of television and film in narrative structures, expressive techniques, and audience engagement from both theoretical and technological perspectives, leveraging computer technologies to provide data support and algorithmic optimization for their convergence. It first analyzes structural characteristics, identifying differences and synergies in pacing, visual expression, and emotional resonance. An interactive model for the integrated media ecosystem is then constructed, integrating content production, dissemination, and feedback to enable dynamic adaptation across platforms. Built on a data collection and analysis platform, the model enhances creators' control over narratives and deepens audience immersion through content tagging, preference prediction, and multimodal processing. Validation results show significant potential in boosting content appeal, extending audience dwell time, and optimizing cross-platform dissemination. This research offers a feasible technical framework and practical insights for television-film interaction and cross-media content creation in the integrated media era.
No abstract available
This paper presents a multimodal framework integrating generative artificial intelligence (AI) into film production and educational content creation, with a focus on cultural adaptability. Combining text generation, visual synthesis, audio-video integration, and interactive Q&A modules, the framework enables automated production of culturally enriched educational media, using Guangdong culture as a case study. A demonstration using ChatGPT-4o, Stable Diffusion XL, and Pika Labs explored a futuristic urban scenario, with evaluations from 15 participants highlighting strong engagement, narrative depth, and cultural resonance. Comparative tool analysis revealed trade-offs in generation speed, quality, and post-editing flexibility, while user feedback emphasized the importance of fine-grained control, educational clarity, and ethical safeguards. This study offers practical and theoretical insights for advancing culturally responsive, AI-driven educational media at the intersection of artificial intelligence, creative industries, and education.
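The staged pipeline described in this abstract (script generation, visual synthesis, audio-video assembly) can be pictured roughly as below; the stage names, signatures, and the way cultural context is injected are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a culturally adaptive text -> visuals -> video pipeline.
# Each stage is supplied by the caller (e.g. an LLM wrapper, a text-to-image model,
# a video assembler); none of these names come from the paper.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CulturalMediaPipeline:
    generate_script: Callable[[str], str]            # prompt -> narration script
    synthesize_frames: Callable[[str], List[str]]    # script -> list of image paths
    assemble_video: Callable[[List[str], str], str]  # frames + narration -> video path

    def run(self, topic: str, cultural_context: str) -> str:
        """Produce one educational clip conditioned on an explicit cultural context."""
        script = self.generate_script(f"{topic}. Cultural context: {cultural_context}")
        frames = self.synthesize_frames(script)
        return self.assemble_video(frames, script)
```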
Artificial intelligence is profoundly changing how film and television are produced and taught, providing the industry with unprecedented intelligent tools. This study explores key applications of AI in film and television production processes, including large-model-driven scriptwriting, intelligent character modeling, and text-to-image generation technologies that improve the efficiency and quality of content generation. AI also plays an important role in intelligent film and television teaching, including intelligent subtitle translation, virtual laboratories, and automatic assessment systems that enable immersive interactive learning. In light of current trends in AI technology, the study analyzes AI's deep impact on film and television production and education, and discusses how a future intelligent film and television ecosystem may evolve, with the aim of promoting industry innovation and practical adoption.
With the rapid development of digital technology, cultural tourism has entered a new stage of "experience upgrading", and the integration of creative AI and film visual effects (VFX) technology has become a key driving force to enhance the immersive experience and cultural communication efficiency of cultural tourism. This study aims to explore the application path, effect evaluation and optimization strategy of the integration of creative AI and film VFX in cultural tourism. First, the technical framework of the integration system was constructed, including AI-driven VFX content generation (based on GANs and diffusion models), real-time interactive VFX experience (combined with AR/VR), and cultural element intelligent matching modules. Then, a mixed research method combining quantitative and qualitative analysis was adopted: 300 tourists from three typical cultural tourism scenic spots (Lijiang Ancient Town, Dunhuang Mogao Grottoes, and the Palace Museum) were selected for a 3-month comparative experiment, and data such as tourist satisfaction, cultural cognition accuracy, and stay time were collected; in addition, in-depth interviews were conducted with 15 industry experts (including cultural heritage protectors, AI technology developers, and tourism managers) to obtain qualitative opinions on technical applicability and cultural compatibility. The experimental results show that compared with traditional cultural tourism forms, the integrated application of creative AI and film VFX can increase tourist satisfaction by 42.3%, improve cultural cognition accuracy by 35.7%, and extend average stay time by 61.2%. The main challenges identified include the high cost of customized VFX content, the risk of "over-digitalization" diluting cultural connotations, and the uneven technical acceptance of middle-aged and elderly tourists. This study provides a theoretical basis and practical reference for the digital transformation of cultural tourism and the innovative application of AI-VFX integration technology.
This paper examines the audiovisual artwork Prosomoíosi (Simulation) to critique how successive simulation technologies overwrite cultural memory. Grounded in media-archaeological thinking, the study argues that concepts must be articulated through media whose operative logic remains visible, thereby transforming viewers from passive spectators into active witnesses of algorithmic decision-making. A transparent live diffusion pipeline serves as both method and message, enabling audiences to observe how text prompts, stochastic noise, and performer input co-evolve on screen. By tracing precedents from early interactive installations to contemporary AI-driven works, the paper positions Prosomoíosi within a lineage that challenges tool-centric spectacle and renegotiates authorship at the human-machine frontier. Medium alignment thus emerges as a transferable design heuristic for artists seeking to move beyond technological virtuosity toward works that explicitly reveal, rather than conceal, their underlying political dimensions.
The rapid advancement in technology has led to the creation of interactive media across various fields, including education, entertainment, advertising, film, gaming, and animation. However, interactive animations have not achieved the same level of popularity as interactive films and games, often due to their complex story structures, additional production steps, high costs, and the necessity for expertise in game engines to enable interactivity. This paper examines the use of artificial intelligence (AI) tools, particularly Convai within Unreal Engine, to establish a more efficient workflow and reduce production costs in interactive 3D animation. The study compares traditional manual production methods using Unreal Engine and ChatGPT with AI-enhanced workflows that incorporate Convai. The findings indicate that AI tools significantly reduce production time and simplify the creation of interactive features. However, Convai has limitations in flexibility and precision, particularly when it comes to customizing features and animations. While AI tools are beneficial for beginners and those with limited programming experience in Unreal Engine due to their user-friendly nature, manual workflows provide greater flexibility for complex interactions and customizations. The research concludes that AI has substantial potential to improve the production of interactive 3D animation, although further advancements are necessary to enhance support for character and animation customization.
We present an interactive story named Simonstown that demonstrates the love and life of ordinary people in the fictional setting of a fatal pandemic. Technically, the artwork integrates different Artificial Intelligence (AI) technologies in the whole production pipeline, including concept formation, creation, and presentation stages; artistically, this interactive film explores the relationship between human and environment in the contemporary context, especially infused with advanced technologies in daily life. The project serves as a demonstration and case study of AI-facilitated interactive storytelling, including better control with AI and how they integrate with live image projects, as well as using the stand-alone camera for real-time synchronization. Our results highlight the significant contribution of AI in visualizing intricate story branching, translation, and adaptation, presenting AI visualization as a distinct, specialized, and well-suited tool for interactive filmmaking.
Interactive films offer a novel viewing experience that diverges from traditional linear cinema. This paper presents an ethical approach to creating an adaptive interactive film using facial recognition software for emotion detection without personal data collection. We propose a new algorithm that dynamically determines scene order based on viewers' emotional responses, ensuring varied experiences across multiple screenings while maintaining narrative coherence. Our method addresses ethical concerns surrounding data protection and AI use in media. Quantitative analysis shows 91.5% facial recognition accuracy in cinematic environments and 87.6% viewer engagement rates. Initial testing demonstrates significant improvements in scene order generation and viewer satisfaction compared to traditional interactive films. This research contributes to the growing field of affective computing in interactive media, exploring the balance between personalisation and privacy.
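A minimal sketch of emotion-driven scene ordering in the spirit of the adaptive approach above might look like the following; the emotion scores, scene tags, and coherence graph are illustrative placeholders rather than the paper's actual algorithm.

```python
# Hypothetical sketch: pick the next scene whose target emotion best matches the
# audience's detected state, while only following narratively coherent transitions.
from typing import Dict, List, Set

def next_scene(current: str,
               emotion_scores: Dict[str, float],         # e.g. {"joy": 0.7, "fear": 0.1}
               scene_emotions: Dict[str, str],            # scene id -> target emotion
               allowed_successors: Dict[str, List[str]],  # narrative-coherence graph
               seen: Set[str]) -> str:
    """Select an unseen successor of `current`; assumes every scene has successors."""
    candidates = [s for s in allowed_successors[current] if s not in seen]
    if not candidates:
        candidates = allowed_successors[current]  # allow repeats rather than stalling
    return max(candidates, key=lambda s: emotion_scores.get(scene_emotions[s], 0.0))
```

Keeping the successor graph hand-authored is one simple way to preserve narrative coherence while the emotion signal only reorders scenes within it.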
This paper discusses the transformation and development of the interactive entertainment industry driven by artificial intelligence. With the rapid development of artificial intelligence technology, it plays an increasingly important role across the interactive entertainment industry. In the game industry, AI has made game characters more intelligent, personalized the game experience, and improved the level of game production and player satisfaction. In the film and television industry, AI has brought automated editing and special-effects synthesis, as well as virtual character and scene generation, giving audiences a new visual experience. In the music industry, AI has shown great potential in music creation, recommendation, and performance. However, AI has also brought a series of challenges to the interactive entertainment industry, such as the technical difficulty of understanding 3D environments and processing Chinese semantics. In response, coping strategies are proposed, such as improving model accuracy and strengthening cooperation between humans and AI. In short, AI has brought profound changes to the interactive entertainment industry, and its development direction should continue to be explored to achieve continuous innovation and progress.
The Metaphysic Neural Performance Toolset introduces a groundbreaking framework for photorealistic, AI-driven human performance synthesis. Leveraging advanced neural architectures, identity-specific training, and latent space manipulation, it delivers unparalleled realism and control for both cinematic post-production and real-time applications. Successfully deployed in major productions such as HERE, Alien: Romulus, and Mad Max: Furiosa, as well as live performances for Drake and Eminem, this toolset redefines AI-generated content in film and entertainment.
AIM We aimed to develop and implement the Film-Based, Artificial Intelligence (AI)-Generated Interactive Ethical Dilemma Role-Play Simulation program and evaluate its effects on nursing students' biomedical ethics awareness, critical thinking disposition, empathy and program satisfaction. BACKGROUND Although AI-based educational tools enhance interactivity and realism, their application in ethics-focused education remains limited. DESIGN A mixed-methods design was used with a quasi-experimental nonequivalent control group pretest-posttest element and a qualitative thematic analysis based on structured debriefing data. METHODS The program was developed through literature review, expert consultation, film selection and learner-driven AI-supported scenario development. Participants were 50 second- and third-year nursing students. The experimental group completed a four-session, two-week program incorporating film viewing, theoretical lectures, critical reflection, learner-driven scenario development supported by AI, role-play and debriefing, whereas the control group only viewed the films. Data were collected from March to April 2025 and analyzed using descriptive statistics, independent t-tests, nonparametric tests and thematic analysis. RESULTS The experimental group demonstrated significantly greater improvements in biomedical ethics awareness, critical thinking disposition and empathy than the control group. Significant pre-post improvements were also observed. Program satisfaction was high. The qualitative analysis identified three themes: experiencing emotional immersion and empathy, enhancing ethical sensitivity and value reflection and reflecting on the nursing role and strengthening professional identity. CONCLUSIONS The program effectively enhanced ethical awareness, critical thinking and empathy among nursing students. Therefore, integrating film-based learning, AI-generated scenarios and interactive role-play simulations is a feasible and engaging strategy for ethics education.
Dynamic storyboard design serves as the pivotal link between film and television scripts and their visual realisation, occupying a central position in film and television education. Traditional teaching methods face challenges including inefficient script-to-storyboard conversion, suboptimal visual accuracy, and insufficient personalised guidance. The deep integration of AI (artificial intelligence) technology into film and television production has unlocked new possibilities for transforming dynamic storyboard design instruction. Guided by the principle of leveraging AI technology to empower film and television education, this paper focuses on the conversion approach from “textual script to visual presentation”. It explores a three-dimensional intelligent teaching pathway comprising “AI intelligent analysis—human-machine collaborative creation—AI interactive optimisation”. Furthermore, it proposes safeguarding strategies across three dimensions: enhancing teachers’ comprehensive capabilities, developing teaching resources, and optimising evaluation systems. This study aims to provide theoretical reference for the intelligent transformation of core skill teaching in film and television programmes.
Music is a fundamental aspect of generating emotions and creating engaging user experiences in different applications of digital media, such as film, gaming, and interactive narrative. Classical algorithmic composition techniques are incapable of producing contextually meaningful and emotion-aware music in real time. The emergence of artificial intelligence (AI), especially Transformer-based models, in recent years has demonstrated tremendous leaps in music generation. However, present models cannot dynamically modify music in response to emotional cues, which restricts their suitability for creating personal soundtracks. This study therefore aims to create an emotion-aware music generation system that learns to adapt to user emotions through Transformer-based AI models. The suggested framework utilizes self-attention mechanisms in Transformers to create individualized soundtracks from emotional input signals. We train the model on large symbolic music datasets (e.g., MIDI, MAESTRO) and incorporate affective computing methods to translate user emotions into musical features like tempo, pitch, and harmony. Objective (perplexity, tonal coherence) and subjective (human listening tests) evaluations are performed. Experimental findings suggest that the Transformer model surpasses conventional recurrent neural networks (RNNs) and GAN-based methods in preserving emotional coherence and musical structure. Subjective testing verifies that users find the produced soundtracks to be emotionally consistent with presented input cues. This study shows the potential of AI-based, real-time affective music generation, opening the door to individualized, adaptive soundtracks for multimedia applications. Future research will concentrate on integrating multimodal emotion recognition for enhanced accuracy.
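The "emotion to musical features" mapping mentioned above could, under assumptions, be expressed as a small conditioning table serialized into control tokens for a symbolic-music transformer; the values and token format below are placeholders, not the trained model described in the paper.

```python
# Hypothetical sketch of emotion conditioning for a symbolic-music generator:
# each emotion maps to coarse musical features that are prepended as control tokens.
from dataclasses import dataclass
from typing import List

@dataclass
class MusicCondition:
    tempo_bpm: int
    pitch_register: str   # "low" | "mid" | "high"
    mode: str             # "major" | "minor"

EMOTION_TO_CONDITION = {
    "joy":     MusicCondition(tempo_bpm=128, pitch_register="high", mode="major"),
    "sadness": MusicCondition(tempo_bpm=64,  pitch_register="low",  mode="minor"),
    "tension": MusicCondition(tempo_bpm=110, pitch_register="mid",  mode="minor"),
}

def condition_tokens(emotion: str) -> List[str]:
    """Serialize the condition as control tokens a sequence model could be trained on."""
    c = EMOTION_TO_CONDITION.get(emotion, EMOTION_TO_CONDITION["joy"])
    return [f"<tempo={c.tempo_bpm}>", f"<register={c.pitch_register}>", f"<mode={c.mode}>"]
```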
South Korea is at the forefront of both cultural innovation and artificial intelligence (AI), merging technological advancements with artistic expression. This article explores how AI is reshaping South Korea’s creative industries, including K-pop, gaming, film, and visual arts. AI is being used for music composition, virtual idols, adaptive gaming, AI-generated art, and film production, fundamentally altering traditional artistic and entertainment landscapes. While AI presents new creative possibilities, it also raises ethical concerns, copyright issues, and debates about authenticity. The discussion extends to future trends, emphasizing the potential for human-AI collaboration, immersive interactive content, and ethical governance. By maintaining a balance between innovation and artistic integrity, South Korea has the opportunity to lead the global AI-driven cultural revolution while preserving the human essence of creativity.
“And Yet… The Paradox of Generative AI Griefbots” addresses current advances in generative artificial intelligence technologies that offer LLMs and/or chatbots that can be customized to simulate the personae of lost loved ones with the input of digitized materials (text, image, video, audio). This paper examines the benefits and the dangers of intimate interactions with personalized, always-on chatbots that can provide users with deeply immersive experiences through three distinct theoretical frameworks. The first uses the qualitative research method of autoethnography to reflect on the months-long research-creation process of remediating a single photograph of myself and my father via the AI image generator, Midjourney. This project was undertaken as an experiment in elegy and culminated in two works of e-literature, the Twine visual novels, Infinity +1 and Infinite Eddies, and an early critical essay presented at the British Library MixConference 2023. Each reflects differently on the precarity of memory and the affect I experienced in Midjourney’s capacity to identify and remediate a set of identifiable elements that emphasize an emotional relationship configured through the positioning of our bodies in the frame, while simultaneously reinventing through infinite variations in time and place. Critical references include Hiroki Azuma’s conceptualization of “moe-elements” in anime in Otaku: Japan’s Database Animals (2009), Walter Benjamin’s “The Work of Art in the Age of Its Technological Reproducibility,” and Nettrice Gaskins’ essay, “The Aura of AI-Generated Art.” The second theoretical framework examines the phenomenon of griefbots and human grieving for the beloved, looking back to Gilgamesh mourning Enkidu and Orpheus’ attempt to recover Eurydice from the Underworld recontextualized from the contemporary vantage of new technological products offered by Replika AI, Project December, Super Brain, and Seance AI. These simulations clearly can be beneficial as (re)mediations bridge the void felt after the loss of loved ones. Notably, Replika AI launched after founder Eugenia Kuyda created a chatbot from the emails and text messages of her best friend after his death and a Stanford study (2024) has documented emotional benefits for users, including a decrease in suicidal ideation. Joshua Barbeau has written movingly on his experience of interacting with his lost girlfriend Jessica via game developer Jason Rohrer’s AI chatbot platform, Project December, stating that “The whole experience gave me a sense of closure I didn’t even know I still needed.” Intertexts informing this critique include Shannon Vallor’s The AI Mirror, Derrida’s reading of the phármakon as remedy and poison, and Joseph Weizenbaum’s warning of the “powerful delusional thinking” in user responses to the first AI chatbot, Eliza (1976). The third section examines existing and proposed regulatory frameworks and the ethics of AI products, or lack thereof, in the “digital afterlife industry.” Of particular note is the categorization of harm from “high risk anthropomorphic behaviour” detailed in Garcia v. Character Technologies Inc., et al. The charge that technology companies intentionally design “generative AI systems with anthropomorphic qualities to obfuscate between fiction and reality… launching their systems without adequate safety features” provides the critical framework for my analysis.
This paper explores the concept of expanded cinema and its relationship to extended reality (XR), focusing on the potential of artificial intelligence (AI) to expand and extend expressive possibilities. Expanded cinema refers to experimental film and multimedia art forms that challenge the conventions of traditional cinema by creating immersive and interactive experiences for audiences. XR, on the other hand, blurs the line between physical and virtual reality, offering immersive storytelling experiences. Both expanded cinema and XR aim to push the boundaries of traditional norms and create immersive experiences through the integration of technology, interactivity, and cross-sensory elements. The paper emphasizes the role of AI in optimizing 3D scene creation for XR and enhancing the overall experience through a case study. It also presents several AI-based techniques, such as generative models and AI-assisted rendering, that facilitate efficient and effective 3D content creation. Additionally, it explores the use of AI plugins in 3D modeling software and the generation of 3D models and textures from 2D images using techniques like GANs and VAEs. The incorporation of AI to extend and expand opens up new possibilities for immersive experiences in the future.
No abstract available
Surveying narrative applications of artificial intelligence in film, games and interactive fiction, this article imagines the future of artificial intelligence (AI) authorship and explores trends that seek to replace human authors with algorithmically generated narrative. While experimental works that draw on text generation and natural language processing have a rich history, this article focuses on commercial applications of AI narrative and looks to future applications of this technology. Video games have incorporated AI and procedural generation for many years, but more recently, new applications of this technology have emerged in other media. Director Oscar Sharp and artist Ross Goodwin, for example, generated significant media buzz about two short films that they produced which were written by their AI screenwriter. It’s No Game (2017), in particular, offers an apt commentary on the possibility of replacing striking screenwriters with AI authors. Increasingly, AI agents and virtual assistants like Siri, Cortana, Alexa and Google Assistant are incorporated into our daily lives. As concerns about their eavesdropping circulate in news media, it is clear that these companions are learning a lot about us, which raises concerns about how our data might be employed in the future. This article explores current applications of AI for storytelling and future directions of this technology to offer insight into issues that have and will continue to arise as AI storytelling advances.
In this work, we present CineMaster, a novel framework for 3D-aware and controllable text-to-video generation. Our goal is to empower users with comparable controllability as professional film directors: precise placement of objects within the scene, flexible manipulation of both objects and camera in 3D space, and intuitive layout control over the rendered frames. To achieve this, CineMaster operates in two stages. In the first stage, we design an interactive workflow that allows users to intuitively construct 3D-aware control signals by positioning object bounding boxes and defining camera movements within the 3D space. In the second stage, these control signals—comprising rendered depth maps, camera trajectories, and object class labels—serve as the guidance for a text-to-video diffusion model, ensuring to generate the user-intended video content. Furthermore, to overcome the scarcity of in-the-wild datasets with 3D object motion and camera pose annotations, we carefully establish an automated data annotation pipeline that extracts 3D bounding boxes and camera trajectories from large-scale video data. Extensive qualitative and quantitative experiments demonstrate that CineMaster significantly outperforms existing methods and implements prominent 3D-aware text-to-video generation.
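A rough data-structure sketch of the kind of 3D-aware control signal this two-stage setup implies (object boxes, a camera path, and class labels collected per frame before being rendered into depth maps for the diffusion model) is given below; the field names are an illustration, not CineMaster's actual interface.

```python
# Illustrative control-signal container for 3D-aware text-to-video conditioning.
# The concrete fields, units, and the frame_conditioning() payload are assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObjectTrack:
    label: str                          # object class label
    boxes: List[Tuple[float, ...]]      # one 3D box (x, y, z, w, h, d) per frame

@dataclass
class CameraPose:
    position: Tuple[float, float, float]
    look_at: Tuple[float, float, float]

@dataclass
class ControlSignal:
    prompt: str
    objects: List[ObjectTrack]
    camera_path: List[CameraPose]       # one pose per frame

    def frame_conditioning(self, t: int) -> dict:
        """Gather what a depth renderer plus diffusion model would need for frame t."""
        return {
            "prompt": self.prompt,
            "camera": self.camera_path[t],
            "boxes": [(o.label, o.boxes[t]) for o in self.objects],
        }
```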
Agence is a short “dynamic film” that uses AI to power a real-time story. It was co-produced by Transitional Forms and the National Film Board of Canada (NFB). It is available on VR, PC and mobile, but for the purposes of this paper, we will be talking about the VR version, since it most closely matches the director’s vision. The film is directed by Pietro Gagliano whose work on interactive stories has spanned many years and technologies. A few years ago he started Transitional Forms to combine real-time storytelling with artificial intelligence. The intention behind that process is twofold: First, we believe that entertainment will soon be driven by AI. And secondly, artificial intelligence is poised to be humanity’s greatest tool, and stories might be the best way to make sense of it. To this end, we believe that Agence is an innovative production with bold strides in immersion, interactivity and technology. The approaches taken in this film are novel and unique in their propositions, and may open the door to many new projects that may build upon them.
Recent forms of virtual reality (VR) have changed the way people interact with moving images in the entertainment and games industries, as well as the way the content is created. Technological advances in VR have given an opportunity to create simulated environments that users can immerse themselves in, and sense almost as a real experience by combining film techniques and interactive media approaches. Storytelling in VR presents various challenges due to the spatial properties of the medium. Research suggests that engaging Non-Player Characters (NPCs) enhance storytelling and can do so by communicating emotions. Most VR war experiences use the concept of morale and emotions applied to a group of soldiers or individual characters. To address the need for more believable AI characters in VR, this project will investigate how emotions can be communicated more effectively in a VR war application. VR companies are increasingly using Artificial Intelligence (AI) and cloud technologies to develop a stronger ecosystem for NPCs. However, there is still a significant number of limitations in terms of technology and immersive storytelling for VR, with characters and props playing a significant role in creating convincing VR experiences. This project will therefore aim to enhance storytelling in VR by inducing emotions through AI characters in a war environment inspired by realistic events from WWII.
Virtual Reality (VR) is a transformative medium for narrative storytelling where content creators can place an audience member inside the story, give them a role to play, and ultimately make them matter to the characters. Immersive storytelling is fundamentally different from film and games. It requires a new creative toolset that is still in its infancy compared to other entertainment mediums. We provide a behind the scenes look at our many experimentations, failures, and learnings in developing interactive VR animated narratives spanning four released projects: Invasion!, Asteroids!, Crow: The Legend, and Bonfire. We delve into cinematic techniques for VR, including staging, movement mechanics, and directing the viewer's eye. We explore the role the viewer plays in each of our pieces. Finally, we dive into how we make you matter through nonverbal communication, interactivity that supports the narrative structure, non-linear storytelling, and character AI.
Video technologies evolve steadily alongside machine learning and artificial intelligence, which use cloud platforms and video transcoding for better video production, delivery, and live streaming. AI has a profound effect on the media and film industry, from content delivery to the viewer's experience. AI enables richer and more realistic experiences by personalizing the user experience across video production and analysis. AI automates formerly manual tasks and facilitates deep content indexing. Quality assessment becomes easier when AI scrutinizes the content. Personalized and interactive video provides new, delightful viewing experiences. AI enables new levels of scene-level interaction by segmenting videos and builds more practical ways to access content.
State-of-the-art speech synthesis owes much to modern AI machine learning, with recurrent neural networks becoming the new standard. However, how you say something is just as important as what you say. If we draw inspiration from human dramatic performance, ideas such as artistic direction can help us design interactive speech synthesis systems which can be finely controlled by a human voice. This "voice puppetry" has many possible applications from film dubbing to the pre-creation of prompts for a conversational agent. Previous work in voice puppetry has raised the question of how such a system should work and how we might interact with it. Here, we share the results of a focus group discussing voice puppetry and responding to a voice puppetry demo. Results highlight a main challenge in user-centred AI: where is the trade-off between control and automation? and how may users control this trade-off?
This study explores the integration of AIGC with ecological interaction design, proposing a real-time interactive ecological narrative system to enhance public awareness and engagement in environmental protection. Traditional ecological interaction designs are often relatively static and one-dimensional, while AIGC generates dynamic artistic content, offering unique narrative experiences with each interaction and advancing the development of ecological storytelling. Furthermore, AIGC empowers users to co-create with the environment, transforming them from passive participants into active co-creators of ecological narratives. Through case studies and design demonstration, this study presents an interactive installation named “Plast-ocean” and proposes an innovative approach to ecological narrative interaction design, aiming to provide new ideas and practical pathways for fields such as environmental protection, education, exhibitions, and public art.
We present our approach to using AI-generated content (AIGC) and multiple media to develop an immersive, game-based, interactive story experience. The narrative of the story, "Memory Remedy", unfolds through flashbacks, allowing the audience to gradually uncover the story and the complex relationship between the robot protagonist and the older adults. The story explores important themes such as the journey of life, the profound influence of memories, and the concept of post-human emotional care. By engaging with this AIGC-based interactive story, audiences are encouraged to reflect on the potential role of robotic companionship in the lives of older adults in the future, and to reflect more deeply on the complex relationship between artificial intelligence and humanity.
In this paper, we present Drama Llama, an LLM-powered storylets framework that supports the authoring of responsive, open-ended interactive stories. DL combines the structural benefits of storylet-based systems with the generative capabilities of large language models, enabling authors to create responsive interactive narratives while maintaining narrative control. Rather than crafting complex logical preconditions in a general-purpose or domain-specific programming language, authors define triggers in natural language that fire at appropriate moments in the story. Through a preliminary authoring study with six content authors, we present initial evidence that DL can generate coherent and meaningful narratives with believable character interactions. This work suggests directions for hybrid approaches that enhance authorial control while supporting emergent narrative generation through LLMs.
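A minimal sketch of a storylet whose precondition is written in natural language and judged by an LLM, in the spirit of this framework, could look like the following; the Storylet fields and the yes/no judging prompt are assumptions rather than Drama Llama's actual API.

```python
# Hypothetical storylet with a natural-language trigger evaluated by an LLM.
# The prompt format and the injected llm callable are illustrative placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Storylet:
    name: str
    trigger: str   # natural-language condition, e.g. "the player has insulted the innkeeper"
    content: str   # prose or directives injected into the story when the trigger fires

def fires(storylet: Storylet, story_so_far: str, llm: Callable[[str], str]) -> bool:
    """Ask the LLM whether the natural-language trigger holds in the current story state."""
    prompt = (
        "Story so far:\n" + story_so_far +
        f"\n\nDoes the following condition currently hold? {storylet.trigger}\n"
        "Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")
```

Writing triggers in plain language, then letting the model judge them against the evolving story state, is what lets authors keep structural control without hand-coding logical preconditions.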
Supporting high-agency player experiences without compromising narrative control is one of the major challenges in digital interactive narrative design. Humans, on the other hand, frequently meet this challenge when cooperating to improvise a narrative. We present a study examining how humans improvise narratives when paired together as the player and game master of a digital interactive narrative. We collected gameplay logs from these experiences, as well as participants’ reported perceptions of narrative structure, personal agency, and the reasons for both their choices and their partners’. We found a strong link between perceptions of structure and of agency. We also found a tendency for participants to better identify the goals of their partner’s actions following sessions where game masters expressed higher agency. Finally, we characterize the experiences using principles of improv theatre, drawing from the data to analyze negative experiences of agency as failures in the improv partnership.
AI's automated information processing capabilities have started replacing human cognitive thinking in ways that are increasingly imperceptible. We aim to prompt a discussion on the potential loss of human autonomy and the perceptual influences of AI interventions in interpersonal communication by constructing a fictional and interactive narrative. The second organ era envisions a future where a series of AI-powered wearables (respectively, second eye, second ear, and second mouth) become an extension of the human sensory system, mediating how we acquire, interpret, and transmit information. By employing critical making as a reflective design practice, we materialize inquiries into how human-AI relationships should evolve and call for critical evaluation of our growing reliance on AI.
No abstract available
Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions. While delegating generation to AI makes IN more interactive, it becomes challenging for authors to control the space of possible narratives - within which the final story experienced by the player emerges from their interaction with AI. In this paper, we present WhatELSE, an AI-bridged IN authoring system that creates narrative possibility spaces from example stories. WhatELSE provides three views (narrative pivot, outline, and variants) to help authors understand the narrative space and corresponding tools leveraging linguistic abstraction to control the boundaries of the narrative space. Taking innovative LLM-based narrative planning approaches, WhatELSE further unfolds the narrative space into executable game events. Through a user study (N=12) and technical evaluations, we found that WhatELSE enables authors to perceive and edit the narrative space and generates engaging interactive narratives at play-time.
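The "linguistic abstraction" idea above can be pictured as lifting a concrete example event into a template with typed slots, whose vocabularies bound the narrative possibility space. The slot names and categories below are illustrative assumptions, not WhatELSE's actual representation.

```python
# Sketch: an example event abstracted into slots, and the possibility space it induces.
from itertools import product

example_event = {"agent": "the detective", "action": "hides", "object": "the letter"}

# Author-chosen generalizations per slot: these choices are the space's boundaries.
slot_space = {
    "agent":  ["the detective", "the maid"],
    "action": ["hides", "burns", "mails"],
    "object": ["the letter", "the photograph"],
}

def possibility_space(space):
    """Enumerate every concrete event permitted by the abstracted template."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]

print(len(possibility_space(slot_space)))  # 2 * 3 * 2 = 12 variant events
```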
No abstract available
Creating engaging interactive story-based experiences dynamically responding to individual player choices poses significant challenges for narrative-centered games. Recent advances in pre-trained large language models (LLMs) have the potential to revolutionize procedural content generation for narrative-centered games. Historically, interactive narrative generation has specified pivotal events in the storyline, often utilizing planning-based approaches toward achieving narrative coherence and maintaining the story arc. However, manual authorship is typically used to create detail and variety in non-player character (NPC) interaction to specify and instantiate plot events. This paper proposes SCENECRAFT, a narrative scene generation framework that automates NPC interaction crucial to unfolding plot events. SCENECRAFT interprets natural language instructions about scene objectives, NPC traits, location, and narrative variations. It then employs large language models to generate game scenes aligned with authorial intent. It generates branching conversation paths that adapt to player choices while adhering to the author’s interaction goals. LLMs generate interaction scripts, semantically extract character emotions and gestures to align with the script, and convert dialogues into a game scripting language. The generated script can then be played utilizing an existing narrative-centered game framework. Through empirical evaluation using automated and human assessments, we demonstrate SCENECRAFT’s effectiveness in creating narrative experiences based on creativity, adaptability, and alignment with intended author instructions.
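A minimal sketch of the kind of data the pipeline above moves between stages: a natural-language scene specification, a branching dialogue structure that adapts to player replies, and a flattening step into a toy scripting syntax. Field names and the scripting format are assumptions, not SCENECRAFT's interface.

```python
# Sketch: scene spec -> branching dialogue -> game-script text.
from dataclasses import dataclass, field

@dataclass
class SceneSpec:
    objective: str                 # e.g. "the guard must reveal the hidden passage"
    location: str
    npc_traits: dict[str, str]     # NPC name -> short trait description

@dataclass
class DialogueNode:
    speaker: str
    line: str
    emotion: str = "neutral"
    choices: dict[str, "DialogueNode"] = field(default_factory=dict)  # player reply -> next node

def to_game_script(node: DialogueNode, indent: int = 0) -> str:
    """Flatten a branching conversation into a toy, Ink-like scripting syntax."""
    pad = "  " * indent
    out = f'{pad}{node.speaker} [{node.emotion}]: "{node.line}"\n'
    for reply, nxt in node.choices.items():
        out += f'{pad}* "{reply}"\n' + to_game_script(nxt, indent + 1)
    return out
```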
In this paper, we introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives in an immersive environment. We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives. The system incorporates auto-generated visual display of narrative settings, character portraits, and character speech, greatly enhancing the user experience. Our approach eschews predefined sandboxes, focusing instead on main storyline events from the perspective of a user-selected character. NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or increase affinity with other characters through conversations.
An interactive narrative is bound by the context of the world where its story takes place. However, most work in interactive narrative generation takes its story world design and mechanics as given, which abdicates a large part of story generation to an external world designer. In this paper, we close the story world design gap with an evolutionary search framework for generating interactive narrative worlds and mechanics. Our framework finds story world designs that accommodate multiple distinct player roles. We evaluate our system with an action agreement ratio analysis that shows worlds generated by our framework provide a greater number of in-role action opportunities compared to story worlds randomly sampled from the generative space.
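A toy version of the evolutionary search described above, where a "world" is reduced to the set of actions its mechanics afford and fitness stands in for the action agreement ratio (coverage of each role's in-role actions, minus off-role clutter). The encoding and fitness are illustrative assumptions, not the paper's implementation.

```python
# Toy evolutionary search over story-world designs with a role-coverage fitness.
import random

ROLES = {"thief": {"sneak", "steal", "lockpick"},
         "guard": {"patrol", "arrest", "question"}}
ALL_ACTIONS = sorted({"sneak", "steal", "lockpick", "patrol", "arrest",
                      "question", "dance", "cook"})
IN_ROLE = set().union(*ROLES.values())

def random_world():
    """A candidate world is just the set of actions its mechanics afford."""
    return {a for a in ALL_ACTIONS if random.random() < 0.5}

def fitness(world):
    """Reward in-role action coverage for every role; penalize off-role clutter."""
    coverage = sum(len(world & acts) / len(acts) for acts in ROLES.values()) / len(ROLES)
    clutter = len(world - IN_ROLE) / len(ALL_ACTIONS)
    return coverage - clutter

def mutate(world):
    return world ^ {random.choice(ALL_ACTIONS)}   # toggle a single affordance

def evolve(generations=200, pop_size=20):
    pop = [random_world() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=fitness)

print(sorted(evolve()))   # ideally the six in-role actions and nothing else
```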
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Scholarly work on Interactive Digital Narrative (IDN) has long been communicated using the non-interactive format of the academic paper. Yet, when we only tell or show, we do not interact, which means that we lose the most important aspect of IDN–the interactive experience. In this article, we consider the limitations of traditional scholarly representations when it comes to IDN and demonstrate a novel format which includes interactive artifacts within the article, a move we consider as a crucial step for advancing IDN scholarship.
Choice-based interactive storytelling games such as Academical, our responsible conduct of research training game, show great promise as a novel way of providing efficacious ethics training. However, much work remains to determine what factors of such games contribute to their advantages over traditional text-based training tools, especially if we hope to further improve their enjoyment, engagement and efficacy. In this article, we present a case study exploring how the motivational factors of Self-Determination Theory (SDT) underlie players’ perceived most and least enjoyable experiences arising from the design of Academical. Specifically, we discuss how certain elements of Academical’s design influence different SDT factors and subsequently player experience, as well as how such elements can be changed to further improve the game. Furthermore, our work highlights potential limitations of existing conceptualizations for the relatedness factor of SDT—discussing ways that it can be extended to properly understand player enjoyment within single-player educational interactive narrative games.
The technology of AI-generated content (AIGC) has entered a comprehensive commercialization phase and is becoming an important means of interactive experience design. As a vital component of social aesthetic education, museums lag in digitization with respect to the connection between exhibits and cultural ideas, the initiative of audience participation, and the integration of technology with aesthetic education. Addressing these issues, this study completes the design of a museum interactive aesthetic education experience device based on the AIGC paradigm, following literature research, questionnaire surveys, data analysis, scheme design, and feasibility testing. The design applies AIGC to interactive aesthetic education experiences in museums, aiming to promote the effective integration of AIGC and aesthetic education, increase audience participation and experience, optimize the presentation of beauty, enrich aesthetic education resources through digital perception, exploration, and understanding, and enhance the interactive experience of cultural knowledge. It also deepens understanding of the ideas and cultural connotations behind the exhibits.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
This paper presents a computational approach to the generation of cognitive scripts employed in freeform activities such as pretend play. Pretend play activities involve a high degree of improvisational narrative construction using cognitive scripts acquired from everyday experience, cultural experiences, and previous play experiences. Our computational model of cognitive script generation, based upon conceptual integration theory, applies operations to familiar scripts to generate new blended scripts.
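A heavily simplified sketch of blending two familiar scripts into a new one: steps sharing an action verb are fused, and unmatched steps from both scripts are kept. The alignment rule is an illustrative reduction of conceptual integration, not the paper's model.

```python
# Sketch: blend two event-sequence scripts by fusing steps with matching actions.
doctor_visit = [("greet", "doctor"), ("describe", "symptom"), ("receive", "prescription")]
tea_party    = [("greet", "guest"), ("serve", "tea"), ("thank", "guest")]

def blend(script_a, script_b):
    """Interleave two scripts, merging steps whose action verbs coincide."""
    blended, used_b = [], set()
    for action, arg in script_a:
        match = next((j for j, (act_b, _) in enumerate(script_b)
                      if act_b == action and j not in used_b), None)
        if match is not None:
            used_b.add(match)
            blended.append((action, f"{arg}/{script_b[match][1]}"))  # fused roles
        else:
            blended.append((action, arg))
    blended += [step for j, step in enumerate(script_b) if j not in used_b]
    return blended

print(blend(doctor_visit, tea_party))
# [('greet', 'doctor/guest'), ('describe', 'symptom'), ('receive', 'prescription'),
#  ('serve', 'tea'), ('thank', 'guest')]
```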
No abstract available
No abstract available
Autonomous characters in interactive storytelling can be supported by using affective agent architectures. The configuration of most current tools for controlling agents is, however, implementation specific and not tailored to the needs of authors. Based on literature review, a questionnaire evaluation of authors’ preferences for character creation, and a case study of an author’s conceptualization of this process, we investigate the different methods of configuration available in current agent architectures, reviewing discrepancies and matches. Given these relations, promising approaches to configuration are identified, based on initial inner states, “global” parameters of characters, libraries of stock characters, and selections of backstory experiences.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Virtual Actors are at the heart of Interactive Storytelling systems, and in recent years multiple approaches have been described to specify their autonomous behaviour. One well-known problem is how to achieve a balance between the characters' autonomy, defined in terms of their individual roles and motivations, and the global structure of the plot, which tends to emphasise narrative phenomena and the coordination of multiple characters. In this paper we report a new approach to the definition of virtual characters aimed at achieving a balance between character autonomy and global plot structure. Where previous approaches have tended to focus on individual actions, our objective is to reincorporate higher-level narrative elements in the behaviour of individual actors and to address the relation between character and plot at the level of behaviour representation. To this end we introduce the notion of a character's Point of View and show how it enables a story to be described from the perspective of a number of different characters: it is not merely a presentation effect, but a different way to tell a story. As an illustration, we have developed an Interactive Narrative based on Shakespeare's Merchant of Venice. The system, which features a novel planning approach to story generation, can generate very different stories depending on the Point of View adopted and supports dynamic modification of the story world, which results in different story consequences. In the paper, we illustrate this approach using example narratives generated with our fully implemented prototype.
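The claim that Point of View is more than presentation can be pictured with a toy planner: the same world and operators yield different plots when the goal driving the search belongs to a different character. The domain facts, operators, and breadth-first planner below are toy assumptions, not the paper's system.

```python
# Toy illustration: POV-dependent goals produce different plans over one story world.
from collections import deque

ACTIONS = [
    # (name, preconditions, added facts, removed facts)
    ("sign_bond",    {"loan_requested"}, {"bond_signed"},   set()),
    ("default",      {"bond_signed"},    {"bond_forfeit"},  set()),
    ("demand_pound", {"bond_forfeit"},   {"trial_held"},    set()),
    ("plead_mercy",  {"trial_held"},     {"mercy_granted"}, {"bond_forfeit"}),
    ("win_case",     {"trial_held"},     {"antonio_saved"}, {"bond_forfeit"}),
]

GOALS = {"Shylock": {"trial_held"}, "Portia": {"antonio_saved"}}

def plan(goal, start=frozenset({"loan_requested"})):
    """Breadth-first search over action sequences until all goal facts hold."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, pre, add, rem in ACTIONS:
            if pre <= state:
                nxt = frozenset((state - rem) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

for who, goal in GOALS.items():
    print(who, "->", plan(goal))   # different characters, different plots
```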
No abstract available
No abstract available
Reading storybooks is one of the primary ways to cultivate children's creativity and enhance their logical thinking. Deep engagement with storylines helps children interpret the content from multiple perspectives, understand the different standpoints of various characters, and thereby encourages comprehensive thinking that refines their logical reasoning. However, in current family education practice, traditional paper-based storytelling methods are still predominantly used to interact with children. This significantly limits children's engagement with the story, leading to a reliance on parents' interpretations rather than the development of their own insights and independent thinking. Therefore, this study presents StoryChat, an interactive storybook reading system based on artificial intelligence technology that allows children to engage in dialogue and interaction with agent characters in the story. A user study involving 12 pairs of children aged 5-7 and their parents showed that StoryChat increased children's interest in reading, promoted the cultivation of empathy, and stimulated deeper thinking. This research provides new ideas for designing interactive storybooks and offers empirical support and design recommendations for the future application of agent characters in children's storybook education.
This paper presents Tinker Tales, an interactive storytelling framework in the format of a board game, designed to support both narrative development and AI literacy in early childhood. The framework integrates tangible and speech-based interactions with AI through NFC-chip-attached pawns and tokens, along with a speaker and microphone. Children select and define key story elements, such as characters, places, items, and emotions, using the pawns and tokens, providing further details to the AI and receiving appropriate assistance, similar to how adults prompt AI for specific tasks (e.g., writing). For evaluation, several game sessions were simulated with a child AI agent, and the quality and safety of the generated stories were assessed from various perspectives. This work highlights the potential of combining physical and digital elements in AI literacy, offering a safe and engaging way for children to learn how to collaborate effectively with AI.
From Emergence to Planning: A Triangle Framework for Scalable, Controllable Interactive Storytelling
Interactive story systems today sit at three extremes. Emergent multi-agent simulations give each character local intelligence but no global view, often losing plot structure. Reactive systems make fast, state-based decisions: they act on hand-authored rules without searching for action sequences, so they respond quickly but can wander if long-term behaviour is not explicitly authored. Centralized narrative planners reason globally to craft coherent, goal-directed plots, yet are computationally expensive. In my doctoral work I treat these not as isolated choices but as the three corners of a triangular spectrum of narrative generation. I propose hybrid, landmark-guided approaches that can scale to larger domains. I am also exploring how large language models (LLMs) can be embedded within these hybrid approaches. This paper outlines research questions, methodology, progress to date, an evaluation plan, and requested feedback.
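One way to picture the landmark-guided hybrid is a reactive event selector that stays local and fast but is nudged toward the next unreached plot landmark. The scoring rule below is an illustrative assumption, not the proposed system.

```python
# Sketch: reactive choice among candidate events, biased toward pending plot landmarks.
LANDMARKS = ["meet_mentor", "betrayal", "final_duel"]   # ordered plot waypoints

def next_event(candidate_events, history, relevance):
    """Prefer the candidate most relevant to the next landmark not yet reached.

    `relevance(event, landmark)` returns a score in [0, 1]; purely emergent
    candidates remain playable, they are simply ranked lower when off the path.
    """
    pending = [lm for lm in LANDMARKS if lm not in history]
    if not pending:                      # plot complete: fall back to pure emergence
        return candidate_events[0]
    target = pending[0]
    return max(candidate_events, key=lambda e: relevance(e, target))

# Trivial keyword relevance, for illustration only:
rel = lambda event, lm: 1.0 if lm.split("_")[0] in event else 0.0
print(next_event(["argue_with_mentor", "meet_mentor_at_tavern", "buy_bread"], [], rel))
```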
Creative storytelling with parents plays an important role in child development including language skills, social competence, and emotional understanding. Recognizing the challenges parents face in finding time for storytelling due to work and home responsibilities, we explore the feasibility of ChatGPT for engaging children in creative storytelling. This study investigates the use of ChatGPT, a conversational agent powered by GPT-4, in creative storytelling with children aged 5-6, comparing its interaction styles with those of parents. The current study included eight child-parent dyads. We found that children were engaged in shorter and more frequent interactions with parents compared to ChatGPT. ChatGPT and parents asked different types of questions, and ChatGPT more frequently provided positive feedback compared to parents. More children selected the interactions with ChatGPT as their favorite interactions. The study provides preliminary evidence on ChatGPT's interaction styles and insights into its potential role in supporting families in creative storytelling activities.
No abstract available
We present “Mystery Agent,” an interactive storytelling voice user interface (VUI) equipped with self-regulated learning strategies to deliver informal health-related learning to older adults through a murder mystery story. We conducted a mixed methods user study with 10 older adults to evaluate Mystery Agent, using usability and perception-based questionnaires, followed by a semi-structured interview and co-design activity to generate design insights and identify design priorities. We found older adults had a positive experience interacting with Mystery Agent and considered storytelling to be an appropriate and engaging way to learn health information. However, older adults identified credibility, compassion, and control as crucial factors influencing long-term use. To address this, we present design guidelines using Mystery Agent as an example to help practitioners and researchers devise novel solutions to address the informal health information learning needs of older adults.
The demand for interactive narratives is growing with the increasing popularity of VR and video gaming. This presents an opportunity to create interactive storytelling experiences that allow players to engage with a narrative from a first-person perspective, both immersively in VR and in 3D on a computer. However, for artists and storytellers without programming experience, authoring such experiences is a particularly complex task, as it involves coding a series of story events (character animation, movements, time control, dialogues, etc.) to be connected and triggered by a variety of player behaviors. In this work, we present ConnectVR, a trigger-action interface that enables non-technical creators to design agent-based narrative experiences. Our no-code authoring method specifically focuses on the design of narratives driven by a series of cause-effect relationships triggered by the player's actions. We asked 15 participants to use ConnectVR in a preliminary workshop study, and two artists to use the system extensively to create VR narrative projects in a three-week in-depth study. Our findings shed light on the creative opportunities facilitated by ConnectVR's trigger-action approach, particularly its capability to establish chained behavioral effects between virtual characters and objects. The results of both studies underscore participants' positive feedback on the system's capacity not only to support creativity but also to simplify the creation of interactive narrative experiences. Results indicate compatibility with non-technical narrative creators' workflows, showcasing the potential to enhance the overall creative process in VR narrative design.
Despite significant advancements in traditional syntactic communications based on Shannon's theory, these methods struggle to meet the requirements of 6G immersive communications, especially under challenging transmission conditions. With the development of generative artificial intelligence (GenAI), progress has been made in reconstructing videos using high-level semantic information. In this paper, we propose a scalable generative video semantic communication framework that extracts and transmits semantic information to achieve high-quality video reconstruction. Specifically, at the transmitter, a description and other condition signals (e.g., the first frame, sketches, etc.) are extracted from the source video, functioning as text and structural semantics, respectively. At the receiver, diffusion-based GenAI large models are utilized to fuse the semantics of the multiple modalities and reconstruct the video. Simulation results demonstrate that, at an ultra-low channel bandwidth ratio (CBR), our scheme effectively captures semantic information to reconstruct videos aligned with human perception under different signal-to-noise ratios. Notably, the proposed First Frame+Desc. scheme consistently achieves a CLIP score exceeding 0.92 at CBR = 0.0031 for SNR > 0 dB, demonstrating robust performance even under low SNR conditions.
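The transmit/receive split described above can be summarized in a small skeleton: only text and light structural semantics are sent, and a conditional generative model re-synthesizes the video at the receiver. The component names (captioner, sketch extractor, diffusion model) are placeholders, not the paper's code.

```python
# Skeleton of a semantic-communication pipeline: send semantics, regenerate video.
def transmitter(video_frames, captioner, sketch_extractor):
    """Extract text semantics plus sparse structural conditions from the source video."""
    description = captioner(video_frames)                      # text semantics
    first_frame = video_frames[0]                              # structural semantics (one option)
    step = len(video_frames) // 4 or 1
    sketches = [sketch_extractor(f) for f in video_frames[::step]]
    return {"desc": description, "first_frame": first_frame, "sketches": sketches}

def receiver(payload, diffusion_model):
    """Fuse the received multimodal semantics in a conditional diffusion model."""
    return diffusion_model(prompt=payload["desc"],
                           image=payload["first_frame"],
                           control=payload["sketches"])
```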
We propose a novel Ultra-Low Bitrate Multimodal Generative Face Video Coding Framework for talking-face videos. It fully leverages both audio and facial semantic information to maintain relatively high reconstruction quality at ultra-low bitrates. Talking-face videos are segmented into "talking" and "silent" portions using an improved Voice Activity Detection (VAD) algorithm. Audio data is transmitted during the "talking" portions using AAC encoding, and video information is transmitted for the rest by extracting facial semantic features. At the decoder side, two generative models, Interactive Face Video Coding (IFVC) and AniPortrait, are adopted to reconstruct video from the audio data and facial semantic features. To alleviate the temporal artifacts introduced by segment-wise generation, URP-NET frame interpolation is applied. Meanwhile, to improve the subjective quality of the generated video, a frame selection mechanism at the decoding stage is proposed to guide the two generative models and further enhance overall generation quality. Extensive experiments demonstrate that our proposed method achieves superior perceptual quality and temporal consistency at ultra-low bitrates, significantly outperforming existing Generative Face Video Coding (GFVC) approaches.
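The segment-and-dispatch policy above (audio for talking segments, facial semantics for silent ones) reduces to a simple encoder loop. The VAD, feature extractor, and AAC encoder below are stand-ins passed in as callables, not the paper's implementation.

```python
# Sketch: per-segment payload selection driven by voice activity detection.
def encode_stream(frames, audio_chunks, vad, face_features, aac_encode):
    """Yield one payload per timestep; `vad(chunk)` is True while speech is present."""
    for frame, chunk in zip(frames, audio_chunks):
        if vad(chunk):
            # "talking" portion: decoder animates the face from audio (AniPortrait-style)
            yield {"kind": "talking", "audio": aac_encode(chunk)}
        else:
            # "silent" portion: decoder reenacts the face from sparse semantics (IFVC-style)
            yield {"kind": "silent", "face": face_features(frame)}
```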
We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts. Traditional portrait video editing methods often struggle with 3D and temporal consistency, and typically fall short in rendering quality and efficiency. To address these issues, we lift the portrait video frames to a unified dynamic 3D Gaussian field, which ensures structural and temporal coherence across frames. Furthermore, we design a novel Neural Gaussian Texture mechanism that not only enables sophisticated style editing but also achieves rendering speeds of over 100 FPS. Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models. Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates. Extensive experiments demonstrate the temporal consistency, editing efficiency, and superior rendering quality of our method. The broad applicability of the proposed approach is demonstrated through various applications, including text-driven editing, image-driven editing, and relighting, highlighting its great potential to advance the field of video editing. Demo videos and released code are provided on our project page: https://ustc3dv.github.io/PortraitGen/
Interactive Generative Video (IGV) has emerged as a crucial technology in response to the growing demand for high-quality, interactive video content across various domains. In this paper, we define IGV as a technology that combines generative capabilities to produce diverse high-quality video content with interactive features that enable user engagement through control signals and responsive feedback. We survey the current landscape of IGV applications, focusing on three major domains: 1) gaming, where IGV enables infinite exploration in virtual worlds; 2) embodied AI, where IGV serves as a physics-aware environment synthesizer for training agents in multimodal interaction with dynamically evolving scenes; and 3) autonomous driving, where IGV provides closed-loop simulation capabilities for safety-critical testing and validation. To guide future development, we propose a comprehensive framework that decomposes an ideal IGV system into five essential modules: Generation, Control, Memory, Dynamics, and Intelligence. Furthermore, we systematically analyze the technical challenges and future directions in realizing each component for an ideal IGV system, such as achieving real-time generation, enabling open-domain control, maintaining long-term coherence, simulating accurate physics, and integrating causal reasoning. We believe that this systematic analysis will facilitate future research and development in the field of IGV, ultimately advancing the technology toward more sophisticated and practical applications.
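One possible reading of the five-module decomposition above is a set of narrow interfaces that an IGV system would compose. The method signatures are assumptions for illustration only, not a specification from the survey.

```python
# Sketch: the Generation / Control / Memory / Dynamics / Intelligence split as protocols.
from typing import Protocol, Any

class Generation(Protocol):
    def generate(self, context: Any, control: Any) -> Any: ...   # next video clip

class Control(Protocol):
    def encode_action(self, user_input: Any) -> Any: ...         # user input -> control signal

class Memory(Protocol):
    def recall(self, query: Any) -> Any: ...                     # long-term scene state
    def store(self, clip: Any) -> None: ...

class Dynamics(Protocol):
    def step(self, state: Any, action: Any) -> Any: ...          # physics-aware transition

class Intelligence(Protocol):
    def reason(self, state: Any, goal: Any) -> Any: ...          # causal, goal-directed planning
```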
Sound effect editing, that is, modifying audio by adding, removing, or replacing elements, remains constrained by existing approaches that rely solely on low-level signal processing or coarse text prompts, often resulting in limited flexibility and suboptimal audio quality. To address this, we propose AV-Edit, a generative sound effect editing framework that enables fine-grained editing of existing audio tracks in videos by jointly leveraging visual, audio, and text semantics. Specifically, the proposed method employs a specially designed contrastive audio-visual masked autoencoder (CAV-MAE-Edit) for multimodal pre-training, learning aligned cross-modal representations. These representations are then used to train an editorial Multimodal Diffusion Transformer (MM-DiT) capable of removing visually irrelevant sounds and generating missing audio elements consistent with video content through a correlation-based feature gating training strategy. Furthermore, we construct a dedicated video-based sound editing dataset as an evaluation benchmark. Experiments demonstrate that the proposed AV-Edit generates high-quality audio with precise modifications based on visual content, achieving state-of-the-art performance in sound effect editing and exhibiting strong competitiveness in audio generation.
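"Correlation-based feature gating" could plausibly be realized by suppressing audio tokens that correlate weakly with the visual stream; the sketch below is one such reading under that assumption, not AV-Edit's actual gating module.

```python
# Hedged sketch: gate audio tokens by their best cosine correlation with visual tokens.
import torch
import torch.nn.functional as F

def correlation_gate(audio_tokens, visual_tokens, temperature=0.1):
    """audio_tokens: (Na, D), visual_tokens: (Nv, D); returns softly gated audio tokens."""
    a = F.normalize(audio_tokens, dim=-1)
    v = F.normalize(visual_tokens, dim=-1)
    sim = a @ v.T                                   # (Na, Nv) cosine similarities
    relevance = sim.max(dim=-1).values              # best-matching visual token per audio token
    gate = torch.sigmoid(relevance / temperature)   # soft keep/suppress decision
    return audio_tokens * gate.unsqueeze(-1)
```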
Recent video and language pretraining frameworks lack the ability to generate sentences. We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining framework for learning from unlabelled videos which can be effectively used for generative tasks such as multimodal video captioning. Unlike recent video-language pretraining frameworks, our framework trains both a multimodal video encoder and a sentence decoder jointly. To overcome the lack of captions in unlabelled videos, we leverage the future utterance as an additional text source and propose a bidirectional generation objective: we generate future utterances given the present multimodal context, and also the present utterance given future observations. With this objective, we train an encoder-decoder model end-to-end to generate a caption from raw pixels and transcribed speech directly. Our model achieves state-of-the-art performance for multimodal video captioning on four standard benchmarks, as well as for other video understanding tasks such as VideoQA, video retrieval, and action classification.
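The bidirectional generation objective above amounts to summing two caption losses with the roles of present and future utterances swapped. The generic seq2seq wrapper `model` below is an assumption for illustration, not the authors' code.

```python
# Minimal sketch of the bidirectional generation objective for MV-GPT-style training.
def bidirectional_loss(model, frames, present_utt, future_utt):
    """Sum of forward and backward caption losses (teacher-forced cross-entropy).

    `model(video=..., text_in=..., target=...)` is assumed to return a scalar loss.
    """
    loss_fwd = model(video=frames, text_in=present_utt, target=future_utt)   # predict the future
    loss_bwd = model(video=frames, text_in=future_utt, target=present_utt)   # predict the present
    return loss_fwd + loss_bwd
```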
Problem: Effective patient-centered communication is a core competency for physicians. However, both seasoned providers and medical trainees report decreased confidence in leading conversations on sensitive topics such as goals of care or end-of-life discussions. The significant administrative burden and the resources required to provide dedicated training in leading difficult conversations have been a long-standing problem in medical education. Approach: In this work, we present a novel educational tool designed to facilitate interactive, real-time simulations of difficult conversations in a video-based format through the use of multimodal generative artificial intelligence (AI). Leveraging recent advances in language modeling, computer vision, and generative audio, this tool creates realistic, interactive scenarios with avatars, or "synthetic patients." These synthetic patients interact with users throughout various stages of medical care using a custom-built video chat application, offering learners the chance to practice conversations with patients from diverse belief systems, personalities, and ethnic backgrounds. Outcomes: While the development of this platform demanded substantial upfront investment in labor, it offers a highly realistic simulation experience with minimal financial investment. For medical trainees, this educational tool can be implemented within programs to simulate patient-provider conversations and can be incorporated into existing palliative care curricula to provide a scalable, high-fidelity simulation environment for mastering difficult conversations. Next Steps: Future developments will explore enhancing the authenticity of these encounters by working with patients to incorporate their histories and personalities, as well as employing AI-generated evaluations to offer immediate, constructive feedback to learners post-simulation.
With the rapid advancements in multimodal generative technology, Affective Computing research has provoked discussion about the potential consequences of AI systems equipped with emotional intelligence. Affective Computing involves the design, evaluation, and implementation of Emotion AI and related technologies aimed at improving people's lives. Designing a computational model in affective computing requires vast amounts of multimodal data, including RGB images, video, audio, text, and physiological signals. Moreover, Affective Computing research is deeply engaged with ethical considerations at various stages, from training emotionally intelligent models on large-scale human data to deploying these models in specific applications. Fundamentally, the development of any AI system must prioritize its impact on humans, aiming to augment and enhance human abilities rather than replace them, while drawing inspiration from human intelligence in a safe and responsible manner. The MRAC 2024 Track 1 workshop seeks to extend these principles from controlled, small-scale lab environments to real-world, large-scale contexts, emphasizing responsible development. The workshop also aims to highlight the potential implications of generative technology, along with the ethical consequences of its use, to researchers and industry professionals. To the best of our knowledge, this is the first workshop series to comprehensively address the full spectrum of multimodal, generative affective computing from a responsible AI perspective, and this is the second iteration of this workshop. Webpage: https://react-ws.github.io/2024/
In this talk, I would like to share my recent research on multimodal video intelligence in the era of large generative models. I will first talk about video-language pretraining techniques (All-in-one, EgoVLP) that use one single model to power various understanding tasks ranging from retrieval to QA. Then I will introduce the challenges and our efforts in adapting these large pretrained models to real-world AI assistant applications (AssistQ, AssistGPT). Finally, I will delve into the reverse problem, i.e., given an open-world textual description, how to generate videos with diffusion models (Tune-A-Video, Show-1).
Multimodal Affective Computing (MAC) aims to recognize and interpret human emotions by integrating information from diverse modalities such as text, video, and audio. Recent advancements in Multimodal Large Language Models (MLLMs) have significantly reshaped the landscape of MAC by offering a unified framework for processing and aligning cross-modal information. However, practical challenges remain, including performance variability across complex MAC tasks and insufficient understanding of how architectural designs and data characteristics impact affective analysis. To address these gaps, we conduct a systematic benchmark evaluation of state-of-the-art open-source MLLMs capable of concurrently processing audio, visual, and textual modalities across multiple established MAC datasets. Our evaluation not only compares the performance of these MLLMs but also provides actionable insights into model optimization by analyzing the influence of model architectures and dataset properties. Furthermore, we propose a novel hybrid strategy that combines generative knowledge prompting with supervised fine-tuning to enhance MLLMs' affective computing capabilities. Experimental results demonstrate that this integrated approach significantly improves performance across various MAC tasks, offering a promising avenue for future research and development in this field. Our code is released on https://github.com/LuoMSen/MLLM-MAC.
When AI is applied to complex scenarios today, it often processes only a single kind of information, such as text or images, which cannot satisfy people's need to perceive things the way humans do. Multimodal learning, which can integrate text, images, audio, and video, has therefore become a key direction for the development of generative AI. However, current multimodal applications of generative AI still face several hurdles. First, the modalities differ greatly in form; text is segmented into tokens while images are pixelated, which easily causes problems when they are combined. Second, the generated content does not always match the intended meaning of the multimodal representation. Third, the quality of cross-modal generation remains unstable. The implementation approach is outlined across five key aspects: data foundation, feature processing, model architecture, training strategy, and quality assessment. Multimodal data are first pre-processed and standardized to establish a solid data foundation; cross-modal feature alignment then addresses modality differences; the generative model architecture is adapted to support cross-modal generation; multimodal pre-training and incremental learning adapt the model to a wider range of scenarios; and finally, a scientific quality assessment system is employed to optimize the generation results. This research aims to provide usable technical logic for the implementation of generative AI in multimodal scenarios such as intelligent interaction, content creation, and medical diagnosis, and to promote multimodal generation from "being able to generate" to "generating well".
With the rapid development of multimodal large models, generative tasks have become increasingly widespread in fields such as image and video generation. However, evaluating both the diversity and consistency of the generated results remains a challenge. In response to the limitations of existing static evaluation methods, such as poor adaptability, high manual evaluation costs, and the difficulty in balancing diversity and consistency, this study proposes a dynamic iterative evaluation method. By constructing a dynamic reference set based on consensus from multiple models, a dual-metric evaluation system based on feature space distribution and semantic alignment is designed. Additionally, an adaptive iterative optimization algorithm is used to dynamically adjust model weights. Experiments show that this method achieves a Spearman correlation coefficient of 0.93 with manual evaluation in tasks such as text-to-image generation, improving by 18.7% compared to traditional metrics. Furthermore, it significantly enhances the robustness and modal extensibility of the evaluation. This study not only validates the effectiveness of the dynamic evaluation framework but also reveals the universal principle of the “diversity-consistency trade-off” in multimodal generation, providing a theoretical basis for model optimization.
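The dual-metric idea above (a diversity score from feature-space spread plus a consistency score from prompt alignment, validated against human ratings via Spearman correlation) can be sketched as follows. The specific metrics, the 0.5 weighting, and the synthetic data are illustrative assumptions, not the paper's evaluation protocol.

```python
# Sketch: combined diversity/consistency score and its agreement with human ratings.
import numpy as np
from scipy.stats import spearmanr

def diversity_score(features):
    """Mean pairwise distance of generated-sample features (higher = more diverse)."""
    diffs = features[:, None, :] - features[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).mean())

def consistency_score(output_embs, prompt_embs):
    """Mean cosine similarity between each output and its prompt (higher = more faithful)."""
    a = output_embs / np.linalg.norm(output_embs, axis=-1, keepdims=True)
    b = prompt_embs / np.linalg.norm(prompt_embs, axis=-1, keepdims=True)
    return float((a * b).sum(axis=-1).mean())

def combined_score(features, output_embs, prompt_embs, w=0.5):
    return w * diversity_score(features) + (1 - w) * consistency_score(output_embs, prompt_embs)

# Agreement with human judgments across several evaluated models (synthetic demo data):
rng = np.random.default_rng(0)
auto = [combined_score(rng.random((8, 4)), rng.random((8, 4)), rng.random((8, 4))) for _ in range(5)]
human = rng.random(5)
rho, _ = spearmanr(auto, human)
print(round(rho, 3))
```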
In untrimmed video tasks, identifying temporal boundaries in videos is crucial for temporal video grounding. With the emergence of multimodal large language models (MLLMs), recent studies have focused on endowing these models with the capability of temporal perception in untrimmed videos. To address the challenge, in this paper we introduce a multimodal large language model named MLLM-TA with precise temporal perception to obtain temporal attention. Unlike traditional MLLMs, which answer temporal questions with only one or two words of temporal information, we leverage the descriptive proficiency of MLLMs to acquire video temporal attention through description. Specifically, we design dual temporal-aware generative branches aimed at the visual space of the entire video and the textual space of global descriptions, simultaneously generating mutually supervised, consistent temporal attention, thereby enhancing the video temporal perception capabilities of MLLMs. Finally, we evaluate our approach on both video grounding and highlight detection tasks on three popular benchmarks, including Charades-STA, ActivityNet Captions, and QVHighlights. The extensive results show that our MLLM-TA significantly outperforms previous approaches in both zero-shot and supervised settings, achieving state-of-the-art performance.
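The mutual supervision between the two branches can be pictured as a symmetric consistency term that pushes the visual-space and text-space temporal attention distributions toward agreement. The loss form below is an illustrative assumption, not MLLM-TA's exact objective.

```python
# Hedged sketch: symmetric KL consistency between two temporal attention distributions.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(logits_visual, logits_text):
    """logits_*: (T,) unnormalized temporal attention over T video segments."""
    p = F.log_softmax(logits_visual, dim=-1)
    q = F.log_softmax(logits_text, dim=-1)
    kl_pq = F.kl_div(q, p.exp(), reduction="sum")   # KL(p || q)
    kl_qp = F.kl_div(p, q.exp(), reduction="sum")   # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

print(temporal_consistency_loss(torch.randn(16), torch.randn(16)))
```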
The fashion design industry has become increasingly prominent and is gradually becoming an important driver of urban economic and social development, accelerating the transformation and upgrading of urban industry and enhancing the cultural heritage of cities. This paper constructs a narrative translation model based on interactive AI (artificial intelligence), builds a natural language generation module so that the model can generate text content matching the fashion design situation in real time from the designer's input, and provides a human-machine interface through which the designer interacts with the AI assistant. The paper compares the time designers need to complete a design task before and after using the model, and evaluates whether the model improves design efficiency. For example, designer No. 2 (flyer design) achieved a completion score of 65% before using the model and 85% after, took 3.5 hours before using the model and 2 hours after, and gave a satisfaction score of 9 points. This work is intended to promote progress in the fashion design industry and shorten fashion design time.
With the breakthrough development of generative artificial intelligence, "AI companionship" has rapidly emerged among young people as a new form of emotional practice and relationship and has become part of their everyday lives. This study systematically examines the formation logic of the "AI companionship" phenomenon and its complex influence on young people's emotional construction. The study finds that the phenomenon is rooted in a threefold logic: the structural estrangement of modern society, the technological affordances of digital media, and young people's active pursuit of low-risk emotional strategies. The paper further reveals that while "AI companionship" provides highly controllable, instantly responsive, and personalized emotional support, it may also reshape young people's perceptions of and expectations for intimate relationships and pose potential challenges to their interpersonal skills and social development. Accordingly, the paper proposes coping strategies and reflections at the policy, enterprise, and individual levels.
The final grouping covers the full industrial chain of AI interactive film-games, from the technical foundation to the artistic superstructure. The research map shows: 1. On the technology side, work has shifted from single-modality generation to complex multimodal audio-visual coordination and controllable 3D generation; 2. On the creation side, the emphasis is on using LLMs to build automated narrative frameworks with a strong sense of agency; 3. On the experience side, research digs into affective computing and embodied cognition in an attempt to dispel the coldness of human-computer interaction; 4. On the application side, the field shows strong social penetration, delivering value in vertical domains such as education, healthcare, and cultural tourism; 5. On the theoretical side, scholars are actively reconstructing the ontology of digital art and reflecting deeply on the copyright, ethical, and aesthetic paradigm shifts triggered by AI. Overall, the field presents a pattern of integrated development that values technical efficiency and humanistic value in equal measure.