Multi-Agent Game Characters
Cognitive Architectures and Collaborative Reasoning for LLM-Based Agents
This cluster examines how the logical reasoning, memory management, and natural language capabilities of large language models (LLMs) can be harnessed to build multi-agent systems. Key themes include character consistency, task planning, intention propagation, and improving system robustness and collaborative efficiency through reflection mechanisms and self-organizing architectures.
- Reinforce LLM Reasoning through Multi-Agent Reflection(Yurun Yuan, Tengyang Xie, 2025, ArXiv)
- MIRIX: Multi-Agent Memory System for LLM-Based Agents(Yu Wang, Xi Chen, 2025, ArXiv)
- AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing(Zhenhua Xu, Dongsheng Chen, Shuo Wang, Jian Li, Chengjie Wang, Meng Han, Yabiao Wang, 2026, ArXiv)
- Role-Specific Reward Design with Large Language Model for StarCraft II(Sijia Li, Haonan Lou, Xu Zhang, Xin Zeng, Zhixuan Shen, Tianrui Li, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks(Heng Zhou, Hejia Geng, Xiangyuan Xue, Zhenfei Yin, Lei Bai, 2025, No journal)
- BlockAgents: Towards Byzantine-Robust LLM-Based Multi-Agent Coordination via Blockchain(Bei Chen, Gaolei Li, Xi Lin, Zheng Wang, Jianhua Li, 2024, Proceedings of the ACM Turing Award Celebration Conference - China 2024)
- Multi-Agent Collaboration via Evolving Orchestration(Yufan Dang, Cheng Qian, Xu Luo, Jingru Fan, Zihao Xie, Ruijie Shi, Weize Chen, Cheng Yang, Xiaoyin Che, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun, 2025, ArXiv)
- Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games(Dekun Wu, Haochen Shi, Zhiyuan Sun, Bang Liu, 2023, No journal)
- YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning(Zhuang Yuan, Yi Shen, Zhili Zhang, Yuxiao Chen, Fei Miao, 2024, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems(Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhen-fei Yin, Siheng Chen, Jing Shao, 2025, ArXiv)
- Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent(Xiaoyan Yu, Tongxu Luo, Yifan Wei, Fangyu Lei, Yiming Huang, Peng Hao, Liehuang Zhu, 2024, No journal)
- Hybrid Voting-Based Task Assignment in Role-Playing Games(Daniel Weiner, Raj Korpan, 2025, ArXiv)
- Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models(Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi, 2024, ArXiv)
- LLM Multi-agent Decision Optimization(J. Curtò, I. D. Zarzà, C. Calafate, 2024, No journal)
- AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration(Bo Pan, Jiaying Lu, Ke Wang, Li Zheng, Zhen Wen, Yingchaojie Feng, Minfeng Zhu, Wei Chen, 2024, ArXiv)
Coordination, Communication, and Game-Theoretic Optimization in Multi-Agent Reinforcement Learning (MARL)
This cluster focuses on reinforcement learning approaches to cooperation challenges in multi-agent environments. Topics include communication via graph neural networks (GNNs), value decomposition, mutual information regularization, evolutionary optimization, and strategy balancing and opponent modeling in imperfect-information games such as MOBAs and soccer.
- Enhancing Graph-based Coordination with Evolutionary Algorithms for Episodic Multi-agent Reinforcement Learning(Kexing Peng, Pengyi Li, Jianye Hao, 2025, No journal)
- Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach(Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, Guoliang Fan, 2024, No journal)
- Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents(Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch, 2019, ArXiv)
- Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers(Lei Yuan, Zifei Zhang, Ke Xue, Hao Yin, F. Chen, Cong Guan, Lihe Li, Chao Qian, Yang Yu, 2023, No journal)
- Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning(Wei Duan, Jie Lu, Junyu Xuan, 2024, ArXiv)
- ACORN: Acyclic Coordination with Reachability Network to Reduce Communication Redundancy in Multi-Agent Systems(Yi Xie, Ziqing Zhou, Chun Ouyang, Siao Liu, Linqiang Hu, Zhongxue Gan, 2025, No journal)
- Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data(Fuxiang Zhang, Chengxing Jia, Yi-Chen Li, Lei Yuan, Yang Yu, Zongzhang Zhang, 2023, No journal)
- Towards General Cooperative Game Playing(J. Marinheiro, Henrique Lopes Cardoso, 2018, Trans. Comput. Collect. Intell.)
- A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning(Woojun Kim, Whiyoung Jung, Myungsik Cho, Young-Jin Sung, 2023, ArXiv)
- FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning(Wenzhe Li, Zihan Ding, Seth Karten, Chi Jin, 2024, ArXiv)
- Collaborative museum heist with reinforcement learning(Eleni Evripidou, A. Aristidou, Panayiotis Charalambous, 2023, Computer Animation and Virtual Worlds)
- Learning Pre-Trained Tacit Behavior for Efficient Multi-Agent Adversarial Coordination(Shiqing Yao, Jiajun Chai, Haixin Yu, Yongzhe Chang, Yuanheng Zhu, Xueqian Wang, 2025, No journal)
- Balancing Intransitive Relationships in MOBA Games Using Deep Reinforcement Learning(Conor Stephens, Chris Exton, 2020, Proceedings of the 14th International Conference on Interfaces and Human Computer Interaction 2020 and 13th International Conference on Game and Entertainment Technologies 2020)
- A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem(Paul Barde, J. Foerster, D. Nowrouzezahrai, Amy Zhang, 2023, ArXiv)
- Team formation through an assessor: choosing MARL agents in pursuit–evasion games(Yue Zhao, Lushan Ju, J. Hernández-Orallo, 2024, Complex & Intelligent Systems)
Social Intelligence: Believable Character Modeling, Emotional Interaction, and Moral Reasoning
This cluster centers on improving the believability and psychological depth of NPCs. It covers psychological models (e.g., HEXACO), moral judgment, social conflict modeling, emotion synthesis, and neuroscientific measurements of human players' responses to AI characters, with the aim of deepening narrative immersion and the sense of teamwork in human-agent collaboration.
- Agent communication for believable human-like interactions between virtual characters(J. V. Oijen, F. Dignum, 2012, No journal)
- Modeling believable game characters(Hanneke Kersjes, P. Spronck, 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG))
- Perspective-Taking of Non-Player Characters in Prosocial Virtual Reality Games: Effects on Closeness, Empathy, and Game Immersion(Jeffrey C. F. Ho, Ryan Ng, 2020, Behaviour & Information Technology)
- Modeling Morality-Based Argumentation for Believable Game Characters: A Design Postmortem(Rehaf Aljammaz, Michael Mateas, Noah Wardrip-Fruin, 2023, No journal)
- Smells Like Team Spirit: Investigating the Player Experience with Multiple Interlocutors in a VR Game(Nima Zargham, Michael Bonfert, Georg Volkmar, R. Porzel, R. Malaka, 2020, Extended Abstracts of the 2020 Annual Symposium on Computer-Human Interaction in Play)
- Dyadic cooperation with human and artificial agents: Event-related potentials trace dynamic role taking during an interactive game.(Karl-Philipp Flösch, Tobias Flaisch, Martin A. Imhof, H. Schupp, 2023, Psychophysiology)
- Multimodal emotion estimation and emotional synthesize for interaction virtual agent(Minghao Yang, J. Tao, Hao Li, Kaihui Mu, 2012, 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems)
- Playing with Social and Emotional Game Companions(Andry Chowanda, Martin Flintham, P. Blanchfield, M. Valstar, 2016, No journal)
- Emotional input for character-based interactive storytelling(M. Cavazza, D. Pizzi, Fred Charles, Thurid Vogt, E. André, 2009, No journal)
- Creating adaptive affective autonomous NPCs(M. Lim, João Dias, R. Aylett, Ana Paiva, 2010, Autonomous Agents and Multi-Agent Systems)
- Computational Models of Emotion, Personality, and Social Relationships for Interactions in Games: (Extended Abstract)(Andry Chowanda, P. Blanchfield, Martin Flintham, M. Valstar, 2016, No journal)
- Towards Simulated Morality Systems: Role-Playing Games as Artificial Societies(Joan Casas-Roma, M. Nelson, J. Arnedo-Moreno, S. Gaudl, Rob Saunders, 2019, No journal)
- Capturing and generating social behavior with the restaurant game(Jeff Orkin, D. Roy, 2010, No journal)
- Affect-Aware Agents for Emergent Social Conflict in Games(Weilun Deng, 2025, Applied and Computational Engineering)
- Meta-evaluating the Effects of Social Preferences on NPC-evaluators in an Energy Community Game(Andrés Isaza-Giraldo, Paulo Bala, Anna Jiskrová, Luiz Sachser, Pedro F. Campos, Lucas Pereira, 2025, Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust(Amogh Mannekote, Adam Davies, Guohao Li, K. Boyer, Chengxiang Zhai, Bonnie J. Dorr, Francesco Pinto, 2025, ArXiv)
Embodied Intelligence, Physics-Based Animation, and Multimodal Behavior Generation
This cluster concerns how agents perform in physical or visually complex environments, including action decision-making with vision-language models (VLMs), limb coordination via generative adversarial imitation learning (GAIL), and physically coupled multi-agent motion such as cooperative object carrying.
- MAAIP(Mohamed Younes, Ewa Kijak, R. Kulpa, Simon Malinowski, Franck Multon, 2023, Proceedings of the ACM on Computer Graphics and Interactive Techniques)
- CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics(Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang, 2024, ArXiv)
- Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case(Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng, 2024, ArXiv)
- Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning(Isadora White, Kolby Nottingham, Ayush Maniar, Max Robinson, H. Lillemark, M. Maheshwari, Lianhui Qin, Prithviraj Ammanabrolu, 2025, ArXiv)
- Facilitating Video Story Interaction with Multi-Agent Collaborative System(Yiwen Zhang, Jianing Hao, Zhan Wang, Hongling Sheng, Wei Zeng, 2025, ArXiv)
- Behavior NPC Prediction Using Deep Learning(A Y Maulana, Supeno Mardi, E. M. Yuniarno, Y. Suprapto, 2022, 2022 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM))
- Multimodal emotion estimation and emotional synthesize for interaction virtual agent(Minghao Yang, J. Tao, Hao Li, Kaihui Mu, 2012, 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems)
System Architectures, Narrative Control, and Serious Game Applications
This cluster covers the engineering foundations of multi-agent systems, including BDI models, finite state machines (FSMs), holonic architectures, and game engine middleware. It also examines director-control systems for interactive storytelling and serious game applications in education, healthcare, and safety training.
- Implementation of Finite State Machine Models on the Artificial Intelligence System of Characters in The Game "MMORPG" using RPG Maker(Tengku Syahdina, M. Pardede, Fuzy Yustika Manik, 2023, Journal of Artificial Intelligence and Engineering Applications (JAIEA))
- Amorphous Fortress Online: Collaboratively Designing Open-Ended Multi-Agent AI and Game Environments(M. Charity, Mayu Wilson, Steven Lee, Dipika Rajesh, Sam Earle, Julian Togelius, 2025, ArXiv)
- Multi-agent system for managing a game settlement with an expert-based behavior selection system for game characters(A.A. Yarovyi, I.R. Arseniuk, A. Kozlovskyi, D.P. Palamarchuk, O.O. Korolenko, 2026, Optoelectronic Information-Power Technologies)
- A BDI Game Master Agent for Computer Role-Playing Games(Bao Vo Luong, John Thangarajah, Fabio Zambetta, M. Hasan, 2017, Computers in Entertainment (CIE))
- Scripting embodied agents behaviour with CML: character markup language(Yasmine Arafa, E. Mamdani, 2003, No journal)
- Agents for games and simulations(F. Dignum, 2012, Autonomous Agents and Multi-Agent Systems)
- Implementation of a Holonic Multi-agent System in Mixed or Augmented Reality for Large Scale Interactions(Dani Manjah, K. Hagihara, G. Basso, B. Macq, 2020, No journal)
- A MultiAgent Architecture for Collaborative Serious Game applied to Crisis Management Training: Improving Adaptability of Non Player Characters(M'hammed Ali Oulhaci, Erwan Tranvouez, S. Fournier, B. Espinasse, 2014, EAI Endorsed Trans. Serious Games)
- Constella: Supporting Storywriters’ Interconnected Character Creation through LLM-based Multi-Agents(Syemin Park, Soobin Park, Youn-kyung Lim, 2025, ACM Transactions on Computer-Human Interaction)
- Evaluating directorial control in a character-centric interactive narrative framework(Mei Si, S. Marsella, David V. Pynadath, 2010, No journal)
- Teaching STEM through a Role-Playing Serious Game and Intelligent Pedagogical Agents(A. Terracina, Riccardo Berta, F. Bordini, R. Damilano, Massimo Mecella, 2016, 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT))
- Multi-Agent Interactive Game Simulation for Enhancing Child Safety(C. Valliyammai, Keziah Jennie, K. Rameshbabu, D. P. Prshanth, 2024, 2024 2nd International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS))
- Spyke3D: A new computer games oriented BDI Agent Framework(Luca Palazzo, Gianluca Dolcini, A. Claudi, Gianluigi Biancucci, Paolo Sernani, L. Ippoliti, Lorenzo Salladini, A. Dragoni, 2013, Proceedings of CGAMES'2013 USA)
Evaluation Benchmarks, Development Tools, and Interaction Prototyping
This cluster provides supporting infrastructure for multi-agent research: benchmarks for zero-shot coordination (ZSC) and role-playing ability, rapid prototyping tools such as PaintBoard, and Wizard-of-Oz frameworks for human-in-the-loop experiments.
- FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline(Haotian Wu, Shufan Jiang, Mingyu Chen, Yiyang Feng, Hehai Lin, Heqing Zou, Yao Shu, Chengwei Qin, 2025, ArXiv)
- ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination(Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang, 2023, Advances in Neural Information Processing Systems 37)
- Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models(Yuanzhi Liang, Linchao Zhu, Yezhou Yang, 2023, ArXiv)
- LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models(Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang, 2023, No journal)
- Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning(Hangtian Jia, Yujing Hu, Yingfeng Chen, Chunxu Ren, Tangjie Lv, Changjie Fan, Chongjie Zhang, 2020, ArXiv)
- QuickWoZ: a multi-purpose wizard-of-oz framework for experiments with embodied conversational agents(Jan David Smeddinck, K. Wajda, A. Naveed, Leen Touma, Yuting Chen, Muhammad Abu Hasan, Muhammad Waqas Latif, R. Porzel, 2010, No journal)
- PaintBoard: prototyping interactive character behaviors by digitally painting storyboards(Daniel J. Rea, T. Igarashi, J. Young, 2014, Proceedings of the second international conference on Human-agent interaction)
- Tigrito: a multi-mode interactive improvisational agent(Heidy Maldonado, A. Picard, Patrick Doyle, B. Hayes-Roth, 1998, No journal)
- From Text to Tactic: Evaluating LLMs Playing the Game of Avalon(Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu, 2023, ArXiv)
Intention Recognition and Strategy Modeling in Specific Games
This cluster collects behavior models built for specific game genres, such as tank battles, board and card games, and traffic simulation. The emphasis is on predicting opponent behavior, recognizing player intent, mitigating negative side effects of cooperation, and optimizing decision logic for particular scenarios.
- Explaining and Predicting the Behavior of BDI-Based Agents in Role-Playing Games(M. Sindlar, M. Dastani, F. Dignum, J. Meyer, 2009, No journal)
- Reinforcement Learning Agents Playing Ticket to Ride–A Complex Imperfect Information Board Game With Delayed Rewards(Shuo Yang, M. Barlow, Thomas Townsend, Xuejie Liu, Dilini Samarasinghe, E. Lakshika, Glenn Moy, Timothy Lynar, Benjamin P. Turnbull, 2023, IEEE Access)
- Towards a multi-agent non-player character road network: a Reinforcement Learning approach(Stela Makri, Panayiotis Charalambous, 2021, 2021 IEEE Conference on Games (CoG))
- Combine Intent Recognition with Behavior Modeling in Teaching Competition Military Simulation Platform(Yi Zhang, Shuilin Li, Chuan Ai, Yong Peng, Kai Xu, 2024, No journal)
- Reinforcement Learning-Based Autonomous Soccer Agents: A Study in Multi-Agent Coordination and Strategy Development(Biplov Paneru, B. Paneru, Ramhari Poudyal, K. Poudyal, 2025, Buana Information Technology and Computer Sciences (BIT and CS))
- Minimizing Negative Side Effects in Cooperative Multi-Agent Systems using Distributed Coordination(Moumita Choudhury, Sandhya Saisubramanian, Hao Zhang, S. Zilberstein, 2024, No journal)
- AIP: Adversarial Interaction Priors for Multi-Agent Physics-based Character Control(Mohamed Younes, Ewa Kijak, Simon Malinowski, R. Kulpa, F. Multon, 2022, SIGGRAPH Asia 2022 Posters)
- Dungeons & Replicants II: Automated Game Balancing Across Multiple Difficulty Dimensions via Deep Player Behavior Modeling(Johannes Pfau, Antonios Liapis, Georgios N. Yannakakis, R. Malaka, 2023, IEEE Transactions on Games)
- Design of a Decision Maker Agent for a Distributed Role Playing Game - Experience of the SimParc Project(Jean-Pierre Briot, Alessandro Sordoni, Eurico Vasconcelos Filho, M. Irving, Gustavo Melo, Vinícius Sebba-Patto, Isabelle Alvarez, 2009, No journal)
This report synthesizes the state of research on multi-agent game characters, tracing the field's shift from traditional rule-based architectures to systems driven by deep reinforcement learning and large language models. At the algorithmic level, MARL research has delivered more efficient communication and coordination; at the cognitive level, LLM frameworks contribute long-term memory and logical reasoning; and at the socio-psychological level, studies probe characters' morality, emotions, and believability. Meanwhile, work on embodied intelligence and multimodal interaction is moving NPCs from simple scripted control toward rich physical and visual perception, and maturing evaluation benchmarks and prototyping toolchains give the field rigorous yardsticks. Together, these threads push game characters toward greater intelligence, sociality, and human-likeness.
A total of 118 related publications.
This article presents a novel approach to modeling the intelligent behavior of game characters through a multi-agent system integrated with an expert system for behavior selection. By incorporating this system, game characters acquire the ability to adapt their behavior according to their individual traits and interactions with other characters. To model personality traits, several well-known psychological frameworks were considered, including FFM (The Five Factor Model), HEXACO, and AD (Affiliation and Dominance Model). After comparing the models, a combination of HEXACO and AD was chosen, as this approach allows for detailed modeling of both individual game character traits and their interpersonal relationships. To select the appropriate behavior for a game character, a scoring system was developed that evaluates behavior templates based on input data: the character’s mood, personality traits, relationship types, and knowledge about the behavior of other game characters. This data is used to calculate a total score for each behavior template, determining the character's final action. The calculation process is performed using compound behavior matching matrices that align templates with character traits. The introduction of a random deviation ensures variability in game character behavior and prevents deterministic outcomes. The scoring system is formalized as a mathematical model that describes the influence of each factor on behavior selection through scoring functions and corresponding weighting coefficients. To test the expert system, a game prototype was developed on the Unity platform, where game characters perform tasks to maintain the settlement's functionality. They independently choose tasks based on the current environment state and interact with one another according to behavior templates selected by the expert system. 
The proposed approach enables the creation of a dynamic game environment with unpredictable character actions, determined by their personality traits and interpersonal relationships. This opens new opportunities for improving behavior systems in games.
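The weighted scoring mechanism described above can be sketched as follows. The template names, factor weights, and random-deviation range are illustrative assumptions, not values from the paper; the paper's compound behavior matching matrices are reduced here to precomputed per-factor fit scores.

```python
import random

# Hypothetical factor weights for mood, personality traits, relationship
# type, and knowledge about other characters (illustrative values).
WEIGHTS = {"mood": 0.3, "traits": 0.4, "relationship": 0.2, "knowledge": 0.1}

def total_score(template, deviation=0.1, rng=random):
    """Weighted sum of per-factor fit scores, plus a small random
    deviation so behavior selection is not fully deterministic."""
    base = sum(WEIGHTS[f] * template["fit"][f] for f in WEIGHTS)
    return base + rng.uniform(-deviation, deviation)

def choose_behavior(templates, deviation=0.1, rng=random):
    """Select the behavior template with the highest total score."""
    return max(templates, key=lambda t: total_score(t, deviation, rng))

# Illustrative behavior templates with precomputed fit scores per factor.
templates = [
    {"name": "gather_food", "fit": {"mood": 0.8, "traits": 0.6, "relationship": 0.5, "knowledge": 0.4}},
    {"name": "guard_gate",  "fit": {"mood": 0.2, "traits": 0.9, "relationship": 0.3, "knowledge": 0.7}},
    {"name": "socialize",   "fit": {"mood": 0.9, "traits": 0.4, "relationship": 0.9, "knowledge": 0.2}},
]

print(choose_behavior(templates, deviation=0.0)["name"])  # -> socialize
```

With the deviation set to zero the choice is deterministic; a nonzero deviation lets closely scored templates (here, `socialize` at 0.63 vs. `gather_food` at 0.62) occasionally swap, producing the behavioral variability the paper aims for.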
Amorphous Fortress Online: Collaboratively Designing Open-Ended Multi-Agent AI and Game Environments
This work introduces Amorphous Fortress Online -- a web-based platform where users can design petri-dish-like environments and games consisting of multi-agent AI characters. Users can play, create, and share artificial life and game environments made up of microscopic but transparent finite-state machine agents that interact with each other. The website features multiple interactive editors and accessible settings to view the multi-agent interactions directly from the browser. This system serves to provide a database of thematically diverse AI and game environments that use the emergent behaviors of simple AI agents.
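The "microscopic but transparent finite-state machine agents" described above can be illustrated with a minimal FSM agent. The states, events, and transitions below are invented for illustration and are not taken from the Amorphous Fortress system.

```python
class FSMAgent:
    """A tiny finite-state-machine agent: a current state plus a
    transition table mapping (state, event) pairs to next states."""

    def __init__(self, transitions, state="idle"):
        self.state = state
        self.transitions = transitions  # (state, event) -> next state

    def on(self, event):
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state

# Illustrative creature that chases food, eats it, then returns to idle.
critter = FSMAgent({
    ("idle", "sees_food"): "chase",
    ("chase", "touches_food"): "eat",
    ("eat", "food_gone"): "idle",
})
critter.on("sees_food")  # -> "chase"
```

Emergent behavior in such environments comes from many of these simple agents reading each other's states as events, not from any complexity inside a single machine.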
The proposed multi-agent interactive game simulation combines immersive storytelling with state-of-the-art language models and artificial intelligence (AI) characters that help children uncover the mysteries behind derelict buildings. The simulated personalities provide personalized advice, equipping these little explorers with knowledge and safety awareness within an exciting virtual environment. Simulated dialogues between a child and the different characters found inside a dilapidated structure are generated using fine-tuned versions of a Large Language Model (LLM). Each avatar in the game offers accurate guidance or encouragement to the child in its own unique way. The AI system is deployed on a server that allows seamless real-time interaction between the child and the virtual characters. This use of modern technology helps children make wise choices while remaining safe in urban places. The proposed multi-agent interactive game simulation thus enhances child safety in derelict buildings by mimicking gameplay.
Collaboration is ubiquitous and essential in day-to-day life -- from exchanging ideas, to delegating tasks, to generating plans together. This work studies how LLMs can adaptively collaborate to perform complex embodied reasoning tasks. To this end we introduce MINDcraft, an easily extensible platform built to enable LLM agents to control characters in the open-world game of Minecraft; and MineCollab, a benchmark to test the different dimensions of embodied and collaborative reasoning. An experimental study finds that the primary bottleneck in collaborating effectively for current state-of-the-art agents is efficient natural language communication, with agent performance dropping as much as 15% when they are required to communicate detailed task completion plans. We conclude that existing LLM agents are ill-optimized for multi-agent collaboration, especially in embodied scenarios, and highlight the need to employ methods beyond in-context and imitation learning. Our website can be found here: https://mindcraft-minecollab.github.io/
The development of deep reinforcement learning (DRL) has benefited from the emergence of a variety of game environments in which new challenging problems are proposed and new algorithms can be tested safely and quickly, such as board games, RTS, FPS, and MOBA games. However, many existing environments lack complexity and flexibility and assume that actions are executed synchronously in multi-agent settings, which limits their value. We introduce the Fever Basketball game, a novel reinforcement learning environment where agents are trained to play basketball. It is a complex and challenging environment that supports multiple characters, multiple positions, and both single-agent and multi-agent player control modes. In addition, to better simulate real-world basketball games, the execution time of actions differs among players, which makes Fever Basketball a novel asynchronized environment. We evaluate commonly used multi-agent algorithms for both independent learners and joint-action learners in three game scenarios of varying difficulty, and heuristically propose two baseline methods to diminish the extra non-stationarity introduced by asynchronism in Fever Basketball benchmarks. Besides, we propose an integrated curricula training (ICT) framework to better handle Fever Basketball problems, which includes several game-rule-based cascading curricula learners and a coordination curricula switcher focused on enhancing coordination within the team. The results show that the game remains challenging and can serve as a benchmark environment for studies of long time horizons, sparse rewards, credit assignment, non-stationarity, and related problems in multi-agent settings.
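The asynchronism described above, where each agent's action occupies a different amount of time, can be illustrated with a small event-queue loop. The agents, action durations, and policies below are invented for illustration; real asynchronized environments like Fever Basketball are far richer.

```python
import heapq

def run(agents, horizon):
    """Simulate agents whose actions take different durations.
    agents: dict name -> (policy, action_duration).
    Returns a log of (time, agent, action) decision events."""
    queue = [(0.0, name) for name in agents]  # (next decision time, agent)
    heapq.heapify(queue)
    log = []
    while queue:
        t, name = heapq.heappop(queue)
        if t >= horizon:
            continue  # past the horizon: drop without rescheduling
        policy, duration = agents[name]
        log.append((t, name, policy(t)))
        # The agent is busy until t + duration, so its next decision
        # point is misaligned with the other agents' decision points.
        heapq.heappush(queue, (t + duration, name))
    return log

# Illustrative agents: shooting takes longer than defending.
agents = {
    "shooter":  (lambda t: "shoot",  1.5),
    "defender": (lambda t: "defend", 1.0),
}
log = run(agents, horizon=3.0)
```

Because decision points drift apart (the defender acts at t = 0, 1, 2 while the shooter acts at t = 0, 1.5), each agent observes teammates mid-action, which is the extra non-stationarity the abstract's baseline methods aim to diminish.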
Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at https://sites.google.com/view/fightladder/home.
This paper presents MultiStyle, a multi-agent centralized heuristic search planner that incorporates distinct agent playstyles to generate solution plans where characters express individual preferences while cooperating to reach a goal. We include algorithmic details, an example domain, and multiple different solution plans generated with unique agent playstyle sets. We discuss our intent to incorporate this planner in a tool for game level designers to help them anticipate and understand how teams of players with distinct playstyles may play through their levels. Ultimately, MultiStyle generates solution plans with a novel and increased expressive range by attempting to satisfy sets of action and proposition preferences for each agent.
Text-adventure games and text role-playing games are grand challenges for reinforcement learning game playing agents. Text role-playing games are open-ended environments where an agent must faithfully play a particular character. We consider the distinction between characters and actors, where an actor agent has the ability to play multiple characters. We present a framework we call a thespian agent that can learn to emulate multiple characters along with a soft prompt that can be used to direct it as to which character to play at any time. We further describe an attention mechanism that allows the agent to learn new characters that are based on previously learned characters in a few-shot fashion. We show that our agent outperforms the state of the art agent framework in multi-character learning and few-shot learning.
Creating detailed and interactive game environments is an area of great importance in the video game industry. This includes creating realistic Non-Player Characters that respond seamlessly to the player's actions. Machine learning has made great contributions to the area, overcoming the scalability and robustness shortcomings of hand-scripted models. We introduce the early results of a reinforcement learning approach to building a simulation environment for heterogeneous, multi-agent non-player characters in a dynamic road-network game scene.
Recent advancements in natural language and Large Language Models (LLMs) have enabled AI agents to simulate human-like interactions within virtual worlds. However, these interactions still face limitations in complexity and flexibility, particularly in scenarios involving multiple characters and novel objects. Pre-defining all interactable objects in the agent's world model presents challenges, and conveying implicit intentions to multiple characters through complex interactions remains difficult. To address these issues, we propose integrating virtual Game Masters (GMs) into the agent's world model, drawing inspiration from Tabletop Role-Playing Games (TRPGs). GMs play a crucial role in overseeing information, estimating players' intentions, providing environment descriptions, and offering feedback, compensating for current world model deficiencies. To facilitate future explorations for complex interactions, we introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation (MOE) task and a supporting dataset. MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions. Besides, the dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations. Finally, we present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding. We hope that our dataset and task will inspire further research in complex interactions with natural language, fostering the development of more advanced AI agents.
Serious Games (SGs) are increasingly used for training, as in the crisis management domain, where several hundred stakeholders can be involved, causing various organizational difficulties in field exercises. SGs' specific benefits include player immersion and detailed tracking of players' actions during a virtual exercise. Moreover, Non-Player Characters (NPCs) can adapt the crisis management exercise perimeter to the available stakeholders or to specific training objectives. In this paper we present a Multi-Agent System architecture supporting behavioral simulation as well as monitoring and assessment of human players. An NPC is enacted by a Game Agent that reproduces the behavior of a human actor, based on a deliberative (Belief-Desire-Intention) model. To facilitate scenario design, an agent editor allows a designer to configure agents' behaviors. The behavior simulation was implemented within the pre-existing SIMFOR project, a serious game for training in crisis management.
Team coordination of non-player characters can create a deeper sense of immersion in real-time games by allowing characters to work together to produce better tactics and strategy. Achieving multi-agent coordination can be a difficult problem, and can incur substantial computational costs. Our goal with this work is to produce a reactive method for coordinating game characters that will allow computationally inexpensive team coordination. Reactive teaming creates teams of agents through the use of simple constant-time agent interactions without increasing the difficulty of authoring game characters.
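One way to read the "simple constant-time agent interactions" above is as pairwise handoff rules evaluated each tick. The roles, stamina attribute, and handoff rule below are invented for illustration and are not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    role: str      # "attacker" or "support" (illustrative roles)
    stamina: float

def interact(a, b):
    """Constant-time pairwise rule: the fresher agent takes the
    attacker role, the other drops back to support."""
    if a.stamina >= b.stamina:
        a.role, b.role = "attacker", "support"
    else:
        a.role, b.role = "support", "attacker"

def tick(team):
    """One reactive coordination step: O(1) work per interaction,
    O(n) per tick, and no global planner or search."""
    for a, b in zip(team, team[1:]):
        interact(a, b)

team = [Agent("a", "support", 0.9),
        Agent("b", "attacker", 0.4),
        Agent("c", "support", 0.7)]
tick(team)  # roles become: a=attacker, b=support, c=attacker
```

The appeal of this style is that team-level structure emerges from local rules, so adding an agent does not raise the authoring or computational cost of any individual interaction.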
Verbal communication is a central component in collaborative multiplayer gaming and creates a feeling of companionship among the players. In single-player games, this aspect is often missing. Advancements in speech recognition now open new potentials for voice-activated single-player experiences. In this work, we integrated voice interaction to a single-player virtual reality (VR) game. To create a sense of team spirit, we enabled players to talk to a multiplicity of agents using natural language. We hypothesize that conversing with only one agent cannot produce the same level of camaraderie. We conducted a preliminary qualitative user study (N=10) to explore how players experience talking with the in-game characters in the single-agent and the multi-agent condition. Early results suggest that our participants prefer interacting with the group of interlocutors. They perceived the multi-agent condition as more entertaining and liked the feeling of being part of a team.
There is a high demand for high-quality Non-Player Characters (NPCs) in video games. Hand-crafting their behavior is a labor intensive and error prone engineering process with limited controls exposed to the game designers. We propose to create such NPC behaviors interactively by training an agent in the target environment using imitation learning with a human in the loop. While traditional behavior cloning may fall short of achieving the desired performance, we show that interactivity can substantially improve it with a modest amount of human efforts. The model we train is a multi-resolution ensemble of Markov models, which can be used as is or can be further "compressed" into a more compact model for inference on consumer devices. We illustrate our approach on an example in OpenAI Gym, where a human can help to quickly train an agent with only a handful of interactive demonstrations. We also outline our experiments with NPC training for a first-person shooter game currently in development.
Affect-aware agents are one way to make social conflict between non-player characters in games feel more believable. Instead of relying only on fixed scripts, these agents maintain an internal model of emotion and social preferences that reacts to in-game events and to how other characters behave. When anger, gratitude, fear, or ambition change over time, the result may be alliance, betrayal, sacrifice, or a struggle for power between NPCs or between an NPC and the player. This paper presents a narrative literature review on how AI can recognize or simulate such emotions and social preferences to support emergent conflict. Prior work is organized into three strands: affective and motivation modeling for individual characters, social preferences and multi-agent learning for groups of agents, and drama or experience management systems that control the pacing and intensity of conflict events. For each strand, the review examines how state is represented, how conflict is triggered, and how player experience and safety are evaluated. On the basis of this review, a simple mechanism view is proposed that links emotional state and social values to typical conflict behaviors, and key design choices are identified that can make conflicts both understandable and controllable. The goal is to provide game designers and AI practitioners with practical ideas for using affect-aware agents when building games that rely on emergent social conflict.
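The "simple mechanism view" above, linking emotional state and social values to typical conflict behaviors, could be sketched as a threshold rule table. All field names, thresholds, and behavior labels below are illustrative assumptions, not taken from the reviewed papers.

```python
from dataclasses import dataclass

@dataclass
class AffectState:
    # All values in [0, 1]; the four affects named in the review.
    anger: float
    gratitude: float
    fear: float
    ambition: float

def conflict_behavior(state: AffectState) -> str:
    """Map the current affect state to a typical conflict behavior.

    Rules are checked in priority order; thresholds are arbitrary
    illustrative choices a designer would tune.
    """
    if state.fear > 0.7:
        return "flee"
    if state.anger > 0.6 and state.ambition > 0.5:
        return "power_struggle"
    if state.anger > 0.6:
        return "betrayal"
    if state.gratitude > 0.6:
        return "alliance"
    return "neutral"

print(conflict_behavior(AffectState(anger=0.8, gratitude=0.1, fear=0.2, ambition=0.7)))
# prints power_struggle
```

A real affect model would update these values continuously from in-game events; the point here is only that an explicit state-to-behavior mapping keeps emergent conflict both understandable and controllable.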
We describe a social robot acting as a game master in an interactive tabletop role-playing game. The Robot Game Master (RGM) takes on the role of different characters, which the human players meet during the adventure, as well as of the narrator. The demonstration presents a novel software and hardware platform that allows the robot to (1) proactively lead through the storyline and to (2) react to changes in the ongoing game in real-time, while (3) fostering players' collaborations.
No abstract available
World models improve a learning agent's ability to efficiently operate in interactive and situated environments. This work focuses on the task of building world models of text-based game environments. Text-based games, or interactive narratives, are reinforcement learning environments in which agents perceive and interact with the world using textual natural language. These environments contain long, multi-step puzzles or quests woven through a world that is filled with hundreds of characters, locations, and objects. Our world model learns to simultaneously: (1) predict changes in the world caused by an agent's actions when representing the world as a knowledge graph; and (2) generate the set of contextually relevant natural language actions required to operate in the world. We frame this task as a Set of Sequences generation problem by exploiting the inherent structure of knowledge graphs and actions and introduce both a transformer-based multi-task architecture and a loss function to train it. A zero-shot ablation study on never-before-seen textual worlds shows that our methodology significantly outperforms existing textual world modeling techniques as well as the importance of each of our contributions.
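Representing the world as a knowledge graph, as the abstract describes, means an action's effect can be expressed as a set of triple additions and deletions. This is a minimal sketch of that idea; the triple format and delta encoding are assumptions for illustration, not the paper's actual representation.

```python
# World state as a set of (subject, relation, object) triples; an action's
# predicted effect is a "delta" of triples to delete and add.

def apply_delta(graph: set, delta: dict) -> set:
    """Apply a predicted graph delta to the current knowledge graph."""
    return (graph - set(delta.get("del", []))) | set(delta.get("add", []))

world = {("player", "in", "kitchen"), ("key", "on", "table")}
# Hypothetical predicted effect of the action "take key":
delta = {"del": [("key", "on", "table")], "add": [("player", "has", "key")]}
world = apply_delta(world, delta)
print(("player", "has", "key") in world)  # prints True
```

In the paper's setting, a transformer predicts both this delta and the set of contextually valid next actions; here the delta is simply given.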
Simulating realistic interactions and motions for physics-based characters is of great interest for interactive applications and for automatic secondary character animation in the movie and video game industries. Recent works in reinforcement learning have shown impressive results for single-character simulation, especially those that use imitation-learning-based techniques. However, imitating the interactions and motions of multiple characters requires also modeling their interactions. In this paper, we propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach that generalizes the idea of motion imitation for one character to deal with both the interactions and the motions of multiple physics-based characters. Two unstructured datasets are given as inputs: 1) a single-actor dataset containing motions of a single actor performing a set of motions linked to a specific application, and 2) an interaction dataset containing a few examples of interactions between multiple actors. Based on these datasets, our system trains control policies allowing each character to imitate the interactive skills associated with each actor, while preserving the intrinsic style. This approach has been tested on two different fighting styles, boxing and full-body martial arts, to demonstrate the ability of the method to imitate different styles.
Artificial intelligence and its applications have stood out over the last decade, presenting effective solutions in the most diverse sectors and areas of knowledge. In the environmental area, for example, several challenges are faced, mainly in managing natural resources, due to the numerous conflicts caused by increased demand for and scarcity of such resources. In this scenario, applications using multi-agent systems show promise. The main objective of this work is to present an organizational-level agent model for studying the complexity and functionalities of the characters of a role-playing game. The case study of this work is Gorim, an RPG game developed in the scenario of the watershed of Lagoa Mirim and Canal São Gonçalo in Rio Grande do Sul, Brazil. The organization modeling was developed in the MOISE+ model and verified on the JaCaMo platform through hypothetical scenarios. We demonstrated that the results of such scenarios validate the structural, functional, and deontic/normative levels of the Gorim RPG.
Non-playable characters (NPCs) play a crucial role in enhancing immersion in video games. However, traditional NPC behaviors are often hard-coded using methods such as Finite State Machines, Decision Trees, and Behavior Trees. This has two main limitations: it is difficult to implement complex cooperative behaviors, and it makes it easy for human players to identify and exploit patterns in behavior. To overcome these challenges, Reinforcement Learning (RL) can be used to generate dynamic, real-time NPC responses to human player actions. In this paper, we report first results of applying RL techniques to a non-zero-sum, adversarial asymmetric game, using a multi-agent team. The game environment simulates a museum heist, where the objective of the successfully trained team of robbers with different skills (Locksmith, Technician) is to steal valuable items from the museum without being detected by the scripted security guards and cameras. Both agents were trained concurrently with separate policies and received both individual and group reward signals. Through this training process, the agents learned to cooperate effectively and use their skills to maximize both individual and team benefits. These results demonstrate the feasibility of realizing the full game, where both robbers and security guards are trained at the same time to achieve their adversarial goals.
No abstract available
No abstract available
We present a novel computational model of emotion, personality and social relationships, implemented and evaluated in an existing commercial game (The Elder Scrolls V: Skyrim). We argue that Non-Player Characters (NPCs) with such capabilities afford a new experience in playing games, and we provide evidence supporting this. Applying the ERiSA Framework [1, 2] to the Skyrim Creation Kit, we designed a simple quest and two unique NPCs to interact with. When the ERiSA framework is used, players reported significant changes in their social relationship with the two NPCs compared to the baseline. Most importantly, the results further indicate that the models provide a new experience in playing games. In particular, players report enhanced emotional attachment to the NPCs and appear to forge relationships with the NPCs. Finally, the implemented models result in significant changes in the game engagement and game immersion scores.
Serious computer games have become increasingly popular; they also require more elaborate and natural behavior on the part of Non-Playing Characters. The more elaborate the interactions among characters are during a game, the more difficult it is to design these characters without the use of specialized tools geared towards implementing intelligent agents in a modular way. This thus seems to be an excellent area for the application of intelligent agent technology, which for the past two decades has been developed based on design concepts such as Goals, Intentions, Plans and Beliefs. A first attempt at connecting game engines with these types of agents has been made with Gamebots [1]. Gamebots provides an infrastructure that allows the interfacing of any agent platform to the computer game Unreal Tournament. Gamebots manages the provision of relevant information regarding the game state, while delivering commands for actions from the agents to Unreal. More recently, this package was used as the basis for more extensive middleware called Pogamut [2]. Although the aforementioned middleware does allow the interfacing of agents to the game engine, that in itself does not guarantee proper behavior of the agents in the game. In the workshop series on Agents for Games and Simulations [3,4], started in 2009, issues regarding the connection of agent technology to game engines have been discussed. Most of these issues derive from the fact that game engines are typically designed to be in total control of the game's progress. On the other hand, one of the major attributes of agents developed on MAS platforms is that they are autonomous (to some extent) and interact asynchronously. We want to exploit the benefits of having agents decide intelligently and autonomously about their next actions, while not losing control of the game. This balancing act leads to three broad categories of issues.
The first category is that of technical issues; an important issue in this category is that of coping with real-time environments. Unfortunately, agent technology has hardly bothered with real-time issues up until now. A major exception is the use of agent technology in robotics, where one obviously also has to deal with real-time environments. Maybe this is one of the reasons that, in robotics, people do not use standard (BDI) agent platforms as basis for
The Restaurant Game demonstrates an end-to-end system that captures and generates social behavior for virtual agents. Over 15,000 people have played The Restaurant Game, and we have developed a system to automatically learn patterns of interaction and dialogue from logs of their gameplay sessions. These patterns guide a case-based planning system, which generates behavior and dialogue for a virtual customer or waitress who can interact with a human, or with another agent. The Restaurant Game demonstrates a first step toward empowering non-programmers to realize socially intelligent characters for a wide range of applications.
No abstract available
No abstract available
Recent advancements in multi-agent systems based on large language models (LLM) have shown potential for problem-solving and planning tasks. However, most existing LLM-based multi-agent approaches show vulnerability against Byzantine attacks. First, agents instantiated on diverse LLMs may inherit biases present in the LLMs and thus exhibit deceptive behavior. Second, as the number of agents grows, collusive behavior among multiple malicious agents poses a potential threat. In this paper, we propose BlockAgents, an innovative framework that integrates blockchain into LLM-based cooperative multi-agent systems to mitigate Byzantine behaviors. BlockAgents completes multi-agent collaboration through a unified workflow including role assignment, proposal statement, evaluation, and decision-making. To help the agent who contributes the most to the group thinking process acquire accounting rights, we propose a proof-of-thought (PoT) consensus mechanism combined with stake-based miner designation and multi-round debate-style voting. To effectively distinguish valid and abnormal answers, we design a multi-metric prompt-based evaluation method for each evaluator to score each proposal by carefully and comprehensively considering multiple dimensions. Experiments on three datasets show that BlockAgents reduces the interference of poisoning attacks on accuracy to less than 3% and reduces the success rate of backdoor attacks to less than 5%, demonstrating its resistance to Byzantine attacks.
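The multi-metric evaluation step feeding the proof-of-thought selection could look roughly like the sketch below: each evaluator scores each proposal on several dimensions, and the proposal with the highest aggregate score wins. Metric names, the averaging rule, and the data layout are all illustrative assumptions, not BlockAgents' actual protocol.

```python
# proposals: agent_id -> list of per-evaluator metric-score dicts.

def aggregate_score(metric_scores: dict) -> float:
    """Average one evaluator's per-metric scores for a proposal."""
    return sum(metric_scores.values()) / len(metric_scores)

def select_winner(proposals: dict) -> str:
    """Pick the proposer whose proposal has the best mean evaluator score."""
    totals = {
        agent: sum(aggregate_score(s) for s in scores) / len(scores)
        for agent, scores in proposals.items()
    }
    return max(totals, key=totals.get)

proposals = {
    "agent_a": [{"correctness": 0.9, "coherence": 0.8},
                {"correctness": 0.7, "coherence": 0.9}],
    "agent_b": [{"correctness": 0.4, "coherence": 0.6},
                {"correctness": 0.5, "coherence": 0.5}],
}
print(select_winner(proposals))  # prints agent_a
```

In the full system this selection would additionally be weighted by stake and run over multiple debate rounds; the sketch shows only the scoring core.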
No abstract available
The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach.
Effective communication is essential in multi-agent reinforcement learning (MARL) for coordinating actions and maximizing collective rewards. Two common approaches for establishing communication are Graph Neural Networks (GNNs) and Transformers. Both methods introduce communication redundancy in complex scenarios. GNN-based methods model agent relationships through entire graph structures, leading to increased computational time. Transformers also increase computations due to self-attention calculations at each node. In this study, the ACORN (Acyclic Coordination with Reachability Networks) framework was introduced, utilizing acyclic coordination combined with a reachability-based attention mechanism. Only the most relevant nodes and connections in the GNN graph are used for self-attention calculations. Time complexity is reduced to O(|V| · n_k · d), which is significantly better than the O(|V|² · d) complexity of standard Transformers. Acyclicity is ensured through Auto-Regressive Policy Learning and Sequence-Based Critic Learning. Experiments demonstrate that ACORN outperforms state-of-the-art methods, achieving an average improvement of 11% over MAT in challenging SMACV2 tasks and a 17% improvement within the same training time and steps.
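The complexity gap above comes from attending over only the k most relevant neighbors per node rather than all |V| nodes. The sketch below shows that pattern with plain dot-product relevance; ACORN's actual reachability criterion is more involved, so treat this as a generic top-k attention illustration, not the paper's algorithm.

```python
import math

def topk_attention(queries, keys, values, k):
    """Softmax attention restricted to each query's k highest-scoring keys.

    Per query this costs O(k * d) for the weighted sum instead of O(|V| * d),
    which is the source of the complexity reduction discussed above
    (the key-scoring pass here is still dense for simplicity).
    """
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = [math.exp(scores[i]) for i in top]
        z = sum(weights)
        out.append([
            sum(w / z * values[i][d] for w, i in zip(weights, top))
            for d in range(len(values[0]))
        ])
    return out

# One query attending to its 2 best-matching keys out of 3.
out = topk_attention([[1.0, 0.0]],
                     [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
                     [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], k=2)
```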
Multi-agent Reinforcement Learning (MARL) has made significant progress in addressing coordination problems, but two key challenges persist in environments with partial observability: limited exploration and inaccurate evaluation of individual agents. To address these challenges, we propose a novel MARL framework that integrates Evolutionary Algorithms (EAs), episodic learning, and curiosity-driven exploration to optimize the coordination of joint policies using graph-based methods, named EECG. EAs are employed for their global optimization capabilities, particularly through population diversity and a gradient-free search mechanism, to enhance policy exploration. Initially, multiple agent teams explore and learn independently while sharing a common experience pool to enable data diversity. During the evolution phase, new joint policies are generated through crossover, mutation, and Pareto-based selection. During the RL phase, diverse data is used to model and update the relationships among agents via Graph Neural Networks (GNNs), which help evaluate the effectiveness of individual agents' behaviors. GNNs treat agents as nodes and their interactions as edges, capturing coordination relationships effectively while dynamically assigning representations to nodes and edges. Furthermore, curiosity-based exploration motivates teams to discover new states, while a memory system stores high-reward experiences. We evaluated EECG on several benchmarks, including StarCraft II, SUMO autonomous driving, and the Multi-Agent Particle Environment. Our empirical results show that EECG consistently outperforms current baselines, with its components significantly contributing to faster convergence, especially by improving exploration and agent coordination. Our code is available: https://github.com/MercyM/EECG.
In addressing the multi-agent adversarial coordination problem, existing multi-agent reinforcement learning algorithms primarily rely on team-based rewards to guide agent policy updates, often neglecting the utilization of inter-agent relationships, which limits their performance. Drawing inspiration from human tactics, we introduce the concept of tacit behavior to improve the efficiency of multi-agent reinforcement learning by refining the learning process. This paper presents a novel two-phase framework for learning Pre-trained Tacit Behavior for efficient multi-agent adversarial Coordination (PTBC), comprising a tacit pre-training phase and a centralized adversarial training phase. We demonstrate the superiority of our method through comparisons with several algorithms, each of which possesses distinct strengths.
Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
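The group distance loss described above, which "promotes group cohesion and encourages specialization between groups", can be sketched as a contrastive-style objective over agent embeddings: pull same-group embeddings together, push different groups apart up to a margin. The margin formulation is an assumption for illustration; GACG's exact loss may differ.

```python
def group_distance_loss(embeddings, groups, margin=1.0):
    """Sum of intra-group distances (cohesion term) plus margin-hinged
    inter-group similarity (specialization term), over all agent pairs.

    embeddings: list of equal-length vectors, one per agent.
    groups: list of group ids, aligned with embeddings.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    intra, inter = 0.0, 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(embeddings[i], embeddings[j])
            if groups[i] == groups[j]:
                intra += d                      # same group: minimize distance
            else:
                inter += max(0.0, margin - d)   # different groups: keep apart
    return intra + inter
```

Minimizing this drives same-group agents toward consistent behavior representations while keeping distinct groups at least `margin` apart.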
No abstract available
Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framework for training large language models (LLMs) as collaborative agents to enable coordinated behaviors in cooperative MARL. Each agent maintains a private intention consisting of its current goal and associated sub-tasks. Agents broadcast their intentions periodically, allowing other agents to infer coordination tasks. A propagation network transforms broadcast intentions into teammate-specific communication messages, sharing relevant goals with designated teammates. The architecture of our framework is structured into planning, grounding, and execution modules. During execution, multiple agents interact in a downstream environment and communicate intentions to enable coordinated behaviors. The grounding module dynamically adapts comprehension strategies based on emerging coordination patterns, while feedback from execution agents influences the planning module, enabling the dynamic re-planning of sub-tasks. Results in collaborative environment simulation demonstrate intention propagation reduces miscoordination errors by aligning sub-task dependencies between agents. Agents learn when to communicate intentions and which teammates require task details, resulting in emergent coordinated behaviors. This demonstrates the efficacy of intention sharing for cooperative multi-agent RL based on LLMs.
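The broadcast-then-filter step of intention propagation can be sketched as follows. Here the learned propagation network is replaced by a fixed relevance table, and the intention fields (`goal`, `subtasks`) are assumed names for illustration.

```python
def propagate(intentions: dict, relevance: dict) -> dict:
    """Turn broadcast intentions into teammate-specific messages.

    intentions: agent_id -> {"goal": str, "subtasks": [str, ...]}
    relevance: (sender, receiver) -> set of sub-task names worth forwarding
               (a stand-in for the learned propagation network).
    Returns: agent_id -> list of filtered messages from teammates.
    """
    inbox = {agent: [] for agent in intentions}
    for sender, intent in intentions.items():
        for receiver in intentions:
            if receiver == sender:
                continue
            shared = [t for t in intent["subtasks"]
                      if t in relevance.get((sender, receiver), set())]
            if shared:
                inbox[receiver].append(
                    {"from": sender, "goal": intent["goal"], "subtasks": shared})
    return inbox

intentions = {
    "a": {"goal": "open vault", "subtasks": ["pick_lock", "disable_camera"]},
    "b": {"goal": "scout", "subtasks": ["watch_exit"]},
}
relevance = {("a", "b"): {"disable_camera"}}
inbox = propagate(intentions, relevance)
```

Only the sub-task relevant to agent "b" is forwarded, which is the filtering behavior that lets agents learn "which teammates require task details".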
This paper addresses the design and coordination challenges of controlling non-player characters' (NPCs) behaviors in multi-agent systems (MAS) using behavior trees (BTs), which are preferred over finite state machines (FSMs) due to their hierarchical structure and ease of maintenance. While BTs effectively resolve the question of "What to do next?" for individual NPCs, their application in MAS, particularly with the integration of a blackboard system for central control, reveals limitations in efficiency, robustness, and heuristic capacity as system complexity increases. To explore solutions to these challenges, this study analyzes various algorithms that enhance the functionality of behavior trees within MAS. The research focuses on three primary areas: the optimization of behavior trees, the development of behavior tree search algorithms, and the improvement of communication algorithms within BTs. Methods involve a comparative analysis of existing and new algorithmic approaches to identify and address inefficiencies in NPC coordination. The findings indicate that advanced behavior tree configurations, when combined with innovative search and communication strategies, significantly improve the coordination, robustness, and operational efficiency of MAS. These enhancements allow for more dynamic and responsive NPC interactions in complex gaming environments.
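The behavior-tree-plus-blackboard pattern discussed above can be illustrated with a toy implementation: nodes read and write a shared blackboard dict, and a sequence node ticks its children in order. This is a generic sketch, not tied to any engine or to the specific algorithms the study analyzes.

```python
class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Condition:
    """Succeeds if the blackboard key is truthy."""
    def __init__(self, key):
        self.key = key
    def tick(self, bb):
        return "SUCCESS" if bb.get(self.key) else "FAILURE"

class Action:
    """Runs a callable against the blackboard and reports success."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, bb):
        self.fn(bb)
        return "SUCCESS"

bb = {"enemy_visible": True}  # shared blackboard for central control
tree = Sequence(Condition("enemy_visible"),
                Action(lambda b: b.update(state="attack")))
print(tree.tick(bb), bb["state"])  # prints SUCCESS attack
```

In a MAS setting, many NPC trees would tick against the same blackboard, which is exactly where the coordination and robustness issues the study examines arise.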
Autonomous agents operating in real-world environments frequently encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. Even when agents can execute their primary task optimally when operating in isolation, their training may not account for potential negative interactions that arise in the presence of other agents. We frame the challenge of minimizing NSEs as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but recognize that addressing negative side effects creates a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks---up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.
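The lexicographic idea above, near-optimality on the primary task "up to some given slack", with side effects minimized within that set, can be sketched as an action-selection rule over two Q-functions. The toy dict-based Q-tables and names below are illustrative assumptions, not the paper's implementation.

```python
def lexicographic_action(q_primary: dict, q_nse: dict, slack: float) -> str:
    """Pick the lowest-side-effect action among near-optimal primary actions.

    q_primary: action -> value for the assigned task.
    q_nse: action -> expected negative-side-effect penalty (lower is better).
    slack: allowed loss relative to the best primary-task value.
    """
    best = max(q_primary.values())
    acceptable = [a for a, q in q_primary.items() if q >= best - slack]
    return min(acceptable, key=lambda a: q_nse[a])

q_primary = {"left": 10.0, "right": 9.6, "wait": 4.0}
q_nse = {"left": 2.0, "right": 0.5, "wait": 0.0}
print(lexicographic_action(q_primary, q_nse, slack=0.5))  # prints right
```

With zero slack the agent ignores side effects entirely; widening the slack trades a bounded amount of task value for fewer NSEs, which is the knob the lexicographic formulation exposes.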
Large Language Models (LLMs) have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from Agentic Coordination experiments reveal that LLM-Agents excel in multi-agent coordination settings where decision-making primarily relies on environmental variables but face challenges in scenarios requiring active consideration of partners' beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs' Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as Agents in pure coordination setups and underscore areas for improvement. Code Available at https://github.com/eric-ai-lab/llm_coordination.
No abstract available
Cooperative Multi-agent Reinforcement Learning (CMARL) has shown to be promising for many real-world applications. Previous works mainly focus on improving coordination ability via solving MARL-specific challenges (e.g., non-stationarity, credit assignment, scalability), but ignore the policy perturbation issue when testing in a different environment. This issue has not been considered in problem formulation or efficient algorithm design. To address this issue, we first model the problem as a Limited Policy Adversary Dec-POMDP (LPA-Dec-POMDP), where some coordinators from a team might accidentally and unpredictably encounter a limited number of malicious action attacks, but the regular coordinators still strive for the intended goal. Then, we propose Robust Multi-Agent Coordination via Evolutionary Generation of Auxiliary Adversarial Attackers (ROMANCE), which enables the trained policy to encounter diversified and strong auxiliary adversarial attacks during training, thus achieving high robustness under various policy perturbations. Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee the attackers high attacking quality and behavior diversity. The goal of quality is to minimize the ego-system coordination effect, and a novel diversity regularizer based on sparse action is applied to diversify the behaviors among attackers. The ego-system is then paired with a population of attackers selected from the maintained attacker set, and alternately trained against the constantly evolving attackers. Extensive experiments on multiple scenarios from SMAC indicate our ROMANCE provides comparable or better robustness and generalization ability than other baselines.
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.
Zero-shot coordination (ZSC) is a new cooperative multi-agent reinforcement learning (MARL) challenge that aims to train an ego agent to work with diverse, unseen partners during deployment. The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge. The potential distribution gap between evaluation and deployment-time partners leads to inadequate evaluation, which is exacerbated by the lack of appropriate evaluation metrics. In this paper, we present ZSC-Eval, the first evaluation toolkit and benchmark for ZSC algorithms. ZSC-Eval consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate deployment-time partners' distribution; 2) Selection of evaluation partners by Best-Response Diversity (BR-Div); 3) Measurement of generalization performance with various evaluation partners via the Best-Response Proximity (BR-Prox) metric. We use ZSC-Eval to benchmark ZSC algorithms in Overcooked and Google Research Football environments and get novel empirical findings. We also conduct a human experiment of current ZSC algorithms to verify the ZSC-Eval's consistency with human evaluation. ZSC-Eval is now available at https://github.com/sjtu-marl/ZSC-Eval.
Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when available, doing so gives rise to what we call the offline coordination problem. Specifically, we identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) coordination challenges, two issues at which current offline MARL algorithms fail. Concretely, we reveal that the prevalent model-free methods are severely deficient and cannot handle coordination-intensive offline multi-agent tasks in either toy or MuJoCo domains. To address this setback, we emphasize the importance of inter-agent interactions and propose the very first model-based offline MARL method. Our resulting algorithm, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO) generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly. This simple model-based solution solves the coordination-intensive offline tasks, significantly outperforming the prevalent model-free methods even under severe partial observability and with learned world models.
In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents' actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
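The MMI-regularized objective described above, return plus mutual information among simultaneous actions, can be written schematically as follows; the notation (discount $\gamma$, weight $\alpha$, $N$ agents) is assumed here and may differ from the paper's exact formulation:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}
  \Big( r_t + \alpha \, I\!\big(a_t^{1}; \dots; a_t^{N} \mid s_t\big) \Big) \right]
```

The mutual information term $I(\cdot)$ is intractable in general, which is why the paper introduces a latent variable and a variational bound to obtain the trainable lower bound that decomposes into maximum-entropy RL plus a term reducing uncertainty about other agents' actions.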
As LLMs are increasingly studied as role-playing agents to generate synthetic data for human behavioral research, ensuring that their outputs remain coherent with their assigned roles has become a critical concern. In this paper, we investigate how consistently LLM-based role-playing agents' stated beliefs about the behavior of the people they are asked to role-play ("what they say") correspond to their actual behavior during role-play ("how they act"). Specifically, we establish an evaluation framework to rigorously measure how well beliefs obtained by prompting the model can predict simulation outcomes in advance. Using an augmented version of the GenAgents persona bank and the Trust Game (a standard economic game used to quantify players' trust and reciprocity), we introduce a belief-behavior consistency metric to systematically investigate how it is affected by factors such as: (1) the types of beliefs we elicit from LLMs, like expected outcomes of simulations versus task-relevant attributes of individual characters LLMs are asked to simulate; (2) when and how we present LLMs with relevant information about the Trust Game; and (3) how far into the future we ask the model to forecast its actions. We also explore how feasible it is to impose a researcher's own theoretical priors in the event that the originally elicited beliefs are misaligned with research objectives. Our results reveal systematic inconsistencies between LLMs' stated (or imposed) beliefs and the outcomes of their role-playing simulations, at both the individual and population level. Specifically, we find that, even when models appear to encode plausible beliefs, they may fail to apply them in a consistent way. These findings highlight the need to identify how and when LLMs' stated beliefs align with their simulated behavior, allowing researchers to use LLM-based agents appropriately in behavioral studies.
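A minimal sketch of one way such a belief-behavior consistency metric could be computed, assuming a toy Trust Game with a 10-unit endowment; the scoring rule and numbers are invented, not the paper's:

```python
def consistency(beliefs, behaviors):
    """One possible belief-behavior consistency score: 1 minus the
    normalized mean absolute gap between what an agent predicted it
    would send in the Trust Game and what it actually sent."""
    assert len(beliefs) == len(behaviors)
    max_send = 10  # assumed endowment in the toy Trust Game
    gaps = [abs(b - a) / max_send for b, a in zip(beliefs, behaviors)]
    return 1 - sum(gaps) / len(gaps)

stated = [7, 3, 5, 9]  # amounts each persona said it would send
acted  = [4, 3, 6, 5]  # amounts sent during the role-play simulation
print(round(consistency(stated, acted), 3))  # 0.8
```

A score of 1 would mean perfectly aligned beliefs and behavior; the paper's point is that real elicited scores fall systematically short of that.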
No abstract available
No abstract available
Much of NLP research has focused on crowd-sourced static datasets and the supervised learning paradigm of training once and then evaluating test performance. As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (Silver et al., 2013). In contrast, one might hope for machine learning systems that become more useful as they interact with people. In this work, we build and deploy a role-playing game, whereby human players converse with learning agents situated in an open-domain fantasy world. We show that by training models on the conversations they have with humans in the game the models progressively improve, as measured by automatic metrics and online engagement scores. This learning is shown to be more efficient than crowdsourced data when applied to conversations with real users, as well as being far cheaper to collect.
Recently, large language model (LLM)-based agents have made significant advances across various fields. One of the most popular research areas involves applying these agents to video games. Traditionally, these methods have relied on game APIs to access in-game environmental and action data. However, this approach is limited by the availability of APIs and does not reflect how humans play games. With the advent of vision language models (VLMs), agents now have enhanced visual understanding capabilities, enabling them to interact with games using only visual inputs. Despite these advances, current approaches still face challenges in action-oriented tasks, particularly in action role-playing games (ARPGs), where reinforcement learning methods are prevalent but suffer from poor generalization and require extensive training. To address these limitations, we select an ARPG, ``Black Myth: Wukong'', as a research platform to explore the capability boundaries of existing VLMs in scenarios requiring visual-only input and complex action output. We define 12 tasks within the game, with 75% focusing on combat, and incorporate several state-of-the-art VLMs into this benchmark. Additionally, we will release a human operation dataset containing recorded gameplay videos and operation logs, including mouse and keyboard actions. Moreover, we propose a novel VARP (Vision Action Role-Playing) agent framework, consisting of an action planning system and a visual trajectory system. Our framework demonstrates the ability to perform basic tasks and succeed in 90% of easy and medium-level combat scenarios. This research aims to provide new insights and directions for applying multimodal agents in complex action game environments. The code and datasets will be made available at https://varp-agent.github.io/.
No abstract available
In role-playing games (RPGs), the level of immersion is critical, especially when an in-game agent conveys tasks, hints, or ideas to the player. For an agent to accurately interpret the player's emotional state and contextual nuances, a foundational level of understanding is required, which can be achieved using a Large Language Model (LLM). Maintaining the LLM's focus across multiple context changes, however, necessitates a more robust approach, such as integrating the LLM with a dedicated task allocation model to guide its performance throughout gameplay. In response to this need, we introduce Voting-Based Task Assignment (VBTA), a framework inspired by human reasoning in task allocation and completion. VBTA assigns capability profiles to agents and task descriptions to tasks, then generates a suitability matrix that quantifies the alignment between an agent's abilities and a task's requirements. Leveraging six distinct voting methods, a pre-trained LLM, and integrating conflict-based search (CBS) for path planning, VBTA efficiently identifies and assigns the most suitable agent to each task. While existing approaches focus on generating individual aspects of gameplay, such as single quests or combat encounters, our method shows promise when generating both unique combat encounters and narratives because of its generalizable nature.
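The suitability-matrix idea can be sketched as follows; the skill axes, dot-product scoring, and greedy assignment are assumptions standing in for VBTA's capability profiles and its six voting methods:

```python
# Toy capability profiles and task requirements over the same skill axes.
agents = {"guard": [0.9, 0.2, 0.4], "scout": [0.3, 0.9, 0.5]}
tasks  = {"patrol": [1.0, 0.0, 0.5], "recon":  [0.0, 1.0, 0.2]}

def suitability(agent_vec, task_vec):
    # Dot product as one simple alignment score between abilities and
    # requirements; the paper's exact scoring is not specified here.
    return sum(a * t for a, t in zip(agent_vec, task_vec))

def assign(agents, tasks):
    """Greedy stand-in for the voting stage: each task goes to the
    agent with the highest suitability score."""
    matrix = {(a, t): suitability(av, tv)
              for a, av in agents.items() for t, tv in tasks.items()}
    return {t: max(agents, key=lambda a: matrix[(a, t)]) for t in tasks}

print(assign(agents, tasks))  # {'patrol': 'guard', 'recon': 'scout'}
```

In VBTA the matrix would instead feed the voting methods, and CBS would plan conflict-free paths for the chosen agents.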
Board games are extensively studied in the AI community because of their ability to reflect real-world problems with a high level of abstraction, and their irreplaceable role as testbeds of state-of-the-art AI algorithms. Modern board games are commonly featured with partially observable state spaces and imperfect information. Despite some recent successes in AI tackling perfect information board games like chess and Go, most imperfect information games are still challenging and have yet to be solved. This paper empirically explores the capabilities of a state-of-the-art Reinforcement Learning (RL) algorithm – Proximal Policy Optimization (PPO) – in playing Ticket to Ride, a popular board game with features of imperfect information, large state-action space, and delayed rewards. This paper explores the feasibility of the proposed generalizable modelling and training schemes using a general-purpose RL algorithm with no domain knowledge-based heuristics beyond game rules, game states and scores to tackle this complex imperfect information game. The performance of the proposed methodology is demonstrated in a scaled-down version of Ticket to Ride with a range of RL agents obtained with different training schemes. All RL agents achieve clear advantages over a set of well-designed heuristic agents. The agent constructed through a self-play training scheme outperforms the other RL agents in a Round Robin tournament. The high performance and versatility of this self-play agent provide a solid demonstration of the capabilities of this framework.
Humans are highly co-operative and thus cognitively, affectively, and motivationally tuned to pursue shared goals. Yet, cooperative tasks typically require people to constantly take and switch individual roles. Task relevance is dictated by these roles and thereby dynamically changing. Here, we designed a dyadic game to test whether the family of P3 components can trace this dynamic allocation of task relevance. We demonstrate that late positive event-related potential (ERP) modulations not only reflect predictable asymmetries between receiving and sending information but also differentiate whether the receiver's role is related to correct decision making or action monitoring. Furthermore, similar results were observed when playing the game with a computer, suggesting that experimental games may motivate humans to similarly cooperate with an artificial agent. Overall, late positive ERP waves provide a real-time measure of how role taking dynamically shapes the meaning and relevance of stimuli within collaborative contexts. Our results, therefore, shed light on how the processes of mutual coordination unfold during dyadic cooperation.
No abstract available
In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compelling test-bed to study the decision-making and language-processing capabilities of LLM Agents. To facilitate research in this line, we introduce AvalonBench - a comprehensive game environment tailored for evaluating multi-agent LLM Agents. This benchmark incorporates: (1) a game environment for Avalon, (2) rule-based bots as baseline opponents, and (3) ReAct-style LLM agents with tailored prompts for each role. Notably, our evaluations based on AvalonBench highlight a clear capability gap. For instance, models like ChatGPT playing the good role achieve a win rate of 22.2% against rule-based bots playing evil, while a good-role bot achieves a 38.2% win rate in the same setting. We envision AvalonBench could be a good test-bed for developing more advanced LLMs (with self-playing) and agent frameworks that can effectively model the layered complexities of such game environments.
No abstract available
No abstract available
Computer role-playing games (RPGs) often include a simulated morality system as a core design element. Games’ morality systems can include both god’s eye view aspects, in which certain actions are inherently judged by the simulated world to be good or evil, as well as social simulations, in which non-player characters (NPCs) react to judgments of the player’s and each other’s activities. Games with a larger amount of social simulation have clear affinities to multi-agent systems (MAS) research on artificial societies. They differ in a number of key respects, however, due to a mixture of pragmatic game-design considerations and their typically strong embeddedness in narrative arcs, resulting in many important aspects of moral systems being represented using explicitly scripted scenarios rather than through agent-based simulations. In this position paper, we argue that these similarities and differences make RPGs a promising challenge domain for MAS research, highlighting features such as moral dilemmas situated in more organic settings than seen in game-theoretic models of social dilemmas, and heterogeneous representations of morality that use both moral calculus systems and social simulation. We illustrate some possible approaches using a case study of the morality systems in the game The Elder Scrolls IV: Oblivion.
No abstract available
No abstract available
No abstract available
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources. We present an artificial intelligence research environment, inspired by the human game genre of MMORPGs (Massively Multiplayer Online Role-Playing Games, a.k.a. MMOs), that aims to simulate this setting in microcosm. As with MMORPGs and the real world alike, our environment is persistent and supports a large and variable number of agents. Our environment is well suited to the study of large-scale multiagent interaction: it requires that agents learn robust combat and navigation policies in the presence of large populations attempting to do the same. Baseline experiments reveal that population size magnifies and incentivizes the development of skillful behaviors and results in agents that outcompete agents trained in smaller populations. We further show that the policies of agents with unshared weights naturally diverge to fill different niches in order to avoid competition.
Game-based benchmarks have been playing an essential role in the development of Artificial Intelligence (AI) techniques. Providing diverse challenges is crucial to push research toward innovation and understanding in modern techniques. Rinascimento provides a parameterised partially-observable multiplayer card-based board game; these parameters can easily modify the rules, objectives and items in the game. We describe the framework and all its features, as well as the game-playing challenge, providing baseline game-playing AIs and an analysis of their skills. We give agents’ hyper-parameter tuning a central role in the experiments, highlighting how heavily it can influence performance. The baseline agents contain several additional contributions to Statistical Forward Planning algorithms.
No abstract available
Energy communities (ECs) are emerging frameworks where citizens collectively share renewable energy. Leveraging knowledge about this topic is challenging given how varied these communities can be and how many actors are involved in decision-making. We are developing En-join, a game in which the player has to solve open-ended challenges that are mediated and evaluated by conversational agents representing members of an EC. We implemented and prompted an LLM (Phi-4) to perform role-playing and evaluation simultaneously. We tested prompt variants indicating personality and behavior and meta-evaluated the evaluation performance using six predefined answers across three levels. Our results suggest that indicating social preferences noticeably affects evaluation behavior. We contribute to the field of games and serious games by showing how LLMs can be used as conversational characters and evaluator agents simultaneously, and suggest that role-playing may affect evaluation behavior in any LLM implementation.
No abstract available
No abstract available
Machine learning is having a significant impact on video games, from speeding up the development of new games to the development of new methods in gaming. This study discusses the impact of using ML models in the development and improvement of video games. The examples cited in this paper relate to the ways in which NPCs behave. With the aid of Unity's ML-Agents toolkit, developers can create NPCs that respond to different strategies in real time, making games more exciting and complex. The paper also outlines the possibilities of single-player games, such as first-person shooters and role-playing games, for observing NPC behavior during gameplay. In these gaming styles, RL frameworks can be applied to NPCs to enable them to learn and adapt to the environment, creating an interactive experience for the player. The conclusions put an emphasis on the evolution of video games with the advent of ML tools; as predicted, NPC behavior becomes more intuitive and less predictable.
This study explores the application of generative artificial intelligence (AI) in video games, focusing on improving content richness and personalized player experience. Traditional non-player character (NPC) systems have limitations in adaptability and interactivity. This study explores how generative AI can improve NPC behavior and dialogue. Then, this paper proposes a framework that combines generative agents, Transformer-based dialogue models, and reinforcement learning. Specifically, the generative agent simulates memory-driven planning, the Transformer model generates context-aware dialogues, and reinforcement learning supports adaptive interactions. This study is based on a generative agent dataset and a Role-playing game (RPG) dialogue corpus. The results show that the proposed method enhances the realism of NPCs, the coherence of game narratives, and the responsiveness of player interactions, and improves player immersion and the diversity of interactions. This provides practical insights for scalable intelligent game development and shows the potential of artificial intelligence in automating complex content creation and points out the direction for the future combination of games and artificial intelligence.
Reward acts as a signal to guide the agent’s learning process in Reinforcement Learning (RL), evaluating and assigning rewards to the agent’s actions based on their alignment with goals. Designing rewards is challenging in multi-agent environments such as the StarCraft II benchmark, since agents face credit allocation and role adaptation problems. Recent studies have successfully exploited the language understanding and reasoning capabilities of large language models (LLMs) to learn manipulation tasks. Impressed by the remarkable power of LLMs, this paper employs LLMs as role-specific reward designers for playing StarCraft II, making rewards more flexible and task-oriented. Firstly, we develop an interactive text and multi-agent RL environment to study real-time strategy generation in StarCraft II. Secondly, we use LLMs to interpret the game situation and understand agent roles from user instructions. Then, by assigning appropriate subtasks, LLMs quantify the completion of these subtasks to generate role-specific rewards. Further, the credit assignment problem is addressed by introducing dynamic reward weights in the value decomposition method. In StarCraft II maps, experiments show that role-aligned RL agents trained with our framework achieve superior policy performance, and win-rate results demonstrate the effectiveness of our approach in decision-making for micromanagement and long-term planning.
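A minimal sketch of role-specific reward shaping in this spirit, where per-role subtask completion scores are combined with dynamic weights; the roles, subtasks, and numbers are invented for illustration:

```python
def role_reward(subtask_scores, weights):
    """Weighted sum of per-subtask completion scores, mirroring the idea
    of dynamic reward weights in value decomposition; numbers are made up."""
    assert len(subtask_scores) == len(weights)
    return sum(w * s for w, s in zip(weights, subtask_scores))

# An LLM might assign a "harass" role the subtasks (deal damage, stay alive)
# and a "tank" role (absorb damage, protect allies), with different weights.
harass = role_reward([0.8, 0.5], weights=[0.7, 0.3])
tank   = role_reward([0.6, 0.9], weights=[0.4, 0.6])
print(round(harass, 2), round(tank, 2))  # 0.71 0.78
```

In the paper's setting the LLM would propose both the subtasks and their completion quantification; here both are fixed by hand.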
Decision-making is a basic component of agents’ (e.g., intelligent sensors) behaviors, in which one’s cognition plays a crucial role in the process and outcome. Extensive games, a class of interactive decision-making scenarios, have been studied in diverse fields. Recently, a model of extensive games was proposed in which agent cognition of the structure of the underlying game and the quality of the game situations are encoded by artificial neural networks. This model refines the classic model of extensive games, and the corresponding equilibrium concept—cognitive perfect equilibrium (CPE)—differs from the classic subgame perfect equilibrium, since CPE takes agent cognition into consideration. However, this model neglects the consideration that game-playing processes are greatly affected by agents’ cognition of their opponents. To this end, in this work, we go one step further by proposing a framework in which agents’ cognition of their opponents is incorporated. A method is presented for evaluating opponents’ cognition about the game being played, and thus, an algorithm designed for playing such games is analyzed. The resulting equilibrium concept is defined as adversarial cognition equilibrium (ACE). By means of a running example, we demonstrate that the ACE is more realistic than the CPE, since it involves learning about opponents’ cognition. Further results are presented regarding the computational complexity, soundness, and completeness of the game-solving algorithm and the existence of the equilibrium solution. This model suggests the possibility of enhancing an agent’s strategic ability by evaluating opponents’ cognition.
No abstract available
Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling language models to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.
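The six memory types can be pictured as a routed store, as in this sketch; the routing rules and item strings are invented, and only the type names come from the abstract:

```python
# The six MIRIX memory types, with a toy router that files an incoming
# observation under one type; the real system coordinates updates and
# retrieval through a multi-agent framework.
MEMORY_TYPES = ["core", "episodic", "semantic", "procedural",
                "resource", "knowledge_vault"]

store = {m: [] for m in MEMORY_TYPES}

def route(item, kind):
    if kind not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {kind}")
    store[kind].append(item)

route("user prefers dark mode", "core")
route("opened report.pdf at 9:05", "episodic")
route("steps to export a chart", "procedural")
print([m for m in MEMORY_TYPES if store[m]])
# ['core', 'episodic', 'procedural']
```

In MIRIX the routing decision itself is made by coordinating agents rather than by an explicit label, which is the part this sketch deliberately simplifies.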
LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks. However, to design effective MAS, existing approaches heavily rely on manual configurations or multiple calls of advanced LLMs, resulting in inadaptability and high inference costs. In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. To address this novel task, we unify the representation of MAS as executable code and propose a consistency-oriented data construction pipeline to create a high-quality dataset comprising coherent and consistent query-MAS pairs. Using this dataset, we train MAS-GPT, an open-source medium-sized LLM that is capable of generating query-adaptive MAS within a single LLM inference. The generated MAS can be seamlessly applied to process user queries and deliver high-quality responses. Extensive experiments on 9 benchmarks and 5 LLMs show that the proposed MAS-GPT consistently outperforms 10+ baseline MAS methods on diverse settings, indicating MAS-GPT's high effectiveness, efficiency and strong generalization ability. Code will be available at https://github.com/rui-ye/MAS-GPT.
Multi-agent systems (MAS) have emerged as a promising approach for enhancing the reasoning capabilities of large language models in complex problem-solving; however, current MAS frameworks suffer from poor flexibility and scalability with underdeveloped optimization strategies. To address these challenges, we propose ReSo, which integrates task graph generation with a reward-driven two-stage agent selection process centered on our Collaborative Reward Model that provides fine-grained reward signals to optimize MAS cooperation. We also introduce an automated data synthesis framework for generating MAS benchmarks without any human annotations. Experimental results show that ReSo matches or outperforms existing methods, achieving 33.7 percent accuracy on Math-MAS and 32.3 percent accuracy on SciBench-MAS, where other approaches completely fail.
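The reward-driven second-stage selection can be sketched as scoring candidates with a reward model and keeping the top k; the candidate names and scores below are hypothetical:

```python
def select_agents(candidates, reward_model, top_k=2):
    """Second-stage selection: score each candidate with a collaborative
    reward model and keep the top performers; scores are illustrative."""
    scored = sorted(candidates, key=reward_model, reverse=True)
    return scored[:top_k]

# Hypothetical fine-grained reward signals for four candidate agents.
rewards = {"math_solver": 0.92, "coder": 0.75, "critic": 0.88, "planner": 0.60}
chosen = select_agents(list(rewards), rewards.get, top_k=2)
print(chosen)  # ['math_solver', 'critic']
```

In ReSo this selection sits downstream of task graph generation, so each subtask of the graph gets its own reward-ranked agents.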
Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.
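A toy picture of the multi-turn refine-then-vote loop described above; the `refine` rule is an invented placeholder for critic feedback, not DPSDP's learned actor-critic policy:

```python
from collections import Counter

def refine(answer):
    """Toy critic step: move the current answer halfway toward a fixed
    target, standing in for feedback-driven improvement (numbers invented)."""
    return answer + (63 - answer) // 2

def vote_over_refinements(initial, steps=5):
    # Majority vote over the answers produced across refinement turns,
    # echoing the verify-and-improve loop the paper formalizes as an MDP.
    answers, a = [initial], initial
    for _ in range(steps):
        a = refine(a)
        answers.append(a)
    return Counter(answers).most_common(1)[0][0]

print(vote_over_refinements(50))  # 62
```

The point of the sketch is only the control flow: repeated refinement turns whose outputs are aggregated by majority vote, as in the MATH 500 evaluation.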
Balanced intransitive relationships are critical to the depth of strategy and player retention within esports games. Intransitive relationships comprise the metagame, a collection of strategies and play styles that are viable, each providing counterplay for other viable strategies. This work presents a framework for testing the balance of massive online battle arena (MOBA) games using deep reinforcement learning to identify the synergies between characters by measuring their effectiveness against the other compositions within the game's character roster. This research is designed for game designers and developers to show how multi-agent reinforcement learning (MARL) can accelerate the balancing process and highlight potential game-balance issues during the development process. Our findings conclude that accurate measurements of game balance can be found with under 10 hours of simulation and show imbalances that traditional cost curve analysis approaches failed to capture. Furthermore, we discovered that this approach reduced the imbalance in each character's win rate by 20% in our example project, a key measurement that would previously have been impossible to obtain without collecting data from hundreds of human-controlled games. The project's source code is publicly available at https://github.com/Taikatou/top-down-shooter.
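One simple balance statistic over a simulated win-rate matrix, in the spirit of the measurements above; the characters and rates are made up:

```python
def winrate_imbalance(win_matrix):
    """Average absolute deviation of each character's overall win rate
    from 50%, a simple balance statistic over simulated matchups."""
    devs = []
    for row in win_matrix.values():
        avg = sum(row.values()) / len(row)
        devs.append(abs(avg - 0.5))
    return sum(devs) / len(devs)

# Toy win rates from self-play simulations (row character vs. column).
matrix = {
    "knight": {"mage": 0.70, "rogue": 0.55},
    "mage":   {"knight": 0.30, "rogue": 0.60},
    "rogue":  {"knight": 0.45, "mage": 0.40},
}
print(round(winrate_imbalance(matrix), 3))  # 0.083
```

A designer would rerun MARL self-play after each balance patch and watch this number move toward zero, which is roughly the loop the framework automates.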
An ability to morally reason is crucial to the believability of many fictional characters, from Jane Austen’s heroines to the denizens of The Good Place. These works often foreground the complexity of moral questions and the circumstances under which different forms of behavior might be justified. Morality is also foregrounded in many games, from Black and White to Mass Effect 3. Yet, most in-game characters judge other characters (or the player) based on a single reputation scale or binary values of right and wrong. There has been little exploration in games of the relationship between character values and beliefs and moral reasoning. In keeping with this year’s conference theme, “Oh the Humanity,” this design postmortem paper describes the design and development of Argument Box, a model of moral argumentation and reasoning based on Lakoff’s metaphor theory of moral politics. We describe our design approach, iterations, and authoring concerns, covering what went right and wrong in our attempts to model morality-based argumentation for believable game characters.
Technological developments are the main drivers of global social, economic, and cultural change, including in the rapidly growing gaming industry. The Role-Playing Game (RPG) genre, in which players portray characters in the game's story, is gaining popularity. The application of FSM models and AI technology in character development and RPG game interaction not only results in engaging entertainment but also inspires similar uses in various fields. With AI, characters interact dynamically with players and the environment, and with FSM models governing complex character behavior, the game experience becomes even more immersive. RPG Maker, one of the popular engines, simplifies the process of creating RPG games with an easy-to-use interface. The FSM model is implemented through events and switches, directing storylines and character situations with structured logic. This study analyzes the application of the FSM model in MMORPG games. Through the design, testing, and analysis stages, the FSM proved effective in creating games that combine entertainment with learning. The game invites players to seek out requirements and challenges in order to proceed to the next level. The result is an MMORPG played on a PC with the Windows operating system, providing an educational and entertaining gaming experience.
This study explores the effects of the perspective-taking of non-player characters (NPCs) on enhancing game immersion in prosocial virtual reality (VR) games. Prosocial games are games focusing on helping others. Game researchers have been keen to investigate factors that influence the immersive experience in digital games. Previous studies show that VR allows people to take the perspective of others, inducing empathy and prosocial behaviour in the real world. In this lab-based study, we explore whether and how taking the perspective of other game characters – NPCs in a prosocial VR game – influences players’ in-game empathy towards NPCs and game immersion. Participants first experienced either a robot’s perspective of being destroyed by fire in VR or read a text description about the same event. Then, they participated in a prosocial VR game in which they saved robots. The findings show that perspective-taking experiences indirectly enhance participants’ game immersion via the effects of closeness with the destroyed robot and empathy towards the four robots protected by the player. This indirect effect is moderated by players’ weekly exposure to video games. These results suggest that VR-based perspective-taking of NPCs can indirectly enhance gameplay experiences in prosocial VR games. Theoretical and game design implications are discussed.
No abstract available
Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it still remains challenging for MARL agents to learn cooperative strategies for some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among the agents. However, due to the model size of LLMs, it can be expensive to frequently infer LLMs for actions that agents can take. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task planning capabilities of LLMs to improve the policy learning process of multi-agents in cooperative games. Notably, for each game environment, YOLO-MARL only requires a one-time interaction with LLMs in the proposed strategy generation, state interpretation and planning function generation modules, before the MARL policy training process. This avoids the ongoing costs and computational time associated with frequent LLM API calls during training. Moreover, trained decentralized policies based on normal-sized neural networks operate independently of the LLM. We evaluate our method across two different environments and demonstrate that YOLO-MARL outperforms traditional MARL algorithms. The Github repository of our code can be found at https://github.com/paulzyzy/YOLO-MARL.
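The one-time planning-function idea can be sketched as follows; `planning_fn` plays the role of the code the LLM would generate once per environment, and the reward-shaping bonus is an assumed design, not YOLO-MARL's exact mechanism:

```python
# A planning function of the kind the LLM would generate once per
# environment: it maps a state summary to a high-level goal. The rule
# below is an invented placeholder for the LLM-generated code.
def planning_fn(state):
    if state["ally_hp"] < 0.3:
        return "retreat"
    if state["enemy_visible"]:
        return "engage"
    return "explore"

def shaped_reward(env_reward, state, action_goal):
    # Small bonus when the executed high-level goal matches the plan,
    # so MARL training benefits from the LLM without further API calls.
    bonus = 0.1 if planning_fn(state) == action_goal else 0.0
    return env_reward + bonus

r = shaped_reward(1.0, {"ally_hp": 0.8, "enemy_visible": True}, "engage")
print(r)  # 1.1
```

Because `planning_fn` is plain code, it can be called millions of times during MARL training at no LLM inference cost, which is the framework's central efficiency argument.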
Can social agents be assertive and persuade users? To what extent do the persuasion abilities of robots depend on the users' own traits? In this paper, we describe the results of a study in which participants interacted with robotic Non-Player Characters (NPC) displaying different levels of assertiveness (high and low), in a storytelling scenario. We sought to understand how the level of assertiveness displayed by the robots impacted the participants' decision-making process and game experience. Our results suggest that NPCs displaying lower levels of assertiveness evoke more positive emotional responses but are not more effective at influencing players' decisions when compared to NPCs displaying higher levels of this trait. However, NPCs displaying a personality trait are more effective persuaders than NPCs not displaying this feature. Overall, this paper highlights the importance of considering the player's personality and its relation to task-specific attributes during the process of game design.
Team formation in multi-agent systems usually assumes the capabilities of each team member are known, and the best formation can be derived from that information. As AI agents become more sophisticated, this characterisation is becoming more elusive and less predictive about the performance of a team in cooperative or competitive situations. In this paper, we introduce a general and flexible way of anticipating the outcome of a game for any lineups (the agents, sociality regimes and any other hyperparameters for the team). To this purpose, we simply train an assessor using an appropriate team representation and standard machine learning techniques. We illustrate how we can interrogate the assessor to find the best formations in a pursuit–evasion game for several scenarios: offline team formation, where teams have to be decided before the game and not changed afterwards, and online team formation, where teams can see the lineups of the other teams and can be changed at any time.
No abstract available
Video game testing has become a major investment of time, labor, and expense in the game industry. In particular, the balancing of in-game units, characters, and classes can cause long-lasting issues that persist years after a game's launch. While approaches incorporating artificial intelligence have already shown successes in reducing manual effort and enhancing game development processes, most of these draw on heuristic, generalized, or optimal behavior routines, while actual low-level decisions from individual players and their resulting playing styles are rarely considered. In this article, we apply deep player behavior modeling to turn atomic actions of 213 players from six months of single-player instances within the MMORPG Aion into generative models that capture and reproduce particular playing strategies. In a subsequent simulation, the resulting generative agents ("replicants") were tested against common NPC opponent types of MMORPGs that iteratively increased in difficulty with respect to the primary factor that constitutes each enemy type (Melee, Ranged, Rogue, Buffer, Debuffer, Healer, Tank, or Group). As a result, imbalances between classes as well as strengths and weaknesses regarding particular combat challenges could be identified and regulated automatically.
Playing video games is one way to break up the monotony of an otherwise boring day, yet many players quickly tire of the games available to them. To circumvent this problem, this study develops a strategy that gives non-player characters (NPCs) rapid responses during combat and may boost their abilities. NPCs are among the most important elements of a game: autonomous, adaptive NPCs can change their behavior in reaction to the player's decisions and to conditions in their environment. Previous research has used neural networks to forecast NPC behavior; however, the predicted behavior is not always as desired, leading to poor accuracy. This study addresses that accuracy problem using three input parameters: the NPC's power, the distance between the player and the NPC, and the opponent's power. The test outcomes indicate that the machine learning technique can identify the results of the NPC behavior analysis as well as the level of accuracy reached.
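A decision rule over the three inputs the study names could look like the sketch below. This is purely illustrative, not the paper's learned model: the thresholds, action names, and the rule structure are invented for demonstration; only the three input parameters (NPC power, player-NPC distance, opponent power) come from the abstract.

```python
# Illustrative rule over the study's three inputs (thresholds are invented):
# NPC power, distance between player and NPC, and opponent power.

def npc_decide(npc_power: float, distance: float, opponent_power: float) -> str:
    advantage = npc_power - opponent_power
    if distance > 10.0:
        return "approach"   # too far to engage, close the distance first
    if advantage >= 0:
        return "attack"     # NPC is at least as strong as the opponent
    return "retreat"        # outmatched at close range
```

A learned model would replace this hand-written rule with a classifier fitted to gameplay data, but the input/output contract stays the same.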
Intent recognition refers to obtaining the observations of an agent and then using those observations to reason about its current state and predict its future actions. Behavior modeling, describing the behavior or performance of an agent, is an important research area in intent recognition. However, few studies have combined behavior modeling with intent recognition to investigate its real-world applications. In this paper, we study behavior modeling for intent recognition for cognitive intelligence, aiming to enhance the situational awareness capability of AI and expand its applications in multiple fields. Taking the combat environment and tanks as the research object, and based on the behavior tree and the SBR recognition algorithm, this paper designs the framework and experiments for behavior modeling and intent recognition. First, it uses an evolutionary behavior tree algorithm to autonomously generate a behavior model adapted to the environment. Second, it uses the SBR algorithm to effectively recognize the actions and planned paths of the enemy tank, guiding the agent's own tank in the TankSimV1.20 simulation platform. The results show that the tank survival rate increases by 80% under the guidance of the intent recognition results, and that the method in this paper provides effective guidance for intent-recognition behavior modeling, with broad application prospects.
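The behavior-tree side of such a model can be sketched minimally. This is a generic behavior-tree illustration under invented node names, not the paper's evolved tree: a selector tries children in order until one succeeds, and a sequence succeeds only if all children do.

```python
# Minimal behavior-tree sketch (node names invented) of the kind of model
# evolved for tank agents: selector = fallback, sequence = all-must-succeed.

def selector(*children):
    return lambda state: any(child(state) for child in children)

def sequence(*children):
    return lambda state: all(child(state) for child in children)

def condition(key):
    return lambda state: bool(state.get(key, False))

def action(name, log):
    def run(state):
        log.append(name)   # record the executed action for inspection
        return True
    return run

log = []
tank = selector(
    sequence(condition("enemy_visible"), action("fire", log)),
    action("patrol", log),   # fallback when no enemy is visible
)
tank({"enemy_visible": False})   # no enemy: falls through to patrol
```

An evolutionary algorithm would search over such tree structures, scoring each candidate by simulated performance (e.g. survival rate).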
Video story interaction enables viewers to engage with and explore narrative content for personalized experiences. However, existing methods are limited to user selection, specially designed narratives, and lack customization. To address this, we propose an interactive system based on user intent. Our system uses a Vision Language Model (VLM) to enable machines to understand video stories, combining Retrieval-Augmented Generation (RAG) and a Multi-Agent System (MAS) to create evolving characters and scene experiences. It includes three stages: 1) Video story processing, utilizing VLM and prior knowledge to simulate human understanding of stories across three modalities. 2) Multi-space chat, creating growth-oriented characters through MAS interactions based on user queries and story stages. 3) Scene customization, expanding and visualizing various story scenes mentioned in dialogue. Applied to the Harry Potter series, our study shows the system effectively portrays emergent character social behavior and growth, enhancing the interactive experience in the video story world.
Large Language Models (LLMs) have revolutionized open-domain dialogue agents but encounter challenges in multi-character role-playing (MCRP) scenarios. To address the issue, we present Neeko, an innovative framework designed for efficient multi-character imitation. Neeko employs a dynamic low-rank adapter (LoRA) strategy, enabling it to adapt seamlessly to diverse characters. Our framework breaks down the role-playing process into agent pre-training, multiple-character playing, and character incremental learning, effectively handling both seen and unseen roles. This dynamic approach, coupled with distinct LoRA blocks for each character, enhances Neeko's adaptability to unique attributes, personalities, and speaking patterns. As a result, Neeko demonstrates superior performance in MCRP over most existing methods, offering more engaging and versatile user interaction experiences.
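The per-character adapter routing can be sketched conceptually. This is not Neeko's implementation: the class, the string stand-ins for LoRA weights, and the unseen-role fallback are assumptions; only the idea of one LoRA block per character, swapped dynamically, comes from the abstract.

```python
# Conceptual sketch of dynamic per-character LoRA routing (contents are
# placeholder strings; a real system would swap adapter weight tensors).

class MultiCharacterAgent:
    def __init__(self):
        self.adapters = {}    # character name -> LoRA block (stub)
        self.active = None

    def register(self, character: str, lora_block):
        # Incremental learning hook: new characters add a new adapter
        # without retraining the shared base model.
        self.adapters[character] = lora_block

    def switch(self, character: str):
        # Unseen roles fall back to a default adapter (an assumption here).
        self.active = self.adapters.get(character, "default-adapter")

agent = MultiCharacterAgent()
agent.register("Hermione", "lora-hermione")
agent.switch("Hermione")
```

The design point the sketch isolates is that character identity is a routing key, so adding a role never touches the base model or the other adapters.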
We address the problem of controlling and simulating interactions between multiple physics-based characters, using short unlabeled motion clips. We propose Adversarial Interaction Priors (AIP), a multi-agent generative adversarial imitation learning (MAGAIL) approach, which extends recent deep reinforcement learning (RL) works aimed at imitating single-character example motions. The main contribution of this work is to extend the idea of motion imitation of a single character to interaction imitation between multiple characters. Our method uses a control policy for each character to imitate interactive behaviors provided by short example motion clips, and associates a discriminator with each character, trained on actor-specific interactive motion clips. The discriminator returns interaction rewards that measure the similarity between generated behaviors and those demonstrated in the reference motion clips. The policies and discriminators are trained in a multi-agent adversarial reinforcement learning procedure to improve the quality of the behaviors generated by each agent. Initial results show the effectiveness of our method on the interactive task of shadowboxing between two fighters.
LLM role-playing aims to portray arbitrary characters in interactive narratives, yet existing systems often suffer from limited immersion and adaptability. They typically under-model dynamic environmental information and assume largely static scenes and casts, offering insufficient support for multi-character orchestration, scene transitions, and on-the-fly character introduction. We propose an adaptive multi-agent role-playing framework, AdaMARP, featuring an immersive message format that interleaves [Thought], (Action), and Speech, together with an explicit Scene Manager that governs role-playing through discrete actions (init_scene, pick_speaker, switch_scene, add_role, end) accompanied by rationales. To train these capabilities, we construct AdaRPSet for the Actor Model and AdaSMSet for supervising orchestration decisions, and introduce AdaptiveBench for trajectory-level evaluation. Experiments across multiple backbones and model scales demonstrate consistent improvements: AdaRPSet enhances character consistency, environment grounding, and narrative coherence, with an 8B actor outperforming several commercial LLMs, while AdaSMSet enables smoother scene transitions and more natural role introductions, surpassing Claude Sonnet 4.5 using only a 14B LLM.
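The Scene Manager's discrete-action interface can be pictured as a small dispatcher. Only the five action names come from the abstract; the state fields and handler bodies below are invented placeholders, and a real orchestrator would emit each action with an LLM-generated rationale.

```python
# Sketch of a Scene Manager as a discrete-action dispatcher (handler bodies
# are placeholders; only the action names come from the paper).

class SceneManager:
    def __init__(self):
        self.scene = None     # current scene description
        self.roles = []       # characters currently in the scene
        self.turns = []       # speaker order chosen so far

    def step(self, act: str, arg=None):
        if act == "init_scene":
            self.scene = arg
        elif act == "pick_speaker":
            self.turns.append(arg)
        elif act == "switch_scene":
            self.scene = arg
        elif act == "add_role":
            self.roles.append(arg)   # on-the-fly character introduction
        elif act == "end":
            self.scene = None
        return self.scene

sm = SceneManager()
sm.step("init_scene", "tavern")
sm.step("add_role", "bard")
sm.step("pick_speaker", "bard")
```

Casting orchestration as a small closed action vocabulary is what makes the decisions supervisable from a dataset like AdaSMSet.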
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. It enables evaluation of arbitrary characters across diverse scenarios and prompt formats, as the first benchmark builder in RP area for adaptable assessment. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Reinforcement learning (RL) approaches, particularly Q-learning, have emerged as strong tools for autonomous agent training, allowing agents to acquire optimal decision-making policies through interaction with their surroundings. This research investigates the use of Q-learning in the context of training autonomous agents for robotic soccer, a complex and dynamic arena that necessitates strategic planning, coordination, and adaptation. We studied the learning progress and performance of agents trained using Q-learning in a series of experiments carried out in a simulated soccer setting. During training, agents interacted with the soccer environment, iteratively updating their Q-values in response to observed rewards and behaviors. Despite the high-dimensional and stochastic character of the soccer domain, Q-learning helped the agents develop effective tactics and decision-making capabilities. Notably, our study found that, on average, the agents required 64 steps to reach a stable policy with an average reward of -1.
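The Q-value update the study relies on is the textbook tabular rule. The environment, state and action names, and learning-rate values below are toy stand-ins; only the update rule itself is standard Q-learning.

```python
# Textbook tabular Q-learning update (states, actions, and rewards here are
# toy stand-ins, not the robotic-soccer environment):
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9,
             actions=("kick", "pass")):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)           # unseen (state, action) pairs default to 0.0
q_update(Q, "midfield", "pass", 1.0, "attack")
```

With all Q-values initialized to zero, the first update moves `Q[("midfield", "pass")]` to `alpha * r = 0.5`; repeated interaction propagates discounted future rewards backward through the table.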
Constella: Supporting Storywriters’ Interconnected Character Creation through LLM-based Multi-Agents
Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, balance similarities and differences among characters, and intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters’ interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7–8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters’ thoughts and emotions, and deepened writers’ understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers’ attention and effort across the character cast.
In this study, we explore the application of Large Language Models (LLMs) in \textit{Jubensha}, a Chinese detective role-playing game and a novel area in Artificial Intelligence (AI) driven gaming. We introduce the first dataset specifically for Jubensha, including character scripts and game rules, to foster AI agent development in this complex narrative environment. Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in this game. To evaluate the gaming performance of these AI agents, we developed novel methods measuring their mastery of case information and reasoning skills. Furthermore, we incorporated the latest advancements in in-context learning to improve the agents' performance in information gathering, murderer identification, and logical reasoning. The experimental results validate the effectiveness of our proposed methods. This work aims to offer a novel perspective on understanding LLM capabilities and establish a new benchmark for evaluating large language model-based agents.
No abstract available
Enabling humanoid robots to clean rooms has long been a pursued dream within humanoid research communities. However, many tasks require multi-humanoid collaboration, such as carrying large and heavy furniture together. Given the scarcity of motion capture data on multi-humanoid collaboration and the efficiency challenges associated with multi-agent learning, these tasks cannot be straightforwardly addressed using training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a framework designed to tackle the challenge of multi-humanoid object transportation problem through a two-phase learning paradigm: individual skill learning and subsequent policy transfer. First, a single humanoid character learns to interact with objects through imitation learning from human motion priors. Then, the humanoid learns to collaborate with others by considering the shared dynamics of the manipulated object using centralized training and decentralized execution (CTDE) multi-agent RL algorithms. When one agent interacts with the object, resulting in specific object dynamics changes, the other agents learn to respond appropriately, thereby achieving implicit communication and coordination between teammates. Unlike previous approaches that relied on tracking-based methods for multi-humanoid HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-humanoid interactions, and can be seamlessly extended to include more participants and a wide range of object types.
No abstract available
No abstract available
No abstract available
No abstract available
Recent advances in artificial intelligence technologies have begun to transform the gaming industry, especially in the areas of player-character interaction and narrative development. Traditionally, game stories and character relationships are predefined through scripted dialogues and sequences, which requires developers to invest a lot of time and effort. However, AI-driven approaches such as large language models (LLMs) and deep learning techniques offer a dynamic alternative that can enable more flexible, player-driven interactions and adaptive AI behaviors. This paper comprehensively reviews the current role of AI in game design from multiple perspectives, including applications in multi-agent interaction, procedural level and game content generation, and game development process optimization. In addition, this study explores the advantages and limitations of AI technology, coping with technical challenges, and ethical issues that may arise during the implementation of AI. The results are intended to provide a reference for the future application of AI in game design and provide recommendations for coping with emerging risks.
No abstract available
No abstract available
No abstract available
In most Interactive Storytelling systems, user interaction is based on natural language communication with virtual agents, either through isolated utterances or through dialogue. Natural language communication is also an essential element of interactive narratives in which the user is supposed to impersonate one of the story's characters. Whilst techniques for narrative generation and agent behaviour have made significant progress in recent years, natural language processing remains a bottleneck hampering the scalability of Interactive Storytelling systems. In this paper, we introduce a novel interaction technique based solely on emotional speech recognition. It allows the user to take part in dialogue with virtual actors without any constraints on style or expressivity, by mapping the recognised emotional categories to narrative situations and virtual characters' feelings. Our Interactive Storytelling system uses an emotional planner to drive characters' behaviours. The main feature of this approach is that characters' feelings are part of the planning domain and are at the heart of narrative representations. The emotional speech recogniser analyses the speech signal to produce a variety of features which can be used to define ad hoc categories on which to train the system. The content of our interactive narrative is an adaptation of one chapter of the 19th-century classic novel Madame Bovary, which is well suited to a formalisation in terms of characters' feelings. At various stages of the narrative, the user can address the main character or respond to her, impersonating her lover. The emotional category extracted from the user utterance can be analysed in terms of the current narrative context, which includes characters' beliefs, feelings and expectations, to produce a specific influence on the target character, which becomes visible through a change in its behaviour, achieving a high level of realism for the interaction. A limited number of emotional categories is sufficient to drive the narrative across multiple courses of action, since it comprises over thirty narrative functions. We report results from a fully implemented prototype, both as a proof of concept and in terms of usability through a preliminary user study.
No abstract available
No abstract available
Interactive narrative allows the user to play a role in a story and interact with other characters controlled by the system. Directorial control is a procedure for dynamically tuning the interaction towards the author's desired effects. Most existing approaches for directorial control are built within plot-centric frameworks for interactive narrative and do not have a systematic way to ensure that the characters are always well-motivated during the interaction. Thespian is a character-centric framework for interactive narrative. In our previous work on Thespian, we presented an approach for applying directorial control while not affecting the consistency of characters' motivations. This work evaluates the effectiveness of our directorial control approach. Given the priority of generating only well-motivated characters' behaviors, we empirically evaluate how often the author's desired effects are achieved. We also discuss how the directorial control procedure can save the author effort in configuring the characters.
QuickWoZ: a multi-purpose wizard-of-oz framework for experiments with embodied conversational agents
No abstract available
No abstract available
This report synthesizes the state of research on multi-agent game characters, presenting a complete picture of the field's shift from traditional rule-based architectures to deep reinforcement learning and LLM-driven approaches. At the technical level, the research delivers more efficient communication and coordination algorithms (MARL); at the cognitive level, it introduces LLM frameworks equipped with long-term memory and logical reasoning; and at the socio-psychological level, it examines characters' morality, emotion, and believability in depth. In addition, work on embodied intelligence and multimodal interaction is pushing NPCs from simple scripted control toward complex physical and visual perception. Mature evaluation benchmarks and prototype toolchains provide scientific yardsticks for the field's continued evolution, jointly driving game characters toward greater intelligence, sociality, and human-likeness.