Papers on Agent Populations: Mutual Game-Playing and Reflection
LLM-Driven Strategic Games and Cognitive Reflection Mechanisms
This group of papers explores using large language models (LLMs) as the agent core, tackling complex game tasks through theory of mind (ToM), self-reflection, recursive reasoning, personality modeling, and explicit debate. The studies highlight LLMs' capacity for social reasoning in open environments, law-making, adversarial co-evolution (red-blue teaming), and strategic decision-making in specific strategy games such as Diplomacy and murder mysteries. (A minimal sketch of the recurring act-reflect loop follows the list below.)
- Positive Experience Reflection for Agents in Interactive Text Environments(Philip Lippmann, M. Spaan, Jie Yang, 2024, ArXiv)
- Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning(Sai Wang, Yu Wu, Zhongwen Xu, 2025, ArXiv)
- En-join: Speculative LLM Play for Energy Community Engagement and Sustainability Awareness(Luiz Sachser, Andrés Isaza-Giraldo, Anna Jiskrová, Vanessa Cesário, Pedro F. Campos, Lucas Pereira, Paulo Bala, 2025, Companion Publication of the 2025 ACM Designing Interactive Systems Conference)
- NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making(Asutosh Hota, Jussi P. P. Jokinen, 2025, ArXiv)
- LLM4MAC: An LLM-Driven Reinforcement Learning Framework for MAC Protocol Emergence(Renxuan Tan, Rongpeng Li, Zhifeng Zhao, 2025, 2025 IEEE 26th International Workshop on Signal Processing and Artificial Intelligence for Wireless Communications (SPAWC))
- How Strategic Agents Respond: Comparing Analytical Models with LLM-Generated Responses in Strategic Classification(Tian Xie, Pavan Rauch, Xueru Zhang, 2025, ArXiv)
- Psychologically Enhanced AI Agents(Maciej Besta, Shriram Chandran, Robert Gerstenberger, M. Lindner, Marcin Chrapek, Sebastian Hermann Martschat, Taraneh Ghandi, Patrick Iff, H. Niewiadomski, Piotr Nyczyk, Jürgen Müller, Torsten Hoefler, 2025, ArXiv)
- STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making(Chuanhao Li, Runhan Yang, Tiankai Li, Milad Bafarassat, Kourosh Sharifi, Dirk Bergemann, Zhuoran Yang, 2024, ArXiv)
- BladeRunner-AD: Adversarial Co-evolution of Robust and Interpretable Driving Policies via LLM Agent Games(Ao Guo, Jiachen Hou, Fei-Yue Wang, 2025, 2025 21st IEEE International Conference on Mechatronic and Embedded Systems and Applications (MESA))
- PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games(Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He, 2024, ArXiv)
- DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy(Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao, 2025, ArXiv)
- Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations(Gian Marco Orlando, Jin Ye, Valerio La Gatta, Mahdis Saeedi, Vincenzo Moscato, Emilio Ferrara, Luca Luceri, 2025, ArXiv)
- APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents(Jun Chen, Tao Gao, 2024, ArXiv)
- Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game(Rong Ye, Yongxin Zhang, Yikai Zhang, Haoyu Kuang, Zhongyu Wei, Peng Sun, 2025, ArXiv)
- LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game(Fang Liang, Tianshi ZHENG, Chunkit Chan, Yauwai Yim, Yangqiu Song, 2025, ArXiv)
- Evaluating and Enhancing LLMs Agent Based on Theory of Mind in Guandan: A Multi-Player Cooperative Game Under Imperfect Information(Yauwai Yim, Chunkit Chan, Tianyu Shi, Zheye Deng, Wei Fan, Tianshi ZHENG, Yangqiu Song, 2024, 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT))
- Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics(Trung-Kiet Huynh, Duy-Minh Dao-Sy, Thanh-Bang Cao, Phong-Hao Le, Hong-Dan Nguyen, Phu-Quy Nguyen-Lam, Minh-Luan Nguyen-Vo, Hong-Phat Pham, P. Pham, T. Than, Chi-Nguyen Tran, Huy Tran, Gia-Thoai Tran-Le, Alessio Buscemi, Le Hong Trang, H. Anh, 2025, ArXiv)
- Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game(Ruiqi Dong, Zhixuan Liao, Guangwei Lai, Yuhan Ma, Danni Ma, Chenyou Fan, 2024, ArXiv)
- Personality-Aware Multiagent Large Language Models for Strategic Interactions in Polymatrix Games(Rafik Hadfi, Yin Jou Huang, 2025, Proceedings of the 13th International Conference on Human-Agent Interaction)
- Everyone Contributes! Incentivizing Strategic Cooperation in Multi-LLM Systems via Sequential Public Goods Games(Yunhao Liang, Yuan Qu, Jingyuan Yang, Shaochong Lin, Zuo-Jun Max Shen, 2025, ArXiv)
- Game-theory behaviour of large language models: The case of Keynesian beauty contests(S. Lu, 2025, Economics and Business Review)
- Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames(Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis, 2025, ArXiv)
- Multi-Step Adaptive Attack Agent: A Dynamic Approach for Jailbreaking Large Language Models(Huiyun Jing, Jincheng Wei, Wei Wei, Yingshui Tan, Boren Zheng, Qingsong Yao, 2025, 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI))
- Reasoning and Reflection in the Game of Nomic: Self-Organising Self-Aware Agents with Mutable Rule-Sets(S. Holland, J. Pitt, D. Sanderson, D. Busquets, 2013, 2013 IEEE 7th International Conference on Self-Adaptation and Self-Organizing Systems Workshops)
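As a concrete anchor for the cluster above, here is a minimal, hedged sketch of the act-reflect loop that recurs across these LLM-agent papers (e.g., positive-experience reflection). Everything named here is illustrative: `llm_complete` stands in for whatever chat-completion API a given system uses, and the prompts are placeholders rather than any paper's actual method.

```python
from dataclasses import dataclass, field

def llm_complete(prompt: str) -> str:
    # Stand-in for a real chat-completion call; swap in your provider's client.
    return "placeholder response"

@dataclass
class ReflectiveAgent:
    role: str
    memory: list = field(default_factory=list)  # reflections carried across episodes

    def act(self, observation: str) -> str:
        prompt = (
            f"You are {self.role}.\n"
            "Past reflections:\n" + "\n".join(self.memory) + "\n"
            f"Current observation: {observation}\nChoose an action:"
        )
        return llm_complete(prompt)

    def reflect(self, trajectory: list, reward: float) -> None:
        # Positive-experience reflection: distil what worked into reusable advice.
        prompt = (
            "State one short, reusable lesson from this episode.\n"
            f"Trajectory: {trajectory}\nReward: {reward}"
        )
        self.memory.append(llm_complete(prompt))

agent = ReflectiveAgent(role="a negotiator in a resource-sharing game")
action = agent.act("the opponent offered a 40/60 split")
agent.reflect(trajectory=[("offer", "40/60"), ("response", action)], reward=0.4)
```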
Distributed Nash Equilibrium (NE) Seeking Algorithms and Multi-Agent Control Theory
This cluster focuses on rigorous mathematical methods for finding Nash equilibria in distributed, constrained, or switching-topology settings. The work covers multi-cluster games, high-order nonlinear systems, second-order dynamics, and robustness against attacks (e.g., DoS), providing theoretical proofs of algorithmic convergence, stability, and synchronized control. (A toy pseudo-gradient example follows the list below.)
- Distributed Nash Equilibrium Seeking for Multicluster Game Under Switching Communication Topologies(X. Nian, Fuxi Niu, Zhuo Yang, 2021, IEEE Transactions on Systems, Man, and Cybernetics: Systems)
- Dynamic Nash Equilibrium Seeking for Constrained Noncooperative Game of Open Multiagent Systems(Jing-Zhe Xu, Zhiwei Liu, Ding-Xin He, Zhian Jia, Mingfeng Ge, 2025, IEEE Transactions on Systems, Man, and Cybernetics: Systems)
- Nash Equilibrium Seeking for Multi-Cluster Games of Second-Order Systems Over Weight-Unbalanced Digraphs(X. Nian, Dongxin Liu, Fan Li, 2024, IEEE Transactions on Circuits and Systems II: Express Briefs)
- On the linear convergence of distributed Nash equilibrium seeking for multi-cluster games under partial-decision information(Min Meng, Xiuxian Li, 2020, Autom.)
- Fully Distributed Adaptive Fuzzy Resilient Nash Equilibrium Seeking for Uncertain Multiagent Systems With Prescribed-Time Convergence(Yuanxin Li, Bo Xu, Shaocheng Tong, 2025, IEEE Transactions on Fuzzy Systems)
- Distributed Nash Equilibrium Seeking for Multi-Cluster Aggregative Games over Time-Varying Unbalanced Graphs*(Tianshun Lu, Jingzhao Zhao, Hongzhe Liu, Wenying Xu, Wenwu Yu, 2025, 2025 IEEE 64th Conference on Decision and Control (CDC))
- Generalized Nash equilibrium seeking strategy for distributed nonsmooth multi-cluster game(Xianlin Zeng, Jie Chen, Shu Liang, Yiguang Hong, 2019, Autom.)
- Distributed Fault Tolerant Control for Multi-Agent Systems with Sensor Faults in Non-Cooperative Games(Hao Wang, Xiao Zhang, Hao Luo, Xinyu Qiao, Mingyi Huo, Yuchen Jiang, 2023, 2023 CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS))
- Distributed Nash Equilibrium Seeking in Disturbed Multi-agent Systems Under DoS Attacks(Hebing Zhang, Qun Lu, Zhezhou Shen, 2025, Journal of Optimization Theory and Applications)
- Prescribed-time distributed Nash equilibrium seeking subject to external disturbances(Shuaiyu Zhou, Xintong Ni, Zhenhua Deng, Yiheng Wei, 2024, 2024 14th Asian Control Conference (ASCC))
- Analysis of Equilibrium Points and Convergent Behaviors for Constrained Signed Networks(Qiang Song, Deyuan Meng, Guanghui Wen, Jinde Cao, Fang Liu, 2024, IEEE Transactions on Automatic Control)
- Fully Distributed Robust Adaptive Nash Equilibrium Seeking of High-Order Uncertain Nonlinear Systems(Bo Xu, Yuanxin Li, Zhongsheng Hou, 2025, IEEE Transactions on Systems, Man, and Cybernetics: Systems)
- Distributed Nash equilibrium seeking for aggregative games with coupled constraints(Shu Liang, Peng Yi, Yiguang Hong, 2016, Autom.)
- Game-Based Formation Control of High-Order Multi-Agent Systems(Zhenhua Deng, 2023, IEEE Transactions on Network Science and Engineering)
- Distributed Nash Equilibrium Seeking for Noncooperative Games of High-Order Nonlinear Multi-Agent Systems Over Weight-Unbalanced Digraphs(Zhenhua Deng, Jin Luo, 2021, ArXiv)
- Pursuit-Evasion Game Based Fault-Tolerant Control of Multi-Agent Systems(Xin Yang, Dong Zhao, Wenjing Ren, 2025, 2025 8th International Conference on Robotics, Control and Automation Engineering (RCAE))
- Distributed Nash equilibrium seeking for uncertain linear multi‐agent systems by output feedback integral control(Xianxian Zheng, Lan Wei, Peng Yi, Yutao Tang, 2025, Asian Journal of Control)
- Distributed Nash Equilibrium Seeking Algorithms for Uncertain Linear Multi-agent Systems(Yutao Tang, 2022, 2022 13th Asian Control Conference (ASCC))
- Constrained Aggregative Games with Second-Order Nonlinear Multi-Agent Systems on a Weight-Unbalanced Digraph(Wenyan Tang, Ying Shi, Jia Wu, Peihao Jin, 2025, 2025 44th Chinese Control Conference (CCC))
- Generalized Nash Equilibrium seeking of Second-Order Multi-Agent Systems With External Disturbance(Yong Chen, Jiarui Li, Fuxi Niu, Tao Yu, 2022, 2022 41st Chinese Control Conference (CCC))
- Game-Based Adaptive Optimization Approach for Multi-Agent Systems(Hao Wang, Zheyuan Ning, Hao Luo, Yuchen Jiang, M. Huo, 2023, 2023 IEEE International Conference on Industrial Technology (ICIT))
- Multi-agent differential game based cooperative synchronization control using a data-driven method(Yu-Jing Shi, Yongzhao Hua, Jianglong Yu, Xiwang Dong, Z. Ren, 2022, Frontiers of Information Technology & Electronic Engineering)
- Stability and Stabilization of Nash Equilibrium for Noncooperative Systems With Vector-Valued Payoff Functions(Zehui Guo, T. Hayakawa, Yuyue Yan, 2023, 2023 62nd IEEE Conference on Decision and Control (CDC))
- Collision-Free Formation Control of Multi-agent Systems with Switching Topology under Aggregation Game(Changping Yu, Yang Liu, Wenling Li, 2025, 2025 Joint International Conference on Automation-Intelligence-Safety (ICAIS) & International Symposium on Autonomous Systems (ISAS))
- Distributed Global Nash Equilibrium of Interactive Adversarial Graphical Games(Yizhong Zhang, Bosen Lian, F. L. Lewis, 2025, Journal of Systems Science and Complexity)
- Nash Equilibrium Topology of Multi-Agent Systems With Competitive Groups(Jingying Ma, Yuanshi Zheng, Long Wang, 2017, IEEE Transactions on Industrial Electronics)
- Decentralized tracking-type games for multi-agent systems with coupled ARX models: Asymptotic Nash equilibria(Tao Li, Ji-feng Zhang, 2008, Autom.)
- Pareto-Optimal Coverage Control for Multi-Objective Multi-Agent Systems: A General Sum Game Reinforcement Learning Perspective(Xiwen Ma, Zhihuan Hu, W. Xie, Jingsong Yang, Hongtian Chen, Weidong Zhang, 2024, 2024 IEEE International Conference on Control Science and Systems Engineering (ICCSSE))
- Equilibrium Seeking in Two-Agent Non-Cooperative Dynamic Game with Asymmetric Horizon Length(Taichi Tanaka, Yasuaki Wasa, T. Hayakawa, 2022, 2022 13th Asian Control Conference (ASCC))
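The papers above differ in their dynamics, graph assumptions, and robustness guarantees, but most build on pseudo-gradient play. Below is a toy full-information version for an aggregative game; the quadratic cost and step size are assumptions chosen for illustration, and the distributed algorithms cited here replace the true aggregate `mean(x)` with a consensus estimate exchanged over the communication graph.

```python
import numpy as np

def gradient_play(a, steps=2000, lr=0.05):
    """Pseudo-gradient play on J_i(x) = (x_i - a_i)^2 + x_i * mean(x)."""
    n = len(a)
    x = np.zeros(n)
    for _ in range(steps):
        sigma = x.mean()                      # the aggregate every cost depends on
        grad = 2 * (x - a) + sigma + x / n    # d/dx_i [(x_i - a_i)^2 + x_i * sigma(x)]
        x = x - lr * grad
    return x

a = np.array([1.0, 2.0, 3.0, 4.0])
print(gradient_play(a))  # the unique Nash equilibrium of this strongly monotone game
```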
Game-Theoretic Optimization in Resource Allocation, Energy Management, and Mechanism Design
These papers apply game theory to practical resource scheduling, including microgrid energy management, edge-computing task offloading, V2X communication, satellite networks, and blockchain. They mainly employ Stackelberg games, coalitional games, and Walrasian mechanisms to balance the parties' economic interests, system efficiency, and fairness. (A leader-follower pricing sketch follows the list below.)
- Energy Management Optimization of Microgrid Cluster Based on Multi-Agent-System and Hierarchical Stackelberg Game Theory(Xingwei Dong, Xianshan Li, Shan Cheng, 2020, IEEE Access)
- Distributed Multi-Agent Uplink Resource Scheduling for Space–Air–Ground–Sea Networks: A Game-Theoretic Approach(Ruijing Zhou, Xuedou Xiao, Mozi Chen, Shengkai Zhang, Kezhong Liu, 2026, Journal of Marine Science and Engineering)
- A Game-Theoretic Learning Framework for Multi-Agent Intelligent Wireless Networks(Jinlong Wang, Ximing Wang, Jin Chen, Yijun Yang, Lijun Kong, Luliang Jia, Xin Liu, Yuhua Xu, 2018, ArXiv)
- Privacy-Preserving Intelligent Resource Allocation for Federated Edge Learning in Quantum Internet(Minrui Xu, Dusist Niyato, Zhaohui Yang, Zehui Xiong, Jiawen Kang, Dong In Kim, X. Shen, 2022, IEEE Journal of Selected Topics in Signal Processing)
- Cross-Device Distributed Federated Learning Coalition Formation Game for Constrained IoT(Stephane Durand, K. Khawam, Dominique Quadri, S. Lahoud, Steven Martin, 2025, IEEE Internet of Things Journal)
- Anti-Jamming Resource Allocation in Air-Terrestrial Integrated Networks: A Hierarchical Game-Theoretic MADRL Approach(Jiatao Du, Yifan Xu, Songyi Liu, Hao Han, Hui Tian, Zhibin Feng, Haichao Wang, Yuhua Xu, 2026, IEEE Transactions on Cognitive Communications and Networking)
- A Coalition Formation Game for Cooperative Spectrum Sensing in Cognitive Radio Network under the Constraint of Overhead(Utpala Borgohain, S. Borkotokey, S. Deka, 2021, Int. J. Commun. Networks Inf. Secur.)
- Multi-agent Negotiation Model for Resource Allocation in Grid Environment(Xiaoqin Huang, Linpeng Huang, Minglu Li, 2006, No journal)
- PPA-Game: Characterizing and Learning Competitive Dynamics Among Online Content Creators(Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui, 2024, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2)
- A game-theoretic approach to understand transaction mode selection in electric markets: an evolutionary multi-agent artificial intelligent based algorithm(Ran Ran, Jue Bo, Yubo Liu, Yu Xia, Fei Hu, Nan Hu, 2021, 2021 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE))
- Consumption Pricing Mechanism of Scientific and Technological Resources Based on Multi-Agent Game Theory: An Interactive Analytical Model and Experimental Validation(Fanyin Zheng, F. Gu, Yangjian Ji, Jianfeng Guo, X. Gu, Jin Zhang, 2021, IEICE Trans. Inf. Syst.)
- A new Walrasian mechanism design for optimal pricing and resource allocation in heterogeneous wireless access networks(Vahid Haghighatdoost, S. Khorsandi, 2020, Wireless Networks)
- Game-theoretic analysis of Stackelberg oligopoly with arbitrary rank reflexive behavior of agents(M. Geraskin, 2017, Kybernetes)
- Embracing Complexity: Agent-Based Modeling for HetNets Design and Optimization via Concurrent Reinforcement Learning Algorithms(Mostafa Ibrahim, U. Hashmi, Muhammad Nabeel, A. Imran, S. Ekin, 2021, IEEE Transactions on Network and Service Management)
- Unified Nash Equilibrium Model for Water Management Strategies in Smart Cities(Trinh Bao Ngoc, Nguyen Minh Hieu, Do Ngoc Anh, Vu Huu Thong, To Thanh Thai, Hoang Phuong Thao, L. Chung, Pham Thi Tuyet, 2025, International Research Journal of Multidisciplinary Scope)
- Collaborative Optimal Dispatch of Multi-Agent Distributed Integrated Energy System Based on Game Theory(Hong Fan, Linlin Liu, Yanli Ma, Haowen Luo, Shuyang Chen, Mengmeng Zhuang, 2023, 2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2))
- A Bilevel Approach to Integrated Surgeon Scheduling and Surgery Planning solved via Branch-and-Price(B. Maenhout, Přemysl Šůcha, Viktorie Valdmanová, Ondrej Tkadlec, Jana Thao Rozlivková, 2025, ArXiv)
- Equilibrium Analysis of Service Ecosystems for Labor-Intensive Services Using Multi-Agent Simulation(T. Takenaka, Takahiro Kushida, N. Nishino, K. Kurumatani, 2018, Int. J. Autom. Technol.)
- Files Delivery and Share Optimization in LEO Satellite-Terrestrial Integrated Networks: A NOMA based Coalition Formation Game Approach(Zhixiang Gao, A. Liu, Chen Han, Xiaohu Liang, 2021, IEEE Transactions on Vehicular Technology)
- Game-theoretic resource allocation for malicious packet detection in computer networks(O. Vanek, Zhengyu Yin, Manish Jain, B. Bošanský, Milind Tambe, M. Pechoucek, 2012, No journal)
- Joint Resource Allocation for V2X Communications With Multi-Type Mean-Field Reinforcement Learning(Yue Xu, Xiao Wu, Yi Tang, Jiaxing Shang, Linjiang Zheng, Liang Zhao, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Supermodular game based energy efficient power allocation in heterogeneous small cell networks(Haijun Zhang, Mengying Sun, Keping Long, Min Sheng, Victor C. M. Leung, 2017, 2017 IEEE International Conference on Communications (ICC))
- Incremental Distribution Network Source-load-storage Collaborative Planning Method Considering Uncertainty and Multi-agent Game(Ye He, Nan Yang, Bangtian Dong, Li Ding, Tao Qin, Yuehua Huang, Chen Chen, 2021, 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2))
- Incentive Analysis for Agent Participation in Federated Learning(Lihui Yi, Xiaochun Niu, Ermin Wei, 2025, 2025 IEEE 64th Conference on Decision and Control (CDC))
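Many of the pricing papers above share the same leader-follower (Stackelberg) skeleton: the leader commits to a price, followers best-respond, and the leader optimizes over the induced responses. The sketch below is a hedged illustration, not any cited paper's model: the log-utility demand `v_i*log(1+d) - p*d` and the numbers are invented for the example.

```python
import numpy as np

def follower_demand(p, v):
    # Each follower maximizes v_i*log(1+d) - p*d  =>  d* = max(v_i/p - 1, 0).
    return np.maximum(v / p - 1.0, 0.0)

def leader_best_price(v, price_grid):
    # The leader anticipates the followers' best responses and picks the
    # revenue-maximizing price on a grid (a stand-in for analytic optimization).
    revenues = [p * follower_demand(p, v).sum() for p in price_grid]
    return price_grid[int(np.argmax(revenues))]

v = np.array([2.0, 3.0, 5.0])                 # illustrative follower valuations
grid = np.linspace(0.1, 5.0, 500)
p_star = leader_best_price(v, grid)
print(p_star, follower_demand(p_star, v))     # equilibrium price and demands
```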
Coalition Formation and Dynamic Grouping Game Mechanisms
This group studies how agents autonomously form coalitions or groups to maximize utility. It covers hedonic games, overlapping coalition formation (OCF), and stable grouping strategies with incentive-compatible designs in scenarios such as crowdsourcing, D2D networks, NOMA communication, and UAV data collection. (A Nash-stability sketch follows the list below.)
- Balancing Cooperation and Competition: Selfish Worker Coalition Formation in Spatial Crowdsourcing(Liang Wang, Shan Su, Rong Cheng, Dingqi Yang, Lianbo Ma, Fei Xiong, Bin Guo, Zhiwen Yu, 2025, ACM Transactions on Intelligent Systems and Technology)
- Generalized User Grouping in NOMA Based on Overlapping Coalition Formation Game(Weichao Chen, Shengjie Zhao, Rongqing Zhang, Liuqing Yang, 2021, IEEE Journal on Selected Areas in Communications)
- Coalitional Formation-Based Group-Buying for UAV-Enabled Data Collection: An Auction Game Approach(Nan Qi, Zanqi Huang, Wen Sun, Shi Jin, Xiang Su, 2023, IEEE Transactions on Mobile Computing)
- A Coalition Formation Game Approach for Personalized Federated Learning(Leijie Wu, Song Guo, Yao-Xiang Ding, Yufeng Zhan, J. Zhang, 2022, ArXiv)
- Context-aware group buying in D2D networks: An overlapping coalition formation game approach(Lang Ruan, Jin Chen, Yueming Qiu, Xin Liu, Yuli Zhang, Xucheng Zhu, Yuhua Xu, 2017, 2017 IEEE 17th International Conference on Communication Technology (ICCT))
- Slot Allocation Protocol for UAV Swarm Ad Hoc Networks: A Distributed Coalition Formation Game Approach(Liubin Song, Daoxing Guo, 2025, Entropy)
- Nash Stable Outcomes in Fractional Hedonic Games: Existence, Efficiency and Computation(Vittorio Bilò, A. Fanelli, M. Flammini, G. Monaco, L. Moscardelli, 2018, J. Artif. Intell. Res.)
- Strategic group formation in agent-based simulation(Andrew J. Collins, Erika F. Frydenlund, 2016, SIMULATION)
- Proximity-based group formation game model for community detection in social network(Yuyao Wang, Jie Cao, Zhan Bu, Jiuchuan Jiang, Huanhuan Chen, 2021, Knowl. Based Syst.)
- Structural Stability of a Family of Spatial Group Formation Games(Chenlan Wang, Mehrdad Moharrami, Kun Jin, David Kempe, P. Brantingham, Mingyan Liu, 2024, IEEE Transactions on Network Science and Engineering)
- A Novel Hedonic Coalition Formation Game for Spectrum Shared Communication in CBRS Band(Seungkeun Park, Zhenyu Cao, Hu Jin, Swades De, Jun-Bae Seo, 2025, 2025 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN))
- Coalition Formation Game for Cost-Efficient Multiparty Payment Channel in Payment Channel Networks(Wooseong Kim, 2023, Sensors (Basel, Switzerland))
- Overlapping Coalition Formation Game via Multi-Objective Optimization for Crowdsensing Task Allocation(Yanming Fu, Xiao Liu, Weigeng Han, Shenglin Lu, Jiayuan Chen, Tianbing Tang, 2023, Electronics)
- A Repeated Coalition Formation Game for Physical Layer Security Aware Wireless Communications With Third-Party Intelligent Reflecting Surfaces(Haipeng Zhou, Ruoyang Chen, Changyan Yi, Jianjun Zhang, Jiawen Kang, Jun Cai, Mohsen Guizani, 2025, IEEE Transactions on Wireless Communications)
- A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs(Haipeng Zhou, Ruoyang Chen, Changyan Yi, Juan Li, Jun Cai, 2024, 2024 IEEE Wireless Communications and Networking Conference (WCNC))
- Group Formation through Game Theory and Agent-Based Modeling: Spatial Cohesion, Heterogeneity, and Resource Pooling(Chenlan Wang, Jimin Han, Diana Jue-Rajasingh, 2025, ArXiv)
- A Group Formation Game for Local Anomaly Detection(Zixin Ye, Tansu Alpcan, Christopher Leckie, 2023, 2023 62nd IEEE Conference on Decision and Control (CDC))
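The core solution concept behind many of these papers is Nash stability: no agent can improve its utility by unilaterally switching coalitions. A minimal sketch for an additively separable hedonic game follows; the weight matrix is an invented example, and with symmetric weights the better-response dynamic is a potential game, so it terminates.

```python
import numpy as np

def utility(i, members, w):
    # Additively separable: i's utility is the sum of weights to coalition mates.
    return sum(w[i][j] for j in members if j != i)

def nash_stable_partition(w):
    n = len(w)
    label = list(range(n))              # start from singleton coalitions
    improved = True
    while improved:
        improved = False
        for i in range(n):
            groups = {}
            for j in range(n):
                groups.setdefault(label[j], []).append(j)
            best_c, best_u = label[i], utility(i, groups[label[i]], w)
            for c, members in groups.items():
                if c == label[i]:
                    continue
                u = utility(i, members + [i], w)
                if u > best_u:
                    best_c, best_u = c, u
            if best_u < 0:              # breaking off alone gives utility 0
                best_c, best_u = max(label) + 1, 0
            if best_c != label[i]:
                label[i] = best_c
                improved = True
    return label

w = np.array([[0, 3, -1],
              [3, 0, 2],
              [-1, 2, 0]])
print(nash_stable_partition(w))         # here all three agents join one coalition
```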
Learning Dynamics and Strategic Games in Multi-Agent Reinforcement Learning (MARL)
This cluster examines the deep integration of MARL with game theory, covering self-play, reward redistribution, risk-aware decision-making, belief convergence under Bayesian learning, and game-theoretic remedies for MARL's non-stationarity, scalability, and exploration-exploitation trade-offs. (A self-play example follows the list below.)
- Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory(Stefanos Leonardos, G. Piliouras, 2020, No journal)
- A Coordination Optimization Framework for Multi-Agent Reinforcement Learning Based on Reward Redistribution and Experience Reutilization(Bo Yang, Linghang Gao, Fan Zhou, Hongge Yao, Yanfang Fu, Zelong Sun, Feng Tian, H. Ren, 2025, Electronics)
- Strategic Interaction Multi-Agent Deep Reinforcement Learning(Wenhong Zhou, Jie Li, Yiting Chen, Lincheng Shen, 2020, IEEE Access)
- Risk-Aware Multi-Agent Multi-Armed Bandits(Qi Shao, Jiancheng Ye, John C. S. Lui, 2024, Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing)
- Accelerating Nash Q-Learning with Graphical Game Representation and Equilibrium Solving(Yunkai Zhuang, Xingguo Chen, Yang Gao, Yujing Hu, 2019, 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI))
- Credible Negotiation for Multi-agent Reinforcement Learning in Long-term Coordination(Tianlong Gu, Taihang Zhi, Xuguang Bao, Liang Chang, 2025, ACM Transactions on Autonomous and Adaptive Systems)
- ADAGE: A generic two-layer framework for adaptive agent based modelling(Benjamin Patrick Evans, Sihan Zeng, Sumitra Ganesh, Leo Ardon, 2025, No journal)
- Nash Equilibrium-Driven Adaptive Behavior in Swarm Intelligence with Self-Organizing Maps(Iftekher S Chowdhury, Hardique Dasore, B. Akhouri, Eric Howard, 2025, Journal of Information Systems Engineering and Management)
- Joint Task Offloading and Resource Optimization in NOMA-based Vehicular Edge Computing: A Game-Theoretic DRL Approach(Xincao Xu, Kai Liu, Penglin Dai, Feiyu Jin, Hualing Ren, Choujun Zhan, Songtao Guo, 2022, ArXiv)
- Multi-Agent Reinforcement Learning-Based Task Offloading with Evolutionary Game Theory for User Association in Edge Computing(Haoran Liu, Deqiang Li, 2026, 2026 6th International Conference on Consumer Electronics and Computer Engineering (ICCECE))
- Game Theory and Multi–Agent DRL Based Anti-Jamming Transmission for Integrated Air-Ground Network(Chengjian Liao, Kui Xu, Guojie Hu, Xiaochen Xia, Chen Wei, Wei Xie, Chunguo Li, Yurong Wang, 2024, IEEE Transactions on Vehicular Technology)
- Pareto Actor-Critic for Communication and Computation Co-Optimization in Non-Cooperative Federated Learning Services(Renxuan Tan, Rongpeng Li, Xiaoxue Yu, Xianfu Chen, Xing Xu, Zhifeng Zhao, 2025, IEEE Transactions on Mobile Computing)
- An End-Edge-Cloud Collaborative Scheduling Method Based on Deep Reinforcement Learning and Attention Mechanism Integration(Xin Li, Di Fang, Miao Yu, Sai Luo, Hongchen Liu, Baichuan Lin, 2025, 2025 7th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI))
- Multiagent Federated Reinforcement Learning for Secure Incentive Mechanism in Intelligent Cyber–Physical Systems(Minrui Xu, Jialiang Peng, B. B. Gupta, Jiawen Kang, Zehui Xiong, Zhenni Li, A. El-latif, 2021, IEEE Internet of Things Journal)
- Self-Organised Sequential Multi-Agent Reinforcement Learning for Closely Cooperation Tasks(Hao Fu, Mingyu You, Hongjun Zhou, Bin He, 2025, IEEE Robotics and Automation Letters)
- Is Knowledge Power? On the (Im)possibility of Learning from Strategic Interaction(Nivasini Ananthakrishnan, Nika Haghtalab, Chara Podimata, Kunhe Yang, 2024, ArXiv)
- Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System(Chin-wing Leung, Shuyue Hu, Ho-fung Leung, 2021, 2020 25th International Conference on Pattern Recognition (ICPR))
- Multi-agent Bayesian Learning with Adaptive Strategies: Convergence and Stability(Manxi Wu, Saurabh Amin, A. Ozdaglar, 2020, ArXiv)
- Monte-Carlo Search for an Equilibrium in Dec-POMDPs(Yang You, Vincent Thomas, F. Colas, O. Buffet, 2023, ArXiv)
- Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games(Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi, 2025, ArXiv)
- Enhancing Game Strategy Optimization Using Deep Reinforcement Learning(Jinhan Meng, 2025, IEEE Access)
- Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity(M. Garnelo, Wojciech M. Czarnecki, Siqi Liu, Dhruva Tirumala, Junhyuk Oh, G. Gidel, H. V. Hasselt, David Balduzzi, 2021, ArXiv)
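To make the self-play theme concrete, here is an exponential-weights (Hedge) self-play loop on a zero-sum matrix game; this is a textbook construction, not any cited paper's algorithm. The classic no-regret result is that the time-averaged strategies approach a Nash equilibrium, here (0.5, 0.5) in matching pennies.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])      # row player's payoff (matching pennies)

def self_play(A, steps=5000, eta=0.05):
    x = np.ones(2) / 2                         # row player's mixed strategy
    y = np.ones(2) / 2                         # column player's mixed strategy
    x_avg, y_avg = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        x = x * np.exp(eta * (A @ y));   x /= x.sum()    # row maximizes x' A y
        y = y * np.exp(-eta * (A.T @ x)); y /= y.sum()   # column minimizes it
        x_avg += x
        y_avg += y
    return x_avg / steps, y_avg / steps

print(self_play(A))   # both averages near [0.5, 0.5], the unique equilibrium
```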
Evolutionary Game Theory (EGT) and the Emergence of Collective Behavior
These works use evolutionary games to analyze how strategies evolve in large populations over long-run interaction. Topics include the emergence of cooperation, social-media governance, language competition, vaccination decisions, and innovation diffusion on complex networks, along with the impact of behavioral-economics factors (e.g., loss aversion) on evolutionary stability. (A replicator-dynamics sketch follows the list below.)
- Understanding Emergent Behaviours in Multi-Agent Systems with Evolutionary Game Theory(H. Anh, 2022, ArXiv)
- Strategy Competition Dynamics of Multi-Agent Systems in the Framework of Evolutionary Game Theory(Jianlei Zhang, M. Cao, 2020, IEEE Transactions on Circuits and Systems II: Express Briefs)
- Multi Agent Path Finding using Evolutionary Game Theory(Sheryl Paul, Jyotirmoy V. Deshmukh, 2022, ArXiv)
- An evolutionary game theory-based optimal scheduling strategy for multi-agent distribution network operation considering voltage management(Ji-Won Lee, Mun-Kyeom Kim, 2022, IEEE Access)
- Research on the strategy for improving the utility of government social media information based on a multi-agent game model(Yi Feng, Shanshan Zhang, Xiaoyang Sun, 2024, Journal of Information Science)
- Dynamics of Task Allocation Based on Game Theory in Multi-Agent Systems(Chun-yan Zhang, Qiaoyu Li, Yuying Zhu, Jianlei Zhang, 2019, IEEE Transactions on Circuits and Systems II: Express Briefs)
- Agent-Network-Computation-Based Evolutionary Game Model in Language Competition(Hongrun Wu, Qiurong Wu, Zhenglong Xiang, Xiang Zhang, Lei Zhang, Yingpin Chen, Hui Wang, Jianhua Song, 2024, IEEE Transactions on Computational Social Systems)
- Dynamic Model of Collaboration in Multi-Agent System Based on Evolutionary Game Theory(Zhuozhuo Gou, Yansong Deng, 2021, Games)
- Epidemic prevalence information on social networks can mediate emergent collective outcomes in voluntary vaccine schemes(Anupama Sharma, Shakti N. Menon, V. Sasidevan, S. Sinha, 2017, PLOS Computational Biology)
- Bio-inspired Collective Decision-making on a Multi-Population(Wouter Baar, Leonardo Stella, Dario Bauso, 2025, 2025 IEEE 64th Conference on Decision and Control (CDC))
- Learning Innovation Diffusion as Complex Adaptive Systems through Model Building, Simulation, Game Play and Reflections(Jun Huang, Manu Kapur, 2012)
- Equilibrium seeking of higher-order networks under facet cover constraints.(Shaoyuan Niu, Xiang Li, 2024, Chaos)
- Loss-Averse Behavior May Destabilize Nash Equilibrium: Generalized Stability Results for Noncooperative Agents(Yuyue Yan, T. Hayakawa, 2021, 2021 60th IEEE Conference on Decision and Control (CDC))
- Stability Analysis of Nash Equilibrium in Loss-Aversion-Based Noncooperative Dynamical Systems(Yuyue Yan, T. Hayakawa, Nutthanun Thanomvajamun, 2019, 2019 IEEE 58th Conference on Decision and Control (CDC))
- Stability Analysis of Nash Equilibrium for Two-Agent Loss-Aversion-Based Noncooperative Switched Systems(Yuyue Yan, T. Hayakawa, 2020, IEEE Transactions on Automatic Control)
- Misspecified learning and evolutionary stability(Kevin He, Jonathan Libgober, 2025, J. Econ. Theory)
- Emergent behaviours in multi-agent systems with Evolutionary Game Theory(H. Anh, 2022, AI Communications)
- Explore emission reduction strategy and evolutionary mechanism under central environmental protection inspection system for multi-agent based on evolutionary game theory(Da-Hae Chong, Na Sun, 2020, Comput. Commun.)
- A Multi-Agent Symbiotic Evolution Model and Simulation Research of the Entrepreneurial Ecosystem(Xinyue Qin, Haiqing Hu, Tong Shi, 2026, Systems)
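The workhorse model behind most of these papers is replicator dynamics: a strategy's population share grows in proportion to its fitness advantage over the population average. The sketch below uses an illustrative hawk-dove payoff matrix (V=2, C=3), chosen only to show convergence to the mixed ESS x_hawk = V/C.

```python
import numpy as np

A = np.array([[(2 - 3) / 2, 2.0],        # hawk vs hawk, hawk vs dove
              [0.0,         1.0]])       # dove vs hawk, dove vs dove

def replicator(x, A, dt=0.01, steps=5000):
    for _ in range(steps):
        f = A @ x                        # fitness of each strategy
        x = x + dt * x * (f - x @ f)     # dx_i/dt = x_i * (f_i - mean fitness)
        x = np.clip(x, 0, None)
        x /= x.sum()
    return x

print(replicator(np.array([0.9, 0.1]), A))   # converges to x_hawk = V/C = 2/3
```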
Social Games, Human-Agent Collaboration, and Cybersecurity Defense
This group explores the social dimensions of games together with adversarial defense: trust dynamics in human-agent interaction, maximization of social power, vote manipulation, and defense games under malicious attack (e.g., the FlipIt and Byzantine models). The aim is to use game-theoretic modeling to improve social welfare and system security. (A security-game LP sketch follows the list below.)
- Strategic Human-Agent Interaction: From Promoting Traffic Safety to Search and Rescue(Ariel Rosenfeld, 2018, No journal)
- Strategic Mitigation of Agent Inattention in Drivers with Open-Quantum Cognition Models(Qizi Zhang, V. S. S. Nadendla, S. Balakrishnan, J. Busemeyer, 2021, ArXiv)
- Human-agent coordination in a group formation game(Tuomas Takko, Kunal Bhattacharya, Daniel Monsivais, K. Kaski, 2021, Scientific Reports)
- General Opinion Formation Games with Social Group Membership (Discussion Paper)(Vittorio Bilò, Diodato Ferraioli, Cosimo Vinci, 2022, No journal)
- Simulation Model Of Stampede Accident Based On Multi-Agent And Game Theory(Sui Yang, Baoyun Wang, Linna Li, 2021, 2021 2nd Asia Service Sciences and Software Engineering Conference)
- Democratic Forking: Choosing Sides with Social Choice(Ben Abramowitz, Edith Elkind, D. Grossi, E. Shapiro, Nimrod Talmon, 2021, ArXiv)
- A multi-agent game for studying human decision-making(Han Yu, Xinjia Yu, Sufang Lim, Jun Lin, Zhiqi Shen, C. Miao, 2014, No journal)
- Empirical Analysis of Reputation-aware Task Delegation by Humans from a Multi-agent Game(Han Yu, Han Lin, Sufang Lim, Jun Lin, Zhiqi Shen, C. Miao, 2015, No journal)
- Cognitive Model of Trust Dynamics Predicts Human Behavior within and between Two Games of Strategic Interaction with Computerized Confederate Agents(Michael G. Collins, I. Juvina, K. Gluck, 2016, Frontiers in Psychology)
- Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics(Ardian Selmonaj, Alessandro Antonucci, Adrian Schneider, Michael Rüegsegger, Matthias Sommer, 2025, ArXiv)
- A Multi-Layered AI-Driven Cybersecurity Architecture: Integrating Entropy Analytics, Fuzzy Reasoning, Game Theory, and Multi-Agent Reinforcement Learning for Adaptive Threat Defense(Eram Fatima Siddiqui, Mohammad Haleem, Sheikh Fahad Ahmad, Amina Salhi, Abu Taha Zamani, Naushad Varish, 2025, IEEE Access)
- PoolFlip: A Multi-Agent Reinforcement Learning Security Environment for Cyber Defense(Xavier Cadet, Simona Boboila, Sie Hendrata Dharmawan, Alina Oprea, Peter Chin, 2025, ArXiv)
- Teamwork Makes the Defense Work: Comprehensive Vulnerability Defense Resource Allocation(Siyu Liu, R. Bazzi, Fei Fang, Tiffany Bao, 2025, No journal)
- BAR Nash Equilibrium and Application to Blockchain Design(Maxime Reynouard, R. Laraki, Olga Gorelkina, 2024, Games Econ. Behav.)
- Mechanism Design for Defense Coordination in Security Games(Jiarui Gan, Edith Elkind, Sarit Kraus, M. Wooldridge, 2020, No journal)
- GUARDS: game theoretic security allocation on a national scale(J. Pita, Milind Tambe, Christopher Kiekintveld, Shane Cullen, Erin Steigerwald, 2011, No journal)
- Maximizing social power in multiple independent Friedkin-Johnsen models(Lingfei Wang, Yu Xing, Claudio Altafini, K. H. Johansson, 2024, 2024 European Control Conference (ECC))
- Controlling Segregation in Social Network Dynamics as an Edge Formation Game(Rui Luo, Buddhika Nettasinghe, V. Krishnamurthy, 2021, IEEE Transactions on Network Science and Engineering)
- Strategic Majoritarian Voting with Propositional Goals(Arianna Novaro, Umberto Grandi, Dominique Longin, E. Lorini, 2019, No journal)
- Practical Specification of Belief Manipulation in Games(Markus Eger, Chris Martens, 2021, No journal)
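Deployed security games such as GUARDS compute a defender commitment: coverage probabilities that make the attacker's best target as unprofitable as possible. The linear program below is a zero-sum simplification of the general-sum Stackelberg models in this literature, with made-up payoffs, offered only to show the structure.

```python
import numpy as np
from scipy.optimize import linprog

U_uncov = np.array([10.0, 6.0, 4.0])    # attacker payoff if target t is uncovered
U_cov = np.array([-2.0, -1.0, 0.0])     # attacker payoff if target t is covered
m = 1.0                                 # defender's expected-coverage budget
n = len(U_uncov)

# Variables z = [c_1..c_n, v]; minimize v subject to
# c_t*U_cov[t] + (1-c_t)*U_uncov[t] <= v for every target t, and sum(c) <= m.
rows = np.hstack([np.diag(U_cov - U_uncov), -np.ones((n, 1))])
budget = np.hstack([np.ones(n), 0.0]).reshape(1, -1)
res = linprog(c=np.hstack([np.zeros(n), 1.0]),
              A_ub=np.vstack([rows, budget]),
              b_ub=np.hstack([-U_uncov, m]),
              bounds=[(0, 1)] * n + [(None, None)])
print(res.x[:n], res.x[-1])   # optimal coverage and the attacker's best payoff
```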
Game Modeling in Specific Engineering Domains and Large-Scale Systems
This cluster covers mean-field games (MFG) for taming the curse of dimensionality in large-scale systems, along with game-theoretic applications in specific industrial settings such as UAV swarm control, autonomous driving, Formula 1 racing, and HVAC optimization. These studies typically combine model predictive control (MPC) with multi-objective optimization. (A mean-field fixed-point sketch follows the list below.)
- Large-Scale Multi-Agent System Optimization with Fixed Final Density Constraints: An Imbalanced Mean-Field Game Theory(Shawon Dey, Hao Xu, 2024, 2024 American Control Conference (ACC))
- Balancing Efficiency and Complexity in Large Scale Multi-Agent Optimization: A Reconfigurable Mean Field Game Approach(Shawon Dey, Lijun Qian, Hao Xu, 2025, 2025 IEEE Conference on Control Technology and Applications (CCTA))
- Mathematical Framework for ABM-MARL Integration in Financial Systems: A Discrete Multi-Agent Population-Strategy Game Approach(Manas Panda, Bhakta Samal, 2025, International Journal For Multidisciplinary Research)
- Multi-Agent Optimal Control for Central Chiller Plants Using Reinforcement Learning and Game Theory(Shunian Qiu, Zhenhai Li, Zhihong Pang, Zhengwei Li, Yinying Tao, 2023, Syst.)
- Game theory-based MPC control strategy for path following of reconfigurable unmanned ground vehicle: A multi-agent control method(Tiezhen Wang, Xu Yang, Minghao Ma, Hangjie Cen, Zhangzhen Deng, 2025, Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering)
- Multi-Objective Scheduling for Green Flexible Assembly Job-Shop System via Multi-Agent Deep Reinforcement Learning With Game Theory(Xiao Wang, Zhongyuan Liang, Peisi Zhong, Dongmin Li, Hongqi Li, Mei Liu, 2025, IEEE Access)
- Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games(Jiequn Han, Ruimeng Hu, 2019, No journal)
- Cooperative Control of Multi-UAV for Multi-Targets Encirclement and Tracking Based on Potential Game(Jiangbo Jia, Xin Chen, Weizhen Wang, Hongjin Liao, Guangyuan Zhu, 2023, 2023 42nd Chinese Control Conference (CCC))
- Parallel multi-speed Pursuit-Evasion Game algorithms(Renato Fernando dos Santos, R. Ramachandran, Marcos A. M. Vieira, G. Sukhatme, 2023, Robotics Auton. Syst.)
- An LSTM-based Game Theory Method for Multi-Agent DecisionMaking in Highway Scenarios(Siyuan Hu, Chaojie Zhang, J. Wang, 2023, 2023 American Control Conference (ACC))
- A multi-agent path planning algorithm based on game theory and reinforcement learning(Wenbo Xiong, Lei Guo, T. Jiao, 2024, Journal of Shenzhen University Science and Engineering)
- Game Theory in Formula 1: Multi-agent Physical and Strategical Interactions(Giona Fieni, Marc-Philippe Neumann, Francesca Furia, Alessandro Caucino, Alberto Cerofolini, V. Ravaglioli, C. Onder, 2025, ArXiv)
- Solving multi-agent games on networks(Yair Vaknin, A. Meisels, 2025, Autonomous Agents and Multi-Agent Systems)
- Research on multi-agent collaborative hunting algorithm based on game theory and Q-learning for a single escaper(Yanbin Zheng, Wenxin Fan, Mengyun Han, 2020, Journal of Intelligent & Fuzzy Systems)
- VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play(Zelai Xu, Chao Yu, Ruize Zhang, Huining Yuan, Xiangmin Yi, Shilong Ji, Chuqi Wang, Wenhao Tang, Yu Wang, 2025, ArXiv)
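The defining feature of the mean-field approach above is that each agent optimizes against the population distribution rather than against every other agent, and the equilibrium is a fixed point of that coupling. A toy route-choice version is sketched below; the base costs, congestion weights, and logit response are illustrative assumptions, not a discretization of any cited paper's HJB-FPK system.

```python
import numpy as np

base = np.array([1.0, 1.5, 2.0])        # free-flow cost of each route
cong = np.array([4.0, 2.0, 1.0])        # congestion sensitivity of each route

def mfg_fixed_point(beta=5.0, damping=0.5, iters=500):
    mu = np.ones(3) / 3                  # initial population distribution
    for _ in range(iters):
        cost = base + cong * mu                      # cost given the crowd
        br = np.exp(-beta * cost)
        br /= br.sum()                               # logit best response to mu
        mu = (1 - damping) * mu + damping * br       # damped fixed-point update
    return mu, base + cong * mu

mu, cost = mfg_fixed_point()
print(mu, cost)   # at equilibrium, heavily used routes have (nearly) equal costs
```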
Taken together, these groups span the research frontier of population-level agent games and reflection. The landscape is distinctly layered: at the base, distributed control theory and Nash equilibrium seeking supply the mathematical foundations; in the middle, multi-agent reinforcement learning and evolutionary game dynamics drive adaptive strategy evolution and emergent collective behavior; at the top, large language models (LLMs) endow agents with theory of mind, strategic reflection, and social reasoning. These theories have been widely applied in energy management, cybersecurity, human-agent collaboration, and large-scale industrial scheduling, reflecting a trajectory from "rational computation" toward "cognitive reflection" and "social coordination".
A total of 223 related references.
No abstract available
SUMMARY In the context of Web 2.0, the interaction between users and resources is more and more frequent in the process of resource sharing and consumption. However, the current research on resource pricing mainly focuses on the attributes of the resource itself, and does not weigh the interests of the resource sharing participants. In order to deal with these problems, the pricing mechanism of resource-user interaction evaluation based on multi-agent game theory is established in this paper. Moreover, the user similarity, the evaluation bias based on link analysis and punishment of academic group cheating are also included in the model. Based on the data of 181 scholars and 509 articles from the Wanfang database, this paper conducts 5483 pricing experiments for 13 months, and the results show that this model is more e ff ective than other pricing models - the pricing accuracy of resource resources is 94.2%, and the accuracy of user value evaluation is 96.4%. Besides, this model can intuitively show the relationship within users and within resources. The case study also exhibits that the user’s knowledge level is not positively correlated with his or her authority. Discovering and punishing academic group cheating is conducive to objectively evaluating researchers and resources. The pricing mechanism of scientific and technological resources and the users proposed in this paper is the premise of fair trade of scientific and technological resources.
Government social media (GSM) has become an important tool for government departments to open information, guide public opinion and interact with the government and the people. However, the operation and maintenance of some GSM are not standardised, and the content published is inconsistent with identity positioning, resulting in the realistic dilemma of low utility of GSM information. The purpose of this study is to explore the effective strategies to improve the effectiveness of GSM information. The research is from the perspective of information economics, this article uses evolutionary game theory to build a tripartite evolutionary game model comprising GSM operations departments, government regulators and users in order to explore the evolution process of tripartite game behaviours and the influence of subject behaviour selection on information utility. It subsequently conducts a solution and numerical simulation to demonstrate the influence of different factors on the game results. The experimental results show that there are four situations in which the utility of GSM information affects the evolution and stability strategy of the subject and that changes in different parameter values have significant effects on the results of the three-party game. The evolution trend of the subject behaviour can be changed by increasing the regulatory means of rewards and punishments and establishing an efficient operation mechanism for GSM, thus promoting system convergence to the ideal state. The results of this study can provide references and suggestions for government departments to effectively enhance the effectiveness of GSM information and promote the healthy development of GSM.
In discrete manufacturing systems, production scheduling techniques are urgently needed with lacking dynamic response-ability, inferior real-time decision-making, and sharply increasing uncertainty in the production process. In this regard, this paper presents the flexible assembly job-shop scheduling (FAJS) problem that incorporates job shop and assembly shop. And, a multi-agent deep reinforcement learning (MA-DRL) system with game theory is proposed to solve the proposed FAJS problem aiming at minimizing the Makespan, total energy consumption, and human factor’s comfort. A mathematical model is formulated to describe the FAJS problem, which then is translated into a Markov Decision Process (MDP) where an agent directly selects behavioral policies according to the processing state of the current decision point. The processing state feature data that uses a deep convolutional neural network to fit the value function is extracted from three matrices including the processing time matrix, task assignment Boolean matrix, and an adjacency matrix. Simple Constructive Heuristics (SCH) are used as candidate action for scheduling decisions. For multi-objective optimization into a reward strategy problem, a game model combining Nash equilibrium and Pareto optimality is established to obtain an objective unification solution. A multi-agent deep deterministic policy gradient (MA-DDPG) framework is designed to train the proposed MA-DRL model. The case ‘MK’ with four assembly constraints is formed as the novel datasets. These datasets and a case of the plunger pump production workshop scheduling scenario are designed to analyze the basic performance of this algorithm. The results show that the proposed algorithm compares with other single-indicator optimal algorithms with 9.47 per cent, 12.66 per cent and 22.86 per cent respectively for the three-evaluation metrics (GD, and HV), which contributed to solving the practical production problem of the manufacturing industry.
As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive and multi-agent systems and human societies, understanding their strategic behaviour has profound implications for safety, coordination, and the design of AI-driven social and economic infrastructures. Assessing such behaviour requires methods that capture not only what LLMs output, but the underlying intentions that guide their decisions. In this work, we extend the FAIRGAME framework to systematically evaluate LLM behaviour in repeated social dilemmas through two complementary advances: a payoff-scaled Prisoners Dilemma isolating sensitivity to incentive magnitude, and an integrated multi-agent Public Goods Game with dynamic payoffs and multi-agent histories. These environments reveal consistent behavioural signatures across models and languages, including incentive-sensitive cooperation, cross-linguistic divergence and end-game alignment toward defection. To interpret these patterns, we train traditional supervised classification models on canonical repeated-game strategies and apply them to FAIRGAME trajectories, showing that LLMs exhibit systematic, model- and language-dependent behavioural intentions, with linguistic framing at times exerting effects as strong as architectural differences. Together, these findings provide a unified methodological foundation for auditing LLMs as strategic agents and reveal systematic cooperation biases with direct implications for AI governance, collective decision-making, and the design of safe multi-agent systems.
In the face of increasingly sophisticated cyberattacks, including adaptive adversaries and stealthy anomalies, key features of defense mechanisms should be effective, interpretable, and theoretically rooted. Conventional intrusion detection systems are typically based on a single-paradigm machine learning model which can be effective (because it is optimized for conditions), but fail in generalizability and falling back on an explanation of its prediction. This paper outlines a multi-layered AI-enabled cyber defense framework that integrates entropy analytics, fuzzy inference, game-theoretic defense, and multi-agent reinforcement learning (MARL) inside a closed-loop adaptive architecture. In its simplest form, the novelty of the paper is that, four functional paradigms - uncertainty quantification, interpretability, strategic adversarial thinking, and live policy adaptation - are placed into a single coherent system. The framework operates as sequential and feedback salients - entropy analytics quantify the uncertainty in are states, fuzzy inference end maps the uncertainty into qualitative decision rules, game theory shapes defender - attacker towards equilibrium strategies, and MARL dynamically updates those strategies for convergence and long term adaptation. The empirical work on appropriate benchmark intrusion detection datasets consistently outperformed baseline systems including the DDN, Fed-ID, AG-IDS, DL-FL systems producing a 6-12% increase in detection accuracy, lower false positive rates from non-intrusions, and a faster convergence, with adversarial examples across multiple epochs. Also, practical case studies reveal a level of improved explainability in threat classification and anomaly detection, which equates to practical interpretability for security analysts from the framework. The major contributions of the work are threefold: 1) an integrated multi-layered AI-based cybersecurity framework, 2) theoretical robustness results in bounded adversarial models, and 3) performance and interpretability form the systematic empirical evaluations over multiple datasets.
No abstract available
The reconfigurable vehicle (RV) can assemble and disassemble, which is an innovation to the traditional fixed configuration vehicle. The authors propose a concept of reconfigurable unmanned ground vehicle (RUGV), which consists of maneuvering modules (MM) and functional modules (FM) and can greatly broaden civilian unmanned vehicle application scenarios. The reconfiguration of RUGV is not only the connection of the mechanical system but also the control system. The traditional control strategy can not meet the variety of control systems and actuator topology resulting from reconfiguration, so it is necessary to research the reconfiguration control technology of the RV. Since the path tracking problem is a hot issue in the unmanned vehicle field, this paper investigates the reconfiguration control problem in path tracking. To this end, this paper proposes a reconfigurable platform of RUGV, which is composed of single-axle and two-wheel configuration MM called cell unit (CU). To express the different configurations of RUGV, a universal dynamic model (UDM) of RUGV is developed by a vectorized modeling approach. Based on this model, a game theory-based model predictive control (GMPC) path tracking controller is designed, whose sub-GMPC optimization problem is solved by Nash equilibrium game strategy. Numerous simulations are carried out to verify and compare the proposed strategy. The simulation results show that the GMPC can handle RUGV path tracking and speed tracking simultaneously and can also optimize lateral dynamics stability. By comparing with the holistic model predictive control (HMPC) strategy, the GMPC method has almost the same control performance but a shorter average single-step compute time. The proposed strategy also features greater flexibility in RUGV actuators’ topology changes as well as robustness against actuators’ faults.
This paper proposes an anti-jamming transmission algorithm based on game theory and multi-agent deep reinforcement learning (MADRL) for the integrated air-ground network. For uplink process, we propose UAV deployment schemes based on congestion game model and dynamic networking schemes based on coalition game, aiming to counteract malicious jamming from ground and air jammers, effectively enhancing the anti-jamming transmission capability of air-ground networks. For downlink process, to address the joint trajectory and power optimization problem, a partially observable Markov decision process (POMDP) framework is utilized, follows a centralized training and distributed execution framework. During the centralized training process, experiences of each agent interacting with the environment are stored in an experience replay pool and then used to train the soft actor-critic network. This process is conducted on the high altitude platforms (HAP) to alleviate the burden of unmanned aerial vehicle (UAV). During the distributed execution process, each UAV uses the trained actor network to output actions based on observations and adjust its flight position and transmission power for joint service provision. To update the parameters of the soft actor-critic network, an improved proximal policy optimization (PPO) algorithm is proposed. Simulation results demonstrate that the proposed method outperforms traditional algorithms in terms of achieving higher system achievable sum rate, lower power consumption, and faster convergence speed.
In order to solve the problems of low efficiency and difficult practical application of the centralized method in the coordinated control of urban road network signals, this paper proposes a game-based multi-intersection cooperative control method. By constructing a distributed game model, the signal control between intersections is regarded as a game relationship, and the Nash equilibrium solution is solved by using a mixed strategy game, and the signal control strategies of each intersection are obtained. At the same time, combined with the multi-agent reinforcement learning framework, Nash Q-learning is used to update the benefit matrix to realize the learning strategy of the agent. Simulation results show that the proposed method can significantly reduce the average delay and waiting time under low traffic demand [1], and outperforms single-agent control under high traffic demand, effectively avoiding traffic congestion and reducing network delay.
This paper presents a novel distributed optimization algorithm for large-scale multi-agent systems (LS-MAS), particularly with a given fixed final density constraint. Although the Mean field game (MFG) theory provides a distribution solution to overcome the “Curse of dimensionality” in LS-MAS, it significantly sacrifices LS-MAS optimality and also not be capable of achieving arbitrary fixed final probability density function (PDF) constraint. To overcome these challenges, a novel Imbalanced Mean-Field Game (Imb-MFG) theory is developed along with an adaptive PDF decomposition algorithm and distributed reinforcement learning. Specifically, an induction-based PDF parameter estimation is developed to decompose the final density constraints into multiple imbalanced norm distributions. Then, the Imb-MFG theory is designed by integrating multi-group MFG with a constrained K-means clustering algorithm. To solve the developed Imb-MFG and further obtain the distributed optimal solution, a multi-actor-critic-mass (Multi-ACM) algorithm is designed to learn the solution of multi-group coupled Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations simultaneously. Finally, the convergence of the developed Multi-ACM algorithm is guaranteed through Lyapunov analysis.
How to obtain the optimal decision-making scheme based on the investment behavior of various stakeholders is an important issue that needs to be solved urgently in incremental distribution network planning. To this end, this paper introduces the virtual player “Nature” to realize the combination of game theory and robust optimization, and proposes an incremental distribution network source-load collaborate planning method with uncertainties and multi-agent game. Firstly, the planning and decision-making models of DG investment operators, distribution network companies, power consumers and distributed energy storage (DES) investment operators are constructed respectively. Then, the static game behavior between DG investment operators and distribution network companies is analyzed based on the transfer relations between the four participants. At the same time, robust optimization is used to deal with the uncertainty of DG output, and the virtual player “Nature” is introduced to study the dynamic game behavior between the DG investment operator and the distribution company. Finally, a dynamic-static joint game planning model is proposed. The simulation results verify the correctness and effectiveness of the proposed method.
No abstract available
To conserve building energy, optimal operation of a building’s energy systems, especially heating, ventilation and air-conditioning (HVAC) systems, is important. This study focuses on the optimization of the central chiller plant, which accounts for a large portion of the HVAC system’s energy consumption. Classic optimal control methods for central chiller plants are mostly based on system performance models which takes much effort and cost to establish. In addition, inevitable model error could cause control risk to the applied system. To mitigate the model dependency of HVAC optimal control, reinforcement learning (RL) algorithms have been drawing attention in the HVAC control domain due to its model-free feature. Currently, the RL-based optimization of central chiller plants faces several challenges: (1) existing model-free control methods based on RL typically adopt single-agent scheme, which brings high training cost and long training period when optimizing multiple controllable variables for large-scaled systems; (2) multi-agent scheme could overcome the former problem, but it also requires a proper coordination mechanism to harmonize the potential conflicts among all involved RL agents; (3) previous agent coordination frameworks (identified by distributed control or decentralized control) are mainly designed for model-based control methods instead of model-free controllers. To tackle the problems above, this article proposes a multi-agent, model-free optimal control approach for central chiller plants. This approach utilizes game theory and the RL algorithm SARSA for agent coordination and learning, respectively. A data-driven system model is set up using measured field data of a real HVAC system for simulation. The simulation case study results suggest that the energy saving performance (both short- and long-term) of the proposed approach (over 10% in a cooling season compared to the rule-based baseline controller) is close to the classic multi-agent reinforcement learning (MARL) algorithm WoLF-PHC; moreover, the proposed approach’s nature of few pending parameters makes it more feasible and robust for engineering practices than the WoLF-PHC algorithm.
No abstract available
We propose the use of game-theoretic solutions and multi- agent Reinforcement Learning in the mechanism design of smart, sustainable mobility services. In particular, we present applications to ridesharing as an example of a cost game.
Decision-making plays a critical role in autonomous driving, connecting the upstream perception task and the downstream planning task. The interaction among vehicles and the uncertainty of driving intentions are two main challenges of lane change decision-making. To meet these challenges, a game theory method based on the LSTM (Long Short-Term Memeory) neural network is proposed. The game theory method is adopted to model the interaction among vehicles and the LSTM network is used to precisely fit the complex nonlinear relationship between payoffs and features. For highway scenarios with ramps, discretionary and mandatory lane change scenarios are specifically extracted and the lane priority is added to features. The improved GPSO-DE optimizer is used to accelerate network training and reduce the local optimal solutions. Finally, experiments on real-world dataset NGSIM I-80 show that the prediction accuracy of other vehicles’ intentions and the decision-making accuracy of ego vehicles have reached state-of-the-art performance. Moreover, the model is capable of improving the robustness of decision-making and reducing unreasonable jumps effectively.
The escalating demands of energy consumption and environmental conservation necessitate urgent advancements in energy infrastructures. In this context, Deep Integration of Multi-energy Systems (DIMS), which concentrates on the seamless integration of advanced multi-energy and information technologies, is considered the most promising strategy for resilient energy utilization in future societies. In a setting where multiple entities participate across source-network-load-storage links, achieving a balanced equilibrium that maximizes collective benefits is crucial. It therefore becomes vital to create a Distributed Integrated Energy System (DIES) operation strategy that aligns with an open energy market. To address this, we first introduce an Integrated Energy Management System (IEMS) to coordinate internal energy transactions. We propose a distributed hierarchical game architecture, taking bounded rationality into account. Under this mechanism, the upper IEMS modulates the electricity price strategy, while the lower entities achieve autonomous optimization scheduling through multi-energy collaboration and Synergy. These entities then engage in multi-agent cooperative optimization as prosumers. Furthermore, we employ the game mechanism to leverage load discrepancies, flexible role switching, and demand response, thereby effectively improving the load curve. To validate the effectiveness of our method, we employ a comparative analysis through a simulation on a DIES comprising of an IEEE 33-node network and an optimized 7-node natural gas network. This demonstrates the superiority of our proposed method.
The mechanisms of emergence and evolution of collective behaviours in dynamical Multi-Agent Systems (MAS) of multiple interacting agents, with diverse behavioral strategies in co-presence, have been undergoing mathematical study via Evolutionary Game Theory (EGT). Their systematic study also resorts to agent-based modelling and simulation (ABM) techniques, thus enabling the study of aforesaid mechanisms under a variety of conditions, parameters, and alternative virtual games. This paper summarises some main research directions and challenges tackled in our group, using methods from EGT and ABM. These range from the introduction of cognitive and emotional mechanisms into agents’ implementation in an evolving MAS, to the cost-efficient interference for promoting prosocial behaviours in complex networks, to the regulation and governance of AI safety development ecology, and to the equilibrium analysis of random evolutionary multi-player games. This brief aims to sensitize the reader to EGT based issues, results and prospects, which are accruing in importance for the modeling of minds with machines and the engineering of prosocial behaviours in dynamical MAS, with impact on our understanding of the emergence and stability of collective behaviours. In all cases, important open problems in MAS research as viewed or prioritised by the group are described.
Distribution system operators (DSOs) have difficulty in scheduling distributed energy resources owing to the increasing power demand and penetration of renewable energy. The goal of this study is to determine the charging/discharging of a PV-integrated energy storage system (PV-ESS), the EV charging price, and demand response (DR) incentive values while considering voltage management. To achieve optimal energy operation for a distribution network, this study proposes a new evolutionary game theory (EGT)-based scheduling strategy that accounts for voltage management in a multi-agent system (MAS). EGT, as a decision-making strategy, is used by agents to cooperate and derive the best schedule with their own behavior pattern functions so as to minimize the system operating cost. Photovoltaic-energy storage systems, electric vehicles, and loads perform charging/discharging scheduling, electric vehicle charging planning, and demand response participation, respectively. Under DSO supervision, a reward that stabilizes the voltage profile of the power distribution system is also implemented during the cooperation process. The proposed energy scheduling strategy combines EGT-based decision-making with particle swarm optimization (PSO) to solve the optimization problem and determine the payoff function through self-evolutionary improvement. The effectiveness of EGT-PSO is analyzed on an IEEE 33-bus distribution system, and the results demonstrate that the proposed scheduling strategy not only achieves the most economical decisions among agents but also manages the voltage profile.
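The abstract above leaves the optimisation layer implicit; below is a minimal particle swarm optimisation sketch of the kind that could drive the cost-minimising search in such a scheduling scheme. The quadratic cost surrogate and all hyperparameters are assumptions for illustration, not the paper's payoff model.

```python
# Minimal PSO: particles are candidate schedules, fitness is an assumed
# smooth operating-cost surrogate with a known minimiser at 1.5 per dimension.
import numpy as np

rng = np.random.default_rng(0)

def cost(x):
    return np.sum((x - 1.5) ** 2, axis=-1)

n, dim, iters = 30, 4, 200
w, c1, c2 = 0.7, 1.5, 1.5          # inertia and attraction coefficients
x = rng.uniform(-5, 5, (n, dim))   # particle positions
v = np.zeros((n, dim))
pbest, pbest_val = x.copy(), cost(x)
g = pbest[pbest_val.argmin()]      # global best position

for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
    x = x + v
    f = cost(x)
    improved = f < pbest_val
    pbest[improved], pbest_val[improved] = x[improved], f[improved]
    g = pbest[pbest_val.argmin()]

print(g, cost(g))                  # converges near [1.5, 1.5, 1.5, 1.5]
```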
Currently, mobile ad hoc networks (MANETs) are widely used due to their self-configuring nature. However, in practice they are vulnerable to malicious jammers. Traditional anti-jamming approaches, such as channel hopping based on deterministic sequences, may not be reliable against intelligent jammers because of their fixed patterns. To address this problem, we propose a distributed game theory-based multi-agent anti-jamming (DMAA) algorithm. It enables each user to exploit all information from its neighboring users before network attacks occur and to derive dynamic local policy knowledge to overcome intelligent jamming attacks efficiently, guiding the users to cooperatively hop to the same channel with high probability. Simulation results demonstrate that the proposed algorithm can learn an optimal policy to avoid malicious jamming more efficiently and rapidly than random and independent Q-learning baseline algorithms.
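For orientation, here is a toy version of one of the baselines the paper compares against: plain independent Q-learning for channel selection against a probabilistic jammer. This is not the proposed DMAA algorithm (which adds neighbor information exchange); the jammer's channel preferences and all hyperparameters are assumed.

```python
# Stateless independent Q-learning for anti-jamming channel selection.
# Reward is 1 if the chosen channel is unjammed in that slot, else 0.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_channels, slots = 3, 5, 5000
alpha, eps = 0.1, 0.1
jam_probs = np.array([0.7, 0.2, 0.05, 0.03, 0.02])  # assumed jammer preference
Q = np.zeros((n_users, n_channels))

for _ in range(slots):
    jammed = rng.choice(n_channels, p=jam_probs)    # jammer picks a channel
    for u in range(n_users):
        a = rng.integers(n_channels) if rng.random() < eps else Q[u].argmax()
        r = 0.0 if a == jammed else 1.0
        Q[u, a] += alpha * (r - Q[u, a])

# Each user's Q values track per-channel unjammed probability (about 1 - jam_probs),
# so greedy play concentrates all users on the rarely jammed channels.
print(np.round(Q, 2))
```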
In this paper, we consider the problem of pathfinding for a set of homogeneous and autonomous agents navigating a previously unknown stochastic environment. In our problem setting, each agent attempts to maximize a given utility function while respecting safety properties. Our solution is based on ideas from evolutionary game theory, namely replicating policies that perform well and diminishing ones that do not. We perform a comprehensive comparison with related multi-agent planning methods and show that our technique beats state-of-the-art RL algorithms, minimizing path length by nearly 30% in large spaces. We show that our algorithm is computationally faster than deep RL methods by at least an order of magnitude. We also show that it scales better with an increase in the number of agents compared to other methods, path planning methods in particular. Lastly, we empirically show that the policies we learn are evolutionarily stable and thus impervious to invasion by any other policy.
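The core evolutionary idea, growing the share of well-performing policies and shrinking the rest, is the discrete-time replicator update. A minimal sketch follows; the four candidate policies and their fitness values are stand-ins (e.g. negative path length), not the paper's learned policies.

```python
# Discrete-time replicator dynamics over a small policy population:
# policies with above-average fitness expand, the rest contract.
import numpy as np

x = np.full(4, 0.25)                # population shares of 4 candidate policies
f = np.array([1.0, 1.8, 0.6, 1.2])  # stand-in (positive) fitness values

for _ in range(200):
    avg = x @ f                     # population-average fitness
    x = x * f / avg                 # shares stay normalised by construction

print(np.round(x, 3))               # mass concentrates on the fittest policy
```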
Multi-agent collaboration is important for reducing errors in message communication and enhancing the consistency of exchanged information. This study explores the evolutionary decisions and stable strategies of multi-agent systems involving followers, leaders, and loners engaged in collaboration, based on evolutionary game theory (EGT). The main factors that affect the strategies are discussed, and a 3D evolution model is established; the evolutionarily stable strategies (ESS) and stability conditions are then analyzed. Numerical results obtained through MATLAB simulation show that leaders play an important role in exchanging information with other agents, accepting agents' state information, and sending messages to agents. When followers are positive about receiving and feeding back messages, message communication is profitable for the system, and high positivity accelerates the exchange of information. At the behavior level, reducing costs can strengthen the punishment for impeding the exchange of information and improve the positivity of collaboration, facilitating evolutionary convergence toward the ideal state. Finally, the EGT results reveal that improving the possibility of collaboration between loners and others and increasing the rewards promote message communication, encouraging leaders to send all messages, improving the feedback positivity of followers, and reducing the hindering degree of loners.
To realize win-win benefits and resource coordination among the multilevel operating entities of a "microgrid cluster (MGC), microgrid (MG) and user" system and improve the self-consumption of new energy in the MGC, this paper proposes an energy trading model and solution algorithm based on a multi-agent system, incentive demand response, and hierarchical Stackelberg game theory. By analyzing the game objectives and strategies of the participants, the uniqueness of the Stackelberg equilibrium (SE) of the hierarchical game is proved theoretically. The game optimization process is divided into two levels. In the upper-level game, the MGC, as leader, stimulates the MG to participate in intracluster dispatching by establishing an internal price incentive mechanism; as the follower, the MG determines the amount of electricity traded based on the realized internal price to maximize its own profit. In the lower-level game, the MG leads by deciding electricity selling prices based on the load demands of users, and the users, as followers, adjust their electricity consumption to balance expenditure and the experience of electricity usage. Simulation results verify the effectiveness and good convergence of the proposed method and demonstrate that the proposed hierarchical game strategy can improve the economic benefits of each participant, which is conducive to the establishment of a grid-friendly MGC.
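The leader-follower structure can be illustrated with backward induction on a grid: the leader enumerates prices, anticipates the follower's best response to each, and picks the price maximizing its own profit. This is a stylised one-level sketch of the Stackelberg idea, not the paper's two-level MGC-MG-user model; both profit functions below are assumed.

```python
# Grid-search backward induction for a one-leader one-follower pricing game.
import numpy as np

prices = np.linspace(0.1, 1.0, 50)        # leader's strategy grid
quantities = np.linspace(0.0, 10.0, 200)  # follower's strategy grid

def follower_profit(p, q):                # assumed: energy value minus payment
    return 2.0 * np.sqrt(q) - p * q

def leader_profit(p, q):                  # assumed: revenue minus supply cost
    return (p - 0.2) * q

best = max(
    ((p, quantities[follower_profit(p, quantities).argmax()]) for p in prices),
    key=lambda pq: leader_profit(*pq),
)
print(best, leader_profit(*best))  # near the analytic optimum p = 0.4, q = 1/p^2
```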
Intelligent autonomous vehicles navigating a smart city environment need to find a parking spot and often resort to multi-storey parking facilities. The task of efficiently using parking resources is at odds with the competitive nature of autonomous vehicles operating selfishly. In this paper, we model the problem through game theory and evaluate the efficiency of a distributed decision mechanism. At the same time, we gain insight into the complexity of identifying efficient solutions and suggest that the overall problem is difficult to solve without compromising the inherently selfish objective of each individual vehicle. We also propose distributed simulation scenarios to capture aspects of the competition, thereby suggesting possible further analytical studies.
Crowd stampedes are among the main man-made accidents in large public gathering places and have become a problem that science must confront. To explore the formation mechanism of crowd stampedes in public places, this paper establishes a theoretical model of crowd evacuation based on multi-agent technology. Adopting a combined theory-modeling-simulation method, with the Bandatia Bridge in India as the background, we analyze the changes in the various stages of large-scale crowd stampede accidents in terms of the walking paths of the crowd and their rationality or irrationality. The experimental results show that the model can simulate the behavioral changes of the crowd during evacuation, providing theoretical and technical support for the study of the formation mechanism of crowd stampedes.
Effective multi-agent collaboration requires agents to infer the rationale behind others' actions, a capability rooted in Theory-of-Mind (ToM). While recent Large Language Models (LLMs) excel at logical inference, their ability to infer rationale in dynamic, collaborative settings remains under-explored. This study introduces LLM-Hanabi, a novel benchmark that uses the cooperative game Hanabi to evaluate the rationale inference and ToM of LLMs. Our framework features an automated evaluation system that measures both game performance and ToM proficiency. Across a range of models, we find a significant positive correlation between ToM and in-game success. Notably, first-order ToM (interpreting others' intent) correlates more strongly with performance than second-order ToM (predicting others' interpretations). These findings highlight that for effective AI collaboration, the ability to accurately interpret a partner's rationale is more critical than higher-order reasoning. We conclude that prioritizing first-order ToM is a promising direction for enhancing the collaborative capabilities of future models.
Originating in psychology, Theory of Mind (ToM) has attracted significant attention across multiple research communities, especially logic, economics, and robotics. Most psychological work does not aim at formalizing the central concepts, namely goals, intentions, and beliefs, to automate a ToM-based computational process, which, by contrast, has been extensively studied by logicians. In this paper, we offer a different perspective by proposing a computational framework viewed through the lens of game theory. On the one hand, the framework prescribes how to make boundedly rational decisions while maintaining a theory of mind about others (and, recursively, each of the others holding a theory of mind about the rest); on the other hand, it employs statistical techniques and approximate solutions to retain computability of the inherent computational problem.
Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs, and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measuring the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive as well as negative (and potentially unbounded) effects on system performance.
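A minimal sketch of the smooth Q-learning dynamics studied here: Boltzmann (softmax) action selection with temperature T on a 2x2 coordination game. At the rest point, each player's mixed strategy is the softmax of its expected payoffs, i.e. a logit QRE. The payoff matrix, temperature, and learning rate are assumptions for illustration.

```python
# Smooth (Boltzmann) Q-learning on a symmetric 2x2 coordination game.
import numpy as np

A = np.array([[4.0, 0.0],     # each player's own payoff matrix (symmetric game)
              [0.0, 2.0]])
T, alpha = 1.0, 0.05          # exploration temperature, learning rate
rng = np.random.default_rng(0)
Q1, Q2 = np.zeros(2), np.zeros(2)

def softmax(Q):
    z = np.exp(Q / T)
    return z / z.sum()

for _ in range(20000):
    p1, p2 = softmax(Q1), softmax(Q2)
    a1, a2 = rng.choice(2, p=p1), rng.choice(2, p=p2)
    Q1[a1] += alpha * (A[a1, a2] - Q1[a1])   # sample-average payoff of action a1
    Q2[a2] += alpha * (A[a2, a1] - Q2[a2])

# Logit QRE check: the played mixture should match the softmax best reply.
p1, p2 = softmax(Q1), softmax(Q2)
print(p1, softmax(A @ p2))    # the two vectors approximately agree
```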
In response to deteriorating environmental pollution, China has formulated a series of environmental regulation policies. However, political centralization and economic decentralization have subjected local governments (LGs) to the dual constraints of environmental protection and economic development during the implementation of environmental regulation, which has resulted in the failure of conventional environmental regulation. To this end, the central government (CG) has established a central environmental protection inspection system (CEPIS). However, the existing literature has insufficiently studied the role and policy tools of the CG in inspecting the LGs and polluting enterprises (PEs) to implement environmental regulation, and lacks systematic analysis of the strategic interactions and policy effects caused by the CEPIS. Evolutionary game theory (EGT) provides a powerful tool with which to unpack the interactive strategies of multiple agents in China. By exploring the evolution of different participants' behavior and their evolutionarily stable strategies (ESS), EGT enables a robust, quantitative analysis of this three-party game. Numerical examples verify the theoretical results and support four key insights. First, the selection of environmental strategies manifests as a dynamic process of constant adjustment and optimization. Second, the whole evolutionary game system can converge to an ideal state under certain conditions. Third, increasing investment in special transfer payments for environmental protection, expanding the scale of the emissions trading market, and improving emission reduction benefits can motivate the PEs to reduce their emissions and the LGs to perform their duties. Fourth, in some scenarios, simply increasing the political penalties for the LGs fails to motivate them to perform their duties. This research provides the evolutionary mechanism and broadens our understanding of the relationship between environmental regulation and emission reduction strategies. Related implications are finally proposed, which can offer valuable guidance on reforming environmental regulation and improving market outcomes in China.
There has been a recent boom in investigating the control of evolutionary games in multi-agent systems, where personal interests and collective interests often conflict. Using evolutionary game theory to study the behaviors of multi-agent systems yields an interdisciplinary topic which has received an increasing amount of attention. Findings in real-world multi-agent systems show that individuals have multiple choices, and this diversity shapes the emergence and transmission of strategies, diseases, innovations, and opinions in various social populations. In this sense, the simplified theoretical models of previous studies need to be enriched, though the difficulty of theoretical analysis may increase correspondingly. Here, our objective is to theoretically establish a scenario of four strategies, comprising competition among cooperators, defection with probabilistic punishment, speculation insured by some policy, and loners, and the possible results of strategy evolution are analyzed in detail. Depending on the initial condition, the state converges either to domination by cooperators or to a rock-scissors-paper type heteroclinic cycle of three strategies.
Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% of games against expert players and showing only 49% detectability in Turing-style blind tests.
The multi-agent collaborative hunting problem is a typical problem in multi-agent coordination and collaboration research. Aiming at the multi-agent hunting problem with learning ability, a collaborative hunting method based on game theory and Q-learning is proposed. First, a cooperative hunting team is established and a game model of cooperative hunting is built. Second, by learning the escaper's strategy choices, the trajectory of the escaper's finite T-step cumulative reward is established and incorporated into the hunters' strategy set. Finally, the Nash equilibrium solution is obtained by solving the cooperative hunting game, and each hunter executes the equilibrium strategy to complete the hunting task. C# simulation experiments show that, under the same conditions, this method can effectively solve the problem of hunting a single escaper with learning ability in an obstacle environment, and comparative analysis of the experimental data shows that this method is more efficient than other methods.
This paper is focused on the Nash equilibrium seeking problem for a group of high-order dynamic players. Each player possesses general linear dynamics with parameter uncertainties. To solve this problem, we first introduce a virtual reference generator for each player by virtue of the embedded design approach and then constructively propose an observer-based output feedback integral controller relying on local information and computation. The decision outputs of all players are rigorously proven to reach the expected Nash equilibrium of the corresponding noncooperative game irrespective of the model uncertainties. We also give a simulation example to illustrate the effectiveness of our algorithms.
This paper proposes a swarm intelligence model that employs classical boid flocking dynamics combined with non-cooperative game-theoretic methods, specifically Nash Equilibrium, to simulate adaptive decision-making in multi-agent systems. The work leverages a payoff matrix based on fundamental flocking behaviors: cohesion, alignment, and separation, to enable each agent to dynamically optimize its own strategy based on local interactions within the group. The simulation introduces Self-Organizing Maps (SOMs) for clustering and behavior adaptation, providing a machine learning perspective on agent categorization and role differentiation. To simulate real-world unpredictability, stochastic noise is used to understand how varying noise levels influence collective alignment and coherence. The results demonstrate the impact of environmental factors on emergent swarm behavior and showcase the benefits of combining machine learning and game theory for adaptive control in distributed systems. This work provides valuable insights into the interplay between noise, decision-making, and flocking dynamics, with broader applications in robotics, swarm intelligence, and autonomous systems.
Enforcing a fair workload allocation among multiple agents tasked to achieve an objective in learning-enabled, demand-side healthcare worker settings is crucial for consistent and reliable performance at runtime. Existing multi-agent reinforcement learning (MARL) approaches steer fairness by shaping reward through post hoc orchestrations, leaving no certifiable, self-enforceable fairness that is immutable by individual agents at runtime. Contextualized within a setting where each agent shares resources with others, we address this shortcoming with a learning-enabled optimization scheme among self-interested decision makers whose individual actions affect those of other agents. This extends the problem to a generalized Nash equilibrium (GNE) game-theoretic framework in which we steer the group policy to a safe and locally efficient equilibrium, so that no agent can improve its utility function by unilaterally changing its decisions. Fair-GNE models MARL as a constrained GNE-seeking game, prescribing an ideal equitable collective equilibrium within the problem's natural fabric. Our hypothesis is rigorously evaluated in our custom-designed high-fidelity resuscitation simulator. Across all our numerical experiments, Fair-GNE achieves significant improvement in workload balance over fixed-penalty baselines (0.89 vs. 0.33 JFI, p < 0.01) while maintaining 86% task success, demonstrating statistically significant fairness gains through adaptive constraint enforcement. Our results communicate our formulations, evaluation metrics, and equilibrium-seeking innovations in large multi-agent learning-based healthcare systems with clarity and principled fairness enforcement.
The letter studies a multi-cluster aggregative game, where multiple clusters exist and each cluster consists of a group of agents. Each cluster serves as a noncooperative player, and agents within the same cluster aim to minimize the sum of their individual cost functions cooperatively. The local cost function of each agent depends on both its own decision and the aggregate decision of each cluster. The objective is to find the Nash equilibrium (NE) of the game in a distributed manner over time-varying unbalanced graphs. To this end, a distributed discrete-time NE seeking algorithm is developed, incorporating a novel tracking technique to estimate the aggregate decision of each cluster and the push-sum protocol to accommodate time-varying unbalanced graphs. The linear convergence of the proposed algorithm is established rigorously via multi-step contraction analysis and linear systems of inequalities. Finally, numerical simulations of an Energy Internet System validate the effectiveness of the algorithm.
This paper studies the Nash equilibrium seeking problem for a group of high-order players, where each player is modeled as an uncertain linear system with a payoff function depending on all players' output decisions. Compared with most existing Nash equilibrium seeking results, a distinctive feature of our problem is that the decision profile of each agent cannot be directly assigned and can only be changed through uncertain high-order dynamics by selecting admissible control inputs. We present a two-step design of a distributed integral feedback controller to solve the problem via available partial information of the whole multi-agent system. The outputs of the players are shown to reach the expected Nash equilibrium irrespective of the unknown parameters under the developed algorithm. The efficacy of our design is verified by a numerical example.
We propose a deep neural network-based algorithm to identify the Markovian Nash equilibrium of general large $N$-player stochastic differential games. Following the idea of fictitious play, we recast the $N$-player game into $N$ decoupled decision problems (one for each player) and solve them iteratively. The individual decision problem is characterized by a semilinear Hamilton-Jacobi-Bellman equation, to solve which we employ the recently developed deep BSDE method. The resulting algorithm can solve large $N$-player games for which conventional numerical methods would suffer from the curse of dimensionality. Multiple numerical examples involving identical or heterogeneous agents, with risk-neutral or risk-sensitive objectives, are tested to validate the accuracy of the proposed algorithm in large group games. Even for a fifty-player game with the presence of common noise, the proposed algorithm still finds the approximate Nash equilibrium accurately, which, to our best knowledge, is difficult to achieve by other numerical algorithms.
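The decoupling idea inherited here is classical fictitious play: each player best-responds to the empirical mixture of opponent play. The matrix-game sketch below shows the mechanism in its simplest form (the paper applies it to stochastic differential games with a deep BSDE solver, which is far beyond this toy).

```python
# Fictitious play on matching pennies: empirical action frequencies of both
# players converge to the mixed Nash equilibrium (0.5, 0.5).
import numpy as np

A = np.array([[ 1.0, -1.0],   # row player's payoffs; zero-sum, column gets -A
              [-1.0,  1.0]])
counts1, counts2 = np.ones(2), np.ones(2)   # observed action counts

for _ in range(20000):
    a1 = np.argmax(A @ (counts2 / counts2.sum()))   # row BR to empirical mix
    a2 = np.argmin((counts1 / counts1.sum()) @ A)   # column minimises row payoff
    counts1[a1] += 1
    counts2[a2] += 1

print(counts1 / counts1.sum(), counts2 / counts2.sum())  # both near [0.5, 0.5]
```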
No abstract available
In this article, we investigate the distributed Nash equilibrium (NE) seeking problem for multi-cluster games under switching communication topologies. Specifically, the communication topology switches among a group of jointly connected digraphs. First, a new distributed NE seeking algorithm for multi-cluster games is designed using a consensus protocol and the gradient play rule under switching communication topologies. Furthermore, to keep the algorithm applicable when an agent knows only part of the decision information, a leader-following consensus protocol is used to generate estimates of all agents' actions in the cluster, under the assumption that the switching topology between clusters is directed and strongly connected; a more general NE seeking algorithm for multi-cluster games is thereby designed. For these two algorithms, results on local convergence and non-local convergence are given, respectively. Two examples verify the validity of the theoretical results.
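The gradient play rule at the core of such algorithms is easy to see in a full-information setting: each player ascends the gradient of its own payoff given the others' current actions. The sketch below uses a two-firm Cournot game (all constants assumed); the paper's contribution is making this work when actions must be estimated via consensus over switching graphs, which is omitted here.

```python
# Full-information gradient play on a two-firm Cournot duopoly.
import numpy as np

a, b, c = 10.0, 1.0, 2.0    # inverse demand p = a - b*(q1+q2), unit cost c
q = np.array([0.0, 0.0])    # production quantities (the players' actions)
step = 0.05

for _ in range(500):
    total = q.sum()
    # gradient of profit (a - b*total)*q_i - c*q_i with respect to q_i:
    grad = a - c - b * total - b * q
    q = np.maximum(q + step * grad, 0.0)   # projected onto q_i >= 0

print(q)                    # Nash equilibrium q_i = (a - c) / (3b) = 8/3
```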
This paper investigates the distributed strategy design to find generalized Nash equilibria (GNE) of multi-cluster games with nonsmooth payoff functions, a coupled nonlinear inequality constraint, and set constraints. In this game, each cluster is composed of a group of agents and is a virtual noncooperative player, who minimizes its payoff function; each agent only uses its local payoff function, local feasible set and partial information of the coupled inequality constraint, and communicates with its neighbors. To solve the GNE problem, we propose a distributed nonsmooth algorithm using a projected differential inclusion and establish the convergence analysis of the proposed algorithm. A numerical application is given for illustration.
This paper considers the distributed strategy design for Nash equilibrium (NE) seeking in multi-cluster games under a partial-decision information scenario. In the considered game, there are multiple clusters and each cluster consists of a group of agents. A cluster is viewed as a virtual noncooperative player that aims to minimize its local payoff function, and the agents in a cluster are the actual players that cooperate within the cluster to optimize the payoff function of the cluster through communication via a connected graph. In our setting, agents have only partial-decision information, that is, they only know local information and cannot have full access to opponents' decisions. To solve the NE seeking problem of this formulated game, a discrete-time distributed algorithm, called the distributed gradient tracking algorithm (DGT), is devised based on inter- and intra-cluster communication. In the designed algorithm, each agent is equipped with strategy variables including its own strategy and estimates of other clusters' strategies. With the help of a weighted Frobenius norm and a weighted Euclidean norm, theoretical analysis is presented to rigorously show the linear convergence of the algorithm. Finally, a numerical example is given to illustrate the proposed algorithm.
In this paper, we study a distributed continuous-time design for aggregative games with coupled constraints in order to seek the generalized Nash equilibrium by a group of agents via simple local information exchange. To solve the problem, we propose a distributed algorithm based on projected dynamics and non-smooth tracking dynamics, even for the case when the interaction topology of the multi-agent network is time-varying. Moreover, we prove the convergence of the non-smooth algorithm for the distributed game by taking advantage of its special structure and also combining the techniques of the variational inequality and Lyapunov function.
The traditional Nash Q-learning algorithm generally assumes that agents are tightly coupled, which imposes a huge computational burden. However, many multi-agent systems in the real world have sparse interactions between agents. In this paper, sparse interactions are divided into two categories: intra-group sparse interactions and inter-group sparse interactions. Previous methods can only deal with one specific type of sparse interaction. To characterize both categories of sparse interactions, we use a novel mathematical model called the Markov graphical game. On this basis, graphical-game-based Nash Q-learning is proposed to deal with different types of interactions. Experimental results show that our algorithm takes less time per episode and acquires a good policy.
Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment. A prominent framework for studying MARL is Markov games, with the goal of finding various notions of equilibria in a sample-efficient manner, such as the Nash equilibrium (NE) and the coarse correlated equilibrium (CCE). However, existing sample-efficient approaches either require tailored uncertainty estimation under function approximation, or careful coordination of the players. In this paper, we propose a novel model-based algorithm, called VMG, that incentivizes exploration by biasing the empirical estimate of the model parameters towards those with higher collective best-response values of all the players when fixing the other players' policies, thus encouraging the policy to deviate from its current equilibrium for more exploration. VMG is oblivious to different forms of function approximation, and permits simultaneous and uncoupled policy updates of all players. Theoretically, we also establish that VMG achieves a near-optimal regret for finding both the NEs of two-player zero-sum Markov games and the CCEs of multi-player general-sum Markov games under linear function approximation in an online environment, which nearly matches their counterparts with sophisticated uncertainty quantification.
Federated learning (FL) offers a decentralized approach to machine learning, where multiple agents collaboratively train a model while preserving data privacy. In this paper, we investigate the decision-making and equilibrium behavior in FL systems, where agents choose between participating in global training or conducting independent local training. The problem is first modeled as a stage game and then extended to a repeated game to analyze the long-term dynamics of agent participation. For the stage game, we characterize the participation patterns and identify Nash equilibrium, revealing how data heterogeneity influences the equilibrium behavior—specifically, agents with similar data qualities will participate in FL as a group. We also derive the optimal social welfare strategy and show that it lies in a neighborhood of Nash equilibrium. In the repeated game, we propose a privacy-preserving, computationally efficient myopic strategy. This strategy enables agents to make practical decisions under bounded rationality and converges to a neighborhood of Nash equilibrium of the stage game in finite time. By combining theoretical insights with practical strategy design, this work provides a realistic and effective framework for guiding and analyzing agent behaviors in FL systems.
Common feedback strategies in multi-agent dynamic games require all players' state information to compute control strategies. However, in real-world scenarios, sensing and communication limitations between agents make full state feedback expensive or impractical, and such strategies can become fragile when state information from other agents is inaccurate. To this end, we propose a regularized dynamic programming approach for finding sparse feedback policies that selectively depend on the states of a subset of agents in dynamic games. The proposed approach solves convex adaptive group Lasso problems to compute sparse policies approximating Nash equilibrium solutions. We prove the regularized solutions' asymptotic convergence to a neighborhood of Nash equilibrium policies in linear-quadratic (LQ) games. We extend the proposed approach to general non-LQ games via an iterative algorithm. Empirical results in multi-robot interaction scenarios show that the proposed approach effectively computes feedback policies with varying sparsity levels. When agents have noisy observations of other agents' states, simulation results indicate that the proposed regularized policies consistently achieve lower costs than standard Nash equilibrium policies by up to 77% for all interacting agents whose costs are coupled with other agents' states.
Cooperative tasks are common in multi-agent systems, and closely cooperative tasks are a special case in which a change in the state of the environment requires multiple agents to perform a specific operation at the same time. Take box-pushing as an example: the box is heavy and requires multiple agents to push it simultaneously. Optimal actions in a closely cooperative task are correlated with the actions of other agents, which makes the individually optimal action potentially inconsistent with the group-optimal action and leads to more non-globally-optimal Nash equilibrium policies in the problem. This makes it easier for policies learned by reinforcement learning to fall into these locally optimal policies. In this letter, we propose a self-organised sequential multi-agent reinforcement learning algorithm (SOS-MARL). We propose sequential decision-making to change the optimization objective of each agent's policy so that the learned policy tends toward group-optimal policies, and we propose an automatic grouping mechanism that makes the policy smoother for training and reasoning in large-scale agent environments. We decompose the joint action-value factorization outside the group into a combination of each group's action values, thus guiding the agents to improve their group policies in a fine-grained manner. We deployed scenarios in both simulated and real environments and compared SOS-MARL with various classical MARL algorithms on box-pushing tasks, demonstrating the state-of-the-art performance of our method.
Addressing the issues of small scale, decentralized entities, and weak bargaining power in rural distributed energy systems, this paper proposes a collaborative game optimization model that integrates dynamic feedback of carbon pricing and multi-agent reinforcement learning. The model establishes a dynamic feedback relationship between carbon pricing and system emissions by constructing an electricity-carbon-biomass multi-energy peer-to-peer (P2P) trading mechanism, achieving endogenous adjustment of environmental costs. It employs a principal-agent game to depict the hierarchical interaction between the system group and the superior energy grid, and designs an asymmetric Nash bargaining mechanism based on comprehensive contribution to ensure fair distribution of benefits. Algorithmically, the model combines KKT-ADMM distributed optimization and the MADDPG multi-agent learning framework to achieve collaborative solving of static equilibrium and dynamic adaptation. Case study results show that the model can reduce the total system cost by 24.91%, decrease carbon emissions by 5.59%, and keep the photovoltaic curtailment rate below 1%, significantly enhancing the economic efficiency and low-carbon collaboration of rural distributed energy systems.
Agent intelligence involves specific requirements for social attributes. Intelligent agents make their decisions based on the groups they are part of, tend to satisfy co-members, and enlarge their own benefits. A fundamental question is whether this form of subgroup decision-making accommodates each individual's preferences. In this paper, we examine the evolution of an anticoordination game on a higher-order network in the form of a simplicial complex in relation to the facet cover problem, which ensures that each subgroup yields a positive benefit. We introduce and apply the facet update rule to regulate nodes' group-based interactions. We identify the payoff parameter condition under which a strict Nash equilibrium (SNE) satisfies a facet cover. The proposed facet update rule enables the activated facet to reach a facet equilibrium, and all nodes converge to an SNE with no more than 2m strategy switches, where m is the number of nodes in the simplicial complex. Additionally, we analyze the convergence of the asynchronous update rule, which can be seen as a special case of the facet update rule. Our simulations and extensive examples reveal that the SNE achieved by the facet update rule, on average, covers fewer nodes than the asynchronous update rule.
Multi-armed bandits (MAB) is an online learning and decision-making model under uncertainty. Instead of maximizing only the expected utility (or reward) as in the classical MAB setting, the variance of the utility should be considered when making risk-aware decisions. In this paper, we propose a risk-aware multi-agent MAB (MAMAB) model, which considers both "independent" and "correlated" risk when multiple agents make arm-pulling decisions. Specifically, the system includes a platform that owns a number of tasks (or arms) awaiting a group of agents to accomplish. We show how to calculate the arm-pulling strategies of agents with potentially different eligible arm sets at a Nash equilibrium point. From the perspective of the platform, each arm has a maximal capacity of arm-pulling agents it can accommodate. We design the platform's optimal payment algorithms for risk-aware revenue maximization (a regret minimization) under both independent and correlated risks. We prove that our algorithms achieve sublinear regret under independent risks whether or not the platform can differentiate the utility on each arm, and that our algorithm achieves sublinear regret under correlated risks. We also carry out experiments to quantify the merits of our algorithms for various networking applications, such as crowdsourcing and edge computing.
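To make the risk-aware objective concrete, here is a single-agent sketch of mean-variance arm selection with a UCB-style exploration bonus. It illustrates the "independent risk" idea only; the multi-agent capacity constraints and platform payment design of the paper are not modeled, and the arm distributions and risk weight are assumed.

```python
# Mean-variance bandit: pick the arm maximising mean - rho * variance + bonus.
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.50, 0.60, 0.55])   # assumed arm reward means
true_stds  = np.array([0.05, 0.50, 0.10])   # arm 1 has high mean but high risk
rho, n_arms, horizon = 1.0, 3, 5000
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)
sumsq = np.zeros(n_arms)

for t in range(1, horizon + 1):
    if t <= n_arms:
        a = t - 1                           # pull each arm once to initialise
    else:
        mean = sums / counts
        var = sumsq / counts - mean ** 2    # plug-in variance estimate
        bonus = np.sqrt(2 * np.log(t) / counts)
        a = np.argmax(mean - rho * var + bonus)
    r = rng.normal(true_means[a], true_stds[a])
    counts[a] += 1
    sums[a] += r
    sumsq[a] += r * r

print(counts)  # the moderate-mean, low-variance arm 2 is pulled most often
```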
Individual performance variations, target conflicts, and dynamic scenarios pose challenges in creating effective solutions to balance individual and collective benefits in collaborative multi-agent area coverage control. To address these challenges, this paper proposes a semi-distributed Pareto-optimal policy iteration algorithm (SDPOPI). This algorithm introduces a multi-objective group game (MOGG) model that accounts for conflicting objectives such as coverage efficiency, group density, energy consumption, and communication topology equilibrium. By guiding the distributed reinforcement learning policy iteration process through a game interaction mechanism, the algorithm establishes a dual-layer semi-distributed policy iteration framework that achieves the Pareto-Nash equilibrium for the MOGG, thereby balancing individual and group benefits. Simulation results validate the accuracy and practical applicability of the proposed method.
Research in reinforcement learning has achieved great success in strategic game playing, thanks to the incorporation of deep reinforcement learning (DRL) and Monte Carlo Tree Search (MCTS) into agents trained in a self-play (SP) environment. Through self-play, agents are provided with an incrementally more difficult curriculum, which in turn facilitates learning. However, recent research suggests that agents trained via self-play may easily get stuck in local equilibria. In this paper, we consider a population of agents, each of which independently learns to play an alternating Markov game (AMG). We propose a new training framework, group practice (GP), for a population of decentralized RL agents: agents are assigned to multiple learning groups during training, and for every episode, each agent is randomly paired up to practice with another agent in its learning group. Convergence to the optimal value function and the Nash equilibrium is proved under the GP framework. An experimental study is conducted by applying GP to Q-learning and to deep Q-learning with Monte Carlo tree search on the games of Connect Four and Hex. We verify that GP is a more efficient training scheme than SP given the same amount of training, and show that learning effectiveness can be further improved by applying local grouping to agents.
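A minimal sketch of the group-practice pairing loop described above. The grouping and random pairing is the point; play_and_update is a placeholder for whatever learner is plugged in (e.g. Q-learning on Connect Four), and the group count and episode budget are arbitrary.

```python
# Group practice (GP): partition agents into groups, and each episode pair
# every agent with a random partner from its own group.
import random

def group_practice(agents, n_groups, episodes, play_and_update):
    random.shuffle(agents)
    groups = [agents[i::n_groups] for i in range(n_groups)]  # even partition
    for _ in range(episodes):
        for group in groups:
            for agent in group:
                partner = random.choice([a for a in group if a is not agent])
                play_and_update(agent, partner)  # one game plus learning updates

# Usage with a dummy learner (8 agents, 2 groups of 4):
group_practice(list(range(8)), n_groups=2, episodes=3,
               play_and_update=lambda a, b: None)
```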
No abstract available
Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent's policy is a best response to the other agents', is more accessible and has allowed addressing infinite-horizon problems with solutions in the form of finite state controllers (FSCs). In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node. A related process is used to heuristically derive initial FSCs. Experiments on benchmarks show that MC-JESP is competitive with existing Dec-POMDP solvers and even outperforms many offline methods that use explicit models.
We study the problem of achieving decentralized coordination by a group of strategic decision-makers choosing to engage or not in a task in a stochastic setting. First, we define a class of symmetric utility games that encompasses a broad class of coordination games, including the popular framework known as global games. To study the extent to which agents engaging in a stochastic coordination game indeed coordinate, we propose a new probabilistic measure of coordination efficiency. Then, we provide a universal information-theoretic upper bound on the coordination efficiency as a function of the amount of noise in the observation channels. Finally, we revisit a large class of global games, and we illustrate that their Nash equilibrium policies may be less coordination-efficient than certainty-equivalent policies, despite providing better expected utility. This counterintuitive result establishes the existence of a nontrivial trade-off between coordination efficiency and expected utility in coordination games.
This paper uses a distributed fault-tolerant control approach to solve the Nash equilibrium (NE) seeking problem for integrator-type multi-agent systems (MAS) subject to undesirable sensor faults. In partial-information games, information exchange between agents plays an essential role in ensuring the successful completion of NE seeking. A flexible observer-based control algorithm is proposed to compensate for the adverse effects of sensor faults; it reduces the influence of sensor faults on group decision-making and realizes the optimization of the group strategy. The distributed design guarantees that the MAS gradually approaches the NE regardless of sensor faults. Finally, the effectiveness of the technique is illustrated through a simulation example.
In this paper, an adaptive distributed optimization approach is investigated for integrator-type multi-agent systems with unknown time-varying disturbances and unmodeled dynamics in non-cooperative games. In partial-information games, each agent is considered a player, and each local player can obtain other players' partial decision knowledge through the network. The propagation of local disturbances in the network and the lack of global information make it difficult for local players to make optimal decisions. To address this problem, a neural-network-based disturbance observer algorithm is designed to estimate disturbances and unmodeled dynamics, and a dynamic average consensus algorithm is given to estimate non-neighbor strategies. Estimates of the disturbances and unmodeled dynamics are compensated in the control signal to reduce their influence on group decision making. Combined with the gradient optimization method, an adaptive distributed Nash equilibrium seeking method is realized. Simulation results show the effectiveness of the proposed algorithm.
We consider fractional hedonic games, a subclass of coalition formation games that can be succinctly modeled by means of a graph in which nodes represent agents and edge weights the degree of preference of the corresponding endpoints. The happiness or utility of an agent for being in a coalition is the average value she ascribes to its members. We adopt Nash stable outcomes as the target solution concept; that is, we focus on states in which no agent can improve her utility by unilaterally changing her own group. We provide existence, efficiency, and complexity results for games played on both general and specific graph topologies. As to the efficiency results, we mainly study the quality of the best Nash stable outcome, and refer to the ratio between the social welfare of an optimal coalition structure and that of such an equilibrium as the price of stability. In this respect, we remark that a best Nash stable outcome has a natural meaning of stability, since it is the optimal solution among those which can be accepted by selfish agents. We provide upper and lower bounds on the price of stability for different topologies, both with weighted and unweighted edges. Besides the results for general graphs, we give refined bounds for various specific cases, such as triangle-free, bipartite, and tree graphs. For these families, we also show how to efficiently compute Nash stable outcomes with provably good social welfare.
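A small best-response sketch makes the solution concept tangible: agent i's utility in coalition C is the sum of its edge weights to C's members divided by |C|, and agents unilaterally move between groups until no move helps. The weight matrix is hypothetical, and the loop is capped because best-response dynamics need not converge in general.

```python
# Best-response dynamics toward a Nash stable outcome of a fractional
# hedonic game on a 5-node weighted graph.
import numpy as np

w = np.array([[0, 5, 4, 0, 0],       # symmetric preference weights (assumed)
              [5, 0, 3, 0, 0],
              [4, 3, 0, 1, 0],
              [0, 0, 1, 0, 6],
              [0, 0, 0, 6, 0]], dtype=float)
n = len(w)
label = np.arange(n)                 # start from singleton coalitions

def utility(i, c, labels):
    members = np.flatnonzero(labels == c)
    size = len(members) + (0 if labels[i] == c else 1)  # size after i joins
    return w[i, members].sum() / size

for _ in range(50):                  # capped best-response sweeps
    moved = False
    for i in range(n):
        options = set(label) | {label.max() + 1}        # any group, or go solo
        best = max(options, key=lambda c: utility(i, c, label))
        if utility(i, best, label) > utility(i, label[i], label):
            label[i], moved = best, True
    if not moved:                    # no agent wants to deviate: Nash stable
        break

print(label)                         # here: {0,1,2} together and {3,4} together
```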
In single-elimination knockout tournaments, participants face each other based on a starting seeding and progress to the next rounds by beating their direct opponents. In this paper we initiate the study of coalitional knockout tournaments, which generalise single-elimination knockout tournaments by allowing groups of players, or coalitions, to strategically select one of their members to take part in the tournament, following the starting seeding. We investigate the algorithmic properties of pure-strategy Nash equilibria in these games under various setups, i.e., whether or not choices can be made at each round and whether or not tournament progression is important to the group. Despite the more complex tournament structure compared to single-elimination, we provide (quasi-)polynomial-time algorithms for all cases. Our results can be applied to tournaments where pre-play selection plays an important role, such as sport events or elections with run-offs.
No abstract available
Multi-agent imitation learning (MA-IL) aims to inversely learn policies for all agents using demonstrations collected from an expert group. However, this problem has only been studied in the setting of Markov games (MGs) that allow participants to take concurrent actions, and does not extend to general MGs, in which agents make decisions inconcurrently, in different turns. In this work, we propose iMA-IL, a novel multi-agent imitation learning framework for general (inconcurrent) Markov games. The learned policies are proven to guarantee subgame perfect equilibrium (SPE), a stronger equilibrium concept than the Nash equilibrium (NE). The experimental results demonstrate that, compared to state-of-the-art baselines, our iMA-IL model can better infer the policy of each expert agent using demonstration data collected from inconcurrent decision-making scenarios.
The growing adoption of large language models (LLMs) presents potential for a deeper understanding of human behaviours within game theory frameworks. This paper examines strategic interactions among multiple types of LLM-based agents in a classical beauty contest game. LLM-based agents demonstrate varying depths of reasoning that fall within the range of level-0 to level-1, lower than experimental results obtained with human subjects in previous studies. However, they display a similar convergence pattern towards the Nash equilibrium choice in repeated settings. Through simulations that vary the group composition of agent types, I find that environments with lower strategic uncertainty enhance convergence for LLM-based agents, and environments with mixed strategic types accelerate convergence for all. Results with simulated agents not only convey insights into potential human behaviours in competitive settings, but also prove valuable for understanding strategic interactions among algorithms.
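The convergence pattern described above is easy to reproduce with classical level-k agents in place of LLMs: each agent guesses p times its forecast of the group average, anchoring on the previous round's observed average. The contraction factor p = 2/3 is the standard choice; the agents' reasoning depths below are assumed for illustration.

```python
# Repeated p-beauty contest with level-k reasoners; guesses decay
# geometrically toward the Nash equilibrium choice of 0.
import numpy as np

p, rounds = 2 / 3, 10
levels = np.array([0, 1, 1, 2, 3])   # assumed reasoning depths of 5 agents
guesses = np.full(5, 50.0)           # level-0 anchor: mid-range guess

for r in range(rounds):
    anchor = guesses.mean()          # forecast = last round's observed average
    guesses = anchor * p ** levels   # level-k iterates the best response k times
    print(f"round {r}: average guess = {guesses.mean():.2f}")
```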
We analyze an infinite-horizon deterministic joint replenishment model from a non-cooperative game-theoretical approach. In this model, a group of retailers can choose to jointly place an order, which incurs a major setup cost independent of the group, and a minor setup cost for each retailer. Additionally, each retailer is associated with a holding cost. Our objective is to design cost allocation rules that minimize the long-run average system cost while accounting for the fact that each retailer independently selects its replenishment interval to minimize its own cost. We introduce a class of cost allocation rules that distribute the major setup cost among the associated retailers in proportion to their predefined weights. For these rules, we establish a monotonicity property of agent better responses, which enables us to prove the existence of a payoff dominant pure Nash equilibrium that can also be computed efficiently. We then analyze the efficiency of these equilibria by examining the price of stability (PoS), the ratio of the best Nash equilibrium's system cost to the social optimum, across different information settings. In particular, our analysis reveals that one rule, which leverages retailers' own holding cost rates, achieves a near-optimal PoS of 1.25, while another rule that does not require access to retailers' private information also yields a favorable PoS.
A Bilevel Approach to Integrated Surgeon Scheduling and Surgery Planning solved via Branch-and-Price
In this paper, we study a multi-agent scheduling problem for organising the operations within the operating room department. The head of the surgeon group and individual surgeons are together responsible for the surgeon schedule and surgical case planning. The surgeon head allocates time blocks to individual surgeons, whereas individual surgeons determine the planning of surgical cases independently, which might degrade the schedule quality envisaged by the surgeon head. The bilevel optimisation under study seeks an optimal Nash equilibrium solution: a surgeon schedule and surgical case plan that optimise the objectives of the surgeon head, while ensuring that no individual surgeon can improve their own objective within the allocated time blocks. We propose a dedicated branch-and-price algorithm that adds lazy constraints to the formulation of surgeon-specific pricing problems to ensure an optimal bilevel feasible solution is retrieved. In this way, the surgeon head respects the objective requirements of the individual surgeons and the solution space can be searched efficiently. In the computational experiments, we validate the performance of the proposed algorithm and its dedicated components and provide insights into the benefits of attaining an equilibrium solution under different scenarios by calculating the price of stability and the price of decentralisation.
Coalescence, a ubiquitous group behavior in nature and society, means that agents, companies, or other entities reach consensus in their states and act as a whole. This paper considers coalescence for n rational agents with distinct initial states. Given the rationality and intelligence of the population, the coalescing process is described by a bimatrix game that has a unique mixed-strategy Nash equilibrium solution. Since the coalescing process is not an independent stochastic process, it is difficult to analyze. Using the first Borel-Cantelli lemma, we prove that all agents coalesce into one group with probability one. Moreover, the expected coalescence time is also evaluated. For the scenario where payoff functions are power functions, we obtain the distribution and expected value of the coalescence time. Finally, simulation examples are provided to validate the effectiveness of the theoretical results.
Modeling how agents form their opinions is of paramount importance for designing marketing and electoral campaigns. In this work, we present a new framework for opinion formation which generalizes the well-known Friedkin-Johnsen model by incorporating three important features: (i) social group membership, which limits the amount of influence that people not belonging to the same group may exert on a given agent; (ii) both attraction among friends and repulsion among enemies; (iii) different strengths of influence exerted by different people on a given agent, even if the social relationships among them are the same. We show that, despite its generality, our model always admits a pure Nash equilibrium which, under opportune mild conditions, is even unique. Next, we analyze the performance of these equilibria with respect to a social objective function defined as a convex combination, parametrized by a value λ∈[0,1], of the costs yielded by the untruthfulness of the declared opinions and the total cost of social pressure. We prove bounds on both the price of anarchy and the price of stability which show that, for not-too-extreme values of λ, performance at equilibrium is very close to optimal. For instance, in several interesting scenarios, the prices of anarchy and stability are both equal to max{2λ,1-λ}/min{2λ,1-λ}, which never exceeds 2 for λ∈[1/5,1/2].
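For reference, the base model being generalized is easy to simulate: each agent repeatedly averages its innate opinion with the weighted opinions of its neighbours. The sketch below is the standard Friedkin-Johnsen update only; the paper's additions (group membership limits, repulsion, heterogeneous strengths) are omitted, and the weights and susceptibilities are assumed.

```python
# Standard Friedkin-Johnsen opinion dynamics on a 3-agent network.
import numpy as np

W = np.array([[0.0, 0.6, 0.4],     # row-stochastic influence weights
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
s = np.array([1.0, 0.0, 0.5])      # innate (truthful) opinions
lam = np.array([0.4, 0.6, 0.5])    # susceptibility to social influence

x = s.copy()
for _ in range(200):
    x = (1 - lam) * s + lam * (W @ x)

# The fixed point solves (I - diag(lam) W) x = (I - diag(lam)) s:
x_star = np.linalg.solve(np.eye(3) - lam[:, None] * W, (1 - lam) * s)
print(x, x_star)                   # the iteration matches the closed form
```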
This paper investigates the problem of maximizing social power for a group of agents, who participate in multiple meetings described by independent Friedkin-Johnsen models. A strategic game is obtained, in which the action of each agent (or player) is her stubbornness over all the meetings, and the payoff is her social power on average. It is proved that, for all but some strategy profiles on the boundary of the feasible action set, each agent's best response is the solution of a convex optimization problem. Furthermore, even with the non-convexity on boundary profiles, if the underlying networks are given by a fixed complete graph, the game has a unique Nash equilibrium. For this case, the best response of each agent is analytically characterized, and is achieved in finite time by a proposed algorithm.
Inspired by socially influenced decision-making mechanisms such as the collective behavior of cancer cells, honey bees searching for a new colony, or the mobility of bacterial swarms, we consider a Mean Field Games (MFG) model of collective choice where a large number of agents choose between multiple alternatives while taking into account the group's behavior. For example, in elections, individual interests and collective opinion swings together contribute to the crystallization of final decisions. At first, the agents' decisions are determined by their initial states. Subsequently, the model is generalized to include a priori individual preferences towards the destination points, for example, personal preferences that transcend party lines in elections. We show that multiple strategies exist, each of them defining an epsilon-Nash equilibrium.
Despite the proliferation of multi-agent deep reinforcement learning (MADRL), most existing typical methods do not scale well with the dynamics of agent populations: as the population increases, the dimensional explosion of the joint state-action space and the complex interactions between agents make learning extremely cumbersome, posing a scalability challenge for MADRL. This paper focuses on the scalability issue of MADRL with homogeneous agents. In a natural population, local interaction is a more feasible mode of interplay than global interaction. Inspired by the strategic interaction model in economics, we decompose the value function of each agent into the sum of the expected cumulative rewards of the interactions between the agent and each of its neighbors. This novel value function is decentralized and decomposable, which enables it to scale well to dynamic changes in the number of large-scale agents. Accordingly, the corresponding strategic interaction reinforcement learning algorithm (SIQ) is proposed to learn the optimal policy of each agent, wherein a neural network is employed to estimate the expected cumulative reward of the interaction between the agent and one of its neighbors. We test the validity of the proposed method in a mixed cooperative-competitive confrontation game through numerical experiments. Furthermore, scalability comparison experiments illustrate that the SIQ algorithm outperforms independent learning and mean-field reinforcement learning algorithms in multiple scenarios with different and dynamically changing numbers of agents.
When learning in strategic environments, a key question is whether agents can overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty. Can they do this solely through interactions with each other? We focus this question on the ability of agents to attain the value of their Stackelberg optimal strategy and study the impact of information asymmetry. We study repeated interactions in fully strategic environments where players' actions are decided based on learning algorithms that take into account their observed histories and knowledge of the game. We study the pure Nash equilibria (PNE) of a meta-game where players choose these algorithms as their actions. We demonstrate that if one player has perfect knowledge about the game, then any initial informational gap persists. That is, while there is always a PNE in which the informed agent achieves her Stackelberg value, there is a game where no PNE of the meta-game allows the partially informed player to achieve her Stackelberg value. On the other hand, if both players start with some uncertainty about the game, the quality of information alone does not determine which agent can achieve her Stackelberg value. In this case, the concept of information asymmetry becomes nuanced and depends on the game's structure. Overall, our findings suggest that repeated strategic interactions alone cannot facilitate learning effectively enough to earn an uninformed player her Stackelberg value.
LLM-driven multi-agent-based simulations have been gaining traction with applications in game-theoretic and social simulations. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic in-depth development and evaluation of strategic reasoning. Our game environment is governed by an umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, providing a comparison to an established baseline model from economics and to data from human experiments. Furthermore, we introduce the foundations of an alternative, semantic measure of reasoning to complement k-level theory. Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
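For reference, the classical level-k baseline from economics that such agents are compared against can be sketched in a few lines (the multiplier p and the level-0 anchor are illustrative choices, not the paper's exact configuration):

```python
def level_k_guess(k: int, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """Level-k reasoning in a p-beauty contest on [0, 100].

    Level-0 guesses an anchor (here the midpoint); a level-k player
    best-responds to level-(k-1) opponents by multiplying their guess by p.
    """
    guess = anchor
    for _ in range(k):
        guess *= p
    return guess

for k in range(6):
    print(k, round(level_k_guess(k), 2))  # descends toward the equilibrium at 0
```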
Artificial intelligence (AI) is reshaping strategic planning, with Multi-Agent Reinforcement Learning (MARL) enabling coordination among autonomous agents in complex scenarios. However, its practical deployment in sensitive military contexts is constrained by the lack of explainability, which is an essential factor for trust, safety, and alignment with human strategies. This work reviews and assesses current advances in explainability methods for MARL with a focus on simulated air combat scenarios. We proceed by adapting various explainability techniques to different aerial combat scenarios to gain explanatory insights about the model behavior. By linking AI-generated tactics with human-understandable reasoning, we emphasize the need for transparency to ensure reliable deployment and meaningful human-machine interaction. By illuminating the crucial importance of explainability in advancing MARL for operational defense, our work supports not only strategic planning but also the training of military personnel with insightful and comprehensible analyses.
We introduce WellPlay, a reasoning dataset for multi-agent conversational inference in Murder Mystery Games (MMGs). WellPlay comprises 1,482 inferential questions across 12 games, spanning objectives, reasoning, and relationship understanding, and establishes a systematic benchmark for evaluating agent reasoning abilities in complex social settings. Building on this foundation, we present PLAYER*, a novel framework for Large Language Model (LLM)-based agents in MMGs. MMGs pose unique challenges, including undefined state spaces, absent intermediate rewards, and the need for strategic reasoning through natural language. PLAYER* addresses these challenges with a sensor-based state representation and an information-driven strategy that optimises questioning and suspect pruning. Experiments show that PLAYER* outperforms existing methods in reasoning accuracy, efficiency, and agent-human interaction, advancing reasoning agents for complex social scenarios.
No abstract available
Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently interact with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time when intent uncertainty will be resolved. By estimating this parameter online, we construct a game-theoretic motion planner that adapts to changing beliefs while anticipating future certainty. We show that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Through a series of simulated autonomous driving scenarios, we demonstrate that contingency games close the gap between certainty-equivalent games that commit to a single hypothesis and non-contingent multi-hypothesis games that do not account for future uncertainty reduction.
As technology progresses, we find ourselves working with automated agents increasingly often. Developing intelligent automated agents capable of interacting proficiently with people necessitates integrative approaches that consider both computational and human factors. In this talk, I will present three of my integrated research efforts towards developing intelligent agents with real-world impact, ranging from 'adversarial' settings such as apprehending reckless drivers to fully cooperative settings such as robotic search and rescue. Through extensive empirical evaluations, we demonstrate how the integration of AI and agent technologies (including machine learning and optimization) with behavioural sciences such as psychology and economics can bring about a much desired leap in the way we develop human-interacting agents for social good.
When playing games of strategic interaction, such as iterated Prisoner's Dilemma and iterated Chicken Game, people exhibit specific within-game learning (e.g., learning a game's optimal outcome) as well as transfer of learning between games (e.g., a game's optimal outcome occurring at a higher proportion when played after another game). The reciprocal trust players develop during the first game is thought to mediate transfer of learning effects. Recently, a computational cognitive model using a novel trust mechanism has been shown to account for human behavior in both games, including the transfer between games. We present the results of a study in which we evaluate the model's a priori predictions of human learning and transfer in 16 different conditions. The model's predictive validity is compared against five model variants that lacked a trust mechanism. The results suggest that a trust mechanism is necessary to explain human behavior across multiple conditions, even when a human plays against a non-human agent. The addition of a trust mechanism to the other learning mechanisms within the cognitive architecture, such as sequence learning, instance-based learning, and utility learning, leads to better prediction of the empirical data. It is argued that computational cognitive modeling is a useful tool for studying trust development, calibration, and repair.
No abstract available
State-of-the-art driver-assist systems have failed to effectively mitigate driver inattention and have had minimal impact on the ever-growing number of road mishaps (e.g., loss of life and physical injuries in accidents caused by driver inattention). This is because traditional human-machine interaction settings are modeled in classical and behavioral game-theoretic domains, which are technically appropriate for characterizing strategic interaction between either two utility-maximizing agents or human decision makers. Therefore, in an attempt to improve the persuasive effectiveness of driver-assist systems, we develop a novel strategic and personalized driver-assist system which adapts to the driver's mental state and choice behavior. First, we propose a novel equilibrium notion in human-system interaction games, where the system maximizes its expected utility and human decisions can be characterized using any general decision model. We then use this equilibrium notion to investigate the strategic driver-vehicle interaction game, where the car presents a persuasive recommendation to steer the driver towards safer driving decisions. We assume that the driver employs an open-quantum-system cognition model, which captures complex aspects of human decision making such as violations of the classical law of total probability and the incompatibility of certain mental representations of information. We present closed-form expressions for players' final responses to each other's strategies so that we can numerically compute both pure and mixed equilibria. Numerical results are presented to illustrate both kinds of equilibria.
Strategic diversity is often essential in games: in multi-player games, for example, evaluating a player against a diverse set of strategies will yield a more accurate estimate of its performance. Furthermore, in games with non-transitivities, diversity allows a player to cover several winning strategies. However, despite the significance of strategic diversity, training agents that exhibit diverse behaviour remains a challenge. In this paper we study how to construct diverse populations of agents by carefully structuring how individuals within a population interact. Our approach is based on interaction graphs, which control the flow of information between agents during training and can encourage agents to specialise on different strategies, leading to improved overall performance. We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games.
Recent advancements in large language models (LLMs) have extended their capabilities from basic text processing to complex reasoning tasks, including legal interpretation, argumentation, and strategic interaction. However, empirical understanding of LLM behavior in open-ended, multi-agent settings, especially those involving deliberation over legal and ethical dilemmas, remains limited. We introduce NomicLaw, a structured multi-agent simulation where LLMs engage in collaborative law-making, responding to complex legal vignettes by proposing rules, justifying them, and voting on peer proposals. We quantitatively measure trust and reciprocity via voting patterns and qualitatively assess how agents use strategic language to justify proposals and influence outcomes. Experiments involving homogeneous and heterogeneous LLM groups demonstrate how agents spontaneously form alliances, betray trust, and adapt their rhetoric to shape collective decisions. Our results highlight the latent social reasoning and persuasive capabilities of ten open-source LLMs and provide insights into the design of future AI systems capable of autonomous negotiation, coordination, and legislative drafting in legal settings.
This paper presents a substantially reworked examination of how advanced game-theoretic paradigms can serve as a foundation for the next-generation challenges in Artificial Intelligence (AI), forecasted to arrive in or around 2025. Our focus extends beyond traditional models by incorporating dynamic coalition formation, language-based utilities, sabotage risks, and partial observability. We provide a set of mathematical formalisms, simulations, and coding schemes that illustrate how multi-agent AI systems may adapt and negotiate in complex environments. Key elements include repeated games, Bayesian updates for adversarial detection, and moral framing within payoff structures. This work aims to equip AI researchers with robust theoretical tools for aligning strategic interaction in uncertain, partially adversarial contexts.
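The "Bayesian updates for adversarial detection" ingredient mentioned above reduces to maintaining a posterior over opponent types; a minimal sketch, with the type set and action likelihoods as invented placeholders:

```python
# Posterior over opponent types given observed actions (illustrative numbers).
likelihood = {
    "cooperative": {"share": 0.8, "sabotage": 0.2},
    "adversarial": {"share": 0.3, "sabotage": 0.7},
}
posterior = {"cooperative": 0.5, "adversarial": 0.5}  # uniform prior

def bayes_update(post: dict, action: str) -> dict:
    unnorm = {t: p * likelihood[t][action] for t, p in post.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

for obs in ["share", "sabotage", "sabotage"]:
    posterior = bayes_update(posterior, obs)
    print(obs, {t: round(p, 3) for t, p in posterior.items()})
```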
When ML algorithms are deployed to automate human-related decisions, human agents may learn the underlying decision policies and adapt their behavior. Strategic Classification (SC) has emerged as a framework for studying this interaction between agents and decision-makers to design more trustworthy ML systems. Prior theoretical models in SC assume that agents are perfectly or approximately rational and respond to decision policies by optimizing their utility. However, the growing prevalence of LLMs raises the possibility that real-world agents may instead rely on these tools for strategic advice. This shift prompts two questions: (i) Can LLMs generate effective and socially responsible strategies in SC settings? (ii) Can existing SC theoretical models accurately capture agent behavior when agents follow LLM-generated advice? To investigate these questions, we examine five critical SC scenarios: hiring, loan applications, school admissions, personal income, and public assistance programs. We simulate agents with diverse profiles who interact with three commercial LLMs (GPT-4o, GPT-4.1, and GPT-5), following their suggestions on how to allocate effort across features. We compare the resulting agent behaviors with the best responses in existing SC models. Our findings show that: (i) even without access to the decision policy, LLMs can generate effective strategies that improve both agents' scores and qualification; (ii) at the population level, LLM-guided effort allocation strategies yield similar or even higher score improvements, qualification rates, and fairness metrics than those predicted by the SC theoretical model, suggesting that the theoretical model may still serve as a reasonable proxy for LLM-influenced behavior; and (iii) at the individual level, LLMs tend to produce more diverse and balanced effort allocations than theoretical models.
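The analytical best response that LLM-guided behavior is compared against is, in the simplest SC setting, a constrained effort minimization; a sketch under common textbook assumptions (linear classifier, quadratic effort cost, fixed benefit), which are not necessarily the exact models used in the paper:

```python
import numpy as np

def best_response(x: np.ndarray, w: np.ndarray, b: float,
                  cost: float, benefit: float) -> np.ndarray:
    """Move just far enough along w to cross the boundary w.x >= b,
    but only if the quadratic effort cost is worth the fixed benefit."""
    margin = b - x @ w
    if margin <= 0:                      # already classified positively
        return x
    dx = margin * w / (w @ w)            # minimal-norm move to the boundary
    if cost * (dx @ dx) <= benefit:      # gaming is worthwhile
        return x + dx
    return x                             # too expensive: stay put

x, w = np.array([0.2, 0.4]), np.array([1.0, 1.0])
print(best_response(x, w, b=1.0, cost=1.0, benefit=2.0))  # -> [0.4 0.6]
```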
Cooperative multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for addressing complex real-world challenges, including autonomous robot control, strategic decision-making, and decentralized coordination in unmanned swarm systems. However, it still faces challenges in learning proper coordination among multiple agents. The lack of effective knowledge sharing and experience interaction mechanisms among agents has led to substantial performance decline, especially in terms of low sampling efficiency and slow convergence rates, ultimately constraining the practical applicability of MARL. To address these challenges, this paper proposes a novel framework termed Reward redistribution and Experience reutilization based Coordination Optimization (RECO). This innovative approach employs a hierarchical experience pool mechanism that enhances exploration through strategic reward redistribution and experience reutilization. The RECO framework incorporates a sophisticated evaluation mechanism that assesses the quality of historical sampling data from individual agents and optimizes reward distribution by maximizing mutual information across hierarchical experience trajectories. Extensive comparative analyses of computational efficiency and performance metrics across diverse environments reveal that the proposed method not only enhances training efficiency in multi-agent gaming scenarios but also significantly strengthens algorithmic robustness and stability in dynamic environments.
The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge (except the action set), CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, and Strategy and Playbook Summarization, where it distills experiences into an actionable strategic playbook. We evaluate CEL on diverse grid-world tasks (i.e., Minesweeper, Frozen Lake, and Sokoban), and show that the CEL agent successfully learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards. Ablation studies confirm that the iterative process is critical for sustained learning. Our work demonstrates a path toward more general and interpretable agents that not only act effectively but also build a transparent and improving model of their world through explicit reasoning on raw experience.
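The interaction-and-reflection cycle described above can be skeletonized as follows; `llm` and `env` are stand-in stubs and the prompt wording is invented for illustration, not the paper's actual prompts:

```python
def llm(prompt: str) -> str:
    return "..."  # placeholder for a real LLM call

def run_episode(env, rules: str, playbook: str) -> list:
    trajectory, obs, done = [], env.reset(), False
    while not done:
        action = llm(f"Known rules:\n{rules}\nPlaybook:\n{playbook}\n"
                     f"Observation: {obs}\nChoose an action.")
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
    return trajectory

def train(env, episodes: int):
    rules, playbook = "(unknown)", "(empty)"       # tabula rasa start
    for _ in range(episodes):
        traj = run_episode(env, rules, playbook)
        # Two concurrent reflection processes over the full trajectory:
        rules = llm(f"Trajectory: {traj}\nPrior rules: {rules}\n"
                    "Rule Induction: revise the model of the dynamics.")
        playbook = llm(f"Trajectory: {traj}\nPrior playbook: {playbook}\n"
                       "Summarization: distill an actionable playbook.")
    return rules, playbook
```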
We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold. We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.
Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering. These intertwined features yield a complex problem combining motion control and strategic play, with no available expert demonstrations. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative reinforcement learning (RL), multi-agent reinforcement learning (MARL), and game-theoretic algorithms. Simulation results show that on-policy RL methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy which achieves a 69.5% win rate against the strongest baseline in the 3 vs 3 task, demonstrating its potential for tackling the complex interplay between low-level control and high-level strategy. To highlight VolleyBots' sim-to-real potential, we further demonstrate the zero-shot deployment of a policy trained entirely in simulation on real-world drones.
Reinforcement learning (RL) approaches, particularly Q-learning, have emerged as strong tools for training autonomous agents, allowing them to acquire optimal decision-making policies through interaction with their environment. This research investigates the use of Q-learning to train autonomous agents for robotic soccer, a complex and dynamic domain that demands strategic planning, coordination, and adaptation. We studied the learning progress and performance of agents trained with Q-learning in a series of experiments carried out in a simulated soccer environment. During training, agents interacted with the environment, iteratively updating their Q-values in response to observed rewards and behaviors. Despite the high-dimensional and stochastic character of the soccer domain, Q-learning enabled the agents to develop effective tactics and decision-making capabilities. Notably, our study found that, on average, the agents required 64 steps to reach a stable policy with an average reward of -1.
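At the core of such a study is the tabular Q-learning update; a generic sketch (the state and action counts and hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1      # learning rate, discount, exploration
rng = np.random.default_rng(0)

def act(s: int) -> int:
    if rng.random() < eps:              # epsilon-greedy exploration
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s: int, a: int, r: float, s_next: int) -> None:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```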
We present a framework that integrates game theory and psychology to model multiagent systems of large language model (LLM) agents endowed with Big Five personality traits. We embed the agents within polymatrix games and compare their strategic decisions against Nash equilibria. We found that personality traits influence strategic behavior, and that dialogue-enabled interactions achieve convergence rates of 98–100%, compared to 42–54% without dialogue. We also show how personality traits shape collective decision-making and provide insights into the behavioral realism of game-theoretic predictions in psychologically grounded AI agents.
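The polymatrix setup above decomposes each agent's payoff into a sum of bimatrix games with its neighbors; a minimal sketch, including the exhaustive pure-Nash check the strategic decisions are compared against (the graph, action sets, and payoff entries are illustrative):

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 2
edges = [(0, 1), (1, 2), (0, 2)]                 # interaction graph
# One payoff matrix per directed edge: A[(i, j)][a_i, a_j] pays agent i.
A = {}
for i, j in edges:
    A[(i, j)] = rng.random((n_actions, n_actions))
    A[(j, i)] = rng.random((n_actions, n_actions))

def payoff(i, actions):
    """Agent i's payoff: the sum of its bimatrix payoffs with each neighbor."""
    return sum(M[actions[i], actions[j]] for (k, j), M in A.items() if k == i)

# Exhaustive search for pure Nash equilibria (none may exist for some draws).
for prof in product(range(n_actions), repeat=n_agents):
    if all(payoff(i, prof) >= payoff(i, prof[:i] + (a,) + prof[i + 1:])
           for i in range(n_agents) for a in range(n_actions)):
        print("pure NE:", prof)
```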
This paper studies how to design a platform to optimally control constrained multi-agent systems with a single coordinator and multiple strategic agents. In our setting, only the coordinator applies control inputs; however, it must do so based on information provided by the agents. One major challenge is that if the platform is not correctly designed, the agents may provide false information to the coordinator in order to achieve improved outcomes for themselves at the expense of overall system efficiency. Here, we design an interaction mechanism between the agents and the coordinator such that the mechanism ensures agents truthfully report their information, has low communication requirements, and leads to a control action that achieves efficiency at a Nash equilibrium. We illustrate our proposed mechanism in a model predictive control (MPC) application involving heating, ventilation, and air-conditioning (HVAC) control by the manager of an apartment building.
Since the implementation of the power system reform, the electricity market has encouraged the establishment of long-term and stable trading mechanisms. To describe the trading behavior of micro agents in the power market, this paper uses simple reflex agents from artificial intelligence to simulate the market behavior of power plants and power users. Specifically, we analyze how the mode-selection behavior of micro agents influences power trading results. By comparing the equilibrium results under different market supply and demand relations, we obtain conditions for improving power plants' capacity utilization and market efficiency. The results show that artificial intelligence agents can significantly improve market efficiency under certain conditions.
The core challenge in autonomous driving decision-making systems lies in ensuring safety and robustness in complex and adversarial scenarios. Traditional reinforcement learning methods lack interpretability, while recent approaches that use Large Language Models (LLMs) to generate interpretable rules, though transparent, produce policies that are brittle against unknown adversarial corner cases. To address this issue, we propose BladeRunner-AD, an innovative framework for driving policy generation. This framework models the policy generation process as a zero-sum game between two LLM agents. A "Policy Defender" agent is responsible for generating an executable decision-tree driving policy, while a "stylized" Adversarial Attacker agent analyzes the current policy’s logical flaws and generates adversarial scenarios in simulation most likely to cause it to fail. Through a closed loop of generation-attack-reflection-revision, the policy co-evolves under continuous, worst-case pressure. This mechanism, akin to automated red teaming, forces the generated rules to be continuously refined, thereby achieving high robustness. We conducted comprehensive experiments in interactive scenarios on the highway-env simulation platform. The results demonstrate that policies generated by BladeRunner-AD exhibit a significantly lower collision rate on an independent adversarial test set compared to baseline methods, showcasing superior robustness and safety. This work presents a novel paradigm for generating autonomous driving decision-making systems that are both robust and interpretable.
Large Language Models (LLMs) have showcased remarkable potential across various domains, especially in text generation. However, their vulnerability to jailbreak attacks presents considerable challenges to secure deployment, as attackers can use carefully crafted prompts to bypass safety measures and generate harmful content. Current jailbreak methods generally suffer from two significant limitations: a restricted strategy space for generating adversarial prompts and insufficient optimization of prompts based on feedback from LLMs. To overcome these challenges, we present Multistep Adaptive Attack Agent (MATA), an approach that employs a game-theoretic interaction between an attack model and a target model to adaptively execute jailbreak attacks on LLMs. This method enables iterative attempts based on reflection, gradually identifying the optimal jailbreak attack strategy within a complex strategy space. We compared MATA with mainstream methods across multiple open-source and closed-source LLMs, including Llama3.1, GLM4, and GPT-4o. The results demonstrate that our approach exceeds existing methods in terms of attack success rate, average number of queries, and prompt diversity, effectively identifying vulnerabilities in LLMs.
No abstract available
No abstract available
We introduce MBTI-in-Thoughts, a framework for enhancing the effectiveness of Large Language Model (LLM) agents through psychologically grounded personality conditioning. Drawing on the Myers-Briggs Type Indicator (MBTI), our method primes agents with distinct personality archetypes via prompt engineering, enabling control over behavior along two foundational axes of human psychology, cognition and affect. We show that such personality priming yields consistent, interpretable behavioral biases across diverse tasks: emotionally expressive agents excel in narrative generation, while analytically primed agents adopt more stable strategies in game-theoretic settings. Our framework supports experimenting with structured multi-agent communication protocols and reveals that self-reflection prior to interaction improves cooperation and reasoning quality. To ensure trait persistence, we integrate the official 16Personalities test for automated verification. While our focus is on MBTI, we show that our approach generalizes seamlessly to other psychological frameworks such as Big Five, HEXACO, or Enneagram. By bridging psychological theory and LLM behavior design, we establish a foundation for psychologically enhanced AI agents without any fine-tuning.
Equilibrium Analysis of Service Ecosystems for Labor-Intensive Services Using Multi-Agent Simulation
The value of a service system should be evaluated using multiple indicators, such as company profitability, consumer satisfaction, and employee satisfaction, to realize an ecosystem in society. This study examines the mechanisms of service systems with a multi-agent simulation model, based on game theory, consisting of a company, employees, and consumers. The proposed model is intended for a basic service business in which employees provide services to consumers directly based on their skill. In this model, first, a company player sets the price of a service and the salary of employees. Then, employees decide whether or not to expend effort (cost) to acquire resources, such as skills, that satisfy consumers' needs. Employees derive their profit (an equivalent of satisfaction) not only from their salary but also from the reflection of consumer satisfaction. Consumers, in turn, have their own needs structures, represented as gain tables, and decide whether and from whom to purchase. A consumer's profit is calculated from his/her satisfaction with the service provided by a certain employee and the price paid for the service. Based on the proposed model, we conducted a multi-agent simulation in which company, employee, and consumer players make decisions to maximize their own profits. From the basic simulation results, two convergent patterns are obtained depending on the initial values of price and salary. In the second simulation, the heterogeneity of consumer needs is incorporated into the model based on a questionnaire survey of actual consumer behaviors related to hair salons (n=2472). Through a factor analysis of 13 questionnaire items on lifestyles, four lifestyle factors are extracted. Based on the survey results, four types of consumer players are introduced into the simulation to analyze which services are selected in the service system. Through this simulation, four convergent patterns are obtained, with consumers of different types included according to the types of services. With these results, this paper presents a discussion of the design of a new service ecosystem through a comparison between the obtained convergent solutions and existing business models.
Energy Communities (ECs) are confronted by diverse and intricate challenges concerning sustainability development goals and climate change awareness. This demonstration introduces En-join, a speculative game that addresses these complexities by using Large Language Models (LLMs) to engage players in negotiating solutions for such challenges. En-join demonstrates a novel approach by integrating an LLM as a dual-agent, serving simultaneously as a narrative guide and evaluator, to simulate EC dynamics. Players interact with LLM-powered Non-Player Characters (NPCs) to navigate open-ended scenarios, promoting reflection on sustainability and community participatory decisions on their own resources, alongside pro-social behaviors. This work highlights the potential of LLMs as mediators in serious games, fostering engagement and critical thinking on sustainable energy practices.
We present APT, an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex and creative structures within the Minecraft environment. Unlike previous approaches that primarily concentrate on skill-based open-world tasks or rely on image-based diffusion models for generating voxel-based structures, our method leverages the intrinsic spatial reasoning capabilities of LLMs. By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints that the agent can execute under zero-shot or few-shot learning scenarios. Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process. To rigorously evaluate the agent's performance in this emerging research area, we introduce a comprehensive benchmark consisting of diverse construction tasks designed to test creativity, spatial reasoning, adherence to in-game rules, and the effective integration of multimodal instructions. Experimental results using various GPT-based LLM backends and agent configurations demonstrate the agent's capacity to accurately interpret extensive instructions involving numerous items, their positions, and orientations. The agent successfully produces complex structures complete with internal functionalities such as Redstone-powered systems. A/B testing indicates that the inclusion of a memory module leads to a significant increase in performance, emphasizing its role in enabling continuous learning and the reuse of accumulated experience. Additionally, the agent's unexpected emergence of scaffolding behavior highlights the potential of future LLM-driven agents to utilize subroutine planning and leverage the emergence ability of LLMs to autonomously develop human-like problem-solving techniques.
No abstract available
Actions that affect knowledge asymmetrically between agents occur in numerous domains, from card games such as poker to the secure transmission of information. Applications in such domains often depend on reflection over knowledge, including what an agent knows about what other agents know. We are interested in enabling formal specification of these systems which may be used for executable prototyping as well as verification and other formal reasoning. Dynamic Epistemic Logic provides a formal basis for such reasoning, but is often prohibitively cumbersome to use in practice. We present an implementation and macro system called Ostari, backed by a particular flavor of Dynamic Epistemic Logic, which allows us to scale the ideas to more realistic problems. We demonstrate how actions that manipulate agents' beliefs can be written concisely and how this capability can be applied to modeling a popular card game by utilizing our system's ability to execute action sequences, answer queries about knowledge states, and find action sequences to satisfy a particular goal.
No abstract available
In this paper, we present a family of control-stopping games that arise naturally in equilibrium-based models of market microstructure as well as in other models with strategic buyers and sellers. A distinctive feature of this family of games is the fact that the agents do not have any exogenously given fundamental value for the asset, and they deduce the value of their position from the bid and ask prices posted by other agents (i.e., they are pure speculators). As a result, in such a game, the reward function of each agent at the time of stopping depends directly on the controls of other players. The equilibrium problem leads naturally to a system of coupled control-stopping problems (or, equivalently, reflected-backward stochastic differential equations), in which the individual reward functions (or reflecting barriers) depend on the value functions (or solution components) of other agents. The resulting system, in general, presents multiple mathematical challenges because of the nonstandard form of coupling (or reflection). In the present case, this system is also complicated by the fact that the continuous controls of the agents, describing their posted bid and ask prices, are constrained to take values in a discrete grid. The latter feature reflects the presence of a positive tick size in the market, and it creates additional discontinuities in the agents’ reward functions (or reflecting barriers). Herein we prove the existence of a solution to the associated system in a special Markovian framework, provide numerical examples, and discuss the potential applications.
We present a modified naming game by introducing weights of words in the evolution process. We assign the weight of a word spoken by an agent according to the agent's connectivity, a natural reflection of its influence in the population. A tunable parameter is introduced, governing the word weight based on the connectivity of agents. We consider a scale-free topology and concentrate on the efficiency of reaching final consensus, which is of high importance in self-organized systems. Interestingly, we find that there exists an optimal parameter value leading to the fastest convergence, indicating that appropriate hub effects favor the achievement of consensus. The evolution of distinct words helps to give a qualitative explanation of this phenomenon. Similar nontrivial phenomena are observed in the total memory of agents, with a peak in the middle range of parameter values. Other relevant characteristics are provided as well, including the time evolution of total memory and success rate for different parameter values as well as the average degree of the network, which are helpful for understanding the dynamics of the modified naming game in detail.
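One plausible reading of the weighted naming game above, sketched as a simulation (the exact weighting rule in the paper may differ; here a word spoken by an agent carries weight degree^alpha, with alpha the tunable parameter):

```python
import random
import networkx as nx

alpha = 0.5
G = nx.barabasi_albert_graph(200, 3, seed=1)   # scale-free topology
inventory = {v: {} for v in G}                 # per-agent word -> weight

def play_round():
    speaker = random.choice(list(G))
    listener = random.choice(list(G[speaker]))
    if not inventory[speaker]:                 # invent a word if inventory empty
        inventory[speaker][f"w{speaker}"] = 1.0
    word = max(inventory[speaker], key=inventory[speaker].get)
    weight = G.degree[speaker] ** alpha        # degree-based word weight
    if word in inventory[listener]:            # success: both collapse to word
        inventory[speaker] = {word: weight}
        inventory[listener] = {word: weight}
    else:                                      # failure: listener learns it
        inventory[listener][word] = weight

for _ in range(100_000):
    play_round()
leading = {max(inv, key=inv.get) for inv in inventory.values() if inv}
print("distinct leading words:", len(leading))
```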
Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect-information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only the game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. ToM planning consistently improves their performance against opposing agents, suggesting an ability to understand the actions of allies and adversaries and to establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.
What strategies do people adopt when deciding how to delegate tasks to agents when the agents' reputation and productivity information is available? How effective are these strategies under different conditions? These questions are important since they have significant implications for ongoing research on reputation-aware task delegation in multi-agent systems (MASs). In this paper, we conduct an empirical study to address these research questions by providing a gamified mechanism for people to report the reputation-aware task delegation strategies they adopt. The findings from this empirical study may help MAS researchers develop multi-agent trust evaluation testbeds with more realistic simulated human behaviours.
Understanding how human beings delegate tasks to trustees when presented with reputation information is important for building trust models for human-agent collectives. However, there is a lack of suitable platforms for building large scale datasets on this topic. We describe a demonstration of a multi-agent game for training students in the practice of Agile software engineering. Through interacting with agent competitors in the game, the behavior data related to users' decision-making process under uncertainty and resource constraints are collected in an unobtrusive fashion. These data may provide multi-agent trust researchers with new insight into the human decision-making process, and help them benchmark their proposed models.
No abstract available
Agent-based models (ABMs) are valuable for modelling complex systems; however, they are often manually specified and lack behavioural and/or environmental adaptation. In this work, we develop a generic two-layer framework for ADaptive AGEnt based modelling (ADAGE) to address this. ADAGE formalises the bi-level problem of agent and environment adaptation as a Stackelberg game, providing a consolidated framework for adaptive ABMs. We demonstrate how ADAGE encapsulates several modelling tasks, such as policy design, calibration, scenario generation, and robust behavioural learning, under one unified framework. We provide example simulations in various environments, showing the flexibility of ADAGE while addressing long-standing critiques of ABMs.
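The Stackelberg formalization above is a bi-level loop: an outer leader (the environment or policy designer) proposes parameters, inner followers (the agents) adapt, and the leader updates against the adapted response. A schematic sketch with stub objectives; all names and functional forms are placeholders, not ADAGE's API:

```python
import numpy as np

def follower_best_response(env_param: float) -> float:
    # Stub: agents adapt their behaviour to the current environment parameter.
    return float(np.clip(1.0 - env_param, 0.0, 1.0))

def leader_objective(env_param: float, behaviour: float) -> float:
    # Stub: e.g., a policy-design target the leader wants to maximize.
    return -(env_param - 0.3) ** 2 - (behaviour - 0.5) ** 2

best, best_val = None, -np.inf
for env_param in np.linspace(0, 1, 101):           # leader search (grid here)
    behaviour = follower_best_response(env_param)  # inner-level adaptation
    val = leader_objective(env_param, behaviour)
    if val > best_val:
        best, best_val = env_param, val
print("leader's choice:", round(best, 2))          # 0.4 for these stubs
```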
Cyber defense requires automating defensive decision-making under stealthy, deceptive, and continuously evolving adversarial strategies. The FlipIt game provides a foundational framework for modeling interactions between a defender and an advanced adversary that compromises a system without being immediately detected. In FlipIt, the attacker and defender compete to control a shared resource by performing a Flip action and paying a cost. However, existing FlipIt frameworks rely on a small number of heuristics or specialized learning techniques, which can lead to brittleness and an inability to adapt to new attacks. To address these limitations, we introduce PoolFlip, a multi-agent gym environment that extends the FlipIt game to allow efficient learning for attackers and defenders. Furthermore, we propose Flip-PSRO, a multi-agent reinforcement learning (MARL) approach that leverages population-based training to train defender agents equipped to generalize against a range of unknown, potentially adaptive opponents. Our empirical results suggest that Flip-PSRO defenders are twice as effective as baselines at generalizing to a heuristic attack not seen in training. In addition, our newly designed ownership-based utility functions ensure that Flip-PSRO defenders maintain a high level of control while optimizing performance.
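The underlying FlipIt round is easy to simulate; a toy sketch with periodic strategies (the horizon, costs, and flip times are illustrative, not PoolFlip's configuration):

```python
import numpy as np

def flipit_payoffs(def_times, atk_times, horizon=100.0, cost=5.0):
    """Both players flip a shared resource at chosen times, paying a cost per
    flip; payoff is total time in control minus total flip cost. The defender
    starts in control, and flips are stealthy (unobserved by the opponent)."""
    events = sorted([(t, "D") for t in def_times] + [(t, "A") for t in atk_times])
    owner, last_t = "D", 0.0
    control = {"D": 0.0, "A": 0.0}
    for t, player in events:
        control[owner] += t - last_t
        owner, last_t = player, t          # the flipping player takes control
    control[owner] += horizon - last_t
    return (control["D"] - cost * len(def_times),
            control["A"] - cost * len(atk_times))

# Periodic defender (every 10) vs. a sparser periodic attacker (every 25):
print(flipit_payoffs(np.arange(10, 100, 10), np.arange(25, 100, 25)))
```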
The growing complexity and dynamism of modern computational environments demand more adaptive and intelligent methodologies for strategic decision-making, especially in multi-agent systems. These systems, consisting of multiple autonomous agents interacting in shared environments, pose challenges that traditional models struggle to address. Conventional approaches to strategy optimization in game-theoretic contexts typically rely on static models and predefined payoff matrices. While effective in controlled settings, these models lack the flexibility to reflect the adaptive and interactive behavior of agents in real-world scenarios. They often ignore bounded rationality, where agents have limited knowledge or computational resources, and thus fail to capture the full complexity of dynamic environments. In contrast, emerging techniques such as reinforcement learning, evolutionary computation, and stochastic games offer more dynamic and responsive alternatives. These approaches can adapt to changing conditions, learn from interactions, and generalize across different strategic contexts. They provide a more accurate reflection of real-world scenarios where agents must constantly revise strategies based on limited or evolving information. Such advanced methodologies are critical for practical applications in areas like autonomous systems, economic modeling, distributed control, and cybersecurity. They enable the development of systems capable of robust decision-making in uncertain, multi-agent environments. Embracing these innovations will be key to addressing the limitations of static models and realizing the full potential of intelligent computational strategies in dynamic and complex systems.
Intelligent agents designed for interactive environments face significant challenges in text-based games, a domain that demands complex reasoning and adaptability. While agents based on large language models (LLMs) using self-reflection have shown promise, they struggle when initially successful and exhibit reduced effectiveness when using smaller LLMs. We introduce Sweet&Sour, a novel approach that addresses these limitations in existing reflection methods by incorporating positive experiences and managed memory to enrich the context available to the agent at decision time. Our comprehensive analysis spans both closed- and open-source LLMs and demonstrates the effectiveness of Sweet&Sour in improving agent performance, particularly in scenarios where previous approaches fall short.
Large Language Models (LLMs) are pivotal AI agents in complex tasks but still face challenges in open decision-making problems within complex scenarios. To address this, we use the language logic game "Who is Undercover?" (WIU) as an experimental platform to propose the Multi-Perspective Team Tactic (MPTT) framework. MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimensional thinking, and self-perception in complex scenarios. By alternating speaking and voting sessions and integrating techniques such as self-perspective, identity determination, self-reflection, self-summary, and multi-round teammate identification, LLM agents make rational decisions through strategic concealment and communication, fostering human-like trust. Preliminary results show that MPTT, combined with WIU, leverages LLMs' cognitive capabilities to create a decision-making framework that can simulate real society. This framework aids minority groups in communication and expression, promoting fairness and diversity in decision-making. Additionally, our human-in-the-loop experiments demonstrate that LLMs can learn and align with human behaviors through interaction, indicating their potential for active participation in societal decision-making.
We introduce and study a group formation game in which individuals form groups so as to achieve high collective strength. This strength could be group identity, reputation, or protection, and is equally shared by all group members. The group's strength is derived from its access to resources possessed by its members, and is traded off against the geographic dispersion of the group; spread-out groups are costlier to maintain. We seek to understand the properties of stable groupings in such a setting. We define several types of equilibria, where a member wishing to join a new group requires the acceptance of that group, and may further require permission to leave its current group. We show that under natural assumptions on the group utility functions, some of these equilibria always exist, and that any sequence of improving deviations by agents (or subsets of agents in the same group) converges to an equilibrium. In characterizing the properties of these equilibria, we show that an "encroachment" relationship (which groups have members in the territory of other groups) always gives rise to a directed acyclic graph (DAG). We relate our model to observations of well-established groups in a real-world dataset.
This paper develops a game-theoretic model and an agent-based model to study group formation driven by resource pooling, spatial cohesion, and heterogeneity. We focus on cross-sector partnerships (CSPs) involving public, private, and nonprofit organizations, each contributing distinct resources. Group formation occurs as agents strategically optimize their choices in response to others within a competitive setting. We prove the existence of stable group equilibria and simulate formation dynamics under varying spatial and resource conditions. The results show that limited individual resources lead to groups that form mainly among nearby actors, while abundant resources allow groups to move across larger distances. Increased resource heterogeneity and spatial proximity promote the formation of larger and more diverse groups. These findings reveal key trade-offs shaping group size and composition, guiding strategies for effective cross-sector collaborations and multi-agent systems.
No abstract available
Cooperative Control of Multi-UAV for Multi-Targets Encirclement and Tracking Based on Potential Game
This paper investigates cooperative control of multiple unmanned aerial vehicles (multi-UAV) for multi-target encirclement and tracking. Unlike existing works, target clustering and task assignment are considered in this paper. Targets are clustered using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), with noise points either added as a new cluster or joined to an adjacent cluster. When the distance between two clusters is relatively small, they are merged into a cluster group, which avoids wasting UAV resources. A potential game task allocation algorithm enables the UAVs to reach Nash equilibria and obtain the optimal allocation strategy in a small number of iterations. The encirclement and tracking guidance law is a coupled control method that makes the UAV formation converge quickly and maintain stability. Finally, the effectiveness of the control strategy is verified by simulation.
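The clustering step described above, sketched with scikit-learn's DBSCAN: noise points either join the nearest cluster when one is close enough or form their own cluster (the thresholds and target positions are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
targets = np.vstack([rng.normal(0, 1, (10, 2)),    # two dense target groups
                     rng.normal(8, 1, (10, 2)),
                     [[20.0, 20.0]]])              # one isolated target
labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(targets)

next_label = labels.max() + 1
for i in np.where(labels == -1)[0]:                # DBSCAN noise points
    clustered = labels != -1
    if clustered.any():
        d = np.linalg.norm(targets[clustered] - targets[i], axis=1)
        if d.min() < 4.0:                          # join the adjacent cluster
            labels[i] = labels[np.where(clustered)[0][d.argmin()]]
            continue
    labels[i] = next_label                         # become a new cluster
    next_label += 1
print(labels)
```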
No abstract available
This paper studies strategic group formation for local anomaly detection with potential applications to Cognitive Radio Networks (CRN) and the Internet-of-Things (IoT). The problem comprises multiple local anomaly detection tasks which use machine learning (ML) models and partial data. We consider a two-layer network structure with anomaly detectors in the lower layer acting as local anomaly detectors and central nodes at the upper layer as data aggregators, which train the ML models used by the local anomaly detectors. The problem is addressed using a strategic (non-cooperative) game formulation, where all central nodes and detectors are players. The players interactively learn one or multiple optimal machine learning models for their dynamically identified local anomaly detection problems. The game is then formulated as a successive optimization problem and solved using the players' best responses to compute a Nash equilibrium. Under mild conditions, we prove that this group formation game is also an exact potential game. Experimental results are consistent with the theoretical ones and show fast convergence to the solution.
Coordination and cooperation between humans and autonomous agents in cooperative games raise interesting questions about human decision making and behaviour changes. Here we report our findings from a group formation game in a small-world network with different mixes of human and agent players, aiming to achieve connected clusters of the same colour by swapping places with neighbouring players using non-overlapping information. In the experiments, the human players are incentivized with rewards to prioritize their own cluster, while the agents' decision-making model is derived from our previous experiment on a purely cooperative game between human players. The experiments were performed by grouping the players in three different setups to investigate the overall effect of having cooperative autonomous agents within teams. We observe that the human subjects adjust to autonomous agents by being less risk averse, while keeping the overall performance efficient by splitting their behaviour into selfish and cooperative actions performed during the rounds of the game. Moreover, results from two hybrid human-agent setups suggest that group composition affects the evolution of clusters. Our findings indicate that in purely or less cooperative settings, providing more control to humans could help maximize the overall performance of hybrid systems.
No abstract available
With the rapid development of unmanned aerial vehicle (UAV) manufacturing technology, large-scale UAV swarm ad hoc networks are becoming widely used in military and civilian spheres. UAV swarms equipped with ad hoc networks and satellite networks are being developed for 6G heterogeneous networks, especially in offshore and remote areas. A key operational aspect of large-scale UAV swarm networks is slot allocation for large capacity and a low probability of conflict. Traditional methods typically form coalitions among UAVs that are in close spatial proximity to reduce internal network interference, thereby achieving greater throughput; however, significant internal interference still persists. Given that UAV networks must transmit a substantial amount of safety-related control information, any packet loss due to internal interference can easily pose potential risks. In this paper, we propose a distributed time coalition formation game algorithm that ensures the absence of internal interference and collisions while sharing time slot resources, thereby enhancing the network's throughput performance. Instead of forming a coalition from UAVs within a contiguous block area as in prior studies, UAV nodes that do not interfere with each other form a coalition, called a time coalition. UAVs belonging to one coalition share their transmitting slots with each other, so every UAV node gains access to all the transmitting slots of its coalition members and they can transmit data packets simultaneously without interference. In addition, a distributed coalition formation game-based TDMA (DCFG-TDMA) protocol based on the distributed time coalition formation algorithm is designed for UAV swarm ad hoc networks. Our simulation results verify that the proposed algorithm significantly improves UAV throughput compared with the conventional TDMA protocol.
In this paper, we introduce third-party intelligent reflecting surfaces (TIRSs) into a physical-layer-security-aware wireless communication system, where a central legitimate transmitter is designed to transmit secret signals to a group of legitimate receivers in the presence of the threat from an active eavesdropper (EV). Due to the channel-reshaping ability of TIRSs, they are able not only to help legitimate pairs (LPs) enhance the secure transmission rate but also to assist the EV in improving its eavesdropping performance. Furthermore, out of potential selfishness, TIRSs may dynamically choose to ally with LPs or the EV in exchange for potential benefits (e.g., payoffs). This leads to complex dynamic ally-adversary relationships among LPs, the EV, and TIRSs under unpredictable wireless channel conditions. To address this issue, we formulate a repeated coalition formation game (RCFG) with dynamic decision-making to model the long-term strategic interactions among LPs, the EV, and TIRSs. In particular, we theoretically analyze the existence of a Nash equilibrium in the formulated RCFG, and then propose a switch operations-based coalition selection along with a deep reinforcement learning (DRL)-based approach for obtaining such an equilibrium. Simulations examine the feasibility of the proposed approach and show its superiority over counterparts.
Edge computing is an efficient way to assist constrained IoT devices by offloading heavy tasks to edge servers, especially computing tasks related to machine learning (ML). Moreover, such devices can only store a limited amount of data because of their reduced capacity. Consequently, ML on these devices is bound to suffer relatively high prediction error, as they resort to a small training dataset for their learning. To address this issue, IoT devices can group into clusters and resort to federated learning (FL) with their peers in the same cluster or coalition. However, the learned model needs to be transmitted repeatedly over a wireless access network, which consumes energy. Hence, although learning collectively through FL can reduce the variance of the learned model, it inflicts a communication cost, dependent on the coalition size, that must be taken into account. Therefore, we devise a cost function that factors in both the prediction error and the communication cost within a learning cluster. Then, a coalition formation game is conceived to minimize this cost function. Autonomous IoT devices engage in the proposed game, leading to coalitions of optimal size. Once clusters are formed, distributed FL is applied in each cluster in order to reduce the learning error of participating devices while curbing their communication cost.
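The cost trade-off above, under assumed functional forms (prediction error falling as a/n with coalition size n, communication cost growing as b·n; both the forms and the constants are placeholders, not the paper's):

```python
import numpy as np

def coalition_cost(n: np.ndarray, a: float = 100.0, b: float = 1.0) -> np.ndarray:
    return a / n + b * n        # learning-error term + communication term

sizes = np.arange(1, 51)
costs = coalition_cost(sizes)
print("optimal coalition size:", sizes[costs.argmin()])  # ~sqrt(a/b) = 10 here
```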
In this article, adaptive group time-varying formation tracking (TVFT) control problems for unmanned swarm systems are investigated based on networked distributed games. To describe the competition that commonly exists between different subgroups, a novel adaptive group TVFT control framework is designed. Specifically, the group leaders are required to converge to the Nash equilibrium (NE) of the intergroup game, while the followers within each group simultaneously realize TVFT relative to their group leader. In addition, benefiting from the adaptive mechanism, the dependence of controller parameters on system scale, interaction topology, and cost function is eliminated. By constructing a Lyapunov function with both the NE seeking error and the TVFT error, the stability of the coupled closed-loop swarm system is proved. An experiment platform with unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) is constructed, and both simulation and experiment results are provided to demonstrate the effectiveness of the proposed controller.
Citizen Broadband Radio Service (CBRS) uses the spectrum of the 3.5 GHz band. It allows general authorized access (GAA) users to access the spectrum through the spectrum access system (SAS) while ensuring no interference with incumbent users and priority access license (PAL) holders. This work proposes novel channel allocation schemes for GAA users, implemented through the SAS. The proposed schemes are based on a cooperative game-theoretic approach called the hedonic coalition formation (HCF) game and on a genetic algorithm (GA). In the former, each coalition represents a group of users sharing the same channel within the 3.5 GHz band, and coalitions are formed to minimize cochannel interference (CCI). In the latter, the GA is utilized to find a suboptimal channel allocation. This work demonstrates the effectiveness of the proposed schemes in comparison with Approach 1 in the standard.
In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) is investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (EV), while there exist a number of third-party IRSs (TIRSs) that can choose to form a coalition with either the legitimate pairs (LPs) or the EV to improve their respective performances in exchange for potential benefits (e.g., payments). Unlike existing works that are commonly restricted to friendly IRSs or malicious IRSs only, we study the complicated dynamic ally-adversary relationships among the LPs, the EV, and the TIRSs under unpredictable wireless channel conditions, and introduce an RCFG to model their long-term strategic interactions. In particular, we first analyze the existence of a Nash equilibrium (NE) in the formulated RCFG, and then propose a switch operations-based coalition selection along with a deep reinforcement learning (DRL)-based algorithm for obtaining such an equilibrium. Simulations examine the feasibility of the proposed algorithm and show its superiority over counterparts.
In this paper, we propose a context-aware group-buying mechanism to reduce users' data cost based on content similarity. Each user's cost is formulated as the combination of a content-aware data cost and a location-aware sharing cost: the data cost is the payment for using the spectrum owner's channel to download files, and the sharing cost is the energy and time spent transmitting files within the coalition. Compared with downloading data alone, users prefer to form groups, download the traffic data first, and then share it within the group to achieve a lower cost. The cost-reduction problem under the group-buying mechanism is modeled as a coalition formation game (CFG). Besides the traditional Pareto order, a coalition order maximizing the coalition's benefit and a selfish order maximizing each user's own benefit are proposed. The CFGs under the two proposed orders are each proved to be potential games, and the existence of stable coalition partitions is guaranteed by their Nash equilibria. A cooperative exchange mechanism is designed, in which users can make decisions cooperatively to achieve better performance. Simulation results show that context-aware group buying reduces the cost and improves the benefit significantly compared with the situation without context awareness, and that both proposed orders outperform the Pareto order.
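To make the three switch rules concrete, here is a minimal sketch of the admission predicates; the signatures (member_gains and the benefit arguments) are illustrative assumptions, not the paper's notation:

```python
def pareto_order(member_gains):
    """Pareto order: a switch is allowed only if no affected user loses
    and at least one strictly gains."""
    return all(g >= 0 for g in member_gains) and any(g > 0 for g in member_gains)

def coalition_order(old_total_benefit, new_total_benefit):
    """Coalition order: allowed if the switch raises the destination
    coalition's total benefit."""
    return new_total_benefit > old_total_benefit

def selfish_order(my_old_benefit, my_new_benefit):
    """Selfish order: allowed if the moving user's own benefit strictly
    increases, regardless of the effect on others."""
    return my_new_benefit > my_old_benefit
```

Looser predicates admit more switches, which is one intuition for why the coalition and selfish orders can escape partitions where the Pareto order is already stuck.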
Unmanned aerial vehicles (UAVs) enable promising solutions for assisting data collection in wide-area distributed sensor networks, leveraging their high mobility and line-of-sight communication links. However, existing UAV-assisted data collection methods mainly focus on unilaterally maximizing the utility of either the UAVs or the sensors; the market-driven problem, namely the buyer-seller game that arises as sensors compete for UAV services, is ignored. To address this problem, we propose a group-buying coalition auction method that encourages sensors to form coalitions to bid for UAV data collection services. A parallel variable neighborhood ascent search algorithm is then designed to quickly search for an approximately optimal group-buying coalition structure. We further propose a novel group-buying coalition auction method, named TRUST, which ensures the economic properties of truthfulness, individual rationality, and maximization of social welfare. Numerical results show that the sensors' average age of information (AoI) under the proposed method is reduced by 16.7% and 44.5% compared with the coalition formation game (CFG) and joint trajectory design-task scheduling (TDTS) UAV-to-community methods, respectively. To the best of our knowledge, this is the first effort on a truthful coalition-formation-based group-buying auction.
Blockchain has introduced a new era for online payment services and its economy with tamper-proof cryptocurrencies. However, blockchain, which is based on global peer-to-peer networks, has its limitations due to payment delays from global consensus and transaction costs for maintenance. Thus, payment channel networks (PCN) have been proposed as one of the most promising off-chain solutions, allowing users to pay directly through payment channels (PC), with minimal blockchain involvement. However, payment delays and cost problems still exist, especially given the large size of the PCN. This study proposes a multiparty payment channel (MPC) that enables multiple users to join the same PC and exchange payment transactions, compared to the legacy PC. To avoid a consensus procedure among users in the PC, we introduce sequential and parallel updates for the PC status. Since increasing the MPC size limits the advantages in terms of the delay and cost, we propose a distributed coalition formation algorithm to form the MPC group, in which each user has the choice to join or leave the group. Simulations show that the proposed algorithm establishes MPCs successfully, considering the trade-off between the payoff gain and the MPC delay cost.
Spectrum resources are costly to purchase due to their scarcity in D2D networks. Although several group-buying strategies have been introduced to enhance the buying power of secondary users (SUs), users' own specific resource requirements were not considered. To reduce the overhead of spectrum acquisition, this article presents a context-aware group-buying approach and models the problem as an overlapping coalition formation (OCF) game based on users' data (spectrum resource) requirements. The proposed OCF game is proved to admit a stable coalition partition through its Nash equilibrium (NE). We then introduce a context-aware algorithm based on SAP; simulation results show that the proposed method outperforms competing methods.
Facing the challenge of statistical diversity in clients' local data distributions, personalized federated learning (PFL) has become a growing research hotspot. Although state-of-the-art methods with model similarity-based pairwise collaboration have achieved promising performance, they neglect the fact that model aggregation is essentially a collaboration process within a coalition, where complex multiwise influences take place among clients. In this paper, we first apply the Shapley value (SV) from coalition game theory to the PFL scenario. To measure the multiwise collaboration among a group of clients on personalized learning performance, the SV takes their marginal contribution to the final result as a metric. We propose a novel personalized algorithm, pFedSV, which can (1) identify each client's optimal collaborator coalition and (2) perform personalized model aggregation based on the SV. Extensive experiments on various datasets (MNIST, Fashion-MNIST, and CIFAR-10) are conducted with different non-IID data settings (pathological and Dirichlet). The results show that pFedSV achieves superior personalized accuracy for each client compared to state-of-the-art benchmarks.
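As a reference point for the marginal-contribution idea, here is an exact Shapley value computation over a toy characteristic function; the accuracy numbers are fabricated for illustration, and realistic client counts would require Monte Carlo sampling of join orders instead of full enumeration:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all join orders. `value` maps a frozenset of clients to the
    coalition's personalized-learning performance."""
    sv = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            sv[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in sv.items()}

# Toy characteristic function: accuracy achieved by each client coalition
# (illustrative numbers only).
acc = {frozenset(): 0.0,
       frozenset('A'): 0.60, frozenset('B'): 0.50, frozenset('C'): 0.40,
       frozenset('AB'): 0.80, frozenset('AC'): 0.65, frozenset('BC'): 0.70,
       frozenset('ABC'): 0.85}
print(shapley_values(['A', 'B', 'C'], lambda s: acc[s]))
```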
Non-orthogonal multiple access (NOMA) in satellite communication (SATCOM) systems can bring high spectral efficiency and massive connectivity. In this paper, we investigate file delivery and sharing optimization in LEO satellite-terrestrial integrated networks (STINs). A NOMA-based coalition formation game (CFG) approach is proposed for minimizing the total cost, in which the satellite transfers files to the head users of each group via NOMA, and the head users then share files among users via device-to-device (D2D) communications. First, a head-user selection algorithm is proposed to choose the head users that receive files from the satellite in each group. Precoding vectors and transmit power are optimized under imperfect channel state information (CSI) to achieve successful NOMA transmission and minimize the file download cost in each group. A graph theory-based algorithm is proposed to find the optimal D2D links and minimize the file sharing cost. We then formulate a CFG and propose a preference order (the Group-Best order) for user grouping. Furthermore, we prove that the CFG with the Group-Best order is an exact potential game (EPG), which admits a stable group partition and achieves the global optimum, i.e., the minimum total cost. Finally, a best response algorithm for the NOMA-based CFG (NCFG) is proposed to find a stable group partition. Simulation results verify that the proposed approach outperforms alternative approaches.
Cooperative spectrum sensing improves the sensing performance of secondary users by exploiting spatial diversity in cognitive radio networks. However, the cooperation of secondary users also introduces overhead that may degrade the overall performance of cooperative spectrum sensing, so the trade-off between cooperation gain and overhead plays a vital role in modeling it. This paper considers overhead in terms of reporting energy and reporting time. We propose a coalitional game model for cooperative spectrum sensing in which the utility of the game is formulated as a function of throughput gain and overhead. To achieve a reasonable average throughput for secondary users, the incurred overhead must be optimized, and this work emphasizes that optimization. In cooperative spectrum sensing, a large number of cooperating users improves detection performance but, at the same time, increases overhead. To limit the maximum coalition size, we therefore propose a formulation under a constraint on the probability of false alarm. An efficient fusion center selection scheme and an algorithm to select eligible secondary users for reporting are proposed to reduce the reporting overhead. We also outline a distributed cooperative spectrum sensing algorithm using the properties of the coalition formation game and prove that the utility of the proposed game is non-transferable. The simulation results show that the proposed schemes reduce the reporting overhead without compromising the overall detection performance of cooperative spectrum sensing.
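A toy utility of this gain-versus-overhead form is easy to write down; the OR-rule fusion, the linear reporting cost, and every parameter below are assumptions for illustration, not the paper's model:

```python
def coalition_utility(size, pd_single=0.6, pf_single=0.05,
                      report_cost=0.02, pf_max=0.3):
    """Throughput-style gain minus reporting overhead for a sensing
    coalition of `size` users, under OR-rule decision fusion."""
    pd = 1 - (1 - pd_single) ** size   # cooperative detection probability
    pf = 1 - (1 - pf_single) ** size   # cooperative false-alarm probability
    if pf > pf_max:                    # false-alarm constraint caps the size
        return float('-inf')
    throughput_gain = pd * (1 - pf)    # detect when busy, transmit when idle
    overhead = report_cost * size      # reporting energy/time grows linearly
    return throughput_gain - overhead

best_size = max(range(1, 11), key=coalition_utility)
print(best_size)  # a moderate coalition (3 here) beats both extremes
```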
Non-orthogonal multiple access (NOMA) is regarded as a promising technology to provide high spectral efficiency and support massive connectivity in 5G systems. In most existing NOMA user grouping approaches, users are grouped into disjoint groups, which may lead to a waste of power resources within each NOMA group. Motivated by this, in this paper we propose a novel generalized user grouping (GuG) concept for NOMA from an overlapping perspective, which allows each user to participate in multiple groups subject to an individual maximum power constraint. To achieve effective GuG and maximize the system sum rate, we formulate a joint power control and GuG optimization problem. We then address this problem by exploiting the overlapping coalition formation (OCF) game framework, and further propose an OCF-based algorithm in which users self-organize into a desirable overlapping coalition structure. Simulation results verify the efficiency of GuG in NOMA systems and indicate that, compared with traditional NOMA user grouping schemes, the proposed OCF-based GuG NOMA scheme achieves significant gains in system sum rate.
With the rapid development of sensor technology and mobile services, the service model of mobile crowd sensing (MCS) has emerged, in which user groups collect data through the mobile terminal devices they carry, thereby completing large-scale, distributed tasks. Task allocation is an important component of MCS, but the interests of task publishers, users, and platforms often conflict. To improve the performance of MCS task allocation, this study therefore proposes a repeated overlapping coalition formation game MCS task allocation scheme based on multiple-objective particle swarm optimization (ROCG-MOPSO). The overlapping coalition formation (OCF) game model describes the resource allocation relationship between users and tasks, and two game strategies are designed that allow users to form overlapping coalitions for different sensing tasks. Multi-objective optimization, in turn, considers multiple interests simultaneously, so we use the multi-objective particle swarm optimization algorithm to adjust the parameters of the OCF, better balancing the interests of task publishers, users, and platforms and thereby obtaining a better task allocation scheme. To verify the effectiveness of ROCG-MOPSO, we conduct experiments on a dataset and compare the results with schemes from the related literature. The experimental results show that ROCG-MOPSO performs well on key performance indicators such as average user revenue, platform revenue, task completion rate, and users' average surplus resources.
Spatial Crowdsourcing (SC), which outsources location-dependent tasks to workers for physical completion, is gaining popularity. Recently, more complex tasks have emerged that require a group of workers collaborating in a coalition. Several pioneering studies have examined this issue using the server assigned tasks mode from an overall perspective, such as maximizing the total benefits of all workers. Unfortunately, maximizing the overall benefit does not necessarily align with maximizing individual benefits. In practice, crowd workers are often self-interested and autonomous, making decisions based on their personal perspectives. In this article, under the worker selected tasks mode, we investigate an important problem: Selfish Workers Coalition Formation (SWCF) problem in SC. Here, selfish workers autonomously form coalitions to accomplish tasks to maximize their individual benefits. Achieving a stable coalition formation for SWCF problem requires balancing cooperation and competition. First, we transform the SWCF problem into a hedonic coalition formation game using a devised exploited skills-based reward distribution model. Subsequently, we propose a distributed algorithm HCFTA and prove its Nash stability and performance bounds. Additionally, to enhance coalition formation efficiency, we propose a Markov blanket coloring parallel optimization algorithm MCPHCF. Extensive experiments demonstrate the superiority of the proposed methods on both synthetic and real-world datasets.
This paper studies controlling segregation in social networks via exogenous incentives. We construct an edge formation game on a directed graph. A user (node) chooses the probability with which it forms an inter- or intra-community edge based on a utility function that reflects the trade-off between homophily (the preference to connect with individuals of the same group) and the preference to obtain an exogenous incentive. The decisions users make to connect with each other determine the evolution of the social network. We explore an algorithmic recommendation mechanism where the exogenous incentive in the utility function is based on weak ties, which incentivizes users to connect across communities and mitigates segregation. This setting leads to a submodular game with a unique Nash equilibrium. In numerical simulations, we explore how the proposed model can be used to control segregation and echo chambers in social networks under various settings.
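One way to picture the homophily-versus-incentive trade-off is a toy per-node utility over the intra-community edge probability p; the functional form below (linear terms plus an entropy smoother) and all parameters are invented for illustration, not the paper's utility:

```python
import numpy as np

def node_utility(p, homophily=0.7, incentive=0.5, temperature=0.3):
    """Utility of forming intra-community edges with probability p:
    homophily rewards intra edges, the weak-tie incentive rewards inter
    edges, and the entropy term keeps the best response interior."""
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return homophily * p + incentive * (1 - p) + temperature * entropy

ps = np.linspace(0.01, 0.99, 99)
best_p = ps[np.argmax(node_utility(ps))]
print(best_p)  # raising `incentive` lowers best_p: more cross-community edges
```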
Space–Air–Ground–Sea Integrated Networks (SAGSINs) are emerging as a key enabling architecture for broadband maritime connectivity, where heterogeneous access tiers (shore, aerial, and satellite) jointly support delay-sensitive and mission-critical uplink traffic such as alarms, telemetry, and surveillance video. However, uplink resource scheduling in maritime SAGSINs remains challenging due to time-varying channels, locally bursty traffic, and intense contention, while centralized optimization is ill-suited, as global information collection is often delayed, incomplete, and inconsistent over long-haul maritime links. Therefore, this paper investigates distributed uplink scheduling in maritime SAGSINs, where maritime nodes jointly select the access tier, spectrum slice, and transmit power under interference, spectrum, deadline, and energy constraints. Specifically, we formulate the uplink resource scheduling as a cumulative value of information (VoI) maximization problem, and develop a game-theoretic distributed multi-agent reinforcement learning algorithm, termed GTMARL. Therein, maritime nodes learn transmission policies from local observations, coordinated through congestion prices broadcast by access nodes. These prices are derived from Lagrangian relaxation and act as coordination signals that align individual decisions with global objectives. To ensure stable operation, a two-timescale mechanism is adopted, where maritime nodes make fast slot-level transmission decisions, while access nodes adapt and broadcast congestion prices on a slower timescale. Extensive experiments demonstrate that GTMARL achieves up to 90% of the idealized upper bound, significantly outperforming baselines in deadline satisfaction, throughput, delay, energy efficiency and fairness under varying traffic loads and network densities.
Dynamic resource assignment is a decisive issue in new computing paradigms, including cloud, edge, and fog computing and the Internet of Things (IoT). These environments feature heterogeneous resources, decentralized control, and rapidly varying load, and therefore require adaptive and intelligent allocation schemes. This paper proposes a hybrid quantum-game-theoretic model that combines quantum optimization approaches with multi-agent game theory to solve multi-agent resource allocation problems in dynamic systems. Using the quantum concepts of superposition and entanglement, the framework can explore solution spaces in parallel and thereby increase convergence speed and allocation efficiency. A quantum game-theoretic model is developed that entails a price-motivated system to control the strategic interactions of rational agents while maintaining stability, autonomy, and equilibrium. The resource allocation problem is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solved with the Quantum Approximate Optimization Algorithm (QAOA). Simulation results show significant gains over classical baselines in terms of utility gain, fairness, and system throughput. The results highlight the potential of the quantum-game-theoretic paradigm as a fundamental engine for next-generation computing infrastructures, including networked cloud, edge, and IoT deployments with computing nodes in VPN-enabled networks.
Heterogeneous networks, which integrate diverse access nodes such as base stations, WiFi access points, and unmanned aerial vehicles (UAVs), provide an effective solution to the computational bottlenecks of mobile devices in edge computing. The differences in coverage, bandwidth, and channel characteristics among network nodes enable flexible resource allocation and expanded service coverage. This paper proposes a novel two-stage optimization framework that integrates evolutionary game theory with Multi-Agent Soft Actor-Critic (MASAC) for user association and task offloading in heterogeneous edge computing environments comprising base stations, WiFi APs, and UAVs. In the first stage, we model user association as an evolutionary game to reach a Nash equilibrium among users, enabling a stable and efficient allocation of users to access nodes. In the second stage, we formulate task offloading as a multi-agent reinforcement learning (MARL) problem and employ MASAC to optimize offloading decisions based on the user association results. Experimental results demonstrate that the proposed approach outperforms existing methods (MATD3, MADDPG), with a 38.5% improvement in reward, a 33.6% reduction in latency, and a 24.7% decrease in energy consumption compared to MATD3, and improvements of 75.1%, 58.1%, and 48.4%, respectively, compared to MADDPG.
This paper investigates multi-dimensional resource allocation in air-terrestrial integrated networks under jamming environments, where task requirements, the mobility of unmanned aerial vehicles (UAVs), the network topology, and anti-jamming spectrum access decisions are strongly coupled. The heterogeneous attributes of UAVs and ground users (GUs) further complicate the solution of the multi-dimensional optimization problem. The problem is therefore first formulated as a hierarchical game model comprising two layers: i) a high-layer partially observable stochastic game (POSG) for joint UAV trajectory optimization and GU channel selection, and ii) a low-layer coalition formation game (CFG) for managing dynamic network associations. It is proved, using exact potential game (EPG) theory, that the low-layer CFG converges to a stable coalition structure. In addition, we propose a novel multi-agent coalition formation learning framework and accordingly design a coalition-based heterogeneous multi-agent proximal policy optimization algorithm that dynamically decouples UAV movement, GU channel access, and adaptive UAV-GU associations under task-specific demand. Meanwhile, a novel metric named Quality of Task-completion (QoT) is proposed to quantify how well communication systems meet the heterogeneous demands of tasks, and a hierarchical reward mechanism is designed to enhance coordination between UAVs and GUs. Extensive simulation results demonstrate the superiority of our approach over existing benchmarks in QoT, jamming robustness, and energy efficiency.
Vehicular edge computing (VEC) is becoming a promising paradigm for the development of emerging intelligent transportation systems. Nevertheless, limited resources and massive transmission demands bring great challenges to implementing vehicular applications with stringent deadline requirements. This work presents a non-orthogonal multiple access (NOMA)-based architecture for VEC, in which heterogeneous edge nodes cooperate for real-time task processing. We derive a vehicle-to-infrastructure (V2I) transmission model that considers both intra-edge and inter-edge interference, and formulate a cooperative resource optimization (CRO) problem that jointly optimizes task offloading and resource allocation to maximize the service ratio. We then decompose the CRO into two subproblems: task offloading and resource allocation. In particular, the task offloading subproblem is modeled as an exact potential game (EPG), and a multi-agent distributed distributional deep deterministic policy gradient (MAD4PG) method is proposed to reach the Nash equilibrium. The resource allocation subproblem is divided into two independent convex optimization problems, and an optimal solution is derived using a gradient-based iterative method and the KKT conditions. Finally, we build a simulation model based on real-world vehicle trajectories and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solutions.
With the increasing diversity of use cases and service requirements in heterogeneous networks, the concept of network slicing has emerged. However, user association, distributed resource allocation, and the high-speed data rate demands of different users still face numerous challenges. To address these issues, we propose a UAV-assisted RAN resource slicing framework in heterogeneous networks. Firstly, we employ a stable matching game algorithm to solve the access problem between UAVs (unmanned aerial vehicles) and TBSs (terrestrial base stations). Secondly, we formulate a joint user association and slicing resource allocation problem. However, the optimization problem is non-convex, and the problem is decoupled into two sub-problems: user association and slicing resource allocation. Moreover, a Lagrangian dual algorithm is employed to solve the user association problem, while Multi-Agent Deep Deterministic Policy Gradient based on Matching Game and Lagrangian Dual (MADDPG-M&L) slicing resource allocation algorithm is proposed to determine the allocation ratio of resources for each slice. Simulation results show that the Lagrangian dual-based user association algorithm improves the system performance by 12.8%, 36.2% and 61.9% respectively compared to the other three user association methods. Furthermore, compared to MATD3-M&L, MASAC-M&L, and Hard-slicing, the proposed MADDPG-M&L algorithm improves the throughput by 36.3%, 105%, and 177%, respectively. In terms of latency, the improvements are 46%, 68%, and 86.7%, respectively. For SINR, the increases are 5.2%, 2.9%, and 6.4%, respectively. The objective function improves by 54.7%, 218%, and 336%, respectively, with the data transmission rate showing the most significant improvement.
Despite the success of game-theoretic models in security resource allocations against adversaries, existing works have fallen short in addressing the critical challenge of team defense with composable targets. Composable targets, commonly seen in cybersecurity practices like vulnerability analysis, consist of heterogeneous tasks that can be processed by different defenders. The intrinsic heterogeneity and potential precedence constraints among tasks present a great challenge to devising optimal defender strategies. In this paper, we propose a general-sum Stackelberg game model for team defense with composable targets. We develop SWING, a novel method that efficiently calculates optimal defense strategies by combining binary search, linear programming, and column generation. We prove that our algorithm calculates strong Stackelberg equilibrium (SSE), and that in practice, it is runtime-efficient at finding optimal strategies. To further enhance the applicability of SWING, we extend its capabilities to encompass defense tasks with precedence constraints. This is achieved by leveraging flexible job shop problem (FJSP) literature to devise a branch-and-bound-based method. Our empirical evaluations illustrate that this extension enhances runtime efficiency and substantially improves solution quality compared to baseline methods.
Vehicle-and-infrastructure cooperation is emerging in vehicle-to-everything (V2X) communication to increase traffic efficiency and road safety with advanced services. Static infrastructure such as roadside units (RSUs) has the potential to provide stable, high-quality communication services but suffers from overload problems caused by the uneven spatiotemporal distribution of vehicles. Unmanned aerial vehicles (UAVs), with their high flexibility, can establish line-of-sight (LoS) links but require extra scheduling overhead. Furthermore, scarce spectrum resources, complex interference, limited energy budgets, and vehicle mobility also pose significant challenges. In this paper, we combine mean-field game (MFG) theory with multi-agent reinforcement learning (MARL) to allocate resources for heterogeneous infrastructures in non-orthogonal multiple access (NOMA) V2X communication networks. First, a joint sub-band scheduling, transmit power allocation, and UAV deployment problem is addressed, aiming to jointly optimize communication resources for heterogeneous infrastructures under power and QoS constraints. Subsequently, considering the differences among nodes, multi-type agents are designed, and MARL is applied to endow them with self-learning and collaboration abilities. Moreover, MFG theory is employed to tackle the tremendous overhead caused by agent interactions in MARL. Finally, simulation results demonstrate that the proposed method outperforms two state-of-the-art resource allocation schemes in both average energy efficiency and probability of failure.
We consider a team of heterogeneous agents, which is collectively responsible for servicing, and subsequently reviewing a stream of homogeneous tasks. Each agent has an associated mean service time and a mean review time for servicing and reviewing the tasks, respectively. Agents receive a reward based on their service and review admission rates. The team objective is to collaboratively maximize the number of “serviced and reviewed” tasks. We formulate a common-pool resource game, and design utility functions to incentivize collaboration among heterogeneous agents in a decentralized manner. We show the existence of a unique pure Nash equilibrium (PNE), and establish convergence of the best response dynamics to this unique PNE. Finally, we establish an analytic upper bound on three inefficiency measures of the PNE, namely the price of anarchy, the ratio of the total review admission rate, and the ratio of latency.
We consider a team of heterogeneous agents that is collectively responsible for servicing and subsequently reviewing a stream of homogeneous tasks. Each agent (autonomous system or human operator) has an associated mean service time and mean review time for servicing and reviewing the tasks, respectively, which are based on their expertise and skill-sets. The team objective is to collaboratively maximize the number of "serviced and reviewed" tasks. To this end, we formulate a Common-Pool Resource (CPR) game and design utility functions to incentivize collaboration among team-members. We show the existence and uniqueness of the Pure Nash Equilibrium (PNE) for the CPR game. Additionally, we characterize the structure of the PNE and study the effect of heterogeneity among the agents at the PNE. We show that the formulated CPR game is a best response potential game for which both sequential best response dynamics and simultaneous best reply dynamics converge to the Nash equilibrium. Finally, we numerically illustrate the price of anarchy for the PNE.
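As a concrete picture of the sequential best-response dynamics mentioned above, here is a toy CPR-style game with heterogeneous agents; the concave-gain-minus-congestion utility and all parameters are illustrative stand-ins for the paper's designed utilities:

```python
import numpy as np

def best_response(i, x, a, grid=np.linspace(0.0, 1.0, 201)):
    """Agent i's best admission rate given the others' rates: a concave
    private gain a_i*log(1+x_i) minus congestion on the shared resource."""
    others = x.sum() - x[i]
    utils = a[i] * np.log1p(grid) - grid * (grid + others)
    return grid[np.argmax(utils)]

a = np.array([1.0, 1.5, 2.0])   # heterogeneous skill/expertise weights
x = np.zeros(3)                 # admission rates, initialized at zero
for _ in range(100):            # sequential best replies
    for i in range(len(x)):
        x[i] = best_response(i, x, a)
print(x)  # settles at a fixed point: the pure Nash equilibrium of this toy game
```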
In this paper, we present the Proportional Payoff Allocation Game (PPA-Game), which characterizes situations where agents compete for divisible resources. In the PPA-Game, agents select from the available resources, and their payoffs are determined in proportion to the heterogeneous weights attributed to them. Such dynamics simulate content creators on online recommender systems such as YouTube and TikTok, who compete for finite consumer attention, with content exposure depending on inherent, distinct quality. We first conduct a game-theoretic analysis of the PPA-Game. While the PPA-Game does not always guarantee the existence of a pure Nash equilibrium (PNE), we identify prevalent scenarios that ensure its existence, and simulated experiments show that cases without a PNE rarely occur. Beyond analyzing static payoffs, we further discuss the agents' online learning of resource payoffs by integrating a multi-player multi-armed bandit framework. We propose an online algorithm enabling each agent to maximize its cumulative payoff over T rounds. Theoretically, we establish that the regret of any agent is bounded by O(log^{1+η} T) for any η > 0. Empirical results further validate the effectiveness of our online learning approach.
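The proportional payoff rule itself is compact enough to sketch; the weights, rewards, and the sequential best-response loop below are illustrative (and, consistent with the abstract, such dynamics need not converge in every instance):

```python
def payoffs(choices, weights, rewards):
    """Agents on the same resource split its reward in proportion to
    their weights (the PPA rule described above)."""
    totals = {}
    for agent, r in enumerate(choices):
        totals[r] = totals.get(r, 0.0) + weights[agent]
    return [rewards[r] * weights[a] / totals[r] for a, r in enumerate(choices)]

def best_response_round(choices, weights, rewards):
    """One sequential pass: each agent picks its best resource given the
    others' current choices."""
    for a in range(len(choices)):
        def payoff_if(r, a=a):
            trial = choices[:a] + [r] + choices[a + 1:]
            return payoffs(trial, weights, rewards)[a]
        choices[a] = max(range(len(rewards)), key=payoff_if)
    return choices

w = [1.0, 2.0, 0.5]   # heterogeneous weights (e.g., content quality)
rw = [1.0, 0.8]       # resource rewards (attention pools)
c = [0, 0, 0]
for _ in range(10):
    c = best_response_round(c, w, rw)
print(c, payoffs(c, w, rw))  # this instance settles at a pure NE
```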
In this article, we introduce a game-theoretic learning framework for multi-agent wireless networks. By combining learning in artificial intelligence (AI) with game theory, several promising properties emerge, such as obtaining high payoffs in unknown and dynamic environments, coordinating the actions of agents, and making adversarial decisions in the presence of malicious users. Unfortunately, there is no free lunch. We begin by discussing the connections between learning in AI and game theory at three levels: pattern recognition, prediction, and decision making. We then discuss the challenges and requirements of this combination for intelligent wireless networks, such as the constrained capabilities of agents, the incomplete information obtained from the environment, and the distributed, dynamically scalable, and heterogeneous characteristics of wireless networks. To cope with these, we propose a game-theoretic learning framework for wireless networks covering internal coordination (resource optimization) and external adversarial decision making (anti-jamming). Based on the framework, we introduce several attractive game-theoretic learning methods together with the typical applications we have proposed. Moreover, we developed a real-life testbed for the multi-agent anti-jamming problem based on the game-theoretic learning framework, and the experimental results verify the effectiveness of the proposed game-theoretic learning method.
Federated learning (FL) is an emerging technology for empowering various applications that generate large amounts of data in intelligent cyber-physical systems (ICPS). Though FL can address users' concerns about data privacy, its maintenance still depends on efficient incentive mechanisms. For long-term incentivization to participants in data federation under dynamic environments, deep reinforcement learning as a promising technology has been extensively studied. However, the non-stationary problem caused by the heterogeneity of ICPS devices results in a serious effect on the convergence rate of existing single-agent reinforcement learning. In this paper, we propose a multi-agent learning-based incentive mechanism to capture the stationarity approximation in FL with heterogeneous ICPS. First, we formulate the secure communication and data resource allocation problem as a Stackelberg game in FL with multiple participants. Then, to tackle the heterogeneous problem, we model this multi-agent game as a partially observable Markov decision process. Particularly, a multi-agent federated reinforcement learning algorithm is proposed to learn the allocation policies efficiently by dwindling variances in policy evaluation caused by interaction among multiple devices without sharing privacy information. Moreover, the proposed algorithm is proved to attain convergence at an expected rate. Lastly, extensive experimental results demonstrate that our proposed algorithm significantly outperforms baselines.
The Pursuit-Evasion Game (PEG) is important in the robotics field. Determining the optimal strategy, such as the minimum number of pursuers needed, often requires exponential time. We propose a novel parallel algorithm that significantly reduces the computation time for PEGs, making it feasible to analyze games with a large number of states and transitions. The algorithm also extends to heterogeneous/multi-speed players and incorporates advanced strategies. Additionally, we introduce a resource allocation algorithm for heterogeneous multi-agent teams, ensuring efficient resource sharing and real-time agent replacement to maintain fault tolerance. Our simulations demonstrate that the parallel algorithm significantly outperforms existing approaches, achieving up to an 8.13x speedup over the state of the art. Furthermore, our algorithms enhance the scalability and practical applicability of solving PEGs.
We study the problem of optimal resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple computers of differing importance. An attacker tries to harm these targets by sending malicious packets from multiple entry points of the network; the defender thus needs to optimally allocate her resources to maximize the probability of malicious packet detection under network latency constraints. We formulate the problem as a graph-based security game with multiple resources of heterogeneous capabilities and propose a mathematical program for finding optimal solutions. We also propose Grande, a novel polynomial time algorithm that uses an approximated utility function to circumvent the limited scalability caused by the attacker's large strategy space and the non-linearity of the aforementioned mathematical program. Grande computes solutions with bounded error and scales up to problems of realistic sizes.
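For intuition about what such an allocation optimizes, here is a small maximin LP over inspection effort; this is a generic Stackelberg-style relaxation with made-up target values, not the paper's mathematical program or its Grande algorithm:

```python
from scipy.optimize import linprog

value = [5.0, 3.0, 2.0]   # importance of the target behind each entry point
budget = 1.0              # total inspection effort available

# Variables: x_e (effort at entry e) and v (attacker's best payoff).
# minimize v  subject to  value[e] * (1 - x_e) <= v  for every entry e,
#                         sum_e x_e = budget,  0 <= x_e <= 1.
c = [0.0, 0.0, 0.0, 1.0]
A_ub = [[-value[0], 0, 0, -1],
        [0, -value[1], 0, -1],
        [0, 0, -value[2], -1]]
b_ub = [-u for u in value]
A_eq = [[1.0, 1.0, 1.0, 0.0]]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[budget],
              bounds=[(0, 1)] * 3 + [(0, None)])
print(res.x)  # effort concentrates on high-value entries until payoffs equalize
```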
Future 6G networks require agile medium access control (MAC) protocols for dynamic conditions. Since traditional multi-agent reinforcement learning (MARL) falters with fluctuating agent numbers and typical LLM applications lack exploratory power for protocol emergence, we synergize LLMs with RL to propose LLM4MAC, overcoming these limitations. By reformulating uplink data transmission scheduling as a semantics-generalized partially observable Markov game (POMG), LLM4MAC encodes network operations in natural language and utilizes proximal policy optimization (PPO) to ensure continuous alignment with the evolving network dynamics. A structured identity embedding (SIE) mechanism further enables robust coordination among heterogeneous agents. Extensive simulations demonstrate that on top of a compact LLM, which is purposefully selected to balance performance with resource efficiency, the protocol emerging from LLM4MAC outperforms comparative baselines in throughput and generalization.
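The "network operations in natural language" idea can be sketched without committing to any particular model; the prompt wording, observation fields, and two-action set below are assumptions, and no specific LLM API is implied:

```python
def obs_to_prompt(agent_id, queue_len, channel_busy, last_ack):
    """Render a MAC agent's partial observation as natural language,
    the form of state an LLM-based policy would condition on."""
    return (f"You are uplink agent {agent_id}. Your buffer holds "
            f"{queue_len} packets. The channel is "
            f"{'busy' if channel_busy else 'idle'}. Your last transmission "
            f"was {'acknowledged' if last_ack else 'lost'}. "
            "Reply with exactly one action: TRANSMIT or WAIT.")

def parse_action(reply):
    """Map the model's free-text reply back to a discrete MAC action."""
    return 1 if "TRANSMIT" in reply.upper() else 0

print(obs_to_prompt(3, queue_len=5, channel_busy=False, last_ack=True))
```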
Federated learning (FL) in multi-service provider (SP) ecosystems is fundamentally hampered by non-cooperative dynamics, where privacy constraints and competing interests preclude the centralized optimization of multi-SP communication and computation resources. In this paper, we introduce PAC-MCoFL, a game-theoretic multi-agent reinforcement learning (MARL) framework where SPs act as agents to jointly optimize client assignment, adaptive quantization, and resource allocation. Within the framework, we integrate Pareto Actor-Critic (PAC) principles with expectile regression, enabling agents to conjecture optimal joint policies to achieve Pareto-optimal equilibria while modeling heterogeneous risk profiles. To manage the high-dimensional action space, we devise a ternary Cartesian decomposition (TCAD) mechanism that facilitates fine-grained control. Further, we develop PAC-MCoFL-p, a scalable variant featuring a parameterized conjecture generator that substantially reduces computational complexity with a provably bounded error. Alongside theoretical convergence guarantees, our framework’s superiority is validated through extensive simulations – PAC-MCoFL achieves approximately 5.8% and 4.2% improvements in total reward and hypervolume indicator (HVI), respectively, over the latest MARL solutions. The results also demonstrate that our method can more effectively balance individual SP and system performance in scaled deployments and under diverse data heterogeneity.
With the rapid development of the Internet of Things (IoT) and edge computing, the end-edge-cloud collaborative architecture has become the core infrastructure supporting key scenarios such as intelligent manufacturing and smart cities. However, traditional scheduling methods have significant limitations when dealing with heterogeneous resources, dynamic task requirements, and multi-objective optimization problems in end-edge-cloud environments. To this end, this paper proposes an end-edge-cloud collaborative optimization scheduling method based on the integration of deep reinforcement learning (DRL) and attention mechanisms. By constructing a dynamic cognitive perception meta-model, a multi-agent game-theoretic decision-making mechanism, and an adaptive attention weight distribution mechanism, it achieves efficient resource scheduling and task collaborative orchestration in complex end-edge-cloud environments. The experimental results show that, compared with the traditional methods, this method reduces the task processing delay, improves the resource utilization rate and increases the system throughput. This research provides an innovative solution to the scheduling optimization problem in the end-edge-cloud collaborative environment and lays an important foundation for the efficient operation of emerging intelligent applications.
In this paper, we study the heterogeneous facility location game with fractional preferences under resource constraints. In this model, a group of agents are positioned along the interval [0,1], where each agent has position information and fractional preferences indicated as support weights for the facilities. Our main focus is to design mechanisms that choose and locate one facility out of two while motivating agents to truthfully report their information, aiming to approximately maximize the social utility, defined as the sum of the utilities of all agents. Based on the types of private information held by agents, we consider three different settings. For the known-preferences setting, we provide a deterministic group strategy-proof mechanism with a 2-approximation and a randomized group strategy-proof mechanism with a 4/3-approximation. We also provide lower bounds of 2 on the approximation ratio of any deterministic strategy-proof mechanism and 1.043 for any randomized strategy-proof mechanism. For the known-positions setting and the general setting, we present a deterministic group strategy-proof mechanism with a 6-approximation and a randomized strategy-proof mechanism with a 4-approximation, respectively. Furthermore, we give lower bounds of 1.554 for any deterministic strategy-proof mechanism and 1.2 for any randomized strategy-proof mechanism in the known-positions setting. Finally, we extend the model to choosing k facilities out of m. For the known-preferences setting, we provide a 2-approximate deterministic group strategy-proof mechanism, which is also the best possible deterministic strategy-proof mechanism. For the known-positions setting, when k ≥ 2, we give a lower bound of 2 - 1/k for any deterministic strategy-proof mechanism.
Building on research previously reported at AAMAS conferences, this paper describes an innovative application of a novel game-theoretic approach for a national scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to assist in resource allocation tasks for airport protection at over 400 United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and one security activity (e.g. canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition and development of the system. In doing so we develop a software scheduling assistant, GUARDS, designed to reason over two agents --- the TSA and a potential adversary --- and allocate the TSA's limited resources across hundreds of security activities in order to provide protection within airports. The scheduling assistant has been delivered to the TSA and is currently under evaluation and testing for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nation-wide. In this paper we discuss the design choices and challenges encountered during the implementation of GUARDS. GUARDS represents promising potential for transitioning years of academic research into a nationally deployed system.
The stability property of loss-aversion-based noncooperative switched systems with quadratic payoffs is investigated. In this system, each agent adopts a lower sensitivity parameter in the myopic pseudo-gradient dynamics when losing utility than when gaining utility, and both the system dynamics and the switching events (conditions) depend on the agents' payoff functions. Sufficient conditions under which the agents' state converges toward the Nash equilibrium are derived in accordance with the location of the Nash equilibrium. In the analysis, the mode transition sequence and an interesting phenomenon, which we call flash switching, are characterized. We present several numerical examples to illustrate the properties of our results.
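A toy discretization of such asymmetric-sensitivity pseudo-gradient dynamics looks as follows; the two-agent quadratic game, the sensitivity values, and the discrete-time loss test are illustrative assumptions rather than the paper's switched-system model:

```python
import numpy as np

def pseudo_grad(x):
    """Pseudo-gradient of u1 = -(x1 - 0.5*x2)^2, u2 = -(x2 - 0.5*x1)^2;
    the unique Nash equilibrium of this toy game is the origin."""
    return np.array([-2.0 * (x[0] - 0.5 * x[1]),
                     -2.0 * (x[1] - 0.5 * x[0])])

def utils(x):
    return np.array([-(x[0] - 0.5 * x[1]) ** 2, -(x[1] - 0.5 * x[0]) ** 2])

x, dt = np.array([1.0, -1.0]), 0.01
prev_u = utils(x)
for _ in range(4000):
    u = utils(x)
    # loss aversion: react more cautiously while utility is falling
    sensitivity = np.where(u >= prev_u, 1.0, 0.4)
    prev_u = u
    x = x + dt * sensitivity * pseudo_grad(x)   # switched gradient play
print(x)  # approaches the Nash equilibrium at (0, 0)
```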
Multi-agent systems often operate under feedback, adaptation, and non-stationarity, yet many simulation studies retain static decision rules and fixed control parameters. This paper introduces a general adaptive multi-agent learning framework that integrates: (i) four dynamic regimes distinguishing static versus adaptive agents and fixed versus adaptive system parameters; (ii) information-theoretic diagnostics (entropy rate, statistical complexity, and predictive information) to assess predictability and structure; (iii) structural causal models for explicit intervention semantics; (iv) procedures for generating agent-level priors from aggregate or sample data; and (v) unsupervised methods for identifying emergent behavioral regimes. The framework offers a domain-neutral architecture for analyzing how learning agents and adaptive controls jointly shape system trajectories, enabling systematic comparison of stability, performance, and interpretability across non-equilibrium, oscillatory, or drifting dynamics. Mathematical definitions, computational operators, and an experimental design template are provided, yielding a structured methodology for developing explainable and contestable multi-agent decision processes.
This article proposes an iteration-based algorithm for open-loop equilibrium seeking in a two-agent noncooperative dynamic game. The finite-horizon dynamic games handled in almost all existing articles assume a common prediction horizon length. From the viewpoint of diverging and differing personal values, it is socially rational to solve the equilibrium of games with asymmetric prediction horizon lengths. We thus propose an open-loop equilibrium-seeking algorithm that requires no private knowledge of the other agent's horizon, through a receding-horizon linear-quadratic game. The proposed algorithm is based on iterative optimization: each agent obtains its own best-response control strategy while estimating the other agent's state feedback gain from the state information and guaranteeing the stability of the whole system. We also examine the performance of the proposed equilibrium-seeking algorithm through numerical examples.
This paper investigates the multi-agent pursuit-evasion game and presents a distributed control framework for designing optimal strategies. In the proposed game-theoretic model, the pursuer collective adopts an optimization objective that simultaneously minimizes the distance to the evaders and maintains prescribed inter-agent separation, while the evader collective implements adversarial strategies to maximize their distance from the pursuers under safe inter-agent proximity constraints. By constructing the Hamilton-Jacobi-Isaacs (HJI) equations, the paper derives the optimal control strategies of all agents and proves the existence of the system's Nash equilibrium based on differential game theory. For the single-pursuer-single-evader configuration in particular, sufficient conditions ensuring guaranteed capture are theoretically established through Lyapunov stability analysis. To enhance operational reliability, an adaptive fault-tolerant control scheme with robustness guarantees is developed, capable of maintaining capture performance despite actuator effectiveness degradation in the pursuers.
A zero-sum tax/subsidy approach and a necessary condition for stabilizing unstable Nash equilibria in pseudo-gradient-based noncooperative dynamical systems with vector-valued payoff functions are proposed. Specifically, we first present a necessary and sufficient condition for the Nash equilibrium of the noncooperative game with vector-valued payoff functions to be bounded. We then give a sufficient condition for such a Nash equilibrium to be stable. After that, we develop a framework in which a system manager constructs a zero-sum tax/subsidy incentive structure, collecting taxes from one agent and giving the same amount as a subsidy to the other, to make the incentivized Nash equilibrium stable and bounded, which drives the trajectories to converge to the interior of the original Nash equilibrium set. Finally, we present a numerical example to illustrate the utility of the zero-sum tax/subsidy approach.
In this paper, we consider the stability problem of Nash equilibrium for a two-agent noncooperative dynamical system with hybrid myopic pseudo-gradient dynamics based on loss-aversion phenomena. In the considered noncooperative dynamical system, each agent adopts different constant sensitivity parameters for the case of losing utilities and gaining utilities. To characterize the stability property, some general characteristics of the active modes and rotational directions are discussed. Based on the integral of normalized radial growth rate, sufficient conditions under which agents’ state converges towards the Nash equilibrium are derived. We present a numerical example to illustrate the efficacy of our results.
As language is intrinsic to the expression of culture, the rise and fall of a language directly affect the culture associated with it. Constructing a computational model to study the mechanisms of language competition and to explore language preservation policies is therefore very important. We address the language system's macroscopic aspects, such as the prestige of languages, the difficulty of learning them, and natives' tolerance toward nonnative languages, as well as individual interactions at the microscopic level, and then propose an agent network computation-based evolutionary game model (ANC-EGM), comprising two major components (the definition of language attractiveness and the language competition game), to model a more realistic, dynamically evolving language system. The replicator equation is adopted to solve for the evolutionary equilibrium, and the stability of the equilibrium points is analyzed through local stability analysis of the Jacobian matrix. Theoretical analysis and simulations illustrate that the ANC-EGM can comprehensively model the competition between two languages and estimate how individual interactions lead to the demise or coexistence of languages. We further validate the conclusions of the ANC-EGM on empirical data for the Minnan dialect and Mandarin, showing that the ANC-EGM can provide an experimental computing platform for the in-depth study of language policy regulation and the rules of language evolution.
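A bare replicator-equation sketch of two-language competition illustrates the demise-or-coexistence dynamics; the payoff form (attractiveness proportional to prestige times speaker share) is an illustrative reduction, not the ANC-EGM's full agent-network model:

```python
def replicator_step(p, prestige_a=0.6, dt=0.01):
    """One replicator-equation step; p is the fraction speaking language A."""
    fitness_a = prestige_a * p                # attractiveness grows with share
    fitness_b = (1.0 - prestige_a) * (1.0 - p)
    average = p * fitness_a + (1.0 - p) * fitness_b
    return p + dt * p * (fitness_a - average)

for p0 in (0.3, 0.5):
    p = p0
    for _ in range(50_000):
        p = replicator_step(p)
    print(p0, round(p, 3))  # 0.3 -> A dies out, 0.5 -> A dominates: the
                            # interior equilibrium at p = 0.4 is unstable
```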
This paper studies the formation control of high-order multi-agent systems, where the dynamics of the agents are nth-order integrators. Different from existing results, this paper investigates the problem from the viewpoint of aggregative games. An interesting discovery is that the Nash equilibrium of a quadratic aggregative game constitutes the desired formation. Moreover, a distributed algorithm is designed for these high-order agents to form the desired formation by seeking the Nash equilibrium, where every agent estimates the aggregate of the game. Furthermore, the convergence of the algorithm is analyzed via Lyapunov stability theory. In contrast with existing formation protocols, high-order agents running the proposed algorithm converge exponentially to the desired formation without using the (relative) positions and velocities of their formation neighbors. Finally, two examples illustrate the algorithm.
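The equilibrium-equals-formation idea can be previewed with a toy quadratic aggregative game; the cost design and the simplified gradient play below are illustrative guesses in the spirit of the abstract, with single-integrator agents standing in for the paper's high-order dynamics:

```python
import numpy as np

# Desired offsets of four agents from the aggregate (mean) position.
offsets = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])

def pseudo_gradient(i, x):
    """Gradient of J_i(x) = ||x_i - (aggregate + offset_i)||^2, treating
    the aggregate as fixed (the exact 1 - 1/n correction only rescales
    it and leaves the fixed point unchanged)."""
    aggregate = x.mean(axis=0)
    return 2.0 * (x[i] - aggregate - offsets[i])

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))       # random initial positions
for _ in range(2000):
    for i in range(4):
        x[i] -= 0.05 * pseudo_gradient(i, x)
print(x - x.mean(axis=0))  # relative positions converge to the offsets,
                           # i.e., the Nash equilibrium is the formation
```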
This paper investigates equilibrium and stability analysis in a two-agent non-cooperative dynamic game. Almost all existing papers handling finite-horizon dynamic games assume the common prediction horizon length, whereas this paper considers an asymmetric length case due to differences in personal values. We thus propose two possible control strategies without the knowledge of the other agent’s horizon information through a linear-quadratic game. One of the proposed control strategies is the receding horizon control based on an open-loop Nash equilibrium with the common horizon case. The other is an iterative optimization-based control with each agent estimating the other’s feedback gain from the state information. We also discuss the effectiveness of the proposed strategies and the stability condition of the corresponding closed-loop systems through numerical examples.
This paper studies the non-cooperative game problem of second-order multi-agent systems with external disturbances under undirected graphs, where the decisions of all players are subject to linear coupled equality constraints. The disturbance terms are regarded as extended states, for which observers are designed to estimate them. Based on the extended state observer, we propose an algorithm to achieve generalized Nash equilibrium seeking for networked games. When the external disturbance is bounded, we show by Lyapunov stability analysis that the players' actions converge to a small neighborhood of the generalized Nash equilibrium. Finally, we provide an example to illustrate our result.
In this paper, we investigate the noncooperative games of multi-agent systems. Different from existing noncooperative games, our formulation involves the high-order nonlinear dynamics of players, and the communication topologies among players are weight-unbalanced digraphs. Due to the high-order nonlinear dynamics and the weight-unbalanced digraphs, existing Nash equilibrium seeking algorithms cannot solve our problem. In order to seek the Nash equilibrium of the noncooperative games, we propose two distributed algorithms based on state feedback and output feedback, respectively. Moreover, we analyze the convergence of the two algorithms with the help of variational analysis and Lyapunov stability theory. By the two algorithms, the high-order nonlinear players exponentially converge to the Nash equilibrium. Finally, two simulation examples illustrate the effectiveness of the algorithms.
Water scarcity and allocation disputes have emerged as major challenges in rapidly urbanizing smart cities, where growing population density, outdated infrastructure, high water losses, and unequal geographic distribution frequently result in shortages despite adequate overall supply. Traditional techniques, such as linear programming and agent-based modeling, have produced helpful insights, but they remain limited in capturing diverse stakeholder behaviors, ensuring equilibrium stability in competitive contexts, and providing spatially adaptable solutions. To address these shortcomings, this study applies the Nash Equilibrium (NE) concept from game theory (GT) to model strategic interactions among households, industries, utilities, and regulators, each with distinct payoff functions. Once equilibrium is achieved, no stakeholder can unilaterally improve its outcome, thereby guaranteeing fairness and stability. Building on this theoretical foundation, the model integrates Optimized Multi-Objective Particle Swarm Optimization (OMOPSO) to efficiently explore Pareto-optimal trade-offs between economic, social, and environmental objectives, while Geographic Information Systems (GIS) incorporate spatial constraints to deliver geographically realistic allocation strategies. Experimental validation demonstrates that the proposed model consistently outperforms existing Multi-Objective Evolutionary Algorithms (MOEAs) in terms of convergence stability and computational efficiency. Beyond algorithmic performance, the findings highlight practical applications for tariff design, consumer incentive programs, infrastructure investment, and water-use restrictions. This study advances the state of the art in urban water management by integrating GT, evolutionary optimization, and spatial analysis, while providing policymakers with a robust and fair decision-support framework for sustainable resource allocation.
This article focuses on prescribed-time fully distributed Nash equilibrium seeking (PT-FDNES) for multiagent systems (MASs) under denial-of-service attacks, where the action of each agent is governed by high-order uncertain nonlinear dynamics. Specifically, the fuzzy logic system technique is utilized to approximate the unknown nonlinear functions in the agents' dynamics, and a single-parameter updating law is proposed that estimates the norm of the fuzzy optimal weight vectors to further reduce the computational cost. To deal with attack-induced topologies that may destabilize the NE seeking performance, a novel PT function together with a resetting mechanism is carefully designed. A PT-FDNES algorithm with adaptive gains is then developed by incorporating the resetting mechanism, so that the NE is sought independently of any global information. Subsequently, a novel PT fuzzy adaptive tracking control law is designed in the celebrated backstepping structure. Based on Lyapunov stability theory, we prove that the actions of the MAS converge to the NE within the prescribed time and that all closed-loop system signals remain bounded. Finally, an example is given to illustrate the effectiveness of the proposed algorithm.
Fully Distributed Robust Adaptive Nash Equilibrium Seeking of High-Order Uncertain Nonlinear Systems
This article investigates fully distributed robust adaptive Nash equilibrium (NE) seeking strategies in noncooperative games. Different from existing NE seeking results, this article considers high-order nonlinear multiagent systems (MASs) with mismatched uncertainties and disturbances. To deal with the challenges brought by the high-order structure, a new auxiliary system is first introduced based on the gradient play rule to generate a reference trajectory, which converges to the NE exponentially without requiring any global graph information. Then, a backstepping-based robust adaptive controller is developed for each agent to exponentially track the reference trajectory. By resorting to Lyapunov stability theory, the developed robust seeking strategies drive all agents' actions to the NE exponentially. Moreover, considering the circumstance that only the agent's action information is available for control law design, an adaptive output-feedback NE seeking strategy is further developed by constructing K-filters to estimate the immeasurable states. Finally, the effectiveness of the two proposed NE seeking algorithms is verified by different simulation examples.
No abstract available
The generalized stability results of a Nash equilibrium in two-agent noncooperative systems are investigated under loss-averse behavior and quadratic payoffs. In this system, each agent is driven by pseudo-gradient dynamics with piecewise constant sensitivity parameters for the situations of losing and gaining payoffs. Based on mode analysis, sufficient and necessary conditions under which the agents' states converge to the Nash equilibrium are derived. We perform a bifurcation analysis of the loss-aversion-based noncooperative system to observe how loss-averse behaviors change the stability of the Nash equilibrium, and find that they may destabilize it. We also present a sufficient condition for robust stability under which loss-averse behaviors never destabilize the Nash equilibrium for any sensitivity parameters. A numerical example is provided to illustrate the effectiveness of our results.
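A compact way to see the piecewise-sensitivity mechanism: run pseudo-gradient ascent on each agent's payoff, switching to a larger step when that agent's payoff fell in the last step. The quadratic payoffs and sensitivity values below are illustrative, not those analyzed in the paper.

```python
import numpy as np

# Quadratic payoffs u_i = 0.5*Q[i,i]*x_i^2 + Q[i,j]*x_i*x_j (concave in own action)
Q = np.array([[-2.0, 0.6], [0.5, -2.5]])

def grad(x):
    return np.array([Q[0, 0] * x[0] + Q[0, 1] * x[1],
                     Q[1, 1] * x[1] + Q[1, 0] * x[0]])

def payoffs(x):
    return np.array([0.5 * Q[0, 0] * x[0]**2 + Q[0, 1] * x[0] * x[1],
                     0.5 * Q[1, 1] * x[1]**2 + Q[1, 0] * x[0] * x[1]])

alpha_gain, alpha_loss = 0.05, 0.15     # piecewise constant sensitivities
x, u_prev = np.array([1.0, -1.0]), np.zeros(2)
for _ in range(500):
    u = payoffs(x)
    alpha = np.where(u < u_prev, alpha_loss, alpha_gain)  # losing => react harder
    x = x + alpha * grad(x)             # gradient ascent on each agent's own payoff
    u_prev = u
print("state near the Nash equilibrium:", x)
```

With these values the equilibrium at the origin stays stable; the paper's point is that other sensitivity pairs can flip this verdict, which is what the bifurcation analysis characterizes.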
Open multiagent systems (OMASs) feature a dynamic structure with agents continuously joining or leaving, resulting in shifting Nash equilibria and frequent disruptions of equality constraints. This inherent instability poses a significant challenge to traditional incremental-consensus-based distributed optimization or game methods, which rely on a stable and consistent agent population to compute and maintain equilibrium solutions effectively. The need for these methods to continuously enforce constraints, together with the time-intensive recalculation of equilibria in response to agent dynamics, presents a substantial bottleneck in the optimization of OMASs. To address this challenge, we develop an incremental consensus-based distributed (ICBD) algorithm that achieves a dynamic Nash equilibrium (NE) for constrained noncooperative games in OMASs. The ICBD algorithm leverages predefined-time stability and integral sliding-mode control to enable rapid recalibration to new equilibria and to maintain constraints without prolonged recalculation. Finally, several numerical simulations validate the approach and demonstrate its effectiveness.
Modern financial markets feature complex interactions between vast populations of rule-based agents and adaptive algorithmic traders, yet existing models treat these layers separately. We introduce a discrete-time population-strategy game unifying Agent-Based Modeling (ABM) and Multi-Agent Reinforcement Learning (MARL). The framework's core innovations are: (1) an asymmetric bilevel architecture where strategic agents optimize over population distributions while endogenously shaping them; (2) a rigorous treatment of system noise via martingale difference processes with bounded moments; and (3) a verifiable, post-convergence spectral stability certificate ($\rho(\nabla_{\mu}\Phi) < 1$). Under mild conditions, we prove:
- existence of a mean-field equilibrium;
- almost sure convergence of a two-timescale learning dynamic (policy gradient plus population dynamics) to this equilibrium;
- an $O(N^{-1/2})$ finite-population approximation error.
Our linearly scalable Population-Strategy Policy Gradient (PSPG) algorithm enables tractable computation. We apply the framework to a synthetic market-making environment. Experiments demonstrate emergent critical thresholds (e.g., an 8.2 bps spread bifurcating stable and fragmented regimes in our calibrated model) and a 23% volatility reduction versus a pure ABM system. This framework bridges ABM's emergent heterogeneity with MARL's strategic adaptation, addressing key gaps in mean-field game theory and enabling a path toward real-world deployment with verifiable stability.
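The spectral certificate $\rho(\nabla_{\mu}\Phi) < 1$ can be checked numerically once a fixed point is reached: estimate the Jacobian of the population-update map by finite differences and compute its spectral radius. The toy logit map below stands in for the paper's learned dynamics, which we do not have.

```python
import numpy as np

def Phi(mu, beta=2.0):
    # Toy population-update map: logit response to strategy payoffs that
    # depend weakly on the current population mix (illustrative only).
    payoff = np.array([1.0 + 0.5 * mu[1], 1.2 - 0.3 * mu[0]])
    w = np.exp(beta * payoff)
    return w / w.sum()

def spectral_radius_at(f, mu, eps=1e-6):
    # Central-difference Jacobian of f at mu, then its spectral radius.
    n = len(mu)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (f(mu + e) - f(mu - e)) / (2 * eps)
    return max(abs(np.linalg.eigvals(J)))

mu = np.array([0.5, 0.5])
for _ in range(200):                  # iterate to an (approximate) fixed point
    mu = Phi(mu)
rho = spectral_radius_at(Phi, mu)
print(f"rho = {rho:.3f} -> {'certified stable' if rho < 1 else 'not certified'}")
```

The appeal of such a certificate is that it is checked after training, so it does not constrain the learning dynamics themselves.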
Coordination is one of the critical problems in Multi-Agent Reinforcement Learning (MARL). Traditional MARL methods focus on finding a stochastically acceptable solution, a Nash Equilibrium (NE), for all agents in Markov games where multiple equilibria exist. However, learning a fair equilibrium is crucial for the sustainability and stability of collaboration in long-term coordination games, especially when leadership competition exists. In this article, we propose the bi-level reinforcement learning method N-Bi-AC, whose solution is a Pareto improvement over the traditional NE, to select a fair equilibrium. The method has two parts: a Negotiator that determines the leader in each stage game, and a bi-level actor-critic update of the agents' Q-values based on the Joint Mixed Strategy Equilibrium Q-learning algorithm (JMSE Q-learning). A convergence proof is given, and the learning algorithm is compared with state-of-the-art algorithms. We find that the proposed N-Bi-AC method successfully converges to a fair NE, which guarantees fairness among agents in different matrix game environments.
This article investigates the optimal control problem with disturbance rejection for discrete‐time multi‐agent systems under cooperative and non‐cooperative graphical games frameworks. Given the practical challenges of obtaining accurate models, Q‐function‐based policy iteration methods are proposed to seek the Nash equilibrium solution for the cooperative graphical game and the distributed minmax solution for the non‐cooperative graphical game. To implement these methods online, two reinforcement learning frameworks are developed, an actor‐disturber‐critic structure for the cooperative graphical game and an actor‐adversary‐disturber‐critic structure for the non‐cooperative graphical game. The stability of the proposed methods is rigorously analyzed, and simulation results are provided to illustrate the effectiveness of the proposed methods.
This paper investigates the formation control and collision avoidance problem for high-dimensional multi-agent systems under dynamically switching communication topologies via an aggregative game framework. A distributed control algorithm is proposed, integrating second-order agent dynamics, artificial repulsion potentials, and topology-dependent neighbor coupling to achieve convergence to a desired formation while ensuring collision-free operations. The quadratic cost function, combining target tracking, inter-agent/obstacle repulsion, and topological consistency, is rigorously proven to admit a unique Nash equilibrium equivalent to the desired formation configuration. Through Lyapunov-based stability analysis, exponential convergence of the system states is guaranteed under switching topologies with persistent connectivity, and minimum interagent/obstacle distance constraints are analytically ensured. The proposed framework bridges game-theoretic optimization with multi-agent coordination, offering scalability in high-dimensional spaces and robustness against topology variations. Potential applications include agent swarms and robotic networks.
We extend the indirect evolutionary approach to the selection of (possibly misspecified) models. Agents with different models match in pairs to play a stage game, where models define feasible beliefs about game parameters and about others' strategies. In equilibrium, each agent adopts the feasible belief that best fits their data and plays optimally given their beliefs. We define the stability of the resident model by comparing its equilibrium payoff with that of the entrant model, and provide conditions under which the correctly specified resident model can only be destabilized by misspecified entrant models that contain multiple feasible beliefs (that is, entrant models that permit inference). We also show that entrants may do well in their matches against the residents only when the entrant population is large, due to the endogeneity of misspecified beliefs. Applications include the selection of demand-elasticity misperception in Cournot duopoly and the emergence of analogy-based reasoning in centipede games.
This paper studies the multi-agent differential game problem and its application to cooperative synchronization control. A systematized formulation and analysis method for the multi-agent differential game is proposed, together with a data-driven methodology based on reinforcement learning (RL). First, it is pointed out that typical distributed controllers do not necessarily lead to a global Nash equilibrium of the differential game in general, because of the coupling of networked interactions. Second, an alternative local Nash solution is derived by defining a best-response concept, decomposing the problem into local differential games. An off-policy RL algorithm using neighboring interaction data is constructed to update the controller without requiring a system model, and the stability and robustness properties are proved. Third, to further tackle the dilemma, another differential game configuration is investigated based on modified coupling index functions. In contrast to the previous case, this distributed solution achieves a global Nash equilibrium while guaranteeing stability. An equivalent parallel RL method is constructed corresponding to this Nash solution. Finally, the effectiveness of the learning process and the stability of synchronization control are illustrated in simulation results.
This paper explores the trade-off between efficiency and complexity in mean-field game (MFG) theory for large-scale multi-agent systems (LS-MAS). While MFG reduces computational burden by avoiding the curse of dimensionality and achieving Nash equilibrium, its solutions often yield inefficient social cost compared to centralized McKean-Vlasov control. To address this, we propose an extended MFG (EMFG) framework that improves efficiency with minimal complexity increase by introducing a decomposed mean field term using probability density functions (PDFs). Agents are grouped based on terminal PDF constraints, and an actor-critic-decomposed mass (ACDM) algorithm is developed to solve the resulting forward-backward PDE system. Although the PDF decomposition enlarges the neural network, the algorithm effectively balances control efficiency and computational cost. We provide an induction-based proof to bound the inefficiency gap between EMFG and centralized control and establish Lyapunov stability for the convergence of ACDM.
The paper investigates a constrained aggregative game with second-order nonlinear players, where the communication topology is a weight-unbalanced digraph. First, a generalized Nash equilibrium (GNE) seeking algorithm based on the interior penalty function method is proposed to solve the aggregative game with inequality constraints while ensuring that the constraints are always satisfied. A variable is also introduced to eliminate the imbalance caused by the weight-unbalanced communication network. Then, the stability of the proposed algorithm is proved using Lyapunov theory. Finally, the effectiveness of the algorithm is verified by simulation.
This paper presents a novel solution concept, called BAR Nash Equilibrium (BARNE) and applies it to analyse the Verifier's dilemma, a fundamental problem in blockchain. Our solution concept adapts the Nash equilibrium (NE) to accommodate interactions among Byzantine, altruistic and rational agents, which became known as the BAR setting in the literature. We prove the existence of BARNE in a large class of games and introduce two natural refinements, global and local stability. Using this equilibrium and its refinements, we analyse the free-rider problem in the context of Byzantine consensus. We demonstrate that by incorporating fines and forced errors into a standard quorum-based blockchain protocol, we can effectively reestablish honest behavior as a globally stable BARNE.
We study learning dynamics induced by strategic agents who repeatedly play a game with an unknown payoff-relevant parameter. In each step, an information system estimates a belief distribution of the parameter based on the players' strategies and realized payoffs using Bayes' rule. Players adjust their strategies by accounting for an equilibrium strategy or a best response strategy based on the updated belief. We prove that beliefs and strategies converge to a fixed point with probability 1. We also provide conditions that guarantee local and global stability of fixed points. Any fixed point belief consistently estimates the payoff distribution given the fixed point strategy profile. However, convergence to a complete information Nash equilibrium is not always guaranteed. We provide a sufficient and necessary condition under which fixed point belief recovers the unknown parameter. We also provide a sufficient condition for convergence to complete information equilibrium even when parameter learning is incomplete.
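A minimal sketch of this learning loop, under our own illustrative assumptions (a finite parameter grid, a Gaussian payoff likelihood, and a stylized best-response map): the information system updates a belief by Bayes' rule from realized payoffs, and the player best-responds to the belief-averaged parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = np.array([0.5, 1.0, 1.5])     # candidate parameter values (illustrative)
belief = np.ones(3) / 3                # uniform prior
theta_true, noise = 1.0, 0.1

def best_response(theta):
    return 1.0 / (1.0 + theta)         # stylized BR to the believed parameter

for _ in range(300):
    theta_hat = belief @ thetas        # belief-averaged parameter
    x = best_response(theta_hat)       # strategy given the current belief
    y = theta_true * x + noise * rng.standard_normal()   # realized payoff
    # Bayes' rule with a Gaussian likelihood for each candidate theta
    lik = np.exp(-0.5 * ((y - thetas * x) / noise) ** 2)
    belief = belief * lik
    belief /= belief.sum()
print("posterior:", np.round(belief, 3))   # concentrates on theta_true here
```

Here the played strategy is informative (x > 0), so the belief identifies the parameter; the paper's point is that at other fixed points the strategy can be uninformative, in which case the belief is self-consistent without recovering the true parameter.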
In this brief, the multi-cluster game of second-order systems over weight-unbalanced directed graphs is studied for the first time, where each local cost function depends on the decisions of all participants. A novel algorithm is designed based on state feedback and left eigenvector estimation under the partial information setting, where each agent only knows its own local decision and cost function information. It is demonstrated by Lyapunov stability theory that the algorithm can converge to the Nash equilibrium (NE) with the weight-unbalanced directed graphs. Finally, the effectiveness of the algorithm is illustrated by a numerical simulation.
This article is devoted to analyzing the equilibrium points and convergent behaviors for a constrained signed network with general topology containing a directed spanning tree, where the output of each agent is restricted by a constraint set. Different from unconstrained signed networks, the rooted subgraph and constraint sets are both critical for the theoretical analysis of the constrained signed network. By utilizing $H$-matrix theories, projection techniques, invariance principle, and an extended Barbalat's lemma, it is rigorously shown that the state of the constrained network globally asymptotically approaches the nonempty equilibrium set. Based on the equilibrium set and constraint sets, some notions and criteria are developed to explore the convergent behaviors of the constrained network, including interval bipartite consensus, bipartite consensus, global stability, and noninterior convergence. In sharp contrast to unconstrained signed networks, a constrained signed network may fail to achieve interval bipartite consensus or bipartite consensus even if the rooted subgraph is structurally balanced. Surprisingly, it is found that the constrained signed network under different initial conditions may exhibit different types of convergent behaviors. The theoretical results are illustrated by numerical examples.
The healthy evolution of an entrepreneurial ecosystem relies on the symbiotic relationships among its diverse internal actors. This study addresses a gap in entrepreneurial ecosystem research, which has predominantly focused on two-agent models, by constructing a tripartite symbiotic evolution model that incorporates entrepreneurial ventures, incubation chains, and customers. Based on the Logistic and Lotka-Volterra models, the research identifies the system’s equilibrium points and their stability conditions. Simulations reveal evolutionary paths from parasitism and commensalism to mutualism. A comparative case study of SenseTime (Shanghai, China) and Lanma Technology (Shanghai, China) validates these findings. The comparison shows that an influx of multiple agents, coupled with the core venture’s ability to strengthen key symbiotic coefficients, drives the ecosystem towards a dynamic multi-agent symbiosis in the post-optimization phase. Conversely, the failure to establish these robust reciprocal value flows leads to ecosystem fragility. The results indicate that: (1) Multi-agent entrepreneurial ecosystems are complex systems where symbiotic units form adaptive relationships for value creation, adhering to market laws. (2) The system’s equilibrium depends on symbiotic coefficients, leading to four modes—independent coexistence, parasitism, commensalism, and mutualism—with mutualism being the optimal state. (3) The contrasting cases further demonstrate that the evolution towards mutualism is not automatic but hinges on the core venture’s strategic agency in constructing and strengthening synergistic pathways with forward and backward linkages. This study provides a theoretical model for understanding the evolutionary mechanisms of entrepreneurial ecosystems and offers practical insights for optimizing ecosystem governance.
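The Logistic/Lotka-Volterra backbone of such symbiosis models is easy to reproduce. The sketch below simulates three mutualistic populations and compares the simulated steady state with the analytic equilibrium; all rates, capacities, and symbiotic coefficients are illustrative, not calibrated to the case studies.

```python
import numpy as np

r = np.array([0.4, 0.3, 0.5])      # intrinsic growth rates (illustrative)
K = np.array([1.0, 1.0, 1.0])      # carrying capacities
s = np.array([[0.0, 0.2, 0.3],     # s[i][j]: benefit population j confers on i;
              [0.1, 0.0, 0.2],     # all entries positive => mutualistic regime
              [0.2, 0.1, 0.0]])

def step(x, dt=0.01):
    # logistic growth plus symbiosis: dx_i/dt = r_i x_i (1 - x_i/K_i + sum_j s_ij x_j/K_j)
    return x + dt * r * x * (1 - x / K + s @ (x / K))

x = np.array([0.1, 0.1, 0.1])
for _ in range(30000):
    x = step(x)
# interior equilibrium solves (I - s)(x/K) = 1; densities above K signal mutualism
print("simulated:", np.round(x, 3))
print("analytic: ", np.round(K * np.linalg.solve(np.eye(3) - s, np.ones(3)), 3))
```

Setting some coefficients negative or zero reproduces the parasitism and commensalism modes the abstract lists, and the sign pattern of s determines which equilibrium is stable.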
No abstract available
Division of labor is a widely studied subject in the collective behavior of natural systems. It concerns how the regulation of the division of labor may contribute to the performance of a multi-agent system. This brief investigates the evolution of the division of labor with three strategies using evolutionary game theory. The available strategies for the unstructured multi-agent system are: strategy A (perform task A), strategy B (perform task B), and strategy C (perform no task, but interfere with others' actions). Depending on the parameter settings in the payoff matrix, four specific scenarios occur, and the corresponding stability of the equilibrium points is derived by calculation. Numerical results provide an intuitive description of the strategy evolution. Results show that a larger synergistic benefit contributes to an effective division of labor, providing high benefits for the multi-agent system. These results help design effective mechanisms in which self-organized division of labor emerges so as to maximize the benefit of the multi-agent system.
Decentralized combinatorial optimization in evolving multi-agent systems poses significant challenges: agents must balance long-term decision-making and short-term optimized collective outcomes while preserving the autonomy of interacting agents under unanticipated changes. Reinforcement learning offers a way to model sequential decision-making through dynamic programming that anticipates future environmental changes. However, applying multi-agent reinforcement learning (MARL) to decentralized combinatorial optimization remains an open challenge due to the exponential growth of the joint state-action space, high communication overhead, and privacy concerns in centralized training. To address these limitations, this paper proposes Hierarchical Reinforcement and Collective Learning (HRCL), a novel approach that combines MARL with decentralized collective learning in a hierarchical framework. Agents adopt high-level strategies using MARL to group possible plans, reducing the action space and constraining agent behavior toward Pareto optimality, while the low-level collective learning layer ensures efficient, decentralized, coordinated decisions among agents with minimal communication. Extensive experiments on a synthetic scenario and on real-world smart-city application models, including energy self-management and drone swarm sensing, demonstrate that HRCL significantly improves performance, scalability, and adaptability compared with standalone MARL and collective learning, achieving a win-win synthesis of the two approaches.
Generative agents are rapidly advancing in sophistication, raising urgent questions about how they might coordinate when deployed in online ecosystems. This is particularly consequential in information operations (IOs), influence campaigns that aim to manipulate public opinion on social media. While traditional IOs have been orchestrated by human operators and relied on manually crafted tactics, agentic AI promises to make campaigns more automated, adaptive, and difficult to detect. This work presents the first systematic study of emergent coordination among generative agents in simulated IO campaigns. Using generative agent-based modeling, we instantiate IO and organic agents in a simulated environment and evaluate coordination across operational regimes, from simple goal alignment to team knowledge and collective decision-making. As operational regimes become more structured, IO networks become denser and more clustered, interactions more reciprocal and positive, narratives more homogeneous, amplification more synchronized, and hashtag adoption faster and more sustained. Remarkably, simply revealing to agents which other agents share their goals can produce coordination levels nearly equivalent to those achieved through explicit deliberation and collective voting. Overall, we show that generative agents, even without human guidance, can reproduce coordination strategies characteristic of real-world IOs, underscoring the societal risks posed by increasingly automated, self-organizing IOs.
Coordinating multiple large language models (LLMs) to solve complex tasks collaboratively poses a fundamental trade-off between computation costs and collective performance relative to individual models. We introduce a novel, game-theoretically grounded reinforcement learning (RL) framework, the Multi-Agent Cooperation Sequential Public Goods Game (MAC-SPGG), to systematically incentivize cooperation in multi-LLM ensembles. In MAC-SPGG, LLM agents move in sequence, observing predecessors' outputs and updating beliefs to condition their own contributions. By redesigning the public-goods reward, effortful contribution becomes the unique Subgame Perfect Nash Equilibrium (SPNE), eliminating the free-riding that arises under the traditional SPGG or PGG. The sequential protocol replaces costly round-based information exchange with a streamlined decision flow, cutting communication overhead while retaining strategic depth. We prove the existence and uniqueness of the SPNE under realistic parameters, and empirically show that MAC-SPGG-trained ensembles outperform single-agent baselines, chain-of-thought prompting, and other cooperative methods, even achieving performance comparable to large-scale models across reasoning, math, code generation, and NLP tasks. Our results highlight the power of structured, incentive-aligned MAC-SPGG cooperation for scalable and robust multi-agent language generation.
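The SPNE claim can be illustrated by backward induction on a toy sequential public goods game. In the sketch below, the reward redesign (a bonus proportional to one's own contribution) is our illustrative stand-in for the MAC-SPGG reward, chosen so that contributing is strictly dominant; it is not the paper's actual reward function.

```python
# Backward induction over a sequence of binary contribute/free-ride moves.
def payoff(i, actions, r=1.6, bonus=0.8):
    pool = sum(actions)
    share = r * pool / len(actions)        # standard PGG public return
    # redesigned reward: own effort costs 1 but earns an extra bonus,
    # so the marginal value of contributing (r/n - 1 + bonus) is positive
    return share - actions[i] + bonus * actions[i]

def spne_actions(n):
    def best(prefix):
        i = len(prefix)
        if i == n:                         # leaf: full action profile fixed
            return prefix
        # player i anticipates the equilibrium continuation after each action
        options = [best(prefix + [a]) for a in (0, 1)]
        return max(options, key=lambda leaf: payoff(i, leaf))
    return best([])

print(spne_actions(4))   # [1, 1, 1, 1]: effortful contribution at every node
```

With r=1.6, n=4, and bonus=0.8, contributing yields a marginal gain of 0.4 - 1 + 0.8 = 0.2 regardless of what others do, so all-contribute is the unique subgame perfect outcome, mirroring the free-riding elimination described above.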
With the increasing application of artificial intelligence in decision science, it is increasingly important to understand and simulate the strategy choices of individuals in complex interactive environments. As a model of the dynamics of cooperation and betrayal, the snowdrift game gives researchers a useful tool for analyzing individual behavioral strategies. This study explores the application of reinforcement learning algorithms to snowdrift games: a computational model of the snowdrift game is constructed by introducing a Q-learning algorithm, and the influence of key parameters, such as the betrayal factor, cooperation factor, and exploration rate, on strategy selection is investigated. The paper experimentally analyzes how these factors shape the algorithm's decision-making process, how they can promote the emergence of cooperative behavior, and how strategy choices beneficial to collective welfare arise in an uncertain environment.
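A minimal tabular Q-learning setup for the repeated snowdrift game looks as follows; the payoff values b and c, the epsilon exploration rate, and the choice of the opponent's last move as the state are illustrative, not the study's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
b, c = 1.0, 0.6                      # benefit of clearing the drift, shared cost
# payoff[my_action, other_action]; action 0 = cooperate (shovel), 1 = defect
payoff = np.array([[b - c / 2, b - c],   # CC, CD
                   [b,         0.0 ]])   # DC, DD  (snowdrift: T > R > S > P)

Q = np.zeros((2, 2, 2))              # Q[agent, state, action]
alpha, gamma, eps = 0.1, 0.9, 0.1
state = [0, 0]                       # each agent conditions on the other's last move
for _ in range(50000):
    acts = [int(rng.integers(2)) if rng.random() < eps
            else int(np.argmax(Q[i, state[i]])) for i in range(2)]
    for i in range(2):
        r_i = payoff[acts[i], acts[1 - i]]
        s_next = acts[1 - i]         # next state: the opponent's current move
        Q[i, state[i], acts[i]] += alpha * (
            r_i + gamma * Q[i, s_next].max() - Q[i, state[i], acts[i]])
        state[i] = s_next
print(np.round(Q, 2))   # anti-coordination (one shovels, one defects) is typical
```

Varying b and c changes the temptation-to-cost ratio, which is exactly the kind of parameter sweep the abstract describes for studying when cooperation emerges.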
The effectiveness of a mass vaccination program can engender its own undoing if individuals choose to not get vaccinated believing that they are already protected by herd immunity. This would appear to be the optimal decision for an individual, based on a strategic appraisal of her costs and benefits, even though she would be vulnerable during subsequent outbreaks if the majority of the population argues in this manner. We investigate how voluntary vaccination can nevertheless emerge in a social network of rational agents, who make informed decisions whether to be vaccinated, integrated with a model of epidemic dynamics. The information available to each agent includes the prevalence of the disease in their local network neighborhood and/or globally in the population, as well as the fraction of their neighbors that are protected against the disease. Crucially, the payoffs governing the decision of agents vary with disease prevalence, resulting in the vaccine uptake behavior changing in response to contagion spreading. The collective behavior of the agents responding to local prevalence can lead to a significant reduction in the final epidemic size, particularly for less contagious diseases having a low basic reproduction number $R_0$. Near the epidemic threshold ($R_0 \approx 1$) the use of local prevalence information can result in divergent responses in the final vaccine coverage. Our results suggest that heterogeneity in the risk perception resulting from the spatio-temporal evolution of an epidemic differentially affects agents' payoffs, which is a critical determinant of the success of voluntary vaccination schemes.
No abstract available
No abstract available
No abstract available
We study strategic behaviour in goal-based voting, where agents take a collective decision over multiple binary issues based on their individual goals (expressed as propositional formulas). We focus on three generalizations of the issue-wise majority rule, and study their resistance to manipulability in the general case, as well as for restricted languages for goals. We also study how computationally hard it is for an agent to know if they can profitably manipulate.
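One of the generalizations of the issue-wise majority rule can be sketched directly, under our own simplified encoding in which goals are given as Python predicates over issue assignments and an agent approves an issue if it holds in a majority of the goal's models:

```python
from itertools import product

def models(goal, n_issues):
    """All satisfying assignments (models) of a goal formula."""
    return [m for m in product([0, 1], repeat=n_issues) if goal(m)]

def issuewise_majority(goals, n_issues):
    votes = []
    for goal in goals:
        ms = models(goal, n_issues)
        # agent-level majority over the models of its own goal
        votes.append([int(sum(m[j] for m in ms) * 2 > len(ms))
                      for j in range(n_issues)])
    # electorate-level majority over agents; ties broken toward 0
    return [int(sum(v[j] for v in votes) * 2 > len(votes))
            for j in range(n_issues)]

# Three agents, two binary issues p and q, with m = (p, q):
goals = [lambda m: m[0] and m[1],    # p AND q
         lambda m: m[0] or m[1],     # p OR q
         lambda m: not m[0]]         # NOT p
print(issuewise_majority(goals, 2))  # -> [1, 1] under this encoding
```

Manipulation, in this setting, amounts to an agent reporting a different goal formula so that the rule's outcome better satisfies its true goal, which is what the hardness results quantify.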
Swarm robotic systems utilize collective behaviour to achieve goals that might be too complex for a lone entity, but become attainable with localized communication and collective decision making. In this paper, a behaviour-based distributed approach to shape formation is proposed. Flocking into strategic formations is observed in migratory birds and fish to avoid predators and to conserve energy. The formation is maintained throughout long periods without collapsing and is advantageous for communicating within the flock. Similar behaviour can be deployed in multi-agent systems to enhance coordination within the swarm. Existing methods for formation control are either dependent on the size and geometry of the formation or rely on maintaining the formation with a single reference in the swarm (the leader). These methods are not resilient to failure and involve a high degree of deformation upon obstacle encounter before the shape is recovered. To improve on this, we elucidate an artificial force-based interaction among the entities of the swarm that maintains shape integrity when obstacles are encountered.
Any community in which membership is optional may eventually break apart, or fork. For example, forks may occur in political parties, business partnerships, social groups, cryptocurrencies, and federated governing bodies. Forking is typically the product of informal social processes or the organized action of an aggrieved minority, and it is not always amicable. Forks usually come at a cost, and can be seen as consequences of collective decisions that destabilize the community. Here, we provide a social choice setting in which agents can report preferences not only over a set of alternatives, but also over the possible forks that may occur in the face of disagreement. We study this social choice setting, concentrating on stability issues and concerns of strategic agent behavior.
Many collective decision-making settings feature a strategic tension between agents acting out of individual self-interest and promoting a common good. These include wearing face masks during a pandemic, voting, and vaccination. Networked public goods games capture this tension, with networks encoding strategic interdependence among agents. Conventional models of public goods games posit solely individual self-interest as a motivation, even though altruistic motivations have long been known to play a significant role in agents’ decisions. We introduce a novel extension of public goods games to account for altruistic motivations by adding a term in the utility function that incorporates the perceived benefits an agent obtains from the welfare of others, mediated by an altruism graph. Most importantly, we view altruism not as immutable, but rather as a lever for promoting the common good. Our central algorithmic question then revolves around the computational complexity of modifying the altruism network to achieve desired public goods game investment profiles. We first show that the problem can be solved using linear programming when a principal can fractionally modify the altruism network. While the problem becomes in general intractable if the principal’s actions are all-or-nothing, we exhibit several tractable special cases.
Recent work studied Stackelberg security games with multiple defenders, in which heterogeneous defenders allocate security resources to protect a set of targets against a strategic attacker. Equilibrium analysis was conducted to characterize outcomes of these games when defenders act independently. Our starting point is the observation that the use of resources in equilibria may be inefficient due to lack of coordination. We explore the possibility of reducing this inefficiency by coordinating the defenders---specifically, by pooling the defenders' resources and allocating them jointly. The defenders' heterogeneous preferences then give rise to a collective decision-making problem, which calls for a mechanism to generate joint allocation strategies. We seek a mechanism that encourages coordination, produces efficiency gains, and incentivizes the defenders to report their true preferences and to execute the recommended strategies. Our results show that, unfortunately, even these basic properties clash with each other and no mechanism can achieve them simultaneously, which reveals the intrinsic difficulty of achieving meaningful defense coordination in security games. On the positive side, we put forward mechanisms that fulfill some of these properties and we identify special cases of our setting where more of these properties are compatible.
No abstract available
Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. However, their application in strategic multi-agent decision-making environments is hampered by significant limitations including poor mathematical reasoning, difficulty in following instructions, and a tendency to generate incorrect information. These deficiencies hinder their performance in strategic and interactive tasks that demand adherence to nuanced game rules, long-term planning, exploration in unknown environments, and anticipation of opponents' moves. To overcome these obstacles, this paper presents a novel LLM agent framework equipped with memory and specialized tools to enhance their strategic decision-making capabilities. We deploy the tools in a number of economically important environments, in particular bilateral bargaining and multi-agent and dynamic mechanism design. We employ quantitative metrics to assess the framework's performance in various strategic decision-making problems. Our findings establish that our enhanced framework significantly improves the strategic decision-making capability of LLMs. While we highlight the inherent limitations of current LLM models, we demonstrate the improvements through targeted enhancements, suggesting a promising direction for future developments in LLM applications for interactive environments.
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
In recent years, there has been increasing interest in the theory and applications of decision-making in multi-agent systems, where the interactions among multiple groups of individuals exhibit complex behaviors. However, a large body of work considers only a single homogeneous population, limiting applicability to real settings. To this end, we develop a general framework for collective decision-making on a networked multi-population. We study this problem in populations with a large number of agents, where each agent must choose one of two available options or remain uncommitted. The contribution of this paper is threefold. First, we develop a framework for collective decision-making on a networked multi-population where the transition rates depend on the neighboring populations. Second, we characterize the equilibria and find conditions for local asymptotic stability. Finally, we study the global stability of the equilibrium where all players remain uncommitted.
The merged groupings comprehensively cover the research frontier of multi-agent gameplay and reflection. The research landscape shows a clearly layered structure: at the base, distributed control theory and Nash equilibrium seeking provide the mathematical foundation; in the middle, multi-agent reinforcement learning and evolutionary game dynamics drive adaptive strategy evolution and the emergence of collective behavior; at the top, large language models (LLMs) endow agents with theory of mind, strategic reflection, and social reasoning. These theories have also been widely applied in practical settings such as energy management, cybersecurity, human-machine collaboration, and large-scale industrial scheduling, reflecting a development trend that moves from "rational computation" toward "cognitive reflection" and "social coordination".