Multi-Agent Multimodal Societies: Social Memory, Social Event Representation, and Multi-Perspective Social Reasoning
Structured Social Memory and Agent Cognitive Architectures
This group of papers explores how to build human-like cognitive systems for agents, focusing on multimodal long-term memory (episodic and semantic), maintaining persona consistency, and using knowledge graphs and the event calculus for persistent information management and lifecycle maintenance.
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory(Lin Long, Yichen He, Wen-song Ye, Yiyuan Pan, Yuan Lin, Hang Li, Junbo Zhao, Wei Li, 2025, ArXiv)
- CogniPair: From LLM Chatbots to Conscious AI Agents - GNWT-Based Multi-Agent Digital Twins for Social Pairing - Dating & Hiring Applications(Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Mengnan Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li, 2025, ArXiv)
- Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting(Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira, 2025, ArXiv)
- TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation(Bangde Du, Minghao Guo, Songming He, Ziyi Ye, Xi Zhu, Weihang Su, Shuqi Zhu, Yujia Zhou, Yongfeng Zhang, Qingyao Ai, Yiqun Liu, 2025, ArXiv)
- MIRIX: Multi-Agent Memory System for LLM-Based Agents(Yu Wang, Xi Chen, 2025, ArXiv)
- CASEE: A Hierarchical Event Representation for the Analysis of Videos(Asaad Hakeem, Yaser Sheikh, M. Shah, 2004, No journal)
- Queryable AAS Graphs for AI Agents: An Event-Driven Knowledge Graph Integration for AAS Environments(Gerhard Sonnenberg, Peter Stein, Fabio Espinosa, Daniel Porta, 2025, 2025 IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA))
- Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs(Zhaoyu Fan, Kaihang Pan, Mingze Zhou, Bosheng Qin, Juncheng Li, Shengyu Zhang, Wenqiao Zhang, Siliang Tang, Fei Wu, Yueting Zhuang, 2025, ArXiv)
- Narrative Memory in Machines: Multi-Agent Arc Extraction in Serialized TV(Roberto Balestri, G. Pescatore, 2025, ArXiv)
- CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine(Yuyang Cheng, Linyue Cai, Chang‐Hung Peng, Yumiao Xu, Rongfang Bie, Yong Zhao, 2025, ArXiv)
- SupportPlay: A Multi-Agent Role-Playing System for Personalized and Sustained Multimodal Emotional Support Conversation(Geng Tu, Bingbing Wang, Erik Cambria, Wenjie Li, Ruifeng Xu, 2025, Companion Proceedings of the ACM on Web Conference 2025)
- Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models(Sureyya Akin, Shruti T. Tiwari, R. Bhattacharya, Sagar A. Raman, Kiran Mohanty, Sita Krishnan, 2025, ArXiv)
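A recurring design across these systems is the split between raw episodic traces and consolidated semantic facts. The sketch below illustrates that split in miniature; the class and method names are hypothetical and not drawn from any one of the papers above.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Episode:
    """A time-stamped event trace from one modality."""
    timestamp: float
    modality: str      # e.g. "text", "image", "audio"
    content: str

@dataclass
class AgentMemory:
    """Toy episodic/semantic split: raw event traces plus distilled facts."""
    episodic: list = field(default_factory=list)
    semantic: dict = field(default_factory=dict)   # fact key -> value

    def record(self, modality, content):
        self.episodic.append(Episode(time.time(), modality, content))

    def consolidate(self, key, value):
        # Stands in for the LLM-based summarization such systems use to
        # promote recurring observations into persistent facts.
        self.semantic[key] = value

    def recall(self, keyword):
        return [e for e in self.episodic if keyword in e.content]

memory = AgentMemory()
memory.record("text", "Alice greeted Bob at the cafe")
memory.record("image", "photo: Alice waving at the cafe entrance")
memory.consolidate("alice_relation", "friendly with Bob")
```

Real systems differ mainly in how `consolidate` is triggered (periodic reflection, retrieval pressure, or explicit lifecycle rules), which is exactly the design space the papers above explore.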
Multi-Perspective Social Reasoning and Theory of Mind (ToM) Modeling
This research focuses on agents' ability to understand others' mental states (intentions, false beliefs), using perspective taking, abductive reasoning, roundtable discussion, and logic-augmented techniques to deepen agents' decision-making and strategic capabilities in complex social interactions.
- Bayesian Theory of Mind for False Belief Understanding in Human-Robot Interaction(Mehdi Hellou, Samuele Vinanzi, Angelo Cangelosi, 2023, 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective(Qiyao Xue, Weichen Liu, Shiqi Wang, Haoming Wang, Yuyang Wu, Wei Gao, 2025, ArXiv)
- PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision(Sabrina Patania, Luca Annese, Anita Pellegrini, Silvia Serino, Anna Lambiase, Luca Pallonetto, Silvia Rossi, Simone Colombani, Tom Foulsham, A. Ruggeri, Dimitri Ognibene, 2025, ArXiv)
- LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring(Jinhee Jang, Ayoung Moon, Minkyoung Jung, Youngbin Kim, Seung Jin Lee, 2025, ArXiv)
- Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance(Baopu Qiu, Hao Chen, Yuan Wu, Changtong Zan, Chao Wei, Weiru Zhang, Xiaoyi Zeng, 2026, ArXiv)
- ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective(Yiwen Zhang, Ziang Chen, Fanqi Kong, Yizhe Huang, Xue Feng, 2025, ArXiv)
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind(Emilio Villa-Cueva, S M Masrur Ahmed, Rendi Chevi, Jan Christian Blaise Cruz, Kareem Elzeky, Fermin Cristobal, Alham Fikri Aji, Skyler Wang, Rada Mihalcea, T. Solorio, 2025, ArXiv)
- Perceiving minds in machines: how perceived theory of mind in robots influences human–robot empathy through the lens of mind perception theory(Ruolin Fan, Yunjia Zheng, Jiayi Li, Guiping Xu, 2025, BMC Psychology)
- Multi-Perspective Explanations for Multi-Agent Systems(Nathan Lloyd, Peter R. Lewis, 2025, 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C))
- Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning(Song Yu, Xiaofei Xu, Keqin Deng, Li Li, Lin Tian, 2025, ArXiv)
- DiPT: Enhancing LLM reasoning through diversified perspective-taking(H. Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia, 2024, ArXiv)
- Identifying Power Relations in Conversations using Multi-Agent Social Reasoning(Zhaoqing Wu, Dan Goldwasser, Maria Leonor Pacheco, Leora Morgenstern, 2025, No journal)
- From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning(Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Wei Yang, Zikai Song, 2025, ArXiv)
- Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames(Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis, 2025, ArXiv)
- ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions(Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling, 2025, No journal)
- SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions(Xianzhe Fan, Xuhui Zhou, Chuanyang Jin, Kolby Nottingham, Hao Zhu, M. Sap, 2025, ArXiv)
- AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios(Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, Zhongyu Wei, 2024, No journal)
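The false-belief scenarios these benchmarks probe (Sally-Anne-style tasks, where an observer must track a belief that has diverged from reality) reduce to a bookkeeping rule: another agent's believed state is updated only for events that agent witnessed. Below is a deterministic toy version with illustrative names; Bayesian ToM models treat the same bookkeeping probabilistically.

```python
def observer_model(world_moves, other_present):
    """Toy Sally-Anne-style false-belief tracker.

    world_moves: list of (obj, location) events in the true world.
    other_present: parallel list of bools, was the other agent watching?
    Returns (true_state, other_belief): real locations vs. what the
    observer infers the other agent believes.
    """
    true_state, other_belief = {}, {}
    for (obj, loc), seen in zip(world_moves, other_present):
        true_state[obj] = loc
        if seen:                      # the other agent witnessed the move,
            other_belief[obj] = loc   # so their belief tracks reality
        else:
            other_belief.setdefault(obj, None)  # belief stays stale
    return true_state, other_belief

# Sally sees the marble placed in the basket, then leaves before it moves.
moves = [("marble", "basket"), ("marble", "box")]
present = [True, False]
truth, belief = observer_model(moves, present)
```

The divergence between `truth` and `belief` is what a false-belief benchmark checks: the agent should predict Sally will search the basket even though the marble is in the box.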
Social Dynamics Simulation and the Evolution of Group Behavior
This line of work addresses the emergence of social norms, opinion polarization, conformity, and democratic processes in large-scale multi-agent systems. It combines personality-driven models with discrete-event simulation to uncover the macro-level evolutionary patterns of complex social systems.
- Mindstorms in Natural Language-Based Societies of Mind(Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Hammoud, Vincent Herrmann, K. Irie, Louis Kirsch, Bing-chuan Li, G. Li, Shuming Liu, Jinjie Mai, Piotr Piękos, A. Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yu‐Han Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, J. Schmidhuber, 2023, Comput. Vis. Media)
- Conformity Dynamics in LLM Multi-Agent Systems: The Roles of Topology and Self-Social Weighting(Chen Han, Jin Tan, Bohan Yu, Wenzhen Zheng, Xijin Tang, 2026, ArXiv)
- LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation(Yijun Liu, Wu Liu, Xiaoyan Gu, Yong Rui, Xiaodong He, Yongdong Zhang, 2024, ArXiv)
- SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation(Gaurav Koley, 2025, ArXiv)
- Simulating Social Behavior of LLM-Based Autonomous Negotiator Agents in a Game-Theoretical Framework Using Multi-Agent Systems(Ahmad Mouri Zadeh Khaki, Ahyoung Choi, Laleh Seyyed-Kalantari, 2025, International Journal of Human–Computer Interaction)
- PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation(Zhixiang Lu, Xueyuan Deng, Yiran Liu, Yulong Li, Qiang Yan, Imran Razzak, Jionglong Su, 2025, ArXiv)
- Towards Simulating Social Influence Dynamics with LLM-Based Multi-Agents(Hsien-Tsung Lin, Pei-Cing Huang, Chan-Tung Ku, Chan Hsu, Pei-Xuan Shieh, Yihuang Kang, 2025, 2025 IEEE International Conference on Information Reuse and Integration and Data Science (IRI))
- Fostering collective intelligence in CPSS: an LLM-driven multi-agent cooperative tuning framework(Rongjun Chen, Chengbo He, 2025, Frontiers in Physics)
- Discrete Event Dynamic Modeling and Analysis of the Democratic Progress in a Society Controlled by Networked Agents(Seong-Jin Park, Kwang-Hyun Cho, 2021, IEEE Transactions on Automatic Control)
- Emergence of Social Norms in Large Language Model-based Agent Societies(Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu, 2024, ArXiv)
- Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges(P. Taillandier, Jean-Daniel Zucker, A. Grignard, B. Gaudou, N. Huynh, A. Drogoul, 2025, ArXiv)
- A Visualized Framework for Event Cooperation with Generative Agents(Yuyang Tian, Shunqiang Mao, Wenchang Gao, Lanlan Qiu, Tianxing He, 2025, ArXiv)
- FACS-CHARM: A Hybrid Agent-Based and Discrete-Event Simulation Approach for Covid-19 Management at Regional Level(Anastasia Anagnostou, D. Groen, Simon J. E. Taylor, D. Suleimenova, N. Abubakar, Arindam Saha, K. Mintram, Maziar Ghorbani, Habiba Daroge, Tasin Islam, Yani Xue, Edward Okine, N. Anokye, 2022, 2022 Winter Simulation Conference (WSC))
- Emergence of Social Norms in Generative Agent Societies: Principles and Architecture(Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu, 2024, No journal)
- MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs(Xianhao Yu, Jiaqi Fu, Renjia Deng, Wenjuan Han, 2024, ArXiv)
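Many of the conformity and polarization studies above build on opinion-averaging updates with an explicit self-weight, in the spirit of the self-social weighting examined in the conformity paper listed here. A minimal DeGroot-style sketch follows; the topology and parameter values are arbitrary.

```python
def degroot_step(opinions, neighbors, self_weight=0.5):
    """One DeGroot-style update: each agent mixes its own opinion with
    the mean of its neighbors', weighted self_weight : (1 - self_weight).
    opinions: list of floats; neighbors: dict agent index -> neighbor indices.
    """
    new = []
    for i, own in enumerate(opinions):
        nbrs = neighbors[i]
        social = sum(opinions[j] for j in nbrs) / len(nbrs) if nbrs else own
        new.append(self_weight * own + (1 - self_weight) * social)
    return new

# Ring of four agents with polarized initial opinions.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ops = [1.0, 1.0, -1.0, -1.0]
for _ in range(50):
    ops = degroot_step(ops, neighbors)
```

On a symmetric topology like this ring, repeated averaging drives the initially polarized opinions to the common mean; it is asymmetric weights, personality heterogeneity, and rewired topologies that produce the persistent polarization the papers above study.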
Multimodal Social Event Representation, Detection, and Intent Recognition
These studies use multimodal data (text, images, social networks) to build graph-based representations of social events, covering rumor detection, harmful-intent identification, sentiment forecasting, and brand-perception analysis, with an emphasis on cross-modal alignment and fine-grained semantic understanding.
- ContextAware: A Multi-Agent Framework for Detecting Harmful Image-Based Comments on Social Media(Zheng Wei, Mingchen Li, Pu Zhang, Xinyu Liu, Huamin Qu, Pan Hui, 2024, No journal)
- SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media(Xilai Xu, Zilin Zhao, Chengye Song, Zining Wang, J. Qiang, Jiongrui Yan, Yuhuai Lin, 2025, ArXiv)
- Retrieval-Augmented Hypergraph for Multimodal Social Media Popularity Prediction(Zhangtao Cheng, Jienan Zhang, Xovee Xu, Goce Trajcevski, Ting Zhong, Fan Zhou, 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
- Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information(Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji, 2024, No journal)
- EvolGCN: A Co-Evolutionary Graph Convolutional Network Model for Dynamically Spatio-Temporal Anomaly Event Inference(Xiaoming Liu, Hang Pu, Zhuo Chen, Zhanwei Zhang, Y. Lan, Chao Shen, 2025, IEEE Transactions on Dependable and Secure Computing)
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents(Fanhang Man, Huandong Wang, Jianjie Fang, Zhaoyi Deng, Baining Zhao, Xinlei Chen, Yong Li, 2025, No journal)
- Agent‐oriented activity recognition in the event calculus: An application for diabetic patients(Özgür Kafali, A. Romero, Kostas Stathis, 2017, Computational Intelligence)
- Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs(Firoj Alam, Md. Rafiul Biswas, Uzair Shah, W. Zaghouani, Georgios Mikros, 2024, ArXiv)
- MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media(Rui Lu, Jinhe Bi, Yunpu Ma, Feng Xiao, Yuntao Du, Yijun Tian, 2025, ArXiv)
- Aligned or Apart? Multi-Agent Insights into Consumer and Brand Messaging Discrepancies(Haotian Gan, Yudong Li, Wanyue Li, Weidong Tang, 2025, Proceedings of the 33rd ACM International Conference on Multimedia)
- Emotion meets coordination: Designing multi-agent LLMs for fine-grained user sentiment detection on social media(Hao Dong, Zuowen Bao, Muze Li, Zhengfeng Yang, 2026, PLOS One)
- A Multi-Agent Framework for Fine-Grained Multimodal Named Entity Recognition through Check and Reasoning(Heng-yang Lu, Xintong Liu, Xingda Shang, Wei Fang, Xiao-jun Wu, 2025, Proceedings of the 7th ACM International Conference on Multimedia in Asia)
- Graph Representation Learning with Massive Unlabeled Data for Rumor Detection(Chaoqun Cui, Caiyan Jia, 2025, ArXiv)
- Multiple Agent Event Detection and Representation in Videos(Asaad Hakeem, M. Shah, 2005, No journal)
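Several of the harmful-content detectors above aggregate verdicts from modality-specific views, as in multi-view agent debate. A toy reduction of that idea is sketched below; the view names and the safety-first tie-breaking rule are illustrative, not any single paper's protocol.

```python
from collections import Counter

def multi_view_verdict(view_judgments):
    """Aggregate per-view harmfulness verdicts by majority vote.
    view_judgments: dict view name -> "harmful" or "benign".
    Ties default to "harmful" as a conservative safety choice.
    """
    counts = Counter(view_judgments.values())
    harmful = counts.get("harmful", 0)
    benign = counts.get("benign", 0)
    return "harmful" if harmful >= benign else "benign"

verdict = multi_view_verdict({
    "text_view": "benign",          # sarcasm hides the attack in text alone
    "image_view": "harmful",
    "cross_modal_view": "harmful",  # irony only visible when modalities align
})
```

The debate frameworks in this section replace this one-shot vote with iterative argument exchange and reflection gating, but the cross-view disagreement shown here is the signal they exploit.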
Domain-Specific Collaborative Decision-Making and Specialized Applications
This group demonstrates multi-agent frameworks in vertical domains such as business data analytics, cybersecurity detection, fact-checking, educational social-interaction analysis, and culture-aware simulation, using debate and collaboration mechanisms to improve task accuracy.
- Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play(Sha Li, R. Reddy, Khanh Duy Nguyen, Qingyun Wang, May Fung, Chi Han, Jiawei Han, Kartik Natarajan, Clare Voss, Heng Ji, 2024, ArXiv)
- CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences(Yiyang Wang, Chen Chen, Tica Lin, Vishnu Raj, J. Kimball, Alex Cabral, Josiah D. Hester, 2025, ArXiv)
- Supporting social interactions analyses with multi-agent large language models (MALLM) – an exploratory study(Jui-Long Hung, Yeye Tang, Xu Du, Hao Li, Minghao Deng, 2025, Information Discovery and Delivery)
- Research on the Influencing Factors of the Willingness to Share Knowledge in WeChat Moments(Honghua Xie, Juan Zhu, 2019, Proceedings of the 2019 4th International Conference on Humanities Science and Society Development (ICHSSD 2019))
- Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics(Ran Zhang, Mohannad Elhamod, 2025, ArXiv)
- LoCal: Logical and Causal Fact-Checking with LLM-Based Multi-Agents(Jiatong Ma, Linmei Hu, Rang Li, Wenbo Fu, 2025, Proceedings of the ACM on Web Conference 2025)
- PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection(Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah, 2025, 2025 IEEE International Conference on Big Data (BigData))
- Mining security events in a distributed agent society(D. Dasgupta, José M. Rodríguez, S. Balachandran, 2006, No journal)
- Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning(Rui Huang, Wendy Liu, Anastasia Miin, Lei Ding, 2025, ArXiv)
Safety, Trust Governance, and Formal Theory for Social Intelligence
This group examines trust establishment, deception detection, responsibility attribution, and safety compliance in multi-agent interaction, together with foundational theory such as the formal representation of affordances.
- Formalizing Affordance(Mark Steedman, 2019, Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society)
- FOGMACHINE - Leveraging Discrete-Event Simulation and Scene Graphs for Modeling Hierarchical, Interconnected Environments under Partial Observations from Mobile Agents(Lars Ohnemus, Nils Hantke, Max Weißer, Kai Furmans, 2025, ArXiv)
- Tensor optimization with group lasso for multi-agent predictive state representation(Biyang Ma, Jing Tang, Bilian Chen, Yinghui Pan, Yi-feng Zeng, 2021, Knowl. Based Syst.)
- Multimodal Safety Evaluation in Generative Agent Social Simulations(Alhim Vera, Karen Sanchez, Carlos Hinojosa, Haidar Bin Hamid, Donghoon Kim, Bernard Ghanem, 2025, ArXiv)
- Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic(Angus Addlesee, Neeraj Cherakara, Nivan Nelson, Daniel Hernández García, Nancie Gunson, Weronika Maria Sieińska, C. Dondrup, Oliver Lemon, 2024, No journal)
- From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration(Shuide Wen, Yu Sun, Beier Ku, Zhiqi Gao, Lijun Ma, Yang Yang, Can Jiao, 2025, ArXiv)
- Mind Meets Space: Rethinking Agentic Spatial Intelligence from a Neuroscience-inspired Perspective(Bui Duc Manh, Soumyaratna Debnath, Zetong Zhang, Shriram Damodaran, Arvind Kumar, Yueyi Zhang, Lu Mi, Erik Cambria, Lin Wang, 2025, ArXiv)
- The Traitors: Deception and Trust in Multi-Agent Language Model Simulations(Pedro M. P. Curvo, 2025, ArXiv)
- Trustworthiness assessment in multimodal human-robot interaction based on cognitive load(M. Kirtay, Erhan Öztop, A. Kuhlen, M. Asada, V. Hafner, 2022, 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning(Dayong Liang, Xiao-Yong Wei, Changmeng Zheng, 2025, ArXiv)
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning(Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu, 2025, ArXiv)
- Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment(Jinwei Hu, Yi Dong, Shuang Ao, Zhuoyun Li, Boxuan Wang, Lokesh Singh, Guangliang Cheng, Sarvapali D. Ramchurn, Xiaowei Huang, 2025, ArXiv)
- The open agent society: retrospective and prospective views(J. Pitt, A. Artikis, 2015, Artificial Intelligence and Law)
- Explicit Cooperation Shapes Human-Like Multi-Agent LLM Negotiation(Yanru Jiang, Gülşah Akçakır, 2025, No journal)
- Modelling security risk in critical utilities: The system at risk as a three player game and agent society(J. Busby, Antonios Gouglidis, S. Rass, Sandra König, 2016, 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
This report synthesizes recent research on multi-agent systems in social computing, forming a complete picture from low-level cognitive architectures to high-level social governance. The core of the field has shifted from purely text-based interaction to embodied agents with multimodal perception, long-term social memory, and Theory of Mind (ToM) reasoning. Key trends include: (1) the integration of structured memory with persona-based architectures; (2) simulation of complex social norms and group dynamics; (3) precise detection and intent analysis of multimodal social events; and (4) a growing emphasis on trust, responsibility, and safety governance in multi-agent collaboration. Together, these advances are pushing AI agents toward becoming digital members of society with deep social understanding.
Total: 82 related publications
This research introduces the Multimodal Socialized Learning Framework (M-S2L), designed to foster emergent social intelligence in AI agents by integrating Multimodal Large Language Models (M-LLMs) with social learning mechanisms. The framework equips agents with multimodal perception (vision and text) and structured action capabilities, enabling physical manipulation and grounded multimodal communication (e.g., text with visual pointers). M-S2L combines direct reinforcement learning with two novel social learning pathways: multimodal observational learning and communication-driven learning from feedback, augmented by an episodic memory system for long-term social context. We evaluate M-S2L in a Collaborative Assembly Environment (CAE), where agent teams must construct complex devices from ambiguous blueprints under informational asymmetry. Across tasks of increasing complexity, M-S2L agents consistently outperform Text-Only and No-Social-Learning baselines in Task Completion Rate and Time to Completion, particularly in dynamic problem-solving scenarios. Ablation studies confirm the necessity of both multimodality and socialized learning. Our analysis reveals the emergence of efficient communication protocols integrating visual pointers with concise text, alongside rapid role specialization leading to stable labor division. Qualitative case studies demonstrate agents' abilities for shared awareness, dynamic re-planning, and adaptive problem-solving, suggesting a nascent form of machine social cognition. These findings indicate that integrating multimodal perception with explicit social learning is critical for developing human-like collaborative intelligence in multi-agent systems.
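The grounded multimodal messages described here (text with visual pointers) amount to pairing an utterance with coordinates into a shared visual frame. A minimal container type is sketched below; the field names are assumptions for illustration, not M-S2L's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class GroundedMessage:
    """An utterance optionally grounded by a pointer into a shared
    visual frame, the kind of message the M-S2L abstract describes."""
    sender: str
    text: str
    pointer: Optional[Tuple[int, int]] = None   # (x, y) pixel coordinates

msg = GroundedMessage("agent_1", "attach this part here", pointer=(128, 64))
plain = GroundedMessage("agent_2", "done")
```

Keeping the pointer optional lets the same channel carry both grounded instructions and plain acknowledgements, which matches the mixed protocols the abstract reports emerging.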
The believable simulation of multi-user behavior is crucial for understanding complex social systems. Recently, large language models (LLMs)-based AI agents have made significant progress, enabling them to achieve human-like intelligence across various tasks. However, real human societies are often dynamic and complex, involving numerous individuals engaging in multimodal interactions. In this paper, taking e-commerce scenarios as an example, we present LMAgent, a very large-scale and multimodal agents society based on multimodal LLMs. In LMAgent, besides freely chatting with friends, the agents can autonomously browse, purchase, and review products, even perform live streaming e-commerce. To simulate this complex system, we introduce a self-consistency prompting mechanism to augment agents' multimodal capabilities, resulting in significantly improved decision-making performance over the existing multi-agent system. Moreover, we propose a fast memory mechanism combined with the small-world model to enhance system efficiency, which supports more than 10,000 agent simulations in a society. Experiments on agents' behavior show that these agents achieve comparable performance to humans in behavioral indicators. Furthermore, compared with the existing LLMs-based multi-agent system, more different and valuable phenomena are exhibited, such as herd behavior, which demonstrates the potential of LMAgent in credible large-scale social behavior simulations.
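The small-world social graph that LMAgent pairs with its fast-memory mechanism can be instantiated with the standard Watts-Strogatz construction sketched below. This illustrates the small-world model in general, not LMAgent's code, and the parameter values are arbitrary.

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Watts-Strogatz small-world graph: a ring lattice with k nearest
    neighbors per node, each lattice edge rewired with probability p.
    Returns adjacency as dict node -> set of neighbor nodes.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                      # build the ring lattice
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):                      # rewire each lattice edge
        for d in range(1, k // 2 + 1):
            if rng.random() < p:
                j = (i + d) % n
                new_j = rng.randrange(n)
                if new_j != i and new_j not in adj[i]:
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new_j)
                    adj[new_j].add(i)
    return adj

society = watts_strogatz(n=100, k=4, p=0.1)
```

The appeal for large agent societies is that a few rewired long-range links give short average path lengths while keeping neighborhoods mostly local, so message and influence propagation can be simulated without a dense all-to-all graph.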
Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling language models to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.
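The six memory types MIRIX names can be pictured as typed stores behind a simple router. The interface below is an illustrative guess for exposition, not MIRIX's actual design.

```python
# The six memory types named in the MIRIX abstract; the routing and
# retrieval interface here is an assumption for illustration.
MEMORY_TYPES = ("core", "episodic", "semantic", "procedural",
                "resource", "knowledge_vault")

class TypedMemory:
    def __init__(self):
        self.stores = {t: [] for t in MEMORY_TYPES}

    def route(self, item, kind):
        """File an item under one of the six typed stores."""
        if kind not in self.stores:
            raise ValueError(f"unknown memory type: {kind}")
        self.stores[kind].append(item)

    def retrieve(self, kind, predicate=lambda _: True):
        """Return items of one type matching an optional filter."""
        return [x for x in self.stores[kind] if predicate(x)]

tm = TypedMemory()
tm.route("user prefers dark mode", "core")
tm.route("2025-01-03: opened settings screenshot", "episodic")
tm.route("how to export a report", "procedural")
```

In MIRIX itself the routing and retrieval are coordinated dynamically by a multi-agent framework rather than by a fixed dispatcher, but the typed separation is the load-bearing idea.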
With the increasing prevalence of multimodal content on social media, sentiment analysis faces significant challenges in effectively processing heterogeneous data and recognizing multi-label emotions. Existing methods often lack effective cross-modal fusion and external knowledge integration. We propose SentiMM, a novel multi-agent framework designed to systematically address these challenges. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories. Extensive experiments demonstrate that SentiMM achieves superior performance compared to state-of-the-art baselines, validating the effectiveness of our structured approach.
We introduce M3-Agent, a novel multimodal agent framework equipped with long-term memory. Like humans, M3-Agent can process real-time visual and auditory inputs to build and update episodic and semantic memories, gradually accumulating world knowledge. Its memory is organized in an entity-centric, multimodal manner, enabling deeper and more consistent understanding of the environment. Given an instruction, M3-Agent autonomously performs multi-turn reasoning and retrieves relevant memories to complete tasks. To evaluate memory effectiveness and memory-based reasoning in multimodal agents, we develop M3-Bench, a long-video question answering benchmark comprising 100 newly recorded robot-perspective videos (M3-Bench-robot) and 920 diverse web-sourced videos (M3-Bench-web). We annotate QA pairs designed to test capabilities essential for agent applications, such as person understanding, general knowledge extraction, and cross-modal reasoning. Experimental results show that M3-Agent, trained via reinforcement learning, outperforms the strongest baseline, a prompting agent using Gemini-1.5-pro and GPT-4o, achieving 6.7%, 7.7%, and 5.3% higher accuracy on M3-Bench-robot, M3-Bench-web and VideoMME-long, respectively. Our work advances multimodal agents toward more human-like long-term memory and provides insights for their practical design. Model, code and data are available at https://github.com/bytedance-seed/m3-agent.
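The entity-centric organization described here, in which observations from every modality are grouped under the entity they concern, can be sketched minimally as follows (all names are illustrative, not M3-Agent's).

```python
from collections import defaultdict

class EntityMemory:
    """Toy entity-centric store: observations from any modality are
    grouped under the entity they mention, so later queries about a
    person or object see one consolidated record.
    """
    def __init__(self):
        self.entities = defaultdict(list)

    def observe(self, entity, modality, detail):
        self.entities[entity].append((modality, detail))

    def profile(self, entity):
        return self.entities.get(entity, [])

em = EntityMemory()
em.observe("Alice", "vision", "wearing a red coat")
em.observe("Alice", "audio", "said she works at the lab")
em.observe("whiteboard", "vision", "equation sketch")
```

Keying memory by entity rather than by timestamp is what lets cross-modal facts about the same person accumulate into a consistent picture, the property the M3-Bench person-understanding questions test.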
In this demo, we present SupportPlay, a multi-agent role-playing system for emotional support conversation (ESC) that addresses the limitations of existing methods in providing personalized, sustained, and multimodal support. SupportPlay generates potential seeker profiles from existing ESC datasets and employs GPT-powered agents to play various seeker roles, enabling the supporter to learn personalized memories for each seeker. Subsequently, the supporter can induce general memory from these memories for real user interactions while learning the user's personalized memory in a similar manner. Through continuous memory management-including retrieval, storage, reflection, and forgetting-SupportPlay delivers tailored emotional support across interactions. By integrating text, speech, and video, SupportPlay creates immersive emotional support experiences.
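Of the four memory operations SupportPlay cycles through (retrieval, storage, reflection, forgetting), forgetting is the easiest to make concrete. A time-to-live policy with refresh-on-access is one simple stand-in; the policy and all names below are assumptions, not the paper's mechanism.

```python
class DecayingMemory:
    """Toy retrieval-with-forgetting: each entry carries a last-access
    time; entries idle longer than ttl are dropped on maintenance."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.items = {}   # key -> (value, last_access_time)

    def store(self, key, value, now):
        self.items[key] = (value, now)

    def retrieve(self, key, now):
        if key in self.items:
            value, _ = self.items[key]
            self.items[key] = (value, now)   # refresh on access
            return value
        return None

    def forget(self, now):
        self.items = {k: (v, t) for k, (v, t) in self.items.items()
                      if now - t <= self.ttl}

support_mem = DecayingMemory(ttl=10)
support_mem.store("seeker_likes_music", True, now=0)
support_mem.store("one_off_detail", "rainy day", now=0)
support_mem.retrieve("seeker_likes_music", now=8)   # access refreshes it
support_mem.forget(now=12)                          # stale entry dropped
```

Refresh-on-access means facts the supporter keeps using survive while incidental details decay, a crude analogue of the reflection-driven memory management the demo describes.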
Hallucination continues to pose a major obstacle in the reasoning capabilities of large language models (LLMs). Although the Multi-Agent Debate (MAD) paradigm offers a promising solution by promoting consensus among multiple agents to enhance reliability, it relies on the unrealistic assumption that all debaters are rational and reflective, which is a condition that may not hold when agents themselves are prone to hallucinations. To address this gap, we introduce the Multi-agent Undercover Gaming (MUG) protocol, inspired by social deduction games like "Who is Undercover?". MUG reframes MAD as a process of detecting "undercover" agents (those suffering from hallucinations) by employing multimodal counterfactual tests. Specifically, we modify reference images to introduce counterfactual evidence and observe whether agents can accurately identify these changes, providing ground-truth for identifying hallucinating agents and enabling robust, crowd-powered multimodal reasoning. MUG advances MAD protocols along three key dimensions: (1) enabling factual verification beyond statistical consensus through counterfactual testing; (2) introducing cross-evidence reasoning via dynamically modified evidence sources instead of relying on static inputs; and (3) fostering active reasoning, where agents engage in probing discussions rather than passively answering questions. Collectively, these innovations offer a more reliable and effective framework for multimodal reasoning in LLMs. The source code can be accessed at https://github.com/YongLD/MUG.git.
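MUG's counterfactual test reduces to a consistency check: an agent grounded in the evidence must change its answer once the evidence is edited, while a hallucinating agent answers the same either way. A toy sketch in which agents are plain functions (MUG's actual protocol is debate-based and operates on modified images):

```python
def undercover_test(agents, original, counterfactual):
    """Flag agents whose answers ignore the evidence: a grounded agent
    should answer differently on counterfactually edited evidence.
    agents: dict name -> function(evidence) -> answer string.
    """
    flagged = []
    for name, agent in agents.items():
        if agent(original) == agent(counterfactual):
            flagged.append(name)
    return flagged

agents = {
    "grounded": lambda img: f"the cat is {img['cat_color']}",
    "hallucinating": lambda img: "the cat is black",  # ignores the evidence
}
orig = {"cat_color": "black"}
edited = {"cat_color": "white"}   # counterfactually recolored
flagged = undercover_test(agents, orig, edited)
```

The key property, as the abstract notes, is that the edit supplies ground truth: insensitivity to the change is direct evidence of hallucination, with no need to trust a majority vote.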
The Few-Shot Fine-Grained Multimodal Named Entity Recognition task (FewFMNER) aims to recognize the named entities and their fine-grained types in the text based on the social multimedia content of images and texts. With limited labeled data, FewFMNER can more accurately understand the semantics of multimodal content and support downstream applications in social networks. However, as the number of fine-grained types increases, two challenges will arise: (1) Entity boundary ambiguity and concept misjudgment. (2) Semantic confusion exists in fine-grained types. Therefore, we propose a novel Multi-agent Concept Check and Dual-path Reasoning (MCCDR) framework: (1) Entity boundary detection: Recognize named entities and filter concepts to obtain candidate named entity boundaries. (2) Entity type classification: Use dual-path reasoning and type definition to alleviate semantic confusion, and utilize a self-consistency strategy to fuse (entity, fine-grained type) pairs from two perspectives. The experiment results on the Twitter FMNERG dataset show that MCCDR is 6% to 10% higher than the baseline and achieves state-of-the-art results on the FewFMNER task.
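MCCDR's self-consistency fusion of (entity, fine-grained type) pairs from two reasoning paths can be approximated by agreement filtering, keeping only the pairs both paths produce. This is an illustrative reduction, not the paper's exact strategy.

```python
def fuse_pairs(path_a, path_b):
    """Toy self-consistency fusion: (entity -> type) predictions from
    two reasoning paths are kept when they agree and dropped on conflict."""
    return {e: t for e, t in path_a.items() if path_b.get(e) == t}

a = {"Jordan": "athlete", "Nike": "company", "Chicago": "city"}
b = {"Jordan": "athlete", "Nike": "shoe", "Chicago": "city"}
fused = fuse_pairs(a, b)
```

Dropping disagreements trades recall for precision, which is the usual motivation for self-consistency in few-shot settings where per-type supervision is scarce.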
Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and a lack of a unified quantitative coding system. Results: Quantitative experiments showed that the mean semantic similarity between Multimodal Large Language Model (MLLM) interpretations and human expert interpretations was approximately 0.75 (standard deviation about 0.05). In structurally oriented expert data sets, this similarity rose to 0.85, indicating expert-level baseline comprehension. Qualitative analyses demonstrated that the multi-agent system, by integrating social-psychological perspectives and destigmatizing narratives, effectively corrected visual hallucinations and produced psychological reports with high ecological validity and internal coherence. Conclusions: The findings confirm the potential of multimodal large models as standardized tools for projective assessment. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services.
Keywords: House-Tree-Person test; multimodal large language model; multi-agent collaboration; cosine similarity; computational psychology; artificial intelligence
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
Contemporary approaches to agent-based modeling (ABM) of social systems have traditionally emphasized rule-based behaviors, limiting their ability to capture nuanced dynamics by moving beyond predefined rules and leveraging contextual understanding from LMs of human social interaction. This paper presents SALM (Social Agent LM Framework), a novel approach for integrating language models (LMs) into social network simulation that achieves unprecedented temporal stability in multi-agent scenarios. Our primary contributions include: (1) a hierarchical prompting architecture enabling stable simulation beyond 4,000 timesteps while reducing token usage by 73%, (2) an attention-based memory system achieving 80% cache hit rates (95% CI [78%, 82%]) with sub-linear memory growth of 9.5%, and (3) formal bounds on personality stability. Through extensive validation against SNAP ego networks, we demonstrate the first LLM-based framework capable of modeling long-term social phenomena while maintaining empirically validated behavioral fidelity.
Traditional agent-based models (ABMs) of opinion dynamics often fail to capture the psychological heterogeneity driving online polarization due to simplistic homogeneity assumptions. This limitation obscures the critical interplay between individual cognitive biases and information propagation, thereby hindering a mechanistic understanding of how ideological divides are amplified. To address this challenge, we introduce the Personality-Refracted Intelligent Simulation Model (PRISM), a hybrid framework coupling stochastic differential equations (SDE) for continuous emotional evolution with a personality-conditional partially observable Markov decision process (PC-POMDP) for discrete decision-making. In contrast to continuous trait approaches, PRISM assigns distinct Myers-Briggs Type Indicator (MBTI) based cognitive policies to multimodal large language model (MLLM) agents, initialized via data-driven priors from large-scale social media datasets. PRISM achieves superior personality consistency aligned with human ground truth, significantly outperforming standard homogeneous and Big Five benchmarks. This framework effectively replicates emergent phenomena such as rational suppression and affective resonance, offering a robust tool for analyzing complex social media ecosystems.
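The continuous emotional evolution PRISM models with stochastic differential equations can be illustrated by an Euler-Maruyama step of a mean-reverting (Ornstein-Uhlenbeck-style) process. The drift form and every parameter value below are assumptions for illustration, not PRISM's specification.

```python
import math
import random

def emotion_step(x, baseline, theta, sigma, dt, rng):
    """One Euler-Maruyama step of the mean-reverting SDE
        dx = theta * (baseline - x) dt + sigma dW,
    a stand-in for continuous emotional dynamics: theta sets how fast
    mood reverts to baseline, sigma the strength of random shocks.
    """
    drift = theta * (baseline - x) * dt
    diffusion = sigma * math.sqrt(dt) * rng.gauss(0, 1)
    return x + drift + diffusion

rng = random.Random(42)
x = 1.0            # start agitated; baseline mood is 0
for _ in range(1000):
    x = emotion_step(x, baseline=0.0, theta=2.0, sigma=0.05, dt=0.01, rng=rng)
```

Personality conditioning of the kind PRISM describes would enter by giving each MBTI type its own `baseline`, `theta`, and `sigma`, so that, for example, a more reactive type carries a larger shock term.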
While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of interactive artificial societies that reflect collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents due to high resource demands. Second, they often assume agents possess perfect information and limitless capabilities, hindering the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator that introduces three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. Additionally, we introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.
In the past decade, social media platforms have been used for information dissemination and consumption. While a major portion of the content is posted to promote citizen journalism and public awareness, some content is posted to mislead users. Among different content types such as text, images, and videos, memes (text overlaid on images) are particularly prevalent and can serve as powerful vehicles for propaganda, hate, and humor. In the current literature, there have been efforts to individually detect such content in memes. However, the study of their intersection is very limited. In this study, we explore the intersection between propaganda and hate in memes using a multi-agent LLM-based approach. We extend the propagandistic meme dataset with coarse and fine-grained hate labels. Our finding suggests that there is an association between propaganda and hate in memes. We provide detailed experimental results that can serve as a baseline for future studies. We will make the experimental resources publicly available to the community (https://github.com/firojalam/propaganda-and-hateful-memes).
Current large language model (LLM) agents lack authentic human psychological processes necessary for genuine digital twins and social AI applications. To address this limitation, we present a computational implementation of Global Neuronal Workspace Theory (GNWT) that integrates human cognitive architecture principles into LLM agents, creating specialized sub-agents for emotion, memory, social norms, planning, and goal-tracking coordinated through a global workspace mechanism. However, authentic digital twins require accurate personality initialization. We therefore develop a novel adventure-based personality test that evaluates true personality through behavioral choices within interactive scenarios, bypassing self-presentation bias found in traditional assessments. Building on these innovations, our CogniPair platform enables digital twins to engage in realistic simulated dating interactions and job interviews before real encounters, providing bidirectional cultural fit assessment for both romantic compatibility and workplace matching. Validation using 551 GNWT-Agents and the Columbia University Speed Dating dataset demonstrates 72% correlation with human attraction patterns, 77.8% match prediction accuracy, and 74% agreement in human validation studies. This work advances psychological authenticity in LLM agents and establishes a foundation for intelligent dating platforms and HR technology solutions.
Detecting hidden stigmatization in social media poses significant challenges due to semantic misalignments between textual and visual modalities, as well as the subtlety of implicit stigmatization. Traditional approaches often fail to capture these complexities in real-world, multimodal content. To address this gap, we introduce ContextAware, an agent-based framework that leverages specialized modules to collaboratively process and analyze images, textual context, and social interactions. Our approach begins by clustering image embeddings to identify recurring content, activating high-likes agents for deeper analysis of images receiving substantial user engagement, while comprehensive agents handle lower-engagement images. By integrating case-based learning, textual sentiment, and vision-language models (VLMs), ContextAware refines its detection of harmful content. We evaluate ContextAware on a self-collected Douyin dataset focused on interracial relationships, comprising 871 short videos and 885,502 comments—of which a notable portion are image-based. Experimental results show that ContextAware not only outperforms state-of-the-art methods in accuracy and F1 score but also effectively detects implicit stigmatization within the highly contextual environment of social media. Our findings underscore the importance of agent-based architectures and multimodal alignment in capturing nuanced, culturally specific forms of harmful content.
Social media has evolved into a complex multimodal environment where text, images, and other signals interact to shape nuanced meanings, often concealing harmful intent. Identifying such intent, whether sarcasm, hate speech, or misinformation, remains challenging due to cross-modal contradictions, rapid cultural shifts, and subtle pragmatic cues. To address these challenges, we propose MV-Debate, a multi-view agent debate framework with dynamic reflection gating for unified multimodal harmful content detection. MV-Debate assembles four complementary debate agents, a surface analyst, a deep reasoner, a modality-contrast analyst, and a social contextualist, to analyze content from diverse interpretive perspectives. Through iterative debate and reflection, the agents refine responses under a reflection-gain criterion, ensuring both accuracy and efficiency. Experiments on three benchmark datasets demonstrate that MV-Debate significantly outperforms strong single-model and existing multi-agent debate baselines. This work highlights the promise of multi-agent debate in advancing reliable social intent detection in safety-critical online contexts.
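The iterate-until-the-reflection-gain-vanishes control flow behind such a debate can be sketched compactly. The agent signature, the gain threshold, and the final majority vote are our illustrative assumptions, not MV-Debate's exact protocol:

```python
def debate(agents, content, max_rounds=3, gain_eps=0.01):
    """Toy multi-view debate loop with a reflection-gain gate: keep refining
    only while aggregate confidence still improves. Each agent is a callable
    (content, history) -> (label, confidence). Illustrative names throughout."""
    history = []
    prev = 0.0
    for _ in range(max_rounds):
        views = [agent(content, history) for agent in agents]
        conf = sum(c for _, c in views) / len(views)  # aggregate confidence
        history.append(views)
        if conf - prev < gain_eps:  # reflection gain too small: gate closes
            break
        prev = conf
    # majority vote over the final round's labels
    labels = [label for label, _ in history[-1]]
    return max(set(labels), key=labels.count), len(history)
```

The gate is what buys efficiency: easy content converges in one round, while contradictory multimodal content keeps the debate alive until the gain flattens.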
Accurately predicting the popularity of multimodal user-generated content (UGC) is fundamental for many real-world applications such as online advertising and recommendation. Existing approaches generally focus on limited contextual information within individual UGCs, yet overlook the potential benefit of exploiting meaningful knowledge in relevant UGCs. In this work, we propose RAGTrans, an aspect-aware retrieval-augmented multi-modal hypergraph transformer that retrieves pertinent knowledge from a multi-modal memory bank and enhances UGC representations via neighborhood knowledge aggregation on multi-modal hypergraphs. In particular, we initially retrieve relevant multimedia instances from a large corpus of UGCs via the aspect information and construct a knowledge-enhanced hypergraph based on the retrieved relevant instances. This allows capturing meaningful contextual information across the data. We then design a novel bootstrapping hypergraph transformer on multimodal hypergraphs to strengthen UGC representations across modalities via a customized propagation algorithm that effectively diffuses information across nodes and edges. Additionally, we propose a user-aware attention-based fusion module to combine the enriched UGC representations for popularity prediction. Extensive experiments on real-world social media datasets demonstrate that RAGTrans outperforms state-of-the-art popularity prediction models across settings.
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents, the traitors, seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.
Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of unsafe activities across multiple categories of social situations; and (3) social dynamics, measured as interaction counts and acceptance ratios of social exchanges. Agents are equipped with layered memory, dynamic planning, multimodal perception, and are instrumented with SocialMetrics, a suite of behavioral and structural metrics that quantifies plan revisions, unsafe-to-safe conversions, and information diffusion across networks. Experiments show that while agents can detect direct multimodal contradictions, they often fail to align local revisions with global safety, reaching only a 55 percent success rate in correcting unsafe plans. Across eight simulation runs with three models (Claude, GPT-4o mini, and Qwen-VL), five agents achieved average unsafe-to-safe conversion rates of 75, 55, and 58 percent, respectively. Overall performance ranged from 20 percent in multi-risk scenarios with GPT-4o mini to 98 percent in localized contexts such as fire/heat with Claude. Notably, 45 percent of unsafe actions were accepted when paired with misleading visuals, showing a strong tendency to overtrust images. These findings expose critical limitations in current architectures and provide a reproducible platform for studying multimodal safety, coherence, and social dynamics.
Social media platforms have become central channels for emotional communication, posing new challenges for fine-grained sentiment analysis due to their high contextual variability, multimodal content, and pervasive ambiguity. Traditional end-to-end sentiment models often struggle to capture compositional or conflicting emotional cues in user-generated texts. This study presents a modular multi-agent architecture for sentiment analysis, implemented with the LLaMA-3.3-70B-Instruct model and guided by system-level design principles. The framework decomposes emotion inference into three coordinated stages, perception, reasoning, and resolution, each managed by a specialized agent trained with parameter-efficient tuning strategies. A meta-agent mediates conflicting predictions through a coordination protocol based on confidence estimation and discourse consistency, enabling adaptive consensus formation. Evaluations on the GoEmotions v2, SemEval-2024, and Twitter benchmarks demonstrate that the proposed system achieves higher accuracy, robustness, and interpretability compared with existing baselines. These findings indicate that architectural decomposition combined with collaborative reasoning enhances reliability and transparency in sentiment analysis, offering a scalable pathway toward intelligent and emotionally aware computational systems.
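The meta-agent's confidence-based mediation between conflicting stage predictions reduces to a small weighted voting rule. This is a sketch under our own assumptions; the paper's coordination protocol additionally checks discourse consistency:

```python
def resolve(predictions):
    """Toy meta-agent coordination: among conflicting (label, confidence)
    predictions from stage agents, pick the label whose supporters carry
    the highest total confidence. Hypothetical simplification."""
    totals = {}
    for label, conf in predictions:
        totals[label] = totals.get(label, 0.0) + conf
    return max(totals, key=totals.get)
```

Summing confidence rather than counting votes lets two moderately confident agents overrule one very confident outlier only when their combined evidence is stronger, which is the adaptive-consensus behavior the abstract describes.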
Serialized television narratives present significant analytical challenges due to their complex, temporally distributed storylines that necessitate sophisticated information management. This paper introduces a multi-agent system (MAS) designed to extract and analyze narrative arcs by implementing principles of computational memory architectures. The system conceptualizes narrative understanding through analogues of human memory: Large Language Models (LLMs) provide a form of semantic memory for general narrative patterns, while a vector database stores specific arc progressions as episodic memories. A multi-agent workflow simulates working memory processes to integrate these information types. Tested on the first season of Grey's Anatomy (ABC 2005-), the MAS identifies three arc types: Anthology (self-contained), Soap (relationship-focused), and Genre-Specific. These arcs and their episodic developments are stored in a vector database, facilitating structured analysis and semantic comparison. To bridge automation with critical interpretation, a graphical interface enables human oversight and refinement of the system's narrative memory. While demonstrating strong performance in identifying Anthology Arcs and character entities, the system's reliance on textual paratexts (episode summaries) revealed limitations in discerning overlapping arcs and opaque dynamics, underscoring the challenges in computational memory consolidation versus human holistic understanding. This memory-centric approach highlights the potential of combining AI-driven memory processing with human expertise. Beyond television, it offers promise for serialized written formats where narrative is entirely text-based. Future work will focus on integrating multimodal inputs to enrich episodic memory, refining memory integration mechanisms within the MAS, and expanding testing across diverse genres.
No abstract available
This study aims to explore the potential of Multi-Agent Large Language Models (MALLM) to enhance Social Network Analysis (SNA) for online education. It compares MALLM with single-agent LLMs in conducting, interpreting and applying SNA, addressing barriers that limit adoption. An exploratory experiment using AutoGen compared MALLM and single-agent LLMs across multistep SNA workflows with a Coursera discussion data set. The process included data exploration, analysis and visualization. Specialized agent teams were assigned to analysis and interpretation. Performance was tested over 20 rounds, evaluated on comprehension, accuracy, execution and educational relevance. Single agents were more efficient in simpler tasks (data exploration 85% vs 25%, visualization 50% vs 45%). MALLM outperformed in complex tasks, with higher SNA precision (30% vs 25%), stronger node-level analysis (95% vs 65%) and greater educational insights (55% vs 35%). However, MALLM faced coordination inefficiencies in linear tasks. Limitations include contextual forgetting, token-size constraints and coordination overhead. Results are specific to GPT-4/GPT-4o, with a 30% success rate in complex tasks, indicating LLMs are not yet sufficient for full automation. MALLMs can advance online education by supporting personalized learning and engagement while democratizing access to advanced analytics and pedagogical feedback, thereby enhancing educational equity. To the best of the authors’ knowledge, this study is among the first to examine MALLMs’ management of multimodal, domain-specific analytics tasks, moving beyond general text-based applications, highlighting their advantages in generating educational insights and informing agent design while providing benchmarks for advancing multi-agent LLM systems.
In the digital age, brand meaning is increasingly shaped through user participation and content sharing on social media platforms. However, significant perceptual gaps often exist between official brand narratives and consumer interpretations. These multimodal and cognitively nuanced gaps are challenging to detect and model using traditional analytical methods. To address this, we propose a multi-agent framework, termed OPIM, that metaphorically models perception as an optical process of propagation, interference, and measurement. We construct a novel dual-perspective dataset from representative social media platforms, integrating text and image content from both user-generated and official brand communications. We evaluate brand perception along six psychological dimensions. Experiments across 15 brands demonstrate that our framework effectively captures key perception gaps, particularly in sincerity, professionalism, and attractiveness. In contrast, materialism and sophistication exhibit higher alignment between brand messaging and consumer perception. Our framework enhances the cognitive alignment and multimodal interpretability of large language models, offering actionable insights for brand strategy and bridging computational modeling with human-centric understanding. The dataset will be available at https://github.com/htgan-ai/OPIM.
The idea that to perceive an object is to perceive its affordances—that is, the interactions of the perceiver with the world that the object supports or affords—is attractive from the point of view of theories in cognitive science that emphasize the fundamental role of actions in representing an agent's knowledge about the world. However, in this general form, the notion has so far lacked a formal expression. This paper offers a representation for objects in terms of their affordances using Linear Dynamic Event Calculus, a formalism for reasoning about causal relations over events. It argues that a representation of this kind, linking objects to the events which they are characteristically involved in, underlies some universal operations of natural language syntactic and semantic composition that are postulated in Combinatory Categorial Grammar (CCG). These observations imply that the language faculty is more directly related to prelinguistic cognitive apparatus used for planning action than formal theories in either domain have previously seemed to allow.
Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a controllable complex news event simulator guided by both the event schema representing domain knowledge about the scenario and user-provided assumptions representing case-specific conditions. As event dynamics depend on the fine-grained social and cultural context, we further introduce a geo-diverse commonsense and cultural norm-aware knowledge enhancement component. To enhance the coherence of the simulation, apart from the global timeline of events, we take an agent-based approach to simulate the individual character states, plans, and actions. By incorporating the schema and cultural norms, our generated simulations achieve much higher coherence and appropriateness and are received favorably by participants from a humanitarian assistance organization.
No abstract available
Pandemics have a huge impact on all aspects of people's lives. As we experienced during the Coronavirus pandemic, healthcare, education and the economy were put under extreme strain. It is therefore important to be able to respond to such events fast in order to limit the damage to society. Decision-makers are typically advised by experts in order to inform their response strategies. One of the tools that is widely used to support evidence-based decisions is modeling and simulation. In this paper, we present a hybrid agent-based and discrete-event simulation for Coronavirus pandemic management at the regional level. Our model considers disease dynamics, population interactions and dynamic ICU bed capacity management, and predicts the impact of various public health preventive measures on the population and the healthcare service.
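The disease-dynamics half of such a hybrid model is typically a stochastic compartment update. Below is a toy SIR step of our own devising, not the paper's regional model (which additionally manages ICU bed capacity as discrete events):

```python
import random

def sir_step(S, I, R, beta, gamma, rng):
    """One stochastic SIR update: each infected agent infects a susceptible
    with probability scaled by beta and the susceptible fraction, and recovers
    with probability gamma. Illustrative stand-in for the hybrid ABM/DES model."""
    n = S + I + R
    new_inf = min(S, sum(1 for _ in range(I) if rng.random() < beta * S / n))
    new_rec = sum(1 for _ in range(I) if rng.random() < gamma)
    return S - new_inf, I + new_inf - new_rec, R + new_rec
```

Iterating this step while a discrete-event layer schedules interventions (lockdown start, ICU expansion) is the essence of the hybrid design the abstract describes.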
No abstract available
The Asset Administration Shell (AAS) plays a key role in Industry 4.0, providing a standardized digital representation of industrial assets to enable interoperability and data exchange. However, information retrieval with complex queries is still in its infancy despite recent advances in the specification of a dedicated query language, which will be hard to implement for different SDKs and storage back-ends. This paper proposes an event-driven architecture that integrates AAS contents into a Neo4j-based knowledge graph using Apache Kafka. This enables powerful, relationship-oriented Cypher queries for complex information retrieval and supports advanced cross-validation of references. The knowledge graph is kept in sync with changes in the AAS. This establishes a solid foundation for advanced natural language user interfaces. As a proof-of-concept, we implemented an AI agent using a Large Language Model (LLM) and a standardized Neo4j tool integration by means of the Model Context Protocol (MCP) for question answering. Thus, the paper contributes to making the AAS more accessible and actionable for AI-driven industrial applications in a broad range of use cases without the need for a dedicated query language. The source code is publicly available on GitHub.
Dynamic Scene Graphs (DSGs) provide a structured representation of hierarchical, interconnected environments, but current approaches struggle to capture stochastic dynamics, partial observability, and multi-agent activity. These aspects are critical for embodied AI, where agents must act under uncertainty and delayed perception. We introduce FOGMACHINE, an open-source framework that fuses DSGs with discrete-event simulation to model object dynamics, agent observations, and interactions at scale. This setup enables the study of uncertainty propagation, planning under limited perception, and emergent multi-agent behavior. Experiments in urban scenarios illustrate realistic temporal and spatial patterns while revealing the challenges of belief estimation under sparse observations. By combining structured representations with efficient simulation, FOGMACHINE establishes an effective tool for benchmarking, model training, and advancing embodied AI in complex, uncertain environments.
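At its core, fusing a scene graph with discrete-event simulation means a time-ordered event queue mutating graph-node attributes. The sketch below uses our own names and event-tuple format, not FOGMACHINE's API:

```python
import heapq

def run_des(events, horizon):
    """Minimal discrete-event loop over a scene-graph-like world state.
    Each event is (time, node, attribute, value); events up to `horizon`
    are applied in time order. Illustrative only."""
    world = {}                 # node -> attribute dict (flat stand-in for a DSG)
    queue = list(events)
    heapq.heapify(queue)       # priority queue ordered by event time
    log = []
    while queue and queue[0][0] <= horizon:
        t, node, attr, value = heapq.heappop(queue)
        world.setdefault(node, {})[attr] = value
        log.append((t, node))
    return world, log
```

Events beyond the horizon stay queued, which is exactly what makes delayed perception and belief estimation studiable: an agent's view of `world` lags the full event stream.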
Accurate spatio-temporal anomaly event inference is important for enhancing society's safety, for example through crime prevention and traffic collision reduction. However, it is hard to achieve good performance because the underlying process is complicated and influenced by many kinds of factors. Previous works mainly focus on employing feature-based regression or fitting models with an assumed spatio-temporal distribution to tackle this challenge, but they normally lack the following considerations: 1) mutual evolutionary influence, i.e., the dynamic evolution of interactions and dependencies among anomaly events, which changes along the timeline and dynamically alters the probabilities or patterns of their occurrences; 2) messy features, i.e., complex attributes in the data mixed with noise, which are difficult to select and aggregate for representation learning under uncertainty and redundancy. To address this research gap, we put forward a co-evolutionary graph convolutional network model to explicate dynamic spatio-temporal anomaly patterns. Specifically, we first use a fuzzy-rough set-based algorithm to select features by discovering the specialty and permanence attributes of different interaction features. Then, we propose a co-evolutionary learning method to embed the dynamic temporal influence into latent features with the selected interaction information. Finally, we design a graph convolutional network with an attention mechanism to formulate the mutual spatial effects among the anomaly events. The proposed model is verified on real-world New York City crime records, and extensive experiments show that our approach achieves 0.10129, 0.09958 and 0.10034 MAE (hour) in the action, location, and action-location time inference tasks, and 0.7973 and 0.4678 accuracy in the action and location type inference tasks, outperforming the state of the art by up to 24.00%, 50.60%, 11.10%, 10.56% and 21.53%, respectively. Hyper-parameter and ablation experiments are also carried out to further demonstrate the sensitivity and effectiveness of our model.
No abstract available
Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies, which leads to underwhelming performance in determining coreference for events whose argument information relies on such dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document lexical chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and a graph attention network (GAT) is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events, and co-referring events can be recognized using a standard clustering algorithm. Additionally, as existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, comprising 53,066 event mentions and 4,476 clusters. Applied to the English and Chinese datasets respectively, our model outperforms all baselines by large margins.
We present a knowledge representation framework on the basis of the Event Calculus that allows an agent to recognize complex activities from low‐level observations received by multiple sensors, reason about the life cycle of such activities, and take action to support their successful completion. Activities are multivalue fluents that change according to events that occur in the environment. The parameters of an activity consist of a unique label, a set of participants involved in the performing of the activity, and a unique goal associated with the activity revealing the activity's desired outcome. Our contribution is the identification of an activity life cycle describing how activities can be started, interrupted, suspended, resumed, or completed over time, as well as how these can be represented. The framework also specifies activity goals, their associated life cycle, and their relation with the activity life cycle. We provide the complete implementation of the framework, which includes an activity generator that automatically creates synthetic sensor data in the form of event streams that represent the everyday lifestyle of a type 1 diabetic patient. Moreover, we test the framework by generating very large activity streams that we use to evaluate the performance of the recognition capability and study its relative merits.
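The started/interrupted/suspended/resumed/completed life cycle described above is naturally a small state machine. The transition table below follows the life cycle named in the abstract, but the class and event names are our own sketch, not the framework's Event Calculus axioms:

```python
class Activity:
    """Minimal activity life cycle, treating the activity state as a
    multivalue fluent updated by events. Illustrative sketch only."""

    TRANSITIONS = {
        ("inactive", "start"): "active",
        ("active", "interrupt"): "suspended",
        ("suspended", "resume"): "active",
        ("active", "complete"): "completed",
    }

    def __init__(self, label, participants, goal):
        # the three activity parameters the framework specifies
        self.label, self.participants, self.goal = label, participants, goal
        self.state = "inactive"

    def on(self, event):
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        self.state = nxt
        return self.state
```

In the full framework these transitions are driven by sensor-event streams and encoded as Event Calculus rules rather than a Python dict, but the reachable-state structure is the same.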
Predictive state representation (PSR) is a compact model of dynamic systems that represents state as a vector of predictions about future observable events. It is an alternative to a partially observable Markov decision process (POMDP) model in dealing with a sequential decision-making problem under uncertainty. Most of the existing PSR research focuses on model learning in a single-agent setting. In this paper, we investigate a multi-agent PSR model upon available agent interaction data. It turns out to be rather difficult to learn a multi-agent PSR model, especially with limited samples and an increasing number of agents. We resort to a tensor technique to better represent dynamic system characteristics and address the challenging task of learning multi-agent PSR problems based on tensor optimization. We first focus on a two-agent scenario and use a third-order tensor (system dynamics tensor) to capture the system interaction data. Then, the PSR model discovery can be formulated as a tensor optimization problem with group lasso, and an alternating direction method of multipliers is employed to solve the embedded subproblems. Hence, the prediction parameters and state vectors can be directly learned from the optimization solutions, and the transition parameters can be derived via a linear regression. Subsequently, we generalize the tensor learning approach to a multi-agent (N > 2) PSR model, and analyze the computational complexity of the learning algorithms. Experimental results show that the tensor optimization approaches provide promising performance in learning a multi-agent PSR model over multiple problem domains.
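The third-order system-dynamics tensor is, at base, normalized counts over (history, action, observation) triples drawn from interaction data. A minimal sketch of its construction (our simplification; the real method then factorizes this tensor with group lasso via ADMM, which we omit):

```python
def system_dynamics_tensor(episodes, n_hist, n_act, n_obs):
    """Build a (history x action x observation) tensor from interaction
    triples, then normalize over observations so each T[h][a] is an
    empirical prediction distribution. Illustrative sketch only."""
    T = [[[0.0] * n_obs for _ in range(n_act)] for _ in range(n_hist)]
    for h, a, o in episodes:
        T[h][a][o] += 1.0
    for h in range(n_hist):
        for a in range(n_act):
            s = sum(T[h][a])
            if s:
                T[h][a] = [c / s for c in T[h][a]]
    return T
```

Low-rank structure in this tensor is what the optimization exploits: the PSR state vector lives in the small subspace that explains the observed prediction distributions.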
No abstract available
Large Language Models (LLMs) have revolutionized the simulation of agent societies, enabling autonomous planning, memory formation, and social interactions. However, existing frameworks often overlook systematic evaluations for event organization and lack visualized integration with physically grounded environments, limiting agents' ability to navigate spaces and interact with items realistically. We develop MiniAgentPro, a visualization platform featuring an intuitive map editor for customizing environments and a simulation player with smooth animations. Based on this tool, we introduce a comprehensive test set comprising eight diverse event scenarios with basic and hard variants to assess agents' ability. Evaluations using GPT-4o demonstrate strong performance in basic settings but highlight coordination challenges in hard variants.
This article proposes a formal framework based on discrete event systems in order to analyze the democratic progress and regression in a society controlled by networked agents. For this purpose, we construct a simple model using a finite state automaton that describes the dynamic behavior of progress and regression in a democracy. We represent a network of agents as a directed graph where each agent has its own objective. Each agent may be a citizen or a group of people sharing a common objective, and it makes decisions on enabling or disabling events upon the observation of states of a system. Agents may have different decisions on the same event, and the final decision follows the majority rule. Upon this framework, we derive the necessary and sufficient conditions for a democratic system controlled by networked agents to be progressive or regressive, where a progressive one implies that it reaches a more equal state at which a larger number of agents meet their objectives. Finally, we obtain some convergence results for special graph topologies.
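The majority-rule event decision at the heart of this framework is easy to state in code. The automaton encoding and function names below are our illustration, not the article's formal model:

```python
def majority_decision(votes):
    """Majority rule over agents' enable/disable votes on one event:
    a strict majority of True votes enables the event."""
    enable = sum(1 for v in votes if v)
    return enable > len(votes) - enable

def step(automaton, state, event, votes):
    """Fire `event` from `state` only if (a) the agent majority enables it
    and (b) the automaton defines a transition for it; otherwise stay put.
    `automaton` maps (state, event) -> next_state. Illustrative sketch."""
    if majority_decision(votes) and (state, event) in automaton:
        return automaton[(state, event)]
    return state
```

Whether repeated application of this rule drives the system toward states where more agents meet their objectives (progress) or fewer do (regression) is exactly what the article's necessary and sufficient conditions characterize.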
We present CreAgentive, an agent workflow driven multi-category creative generation engine that addresses four key limitations of contemporary large language models in writing stories, drama and other categories of creatives: restricted genre diversity, insufficient output length, weak narrative coherence, and inability to enforce complex structural constructs. At its core, CreAgentive employs a Story Prototype, which is a genre-agnostic, knowledge graph-based narrative representation that decouples story logic from stylistic realization by encoding characters, events, and environments as semantic triples. CreAgentive engages a three-stage agent workflow that comprises: an Initialization Stage that constructs a user-specified narrative skeleton; a Generation Stage in which long- and short-term objectives guide multi-agent dialogues to instantiate the Story Prototype; a Writing Stage that leverages this prototype to produce multi-genre text with advanced structures such as retrospection and foreshadowing. This architecture reduces storage redundancy and overcomes the typical bottlenecks of long-form generation. In extensive experiments, CreAgentive generates thousands of chapters with stable quality and low cost (less than $1 per 100 chapters) using a general-purpose backbone model. To evaluate performance, we define a two-dimensional framework with 10 narrative indicators measuring both quality and length. Results show that CreAgentive consistently outperforms strong baselines and achieves robust performance across diverse genres, approaching the quality of human-authored novels.
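Encoding characters, events, and environments as semantic triples suggests a tiny triple store underneath the Story Prototype. The class, method, and predicate names below are hypothetical, not CreAgentive's actual schema:

```python
class StoryPrototype:
    """Genre-agnostic narrative store as (subject, predicate, object) triples,
    decoupling story logic from stylistic realization. Illustrative sketch."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Pattern match with None as a wildcard, e.g. query(p="betrays")."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```

Because the Writing Stage only reads such triples, the same prototype can be realized as a thriller or a romance without duplicating story logic, which is where the storage-redundancy savings come from.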
Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our architecture consists of four modules: Creation & Representation, Spreading, Evaluation, and Compliance. This addresses several important aspects of the emergent processes all in one: (i) where social norms come from, (ii) how they are formally represented, (iii) how they spread through agents' communications and observations, (iv) how they are examined with a sanity check and synthesized in the long term, and (v) how they are incorporated into agents' planning and actions. Our experiments deployed in the Smallville sandbox game environment demonstrate the capability of our architecture to establish social norms and reduce social conflicts within generative MASs. The positive outcomes of our human evaluation, conducted with 30 evaluators, further affirm the effectiveness of our approach. Our project can be accessed via the following link: https://github.com/sxswz213/CRSEC.
No abstract available
With the development of social media, rumors spread quickly and cause great harm to society and the economy. Accordingly, many effective rumor detection methods have been developed, among which methods that learn the rumor propagation structure are particularly effective. However, existing methods still suffer from several issues, including the difficulty of obtaining large-scale labeled rumor datasets, which leads to low generalization ability and performance degradation on new events, since rumors are time-critical and usually accompany hot topics or newly emergent events. To address these problems, we used large-scale unlabeled topic datasets with claim propagation structure, crawled from the social media platforms Weibo and Twitter, to improve the semantic learning ability of a graph representation learning model across various topics. We apply three typical graph self-supervised methods, InfoGraph, JOAO, and GraphMAE, under two commonly used training strategies, to verify the performance of general graph self-supervised methods on rumor detection tasks. In addition, to alleviate the time and topic differences between the unlabeled topic data and the rumor data, we also collected a rumor dataset covering a variety of topics over the decade preceding 2022 from the Weibo rumor-refuting platform. Our experiments show that these general graph self-supervised learning methods outperform previous methods specifically designed for rumor detection tasks and achieve good performance under few-shot conditions, demonstrating better generalization ability with the help of our massive unlabeled topic dataset.
No abstract available
Large Language Models (LLMs) have been used to make decisions in complex scenarios, which require them to think deeply, reason logically, and decide wisely. Many existing studies focus solely on multi-round conversations in social tasks or simulated environments, neglecting the various types of decisions and their interdependence. Current reinforcement learning methods also struggle to account for the strategies of others during training. To address these issues, we first define a strategic decision-making problem that includes two types of decisions and their temporal dependencies. We then propose the **T**heory **o**f **M**ind **P**olicy **O**ptimization **(ToMPO)** algorithm to optimize the perception of other individuals' strategies and game situation trends. Compared to the Group Relative Policy Optimization (GRPO) algorithm, ToMPO enhances the LLM's strategic decision-making mainly by: 1) generating rollouts based on reasoning about the strategies of other individuals, 2) estimating advantages at both the graph level and the sample level, and 3) balancing global and partial rewards. ToMPO outperforms GRPO by 35% in terms of model output compliance and cooperative outcomes. Additionally, compared to models with parameter sizes 100 times larger, it shows an 18% improvement. These results demonstrate the effectiveness of ToMPO in enhancing the model's strategic decision-making capabilities.
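As a rough illustration of the group-relative advantage that GRPO-style methods use, and that ToMPO extends with graph-level estimates: rewards within a rollout group are normalized against the group mean and standard deviation. The mixing coefficient `alpha` below is an assumption for illustration, not a value from the paper:

```python
# Hedged sketch of GRPO-style group-relative advantages; ToMPO additionally
# mixes sample-level and graph-level estimates, paraphrased here as a simple
# convex combination with an assumed coefficient `alpha`.
import statistics

def group_relative_advantage(rewards):
    """Normalize each reward against its rollout group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

def mixed_advantage(sample_adv, graph_adv, alpha=0.5):
    # alpha balances sample-level vs. graph-level signals (illustrative).
    return [alpha * s + (1 - alpha) * g for s, g in zip(sample_adv, graph_adv)]
```

The mean-reward rollout receives zero advantage; rollouts above the group mean are reinforced and those below are penalized.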
This position paper examines the use of Large Language Models (LLMs) in social simulation, analyzing their potential and limitations from a computational social science perspective. We first review recent findings on LLMs' ability to replicate key aspects of human cognition, including Theory of Mind reasoning and social inference, while identifying persistent limitations such as cognitive biases, lack of grounded understanding, and behavioral inconsistencies. We then survey emerging applications of LLMs in multi-agent simulation frameworks, examining system architectures, scalability, and validation strategies. Projects such as Generative Agents (Smallville) and AgentSociety are analyzed with respect to their empirical grounding and methodological design. Particular attention is given to the challenges of behavioral fidelity, calibration, and reproducibility in large-scale LLM-driven simulations. Finally, we distinguish between contexts where LLM-based agents provide operational value, such as interactive simulations and serious games, and contexts where their use raises epistemic concerns, particularly in explanatory or predictive modeling. We argue that hybrid approaches integrating LLMs into established agent-based modeling platforms such as GAMA and NetLogo may offer a promising compromise between expressive flexibility and analytical transparency. Building on this analysis, we outline a conceptual research direction termed Hybrid Constitutional Architectures, which proposes a stratified integration of classical agent-based models (ABMs), small language models (SLMs), and LLMs within established platforms such as GAMA and NetLogo.
Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning. Code and data are available at https://github.com/ljcleo/agent_sense.
Logical reasoning is a fundamental capability of large language models. However, existing studies often overlook the interaction between logical complexity and semantic complexity, leading to systems that struggle with abstract propositions, ambiguous contexts, and conflicting stances that are central to human reasoning. We propose LogicAgent, a semiotic-square-guided framework that jointly addresses these two axes of difficulty. The semiotic square provides a principled structure for multi-perspective semantic analysis, and LogicAgent integrates automated deduction with reflective verification to manage logical complexity across deeper reasoning chains. To support evaluation under these conditions, we introduce RepublicQA, a benchmark that couples semantic complexity with logical depth. RepublicQA reaches college-level semantic difficulty (FKGL 11.94), contains philosophically grounded abstract propositions with systematically constructed contrary and contradictory forms, and offers a semantically rich setting for assessing logical reasoning in large language models. Experiments show that LogicAgent achieves state-of-the-art performance on RepublicQA with a 6.25 percent average improvement over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05 percent average gain. These results demonstrate the effectiveness of semiotic-grounded multi-perspective reasoning in enhancing logical performance.
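The semiotic square that guides LogicAgent can be represented as a small data structure holding the four standard positions (an assertion, its contrary, and their two contradictories). The class name and example proposition below are illustrative, not from the paper:

```python
# Minimal sketch of a semiotic square for a proposition, following the
# standard four positions; names and the example are assumptions.
from dataclasses import dataclass

@dataclass
class SemioticSquare:
    s1: str        # assertion             (e.g., "just")
    s2: str        # contrary of S1        (e.g., "unjust")
    not_s1: str    # contradictory of S1   (e.g., "not just")
    not_s2: str    # contradictory of S2   (e.g., "not unjust")

    def perspectives(self):
        """The four semantic positions an agent can reason from."""
        return [self.s1, self.s2, self.not_s1, self.not_s2]

square = SemioticSquare("just", "unjust", "not just", "not unjust")
assert len(square.perspectives()) == 4
```

A framework in this spirit would evaluate a claim against each of the four positions, which is what distinguishes contrary forms ("unjust") from merely contradictory ones ("not just").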
The emergence of large language models (LLMs) has brought a new paradigm to automated essay scoring (AES), a long-standing and practical application of natural language processing in education. However, achieving human-level multi-perspective understanding and judgment remains a challenge. In this work, we propose Roundtable Essay Scoring (RES), a multi-agent evaluation framework designed to perform precise and human-aligned scoring under a zero-shot setting. RES constructs evaluator agents based on LLMs, each tailored to a specific prompt and topic context. Each agent independently generates a trait-based rubric and conducts a multi-perspective evaluation. Then, by simulating a roundtable-style discussion, RES consolidates individual evaluations through a dialectical reasoning process to produce a final holistic score that more closely aligns with human evaluation. By enabling collaboration and consensus among agents with diverse evaluation perspectives, RES outperforms prior zero-shot AES approaches. Experiments on the ASAP dataset using ChatGPT and Claude show that RES achieves up to a 34.86% improvement in average QWK over straightforward prompting (Vanilla) methods.
LLM-driven multi-agent-based simulations have been gaining traction with applications in game-theoretic and social simulations. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic in-depth development and evaluation of strategic reasoning. Our game environment is governed by an umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, providing a comparison to an established baseline model from economics and data from human experiments. Furthermore, we introduce the foundations of a semantic measure of reasoning as an alternative to k-level theory. Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
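The economics baseline mentioned above, level-k reasoning in a p-beauty contest, can be sketched as follows. The contest parameter p = 2/3 and the level-0 midpoint guess of 50 are conventional assumptions from the beauty-contest literature, not values stated in the abstract:

```python
# Illustrative level-k baseline for a p-beauty contest: level-0 guesses the
# midpoint of [0, 100]; level-k best-responds to a population of
# level-(k-1) reasoners by multiplying its guess by p.
def level_k_guess(k, p=2/3, level0=50.0):
    guess = level0
    for _ in range(k):
        guess *= p
    return guess

assert level_k_guess(0) == 50.0
# Deeper reasoners converge toward the Nash equilibrium of 0.
assert level_k_guess(5) < level_k_guess(1)
```

Comparing an LLM's guess against this ladder is one way to estimate its apparent reasoning depth k, which is the kind of evaluation the abstract's semantic measure aims to refine.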
User sentiment on social media reveals the underlying social trends, crises, and needs. Researchers have analyzed users' past messages to trace the evolution of sentiments and reconstruct sentiment dynamics. However, predicting the imminent sentiment of an ongoing event is rarely studied. In this paper, we address the problem of **sentiment forecasting** on social media: predicting a user's future sentiment in response to the development of an event. We extract sentiment-related features to enhance the modeling and propose a multi-perspective role-playing framework to simulate the process of human response. Our preliminary results show significant improvement in sentiment forecasting at both microscopic and macroscopic levels.
Simulation is a widely used approach for evaluating system performance, robustness, and potential issues during design and testing. Large Language Models (LLMs) have recently shown strong potential in autonomous agent systems, including negotiation tasks, a core aspect of commerce. This paper evaluates LLM-based autonomous negotiator agents (LANAs) in a buyer-seller bargaining game to assess their decision-making and reasoning. We simulate interactions between agents embodying contrasting social behaviors: (a) Cunning vs. Kind, and (b) Greedy vs. Generous. By analyzing both the game outcomes and the agents' internal reasoning, we find that LLMs can effectively simulate distinct social behaviors in both dialogue and decision-making. Our results offer insights into how social traits affect negotiation dynamics, emphasizing the importance of clear policy design to ensure fairness and reliability in LANA-based systems.
Recent advancements in Large Language Models offer promising capabilities to simulate complex human social interactions. We investigate whether LLM-based multi-agent simulations can reproduce core human social dynamics observed in online forums. We evaluate conformity dynamics, group polarization, and fragmentation across different model scales and reasoning capabilities using a structured simulation framework. Our findings indicate that smaller models exhibit higher conformity rates, whereas models optimized for reasoning are more resistant to social influence.
Spatial reasoning is a core aspect of human intelligence that allows perception, inference and planning in 3D environments. However, current vision-language models (VLMs) struggle to maintain geometric coherence and cross-view consistency for spatial reasoning in multi-view settings. We attribute this gap to the lack of fine-grained benchmarks that isolate multi-view reasoning from single-view perception and temporal factors. To address this, we present ReMindView-Bench, a cognitively grounded benchmark for evaluating how VLMs construct, align and maintain spatial mental models across complementary viewpoints. ReMindView-Bench systematically varies viewpoint spatial pattern and query type to probe key factors of spatial cognition. Evaluations of 15 current VLMs reveals consistent failures in cross-view alignment and perspective-taking in multi-view spatial reasoning, motivating deeper analysis on the reasoning process. Explicit phase-wise analysis using LLM-as-a-judge and self-consistency prompting shows that VLMs perform well on in-frame perception but degrade sharply when integrating information across views. Implicit analysis, including linear probing and entropy dynamics, further show progressive loss of task-relevant information and uncertainty separation between correct and incorrect trajectories. These results provide a cognitively grounded diagnosis of VLM spatial reasoning and reveal how multi-view spatial mental models are formed, degraded and destabilized across reasoning phases. The ReMindView-Bench benchmark is available at https://huggingface.co/datasets/Xue0823/ReMindView-Bench, and the source codes of benchmark construction and VLM reasoning analysis are available at https://github.com/pittisl/ReMindView-Bench.
With the development of social media, people are exposed to a vast amount of unverified information, making fact-checking particularly important. Existing fact-checking methods primarily encourage breaking down claims into more easily solvable sub-tasks and deriving final answers through reasoning with external evidence. However, these models face logical issues regarding whether and how the sub-tasks can logically be combined to form the original claims, and encounter causal errors in the reasoning process due to insufficient evidence or hallucinations from LLMs. In addition, they often suffer from a lack of interpretability. In this paper, we propose Logical and Causal fact-checking (LoCal), a novel fact-checking framework based on multiple LLM-based agents. We adopt a multi-agent design because such systems have increasingly demonstrated the ability to perform complex tasks in a human-like manner. LoCal primarily consists of a decomposing agent, multiple reasoning agents, and two evaluating agents. Specifically, the decomposing agent first utilizes the in-context learning ability of LLMs to break down complex claims into simpler sub-tasks, including fact verification tasks and question answering tasks. Afterwards, two types of reasoning agents are respectively utilized to retrieve external knowledge: one addresses the fact verification tasks, which require comparative analysis skills, and the other handles the question answering tasks, which necessitate information extraction from evidence. We then combine the sub-tasks and their corresponding responses to generate a solution for evaluation. To enhance logical and causal consistency, two evaluating agents are respectively employed to examine whether the generated solution is logically equivalent to the original claim and to determine whether the solution still holds when challenged by the counterfactual label.
The evaluating agents provide confidence degrees for the solutions based on the evaluation results and iteratively correct the logical and causal errors in the reasoning process. We evaluate LoCal on two challenging datasets, and the results show that LoCal significantly outperforms all the baseline models across different settings of evidence availability. In addition, LoCal offers better interpretability by providing a structured solution along with detailed evaluating processes. We believe LoCal will provide valuable insights for future misinformation detection.
Graph Retrieval Augmented Generation (GraphRAG) effectively enhances external knowledge integration capabilities by explicitly modeling knowledge relationships, thereby improving the factual accuracy and generation quality of Large Language Models (LLMs) in specialized domains. However, existing methods suffer from two inherent limitations: 1) Inefficient Information Aggregation: They rely on a single agent and fixed iterative patterns, making it difficult to adaptively capture multi-level textual, structural, and degree information within graph data. 2) Rigid Reasoning Mechanism: They employ preset reasoning schemes, which cannot dynamically adjust reasoning depth nor achieve precise semantic correction. To overcome these limitations, we propose Graph Counselor, a GraphRAG method based on multi-agent collaboration. This method uses the Adaptive Graph Information Extraction Module (AGIEM), where Planning, Thought, and Execution Agents work together to precisely model complex graph structures and dynamically adjust information extraction strategies, addressing the challenges of multi-level dependency modeling and adaptive reasoning depth. Additionally, the Self-Reflection with Multiple Perspectives (SR) module improves the accuracy and semantic consistency of reasoning results through self-reflection and backward reasoning mechanisms. Experiments demonstrate that Graph Counselor outperforms existing methods in multiple graph reasoning tasks, exhibiting higher reasoning accuracy and generalization ability. Our code is available at https://github.com/gjq100/Graph-Counselor.git.
Existing work on improving language model reasoning typically explores a single solution path, which can be prone to errors. Inspired by perspective-taking in social studies, this paper introduces DiPT, a novel approach that complements current reasoning methods by explicitly incorporating diversified viewpoints. This approach allows the model to gain a deeper understanding of the problem's context and identify the most effective solution path during the inference stage. Additionally, it provides a general data-centric AI recipe for augmenting existing data to improve their quality for fine-tuning. Our empirical results demonstrate that DiPT can be flexibly integrated into existing methods that focus on a single reasoning approach, enhancing their reasoning performance and stability when presented with paraphrased problems. Furthermore, we illustrate improved context understanding by maintaining the model's safe outputs against "jailbreaking" prompts intentionally designed to bypass safeguards built into deployed models. Lastly, we show that fine-tuning with data enriched with diverse perspectives can boost the reasoning capabilities of the model compared to fine-tuning with raw data alone.
Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional relevance models, especially for long-tail and ambiguous queries. By incorporating Chain-of-Thought (CoT) reasoning, these approaches improve both accuracy and interpretability through multi-step reasoning. However, two key limitations remain: (1) most existing approaches rely on single-perspective CoT reasoning, which fails to capture the multifaceted nature of e-commerce relevance (e.g., user intent vs. attribute-level matching vs. business-specific rules); and (2) although CoT-enhanced LLMs offer rich reasoning capabilities, their high inference latency necessitates knowledge distillation for real-time deployment, yet current distillation methods discard the CoT rationale structure at inference, using it as a transient auxiliary signal and forfeiting its reasoning utility. To address these challenges, we propose a novel framework that better exploits CoT semantics throughout the optimization pipeline. Specifically, the teacher model leverages Multi-Perspective CoT (MPCoT) to generate diverse rationales and combines Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) to construct a more robust reasoner. For distillation, we introduce Latent Reasoning Knowledge Distillation (LRKD), which endows a student model with a lightweight inference-time latent reasoning extractor, allowing efficient and low-latency internalization of the LLM's sophisticated reasoning capabilities. Evaluated in offline experiments and online A/B tests on an e-commerce search advertising platform serving tens of millions of users daily, our method delivers significant offline gains, showing clear benefits in both commercial performance and user experience.
Phishing websites remain a major cybersecurity threat, exploiting deceptive structures, brand impersonation, and social engineering to evade detection. Recent advances in large language models (LLMs) have improved phishing detection through contextual understanding, yet most existing approaches rely on single-agent classification, which is prone to hallucination and often lacks interpretability and robustness. To address these limitations, we propose PhishDebate, a modular multi-agent LLM-based debate framework for phishing website detection. Four specialized agents independently analyze webpage aspects, including URL structure, HTML composition, semantic content, and brand impersonation, under the coordination of a Moderator and final Judge. Through structured debate and divergent reasoning, the framework achieves more accurate and interpretable decisions. By reducing uncertain predictions and providing transparent reasoning, PhishDebate functions as an analyst-augmentation system that lowers cognitive load and supports early, left-of-exploit detection of phishing threats. Evaluations on commercial LLMs show that PhishDebate achieves 98.2% recall on a real-world phishing dataset and outperforms single-agent and Chain-of-Thought (CoT) baselines. Its modular design enables agent-level configurability, allowing adaptation to varying resource and application requirements, and offers scalability to high-velocity, large-scale security data environments.
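The specialist/moderator/judge coordination described above might be organized roughly as in the sketch below. All names here (`run_debate`, the specialist lambdas, the majority-vote judge) are hypothetical illustrations, not PhishDebate's actual API:

```python
# Structural sketch (assumed names, not the paper's implementation): each
# specialist returns (verdict, rationale); the moderator checks agreement,
# and a judge resolves disagreements.
def run_debate(page, specialists, judge):
    opinions = {name: fn(page) for name, fn in specialists.items()}
    verdicts = [v for v, _ in opinions.values()]
    unanimous = len(set(verdicts)) == 1
    # Moderator step: unanimous rounds short-circuit; otherwise escalate.
    return verdicts[0] if unanimous else judge(opinions)

specialists = {
    "url":     lambda p: ("login-" in p["url"], "suspicious token in URL"),
    "content": lambda p: ("verify account" in p["text"], "urgency language"),
}
judge = lambda ops: sum(v for v, _ in ops.values()) * 2 > len(ops)  # majority
page = {"url": "http://login-bank.example", "text": "please verify account"}
assert run_debate(page, specialists, judge) is True
```

A real system would replace the lambdas with LLM calls and run multiple debate rounds; the point here is only the division of labor among specialists, moderator, and judge.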
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
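A UCB-style choice between collaborating and competing, as the abstract describes, could look like the following sketch. The reward bookkeeping and the exploration constant `c` are assumptions for illustration; the paper's actual verifier signals are not modeled:

```python
# Sketch of a UCB1-style arm choice between "collaborate" and "compete".
import math

def ucb_choice(stats, t, c=1.4):
    """stats: {arm: (pulls, total_reward)}; t: current round (>= 1)."""
    def score(arm):
        n, total = stats[arm]
        if n == 0:
            return float("inf")          # explore untried arms first
        return total / n + c * math.sqrt(math.log(t) / n)
    return max(stats, key=score)

# "compete" has a lower mean reward (0.5 vs. 0.8) but far fewer pulls,
# so the exploration bonus makes it the chosen arm this round:
stats = {"collaborate": (5, 4.0), "compete": (2, 1.0)}
choice = ucb_choice(stats, t=8)
```

The uncertainty-driven exploration the abstract mentions falls out of the `sqrt(log t / n)` bonus: rarely tried strategies keep getting revisited until their value estimates stabilize.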
This paper presents Project Riley, a novel multimodal and multi-model conversational AI architecture oriented towards the simulation of reasoning influenced by emotional states. Drawing inspiration from Pixar's Inside Out, the system comprises five distinct emotional agents - Joy, Sadness, Fear, Anger, and Disgust - that engage in structured multi-round dialogues to generate, criticise, and iteratively refine responses. A final reasoning mechanism synthesises the contributions of these agents into a coherent output that either reflects the dominant emotion or integrates multiple perspectives. The architecture incorporates both textual and visual large language models (LLMs), alongside advanced reasoning and self-refinement processes. A functional prototype was deployed locally in an offline environment, optimised for emotional expressiveness and computational efficiency. From this initial prototype, another one emerged, called Armando, which was developed for use in emergency contexts, delivering emotionally calibrated and factually accurate information through the integration of Retrieval-Augmented Generation (RAG) and cumulative context tracking. The Project Riley prototype was evaluated through user testing, in which participants interacted with the chatbot and completed a structured questionnaire assessing three dimensions: Emotional Appropriateness, Clarity and Utility, and Naturalness and Human-likeness. The results indicate strong performance in structured scenarios, particularly with respect to emotional alignment and communicative clarity.
The rapid advancement of LLMs has led to the creation of diverse agentic systems in data analysis, utilizing LLMs' capabilities to improve insight generation and visualization. In this paper, we present an agentic system that automates the data-to-dashboard pipeline through modular LLM agents capable of domain detection, concept extraction, multi-perspective analysis generation, and iterative self-reflection. Unlike existing chart QA systems, our framework simulates the analytical reasoning process of business analysts by retrieving domain-relevant knowledge and adapting to diverse datasets without relying on closed ontologies or question templates. We evaluate our system on three datasets across different domains. Benchmarked against GPT-4o with a single-prompt baseline, our approach shows improved insightfulness, domain relevance, and analytical depth, as measured by tailored evaluation metrics and qualitative human assessment. This work contributes a novel modular pipeline to bridge the path from raw data to visualization, and opens new opportunities for human-in-the-loop validation by domain experts in business analytics. All code can be found here: https://github.com/77luvC/D2D_Data2Dashboard
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
Large Language Models (LLMs) are increasingly instantiated as interacting agents in multi-agent systems (MAS), where collective decisions emerge through social interaction rather than independent reasoning. A fundamental yet underexplored mechanism in this process is conformity, the tendency of agents to align their judgments with prevailing group opinions. This paper presents a systematic study of how network topology shapes conformity dynamics in LLM-based MAS through a misinformation detection task. We introduce a confidence-normalized pooling rule that controls the trade-off between self-reliance and social influence, enabling comparisons between two canonical decision paradigms: Centralized Aggregation and Distributed Consensus. Experimental results demonstrate that network topology critically governs both the efficiency and robustness of collective judgments. Centralized structures enable immediate decisions but are sensitive to hub competence and exhibit same-model alignment biases. In contrast, distributed structures promote more robust consensus, while increased network connectivity speeds up convergence but also heightens the risk of wrong-but-sure cascades, in which agents converge on incorrect decisions with high confidence. These findings characterize the conformity dynamics in LLM-based MAS, clarifying how network topology and self-social weighting jointly shape the efficiency, robustness, and failure modes of collective decision-making.
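A confidence-normalized pooling rule of the kind described might look like the sketch below, where `lam` controls the self-reliance vs. social-influence trade-off. The exact functional form and the default `lam` are assumptions, not the paper's definition:

```python
# Illustrative pooling rule: an agent's updated belief mixes its own
# judgment with confidence-weighted neighbor opinions (beliefs in [0, 1]).
def pool(self_belief, neighbors, lam=0.5):
    """neighbors: list of (belief, confidence) pairs."""
    if not neighbors:
        return self_belief
    total_conf = sum(c for _, c in neighbors)
    social = sum(b * c for b, c in neighbors) / total_conf
    return lam * self_belief + (1 - lam) * social

# One confident dissenter outweighs two unsure supporters:
updated = pool(0.9, [(0.1, 0.9), (0.8, 0.1), (0.8, 0.1)])
```

Setting `lam` near 1 models the reasoning-optimized, influence-resistant agents the study contrasts with highly conforming ones; the wrong-but-sure cascades arise when high-confidence incorrect opinions dominate the social term.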
Recent advances in Large Language Models (LLMs) and multimodal foundation models have significantly broadened their application in robotics and collaborative systems. However, effective multi-agent interaction necessitates robust perspective-taking capabilities, enabling models to interpret both physical and epistemic viewpoints. Current training paradigms often neglect these interactive contexts, resulting in challenges when models must reason about the subjectivity of individual perspectives or navigate environments with multiple observers. This study evaluates whether explicitly incorporating diverse points of view using the ReAct framework, an approach that integrates reasoning and acting, can enhance an LLM's ability to understand and ground the demands of other agents. We extend the classic Director task by introducing active visual exploration across a suite of seven scenarios of increasing perspective-taking complexity. These scenarios are designed to challenge the agent's capacity to resolve referential ambiguity based on visual access and interaction, under varying state representations and prompting strategies, including ReAct-style reasoning. Our results demonstrate that explicit perspective cues, combined with active exploration strategies, significantly improve the model's interpretative accuracy and collaborative effectiveness. These findings highlight the potential of integrating active perception with perspective-taking mechanisms in advancing LLMs'application in robotics and multi-agent systems, setting a foundation for future research into adaptive and context-aware AI systems.
Large language models (LLMs) face persistent challenges when handling long-context tasks, most notably the "lost in the middle" issue, where information located in the middle of a long input tends to be underutilized. Some existing methods that reduce the input risk discarding key information, while others that extend context windows often lead to attention dispersion. To address these limitations, we propose Tree of Agents (TOA), a multi-agent reasoning framework that segments the input into chunks processed by independent agents. Each agent generates its local cognition, then agents dynamically exchange information for collaborative reasoning along tree-structured paths. TOA enables agents to probe different reasoning orders for multi-perspective understanding, effectively mitigating position bias and reducing hallucinations. To improve processing efficiency, we incorporate prefix-hash caching and adaptive pruning strategies, achieving significant performance improvements with comparable API overhead. Experiments show that TOA, powered by compact LLaMA3.1-8B, significantly outperforms multiple baselines and demonstrates comparable performance to the latest and much larger commercial models, such as Gemini1.5-pro, on various long-context tasks. Code is available at https://github.com/Aireduce952/Tree-of-Agents.
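The chunking and prefix-hash caching steps can be sketched as follows; `chunk` and `PrefixCache` are hypothetical helper names chosen for illustration, not the API of the linked repository:

```python
import hashlib

def chunk(text, size):
    """Segment the long input into fixed-size chunks, one per agent."""
    return [text[i:i + size] for i in range(0, len(text), size)]

class PrefixCache:
    """Cache intermediate results keyed by a hash of the chunk-order prefix,
    so reasoning orders that share a prefix reuse work instead of recomputing."""

    def __init__(self):
        self._store = {}

    def _key(self, order_prefix):
        return hashlib.sha256(",".join(map(str, order_prefix)).encode()).hexdigest()

    def get_or_compute(self, order_prefix, compute):
        k = self._key(order_prefix)
        if k not in self._store:          # miss: run the (expensive) per-agent step
            self._store[k] = compute(order_prefix)
        return self._store[k]             # hit: reuse the cached local cognition
```

When agents probe permutations such as (0, 1, 2) and (0, 1, 3), the shared prefix (0, 1) is computed only once, which is how the multi-order exploration can keep API overhead comparable to a single pass.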
Large Language Models (LLMs) are exhibiting emergent human-like abilities and are increasingly envisioned as the foundation for simulating an individual's communication style, behavioral tendencies, and personality traits. However, current evaluations of LLM-based persona simulation remain limited: most rely on synthetic dialogues, lack systematic frameworks, and lack analysis of capability requirements. To address these limitations, we introduce TwinVoice, a comprehensive benchmark for assessing persona simulation across diverse real-world contexts. TwinVoice encompasses three dimensions: Social Persona (public social interactions), Interpersonal Persona (private dialogues), and Narrative Persona (role-based expression). It further decomposes the evaluation of LLM performance into six fundamental capabilities, including opinion consistency, memory recall, logical reasoning, lexical fidelity, persona tone, and syntactic style. Experimental results reveal that while advanced models achieve moderate accuracy in persona simulation, they still fall short in capabilities such as syntactic style and memory recall. Consequently, the average performance achieved by LLMs remains considerably below the human baseline.
Large language models (LLMs) struggle in social science domains, where critical thinking and human-level inference are crucial. In this work, we propose a multi-agent social reasoning framework that leverages the generative and reasoning capabilities of LLMs to generate and evaluate reasons from multiple perspectives grounded in social science theories, and constructs a factor graph for inference. Experimental results on understanding power dynamics in conversations show that our method outperforms standard prompting baselines, demonstrating its potential for tackling hard Computational Social Science (CSS) tasks.
Cyber-Physical-Social Systems (CPSS) have emerged as a transformative paradigm in recent years, embracing computational processes, physical systems, and human social interactions within an integrated architectural framework. Advances in artificial intelligence technologies are targeted at addressing the complexity of CPSS design, especially in modeling human reactions in cyber-physical environments. Notably, LLM-based agents have shown significant potential, and numerous studies have leveraged multi-agent collaboration frameworks to solve reasoning tasks. Some approaches achieve multi-agent collaboration through a debate or communication setting. However, these approaches only use the existing capabilities of LLMs and fail to enhance their problem-solving performance. Other works incorporate the responses of other LLMs into their training trajectories to train individual LLMs in a reinforcement learning setting. We argue that effective collaboration should align not only input information but also optimization objectives. Furthermore, in current cooperative frameworks, some LLMs tend to redundantly repeat others' viewpoints, contributing minimally to solving problems. In this paper, inspired by multi-agent reinforcement learning research, we propose MACT, a Multi-Agent Cooperative Tuning framework that jointly trains multiple LLMs, ensuring that the optimization of each agent aligns directly with the objective of the global task. We equip each agent with a critic network to facilitate individual optimization. Furthermore, to encourage different agents to complement each other and contribute to the overall task, we employ a mixing network that ensures the value of each agent is monotonically consistent with the total value. Experimental results reveal that our method significantly enhances cooperative problem-solving capabilities in the LLM multi-agent framework, providing strong evidence for modeling human reactions within CPSS.
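The monotonicity property of the mixing network can be sketched in one line, QMIX-style: taking the absolute value of the mixing weights makes the global value non-decreasing in every agent's individual value. The abstract describes this constraint only qualitatively, so the function below is an assumed simplification for illustration, not MACT's actual network:

```python
import numpy as np

def monotonic_mix(agent_values, weights, bias):
    """Mix per-agent values into one global value. Using |weights| guarantees
    d(total)/d(agent_value_i) >= 0: improving any agent never hurts the team."""
    return float(np.abs(np.asarray(weights)) @ np.asarray(agent_values) + bias)
```

In QMIX-like architectures the weights themselves come from a hypernetwork conditioned on the global state; here they are plain arguments to keep the sketch minimal.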
In this paper, we present early work exploring two essential socio-cognitive capacities for intelligent social behavior: perspective-taking, the ability to represent and utilize others' viewpoints, and abductive reasoning, the ability to generate plausible explanations. Together, these competencies facilitate the generation of hypotheses about self, other, and the world. We describe extensions to the Event Calculus and Abductive Event Calculus that support these capacities, and provide a set of minimal, illustrative scenarios that demonstrate how these capacities facilitate practical reasoning tasks.
No abstract available
Understanding Theory of Mind is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MoMentS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (MLLMs) through realistic, narrative-rich scenarios presented in short films. MoMentS includes over 2,300 multiple-choice questions spanning seven distinct ToM categories. The benchmark features long video context windows and realistic social interactions that provide deeper insight into characters' mental states. We evaluate several MLLMs and find that although vision generally improves performance, models still struggle to integrate it effectively. For audio, models that process dialogues as audio do not consistently outperform transcript-based inputs. Our findings highlight the need to improve multimodal integration and point to open challenges that must be addressed to advance AI's social understanding.
In this study, we extend our robot trust model into a multimodal setting in which the Nao robot leverages audio-visual data to perform a sequential multimodal pattern recalling task while interacting with a human partner who has different guiding strategies: reliable, unreliable, and random. Here, the humanoid robot is equipped with a multimodal auto-associative memory module to process audio-visual patterns to extract cognitive load (i.e., computational cost) and an internal reward module to perform cost-guided reinforcement learning. After interactive experiments, the robot associates a low cognitive load (i.e., high cumulative reward) yielded during the interaction with high trustworthiness of the guiding strategy of the partner. At the end of the experiment, we provide a free choice to the robot to select a trustworthy instructor. We show that the robot forms trust in a reliable partner. In the second setting of the same experiment, we endow the robot with an additional simple theory of mind module to assess the efficacy of the instructor in helping the robot perform the task. Our results show that the performance of the robot is improved when the robot bases its action decisions on factoring in the instructor assessment.
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
Recent advances in agentic AI have led to systems capable of autonomous task execution and language-based reasoning, yet their spatial reasoning abilities remain limited and underexplored, largely constrained to symbolic and sequential processing. In contrast, human spatial intelligence, rooted in integrated multisensory perception, spatial memory, and cognitive maps, enables flexible, context-aware decision-making in unstructured environments. Therefore, bridging this gap is critical for advancing Agentic Spatial Intelligence toward better interaction with the physical 3D world. To this end, we first start from scrutinizing the spatial neural models as studied in computational neuroscience, and accordingly introduce a novel computational framework grounded in neuroscience principles. This framework maps core biological functions to six essential computation modules: bio-inspired multimodal sensing, multi-sensory integration, egocentric-allocentric conversion, an artificial cognitive map, spatial memory, and spatial reasoning. Together, these modules form a perspective landscape for agentic spatial reasoning capability across both virtual and physical environments. Building on this framework, we conduct a framework-guided analysis of recent methods, evaluating their relevance to each module and identifying critical gaps that hinder the development of more neuroscience-grounded spatial reasoning modules. We further examine emerging benchmarks and datasets and explore potential application domains ranging from virtual to embodied systems, such as robotics. Finally, we outline potential research directions, emphasizing the promising roadmap that can generalize spatial reasoning across dynamic or unstructured environments. We hope this work will benefit the research community with a neuroscience-grounded perspective and a structured pathway. Our project page can be found at Github.
In order to achieve a widespread adoption of social robots in the near future, we need to design intelligent systems that are able to autonomously understand our beliefs and preferences. This will pave the foundation for a new generation of robots able to navigate the complexities of human societies. To reach this goal, we look into Theory of Mind (ToM): the cognitive ability to understand other agents’ mental states. In this paper, we rely on a probabilistic ToM model to detect when a human has false beliefs with the purpose of driving the decision-making process of a collaborative robot. In particular, we recreate an established psychology experiment involving the search for a toy that can be secretly displaced by a malicious individual. The results that we have obtained in simulated experiments show that the agent is able to predict human mental states and detect when false beliefs have arisen. We then explored the set-up in a real-world human interaction to assess the feasibility of such an experiment with a humanoid social robot.
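The displaced-toy experiment reduces to a compact belief-tracking rule: the robot updates its model of the human's believed toy location only on displacement events the human actually observed. The sketch below is a deterministic simplification of the paper's probabilistic ToM model, with hypothetical names:

```python
def track_beliefs(events):
    """events: (new_location, seen_by_human) pairs in temporal order.
    Returns (location the human believes the toy is in, true location)."""
    believed = true = None
    for location, seen_by_human in events:
        true = location                   # the world state always updates
        if seen_by_human:                 # the human's belief updates only when observed
            believed = location
    return believed, true

def has_false_belief(events):
    """A false belief arises when the believed and true locations diverge."""
    believed, true = track_beliefs(events)
    return believed != true
```

For the classic setup, a secret displacement such as `[("box_A", True), ("box_B", False)]` yields a false belief, which the collaborative robot can then use to decide whether to intervene or correct the human.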
With the rapid advancements in emotion-processing algorithms within artificial intelligence, it is essential to explore the evolving relationships between humans and robots. This exploration can prepare society for the future widespread application of social robots and address the new social dynamics involving AI agents. Human–robot empathy emerges as a crucial avenue for exploring the emotional connections between humans and robots. The purpose of this study was to investigate the impact of users' perceptions of robots' minds on human–robot empathy. This study manipulated perceived theory of mind (ToM) in robots through human–robot interaction scenarios, utilizing four experiments to assess the effects of perceived ToM in robots, categorized as cognitive ToM (cToM) in Experiments 1a and 2a and affective ToM (aToM) in Experiments 1b and 2b, on human–robot empathy, including pain empathy and empathic concern. Experiments 1a and 1b examined the influence of perceived ToM in robots on human–robot empathy within classic ToM scenarios, while Experiments 2a and 2b were conducted within real service contexts, further investigating the mediating role of users' mind perceptions of robots. First, perceiving a robot with high aToM significantly enhanced users' pain empathy and empathic concern towards robots, with the experience dimension of mind perception potentially serving as an indirect-only mediator in this relationship. Second, in real home service scenarios in Experiment 2, while the total effect of high cToM on empathic concern was not statistically significant after multiple comparisons correction, mediation analysis revealed a significant negative direct effect alongside a positive indirect effect through agency. This pattern suggests that perceiving high cToM may simultaneously inhibit empathic concern directly while potentially fostering it through enhanced agency perception. The findings demonstrate that perceived aToM in robots consistently enhances human–robot emotional interactions, while revealing a more complex dual-pathway mechanism for cToM effects. These results provide valuable insights into how distinct dimensions of mind perception shape human–robot relationships.
Knowledge editing enables multimodal large language models (MLLMs) to efficiently update outdated or incorrect information. However, existing benchmarks primarily emphasize cognitive-level modifications while lacking a focus on deeper meta-cognitive processes. To bridge this gap, we introduce CogEdit, a novel benchmark designed to evaluate MLLMs' meta-cognitive knowledge editing abilities across three levels: (1) Counterfactual-Driven Editing, assessing self-awareness of knowledge correctness changes; (2) Boundary Constraint Editing, ensuring appropriate generalization without unintended interference; and (3) Noise-Robust Editing, promoting reflective evaluation of uncertain information. To advance meta-cognitive editing, we propose MIND (Meta-cognitive INtegrated Dynamic Knowledge Editing), a framework that constructs a meta-knowledge memory for self-awareness, employs game-theoretic interactions to monitor knowledge activation, and incorporates label refinement for noise-robust updates. Extensive experiments show that MIND significantly outperforms existing cognitive editing approaches, achieving strong performance on both traditional and meta-cognitive knowledge editing benchmarks.
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents -- some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
Most existing Theory of Mind (ToM) benchmarks for foundation models rely on variations of the Sally-Anne test, offering only a very limited perspective on ToM and neglecting the complexity of human social interactions. To address this gap, we propose ToM-SSI: a new benchmark specifically designed to test ToM capabilities in environments rich with social interactions and spatial dynamics. While current ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and includes group interactions of up to four agents that communicate and move in situated environments. This unique design allows us to study, for the first time, mixed cooperative-obstructive settings and reasoning about multiple agents' mental states in parallel, thus capturing a wider range of social cognition than existing benchmarks. Our evaluations reveal that the current models' performance is still severely limited, especially in these new tasks, highlighting critical gaps for future research.
Social networks are important platforms for people in modern society to share information and knowledge, and WeChat Moments is a prominent example. Investigating the factors that influence users' willingness to share knowledge is significant both for the dissemination of knowledge and for characterizing user behavior. Based on the theory of reasoned action and attitude theory, this paper takes 199 WeChat users as a sample and explores the factors influencing users' knowledge-sharing willingness through structural equation modeling. The results show that information source, information characteristics, and information quality are important factors affecting users' willingness to share. (谢鸿婷, 朱娟, 九江学院信息学院; published at the 4th International Conference on Humanities Science and Society Development (ICHSSD 2019), Advances in Social Science, Education and Humanities Research, volume 328, Atlantis Press.)
This report synthesizes recent research on multi-agent systems in social computing, forming a complete picture that spans low-level cognitive architectures to high-level social governance. The core of the field has shifted from purely text-based interaction toward embodied agents equipped with multimodal perception, long-term social memory, and Theory of Mind (ToM) reasoning. Key trends include: 1) the integration of structured memory with persona-consistent architectures; 2) simulation of complex social norms and group dynamics; 3) precise detection and intent analysis of multimodal social events; and 4) a growing emphasis on trust, accountability, and safety governance in multi-agent collaboration. Together, these advances are driving AI agents toward becoming digitally embedded members of society with deep social understanding.