Multi-Agent Multimodal Societies: Social Memory, Social Event Representation, and Multi-Perspective Social Reasoning
Structured Social Memory and Agent Cognitive Architectures
This group of papers explores how to build human-like cognitive systems for agents, focusing on multimodal long-term memory (episodic and semantic), maintaining persona consistency, and using knowledge graphs and the event calculus for persistent information management and lifecycle maintenance.
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory(Lin Long, Yichen He, Wen-song Ye, Yiyuan Pan, Yuan Lin, Hang Li, Junbo Zhao, Wei Li, 2025, ArXiv)
- CogniPair: From LLM Chatbots to Conscious AI Agents - GNWT-Based Multi-Agent Digital Twins for Social Pairing - Dating & Hiring Applications(Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Mengnan Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li, 2025, ArXiv)
- Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting(Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira, 2025, ArXiv)
- TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation(Bangde Du, Minghao Guo, Songming He, Ziyi Ye, Xi Zhu, Weihang Su, Shuqi Zhu, Yujia Zhou, Yongfeng Zhang, Qingyao Ai, Yiqun Liu, 2025, ArXiv)
- MIRIX: Multi-Agent Memory System for LLM-Based Agents(Yu Wang, Xi Chen, 2025, ArXiv)
- CASEE: A Hierarchical Event Representation for the Analysis of Videos(Asaad Hakeem, Yaser Sheikh, M. Shah, 2004, No journal)
- Queryable AAS Graphs for AI Agents: An Event-Driven Knowledge Graph Integration for AAS Environments(Gerhard Sonnenberg, Peter Stein, Fabio Espinosa, Daniel Porta, 2025, 2025 IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA))
- Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs(Zhaoyu Fan, Kaihang Pan, Mingze Zhou, Bosheng Qin, Juncheng Li, Shengyu Zhang, Wenqiao Zhang, Siliang Tang, Fei Wu, Yueting Zhuang, 2025, ArXiv)
- Narrative Memory in Machines: Multi-Agent Arc Extraction in Serialized TV(Roberto Balestri, G. Pescatore, 2025, ArXiv)
- CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine(Yuyang Cheng, Linyue Cai, Chang‐Hung Peng, Yumiao Xu, Rongfang Bie, Yong Zhao, 2025, ArXiv)
- SupportPlay: A Multi-Agent Role-Playing System for Personalized and Sustained Multimodal Emotional Support Conversation(Geng Tu, Bingbing Wang, Erik Cambria, Wenjie Li, Ruifeng Xu, 2025, Companion Proceedings of the ACM on Web Conference 2025)
- Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models(Sureyya Akin, Shruti T. Tiwari, R. Bhattacharya, Sagar A. Raman, Kiran Mohanty, Sita Krishnan, 2025, ArXiv)
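A recurring design across these systems is the split between raw episodic traces and consolidated semantic facts. The sketch below illustrates that split in miniature; the class and method names are hypothetical and not drawn from any one of the papers above.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Episode:
    """A time-stamped event trace from one modality."""
    timestamp: float
    modality: str      # e.g. "text", "image", "audio"
    content: str

@dataclass
class AgentMemory:
    """Toy episodic/semantic split: raw event traces plus distilled facts."""
    episodic: list = field(default_factory=list)
    semantic: dict = field(default_factory=dict)   # fact key -> value

    def record(self, modality, content):
        self.episodic.append(Episode(time.time(), modality, content))

    def consolidate(self, key, value):
        # Stands in for the LLM-based summarization such systems use to
        # promote recurring observations into persistent facts.
        self.semantic[key] = value

    def recall(self, keyword):
        return [e for e in self.episodic if keyword in e.content]

memory = AgentMemory()
memory.record("text", "Alice greeted Bob at the cafe")
memory.record("image", "photo: Alice waving at the cafe entrance")
memory.consolidate("alice_relation", "friendly with Bob")
```

Real systems differ mainly in how `consolidate` is triggered (periodic reflection, retrieval pressure, or explicit lifecycle rules), which is exactly the design space the papers above explore.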
Multi-Perspective Social Reasoning and Theory of Mind (ToM) Modeling
This research focuses on agents' ability to understand others' mental states (intentions, false beliefs), using perspective taking, abductive reasoning, roundtable discussion, and logic-augmented techniques to deepen agents' decision-making and strategic capabilities in complex social interactions.
- Bayesian Theory of Mind for False Belief Understanding in Human-Robot Interaction(Mehdi Hellou, Samuele Vinanzi, Angelo Cangelosi, 2023, 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective(Qiyao Xue, Weichen Liu, Shiqi Wang, Haoming Wang, Yuyang Wu, Wei Gao, 2025, ArXiv)
- PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision(Sabrina Patania, Luca Annese, Anita Pellegrini, Silvia Serino, Anna Lambiase, Luca Pallonetto, Silvia Rossi, Simone Colombani, Tom Foulsham, A. Ruggeri, Dimitri Ognibene, 2025, ArXiv)
- LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring(Jinhee Jang, Ayoung Moon, Minkyoung Jung, Youngbin Kim, Seung Jin Lee, 2025, ArXiv)
- Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance(Baopu Qiu, Hao Chen, Yuan Wu, Changtong Zan, Chao Wei, Weiru Zhang, Xiaoyi Zeng, 2026, ArXiv)
- ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective(Yiwen Zhang, Ziang Chen, Fanqi Kong, Yizhe Huang, Xue Feng, 2025, ArXiv)
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind(Emilio Villa-Cueva, S M Masrur Ahmed, Rendi Chevi, Jan Christian Blaise Cruz, Kareem Elzeky, Fermin Cristobal, Alham Fikri Aji, Skyler Wang, Rada Mihalcea, T. Solorio, 2025, ArXiv)
- Perceiving minds in machines: how perceived theory of mind in robots influences human–robot empathy through the lens of mind perception theory(Ruolin Fan, Yunjia Zheng, Jiayi Li, Guiping Xu, 2025, BMC Psychology)
- Multi-Perspective Explanations for Multi-Agent Systems(Nathan Lloyd, Peter R. Lewis, 2025, 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C))
- Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning(Song Yu, Xiaofei Xu, Keqin Deng, Li Li, Lin Tian, 2025, ArXiv)
- DiPT: Enhancing LLM reasoning through diversified perspective-taking(H. Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia, 2024, ArXiv)
- Identifying Power Relations in Conversations using Multi-Agent Social Reasoning(Zhaoqing Wu, Dan Goldwasser, Maria Leonor Pacheco, Leora Morgenstern, 2025, No journal)
- From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning(Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Wei Yang, Zikai Song, 2025, ArXiv)
- Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames(Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis, 2025, ArXiv)
- ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions(Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling, 2025, No journal)
- SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions(Xianzhe Fan, Xuhui Zhou, Chuanyang Jin, Kolby Nottingham, Hao Zhu, M. Sap, 2025, ArXiv)
- AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios(Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, Zhongyu Wei, 2024, No journal)
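The false-belief scenarios these benchmarks probe (Sally-Anne-style tasks, where an observer must track a belief that has diverged from reality) reduce to a bookkeeping rule: another agent's believed state is updated only for events that agent witnessed. Below is a deterministic toy version with illustrative names; Bayesian ToM models treat the same bookkeeping probabilistically.

```python
def observer_model(world_moves, other_present):
    """Toy Sally-Anne-style false-belief tracker.

    world_moves: list of (obj, location) events in the true world.
    other_present: parallel list of bools, was the other agent watching?
    Returns (true_state, other_belief): real locations vs. what the
    observer infers the other agent believes.
    """
    true_state, other_belief = {}, {}
    for (obj, loc), seen in zip(world_moves, other_present):
        true_state[obj] = loc
        if seen:                      # the other agent witnessed the move,
            other_belief[obj] = loc   # so their belief tracks reality
        else:
            other_belief.setdefault(obj, None)  # belief stays stale
    return true_state, other_belief

# Sally sees the marble placed in the basket, then leaves before it moves.
moves = [("marble", "basket"), ("marble", "box")]
present = [True, False]
truth, belief = observer_model(moves, present)
```

The divergence between `truth` and `belief` is what a false-belief benchmark checks: the agent should predict Sally will search the basket even though the marble is in the box.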
Social Dynamics Simulation and the Evolution of Group Behavior
This line of work addresses the emergence of social norms, opinion polarization, conformity, and democratic processes in large-scale multi-agent systems. It combines personality-driven models with discrete-event simulation to uncover the macro-level evolutionary patterns of complex social systems.
- Mindstorms in Natural Language-Based Societies of Mind(Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Hammoud, Vincent Herrmann, K. Irie, Louis Kirsch, Bing-chuan Li, G. Li, Shuming Liu, Jinjie Mai, Piotr Piękos, A. Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yu‐Han Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, J. Schmidhuber, 2023, Comput. Vis. Media)
- Conformity Dynamics in LLM Multi-Agent Systems: The Roles of Topology and Self-Social Weighting(Chen Han, Jin Tan, Bohan Yu, Wenzhen Zheng, Xijin Tang, 2026, ArXiv)
- LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation(Yijun Liu, Wu Liu, Xiaoyan Gu, Yong Rui, Xiaodong He, Yongdong Zhang, 2024, ArXiv)
- SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation(Gaurav Koley, 2025, ArXiv)
- Simulating Social Behavior of LLM-Based Autonomous Negotiator Agents in a Game-Theoretical Framework Using Multi-Agent Systems(Ahmad Mouri Zadeh Khaki, Ahyoung Choi, Laleh Seyyed-Kalantari, 2025, International Journal of Human–Computer Interaction)
- PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation(Zhixiang Lu, Xueyuan Deng, Yiran Liu, Yulong Li, Qiang Yan, Imran Razzak, Jionglong Su, 2025, ArXiv)
- Towards Simulating Social Influence Dynamics with LLM-Based Multi-Agents(Hsien-Tsung Lin, Pei-Cing Huang, Chan-Tung Ku, Chan Hsu, Pei-Xuan Shieh, Yihuang Kang, 2025, 2025 IEEE International Conference on Information Reuse and Integration and Data Science (IRI))
- Fostering collective intelligence in CPSS: an LLM-driven multi-agent cooperative tuning framework(Rongjun Chen, Chengbo He, 2025, Frontiers in Physics)
- Discrete Event Dynamic Modeling and Analysis of the Democratic Progress in a Society Controlled by Networked Agents(Seong-Jin Park, Kwang-Hyun Cho, 2021, IEEE Transactions on Automatic Control)
- Emergence of Social Norms in Large Language Model-based Agent Societies(Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu, 2024, ArXiv)
- Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges(P. Taillandier, Jean-Daniel Zucker, A. Grignard, B. Gaudou, N. Huynh, A. Drogoul, 2025, ArXiv)
- A Visualized Framework for Event Cooperation with Generative Agents(Yuyang Tian, Shunqiang Mao, Wenchang Gao, Lanlan Qiu, Tianxing He, 2025, ArXiv)
- FACS-CHARM: A Hybrid Agent-Based and Discrete-Event Simulation Approach for Covid-19 Management at Regional Level(Anastasia Anagnostou, D. Groen, Simon J. E. Taylor, D. Suleimenova, N. Abubakar, Arindam Saha, K. Mintram, Maziar Ghorbani, Habiba Daroge, Tasin Islam, Yani Xue, Edward Okine, N. Anokye, 2022, 2022 Winter Simulation Conference (WSC))
- Emergence of Social Norms in Generative Agent Societies: Principles and Architecture(Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu, 2024, No journal)
- MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs(Xianhao Yu, Jiaqi Fu, Renjia Deng, Wenjuan Han, 2024, ArXiv)
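Many of the conformity and polarization studies above build on opinion-averaging updates with an explicit self-weight, in the spirit of the self-social weighting examined in the conformity paper listed here. A minimal DeGroot-style sketch follows; the topology and parameter values are arbitrary.

```python
def degroot_step(opinions, neighbors, self_weight=0.5):
    """One DeGroot-style update: each agent mixes its own opinion with
    the mean of its neighbors', weighted self_weight : (1 - self_weight).
    opinions: list of floats; neighbors: dict agent index -> neighbor indices.
    """
    new = []
    for i, own in enumerate(opinions):
        nbrs = neighbors[i]
        social = sum(opinions[j] for j in nbrs) / len(nbrs) if nbrs else own
        new.append(self_weight * own + (1 - self_weight) * social)
    return new

# Ring of four agents with polarized initial opinions.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ops = [1.0, 1.0, -1.0, -1.0]
for _ in range(50):
    ops = degroot_step(ops, neighbors)
```

On a symmetric topology like this ring, repeated averaging drives the initially polarized opinions to the common mean; it is asymmetric weights, personality heterogeneity, and rewired topologies that produce the persistent polarization the papers above study.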
Multimodal Social Event Representation, Detection, and Intent Recognition
These studies use multimodal data (text, images, social networks) to build graph-based representations of social events, covering rumor detection, harmful-intent identification, sentiment forecasting, and brand-perception analysis, with an emphasis on cross-modal alignment and fine-grained semantic understanding.
- ContextAware: A Multi-Agent Framework for Detecting Harmful Image-Based Comments on Social Media(Zheng Wei, Mingchen Li, Pu Zhang, Xinyu Liu, Huamin Qu, Pan Hui, 2024, No journal)
- SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media(Xilai Xu, Zilin Zhao, Chengye Song, Zining Wang, J. Qiang, Jiongrui Yan, Yuhuai Lin, 2025, ArXiv)
- Retrieval-Augmented Hypergraph for Multimodal Social Media Popularity Prediction(Zhangtao Cheng, Jienan Zhang, Xovee Xu, Goce Trajcevski, Ting Zhong, Fan Zhou, 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
- Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information(Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji, 2024, No journal)
- EvolGCN: A Co-Evolutionary Graph Convolutional Network Model for Dynamically Spatio-Temporal Anomaly Event Inference(Xiaoming Liu, Hang Pu, Zhuo Chen, Zhanwei Zhang, Y. Lan, Chao Shen, 2025, IEEE Transactions on Dependable and Secure Computing)
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents(Fanhang Man, Huandong Wang, Jianjie Fang, Zhaoyi Deng, Baining Zhao, Xinlei Chen, Yong Li, 2025, No journal)
- Agent‐oriented activity recognition in the event calculus: An application for diabetic patients(Özgür Kafali, A. Romero, Kostas Stathis, 2017, Computational Intelligence)
- Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs(Firoj Alam, Md. Rafiul Biswas, Uzair Shah, W. Zaghouani, Georgios Mikros, 2024, ArXiv)
- MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media(Rui Lu, Jinhe Bi, Yunpu Ma, Feng Xiao, Yuntao Du, Yijun Tian, 2025, ArXiv)
- Aligned or Apart? Multi-Agent Insights into Consumer and Brand Messaging Discrepancies(Haotian Gan, Yudong Li, Wanyue Li, Weidong Tang, 2025, Proceedings of the 33rd ACM International Conference on Multimedia)
- Emotion meets coordination: Designing multi-agent LLMs for fine-grained user sentiment detection on social media(Hao Dong, Zuowen Bao, Muze Li, Zhengfeng Yang, 2026, PLOS One)
- A Multi-Agent Framework for Fine-Grained Multimodal Named Entity Recognition through Check and Reasoning(Heng-yang Lu, Xintong Liu, Xingda Shang, Wei Fang, Xiao-jun Wu, 2025, Proceedings of the 7th ACM International Conference on Multimedia in Asia)
- Graph Representation Learning with Massive Unlabeled Data for Rumor Detection(Chaoqun Cui, Caiyan Jia, 2025, ArXiv)
- Multiple Agent Event Detection and Representation in Videos(Asaad Hakeem, M. Shah, 2005, No journal)
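Several of the harmful-content detectors above aggregate verdicts from modality-specific views, as in multi-view agent debate. A toy reduction of that idea is sketched below; the view names and the safety-first tie-breaking rule are illustrative, not any single paper's protocol.

```python
from collections import Counter

def multi_view_verdict(view_judgments):
    """Aggregate per-view harmfulness verdicts by majority vote.
    view_judgments: dict view name -> "harmful" or "benign".
    Ties default to "harmful" as a conservative safety choice.
    """
    counts = Counter(view_judgments.values())
    harmful = counts.get("harmful", 0)
    benign = counts.get("benign", 0)
    return "harmful" if harmful >= benign else "benign"

verdict = multi_view_verdict({
    "text_view": "benign",          # sarcasm hides the attack in text alone
    "image_view": "harmful",
    "cross_modal_view": "harmful",  # irony only visible when modalities align
})
```

The debate frameworks in this section replace this one-shot vote with iterative argument exchange and reflection gating, but the cross-view disagreement shown here is the signal they exploit.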
Domain-Specific Collaborative Decision-Making and Specialized Applications
This group demonstrates multi-agent frameworks in vertical domains such as business data analytics, cybersecurity detection, fact-checking, educational social-interaction analysis, and culture-aware simulation, using debate and collaboration mechanisms to improve task accuracy.
- Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play(Sha Li, R. Reddy, Khanh Duy Nguyen, Qingyun Wang, May Fung, Chi Han, Jiawei Han, Kartik Natarajan, Clare Voss, Heng Ji, 2024, ArXiv)
- CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences(Yiyang Wang, Chen Chen, Tica Lin, Vishnu Raj, J. Kimball, Alex Cabral, Josiah D. Hester, 2025, ArXiv)
- Supporting social interactions analyses with multi-agent large language models (MALLM) – an exploratory study(Jui-Long Hung, Yeye Tang, Xu Du, Hao Li, Minghao Deng, 2025, Information Discovery and Delivery)
- Research on the Influencing Factors of the Willingness to Share Knowledge in WeChat Moments(Honghua Xie, Juan Zhu, 2019, Proceedings of the 2019 4th International Conference on Humanities Science and Society Development (ICHSSD 2019))
- Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics(Ran Zhang, Mohannad Elhamod, 2025, ArXiv)
- LoCal: Logical and Causal Fact-Checking with LLM-Based Multi-Agents(Jiatong Ma, Linmei Hu, Rang Li, Wenbo Fu, 2025, Proceedings of the ACM on Web Conference 2025)
- PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection(Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah, 2025, 2025 IEEE International Conference on Big Data (BigData))
- Mining security events in a distributed agent society(D. Dasgupta, José M. Rodríguez, S. Balachandran, 2006, No journal)
- Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning(Rui Huang, Wendy Liu, Anastasia Miin, Lei Ding, 2025, ArXiv)
Safety, Trust Governance, and Formal Theory for Social Intelligence
This group examines trust establishment, deception detection, responsibility attribution, and safety compliance in multi-agent interaction, together with foundational theory such as the formal representation of affordances.
- Formalizing Affordance(Mark Steedman, 2019, Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society)
- FOGMACHINE - Leveraging Discrete-Event Simulation and Scene Graphs for Modeling Hierarchical, Interconnected Environments under Partial Observations from Mobile Agents(Lars Ohnemus, Nils Hantke, Max Weißer, Kai Furmans, 2025, ArXiv)
- Tensor optimization with group lasso for multi-agent predictive state representation(Biyang Ma, Jing Tang, Bilian Chen, Yinghui Pan, Yi-feng Zeng, 2021, Knowl. Based Syst.)
- Multimodal Safety Evaluation in Generative Agent Social Simulations(Alhim Vera, Karen Sanchez, Carlos Hinojosa, Haidar Bin Hamid, Donghoon Kim, Bernard Ghanem, 2025, ArXiv)
- Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic(Angus Addlesee, Neeraj Cherakara, Nivan Nelson, Daniel Hernández García, Nancie Gunson, Weronika Maria Sieińska, C. Dondrup, Oliver Lemon, 2024, No journal)
- From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration(Shuide Wen, Yu Sun, Beier Ku, Zhiqi Gao, Lijun Ma, Yang Yang, Can Jiao, 2025, ArXiv)
- Mind Meets Space: Rethinking Agentic Spatial Intelligence from a Neuroscience-inspired Perspective(Bui Duc Manh, Soumyaratna Debnath, Zetong Zhang, Shriram Damodaran, Arvind Kumar, Yueyi Zhang, Lu Mi, Erik Cambria, Lin Wang, 2025, ArXiv)
- The Traitors: Deception and Trust in Multi-Agent Language Model Simulations(Pedro M. P. Curvo, 2025, ArXiv)
- Trustworthiness assessment in multimodal human-robot interaction based on cognitive load(M. Kirtay, Erhan Öztop, A. Kuhlen, M. Asada, V. Hafner, 2022, 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning(Dayong Liang, Xiao-Yong Wei, Changmeng Zheng, 2025, ArXiv)
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning(Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu, 2025, ArXiv)
- Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment(Jinwei Hu, Yi Dong, Shuang Ao, Zhuoyun Li, Boxuan Wang, Lokesh Singh, Guangliang Cheng, Sarvapali D. Ramchurn, Xiaowei Huang, 2025, ArXiv)
- The open agent society: retrospective and prospective views(J. Pitt, A. Artikis, 2015, Artificial Intelligence and Law)
- Explicit Cooperation Shapes Human-Like Multi-Agent LLM Negotiation(Yanru Jiang, Gülşah Akçakır, 2025, No journal)
- Modelling security risk in critical utilities: The system at risk as a three player game and agent society(J. Busby, Antonios Gouglidis, S. Rass, Sandra König, 2016, 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
This report synthesizes recent research on multi-agent systems in social computing, forming a complete picture from low-level cognitive architectures to high-level social governance. The core of the field has shifted from purely text-based interaction to embodied agents with multimodal perception, long-term social memory, and Theory of Mind (ToM) reasoning. Key trends include: (1) the integration of structured memory with persona-based architectures; (2) simulation of complex social norms and group dynamics; (3) precise detection and intent analysis of multimodal social events; and (4) a growing emphasis on trust, responsibility, and safety governance in multi-agent collaboration. Together, these advances are pushing AI agents toward becoming digital members of society with deep social understanding.
Total: 82 related publications
This research introduces the Multimodal Socialized Learning Framework (M-S2L), designed to foster emergent social intelligence in AI agents by integrating Multimodal Large Language Models (M-LLMs) with social learning mechanisms. The framework equips agents with multimodal perception (vision and text) and structured action capabilities, enabling physical manipulation and grounded multimodal communication (e.g., text with visual pointers). M-S2L combines direct reinforcement learning with two novel social learning pathways: multimodal observational learning and communication-driven learning from feedback, augmented by an episodic memory system for long-term social context. We evaluate M-S2L in a Collaborative Assembly Environment (CAE), where agent teams must construct complex devices from ambiguous blueprints under informational asymmetry. Across tasks of increasing complexity, M-S2L agents consistently outperform Text-Only and No-Social-Learning baselines in Task Completion Rate and Time to Completion, particularly in dynamic problem-solving scenarios. Ablation studies confirm the necessity of both multimodality and socialized learning. Our analysis reveals the emergence of efficient communication protocols integrating visual pointers with concise text, alongside rapid role specialization leading to stable labor division. Qualitative case studies demonstrate agents' abilities for shared awareness, dynamic re-planning, and adaptive problem-solving, suggesting a nascent form of machine social cognition. These findings indicate that integrating multimodal perception with explicit social learning is critical for developing human-like collaborative intelligence in multi-agent systems.
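The grounded multimodal messages described here (text with visual pointers) amount to pairing an utterance with coordinates into a shared visual frame. A minimal container type is sketched below; the field names are assumptions for illustration, not M-S2L's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class GroundedMessage:
    """An utterance optionally grounded by a pointer into a shared
    visual frame, the kind of message the M-S2L abstract describes."""
    sender: str
    text: str
    pointer: Optional[Tuple[int, int]] = None   # (x, y) pixel coordinates

msg = GroundedMessage("agent_1", "attach this part here", pointer=(128, 64))
plain = GroundedMessage("agent_2", "done")
```

Keeping the pointer optional lets the same channel carry both grounded instructions and plain acknowledgements, which matches the mixed protocols the abstract reports emerging.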
The believable simulation of multi-user behavior is crucial for understanding complex social systems. Recently, large language models (LLMs)-based AI agents have made significant progress, enabling them to achieve human-like intelligence across various tasks. However, real human societies are often dynamic and complex, involving numerous individuals engaging in multimodal interactions. In this paper, taking e-commerce scenarios as an example, we present LMAgent, a very large-scale and multimodal agents society based on multimodal LLMs. In LMAgent, besides freely chatting with friends, the agents can autonomously browse, purchase, and review products, even perform live streaming e-commerce. To simulate this complex system, we introduce a self-consistency prompting mechanism to augment agents' multimodal capabilities, resulting in significantly improved decision-making performance over the existing multi-agent system. Moreover, we propose a fast memory mechanism combined with the small-world model to enhance system efficiency, which supports more than 10,000 agent simulations in a society. Experiments on agents' behavior show that these agents achieve comparable performance to humans in behavioral indicators. Furthermore, compared with the existing LLMs-based multi-agent system, more different and valuable phenomena are exhibited, such as herd behavior, which demonstrates the potential of LMAgent in credible large-scale social behavior simulations.
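The small-world social graph that LMAgent pairs with its fast-memory mechanism can be instantiated with the standard Watts-Strogatz construction sketched below. This illustrates the small-world model in general, not LMAgent's code, and the parameter values are arbitrary.

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Watts-Strogatz small-world graph: a ring lattice with k nearest
    neighbors per node, each lattice edge rewired with probability p.
    Returns adjacency as dict node -> set of neighbor nodes.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                      # build the ring lattice
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):                      # rewire each lattice edge
        for d in range(1, k // 2 + 1):
            if rng.random() < p:
                j = (i + d) % n
                new_j = rng.randrange(n)
                if new_j != i and new_j not in adj[i]:
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new_j)
                    adj[new_j].add(i)
    return adj

society = watts_strogatz(n=100, k=4, p=0.1)
```

The appeal for large agent societies is that a few rewired long-range links give short average path lengths while keeping neighborhoods mostly local, so message and influence propagation can be simulated without a dense all-to-all graph.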
Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling language models to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.
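The six memory types MIRIX names can be pictured as typed stores behind a simple router. The interface below is an illustrative guess for exposition, not MIRIX's actual design.

```python
# The six memory types named in the MIRIX abstract; the routing and
# retrieval interface here is an assumption for illustration.
MEMORY_TYPES = ("core", "episodic", "semantic", "procedural",
                "resource", "knowledge_vault")

class TypedMemory:
    def __init__(self):
        self.stores = {t: [] for t in MEMORY_TYPES}

    def route(self, item, kind):
        """File an item under one of the six typed stores."""
        if kind not in self.stores:
            raise ValueError(f"unknown memory type: {kind}")
        self.stores[kind].append(item)

    def retrieve(self, kind, predicate=lambda _: True):
        """Return items of one type matching an optional filter."""
        return [x for x in self.stores[kind] if predicate(x)]

tm = TypedMemory()
tm.route("user prefers dark mode", "core")
tm.route("2025-01-03: opened settings screenshot", "episodic")
tm.route("how to export a report", "procedural")
```

In MIRIX itself the routing and retrieval are coordinated dynamically by a multi-agent framework rather than by a fixed dispatcher, but the typed separation is the load-bearing idea.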
With the increasing prevalence of multimodal content on social media, sentiment analysis faces significant challenges in effectively processing heterogeneous data and recognizing multi-label emotions. Existing methods often lack effective cross-modal fusion and external knowledge integration. We propose SentiMM, a novel multi-agent framework designed to systematically address these challenges. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories. Extensive experiments demonstrate that SentiMM achieves superior performance compared to state-of-the-art baselines, validating the effectiveness of our structured approach.
We introduce M3-Agent, a novel multimodal agent framework equipped with long-term memory. Like humans, M3-Agent can process real-time visual and auditory inputs to build and update episodic and semantic memories, gradually accumulating world knowledge. Its memory is organized in an entity-centric, multimodal manner, enabling deeper and more consistent understanding of the environment. Given an instruction, M3-Agent autonomously performs multi-turn reasoning and retrieves relevant memories to complete tasks. To evaluate memory effectiveness and memory-based reasoning in multimodal agents, we develop M3-Bench, a long-video question answering benchmark comprising 100 newly recorded robot-perspective videos (M3-Bench-robot) and 920 diverse web-sourced videos (M3-Bench-web). We annotate QA pairs designed to test capabilities essential for agent applications, such as person understanding, general knowledge extraction, and cross-modal reasoning. Experimental results show that M3-Agent, trained via reinforcement learning, outperforms the strongest baseline, a prompting agent using Gemini-1.5-pro and GPT-4o, achieving 6.7%, 7.7%, and 5.3% higher accuracy on M3-Bench-robot, M3-Bench-web and VideoMME-long, respectively. Our work advances multimodal agents toward more human-like long-term memory and provides insights for their practical design. Model, code and data are available at https://github.com/bytedance-seed/m3-agent.
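The entity-centric organization described here, in which observations from every modality are grouped under the entity they concern, can be sketched minimally as follows (all names are illustrative, not M3-Agent's).

```python
from collections import defaultdict

class EntityMemory:
    """Toy entity-centric store: observations from any modality are
    grouped under the entity they mention, so later queries about a
    person or object see one consolidated record.
    """
    def __init__(self):
        self.entities = defaultdict(list)

    def observe(self, entity, modality, detail):
        self.entities[entity].append((modality, detail))

    def profile(self, entity):
        return self.entities.get(entity, [])

em = EntityMemory()
em.observe("Alice", "vision", "wearing a red coat")
em.observe("Alice", "audio", "said she works at the lab")
em.observe("whiteboard", "vision", "equation sketch")
```

Keying memory by entity rather than by timestamp is what lets cross-modal facts about the same person accumulate into a consistent picture, the property the M3-Bench person-understanding questions test.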
In this demo, we present SupportPlay, a multi-agent role-playing system for emotional support conversation (ESC) that addresses the limitations of existing methods in providing personalized, sustained, and multimodal support. SupportPlay generates potential seeker profiles from existing ESC datasets and employs GPT-powered agents to play various seeker roles, enabling the supporter to learn personalized memories for each seeker. Subsequently, the supporter can induce general memory from these memories for real user interactions while learning the user's personalized memory in a similar manner. Through continuous memory management-including retrieval, storage, reflection, and forgetting-SupportPlay delivers tailored emotional support across interactions. By integrating text, speech, and video, SupportPlay creates immersive emotional support experiences.
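Of the four memory operations SupportPlay cycles through (retrieval, storage, reflection, forgetting), forgetting is the easiest to make concrete. A time-to-live policy with refresh-on-access is one simple stand-in; the policy and all names below are assumptions, not the paper's mechanism.

```python
class DecayingMemory:
    """Toy retrieval-with-forgetting: each entry carries a last-access
    time; entries idle longer than ttl are dropped on maintenance."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.items = {}   # key -> (value, last_access_time)

    def store(self, key, value, now):
        self.items[key] = (value, now)

    def retrieve(self, key, now):
        if key in self.items:
            value, _ = self.items[key]
            self.items[key] = (value, now)   # refresh on access
            return value
        return None

    def forget(self, now):
        self.items = {k: (v, t) for k, (v, t) in self.items.items()
                      if now - t <= self.ttl}

support_mem = DecayingMemory(ttl=10)
support_mem.store("seeker_likes_music", True, now=0)
support_mem.store("one_off_detail", "rainy day", now=0)
support_mem.retrieve("seeker_likes_music", now=8)   # access refreshes it
support_mem.forget(now=12)                          # stale entry dropped
```

Refresh-on-access means facts the supporter keeps using survive while incidental details decay, a crude analogue of the reflection-driven memory management the demo describes.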
Hallucination continues to pose a major obstacle in the reasoning capabilities of large language models (LLMs). Although the Multi-Agent Debate (MAD) paradigm offers a promising solution by promoting consensus among multiple agents to enhance reliability, it relies on the unrealistic assumption that all debaters are rational and reflective, which is a condition that may not hold when agents themselves are prone to hallucinations. To address this gap, we introduce the Multi-agent Undercover Gaming (MUG) protocol, inspired by social deduction games like "Who is Undercover?". MUG reframes MAD as a process of detecting "undercover" agents (those suffering from hallucinations) by employing multimodal counterfactual tests. Specifically, we modify reference images to introduce counterfactual evidence and observe whether agents can accurately identify these changes, providing ground-truth for identifying hallucinating agents and enabling robust, crowd-powered multimodal reasoning. MUG advances MAD protocols along three key dimensions: (1) enabling factual verification beyond statistical consensus through counterfactual testing; (2) introducing cross-evidence reasoning via dynamically modified evidence sources instead of relying on static inputs; and (3) fostering active reasoning, where agents engage in probing discussions rather than passively answering questions. Collectively, these innovations offer a more reliable and effective framework for multimodal reasoning in LLMs. The source code can be accessed at https://github.com/YongLD/MUG.git.
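MUG's counterfactual test reduces to a consistency check: an agent grounded in the evidence must change its answer once the evidence is edited, while a hallucinating agent answers the same either way. A toy sketch in which agents are plain functions (MUG's actual protocol is debate-based and operates on modified images):

```python
def undercover_test(agents, original, counterfactual):
    """Flag agents whose answers ignore the evidence: a grounded agent
    should answer differently on counterfactually edited evidence.
    agents: dict name -> function(evidence) -> answer string.
    """
    flagged = []
    for name, agent in agents.items():
        if agent(original) == agent(counterfactual):
            flagged.append(name)
    return flagged

agents = {
    "grounded": lambda img: f"the cat is {img['cat_color']}",
    "hallucinating": lambda img: "the cat is black",  # ignores the evidence
}
orig = {"cat_color": "black"}
edited = {"cat_color": "white"}   # counterfactually recolored
flagged = undercover_test(agents, orig, edited)
```

The key property, as the abstract notes, is that the edit supplies ground truth: insensitivity to the change is direct evidence of hallucination, with no need to trust a majority vote.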
The Few-Shot Fine-Grained Multimodal Named Entity Recognition task (FewFMNER) aims to recognize the named entities and their fine-grained types in the text based on the social multimedia content of images and texts. With limited labeled data, FewFMNER can more accurately understand the semantics of multimodal content and support downstream applications in social networks. However, as the number of fine-grained types increases, two challenges will arise: (1) Entity boundary ambiguity and concept misjudgment. (2) Semantic confusion exists in fine-grained types. Therefore, we propose a novel Multi-agent Concept Check and Dual-path Reasoning (MCCDR) framework: (1) Entity boundary detection: Recognize named entities and filter concepts to obtain candidate named entity boundaries. (2) Entity type classification: Use dual-path reasoning and type definition to alleviate semantic confusion, and utilize a self-consistency strategy to fuse (entity, fine-grained type) pairs from two perspectives. The experiment results on the Twitter FMNERG dataset show that MCCDR is 6% to 10% higher than the baseline and achieves state-of-the-art results on the FewFMNER task.
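MCCDR's self-consistency fusion of (entity, fine-grained type) pairs from two reasoning paths can be approximated by agreement filtering, keeping only the pairs both paths produce. This is an illustrative reduction, not the paper's exact strategy.

```python
def fuse_pairs(path_a, path_b):
    """Toy self-consistency fusion: (entity -> type) predictions from
    two reasoning paths are kept when they agree and dropped on conflict."""
    return {e: t for e, t in path_a.items() if path_b.get(e) == t}

a = {"Jordan": "athlete", "Nike": "company", "Chicago": "city"}
b = {"Jordan": "athlete", "Nike": "shoe", "Chicago": "city"}
fused = fuse_pairs(a, b)
```

Dropping disagreements trades recall for precision, which is the usual motivation for self-consistency in few-shot settings where per-type supervision is scarce.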
Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and a lack of a unified quantitative coding system. Results: Quantitative experiments showed that the mean semantic similarity between Multimodal Large Language Model (MLLM) interpretations and human expert interpretations was approximately 0.75 (standard deviation about 0.05). In structurally oriented expert data sets, this similarity rose to 0.85, indicating expert-level baseline comprehension. Qualitative analyses demonstrated that the multi-agent system, by integrating social-psychological perspectives and destigmatizing narratives, effectively corrected visual hallucinations and produced psychological reports with high ecological validity and internal coherence. Conclusions: The findings confirm the potential of multimodal large models as standardized tools for projective assessment. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services.
Keywords: House-Tree-Person test; multimodal large language model; multi-agent collaboration; cosine similarity; computational psychology; artificial intelligence
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
Contemporary approaches to agent-based modeling (ABM) of social systems have traditionally emphasized rule-based behaviors, limiting their ability to capture nuanced dynamics by moving beyond predefined rules and leveraging contextual understanding from LMs of human social interaction. This paper presents SALM (Social Agent LM Framework), a novel approach for integrating language models (LMs) into social network simulation that achieves unprecedented temporal stability in multi-agent scenarios. Our primary contributions include: (1) a hierarchical prompting architecture enabling stable simulation beyond 4,000 timesteps while reducing token usage by 73%, (2) an attention-based memory system achieving 80% cache hit rates (95% CI [78%, 82%]) with sub-linear memory growth of 9.5%, and (3) formal bounds on personality stability. Through extensive validation against SNAP ego networks, we demonstrate the first LLM-based framework capable of modeling long-term social phenomena while maintaining empirically validated behavioral fidelity.
Traditional agent-based models (ABMs) of opinion dynamics often fail to capture the psychological heterogeneity driving online polarization due to simplistic homogeneity assumptions. This limitation obscures the critical interplay between individual cognitive biases and information propagation, thereby hindering a mechanistic understanding of how ideological divides are amplified. To address this challenge, we introduce the Personality-Refracted Intelligent Simulation Model (PRISM), a hybrid framework coupling stochastic differential equations (SDE) for continuous emotional evolution with a personality-conditional partially observable Markov decision process (PC-POMDP) for discrete decision-making. In contrast to continuous trait approaches, PRISM assigns distinct Myers-Briggs Type Indicator (MBTI) based cognitive policies to multimodal large language model (MLLM) agents, initialized via data-driven priors from large-scale social media datasets. PRISM achieves superior personality consistency aligned with human ground truth, significantly outperforming standard homogeneous and Big Five benchmarks. This framework effectively replicates emergent phenomena such as rational suppression and affective resonance, offering a robust tool for analyzing complex social media ecosystems.
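The continuous emotional evolution PRISM models with stochastic differential equations can be illustrated by an Euler-Maruyama step of a mean-reverting (Ornstein-Uhlenbeck-style) process. The drift form and every parameter value below are assumptions for illustration, not PRISM's specification.

```python
import math
import random

def emotion_step(x, baseline, theta, sigma, dt, rng):
    """One Euler-Maruyama step of the mean-reverting SDE
        dx = theta * (baseline - x) dt + sigma dW,
    a stand-in for continuous emotional dynamics: theta sets how fast
    mood reverts to baseline, sigma the strength of random shocks.
    """
    drift = theta * (baseline - x) * dt
    diffusion = sigma * math.sqrt(dt) * rng.gauss(0, 1)
    return x + drift + diffusion

rng = random.Random(42)
x = 1.0            # start agitated; baseline mood is 0
for _ in range(1000):
    x = emotion_step(x, baseline=0.0, theta=2.0, sigma=0.05, dt=0.01, rng=rng)
```

Personality conditioning of the kind PRISM describes would enter by giving each MBTI type its own `baseline`, `theta`, and `sigma`, so that, for example, a more reactive type carries a larger shock term.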
While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of interactive artificial societies that reflect collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents due to high resource demands. Second, they often assume agents possess perfect information and limitless capabilities, hindering the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator that introduces three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. Additionally, we introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.
In the past decade, social media platforms have been used for information dissemination and consumption. While a major portion of the content is posted to promote citizen journalism and public awareness, some content is posted to mislead users. Among different content types such as text, images, and videos, memes (text overlaid on images) are particularly prevalent and can serve as powerful vehicles for propaganda, hate, and humor. In the current literature, there have been efforts to individually detect such content in memes. However, the study of their intersection is very limited. In this study, we explore the intersection between propaganda and hate in memes using a multi-agent LLM-based approach. We extend the propagandistic meme dataset with coarse and fine-grained hate labels. Our finding suggests that there is an association between propaganda and hate in memes. We provide detailed experimental results that can serve as a baseline for future studies. We will make the experimental resources publicly available to the community (https://github.com/firojalam/propaganda-and-hateful-memes).
Current large language model (LLM) agents lack authentic human psychological processes necessary for genuine digital twins and social AI applications. To address this limitation, we present a computational implementation of Global Neuronal Workspace Theory (GNWT) that integrates human cognitive architecture principles into LLM agents, creating specialized sub-agents for emotion, memory, social norms, planning, and goal-tracking coordinated through a global workspace mechanism. However, authentic digital twins require accurate personality initialization. We therefore develop a novel adventure-based personality test that evaluates true personality through behavioral choices within interactive scenarios, bypassing self-presentation bias found in traditional assessments. Building on these innovations, our CogniPair platform enables digital twins to engage in realistic simulated dating interactions and job interviews before real encounters, providing bidirectional cultural fit assessment for both romantic compatibility and workplace matching. Validation using 551 GNWT-Agents and the Columbia University Speed Dating dataset demonstrates 72% correlation with human attraction patterns, 77.8% match prediction accuracy, and 74% agreement in human validation studies. This work advances psychological authenticity in LLM agents and establishes a foundation for intelligent dating platforms and HR technology solutions.
Detecting hidden stigmatization in social media poses significant challenges due to semantic misalignments between textual and visual modalities, as well as the subtlety of implicit stigmatization. Traditional approaches often fail to capture these complexities in real-world, multimodal content. To address this gap, we introduce ContextAware, an agent-based framework that leverages specialized modules to collaboratively process and analyze images, textual context, and social interactions. Our approach begins by clustering image embeddings to identify recurring content, activating high-likes agents for deeper analysis of images receiving substantial user engagement, while comprehensive agents handle lower-engagement images. By integrating case-based learning, textual sentiment, and vision-language models (VLMs), ContextAware refines its detection of harmful content. We evaluate ContextAware on a self-collected Douyin dataset focused on interracial relationships, comprising 871 short videos and 885,502 comments—of which a notable portion are image-based. Experimental results show that ContextAware not only outperforms state-of-the-art methods in accuracy and F1 score but also effectively detects implicit stigmatization within the highly contextual environment of social media. Our findings underscore the importance of agent-based architectures and multimodal alignment in capturing nuanced, culturally specific forms of harmful content.
Social media has evolved into a complex multimodal environment where text, images, and other signals interact to shape nuanced meanings, often concealing harmful intent. Identifying such intent, whether sarcasm, hate speech, or misinformation, remains challenging due to cross-modal contradictions, rapid cultural shifts, and subtle pragmatic cues. To address these challenges, we propose MV-Debate, a multi-view agent debate framework with dynamic reflection gating for unified multimodal harmful content detection. MV-Debate assembles four complementary debate agents, a surface analyst, a deep reasoner, a modality-contrast analyst, and a social contextualist, to analyze content from diverse interpretive perspectives. Through iterative debate and reflection, the agents refine responses under a reflection-gain criterion, ensuring both accuracy and efficiency. Experiments on three benchmark datasets demonstrate that MV-Debate significantly outperforms strong single-model and existing multi-agent debate baselines. This work highlights the promise of multi-agent debate in advancing reliable social intent detection in safety-critical online contexts.
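The iterate-until-the-reflection-gain-vanishes control flow behind such a debate can be sketched compactly. The agent signature, the gain threshold, and the final majority vote are our illustrative assumptions, not MV-Debate's exact protocol:

```python
def debate(agents, content, max_rounds=3, gain_eps=0.01):
    """Toy multi-view debate loop with a reflection-gain gate: keep refining
    only while aggregate confidence still improves. Each agent is a callable
    (content, history) -> (label, confidence). Illustrative names throughout."""
    history = []
    prev = 0.0
    for _ in range(max_rounds):
        views = [agent(content, history) for agent in agents]
        conf = sum(c for _, c in views) / len(views)  # aggregate confidence
        history.append(views)
        if conf - prev < gain_eps:  # reflection gain too small: gate closes
            break
        prev = conf
    # majority vote over the final round's labels
    labels = [label for label, _ in history[-1]]
    return max(set(labels), key=labels.count), len(history)
```

The gate is what buys efficiency: easy content converges in one round, while contradictory multimodal content keeps the debate alive until the gain flattens.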
Accurately predicting the popularity of multimodal user-generated content (UGC) is fundamental for many real-world applications such as online advertising and recommendation. Existing approaches generally focus on limited contextual information within individual UGCs, yet overlook the potential benefit of exploiting meaningful knowledge in relevant UGCs. In this work, we propose RAGTrans, an aspect-aware retrieval-augmented multi-modal hypergraph transformer that retrieves pertinent knowledge from a multi-modal memory bank and enhances UGC representations via neighborhood knowledge aggregation on multi-modal hypergraphs. In particular, we initially retrieve relevant multimedia instances from a large corpus of UGCs via the aspect information and construct a knowledge-enhanced hypergraph based on the retrieved relevant instances. This allows capturing meaningful contextual information across the data. We then design a novel bootstrapping hypergraph transformer on multimodal hypergraphs to strengthen UGC representations across modalities via a customized propagation algorithm that effectively diffuses information across nodes and edges. Additionally, we propose a user-aware attention-based fusion module to combine the enriched UGC representations for popularity prediction. Extensive experiments on real-world social media datasets demonstrate that RAGTrans outperforms state-of-the-art popularity prediction models across settings.
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents, the traitors, seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.
Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of unsafe activities across multiple categories of social situations; and (3) social dynamics, measured as interaction counts and acceptance ratios of social exchanges. Agents are equipped with layered memory, dynamic planning, multimodal perception, and are instrumented with SocialMetrics, a suite of behavioral and structural metrics that quantifies plan revisions, unsafe-to-safe conversions, and information diffusion across networks. Experiments show that while agents can detect direct multimodal contradictions, they often fail to align local revisions with global safety, reaching only a 55 percent success rate in correcting unsafe plans. Across eight simulation runs with three models (Claude, GPT-4o mini, and Qwen-VL), five agents achieved average unsafe-to-safe conversion rates of 75, 55, and 58 percent, respectively. Overall performance ranged from 20 percent in multi-risk scenarios with GPT-4o mini to 98 percent in localized contexts such as fire/heat with Claude. Notably, 45 percent of unsafe actions were accepted when paired with misleading visuals, showing a strong tendency to overtrust images. These findings expose critical limitations in current architectures and provide a reproducible platform for studying multimodal safety, coherence, and social dynamics.
Social media platforms have become central channels for emotional communication, posing new challenges for fine-grained sentiment analysis due to their high contextual variability, multimodal content, and pervasive ambiguity. Traditional end-to-end sentiment models often struggle to capture compositional or conflicting emotional cues in user-generated texts. This study presents a modular multi-agent architecture for sentiment analysis, implemented with the LLaMA-3.3-70B-Instruct model and guided by system-level design principles. The framework decomposes emotion inference into three coordinated stages, perception, reasoning, and resolution, each managed by a specialized agent trained with parameter-efficient tuning strategies. A meta-agent mediates conflicting predictions through a coordination protocol based on confidence estimation and discourse consistency, enabling adaptive consensus formation. Evaluations on the GoEmotions v2, SemEval-2024, and Twitter benchmarks demonstrate that the proposed system achieves higher accuracy, robustness, and interpretability compared with existing baselines. These findings indicate that architectural decomposition combined with collaborative reasoning enhances reliability and transparency in sentiment analysis, offering a scalable pathway toward intelligent and emotionally aware computational systems.
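The meta-agent's confidence-based mediation between conflicting stage predictions reduces to a small weighted voting rule. This is a sketch under our own assumptions; the paper's coordination protocol additionally checks discourse consistency:

```python
def resolve(predictions):
    """Toy meta-agent coordination: among conflicting (label, confidence)
    predictions from stage agents, pick the label whose supporters carry
    the highest total confidence. Hypothetical simplification."""
    totals = {}
    for label, conf in predictions:
        totals[label] = totals.get(label, 0.0) + conf
    return max(totals, key=totals.get)
```

Summing confidence rather than counting votes lets two moderately confident agents overrule one very confident outlier only when their combined evidence is stronger, which is the adaptive-consensus behavior the abstract describes.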
Serialized television narratives present significant analytical challenges due to their complex, temporally distributed storylines that necessitate sophisticated information management. This paper introduces a multi-agent system (MAS) designed to extract and analyze narrative arcs by implementing principles of computational memory architectures. The system conceptualizes narrative understanding through analogues of human memory: Large Language Models (LLMs) provide a form of semantic memory for general narrative patterns, while a vector database stores specific arc progressions as episodic memories. A multi-agent workflow simulates working memory processes to integrate these information types. Tested on the first season of Grey's Anatomy (ABC 2005-), the MAS identifies three arc types: Anthology (self-contained), Soap (relationship-focused), and Genre-Specific. These arcs and their episodic developments are stored in a vector database, facilitating structured analysis and semantic comparison. To bridge automation with critical interpretation, a graphical interface enables human oversight and refinement of the system's narrative memory. While demonstrating strong performance in identifying Anthology Arcs and character entities, the system's reliance on textual paratexts (episode summaries) revealed limitations in discerning overlapping arcs and opaque dynamics, underscoring the challenges in computational memory consolidation versus human holistic understanding. This memory-centric approach highlights the potential of combining AI-driven memory processing with human expertise. Beyond television, it offers promise for serialized written formats where narrative is entirely text-based. Future work will focus on integrating multimodal inputs to enrich episodic memory, refining memory integration mechanisms within the MAS, and expanding testing across diverse genres.
No abstract available
This study aims to explore the potential of Multi-Agent Large Language Models (MALLM) to enhance Social Network Analysis (SNA) for online education. It compares MALLM with single-agent LLMs in conducting, interpreting and applying SNA, addressing barriers that limit adoption. An exploratory experiment using AutoGen compared MALLM and single-agent LLMs across multistep SNA workflows with a Coursera discussion data set. The process included data exploration, analysis and visualization. Specialized agent teams were assigned to analysis and interpretation. Performance was tested over 20 rounds, evaluated on comprehension, accuracy, execution and educational relevance. Single agents were more efficient in simpler tasks (data exploration 85% vs 25%, visualization 50% vs 45%). MALLM outperformed in complex tasks, with higher SNA precision (30% vs 25%), stronger node-level analysis (95% vs 65%) and greater educational insights (55% vs 35%). However, MALLM faced coordination inefficiencies in linear tasks. Limitations include contextual forgetting, token-size constraints and coordination overhead. Results are specific to GPT-4/GPT-4o, with a 30% success rate in complex tasks, indicating LLMs are not yet sufficient for full automation. MALLMs can advance online education by supporting personalized learning and engagement while democratizing access to advanced analytics and pedagogical feedback, thereby enhancing educational equity. To the best of the authors’ knowledge, this study is among the first to examine MALLMs’ management of multimodal, domain-specific analytics tasks, moving beyond general text-based applications, highlighting their advantages in generating educational insights and informing agent design while providing benchmarks for advancing multi-agent LLM systems.
In the digital age, brand meaning is increasingly shaped through user participation and content sharing on social media platforms. However, significant perceptual gaps often exist between official brand narratives and consumer interpretations. These multimodal and cognitively nuanced gaps are challenging to detect and model using traditional analytical methods. To address this, we propose a multi-agent framework, termed OPIM, that metaphorically models perception as an optical process of propagation, interference, and measurement. We construct a novel dual-perspective dataset from representative social media platforms, integrating text and image content from both user-generated and official brand communications. We evaluate brand perception along six psychological dimensions. Experiments across 15 brands demonstrate that our framework effectively captures key perception gaps, particularly in sincerity, professionalism, and attractiveness. In contrast, materialism and sophistication exhibit higher alignment between brand messaging and consumer perception. Our framework enhances the cognitive alignment and multimodal interpretability of large language models, offering actionable insights for brand strategy and bridging computational modeling with human-centric understanding. The dataset will be available at https://github.com/htgan-ai/OPIM.
The idea that to perceive an object is to perceive its affordances—that is, the interactions of the perceiver with the world that the object supports or affords—is attractive from the point of view of theories in cognitive science that emphasize the fundamental role of actions in representing an agent's knowledge about the world. However, in this general form, the notion has so far lacked a formal expression. This paper offers a representation for objects in terms of their affordances using Linear Dynamic Event Calculus, a formalism for reasoning about causal relations over events. It argues that a representation of this kind, linking objects to the events which they are characteristically involved in, underlies some universal operations of natural language syntactic and semantic composition that are postulated in Combinatory Categorial Grammar (CCG). These observations imply that the language faculty is more directly related to prelinguistic cognitive apparatus used for planning action than formal theories in either domain have previously seemed to allow.
Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a controllable complex news event simulator guided by both the event schema representing domain knowledge about the scenario and user-provided assumptions representing case-specific conditions. As event dynamics depend on the fine-grained social and cultural context, we further introduce a geo-diverse commonsense and cultural norm-aware knowledge enhancement component. To enhance the coherence of the simulation, apart from the global timeline of events, we take an agent-based approach to simulate the individual character states, plans, and actions. By incorporating the schema and cultural norms, our generated simulations achieve much higher coherence and appropriateness and are received favorably by participants from a humanitarian assistance organization.
No abstract available
Pandemics have a huge impact on all aspects of people's lives. As we experienced during the Coronavirus pandemic, healthcare, education and the economy were put under extreme strain. It is therefore important to be able to respond to such events fast in order to limit the damage to society. Decision-makers are typically advised by experts in order to inform their response strategies. One of the tools that is widely used to support evidence-based decisions is modeling and simulation. In this paper, we present a hybrid agent-based and discrete-event simulation for Coronavirus pandemic management at the regional level. Our model considers disease dynamics, population interactions and dynamic ICU bed capacity management, and predicts the impact of various public health preventive measures on the population and the healthcare service.
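The disease-dynamics half of such a hybrid model is typically a stochastic compartment update. Below is a toy SIR step of our own devising, not the paper's regional model (which additionally manages ICU bed capacity as discrete events):

```python
import random

def sir_step(S, I, R, beta, gamma, rng):
    """One stochastic SIR update: each infected agent infects a susceptible
    with probability scaled by beta and the susceptible fraction, and recovers
    with probability gamma. Illustrative stand-in for the hybrid ABM/DES model."""
    n = S + I + R
    new_inf = min(S, sum(1 for _ in range(I) if rng.random() < beta * S / n))
    new_rec = sum(1 for _ in range(I) if rng.random() < gamma)
    return S - new_inf, I + new_inf - new_rec, R + new_rec
```

Iterating this step while a discrete-event layer schedules interventions (lockdown start, ICU expansion) is the essence of the hybrid design the abstract describes.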
No abstract available
The Asset Administration Shell (AAS) plays a key role in Industry 4.0, providing a standardized digital representation of industrial assets to enable interoperability and data exchange. However, information retrieval with complex queries is still in its infancy despite recent advances in the specification of a dedicated query language, which will be hard to implement for different SDKs and storage back-ends. This paper proposes an event-driven architecture that integrates AAS contents into a Neo4j-based knowledge graph using Apache Kafka. This enables powerful, relationship-oriented Cypher queries for complex information retrieval and supports advanced cross-validation of references. The knowledge graph is kept in sync with changes in the AAS. This establishes a solid foundation for advanced natural language user interfaces. As a proof-of-concept, we implemented an AI agent using a Large Language Model (LLM) and a standardized Neo4j tool integration by means of the Model Context Protocol (MCP) for question answering. Thus, the paper contributes to making the AAS more accessible and actionable for AI-driven industrial applications in a broad range of use cases without the need for a dedicated query language. The source code is publicly available on GitHub.
Dynamic Scene Graphs (DSGs) provide a structured representation of hierarchical, interconnected environments, but current approaches struggle to capture stochastic dynamics, partial observability, and multi-agent activity. These aspects are critical for embodied AI, where agents must act under uncertainty and delayed perception. We introduce FOGMACHINE, an open-source framework that fuses DSGs with discrete-event simulation to model object dynamics, agent observations, and interactions at scale. This setup enables the study of uncertainty propagation, planning under limited perception, and emergent multi-agent behavior. Experiments in urban scenarios illustrate realistic temporal and spatial patterns while revealing the challenges of belief estimation under sparse observations. By combining structured representations with efficient simulation, FOGMACHINE establishes an effective tool for benchmarking, model training, and advancing embodied AI in complex, uncertain environments.
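At its core, fusing a scene graph with discrete-event simulation means a time-ordered event queue mutating graph-node attributes. The sketch below uses our own names and event-tuple format, not FOGMACHINE's API:

```python
import heapq

def run_des(events, horizon):
    """Minimal discrete-event loop over a scene-graph-like world state.
    Each event is (time, node, attribute, value); events up to `horizon`
    are applied in time order. Illustrative only."""
    world = {}                 # node -> attribute dict (flat stand-in for a DSG)
    queue = list(events)
    heapq.heapify(queue)       # priority queue ordered by event time
    log = []
    while queue and queue[0][0] <= horizon:
        t, node, attr, value = heapq.heappop(queue)
        world.setdefault(node, {})[attr] = value
        log.append((t, node))
    return world, log
```

Events beyond the horizon stay queued, which is exactly what makes delayed perception and belief estimation studiable: an agent's view of `world` lags the full event stream.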
Accurate spatio-temporal anomaly event inference is important for enhancing society's safety, for example through crime prevention and traffic collision reduction. However, it is hard to achieve good performance because the underlying process is complicated and influenced by many kinds of factors. Previous works mainly focus on employing feature-based regression or fitting models with an assumed spatio-temporal distribution to tackle this challenge, but they normally lack the following considerations: 1) mutual evolutionary influence, i.e., the dynamic evolution of interactions and dependencies among anomaly events, which changes along the timeline and dynamically alters the probabilities or patterns of their occurrences; 2) messy features, i.e., complex attributes in the data mixed with noise, which are difficult to select and aggregate for representation learning under uncertainty and redundancy. To address this research gap, we put forward a co-evolutionary graph convolutional network model to explicate dynamic spatio-temporal anomaly patterns. Specifically, we first use a fuzzy-rough set-based algorithm to select features by discovering the specialty and permanence attributes of different interaction features. Then, we propose a co-evolutionary learning method to embed the dynamic temporal influence into latent features with the selected interaction information. Finally, we design a graph convolutional network with an attention mechanism to formulate the mutual spatial effects among the anomaly events. The proposed model is verified on real-world New York City crime records, and extensive experiments show that our approach achieves 0.10129, 0.09958 and 0.10034 MAE (hour) in the action, location, and action-location time inference tasks, and 0.7973 and 0.4678 accuracy in the action and location type inference tasks, outperforming the state of the art by up to 24.00%, 50.60%, 11.10%, 10.56% and 21.53%, respectively. Hyper-parameter and ablation experiments are also carried out to further demonstrate the sensitivity and effectiveness of our model.
No abstract available
Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies, which leads to underwhelming performance in determining coreference for events whose argument information relies on such dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document lexical chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and a graph attention network (GAT) is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events, and co-referring events can be recognized using a standard clustering algorithm. Additionally, as existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, comprising 53,066 event mentions and 4,476 clusters. Applied to the English and Chinese datasets respectively, our model outperforms all baselines by large margins.
We present a knowledge representation framework on the basis of the Event Calculus that allows an agent to recognize complex activities from low‐level observations received by multiple sensors, reason about the life cycle of such activities, and take action to support their successful completion. Activities are multivalue fluents that change according to events that occur in the environment. The parameters of an activity consist of a unique label, a set of participants involved in the performing of the activity, and a unique goal associated with the activity revealing the activity's desired outcome. Our contribution is the identification of an activity life cycle describing how activities can be started, interrupted, suspended, resumed, or completed over time, as well as how these can be represented. The framework also specifies activity goals, their associated life cycle, and their relation with the activity life cycle. We provide the complete implementation of the framework, which includes an activity generator that automatically creates synthetic sensor data in the form of event streams that represent the everyday lifestyle of a type 1 diabetic patient. Moreover, we test the framework by generating very large activity streams that we use to evaluate the performance of the recognition capability and study its relative merits.
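The started/interrupted/suspended/resumed/completed life cycle described above is naturally a small state machine. The transition table below follows the life cycle named in the abstract, but the class and event names are our own sketch, not the framework's Event Calculus axioms:

```python
class Activity:
    """Minimal activity life cycle, treating the activity state as a
    multivalue fluent updated by events. Illustrative sketch only."""

    TRANSITIONS = {
        ("inactive", "start"): "active",
        ("active", "interrupt"): "suspended",
        ("suspended", "resume"): "active",
        ("active", "complete"): "completed",
    }

    def __init__(self, label, participants, goal):
        # the three activity parameters the framework specifies
        self.label, self.participants, self.goal = label, participants, goal
        self.state = "inactive"

    def on(self, event):
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        self.state = nxt
        return self.state
```

In the full framework these transitions are driven by sensor-event streams and encoded as Event Calculus rules rather than a Python dict, but the reachable-state structure is the same.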
Predictive state representation (PSR) is a compact model of dynamic systems that represents state as a vector of predictions about future observable events. It is an alternative to a partially observable Markov decision process (POMDP) model in dealing with a sequential decision-making problem under uncertainty. Most of the existing PSR research focuses on model learning in a single-agent setting. In this paper, we investigate a multi-agent PSR model upon available agent interaction data. It turns out to be rather difficult to learn a multi-agent PSR model, especially with limited samples and an increasing number of agents. We resort to a tensor technique to better represent dynamic system characteristics and address the challenging task of learning multi-agent PSR problems based on tensor optimization. We first focus on a two-agent scenario and use a third-order tensor (system dynamics tensor) to capture the system interaction data. Then, the PSR model discovery can be formulated as a tensor optimization problem with group lasso, and an alternating direction method of multipliers is employed to solve the embedded subproblems. Hence, the prediction parameters and state vectors can be directly learned from the optimization solutions, and the transition parameters can be derived via a linear regression. Subsequently, we generalize the tensor learning approach to a multi-agent (N > 2) PSR model, and analyze the computational complexity of the learning algorithms. Experimental results show that the tensor optimization approaches provide promising performance in learning a multi-agent PSR model over multiple problem domains.
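The third-order system-dynamics tensor is, at base, normalized counts over (history, action, observation) triples drawn from interaction data. A minimal sketch of its construction (our simplification; the real method then factorizes this tensor with group lasso via ADMM, which we omit):

```python
def system_dynamics_tensor(episodes, n_hist, n_act, n_obs):
    """Build a (history x action x observation) tensor from interaction
    triples, then normalize over observations so each T[h][a] is an
    empirical prediction distribution. Illustrative sketch only."""
    T = [[[0.0] * n_obs for _ in range(n_act)] for _ in range(n_hist)]
    for h, a, o in episodes:
        T[h][a][o] += 1.0
    for h in range(n_hist):
        for a in range(n_act):
            s = sum(T[h][a])
            if s:
                T[h][a] = [c / s for c in T[h][a]]
    return T
```

Low-rank structure in this tensor is what the optimization exploits: the PSR state vector lives in the small subspace that explains the observed prediction distributions.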
No abstract available
Large Language Models (LLMs) have revolutionized the simulation of agent societies, enabling autonomous planning, memory formation, and social interactions. However, existing frameworks often overlook systematic evaluations for event organization and lack visualized integration with physically grounded environments, limiting agents' ability to navigate spaces and interact with items realistically. We develop MiniAgentPro, a visualization platform featuring an intuitive map editor for customizing environments and a simulation player with smooth animations. Based on this tool, we introduce a comprehensive test set comprising eight diverse event scenarios with basic and hard variants to assess agents' ability. Evaluations using GPT-4o demonstrate strong performance in basic settings but highlight coordination challenges in hard variants.
This article proposes a formal framework based on discrete event systems in order to analyze the democratic progress and regression in a society controlled by networked agents. For this purpose, we construct a simple model using a finite state automaton that describes the dynamic behavior of progress and regression in a democracy. We represent a network of agents as a directed graph where each agent has its own objective. Each agent may be a citizen or a group of people sharing a common objective, and it makes decisions on enabling or disabling events upon the observation of states of a system. Agents may have different decisions on the same event, and the final decision follows the majority rule. Upon this framework, we derive the necessary and sufficient conditions for a democratic system controlled by networked agents to be progressive or regressive, where a progressive one implies that it reaches a more equal state at which a larger number of agents meet their objectives. Finally, we obtain some convergence results for special graph topologies.
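The majority-rule event decision at the heart of this framework is easy to state in code. The automaton encoding and function names below are our illustration, not the article's formal model:

```python
def majority_decision(votes):
    """Majority rule over agents' enable/disable votes on one event:
    a strict majority of True votes enables the event."""
    enable = sum(1 for v in votes if v)
    return enable > len(votes) - enable

def step(automaton, state, event, votes):
    """Fire `event` from `state` only if (a) the agent majority enables it
    and (b) the automaton defines a transition for it; otherwise stay put.
    `automaton` maps (state, event) -> next_state. Illustrative sketch."""
    if majority_decision(votes) and (state, event) in automaton:
        return automaton[(state, event)]
    return state
```

Whether repeated application of this rule drives the system toward states where more agents meet their objectives (progress) or fewer do (regression) is exactly what the article's necessary and sufficient conditions characterize.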
We present CreAgentive, an agent workflow driven multi-category creative generation engine that addresses four key limitations of contemporary large language models in writing stories, drama and other categories of creatives: restricted genre diversity, insufficient output length, weak narrative coherence, and inability to enforce complex structural constructs. At its core, CreAgentive employs a Story Prototype, which is a genre-agnostic, knowledge graph-based narrative representation that decouples story logic from stylistic realization by encoding characters, events, and environments as semantic triples. CreAgentive engages a three-stage agent workflow that comprises: an Initialization Stage that constructs a user-specified narrative skeleton; a Generation Stage in which long- and short-term objectives guide multi-agent dialogues to instantiate the Story Prototype; a Writing Stage that leverages this prototype to produce multi-genre text with advanced structures such as retrospection and foreshadowing. This architecture reduces storage redundancy and overcomes the typical bottlenecks of long-form generation. In extensive experiments, CreAgentive generates thousands of chapters with stable quality and low cost (less than $1 per 100 chapters) using a general-purpose backbone model. To evaluate performance, we define a two-dimensional framework with 10 narrative indicators measuring both quality and length. Results show that CreAgentive consistently outperforms strong baselines and achieves robust performance across diverse genres, approaching the quality of human-authored novels.
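Encoding characters, events, and environments as semantic triples suggests a tiny triple store underneath the Story Prototype. The class, method, and predicate names below are hypothetical, not CreAgentive's actual schema:

```python
class StoryPrototype:
    """Genre-agnostic narrative store as (subject, predicate, object) triples,
    decoupling story logic from stylistic realization. Illustrative sketch."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Pattern match with None as a wildcard, e.g. query(p="betrays")."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```

Because the Writing Stage only reads such triples, the same prototype can be realized as a thriller or a romance without duplicating story logic, which is where the storage-redundancy savings come from.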
Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our architecture consists of four modules: Creation & Representation, Spreading, Evaluation, and Compliance. This addresses several important aspects of the emergent processes all in one: (i) where social norms come from, (ii) how they are formally represented, (iii) how they spread through agents' communications and observations, (iv) how they are examined with a sanity check and synthesized in the long term, and (v) how they are incorporated into agents' planning and actions. Our experiments deployed in the Smallville sandbox game environment demonstrate the capability of our architecture to establish social norms and reduce social conflicts within generative MASs. The positive outcomes of our human evaluation, conducted with 30 evaluators, further affirm the effectiveness of our approach. Our project can be accessed via the following link: https://github.com/sxswz213/CRSEC.
No abstract available
With the development of social media, rumors spread quickly and cause great harm to society and the economy. Accordingly, many effective rumor detection methods have been developed, among which methods that learn the rumor propagation structure are particularly effective. However, existing methods still suffer from several issues, including the difficulty of obtaining large-scale labeled rumor datasets, which leads to low generalization ability and performance degradation on new events, since rumors are time-critical and usually accompany hot topics or newly emergent events. To address these problems, we used large-scale unlabeled topic datasets with claim propagation structure, crawled from the social media platforms Weibo and Twitter, to improve the semantic learning ability of a graph representation learning model across various topics. We apply three typical graph self-supervised methods, InfoGraph, JOAO, and GraphMAE, under two commonly used training strategies, to verify the performance of general graph self-supervised methods on rumor detection tasks. In addition, to alleviate the time and topic differences between the unlabeled topic data and the rumor data, we also collected a rumor dataset covering a variety of topics over the decade preceding 2022 from the Weibo rumor-refuting platform. Our experiments show that these general graph self-supervised learning methods outperform previous methods specifically designed for rumor detection tasks and achieve good performance under few-shot conditions, demonstrating better generalization ability with the help of our massive unlabeled topic dataset.
No abstract available
Large Language Models (LLMs) have been used to make decisions in complex scenarios, which require them to think deeply, reason logically, and decide wisely. Many existing studies focus solely on multi-round conversations in social tasks or simulated environments, neglecting the various types of decisions and their interdependence. Current reinforcement learning methods also struggle to account for the strategies of others during training. To address these issues, we first define a strategic decision-making problem that includes two types of decisions and their temporal dependencies. We then propose the **T**heory **o**f **M**ind **P**olicy **O**ptimization **(ToMPO)** algorithm to optimize the perception of other individuals' strategies and game situation trends. Compared to the Group Relative Policy Optimization (GRPO) algorithm, ToMPO enhances the LLM's strategic decision-making mainly by: 1) generating rollouts based on reasoning about the strategies of other individuals, 2) estimating advantages at both the graph level and the sample level, and 3) balancing global and partial rewards. ToMPO outperforms GRPO by 35% in terms of model output compliance and cooperative outcomes. Additionally, compared to models with parameter sizes 100 times larger, it shows an 18% improvement. These results demonstrate the effectiveness of ToMPO in enhancing the model's strategic decision-making capabilities.
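As a rough illustration of the group-relative advantage that GRPO-style methods use, and that ToMPO extends with graph-level estimates: rewards within a rollout group are normalized against the group mean and standard deviation. The mixing coefficient `alpha` below is an assumption for illustration, not a value from the paper:

```python
# Hedged sketch of GRPO-style group-relative advantages; ToMPO additionally
# mixes sample-level and graph-level estimates, paraphrased here as a simple
# convex combination with an assumed coefficient `alpha`.
import statistics

def group_relative_advantage(rewards):
    """Normalize each reward against its rollout group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

def mixed_advantage(sample_adv, graph_adv, alpha=0.5):
    # alpha balances sample-level vs. graph-level signals (illustrative).
    return [alpha * s + (1 - alpha) * g for s, g in zip(sample_adv, graph_adv)]
```

The mean-reward rollout receives zero advantage; rollouts above the group mean are reinforced and those below are penalized.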
This position paper examines the use of Large Language Models (LLMs) in social simulation, analyzing their potential and limitations from a computational social science perspective. We first review recent findings on LLMs' ability to replicate key aspects of human cognition, including Theory of Mind reasoning and social inference, while identifying persistent limitations such as cognitive biases, lack of grounded understanding, and behavioral inconsistencies. We then survey emerging applications of LLMs in multi-agent simulation frameworks, examining system architectures, scalability, and validation strategies. Projects such as Generative Agents (Smallville) and AgentSociety are analyzed with respect to their empirical grounding and methodological design. Particular attention is given to the challenges of behavioral fidelity, calibration, and reproducibility in large-scale LLM-driven simulations. Finally, we distinguish between contexts where LLM-based agents provide operational value, such as interactive simulations and serious games, and contexts where their use raises epistemic concerns, particularly in explanatory or predictive modeling. We argue that hybrid approaches integrating LLMs into established agent-based modeling platforms such as GAMA and NetLogo may offer a promising compromise between expressive flexibility and analytical transparency. Building on this analysis, we outline a conceptual research direction termed Hybrid Constitutional Architectures, which proposes a stratified integration of classical agent-based models (ABMs), small language models (SLMs), and LLMs within established platforms such as GAMA and NetLogo.
Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning. Code and data are available at https://github.com/ljcleo/agent_sense.
Logical reasoning is a fundamental capability of large language models. However, existing studies often overlook the interaction between logical complexity and semantic complexity, leading to systems that struggle with abstract propositions, ambiguous contexts, and conflicting stances that are central to human reasoning. We propose LogicAgent, a semiotic-square-guided framework that jointly addresses these two axes of difficulty. The semiotic square provides a principled structure for multi-perspective semantic analysis, and LogicAgent integrates automated deduction with reflective verification to manage logical complexity across deeper reasoning chains. To support evaluation under these conditions, we introduce RepublicQA, a benchmark that couples semantic complexity with logical depth. RepublicQA reaches college-level semantic difficulty (FKGL 11.94), contains philosophically grounded abstract propositions with systematically constructed contrary and contradictory forms, and offers a semantically rich setting for assessing logical reasoning in large language models. Experiments show that LogicAgent achieves state-of-the-art performance on RepublicQA with a 6.25 percent average improvement over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05 percent average gain. These results demonstrate the effectiveness of semiotic-grounded multi-perspective reasoning in enhancing logical performance.
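The semiotic square that guides LogicAgent can be represented as a small data structure holding the four standard positions (an assertion, its contrary, and their two contradictories). The class name and example proposition below are illustrative, not from the paper:

```python
# Minimal sketch of a semiotic square for a proposition, following the
# standard four positions; names and the example are assumptions.
from dataclasses import dataclass

@dataclass
class SemioticSquare:
    s1: str        # assertion             (e.g., "just")
    s2: str        # contrary of S1        (e.g., "unjust")
    not_s1: str    # contradictory of S1   (e.g., "not just")
    not_s2: str    # contradictory of S2   (e.g., "not unjust")

    def perspectives(self):
        """The four semantic positions an agent can reason from."""
        return [self.s1, self.s2, self.not_s1, self.not_s2]

square = SemioticSquare("just", "unjust", "not just", "not unjust")
assert len(square.perspectives()) == 4
```

A framework in this spirit would evaluate a claim against each of the four positions, which is what distinguishes contrary forms ("unjust") from merely contradictory ones ("not just").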
The emergence of large language models (LLMs) has brought a new paradigm to automated essay scoring (AES), a long-standing and practical application of natural language processing in education. However, achieving human-level multi-perspective understanding and judgment remains a challenge. In this work, we propose Roundtable Essay Scoring (RES), a multi-agent evaluation framework designed to perform precise and human-aligned scoring under a zero-shot setting. RES constructs evaluator agents based on LLMs, each tailored to a specific prompt and topic context. Each agent independently generates a trait-based rubric and conducts a multi-perspective evaluation. Then, by simulating a roundtable-style discussion, RES consolidates individual evaluations through a dialectical reasoning process to produce a final holistic score that more closely aligns with human evaluation. By enabling collaboration and consensus among agents with diverse evaluation perspectives, RES outperforms prior zero-shot AES approaches. Experiments on the ASAP dataset using ChatGPT and Claude show that RES achieves up to a 34.86% improvement in average QWK over straightforward prompting (Vanilla) methods.
LLM-driven multi-agent-based simulations have been gaining traction with applications in game-theoretic and social simulations. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic in-depth development and evaluation of strategic reasoning. Our game environment is governed by an umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, providing a comparison to an established baseline model from economics and data from human experiments. Furthermore, we introduce the foundations of a semantic measure of reasoning as an alternative to k-level theory. Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
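The economics baseline mentioned above, level-k reasoning in a p-beauty contest, can be sketched as follows. The contest parameter p = 2/3 and the level-0 midpoint guess of 50 are conventional assumptions from the beauty-contest literature, not values stated in the abstract:

```python
# Illustrative level-k baseline for a p-beauty contest: level-0 guesses the
# midpoint of [0, 100]; level-k best-responds to a population of
# level-(k-1) reasoners by multiplying its guess by p.
def level_k_guess(k, p=2/3, level0=50.0):
    guess = level0
    for _ in range(k):
        guess *= p
    return guess

assert level_k_guess(0) == 50.0
# Deeper reasoners converge toward the Nash equilibrium of 0.
assert level_k_guess(5) < level_k_guess(1)
```

Comparing an LLM's guess against this ladder is one way to estimate its apparent reasoning depth k, which is the kind of evaluation the abstract's semantic measure aims to refine.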
User sentiment on social media reveals the underlying social trends, crises, and needs. Researchers have analyzed users' past messages to trace the evolution of sentiments and reconstruct sentiment dynamics. However, predicting the imminent sentiment of an ongoing event is rarely studied. In this paper, we address the problem of **sentiment forecasting** on social media: predicting a user's future sentiment in response to the development of an event. We extract sentiment-related features to enhance the modeling and propose a multi-perspective role-playing framework to simulate the process of human response. Our preliminary results show significant improvement in sentiment forecasting at both microscopic and macroscopic levels.
Simulation is a widely used approach for evaluating system performance, robustness, and potential issues during design and testing. Large Language Models (LLMs) have recently shown strong potential in autonomous agent systems, including negotiation tasks, a core aspect of commerce. This paper evaluates LLM-based autonomous negotiator agents (LANAs) in a buyer-seller bargaining game to assess their decision-making and reasoning. We simulate interactions between agents embodying contrasting social behaviors: (a) Cunning vs. Kind, and (b) Greedy vs. Generous. By analyzing both the game outcomes and the agents' internal reasoning, we find that LLMs can effectively simulate distinct social behaviors in both dialogue and decision-making. Our results offer insights into how social traits affect negotiation dynamics, emphasizing the importance of clear policy design to ensure fairness and reliability in LANA-based systems.
Recent advancements in Large Language Models offer promising capabilities to simulate complex human social interactions. We investigate whether LLM-based multi-agent simulations can reproduce core human social dynamics observed in online forums. We evaluate conformity dynamics, group polarization, and fragmentation across different model scales and reasoning capabilities using a structured simulation framework. Our findings indicate that smaller models exhibit higher conformity rates, whereas models optimized for reasoning are more resistant to social influence.
Spatial reasoning is a core aspect of human intelligence that allows perception, inference and planning in 3D environments. However, current vision-language models (VLMs) struggle to maintain geometric coherence and cross-view consistency for spatial reasoning in multi-view settings. We attribute this gap to the lack of fine-grained benchmarks that isolate multi-view reasoning from single-view perception and temporal factors. To address this, we present ReMindView-Bench, a cognitively grounded benchmark for evaluating how VLMs construct, align and maintain spatial mental models across complementary viewpoints. ReMindView-Bench systematically varies viewpoint spatial pattern and query type to probe key factors of spatial cognition. Evaluations of 15 current VLMs reveals consistent failures in cross-view alignment and perspective-taking in multi-view spatial reasoning, motivating deeper analysis on the reasoning process. Explicit phase-wise analysis using LLM-as-a-judge and self-consistency prompting shows that VLMs perform well on in-frame perception but degrade sharply when integrating information across views. Implicit analysis, including linear probing and entropy dynamics, further show progressive loss of task-relevant information and uncertainty separation between correct and incorrect trajectories. These results provide a cognitively grounded diagnosis of VLM spatial reasoning and reveal how multi-view spatial mental models are formed, degraded and destabilized across reasoning phases. The ReMindView-Bench benchmark is available at https://huggingface.co/datasets/Xue0823/ReMindView-Bench, and the source codes of benchmark construction and VLM reasoning analysis are available at https://github.com/pittisl/ReMindView-Bench.
With the development of social media, people are exposed to a vast amount of unverified information, making fact-checking particularly important. Existing fact-checking methods primarily encourage breaking down claims into more easily solvable sub-tasks and deriving final answers through reasoning with external evidence. However, these models face logical issues regarding whether and how the sub-tasks can logically be combined to form the original claims, and encounter causal errors in the reasoning process due to insufficient evidence or hallucinations from LLMs. In addition, they often suffer from a lack of interpretability. In this paper, we propose Logical and Causal fact-checking (LoCal), a novel fact-checking framework based on multiple LLM-based agents. We adopt a multi-agent design because such systems have increasingly demonstrated the ability to perform complex tasks in a human-like manner. LoCal primarily consists of a decomposing agent, multiple reasoning agents, and two evaluating agents. Specifically, the decomposing agent first utilizes the in-context learning ability of LLMs to break down complex claims into simpler sub-tasks, including fact verification tasks and question answering tasks. Afterwards, two types of reasoning agents are respectively utilized to retrieve external knowledge: one addresses the fact verification tasks, which require comparative analysis skills, and the other handles the question answering tasks, which necessitate information extraction from evidence. We then combine the sub-tasks and their corresponding responses to generate a solution for evaluation. To enhance logical and causal consistency, two evaluating agents are respectively employed to examine whether the generated solution is logically equivalent to the original claim and to determine whether the solution still holds when challenged by the counterfactual label.
The evaluating agents provide confidence degrees for the solutions based on the evaluation results and iteratively correct the logical and causal errors in the reasoning process. We evaluate LoCal on two challenging datasets, and the results show that LoCal significantly outperforms all the baseline models across different settings of evidence availability. In addition, LoCal offers better interpretability by providing a structured solution along with detailed evaluating processes. We believe LoCal will provide valuable insights for future misinformation detection.
Graph Retrieval Augmented Generation (GraphRAG) effectively enhances external knowledge integration capabilities by explicitly modeling knowledge relationships, thereby improving the factual accuracy and generation quality of Large Language Models (LLMs) in specialized domains. However, existing methods suffer from two inherent limitations: 1) Inefficient Information Aggregation: They rely on a single agent and fixed iterative patterns, making it difficult to adaptively capture multi-level textual, structural, and degree information within graph data. 2) Rigid Reasoning Mechanism: They employ preset reasoning schemes, which cannot dynamically adjust reasoning depth nor achieve precise semantic correction. To overcome these limitations, we propose Graph Counselor, a GraphRAG method based on multi-agent collaboration. This method uses the Adaptive Graph Information Extraction Module (AGIEM), where Planning, Thought, and Execution Agents work together to precisely model complex graph structures and dynamically adjust information extraction strategies, addressing the challenges of multi-level dependency modeling and adaptive reasoning depth. Additionally, the Self-Reflection with Multiple Perspectives (SR) module improves the accuracy and semantic consistency of reasoning results through self-reflection and backward reasoning mechanisms. Experiments demonstrate that Graph Counselor outperforms existing methods in multiple graph reasoning tasks, exhibiting higher reasoning accuracy and generalization ability. Our code is available at https://github.com/gjq100/Graph-Counselor.git.
Existing work on improving language model reasoning typically explores a single solution path, which can be prone to errors. Inspired by perspective-taking in social studies, this paper introduces DiPT, a novel approach that complements current reasoning methods by explicitly incorporating diversified viewpoints. This approach allows the model to gain a deeper understanding of the problem's context and identify the most effective solution path during the inference stage. Additionally, it provides a general data-centric AI recipe for augmenting existing data to improve their quality for fine-tuning. Our empirical results demonstrate that DiPT can be flexibly integrated into existing methods that focus on a single reasoning approach, enhancing their reasoning performance and stability when presented with paraphrased problems. Furthermore, we illustrate improved context understanding by maintaining the model's safe outputs against "jailbreaking" prompts intentionally designed to bypass safeguards built into deployed models. Lastly, we show that fine-tuning with data enriched with diverse perspectives can boost the reasoning capabilities of the model compared to fine-tuning with raw data alone.
Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional relevance models, especially for long-tail and ambiguous queries. By incorporating Chain-of-Thought (CoT) reasoning, these approaches improve both accuracy and interpretability through multi-step reasoning. However, two key limitations remain: (1) most existing approaches rely on single-perspective CoT reasoning, which fails to capture the multifaceted nature of e-commerce relevance (e.g., user intent vs. attribute-level matching vs. business-specific rules); and (2) although CoT-enhanced LLMs offer rich reasoning capabilities, their high inference latency necessitates knowledge distillation for real-time deployment, yet current distillation methods discard the CoT rationale structure at inference, using it as a transient auxiliary signal and forfeiting its reasoning utility. To address these challenges, we propose a novel framework that better exploits CoT semantics throughout the optimization pipeline. Specifically, the teacher model leverages Multi-Perspective CoT (MPCoT) to generate diverse rationales and combines Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) to construct a more robust reasoner. For distillation, we introduce Latent Reasoning Knowledge Distillation (LRKD), which endows a student model with a lightweight inference-time latent reasoning extractor, allowing efficient and low-latency internalization of the LLM's sophisticated reasoning capabilities. Evaluated in offline experiments and online A/B tests on an e-commerce search advertising platform serving tens of millions of users daily, our method delivers significant offline gains, showing clear benefits in both commercial performance and user experience.
Phishing websites remain a major cybersecurity threat, exploiting deceptive structures, brand impersonation, and social engineering to evade detection. Recent advances in large language models (LLMs) have improved phishing detection through contextual understanding, yet most existing approaches rely on single-agent classification, which is prone to hallucination and often lacks interpretability and robustness. To address these limitations, we propose PhishDebate, a modular multi-agent LLM-based debate framework for phishing website detection. Four specialized agents independently analyze webpage aspects, including URL structure, HTML composition, semantic content, and brand impersonation, under the coordination of a Moderator and final Judge. Through structured debate and divergent reasoning, the framework achieves more accurate and interpretable decisions. By reducing uncertain predictions and providing transparent reasoning, PhishDebate functions as an analyst-augmentation system that lowers cognitive load and supports early, left-of-exploit detection of phishing threats. Evaluations on commercial LLMs show that PhishDebate achieves 98.2% recall on a real-world phishing dataset and outperforms single-agent and Chain-of-Thought (CoT) baselines. Its modular design enables agent-level configurability, allowing adaptation to varying resource and application requirements, and offers scalability to high-velocity, large-scale security data environments.
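The specialist/moderator/judge coordination described above might be organized roughly as in the sketch below. All names here (`run_debate`, the specialist lambdas, the majority-vote judge) are hypothetical illustrations, not PhishDebate's actual API:

```python
# Structural sketch (assumed names, not the paper's implementation): each
# specialist returns (verdict, rationale); the moderator checks agreement,
# and a judge resolves disagreements.
def run_debate(page, specialists, judge):
    opinions = {name: fn(page) for name, fn in specialists.items()}
    verdicts = [v for v, _ in opinions.values()]
    unanimous = len(set(verdicts)) == 1
    # Moderator step: unanimous rounds short-circuit; otherwise escalate.
    return verdicts[0] if unanimous else judge(opinions)

specialists = {
    "url":     lambda p: ("login-" in p["url"], "suspicious token in URL"),
    "content": lambda p: ("verify account" in p["text"], "urgency language"),
}
judge = lambda ops: sum(v for v, _ in ops.values()) * 2 > len(ops)  # majority
page = {"url": "http://login-bank.example", "text": "please verify account"}
assert run_debate(page, specialists, judge) is True
```

A real system would replace the lambdas with LLM calls and run multiple debate rounds; the point here is only the division of labor among specialists, moderator, and judge.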
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
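A UCB-style choice between collaborating and competing, as the abstract describes, could look like the following sketch. The reward bookkeeping and the exploration constant `c` are assumptions for illustration; the paper's actual verifier signals are not modeled:

```python
# Sketch of a UCB1-style arm choice between "collaborate" and "compete".
import math

def ucb_choice(stats, t, c=1.4):
    """stats: {arm: (pulls, total_reward)}; t: current round (>= 1)."""
    def score(arm):
        n, total = stats[arm]
        if n == 0:
            return float("inf")          # explore untried arms first
        return total / n + c * math.sqrt(math.log(t) / n)
    return max(stats, key=score)

# "compete" has a lower mean reward (0.5 vs. 0.8) but far fewer pulls,
# so the exploration bonus makes it the chosen arm this round:
stats = {"collaborate": (5, 4.0), "compete": (2, 1.0)}
choice = ucb_choice(stats, t=8)
```

The uncertainty-driven exploration the abstract mentions falls out of the `sqrt(log t / n)` bonus: rarely tried strategies keep getting revisited until their value estimates stabilize.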
This paper presents Project Riley, a novel multimodal and multi-model conversational AI architecture oriented towards the simulation of reasoning influenced by emotional states. Drawing inspiration from Pixar's Inside Out, the system comprises five distinct emotional agents - Joy, Sadness, Fear, Anger, and Disgust - that engage in structured multi-round dialogues to generate, criticise, and iteratively refine responses. A final reasoning mechanism synthesises the contributions of these agents into a coherent output that either reflects the dominant emotion or integrates multiple perspectives. The architecture incorporates both textual and visual large language models (LLMs), alongside advanced reasoning and self-refinement processes. A functional prototype was deployed locally in an offline environment, optimised for emotional expressiveness and computational efficiency. From this initial prototype, another one emerged, called Armando, which was developed for use in emergency contexts, delivering emotionally calibrated and factually accurate information through the integration of Retrieval-Augmented Generation (RAG) and cumulative context tracking. The Project Riley prototype was evaluated through user testing, in which participants interacted with the chatbot and completed a structured questionnaire assessing three dimensions: Emotional Appropriateness, Clarity and Utility, and Naturalness and Human-likeness. The results indicate strong performance in structured scenarios, particularly with respect to emotional alignment and communicative clarity.
The rapid advancement of LLMs has led to the creation of diverse agentic systems in data analysis, utilizing LLMs' capabilities to improve insight generation and visualization. In this paper, we present an agentic system that automates the data-to-dashboard pipeline through modular LLM agents capable of domain detection, concept extraction, multi-perspective analysis generation, and iterative self-reflection. Unlike existing chart QA systems, our framework simulates the analytical reasoning process of business analysts by retrieving domain-relevant knowledge and adapting to diverse datasets without relying on closed ontologies or question templates. We evaluate our system on three datasets across different domains. Benchmarked against GPT-4o with a single-prompt baseline, our approach shows improved insightfulness, domain relevance, and analytical depth, as measured by tailored evaluation metrics and qualitative human assessment. This work contributes a novel modular pipeline to bridge the path from raw data to visualization, and opens new opportunities for human-in-the-loop validation by domain experts in business analytics. All code can be found here: https://github.com/77luvC/D2D_Data2Dashboard
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
Large Language Models (LLMs) are increasingly instantiated as interacting agents in multi-agent systems (MAS), where collective decisions emerge through social interaction rather than independent reasoning. A fundamental yet underexplored mechanism in this process is conformity, the tendency of agents to align their judgments with prevailing group opinions. This paper presents a systematic study of how network topology shapes conformity dynamics in LLM-based MAS through a misinformation detection task. We introduce a confidence-normalized pooling rule that controls the trade-off between self-reliance and social influence, enabling comparisons between two canonical decision paradigms: Centralized Aggregation and Distributed Consensus. Experimental results demonstrate that network topology critically governs both the efficiency and robustness of collective judgments. Centralized structures enable immediate decisions but are sensitive to hub competence and exhibit same-model alignment biases. In contrast, distributed structures promote more robust consensus, while increased network connectivity speeds up convergence but also heightens the risk of wrong-but-sure cascades, in which agents converge on incorrect decisions with high confidence. These findings characterize the conformity dynamics in LLM-based MAS, clarifying how network topology and self-social weighting jointly shape the efficiency, robustness, and failure modes of collective decision-making.
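A confidence-normalized pooling rule of the kind described might look like the sketch below, where `lam` controls the self-reliance vs. social-influence trade-off. The exact functional form and the default `lam` are assumptions, not the paper's definition:

```python
# Illustrative pooling rule: an agent's updated belief mixes its own
# judgment with confidence-weighted neighbor opinions (beliefs in [0, 1]).
def pool(self_belief, neighbors, lam=0.5):
    """neighbors: list of (belief, confidence) pairs."""
    if not neighbors:
        return self_belief
    total_conf = sum(c for _, c in neighbors)
    social = sum(b * c for b, c in neighbors) / total_conf
    return lam * self_belief + (1 - lam) * social

# One confident dissenter outweighs two unsure supporters:
updated = pool(0.9, [(0.1, 0.9), (0.8, 0.1), (0.8, 0.1)])
```

Setting `lam` near 1 models the reasoning-optimized, influence-resistant agents the study contrasts with highly conforming ones; the wrong-but-sure cascades arise when high-confidence incorrect opinions dominate the social term.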
Recent advances in Large Language Models (LLMs) and multimodal foundation models have significantly broadened their application in robotics and collaborative systems. However, effective multi-agent interaction necessitates robust perspective-taking capabilities, enabling models to interpret both physical and epistemic viewpoints. Current training paradigms often neglect these interactive contexts, resulting in challenges when models must reason about the subjectivity of individual perspectives or navigate environments with multiple observers. This study evaluates whether explicitly incorporating diverse points of view using the ReAct framework, an approach that integrates reasoning and acting, can enhance an LLM's ability to understand and ground the demands of other agents. We extend the classic Director task by introducing active visual exploration across a suite of seven scenarios of increasing perspective-taking complexity. These scenarios are designed to challenge the agent's capacity to resolve referential ambiguity based on visual access and interaction, under varying state representations and prompting strategies, including ReAct-style reasoning. Our results demonstrate that explicit perspective cues, combined with active exploration strategies, significantly improve the model's interpretative accuracy and collaborative effectiveness. These findings highlight the potential of integrating active perception with perspective-taking mechanisms in advancing LLMs'application in robotics and multi-agent systems, setting a foundation for future research into adaptive and context-aware AI systems.
Large language models (LLMs) face persistent challenges when handling long-context tasks, most notably the "lost in the middle" issue, where information located in the middle of a long input tends to be underutilized. Some existing methods that reduce the input risk discarding key information, while others that extend context windows often lead to attention dispersion. To address these limitations, we propose Tree of Agents (TOA), a multi-agent reasoning framework that segments the input into chunks processed by independent agents. Each agent generates its local cognition, then agents dynamically exchange information for collaborative reasoning along tree-structured paths. TOA enables agents to probe different reasoning orders for multi-perspective understanding, effectively mitigating position bias and reducing hallucinations. To improve processing efficiency, we incorporate prefix-hash caching and adaptive pruning strategies, achieving significant performance improvements with comparable API overhead. Experiments show that TOA, powered by compact LLaMA3.1-8B, significantly outperforms multiple baselines and demonstrates comparable performance to the latest and much larger commercial models, such as Gemini1.5-pro, on various long-context tasks. Code is available at https://github.com/Aireduce952/Tree-of-Agents.
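The chunking and prefix-hash caching steps can be sketched as follows; `chunk` and `PrefixCache` are hypothetical helper names chosen for illustration, not the API of the linked repository:

```python
import hashlib

def chunk(text, size):
    """Segment the long input into fixed-size chunks, one per agent."""
    return [text[i:i + size] for i in range(0, len(text), size)]

class PrefixCache:
    """Cache intermediate results keyed by a hash of the chunk-order prefix,
    so reasoning orders that share a prefix reuse work instead of recomputing."""

    def __init__(self):
        self._store = {}

    def _key(self, order_prefix):
        return hashlib.sha256(",".join(map(str, order_prefix)).encode()).hexdigest()

    def get_or_compute(self, order_prefix, compute):
        k = self._key(order_prefix)
        if k not in self._store:          # miss: run the (expensive) per-agent step
            self._store[k] = compute(order_prefix)
        return self._store[k]             # hit: reuse the cached local cognition
```

When agents probe permutations such as (0, 1, 2) and (0, 1, 3), the shared prefix (0, 1) is computed only once, which is how the multi-order exploration can keep API overhead comparable to a single pass.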
Large Language Models (LLMs) are exhibiting emergent human-like abilities and are increasingly envisioned as the foundation for simulating an individual's communication style, behavioral tendencies, and personality traits. However, current evaluations of LLM-based persona simulation remain limited: most rely on synthetic dialogues, lack systematic frameworks, and lack analysis of capability requirements. To address these limitations, we introduce TwinVoice, a comprehensive benchmark for assessing persona simulation across diverse real-world contexts. TwinVoice encompasses three dimensions: Social Persona (public social interactions), Interpersonal Persona (private dialogues), and Narrative Persona (role-based expression). It further decomposes the evaluation of LLM performance into six fundamental capabilities, including opinion consistency, memory recall, logical reasoning, lexical fidelity, persona tone, and syntactic style. Experimental results reveal that while advanced models achieve moderate accuracy in persona simulation, they still fall short in capabilities such as syntactic style and memory recall. Consequently, the average performance achieved by LLMs remains considerably below the human baseline.
Large language models (LLMs) struggle in social science domains, where critical thinking and human-level inference are crucial. In this work, we propose a multi-agent social reasoning framework that leverages the generative and reasoning capabilities of LLMs to generate and evaluate reasons from multiple perspectives grounded in social science theories, and constructs a factor graph for inference. Experimental results on understanding power dynamics in conversations show that our method outperforms standard prompting baselines, demonstrating its potential for tackling hard Computational Social Science (CSS) tasks.
Cyber-Physical-Social Systems (CPSS) have emerged as a transformative paradigm in recent years, embracing computational processes, physical systems, and human social interactions within an integrated architectural framework. Advances in artificial intelligence technologies are targeted at addressing the complexity of CPSS design, especially in modeling human reactions in cyber-physical environments. Notably, LLM-based agents have shown significant potential, and numerous studies have leveraged multi-agent collaboration frameworks to solve reasoning tasks. Some approaches achieve multi-agent collaboration through a debate or communication setting. However, these approaches only use the existing capabilities of LLMs and fail to enhance their problem-solving performance. Other works incorporate the responses of other LLMs into their training trajectories to train individual LLMs in a reinforcement learning setting. We argue that effective collaboration should align not only input information but also optimization objectives. Furthermore, in current cooperative frameworks, some LLMs tend to redundantly repeat others' viewpoints, contributing minimally to solving problems. In this paper, inspired by multi-agent reinforcement learning research, we propose MACT, a Multi-Agent Cooperative Tuning framework that jointly trains multiple LLMs, ensuring that the optimization of each agent aligns directly with the objective of the global task. We equip each agent with a critic network to facilitate individual optimization. Furthermore, to encourage different agents to complement each other and contribute to the overall task, we employ a mixing network that ensures the value of each agent is monotonically consistent with the total value. Experimental results reveal that our method significantly enhances cooperative problem-solving capabilities in the LLM multi-agent framework, providing strong evidence for modeling human reactions within CPSS.
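The monotonicity property of the mixing network can be sketched in one line, QMIX-style: taking the absolute value of the mixing weights makes the global value non-decreasing in every agent's individual value. The abstract describes this constraint only qualitatively, so the function below is an assumed simplification for illustration, not MACT's actual network:

```python
import numpy as np

def monotonic_mix(agent_values, weights, bias):
    """Mix per-agent values into one global value. Using |weights| guarantees
    d(total)/d(agent_value_i) >= 0: improving any agent never hurts the team."""
    return float(np.abs(np.asarray(weights)) @ np.asarray(agent_values) + bias)
```

In QMIX-like architectures the weights themselves come from a hypernetwork conditioned on the global state; here they are plain arguments to keep the sketch minimal.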
In this paper, we present early work exploring two essential socio-cognitive capacities for intelligent social behavior: perspective-taking, the ability to represent and utilize others' viewpoints, and abductive reasoning, the ability to generate plausible explanations. Together, these competencies facilitate the generation of hypotheses about self, other, and the world. We describe extensions to the Event Calculus and Abductive Event Calculus that support these capacities, and provide a set of minimal, illustrative scenarios that demonstrate how these capacities facilitate practical reasoning tasks.
No abstract available
Understanding Theory of Mind is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MoMentS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (MLLMs) through realistic, narrative-rich scenarios presented in short films. MoMentS includes over 2,300 multiple-choice questions spanning seven distinct ToM categories. The benchmark features long video context windows and realistic social interactions that provide deeper insight into characters' mental states. We evaluate several MLLMs and find that although vision generally improves performance, models still struggle to integrate it effectively. For audio, models that process dialogues as audio do not consistently outperform transcript-based inputs. Our findings highlight the need to improve multimodal integration and point to open challenges that must be addressed to advance AI's social understanding.
In this study, we extend our robot trust model into a multimodal setting in which the Nao robot leverages audio-visual data to perform a sequential multimodal pattern recalling task while interacting with a human partner who has different guiding strategies: reliable, unreliable, and random. Here, the humanoid robot is equipped with a multimodal auto-associative memory module to process audio-visual patterns to extract cognitive load (i.e., computational cost) and an internal reward module to perform cost-guided reinforcement learning. After interactive experiments, the robot associates a low cognitive load (i.e., high cumulative reward) yielded during the interaction with high trustworthiness of the guiding strategy of the partner. At the end of the experiment, we provide a free choice to the robot to select a trustworthy instructor. We show that the robot forms trust in a reliable partner. In the second setting of the same experiment, we endow the robot with an additional simple theory of mind module to assess the efficacy of the instructor in helping the robot perform the task. Our results show that the performance of the robot is improved when the robot bases its action decisions on factoring in the instructor assessment.
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
Recent advances in agentic AI have led to systems capable of autonomous task execution and language-based reasoning, yet their spatial reasoning abilities remain limited and underexplored, largely constrained to symbolic and sequential processing. In contrast, human spatial intelligence, rooted in integrated multisensory perception, spatial memory, and cognitive maps, enables flexible, context-aware decision-making in unstructured environments. Therefore, bridging this gap is critical for advancing Agentic Spatial Intelligence toward better interaction with the physical 3D world. To this end, we first start from scrutinizing the spatial neural models as studied in computational neuroscience, and accordingly introduce a novel computational framework grounded in neuroscience principles. This framework maps core biological functions to six essential computation modules: bio-inspired multimodal sensing, multi-sensory integration, egocentric-allocentric conversion, an artificial cognitive map, spatial memory, and spatial reasoning. Together, these modules form a perspective landscape for agentic spatial reasoning capability across both virtual and physical environments. Building on this framework, we conduct a framework-guided analysis of recent methods, evaluating their relevance to each module and identifying critical gaps that hinder the development of more neuroscience-grounded spatial reasoning modules. We further examine emerging benchmarks and datasets and explore potential application domains ranging from virtual to embodied systems, such as robotics. Finally, we outline potential research directions, emphasizing the promising roadmap that can generalize spatial reasoning across dynamic or unstructured environments. We hope this work will benefit the research community with a neuroscience-grounded perspective and a structured pathway. Our project page can be found at Github.
In order to achieve a widespread adoption of social robots in the near future, we need to design intelligent systems that are able to autonomously understand our beliefs and preferences. This will pave the foundation for a new generation of robots able to navigate the complexities of human societies. To reach this goal, we look into Theory of Mind (ToM): the cognitive ability to understand other agents’ mental states. In this paper, we rely on a probabilistic ToM model to detect when a human has false beliefs with the purpose of driving the decision-making process of a collaborative robot. In particular, we recreate an established psychology experiment involving the search for a toy that can be secretly displaced by a malicious individual. The results that we have obtained in simulated experiments show that the agent is able to predict human mental states and detect when false beliefs have arisen. We then explored the set-up in a real-world human interaction to assess the feasibility of such an experiment with a humanoid social robot.
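The displaced-toy experiment reduces to a compact belief-tracking rule: the robot updates its model of the human's believed toy location only on displacement events the human actually observed. The sketch below is a deterministic simplification of the paper's probabilistic ToM model, with hypothetical names:

```python
def track_beliefs(events):
    """events: (new_location, seen_by_human) pairs in temporal order.
    Returns (location the human believes the toy is in, true location)."""
    believed = true = None
    for location, seen_by_human in events:
        true = location                   # the world state always updates
        if seen_by_human:                 # the human's belief updates only when observed
            believed = location
    return believed, true

def has_false_belief(events):
    """A false belief arises when the believed and true locations diverge."""
    believed, true = track_beliefs(events)
    return believed != true
```

For the classic setup, a secret displacement such as `[("box_A", True), ("box_B", False)]` yields a false belief, which the collaborative robot can then use to decide whether to intervene or correct the human.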
With the rapid advancements in emotion-processing algorithms within artificial intelligence, it is essential to explore the evolving relationships between humans and robots. This exploration can prepare society for the future widespread application of social robots and address the new social dynamics involving AI agents. Human–robot empathy emerges as a crucial avenue for exploring the emotional connections between humans and robots. The purpose of this study was to investigate the impact of users' perceptions of robots' minds on human–robot empathy. This study manipulated perceived theory of mind (ToM) in robots through human–robot interaction scenarios, utilizing four experiments to assess the effects of perceived ToM in robots, categorized as cognitive ToM (cToM) in Experiments 1a and 2a and affective ToM (aToM) in Experiments 1b and 2b, on human–robot empathy, including pain empathy and empathic concern. Experiments 1a and 1b examined the influence of perceived ToM in robots on human–robot empathy within classic ToM scenarios, while Experiments 2a and 2b were conducted within real service contexts, further investigating the mediating role of users' mind perceptions of robots. First, perceiving a robot with high aToM significantly enhanced users' pain empathy and empathic concern towards robots, with the experience dimension of mind perception potentially serving as an indirect-only mediator in this relationship. Second, in real home service scenarios in Experiment 2, while the total effect of high cToM on empathic concern was not statistically significant after multiple comparisons correction, mediation analysis revealed a significant negative direct effect alongside a positive indirect effect through agency. This pattern suggests that perceiving high cToM may simultaneously inhibit empathic concern directly while potentially fostering it through enhanced agency perception. The findings demonstrate that perceived aToM in robots consistently enhances human–robot emotional interactions, while revealing a more complex dual-pathway mechanism for cToM effects. These results provide valuable insights into how distinct dimensions of mind perception shape human–robot relationships.
Knowledge editing enables multimodal large language models (MLLMs) to efficiently update outdated or incorrect information. However, existing benchmarks primarily emphasize cognitive-level modifications while lacking a focus on deeper meta-cognitive processes. To bridge this gap, we introduce CogEdit, a novel benchmark designed to evaluate MLLMs' meta-cognitive knowledge editing abilities across three levels: (1) Counterfactual-Driven Editing, assessing self-awareness of knowledge correctness changes; (2) Boundary Constraint Editing, ensuring appropriate generalization without unintended interference; and (3) Noise-Robust Editing, promoting reflective evaluation of uncertain information. To advance meta-cognitive editing, we propose MIND (Meta-cognitive INtegrated Dynamic Knowledge Editing), a framework that constructs a meta-knowledge memory for self-awareness, employs game-theoretic interactions to monitor knowledge activation, and incorporates label refinement for noise-robust updates. Extensive experiments show that MIND significantly outperforms existing cognitive editing approaches, achieving strong performance on both traditional and meta-cognitive knowledge editing benchmarks.
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents -- some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
Most existing Theory of Mind (ToM) benchmarks for foundation models rely on variations of the Sally-Anne test, offering only a very limited perspective on ToM and neglecting the complexity of human social interactions. To address this gap, we propose ToM-SSI: a new benchmark specifically designed to test ToM capabilities in environments rich with social interactions and spatial dynamics. While current ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and includes group interactions of up to four agents that communicate and move in situated environments. This unique design allows us to study, for the first time, mixed cooperative-obstructive settings and reasoning about multiple agents' mental states in parallel, thus capturing a wider range of social cognition than existing benchmarks. Our evaluations reveal that the current models' performance is still severely limited, especially in these new tasks, highlighting critical gaps for future research.
Social networks are important platforms for people in modern society to share information and knowledge, and WeChat Moments is a prominent example. Investigating the factors that influence users' willingness to share knowledge is significant both for the dissemination of knowledge and for characterizing user behavior. Based on the theory of reasoned action and attitude theory, this paper takes 199 WeChat users as a sample and explores the factors influencing users' knowledge-sharing willingness through structural equation modeling. The results show that information source, information characteristics, and information quality are important factors affecting users' willingness to share. (谢鸿婷, 朱娟, 九江学院信息学院; published at the 4th International Conference on Humanities Science and Society Development (ICHSSD 2019), Advances in Social Science, Education and Humanities Research, volume 328, Atlantis Press.)
This report synthesizes recent research on multi-agent systems in social computing, forming a complete picture that spans low-level cognitive architectures to high-level social governance. The core of the field has shifted from purely text-based interaction toward embodied agents equipped with multimodal perception, long-term social memory, and Theory of Mind (ToM) reasoning. Key trends include: 1) the integration of structured memory with persona-consistent architectures; 2) simulation of complex social norms and group dynamics; 3) precise detection and intent analysis of multimodal social events; and 4) a growing emphasis on trust, accountability, and safety governance in multi-agent collaboration. Together, these advances are driving AI agents toward becoming digitally embedded members of society with deep social understanding.