Agent memory systems
Hierarchical / OS-style memory architectures and memory-management mechanisms for LLM agents
These works explicitly model memory as a systems-engineering problem of hierarchical storage plus cross-tier data flow, updating, retrieval, and generation. Drawing on operating-system designs or cognitive hierarchies (short/mid/long-term; working/episodic/semantic), they focus on lifecycle management, routing, and controllability, thereby mitigating context-window limits and improving performance in long dialogues and long-term personalization.
- Memory OS of AI Agent (Jie Kang, Mingming Ji, Zhe Zhao, Ting Bai, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- A Hybrid, Multi-Layered Memory Architecture for Collaborative Reasoning in Multi-Agent Systems (M. Ilin, Dmitry Pavlyuk, 2025, 2025 3rd International Conference on Foundation and Large Language Models (FLLM))
- Cooperative Scheduling and Hierarchical Memory Model for Multi-Agent Systems (Huhai Zou, Rongzhen Li, Tianhao Sun, Fei Wang, Ta-Hsin Li, Kai Liu, 2024, 2024 IEEE International Symposium on Product Compliance Engineering - Asia (ISPCE-ASIA))
- A Hybrid Agent Model, Mixing Short Term and Long Term Memory Abilities (F. Torterolo, C. Garbay, 1998, Lecture Notes in Computer Science)
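The tiered designs surveyed above can be sketched as a toy three-level store with FIFO promotion between tiers. This is a minimal illustration, not any paper's actual implementation: the capacities, the FIFO rule, and the placeholder summary step are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class HierarchicalMemory:
    """Toy three-tier store: short-term (FIFO queue), mid-term (segments),
    long-term (consolidated summaries)."""
    short_capacity: int = 3
    mid_capacity: int = 2
    short_term: deque = field(default_factory=deque)
    mid_term: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        # New dialogue turns enter short-term memory; overflow is promoted FIFO.
        self.short_term.append(turn)
        if len(self.short_term) > self.short_capacity:
            self._promote_to_mid(self.short_term.popleft())

    def _promote_to_mid(self, turn: str) -> None:
        # A full mid-term tier consolidates its oldest segment into a
        # long-term summary (here a trivial string; real systems use the LLM).
        self.mid_term.append(turn)
        if len(self.mid_term) > self.mid_capacity:
            oldest = self.mid_term.pop(0)
            self.long_term.append(f"summary({oldest})")

    def context(self) -> list:
        # Retrieval for generation: long-term facts first, then recent turns.
        return self.long_term + self.mid_term + list(self.short_term)
```

With the default capacities, feeding seven turns leaves the three most recent verbatim in short-term, two in mid-term, and two consolidated summaries in long-term.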
Retrieval-augmented generation (RAG) and external memory banks: retrieve-rerank-generate with adaptive updating
What these papers share: they organize external memory (language-program pairs, experience snippets, evidence, vector-store entries, etc.) as a retrievable data source, and center on how to retrieve, how to reduce noise, how to update/partition/rerank, and how to inject retrieval results into generation. The core goals are higher correctness on open-domain tasks and better long-term decision quality, while cutting the redundancy, noise, and compute overhead that retrieval introduces.
- Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models (Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki, 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)
- MemInsight: Autonomous Memory Augmentation for LLM Agents (Rana Salama, Jason Cai, Mingzhe Yuan, Anna Currey, Mahendra K. Sunkara, Yi Zhang, Yassine Benajiba, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks (Shuyue Jia, Subhrangshu Bit, Varuna Jasodanand, Yi Liu, Vijaya B. Kolachalama, 2026, International Journal of Medical Informatics)
- Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs (Zheng Wang, Zhongyang Li, Zeren Jiang, Dandan Tu, W. Shi, 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)
- Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation (Qitao Qin, Yucong Luo, Yihang Lu, Zhibo Chu, Xiaoman Liu, Xianwei Meng, 2025, Findings of the Association for Computational Linguistics: ACL 2025)
- M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions (Zheng Wang, Shu Mei Teo, Jieer Ouyang, Yongjun Xu, Wei Shi, 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- Retrieval-Augmented Embodied Agents (Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Vector Storage Based Long-term Memory Research on LLM (Kun Li, Xin Jing, Chengang Jing, 2024, International Journal of Advanced Network, Monitoring and Controls)
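The retrieve-rerank-inject pattern common to this group can be outlined with a deliberately simple term-overlap scorer standing in for embedding similarity; all function names, thresholds, and the prompt template are illustrative assumptions.

```python
def retrieve(memory_bank, query_terms, k=4):
    # First-stage retrieval: score every stored memory by word overlap
    # with the query (a stand-in for vector similarity search).
    scored = sorted(memory_bank,
                    key=lambda m: -len(set(m.split()) & set(query_terms)))
    return [m for m in scored[:k] if set(m.split()) & set(query_terms)]

def rerank_and_filter(candidates, query_terms, min_overlap=2):
    # Second stage: drop weakly matching candidates to cut retrieval noise.
    return [m for m in candidates
            if len(set(m.split()) & set(query_terms)) >= min_overlap]

def build_prompt(query, evidence):
    # Inject the surviving evidence into the generation prompt.
    lines = [f"[{i}] {e}" for i, e in enumerate(evidence, 1)]
    return "Evidence:\n" + "\n".join(lines) + f"\nQuestion: {query}"
```

The adaptive-updating variants above differ mainly in what happens between the stages: rewriting the query, deciding when to stop retrieving, or writing new entries back into the bank.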
Personalized / long-term user memory: hierarchical representation and retrieval adapted to user preferences and habits
These studies focus on memory organization and personalized service in long-term user interaction: via hierarchical memory frameworks, retrieval and generation oriented to user sessions and preferences, or access and organization principles derived from user cognition and expectations, so that agents remain consistent, controllable, and usable across sessions.
- Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction (Zhaopei Huang, Qifeng Dai, Guozheng Wu, Xiaopeng Wu, Xubin Li, Tiezheng Ge, Wenxuan Wang, Qin Jin, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- Memory Fabric for Conversational AI Agents: Enabling Shared and Persistent Memory Across Users (A Tiwari, V Gupta, 2025, Authorea Preprints)
- Users' Expectations and Practices with Agent Memory (Brennan Jones, Kelsey Stemmler, Emily Su, Young-Ho Kim, A. Kuzminykh, 2025, Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs (Zheng Wang, Zhongyang Li, Zeren Jiang, Dandan Tu, W. Shi, 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)
- Multi-agent Personal Memory Assistant (Ângelo Costa, P. Novais, Ricardo Costa, J. Corchado, J. Neves, 2010, Advances in Intelligent and Soft Computing)
Working-memory optimization and hierarchical subgoal memory chunking (scalable context management for long-horizon tasks)
Their common thread: they attribute the long-horizon capability bottleneck to poor working-memory/context management, and use chunking/summarization/subgoals or end-to-end memory-reasoning-policy coupling to reduce redundant input, improve the use of historical information, and stabilize decisions, targeting long-horizon sequential tasks and partially observable environments in particular.
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model (Mengkang Hu, Tianxing Chen, Qiguang Chen, Yi Mu, Wenqi Shao, Ping Luo, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents (Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith A. Lambert, Adam Amos-Binks, Zohreh Dannenhauer, Dustin Dannenhauer, 2024, Proceedings of the AAAI Symposium Series)
- Cognitive Modeling for Long-Horizon Agent Learning via Integrated Long-Term Memory and Reasoning (Linghao Yang, Tian Guan, Yumeng Ma, Zhongkang Li, Zhou Fang, Feiyang Wang, 2026, … Networks and Machine …)
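The subgoal-as-memory-chunk idea (HiAgent and related work above) can be sketched as follows. The one-line summarizer is a placeholder assumption: the actual systems ask the LLM to summarize a finished subgoal's trace.

```python
class SubgoalWorkingMemory:
    """Subgoal-chunked working memory: only the active subgoal's raw
    action-observation pairs are kept verbatim; finished subgoals collapse
    to a one-line summary."""

    def __init__(self):
        self.completed = []        # (subgoal, summary) pairs
        self.active_subgoal = None
        self.active_trace = []     # raw (action, observation) pairs

    def start_subgoal(self, subgoal):
        # Opening a new subgoal seals the previous one first.
        if self.active_subgoal is not None:
            self.finish_subgoal()
        self.active_subgoal = subgoal

    def record(self, action, observation):
        self.active_trace.append((action, observation))

    def finish_subgoal(self):
        # Stand-in summarizer: keep only the final observation.
        summary = self.active_trace[-1][1] if self.active_trace else "no-op"
        self.completed.append((self.active_subgoal, summary))
        self.active_subgoal, self.active_trace = None, []

    def prompt_context(self):
        # Compact context: summaries of done subgoals + the live trace only,
        # instead of the entire action-observation history.
        done = [f"done: {g} -> {s}" for g, s in self.completed]
        live = [f"{a} => {o}" for a, o in self.active_trace]
        return done + [f"current: {self.active_subgoal}"] + live
```

The point of the structure is that context size grows with the number of subgoals, not with the number of raw steps.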
Episodic / event-based / neuro-inspired continual memory and value governance (memory consolidation and alignment)
This group emphasizes event-based, episodic storage of experience; utility-based consolidation, decay, and governance; and memory fusion and alignment control across sessions or in dynamic systems. The shared goal is to let agents accumulate experience continually and stay robust across multi-session or multi-stage behavior, while using explicit mechanisms to constrain output quality and consistency.
- Enhancing intelligent agents with episodic memory (Andrew Nuxoll, John E. Laird, 2012, Cognitive Systems Research)
- Value-Driven Memory-Augmented Generation for Agentic LLMs: Towards Structured and Adaptive Knowledge Utilization (C. H. Tan, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- An Intelligent Multi-Agent Memory Assistant (Ângelo Costa, P. Novais, 2011, Communications in Medical and Care Compunetics)
- Designing Synthetic Memory Systems for Supporting Autonomous Embodied Agent Behaviour (Christopher E. Peters, 2006, ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication)
- AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation (Ming Wang, Peidong Wang, L. Wu, Xiaocui Yang, Daling Wang, Shi Feng, Yuxin Chen, Bixuan Wang, Yifei Zhang, 2025, Findings of the Association for Computational Linguistics: ACL 2025)
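The utility-driven consolidation/decay theme above can be illustrated with an exponential-retention rule in which utility slows forgetting. The decay law, the coefficients, and the threshold are illustrative assumptions, not any paper's calibration.

```python
import math

def consolidate(episodes, now, keep_threshold=0.35):
    """Utility-gated consolidation sketch. Each episode is a dict with
    'content', 't' (creation step), and 'utility' in [0, 1]. Retention decays
    exponentially with age, but higher-utility episodes decay more slowly;
    episodes below the threshold are forgotten, survivors are reinforced."""
    kept = []
    for ep in episodes:
        strength = 1.0 + 4.0 * ep["utility"]           # utility slows decay
        score = math.exp(-(now - ep["t"]) / strength)  # exponential retention
        if score >= keep_threshold:
            # Rehearsal effect: surviving a consolidation pass raises utility.
            kept.append({**ep, "utility": min(1.0, ep["utility"] + 0.05)})
    return kept
```

A value-governance layer of the kind described above would sit downstream of this, vetting candidate outputs before the surviving memories influence final responses.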
Graph / binary / structured memory formats and auditable, computable memory substrates
These works treat memory as a computable data structure or storage medium (e.g., a binary graph format, an agent-native embedded database, a cache-and-prune memory bank for evidence retrieval), focusing on efficient navigation, persistence, auditability, and low-latency retrieval, moving memory systems from "text concatenation" toward designs that are structured, engineerable, controllable, and traceable.
- AgenticMemory: A Binary Graph Format for Persistent, Portable, and Navigable AI Agent Memory (Omoshola S. Owolabi, 2026)
- AEVUM: An Agent-Native Persistent Memory Database System with Autonomous Data Management and Multi-Stage Compression (JR Maligireddy, 2026, Authorea Preprints)
- Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks (Shuyue Jia, Subhrangshu Bit, Varuna Jasodanand, Yi Liu, Vijaya B. Kolachalama, 2026, International Journal of Medical Informatics)
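The "retrieval as graph navigation, not text search" framing (AgenticMemory above) can be sketched with a plain adjacency-list store. The schema and relation names are illustrative; the real systems add binary encoding, compression, and audit logs on top.

```python
class GraphMemory:
    """Minimal memory graph: retrieval is navigation from a seed node along
    typed edges rather than full-text search over concatenated history."""

    def __init__(self):
        self.nodes = {}   # node_id -> payload dict
        self.edges = {}   # node_id -> list of (relation, node_id)

    def add_node(self, node_id, payload):
        self.nodes[node_id] = payload
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def navigate(self, start, relation, depth=1):
        # Follow edges of one relation type outward from a seed node,
        # up to `depth` hops, and return the payloads reached.
        frontier, seen = {start}, set()
        for _ in range(depth):
            nxt = set()
            for node in frontier:
                for rel, dst in self.edges.get(node, []):
                    if rel == relation and dst not in seen:
                        nxt.add(dst)
                        seen.add(dst)
            frontier = nxt
        return [self.nodes[n] for n in sorted(seen)]
```

Because every read is an explicit traversal over typed edges, each retrieval can be logged and replayed, which is what makes this class of designs auditable.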
Multi-agent shared / corporate / distributed memory management and protocols (including scheduling and real-time concerns)
Common thread: in multi-agent or organizational-collaboration settings, these works handle the creation, dissemination, reuse, and consistency of shared memory, and further discuss real-time protocols and efficiency optimizations for distributed storage, retrieval, update, and deletion (e.g., event-driven designs, intent-indexed graphs, negotiation and offloading), emphasizing system-level scalability and low latency.
- A multi-agent system for building project memories to facilitate the design process (D. Monticolo, Vincent Hilaire, S. Gomes, A. Koukam, 2008, Integrated Computer-Aided Engineering)
- Multi-Agent Corporate Memory Management System (Fabien Gandon, Agostino Poggi, Giovanni Rimassa, Paola Turci, 2002, Applied Artificial Intelligence)
- An Intelligent Multi-Agent Memory Assistant (Ângelo Costa, P. Novais, 2011, Communications in Medical and Care Compunetics)
- MemIndex: Agentic Event-based Distributed Memory Management for Multi-agent Systems (Alaa Saleh, Sasu Tarkoma, Anders Lindgren, Praveen Kumar Donta, S. Dustdar, Susanna Pirttikangas, Lauri Lovén, 2025, ACM Transactions on Autonomous and Adaptive Systems)
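The intent-indexed organization that MemIndex describes can be gestured at with a toy bipartite mapping between intents and memory keys. This is purely illustrative: the actual system adds negotiation, offloading, and physical distribution across agents.

```python
class IntentIndexedMemory:
    """Toy intent-indexed shared store: a bipartite mapping between intents
    and memory keys lets agents resolve reads and deletes through intents
    instead of scanning the whole store."""

    def __init__(self):
        self.store = {}            # key -> value
        self.intent_to_keys = {}   # intent -> set of keys

    def put(self, key, value, intents):
        # Storage: index the entry under every intent it serves.
        self.store[key] = value
        for intent in intents:
            self.intent_to_keys.setdefault(intent, set()).add(key)

    def get_by_intent(self, intent):
        # Retrieval: only the keys registered for this intent are touched.
        return {k: self.store[k] for k in self.intent_to_keys.get(intent, set())}

    def delete(self, key):
        # Deletion must keep both sides of the bipartite index consistent.
        self.store.pop(key, None)
        for keys in self.intent_to_keys.values():
            keys.discard(key)
```

The four operations the MemIndex evaluation measures (storage, retrieval, update, deletion) all reduce to index lookups here, which is where the latency and CPU savings reported above would come from.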
Embodied and multimodal memory: long-term robot navigation, policy reuse, and scene-dependent memory
The common thread here is scene-history-dependent memory for embodied agents (robots) in long-horizon, partially observable environments. Multimodal memory (VLM descriptions, visual-action history) combined with retrievable policy banks or attention-based memory policies improves long-term task success rates and generalization.
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks (Kuan Fang, Alexander Toshev, Li Fei-Fei, S. Savarese, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models (Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki, 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)
- Retrieval-Augmented Embodied Agents (Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Software-engineering deployment of memory systems: production-grade implementation and execution loops (RAMP / vector-retrieval engineering)
This paper places the memory system inside a production-grade engineering architecture, built around a Reflect-Act-Memory-Plan execution loop, low-latency vector retrieval (e.g., Qdrant), and strictly typed data contracts. It addresses latency, state synchronization, and reproducibility, illustrating a practical engineering path for memory systems in real applications.
- High-Performance Implementation of Multi-Agent Web Systems: Integrating Vector Memory with Strictly Typed React Architectures (Mykhailo Nykoliuk, 2025, Universal Library of Engineering Technology)
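The shape of a Reflect-Act-Memory-Plan cycle can be sketched in a few lines, with plain callables standing in for the model-, tool-, and retrieval-backed components of the production stack described above; the signatures are assumptions for illustration.

```python
def ramp_step(state, memory, reflect, act, plan):
    """One turn of a Reflect-Act-Memory-Plan loop."""
    note = reflect(state, memory)                # Reflect: read state + recalled memory
    action, observation = act(state, note)       # Act: execute and observe the result
    memory.append((state, action, observation))  # Memory: persist the experience
    return plan(observation, memory), memory     # Plan: choose the next state
```

For example, with `reflect = lambda s, m: len(m)`, `act = lambda s, n: ("inc", s + 1)`, and `plan = lambda obs, m: obs`, three iterations from state 0 end with state 3 and three records in memory. In the production setting, `reflect` would query the vector store and the typed data contract would constrain what `memory.append` is allowed to persist.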
Overall, the literature falls into seven to eight main threads: (1) hierarchical/OS-style memory-management architectures; (2) RAG and external memory banks with retrieve-rerank-generate and adaptive updating; (3) user-facing long-term personalized memory organization and usability design; (4) chunked/subgoal-based context management for working memory and long-horizon tasks; (5) episodic/event-based continual memory with utility-driven consolidation and value governance; (6) structured/graph/binary/agent-native persistent memory substrates; (7) collaboration protocols and real-time concerns for multi-agent shared and distributed memory; and (8) scene-dependent memory and robot policy reuse for embodied tasks, plus a small number of implementation studies emphasizing production-grade engineering.
A total of 34 related papers.
Large Language Models (LLMs) face a crucial challenge from fixed context windows and inadequate memory management, leading to a severe shortage of long-term memory capabilities and limited personalization in the interactive experience with AI agents. To overcome this challenge, we innovatively propose a Memory Operating System, i.e., MemoryOS, to achieve comprehensive and efficient memory management for AI agents. Inspired by the memory management principles in operating systems, MemoryOS designs a hierarchical storage architecture and consists of four key modules: Memory Storage, Updating, Retrieval, and Generation. Specifically, the architecture comprises three levels of storage units: short-term memory, mid-term memory, and long-term personal memory. Key operations within MemoryOS include dynamic updates between storage units: short-term to mid-term updates follow a dialogue-chain-based FIFO principle, while mid-term to long-term updates use a segmented page organization strategy. Extensive experiments on the LoCoMo benchmark show an average improvement of 49.11% on F1 and 46.18% on BLEU-1 over the baselines on GPT-4o-mini, showing contextual coherence and personalized memory retention in long conversations.
Interactive applications are latency-sensitive systems that enable dynamic responses to user inputs in domains such as robotics, industrial automation, and autonomous control. These applications require efficient application protocols for communication, with the pub/sub model being one of the most promising approaches. However, existing pub/sub systems are architecturally constrained, particularly by limited memory capacity and inefficiencies in dynamic environments. Addressing these challenges requires effective distributed memory management, yet this aspect has received limited attention in existing research. This paper addresses the gap by proposing MemIndex, an adaptive and autonomous distributed memory-management framework with an intent-indexed bipartite graph architecture. It is designed for LLM-based multi-agent pub/sub systems, enabling agents to autonomously negotiate memory operations in real time through dynamic index spaces for efficient reasoning. We evaluate MemIndex using diverse models against two baselines. Experimental results show MemIndex outperforms both baselines across storage, retrieval, update, and deletion operations, achieving average reductions of about 34% and 56% in elapsed time, 57% and 75% in CPU utilization, and 23% and 76% in memory usage. Scalability tests further demonstrate that MemIndex maintains low end-to-end delay as submissions and agents grow, confirming that its negotiation-driven offloading enables efficient distributed memory management in interactive applications.
… -agent system for the management of a corporate memory. The innovative aspect of the system … were generally used separately until now: agent technology, knowledge modeling, XML …
… Section 4 presents the structure of our project memories model. We describe a multi-agent system used to build these project memories in section 5. Finally Section 6 describes the …
Constrained by the cost and ethical concerns of involving real seekers in AI-driven mental health, researchers develop LLM-based conversational agents (CAs) with tailored configurations, such as profiles, symptoms, and scenarios, to simulate seekers. While these efforts advance AI in mental health, achieving more realistic seeker simulation remains hindered by two key challenges: dynamic evolution and multi-session memory. Seekers' mental states often fluctuate during counseling, which typically spans multiple sessions. To address this, we propose AnnaAgent, an emotional and cognitive dynamic agent system equipped with tertiary memory. AnnaAgent incorporates an emotion modulator and a complaint elicitor trained on real counseling dialogues, enabling dynamic control of the simulator's configurations. Additionally, its tertiary memory mechanism effectively integrates short-term and long-term memory across sessions. Evaluation results, both automated and manual, demonstrate that AnnaAgent achieves more realistic seeker simulation in psychological counseling compared to existing baselines. The ethically reviewed and screened code can be found on https://github.com/sci-m-wang/AnnaAgent.
… present a single memory system, since often such a system will depend on … memory systems that are compatible in a general manner with computational attention and emotion systems. …
Large Language Models (LLMs) are leading a technological revolution. This gives agents based on LLMs renewed vitality, and multi-agent collaboration is showing the potential to foster new forms of intelligence. However, current multi-agent systems face two major challenges: the first is the issue of resource coordination and scheduling within multi-agent systems, and the second is the limitation of the context window in large models, which hinders the practical application of agents in long-term conversational scenarios, urgently requiring improvements in memory capabilities. To address these challenges, we propose a collaborative scheduling strategy and hierarchical memory model for LLM-based multi-agent systems inspired by operating systems. First, we design a time-sharing scheduling strategy, analogous to process scheduling in operating systems, which divides the resource usage cycle into finer-grained single-step workflows, allocating independent resource windows to different agents to reduce resource contention and conflicts. Second, we introduce a hierarchical memory model based on the multi-level cache architecture of operating systems, segmenting the agents' memory into core memory, main memory, and vague memory areas, thereby significantly improving memory retention and retrieval efficiency in LLM-based agents when handling complex tasks. Experimental results demonstrate that our proposed method achieves efficient resource allocation in multi-agent systems while significantly enhancing the memory capabilities of agents and overall system performance.
… reference system for modelling spatial prepositions, the “state transition semantics” system … language sentences, the agent’s episodic memory system and associated reflective demons, …
… can fulfil the development requisites of Memory Assistants is the Multi-Agent System one [12]. The … Generally, agent platforms support the interaction between the agents by means of …
… On top of this platform a Personal Memory Assistant and a Social Enabler where developed … a distributed system approach is adequate for developing multi-agent systems for healthcare …
AI agents have the potential to provide long-term personalized assistance to users, and this relies on effective long-term memory. While memory in agents has been extensively covered by prior work, there is little understanding of users’ expectations and practices with agent memory. As a preliminary investigation, we interviewed people who use AI tools with memory and analyzed online discussion posts of people’s experiences with such tools. We found that users often have incomplete mental models of how agents remember and recall information, and how their memories affect their behaviours. Users generally consider agents’ memories as belonging to different categories along a hierarchy from more generalized knowledge to more specific knowledge about the user or task. Users often desire the system’s memories to be cleanly organized by these categories. These findings reveal opportunities to design agent memory mechanisms to organize and control access to memories based on users’ task-based needs.
… that agent memory is a graph navigation problem, not a text search problem. When an agent … We present AgenticMemory, a binary graph format purpose-built for AI agent memory. The …
… and retrieval and we have used it to create agents for a variety of tasks. Our research suggests that episodic memory enhances the performance of AI agents and may be a “missing link” …
While Large Language Model (LLM) based multi-agent systems (MAS) show promise, their capabilities are constrained by simplistic, monolithic memory models. To address this, we propose a novel, four-tier hierarchical memory architecture inspired by cognitive science. Our architecture decomposes memory into an L1 Active Context, L2 Working Memory for significant facts, L3 Episodic Memory for experiences, and L4 Semantic Memory for distilled knowledge. Our core innovation lies in the autonomous lifecycle engines that govern information flow between these tiers, including a CIAR scoring model for significance-based Promotion and a multi-stage Consolidation/Distillation pipeline for long-term learning. This principled, production-ready design enables more robust and auditable reasoning. We validate the performance of the underlying storage layer through a micro-benchmark, demonstrating the high reliability and throughput required for mission-critical agentic systems.
… Memory Exchange Protocol evaluated through the lens of memory fabric. It also addresses work on dialogue and agent memory, multi-agent shared-memory designs, privacy and …
… Turing Machines and Key-Value Memory Networks to recent systems like AutoGen, … memory fabric. It also addresses work on dialogue and agent memory, multi-agent shared-memory …
This paper presents an approach to design a multi-agent system managing a corporate memory in the form of a distributed semantic web and describes the resulting architecture. The system was designed during the CoMMA European project (Corporate Memory Management through Agents) and aims at helping users in the management of a corporate memory, facilitating the creation, dissemination, transmission and reuse of knowledge in an organisation. The implementation integrated several emerging technologies: multi-agents system technology (using the JADE FIPA-compliant platform), knowledge modelling and XML technology for information retrieval (using the CORESE semantic search engine) and machine learning techniques. Here, we describe the agent roles and interactions, we explain the design rationale for the agent societies and we discuss the configuration and implementation issues.
In this paper, we provide a review of the current efforts to develop LLM agents, which are autonomous agents that leverage large language models. We examine the memory management approaches used in these agents. One crucial aspect of these agents is their long-term memory, which is often implemented using vector databases. We describe how vector databases are utilized to store and retrieve information in LLM agents. Moreover we highlight open problems, such as the separation of different types of memories and the management of memory over the agent's lifetime. Lastly, we propose several topics for future research to address these challenges and further enhance the capabilities of LLM agents, including the use of metadata in procedural and semantic memory and the integration of external knowledge sources with vector databases.
This study focuses on the tendency of agents in long-horizon sequential tasks to rely on short-term states and to underutilize historical information, and proposes a cognitive modeling and learning framework with long-term memory and reasoning capabilities. The framework provides a unified cognitive description of the agent's decision process. It introduces a structured long-term memory mechanism to support continuous storage and selective updating of cross-temporal key information. On this basis, a memory retrieval-driven reasoning module is constructed so that experience can explicitly participate in the formation of current decision logic. To address the separation between memory and decision making in conventional policy models, the framework tightly couples perception representation, memory management, reasoning processes, and policy generation into an end-to-end cognitive loop. This design strengthens goal consistency and behavioral stability in long-horizon interactive environments. Comparative evaluations in open source interactive task settings demonstrate consistent advantages in task completion quality, decision efficiency, and long-term information utilization. The results indicate that the proposed cognitive modeling framework effectively mitigates decision difficulties caused by long-range dependencies and partial observability. Overall, the study shows that integrating long-term memory and reasoning within a unified learning framework is an important approach for improving sustained decision-making capability in complex environments.
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.
Large Language Model (LLM)-based agents exhibit significant potential across various domains, operating as interactive systems that process environmental observations to generate executable actions for target tasks. The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs. We categorize memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. Instead, existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. Additionally, our analysis shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability.
… We try to clarify in this section the de nition and use of basic notions like the ones of reactive and cognitive agents, or short-term and long-term memory. Some emphasis is put on the …
In the age of mobile internet, user data, often referred to as memories, is continuously generated on personal devices. Effectively managing and utilizing this data to deliver services to users is a compelling research topic. In this paper, we introduce a novel task of crafting personalized agents powered by large language models (LLMs), which utilize a user's smartphone memories to enhance downstream applications with advanced LLM capabilities. To achieve this goal, we introduce EMG-RAG, a solution that combines Retrieval-Augmented Generation (RAG) techniques with an Editable Memory Graph (EMG). This approach is further optimized using Reinforcement Learning to address three distinct challenges: data collection, editability, and selectability. Extensive experiments on a real-world dataset validate the effectiveness of EMG-RAG, achieving an improvement of approximately 10% over the best existing approach. Additionally, the personalized agents have been transferred into a real smartphone AI assistant, which leads to enhanced usability.
Retrieval-Augmented Generation (RAG), by integrating non-parametric knowledge from external knowledge bases into models, has emerged as a promising approach to enhancing response accuracy while mitigating factual errors and hallucinations. This method has been widely applied in tasks such as Question Answering (QA). However, existing RAG methods struggle with open-domain QA tasks because they perform independent retrieval operations and directly incorporate the retrieved information into generation without maintaining a summarizing memory or using adaptive retrieval strategies, leading to noise from redundant information and insufficient information integration. To address these challenges, we propose Adaptive memory-based optimization for enhanced RAG (Amber) for open-domain QA tasks, which comprises an Agent-based Memory Updater, an Adaptive Information Collector, and a Multi-granular Content Filter, working together within an iterative memory updating paradigm. Specifically, Amber integrates and optimizes the language model's memory through a multi-agent collaborative approach, ensuring comprehensive knowledge integration from previous retrieval steps. It dynamically adjusts retrieval queries and decides when to stop retrieval based on the accumulated knowledge, enhancing retrieval efficiency and effectiveness. Additionally, it reduces noise by filtering irrelevant content at multiple levels, retaining essential information to improve overall model performance. We conduct extensive experiments on several open-domain QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The source code is available.
Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human approach in robotics, we introduce the Retrieval-Augmented Embodied Agent (RAEA). This innovative system equips robots with a form of shared memory, significantly enhancing their performance. Our approach integrates a policy retriever, allowing robots to access relevant strategies from an external policy memory bank based on multi-modal inputs. Additionally, a policy generator is employed to assimilate these strategies into the learning process, enabling robots to formulate effective responses to tasks. Extensive testing of RAEA in both simulated and real-world scenarios demonstrates its superior performance over traditional methods, representing a major leap forward in robotic technology.
… Our system features a lightweight retrieval-augmented generation pipeline for efficient evidence retrieval and reranking, coupled with a cache-and-prune memory bank, enabling …
… memory augmentation approach, MemInsight, to enhance semantic data representation and retrieval … • We design and apply memory retrieval methods that leverage the generated …
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant memories from an external database. However, existing RAG methods typically organize all memories in a whole database, potentially limiting focus on crucial memories and introducing noise. In this paper, we introduce a multiple partition paradigm for RAG (called M-RAG), where each database partition serves as a basic unit for RAG execution. Based on this paradigm, we propose a novel framework that leverages LLMs with Multi-Agent Reinforcement Learning to optimize different language generation tasks explicitly. Through comprehensive experiments conducted on seven datasets, spanning three language generation tasks and involving three distinct language model architectures, we confirm that M-RAG consistently outperforms various baseline methods, achieving improvements of 11%, 8%, and 12% for text summarization, machine translation, and dialogue generation, respectively.
With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants that can understand user-specific traits to accurately interpret requirements and tailor responses to individual preferences. However, existing approaches often overlook the complexities of long-term interactions and fail to capture users’ subjective characteristics. To address these gaps, we present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. In the absence of available real-world data, we develop a multi-step LLM-based synthesis pipeline, which is further verified and refined by human annotators. This process yields PAL-Set, the first Chinese dataset comprising multi-session user logs and dialogue histories, which serves as the foundation for PAL-Bench. Furthermore, to improve personalized service-oriented interactions, we propose H2Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation to improve personalized response generation. Comprehensive experiments on both our PAL-Bench and an external dataset demonstrate the effectiveness of the proposed memory framework.
Pre-trained and frozen LLMs can effectively map simple scene re-arrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. To parse open-domain natural language and adapt to a user's idiosyncratic procedures, not known during prompt engineering time, fixed prompts fall short. In this paper, we introduce HELPER, an embodied agent equipped with an external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting: relevant memories are retrieved based on the current dialogue, instruction, correction or VLM description, and used as in-context prompt examples for LLM querying. The memory is expanded during deployment to include pairs of user's language and action plans, to assist future inferences and personalize them to the user's language and routines. HELPER sets a new state-of-the-art in the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD), with 1.7x improvement over the previous SOTA for TfD. Our models, code and video results can be found in our project's website: https://helper-agent-llm.github.io.
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, yet their efficacy is constrained by a fundamental memory limitation: a static context window that resets with each interaction. This prevents them from accumulating experience and adapting to dynamic, long-term tasks. To address the limitations of long-term memory in agentic LLMs, this work introduces a neuro-inspired framework with two key contributions. First, we propose ARTEM (Agentic Retrieval with Temporal-Episodic Memory), a system that organizes experiences into structured events and manages utility-based memory consolidation. Second, we extend this framework with a distinct governance component, Value-driven ARTEM, that validates candidate outputs against core principles before finalization. Together, these components equip LLM agents with continual learning, adaptive reasoning, and robust value-aligned decision-making. Looking forward, we outline future directions including dynamic memory adaptation, memory decay mechanisms, and applications in interactive multi-agent environments.
Current large language model (LLM) agents face the challenges of high inference cost and low decision quality when dealing with complex tasks, and are especially deficient in maintaining context coherence during long tasks. This research presents an innovative vector-storage long-term memory mechanism (VIMBank) to enhance the long-term context retention and task execution efficiency of LLM agents by storing and retrieving historical interaction data through a vector database. VIMBank uses a dynamic memory updating strategy and the Ebbinghaus forgetting curve to manage agent memory efficiently: reinforcing critical information, forgetting unimportant data, and optimizing storage and reasoning costs. The experimental results show that VIMBank significantly improves the decision quality and efficiency of LLM agents in multi-tasking scenarios and reduces computational cost. Compared with different agents, the task decision success rate increases by 10% to 20%, and the reasoning cost is reduced by about 23%, providing a theoretical basis and practical support for the future development of agents with long-term memory and adaptive learning ability.
This paper addresses the software engineering challenges of integrating autonomous agents into production-grade web applications. While traditional implementations suffer from high latency and state synchronization issues, this study presents a full-stack solution based on TypeScript and React 19 Server Components. This paper details the implementation of a RAMP (Reflect, Act, Memory, Plan) execution loop at the code level, using Qdrant for low-latency (<100 ms) vector retrieval and Next.js for server-side orchestration. A key engineering contribution is the development of a strictly typed data contract that synchronizes server-side agent reasoning with client-side state management (via TanStack Query). Experimental results confirm that this specific stack architecture significantly reduces response times and prevents runtime type errors, offering a reproducible pattern for building scalable, high-load web platforms.
… records, vector memory, and audit logs into uncoordinated silos. We present AEVUM, an agent-native embedded database system that inverts this model. In AEVUM the AI agent owns, …