Agent Memory Systems
Memory tiering, lifecycle management, and consolidation/trigger strategies for interactive LLM agents
The common thread is the memory lifecycle and retrieval-trigger mechanisms of conversational/interactive LLM agents: memory tiering (short/mid/long-term, or working/episodic/semantic), cross-tier update and consolidation strategies, and retrieval and generation conditioned on context, subgoals, and temporal factors, all aimed at improving long-conversation consistency and personalization. Some of this work also covers performance evaluation over very long conversations or reflective memory management.
- Memory OS of AI Agent (Jie Kang, Mingming Ji, Zhe Zhao, Ting Bai, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- A Hybrid, Multi-Layered Memory Architecture for Collaborative Reasoning in Multi-Agent Systems (M. Ilin, Dmitry Pavlyuk, 2025, 2025 3rd International Conference on Foundation and Large Language Models (FLLM))
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model (Mengkang Hu, Tianxing Chen, Qiguang Chen, Yi Mu, Wenqi Shao, Ping Luo, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents (Yuki Hou, Haruki Tamoto, Homei Miyashita, 2024, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- MemInsight: Autonomous Memory Augmentation for LLM Agents (Rana Salama, Jason Cai, Mingzhe Yuan, Anna Currey, Mahendra K. Sunkara, Yi Zhang, Yassine Benajiba, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents (Tan Zhen, Jun Yan, I-Hung Hsu, R.J. Han, Zifeng Wang, Duc Long Le, Yong Sang Song, Yanfei Chen, Hamid Palangi, George Lee, Aarti Iyer, Tianlong Chen, Huan Liu, Chen-Yu Lee, Tomas Pfister, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- Evaluating Very Long-Term Conversational Memory of LLM Agents (Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang, 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
Memory storage and retrieval substrates: vector databases, persistence systems, and memory data structures/frameworks
The common thread is treating external, persistent memory storage and retrieval as the core infrastructure or methodology: vector databases working in concert with RAG, update and forgetting mechanisms for vector stores, and agent-native persistent databases or memory data structures (graphs, embedded databases, memory fabrics). Some surveys and frameworks also discuss open problems in separating memory types and in long-term memory management, with an emphasis on engineering-level feasibility.
- Vector Databases and Language Models: Synergies and Challenges (Toni Taipalus, 2025, Communications in Computer and Information Science)
- Vector Storage Based Long-term Memory Research on LLM (Kun Li, Xin Jing, Chengang Jing, 2024, International Journal of Advanced Network, Monitoring and Controls)
- AgenticMemory: A Binary Graph Format for Persistent, Portable, and Navigable AI Agent Memory (Omoshola S. Owolabi, 2026, … , and Navigable AI Agent Memory (February 18, 2026))
- AEVUM: An Agent-Native Persistent Memory Database System with Autonomous Data Management and Multi-Stage Compression (JR Maligireddy, 2026, Authorea Preprints)
- Memory Fabric for Conversational AI Agents: Enabling Shared and Persistent Memory Across Users (A Tiwari, V Gupta, 2025, Authorea Preprints)
- An emotion understanding framework for intelligent agents based on episodic and semantic memories (M. Kazemifard, N. Ghasem-Aghaee, Bryan L. Koenig, T. Ören, 2013, Autonomous Agents and Multi-Agent Systems)
- Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents (Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith A. Lambert, Adam Amos-Binks, Zohreh Dannenhauer, Dustin Dannenhauer, 2024, Proceedings of the AAAI Symposium Series)
- MemInsight: Autonomous Memory Augmentation for LLM Agents (Rana Salama, Jason Cai, Mingzhe Yuan, Anna Currey, Mahendra K. Sunkara, Yi Zhang, Yassine Benajiba, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
Memory-retrieval augmentation and relevance modeling (retrieval modules, attention, filtering)
The common thread is modeling and optimizing memory-retrieval quality: better retrieval scoring and attention allocation, memory augmentation and filtering (to prune irrelevant memories), or dedicated retrieval modules that improve the adaptability and behavioral consistency of generative agents. This work also bears directly on retrieval effectiveness in long-conversation settings.
- Enhancing memory retrieval in generative agents through LLM-trained cross attention networks (Chuanyang Hong, Qingyun He, 2025, Frontiers in Psychology)
- MemInsight: Autonomous Memory Augmentation for LLM Agents (Rana Salama, Jason Cai, Mingzhe Yuan, Anna Currey, Mahendra K. Sunkara, Yi Zhang, Yassine Benajiba, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents (Yuki Hou, Haruki Tamoto, Homei Miyashita, 2024, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- Evaluating Very Long-Term Conversational Memory of LLM Agents (Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang, 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
Episodic memory: representation, triggering, and applications
The common thread is treating episodic memory as a key memory type and studying its representation, triggering, and applications: symbolic/structured episodic memory over temporally rich domains, pipelines from short-term episodic buffers to long-term storage, and capabilities aimed at affective interaction or human-like behavior.
- Episodic memory formulation and its application in long-term HRI (Markos Sigalas, M. Maniadakis, P. Trahanias, 2017, 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN))
- Enhancing intelligent agents with episodic memory (Andrew Nuxoll, John E. Laird, 2012, Cognitive Systems Research)
- Towards episodic memory-based long-term affective interaction with a human-like robot (Zerrin Kasap, N. Magnenat-Thalmann, 2010, 19th International Symposium in Robot and Human Interactive Communication)
- Episodic memory for autonomous agents (T. Deutsch, A. Gruber, R. Lang, R. Velik, 2008, 2008 Conference on Human System Interactions)
- Episodic Memory for Human-like Agents and Human-like Agents for Episodic Memory (C. Brom, J. Lukavský, 2010, International Journal of Machine Consciousness)
- Different Ways to Cue a Coherent Memory System: A Theory for Episodic, Semantic, and Procedural Tasks (M. Humphreys, J. Bain, R. Pike, 1989, Psychological Review)
Memory use in long-horizon tasks: modeling cross-temporal dependencies and closing the cognitive loop
The common thread is handling cross-temporal dependencies in long-horizon decision making: attention- or Transformer-style memory policies, explicitly wiring long-term memory retrieval into the cognitive loop so that it shapes decision logic, or analyses of how long-term memory affects long-task execution and consistency. The emphasis is on how memory is exploited across time.
- Towards episodic memory-based long-term affective interaction with a human-like robot (Zerrin Kasap, N. Magnenat-Thalmann, 2010, 19th International Symposium in Robot and Human Interactive Communication)
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks (Kuan Fang, Alexander Toshev, Li Fei-Fei, S. Savarese, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Cognitive Modeling for Long-Horizon Agent Learning via Integrated Long-Term Memory and Reasoning (Linghao Yang, Tian Guan, Yumeng Ma, Zhongkang Li, Zhou Fang, Feiyang Wang, 2026, … Networks and Machine …)
- Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents (Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith A. Lambert, Adam Amos-Binks, Zohreh Dannenhauer, Dustin Dannenhauer, 2024, Proceedings of the AAAI Symposium Series)
Privacy and security risks of memory systems (memory leakage and the need for safeguards)
The common thread is examining memory systems from a security and risk perspective: the privacy leakage that can occur once an agent writes user interactions into memory, proposed extraction attacks against memory together with analyses of the factors influencing them, and the recoverability/extractability risks that long-term memory introduces.
- Unveiling Privacy Risks in LLM Agent Memory (Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, Pengfei He, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- Evaluating Very Long-Term Conversational Memory of LLM Agents (Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang, 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
Agent-memory system architectures, surveys, and engineering deployments (production and domain applications)
The common thread is overall architecture and real-world deployment: surveys of the four pillars of LLM agents (perception, planning, memory, action) and of memory management, embedding memory capabilities into production systems (multi-agent web systems, domain-specific operations memory architectures), and validating long-term mechanisms built on vector storage. The emphasis is on systematization and deployability.
- Memory Fabric for Conversational AI Agents: Enabling Shared and Persistent Memory Across Users (A Tiwari, V Gupta, 2025, Authorea Preprints)
- A Survey of LLM-based Agents: Theories, Technologies, Applications and Suggestions (Xiaofei Dong, Xueqiang Zhang, Weixin Bu, Dan Zhang, Feng Cao, 2024, 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC))
- Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents (Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith A. Lambert, Adam Amos-Binks, Zohreh Dannenhauer, Dustin Dannenhauer, 2024, Proceedings of the AAAI Symposium Series)
- High-Performance Implementation of Multi-Agent Web Systems: Integrating Vector Memory with Strictly Typed React Architectures (Mykhailo Nykoliuk, 2025, Universal Library of Engineering Technology)
- Mind-Tool: Domain Memory Architecture for AI Agents (Ioannis Chrysochos, 2026, Journal of Engineering and Artificial Intelligence)
- Vector Storage Based Long-term Memory Research on LLM (Kun Li, Xin Jing, Chengang Jing, 2024, International Journal of Advanced Network, Monitoring and Controls)
Memory-reasoning coupling and performance optimization under embodied, multi-agent, and real-time constraints
The common thread is deployment and performance trade-offs in multi-agent, embodied, or real-time settings: how long-term memory couples with the policy/reasoning loop under partial observability, long-horizon control, or production-grade low-latency constraints, while preserving throughput and response speed.
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks (Kuan Fang, Alexander Toshev, Li Fei-Fei, S. Savarese, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Cognitive Modeling for Long-Horizon Agent Learning via Integrated Long-Term Memory and Reasoning (Linghao Yang, Tian Guan, Yumeng Ma, Zhongkang Li, Zhou Fang, Feiyang Wang, 2026, … Networks and Machine …)
- High-Performance Implementation of Multi-Agent Web Systems: Integrating Vector Memory with Strictly Typed React Architectures (Mykhailo Nykoliuk, 2025, Universal Library of Engineering Technology)
Taken together, the literature centers on how long-term memory is written, tiered, triggered for retrieval, and used to support consistent long-horizon decisions. It builds memory substrates (vector stores, persistent databases, data structures, memory fabrics, MemoryOS, and the like); improves memory usability through retrieval augmentation (relevance modeling, filtering, attention, quantified consolidation); studies, at the model level, episodic memory and the exploitation of long-range temporal dependencies (episodic/scene memory, cognitive loops); addresses, on the risk side, the privacy leakage that memory introduces; and, through surveys and engineering work, covers the path from theoretical frameworks to production systems and embodied/multi-agent deployments.
29 related papers in total.
In this paper, we provide a review of the current efforts to develop LLM agents, which are autonomous agents that leverage large language models. We examine the memory management approaches used in these agents. One crucial aspect of these agents is their long-term memory, which is often implemented using vector databases. We describe how vector databases are utilized to store and retrieve information in LLM agents. Moreover, we highlight open problems, such as the separation of different types of memories and the management of memory over the agent's lifetime. Lastly, we propose several topics for future research to address these challenges and further enhance the capabilities of LLM agents, including the use of metadata in procedural and semantic memory and the integration of external knowledge sources with vector databases.
Large Language Models (LLMs) face a crucial challenge from fixed context windows and inadequate memory management, leading to a severe shortage of long-term memory capabilities and limited personalization in the interactive experience with AI agents. To overcome this challenge, we innovatively propose a Memory Operating System, i.e., MemoryOS, to achieve comprehensive and efficient memory management for AI agents. Inspired by the memory management principles in operating systems, MemoryOS designs a hierarchical storage architecture and consists of four key modules: Memory Storage, Updating, Retrieval, and Generation. Specifically, the architecture comprises three levels of storage units: short-term memory, mid-term memory, and long-term personal memory. Key operations within MemoryOS include dynamic updates between storage units: short-term to mid-term updates follow a dialogue-chain-based FIFO principle, while mid-term to long-term updates use a segmented page organization strategy. Extensive experiments on the LoCoMo benchmark show an average improvement of 49.11% on F1 and 46.18% on BLEU-1 over the baselines on GPT-4o-mini, showing contextual coherence and personalized memory retention in long conversations.
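The tier-update flow this abstract describes — FIFO promotion from short- to mid-term, paged flushes from mid- to long-term — can be sketched minimally. The class, capacities, and page size below are illustrative assumptions, not MemoryOS's actual implementation:

```python
from collections import deque

class TieredMemory:
    """Minimal sketch of a three-tier memory in the spirit of MemoryOS.

    Tier names follow the abstract; capacities, page size, and the
    promotion logic here are illustrative assumptions only.
    """

    def __init__(self, short_cap=4, mid_cap=8):
        self.short = deque()   # short-term: most recent dialogue turns
        self.mid = deque()     # mid-term: turns evicted from short-term
        self.long = []         # long-term personal memory, stored as pages
        self.short_cap = short_cap
        self.mid_cap = mid_cap

    def add_turn(self, turn):
        self.short.append(turn)
        # Short-term -> mid-term: FIFO, the oldest turn is evicted first.
        while len(self.short) > self.short_cap:
            self.mid.append(self.short.popleft())
        # Mid-term -> long-term: flush a fixed-size segment ("page")
        # whenever the mid-term tier overflows.
        while len(self.mid) > self.mid_cap:
            page = [self.mid.popleft() for _ in range(max(1, self.mid_cap // 2))]
            self.long.append(page)

mem = TieredMemory(short_cap=2, mid_cap=2)
for i in range(6):
    mem.add_turn(f"turn-{i}")
# The oldest turns end up paged out to long-term memory.
```

Keeping long-term memory in coarse pages rather than individual turns mirrors the segmented-page organization the abstract names, at the cost of coarser-grained recall.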
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.
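The memory readout at the heart of SMT — attend over all stored observation embeddings using the current state as the query — can be illustrated with plain scaled dot-product attention. The single-head, projection-free form below is a simplification for illustration, not the paper's model:

```python
import math

def attention_readout(query, memory):
    """Scaled dot-product attention over stored observation embeddings.

    Scores every memory entry against the query, softmaxes the scores,
    and returns the weighted sum of entries. No learned projections.
    """
    scores = [sum(q * m for q, m in zip(query, entry)) / math.sqrt(len(query))
              for entry in memory]
    max_s = max(scores)                    # stabilize the softmax
    exps = [math.exp(s - max_s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    return [sum(w * entry[d] for w, entry in zip(weights, memory))
            for d in range(dim)]

# Entries aligned with the query dominate the readout.
memory = [[1.0, 0.0], [0.0, 1.0]]
out = attention_readout([10.0, 0.0], memory)
```

Because the readout is a soft sum over the whole memory, the policy can exploit spatio-temporal dependencies without deciding in advance which observations to keep.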
This study focuses on the tendency of agents in long-horizon sequential tasks to rely on short-term states and to underutilize historical information, and proposes a cognitive modeling and learning framework with long-term memory and reasoning capabilities. The framework provides a unified cognitive description of the agent's decision process. It introduces a structured long-term memory mechanism to support continuous storage and selective updating of cross-temporal key information. On this basis, a memory retrieval-driven reasoning module is constructed so that experience can explicitly participate in the formation of current decision logic. To address the separation between memory and decision making in conventional policy models, the framework tightly couples perception representation, memory management, reasoning processes, and policy generation into an end-to-end cognitive loop. This design strengthens goal consistency and behavioral stability in long-horizon interactive environments. Comparative evaluations in open source interactive task settings demonstrate consistent advantages in task completion quality, decision efficiency, and long-term information utilization. The results indicate that the proposed cognitive modeling framework effectively mitigates decision difficulties caused by long-range dependencies and partial observability. Overall, the study shows that integrating long-term memory and reasoning within a unified learning framework is an important approach for improving sustained decision-making capability in complex environments.
Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.
With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants that can understand user-specific traits to accurately interpret requirements and tailor responses to individual preferences. However, existing approaches often overlook the complexities of long-term interactions and fail to capture users’ subjective characteristics. To address these gaps, we present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. In the absence of available real-world data, we develop a multi-step LLM-based synthesis pipeline, which is further verified and refined by human annotators. This process yields PAL-Set, the first Chinese dataset comprising multi-session user logs and dialogue histories, which serves as the foundation for PAL-Bench. Furthermore, to improve personalized service-oriented interactions, we propose H2Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation to improve personalized response generation. Comprehensive experiments on both our PAL-Bench and an external dataset demonstrate the effectiveness of the proposed memory framework.
… First, rigid memory granularity fails to capture the natural semantic structure of conversations… Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, …
Large Language Model (LLM)-based agents exhibit significant potential across various domains, operating as interactive systems that process environmental observations to generate executable actions for target tasks. The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs. We categorize memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. Instead, existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. Additionally, our analysis shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability.
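The subgoal-chunking idea can be sketched as follows; the class and its trivial stand-in summarizer are illustrative assumptions (HiAgent uses LLM prompting for both subgoal formulation and summarization):

```python
class SubgoalWorkingMemory:
    """Sketch of subgoal-chunked working memory in the spirit of HiAgent.

    Finished subgoals keep only a compact summary; full action-observation
    detail is retained for the current subgoal only. The default summarizer
    is a trivial stand-in for the LLM summarization the paper describes.
    """

    def __init__(self, summarize=None):
        self.completed = []      # (subgoal, summary) pairs
        self.current_goal = None
        self.current_pairs = []  # full action-observation detail
        self.summarize = summarize or (
            lambda goal, pairs: f"{goal}: done in {len(pairs)} steps")

    def start_subgoal(self, goal):
        if self.current_goal is not None:
            # Replace the finished subgoal's detail with its summary.
            summary = self.summarize(self.current_goal, self.current_pairs)
            self.completed.append((self.current_goal, summary))
        self.current_goal = goal
        self.current_pairs = []

    def record(self, action, observation):
        self.current_pairs.append((action, observation))

    def context(self):
        """The (much shorter) history fed to the LLM at the next step."""
        return [s for _, s in self.completed] + self.current_pairs

wm = SubgoalWorkingMemory()
wm.start_subgoal("find key")
wm.record("open drawer", "empty")
wm.record("look under mat", "key found")
wm.start_subgoal("open door")
wm.record("use key", "door opens")
```

The context grows with the number of subgoals rather than the number of steps, which is what removes the redundancy in long-horizon tasks.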
… is called episodic memory. This work describes an episodic memory architecture derived from … 143], the human memory can be divided into a short term memory (alternatively: a working …
… agent’s ability to sense its environment, reason, and learn. We demonstrate that episodic memory enables agents … Working memory is a short-term declarative memory that encapsulates …
… The memory integrates following submodels: visual short-term … a generic and believable agent with episodic memory abilities, which … Such a generic agent has not been developed yet, …
… episodic memory model was not specifically designed for intelligent conversational agents, this description of memory … Episodes are first saved into short-term episodic memory (STEM). …
… The very short-term memory is sensory memory, basically a … Short-term memory (also referred to as working memory) … information retrieved from long-term memory [52]. Long-term …
Episodic memory endows agents with numerous general cognitive capabilities, such as action modeling and virtual sensing. However, for long-lived agents, there are numerous unexplored computational challenges in supporting useful episodic-memory functions while maintaining real-time reactivity. In this paper, we review the implementation of episodic memory in Soar and present an expansive evaluation of that system. We demonstrate useful applications of episodic memory across a variety of domains, including games, mobile robotics, planning, and linguistics. In these domains, we characterize properties of environments, tasks, and episodic cues that affect performance, and evaluate the ability of Soar’s episodic memory to support hours to days of real-time operation.
… episodic memory for autonomous artificial agents, which encodes symbolic information on a temporally rich domain. In particular, memory … Beaufays, “Long short-term memory recurrent …
… When developing the Matrix model to encompass both episodic and semantic memories, we sought to preserve earlier distributed models as special cases. We achieved this by …
Introduction: The surge in the capabilities of large language models (LLMs) has propelled the development of Artificial General Intelligence (AGI), highlighting generative agents as pivotal components for emulating complex AI behaviors. Given the high costs associated with individually training LLMs for each AI agent, there is a critical need for advanced memory retrieval mechanisms to maintain the unique characteristics and memories of individual AI agents.
Methods: In this research, we developed a text-based simulation of a generative agent world, constructing a community with multiple agents and locations in which certain levels of interaction were enabled. Within this framework, we introduced a novel memory retrieval system using an Auxiliary Cross Attention Network (ACAN). This system calculates and ranks attention weights between an agent's current state and stored memories, selecting the most relevant memories for any given situation. In a novel approach, we incorporated LLM assistance, comparing memories retrieved by our model with those extracted using a base method during training, and constructing a novel loss function based on these comparisons to optimize the training process effectively. To our knowledge, this is the first study to utilize LLMs to train a dedicated agent memory retrieval network.
Results: Our empirical evaluations demonstrate that this approach substantially enhances the quality of memory retrieval, thereby increasing the adaptability and behavioral consistency of agents in fluctuating environments.
Discussion: Our findings not only introduce new perspectives and methodologies for memory retrieval in generative agents but also extend the utility of LLMs in memory management across varied AI agent applications.
… and apply memory retrieval methods that leverage the generated memory augmentations to filter out irrelevant memory while … Memory Retrieval For this task we evaluate attribute-based …
In this study, we propose a novel human-like memory architecture designed for enhancing the cognitive abilities of large language model (LLM)-based dialogue agents. Our proposed architecture enables agents to autonomously recall memories necessary for response generation, effectively addressing a limitation in the temporal cognition of LLMs. We adopt the human memory cue recall as a trigger for accurate and efficient memory recall. Moreover, we developed a mathematical model that dynamically quantifies memory consolidation, considering factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories retrieved from the user’s interaction history in a database that encapsulates each memory’s content and temporal context. Thus, this strategic storage allows agents to recall specific memories and understand their significance to the user in a temporal context, similar to how humans recognize and recall past experiences.
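A consolidation score combining the three factors this abstract names might look like the sketch below; the exponential-decay form and both constants are assumptions for illustration, not the paper's actual mathematical model:

```python
import math

def consolidation_score(relevance, hours_elapsed, recall_count,
                        decay_rate=0.1, recall_boost=0.2):
    """Combine contextual relevance, elapsed time, and recall frequency
    into one consolidation score. The decay form and the two constants
    are illustrative assumptions only.
    """
    retention = math.exp(-decay_rate * hours_elapsed)   # older memories fade
    reinforcement = 1.0 + recall_boost * recall_count   # recall strengthens
    return relevance * retention * reinforcement

# A frequently recalled older memory can outscore a fresh, never-recalled one.
fresh = consolidation_score(relevance=0.8, hours_elapsed=1, recall_count=0)
rehearsed = consolidation_score(relevance=0.8, hours_elapsed=10, recall_count=8)
```

Ranking candidate memories by such a score is one way recall can stay both accurate (relevance) and temporally aware (decay plus reinforcement).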
AI agents have shown potential towards Artificial General Intelligence (AGI): they are expected to autonomously perceive their environments, make decisions, and take actions. However, most existing AI agents are trained in confined environments with limited knowledge, yielding sub-optimal performance. Benefiting from the remarkable progress of large language models (LLMs), diverse LLM-based agents have emerged. These agents employ an LLM as the central brain to perceive, plan, memorize, and so on, exhibiting human-level intelligence across multifarious applications with satisfactory performance. In this paper, we survey LLM-based agents from the perspectives of theories, technologies, applications, and suggestions. Specifically, we first deliver a recapitulative review of the theoretical foundations, including Large Language Models, Chain of Thought and AI Alignment, Retrieval-Augmented Generation, Embodied AI, etc. We then present the key technologies, comprising four critical components: Perception, Planning, Memory, and Action. Subsequently, we briefly explore domain-related and evaluation applications. Finally, we provide pertinent suggestions based on observations of significant challenges for LLM-based agents.
Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications. They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations, introducing new privacy risks for LLM agents. In this work, we systematically investigate the vulnerability of LLM agents to our proposed Memory EXTRaction Attack (MEXTRA) under a black-box setting. To extract private information from memory, we propose an effective attacking prompt design and an automated prompt generation method based on different levels of knowledge about the LLM agent. Experiments on two representative agents demonstrate the effectiveness of MEXTRA. Moreover, we explore key factors influencing memory leakage from both the agent designer's and the attacker's perspectives. Our findings highlight the urgent need for effective memory safeguards in LLM agent design and deployment.
… Turing Machines and Key-Value Memory Networks to recent systems like AutoGen, … memory fabric. It also addresses work on dialogue and agent memory, multi-agent shared-memory …
While Large Language Model (LLM) based multi-agent systems (MAS) show promise, their capabilities are constrained by simplistic, monolithic memory models. To address this, we propose a novel, four-tier hierarchical memory architecture inspired by cognitive science. Our architecture decomposes memory into an L1 Active Context, L2 Working Memory for significant facts, L3 Episodic Memory for experiences, and L4 Semantic Memory for distilled knowledge. Our core innovation lies in the autonomous lifecycle engines that govern information flow between these tiers, including a CIAR scoring model for significance-based Promotion and a multi-stage Consolidation/Distillation pipeline for long-term learning. This principled, production-ready design enables more robust and auditable reasoning. We validate the performance of the underlying storage layer through a micro-benchmark, demonstrating the high reliability and throughput required for mission-critical agentic systems.
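Significance-based promotion between tiers can be sketched as below; since the CIAR scoring model's details are not given in the abstract, a precomputed placeholder score and a fixed threshold stand in:

```python
def promote(items, threshold=0.5):
    """Significance-based promotion between memory tiers.

    `items` is a list of (fact, significance_score) pairs; the scoring
    itself (CIAR in the paper) is abstracted away here. Facts at or
    above the threshold move to the next tier; the rest stay behind.
    Returns (promoted, retained).
    """
    promoted = [item for item, score in items if score >= threshold]
    retained = [item for item, score in items if score < threshold]
    return promoted, retained

# L1 active context -> L2 working memory: keep only significant facts.
l1_context = [("user prefers dark mode", 0.9),
              ("small talk about weather", 0.1),
              ("project deadline is Friday", 0.7)]
l2_working, l1_rest = promote(l1_context)
```

Running the same gate between L2/L3 and L3/L4 (with summarization or distillation applied at each hop) gives the lifecycle-engine flavor the abstract describes.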
We present Mind-Tool, an AI-augmented system implementing domain memory architecture for operational infrastructure management. Unlike conventional AI assistants that operate statelessly, Mind-Tool maintains an organized memory layer (persistent knowledge files), a desired-state model (conversational goal tracking), and a continuous reasoning engine that updates digital assets over time. Deployed for managing complex IT infrastructure (Proxmox clusters, Kubernetes, networking, security systems) over a 90-day production period, Mind-Tool achieved a 94% task success rate with 68% workflow automation and a 62% time reduction compared to manual approaches. Our architecture independently validates recent parallel research by Anthropic demonstrating that effective AI agents require persistent domain memory rather than relying solely on large context windows. We provide quantitative results from production deployment, identify key architectural differences between autonomous coding agents and operational infrastructure agents, and demonstrate that competitive advantage in AI agent systems lies in domain memory design rather than model intelligence, confirming through independent development and extended operational use that domain memory represents a fundamental pattern for practical agent systems in human-collaborative domains.
… Large language model agents operate without persistent memory, losing accumulated knowledge at every session boundary. Current approaches to agent memory—vector databases, …
Current large language model (LLM) agents face the challenges of high inference cost and low decision quality when dealing with complex tasks, and are especially deficient in maintaining context coherence during long tasks. This research presents an innovative vector-storage long-term memory mechanism (VIMBank) to enhance the long-term context retention and task-execution efficiency of LLM agents by storing and retrieving historical interaction data through a vector database. VIMBank uses a dynamic memory-updating strategy and the Ebbinghaus forgetting curve to manage agent memory efficiently: reinforcing critical information, forgetting unimportant data, and optimizing storage and reasoning costs. Experimental results show that VIMBank significantly improves the decision quality and efficiency of LLM agents in multi-tasking scenarios and reduces computational cost. Compared with different agents, task-decision success rate increases by 10% to 20% and reasoning cost falls by about 23%, providing a theoretical basis and practical support for the future development of agents with long-term memory and adaptive learning ability.
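The Ebbinghaus-style forgetting mechanism can be sketched as follows; the retention formula R = exp(-t/S) is the classic form of the curve, while the pruning threshold and the strength values are illustrative assumptions, not VIMBank's actual parameters:

```python
import math

def retention(hours_elapsed, strength):
    """Classic Ebbinghaus form R = exp(-t / S): low-strength memories
    decay quickly. Reinforcing a memory raises its strength S."""
    return math.exp(-hours_elapsed / strength)

def prune(store, now_hours, threshold=0.3):
    """Drop entries whose retention has fallen below the threshold.

    The store layout {key: (created_at_hours, strength)} and the
    threshold are assumptions for this sketch.
    """
    return {key: (t0, s) for key, (t0, s) in store.items()
            if retention(now_hours - t0, s) >= threshold}

store = {"critical fact": (0.0, 100.0),   # reinforced: high strength
         "trivial detail": (0.0, 1.0)}    # never reinforced: fades fast
kept = prune(store, now_hours=5.0)
```

Tying deletion to a decaying retention score rather than raw age is what lets critical, frequently reinforced information persist while storage cost stays bounded.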
… In the retrieval-augmented generation (RAG) paradigm, a language model’s parametric knowledge is supplemented with non-parametric memory from an external vector database. This …
This paper addresses the software engineering challenges of integrating autonomous agents into production-grade web applications. While traditional implementations suffer from high latency and state synchronization issues, this study presents a full-stack solution based on TypeScript and React 19 Server Components. This paper details the implementation of a RAMP (Reflect, Act, Memory, Plan) execution loop at the code level, using Qdrant to produce low-latency (<100ms) vectors and Next.js for server-side orchestration. A key engineering contribution is the development of a strictly typed data contract that synchronizes server-side agent reasoning with client-side state management (via TanStack Query). Experimental results confirm that this specific stack architecture significantly reduces response times and prevents runtime type errors, offering a reproducible pattern for building scalable, high-load web platforms.
… records, vector memory, and audit logs into uncoordinated silos. We present AEVUM, an agent-native embedded database system that inverts this model. In AEVUM the AI agent owns, …