Agent Memory System
Working memory and subgoal/chunk-based short-term memory management (In-trial/Working Memory)
Focuses on working memory (in-context memory) and on memory selection, updating, and capacity management within a single decision process. The common thread is using hierarchy, chunking, and selection mechanisms to cut redundancy and improve stability and efficiency on long tasks (e.g., using subgoals as memory chunks, reinforcement-learning-based chunk selection, coupling working memory with action planning, and OS-like cache hierarchies to speed up retrieval).
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model(Mengkang Hu, Tianxing Chen, Qiguang Chen, Yi Mu, Wenqi Shao, Ping Luo, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- A working memory model improves cognitive control in agents and robots(Michele Persiani, A. Franchi, G. Gini, 2018, Cognitive Systems Research)
- Enhancing intelligent agents with episodic memory(Andrew Nuxoll, John E. Laird, 2012, Cognitive Systems Research)
- Cooperative Scheduling and Hierarchical Memory Model for Multi-Agent Systems(Huhai Zou, Rongzhen Li, Tianhao Sun, Fei Wang, Ta-Hsin Li, Kai Liu, 2024, 2024 IEEE International Symposium on Product Compliance Engineering - Asia (ISPCE-ASIA))
Long-term memory modeling: consolidation/forgetting, semantic-episodic interaction, and spatiotemporally consistent representations (Long-term Memory Modeling)
Centers on the formation-consolidation-forgetting cycle of long-term memory and on semantic/episodic representations. The common thread is borrowing neural/cognitive mechanisms or graph-based/structured representations to achieve scalable long-term knowledge extraction and consistency maintenance, emphasizing transferable semantics, temporal/spatial/logical constraints, and memory compression/formatting for usability.
- Memory formation, consolidation, and forgetting in learning agents(B. Subagdja, Wenwen Wang, A. Tan, Yuan-Sin Tan, Loo-Nin Teow, 2012, International Joint Conference on Autonomous Agents and Multiagent Systems)
- Semantic Memory Modeling and Memory Interaction in Learning Agents(Wenwen Wang, Ah‐Hwee Tan, Loo-Nin Teow, 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems)
- Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces(Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models(Yu Gu, Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Su, Michihiro Yasunaga, 2024, Advances in Neural Information Processing Systems 37)
- DSRd: A Proposal for a Low-Latency, Distributed Working Memory for CORTEX(P. Bustos, Juan C. García, R. Cintas, Esteban Martirena, P. Bachiller, Pedro Núñez Trujillo, A. Bandera, 2020, Advances in Intelligent Systems and Computing)
- AgenticMemory: A Binary Graph Format for Persistent, Portable, and Navigable AI Agent Memory(Omoshola S. Owolabi, 2026, … , and Navigable AI Agent Memory (February 18, 2026))
From RAG to LTM: global memory-enhanced retrieval and multi-turn/dynamic memory retrieval (RAG→LTM via Retrieval/Updating)
Takes RAG and LLM long-context tasks as the main thread and discusses how to make retrieval dynamic, memory-augmented, and multi-level. The common thread is coupling the memory module with retrieval and generation (draft-then-final answering, multi-turn probing with global-memory updates, adaptive retrieval across multi-scale memory tiers, and the evolution from RAG to LTM).
- MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation(Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Defu Lian, Zhicheng Dou, Tiejun Huang, 2024, Proceedings of the ACM on Web Conference 2025)
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning(Juyuan Wang, Rongchen Zhao, Wei Wei, Yufeng Wang, M. K. Yu, Jie Zhou, Jin Xu, Liyan Xu, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- Dynamic Memory Retrieval in RAG Models: Enhancing Long-Context Reasoning(Changqing Dong, 2025, 2025 6th International Conference on Artificial Intelligence and Computer Engineering (ICAICE))
- Conversational Agents: From RAG to LTM(Dell Zhang, Yue Feng, Haiming Liu, Changzhi Sun, Jixiang Luo, Xiangyu Chen, Xuelong Li, 2025, Proceedings of the 2025 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region)
- RAG-Driven Memory Architectures in Conversational LLMs—A Literature Review With Insights Into Emerging Agriculture Data Sharing(Nur Arifin Akbar, Rahool Dembani, B. Lenzitti, Domenico Tegolo, 2025, IEEE Access)
- Dynamic Memory Updating in RAG: Lifelong Learning and Adaptation(Sivarama Krishna Akhil Koduri, 2026, The paper is also available on Zenodo: https://doi.org …)
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models(Yu Gu, Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Su, Michihiro Yasunaga, 2024, Advances in Neural Information Processing Systems 37)
- Vector Databases and Language Models: Synergies and Challenges(Toni Taipalus, 2025, Communications in Computer and Information Science)
Memory infrastructure and system architecture: MemoryOS, Memory Fabric, vector databases, and long-term retrieval storage (Systems & Storage Infrastructure)
Focuses on the engineering and architecture of memory systems, memory operating systems, vector databases, and infrastructure: the technical substrate for vector storage and retrieval, memory operation pipelines for long-term dialogue (updating/retrieval/generation), and techniques such as KV compression, forgetting curves, and tiered storage to improve consistency and cost efficiency. The common thread is systematizing memory capabilities into implementable modules and data-management strategies.
- Memory Fabric for Conversational AI Agents: Enabling Shared and Persistent Memory Across Users(A Tiwari, V Gupta, 2025, Authorea Preprints)
- A memory fabric for conversational AI agents enabling shared and persistent multiuser memory(A. Tiwari, Vibhuti Gupta, 2026, Discover Artificial Intelligence)
- Vector Database Management Techniques and Systems(J. Pan, Jianguo Wang, Guoliang Li, 2024, Companion of the 2024 International Conference on Management of Data)
- Vector database management systems: Fundamental concepts, use-cases, and current challenges(Toni Taipalus, 2023, Cognitive Systems Research)
- Vector Databases and Language Models: Synergies and Challenges(Toni Taipalus, 2025, Communications in Computer and Information Science)
- Vector Storage Based Long-term Memory Research on LLM(Kun Li, Xin Jing, Chengang Jing, 2024, International Journal of Advanced Network, Monitoring and Controls)
- Memory OS of AI Agent(Jie Kang, Mingming Ji, Zhe Zhao, Ting Bai, 2025, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents(Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith A. Lambert, Adam Amos-Binks, Zohreh Dannenhauer, Dustin Dannenhauer, 2024, Proceedings of the AAAI Symposium Series)
Persistent/distributed memory management for multi-agent and interactive systems (Multi-agent & Distributed/Persistent Memory)
Centers on persistence and distributed memory management in multi-agent or complex interactive systems: hierarchical architectures with persistent supervision for multi-agent collaboration, cross-session long-term memory assistants, graph-structured persistent memory for GUI/computer-use agents that avoids re-solving subtasks, and real-time negotiation of memory operations in distributed/embedded-database and pub/sub settings. The common thread is designing memory as the foundational capability that supports multi-agent collaboration and stable cross-session operation.
- Functional Stability and Adaptive Control in LLM-Based Computer Use Agents via Graph-Structured Persistent Memory(Danylo Vorvul, Andrii Musienko, Iryna Galchenko, Mykola Myroniuk, Андрій Собчук, 2026, Preprints.org)
- Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces(Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- An Intelligent Multi-Agent Memory Assistant(Ângelo Costa, P. Novais, 2011, Communications in Medical and Care Compunetics)
- Multi-agent Personal Memory Assistant(Ângelo Costa, P. Novais, Ricardo Costa, J. Corchado, J. Neves, 2010, Advances in Intelligent and Soft Computing)
- Society Agent: A Hierarchical Multi-Agent Architecture with Autonomous Persistent and Ephemeral Agents and Persistent Evolving Knowledge(I. Chrysochos, 2026, Authorea Preprints)
- High-Performance Implementation of Multi-Agent Web Systems: Integrating Vector Memory with Strictly Typed React Architectures(Mykhailo Nykoliuk, 2025, Universal Library of Engineering Technology)
- MemIndex: Agentic Event-based Distributed Memory Management for Multi-agent Systems(Alaa Saleh, Sasu Tarkoma, Anders Lindgren, Praveen Kumar Donta, S. Dustdar, Susanna Pirttikangas, Lauri Lovén, 2025, ACM Transactions on Autonomous and Adaptive Systems)
- AEVUM: An Agent-Native Persistent Memory Database System with Autonomous Data Management and Multi-Stage Compression(JR Maligireddy, 2026, Authorea Preprints)
- CoMMA: a multi-agent system for corporate memory management(F. Bergenti, A. Poggi, G. Rimassa, Paola Turci, 2002, Proceedings of the first international joint conference on Autonomous agents and multiagent systems part 3 - AAMAS '02)
- A multi-agent system for building project memories to facilitate the design process(D. Monticolo, Vincent Hilaire, S. Gomes, A. Koukam, 2008, Integrated Computer-Aided Engineering)
Overall, the literature can be split along the memory lifecycle and its systems realization into five parallel threads: 1) chunking, selection, and hierarchical management of in-session working memory; 2) cognitively inspired long-term memory formation, consolidation-forgetting, and interacting semantic/episodic representations; 3) the RAG-to-LTM evolution for long contexts, via dynamic retrieval and multi-turn updating; 4) engineering-oriented memory operating systems, memory fabrics, and vector-database substrates; 5) persistent, distributed memory management and stability control in multi-agent and interactive task settings.
36 related papers in total.
Large Language Models (LLMs) face a crucial challenge from fixed context windows and inadequate memory management, leading to a severe shortage of long-term memory capabilities and limited personalization in the interactive experience with AI agents. To overcome this challenge, we innovatively propose a Memory Operating System, i.e., MemoryOS, to achieve comprehensive and efficient memory management for AI agents. Inspired by the memory management principles in operating systems, MemoryOS designs a hierarchical storage architecture and consists of four key modules: Memory Storage, Updating, Retrieval, and Generation. Specifically, the architecture comprises three levels of storage units: short-term memory, mid-term memory, and long-term personal memory. Key operations within MemoryOS include dynamic updates between storage units: short-term to mid-term updates follow a dialogue-chain-based FIFO principle, while mid-term to long-term updates use a segmented page organization strategy. Extensive experiments on the LoCoMo benchmark show an average improvement of 49.11% on F1 and 46.18% on BLEU-1 over the baselines on GPT-4o-mini, demonstrating contextual coherence and personalized memory retention in long conversations.
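The dialogue-chain FIFO and segmented-paging updates described above can be sketched as a toy three-level store. Class names, capacities, and the page format here are illustrative assumptions, not the MemoryOS implementation:

```python
from collections import deque

class HierarchicalMemory:
    """Toy three-level store: short-term -> mid-term -> long-term.

    Short-term holds the most recent dialogue turns; on overflow the
    oldest turn is evicted FIFO-style into mid-term. Mid-term groups
    evicted turns into fixed-size "pages" that are flushed to long-term.
    """

    def __init__(self, short_cap=3, page_size=2):
        self.short = deque()      # most recent dialogue turns
        self.mid = []             # evicted turns awaiting paging
        self.long_term = []       # list of consolidated pages
        self.short_cap = short_cap
        self.page_size = page_size

    def add_turn(self, turn):
        self.short.append(turn)
        if len(self.short) > self.short_cap:
            # FIFO eviction: oldest turn moves down one level
            self.mid.append(self.short.popleft())
        if len(self.mid) >= self.page_size:
            # Segmented paging: flush a full page to long-term storage
            self.long_term.append(tuple(self.mid[:self.page_size]))
            del self.mid[:self.page_size]

mem = HierarchicalMemory()
for i in range(7):
    mem.add_turn(f"turn-{i}")
print(list(mem.short))   # the 3 most recent turns
print(mem.long_term)     # pages of older turns
```

A real system would replace the raw turn strings with summarized dialogue chains and attach retrieval indexes to each level; the sketch only shows the promotion mechanics.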
Interactive applications are latency-sensitive systems that enable dynamic responses to user inputs in domains such as robotics, industrial automation, and autonomous control. These applications require efficient application protocols for communication, with the pub/sub model being one of the most promising approaches. However, existing pub/sub systems are architecturally constrained, particularly by limited memory capacity and inefficiencies in dynamic environments. Addressing these challenges requires effective distributed memory management, yet this aspect has received limited attention in existing research. This paper addresses the gap by proposing MemIndex, an adaptive and autonomous distributed memory-management framework with an intent-indexed bipartite graph architecture. It is designed for LM-based multi-agent pub/sub systems, enabling agents to autonomously negotiate memory operations in real time through dynamic index spaces for efficient reasoning. We evaluate MemIndex using diverse models against two baselines. Experimental results show MemIndex outperforms both baselines across storage, retrieval, update, and deletion operations, achieving average reductions of about 34% and 56% in elapsed time, 57% and 75% in CPU utilization, and 23% and 76% in memory usage. Scalability tests further demonstrate that MemIndex maintains low end-to-end delay as submissions and agents grow, confirming that its negotiation-driven offloading enables efficient distributed memory management in interactive applications.
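The core data structure, an intent-indexed bipartite graph with intents on one side and memory items on the other, can be illustrated with a minimal two-sided index. The class and method names below are hypothetical, not MemIndex's API:

```python
class IntentIndex:
    """Toy intent-indexed bipartite store: intents on one side, memory
    items on the other, with edges linking them. Storage, retrieval,
    and deletion all route through the intent side of the graph."""

    def __init__(self):
        self.intent_to_items = {}   # intent -> set of item ids
        self.items = {}             # item id -> payload

    def store(self, item_id, payload, intents):
        self.items[item_id] = payload
        for intent in intents:
            self.intent_to_items.setdefault(intent, set()).add(item_id)

    def retrieve(self, intent):
        ids = sorted(self.intent_to_items.get(intent, set()))
        return [self.items[i] for i in ids]

    def delete(self, item_id):
        self.items.pop(item_id, None)
        for ids in self.intent_to_items.values():
            ids.discard(item_id)   # drop dangling edges

idx = IntentIndex()
idx.store(1, "robot pose at t=3", {"telemetry", "pose"})
idx.store(2, "conveyor speed 0.4 m/s", {"telemetry"})
print(idx.retrieve("telemetry"))
idx.delete(1)
print(idx.retrieve("pose"))
```

The paper's contribution is the autonomous, negotiated placement of such index entries across distributed agents; this sketch only shows the single-node indexing shape.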
… -agent system for the management of a corporate memory. The innovative aspect of the system … were generally used separately until now: agent technology, knowledge modeling, XML …
… Section 4 presents the structure of our project memories model. We describe a multi-agent system used to build these project memories in section 5. Finally Section 6 describes the …
Constrained by the cost and ethical concerns of involving real seekers in AI-driven mental health, researchers develop LLM-based conversational agents (CAs) with tailored configurations, such as profiles, symptoms, and scenarios, to simulate seekers. While these efforts advance AI in mental health, achieving more realistic seeker simulation remains hindered by two key challenges: dynamic evolution and multi-session memory. Seekers' mental states often fluctuate during counseling, which typically spans multiple sessions. To address this, we propose AnnaAgent, an emotional and cognitive dynamic agent system equipped with tertiary memory. AnnaAgent incorporates an emotion modulator and a complaint elicitor trained on real counseling dialogues, enabling dynamic control of the simulator's configurations. Additionally, its tertiary memory mechanism effectively integrates short-term and long-term memory across sessions. Evaluation results, both automated and manual, demonstrate that AnnaAgent achieves more realistic seeker simulation in psychological counseling compared to existing baselines. The ethically reviewed and screened code can be found on https://github.com/sci-m-wang/AnnaAgent.
… present a single memory system, since often such a system will depend on … memory systems that are compatible in a general manner with computational attention and emotion systems. …
Large Language Models (LLMs) are leading a technological revolution. This gives agents based on LLMs renewed vitality, and multi-agent collaboration is showing the potential to foster new forms of intelligence. However, current multi-agent systems face two major challenges: the first is the issue of resource coordination and scheduling within multi-agent systems, and the second is the limitation of the context window in large models, which hinders the practical application of agents in long-term conversational scenarios, urgently requiring improvements in memory capabilities. To address these challenges, we propose a collaborative scheduling strategy and hierarchical memory model for LLM-based multi-agent systems inspired by operating systems. First, we design a time-sharing scheduling strategy, analogous to process scheduling in operating systems, which divides the resource usage cycle into finer-grained single-step workflows, allocating independent resource windows to different agents to reduce resource contention and conflicts. Second, we introduce a hierarchical memory model based on the multi-level cache architecture of operating systems, segmenting the agents' memory into core memory, main memory, and vague memory areas, thereby significantly improving memory retention and retrieval efficiency in LLM-based agents when handling complex tasks. Experimental results demonstrate that our proposed method achieves efficient resource allocation in multi-agent systems while significantly enhancing the memory capabilities of agents and overall system performance.
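The core/main/vague segmentation works like a multi-level cache: hot items live in a small fast tier, and access promotes an item back up while overflow demotes the least-recently-used item down. A minimal sketch of that promotion/demotion policy (tier names and capacities are illustrative, not the paper's implementation):

```python
from collections import OrderedDict

class TieredMemory:
    """Toy cache-style hierarchy: 'core' (small, hot), 'main', 'vague'.
    Accessing an item promotes it to core; overflow demotes the
    least-recently-used item one tier down."""

    def __init__(self, core_cap=2, main_cap=3):
        self.tiers = [OrderedDict(), OrderedDict(), OrderedDict()]  # core, main, vague
        self.caps = [core_cap, main_cap, float("inf")]

    def put(self, key, value):
        self._insert(0, key, value)

    def _insert(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)                       # mark most-recent
        if len(tier) > self.caps[level]:
            old_key, old_val = tier.popitem(last=False)  # LRU demotion
            self._insert(level + 1, old_key, old_val)

    def get(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)          # promote on access
                return value
        return None

mem = TieredMemory()
for k in "abcde":
    mem.put(k, k.upper())
mem.get("a")                                         # pull 'a' back into core
print([list(t) for t in mem.tiers])
```

Demoted entries would, in a real agent, also be compressed (e.g., summarized) on their way to the "vague" tier; here they are passed through unchanged to keep the mechanics visible.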
… reference system for modelling spatial prepositions, the “state transition semantics” system … language sentences, the agent’s episodic memory system and associated reflective demons, …
… can fulfil the development requisites of Memory Assistants is the Multi-Agent System one [12]. The … Generally, agent platforms support the interaction between the agents by means of …
… On top of this platform a Personal Memory Assistant and a Social Enabler where developed … a distributed system approach is adequate for developing multi-agent systems for healthcare …
… Turing Machines and Key-Value Memory Networks to recent systems like AutoGen, … memory fabric. It also addresses work on dialogue and agent memory, multi-agent shared-memory …
Large Language Model (LLM)-based agents exhibit significant potential across various domains, operating as interactive systems that process environmental observations to generate executable actions for target tasks. The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs. We categorize memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. Instead, existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. Additionally, our analysis shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability.
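The subgoal-as-chunk idea above, keep verbatim pairs only for the current subgoal and collapse finished subgoals into summaries, can be sketched as follows. The summarizer is a stub standing in for an LLM call, and all names are illustrative rather than HiAgent's actual interface:

```python
def summarize(pairs):
    """Stand-in for an LLM summarizer: collapse a list of
    (action, observation) pairs into one line."""
    return f"[summary of {len(pairs)} steps, last obs: {pairs[-1][1]}]"

class ChunkedWorkingMemory:
    """Toy subgoal-chunked working memory: only the current subgoal's
    action-observation pairs are kept verbatim; completed subgoals are
    replaced by one-line summaries."""

    def __init__(self):
        self.summaries = []     # one entry per completed subgoal
        self.subgoal = None
        self.pairs = []         # verbatim steps for the current subgoal

    def start_subgoal(self, subgoal):
        if self.pairs:
            # Chunk boundary: compress the finished subgoal's steps
            self.summaries.append((self.subgoal, summarize(self.pairs)))
        self.subgoal, self.pairs = subgoal, []

    def record(self, action, observation):
        self.pairs.append((action, observation))

    def context(self):
        """What would be fed back to the LLM at the next step."""
        return self.summaries + [(self.subgoal, self.pairs)]

wm = ChunkedWorkingMemory()
wm.start_subgoal("find the key")
wm.record("open drawer", "drawer empty")
wm.record("look under mat", "key found")
wm.start_subgoal("open the door")
wm.record("insert key", "door unlocked")
print(wm.context())
```

The prompt length therefore grows with the number of subgoals rather than the number of raw steps, which is the source of the long-horizon savings the abstract reports.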
… dialogue systems enhanced with external or persistent memory [22]. Unlike traditional … have explored persistent memory, shared retrieval, or multi-agent coordination, the notion of …
… model agents operate without persistent memory, losing accumulated knowledge at every session boundary. Current approaches to agent memory… structure that makes memory useful. …
Background: Traditional AI coding assistants operate as single agents responding to immediate user requests, lacking persistence, organizational structure, and the ability to coordinate complex, long-running tasks. Existing multi-agent systems typically use ephemeral agents with flat architectures and no long-term memory. Objectives: We introduce Society Agent, a supervised multi-agent system that transforms standalone AI assistants into coordinated teams capable of autonomous, long-running work. The system's hierarchical architecture can model both human organizations (companies, departments, teams) and large software systems (modules, components, services). Methods: We design and implement a hierarchical agent architecture with persistent supervisors and ephemeral workers, integrating the Mind-Tool file-based memory system for persistent evolving knowledge. The system includes cron-based task scheduling, zero-token heartbeat monitoring, self-reconfiguration through folder reorganization, and a web dashboard for human oversight. Results: Our evaluation across three use cases (automated software development, organizational simulation, and self-reengineering systems) demonstrates that Society Agent successfully coordinates multiple agents across hierarchical departments while maintaining persistent knowledge and enabling autonomous task execution without continuous human intervention. Conclusions: Society Agent represents a paradigm shift from task-execution tools to organizational AI systems capable of modeling and eventually augmenting entire companies, departments, and development teams. The combination of hierarchical structure, persistent memory, and autonomous operation enables a new class of AI applications.
Large language model (LLM)-driven computer use agents (CUAs) automate graphical user interface (GUI) tasks but often re-solve previously encountered subtasks, increasing token use, latency, and instability. We address this limitation with a directed graph-based persistent memory in which nodes represent observable GUI states and edges encode executable action sequences. We formalize the memory-augmented agent as S=〈A,Σ,G,δ,π,Φ〉, define stability conditions by analogy with functional stability theory, and derive token-cost efficiency bounds. In control-theoretic terms, the Manager–Worker architecture becomes a closed-loop system where memory provides experience-based feedback, and selecting between memory retrieval and fresh LLM planning is treated as adaptive control. Experiments on OSWorld show that the proposed agent cuts both LLM token consumption and execution time by about 50% versus a memoryless baseline while preserving comparable success rates (≈36.9% on 15-step and ≈46.9% on 50-step tasks). Structured graph memory therefore improves robustness under perturbation and supports convergent efficiency gains over time.
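The graph memory and the retrieval-versus-planning choice described above can be sketched as a directed graph of GUI states with action sequences on the edges. A breadth-first lookup stands in for the paper's formal machinery, and the planner is a stub:

```python
class GuiMemoryGraph:
    """Toy persistent memory for a computer-use agent: nodes are
    observable GUI states, edges store executable action sequences."""

    def __init__(self):
        self.edges = {}   # state -> {next_state: action_sequence}

    def record(self, src, dst, actions):
        self.edges.setdefault(src, {})[dst] = actions

    def lookup_path(self, src, dst):
        """BFS over remembered states; returns concatenated actions or None."""
        frontier, seen = [(src, [])], {src}
        while frontier:
            state, actions = frontier.pop(0)
            if state == dst and actions:
                return actions
            for nxt, acts in self.edges.get(state, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + acts))
        return None

def act(memory, src, dst, plan_fn):
    """Adaptive-control flavor of the paper: replay memory when a
    known path exists, otherwise fall back to fresh LLM planning."""
    cached = memory.lookup_path(src, dst)
    return ("memory", cached) if cached else ("planner", plan_fn(src, dst))

g = GuiMemoryGraph()
g.record("desktop", "browser_open", ["click browser icon"])
g.record("browser_open", "settings_page", ["open menu", "click settings"])
print(act(g, "desktop", "settings_page", lambda s, d: ["<fresh plan>"]))
```

Replaying a cached path costs zero LLM tokens, which is where the reported ~50% token and latency savings come from; a production version would also verify each intermediate GUI state before executing the next edge.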
… In this work, we propose HippoRAG, a RAG framework that serves as a long-term memory … We model the three components of human long-term memory to mimic its pattern separation …
Large Language Models (LLMs) have significantly advanced Artificial Intelligence (AI), demonstrating impressive capabilities in language understanding, reasoning, and generation. However, their fixed context windows fundamentally limit their utility in sustained, complex human-computer interactions, leading to issues such as forgetting previous turns, lacking consistent personas, and an inability to perform long-horizon reasoning. While Retrieval-Augmented Generation (RAG) offers a promising solution by externalizing knowledge and providing LLMs with relevant information from external corpora, its traditional static retrieve-then-generate pipeline often struggles with dynamic knowledge integration, introduces noise, and overlooks structural relationships. This tutorial introduces the evolution from traditional RAG to advanced Long-Term Memory (LTM) mechanisms that equip LLM-based conversational agents with human-like memory capabilities. We will explore various LTM architectures, including textual, graph-based, and parametric memory, detailing their forms, operations (such as dynamic indexing, retrieval, updating, and consolidation), and multimodal integration strategies. The tutorial will cover cutting-edge systems (like Mem0), illustrating how they enable agents to maintain coherent conversations, personalize interactions, and perform complex reasoning over extended periods. We will also delve into evaluation benchmarks (e.g., LoCoMo and ZH-4O) as well as metrics that comprehensively assess these long-term memory capabilities. Finally, we will discuss current limitations and promising future research directions, particularly focusing on AI self-evolution, multimodal memory, and ethical considerations. This tutorial aims to provide a comprehensive understanding for researchers and practitioners interested in building the next generation of intelligent, memory-aware conversational AI agents.
Retrieval-Augmented Generation (RAG) models have revolutionized the way large language models (LLMs) access and utilize external knowledge. However, traditional RAG pipelines often rely on static retrieval mechanisms, which limit adaptability and degrade performance in long-context or evolving knowledge scenarios. This paper introduces Dynamic Memory Retrieval (DMR) — a framework designed to enhance RAG’s reasoning capabilities by enabling adaptive, context-aware retrieval across multi-scale memory hierarchies. We explore its theoretical foundations, architectural design, and implications for long-context reasoning, highlighting improvements in interpretability, stability, and efficiency.
Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given the LLM's diminished reasoning over extended context and its high computational cost, retrieval-based approaches remain a pivotal role in practice. However, traditional RAG methods could fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic nature of capturing interconnected relations within long-range context. In this work, we propose ComoRAG, holding the principle that narrative reasoning is not a one-shot process, but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to human cognition on reasoning with memory-related signals in the brain. Specifically, when encountering a reasoning impasse, ComoRAG undergoes iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to devise new exploratory paths, then integrates the retrieved evidence of new aspects into a global memory pool, thereby supporting the emergence of a coherent context for the query resolution. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines with consistent relative gains up to 11% compared to the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global comprehension, offering a principled, cognitively motivated paradigm for retrieval-based stateful reasoning.
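The iterative probe-retrieve-consolidate cycle described above can be reduced to a short control loop. The probe, retriever, and answerer below are stubs standing in for LLM and retrieval components; the function name and signature are illustrative:

```python
def comorag_style_loop(question, retrieve, probe, try_answer, max_cycles=3):
    """Toy iterative reasoning loop: each cycle issues a probing query
    to explore a new aspect, merges the retrieved evidence into a
    global memory pool, then re-attempts to answer from the pool."""
    memory_pool = []
    for cycle in range(max_cycles):
        query = probe(question, memory_pool)     # devise a new exploratory path
        memory_pool.extend(retrieve(query))      # consolidate new evidence
        answer = try_answer(question, memory_pool)
        if answer is not None:                   # impasse resolved
            return answer, cycle + 1
    return None, max_cycles

# Stub components standing in for the LLM and retriever.
corpus = {"who": ["Ana met Ben in ch.2"], "where": ["They met in Prague"]}
probe = lambda q, pool: "who" if not pool else "where"
retrieve = lambda query: corpus[query]
try_answer = lambda q, pool: "Prague" if any("Prague" in e for e in pool) else None

print(comorag_style_loop("Where did Ana meet Ben?", retrieve, probe, try_answer))
```

The key difference from single-shot RAG is that the memory pool persists across cycles, so later probes can build on evidence that earlier cycles already consolidated.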
Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answers, providing useful clues for the retrieval tools to locate relevant information within the long context. Second, it leverages an expensive but expressive system, which generates the final answer based on the retrieved information. Building upon this fundamental framework, we realize the memory module in the form of KV compression, and reinforce its memorization and cluing capacity from the Generation quality's Feedback (a.k.a. RLGF). In our experiments, MemoRAG achieves superior performances across a variety of long-context evaluation tasks, not only complex scenarios where traditional RAG methods struggle, but also simpler ones where RAG is typically applied.
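The dual-system, draft-then-final flow above, a light model drafts clue answers from a compressed global memory, and the clues steer retrieval for an expressive final model, can be sketched as a three-stage pipeline. All component names are stubs, not MemoRAG's API:

```python
def memorag_style_answer(task, global_memory, draft_fn, retrieve_fn, final_fn):
    """Toy dual-system pipeline: a light model drafts clue answers
    from a compressed global memory of the long context; the clues
    steer retrieval; an expressive model writes the final answer."""
    clues = draft_fn(task, global_memory)   # cheap, possibly rough draft
    evidence = retrieve_fn(clues)           # clue-guided retrieval
    return final_fn(task, evidence)         # expensive, precise generation

# Stub components; names and data are illustrative only.
memory = "summary: report covers Q3 revenue and churn"
chunks = {"revenue": "Q3 revenue was $12M", "churn": "churn fell to 2%"}
draft = lambda task, mem: ["revenue"] if "revenue" in task else ["churn"]
retrieve = lambda clues: [chunks[c] for c in clues]
final = lambda task, ev: f"Answer based on: {'; '.join(ev)}"

print(memorag_style_answer("What was Q3 revenue?", memory, draft, retrieve, final))
```

The design point is that the draft need not be correct, only suggestive enough to locate the right evidence, which is why a cheap KV-compressed memory model suffices for the first stage.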
… edge bases restricts adaptability and long-term scalability. This paper synthesizes recent literature on RAG system design, … Jiang et al., “Long term memory: The foundation of AI self- …
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graphs representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the Generative Semantic Workspace (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an Operator, which maps incoming observations to intermediate semantic structures, and a Reconciler, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG based baselines by up to 20%. Furthermore, GSW is highly efficient, reducing query-time context tokens by 51% compared to the next most token-efficient baseline, reducing inference time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
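The Operator/Reconciler split described above can be sketched minimally: the Operator maps an observation to a semantic structure, and the Reconciler merges it into a persistent per-entity workspace while enforcing temporal coherence. Field names and the coherence rule are simplified assumptions, not GSW's actual schema:

```python
def operator(observation):
    """Stand-in for GSW's Operator: map a raw observation to an
    intermediate semantic structure (entity, role, place, time)."""
    entity, role, place, time = observation
    return {"entity": entity, "role": role, "place": place, "time": time}

def reconcile(workspace, structure):
    """Stand-in for the Reconciler: merge a structure into the
    persistent workspace, keeping the latest state per entity and
    rejecting updates that would move an entity backwards in time."""
    prev = workspace.get(structure["entity"])
    if prev is None or structure["time"] >= prev["time"]:
        workspace[structure["entity"]] = structure
    return workspace

workspace = {}
for obs in [("Ana", "witness", "station", 1),
            ("Ana", "suspect", "harbor", 3),
            ("Ana", "witness", "station", 2)]:   # stale update, arrives late
    reconcile(workspace, operator(obs))
print(workspace["Ana"])
```

Because queries read the reconciled workspace rather than raw retrieved chunks, the query-time context stays small, which matches the token-efficiency claim in the abstract.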
Despite significant advances in natural language processing, conversational AI systems face persistent challenges in maintaining extensive and contextually coherent dialogues, particularly regarding long-term memory management. This literature review synthesizes current approaches to memory architectures in conversational AI, examining the transition from basic dialogue agents to more sophisticated, agentic frameworks. We analyze how vector databases and Retrieval-Augmented Generation (RAG) address fundamental challenges in storing and retrieving conversational context, maintaining system responsiveness, managing user-specific data ethically, and integrating domain-specific information. Through systematic review of papers, we identify critical limitations of vector embedding in capturing extended conversational context, particularly in agentic domains requiring semantic, episodic, procedural, and emotional memory. We evaluate how RAG frameworks can augment vector databases to handle memory-intensive tasks requiring real-time updates and domain-specific knowledge integration. Furthermore, we examine alternative architectures including knowledge graphs, finite state machines, and hybrid solutions, highlighting the data quality and ethical challenges that must be addressed for scalable, reliable AI memory management. Our analysis provides a structured framework for understanding memory evolution in conversational AI, identifies gaps in current RAG solutions, proposes hybrid memory designs, and outlines future research directions emphasizing cross-domain applications in agriculture.
Abstract Cognition entails those mental processes enabling understanding the current situation through senses, experience, and thought, and supporting the acquisition of new knowledge. A fundamental contribution in cognition is offered by the working memory, that is a small, short-term memory containing and protecting from interference goal-relevant pieces of information. Grounding our work on biological and neuroscientific studies, we modeled and implemented working memory processes in a software model, IDRA-WM, that can simultaneously act as short-term memory and actions generator, thanks to the use of a reinforcement-driven mechanism for chunk selection. Moreover our system integrates the functions of the working memory with a basic action planner. We tested the model with robot relevant tasks to assess whether the proposed solution can learn to solve a problem on the basis of a delayed reward. The experimental results indicate that IDRA-WM is able to solve even those tasks that do not provide immediate reward after an action.
Semantic memory plays a critical role in reasoning and decision making. It enables an agent to abstract useful knowledge learned from its past experience. Based on an extension of fusion adaptive resonance theory network, this paper presents a novel self-organizing memory model to represent and learn various types of semantic knowledge in a unified manner. The proposed model, called fusion adaptive resonance theory for multimemory learning, incorporates a set of neural processes, through which it may transfer knowledge and cooperate with other long-term memory systems, including episodic memory and procedural memory. Specifically, we present a generic learning process, under which various types of semantic knowledge can be consolidated and transferred from the specific experience encoded in episodic memory. We also identify and formalize two forms of memory interactions between semantic memory and procedural memory, through which more effective decision making can be achieved. We present experimental studies, wherein the proposed model is used to encode various types of semantic knowledge in different domains, including a first-person shooting game called Unreal Tournament, the Toads and Frogs puzzle, and a strategic game known as StarCraft Broodwar. Our experiments show that the proposed knowledge transfer process from episodic memory to semantic memory is able to extract useful knowledge to enhance the performance of decision making. In addition, cooperative interaction between semantic knowledge and procedural skills can lead to a significant improvement in both learning efficiency and performance of the learning agents.
… One of CORTEX’s main elements is a working memory designed as a graph-like data structure … The new working memory presents important advantages over existing designs that are …
… working memory, then that production fires and performs its actions, which consist of creating or removing elements from working memory … create representations in working memory of the …
Memory enables past experiences to be remembered and acquired as useful knowledge to support decision making, especially when perception and computational resources are limited. This paper presents a neuropsychological-inspired dual memory model for agents, consisting of an episodic memory that records the agent's experience in real time and a semantic memory that captures factual knowledge through a parallel consolidation process. In addition, the model incorporates a natural forgetting mechanism that prevents memory overloading by removing transient memory traces. Our experimental study based on a real-time first-person-shooter video game has indicated that the memory consolidation and forgetting processes are not only able to extract valuable knowledge and regulate the memory capacity, but they can mutually improve the effectiveness of learning the knowledge for the given task in hand. Interestingly, a moderate level of forgetting may even improve the task performance rather than disadvantaging it. We suggest that the interplay between rapid memory formation, consolidation, and forgetting processes points to a practical and effective approach for learning agents to acquire and maintain useful knowledge from experiences in a scalable manner.
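The consolidation-and-forgetting interplay described above can be sketched minimally: episodic traces decay each step, while traces rehearsed often enough are promoted to a semantic store. The class, thresholds, and decay schedule below are assumptions for illustration; the paper's actual model is built on adaptive resonance theory networks:

```python
class DualMemory:
    """Episodic-to-semantic consolidation with natural forgetting
    (minimal sketch; thresholds and decay are illustrative)."""

    def __init__(self, decay=0.9, consolidate_at=3, forget_below=0.1):
        self.decay = decay                    # per-step trace decay
        self.consolidate_at = consolidate_at  # rehearsals needed to consolidate
        self.forget_below = forget_below      # strength below which traces vanish
        self.episodic = {}                    # trace -> (strength, rehearsals)
        self.semantic = set()                 # consolidated factual knowledge

    def record(self, trace):
        """Real-time episodic recording; rehearsal refreshes strength."""
        _, count = self.episodic.get(trace, (0.0, 0))
        self.episodic[trace] = (1.0, count + 1)

    def step(self):
        """Parallel consolidation pass plus decay-based forgetting."""
        survivors = {}
        for trace, (strength, count) in self.episodic.items():
            if count >= self.consolidate_at:
                self.semantic.add(trace)     # promote well-rehearsed traces
                continue
            strength *= self.decay           # natural forgetting
            if strength >= self.forget_below:
                survivors[trace] = (strength, count)
        self.episodic = survivors
```

The forgetting pass is what regulates capacity: one-off traces drain away without ever reaching the semantic store, which mirrors the paper's finding that moderate forgetting can help rather than hurt.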
Vector database management systems have emerged as an important component in modern data management, driven by the growing need to computationally describe rich data such as text, images, and video in domains such as recommender systems, similarity search, and chatbots. These data descriptions are captured as numerical vectors that are computationally inexpensive to store and compare. However, the unique characteristics of vectorized data, including high dimensionality and sparsity, demand specialized solutions for efficient storage, retrieval, and processing. This narrative literature review provides an accessible introduction to the fundamental concepts, use cases, and current challenges associated with vector database management systems, offering an overview for researchers and practitioners seeking to facilitate effective vector data management.
Feature vectors are now mission-critical for many applications, including retrieval-based large language models (LLMs). Traditional database management systems are not equipped to deal with the unique characteristics of feature vectors, such as the vague notion of semantic similarity, the large size of vectors, expensive similarity comparisons, the lack of indexable structure, and the difficulty of answering "hybrid" queries that combine structured attributes with feature vectors. A number of vector database management systems (VDBMSs) have been developed to address these challenges, combining novel techniques for query processing, storage and indexing, and query optimization and execution, culminating in a spectrum of performance and accuracy characteristics and capabilities. In this tutorial, we review the existing vector database management techniques and systems. For query processing, we review similarity score design and selection, vector query types, and vector query interfaces. For storage and indexing, we review various indexes and discuss compression as well as disk-resident indexes. For query optimization and execution, we review hybrid query processing, hardware acceleration, and distributed search. We then review existing systems, search engines and libraries, and benchmarks. Finally, we present research challenges and open problems.
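Of the hybrid-query strategies the tutorial surveys, pre-filtering is the simplest to sketch: structured predicates prune the candidates first, then exact similarity ranks the survivors. The brute-force cosine scoring below stands in for an ANN index; the function names are illustrative:

```python
import math

def cosine(a, b):
    """Exact cosine similarity (an ANN index would approximate this)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_search(items, query_vec, predicate, k=2):
    """Pre-filtering hybrid query: structured predicate first,
    then similarity ranking over the surviving candidates."""
    candidates = [it for it in items if predicate(it["attrs"])]
    candidates.sort(key=lambda it: cosine(it["vec"], query_vec), reverse=True)
    return candidates[:k]
```

Post-filtering (rank first, filter afterwards) and single-stage filtered-ANN indexes are the alternatives; pre-filtering stays exact but degenerates to a linear scan when the predicate is unselective.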
In this paper, we provide a review of the current efforts to develop LLM agents, which are autonomous agents that leverage large language models. We examine the memory management approaches used in these agents. One crucial aspect of these agents is their long-term memory, which is often implemented using vector databases. We describe how vector databases are utilized to store and retrieve information in LLM agents. Moreover we highlight open problems, such as the separation of different types of memories and the management of memory over the agent's lifetime. Lastly, we propose several topics for future research to address these challenges and further enhance the capabilities of LLM agents, including the use of metadata in procedural and semantic memory and the integration of external knowledge sources with vector databases.
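The open problem of separating memory types can be sketched by tagging each stored record with a kind (episodic, semantic, procedural) and filtering on it at retrieval time, much as one would use metadata filters in a vector database. The word-overlap similarity below is a stand-in for real embeddings; all names are illustrative:

```python
def overlap(a, b):
    """Word-overlap (Jaccard) similarity — a stand-in for embeddings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class TypedMemoryStore:
    """Memories tagged by kind, so retrieval can target one memory type."""

    def __init__(self):
        self.records = []                    # list of (kind, text)

    def add(self, kind, text):
        self.records.append((kind, text))

    def search(self, query, kind=None, k=1):
        pool = [(overlap(query, text), text)
                for rec_kind, text in self.records
                if kind is None or rec_kind == kind]
        pool.sort(reverse=True)
        return [text for _, text in pool[:k]]
```

In a production VDBMS the `kind` tag would live in the record's metadata payload, letting the same index serve all memory types while keeping retrieval scoped.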
Current large language model (LLM) agents face the challenges of high inference cost and low decision quality when dealing with complex tasks, and are especially deficient in maintaining context coherence during long tasks. This research presents an innovative vector-storage long-term memory mechanism (VIMBank) to enhance the long-term context retention and task execution efficiency of LLM agents by storing and retrieving historical interaction data through a vector database. VIMBank employs a dynamic memory-updating strategy together with the Ebbinghaus forgetting-curve theory to manage agent memory efficiently: critical information is reinforced, unimportant data is forgotten, and storage and reasoning costs are optimized. The experimental results show that VIMBank significantly improves the decision quality and efficiency of LLM agents in multi-tasking scenarios while reducing computational cost. Compared with baseline agents, the task-decision success rate increases by 10% to 20% and the reasoning cost is reduced by about 23%, providing a theoretical basis and practical support for the future development of agents with long-term memory and adaptive learning abilities.
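The Ebbinghaus-style retention score underlying such a mechanism is commonly modeled as R = exp(-t / S), with elapsed time t since last access and a stability S that grows with each retrieval. The sketch below applies that score as an eviction rule; the thresholds and field names are assumptions, not VIMBank's internals:

```python
import math

class ForgettingStore:
    """Ebbinghaus-style retention R = exp(-t / S): t is time since last
    access, S a stability that grows with each retrieval. Thresholds and
    names are illustrative, not VIMBank's internals."""

    def __init__(self, forget_below=0.2):
        self.forget_below = forget_below
        self.items = {}                  # key -> [stability, last_access]

    def put(self, key, now):
        self.items[key] = [1.0, now]

    def retrieve(self, key, now):
        self.items[key][0] += 1.0        # rehearsal strengthens the trace
        self.items[key][1] = now

    def retention(self, key, now):
        stability, last = self.items[key]
        return math.exp(-(now - last) / stability)

    def sweep(self, now):
        """Forget items whose retention has dropped below threshold."""
        self.items = {k: v for k, v in self.items.items()
                      if math.exp(-(now - v[1]) / v[0]) >= self.forget_below}
```

Retrieval thus doubles as reinforcement: frequently accessed memories flatten their forgetting curve and survive sweeps, while untouched ones decay out, bounding storage cost.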
… In the retrieval-augmented generation (RAG) paradigm, a language model’s parametric knowledge is supplemented with non-parametric memory from an external vector database. This …
This paper addresses the software engineering challenges of integrating autonomous agents into production-grade web applications. While traditional implementations suffer from high latency and state synchronization issues, this study presents a full-stack solution based on TypeScript and React 19 Server Components. The paper details the implementation of a RAMP (Reflect, Act, Memory, Plan) execution loop at the code level, using Qdrant for low-latency (<100 ms) vector retrieval and Next.js for server-side orchestration. A key engineering contribution is the development of a strictly typed data contract that synchronizes server-side agent reasoning with client-side state management (via TanStack Query). Experimental results confirm that this stack architecture significantly reduces response times and prevents runtime type errors, offering a reproducible pattern for building scalable, high-load web platforms.
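One plausible ordering of the four RAMP phases can be sketched as a single loop step: recall from memory, plan, act, persist the observation, then reflect to produce the next state. The Python below is an interface sketch only, with a trivial exact-match store standing in for Qdrant; the paper's server-side TypeScript implementation is not reproduced here:

```python
class ListMemory:
    """Trivial stand-in for a vector store: exact-match recall."""

    def __init__(self):
        self.log = []                        # list of (state, observation)

    def search(self, state):
        return [obs for s, obs in self.log if s == state]

    def add(self, state, observation):
        self.log.append((state, observation))

def ramp_step(state, memory, plan, act, reflect):
    """One loop iteration over the four RAMP phases (ordering assumed)."""
    context = memory.search(state)           # recall relevant memories
    goal = plan(state, context)              # Plan: decide what to pursue
    observation = act(goal)                  # Act: execute and observe
    memory.add(state, observation)           # Memory: persist the outcome
    return reflect(state, observation)       # Reflect: produce next state
```

Keeping each phase a plain function makes the loop easy to type strictly, which is the data-contract idea the paper pursues on its TypeScript stack.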
… records, vector memory, and audit logs into uncoordinated silos. We present AEVUM, an agent-native embedded database system that inverts this model. In AEVUM the AI agent owns, …
Overall, the literature can be organized along the memory lifecycle and its system realization into five parallel threads: 1) chunking, selection, and hierarchical management of in-session working memory; 2) cognition-inspired long-term memory formation, consolidation and forgetting, and semantic/episodic interaction representations; 3) the evolution from RAG toward long-term memory for long contexts: dynamic retrieval and multi-turn updating; 4) engineering-oriented memory operating systems and memory fabrics built on vector-database foundations; 5) persistent, distributed memory management and stability control in multi-agent and interactive task scenarios.