Agent
LLM-Based Agent Architectures, Task Planning, and Collaboration
This group of papers explores using large language models (LLMs) as the core "brain" of an agent, handling complex cognitive tasks through prompt engineering, automated planning, memory mechanisms, and tool calling. It also covers collaboration paradigms among multiple agents and applications in scientific research and vertical domains.
- DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences(Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai, 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design(Zhihan Liu, Yubo Chai, Jianfeng Li, 2024, Journal of Chemical Information and Modeling)
- EvoMDT: a self-evolving multi-agent system for structured clinical decision-making in multi-cancer.(Qicai Liu, Zhichao Hu, Tao Huang, Yupeng Niu, Xinche Zhang, Shanwu Ma, Chutong Lin, Goh Kim Huat, Hyeokkoo Eric Kwon, Feng Gao, Xianfu Sun, Zhitao Ying, Guangliang Qiang, 2026, NPJ digital medicine)
- AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environments(Nan Sun, Bo Mao, Yongchang Li, Lumeng Ma, Di Guo, Huaping Liu, 2024, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems(Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-rong Wen, 2023, Proceedings of the ACM Web Conference 2024)
- AuditAgent: LLM Agent for Risks Auditing in Recommender Systems(Du Su, Zhenxing Chen, Shilong Zhao, Yuanhao Liu, Fei Sun, Qi Cao, Huawei Shen, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- Electromagnetic metamaterial agent.(Shengguo Hu, Mingyi Li, Jiawen Xu, Hongrui Zhang, Shanghang Zhang, Tie Jun Cui, Philipp Del Hougne, Lianlin Li, 2025, Light, science & applications)
- Merging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Powered Autonomous Agents(Ankita Sharma, 2024, Journal of Artificial Intelligence General Science (JAIGS))
- AdaptJobRec: Enhancing Conversational Career Recommendation through an LLM-Powered Agentic System(Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, B. Aoun, Greg Hayworth, Han Li, Xintao Wu, 2025, AAAI Conference on Artificial Intelligence)
- Watson: A Cognitive Observability Framework for the Reasoning of LLM-Powered Agents(Benjamin Rombaut, Sogol Masoumzadeh, Kirill Vasilevski, Dayi Lin, Ahmed E. Hassan, 2024, 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE))
- A Governance Framework For Agentic AI: Mitigating Systemic Risks In LLM-Powered Multi-Agent Architectures(Annapurneswar Putrevu, 2025, Journal of International Crisis and Risk Communication Research)
- RecMind: Large Language Model Powered Agent For Recommendation(Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Yingzhen Yang, 2023, Findings of the Association for Computational Linguistics: NAACL 2024)
- Plugin Framework-Based Neuro-Symbolic Grounded Task Planning for Multi-Agent System.(Jiyoun Moon, 2021, Sensors (Basel, Switzerland))
- Agent-vs-Agent Cyber Warfare: Autonomous AI Systems Defending Against AI-Enabled APTs(Dr. Salman Arafath Mohammed, 2025, International Journal on Advanced Computer Engineering and Communication Technology)
- A Hybrid Large Vision Model Powered GUI Agent for Walmart Myassistant Application(Qixin Wang, Puneet Girdhar, Yaowei Hu, Dawei Wang, Yifan Wang, Chaitanya Devella, Ayush Dwivedi, Wen Guo, Terrence Liu, Manmohan Dogra, Kun Chen, Yuhao Zheng, Shangwen Huang, Swati Pandey, Balram Mirani, Diwash Pokharel, Greg Hayworth, Han Li, Xintao Wu, 2025, 2025 IEEE International Conference on Big Data (BigData))
- Integrating large language model-based agents into a virtual patient chatbot for clinical anamnesis training.(Nicolas Laverde, Christian Grévisse, Sandra Jaramillo, Ruben Manrique, 2025, Computational and structural biotechnology journal)
- Empowering biomedical discovery with AI agents.(Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik, 2024, Cell)
- Exploring Applicability of LLM-Powered Autonomous Agents to Solve Real-life Problems: Microsoft Entra ID Administration Agent (MEAN)(Roberto Rodriguez, Nestori Syynimaa, 2024, Proceedings of the 26th International Conference on Enterprise Information Systems)
- Intelligent Agent Planning for Optimizing Parallel MRI Reconstruction via A Large Language Model.(Yuchou Chang, Zhiqiang Li, Huy Anh Pham, Gulfam Ahmed Saju, 2024, Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC))
- LLM-based multi-agent system for autonomous maintenance process of machine tools(Jongsu Park, Seongwoo Cho, Yoonji Chae, Sena Nur Durgunlu, J. Um, 2026, PHM Society Asia-Pacific Conference)
- Artificial Intelligence Agents for Materials Sciences.(O N Oliveira, L Christino, M C F Oliveira, F V Paulovich, 2023, Journal of chemical information and modeling)
- Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology(Dyke Ferber, O. E. El Nahhas, G. Wölflein, I. Wiest, J. Clusmann, M. Leßmann, S. Foersch, Jacqueline Lammert, Maximilian Tschochohei, Dirk Jaeger, M. Salto-Tellez, N. Schultz, D. Truhn, J. Kather, 2025, Nature Cancer)
- A Taxonomy for Autonomous LLM-Powered Multi-Agent Architectures(Thorsten Händler, 2023, Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management)
- MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools(Zikang Guo, Benfeng Xu, Chiwei Zhu, Wentao Hong, Xiaorui Wang, Zhendong Mao, 2025, AAAI Conference on Artificial Intelligence)
- First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution(Yihao Zhang, Qizhi Qiu, Xiaomin Liu, Dianxuan Fu, Xingyu Liu, Leyan Fei, Yuming Cheng, Lilin Yi, Weisheng Hu, Q. Zhuge, 2025, 2025 European Conference on Optical Communications (ECOC))
- Leveraging Multi-agent System Powered by Large Language Model to Improve Transparency and Reliability in Automated Supply Chain Coordination(Xianxian Zhao, Wenyi Kuang, Yong-Woo Kim, 2025, Annual Conference of the International Group for Lean Construction)
- Enhancing Neural Architecture Search with LLM-Based Autonomous Agents for Efficient Network Optimisation(Qifan Chen, 2025, 2025 5th International Conference on Artificial Intelligence, Automation and High Performance Computing (AIAHPC))
- Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance(Ya-Ting Lu, Shenzhi Yang, Cheng Qian, Gui-Fang Chen, Qinyu Luo, Yesai Wu, Huadong Wang, X. Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun, 2024, International Conference on Learning Representations)
- SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World(Jiaqi Zhang, Chen Gao, Liyuan Zhang, Quoc Viet Hung Nguyen, Hongzhi Yin, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- Blueprint2Code: a multi-agent pipeline for reliable code generation via blueprint planning and repair.(Kehao Mao, Baokun Hu, Ruixin Lin, Zewen Li, Guanyu Lu, Zhengyu Zhang, 2025, Frontiers in artificial intelligence)
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models(Chenliang Li, Hehong Chen, Mingshi Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou, 2023, Conference on Empirical Methods in Natural Language Processing)
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation(Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su, 2024, Neural Networks)
- Accelerating earth science discovery via multi-agent LLM systems.(Dmitrii Pantiukhin, Boris Shapkin, Ivan Kuznetsov, Antonia Anna Jost, Nikolay Koldunov, 2025, Frontiers in artificial intelligence)
- LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing(Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi, 2024, Proceedings of the 29th International Conference on Intelligent User Interfaces)
- An Intelligent LLM-Powered Personalized Assistant for Digital Banking Using LangGraph and Chain of Thoughts(Md. Easin Arafat, Sourav Saha, Tamás Orosz, 2024, 2024 IEEE 22nd Jubilee International Symposium on Intelligent Systems and Informatics (SISY))
- MASTER: A Multi-Agent System with LLM Specialized MCTS(Bingzheng Gan, Yufan Zhao, Tianyi Zhang, Jing Huang, Yusu Li, Shu Xian Teo, Changwang Zhang, Wei Shi, 2025, North American Chapter of the Association for Computational Linguistics)
- ChatEDA: A Large Language Model Powered Autonomous Agent for EDA(Haoyuan Wu, Zhuolun He, Xinyun Zhang, Xufeng Yao, Su Zheng, Haisheng Zheng, Bei Yu, 2023, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)
- Reflective Multi-Agent Collaboration based on Large Language Models(Xiaohe Bo, Zeyu Zhang, Quanyu Dai, Xueyang Feng, Lei Wang, Rui Li, Xu Chen, Ji-Rong Wen, 2024, Advances in Neural Information Processing Systems 37)
- LangGraph-Orchestrated LLM Agents for Scalable Movie Knowledge Graphs and Question Answering(A. Kaplunovich, 2025, International Conference on AI Research)
- ProactiveVA: Proactive Visual Analytics with LLM-Based UI Agent.(Yuheng Zhao, Xueli Shu, Liwen Fan, Lin Gao, Yu Zhang, Siming Chen, 2026, IEEE transactions on visualization and computer graphics)
- Intelligent agent based on large language model(Jiaxin Li, Fang He, Haojie Hu, Jianwei Zhao, Fenggan Zhang, 2025, Eighth International Conference on Artificial Intelligence and Pattern Recognition (AIPR 2025))
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks(Ruofan Lu, Yichen Li, Yintong Huo, 2025, 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE))
- Autonomous LLM Agent: A Memory-Augmented, Edge-Optimized SHAP Explanations With Zero-Day Attack Resilience in IoT/Industrial IoT Networks(Y. Saheed, J. Chukwuere, 2026, IEEE Internet of Things Journal)
- An autonomous GIS agent framework for geospatial data retrieval(H. Ning, Zhenlong Li, Temitope Akinboyewa, M. Lessani, 2024, International Journal of Digital Earth)
- GIS Copilot: towards an autonomous GIS agent for spatial analysis(Temitope Akinboyewa, Zhenlong Li, H. Ning, M. Lessani, 2024, International Journal of Digital Earth)
- Multi-agent systems powered by large language models: applications in swarm intelligence(Cristian Jimenez-Romero, Alper Yegenoglu, Christian Blum, 2025, Frontiers in Artificial Intelligence)
- tagE: Enabling an Embodied Agent to Understand Human Instructions(Chayan Sarkar, Avik Mitra, Pradip Pramanick, Tapas Nayak, 2023, Conference on Empirical Methods in Natural Language Processing)
- ChatExosome: An Artificial Intelligence (AI) Agent Based on Deep Learning of Exosomes Spectroscopy for Hepatocellular Carcinoma (HCC) Diagnosis.(Zhejun Yang, Tongtong Tian, Jilie Kong, Hui Chen, 2025, Analytical Chemistry)
- ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models(Y. Kang, Jihan Kim, 2024, Nature Communications)
- LLM-Powered AI Agents for Autonomous Optical Networks: Recent Advances and Field Trial Demonstrations [Invited](Qizhi Qiu, Yihao Zhang, Xiaomin Liu, L. Yi, Weisheng Hu, Q. Zhuge, 2025, 2025 Asia Communications and Photonics Conference (ACP))
- SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models(S. S. Kannan, Vishnunandan L. N. Venkatesh, Byung-Cheol Min, 2023, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Large Language Model Agent for Modular Task Execution in Drug Discovery.(Janghoon Ock, Radheesh Sharma Meda, Srivathsan Badrinarayanan, Neha S Aluru, Achuth Chandrasekhar, Amir Barati Farimani, 2026, Journal of chemical information and modeling)
- A Multiagent-Driven Robotic AI Chemist Enabling Autonomous Chemical Research On Demand.(Tao Song, Man Luo, Xiaolong Zhang, Linjiang Chen, Yan Huang, Jiaqi Cao, Qing Zhu, Daobin Liu, Baicheng Zhang, Gang Zou, Guoqing Zhang, Fei Zhang, Weiwei Shang, Yao Fu, Jun Jiang, Yi Luo, 2025, Journal of the American Chemical Society)
- Sura.ai: Multi-Agent Infrastructure Recovery with LLM-Powered Autonomous Remediation(Ananya Arvind, Shruthi Narayanan, Saishriya Narayanan, 2026, Proceedings of the 18th International Conference on Agents and Artificial Intelligence)
- AI-HOPE: an AI-driven conversational agent for enhanced clinical and genomic data integration in precision medicine research(E.-W. Yang, E. Velazquez-Villarreal, 2025, Bioinformatics)
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub(Bohan Lyu, X. Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Ya-Ting Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun, 2023, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- SciToolAgent: a knowledge-graph-driven scientific agent for multitool integration(Keyan Ding, Jing Yu, Junjie Huang, Yuchen Yang, Qiang Zhang, Huajun Chen, 2025, Nature Computational Science)
- CACTUS: Chemistry Agent Connecting Tool Usage to Science.(Andrew D McNaughton, Gautham Krishna Sankar Ramalaxmi, Agustin Kruel, Carter R Knutson, Rohith A Varikoti, Neeraj Kumar, 2024, ACS omega)
- LLM Agents for Smart City Management: Enhancing Decision Support Through Multi-Agent AI Systems(A. Kalyuzhnaya, Sergey Mityagin, E. Lutsenko, Andrey Getmanov, Yaroslav Aksenkin, Kamil Fatkhiev, Kirill Fedorin, Nikolay O. Nikitin, Natalia Chichkova, V. Vorona, A. Boukhanovsky, 2025, Smart Cities)
- First Field Trial of LLM-Powered AI Agent for Lifecycle Management of Autonomous Driving Optical Networks(Xiaomin Liu, Qizhi Qiu, Yihao Zhang, Yuming Cheng, L. Yi, Weisheng Hu, Q. Zhuge, 2024, Optical Fiber Communications Conference and Exhibition)
- Enhancing LLMs for Power System Simulations: A Feedback-Driven Multi-Agent Framework(Mengshuo Jia, Zeyu Cui, Gabriela Hug, 2024, IEEE Transactions on Smart Grid)
- Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments(Seungjun Han, Wongyung Choi, 2024, Advances in Artificial Intelligence and Machine Learning)
- Autonomous Analysis of Curated Patient Data Using a Large Language Model-Based Multiagent Framework.(Jiasheng Wang, David M Swoboda, Aziz Nazha, 2025, JCO clinical cancer informatics)
- Motif: Intrinsic Motivation from Artificial Intelligence Feedback(Martin Klissarov, P. D'Oro, Shagun Sodhani, R. Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff, 2023, International Conference on Learning Representations)
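The agent loop common to many of the papers above can be sketched as a ReAct-style cycle: the LLM "brain" alternates reasoning and tool calls, with a memory of past steps fed back into each prompt. This is a minimal illustrative sketch, not the method of any listed paper; the `llm()` stub stands in for a real model call, and the tool names and stopping condition are assumptions.

```python
# Minimal sketch of an LLM agent loop: plan -> act via tool -> observe -> remember.
# llm() is a hard-coded stub so the sketch runs offline; a real agent would
# call a model API here. Tool names are illustrative.

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr)),  # toy tool: arithmetic
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown"),
}

def llm(prompt: str) -> str:
    """Stub for the LLM core: look up first, then answer from the observation."""
    if "Observation:" not in prompt:
        return "Action: lookup[capital of France]"
    return "Final Answer: Paris"

def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[str] = [f"Task: {task}"]  # episodic memory of the run
    for _ in range(max_steps):
        reply = llm("\n".join(memory))     # the whole memory is the prompt
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[argument]" and execute the named tool.
        name, arg = reply.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = TOOLS[name](arg)
        memory += [reply, f"Observation: {observation}"]  # feed result back
    return "gave up"

print(run_agent("What is the capital of France?"))  # → Paris
```

Real systems in this cluster differ mainly in what fills each slot: the planner (chain-of-thought, MCTS as in MASTER), the memory store, and the tool registry (e.g. MCP-mediated tools in MCP-AgentBench).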
Multi-Agent Reinforcement Learning (MARL) and Coordinated Optimization of Complex Systems
This group of papers focuses on applying MARL to cooperation problems in large-scale complex systems, emphasizing distributed decision-making, high-dimensional state spaces, communication protocols, resource scheduling, and performance optimization in dynamic environments such as transportation, logistics, and energy.
- Hierarchical deep reinforcement learning for self-adaptive economic dispatch.(Mengshi Li, Dongyan Yang, Yuhan Xu, Tianyao Ji, 2024, Heliyon)
- Integration of an agent-based strategic planner in an enterprise service bus ecosystem(Adriano Ferreira, Arnaldo Pereira, N. Rodrigues, José Barbosa, P. Leitão, 2015, 2015 IEEE 13th International Conference on Industrial Informatics (INDIN))
- Deep Reinforcement Learning-Based Large-Scale Robot Exploration(Yuhong Cao, Rui Zhao, Yizhuo Wang, Bairan Xiang, Guillaume Sartoretti, 2024, IEEE Robotics and Automation Letters)
- CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario(Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, Z. Li, 2019, The World Wide Web Conference)
- Artificial agent: The fusion of artificial intelligence and a mobile agent for energy-efficient traffic control in wireless sensor networks(Jiayi Lu, Luanye Feng, Jun Yang, Mohammad Mehedi Hassan, Abdulhameed Alelaiwi, I. Humar, 2019, Future Generation Computer Systems)
- Multiple Autonomous Underwater Vehicles-Assisted Data Collection in 6G-Driven Underwater Wireless Networks Based on Software-Defined MARL(Chuan Lin, Yu Zhang, Guangjie Han, Chang Lu, Shengchao Zhu, Shengbo Wang, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Cooperative Multi-Agent Reinforcement Learning for Large Scale Variable Speed Limit Control(Yuhang Zhang, Marcos Quiñones-Grueiro, William Barbour, Zhiyao Zhang, Joshua Scherer, Gautam Biswas, D. Work, 2023, 2023 IEEE International Conference on Smart Computing (SMARTCOMP))
- Decentralized Multi-agent Reinforcement Learning for Large-scale Mobile Wireless Sensor Network Control Using Mean Field Games(Zejian Zhou, Lijun Qian, Hao Xu, 2024, 2024 33rd International Conference on Computer Communications and Networks (ICCCN))
- Adaptive Wireless Network Management with Multi-Agent Reinforcement Learning.(Ameer Ivoghlian, Zoran Salcic, Kevin I-Kai Wang, 2022, Sensors (Basel, Switzerland))
- Multi-Agent Hierarchical Graph Attention Actor-Critic Reinforcement Learning.(Tongyue Li, Dianxi Shi, Songchang Jin, Zhen Wang, Huanhuan Yang, Yang Chen, 2024, Entropy (Basel, Switzerland))
- MARVEL: Multi-Agent Reinforcement Learning for Constrained Field-of-View Multi-Robot Exploration in Large-Scale Environments(Jimmy Chiun, Shizhe Zhang, Yizhuo Wang, Yuhong Cao, G. Sartoretti, 2025, 2025 IEEE International Conference on Robotics and Automation (ICRA))
- Multi-Agent Mix Hierarchical Deep Reinforcement Learning for Large-Scale Fleet Management(Xiaohui Huang, Jiahao Ling, Xiaofei Yang, Xiong Zhang, Kaiming Yang, 2023, IEEE Transactions on Intelligent Transportation Systems)
- Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning(Han Qie, Dian-xi Shi, Tianlong Shen, Xinhai Xu, Yuan Li, Liujing Wang, 2019, IEEE Access)
- Nonzero-Sum Game Reinforcement Learning for Performance Optimization in Large-Scale Industrial Processes.(Jinna Li, Jinliang Ding, Tianyou Chai, Frank L Lewis, 2020, IEEE transactions on cybernetics)
- Coverage Optimization for Large-Scale Mobile Networks With Digital Twin and Multi-Agent Reinforcement Learning(Haoqiang Liu, Tong Li, Fenyu Jiang, Weikang Su, Zhaocheng Wang, 2024, IEEE Transactions on Wireless Communications)
- Evolutionary optimization for risk-aware heterogeneous multi-agent path planning in uncertain environments.(Fatemeh Rekabi Bana, Tomáš Krajník, Farshad Arvin, 2024, Frontiers in robotics and AI)
- MAARS: Multiagent Actor-Critic Approach for Resource Allocation and Network Slicing in Multiaccess Edge Computing.(Ducsun Lim, Inwhee Joe, 2024, Sensors (Basel, Switzerland))
- Federated reinforcement learning with constrained markov decision processes and graph neural networks for fair and grid-constrained coordination of large-scale electric vehicle charging networks.(Lixia Zhou, Dawei Huo, Jian Chen, Bo Bo, Hao Li, 2025, Scientific reports)
- Local Distribution Voltage Control Using Large-Scale Coordinated PV Inverters: A Novel Multi-Agent Deep Reinforcement Learning-Based Approach(Yinfan Wang, Weihao Hu, Di Cao, Pengfei Zhao, Sayed Abulanwar, Zhe Chen, F. Blaabjerg, 2025, IEEE Transactions on Smart Grid)
- Development of a Dynamic Path Planning System for Autonomous Mobile Robots Using a Multi-Agent System Approach.(Bradley Fourie, Louis Louw, Günter Bitsch, 2025, Sensors (Basel, Switzerland))
- Enhancing Underwater IoT Security: A Collaborative Pursuit Strategy Using Multi-Agent Reinforcement Learning(Yun Hou, Guangjie Han, Fan Zhang, Chuan Lin, 2024, IEEE Internet of Things Magazine)
- Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning.(Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai, 2021, IEEE transactions on cybernetics)
- Multi-Agent Reinforcement Learning for Resource Allocation in Large-Scale Robotic Warehouse Sortation Centers(Yi Shen, Benjamin McClosky, Joseph W. Durham, M. Zavlanos, 2023, 2023 62nd IEEE Conference on Decision and Control (CDC))
- Intelligent multicast routing method based on multi-agent deep reinforcement learning in SDWN.(Hongwen Hu, Miao Ye, Chenwei Zhao, Qiuxiang Jiang, Xingsi Xue, 2023, Mathematical biosciences and engineering : MBE)
- A graph-based safe reinforcement learning method for multi-agent cooperation.(Fandi Gou, Haikuo Du, Yunze Cai, 2026, Neural networks : the official journal of the International Neural Network Society)
- Multi-agent Motion Planning from Signal Temporal Logic Specifications(Dawei Sun, Jingkai Chen, S. Mitra, Chuchu Fan, 2022, IEEE Robotics and Automation Letters)
- A Multi Agent Based Approach for Prehospital Emergency Management.(Reza Safdari, Jaleh Shoshtarian Malak, Niloofar Mohammadzadeh, Azimeh Danesh Shahraki, 2017, Bulletin of emergency and trauma)
- Underwater Target Tracking Based on Hierarchical Software-Defined Multi-AUV Reinforcement Learning: A Multi-AUV Advantage-Attention Actor-Critic Approach(Shengchao Zhu, Guangjie Han, Chuan Lin, Qiuzi Tao, 2024, IEEE Transactions on Mobile Computing)
- Fuzzy Knowledge-Based Hierarchical Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems.(Dingbang Liu, Fenghui Ren, Jun Yan, Guoxin Su, Wen Gu, Shohei Kato, 2025, IEEE transactions on cybernetics)
- Multi-agent reinforcement learning driven resource game optimization for network slicing in MEC-enabled HetNets.(Kai Mao, 2025, Scientific reports)
- UAV Swarm Cooperative Target Search: A Multi-Agent Reinforcement Learning Approach(Yukai Hou, Jin Zhao, Rongqing Zhang, Xiang Cheng, Liuqing Yang, 2024, IEEE Transactions on Intelligent Vehicles)
- Multi-Agent Software Development for Digital Image Processing(V. E. Bolshakov, A. N. Alfimtsev, 2024, Informatization and communication)
- Combinatorial Auction Method for Decentralized Task Assignment of Multiple-Loading Capacity AGV Based on Intelligent Agent Architecture(M. Fauadi, Wan-Ling Li, T. Murata, 2011, 2011 Second International Conference on Innovations in Bio-inspired Computing and Applications)
- Large-Scale Computation Offloading Using a Multi-Agent Reinforcement Learning in Heterogeneous Multi-Access Edge Computing(Zhen Gao, Lei Yang, Yu Dai, 2023, IEEE Transactions on Mobile Computing)
- Optimizing Large-Scale Fleet Management on a Road Network using Multi-Agent Deep Reinforcement Learning with Graph Neural Network(Juhyeon Kim, 2020, 2021 IEEE International Intelligent Transportation Systems Conference (ITSC))
- Engineering A Large-Scale Traffic Signal Control: A Multi-Agent Reinforcement Learning Approach(Yue Chen, Changle Li, Wenwei Yue, Hehe Zhang, Guoqiang Mao, 2021, IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS))
- TrafficGen: A Flexible Tool For Informing Agent-Based Traffic Simulations With Open Data(Alexandre Bonhomme, P. Mathieu, S. Picault, 2015, Lecture Notes in Computer Science)
- Underwater Equipotential Line Tracking Based on Self-Attention Embedded Multiagent Reinforcement Learning Toward AUV-Based ITS(Chuan Lin, Guangjie Han, Qiuzi Tao, Li Liu, Syed Bilal Hussain Shah, Tongwei Zhang, Zhenglin Li, 2023, IEEE Transactions on Intelligent Transportation Systems)
- Distributed agent-based deep reinforcement learning for large scale traffic signal control(Qiang Wu, Jianqing Wu, Jun Shen, Bo Du, A. Telikani, M. Fahmideh, Chao Liang, 2022, Knowledge-Based Systems)
- Toward Adaptive and Coordinated Transportation Systems: A Multi-Personality Multi-Agent Meta-Reinforcement Learning Framework(Songjun Huang, Chuanneng Sun, Ruo-Qian Wang, D. Pompili, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Graph-Based Multi-agent Reinforcement Learning for Large-Scale UAVs Swarm System Control(Bocheng Zhao, M. Huo, Zheng Li, Ze Yu, Naiming Qi, 2024, Aerospace Science and Technology)
- A Reinforcement Learning Control Framework Based on Scalable Graph Transformer for Large-Scale Fuzzy Job Shop Scheduling Problems.(Wenquan Zhang, Fei Zhao, Bo Feng, Xuesong Mei, 2025, IEEE transactions on neural networks and learning systems)
- Network-Scale Traffic Signal Control via Multiagent Reinforcement Learning With Deep Spatiotemporal Attentive Network.(Hao Huang, Zhiqun Hu, Zhaoming Lu, Xiangming Wen, 2023, IEEE transactions on cybernetics)
- Scalable and Transferable Reinforcement Learning for Multi-Agent Mixed Cooperative-Competitive Environments Based on Hierarchical Graph Attention.(Yining Chen, Guanghua Song, Zhenhui Ye, Xiaohong Jiang, 2022, Entropy (Basel, Switzerland))
- An Improved Acceleration Method Based on Multi-Agent System for AGVs Conflict-Free Path Planning in Automated Terminals(Kunlun Guo, Jin Zhu, Lei Shen, 2021, IEEE Access)
- Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning(Kaixiang Lin, Renyu Zhao, Zhe Xu, Jiayu Zhou, 2018, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining)
- Fast and Adaptive Multi-Agent Planning under Collaborative Temporal Logic Tasks via Poset Products(Zesen Liu, Meng Guo, Weimin Bao, Zhongkui Li, 2023, Research)
- Multi agent reinforcement learning for online layout planning and scheduling in flexible assembly systems(Lea Kaven, Philipp Huke, Amon Göppert, Robert H. Schmitt, 2024, Journal of Intelligent Manufacturing)
- Attention Based Large Scale Multi-agent Reinforcement Learning(Xiaoqiang Wang, Liangjun Ke, Gewei Zhang, Dapeng Zhu, 2022, 2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD))
- Multi-Agent Deep Reinforcement Learning for Integrated Demand Forecasting and Inventory Optimization in Sensor-Enabled Retail Supply Chains.(Yongbin Yang, Mengdie Wang, Jiyuan Wang, Pan Li, Mengjie Zhou, 2025, Sensors (Basel, Switzerland))
- Transformer-Based Soft Actor-Critic for UAV Path Planning in Precision Agriculture IoT Networks.(Guanting Ge, Mingde Sun, Yiyuan Xue, Svitlana Pavlova, 2025, Sensors (Basel, Switzerland))
- Optimizing Fuel-Constrained UAV-UGV Routes for Large Scale Coverage: Bilevel Planning in Heterogeneous Multi-Agent Systems(Md Safwan Mondal, S. Ramasamy, James D. Humann, Jean-Paul F. Reddinger, James M. Dotterweich, Marshal A. Childers, Pranav A. Bhounsule, 2023, 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS))
- Multi-Agent Deep Reinforcement Learning for Solving Large-scale Air Traffic Flow Management Problem: A Time-Step Sequential Decision Approach(Yifan Tang, Yan Xu, 2021, 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC))
- A Scalable Privacy-Preserving Multi-Agent Deep Reinforcement Learning Approach for Large-Scale Peer-to-Peer Transactive Energy Trading(Yujian Ye, Yi Tang, Huiyu Wang, Xiao-Ping Zhang, G. Strbac, 2021, IEEE Transactions on Smart Grid)
- Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles.(Anum Mushtaq, Irfan Ul Haq, Muhammad Azeem Sarwar, Asifullah Khan, Wajeeha Khalil, Muhammad Abid Mughal, 2023, Sensors (Basel, Switzerland))
- Collaborative optimization of task scheduling and multi-agent path planning in automated warehouses(Honglin Zhang, Yaohua Wu, Jinchang Hu, Yanyan Wang, 2023, Complex & Intelligent Systems)
- Pruning-Based Multi-Agent DRL for Efficient Twins Migration in Vehicular Embodied AI Networks(Yuxiang Wei, Zhuoqi Zeng, Yue Zhong, Jiawen Kang, Minrui Xu, Junjie Ma, 2025, 2025 7th International Conference on Electronics and Communication, Network and Computer Technology (ECNCT))
- Too Many Cooks: Coordinating Multi-agent Collaboration Through Inverse Planning(Rose E. Wang, Sarah A. Wu, James A. Evans, J. Tenenbaum, D. Parkes, Max Kleiman-Weiner, 2020, International Joint Conference on Autonomous Agents and Multiagent Systems)
- A combined multi-agent system for distributed multi-project scheduling problems(Fang Fu, Hong Zhou, 2021, Applied Soft Computing)
- A Closed-Loop Toolchain for Neural Network Simulations of Learning Autonomous Agents.(Jakob Jordan, Philipp Weidel, Abigail Morrison, 2019, Frontiers in computational neuroscience)
- Adaptive Intrusion Mitigation in Software-Defined Vehicles Using Deep Reinforcement Learning(Harrison Kurunathan, H. Ali, Gowhar Javanmardi, Mohamed Eldefrawy, M. Gaitán, Ramiro Robles, P. Yomsi, Eduardo Tovar, 2025, Proceedings of the 4th International Workshop on Real-time and IntelliGent Edge computing)
- A Fault-Tolerant Multi-Agent Reinforcement Learning Framework for Unmanned Aerial Vehicles–Unmanned Ground Vehicle Coverage Path Planning(M. Ramezani, M. A. Amiri Atashgah, Alireza Rezaee, 2024, Drones)
- Multi-Agent LLM-powered AI for Autonomous Optical Power Commissioning of OMS Links(Yujiao Hao, Mahdi Hemmati, Mehrad Vaezi, Yuren You, Christopher Janz, 2025, 2025 European Conference on Optical Communications (ECOC))
- Attention Enhanced Reinforcement Learning for Multi-agent Cooperation.(Zhiqiang Pu, Huimu Wang, Zhen Liu, Jianqiang Yi, Shiguang Wu, 2023, IEEE transactions on neural networks and learning systems)
- Multi-agent architecture for information retrieval and intelligent monitoring by UAVs in known environments affected by catastrophes(David Vallejo, J. J. Castro-Schez, C. González-Morcillo, J. Albusac, 2020, Engineering Applications of Artificial Intelligence)
- Safe multi-agent motion planning via filtered reinforcement learning(Abraham P. Vinod, Sleiman Safaoui, A. Chakrabarty, R. Quirynen, Nobuyuki Yoshikawa, S. Cairano, 2022, 2022 International Conference on Robotics and Automation (ICRA))
- Efficient and scalable reinforcement learning for large-scale network control(Chengdong Ma, Aming Li, Yali Du, Hao Dong, Yaodong Yang, 2024, Nature Machine Intelligence)
- An Application of Model-Free Reinforcement Learning to the Control of Aerial Vehicles With Slung Payloads(Eleni Sabourin, Eric Lanteigne, 2024, 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE))
- Multi-Agent Reinforcement Learning Based on Representational Communication for Large-Scale Traffic Signal Control(Rohit Bokade, Xiaoning Jin, Chris Amato, 2023, IEEE Access)
- Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning(Songyan Liu, Muyang Fan, Weizi Li, Jing Du, Shuai Li, 2025, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- An approach to model based testing of multiagent systems.(Shafiq Ur Rehman, Aamer Nadeem, 2015, The Scientific World Journal)
- Multi-UAV Redeployment Optimization Based on Multi-Agent Deep Reinforcement Learning Oriented to Swarm Performance Restoration.(Qilong Wu, Zitao Geng, Yi Ren, Qiang Feng, Jilong Zhong, 2023, Sensors (Basel, Switzerland))
- Multi-Agent Learning-Based Optimal Task Offloading and UAV Trajectory Planning for AGIN-Power IoT(Peng Qin, Yang Fu, Yuanbo Xie, Kui Wu, Xianchao Zhang, Xiongwen Zhao, 2023, IEEE Transactions on Communications)
- Uplink Power Control for Extremely Large-Scale MIMO with Multi-Agent Reinforcement Learning and Fuzzy Logic(Zih-Yi Liu, Zhilong Liu, Jiayi Zhang, Huahua Xiao, Bo Ai, Derrick Wing Kwan Ng, 2023, IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS))
- Origin-Destination Pattern Effects on Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning(Muyang Fan, Songyan Liu, Weizi Li, 2025, 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC))
- Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control(Tianshu Chu, Jie Wang, Lara Codecà, Zhaojian Li, 2019, IEEE Transactions on Intelligent Transportation Systems)
- Adaptive and Robust DBSCAN With Multi-Agent Reinforcement Learning.(Hao Peng, Xiang Huang, Shuo Sun, Ruitong Zhang, Xizhao Wang, Philip S Yu, 2026, IEEE transactions on pattern analysis and machine intelligence)
- Cooperative Deep Reinforcement Learning for Large-Scale Traffic Grid Signal Control.(Tian Tan, Feng Bao, Yue Deng, Alex Jin, Qionghai Dai, Jie Wang, 2020, IEEE transactions on cybernetics)
- MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence(Lianmin Zheng, Jiacheng Yang, Han Cai, Weinan Zhang, J. Wang, Yong Yu, 2017, Proceedings of the AAAI Conference on Artificial Intelligence)
- GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control(Yilin Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, Rui Pan, 2023, Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence)
- A Distributed Multi-Agent Reinforcement Learning With Graph Decomposition Approach for Large-Scale Adaptive Traffic Signal Control(Shan Jiang, Yufei Huang, M. Jafari, M. Jalayer, 2022, IEEE Transactions on Intelligent Transportation Systems)
- Large-Scale Machine Learning Cluster Scheduling via Multi-Agent Graph Reinforcement Learning(Xiaoyang Zhao, Chuan Wu, 2021, IEEE Transactions on Network and Service Management)
- A Multi-Agent Reinforcement Learning Framework for Public Health Decision Analysis.(Dinesh Sharma, Ankit Shah, Chaitra Gopalappa, 2025, Healthcare analytics (New York, N.Y.))
- NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning.(Jiajun Chai, Yuanheng Zhu, Dongbin Zhao, 2024, IEEE transactions on neural networks and learning systems)
- Smart Underwater Pollution Detection Based on Graph-Based Multi-Agent Reinforcement Learning Towards AUV-Based Network ITS(Chuan Lin, Guangjie Han, Tongwei Zhang, S. B. H. Shah, Yan Peng, 2023, IEEE Transactions on Intelligent Transportation Systems)
- Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation(Siddharth Nayak, Kenneth M. F. Choi, Wenqi Ding, Sydney I. Dolan, Karthik Gopalakrishnan, H. Balakrishnan, 2022, International Conference on Machine Learning)
- Digital Twin Enhanced Multi-Agent Reinforcement Learning for Large-Scale Mobile Network Coverage Optimization(Haoqiang Liu, Weikang Su, Tong Li, Wenzhen Huang, Yong Li, 2024, ACM Transactions on Knowledge Discovery from Data)
- Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems(Qing Fu, Tenghai Qiu, Jianqiang Yi, Z. Pu, Shiguang Wu, 2022, AAAI Conference on Artificial Intelligence)
- GAT-MF: Graph Attention Mean Field for Very Large Scale Multi-Agent Reinforcement Learning(Qianyue Hao, Wenzhen Huang, Tao Feng, Jian Yuan, Yong Li, 2023, Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
- Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning.(Jiao Wang, Mingrui Yuan, Yun Li, Zihui Zhao, 2023, Neural networks : the official journal of the International Neural Network Society)
- Generative Diffusion-Based Contract Design for Efficient AI Twin Migration in Vehicular Embodied AI Networks(Yue Zhong, Jiawen Kang, Jinbo Wen, Dongdong Ye, Jiangtian Nie, D. Niyato, Xiaozheng Gao, Shengli Xie, 2025, IEEE Transactions on Mobile Computing)
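Many of the large-scale MARL entries above share one core paradigm: each agent learns its own value function from local experience while all agents interact in a shared environment. A minimal sketch of that independent-learner setup on a toy two-agent coordination game (the game, hyperparameters, and variable names are illustrative, not drawn from any single paper above):

```python
import random

random.seed(0)

N_ACTIONS = 2  # each agent chooses action 0 or 1

def payoff(a1, a2):
    """Toy cooperative game: reward 1 only when both agents coordinate on action 1."""
    return 1.0 if (a1 == 1 and a2 == 1) else 0.0

# Independent learners: each agent keeps its own Q-table over its own actions.
q1 = [0.0] * N_ACTIONS
q2 = [0.0] * N_ACTIONS

alpha, epsilon = 0.1, 0.2  # learning rate and exploration rate

def choose(q):
    """Epsilon-greedy action selection over a single agent's Q-values."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[a])

for step in range(5000):
    a1, a2 = choose(q1), choose(q2)
    r = payoff(a1, a2)
    # Stateless (bandit-style) Q-update; the large-scale variants cited above
    # replace the table with a neural network and add communication,
    # mean-field, or graph-attention terms.
    q1[a1] += alpha * (r - q1[a1])
    q2[a2] += alpha * (r - q2[a2])

assert q1[1] > q1[0] and q2[1] > q2[0]  # both agents learn to coordinate
```

The non-stationarity visible even in this sketch (each agent's reward depends on the other's shifting policy) is exactly what the communication- and aggregation-based methods above are designed to tame.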
Embodied Agents, Human-Computer Interaction, and Affective Modeling
This group focuses on the embodied presence of agents in physical or virtual spaces, covering affective interaction, personalized collaboration, healthcare and nursing assistance, and multimodal feedback design, with emphasis on how emotional and perceptual capabilities can improve the user experience.
- A Computational Model for Managing Impressions of an Embodied Conversational Agent in Real-Time(Béatrice Biancardi, Chen Wang, M. Mancini, Angelo Cafaro, G. Chanel, C. Pelachaud, 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII))
- Understanding Conversational and Expressive Style in a Multimodal Embodied Conversational Agent(Deepali Aneja, Rens Hoegen, Daniel J. McDuff, M. Czerwinski, 2021, Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems)
- Conversational Agents Supporting Self-Management in People With a Chronic Disease: Systematic Review.(Tessa F Peerbolte, Rozanne Ja van Diggelen, Pieter van den Haak, Kim Geurts, Luc Jw Evers, Bastiaan R Bloem, Nienke M de Vries, Sanne W van den Berg, 2025, Journal of medical Internet research)
- Designing Emotionally Intelligent Embodied Agents for Immersive Virtual Reality Experiences(S. Rad, Razeen Hussain, Manuela Chessa, Fabio Solari, 2026, 2026 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR))
- Time-Aware Multi-Agent Symbiosis.(Michail Maniadakis, Emmanouil Hourdakis, Markos Sigalas, Stylianos Piperakis, Maria Koskinopoulou, Panos Trahanias, 2020, Frontiers in robotics and AI)
- Embodied Intelligence: Smooth Coping in the Learning Intelligent Decision Agent Cognitive Architecture(Christian Kronsted, Sean Kugele, Zachariah A. Neemeh, Kevin Ryan, S. Franklin, 2022, Frontiers in Psychology)
- Spontaneous Interactions with a Virtually Embodied Intelligent Assistant in Minecraft(Fraser Allison, E. Luger, Katja Hofmann, 2017, Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems)
- A multi-agent architecture for intelligent home network service using tuple space model(Jae-Chul Moon, Soon-Ju Kang, 2000, IEEE Transactions on Consumer Electronics)
- Artificial intelligence in virtual reality simulation for interprofessional communication training: Mixed method study.(Sok Ying Liaw, Jian Zhi Tan, Siriwan Lim, Wentao Zhou, John Yap, Rabindra Ratan, Sim Leng Ooi, Shu Jing Wong, Betsy Seah, Wei Ling Chua, 2023, Nurse education today)
- Multi-Scale Deep Reinforcement Learning for Real-Time 3D-Landmark Detection in CT Scans.(Florin-Cristian Ghesu, Bogdan Georgescu, Yefeng Zheng, Sasa Grbic, Andreas Maier, Joachim Hornegger, Dorin Comaniciu, 2019, IEEE transactions on pattern analysis and machine intelligence)
- Socially situated artificial intelligence enables learning from human interaction(Ranjay Krishna, Donsuk Lee, Li Fei-Fei, Michael S. Bernstein, 2022, Proceedings of the National Academy of Sciences)
- A Tablet Based Embodied Conversational Agent to Promote Smoking Cessation among Veterans: A Feasibility Study.(Abu S Abdullah, Stephan Gaehde, Tim Bickmore, 2018, Journal of epidemiology and global health)
- Effects of Verbal Interruption in Conversations with an Intelligent Virtual Agent in Virtual Reality(David Egelhofer, Jiafan Gao, Nils Heinsohn, Sherwin Khabari, Lucie Kruse, Ke Li, Fariba Mostajeran, Frank Steinicke, 2025, Proceedings of the 2025 ACM Symposium on Spatial User Interaction)
- An Embodied Conversational Agent to Minimize the Effects of Social Isolation During Hospitalization(Jemma Smith, Aashish Bhandari, Berkan Yuksel, A. Kocaballi, 2022, No journal)
- Embodied intelligence via learning and evolution.(Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei, 2021, Nature communications)
- Human-robot sensor interface for cardiac rehabilitation.(Juan S Lara, Jonathan Casas, Andres Aguirre, Marcela Munera, Monica Rincon-Roncancio, Bahar Irfan, Emmanuel Senft, Tony Belpaeme, Carlos A Cifuentes, 2017, IEEE ... International Conference on Rehabilitation Robotics : [proceedings])
- An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance(Hsuan-Kung Yang, Tsu-Ching Hsiao, R. Oka, Ryuya Nishino, Satoko Tofukuji, N. Kobori, 2025, 2025 IEEE International Symposium on Mixed and Augmented Reality (ISMAR))
- One Robot, Many Minds: Factors Shaping Visitors' Evaluation of an Autonomous Museum Robot Guide(Luca Garello, Francesca Cocchella, Manuel G. Catalano, A. Sciutti, Francesco Rea, 2025, Proceedings of the 13th International Conference on Human-Agent Interaction)
- Embodied Object Representation Learning and Recognition.(Toon Van de Maele, Tim Verbelen, Ozan Çatal, Bart Dhoedt, 2022, Frontiers in neurorobotics)
- Enacting Plant-Inspired Robotics.(Jonny Lee, Paco Calvo, 2021, Frontiers in neurorobotics)
- Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations.(Tianyu Song, Felix Pabst, Ulrich Eck, Nassir Navab, 2025, IEEE transactions on visualization and computer graphics)
- A multi-layer artificial intelligence and sensing based affective conversational embodied agent(S. DiPaola, Ö. Yalçın, 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW))
- Teaching Vehicles to Steer Themselves with Deep Learning(Ian Timmis, Nicholas Paul, C. Chung, 2021, 2021 IEEE International Conference on Electro Information Technology (EIT))
- Knowledge-Based Embodied Question Answering.(Sinan Tan, Mengmeng Ge, Di Guo, Huaping Liu, Fuchun Sun, 2023, IEEE transactions on pattern analysis and machine intelligence)
- Intelligent Blended Agents: Reality-Virtuality Interaction with Artificially Intelligent Embodied Virtual Humans(S. Schmidt, Oscar Ariza, Frank Steinicke, 2020, Multimodal Technologies and Interaction)
- Optimizing Patient-AI Dialogue: Integrating Empathic Technology.(Gheorghe Ioan Mihalas, 2025, Studies in health technology and informatics)
- Conversational agents in physical and psychological symptom management: A systematic review of randomized controlled trials.(Qingling Yang, Kin Cheung, Yan Zhang, Yazhou Zhang, Jing Qin, Yao Jie Xie, 2025, International journal of nursing studies)
- Collaborative Learning with Artificial Intelligence Speakers(Gyeong-Geon Lee, Seonyeong Mun, Myeong-Kyeong Shin, Xiaoming Zhai, 2023, Science & Education)
- Exploring the Ethical Challenges of Conversational AI in Mental Health Care: Scoping Review.(Mehrdad Rahsepar Meadi, Tomas Sillekens, Suzanne Metselaar, Anton van Balkom, Justin Bernstein, Neeltje Batelaan, 2025, JMIR mental health)
- Modelling Therapeutic Alliance using a User-aware Explainable Embodied Conversational Agent to Promote Treatment Adherence(Amal Abdulrahman, Deborah Richards, 2019, Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents)
- Investigating the Impact of Multimodal Feedback on User-Perceived Latency and Immersion with LLM-Powered Embodied Conversational Agents in Virtual Reality(Morad Elfleet, Mathieu Chollet, 2024, Proceedings of the ACM International Conference on Intelligent Virtual Agents)
- Your Robot Therapist Will See You Now: Ethical Implications of Embodied Artificial Intelligence in Psychiatry, Psychology, and Psychotherapy.(Amelia Fiske, Peter Henningsen, Alena Buyx, 2019, Journal of medical Internet research)
- Human-AI collaborative learning in mixed reality: Examining the cognitive and socio-emotional interactions(Belle Dang, Luna Huynh, Faaiz Gul, Carolyn Rosé, Sanna Järvelä, Andy Nguyen, 2025, British Journal of Educational Technology)
- An artificial intelligence-driven agent for real-time head-and-neck IMRT plan generation using conditional generative adversarial network (cGAN).(Xinyi Li, Chunhao Wang, Yang Sheng, Jiahan Zhang, Wentao Wang, Fang-Fang Yin, Qiuwen Wu, Q Jackie Wu, Yaorong Ge, 2021, Medical physics)
- LLM-based multi-agent system for neuro-ophthalmic diagnosis and personalized treatment planning.(Wenmiao Wang, 2025, Frontiers in neuroscience)
- Automatic path searching for interactive navigation support within virtual medical 3-dimensional objects.(Hansrudi Noser, Christian Stern, Peter Stucki, 2004, Academic radiology)
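A recurring pattern in the affective-agent work above is a perception–appraisal–expression loop: the agent keeps a continuous emotional state (e.g., valence/arousal) that decays toward neutral and is nudged by appraised user input, which in turn drives multimodal expression. A minimal, purely illustrative sketch — the decay constant, appraisal table, and names are assumptions, not taken from any cited system:

```python
from dataclasses import dataclass

@dataclass
class AffectState:
    """Two-dimensional valence/arousal emotion state, each in [-1, 1]."""
    valence: float = 0.0
    arousal: float = 0.0

def clamp(x):
    return max(-1.0, min(1.0, x))

def appraise(user_event):
    """Map a user event to a (valence, arousal) nudge. Values are illustrative."""
    table = {
        "compliment": (0.4, 0.2),
        "insult": (-0.5, 0.4),
        "silence": (0.0, -0.1),
    }
    return table.get(user_event, (0.0, 0.0))

def step(state, user_event, decay=0.9):
    """Decay toward neutral, then apply the appraised nudge."""
    dv, da = appraise(user_event)
    return AffectState(
        valence=clamp(state.valence * decay + dv),
        arousal=clamp(state.arousal * decay + da),
    )

def express(state):
    """Pick a coarse expression label from the affect quadrant."""
    if state.valence > 0.2:
        return "smile"
    if state.valence < -0.2:
        return "frown" if state.arousal > 0 else "sad"
    return "neutral"

s = AffectState()
for event in ["compliment", "compliment", "insult"]:
    s = step(s, event)
print(express(s), round(s.valence, 3))
```

Real systems in this group replace the hand-coded appraisal table with learned multimodal perception and map the affect state onto synchronized speech, gaze, and gesture rather than a single label.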
Agent Architectures, Cognitive Theory, Infrastructure, and Safety
This group examines the underlying architectures of agent systems (e.g., cognitive architectures, BDI), modeling theory, and simulation methodology, together with the safety, explainability, and robustness guarantees that real-world environments demand.
- TheatreBot: A Software Architecture for a Theatrical Robot(Julian M. Angel Fernandez, Andrea Bonarini, 2013, Lecture Notes in Computer Science)
- Towards an Experimental Autonomous Blimp Platform(Axel Rottmann, M. Sippel, Thorsten Zitterell, Wolfram Burgard, L. Reindl, C. Scholl, 2007, European Conference on Mobile Robots)
- GIMT: A Tool for Ontology and Goal Modeling in BDI Multi-Agent Design(M. Cossentino, Daniele Dalle Nogare, R. Giancarlo, C. Lodato, S. Lopes, Patrizia Ribino, L. Sabatucci, V. Seidita, 2014, Workshop From Objects to Agents)
- Actor-Agent Communities: Design Approaches(Sorin M. Iacob, K. Nieuwenhuis, N. Wijngaards, G. Pavlin, J. V. Veelen, 2009, Studies in Computational Intelligence)
- An Intelligent Agent Architecture for Smart Environments(S. Ferilli, B. D. Carolis, Domenico Redavid, 2015, Lecture Notes in Computer Science)
- Virtual actors that can perform scripts and improvise roles(P. Wavish, David Connah, 1997, Proceedings of the first international conference on Autonomous agents - AGENTS '97)
- The quest of parsimonious XAI: A human-agent architecture for explanation formulation(Yazan Mualla, I. Tchappi, Timotheus Kampik, A. Najjar, Davide Calvaresi, A. Abbas-Turki, Stéphane Galland, C. Nicolle, 2021, Artificial Intelligence)
- A Multi-Agent architecture for intelligent gathering systems(David Camacho, R. Aler, D. Borrajo, J. M. López, 2005, AI Communications)
- Evolutionary reinforcement learning algorithm for large-scale multi-agent cooperation and confrontation applications(Haiying Liu, Zhihao Li, Kuihua Huang, Rui Wang, Guangquan Cheng, Tie-xiang Li, 2023, The Journal of Supercomputing)
- Patterns for Negotiating Actors(M. Weiss, B. Esfandiari, 2005, European Conference on Pattern Languages of Programs)
- Editorial: Decision-making and planning for multi-agent systems.(Panagiotis Tsiotras, Matthew Gombolay, Jakob Foerster, 2024, Frontiers in robotics and AI)
- Collaborative framework of an intelligent agent system for efficient logistics transport planning(F. Feng, Y. Pang, G. Lodewijks, Wenfeng Li, 2017, Computers & Industrial Engineering)
- Uncertainty-based modulation for lifelong learning.(Andrew P Brna, Ryan C Brown, Patrick M Connolly, Stephen B Simons, Renee E Shimizu, Mario Aguilar-Simon, 2019, Neural networks : the official journal of the International Neural Network Society)
- Enhancing generative AI reliability via agentic AI in 6G-enabled edge computing(Laha Ale, Scott A. King, Ning Zhang, Huanlai Xing, 2025, Nature Reviews Electrical Engineering)
- A Mobile Intelligent Agent-Based Architecture for E-Business(Zhiyong Weng, T. Tran, 2007, International Journal of Information Technology and Web Engineering)
- A Multi-Agent System based simulation approach for planning procurement operations and scheduling with multiple cross-docks(Reddivari Himadeep Reddy, S. K. Kumar, K. Fernandes, M. Tiwari, 2017, Computers & Industrial Engineering)
- Learning algorithm for an intelligent decision making system based on multi-agent neurocognitive architectures(Z. Nagoev, I. Pshenokova, O. Nagoeva, Z. Sundukov, 2021, Cognitive Systems Research)
- Multi Agent Architecture for Automated Health Coaching.(Ajith Vemuri, Keith Decker, Mathew Saponaro, Gregory Dominick, 2021, Journal of medical systems)
- Cognitive modeling, ecological psychology, and musical improvisation.(Kevin J Ryan, 2023, Frontiers in robotics and AI)
- Expression unleashed in artificial intelligence.(Ekaterina I Tolstaya, Abhinav Gupta, Edward Hughes, 2023, The Behavioral and brain sciences)
- Personalized conciliation of clinical guidelines for comorbid patients through multi-agent planning.(Juan Fdez-Olivares, Eva Onaindia, Luis Castillo, Jaume Jordán, Juan Cózar, 2019, Artificial intelligence in medicine)
- Using a multi-agent system and artificial intelligence for monitoring and improving the cloud performance and security(Daniel Grzonka, Agnieszka Jakóbik, J. Kolodziej, Sabri Pllana, 2017, Future Generation Computer Systems)
- Embodied intelligence in manufacturing: leveraging large language models for autonomous industrial robotics(Haolin Fan, Xuan Liu, J. Y. Fuh, Wen Feng Lu, Bingbing Li, 2024, Journal of Intelligent Manufacturing)
- Aircraft taxiing route planning based on multi-agent system(F. Chen, 2016, 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC))
- Causal action empowerment for efficient reinforcement learning in embodied agents(Hongye Cao, Fan Feng, Jing Huo, Yang Gao, 2025, Science China Information Sciences)
- An architecture design of the intelligent agent for speech recognition and translation(A.M. Ahmad, A.N.B.A. Nordin, E. Saaim, Den Fairol Samaon, Moulay Ibrahim, 2004, 2004 IEEE Region 10 Conference TENCON 2004.)
- Interlinked switch circuits of biological intelligence.(Raktim Mukherjee, Saptarshi Sinha, Gary D Luker, Pradipta Ghosh, 2024, Trends in biochemical sciences)
- New Multi-Agent architecture of visual Intelligent Decision Support Systems application in the medical field(Hamdi Ellouzi, Hela Ltifi, Mounir Ben Ayed, 2015, 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA))
- Multi-agent reinforcement learning vibration control and trajectory planning of a double flexible beam coupling system(Zhi-cheng Qiu, Jun Hu, Xian-min Zhang, 2023, Mechanical Systems and Signal Processing)
- CT2X-IRA: CT to x-ray image registration agent using domain-cross multi-scale-stride deep reinforcement learning.(Haixiao Geng, Deqiang Xiao, Shuo Yang, Jingfan Fan, Tianyu Fu, Yucong Lin, Yanhua Bai, Danni Ai, Hong Song, Yongtian Wang, Feng Duan, Jian Yang, 2023, Physics in medicine and biology)
- A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases(Yanni Li, Yuping Wang, Erfeng Tian, 2012, 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology)
- Discovering state-of-the-art reinforcement learning algorithms.(Junhyuk Oh, Gregory Farquhar, Iurii Kemaev, Dan A Calian, Matteo Hessel, Luisa Zintgraf, Satinder Singh, Hado van Hasselt, David Silver, 2025, Nature)
- JavaLog: a framework-based integration of Java and Prolog for agent-oriented programming(Analía Amandi, M. Campo, Alejandro Zunino, 2005, Computer Languages, Systems & Structures)
- MACNS: A generic graph neural network integrated deep reinforcement learning based multi-agent collaborative navigation system for dynamic trajectory planning(Ziren Xiao, Peisong Li, Chang Liu, Honghao Gao, Xinheng Wang, 2024, Information Fusion)
- A multi-agent reinforcement learning method with curriculum transfer for large-scale dynamic traffic signal control(Xuesi Li, Jingchen Li, Haobin Shi, 2023, Applied Intelligence)
- Applying ontologies to the development and execution of Multi-Agent Systems(Artur Freitas, Alison R. Panisson, L. Hilgert, Felipe Meneguzzi, R. Vieira, Rafael Heitor Bordini, 2017, Web Intelligence)
- Dynamically Scaled Fuzzy Control of Autonomous Intelligent Actor(A. Tserkovny, 2023, Journal of Software Engineering and Applications)
- Load Balancing of Autonomous Actors over Dynamic Networks(Travis Desell, K. E. Maghraoui, Carlos A. Varela, 2004, Hawaii International Conference on System Sciences)
- Security-Informed Safety Analysis of Autonomous Transport Systems Considering AI-Powered Cyberattacks and Protection.(Oleg Illiashenko, Vyacheslav Kharchenko, Ievgen Babeshko, Herman Fesenko, Felicita Di Giandomenico, 2023, Entropy (Basel, Switzerland))
- AI-Enabled Next-Generation Communication Networks: Intelligent Agent and AI Router(Chunxiao Jiang, N. Ge, Linling Kuang, 2020, IEEE Wireless Communications)
- Space-Air-Ground Integrated Mobile Crowdsensing for Partially Observable Data Collection by Multi-Scale Convolutional Graph Reinforcement Learning.(Yixiang Ren, Zhenhui Ye, Guanghua Song, Xiaohong Jiang, 2022, Entropy (Basel, Switzerland))
- Avoiding Threats Using Multi Agent System Planning for Web Based Systems(Punam Bedi, Vandana Gandotra, Archana Singhal, Vandita Vats, Neha Mishra, 2009, Lecture Notes in Computer Science)
- Neuropsychological architecture of a general-purpose artificial intelligence agent(Z. V. Nagoev, 2025, News of the Kabardin-Balkar Scientific Center of RAS)
- Mix-attention approximation for homogeneous large-scale multi-agent reinforcement learning(Yang Shike, Jingchen Li, Haobin Shi, 2022, Neural Computing and Applications)
- Integrated PK-PD and agent-based modeling in oncology.(Zhihui Wang, Joseph D Butner, Vittorio Cristini, Thomas S Deisboeck, 2015, Journal of pharmacokinetics and pharmacodynamics)
- Engineering Approaches for Programming Agent-Based IoT Objects Using the Resource Management Architecture.(Fabian Cesar Brandão, Maria Alice Trinta Lima, Carlos Eduardo Pantoja, Jean Zahn, José Viterbo, 2021, Sensors (Basel, Switzerland))
- An introductory preview of Autonomous Intelligent Cyber-defense Agent reference architecture, release 2.0(A. Kott, P. Théron, L. Mancini, Edlira Dushku, Agostino Panico, Martin Drašar, Benoît Leblanc, P. Losiewicz, A. Guarino, Mauno Pihelgas, K. Rządca, 2020, The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology)
- Intelligent agent characterization and uncertainty management with fuzzy set theory: a tool to support early supplier integration(Pamela McCauley, 1999, Journal of Intelligent Manufacturing)
- Intelligent hybrid multi-agent architecture for engineering complex systems(R. Khosla, T. Dillon, 1997, Proceedings of International Conference on Neural Networks (ICNN'97))
- Re-architecting the virtual human toolkit: towards an interoperable platform for embodied conversational agent research and development(Arno Hartholt, Edward Fast, Zong-yong Li, Kevin Kim, Andrew Leeds, S. Mozgai, 2022, Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents)
- Neural Dynamic Principles for an Intentional Embodied Agent.(Jan Tekülve, Gregor Schöner, 2024, Cognitive science)
- AntAlate-A Multi-Agent Autonomy Framework.(David Dovrat, Alfred M Bruckstein, 2021, Frontiers in robotics and AI)
- Actors Based Agent Modelling and Simulation(Giulio Angiani, Paolo Fornacciari, Gianfranco Lombardo, A. Poggi, M. Tomaiuolo, 2018, Communications in Computer and Information Science)
- On the Modeling, Refinement and Integration of Decentralized Agent Coordination(Jan Sudeikat, W. Renz, 2009, Lecture Notes in Computer Science)
- Multi-Agent Architectures as Organizational Structures(Manuel Kolp, P. Giorgini, J. Mylopoulos, 2006, Autonomous Agents and Multi-Agent Systems)
- Multi-Agent Architecture for Intelligent Tutoring Systems(A. Laureano-Cruces, F. Arriaga-Gómez, 1998, Interactive Learning Environments)
- Development a BDI-Based Intelligent Agent Architecture for Distribution Systems Restoration Planning(Yen-Tsung Pan, M. Tsai, 2009, 2009 15th International Conference on Intelligent System Applications to Power Systems)
- An Autonomous Intelligent Agent Architecture Based on Constructivist AI(F. S. Perotto, R. Vicari, L. Alvares, 2004, IFIP International Federation for Information Processing)
- IASelect: Finding Best-fit Agent Practices in Industrial CPS Using Graph Databases(Chandan Sharma, R. Sinha, P. Leitão, 2019, 2019 IEEE 17th International Conference on Industrial Informatics (INDIN))
- Agent architecture of an intelligent medical system based on federated learning and blockchain technology(Dawid Połap, Gautam Srivastava, Keping Yu, 2021, Journal of Information Security and Applications)
- PAULA: Multi-agent architecture for coordination of intelligent agent systems(S. Ibarra, C. Quintero, J. A. Ramon, J. L. de la Rosa, J. Castán, 2007, 2007 European Control Conference (ECC))
- Inductive biases of neural network modularity in spatial navigation.(Ruiyi Zhang, Xaq Pitkow, Dora E Angelaki, 2024, Science advances)
- Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence(N. Ghaffarzadegan, A. Majumdar, Ross Williams, N. Hosseinichimeh, 2023, System Dynamics Review)
- Interactive Intelligent Agent Architecture(Rym Ameur, J. Heudin, 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops)
- Multi-satellite Mission Planning Using a Self-Adaptive Multi-agent System(J. Bonnet, Marie-Pierre Gleizes, Elsy Kaddoum, S. Rainjonneau, G. Flandin, 2015, 2015 IEEE 9th International Conference on Self-Adaptive and Self-Organizing Systems)
- Web Mining in the EVA Intelligent Agent Architecture(P. Millet, J. Heudin, 2007, 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops)
- Multi-Agent Oriented Integration in Distributed Control System(D. Choiński, M. Senik, 2011, Lecture Notes in Computer Science)
- Deep Learning and Data Mining Classification through the Intelligent Agent Reasoning(A. Chemchem, F. Alin, M. Krajecki, 2018, 2018 6th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW))
- COHUMAIN: Building the Socio-Cognitive Architecture of Collective Human-Machine Intelligence.(Cleotilde Gonzalez, Henny Admoni, Scott Brown, Anita Williams Woolley, 2025, Topics in cognitive science)
- PaCAR: COVID-19 Pandemic Control Decision Making via Large-Scale Agent-Based Modeling and Deep Reinforcement Learning(Xudong Guo, Peiyu Chen, Shi Liang, Zengtao Jiao, Linfeng Li, Jun Yan, Yadong Huang, Yi Liu, Wenhui Fan, 2022, Medical Decision Making)
- A Multi-agent Architecture for an Intelligent Website in Insurance(C. Jonker, Jan Treur, Remco A. Lam, 1999, Lecture Notes in Computer Science)
- Embodied, Intelligent Communication for Multi-Agent Cooperation(Esmaeil Seraj, 2023, Proceedings of the AAAI Conference on Artificial Intelligence)
- Villanelle: An Authoring Tool for Autonomous Characters in Interactive Fiction(Chris Martens, Owais Iqbal, 2019, Lecture Notes in Computer Science)
- A Formal Model to Integrate Behavioral and Structural Adaptations in Self-adaptive Systems(Narges Khakpour, J. Kleijn, M. Sirjani, 2019, Lecture Notes in Computer Science)
- Multi-agent system of IT project planning(O. Dunets, Carsten Wolff, A. Sachenko, Grygoriy Hladiy, I. Dobrotvor, 2017, 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS))
- Toward Agentic AI Networking in 6G: A Generative Foundation Model-as-Agent Approach(Yong Xiao, Guangming Shi, Ping Zhang, 2025, IEEE Communications Magazine)
- A social path to human-like artificial intelligence(Edgar A. Duéñez-Guzmán, Suzanne Sadedin, Jane X. Wang, Kevin R. McKee, Joel Z. Leibo, 2023, Nature Machine Intelligence)
- InterACTE: Improvising with a Virtual Actor(Dimitrios Batras, Judith Guez, Jean-François Jégo, 2016, Proceedings of the 3rd International Symposium on Movement and Computing)
- An intelligent-agent architecture for flexible service integration on the web(Eleni Stroulia, M. Hatch, 2003, IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews))
- Evolution of a Complex Predator-Prey Ecosystem on Large-scale Multi-Agent Deep Reinforcement Learning(Jun Yamada, J. Shawe-Taylor, Z. Fountas, 2020, 2020 International Joint Conference on Neural Networks (IJCNN))
- Refining Abstract Specifications into Dangerous Traffic Scenarios(Aren A. Babikian, 2024, Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings)
- DSAC-ICM: A Distributional Reinforcement Learning Framework for Path Planning in 3D Uneven Terrains.(Yixin Zhou, Fan Liu, Zhixiao Liu, Xianghan Ji, Guangqiang Yin, 2026, Sensors (Basel, Switzerland))
- Design and Integration of an Intelligent Agent to a Telemedicine Platform, for the Translation of Exchanges Between Doctor and Patient During Teleconsultation: Methodology of Design and Technological Choices.(Jonathan Kambire, Seydou Golo Barro, Pascal Staccini, 2024, Studies in health technology and informatics)
- Organizational Multi-Agent Architectures for Information Systems(T. Do, Stéphane Faulkner, Manuel Kolp, 2003, International Conference on Enterprise Information Systems)
- Inherently Interpretable Knowledge Representation for a Trustworthy Artificially Intelligent Agent Teaming with Humans in Industrial Environments(V. Galetic, Alistair Nottle, 2022, International Workshop on Artificial Intelligence and Cognition)
- Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models(Shuyuan Liu, Jiawei Chen, Shouwei Ruan, Hang Su, Zhaoxia Yin, 2024, Proceedings of the 32nd ACM International Conference on Multimedia)
- Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents(Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang, 2024, International Conference on Learning Representations)
- Interactive Debugging and Steering of Multi-Agent AI Systems(Will Epperson, Gagan Bansal, Victor C. Dibia, Adam Fourney, Jack Gerrits, E. Zhu, Saleema Amershi, 2025, Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems)
- Explainable Artificial Intelligence: Evaluating the Objective and Subjective Impacts of xAI on Human-Agent Interaction(Andrew Silva, Mariah L. Schrum, Erin Hedlund-Botti, N. Gopalan, M. Gombolay, 2022, International Journal of Human–Computer Interaction)
- Enhancing LLM Agent Safety via Causal Influence Prompting(Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee, 2025, Annual Meeting of the Association for Computational Linguistics)
- Driving Style Alignment for LLM-powered Driver Agent(Ruoxuan Yang, Xinyu Zhang, Anais Fernandez-Laaksonen, Xin Ding, Jiangtao Gong, 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Design and implementation of an intelligent product agent architecture in manufacturing systems(Ilya Kovalenko, K. Barton, D. Tilbury, 2017, 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA))
- An intelligent agent-based distributed architecture for Smart-Grid integrated network management(A. García, J. Oliver, D. Gosch, 2010, IEEE Local Computer Network Conference)
- Do Intelligent Robots Need Emotion?(Luiz Pessoa, 2017, Trends in cognitive sciences)
- Intelligent task planning and action selection of a mobile robot in a multi-agent system through a fuzzy neural network approach(K. Jolly, R. S. Kumar, R. Vijayakumar, 2007, Engineering Applications of Artificial Intelligence)
- Model integration in agent-oriented development(Rubén Fuentes-Fernández, J. Gómez-Sanz, J. Pavón, 2007, International Journal of Agent-Oriented Software Engineering)
- An intelligent agent based defense architecture for DDoS attacks(M. Duraipandian, C. Palanisamy, 2014, 2014 International Conference on Electronics and Communication Systems (ICECS))
- A Goal-Based Organizational Perspective on Multi-agent Architectures(Manuel Kolp, P. Giorgini, J. Mylopoulos, 2001, Lecture Notes in Computer Science)
- Development of a Robot Agent for Interactive Assembly(Jianwei Zhang, Y. Collani, A. Knoll, 1998, Distributed Autonomous Robotic Systems 3)
- Mapping of cognitive radio as intelligent agent architecture(Irfan Siddavatm, 2011, 2011 2nd International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology (Wireless VITAE))
- Java-Based Distributed Intelligent Agent Architecture for Building Safety-Critical Tele-Inspection Systems on the Internet(Jae-Chul Moon, Soon-Ju Kang, N. Park, 2000, Lecture Notes in Computer Science)
- A hierarchical fuzzy-genetic multi-agent architecture for intelligent buildings online learning, adaptation and control(H. Hagras, V. Callaghan, M. Colley, G. Clarke, 2003, Information Sciences)
- Mario Becomes Cognitive.(Fabian Schrodt, Jan Kneissler, Stephan Ehrenfeld, Martin V Butz, 2017, Topics in cognitive science)
- Active Inference and Intentional Behavior.(Karl J Friston, Tommaso Salvatori, Takuya Isomura, Alexander Tschantz, Alex Kiefer, Tim Verbelen, Magnus Koudahl, Aswin Paul, Thomas Parr, Adeel Razi, Brett J Kagan, Christopher L Buckley, Maxwell J D Ramstead, 2025, Neural computation)
- Artificial Intelligent Agent Architecture and Clinical Decision-Making in the Healthcare Sector(Kian A. Huang, Haris K Choudhary, Paul C. Kuo, 2024, Cureus)
- Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving(Haochen Liu, Li Chen, Yu Qiao, Chen Lv, Hongyang Li, 2024, Neural Information Processing Systems)
- Building Intelligent Embodied AI Agents for Asking Clarifying Questions(Dhruvil Lakhtaria, Radhika Chhabra, Rohit Taparia, Anand Kumar M, 2023, 2023 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS))
- Particle swarm optimization based co-operative task assignment and path planning for multi-agent system(Sumana Biswas, S. Anavatti, M. Garratt, 2017, 2017 IEEE Symposium Series on Computational Intelligence (SSCI))
- Intelligent QoS Agent Design for QoS Monitoring and Provisioning in 6G Network(S. T. Arzo, P. M. Tshakwanda, Y. M. Worku, Harsh Kumar, Michael Devetsikiotis, 2023, ICC 2023 - IEEE International Conference on Communications)
- Agent-Based Intelligent Interface for Wheelchair Movement Control.(Alberto L Barriuso, Javier Pérez-Marcos, Diego M Jiménez-Bravo, Gabriel Villarrubia González, Juan F De Paz, 2018, Sensors (Basel, Switzerland))
- An argumentation-oriented multi-agent system for automating the freight planning process(Harry K. H. Chow, Winson Siu, Chi-Kong Chan, Henry C. B. Chan, 2013, Expert Systems with Applications)
- GridLAB-D: An Agent-Based Simulation Framework for Smart Grids(D. Chassin, J. Fuller, N. Djilali, 2014, Journal of Applied Mathematics)
- Multi-hazard hospital evacuation planning during disease outbreaks using agent-based modeling.(Fardad Haghpanah, Kimia Ghobadi, Benjamin W Schafer, 2021, International journal of disaster risk reduction : IJDRR)
- Route planning model of multi-agent system for a supply chain management(Mortaza Zolfpour Arokhlo, A. Selamat, S. Hashim, 2013, Expert Systems with Applications)
- The architecture of Agent-Based Intelligent Tutoring System for the learning of software engineering Function Point Metrics(Aedah Binti Abd. Rahman, Munaisyah Abdullah, Siti Hajar Alias, 2016, 2016 2nd International Symposium on Agent, Multi-Agent Systems and Robotics (ISAMSR))
- Hippocampal formation-inspired probabilistic generative model.(Akira Taniguchi, Ayako Fukawa, Hiroshi Yamakawa, 2022, Neural networks : the official journal of the International Neural Network Society)
- A multi-agent architecture for supporting distributed normality-based intelligent surveillance(David Vallejo, J. Albusac, J. J. Castro-Schez, C. González-Morcillo, Luis Jiménez, 2011, Engineering Applications of Artificial Intelligence)
- Toward Self-Aware Robots.(Raja Chatila, Erwan Renaudo, Mihai Andries, Ricardo-Omar Chavez-Garcia, Pierre Luce-Vayrac, Raphael Gottstein, Rachid Alami, Aurélie Clodic, Sandra Devin, Benoît Girard, Mehdi Khamassi, 2018, Frontiers in robotics and AI)
- Engineering LLM Powered Multi-Agent Framework for Autonomous CloudOps(Kannan Parthasarathy, Karthik Vaidhyanathan, Rudra Dhar, Venkat Krishnamachari, Basil Muhammed, Adyansh Kakran, Sreemaee Akshathala, Shrikara Arun, Sumant Dubey, Mohan Veerubhotla, Amey Karan, 2025, 2025 IEEE/ACM 4th International Conference on AI Engineering – Software Engineering for AI (CAIN))
- Haptic perception using optoelectronic robotic flesh for embodied artificially intelligent agents.(Jose A Barreiros, Artemis Xu, Sofya Pugach, Narahari Iyengar, Graeme Troxell, Alexander Cornwell, Samantha Hong, Bart Selman, Robert F Shepherd, 2022, Science robotics)
- Spontaneous Emergence of Agent Individuality Through Social Interactions in Large Language Model-Based Communities.(Ryosuke Takata, Atsushi Masumori, Takashi Ikegami, 2024, Entropy (Basel, Switzerland))
- Multi-Agent-Based Urban Vegetation Design.(Ahmed Khairadeen Ali, Hayub Song, One Jae Lee, Eun Seok Kim, Haneen Hashim Mohammed Ali, 2020, International journal of environmental research and public health)
- Mathematics of multi-agent learning systems at the interface of game theory and artificial intelligence(Long Wang, Feng Fu, Xingru Chen, 2024, Science China Information Sciences)
- On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)(Vishal Pallagani, Kaushik Roy, Bharath Muppasani, F. Fabiano, A. Loreggia, K. Murugesan, Biplav Srivastava, F. Rossi, L. Horesh, Amit P. Sheth, 2024, Proceedings of the International Conference on Automated Planning and Scheduling)
- An intelligent agent-based architecture for strategic information system applications(M. A. Shirazi, Javad Soroor, 2007, Knowledge-Based Systems)
- A distributed nanocluster based multi-agent evolutionary network.(Liying Xu, Jiadi Zhu, Bing Chen, Zhen Yang, Keqin Liu, Bingjie Dang, Teng Zhang, Yuchao Yang, Ru Huang, 2022, Nature communications)
- Agent-based modeling of high-resolution household electricity demand profiles: A novel tool for policy evaluating(Yu Wang, Haiyang Lin, Yiling Liu, R. Wennersten, Qie Sun, 2017, 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2))
- A multi-agent system for integrated scheduling and maintenance planning of the flexible job shop(Manojkumar Pal, M. L. Mittal, G. Soni, S. Chouhan, 2023, Computers & Operations Research)
- Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning(Alexey Skrynnik, A. Andreychuk, Maria Nesterova, K. Yakovlev, Aleksandr Panov, 2023, AAAI Conference on Artificial Intelligence)
- Dynamic traffic signal control using mean field multi‐agent reinforcement learning in large scale road‐networks(Tian-Meng Hu, Zhiqun Hu, Zhaoming Lu, X. Wen, 2023, IET Intelligent Transport Systems)
This report systematically organizes the literature on agent research into four core dimensions: (1) task planning and collaboration agents centered on large language models (LLMs), focusing on reasoning and applications in complex scenarios; (2) control of complex systems via multi-agent reinforcement learning (MARL), focusing on cooperative optimization in large-scale environments; (3) embodied intelligence and human-agent interaction, emphasizing socially situated emotional expression and interaction experience in domains such as healthcare and mental health; and (4) general agent architectures, foundational cognitive theory, and safety research, covering low-level system construction methods, social-dynamics simulation, and safeguards addressing AI trust and risk. Overall, the field is evolving from isolated algorithmic optimization toward a synthesis of model-driven reasoning, embodied perception, and safety and trustworthiness.
A total of 331 related references.
This paper examines the decision-making processes of physicians and intelligent agents within the healthcare sector, particularly focusing on their characteristics, architectures, and approaches. We provide a theoretical insight into the evolving role of artificial intelligence (AI) in healthcare, emphasizing its potential to address various healthcare challenges. Defining features of intelligent agents are explored, including their perceptual abilities and behavioral properties, alongside their architectural frameworks, ranging from reflex-based to general learning agents, and contrasted with the rational decision-making structure employed by physicians. Through data collection, hypothesis generation, testing, and reflection, physicians exhibit a nuanced approach informed by adaptability and contextual understanding. A comparative analysis between intelligent agents and physicians reveals both similarities and disparities, particularly in adaptability and contextual comprehension. While intelligent agents offer promise in enhancing clinical decisions, challenges with types of dataset biases pose significant hurdles. Informing and educating physicians about AI concepts can build trust and transparency in intelligent programs. Such efforts aim to leverage the strengths of both human and AI toward improving healthcare delivery and outcomes.
No abstract available
Abstract Multi-agent systems enable the division of complicated tasks into individual objects that can cooperate. Such an architecture can be useful in building solutions for the Internet of Medical Things (IoMT). In this paper, we propose an architecture for such a system that ensures the security of private data and allows the addition and/or modification of the classification methods used. The main advantages of the proposed system rest on the implementation of blockchain technology elements and threaded federated learning. The individual elements are located on the agents, which exchange information. Additionally, we propose building an agent with a consortium mechanism for combining classification results from many machine learning solutions. This proposal offers a new model of agents that can be implemented as a system for processing medical data in real time. Our proposal is described and tested to demonstrate its advantages over existing state-of-the-art methods. We show that it can improve Internet of Medical Things solutions by presenting a new idea of a multi-agent system that can separate different tasks, such as security and classification, and as a result minimize operation time and increase accuracy.
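The consortium mechanism described above, which aggregates classification results from several member models, can be sketched as a weighted vote. The function name, confidence-weighted scoring, and trust weights below are illustrative assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

def consortium_classify(predictions, weights=None):
    """Aggregate (label, confidence) votes from several member models.

    predictions: list of (label, confidence) pairs, one per model.
    weights: optional per-model trust weights (illustrative assumption).
    Returns the label with the highest weighted confidence mass.
    """
    weights = weights or [1.0] * len(predictions)
    score = defaultdict(float)
    for (label, conf), w in zip(predictions, weights):
        score[label] += w * conf
    return max(score, key=score.get)

# Three member models vote on the class of one medical record.
votes = [("benign", 0.9), ("malignant", 0.6), ("benign", 0.7)]
print(consortium_classify(votes))  # -> benign
```

In a deployment like the one the paper envisions, each vote would arrive from a separate agent, and the weights could reflect each member's historical accuracy.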
Abstract The consequences of natural or man-made catastrophes can be devastating. To minimize their impact, it is crucial to carry out a rapid analysis of the affected environment in the moments after they occur, especially from the perspective of alert notification or crisis management. In this context, the use of UAVs, understood as the technological basis on which intelligent systems capable of supporting rescue teams are built, has contributed positively to meeting this challenge. This article proposes the design of a multi-agent architecture that enables the deployment of systems made up of intelligent agents that can monitor environments affected by a catastrophe and support human staff in the decision-making process. These environments, known in advance, are characterized through a set of points of interest that are critical from the point of view of aerial surveillance and monitoring. To conduct an intelligent information analysis, a formal model of normality analysis is employed, which makes possible the definition of surveillance components. These represent the knowledge bases of the agents responsible for monitoring environments. Likewise, the architecture envisages communication and cooperation mechanisms between the different agents as the basis for fusing information to assess the overall level of risk of the monitored environment. A case study is presented in which the spread of toxic smoke is monitored in an industrial complex that has just suffered a hypothetical earthquake.
No abstract available
No abstract available
Development of a BDI-Based Intelligent Agent Architecture for Distribution Systems Restoration Planning
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
The North Atlantic Treaty Organization (NATO) Research Task Group IST-152 developed a concept and a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military assets. The group released a detailed report, briefly reviewed in this article, where such an agent is referred to as an Autonomous Intelligent Cyber-defense Agent (AICA). In a conflict with a technically sophisticated adversary, NATO military networks will operate in a heavily contested battlefield. Enemy malware will likely infiltrate and attack friendly networks and systems. Today’s reliance on human cyber defenders will be untenable on the future battlefield. Instead, artificially intelligent agents, such as AICAs, will be necessary to defeat the enemy malware in an environment of potentially disrupted communications where human intervention may not be possible. The IST-152 group identified specific capabilities of AICA. For example, AICA will have to be capable of autonomous planning and execution of complex multi-step activities for defeating or degrading sophisticated adversary malware, with the anticipation and minimization of resulting side effects. It will have to be capable of adversarial reasoning to battle against a thinking, adaptive malware. Crucially, AICA will have to keep itself and its actions as undetectable as possible, and will have to use deceptions and camouflage. The report identifies the key functions and components and their interactions for a potential reference architecture of such an agent, as well as a tentative roadmap toward the capabilities of AICA.
Nowadays, network users' demands for low latency services and individual data security grow rapidly. In this article, we conceive a new communication network architecture based on the concept of intelligent agents. The proposed intelligent agent serves as a virtual secretary of the corresponding user to request desired service from Internet companies or neighboring agents, which can reduce Internet surfing latency and prevent privacy leakage. Moreover, to improve the forwarding efficiency of data packages, a new mixed routing structure and the AI router are also introduced, to support various on-demand services with low latency. Overall, the proposed communication network architecture is expected to be more efficient than the existing one, and inspire readers to deliberate the revolution of future communications networks.
Future networks such as 6G are projected to incorporate in-network intelligence toward achieving a zero-touch network. In this regard, several approaches have been proposed for the organizational architecture of future networks. Similar to microservice-based service design, a multi-agent-based network automation architecture was proposed as a competing paradigm to service-oriented architecture. The proposed architecture outlines design guidelines for intelligent network systems that use agents as atomic, autonomous service units serving as building blocks. As a continuation of this approach, we design a Quality of Service (QoS) agent to control and manage stringent services such as remote surgery. The QoS agent is intelligent: it can capture and respond proactively to network traffic and workload distributions, which show hourly and seasonal patterns. QoS agents can be used as building blocks, along with traffic-classification and traffic-prediction agents, for an intelligent networking system. For evaluation, a campus network is designed in a NetSim environment using a three-tier network architecture. The QoS agent dynamically finds the best path for a particular service depending on its requirements. The agent communicates with the traffic-prediction and traffic-classification agents to collect information about the network and services. Moreover, the QoS agent also observes the network state. Using these values along with existing knowledge of the network topology, it ranks the available paths to proactively allocate the best path for a service. Evaluation results suggest that, with appropriate traffic-prediction accuracy, the proposed approach can autonomously adapt its path allocation for a given service.
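The path-ranking step that the QoS agent performs can be sketched as a cost function over candidate paths. The linear latency/load trade-off and the names below are illustrative assumptions; the paper does not specify this exact scoring:

```python
def rank_paths(paths, predicted_load, latency_weight=0.5):
    """Rank candidate paths for a service (illustrative sketch).

    paths: dict mapping path id -> measured latency (ms).
    predicted_load: dict mapping path id -> forecast utilisation in [0, 1],
        e.g. supplied by a traffic-prediction agent.
    Lower combined cost is better; returns path ids, best first.
    """
    def cost(p):
        # Blend latency with forecast load (scaled to comparable units).
        return latency_weight * paths[p] + (1 - latency_weight) * 100 * predicted_load[p]
    return sorted(paths, key=cost)

paths = {"p1": 20.0, "p2": 12.0, "p3": 35.0}
load = {"p1": 0.2, "p2": 0.9, "p3": 0.1}
print(rank_paths(paths, load))  # -> ['p1', 'p3', 'p2']
```

Note how the lowest-latency path `p2` is demoted because the prediction agent forecasts heavy load on it, which is the proactive behavior the abstract describes.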
We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. We show that (1) in training, InforMARL has better sample efficiency and performance than baseline approaches, despite using less information, and (2) in testing, it scales well to environments with arbitrary numbers of agents and obstacles. We illustrate these results using four task environments, including one with predetermined goals for each agent, and one in which the agents collectively try to cover all goals. Code available at https://github.com/nsidn98/InforMARL.
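InforMARL's neighborhood aggregation can be illustrated with a single message-passing step. The mean aggregation, tanh nonlinearity, and random weights below are illustrative assumptions for a minimal sketch, not the paper's actual architecture:

```python
import numpy as np

def aggregate_neighborhood(features, adjacency, W_self, W_neigh):
    """One message-passing layer: each agent combines its own features
    with the mean of its neighbours' features (a minimal GNN sketch)."""
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated agents keep only their own features
    neigh_mean = adjacency @ features / deg
    return np.tanh(features @ W_self + neigh_mean @ W_neigh)

rng = np.random.default_rng(0)
n_agents, dim = 4, 3
feats = rng.normal(size=(n_agents, dim))
# Symmetric adjacency: who is in whose local neighbourhood.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W_s, W_n = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
out = aggregate_neighborhood(feats, adj, W_s, W_n)
print(out.shape)  # (4, 3)
```

The aggregated embedding per agent can then feed both the actor and the critic, which is the design choice the abstract highlights.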
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Over the last few years, machine learning and data mining methods (MLDM) are constantly evolving, in order to accelerate the process of knowledge discovery from data (KDD). Today's challenge is to select only the most relevant knowledge from those extracted. The present paper is directed to these purposes, by developing a new concept of knowledge mining for meta-knowledge extraction, and extending the most popular machine learning methods to extract meta-models. This new concept of knowledge classification is integrated on the cognitive agent architecture, so as to speed-up its inference process. With this new architecture, the agent will be able to select only the actionable rule class, instead of trying to infer its whole rule base exhaustively.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
This paper presents a model to define heterogeneous agents that solve problems by sharing the knowledge retrieved from the Web and cooperating among themselves. The control structure of those agents is based on a general-purpose multi-agent architecture (SKELETONAGENT) built on a deliberative approach. Any agent in the architecture is composed of several interrelated modules: a control module, a language and communication module, skills modules, a knowledge base, yellow pages, etc. The control module uses an agenda to activate and coordinate the agent's skills. This agenda handles actions arising both from the internal goals of the agent and from other agents in the environment. The paper presents a high-level agent model, which is later instantiated to build a set of heterogeneous specialized agents, and describes how SKELETONAGENT has been used to implement different kinds of agents and a specialized Multi-Agent System (MAS). The implemented MAS, MAPWEB-ETOURISM, is a specific implementation of a general Web-gathering architecture, named MAPWEB, which extends SKELETONAGENT. MAPWEB has been designed to solve problems in Web domains by integrating information gathering with planning techniques. The MAPWEB-ETOURISM system has been applied to a specific Web domain (e-tourism) and uses information gathered directly from several Web sources (airline, train, and hotel companies) to solve travel problems. The paper shows how the proposed architecture allows the agents' tasks to be integrated with AI techniques such as planning, building a MAS able to gather and integrate information retrieved from the Web to solve problems.
No abstract available
No abstract available
Abstract The paper presents the formalism of an intelligent decision-making system based on multi-agent neurocognitive architectures, which has an architectural similarity to the human brain. An invariant of the organizational and functional structure of the intellectual decision-making process based on the multi-agent neurocognitive architecture is developed. An algorithm for teaching intelligent decision-making systems based on the self-organization of the invariant of multi-agent neurocognitive architectures is presented. Using this algorithm, an intelligent agent was trained and the architecture of the learning process was built on the basis of an invariant of neurocognitive architecture. Further research is related to training an intelligent agent in more complex behavior and expanding the capabilities of an intelligent decision-making system based on multi-agent neurocognitive architectures.
No abstract available
With the rapid development of underwater robots, underwater communication techniques, etc., the Autonomous Underwater Vehicle (AUV) cluster network has emerged as a candidate paradigm for underwater civil and military applications, e.g., underwater target tracking. In this paper, we focus on how to utilize networking and multi-agent artificial intelligence techniques to improve underwater target tracking. In particular, to improve the flexibility and scalability of the AUV cluster network, we employ Software-Defined Networking (SDN) and Centralized Training with Decentralized Execution (CTDE)-based Multi-Agent Reinforcement Learning (MARL) to propose a Hierarchical Software-Defined Multiple-AUV Reinforcement Learning (HSD-MARL) framework. For the MARL mechanism in HSD-MARL, we propose an advantage-attention mechanism and present the Multi-AUV Advantage-Attention Actor-Critic (MA-A3C) architecture to address slow convergence and poor scalability on large-scale AUV cluster networks. Further, to improve the utilization rate of advantageous samples, especially when MA-A3C is used for underwater tracking on the AUV cluster network, we propose an 'advantage resampling' method based on an experience replay buffer. Evaluation results show that our proposed approaches can perform accurate underwater target tracking on AUV cluster network systems and outperform recent alternatives in terms of convergence speed, tracking accuracy, etc.
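The 'advantage resampling' idea, drawing replay-buffer samples with probability tied to their advantage, can be sketched as weighted sampling. This is a hedged illustration; the paper's exact weighting scheme may differ:

```python
import random

def advantage_resample(buffer, batch_size, eps=1e-6):
    """Sample a training batch with probability proportional to |advantage|
    (a sketch of the 'advantage resampling' idea).

    buffer: list of (transition, advantage) pairs.
    eps: small constant so zero-advantage samples remain reachable.
    """
    weights = [abs(adv) + eps for _, adv in buffer]
    return random.choices([t for t, _ in buffer], weights=weights, k=batch_size)

random.seed(1)
buffer = [("t1", 0.05), ("t2", 2.0), ("t3", 0.1), ("t4", 1.5)]
batch = advantage_resample(buffer, batch_size=1000)
# High-advantage transitions dominate the sampled batch.
print(batch.count("t2") > batch.count("t1"))
```

The intuition matches the abstract: transitions whose advantage estimates are large carry more learning signal, so resampling them more often speeds up actor-critic training.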
The multiple Autonomous Underwater Vehicle (AUV)-assisted cooperative system, or AUV-based Underwater Ad-hoc Network (UAN), has been considered a highly promising direction for underwater data surveillance. In this paper, we propose a grid-based distributed data collection architecture and define two categories of navigation modes. Based on the proposed data collection model, we propose MADAC, a scheme that uses an AUV-based UAN to cooperatively collect data from 6G-driven underwater wireless networks. We utilize the Software-Defined Networking (SDN) technique to reorganize the architecture of the AUV-based UAN and propose a software-defined actor-critic MARL framework. Based on this framework, we present a MADDPG algorithm with an optimal-similarity attention mechanism (MADDPG-SA) to plan paths for the AUV-based UAN, concurrently taking into account cooperative underwater obstacle avoidance, task-distribution balancing, and the Value of Information (VoI). In particular, MADDPG-SA improves running efficiency by encouraging each agent to learn from similar, better-performing agents. The evaluation results demonstrate that the proposed MADAC can schedule the AUV-based UAN to perform efficient underwater data collection, reduce data collection time and energy consumption, and balance data collection tasks across the AUV-based UAN.
No abstract available
Increasingly, artificial intelligence (AI) is being used to support automotive systems, including autonomous vehicles (AVs) with self-driving capabilities. The premise is that learning-enabled systems (LESs), those systems that have one or more AI components, use statistical models to make better informed adaptation decisions and mitigate potentially dangerous situations. These AI techniques largely focus on uncertainty factors that can be explicitly identified and defined (e.g., environmental conditions). However, the unexpected behavior of human actors is a source of uncertainty that is challenging to explicitly model and define. In order to train a learning-enabled AV, developers may use a combination of real-world monitored data and simulated external actor behaviors (e.g., human-driven vehicles, pedestrians, etc.), where participants follow defined sets of rules such as traffic laws. However, if uncertain human behaviors are not sufficiently captured during training, then the AV may not be able to safely handle unexpected behavior induced by human-operated vehicles (e.g., unexpected sudden lane changes). This work introduces a non-cooperative game theory and reinforcement learning-based (RL) framework to discover and assess an AV's ability to handle high-level uncertain behavior(s) induced by human-based rewards. The discovered synthetic data can then be used to reconfigure the AV to robustify onboard behaviors.
Software-defined vehicles (SDVs) leverage vehicle-to-everything (V2X) communication to enable advanced connectivity and autonomous driving capabilities. However, this increased interconnectivity also exposes them to cyber threats such as spoofing, denial-of-service attacks, and data manipulation, making intrusion detection systems (IDS) essential for ensuring SDV security and reliability. In this work, we propose a novel intrusion mitigation approach that integrates Advantage Actor-Critic (A2C) reinforcement learning with a Long Short-Term Memory (LSTM) network to detect anomalies and intrusions in V2X communications. The LSTM component captures temporal dependencies in V2X data, enhancing the model's ability to identify emerging attack patterns, while the A2C framework dynamically adjusts defensive actions, including flagging, blocking or monitoring traffic, based on evolving threat levels. Experimental results demonstrate the model's effectiveness, achieving high detection accuracy and sensitivity. Additionally, we analyze how the system adapts over time, becoming more confident in its decision-making and optimizing security enforcement. This work enhances SDV cybersecurity by introducing a learning-based adaptive intrusion response system aiming at mitigating threats in highly dynamic vehicular networks.
The integration of artificial intelligence in computer vision tasks has significantly transformed computing technology, especially in applications such as autonomous vehicles, medical image diagnosis, and video surveillance systems. However, multi-object tracking, a key component of modern computer vision, faces challenges such as object occlusion, changing object appearance, and complex interactions with the background. This paper discusses the improvements in multi-object tracking achieved by integrating multi-agent reinforcement learning with modern tracking methods. A novel tracking system is proposed that uses the deep deterministic policy gradient algorithm for multi-agent systems in combination with the YOLO v7 object detector. The system uses centralized learning with decentralized execution and the Actor-Critic architecture to dynamically and accurately track multiple objects in real time. The proposed method enhances tracking performance by improving the coordination between agents and efficiently handling the dynamics of the environment. Experiments conducted on the MOT16 dataset showed significant improvements in tracking accuracy and processing speed, making this method suitable for a variety of applications.
No abstract available
The exploitation of marine resources and rapid urbanization along coastal cities result in serious marine pollution, especially underwater diffusion pollution. Detecting the source of diffusion pollution, so that its harmful effects can be reduced, is a non-trivial task. With the vision of the 6G framework, we employ an Autonomous Underwater Vehicle (AUV) flock and introduce the concept of an AUV-based network. In particular, we utilize the Software-Defined Networking (SDN) technique to improve the controllability of the AUV-based network, leading to the paradigm of SDN-enabled multi-AUV network Intelligent Transportation Systems (SDNA-ITS). For SDNA-ITS, we model the control problem using artificial potential field theory. To optimize the system output, we introduce a graph-based Soft Actor-Critic (SAC) algorithm, a category of Multi-Agent Reinforcement Learning (MARL) in which each AUV can be regarded as a node in a graph. In particular, we improve the optimization model based on a Centralized Training Decentralized Execution (CTDE) architecture with the assistance of the SDN controller, by which each AUV can efficiently adjust its speed toward the diffusion source. Further, to achieve exact path planning for detecting the diffusion source, a dynamic detection scheme is proposed that outputs a united control policy to schedule the SDNA-ITS dynamically. Simulation results demonstrate that our approach can detect the underwater diffusion source in realistic scenarios and performs better than recent alternatives.
With the rapid development of the Underwater Internet of Things (UIoT), underwater security challenges, especially the problem of illegal intrusion, are becoming increasingly prominent, threatening the security of underwater environments and infrastructures. In this article, we propose an approach combining Software-Defined Networking (SDN) and Multi-Agent Deep Reinforcement Learning (MADRL) to improve the efficiency of cooperative tracking by Autonomous Underwater Vehicle (AUV) systems in complex underwater environments. By leveraging SDN technology, this novel approach achieves centralized control and management of multi-AUV systems. We then construct a trajectory prediction network based on an attention mechanism and Long Short-Term Memory (LSTM) to generate target trajectories and provide prior knowledge for collaborative pursuit. Finally, a novel MADRL method combining bidirectional LSTM and multi-agent soft actor-critic is proposed to make real-time, accurate, distributed, and adaptive pursuit decisions even when some of the AUVs break down. The experimental results demonstrate that the proposed methods pursue the target successfully at the fastest speed, as compared to the latest MADRL methods.
Safety assurance of autonomous vehicles (AVs) is particularly challenging when considering the infinite number of scenarios an AV may encounter. As such, existing scenario generation approaches optimize search to derive dangerous refinements of a (same) abstract scenario given as input. In this paper, we propose a scenario generation approach that derives dangerous (collision-inducing) concrete scenarios from arbitrary abstract scenarios (under reasonable assumptions). As added novelty, our approach allows to compare the level of danger offered by different abstract scenarios. We evaluate the collision avoidance capacity of the Transfuser AV controller by generating, then simulating, collision-inducing 2-actor scenarios at a road junction. Results show that distinctions at higher abstraction levels yield measurable differences in simulation.
Model-free control has gained in popularity in recent years for its adaptive, data-driven solutions to complex control objectives. This paper discusses the application of model-free reinforcement learning (RL) to the control of multibody aerial vehicles. First, a summary of model-free RL as it applies to robotic systems is presented. Then, a model-free RL controller is demonstrated, whereby an actor-critic algorithm is used to stabilize the swing of a slung payload carried by an autonomous quadcopter aerial vehicle, utilizing the latter’s full continuous action-space. Its effectiveness and adaptability are first demonstrated in simulation, and then validated on an experimental testbed. The algorithm is shown to be computationally efficient enough to adapt to the experimental testbed in real-time, and to work within the framework of widely-used autopilot software ArduPilot.
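The actor-critic loop at the heart of the controller above can be illustrated with a single tabular update step. This is a generic one-step sketch of the actor-critic idea, not the paper's continuous-action flight controller; all names and learning rates are illustrative:

```python
def actor_critic_step(V, policy, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.99):
    """One tabular actor-critic update.

    V: dict state -> value estimate (the critic).
    policy: dict (state, action) -> action preference (the actor).
    The TD error drives both updates.
    """
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error           # critic moves toward the TD target
    policy[(s, a)] += beta * td_error  # actor reinforces actions with positive TD error
    return td_error

V = {"s0": 0.0, "s1": 0.0}
policy = {("s0", "up"): 0.0}
delta = actor_critic_step(V, policy, "s0", "up", 1.0, "s1")
print(round(V["s0"], 3), round(policy[("s0", "up")], 3))  # -> 0.1 0.1
```

In the quadcopter setting the abstract describes, the tables would be replaced by function approximators over a continuous state-action space, but the TD-error-driven coupling between actor and critic is the same.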
The rapid development of intelligent underwater devices promotes marine exploitation activities, including marine resource exploitation, marine target tracking, etc. This work presents how to utilize an Autonomous Underwater Vehicle (AUV) swarm, or multi-AUV system, to track underwater diffusion pollution, especially the equipotential line of a particular concentration. Unlike most current research, in this work we treat the AUV swarm as a network system and utilize the Software-Defined Networking (SDN) technique to optimize the network architecture, constructing an SDN-enabled AUV network Intelligent Transportation System (ITS). With the centralized management ability of the SDN technique, we propose a software-defined Centralized Training Decentralized Execution (CTDE) architecture based on a graph-based Soft Actor-Critic (SAC) algorithm to optimize system control and management. To improve computing and training efficiency, we embed a self-attention mechanism into the critic network, leading to a self-attention-based SAC algorithm. Evaluation results demonstrate that our proposed approach can accurately track the equipotential lines of a particular concentration across many categories of underwater diffusion field (with different types of equipotential lines, including shape, noise, and diffusion value). Meanwhile, our proposed approaches outperform some classical schemes in system rewards, tracking errors, etc.
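Several of the AUV-swarm abstracts above rely on artificial potential fields for low-level motion control. A minimal sketch combines attraction toward a goal point with short-range repulsion from nearby vehicles; the gains and cutoff distance below are illustrative assumptions, not values from any of the papers:

```python
import numpy as np

def potential_field_velocity(pos, goal, neighbors, k_att=1.0, k_rep=0.5, d0=2.0):
    """Velocity command from an artificial potential field (illustrative sketch):
    attraction toward the goal plus repulsion from AUVs closer than d0."""
    v = k_att * (goal - pos)  # attractive component
    for n in neighbors:
        diff = pos - n
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # Repulsion grows as the neighbour gets closer, vanishes at d0.
            v += k_rep * (1.0 / d - 1.0 / d0) * diff / d**2
    return v

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
neighbors = [np.array([1.0, 0.0])]
v = potential_field_velocity(pos, goal, neighbors)
print(v)  # attraction toward the goal, reduced by the neighbour directly ahead
```

In the SDN-enabled setups described above, a centralized controller would tune or override such per-AUV commands, while an RL policy can replace the hand-set gains entirely.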
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Traditional approaches for steering a vehicle using machine vision require large amounts of robust hand-crafted software, which is both time-consuming and expensive. The presented method uses a deep neural network to teach cars to steer themselves without any additional software. We created a labeled dataset for the ACTor (Autonomous Campus TranspORt) electric vehicle by pairing real-world images taken during a drive with the associated steering wheel angle. We trained a model end to end using modern deep learning techniques, including convolutional neural networks and transfer learning, to automatically detect relevant features in the input and provide a predicted output. This means that no traditional hand-engineered algorithm features were required for this implementation. We currently use an Inception network pretrained on the ImageNet dataset, leveraging the high-level features learned from ImageNet for the steering problem through transfer learning. We removed the top portion of the network and replaced it with a linear regression node to provide the output. The model is trained end to end using backpropagation. The trained model is integrated with vehicle software on ROS (Robot Operating System) to read image data and provide a corresponding steering angle in real time. The current model achieves a 15.2-degree error on average. As development continues, the model may replace the current lane-centering software and will be used for the IGVC Self-Drive competition and campus transportation.
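The transfer-learning recipe here (freeze a pretrained feature extractor, train only a new linear regression head) can be sketched without a deep-learning framework. In this toy, a fixed random ReLU projection stands in for the frozen Inception body, and a hypothetical "steering angle" target is fit by plain SGD on the head alone; all dimensions and data are illustrative.

```python
import random

random.seed(0)
IN_DIM, FEAT_DIM = 4, 8

# "Frozen" feature extractor: a fixed random ReLU projection standing in
# for the pretrained Inception body (illustrative only, never updated).
W_FROZEN = [[random.uniform(-1, 1) for _ in range(IN_DIM)]
            for _ in range(FEAT_DIM)]

def features(x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_FROZEN]

def mse(head_w, head_b, data):
    err = 0.0
    for x, y in data:
        f = features(x)
        pred = sum(w * fi for w, fi in zip(head_w, f)) + head_b
        err += (pred - y) ** 2
    return err / len(data)

# Toy "steering" targets: angle = x0 - x1 (hypothetical relationship)
data = []
for _ in range(50):
    x = [random.uniform(-1, 1) for _ in range(IN_DIM)]
    data.append((x, x[0] - x[1]))

head_w, head_b = [0.0] * FEAT_DIM, 0.0
before = mse(head_w, head_b, data)
for _ in range(300):                      # SGD updates touch only the head
    for x, y in data:
        f = features(x)
        e = sum(w * fi for w, fi in zip(head_w, f)) + head_b - y
        head_w = [w - 0.01 * e * fi for w, fi in zip(head_w, f)]
        head_b -= 0.01 * e
after = mse(head_w, head_b, data)
```

Only the regression head is trained, mirroring the abstract's design of replacing the top of the pretrained network with a linear output node.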
No abstract available
An approach for modelling adaptive complex systems should be flexible and scalable, to allow a system to grow easily, and should have a formal foundation to guarantee the correctness of the system's behavior. In this paper, we present the architecture and the formal syntax and semantics of HPobSAM, a model for specifying behavioral and structural adaptations in large-scale systems while addressing re-usability concerns. Self-adaptive modules are used as the building blocks to structure a system, and policies are used as the mechanism to perform both behavioral and structural adaptations. While a self-adaptive module is autonomous in achieving its local goals by collaborating with other self-adaptive modules, it is controlled by a higher-level entity to prevent undesirable behavior. HPobSAM is formalized using a combination of algebraic, graph-transformation-based and actor-based formalisms.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Clinical decision-making in oncology is complex, requiring the integration of multimodal data and multidomain expertise. We developed and evaluated an autonomous clinical artificial intelligence (AI) agent leveraging GPT-4 with multimodal precision oncology tools to support personalized clinical decision-making. The system incorporates vision transformers for detecting microsatellite instability and KRAS and BRAF mutations from histopathology slides, MedSAM for radiological image segmentation and web-based search tools such as OncoKB, PubMed and Google. Evaluated on 20 realistic multimodal patient cases, the AI agent autonomously used appropriate tools with 87.5% accuracy, reached correct clinical conclusions in 91.0% of cases and accurately cited relevant oncology guidelines 75.5% of the time. Compared to GPT-4 alone, the integrated AI agent drastically improved decision-making accuracy from 30.3% to 87.2%. These findings demonstrate that integrating language models with precision oncology and search tools substantially enhances clinical accuracy, establishing a robust foundation for deploying AI-driven personalized oncology support systems. Ferber et al. present an autonomous artificial intelligence agent system for deployment of specialized medical oncology computational tools, validating their system across various clinical scenarios representative of typical patient care workflows.
No abstract available
Large language models (LLMs) hold significant promise in the field of medical diagnosis, yet many challenges remain in the direct diagnosis of hepatocellular carcinoma (HCC). α-Fetoprotein (AFP) is a commonly used tumor marker for liver cancer; however, relying on AFP alone can result in missed diagnoses of HCC. We developed an artificial intelligence (AI) agent centered on LLMs, named ChatExosome, which provides an interactive and convenient system for clinical spectroscopic analysis and diagnosis. ChatExosome consists of two main components. The first is deep learning of the Raman fingerprints of exosomes derived from HCC: based on a patch-based 1D self-attention mechanism and downsampling, a feature fusion transformer (FFT) was designed to process the Raman spectra of exosomes, achieving accuracies of 95.8% on cell-derived exosomes and 94.1% on 165 clinical samples. The second is an interactive chat agent based on an LLM, in which the retrieval-augmented generation (RAG) method is utilized to enhance exosome-related knowledge. Overall, the LLM serves as the core of this interactive system, identifying users' intentions and invoking the appropriate plugins to process the Raman data of exosomes. This is the first AI agent focusing on exosome spectroscopy and diagnosis; it enhances the interpretability of classification results, enables physicians to leverage cutting-edge medical research and artificial intelligence techniques to optimize medical decision-making, and shows great potential for intelligent diagnosis.
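The retrieval-augmented generation step can be illustrated with a minimal bag-of-words retriever: embed the query and candidate passages, pick the most similar passage, and prepend it to the prompt that reaches the LLM. The corpus, scoring scheme, and prompt template below are toy stand-ins, not ChatExosome's actual knowledge base or pipeline.

```python
import math

# Toy RAG retrieval sketch (corpus and scoring are illustrative
# stand-ins, not ChatExosome's knowledge base).
DOCS = [
    "exosomes are extracellular vesicles carrying molecular cargo",
    "raman spectroscopy probes molecular vibrations with laser light",
    "alpha fetoprotein is a serum tumor marker for liver cancer",
]

def bow(text):
    """Bag-of-words term counts."""
    v = {}
    for w in text.lower().split():
        v[w] = v.get(w, 0) + 1
    return v

def cosine(u, v):
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query):
    q = bow(query)
    return max(DOCS, key=lambda d: cosine(q, bow(d)))

def build_prompt(query):
    # retrieved context is prepended before the question reaches the LLM
    return f"Context: {retrieve(query)}\nQuestion: {query}"
```

Real RAG systems replace the bag-of-words vectors with dense embeddings, but the retrieve-then-augment control flow is the same.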
Evolutionary Game Theory (EGT) and Artificial Intelligence (AI) are two fields that, at first glance, might seem distinct, but they have notable connections and intersections. The former focuses on the evolution of behaviors (or strategies) in a population, where individuals interact with others and update their strategies based on imitation (or social learning); the more successful a strategy is, the more prevalent it becomes over time. The latter, meanwhile, is centered on machine learning algorithms and (deep) neural networks; it often takes a single-agent perspective but increasingly involves multi-agent environments, in which intelligent agents adjust their strategies based on feedback and experience, somewhat akin to the evolutionary process yet distinct in their self-learning capacities. In light of the key components necessary to address real-world problems, including (i) learning and adaptation, (ii) cooperation and competition, (iii) robustness and stability, and altogether (iv) population dynamics of individual agents whose strategies evolve, the cross-fertilization of ideas between both fields will contribute to the advancement of the mathematics of multi-agent learning systems, in particular to the nascent domain of "collective cooperative intelligence" bridging evolutionary dynamics and multi-agent reinforcement learning.
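The imitation-driven strategy updating described here is commonly formalized by the replicator equation. A minimal discrete-time sketch for a one-shot Prisoner's Dilemma (payoff values illustrative, satisfying T > R > P > S) shows the dynamic at work: the cooperator share shrinks because defection strictly dominates.

```python
# Discrete-time replicator dynamics for cooperators vs. defectors in a
# one-shot Prisoner's Dilemma (payoffs illustrative: T > R > P > S).
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def step(x, dt=0.01):
    """x is the population share of cooperators."""
    f_c = R * x + S * (1 - x)            # cooperator fitness
    f_d = T * x + P * (1 - x)            # defector fitness
    f_bar = x * f_c + (1 - x) * f_d      # mean population fitness
    return x + dt * x * (f_c - f_bar)    # replicator update

x = 0.9
for _ in range(5000):
    x = step(x)
# Defection strictly dominates, so the cooperator share decays toward 0.
```

Multi-agent reinforcement learning with softmax policies exhibits closely related dynamics, which is one concrete bridge between the two fields the abstract advocates connecting.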
We discuss the emerging new opportunity for building feedback‐rich computational models of social systems using generative artificial intelligence. Referred to as generative agent‐based models (GABMs), such individual‐level models utilize large language models to represent human decision‐making in social settings. We provide a GABM case in which human behavior can be incorporated into simulation models by coupling a mechanistic model of human interactions with a pre‐trained large language model. This is achieved by introducing a simple GABM of social norm diffusion in an organization. For educational purposes, the model is intentionally kept simple. We examine a wide range of scenarios and the sensitivity of the results to several changes in the prompt. We hope the article and the model serve as a guide for building useful dynamic models of various social systems that include realistic human reasoning and decision‐making. © 2024 System Dynamics Society.
No abstract available
ChatMOF is an artificial intelligence (AI) system that is built to predict and generate metal-organic frameworks (MOFs). By leveraging a large-scale language model (GPT-4, GPT-3.5-turbo, and GPT-3.5-turbo-16k), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the necessity for rigid and formal structured queries. The system is comprised of three core components (i.e., an agent, a toolkit, and an evaluator) and it forms a robust pipeline that manages a variety of tasks, including data retrieval, property prediction, and structure generations. ChatMOF shows high accuracy rates of 96.9% for searching, 95.7% for predicting, and 87.5% for generating tasks with GPT-4. Additionally, it successfully creates materials with user-desired properties from natural language. The study further explores the merits and constraints of utilizing large language models (LLMs) in combination with database and machine learning in material sciences and showcases its transformative potential for future advancements. LLMs can be augmented with tools to increase their capabilities. Here, authors have developed an artificial intelligence system called ChatMOF combining LLMs and specialised libraries and utilities to predict and generate metal-organic frameworks.
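ChatMOF's agent, toolkit, and evaluator components form a route-then-check pipeline that can be caricatured as a tool-dispatch loop with a result check. The tool names, the toy database, and the returned values below are all hypothetical placeholders, not ChatMOF's actual interfaces.

```python
# Hypothetical agent -> toolkit -> evaluator dispatch sketch
# (all tool names, data, and values are illustrative toys).
def search_tool(query):
    db = {"MOF-5": {"surface_area_m2_g": 3800}}   # toy lookup table
    return db.get(query)

def predict_tool(query):
    return {"property": "band_gap", "value_eV": 3.4}   # ML-model stand-in

TOOLS = {"search": search_tool, "predict": predict_tool}

def agent(task, query):
    tool = TOOLS.get(task)
    if tool is None:                       # agent: route the task to a tool
        return {"error": f"no tool for task {task!r}"}
    result = tool(query)
    if result is None:                     # evaluator: reject empty output
        return {"error": f"{task!r} found nothing for {query!r}"}
    return {"result": result}
```

In the real system an LLM performs the routing from natural language and the evaluator judges whether a generated structure or prediction satisfies the user's request; the control flow, however, has this shape.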
Intelligent agents must be able to communicate intentions and explain their decision-making processes to build trust, foster confidence, and improve human-agent team dynamics. Recognizing this need, academia and industry are rapidly proposing new ideas, methods, and frameworks to aid in the design of more explainable AI. Yet, there remains no standardized metric or experimental protocol for benchmarking new methods, leaving researchers to rely on their own intuition or ad hoc methods for assessing new concepts. In this work, we present the first comprehensive (n = 286) user study testing a wide range of approaches for explainable machine learning, including feature importance, probability scores, decision trees, counterfactual reasoning, natural language explanations, and case-based reasoning, as well as a baseline condition with no explanations. We provide the first large-scale empirical evidence of the effects of explainability on human-agent teaming. Our results will help to guide the future of explainability research by highlighting the benefits of counterfactual explanations and the shortcomings of confidence scores for explainability. We also propose a novel questionnaire to measure explainability with human participants, inspired by relevant prior work and correlated with human-agent teaming metrics.
Cloud Computing is one of the most intensively developed solutions for large-scale distributed processing. Effective use of such environments, management of their high complexity and ensuring appropriate levels of Quality of Service (QoS) require advanced monitoring systems. Such monitoring systems have to support the scalability, adaptability and reliability of the Cloud. Most existing monitoring systems do not incorporate Artificial Intelligence (AI) algorithms to handle changes in the task stream or in the environment itself; they focus only on monitoring, or enable control of the system as part of a separate service. An effective monitoring system for the Cloud environment should gather information about all stages of task processing and should actively control the monitored environment. In this paper, we present a novel Multi-Agent System based Cloud Monitoring (MAS-CM) model that supports the performance and security of task gathering, scheduling and execution processes in large-scale service-oriented environments. Such models are explicitly designed to control the performance and security objectives of the environment. In our work, we focus on the prevention of unauthorized task injection and modification, optimization of the scheduling process and maximization of resource usage. We evaluate the effectiveness of MAS-CM empirically using an evolutionary-driven implementation of an Independent Batch Scheduler and the FastFlow framework. The obtained results demonstrate the effectiveness of the proposed approach and the performance improvement.
Applications of wireless sensor networks are proliferating to address pressing limits on social development, among which energy consumption and communication latency are critical. Effective communication traffic control and management is a potential solution, so we propose a novel traffic-control system based on deep reinforcement learning that treats traffic control as a strategy-learning process to minimize energy consumption. Our algorithm utilizes a deep neural network for learning, taking the state of the wireless sensor network as input and outputting the optimal route path. Simulation experiments demonstrate that our algorithm is able to control traffic in a wireless sensor network and can reduce energy consumption.
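The strategy-learning view of traffic control can be miniaturized: a value-learning agent repeatedly picks among candidate routes and learns to prefer the one with lower energy cost. This tabular, one-state toy replaces the paper's deep network but keeps the same learning signal (negative energy as reward); the costs and hyperparameters are illustrative.

```python
import random

random.seed(1)
# Two candidate routes with noisy energy costs; a tabular value learner
# (a stand-in for the paper's deep network) learns the cheaper route.
COSTS = {0: 5.0, 1: 2.0}          # illustrative per-transmission energy
Q = {0: 0.0, 1: 0.0}              # estimated value of each route
ALPHA, EPS = 0.1, 0.2

for _ in range(500):
    if random.random() < EPS:                 # epsilon-greedy exploration
        route = random.choice([0, 1])
    else:
        route = max(Q, key=Q.get)
    energy = COSTS[route] + random.uniform(-0.5, 0.5)
    reward = -energy                          # minimizing energy = maximizing reward
    Q[route] += ALPHA * (reward - Q[route])   # incremental value update
# Q[1] ends well above Q[0], so the learned policy picks the cheap route.
```

Replacing the table with a network over the full network state, as the paper does, lets the same update generalize across many routing situations.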
Traditionally, cognitive and computer scientists have viewed intelligence solipsistically, as a property of unitary agents devoid of social context. Given the success of contemporary learning algorithms, we argue that the bottleneck in artificial intelligence (AI) advancement is shifting from data assimilation to novel data generation. We bring together evidence showing that natural intelligence emerges at multiple scales in networks of interacting agents via collective living, social relationships and major evolutionary transitions, which contribute to novel data generation through mechanisms such as population pressures, arms races, Machiavellian selection, social learning and cumulative culture. Many breakthroughs in AI exploit some of these processes, from multi-agent structures enabling algorithms to master complex games such as Capture-The-Flag and StarCraft II, to strategic communication in the game Diplomacy and the shaping of AI data streams by other AIs. Moving beyond a solipsistic view of agency to integrate these mechanisms could provide a path to human-like compounding innovation through ongoing novel data generation. Advances in machine intelligence often depend on data assimilation, but data generation has been neglected. The authors discuss mechanisms that might achieve continuous novel data generation and the creation of intelligent systems that are capable of human-like innovation, focusing on social aspects of intelligence.
This research aims to demonstrate that artificial intelligence (AI) can function not only as a tool for learning, but also as an intelligent agent with which humans can engage in collaborative learning (CL) to change epistemic practices in science classrooms. We adopted a design and development research approach, following the Analysis, Design, Development, Implementation and Evaluation (ADDIE) model, to prototype a tangible instructional system called Collaborative Learning with AI Speakers (CLAIS). The CLAIS system is designed to have 3–4 human learners join an AI speaker to form a small group, where humans and AI are considered peers participating in the Jigsaw learning process. The development was carried out using the NUGU AI speaker platform. The CLAIS system was successfully implemented in a Science Education course session with 15 pre-service elementary science teachers. The participants evaluated the CLAIS system through mixed methods surveys as teachers, learners, peers, and users. Quantitative data showed that the participants’ Intelligent-Technological, Pedagogical, and Content Knowledge was significantly increased after the CLAIS session, the perception of the CLAIS learning experience was positive, the peer assessment on AI speakers and human peers was different, and the user experience was ambivalent. Qualitative data showed that the participants came to anticipate future changes in the epistemic process in science classrooms, while acknowledging technical issues such as speech recognition and response latency. This study highlights the potential of human-AI collaboration for knowledge co-construction in authentic classroom settings and exemplifies how AI could shape the future landscape of epistemic practices in the classroom.
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM over pairs of captions to construct an intrinsic reward, which is then used to train agents with reinforcement learning. We evaluate Motif's performance and behavior on the challenging, open-ended and procedurally-generated NetHack game. Surprisingly, by only learning to maximize its intrinsic reward, Motif achieves a higher game score than an algorithm directly trained to maximize the score itself. When combining Motif's intrinsic reward with the environment reward, our method significantly outperforms existing approaches and makes progress on tasks where no advancements have ever been made without demonstrations. Finally, we show that Motif mostly generates intuitive human-aligned behaviors which can be steered easily through prompt modifications, while scaling well with the LLM size and the amount of information given in the prompt.
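Motif's core step, turning pairwise caption preferences into a scalar intrinsic reward, can be sketched with a Bradley-Terry-style logistic update. The caption set, the preference stub standing in for the LLM annotator, and the learning rate below are all illustrative, not Motif's actual data or model.

```python
import math
import random

random.seed(0)
# Illustrative sketch: pairwise preferences (from a stub standing in for
# the LLM annotator) train a per-caption reward via Bradley-Terry updates.
CAPTIONS = ["found a key", "wasted a turn", "opened a door"]
TRUE_RANK = {"found a key": 2, "wasted a turn": 0, "opened a door": 1}

def annotator_prefers(a, b):
    # The real system queries an LLM over caption pairs here.
    return TRUE_RANK[a] > TRUE_RANK[b]

reward = {c: 0.0 for c in CAPTIONS}
LR = 0.5
for _ in range(300):
    a, b = random.sample(CAPTIONS, 2)
    if not annotator_prefers(a, b):
        a, b = b, a                 # make `a` the preferred caption
    p_a = 1.0 / (1.0 + math.exp(reward[b] - reward[a]))  # P(a preferred)
    grad = 1.0 - p_a                # gradient of the log-likelihood
    reward[a] += LR * grad
    reward[b] -= LR * grad
# `reward` now orders captions consistently with the annotator's preferences.
```

The learned scalar can then serve as an intrinsic reward for a standard RL agent, which is how Motif trains policies without the LLM ever interacting with the environment.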
Humans have long demonstrated an ability to learn from interactions with others. However, artificial intelligence (AI) agents learn in social isolation. To create intelligent systems that understand more than a fixed slice of the world, our article formalizes socially situated AI, a framework that enables agents to interact with people as they simultaneously learn new concepts about the world around them. Using our framework, we deploy a field experiment on a photo-sharing social network where our agent interacts with hundreds of thousands of people to learn concepts about the visual world. We combine advances in deep learning, computer vision, natural language processing, and human–computer interaction to deliver a human-centered AI that learns from interactions with people in social environments.
We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms on single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but more importantly, the observation and understanding of individual agents' behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show collective intelligence emerging from learning from scratch.
High-performing human teams leverage intelligent and efficient communication and coordination strategies to collaboratively maximize their joint utility. Inspired by teaming behaviors among humans, I seek to develop computational methods for synthesizing intelligent communication and coordination strategies for collaborative multi-robot systems. I leverage both classical model-based control and planning approaches as well as data-driven methods such as Multi-Agent Reinforcement Learning (MARL) to provide several contributions towards enabling emergent cooperative teaming behavior across both homogeneous and heterogeneous (including agents with different capabilities) robot teams.
This article focuses on the research of artificial intelligence agents based on large language models. These agents break away from the traditional reinforcement learning framework and can achieve internal-driven evolution through their own language generation. The article details several representative research results, including the HPTSA system from the University of Illinois at Urbana-Champaign, which adopts a hierarchical planning and task-specific agent collaboration model and significantly improves efficiency in zero-day vulnerability attacks, outperforming single-agent systems and open-source vulnerability scanners; the BattleAgent multimodal dynamic simulation system from Rutgers University, which can simulate the complex dynamic interactions of agents and provide support for historical battle reenactments; the WarAgent multi-agent simulation system from Rutgers and the University of Michigan, which can simulate international conflict events to explore factors related to war and peace; the general embodied intelligent agent research and series projects from NVIDIA, which promote the development of embodied intelligence; the "Unified Agent" framework from DeepMind of Google, which alleviates some drawbacks of traditional reinforcement learning technology; and the "Smallville" platform from Stanford University and Google's Artificial Intelligence Research Institute, as well as the Dynalang intelligent agent from the University of California, Los Angeles. These studies demonstrate the powerful capabilities and wide application prospects of artificial intelligence agents empowered by large language models in various fields.
Delivering intelligent and adaptive navigation assistance in augmented reality (AR) requires more than visual cues, as it demands systems capable of interpreting flexible user intent and reasoning over both spatial and semantic context. Prior AR navigation systems often rely on rigid input schemes or predefined commands, which limit the utility of rich building data and hinder natural interaction. In this work, we propose an embodied AR navigation system that integrates Building Information Modeling (BIM) with a multi-agent retrieval-augmented generation (RAG) framework to support flexible, language-driven goal retrieval and route planning. The system orchestrates three language agents, Triage, Search, and Response, built on large language models (LLMs), enabling robust interpretation of open-ended queries and spatial reasoning over BIM data. Navigation guidance is delivered through an embodied AR agent, equipped with voice interaction and locomotion, to enhance user experience. A real-world user study yields a System Usability Scale (SUS) score of 80.5, indicating excellent usability, and comparative evaluations show that the embodied interface can significantly improve users' perception of system intelligence. These results underscore the importance and potential of language-grounded reasoning and embodiment in the design of user-centered AR navigation systems. Video demonstrations are available at https://woven-visionai.github.io/ar-navigation-agent-project.
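The Triage, Search, and Response handoff can be caricatured in a few lines: triage maps a free-form query to a goal category, search resolves it against a (toy) building index, and response phrases the guidance. The room names and keyword matching below are hypothetical stand-ins; the real system backs each stage with an LLM and actual BIM data.

```python
# Hypothetical three-stage pipeline sketch (toy building index; the real
# system backs each stage with an LLM and actual BIM data).
BIM_INDEX = {"restroom": "Room 101", "exit": "Door E2", "cafe": "Room 210"}

def triage(query):
    """Map a free-form query to a known goal category, if any."""
    q = query.lower()
    for key in BIM_INDEX:
        if key in q:
            return key
    return None

def search(key):
    """Resolve the goal category against the building index."""
    return BIM_INDEX.get(key)

def respond(query):
    """Compose the guidance the embodied AR agent would voice."""
    key = triage(query)
    if key is None:
        return "Sorry, I could not match that request to the building model."
    return f"Head to {search(key)} for the {key}."
```

Splitting the stages this way lets each agent carry a narrow prompt and toolset, which is the usual motivation for multi-agent RAG designs.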
In intelligent transportation systems, the convergence of large language models and embodied artificial intelligence (AI) has led to Vehicular Embodied AI Networks (VEANs). In VEANs, the integration of digital twins results in Vehicular Embodied Agent AI Twins (VEAATs), which enables AI assistants for in-vehicle application services. Constrained by computational latency and limited on-board computing resources, autonomous vehicles (AVs) must offload computationally intensive VEAATs to Roadside Units (RSUs) for remote execution support with substantial resources. Considering the limited coverage of RSUs and the mobility of AVs, the migration of VEAATs among RSUs is crucial for ensuring continuous and high-quality immersive services. Nevertheless, high-density traffic exacerbates the workload imbalance among RSUs, increasing the risk of overload. This paper models the interaction between AVs and RSUs as a Stackelberg game to optimize bandwidth resource allocation for efficient task migration. We employ a multi-agent deep reinforcement learning algorithm to approximate the Stackelberg Equilibrium (SE) and address the workload imbalance issue. Specifically, we propose Path eXclusion-based Multi-Agent Proximal Policy Optimization (PX-MAPPO), which integrates the PX-based neural network pruning algorithm with the MAPPO algorithm. Numerical results demonstrate that this algorithm reduces the number of network parameters, decreases model complexity, and minimizes performance loss.
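The leader-follower structure of the Stackelberg game can be illustrated with a one-leader, one-follower pricing toy: the RSU (leader) posts a bandwidth price while anticipating the AV's (follower's) best-response demand. The linear demand model and constants below are illustrative, not the paper's formulation or its multi-agent learning solution.

```python
# Toy Stackelberg pricing sketch (linear demand; constants illustrative).
A, B = 10.0, 2.0          # follower's demand parameters

def follower_demand(price):
    """Follower (AV) best response: buy less bandwidth as price rises."""
    return max(0.0, A - B * price)

def leader_best_price(grid=200):
    """Leader (RSU) anticipates the response and grid-searches revenue."""
    prices = [5.0 * i / grid for i in range(grid + 1)]
    return max(prices, key=lambda p: p * follower_demand(p))

# Analytically, revenue p * (A - B*p) peaks at p = A / (2*B) = 2.5.
best = leader_best_price()
```

The paper replaces this closed-form toy with a multi-agent deep RL algorithm (PX-MAPPO) that approximates the equilibrium when the interaction is too complex to solve analytically.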
Much of our everyday, embodied action comes in the form of smooth coping. Smooth coping is skillful action that has become habituated and ingrained, generally placing less stress on cognitive load than considered and deliberative thought and action. When performed with skill and expertise, walking, driving, skiing, musical performances, and short-order cooking are all examples of the phenomenon. Smooth coping is characterized by its rapidity and relative lack of reflection, both being hallmarks of automatization. Deliberative and reflective actions provide the contrast case. In Dreyfus’ classic view, smooth coping is “mindless” absorption into action, being in the flow, and any reflective thought will only interrupt this flow. Building on the pragmatist account of Dewey, others, such as Sutton, Montero, and Gallagher, insist on the intelligent flexibility built into smooth coping, suggesting that it is not equivalent to automatization. We seek to answer two complementary challenges in this article. First, how might we model smooth coping in autonomous agents (natural or artificial) at fine granularity? Second, we use this model of smooth coping to show how we might implement smooth coping in artificial intelligent agents. We develop a conceptual model of smooth coping in LIDA (Learning Intelligent Decision Agent). LIDA is an embodied cognitive architecture implementing the global workspace theory of consciousness, among other psychological theories. LIDA’s implementation of consciousness enables us to account for the phenomenology of smooth coping, something that few cognitive architectures would be able to do. Through the fine granular analysis of LIDA, we argue that smooth coping is a sequence of automatized actions intermittently interspersed with consciously mediated action selection, supplemented by dorsal stream processes. 
In other words, non-conscious, automatized actions (whether learned or innate) often require occasional bursts of conscious cognition to achieve the skillful and flexible adjustments of smooth coping. In addition, never-conscious dorsal stream information and associated sensorimotor processes provide further online adjustments during smooth coping. To achieve smooth coping in LIDA we introduce a new module to the LIDA cognitive architecture the Automatized Action Selection sub-module. Our complex model of smooth coping borrows notions of “embodied intelligence” from enactivism and augments these by allowing representations and more detailed mechanisms of conscious control. We explore several extended examples of smooth coping, starting from basic activities like walking and scaling up to more complex tasks like driving and short-order cooking.
Effects of Verbal Interruption in Conversations with an Intelligent Virtual Agent in Virtual Reality
Intelligent assistants powered by large language models (LLMs) are increasingly becoming a part of our daily lives, particularly on our mobile devices. In immersive virtual reality (VR), intelligent assistants can be embodied as intelligent virtual agents (IVAs). They are commonly humanoid characters capable of multimodal interaction with the user. When prompted with a question, these assistants or IVAs typically formulate elaborate responses, which are then vocalized. While it is possible to terminate the output by tapping on a mobile phone screen, in immersive VR different strategies have to be investigated to interrupt the IVA. In our work, we investigated whether implementing the possibility to interrupt embodied IVAs in VR enhances the user experience and the users' perception of the IVA. We conducted a user study in which 30 participants were tasked with answering several quiz questions with the help of an IVA, with or without the ability to interrupt it. The results show that the possibility of interrupting the IVA increases perceived efficiency, attractiveness and stimulation. Qualitative feedback suggests that real-time interruptions lead to more natural and dynamic conversations. Although the interruption possibility did not change users' perception of the IVA regarding perceived anthropomorphism, likeability, and perceived intelligence, we discuss these findings along with participants' qualitative feedback and provide various design insights for future research.
Natural language serves as the primary mode of communication when an intelligent agent with a physical presence engages with human beings. While a plethora of research focuses on natural language understanding (NLU), encompassing endeavors such as sentiment analysis, intent prediction, question answering, and summarization, the scope of NLU directed at situations necessitating tangible actions by an embodied agent remains limited. The ambiguity and incompleteness inherent in natural language present challenges for intelligent agents striving to decipher human intention. To tackle this predicament head-on, we introduce a novel system known as task and argument grounding for Embodied agents (tagE). At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language. Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions. These extracted tasks are then mapped (or grounded) to the robot's established collection of skills, while the arguments find grounding in objects present within the environment. To facilitate the training and evaluation of our system, we have curated a dataset featuring complex instructions. The results of our experiments underscore the prowess of our approach, as it outperforms robust baseline models.
Interactive embodied agents are intelligent agents that reside in a virtual environment and interact with humans to carry out the tasks we want them to perform. The communication takes place in verbal or non-verbal form. This kind of communication gives a better and richer experience and more possibility for users to be satisfied with the carried-out task. An intelligent agent analyzes the environment it is in and performs tasks using all its knowledge, based on the interaction. The task is carried out in the game environment where the agent resides. An agent must address several challenges: it needs to understand the instruction given to it, and it must ask for clarification if the instruction is unclear. In this research paper, we discuss the problems mentioned above using the labeled dataset provided by the NeurIPS 2022 IGLU Challenge, and we examine various approaches to the problem.
Recent advances in embodied agents with multimodal perception and reasoning capabilities based on large vision-language models (LVLMs) excel at autonomously interacting with either real or cyber worlds, helping people make intelligent decisions in complex environments. However, current works are normally optimized on golden action trajectories or ideal task-oriented solutions toward a definitive goal. This paradigm considers limited user-oriented factors, which could be the reason for their performance reduction in a wide range of personal assistant applications. To address this, we propose Chain-of-User-Thought (COUT), a novel embodied reasoning paradigm that takes a chain of thought from basic action thinking to explicit and implicit personalized preference thought, incorporating personalized factors into autonomous agent learning. The main challenges of achieving COUT include: 1) defining embodied personalized tasks, 2) building an embodied environment that epitomizes personalized preference, and 3) modeling embodied personalized actions. To target COUT, we introduce SmartAgent, an agent framework that perceives cyber environments and reasons about personalized requirements by: 1) interacting with a GUI to access an item pool, 2) generating users' explicit requirements implied by previous actions, and 3) recommending items to fulfill users' implicit requirements. To demonstrate SmartAgent's capabilities, we also create a brand-new dataset, SmartSpot, that offers a full-stage, personalized, action-involved environment. To the best of our knowledge, our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning. Our extensive experiments on SmartSpot illuminate SmartAgent's functionality across a series of embodied and personalized sub-tasks.
The integration of conversational agents in virtual reality (VR) offers promising opportunities for applications such as training, education, and mental health support. However, most current VR agents lack empathic capabilities, which are crucial for fostering engagement, trust, and effective interaction. To address this gap, we present EMBRACE, a modular framework that combines natural language processing, emotion inference, expressive text-to-speech synthesis, and AI-driven facial animation to deliver emotionally responsive VR agents. We implemented EMBRACE within a counselling-inspired virtual environment and conducted a user study, comparing empathic and nonempathic agent conditions. Subjective measures included usability, presence, and agent evaluation, while objective measures incorporated heart rate and heart rate variability. Results show that empathic behaviours enhanced spatial presence and revealed trends toward higher general presence and physiological engagement. Additional analyses indicated that prior chatbot experience shaped usability and agent ratings, while gender influenced certain presence dimensions. Our findings demonstrate both the feasibility and benefits of implementing empathic VR agents.
No abstract available
No abstract available
Embodied conversational agents have changed the ways we can interact with machines. However, these systems often do not meet users’ expectations. A limitation is that the agents are monotonic in behavior and do not adapt to an interlocutor. We present SIVA (a Socially Intelligent Virtual Agent), an expressive, embodied conversational agent that can recognize human behavior during open-ended conversations and automatically align its responses to the conversational and expressive style of the other party. SIVA leverages multimodal inputs to produce rich and perceptually valid responses (lip syncing and facial expressions) during the conversation. We conducted a user study (N=30) in which participants rated SIVA as being more empathetic and believable than the control (agent without style matching). Based on almost 10 hours of interaction, participants who preferred interpersonal involvement evaluated SIVA as significantly more animate than the participants who valued consideration and independence.
No abstract available
The research and development (R&D) of intelligent virtual agents (IVAs) is inherently complex. We aim to manage this complexity by combining the best aspects of academic and commercial approaches into a principled R&D platform that emphasizes interoperability, extendability, re-use, and support for multiple hardware targets. This IVA platform, the Virtual Human Toolkit 2.0, is a re-architecture of our earlier work and combines a modular message passing architecture with that of a microservices architecture. This paper discusses our approach, design decisions, lessons learned, and current status of this ongoing effort. We illustrate the strengths of the architecture, how best to use commodity AI cloud services in one's own work, and how to port legacy stand-alone software to a web service.
No abstract available
Embodied Artificial Intelligence (AI) bridges cyberspace and physical space, driving advancements in autonomous systems like the Vehicular Embodied AI NETwork (VEANET). VEANET integrates advanced AI capabilities into vehicular systems to enhance autonomous operations and decision-making. Embodied agents, such as Autonomous Vehicles (AVs), are autonomous entities that can perceive their environment and take actions to achieve specific goals, actively interacting with the physical world. Embodied Agent Twins (EATs) are digital models of these embodied agents, with various Embodied Agent AI Twins (EAATs) for intelligent applications in cyberspace. In VEANETs, EAATs act as in-vehicle AI assistants to perform diverse tasks supporting autonomous driving using generative AI models. Due to limited onboard computational resources, AVs offload EAATs to nearby RoadSide Units (RSUs). However, the mobility of AVs and limited RSU coverage necessitate dynamic migrations of EAATs, posing challenges in selecting suitable RSUs under information asymmetry. To address this, we construct a multi-dimensional contract theoretical model between AVs and alternative RSUs. Considering that AVs may exhibit irrational behavior, we utilize prospect theory instead of expected utility theory to model the actual utilities of AVs. Finally, we employ a Generative Diffusion Model (GDM)-based algorithm to identify the optimal contract designs, thus enhancing the efficiency of EAAT migrations. Numerical results demonstrate the superior efficiency of the proposed GDM-based scheme in facilitating EAAT migrations compared with traditional deep reinforcement learning methods.
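The abstract's choice of "prospect theory instead of expected utility theory" can be made concrete with the standard Kahneman–Tversky value function. This is a minimal sketch of that textbook form, with the commonly cited parameter estimates as illustrative defaults; the paper's actual multi-dimensional contract model is not reproduced here.

```python
# Sketch of the standard Kahneman-Tversky prospect-theory value function,
# the usual formal basis for "prospect theory instead of expected utility".
# The parameter defaults (alpha, beta, lam) are the commonly cited empirical
# estimates, used here only for illustration; they are not from the paper.
def prospect_value(x: float, alpha: float = 0.88,
                   beta: float = 0.88, lam: float = 2.25) -> float:
    """Subjective value of a gain/loss x relative to a reference point of 0."""
    if x >= 0:
        return x ** alpha            # diminishing sensitivity to gains
    return -lam * ((-x) ** beta)     # losses loom larger (loss aversion)
```

Under this model an AV weighs a loss of a given size more heavily than an equal-sized gain, which is why an irrational AV's perceived utility of a migration contract differs from its expected utility.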
This paper presents a computational model for managing an Embodied Conversational Agent's first impressions of warmth and competence towards the user. These impressions are important to manage because they can affect users' perception of the agent and their willingness to continue interacting with it. The model aims at detecting the user's impression of the agent and producing appropriate verbal and nonverbal agent behaviours in order to maintain a positive impression of warmth and competence. User impressions are recognized using a machine learning approach over facial expressions (action units), which are important indicators of users' affective states and intentions. The agent adapts its verbal and nonverbal behaviour in real time with a reinforcement learning algorithm that takes the user's impressions as reward to select the most appropriate combination of verbal and nonverbal behaviour to perform. A user study testing the model in a contextualized interaction with users is also presented. Our hypothesis is that users' ratings differ when the agent adapts its behaviour according to our reinforcement learning algorithm, compared to when it does not adapt to user reactions (i.e., when it randomly selects its behaviours). The study shows a general tendency for the agent to perform better when using our model than in the random condition. Significant results show that users' ratings of the agent's warmth are influenced by their a priori views of virtual characters, and that users judged the agent as more competent when it adapted its behaviour compared to the random condition.
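The adaptation loop described above — detect the user's impression, treat it as reward, select the next verbal/nonverbal combination — can be sketched as a bandit problem. The abstract only says "a reinforcement learning algorithm", so the epsilon-greedy rule and the behaviour labels below are illustrative assumptions, not the paper's actual design.

```python
import random

# Hedged sketch of the loop: each verbal/nonverbal behaviour combination is a
# bandit arm; the detected user impression is the reward. Epsilon-greedy is an
# illustrative stand-in for the unspecified RL algorithm, and the behaviour
# labels are invented for the example.
BEHAVIOURS = ["warm-verbal+smile", "competent-verbal+nod", "neutral+gaze"]

class BehaviourSelector:
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.value = {b: 0.0 for b in BEHAVIOURS}  # running mean reward
        self.count = {b: 0 for b in BEHAVIOURS}

    def select(self) -> str:
        if random.random() < self.epsilon:           # explore
            return random.choice(BEHAVIOURS)
        return max(self.value, key=self.value.get)   # exploit best-so-far

    def update(self, behaviour: str, impression_reward: float) -> None:
        # In the paper, the reward would come from the facial-action-unit
        # classifier estimating the user's impression of warmth/competence.
        self.count[behaviour] += 1
        n = self.count[behaviour]
        self.value[behaviour] += (impression_reward - self.value[behaviour]) / n
```

A real system would call `select()` before each agent turn and `update()` after observing the user's facial reaction.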
Intelligent virtual agents (VAs) already support us in a variety of everyday tasks such as setting up appointments, monitoring our fitness, and organizing messages. Adding a humanoid body representation to these mostly voice-based VAs has enormous potential to enrich the human–agent communication process but, at the same time, raises expectations regarding the agent's social, spatial, and intelligent behavior. Embodied VAs may be perceived as less human-like if they, for example, do not return eye contact, or do not show a plausible collision behavior with the physical surroundings. In this article, we introduce a new model that extends human-to-human interaction to interaction with intelligent agents and covers different multi-modal and multi-sensory channels that are required to create believable embodied VAs. Theoretical considerations of the different aspects of human–agent interaction are complemented by implementation guidelines to support the practical development of such agents. In this context, we particularly emphasize one aspect that is distinctive of embodied agents, i.e., interaction with the physical world. Since previous studies indicated negative effects of implausible physical behavior of VAs, we were interested in the initial responses of users when interacting with a VA with virtual–physical capabilities for the first time. We conducted a pilot study to collect subjective feedback regarding two forms of virtual–physical interactions. Both were designed and implemented in preparation of the user study, and represent two different approaches to virtual–physical manipulations: (i) displacement of a robotic object, and (ii) writing on a physical sheet of paper with thermochromic ink. The qualitative results of the study indicate positive effects of agents with virtual–physical capabilities in terms of their perceived realism as well as evoked emotional responses of the users. We conclude with an outlook on possible future developments of different aspects of human–agent interaction in general and the physical simulation in particular.
Non-adherence to a treatment plan recommended by the therapist is a key cause of the increasing rate of chronic medical conditions globally. The therapist-patient therapeutic alliance is regarded as a successful intervention and a good predictor of treatment adherence. Similar to the human scenario, embodied conversational agents (ECAs) showed evidence of their ability to build an agent-patient therapeutic alliance, which motivates the effort to advance ECAs as a potential solution to improve treatment adherence and consequently the health outcome. Building therapeutic alliance implies the need for a positive environment where the ECA and the patient can share their knowledge and discuss their goals, preferences and tasks towards building a shared plan, which is commonly done using explanations. However, explainable agents commonly rely on their own knowledge and goals in providing explanations, rather than the beliefs, plans or goals of the user. It is not clear whether such explanations, in individual-specific contexts such as personal health assistance, are perceived by the user as relevant in decision-making towards their own behavior change. Therefore, in this research, we are developing a user-aware explainable ECA by embedding the cognitive agent architecture with a user model, explanation engine and modified planner to implement the concept of SharedPlans. The developed agent will be deployed and evaluated with real patients and the therapeutic alliance will be measured using standard measurements.
Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user’s footage, serving as the foundation for enabling the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. Our user study, which included eight participants ranging from novices to proficient editors, demonstrated LAVE’s effectiveness. The results also shed light on user perceptions of the proposed LLM-assisted editing paradigm and its impact on users’ creativity and sense of co-creation. Based on these findings, we propose design implications to inform the future development of agent-assisted content editing.
Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, oversimplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs faced with the challenges above, we introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue. While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes. Nevertheless, DriVLMe offers a promising new direction for autonomous driving agents that need to navigate not just complex environments but also complex social interactions.
Building natural, conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that affectively interact with biological humans in real time, a cognitive-science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. In this demo, we show a working version, through human interaction with it, of our modular system for a natural, conversational 3D virtual human built from AI and sensing layers. These include sensing the human user via facial emotion recognition, voice stress, the semantic meaning of words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep-learning natural language captioning or dense captioning. All of this is processed by our AI avatar system, enabling an affective and empathetic conversation through an NLP topic-based dialogue capable of using facial expressions, gestures, breath, eye gaze, and voice in two-way, back-and-forth conversations with a sensed human. Our lab has been building these systems in stages over the years.
Embodied intelligence empowers agents with a profound sense of perception, enabling them to respond in a manner closely aligned with real-world situations. Large Language Models (LLMs) delve into language instructions with depth, serving a crucial role in generating plans for intricate tasks. Thus, LLM-based embodied models further enhance the agent's capacity to comprehend and process information. However, this amalgamation also ushers in new challenges in the pursuit of heightened intelligence. Specifically, attackers can manipulate LLMs to produce irrelevant or even malicious outputs by altering their prompts. Confronted with this challenge, we observe a notable absence of multi-modal datasets essential for comprehensively evaluating the robustness of LLM-based embodied models. Consequently, we construct the Embodied Intelligent Robot Attack Dataset (EIRAD), tailored specifically for robustness evaluation. Additionally, two attack strategies are devised, including untargeted attacks and targeted attacks, to effectively simulate a range of diverse attack scenarios. At the same time, during the attack process, to more accurately ascertain whether our method is successful in attacking the LLM-based embodied model, we devise a new attack success evaluation method utilizing the BLIP2 model. Recognizing the time and cost-intensive nature of the GCG algorithm in attacks, we devise a scheme for prompt suffix initialization based on various target tasks, thus expediting the convergence process. Experimental results demonstrate that our method exhibits a superior attack success rate when targeting LLM-based embodied models, indicating a lower level of decision-level robustness in these models.
In this work, we introduce SMART-LLM, an innovative framework designed for embodied multi-robot task planning. SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models (LLMs), harnesses the power of LLMs to convert high-level task instructions provided as input into a multi-robot task plan. It accomplishes this by executing a series of stages, including task decomposition, coalition formation, and task allocation, all guided by programmatic LLM prompts within the few-shot prompting paradigm. We create a benchmark dataset designed for validating the multi-robot task planning problem, encompassing four distinct categories of high-level instructions that vary in task complexity. Our evaluation experiments span both simulation and real-world scenarios, demonstrating that the proposed model can achieve promising results for generating multi-robot task plans. The experimental videos, code, and datasets from the work can be found at https://sites.google.com/view/smart-llm/.
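The three SMART-LLM stages named above — task decomposition, coalition formation, and task allocation — can be sketched as a pipeline with the LLM call stubbed out. All function names and the toy splitting rule below are assumptions for illustration; the actual system drives each stage with programmatic few-shot LLM prompts.

```python
# Illustrative pipeline for the three stages: decomposition -> coalition
# formation -> allocation. The decomposition rule and round-robin assignment
# are toy stand-ins for the few-shot LLM prompts used in the real framework.
def llm(prompt: str) -> str:
    """Stub standing in for a few-shot-prompted LLM call."""
    return prompt  # a real system would return model output here

def decompose(instruction: str) -> list[str]:
    # toy rule: split a compound instruction into sub-tasks on " and "
    return [t.strip() for t in instruction.split(" and ")]

def form_coalitions(subtasks: list[str], robots: list[str]) -> dict:
    # toy rule: assign one robot per sub-task, round-robin
    return {t: [robots[i % len(robots)]] for i, t in enumerate(subtasks)}

def allocate(coalitions: dict) -> list[tuple[str, str]]:
    # flatten coalitions into concrete (robot, task) assignments
    return [(robot, task) for task, team in coalitions.items() for robot in team]

plan = allocate(form_coalitions(decompose("fetch the cup and wipe the table"),
                                ["robot_a", "robot_b"]))
```

The point of the sketch is the staged structure: each stage's output is the next stage's input, which is what lets the framework validate instructions of varying complexity against its benchmark categories.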
The promising potential of AI and network convergence in improving networking performance and enabling new service capabilities has recently attracted significant interest. Existing network AI solutions, while powerful, are mainly built on a closed-loop, passive learning framework, resulting in major limitations in autonomous solution finding and dynamic environmental adaptation. Agentic AI has recently been introduced as a promising solution to address the above limitations and pave the way for truly, generally intelligent, and beneficial AI systems. The key idea is to create a networking ecosystem to support a diverse range of autonomous and embodied AI agents in fulfilling their goals. In this article, we focus on the novel challenges and requirements of agentic AI networking. We propose AgentNet, a novel framework for supporting interaction, collaborative learning, and knowledge transfer among AI agents. We introduce a general architectural framework of AgentNet and then propose a generative foundation model (GFM)-based implementation in which multiple GFMs-as-agents have been created as an interactive knowledge base to bootstrap the development of embodied AI agents according to different task requirements and environmental features. We consider two application scenarios, digital-twin-based industrial automation and a metaverse-based infotainment system, to describe how to apply AgentNet for supporting efficient task-driven collaboration and interaction among AI agents.
No abstract available
Our research investigates the impact of latency on presence and immersion in virtual reality (VR) environments, focusing on interactions with LLM-powered Embodied Conversational Agents (ECAs). We explore the effectiveness of multimodal feedback strategies—including filled pauses, nonverbal turn-taking behaviours, and visual feedback—in mitigating perceived latency. Eighteen participants were subjected to both a baseline condition, without feedback interventions, and a feedback-enhanced condition. Our findings indicate that the feedback condition significantly improved the sense of presence and immersion. We also found that perceived response time and users’ impressions of the agents improved, thereby increasing willingness for future interactions. Additionally, chatbot experience positively correlated with agent likeability, whereas VR experience showed no significant correlation. These results highlight the effectiveness of feedback modalities in enhancing spatial presence and overall immersion, despite latency issues in VR interactions with LLM-powered agents.
No abstract available
Microsoft Entra ID is Microsoft's identity and access management solution used by many public and private sector organisations globally. In March 2023, Microsoft retired two PowerShell modules that had enabled automation of administrative tasks, such as user management. The replacement module is based on the Microsoft Graph API, and its effective usage would require administrators to learn software development skills. In this paper, we report the results of work-in-progress research exploring the applicability of LLM-powered autonomous agents to solving real-life problems. We describe the design and proof-of-concept implementation of MEAN, an agent that performs Entra ID administrative tasks using the Microsoft Graph API based on natural language prompts. The results show that LLM-powered autonomous agents can perform at least simple Entra ID administrative tasks. This indicates that such agents could ease the administrative burden by removing the need to learn software development skills.
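The core step the MEAN agent performs — turning a natural-language prompt into a Microsoft Graph API request — can be sketched as intent routing. The `/v1.0/users` routes below are real Microsoft Graph endpoints, but the keyword-based classifier is an illustrative assumption standing in for the paper's LLM-driven mapping; no network call is made.

```python
# Hedged sketch of the prompt-to-Graph-API step. The endpoint paths are real
# Microsoft Graph v1.0 routes for user management; the keyword routing below
# is an invented stand-in for the LLM that MEAN actually uses.
GRAPH = "https://graph.microsoft.com/v1.0"

INTENT_TO_REQUEST = {
    "list_users":  ("GET",  f"{GRAPH}/users"),   # list users in the tenant
    "create_user": ("POST", f"{GRAPH}/users"),   # create a new user
}

def plan_request(prompt: str) -> tuple[str, str]:
    """Toy keyword-based intent classifier standing in for the LLM."""
    if "create" in prompt.lower():
        return INTENT_TO_REQUEST["create_user"]
    return INTENT_TO_REQUEST["list_users"]

method, url = plan_request("Show me all users in the tenant")
```

A production agent would additionally build the request body, attach an OAuth token, and execute the call — the parts whose complexity motivates delegating this to an agent in the first place.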
Cloud Operations (CloudOps) is a rapidly growing field focused on the automated management and optimization of cloud infrastructure, which is essential for organizations navigating increasingly complex cloud environments. MontyCloud Inc. is one of the major companies in the CloudOps domain that leverages autonomous bots to manage cloud compliance, security, and continuous operations. To make the platform more accessible and effective for customers, we leveraged GenAI. Developing a GenAI-based solution for autonomous CloudOps for the existing MontyCloud system presented us with various challenges, such as i) diverse data sources; ii) orchestration of multiple processes; and iii) handling complex workflows to automate routine tasks. To this end, we developed MOYA, a multi-agent framework that leverages GenAI and balances autonomy with the necessary human control. This framework integrates various internal and external systems and is optimized for factors like task orchestration, security, and error mitigation while producing accurate, reliable, and relevant insights by utilizing Retrieval Augmented Generation (RAG). Evaluations of our multi-agent system, both with practitioners and using automated checks, demonstrate enhanced accuracy, responsiveness, and effectiveness over non-agentic approaches across complex workflows.
This paper explores the integration of symbolic and connectionist paradigms within the realm of Large Language Model (LLM)-powered autonomous agents, highlighting the complementary strengths of each approach. Symbolic AI, known for its structured, rule-based logic, excels at encoding explicit knowledge and facilitating reasoning, while connectionist AI, particularly neural networks, provides robustness in handling large-scale unstructured data through learning from examples. By merging these paradigms, we propose a synergistic framework that enhances autonomous agent capabilities in both reasoning and adaptability. We investigate how LLMs, which exhibit traits of both paradigms, can serve as the backbone for this integration, fostering improved decision-making, natural language understanding, and autonomy. Our findings underscore the potential of this hybrid approach to advance the development of intelligent agents that can navigate complex environments, reason effectively, and learn from experience in dynamic, real-world applications.
We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show a ~98% task completion rate across the distributed AI training lifecycle, 3.2× higher than single agents using advanced LLMs.
Infrastructure failures like the July 2024 CrowdStrike incident demonstrate the critical need for autonomous recovery systems that can detect, diagnose, and remediate outages without depending on human intervention. We present Sura.ai, a multi-agent system using four cooperating Fetch.ai uAgents orchestrated via the Agentverse Mailbox to provide autonomous infrastructure recovery. Our system integrates LLM-powered root cause analysis through Claude Sonnet 3.5, enabling intelligent decision-making beyond simple rule-based automation. We conducted comprehensive testing across four disaster scenarios, including simulated faulty updates, CPU spikes, and cascading failures. Sura.ai achieved a 97.6% action success rate (41/42 incidents resolved) with intelligent alert deduplication across all tested scenarios. Our work demonstrates the feasibility of LLM-based multi-agent orchestration for critical infrastructure resilience and introduces practical patterns for AgentOps implementation in the growing agentic cybersecurity space.
We design and demonstrate the first field trial of LLM-powered AI Agent for ADON. Three operation modes of the Agent are proposed for network lifecycle management to process wavelength add/drop, soft/hard failures, and power optimizations.
The integration of a complex set of electronic design automation (EDA) tools to enhance interoperability is a critical concern for circuit designers. Recent advancements in large language models (LLMs) have showcased their exceptional capabilities in natural language processing and comprehension, offering a novel approach to interfacing with EDA tools. This research article introduces ChatEDA, an autonomous agent for EDA empowered by an LLM, AutoMage, complemented by EDA tools serving as executors. ChatEDA streamlines the design flow from the register-transfer level (RTL) to the graphic data system version II (GDSII) by effectively managing task decomposition, script generation, and task execution. Through comprehensive experimental evaluations, ChatEDA has demonstrated its proficiency in handling diverse requirements, and our fine-tuned AutoMage model has exhibited superior performance compared to GPT-4 and other similar LLMs.
We demonstrate that a group of AI agents can autonomously optimize power commissioning in WDM links. By leveraging modern LLMs' reflection and reasoning capabilities and interacting with a network digital twin, the agents achieve optimal solutions for different criteria such as power and OSNR equalization.
Large language models (LLMs) have revolutionized the field of artificial intelligence, endowing it with sophisticated language understanding and generation capabilities. However, when faced with more complex and interconnected tasks that demand a profound and iterative thought process, LLMs reveal their inherent limitations. Autonomous LLM-powered multi-agent systems represent a strategic response to these challenges. While these architectures hold promising potential in amplifying AI capabilities, striking the right balance between different levels of autonomy and alignment remains the crucial challenge for their effective operation. This paper proposes a comprehensive multi-dimensional taxonomy, engineered to analyze how autonomous LLM-powered multi-agent systems balance the dynamic interplay between autonomy and alignment across various aspects inherent to architectural viewpoints such as goal-driven task management, agent composition, multi-agent collaboration, and context interaction. Our taxonomy aims to empower researchers, engineers, and AI practitioners to systematically analyze the architectural dynamics and balancing strategies employed by these increasingly prevalent AI systems. The exploratory taxonomic classification of selected representative LLM-powered multi-agent systems illustrates its practical utility and reveals potential for future research and development. An extended version of this paper is available on arXiv (Händler, 2023).
No abstract available
Large language models (LLMs) have catalyzed artificial intelligence (AI) agent development for autonomous optical networks (AONs). This invited paper reviews our recent progress in leveraging LLM-powered AI agents for realizing AONs in field-deployed networks.
The unique integration of Large Language Models (LLMs) as the reasoning center of Agentic Artificial Intelligence (AAI) systems exposes new, systematic hazards that current research is ill-equipped to handle. The lack of a consistent, thorough framework that simultaneously addresses the basic problems of LLM unreliability as they spread and grow in autonomous, goal-directed, multi-agent systems reveals a major gap in the literature. These issues reach into important, under-investigated areas such as avoiding goal misalignment, lowering the danger of opaque decision-making, and guaranteeing strong long-term safety in complicated systems. Clear moral and legal responsibility, and existing means for minimal human supervision, are plainly lacking, creating a hazardous gap as these automated systems approach actual deployment. This article provides an innovative, unified framework for responsible development that directly tackles this critical need. Intended specifically for LLM-powered agentic systems operating in challenging, high-stakes contexts, the author presents the Trust, Risk, and Safety Management (TRiSM) governance framework. The core innovation of the framework is the Goal-Constraint Alignment (GCA) mechanism, which dynamically monitors and constrains LLM behavior inside set ethical and safety envelopes, acting as a dynamic barrier against both planned and unexpected goal misalignment. Furthermore, we install a Decentralized Oversight Ledger (DOL) to improve transparency and enable realistic accountability. The DOL offers real-time, tamper-proof, auditable tracking of all multi-agent interactions and decisions, enhancing human oversight and establishing a clear chain of custody for agent behavior, which is vital for determining legal responsibility. Studies verifying the effectiveness of the framework against a fresh collection of high-stakes, multi-agent coordination scenarios show a major improvement in systematic safety and a significant decrease in catastrophic failures compared to present baseline systems. This study offers the essential technical and governance structure needed for the responsible and safe deployment of next-generation autonomous artificial intelligence.
Recently, LLM-powered driver agents have demonstrated considerable potential in the field of autonomous driving, showcasing human-like reasoning and decision-making abilities. However, current research on aligning driver agent behaviors with human driving styles remains limited, partly due to the scarcity of high-quality natural language data from human driving behaviors. To address this research gap, we propose a multi-alignment framework designed to align driver agents with human driving styles through demonstrations and feedback. Notably, we construct a natural language dataset of human driver behaviors through naturalistic driving experiments and post-driving interviews, offering high-quality human demonstrations for LLM alignment. The framework's effectiveness is validated through simulation experiments in the CARLA urban traffic simulator and further corroborated by human evaluations. Our research offers valuable insights into designing driving agents with diverse driving styles. The implementation of the framework and details of the dataset can be found at the link.
In recent years, recommendation systems have evolved from providing a single list of recommendations to offering a comprehensive suite of topic-focused services. To better accomplish this task, conversational recommendation systems (CRS) have progressed from basic retrieval-augmented LLM generation to agentic systems with advanced reasoning and self-correction capabilities. However, agentic systems come with notable response latency, a longstanding challenge for conversational recommendation systems. To balance the trade-off between handling complex queries and minimizing latency, we propose AdaptJobRec, the first conversational job recommendation system that leverages an autonomous agent to integrate personalized recommendation algorithm tools. The system employs a user query complexity identification mechanism to minimize response latency. For straightforward queries, the agent directly selects the appropriate tool for rapid responses. For complex queries, the agent uses the memory processing module to filter chat history for relevant content, then passes the results to the intelligent task decomposition planner, and finally executes the tasks using personalized recommendation tools. Evaluation on Walmart's real-world career recommendation scenarios demonstrates that AdaptJobRec reduces average response latency by up to 53.3% compared to competitive baselines, while significantly improving recommendation accuracy.
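The routing idea described above — simple queries take a fast tool path, complex queries go through memory filtering and a task-decomposition planner — can be sketched as follows. The complexity heuristic and all function names are assumptions for illustration; AdaptJobRec's actual classifier and planner are agent components, not a word-count rule.

```python
# Minimal sketch of query-complexity routing. The length/conjunction heuristic
# is a toy stand-in for the system's complexity identification mechanism.
def is_complex(query: str) -> bool:
    # toy heuristic: long or multi-clause queries count as complex
    return len(query.split()) > 12 or " and " in query

def handle(query: str, history: list[str]) -> str:
    if not is_complex(query):
        return f"tool_call({query!r})"                  # fast path, low latency
    # complex path: filter chat history, then plan decomposed sub-tasks
    relevant = [h for h in history if any(w in h for w in query.split())]
    steps = [s.strip() for s in query.split(" and ")]   # toy decomposition
    return f"plan({steps!r}, context={len(relevant)} msgs)"
```

The latency saving comes from the first branch: most queries never pay the cost of memory filtering and planning.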
Text-based applications and chatbots are increasingly popular for delivering banking services and educational tools, offering convenient and efficient solutions for users. Meanwhile, personalized assistants have transformed user engagement in the digital banking space by utilizing Large Language Models (LLMs) in conjunction with autonomous agents. This study proposes the development of an intelligent personalized assistant for digital banking, utilizing a multi-agent framework based on LangGraph and Chain-of-Thought (CoT) prompting. While CoT provides context-aware replies, the LangGraph design maps characteristics to nodes to improve user interactions. The objectives of this system are to enhance task efficiency and elevate the capabilities of digital banking assistants. We present a customizable digital banking system powered by LLM-based models, designed to deliver an interactive and personalized banking experience. The system supports a range of services, including adding money, transferring funds, paying bills, accessing telco services such as mobile recharge, managing savings interest rates, DPS schemes, and fixed deposits, and answering FAQs related to banking information. Integrating CoT for logical reasoning enhances the effectiveness of the multi-agent system, as each individual agent benefits from the structured reasoning process. In addition, LangGraph is employed for structured data management, enabling the assistant to support and accelerate various digital banking processes efficiently. The code implementation of this work is available for public access at: https://github.com/srv-sh/digital_agent.
Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments. This results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in real-world scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of 4 specialized LLM agents, each dedicated to perception, planning, decision-making, and reflective review, facilitating advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. We built a dataset of 210 real-world tasks to validate AssistantX, which includes instruction content and status information on whether relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. Our experiments demonstrate the effectiveness of the proposed framework, showing that AssistantX can reactively respond to user instructions, actively adjust strategies to adapt to contingencies, and proactively seek assistance from humans to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/.
Large language models (LLMs) are increasingly integrated into autonomous systems, giving rise to a new class of software known as Agentware, where LLM-powered agents perform complex, open-ended tasks in domains such as software engineering, customer service, and data analysis. However, their high autonomy and opaque reasoning processes pose significant challenges for traditional software observability methods. To address this, we introduce the concept of cognitive observability—the ability to recover and inspect the implicit reasoning behind agent decisions. We present Watson, a general-purpose framework for observing the reasoning processes of fast-thinking LLM agents without altering their behavior. Watson retroactively infers reasoning traces using prompt attribution techniques. We evaluate Watson in both manual debugging and automated correction scenarios across the MMLU benchmark and the AutoCodeRover and OpenHands agents on the SWE-bench-lite dataset. In both static and dynamic settings, Watson surfaces actionable reasoning insights and supports targeted interventions, demonstrating its practical utility for improving transparency and reliability in Agentware systems.
This work examines the integration of large language models (LLMs) into multi-agent simulations by replacing the hard-coded programs of agents with LLM-driven prompts. The proposed approach is showcased in the context of two examples of complex systems from the field of swarm intelligence: ant colony foraging and bird flocking. Central to this study is a toolchain that integrates LLMs with the NetLogo simulation platform, leveraging its Python extension to enable communication with GPT-4o via the OpenAI API. This toolchain facilitates prompt-driven behavior generation, allowing agents to respond adaptively to environmental data. For both example applications mentioned above, we employ both structured, rule-based prompts and autonomous, knowledge-driven prompts. Our work demonstrates how this toolchain enables LLMs to study self-organizing processes and induce emergent behaviors within multi-agent environments, paving the way for new approaches to exploring intelligent systems and modeling swarm intelligence inspired by natural phenomena. We provide the code, including simulation files and data at https://github.com/crjimene/swarm_gpt.
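The NetLogo/LLM toolchain described above boils down to: build a prompt from an agent's local observations, query the model, and validate the returned action before handing it back to the simulation. A minimal offline sketch, where `fake_llm` stands in for the GPT-4o call the paper makes via NetLogo's Python extension, and the observation fields and action vocabulary are invented for illustration:

```python
# Illustrative prompt-driven behavior generation for a single ant agent.
def rule_based_prompt(pheromone_ahead: float, has_food: bool) -> str:
    """Structured, rule-based prompt (one of the two prompt styles in the paper)."""
    return (
        "You control an ant. Rules: if carrying food, return to nest; "
        "otherwise move toward the highest pheromone. "
        f"Observation: pheromone_ahead={pheromone_ahead}, has_food={has_food}. "
        "Answer with one of: forward, left, right, drop."
    )

def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for the GPT-4o API call, so the sketch runs offline.
    if "has_food=True" in prompt:
        return "drop"
    return "forward" if "pheromone_ahead=1.0" in prompt else "left"

def step(pheromone_ahead: float, has_food: bool) -> str:
    """One simulation tick: prompt, query, validate, return action to NetLogo."""
    action = fake_llm(rule_based_prompt(pheromone_ahead, has_food)).strip()
    assert action in {"forward", "left", "right", "drop"}  # guard before executing
    return action
```

The validation step matters in practice: free-form LLM output must be constrained to the simulator's action vocabulary before NetLogo executes it.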
The Internet of Things (IoT), particularly its industrial subset Industrial IoT (IIoT), presents a critical attack surface due to its interconnected nature. As emerging threats exploit IoT edge networks, there is a growing demand for anomaly detection systems capable of addressing zero-day attacks while providing explainable predictions. Existing machine learning (ML) and deep learning (DL) methods often lack explainability and sensitivity customization, lack a large language model (LLM) agent for adaptive detection, and struggle with unseen zero-day threats. Motivated by these challenges, we introduce Anomaly-Agent, a novel LLM-powered explainable anomaly detection framework for IoT/IIoT edge environments. Anomaly-Agent leverages a reasoning-followed-by-action pipeline, integrating domain tools, external knowledge retrieval, and memory-augmented decisions to detect and explain anomalies. Unlike static ML/DL models, Anomaly-Agent adapts to evolving zero-day threats and supports sensitivity customization. We evaluated Anomaly-Agent on the Edge-IIoTset (IIoT-specific) and CIC-IoT2023 (general IoT) datasets; it achieved accuracies of 0.96 and 0.89, respectively, with a false alarm rate (FAR) below 0.04. It also attains a recall of 0.65 for zero-day attacks, surpassing traditional ML models and LLM baselines including GPT-4o, Claude 3.5, and GPT-4o-mini. Anomaly-Agent outperformed GPT-4o and Claude 3.5 due to their reliance on generic prompting, which limits performance to a 64%–70% multiclass F1-score on CIC-IoT2023. Their high FAR of 10%–13% stems from misclassifying benign edge traffic as malicious. It also surpasses GPT-4o-mini, where token constraints reduce accuracy to 58% for multiclass tasks. The agent’s performance benefits from integration with Shapley additive explanation (SHAP), enhancing transparency and trust.
While demonstrating strong performance, Anomaly-Agent faces inherent challenges in latency under complex scenarios and in adversarial robustness, which guide future improvements. These results demonstrate Anomaly-Agent’s robustness and interpretability, offering a viable path toward resilient, LLM-driven IoT/IIoT security solutions.
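The reported figures (accuracy, false alarm rate, zero-day recall) follow the standard binary detection definitions, which can be made concrete in a few lines; the toy labels below are illustrative, not the datasets from the paper:

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Standard binary detection metrics (1 = anomaly, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "far": fp / (fp + tn) if fp + tn else 0.0,     # false alarm rate
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # e.g. zero-day recall
    }
```

Zero-day recall is simply this recall computed over the subset of samples belonging to attack classes unseen during system construction.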
This paper presents a large language model (LLM)-based system for autonomous maintenance in manufacturing facilities. While many machine alarms are interpreted with existing manuals, understanding and acting on these instructions across all facilities remains a challenge for operators. The proposed system processes user inputs, including error codes, identifies corresponding procedures from manuals, and decomposes them into structured action sequences. These sequences include the action, user interface target, preconditions, and expected outcomes, and are executed by an agent capable of interacting with Human–Machine Interfaces (HMIs). The proposed system is built on an LLM-powered multi-agent framework comprising four agents: a chatbot, solution_finder, actor, and supervisor. Each agent operates based on role-specific prompts that define its responsibilities and decision rules. Instead of relying on predefined rule sets, the system interprets unfamiliar or previously unseen alarms by reasoning over machine manuals and context, enabling flexible and scalable maintenance. The system was implemented on an HMI system of CNC machine tools and successfully performed automatic responses to selected alarms. Prompt-based control ensures adaptability to other machines, and the use of a local LLM maintains data security. This approach enables general-purpose, self-directed maintenance with minimal operator intervention.
Although agents powered by Large Language Models (LLMs) can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, revealing important work to be done in terms of agent security for the community. We also introduce a new metric to evaluate the agents' capability to balance utility and security. Our code can be found at https://github.com/agiresearch/ASB.
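Attack success rate is the straightforward fraction of attack attempts that achieve their goal. The abstract also mentions a new metric balancing utility and security but does not define it; one plausible form, purely an assumption here and not ASB's actual definition, is a harmonic mean of benign task success and attack resistance:

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attack attempts that succeeded."""
    return sum(outcomes) / len(outcomes)

def utility_security_score(task_success: float, asr: float) -> float:
    """Hypothetical balance metric: harmonic mean of utility and (1 - ASR).
    ASB's actual metric may differ; this only illustrates the trade-off."""
    resistance = 1.0 - asr
    if task_success + resistance == 0:
        return 0.0
    return 2 * task_success * resistance / (task_success + resistance)
```

A harmonic mean punishes one-sided agents: a system that completes every task but resists no attacks (or vice versa) scores near zero.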
Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.
As autonomous agents powered by large language models (LLMs) continue to demonstrate potential across various assistive tasks, ensuring their safe and reliable behavior is crucial for preventing unintended consequences. In this work, we introduce CIP, a novel technique that leverages causal influence diagrams (CIDs) to identify and mitigate risks arising from agent decision-making. CIDs provide a structured representation of cause-and-effect relationships, enabling agents to anticipate harmful outcomes and make safer decisions. Our approach consists of three key steps: (1) initializing a CID based on task specifications to outline the decision-making process, (2) guiding agent interactions with the environment using the CID, and (3) iteratively refining the CID based on observed behaviors and outcomes. Experimental results demonstrate that our method effectively enhances safety in both code execution and mobile device control tasks.
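Step (2) of the CIP approach, consulting the causal influence diagram before acting, can be illustrated with a toy reachability check. The node names, edges, and the "any causal path to a harmful outcome" criterion are illustrative assumptions, not the paper's formulation:

```python
from collections import defaultdict

class CID:
    """Toy causal influence diagram: directed edges plus a set of harmful outcomes."""
    def __init__(self, edges: list[tuple[str, str]], harmful: set[str]):
        self.adj = defaultdict(list)
        for u, v in edges:
            self.adj[u].append(v)
        self.harmful = harmful

    def risky(self, action: str) -> bool:
        """True if the action can causally reach a harmful outcome (DFS)."""
        stack, seen = [action], set()
        while stack:
            node = stack.pop()
            if node in self.harmful:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.adj[node])
        return False

# Illustrative diagram for a code-execution agent.
cid = CID(
    edges=[("rm -rf", "data_loss"), ("ls", "listing"), ("data_loss", "user_harm")],
    harmful={"user_harm"},
)
```

An agent guided this way would veto `rm -rf` and allow `ls`; step (3) of the method would then refine the edge set as new behaviors and outcomes are observed.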
This paper proposes a novel Neural Architecture Search (NAS) framework powered by Large Language Model (LLM)-based autonomous agents. The method integrates self-consistent architecture generation, role-based agent collaboration, and a memory stream for iterative refinement. By simulating structured dialogues between a Strategic Director and a Technical Analyst, the system enables autonomous design exploration and decision-making without human input. A case study on short-term electricity load forecasting demonstrates the framework’s effectiveness, where the final model—developed through iterative reasoning—achieves a 9.2% reduction in MAE, 7.4% reduction in RMSE, and 32% fewer parameters compared to baseline architectures. The results highlight the framework’s ability to generate efficient, high-performing models while maintaining adaptability and interpretability. This approach offers a scalable solution for automated deep learning design and can be extended to other domains requiring efficient and context-aware model optimisation.
Powered by the emerging large language models (LLMs), autonomous geographic information system (GIS) agents can perform spatial analyses and cartographic tasks. However, a research gap exists in enabling these agents to autonomously discover and retrieve the necessary data for spatial analysis. This study proposes an autonomous GIS agent framework capable of retrieving required geospatial data by generating, executing, and debugging programs. The framework, with an LLM-driven decision core, selects data sources from a predefined list and fetches data using source-specific handbooks that document metadata and data retrieval details. Designed in a plug-and-play style, the framework allows human users or automated data crawlers to add new sources by creating additional handbooks. A prototype agent based on the framework is developed and released as a QGIS plugin and a Python program. Experiment results demonstrate its capability of retrieving data from various sources, including OpenStreetMap, administrative boundaries and demographic data from the U.S. Census Bureau, satellite basemaps from ESRI World Imagery, global digital elevation model (DEM) from OpenTopography.org, weather data from a commercial provider, and the COVID-19 case data from the NYTimes GitHub. This study is among the first attempts to develop an autonomous GIS agent for geospatial data retrieval.
A GUI agent is an intelligent autonomous system that operates graphical interfaces to complete tasks and deliver services to users. Recent advances in large vision models (LVMs) have boosted GUI agents' flexibility, robustness, and task success rates, but at the cost of high response latency that limits enterprise use. We propose a hybrid LVM GUI agent for the Walmart MyAssistant application: the LVM is invoked only at critical decision points, while routine visual actions are offloaded to lightweight computer vision. Guided by semantics from LLM-predicted action steps, the agent retrieves candidate icons from a curated icon library via semantic search and applies template matching to score tensor similarity against screenshot regions, efficiently estimating click targets. This design cuts end-to-end latency while preserving high success rates, enabling scalable, responsive enterprise applications and moving LLM-powered assistants toward broader AGI capabilities.
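The template-matching step can be illustrated with a small normalized cross-correlation search in NumPy. This is a simplified stand-in: the real agent presumably works with a curated icon library and full screenshots, and the array sizes and scoring rule here are illustrative only.

```python
import numpy as np

def match_score(screenshot: np.ndarray, template: np.ndarray):
    """Slide `template` over a grayscale `screenshot` and return the best
    normalized cross-correlation score and its (x, y) top-left position."""
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    best, best_xy = -1.0, (0, 0)
    H, W = screenshot.shape
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            patch = screenshot[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * tn
            score = float((p * t).sum() / denom) if denom else 0.0
            if score > best:
                best, best_xy = score, (x, y)
    return best, best_xy

# Tiny demo: plant a 2x2 icon in an empty 10x10 screen and recover its position.
screen = np.zeros((10, 10))
icon = np.array([[1.0, 0.0], [0.0, 1.0]])
screen[3:5, 4:6] = icon
score, xy = match_score(screen, icon)
```

Production systems would use an optimized routine (e.g. OpenCV's `matchTemplate`) rather than this nested loop, but the scoring idea is the same.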
Lack of transparency and information reliability in supply chain management have been persistent challenges. Based on the LangChain and LangGraph frameworks, this research proposes a Large Language Model (LLM)-based Multi-Agent System (MAS) specifically designed to enhance information reliability and transparency in construction supply chain coordination. A prototype system composed of multiple autonomous agents, capable of working collaboratively, sharing information, and supporting decision-making, was designed and developed. The system comprises Supplier Agents and General Contractor Agents capable of engaging in natural language interactions. These agents coordinate the supply chain by facilitating communication about material deliveries and project progress. The prototype demonstrated the potential of LLM-based MAS in improving supply chain transparency and reliability. This research not only validates the feasibility of applying large language models in automated supply chain coordination but also offers insights for the design and implementation of future systems.
The cyber-security ecosystem is evolving rapidly, with Artificial Intelligence (AI) giving rise both to stronger defenses and to more sophisticated forms of Advanced Persistent Threats (APTs). AI-powered APTs are a new breed of intelligent, adaptive, and self-learning cyber attackers that can autonomously exploit vulnerabilities, evade detection, and persist within networks. Organizations, in turn, are shifting from static, rule-based controls to fully autonomous defensive agents able to conduct continuous monitoring, predict threats, interrupt attacks in real time, and respond actively. This paper examines the new paradigm of Agent-vs-Agent Cyber Warfare, where autonomous AI defenses confront AI-driven APTs on dynamic digital platforms. We describe the architecture of AI-based offensive APT agents, analyze defensive multi-agent systems (MAS), and propose a proactive cyber-battlefield model based on reinforcement learning (RL), large language models (LLMs), and self-evolving threat intelligence. Lastly, we outline limitations, ethical considerations, and directions for securing digital ecosystems in an era of autonomous cyber warfare.
While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is capable of leveraging external knowledge, utilizing tools with careful planning to provide zero-shot personalized recommendations. We propose a Self-Inspiring algorithm to improve the planning ability. At each intermediate step, the LLM self-inspires to consider all previously explored states to plan for the next step. This mechanism greatly improves the model's ability to comprehend and utilize historical information in planning for recommendation. We evaluate RecMind's performance in various recommendation scenarios. Our experiment shows that RecMind outperforms existing zero/few-shot LLM-based recommendation baseline methods in various tasks and achieves comparable performance to a fully trained recommendation model P5.
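The Self-Inspiring idea, that each planning step conditions on all previously explored states rather than only the current chain, can be sketched with a stubbed planner. The tool names and the deterministic `propose` stub are assumptions for illustration; in RecMind this role is played by the LLM itself.

```python
# Toy sketch of Self-Inspiring planning: every previously explored state is
# fed back into the next planning step, instead of only the current branch.
def propose(explored: list[str], goal: str) -> str:
    """Deterministic stand-in for the LLM planner: pick the next untried tool."""
    tools = ["lookup_user_history", "retrieve_item_facts", "rank_candidates"]
    for t in tools:
        if t not in explored:
            return t
    return "finish"

def self_inspiring_plan(goal: str, max_steps: int = 10) -> list[str]:
    explored: list[str] = []
    for _ in range(max_steps):
        step = propose(explored, goal)   # planner sees *all* prior states
        if step == "finish":
            break
        explored.append(step)
    return explored
```

The key contrast with plain chain-of-thought planning is the `explored` argument: abandoned or completed branches remain visible to later steps instead of being discarded.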
Auditing recommendation systems has attracted growing attention due to increasing concerns over filter bubbles, unfairness, and data misuse. A common approach is sock-puppet auditing, where autonomous agents interact with platforms to reveal risks. However, existing approaches rely on hard-coded agents, lacking adaptability to dynamic GUI layouts and generating behaviors far from those of real users, limiting the comprehensiveness and representativeness of assessment. To address these issues, we introduce AuditAgent, an LLM-powered GUI-agent framework for risk auditing. AuditAgent simulates realistic user preferences and performs adaptive, human-like interactions on recommendation platforms. This design enables more thorough and faithful auditing, providing comprehensive assessments across multiple risk dimensions, including filter bubbles, unfairness, and data misuse.
The successful integration of large language models (LLMs) into laboratory workflows has demonstrated robust capabilities in natural language processing, autonomous task execution, and collaborative problem-solving. This offers an exciting opportunity to realize the dream of autonomous chemical research on demand. Here, we report a robotic AI chemist powered by a hierarchical multiagent system, ChemAgents, based on an on-board Llama-3.1-70B LLM, capable of executing complex, multistep experiments with minimal human intervention. It operates through a Task Manager agent that interacts with human researchers and coordinates four role-specific agents (Literature Reader, Experiment Designer, Computation Performer, and Robot Operator), each leveraging one of four foundational resources: a comprehensive Literature Database, an extensive Protocol Library, a versatile Model Library, and a state-of-the-art Automated Lab. We demonstrate its versatility and efficacy through six experimental tasks of varying complexity, ranging from straightforward synthesis and characterization to more complex exploration and screening of experimental parameters, culminating in the discovery and optimization of functional materials. Additionally, we introduce a seventh task, where ChemAgents is deployed in a new robotic chemistry lab environment to autonomously perform photocatalytic organic reactions, highlighting ChemAgents's scalability and adaptability. Our multiagent-driven robotic AI chemist showcases the potential of on-demand autonomous chemical research to accelerate discovery and democratize access to advanced experimental capabilities across academic disciplines and industries.
Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.
Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rates without systematically analyzing the interactions, communication mechanisms, and failure causes within these systems. To bridge this gap, we present a benchmark of 34 representative programmable tasks designed to rigorously assess autonomous agents. Using this benchmark, we evaluate three popular open-source agent frameworks combined with two LLM backbones, observing a task completion rate of approximately 50%. Through in-depth failure analysis, we develop a three-tier taxonomy of failure causes aligned with task phases, highlighting planning errors, task execution issues, and incorrect response generation. Based on these insights, we propose actionable improvements to enhance agent planning and self-diagnosis capabilities. Our failure taxonomy, together with mitigation advice, provides an empirical foundation for developing more robust and effective autonomous agent systems in the future.
Recent advances in large language models (LLMs) and agent-based orchestration are transforming automated knowledge graph (KG) creation as well as robust question answering in complex domains. We present a modular, multi-agent system that extracts, integrates, and reasons over diverse NoSQL movie data, powered by state-of-the-art LLMs such as GPT-4.1. Our architecture converts unstructured plots, cast/crew metadata, and numeric attributes into high-fidelity KGs - enabling both natural language and programmatic queries. To maximize reliability and flexibility, the system unifies multiple retrieval strategies - keyword search, vector similarity, knowledge graph querying, and summarization - each deployed as an autonomous pipeline. Parallel orchestration via LangGraph supports adaptive engine selection, concurrent execution, and robust answer verification with LLM ensemble “jury” scoring. Critically, the framework features comprehensive observability, allowing detailed monitoring and analysis of agent decisions, pipeline performance, and query outcomes. By treating each retrieval method and LLM as a specialized agent, our approach delivers scalable, explainable, and highly accurate results (up to 97%), significantly surpassing monolithic solutions. This agentic, observable architecture paves the way for next-generation autonomous analytics, integration, and decision support across data-rich domains.
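The LLM ensemble "jury" verification step can be sketched with stubbed judges: each judge scores a candidate answer, and the ensemble accepts it when the mean score clears a threshold. The judge functions, the 0-1 scale, and the mean-plus-threshold rule are assumptions for illustration, not the paper's exact scoring scheme.

```python
def jury_verdict(answer: str, judges: list, threshold: float = 0.5):
    """Score `answer` with every judge and accept if the mean clears the bar."""
    scores = [judge(answer) for judge in judges]
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean

# Stub judges; in the real system each would be an LLM call with its own rubric.
judges = [
    lambda a: 1.0 if "1999" in a else 0.0,    # fact-checking judge (stub)
    lambda a: 1.0 if len(a) < 200 else 0.5,   # conciseness judge (stub)
    lambda a: 0.8,                            # fluency judge (stub)
]

ok, mean = jury_verdict("The Matrix (1999)", judges)
```

Treating each retrieval pipeline's answer this way lets the orchestrator compare candidates from keyword search, vector similarity, and KG querying on a common scale before picking one.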
The advent of Large Language Models (LLMs) has created new opportunities for the automation of scientific research spanning both experimental processes and computational simulations. This study explores the feasibility of constructing an autonomous simulation agent (ASA) powered by LLMs through prompt engineering and automated program design to automate the entire simulation research process according to a human-provided research plan. This process includes experimental design, remote upload and simulation execution, data analysis, and report compilation. Using a well-studied simulation problem of polymer chain conformations as a test case, we assessed the long-task completion and reliability of ASAs powered by different LLMs, including GPT-4o and Claude-3.5. Our findings revealed that ASA-GPT-4o achieved near-flawless execution on designated research missions, underscoring the potential of methods like ASA to achieve automation in simulation research processes to enhance research efficiency. The outlined automation can be iteratively performed for up to 20 cycles without human intervention, illustrating the potential of ASA for long-task workflow automation. Additionally, we discussed the intrinsic traits of ASA in managing extensive tasks, focusing on self-validation mechanisms, and the balance between local attention and global oversight.
Robots are no longer just tools; they are becoming social agents that can shape how we engage with culture. This study examines what influences visitors’ perceptions of an autonomous museum guide robot, focusing not only on technical capabilities but also on human-centered factors. In a maritime exhibition, 34 participants interacted with a fully autonomous, LLM-powered robot acting as a museum guide. Using self-report questionnaires, we explored how individual differences (age and prior experience with robots) interacted with experimental conditions to shape participants’ impressions of the robot. Our findings suggest that these personal factors significantly affect how visitors evaluate the robot, implying that effective design must reflect the diversity of users’ experiences and expectations. By acknowledging the complexity of human-robot interaction, we move closer to creating robotic guides that are not only functional but also socially attuned.
Recently, there has been an emergence of employing LLM-powered agents as believable human proxies, based on their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, implicitly exhibit user preferences and could enhance user modeling, yet have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents' decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.
No abstract available
Large Language Models (LLMs) are increasingly being explored for problem-solving tasks. However, their strategic planning capability is often viewed with skepticism. Recent studies have incorporated the Monte Carlo Tree Search (MCTS) algorithm to augment the planning capacity of LLMs. Despite its potential, MCTS relies on extensive sampling simulations to approximate the true reward distribution, which leads to two primary issues. Firstly, MCTS is effective for tasks like the Game of Go, where simulation results can yield objective rewards (e.g., 1 for a win and 0 for a loss). However, for tasks such as question answering, the result of a simulation is the answer to the question, which cannot yield an objective reward without the ground truth. Secondly, obtaining statistically significant reward estimations typically requires a sample size exceeding 30 simulations, resulting in excessive token usage and time consumption. To address these challenges, we present the Multi-Agent System with Tactical Execution and Reasoning using LLM Specialized MCTS (MASTER), a novel framework that coordinates agent recruitment and communication through LLM-specialized MCTS. This system autonomously adjusts the number of agents based on task complexity and ensures focused communication among them. Comprehensive experiments across various tasks demonstrate the effectiveness of our proposed framework. It achieves 76% accuracy on HotpotQA and 80% on WebShop, setting new state-of-the-art performance on these datasets.
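The core twist, replacing rollout-based reward estimation with a single LLM self-evaluation per node, can be sketched around a standard UCT skeleton. The `llm_confidence` stub, the node states, and the exploration constant are illustrative assumptions, not MASTER's implementation:

```python
import math

class Node:
    """One reasoning state in the search tree."""
    def __init__(self, state: str, parent=None):
        self.state, self.parent = state, parent
        self.children: list["Node"] = []
        self.visits, self.value = 0, 0.0

def uct(node: Node, c: float = 1.4) -> float:
    """Standard UCT: exploitation term plus exploration bonus."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def llm_confidence(state: str) -> float:
    # Stub for "ask the LLM how confident it is in this reasoning state".
    # One such query replaces the 30+ rollouts classic MCTS would need.
    return 0.9 if "correct" in state else 0.3

def backpropagate(node: Node, reward: float) -> None:
    """Propagate the confidence-based reward up to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

With this reward source, selection, expansion, and backpropagation proceed exactly as in vanilla MCTS; only the simulation phase is collapsed into one LLM call.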
No abstract available
As the number of automated guided vehicles (AGVs) increases, conflicts between AGVs become more frequent. In this paper, a conflict-free path planning model for multiple AGVs is established, aiming to minimize the blocking rate of AGVs between the quay crane and the yard crane while considering the travel speed, operation time, and conflict distance of AGVs. An AGV system architecture based on a Multi-Agent System (MAS) is designed: an improved interaction protocol based on the blackboard model serves as the AGV communication method; an improved acceleration control method, combined with a time-cost-based AGV priority determination method, serves as the AGV negotiation strategy; and an improved Dijkstra algorithm computes the conflict-free path of each AGV. By comparing the MAS-based acceleration control method with the MAS-based speed control method and the task priority control method, the effectiveness of this method for solving multi-AGV conflict-free path planning in automated terminals is verified.
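The combination of priority determination and Dijkstra routing can be illustrated in a simplified form: a lower-priority AGV plans on a graph from which the nodes reserved by higher-priority AGVs are excluded. The graph, node names, and weights below are invented for illustration and the time dimension of real conflict avoidance is omitted.

```python
import heapq

def dijkstra(graph: dict, start: str, goal: str, reserved: set = frozenset()):
    """Shortest path avoiding `reserved` nodes.
    graph: node -> {neighbor: travel_time}; returns (cost, path) or None."""
    pq, best = [(0, start, [start])], {start: 0}
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        for nxt, w in graph[node].items():
            if nxt in reserved:
                continue  # node held by a higher-priority AGV
            if cost + w < best.get(nxt, float("inf")):
                best[nxt] = cost + w
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return None

# Toy terminal layout: quay crane to yard crane via two intermediate nodes.
graph = {"quay": {"n1": 1, "n2": 2}, "n1": {"yard": 1}, "n2": {"yard": 1}, "yard": {}}
```

When a higher-priority AGV reserves `n1`, the planner automatically falls back to the slower route through `n2`, which is the essence of priority-based conflict avoidance.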
No abstract available
No abstract available
No abstract available
No abstract available
Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management. We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and LangChain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the single-agent baseline. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.
In this paper, we introduce a fault-tolerant multi-agent reinforcement learning framework called SERT-DQN to optimize the operations of UAVs under UGV central control in coverage path planning missions. Our approach leverages dual learning systems that combine individual agent autonomy with centralized strategic planning, enhancing the efficiency of cooperative path planning missions. The framework is designed for high performance in environments with fault uncertainty and operational challenges such as connectivity interruptions and compromised sensor reliability. With an innovative inter-agent communication system, it handles both static and dynamic environments. We also introduce similarity-based shared experience replay to attain faster convergence and sample efficiency in the multi-agent system. The architecture is designed to respond adaptively to such irregularities, showing enhanced resilience in scenarios where data integrity is impaired by faults or the UAV faces disruptions. Simulation results indicate that our fault-tolerance algorithms are highly resilient and improve mission outcomes, especially under dynamic and highly uncertain operating conditions. This approach is critical for recent sensor-based research in autonomous systems.
No abstract available
Manufacturing systems are undergoing systematic change, facing the trade-off between customer needs and economic and ecological pressure. Assembly systems in particular must become more flexible due to many product generations and unpredictable material and demand fluctuations. As a solution, line-less mobile assembly systems implement flexible job routes through movable multi-purpose resources and flexible transportation systems. Moreover, a completely reactive, rearrangeable layout with mobile resources enables reconfigurations without interrupting production. A scheduling approach that can handle the complexity of dynamic events is necessary to plan job routes and control transportation in such an assembly system. Conventional approaches to this control task require exponentially rising computational capacity with increasing problem size. The contribution of this work is therefore an algorithm to dynamically solve the integrated problem of layout optimization and scheduling in line-less mobile assembly systems. The proposed multi-agent deep reinforcement learning algorithm uses proximal policy optimization and consists of an encoder and a decoder, allowing for variously sized system state descriptions. A simulation study shows that the proposed algorithm outperforms a random agent on the makespan optimization objective in 78% of the scenarios. This allows for adaptive optimization of line-less mobile assembly systems facing global challenges.
No abstract available
A distributed multi-project scheduling problem is considered, in which several projects share scarce resources and a planning department (planner) is responsible for allocating the resources among the projects. Information asymmetry and heterogeneous resources arise from the geographical distribution of the planner and the projects. The projects compete for the limited global resources to maximize their local benefit, so they may lie and overstate resource importance to the planner. In this paper, a multi-agent system is developed to address this problem, since the private information and highly autonomous nature of project agents make a central coordination approach unsuitable. Different from previous work, a project agent may employ a lying strategy to increase its chance of winning the desired resource, while the planner can adopt an integrity policy to penalize this behaviour. Another main contribution is a heuristic procedure designed and combined with an argumentation-based approach for this multi-agent system, which improves computational efficiency. Finally, the proposed combined multi-agent system is compared with a central coordination algorithm to demonstrate its efficacy. Numerical experiments show that the combined multi-agent system is more effective in exploration. It outperforms the central coordination algorithm on larger-scale problems, especially those with a tighter global resource constraint. Experimental results also reveal that a proper integrity policy can considerably reduce the negative effect of project agents' dishonesty on the global objective by eliminating the potential to benefit from lying.
No abstract available
UAV-based air-ground integrated computing networks (AGIN) have gained significant traction in remote areas for the Power Internet of Things (PIoT). This paper considers an AGIN-PIoT, where computing tasks generated by ground PIoT devices are offloaded to aerial UAVs that perform edge computing. Jointly optimizing task offloading and UAV trajectory poses challenges such as many decision variables, information uncertainty, and long-term queue delay constraints. Due to the limited battery capacity of PIoT devices and UAVs, our objective is to minimize system energy consumption under long-term queue delay constraints by jointly optimizing task offloading, trajectory planning, and computing resource assignment. In light of Lyapunov optimization, we decompose the original challenging optimization problem into two sub-problems: (1) task offloading and UAV trajectory planning and (2) aerial edge resource allocation. Accordingly, we develop a multi-agent deep reinforcement learning-based algorithm called AGIN-MADDPG for the former to achieve the maximum accumulative reward and propose a greedy solution for the latter. Extensive experiments and numerical results demonstrate that our approach can avoid the problem of gradient vanishing and outperforms other benchmark methods in terms of power consumption, task backlog, queue delay, and system throughput.
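The second sub-problem's greedy flavor can be sketched as follows: repeatedly grant one unit of UAV edge compute to the device queue with the largest remaining backlog until capacity runs out. This is a generic greedy stand-in, not the paper's actual allocation rule; the device names, backlogs, and capacity are made up.

```python
def greedy_allocate(backlogs, capacity, unit=1):
    """Grant one unit of UAV edge compute at a time to the device queue
    with the largest remaining backlog, until capacity is exhausted."""
    alloc = {d: 0 for d in backlogs}
    remaining = dict(backlogs)
    for _ in range(int(capacity // unit)):
        d = max(remaining, key=remaining.get)  # most backlogged queue
        if remaining[d] <= 0:
            break  # all queues drained
        alloc[d] += unit
        remaining[d] -= unit
    return alloc

alloc = greedy_allocate({"dev1": 3, "dev2": 1}, capacity=4)
# alloc == {"dev1": 3, "dev2": 1}
```

Serving the longest queue first is the simplest heuristic consistent with the paper's long-term queue-delay constraint; the actual framework couples this with Lyapunov drift terms.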
Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that possesses all the information on the agents' locations and goals is absent and the agents have to sequentially decide the actions on their own without having access to the full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups, comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to maps that were unseen at the training stage. Moreover, our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver. The code is available at https://github.com/AIRI-Institute/learn-to-follow.
Task scheduling (TS) and multi-agent path finding (MAPF) are two cruxes of pickup-and-delivery in automated warehouses. In this paper, the two are optimized simultaneously. First, the system model, task model, and path model are established. Then, a task scheduling algorithm based on enhanced HEFT (EHEFT), a heuristic MAPF algorithm, and a combined TS-MAPF algorithm are proposed to solve this combinatorial optimization problem. In EHEFT, a novel rank priority rule determines task sequencing and task allocation. In the MAPF algorithm, a CBS algorithm with priority rules is designed for path search. Subsequently, the TS-MAPF algorithm, which combines EHEFT and the MAPF algorithm, is proposed. Finally, the proposed algorithms are tested separately against relevant typical algorithms at different scales. The experimental results indicate that the proposed algorithms exhibit good performance.
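HEFT, which the paper's task-scheduling algorithm enhances, orders tasks by decreasing upward rank: a task's compute cost plus the maximum of (communication cost + rank) over its successors. A minimal sketch with hypothetical task costs (the paper's novel rank priority rule is not reproduced here):

```python
def upward_rank(task, comp_cost, succ, comm_cost, memo=None):
    """HEFT upward rank: comp_cost[task] plus the max over successors of
    (edge communication cost + successor's upward rank)."""
    if memo is None:
        memo = {}
    if task in memo:
        return memo[task]
    tail = max(
        (comm_cost.get((task, s), 0) + upward_rank(s, comp_cost, succ, comm_cost, memo)
         for s in succ.get(task, [])),
        default=0,  # exit tasks have no successors
    )
    memo[task] = comp_cost[task] + tail
    return memo[task]

# Toy DAG: t1 fans out to t2 and t3, which both feed t4.
comp = {"t1": 3, "t2": 4, "t3": 2, "t4": 1}
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}
comm = {("t1", "t2"): 1, ("t1", "t3"): 2, ("t2", "t4"): 1, ("t3", "t4"): 3}
order = sorted(comp, key=lambda t: upward_rank(t, comp, succ, comm), reverse=True)
# Tasks are scheduled in decreasing rank order, so t1 comes first, t4 last.
```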
Efficient coordination and planning is essential for large-scale multi-agent systems that collaborate in a shared dynamic environment. Heuristic search methods or learning-based approaches often lack the guarantee on correctness and performance. Moreover, when the collaborative tasks contain both spatial and temporal requirements, e.g., as linear temporal logic (LTL) formulas, formal methods provide a verifiable framework for task planning. However, since the planning complexity grows exponentially with the number of agents and the length of the task formula, existing studies are mostly limited to small artificial cases. To address this issue, a new planning paradigm is proposed in this work for system-wide temporal task formulas that are released online and continually. It avoids two common bottlenecks in the traditional methods, i.e., (a) the direct translation of the complete task formula to the associated Büchi automaton and (b) the synchronized product between the Büchi automaton and the transition models of all agents. Instead, an adaptive planning algorithm is proposed, which computes the product of relaxed partially ordered sets (R-posets) on-the-fly and assigns these subtasks to the agents subject to the ordering constraints. It is shown that the first valid plan can be derived with a polynomial time and memory complexity with respect to the system size and the formula length. Our method can take into account task formulas with a length of more than 400 and a fleet with more than 400 agents, while most existing methods fail at the formula length of 25 within a reasonable duration. The proposed method is validated on large fleets of service robots in both simulation and hardware experiments.
This study investigates the implementation of LLM agents in smart city management, leveraging both the inherent language processing abilities of LLMs and the distributed problem-solving capabilities of multi-agent systems to improve urban decision-making processes. A multi-agent system architecture combines LLMs with existing urban information systems to process complex queries and generate contextually relevant responses for urban planning and management. The research focuses on testing three main hypotheses: (1) LLM agents' capability for effective routing and processing of diverse urban queries, (2) the effectiveness of Retrieval-Augmented Generation (RAG) technology in improving response accuracy when working with local knowledge and regulations, and (3) the impact of integrating LLM agents with existing urban information systems. Our experimental results, based on a comprehensive validation dataset of 150 question–answer pairs, demonstrate significant improvements in decision support capabilities. The multi-agent system achieved pipeline selection accuracy of 94–99% across different models, while the integration of RAG technology improved response accuracy by 17% for strategic development queries and 55% for service accessibility questions. The combined use of document databases and service APIs resulted in the highest performance metrics (G-Eval scores of 0.68–0.74) compared to standalone LLM responses (0.30–0.38). Using St. Petersburg's Digital Urban Platform as a testbed, we demonstrate the practical applicability of this approach to create integrated city management systems that support complex urban decision-making processes. This research contributes to the growing field of AI-enhanced urban management by providing empirical evidence of LLM agents' effectiveness in processing heterogeneous urban data and supporting strategic planning decisions. Our findings suggest that LLM-based multi-agent systems can significantly enhance the efficiency and accuracy of urban decision-making while maintaining high relevance in responses.
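The RAG step being tested in hypothesis (2) can be sketched with a toy bag-of-words retriever: rank local documents against the query and hand the top match to the LLM as context. A real pipeline would use embedding search over the knowledge base; the documents and query below are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Minimal RAG retrieval: return the top-k documents for the query."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

# Hypothetical local regulations the agents would ground their answers in.
regulations = [
    "zoning rules for residential districts",
    "bus service accessibility standards",
]
context = retrieve("which districts allow residential zoning", regulations)
# context == ["zoning rules for residential districts"]
```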
Fast moving unmanned aerial vehicles (UAVs) are well suited for aerial surveillance, but are limited by their battery capacity. To increase their endurance, UAVs can be refueled on slow moving unmanned ground vehicles (UGVs). This cooperative routing of a UAV-UGV multi-agent system to survey vast regions within their speed and fuel constraints is a computationally challenging problem, but it can be simplified with heuristics. In this study, we utilize heuristic approaches to obtain feasible and near-optimal solutions to the problem, leveraging the fuel limitations of the UAV with the minimum set cover algorithm to identify the UGV refueling points. These refueling stops enable the allocation of mission points to the UAV and UGV. A standard traveling salesman formulation and a vehicle routing formulation with time windows, dropped visits, and capacity constraints are used to solve for the UGV and UAV routes, respectively. Experimental validation on a small-scale testbed (http://tiny.cc/vancvz) underscores the effectiveness of our multi-agent approach.
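The refueling-point selection step can be illustrated with the standard greedy approximation to minimum set cover: the universe is the UAV's mission points, each candidate stop covers the points reachable within one fuel budget, and the chosen stops become UGV refueling points. The toy instance below is invented.

```python
def greedy_set_cover(universe, subsets):
    """Greedy minimum set cover: repeatedly pick the subset covering the
    most still-uncovered elements until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe not coverable by given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Mission points 1..5; each candidate stop covers points within fuel range.
points = {1, 2, 3, 4, 5}
stops = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5}}
cover = greedy_set_cover(points, stops)
# cover == ["s1", "s3"]: two stops suffice to cover every mission point.
```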
Advancements in Intelligent Transportation Systems (ITS) have led to innovative solutions for planning optimization, efficiency enhancement, and resource allocation in transportation networks, demonstrated in applications such as smart parking lot management and electric vehicle (EV) charging station allocation, where improved decision-making and system-wide optimization have been achieved. However, as these systems evolve, the demand for better adaptability and coordination continues to grow to maximize their overall effectiveness and efficiency. To achieve this, we propose the Multi-Personality Multi-Agent Meta-Reinforcement Learning (MPMA-MRL) framework. This approach incorporates multiple meta-trained, meta-tested explainable personality policies, which are deployed to each agent. A personality selector is trained and deployed on each agent to optimize overall performance. MPMA-MRL is superior to traditional methods in adaptability and coordination in ITS, leveraging improved information from the environment, more practical coordination among agents, faster adaptation to intermediate tasks, and more appropriate allocation and planning. The proposed framework is evaluated in the applications of parking lot optimization and EV charging station allocation. Its broader impact on multi-agent smart systems is analyzed to demonstrate its generalizability. The results demonstrate that in parking lot optimization, MPMA-MRL significantly reduces the time required to direct all vehicles to available parking spots. In EV charging station allocation, MPMA-MRL effectively minimizes waiting times at charging stations. Moreover, in both applications, MPMA-MRL exhibits enhanced adaptability to previously unseen scenarios, improving its applicability.
We tackle the challenging problem of multi-agent cooperative motion planning for complex tasks described using signal temporal logic (STL), where robots can have nonlinear and nonholonomic dynamics. Existing methods in multi-agent motion planning, especially those based on discrete abstractions and model predictive control (MPC), suffer from limited scalability with respect to the complexity of the task, the size of the workspace, and the planning horizon. We present a method based on timed waypoints to address this issue. We show that timed waypoints can help abstract nonlinear behaviors of the system as safety envelopes around the reference path defined by those waypoints. Then the search for waypoints satisfying the STL specifications can be inductively encoded as a mixed-integer linear program. The agents following the synthesized timed waypoints have their tasks automatically allocated, and are guaranteed to satisfy the STL specifications while avoiding collisions. We evaluate the algorithm both in simulation and on the Robotarium platform. Results show that it supports multi-agent planning from complex specifications over long planning horizons, and significantly outperforms state-of-the-art abstraction-based and MPC-based motion planning tools.
No abstract available
No abstract available
No abstract available
We study the problem of safe multi-agent motion planning in cluttered environments. Existing multi-agent reinforcement learning-based motion planners only provide approximate safety enforcement. We propose a safe reinforcement learning algorithm that leverages single-agent reinforcement learning for target regulation and a subsequent convex optimization-based filtering that ensures the collective safety of the system. Our approach yields a safe, real-time implementable multi-agent motion planner that is simpler to train and enforces safety as hard constraints. Our approach can handle state and control constraints on the agents, and enforce collision avoidance among themselves and with static obstacles in the environment. Numerical simulations and hardware experiments show the efficacy of the approach.
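The filtering idea this abstract describes can be illustrated in its simplest convex special case: projecting the RL policy's nominal control onto box-shaped control limits. The paper's filter solves a convex program with collision-avoidance constraints among agents and obstacles; the clamp below is shown purely for illustration, with invented bounds.

```python
def safety_filter(u_nominal, u_min, u_max):
    """Project the policy's nominal control onto the admissible box
    [u_min, u_max], component-wise. This is the closed-form solution of
    the projection QP when the safe set is an axis-aligned box."""
    return [min(max(u, lo), hi) for u, lo, hi in zip(u_nominal, u_min, u_max)]

# Nominal control exceeds the actuator limit in the first component.
u_safe = safety_filter([1.5, -0.2], [-1.0, -1.0], [1.0, 1.0])
# u_safe == [1.0, -0.2]: safety enforced as a hard constraint, not a penalty.
```

The key design point from the abstract survives even in this tiny case: the learned policy stays untouched, and safety is enforced downstream as a hard constraint rather than approximated in the reward.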
One of the major research topics in unmanned aerial vehicle (UAV) collaborative control systems is the problem of multi-UAV target assignment and path planning (MUTAPP). It is a complicated optimization problem in which target assignment and path planning are typically solved separately. However, recalculating the optimal results is too slow for real-time operation in dynamic environments because of the large number of computations required. In this paper, we propose an artificial intelligence method named simultaneous target assignment and path planning (STAPP) based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a type of multi-agent reinforcement learning. In STAPP, the MUTAPP problem is first constructed as a multi-agent system. Then, the MADDPG framework is used to train the system to solve target assignment and path planning simultaneously according to a corresponding reward structure. The proposed system deals effectively with dynamic environments, as its execution requires only the locations of the UAVs, targets, and threat areas. Real-time performance can be guaranteed because the neural network used in the system is simple. In addition, we develop a technique to improve the training effect and use experiments to demonstrate the effectiveness of our method.
Benefiting from the powerful language expression and planning capabilities of Large Language Models (LLMs), LLM-based autonomous agents have achieved promising performance in various downstream tasks. Recently, based on the development of single-agent systems, researchers propose to construct LLM-based multi-agent systems to tackle more complicated tasks. In this paper, we propose a novel framework, named COPPER, to enhance the collaborative capabilities of LLM-based agents with the self-reflection mechanism. To improve the quality of reflections, we propose to fine-tune a shared reflector, which automatically tunes the prompts of actor models using our counterfactual PPO mechanism. On the one hand, we propose counterfactual rewards to assess the contribution of a single agent's reflection within the system, alleviating the credit assignment problem. On the other hand, we propose to train a shared reflector, which enables the reflector to generate personalized reflections according to agent roles, while reducing the computational resource requirements and improving training stability. We conduct experiments on three datasets to evaluate the performance of our model in multi-hop question answering, mathematics, and chess scenarios. Experimental results show that COPPER possesses stronger reflection capabilities and exhibits excellent generalization performance across different actor models.
The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.
No abstract available
Autonomous driving systems aim for safe and socially consistent driving through behavioral integration among interactive agents. However, challenges remain due to multi-agent scene uncertainty and heterogeneous interaction. Current dense and sparse behavioral representations struggle with inefficiency and inconsistency in multi-agent modeling, leading to instability of collective behavioral patterns when integrating prediction and planning (IPP). To address this, we initiate a topological formation that serves as a compliant behavioral foreground to guide downstream trajectory generation. Specifically, we introduce Behavioral Topology (BeTop), a pivotal topological formulation that explicitly represents the consensual behavioral pattern among the multi-agent future. BeTop is derived from braid theory to distill compliant interactive topology from multi-agent future trajectories. A synergistic learning framework (BeTopNet) supervised by BeTop facilitates the consistency of behavior prediction and planning within the predicted topology priors. Through imitative contingency learning, BeTop also effectively manages behavioral uncertainty for prediction and planning. Extensive verification on large-scale real-world datasets, including nuPlan and WOMD, demonstrates that BeTop achieves state-of-the-art performance in both prediction and planning tasks. Further validation on the proposed interactive scenario benchmark showcases planning compliance in interactive cases.
Humans collaborate in dynamic and flexible ways. Collaboration requires agents to coordinate their behavior on the fly, sometimes jointly solving a single task together and other times dividing it up into sub-tasks to work on in parallel. We develop Bayesian Delegation, a learning mechanism for decentralized multi-agent coordination that enables agents to rapidly infer the sub-tasks that other agents are working on by inverse planning. These inferences enable agents to determine, in the absence of communication, whether to plan jointly with others or work on complementary sub-tasks. We test this model in a suite of decentralized multi-agent environments inspired by cooking problems. To succeed, agents must coordinate both their high-level plans (sub-task) and their low-level actions (avoiding collisions). Including joint sub-tasks in the prior of Bayesian delegation enables agents to carry out sub-tasks that neither agent can finish independently. The full system outperforms lesioned systems that are missing one or more of these capabilities.
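The inference at the heart of Bayesian Delegation can be sketched as a single Bayesian update: maintain a posterior over candidate sub-task allocations and reweight it by the likelihood of each observed action. The cooking-themed allocations and likelihood values below are a hypothetical toy model, not the paper's inverse-planning likelihoods.

```python
def bayes_update(prior, likelihood, observation):
    """One step of posterior inference over sub-task allocations.
    likelihood[alloc][obs] is P(obs | agents pursue alloc)."""
    post = {a: prior[a] * likelihood[a].get(observation, 0.0) for a in prior}
    z = sum(post.values())  # normalizing constant
    return {a: p / z for a, p in post.items()}

# Two candidate allocations: both agents jointly chop, or they split
# into chop vs. fetch. Observing a move toward the cutting board is
# more likely under the joint-chop allocation.
prior = {"joint-chop": 0.5, "split": 0.5}
lik = {"joint-chop": {"move-to-board": 0.8, "move-to-pantry": 0.2},
       "split": {"move-to-board": 0.4, "move-to-pantry": 0.6}}
posterior = bayes_update(prior, lik, "move-to-board")
# posterior["joint-chop"] == 2/3: belief shifts toward joint planning.
```

Repeating this update over successive observed actions is what lets agents decide, without communication, whether to plan jointly or take complementary sub-tasks.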
Large Language Models (LLMs) excel in traditional natural language processing tasks but struggle with problems that require complex domain-specific calculations or simulations. While equipping LLMs with external tools to build LLM-based agents can enhance their capabilities, existing approaches lack the flexibility to address diverse and ever-evolving user queries in open domains. There is also no existing dataset that evaluates LLMs on open-domain knowledge that requires tools to solve. To this end, we introduce the OpenAct benchmark to evaluate open-domain task-solving capability, built on human expert consultation and GitHub repositories. It comprises 339 questions spanning 7 diverse domains that must be solved with domain-specific methods. In our experiments, even state-of-the-art LLMs and LLM-based agents demonstrate unsatisfactory success rates, underscoring the need for a novel approach. Furthermore, we present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains by autonomously integrating specialized tools from GitHub. OpenAgent employs 1) a hierarchical framework where specialized agents handle specific tasks and can assign tasks to inferior agents, and 2) a bi-level experience learning mechanism to learn from both humans' and its own experiences to tackle tool flaws. Experiments demonstrate its superior effectiveness and efficiency, significantly outperforming baselines. Our data and code are open-source at https://github.com/OpenBMB/OpenAct.
Scientific research increasingly relies on specialized computational tools, yet effectively utilizing these tools requires substantial domain expertise. While large language models show promise in tool automation, they struggle to seamlessly integrate and orchestrate multiple tools for complex scientific workflows. Here we present SciToolAgent, a large language model-powered agent that automates hundreds of scientific tools across biology, chemistry and materials science. At its core, SciToolAgent leverages a scientific tool knowledge graph that enables intelligent tool selection and execution through graph-based retrieval-augmented generation. The agent also incorporates a comprehensive safety-checking module to ensure responsible and ethical tool usage. Extensive evaluations on a curated benchmark demonstrate that SciToolAgent outperforms existing approaches. Case studies in protein engineering, chemical reactivity prediction, chemical synthesis and metal–organic framework screening further demonstrate SciToolAgent's capability to automate complex scientific workflows, making advanced research tools accessible to both experts and nonexperts.
Motivation: The growing complexity of clinical cancer research has fueled a surge in demand for automated bioinformatics tools capable of integrating clinical and genomic data to accelerate discovery efforts. Results: We present the Artificial Intelligence Agent for High-Optimization and Precision Medicine (AI-HOPE), an AI-driven system that enables domain experts to conduct integrative data analyses through natural language interactions. Powered by large language models, AI-HOPE interprets user instructions, converts them into executable code, and autonomously analyzes locally stored data. It supports flexible association studies, subset comparisons, clinical prevalence assessments, and survival analyses. In addition, AI-HOPE enables global variable scans to identify features significantly associated with a user-defined outcome, making it a powerful and intuitive tool for advancing precision medicine research. Importantly, its closed-system design prevents clinical data leakage. To demonstrate its utility, AI-HOPE was applied to The Cancer Genome Atlas data to address two clinical questions. First, it identified significant enrichment of TP53 mutations in late-stage colorectal cancer compared to early-stage cases. Second, it uncovered a strong association between KRAS mutations and poorer progression-free survival in FOLFOX-treated patients. These findings align with established literature and demonstrate AI-HOPE's ability to generate meaningful insights independently, without prior assumptions. By removing programming barriers and simplifying complex analyses, AI-HOPE bridges the gap between data complexity and research needs. With its scalable and adaptable framework, AI-HOPE has the potential to support diverse biomedical research fields, driving innovation and efficiency in translational studies. Availability and implementation: The AI-HOPE software and demonstration data are available at https://github.com/Velazquez-Villarreal-Lab/AI-HOPE.
The Model Context Protocol (MCP) is rapidly emerging as a pivotal open standard, designed to enhance agent-tool integration and interoperability, and is positioned to unlock a new era of powerful, interconnected, and genuinely utilitarian agentic AI. However, despite MCP's growing adoption, existing benchmarks often fail to capture real-world agent performance within this new paradigm, leading to a distorted perception of their true operational value and an inability to reliably differentiate proficiencies. To bridge this critical evaluation gap, we introduce MCP-AgentBench—a comprehensive benchmark specifically engineered to rigorously assess language agent capabilities in MCP-mediated tool interactions. Core contributions of MCP-AgentBench include: the establishment of a robust MCP testbed comprising 33 operational servers with 188 distinct tools; the development of a benchmark featuring 600 systematically designed queries distributed across 6 distinct categories of varying interaction complexity; and the introduction of MCP-Eval, a novel outcome-oriented evaluation methodology prioritizing real-world task success. Through extensive empirical evaluation of leading language agents, we provide foundational insights. MCP-AgentBench aims to equip the research community with a standardized and reliable framework to build, validate, and advance agents capable of fully leveraging MCP's transformative benefits, thereby accelerating progress toward truly capable and interoperable AI systems.
ABSTRACT Recent advancements in generative artificial intelligence (AI), particularly Large Language Models (LLMs), offer promising capabilities for spatial analysis. However, their integration with established GIS platforms remains underexplored. In this study, we propose a framework that embeds LLMs into existing GIS platforms, using QGIS as a case study. Our approach leverages LLMs’ reasoning and coding abilities to autonomously generate spatial analysis workflows through an informed agent equipped with comprehensive documentation of key GIS tools and parameters. External tools such as GeoPandas are also incorporated to enhance the system's geoprocessing capabilities. Based on this framework, we developed a ‘GIS Copilot’ that enables users to interact with QGIS using natural language. We evaluated the copilot across over 100 tasks of varying complexity including basic (single tool/layer), intermediate (multistep with guidance), and advanced (multistep without guidance). Results show high success rates for basic and intermediate tasks, with challenges remaining in fully autonomous execution of advanced tasks. The GIS Copilot advances the vision of autonomous GIS by enabling non-experts to perform geospatial analysis with minimal prior knowledge. While full autonomy is not yet achieved, the copilot demonstrates significant potential for simplifying GIS workflows and enhancing decision-making processes.
The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations — one of the essential experimental technologies — remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It incorporates three modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from Daline and MATPOWER, this framework achieves success rates of 93.13% and 96.85%, respectively. It significantly outperforms ChatGPT-4o, o1-preview, and a fine-tuned GPT-4o, all of which achieved success rates lower than 30% on complex tasks. Additionally, the proposed framework supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average token cost of 0.014 USD. Overall, this adaptable framework lays a foundation for developing intelligent LLM-based assistants for human researchers, facilitating power system research and beyond.
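The error-feedback mechanism described above can be sketched generically: generate a candidate simulation script, execute it, and on failure feed the error message back into the next generation attempt. The stand-ins below for the LLM and the simulator are toy functions, not the paper's code.

```python
# Generic error-feedback loop: retry generation, conditioning each new
# attempt on the error produced by the previous one.
def run_with_feedback(generate, execute, max_attempts=3):
    """generate(feedback) -> code; execute(code) -> (ok, output_or_error)."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        ok, result = execute(code)
        if ok:
            return {"success": True, "attempts": attempt, "result": result}
        feedback = result  # the error message becomes context for the next try
    return {"success": False, "attempts": max_attempts, "result": feedback}

# Toy stand-ins: the "LLM" fixes its script once it has seen the error.
def fake_generate(feedback):
    return "good" if feedback else "bad"

def fake_execute(code):
    return (True, 42) if code == "good" else (False, "SyntaxError: bad")

outcome = run_with_feedback(fake_generate, fake_execute)
```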
No abstract available
Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and exhibit planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent frameworks that equip LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library with a customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of the ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. The ModelScope-Agent library\footnote{https://github.com/modelscope/modelscope-agent} and online demo\footnote{https://modelscope.cn/studios/damo/ModelScopeGPT/summary} are now publicly available.
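The tool-registration and tool-retrieval steps mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumed names (ToolRegistry, a keyword-overlap retriever), not the ModelScope-Agent API, which uses learned retrieval rather than word overlap.

```python
# Minimal tool registry: register tools with a description, retrieve the
# best match for a query, then invoke the selected tool.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, description, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def retrieve(self, query, top_k=1):
        """Naive keyword retrieval: rank tools by description-word overlap."""
        words = set(query.lower().split())
        scored = sorted(
            self._tools.items(),
            key=lambda kv: len(words & set(kv[1]["description"].lower().split())),
            reverse=True,
        )
        return [name for name, _ in scored[:top_k]]

    def invoke(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()
registry.register("image_gen", "generate an image from a text prompt",
                  lambda prompt: f"img:{prompt}")
registry.register("translator", "translate text between languages",
                  lambda text: f"zh:{text}")

best = registry.retrieve("please generate an image")[0]
```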
No abstract available
No abstract available
Multi-Agent Systems have been widely used for traffic simulation. Modeling individuals indeed introduces a behavioral diversity that is crucial to obtaining realistic simulation outcomes. The recent growth of open geographical databases and related flow information provides an opportunity to enhance traffic simulators with data automatically retrieved from the real world and updated regularly. We present here TrafficGen, a highly modular platform based on the integration of such open data within a library of rule-based behaviors, providing a versatile decision-support tool for traffic.
The continuous change in the manufacturing world is demanding more flexible, responsive, and accurate planning tools that can assist decision-makers in taking tactical and strategic decisions on short notice with a high level of confidence. For this purpose, these tools should dynamically explore different operative scenarios in the planning procedure and produce information about key performance indicators. This paper describes the development of an agent-based strategic planner, combining the flexibility of multi-agent systems principles with the optimization capability of a Mixed Integer Programming technique. The tool is integrated into an ecosystem of heterogeneous decision-making systems through an Enterprise Service Bus that also provides access to legacy data.
No abstract available
Automated Planning and Scheduling is among the growing areas in Artificial Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive review of 126 papers, this paper investigates eight categories based on the unique applications of LLMs in addressing various aspects of planning problems: language translation, plan generation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. For each category, we articulate the issues considered and existing gaps. A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners, pointing towards a promising neuro-symbolic approach. This approach effectively combines the generative aspects of LLMs with the precision of classical planning methods. By synthesizing insights from existing literature, we underline the potential of this integration to address complex planning challenges. Our goal is to encourage the ICAPS community to recognize the complementary strengths of LLMs and symbolic planners, advocating for a direction in automated planning that leverages these synergistic capabilities to develop more advanced and intelligent planning systems. We aim to keep the categorization of papers updated on https://ai4society.github.io/LLM-Planning-Viz/, a collaborative resource that allows researchers to contribute and add new literature to the categorization.
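The neuro-symbolic division of labour the survey points to — an LLM proposes, a classical planner checks — can be illustrated with a STRIPS-style validator. The domain below (pick/place with toy preconditions and effects) is an invented example, not drawn from the survey.

```python
# Symbolic plan validation: simulate a candidate plan action by action,
# rejecting it at the first unmet precondition. An LLM would supply the
# candidate plan; the validator supplies the precision.
ACTIONS = {
    "pick": {"pre": {"hand_empty"}, "add": {"holding"}, "del": {"hand_empty"}},
    "place": {"pre": {"holding"}, "add": {"hand_empty", "placed"}, "del": {"holding"}},
}

def validate_plan(plan, state, goal):
    state = set(state)
    for step in plan:
        action = ACTIONS[step]
        if not action["pre"] <= state:     # precondition violated
            return False
        state = (state - action["del"]) | action["add"]
    return goal <= state                   # goal must hold in the final state

llm_plan = ["pick", "place"]               # e.g. proposed by an LLM
valid = validate_plan(llm_plan, {"hand_empty"}, {"placed"})
invalid = validate_plan(["place"], {"hand_empty"}, {"placed"})
```

A failed validation can be returned to the LLM as feedback, closing the generate-and-check loop that much of the surveyed work builds on.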
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
The ongoing fourth Industrial Revolution depends mainly on robust Industrial Cyber-Physical Systems (ICPS). ICPS includes computing (software and hardware) abilities to control complex physical processes in distributed industrial environments. Industrial agents, originating from the well-established multi-agent systems field, provide complex and cooperative control mechanisms at the software level, allowing us to develop larger and more feature-rich ICPS. The IEEE P2660.1 standardisation project, "Recommended Practices on Industrial Agents: Integration of Software Agents and Low Level Automation Functions" focuses on identifying Industrial Agent practices that can benefit ICPS systems of the future. A key problem within this project is identifying the best-fit industrial agent practices for a given ICPS. This paper reports on the design and development of a tool to address this challenge. This tool, called IASelect, is built using graph databases and provides the ability to flexibly and visually query a growing repository of industrial agent practices relevant to ICPS. IASelect includes a front-end that allows industry practitioners to interactively identify best-fit practices without having to write manual queries.
No abstract available
Simulation of smart grid technologies requires a fundamentally new approach to integrated modeling of power systems, energy markets, building technologies, and the plethora of other resources and assets that are becoming part of modern electricity production, delivery, and consumption systems. As a result, the US Department of Energy’s Office of Electricity commissioned the development of a new type of power system simulation tool called GridLAB-D that uses an agent-based approach to simulating smart grids. This paper presents the numerical methods and approach to time-series simulation used by GridLAB-D and reviews applications in power system studies, market design, building control system design, and integration of wind power in a smart grid.
The rise of generative artificial intelligence (GAI), especially with multimodal large language models like GPT-4o, has sparked transformative potential and challenges for learning and teaching. As a cognitive offloading tool, GAI can enable learners to focus on higher-order thinking and creativity. Yet this also raises questions about its integration into traditional education, given the limited research on learners' interactions with GAI. Most studies of GAI focus on text-based human-AI interactions, while research on embodied GAI in immersive environments like mixed reality (MR) remains unexplored. To address this, this study investigates interaction dynamics between learners and embodied GAI agents in MR, examining cognitive and socio-emotional interactions during collaborative learning. We investigated the paired interaction patterns between a student and an embodied GAI agent in MR, based on data from 26 higher-education students with 1317 recorded activities. Data were analysed using a multi-layered learning analytics approach, including quantitative content analysis, sequence analysis via hierarchical clustering, and pattern analysis through ordered network analysis (ONA). Our findings identified two interaction patterns: (1) an AI-led Supported Exploratory Questioning (AISQ) group and (2) a Learner-Initiated Inquiry (LII) group. Despite their distinct characteristics, both types demonstrated comparable levels of socio-emotional engagement and exhibited meaningful cognitive engagement, surpassing the superficial content reproduction that can be observed in interactions with GPT models. This study contributes to human-AI collaboration and learning research, extending our understanding of learning in MR environments and highlighting implications for designing AI-based educational tools. What is already known about this topic: socio-emotional interactions are fundamental to cognitive processes and play a critical role in collaborative learning; GAI holds transformative potential for education but raises questions about how learners interact with such technology; most existing research focuses on text-based interactions with GAI, and there is limited empirical evidence on how embodied GAI agents within immersive environments like MR influence cognitive and socio-emotional interactions for learning and regulation. What this paper adds: it provides first empirical insights into cognitive and socio-emotional interaction patterns between learners and embodied GAI agents in MR environments; it identifies two distinct interaction patterns, the AISQ type (structured, guided, supportive) and the LII type (inquiry-driven, exploratory, engaging), demonstrating how these patterns influence collaborative learning dynamics; and it shows that both interaction types facilitate meaningful cognitive engagement, moving beyond the superficial content reproduction commonly associated with GAI interactions. Implications for practice and/or policy: insights from the identified interaction patterns can inform the design of teaching strategies that effectively integrate embodied GAI agents to enhance both cognitive and socio-emotional engagement; the findings can guide the development of AI-based educational tools that capitalise on the capabilities of embodied GAI agents, supporting a balance between structured guidance and exploratory learning; and they highlight the need for ethical considerations in adopting embodied GAI agents, particularly regarding the human-like realism of these agents and potential impacts on learner dependency and interaction norms.
No abstract available
In multi-robot exploration, a team of mobile robots is tasked with efficiently mapping an unknown environment. While most exploration planners assume omnidirectional sensors like LiDAR, this is impractical for small robots such as drones, where lightweight, directional sensors like cameras may be the only option due to payload constraints. These sensors have a constrained field-of-view (FoV), which adds complexity to the exploration problem, requiring not only optimal robot positioning but also sensor orientation during movement. In this work, we propose MARVEL, a neural framework that leverages graph attention networks, together with a novel fusion technique for frontier and orientation features, to develop a collaborative, decentralized policy using multi-agent reinforcement learning (MARL) for robots with constrained FoV. To handle the large action space of viewpoint planning, we further introduce a novel information-driven action pruning strategy. MARVEL improves multi-robot coordination and decision-making in challenging large-scale indoor environments, while adapting to various team sizes and sensor configurations (i.e., FoV and sensor range) without additional training. Our extensive evaluation shows that MARVEL's learned policies exhibit effective coordinated behaviors, outperforming state-of-the-art exploration planners across multiple metrics. We experimentally demonstrate MARVEL's generalizability in large-scale environments, of up to 90 m by 90 m, and validate its practical applicability through successful deployment on a team of real drone hardware.
Traffic congestion remains a major challenge for modern urban transportation, diminishing both efficiency and quality of life. While autonomous driving technologies and reinforcement learning (RL) have shown promise for improving traffic control, most prior work has focused on small-scale networks or isolated intersections. Large-scale mixed traffic control, involving both human-driven and robotic vehicles, remains underexplored. In this study, we propose a decentralized multi-agent reinforcement learning framework for managing large-scale mixed traffic networks, where intersections are controlled either by traditional traffic signals or by robotic vehicles. We evaluate our approach on a real-world network of 14 intersections in Colorado Springs, Colorado, USA, using average vehicle waiting time as the primary measure of traffic efficiency. We further explore a question that has not been sufficiently addressed: is large-scale multi-agent traffic control (MTC) still feasible when facing time-varying Origin-Destination (OD) patterns?
Traffic congestion remains a significant challenge in modern urban networks. Autonomous driving technologies have emerged as a potential solution. Among traffic control methods, reinforcement learning has shown superior performance over traffic signals in various scenarios. However, prior research has largely focused on small-scale networks or isolated intersections, leaving large-scale mixed traffic control largely unexplored. This study presents the first attempt to use decentralized multi-agent reinforcement learning for large-scale mixed traffic control in which some intersections are managed by traffic signals and others by robot vehicles. Evaluating on a real-world network in Colorado Springs, CO, USA with 14 intersections, we measure traffic efficiency via the average waiting time of vehicles at intersections and the number of vehicles reaching their destinations within a time window (i.e., throughput). At an 80% RV penetration rate, our method reduces waiting time from 6.17 s to 5.09 s and increases throughput from 454 vehicles per 500 seconds to 493 vehicles per 500 seconds, outperforming the baseline of fully signalized intersections. These findings suggest that integrating reinforcement learning-based control into large-scale traffic can improve overall efficiency and may inform future urban planning strategies.
No abstract available
Recently, existing computation offloading methods have provided extremely low service latency for mobile users (MUs) in multi-access edge computing (MEC). However, this remains a challenge in large-scale heterogeneous MEC environments with mixed cooperative-competitive MUs. Moreover, existing methods assume all offloaded tasks are handled by MEC servers (ESs) with static resource allocation within a time interval, ignoring the on-demand requirements of heterogeneous tasks; as a result, many tasks are dropped or resources are wasted, especially for latency-sensitive tasks. To address these issues, we present a decentralized computation offloading solution based on the Attention-weighted Recurrent Multi-Agent Actor-Critic (ARMAAC). First, we design a recurrent actor-critic framework that helps MU agents remember the historical resource-allocation information of ESs to better anticipate their future state, especially under dynamic resource allocation. Second, an attention mechanism is introduced to compress the joint observation space of all MU agents, allowing the method to scale to large numbers of MUs. Finally, the actor-critic framework is redesigned with double centralized critics and a Dueling network, addressing the instability and convergence difficulties caused by the sensitive coupling between actor and critic networks. Experiments show that ARMAAC improves task completion rates and reduces the average system cost by 11.01%-14.03% and 10.45%-15.56%, respectively, compared with baselines.
With the exponential growth of mobile users, ensuring high-quality network coverage has become paramount. Large-scale mobile networks consist of numerous base stations (BSs), each with adjustable parameters such as angles and beam widths. Automatically optimizing network coverage can be difficult due to environmental factors and the interdependence of the adjustable parameters. Due to the inherent uncertainties and unpredictable nature of large-scale wireless networks, traditional methods such as heuristics and meta-heuristics lack the adaptability and scalability required to cope with their dynamic environment. To address these challenges, we propose utilizing digital twin and reinforcement learning (RL) techniques within mobile networks characterized by multiple collaborating agents. We initially introduce DT-SimNet, a digital twin-enabled mobile network simulator to facilitate optimization evaluation. DT-SimNet can efficiently simulate communication behaviors of network elements within a complex environment while revealing user mobility patterns. Moreover, to address challenges arising from the multifaceted relationships among users, BSs, and the parameters across BSs, we introduce an innovative strategy named Optimized Multi-Agent Proximal Policy Optimization with Self-supervised Prediction (OMAPPO-SSP). Compared to MAPPO, whose applicability and performance are limited by the dynamic characteristics of 5G networks, this approach leverages network structure optimization and a self-supervised prediction mechanism, employing multi-agent reinforcement learning (MARL) principles to enhance efficiency. By harnessing collaborative neural networks, OMAPPO-SSP facilitates the explicit learning of behavioral interactions among all BSs, enabling effective decision-making in environments characterized by intricate spatial relationships, dynamic user behaviors, and diverse interactions. Extensive experiments are conducted to validate the efficiency and effectiveness of OMAPPO-SSP. Within the target area, OMAPPO-SSP achieves a coverage ratio of 94.66% and an average throughput of 89746 bits per second (bps), demonstrating significant improvements over competing methods.
In this paper, the real-time optimal transmission power control problem is investigated for large-scale mobile wireless sensor networks (MWSN). Controlling large-scale MWSN poses two novel challenges: 1) increasing navigation complexity due to the large number of mobile wireless sensors, and 2) limited energy, which prohibits peer-to-peer communication between large-scale mobile sensors. To overcome these challenges, mean field game theory is adopted and integrated with emerging decentralized reinforcement learning techniques. Specifically, the optimal transmission control problem and the optimal navigation problem are formulated as mean field games with two objectives. Then, a novel Actor-Critic-Mass multi-agent reinforcement learning algorithm is developed to learn the decentralized optimal transmission power control and motion control. To learn the decentralized optimal navigation and transmission power control policies, the coupled Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations are derived in the mean field game formulation. The learned decentralized policies are guaranteed to converge close to the optimal value, i.e., the Nash Equilibrium, even with large-scale MWSNs in uncertain environments. Finally, numerical simulations are provided to demonstrate the effectiveness of the proposed design.
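For reference, the coupled HJB and FPK equations mentioned above take, in a generic mean-field game, the following standard backward-forward form (a textbook sketch with generic dynamics f, running cost L, and diffusion σ; not necessarily the paper's exact formulation):

```latex
% Backward HJB equation for the value function v(x,t),
% solved backward from the terminal cost:
-\partial_t v(x,t)
  = \min_{u}\Big\{ L(x,u,m) + f(x,u)\cdot\nabla v(x,t) \Big\}
    + \tfrac{\sigma^2}{2}\,\Delta v(x,t)

% Forward FPK equation for the population density m(x,t),
% driven by the optimal control u^* from the HJB equation:
\partial_t m(x,t)
  = -\nabla\cdot\big( m(x,t)\, f(x,u^{*}) \big)
    + \tfrac{\sigma^2}{2}\,\Delta m(x,t)
```

The Nash equilibrium the abstract refers to is a fixed point of this coupled system: v determines the best response u*, and the density m induced by everyone playing u* must reproduce the m that v was computed against.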
With the rapid advancement of communication technology and the exponential growth of mobile users, improving network coverage quality and throughput has become increasingly important. In particular, large-scale Base Station (BS) cooperative optimization has become a highly significant topic. BSs can adjust various parameters for high-quality communication, but automating this optimization remains challenging due to environmental sensitivity and interdependencies. Traditional methods for network optimization are constrained by the intricate nature of real-world environments. Further, Reinforcement Learning (RL) techniques, which are effective for configuration policies, encounter difficulties in intricate, high-dimensional wireless communication networks, especially in multi-agent cooperative optimization. To overcome these challenges, this article proposes the Enhanced Multi-Agent Proximal Policy Optimization (EMAPPO), which utilizes the capabilities of the UNet network to extract multi-spatial relationships among a massive number of network elements and employs the DiffPool network to efficiently depict the impact of large-scale action coordination among massive agents on coverage performance. To facilitate evaluation in communication optimization, we further introduce a high-fidelity digital twin-driven mobile network. Extensive experiments validate the effectiveness and superior performance of EMAPPO by utilizing the network digital twin. The results demonstrate significant improvements in signal coverage rate and network throughput compared to the competing methods.
The use of multi-agent reinforcement learning (MARL) methods in coordinating traffic lights (CTL) has become increasingly popular, treating each intersection as an agent. However, existing MARL approaches either treat each agent as absolutely homogeneous, i.e., with the same network and parameters for every agent, or as completely heterogeneous, i.e., with different networks and parameters for each agent. This creates a difficult balance between accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agent environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions to maintain a learnable and dynamic clustering: one uses mutual information estimation for better stability, and the other maximizes separability between groups. Finally, GPLight enforces the agents in a group to share the same network and parameters. This approach reduces complexity by promoting cooperation within the same group of agents while reflecting differences between groups to ensure accuracy. To verify the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, with up to 1,089 intersections. Compared with state-of-the-art methods, experiment results demonstrate the superiority of our proposed method, especially in large-scale CTL.
This letter develops a novel multi-agent deep reinforcement learning (MADRL)-based local control method that can achieve coordinated scheduling of large-scale PV inverters using local information. This is achieved by the development of a system state inference-aided actor structure for each agent and implementation of random sequential updating within centralized-training-decentralized-execution framework. To enhance the coordination between agents utilizing local observation, a state latent inductive reasoning-based composite loss is further designed for the optimization of the inference models. Simulation tests on IEEE 123-node network demonstrate the superiority of the developed local control method when there is a large number of PV inverters.
Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). While centralized approaches are often infeasible for large-scale TSC problems, decentralized approaches provide scalability but introduce new challenges, such as partial observability. Communication plays a critical role in decentralized MARL, as agents must learn to exchange information using messages to better understand the system and achieve effective coordination. Deep MARL has been used to enable inter-agent communication by learning communication protocols in a differentiable manner. However, many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates "which" part of the message is sent "to whom". In essence, our framework enables agents to selectively choose the recipients of their messages and exchange variable-length messages with them. This results in a decentralized and flexible communication mechanism in which agents can effectively use the communication channel only when necessary. We designed two networks, a synthetic 4×4 grid network and a real-world network based on the Pasubio neighborhood in Bologna. Our framework achieved the lowest network congestion compared to related methods, with agents utilizing roughly 47-65% of the communication channel. Ablation studies further demonstrated the effectiveness of the communication policies learned within our framework.
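The "which part goes to whom" idea above can be sketched with a per-recipient gating mask over message slots. The gates here are hand-set booleans for illustration; in the paper they are learned communication policies.

```python
# Selective messaging: each recipient gets only the gated-on slots of the
# message, so message length varies per recipient and the channel is used
# only when something is actually sent.
def send_messages(message, gates):
    """gates: recipient -> list of bools, one per slot of `message`."""
    out = {}
    for recipient, mask in gates.items():
        payload = [slot for slot, keep in zip(message, mask) if keep]
        if payload:  # skip recipients whose gates are all off
            out[recipient] = payload
    return out

# A toy intersection agent shares queue length and waiting time with its
# northern neighbour and nothing with its southern one.
msgs = send_messages(
    ["queue=7", "phase=NS", "wait=12"],
    {"north": [True, False, True], "south": [False, False, False]},
)
```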
Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commutes, traffic signal control poses unique challenges for reinforcement learning in terms of adapting to dynamic traffic environments and coordinating thousands of agents, including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that generates a large number of data samples for learning. The most commonly used open-source traffic simulator, SUMO, is however not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow can support flexible definitions of road networks and traffic flows based on synthetic and real-world data. It also provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and is capable of supporting city-wide traffic simulation with an interactive renderer for monitoring. Besides traffic signal control, CityFlow could serve as the base for other transportation studies and can create new possibilities to test machine learning methods in the intelligent transportation domain.
Recent advancements in reinforcement learning have witnessed remarkable achievements by intelligent agents ranging from game-playing to industrial applications. Of particular interest is the area of multi-agent reinforcement learning (MARL), which holds significant potential for real-world scenarios. However, typical MARL methods are limited in their ability to handle tens of agents, leaving scenarios with up to hundreds or even thousands of agents almost unexplored. The scaling up of the number of agents presents two primary challenges: (1) agent-agent interactions are crucial in multi-agent systems while the number of interactions grows quadratically with the number of agents, resulting in substantial computational complexity and difficulty in strategies-learning; (2) the strengths of interactions among agents exhibit variations both across agents and over time, making it difficult to precisely model such interactions. In this paper, we propose a novel approach named Graph Attention Mean Field (GAT-MF). By converting agent-agent interactions into interactions between each agent and a weighted mean field, we achieve a substantial reduction in computational complexity. The proposed method offers a precise modeling of interaction dynamics with mathematical proofs of its correctness. Additionally, we design a graph attention mechanism to automatically capture the diverse and time-varying strengths of interactions, ensuring an accurate representation of agent interactions. Through extensive experimentation conducted in both manual and real-world scenarios involving over 3000 agents, we validate the efficacy of our method. The results demonstrate that our method outperforms the best baseline method with a remarkable improvement of 42.7%. Furthermore, our method saves 86.4% training time and 19.2% GPU memory compared to the best baseline method. For reproducibility, our source codes and data are available at https://github.com/tsinghua-fib-lab/Large-Scale-MARL-GATMF.
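The core reduction in GAT-MF — replacing quadratically many pairwise interactions with each agent attending to a single weighted mean of its neighbours' states — can be shown numerically. The sketch below uses scalar states and hand-given attention scores; the actual method computes scores with learned graph-attention projections.

```python
# Attention-weighted mean field: softmax the scores to get attention
# weights, then return the weighted mean of the neighbour states.
import math

def attention_mean_field(states, scores):
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * x for w, x in zip(weights, states))

# Equal scores reduce to a plain (unweighted) mean field.
field = attention_mean_field([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

Each agent then conditions its policy on (own state, field) instead of on all N-1 neighbours, which is what takes the interaction cost from quadratic to linear in the number of agents.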
With the emerging connected-vehicle technologies and smart roadways, the need for intelligent adaptive traffic signal control (ATSC) is greater than ever before. This paper first proposes an Accumulated Exponentially Weighted Waiting Time-based Adaptive Traffic Signal Control (AEWWT-ATSC) model to calculate priorities of roadways for signal scheduling. As the size of the traffic network grows, it adds great complexity and challenges computational efficiency. Considering this, we propose a novel Distributed Multi-agent Reinforcement Learning (DMARL) with a graph decomposition approach for large-scale ATSC problems. The decomposition clusters intersections by the level of connectivity (LoC), defined by the average residual capacities (ARC) between connected intersections, enabling us to train subgraphs instead of the entire network in a synchronized way. The problem is formulated as a Markov Decision Process (MDP), and the Double Dueling Deep Q Network with Prioritized Experience Replay is utilized to solve it. Under the optimal policy, the agents can select the optimal signal durations to minimize the waiting time and queue size. In evaluation, we show the superiority of the AEWWT-ATSC based RL methods at different densities and demonstrate the DMARL with a graph decomposition approach on a large graph in Manhattan, NYC. The approach is generic and can be extended to various types of use cases.
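An accumulated exponentially weighted waiting-time priority, as the AEWWT name suggests, discounts older waiting observations geometrically in favour of recent ones. The recurrence and the decay factor below are an illustrative reading of the name, not the paper's exact formula.

```python
# Exponentially weighted accumulation: priority_t = decay * priority_{t-1} + wait_t,
# so an observation k steps old contributes with weight decay**k.
def aewwt_priority(waits, decay=0.5):
    """`waits` is ordered oldest first; returns the accumulated priority."""
    priority = 0.0
    for w in waits:
        priority = decay * priority + w
    return priority

p = aewwt_priority([4.0, 2.0, 1.0])  # 0.5*(0.5*4 + 2) + 1
```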
Robotic sortation centers use mobile robots to sort packages by their destinations. The destination-to-sort-location (chute) mapping can significantly impact the volume of packages that can be sorted by the sortation floor. In this work, we propose a multi-agent reinforcement learning method to solve large-scale chute mapping problems with hundreds of agents (the destinations). To address the exponential growth of the state-action space, we decompose the joint action-value function as the sum of local action-value functions associated with the individual agents. To incorporate robot congestion effects on the rates at which packages are sorted, we couple the local action-value functions through the states of destinations mapped to nearby chutes on the sortation floor. We show that our proposed framework can solve large chute mapping problems and outperforms static or reactive policies that are commonly used in practice in robotic sortation facilities.
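The additive decomposition of the joint action-value function can be sketched as below. This is a tabular toy version under assumed dimensions; the paper's coupling of local value functions through nearby-chute states is omitted.

```python
import numpy as np

def joint_q(local_qs, states, actions):
    """Additive value decomposition: the joint action-value is the sum of
    per-agent local Q values, so maximization can be done agent by agent
    instead of over the exponentially large joint action space."""
    return sum(q[s, a] for q, s, a in zip(local_qs, states, actions))

rng = np.random.default_rng(1)
local_qs = [rng.normal(size=(4, 3)) for _ in range(3)]  # 3 agents, 4 states, 3 actions
states = [0, 2, 1]
# Decentralized greedy selection: each agent maximizes its own local Q.
greedy = [int(q[s].argmax()) for q, s in zip(local_qs, states)]
total = joint_q(local_qs, states, greedy)
```

Because the joint value is a sum, the per-agent greedy actions jointly maximize it, which is what makes the approach tractable with hundreds of agents.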
Variable speed limit (VSL) control has emerged as a promising traffic management strategy for enhancing safety and mobility. In this study, we introduce a multi-agent reinforcement learning framework for implementing a large-scale VSL system to address recurring congestion in transportation corridors. The VSL control problem is modeled as a Markov game, using only data widely available on freeways. By employing parameter sharing among all VSL agents, the proposed algorithm can efficiently scale to cover extensive corridors. The agents are trained using a reward structure that incorporates adaptability, safety, mobility, and penalty terms, enabling agents to learn a coordinated policy that effectively reduces spatial speed variations while minimizing the impact on mobility. Our findings reveal that the proposed algorithm leads to a significant reduction in speed variation, which holds the potential to reduce incidents. Furthermore, the proposed approach performs satisfactorily under varying traffic demand and compliance rates.
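A multi-term reward of the kind described could be sketched as follows. The specific terms and weights here are illustrative assumptions, not the paper's formulation.

```python
def vsl_reward(speed_variation, throughput, limit_change, w=(1.0, 0.5, 0.2)):
    """Hedged sketch of a multi-term VSL reward: penalize spatial speed
    variation (safety), reward throughput (mobility), and penalize abrupt
    speed-limit changes (adaptability). Weights `w` are assumed values."""
    w_safety, w_mobility, w_adapt = w
    return (-w_safety * speed_variation
            + w_mobility * throughput
            - w_adapt * abs(limit_change))
```

With shared parameters, every VSL agent optimizes this same scalar signal, which is what lets the policy scale across a long corridor.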
In this paper, we investigate the uplink transmit power optimization problem in cell-free (CF) extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Instead of applying the traditional methods, we propose two signal processing architectures: the centralized training and centralized execution with fuzzy logic as well as the centralized training and decentralized execution with fuzzy logic, respectively, which adopt the amalgamation of multi-agent reinforcement learning (MARL) and fuzzy logic to solve the design problem of power control for the maximization of the system spectral efficiency (SE). Furthermore, the uplink performance of the system adopting maximum ratio (MR) combining and local minimum mean-squared error (L-MMSE) combining is evaluated. Our results show that the proposed methods with fuzzy logic outperform the conventional MARL-based method and signal processing methods in terms of computational complexity. Also, the SE performance under MR combining is even better than that of the conventional MARL-based method.
Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, the centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. The multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now, the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent, advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. The results demonstrate its optimality, robustness, and sample efficiency over the other state-of-the-art decentralized MARL algorithms.
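One way to improve a local agent's view of the global objective, in the spirit of the stabilization methods described above, is to mix the immediate neighbors' rewards into each agent's return with a spatial discount. The one-hop restriction and the value of `alpha` below are illustrative choices, not the paper's exact scheme.

```python
def spatially_discounted_return(own_reward, neighbor_rewards, alpha=0.75):
    """Blend an agent's own reward with its neighbors' rewards, scaled by a
    spatial discount factor, so each local actor-critic agent optimizes a
    smoother, partially global objective. `alpha` is an assumed value."""
    return own_reward + alpha * sum(neighbor_rewards)
```

Setting `alpha=0` recovers fully independent learning; larger values trade locality for coordination.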
Background: Policy makers face ever more complicated challenges in balancing saving lives against economic development in the post-vaccination era of a pandemic. Epidemic simulation models and pandemic control methods are designed to tackle this problem. However, most existing approaches cannot be applied to real-world cases due to their lack of adaptability to new scenarios and of micro-level representational ability (especially for system dynamics models), their huge computational demand, and their inefficient use of historical information. Methods: We propose a novel Pandemic Control decision-making framework via large-scale Agent-based modeling and deep Reinforcement learning (PaCAR) to search for optimal control policies that simultaneously minimize the spread of infection and government restrictions. In the framework, we develop a new large-scale agent-based simulator with vaccine settings, calibrated to serve as a realistic environment for a city or a state. We also design a novel reinforcement learning architecture applicable to the pandemic control problem, with a reward carefully designed under the net monetary benefit framework and a sequence learning network to extract information from sequential epidemiological observations, such as case counts, vaccinations, and so forth. Results: Our approach outperforms baselines designed by experts or adopted by real-world governments, and is flexible in dealing with different variants, such as the Alpha and Delta variants of COVID-19. PaCAR succeeds in controlling the pandemic with the lowest economic costs, a relatively short epidemic duration, and few cases. We further conduct extensive experiments to analyze the reasoning behind the resulting policy sequence, offering an informative reference for policy makers in the post-vaccination era of COVID-19 and beyond. Limitations: The modeling of economic costs, which are estimated directly from the level of government restrictions, is rather simple. This article mainly focuses on several specific control methods and on single-wave pandemic control. Conclusions: The proposed framework PaCAR can offer adaptive pandemic control recommendations for different variants and population sizes. Intelligent pandemic control empowered by artificial intelligence may help us make it through the current COVID-19 pandemic, and other possible pandemics in the future, at a lower cost in both lives and economic damage. Highlights: We introduce a new efficient, large-scale agent-based epidemic simulator in our framework PaCAR, which can be applied to train reinforcement learning networks in a real-world scenario with a population of more than 10,000,000. We develop a novel learning mechanism in PaCAR, which augments reinforcement learning with sequence learning, to learn the tradeoff policy between saving lives and economic development in the post-vaccination era. We demonstrate that the policy learned by PaCAR outperforms different benchmark policies under various realistic conditions during COVID-19. We analyze the resulting policy given by PaCAR; the lessons may shed light on better pandemic preparedness plans in the future.
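A net-monetary-benefit style reward, of the kind the PaCAR abstract mentions, values health outcomes in monetary terms via a willingness-to-pay factor and subtracts the cost of restrictions. The sketch below is a simplified assumption; the paper's exact reward may differ.

```python
def net_monetary_benefit(cases_averted, wtp_per_case, restriction_cost):
    """Net monetary benefit = monetized health gain - economic cost of
    government restrictions. All arguments and values are illustrative."""
    return cases_averted * wtp_per_case - restriction_cost

# A strict lockdown averting few extra cases can score worse than a lighter one.
strict = net_monetary_benefit(cases_averted=120, wtp_per_case=50.0, restriction_cost=8000.0)
light = net_monetary_benefit(cases_averted=100, wtp_per_case=50.0, restriction_cost=2000.0)
```

This single scalar is what lets an RL agent trade off lives saved against economic damage without hand-tuned rules.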
Learning in large-scale multi-agent reinforcement learning is fundamentally difficult due to the curse of dimensionality. In the homogeneous multi-agent setting, mean field theory provides an effective way of scaling MARL to environments with many agents by abstracting the other agents into a virtual mean agent, which assumes that each player's impact on the outcome is equal and infinitesimal. However, in some real scenarios, only a few neighboring agents affect an agent's decision-making, not all other agents; moreover, different neighboring agents may influence that decision-making to different degrees. In this paper, not restricted to the homogeneous setting, we propose Adaptive Mean Field Multi-Agent Reinforcement Learning (AMF-MARL), which is based on the attention mechanism and can handle many-agent scenarios in which the influence relationships among agents may differ. Specifically, we first derive a mean field approximation with adaptive weights. We then propose the Adaptive Mean Field Q-learning (AMF-Q) approach and describe how to obtain the adaptive weights. Finally, we conduct experiments to study the learning effectiveness of the proposed approach.
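A single tabular step of mean-field Q-learning, where the Q function is conditioned on the agent's own action and an (already computed) weighted mean action of its neighbors rather than the full joint action, might look like this. All indices are assumed discretized; this is a sketch, not the paper's implementation.

```python
import numpy as np

def amf_q_update(Q, s, a, abar, r, s2, abar2, alpha=0.1, gamma=0.9):
    """One tabular mean-field Q-learning step. Q has shape
    (states, own actions, mean actions); `abar` is the adaptively weighted
    mean action of the neighbors. alpha/gamma are assumed hyperparameters."""
    target = r + gamma * Q[s2, :, abar2].max()     # bootstrap on next mean action
    Q[s, a, abar] += alpha * (target - Q[s, a, abar])
    return Q

Q = np.zeros((2, 2, 2))             # states x own actions x mean actions
Q = amf_q_update(Q, s=0, a=0, abar=0, r=1.0, s2=1, abar2=0)
```

The adaptive weights only change how `abar` is computed; the update itself keeps the standard Q-learning form.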
Reinforcement learning is of vital significance in machine learning and is a promising approach for traffic signal control in urban road networks, with the assistance of deep neural networks. However, in a large-scale urban network, the centralized reinforcement learning approach is beset with difficulties due to the extremely high dimension of the joint action space. The multi-agent reinforcement learning (MARL) approach overcomes the high-dimension problem by employing distributed local agents whose action spaces are much smaller. Even so, MARL introduces another issue: multiple agents interact with the environment simultaneously, making it non-stationary, so training each agent independently may not converge. This paper presents an actor-critic based decentralized MARL approach to traffic signal control that overcomes the shortcomings of both the centralized RL approach and the independent MARL approach. In particular, a distributed critic network is designed, which avoids the difficulty of training a large-scale neural network as in centralized RL. Moreover, a difference reward method is proposed to evaluate each agent's contribution, which accelerates the convergence of the algorithm and lets agents optimize their policies in a more accurate direction. The proposed MARL approach is compared against the fully independent approach and the centralized learning approach in a grid network. Simulation results demonstrate its effectiveness over other MARL algorithms in terms of average travel speed, travel delay, and queue length.
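The difference-reward idea has a standard form, D_i = G(z) - G(z_{-i}): the global reward minus the counterfactual reward that would have resulted had agent i been replaced by a default action, isolating agent i's own contribution. A minimal sketch:

```python
def difference_rewards(global_reward, counterfactual_rewards):
    """Per-agent credit assignment via difference rewards:
    D_i = G(z) - G(z_{-i}), where counterfactual_rewards[i] is the global
    reward with agent i replaced by a default/absent action."""
    return [global_reward - c for c in counterfactual_rewards]

# Global reward 10; without agent 0 it would be 7, without agent 1 it would be 9.
credits = difference_rewards(10.0, [7.0, 9.0])
```

Computing the counterfactual terms G(z_{-i}) is the costly part in practice; the sketch assumes they are given.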
In recent years, ride-sharing has gained popularity as a daily means of transportation. The primary challenge for large-scale online ride-sharing platforms is to design an efficient fleet management policy that reallocates vehicles to appropriate regions to receive orders, thereby improving the platform’s cumulative revenue and order response rate. Combinatorial optimization algorithms and reinforcement learning methods are commonly employed for this task, but they typically learn a unified repositioning policy for all regions. However, different regions, such as hot and cold zones, may require different repositioning policies due to varying travel patterns. In this paper, we propose a multi-agent mixed hierarchical reinforcement learning approach, called MIX-H, for efficient large-scale fleet management by formulating it as a Markov decision process. MIX-H adopts multi-level controllers, including a leader controller and follower controller, for multi-level action learning. The leader controller plans the goal to be executed by the follower controller. Additionally, to improve the algorithm’s stability, we introduce a MIX module to compute the total value of joint action. Finally, experiments on real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.
Peer-to-peer (P2P) transactive energy trading has emerged as a promising paradigm for maximizing the flexibility value of prosumers' distributed energy resources (DERs). Although reinforcement learning constitutes a well-suited model-free, data-driven methodological framework for optimizing prosumers' energy management decisions, its application to large-scale coordinated management and P2P trading among multiple prosumers within an energy community remains challenging, due to the scalability, non-stationarity, and privacy limitations of state-of-the-art multi-agent deep reinforcement learning (MADRL) approaches. This paper proposes a novel P2P transactive trading scheme based on the multi-actor-attention-critic (MAAC) algorithm, which addresses each of the above challenges. The method is complemented by a P2P trading platform that incentivizes prosumers to engage in local energy trading while penalizing each prosumer's contribution to rebound peaks. Case studies involving a real-world, large-scale scenario with 300 residential prosumers demonstrate that the proposed method significantly outperforms state-of-the-art MADRL methods in reducing the community's cost and peak demand.
Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not only can significantly improve the utilization of transportation resources but also increase the revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including two concrete algorithms, namely contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts. We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.
When dealing with a series of imminent issues, humans can naturally concentrate on a subset of these concerning issues by prioritizing them according to their contributions to motivational indices, e.g., the probability of winning a game. This idea of concentration offers insights into reinforcement learning of sophisticated Large-scale Multi-Agent Systems (LMAS) participated by hundreds of agents. In such an LMAS, each agent receives a long series of entity observations at each step, which can overwhelm existing aggregation networks such as graph attention networks and cause inefficiency. In this paper, we propose a concentration network called ConcNet. First, ConcNet scores the observed entities considering several motivational indices, e.g., expected survival time and state value of the agents, and then ranks, prunes, and aggregates the encodings of observed entities to extract features. Second, distinct from the well-known attention mechanism, ConcNet has a unique motivational subnetwork to explicitly consider the motivational indices when scoring the observed entities. Furthermore, we present a concentration policy gradient architecture that can learn effective policies in LMAS from scratch. Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
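The concentration step — score observed entities by a motivational index, rank them, prune to the top-k, and aggregate the survivors' encodings — can be sketched as below. The learned scoring subnetwork is replaced here by externally supplied scores, and mean-pooling stands in for whatever aggregation the paper uses.

```python
import numpy as np

def concentrate(entity_encodings, motivation_scores, k):
    """Rank entities by a motivational score (e.g. expected survival time),
    keep the top-k, and aggregate their encodings into one feature vector."""
    keep = np.argsort(motivation_scores)[::-1][:k]    # rank and prune
    return entity_encodings[keep].mean(axis=0)        # aggregate

rng = np.random.default_rng(2)
encodings = rng.normal(size=(100, 8))   # a long series of entity observations
scores = rng.normal(size=100)
feature = concentrate(encodings, scores, k=16)
```

Pruning before aggregation is what keeps the per-step cost bounded even when hundreds of entities are observed.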
Efficient scheduling of distributed deep learning (DL) jobs in large GPU clusters is crucial for resource efficiency and job performance. While server sharing among jobs improves resource utilization, interference among co-located DL jobs occurs due to resource contention. Interference-aware job placement has been studied, with white-box approaches based on explicit interference modeling and black-box schedulers with reinforcement learning. In today’s clusters containing thousands of GPU servers, running a single scheduler to manage all arrival jobs in a timely and effective manner is challenging, due to the large workload scale. We adopt multiple schedulers in a large-scale cluster/data center, and propose a multi-agent reinforcement learning (MARL) scheduling framework to cooperatively learn fine-grained job placement policies, towards the objective of minimizing job completion time (JCT). To achieve topology-aware placements, our proposed framework uses hierarchical graph neural networks to encode the data center topology and server architecture. In view of a common lack of precise reward samples corresponding to different placements, a job interference model is further devised to predict interference levels in face of various co-locations, for training of the MARL schedulers. Testbed and trace-driven evaluations show that our scheduler framework outperforms representative scheduling schemes by more than 20% in terms of average JCT, and is adaptive to various machine learning cluster topologies.
To address the data security and communication efficiency of vehicles during high-speed mobile communication, this paper investigates secure vehicular communication resource allocation based on slowly varying large-scale fading channel information, so as to meet the quality-of-service requirements of vehicular communication, i.e., to ensure the reliability and latency of vehicle-to-vehicle (V2V) communication while maximizing the transmission rate of the cellular link. An eavesdropping model is introduced to ensure the secure delivery of link information. Considering that the high mobility of vehicles causes rapid channel changes, we model the problem as a Markov decision process and propose a resource allocation optimization framework based on a Multi-Agent Reinforcement Learning algorithm (MARL-DDQN), in which a large-scale neural network model is built to train vehicles to learn the optimal resource allocation strategy for the best communication and security performance. Simulation results show that, compared to the baseline and MADDPG strategies, the load delivery success rate and confidentiality performance of the vehicular communication network are effectively improved while link security is ensured. This study provides useful references and practical value for optimizing secure communication resource allocation in vehicular networking.
The primary challenge in the development of large-scale artificial intelligence (AI) systems lies in achieving scalable decision-making—extending the AI models while maintaining sufficient performance. Existing research indicates that distributed AI can improve scalability by decomposing complex tasks and distributing them across collaborative nodes. However, previous technologies suffered from compromised real-world applicability and scalability due to the massive requirement of communication and sampled data. Here we develop a model-based decentralized policy optimization framework, which can be efficiently deployed in multi-agent systems. By leveraging local observation through the agent-level topological decoupling of global dynamics, we prove that this decentralized mechanism achieves accurate estimations of global information. Importantly, we further introduce model learning to reinforce the optimal policy for monotonic improvement with a limited amount of sampled data. Empirical results on diverse scenarios show the superior scalability of our approach, particularly in real-world systems with hundreds of agents, thereby paving the way for scaling up AI systems.
In this paper, we focus on the demand-capacity balancing (DCB) problem in air traffic flow management, treated as a fully cooperative multi-agent learning task. First, a rule-based time-step environment is designed to mimic the DCB process. In this environment, each agent 'flight' decides its action at valid time steps. Three different rules, based on the remaining capacity and the number of cooperative flights in each sector, are defined to ease the learning process. Second, a multi-agent reinforcement learning framework built on proximal policy optimization (MAPPO) is proposed, using the parameter-sharing mechanism and the mean-field approximation method, where an inherent feature of all other agents is extracted to address the credit assignment problem. Moreover, a supervisor-integrated MAPPO framework is proposed, in which a supervisor generates supervised actions so as to further improve learning performance. In the experiments, two performance indices, Search Capability and Generalization Capability, are considered. Both indices are assessed in two toy cases and a real-world case study. The results suggest that the supervisor-integrated MAPPO with supervised actions achieves the best performance across the different cases; the other proposed methods also show promising Search Capability, but demonstrate acceptable Generalization Capability only in cases simpler than the training cases.
In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale Lidar-based autonomous robot exploration problems in 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this end, our approach relies on learned attention mechanisms for their powerful ability to capture long-term dependencies at different spatial scales to reason about the robot's entire belief over known areas. Our approach relies on ground truth information (i.e., privileged learning) to guide the environment estimation during training, as well as on a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones. Simulation results show that our model exhibits better exploration efficiency (12% in path length, 6% in makespan) and lower planning time (60%) than the state-of-the-art planners in a 130 m × 100 m benchmark scenario. We also validate our learned model on hardware.
The development of machine learning and artificial intelligence algorithms, as well as the progress of unmanned aerial vehicle swarm technology, has significantly enhanced the intelligence and autonomy of unmanned aerial vehicles in search missions, resulting in greater efficiency when searching unknown areas. However, as search scenarios become more complex, existing unmanned aerial vehicle swarm search methods lack scalability and efficient cooperation. Furthermore, due to the increasing scale of search scenarios, the accuracy and real-time performance of global information are difficult to ensure, necessitating the provision of local information. This paper focuses on the large-scale search scenario and splits it to provide both local and global information for running unmanned aerial vehicle swarm search algorithms. Since the search environment is often unknown, dynamic, and complex, it requires adaptive decision-making in a constantly changing environment, which is suitable for modeling as a Markov decision process. Considering the sequential nature of the scenario, we propose a distributed collaborative search method based on a multi-agent reinforcement learning algorithm, which can operate efficiently in complex and large-scale scenarios. Additionally, the proposed method can use a convolutional neural network to process high-dimensional map data with almost no loss of structural information. Experimental results demonstrate that the proposed method can collaboratively search unknown areas, avoid collisions and repetition, and find all targets faster than the benchmarks.
Simulation of population dynamics is a central research theme in computational biology, contributing to our understanding of the interactions between predators and prey. Conventional mathematical tools for this theme, however, cannot account for several important attributes of such systems, such as the intelligent and adaptive behavior exhibited by individual agents. This unrealistic setting is often insufficient to simulate properties of population dynamics found in the real world. In this work, we leverage multi-agent deep reinforcement learning and propose a new model of large-scale predator-prey ecosystems. Using different variants of our proposed environment, we show that multi-agent simulations can exhibit key real-world dynamical properties. To obtain this behavior, we first define a mating mechanism such that existing agents reproduce new individuals bound by the conditions of the environment. Furthermore, we incorporate a real-time evolutionary algorithm and show that reinforcement learning enhances the evolution of the agents' physical properties, such as speed, attack, and resilience against attacks.
We propose a novel approach to optimize fleet management by combining multi-agent reinforcement learning with graph neural network. To provide ride-hailing service, one needs to optimize dynamic resources and demands over spatial domain. While the spatial structure was previously approximated with a regular grid, our approach represents the road network with a graph, which better reflects the underlying geometric structure. Dynamic resource allocation is formulated as multi-agent reinforcement learning, whose action-value function (Q function) is approximated with graph neural networks. We use stochastic policy update rule over the graph with deep Q-networks (DQN), and achieve superior results over the greedy policy update. We design a realistic simulator that emulates the empirical taxi call data, and confirm the effectiveness of the proposed model under various conditions.
For software applications in health coaching domains to be effective, it is vital that they address issues of privacy, modularity, scalability, individualization, data integration, transferability, coordination and flexibility. In this paper, we propose a novel generic multi-agent architecture which serves as a template for health coaching applications involving wearable sensors. Analyzer and communication modules allow different functionalities like goal formation, planning, scheduling, event detection, learning, inter-agent + human communication and long-term data collection, based on the capabilities of the underlying sensor platforms. To show the flexibility of our proposed architecture, we have successfully built two different health coaching systems with the proposed architecture: (1) a static system based on the Fitbit platform where the coaching is done at specific preset times to encourage increased physical activity, and (2) a dynamic system based on the Apple Watch platform where the smart coach adapts and learns when to intervene to encourage physical activity and reduce sedentary behavior.
Despite major progress in Robotics and AI, robots are still basically "zombies" repeatedly achieving actions and tasks without understanding what they are doing. Deep-Learning AI programs classify tremendous amounts of data without grasping the meaning of their inputs or outputs. We still lack a genuine theory of the underlying principles and methods that would enable robots to understand their environment, to be cognizant of what they do, to take appropriate and timely initiatives, to learn from their own experience and to show that they know that they have learned and how. The rationale of this paper is that the understanding of its environment by an agent (the agent itself and its effects on the environment included) requires its self-awareness, which actually is itself emerging as a result of this understanding and the distinction that the agent is capable to make between its own mind-body and its environment. The paper develops along five issues: agent perception and interaction with the environment; learning actions; agent interaction with other agents-specifically humans; decision-making; and the cognitive architecture integrating these capacities.
Plants offer a source of bioinspiration for soft robotics. Nevertheless, a gap remains in designing robots based on the fundamental principles of plant intelligence, rooted in a non-centralized, modular architecture and a highly plastic phenotype. We contend that a holistic approach to plant bioinspiration-one that draws more fully on the features of plant intelligence and behavior-evidences the value of an enactivist perspective. This is because enactivism emphasizes not only features of embodiment such as material composition and morphology, but also autonomy as an important aspect of plant intelligence and behavior. The enactivist sense of autonomy concerns the dynamics of self-producing systems (such as plants) that create a distinction between themselves and a domain of interactions that bear on the conditions of viability of the system. This contrasts with the widespread, but diluted notion of autonomy that merely indicates the independent operability of a system for an arbitrary period. Different notions of autonomy are relevant for soft roboticists, for instance, when evaluating limitations on existing growing robots ("growbots") that take bioinspiration from plants, but depend on a fixed source of energy and material provided by an external agent. More generally, plant-inspired robots serve as a case study for an enactivist approach to intelligence, while, correspondingly, enactivism calls attention to the possibility of non-zoological forms of intelligence embodied in a self-organizing, autonomous system.
The artificial intelligence (AI) tools based on large-language models may serve as a demonstration that we are reaching a groundbreaking new paradigm in which machines themselves will generate knowledge autonomously. This statement is based on the assumption that the ability to master natural languages is the ultimate frontier for this new paradigm and perhaps an essential step to achieving the so-called general artificial intelligence. Autonomous knowledge generation implies that a machine will be able, for instance, to retrieve and understand the contents of the scientific literature and provide interpretations for existing data, allowing it to propose and address new scientific problems. While one may assume that the continued development of AI tools exploiting large-language models, with more data used for training, may lead these systems to learn autonomously, this learning can be accelerated by devising human-assisted strategies to deal with specific tasks. For example, strategies may be implemented for AI tools to emulate the analysis of multivariate data by human experts or in identifying and explaining patterns in temporal series. In addition to generic AI tools, such as Chat AIs, one may conceive personal AI agents, potentially working together, that are likely to serve end users in the near future. In this perspective paper, we discuss the development of this type of agent, focusing on its architecture and requirements. As a proof-of-concept, we exemplify how such an AI agent could work to assist researchers in materials sciences.
Understanding novelty and improvisation in music requires gathering insight from a variety of disciplines. One fruitful path for synthesizing these insights is via modeling. As such, my aim in this paper is to start building a bridge between traditional cognitive models and contemporary embodied and ecological approaches to cognitive science. To achieve this task, I offer a perspective on a model that would combine elements of ecological psychology (especially affordances) and the Learning Intelligent Decision Agent (LIDA) cognitive architecture. Jeff Pressing's cognitive model of musical improvisation will also be a central link between these elements. While some overlap between these three areas already exists, there are several points of tension between them, notably concerning the nature of perception and the function of artificial general intelligence modeling. I thus aim to alleviate the most worrisome concerns here, introduce several future research questions, and conclude with several points on how my account is part of a general theory, rather than merely a redescription of existent work.
Eukaryotic cells learn and adapt via unknown network architectures. Recent work demonstrated a circuit of two GTPases used by cells to overcome growth factor scarcity, encouraging our view that artificial and biological intelligence share strikingly similar design principles and that cells function as deep reinforcement learning (RL) agents in uncertain environments.
Multicast communication technology is widely applied in wireless environments with a high device density. Traditional wireless network architectures have difficulty flexibly obtaining and maintaining global network state information and cannot quickly respond to network state changes, thus affecting the throughput, delay, and other QoS requirements of existing multicasting solutions. Therefore, this paper proposes a new multicast routing method based on multiagent deep reinforcement learning (MADRL-MR) in a software-defined wireless networking (SDWN) environment. First, SDWN technology is adopted to flexibly configure the network and obtain network state information in the form of traffic matrices representing global network link information, such as link bandwidth, delay, and packet loss rate. Second, the multicast routing problem is divided into multiple subproblems, which are solved through multiagent cooperation. To enable each agent to accurately understand the current network state and the status of multicast tree construction, the state space of each agent is designed based on the traffic and multicast tree status matrices, and the set of AP nodes in the network is used as the action space. A novel single-hop action strategy is designed, along with a reward function based on the four states that may occur during tree construction: progress, invalid, loop, and termination. Finally, a decentralized training approach is combined with transfer learning to enable each agent to quickly adapt to dynamic changes in network link information and accelerate convergence. Simulation experiments show that MADRL-MR outperforms existing algorithms in terms of throughput, delay, packet loss rate, etc., and can establish more intelligent multicast routes. Code and model are available at https://github.com/GuetYe/MADRL-MR_code.
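The four tree-construction states named in the abstract (progress, invalid, loop, termination) lend themselves to a simple reward function. The sketch below is a hypothetical illustration of that scheme; the numeric reward values and the `link_delay` shaping term are assumptions, not taken from the paper.

```python
# Hypothetical sketch of a reward function keyed to the four multicast-tree
# construction states. The numeric values are illustrative assumptions.

def tree_construction_reward(state: str, link_delay: float = 0.0) -> float:
    """Return a reward for one single-hop action during multicast tree building.

    state: one of "progress", "invalid", "loop", "termination".
    link_delay: optional per-link cost used to shape the "progress" reward.
    """
    if state == "progress":      # a new valid link was added to the tree
        return 1.0 - 0.1 * link_delay
    if state == "invalid":       # the chosen AP node cannot extend the tree
        return -1.0
    if state == "loop":          # the action would create a cycle
        return -2.0
    if state == "termination":   # all multicast destinations are covered
        return 10.0
    raise ValueError(f"unknown construction state: {state}")
```

Penalizing loops more heavily than invalid actions discourages the agent from revisiting nodes already in the tree, while the termination bonus dominates so that completing coverage remains the primary objective.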
People who suffer from any kind of motor difficulty face serious complications in moving autonomously in their daily lives. However, a growing number of research projects proposing different powered-wheelchair control systems are arising. Despite the interest of the research community in the area, there is no platform that allows easy integration of various control methods that make use of heterogeneous sensors and computationally demanding algorithms. In this work, an architecture based on virtual organizations of agents is proposed that makes use of a flexible and scalable communication protocol allowing the deployment of embedded agents on computationally limited devices. In order to validate the proper functioning of the proposed system, it has been integrated into a conventional wheelchair, and a set of alternative control interfaces has been developed and deployed, including a portable electroencephalography system, a voice interface, and a specifically designed smartphone application. A set of tests was conducted to evaluate both the platform's adequacy and the accuracy and ease of use of the proposed control systems, yielding positive results that can inform further wheelchair interface design and implementation.
In line with Allen Newell's challenge to develop complete cognitive architectures, and motivated by a recent proposal for a unifying subsymbolic computational theory of cognition, we introduce the cognitive control architecture SEMLINCS. SEMLINCS models the development of an embodied cognitive agent that learns discrete production rule-like structures from its own, autonomously gathered, continuous sensorimotor experiences. Moreover, the agent uses the developing knowledge to plan and control environmental interactions in a versatile, goal-directed, and self-motivated manner. Thus, in contrast to several well-known symbolic cognitive architectures, SEMLINCS is not provided with production rules and the involved symbols, but learns them. In this paper, the implementation of SEMLINCS yields learning and self-motivated, autonomous behavioral control of the game figure Mario in a clone of the computer game Super Mario Bros. Our evaluations highlight the successful development of behavioral versatility as well as the learning of suitable production rules and the involved symbols from sensorimotor experiences. Moreover, knowledge- and motivation-dependent individualizations of the agents' behavioral tendencies are shown. Finally, interaction sequences can be planned on the sensorimotor-grounded production rule level. Current limitations directly point toward several further enhancements, which may be integrated into SEMLINCS in the near future. Overall, SEMLINCS may be viewed as an architecture that allows the functional and computational modeling of embodied cognitive development, whereby the current main focus lies on the development of production rules from sensorimotor experiences.
The Internet of Things (IoT) allows the sharing of information among devices in a network. Hardware evolution has enabled the deployment of cognitive agents on top of such devices, which could help build proactive and autonomous IoT systems. Agents are autonomous entities from Artificial Intelligence capable of sensing (perceiving) the environment in which they are situated. Then, with these captured perceptions, they can reason and act proactively. However, some agent approaches are created for a specific domain or application when dealing with embedded systems and hardware interfacing. In addition, the agent architecture can compromise the system's performance because of the number of perceptions that agents can access. This paper presents three engineering approaches for creating IoT Objects using Embedded Multi-Agent Systems (MAS), functioning as cognitive systems at the edge of an IoT network, that connect, act, and share information with a re-engineered IoT architecture based on the Sensor-as-a-Service model. These engineering approaches use Belief-Desire-Intention (BDI) agents and the JaCaMo framework, and are expected to broaden designers' choices in applying embedded MAS in IoT systems. We also present a case study to validate the whole re-engineered architecture and the approaches, along with performance tests and comparisons. The case study shows that each approach is more or less suitable depending on the domain tackled. The performance tests show that the re-engineered IoT architecture is scalable and that there are trade-offs in adopting one approach or another. The contributions of this paper are an architecture for sharing resources in an IoT network, the use of embedded MAS on top of IoT Objects, and three engineering approaches considering agent and artifact dimensions.
In recent years, we have experienced rapid development of advanced technology, machine learning, and artificial intelligence (AI), intended to interact with and augment the abilities of humans in practically every area of life. With the rapid growth of new capabilities, such as those enabled by generative AI (e.g., ChatGPT), AI is increasingly at the center of human communication and collaboration, resulting in a growing recognition of the need to understand how humans and AI can integrate their inputs in collaborative teams. However, there are many unanswered questions regarding how human-AI collective intelligence will emerge and what the barriers might be. Truly integrated collaboration between humans and intelligent agents may result in a different way of working that looks nothing like what we know now, and it is important to keep the essential goal of human societal well-being and prosperity a priority. In this special issue, we begin to scope out the underpinnings of a socio-cognitive architecture for Collective HUman-MAchine INtelligence (COHUMAIN), which is the study of the capability of an integrated human and machine (i.e., intelligent technology) system to achieve goals in a wide range of environments. This topic consists of nine papers including a description of the conceptual foundation for a socio-cognitive architecture for COHUMAIN, empirical tests of some aspects of this architecture, research on proposed representations of intelligent agents that can jointly interact with humans, empirical tests of human-human and human-machine interactions, and philosophical and ethical issues to consider as we develop these systems.
What is the place of emotion in intelligent robots? Researchers have advocated the inclusion of some emotion-related components in the information-processing architecture of autonomous agents. It is argued here that emotion needs to be merged with all aspects of the architecture: cognitive-emotional integration should be a key design principle.
Patient empowerment is a growing focus in medical informatics, and artificial intelligence (AI) offers new opportunities for personalized, patient-centered support. This vision paper introduces a theoretical framework for an advanced medical conversational agent. We propose a conceptual model called Holistic Empathic AI (HEAI), implemented as a chatbot (PAI-bot). The system architecture integrates cognitive principles and includes modules for natural language dialogue, user state detection, and adaptive response generation. A key element is a multidimensional well-being score combining physical, emotional, cognitive, and social factors. The model structure and data flow support dynamic adjustment of interaction based on real-time analysis. Although not yet implemented clinically, this work establishes the conceptual foundation for future development of empathic AI in healthcare. The HEAI model aims to enhance the patient-AI dialogue and enable more human-like support.
The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, with an architecture that segregates computations of state representation, value, and action into specialized modules, achieved better learning and generalization. Its learned state representation combines prediction and observation, weighted by their relative uncertainty, akin to recursive Bayesian estimation. This agent's behavior also resembles macaques' behavior more closely. Our results shed light on the possible rationale for the brain's modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
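The abstract above describes a learned state representation that combines prediction and observation weighted by their relative uncertainty, "akin to recursive Bayesian estimation." That mechanism can be illustrated with a one-dimensional precision-weighted fusion step (essentially a scalar Kalman update); this is a generic sketch of the principle, not the paper's learned representation.

```python
def fuse(pred_mean, pred_var, obs_mean, obs_var):
    """Combine a prediction and an observation, each modeled as Gaussian,
    weighting by relative uncertainty (inverse variance), as in recursive
    Bayesian estimation. The lower-variance input dominates the fused state."""
    gain = pred_var / (pred_var + obs_var)   # weight given to the observation
    mean = pred_mean + gain * (obs_mean - pred_mean)
    var = (1.0 - gain) * pred_var            # fused variance always shrinks
    return mean, var
```

With equal variances the fused estimate sits halfway between prediction and observation; as observation noise grows, the agent leans increasingly on its internal prediction, which is the qualitative behavior the abstract attributes to the modular agent.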
In building artificial intelligence (AI) agents, referring to how brains function in real environments can accelerate development by reducing the design space. In this study, we propose a probabilistic generative model (PGM) for navigation in uncertain environments by integrating the neuroscientific knowledge of hippocampal formation (HF) and the engineering knowledge in robotics and AI, namely, simultaneous localization and mapping (SLAM). We follow the approach of brain reference architecture (BRA) (Yamakawa, 2021) to compose the PGM and outline how to verify the model. To this end, we survey and discuss the relationship between the HF findings and SLAM models. The proposed hippocampal formation-inspired probabilistic generative model (HF-PGM) is designed to be highly consistent with the anatomical structure and functions of the HF. By referencing the brain, we elaborate on the importance of integration of egocentric/allocentric information from the entorhinal cortex to the hippocampus and the use of discrete-event queues.
AntAlate is a software framework for Unmanned Aerial Vehicle (UAV) autonomy, designed to streamline and facilitate the work of application developers, particularly in deployment of Multi-Agent Robotic Systems (MARS). We created AntAlate in order to bring our research in the field of multi-agent systems from theoretical results to both advanced simulations and to real-life demonstrations. Creating a framework capable of catering to MARS applications requires support for distributed, decentralized, control using local sensing, performed autonomously by groups of identical anonymous agents. Though mainly interested in the emergent behavior of the system as a whole, we focused on the single agent and created a framework suitable for a system of systems approach, while minimizing the hardware requirements of the single agent. Global observers or even a centralized control can be added on top of AntAlate, but the framework does not require a global actor to finalize an application. The same applies to a human in the loop, and fully autonomous UAV applications can be written in as straightforward a way as can semi-autonomous applications. In this paper we describe the AntAlate framework and demonstrate its utility and versatility.
Intelligent traffic management systems have become one of the main applications of Intelligent Transportation Systems (ITS). There is a growing interest in Reinforcement Learning (RL) based control methods in ITS applications such as autonomous driving and traffic management solutions. Deep learning helps in approximating substantially complex nonlinear functions from complicated data sets and tackling complex control issues. In this paper, we propose an approach based on Multi-Agent Reinforcement Learning (MARL) and smart routing to improve the flow of autonomous vehicles on road networks. We evaluate Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), recently suggested Multi-Agent Reinforcement Learning techniques, with smart routing for traffic signal optimization to determine their potential. We investigate the framework offered by non-Markov decision processes, enabling a more in-depth understanding of the algorithms. We conduct a critical analysis to observe the robustness and effectiveness of the method. The method's efficacy and reliability are demonstrated by simulations using SUMO, a software modeling tool for traffic simulations, on a road network containing seven intersections. Our findings show that MA2C, when trained on pseudo-random vehicle flows, is a viable methodology that outperforms competing techniques.
The entropy-oriented approach called security- or cybersecurity-informed safety (SIS or CSIS, respectively) is discussed and developed in order to analyse and evaluate the safety and dependability of autonomous transport systems (ATSs) such as unmanned aerial vehicles (UAVs), unmanned maritime vehicles (UMVs), and satellites. This approach allows for extending and integrating the known techniques FMECA (Failure Modes, Effects, and Criticality Analysis) and IMECA (Intrusion MECA), as well as developing the new SISMECA (SIS-based Intrusion Modes, Effects, and Criticality Analysis) technique. The ontology model and templates for SISMECA implementation are suggested. The methodology of safety assessment is based on (i) the application and enhancement of SISMECA considering the particularities of various ATSs and roles of actors (regulators, developers, operators, customers); (ii) the development of a set of scenarios describing the operation of ATS in conditions of cyberattacks and physical influences; (iii) AI contribution to system protection for the analysed domains; (iv) scenario-based development and analysis of user stories related to different cyber-attacks, as well as ways to protect ATSs from them via AI means/platforms; (v) profiling of AI platform requirements by use of characteristics based on AI quality model, risk-based assessment of cyberattack criticality, and efficiency of countermeasures which actors can implement. Examples of the application of SISMECA assessment are presented and discussed.
To demonstrate an architecture that automates the prehospital emergency process, categorizing specialized care according to the situation at the right time to reduce patient mortality and morbidity. Prehospital emergency processes were analyzed using existing prehospital management systems and frameworks, and the extracted processes were modeled with sequence diagrams in Rational Rose. The system's main agents were identified and modeled via a component diagram, considering the main system actors and logically dividing business functionalities; finally, a conceptual architecture for prehospital emergency management was proposed. The proposed architecture was simulated using AnyLogic simulation software; the AnyLogic Agent Model, State Chart, and Process Model were used to model the system. Multi-agent systems (MAS) have had great success in distributed, complex, and dynamic problem-solving environments, and utilizing autonomous agents provides intelligent decision-making capabilities. The proposed architecture covers prehospital management operations. The main identified agents are: EMS Center, Ambulance, Traffic Station, Healthcare Provider, Patient, Consultation Center, National Medical Record System, and a quality-of-service monitoring agent. In a critical setting like a prehospital emergency, we cope with sophisticated processes such as ambulance navigation, healthcare provider and service assignment, consultation, recalling a patient's past medical history through a centralized EHR system, and monitoring healthcare quality in real time. The main advantage of our work has been the use of a multi-agent system. Future work will include implementing the proposed architecture and evaluating its impact on improving patient quality of care.
This paper presents a novel algorithm to address resource allocation and network-slicing challenges in multiaccess edge computing (MEC) networks. Network slicing divides a physical network into virtual slices, each tailored to efficiently allocate resources and meet diverse service requirements. To maximize the completion rate of user-computing tasks within these slices, the problem is decomposed into two subproblems: efficient core-to-edge slicing (ECS) and autonomous resource slicing (ARS). ECS facilitates collaborative resource distribution through cooperation among edge servers, while ARS dynamically manages resources based on real-time network conditions. The proposed solution, a multiagent actor-critic resource scheduling (MAARS) algorithm, employs a reinforcement learning framework. Specifically, MAARS utilizes a multiagent deep deterministic policy gradient (MADDPG) for efficient resource distribution in ECS and a soft actor-critic (SAC) technique for robust real-time resource management in ARS. Simulation results demonstrate that MAARS outperforms benchmark algorithms, including heuristic-based, DQN-based, and A2C-based methods, in terms of task completion rates, resource utilization, and convergence speed. Thus, this study offers a scalable and efficient framework for resource optimization and network slicing in MEC networks, providing practical benefits for real-world deployments and setting a new performance benchmark in dynamic environments.
This article proposes the use of a disembodied autonomous actor for navigation support within complex virtual medical objects reconstructed from Computed Tomography or Magnetic Resonance Imaging. Such objects are often maze-like, and users risk getting lost within them during Virtual Reality sessions. Therefore, users need paths for guided fly-throughs when performing non-invasive diagnostic tasks. We present a synthetic vision-based actor capable of finding collision-free paths from a given position to a goal point in environments containing loops and impasses. When navigating, the actor voxelizes the virtual environment and searches for collision-free paths in voxel space using a backtracking search algorithm. Automata and rules control its search behaviour. The resulting paths can be used in dedicated virtual endoscopy applications. Our path search method has been tested within a variety of tubular virtual anatomical structures in 3D, such as aortas, colons, and blood vessels of the brain. The actor finds paths within reasonable time limits, even when considering complex anatomical surface models. The method may be used as a valuable tool for assisting virtual endoscopic diagnostic and screening activities in the near future.
Autonomous agents perform on behalf of the user to achieve defined goals or objectives. They are situated in dynamic environments and are able to operate autonomously to achieve their goals. In a multiagent system, agents cooperate with each other to achieve a common goal. Testing of multiagent systems is a challenging task due to the autonomous and proactive behavior of agents. However, testing is required to build confidence in the working of a multiagent system. The Prometheus methodology is a commonly used approach to design multiagent systems, and systematic, thorough testing of each interaction is necessary. This paper proposes a novel approach to testing multiagent systems based on Prometheus design artifacts. In the proposed approach, different interactions between the agents and actors are considered to test the multiagent system. These interactions include percepts and actions, along with messages between the agents, which can be modeled in a protocol diagram. The protocol diagram is converted into a protocol graph, on which different coverage criteria are applied to generate test paths that cover interactions between the agents. A prototype tool has been developed to generate test paths from the protocol graph according to the specified coverage criterion.
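The pipeline above (protocol graph plus a coverage criterion yields test paths) can be sketched with a greedy edge-coverage walk. This is an illustrative stand-in, under the assumption of simple all-edges coverage, not the tool's actual algorithm.

```python
from collections import defaultdict

def edge_coverage_paths(edges, start):
    """Generate test paths over a protocol graph so that every edge
    (agent interaction) is covered at least once. Greedy walk: each path
    begins at `start` and extends along uncovered edges while possible."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    uncovered = set(edges)
    paths = []
    while uncovered:
        node, path = start, [start]
        while True:
            nxt = next((v for v in adj[node] if (node, v) in uncovered), None)
            if nxt is None:
                break
            uncovered.discard((node, nxt))
            path.append(nxt)
            node = nxt
        if len(path) == 1:
            # no uncovered edge reachable greedily from start:
            # seed a new path directly at an uncovered edge
            u, v = next(iter(uncovered))
            uncovered.discard((u, v))
            path = [u, v]
        paths.append(path)
    return paths
```

For the tiny protocol graph `A→B, B→C, B→D`, the walk emits `[A, B, C]` and then a second path seeded at `B→D`, so every interaction edge appears in at least one test path.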
Multi-agent path planning for Unmanned Aerial Vehicles (UAVs) in agricultural data collection tasks presents a significant challenge, requiring sophisticated coordination to ensure efficiency and avoid conflicts. Existing multi-agent reinforcement learning (MARL) algorithms often struggle with high-dimensional state spaces, continuous action domains, and complex inter-agent dependencies. To address these issues, we propose a novel algorithm, Multi-Agent Transformer-based Soft Actor-Critic (MATRS). Operating on the Centralized Training with Decentralized Execution (CTDE) paradigm, MATRS enables safe and efficient collaborative data collection and trajectory optimization. By integrating a Transformer encoder into its centralized critic network, our approach leverages the self-attention mechanism to explicitly model the intricate relationships between agents, thereby enabling a more accurate evaluation of the joint action-value function. Through comprehensive simulation experiments, we evaluated the performance of MATRS against established baseline algorithms (MADDPG, MATD3, and MASAC) in scenarios with varying data loads and problem scales. The results demonstrate that MATRS consistently achieves faster convergence and shorter task completion times. Furthermore, in scalability experiments, MATRS learned an efficient "task-space partitioning" strategy, where the UAV swarm autonomously divides the operational area for conflict-free coverage. These findings indicate that combining attention-based architectures with Soft Actor-Critic learning offers a potent and scalable solution for high-performance multi-UAV coordination in IoT data collection tasks.
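The mechanism the MATRS abstract credits for modeling inter-agent dependencies, self-attention inside the centralized critic, reduces to scaled dot-product attention over per-agent feature vectors. The sketch below uses identity Q/K/V projections to stay minimal; a real critic would learn projection matrices and feed the attended features into a joint action-value head.

```python
import numpy as np

def self_attention(agent_feats):
    """Single-head scaled dot-product self-attention over per-agent features.

    agent_feats: array of shape (n_agents, d). Returns attended features of
    the same shape, where each agent's row is a softmax-weighted mixture of
    all agents' features (pairwise affinities from dot products).
    Projections are identity here purely for illustration."""
    q = k = v = agent_feats
    d = agent_feats.shape[1]
    scores = q @ k.T / np.sqrt(d)              # pairwise agent affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)          # softmax over agents
    return w @ v                               # each agent attends to all
```

Each UAV's representation thus becomes a weighted summary of the whole swarm, which is what lets a centralized critic evaluate the joint action-value function more accurately than one that treats agents independently.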
Neural network simulation is an important tool for generating and evaluating hypotheses on the structure, dynamics, and function of neural circuits. For scientific questions addressing organisms operating autonomously in their environments, in particular where learning is involved, it is crucial to be able to operate such simulations in a closed-loop fashion. In such a set-up, the neural agent continuously receives sensory stimuli from the environment and provides motor signals that manipulate the environment or move the agent within it. So far, most studies requiring such functionality have been conducted with custom simulation scripts and manually implemented tasks. This makes it difficult for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. The resulting toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments with various levels of complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym. We compare its performance to a previously suggested neural network model of reinforcement learning in the basal ganglia and a generic Q-learning algorithm.
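The closed-loop setup described above, where the agent continuously receives sensory stimuli and emits motor signals, follows the standard reset/step environment contract popularized by the OpenAI Gym. The toy sketch below illustrates that loop with a hypothetical one-dimensional environment and a hand-written controller; it stands in for, and does not use, the actual NEST/Gym toolchain.

```python
class ToyEnv:
    """Minimal stand-in for a Gym-style environment: the agent must drive an
    internal scalar state to zero. Illustrates the closed sensorimotor loop."""
    def reset(self):
        self.state = 5
        return self.state

    def step(self, action):            # action in {-1, +1}
        self.state += action
        reward = -abs(self.state)      # denser reward closer to the goal
        done = self.state == 0
        return self.state, reward, done

def run_episode(policy, max_steps=100):
    """Run one closed-loop episode: stimulus in, motor signal out, repeat."""
    env = ToyEnv()
    obs = env.reset()
    total = 0.0
    done = False
    for _ in range(max_steps):
        obs, r, done = env.step(policy(obs))
        total += r
        if done:
            break
    return total, done

# A trivial "controller": always move toward zero.
total, solved = run_episode(lambda s: -1 if s > 0 else 1)
```

In the actual toolchain, the policy call would be replaced by a spiking actor-critic network simulated in NEST, with the environment side supplied by a standardized benchmark such as the OpenAI Gym.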
DSAC-ICM: A Distributional Reinforcement Learning Framework for Path Planning in 3D Uneven Terrains.
Ground autonomous mobile robots are increasingly critical for reconnaissance, patrol, and resupply tasks in public safety and national defense scenarios, where global path planning in 3D uneven terrains remains a major challenge. Traditional planners struggle with high dimensionality, while Deep Reinforcement Learning (DRL) is hindered by two key issues: (1) systematic overestimation of action values (Q-values) due to function approximation error, which leads to suboptimal policies and training instability; and (2) inefficient exploration under sparse reward signals. To address these limitations, we propose DSAC-ICM: a Distributional Soft Actor-Critic framework integrated with an Intrinsic Curiosity Module (ICM). Our method fundamentally shifts the learning paradigm from estimating scalar Q-values to learning the full probability distribution of state-action returns, which inherently mitigates value overestimation. We further integrate the ICM to generate dense intrinsic rewards, guiding the agent toward novel and unvisited states to tackle the exploration challenge. Comprehensive experiments conducted in a suite of realistic 3D uneven-terrain environments demonstrate that DSAC-ICM successfully enables the agent to learn effective navigation capabilities. Crucially, it achieves a superior trade-off between path quality and computational cost when compared to traditional path planning algorithms. Furthermore, DSAC-ICM significantly outperforms other RL baselines in terms of convergence speed and return.
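The ICM component described above pays the agent its own prediction error as an intrinsic reward, so poorly modeled (novel) transitions are worth visiting. The sketch below uses a crude linear forward model to show the mechanism; the model class, learning rate, and reward scaling are illustrative assumptions, not the paper's network.

```python
import numpy as np

class ForwardModel:
    """Tiny linear forward model standing in for the ICM's learned dynamics:
    it predicts the next state feature from (state, action). The intrinsic
    reward is its prediction error, so novel transitions pay more; repeated
    transitions become less rewarding as the model learns them."""
    def __init__(self, dim):
        self.W = np.zeros((dim, dim + 1))
        self.lr = 0.1

    def predict(self, s, a):
        x = np.append(s, a)
        return self.W @ x

    def intrinsic_reward(self, s, a, s_next):
        err = s_next - self.predict(s, a)
        reward = 0.5 * float(err @ err)   # curiosity = squared prediction error
        # Online update: the same transition is less "novel" next time.
        x = np.append(s, a)
        self.W += self.lr * np.outer(err, x)
        return reward
```

In DSAC-ICM this dense intrinsic signal is added to the sparse task reward, guiding exploration toward unvisited regions of the 3D terrain without altering the underlying distributional critic.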
Virtual reality simulations are shown to be an effective approach for interprofessional nurse-physician communication training. However, their scalability is constrained by unequal medical-nursing cohort sizes, making it a great challenge for all nursing students to form an interprofessional team with medical students. With the evolution of artificial intelligence (AI), an AI medical team player can be integrated into virtual reality simulations so that more nursing students can engage in interprofessional team training. To describe the development of a novel AI-enabled virtual reality simulation (AI-enabled VRS) and to evaluate nursing students' competencies and experiences in communicating with an AI medical doctor. A mixed-methods design using a one-group pretest-posttest design and focus group discussions was employed in the evaluation phase. Nursing students from a university were recruited to undertake the 2-hour AI-enabled VRS. Pretests and posttests were administered to evaluate the participants' communication knowledge and self-efficacy. Survey questionnaires were administered to examine their experiences with the virtual reality environment and the AI doctor. Five focus group discussions were conducted to gain deeper insight into their learning experiences. The participants demonstrated significant improvements in communication knowledge and interprofessional communication self-efficacy after the learning. They reported positively on the acceptability, feasibility, and usability of the AI-enabled VRS. The "human-like" subscale for the AI medical doctor was rated the lowest. Three themes surrounding participants' experiences of the virtual learning emerged: "relate to the real world", "artificial intelligence versus human intelligence", and "complement with face-to-face learning". This study demonstrates initial evidence on the potential of AI-enabled VRS in fostering nursing students' learning of interprofessional communication skills.
The findings have also provided insights on how to improve the AI-enabled VRS, in particular the expressiveness of the AI pedagogical agent and facilitating more dialogue training through learner-agent conversations.
We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are poised to be proficient in various tasks, planning discovery workflows and performing self-assessment to identify and mitigate gaps in their knowledge. These agents use large language models and generative models to feature structured memory for continual learning and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.
Conversational artificial intelligence (CAI) is emerging as a promising digital technology for mental health care. CAI apps, such as psychotherapeutic chatbots, are available in app stores, but their use raises ethical concerns. We aimed to provide a comprehensive overview of ethical considerations surrounding CAI as a therapist for individuals with mental health issues. We conducted a systematic search across PubMed, Embase, APA PsycINFO, Web of Science, Scopus, the Philosopher's Index, and ACM Digital Library databases. Our search comprised 3 elements: embodied artificial intelligence, ethics, and mental health. We defined CAI as a conversational agent that interacts with a person and uses artificial intelligence to formulate output. We included articles discussing the ethical challenges of CAI functioning in the role of a therapist for individuals with mental health issues. We added additional articles through snowball searching. We included articles in English or Dutch. All types of articles were considered except abstracts of symposia. Screening for eligibility was done by 2 independent researchers (MRM and TS or AvB). An initial charting form was created based on the expected considerations and revised and complemented during the charting process. The ethical challenges were divided into themes. When a concern occurred in more than 2 articles, we identified it as a distinct theme. We included 101 articles, of which 95% (n=96) were published in 2018 or later. Most were reviews (n=22, 21.8%) followed by commentaries (n=17, 16.8%). 
The following 10 themes were distinguished: (1) safety and harm (discussed in 52/101, 51.5% of articles); the most common topics within this theme were suicidality and crisis management, harmful or wrong suggestions, and the risk of dependency on CAI; (2) explicability, transparency, and trust (n=26, 25.7%), including topics such as the effects of "black box" algorithms on trust; (3) responsibility and accountability (n=31, 30.7%); (4) empathy and humanness (n=29, 28.7%); (5) justice (n=41, 40.6%), including themes such as health inequalities due to differences in digital literacy; (6) anthropomorphization and deception (n=24, 23.8%); (7) autonomy (n=12, 11.9%); (8) effectiveness (n=38, 37.6%); (9) privacy and confidentiality (n=62, 61.4%); and (10) concerns for health care workers' jobs (n=16, 15.8%). Other themes were discussed in 9.9% (n=10) of the identified articles. Our scoping review has comprehensively covered ethical aspects of CAI in mental health care. While certain themes remain underexplored and stakeholders' perspectives are insufficiently represented, this study highlights critical areas for further research. These include evaluating the risks and benefits of CAI in comparison to human therapists, determining its appropriate roles in therapeutic contexts and its impact on care access, and addressing accountability. Addressing these gaps can inform normative analysis and guide the development of ethical guidelines for responsible CAI use in mental health care.
The problem of generating generally capable agents is an important frontier in artificial intelligence (AI) research. Such agents may demonstrate open-ended, versatile, and diverse modes of expression, similar to humans. We interpret the work of Heintz & Scott-Phillips as a minimal sufficient set of socio-cognitive biases for the emergence of generally expressive AI, separate yet complementary to existing algorithms.
To develop an artificial intelligence (AI) agent for fully automated rapid head-and-neck intensity-modulated radiation therapy (IMRT) plan generation without time-consuming dose-volume-based inverse planning. This AI agent was trained by implementing a conditional generative adversarial network (cGAN) architecture. The generator, PyraNet, is a novel deep learning network that implements 28 classic ResNet blocks in pyramid-like concatenations. The discriminator is a customized four-layer DenseNet. The AI agent first generates multiple customized two-dimensional projections at nine template beam angles from a patient's three-dimensional computed tomography (CT) volume and structures. These projections are then stacked as four-dimensional inputs of PyraNet, from which nine radiation fluence maps of the corresponding template beam angles are generated simultaneously. Finally, the predicted fluence maps are automatically postprocessed by Gaussian deconvolution operations and imported into a commercial treatment planning system (TPS) for plan integrity check and visualization. The AI agent was built and tested upon 231 oropharyngeal IMRT plans from a TPS plan library. 200/16/15 plans were assigned for training/validation/testing, respectively. Only the primary plans in the sequential boost regime were studied. All plans were normalized to 44 Gy prescription (2 Gy/fx). A customized Haar wavelet loss was adopted for fluence map comparison during the training of the PyraNet. For test cases, isodose distributions in AI plans and TPS plans were qualitatively evaluated for overall dose distributions. Key dosimetric metrics were compared by Wilcoxon signed-rank tests with a significance level of 0.05. All 15 AI plans were successfully generated. Isodose gradients outside of PTV in AI plans were comparable to those of the TPS plans.
After PTV coverage normalization, key dosimetric metrics were evaluated against the TPS plans. With rapid and fully automated execution, the developed AI agent can generate complex head-and-neck IMRT plans with acceptable dosimetry quality. This approach holds great potential for clinical applications in preplanning decision-making and real-time planning.
Research in embodied artificial intelligence (AI) has increasing clinical relevance for therapeutic applications in mental health services. With innovations ranging from 'virtual psychotherapists' to social robots in dementia care and autism spectrum disorder, to robots for sexual disorders, artificially intelligent virtual and robotic agents are increasingly taking on high-level therapeutic interventions that used to be offered exclusively by highly trained, skilled health professionals. In order to enable responsible clinical implementation, ethical and social implications of the increasing use of embodied AI in mental health need to be identified and addressed. This paper assesses the ethical and social implications of translating embodied AI applications into mental health care across the fields of Psychiatry, Psychology and Psychotherapy. Building on this analysis, it develops a set of preliminary recommendations on how to address ethical and social challenges in current and future applications of embodied AI. Based on a thematic literature search and established principles of medical ethics, an analysis of the ethical and social aspects of current embodied AI applications was conducted across the fields of Psychiatry, Psychology, and Psychotherapy. To enable a comprehensive evaluation, the analysis was structured around the following three steps: assessment of potential benefits; analysis of overarching ethical issues and concerns; discussion of specific ethical and social issues of the interventions. From an ethical perspective, important benefits of embodied AI applications in mental health include new modes of treatment, opportunities to engage hard-to-reach populations, better patient response, and freeing up time for physicians.
Overarching ethical issues and concerns include: harm prevention and various questions of data ethics; a lack of guidance on the development of AI applications, their clinical integration, and the training of health professionals; 'gaps' in ethical and regulatory frameworks; and the potential for misuse, including using the technologies to replace established services, thereby potentially exacerbating existing health inequalities. Specific challenges identified and discussed in the application of embodied AI include: matters of risk-assessment, referrals, and supervision; the need to respect and protect patient autonomy; the role of non-human therapy; transparency in the use of algorithms; and specific concerns regarding long-term effects of these applications on understandings of illness and the human condition. We argue that embodied AI is a promising approach across the field of mental health; however, further research is needed to address the broader ethical and societal concerns of these technologies to negotiate best research and medical practices in innovative mental health care. We conclude by indicating areas of future research and developing recommendations for high-priority areas in need of concrete ethical guidance.
Recent advances in theoretical biology suggest that key definitions of basal cognition and sentient behavior may arise as emergent properties of in vitro cell cultures and neuronal networks. Such neuronal networks reorganize activity to demonstrate structured behaviors when embodied in structured information landscapes. In this article, we characterize this kind of self-organization through the lens of the free energy principle, that is, as self-evidencing. We do this by first discussing the definitions of reactive and sentient behavior in the setting of active inference, which describes the behavior of agents that model the consequences of their actions. We then introduce a formal account of intentional behavior that describes agents as driven by a preferred end point or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behavior using simulations. First, we simulate the in vitro experiments, in which neuronal cultures modulated activity to improve gameplay in a simplified version of Pong by implementing nested, free energy minimizing processes. The simulations are then used to deconstruct the ensuing predictive behavior, leading to the distinction between merely reactive, sentient, and intentional behavior with the latter formalized in terms of inductive inference. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem) that show how quickly and efficiently adaptive behavior emerges under an inductive form of active inference.
How situated embodied agents may achieve goals using knowledge is the classical question of natural and artificial intelligence. How organisms achieve this with their nervous systems is a central challenge for a neural theory of embodied cognition. To structure this challenge, we borrow terms from Searle's analysis of intentionality in its two directions of fit and six psychological modes (perception, memory, belief, intention-in-action, prior intention, desire). We postulate that intentional states are instantiated by neural activation patterns that are stabilized by neural interaction. Dynamic instabilities provide the neural mechanism for initiating and terminating intentional states and are critical to organizing sequences of intentional states. Beliefs represented by networks of concept nodes are autonomously learned and activated in response to desired outcomes. The neural dynamic principles of an intentional agent are demonstrated in a toy scenario in which a robotic agent explores an environment and paints objects in desired colors based on learned color transformation rules.
Parallel magnetic resonance imaging (pMRI) reconstruction requires a tedious parameter-tuning process to achieve optimal image quality. Although data-driven artificial intelligence (AI) has significantly improved pMRI reconstruction, knowledge-driven AI has been little utilized for optimizing it. Recent large language models (LLMs), which embody vast knowledge bases, are adept at decomposing complex tasks into structured, planned steps in some automation tasks. In this paper, we develop an intelligent agent equipped with LLM-based planning capability for optimizing pMRI reconstruction. Based on existing empirical knowledge of optimal parameter tuning for GRAPPA reconstruction, Planning Domain Definition Language (PDDL) domain and problem files are generated by an LLM. The structured PDDL is then used to guide GRAPPA reconstruction. Experimental results show that LLM-based planning can specify clear parameter-tuning goals from unstructured knowledge descriptions and improve image reconstruction. The proposed method may help users, such as MRI technologists unfamiliar with pMRI reconstruction, to optimize image quality. Future work may eliminate the need for human intervention to achieve fully automatic reconstruction.
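The LLM-to-planner hand-off described above can be sketched as a function that turns desired image-quality goals into a PDDL problem file. This is a minimal illustration under assumed names (the domain `pmri-reconstruction` and the parameters `kernel_size` and `acs_lines` are invented here); the paper's actual domain and problem files are produced by an LLM from unstructured knowledge, not by a template.

```python
# Hypothetical sketch: assembling a PDDL problem file that a classical
# planner could use to guide GRAPPA parameter tuning. All identifiers
# (domain name, parameters, predicates) are illustrative, not the paper's.
def knowledge_to_pddl(goal_metrics):
    """Build a minimal PDDL problem from a list of image-quality goals."""
    goals = "\n    ".join(f"(achieved {m})" for m in goal_metrics)
    return (
        "(define (problem grappa-tuning)\n"
        "  (:domain pmri-reconstruction)\n"
        "  (:objects kernel_size acs_lines - parameter)\n"
        "  (:init (untuned kernel_size) (untuned acs_lines))\n"
        f"  (:goal (and\n    {goals}))\n"
        ")"
    )

pddl = knowledge_to_pddl(["low_noise", "low_aliasing"])
print(pddl)
```

In the full system, an LLM would emit both the domain file (actions such as adjusting a kernel size, with preconditions and effects) and this problem file, and a standard PDDL planner would then order the tuning steps.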
Scene understanding and decomposition is a crucial challenge for intelligent systems, whether it is for object manipulation, navigation, or any other task. Although current machine and deep learning approaches for object detection and classification obtain high accuracy, they typically do not leverage interaction with the world and are limited to a set of objects seen during training. Humans, on the other hand, learn to recognize and classify different objects by actively engaging with them on first encounter. Moreover, recent theories in neuroscience suggest that cortical columns in the neocortex play an important role in this process, by building predictive models about objects in their reference frame. In this article, we present an enactive embodied agent that implements such a generative model for object interaction. For each object category, our system instantiates a deep neural network, called Cortical Column Network (CCN), that represents the object in its own reference frame by learning a generative model that predicts the expected transform in pixel space, given an action. The model parameters are optimized through the active inference paradigm, i.e., the minimization of variational free energy. When provided with a visual observation, an ensemble of CCNs each vote on their belief of observing that specific object category, yielding a potential object classification. In case the likelihood on the selected category is too low, the object is detected as an unknown category, and the agent has the ability to instantiate a novel CCN for this category. We validate our system in a simulated environment, where it needs to learn to discern multiple objects from the YCB dataset. We show that classification accuracy improves as an embodied agent can gather more evidence, and that it is able to learn about novel, previously unseen objects. Finally, we show that an agent driven through active inference can choose its actions to reach a preferred observation.
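The ensemble-voting and unknown-detection step described above can be illustrated with a minimal sketch. This is not the paper's implementation (which uses deep generative models trained by free-energy minimization); the threshold value and category names here are invented for illustration.

```python
# Illustrative sketch of CCN-style ensemble voting: each per-category model
# reports a likelihood for the current observation; the agent picks the
# best-voted category, or declares "unknown" when no likelihood is high
# enough, which would trigger instantiating a new CCN for that category.
UNKNOWN_THRESHOLD = 0.5  # assumed value, for illustration only

def classify(likelihoods, threshold=UNKNOWN_THRESHOLD):
    """likelihoods: dict mapping category -> model likelihood in [0, 1]."""
    best = max(likelihoods, key=likelihoods.get)
    if likelihoods[best] < threshold:
        return "unknown"  # trigger: instantiate a new CCN for this object
    return best

print(classify({"mug": 0.9, "banana": 0.2}))  # -> mug
print(classify({"mug": 0.1, "banana": 0.2}))  # -> unknown
```

In the full system, each "likelihood" would come from how well a CCN's generative model predicts the observed pixel-space transforms under the agent's actions, so classification accuracy improves as more evidence is gathered.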
Embodied Conversational Agents (ECAs) offer a new means to support smokers as virtual coaches and motivate them to quit smoking. In this study we assess the feasibility and acceptability of an ECA to support quitting smoking ("ECA-Q"). ECA-Q, a 14-day program delivered through tablet computers, interacts with participants through supportive messages for quitting smoking and motivates them to set a quit date.
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with knowledge. Unlike existing EQA work, which explicitly specifies the target object in the question, the agent can resort to external knowledge to understand more complicated questions such as "Please tell me what are objects used to cut food in the room?", for which the agent must possess knowledge such as "a knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where joint reasoning over the external knowledge and a 3D scene graph is performed to realize navigation and question answering. In particular, the 3D scene graph provides the memory to store the visual information of visited scenes, which significantly improves the efficiency of multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL): a computational framework which can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL we demonstrate several relations between environmental complexity, morphological intelligence and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect, i.e., in our simulations evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants' lifetime. Third, we suggest a mechanistic basis for the above relationships through the evolution of morphologies that are more physically stable and energy efficient, and can therefore facilitate learning and control.
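The morphological Baldwin effect described above can be illustrated with a toy evolutionary loop, well short of DERL itself: if selection acts on how fast a "morphology" learns, the population drifts toward faster learners. All numbers (population size, mutation scale, noise) are invented for illustration.

```python
import random

# Toy illustration (not DERL): each scalar stands in for a morphology, and
# its "learning speed" is the morphology value plus lifetime noise. Selecting
# parents by learning speed drives the population mean upward over
# generations -- a minimal Baldwin-style effect.
random.seed(1)

def learning_speed(morphology):
    # stand-in for lifetime RL: higher = task learned earlier in life
    return morphology + random.gauss(0, 0.1)

population = [random.uniform(0, 1) for _ in range(20)]
init_mean = sum(population) / len(population)

for generation in range(30):
    ranked = sorted(population, key=learning_speed, reverse=True)
    parents = ranked[:10]  # truncation selection on learning speed
    population = [p + random.gauss(0, 0.05) for p in parents for _ in range(2)]

final_mean = sum(population) / len(population)
print(init_mean, final_mean)
```

In DERL the "morphology" is a full articulated body, learning is reinforcement learning over a lifetime, and fitness is task reward, but the selection-on-learnability structure is the same.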
Flesh encodes a variety of haptic information including deformation, temperature, vibration, and damage stimuli using a multisensory array of mechanoreceptors distributed on the surface of the human body. Currently, soft sensors are capable of detecting some haptic stimuli, but whole-body multimodal perception at scales similar to a human adult (surface area ~17,000 square centimeters) is still a challenge in artificially intelligent agents due to the lack of encoding. This encoding is needed to reduce the wiring required to send the vast amount of information transmitted to the processor. We created a robotic flesh that could be further developed for use in these agents. This engineered flesh is an optical, elastomeric matrix "innervated" with stretchable lightguides that encodes haptic stimuli into light: temperature into wavelength due to thermochromic dyes and forces into intensity due to mechanical deformation. By exploiting the optical properties of the constitutive materials and using machine learning, we infer spatiotemporal, haptic information from light that is read by an image sensor. We demonstrate the capabilities of our system in various assemblies to estimate temperature, contact location, normal and shear force, gestures, and damage from temporal snapshots of light coming from the entire haptic sensor with errors <5%.
The creation of machine learning algorithms for intelligent agents capable of continuous, lifelong learning is a critical objective for algorithms being deployed on real-life systems in dynamic environments. Here we present an algorithm inspired by neuromodulatory mechanisms in the human brain that integrates and expands upon Stephen Grossberg's ground-breaking Adaptive Resonance Theory proposals. Specifically, it builds on the concept of uncertainty, and employs a series of "neuromodulatory" mechanisms to enable continuous learning, including self-supervised and one-shot learning. Algorithm components were evaluated in a series of benchmark experiments that demonstrate stable learning without catastrophic forgetting. We also demonstrate the critical role of developing these systems in a closed-loop manner where the environment and the agent's behaviors constrain and guide the learning process. To this end, we integrated the algorithm into an embodied simulated drone agent. The experiments show that the algorithm is capable of continuous learning of new tasks and under changed conditions with high classification accuracy (>94%) in a virtual environment, without catastrophic forgetting. The algorithm accepts high dimensional inputs from any state-of-the-art detection and feature extraction algorithms, making it a flexible addition to existing systems. We also describe future development efforts focused on imbuing the algorithm with mechanisms to seek out new knowledge as well as employ a broader range of neuromodulatory processes.
Visual analytics (VA) is typically applied to complex data, thus requiring complex tools. While visual analytics empowers analysts in data analysis, analysts may occasionally get lost in this complexity. This highlights the need for intelligent assistance mechanisms. However, even the latest LLM-assisted VA systems only provide help when explicitly requested by the user, making them insufficiently intelligent to offer suggestions when analysts need them the most. We propose a ProactiveVA framework in which an LLM-powered UI agent monitors user interactions and delivers context-aware assistance proactively. To design effective proactive assistance, we first conducted a formative study analyzing help-seeking behaviors in user interaction logs, identifying when users need proactive help, what assistance they require, and how the agent should intervene. Based on this analysis, we distilled key design requirements in terms of intent recognition, solution generation, interpretability, and controllability. Guided by these requirements, we developed a three-stage UI agent pipeline comprising perception, reasoning, and acting. The agent autonomously perceives users' needs from VA interaction logs, providing tailored suggestions and intuitive guidance through interactive exploration of the system. We implemented the framework in two representative types of VA systems, demonstrating its generalizability, and evaluated its effectiveness through an algorithm evaluation, a case and expert study, and a user study. We also discuss current design trade-offs of proactive VA and areas for further exploration.
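The perceive-reason-act pipeline described above can be sketched in miniature. The log format and hand-written rules here are invented for illustration; the actual agent uses an LLM to recognize intent from interaction logs rather than fixed heuristics.

```python
# Minimal sketch of a perceive -> reason -> act loop over interaction logs.
# Event names and thresholds are assumptions, not the paper's design.
def perceive(log):
    """Extract simple signals from an interaction log (list of event names)."""
    return {
        "undos": sum(1 for e in log if e == "undo"),
        "idle": log.count("idle_tick"),
    }

def reason(signals):
    """Decide whether and how to intervene (LLM reasoning in the real system)."""
    if signals["undos"] >= 3:
        return "suggest_alternative_view"
    if signals["idle"] >= 5:
        return "offer_guided_tour"
    return None  # no intervention: stay out of the analyst's way

def act(intent):
    return f"[proactive hint] {intent}" if intent else None

log = ["click", "undo", "undo", "undo", "idle_tick"]
hint = act(reason(perceive(log)))
print(hint)  # -> [proactive hint] suggest_alternative_view
```

The design requirements from the formative study (intent recognition, solution generation, interpretability, controllability) map onto these stages: perception and reasoning handle intent, while acting must keep suggestions inspectable and dismissible.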
This Perspective explores the transformative potential of multi-agent systems (MAS) powered by Large Language Models (LLMs) in the geosciences. Users of geoscientific data repositories face challenges due to the complexity and diversity of data formats, inconsistent metadata practices, and a considerable number of unprocessed datasets. MAS possesses transformative potential for improving scientists' interaction with geoscientific data by enabling intelligent data processing, natural language interfaces, and collaborative problem-solving capabilities. We illustrate this approach with "PANGAEA GPT," a specialized MAS pipeline integrated with the diverse PANGAEA database for Earth & Environmental Science, demonstrating how MAS-driven workflows can effectively manage complex datasets and accelerate scientific discovery. We discuss how MAS can address current data challenges in geosciences, highlight advancements in other scientific fields, and propose future directions for integrating MAS into geoscientific data processing pipelines. In this Perspective, we show how MAS can fundamentally improve data accessibility, promote cross-disciplinary collaboration, and accelerate geoscientific discoveries.
We study the emergence of agency from scratch by using Large Language Model (LLM)-based agents. In previous studies of LLM-based agents, each agent's characteristics, including personality and memory, have traditionally been predefined. We focused on how individuality, such as behavior, personality, and memory, can be differentiated from an undifferentiated state. The present LLM agents engage in cooperative communication within a group simulation, exchanging context-based messages in natural language. By analyzing this multi-agent simulation, we report valuable new insights into how social norms, cooperation, and personality traits can emerge spontaneously. This paper demonstrates that autonomously interacting LLM-powered agents generate hallucinations and hashtags to sustain communication, which, in turn, increases the diversity of words within their interactions. Each agent's emotions shift through communication, and as they form communities, the personalities of the agents emerge and evolve accordingly. This computational modeling approach and its findings will provide a new method for analyzing collective artificial intelligence.
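The dynamics described above can be caricatured with a toy skeleton: agents start from a near-undifferentiated state, exchange messages, coin new tokens (standing in for the hallucinations and hashtags observed in the study), and shift emotional state with each message. This is invented illustration, not the study's simulation; real agents exchange natural-language messages generated by an LLM.

```python
import random

# Toy skeleton of an undifferentiated-to-differentiated multi-agent loop.
# Token-coining probability and emotion increments are arbitrary choices.
random.seed(0)

class Agent:
    def __init__(self, name):
        self.name, self.emotion, self.lexicon = name, 0.0, {"hello"}

    def speak(self):
        # occasionally coin a new token, increasing group word diversity
        if random.random() < 0.3:
            self.lexicon.add(f"#tag{len(self.lexicon)}")
        return random.choice(sorted(self.lexicon))

    def listen(self, word):
        self.lexicon.add(word)  # vocabulary spreads through the group
        self.emotion += 0.1 if word.startswith("#") else -0.05

agents = [Agent("a"), Agent("b"), Agent("c")]
for _ in range(50):
    speaker, receiver = random.sample(agents, 2)
    receiver.listen(speaker.speak())

vocab = set().union(*(a.lexicon for a in agents))
print(len(vocab), [round(a.emotion, 2) for a in agents])
```

Even this stub shows the two qualitative effects the paper reports: shared vocabulary grows beyond the initial state, and individual emotional trajectories diverge, giving each agent a distinguishable history.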
Robotic ultrasound systems have the potential to improve medical diagnostics, but patient acceptance remains a key challenge. To address this, we propose a novel system that combines an AI-based virtual agent, powered by a large language model (LLM), with three mixed reality visualizations aimed at enhancing patient comfort and trust. The LLM enables the virtual assistant to engage in natural, conversational dialogue with patients, answering questions in any format and offering real-time reassurance, creating a more intelligent and reliable interaction. The virtual assistant is animated as controlling the ultrasound probe, giving the impression that the robot is guided by the assistant. The first visualization employs augmented reality (AR), allowing patients to see the real world and the robot with the virtual avatar superimposed. The second visualization is an augmented virtuality (AV) environment, where the real-world body part being scanned is visible, while a 3D Gaussian Splatting reconstruction of the room, excluding the robot, forms the virtual environment. The third is a fully immersive virtual reality (VR) experience, featuring the same 3D reconstruction but entirely virtual, where the patient sees a virtual representation of their body being scanned in a robot-free environment. In this case, the virtual ultrasound probe mirrors the movement of the probe controlled by the robot, creating a synchronized experience as it touches and moves over the patient's virtual body. We conducted a comprehensive agent-guided robotic ultrasound study with all participants, comparing these visualizations against a standard robotic ultrasound procedure. Results showed significant improvements in patient trust, acceptance, and comfort. Based on these findings, we offer insights into designing future mixed reality visualizations and virtual agents to further enhance patient comfort and acceptance in autonomous medical procedures.
Effective communication is crucial for trust-building, accurate information gathering, and clinical decision-making in healthcare. Despite its emphasis in medical curricula, traditional training methods, such as role-playing with standardized patients, remain costly, logistically complex, and fail to replicate real-life scenarios. Simulation-based training enhances communication and reasoning skills, but novice learners often struggle due to underdeveloped reasoning processes. Furthermore, limited access to asynchronous, autonomous simulated patient interactions restricts personalized practice. Virtual patient models offer scalable solutions with interactive scenarios and tailored feedback, but high development costs and resource demands hinder their widespread adoption. To address these challenges, virtual patient systems powered by Large Language Models (LLMs) have emerged as a promising tool. These generative agents simulate human-like behavioral responses by leveraging LLM capabilities, cognitive mechanisms, and contextual memory retrieval. A tool was developed allowing students to select clinical cases and interact with a chatbot simulating a patient role. Teachers can also create custom cases. Evaluations showed that the agent provided consistent, plausible responses aligned with case descriptions and achieved a Chatbot Usability Questionnaire (CUQ) score of 86.25/100. Our results show that this approach enables flexible, repetitive, and asynchronous practice while offering real-time feedback.
Autonomous Analysis of Curated Patient Data Using a Large Language Model-Based Multiagent Framework.
Analyzing complex medical data sets is specialized and time-consuming. This study aimed to develop and evaluate a novel multiagent artificial intelligence (AI) framework for automating medical data analysis workflows and to compare its performance against nonagent-based approaches using large language models (LLMs). A six-party AI agent system was developed using the AutoGen platform, with specialized agents for planning, data retrieval, cleaning, statistical analysis, and review, powered by OpenAI gpt-4o. This framework was applied to deidentified single patient-level data sets from 20 recent studies in the field of bone marrow transplantation (2021-2023). The primary objective was to evaluate its accuracy in replicating published primary outcomes, benchmarked against direct use of the website-based ChatGPT 4o. The multiagent framework successfully replicated 53.3% (95% CI, 40.7 to 66.0) of primary outcomes, significantly outperforming ChatGPT 4o (35.0% [95% CI, 22.9 to 47.1]). Our multiagent AI framework demonstrated superior accuracy and robustness in automating biomedical data analysis compared with a generalized LLM.
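The division of labor in the six-party system above can be sketched as a hand-off chain. These stub functions are purely illustrative; the real system runs GPT-4o-backed conversational agents on the AutoGen platform, and the data fields shown here are invented.

```python
# Hypothetical sketch of the planner -> retrieval -> cleaning -> statistics
# -> review hand-off structure. Each stub stands in for an LLM agent.
def planner(question):
    return ["retrieve", "clean", "analyze", "review"]

def retrieve(dataset):
    return dataset  # stand-in for loading the deidentified patient table

def clean(rows):
    return [r for r in rows if r.get("outcome") is not None]

def analyze(rows):
    return {"n": len(rows), "event_rate": sum(r["outcome"] for r in rows) / len(rows)}

def review(result):
    assert 0.0 <= result["event_rate"] <= 1.0  # reviewer agent sanity check
    return result

pipeline = {"retrieve": retrieve, "clean": clean, "analyze": analyze, "review": review}
result = [{"outcome": 1}, {"outcome": 0}, {"outcome": None}, {"outcome": 1}]
for step in planner("replicate the primary outcome"):
    result = pipeline[step](result)
print(result)
```

The benefit the study reports over a single generalized LLM plausibly comes from this structure: each agent handles one well-scoped subtask, and the reviewer catches errors before a final answer is emitted.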
We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, literature-grounded question answering via retrieval-augmented generation, molecular generation, multiproperty prediction, property-aware molecular refinement, and 3D protein-ligand structure generation. The agent autonomously retrieves relevant biomolecular information, including FASTA sequences, SMILES representations, and literature, and answers mechanistic questions with improved contextual accuracy compared to standard LLMs. It then generates chemically diverse seed molecules and predicts 75 properties, including ADMET-related and general physicochemical descriptors, which guide iterative molecular refinement. Across two refinement rounds, the number of molecules with QED >0.6 increased from 34 to 55. The number of molecules satisfying empirical drug-likeness filters also rose; for example, compliance with the Ghose filter increased from 32 to 55 within a pool of 100 molecules. The framework also employed Boltz-2 to generate 3D protein-ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.
No abstract
Ophthalmic findings can non-invasively reflect nervous-system status. We present an LLM-based multi-agent framework that preserves diagnostic uncertainty to support neuro-ophthalmic screening and referral. Heterogeneous inputs (clinical text/PDFs and optional fundus/OCT images) are normalized by an Information Collection Agent. A Diagnosis Agent ensembles multiple LLMs and, when available, a CNN image branch; outputs are aggregated with an uncertainty-aware fusion. Across a curated ophthalmic corpus, the multi-agent framework improves robustness over single-model baselines and produces multi-candidate distributions suitable for downstream triage and monitoring. Uncertainty-aware, multi-candidate predictions align with clinical decision-making under ambiguity and suggest future work on calibration and knowledge-layer fusion.
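The uncertainty-aware fusion step above can be illustrated with a simple weighted mixture of the members' candidate distributions. The exact fusion rule is not specified in this summary, so the weighting scheme, labels, and numbers below are assumptions; the point is that the output remains a multi-candidate distribution rather than a single collapsed label.

```python
# Sketch of uncertainty-aware ensemble fusion (assumed form): average each
# member's diagnosis distribution, weighted by member confidence, and keep
# the full multi-candidate distribution for downstream triage.
def fuse(distributions, weights):
    """distributions: list of dicts label -> probability; weights: floats."""
    total = sum(weights)
    fused = {}
    for dist, w in zip(distributions, weights):
        for label, p in dist.items():
            fused[label] = fused.get(label, 0.0) + w * p / total
    # renormalize to guard against members with incomplete label support
    z = sum(fused.values())
    return {label: v / z for label, v in fused.items()}

llm_a = {"optic neuritis": 0.6, "papilledema": 0.4}   # text-based LLM member
llm_b = {"optic neuritis": 0.3, "papilledema": 0.7}   # e.g., CNN image branch
fused = fuse([llm_a, llm_b], weights=[2.0, 1.0])
print(fused)
```

Preserving the fused distribution, instead of its argmax, is what lets downstream triage treat an ambiguous 50/50 case differently from a confident single-candidate case.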
Contemporary research in human-machine symbiosis has mainly concentrated on enhancing relevant sensory, perceptual, and motor capacities, assuming short-term and nearly momentary interaction sessions. Still, human-machine confluence encompasses an inherent temporal dimension that is typically overlooked. The present work shifts the focus to the temporal and long-lasting aspects of symbiotic human-robot interaction (sHRI). We explore the integration of three time-aware modules, each focusing on a different part of the sHRI timeline. Specifically, the Episodic Memory considers past experiences, the Generative Time Models estimate the progress of ongoing activities, and the Daisy Planner devises plans for the timely accomplishment of goals. The integrated system is employed to coordinate the activities of a multi-agent team. Accordingly, the proposed system (i) predicts human preferences based on past experience, (ii) estimates performance profiles and task completion times by monitoring human activity, and (iii) dynamically adapts multi-agent activity plans to changes in expectation and Human-Robot Interaction (HRI) performance. The system is deployed and extensively assessed in real-world and simulated environments. The obtained results suggest that building upon the unfolding and the temporal properties of team tasks can significantly enhance the fluency of sHRI.
As the roles of robots continue to expand, there is increasing demand for research on automated task planning for multi-agent systems that can independently execute tasks in wide and dynamic environments. This study introduces a plugin framework in which multiple robots can be involved in task planning across a broad range of areas by combining symbolic and connectionist approaches. The symbolic approach, which understands and learns human knowledge, is useful for task planning in a wide and static environment. The network-based connectionist approach has the advantage of being able to respond to an ever-changing dynamic environment. A Planning Domain Definition Language-based planning algorithm, a symbolic approach, and a cooperative-competitive reinforcement learning algorithm, a connectionist approach, were utilized in this study. The proposed architecture is verified through a simulation. An experiment using 10 unmanned surface vehicles further verified that the given tasks were successfully executed in a wide and dynamic environment.
As different types of hazards, including natural and man-made, can occur simultaneously, to implement an integrated and holistic risk management, a multi-hazard perspective on disaster risk management, including preparedness and planning, must be taken for a safer and more resilient society. Considering the emerging challenges that the COVID-19 pandemic has been introducing to regular hospital operations, there is a need to adapt emergency plans with the changing conditions, as well. Evacuation of patients with different mobility disabilities is a complicated process that needs planning, training, and efficient decision-making. These protocols need to be revisited for multi-hazard scenarios such as an ongoing disease outbreak during which additional infection control protocols might be in place to prevent transmission. Computational models can provide insights on optimal emergency evacuation strategies, such as the location of isolation units or alternative evacuation prioritization strategies. This study introduces a non-ICU patient classification framework developed based on available patient mobility data. An agent-based model was developed to simulate the evacuation of the emergency department at the Johns Hopkins Hospital during the COVID-19 pandemic due to a fire emergency. The results show a larger nursing team can reduce the median and upper bound of the 95% confidence interval of the evacuation time by 36% and 33%, respectively. A dedicated exit door for COVID-19 patients is relatively less effective in reducing the median time, while it can reduce the upper bound by more than 50%.
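The staffing effect reported above can be illustrated with a toy agent-based sketch in which nurses escort patients out one at a time. All numbers (trip times, patient counts, team sizes) are invented and are not the Johns Hopkins model's parameters; the point is only the qualitative relation between team size and evacuation time.

```python
import random
import statistics

# Toy agent-based sketch: nurses escort non-ambulatory patients to the exit;
# each round trip takes a stochastic amount of time, and total evacuation
# time is when the last nurse finishes. Parameters are illustrative only.
def evacuate(n_patients, n_nurses, rng):
    nurse_free_at = [0.0] * n_nurses
    for _ in range(n_patients):
        i = min(range(n_nurses), key=lambda k: nurse_free_at[k])
        nurse_free_at[i] += rng.uniform(2.0, 5.0)  # minutes per round trip
    return max(nurse_free_at)

rng = random.Random(42)
small_team = [evacuate(30, 4, rng) for _ in range(200)]
large_team = [evacuate(30, 8, rng) for _ in range(200)]
print(statistics.median(small_team), statistics.median(large_team))
```

A full model like the one in the study adds corridor congestion, patient mobility classes, isolation-unit placement, and infection-control constraints, which is what makes questions like the dedicated COVID-19 exit door answerable.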
Urban vegetation is an essential element of the city pedestrian walkway. Despite city forest regulations and urban planning best practices, vegetation planning lacks clear comprehension of, and compatibility with, the other urban elements surrounding it. Urban planners and academic researchers currently devote vital attention to including most urban elements, and their impact on occupants and the environment, in the planning stage of urban development. With advances in computational design, they have developed various algorithms to generate design alternatives and measure their environmental impact against occupants' needs and perceptions of their city. In particular, multi-agent-based simulations show great promise for building rule-compliant urban vegetation design tools. This paper proposes an automatic rule-compliance approach for pedestrian-pathway vegetation, leveraging a multi-agent system and algorithmic modeling tools. The approach comprises three modules: rule compliance (T-Rule), a street vegetation design tool (T-Design), and multi-agent alternative generation (T-Agent). Notably, the scope of the paper is limited to trees, shrubbery, and seating-area configurations in the urban pathway context. To validate the developed design tool, a case study was tested, and the tool generated the expected results successfully. A questionnaire was conducted to gather feedback on the tool's usability. The proposed tool is anticipated to aid urban planners in decision-making, produce more practical vegetation planting plans than conventional two-dimensional (2D) plans, and give city occupants the chance to shape their city by simply selecting predefined parameters in a user interface to generate their neighborhood pathway vegetation plans. Moreover, this approach can be extended and embedded in an interactive map where city occupants can shape their neighborhood greenery and give feedback to urban planners for decision-making.
Multidisciplinary tumor boards (MDTs) are central to cancer care but remain constrained by scarce experts and variable decision quality. EvoMDT employs a self-evolution loop that updates prompts, consensus weights, and retrieval scope based on expert feedback and outcome signals, improving robustness without sacrificing traceability. This matters clinically because MDT workloads and evidence shift over time, requiring adaptive yet auditable decision support. Agents perform domain-specific inference over lesion-level clinical data with structured knowledge retrieval; a consensus protocol resolves conflicts and generates traceable, evidence-linked recommendations. Evaluation spanned six public oncology QA benchmarks and four real-world datasets (breast, liver, lung, lymphoma), followed by single-blind physician assessment. Quantitative metrics (ROUGE, BERTScore) and automated safety checks assessed factuality and guideline concordance, while clinicians rated clinical appropriateness and usability. EvoMDT outperformed frontier large language model (LLM) baselines (e.g., Llama-3-70B, Claude-3, Med-PaLM 2), improving guideline concordance and semantic alignment with expert plans (BERTScore 0.62-0.68) and reducing safety violations. In physician review, EvoMDT achieved decision quality comparable to human MDTs while shortening response time by 30-40%. These results position EvoMDT as an interpretable, evidence-traceable framework that operationalizes AI reasoning for multidisciplinary oncology practice and offers a scalable foundation for trustworthy, lesion-level precision cancer care.
Autonomous Mobile Robots (AMRs) are increasingly important in Industry 4.0 intralogistics, but creating path planning systems that adapt to dynamic and uncertain Flexible Manufacturing Systems (FMS), especially managing conflicts among multiple AMRs through scalable decentralised solutions, remains a significant challenge. This research introduces a dynamic path planning system for AMRs designed for reactive adaptation to FMS disturbances and generalisation across factory layouts, incorporating support for multiple AMRs with integrated conflict avoidance. The system is built on a Multi-Agent Systems (MAS) architecture, where software AMR agents independently calculate their paths using a hybrid Genetic Algorithm (GA) that employs Cell-Based Decomposition (CBD) and optimises path length, smoothness, and overlap via a multi-objective fitness function. Multi-AMR conflict avoidance is implemented using the Iterative Exclusion Principle (IEP), which facilitates priority-based planning, knowledge sharing through Predictive Collision Avoidance (PCA), and iterative replanning among agents communicating via a blackboard agent. Verification demonstrated the system's ability to successfully avoid deadlocks for up to nine AMRs and exhibit good scalability. Validation in a simulated FMS environment confirmed robust adaptation to various disturbances, including static and dynamic obstacles, while maintaining stable run times and consistent path quality. These results affirm the practical feasibility of this hybrid GA and MAS-based approach for dynamic AMR control in complex industrial settings.
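A multi-objective path fitness of this kind can be sketched in a few lines of Python. The weights, the heading-based smoothness term, and the cell-overlap measure below are illustrative assumptions, not the paper's actual formulation:

```python
import math

def path_length(path):
    # Total Euclidean length over consecutive waypoints (grid cells).
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def smoothness_penalty(path):
    # Sum of absolute heading changes between consecutive segments.
    penalty = 0.0
    for p0, p1, p2 in zip(path, path[1:], path[2:]):
        h1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        h2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        d = abs(h2 - h1)
        penalty += min(d, 2 * math.pi - d)
    return penalty

def overlap_penalty(path, other_paths):
    # Count grid cells shared with other agents' paths (cell-based decomposition).
    cells = set(path)
    return sum(len(cells & set(other)) for other in other_paths)

def fitness(path, other_paths, w=(1.0, 0.5, 2.0)):
    # Weighted sum of the three objectives; lower is better.
    return (w[0] * path_length(path)
            + w[1] * smoothness_penalty(path)
            + w[2] * overlap_penalty(path, other_paths))
```

A GA would evolve candidate cell sequences and rank them with a function of this shape; the weight vector trades off distance against smoothness and inter-agent overlap.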
No abstract
As an important approach of distributed artificial intelligence, the multi-agent system provides an efficient way to solve large-scale computational problems through high-parallelism processing with nonlinear interactions between the agents. However, the huge capacity and complex distribution of the individual agents make efficient hardware construction difficult. Here, we propose and demonstrate a multi-agent hardware system that deploys distributed Ag nanoclusters as physical agents and uses their electrochemical dissolution, growth, and evolution dynamics under an electric field for high-parallelism exploration of the solution space. The collaboration and competition between the Ag nanoclusters allow information to be effectively expressed and processed, which replaces cumbersome exhaustive operations with self-organization of the Ag physical network based on the positive feedback of information interaction, leading to significantly reduced computational complexity. The proposed multi-agent network can be scaled up with parallel and serial integration structures, and demonstrates efficient solution of graph and optimization problems. An artificial potential field with superimposed attractive/repulsive components and varied ion velocity is realized, showing gradient-descent route planning with self-adaptive obstacle avoidance. This multi-agent network is expected to serve as a physics-empowered parallel computing hardware.
Cooperative multi-agent systems make it possible to employ miniature robots for tasks ranging from data collection in wide open areas to physical interactions with test subjects in confined environments such as a hive. This paper proposes a new multi-agent path-planning approach to determine a set of trajectories along which the agents do not collide with each other or any obstacle. The proposed algorithm leverages a risk-aware probabilistic roadmap algorithm to generate a map, employs node classification to delineate exploration regions, and incorporates a customized genetic framework to address the combinatorial optimization, with the ultimate goal of computing safe trajectories for the team. Furthermore, the planning algorithm makes the agents explore all subdomains in the workspace together as a formation, allowing the team to perform different tasks or collect multiple datasets for reliable localization or hazard detection. The objective function for minimization includes two major parts: the traveling distance of all the agents over the entire mission, and the probability of collisions between agents or between agents and obstacles. A sampling method is used to evaluate the objective function, considering the agents' dynamic behavior under environmental disturbances and uncertainties. The algorithm's performance is evaluated for different group sizes in a simulation environment, and two benchmark scenarios are introduced to compare the exploration behavior. The proposed optimization method exhibits stable and convergent properties regardless of the group size.
Personalized conciliation of clinical guidelines for comorbid patients through multi-agent planning.
The conciliation of multiple single-disease guidelines for comorbid patients entails solving potential clinical interactions, discovering synergies in the diagnosis and the recommendations, and managing clinical equipoise situations. Personalized conciliation of multiple guidelines that additionally considers patient preferences brings further difficulties. Recently, several works have explored distinct techniques for automating the conciliation of clinical guidelines for comorbid patients, but very little attention has been paid to integrating patient preferences into this process. In this work, a Multi-Agent Planning (MAP) framework that extends previous work on single-disease temporal Hierarchical Task Networks (HTN) is proposed for the automated conciliation of clinical guidelines with patient-centered preferences. Each agent encapsulates a single-disease Computer Interpretable Guideline (CIG) formalized as an HTN domain and conciliates the decision procedures that encode the clinical recommendations of its CIG with the decision procedures of the other agents' CIGs. During conciliation, drug-related interactions, scheduling constraints, redundant actions, and multiple support interactions are solved by an automated planning process. Moreover, the simultaneous application of the patient preferences across multiple diseases may bring about contradictory clinical decisions and further interactions. As a final step, the most adequate personalized treatment plan according to the patient preferences is selected by a Multi-Criteria Decision Making (MCDM) process. The MAP approach is tested on a case study that builds upon a simplified representation of two real clinical guidelines for Diabetes Mellitus and Arterial Hypertension.
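The final MCDM selection step can be illustrated with a simple weighted-sum model. The criterion names and weights below are hypothetical stand-ins; the paper's actual MCDM procedure may differ:

```python
def select_plan(plans, weights):
    """plans: {plan_name: {criterion: score in [0, 1]}}, higher scores better.
    weights: {criterion: importance}, e.g. derived from patient preferences."""
    def utility(scores):
        # Weighted-sum utility over the criteria named in `weights`.
        return sum(weights[c] * scores.get(c, 0.0) for c in weights)
    return max(plans, key=lambda name: utility(plans[name]))
```

Shifting weight toward the patient-preference criterion can flip which conciliated plan is chosen, which is exactly the personalization effect the framework targets.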
Automated programming has become a powerful tool for solving real-world problems. Code generation, in particular, plays a key role in improving developer productivity and reducing the entry barrier to software development. Recent advances in large language models (LLMs) have significantly improved program synthesis, enabling high-quality code generation from natural language. However, LLMs still struggle with complex tasks, especially in understanding problem intent, conducting multi-step reasoning, and producing code that passes all test cases. As task difficulty increases, existing models often fail to devise complete and reliable generation strategies, leading to reduced accuracy and robustness. To address these limitations, we propose Blueprint2Code, an innovative multi-agent framework for code generation. It emulates the human programming workflow through the coordinated interaction of four agents (Previewing, Blueprint, Coding, and Debugging), forming a closed-loop system from task comprehension to planning, implementation, and iterative refinement. Compared to existing methods, Blueprint2Code shows superior performance on complex programming tasks. Extensive experiments on benchmark datasets (HumanEval, MBPP, their extended versions HumanEval-ET and MBPP-ET, and the APPS competition dataset) demonstrated its effectiveness, achieving strong pass@1 results: HumanEval 96.3%, MBPP 88.4%, HumanEval-ET 86.5%, MBPP-ET 59.4%, and APPS 24.6%. The related code is available at https://github.com/MKH99918/Blueprint2Code.
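The closed-loop structure described above can be sketched as a plain control loop. The agent callables below are hypothetical stand-ins for LLM calls; only the four role names come from the paper:

```python
# Hypothetical sketch of a Previewing -> Blueprint -> Coding -> Debugging
# loop; each "agent" is an injected callable standing in for an LLM call.
def run_pipeline(task, agents, tests, max_rounds=3):
    notes = agents["preview"](task)           # restate intent and edge cases
    plan = agents["blueprint"](task, notes)   # step-by-step solution plan
    code = agents["code"](plan)               # initial implementation
    for _ in range(max_rounds):
        failures = [t for t in tests if not t(code)]
        if not failures:                      # all test cases pass: done
            return code
        code = agents["debug"](code, failures)  # iterative refinement
    return code
```

The test-driven loop is what distinguishes this shape from one-shot generation: the Debugging agent only runs when concrete failures exist to condition on.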
Metamaterials have revolutionized wave control; in the last two decades, they evolved from passive devices via programmable devices to sensor-endowed self-adaptive devices realizing a user-specified functionality. Although deep-learning techniques play an increasingly important role in metamaterial inverse design, measurement post-processing and end-to-end optimization, their role is ultimately still limited to approximating specific mathematical relations; the metamaterial is still limited to serving as proxy of a human operator, realizing a predefined functionality. Here, we propose and experimentally prototype a paradigm shift toward a metamaterial agent (coined metaAgent) endowed with reasoning and cognitive capabilities enabling the autonomous planning and successful execution of diverse long-horizon tasks, including electromagnetic (EM) field manipulations and interactions with robots and humans. Leveraging recently released foundation models, metaAgent reasons in high-level natural language, acting upon diverse prompts from an evolving complex environment. Specifically, metaAgent's cerebrum performs high-level task planning in natural language via a multi-agent discussion mechanism, where agents are domain experts in sensing, planning, grounding, and coding. In response to live environmental feedback within a real-world setting emulating an ambient-assisted living context (including human requests in natural language), our metaAgent prototype self-organizes a hierarchy of EM manipulation tasks in conjunction with commanding a robot. metaAgent masters foundational EM manipulation skills related to wireless communications and sensing, and it memorizes and learns from past experience based on human feedback.
Large language models (LLMs) have shown remarkable potential in various domains but often lack the ability to access and reason over domain-specific knowledge and tools. In this article, we introduce Chemistry Agent Connecting Tool-Usage to Science (CACTUS), an LLM-based agent that integrates existing cheminformatics tools to enable accurate and advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama3-8b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b, Mistral-7b, and Llama3-8b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without a significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with widely used domain-specific tools provided by RDKit, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment.
After designing and implementing a telemedicine solution equipped with a video-presence tool for teleconsultation and tele-expertise, and in order to obtain faithful communication between healthcare professional and patient despite language differences, our study performed a literature review of the existing work, analyzed the different types of neural networks for designing an intelligent voice agent that translates exchanges between doctor and patient during teleconsultation, and made tool choices for its development.
Effective management of physical and psychological symptoms is a critical component of comprehensive care for both chronic disease patients and apparently healthy individuals experiencing episodic symptoms. Conversational agents, which are dialog systems capable of understanding and generating human language, have emerged as a potential tool to enhance symptom management through interactive support. To examine the characteristics and effectiveness of conversational agent-delivered interventions reported in randomized controlled trials (RCTs) in the management of both physical and psychological symptoms. A systematic review. A comprehensive search was performed in PubMed, ACM Digital Library, CINAHL, EMBASE, PsycInfo, Web of Science, Scopus and gray literature sources from their inception to Oct 2024. Search terms included "conversational agent", "symptom", "randomized controlled trial" and their synonyms and hyponyms. Duplicates were identified by EndNote, and titles, abstracts and full texts were independently screened according to predefined criteria. Data extraction focused on basic study characteristics and conversational agent details, with the Cochrane Risk of Bias 2.0 tool employed for bias assessment. The search yielded 2756 articles and 29 were finally included for review. The included studies predominantly came from developed countries (n = 23) and were conducted between 2020 and 2024 (n = 24). The studies frequently evaluated the feasibility and acceptability of conversational agent interventions (n = 14), with a predominant focus on psychological symptoms (depression, anxiety, etc.) (n = 17). A few studies focused on physical symptoms (pain, etc.) (n = 4), while others addressed both (n = 8). Twenty-five distinct conversational agents (Woebot, Tess, etc.) were evaluated, utilizing platforms ranging from proprietary applications to common messaging channels like WeChat and Facebook Messenger.
Cognitive Behavioral Therapy (CBT) was a commonly integrated approach (n = 22), with rule-based dialogs (n = 22) as the most common dialog system method and Natural Language Processing (NLP) (n = 15) as the predominant AI technique. The median recruitment and completion rates were 72% and 79%, respectively. The majority of studies reported positive user experiences and significant symptom management improvements (n = 22). However, risk of bias was high in seventeen studies and presented some concerns in nine others. Conversational agents have shown promise in enhancing both physical and psychological symptom management through positive user experiences and effectiveness. However, the high risk of bias identified in many studies warrants caution in interpreting these findings. Future research should prioritize the methodological quality of RCTs to strengthen the evidence base supporting the use of conversational agents as a complementary tool in symptom management.
Conversational agents (CAs) are increasingly used as a promising tool for scalable, accessible, and personalized self-management support of people with a chronic disease. Studies of CAs for self-management of chronic disease operate within a multidisciplinary domain: self-management originates from (behavioral) psychology and CAs stem from intervention technology, while diseases are typically studied within the biomedical context. To ensure their effectiveness, structured evaluations and descriptions of the interventions, integrating biomedical, behavioral, and technological perspectives, are essential. We aimed to examine the design and evaluation of CAs for self-management support of chronic diseases, focusing on their characteristics, integration of behavioral change techniques, and evaluation methods. The findings will guide future research and inform intervention design. We conducted a systematic search in the PubMed and Embase databases to identify studies that investigated CAs for chronic disease self-management, published from January 1, 2018, to April 15, 2024. Full-text journal articles, published in English, studying the efficacy or effectiveness of a CA in the context of self-management for chronic diseases in adults were included. Data extraction was guided by conceptual frameworks to ensure comprehensive reporting of interventions and methodologies: the behavioral intervention technology model and the CONSORT-EHEALTH (Consolidated Standards of Reporting Trials of Electronic and Mobile Health Applications and Online Telehealth) checklist. Risk of bias was assessed using the Risk of Bias 2 tool and the Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I) tool (version 2). In total, 25 studies were included, primarily focusing on text-based, rule-based CAs delivered via mobile apps. The chronic diseases predominantly targeted were diabetes and cancer.
Commonly identified clusters of behavior change techniques were "shaping knowledge," "feedback and monitoring," "natural consequences," and "associations." However, reporting of behavior change techniques and their delivery was lacking, and intervention descriptions were limited. Studies were mostly in the early phase, with a great variety in intervention descriptions, study methods, and outcome measures. Advancing the field of CA-based interventions requires transparent intervention descriptions, rigorous methodologies, consistent use of validated scales, standardized taxonomy, and reporting aligned with standardized frameworks. Enhanced integration of artificial intelligence-driven personalization and a focus on implementation in health care settings are critical for future research.
Cardiovascular disease is the leading cause of death in the world. A program of cardiac rehabilitation (CR) involves physical activities or exercises to regain optimal quality of life. CR relies on the need to evaluate, control, and supervise a patient's status and progress. This work has two objectives. On the one hand, it provides a tool for clinicians to assess the patient's status during CR. On the other hand, since there is evidence that robots can motivate patients during therapeutic procedures, our sensor interface explores the possibility of integrating a robotic agent into cardiac therapy. This work presents an exploratory experiment for online assessment of typical CR routines.
Mathematical modeling has become a valuable tool that strives to complement conventional biomedical research modalities in order to predict experimental outcome, generate new medical hypotheses, and optimize clinical therapies. Two specific approaches, pharmacokinetic-pharmacodynamic (PK-PD) modeling, and agent-based modeling (ABM), have been widely applied in cancer research. While they have made important contributions on their own (e.g., PK-PD in examining chemotherapy drug efficacy and resistance, and ABM in describing and predicting tumor growth and metastasis), only a few groups have started to combine both approaches together in an effort to gain more insights into the details of drug dynamics and the resulting impact on tumor growth. In this review, we focus our discussion on some of the most recent modeling studies building on a combined PK-PD and ABM approach that have generated experimentally testable hypotheses. Some future directions are also discussed.
Exploiting reinforcement learning (RL) for traffic congestion reduction is a frontier topic in intelligent transportation research. The difficulty in this problem stems from the RL agent's inability to simultaneously monitor multiple signal lights while taking into account complicated traffic dynamics in different regions of a traffic system. This challenge is even more pronounced when forming control decisions on a large-scale traffic grid, where the RL action space grows exponentially with the number of intersections. In this paper, we tackle this problem by proposing a cooperative deep reinforcement learning (Coder) framework. The intuition behind Coder is to decompose the original difficult RL task into a number of subproblems with relatively easy RL goals. Accordingly, we implement Coder with multiple regional agents and a centralized global agent. Each regional agent learns its own RL policy and value functions over a small region with limited actions. The centralized global agent then hierarchically aggregates the RL achievements of the regional agents and forms the final Q-function over the entire large-scale traffic grid. Experimental investigations demonstrate that the proposed Coder reduces congestion, measured by the number of waiting vehicles, by roughly 30% on average during high-density traffic flows in simulations.
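The decomposition idea can be illustrated in miniature: each regional agent ranks only its own small action set, and a global aggregator composes the joint action and a global value instead of searching the exponential joint space. The additive aggregation and all values below are illustrative simplifications, not the paper's hierarchical scheme:

```python
# Toy decomposition in the spirit of Coder: per-region Q-tables plus a
# global aggregator over the composed joint action.
def greedy_joint_action(regional_q):
    """regional_q: list of {local_action: Q}; one greedy action per region."""
    return tuple(max(q, key=q.get) for q in regional_q)

def global_q(regional_q, joint_action):
    # Additive aggregation of regional achievements (a simplification).
    return sum(q[a] for q, a in zip(regional_q, joint_action))
```

With two regions of two actions each, the global agent inspects 2 + 2 local entries rather than 2 × 2 joint actions; that gap widens exponentially with the number of intersections.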
Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and in modeling the behaviors of other agents for each individual agent. In this article, a new MARL method, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
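The two core ingredients, double estimators and UCB exploration, can be sketched in tabular form. This is a minimal single-agent sketch under assumed hyperparameters, not Co-DQL itself (which further adds mean-field interaction modeling, reward allocation, and local state sharing):

```python
import math
import random

class DoubleQUCB:
    """Tabular double Q-learning with a UCB exploration policy (sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, c=1.0):
        self.qa = [[0.0] * n_actions for _ in range(n_states)]
        self.qb = [[0.0] * n_actions for _ in range(n_states)]
        self.counts = [[0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.c = alpha, gamma, c

    def act(self, s, t):
        # UCB: mean of the two estimators plus a count-based bonus.
        def score(a):
            mean = 0.5 * (self.qa[s][a] + self.qb[s][a])
            bonus = self.c * math.sqrt(math.log(t + 1) / (self.counts[s][a] + 1))
            return mean + bonus
        return max(range(len(self.qa[s])), key=score)

    def update(self, s, a, r, s2):
        self.counts[s][a] += 1
        # Randomly pick which estimator to update; it selects the greedy
        # next action, while the *other* estimator evaluates that action,
        # which removes the maximization bias of ordinary Q-learning.
        if random.random() < 0.5:
            a_star = max(range(len(self.qa[s2])), key=lambda x: self.qa[s2][x])
            target = r + self.gamma * self.qb[s2][a_star]
            self.qa[s][a] += self.alpha * (target - self.qa[s][a])
        else:
            a_star = max(range(len(self.qb[s2])), key=lambda x: self.qb[s2][x])
            target = r + self.gamma * self.qa[s2][a_star]
            self.qb[s][a] += self.alpha * (target - self.qb[s][a])
```

Decoupling action selection from action evaluation across the two tables is the double-estimator trick; the UCB bonus replaces ε-greedy exploration.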
Multiagent reinforcement learning (MARL) has garnered extensive research attention due to its strong learning capabilities, leading to its deployment in increasingly challenging scenarios. Although progress has been made toward more generalizable solutions, many MARL algorithms continue to struggle with balancing scalability and heterogeneity, particularly under conditions of growing uncertainty. Research has shown that combining dense local interactions with sparse global interactions can significantly enhance scalability while preserving agent heterogeneity. Motivated by these insights and inspired by human social behavior, we propose a novel hierarchical method that integrates human guidance with multiagent systems (MASs). Rather than requiring agents to learn from scratch, our method transfers abstract knowledge from humans, employing fuzzy logic to manage the inherent uncertainty in this guidance and reduce the required human effort. To accommodate both local and global interactions, we introduce two levels of human guidance: individual action guidance for agents and an attention graph to describe agent relationships. Our proposed approach is end-to-end and compatible with diverse MARL algorithms. We evaluate our approach in the StarCraft Multi-Agent Challenge (SMAC) and SMACv2 environments. Empirical results demonstrate its effectiveness, even under low-performance fuzzy human guidance.
No abstract
Multi-agent systems often face challenges such as elevated communication demands, intricate interactions, and difficulties in transferability. To address the issues of complex information interaction and model scalability, we propose an innovative hierarchical graph attention actor-critic reinforcement learning method. This method naturally models the interactions within a multi-agent system as a graph, employing hierarchical graph attention to capture the complex cooperative and competitive relationships among agents, thereby enhancing their adaptability to dynamic environments. Specifically, graph neural networks encode agent observations as single feature-embedding vectors, maintaining a constant dimensionality irrespective of the number of agents, which improves model scalability. Through the "inter-agent" and "inter-group" attention layers, the embedding vector of each agent is updated into an information-condensed and contextualized state representation, which extracts state-dependent relationships between agents and models interactions at both the individual and group levels. We conducted experiments across several multi-agent tasks to assess our proposed method's effectiveness, stability, and scalability. Furthermore, to enhance the applicability of our method in large-scale tasks, we tested and validated its performance within a curriculum learning training framework, demonstrating its transferability.
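The "inter-agent" attention idea can be illustrated with minimal single-head, dot-product attention over agent feature vectors. Real graph attention layers use learned projection matrices and multiple heads, which are omitted in this sketch:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, neighbours):
    """query: one agent's feature vector; neighbours: list of vectors.
    Returns a weighted combination whose size depends only on the feature
    dimension, not on the number of neighbours."""
    d = len(query)
    weights = softmax([dot(query, n) / math.sqrt(d) for n in neighbours])
    return [sum(w * n[i] for w, n in zip(weights, neighbours))
            for i in range(d)]
```

Because the output dimensionality is fixed regardless of how many neighbours contribute, stacking such layers (per-agent, then per-group) yields the constant-size embeddings that make the architecture scale with the agent count.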
Humans and other animals use powerful reinforcement learning (RL) mechanisms that have been discovered by evolution over many generations of trial and error. By contrast, artificial agents typically learn using handcrafted learning rules. Despite decades of interest, the goal of autonomously discovering powerful RL algorithms has proven elusive.
This article presents a novel technique to achieve plant-wide performance optimization for large-scale unknown industrial processes by integrating the reinforcement learning method with the multiagent game theory. A main advantage of this technique is that plant-wide optimal performance is achieved by a distributed approach where multiple agents solve simplified local nonzero-sum optimization problems so that a global Nash equilibrium is reached. To this end, first, the plant-wide performance optimization problem is reformulated by decomposition into local optimization subproblems for each production index in a multiagent framework. Then, the nonzero-sum graphical game theory is utilized to compute the operational indices for each unit process with the purpose of reaching the global Nash equilibrium, resulting in production indices following their prescribed target values. The stability and the global Nash equilibrium of this multiagent graphical game solution are rigorously proved. The reinforcement learning methods are then developed for each agent to solve the nonzero-sum graphical game problem using data measurements available in the system in real time. The plant dynamics do not have to be known. Finally, the emulation results are given to show the effectiveness of the proposed automated decision algorithm by using measured data from a large mineral processing plant in Gansu Province, China.
In this article, a novel method, called attention enhanced reinforcement learning (AERL), is proposed to address issues including complex interaction, limited communication range, and time-varying communication topology in multi-agent cooperation. AERL includes a communication enhanced network (CEN), a graph spatiotemporal long short-term memory network (GST-LSTM), and parameter-sharing multi-pseudo-critic proximal policy optimization (PS-MPC-PPO). Specifically, CEN, based on a graph attention mechanism, is designed to enlarge the agents' communication range and to deal with complex interaction among the agents. GST-LSTM, which replaces the standard fully connected (FC) operator in LSTM with a graph attention operator, is designed to capture the temporal dependence while maintaining the spatial structure learned by CEN. PS-MPC-PPO, which extends proximal policy optimization (PPO) to multi-agent systems with parameter sharing to scale training to environments with a large number of agents, is designed with multiple pseudo critics to mitigate the bias problem in training and accelerate convergence. Simulation results for three groups of representative scenarios, including formation control, group containment, and predator-prey games, demonstrate the effectiveness and robustness of AERL.
Most previous studies on multi-agent systems aim to coordinate agents to achieve a common goal, but the lack of scalability and transferability prevents them from being applied to large-scale multi-agent tasks. To deal with these limitations, we propose a deep reinforcement learning (DRL) based multi-agent coordination control method for mixed cooperative-competitive environments. To improve scalability and transferability when applying in large-scale multi-agent systems, we construct inter-agent communication and use hierarchical graph attention networks (HGAT) to process the local observations of agents and received messages from neighbors. We also adopt the gated recurrent units (GRU) to address the partial observability issue by recording historical information. The simulation results based on a cooperative task and a competitive task not only show the superiority of our method, but also indicate the scalability and transferability of our method in various scale tasks.
Human immunodeficiency virus (HIV) is a major public health concern in the United States (U.S.), with about 1.2 million people living with it and about 35,000 newly infected each year. There are considerable geographical disparities in HIV burden and care access across the U.S. The 'Ending the HIV Epidemic (EHE)' initiative by the U.S. Department of Health and Human Services aims to reduce new infections by 90% by 2030, by improving coverage of diagnoses, treatment, and prevention interventions and prioritizing jurisdictions with high HIV prevalence. One of the approaches towards achieving this objective includes developing intelligent decision-support systems that can help optimize resource allocation and intervention strategies. Existing decision analytic models either focus on individual cities or aggregate national data, failing to capture jurisdictional interactions critical for optimizing intervention strategies. To address this, we propose a multi-agent reinforcement learning (MARL) framework that enables jurisdiction-specific decision-making while accounting for cross-jurisdictional epidemiological interactions. Our framework functions as an intelligent resource optimization system, helping policymakers strategically allocate interventions based on dynamic, data-driven insights. Experimental results across jurisdictions in California and Florida demonstrate that MARL-driven policies outperform traditional single-agent reinforcement learning approaches by reducing new infections under fixed budget constraints. Our study highlights the importance of incorporating jurisdictional dependencies in decision-making frameworks for large-scale public initiatives. By integrating multi-agent intelligent systems, decision analytics, and reinforcement learning, this study advances expert systems for government resource planning and public health management, offering a scalable framework for broader applications in healthcare policy and epidemic management.
Robust and fast detection of anatomical structures is a prerequisite for both diagnostic and interventional medical image analysis. Current solutions for anatomy detection are typically based on machine learning techniques that exploit large annotated image databases in order to learn the appearance of the captured anatomy. These solutions are subject to several limitations, including suboptimal feature engineering and, most importantly, computationally suboptimal search schemes for anatomy detection. To address these issues, we propose a method that follows a new paradigm by reformulating the detection problem as a behavior learning task for an artificial agent. We couple the modeling of anatomy appearance and the object search in a unified behavioral framework, using the capabilities of deep reinforcement learning and multi-scale image analysis. In other words, an artificial agent is trained not only to distinguish the target anatomical object from the rest of the body, but also to find the object by learning and following an optimal navigation path to the target in the imaged volumetric space. We evaluated our approach on 1487 3D-CT volumes from 532 patients, totaling over 500,000 image slices, and show that it significantly outperforms state-of-the-art solutions in detecting several anatomical structures, with no failed cases from a clinical acceptance perspective, while also achieving 20-30 percent higher detection accuracy. Most importantly, we improve the detection speed of the reference methods by two to three orders of magnitude, achieving unmatched real-time performance on large 3D-CT scans.
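The agent's learned navigation toward a target landmark can be caricatured as follows; a trained deep Q-network would supply the action values, and the negative Euclidean distance used here is only a stand-in for them:

```python
import numpy as np

# Six unit moves through the volume (±x, ±y, ±z).
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def navigate(start, target, max_steps=200):
    """Greedy landmark search: from each voxel, take the action with the
    highest value estimate until the target is reached."""
    pos, tgt = np.array(start, float), np.array(target, float)
    path = [tuple(pos)]
    for _ in range(max_steps):
        if np.array_equal(pos, tgt):
            break
        # Stand-in value estimate: negative distance after the move.
        values = [-np.linalg.norm(pos + np.array(a) - tgt) for a in ACTIONS]
        pos = pos + np.array(ACTIONS[int(np.argmax(values))])
        path.append(tuple(pos))
    return path
```

The point of the behavioral reformulation is precisely that such a trajectory replaces exhaustive scanning of the volume, which is where the orders-of-magnitude speedup comes from.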
The rapid proliferation of electric vehicles (EVs) and their spatially clustered charging behaviors have imposed unprecedented challenges on the stability, efficiency, and fairness of power distribution networks. Coordinating large-scale EV clusters across geographically distributed charging stations requires intelligent scheduling strategies that can simultaneously respect grid constraints, maximize user satisfaction, and enhance renewable energy utilization, all while safeguarding data privacy and computational scalability. This paper proposes a novel multi-agent cooperative dispatch framework based on Federated Deep Reinforcement Learning (FDRL) to optimize the real-time coordination between EVs, chargers, and the underlying power grid infrastructure. The model adopts a hierarchical structure where local agents independently train deep reinforcement learning policies tailored to site-specific dynamics, while a central aggregator synchronizes global model parameters using federated averaging enhanced by entropy-based reward normalization and fairness-aware weighting. The optimization problem is formulated as a multi-objective constrained Markov decision process (CMDP), featuring long-horizon coupling, grid-aware feasibility, and user-centric reward shaping. Our formulation explicitly integrates peak transformer loading limits, charging demand satisfaction, temporal renewable absorption, and inter-agent equity, thereby capturing the full complexity of EV-grid interactions. A realistic case study involving 1,200 EVs, 60 chargers, and a 33-bus feeder system over 24 hours shows that the proposed FDRL framework achieves a 13.6% reduction in grid operating cost, a 21.4% increase in renewable absorption, and fairness with Jain's index consistently above 0.95, while reducing average state-of-charge (SoC) deviation to below 2.5%. These quantitative results highlight the effectiveness of the framework and confirm its promise as a privacy-preserving, scalable, and equitable solution for next-generation energy-cyber-physical systems.
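The aggregation step at the heart of federated averaging can be sketched as a weighted convex combination of site-specific parameters (the entropy-based reward normalization and fairness-aware weighting would enter through the `weights` argument; this is a simplification, not the paper's scheme):

```python
import numpy as np

def federated_average(local_params, weights):
    """Aggregate per-station policy parameters into a global model.
    `weights` can encode data volume or fairness adjustments; they are
    normalized so the result is a convex combination of the locals."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(p, dtype=float)
               for wi, p in zip(w, local_params))
```

Because only parameter vectors leave each site, raw charging data never crosses the aggregation boundary, which is the privacy-preserving property the abstract refers to.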
Most multi-agent reinforcement learning (MARL) approaches optimize a strategy by improving it in isolation, ignoring the limitation that homogeneous agents may serve only a single function. In reality, however, complex tasks tend to require coordinating various types of agents and leveraging their complementary advantages. How to establish appropriate communication among such agents and optimize their decisions is therefore a vital research issue. To this end, we propose a Hierarchical Attention Master-Slave (HAMS) MARL, in which the hierarchical attention balances weight allocation within and among clusters, and the master-slave architecture endows agents with independent reasoning and individual guidance. With this design, information fusion, especially among clusters, is implemented effectively, excessive communication is avoided, and selective composed actions optimize decisions. We evaluate HAMS on both small- and large-scale heterogeneous StarCraft II micromanagement tasks. The proposed algorithm achieves exceptional performance, with win rates above 80% in all evaluation scenarios and over 90% on the largest map. The experiments demonstrate a maximum improvement in win rate of 47% over the best known algorithm. These results show that our proposal outperforms recent state-of-the-art approaches and provides a novel idea for heterogeneous multi-agent policy optimization.
Wireless networks are trending towards large-scale systems containing thousands of nodes and multiple co-existing applications. Congestion is an inevitable consequence of this scale and complexity, leading to inefficient use of network capacity. This paper proposes an autonomous and adaptive wireless network management framework, utilising multi-agent deep reinforcement learning, to achieve efficient use of the network. Its novel reward function incorporates application awareness and fairness to address both node- and network-level objectives. Our experimental results demonstrate the proposed approach's ability to be optimised for application-specific requirements while also optimising the fairness of the network. The results reveal significant performance benefits in terms of adaptive data rate and increased responsiveness compared to a single-agent approach. Some significant qualitative benefits of the multi-agent approach, including network-size independence, node-led priorities, variable iteration length, and reduced search space, are also presented and discussed.
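A reward of this shape, blending a node-level objective with Jain's fairness index at the network level, might look like the following sketch (the weighting `alpha` and the rate-based objective are assumptions for illustration, not the paper's exact function):

```python
def jain_index(rates):
    """Jain's fairness index: 1.0 when all rates are equal,
    approaching 1/n when one node monopolizes the capacity."""
    total = sum(rates)
    return total * total / (len(rates) * sum(r * r for r in rates))

def node_reward(own_rate, all_rates, alpha=0.5):
    """Blend a node-level objective (own throughput) with a
    network-level one (fairness across all nodes)."""
    return alpha * own_rate + (1 - alpha) * jain_index(all_rates)
```

Raising `alpha` prioritizes individual throughput; lowering it pushes every agent toward network-wide fairness, which is the tension the abstract's reward design addresses.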
Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, the existing methods have limitations in large-scale multiagent systems due to high information redundancy, and they tend to overlook the unstable training process caused by the online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS) containing more information than the existing methods. NVIF compresses the MIS into a compact latent state while adopting neighboring communication. To stabilize the overall training process, we introduce a two-stage training mechanism. We first pretrain the NVIF module using a randomly sampled offline dataset to create a task-agnostic and stable communication protocol, and then use the pretrained protocol to perform online policy training with RL algorithms. Our theoretical analysis indicates that NVIF-proximal policy optimization (PPO), which combines NVIF with PPO, has the potential to promote cooperation with agent-specific rewards. Experiment results demonstrate the superiority of our method in both heterogeneous and homogeneous settings. Additional experiment results also demonstrate the potential of our method for multitask learning.
The job shop scheduling problem (JSSP) is a classic NP-hard problem. This article focuses on a realistic variant of the JSSP incorporating fuzzy processing times, with the objective of minimizing the maximum completion time. We propose a proximal policy optimization with graph transformer (GT-PPO) algorithm, which leverages proximal policy optimization (PPO) as the foundational framework, to address this problem for the first time. First, because the intricate variability in states and actions often leads to suboptimal scheduling outcomes, we refine the representation of states and actions for improved performance. Second, to overcome inherent limitations of conventional graph neural networks (GNNs), including difficulty in handling heterogeneity, over-squashing, and limited ability to capture long-range dependencies, we employ a graph transformer (GT) architecture for the first time in this setting. The transformer effectively captures both the topological relationships in fuzzy disjunctive graph models and the long-range dependencies in large-scale JSSP instances. Additionally, we reduce the computational complexity of the GT to $O(n)$, enabling the agent to derive optimal scheduling solutions for large disjunctive graphs more efficiently and with reduced memory usage. Finally, testing demonstrates the strong robustness of our model across various scales of generated instances and public datasets after a single training session. Notably, on the large-scale DMU and Taillard public datasets, the model exhibits exceptional robustness, further validating its effectiveness on large-scale fuzzy JSSP.
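The PPO foundation referred to above rests on the clipped surrogate objective, which can be written compactly (a generic sketch of standard PPO, not of the GT-PPO specifics):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate: the pessimistic minimum of the
    unclipped and clipped objectives, negated for minimization.
    `ratio` is pi_new(a|s) / pi_old(a|s) per sample."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()
```

The clip keeps each policy update within a trust region of the old policy, which is what makes PPO a stable foundation for the scheduling agent.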
The continuous development of intelligent traffic control systems has a profound influence on urban traffic planning and traffic management. Indeed, as big data and artificial intelligence continue to evolve, the traffic control strategy based on deep reinforcement learning (RL) has been proven to be a promising method to improve the efficiency of intersections and save people's travel time. However, the existing algorithms ignore the temporal and spatial characteristics of intersections. In this article, we propose a multiagent RL based on the deep spatiotemporal attentive neural network (MARL-DSTAN) to determine the traffic signal timing in a large-scale road network. In this model, the state information captures the spatial dependency of the entire road network by leveraging the graph convolutional network (GCN) and integrates the information based on the importance of intersections via the attention mechanism. Meanwhile, to accumulate more valuable samples and enhance the learning efficiency, the recurrent neural network (RNN) is introduced in the exploration stage to constrain the action search space instead of fully random exploration. MARL-DSTAN decomposes the large-scale area into multiple base environments, and the agents in each base environment use the idea of "centralized training and decentralized execution" to learn to accelerate the algorithm convergence. The simulation results show that our algorithm significantly outperforms the fixed timing scheme and several other state-of-the-art baseline RL algorithms.
Distributed artificial intelligence is increasingly being applied to multiple unmanned aerial vehicles (multi-UAVs). This poses challenges to the distributed reconfiguration (DR) required for the optimal redeployment of multi-UAVs in the event of vehicle destruction. This paper presents a multi-agent deep reinforcement learning-based DR strategy (DRS) that optimizes the multi-UAV group redeployment in terms of swarm performance. To generate a two-layer DRS between multiple groups and a single group, a multi-agent deep reinforcement learning framework is developed in which a QMIX network determines the swarm redeployment, and each deep Q-network determines the single-group redeployment. The proposed method is simulated using Python and a case study demonstrates its effectiveness as a high-quality DRS for large-scale scenarios.
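The QMIX network's key property, monotonic mixing of per-agent values into a swarm-level value, can be illustrated in miniature (in QMIX proper the mixing weights come from state-conditioned hypernetworks; the fixed weights here are an illustrative simplification):

```python
import numpy as np

def qmix_total(agent_qs, w, b):
    """QMIX-style monotonic mixing: taking the absolute value of the
    weights guarantees dQ_tot/dQ_i >= 0, so each agent's greedy action
    is consistent with the greedy joint action."""
    return float(np.abs(np.asarray(w, float)) @ np.asarray(agent_qs, float) + b)
```

Monotonicity is what lets each group's deep Q-network act greedily on its own value while the mixed swarm value still improves, which is the premise of the two-layer redeployment strategy.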
Mobile crowdsensing (MCS) has attracted considerable attention in recent years as a new paradigm for large-scale information sensing. Unmanned aerial vehicles (UAVs) play a significant role in MCS tasks and serve as crucial nodes in the newly proposed space-air-ground integrated network (SAGIN). In this paper, we incorporate SAGIN into the MCS task and present a
It is challenging to accurately model the overall uncertainty of a power system connected to large-scale intermittent generation sources such as wind and photovoltaic generation, due to the inherent volatility, uncertainty, and indivisibility of renewable energy. Deep reinforcement learning (DRL) algorithms are introduced as a solution that avoids modeling the complex uncertainties and adapts to fluctuating uncertainty by interacting with the environment and using feedback to continuously improve its strategies. However, the large scale and uncertainty of the system lead to the sparse-reward problem and the high-dimensional space issue in DRL. A hierarchical deep reinforcement learning (HDRL) scheme is designed to decompose the solution process into two stages, using a reinforcement learning (RL) agent in the global stage and a heuristic algorithm in the local stage to find optimal dispatching decisions for power systems under uncertainty. Simulation studies show that the proposed HDRL scheme efficiently solves power system economic dispatch problems under both deterministic and uncertain scenarios, thanks to its adaptation to system uncertainty and its ability to cope with the volatility of uncertain factors, while significantly improving the speed of online decision-making.
The convergence of Mobile Edge Computing (MEC) and network slicing technologies is critical to meeting the diverse quality-of-service (QoS) requirements of 5G/6G. However, multi-tenancy (eMBB/uRLLC/mMTC) competition for resources in dynamic environments challenges traditional centralised allocation methods. In this paper, we propose a cooperative optimization framework for edge network slicing resources that fuses multi-agent reinforcement learning (MARL) with evolutionary game theory (MARL-EGT). The framework models each slice tenant as an agent with autonomous decision-making capability that explores the optimal resource-requesting strategy through interactive learning; meanwhile, evolutionary game dynamics are introduced to model the imitation, learning, and evolution of the slice population's strategies, guiding the system to converge to an efficient evolutionarily stable strategy (ESS). To cope with the large environment state space and the difficulty of agent coordination, a hierarchical attention mechanism and a credit-based contribution evaluation algorithm are designed, significantly improving learning efficiency and convergence speed. In simulation experiments under an MEC scenario constructed from real data, the MARL-EGT scheme significantly outperforms benchmark methods such as federated reinforcement learning (FRL) and non-cooperative gaming (NCG) on key metrics including total system utility, slicing SLA satisfaction rate, and resource utilization, and demonstrates superior adaptability to dynamic environments, offering new ideas for large-scale, intelligent management of edge network slicing resources. The online version contains supplementary material available at 10.1038/s41598-025-33190-5.
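The evolutionary-game side of such a framework typically rests on replicator dynamics, which can be sketched as follows (a generic discrete-time replicator step, not the paper's exact population model):

```python
import numpy as np

def replicator_step(x, payoff, dt=0.05):
    """Discrete replicator dynamics: strategies earning above-average
    payoff grow their share x of the population; the fixed points of
    this flow include the evolutionarily stable strategies."""
    f = payoff @ x                # fitness of each pure strategy
    avg = x @ f                   # population-average fitness
    x_new = x + dt * x * (f - avg)
    return x_new / x_new.sum()    # renormalize against numerical drift
```

Iterating the step from a mixed population drives the share of the strictly better strategy toward 1, mirroring how the slice population's imitation and learning converge to a stable equilibrium.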
Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a well-known density-based clustering algorithm, has gained widespread popularity and usage due to its effectiveness in identifying clusters of arbitrary shapes and handling noisy data. However, it encounters challenges in producing satisfactory cluster results when confronted with datasets of varying density scales, a common scenario in real-world applications. In this paper, we propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN. First, we model the initial dataset as a two-level encoding tree and categorize the data vertices into distinct density partitions according to the information uncertainty determined in the encoding tree. Each partition is then assigned to an agent to find the best clustering parameters without manual assistance. The allocation is density-adaptive, enabling AR-DBSCAN to effectively handle diverse density distributions within the dataset by utilizing distinct agents for different partitions. Second, a multi-agent deep reinforcement learning guided automatic parameter searching process is designed. The process of adjusting the parameter search direction by perceiving the clustering environment is modeled as a Markov decision process. Using a weakly-supervised reward training policy network, each agent adaptively learns the optimal clustering parameters by interacting with the clusters. Third, a recursive search mechanism adaptable to the data's scale is presented, enabling efficient and controlled exploration of large parameter spaces. Extensive experiments are conducted on nine artificial datasets and a real-world dataset. The results of offline and online tasks show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the Normalized Mutual Information (NMI) and the Adjusted Rand Index (ARI) metrics, respectively, but also is capable of robustly finding dominant parameters.
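The per-partition parameter search that the agents perform can be caricatured without the RL machinery as hill climbing over `eps` (the reward function, e.g. a cluster-validity index computed on the resulting clustering, is a stand-in for the paper's weakly-supervised reward):

```python
def search_eps(reward_fn, eps0=0.5, step=0.1, iters=50):
    """Hill-climbing stand-in for the RL parameter search: the agent
    perturbs eps up or down and keeps any move that improves the
    clustering reward, stopping at a local optimum."""
    eps, best = eps0, reward_fn(eps0)
    for _ in range(iters):
        for delta in (step, -step):
            cand = max(1e-6, eps + delta)   # eps must stay positive
            r = reward_fn(cand)
            if r > best:
                eps, best = cand, r
                break
    return eps
```

In AR-DBSCAN the analogous search is a Markov decision process with a learned policy per density partition, so each agent can settle on a different `eps` suited to its partition's density.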
The retail industry faces increasing challenges in matching supply with demand due to evolving consumer behaviors, market volatility, and supply chain disruptions. While existing approaches employ statistical and machine learning methods for demand forecasting, they often fail to capture complex temporal dependencies and lack the ability to simultaneously optimize inventory decisions. This paper proposes a novel multi-agent deep reinforcement learning framework that jointly optimizes demand forecasting and inventory management in retail supply chains, leveraging data from IoT sensors, RFID tracking systems, and smart shelf monitoring devices. Our approach combines transformer-based sequence modeling for demand patterns with hierarchical reinforcement learning agents that coordinate inventory decisions across distribution networks. The framework integrates both historical sales data and real-time sensor measurements, employing attention mechanisms to capture seasonal patterns, promotional effects, and environmental conditions detected through temperature and humidity sensors. Through extensive experiments on large-scale retail datasets incorporating sensor network data, we demonstrate that our method achieves 18.2% lower forecast error and 23.5% reduced stockout rates compared with state-of-the-art baselines. The results show particular improvements in handling promotional events and seasonal transitions, where traditional methods often struggle. Our work provides new insights into leveraging deep reinforcement learning for integrated retail operations optimization and offers a scalable solution for modern sensor-enabled supply chain challenges.
Safety and Restricted Communication are two critical challenges faced by practical Multi-Agent Systems (MAS). However, most Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety, and their applicability is rather limited due to the fully connected communication. To address these issues, we propose a novel framework, Graph-based Safe MARL (GS-MARL), to enhance the safety and scalability of MARL methods. Leveraging the inherent graph structure of MAS, we design a Graph Neural Network (GNN) based on message passing to aggregate local observations and communications of varying sizes. Furthermore, we develop a constrained joint policy optimization method in the setting of local observation to improve safety. Simulation experiments demonstrate that GS-MARL achieves a better trade-off between optimality and safety compared to other methods, and in large-scale communication-limited scenarios GS-MARL achieves a success rate at least 10% higher than the leading baselines. The feasibility of our method is also verified by hardware implementation with Mecanum-wheeled vehicles. Codes and demos are available at https://github.com/finleygou/GS-MARL.
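The message-passing aggregation over variable-size neighborhoods can be sketched as a single mean-aggregation round (GS-MARL's actual GNN is learned, so this only illustrates the fixed-width property):

```python
import numpy as np

def aggregate_messages(obs, adj):
    """One message-passing round over the communication graph: the mean
    of neighbor observations is concatenated with the agent's own, so a
    neighborhood of any size maps to a fixed-width policy input."""
    n = len(obs)
    out = []
    for i in range(n):
        nbrs = [obs[j] for j in range(n) if adj[i][j] and j != i]
        agg = np.mean(nbrs, axis=0) if nbrs else np.zeros_like(obs[i])
        out.append(np.concatenate([obs[i], agg]))
    return np.stack(out)
```

Because the aggregated part has the same dimension whether an agent hears from one neighbor or ten, the same policy network scales to the large, communication-limited scenarios the abstract evaluates.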
This report systematically organizes the literature on agent research into four core dimensions: first, task planning and collaboration agents centered on large language models (LLMs), focusing on reasoning and applications in complex scenarios; second, complex-system control dominated by multi-agent reinforcement learning (MARL), focusing on cooperative optimization in large-scale environments; third, embodied intelligence and human-agent interaction, emphasizing socialized emotional expression and interaction experience in domains such as healthcare and mental health; and fourth, general agent architectures, foundational cognitive theory, and safety research, covering low-level system construction methods, social-dynamics simulation, and safeguards for AI trust and risk. Overall, the field is evolving from single-algorithm optimization toward the joint advancement of model-driven design, embodied perception, and safety and trustworthiness.