Transportation and mobility, agents, large models, algorithms
Multi-Agent Reinforcement Learning for Traffic Signal and Flow Coordination Control
This group of papers focuses on using multi-agent reinforcement learning (MARL) architectures to address signal scheduling, cooperative intersection control, and large-scale network flow optimization in urban road networks, with an emphasis on system-level cooperation and distributed policies.
- Parallel System-Based Predictive Control for Traffic Signals in Large-Scale Road Networks(Xingyuan Dai, Yan Zhang, Yiqing Tang, Hongrui Chen, Jiaming Sun, Fei Cong, Yanan Lu, Yisheng Lv, 2023, 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC))
- Robust Dynamic Bus Control: a Distributional Multi-Agent Reinforcement Learning Approach(Jiawei Wang, Lijun Sun, 2021, IEEE Transactions on Intelligent Transportation Systems)
- Mitigating Bus Bunching via Hierarchical Multi-Agent Reinforcement Learning(Mengdi Yu, Tao Yang, Chun-Xiao Li, Yaohui Jin, Yanyan Xu, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Towards Enhanced Fairness and Sample Efficiency in Traffic Signal Control(Xingshuai Huang, Di Wu, Michael R. M. Jenkin, Benoit Boulet, 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Cooperative Multi-Agent Reinforcement Learning Framework for Edge Intelligence-Empowered Traffic Light Control(Haiyong Shi, Bingyi Liu, Enshu Wang, Weizhen Han, Jinfan Wang, Shihong Cui, Libing Wu, 2024, IEEE Transactions on Consumer Electronics)
- Adaptive Multi-Agent Reinforcement Learning for Dynamic Traffic Signal Optimization In Zero-Emission Urban Mobility(Aakansha Soy, Roohee Khan, Mamta Pandey, 2025, International Journal of Basic and Applied Sciences)
- A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control(Anirudh Satheesh, Keenan Powell, 2025, ACM Journal on Autonomous Transportation Systems)
- An Improved Multi-agent based Data Driven Distributed Adaptive Cooperative Control in Traffic Network Signal Timing(Honghai Ji, H. Yin, Ye Ren, Li Wang, Shida Liu, 2022, 2022 IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS))
- A multi-agent deep reinforcement learning approach for traffic signal coordination(Ta-Yin Hu, Zhuoran Li, 2024, IET Intelligent Transport Systems)
- An Optimization Method for Traffic Signals Oriented Towards Multi-Priority Vehicles(Qinghan Huang, Lei Nie, Dandan Qi, Qiuming Ai, 2024, 2024 5th International Conference on Artificial Intelligence and Computer Engineering (ICAICE))
- Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning(Koorosh Moslemi, Chi-Guhn Lee, 2025, ArXiv Preprint)
- A Multi-Agent Reinforcement Learning Method With Route Recorders for Vehicle Routing in Supply Chain Management(Lei Ren, Xiaoyang Fan, Jin Cui, Zhen Shen, Yisheng Lv, Gang Xiong, 2022, IEEE Transactions on Intelligent Transportation Systems)
- Multi-Objective Task Allocation and Path Planning in Heterogeneous Multi-Robot Systems Using Hierarchical Framework and Reinforcement Learning(S. Lo, R. Chen, 2025, 2025 International Conference on Automation Technology (Automation))
- Multi-Agent Traffic Control System using Deep Deterministic Policy Gradient(S. M. K., Chaitra B V, Dharuvkumar Bhansali, K. P, Dharan Gowda H E, Karthik Mallappa Adarakatti, 2025, 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON))
- Fast Learning for Multi-Agent with Combination of Imitation Learning and Model-Based Learning for Formation Change of Transport Robots(Keisuke Azetsu, A. Budiyanto, Nobutomo Matsunaga, 2024, 2024 International Joint Conference on Neural Networks (IJCNN))
- Traffic flow control using multi-agent reinforcement learning(A. Zeynivand, A. Javadpour, S. Bolouki, A. K. Sangaiah, F. Ja’fari, P. Pinto, W. Zhang, 2022, Journal of Network and Computer Applications)
- GR-MADRL: a multi-agent deep reinforcement learning framework with hamiltonian optimization for task offloading in vehicular fog computing(Samuel Akwasi Frimpong, Mu Han, Wenyi Zheng, Andrew Quansah, 2025, Autonomous Agents and Multi-Agent Systems)
- Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning(Qiang Li, Jin Niu, Qin Luo, Lina Yu, 2025, ArXiv Preprint)
- Integrated Deep Reinforcement Learning Agent Approach for Multi-Lane Cooperative Variable Speed Limit Control(Dedong Xiao, Yitong Wang, Lanjing Xing, Zhongyi Han, Tong Li, Chao Wang, Sihao Chen, 2026, IEEE Access)
- CAV Trio: A Structurally Mobile Regulator for Lane-Wise Mixed Traffic with Hierarchical Deep Learning Control Strategy(Chen Zhang, Yunwen Xu, Dewei Li, Youren Chen, 2025, 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC))
- Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces(Ziyad Sheebaelhamd, Konstantinos Zisis, Athina Nisioti, Dimitris Gkouletsos, Dario Pavllo, Jonas Kohler, 2021, ArXiv Preprint)
- Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control(Qiang Li, Ningjing Zeng, Lina Yu, 2025, ArXiv Preprint)
- Smart Traffic Control using Reinforcement Learning in Urban Transport Systems(Smitha Sharath Shankar, Rudresh. A N, K. Reddy, Gnanasekaran. M, S. Sumanth, 2025, 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON))
- A Clustering-Based Multi-Agent Reinforcement Learning Framework for Finer-Grained Taxi Dispatching(Taha M. Rajeh, Zhipeng Luo, Muhammad Hafeez Javed, Fares Alhaek, Tianrui Li, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Deep reinforcement learning of event-triggered communication and control for multi-agent cooperative transport(Kazuki Shibata, Tomohiko Jimbo, Takamitsu Matsubara, 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA))
- Switching Policies based on Multi-Objective Reinforcement Learning for Adaptive Traffic Signal Control(Takumi Saiki, S. Arai, 2022, 2022 61st Annual Conference of the Society of Instrument and Control Engineers (SICE))
- A Multi-agent Traffic Signal Control System Using Reinforcement Learning(Wei Wu, Geng Haifei, J. An, 2009, 2009 Fifth International Conference on Natural Computation)
- A Multi-agent Ant Colony Optimization Algorithm for Effective Vehicular Traffic Management(S. S, V. S. S., 2020, Lecture Notes in Computer Science)
- Deep Reinforcement Learning-Based Traffic Light Scheduling Framework for SDN-Enabled Smart Transportation System(Neetesh Kumar, Sarthak Mittal, V. Garg, Neeraj Kumar, 2022, IEEE Transactions on Intelligent Transportation Systems)
- A Collaborative Control Scheme for Smart Vehicles Based on Multi-Agent Deep Reinforcement Learning(Liyan Shi, Hairui Chen, 2023, IEEE Access)
- CoMASig: A Collaborative Multi-Agent Signal Control to Support Senior Drivers(J. M. Bailey, Fatemeh Golpayegani, Siobhán Clarke, 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC))
- A Multi-Agent Urban Traffic Control System Cooperated with Dynamic Route Guidance(Xin Chen, Zhao-sheng Yang, Hai-yang Wang, 2006, 2006 International Conference on Machine Learning and Cybernetics)
- Reinforcement Learning-Based Countermeasures Against Attacking UAV Swarms(Jennifer Simonjan, Kseniia Harshina, M. Schranz, 2023, 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT))
- Learn to Refine: Synergistic Multi-Agent Path Optimization for Lifelong Conflict-Free Navigation of Autonomous Vehicles(Junjun Li, Zeyuan Ma, Ting Huang, Yue-jiao Gong, 2025, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2)
- Deep reinforcement learning of event-triggered communication and consensus-based control for distributed cooperative transport(Kazuki Shibata, Tomohiko Jimbo, Takamitsu Matsubara, 2022, Robotics and Autonomous Systems)
- Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning(Rupali Bhati, Sai Krishna Gottipati, Clodéric Mars, Matthew E. Taylor, 2023, ArXiv Preprint)
- Implementing an adaptive traffic signal control algorithm in an agent-based transport simulation(Nico Kühnel, Theresa Thunig, K. Nagel, 2018, Procedia Computer Science)
- Balancing Fairness and Efficiency in Transport Network Design through Reinforcement Learning(D. Michailidis, S. Ghebreab, F. Santos, 2023, Adaptive Agents and Multi-Agent Systems)
- Adaptive traffic signal control in multiple intersections network(K. Benhamza, Hamid Seridi, 2015, Journal of Intelligent & Fuzzy Systems)
- Optimization of Traffic Signal Cooperative Control with Sparse Deep Reinforcement Learning Based on Knowledge Sharing(Lingling Fan, Yu-Shuang Yang, Honghai Ji, Shuangshuang Xiong, 2025, Electronics)
- An Event-Driven Multi Agent System for Scalable Traffic Optimization(Geir Horn, Tomasz Przezdziek, M. Büscher, S. Venticinque, Rocco Aversa, B. Martino, A. Esposito, Pawel Skrzypek, Mark Leznik, 2020, Advances in Intelligent Systems and Computing)
- QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning(Pascal Leroy, Damien Ernst, Pierre Geurts, Gilles Louppe, Jonathan Pisane, Matthia Sabatelli, 2020, ArXiv Preprint)
- AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents' order of action decisions(Shota Takayama, Katsuhide Fujita, 2025, ArXiv Preprint)
- Multi-Agent Reinforcement Learning Based on Representational Communication for Large-Scale Traffic Signal Control(Rohit Bokade, Xiaoning Jin, Chris Amato, 2023, IEEE Access)
- Multi-objective optimization of freeway traffic flow via a fuzzy reinforcement learning method(Zhaohui Yang, Kaige Wen, 2010, 2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE))
- Multi-gated perimeter flow control for monocentric cities: Efficiency and equity(Ruzanna Mat Jusoh, Konstantinos Ampountolas, 2024, ArXiv Preprint)
- MATRICS: A Multi-Agent Deep Reinforcement Learning-Based Traffic-Aware Intelligent Lane-Change System(L. Das, Myounggyu Won, 2025, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Sustainability-Oriented Urban Traffic System Optimization Through a Hierarchical Multi-Agent Deep Reinforcement Learning Framework(Qian Cao, Jing Li, P. Trucco, 2026, Sustainability)
- Multi-Object-Based Efficient Traffic Signal Optimization Framework via Traffic Flow Analysis and Intensity Estimation Using UCB-MRL-CSFL(Zainab Saadoon Naser, H. Marouane, Ahmed Fakhfakh, 2025, Vehicles)
- Efficiency and equity based freeway traffic network flow control(Kaige Wen, Wugang Yang, Shiru Qu, 2010, 2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE))
- Autonomous Separation Assurance in An High-Density En Route Sector: A Deep Multi-Agent Reinforcement Learning Approach(Marc Brittain, Peng Wei, 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC))
- Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections(Antonio Guillen-Perez, M. Cano, 2022, IEEE Transactions on Vehicular Technology)
- Optimization of transport traffic in a simple network using deep learning with reinforcement(V. Levytskyi, 2025, Management of Development of Complex Systems)
- Large Language Model Agents for Adaptive Traffic Signal Control: A Simulation Case Study in Nairobi(Monicah Wambui Hinga, 2025, International Journal of Applied Science and Research)
- Vehicle-Level Fairness-Oriented Constrained Multi-Agent Reinforcement Learning for Adaptive Traffic Signal Control(Wanting Liu, Chengwei Zhang, Wanqing Fang, Kailing Zhou, Yihong Li, Furui Zhan, Qi Wang, Wanli Xue, Rong Chen, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Adaptive Traffic Signal Control with Network-Wide Coordination(Yong Chen, Juncheng Yao, Chunjiang He, Hanhua Chen, Hai Jin, 2017, Lecture Notes in Computer Science)
- Field Deployment of Multi-Agent Reinforcement Learning Based Variable Speed Limit Controllers(Yuhang Zhang, Zhiyao Zhang, Marcos Quiñones-Grueiro, William Barbour, Clay Weston, Gautam Biswas, Dan Work, 2024, 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC))
- A Novel Multi-Agent Deep RL Approach for Traffic Signal Control(Shijie Wang, Shangbo Wang, 2023, 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops))
- Safe Multi-Agent Deep Reinforcement Learning for the Management of Autonomous Connected Vehicles at Future Intersections(Rui Zhao, Kui Wang, Yun Li, Yuze Fan, Fei Gao, Zhenhai Gao, 2025, IEEE Transactions on Parallel and Distributed Systems)
- Learning Correlated Communication Topology in Multi-Agent Reinforcement learning(Yali Du, Bo Liu (Benjamin Liu), V. Moens, Ziqi Liu, Zhicheng Ren, Jun Wang, Xu Chen, Haifeng Zhang, 2021, Adaptive Agents and Multi-Agent Systems)
- An Improved Traffic Signal Control Method Based on Multi-agent Reinforcement Learning(Jianyou Xu, Zhichao Zhang, Shuo Zhang, Jiayao Miao, 2021, 2021 40th Chinese Control Conference (CCC))
- Multi-Armed Bandit Based Traffic Signal Control for Congestion Management(Ke Bai, 2025, ITM Web of Conferences)
- Stop-Free Strategies for Traffic Networks: Decentralized On-line Optimization(M. Tlig, O. Buffet, Olivier Simonin, 2014, Frontiers in Artificial Intelligence and Applications)
- Coordinated dual-objective transit signal priority: a deep reinforcement learning approach(Wenfeng Hu, Yuzhe Lu, Yifan Zhao, Hirotaka Ishihara, Amer Shalaby, B. Abdulhai, 2025, Transportmetrica B: Transport Dynamics)
- Blue Phase: Optimal Network Traffic Control for Legacy and Autonomous Vehicles(David Rey, Michael W Levin, 2018, ArXiv Preprint)
- Application of Traffic Light Control in Oversaturated Urban Network Using Multi-Agent Deep Reinforcement Learning(Ei Ei Mon, H. Ochiai, C. Aswakul, 2024, IEEE Access)
- Data-driven Adaptive Cooperative Control for Urban Traffic Signal Timing in Multi-intersections(Honghai Ji, Hao Liu, Shida Liu, Li Wang, Lingling Fan, 2020, 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS))
- Deep reinforcement learning based cooperative control of traffic signal for multi-intersection network in intelligent transportation system using edge computing(A. Paul, S. Mitra, 2022, Transactions on Emerging Telecommunications Technologies)
- A Deep Reinforcement Learning Approach to Traffic Signal Control(Aquib Junaid Razack, Vysyakh Ajith, Rajesh K. Gupta, 2021, 2021 IEEE Conference on Technologies for Sustainability (SusTech))
- Multi-agent reinforcement learning framework for autonomous traffic signal control in smart cities(O. Olusanya, Yetunde Owosho, I. Daniyan, A. Elegbede, Q. B. Sodipo, A. Adeodu, H. Phuluwa, T. Ramasu, Mukondeleli Grace Kana-Kana Katumba, 2025, Frontiers in Mechanical Engineering)
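As context for the signal-control entries above, here is a minimal sketch of the independent-learner MARL setup that many of these papers extend: one tabular Q-learning agent per intersection, choosing a signal phase from a local traffic state. The state encoding, reward, and hyperparameters below are illustrative assumptions, not taken from any cited paper.

```python
import random
from collections import defaultdict

class SignalAgent:
    """Independent Q-learner controlling one intersection's phase choice."""

    def __init__(self, n_phases=2, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_phases = n_phases
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_phases)  # state -> phase values

    def act(self, state):
        # Epsilon-greedy over the learned per-phase values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_phases)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Tiny demo: a single recurring traffic state where phase 0 clears the queue
# (reward 1.0) and phase 1 leaves it congested (reward 0.0). The agent learns
# to prefer phase 0; its greedy action after training is phase 0.
agent = SignalAgent(epsilon=0.0)
for _ in range(100):
    for phase in (0, 1):
        reward = 1.0 if phase == 0 else 0.0
        agent.update("queue", phase, reward, "queue")
```

In the centralized-training or communication-based variants surveyed above, agents would additionally share observations or gradients; the fully independent form shown here is the common baseline they improve on.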
Autonomous Driving and Cooperative Control of Multi-Agent Systems
This group focuses on autonomous decision-making, path planning, platoon and formation control, obstacle-avoidance strategies, and multi-agent cooperation paradigms for autonomous vehicles and mobile systems in complex, dynamic environments.
- Agent-based planning and control for groupage traffic(Max Gath, O. Herzog, S. Edelkamp, 2013, 2013 10th International Conference and Expo on Emerging Technologies for a Smarter World (CEWIT))
- Situation-aware Path Optimization for Autonomous Systems: A Deep Reinforcement Learning based Approach(P. Ashritha, Srija Nandamuri, Yarra Venkata Vijay, Preeth Raguraman, K. Guravaiah, 2025, 2025 IEEE 9th International Conference on Information and Communication Technology (CICT))
- A Multi-Agent Rollout Approach for Highway Bottleneck Decongestion in Mixed Autonomy(Lu Liu, Maonan Wang, M. Pun, Xi Xiong, 2024, 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC))
- Adaptive Decision Making in Autonomous Campus Shuttles using Deep Deterministic Policy Gradient(O. Aderoba, Rex Ugochukwu Iroaganachi, K. Mpofu, O. Ogundipe, 2025, NIPES Journal of Science and Technology Research)
- A Robust Reinforcement Learning Framework for Platoon Control of Heterogeneous Vehicles Under Uncertain Dynamics(E. Y. Bejarbaneh, Haiping Du, Jun Shen, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Leading Cruise Control in Mixed Traffic Flow(Jiawei Wang, Yang Zheng, Chaoyi Chen, Qing Xu, Keqiang Li, 2020, ArXiv Preprint)
- Multi-Vehicle Mixed-Reality Reinforcement Learning for Autonomous Multi-Lane Driving(Rupert Mitchell, Jenny Fletcher, Jacopo Panerati, Amanda Prorok, 2019, International Joint Conference on Autonomous Agents and Multiagent Systems)
- Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic(Wei Zhou, Dong Chen, Jun Yan, Zhaojian Li, Huilin Yin, Wanchen Ge, 2021, ArXiv Preprint)
- Sustainable Smart Cities through Multi-Agent Reinforcement Learning-Based Cooperative Autonomous Vehicles(Ali Louati, Hassen Louati, Elham Kariri, Wafa Neifar, Mohamed K. Hassan, M. H. Khairi, Mohammed A. Farahat, Heba M. El-Hoseny, 2024, Sustainability)
- Optimal Formation of Autonomous Vehicles in Mixed Traffic Flow(Keqiang Li, Jiawei Wang, Yang Zheng, 2020, ArXiv Preprint)
- Reinforcement Learning-based Optimized Driving Behaviour Framework for Autonomous Vehicles in Intelligent Transportation System(K. Jash, Jugal Patel, Rajesh K. Gupta, N. Jadav, Sudeep Tanwar, Shivani Desai, 2024, 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT))
- Enforcing Cooperative Safety for Reinforcement Learning-Based Mixed-Autonomy Platoon Control(Jingyuan Zhou, Longhao Yan, Jinhao Liang, Kaidi Yang, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Attention-based multi-agent reinforcement learning for traffic flow stability in mountainous tunnel entrances(Mengmeng Duan, 2025, Scientific Reports)
- Coordination for Complex Road Conditions at Unsignalized Intersections: A MADDPG Method with Enhanced Data Processing(Ruo Chen, Yang Zhu, Hongye Su, 2025, Proceedings of the 11th International Conference on Vehicle Technology and Intelligent Transport Systems)
- Utilizing Multi-Agent Deep Reinforcement Learning for Autonomous Intersection Management Systems: A Promising Approach(Mostafa K. Ghaith, Mohamed M. Rehaan, N. Shouman, Y. Abdalla, O. Shehata, 2023, 2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC))
- Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning(Ziyu Cheng, Jinsheng Ren, Zhouxian Jiang, Chenzhihang Li, Rongye Shi, Bin Liang, Jun Yang, 2026, ArXiv Preprint)
- Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations(Ozsel Kilinc, Giovanni Montana, 2018, ArXiv Preprint)
- SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning(Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu, 2024, ArXiv Preprint)
- Decentralized Multi-Agent Reinforcement Learning with Global State Prediction(Josh Bloom, Pranjal Paliwal, Apratim Mukherjee, Carlo Pinciroli, 2023, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Variational Policy Propagation for Multi-agent Reinforcement Learning(Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, Le Song, 2020, ArXiv Preprint)
- Quantifying the effects of environment and population diversity in multi-agent reinforcement learning(Kevin R. McKee, Joel Z. Leibo, Charlie Beattie, Richard Everett, 2021, ArXiv Preprint)
- Accelerating Training in Pommerman with Imitation and Reinforcement Learning(Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar, 2019, ArXiv Preprint)
- Transfer Learning versus Multi-agent Learning regarding Distributed Decision-Making in Highway Traffic(Mark Schutera, Niklas Goby, Dirk Neumann, Markus Reischl, 2018, ArXiv Preprint)
- ACCNet: Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning(Hangyu Mao, Zhibo Gong, Yan Ni, Zhen Xiao, 2017, ArXiv Preprint)
- Learning Distributed Stabilizing Controllers for Multi-Agent Systems(Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma, 2021, ArXiv Preprint)
- Graph cooperation deep reinforcement learning for ecological urban traffic signal control(Liping Yan, Lulong Zhu, Kai Song, Zhaohui Yuan, Yunjuan Yan, Yue Tang, Chan Peng, 2022, Applied Intelligence)
- A Bio-Inspired Cognitive Agent for Autonomous Urban Vehicles Routing Optimization(G. Vitello, A. Alongi, V. Conti, S. Vitabile, 2017, IEEE Transactions on Cognitive and Developmental Systems)
- Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi(F. Bredell, H. A. Engelbrecht, J. C. Schoeman, 2024, ArXiv Preprint)
- Re-conceptualising the Language Game Paradigm in the Framework of Multi-Agent Reinforcement Learning(Paul Van Eecke, Katrien Beuls, 2020, ArXiv Preprint)
- Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments(Luca Castri, Gloria Beraldo, Nicola Bellotto, 2025, ArXiv Preprint)
- Causal Explanations for Sequential Decision-Making in Multi-Agent Systems(Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht, 2023, ArXiv Preprint)
- Altruistic Decision-Making for Autonomous Driving with Sparse Rewards(Jack Geary, Henry Gouk, 2020, ArXiv Preprint)
- A Neural Network Approach Applied to Multi-Agent Optimal Control(Derek Onken, Levon Nurbekyan, Xingjian Li, Samy Wu Fung, Stanley Osher, Lars Ruthotto, 2020, ArXiv Preprint)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning(Barna Pásztor, Ilija Bogunovic, Andreas Krause, 2021, ArXiv Preprint)
- Towards Autonomous Micromobility through Scalable Urban Simulation(Wayne Wu, Honglin He, Chaoyuan Zhang, Jack He, Seth Z. Zhao, Ran Gong, Quanyi Li, Bolei Zhou, 2025, ArXiv Preprint)
- Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning(Zixiang Wang, Hao Yan, Changsong Wei, Junyu Wang, Shi Bo, Minheng Xiao, 2024, Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning)
- AIoT-based smart traffic management system(A. Elbasha, Mohammad M. Abdellatif, 2025, Natural Language Processing, Information Retrieval and AI Trends 2025)
- Intelligent Electric Vehicle Charging Recommendation Based on Multi-Agent Reinforcement Learning(Weijiao Zhang, Hao Liu, Fan Wang, Tong Xu, Haoran Xin, D. Dou, Hui Xiong, 2021, Proceedings of the Web Conference 2021)
- A Traffic Management Framework for On-Demand Urban Air Mobility Systems(Milad Pooladsanj, Ketan Savla, Petros A. Ioannou, 2023, ArXiv Preprint)
- Game theoretic approach on Real-time decision making for IoT-based traffic light control(Khac-Hoai Nam Bui, Jai E. Jung, David Camacho, 2017, Concurrency and Computation: Practice and Experience)
- Evaluation of Railway Intelligent Transportation Systems to Construct Safer Railway Transport Systems with a Novel Decision-Making Model(Ö. F. Görçün, Abrar Hussain, Kifayat Ullah, D. Pamucar, Vladimir Simić, 2025, Transport Policy)
- Autonomous UAV Traffic Surveillance Using Deep Reinforcement Learning for Continuous Congestion Hotspot Monitoring in Kuwait(Ali Fenjan, Huda Nael, Moudhi Almutairi, A. Alajmi, M. Almulla, 2025, 2025 International Conference on Computer and Applications (ICCA))
- Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)(Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang, 2025, ArXiv Preprint)
- Adaptive bidirectional planning framework for enhanced safety and robust decision-making in autonomous navigation systems(Daoming Yu, Shaowen Wang, Yao Xu, Tianqi Wang, Jiaxin Zou, 2025, The Journal of Supercomputing)
- Agents for Traffic Simulation(Arne Kesting, Martin Treiber, Dirk Helbing, 2008, ArXiv Preprint)
- RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System(Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast, 2025, 2025 IEEE 7th International Conference on Cognitive Machine Intelligence (CogMI))
- Toward Adaptive and Coordinated Transportation Systems: A Multi-Personality Multi-Agent Meta-Reinforcement Learning Framework(Songjun Huang, Chuanneng Sun, Ruo-Qian Wang, D. Pompili, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Highway Travel-Time Forecasting with Greenshields Model-Based Cascaded Fuzzy Logic Systems(Miin-Jong Hao, Yu-Xuan Zheng, 2025, Applied Sciences)
- Optimization of Flight Scheduling in Urban Air Mobility Considering Spatiotemporal Uncertainties(Lingzhong Meng, Minggong Wu, Xiang-xi Wen, Zhichong Zhou, Qing-guo Tian, 2025, Aerospace)
- Traffic management model for vehicle re-routing and traffic light control based on Multi-Objective Particle Swarm Optimization(Chaimae El Hatri, J. Boumhidi, 2017, Intelligent Decision Technologies)
- A cognitive internet of things resource allocation method based on multi-agent reinforcement learning algorithm(Rong Wang, Yanjin Shen, Dongtao Wang, Wan-Tong Li, 2026, Scientific Reports)
- Multi-Agent Reinforcement Learning Optimization for Urban Earthquake Emergency Evacuation With GIS-Based Real-Time Decision Support(Zhaodong Zhong, Z. Ren, 2026, Journal of Earthquake and Tsunami)
- Multimodal Transportation Model for Real-Time Prediction and Adaptive Optimization Using Advanced Machine Learning Techniques(Ms. Sweta Kahurke, K. Reddy, 2026, 2026 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI))
- Evolution-Assisted Deep Reinforcement Learning for Fast Charging Station Coordinated Operation(Xiaoying Yang, Yu Gu, Fuhua Jia, Yiran Li, Hongru Wang, Nanjiang Du, Tianxiang Cui, Yujian Ye, Ruibing Bai, 2024, 2024 IEEE Congress on Evolutionary Computation (CEC))
- Collective traffic of agents that remember(Danny Raj M, Arvind Nayak, 2023, ArXiv Preprint)
- CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario(Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, Z. Li, 2019, The World Wide Web Conference)
- Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents(Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin, 2024, ArXiv Preprint)
- Causal-Paced Deep Reinforcement Learning(Geonwoo Cho, Jaegyun Im, Doyoon Kim, Sundong Kim, 2025, ArXiv Preprint)
- Spatiotemporal Heterogeneity of AI-Driven Traffic Flow Patterns and Land Use Interaction: A GeoAI-Based Analysis of Multimodal Urban Mobility(Olaf Yunus Laitinen Imanov, 2026, ArXiv Preprint)
- Experimental Analysis of Two-Dimensional Pedestrian Flow in front of the Bottleneck(Marek Bukáček, Pavel Hrabák, Milan Krbálek, 2014, ArXiv Preprint)
- Multi-agent reinforcement learning for electric vehicles joint routing and scheduling strategies(Yi Wang, D. Qiu, G. Strbac, 2022, 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC))
- A Multi-Agent Reinforcement Learning-Based Resilience Engineering Method for Mobility-as-a-Service(Zhengshu Zhou, Weijie Yu, Tingting Zhao, Qian Long, Yutaka Matsubara, Hiroaki Takada, 2026, IEEE Transactions on Network and Service Management)
- Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks(Ankit Bhardwaj, Rohail Asim, S. Chauhan, Y. Zaki, Lakshmi Subramanian, 2025, AAAI Conference on Artificial Intelligence)
- A Centralized Multi-Agent DRL-Based Trajectory Control Strategy for Unmanned Aerial Vehicle-Enabled Wireless Communications(G. B. Tarekegn, Rong-Terng Juang, B. A. Tesfaw, Hsin-Piao Lin, Huan-Chia Hsu, R. Tarekegn, L. Tai, 2024, IEEE Open Journal of Vehicular Technology)
- Intelligent Railway Capacity and Traffic Management Using Multi-Agent Deep Reinforcement Learning(Stefan Schneider, Anirudha Ramesh, Anne Roets, Ciprian Stirbu, F. Safaei, Faten Ghriss, Jan Wülfing, Mehmet Güra, Nima Sibon, Rick Gentry, Roman Liessner, Thomas Hustache, Thomas Lecat, Umashankar Deekshith, Valerii Markin, Victor Le, Wissam Bejjani, Michael Küpper, Irene Sturm, 2024, 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC))
- Interaction-Aware Hierarchical Representation of Multi-Vehicle Reinforcement Learning for Cooperative Control in Dense Mixed Traffic(Yuxin Cai, Zhengxuan Liu, Xiangkun He, Z. Zuo, W. Yau, Chen Lv, 2024, 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC))
- Dynamic Repositioning in Dock-less Bike-sharing System: A Multi-agent Reinforcement Learning Approach(Xinghua Li, Xinyuan Zhang, Cheng Cheng, Wei Wang, Chaowei Yang, 2022, 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC))
- Multi-Agent Reinforcement Learning for Distributed Cooperative Vehicular Positioning(Bernardo Camajori Tedeschini, Mattia Brambilla, Monica Nicoli, M. Win, 2025, IEEE Transactions on Intelligent Vehicles)
- Urban Network Traffic Analysis, Data Imputation, and Flow Prediction based on Probabilistic PCA Model of Traffic Volume Data(Muhammad Farhan Fathurrahman, H. Sutarto, I. Šemanjski, 2021, 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA))
- Traffic control using intelligent timing of traffic lights with reinforcement learning technique and real-time processing of surveillance camera images(Mahdi Jamebozorg, Mohsen Hami, Sajjad Deh Deh Jani, 2024, ArXiv Preprint)
- Integration of Regional Demand Management and Signals Control for Urban Traffic Networks(Zhao Zhou, Shu Lin, W. Du, Haili Liang, 2019, IEEE Access)
- MERL: Multi-Head Reinforcement Learning(Yannis Flet-Berliac, Philippe Preux, 2019, ArXiv Preprint)
- Safety-constrained Reinforcement Learning with Interaction-aware for Decision-making of Autonomous Driving(Di Zhang, Haonan Luo, Honglin Dong, Jianfeng Lu, 2025, 2025 IEEE International Conference on Multimedia and Expo (ICME))
- Mobility-Aware Resource Allocation for mmWave IAB Networks: A Multi-Agent Reinforcement Learning Approach(Bibo Zhang, Ilario Filippini, 2022, ArXiv Preprint)
- Fuzzy-integrated multi-agent system for autonomous smart city traffic management(Goverdhan Reddy Jidiga, Chaitali Bhattacharya, Cindhe Ramesh, Avneesh Vashistha, Kriti Srivastava, Sajjan Singh, 2025, International Journal of Information Technology)
- Adversarial Deep Reinforcement Learning Attacks on Multi-Agent Autonomous Cooperative Driving Policies(A. Alzubaidi, A. Al-Sumaiti, Majid Khonji, 2025, IET Intelligent Transport Systems)
- Receding Horizon Control Using Graph Search for Multi-Agent Trajectory Planning(Patrick Scheffe, Matheus V. A. Pedrosa, K. Flaßkamp, Bassam Alrifaee, 2021, IEEE Transactions on Control Systems Technology)
- Refined Hardness of Distance-Optimal Multi-Agent Path Finding(Tzvika Geft, Dan Halperin, 2022, ArXiv Preprint)
- Multi-task dispatch of shared autonomous electric vehicles for Mobility-on-Demand services – combination of deep reinforcement learning and combinatorial optimization method(Ning Wang, Jiahui Guo, 2022, Heliyon)
- Model-Free Optimal Control of Linear Multi-Agent Systems via Decomposition and Hierarchical Approximation(Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, 2020, ArXiv Preprint)
- Price of anarchy of traffic assignment with exponential cost functions(Jianglin Qiao, D. Jonge, Dongmo Zhang, Simeon Simoff, Carles Sierra, Bo Du, 2023, Autonomous Agents and Multi-Agent Systems)
- Deep Reinforcement Learning on Autonomous Driving Policy With Auxiliary Critic Network(Yuanqing Wu, Siqin Liao, Xiang Liu, Zhihang Li, Renquan Lu, 2021, IEEE Transactions on Neural Networks and Learning Systems)
- Q-learning based intelligent multi-objective particle swarm optimization of light control for traffic urban congestion management(Chaimae El Hatri, J. Boumhidi, 2016, 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt))
- Rebalancing shared mobility-on-demand systems: A reinforcement learning approach(Jian Wen, Jinhuan Zhao, Patrick Jaillet, 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC))
- Trust-based Consensus in Multi-Agent Reinforcement Learning Systems(Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi, 2022, ArXiv Preprint)
- Exploring Hierarchy-Aware Inverse Reinforcement Learning(Chris Cundy, Daniel Filan, 2018, ArXiv Preprint)
- Multi-Objective Multi-Agent Deep Reinforcement Learning for Dynamic Pricing and RideSharing Optimization(Cheng-Fen Hsueh, Ta-Yin Hu, 2025, 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC))
- Cooperative Urban Air Mobility Trajectory Design for Power and AoI Optimization: A Multi-Agent Reinforcement Learning Approach(Hyeonsu Kim, P. Aung, M. S. Munir, Walid Saad, Choong Seon Hong, 2025, IEEE Transactions on Vehicular Technology)
- Blockchain-Integrated Multiagent Deep Reinforcement Learning for Securing Cooperative Adaptive Cruise Control(G. Raja, Kottilingam Kottursamy, K. Dev, R. Narayanan, Ashmitha Raja, K. Karthik, 2022, IEEE Transactions on Intelligent Transportation Systems)
- Deep Dyna-Q for Rapid Learning and Improved Formation Achievement in Cooperative Transportation(A. Budiyanto, N. Matsunaga, 2023, Automation)
- Route Orchestrator: An Integrated Multi-Agent Path Finding Framework with Hybrid Path Conversion to SUMO(Fatih Kaya, Pinar Kirci, 2025, 2025 Innovations in Intelligent Systems and Applications Conference (ASYU))
- Learning to Drive in the NGSIM Simulator Using Proximal Policy Optimization(Yang Zhou, Yunxing Chen, 2023, Journal of Advanced Transportation)
- Strategic bidding in freight transport using deep reinforcement learning(W. V. Heeswijk, 2021, Annals of Operations Research)
- Research on Obstacle Avoidance Strategy of Automated Heavy Vehicle Platoon in High-Speed Scenarios(Yingfeng Cai, Lili Zhan, Xiaoqiang Sun, Yubo Lian, Youguo He, Long Chen, 2024, Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering)
- Collaborative Decision-making in Heterogeneous UAV Swarms based on Multi-agent Deep Reinforcement Learning(Feng Yang, Zhi Li, Jiahao Fu, 2024, 2024 39th Youth Academic Annual Conference of Chinese Association of Automation (YAC))
- Multi-agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control(Zeno Woywood, J. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer, 2024, Lecture Notes in Computer Science)
- A Deep Reinforcement Learning Framework for Real-Time Vehicle Routing with Dynamic Traffic Conditions(Antonio Salcuni, Gaetano Volpe, A. M. Mangini, M. Fanti, 2025, 2025 IEEE International Conference on Recent Advances in Systems Science and Engineering (RASSE))
- The Quest for a Common Model of the Intelligent Decision Maker(Richard S. Sutton, 2022, ArXiv Preprint)
- Model-free Motion Planning of Autonomous Agents for Complex Tasks in Partially Observable Environments(Junchao Li, Mingyu Cai, Zhen Kan, Shaoping Xiao, 2023, ArXiv Preprint)
- Learning to Control and Coordinate Mixed Traffic Through Robot Vehicles at Complex and Unsignalized Intersections(Dawei Wang, Weizi Li, Lei Zhu, Jia Pan, 2023, ArXiv Preprint)
- Learning Reward Machines in Cooperative Multi-Agent Tasks(Leo Ardon, Daniel Furelos-Blanco, Alessandra Russo, 2023, ArXiv Preprint)
- A Game-Theoretic Approach to Multi-Agent Trust Region Optimization(Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang, 2021, ArXiv Preprint)
- Data Technology Triad: A Model towards Integrated Autonomous Transportation (IAT) Networks(Nistor Andrei, Cezar Scarlat, 2025, Proceedings of the International Conference on Business Excellence)
- Control for Autonomous Intersection Management Based on Adaptive Control Barrier Function(Jie Song, Mikhail M. Svinin, Naoki Wakamiya, 2025, Applied Sciences)
- Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning(Satchit Chatterji, Erman Acar, 2024, ArXiv Preprint)
- Deep Reinforcement Learning for Multi-Agent Interaction(Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht, 2022, ArXiv Preprint)
- Online Learning of Interaction Dynamics with Dual Model Predictive Control for Multi-Agent Systems Using Gaussian Processes(T. M. J. T. Baltussen, A. Katriniok, E. Lefeber, R. Tóth, W. P. M. H. Heemels, 2024, ArXiv Preprint)
LLM- and Multimodal-AI-Driven Intelligent Transportation Perception, Analysis, and Decision-Making
This group of papers showcases new paradigms for applying large language models (LLMs) to traffic semantic understanding, prediction-model optimization, complex decision support, and human mobility pattern analysis.
- FDALLM+: A Functional Data Analysis-Driven Large-Language Model Framework for Network Traffic Prediction(Yujie Sun, Xuyu Wang, Guanqun Cao, Shiwen Mao, 2025, IEEE Internet of Things Journal)
- Large language model empowered smart city mobility(Yong Chen, Haoyu Zhang, Chuanjia Li, Ben Chi, Xiqun (Michael) Chen, Jianjun Wu, 2025, Frontiers of Engineering Management)
- Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management(Hanyu Ding, Yawei Du, Zhengyu Xia, 2025, Applied Sciences)
- Visual Reasoning at Urban Intersections: Fine-Tuning GPT-4O for Traffic Conflict Detection(Sari Masri, Huthaifa I. Ashqar, Mohammed Elhenawy, 2025, 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI))
- TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer(Jiawei Wang, Chuang Yang, Jiawei Yong, Xiaohang Xu, Hongjun Wang, Noboru Koshizuka, Shintaro Fukushima, Ryosuke Shibasaki, Renhe Jiang, 2026, ArXiv Preprint)
- Mixture of Semantic and Spatial Experts for Explainable Traffic Prediction(Yang Hu, Shaobo Li, Dawen Xia, Zhiheng Zhou, Wenyong Zhang, Huaqing Li, Xingxing Zhang, Senzhang Wang, 2025, Proceedings of the 34th ACM International Conference on Information and Knowledge Management)
- GPT4STP: A Novel Ship Trajectory Prediction Method Based on Pre-trained Large Language Model(Han Jiao, Huanhuan Li, Jasmine Siu Lee Lam, Zaili Yang, 2025, 2025 8th International Conference on Transportation Information and Safety (ICTIS))
- Eixão-UAM: LLM-assisted iterative design of a low-altitude urban air mobility corridor in Brasilia(Weigang Li, Juliano Adorno Maia, Emilia Stenzel, Lucas Ramson Siefert, 2025, Frontiers of Information Technology & Electronic Engineering)
- A Foundational individual Mobility Prediction Model based on Open-Source Large Language Models(Zhenlin Qin, Leizhen Wang, Francisco Camara Pereira, Zhe Ma, 2025, Transportation Research Part C: Emerging Technologies)
- A Self-Supervised Multi-Agent Large Language Model Framework for Customized Traffic Mobility Analysis Using Machine Learning Models(Fengze Yang, Xiaoyue Cathy Liu, Lingjiu Lu, Bingzhang Wang, Chenxi Liu, 2025, Transportation Research Record: Journal of the Transportation Research Board)
- Enhancing Taxi Demand Prediction with Limited Data using a Spatial-Temporal Large Language Model(Jing Chen, Yu Wang, Ruimin Li, Changwei Yuan, Changming Zong, Minghui Zhang, Zhang Chen, 2025, Transportation Research Record: Journal of the Transportation Research Board)
- ChenSi Big Model: LLM Intelligent Emergency Assistance Decision System Based on LangChain Framework(Jingxian Wang, Lida Huang, Tao Chen, Xiuzhong Yang, Ying Zang, Zhili Feng, Ning Wei, Xuerui Chen, Zhipeng Li, 2024, 2024 8th Asian Conference on Artificial Intelligence Technology (ACAIT))
- SOLID: a Framework of Synergizing Optimization and LLMs for Intelligent Decision-Making(Yinsheng Wang, Tario G You, Léonard Boussioux, Shan Liu, 2025, ArXiv Preprint)
- Intelligent Traffic Violation Detection and Explanation System Using Computer Vision and Large Language Models(Srishti Kaur, Muskan, Preeti Khera, 2025, 2025 International Conference on Digital Innovations for Sustainable Solutions (ICDISS))
- Harnessing Large Language Models for Predicting Mobility Modes(Amir Badawi, Ana-Maria Olteanu-Raimond, A. L. Guilcher, Karine Zeitouni, 2025, 2025 26th IEEE International Conference on Mobile Data Management (MDM))
- Training Agents using Upside-Down Reinforcement Learning(Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber, 2019, ArXiv Preprint)
- AI Agents: Evolution, Architecture, and Real-World Applications(Naveen Krishnan, 2025, ArXiv Preprint)
- Urban Air Mobility as a System of Systems: An LLM-Enhanced Holonic Approach(Ahmed R. Sadik, Muhammad Ashfaq, Niko Mäkitalo, T. Mikkonen, 2025, 2025 20th Annual System of Systems Engineering Conference (SoSE))
- From Detection to Action: A Multimodal AI Framework for Traffic Incident Response(Afaq Ahmed, Muhammad Farhan, Hassan Eesaar, K. Chong, Hilal Tayara, 2024, Drones)
- Leveraging Large Language Models (LLMs) for Traffic Management at Urban Intersections: The Case of Mixed Traffic Scenarios(Sari Masri, Huthaifa I. Ashqar, Mohammed Elhenawy, 2024, ArXiv Preprint)
- AutoFPDesigner: Automated Flight Procedure Design Based on Multi-Agent Large Language Model(Longtao Zhu, Hongyu Yang, Ge Song, Xin Ma, Yanxin Zhang, Yulong Ji, Jinchang Ren, 2026, IEEE Transactions on Intelligent Transportation Systems)
- A Language Agent for Autonomous Driving(Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang, 2023, ArXiv Preprint)
- LLM-Augmented Reinforcement Learning for Adaptive and Intelligent Decision-Making(Gopinath Karunanithi, Yatindra Kumar Gupta, Mallesh Deshapaga, Somnath Banerjee, Vandana Roy, 2025, 2025 World Conference on Cutting-Edge Science and Technology (WCCEST))
- LLM-LCSA: LLM for Collaborative Control and Decision Optimization in UAV Cluster Security(Hua Song, Zheng Yang, Haitao Du, Yuting Zhang, Jie Zeng, Xinxin He, 2025, Drones)
- SEDM: A Safety‐Enhanced Decision‐Making Framework for Autonomous Driving by Integrating Large Language Models and XGBoost(Jun Li, Baozhu Chen, Kai Xu, Xiaohan Yang, Mengting Sun, Guojun Li, Haojie Du, 2026, IET Intelligent Transport Systems)
- Research on AI-Agent-Based Resource Dispatch Methods for Public Health Emergencies(Jiaying Zeng, Lei Chen, Xingying Liu, 2025, Proceedings of the 2025 6th International Conference on Computer Science and Management Technology)
- Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring(Murat Arda Onsu, Poonam Lohan, B. Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy, 2025, 2025 IEEE Symposium on Computers and Communications (ISCC))
- Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning(Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens, 2025, ArXiv Preprint)
- Towards an Aviation Large Language Model by Fine-tuning and Evaluating Transformers(David Nielsen, Stephen S. B. Clarke, Krishna M. Kalyanam, 2024, 2024 AIAA DATC/IEEE 43rd Digital Avionics Systems Conference (DASC))
- Demystifying Instruction Mixing for Fine-tuning Large Language Models(Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin, 2023, ArXiv Preprint)
- Exploring Advanced Large Language Models with LLMsuite(Giorgio Roffo, 2024, ArXiv Preprint)
- A Large Language Model Agent-Guided Multi-agent System for Adaptive Traffic Signal Control(Minglu Zhu, Congcong Zhu, 2025, Lecture Notes in Computer Science)
- Leveraging LLM Decision-Making in the Internet of Drone Things (IoDT) Ecosystem(Fatima Shibli, Burak Tufekci, Cihan Tunc, Robin Laidig, 2025, ACM Journal on Autonomous Transportation Systems)
- MobGLM: A Large Language Model for Synthetic Human Mobility Generation(Kunyi Zhang, Y. Pang, Yurong Zhang, Y. Sekimoto, 2024, Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems)
- Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation(Yu-Lun Song, Chung-En Tsern, Che-Cheng Wu, Yu-Ming Chang, Syuan-Bo Huang, Wei-Chu Chen, Michael Chia-Liang Lin, Yu-Ta Lin, 2025, ArXiv Preprint)
- Leveraging Large Language Models (LLMs) for Traffic Management at Urban Intersections(Sari Masri, Huthaifa I. Ashqar, Mohammed Elhenawy, 2024, 2025 Sixteenth International Conference on Ubiquitous and Future Networks (ICUFN))
- ALM: A Two-Stage Traffic Anomaly Detection and Analysis System via the Large Language Model(Songyun Wu, Enhuan Dong, H. He, Zhiliang Wang, Dongqi Han, Haina Hu, Jiahai Yang, 2025, NOMS 2025-2025 IEEE Network Operations and Management Symposium)
- Spatial-Temporal Large Language Model for Traffic Prediction(Chenxi Liu, Sun Yang, Qianxiong Xu, Zhishuai Li, Cheng Long, Ziyue Li, Rui Zhao, 2024, 2024 25th IEEE International Conference on Mobile Data Management (MDM))
- GSF-LLM: Graph-Enhanced Spatio-Temporal Fusion-Based Large Language Model for Traffic Prediction(Honggang Wang, Ye Li, Wenzhi Zhao, Hao Zhu, Jin Zhang, Xuening Wu, 2025, Sensors)
- Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System(Shangyu Lou, 2025, ArXiv Preprint)
- Agent-Based Traffic Monitoring and Regulation System for SDN(Xiaodan Guo, Falong Zhou, Weihao Du, Mengdie Deng, Xun Zhu, Juming Bao, 2025, 2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI))
- Real-Time Traffic Insights With Physics-Informed Neural Networks: Integrating the Aw-Rascle Model and LLMs(Tewodros Syum Gebre, Simachew Endale Ashebir, J. Blay, Matilda Anokye, Venktesh Pandey, L. Hashemi-Beni, 2025, IEEE Access)
- Learning universal human mobility patterns with a foundation model for cross-domain data fusion(Haoxuan Ma, Xishun Liao, Yifan Liu, Qinhua Jiang, Chris Stanford, Shangqing Cao, Jiaqi Ma, 2025, Transportation Research Part C: Emerging Technologies)
- Multimodal Intelligent Perception at an Intersection: Pedestrian and Vehicle Flow Dynamics Using a Pipeline-Based Traffic Analysis System(B. Chang, H. Tsai, Chen-Chia Chen, 2026, Electronics)
Air Traffic, Vehicular Communication, and Resource Scheduling Optimization
This group focuses on resource allocation, communication power management, task scheduling, and system robustness assurance in the low-altitude economy (urban air mobility), vehicular networks (V2X/IoV), and edge computing environments.
- Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction(Hongyuan Zhan, Gabriel Gomes, Xiaoye S. Li, Kamesh Madduri, Kesheng Wu, 2018, ArXiv Preprint)
- Intermodal Journey Planning to Transportation Hubs in a Microscopic Environment: A Multi-Objective Multi-Agent Reinforcement Learning Optimization Framework(Dominik Wittenberg, Nick Schade, Jürgen Pannek, 2026, 2026 IEEE/SICE International Symposium on System Integration (SII))
- The Multi-agent Simulation-based Framework for Optimization of Detectors Layout in Public Crowded Places(N. Butakov, D. Nasonov, K. V. Knyazkov, Vladislav A. Karbovskii, Yulia Chuprova, 2015, Procedia Computer Science)
- Exploiting Near Time Forecasting From Social Network To Decongest Traffic(Deepika Pathania, Kamalakar Karlapalem, 2015, ArXiv Preprint)
- AI-Optimized EV Charging Coordination Algorithm for Intelligent Urban Sustainable Mobility Ecosystems(R. Bhandari, George Willy Tusabomu, Amrit Sapkota, 2025, 2025 International Conference on Intelligent Innovations in Engineering and Technology (ICIIET))
- KI-GCNN: Knowledge-Informed Graph Convolutional Neural Network for Multiclass Trajectory Prediction at Signalized Intersections(Zhuangzhuang Chen, Xi Wang, 2025, 2025 25th International Conference on Software Quality, Reliability and Security (QRS))
- DeepCrowd: A Deep Model for Large-Scale Citywide Crowd Density and Flow Prediction(Renhe Jiang, Z. Cai, Zhaonan Wang, Chuang Yang, Z. Fan, Quanjun Chen, K. Tsubouchi, Xuan Song, R. Shibasaki, 2023, IEEE Transactions on Knowledge and Data Engineering)
- The Geospatial Generalization Problem: When Mobility Isn't Mobile(M. Tenzer, Z. Rasheed, K. Shafique, 2023, Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems)
- Eco-Vehicular Edge Networks for Connected Transportation: A Distributed Multi-Agent Reinforcement Learning Approach(Md Ferdous Pervej, Shih-Chun Lin, 2020, 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall))
- SMART-eFlo: An Integrated SUMO-Gym Framework for Multi-Agent Reinforcement Learning in Electric Fleet Management Problem(Shuo Liu, Yunhao Wang, Xu Chen, Yongjie Fu, Xuan Di, 2022, 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC))
- Generative and Adaptive AI for Sustainable Supply Chain Design(Sabina-Cristiana Necula, Emanuel Rieder, 2025, Journal of Theoretical and Applied Electronic Commerce Research)
- Secure and Intelligent Low-Altitude Infrastructures: Synergistic Integration of IoT Networks, AI Decision-Making and Blockchain Trust Mechanisms(Yuwen Ye, Xirun Min, Xiangwen Liu, Xiangyin Chen, Kefan Cao, S. Howlader, Xiao Chen, 2025, Sensors)
- Joint Trajectory and Spectrum Optimization in Advanced Air Mobility via Multi-Agent Deep Reinforcement Learning(Qingyang Li, Hongxiang Li, E. Knoblock, 2025, 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring))
- A Deep Spatio-Temporal Graph Learning Framework for Real-Time Urban Accessibility Prediction and Facility Optimization in Smart Cities(Feiyang Xu, 2026, 2026 3rd International Conference on Smart City and Information System (ICSCIS))
- Toward Sustainable Mobility: A Hybrid Quantum–LLM Decision Framework for Next-Generation Intelligent Transportation Systems(N. Jabeur, 2025, Sustainability)
- Distributed Multi-Agent Reinforcement Learning on a Hierarchical Game Model for Railway Engineering Data Collaborative Edge Caching(Yannan Wang, Zhen Liu, Chong Geng, Yidong Li, Xinyu Liu, Q. Gao, 2025, IEEE Transactions on Intelligent Transportation Systems)
- Mapping Socio-Economic Divides with Urban Mobility Data(Yingche Liu, Mengyang Li, 2025, ArXiv Preprint)
- Diffusion-based auction mechanism for efficient resource management in 6G-enabled vehicular metaverses(Jiawen Kang, Yongju Tong, Yue Zhong, Junlong Chen, Minrui Xu, D. Niyato, Runrong Deng, Shiwen Mao, 2024, Science China Information Sciences)
- Privacy-Preserving Reinforcement Learning Framework for V2X Resource Management(Sanjay Kumar, Suman, B. Madhavi, S. Monika, Jaseenash R., Dendi Swathi, L. B., 2025, 2025 IEEE 5th International Conference on ICT in Business Industry & Government (ICTBIG))
- Coordinating Cooperative Perception in Urban Air Mobility for Enhanced Environmental Awareness(Timo Häckel, Luca von Roenn, Nemo Juchmann, Alexander Fay, Rinie Akkermans, Tim Tiedemann, Thomas C. Schmidt, 2024, ArXiv Preprint)
- Joint Velocity and Spectrum Optimization in Urban Air Transportation System via Multi-Agent Deep Reinforcement Learning(Ruixuan Han, Hongxiang Li, E. Knoblock, Michael R. Gasper, R. Apaza, 2023, IEEE Transactions on Vehicular Technology)
- Vehicle Edge Computing Task Offloading Strategy Based on Multi-Agent Deep Reinforcement Learning(Jia Bo, Xuanpeng Zhao, 2025, Journal of Grid Computing)
- Energy-Optimal Multi-Agent Navigation as a Strategic-Form Game(Logan Beaver, 2024, ArXiv Preprint)
- Dynamic Spectrum Allocation in Urban Air Transportation System via Deep Reinforcement Learning(Ruixuan Han, Hongxiang Li, E. Knoblock, R. Apaza, 2021, 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC))
- ASTM: Autonomous Smart Traffic Management System Using Artificial Intelligence CNN and LSTM(Christofel Rio Goenawan, 2024, ArXiv Preprint)
- Enhancing Cyber-Resilience in Electric Vehicle Charging Stations: A Multi-Agent Deep Reinforcement Learning Approach(Reza Sepehrzad, Mohammad Javad Faraji, A. Al‐Durra, M. Sadabadi, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Joint Optimization of Time-Frequency Resources for Multi-Predictor Antennas Sounding Reference Signal in High-Speed Railway(Shaoyou Ao, Yong Niu, Yu-Cong Qiao, Ziqi Guo, Maoyuan Jin, Zhu Han, Ning Wang, Bo Ai, 2024, IEEE Transactions on Vehicular Technology)
- Cooperative Edge Caching Based on Elastic Federated and Multi-Agent Deep Reinforcement Learning in Next-Generation Network(Qiong Wu, Wenhua Wang, Pingyi Fan, Qiang Fan, Huiling Zhu, Khaled B. Letaief, 2024, ArXiv Preprint)
- Trajectory Control and Fair Communications for Multi-UAV Networks: A Federated Multi-Agent Deep Reinforcement Learning Approach(G. B. Tarekegn, B. A. Tesfaw, Rong-Terng Juang, Dola Saha, R. Tarekegn, Hsin-Piao Lin, L. Tai, 2025, IEEE Transactions on Wireless Communications)
- Joint Communication Resource Allocation and Velocity Optimization in Advanced Air Mobility via Multi-Agent Reinforcement Learning(Ruixuan Han, Summer Li, E. Knoblock, Michael R. Gasper, Hongxiang Li, R. Apaza, 2023, GLOBECOM 2023 - 2023 IEEE Global Communications Conference)
- SafeLinkX: Hybrid DSRC-5G/6G Optimization for Safety-Critical Vehicular Communication(J. Surendiran, K. A., 2026, 2026 4th International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT))
- Density fluctuations and phase separation in a traffic flow model(S. Lubeck, M. Schreckenberg, K. D. Usadel, 1998, ArXiv Preprint)
- Multi-agent deep reinforcement learning with traffic flow for traffic signal control(Liang Hou, Dailin Huang, Jie Cao, Jialin Ma, 2023, Journal of Control and Decision)
- Age of Information Optimization in UAV-enabled Intelligent Transportation System via Deep Reinforcement Learning(Xinmin Li, Jiahui Li, B. Yin, Jiaxin Yan, Yuan Fang, 2022, 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall))
- Data prioritization aware resource allocation in internet of vehicles using multi-agent deep reinforcement learning(Cong Wang, Yingshan Guan, Sancheng Peng, Hao Chen, Guorui Li, 2025, Neural Networks)
- FedMATD3: A Federated Reinforcement Learning Approach for Global Optimization in Multi-Agent Vehicular Task Offloading(Lingqiu Zeng, Y. Huang, Fukun Xie, Qingwen Han, L. Ye, 2025, 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC))
- Multi-Agent Reinforcement Learning for Efficient Resource Allocation in Internet of Vehicles(Jun-han Wang, He He, Jaesang Cha, Incheol Jeong, Chang-Jun Ahn, 2025, Electronics)
- Multi-Agent Deep Reinforcement Learning for Enhancement of Distributed Resource Allocation in Vehicular Network(Odilbek Urmonov, H. Aliev, Hyungwon Kim, 2023, IEEE Systems Journal)
- AoI-Aware Joint Scheduling and Power Allocation in Intelligent Transportation System: A Deep Reinforcement Learning Approach(Guangming Bai, Long Qu, Juan Liu, Dechao Sun, 2024, IEEE Transactions on Vehicular Technology)
- Adaptive Trajectory Optimization for UAV-IRS Systems in 6G Thz Networks Using Multi Agent-DRL(Nahla Nur Elmadina, Rashid A. Saeed, Elsadig Saeid, A. Awouda, Hana M. Mujlid, H. Elshafie, 2025, Transport and Telecommunication Journal)
- Multi-Agent Reinforcement Learning for Intelligent V2G Integration in Future Transportation Systems(Jiawei Dong, Abdulsalam Yassine, Andy Armitage, M. Shamim Hossain, 2023, IEEE Transactions on Intelligent Transportation Systems)
- Multi-Agent Reinforcement Learning for Slicing Resource Allocation in Vehicular Networks(Yaping Cui, Hongji Shi, Ruyan Wang, Peng He, Dapeng Wu, Xinyun Huang, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Priority-Aware Multi-Agent Deep Reinforcement Learning for Resource Scheduling in C-V2X Mode 4 Communication(Ahmed Thair Shakir, Barbara M. Masini, Nemer Radhwan Khudhair, R. Nordin, Angela Amphawan, 2025, IEEE Access)
- Collaborative Multi-Agent Resource Allocation in C-V2X Mode 4(M. Saad, Md. Mahmudul Islam, M. Tariq, Muhammad Toaha Raza Khan, Dongkyun Kim, 2021, 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN))
- Multi-Agent Energy Optimization in Connected HEV Convoys via MPC and Reinforcement Learning(W. A. Hashim, Shadan M. Abdalwahid, S. Kareem, Sami Ismael, Ali T. Al-khwaldeh, 2025, 2025 5th International Conference on Applied Automation and Industrial Diagnostics (ICAAID))
Urban Traffic State Sensing and Behavior Pattern Analysis
This group mainly covers traffic state detection at both the macroscopic and microscopic urban levels, extraction of human mobility features, and analyses of how traffic behavior evolves.
- Behavioural Change Support Intelligent Transportation Applications(Efthimios Bothos, Babis Magoutas, Brian Caulfield, Athena Tsirimpa, Maria Kamargianni, Panagiotis Georgakis, Gregoris Mentzas, 2017, ArXiv Preprint)
- Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking(Hiroyuki Kasai, Wolfgang Kellerer, Martin Kleinsteuber, 2016, ArXiv Preprint)
- Semantics-Aware Hidden Markov Model for Human Mobility(Hongzhi Shi, Hancheng Cao, Xiangxin Zhou, Yong Li, Chao Zhang, V. Kostakos, Funing Sun, Fanchao Meng, 2019, IEEE Transactions on Knowledge and Data Engineering)
- A Real-Time Urban Traffic Congestion Prediction Framework Based on Dynamic Risk Field and Multi-Source Data Fusion(Yuman Xu, Yunxia Li, 2025, IEEE Access)
- Mobility Control Centre and Artificial Intelligence for Sustainable Urban Districts(F. M. M. Cirianni, A. Comi, Agata Quattrone, 2023, Information)
- Pedestrian Real-Time Detection and Flow Forecasting in Complex Urban Scenarios via Multi-Source Spatio-Temporal Data Fusion: A Unified Model(Huihong Chen, Vivekanandam Balasubramaniam, Shiming Liu, 2025, 2025 IEEE 4th International Conference of Safe Production and Informatization (IICSPI))
- Urban Transport Decision Making: Improving Traffic Prediction with Symbolic Regression, Transfer Learning and Deep Learning(Alina Patelli, John Rego Hamilton, Anikó Ekárt, 2025, Proceedings of the Genetic and Evolutionary Computation Conference Companion)
By systematically organizing the intelligent transportation literature, this report groups the core research threads into five directions: (1) efficient MARL-based cooperation in traffic signal and flow control; (2) autonomous evolution of self-driving and multi-agent systems in path planning and decision-making; (3) LLM- and multimodal-AI-empowered intelligent transportation decision-making and semantic perception; (4) optimization and assurance of air-traffic and vehicular-network communication resources; (5) in-depth analysis and sensing of urban traffic behavior patterns. Overall, intelligent transportation research is evolving from purely rule-driven algorithms and simulation toward a new form of reasoning-capable large-model interaction and multi-agent cooperative control.
280 related papers in total
In Intelligent Transportation Systems (ITS), information freshness, measured by the Age of Information (AoI), is a crucial indicator for monitoring road traffic. This paper studies vehicle data packet scheduling and power allocation for AoI minimization in a Manhattan-grid Vehicle-to-Infrastructure (V2I) network. The challenge originates from the dynamic wireless environment and the differing AoI requirements of vehicles. To address it, the problem is modeled as a single-agent Markov Decision Process (MDP), and a Dueling Double Deep Q-Network (D3QN)-based Scheduling and Power Allocation Method (SPAM) is proposed, in which the D3QN agent is devoted to minimizing the AoI of each vehicle. In addition, Prioritized Experience Replay (PER) is employed to overcome the difficulty of obtaining high-value learning experiences. Simulation results show that the proposed method outperforms baseline D3QN approaches in average AoI by 22.6%.
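The three mechanisms this abstract combines, dueling aggregation, the double-Q target, and prioritized replay, can each be stated in a few lines. The sketch below is an illustrative numpy rendering of those textbook formulas, not the paper's SPAM implementation; the `alpha` and `eps` defaults are assumed values.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: the online net picks the next action, the target net evaluates it."""
    a_star = int(np.argmax(q_online_next))
    return reward if done else reward + gamma * float(q_target_next[a_star])

def per_probabilities(td_errors, alpha=0.6, eps=1e-2):
    """Prioritized replay: sampling probability grows with the absolute TD error."""
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()
```

Subtracting the advantage mean keeps the value/advantage decomposition identifiable, and letting the online network choose the bootstrap action removes the maximization bias of vanilla DQN.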
Reinforcement learning (RL) has emerged as a crucial method for training autonomous vehicles to navigate complex, changing environments, yet designing effective reward functions for RL agents remains difficult: the reward must capture the desired driving behaviours while remaining robust to different road conditions and traffic densities. This study proposes an approach to reward-function design that uses a variety of high-level and low-level actions to measure an agent's effectiveness. The agent is evaluated on a variety of randomly generated driving scenarios, covering different traffic intensities, lane keeping, overspeeding, and collision avoidance. The outcomes show that the proposed approach is highly effective in training RL agents to drive safely and efficiently across numerous environments.
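A reward of the kind the abstract describes, combining progress, lane keeping, overspeeding, and collision avoidance, might be shaped as below. This is a hypothetical sketch: the term structure, weights, and penalty magnitudes are assumptions for illustration, not the study's actual function.

```python
def driving_reward(speed, speed_limit, lane_offset, collided,
                   w_speed=1.0, w_lane=0.5, w_over=2.0, crash_penalty=100.0):
    # Illustrative shaping terms; all weights are assumptions for this sketch.
    if collided:
        return -crash_penalty                                      # collision dominates everything
    r_speed = w_speed * min(speed, speed_limit) / speed_limit      # reward progress up to the limit
    r_over = -w_over * max(0.0, speed - speed_limit)               # penalize overspeeding
    r_lane = -w_lane * abs(lane_offset)                            # penalize lateral deviation
    return r_speed + r_over + r_lane
```

Making the collision penalty dwarf the per-step shaping terms is what keeps safety dominant across traffic densities; the remaining weights trade off efficiency against comfort.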
Electric vehicles (EVs) are the backbone of the future intelligent transportation system (ITS). They are environmentally friendly and can also be integrated as distributed energy resources (DERs) into the smart grid through the vehicle-to-grid (V2G) scheme. Specifically, utility companies can feed energy from EV batteries back into the electric grid to reduce the peak load. However, integrating EVs into the power grid efficiently requires accurate artificial intelligence (AI) mechanisms to forecast, coordinate, and dispatch the EVs. This paper proposes a Multi-Agent Reinforcement Learning (MARL) mechanism that schedules the day-ahead discharging of EV batteries to optimize the grid's peak-shaving performance. The proposed MARL overcomes inaccurate energy prediction by allowing the agents, i.e., the EVs, to make autonomous decisions. The agents are trained in a centralized fashion but make decisions locally to maintain autonomy and privacy. In particular, the model does not require the EVs to communicate with a centralized entity during the execution stage, which assures the model's integrity and protects the EVs' private information. A comprehensive series of experiments demonstrates the effectiveness of the MARL coordination and scheduling mechanism and shows that the model can indeed flatten the peak load.
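The centralized-training/decentralized-execution split emphasized here means that at run time no central entity is consulted. A minimal sketch of the execution side, with a random linear policy standing in for the trained one (the class and function names are ours, not the paper's), looks like this:

```python
import numpy as np

class EVAgent:
    """Decentralized executor: acts on its own local observation only."""
    def __init__(self, n_obs, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_obs, n_actions))  # stand-in for trained policy weights

    def act(self, local_obs):
        # Greedy action from local information; no messages to a central coordinator
        return int(np.argmax(np.asarray(local_obs) @ self.w))

def execute_step(agents, observations):
    # Execution stage: each EV decides independently, preserving autonomy and privacy
    return [agent.act(obs) for agent, obs in zip(agents, observations)]
```

During training, a centralized critic may see all agents' observations, but only the per-agent policies above are deployed, which is what keeps the EVs' private information local.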
We present MATRICS, a traffic-aware multi-agent reinforcement learning (MARL)-based intelligent lane-change system designed for autonomous vehicles (AVs). While existing research primarily focuses on enhancing the local impact of the ego vehicle’s lane-change decisions, MATRICS stands out by optimizing both local and global performance, i.e., aiming not only to improve the traffic efficiency, driving safety, and driver comfort of the ego vehicle, but also to enhance overall traffic flow within a designated road segment. Through an extensive review of the transportation literature, we construct a novel state space integrating local traffic information collected from surrounding vehicles and global traffic data obtained from roadside units (RSUs). We develop a reward function to guide judicious lane-change decisions, considering both ego vehicle performance and traffic flow enhancement. Our local density-aware multi-agent double deep Q-network (DDQN) algorithm facilitates effective cooperation among agents in executing lane-change maneuvers. Simulation results demonstrate MATRICS’ superior performance across metrics of traffic efficiency, driving safety, and driver comfort in comparison with a state-of-the-art MARL model.
To support diverse Internet of Vehicles (IoV) services with different quality-of-service (QoS) requirements, network slicing is applied in vehicular networks to establish multiple logically isolated networks on common physical infrastructure. However, dynamic and efficient radio access network (RAN) slicing that adapts to the dynamics of vehicular networks remains challenging: diverse applications impose multi-dimensional resource requirements, which complicates resource allocation, and the system must frequently adjust slice resources, which incurs additional slicing overhead. To solve these problems, we propose a resource allocation strategy based on multi-agent reinforcement learning. First, the cost composition of RAN slicing is analyzed and an optimization problem is formulated to minimize the long-term system cost. The resource allocation problem is then transformed into a partially observable Markov decision process, which is solved by a multi-agent deep deterministic policy gradient (MADDPG)-based algorithm. All base stations are treated as independent agents that cooperatively allocate spectrum and computing resources. Simulation results show that the proposed strategy reduces the system cost effectively compared to the benchmarks, with an average QoS satisfaction rate of 96.5%.
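The per-step system cost this abstract minimizes has three named ingredients: resources committed, reconfiguration (slicing) overhead, and QoS shortfall. The function below illustrates that structure; the weights and the linear form are our assumptions, not the paper's formulation.

```python
def slice_step_cost(alloc, demand, prev_alloc,
                    w_resource=1.0, w_overhead=0.2, w_violation=5.0):
    # Per-step cost over a set of RAN slices (weights are illustrative assumptions).
    resource = w_resource * sum(alloc)                                             # resources committed
    overhead = w_overhead * sum(abs(a - p) for a, p in zip(alloc, prev_alloc))     # slicing overhead
    shortfall = w_violation * sum(max(0.0, d - a) for a, d in zip(alloc, demand))  # QoS violations
    return resource + overhead + shortfall
```

The overhead term is what discourages agents from re-slicing every step, while the heavily weighted shortfall term keeps the QoS satisfaction rate high; the RL agents must balance the two against raw resource usage.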
Intermodal journey planning remains a challenge in intelligent transportation systems, particularly when accounting for heterogeneous passenger preferences and integration into smart cities. Traditional planning approaches often fail to capture dynamic traffic conditions and the passenger-centric view required of future transportation systems. This study proposes a Multi-Objective Multi-Agent Reinforcement Learning (MOMARL) framework for individual intermodal journey planning across multiple modes. Two microscopic traffic models were developed in Simulation of Urban Mobility (SUMO), creating environments in which passengers plan their journeys to arrive on time at transportation hubs: a simpler model for verifying the framework, and a calibrated model reflecting the dynamics of a real city. The transportation networks are modeled as multilayered graphs. Since each passenger has different preferences and access to transport modes, each individual's cost-minimal path is formulated as a multi-objective optimization (MOO) problem, from which the scalarized reward signals used in the MOMARL framework are derived. Simulation results show that the proposed approach enables agents to generate feasible intermodal routes in a microscopic traffic environment, demonstrating the use of MOMARL for passenger-centric coordination in multimodal transport systems. Applying the framework to the calibrated model of Ingolstadt posed challenges in simulation complexity, highlighting the need for methods that systematically reduce model fidelity and granularity while retaining realistic dynamics.
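The scalarization step, collapsing each passenger's multi-objective cost vector into a single reward, is typically a weighted sum. The sketch below shows that step; the objective names and preference weights are illustrative, not the study's.

```python
def scalarized_reward(objectives, weights):
    """Weighted-sum scalarization: a per-passenger cost vector becomes one reward (negated cost)."""
    assert len(objectives) == len(weights)
    return -sum(w * o for w, o in zip(weights, objectives))

# Hypothetical passenger weighing travel time (minutes), fare, and number of transfers
cost_vector = {"time": 32.0, "fare": 2.4, "transfers": 1.0}
prefs = {"time": 0.1, "fare": 1.0, "transfers": 0.5}
reward = scalarized_reward(list(cost_vector.values()), list(prefs.values()))
```

Because the weights differ per passenger, each agent effectively optimizes its own scalarized objective, which is how heterogeneous preferences enter the MOMARL framework.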
Advancements in Intelligent Transportation Systems (ITS) have led to innovative solutions for planning optimization, efficiency enhancement, and resource allocation in transportation networks, demonstrated in applications such as smart parking lot management and electric vehicle (EV) charging station allocation, where improved decision-making and system-wide optimization have been achieved. However, as these systems evolve, the demand for better adaptability and coordination continues to grow to maximize their overall effectiveness and efficiency. To achieve this, we propose the Multi-Personality Multi-Agent Meta-Reinforcement Learning (MPMA-MRL) framework. This approach incorporates multiple meta-trained, meta-tested, explainable personality policies, which are deployed to each agent, along with a personality selector trained and deployed on each agent to optimize overall performance. MPMA-MRL is superior to traditional methods in adaptability and coordination in ITS, leveraging improved information from the environment, more practical coordination among agents, faster adaptation to intermediate tasks, and more appropriate allocation and planning. The proposed framework is evaluated on parking lot optimization and EV charging station allocation, and its broader impact on multi-agent smart systems is analyzed to demonstrate its generalizability. The results demonstrate that in parking lot optimization, MPMA-MRL significantly reduces the time required to direct all vehicles to available parking spots, while in EV charging station allocation it effectively minimizes waiting times at charging stations. Moreover, in both applications, MPMA-MRL exhibits enhanced adaptability to previously unseen scenarios, improving its applicability.
The rapid expansion and intelligent development of railway infrastructure are driving significant growth in railway engineering data. For users dispersed across the railway network's complex topology, traditional centralized storage systems cannot provide low-latency, cost-efficient data retrieval. Existing edge caching solutions based on multi-agent reinforcement learning fail to address the asymmetric relationships among railway nodes such as data centers, stations, and sections. Moreover, the complexity of computing Nash equilibrium points grows as the number of agents (edge caching servers) increases. This study introduces a Hierarchical Game model-based MADRL-driven Collaborative Edge Caching method (HG-MCEC) tailored for railway engineering data. By considering the distribution characteristics and caching strategy games among railway nodes, a hierarchical game model for collaborative edge caching is constructed. This model treats railway edge caching as a multi-agent system in which each railway node server is regarded as an agent. HG-MCEC utilizes deep learning to mitigate computational complexity and recognize the agents' asymmetry. Upper-level agents adjust cache replacement strategies according to environmental changes and decision-making experience. Lower-level agents, guided by upper-level decisions, optimize collaborative caching strategies toward achieving hierarchical game equilibrium. Validated on high-speed railway building information modeling data, the method significantly outperforms existing approaches, enhancing content hit rates and reducing latency at edge caching servers while decreasing system content transmission costs.
The Internet of Vehicles (IoV), a burgeoning technology, merges advancements in the internet, vehicle electronics, and wireless communications to foster intelligent vehicle interactions, thereby enhancing the efficiency and safety of transportation systems. Nonetheless, the continual and high-frequency communications among vehicles, coupled with regional limitations in system capacity, precipitate significant challenges in allocating wireless resources for vehicular networks. In addressing these challenges, this study formulates the resource allocation issue as a multi-agent deep reinforcement learning scenario and introduces a novel multi-agent actor-critic framework. This framework incorporates a prioritized experience replay mechanism focused on distributed execution, which facilitates decentralized computing by structuring the training processes and defining specific reward functions, thus optimizing resource allocation. Furthermore, the framework prioritizes empirical data during the training phase based on the temporal difference error (TD error), selectively updating the network with high-priority data at each sampling point. This strategy not only accelerates model convergence but also enhances the learning efficacy. The empirical validations confirm that our algorithm augments the total capacity of vehicle-to-infrastructure (V2I) links by 9.36% and the success rate of vehicle-to-vehicle (V2V) transmissions by 6.74% compared with a benchmark algorithm.
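The TD-error-based prioritized experience replay described above can be sketched minimally as below; this is a generic proportional-priority illustration (assumed names, and simplifications such as omitting importance-sampling correction), not the authors' implementation:

```python
import random

class PrioritizedReplay:
    """Toy proportional prioritized replay: transitions with larger
    absolute TD error are sampled more often during training."""

    def __init__(self, eps=1e-3):
        self.buffer = []       # stored transitions
        self.priorities = []   # |TD error| + eps, one per transition
        self.eps = eps         # keeps every priority strictly positive

    def add(self, transition, td_error):
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, k):
        # Sample indices proportionally to priority (with replacement).
        idx = random.choices(range(len(self.buffer)),
                             weights=self.priorities, k=k)
        return [self.buffer[i] for i in idx]

    def update(self, i, td_error):
        # Refresh a priority after the network re-evaluates transition i.
        self.priorities[i] = abs(td_error) + self.eps
```

High-priority transitions dominate the sampled batches, which is the mechanism the abstract credits for faster convergence.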
In this work, we investigate an uplink unmanned aerial vehicles (UAVs)-enabled intelligent transportation system to collect data from traveling vehicles on a specific highway road. To ensure the freshness of information delivered from the traveling vehicles to UAV base stations, we use the new age of information (AoI) metric to characterize the information freshness and formulate the AoI minimization problem by optimizing the UAVs’ trajectories and the communication time of vehicles jointly. In order to handle the mixed-integer nonlinear problem, a multi-agent deep reinforcement learning scheme is proposed by applying independent flight direction and time slot action spaces, in which each UAV working as an independent agent adjusts to the dynamic environment quickly based on stored experience. The AoI-related reward function is proposed to select the beneficial action space to guarantee the information freshness. Numerical simulation results show the proposed scheme outperforms the benchmark schemes.
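The age-of-information metric minimized above has a simple slot-by-slot dynamic: AoI grows linearly until a successful delivery resets it to the delivered packet's age. A minimal sketch, with the `delay` parameter as an illustrative assumption:

```python
def step_aoi(aoi, delivered, delay=1):
    """Advance age of information by one time slot.
    On a successful delivery, AoI resets to the packet's delay
    (how stale the freshest delivered sample is); otherwise it grows."""
    return delay if delivered else aoi + 1

# AoI trace over five slots with deliveries in slots 3 and 5.
aoi, trace = 0, []
for delivered in [False, False, True, False, True]:
    aoi = step_aoi(aoi, delivered)
    trace.append(aoi)
# trace == [1, 2, 1, 2, 1]
```

An AoI-shaped reward (e.g. the negative of this trace) steers each UAV agent toward actions that keep deliveries frequent enough to cap staleness.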
In the current era, the coordination of traffic flow is hindered by the discrepancy between road infrastructure and the number of vehicles, which leads to traffic congestion. One widely used strategy to mitigate traffic congestion is to control traffic signals with deep reinforcement learning (DRL) in an edge computing-based intelligent transportation system. This article provides a comprehensive analysis of the most recent DRL algorithms, advantage actor-critic and proximal policy optimization, across multiple deep neural networks (DNNs), including a state-of-the-art transformer model, for effective traffic signal management. Here, a single DRL agent is used, which obtains spatio-temporal traffic information to identify traffic patterns in complex intersection environments. The agent uses this information as input to the DNNs and then applies the algorithms to retrieve the essential DNN parameters, seeking an optimal action-selection policy to mitigate congestion. Different real-time maps and small city networks are explored to determine which DNN is best suited to traffic congestion management. The simulation study reveals that both algorithms significantly outperform the baseline, and the transformer model gives the best results among the DNNs, decreasing average waiting time by 96.16% and demonstrating a higher capability for dealing with congested environments.
As urban centers evolve into smart cities, sustainable mobility emerges as a cornerstone for ensuring environmental integrity and enhancing quality of life. Autonomous vehicles (AVs) play a pivotal role in this transformation, with the potential to significantly improve efficiency and safety, and reduce environmental impacts. This study introduces a novel Multi-Agent Actor–Critic (MA2C) algorithm tailored for multi-AV lane-changing in mixed-traffic scenarios, a critical component of intelligent transportation systems in smart cities. By incorporating a local reward system that values efficiency, safety, and passenger comfort, and a parameter-sharing scheme that encourages inter-agent collaboration, our MA2C algorithm presents a comprehensive approach to urban traffic management. The MA2C algorithm leverages reinforcement learning to optimize lane-changing decisions, ensuring optimal traffic flow and enhancing both environmental sustainability and urban living standards. The actor–critic architecture is refined to minimize variances in urban traffic conditions, enhancing predictability and safety. The study extends to simulating realistic human-driven vehicle (HDV) behavior using the Intelligent Driver Model (IDM) and the model of Minimizing Overall Braking Induced by Lane changes (MOBIL), contributing to more accurate and effective traffic management strategies. Empirical results indicate that the MA2C algorithm outperforms existing state-of-the-art models in managing lane changes, passenger comfort, and inter-vehicle cooperation, essential for the dynamic environment of smart cities. The success of the MA2C algorithm in facilitating seamless interaction between AVs and HDVs holds promise for more fluid urban traffic conditions, reduced congestion, and lower emissions. This research contributes to the growing body of knowledge on autonomous driving within the framework of sustainable smart cities, focusing on the integration of AVs into the urban fabric. It underscores the potential of machine learning and artificial intelligence in developing transportation systems that are not only efficient and safe but also sustainable, supporting the broader goals of creating resilient, adaptive, and environmentally friendly urban spaces.
The increasing urbanization across the world necessitates efficient traffic management, especially in emerging economies. This paper presents an intelligent framework aimed at enhancing traffic signal management within complex road networks through the creation and evaluation of a multi-agent reinforcement learning (MARL) framework. The research explores how reinforcement learning (RL) algorithms can be employed to optimize traffic flow, reduce bottlenecks, and enhance overall transportation safety and efficiency. It also covers the design and simulation of a typical traffic environment (an intersection), the definition and implementation of a Multi-Agent System (MAS), and the development and evaluation of a MARL model for traffic management within a simulated environment; the model leverages actor-critic and Deep Q-Network (DQN) strategies for learning and coordination. Novel approaches for decentralized decision-making and dynamic resource allocation were developed to enable real-time adaptation to changing traffic conditions and emergent situations. Performance evaluation using metrics such as waiting time, queue length, and congestion was carried out on the SUMO (Simulation of Urban Mobility) platform across various traffic scenarios. The simulations showed improvements in queue management and traffic flow of 64.5% and 70.0%, respectively, with the model's performance improving over the episodes. The RL policy outperformed the baseline policy, indicating that the model learned across episodes, and the MARL-based approach proved better suited to decentralized traffic control in both scalability and adaptability. The proposed solution supports real-time decision-making, reduces traffic congestion, and improves the efficiency of the urban transportation system.
This article presents the first field deployment of a multi-agent reinforcement learning (MARL) based variable speed limit (VSL) control system on the I-24 freeway near Nashville, Tennessee. We describe how we train MARL agents in a traffic simulator and directly deploy the simulation-based policy on a 17-mile stretch of Interstate 24 with 67 VSL controllers. We use invalid action masking and several safety guards to ensure the posted speed limits satisfy the real-world constraints from the traffic management center and the Tennessee Department of Transportation. From the system's launch through April 2024, the system has made approximately 10,000,000 decisions on 8,000,000 trips. Analysis of the controller shows that the MARL policy takes control up to 98% of the time without intervention from the safety guards. Time-space diagrams of traffic speed and control commands illustrate how the algorithm behaves during rush hour. Finally, we quantify the domain mismatch between the simulation and real-world data and demonstrate the robustness of the MARL policy to this mismatch.
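Invalid action masking of the kind described (filtering candidate speed limits against real-world posting constraints) can be sketched as below; the bounds and maximum step size are illustrative placeholders, not the deployment's actual TDOT rules:

```python
def mask_actions(candidates, prev_limit, lo=30, hi=70, max_step=10):
    """Keep only candidate speed limits (mph) that respect absolute
    bounds and a maximum change from the previously posted limit.
    The agent then chooses only among the surviving actions."""
    return [s for s in candidates
            if lo <= s <= hi and abs(s - prev_limit) <= max_step]

# With a previous posting of 45 mph, only nearby legal limits survive.
valid = mask_actions([20, 30, 40, 50, 60, 70], prev_limit=45)
# -> [40, 50]
```

Masking before action selection, rather than penalizing bad actions afterward, guarantees the policy can never emit a constraint-violating posting.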
Bus bunching is harmful to the efficiency and stability of bus transit systems, delaying passengers' arrival times and lowering public transportation's adoption rate. Traditional solutions adjust the additional holding time of buses at certain stations to mitigate this phenomenon, sacrificing system efficiency in exchange for even headways between neighboring buses. Little work focuses on optimizing multiple strategies when a single bus line not only has a bus bay to increase bus dwell time but also several dedicated bus lanes that allow acceleration. In this work, we develop a hierarchical multi-agent reinforcement learning (HMARL) framework to combine these two strategies, since speeding up certain buses via dedicated lanes can counteract the negative influence of additional holding time. To support the two strategies, we devise a two-layer policy scheme, with a high-level policy deciding whether to hold or accelerate and a low-level policy determining the specific dwell time or speed increase. Besides, to handle the fact that the agents' control actions are asynchronous and temporally extended, we establish a duration-critic module based on Recurrent Neural Networks (RNNs) to model other agents' impact during the period between two consecutive controls. We evaluate the proposed framework on a simulated bus line with a quasi-real-world pattern against both traditional headway-based control methods and existing MARL methods. Results show that our method outperforms the baselines, not only stabilizing a strongly unstable bus line but also shortening passengers' traveling times.
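The two-layer policy scheme (a high-level choice between holding and accelerating, then a low-level magnitude) can be sketched with hand-written rules standing in for the learned policies; all thresholds and scaling factors here are illustrative assumptions:

```python
def high_level(headway_ratio):
    """Choose 'hold' when a bus runs too close to its leader
    (ratio of actual to target headway below 1) and 'accelerate'
    (via a dedicated lane) when it lags behind."""
    if headway_ratio < 0.8:
        return "hold"
    if headway_ratio > 1.2:
        return "accelerate"
    return "cruise"

def low_level(option, headway_ratio, target=1.0):
    """Given the chosen option, pick a continuous magnitude:
    extra dwell seconds or a speed increment, scaled by the
    deviation from the target headway."""
    dev = abs(headway_ratio - target)
    if option == "hold":
        return ("dwell_s", round(60 * dev, 1))
    if option == "accelerate":
        return ("speed_kmh", round(10 * dev, 1))
    return ("none", 0.0)

opt = high_level(0.6)        # bus too close to leader -> "hold"
act = low_level(opt, 0.6)    # -> ("dwell_s", 24.0)
```

In the actual framework both levels are learned policies and the low-level action runs for an extended duration, which is what the duration-critic module accounts for.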
The rapid growth of Internet services dramatically drives the development of various intelligent technologies. As an important composition, modern ride-hailing platforms allow citizens to order taxis in a fast, simple, and secure manner. The key to developing a successful ride-hailing system largely depends on efficient fleet management that narrows the demand-supply gap. For this reason, many Deep Reinforcement Learning (DRL) frameworks are becoming increasingly popular for proactive taxi dispatching. However, existing approaches have very coarse formulations and solutions to the dispatching problem, which overly simplify the real situations. To this end, we propose a fine-grained DRL framework for effective taxi dispatching with tractable solutions. Our framework features three components. 1) The basic model is a multi-agent DRL that optimizes the standard objectives (i.e., fulfilling the taxi orders). To reduce the state space and environment dynamics, we cluster regions based on their daily moving patterns and allow agents in the same cluster to learn cooperatively by sharing their policies, i.e., learning from like-pattern peers. 2) We further employ a recurrent neural network to forecast demand and supply, which helps the agents make globally optimal decisions. 3) Finally, we design a two-phase central dispatching strategy based on Maximum Network Flow to relocate taxis in fine granularity. We conduct extensive experiments in a realistic environment simulator using a real-world dataset, and the results demonstrate the superior performance of our new framework over existing approaches, improving the average total order response rate by 2.67% and taxi effort gain by 25.77%, and decreasing the average number of repositions by 2.96%.
Edge Intelligence (EI) technologies, propelled by advances in Consumer Electronics (CE), are spreading to Intelligent Transportation Systems (ITS). As edge devices in ITS, traffic lights suffer from a lack of cooperation with one another and from the absence of long-sequence scheduling. To address this challenge, we formulate the control problem of multi-intersection traffic lights as a multi-agent Markov game. In response, we propose a Cooperative Adaptive Control Method (CACOM), a framework based on multi-agent reinforcement learning. CACOM integrates a mixing network and the options framework: the mixing network enables cooperation among intersections, and the options framework gives intersections the ability to schedule long action sequences. In addition, we design a weight generator for the mixing network based on the traffic conditions at intersections, allowing the agents to adjust their weights adaptively during cooperation. Finally, we build a simulator including two real-world urban road networks for extensive evaluation. Compared with the best baseline methods, our approach achieves average waiting-time reductions of around 24% and 42% for high-priority vehicles in two scenarios. Moreover, the waiting time for all vehicles is decreased by approximately 15% and 6%, respectively.
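The adaptive mixing idea (a weight generator conditioned on traffic conditions feeding a monotonic, QMIX-style combination of per-agent Q-values) can be sketched as follows; the softmax weight generator is a toy stand-in for the paper's learned network:

```python
import math

def traffic_weights(queue_lengths):
    """Generate non-negative mixing weights from intersection queue
    lengths via softmax, so busier intersections carry more weight
    in the joint value estimate."""
    exps = [math.exp(q) for q in queue_lengths]
    z = sum(exps)
    return [e / z for e in exps]

def mix_q(agent_qs, weights):
    """Monotonic mixing of per-agent Q-values into a joint value:
    non-negative weights guarantee that improving any agent's Q
    never lowers the joint Q (the QMIX monotonicity condition)."""
    return sum(w * q for w, q in zip(weights, agent_qs))

w = traffic_weights([2.0, 0.0])   # first intersection is busier
q_tot = mix_q([1.0, 3.0], w)
```

Because the weights are generated from traffic state rather than fixed, the relative influence of each intersection on the joint value shifts as congestion moves through the network.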
Electric vehicles (EVs) have become a preferable choice in the modern transportation system due to their environmental and energy sustainability. However, in many large cities, EV drivers often fail to find proper charging spots because of limited charging infrastructure and spatiotemporally unbalanced charging demand. The recent emergence of deep reinforcement learning provides great potential to improve the charging experience from various aspects over a long-term horizon. In this paper, we propose a framework named Multi-Agent Spatio-Temporal Reinforcement Learning (Master) for intelligently recommending publicly accessible charging stations by jointly considering various long-term spatiotemporal factors. Specifically, by regarding each charging station as an individual agent, we formulate this problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with a centralized attentive critic to coordinate recommendations between geo-distributed agents. Moreover, to quantify the influence of future potential charging competition, we introduce a delayed access strategy to exploit knowledge of future charging competition during training. After that, to effectively optimize multiple learning objectives, we extend the centralized attentive critic to multiple critics and develop a dynamic gradient re-weighting strategy to adaptively guide the optimization direction. Finally, extensive experiments on two real-world datasets demonstrate that Master achieves the best comprehensive performance compared with nine baseline approaches.
Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). While centralized approaches are often infeasible for large-scale TSC problems, decentralized approaches provide scalability but introduce new challenges, such as partial observability. Communication plays a critical role in decentralized MARL, as agents must learn to exchange information using messages to better understand the system and achieve effective coordination. Deep MARL has been used to enable inter-agent communication by learning communication protocols in a differentiable manner. However, many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates "which" part of the message is sent "to whom". In essence, our framework enables agents to selectively choose the recipients of their messages and exchange variable-length messages with them. This results in a decentralized and flexible communication mechanism in which agents can effectively use the communication channel only when necessary. We designed two networks, a synthetic 4×4 grid network and a real-world network based on the Pasubio neighborhood in Bologna. Our framework achieved the lowest network congestion compared to related methods, with agents utilizing ~47–65% of the communication channel. Ablation studies further demonstrated the effectiveness of the communication policies learned within our framework.
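The selective, variable-length messaging described ("which" part of the message goes "to whom") can be sketched with a thresholded gate per neighbor and message component; in the actual framework the gates come from learned communication policies, whereas here they are given constants:

```python
def select_messages(message, gates, threshold=0.5):
    """For each neighbor, keep only the message components whose gate
    value exceeds the threshold, yielding variable-length,
    recipient-specific messages. Neighbors with no surviving
    components receive nothing at all."""
    out = {}
    for neighbor, g in gates.items():
        kept = [m for m, gi in zip(message, g) if gi > threshold]
        if kept:                      # stay silent toward this neighbor
            out[neighbor] = kept
    return out

# Three-component message; per-neighbor, per-component gate values.
msgs = select_messages([0.3, 0.9, 0.1],
                       {"north": [0.9, 0.6, 0.1],
                        "south": [0.2, 0.1, 0.3]})
# north receives two components; south receives nothing
```

Gating both recipients and components is what lets agents use the channel only when necessary, the property the abstract ties to reduced noise.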
The dock-less bike-sharing system (DBSS) is a novel mode that allows users to rent and return available shared bikes without fixed docking stations. To increase the usage of idle bikes and fulfill spatial-temporally varying user demand, the operator needs to reposition available bikes periodically. This paper proposes a multi-agent reinforcement learning (MARL) framework for generating dynamic bike repositioning plans for DBSS to minimize user dissatisfaction. The framework introduces the Double Deep Q-Network (DDQN) algorithm with a shadow-environment trick to tackle the non-stationary learning problem and decomposes the training process into sequential, independent single-agent training. Bike-sharing data from Yangpu and Hongkou Districts, Shanghai, China, was used for model validation. The validation results demonstrate that the proposed approach achieves robust performance in various conditions with different truck and bike fleet sizes and truck capacities. It also outperforms traditional stage-optimization models in reducing user dissatisfaction.
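The DDQN update at the core of the framework decouples action selection (online network) from action evaluation (target network) to curb Q-value overestimation. A minimal sketch of the target computation, with illustrative Q-values:

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99,
                done=False):
    """Double DQN target: the online network picks the argmax action
    for the next state, but the target network supplies its value."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[a_star]

# Online net prefers action 1; the target net evaluates that action.
y = ddqn_target(1.0, next_q_online=[0.2, 0.8], next_q_target=[0.5, 0.4])
# y = 1.0 + 0.99 * 0.4 = 1.396
```

Vanilla DQN would instead use max(next_q_target) = 0.5 here, illustrating the overestimation that the decoupling avoids.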
In the modern supply chain system, large-scale transportation tasks require the collaborative work of multiple vehicles to be completed on time. Over the past few decades, multi-vehicle route planning has mainly been implemented with heuristic algorithms. However, these algorithms face the dilemma of long computation times. In recent years, some machine learning-based methods have also been proposed for vehicle route planning, but existing algorithms can hardly solve multi-vehicle time-sensitive problems. To overcome this, we propose a novel multi-agent reinforcement learning model that optimizes the route length and the vehicles' arrival times simultaneously. The model is based on the encoder-decoder framework: the encoder mines the relationships between the customer nodes in the problem, and the decoder generates the route of each vehicle iteratively. Specifically, we design multiple route recorders to extract the route history of the vehicles and realize communication between them. In the inference phase, the model can immediately generate routes for all vehicles in a new instance. To further improve performance, we devise a multi-sampling strategy and obtain the balance boundary between computation time and performance improvement. In addition, we propose a simulation-based vehicle configuration method to select the optimal number of vehicles in real applications. For validation, we conduct a series of experiments on problems with different customer amounts and various vehicle numbers. The results show that the proposed model outperforms other typical algorithms in both performance and computation time.
A more attractive future railway system needs to offer more capacity in the railway network and improve quality and punctuality. A fundamental centerpiece of future digitized railway network operations is automated and optimized planning and dispatching. The sector initiative "Digitale Schiene Deutschland" (DSD) develops a holistic and intelligent Capacity & Traffic Management System (CTMS) that can automatically plan and continuously optimize railway traffic at scale. Both planning and dispatching tasks are highly complex and, today, require human expertise and oversight. Our main contribution is a multi-agent deep reinforcement learning approach at the core of the envisioned CTMS, which learns from interaction with a realistic, microscopic railway simulation. Our results demonstrate that the proposed approach flexibly solves planning and re-scheduling tasks in the realistic setting of a medium-sized part of the German railway network. It exhibits response times and scaling properties that make it a promising candidate for future applications in railway operations at scale.
This work proposes a traffic-light scheduling framework using deep reinforcement learning to balance traffic flow and prevent congestion in the dense regions of the city via a software-defined control interface. A software-defined control-enabled architecture is proposed that monitors traffic conditions and generates the traffic-light control signal (Red/Yellow/Green) accordingly. For intelligent traffic-light control, a Deep Reinforcement Learning (DRL) model is proposed which takes vehicular dynamics from the real-time traffic environment as inputs, such as heterogeneous vehicle counts, speeds, and traffic density. To determine congestion, a threshold policy is proposed and deployed on the control server, which generates a congestion-prevention signal. A DRL agent operates in coordination with the congestion-prevention signal and generates an effective traffic-light control signal. The proposed model is evaluated through a realistic simulation on an Indian city OpenStreetMap using the well-known open-source simulator SUMO. The comparative results show that the proposed solution improves several performance metrics, namely average waiting time, throughput, average queue length, and average speed, by 28.34%–66.62%, 24.76%–66.60%, 30.89%–69.80%, and 16.62%–43.67% respectively over other state-of-the-art approaches.
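The threshold policy for raising a congestion-prevention signal can be sketched as a simple predicate over measured traffic features; the feature set and threshold values are illustrative assumptions, not the paper's deployed policy:

```python
def congestion_signal(density, queue_len, density_thr=0.7, queue_thr=15):
    """Raise the congestion-prevention flag when either the measured
    road density (occupancy fraction) or the queue length (vehicles)
    crosses its threshold. The DRL agent conditions on this flag."""
    return density >= density_thr or queue_len >= queue_thr

# High density alone is enough to trigger the signal.
assert congestion_signal(0.8, 5) is True
# Light traffic on both measures keeps the flag down.
assert congestion_signal(0.3, 4) is False
```

Keeping the threshold check on the control server, separate from the learned policy, gives a cheap, interpretable safeguard that the DRL agent then coordinates with.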
Electric vehicle charging stations (EVCSs) heavily rely on communication systems, making them vulnerable to cyber uncertainties such as communication delays and False Data Injection (FDI) attacks. In this study, a techno-economic evaluation of the EVCS based on a data-driven Takagi-Sugeno-Kang Fuzzy System & Multi-Agent Deep Reinforcement Learning (TSKFS&MADRL) method is presented to detect and compensate for cyber uncertainties such as FDI attacks and communication delay. In addition, the proposed approach provides a fast dynamic response and enhances the resilient operation of EVCSs in the presence of FDI attacks. First, the target points of hackers, such as communication systems and transducer sensors, are modeled; then, using Euclidean norm theory, the weighted least-squares error method, and the residual-error technique of comparing measured data with reference values based on probability distribution functions, FDI cyber-attacks are detected. The network control and recovery requirements are then enabled by the proposed controller based on the TSKFS&MADRL method. The proposed approach has been implemented on the IEEE 33-bus network. The experimental operating cost is 7.33% less than the RL method and 12.15% less than the CNN method. The experimental results also show a 40% reduction in detection time compared to other methods.
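The residual-error detection step (comparing measured data with reference values under a weighted least-squares-style norm) can be sketched as below; the weights and threshold are illustrative, not the calibrated values from the study:

```python
def detect_fdi(measured, reference, weights, threshold):
    """Flag a false-data-injection attack when the weighted residual
    norm between measured and reference values exceeds a threshold.
    Weights would reflect per-sensor measurement variance."""
    residual = sum(w * (m - r) ** 2
                   for w, m, r in zip(weights, measured, reference)) ** 0.5
    return residual > threshold, residual

# Illustrative voltage/current pair: the second sensor reads 51.0
# against a reference of 50.0, weighted heavily.
attacked, r = detect_fdi([230.0, 51.0], [230.0, 50.0], [1.0, 4.0], 1.5)
# residual = sqrt(0 + 4*1) = 2.0 > 1.5 -> attack flagged
```

A residual within the threshold is attributed to normal measurement noise; only excursions beyond it trigger the controller's compensation path.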
No abstract available
The proliferation of AI-powered cameras in Intelligent Transportation Systems (ITS) creates a severe conflict between the need for rich visual data and the right to privacy. Existing privacy-preserving methods, such as blurring or encryption, are often insufficient, as they create an undesirable trade-off in which either privacy is compromised by advanced reconstruction attacks or data utility is critically degraded. To resolve this challenge, we propose RL-MoE, a novel framework that transforms sensitive visual data into privacy-preserving textual descriptions, eliminating the need for direct image transmission. RL-MoE uniquely combines a Mixture-of-Experts (MoE) architecture for nuanced, multi-aspect scene decomposition with a Reinforcement Learning (RL) agent that optimizes the generated text for a dual objective of semantic accuracy and privacy preservation. Extensive experiments demonstrate that RL-MoE provides superior privacy protection, reducing the success rate of replay attacks to just 9.4% on the CFP-FP dataset, while simultaneously generating richer textual content than baseline methods. Our work provides a practical and scalable solution for building trustworthy AI systems in privacy-sensitive domains, paving the way for more secure smart city and autonomous vehicle networks.
Electric vehicles (EVs) have been used in the ride-hailing system in recent years, which makes the electric fleet management problem (EFMP) critical. This paper aims to leverage multi-agent reinforcement learning (MARL) for EFMP. In particular, we focus on how EVs learn to manage battery charging and pick up and drop off passengers. We propose an integrated SUMO-Gym framework based on the SUMO simulator to capture EVs' asynchronous decision-making regarding charging and ride-hailing in complex traffic environments. We adopt a hierarchical reinforcement learning (HRL) scheme, where each EV decides to get charged or pick up a passenger on the upper level and chooses a charging station or passenger on the lower level. We develop a learning algorithm for the HRL scheme to solve EFMP and present numerical results on the efficiency of our algorithm and the policies EVs have learned. Our code is available at https://github.com/LovelyBuggies/SUMO-Gym, which provides an open-source environment for researchers to design traffic scenarios and test RL algorithms for EFMP.
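The hierarchical scheme (an upper level choosing between charging and serving, a lower level choosing the concrete station or passenger) can be sketched with rule-based stand-ins for the learned policies; the SOC threshold and grid distances are illustrative assumptions:

```python
def upper_level(soc, soc_min=0.3):
    """Upper-level option: charge when the state of charge is low,
    otherwise serve a passenger. The threshold stands in for a
    learned high-level policy."""
    return "charge" if soc < soc_min else "serve"

def lower_level(option, stations, passengers, pos):
    """Lower level picks the nearest charging station or passenger
    for the chosen option (Manhattan distance on a grid)."""
    pool = stations if option == "charge" else passengers
    return min(pool, key=lambda p: abs(p[0] - pos[0]) + abs(p[1] - pos[1]))

opt = upper_level(0.2)                                   # "charge"
target = lower_level(opt, [(0, 0), (5, 5)], [(1, 1)], pos=(4, 4))
# nearest station to (4, 4) is (5, 5)
```

Splitting "what to do" from "where to do it" shrinks each level's action space, which is the usual motivation for HRL in fleet management.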
Intelligent transportation systems (ITS) face limited spectral resources and stringent real-time communication requirements. How to effectively allocate system resources to maximize performance in the Internet of Vehicles (IoV) remains a substantial challenge, particularly when the priority and urgency of different data types must be considered. To improve spectrum allocation and optimize transmission power while taking the dynamic characteristics of vehicles and data priorities into account, in this paper we design a time-series-based multi-agent deep reinforcement learning framework (NL-MAPPO for short). First, we formulate the joint optimization problem as a multi-agent Markov decision process to minimize transmission delays and energy consumption while maximizing the total vehicle-to-vehicle (V2V) link capacity. Here, V2V link capacity refers to the maximum achievable data rate for direct communication between vehicles, which depends on factors such as signal strength, interference, and available bandwidth. Then, we design a multi-agent resource allocation algorithm based on a shared-critic mechanism to realize global sharing of channel information and solve the optimization problem. Finally, to improve efficiency, we introduce a time-series-based channel information extraction mechanism to capture the temporal characteristics of channel information. Simulation experiments demonstrate that the proposed NL-MAPPO is superior across multiple metrics.
Transforming to a low-carbon future requires massive efforts from both transport and power systems. Electric vehicles (EVs) can reduce CO2 emissions in road transport through eco-routing while providing a carbon-intensity service for power systems via vehicle-to-grid (V2G) scheduling. This paper studies the coordinated effect of the routing and scheduling problems of EVs via a novel model-free multi-agent reinforcement learning (MARL) method. In this context, EVs do not rely on any knowledge of the simulated environment and are capable of handling various uncertainties and dynamics during the learning process, which leads to timely decision-making and better privacy protection. Extensive case studies based on a virtual 7-node, 10-edge transportation network demonstrate the effectiveness of the proposed MARL method in reducing carbon emissions in the transportation system and providing a carbon-intensity service in the power system.
With the rapid development of intelligent transportation systems, task offloading technologies have been widely adopted to shift computational workloads from resource-constrained vehicles to resource-rich edge and cloud servers, thereby improving the service quality of Internet of Vehicles (IoV) systems. However, the growing complexity of vehicular applications and the heterogeneity of service demands have led to critical challenges, such as dynamic resource imbalance, cross-domain scheduling difficulties, and inefficiencies in global optimization. These challenges require more intelligent and coordinated task offloading mechanisms capable of handling resource diversity and system-wide collaboration. To address these issues, a novel task-offloading model based on a federated reinforcement learning framework is proposed. Specifically, a Device-Edge-Cloud collaborative offloading model is constructed, and the Federated Learning Optimized Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (FedMATD3) algorithm is developed to achieve global optimization and efficient cross-domain resource allocation. Extensive simulation experiments validate the effectiveness of the proposed approach. Compared with four classical baseline algorithms, FedMATD3 demonstrates superior performance in terms of task processing delay and task completion rate.
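The federated component of FedMATD3 rests on a server-side aggregation step; a plain federated-averaging round over client parameter vectors can be sketched as below. Uniform averaging is shown for brevity; weighting by local sample counts is a common variant, and the actual FedMATD3 aggregation rule may differ.

```python
# Minimal sketch of a federated aggregation round: each vehicle/edge agent
# trains locally, and a server element-wise averages the parameter vectors.

def fed_avg(local_weights):
    """Element-wise average of each client's parameter vector."""
    n = len(local_weights)
    return [sum(ws) / n for ws in zip(*local_weights)]

# Two clients, each with a 2-parameter model:
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]])  # -> [2.0, 3.0]
```

The averaged vector is then broadcast back to all clients as the starting point of the next local training round.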
New developments in connected and cooperative driving technologies are transforming the energy management capabilities of intelligent transportation systems. This study demonstrates a method for hybrid electric vehicles (HEVs) to optimise their energy utilisation by leveraging advanced control techniques, such as model predictive control (MPC) and reinforcement learning (RL), in conjunction with vehicle-to-vehicle (V2V) communication. The system dynamically distributes loads among vehicles, prioritizes regenerative braking, and enables the whole fleet to cooperate in saving energy. Simulations show that the system reduces total energy requirements by 22%, increases regenerative-braking efficiency by 28%, and extends range by 15% compared to traditional HEVs, supporting sustainable intelligent transportation through cooperative energy optimization among connected HEVs. The system also delivers enhanced synchronization, minimizing data-transfer delays to enable simultaneous energy management, and improves SOC control and response times while reducing energy waste. Although the current system enhances performance, operators must still overcome issues relating to computational complexity and real-world implementation. Future research will emphasize real-world testing as well as decentralized optimization techniques. The findings demonstrate that cooperative driving is a sustainable approach for efficient multi-vehicle power distribution.
The bus system is a critical component of sustainable urban transportation. However, the operation of a bus fleet is inherently unstable, and bus bunching has become a common phenomenon that undermines the efficiency and reliability of bus systems. Recent research has demonstrated the promising application of multi-agent reinforcement learning (MARL) to achieve efficient vehicle holding control and avoid bus bunching. However, existing studies essentially overlook the robustness issue arising from perturbations and anomalies in a transit system, which is of utmost importance when transferring models to real-world deployment. In this study, we integrate implicit quantile networks and meta-learning to develop a distributional MARL framework—IQNC-M—for learning continuous control. The proposed IQNC-M framework achieves efficient and reliable control decisions by better handling the various uncertainties in real-time transit operations. Specifically, we introduce an interpretable meta-learning module to incorporate global information into the distributional MARL framework, an effective way to circumvent the credit assignment issue in the transit system. In addition, we design a specific learning procedure to train each agent within the framework to pursue a robust control policy. We develop simulation environments based on real-world bus services and passenger demand data and evaluate the proposed framework against both traditional holding control models and state-of-the-art MARL models. Our results show that the proposed IQNC-M framework can effectively handle general perturbations and various extreme events, such as traffic state perturbations and demand surges, thus improving both the efficiency and reliability of the transit system.
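The distributional component behind implicit quantile networks reduces to quantile (pinball) regression; a single-sample version, with the Huber smoothing used in practice omitted for brevity, can be sketched as follows. Names are illustrative, not IQNC-M's.

```python
# Sketch of the quantile regression loss underlying implicit quantile
# networks: an asymmetric penalty whose minimiser is the tau-quantile
# of the return distribution (Huber smoothing omitted).

def quantile_loss(tau, predicted, target):
    """Pinball loss: under-estimates are penalised by tau,
    over-estimates by (1 - tau)."""
    u = target - predicted
    return (tau if u >= 0 else tau - 1) * u

# A 0.9-quantile penalises under-prediction nine times more heavily
# than over-prediction of the same magnitude:
under = quantile_loss(0.9, 0.0, 1.0)   # predicted too low
over = quantile_loss(0.9, 1.0, 0.0)    # predicted too high
```

Averaging this loss over many sampled tau values is what lets the agent represent a whole return distribution rather than a point estimate, the basis for the robustness claims above.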
This paper introduces an energy-efficient, software-defined vehicular edge network for the growing intelligent connected transportation system. A joint user-centric virtual cell formation and resource allocation problem is investigated to bring eco-solutions to the edge. This joint problem aims to rein in power-hungry edge nodes while maintaining assured reliability and data rate. More specifically, by prioritizing the downlink communication of dynamic eco-routing, highly mobile autonomous vehicles are served by multiple low-powered access points (APs) simultaneously for ubiquitous connectivity and guaranteed network reliability. The formulated optimization is exceptionally difficult to solve in polynomial time due to its complicated combinatorial structure. Hence, a distributed multi-agent reinforcement learning (D-MARL) algorithm is proposed for eco-vehicular edges, in which multiple agents cooperatively learn to maximize their reward. First, the algorithm segments the centralized action space into multiple smaller groups. Based on a model-free distributed Q-learner, each edge agent takes its actions from its respective group. In each learning state, a software-defined controller then chooses the global best action from the individual bests of the distributed agents. Numerical results validate that our learning solution achieves near-optimal performance within a small number of training episodes compared with existing baselines.
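The action-selection structure described above — segmented action space, per-agent local argmax, controller-side global pick — can be sketched with toy Q-tables. The dictionaries stand in for Q-functions the edge agents would learn online.

```python
# Sketch of D-MARL action selection: the centralized action space is split
# into groups, each edge agent proposes the best action from its own group,
# and the SDN controller keeps the globally best proposal.

def agent_best(q_table):
    """Local argmax over one agent's action group."""
    return max(q_table, key=q_table.get)

def controller_pick(agent_q_tables):
    """Global best = argmax over each agent's local argmax."""
    proposals = [(agent_best(q), q[agent_best(q)]) for q in agent_q_tables]
    return max(proposals, key=lambda p: p[1])[0]

# Two agents, each owning a slice of the action space:
action = controller_pick([{"a1": 0.2, "a2": 0.8}, {"b1": 0.5}])
```

Splitting the action space this way trades a joint argmax over an exponentially large space for a cheap max over per-group maxima, which is what makes the distributed Q-learner tractable.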
Vehicle-to-Everything (V2X) communication is one of the pillars of intelligent transportation systems, since it enables real-time exchange of information among vehicles, roadside units (RSUs), and the infrastructure. However, efficient and secure resource distribution in a V2X network is not a simple task, constrained by the highly dynamic nature of the automotive environment, spectrum scarcity, and privacy concerns. Traditional centralized learning approaches, though effective in terms of performance, do not scale to large V2X deployments because of communication overhead, a single point of failure, and the threat of data leakage. This article proposes a secure federated reinforcement learning (SecAgg-FRL) framework for privacy-guaranteed and scalable resource allocation in V2X systems. In the proposed system, each V2X node trains a local reinforcement learning agent on its own traffic, mobility, and channel state. Raw data are never exchanged; instead, only masked model updates are sent to a federated server, where secure aggregation produces the global model without revealing any single contribution. The framework jointly maximizes spectrum usage, reduces latency and interference, and withstands inference and model-inversion attacks. Simulation findings suggest that the designed SecAgg-FRL framework achieves spectrum utilization close to that of centralized RL with low latency, scales well as the number of clients increases, and provides a high level of privacy protection with little overhead. These results position SecAgg-FRL as a useful solution for real-time, secure, and scalable resource management in next-generation V2X networks.
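The masking idea behind secure aggregation can be demonstrated with pairwise masks that cancel in the server's sum: each pair of clients shares a mask that one adds and the other subtracts. The fixed toy masks below stand in for values real SecAgg protocols derive from shared secrets.

```python
# Sketch of pairwise-mask secure aggregation: individual masked updates
# reveal nothing on their own, but the masks cancel exactly in the sum,
# so the server recovers the true aggregate.

def mask_update(update, client_id, pair_masks):
    """Client i adds +m_ij for each pair (i, j) and subtracts m_ij
    for each pair (j, i) it belongs to."""
    masked = update
    for (i, j), m in pair_masks.items():
        if client_id == i:
            masked += m
        elif client_id == j:
            masked -= m
    return masked

pair_masks = {(0, 1): 5.0, (0, 2): -3.0, (1, 2): 7.0}
updates = [1.0, 2.0, 3.0]
masked = [mask_update(u, cid, pair_masks) for cid, u in enumerate(updates)]
```

Any single masked value is shifted by masks the server cannot remove, yet `sum(masked) == sum(updates)`, which is exactly the property the framework needs for aggregation without disclosure.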
In recent years, deep learning-based traffic anomaly detection has proven very promising. Although current methods achieve high accuracy in detecting anomalies, they struggle to accurately classify the attack types of anomalies due to the imbalanced distribution of attack samples. To address this issue, we propose a highly intelligent system, ALM, which simultaneously provides accurate traffic anomaly detection and attack-type classification with the aid of the Large Language Model (LLM)'s few-shot learning ability. To tackle the challenges of training cost and inference efficiency associated with large models, ALM adopts a two-stage solution, i.e., AnomalyDetector and Anomaly Analyzer, that combines a fine-tuned LLM with small models. In the first stage, AnomalyDetector ensembles a set of lightweight models to handle high-concurrency, real-time network traffic anomaly detection. In the second stage, Anomaly Analyzer leverages the LLM's powerful fitting and few-shot learning abilities for traffic anomaly analysis through three processes: LLM task adaptation, traffic-to-sequence conversion, and LLM fine-tuning. This allows Anomaly Analyzer to accurately identify attack types and potential false positives. Experimental results indicate that ALM achieves over 90% Micro-F1 on four public datasets, with a maximum of 99.94%, surpassing the baselines. Additionally, it requires minimal training cost while significantly improving inference efficiency compared with the pure-LLM mode.
No abstract available
In communication network management, prediction of mobile network traffic is essential to ensure efficient system operation. Although significant progress has been made in the application of neural networks to traffic prediction tasks, traditional models still face considerable challenges when handling high-dimensional and highly time-dependent data. To address these issues, this article proposes a new prediction framework that leverages large-language models (LLMs), by constructing efficient prompts to enhance the ability of LLMs in traffic prediction and improve their understanding of complex traffic patterns. Specifically, we introduce functional data analysis (FDA), a technique that offers superior capabilities compared to traditional methods in processing continuous and high-dimensional data structures, to preprocess traffic data and extract key features. Extensive experiments conducted on multiple LLMs using a real-world dataset validate the effectiveness and scalability of the proposed method. The experimental results demonstrate that the framework achieves significant improvements in predictive performance, providing a promising and efficient solution for traffic data analysis in future communication networks.
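The prompt-construction step described above can be sketched as a template that pairs extracted features with the prediction request. The template wording, region label, and feature format below are illustrative assumptions, not the authors' actual prompt.

```python
# Hypothetical sketch of prompt construction for LLM-based traffic
# prediction: FDA-extracted features and recent history are embedded
# into a structured natural-language request.

def build_prompt(region, features, history):
    """Assemble a forecasting prompt from preprocessed traffic data."""
    return (
        "You are a mobile-traffic forecasting assistant.\n"
        f"Region: {region}. Extracted functional features: {features}.\n"
        f"Recent hourly volumes: {history}.\n"
        "Predict the next hour's traffic volume as a single number."
    )

prompt = build_prompt("cell-17", [0.42, -0.11], [120, 135, 150])
```

The point of the FDA preprocessing is that the feature list handed to the template is low-dimensional and smooth, rather than the raw high-dimensional series, which is what the abstract credits for the improved LLM understanding.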
Ship trajectory prediction plays a pivotal role in maritime navigation, facilitating efficient traffic management, collision avoidance, and route optimisation, particularly in the development and operation of Maritime Autonomous Surface Ships (MASS). This paper introduces GPT4STP, a novel framework that leverages transformer-based architectures inspired by Large Language Models (LLMs) for accurate and robust ship trajectory forecasting. By incorporating advanced techniques such as instance normalisation, patching, and fine-tuned positional embeddings, GPT4STP effectively captures both local and global spatial-temporal dynamics with exceptional precision and robustness in trajectory data. The model is evaluated using Automatic Identification System (AIS) datasets from two complex maritime regions: the Chengshan Jiao Promontory (CSJ) and Zhoushan Archipelago (ZS). Experimental results demonstrate GPT4STP’s superior performance across key metrics, including Average Displacement Error (ADE), Final Displacement Error (FDE), Mean Squared Error (MSE), and Mean Absolute Error (MAE). Compared to existing methods, GPT4STP achieves remarkable improvements in prediction accuracy and robustness, particularly in complex maritime environments. Beyond its technical achievements, GPT4STP offers significant practical implications for the maritime industry. By enhancing the predictive capabilities of MASS, the framework helps ensure safe and efficient maritime operations, contributing to reduced collision risks, optimised routes, and sustainable navigation. This research underscores the transformative potential of integrating cutting-edge artificial intelligence methodologies, like those inspired by LLMs, into maritime applications. The success of GPT4STP highlights a promising direction for future research, emphasising the role of AI-driven solutions in advancing autonomous maritime systems and improving overall maritime safety and efficiency.
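Two of the preprocessing techniques the abstract attributes to GPT4STP — instance normalisation and patching — can be sketched on a one-dimensional series. The patch length and the exact normalisation variant are assumptions; the real model applies them per trajectory channel before the transformer.

```python
# Sketch of per-sequence (instance) normalisation followed by
# non-overlapping patching, which turns a trajectory series into
# fixed-length tokens for a transformer.

def instance_norm(seq, eps=1e-8):
    """Zero-mean, unit-variance scaling computed per sequence."""
    mean = sum(seq) / len(seq)
    var = sum((x - mean) ** 2 for x in seq) / len(seq)
    return [(x - mean) / (var + eps) ** 0.5 for x in seq]

def patchify(seq, patch_len):
    """Non-overlapping patches become the model's input tokens."""
    return [seq[i:i + patch_len] for i in range(0, len(seq), patch_len)]

patches = patchify(instance_norm([1.0, 2.0, 3.0, 4.0]), 2)
```

Instance normalisation removes per-ship scale and offset so the model sees comparable inputs across vessels, while patching shortens the token sequence the attention layers must process.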
Flight procedures are essential to the safety and efficiency of air traffic management. However, due to the highly specialized nature of the flight procedure design process, existing methods rely heavily on manual operations and adjustments with limited automation, resulting in inefficiencies and potential safety risks. This study introduces AutoFPDesigner, a new agent-driven approach to flight procedure design, leveraging large language models (LLMs). By utilizing multi-agent collaboration, AutoFPDesigner automates Performance-Based Navigation (PBN) procedures, enabling end-to-end automation. In this framework, the designer’s role shifts from an executor to a supervisor, issuing tasks through natural language, while the system integrates specialized knowledge and uses a toolset to complete the design. Experimental results show that procedures designed with this approach meet safety requirements nearly 100%, with 75% of tasks completed in a limited number of steps. Moreover, AutoFPDesigner performs effectively across various design tasks, outperforming existing methods. Additionally, this study conducted human interaction experiments and introduced an “instruction-based” feedback method to address agent misinterpretation of human feedback. Experimental results demonstrate that the system bridges the skill gap between experts and beginners, and that the “instruction-based” feedback method enhances the accuracy of agent feedback interpretation. Code and data are available on https://github.com/Zhulongtao6/AutoFPDesigner-LLM
The management of modern networks is becoming increasingly complex, and while Software-Defined Networking (SDN) offers programmability, existing control mechanisms often lack flexibility and operational intuitiveness due to heavy reliance on manual configurations or opaque machine learning models. To address this limitation, we propose the indirect integration of a Large Language Model (LLM)-powered conversational agent into the SDN control loop, implementing the ReAct (Reasoning + Acting) paradigm as our core innovation to synergize natural language reasoning with concrete network operations. This agent utilizes a specialized toolkit that translates human conversational intents both into SQL queries (via a Text-to-SQL engine) for deep network state analysis and into REST API calls (through OpenFlow rule modifications) for real-time traffic regulation, validated through comprehensive emulation-based evaluation.
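The ReAct-style control loop described above — reason over an intent, pick a tool, act, observe — can be sketched with stubbed tools. `query_db` and `push_rule` below stand in for the Text-to-SQL engine and the REST/OpenFlow calls; the 800 Mbps threshold and rule format are invented for illustration.

```python
# Sketch of one ReAct reasoning/acting step in an SDN control loop,
# with stub tools in place of the real Text-to-SQL and REST backends.

def query_db(sql):
    """Stub for the Text-to-SQL engine: returns link loads in Mbps."""
    return {"h1->h2": 950}

def push_rule(rule):
    """Stub for the REST API / OpenFlow rule installation."""
    return f"installed: {rule}"

def react_step(intent):
    """Reason over the intent, choose a tool, act, and report."""
    if "congest" in intent:
        load = query_db("SELECT load FROM links ORDER BY load DESC LIMIT 1")
        link, mbps = next(iter(load.items()))
        if mbps > 800:                      # illustrative threshold
            return push_rule(f"reroute {link}")
        return "no action needed"
    return "unsupported intent"

result = react_step("find and fix congestion")
```

In the full agent, an LLM produces the "thought" that selects the tool and its arguments; the loop then feeds the tool's observation back into the next reasoning step.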
Traffic congestion and inefficiencies in transportation networks pose significant challenges to road safety, travel times, and environmental sustainability. Traditional traffic management systems, typically reliant on sparse sensor data and rigid models, often fail to provide accurate, reliable, and user-friendly insights. This paper introduces a novel Physics-Informed Neural Network-Based Traffic State Estimator (PINN-TSE) framework that integrates the Aw-Rascle traffic flow model with advanced machine learning and natural language processing (NLP) techniques. By combining physics-informed modeling with data-driven learning, the framework ensures accurate and physically consistent predictions of traffic density and velocity. A multicomponent loss function balances data fidelity with physical constraints, while Large Language Models (LLMs) generate contextualized and interpretable traffic insights through a chat-based web interface. The system is designed to handle diverse user queries, from precise spatio-temporal inputs to broad, general inquiries, making it highly adaptable for real-world deployment. Validated on real-world data from the US-101 highway, PINN-TSE demonstrated strong performance in capturing shockwave dynamics and transitions between traffic regimes. It achieved mean absolute errors (MAE) of 2.4 vehicles per mile (vpm) for density and 3.98 mph for velocity, representing improvements of 60% and 73%, respectively, over purely data-driven models. Furthermore, the shockwave speed error was reduced to 8%, significantly improving the reliability of traffic dynamic predictions. The system's ability to provide actionable insights, such as identifying congestion hotspots and suggesting alternative routes, highlights its practical utility in real-world traffic management.
This work makes three key contributions: 1) a robust PINN-TSE framework that embeds physical laws into neural networks, 2) an intuitive LLM-powered interface for real-time traffic interaction, and 3) a demonstration of its effectiveness in real-world settings. By bridging the gap between complex traffic data and human decision-making, this study advances the field of intelligent transportation systems, offering a transformative solution to safer, more efficient, and sustainable traffic management.
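The multicomponent loss at the heart of the PINN-TSE framework combines a data-fidelity term with a weighted physics-residual term. The sketch below uses a placeholder residual list and an illustrative weight `lam`; in the actual framework the residual is the Aw-Rascle PDE evaluated at collocation points via automatic differentiation.

```python
# Sketch of a two-component PINN loss: mean-squared data error plus a
# weighted mean-squared physics residual (placeholder residuals here).

def pinn_loss(pred, obs, residuals, lam=0.1):
    """Total loss = MSE(data) + lam * MSE(physics residual)."""
    data = sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)
    phys = sum(r ** 2 for r in residuals) / len(residuals)
    return data + lam * phys

loss = pinn_loss([1.0, 2.0], [1.0, 3.0], [0.5, -0.5])
```

Tuning `lam` is what "balances data fidelity with physical constraints": a larger weight pulls the network toward physically consistent fields even where observations are sparse.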
Abnormal phenomena on urban roads, including uneven surfaces, garbage, traffic congestion, floods, fallen trees, fires, and traffic accidents, present significant risks to public safety and infrastructure, necessitating real-time monitoring and early warning systems. This study develops Urban Road Anomaly Visual Large Language Models (URA-VLMs), a generative AI-based framework designed for the monitoring of diverse urban road anomalies. The InternVL was selected as a foundational model due to its adaptability for this monitoring purpose. The URA-VLMs framework features dedicated modules for anomaly detection, flood depth estimation, and safety level assessment, utilizing multi-step prompting and retrieval-augmented generation (RAG) for precise and adaptive analysis. A comprehensive dataset of 3034 annotated images depicting various urban road scenarios was developed to evaluate the models. Experimental results demonstrate the system’s effectiveness, achieving an overall anomaly detection accuracy of 93.20%, outperforming state-of-the-art models such as InternVL2.5 and ResNet34. By facilitating early detection and real-time decision-making, this generative AI approach offers a scalable and robust solution that contributes to a smarter, safer road environment.
A robust and efficient traffic monitoring system is essential for smart cities and Intelligent Transportation Systems (ITS), using sensors and cameras to track vehicle movements, optimize traffic flow, reduce congestion, enhance road safety, and enable real-time adaptive traffic control. Traffic monitoring models must comprehensively understand dynamic urban conditions and provide an intuitive user interface for effective management. This research leverages the Large Language-and-Vision Assistant (LLaVA) visual grounding multimodal large language model (LLM) for traffic monitoring tasks on the real-time Quanser Interactive Lab simulation platform, covering scenarios like intersections, congestion, and collisions. Cameras placed at multiple urban locations collect real-time images from the simulation, which are fed into the LLaVA model with queries for analysis. An instance segmentation model integrated into the cameras highlights key elements such as vehicles and pedestrians, enhancing training and throughput. The system achieves 84.3% accuracy in recognizing vehicle locations and 76.4% in determining steering direction, outperforming traditional models.
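The camera-to-LLM pipeline described above can be sketched as a routing step: each camera frame, augmented with segmentation labels, is paired with a task-specific query before being sent to the multimodal model. Function names and query templates are illustrative assumptions.

```python
# Hypothetical sketch of the monitoring pipeline's query-assembly stage:
# segmentation output plus a scenario-specific question per camera frame.

QUERIES = {
    "intersection": "List vehicle locations and steering directions.",
    "congestion": "Estimate queue length per approach.",
    "collision": "Report any collision and involved vehicles.",
}

def assemble_request(camera_id, scenario, segments):
    """Bundle a frame's segmentation labels with the analysis query."""
    return {
        "camera": camera_id,
        "query": QUERIES[scenario],
        "segments": sorted(segments),   # e.g. labels from the seg. model
    }

req = assemble_request("cam-3", "intersection", {"vehicle", "pedestrian"})
```

Highlighting segments before the multimodal call is the abstract's stated mechanism for improving grounding accuracy and throughput.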
To satisfy the growing demand for traffic prediction induced by urbanization, intelligent transportation systems have integrated various cutting-edge artificial intelligence technologies, with large language models (LLMs) as a representative example. However, existing methods are mostly confined to shallow utilization of LLMs, where the semantic capacity of LLMs is ignored and traffic data are fed in directly. Furthermore, the modality diversity of different traffic prediction scenarios (e.g., flow, speed, and demand) remains underexplored, which restricts model flexibility towards downstream applications. To mitigate these limitations, we propose a Mixture of Semantic and Spatial Experts (SS-MoE) for traffic prediction, along with human-intelligible post-hoc result explanation. Specifically, to enlighten the traffic predictor with abundant semantic information, we design hierarchically coarse- and fine-grained prompts including role assignments, dataset descriptions, and background supplements, which serve as auxiliary knowledge for downstream prediction. Afterwards, considering the diversity of real-world traffic scenarios, we construct an MoE framework consisting of a spatial expert, a semantic expert, and a general expert, which account for the node-level features, the semantic representations, and the overall generalization, respectively. At last, we instruct the LLM to explain and analyze the final prediction, which can provide insightful conclusions and support intelligent transportation decisions, forming a unified prediction-explanation pipeline. Extensive experiments on five public traffic datasets demonstrate the superiority of SS-MoE across three traffic prediction tasks. Experimental results indicate that the MAE and RMSE values of SS-MoE are reduced by up to 4.04% and 3.20% compared with those of the runner-up, respectively.
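The three-expert mixture described above amounts to a softmax gate weighting each expert's output. The toy below uses linear lambdas in place of SS-MoE's learned spatial, semantic, and general experts.

```python
# Sketch of mixture-of-experts prediction: a softmax gate blends the
# outputs of several experts (toy linear functions here).

import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(x, experts, gate_logits):
    """Gate-weighted combination of expert predictions."""
    weights = softmax(gate_logits)
    return sum(w * f(x) for w, f in zip(weights, experts))

# Three toy experts standing in for spatial / semantic / general:
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: 0.5 * x]
y = moe_predict(4.0, experts, [1.0, 1.0, 1.0])  # equal gate -> mean of 8, 5, 2
```

In SS-MoE the gate logits are themselves learned per input, so different traffic scenarios (flow, speed, demand) can lean on different experts.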
With the rising incidence of traffic accidents and growing environmental concerns, the demand for advanced systems to ensure traffic and environmental safety has become increasingly urgent. This paper introduces an automated highway safety management framework that integrates computer vision and natural language processing for real-time monitoring, analysis, and reporting of traffic incidents. The system not only identifies accidents but also aids in coordinating emergency responses, such as dispatching ambulances, fire services, and police, while simultaneously managing traffic flow. The approach begins with the creation of a diverse highway accident dataset, combining public datasets with drone and CCTV footage. YOLOv11s is retrained on this dataset to enable real-time detection of critical traffic elements and anomalies, such as collisions and fires. A vision–language model (VLM), Moondream2, is employed to generate detailed scene descriptions, which are further refined by a large language model (LLM), GPT 4-Turbo, to produce concise incident reports and actionable suggestions. These reports are automatically sent to relevant authorities, ensuring prompt and effective response. The system’s effectiveness is validated through the analysis of diverse accident videos and zero-shot simulation testing within the Webots environment. The results highlight the potential of combining drone and CCTV imagery with AI-driven methodologies to improve traffic management and enhance public safety. Future work will include refining detection models, expanding dataset diversity, and deploying the framework in real-world scenarios using live drone and CCTV feeds. This study lays the groundwork for scalable and reliable solutions to address critical traffic safety challenges.
Traffic control in unsignalized urban intersections presents significant challenges due to the complexity, frequent conflicts, and blind spots. This study explores the capability of leveraging Multimodal Large Language Models (MLLMs), such as GPT-4o, to provide logical and visual reasoning by directly using birds-eye-view videos of four-legged intersections. In the proposed method, GPT-4o acts as an intelligent system that detects conflicts and provides explanations and recommendations for the drivers. The fine-tuned model achieved an accuracy of 77.14%, while manual evaluation of the true predictions of the fine-tuned GPT-4o showed significant achievements of 89.9% accuracy for model-generated explanations and 92.3% for the recommended next actions. These results highlight the feasibility of using MLLMs for real-time traffic management with videos as inputs, offering scalable and actionable insights into intersection traffic management and operation. Code used in this study is available at https://github.com/sarimasri3/Traffic-Intersection-Conflict-Detection-using-images.git.
Traffic prediction, an essential component of intelligent transportation systems, endeavours to use historical data to foresee future traffic features at specific locations. Although existing traffic prediction models often emphasize developing complex neural network structures, their accuracy has not improved commensurately. Recently, large language models have shown outstanding capabilities in time series analysis. Differing from existing models, LLMs progress mainly through parameter expansion and extensive pretraining while maintaining their fundamental structures. Motivated by these developments, we propose a Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. In the ST-LLM, we define the timesteps at each location as tokens and design a spatial-temporal embedding to learn the spatial location and global temporal patterns of these tokens. Additionally, we integrate these embeddings by a fusion convolution into each token for a unified spatial-temporal representation. Furthermore, we introduce a partially frozen attention strategy to adapt the LLM to capture global spatial-temporal dependencies for traffic prediction. Comprehensive experiments on real traffic datasets offer evidence that ST-LLM is a powerful spatial-temporal learner that outperforms state-of-the-art models. Notably, the ST-LLM also exhibits robust performance in both few-shot and zero-shot prediction scenarios. The code is publicly available at https://github.com/ChenxiLiu-HNU/ST-LLM.
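One common realisation of partial freezing is to keep the pretrained parameters of the first k transformer blocks fixed and fine-tune only the remaining blocks; the sketch below expresses that split as a trainability mask. The exact blocks ST-LLM freezes may differ from this simplified reading.

```python
# Sketch of a partial-freezing schedule: only blocks past the frozen
# prefix receive gradient updates during traffic-domain fine-tuning.

def trainable_mask(n_layers, n_frozen):
    """True = the block's parameters receive gradients."""
    return {f"block_{i}": i >= n_frozen for i in range(n_layers)}

mask = trainable_mask(6, 4)
# -> only the last two blocks adapt; the rest keep pretrained knowledge
```

Freezing the prefix preserves the generic sequence-modelling capacity gained in pretraining while letting the later blocks specialise to spatial-temporal traffic tokens.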
With the urbanization process, an increasing number of sensors are being deployed in transportation systems, leading to an explosion of big data. To harness the power of this vast transportation data, various machine learning (ML) and artificial intelligence (AI) methods have been introduced to address numerous transportation challenges. However, these methods often require significant investment in data collection, processing, and storage, and the employment of professionals with expertise in transportation and ML. Additionally, privacy is a major concern when processing data for real-world traffic control and management. To address these challenges, the research team proposes an innovative multi-agent framework called Independent Mobility Generative Pre-Trained Transformers (IDM-GPT) based on large language models (LLMs) for customized traffic analysis, management suggestions, and privacy preservation. IDM-GPT connects users, transportation databases, and ML models efficiently and economically. IDM-GPT trains, customizes, and applies various LLM-based AI agents for multiple functions, including user query comprehension, prompt optimization, data analysis, model selection, and performance evaluation and enhancement. With IDM-GPT, users without any background in transportation or ML can efficiently and intuitively obtain data analysis and customized suggestions in near real-time based on their questions. Experimental results demonstrate that IDM-GPT delivers satisfactory performance across multiple traffic-related tasks, providing comprehensive and actionable insights that support effective traffic management and urban mobility improvement.
Accurate traffic prediction is essential for intelligent transportation systems, urban mobility management, and traffic optimization. However, existing deep learning approaches often struggle to jointly capture complex spatial dependencies and temporal dynamics, and they are prone to overfitting when modeling large-scale traffic networks. To address these challenges, we propose the GSF-LLM (graph-enhanced spatio-temporal fusion-based large language model), a novel framework that integrates large language models (LLMs) with graph-based spatio-temporal learning. GSF-LLM employs a spatio-temporal fusion module to jointly encode spatial and temporal correlations, combined with a partially frozen graph attention (PFGA) mechanism to model topological dependencies while mitigating overfitting. Furthermore, a low-rank adaptation (LoRA) strategy is adopted to fine-tune a subset of LLM parameters, improving training efficiency and generalization. Experiments on multiple real-world traffic datasets demonstrate that GSF-LLM consistently outperforms state-of-the-art baselines, showing strong potential for extension to related tasks such as data imputation, trajectory generation, and anomaly detection.
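The LoRA strategy adopted by GSF-LLM replaces full fine-tuning with a trainable low-rank product added to a frozen weight matrix, y = (W + BA)x. A plain-list sketch with a rank-1 update (the rank and shapes are illustrative):

```python
# Sketch of a LoRA forward pass: the frozen weight W is augmented with a
# trainable low-rank product B @ A, so only r*(m+n) parameters are tuned.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(W, A, B, x):
    """y = (W + B @ A) x, with W frozen and A, B (rank r) trainable."""
    delta = matmul(B, A)
    Weff = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in Weff]

W = [[1.0, 0.0], [0.0, 1.0]]           # frozen 2x2 identity
B, A = [[1.0], [0.0]], [[0.0, 2.0]]    # rank-1 update: delta = [[0,2],[0,0]]
y = lora_forward(W, A, B, [1.0, 1.0])
```

Because only A and B are updated, the number of trainable parameters scales with the rank rather than with the full weight matrix, which is the training-efficiency gain the abstract claims.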
Traffic offences are a major source of road accidents and traffic congestion in cities. Manual monitoring is time-consuming, inaccurate, and ineffective. This study introduces an Intelligent Traffic Violation Detection and Explanation System (ITVDES), which uses computer vision (CV), optical character recognition (OCR), and large language models (LLMs) to detect, identify, and report traffic violations automatically. The system identifies various categories of violations, including helmet non-compliance, lane violations, and red-light crossings; scans vehicle number plates through OCR; and uses LLMs to produce structured, legally valid reports. Preliminary testing on recorded video data shows the system to be accurate and scalable; future work will extend it to real-time CCTV processing and integration with law-enforcement notification systems. This research demonstrates the potential of AI to produce safer and smarter traffic management systems.
Urban traffic management faces significant challenges due to dynamic environments, and traditional algorithms fail to adapt to these environments in real time or to predict possible conflicts. This study explores the ability of a Large Language Model (LLM), specifically GPT-4o-mini, to improve traffic management at urban intersections. We employed GPT-4o-mini to analyze scenes, predict positions, and detect and resolve conflicts at an intersection in real time across various basic scenarios. The key aim of this study is to investigate whether LLMs can logically reason about and understand such scenarios to enhance traffic efficiency and safety by providing real-time analysis. The study highlights the potential of LLMs to make urban traffic management more intelligent and more adaptive. Results showed that GPT-4o-mini was able to effectively detect and resolve conflicts under heavy traffic, congestion, and mixed-speed conditions. A complex scenario of multiple intersections with obstacles and pedestrians also saw successful conflict management. These results show that integrating LLMs promises to improve the effectiveness of traffic control for safer and more efficient urban intersection management.
Traffic congestion in Nairobi’s Central Business District continues to impose high economic, social, and environmental costs. Long queues at intersections, wasted fuel, and poor air quality are common outcomes of conventional fixed-time traffic signals. These systems dominate the city but do not respond to fluctuating or multimodal traffic. This study explored the use of Large Language Model (LLM) agents as adaptive controllers and compared their performance with existing fixed-time plans. Traffic video recordings were collected from selected intersections and analyzed using the YOLOv5 object detection algorithm to estimate lane-specific vehicle counts. The processed counts were then used to calibrate a Simulation of Urban Mobility (SUMO) environment. Within this setup, LLM agents allocated green times dynamically and adjusted signal phases in real time. The study adopted an experimental simulation design, testing both peak and off-peak traffic conditions as well as disruption scenarios such as blocked approaches and emergency vehicle passage. To ensure reliability, the SUMO model was calibrated against observed volumes and validated using standard traffic simulation statistics. Performance was assessed using three key indicators: average waiting time, intersection throughput, and responsiveness to demand fluctuations. Results showed that the LLM-based model reduced waiting times by up to 35%, increased throughput by 12–18%, and stabilized signal plans within fewer cycles than the fixed-time baseline. Beyond efficiency gains, the study demonstrates the feasibility of repurposing generalist AI models as decision agents in traffic management, offering a low-cost, scalable solution particularly suited to resource-constrained cities. 
By providing localized evidence from Nairobi, the research contributes to Intelligent Transportation Systems (ITS) literature and supports policy directions that include piloting AI-powered adaptive control at critical intersections as part of broader smart mobility strategies in African cities.
Taxi demand prediction is essential for intelligent transportation systems. Accurate prediction results help address the issue of supply–demand imbalances and enable more efficient traffic management. Significant advances have been made in traffic demand prediction, particularly through the use of deep learning models. However, these models heavily rely on a large amount of data. Data scarcity remains a significant challenge because of high acquisition and storage costs, as well as data sparsity in certain locations and times. Thus, this study proposes a novel taxi demand prediction model that leverages the large language model GPT-2 to capture complex spatio-temporal dependencies. By integrating spatial correlations through a graph attention network and incorporating temporal dependencies at multiple scales, the proposed spatio-temporal taxi demand prediction large model (STTDP-LM) is capable of achieving accurate prediction with limited training data. Extensive experiments validate its effectiveness across two districts in Xi’an. Compared to the baseline method, the STTDP-LM reduces the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) by an average of 12.25%, 12.55%, and 18.33%, respectively, across the two districts. When trained with only 1% of the data, the model still shows significant improvement, with average reductions of 33.83%, 34.12%, and 17.03% in the RMSE, MAE, and MAPE, respectively. The model's accuracy advantage is more prominent in multi-step prediction with a total duration of 60 min. In summary, this study offers a promising solution for taxi demand prediction with limited historical data, providing valuable insights for real-world applications in intelligent transportation systems.
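The spatial component above is a graph attention network; a minimal single-head GAT-style aggregation (NumPy, with hypothetical feature dimensions, and not the paper's exact architecture) can be sketched as:

```python
import numpy as np

def gat_layer(h, adj, W, a):
    """Minimal single-head graph-attention aggregation (sketch).
    h: (N, F) node features, adj: (N, N) 0/1 adjacency with self-loops,
    W: (F, F') projection, a: (2*F',) attention vector."""
    z = h @ W                                  # project node features
    N = z.shape[0]
    # pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else 0.2 * s  # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -1e9)             # mask non-neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over neighbors
    return alpha @ z                           # attention-weighted mix
```

In the STTDP-LM setting the nodes would be city zones and the adjacency would encode spatial proximity, with the aggregated features feeding the temporal model.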
This paper presents a novel AI-based smart traffic management system designed to optimize traffic flow and reduce congestion in urban environments. By analysing live footage from existing CCTV cameras, this approach eliminates the need for additional hardware, thereby minimizing both deployment costs and ongoing maintenance expenses. The AI model processes live video feeds to accurately count vehicles and assess traffic density, allowing for adaptive signal control that prioritizes directions with higher traffic volumes. This real-time adaptability ensures smoother traffic flow, reduces congestion, and minimizes waiting times for drivers. Additionally, the proposed system is simulated using PyGame to evaluate its performance under various traffic conditions. The simulation results demonstrate that the AI-based system outperforms traditional static traffic light systems by 34%, leading to significant improvements in traffic flow efficiency. The use of AI to optimize traffic signals can play a crucial role in addressing urban traffic challenges, offering a cost-effective, scalable, and efficient solution for modern cities. This innovative system represents a key advancement in the field of smart city infrastructure and intelligent transportation systems.
In the aviation domain, there are many applications for machine learning and artificial intelligence tools that utilize natural language. For example, there is a desire to find commonalities in written safety reports, such as voluntary post-incident reports, or to create more accurate transcripts of air traffic management conversations. Another use case is extracting airspace procedures and constraints currently written in documents such as Letters of Agreement (LOA), which serve as the evaluation case in this paper. These applications can benefit from state-of-the-art Natural Language Processing (NLP) techniques when adapted to the language and phraseology specific to the aviation domain. This paper evaluates the viability of transferring pre-trained large language models to the aviation domain by adapting transformer-based models using aviation datasets. It adapts a ‘Robustly Optimized Bidirectional Encoder Representations from Transformers Approach’ (RoBERTa) model and uses two downstream classification tasks to assess its performance. The datasets are all built upon Letters of Agreement, Federal Aviation Administration (FAA) documents that formalize airspace operations across the national airspace system. The first two datasets, used to adapt RoBERTa to the aviation domain, were of different sizes to assess the number of documents needed for adaptation. They contain many examples of ‘aviation English’ with domain-specific terminology and phrasing, which serves as a representative basis for the unsupervised adaptation. The third dataset is a separate set of LOA documents with two sets of classification labels used for evaluation: one at the document level and one at the line level. These downstream evaluations allowed the measurement of improvement from adapting RoBERTa.
Accuracy increased by 4–6% on both tasks, and the F1 score on the class of interest increased by 4–8% as a result of the adaptation.
Urban air transportation is undergoing a revolutionary transformation through the integration of Urban Air Mobility (UAM) and Advanced Air Mobility (AAM) systems. One big challenge is that new aerial vehicles (AVs) will quickly saturate the already crowded aviation spectrum, which is an essential resource for ensuring reliable communications for safe air operations. In this paper, we consider an air transportation system where multiple AVs are operated to transport passengers or cargo from different sources to destinations. During the flight, a minimum communication Quality of Service (QoS) must be achieved at all times to ensure flight safety. Our objective is to minimize the total mission completion time by jointly optimizing the trajectories, velocities, and spectrum allocation for all AVs. We formulate the optimization problem as a multi-stage Markov game in which the optimization variables are coupled together. A multi-agent deep reinforcement learning algorithm, VD3QN, is proposed to enable cooperative learning among AVs for both trajectory planning and resource optimization. Additionally, we propose an orthogonal multiple access with Space-Time A* algorithm as a non-learning-based solution. Extensive simulation results show that our learning-based solution outperforms both the non-learning-based solution and other learning-based approaches such as QMIX under different parameters.
Urban air mobility (UAM) is a revolutionary urban transportation paradigm that aims to transport passengers with an emphasis on safety, power efficiency, and autonomous operation. UAM aircraft also need to regularly transmit status and environmental conditions to the base stations (BS) to keep the system's information up to date. Hence, the age of information (AoI) can be used as a status measurement indicator to evaluate how outdated the current status information of a UAM aircraft is. Optimizing each UAM aircraft's flight trajectory for both power efficiency and safety, while also minimizing the AoI for timely data transmission, forms a computationally challenging NP-hard problem. In this paper, a multi-agent deep reinforcement learning based solution is proposed to address this problem by exploiting a sophisticated method to handle multidimensional decision spaces covering power efficiency, safety in trajectory planning, and AoI. Simulation results demonstrate that our method produces a power efficiency improvement of 8.8% compared to the PPO method and 13.8% compared to the detour method. Furthermore, the proposed approach significantly improves data freshness, by up to 31.5%, 31.7%, and 12.9% in terms of AoI compared to the greedy, detour, and single-PPO methods, respectively.
The high rate of urbanization and high concentration of vehicles have led to extreme traffic congestion, thereby contributing significantly to greenhouse gas emissions. This study proposes an Adaptive Multi-Agent Reinforcement Learning (MARL) framework combined with an Emission-Aware Reward System to optimize traffic lights. The suggested system is a decentralized MARL in which each intersection is treated as a single agent that disseminates information to neighboring agents to prevent congestion, coordinate with other agents, and attain efficient traffic flow. The new dynamic signal timing algorithm, Traffic-Emission Adaptive Learning (TEAL), examines actual traffic density, vehicle types, and other environmental statistics, such as the air quality index (AQI) and carbon footprint metrics. The framework also includes an Eco-Mobility Prediction Module, a Graph Neural Network (GNN)-based predictor of congestion patterns and green-wave synchronization, designed to reduce vehicle idle time and emissions. Furthermore, a Vehicle-to-Infrastructure (V2I) mechanism encourages the use of electric and hybrid vehicles by giving them higher priority, supporting zero-emission transportation. Experimental findings in a simulated urban setting indicate that the proposed solution reduces CO2 emissions by 30 percent and is significantly more efficient than traditional models. The model thus scales sustainable, zero-emission mobility into an eco-friendly solution for future smart-city environments.
No abstract available
The emerging concepts of Urban Air Mobility (UAM) and Advanced Air Mobility (AAM) open a new paradigm for urban air transportation. One big challenge is that new aerial vehicles (AV) will quickly saturate the already crowded aviation spectrum, which is an essential resource to ensure reliable communications for safe air operations. In this paper, we consider an air transportation system where multiple AVs are operated to transport passengers or cargo from different sources to destinations along their pre-defined paths. During the flight, the minimum communication Quality of Service (QoS) must be achieved at all times to ensure flight safety. Our objective is to minimize the total mission completion time by jointly optimizing the velocities and spectrum allocation for all AVs. We formulate the optimization problem as a multi-stage Markov game where the optimization variables are coupled together. A multi-agent deep reinforcement learning VD3QN algorithm is proposed to enable cooperative learning among AVs. Additionally, we propose a heuristic greedy algorithm (HGA) and an orthogonal multiple access (OMA) solution as baseline solutions. Extensive simulation results show that our learning-based solution outperforms the baseline solutions under different network configurations.
This paper presents an integrated framework for autonomous mobility-on-demand (AMoD) systems, focusing on dynamic pricing, ride-sharing, and decentralized coordination. Built on a high-fidelity, city-scale environment calibrated with NYC taxi data, the framework dynamically generates passenger and shared autonomous vehicle (SAV) agents based on real-world spatiotemporal demand patterns. The system integrates a multi-objective multi-agent deep reinforcement learning (MOMADRL) framework with centralized training and decentralized execution (CTDE), allowing agents to jointly optimize individual incentives and system-level social welfare. Adaptive pricing strategies, flexible ride-matching mechanisms, and zone-based geographic abstractions are included to enhance computational efficiency while maintaining geographic realism. Experimental results demonstrate that our framework consistently improves key performance indicators like passenger waiting times, vehicle utilization, and pricing stability, outperforming purely centralized or decentralized methods. This research provides a robust platform for testing adaptive policies and advancing scalable, equitable AMoD system design.
In response to traffic congestion in urban areas, the National Aeronautics and Space Administration promotes the concept of Advanced Air Mobility (AAM), which envisages a safe and efficient air transportation system. However, the increased communication demands in AAM can exacerbate the spectrum scarcity issue, so a new communication paradigm is necessary. In this paper, we consider multiple aerial vehicles (AV) flying along their shortest paths for cargo/passenger delivery. During the flight, each AV must make decisions in every time slot on communication resource allocation and velocity selection under safety constraints. Accordingly, we formulate the joint optimization problem to minimize the weighted sum of the total travel time and communication outage time. The optimization problem is formulated as a Markov game and a multi-agent reinforcement learning based algorithm is proposed. Simulation results corroborate the effectiveness of the proposed solution.
We study a sequential decision-making problem for a profit-maximizing operator of an autonomous mobility-on-demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider coordinated actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.
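The dispatching step above pairs vehicles with requests via weighted bipartite matching; an exhaustive stdlib matcher illustrates the objective on a toy instance (the function name and scores are hypothetical; in practice the Hungarian algorithm or, as in the paper, weighted matching on critic scores replaces this factorial search):

```python
from itertools import permutations

def best_assignment(score):
    """Exhaustive maximum-weight assignment of vehicles (rows) to
    requests (columns) for a small square score matrix. In the paper's
    setting the scores would come from the learned critic; the
    Hungarian algorithm solves the same problem in O(n^3) at scale."""
    n = len(score)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, list(best_perm)
```

For two vehicles and two requests this simply checks both pairings and keeps the higher-scoring one.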
No abstract available
Urbanization is intensifying congestion, emissions, and unequal mobility access in cities. This study aims to operationalize sustainability objectives—efficiency, environmental externalities, and service equity—in network-wide traffic system control. We propose SERL-H, a sustainability-aware hierarchical multi-agent reinforcement learning (MARL) controller. SERL-H separates fast intersection-level actuation from slower region-level coordination under a centralized-training decentralized-execution paradigm, and employs adaptive graph attention to capture time-varying interdependencies with bounded neighborhood communication. The learning reward explicitly balances delay/throughput, emissions/fuel, and an equity regularizer based on service dispersion across user groups. In a SUMO-based city-scale simulation with 100 signalized intersections, SERL-H reduces average delay from 45 s to 29 s and average travel time from 120 s to 88 s relative to fixed-time control, while increasing throughput and lowering total emissions (4800 kg to 3950 kg). A socio-economic assessment suggests higher annualized cost savings (e.g., $50.27 M/year to $65.91 M/year) and improved environmental quality indices. We also report, as supporting evidence, an optional sustainability-enhanced spatio-temporal graph predictor (SUT-GNN) that provides reliable short-horizon forecasts during peak-hour volatility.
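The reward described above balances delay/throughput, emissions, and an equity regularizer based on service dispersion; a schematic scalarization (all weights are illustrative assumptions, not the paper's values) might look like:

```python
from statistics import mean, pstdev

def sustainability_reward(delays, emissions_kg, throughput,
                          w_delay=1.0, w_emit=0.5, w_thru=0.2, w_eq=0.3):
    """Schematic scalarized reward: penalize mean delay and emissions,
    reward throughput, and penalize dispersion of delay across user
    groups as an equity proxy. Weights are illustrative only."""
    equity_penalty = pstdev(delays) if len(delays) > 1 else 0.0
    return (w_thru * throughput
            - w_delay * mean(delays)
            - w_emit * emissions_kg
            - w_eq * equity_penalty)
```

Holding mean delay fixed, a more even distribution of delay across groups yields a higher reward, which is the intended effect of the dispersion term.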
Urban earthquake evacuations require real-time, automated, data-driven decision-support systems that can handle evolving risks and restrictions on mobility. In this paper, a framework for large-scale evacuation optimization is offered, consisting of Risk-Aware Multi-Agent Reinforcement Learning (MARL) integrated with a GIS-based Digital Twin. The framework uses a Conditional Value-at-Risk (CVaR) objective to augment the robustness of the policy in the face of significant and rare disruptions, while a multilayered architecture (data fusion, digital twin, CTDE learning, real-time guidance) assures operational coherence. The scalability and reliability of the system were tested on 1,200 earthquake scenarios created by fusing statistically representative GIS, IoT, and mobility data. The experimental results show that the proposed CVaR-MARL model has about 10–12% lower variance in clearance time and markedly improved stability compared to traditional MARL and PPO methods. The contributions of this paper are (1) the development of a risk-sensitive learning paradigm for real-time evacuation, (2) a GIS-IoT digital twin that can be used for scenario simulation, and (3) a comprehensive evaluation of latency, robustness, and interpretability. These findings imply that risk-aware MARL has strong potential for use in urban crisis management and real-time decision support systems.
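Empirically, the CVaR objective used here is the mean of the worst tail of outcomes: for clearance times, the average of the largest (1 - alpha) fraction of samples. A minimal sample estimator:

```python
def cvar(samples, alpha=0.9):
    """Empirical Conditional Value-at-Risk: the mean of the worst
    (1 - alpha) fraction of outcomes, here the largest clearance
    times. Minimizing CVaR hardens the policy against rare, severe
    scenarios rather than just the average case."""
    s = sorted(samples)
    k = max(1, int(round((1 - alpha) * len(s))))  # tail size
    tail = s[-k:]
    return sum(tail) / len(tail)
```

With alpha = 0.9 and ten samples, only the single worst clearance time contributes; lowering alpha widens the tail being averaged.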
This study aims to explore how to improve the reliability of the next-generation mobility model–Mobility as a Service (MaaS) based on autonomous vehicles, with a particular focus on the system’s resilience to uncertainty. Currently, the application of reliability engineering in the field of smart mobility services is primarily concentrated on technical details, lacking unified standards and methods to enhance the reliability of service levels. This paper attempts to fill this research gap. In this study, we adopt a combined approach of system analysis and optimization algorithms. First, we design a system reliability analysis method by examining the potential discrepancies between the system’s capability to provide mobility services and stakeholder demands. Subsequently, we propose a reinforcement learning-based system service capacity optimization algorithm aimed at enhancing the system’s resilience at the service level to tackle challenges posed by uncertainty. To validate the effectiveness of the proposed method, we conduct a case study on a practical intelligent mobility service framework. Through system simulation, we generate and collect data on system service capacity, demand discrepancies, and uncertainty, as well as stakeholders’ expectations for the MaaS framework evaluation. Case studies and experimental data analysis confirm that the proposed resilience engineering approach effectively identifies potential risks in system service capacity and provides a compromise system resilience engineering solution in the context of conflicting stakeholder demands. To facilitate reproducibility and further research, the core code is available at https://github.com/zzs-code/MaaS-RE-MARL.git.
Efficient and collision-free route planning for multiple agents in urban environments presents a significant challenge for intelligent transportation systems and autonomous logistics. This paper introduces an integrated system that generates globally collision-free routes for multiple agents operating on real-world road networks derived from OpenStreetMap (OSM) data. The system employs the Conflict-Based Search (CBS) algorithm for centralized route optimization, while modeling agents' local movements using principles inspired by Cellular Automata (CA). The primary novel contributions are twofold: an enhanced hybrid OSM-SUMO route conversion mechanism that robustly transforms planned OSM-based routes into executable edge sequences in the SUMO (Simulation of Urban MObility) microscopic traffic simulator, addressing common ID matching and segmentation issues; and a comprehensive analysis and simulation framework, incorporating diverse visualization tools to thoroughly evaluate the performance, spatio-temporal behavior, and conflict resolution processes of CBS solutions. The effectiveness of the developed system is demonstrated through the successful generation and dynamic simulation of collision-free routes and their seamless transfer to the SUMO environment using the proposed conversion technique, accentuating its practical applicability in realistic urban scenarios.
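CBS resolves collisions by detecting the earliest conflict in the joint plan and branching on constraints that forbid it; a sketch of the vertex-conflict check at its core (agent paths as node sequences; the convention that an agent waits at its goal after finishing is an assumption) is:

```python
def first_vertex_conflict(paths):
    """Return (t, node, agent_a, agent_b) for the earliest time step at
    which two agents occupy the same node, or None if the joint plan is
    conflict-free. Each path is a per-agent node sequence; agents are
    assumed to wait at their goal after finishing."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        seen = {}                       # node -> agent occupying it at t
        for a, path in enumerate(paths):
            node = path[t] if t < len(path) else path[-1]
            if node in seen:
                return (t, node, seen[node], a)
            seen[node] = a
    return None
```

In full CBS the returned conflict spawns two child search nodes, each constraining one of the two agents away from (node, t).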
The Autonomous Mobility-on-Demand system is an emerging green and sustainable transportation system providing on-demand mobility services for urban residents. To achieve the best recharging, delivering, and repositioning task assignment decision-making process for shared autonomous electric vehicles, this paper formulates the fleet dynamic operating process into a multi-agent multi-task dynamic dispatching problem based on a Markov Decision Process. Specifically, the decision-making process at each time step is divided into three sub-processes, among which the recharging and delivery task assignment processes are each transformed into a maximum weight matching problem on a bipartite graph, and the repositioning task assignment process is quantified as a maximum flow problem. The Kuhn-Munkres algorithm and the Edmonds-Karp algorithm are adopted to solve these two mathematical problems and achieve the optimal task allocation policy. To further improve the dispatching performance, a new instant reward function balancing order income with trip satisfaction is designed, and a state-value function estimated by a Back Propagation-Deep Neural Network is defined as a matching degree between each shared autonomous electric vehicle and each delivery task. The numerical results show that: (i) a reward function focusing on income and satisfaction can increase total revenue by 33.2%, (ii) the introduction of task allocation repositioning increases total revenue by 50.0%, (iii) a re-estimated state value function leads to a 2.8% increase in total revenue, (iv) the combination of charging and task repositioning can reduce user waiting time and significantly improve user satisfaction with the trip.
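The repositioning sub-process above is cast as a maximum-flow problem solved by the Edmonds-Karp algorithm; a compact stdlib version on a dict-based residual graph (the toy capacities below are illustrative, not the paper's network) is:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow via shortest augmenting paths (Edmonds-Karp).
    cap: dict-of-dicts of residual capacities, mutated in place."""
    # ensure every edge has a reverse edge in the residual graph
    for u in list(cap):
        for v in list(cap[u]):
            cap.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:        # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                     # no augmenting path left
        v, bottleneck = t, float("inf")
        while parent[v] is not None:        # find the path bottleneck
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        v = t
        while parent[v] is not None:        # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

In the repositioning setting, the source would feed oversupplied zones, the sink would drain undersupplied ones, and edge capacities would bound how many vehicles can move between zones.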
With the rapid development of High-Speed Railways (HSR), research interest in HSR communication is growing. High-mobility communication is an area that requires significant improvement in 5G, 6G, and beyond cellular communication systems. However, research on optimizing communication pilot design specifically for HSR scenarios is still relatively immature. To validate the effectiveness of pilot allocation in HSR communication, this paper establishes a model of the train-to-ground Predictor Antenna (PA) communication system. We focus on the design of Sounding Reference Signals (SRS) in a high-speed train-to-ground Time Division Duplex (TDD) communication system that includes PAs. Our objective is to maximize the system's downlink throughput. We propose a reinforcement learning algorithm based on a multi-agent double Deep Q-Network (MADDQN), which jointly optimizes the allocation quantity of SRS for different antennas, the SRS pattern, and the transmission period in both the time and frequency domains. Simulation results indicate that our proposed algorithm jointly allocates the time-frequency resources of SRS, thus enhancing channel estimation for the PA and channel prediction for the Main Antennas (MA) in HSR scenarios, while ensuring low complexity and stability.
This paper addresses the challenges of inter-vehicle communication, taking into consideration the stochastic nature of primary user spectrum occupancy, the highly dynamic fluctuation of channel states, and the timeliness requirements for communication among vehicles. The study investigates the joint channel selection and power control resource allocation problem in cognitive Internet of Things (CIoT) under high-speed mobility, with the aim of minimizing the system’s Age of Information (AoI). The presented problem is modeled as a Markov Decision Process (MDP) and incorporates a meticulously designed reward function. Furthermore, to meet the timeliness demands, a multi-agent reinforcement learning approach is employed, with vehicles serving as intelligent agents that gather localized observational information and directly determine their transmission strategies. An improved Multi-agent Proximal Policy Optimization (IMAPPO) algorithm is proposed, which is based on a centralized training and distributed execution framework. Enhancements to the Actor network within the algorithm enable it to address the challenges presented by the discrete-continuous hybrid action space. Finally, the feasibility and effectiveness of the enhanced multi-agent proximal policy optimization algorithm are verified through simulations. The results demonstrate that compared to alternative approaches, the CIoT resource allocation scheme based on the improved multi-agent proximal policy optimization algorithm significantly reduces the AoI for vehicle users.
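The AoI metric minimized above follows a simple per-slot recursion: age grows linearly until a fresh status update is delivered, which resets it. A minimal sketch (the reset-to-1 convention, counting the new sample as already one slot old on receipt, is an assumption):

```python
def step_aoi(age, delivered):
    """One-slot Age-of-Information update: age grows by one slot unless
    a fresh update is delivered, which resets it to 1 (the new sample
    is one slot old by the time it is received)."""
    return 1 if delivered else age + 1

def average_aoi(deliveries):
    """Average AoI over a horizon, given per-slot delivery successes
    (1 = update delivered in that slot, 0 = no delivery)."""
    age, total = 0, 0
    for d in deliveries:
        age = step_aoi(age, d)
        total += age
    return total / len(deliveries)
```

The resource-allocation policy in the paper effectively chooses which vehicles get to deliver in each slot so that this average stays low across all users.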
With the rapid development of the food delivery industry, efficient battery-swapping services have become a critical factor in enhancing the delivery efficiency of delivery electric micro-mobility (DEM). However, issues such as battery-swapping queues and insufficient battery levels after battery-swapping significantly reduce drivers’ satisfaction with the service. To address the inefficiencies in DEM battery-swapping, this study integrates a discrete choice model of driver preferences with a multi-agent reinforcement learning (MARL) framework, forming an interactive decision-making system. This approach enables the optimization of battery-swapping timing and station selection while improving recommendation service satisfaction. The system employs a “centralized training, decentralized execution” approach with improved MAPPO (IMAPPO), where agents share global information during training but operate independently when executing, deciding swaps based on local states. Experiments show this IMAPPO significantly cuts rejected swap recommendations, reduces queue and detour times, and boosts order completion rates. It outperforms baseline algorithms across multiple metrics, adapting dynamically to optimize for both driver satisfaction and operational efficiency.
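The discrete choice model of driver preferences is typically a multinomial logit; a minimal sketch of station-choice probabilities (the utility inputs, which might combine queue time, detour, and expected battery level, are assumptions) is:

```python
from math import exp

def choice_probabilities(utilities):
    """Multinomial-logit choice probabilities over battery-swap
    stations: P_i = exp(V_i) / sum_j exp(V_j). The utilities V_i are
    illustrative; in the paper's model they would be calibrated from
    driver preference data."""
    m = max(utilities)                  # subtract max for stability
    w = [exp(v - m) for v in utilities]
    z = sum(w)
    return [x / z for x in w]
```

Equal utilities yield equal probabilities, and the probabilities always sum to one, which makes the model easy to embed as the acceptance term in a recommendation reward.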
In this work, a framework for detector layout optimization based on multi-agent simulation is proposed. Its main intention is to provide a decision-support team with a tool for the automatic design of social-threat detection systems for crowded public places. Consisting of a number of distributed detectors, such a system performs detection and identification of threat carriers. The generic detector model used in the framework allows consideration of various threat types, e.g. infections, explosives, drugs, and radiation. The underlying agent-based models provide data on social mobility, which is used along with a probability-based quality assessment model within the optimization process. The implemented multi-criteria optimization scheme is based on a genetic algorithm. In an experimental study, the framework was applied to obtain the optimal detector layout at Pulkovo airport.
It is hard to find the global optimum of general nonlinear and nonconvex optimization problems in a reasonable time. This article presents a method to transfer the receding horizon control approach, in which nonlinear, nonconvex optimization problems are considered, into graph-search problems. Specifically, systems with symmetries are considered in order to transfer the system dynamics into a finite-state automaton. In contrast to traditional graph-search approaches, where the search continues until the goal vertex is found, the transfer of the receding horizon control approach to graph-search problems presented in this article allows solving them in real time. We prove that the solutions are recursively feasible by restricting the graph search to end in accepting states of the underlying finite-state automaton. The approach is applied to trajectory planning for multiple networked and autonomous vehicles. We evaluate its effectiveness in simulation and in experiments in the Cyber-Physical Mobility Lab, an open-source platform for networked and autonomous vehicles. We show real-time capable trajectory planning with collision avoidance in experiments on off-the-shelf hardware, with code in MATLAB, for two vehicles.
Area traffic signal control is important for alleviating urban traffic congestion. In this paper, we propose an improved multi-agent proximal policy optimization (MAPPO) algorithm that combines an intrinsic curiosity module with proximal policy optimization to control area traffic signals. In the proposed algorithm, a multi-intersection traffic network is modeled as a multi-agent system and each agent is trained to search for the optimal strategy. We validate our algorithm's performance on the Simulation of Urban MObility (SUMO) platform. Experimental results show that the proposed algorithm can effectively reduce queue lengths and waiting times. Moreover, the performance of our algorithm is superior to MAPPO and fixed-time control.
No abstract available
No abstract available
Urban traffic congestion remains a pressing challenge as cities expand and traffic patterns become increasingly dynamic. This paper presents a decentralized traffic signal control architecture powered by Multi-Agent Deep Deterministic Policy Gradient (DDPG) reinforcement learning, tightly integrated with the Simulation of Urban Mobility (SUMO) platform. In this framework, each traffic intersection is treated as an independent learning agent that optimizes signal phase decisions based on real-time local observations, such as queue lengths and vehicle waiting times. The agents interact with the traffic environment via the TraCI API, enabling continuous learning without requiring global state information. To enhance decision quality, temporal features are extracted from traffic flow data during training. The proposed system is evaluated across multiple synthetic traffic scenarios with varying densities and network topologies. Experimental results show substantial improvements in key performance metrics, including a 17.6% reduction in Average Waiting Time (AWT), reduced average vehicle delay, shorter queue lengths, and increased intersection throughput, demonstrating the framework's scalability and adaptability.
No abstract available
Traffic congestion has increased significantly in today’s rapidly urbanizing world, influencing people’s daily lives. Traffic signal control systems (TSCSs) play an important role in alleviating congestion by optimizing traffic light timings and improving road efficiency. Yet traditional TSCSs neglect pedestrians, cyclists, and other non-monitored road users, degrading traffic signal optimization (TSO). Therefore, this framework proposes a multi-object-based traffic flow analysis and intensity estimation model for efficient TSO using Upper Confidence Bound Multi-agent Reinforcement Learning Cubic Spline Fuzzy Logic (UCB-MRL-CSFL). Initially, the real-time traffic videos undergo frame conversion and redundant frame removal, followed by preprocessing. Then, the lanes are detected; further, the objects are detected using Temporal Context You Only Look Once (TC-YOLO). Next, the object counting in each lane is carried out using the Cumulative Vehicle Motion Kalman Filter (CVMKF), followed by queue detection using Vehicle Density Mapping (VDM). The traffic flow is then analyzed by Feature Variant Optical Flow (FVOF), followed by traffic intensity estimation. Emergency vehicles are separated based on their siren flashlight colors. Lastly, UCB-MRL-CSFL optimizes the Traffic Signals (TSs) based on the separated emergency vehicles, pedestrian information, and traffic intensity. The proposed framework therefore outperforms conventional TSO methodologies by considering pedestrians, cyclists, and other road users, with higher computational efficiency (94.45%).
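The UCB component of UCB-MRL-CSFL is not detailed in the abstract; the classic UCB1 rule it presumably builds on selects actions by empirical value plus an exploration bonus (the exploration constant here is an assumption):

```python
from math import log, sqrt

def ucb_action(values, counts, t, c=2.0):
    """UCB1 action selection: pick the action maximizing empirical
    value plus an exploration bonus c * sqrt(ln t / n_a).
    Untried actions are selected first."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [values[a] + c * sqrt(log(t) / counts[a])
              for a in range(len(values))]
    return max(range(len(values)), key=scores.__getitem__)
```

The bonus shrinks as an action's count grows, so rarely tried signal plans keep being revisited until their value estimates are trustworthy.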
Multi-agent Reinforcement Learning (MARL) has shown considerable promise in enhancing the efficiency of adaptive traffic signal control (ATSC) systems. However, existing MARL approaches primarily focus on optimizing overall traffic flow, often overlooking the issue of fairness in vehicle waiting times. Considering that there is no need to strive for ultimate fairness, this paper models the ATSC problem as a Constrained Partially Observable Markov Game (CPOMG), where fairness is modeled as a constraint on the maximum waiting time of vehicles on the lanes of intersections instead of a reward term to be maximized. CPOMG aims to find a cooperative control policy with optimal traffic efficiency within the constrained solution space spanned by multiple agents. On this basis, this paper proposes a new centralized training and decentralized execution cooperative MARL method, i.e., vehicle-level fairness multi-agent proximal policy optimization (VF-MAPPO). VF-MAPPO leverages a centrally trained global Critic Network to estimate average vehicle traffic efficiency and maximum vehicle waiting time, and an Actor Network shared by all intersections for decentralized execution; it converts the constrained optimization problem into an unconstrained objective through the Lagrange multiplier method and adopts proximal policy optimization during training. Additionally, VF-MAPPO incorporates spatial-temporal graph attention in the Critic network to efficiently extract state representations in multi-intersection environments. We qualitatively analyze the monotonic improvement guarantee of VF-MAPPO. Extensive experimental validation across two real-world scenarios and one synthetic scenario substantiates that VF-MAPPO enhances vehicle-level fairness and maintains average traffic efficiency, surpassing state-of-the-art methods.
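The Lagrange multiplier method mentioned above converts the waiting-time constraint into a penalty whose weight is learned by dual ascent; a schematic of the two coupled updates (learning rate and limits are assumptions) is:

```python
def dual_update(lmbda, max_wait, wait_limit, lr=0.01):
    """Projected gradient-ascent step on the Lagrange multiplier for
    the max-waiting-time constraint: lambda rises while the constraint
    is violated and decays toward zero once it is satisfied."""
    return max(0.0, lmbda + lr * (max_wait - wait_limit))

def lagrangian_reward(reward, lmbda, max_wait, wait_limit):
    """Unconstrained objective optimized in place of the constrained
    one: base efficiency reward minus the weighted constraint
    violation."""
    return reward - lmbda * max(0.0, max_wait - wait_limit)
```

Alternating these updates with policy optimization drives the policy toward the best efficiency achievable without exceeding the fairness limit on waiting time.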
Adaptive traffic signal control techniques have been developed in numerous studies to increase traffic flow efficiency. Using traffic signals to build an adaptive traffic management system is an ideal way to reduce traffic congestion. Reinforcement learning covers a family of current approaches that learn a policy through trial and error, maximizing reward through the learning agent’s interaction with its environment. We propose a traffic signal control architecture for an oversaturated urban network using a Deep Q-Network, and we enhance the learning process by incorporating diverse state information from detailed upstream and downstream traffic states. We conduct experiments on Simulation of Urban MObility (SUMO), an open-source traffic simulator that supports large-scale traffic signal control.
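The one-step Deep Q-Network target such an architecture trains on can be sketched as follows; `dqn_target` and the toy tabular update standing in for the network are illustrative, not the paper’s model.

```python
def dqn_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target for Deep Q-learning: r + gamma * max_a' Q(s', a'),
    with bootstrapping cut off at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Toy update of a tabular stand-in for the Q-network: the reward is the
# (negative) delay observed after switching phase 0 in state "s0".
q = {("s0", 0): 0.0}
target = dqn_target(reward=-3.0, next_q_values=[1.0, 2.0], done=False)
alpha = 0.1  # learning rate
q[("s0", 0)] += alpha * (target - q[("s0", 0)])
```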
Free-flow road networks, such as suburban highways, are increasingly experiencing traffic congestion due to growing commuter inflow and limited infrastructure. Traditional control mechanisms—traffic signals or local heuristics—are ineffective or infeasible in these high-speed, signal-free environments. We introduce self-regulating cars, a reinforcement learning-based traffic control protocol that dynamically modulates vehicle speeds to optimize throughput and prevent congestion, without requiring new physical infrastructure. Our approach integrates classical traffic flow theory, gap acceptance models, and microscopic simulation into a physics-informed RL framework. By abstracting roads into super-segments, the agent captures emergent flow dynamics and learns robust speed modulation policies from instantaneous traffic observations. Evaluated in the high-fidelity PTV Vissim simulator on a real-world highway network, our method improves total throughput by 5%, reduces average delay by 13%, and decreases total stops by 3% compared to the no-control setting. It also achieves smoother, congestion-resistant flow while generalizing across varied traffic patterns—demonstrating its potential for scalable, ML-driven traffic management.
For the traffic network signal control scenario, this paper proposes an improved multi-agent data-driven distributed adaptive coordination control algorithm (I-MA-DACC), which outputs a distributed adaptive green time at each intersection for the purpose of dynamic queue balancing. The queuing lengths obtained from front-end information collection devices and the green times of the current cycle serve as the system input and output data, respectively. Queue balancing of multi-directional traffic flow on signal-controlled traffic networks is considered within a multi-agent system framework, making I-MA-DACC an improved multi-agent data-driven control strategy. Firstly, parameter and control learning laws are deployed at each intersection, enhancing scalability and adaptivity. Secondly, the algorithm applies to different congestion scenarios in large-scale traffic networks. Finally, it is verified on the open-source SUMO-Python simulation platform. Compared with other distributed adaptive cooperative control methods, the simulation results show the advantage of I-MA-DACC in reducing queuing delays, waiting counts, and time loss.
No abstract available
This paper describes the implementation of the fully traffic-adaptive signal control algorithm by Lämmer in the agent-based transport simulation MATSim. The implementation is tested on an illustrative single-intersection scenario and compared to the results of Lämmer’s MATLAB simulation. The plausibility of the self-controlled signals and the overall results can be confirmed; small deviations are explained by differences in flow simulation and in the resolution of simulation time steps. In the simulation of the illustrative intersection, the adaptive control proves stable and, overall, superior to a fixed-time control. Constant vehicle arrivals are simulated to show the performance of the control and its underlying sub-strategies, and the expected behavior of the algorithm and its implementation is validated by analyzing queue lengths over time. The adaptive control significantly outperforms the fixed-time control under stochastic demand, where its ability to react dynamically to changes in flow becomes important.
Urban traffic management is highly complex, and inefficient control strategies often worsen congestion and increase energy consumption. This paper introduces a collaborative multi-agent reinforcement learning method tailored for sparse control scenarios, IKS-SAC (Improved Knowledge Sharing Soft Actor–Critic), which enhances coordination between traffic signals to optimize traffic flow. IKS-SAC incorporates a communication protocol for knowledge sharing among agents, enabling each agent to access and utilize traffic environment data collected by other agents; this effectively addresses the challenge of data processing under asynchronous updates and achieves a comprehensive understanding of the traffic environment within a sparse control framework. Validation on synthetic data demonstrates that IKS-SAC exhibits superior adaptability and efficiency in managing traffic flow fluctuations and uncertainties, significantly outperforming existing reinforcement learning-based and traditional traffic control methods. The proposed method demonstrates significant advantages in reducing traffic congestion, lowering energy consumption, and mitigating environmental pollution.
Improving traffic performance in dense mixed-traffic scenarios, such as bottlenecks, presents significant challenges due to complex interactions and the unpredictable behaviors of human drivers. These challenges are compounded by varying human driving styles and different proportions of Connected and Automated Vehicles (CAVs) within the traffic flow. Our research focuses on developing cooperative control strategies for CAVs to enhance generalization across diverse traffic scenarios. To address these challenges, we introduce an Interaction-Aware Hierarchical Representation (IAHR) module, integrated into a Multi-Agent Reinforcement Learning (MARL) framework. The IAHR module hierarchically processes interactions between CAVs and Human-Driven Vehicles (HDVs), effectively extracting essential features to facilitate generalization across various traffic scenarios. Additionally, we design an effective reward function that balances individual interests with overall traffic performance, guiding CAVs to improve their driving efficiency and safety while also enhancing overall traffic flow. The model is rigorously trained and zero-shot evaluated in various bottleneck scenarios. Results demonstrate the model's capability to significantly improve traffic performance under dense conditions and generalize across different CAV penetration rates, vehicle numbers, and HDV driving style distributions.
Traffic signal control (TSC) has seen substantial advancements through the application of reinforcement learning (RL) algorithms, which have shown remarkable potential in enhancing traffic flow efficiency. These RL-based approaches often surpass traditional rule-based methods, particularly in dynamic traffic environments. However, current RL solutions for TSC predominantly rely on model-free methods, necessitating extensive environmental interactions during training. This requirement can be prohibitively expensive or unfeasible in real-world implementations. Furthermore, existing methods have frequently neglected the issue of fairness in multi-intersection control, resulting in unbalanced congestion across different intersections. To address these challenges, we present FM2Light, a fairness-aware model-based multi-agent RL framework for TSC. Our approach leverages an ensemble of global world models for generating synthetic samples to enhance sample efficiency, thereby mitigating the data-intensive nature of the training process. Additionally, FM2Light incorporates a refined reward structure to promote fairness and improve coordination across multiple intersections. Extensive evaluations conducted in diverse real-world scenarios demonstrate that FM2Light achieves performance comparable to or exceeding that of model-free RL (MFRL) methods, while significantly reducing sample requirements and ensuring more equitable control among multiple agents.
Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commutes, traffic signal control poses unique challenges for reinforcement learning in terms of adapting to dynamic traffic environments and coordinating thousands of agents, including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate large numbers of data samples for learning. The most commonly used open-source traffic simulator, SUMO, is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow supports flexible definitions of road networks and traffic flows based on synthetic and real-world data, and it provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and can support city-wide traffic simulation with an interactive renderer for monitoring. Beyond traffic signal control, CityFlow can serve as a base for other transportation studies and create new possibilities for testing machine learning methods in the intelligent transportation domain.
No abstract available
Traditional automated monitoring systems adopted for Intersection Traffic Control still face challenges, including high costs, maintenance difficulties, insufficient coverage, poor multimodal data integration, and limited traffic information analysis. To address these issues, the study proposes a sovereign AI-driven Smart Transportation governance approach, developing a mobile AI solution equipped with multimodal perception, task decomposition, memory, reasoning, and multi-agent collaboration capabilities. The proposed system integrates computer vision, multi-object tracking, natural language processing, Retrieval-Augmented Generation (RAG), and Large Language Models (LLMs) to construct a Pipeline-based Traffic Analysis System (PTAS). The PTAS can produce real-time statistics on pedestrian and vehicle flows at intersections, incorporating potential risk factors such as traffic accidents, construction activities, and weather conditions for multimodal data fusion analysis, thereby providing forward-looking traffic insights. Experimental results demonstrate that the enhanced DuCRG-YOLOv11n pre-trained model, equipped with our proposed new activation function βsilu, can accurately identify various vehicle types in object detection, achieving a frame rate of 68.25 FPS and a precision of 91.4%. Combined with ByteTrack, it can track over 90% of vehicles in medium- to low-density traffic scenarios, achieving a MOTA of 0.719 and a MOTP of 0.08735. In traffic flow analysis, the RAG of Vertex AI, combined with Claude Sonnet 4 LLMs, provides a more comprehensive view, precisely interpreting the causes of peak-hour congestion and effectively compensating for missing data through contextual explanations. The proposed method can enhance the efficiency of urban traffic regulation and optimize decision support in intelligent transportation systems.
With the development of artificial intelligence and autonomous driving technology, vehicle-road cooperative control systems combined with artificial intelligence can provide more effective and adaptive traffic control solutions for intelligent transportation systems. Existing research works are confronted with two kinds of challenges. For one thing, traditional recurrent neural network-based methods cannot model the long-term dependencies in traffic flow sequences. For another, large sample correlation makes it difficult to optimize the trained strategies. In this paper, we propose a Multi-agent Deep Reinforcement Learning (MADRL)-based intelligent vehicle cooperative control method to remedy these gaps. To this end, a closed-loop control system of self-driving vehicles and signal controllers is taken as the research object to achieve dynamic traffic-flow scheduling via MADRL. Relevant validation experiments verify the feasibility of the method in terms of both scheme comparison and operational-effect analysis, providing a useful aid to traffic signal timing. The simulation results show that the proposed method effectively realizes collaborative control of smart vehicles and achieves a measurable performance improvement over several typical methods.
No abstract available
This paper proposes a parallel system-based predictive control (PPC) method to address the problem of active traffic signal control in large-scale urban road networks. The method leverages simulated artificial transportation systems to infer the short-term future operating states of the real transportation system. During the inference process, an efficient predictive learning-based multi-agent reinforcement learning (RL) algorithm is employed to optimize the cooperative control policies. The optimized policies are then deployed to the real transportation system at fixed intervals to adapt to the real-time and dynamic traffic flow. Experimental results demonstrate that PPC outperforms traditional traffic control methods and some multi-agent RL benchmarks in large-scale road network control scenarios with nearly two hundred intersections, showcasing superior generalization capabilities.
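The receding-horizon deployment loop of PPC (simulate short-term futures in an artificial system, optimize the control, redeploy at fixed intervals) can be caricatured with a toy single-queue model. Everything here (`predicted_queue`, `plan_horizon`, exhaustive search in place of predictive-learning MARL) is a simplified assumption, not the paper’s method.

```python
from itertools import product

def predicted_queue(plan, inflow, service):
    """Toy surrogate of the artificial system: roll a single queue forward
    under a binary green/red plan and return the final queue length."""
    q = 0.0
    for green in plan:
        q = max(0.0, q + inflow - (service if green else 0.0))
    return q

def plan_horizon(horizon, inflow, service):
    """Exhaustively search signal plans over the horizon and keep the one
    the simulated system predicts will best clear the queue (a stand-in
    for the optimized cooperative control policy)."""
    return min(product([0, 1], repeat=horizon),
               key=lambda plan: predicted_queue(plan, inflow, service))

# With service capacity above inflow, some plan empties the queue entirely.
plan = plan_horizon(horizon=3, inflow=2.0, service=3.0)
```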
As Connected and Autonomous Vehicles (CAVs) evolve, Autonomous Intersection Management (AIM) systems are emerging to enable safe, efficient traffic flow at urban intersections without traffic signals. However, existing AIM systems, whether based on traditional optimization control methods or machine learning, suffer from low computational efficiency and a lack of robustness in ensuring safety, respectively. To overcome these limitations, we propose an innovative AIM scheme rooted in Safe Multi-Agent Deep Reinforcement Learning (MADRL). We initially model the safe MADRL problem as a constrained Markov game (CMG) and tackle it with our multi-agent projective constrained policy optimization (MAPCPO). This method first optimizes policy updates within the Kullback-Leibler divergence trust region to maximize performance, and then projects these optimized policies onto the bounds of risk constraints, thus ensuring safety. Building on this, we introduce a Risk-Bounded RL for Autonomous Intersection Management (RbRL-AIM) algorithm. This algorithm adopts an architecture that consists of an LSTM-based policy neural network, a reward value network, and a risk neural network. These components, through the MAPCPO policy, enable continuous learning from complex and random intersection traffic environments, thereby facilitating the safe, efficient, and smooth control of vehicles at intersections. Our method is validated in a CARLA simulation, showing significant gains in computational and traffic efficiency over baseline optimization control methods. Compared to non-safety-aware MADRL methods, our approach achieves zero collisions and improved ride comfort.
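The projection stage of MAPCPO (optimize within the trust region, then project onto the risk-constraint bound) can be illustrated for a single linearized scalar cost. `project_to_constraint` is a hypothetical simplification: it projects in the Euclidean metric, whereas the actual method works in the trust-region geometry.

```python
def project_to_constraint(theta, grad_c, c_value, c_limit):
    """Project a policy-parameter vector onto the half-space where the
    linearized risk cost stays within its bound -- the second stage of a
    trust-region-then-project update (simplified, scalar-cost sketch).

    Linearization: c(theta') ~= c_value + grad_c . (theta' - theta) <= c_limit."""
    slack = c_limit - c_value
    if slack >= 0:                     # already feasible: keep the trust-region step
        return theta[:]
    norm_sq = sum(g * g for g in grad_c)
    step = slack / norm_sq             # negative: move against the cost gradient
    return [t + step * g for t, g in zip(theta, grad_c)]

# Cost 1.5 exceeds the bound 1.0, so the first parameter is pulled back
# until the linearized cost sits exactly on the constraint boundary.
theta = project_to_constraint(theta=[1.0, 2.0], grad_c=[1.0, 0.0],
                              c_value=1.5, c_limit=1.0)
```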
No abstract available
Reinforcement learning can be applied to signal control to achieve efficient control while reducing the cost of manual configuration. However, because reinforcement learning acquires control that is specific to the environment seen during learning, conventional methods implicitly assume that the optimal control is the same at learning time and at application time, and that it remains consistent despite changes in traffic flow. In this study, we address this problem using multi-objective reinforcement learning, in which multiple policies are obtained by varying the objective weights. We propose a method that exhaustively obtains control laws for various traffic flow ratios and switches between them at application time, using the traffic flow ratios as weights and the reward for each road as the reward for each objective. The superiority of the proposed method is verified in two computer experiments, one with a single agent and the other with multiple agents.
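The weight-scalarization and policy-switching scheme described above admits a compact sketch; `scalarize` and `pick_policy` are hypothetical names, and nearest-weight matching is an assumed switching rule.

```python
def scalarize(rewards, weights):
    """Weighted sum of per-road rewards; each weight vector yields one
    policy in the multi-objective scheme."""
    return sum(w * r for w, r in zip(weights, rewards))

def pick_policy(policies, flow_ratio):
    """At application time, switch to the policy whose training weight
    vector is closest (squared distance) to the observed flow ratio."""
    return min(policies, key=lambda w: sum((a - b) ** 2
                                           for a, b in zip(w, flow_ratio)))

# Three policies trained under different road-weightings; the observed
# flow ratio (0.3, 0.7) selects the closest one.
policies = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
w = pick_policy(policies, flow_ratio=(0.3, 0.7))
```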
No abstract available
Traditional traffic signal control systems usually rely on fixed or dynamically preset timing schemes, which respond poorly to real-time changes in traffic flow and to the demands of different vehicle priorities. Existing control strategies struggle to ensure fast passage for high-priority vehicles while still providing sufficient passage guarantees for normal vehicles. In this paper, we propose a deep reinforcement learning traffic signal control method for multi-priority vehicles. Firstly, we improve DQN with a dueling architecture. Secondly, we assign different weights to vehicles of different priorities and redesign the agent's state information and reward function. In simulation experiments, we compare four similar traffic signal control methods, and the results demonstrate the effectiveness of the proposed method.
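The dueling aggregation adopted above combines a state value with per-action advantages under a mean-advantage baseline; a minimal sketch with a hypothetical `dueling_q` and toy numbers:

```python
def dueling_q(value, advantages):
    """Combine state value V(s) and per-action advantages A(s, a) into
    Q-values with the mean-advantage baseline of the dueling architecture:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# The mean advantage is zero here, so Q is just V shifted by each A(s, a).
q = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
```

Subtracting the mean advantage removes the ambiguity in splitting Q between the value and advantage streams, which is what stabilizes training in the dueling design.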
Traffic Signal Control using Reinforcement Learning has proven potential for alleviating traffic congestion in urban areas. Although research has been conducted in this field, finding an effective yet low-cost solution remains an open challenge. This paper presents multiple deep reinforcement learning-based traffic signal control systems that help regulate the flow of traffic at intersections, and it compares their results. The proposed systems are coupled with SUMO (Simulation of Urban MObility), an agent-based simulator that provides a realistic environment for exploring the outcomes of the models.
No abstract available
Communication improves the efficiency and convergence of multi-agent learning. Existing studies of agent communication have been limited to predefined, fixed connections. While attention mechanisms are useful for scheduling communication between agents, they largely ignore the dynamic nature of communication and thus the correlation between agents' connections. In this work, we adopt a normalizing flow to encode correlation between agents' interactions. The dynamic communication topology is learned directly by maximizing agent rewards. In our end-to-end formulation, the communication structure is treated as a hidden dynamical variable. We realize centralized training of critics and a graph-reasoning policy, with decentralized execution from local observations and messages received through the learned dynamic communication topology. Experiments on cooperative navigation in the particle world and on adaptive traffic control tasks demonstrate the effectiveness of our method.
No abstract available
No abstract available
This paper presents a new distributed data-driven adaptive cooperative control method (DDACC) for urban traffic signal timing that achieves multi-directional queuing-length balance with a changeable cycle across multiple intersections. The method guarantees consensus convergence of the distributed coordinated queuing-length errors, with the goal of reducing congestion in multi-agent traffic systems. The proposed DDACC has three novel features: it uses only the collected I/O traffic queuing-length data and the network topology of the multi-directional signal controllers at the intersections; it respects maximum and minimum green-time constraints; and it works well under both undersaturated and supersaturated traffic flow conditions. The results are illustrated by numerical and experimental comparison simulations performed on a VISSIM-VB-MATLAB joint simulation platform.
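The consensus convergence of distributed queuing-length errors that DDACC guarantees can be illustrated with a plain distributed-averaging step on a three-intersection line; `consensus_step`, the topology, and the gain `eps` are illustrative assumptions, not the paper's control law.

```python
def consensus_step(queues, neighbors, eps=0.2):
    """One distributed averaging step: each intersection nudges its queue
    estimate toward its neighbors', driving the coordinated errors to zero.

    neighbors[i]: indices of intersections that intersection i hears from."""
    return [q + eps * sum(queues[j] - q for j in neighbors[i])
            for i, q in enumerate(queues)]

# Line topology 0 - 1 - 2; the symmetric updates preserve the average (5.0),
# and repeated iteration balances the queue estimates across intersections.
queues = [10.0, 4.0, 1.0]
for _ in range(200):
    queues = consensus_step(queues, neighbors=[[1], [0, 2], [1]])
```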
In modern cities, intersections are vital pieces of road infrastructure, but they can also snarl traffic owing to accidents or a lack of traffic coordination systems such as traffic signals. As traffic ecosystems become more connected and autonomous, researchers have grown increasingly interested in autonomous intersection management (AIM), which improves traffic flow by coordinating the motion of connected vehicles through intersections. This study proposes an AIM system with a two-layer hierarchical control architecture: a vehicle-controller layer using Model Predictive Control (MPC), tuned to keep each vehicle on its designated trajectory while delivering a smooth acceleration profile for passenger comfort, and an intersection-manager layer using Multi-Agent Reinforcement Learning with a centralized-training, decentralized-execution scheme to find the optimal strategy for preventing collisions by assigning a trajectory to each vehicle entering the intersection. A longitudinal vehicle dynamics model mimics the motion of vehicles in response to the controller’s commands. The proposed AIM system noticeably improved throughput and boosted efficiency without sacrificing safety or passenger comfort.
The mobility experience of senior drivers is affected by varying factors such as travel time, surrounding vehicles’ speed, hectic traffic, and the availability of parking spaces. The efficiency of traffic flow in urban areas depends on an optimal signal plan. Several techniques have been used to optimise traffic signal control plans, such as artificial neural networks, mixed-integer linear programming, and metaheuristics; however, they do not consider senior drivers’ special needs and characteristics. We propose a collaborative multi-agent adaptive signal control model that offers an improved driving experience to senior drivers. The results show an improvement in delay time, waiting time, and fuel consumption for all drivers, and in the driving experience of senior drivers, when compared to state-of-the-art approaches.
No abstract available
No abstract available
It is recognized that the control of mixed-autonomy platoons comprising connected and automated vehicles (CAVs) and human-driven vehicles (HDVs) can enhance traffic flow. Among existing methods, Multi-Agent Reinforcement Learning (MARL) appears to be a promising control strategy because it can manage complex scenarios in real time. However, current research on MARL-based mixed-autonomy platoon control suffers from several limitations. First, existing MARL approaches address safety by penalizing safety violations in the reward function, thus lacking theoretical safety guarantees due to the limited interpretability of RL. Second, few studies have explored the cooperative safety of multi-CAV platoons, where CAVs can be coordinated to further enhance the system-level safety involving the safety of both CAVs and HDVs. Third, existing work tends to make an unrealistic assumption that the behavior of HDVs and CAVs is publicly known and rational. To bridge the research gaps, we propose a safe MARL framework for mixed-autonomy platoons. Specifically, this framework 1) characterizes cooperative safety by designing a cooperative Control Barrier Function (CBF), enabling CAVs to collaboratively improve the safety of the entire platoon, 2) provides a safety guarantee to the MARL-based controller by integrating the CBF-based safety constraints into MARL through a differentiable quadratic programming (QP) layer, and 3) incorporates a conformal prediction module that enables each CAV to estimate the unknown behaviors of the surrounding vehicles with uncertainty qualification. Simulation results show that our proposed control strategy can effectively enhance the system-level safety through CAV cooperation of a mixed-autonomy platoon with a minimal impact on control performance.
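A discrete-time control barrier function check of the kind integrated above can be sketched for simple car following. `cbf_safe`, its dynamics, and all constants are hypothetical; the paper’s cooperative CBF and differentiable QP layer are considerably richer.

```python
def cbf_safe(gap, rel_speed, accel, dt=0.1, alpha=0.5, min_gap=2.0):
    """Discrete-time control-barrier-function check for car following.

    Barrier h = gap - min_gap must satisfy h(k+1) >= (1 - alpha) * h(k),
    so the headway may shrink but can never jump past the safety margin.
    gap: headway to the leader [m]; rel_speed: leader speed minus ego
    speed [m/s]; accel: candidate ego acceleration [m/s^2]."""
    h = gap - min_gap
    next_gap = gap + dt * rel_speed - 0.5 * accel * dt ** 2
    h_next = next_gap - min_gap
    return h_next >= (1.0 - alpha) * h

ok = cbf_safe(gap=10.0, rel_speed=-1.0, accel=0.0)   # mild closing speed: fine
bad = cbf_safe(gap=2.5, rel_speed=-5.0, accel=2.0)   # about to breach the margin
```

In the QP-layer formulation, inequalities of this form become linear constraints on the acceleration, and the layer picks the feasible control closest to the RL action.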
Cooperative driving of connected and automated vehicles (CAVs) is envisioned as a promising approach to improving fuel efficiency, safety, and traffic flow. However, achieving robust and efficient control in heterogeneous CAV platoons remains challenging, especially under uncertain dynamics and external disturbances. Most existing reinforcement learning (RL)-based platoon controllers either ignore model uncertainties or rely on centralized training, limiting their scalability and robustness in real-world applications. To address these limitations, this study proposes a fully distributed, model-free RL framework integrated with a robust compensator for optimal platoon control of heterogeneous vehicles with unknown dynamics. The RL agent simultaneously learns the optimal control policy and estimates control-relevant dynamic parameters using only local input-output data, without requiring explicit vehicle models. These estimates are then used in real time to construct a disturbance-rejection input that ensures robust trajectory tracking. A distributed observer based on consensus theory is embedded within the hybrid controller to estimate leader-relative reference trajectories using only local neighbor information, eliminating the need for global communication. Theoretical analysis guarantees policy convergence and bounded tracking errors under dynamic uncertainties. The proposed method is experimentally validated using the high-fidelity Mixed Traffic Simulation (MiTaS) platform, combining the SUMO microscopic traffic simulator with MATLAB, demonstrating improved tracking, damping of traffic oscillations, and up to 15.8% fuel savings compared to recent RL-based methods.
Large language models (LLMs) are promising for autonomous driving decision-making, but existing methods mostly rely on cloud-side deployment, causing high decision latency, privacy concerns, and a lack of explicit safety verification for generated actions. To address these challenges, we propose SEDM (safety-enhanced decision-making framework) for highway driving scenarios. SEDM comprises an environment encoding module, an edge-side LLM-based decision-making module enhanced through chain-of-thought prompting and low-rank adaptation (LoRA) fine-tuning, and an XGBoost-based safety shield module that filters unsafe actions generated by the LLM. Experiments show that SEDM achieves driving success rates of 95%, 82%, and 55% under simple, normal, and dense traffic conditions, respectively, substantially outperforming baselines such as deep Q-network and proximal policy optimization. Moreover, it yields a 17-percentage-point improvement in success rate over an ablated variant without the safety shield module. Furthermore, decision latency is reduced from 7.80 s (cloud-side LLM) to 1.01 s.
The emergency resource dispatch environment for public health emergencies is characterised by high uncertainty, dynamic nature, and multi-agent coordination complexity. Traditional models face significant challenges in dynamic response and knowledge integration. To overcome these limitations, this paper proposes an innovative theoretical framework: the AI-Agent-based Emergency Resource Dynamic Dispatch Model (AI-Agent-ERDM). By integrating multimodal data fusion, the symbolic reasoning capabilities of large language models, and the decision-optimization capabilities of reinforcement learning, this model constructs an intelligent decision-making architecture comprising three core modules: Perception, Brain, and Action. The paper formally specifies the model's multi-agent Markov decision process, detailing environment state construction via a multimodal knowledge base, LLM-driven agent decision mechanisms, and multi-role decision-making (encompassing government command, suppliers, transport, warehousing, and arbitrators). Through theoretical comparative analysis, it demonstrates the model's potential advantages over system dynamics, traditional operations research, and classical reinforcement learning approaches in addressing dynamic uncertainty, achieving interpretable decisions, and enabling efficient coordination. This paper aims to provide a novel and promising theoretical framework and technical methodology for the field of intelligent emergency resource dispatch.
This paper presents a novel framework, Large Language Model-Augmented Reinforcement Learning (LLM-RL), to achieve adaptive and intelligent decision-making in dynamic environments, leveraging the semantic reasoning and generalization capabilities of Large Language Models (LLMs) for vision-aided autonomous driving tasks. In contrast to traditional reinforcement learning algorithms, which rely purely on trial-and-error exploration, the proposed approach folds the LLM's semantic understanding and contextual generation capacity into the exploration used for policy optimization. The LLM offers high-level action priors, interpretive state representations, and natural-language-guided reward shaping, reducing sample inefficiency and speeding convergence. Moreover, LLM-based meta-prompting supports adaptation to previously unseen tasks without retraining. The framework was tested on multiple benchmarks spanning multi-agent control, resource allocation, and sequential decision-making. Experimental results show an average improvement of 21.4% in cumulative reward, 18.7% faster convergence, and a 25.2% reduction in catastrophic exploration errors over state-of-the-art RL baselines. These results demonstrate the potential of LLM-RL to enable a new paradigm of trustworthy, scalable, and adaptive decision-making in complex systems.
No abstract available
Earthquakes are currently among the most serious natural disasters, and each occurrence gravely threatens life and property. An intelligent emergency decision-making system based on the LangChain framework can integrate large-model agents with emergency decision-making, giving emergency management personnel rapid access to data, rapid analysis, and rapid formulation of emergency rescue plans. The system uses the framework's intelligent agent module, chat model, and text embedding model to provide intelligent question answering and to formulate emergency rescue plans. It effectively alleviates the lack of information and data faced in the early stages of earthquake disaster rescue, providing substantial assistance for emergency management and rescue work.
No abstract available
The Internet of Drone Things (IoDT) advances autonomous drone operations by integrating live sensor inputs with environmental and situational awareness and intelligent decision-making capabilities. The full capabilities of IoDT remain limited by the difficulties of dynamic task assignment and path optimization, along with adaptive decision-making, when operating in complex environments such as disaster relief and smart agriculture. Traditional task-scheduling techniques have difficulty adapting to real-time changes caused by dynamic constraints such as weather variations, battery limitations, and drone malfunctions. We present an LLM-based task scheduling framework that uses Large Language Models (LLMs) to improve task prioritization performance and path planning accuracy while minimizing operational failures. We combine heuristic algorithms (A*, Dijkstra) with decision-making processes driven by LLMs to allow drones to adapt to environmental changes while optimizing efficiency and resource consumption. Integrating LLM technology into IoDT operations results in up to 95% task completion rates and improves the scenario completion time by up to 42%, while adding reasonable computational overhead. Our framework demonstrates improved task adaptability, battery efficiency, and stronger system resilience against non-LLM baselines during disaster relief and package delivery operations. Our research shows that LLM-based IoDT task management has transformative potential, leading to the development of more innovative and autonomous drone ecosystems.
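The Dijkstra component named above can be sketched over a toy waypoint graph. The hypothetical `dijkstra` below runs on a cost-weighted adjacency list; the LLM-driven task prioritization layered on top of it in the paper is out of scope here.

```python
import heapq

def dijkstra(graph, src, dst):
    """Shortest path by cost (e.g. energy or flight time) over a waypoint
    graph; graph[u] is a list of (v, cost) edges."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst                   # walk predecessors back to src
    while node != src:
        path.append(node)
        node = prev[node]
    return [src] + path[::-1], dist[dst]

# Detouring through waypoint "a" (cost 3.0) beats the direct edge (cost 5.0).
graph = {"base": [("a", 2.0), ("b", 5.0)], "a": [("b", 1.0)], "b": []}
route, cost = dijkstra(graph, "base", "b")
```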
Intelligent Transportation Systems (ITSs) aim to improve mobility and reduce congestion, yet current solutions still struggle with scalability, sensing bottlenecks, and inefficient computational resource usage. These limitations impede the shift towards environmentally responsible mobility. This work introduces ORQCIAM (Orchestrated Reasoning based on Quantum Computing and Intelligence for Advanced Mobility), a modular framework that combines Quantum Computing (QC) and Large Language Models (LLMs) to enable real-time, energy-aware decision-making in ITSs. Unlike conventional ITS or AI-based approaches that focus primarily on traffic performance, ORQCIAM explicitly incorporates sustainability as a design objective, targeting reductions in travel time, fuel or energy consumption, and CO2 emissions. The framework unifies cognitive, virtual, and federated sensing to enhance data reliability, while a hybrid decision layer dynamically orchestrates QC–LLM interactions to minimize computational overhead. Scenario-based evaluation demonstrates faster incident screening, more efficient routing, and measurable sustainability benefits. Across tested scenarios, ORQCIAM achieved 9–18% reductions in travel time, 6–14% lower estimated CO2 emissions, and around a 50–75% decrease in quantum-optimization calls by concealing QC activation during non-critical events. These results confirm that dynamic QC–LLM coordination effectively decreases computational overhead while supporting greener and more adaptive mobility patterns. Overall, ORQCIAM illustrates how hybrid QC–LLM architectures can serve as catalysts for efficient, low-carbon, and resilient transportation systems aligned with sustainable smart-city goals.
The low-altitude economy (LAE), encompassing urban air mobility, drone logistics and sub 3000 m aerial surveillance, demands secure, intelligent infrastructures to manage increasingly complex, multi-stakeholder operations. This survey evaluates the integration of Internet of Things (IoT) networks, artificial intelligence (AI) decision-making and blockchain trust mechanisms as foundational enablers for next-generation LAE ecosystems. IoT sensor arrays deployed at ground stations, unmanned aerial vehicles (UAVs) and vertiports form a real-time data fabric that records variables from air traffic density to environmental parameters. These continuous data streams empower AI models ranging from predictive analytics and computer vision (CV) to multi-agent reinforcement learning (MARL) and large language model (LLM) reasoning to optimize flight paths, identify anomalies and coordinate swarm behaviors autonomously. In parallel, blockchain architectures furnish immutable audit trails for regulatory compliance, support secure device authentication via decentralized identifiers (DIDs) and automate contractual exchanges for services such as airspace leasing or payload delivery. By examining current research and practical deployments, this review demonstrates how the synergistic application of IoT, AI and blockchain can bolster operational efficiency, resilience and trustworthiness across the LAE landscape.
No abstract available
With the development of unmanned aerial vehicle (UAV) technology, multi-machine collaborative operations have become the core model for increasing mission effectiveness. However, large-scale UAV clusters face challenges such as dynamic security threats, heterogeneous data-fusion difficulties, and resource-constrained decision-making delays. Traditional single-machine intelligent architectures have limitations when addressing new threats, such as insufficient real-time response capabilities. To address these issues, this paper presents an LLM-layered collaborative security architecture (LLM-LCSA) for multi-machine collaborative security. This architecture optimizes the spatiotemporal fusion efficiency of multi-source asynchronous data through cloud–edge–end collaborative deployment, combining a lightweight end LLM, a medium edge LLM, and a cloud-based foundation LLM. Additionally, a Mixture-of-Experts (MoE) algorithm that dynamically activates the most relevant expert models by leveraging a threat–expert association matrix is introduced, increasing the accuracy of complex threat identification and dynamic adaptability. Moreover, a resource-aware multi-objective optimization model is constructed to generate optimal decisions under resource constraints. Simulation results indicate that, compared with traditional methods, LLM-LCSA achieves an average 7.92% improvement in threat detection accuracy, reduces the system's total response time by 44.52%, and enables resource scheduling during off-peak periods. This architecture provides an efficient, intelligent, and scalable solution for secure collaboration among UAV swarms. Future research should further explore its application potential in 6G network integration and large-scale swarm environments.
In recent years, the growing development of Connected Autonomous Vehicles (CAV), Intelligent Transport Systems (ITS), and 5G communication networks has led to the advent of Autonomous Intersection Management (AIM) systems. AIMs present a new paradigm for CAV control in future cities, taking control of CAVs in scenarios where cooperation is necessary and allowing safe and efficient traffic flows while eliminating traffic signals. So far, the development of AIM algorithms has been based on basic control algorithms, without the ability to adapt or to keep learning in new situations. To solve this, in this paper we present a new advanced AIM approach based on end-to-end Multi-Agent Deep Reinforcement Learning (MADRL) and trained using Curriculum through Self-Play, called advanced Reinforced AIM (adv.RAIM). adv.RAIM enables the control of CAVs at intersections in a collaborative way, autonomously learning complex real-life traffic dynamics. In addition, adv.RAIM provides a new way to build smarter AIMs capable of proactively controlling CAVs in other highly complex scenarios. Results show remarkable improvements compared to traffic light control techniques (reducing travel time by 59% and time lost due to congestion by 95%), as well as outperforming other recently proposed AIMs (reducing waiting time by 56%), highlighting the advantages of using MADRL.
The purpose of signal control is to allocate time among competing traffic flows to ensure safety. Artificial intelligence has made transportation researchers more interested in adaptive traffic signal control, and recent literature confirms that deep reinforcement learning (DRL) can be effectively applied to it. Deep neural networks enhance the learning potential of reinforcement learning. This study applies a DRL method, Double Deep Q-Network, to train local agents. Each local agent learns independently to accommodate regional traffic flows and dynamics. After learning is complete, a global agent integrates and unifies the action policies selected by the local agents to achieve traffic signal coordination. Traffic flow conditions are simulated with Simulation of Urban MObility (SUMO). The benefits of the proposed approach include improving intersection efficiency and minimizing the overall average waiting time of vehicles. The proposed multi-agent reinforcement learning model significantly improves the average vehicle waiting time and queue length compared with the results from PASSER-V and pre-timed signal setting strategies.
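The Double Deep Q-Network update used to train such local agents separates action selection (online network) from action evaluation (target network). A minimal sketch of the target computation follows; the Q-values, rewards, and three "signal phase" actions are made-up numbers:

```python
import numpy as np

def double_dqn_targets(q_online, q_target, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the argmax action, the target net evaluates it."""
    best_actions = np.argmax(q_online, axis=1)                       # selection
    next_values = q_target[np.arange(len(q_target)), best_actions]   # evaluation
    return rewards + gamma * (1.0 - dones) * next_values

# Two toy transitions, three candidate signal phases each
q_online = np.array([[1.0, 3.0, 2.0], [0.5, 0.2, 0.9]])
q_target = np.array([[1.5, 2.0, 4.0], [1.0, 0.8, 0.3]])
rewards = np.array([-2.0, -1.0])   # e.g. negative cumulative waiting time
dones = np.array([0.0, 1.0])       # second transition ends the episode
targets = double_dqn_targets(q_online, q_target, rewards, dones, gamma=0.9)
```

Decoupling selection from evaluation is what mitigates the overestimation bias of vanilla DQN.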
To solve a decentralized radio resource management problem in a 5G vehicular network, we propose a novel resource allocation algorithm based on multi-agent deep reinforcement learning (MARL). We let each vehicle act as an individual agent that can select a unique combination of transport block (TB) and transmission power to broadcast periodic packets. Each agent explores the environment and collects observations that will later be used to find the best combination of TB and transmission power. We apply an actor-critic reinforcement learning technique to choose the optimal TB for each agent. To eliminate non-stationarity in the multi-agent setting, we utilize centralized training that allows all agents to share their observations over critic networks. The information shared through the critic network can assist each agent in learning the policies of other agents. In decentralized execution, each agent may only use its actor network and local observation to find the most appropriate TB at the given level of transmission power. During training, the actions taken by the actor are evaluated by the corresponding critic, which maps a Q-value for all feasible actions in the given state. Our method achieves an 18% higher packet reception ratio than a spectrum allocation scheme based on a double DQN, and a 33% higher reward than the previous state of the art, which is also based on MARL.
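The centralized-training, decentralized-execution split described above can be illustrated with toy shapes: each actor sees only its own local observation, while the critic is fed the joint observations plus all chosen actions. Every dimension and weight below is invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy setting: 3 vehicles, local observation dim 4, 5 (TB, power) combinations
n_agents, obs_dim, n_actions = 3, 4, 5
actor_w = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_agents)]
critic_w = rng.normal(size=(n_agents * obs_dim + n_agents,))

local_obs = rng.normal(size=(n_agents, obs_dim))

# Decentralized execution: each actor uses only its own observation
# (greedy action shown; during training one would sample from the softmax policy)
actions = [int(np.argmax(softmax(local_obs[i] @ actor_w[i]))) for i in range(n_agents)]

# Centralized training: the critic scores the joint observation-action input
joint_input = np.concatenate([local_obs.ravel(), np.array(actions, dtype=float)])
q_value = float(joint_input @ critic_w)
```

Because the critic sees everything, each agent's learning target accounts for the others' behavior, which is what removes the non-stationarity at training time.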
Addressing the complexity of collaborative decision-making within heterogeneous UAV swarms in dynamic scenarios, and the difficulty of understanding the overall mission, this paper presents the AM-Qmix algorithm. The algorithm incorporates a prioritized multi-pool experience replay approach into deep reinforcement learning for heterogeneous multi-agent systems, enhancing the learning capabilities of the UAV swarm. Additionally, through local behavioral guidance strategies, the algorithm improves UAVs' understanding of and execution efficiency for specific tasks, thereby increasing the collaborative decision-making capacity of the entire swarm. Simulation experiments on collaborative material transport tasks with heterogeneous UAV swarms demonstrate the superiority of our algorithm in resolving collaborative decision-making among heterogeneous UAV swarms.
In this paper, we explore a multi-agent reinforcement learning approach to address the design problem of communication and control strategies for multi-agent cooperative transport. Typical end-to-end deep neural network policies may be insufficient for covering communication and control; these methods cannot decide the timing of communication and can only work with fixed-rate communications. Therefore, our framework exploits event-triggered architecture, namely, a feedback controller that computes the communication input and a triggering mechanism that determines when the input has to be updated again. Such event-triggered control policies are efficiently optimized using a multi-agent deep deterministic policy gradient. We confirmed that our approach could balance the transport performance and communication savings through numerical simulations.
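A triggering mechanism of the kind described, where communication fires only when the state has drifted far enough from the last transmitted value, can be sketched as follows. The 1-D state trace and threshold are made-up numbers, not the paper's controller:

```python
def event_triggered_rollout(states, threshold):
    """Transmit only when drift from the last communicated state exceeds a threshold.

    Returns the indices of time steps at which communication fires.
    """
    last_sent = states[0]
    sent = [0]  # the initial state is always communicated
    for t, s in enumerate(states[1:], start=1):
        if abs(s - last_sent) > threshold:
            last_sent = s
            sent.append(t)
    return sent

# Hypothetical trace of a transported object's position
trace = [0.0, 0.1, 0.25, 0.3, 0.9, 0.95, 1.6]
events = event_triggered_rollout(trace, threshold=0.5)
```

In the learned version, both the threshold logic and the transmitted control input would be outputs of the policy network, so the trade-off between transport performance and communication savings is optimized end to end.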
In recent years, multi-agent reinforcement learning (MARL) has been increasingly applied to training cooperative decision models for connected autonomous vehicles (CAVs). Despite their demonstrated success, such models inherit the issues that deep learning models suffer from, including vulnerability to adversarial attacks, which is the focus of this study. Consequently, this paper aims to assess and enhance the robustness of MARL-trained cooperative policies used by CAVs, in terms of their resilience to adversarial behavior encountered during deployment. First, a specific existing cooperative policy was identified as the victim policy, deployed in an on-ramp merging road scenario. Second, two adversarial policies, namely a collision adversary and a speed adversary, were developed and trained to disrupt the performance of the victim policy. The adversarial policies significantly impacted the victim policy, increasing the collision rate to 62% and decreasing the average speed from 25 m/s to 21.73 m/s. Finally, several adversarial training approaches were developed, producing cooperative policies that are more robust against adversarial scenarios and significantly bolstering road safety in adversarial conditions. The collision rate was cut in half against the collision adversary, while a 0% collision rate was achieved against the speed adversary.
No abstract available
With the advent of cooperative intelligent transport systems (C-ITS) and vehicle-to-everything (V2X) communications, cooperative positioning based on V2X sharing of location information has been emerging as a promising augmentation system for conventional satellite navigation. An example is implicit cooperative positioning (ICP) which relies on Bayesian filtering for cooperative sensing of targets that are used as reference points for improving vehicle positioning. ICP methods, however, rely on pre-determined models which makes them sub-optimal in case of non-Gaussian non-linear models or complex cooperation graphs. To address these limitations, the paper proposes a decentralized-partially observable Markov decision process (Dec-POMDP) framework, paired with deep multi-agent reinforcement learning (MARL) algorithms. We introduce a novel ICP-multi-agent proximal policy optimization (MAPPO) algorithm where distributed agents (i.e., vehicles) dynamically activate/deactivate the radio links for cooperation with the neighbors to optimize the communication efficiency, still guaranteeing accurate positioning. We reproduce a realistic C-ITS scenario with CARLA simulator, where vehicles move according to real-world dynamics and communicate with each other to cooperatively sense their locations. Results show that the proposed ICP-MAPPO algorithm, with its dynamic-decentralized-execution and centralized-training schemes, outperforms state-of-the-art ICP methods by 21% in terms of positioning accuracy, and it can reduce the communication overhead by following the optimal learned policy.
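MAPPO inherits PPO's clipped surrogate objective, which is the piece that keeps each agent's policy update close to the behavior policy. A minimal numeric sketch of that loss, with arbitrary probability ratios and advantages (not values from the paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate: pessimistic minimum of clipped and unclipped objectives.

    ratio = pi_new(a|s) / pi_old(a|s); the returned value is a loss (to minimize).
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

# Three toy samples: ratios drifting away from 1, mixed-sign advantages
ratios = np.array([0.9, 1.5, 1.0])
advantages = np.array([1.0, 1.0, -2.0])
loss = ppo_clip_loss(ratios, advantages)
```

In the link-activation setting above, each vehicle's discrete activate/deactivate action would be scored with this objective, while a centralized critic supplies the advantages.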
As travel demand increases and urban traffic conditions become more complicated, applying multi-agent deep reinforcement learning (MARL) to traffic signal control has become a hot topic. The rise of Reinforcement Learning (RL) has opened up opportunities for solving Adaptive Traffic Signal Control (ATSC) in complex urban traffic networks, and deep neural networks have further enhanced the ability to handle complex data. Traditional research in traffic signal control is based on centralized reinforcement learning. However, in a large-scale road network, centralized RL is infeasible because of the exponential growth of the joint state-action space. In this paper, we propose a Friend-Deep Q-network (Friend-DQN) approach for controlling multiple traffic signals in urban networks, based on an agent-cooperation scheme. In particular, cooperation between multiple agents can reduce the state-action space and thus speed up convergence. We use the Simulation of Urban MObility (SUMO) platform to evaluate the performance of the Friend-DQN model, and show its feasibility and superiority over other existing methods.
Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge about other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to a goal location within a desired time frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
Transit Signal Priority (TSP) has been widely used to reduce transit delays for decades. Since reliability is valued as highly as travel time, a dual-objective coordinated (DC) TSP is developed to adaptively optimize transit headway adherence and travel time simultaneously over consecutive intersections. This is the first attempt at using a centralized-agent deep reinforcement learning (RL) framework to solve a coordinated TSP optimization problem. Decentralized control algorithms using multi-agent RL are also developed as baseline scenarios. TSP algorithms are trained and tested in a stochastic microsimulation environment within Aimsun Next for a corridor segment in Toronto with a transit line experiencing high service variability. DC TSP demonstrates clear promise in reducing headway variability and travel time at different traffic levels. It highlights the importance of coordinating TSP actions at consecutive intersections. It is also shown to be robust, providing effective control under various configurations of bus stop locations.
In this paper, we present a solution to a design problem of control strategies for multi-agent cooperative transport. Although existing learning-based methods assume that the number of agents is the same as that in the training environment, the number might differ in reality, considering that the robots' batteries may completely discharge or additional robots may be introduced to reduce the time required to complete a task. Therefore, it is crucial that the learned strategy be applicable to scenarios wherein the number of agents differs from that in the training environment. In this paper, we propose a novel multi-agent reinforcement learning framework of event-triggered communication and consensus-based control for distributed cooperative transport. The proposed policy model estimates the resultant force and torque in a consensus manner using the estimates of the resultant force and torque from the neighborhood agents. Moreover, it computes the control and communication inputs to determine when to communicate with the neighboring agents under local observations and estimates of the resultant force and torque. Therefore, the proposed framework can balance the control performance and communication savings in scenarios wherein the number of agents differs from that in the training environment. We confirm the effectiveness of our approach by using a maximum of eight and six robots in the simulations and experiments, respectively.
In recent years, cooperative transport using autonomous mobile robots has attracted attention in multi-agent systems, which control multiple agents simultaneously. Although many cooperative transport methods have been studied, they have been limited to simple transport tasks. In particular, they cannot cope with complex situations in which the target transport formation changes with many robots. Fast multi-agent reinforcement learning computation is required in these situations to realize flexible and efficient transformation. However, it is known that learning the robots' trajectories when changing formation with reinforcement learning is time-consuming. Previously proposed approaches include MADDPG, an actor-critic method; Deep Dyna-Q, a model-based algorithm; and an improved Dyna-MADDPG algorithm specialized for multi-agent formation. In this paper, a GASIL-MADDPG method is proposed, which applies model-based learning and imitation learning simultaneously to MADDPG for fast learning. As a result, learning with the proposed method was 50% faster than MADDPG and 37% faster than Dyna-MADDPG.
This study proposes a multi-objective optimization-based framework for task allocation and path planning to address the challenges faced by multi-robot systems in transport-oriented task environments. The framework considers robot capability heterogeneity and load capacity, aiming to minimize task execution time and overall system energy consumption. A hierarchical training architecture divides the process into two stages: the upper layer uses the NSGA-II algorithm to estimate the cost of task allocation strategies and construct a multi-objective solution space, allowing decision-makers to select suitable solutions based on optimization preferences or practical constraints. The lower layer leverages deep neural networks and reinforcement learning to perform multi-agent learning and generate collision-free paths. This architecture supports solution selection based on varying optimization priorities, enabling capability-oriented task distribution aligned with system needs. Simulation and experimental results show that the proposed method effectively handles complex scenarios with task dependencies, improves path learning efficiency and task completion rates, and, while maintaining single-objective solution quality, offers flexible, interpretable options, demonstrating strong applicability and scalability in real-world applications.
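The upper layer's NSGA-II search rests on non-dominated sorting; its first (Pareto) front can be sketched as below. The (execution time, energy) scores for candidate allocations are hypothetical:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated set, i.e. the first front in NSGA-II's sorting."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions if o != s)]

# Candidate task allocations scored as (execution time, energy) - toy values
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0)]
front = pareto_front(candidates)
```

NSGA-II then ranks the remaining solutions into successive fronts and uses crowding distance within each front; the sketch covers only the dominance test at its core.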
In the modern era, smart traffic control in urban transport systems has gained significant attention because of growing safety concerns, congestion, and fuel wastage in urban regions. However, traditional rule-based or fixed-time signal systems fail to adjust rapidly to changing traffic dynamics, leading to inefficiencies. Recent studies, such as Multi-Agent Reinforcement Learning with Deep Q-Network (MARL-DQN)-based cooperative traffic control, have shown promise in decentralized decision-making. However, this approach faces challenges that restrict its real-world applicability: limited spatial awareness, unstable learning in dense networks, and single-objective optimization. To overcome these challenges, a Graph-Aware Multi-Agent Proximal Policy Optimization (GAM-PPO) model is proposed. First, traffic data (queue lengths, vehicle flows, and signal phases) are collected from a Simulation of Urban Mobility (SUMO)-based urban simulation. The traffic network is then formulated as a graph, where roads are edges and intersections are nodes, allowing Graph Neural Networks (GNNs) to capture spatial dependencies. Each intersection is controlled by an independent Proximal Policy Optimization (PPO) agent that selects the signal phases. A multi-objective reward function integrates safety, traffic delay, and emissions to ensure balanced learning. Experimental results validate that GAM-PPO reduces the Average Waiting Time (AWT) to 190 s and boosts throughput to 1150 vehicles/h, significantly outperforming the existing MARL-DQN approach.
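The graph formulation above (intersections as nodes, roads as edges) admits a simple mean-aggregation message-passing layer. The adjacency matrix, node features, and identity weights below are toy values, not the paper's architecture:

```python
import numpy as np

def gnn_layer(features, adjacency, weights):
    """One mean-aggregation message-passing layer over the intersection graph."""
    deg = adjacency.sum(axis=1, keepdims=True)
    neighbor_mean = (adjacency @ features) / np.maximum(deg, 1)   # average neighbor features
    return np.maximum(0.0, (features + neighbor_mean) @ weights)  # self + neighbors, ReLU

# 3 intersections in a line: 0 - 1 - 2 (roads as edges)
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
# Per-node features: [queue length, current phase] (toy values)
features = np.array([[4.0, 0.0], [2.0, 1.0], [6.0, 1.0]])
weights = np.eye(2)  # identity projection keeps the arithmetic transparent
embeddings = gnn_layer(features, adjacency, weights)
```

Each PPO agent would consume its node's embedding, so a long queue at one intersection influences the phase choice at its neighbors.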
The shift towards transportation electrification, marked by the rising use of electric vehicles (EVs) and the development of fast charging stations (FCS), plays a crucial role in transport decarbonization initiatives. Optimizing the rollout of FCS and setting appropriate charging service fees (CSF), a process referred to as the coupled FCS multi-stage bi-level operation problem (FCS-MBOP), is essential for improving both investment and operational efficiency within the coupled power distribution and transportation network (CPTN). Operators must not only adapt to short-term fluctuations within the environment but also swiftly respond to changes in the FCS layout resulting from various long-term investment decisions. To address this complexity, we introduce a dual-timescale evolution-assisted deep reinforcement learning framework, which includes two specialized agents with distinct functions: an investment agent (planner) and an operational agent (operator). The planner focuses on annual investments, evolving long-term strategies that weigh social benefits against investment costs through a genetic algorithm (GA). In contrast, the operator acts on an hourly basis, fine-tuning CSF to alleviate traffic congestion and minimize social costs while taking into account the planner's feasible investment decisions. Leveraging the integrated capabilities of a graph neural network (GNN), long short-term memory (LSTM), and attention mechanisms, the framework's agents are adept at extracting both temporal and spatial features and at transferring experience across different investment stages. Empirical evidence underscores the effectiveness of our approach, showcasing its ability to surpass conventional methodologies in delivering high-quality solutions.
Traffic flow optimization in urban environments remains one of the key challenges of modern research, despite the significant volume of scientific work devoted to this topic. The problem still has no universal solution that works effectively in real-world scenarios. One of the main difficulties is processing the large volume of input data, in particular traffic data, which constantly arrives from sensors installed throughout the urban road network. Traditionally, due to the scale of the task, researchers have focused on systems with localized agents. Such agents usually manage traffic at individual intersections, while their coordination is carried out within multi-agent systems. Modern approaches handle the volume and complexity of input data through deep learning methods. In particular, the deep deterministic policy gradient (DDPG) algorithm is proposed as a basis for processing large input data. In an experimental study, a simple intersection model was tested to verify the effectiveness of the approach. The DDPG algorithm performed better on the simple model than Q-learning: DDPG earned rewards in the range of 4-4.3 points, while Q-learning's rewards ranged from 2 to 4 points. The main criterion for evaluating DDPG against Q-learning and random timings is the average reward per episode. DDPG and Q-learning achieve similar reward levels, but DDPG shows stable convergence (0.04-0.21 points), while Q-learning remains unstable (0.04-0.43 points). A study of intra-episode performance shows that DDPG achieves its improvements mainly towards the end of each episode. Overall, the algorithm proved successful for this scenario, and the results can serve as a basis for further improvements and applications in more complex traffic scenarios.
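Two ingredients that distinguish DDPG from tabular Q-learning, the bootstrapped critic target and the Polyak-averaged target networks, can be sketched with scalar toys. All numbers here are illustrative, not results from the study:

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging used by DDPG to slowly track the online networks."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

def ddpg_critic_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped target: the target actor picks a', the target critic scores it (next_q)."""
    return reward + gamma * (1.0 - done) * next_q

# Toy single-parameter "networks" (tau=0.5 exaggerated for visibility)
target = [np.array([1.0, 2.0])]
online = [np.array([3.0, 0.0])]
new_target = soft_update(target, online, tau=0.5)
y = ddpg_critic_target(reward=-1.0, next_q=2.0, done=0.0, gamma=0.5)
```

The slow-moving targets are what let DDPG converge stably on continuous, high-dimensional inputs where vanilla Q-learning oscillates.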
This paper presents a multi-agent reinforcement learning algorithm to represent strategic bidding behavior by carriers and shippers in freight transport markets. We investigate whether feasible market equilibriums arise without central control or communication between agents. Observed behavior in such environments serves as a stepping stone towards self-organizing logistics systems like the Physical Internet, while also offering valuable insights for the design of contemporary transport brokerage platforms. We model an agent-based environment in which shipper and carrier actively learn bidding strategies using policy gradient methods, posing bid and ask prices at the individual container level. Both agents aim to learn the best response given the expected behavior of the opposing agent. Inspired by financial markets, a neutral broker allocates jobs based on bid-ask spreads. Our game-theoretical analysis and numerical experiments focus on behavioral insights. To evaluate system performance, we measure adherence to Nash equilibria, fairness of reward division, and utilization of transport capacity. We observe good performance both in predictable, deterministic settings (~95% adherence to Nash equilibria) and in highly stochastic environments (~85% adherence). Risk-seeking behavior may increase an agent's reward share, yet overly aggressive strategies destabilize the system. The results suggest a potential for full automation and decentralization of freight transport markets. These insights ease the design of real-world market platforms, suggesting an innate tendency of markets to reach equilibria without behavioral models, information sharing, or explicit incentives.
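A neutral broker of the kind described, allocating container jobs by bid-ask spread, might be sketched as a greedy matcher. The matching rule and all prices below are invented for illustration:

```python
def broker_match(bids, asks):
    """Match shipper bids with carrier asks; a job can clear only when bid >= ask.

    Feasible pairs are matched greedily in order of largest bid-ask spread,
    each bid and each ask participating at most once.
    """
    pairs = [(b - a, i, j)
             for i, b in enumerate(bids)
             for j, a in enumerate(asks)
             if b >= a]
    pairs.sort(reverse=True)
    used_b, used_a, matches = set(), set(), []
    for spread, i, j in pairs:
        if i not in used_b and j not in used_a:
            matches.append((i, j, spread))
            used_b.add(i)
            used_a.add(j)
    return matches

# Toy container market: shipper bid prices vs. carrier ask prices
bids = [10.0, 7.0, 5.0]
asks = [6.0, 4.0, 8.0]
matches = broker_match(bids, asks)
```

In the learned-market setting, the interesting dynamics come from the agents adjusting their bid and ask prices in response to which jobs such a broker clears.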
Designing well-functioning and fair transport networks is not a trivial task, given the large space of solutions and the constraints one must satisfy. Moreover, different sources of spatial segregation can render some transportation network interventions unfair to specific groups. It is thereby crucial to optimize the transportation system while mitigating the disproportional benefits it can lead to. In this paper, we explore the trade-off between efficiency and fairness in the Transport Network Design Problem (TNDP) via Deep Reinforcement Learning (Deep RL). We formulate different fairness definitions as reward functions, inspired by Equal Sharing of Benefits, Narrowing the Gap, and Rawls's theory of justice. We apply our method to Amsterdam (The Netherlands) and Xi'an (China) and show that vanilla Deep RL can lead to biased outcomes. By considering different fair rewards, however, we can shed light on possible compromises between fairness and efficiency in the TNDP.
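Fairness notions like those named above can be turned into concrete reward functions. A toy sketch for two population groups follows; the accessibility numbers and the exact functional forms are illustrative assumptions, not the paper's definitions:

```python
def efficiency(benefits):
    """Pure efficiency: total accessibility gain across all groups."""
    return sum(benefits)

def narrowing_the_gap(benefits, baseline):
    """Reward shrinking the spread between the best- and worst-off groups."""
    after = [b0 + b for b0, b in zip(baseline, benefits)]
    return (max(baseline) - min(baseline)) - (max(after) - min(after))

def rawlsian(benefits, baseline):
    """Rawls's difference principle: value only the worst-off group's outcome."""
    return min(b0 + b for b0, b in zip(baseline, benefits))

baseline = [2.0, 8.0]   # pre-intervention accessibility per group (toy)
plan = [4.0, 1.0]       # benefit a candidate line adds to each group (toy)
eff = efficiency(plan)
gap = narrowing_the_gap(plan, baseline)
rawls = rawlsian(plan, baseline)
```

Swapping one of these functions in as the RL reward is what steers the same network-design agent toward different efficiency-fairness compromises.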
Future 6th Generation (6G) networks will rely on Terahertz (THz) wireless communication as their main enabler for delivering both ultra-high data rates and minimal delay. THz wireless systems become crucial for upcoming communications by using Unmanned Aerial Vehicles (UAVs) together with Intelligent Reflecting Surfaces (IRS), improving reliability and efficiency. In UAV-IRS-assisted networks, minimizing mission completion time and energy consumption is critical. However, achieving rapid mission execution often requires UAVs to operate at higher speeds, increasing energy usage and creating a trade-off that demands optimization. This paper addresses the challenge of optimizing UAV-IRS trajectories in THz networks to reduce mission time while adhering to energy constraints. Given the non-convex and NP-hard nature of the problem, traditional optimization methods are insufficient. To tackle this, we propose a Multi-Agent Deep Reinforcement Learning (MADRL) algorithm, which provides an efficient, low-complexity solution for trajectory optimization. MADRL dynamically adapts UAV-IRS paths, balancing mission efficiency and energy savings. Simulation results demonstrate that the proposed MADRL-based approach outperforms existing benchmarks, achieving shorter mission times and near-optimal energy consumption across varying scenarios. By leveraging cooperative learning, the algorithm effectively handles complex environments with multiple users and IRS elements. This work highlights the potential of MADRL for UAV-IRS trajectory optimization, offering a scalable solution for energy-efficient and high-performance THz communication systems.
Connected and Autonomous Vehicles (CAVs) are an emerging solution to the issues of safe and sustainable transportation systems in the future. One major transport technology for CAVs is Cooperative Adaptive Cruise Control (CACC), for which unsignalized autonomous intersection crossing is a growing use case. CACC relies heavily on inter-vehicular communication and is thus vulnerable to message forgery and jamming attacks. Most solutions for CACC focus exclusively on enhancing efficiency or security but do not offer an integrated framework for achieving both on a large scale. In this paper, we propose a Blockchain-integrated Multi-Agent Deep Reinforcement Learning (Block-MADRL) architecture for enhancing the efficiency of CACC while cooperatively detecting attacks, reducing the fuel efficiency of identified attackers and securely notifying the overall network. Our approach uses multi-agent deep reinforcement learning to find fuel and throughput optimizing solutions for CACC and a cooperative verification mechanism based on Extended Isolation Forest (EIF) for attack detection. Attacker data is securely stored in a Road Side Unit (RSU) level blockchain, and we design a low-latency, high throughput consensus protocol for speedy and secure data dissemination. Simulation results indicate over 29.5% better lane throughput with our approach during acceleration forgery attack, up to 23% induced reduction in fuel efficiency of malicious vehicles, 17.6% higher blockchain throughput through our consensus protocol and over 8% improvement in attack detection rate compared to the state-of-the-art.
The emerging concepts of Urban Air Mobility (UAM) and Advanced Air Mobility (AAM) open a new paradigm for urban air transportation. A big challenge is that these new aerial vehicles will quickly saturate the already crowded aviation spectrum, which is an essential resource to ensure reliable communications for safe operations. In this paper, we consider an air transportation system where multiple aerial vehicles are operated to transport passengers or cargo from different sources to destinations along their pre-defined paths. During the flight, the minimum communication Quality of Service (QoS) requirement must be achieved to ensure flight safety. Our objective is to minimize the average mission completion time by jointly optimizing the velocity selection and spectrum allocation for all aerial vehicles. We formulate the optimization problem as a multi-stage Markov Decision Process (MDP) in which the optimization variables are coupled together. A multi-agent Deep Reinforcement Learning (DRL) based solution is proposed in which the Value Decomposition Networks (VDN) algorithm is used to take discrete actions. Additionally, we propose a heuristic greedy algorithm as a baseline solution. Simulation results show that our learning-based solution outperforms the heuristic greedy algorithm and another Orthogonal Multiple Access (OMA) solution in minimizing the mission completion time.
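The value-factorization idea behind VDN can be sketched in a few lines. The agent count, action count, and random utilities below are purely illustrative, not the paper's setup:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Toy VDN sketch: each aerial vehicle i keeps its own utility Q_i(o_i, a_i),
# and the joint value is factorized as Q_tot = sum_i Q_i.  Because argmax
# distributes over a sum of per-agent terms, every agent can act greedily on
# its own Q_i and still jointly maximize Q_tot.
n_agents, n_actions = 3, 4                        # e.g. 4 discrete velocity levels
q_local = rng.normal(size=(n_agents, n_actions))  # Q_i(o_i, .) for one fixed observation

greedy = q_local.argmax(axis=1)                     # decentralized greedy actions
q_tot = q_local[np.arange(n_agents), greedy].sum()  # VDN joint value

# Brute-force check over all joint actions: decentralized greedy is joint-optimal.
best = max(sum(q_local[i, a] for i, a in enumerate(joint))
           for joint in product(range(n_actions), repeat=n_agents))
assert np.isclose(q_tot, best)
```

This additive structure is what lets each vehicle pick its velocity and spectrum action from local information at execution time while training still optimizes a team objective.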
Autonomous driving promises to transform road transport. Multi-vehicle and multi-lane scenarios, however, present unique challenges due to constrained navigation and unpredictable vehicle interactions. Learning-based methods, such as deep reinforcement learning, are emerging as a promising approach to automatically design intelligent driving policies that can cope with these challenges. Yet, the process of safely learning multi-vehicle driving behaviours is hard: while collisions, and their near-avoidance, are essential to the learning process, directly executing immature policies on autonomous vehicles raises considerable safety concerns. In this article, we present a safe and efficient framework that enables the learning of driving policies for autonomous vehicles operating in a shared workspace, where the absence of collisions cannot be guaranteed. Key to our learning procedure is a sim2real approach that uses real-world online policy adaptation in a mixed reality setup, where other vehicles and static obstacles exist in the virtual domain. This allows us to perform safe learning by simulating (and learning from) collisions between the learning agent(s) and other objects in virtual reality. Our results demonstrate that, after only a few runs in mixed reality, collisions are significantly reduced.
The development of autonomous swarm behavior for UAV swarms has increased significantly in recent years. Many applications such as collective transport, exploration of unknown territory, or target search and delivery benefit from the flexibility, scalability and robustness of the swarm approach. Besides new application possibilities, these characteristics might also be used for malicious or dangerous purposes like autonomous target-oriented attacks. To date, research lacks intelligent countermeasures to intervene in attacking UAV swarms. Typical defense mechanisms employ attack-defense confrontation, which increases the risk of collateral damage as drones might fall from the sky. Rather than creating a confrontation, we focus on developing countermeasures to intelligently mislead or delay attacks on a target. Therefore, we explore two multi-agent deep reinforcement learning strategies for defender UAVs to intervene in target-oriented attacks of intelligent UAV swarms. Both strategies are based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and aim at preventing, or at least delaying, attacks. Via simulations we model and evaluate the performance of both methods and compare it to a baseline approach.
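MADDPG's "centralized training, decentralized execution" layout can be sketched at the level of shapes; the linear "networks" and all dimensions below are illustrative assumptions, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Shape-level MADDPG sketch: each defender UAV's actor sees only its own
# observation (decentralized execution), while each agent's critic scores
# the *joint* observations and actions (centralized training signal).
n_agents, obs_dim, act_dim = 2, 6, 2

actor_w = [rng.normal(size=(act_dim, obs_dim)) for _ in range(n_agents)]
critic_w = [rng.normal(size=(n_agents * (obs_dim + act_dim),)) for _ in range(n_agents)]

obs = rng.normal(size=(n_agents, obs_dim))
acts = np.stack([np.tanh(w @ o) for w, o in zip(actor_w, obs)])  # per-agent actions in (-1, 1)

joint = np.concatenate([obs.ravel(), acts.ravel()])   # centralized critic input
q_vals = np.array([w @ joint for w in critic_w])      # one critic per agent
```

The key design point is that only the critics consume the joint vector; at deployment each defender needs nothing beyond its own observation.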
Intelligent Transport Systems (ITS) provide an efficient approach to road traffic safety. To support safety applications, cellular vehicle-to-everything (C-V2X) was developed by the Third Generation Partnership Project (3GPP). C-V2X supports two modes of communication, mode 3 and mode 4. In mode 4, vehicles reserve resources based on their local observations using semi-persistent scheduling (SPS). If two vehicles simultaneously select the same resources, resource contention results, giving rise to a consensus problem. To overcome this, we propose a multi-agent collaborative deep reinforcement learning based scheme. A single deep Q network (DQN) is trained for each zone. Each zone is preconfigured with resources that constitute a resource pool. A reward function is shared among the vehicles that belong to the same pool. This approach makes the vehicles collaborate rather than compete in selecting resources for their transmissions. The proposed scheme is compared with random resource allocation in C-V2X, and the results show that it outperforms the baseline even in dense vehicular environments.
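The shared-reward idea can be sketched as follows; this is our reading of the abstract, not the paper's exact reward function. Vehicles in one zone draw from that zone's resource pool, and every vehicle in the pool receives the same scalar reward, so avoiding contention becomes a collaborative rather than competitive goal:

```python
from collections import Counter

# Shared pool reward: count pairwise resource collisions in one pool and
# broadcast the same (negative) scalar to every vehicle in that pool.
def shared_reward(choices):
    """choices: the resource index each vehicle in one pool selected."""
    counts = Counter(choices)
    collisions = sum(c - 1 for c in counts.values() if c > 1)
    return -float(collisions)   # same value for all pool members

assert shared_reward([0, 1, 2]) == 0.0    # all distinct: no contention
assert shared_reward([0, 0, 2]) == -1.0   # two vehicles picked resource 0
```

Because no vehicle can improve its own reward at a pool-mate's expense, greedy per-vehicle learning pushes the zone toward collision-free selections.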
Nowadays, academic research, disaster mitigation, industry, and transportation apply the cooperative multi-agent concept. A cooperative multi-agent system is a multi-agent system whose agents work together to solve problems or maximise utility. The essence of formation control is enabling multiple agents to reach a desired point while maintaining their positions in the formation under dynamic conditions and environments. A cooperative multi-agent system closely relates to the formation change issue: the arrangement of multiple agents must change according to environmental conditions, such as when avoiding obstacles, following tracks of different sizes and shapes, or moving transport objects of different sizes and shapes. Reinforcement learning is a good method to apply in a formation change environment. On the other hand, the complex formation control process requires a long learning time. This paper proposes using the Deep Dyna-Q algorithm to speed up the learning process while improving the formation achievement rate by tuning the parameters of the Deep Dyna-Q algorithm. Even though the Deep Dyna-Q algorithm has been used in many applications, it has not been applied in an actual experiment. The contribution of this paper is the application of the Deep Dyna-Q algorithm to formation control in both simulations and actual experiments. This study successfully implements the proposed method and investigates formation control in simulations and actual experiments. In the actual experiments, Nexus robots running the Robot Operating System (ROS) were used. To verify the communication between the PC and the robots, the camera processing, and the motor controller, the velocities from the simulation were given directly to the robots. The simulations used the same goal points as the actual experiments, so the simulation results approach the actual experimental results.
The discount rate and learning rate values affected the formation change achievement rate, collision number among agents, and collisions between agents and transport objects. For learning rate comparison, DDQ (0.01) consistently outperformed DQN. DQN obtained the maximum −170 reward in about 130,000 episodes, while DDQ (0.01) could achieve this value in 58,000 episodes and achieved a maximum −160 reward. The application of an MEC (model error compensator) in the actual experiment successfully reduced the error movement of the robots so that the robots could produce the formation change appropriately.
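The Dyna-Q mechanics behind Deep Dyna-Q's speed-up can be sketched in tabular form (the deep variant replaces the table with a network). The 5-state chain task, learning rate, and planning budget below are illustrative assumptions, not the paper's formation-control setup:

```python
import random

random.seed(0)

# Tabular Dyna-Q on a toy 5-state chain (goal at the right end).  Each real
# transition is followed by 10 simulated "planning" updates replayed from a
# learned model -- the mechanism that shortens learning time.
n_states, n_actions, alpha, gamma = 5, 2, 0.5, 0.9
Q = [[0.0] * n_actions for _ in range(n_states)]
model = {}                                     # (s, a) -> (reward, next state)

def step(s, a):                                # action 1 moves right, 0 moves left
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

s = 0
for _ in range(300):                           # random behavior policy (exploration)
    a = random.randrange(n_actions)
    r, s2 = step(s, a)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])      # direct RL update
    model[(s, a)] = (r, s2)
    for _ in range(10):                                        # Dyna planning updates
        ps, pa = random.choice(list(model))
        pr, ps2 = model[(ps, pa)]
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
    s = 0 if s2 == n_states - 1 else s2        # restart after reaching the goal
```

The planning loop reuses each real experience roughly ten times, which is why Dyna-style agents typically reach a given reward level in far fewer environment steps than plain DQN, matching the episode counts reported above.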
Autonomous Campus Shuttles (ACS) provide an eco-friendly solution for short-distance transport, but they need to function in dynamic environments with random pedestrian movement, route blockages, and varying passenger demand. This study presents an adaptive decision-making system based on Deep Reinforcement Learning (DRL), using the Deep Deterministic Policy Gradient (DDPG) algorithm, to facilitate real-time navigation and route optimisation. The system processes multi-modal sensor inputs, shuttle states, and contextual traffic features to generate safe, efficient, and responsive decisions. A custom reward function encourages the agent to balance efficiency, safety, and passenger satisfaction. The algorithm is trained in a simulated university campus environment and evaluated on a scaled physical model named AB-POD2. Results show that the DRL solution outperforms rule-based and supervised baselines in adaptability, route optimisation, and robustness to environmental variation, with 94.5% accuracy and peak performance within 20 trials, demonstrating efficient functionality. This study demonstrates the feasibility of DRL for context-aware decision making in autonomous campus shuttles and its potential for urban micro-mobility use cases at large.
As cities grow, traffic congestion increases, necessitating efficient and reliable multimodal transportation networks. The lack of real-time prediction and optimization has hindered transportation management. We use advanced machine learning to create a multimodal transportation model with real-time, high-accuracy forecasts and adaptive optimization. We first use Cross-Modal Joint Embedding Networks (CJEN) to merge data sources into a cohesive feature space. A network of ST-GCNs captures the spatial and temporal dependencies needed to characterize urban transport complexity. TCN and GRU enable robust short-term forecasting across all transport modalities. We optimize dynamic routes based on real-time conditions using Deep Deterministic Policy Gradient (DDPG) in a multi-agent reinforcement learning framework to maximize flexibility. Time-Varying Dynamic Bayesian Networks (TVDBNs) refine the results and reduce travel-time and congestion forecasting uncertainty using probability-based forecasting. This complete approach improves forecast accuracy by 89-92%, shortens journey durations by 15-20%, and decreases expenses by 30% through transfer learning. We improve urban transportation by providing actionable and trustworthy predictions and adaptive route improvements for sustainable and efficient multimodal networks.
This study explores how the integration of generative artificial intelligence, multi-objective evolutionary optimization, and reinforcement learning can enable sustainable and cost-effective decision-making in supply chain strategy. Using real-world retail demand data enriched with synthetic sustainability attributes, we trained a Variational Autoencoder (VAE) to generate plausible future demand scenarios. These were used to seed a Non-Dominated Sorting Genetic Algorithm (NSGA-II) aimed at identifying Pareto-optimal sourcing strategies that balance delivery cost and CO2 emissions. The resulting Pareto frontier revealed favorable trade-offs, enabling up to 50% emission reductions for only a 10–15% cost increase. We further deployed a deep Q-learning (DQN) agent to dynamically manage weekly shipments under a selected balanced strategy. The reinforcement learning policy achieved an additional 10% emission reduction by adaptively switching between green and conventional transport modes in response to demand and carbon pricing. Importantly, the agent also demonstrated resilience during simulated supply disruptions by rerouting decisions in real time. This research contributes a novel AI-based decision architecture that combines generative modeling, evolutionary search, and adaptive control to support sustainability in complex and uncertain supply chains.
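The Pareto-dominance filter at the heart of NSGA-II's non-dominated sorting can be illustrated on toy (cost, emissions) pairs; the numbers below are invented for illustration, not the paper's data:

```python
# Non-dominated ("Pareto") filtering over (delivery cost, CO2 emissions)
# pairs, minimizing both objectives.  A strategy is kept only if no other
# strategy is at least as good on both objectives.
def pareto_front(points):
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

strategies = [(100, 80), (110, 40), (150, 40), (90, 95), (120, 35)]
front = pareto_front(strategies)
# (150, 40) is dropped: (110, 40) is cheaper at equal emissions.
```

NSGA-II repeatedly peels off such fronts to rank a population; the surviving frontier is exactly the cost-versus-emissions trade-off curve the abstract describes.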
In this paper, we use deep reinforcement learning to enable connected and automated vehicles (CAVs) to drive through an intersection with human-driven vehicles. The multi-agent deep deterministic policy gradient (MADDPG) algorithm is improved to process data more efficiently, so that it can overcome learning bottlenecks in complex environments, and sliding control is used to execute the control strategies. Finally, the feasibility of the method is verified in the CARLA simulation environment.
The fast-paced growth of electric vehicles (EVs) in modern cities has driven up the demand for optimally synchronized charging that reduces grid stress and enhances user experience and sustainability. Intelligent Urban Sustainable Mobility Ecosystems (IUSME) inherently need intelligent AI solutions that can monitor energy use, accommodate renewable power, and ensure continuous traffic flow in real time. Traditional EV charging systems based on localized optimization or static scheduling can lead to longer waiting times at charging stations, insufficient use of renewable energy, and congestion at peak load. These inefficiencies affect both the immediate operation of urban energy systems and the scalability of smart-city sustainable transport networks. In this work, we introduce AI-Optimized Electric Vehicle Charging Coordination (AIO-EVCC), an adaptive multi-agent deep reinforcement learning system that synchronizes charging schedules in real time. AIO-EVCC utilizes predictive analytics to better predict future demand, simulate vehicle behavior, and continuously account for renewable energy capacity. The algorithm employs a hierarchical decision-making architecture that marries centralized grid optimization with decentralized vehicle-level learning to reduce energy cost, minimize waiting time, and even out the load on the grid. AIO-EVCC reduces peak demand by 27%, raises renewable energy utilization by 22%, and reduces average wait time by 35% compared to regular scheduling strategies. These results were obtained from tests on a model of a smart-city grid. The system is highly robust and scalable, even when demand is unpredictable and renewable sources are intermittent.
The proposed AIO-EVCC algorithm significantly improves performance across key parameters, achieving up to 33% reduction in peak grid load, 85% renewable energy utilization, 8-10 minutes average user waiting time, and an energy cost of $0.18/kWh, outperforming conventional methods. These results highlight its effectiveness in optimizing EV charging schedules, enhancing grid stability, reducing operational costs, and maximizing user satisfaction in intelligent urban mobility ecosystems. As a result, the AIO-EVCC architecture makes charging stations for electric vehicles in urban areas more reliable and effective in the long term. Smart coordination helps enable future smart cities, in which sustainability, energy efficiency, and mobility function harmoniously together.
Amidst the rapid progression of the road transport industry, the safety and efficiency of heavy-vehicle platoons have garnered significant attention. The study tackles the challenge of obstacle avoidance presented by vehicles owing to their considerable mass, delayed response times, and line-of-sight impediments, by introducing a cooperative obstacle avoidance system for heavy-vehicle platoons based on deep reinforcement learning. The system comprises three primary modules: perception, decision-making, and control. Initially, the perception module acquires real-time environmental data. Subsequently, the decision-making module formulates obstacle avoidance decisions based on the acquired data. Specifically, it implements a two-stage braking obstacle avoidance strategy under low collision risk scenarios, while employing a fifth-degree polynomial for planning and tracking obstacle avoidance paths under high collision risk conditions suitable for steering maneuvers. The control module utilizes the local multi-agent deep deterministic policy gradient (LADDPG) algorithm to train the heavy-vehicle platoon agents, ensuring the formation’s maintenance while mitigating collisions with other vehicles and obstacles. The effectiveness of the proposed system is substantiated through simulation experiments, demonstrating its adaptability to various traffic conditions, selection of suitable obstacle avoidance strategies, and significant enhancement of obstacle avoidance performance and heavy-vehicle platoon stability.
No abstract available
Individual mobility prediction plays a key role in urban transport, enabling personalized service recommendations and effective travel management. It is widely modeled by data-driven methods such as machine learning and deep learning, as well as classical econometric methods, to capture key features of mobility patterns. However, such methods struggle with transferability and robustness due to their limited capacity to learn mobility patterns from different data sources and to predict in out-of-distribution (a.k.a. "zero-shot") settings. To address this challenge, this paper introduces MoBLLM, a foundational model for individual mobility prediction that aims to learn a shared and transferable representation of mobility behavior across heterogeneous data sources. Based on a lightweight open-source large language model (LLM), MoBLLM employs Parameter-Efficient Fine-Tuning (PEFT) techniques to create a cost-effective training pipeline, avoiding the need for large-scale GPU clusters while maintaining strong performance. We conduct extensive experiments on six real-world mobility datasets to evaluate its accuracy, robustness, and transferability across varying temporal scales (years), spatial contexts (cities), and situational conditions (e.g., disruptions and interventions). MoBLLM achieves the best F1 score and accuracy across all datasets compared with state-of-the-art deep learning models and shows better transferability and cost efficiency than commercial LLMs. Further experiments reveal its robustness under network changes, policy interventions, special events, and incidents. These results indicate that MoBLLM provides a generalizable modeling foundation for individual mobility behavior, enabling more reliable and adaptive personalized information services for transportation management.
Human mobility generation plays a critical role in urban transportation planning. Existing human mobility generation models often fall short of understanding travelers' demographics and integrating multimodal information, including activity purposes, destination choices and transport mode preferences. Recently, mobility generation models leveraging Large Language Models (LLMs) have gained significant attention, while they are limited in directly reproducing spatial information in human mobility profiles. To address these challenges, this paper proposes the Mobility Generative Language Model (MobGLM), a novel approach for generating synthetic human mobility data to support urban planning, transport management, energy consumption and epidemic control. MobGLM addresses these limitations by capturing the complex relationships between agents' mobility patterns and individual demographics. By incorporating personal information, activity types, locations and traffic modes as encoders, MobGLM uniquely identifies and replicates features of human mobility. Our framework is evaluated using a large, real-world mobility dataset and benchmarked against state-of-the-art personal mobility generation techniques. The results demonstrate the effectiveness of MobGLM in producing accurate and reliable synthetic mobility data, highlighting its potential applications in various urban mobility contexts.
Understanding and classifying mobility modes, such as walking, cycling, driving, or public transport, is essential for sustainable urban planning and mobility behavior analysis. Traditional approaches rely on handcrafted features and machine learning models trained on GPS trajectory data. However, these methods require extensive data preparation and model training. In this work, we explore the potential of large language models (LLMs) as zero-shot predictors for transportation mode classification, eliminating the need for training data altogether. We propose a pipeline that transforms enriched trajectory segments into textual prompts, enabling LLMs to perform classification without task-specific pretraining. We benchmark the performance of a locally distilled 32B parameter LLM (DeepSeek Gwen) against standard machine learning baselines on the Geolife dataset. Preliminary results demonstrate that LLMs effectively capture semantic and contextual cues from trajectory-derived features, highlighting their promise for rapid, data-efficient transportation mode classification. Our work provides novel insights into leveraging LLMs in the mobility domain and identifies future opportunities for their integration.
The development of urban air mobility (UAM) systems requires scalable, regulation-aware planning of low-altitude airspace and supporting infrastructure. This study proposes an end-to-end framework for the design, simulation, and iterative optimization of a structured UAM corridor over Brasilia’s central road axis (Eixão-UAM), aligned with the Brazilian unmanned aircraft traffic management (BR-UTM) ecosystem. In addition, this study proposes a multilayered aerial configuration stratified by unmanned aerial vehicle class, supported by a modular ground infrastructure composed of vertihubs, vertiports, and vertistops. A takeoff-scheduling simulator is developed to evaluate platform allocation strategies under realistic traffic and weather conditions. Initial experiments compare a round-robin (RR) baseline with a genetic algorithm (GA), and the results reveal that RR outperforms GA v1 in terms of average waiting time. To address this gap, a large language model (LLM) assisted optimization loop is implemented using GPT-4o Mini and Gemini 2.5 Pro. The LLMs act as reasoning partners, supporting root-cause diagnoses, fitness function redesign, and rapid prototyping of five GA variants. Among these, GA v5 achieves a 59.62% reduction in maximum waiting time and an approximately 10% reduction in average waiting time over GA v1, thereby approaching the robustness of RR. In contrast, GA v2–v4 and GA v6 perform less consistently, underscoring the importance of fitness function design. These results highlight the role of iterative, LLM-guided development in enhancing classical optimization, demonstrating that generative artificial intelligence (AI) can contribute to simulation acceleration and the cocreation of operational logic. The proposed method provides a replicable blueprint for integrating LLMs into early-stage UAM planning, offering both theoretical insights and architectural guidance for future low-altitude airspace systems.
Urban Air Mobility (UAM) is an emerging System of Systems (SoS) that faces challenges in system architecture, planning, task management, and execution. Traditional architectural approaches struggle with scalability, adaptability, and seamless resource integration within dynamic and complex environments. This paper presents an intelligent holonic architecture that incorporates Large Language Models (LLMs) to manage the complexities of UAM. Holons function semi-autonomously, allowing for real-time coordination among air taxis, ground transport, and vertiports. LLMs process natural language inputs, generate adaptive plans, and manage disruptions such as weather changes or airspace closures. Through a case study of multimodal transportation with electric scooters and air taxis, we demonstrate how this architecture enables dynamic resource allocation, real-time replanning, and autonomous adaptation without centralized control, creating more resilient and efficient urban transportation networks. By advancing decentralized control and AI-driven adaptability, this work lays the groundwork for resilient, human-centric UAM ecosystems, with future efforts targeting hybrid AI integration and real-world validation.
The vigorous development of urban air mobility (UAM) is reshaping the urban travel landscape, but it also poses severe challenges to the safe and efficient operation of dense and complex airspace. Potential conflicts between flight plans have become a core bottleneck restricting its development. Traditional flight plan adjustment and management methods often rely on deterministic trajectory predictions, ignoring the inherent temporal uncertainties in actual operations, which may lead to the underestimation of potential risks. Meanwhile, existing global optimization strategies often face issues of inefficiency and overly broad adjustment scopes when dealing with large-scale plan conflicts. To address these challenges, this study proposes an innovative flight plan conflict management framework. First, by introducing a probabilistic model of flight time errors, a new conflict detection mechanism based on confidence intervals is constructed, significantly enhancing the ability to foresee non-obvious conflict risks. Furthermore, based on complex network theory, the framework accurately identifies a small number of “critical flight plans” that play a core role in the conflict network, revealing their key impact on chain reactions of conflicts. On this basis, a phased optimization strategy is adopted, prioritizing the adjustment of spatiotemporal parameters (departure time and speed) for these critical plans to systematically resolve most conflicts. Subsequently, only fine-tuning the speeds of non-critical plans is required to address remaining local conflicts, thereby minimizing interference with the overall operational order. Simulation results demonstrate that this framework not only significantly improves the comprehensiveness of conflict detection but also effectively reduces the total number of conflicts. Additionally, the proposed phased artificial lemming algorithm (ALA) outperforms traditional optimization algorithms in terms of solution quality. 
This work provides an important theoretical foundation and a practically valuable solution for developing robust and efficient UAM dynamic scheduling systems, holding promise to support the safe and orderly operation of large-scale urban air traffic in the future.
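The confidence-interval detection idea can be sketched for a single shared waypoint; the separation requirement, z-value, and arrival times below are illustrative assumptions, not the paper's parameters:

```python
# Conflict detection with arrival-time uncertainty at one shared waypoint:
# each flight plan's arrival time is modeled as Gaussian, and instead of
# comparing point estimates we compare z-score confidence windows widened
# by a required temporal separation.
def ci_conflict(t1, s1, t2, s2, sep=30.0, z=1.96):
    """True if the ~95% arrival windows, padded by `sep` seconds, overlap."""
    lo1, hi1 = t1 - z * s1, t1 + z * s1
    lo2, hi2 = t2 - z * s2, t2 + z * s2
    return min(hi1, hi2) + sep >= max(lo1, lo2)

# A pair that a deterministic check (|t1 - t2| = 60 s > sep) would clear,
# but that the interval check flags once timing uncertainty is included:
flagged = ci_conflict(0.0, 10.0, 60.0, 10.0)      # True
cleared = ci_conflict(0.0, 10.0, 200.0, 10.0)     # False
```

This is the sense in which interval-based detection "foresees non-obvious conflict risks": point estimates that look safely separated can still have overlapping uncertainty windows.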
Human mobility modeling is critical for urban planning and transportation management, yet existing approaches often lack the integration capabilities needed to handle diverse data sources. We present a foundation model framework for universal human mobility patterns that leverages cross-domain data fusion and large language models to address these limitations. Our approach integrates multi-modal data of distinct nature and spatio-temporal resolution, including geographical, mobility, socio-demographic, and traffic information, to construct a privacy-preserving and semantically enriched human travel trajectory dataset. Our framework demonstrates adaptability through domain transfer techniques that ensure transferability across diverse urban contexts, as evidenced in case studies of Los Angeles (LA) and Egypt. The framework employs LLMs for semantic enrichment of trajectory data, enabling comprehensive understanding of mobility patterns. Quantitative evaluation shows that our generated synthetic dataset accurately reproduces mobility patterns observed in empirical data. The practical utility of this foundation model approach is demonstrated through large-scale traffic simulations for LA County, where results align well with observed traffic data. On California's I-405 corridor, the simulation yields a Mean Absolute Percentage Error of 5.85% for traffic volume and 4.36% for speed compared to Caltrans PeMS observations, illustrating the framework's potential for intelligent transportation systems and urban mobility applications.
The application of artificial intelligence (AI) to dynamic mobility management can support the achievement of efficiency and sustainability goals. AI can help to model alternative mobility system scenarios in real time (by processing big data from heterogeneous sources in a very short time), to identify network and service configurations by comparing phenomena in similar contexts, and to support the implementation of demand-management measures that achieve sustainability goals. This paper provides an in-depth analysis of scenarios, with an IT (Information Technology) framework based on emerging technologies and AI to support sustainable and cooperative digital mobility. The functional architecture of an AI-based mobility control centre is then defined, and the process that has been implemented in a medium-large city is presented.
In the context of smart city development and public safety management, real-time and accurate dynamic perception of pedestrians in complex urban scenarios, together with reliable forecasting of future mobility trends, is of paramount importance. This paper proposes a unified model termed the Cross-modal Spatio-Temporal Graph Attention Network (CST-GAN). Designed as an end-to-end deep learning framework, the model jointly addresses two tasks: real-time pedestrian detection and spatio-temporal flow forecasting. The proposed framework provides an integrated solution for dynamic pedestrian perception in urban environments, demonstrating the significant potential of deeply fusing real-time visual streams with historical time-series data to enhance predictive accuracy. This study not only advances theoretical understanding but also offers practical guidance for domains such as intelligent traffic control, large-scale event security, and public space planning.
Predicting the density and flow of crowds or traffic at a citywide level becomes possible by using big data and cutting-edge AI technologies. It has been a very significant research topic with high social impact, which can be widely applied to emergency management, traffic regulation, and urban planning. In particular, by meshing a large urban area into a number of fine-grained mesh-grids, citywide crowd and traffic information in a continuous time period can be represented with a 4D tensor (Timestep, Height, Width, Channel). Based on this idea, a series of methods have been proposed to address grid-based prediction for citywide crowds and traffic. In this study, we revisit the density and in-out flow prediction problem and publish a new aggregated human mobility dataset generated from a real-world smartphone application. Compared with existing ones, our dataset holds several advantages, including a large number of mesh-grids, fine-grained mesh size, and a high user sample rate. For this large-scale crowd dataset, we propose a novel deep learning model called DeepCrowd, designing pyramid architectures and a high-dimensional attention mechanism based on Convolutional LSTM. Lastly, thorough and comprehensive performance evaluations are conducted to demonstrate the superiority of the proposed DeepCrowd compared to multiple state-of-the-art methods.
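The 4D (Timestep, Height, Width, Channel) encoding described above can be sketched directly; the grid size and toy records below are illustrative, with two channels for in-flow and out-flow:

```python
import numpy as np

# Build a (Timestep, Height, Width, Channel) tensor from aggregated mobility
# records.  Channel 0 counts in-flow events, channel 1 out-flow events.
T, H, W = 3, 4, 4
tensor = np.zeros((T, H, W, 2), dtype=np.int32)

# Each toy record: (timestep, row, col, is_inflow).
records = [(0, 1, 1, True), (0, 1, 1, False), (1, 2, 3, True), (2, 0, 0, False)]
for t, r, c, inflow in records:
    tensor[t, r, c, 0 if inflow else 1] += 1

# Grid cell (1, 1) at timestep 0 saw one arrival and one departure.
```

Once mobility data is in this form, each timestep is an image-like frame, which is what lets Convolutional LSTM-style models treat citywide prediction as spatio-temporal video forecasting.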
This paper proposes a novel real-time traffic congestion prediction framework based on dynamic risk field modeling and multi-source data fusion, tailored for complex urban road networks. By integrating physically interpretable risk field theory with data-driven deep learning methods, we develop a hybrid system that quantifies spatiotemporal congestion risk in a transparent and computationally efficient manner. The proposed model introduces a dynamic field superposition structure driven by Gaussian kernel functions, enabling intuitive three-dimensional visualization of congestion propagation. To meet the demands of real-time urban mobility management, we design a spatial parallel processing architecture that significantly improves computational speed while maintaining high prediction accuracy. Empirical validation using large-scale real-world traffic data demonstrates that our model improves prediction accuracy by 26.2% over standard LSTM methods while reducing CPU usage by 75%. Moreover, the proposed framework supports a dual-driven decision paradigm by linking micro-level driver behavior with macro-level congestion evolution, offering interpretable insights for smart traffic control policies such as dynamic lane assignment and variable speed limits. This study contributes to the development of robust and scalable solutions for intelligent transportation systems and sustainable urban mobility.
Understanding human mobility benefits numerous applications such as urban planning, traffic control, and city management. Previous work mainly focuses on modeling spatial and temporal patterns of human mobility. However, the semantics of trajectories are ignored, thus failing to model people's motivation behind mobility. In this paper, we propose a novel semantics-aware mobility model that captures human mobility motivation using large-scale semantic-rich spatial-temporal data from location-based social networks. In our system, we first develop a multimodal embedding method to project user, location, time, and activity on the same embedding space in an unsupervised way while preserving original trajectory semantics. Then, we use a hidden Markov model to learn latent states and the transitions between them in the embedding space, where each observation is a location embedding vector, so as to jointly consider spatial, temporal, and user motivations. In order to tackle the sparsity of individual mobility data, we further propose a von Mises-Fisher mixture clustering for user grouping so as to learn a reliable and fine-grained model for groups of users sharing mobility similarity. We evaluate our proposed method on two large-scale real-world datasets, where we validate the ability of our method to produce high-quality mobility models. We also conduct extensive experiments on the specific task of location prediction. The results show that our model outperforms state-of-the-art mobility models with higher prediction accuracy and much higher efficiency.
Smart cities increasingly rely on data-driven intelligence to support public-service allocation, transport planning, and urban management. Traditional accessibility evaluation methods based on static GIS buffers or manual surveys fail to meet the real-time, predictive, and large-scale requirements of modern smart-city systems. This paper proposes a novel real-time urban accessibility prediction and facility optimization framework that integrates multi-source sensing data, spatio-temporal graph neural networks, and deep reinforcement learning. First, we construct a dynamic urban mobility graph by fusing IoT pedestrian sensors, mobile trajectory data, road-topology data, and transit network feeds. Second, an ST-GNN model is developed to predict future 5-30 minute walking accessibility distributions with high temporal resolution. Third, a DRL-based facility optimization module is introduced to recommend optimal locations for parks, commercial facilities, and transit stops, maximizing coverage within a ten-minute accessibility threshold. Experiments on real-world data from Songdo International City demonstrate that the proposed ST-GNN achieves 18.6% lower MAE and 21.3% lower RMSE than state-of-the-art baselines, while the DRL optimizer improves ten-minute life-circle coverage by up to 42.5%. This work provides a generalizable algorithmic foundation for intelligent urban accessibility management and smart-city service planning.
Human mobility research has significantly benefited from recent advances in machine learning, as have numerous other industries. Aided by the ever-increasing availability of geospatial and mobility data, machine learning models have enabled large-scale systems for simulating city-wide macro and micro mobility behaviors, urban planning, transportation management, and disaster relief optimization. However, while many fields have invested significant effort in solving the model transferability and generalization problem, the inability of machine learning-based human mobility models to generalize to new locations has come to be implicitly accepted in most geospatial research. In this vision paper, we focus on this geospatial generalization problem, its root causes, and how it is restricting the applications of otherwise-promising research. Most importantly, we argue for several data- and modeling-driven innovations which could help remedy this problem, spanning mega-scale simulations, large foundation models, and multi-task, transfer, and meta-learning. We also spotlight a handful of promising ideas which have recently emerged from the community. We hope that these proposals take root and help develop more capable, flexible, and generalizable models in research and industry.
Against the background of rapid urbanization and continuous growth of vehicle numbers, traffic congestion has become increasingly prominent, causing serious negative impacts on urban life and economic development. Traditional traffic signal control methods struggle to adapt to dynamic and complex traffic flows. Although reinforcement learning has brought new opportunities to this field, it faces challenges such as high computational costs, large data requirements, and slow convergence rates. This paper focuses on the Multi-Armed Bandit (MAB) model algorithm, using the Simulation of Urban Mobility (SUMO) traffic simulation software to build a lane model highly similar to reality. Different signal timing plans are set up as "arms" in the MAB model, with vehicle waiting time as "reward". The performance of algorithms such as Explore Then Commit (ETC), Upper Confidence Bound (UCB), Asymptotically Optimal Upper Confidence Bound (asUCB), and Thompson Sampling (TS) in intelligent traffic signal control scenarios is compared. The study finds that the TS algorithm performs best in reducing cumulative regret and vehicle waiting time, providing an effective reference for optimizing actual traffic signal control strategies. The characteristics of other algorithms also provide directions for subsequent algorithm improvements, contributing to the development of intelligent traffic signal control technology.
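The "signal plans as arms" setup can be sketched with a minimal Thompson Sampling loop. Here the reward is assumed Bernoulli (1 if the sampled waiting time beats a threshold), and the per-plan success rates are made up for illustration:

```python
import random

random.seed(0)

# Hypothetical success rates for three signal timing plans ("arms"):
# probability that a cycle's vehicle waiting time beats a threshold.
TRUE_RATES = [0.3, 0.5, 0.7]
alpha = [1.0, 1.0, 1.0]   # Beta posterior parameters per arm
beta = [1.0, 1.0, 1.0]
pulls = [0, 0, 0]

for _ in range(2000):
    # Thompson Sampling: sample each arm's posterior, play the best sample.
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = max(range(3), key=lambda i: samples[i])
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    alpha[arm] += reward        # Beta-Bernoulli posterior update
    beta[arm] += 1 - reward
    pulls[arm] += 1

best = max(range(3), key=lambda i: pulls[i])   # plan played most often
```

ETC, UCB, and asUCB from the abstract differ only in how `arm` is selected each round; the SUMO simulator would replace the synthetic `TRUE_RATES` draw as the reward source.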
No abstract available
Intelligent Transportation Systems (ITSs) play a vital role in improving urban and regional mobility by reducing traffic congestion and enhancing trip planning. A key element of ITS is travel-time prediction, which supports informed decisions for both travelers and traffic management. While non-parametric models offer flexibility, they often require large datasets and significant computation. Parametric models, though easier to fit and interpret, are less adaptable. Fuzzy logic models, by contrast, provide robustness and scalability, adjusting to new data and changing conditions. This paper proposes a cascaded fuzzy logic system for highway travel-time prediction, using the Greenshields model as its reasoning foundation. The system consists of multiple fuzzy subsystems, each representing a highway segment. These subsystems transform traffic flow and density inputs into speed predictions through fuzzification, Greenshields-based rules, and defuzzification. The approach enables localized and segment-specific predictions, enhancing route planning and congestion avoidance. The system’s accuracy is evaluated by comparing its predictions with those of a regression model using real traffic data from the Sun Yat-Sen Highway in Taiwan. Simulation results confirm that the proposed model achieves reliable, adaptable travel-time forecasts, including for long-distance trips.
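Greenshields' model, the stated reasoning foundation, links speed linearly to density and can be cascaded per segment into a trip-level travel time. The free-flow speed, jam density, and segment data below are illustrative assumptions, not values from the paper:

```python
def greenshields_speed(k, v_f=100.0, k_j=120.0):
    """Greenshields: v = v_f * (1 - k / k_j), with assumed free-flow speed
    v_f (km/h) and jam density k_j (veh/km)."""
    return max(v_f * (1.0 - k / k_j), 1e-6)   # floor avoids division by zero

def segment_travel_time(length_km, k):
    return length_km / greenshields_speed(k)   # hours

# Cascade segment-level predictions into a trip-level travel time.
segments = [(5.0, 30.0), (8.0, 60.0), (3.0, 90.0)]   # (length_km, density)
trip_hours = sum(segment_travel_time(l, k) for l, k in segments)
```

The paper's fuzzy subsystems effectively soften this crisp mapping: fuzzified flow and density inputs pass through Greenshields-based rules before defuzzification yields a segment speed.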
Many efforts have focused on network-wide traffic signal optimization to deal with congestion in big cities. Nevertheless, research evidence illustrates that both improper traffic network management and excessive traffic demand are the key factors leading to oversaturated traffic conditions. Current studies encounter a bottleneck in addressing this multi-objective optimization problem, which calls for a hierarchical control framework. In this paper, we propose a two-level hierarchical model-based predictive control scheme to improve mobility in heterogeneous large-scale urban traffic networks and thereby mitigate traffic jams. On the basis of a network partition, a regional demand management approach regulating the input traffic flow from adjacent regions is proposed for multi-subnetwork management, taking advantage of the concept of a macroscopic fundamental diagram of urban traffic networks. This serves as the higher-level control layer and can be integrated with other strategies. The lower-level control layer coordinates the traffic signals within the subnetworks based on a detailed link-level traffic model, keeping the allocation of vehicles in each subnetwork as homogeneous as possible. The simulation results show that integrating regional demand management with local traffic-responsive control in a hierarchical framework can significantly improve whole-network performance under different traffic scenarios in comparison with other available control strategies.
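The higher-level idea can be caricatured as a perimeter controller that meters inflow so a region's vehicle accumulation stays near the critical point of its macroscopic fundamental diagram. The gain, critical accumulation, and exit-flow proxy below are assumptions, not the paper's controller:

```python
def perimeter_inflow(n, n_crit=400.0, q_max=20.0, gain=0.05):
    """Allowed inflow (veh/step) from adjacent regions, metered
    proportionally to the gap between accumulation n and n_crit."""
    u = gain * (n_crit - n)            # proportional metering
    return min(max(u, 0.0), q_max)     # saturate to a feasible inflow

# Simulate one oversaturated region; exit flow is a crude MFD proxy.
n = 600.0                              # start above the critical accumulation
for _ in range(50):
    n = n + perimeter_inflow(n) - 0.03 * n
```

Under these toy dynamics the accumulation decays toward an equilibrium below the critical value, which is the qualitative behavior perimeter control aims for; the paper's lower layer would then coordinate signals inside the region.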
The growth of vehicle mobility in the past decades and increased traffic complexity lead to a need for traffic management systems, especially in large-scale urban traffic networks. Erroneous data is a common problem affecting such systems, which also rely on traffic prediction, particularly for traffic signal control and route guidance. This paper investigates probabilistic principal component analysis (PPCA) methods to impute missing traffic count data and predict future data, and examines the significance of the resulting principal components for urban traffic analysis. These methods are applied to traffic count data from vehicle detectors in the urban network of Surabaya city, Indonesia. The results show that the PPCA-based data imputation method is able to impute missing data with imputation error under 20% WMAPE. The principal component analysis demonstrates that the 1st principal component scores can be seen as a fundamental temporal pattern of the Surabaya urban network, while link characteristics can be derived from the 1st principal component coefficients. We also demonstrate that the 1st principal component coefficient of a link can detect outliers or anomalies such as detector malfunction and unique temporal patterns. PPCA can also be used to predict future data based on observed data, but experiments show that even though the majority of links can be predicted accurately, some links have large errors, which may be caused by temporal patterns that differ between future and observed data.
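A simplified stand-in for PPCA imputation is iterative low-rank SVD filling (hard-impute), rather than the paper's probabilistic EM formulation; the synthetic rank-1 "count" data below is purely illustrative:

```python
import numpy as np

def svd_impute(X, mask, rank=1, iters=50):
    """X: matrix with NaNs where mask is False; iteratively refill the
    missing entries from a rank-`rank` SVD reconstruction."""
    filled = np.where(mask, X, np.nanmean(X))           # initialize with mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank model
        filled = np.where(mask, X, approx)              # keep observed values
    return filled

rng = np.random.default_rng(0)
true = np.outer(rng.uniform(1, 2, 20), rng.uniform(1, 2, 8))  # rank-1 "counts"
X = true.copy()
mask = rng.random(X.shape) > 0.2        # ~20% of entries go missing
X[~mask] = np.nan
imputed = svd_impute(X, mask)
wmape = np.abs(imputed[~mask] - true[~mask]).sum() / true[~mask].sum()
```

For structured detector data like the paper's, the leading component plays the role the abstract describes: its scores trace the shared temporal pattern, while its coefficients characterize individual links.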
The rise of 6G-enabled vehicular metaverses is transforming the automotive industry by integrating immersive, real-time vehicular services through ultra-low latency and high bandwidth connectivity. In 6G-enabled vehicular metaverses, vehicles are represented by vehicle twins (VTs), which serve as digital replicas of physical vehicles to support real-time vehicular applications such as large artificial intelligence (AI) model-based augmented reality (AR) navigation, called VT tasks. VT tasks are resource-intensive and need to be offloaded to ground base stations (BSs) for fast processing. However, high demand for VT tasks and limited resources of ground BSs pose significant resource allocation challenges, particularly in densely populated urban areas like intersections. As a promising solution, unmanned aerial vehicles (UAVs) act as aerial edge servers to dynamically assist ground BSs in handling VT tasks, relieving resource pressure on ground BSs. However, due to the high mobility of UAVs, information asymmetry exists between UAVs and ground BSs regarding VT task demands, resulting in inefficient resource allocation. To address these challenges, we propose a learning-based modified second-bid (MSB) auction mechanism to optimize resource allocation between ground BSs and UAVs by accounting for VT task latency and accuracy. Moreover, we design a diffusion-based reinforcement learning algorithm to optimize the price scaling factor, maximizing the total surplus of resource providers and minimizing VT task latency. Finally, simulation results demonstrate that the proposed diffusion-based MSB auction outperforms traditional baselines, providing better resource distribution and enhanced service quality for vehicular users.
Traffic congestion in modern cities is exacerbated by the limitations of traditional fixed-time traffic signal systems, which fail to adapt to dynamic traffic patterns. Adaptive Traffic Signal Control (ATSC) algorithms have emerged as a solution by dynamically adjusting signal timing based on real-time traffic conditions. However, the main limitation of such methods is that they are not transferable to environments under real-world constraints, such as balancing efficiency, minimizing collisions, and ensuring fairness across intersections. In this paper, we view the ATSC problem as a constrained multi-agent reinforcement learning (MARL) problem and propose a novel algorithm named Multi-Agent Proximal Policy Optimization with Lagrange Cost Estimator (MAPPO-LCE) to produce effective traffic signal control policies. Our approach integrates the Lagrange multipliers method to balance rewards and constraints, with a cost estimator for stable adjustment. We also introduce three novel constraints on the traffic network: GreenTime, GreenSkip, and PhaseSkip, which penalize traffic policies that do not conform to real-world scenarios. Our experimental results on three real-world datasets demonstrate that MAPPO-LCE outperforms three baseline MARL algorithms across all environments and traffic constraints (improving on MAPPO by 12.60%, IPPO by 10.29%, and QTRAN by 13.10%). Our results show that constrained MARL is a valuable tool for traffic planners to deploy scalable and efficient ATSC methods in real-world traffic networks.
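The Lagrange-multiplier mechanism behind this class of constrained MARL methods can be sketched as a projected dual-ascent update; the cost trace, limit, and learning rate below are hypothetical, and this is not the authors' MAPPO-LCE code:

```python
def dual_ascent(lmbda, avg_cost, cost_limit, lr=0.1):
    """Projected dual-ascent step: grow lambda when the constraint
    (e.g. a GreenTime-style cost) is violated, shrink it otherwise."""
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))

def lagrangian_objective(reward, cost, lmbda):
    return reward - lmbda * cost   # what the constrained policy maximizes

lmbda = 0.0
for avg_cost in [1.5, 1.4, 1.2, 1.0, 0.9, 0.8]:   # hypothetical cost trace
    lmbda = dual_ascent(lmbda, avg_cost, cost_limit=1.0)
```

The cost estimator the abstract mentions would supply `avg_cost` from learned value estimates rather than raw rollouts, stabilizing this update.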
Lifelong Multi-Agent Path Finding (LMAPF) focuses on planning conflict-free paths for agents, like autonomous vehicles, that are continuously assigned new tasks. The synergy of search-based and learning-based methods holds promise for striking a balance in-between effectiveness and efficiency but still faces several challenges such as inferior initial paths, weak search-learning synergy and low sample utilization rate. To address these issues, this paper proposes a new synergized LMAPF approach, named Synergistic Multi-Agent Path Optimization (SMAPO), which consists of two tightly-coupled phases: Primordial Planning and Decision Refinement. In the Primordial Planning phase, we introduce a novel load-balanced A* algorithm that integrates planned and perceived congestion costs, which enhances initial solution quality by evenly distributing spatiotemporal traffic loads, thereby mitigating potential conflicts. In the Decision Refinement phase, we propose a novel Encoder-Decoder based neural network to learn a collaborative optimization policy through multi-agent reinforcement learning. In addition, we leverage dual transformations to augment trajectory samples during online learning, enhancing both the sample utilization rate and overall learning stability. Extensive experiments reveal that our SMAPO is superior to the state-of-the-art baselines in effectiveness, efficiency, and generalization capability. Source code is available at https://github.com/ByteUser-blues/SMAPO.
Urban traffic congestion presents a major challenge in rapidly developing smart cities, where traditional roadside sensors and fixed camera systems lack the spatial flexibility needed for continuous observation of fast-evolving high-traffic zones. This paper proposes a deep reinforcement learning (DRL)–based autonomous UAV surveillance framework designed to provide persistent, hotspot-focused monitoring of urban congestion. A two-dimensional traffic-density model based on Kuwait’s road network was developed, and a Proximal Policy Optimization (PPO) agent was trained using a reward structure explicitly crafted to maximize dwell time within congestion clusters. The agent demonstrated strong convergence, achieving a mission-level Hotspot Dwell Ratio of approximately 0.80, meaning that the UAV remained within high-density traffic regions for nearly 80% of its operational duration. These findings highlight the potential of DRL-driven UAV systems to deliver adaptive, real-time hotspot surveillance and represent a significant step toward next-generation intelligent traffic-monitoring solutions for smart-city environments.
Intelligent Route Optimization (IRO) addresses autonomous navigation in dynamic urban environments by combining real-time machine learning, probabilistic modeling, and adaptive simulation into a seamless routing framework. Unlike traditional approaches that depend on static maps or simple shortest-path algorithms, IRO actively integrates live contextual data, such as time, weather, road state, and events, using neural networks to achieve highly accurate traffic state predictions with minimal input features. These predictions are further refined by Hidden Markov Models to ensure consistency and reliability, especially during abrupt transitions or anomalies in traffic. Building on this stabilized foundation, a Deep Reinforcement Learning agent equipped with a deep Q-network evaluates a broad action space, selecting routes that optimize not only for speed but also for risk, congestion, and environmental safety across various transport modes. For every potential path, IRO invokes adaptive Monte Carlo sampling to quantify travel time variability and the probability of events that could impact the route, thus assigning a reliability measure alongside the expected travel time and suggested route. With a modular pipeline that separates prediction, smoothing, policy selection, and uncertainty quantification, IRO is able to rapidly adapt to new environments, technologies, and data sources. It consistently outperforms classical shortest-path methods, as demonstrated by its 96.06% accuracy on a custom urban dataset, delivering more reliable results in the presence of outliers such as traffic surges, harsh weather, or road closures. The proposed method empowers autonomous systems to navigate urban settings with a high degree of resilience and transparency, providing interpretable recommendations that specify not just which route to take, but why, bridging rigorous AI modeling with practical, real-time urban mobility needs.
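The Monte Carlo reliability step can be approximated as below. The Gaussian travel-time distributions, deadline, and route parameters are assumptions for illustration, not IRO's calibrated models:

```python
import random

random.seed(42)

def route_reliability(mean, std, deadline, n=10000):
    """Sample travel times for one route; return (expected time,
    probability of missing the deadline) as a reliability measure."""
    samples = [max(random.gauss(mean, std), 0.0) for _ in range(n)]
    expected = sum(samples) / n
    p_late = sum(1 for t in samples if t > deadline) / n
    return expected, p_late

# Two hypothetical routes: fast but volatile vs. slower but predictable.
fast_risky = route_reliability(mean=20.0, std=8.0, deadline=30.0)
slow_safe = route_reliability(mean=24.0, std=2.0, deadline=30.0)
```

Reporting both numbers per candidate path is what lets a planner prefer the slower route when its lateness probability is an order of magnitude lower, which is the interpretability the abstract emphasizes.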
This study proposes a Multi-agent Fusion Double-Dueling-Deep Q-Network Traffic Flow (MF3DQN-TF) framework for Connected and Autonomous Vehicles (CAVs) in mountain tunnel entrance sections, considering nonlinear coupling effects. Complex road conditions and the nonlinear coupling effects of tunnel exit flow often destabilize traffic flow in these sections, impacting traffic efficiency and safety. The new framework, combining multi-agent deep reinforcement learning and attention mechanisms, has shown marked improvements over traditional rule-based regulation methods in various traffic scenarios through comparative experiments. Simulations indicate it can boost traffic flow stability by over 15%, vehicle efficiency by about 20%, and cut congestion time by 18%. Specifically, it enhances average vehicle speed by 25% and reduces the traffic congestion index by 22% compared to conventional methods. The attention mechanism improves the agents' decision-making efficiency, enabling real-time vehicle interaction and coordination optimization, thus enhancing the overall adaptability of intelligent transportation systems. The framework helps stabilize traffic flow and ease common traffic issues at mountain tunnel entrances, strongly supporting intelligent transportation system development.
No abstract available
An intelligent agent is an autonomous entity that directs its activity towards achieving goals, acting upon an environment using data obtained through a sensory mechanism. Intelligent agent software is a software system that performs tasks independently on behalf of a user in a networking environment, based on the user interface and past experience. By designing an intelligent sensing software program, we can regulate the flow of traffic in a transportation infrastructure network. Such an optimized design can mitigate inefficiencies like lost time, decreased safety of vehicles and pedestrians, heavy pollution, wasted fuel energy, and degraded quality of life. Ant Colony Optimization (ACO) has proven to be a very powerful model for combinatorial optimization problems. The algorithm aims to regulate heavy real-time traffic, enabling every vehicle in the network to move with increased efficiency and minimizing factors like time delay and traffic congestion.
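A toy ACO sketch in the spirit described: ants choose next hops with probability proportional to pheromone divided by delay, and pheromone is reinforced along shorter routes. The graph, delays, and parameters are hypothetical:

```python
import random

random.seed(1)

# Hypothetical road graph: edge values are travel delays (minutes).
GRAPH = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 2.0},
    "C": {"D": 2.0},
    "D": {},
}
pher = {(u, v): 1.0 for u in GRAPH for v in GRAPH[u]}   # pheromone per edge

def walk(start="A", goal="D"):
    """One ant: pick next hops with probability ~ pheromone / delay."""
    node, path, cost = start, [start], 0.0
    while node != goal:
        nbrs = GRAPH[node]
        weights = [pher[(node, v)] / d for v, d in nbrs.items()]
        node = random.choices(list(nbrs), weights=weights)[0]
        cost += nbrs[node]
        path.append(node)
    return path, cost

best_path, best_cost = None, float("inf")
for _ in range(200):                       # 200 ants
    path, cost = walk()
    for u, v in zip(path, path[1:]):
        pher[(u, v)] += 1.0 / cost         # shorter routes deposit more
    for e in pher:
        pher[e] *= 0.99                    # pheromone evaporation
    if cost < best_cost:
        best_path, best_cost = path, cost
```

On this tiny graph the colony converges on the minimum-delay route A→B→D; a traffic application would make the edge delays time-varying from sensor data.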
No abstract available
The behavior decision-making subsystem is a key component of the autonomous driving system, which reflects the decision-making ability of the vehicle and the driver, and is an important symbol of the high-level intelligence of the vehicle. However, the existing rule-based decision-making schemes are limited by the prior knowledge of designers, and it is difficult to cope with complex and changeable traffic scenarios. In this work, an advanced deep reinforcement learning model is adopted, which can autonomously learn and optimize driving strategies in a complex and changeable traffic environment by modeling the driving decision-making process as a reinforcement learning problem. Specifically, we used Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) for comparative experiments. DQN guides the agent to choose the best action by approximating the state-action value function, while PPO improves the decision-making quality by optimizing the policy function. We also introduce improvements in the design of the reward function to promote the robustness and adaptability of the model in real-world driving situations. Experimental results show that the decision-making strategy based on deep reinforcement learning has better performance than the traditional rule-based method in a variety of driving tasks.
Ensuring high-density air transportation systems of the future are both safe and efficient is a top priority. With the growing air traffic complexity in traditional and low-altitude airspace, an autonomous air traffic control system is needed to ensure safe-separation requirements. We propose a deep multi-agent reinforcement learning framework that is able to identify and resolve conflicts between aircraft in a high-density, stochastic, and dynamic en route sector with multiple intersections. The proposed framework utilizes an actor-critic model, A2C that incorporates the loss function from Proximal Policy Optimization (PPO) to help stabilize the learning process. In addition, we use a centralized learning, decentralized execution scheme where one neural network is learned and shared by all agents in the environment. We show that our framework is both scalable and efficient for large number of incoming aircraft to achieve extremely high traffic throughput. We evaluate our model via simulation in the BlueSky environment. Results show that our framework is able to resolve 99.97% of all conflicts both along route and at the intersections.
Autonomous intersection management (AIM) is gaining increasing attention due to its crucial role in ensuring safety and efficiency. Various methods have been proposed in the literature to address the AIM problem, including traffic light optimization, connected vehicle optimization based on vehicle-to-anything (V2X) communication technology, and multi-agent autonomous systems. However, each of these approaches has its own limitations, such as parking delays, communication latency, or the lack of guaranteed collision avoidance. This paper presents a novel approach to AIM using adaptive control barrier functions (aCBFs). The proposed aCBF first estimates the power transmission efficiency and incorporates it into the CBF design to ensure collision-free operation. Compared to existing methods, the aCBF approach offers several advantages. Firstly, it eliminates parking delays caused by traffic light systems. Secondly, it can be deployed in intersections with limited network coverage, unlike IoT-based solutions that rely heavily on connectivity. Thirdly, it ensures guaranteed collision-free agent movement at intersections. Specifically, our method guarantees that the minimum distance between agents remains no less than the safe distance at all times, significantly enhancing safety. Furthermore, compared to the TriPField algorithm, our approach achieves a 95% success rate in collision avoidance, demonstrating reliability in autonomous intersection management. The effectiveness of the proposed aCBF-based AIM algorithm has been validated through simulations and experiments with multiple autonomous agent-like robots.
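A one-dimensional sketch of the control-barrier-function idea (a simple kinematic CBF, not the paper's adaptive formulation): with barrier h = gap - d_safe, any commanded closing speed is clamped so that h' + alpha*h >= 0, which keeps the gap above the safe distance for all time:

```python
def cbf_filter(v_rel_nominal, gap, d_safe=5.0, alpha=1.0):
    """Clamp the commanded closing speed so the barrier h = gap - d_safe
    satisfies h' + alpha * h >= 0, i.e. v_rel <= alpha * (gap - d_safe)."""
    return min(v_rel_nominal, alpha * (gap - d_safe))

# One agent approaches a stopped agent; the gap shrinks by the filtered speed.
gap = 20.0
for _ in range(100):
    v = cbf_filter(3.0, gap)   # nominal behavior: close at 3 m/s
    gap -= v * 0.1             # 0.1 s integration step
```

The agent brakes smoothly as the barrier tightens and the gap never crosses the 5 m safe distance, mirroring the abstract's guarantee that the minimum inter-agent distance stays above the safety threshold.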
Abstract The Data Technology Triad, encompassing the Internet of Things, Blockchain technology, and Artificial Intelligence, has the potential to transform Integrated Autonomous Transportation Networks. In this study, the authors apply the Triadic Model and the Triple S holistic approach (which focuses on synthetic, systemic, and synergic perspectives) to create a model for optimizing freight systems. The authors use the Multi-Agent Transport Simulation (MATSim) platform to examine the performance of integrated (freight_i) and non-integrated (freight_n) freight systems under urban traffic conditions. The simulation consists of 80% commuter and 20% freight agents who travel in a terrestrial-only network. The results highlight the efficiency and adaptability of integrated systems. More significantly, they show that the synergy of Internet of Things data collection, Blockchain-enabled security, and AI-driven optimization can produce important gains in the number of kilometers traveled and reduced travel times. The findings also validate the triad’s potential to improve operational efficiency, security, and interoperability within urban transportation networks.
Due to their highly flexible deployment and agility features, unmanned aerial vehicles (UAVs) serving as aerial base stations are increasingly being used in challenging environments, including emergency communication, traffic offloading, and failures of existing communications infrastructure. A reliable and effective communication service requires 3D deployment and autonomous UAV trajectory optimization in each time slot. Most prior approaches focus on trajectory design that maximizes communication coverage or network throughput without considering fairness or computation time. This paper presents a multi-UAV trajectory control and fair communication (TCFC) scheme that maximizes ground user data rate and communication coverage while ensuring service fairness in a UAV-aided wireless communications system. The proposed TCFC scheme employs a two-stage learning approach. Firstly, it develops a gated recurrent unit-based link quality estimation model to assess each user’s link quality over time. Then, a federated multi-agent deep reinforcement learning (FedMADRL) algorithm is utilized to continuously adjust the trajectory of the UAVs, optimizing communication performance. We evaluated our proposed system using real channel measurement data, i.e., amplitude and phase signal information. The results show that the proposed TCFC scheme reduces computation time by 26.91% and provides comparable network performance with the baseline methods while improving DRL agents’ privacy.
Reinforcement learning (RL) has made significant advances in autonomous driving (AD). However, the stochastic nature of dynamic traffic scenarios and the diversity of road types make it challenging for autonomous vehicles to make safe and efficient decisions. To tackle these problems, this paper proposes a novel RL framework that incorporates a motion prediction model to enhance the agent’s decision-making capability. We first utilize a Transformer to model driving scenarios and capture interaction-aware relationships between the ego vehicle and the scenario, then design a safety constraint and integrate it into the Proximal Policy Optimization (PPO) algorithm so as to guarantee the safety and feasibility of the policy. To improve data efficiency and filter noisy samples, we construct a dual network whose components communicate with and guide each other. Experimental results show that compared with popular RL algorithms, our method demonstrates superior performance in success rate, completion time, safety, and data efficiency.
The need for vehicular networks with exceptional reliability and negligible communication delay, especially with ongoing 5G and upcoming 6G systems, has given rise to Cellular Vehicle-to-Everything (C-V2X) systems. This paper proposes a novel Priority-Aware Multi-Agent Deep Reinforcement Learning (PA-MADRL) framework to address issues such as high interference, dynamic allocation for critical signals, resource contention in dense traffic, dynamic fading, and the safety-critical message ratio. This approach aims to develop an effective and adaptive resource allocation mechanism that improves the likelihood of transmission of safety-critical signals and overall network performance in metropolitan areas. PA-MADRL allows autonomous vehicles to efficiently allocate resources despite varied congestion and interference conditions by combining centralized training for global optimization with decentralized execution for scalable decision-making. PA-MADRL is built on optimal global resource allocation during centralized training and intelligent instantaneous distribution in the decentralized setting. Tests carried out at different interference levels and traffic densities demonstrate that PA-MADRL improves throughput by 35%, lowers latency by 40%, and increases the Packet Delivery Ratio (PDR) by nearly 50%, especially as traffic density increases. These results show that PA-MADRL can adapt to dense vehicular networks with low communication overhead. Ultimately, the framework enhances safety in autonomous transportation by facilitating timely delivery of critical messages, thus contributing to sustainable and intelligent transportation systems as part of next-generation smart cities.
Urban route planning is a critical challenge for Autonomous Vehicles (AVs) due to the high variability of traffic conditions and the topological complexity of road networks. Traditional routing systems often rely on static maps or algorithms that do not account for the temporal evolution of traffic, resulting in suboptimal solutions in real-world scenarios. This work proposes a Deep Reinforcement Learning (DRL) based framework for optimal path generation in realistic urban environments, leveraging historical traffic data provided by TomTom APIs. The road network of the city of Bari is used as a case study, and traffic conditions are modeled across three distinct time slots to capture typical daily traffic fluctuations. The routing problem is formally modeled as a Markov Decision Process, where the agent interacts with a dynamic environment and learns a policy that minimizes travel time and avoids congestion. The agent is trained using the Proximal Policy Optimization algorithm, a state-of-the-art actor-critic method that ensures stable learning in complex environments. The simulation environment is built using Simulation of Urban MObility, with traffic flows generated coherently with the data collected from TomTom. Experimental results show that the agent trained with time-dependent traffic information outperforms static-data-based strategies, demonstrating the effectiveness of the proposed approach for intelligent and adaptive transportation systems.
This paper introduces a novel “CAV Trio” structure, where three connected autonomous vehicles (CAVs) are strategically scattered across adjacent lanes, serving as a key regulatory means for multi-lane mixed traffic flow. Based on the CAV Trio structure, an end-to-end hierarchical deep learning control framework is proposed to dynamically adjust the formation and speed of CAVs in the CAV Trio, further guiding surrounding vehicles and regulating lane-wise macro traffic flow. The control framework features a spatio-temporal perception module, constructed using a Gated Recurrent Unit (GRU) and Graph Attention Network (GAT), which accurately extracts traffic features. A centralized Proximal Policy Optimization (PPO) agent generates macro-level decisions, while a Model Predictive Control (MPC) layer filters actions to ensure safety and practicality. Additionally, a decision filtering mechanism and a reward coupling term are integrated to prevent the formation of traffic barriers and enhance policy tracking. Simulation results demonstrate that compared with baseline methods, the proposed approach can effectively achieve multilane macroscopic traffic regulation under different targets, and balance throughput enhancement and speed preservation, highlighting the structure's effectiveness and the framework's superiority in traffic regulation.
Unmanned aerial vehicles (UAVs) are becoming increasingly popular as mobile base stations due to their flexible deployment and low cost, particularly for emergency communications, traffic offloading, and scenarios in which terrestrial communications infrastructure fails. This paper presents an autonomous trajectory control method for multiple UAVs equipped with base stations for UAV-enabled wireless communications. The objective of this work is to address the optimization challenge of maximizing both communication coverage and network throughput for ground users. The proposed multi-aerial base station trajectory control (MATC) scheme employs a two-stage learning approach. First, we developed a long short-term memory-based link quality estimation model to assess each user's link quality over time. The trajectories of the aerial base stations are then continuously adjusted through a centralized multi-agent deep reinforcement learning algorithm to optimize communication performance. We evaluated the proposed system using real channel measurement data, i.e., amplitude and phase signal information. Notably, the proposed approach operates solely on received signals from users, without requiring knowledge of their specific locations. The proposed MATC strategy achieves 97.41% communication coverage while maintaining satisfactory system throughput. Numerical results demonstrate that the proposed method significantly enhances both communication coverage and network throughput in comparison to the baseline algorithms.
The rapid development of autonomous vehicles requires low-latency, high-throughput communication systems that are reliable enough to guarantee safety as well as efficient information transfer. Conventional vehicular communication approaches (DSRC and cellular 5G/6G networks) are limited by design: DSRC provides ultra-low latency for safety-related messages but only limited bandwidth, while 5G/6G offers high throughput at the expense of higher latency and reduced reliability in congested traffic conditions. This research proposes a solution to these issues: SafeLinkX, a hybrid DSRC-5G/6G vehicle-to-vehicle communication system that uses deep reinforcement learning (DRL) for dynamic channel selection. The DRL agent is trained on realistic vehicle motion traces from the SUMO and LuST datasets combined with NS-3/Veins network simulation. SafeLinkX prioritizes safety-related messages over DSRC while delivering high-throughput cooperative and infotainment data over 5G/6G connections. Performance measurements show that response time, packet delivery ratio, throughput, and decision accuracy are substantially better than those of DSRC-only, 5G/6G-only, and non-adaptive baselines. The findings establish SafeLinkX as a robust, flexible, and scalable solution for heterogeneous vehicular environments that ensures safety-first communication without sacrificing bandwidth. This work contributes a new DRL-based approach to hybrid V2X communication, bridging the needs of autonomous driving and next-generation networking technology.
Given the advancements in intelligent highway management systems and connected and autonomous vehicles, precise traffic flow control via highway gantries has become feasible. Bottleneck zones caused by construction or accidents, as a primary source of highway congestion and crashes, have emerged as a key focus in proactive traffic flow management. To address the issue of conflicting gantry decisions in the context of sparse gantries, multi-lane highways, and mixed traffic flows within bottleneck zones, a coordinated speed limit control model utilizing the gantry system is proposed. This model aims to enhance safety and efficiency while maintaining energy consumption levels. Building upon this coordinated control framework, an improved neural network-based Proximal Policy Optimization integrated-agent algorithm is introduced for centralized training and decision-making. The algorithm incorporates elite experience replay, enhanced exploration, entropy regularization, and gradient clipping to improve deep reinforcement learning performance and avoid local optima. Finally, simulation via SUMO and PyTorch demonstrates that the proposed control strategy improves safety metrics by 11.6%, increases space-mean speed by 6.7%, and achieves a 1.02% reduction in energy consumption compared to real-world control strategies. The effectiveness of the proposed coordinated speed limit control model and the integrated-agent training and decision-making framework is validated against traditional models and multi-agent distributed control approaches.
The integration of autonomous vehicles (AVs) into the existing transportation infrastructure offers a promising solution to alleviate congestion and enhance mobility. This research explores a novel approach to traffic optimization by employing a multi-agent rollout approach within a mixed autonomy environment. The study concentrates on coordinating the speed of human-driven vehicles by longitudinally controlling AVs, aiming to dynamically optimize traffic flow and alleviate congestion at highway bottlenecks in real-time. We model the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and propose an improved multi-agent rollout algorithm. By employing agent-by-agent policy iterations, our approach implicitly considers cooperation among multiple agents and seamlessly adapts to complex scenarios where the number of agents dynamically varies. Validated in a real-world network with varying AV penetration rates and traffic flow, the simulations demonstrate that the multi-agent rollout algorithm significantly enhances performance, reducing average travel time on bottleneck segments by 9.42% with a 10% AV penetration rate.
The rapid evolution of technology in connected automated and autonomous vehicles offers immense potential for revolutionizing future intelligent traffic control and management. This potential is exemplified by the diverse range of control paradigms, ranging from self-routing to centralized control. However, the selection among these paradigms is not merely a technical consideration but a delicate balance between autonomous decision-making and holistic system optimization. A pivotal quantitative parameter in navigating this balance is the “price of anarchy” (PoA) inherent in autonomous decision frameworks. This paper analyses the price of anarchy for road networks with CAV traffic. We model a traffic network as a routing game in which vehicles are selfish agents that choose routes autonomously to minimize travel delays caused by road congestion. Unlike existing research, in which the latency function of road congestion was based on polynomial functions such as the well-known BPR function, we focus on routing games where the latency of road traffic is specified by an exponential function. We first calculate a tight upper bound on the price of anarchy for this class of games and then compare this result with the tight upper bound on the PoA for routing games with the BPR latency function. The comparison shows that as long as the traffic volume is lower than the road capacity, the tight upper bound on the PoA of games with the exponential latency function is lower than the corresponding value with the BPR function. Finally, numerical results based on real-world traffic data demonstrate that the exponential function can approximate road latency as closely as the BPR function, with even tighter exponential parameters resulting in a relatively lower upper bound.
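The two latency families compared in this abstract can be sketched numerically. The BPR form below is the standard one; the exponential form and all parameter values are illustrative assumptions, since the paper's exact parametrization is not given here.

```python
import math

def bpr_latency(flow, t0=1.0, capacity=100.0, alpha=0.15, beta=4.0):
    """Standard BPR link latency: t0 * (1 + alpha * (v/c)^beta)."""
    return t0 * (1.0 + alpha * (flow / capacity) ** beta)

def exp_latency(flow, t0=1.0, capacity=100.0, a=0.15):
    """An illustrative exponential latency form: t0 * exp(a * v / c)."""
    return t0 * math.exp(a * flow / capacity)

# Compare the two forms as flow approaches capacity (v <= c).
for v in (0.0, 50.0, 100.0):
    print(v, round(bpr_latency(v), 4), round(exp_latency(v), 4))
```

With these toy parameters both curves start at the free-flow time `t0` and stay close to each other below capacity, which is the regime in which the abstract's PoA comparison applies.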
Accurate prediction of multiclass agent trajectories at signalized intersections is crucial for urban traffic management and autonomous driving systems. However, this task presents significant challenges due to the complex intersection layouts, heterogeneous agent interactions, and the influence of traffic signal control. To address these challenges, we propose the novel Knowledge-Informed Graph Convolutional Neural Network (KI-GCNN), which integrates a scene interaction graph, a class-aware semantic graph, and traffic signal encoding to enhance trajectory prediction accuracy. Additionally, we introduce an interaction optimization mechanism that dynamically prunes misleading interactions based on the positions and velocities of agents. Extensive experiments on the SIND dataset demonstrate that KI-GCNN achieves a minimum Average Displacement Error (mADE) of 0.43 and a minimum Final Displacement Error (mFDE) of 0.70, using a 12-frame observation and prediction window. When the prediction window is extended to 18 frames, the model achieves an mADE of 0.62 and an mFDE of 0.94. These results demonstrate the effectiveness of KI-GCNN in complex, signalized intersection scenarios, which represents a significant advancement in multiclass trajectory prediction.
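The displacement metrics reported above have a direct definition; this is a minimal sketch of plain ADE/FDE for a single trajectory. Note that the reported mADE/mFDE additionally take the minimum over multiple sampled future trajectories, which is omitted here.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average / Final Displacement Error for one trajectory.

    pred, gt: arrays of shape (T, 2) holding (x, y) positions per frame.
    ADE averages the per-frame Euclidean error; FDE is the error at the
    final predicted frame.
    """
    err = np.linalg.norm(pred - gt, axis=1)  # per-frame distance
    return err.mean(), err[-1]

# Toy example: prediction offset by a constant (0.3, 0.4) -> distance 0.5.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = gt + np.array([0.3, 0.4])
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # both ≈ 0.5
```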
As a popular research field, autonomous driving may offer great benefits for human society. To achieve them, current studies often apply machine learning methods such as reinforcement learning to enable an agent to interact and learn in a simulated environment. However, most simulators lack realistic traffic, which may cause a deficiency in realistic interaction. The present study adopted the SMARTS platform to create a simulator in which the trajectories of the vehicles in the NGSIM I-80 dataset were extracted as the background traffic. The built NGSIM simulator was used to train a model using the proximal policy optimization (PPO) method. An actor-critic neural network was applied, and the model takes as input 38 features that encode information about the host vehicle and the nearest surrounding vehicles in the current and adjacent lanes. A2C was selected as a comparative method. The results revealed that the PPO model outperformed the A2C model in the current task by collecting more rewards, traveling longer distances, and encountering fewer dangerous events during model training and testing. The PPO model achieved an 84% success rate in testing, which is comparable to related studies. The present study demonstrates that public driving datasets and reinforcement learning can provide a useful tool for achieving autonomous driving.
No abstract available
No abstract available
No abstract available
Connected and autonomous vehicle technology has advanced rapidly in recent years. These technologies create possibilities for advanced AI-based traffic management techniques. Developing such techniques is an important challenge and opportunity for the AI community, as it requires synergy between experts in game theory, multiagent systems, behavioral science, and flow optimization. This paper takes a step in this direction by considering traffic flow optimization through the setting and broadcasting of dynamic, adaptive tolls. Previous tolling schemes were either not adaptive in real time, not scalable to large networks, or did not optimize traffic flow over an entire network. Moreover, previous schemes made strong assumptions about observable demands, road capacities, and user homogeneity. This paper introduces △-tolling, a novel tolling scheme that is adaptive in real time and able to scale to large networks. We provide theoretical evidence showing that under certain assumptions △-tolling is equivalent to Marginal-Cost Tolling, which provably leads to system-optimal flow, and empirical evidence showing that △-tolling increases social welfare (by up to 33%) in two traffic simulators with markedly different modeling assumptions.
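The △-tolling scheme described above can be sketched as a per-link update: the toll is proportional to a smoothed estimate of the link's excess travel time over free flow. This follows the commonly cited two-parameter form (β scales the toll, R smooths the observed delay); the parameter values below are illustrative assumptions.

```python
# Minimal sketch of a per-link Δ-tolling update (illustrative parameters).

def delta_toll_update(delta_prev, observed_time, free_flow_time,
                      beta=4.0, R=0.1):
    """Smooth the excess travel time and return (new_delta, toll)."""
    excess = observed_time - free_flow_time     # current delay on the link
    delta = R * excess + (1.0 - R) * delta_prev  # exponential smoothing
    return delta, beta * delta

# Simulate congestion building up on a link with 10 s free-flow time.
delta, toll = 0.0, 0.0
for obs in (10.0, 12.0, 15.0):                  # observed travel times
    delta, toll = delta_toll_update(delta, obs, free_flow_time=10.0)
print(round(delta, 3), round(toll, 3))
```

The toll rises smoothly with sustained congestion and decays back toward zero when observed travel times return to free flow, which is what makes the scheme adaptive in real time without demand or capacity assumptions.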
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents. We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively. We integrate variational inference as special differentiable layers in the policy such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable. We evaluate our algorithm on several large-scale challenging tasks and demonstrate that it outperforms previous state-of-the-art methods.
Effective communication is pivotal for addressing complex collaborative tasks in multi-agent reinforcement learning (MARL). Yet, limited communication bandwidth and dynamic, intricate environmental topologies present significant challenges in identifying high-value communication partners. Agents must consequently select collaborators under uncertainty, lacking a priori knowledge of which partners can deliver task-critical information. To this end, we propose Interference-Aware K-Step Reachable Communication (IA-KRC), a novel framework that enhances cooperation via two core components: (1) a K-Step reachability protocol that confines message passing to physically accessible neighbors, and (2) an interference-prediction module that optimizes partner choice by minimizing interference while maximizing utility. Compared to existing methods, IA-KRC enables substantially more persistent and efficient cooperation despite environmental interference. Comprehensive evaluations confirm that IA-KRC achieves superior performance compared to state-of-the-art baselines, while demonstrating enhanced robustness and scalability in complex topological and highly dynamic multi-agent scenarios.
Multi-agent reinforcement learning systems aim to provide interacting agents with the ability to collaboratively learn and adapt to the behaviour of other agents. In many real-world applications, the agents can only acquire a partial view of the world. Here we consider a setting whereby most agents' observations are also extremely noisy, hence only weakly correlated to the true state of the environment. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case that an agent's policy can be made conditional upon all other agents' observations. To overcome these difficulties, we propose a multi-agent deep deterministic policy gradient algorithm enhanced by a communication medium (MADDPG-M), which implements a two-level, concurrent learning mechanism. An agent's policy depends on its own private observations as well as those explicitly shared by others through a communication medium. At any given point in time, an agent must decide whether its private observations are sufficiently informative to be shared with others. However, our environments provide no explicit feedback informing an agent whether a communication action is beneficial, rather the communication policies must also be learned through experience concurrently to the main policies. Our experimental results demonstrate that the algorithm performs well in six highly non-stationary environments of progressively higher complexity, and offers substantial performance gains compared to the baselines.
Team formation and the dynamics of team-based learning have drawn significant interest in the context of Multi-Agent Reinforcement Learning (MARL). However, existing studies primarily focus on unilateral groupings, predefined teams, or fixed-population settings, leaving the effects of algorithmic bilateral grouping choices in dynamic populations underexplored. To address this gap, we introduce a framework for learning two-sided team formation in dynamic multi-agent systems. Through this study, we gain insight into what algorithmic properties in bilateral team formation influence policy performance and generalization. We validate our approach using widely adopted multi-agent scenarios, demonstrating competitive performance and improved generalization in most scenarios.
Designing effective task sequences is crucial for curriculum reinforcement learning (CRL), where agents must gradually acquire skills by training on intermediate tasks. A key challenge in CRL is to identify tasks that promote exploration yet are similar enough to support effective transfer. While a recent approach suggests comparing tasks via their Structural Causal Models (SCMs), the method requires access to ground-truth causal structures, an unrealistic assumption in most RL settings. In this work, we propose Causal-Paced Deep Reinforcement Learning (CP-DRL), a curriculum learning framework that is aware of SCM differences between tasks, approximated from interaction data. This signal captures task novelty, which we combine with the agent's learnability, measured by reward gain, to form a unified objective. Empirically, CP-DRL outperforms existing curriculum methods on the Point Mass benchmark, achieving faster convergence and higher returns. CP-DRL demonstrates reduced variance with comparable final returns in the Bipedal Walker-Trivial setting, and achieves the highest average performance in the Infeasible variant. These results indicate that leveraging causal relationships between tasks can improve the structure-awareness and sample efficiency of curriculum reinforcement learning. We provide the full implementation of CP-DRL to facilitate the reproduction of our main results at https://github.com/Cho-Geonwoo/CP-DRL.
We introduce a new generative model for human planning under the Bayesian Inverse Reinforcement Learning (BIRL) framework which takes into account the fact that humans often plan using hierarchical strategies. We describe the Bayesian Inverse Hierarchical RL (BIHRL) algorithm for inferring the values of hierarchical planners, and use an illustrative toy model to show that BIHRL retains accuracy where standard BIRL fails. Furthermore, BIHRL is able to accurately predict the goals of 'Wikispeedia' game players, with inclusion of hierarchical structure in the model resulting in a large boost in accuracy. We show that BIHRL is able to significantly outperform BIRL even when we only have a weak prior on the hierarchical structure of the plans available to the agent, and discuss the significant challenges that remain for scaling up this framework to more realistic settings.
An often neglected issue in multi-agent reinforcement learning (MARL) is the potential presence of unreliable agents in the environment whose deviations from expected behavior can prevent a system from accomplishing its intended tasks. In particular, consensus is a fundamental underpinning problem of cooperative distributed multi-agent systems. Consensus requires different agents, situated in a decentralized communication network, to reach an agreement out of a set of initial proposals that they put forward. Learning-based agents should adopt a protocol that allows them to reach consensus despite having one or more unreliable agents in the system. This paper investigates the problem of unreliable agents in MARL, considering consensus as a case study. Echoing established results in the distributed systems literature, our experiments show that even a moderate fraction of such agents can greatly impact the ability of reaching consensus in a networked environment. We propose Reinforcement Learning-based Trusted Consensus (RLTC), a decentralized trust mechanism, in which agents can independently decide which neighbors to communicate with. We empirically demonstrate that our trust mechanism is able to handle unreliable agents effectively, as evidenced by higher consensus success rates.
While there has been significant progress in curriculum learning and continuous learning for training agents to generalize across a wide variety of environments in the context of single-agent reinforcement learning, it is unclear if these algorithms would still be valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by making it compete with a curriculum of increasingly skilled opponents. However, a general intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we aim to answer the question of what kind of cooperative teammate, and a curriculum of teammates should a learning agent be trained with to achieve these two objectives. Our results on the game Overcooked show that a pre-trained teammate who is less skilled is the best teammate for overall team reward but the worst for the learning of the agent. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.
Learning in multi-agent systems is highly challenging due to several factors including the non-stationarity introduced by agents' interactions and the combinatorial nature of their state and action spaces. In particular, we consider the Mean-Field Control (MFC) problem which assumes an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward. In many cases, solutions of an MFC problem are good approximations for large systems, hence, efficient learning for MFC is valuable for the analogous discrete agent setting with many agents. Specifically, we focus on the case of unknown system dynamics where the goal is to simultaneously optimize for the rewards and learn from experience. We propose an efficient model-based reinforcement learning algorithm, $M^3$-UCRL, that runs in episodes, balances between exploration and exploitation during policy learning, and provably solves this problem. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC, obtained via a novel mean-field type analysis. To learn the system's dynamics, $M^3$-UCRL can be instantiated with various statistical models, e.g., neural networks or Gaussian Processes. Moreover, we provide a practical parametrization of the core optimization problem that facilitates gradient-based optimization techniques when combined with differentiable dynamics approximation methods such as neural networks.
We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques. Unlike traditional algorithms, UDRL does not use reward prediction or search for an optimal policy. Instead, it trains agents to follow commands such as "obtain so much total reward in so much time." Many of its general principles are outlined in a companion report; the goal of this paper is to develop a practical learning algorithm and show that this conceptually simple perspective on agent training can produce a range of rewarding behaviors for multiple episodic environments. Experiments show that on some tasks UDRL's performance can be surprisingly competitive with, and even exceed that of some traditional baseline algorithms developed over decades of research. Based on these results, we suggest that alternative approaches to expected reward maximization have an important role to play in training useful autonomous agents.
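The command-conditioned idea behind UDRL can be sketched as a data-relabeling step: each logged step becomes a supervised example mapping (state, desired return, desired horizon) to the action actually taken. The episode slicing below is a simplification of the paper's procedure, shown only to make the "follow commands" framing concrete.

```python
# Simplified sketch of UDRL training-data construction from one episode.

def udrl_examples(states, actions, rewards):
    """Relabel an episode into command-conditioned supervised examples."""
    examples = []
    T = len(states)
    for t in range(T):
        desired_return = sum(rewards[t:])   # reward-to-go from step t
        desired_horizon = T - t             # steps remaining in episode
        examples.append(((states[t], desired_return, desired_horizon),
                         actions[t]))
    return examples

ep = udrl_examples(states=[0, 1, 2], actions=['a', 'b', 'c'],
                   rewards=[1.0, 0.0, 2.0])
print(ep[0])  # ((0, 3.0, 3), 'a')
```

A policy trained on such pairs with ordinary supervised learning can then be queried with a command like "obtain return 3.0 in 3 steps", which is the behavior the abstract describes.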
Communication is a critical factor for large multi-agent systems to stay organized and productive. Most previous multi-agent "learning-to-communicate" studies predefine the communication protocols or use techniques such as tabular reinforcement learning and evolutionary algorithms, which cannot generalize to changing environments or large collections of agents. In this paper, we propose an Actor-Coordinator-Critic Net (ACCNet) framework for solving the "learning-to-communicate" problem. ACCNet naturally combines actor-critic reinforcement learning with deep learning. It can efficiently learn communication protocols, even from scratch, under partially observable environments. We demonstrate that ACCNet achieves better results than several baselines in both continuous and discrete action-space environments. We also analyse the learned protocols and discuss some design considerations.
A common challenge in reinforcement learning is how to convert the agent's interactions with an environment into fast and robust learning. For instance, earlier work makes use of domain knowledge to improve existing reinforcement learning algorithms in complex tasks. While promising, previously acquired knowledge is often costly and challenging to scale up. Instead, we decide to consider problem knowledge with signals from quantities relevant to solve any task, e.g., self-performance assessment and accurate expectations. $\mathcal{V}^{ex}$ is such a quantity. It is the fraction of variance explained by the value function $V$ and measures the discrepancy between $V$ and the returns. Taking advantage of $\mathcal{V}^{ex}$, we propose MERL, a general framework for structuring reinforcement learning by injecting problem knowledge into policy gradient updates. As a result, the agent is not only optimized for a reward but learns using problem-focused quantities provided by MERL, applicable out-of-the-box to any task. In this paper: (a) We introduce and define MERL, the multi-head reinforcement learning framework we use throughout this work. (b) We conduct experiments across a variety of standard benchmark environments, including 9 continuous control tasks, where results show improved performance. (c) We demonstrate that MERL also improves transfer learning on a set of challenging pixel-based tasks. (d) We ponder how MERL tackles the problem of reward sparsity and better conditions the feature space of reinforcement learning agents.
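The quantity $\mathcal{V}^{ex}$ described above — the fraction of return variance explained by the value function — has a direct definition that a few lines of Python make concrete: $\mathcal{V}^{ex} = 1 - \mathrm{Var}(R - V) / \mathrm{Var}(R)$.

```python
import statistics as st

def v_explained(returns, values):
    """Fraction of return variance explained by the value function:
    V^ex = 1 - Var(R - V) / Var(R)."""
    residuals = [r - v for r, v in zip(returns, values)]
    return 1.0 - st.pvariance(residuals) / st.pvariance(returns)

# A perfect critic explains all variance; a constant critic explains none.
R = [1.0, 2.0, 3.0, 4.0]
print(v_explained(R, R))                      # 1.0
print(v_explained(R, [2.5, 2.5, 2.5, 2.5]))   # 0.0
```

This is the per-update signal MERL feeds into its auxiliary heads alongside the usual reward objective; the function above is only a sketch of the quantity itself, not of the full multi-head framework.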
Re-conceptualising the Language Game Paradigm in the Framework of Multi-Agent Reinforcement Learning
In this paper, we formulate the challenge of re-conceptualising the language game experimental paradigm in the framework of multi-agent reinforcement learning (MARL). If successful, future language game experiments will benefit from the rapid and promising methodological advances in the MARL community, while future MARL experiments on learning emergent communication will benefit from the insights and results gained from language game experiments. We strongly believe that this cross-pollination has the potential to lead to major breakthroughs in the modelling of how human-like languages can emerge and evolve in multi-agent systems.
Multi-agent control problems constitute an interesting area of application for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. In order to ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee constraint satisfaction of the soft constraints under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.
The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting. We focus on the 2$\times$2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our methodology involves training an agent initially through imitation learning on a noisy expert policy, followed by a proximal-policy optimization (PPO) reinforcement learning algorithm. The basic PPO approach is modified for stable transition from the imitation learning phase through reward shaping, action filters based on heuristics, and curriculum learning. The proposed methodology is able to beat heuristic and pure reinforcement learning baselines with a combined 100,000 training games, significantly faster than other non-tree-search methods in literature. We present results against multiple agents provided by the developers of the simulation, including some that we have enhanced. We include a sensitivity analysis over different parameters, and highlight undesirable effects of some strategies that initially appear promising. Since Pommerman is a complex multi-agent competitive environment, the strategies developed here provide insights into several real-world problems with characteristics such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.
In the modern world, the development of Artificial Intelligence (AI) has contributed to improvements in various areas, including automation, computer vision, fraud detection, and more. AI can be leveraged to enhance the efficiency of Autonomous Smart Traffic Management (ASTM) systems and reduce traffic congestion. This paper presents an ASTM system that uses AI to improve traffic flow rates. The system employs the YOLO V5 convolutional neural network to detect vehicles in traffic management images. Additionally, it predicts the number of vehicles for the next 12 hours using a Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM). The Smart Traffic Management Cycle Length Analysis manages the traffic cycle length based on these vehicle predictions, aided by AI. The RNN-LSTM model predicts vehicle numbers over the next 12 hours with a Mean Squared Error (MSE) of 4.521 and a Root Mean Squared Error (RMSE) of 2.232 vehicles. After simulating the ASTM system in the CARLA simulation environment, we found that the traffic congestion flow rate with ASTM (21 vehicles per minute) is 50% higher than the rate without it (around 15 vehicles per minute). Additionally, the vehicle pass delay with ASTM (5 seconds per vehicle) is 70% lower than without it (around 12 seconds per vehicle). These results demonstrate that the ASTM system using AI can increase traffic flow by 50% and reduce vehicle pass delays by 70%.
Large language models (LLMs) have achieved success in acting as agents that interact with environments through tools such as search engines. However, LLMs are optimized for language generation rather than tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models, making fine-tuning data scarce and acquiring it both difficult and costly. Discarding failed trajectories also wastes significant data and resources and limits the possible optimization paths during fine-tuning. In this paper, we argue that unsuccessful trajectories offer valuable insights, and that LLMs can learn from these trajectories through appropriate quality control and fine-tuning strategies. By simply adding a prefix or suffix that tells the model whether to generate a successful trajectory during training, we improve model performance by a large margin on mathematical reasoning, multi-hop question answering, and strategic question answering tasks. We further analyze the inference results and find that our method provides a better trade-off between valuable information and errors in unsuccessful trajectories. To our knowledge, we are the first to demonstrate the value of negative trajectories and their application in agent-tuning scenarios. Our findings offer guidance for developing better agent-tuning methods and low-resource data usage techniques.
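The prefix/suffix conditioning described above can be sketched as a data-formatting step. The marker strings and trajectory formatting below are illustrative assumptions, not the paper's exact tokens.

```python
# Hypothetical sketch: keep failed trajectories, but prepend an outcome
# marker so the model learns both modes from the same fine-tuning corpus.

SUCCESS_PREFIX = "[GOOD TRAJECTORY]\n"
FAILURE_PREFIX = "[BAD TRAJECTORY]\n"

def to_training_text(trajectory, succeeded):
    """Tag a logged agent trajectory with its outcome for fine-tuning."""
    prefix = SUCCESS_PREFIX if succeeded else FAILURE_PREFIX
    return prefix + trajectory

# At inference time, the model is conditioned on the success marker to
# request a successful trajectory.
sample = to_training_text("Thought: search(query)\nAction: finish", False)
print(sample.startswith(FAILURE_PREFIX))  # True
```

The point of the trick is that failed trajectories still contribute gradient signal about environment dynamics and tool syntax, while the marker keeps the model from imitating their mistakes when prompted for success.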
Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks. However, the procedure for optimizing the mixture of instruction datasets for LLM fine-tuning is still poorly understood. This study categorizes instructions into three primary types: NLP downstream tasks, coding, and general chat. We explore the effects of instruction tuning with different combinations of datasets on LLM performance, and find that certain instruction types are more advantageous for specific applications but can negatively impact other areas. This work provides insights into instruction mixtures, laying the foundations for future research.
Urban traffic management faces significant challenges due to dynamic environments, and traditional algorithms cannot quickly adapt to these conditions in real time or predict possible conflicts. This study explores the ability of a Large Language Model (LLM), specifically GPT-4o-mini, to improve traffic management at urban intersections. We employed GPT-4o-mini to analyze scenarios, predict vehicle positions, and detect and resolve conflicts at an intersection in real time across various basic scenarios. The key aim of this study is to investigate whether LLMs can logically reason about and understand traffic scenarios well enough to enhance traffic efficiency and safety by providing real-time analysis. The study highlights the potential of LLMs in urban traffic management for creating more intelligent and more adaptive systems. Results showed that GPT-4o-mini was able to detect and resolve conflicts effectively under heavy traffic, congestion, and mixed-speed conditions, and it also managed conflicts successfully in a complex scenario of multiple intersections with obstacles and pedestrians. These results suggest that integrating LLMs can improve the effectiveness of traffic control for safer and more efficient urban intersection management.
Reinforcement learning (RL) has become a key technique for enhancing the reasoning abilities of large language models (LLMs), with policy-gradient algorithms dominating the post-training stage because of their efficiency and effectiveness. However, most existing benchmarks evaluate large-language-model reasoning under idealized settings, overlooking performance in realistic, non-ideal scenarios. We identify three representative non-ideal scenarios with practical relevance: summary inference, fine-grained noise suppression, and contextual filtering. We introduce a new research direction guided by brain-science findings that human reasoning remains reliable under imperfect inputs. We formally define and evaluate these challenging scenarios. We fine-tune three LLMs and a state-of-the-art large vision-language model (LVLM) using RL with a representative policy-gradient algorithm and then test their performance on eight public datasets. Our results reveal that while RL fine-tuning improves baseline reasoning under idealized settings, performance declines significantly across all three non-ideal scenarios, exposing critical limitations in advanced reasoning capabilities. Although we propose a scenario-specific remediation method, our results suggest current methods leave these reasoning deficits largely unresolved. This work highlights that the reasoning abilities of large models are often overstated and underscores the importance of evaluating models under non-ideal scenarios. The code and data will be released at XXXX.
This tutorial explores the advancements and challenges in the development of Large Language Models (LLMs) such as ChatGPT and Gemini. It addresses inherent limitations like temporal knowledge cutoffs, mathematical inaccuracies, and the generation of incorrect information, proposing solutions like Retrieval Augmented Generation (RAG), Program-Aided Language Models (PAL), and frameworks such as ReAct and LangChain. The integration of these techniques enhances LLM performance and reliability, especially in multi-step reasoning and complex task execution. The paper also covers fine-tuning strategies, including instruction fine-tuning, parameter-efficient methods like LoRA, and Reinforcement Learning from Human Feedback (RLHF) as well as Reinforced Self-Training (ReST). Additionally, it provides a comprehensive survey of transformer architectures and training techniques for LLMs. The source code is available from the author upon email request.
Optimal management of traffic light timing is one of the most effective factors in reducing urban traffic congestion. Most older systems used fixed timing together with human operators to control traffic, which is inefficient in terms of both time and cost. Modern traffic management methods instead rely on artificial intelligence: real-time processing of video surveillance camera images, combined with reinforcement learning, determines and applies the optimal timing of traffic lights according to several parameters. In this research, deep learning methods were used for vehicle detection, with the YOLOv9-C model estimating the number of vehicles and other characteristics such as speed. Vehicles were then modeled in an urban environment simulator in OpenAI Gym, and multi-factor reinforcement learning with the Rainbow DQN algorithm was used to set the timing of traffic lights at intersections. Additionally, applying transfer learning and retraining the model on images of Iranian cars increased its accuracy. The results show that the proposed method is reasonably accurate both in analyzing surveillance camera footage and in finding the optimal timing, and it achieves better accuracy than previous research.
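Rainbow combines several DQN extensions and requires a full deep-RL stack; as a minimal hedged stand-in for the signal-timing agent described above, the sketch below uses tabular Q-learning over discretized queue states, with candidate green-phase durations as actions. All state encodings and constants here are illustrative assumptions, not the paper's design.

```python
import random
from collections import defaultdict

ACTIONS = (20, 40, 60)      # candidate green durations in seconds (assumed)
Q = defaultdict(float)      # Q[(state, action)] table, zero-initialized

def select_action(state: int, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice of a green-phase duration."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning backup."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

A deep variant would replace the table with a network over the image-derived vehicle counts.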
The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies with high computational cost that require large amounts of training data. To play Hanabi effectively, humans rely on conventions, which provide a means to implicitly convey ideas or knowledge based on a predefined, mutually agreed upon set of "rules" or principles. Multi-agent problems with partial observability, especially when communication is limited, can benefit greatly from such implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent's action space using conventions, which act as sequences of special cooperative actions that span over and include multiple time steps and multiple agents, requiring agents to actively opt in for them to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement in the performance of existing techniques for self-play and cross-play for various numbers of cooperators within Hanabi.
Multi-agent reinforcement learning focuses on training the behaviors of multiple learning agents that coexist in a shared environment. Recently, MARL models such as the Multi-Agent Transformer (MAT) and ACtion dEpendent deep Q-learning (ACE) have significantly improved performance by leveraging sequential decision-making processes. Although these models enhance performance, they do not explicitly consider the importance of the order in which agents make decisions. In this paper, we propose the Agent Order of Action Decisions-MAT (AOAD-MAT), a novel MAT model that considers the order in which agents make decisions. The proposed model explicitly incorporates the sequence of action decisions into the learning process, allowing it to learn and predict the optimal order of agent actions. AOAD-MAT leverages a Transformer-based actor-critic architecture that dynamically adjusts the sequence of agent actions. To achieve this, we introduce a novel MARL architecture that incorporates a subtask focused on predicting the next agent to act, integrated into a Proximal Policy Optimization-based loss function, to synergistically maximize the advantage of sequential decision-making. The proposed method was validated through extensive experiments on the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo benchmarks. The experimental results show that AOAD-MAT outperforms MAT and other baseline models, demonstrating the effectiveness of adjusting the order of agent action decisions in MARL.
Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agent's payoff is also affected by other agents' adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines.
Millimeter-wave (mmWave) communications have been envisioned as a promising direction to provide Gbps wireless access. However, mmWaves are susceptible to high path losses and blockages, which directional antennas can only partially mitigate. That makes mmWave networks coverage-limited, thus requiring dense deployments. Integrated access and backhaul (IAB) architectures have emerged as a cost-effective solution for network densification. Resource allocation in mmWave IAB networks faces major challenges in coping with heavy temporal dynamics, such as intermittent links caused by user mobility and blockages from moving obstacles, which makes it extremely difficult to find optimal and adaptive solutions. In this article, exploiting the distributed structure of the problem, we propose a Multi-Agent Reinforcement Learning (MARL) framework to optimize user throughput via flow routing and link scheduling in mmWave IAB networks characterized by user mobility and link outages generated by moving obstacles. The proposed approach implicitly captures the environment dynamics, coordinates the interference, and manages the buffer levels of IAB relay nodes. We design different MARL components, considering full-duplex and half-duplex IAB-nodes. In addition, we provide a communication and coordination scheme for RL agents in an online training framework, addressing the feasibility issues of practical systems. Numerical results show the effectiveness of the proposed approach.
This extended abstract presents a method to generate energy-optimal trajectories for multi-agent systems formulated as a strategic-form game. Using recent results in optimal control, we demonstrate that an energy-optimal trajectory can be generated in milliseconds if the sequence of constraint activations is known a priori. Thus, rather than selecting an infinite-dimensional action from a function space, the agents select their actions from a finite number of constraints and determine the time at which each becomes active. Furthermore, the agents can exactly encode their trajectory in a set of real numbers, rather than communicating their control action as an infinite-dimensional function. We demonstrate the performance of this algorithm in simulation and find an optimal trajectory in 45 milliseconds on a tablet PC.
The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures utilizing Retrieval-Augmented Generation (RAG). A critical shared vulnerability is reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The Upper Level minimizes perturbation magnitude (delta) to enforce covertness while maximizing system behavior divergence toward an adversary-defined target (Lower Level). We provide rigorous mathematical instantiations for CTDE MARL algorithms and RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons evading detection heuristics. Comprehensive experimental protocols utilize SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (less than or equal to 1 percent in MARL, less than or equal to 0.1 percent in RAG). XAMT defines a new unified class of training-time threats essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies prioritizing intrinsic safety over perimeter-based detection.
Designing the optimal linear quadratic regulator (LQR) for a large-scale multi-agent system (MAS) is time-consuming since it involves solving a large-size matrix Riccati equation. The situation is further exacerbated when the design needs to be done in a model-free way using schemes such as reinforcement learning (RL). To reduce this computational complexity, we decompose the large-scale LQR design problem into multiple smaller-size LQR design problems. We consider the objective function to be specified over an undirected graph, and cast the decomposition as a graph clustering problem. The graph is decomposed into two parts, one consisting of independent clusters of connected components, and the other containing edges that connect different clusters. Accordingly, the resulting controller has a hierarchical structure, consisting of two components. The first component optimizes the performance of each independent cluster by solving the smaller-size LQR design problem in a model-free way using an RL algorithm. The second component accounts for the objective coupling different clusters, which is achieved by solving a least squares problem in one shot. Although suboptimal, the hierarchical controller adheres to a particular structure as specified by inter-agent couplings in the objective function and by the decomposition strategy. Mathematical formulations are established to find a decomposition that minimizes the number of required communication links or reduces the optimality gap. Numerical simulations are provided to highlight the pros and cons of the proposed designs.
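The decomposition step described above can be sketched as plain graph bookkeeping: remove a chosen set of cut edges, collect the remaining connected components as independent clusters, and keep the cut edges as the coupling part. The edge-list representation below is an assumption for illustration; the paper works with the objective graph of the LQR cost.

```python
from collections import deque

def decompose(nodes, edges, cut_edges):
    """Remove cut_edges from the graph; return (clusters, coupling edges)."""
    cut = {frozenset(e) for e in cut_edges}
    kept = [e for e in edges if frozenset(e) not in cut]
    # Build an adjacency list of the intra-cluster graph.
    adj = {n: [] for n in nodes}
    for u, v in kept:
        adj[u].append(v)
        adj[v].append(u)
    # BFS to collect connected components (the independent clusters).
    seen, clusters = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = [], deque([n])
        seen.add(n)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        clusters.append(sorted(comp))
    coupling = [e for e in edges if frozenset(e) in cut]
    return clusters, coupling
```

Each cluster would then get its own small model-free LQR design, while the coupling edges feed the one-shot least-squares correction.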
Generalization is a major challenge for multi-agent reinforcement learning. How well does an agent perform when placed in novel environments and in interactions with new co-players? In this paper, we investigate and quantify the relationship between generalization and diversity in the multi-agent domain. Across the range of multi-agent environments considered here, procedurally generating training levels significantly improves agent performance on held-out levels. However, agent performance on the specific levels used in training sometimes declines as a result. To better understand the effects of co-player variation, our experiments introduce a new environment-agnostic measure of behavioral diversity. Results demonstrate that population size and intrinsic motivation are both effective methods of generating greater population diversity. In turn, training with a diverse set of co-players strengthens agent performance in some (but not all) cases.
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.
We present CEMA: Causal Explanations in Multi-Agent systems; a framework for creating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems to build more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds that identify the salient causes behind the agent's decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when a large number of other agents is present, and show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as high as high-quality baseline explanations elicited from other participants. We release the collected explanations with annotations as the HEADD dataset.
We study the computational complexity of multi-agent path finding (MAPF). Given a graph $G$ and a set of agents, each having a start and target vertex, the goal is to find collision-free paths minimizing the total distance traveled. To better understand the source of difficulty of the problem, we aim to study the simplest and least constrained graph class for which it remains hard. To this end, we restrict $G$ to be a 2D grid, which is a ubiquitous abstraction, as it conveniently allows for modeling well-structured environments (e.g., warehouses). Previous hardness results considered highly constrained 2D grids having only one vertex unoccupied by an agent, while the most restricted hardness result that allowed multiple empty vertices was for (non-grid) planar graphs. We therefore refine previous results by simultaneously considering both 2D grids and multiple empty vertices. We show that even in this case distance-optimal MAPF remains NP-hard, which settles an open problem posed by Banfi et al. (2017). We present a reduction directly from 3-SAT using simple gadgets, making our proof arguably more informative than previous work in terms of potential progress towards positive results. Furthermore, our reduction is the first linear one for the case where $G$ is planar, appearing nearly four decades after the first related result. This allows us to go a step further and exploit the Exponential Time Hypothesis (ETH) to obtain an exponential lower bound for the running time of the problem. Finally, as a stepping stone towards our main results, we prove the NP-hardness of the monotone case, in which agents move one by one with no intermediate stops.
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
We address the problem of model-free distributed stabilization of heterogeneous multi-agent systems using reinforcement learning (RL). Two algorithms are developed. The first algorithm solves a centralized linear quadratic regulator (LQR) problem without knowing any initial stabilizing gain in advance. The second algorithm builds upon the results of the first algorithm, and extends it to distributed stabilization of multi-agent systems with predefined interaction graphs. Rigorous proofs are provided to show that the proposed algorithms achieve guaranteed convergence if specific conditions hold. A simulation example is presented to demonstrate the theoretical results.
In the context of global urbanization and motorization, traffic congestion has become a significant issue, severely affecting the quality of life, environment, and economy. This paper puts forward a single-agent reinforcement learning (RL)-based regional traffic signal control (TSC) model. Different from multi-agent systems, this model can coordinate traffic signals across a large area, with the goals of alleviating regional traffic congestion and minimizing the total travel time. The TSC environment is precisely defined through specific state space, action space, and reward functions. The state space consists of the current congestion state, which is represented by the queue lengths of each link, and the current signal phase scheme of intersections. The action space is designed to select an intersection first and then adjust its phase split. Two reward functions are meticulously crafted. One focuses on alleviating congestion and the other aims to minimize the total travel time while considering the congestion level. The experiments are carried out with the SUMO traffic simulation software. The performance of the TSC model is evaluated by comparing it with a base case where no signal-timing adjustments are made. The results show that the model can effectively control congestion. For example, the queuing length is significantly reduced in the scenarios tested. Moreover, when the reward is set to both alleviate congestion and minimize the total travel time, the average travel time is remarkably decreased, which indicates that the model can effectively improve traffic conditions. This research provides a new approach for large-scale regional traffic signal control and offers valuable insights for future urban traffic management.
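The two reward designs described above might look like the following hedged sketch; the weighting between travel time and congestion level is an assumption, not the paper's exact formula.

```python
def congestion_reward(queue_lengths):
    """Reward 1: negative total queue length — less queuing, higher reward."""
    return -sum(queue_lengths)

def travel_time_reward(queue_lengths, total_travel_time, w=0.5):
    """Reward 2: penalize total travel time while still accounting for
    the congestion level via an assumed weight w."""
    return -(total_travel_time + w * sum(queue_lengths))
```

The agent would receive one of these after each action (select an intersection, adjust its phase split) and learn to drive both terms toward zero.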
Vehicle-to-vehicle (V2V) communications have great potential to improve traffic system performance. Most existing work on connected and autonomous vehicles (CAVs) has focused on adaptation to downstream traffic conditions, neglecting the impact of CAVs' behaviors on upstream traffic flow. In this paper, we introduce a notion of Leading Cruise Control (LCC) that retains the basic car-following operation and explicitly considers the influence of the CAV's actions on the vehicles behind. We first present a detailed modeling process for LCC. Then, rigorous controllability analysis verifies the feasibility of exploiting the CAV as a leader to actively lead the motion of its following vehicles. Besides, the head-to-tail transfer function is derived for LCC under adequate employment of V2V connectivity. Numerical studies confirm the potential of LCC to strengthen the capability of CAVs in suppressing traffic instabilities and smoothing traffic flow.
A control scheme for the multi-gated perimeter traffic flow control problem of cities is presented. The proposed scheme determines feasible and optimally distributed input flows for the various gates located at the periphery of a protected network. A parsimonious model is employed to describe the traffic dynamics of the protected network. To describe traffic dynamics outside of the protected area, the state-space model is augmented with additional state variables to account for vehicle queues at store-and-forward origin links at the periphery. The perimeter flow control problem is formulated as a convex optimisation problem with finite horizon, and constrained control and state variables. It aims to equalise the relative queues at origin links and to maintain the vehicle accumulation in the protected network around a desired set point, while the system's throughput is maximised. For real-time control, the optimal control problem is embedded in a rolling-horizon scheme using the current state of the system as the initial state, together with predicted demand flows at entrance links. Furthermore, practical flow allocation policies for single-region perimeter control without explicitly considering entrance link dynamics are presented. These policies allocate a global perimeter-ordered flow to candidate gates at the periphery of a protected network by taking into account the different geometric characteristics of origin links. The proposed flow allocation policies are then benchmarked against the multi-gated perimeter flow control. A study is carried out for a 2.5 square mile protected network area of San Francisco, CA, including fifteen gates of different geometric characteristics. The results show that the proposed scheme is able to manage excessive queues outside of the protected network and to optimally distribute the input flows, which confirms its efficiency and equity properties.
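A minimal sketch of the single-region idea described above, assuming a proportional feedback law around the accumulation set point and a capacity-proportional split across gates (both are our assumptions; the paper formulates a constrained finite-horizon optimisation instead):

```python
def perimeter_inflow(accumulation, set_point, base_inflow, gain=0.5,
                     min_flow=0.0, max_flow=100.0):
    """Total allowed inflow (veh/min): reduce inflow when the protected
    network's accumulation exceeds its set point, and vice versa."""
    flow = base_inflow - gain * (accumulation - set_point)
    return max(min_flow, min(max_flow, flow))

def allocate_to_gates(total_flow, gate_capacities):
    """Split the global perimeter-ordered flow across gates in
    proportion to each gate's capacity share."""
    total_cap = sum(gate_capacities)
    return [total_flow * c / total_cap for c in gate_capacities]
```

With accumulation 10 vehicles above the set point and gain 0.5, a 50 veh/min base inflow is cut to 45 veh/min, then shared among gates by capacity.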
Traffic congestion, primarily driven by intersection queuing, significantly impacts urban living standards, safety, environmental quality, and economic efficiency. While Traffic Signal Control (TSC) systems hold potential for congestion mitigation, traditional optimization models often fail to capture real-world traffic complexity and dynamics. This study introduces a novel single-agent reinforcement learning (RL) framework for regional adaptive TSC, circumventing the coordination complexities inherent in multi-agent systems through a centralized decision-making paradigm. The model employs an adjacency matrix to unify the encoding of road network topology, real-time queue states derived from probe vehicle data, and current signal timing parameters. Leveraging the efficient learning capabilities of the DreamerV3 world model, the agent learns control policies where actions sequentially select intersections and adjust their signal phase splits to regulate traffic inflow/outflow, analogous to a feedback control system. Reward design prioritizes queue dissipation, directly linking congestion metrics (queue length) to control actions. Simulation experiments conducted in SUMO demonstrate the model's effectiveness: under inference scenarios with multi-level (10%, 20%, 30%) Origin-Destination (OD) demand fluctuations, the framework exhibits robust anti-fluctuation capability and significantly reduces queue lengths. This work establishes a new paradigm for intelligent traffic control compatible with probe vehicle technology. Future research will focus on enhancing practical applicability by incorporating stochastic OD demand fluctuations during training and exploring regional optimization mechanisms for contingency events.
Within the Nagel-Schreckenberg traffic flow model we consider the transition from the free flow regime to the jammed regime. We introduce a method of analyzing the data which is based on the local density distribution. This analysis allows us to determine the phase diagram and to examine the separation of the system into a coexisting free flow phase and a jammed phase above the transition. The investigation of the steady state structure factor shows that the decomposition in this phase coexistence regime is driven by density fluctuations, provided they exceed a critical wavelength.
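The Nagel-Schreckenberg update itself is standard and compact: accelerate by one, brake to the gap ahead, randomly slow down with probability p, then move. The sketch below implements one step on a circular road; it is the textbook model definition, not code from the paper.

```python
import random

def nasch_step(positions, velocities, road_length, v_max=5, p_slow=0.3,
               rng=random):
    """One Nagel-Schreckenberg update on a ring road."""
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_v = list(velocities)
    for idx, i in enumerate(order):
        ahead = order[(idx + 1) % n]
        gap = (positions[ahead] - positions[i] - 1) % road_length
        v = min(new_v[i] + 1, v_max)            # 1. accelerate
        v = min(v, gap)                          # 2. brake to the gap ahead
        if v > 0 and rng.random() < p_slow:
            v -= 1                               # 3. random slowdown
        new_v[i] = v
    new_pos = [(positions[i] + new_v[i]) % road_length for i in range(n)]
    return new_pos, new_v
```

Iterating this step at varying global density reproduces the free-flow and jammed regimes the paper analyzes via local density distributions.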
This paper examines the evolution, architecture, and practical applications of AI agents from their early, rule-based incarnations to modern sophisticated systems that integrate large language models with dedicated modules for perception, planning, and tool use. Emphasizing both theoretical foundations and real-world deployments, the paper reviews key agent paradigms, discusses limitations of current evaluation benchmarks, and proposes a holistic evaluation framework that balances task effectiveness, efficiency, robustness, and safety. Applications across enterprise, personal assistance, and specialized domains are analyzed, with insights into future research directions for more resilient and adaptive AI agent systems.
Several studies have employed reinforcement learning (RL) to address the challenges of regional adaptive traffic signal control (ATSC) and achieved promising results. Existing research in this field predominantly adopts multi-agent frameworks. However, multi-agent frameworks present challenges for scalability; the traffic signal control (TSC) problem instead calls for a single-agent framework, since TSC inherently relies on centralized management by a single control center, which can monitor traffic conditions across all roads in the study area and coordinate the control of all intersections. This work proposes a single-agent RL-based regional ATSC model compatible with probe vehicle technology. Key components of the RL design include state, action, and reward function definitions. To facilitate learning and manage congestion, both state and reward functions are defined based on queue length, with actions designed to regulate queue dynamics. The queue length definition used in this study differs slightly from conventional definitions but is closely correlated with congestion states. More importantly, it allows for reliable estimation using link travel time data from probe vehicles. With probe vehicle data already covering most urban roads, this feature enhances the proposed method's potential for widespread deployment. The method was comprehensively evaluated using the SUMO simulation platform. Experimental results demonstrate that the proposed model effectively mitigates large-scale regional congestion levels via coordinated multi-intersection control.
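As a hedged illustration of estimating a congestion-correlated queue proxy from probe-vehicle link travel times: delay above free-flow travel time, converted to queued vehicles via an assumed discharge rate. This conversion is our assumption for illustration, not the paper's estimator.

```python
def queue_estimate(travel_time_s: float, free_flow_time_s: float,
                   discharge_veh_per_s: float = 0.5) -> float:
    """Map observed probe-vehicle link delay to an approximate queue size.

    Delay beyond the free-flow travel time is attributed to waiting in a
    queue that discharges at an assumed constant rate.
    """
    delay = max(0.0, travel_time_s - free_flow_time_s)
    return delay * discharge_veh_per_s
```

A link traversed in 60 s against a 40 s free-flow time would map to roughly 10 queued vehicles under the assumed 0.5 veh/s discharge rate.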
This contribution presents an experimental study of two-dimensional pedestrian flow with the aim of capturing pedestrian behaviour within the cluster formed in front of a bottleneck. Two experiments of passing through a room with one entrance and one exit were arranged following the phase transition study of Ezaki et al. (2012); the inflow rate was regulated to obtain different walking modes. By means of automatic image processing, pedestrians' paths were extracted from camera records to obtain actual velocity and local density. Macroscopic information was extracted by means of a virtual detector and pedestrians' leaving times. Pedestrian behaviour is evaluated in terms of density and velocity, and different measurement approaches are compared using several fundamental diagrams. Two phases of crowd behaviour were recognized and the phase transition was described.
Traffic and pedestrian systems consist of human collectives where agents are intelligent and capable of processing available information to perform tactical manoeuvres that can potentially increase their movement efficiency. In this study, we introduce a social force model for agents that possess memory: information about the agent's past affects its instantaneous movement so as to swiftly take the agent towards its desired state. We show that the presence of memory is akin to an agent performing proportional-integral control to achieve its desired state. The longer the agent remembers and the more impact the memory has on its motion, the better an isolated agent moves in terms of achieving its desired state. However, when in a collective, the interactions between agents lead to a non-monotonic effect of memory on the traffic dynamics: a group of agents with memory exiting through a narrow door exhibits more clogging than a group without it. We also show that a very large amount of memory results in variation in the memory force experienced by agents in the system at any time, which reduces the propensity to form clogs and leads to efficient movement.
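The proportional-integral analogy above can be made concrete with a hedged sketch: the proportional term responds to the instantaneous velocity error, while the integral term accumulates remembered past errors (the "memory"). The gains and the discrete-time form are illustrative assumptions, not the paper's model.

```python
def memory_force(v_desired: float, v_history: list,
                 kp: float = 1.0, ki: float = 0.1, dt: float = 0.1) -> float:
    """PI-style driving force from current and remembered velocity errors.

    v_history holds the agent's past velocities, newest last; the length
    of the list plays the role of the memory span.
    """
    error_now = v_desired - v_history[-1]                       # P term
    error_integral = sum(v_desired - v for v in v_history) * dt  # I term
    return kp * error_now + ki * error_integral
```

Lengthening `v_history` or raising `ki` strengthens the memory's pull toward the desired velocity, mirroring the paper's single-agent observation.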
The control of a single agent in complex and uncertain multi-agent environments requires careful consideration of the interactions between the agents. In this context, this paper proposes a dual model predictive control (MPC) method using Gaussian process (GP) models for multi-agent systems. While Gaussian process MPC (GP-MPC) has been shown to be effective in predicting the dynamics of other agents, current methods do not consider the influence of the control input on the covariance of the predictions, and hence lack the dual control effect. Therefore, we propose a dual MPC that directly optimizes the actions of the ego agent, and the belief of the other agents by jointly optimizing their state trajectories as well as the associated covariance while considering their interactions through a GP. We demonstrate our GP-MPC method in a simulation study on autonomous driving, showing improved prediction quality compared to a baseline stochastic MPC. The results show that GP-MPC can learn the interactions between the agents online, demonstrating the potential of GPs for dual MPC in uncertain and unseen scenarios.
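The GP machinery such predictors rest on can be sketched with a standard RBF-kernel posterior. This is a textbook implementation, not the authors' dual-control formulation, and the 1-D toy dynamics are an assumption:

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel between row-wise point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """Posterior mean and covariance at test points Xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf(Xs, Xs) - v.T @ v

# toy: predict another agent's next-state component from observed states
X = np.linspace(0.0, 3.0, 20)[:, None]
y = np.sin(X[:, 0])
mu, cov = gp_posterior(X, y, X)
```

A dual-control MPC would additionally optimize how the ego agent's inputs shrink the posterior covariance along the planned trajectory, which is the contribution described in the abstract.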
We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.
Intersections are essential road infrastructures for traffic in modern metropolises. However, they can also be the bottleneck of traffic flows as a result of traffic incidents or the absence of traffic coordination mechanisms such as traffic lights. Recently, various control and coordination mechanisms that go beyond traditional control methods have been proposed to improve the efficiency of intersection traffic by leveraging the ability of autonomous vehicles. Amongst these methods, the control of foreseeable mixed traffic consisting of human-driven vehicles (HVs) and robot vehicles (RVs) has emerged. We propose a decentralized multi-agent reinforcement learning approach for the control and coordination of mixed traffic by RVs at real-world, complex intersections -- an open challenge to date. We design comprehensive experiments to evaluate the effectiveness, robustness, generalizability, and adaptability of our approach. In particular, our method can prevent congestion formation with merely 5% RVs under a real-world traffic demand of 700 vehicles per hour; in contrast, without RVs, congestion forms at a demand as low as 200 vehicles per hour. Moreover, when the RV penetration rate exceeds 60%, our method starts to outperform traffic signal control in terms of the average waiting time of all vehicles. Our method is not only robust against blackout events, sudden RV percentage drops, and V2V communication errors, but also generalizes well, as evidenced by its successful deployment at five unseen intersections. Lastly, our method performs well under various traffic rules, demonstrating its adaptability to diverse scenarios. Videos and code of our work are available at https://sites.google.com/view/mixedtrafficcontrol
This paper introduces SOLID (Synergizing Optimization and Large Language Models for Intelligent Decision-Making), a novel framework that integrates mathematical optimization with the contextual capabilities of large language models (LLMs). SOLID facilitates iterative collaboration between optimizer and LLM agents through dual prices and deviation penalties. This interaction improves decision quality while maintaining modularity and data privacy. The framework retains theoretical convergence guarantees under convexity assumptions, providing insight into the design of LLM prompts. To evaluate SOLID, we applied it to a stock portfolio investment case with historical prices and financial news as inputs. Empirical results demonstrate convergence under various scenarios and indicate improved annualized returns compared to a baseline optimizer-only method, validating the synergy of the two agents. SOLID offers a promising framework for advancing automated and intelligent decision-making across diverse domains.
We model human decision-making behavior in a risk-taking task using inverse reinforcement learning (IRL), with the aim of understanding real human decision making under risk. To the best of our knowledge, this is the first work applying IRL to reveal the implicit reward function in human risk-taking decision making and to interpret risk-prone and risk-averse decision-making policies. We hypothesize that the state history (e.g. rewards and decisions in previous trials) is related to the human reward function, which leads to risk-averse and risk-prone decisions. We design features that reflect these factors in the IRL reward function and learn the corresponding weights, which are interpretable as the importance of each feature. The results confirm the sub-optimal, risk-related decisions of humans driven by their personalized reward functions. In particular, a risk-prone person tends to decide based on the current pump number, while a risk-averse person relies on burst information from the previous trial and the average end status. Our results demonstrate that IRL is an effective tool for modeling human decision-making behavior and for helping interpret the human psychological process in risky decision making.
The premise of the Multi-disciplinary Conference on Reinforcement Learning and Decision Making is that multiple disciplines share an interest in goal-directed decision making over time. The idea of this paper is to sharpen and deepen this premise by proposing a perspective on the decision maker that is substantive and widely held across psychology, artificial intelligence, economics, control theory, and neuroscience, which I call the "common model of the intelligent agent". The common model does not include anything specific to any organism, world, or application domain. The common model does include aspects of the decision maker's interaction with its world (there must be input and output, and a goal) and internal components of the decision maker (for perception, decision-making, internal evaluation, and a world model). I identify these aspects and components, note that they are given different names in different disciplines but refer essentially to the same ideas, and discuss the challenges and benefits of devising a neutral terminology that can be used across disciplines. It is time to recognize and build on the convergence of multiple diverse disciplines on a substantive common model of the intelligent agent.
In order to drive effectively, a driver must be aware of how other vehicles' behaviour can be expected to change in response to their decisions, and also of how other drivers expect them to behave. One common family of methods for addressing this problem of interaction is based on Game Theory. Such approaches often make assumptions about leaders and followers in an interaction, which can lead to conflicts when vehicles do not agree on the hierarchy, resulting in sub-optimal behaviour. In this work we define a measurement for the incidence of such conflicts, the Area of Conflict (AoC), for a given interactive decision-making model. Furthermore, we propose a novel decision-making method that reduces this value compared to an existing approach for incorporating altruistic behaviour. We verify our theoretical analysis empirically using a simulated lane-change scenario.
This workshop invites researchers and practitioners to explore behavioral-change support in intelligent transportation applications. We welcome submissions that explore intelligent transportation systems (ITS) which interact with travelers in order to persuade or nudge them towards sustainable transportation behaviors and decisions. Our focus includes emerging opportunities such as the use of data and information generated by ITS and users' mobile devices to deliver personalized, contextualized and timely behavioral-change interventions. We invite submissions and ideas from ITS domains including, but not limited to, multi-modal journey planners, advanced traveler information systems and in-vehicle systems. The expected outcome is a deeper understanding of the challenges and future research directions with respect to behavioral-change support through ITS.
The growing integration of robots in shared environments - such as warehouses, shopping centres, and hospitals - demands a deep understanding of the underlying dynamics and human behaviours, including how, when, and where individuals engage in various activities and interactions. This knowledge goes beyond simple correlation studies and requires a more comprehensive causal analysis. By leveraging causal inference to model cause-and-effect relationships, we can better anticipate critical environmental factors and enable autonomous robots to plan and execute tasks more effectively. To this end, we propose a novel causality-based decision-making framework that reasons over a learned causal model to assist the robot in deciding when and how to complete a given task. In the examined use case - i.e., a warehouse shared with people - we exploit the causal model to estimate battery usage and human obstructions as factors influencing the robot's task execution. This reasoning framework supports the robot in making informed decisions about task timing and strategy. To achieve this, we also developed PeopleFlow, a new Gazebo-based simulator designed to model context-sensitive human-robot spatial interactions in shared workspaces. PeopleFlow features realistic human and robot trajectories influenced by contextual factors such as time, environment layout, and robot state, and can simulate a large number of agents. While the simulator is general-purpose, in this paper we focus on a warehouse-like environment as a case study, where we conduct an extensive evaluation benchmarking our causal approach against a non-causal baseline. Our findings demonstrate the efficacy of the proposed solutions, highlighting how causal reasoning enables autonomous robots to operate more efficiently and safely in dynamic environments shared with humans.
Edge caching is a promising solution for next-generation networks by empowering caching units in small-cell base stations (SBSs), which allows user equipments (UEs) to fetch users' requested contents that have been pre-cached in SBSs. It is crucial for SBSs to predict accurate popular contents through learning while protecting users' personal information. Traditional federated learning (FL) can protect users' privacy, but the data discrepancies among UEs can lead to a degradation in model quality. Therefore, it is necessary to train personalized local models for each UE to predict popular contents accurately. In addition, the cached contents can be shared among adjacent SBSs in next-generation networks, thus caching predicted popular contents in different SBSs may affect the cost to fetch contents. Hence, it is critical to determine where the popular contents are cached cooperatively. To address these issues, we propose a cooperative edge caching scheme based on elastic federated and multi-agent deep reinforcement learning (CEFMR) to optimize the cost in the network. We first propose an elastic FL algorithm to train the personalized model for each UE, where an adversarial autoencoder (AAE) model is adopted for training to improve the prediction accuracy; then a popular-content prediction algorithm is proposed to predict the popular contents for each SBS based on the trained AAE model. Finally, we propose a multi-agent deep reinforcement learning (MADRL) based algorithm to decide where the predicted popular contents are collaboratively cached among SBSs. Our experimental results demonstrate the superiority of our proposed scheme over existing baseline caching schemes.
Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning
Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose $\textbf{Shielded Multi-Agent Reinforcement Learning (SMARL)}$ as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) comprehensive evaluation across symmetric and asymmetrically shielded $n$-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
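The single-agent PLS mechanism that SMARL builds on can be sketched as rescaling a base policy by per-action safety probabilities; the fallback behavior when no action is deemed safe is an assumption of this sketch, not part of the paper:

```python
import numpy as np

def shield_policy(pi, p_safe):
    """Weight each action's probability by the probability that it
    satisfies the (probabilistic logic) safety specification, then
    renormalize so the result is again a distribution."""
    w = pi * p_safe
    total = w.sum()
    if total == 0.0:  # every action judged unsafe: keep the base policy
        return pi
    return w / total

pi = np.array([0.5, 0.3, 0.2])        # base policy over three actions
p_safe = np.array([0.1, 0.9, 0.9])    # shield's safety estimates
shielded = shield_policy(pi, p_safe)  # probability mass shifts off action 0
```

The PLTD update described in the abstract goes further by folding such constraints into the value update itself rather than only reshaping the sampled policy.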
Multi-agent systems (MAS) need to cope adaptively with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot easily handle these challenges, owing to the complexity of the state and task space. Social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, from other agents, and from the agent's intrinsic motivation; these are collectively referred to as social forces. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method in which we employ denoising score matching to learn social gradient fields (SocialGFs) from offline samples, e.g. the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms, e.g. MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without online interaction, 2) they transfer across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they scale with an increasing number of agents.
This paper introduces four new algorithms for tackling multi-agent reinforcement learning (MARL) problems in cooperative settings. All algorithms are based on the Deep Quality-Value (DQV) family of algorithms, a set of techniques that have proven successful on single-agent reinforcement learning (SARL) problems. The key idea of DQV algorithms is to jointly learn an approximation of the state-value function $V$ alongside an approximation of the state-action value function $Q$. We follow this principle and generalise these algorithms by introducing two fully decentralised MARL algorithms (IQV and IQV-Max) and two algorithms based on the centralised-training-with-decentralised-execution paradigm (QVMix and QVMix-Max). We compare our algorithms with state-of-the-art MARL techniques on the popular StarCraft Multi-Agent Challenge (SMAC) environment. We show competitive results when QVMix and QVMix-Max are compared to well-known MARL techniques such as QMIX and MAVEN, and show that QVMix can even outperform them on some of the tested environments, making it the best-performing algorithm overall. We hypothesise that this is because QVMix suffers less from the overestimation bias of the $Q$ function.
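The core DQV idea, regressing both estimators toward a target bootstrapped from $V$ rather than from $\max_a Q$, can be shown in stripped-down tabular form. The paper's algorithms use neural networks, target networks, and (for QVMix) a mixing network; this sketch illustrates only the update rule:

```python
def dqv_update(Q, V, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular DQV step: Q(s, a) and V(s) share the target
    r + gamma * V(s'), so neither bootstraps from max_a Q(s', a),
    the term responsible for DQN's overestimation bias."""
    target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))

Q, V = {}, {}
for _ in range(200):  # single transition: s0 --a0--> terminal with reward 1
    dqv_update(Q, V, "s0", "a0", 1.0, None, True)
```

After repeated updates both estimates converge to the true return of 1; the DQV-Max variants mentioned in the abstract instead bootstrap $V$ from $\max_a Q$.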
This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-world data, the simulation models individual behaviors and large-scale mobility patterns in Taipei City. Key insights, such as route heat maps and mode-specific indicators, provide urban planners with actionable information for policy-making. Future work focuses on establishing robust validation frameworks to ensure accuracy and reliability in urban planning applications.
Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, yet access to such data is frequently hindered by privacy concerns. This research introduces a transformative framework for generating large-scale urban mobility trajectories, employing a novel application of a transformer-based model pre-trained and fine-tuned through a two-phase process. Initially, trajectory generation is conceptualized as an offline reinforcement learning (RL) problem, with a significant reduction in vocabulary space achieved during tokenization. The integration of Inverse Reinforcement Learning (IRL) allows for the capture of trajectory-wise reward signals, leveraging historical data to infer individual mobility preferences. Subsequently, the pre-trained model is fine-tuned using the constructed reward model, effectively addressing the challenges inherent in traditional RL-based autoregressive methods, such as long-term credit assignment and handling of sparse reward environments. Comprehensive evaluations on multiple datasets illustrate that our framework markedly surpasses existing models in terms of reliability and diversity. Our findings not only advance the field of urban mobility modeling but also provide a robust methodology for simulating urban data, with significant implications for traffic management and urban development planning. The implementation is publicly available at https://github.com/Wangjw6/TrajGPT_R.
The massive digital footprints generated by bike-sharing systems in megacities like Shanghai offer a novel perspective on the urban socio-economic fabric. This study investigates whether these daily mobility patterns can quantitatively map the city's underlying social stratification. To overcome the persistent challenge of acquiring fine-grained socio-economic data, we constructed a multi-layered analytical dataset. We annotated 2,000 raw bike trips with local economic attributes, derived from a novel data enrichment methodology that employs a Large Language Model (LLM), and integrated contextual features of the built environment. A Random Forest model was then utilized as an interpretable framework to determine the key factors governing the relationship between mobility behavior and local economic status. The analysis reveals a compelling and unambiguous finding: a neighborhood's economic level, proxied by housing prices, is the single most dominant predictor of its bike-sharing patterns, substantially outweighing other geographic or temporal factors. This economic determinism manifests in three distinct ways: (1) a spatial clustering of resources, a phenomenon we term the "club effect", which concentrates mobility infrastructure and usage in affluent areas; (2) a functional dichotomy between necessity-driven, utilitarian usage in lower-income zones and flexible, recreational usage in wealthier ones; and (3) a nuanced inverted U-shaped adoption curve that identifies the urban middle class as the system's primary user base.
The trend for Urban Air Mobility (UAM) is growing with prospective air taxis, parcel deliverers, and medical and industrial services. Safe and efficient UAM operation relies on timely communication and reliable data exchange. In this paper, we explore Cooperative Perception (CP) for Unmanned Aircraft Systems (UAS), considering the unique communication needs involving high dynamics and a large number of UAS. We propose a hybrid approach combining local broadcast with a central CP service, inspired by centrally managed U-space and broadcast mechanisms from automotive and aviation domains. In a simulation study, we show that our approach significantly enhances the environmental awareness for UAS compared to fully distributed approaches, with an increased communication channel load, which we also evaluate. These findings prompt a discussion on communication strategies for CP in UAM and the potential of a centralized CP service in future research.
Urban traffic flow is governed by the complex, nonlinear interaction between land use configuration and spatiotemporally heterogeneous mobility demand. Conventional global regression and time-series models cannot simultaneously capture these multi-scale dynamics across multiple travel modes. This study proposes a GeoAI Hybrid analytical framework that sequentially integrates Multiscale Geographically Weighted Regression (MGWR), Random Forest (RF), and Spatio-Temporal Graph Convolutional Networks (ST-GCN) to model the spatiotemporal heterogeneity of traffic flow patterns and their interaction with land use across three mobility modes: motor vehicle, public transit, and active transport. Applying the framework to an empirically calibrated dataset of 350 traffic analysis zones across six cities spanning two contrasting urban morphologies, four key findings emerge: (i) the GeoAI Hybrid achieves a root mean squared error (RMSE) of 0.119 and an R^2 of 0.891, outperforming all benchmarks by 23-62%; (ii) SHAP analysis identifies land use mix as the strongest predictor for motor vehicle flows and transit stop density as the strongest predictor for public transit; (iii) DBSCAN clustering identifies five functionally distinct urban traffic typologies with a silhouette score of 0.71, and GeoAI Hybrid residuals exhibit Moran's I=0.218 (p<0.001), a 72% reduction relative to OLS baselines; and (iv) cross-city transfer experiments reveal moderate within-cluster transferability (R^2>=0.78) and limited cross-cluster generalisability, underscoring the primacy of urban morphological context. The framework offers planners and transportation engineers an interpretable, scalable toolkit for evidence-based multimodal mobility management and land use policy design.
Micromobility, which utilizes lightweight mobile machines moving in urban public spaces, such as delivery robots and mobility scooters, emerges as a promising alternative to vehicular mobility. Current micromobility depends mostly on human manual operation (in-person or remote control), which raises safety and efficiency concerns when navigating busy urban environments full of unpredictable obstacles and pedestrians. Assisting humans with AI agents in maneuvering micromobility devices presents a viable solution for enhancing safety and efficiency. In this work, we present a scalable urban simulation solution to advance autonomous micromobility. First, we build URBAN-SIM - a high-performance robot learning platform for large-scale training of embodied agents in interactive urban scenes. URBAN-SIM contains three critical modules: Hierarchical Urban Generation pipeline, Interactive Dynamics Generation strategy, and Asynchronous Scene Sampling scheme, to improve the diversity, realism, and efficiency of robot learning in simulation. Then, we propose URBAN-BENCH - a suite of essential tasks and benchmarks to gauge various capabilities of the AI agents in achieving autonomous micromobility. URBAN-BENCH includes eight tasks based on three core skills of the agents: Urban Locomotion, Urban Navigation, and Urban Traverse. We evaluate four robots with heterogeneous embodiments, such as the wheeled and legged robots, across these tasks. Experiments on diverse terrains and urban structures reveal each robot's strengths and limitations.
Urban Air Mobility (UAM) offers a solution to current traffic congestion by providing on-demand air mobility in urban areas. Effective traffic management is crucial for efficient operation of UAM systems, especially for high-demand scenarios. In this paper, we present a centralized traffic management framework for on-demand UAM systems. Specifically, we provide a scheduling policy, called VertiSync, which schedules the aircraft for either servicing trip requests or rebalancing in the system subject to aircraft safety margins and energy requirements. We characterize the system-level throughput of VertiSync, which determines the demand threshold at which passenger waiting times transition from being stabilized to being increasing over time. We show that the proposed policy is able to maximize throughput for sufficiently large fleet sizes. We demonstrate the performance of VertiSync through a case study for the city of Los Angeles, and show that it significantly reduces passenger waiting times compared to a first-come first-serve scheduling policy.
Urban Artificial Intelligence (Urban AI) has advanced human-centered urban tasks such as perception prediction and human dynamics. Large Language Models (LLMs) can integrate multimodal inputs to address heterogeneous data in complex urban systems but often underperform on domain-specific tasks. Urban-MAS, an LLM-based Multi-Agent System (MAS) framework, is introduced for human-centered urban prediction under zero-shot settings. It includes three agent types: Predictive Factor Guidance Agents, which prioritize key predictive factors to guide knowledge extraction and enhance the effectiveness of compressed urban knowledge in LLMs; Reliable UrbanInfo Extraction Agents, which improve robustness by comparing multiple outputs, validating consistency, and re-extracting when conflicts occur; and Multi-UrbanInfo Inference Agents, which integrate extracted multi-source information across dimensions for prediction. Experiments on running-amount prediction and urban perception across Tokyo, Milan, and Seattle demonstrate that Urban-MAS substantially reduces errors compared to single-LLM baselines. Ablation studies indicate that Predictive Factor Guidance Agents are most critical for enhancing predictive performance, positioning Urban-MAS as a scalable paradigm for human-centered urban AI prediction. Code is available on the project website: https://github.com/THETUREHOOHA/UrbanMAS
The growing integration of urban air mobility (UAM) for urban transportation and delivery has accelerated due to increasing traffic congestion and its environmental and economic repercussions. Efficiently managing the anticipated high-density air traffic in cities is critical to ensure safe and effective operations. In this study, we propose a routing and scheduling framework to address the needs of a large fleet of UAM vehicles operating in urban areas. Using mathematical optimization techniques, we plan efficient and deconflicted routes for a fleet of vehicles. Formulating route planning as a maximum weighted independent set problem enables us to utilize various algorithms and specialized optimization hardware, such as quantum annealers, which has seen substantial progress in recent years. Our method is validated using a traffic management simulator tailored for the airspace in Singapore. Our approach enhances airspace utilization by distributing traffic throughout a region. This study broadens the potential applications of optimization techniques in UAM traffic management.
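The maximum-weighted-independent-set formulation can be illustrated with a simple greedy heuristic over a conflict graph of candidate routes. The paper targets exact solvers and quantum annealers; greedy selection is only a stand-in here, and the route ids and weights are made up:

```python
def greedy_mwis(weights, conflicts):
    """Repeatedly accept the highest-weight remaining route and drop
    every route that conflicts with it, yielding an independent
    (mutually deconflicted) set of routes."""
    remaining = set(weights)
    chosen = []
    while remaining:
        best = max(remaining, key=lambda r: weights[r])
        chosen.append(best)
        remaining -= conflicts.get(best, set()) | {best}
    return chosen

weights = {"r1": 5.0, "r2": 4.0, "r3": 3.0}                   # route priorities
conflicts = {"r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2"}}  # shared airspace
selected = greedy_mwis(weights, conflicts)  # r1 and r3 can fly together
```

Each vertex is a candidate route, each edge a spatial or temporal conflict, so any independent set is a deconflicted flight plan; the optimization then maximizes total weight (e.g. served demand).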
Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion, among others. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AV), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs with encouraging results. However, most of those studies focus on a single-vehicle setting, and lane-changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety and driver comfort.
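A multi-objective reward of the kind described, trading off efficiency, comfort, and safety, can be sketched as a weighted sum; the terms and weights below are illustrative placeholders, not the paper's exact design:

```python
def lane_change_reward(speed, max_speed, jerk, collision,
                       w_eff=1.0, w_comfort=0.2, w_safety=10.0):
    """Combine normalized progress (a proxy for fuel efficiency), a jerk
    penalty (driving comfort), and a collision penalty (safety)."""
    r_eff = speed / max_speed          # in [0, 1] for speed <= max_speed
    r_comfort = -abs(jerk)             # smoother trajectories score higher
    r_safety = -1.0 if collision else 0.0
    return w_eff * r_eff + w_comfort * r_comfort + w_safety * r_safety

r = lane_change_reward(speed=25.0, max_speed=30.0, jerk=0.5, collision=False)
```

Making this a *local* reward, computed from each AV's neighborhood rather than from global traffic state, is what allows the decentralized MA2C training described above.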
Preventing traffic congestion by forecasting near-term traffic flows is an important problem, as it leads to the effective use of transport resources. Social networks provide information about human activities and social events; with their help, we can infer which people will attend a particular event in the near future and estimate the resulting traffic flow. This opens up a wide area of research and calls for a traffic-management framework that can capture essential parameters of real-life behaviour and provide a way to iterate on and evaluate new ideas. In this paper, we present the building blocks of a framework and a system to simulate a city with its transport system, its inhabitants and their social network. We emphasize the selection of relevant parameters and the modular design of the framework. Our framework defines metrics to evaluate congestion-avoidance strategies. To show the utility of the framework, we present experimental studies of a few strategies on a public transport system.
Platooning of multiple autonomous vehicles has attracted significant attention in both academia and industry. Despite its great potential, platooning is not the only choice for the formation of autonomous vehicles in mixed traffic flow, where autonomous vehicles and human-driven vehicles (HDVs) coexist. In this paper, we investigate the optimal formation of autonomous vehicles that can achieve an optimal system-wide performance in mixed traffic flow. Specifically, we consider the optimal $\mathcal{H}_2$ performance of the entire traffic flow, reflecting the potential of autonomous vehicles in mitigating traffic perturbations. Then, we formulate the optimal formation problem as a set function optimization problem. Numerical results reveal two predominant optimal formations: uniform distribution and platoon formation, depending on traffic parameters. In addition, we show that 1) the prevailing platoon formation is not always the optimal choice; 2) platoon formation might be the worst choice when HDVs have a poor string stability behavior. These results suggest more opportunities for the formation of autonomous vehicles, beyond platooning, in mixed traffic flow.
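The set-function view can be illustrated by brute-force enumeration of AV placements. The toy cost below (penalizing adjacent AVs) merely stands in for the $\mathcal{H}_2$ norm of the mixed traffic flow, which requires the full dynamics model:

```python
from itertools import combinations

def best_formation(n, k, cost):
    """Evaluate a set-function cost on every placement of k autonomous
    vehicles among n slots and return the minimizer."""
    best_set, best_val = None, float("inf")
    for S in combinations(range(n), k):
        val = cost(frozenset(S))
        if val < best_val:
            best_set, best_val = frozenset(S), val
    return best_set, best_val

def toy_cost(S):
    """Placeholder cost: count adjacent AV pairs, so spreading out wins."""
    return sum(1.0 for i in S if i + 1 in S)

formation, value = best_formation(n=6, k=3, cost=toy_cost)  # -> {0, 2, 4}
```

Under this toy cost the uniform distribution beats the platoon, echoing the paper's finding that platooning is not always the optimal formation.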
Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver outperforms state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability compared to these methods.
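The tool-library-plus-memory architecture can be sketched as a minimal function-calling loop. Everything here is hypothetical: the tool names, the state fields, and the rule standing in for the LLM; a real system would prompt an LLM to select tools, consult memory, and produce the plan:

```python
def get_speed(state):        # hypothetical perception tool
    return state["ego_speed"]

def detect_obstacle(state):  # hypothetical detection tool
    return state["obstacle_ahead"]

TOOLS = {"get_speed": get_speed, "detect_obstacle": detect_obstacle}
MEMORY = []  # experiential memory of (observation, decision) pairs

def reason(state):
    """Stand-in for the LLM reasoning engine: invoke the tool library,
    decide, and record the episode for later self-reflection. A real
    agent would replace the if/else with an LLM call."""
    obs = {name: tool(state) for name, tool in TOOLS.items()}
    decision = "brake" if obs["detect_obstacle"] else "cruise"
    MEMORY.append((obs, decision))
    return decision
```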
Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for Kernel Ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.
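The underlying problem, kernel ridge regression whose ridge parameter must be re-tuned as serially correlated data arrives, can be sketched in a few lines. The one-step-ahead grid search below is a crude stand-in for the paper's principled online tuning algorithm, shown only to make the setting concrete:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krr_predict(xs, ys, lam, x_new, gamma=1.0):
    """Kernel ridge regression with an RBF kernel: solve (K + lam*I)a = y,
    then predict via the kernel expansion."""
    n = len(xs)
    K = [[math.exp(-gamma * (xi - xj) ** 2) for xj in xs] for xi in xs]
    for i in range(n):
        K[i][i] += lam
    alpha = solve(K, ys)
    return sum(a * math.exp(-gamma * (x_new - xi) ** 2)
               for a, xi in zip(alpha, xs))

def pick_lambda(xs, ys, grid=(1e-3, 1e-2, 1e-1, 1.0)):
    """Crude hyperparameter tuning by one-step-ahead validation error
    over a small grid (a stand-in, not the paper's online algorithm)."""
    def err(lam):
        return sum((krr_predict(xs[:t], ys[:t], lam, xs[t]) - ys[t]) ** 2
                   for t in range(2, len(xs)))
    return min(grid, key=err)
```

The computational point of the paper is precisely that repeating such a full re-fit per candidate value online is expensive, motivating a cheaper principled update.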
Transportation and traffic are currently undergoing a rapid increase in both scale and complexity. At the same time, an increasing share of traffic participants are being transformed into agents driven or supported by artificial intelligence, resulting in mixed-intelligence traffic. This work explores the implications of distributed decision-making in mixed-intelligence traffic. The investigations are carried out on the basis of an online-simulated highway scenario, namely the MIT \emph{DeepTraffic} simulation. In the first step, traffic agents are trained by means of a deep reinforcement learning approach deployed inside an elitist evolutionary algorithm for hyperparameter search. The resulting architectures and training parameters are then utilized either to train a single autonomous traffic agent and transfer the learned weights to a multi-agent scenario, or to conduct multi-agent learning directly. Both learning strategies are evaluated on different ratios of mixed-intelligence traffic. The strategies are assessed according to the average speed of all agents driven by artificial intelligence. Traffic patterns that provoke a reduction in traffic flow are analyzed with respect to the different strategies.
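The elitist hyperparameter-search loop can be sketched as follows. The fitness function here is a toy quadratic standing in for the average speed obtained after DRL training, and the two hyperparameter names are hypothetical; only the (1+λ)-style elitist loop itself is the point:

```python
import random

def evaluate(params):
    """Hypothetical fitness: in the paper this would be the average speed
    achieved after DRL training with these hyperparameters; here a toy
    quadratic with optimum at lr=0.1, layers=3."""
    return -((params["lr"] - 0.1) ** 2) - (params["layers"] - 3) ** 2

def mutate(params, rng):
    child = dict(params)
    child["lr"] = max(1e-4, child["lr"] + rng.gauss(0, 0.02))
    child["layers"] = max(1, child["layers"] + rng.choice((-1, 0, 1)))
    return child

def elitist_search(generations=30, pop_size=8, seed=0):
    """Elitist evolutionary loop: the best individual always survives,
    and the rest of the population is refilled with its mutants."""
    rng = random.Random(seed)
    pop = [{"lr": rng.uniform(0.0, 0.5), "layers": rng.randint(1, 6)}
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elite = pop[0]
        pop = [elite] + [mutate(elite, rng) for _ in range(pop_size - 1)]
    return max(pop, key=evaluate)
```

Because the elite is never discarded, the best fitness found is non-decreasing over generations, which is what makes the scheme attractive when each evaluation (a full DRL training run) is expensive.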
With the forecasted emergence of autonomous vehicles in urban traffic networks, new control policies are needed to leverage their potential for reducing congestion. While several efforts have studied the fully autonomous traffic control problem, there is a lack of models addressing the more imminent transitional stage wherein legacy and autonomous vehicles share the urban infrastructure. We address this gap by introducing a new policy for stochastic network traffic control involving both classes of vehicles. We conjecture that network links will have dedicated lanes for autonomous vehicles which provide access to traffic intersections and combine traditional green signal phases with autonomous vehicle-restricted signal phases named blue phases. We propose a new pressure-based, decentralized, hybrid network control policy that activates selected movements at intersections based on the solution of mixed-integer linear programs. We prove that the proposed policy is stable, i.e. maximizes network throughput, under conventional travel demand conditions. We conduct numerical experiments to test the proposed policy under varying proportions of autonomous vehicles. Our experiments reveal that considerable trade-offs exist in terms of vehicle-class travel time based on the level of market penetration of autonomous vehicles. Further, we find that the proposed hybrid network control policy improves on traditional green phase traffic signal control for high levels of congestion, thus helping in quantifying the potential benefits of autonomous vehicles in urban networks.
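The pressure-based idea can be illustrated with the classic max-pressure rule. Note this is the generic rule, not the paper's MILP-based hybrid policy with blue phases, and the queue numbers below are purely illustrative:

```python
# Queues per movement: (upstream_queue, downstream_queue) for each
# movement a phase serves; the numbers are illustrative only.
PHASES = {
    "NS_through": [(12, 3), (9, 4)],   # two north-south movements
    "EW_through": [(5, 6), (4, 2)],
}

def pressure(movements):
    """Pressure of a phase: sum of (upstream - downstream) queue
    differences over the movements that the phase activates."""
    return sum(up - down for up, down in movements)

def select_phase(phases):
    """Decentralized pressure-based rule: each intersection activates
    the phase with maximum pressure, using only local queue counts."""
    return max(phases, key=lambda p: pressure(phases[p]))
```

Pressure-based rules of this family are what underlie the throughput (stability) guarantees the paper proves for its richer MILP-based policy.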
Vehicular traffic is a classical example of a multi-agent system in which autonomous drivers operate in a shared environment. The article provides an overview of the state-of-the-art in microscopic traffic modeling and the implications for simulation techniques. We focus on the short-time dynamics of car-following models which describe continuous feedback control tasks (acceleration and braking) and models for discrete-choice tasks as a response to the surrounding traffic. The driving style of an agent is characterized by model parameters such as reaction time, desired speed, desired time gap, anticipation etc. In addition, internal state variables corresponding to the agent's "mind" are used to incorporate the driving experiences. We introduce a time-dependency of some parameters to describe the frustration of drivers being in a traffic jam for a while. Furthermore, the driver's behavior is externally influenced by the neighboring vehicles and also by environmental input such as limited motorization and braking power, visibility conditions and road traffic regulations. A general approach for dealing with discrete decision problems in the context of vehicular traffic is introduced and applied to mandatory and discretionary lane changes. Furthermore, we consider the decision process whether to brake or not when approaching a traffic light turning from green to amber. Another aspect of vehicular traffic is related to the heterogeneity of drivers. We discuss a hybrid system of coupled vehicle and information flow which can be used for developing and testing applications of upcoming inter-vehicle communication techniques.
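A canonical instance of the continuous car-following models discussed above is the Intelligent Driver Model, whose parameters map directly onto the driving-style quantities named in the abstract (desired speed, desired time gap, comfortable braking). A minimal sketch, not tied to any specific model in the article:

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0):
    """Intelligent Driver Model acceleration.
    v: own speed, gap: bumper-to-bumper distance to the leader,
    dv: approach rate (v - v_leader). Parameters: desired speed v0,
    desired time gap T, max acceleration a, comfortable braking b,
    minimum gap s0."""
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)
```

The time-dependent "frustration" effect mentioned above could be modeled by letting parameters such as v0 or T drift as a function of time spent in the jam.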
This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.
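The Hankelization step can be illustrated on a 1-D series. The paper works with a Hankelized traffic tensor; this sketch builds the plain Hankel matrix of a single series, i.e. the time-structured view whose low-rank subspace would then be tracked:

```python
def hankelize(series, window):
    """Hankel matrix of a 1-D series: column t holds the window of
    consecutive samples starting at time t, so anti-diagonals are
    constant and temporal structure becomes low-rank structure."""
    n = len(series) - window + 1
    return [[series[t + i] for t in range(n)] for i in range(window)]
```

For a smooth or periodic traffic series, the resulting matrix is approximately low-rank, which is what makes subspace tracking on the Hankelized data effective for separating normal flows from sparse anomalies.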
Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). The problem is redefined as finding an optimal policy on the product of PL-POMDP with LDGBA based on model-checking techniques to satisfy the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and task recognition. Our contributions include the proposed method, the utilization of LTL and LDGBA, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method by conducting simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results demonstrate that our proposed method effectively addresses environment, action, and observation uncertainties. This indicates its potential for real-world applications, including the control of unmanned aerial vehicles (UAVs).
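The product construction at the heart of the method can be sketched with a plain two-state automaton in place of the LDGBA; the task ("eventually reach the goal"), the labels, and the toy environment below are all hypothetical:

```python
# Hypothetical automaton for "eventually reach goal":
# q0 --goal--> q1 (accepting); every other label self-loops.
def automaton_step(q, label):
    if q == "q0" and label == "goal":
        return "q1"
    return q

def product_step(env_step, labeler, state, q, action):
    """One transition of the product process: advance the environment,
    read the label of the new state, advance the automaton. A policy is
    then learned over (state, q) pairs, as in the paper's product of
    the POMDP with the automaton (here a plain DFA, not an LDGBA)."""
    next_state = env_step(state, action)
    next_q = automaton_step(q, labeler(next_state))
    return next_state, next_q
```

Tracking the automaton state alongside the environment state is what turns satisfaction of the temporal-logic task into an ordinary reachability/reward problem that deep Q-learning can optimize.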
This report systematically organizes the intelligent-transportation literature and groups its core research threads into five directions: (1) efficient MARL-based cooperation in traffic signal and flow control; (2) autonomous evolution of self-driving and multi-agent systems in path planning and decision-making; (3) intelligent transportation decision-making and semantic perception empowered by large language models and multimodal AI; (4) deep optimization and assurance of communication resources for air traffic and vehicular networks; (5) in-depth analysis and perception of urban traffic behavior patterns. Overall, intelligent-transportation research is evolving from purely rule-driven methods and algorithmic simulation toward a new form of reasoning-capable large-model interaction and multi-agent cooperative control.