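AI and Autonomous Driving
Autonomous Driving Surveys, Safety, and Industry Trends
Covers high-level surveys of the autonomous driving field, analyses of technical challenges, AI safety (e.g., adversarial attacks), explainability, and discussion of future trends.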
- Autonomous Driving Technology Trend and Future Outlook: Powered by Artificial Intelligence(Sanmin Kim, Youngseok Kim, Hyeongseok Jeon, Dongsuk Kum, Kibeom Lee, 2022, Transaction of the Korean Society of Automotive Engineers)
- Trustworthy Artificial Intelligence Requirements in the Autonomous Driving Domain(D. Fernández-Llorca, Emilia Gómez, 2023, Computer)
- Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems(Jiqian Dong, Sikai Chen, Mohammad Miralinaghi, Tiantian Chen, Pei Li, S. Labi, 2023, Transportation Research Part C: Emerging Technologies)
- A Review of Machine Learning Techniques Utilised in Self-Driving Cars(Zahraa Salah Dhaif, Nidhal K. El Abbadi, 2024, Iraqi Journal for Computer Science and Mathematics)
- A Survey on Theories and Applications for Self-Driving Cars Based on Deep Learning Methods(J. Ni, Yinan Chen, Yuanchun Chen, Jinxiu Zhu, D. Ali, Weidong Cao, 2020, Applied Sciences)
- Autonomous Driving Driven by Artificial Intelligence: Development Status and Future Prospects(Li Geng, 2025, Computers and Artificial Intelligence)
- A Comprehensive Review on Deep Learning-Based Motion Planning and End-to-End Learning for Self-Driving Vehicle(Manikandan Ganesan, S. Kandhasamy, Bharatiraja Chokkalingam, L. Mihet-Popa, 2024, IEEE Access)
- Deep Learning for Self-Driving Cars: Chances and Challenges(Qing Rao, Jelena Frtunikj, 2018, Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems)
- Driverless Car Using AI(P Acharjee, 2024, Artificial Intelligence for Multimedia Information …)
- A STUDY ON DRIVERLESS CAR TECHNOLOGIES AND IMPLEMENTATION OF A SENSOR-BASED SPEED-CONTROLLED AUTONOMOUS VEHICLE(Md. Tanjil, I. Khan, 2023, International Research Journal of Modernization in Engineering Technology and Science)
- Attacks on Machine Learning: Adversarial Examples in Connected and Autonomous Vehicles(Prinkle Sharma, David Austin, Hong Liu, 2019, 2019 IEEE International Symposium on Technologies for Homeland Security (HST))
- Autonomous Vision of Driverless car in Machine Learning(Jiaxuan Lu, 2022, Advances in Economics, Business and Management Research)
- Autonomous decision making for a driver-less car(N. Gallardo, N. Gamez, P. Rad, Mo M. Jamshidi, 2017, 2017 12th System of Systems Engineering Conference (SoSE))
- Autonomous driving system: A comprehensive survey(Jingyuan Zhao, Wenyi Zhao, Bo Deng, Zhenghong Wang, Fengwangdong Zhang, Wenxiang Zheng, Wanke Cao, Jinrui Nan, Yubo Lian, Andrew F. Burke, 2023, Expert Systems with Applications)
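Environment Perception, High-Precision Localization, and SLAM
Focuses on computer vision, sensor fusion (LiDAR/vision/GPS/inertial navigation), SLAM algorithms, and semantic segmentation, aiming to solve vehicle localization, mapping, and obstacle recognition in complex environments.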
- LiDAR-Visual Fusion SLAM for Autonomous Vehicle Location(Qinglu Ma, Qiuwei Jian, Meiqiang Li, Saleem Ullah, 2025, IEEE Internet of Things Journal)
- Traffic Light Detection and Recognition for Self Driving Cars Using Deep Learning(Ruturaj Kulkarni, Shruti Dhavalikar, S. Bangar, 2018, 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA))
- A Survey of Computer Vision Detection, Visual SLAM Algorithms, and Their Applications in Energy-Efficient Autonomous Systems(Lu Chen, Gun Li, Weisi Xie, Jie Tan, Yang Li, Junfeng Pu, Lizhu Chen, D. Gan, W. Shi, 2024, Energies)
- Deep Learning for Visual SLAM in Transportation Robotics: A review(Chaojing Duan, S. Junginger, Jiahao Huang, Kairong Jin, K. Thurow, 2019, Transportation Safety and Environment)
- Robust Autonomous Vehicle Computer-Vision-Based Localization in Challenging Environmental Conditions(Sergei Chuprov, P. Belyaev, Ruslan Gataullin, Leon Reznik, E. Neverov, Ilia Viksnin, 2023, Applied Sciences)
- Event-Based Vision Meets Deep Learning on Steering Prediction for Self-Driving Cars(A. I. Maqueda, Antonio Loquercio, Guillermo Gallego, N. García, D. Scaramuzza, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition)
- Image Processing & M.L Based Driverless Car with Image Detection System(Ketan D. Bodhe, H. Taiwade, Janhvi Ingle, 2021, IJARCCE)
- Real-Time Computer Vision/DGPS-Aided Inertial Navigation System for Lane-Level Vehicle Navigation(Anh Vu, Arvind Ramanandan, Anning Chen, J. Farrell, M. Barth, 2012, IEEE Transactions on Intelligent Transportation Systems)
- Precise positioning and prediction system for autonomous driving based on generative artificial intelligence(Beichang Liu, Guoqing Cai, Zhipeng Ling, Jili Qian, Quan Zhang, 2024, Applied and Computational Engineering)
- HOOFR SLAM System: An Embedded Vision SLAM Algorithm and Its Hardware-Software Mapping-Based Intelligent Vehicles Applications(D. Nguyen, A. Elouardi, Sergio Rodríguez Flórez, S. Bouaziz, 2019, IEEE Transactions on Intelligent Transportation Systems)
- Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning(Stefan Milz, Georg Arbeiter, Christian Witt, Bassam Abdallah, S. Yogamani, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Neural Network-Based Recent Research Developments in SLAM for Autonomous Ground Vehicles: A Review(Hajira Saleem, R. Malekian, Hussan Munir, 2023, IEEE Sensors Journal)
- Visual SLAM for autonomous ground vehicles(Henning Lategahn, Andreas Geiger, B. Kitt, 2011, 2011 IEEE International Conference on Robotics and Automation)
- A Driverless Vehicle Vision Path Planning Algorithm for Sensor Fusion(Xinyu Wang, 2019, 2019 IEEE 2nd International Conference on Automation, Electronics and Electrical Engineering (AUTEEE))
- Map Construction and Path Planning Method for a Mobile Robot Based on Multi-Sensor Information Fusion(Aijuan Li, Jiaping Cao, Shunming Li, Zhen Huang, Jinbo Wang, Gang Liu, 2022, Applied Sciences)
- State of the Art in Vision-Based Localization Techniques for Autonomous Navigation Systems(Yusra Alkendi, L. Seneviratne, Yahya Zweiri, 2021, IEEE Access)
- A Review of SLAM Techniques and Security in Autonomous Driving(Ashutosh Singandhupe, Hung M. La, 2019, 2019 Third IEEE International Conference on Robotic Computing (IRC))
- Visual SLAM algorithms: a survey from 2010 to 2016(Takafumi Taketomi, Hideaki Uchiyama, Sei Ikeda, 2017, IPSJ Transactions on Computer Vision and Applications)
- Stereo Visual SLAM for Autonomous Vehicles: A Review(Boyu Gao, H. Lang, Jing Ren, 2020, 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
- A Deep Analysis of Visual SLAM Methods for Highly Automated and Autonomous Vehicles in Complex Urban Environment(Ke Wang, Guoliang Zhao, Jianbo Lu, 2024, IEEE Transactions on Intelligent Transportation Systems)
- Monocular reconstruction of vehicles: Combining SLAM with shape priors(Falak Chhaya, Dinesh Reddy Narapureddy, Sarthak Upadhyay, Visesh Chari, M. Zia, K. Krishna, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA))
- Intelligent Semantic Segmentation for Self-Driving Vehicles Using Deep Learning(Qusay Sellat, S. Bisoy, R. Priyadarshini, Ankit Vidyarthi, Sandeep Kautish, R. Barik, 2022, Computational Intelligence and Neuroscience)
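Trajectory Prediction and Behavior Modeling
Covers deep learning methods for predicting the future trajectories of pedestrians and surrounding vehicles, with emphasis on multimodal interaction, incorporation of driving knowledge, and scene-context understanding.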
- A Review of Deep Learning-Based Methods for Pedestrian Trajectory Prediction(Bogdan Ilie Sighencea, R. Stanciu, C. Căleanu, 2021, Sensors)
- Incorporating Driving Knowledge in Deep Learning Based Vehicle Trajectory Prediction: A Survey(Zhezhang Ding, Huijing Zhao, 2023, IEEE Transactions on Intelligent Vehicles)
- Deep Learning for Vehicle Trajectory Prediction in Intelligent Transportation: Methods, Challenges, and Future Directions(Laixiang Xu, Xiaowei Wang, Xiangjun Chen, Tiwei Zeng, Hao Xi, Xiaojie Du, Junmin Zhao, 2026, Archives of Computational Methods in Engineering)
- Deep Learning-Based Multimodal Trajectory Prediction with Traffic Light(Seoyoung Lee, H. Park, Yeonhwi You, Sungjung Yong, Il-Young Moon, 2023, Applied Sciences)
- Predict the Performance of Driverless Car through the Cognitive Data Analysis and Reliability Analysis based Approach(Vikas Khare, Ankita Jain, 2023, e-Prime - Advances in Electrical Engineering, Electronics and Energy)
- OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model(Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Volker Tresp, Alois Knoll, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- A Survey of Deep Learning-Based Pedestrian Trajectory Prediction: Challenges and Solutions(Jiaming Jiang, Kai Yan, Xindong Xia, Biao Yang, 2025, Sensors)
- Deep‐learning‐based vehicle trajectory prediction: A review(Chenhui Yin, Marco Cecotti, D. Auger, A. Fotouhi, Haobin Jiang, 2025, IET Intelligent Transport Systems)
- Deep Learning Methods for Vehicle Trajectory Prediction: A Survey(Shuvam Shiwakoti, Suryodaya Bikram Shahi, Priya Singh, 2023, Lecture Notes in Networks and Systems)
- Trajectory Prediction of Vehicles Based on Deep Learning(Huatao Jiang, Lin Chang, Qing Li, Dapeng Chen, 2019, 2019 4th International Conference on Intelligent Transportation Engineering (ICITE))
- DeepTrack: Lightweight Deep Learning for Vehicle Trajectory Prediction in Highways(Vinit Katariya, Mohammadreza Baharani, Nichole L. Morris, O. Shoghli, Hamed Tabkhi, 2021, IEEE Transactions on Intelligent Transportation Systems)
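End-to-End Autonomous Driving and Decision, Planning, and Control
Explores end-to-end learning frameworks that map sensor inputs to control commands, as well as modular path planning, dynamic obstacle avoidance, and decision and control algorithms, covering Transformers, diffusion models, and V2X cooperation.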
- End-to-End Autonomous Driving Through V2X Cooperation(Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie, 2025, Proceedings of the AAAI Conference on Artificial Intelligence)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving(Aditya Prakash, Kashyap Chitta, Andreas Geiger, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Real-Time Self-Driving Car Navigation Using Deep Neural Network(Truong-Dong Do, Minh-Thien Duong, Q. Dang, M. Le, 2018, 2018 4th International Conference on Green Technology and Sustainable Development (GTSD))
- Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving(Xiaosong Jia, Qifeng Li, Junchi Yan, Zhenjie Yang, Zhiyuan Zhang, 2024, Advances in Neural Information Processing Systems 37)
- DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving(Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang, 2024, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Simulation of Self Driving Car Using Deep Learning(S. Lade, Parth Shrivastav, S. Waghmare, Sudarshan Hon, Sushil Waghmode, Shubham Teli, 2021, 2021 International Conference on Emerging Smart Computing and Informatics (ESCI))
- Self Driving Car using Deep Learning Technique(Chirag Sharma, S. Bharathiraja, G. Anusooya, 2020, International Journal of Engineering Research and)
- Neural Network Based Heterogeneous Sensor Fusion for Robot Motion Planning(Bijo Sebastian, Hailin Ren, Pinhas Ben-Tzvi, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Environment-aware sensor fusion for obstacle detection(Adrian Rechy Romero, P. Borges, A. Elfes, Andreas Pfrunder, 2016, 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI))
- StyleDrive: Towards Driving-Style Aware Benchmarking of End-To-End Autonomous Driving(Ruiyang Hao, Bowen Jing, Haibao Yu, Zaiqing Nie, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- A Method to Plan the Path of a Robot Utilizing Deep Reinforcement Learning and Multi-Sensory Information Fusion(Jieren Tan, 2023, Applied Artificial Intelligence)
- Simulation of Self-driving Car using Deep Learning(Aman Bhalla, Munipalle Sai Nikhila, Pradeep Singh, 2020, 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS))
- Development of Flexible Autonomous Car System Using Machine Learning and Blockchain(S. Ramachandran, A. K. Veeraraghavan, Uvais Karni, Kalpathy Sivaraman, 2018, Lecture Notes in Electrical Engineering)
- Navigating the Rules: Integrating TD3 and Sensor Fusion for Traffic-Aware Autonomous Vehicle Path Planning(Mahmoud Elsayed, Amr El-Mougy, 2025, International Journal of Intelligent Transportation Systems Research)
- Path planning of mobile robot based on multi-sensor information fusion(R. Xu, 2019, EURASIP Journal on Wireless Communications and Networking)
- Computer Vision based Autonomous Navigation in Controlled Environment(Sarmad Shafique, Samia Abid, F. Riaz, Zainab Ejaz, 2021, 2021 International Conference on Robotics and Automation in Industry (ICRAI))
- Dynamic Collision Avoidance Path Planning for Mobile Robot Based on Multi-sensor Data Fusion by Support Vector Machine(Jingwen Tian, Meijuan Gao, Erhong Lu, 2007, 2007 International Conference on Mechatronics and Automation)
- Hybrid Motion Planning Method for Autonomous Robots Using Kinect Based Sensor Fusion and Virtual Plane Approach in Dynamic Environments(Doopalam Tuvshinjargal, Byambaa Dorj, D. Lee, 2015, Journal of Sensors)
- Path planning algorithm for logistics autonomous vehicles at Cainiao stations based on multi-sensor data fusion(Yan Chen, 2025, PLOS One)
- Deep Learning Techniques for Obstacle Detection and Avoidance in Driverless Cars(N. Sanil, Pasumarthy Ankith Naga Venkat, R. V, Rishab Mallapur, Mohammed Riyaz Ahmed, 2020, 2020 International Conference on Artificial Intelligence and Signal Processing (AISP))
- Designing an Autonomous Vehicle Using Sensor Fusion Based on Path Planning and Deep Learning Algorithms(B. Suprapto, Suci Dwijayanti, Dimsyiar M.A. Hafiz, Farhan A. Ardandy, Javen Jonathan, 2024, SAIEE Africa Research Journal)
- Research on autonomous path planning based on multi-sensors information fusion(J Shen, 2024, … Conference on Automation Control, Algorithm, and …)
- Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?(Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, José M. Álvarez, 2023, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Improving the learning of self-driving vehicles based on real driving behavior using deep neural network techniques(Nayereh Zaghari, M. Fathy, S. M. Jameii, M. Sabokrou, M. Shahverdy, 2020, The Journal of Supercomputing)
- End-to-End Autonomous Driving in CARLA: A Survey(Youssef Al Ozaibi, Manolo Dulva Hina, Amar Ramdane-Cherif, 2024, IEEE Access)
- A Review of End-to-End Autonomous Driving in Urban Environments(Daniel Coelho, Miguel Oliveira, 2022, IEEE Access)
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving(Xiaosong Jia, Peng Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, Hongyang Li, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving(Shuo Xing, C. Qian, Yuping Wang, Hongyuan Hua, Kexin Tian, Yang Zhou, Zhengzhong Tu, 2024, 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW))
- Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art(Joel Janai, Fatma Güney, Aseem Behl, Andreas Geiger, 2020, … in Computer Graphics and Vision)
- A Survey on Hybrid Human-Artificial Intelligence for Autonomous Driving(Huansheng Ning, Rui Yin, A. Ullah, Feifei Shi, 2021, IEEE Transactions on Intelligent Transportation Systems)
- Sensor-Fusion Based Navigation for Autonomous Mobile Robot(Vygantas Ušinskis, Michał Nowicki, Andrius Dzedzickis, V. Bučinskas, 2025, Sensors)
- A Physical Law Constrained Deep Learning Model for Vehicle Trajectory Prediction(Hanchu Li, Ziyi Liao, Yikang Rui, Linchao Li, Bin Ran, 2023, IEEE Internet of Things Journal)
- Experimental Autonomous Road Vehicle with Logical Artificial Intelligence(S. Shadrin, O. Varlamov, A. Ivanov, 2017, Journal of Advanced Transportation)
- Path Planning and Control of Mobile Robot in Road Environments Using Sensor Fusion and Active Force Control(Mohammed A. H. Ali, M. Mailah, 2019, IEEE Transactions on Vehicular Technology)
- A Comprehensive Review on Autonomous Navigation(Saeid Nahavandi, R. Alizadehsani, D. Nahavandi, Shady M. K. Mohamed, N. Mohajer, M. Rokonuzzaman, Ibrahim Hossain, 2022, ACM Computing Surveys)
- Autonomous driving of vehicles based on artificial intelligence(Xianping Gao, X. Bian, 2021, Journal of Intelligent & Fuzzy Systems)
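This report divides the autonomous driving literature into four core areas: first, high-level surveys, safety, and industry trends; second, low-level perception, localization, and SLAM techniques; third, trajectory prediction and behavior modeling for dynamic interactions; and finally, system implementations covering end-to-end learning, path planning, and decision-making and control. This taxonomy clearly traces the field's technical evolution from basic perception to intelligent decision-making, and from modular design to integrated end-to-end systems.
83 related papers in total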
With its rapid development, Internet technology has become a game-changer for the automotive industry. Advances in high-precision maps and their widespread application make accurate real-time positioning of vehicles possible. Meanwhile, the extensive application of intelligent driving technology makes driving easier and more intelligent. This paper comprehensively reviews the application of artificial intelligence (AI) in the field of autonomous driving and also explores innovative studies of other unmanned motion systems. Firstly, the hardware architecture of the autonomous driving system is introduced, comprising five modules: sensing, autonomous driving computer, power supply, signal communication, and execution and braking. In addition, a General Motors autonomous vehicle is used as an example to show how its hardware differs from that of traditional vehicles. Subsequently, the autonomous driving software is divided into four modules according to function: positioning, sensing, planning, and control, and the innovative application of artificial intelligence algorithms is introduced. Finally, this paper extends beyond autonomous driving technology and puts forward an innovative research idea for intelligent unmanned systems.
With the continuous development of Artificial Intelligence (AI), autonomous driving has become a popular research area. AI enables the autonomous driving system to make judgments, which has brought research on autonomous driving into a period of booming development. However, due to the limitations of AI, it is not easy to realize general intelligence, which also limits research on autonomous driving. In this paper, we summarize the existing architectures of autonomous driving and build a taxonomy. Then we introduce the concept of hybrid human-artificial intelligence (H-AI) into a semi-autonomous driving system. To make better use of H-AI, we propose a theoretical architecture based on it. Given our architecture, we classify and review the possible technologies and illustrate H-AI’s improvements, which provides a new perspective for future development. Finally, we identify several open research challenges to encourage researchers to present reliable solutions in this area.
… driving systems, one promising way of building user trust is through the concept of explainable artificial intelligence … enhance trustworthiness in autonomous driving systems through the …
We identify the maturity level of the different requirements for artificial intelligence (AI) in autonomous driving and outline the main challenges to be addressed in the future to ensure that automotive AI systems are developed in a trustworthy way.
… of artificial intelligence (AI) and machine learning in these processes. Overall, the survey presents the rapid progress in the field of autonomous driving, … and efficient autonomous future. …
This article describes some technical issues regarding the adaptation of a production car to a platform for the development and testing of autonomous driving technologies. A universal approach to performing the reverse engineering of electric power steering (EPS) for the purpose of external control is also presented. The primary objective of the related study was to solve the problem associated with the precise prediction of the dynamic trajectory of an autonomous vehicle. This was accomplished by deriving a new equation for determining the lateral tire forces and adjusting some of the vehicle parameters under road test conductions. A Mivar expert system was also integrated into the control system of the experimental autonomous vehicle. The expert system was made more flexible and effective for the present application by the introduction of hybrid artificial intelligence with logical reasoning. The innovation offers a solution to the major problem of liability in the event of an autonomous transport vehicle being involved in a collision.
This paper aims to explore the current status and future development trends of artificial intelligence technology in the field of autonomous driving. By analyzing the application of artificial intelligence technologies such as computer vision, deep learning and reinforcement learning in autonomous driving, this paper shows that autonomous driving is currently a hot topic in society. At present, L2 and L3 autonomous driving systems have been launched. In the future, autonomous driving may develop in the direction of vehicle‒road collaboration and L4 unmanned delivery. In addition, we still face many challenges, such as the accuracy attenuation of computer vision algorithms in extreme weather and the proportion of responsibility between car companies and users in autonomous driving accidents.
… However, in recent years, autonomous driving technology … In particular, the essential components of autonomous driving, … and development of autonomous driving technology will be …
Self-driving systems collect vast amounts of data through a variety of sensors, including cameras, lidar, millimeter-wave radar, and more. This data needs to be processed in real time to identify objects such as roads, vehicles, and pedestrians and to make decisions accordingly. Therefore, this paper discusses the importance of accurate positioning and prediction systems in automatic driving technology, and analyzes the performance of various positioning technologies in automatic driving applications. In addition, the paper explores the application potential of AI technology in autonomous driving and the prospect of combining advanced positioning and prediction systems with generative AI. Overall, this study highlights the importance of algorithm performance improvement and artificial intelligence technology in the development of autonomous driving technology, and provides new ideas and directions for the innovation and development of intelligent transportation systems in the future.
Self-driving cars are a hot research topic in science and technology, which has a great influence on social and economic development. Deep learning is one of the current key areas in the field of artificial intelligence research. It has been widely applied in image processing, natural language understanding, and so on. In recent years, more and more deep learning-based solutions have been presented in the field of self-driving cars and have achieved outstanding results. This paper presents a review of recent research on theories and applications of deep learning for self-driving cars. This survey provides a detailed explanation of the developments of self-driving cars and summarizes the applications of deep learning methods in the field of self-driving cars. Then the main problems in self-driving cars and their solutions based on deep learning methods are analyzed, such as obstacle detection, scene recognition, lane detection, navigation and path planning. In addition, the details of some representative approaches for self-driving cars using deep learning methods are summarized. Finally, the future challenges in the applications of deep learning for self-driving cars are given out.
Self-Driving Vehicles (SDVs) are increasingly popular, with companies like Google, Uber, and Tesla investing significantly in self-driving technology. These vehicles could transform commuting, offering safer, and efficient transport. A key SDV aspect is motion planning, generating secure, and efficient routes. This ensures safe navigation and prevents collisions with obstacles, pedestrians, and other vehicles. Deep Learning (DL) could aid SDV motion planning. AI tools and algorithms, like Artificial Neural Networks (ANNs), Machine Learning (ML) and DL can learn from data to create effective driving strategies, enhancing SDV adaptability to changing conditions for improved safety and efficiency. This survey gives a DL-based motion planning overview for SDVs, covering behaviour planning, trajectory planning, and End to End Learning (E2EL). It assesses various DL-based behaviour and trajectory planning methods, comparing and summarizing them. It also reviews diverse E2EL techniques including Imitation Learning (IL) and Reinforcement Learning (RL) gaining traction lately. Additionally, this review emphasizes the significance of two crucial enablers: datasets and simulation deployment frameworks for SDVs. The survey compares strategies using multiple metrics and highlights DL-based SDV implementation challenges, including simulation and real-world use cases. This article also suggests future research directions to address E2EL and DL-based motion planning limitations. The presented article is an excellent reference for scholars, engineers, and decision-makers who have an interest in DL-based SDV motion planning.
Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the best out of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large scale event-camera dataset (~1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g. challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.
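One common way to feed event-camera output to a convolutional network is to accumulate events into a fixed-size frame over a short time window. The sketch below illustrates that idea only; the event tuple layout, the function name `accumulate_events`, and the two-channel (positive/negative polarity) representation are assumptions made for illustration and are not taken from the abstract above.

```python
from typing import Iterable, List, Tuple

# Hypothetical event record: (x, y, timestamp_s, polarity), polarity in {+1, -1}.
Event = Tuple[int, int, float, int]


def accumulate_events(events: Iterable[Event], width: int, height: int,
                      t_start: float, t_end: float) -> List[List[List[int]]]:
    """Count events per pixel in [t_start, t_end), one channel per polarity.

    Returns frame[channel][row][col]; channel 0 holds positive events,
    channel 1 holds negative events.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, polarity in events:
        if not (t_start <= t < t_end):
            continue  # outside the integration window
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events outside the sensor area
        channel = 0 if polarity > 0 else 1
        frame[channel][y][x] += 1
    return frame


# Toy stream: three events fall inside a 50 ms window, one falls outside it.
events = [(0, 0, 0.001, +1), (0, 0, 0.002, +1), (1, 0, 0.003, -1), (1, 1, 0.060, +1)]
frame = accumulate_events(events, width=2, height=2, t_start=0.0, t_end=0.050)
```

The resulting two-channel count image has the same shape as a small camera frame, which is what lets a standard convolutional architecture consume it.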
Artificial Intelligence (AI) is revolutionizing the modern society. In the automotive industry, researchers and developers are actively pushing deep learning based approaches for autonomous driving. However, before a neural network finds its way into series production cars, it has to first undergo strict assessment concerning functional safety. The chances and challenges of incorporating deep learning for self-driving cars are presented in this paper.
Science and technology researchers are currently focused on the creation of self-driving cars. This can have a profound effect on social and economic progress; self-driving vehicles can help reduce auto accidents dramatically and enhance the quality of life of people the world over. Self-driving cars have seen a tremendous increase in popularity in the recent past because of developments in artificial intelligence. However, a lot of research work remains to be done before fully automated cars can be manufactured, because a self-driving car has to be able to sense its environment and operate without human involvement. A human passenger is not required to take control of the vehicle at any time, nor are they required to be present in the vehicle at all. Currently, self-driving cars are still at level 3 and are not allowed to ply the roads due to many challenges, which usually cause blurred images, including irregular roads and weather factors (rain and fog). This paper is a review study on self-driving cars and examines the obstacles that self-driving cars face, as well as how they might overcome them. The paper provides researchers with information about self-driving cars, the challenges they face, the recent methods used to overcome these challenges, and the advantages, disadvantages, and accuracy of these methods. The paper aims to encourage researchers to work on solving the problems that inhibit the evolution of self-driving vehicles.
In this paper, a monocular vision-based self-driving car prototype using a Deep Neural Network on Raspberry Pi is proposed. Self-driving cars have attracted rapidly growing interest in recent years as the relevant hardware and software technologies develop toward fully autonomous driving capability with no human intervention. Level-3/4 autonomous vehicles are potentially turning into a reality in the near future. Convolutional Neural Networks (CNNs) have been shown to achieve significant performance in various perception and control tasks in comparison to other techniques in recent years. The key factor behind these impressive results is their ability to learn millions of parameters using a large amount of labeled data. In this work, we concentrate on finding a model that directly maps raw input images to a predicted steering angle as output using a deep neural network. The technical contributions of this work are two-fold. First, the CNN model parameters were trained by using data collected from a vehicle platform built with a 1/10 scale RC car, a Raspberry Pi 3 Model B computer and a front-facing camera. The training data were road images paired with the time-synchronized steering angle generated by manual driving. Second, the model was road-tested on the Raspberry Pi, driving itself in an outdoor environment around oval-shaped and 8-shaped tracks lined with traffic signs. The experimental results demonstrate the effectiveness and robustness of the autopilot model in the lane keeping task. The vehicle’s top speed is about 5-6 km/h in a wide variety of driving conditions, regardless of whether lane markings are present or not.
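The abstract above describes training data as road images paired with time-synchronized steering angles. A minimal sketch of one way such pairing could be done is shown here, using nearest-timestamp matching; the function name `pair_frames_with_steering`, the log layout, and the `max_gap_s` tolerance are hypothetical and are not described in the paper.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def pair_frames_with_steering(
    frame_times: Sequence[float],
    steering_log: Sequence[Tuple[float, float]],
    max_gap_s: float = 0.05,
) -> List[Tuple[float, Optional[float]]]:
    """For each frame timestamp, pick the steering sample closest in time.

    steering_log holds (timestamp_s, angle_deg) tuples sorted by timestamp.
    Frames with no sample within max_gap_s get None so they can be dropped
    before training.
    """
    log_times = [t for t, _ in steering_log]
    pairs: List[Tuple[float, Optional[float]]] = []
    for ft in frame_times:
        i = bisect_left(log_times, ft)
        # Candidates: the sample just before and just after the frame time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(log_times)]
        best = min(candidates, key=lambda j: abs(log_times[j] - ft), default=None)
        if best is None or abs(log_times[best] - ft) > max_gap_s:
            pairs.append((ft, None))
        else:
            pairs.append((ft, steering_log[best][1]))
    return pairs


# Toy data: steering sampled every 100 ms, three camera frames.
steering_log = [(0.00, -5.0), (0.10, 0.0), (0.20, 4.0)]
pairs = pair_frames_with_steering([0.02, 0.11, 0.50], steering_log)
```

The last frame has no steering sample nearby, so it is labeled `None`; discarding such frames keeps mislabeled examples out of a behavioral-cloning training set.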
Considering the significant advancements in autonomous vehicle technology, research in this field is of interest to researchers. To drive vehicles autonomously, controlling steer angle, gas hatch, and brakes needs to be learned. The behavioral cloning method is used to imitate humans’ driving behavior. We created a dataset of driving in different routes and conditions, and using the designed model, the output used for controlling the vehicle is obtained. In this paper, the learning of self-driving vehicles based on real driving behavior using deep neural network techniques (LSV-DNN) is proposed. We designed a convolutional network which uses the real driving data obtained through the vehicle’s camera and computer. The response of the driver during driving is recorded in different situations, and by converting the real driver’s driving video to images and transferring the data to an Excel file, obstacle detection is carried out with the best accuracy and speed using the Yolo algorithm version 3. This way, the network learns the response of the driver to obstacles in different locations and the network is trained with the Yolo algorithm version 3 and the output of obstacle detection. Then, it outputs the steer angle and amount of brake, gas, and vehicle acceleration. This study focuses on designing a convolutional network using behavioral cloning and motion planning of autonomous vehicle using a deep learning framework. Neural networks are effective systems for finding relationships between data, modeling, and predicting new data or classifying data. As a result, neural networks given real input data predict steer angle and speed for autonomous driving. The LSV-DNN is evaluated here via extensive simulations carried out in Python and TensorFlow environment. We evaluated the network error using the loss function. The results confirmed that our scheme is capable of exhibiting high prediction accuracy (exceeding 92.93%). In addition, our proposed scheme has high speed (more than 64.41%), low FPR (less than 6.89%), and low FNR (less than 3.95%), in comparison with the other approaches currently being employed. By comparing other methods which were conducted on the simulator’s data, we obtained good performance results for the designed network on the data from KITTI benchmark, the data collected using a private vehicle, and the data we collected.
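The accuracy, FPR, and FNR figures quoted above are standard confusion-matrix rates. As a reminder of how they relate, the sketch below computes them from raw counts; the function name `detection_rates` and the example counts are illustrative only and are not the paper's data.

```python
from typing import Dict


def detection_rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, false positive rate, and false negative rate.

    accuracy = (TP + TN) / all samples
    FPR      = FP / (FP + TN)   share of true negatives flagged as positive
    FNR      = FN / (FN + TP)   share of true positives that were missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up counts for illustration: 100 positives and 100 negatives.
rates = detection_rates(tp=90, fp=5, tn=95, fn=10)
```

With these made-up counts the accuracy is 0.925, the FPR is 0.05, and the FNR is 0.10.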
The rapid development of artificial intelligence has revolutionized the area of autonomous vehicles by incorporating complex models and algorithms. Self-driving cars are among the most significant inventions in computer science and robotic intelligence. Highly robust algorithms that support the functioning of these vehicles will reduce many problems associated with driving, such as drunk driving. In this paper, our aim is to build a deep learning model that can drive a car autonomously, adapts well to real-time tracks, and does not require any manual feature extraction. This research work proposes a computer vision model that learns from video data. It involves image processing, image augmentation, behavioural cloning, and a convolutional neural network model. The neural network architecture is used to detect the path in a video segment, the road lane markings, and the locations of obstacles, while behavioural cloning lets the model learn from human actions in the video.
Understanding the situation is a critical component of any self-driving system. Accurate real-time processing of visual signals to produce pixelwise-classified images, also known as semantic segmentation, is critical for scene comprehension and for the subsequent acceptance of this new technology. Because of the intricate interaction between pixels in each frame of the received camera data, this level of processing time and accuracy could not be achieved before recent advances in deep learning algorithms. In this study, we present an effective approach to semantic segmentation for self-driving automobiles. To develop our model, we combine deep learning architectures such as convolutional neural networks and autoencoders with cutting-edge approaches such as feature pyramid networks and bottleneck residual blocks. The CamVid dataset, which has undergone considerable data augmentation, is used to train and test our model. To validate the proposed model, we compare the results obtained with various baseline models reported in the literature.
Self-driving cars have the potential to revolutionize urban mobility by providing sustainable, safe, convenient, and congestion-free transportation. Vehicle autonomy as an application of AI faces several challenges, such as reliably recognizing traffic lights, signs, unclear lane markings, and pedestrians. These problems can be addressed using advances in deep learning and computer vision, enabled by the availability of graphics processing units (GPUs) and cloud platforms. In this paper, we propose a deep neural network based model for reliable detection and recognition of traffic lights using transfer learning. The method uses the Faster Region-based Convolutional Neural Network (Faster R-CNN) Inception V2 model in TensorFlow for transfer learning. The model was trained on a dataset of traffic signal images conforming to Indian traffic signals, divided into five classes. The model accomplishes its objective by detecting the traffic light with its correct class type.
The biggest challenge of a self-driving car is autonomous lateral motion, so the main aim of this paper is to clone driving behavior for better performance of the autonomous car, using multilayer neural networks and deep learning techniques. We focus on achieving autonomous driving in simulator conditions. Within the simulator, images from a camera placed in the car, which imitate the driver’s vision, are preprocessed and paired with the driver’s reaction, which is the steering angle of the car. The neural network is trained on images captured by the camera in manual mode, and the trained multilayer network then drives the car in autonomous mode. The driver imitation algorithm developed and characterized in the paper is a deep learning technique centered on the NVIDIA CNN model.
According to the WHO, around 1.35 million people die every year in road traffic crashes. Year by year, advances in technology have paved the way for the inclusion of artificial intelligence in automobiles. In this paper, we provide an extensive study on the implementation of self-driving cars based on deep learning. For ease and safety, we simulate the car in the simulator provided by Udacity. The training data is recorded in the simulator and imported into the project. Finally, we implemented and compared various existing deep learning models and present the results in Section IV. We obtained an accuracy of 96.83% for model A and 76.67% for model B.
… Learning techniques and the Tensorflow framework with the goal of navigating a driverless car … autonomous driving system with machine learning and Deep Learning techniques at the …
… of driverless cars to … driverless cars, and how they work in machine learning. By comparing the difference between Google's self-driving car in an urban environment and a self-driving car …
With the advent of the Internet of Things (IoT), the realization of the smart city seems imminent. Transportation is one of the key parts of the cyber-physical system of urban life. This mission-critical application has attracted many researchers in both academia and industry to investigate driverless cars. In the domain of autonomous vehicles, intelligent video analytics is critical. With the advent of deep learning, many neural network based learning approaches are under consideration. This work implements obstacle detection and avoidance in a self-driving car. A Convolutional Neural Network (CNN) is used for real-time video and image analysis on an IoT device. The project uses a Raspberry Pi, which is responsible for controlling the car and for performing CNN inference on its current input. The trained model achieved an accuracy of 88.6%, which is in good agreement with the expected performance.
… Machine learning is a critical component of driverless car technology, enabling the car to make … In summary, machine learning is a critical component of driverless cars, and the use of …
… machine learning and connect the system to any electric car. The proposed system provides an autonomous car feature to any existing electric car … Most existing electric cars that are on …
The author presents a comprehensive literature review and the successful development of an automatic Arduino car that can detect objects, give warnings, and avoid collisions. The model car runs at full speed in regular situations, slows down when an obstacle is detected within 100 cm, and stops when an obstacle is detected within 20 cm. The car continuously measures distance and makes instant decisions. The project draws on coding, computer engineering and electrical engineering knowledge, artificial intelligence, and robot car components. In addition, a literature review of the most relevant papers was conducted to gather knowledge about advances in machine learning, deep learning, artificial intelligence, hardware components, automatic car algorithms, and other methods. The findings lay the groundwork for future project advancements, with the final driverless car project aiming to integrate machine learning and deep learning methods, algorithms, and technologies to enable the automatic detection of road lanes and traffic signals.
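The distance rule described above (full speed, slow down within 100 cm, stop within 20 cm) can be sketched as a small decision function. This is an illustrative Python sketch rather than the author's Arduino code; the function name, the normalized speed values, and the inclusive thresholds are assumptions.

```python
SLOW_DISTANCE_CM = 100  # from the abstract: slow down within 100 cm
STOP_DISTANCE_CM = 20   # from the abstract: stop within 20 cm


def speed_command(distance_cm, full_speed=1.0, slow_speed=0.4):
    """Map a measured obstacle distance to a normalized speed command.

    full_speed and slow_speed are hypothetical values; the abstract
    does not say how much the car slows down.
    """
    if distance_cm <= STOP_DISTANCE_CM:
        return 0.0
    if distance_cm <= SLOW_DISTANCE_CM:
        return slow_speed
    return full_speed


# Simulated distance readings as the car approaches an obstacle.
for reading in (250, 100, 60, 20, 5):
    print(reading, "cm ->", speed_command(reading))
```

On real hardware the same function would be called in the main loop after each ultrasonic measurement, which matches the "continuously measures distance" behaviour the abstract describes.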
Connected and autonomous vehicles (CAV, a.k.a. driverless cars) offset human response for transportation infrastructure, enhancing traffic efficiency, travel leisure, and road safety. Behind the wheels of these mobile robots lies machine learning (ML), which automates mundane driving tasks and makes decisions from situational awareness. Attacking ML, the brain of driverless cars, can cause catastrophes. This paper proposes a novel approach to attack CAVs by fooling their ML models. Using adversarial examples in CAVs, the work demonstrates how adversarial machine learning can generate attacks that are hardly detectable by current ML classifiers for CAV misbehavior detection. First, adversarial datasets are generated by a traditional attack engine, which CAV misbehavior detection ML models can easily detect. Building the attack ML model takes two phases: training and testing. Using supervised learning, Phase I trains the model on time-series data converted from the adversarial datasets. Phase II tests the model, which leads to the next round of model improvement. The initial round deploys K-Nearest Neighbor (KNN) and Random Forest (RF) algorithms, respectively. The next round, guided by deep learning (DL) models, uses Logistic Regression (LG) of neural network and Long Short-Term Memory (LSTM) of recurrent neural network. The results, in precision-recall (PR) and receiver operating characteristic (ROC) curves, validate the effectiveness of the proposed adversarial ML models. This work reveals the vulnerability in ML. At the same time, it shows the promise of protecting critical infrastructure by studying the opponent's strategies. Future work includes retraining the adversarial ML models with real-world datasets from pilot CAV sites.
Every year, 1.25 million people are projected to die in traffic accidents around the world. Humans' failure to pay attention to road signs and follow the rules is a major cause of accidents. A signboard detection system has been installed to avoid this problem. This technology could be beneficial in recognizing certain domains like classrooms, traffic signals, colleges, hospitals, offices, and so on, as well as potentially saving many lives. This study describes the building of a low-cost prototype of a small self-driving automobile model utilizing simple and easily available technologies. In this prototype, the Raspberry Pi controller and H-bridge drive two DC motors to enable vehicle automation. Sonar sensors, image processing, computer vision, and machine learning have all been used in intelligent systems. We propose using a pattern matching methodology to create a self-driving vehicle to overcome the challenge.
This research aims to predict the performance of driverless cars by employing a cognitive data analysis and reliability analysis-based approach. With the advancement of autonomous …
The field of autonomous mobile robots has undergone dramatic advancements over the past decades. Despite important milestones, several challenges are yet to be addressed. Aggregating the achievements of the robotics community in survey articles is vital for keeping track of the current state of the art and of the challenges that must be tackled in the future. This article provides a comprehensive review of autonomous mobile robots, covering topics such as sensor types, mobile robot platforms, simulation tools, path planning and following, sensor fusion methods, obstacle avoidance, and SLAM. The motivation for this survey is twofold. First, the autonomous navigation field evolves fast, so writing survey articles regularly is crucial to keep the research community aware of its current status. Second, deep learning methods have revolutionized many fields, including autonomous navigation. It is therefore necessary to give an appropriate treatment of the role of deep learning in autonomous navigation, which this article also covers. Future work and research gaps are also discussed.
Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This monograph attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.
Robust Autonomous Vehicle Computer-Vision-Based Localization in Challenging Environmental Conditions
In this paper, we present a novel autonomous vehicle (AV) localization design and its implementation, which we recommend for challenging navigation conditions with poor-quality satellite navigation signals and computer vision images. When the GPS signal becomes unstable, auxiliary navigation systems, such as computer-vision-based positioning, are employed for more accurate localization and mapping. However, the quality of data obtained from the AV’s sensors may also be degraded by extreme environmental conditions, which inevitably leads to a decrease in navigation performance. To verify our computer-vision-based localization system design, we considered the Arctic region use case, which poses additional challenges for AV navigation and may employ artificial visual landmarks to improve localization quality; we used these landmarks for the computer vision training. We further enhanced our data by applying affine transformations to increase its diversity. We selected the YOLOv4 image detection architecture for our system design, as it demonstrated the highest performance in our experiments. For the computational platform, we employed an NVIDIA Jetson AGX Xavier device, as it is well known and widely used in robotic and AV computer vision as well as in deep learning applications. Our empirical study showed that the proposed computer vision system, when further trained on the dataset enhanced by affine transformations, became robust to image quality degradation caused by extreme environmental conditions. It was able to detect and recognize images of artificial visual landmarks captured in the extreme conditions of the Arctic region. The developed system can be integrated into vehicle navigation facilities to improve their effectiveness and efficiency and to prevent possible deterioration of navigation performance.
Vision-based localization systems, namely visual odometry (VO) and visual inertial odometry (VIO), have attracted great attention recently. They are regarded as critical modules for building fully autonomous systems. The simplicity of visual and inertial state estimators, along with their applicability to resource-constrained platforms, has motivated the robotics community to research and develop novel approaches that maximize their robustness and reliability. In this paper, we survey state-of-the-art VO and VIO approaches. In addition, studies related to localization in visually degraded environments are reviewed. The reviewed VO techniques and related studies are analyzed in terms of key design aspects, including appearance-based, feature-based, and learning-based approaches. Research studies related to VIO are categorized by the degree and type of fusion process into loosely coupled, semi-tightly coupled, or tightly coupled approaches and into filtering-based or optimization-based paradigms. This paper provides an overview of the main components of visual localization, discusses key design aspects while highlighting the pros and cons of each approach, and compares the latest research works in this field. Finally, a detailed discussion of the challenges associated with the reviewed approaches and of future research considerations is presented.
… such as the Global Navigation Satellite Systems may have … computer vision and differential pseudorange Global Positioning System (DGPS) measurements to aid an inertial navigation …
A key requirement in the development of self-driving vehicles is accurate steering angle estimation. In this paper, we present a novel steering angle computation approach for self-driving vehicles that uses a computer-vision-based technique to navigate in a controlled environment. We perform color-based thresholding to segment the drivable region of the road. Circle-Rectangle-based shapes are then detected in the region of interest to compute the steering angle for automated navigation of customized self-driving vehicles. To evaluate the performance of the proposed approach, we conducted simulations in CARLA. In-field testing confirms the performance of the proposed approach, which meets the real-time requirement of an efficient navigation system with a mean square error of 1.911.
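The pipeline above (threshold the drivable region, then derive a steering angle from it) can be illustrated with a much simpler stand-in. The sketch below is not the paper's Circle-Rectangle method: it swaps in a plain centroid heuristic, assumes the vehicle sits at the bottom centre of the image, and works on a nested list of RGB tuples so that it needs only the standard library. All names and threshold values are hypothetical.

```python
import math


def drivable_mask(image, lower, upper):
    """Color-based thresholding: True where every channel is in range."""
    return [
        [all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper)) for px in row]
        for row in image
    ]


def steering_angle_deg(mask):
    """Angle from the bottom-centre pixel to the centroid of the mask.

    Positive means steer right, negative means steer left.
    Returns None when no drivable pixel is found.
    """
    height, width = len(mask), len(mask[0])
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    lateral = cx - (width - 1) / 2   # pixels to the right of centre
    forward = (height - 1) - cy      # pixels ahead of the vehicle
    return math.degrees(math.atan2(lateral, forward))


# Tiny 3x3 example: grey road pixels only in the right-hand column.
ROAD, DARK = (100, 100, 100), (0, 0, 0)
image = [
    [DARK, DARK, ROAD],
    [DARK, DARK, ROAD],
    [DARK, DARK, DARK],
]
mask = drivable_mask(image, lower=(80, 80, 80), upper=(120, 120, 120))
print(steering_angle_deg(mask))  # positive: the road lies to the right
```

In practice the thresholding would be done on camera frames with an image library; the nested-list form is used here only to keep the sketch dependency-free.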
This paper presents a path planning and control approach for online navigation of a non-holonomic three-wheeled mobile robot (WMR) in road-following and roundabout environments. We propose a complete navigation algorithm that enables the WMR to navigate autonomously on the road in various scenarios. With this algorithm, the robot is able to localize itself within the road environment and find a collision-free path from a pre-defined start position to a goal point using a novel approach called laser simulator (LS). Path planning and roundabout detection are based on LS and on sensor fusion of laser range finder, camera, and odometry measurements. The sensor fusion algorithm is used to remove noise and uncertainty from the sensor data and to provide optimal measurements for path planning. A robot motion control scheme controls the kinematic parameters of the WMR using resolved acceleration control coupled with active force control for disturbance rejection. Experimental results show that the proposed algorithms can robustly drive the robot in road-following and roundabout environments.
In order to solve the path planning problem of an intelligent vehicle in an unknown environment, this paper proposes a map construction and path planning method for mobile robots based on multi-sensor information fusion. Firstly, the extended Kalman filter (EKF) is used to fuse the ambient information of LiDAR and a depth camera. The pose and acceleration information of the robot is obtained through the pose sensor. The SLAM algorithm based on a fusion of LiDAR, a depth camera, and the inertial measurement unit was built. Secondly, the improved ant colony algorithm was used to carry out global path planning. Meanwhile, the dynamic window method was used to realize local planning and local obstacle avoidance. Finally, experiments were carried out on a robot platform to verify the reliability of the proposed method. The experiment results showed that the map constructed by multi-sensor information fusion was closer to the real environment, and the accuracy and robustness of SLAM were effectively improved. The turning angle of the path was smoothed using the improved ant colony algorithm, and the real-time obstacle avoidance was able to be realized using the dynamic window method. The efficiency of path planning was improved, and the automatic feedback control of the intelligent vehicle was able to be realized.
Mobile robots are a very important branch of robotics. In practice, the performance required of mobile robots keeps increasing. A mobile robot is expected to adapt to different complex environments through its own intelligent system in order to achieve its functional goals. In contrast to traditional path planning with single-sensor information, a Kalman filter is used to fuse multi-sensor information, and a path planning method based on an improved dynamic artificial potential field method is studied. Using the improved dynamic artificial potential field method, the robot can achieve optimal path planning and obstacle avoidance in complex dynamic environments. The simulation results show that the proposed algorithm is feasible.
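To make the idea of Kalman-filter-based multi-sensor fusion concrete, the following is a minimal scalar sketch, not the paper's filter: it assumes a static one-dimensional state (for example, the distance to an obstacle) and applies the standard Kalman measurement update once per sensor. The function names and the numeric values are hypothetical.

```python
def kalman_update(x, p, z, r):
    """One scalar Kalman measurement update.

    x, p: prior estimate and its variance
    z, r: sensor measurement and its noise variance
    """
    k = p / (p + r)            # Kalman gain
    x_new = x + k * (z - x)    # pull the estimate towards the measurement
    p_new = (1.0 - k) * p      # uncertainty shrinks after each update
    return x_new, p_new


def fuse(prior_x, prior_p, readings):
    """Sequentially fuse (measurement, variance) pairs from several sensors."""
    x, p = prior_x, prior_p
    for z, r in readings:
        x, p = kalman_update(x, p, z, r)
    return x, p


# Hypothetical readings: a noisy sonar and a more precise laser range finder.
estimate, variance = fuse(0.0, 1.0, [(2.0, 1.0), (1.0, 0.5)])
print(estimate, variance)
```

The sensor with the smaller variance receives the larger gain, which is why fusing heterogeneous sensors tends to give a better estimate than either sensor alone.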
… path planning by integrating sophisticated multi-sensor … Sensor fusion is the synthesis of data from different sensors to … In this study, a sensor fusion algorithm based on Kalman filter …
Efficient path planning and obstacle avoidance in a complex and dynamic environment is one of the key challenges of unmanned vehicle logistics distribution, especially in the logistics scene of Cainiao Station, which involves crowded communities and dynamic campus roads. In view of the shortcomings of existing methods in multi-sensor data fusion and path optimization, this paper proposes a path planning model based on multi-sensor image fusion, named DynaFusion-Plan. The model is able to provide an optimal path from the starting point to the target point in a complex environment, avoiding obstacles and realizing the smoothness and dynamic adjustment ability of the path. The model consists of three modules: the sensor data fusion module uses Convolutional Neural Networks (CNN) and Lidar-Inertial Odometry and Simultaneous Localization and Mapping (LIO-SAM) technology to build a high-precision dynamic environment map; the path planning module combines Artificial Potential Field (APF) and Deep Deterministic Policy Gradient (DDPG) algorithms to balance path length, smoothness, and obstacle avoidance capabilities; the decision and control module uses Model Predictive Control (MPC) and Long Short-Term Memory (LSTM) to achieve real-time path tracking and dynamic adjustment. Experimental results on TartanAir, NuScenes, and AirSim datasets show that DynaFusion-Plan significantly outperforms existing methods in key indicators such as path length (42.5 m vs. 48.7 m), path smoothness (κ=0.05 vs. κ=0.15), and obstacle avoidance success rate (98.7% vs. 85.4%), especially in complex dynamic environments. It shows strong adaptability and stability. This work provides an efficient and reliable solution for unmanned vehicle path planning in intelligent logistics scenarios, and lays the foundation for future optimization directions, such as lightweight model design and more real-world scenario verification.
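The path planning module above builds on the Artificial Potential Field (APF) method. The sketch below shows only the classical APF force computation in 2D, under assumed gains and influence radius; it does not reproduce DynaFusion-Plan, and in particular it omits the DDPG component and the learned map. All names and parameter values are hypothetical.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Classical APF: attractive pull to the goal plus repulsion from
    every obstacle closer than the influence radius d0."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            # Magnitude of the gradient of 0.5*k_rep*(1/d - 1/d0)^2
            magnitude = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += magnitude * dx / d
            fy += magnitude * dy / d
    return fx, fy


def apf_step(pos, goal, obstacles, step=0.1):
    """Move a fixed step length along the direction of the total force."""
    fx, fy = apf_force(pos, goal, obstacles)
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # local minimum or goal reached
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)


# With no obstacles the robot heads straight for the goal.
print(apf_step((0.0, 0.0), (3.0, 4.0), []))
```

Plain APF can stall in local minima where attraction and repulsion cancel, which is one common motivation for combining it with a learned policy as the abstract describes.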
… sensors. This study focuses on sensor fusion for object-road detection and path planning … scenarios demonstrate the successful integration of sensor fusion, enabling the vehicle to …
This paper presents a neural network based heterogeneous sensor fusion approach towards real-time traversability estimation of mobile robots using sensor data. Even though significant advances have been made for autonomous navigation in structured terrain conditions, obtaining reliable traversability estimates for tracked vehicle navigation in challenging terrain conditions is still an open research problem. In this regard, we propose a neural network architecture capable of fusing depth images along with roll and pitch measurements on board the robot to perform traversability estimation. The proposed architecture is trained in a variety of simulated structured and unstructured environments. As such, the proposed architecture is capable of extracting the relevant features from the sensor measurements in a data driven manner as compared to existing heuristic based approaches that fail to generalize for different environmental conditions. The reliability of the traversability estimates provided by the trained architecture was validated in indoor and outdoor conditions using real sensor data. In addition, the feasibility of using the traversability estimates in incremental path planning was also demonstrated through simulation. For both applications the proposed approach provided compelling results. Inferences based on the results of the experiments along with directions for future research are also outlined.
Nowadays, mobile robots are being widely employed in various settings, including factories, homes, and everyday tasks. Achieving successful implementation of autonomous robot movement largely depends on effective route planning. Therefore, it is not surprising that there is a growing trend in studying and improving the intelligence of this technology. Deep reinforcement learning has shown remarkable performance in decision-making problems and can be effectively utilized to address path planning challenges faced by mobile robots. This manuscript focuses on investigating path planning problems using deep reinforcement learning and multi-sensing information fusion technology. The manuscript elaborates on the significance of path planning, providing comprehensive research encompassing path planning algorithms, deep reinforcement learning, and multi-sensing information fusion. Also, the fundamental theory of deep reinforcement learning is introduced, followed by the design of a multimodal perception module based on image and lidar. A semantic segmentation approach is employed to bridge the gap between simulated and real environments. To enhance strategy, a lightweight multimodal data fusion network model is carefully developed, incorporating modality separation learning. Overall, in this paper, we explore the use of a deep reinforcement learning architecture for conducting path planning experiments with mobile robots. The results obtained from these experiments demonstrate promising outcomes.
Navigation systems are developing rapidly; nevertheless, tasks are becoming more complex, significantly increasing the number of challenges for robotic systems. Navigation can be separated into global and local navigation. While global navigation works according to predefined data about the environment, local navigation uses sensory data to dynamically react and adjust the trajectory. Tasks are becoming more complex with the addition of dynamic obstacles, multiple robots, or, in some cases, inspection of places that are not physically reachable by humans. Cognitive tasks require not only detecting an object but also evaluating it without direct recognition. For this purpose, sensor fusion methods are employed. However, sensors of different physical nature sometimes cannot directly extract required information. As a result, AI methods are becoming increasingly popular for evaluating acquired information and for controlling and generating robot trajectories. In this work, a review of sensors for mobile robot localization is presented by comparing them and listing advantages and disadvantages of their combinations. Also, integration with path-planning methods is looked into. Moreover, sensor fusion methods are analyzed and evaluated. Furthermore, a concept for channel robot navigation, designed based on the research literature, is presented. Lastly, discussion and conclusions are drawn.
Visual orientation can extend human vision to machines. Combined with high-precision devices, it achieves high-precision, highly intelligent, and fast-responding location awareness. However, its processing capacity cannot adapt to dynamic environment changes, and a single vision system cannot perceive distance in non-line-of-sight scenes. Therefore, this thesis studies a visual path planning algorithm for driverless vehicles based on sensor fusion. First, through an analysis of satellite timing, inertial devices, and navigation and positioning, it provides a research basis for the time perception, orientation perception, and location perception of driverless vehicles. Based on the practical application scenarios of driverless vehicles, this thesis analyzes the causes of errors in driverless vehicle positioning. It also designs a normalized pseudo-range processing method based on the fusion of heterogeneous sensors to realize visual positioning with multi-sensor fusion. MATLAB simulations verify that the proposed algorithm improves visual positioning accuracy.
A new reactive motion planning method for an autonomous vehicle in dynamic environments is proposed. The method combines a virtual plane based reactive motion planning technique with a sensor fusion based obstacle detection approach, which improves the robustness and autonomy of vehicle navigation in unpredictable dynamic environments. The key feature of the new method is a local observer in the virtual plane, which allows complex dynamic planning problems to be transformed into simple stationary ones in the virtual plane. In addition, a sensor fusion based obstacle detection technique provides pose estimates of moving obstacles using a Kinect sensor and a sonar sensor, which helps to improve the accuracy and robustness of the reactive motion planning approach in uncertain dynamic environments. The performance of the proposed method was demonstrated through both simulation studies and field experiments with multiple moving obstacles, including hostile environments where conventional methods failed.
… This work presents a novel algorithm for local path planning for autonomous vehicles (AVs), prioritizing safety and adherence to traffic regulations. The algorithm integrates the Twin …
… while path planning usually corresponds to a global navigation task. This may vary among different autonomous vehicle implementations. This work, however, proposes the use of the …
… planning based on multi-sensor data fusion by SVM is presented in this paper. We utilize 5 ultrasonic sensors and an image sensor get … to do multi-sensor data fusion to compute these …
Autonomous driving in urban environments requires intelligent systems that are able to deal with complex and unpredictable scenarios. Traditional modular approaches focus on dividing the driving task into standard modules, and then use rule-based methods to connect those different modules. As such, these approaches require a significant effort to design architectures that combine all system components, and are often prone to error propagation throughout the pipeline. Recently, end-to-end autonomous driving systems have formulated the autonomous driving problem as an end-to-end learning process, with the goal of developing a policy that transforms sensory data into vehicle control commands. Despite promising results, the majority of end-to-end works in autonomous driving focus on simple driving tasks, such as lane-following, which do not fully capture the intricacies of driving in urban environments. The main contribution of this paper is to provide a detailed comparison between end-to-end autonomous driving systems that tackle urban environments. This analysis comprises two stages: a) a description of the main characteristics of the successful end-to-end approaches in urban environments; b) a quantitative comparison based on two CARLA simulator benchmarks (CoRL2017 and NoCrash). Beyond providing a detailed overview of the existent approaches, we conclude this work with the most promising aspects of end-to-end autonomous driving approaches suitable for urban environments.
Autonomous Driving (AD) has evolved significantly since its beginnings in the 1980s, with continuous advancements driven by both industry and academia. Traditional AD systems break down the driving task into smaller modules (such as perception, localization, planning, and control) and optimize them independently. In contrast, end-to-end models use neural networks to map sensory inputs directly to vehicle controls, optimizing the entire driving process as a single task. Recent advancements in deep learning have driven increased interest in end-to-end models, which are the central focus of this review. In this survey, we discuss how CARLA-based state-of-the-art implementations address various issues encountered in end-to-end autonomous driving through various model inputs, outputs, architectures, and training paradigms. To provide a comprehensive overview, we additionally include a concise summary of these methods in a single large table. Finally, we present evaluations and discussions of the methods, and suggest future avenues to tackle current challenges faced by end-to-end models.
Since their advent, Multimodal Large Language Models (MLLMs) have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm in end-to-end AD systems. However, progress in developing end-to-end models for AD has been slow, as existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advancements in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating the Chain-of-Thought reasoning process, OpenEMMA achieves significant improvements compared to the baseline when leveraging a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all code at https://github.com/taco-group/OpenEMMA.
Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules rather than using end-to-end learning to optimize final planning performance, leaving the data's potential underutilized. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) it simultaneously enhances agent perception, online mapping, and occupancy prediction, ultimately improving planning performance; 2) it is transmission-friendly under practical, limited communication conditions; 3) it provides reliable data fusion, with the hybrid data remaining interpretable. We implement UniV2X, and reproduce several benchmark methods, on DAIR-V2X, a challenging real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance as well as the performance of all intermediate outputs.
End-to-end autonomous driving has made impressive progress in recent years. Existing methods usually adopt the decoupled encoder-decoder paradigm, where the encoder extracts hidden features from raw sensor data, and the decoder outputs the ego-vehicle's future trajectories or actions. Under such a paradigm, the encoder does not have access to the intended behavior of the ego agent, leaving the burden of finding out safety-critical regions from the massive receptive field and inferring about future situations to the decoder. Even worse, the decoder is usually composed of several simple multi-layer perceptrons (MLP) or GRUs while the encoder is delicately designed (e.g., a combination of heavy ResNets or Transformer). Such an imbalanced resource-task division hampers the learning process. In this work, we aim to alleviate the aforementioned problem by two principles: (1) fully utilizing the capacity of the encoder; (2) increasing the capacity of the decoder. Concretely, we first predict a coarse-grained future position and action based on the encoder features. Then, conditioned on the position and action, the future scene is imagined to check the ramification if we drive accordingly. We also retrieve the encoder features around the predicted coordinate to obtain fine-grained information about the safety-critical region. Finally, based on the predicted future and the retrieved salient feature, we refine the coarse-grained position and action by predicting its offset from ground-truth. The above refinement module could be stacked in a cascaded fashion, which extends the capacity of the decoder with spatial-temporal prior knowledge about the conditioned future. We conduct experiments on the CARLA simulator and achieve state-of-the-art performance in closed-loop benchmarks. Extensive ablation studies demonstrate the effectiveness of each proposed module.
How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting. However, for the actual driving task, the global context of the 3D scene is key, e.g. a change in traffic light state can affect the behavior of a vehicle geometrically distant from that traffic light. Geometry alone may therefore be insufficient for effectively fusing representations in end-to-end driving models. In this work, we demonstrate that imitation learning policies based on existing sensor fusion methods under-perform in the presence of a high density of dynamic agents and complex scenarios, which require global contextual reasoning, such as handling traffic oncoming from multiple directions at uncontrolled intersections. Therefore, we propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention. We experimentally validate the efficacy of our approach in urban settings involving complex scenarios using the CARLA urban driving simulator. Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
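The attention-based fusion idea can be illustrated with a minimal sketch. It is not the TransFuser architecture: there are no learned projections, multiple heads, or convolutional backbones, and the token values and the function name `fuse_with_attention` are made up for illustration. It only shows how self-attention over the concatenated image and LiDAR tokens lets every token draw on context from both modalities, regardless of geometric distance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_attention(image_tokens, lidar_tokens):
    """Self-attention over the concatenation of both token sets.

    Assumption: queries, keys, and values are the raw tokens themselves
    (no learned projection matrices), which keeps the sketch short.
    Returns the updated tokens and the attention weights.
    """
    tokens = image_tokens + lidar_tokens
    dim = len(tokens[0])
    fused, weights = [], []
    for query in tokens:
        # Scaled dot-product score against every token of both modalities.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Each output token is an attention-weighted mix of all tokens.
        fused.append(
            [sum(a * tok[d] for a, tok in zip(attn, tokens)) for d in range(dim)]
        )
    return fused, weights


# Hypothetical 2-D feature tokens: two from the camera, two from the LiDAR.
image_tokens = [[1.0, 0.0], [0.5, 0.5]]
lidar_tokens = [[0.0, 1.0], [0.9, 0.1]]
fused, weights = fuse_with_attention(image_tokens, lidar_tokens)
```

Each row of `weights` sums to one and assigns non-zero weight to tokens of the other modality, which is the property that purely geometric fusion lacks.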
Recently, the diffusion model has emerged as a powerful generative technique for robotic policy learning, capable of modeling multi-mode action distributions. Leveraging its capability for end-to-end autonomous driving is a promising direction. However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at a real-time speed. To address these challenges, we propose a novel truncated diffusion policy that incorporates prior multi-mode anchors and truncates the diffusion schedule, enabling the model to learn denoising from anchored Gaussian distribution to the multi-mode driving action distribution. Additionally, we design an efficient cascade diffusion decoder for enhanced interaction with conditional scene context. The proposed model, DiffusionDrive, demonstrates 10× reduction in denoising steps compared to vanilla diffusion policy, delivering superior diversity and quality in just 2 steps. On the planning-oriented NAVSIM dataset, with aligned ResNet-34 backbone, DiffusionDrive achieves 88.1 PDMS without bells and whistles, setting a new record, while running at a real-time speed of 45 FPS on an NVIDIA 4090. Qualitative results on challenging scenarios further confirm that DiffusionDrive can robustly generate diverse plausible driving actions.
End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity. These models tend to rely predominantly on the ego vehicle's status for future path planning. Beyond the limitations of the dataset, we also note that current metrics do not comprehensively assess the planning quality, leading to potentially biased conclusions drawn from existing benchmarks. To address this issue, we introduce a new metric to evaluate whether the predicted trajectories adhere to the road. We further propose a simple baseline able to achieve competitive results without relying on perception annotations. Given the current limitations on the benchmark and metrics, we suggest the community reassess relevant prevailing research and be cautious about whether the continued pursuit of state-of-the-art would yield convincing and universal conclusions. Code and models are available at https://github.com/NVlabs/BEV-Planner.
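To make the idea of a road-adherence check concrete, here is a minimal sketch. It is not the metric defined in the paper, whose exact formulation the abstract does not give. The function name `off_road_rate`, the boolean grid representation of the drivable area, and the 1 m cell size are all assumptions chosen for illustration.

```python
import math


def off_road_rate(trajectory, drivable, cell_size=1.0):
    """Fraction of waypoints that fall outside the drivable area.

    trajectory: list of (x, y) waypoints in metres (frame is assumed to
                match the grid origin).
    drivable:   2-D grid (list of rows) of booleans, True = drivable;
                cell [row][col] covers x in [col, col + 1) * cell_size
                and y in [row, row + 1) * cell_size.
    Waypoints that land outside the grid count as off-road.
    """
    if not trajectory:
        return 0.0
    off = 0
    for x, y in trajectory:
        col = math.floor(x / cell_size)
        row = math.floor(y / cell_size)
        inside = 0 <= row < len(drivable) and 0 <= col < len(drivable[0])
        if not inside or not drivable[row][col]:
            off += 1
    return off / len(trajectory)


# Hypothetical map: a single drivable lane along the middle row.
drivable = [
    [False, False, False, False],
    [True, True, True, True],
    [False, False, False, False],
]
on_road = [(0.5, 1.5), (1.5, 1.5), (2.5, 1.5), (3.5, 1.5)]
drifting = [(0.5, 1.5), (1.5, 1.5), (2.5, 2.2), (3.5, 2.8)]
```

A plan that stays in the lane scores 0, while the drifting plan has half of its waypoints off the road. A displacement-only metric could rate the two similarly if both end near the ground-truth endpoint, which is the kind of blind spot the abstract describes.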
We present OpenDriveVLA, a Vision-Language Action (VLA) model designed for end-to-end autonomous driving, built upon open-source large language models. OpenDriveVLA generates spatially-grounded driving actions by leveraging multimodal inputs, including both 2D and 3D instance-aware visual representations, ego vehicle states, and language commands. To bridge the modality gap between driving visual representations and language embeddings, we introduce a hierarchical vision-language alignment process, projecting both 2D and 3D structured visual tokens into a unified semantic space. Furthermore, we incorporate structured agent–environment–ego interaction modeling into the autoregressive decoding process, enabling the model to capture fine-grained spatial dependencies and behavior-aware dynamics critical for reliable trajectory planning. Extensive experiments on the nuScenes dataset demonstrate that OpenDriveVLA achieves state-of-the-art results across open-loop trajectory planning and driving-related question-answering tasks. Qualitative analyses further illustrate its superior capability to follow high-level driving commands and robustly generate trajectories under challenging scenarios, highlighting its potential for next-generation end-to-end autonomous driving.
Personalization, while extensively studied in conventional autonomous driving pipelines, has been largely overlooked in the context of end-to-end autonomous driving (E2EAD), despite its critical role in fostering user trust, safety perception, and real-world adoption. A primary bottleneck is the absence of large-scale real-world datasets that systematically capture driving preferences, severely limiting the development and evaluation of personalized E2EAD models. In this work, we introduce the first large-scale real-world dataset explicitly curated for personalized E2EAD, integrating comprehensive scene topology with rich dynamic context derived from agent dynamics and semantics inferred via a fine-tuned vision-language model (VLM). We propose a hybrid annotation pipeline that combines behavioral analysis, rule-and-distribution-based heuristics, and subjective semantic modeling guided by VLM reasoning, with final refinement through human-in-the-loop verification. Building upon this dataset, we introduce the first standardized benchmark for systematically evaluating personalized E2EAD models. Empirical evaluations on state-of-the-art architectures demonstrate that incorporating personalized driving preferences significantly improves behavioral alignment with human demonstrations.
… In recent years, the field of autonomous driving has witnessed … era of end-to-end autonomous driving (E2E-AD) systems [4–8], which promise a scalable, data-driven approach to vehicle …
Pedestrian trajectory prediction is one of the main computer vision problems in the automotive industry, especially in the field of advanced driver assistance systems. The ability to anticipate the next movements of pedestrians on the street is a key task in many areas, e.g., self-driving vehicles, mobile robots, or advanced surveillance systems, and it still represents a technological challenge. The performance of state-of-the-art pedestrian trajectory prediction methods currently benefits from advancements in sensors and associated signal processing technologies. The current paper reviews the most recent deep learning-based solutions for pedestrian trajectory prediction, along with the sensors employed and their associated processing methodologies, and it provides an overview of the available datasets, the performance metrics used in evaluation, and practical applications. Finally, the current work exposes research gaps in the literature and outlines potential new research directions.
Vehicle trajectory prediction enables autonomous vehicles to better reason about fast‐changing driving scenarios and thus perform well‐informed decision‐making tasks. Among different prediction approaches, deep learning‐based (DL‐based) methodologies stand out because of their capabilities to efficiently summarise historical data, infer nonlinear behavioural patterns from human driving data, and perform long‐horizon prediction. This work reviews the DL‐based methods that have shown promising results, organising them in terms of usage of the input data, separating the encodings of the target vehicle's historical data, surrounding vehicle's historical data, and road layout data. In particular, this paper explores the relationships between the scope of the prediction components and the input data formats, as well as the connections with other elements in the same prediction framework, including vehicle interaction and road scene mining. This information is crucial to understand complex architectural decisions and to provide guidance for the design of improved solutions. This work also compares the performance of the most successful prediction models, establishing that appropriate encodings of vehicle interactions and road scenes improve trajectory prediction accuracy, with the best performance achieved by attention mechanism and Transformer‐based models. Finally, this work discusses future research directions, including considerations for real‐time applications.
Vehicle trajectory prediction is essential for enabling safety-critical intelligent transportation systems (ITS) applications used in management and operations. While there have been some promising advances in the field, there is a need for modern deep learning algorithms that allow real-time trajectory prediction on embedded IoT devices. This article presents DeepTrack, a novel deep learning algorithm customized for real-time vehicle trajectory prediction and monitoring applications in arterial management, freeway management, traffic incident management, and work zone management for high-speed incoming traffic. In contrast to previous methods, the vehicle dynamics are encoded using Temporal Convolutional Networks (TCNs) to provide more robust time prediction with less computation. DeepTrack also uses depthwise convolution, which reduces the complexity of models compared to existing approaches in terms of model size and operations. Overall, our experimental results demonstrate that DeepTrack achieves comparable accuracy to state-of-the-art trajectory prediction models but with smaller model sizes and lower computational complexity, making it more suitable for real-world deployment.
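The savings from depthwise convolution and the temporal reach of a TCN can be checked with simple arithmetic. The sketch below is not DeepTrack's configuration: the channel counts, kernel size, and dilation schedule are hypothetical, the depthwise layer is assumed to be followed by a 1x1 pointwise layer (the usual depthwise separable form), and the receptive-field formula assumes one dilated convolution per level.

```python
def conv1d_params(in_ch, out_ch, kernel):
    """Weights in a standard 1-D convolution (bias ignored)."""
    return in_ch * out_ch * kernel


def depthwise_separable_params(in_ch, out_ch, kernel):
    """Depthwise conv (one filter per input channel) plus 1x1 pointwise conv."""
    return in_ch * kernel + in_ch * out_ch


def tcn_receptive_field(kernel, dilations):
    """Receptive field of stacked dilated convs, one conv per dilation level."""
    return 1 + (kernel - 1) * sum(dilations)


# Hypothetical layer sizes, chosen only to make the comparison concrete.
standard = conv1d_params(in_ch=64, out_ch=64, kernel=3)
separable = depthwise_separable_params(in_ch=64, out_ch=64, kernel=3)
history = tcn_receptive_field(kernel=3, dilations=[1, 2, 4, 8])
```

With these sizes the standard layer has 12,288 weights and the separable one 4,288, while four dilated levels cover 31 past time steps.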
To drive safely and efficiently through complex traffic scenarios, intelligent vehicles must accurately predict the trajectory of the forward vehicle. Accurate, real-time trajectory prediction allows intelligent vehicles to adjust their maneuvers according to the running state of the vehicles in front of them. In recent years, with the development of machine learning, deep-learning-based methods have been applied as novel alternatives for trajectory prediction. However, which kind of deep neural network is most suitable for trajectory prediction remains uncertain. In this paper, we design three kinds of deep neural networks, Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Stacked Autoencoders (SAEs), to predict the position and velocity of the forward vehicles. We verify the performance of these three network models on the NGSIM I-80 dataset, which consists of real multi-lane vehicle trajectories. In addition, we apply a Savitzky-Golay filter to reduce the effect of noise on model training. Our results demonstrate that, among the three deep neural networks we designed, the LSTM model performs better than the GRU and SAE models in trajectory prediction. These results offer guidance for choosing a neural network model to predict vehicle trajectories.
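As an illustration of the Savitzky-Golay smoothing step, the sketch below applies the standard 5-point quadratic smoothing coefficients to a position series. The abstract does not state the window length or polynomial order the authors used, so both are assumptions here, as are the function name `savgol5` and the choice to leave the two samples at each end untouched.

```python
# 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    Each interior sample is replaced by the value at the window centre of
    a least-squares quadratic fitted to the five surrounding samples.
    The first and last two samples are copied unchanged (a simplification).
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5, window))
    return smoothed


# Hypothetical noisy longitudinal positions (metres) sampled at fixed intervals.
noisy_positions = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2]
smoothed_positions = savgol5(noisy_positions)
```

Because the filter fits a local polynomial rather than averaging, a trajectory that is already quadratic passes through unchanged, which helps preserve acceleration trends while damping measurement noise. In practice, `scipy.signal.savgol_filter` offers the same operation with configurable window length, polynomial order, and edge handling.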
… the groundwork for trajectory prediction research, … deep learning approaches, we provide a systematic review of the popular deep learning methods used for vehicle trajectory prediction. …
Vehicle Trajectory Prediction (VTP) is one of the key issues in the field of autonomous driving. In recent years, more researchers have tried applying Deep Learning methods and techniques to VTP tasks. However, due to the black-box nature of Deep Learning, it cannot meet the interpretability and safety requirements of autonomous driving systems. Researchers have tried alleviating this problem by introducing driving knowledge in Deep Learning-based VTP. From the perspective of introducing driving knowledge, this paper systematically investigates the research status of DL-based VTP. First of all, this paper summarizes the research on VTP under three different problem formulations; secondly, this paper summarizes the application methods and application stages of driving knowledge in DL-based VTP; finally, this paper investigates and analyzes the VTP datasets and evaluation, and summarizes the knowledge contained in the datasets and its usage. Through the investigation and summary of problem formulation, knowledge usage, datasets, and evaluation of DL-based VTP, this paper analyzes the challenges and open questions of existing VTP research. It puts forward an outlook on future research directions.
Pedestrian trajectory prediction is widely used in various applications, such as intelligent transportation systems, autonomous driving, and social robotics. Precisely forecasting surrounding pedestrians’ future trajectories can assist intelligent agents in achieving better motion planning. Currently, deep learning-based trajectory prediction methods have demonstrated superior prediction performance to traditional approaches by learning from trajectory data. However, these methods still face many challenges in improving prediction accuracy, efficiency, and reliability. In this survey, we research the main challenges in deep learning-based pedestrian trajectory prediction methods and study this problem and its solutions through literature collection and analysis. Specifically, we first investigate and analyze the existing literature and surveys on pedestrian trajectory prediction. On this basis, we summarize several main challenges faced by deep learning-based pedestrian trajectory prediction, including motion uncertainty, interaction modeling, scene understanding, data-related issues, and the interpretability of prediction models. We then summarize solutions for each challenge. Subsequently, we introduce mainstream trajectory prediction datasets and analyze the state-of-the-art (SOTA) results reported on them. Finally, we discuss potential research prospects in trajectory prediction, aiming to promote the trajectory prediction community.
Vehicle trajectory prediction is crucial and indispensable for ensuring the safe and efficient operation of autonomous vehicles in complex traffic environments. The application of Internet of Things technology in the collaborative automated driving system (CADS) has established a robust data foundation for vehicle trajectory prediction. Accurate prediction requires not only a substantial amount of high-quality data but also a deep understanding of the vehicle’s driving characteristics and interactions between neighboring vehicles. To enhance the study of vehicle trajectory prediction, this article proposes a novel Social Force-constrained Gated Recurrent Unit (SF-GRU) model, which integrates data-driven and physics-driven models. Specifically, the SF-GRU model is based on the gated recurrent unit encoder–decoder framework and incorporates social force constraints to enhance and supplement the model input based on vehicle time-series trajectory data, which describes the driving and interactive behaviors of vehicles during driving, as well as the interactions between neighboring vehicles and the surrounding environment. The model is trained and validated using the next generation simulation data set. Experimental results demonstrate that the SF-GRU model outperforms existing state-of-the-art models in both longitudinal and lateral motion, and that social force constraints are more effective than spatial variables in improving prediction accuracy. Furthermore, the SF-GRU model can intuitively and accurately consider the interactions between vehicles, and precisely describe the changes of relevant variables in the prediction process, thus enhancing the interpretability of the data-driven model. The SF-GRU model has great potential in vehicle trajectory prediction and can provide important support for the practical implementation of autonomous driving vehicles.
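One way to picture how a social force term could supplement the model input is sketched below. This is not the SF-GRU formulation, which the abstract does not spell out: the exponential repulsion form, the constants `strength` and `range_m`, and the helper names are all assumptions made for illustration.

```python
import math


def social_force(ego_xy, neighbors_xy, strength=1.0, range_m=5.0):
    """Sum of repulsive forces that neighbouring vehicles exert on the ego.

    Each neighbour pushes the ego away along the line joining them with
    magnitude strength * exp(-distance / range_m). Both constants are
    illustrative, not values from the paper.
    """
    fx = fy = 0.0
    for nx, ny in neighbors_xy:
        dx, dy = ego_xy[0] - nx, ego_xy[1] - ny
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident points give no defined direction
        mag = strength * math.exp(-dist / range_m)
        fx += mag * dx / dist
        fy += mag * dy / dist
    return fx, fy


def augment_step(x, y, vx, vy, neighbors_xy):
    """Append the force components to one time step of trajectory features."""
    fx, fy = social_force((x, y), neighbors_xy)
    return [x, y, vx, vy, fx, fy]


# Hypothetical scene: ego at the origin, one vehicle 10 m ahead.
features = augment_step(0.0, 0.0, 12.0, 0.0, [(10.0, 0.0)])
```

Feeding such physics-derived components alongside the raw time series gives the recurrent encoder an explicit, inspectable interaction signal, which is in the spirit of the interpretability benefit the abstract claims.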
… To address these issues, this paper systematically reviews deep learning-driven vehicle trajectory prediction methods and highlights their application value in expert systems and …
Trajectory prediction is essential for the safe driving of autonomous vehicles. With advances in sensors and deep learning technologies, attempts have been made to model complex interactions. In this study, we propose a deep learning-based multimodal trajectory prediction method that accounts for traffic light conditions at complex urban intersections. Building on existing state-of-the-art research, a generative model predicts multiple paths for multiple agents, taking into account each actor's trajectory information and state, social interaction, traffic light state, and scene context. Performance was evaluated using metrics commonly used for stochastic trajectory prediction models. This study is meaningful in that trajectory prediction incorporates the realistic element of traffic lights in a complex urban environment. Future research should address efficient ways to reduce time and computational cost while reflecting different real-world environments.
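The abstract does not name the metrics it uses. Best-of-K average and final displacement error (often written minADE and minFDE) are common choices for stochastic predictors, so the sketch below assumes those. The function names and sample trajectories are illustrative, and the two minima are taken independently over the samples, which is one of several conventions in use.

```python
import math


def displacement_errors(pred, truth):
    """Per-step Euclidean distances between two equal-length trajectories."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def min_ade_fde(samples, truth):
    """Best-of-K average and final displacement error.

    samples: K predicted trajectories, each a list of (x, y) points.
    truth:   the ground-truth trajectory with the same number of points.
    """
    ades, fdes = [], []
    for pred in samples:
        errors = displacement_errors(pred, truth)
        ades.append(sum(errors) / len(errors))
        fdes.append(errors[-1])
    return min(ades), min(fdes)


# Hypothetical ground truth and two sampled futures for one agent.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # veers away from the true path
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # close until the last step
]
ade, fde = min_ade_fde(samples, truth)
```

Scoring only the best of K samples rewards a generative model for covering the true future with at least one hypothesis, rather than penalizing it for also proposing other plausible paths.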
Within the area of environmental perception, automatic navigation, object detection, and computer vision are crucial and demanding fields with many applications in modern industries, such as multi-target long-term visual tracking in automated production, defect detection, and driverless robotic vehicles. The performance of computer vision has greatly improved recently thanks to developments in deep learning algorithms and hardware computing capabilities, which have spawned the creation of a large number of related applications. At the same time, with the rapid increase in autonomous systems in the market, energy consumption has become an increasingly critical issue in computer vision and SLAM (Simultaneous Localization and Mapping) algorithms. This paper presents the results of a detailed review of over 100 papers published over the course of two decades (1999–2024), with a primary focus on the technical advancement in computer vision. To elucidate the foundational principles, an examination of typical visual algorithms based on traditional correlation filtering was initially conducted. Subsequently, a comprehensive overview of the state-of-the-art advancements in deep learning-based computer vision techniques was compiled. Furthermore, a comparative analysis of conventional and novel algorithms was undertaken to discuss the future trends and directions of computer vision. Lastly, the feasibility of employing visual SLAM algorithms in the context of autonomous vehicles was explored. Additionally, in the context of intelligent robots for low-carbon, unmanned factories, we discussed model optimization techniques such as pruning and quantization, highlighting their importance in enhancing energy efficiency. We conducted a comprehensive comparison of the performance and energy consumption of various computer vision algorithms, with a detailed exploration of how to balance these factors and a discussion of potential future development trends.
The Simultaneous Localization and Mapping (SLAM) problem, in which an autonomous vehicle moving in an unknown environment attempts to sense and map its surroundings while recognizing its own location and trajectory within the map, has long been a notable and popular research topic in computer vision, robotics, and artificial intelligence. Among the various solutions relying on different sensor modalities, such as the global positioning system (GPS), radio signals, and lidar, vision-based solutions are of major interest nowadays because cameras, especially stereo cameras, are low-cost and gather rich information. In this paper, different visual SLAM technologies, in which the main sensors are cameras, are surveyed with an emphasis on methodologies using stereo cameras. Some state-of-the-art open-source stereo visual SLAM frameworks are also discussed and compared. Finally, a general discussion of the challenges in terms of accuracy, processing time, cost, etc. is provided. The main purpose of this review is to provide a comprehensive overview of publicly available stereo visual SLAM frameworks and their pros and cons in different real-world scenarios.
… SLAM algorithm that estimates a dense 3D map representation which is more accurate than raw stereo measurements. Thereto, we run a sparse VSLAM … building using computer vision. …
This paper deals simultaneously with the trajectory estimation and map reconstruction by means of a stereo-calibrated vision system evolving in a large-scale unknown environment. This problem is widely known as Visual SLAM. Our proposal optimizes the execution time of the VSLAM framework while preserving its localization accuracy. The contributions of this paper are structured as follows. First, a novel VSLAM approach based on a “Weighted Mean” of multiple neighbor poses is detailed and is denoted as HOOFR SLAM. This approach provides a localization estimate after computing the camera poses (6-DOF rigid transformation) from the current image frame to previous neighbor frames. Taking advantage of the camera motion, we conjointly incorporate two types of stereo modes: “Static Stereo” mode (SS) through the fixed-baseline of left-right cameras setup along with the “Temporal Multi-view Stereo” mode (TMS). Moreover, instead of computing beforehand the disparity of SS mode for all key-points set, the disparity map in scale estimation step is limited to the inliers of the TMS mode so as to reduce the computational cost. This strategy is suitable to be parallelized on a multiprocessor architecture and exhibits a competitive performance with the other state-of-the-art strategies in many real datasets. Second, we report a hardware-software mapping of the proposed VSLAM approach. To this end, a heterogeneous CPU-GPU architecture-based vision system is considered. Third, a thorough and extensive experimental evaluation of our algorithm implemented on an automotive architecture (the NVIDIA Tegra TX1 system) is studied and analyzed. We report hence the localization and timing results through experiments on five well-known public stereo SLAM datasets: KITTI, Malaga, Oxford, MRT, and St_Lucia datasets.
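The "weighted mean of multiple neighbor poses" idea can be pictured with a simplified planar sketch. It is not the HOOFR SLAM algorithm: the paper works with 6-DOF poses and the abstract does not say how the weights are chosen, so the (x, y, yaw) pose, the use of a per-estimate weight such as an inlier count, and the function name `weighted_mean_pose` are all assumptions.

```python
import math


def weighted_mean_pose(estimates):
    """Combine several (x, y, yaw, weight) estimates of the current pose.

    Each estimate is assumed to come from matching the current frame
    against a different neighbouring frame. Positions are averaged
    linearly; yaw uses a weighted circular mean so that angles near
    +pi and -pi do not cancel out.
    """
    total = sum(w for _, _, _, w in estimates)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(px * w for px, _, _, w in estimates) / total
    y = sum(py * w for _, py, _, w in estimates) / total
    sin_sum = sum(math.sin(yaw) * w for _, _, yaw, w in estimates)
    cos_sum = sum(math.cos(yaw) * w for _, _, yaw, w in estimates)
    return x, y, math.atan2(sin_sum, cos_sum)


# Hypothetical estimates from three neighbour frames; the weights could be,
# for example, the number of inlier feature matches.
estimates = [
    (10.0, 2.0, 0.10, 120),
    (10.4, 2.2, 0.14, 60),
    (9.8, 1.9, 0.08, 20),
]
pose = weighted_mean_pose(estimates)
```

Averaging several independent frame-to-frame estimates dampens the effect of any single poor match, and each estimate can be computed in parallel, which fits the multiprocessor mapping the abstract describes.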
The development of autonomous vehicles has prompted an interest in exploring various techniques in navigation. One such technique is simultaneous localization and mapping (SLAM), which enables a vehicle to comprehend its surroundings, build a map of the environment in real time, and locate itself within that map. Although traditional techniques have been used to perform SLAM for a long time, recent advancements have seen the incorporation of neural network techniques into various stages of the SLAM pipeline. This review article provides a focused analysis of the recent developments in neural network techniques for SLAM-based localization of autonomous ground vehicles. In contrast to the previous review studies that covered general navigation and SLAM techniques, this paper specifically addresses the unique challenges and opportunities presented by the integration of neural networks in this context. Existing review studies have highlighted the limitations of conventional visual SLAM, and this article aims to explore the potential of deep learning methods. This article discusses the functions required for localization, and several neural network-based techniques proposed by researchers to carry out such functions. First, it presents a general background of the issue, the relevant review studies that have already been done, and the adopted methodology in this review. Then, it provides a thorough review of the findings regarding localization and odometry. Finally, it presents our analysis of the findings, open research questions in the field, and a conclusion. A semisystematic approach is used to carry out the review.
Simultaneous localization and mapping (SLAM) is a widely researched topic in robotics, augmented/virtual reality, and, more prominently, self-driving cars. SLAM is a technique for simultaneously building a map of the environment and estimating the state of the robot within that map as it moves. SLAM has existed for more than 30 years and has contributed significantly to industry, in applications ranging from small scale to large scale, which led to the advent of this decade's self-driving cars. This paper attempts to convey an understanding of SLAM and its progress in the autonomous driving industry, and briefly describes the SLAM techniques that have contributed significantly to the industry, particularly those evaluated on the KITTI dataset. We also compare the techniques presented and give a rough assessment of why the state-of-the-art approach can be revised and refined to suit the complex understanding of the environment needed for effective localization. Finally, we briefly describe the security threats related to the autonomous driving industry and why they are alarming.
Visual SLAM (Simultaneous Localization and Mapping) is a solution for achieving localization and mapping of robots simultaneously. Significant achievements have been made during the past decades, and geometry-based methods are becoming more and more successful in dealing with static environments. However, they still cannot handle challenging environments. With the great achievements of deep learning methods in computer vision, there is a trend of applying deep learning methods to visual SLAM. In this paper, the latest research progress in applying deep learning to visual SLAM is reviewed. The outstanding research results in deep learning-based visual odometry and loop closure detection are summarized. Finally, future development directions for deep learning-based visual SLAM are discussed.
In automated driving, navigating challenging urban environments with dynamic objects, large-scale scenes, and varying lighting and weather conditions makes accurate localization paramount for highly automated vehicles (HAVs) and autonomous vehicles (AVs). Imprecise localization can greatly impact the subsequent decision-making that manages an HAV's or AV's motion (planning and control tasks). In recent years, visual simultaneous localization and mapping (VSLAM) has shown substantial progress, and equipping vehicles with it can help handle non-standardized real-world situations and achieve higher localization and mapping accuracy. In this article, we present a comprehensive analysis of the current research status of VSLAM and its potential application to HAVs or AVs operating in complex urban environments. We first discuss the criteria for assessing how well VSLAM methods address these challenges. These criteria include real-time performance, accuracy, robustness, and system operating cost. Using these criteria, we evaluate various VSLAM methods in four essential aspects: rejection and tracking of highly dynamic objects, map construction in large-scale environments, loop detection and error correction, and sustainable operation and map updating. This evaluation provides valuable insights into the effectiveness of different VSLAM techniques. We then discuss potential research directions for leveraging VSLAM methods to achieve high-level automated driving in complex settings. We hope this article serves as a timely update on recent progress and advances in VSLAM that are applicable to HAVs or AVs. To facilitate future research, we create a repository that includes links to relevant reviews and methodological papers for learning at https://github.com/bumblebee15138/VSLAM for HAVs and AVs.
Deep learning has become the standard approach for object detection and recognition. Recently, there has been progress in using CNN models for geometric vision tasks such as depth estimation, optical flow prediction, and motion segmentation. However, Visual SLAM remains one of the areas of automated driving where CNNs are not mature enough for deployment in commercial systems. In this paper, we explore how deep learning can be used to replace parts of the classical Visual SLAM pipeline. First, we describe the building blocks of the Visual SLAM pipeline, which is composed of standard geometric vision tasks. Then we provide an overview of Visual SLAM use cases for automated driving based on the authors' experience in commercial deployment. Finally, we discuss the opportunities for using deep learning to improve upon state-of-the-art classical methods.
… in computer vision. Specifically, in the context of scene understanding for roads, 3D vehicle … Current approaches leverage two kinds of information to deal with the vehicle detection and …
Simultaneous localization and mapping (SLAM) is indispensable to autonomous vehicles (AVs). However, visual images are susceptible to light interference, and light detection and ranging (LiDAR) depends heavily on the geometric features of the surrounding scene, so relying solely on a camera or on LiDAR shows limitations in challenging environments. To solve these problems, we propose a LiDAR-visual fusion method for high-precision and robust vehicle localization. Compared with previous LiDAR-visual fusion methods, the proposed method fully utilizes the sensors' measurement data for fusion in each part. First, a LiDAR-vision frame is constructed at the front end, and the LiDAR is then used to assist the vision in obtaining depth information and in tracking. In the loop closure recognition part, a logic judgment module is introduced, and the LiDAR point cloud assists the vision in loop closure correction to reduce the positioning error. Additionally, a visual-assisted LiDAR method for 3-D scene reconstruction is proposed. Experiments in real scenes show that the average positioning errors are 2.065, 1.9, and 2.9 cm in the x, y, and z directions, respectively, and the average rotation errors are 0.11 rad, 0.11 rad, and 0.13 rad in roll, pitch, and yaw. The average positioning time is 29.98 ms. Compared with the classical ORB-SLAM2, LeGO-LOAM, DEMO, and TVL-SLAM algorithms, the proposed method demonstrates superior accuracy, robustness, and real-time performance.
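The abstract does not spell out how LiDAR assists the vision in obtaining depth, so the following is only a minimal sketch of one common idea, assuming a pinhole camera and LiDAR points already transformed into the camera frame: project the points into the image and give each visual feature the depth of the nearest projected point. The function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the `max_pixel_dist` threshold are hypothetical and are not taken from the paper.

```python
import math


def project_points(points_cam, fx, fy, cx, cy):
    """Project 3-D points (camera frame, metres) to (u, v, depth) with a pinhole model."""
    projected = []
    for x, y, z in points_cam:
        if z <= 0.0:  # behind the camera, so it cannot appear in the image
            continue
        projected.append((fx * x / z + cx, fy * y / z + cy, z))
    return projected


def assign_depth(features, projected, max_pixel_dist=3.0):
    """Give each 2-D feature the depth of its nearest projected LiDAR point.

    Returns None for a feature with no projected point within
    `max_pixel_dist` pixels (an illustrative threshold).
    """
    depths = []
    for u, v in features:
        best_depth, best_dist = None, max_pixel_dist
        for pu, pv, z in projected:
            dist = math.hypot(pu - u, pv - v)
            if dist <= best_dist:
                best_depth, best_dist = z, dist
        depths.append(best_depth)
    return depths


# Toy data (assumption): three LiDAR points, one of them behind the camera.
points = [(0.0, 0.0, 10.0), (1.0, 0.5, 5.0), (0.0, 0.0, -2.0)]
projected = project_points(points, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
features = [(321.0, 240.5), (420.0, 290.0), (10.0, 10.0)]
depths = assign_depth(features, projected)
```

In this toy case the first two features receive depths of 10.0 m and 5.0 m, and the third stays `None` because no LiDAR point projects near it. The brute-force nearest-neighbour search is chosen for readability; a practical system would use a spatial index and handle occlusion.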
SLAM is an abbreviation for simultaneous localization and mapping, a technique for estimating sensor motion and reconstructing structure in an unknown environment. SLAM that uses cameras is referred to as visual SLAM (vSLAM) because it is based on visual information only. vSLAM can serve as a fundamental technology for various types of applications and has been discussed in the literature of computer vision, augmented reality, and robotics. This paper aims to categorize and summarize recent vSLAM algorithms proposed in different research communities from both technical and historical points of view. In particular, we focus on vSLAM algorithms proposed mainly from 2010 to 2016, because major advances occurred in that period. The technical categories are feature-based, direct, and RGB-D camera-based approaches.
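This report divides the autonomous driving literature into four core sections: first, macro-level surveys, safety, and industry trends; second, low-level perception, localization, and SLAM techniques; third, trajectory prediction and behavior modeling for dynamic interactions; and finally, system implementation covering end-to-end learning, path planning, and decision-making and control. This taxonomy clearly shows the technical evolution from basic perception to intelligent decision-making, and from modular design to end-to-end integration.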