Dropout in CNNs
Medical Imaging for Disease Diagnosis and Clinical Applications
Focuses on CNN applications in medical imaging (e.g., MRI, CT, X-ray, dermoscopy, pathology slides), emphasizing how Dropout mitigates the overfitting caused by scarce medical training data and how it improves the accuracy and robustness of models in clinical diagnosis.
- Medical Image Segmentation Algorithm Based on Optimized Convolutional Neural Network-Adaptive Dropout Depth Calculation(Fengping An, Jun-e Liu, 2020, Complexity)
- Optimizing CNN Architectures for Early Detection of Lung Cancer Using CT Images(Swagatika Devi, Lakshya Swarup, S. S, Preeti Naval, Neha Jaswani, Jaskirat Singh, 2025, 2025 International Conference on Automation and Computation (AUTOCOM))
- Classification of Glaucoma Optical Coherence Tomography (OCT) Images Based on Blood Vessel Identification Using CNN and Firefly Optimization(Komanduri Venkata Sesha Sai Rama Krishna, Kosaraju Chaitanya, P. S. Subhashini, Rajesh Yamparala, K. S. Sandeep, 2021, Traitement du Signal)
- Deep Learning-Based Classification of Alzheimer's Disease Using EEG Signals: A CNN Approach for Early Detection(N. Mezher, Ahmed F. Hussein, S. M. Salih, 2025, Al-Nahrain Journal for Engineering Sciences)
- Alzheimer’s Disease Stage Classification Using a 4-Layer CNN and SMOTE for Imbalanced MRI Data(A. Raj, Deepti Kakkar, Gagandeep Singh, 2025, 2025 International Conference on Electronics, AI and Computing (EAIC))
- A Robust CNN Model for Diagnosis of COVID-19 Based on CT Scan Images and DL Techniques(A. H. Eldeeb, M. Amr, A. S. Ibrahim, H. Kamel, S. Fouad, 2023, International Journal of Electronics and Telecommunications)
- Classification of Alzheimer's Disease via Eight-Layer Convolutional Neural Network with Batch Normalization and Dropout Techniques(Xianwei Jiang, Liang Chang, Yudong Zhang, 2020, Journal of Medical Imaging and Health Informatics)
- Head and neck tumor segmentation convolutional neural network robust to missing PET/CT modalities using channel dropout(Lin-mei Zhao, Helen Zhang, Daniel Kim, Kanchan Ghimire, R. Hu, Daniel Kargilis, Lei Tang, Shujuan Meng, Quan Chen, W. Liao, H. Bai, Z. Jiao, Xue Feng, 2023, Physics in Medicine & Biology)
- PneuNet: A Compact Deep CNN Architecture for Pneumonia Classification in Chest Radiographs(Meet Sadariya, Kumar Parmar, Nevil Vasani, Hiren Kukadiya, Pranav Gediya, 2025, 2025 IEEE Madhya Pradesh Section Conference (MPCON))
- Integrating AI and genomics: predictive CNN models for schizophrenia phenotypes(G. Henriques, Maryam Abbasi, Daniel Martins, Joel P. Arrais, 2025, Journal of Integrative Bioinformatics)
- A Comparative Performance Analysis of Activation Functions for Cardiovascular Disease Detection Using ECG Images(Mrityunjay Chaubey, Abhay Kumar Pathak, Marisha, Manjari Gupta, 2025, Advanced Theory and Simulations)
- Identification and diagnosis of schizophrenia based on multichannel EEG and CNN deep learning model.(Imene Latreche, S. Slatnia, O. Kazar, S. Harous, Mohamed Akram Khelili, 2024, Schizophrenia Research)
- Enhancing CNN Performance for Image Classification(Tasmiya Mujawar, 2025, 2025 3rd International Conference on Communication, Security, and Artificial Intelligence (ICCSAI))
- Performance Evaluation Analysis and Hyperparameter Tuning of CNN Model Using MobileNetV2 Architecture on the Ba-Nanas! Application(R. Isnanto, D. Mónica, Zaharani, C. E. Widodo, Bellia Dwi, Cahya Putri, 2025, 2025 Tenth International Conference on Informatics and Computing (ICIC))
- A Custom CNN for Skin Lesion Classification(Rahul, Meenu Gupta, Rakesh Kumar, Ahmed J. Obaid, 2024, 2024 1st International Conference on Advances in Computing, Communication and Networking (ICAC2N))
- Lung Cancer Detection Using a Modified Convolutional Neural Network (CNN)(C. Cari, Mohtar Yunianto, Aisyah Ajibah Rahmah, 2024, INDONESIAN JOURNAL OF APPLIED PHYSICS)
- Evaluation of the Effect Of Regularization on Neural Networks for Regression Prediction: A Case Study of MLLP, CNN, and FNN Models(Susandri Susandri, Ahmad Zamsuri, Nurliana Nasution, Maya Ramadhani, 2025, INOVTEK Polbeng - Seri Informatika)
- Enhanced ECG Beat Classification using A Hybrid Transformer-CNN Model on the PTB-Xl Dataset(Rakesh Ramakrishna Pai, Ranjith Bhat, 2025, 2025 4th International Conference on Innovative Mechanisms for Industry Applications (ICIMIA))
- Analysis of Prediction of Pneumonia from Chest X-Ray Images Using CNN and Transfer Learning(V. K, Shashanka, Ashwini Kodipalli, S. P., K. S, 2024, 2024 5th International Conference for Emerging Technology (INCET))
- Employing Xception convolutional neural network through high-precision MRI analysis for brain tumor diagnosis(R. Sathya, T. Mahesh, Surbhi Bhatia Khan, A. Malibari, Fatima Asiri, A. Rehman, Wajdan Al Malwi, 2024, Frontiers in Medicine)
- Hyperparameter Optimization with Hyperband for Tuberculosis Classification(Yovi Ibnu Nasikhin, B. Rahmat, Chrystia Aji Putra, 2025, bit-Tech)
- Deep Learning-Based Melanoma Skin Cancer Detection: A CNN Approach with Data Augmentation(Akash Prajapati, Poornima Tyagi, Pradeep Kumar, 2025, 2025 International Conference on Emerging Technologies and Innovation for Sustainability (EmergIN))
- XRF-SVM: Early Detection of Alzheimer’s Disease using CNN(K. Dhanushrinivas, D. Raj, N. Sathwik, I. R. Oviya, T. Balaji, J. Reddy, 2025, 2025 International Conference on Computational Robotics, Testing and Engineering Evaluation (ICCRTEE))
- Interpretable Deep Learning for Musculoskeletal Radiograph Classification: Optimizing CNN Architectures with Explainable Insights(Shakhawat Hossain Refat, Md Azizul Rahaman, Yeasin Arafat, Parves Alam, Shahriar Sultan Ramit, M. Rahman, 2026, 2026 5th International Conference on Sentiment Analysis and Deep Learning (ICSADL))
- A Fine-Tuned AlexNet-Inspired CNN with Attention Mechanism for Brain Tumor Classification(G. R. Kumar, Ritu Rani, Sonal Malhotra, Kamlesh Kukreti, 2025, 2025 6th International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS))
- Adaptive CNN-Based Multi-Comorbid Heart Disease Prediction Using US Dataset with Clinically Validated Feature Optimization(Nazia Sultana, Dr.P.K. Kumar, 2026, Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications)
- IMed-CNN: Ensemble Learning Approach With Systematic Model Dropout for Enhanced Medical Image Classification Using Image Channels and Pixel Intervals(Javokhir Musaev, Abdulaziz Anorboev, Yeong-Seok Seo, Odil Fayzullaev, Akobir Musaev, N. Nguyen, Dosam Hwang, 2025, IEEE Access)
- Enhanced Mammogram Images Classification Through Comprehensive CNN Parameters Analysis(Saguna Ingle, A. Vidhate, Sangita S. Chaudhari, 2024, 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI))
- An Automatic Nucleus Segmentation and CNN Model based Classification Method of White Blood Cell(Partha Pratim Banik, Rappy Saha, Ki-Doo Kim, 2020, Expert Systems with Applications)
- Development of Convolutional Neural Network Models to Improve Facial Expression Recognition Accuracy(Fatimatuzzahra Fatimatuzzahra, L. Lindawati, Sopian Soim, 2024, Jurnal Ilmiah Teknik Elektro Komputer dan Informatika)
- Enhancing Pneumonia Classification Performance through CNN Architecture Optimization and Hyperparameter Tuning(M. Wahyudi, A. Windarto, 2025, Journal of Image and Graphics)
- OcuMDNet: A Lightweight CNN for Robust Multi-Disease Retinal Diagnosis with Cross-Dataset Reliability.(Qianjie Yang, Vijay Govindarajan, Qiyuan Li, Heding Zhou, Z. Shaikh, Amel Ksibi, Jing Yang, L. Y. Por, 2025, Experimental Eye Research)
- Multiple Sclerosis Identification by 14-Layer Convolutional Neural Network With Batch Normalization, Dropout, and Stochastic Pooling(Shui-hua Wang, Chaosheng Tang, Junding Sun, Jingyuan Yang, Chenxi Huang, Preetha Phillips, Yudong Zhang, 2018, Frontiers in Neuroscience)
- An Adaptive Weighted Attention-Enhanced Deep Convolutional Neural Network for Classification of MRI images of Parkinson's Disease.(Xinchun Cui, N. Chen, Chao Zhao, Jianlong Li, Xiangwei Zheng, Caixia Liu, Jiahu Yang, Xiuli Li, Chao Yu, JinXing Liu, Xiaoli Liu, 2023, Journal of Neuroscience Methods)
- The Influence of Dropout and Variational Inference on CNN Model Performance in Breast Tumor Classification(M. Nababan, Poltak Sihombing, E. Nababan, T. Harumy, 2025, 2025 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS))
- Multiple sclerosis identification by convolutional neural network with dropout and parametric ReLU(Yudong Zhang, Chichun Pan, Junding Sun, Chaosheng Tang, 2018, Journal of Computational Science)
- Sensorineural hearing loss identification via nine-layer convolutional neural network with batch normalization and dropout(Shui-hua Wang, Jin Hong, Ming Yang, 2018, Multimedia Tools and Applications)
- An Improved Deep Learning Powered Stock Market Price Prediction System Based on Overfitting Resistance Level(Balamohan. S, V. Khanaa, 2025, 2025 Tenth International Conference on Science Technology Engineering and Mathematics (ICONSTEM))
- A Fine-Funed CNN for Multiclass Classification Of Brain tumors On Figshare CE-MRI and its Raspberry Pi Deployment(Alaeddine Hmidi, Neji Kouka, Lina Tekari, 2025, Statistics, Optimization & Information Computing)
- Performance analysis of seven Convolutional Neural Networks (CNNs) with transfer learning for Invasive Ductal Carcinoma (IDC) grading in breast histopathological images(Wingates Voon, Y. Hum, Y. Tee, W. Yap, M. Salim, Tian Swee Tan, H. Mokayed, K. Lai, 2022, Scientific Reports)
- Automated Diagnostic System for Leukaemia Using CNN-Based Image Analysis(Gaurav Tuteja, K. R. Chythanya, Neha Sharma, Preeti Badhani, Garima Jaitly, 2026, 2026 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI))
- Oral cancer detection via Vanilla CNN optimized by improved artificial protozoa optimizer(Yulong Chai, Xiuqing Chai, Lan Zhang, G. Ye, Fatima Rashid Sheykhahmad, 2025, Scientific Reports)
- Classification of Dyslexia Using Bayesian Optimized CNN on Plane-wise Separated fMRI Images with Visual Interpretation(Adarsh Pradhan, Mirzanur Rahman, Shikhar Kumar Sarma, Achintam Kalita, Abbas Ali, Punasmita Ghosh, 2025, IETE Journal of Research)
- Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization(Shui-hua Wang, Khan Muhammad, Jin Hong, A. K. Sangaiah, Yudong Zhang, 2018, Neural Computing and Applications)
- Deep Convolutional Neural Network-based Automatic Detection of Brain Tumour(Indraneel Paul, Adyasha Sahu, P. Das, S. Meher, 2023, 2023 2nd International Conference for Innovation in Technology (INOCON))
- Implementation and Optimization of Saliency Mapping Algorithms in Convolutional Neural Networks (CNN) to Enhance Transparency in Pneumonia Diagnosis(Marta Ardiyanto, Ridwan Dwi Irawan, Kresna Agung Yudhianto, 2025, Proceeding of International Conference on Science, Health, And Technology)
- ECG Disease Classification Using 1D CNN(Agniva Dutta, M. Das, 2024, 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI))
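Many of the entries above pair a compact CNN with plain Dropout to curb overfitting on small medical datasets. As a minimal, framework-free sketch of the underlying mechanism (NumPy only; the function name is mine, not taken from any listed paper), standard inverted dropout looks like:

```python
import numpy as np

def inverted_dropout(x, p, rng, training=True):
    """Inverted dropout: at train time, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected activation matches
    inference, where the layer becomes an identity."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p          # keep each unit with prob 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((4, 8))                       # toy layer activations
out = inverted_dropout(acts, p=0.5, rng=rng)
# Surviving units are rescaled to 2.0, so the expected mean stays near 1.0.
```

The 1/(1-p) rescaling at train time is what lets the layer be a no-op at inference, which matters when the trained model is deployed unchanged for diagnosis.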
General Image Recognition and Cross-Industry Applications
Covers general computer-vision tasks such as facial recognition, crop monitoring, industrial inspection, and traffic-sign classification, studying how regularization techniques and architecture optimization address robustness challenges across differing data distributions.
- KLASIFIKASI CITRA WAJAH BERDASARKAN PENGGUNAAN KACAMATA MENGGUNAKAN ALGORITMA CNN DAN IMPLEMENTASI FLASK(Rahayu Fathan Asri, Subektiningsih Subektiningsih, 2026, Rabit : Jurnal Teknologi dan Sistem Informasi Univrab)
- A CNN Architecture for Distinguishing Pebbles and Shells in Coastal Image(Vatsala Anand, R. Sridhar, R. Dhenia, I. Kanani, D. Banerjee, Ajay Khajuria, 2025, 2025 International Conference on Innovations and Emerging Technologies In AI & Communication Systems (IETACS))
- Study on the CNN model optimization for household garbage classification based on machine learning(Wenzhuo Xie, Shiping Li, Wei Xu, Haotian Deng, Weihan Liao, Xianbao Duan, Xiang Wang, 2022, Journal of Ambient Intelligence and Smart Environments)
- Impact of fine-tuning parameters of convolutional neural network for skin cancer detection(Zaib Unnisa, Asadullah Tariq, Nadeem Sarwar, I. Din, M. Serhani, Zouheir Trabelsi, 2025, Scientific Reports)
- Robust SAR Automatic Target Recognition Based on Transferred MS-CNN with L2-Regularization(Yikui Zhai, Wenbo Deng, Ying Xu, Qirui Ke, Junying Gan, Bing Sun, Junying Zeng, V. Piuri, 2019, Computational Intelligence and Neuroscience)
- Pet dog facial expression recognition based on convolutional neural network and improved whale optimization algorithm(Yan Mao, Yaqian Liu, 2023, Scientific Reports)
- ENHANCING HERBAL PLANT LEAF IMAGE DETECTION ACCURACY THROUGH MOBILENET ARCHITECTURE OPTIMIZATION IN CNN(Anan Wibowo, Rahmat Zulpani, A. Windarto, A. Wanto, Sundari Retno Andani, 2025, JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer))
- Convolutional Neural Network (CNN) Model for the Classification of Varieties of Date Palm Fruits (Phoenix dactylifera L.)(P. Rybacki, Janetta Niemann, S. Derouiche, S. Chetehouna, I. Boulaares, Nili Mohammed Seghir, Jean Diatta, A. Osuch, 2024, Sensors)
- Gender Prediction through Image Analysis: A CNN Model Approach(Goldy Verma, Komuravelly Sudheer Kumar, 2025, 2025 4th OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 5.0)
- Based on Transfer Learning a Performance Analysis and Comparative Study of CNN for Accurate Plant Disease Detection(N. S, N. S, Ashwini Kodipalli, Trupthi Rao, K. S, 2023, 2023 International Conference on Network, Multimedia and Information Technology (NMITCON))
- Impact of Data Quality on CNN-Based Sewer Defect Detection(S. Jang, Dooil Kim, 2025, Water)
- Enhanced Plant Leaf Disease Detection Using CLAHE-Processed Images and Custom CNN Deep Learning Architecture(Neelam Sulaiya, 2025, International Journal for Research in Applied Science and Engineering Technology)
- CNN Optimization Using Dropout Regularization in Chicken Egg Fertility Detection(Tundo, Shoffan Saifullah, Mesra, Betty Yel, 2024, 2024 5th International Conference on Computational Science & Information Management (ICoCSIM))
- Tiny-CNN: Structuring Convolutional Neural Networks for Accurate Classification of Rice Leaf Diseases in Resource-Constrained Environments(K. Q. Nguyen, H. Nguyen, Trinh Le, Viet Binh Quoc Thai, B. Tran, Lan Thi Thu Le, Luyl-Da Quach, 2024, Proceedings of the 2024 9th International Conference on Intelligent Information Technology)
- Enhancing Facial Expression Recognition with Robust CNN Architectures and Adaptive Preprocessing Techniques(Ao Guo, 2025, Applied and Computational Engineering)
- An enhanced light weight face liveness detection method using deep convolutional neural network(Swapnil R. Shinde, A. Bongale, D. Dharrao, Sudeep D. Thepade, 2025, MethodsX)
- An Explainable and Lightweight CNN Framework for Robust Potato Leaf Disease Classification Using Grad-CAM Visualization(Md. Jiabul Hoque, M. Islam, 2025, Applied AI Letters)
- Improving CNN Robustness to Color Shifts via Color Balancing and Spatial Dropout(Pradyumna Elavarthi, Anca L. Ralescu, 2025, Advances in Artificial Intelligence and Machine Learning)
- Domain-Specific Progressive Channel Dropout: Single-Source Domain Generalization for Vessel Segmentation in X-ray Coronary Angiography(Mohammad Z. Atwany, Mojtaba Lashgari, Robin P. Choudhury, Abhirup Banerjee, 2025, 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC))
- Regularization of Deep Neural Networks with Spectral Dropout(Salman Hameed Khan, Munawar Hayat, F. Porikli, 2017, Neural Networks)
- Addressing Overfitting Problem in Deep Learning-Based Solutions for Next Generation Data-Driven Networks(Mansheng Xiao, Yuezhong Wu, Guocai Zuo, Shuangnan Fan, Huijun Yu, Zeeshan Azmat Shaikh, Zhiqiang Wen, 2021, Wireless Communications and Mobile Computing)
- Comparative Analysis of CNN Architectures for Eight-Class Facial Expression Recognition: A Performance and Error Pattern Study(K. Park, 2025, Tehnicki vjesnik - Technical Gazette)
- Cross-Sectional Texture and Pattern Analysis of Burmese and Assamese Areca Nuts Using CNN Model(Prodipto Das, I. Chakraborty, Purnendu Das, D. Roy, S. Biswas, 2025, Advances in Nonlinear Variational Inequalities)
- Comparison and Implementation of CNN Facial Emotion Recognition Model with Hyperparameter Analysis on Multiple Datasets(Xaviera Valentina Tandianto, Dwi Hosanna Bangkalang, Nina Setiyawati, 2026, Teknika)
- Fruit classification and recognition based on CNN(Chenxi Wang, 2025, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2025))
- Deep Convolutional Neural Network and Metaheuristic Optimization for Disease Detection in Plant Leaves(S. Towfek, Nima Khodadadi, 2023, Journal of Intelligent Systems and Internet of Things)
- Improving CNN Test Accuracy and Generalization with Targeted Optimization Training:(Jiseok Ham, 2025, Research Competence)
- Modified Convolutional Neural Network Based on Dropout and the Stochastic Gradient Descent Optimizer(Jing Yang, Guanci Yang, 2018, Algorithms)
- A fine tuned EfficientNet-B0 convolutional neural network for accurate and efficient classification of apple leaf diseases(Hassan Ali, Noora Shifa, Rachid Benlamri, A. Farooque, R. Yaqub, 2025, Scientific Reports)
- TikogAI: A Feature-Engineered CNN Model for Classifying Indigenous Tikog Leaves in Banig Weaving(L. J. B. Caluza, Arnel C. Fajardo, 2025, Journal of Innovative Image Processing)
- Scalable and Accurate Malaria Detection with CNN: Towards Accessible Diagnostic Solutions(I. Kanani, Vishnu Kant, R. Dhenia, R. Sridhar, D. Banerjee, Gaurav Sharma, 2025, 2025 Global Conference on Information Technology and Communication Networks (GITCON))
- Hybrid CNN-Based Classification of Coffee Bean Roasting Levels Using RGB and GLCM Features(Rico Halim, Mohammad Faisal Riftiarrasyid, 2025, Engineering, MAthematics and Computer Science Journal (EMACS))
- TSC18 Convolutional Neural Network for Traffic Sign Classification(Adil Hussain, Kashif Naseer Qureshi, Ayesha Aslam, Tariq Tariq, Muhammad Rabiu Abdullahi, 2024, 2024 4th International Conference on Emerging Smart Technologies and Applications (eSmarTA))
- Research of regularization techniques for SAR target recognition using deep CNN models(Qiuchen Feng, Dongliang Peng, Yu Gu, 2019, Tenth International Conference on Graphics and Image Processing (ICGIP 2018))
- Harnessing the Future with an Ensemble Model of Bi-LSTM and CNN for Precise Crop Prediction(S. Yadav, Prashant Giridhar Shambharkar, 2025, 2025 International Conference on Networks and Cryptology (NETCRYPT))
- An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language(Xianwei Jiang, Mingzhou Lu, Shuihua Wang, 2019, Multimedia Tools and Applications)
- Comparison of CNN-Based Models in Facial Micro-Expression Classification(Ruoxuan Liu, 2025, Highlights in Science, Engineering and Technology)
- Developing BrutNet: A New Deep CNN Model with GRU for Realtime Violence Detection(M. Haque, Syma Afsha, Hussain Nyeem, 2022, 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET))
- Fruit category classification via an eight-layer convolutional neural network with parametric rectified linear unit and dropout technique(Shui-hua Wang, Yi Chen, 2018, Multimedia Tools and Applications)
- Enhancing agriculture through real-time grape leaf disease classification via an edge device with a lightweight CNN architecture and Grad-CAM(Md. Jawadul Karim, Md. Omaer Faruq Goni, Md. Nahiduzzaman, M. Ahsan, J. Haider, M. Kowalski, 2024, Scientific Reports)
- AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection(Oleksandr Kuznetsov, D. Zakharov, Emanuele Frontoni, Andrea Maranesi, 2024, Computers & Security)
- PlantPulse: Intelligent CNN-based Analysis for Effective Maize Leaf Disease Detection(Ruchika Bhuria, Sheifali Gupta, 2024, 2024 8th International Conference on Electronics, Communication and Aerospace Technology (ICECA))
- Effectiveness of Random Search in Enhancing CNN Performance for Rice Plant Disease Classification(Tinuk Agustin, Indrawan Ady Saputro, Mochammad Luthfi Rahmadi, Fito Patria, Aradea Pinkan Kartiningtyas, Dicky Kurniawan, 2024, 2024 6th International Conference on Cybernetics and Intelligent System (ICORIS))
- Design of Enhanced CNN Model for Rice Disease Classification with Comparative Analysis on Different Variants of Dataset(S. Kazi, Bhakti Palkar, 2024, 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE))
- A Two-stage Optimized CNN Model for Power Quality Classification(Mustak Ahmed, N. Das, Nikita Kumari, S. Sreekumar, 2025, 2025 IEEE 5th International Conference on Sustainable Energy and Future Electric Transportation (SEFET))
- A Deep CNN model for Landuse Landcover Classification for 4 Band Visible and NIR Datasets(Pranavi Chachra, Aparna Tiwari, Minakshi Kumar, 2025, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences)
- Custom Lightweight Convolutional Neural Network Architecture for Automated Detection of Damaged Pallet Racking in Warehousing & Distribution Centers(Muhammad Hussain, Richard Hill, 2023, IEEE Access)
- LeafGuardNet: A Deep CNN-Based Model for Early Plant Disease Prediction in Smart Agriculture(P. Kumar, E. Venkatesh, 2025, International Journal of Computer Science and Mobile Computing)
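A robustness-oriented variant that recurs in this cluster is spatial (channel-wise) dropout, which drops entire feature maps rather than individual pixels so that strongly correlated neighbouring activations cannot simply stand in for a dropped unit. A NumPy sketch under my own naming, assuming an (N, C, H, W) layout:

```python
import numpy as np

def spatial_dropout(x, p, rng, training=True):
    """Spatial (2D) dropout: make one keep/drop decision per feature map
    of an (N, C, H, W) tensor, then rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    n, c = x.shape[:2]
    keep = rng.random((n, c, 1, 1)) >= p     # one decision per (sample, channel)
    return x * keep / (1.0 - p)

rng = np.random.default_rng(1)
fmap = np.ones((2, 16, 8, 8))                # toy convolutional feature maps
out = spatial_dropout(fmap, p=0.25, rng=rng)
```

Because nearby pixels in a convolutional feature map are highly correlated, per-pixel dropout regularizes such layers only weakly; dropping whole channels removes that escape hatch.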
Hybrid Model Architectures and Non-Image Data Analysis
Focuses on combining CNNs with other algorithms (LSTM, attention mechanisms, etc.) for audio, text, time-series analysis, network security, and speech emotion recognition, and examines Dropout strategies for multimodal feature extraction.
- A Simplified CNN Classification Method for MI-EEG via the Electrode Pairs Signals(Xiangmin Lun, Zhenglin Yu, Tao Chen, F. Wang, Yimin Hou, 2020, Frontiers in Human Neuroscience)
- Bearing fault diagnosis based on parallel multi-channel deep CNN and optimization with local sparse structure(Fangzhen Wang, Shuangxuan Liang, Xiaoli Zhang, Yongqing Zhou, 2026, Journal of Vibration and Control)
- Word Level Ethiopian Sign Language Recognition Using Hybrid CNN-LSTM Model(Bekalu Tadele, Yaregal Assabie, Tesfa Tegegne, 2025, 2025 3rd International Conference on Advancement in Computation & Computer Technologies (InCACCT))
- Hybrid LSTM-CNN deep learning framework for stock price prediction with google stock and reddit sentiment data(Emmanuel Chibuogu Asogwa, M. P. Nwankwo, Emmanuel E. Oguadimma, Chinyere Okechukwu, Ahmad Abubakar Suleiman, 2025, Innovation in Computer and Data Sciences)
- Research on network intrusion detection based on Whitening PCA and CNN(Peiqing Zhang, Guangke Tian, Haiying Dong, 2023, 2023 7th International Conference on Smart Grid and Smart Cities (ICSGSC))
- Attention mechanism based CNN-LSTM hybrid deep learning model for atmospheric ozone concentration prediction(Jiang Yuan, Dengxin Hua, Yufeng Wang, Xueting Yang, Di Huige, Yan Qing, 2025, Scientific Reports)
- Automated Recognition of Autism Spectrum Disorder from EEG Signals Using a CNN-LSTM Hybrid Model(Pranav Gupta, Helen Lee, Aranyak Laxmanan, Mahesh Khadtare, Pragati Dharmale, Dnyaneshwar Ahire, 2025, 2025 IEEE International Conference on Consumer Electronics (ICCE))
- M-FANet: Multi-Feature Attention Convolutional Neural Network for Motor Imagery Decoding(Yiyang Qin, Banghua Yang, Sixiong Ke, Peng Liu, Fenqi Rong, Xinxing Xia, 2024, IEEE Transactions on Neural Systems and Rehabilitation Engineering)
- Enhancing Cybersecurity with Deep Learning: A Comparative Study of CNN and RNN for IDS(Mana Saleh Al Reshan, 2025, 2025 10th International Conference on Information and Communication Technology for the Muslim World (ICT4M))
- Adversarially Robust 1D-CNN for Malicious Traffic Detection in Network Security Applications(E. Alomari, J. S. Alrubaye, O. Hassen, 2025, Journal of Cybersecurity and Information Management)
- Sentiment Analysis of Indonesian Hashtag #kaburajadulu Using CNN(Arvyn Rezky Fahrezy, Sabila Aralia Refina, Jessica Giovanna Chandra, Soni Yora, 2025, 2025 International Conference on Information Technology and Computing (ICITCOM))
- Word Embedding Dropout and Variable-Length Convolution Window in Convolutional Neural Network for Sentiment Classification(Shangdi Sun, Xiaodong Gu, 2017, Lecture Notes in Computer Science)
- Robust CNN-based Musical Instrument Recognition with Enhanced Feature Learning(Padmesh Sivalingam, A. J., S. P, Y. B, Ragav S, L. R., 2025, 2025 International Conference on Inventive Computation Technologies (ICICT))
- Sign Language to Text Translation Using Convolutional Neural Network(Trivedi Devarsh Gunvantray, T. Ananthan, 2024, 2024 International Conference on Emerging Smart Computing and Informatics (ESCI))
- Real-Time Facial Region Identification Via a Custom Six-Layer Convolutional Network(Pan Thiri Tun, Keniya Roy, Oakkar Min, Manish Deshwal, Gaurav Aggarwal, Rishabh Dev Shukla, 2025, 2025 International Conference on Electrical and Electronics Engineering (ICE3))
- Comparing Regularization Techniques for Overfitting in CNN-Based Music Genre Recognition(Qiaomu Liu, 2026, Theoretical and Natural Science)
- RMSE-Driven CNN for Efficient Speech Emotion Recognition(Uddipan Sarkar, Khairul Nisak Md Hasan, A. N. Nagaraja Rao, Tiasa Jana, 2025, 2025 5th International Conference on Internet of Things: Smart Innovation and Usages (IoT-SIU))
- Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network(Dawei Feng, Kele Xu, Haibo Mi, Feifan Liao, Yan Zhou, 2018, Lecture Notes in Computer Science)
- Convolutional Neural Network and Dropout Technique for Recognition of Thai Food Image(Niti Natephakdee, Supansa Chaising, P. Temdee, 2022, 2022 25th International Symposium on Wireless Personal Multimedia Communications (WPMC))
- Deep Convolutional Neural Networks based Performance Optimization for Fine-grained Sentiment Analysis(Zhe Chen, 2025, 2025 International Conference on Signal Processing, Computer Networks and Communications (SPCNC))
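For text inputs, one Dropout strategy listed above operates at the embedding level: drop entire word vectors rather than individual dimensions, so the classifier cannot over-rely on any single token. A minimal NumPy sketch in the spirit of that idea (names and shapes are my own assumptions, not the paper's API):

```python
import numpy as np

def word_dropout(embedded, p, rng, training=True):
    """Word-embedding dropout: make one keep/drop decision per token of a
    (batch, seq_len, dim) tensor, zeroing whole word vectors and rescaling
    survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return embedded
    b, t = embedded.shape[:2]
    keep = rng.random((b, t, 1)) >= p        # one decision per token
    return embedded * keep / (1.0 - p)

rng = np.random.default_rng(2)
emb = np.ones((3, 10, 32))                   # toy embedded batch
out = word_dropout(emb, p=0.3, rng=rng)
```

In a CNN-LSTM pipeline this sits between the embedding lookup and the convolutional feature extractor, complementing the ordinary dropout applied later on dense layers.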
Dropout Theory, Regularization Variants, and Hyperparameter Optimization
Concentrates on improvements to the Dropout mechanism itself (e.g., weighted, channel-wise, and mixed-pooling variants), novel regularization strategies, and systematic hyperparameter search driven by metaheuristic algorithms.
- HYPERPARAMETER TUNING OF CNN USING BAYESIAN OPTIMIZATION ON GASTRIC CANCER HISTOPATHOLOGICAL IMAGES(Radhika Shetty, 2025, International Journal of Applied Mathematics)
- Performance analysis of Convolutional Neural Network (CNN) based Cancerous Skin Lesion Detection System(G. Jayalakshmi, V. S. Kumar, 2019, 2019 International Conference on Computational Intelligence in Data Science (ICCIDS))
- Overfitting Defeat with Dropout for Image Classification in Convolutional Neural Networks(Kukuh Nugroho, Hendrawan, Iskandar, 2024, 2024 10th International Conference on Wireless and Telematics (ICWT))
- Dropout regularization to overcome the overfitting of the ResNet-50 CNN algorithm in oil palm leaf disease classification(H. P. Kiki Iranda, Ade Candra, Agus Harjoko, 2024, AIP Conference Proceedings)
- A CNN-based surrogate model of isogeometric analysis in nonlocal flexoelectric problems(Qimin Wang, X. Zhuang, 2022, Engineering with Computers)
- Effects and results of dropout layer in reducing overfitting with convolutional Neural Networks (CNN)(Olivia Harris, Michael Andrews, 2024, World Journal of Advanced Engineering Technology and Sciences)
- Ensemble genetic and CNN model-based image classification by enhancing hyperparameter tuning(Wajahat Hussain, Muhammad Faheem Mushtaq, Mobeen Shahroz, Urooj Akram, E. Ghith, M. Tlija, Tai-hoon Kim, Imran Ashraf, 2025, Scientific Reports)
- Intelligent handwritten recognition using hybrid CNN architectures based-SVM classifier with dropout(A. Ali, Suresha Mallaiah, 2021, Journal of King Saud University - Computer and Information Sciences)
- Comprehensive Analysis of the Impact of Learning Rate and Dropout Rate on the Performance of Convolutional Neural Networks on the CIFAR-10 Dataset(Changyu Peng, 2024, Applied and Computational Engineering)
- Hybrid Aquila optimizer-Harris Hawks optimization for CNN hyperparameter tuning in brain tumor classification.(Manoj Kumar, Noor Mohd, G. Shivam, Ankur Goyal, Deepak Parashar, Rijwan Khan, 2026, Scientific Reports)
- Cuckoo Search-Optimized Deep CNN for Enhanced Cyber Security in IoT Networks(B. B. Gupta, Akshat Gaurav, V. Arya, R. Attar, Shavi Bansal, Ahmed Alhomoud, K. Chui, 2024, Computers, Materials & Continua)
- Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network(A. Poernomo, Dae-Ki Kang, 2018, Neural Networks)
- Weight Dropout for Preventing Neural Networks from Overfitting(Karshiev Sanjar, A. Rehman, Anand Paul, Kim JeongHong, 2020, 2020 8th International Conference on Orange Technology (ICOT))
- Hyperparameter Tuning of Convolutional Neural Networks Using Lion Optimization Algorithm(Swagatika Mohapatra, P. P. Sarangi, Bhabani Shankar Prasad Mishra, Madhumita Panda, Siddharth Singh, A. Hasib, 2025, 2025 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI))
- Hyperparameter Optimization for Deepfake Detection Improving CNN and Transformer Performance Using Bayesian Search(Khadeja M. Mohamed, Ahmad ali Skaiky, Hanan Mahmood Shukur Ali, Aymen Mohammed, Zalzala Ali Mahdi, 2025, 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI))
- Leveraging Nature-Inspired Search to Boost CNN Performance in Colorectal Cancer Classification(Islam A. A. Mohammed, Nour S. Bakr, H. Moustafa, Adel F. Mohamed Moustafa, Hanan. M. Amer, 2025, International Journal of Telecommunications)
- NeuroFusion: A Hybrid CNN and Cuckoo Search Model for Enhanced Alzheimer’s Disease Detection(D. Ramani, Lakshmi, S. B V, C. B, B. S V, Soja Rani S, 2025, 2025 3rd International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI))
- Hyperparameter optimization of convolutional neural network using grey wolf optimization for facial emotion recognition(Muhammad Munsarif, Muhammad Sam'an, Ernawati Ernawati, Budi Santosa, 2025, Indonesian Journal of Electrical Engineering and Computer Science)
- A Hybrid Model using CNN and Genetic Algorithm for Pattern Classification of Interstitial Lung Disease(Tapas Pal, Biswadev Goswami, Sanjib Saha, Rajesh P. Barnwal, 2025, 2025 International Conference on Computing, Intelligence, and Application (CIACON))
- Tomato Leaf Detection Using Hybrid of CNN and Ant Lion Optimization(Rajsimar Singh, Law Kumar Singh, 2025, 2025 International Conference on Emerging Technologies and Innovation for Sustainability (EmergIN))
- Acne Scar Classification Using Enhanced CNN and Particle Swarm Optimization(Suzanne Vitolo, G. F. Shidik, Purwanto, 2025, 2025 4th International Conference on Electronics Representation and Algorithm (ICERA))
- Lightweight CNN with Particle Swarm Optimization for Odia and Bangla Handwritten Script Recognition(Pragnya Ranjan Dash, R. Balabantaray, 2026, SN Computer Science)
- Tomato Leaf Disease Detection Using a Hybrid of CNN and Whale Optimization Algorithm(Rajsimar Singh, Law Kumar Singh, V. Gupta, Manjot Kaur, 2025, 2025 2nd Global AI Summit - International Conference on Artificial Intelligence and Emerging Technology (AI Summit))
- Improving Visible Light Positioning Accuracy Using Particle Swarm Optimization (PSO) for Deep Learning Hyperparameter Updating in Received Signal Strength (RSS)-Based Convolutional Neural Network (CNN)(Chun-Ming Chang, Yuan-Zeng Lin, Chi-Wai Chow, 2025, Sensors)
- Mixed-pooling-dropout for convolutional neural network regularization(Brahim Ait Skourt, Abdelhamid El Hassani, A. Majda, 2021, Journal of King Saud University - Computer and Information Sciences)
- CSD: Channel Selection Dropout for Regularization of Convolutional Neural Networks(Imrus Salehin, Dae-Ki Kang, 2025, IEEE Access)
- GDnet-IP: Grouped Dropout-Based Convolutional Neural Network for Insect Pest Recognition(Dongcheng Li, Yongqi Xu, Zheming Yuan, Zhijun Dai, 2024, Agriculture)
- Enhancing Gravitational Lens Study with Deep Learning: A Study on Effects of Dropout Regularization(J. J. Ancona-Flores, A. Hernández-Almada, V. Motta, 2026, Galaxies)
- Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature Masking(Yunpeng Gong, Chuangliang Zhang, Yongjie Hou, Lifei Chen, Min Jiang, 2024, 2024 International Joint Conference on Neural Networks (IJCNN))
- Improving CNN Runtime Robustness Against Soft Errors by Dropout Layer Optimization(Robert Limas Sierra, Giuseppe Esposito, Juan-David Guerrero-Balaguera, J. Condia, M. S. Reorda, 2025, 2025 IEEE 26th Latin American Test Symposium (LATS))
- Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network(Saihui Hou, Zilei Wang, 2019, Proceedings of the AAAI Conference on Artificial Intelligence)
- CNN Hyperparameter Optimization for MNIST Dataset with Metaheuristic Algorithms(Gülistan Arslan, Hasan Temurtaş, 2025, Artificial Intelligence Studies)
- Deep Residual Multiscale Convolutional Neural Network With Attention Mechanism for Bearing Fault Diagnosis Under Strong Noise Environment(Shuzhen Han, Shengke Sun, Zhanshan Zhao, Ziqian Luan, Pingjuan Niu, 2024, IEEE Sensors Journal)
- Spectral Wavelet Dropout: Regularization in the Wavelet Domain(Rinor Cakaj, Jens Mehnert, Bin Yang, 2024, 2024 International Conference on Machine Learning and Applications (ICMLA))
- DisturbLabel: Regularizing CNN on the Loss Layer(Lingxi Xie, Jingdong Wang, Zhen Wei, Meng Wang, Qi Tian, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
- Combining Regularization and Dropout Techniques for Deep Convolutional Neural Network(Zari Farhadi, H. Bevrani, M. Feizi-Derakhshi, 2022, 2022 Global Energy Conference (GEC))
- Towards Reliable Malaria Diagnosis: Hybrid CNN Framework Based on VGG16 with Data Augmentation and Dropout-Enhanced Training Strategy(Md. Tofael Ahmed Bhuiyan, Nazim Uddin, Shahriar Manzoor, K. M. Uddin, 2025, 2025 International Conference on Quantum Photonics, Artificial Intelligence, and Networking (QPAIN))
- An Empirical Examination of CNN and ResNet-8 Architectures on the CIFAR-10 Dataset through Bayesian Hyper-parameter Optimization(Tanagi A. Omer, 2025, 2025 International Conference on Computer and Applications (ICCA))
This report systematically surveys the application of Dropout in CNNs across four core directions: medical image analysis (focusing on diagnostic accuracy and small-sample handling), general image recognition and industrial practice (focusing on robustness and generalization), multimodal and time-series signal analysis (focusing on feature extraction in hybrid architectures), and methodological innovation in Dropout techniques and hyperparameter optimization. Taken together, the literature shows that Dropout is no longer merely a cornerstone for preventing overfitting: combined with metaheuristic algorithms, attention mechanisms, and novel variants (such as channel Dropout and weighted Dropout), it substantially improves model reliability and raises the performance ceiling across a wide range of complex tasks.
155 related publications in total
This research investigates the application of Convolutional Neural Networks (CNNs) to automate fertility detection in chicken eggs, a task traditionally performed through manual candling. Utilizing a dataset of 292 candled egg images, we initially employed a CNN-basic model, which, while achieving high training accuracy, exhibited substantial overfitting with only 75% accuracy and a high loss of 0.9525 on validation datasets. Various Dropout rates were systematically tested to enhance the model's generalization capabilities, with 0.3 identified as the optimal rate. This adjustment significantly improved the model's performance, achieving a testing accuracy of 97.92% and reducing the loss to 0.2095. The findings highlight the efficacy of tailored CNNs in accurately classifying egg fertility, offering significant advancements for poultry farming. Future efforts will refine the CNN architecture through advanced preprocessing, expanded data augmentation, and sophisticated segmentation techniques to increase robustness and extend the model's applicability across diverse agricultural environments.
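As a concrete illustration of the mechanism being tuned in the study above, here is a minimal NumPy sketch of inverted dropout at the paper's best rate of 0.3. The function name and formulation are ours, not the authors' implementation:

```python
import numpy as np

def inverted_dropout(activations, rate=0.3, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units during training and
    rescale survivors by 1/(1-rate), so no rescaling is needed at test time."""
    if not training or rate == 0.0:
        return activations
    rng = rng or np.random.default_rng(0)
    keep = (rng.random(activations.shape) >= rate).astype(activations.dtype)
    return activations * keep / (1.0 - rate)

x = np.ones((4, 8))
y = inverted_dropout(x, rate=0.3)  # each entry is either 0 or 1/0.7
```

At inference (`training=False`) the layer is an identity, which is what makes the "inverted" scaling convenient in deployed models.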
No abstract available
Regularization techniques help prevent overfitting and therefore improve the ability of convolutional neural networks (CNNs) to generalize. One reason for overfitting is the complex co-adaptations among different parts of the network, which make the CNN dependent on their joint response rather than encouraging each part to learn a useful feature representation independently. Frequency domain manipulation is a powerful strategy for modifying data that has temporal and spatial coherence by utilizing frequency decomposition. This work introduces Spectral Wavelet Dropout (SWD), a novel regularization method that includes two variants: 1D-SWD and 2D-SWD. These variants improve CNN generalization by randomly dropping detailed frequency bands in the discrete wavelet decomposition of feature maps. Our approach distinguishes itself from the pre-existing Spectral “Fourier” Dropout (2D-SFD), which eliminates coefficients in the Fourier domain. Notably, SWD requires only a single hyperparameter, unlike the two required by SFD. We also extend the literature by implementing a one-dimensional version of Spectral “Fourier” Dropout (1D-SFD), setting the stage for a comprehensive comparison. Our evaluation shows that both 1D and 2D SWD variants have competitive performance on CIFAR-10/100 benchmarks relative to both 1D-SFD and 2D-SFD. Specifically, 1D-SWD has a significantly lower computational complexity compared to 1D/2D-SFD. In the Pascal VOC Object Detection benchmark, SWD variants surpass 1D-SFD and 2D-SFD in performance and demonstrate lower computational complexity during training.
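The core idea of dropping a detail band in a wavelet decomposition can be sketched with a hand-rolled single-level orthonormal Haar transform; this is our simplification, not the paper's multi-band machinery, and the drop probability `p` is an illustrative parameter:

```python
import numpy as np

def haar_dwt1d(x):
    """Single-level orthonormal Haar decomposition into approximation (low-
    frequency) and detail (high-frequency) bands; x must have even length."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_idwt1d(a, d):
    """Exact inverse of haar_dwt1d."""
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def spectral_wavelet_dropout(x, p=0.5, rng=None):
    """SWD-style sketch: with probability p, zero the detail band of a 1D
    feature map and reconstruct; the approximation band is always kept."""
    rng = rng or np.random.default_rng(0)
    a, d = haar_dwt1d(x)
    if rng.random() < p:
        d = np.zeros_like(d)
    return haar_idwt1d(a, d)
```

With `p=0` the transform round-trips exactly, which is a useful sanity check that the drop, not the transform, is the regularizer.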
Regularization is an important technique for developing deep learning models to improve generalization and reduce overfitting. This study evaluated the effect of regularization on the performance of neural network models in regression prediction tasks using earthquake data. We compare Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Feedforward Neural Network (FNN) architectures with L2 and Dropout regularization. The experimental results show that MLP without regularization achieved the best performance (RMSE: 0.500, MAE: 0.380, R²: 0.625), although prone to overfitting. CNN performed poorly on tabular data, while FNN showed marginal improvement with deeper layers. The novelty of this study lies in a comparative evaluation of regularization strategies across multiple architectures for earthquake regression prediction, highlighting practical implications for early warning systems.
Plasmodium parasites produce malaria, which may be very dangerous if left untreated. While computer-aided procedures allow for quicker and more accurate detection, conventional diagnostic methods are often expensive and subject to human error. Deep learning works especially well for classifying malaria because it makes use of a wealth of visual data. This study employs transfer learning with pre-trained convolutional neural networks (CNNs), such as VGG16, ResNet50, and EfficientNetB3, to differentiate between images of cells infected with malaria and those that are not. On a well-known malaria dataset, VGG16 has the best classification accuracy of 98.15% among them, together with outstanding specificity and sensitivity. To maximize model performance, the approach combines data augmentation with a customized CNN architecture built on the VGG16 architecture, which is improved by dropout regularization along with training callbacks. Numerous tests confirm the efficacy of the suggested approach, offering a dependable and automated way to diagnose malaria.
Convolutional neural networks (CNNs) have demonstrated remarkable success in vision-related tasks. However, their susceptibility to failing when inputs deviate from the training distribution is well-documented. Recent studies suggest that CNNs exhibit a bias toward texture instead of object shape in image classification tasks, and that background information may affect predictions. This paper investigates the ability of CNNs to adapt to different color distributions of an image while maintaining context and background. The results of our experiments on modified MNIST, CIFAR-10 and CIFAR-100 data demonstrate that changes in color can substantially affect classification accuracy. The paper explores the effects of various regularization techniques on generalization error across datasets and proposes an architectural modification that uses color balancing and spatial dropout regularization in a novel way. This enhances the model's reliance on color-invariant intensity-based features for improved classification accuracy. Overall, this work contributes to ongoing efforts to understand the limitations and challenges of CNNs in image classification tasks and offers a potential solution to improve their performance.
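The spatial dropout component mentioned above can be sketched as dropping entire feature maps (channels) rather than individual pixels, so spatially correlated activations are removed together. This is a generic NumPy illustration of the technique, not the authors' exact architecture:

```python
import numpy as np

def spatial_dropout(fmaps, rate=0.2, rng=None):
    """Spatial dropout for an (N, C, H, W) batch: draw one keep/drop decision
    per (sample, channel) pair, zero whole channels, and rescale survivors."""
    rng = rng or np.random.default_rng(0)
    n, c = fmaps.shape[:2]
    keep = (rng.random((n, c, 1, 1)) >= rate).astype(fmaps.dtype)
    return fmaps * keep / (1.0 - rate)
```

Because adjacent pixels within a feature map are strongly correlated, per-pixel dropout lets information leak through neighbors; dropping whole channels avoids that leak.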
Convolutional Neural Networks (CNNs) have shown exceptional effectiveness in complex and data-intensive domains such as image and video processing, conversational systems, and healthcare. Moreover, sectors like High-Performance Computing and safety-critical applications, including automotive, aerospace, and autonomous robotics, impose stringent requirements on energy efficiency, performance, and robustness. However, modern semiconductor technologies are increasingly vulnerable to faults, which can degrade CNN performance and potentially result in catastrophic failures. This work explores the impact of regularization techniques (dropout layers) in enhancing the inference robustness of CNN models against soft errors. We analyzed soft error impacts on five widely adopted CNN architectures, each trained with ten different dropout rates. Our experimental results reveal that optimizing the dropout rate during training can improve the in-field robustness of CNN models by up to 12% compared to baseline configurations under soft error conditions. Additionally, fine-tuning this architectural parameter can lead to accuracy improvements of up to 10%.
In this study, we present a novel approach, Channel Selection Dropout (CSD), designed to regularize deep convolutional neural network (CNN) architectures. Unlike standard Dropout, which randomly deactivates neurons in fully connected layers, CSD works on the image channels within the sequence of convolutional layers. Specifically, CSD is composed of three modules, i.e., Channel Process Module, Channel Drop Module, and Scale Module. CSD primarily emphasizes channels to identify significant channels based on activation values. It preserves channels that possess values above a user-defined threshold <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>. Channels that are less significant are set to a value of zero. CSD does not add any extra expenses during the CNN architecture training phase. It is used only during training and deployed only at a minimal computational cost. In the testing phase, the network retains its original state, resulting in no added expenses for inference. Moreover, CSD integration into current networks does not require re-pretraining on ImageNet. This makes it fit seamlessly with other datasets. Finally, the performance of CSD with ResNet-18, ResNet-50, and VGGNet-16 is experimentally evaluated across multiple datasets. Our results demonstrate that setting <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>= 0.60 significantly enhances performance, with most results reaching over 95% accuracy on the benchmark datasets. However, the performance of <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula> may vary, and it can be adjusted based on the specific dataset and architecture used. The comprehensive results clarify that CSD consistently enhances performance over the baselines. This method can be applied in future CNN applications to mitigate overfitting, particularly in image segmentation, Vision Transformers (ViT), and medical imaging.
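A rough sketch of the CSD idea: score channels, zero the weak ones, keep the strong ones. Note two simplifications that are ours, not the paper's: we treat α as a quantile of per-channel mean activations (the paper applies a user-defined threshold directly to activation values), and the Scale Module is omitted:

```python
import numpy as np

def channel_selection_dropout(fmaps, alpha=0.60):
    """CSD-style sketch (training only) for an (N, C, H, W) batch.
    Channel Process Module: score channels via global average pooling.
    Channel Drop Module: zero channels at or below the alpha-quantile.
    (Scale Module omitted; alpha-as-quantile is our simplification.)"""
    scores = fmaps.mean(axis=(2, 3))                        # (N, C)
    thresh = np.quantile(scores, alpha, axis=1, keepdims=True)
    keep = (scores > thresh).astype(fmaps.dtype)
    return fmaps * keep[:, :, None, None]
```

As in the paper, this touches only the training path; at test time the network would simply skip the call, so inference cost is unchanged.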
Convolutional Neural Networks (CNNs) have proved to be more precise for most computer vision tasks like image classification, object detection, and facial recognition. In the process, though, CNNs are susceptible to overfitting, particularly where the model complexity is high but the training data are few. Overfitting diminishes the generalizability potential of a model to new data, and deep learning consequently demands regularization techniques. One of the most powerful and widely used regularization methods is dropout, in which a random set of neurons is dropped at each training iteration. It prevents neurons from co-adapting too strongly to specific features in the training data, making the network more robust and generalizable. Here, we empirically validate the effect of the dropout layers used in the CNN model scenario. Of particular interest to us are the dynamics of model training, generalization, and performance as the dropout rate changes. Experimental and model comparisons are performed using standard image classification datasets under various dropout settings. In all our experiments, results indicate that models trained with dropout exhibit reduced overfitting, enhanced validation accuracy, and better generalization to novel data. The findings highlight the need to apply dropout in CNN architectures, particularly when dealing with small datasets. Our contribution highlights the trade-off in choosing an optimal dropout rate, since overly high or low rates can lead to underfitting or insufficient regularization, respectively. Lastly, the current study reiterates the application of dropout as a leading method for enhancing the performance and stability of deep learning models in computer vision applications.
Strong gravitational lensing provides valuable insights into the mass distribution of galaxies and the nature of dark matter. However, its modeling is computationally demanding due to the large volume of strong lensing observations. In this work, we explore the application of Convolutional Neural Networks to infer physical parameters from simulated galaxy–galaxy lens systems, described by the Singular Isothermal Ellipsoid (SIE) profile for the galaxy lens. We construct a dataset of 76,396 synthetic lensing images derived from the China Space Station Telescope catalog and employ it to train a modified CNN model, based on the AlexNet architecture, to predict four key SIE parameters, the Einstein radius, the axis ratio and ellipticity components. We analyze the network performance under three distinct dropout configurations to quantify their influence on generalization and parameter inference accuracy. The results indicate that the incorporation of dropout is critical for enhancing the precision and robustness of the estimated parameters as demonstrated using a 4-fold cross-validation procedure. When dropout tools are included, we obtain coefficients of determination up to R2∼0.96 for most SIE parameters and mean peak signal-to-noise ratios of up to ∼37 dB. Relative to the configuration without dropout, the use of dropout reduces the relative errors in the inferred SIE parameters by approximately 60–76%, resulting in errors of at most ∼9% at the 90% confidence level for the majority of parameters. These findings highlight the potential of deep learning approaches to enable scalable, computationally efficient, and high-precision modeling of strong gravitational lensing systems.
Training a deep neural network with a large number of parameters often leads to overfitting problems. Recently, Dropout has been introduced as a simple, yet effective regularization approach to combat overfitting in such models. Although Dropout has shown remarkable results on many deep neural network cases, its actual effect on CNNs has not been thoroughly explored. Moreover, training a Dropout model will significantly increase the training time, as it takes longer to converge than a non-Dropout model with the same architecture. To deal with these issues, we introduce Biased Dropout and Crossmap Dropout, two novel approaches of Dropout extension based on the behavior of hidden units in the CNN model. Biased Dropout divides the hidden units in a certain layer into two groups based on their magnitude and applies a different Dropout rate to each group appropriately. Hidden units with higher activation values, which give more contributions to the network's final performance, will be retained by a lower Dropout rate, while units with lower activation values will be exposed to a higher Dropout rate to compensate for the first group. The second approach is Crossmap Dropout, which is an extension of the regular Dropout in the convolution layer. The feature maps in a convolution layer are strongly correlated with one another, particularly at every identical pixel location in each feature map. Crossmap Dropout tries to maintain this important correlation yet at the same time break the correlation between adjacent pixels with respect to all feature maps by applying the same Dropout mask to all feature maps, so that all pixels or units in equivalent positions in each feature map will be either dropped or active during training. Our experiment with various benchmark datasets shows that our approaches provide better generalization than the regular Dropout.
Moreover, our Biased Dropout converges faster during the training phase, suggesting that assigning noise appropriately to hidden units can lead to effective regularization.
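The Biased Dropout mechanism described above can be sketched as a median split on activation magnitude with two drop rates. The split criterion and the specific rates here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def biased_dropout(h, low_rate=0.2, high_rate=0.6, rng=None):
    """Biased Dropout sketch: units at or above the median |activation| are
    'strong' and dropped at low_rate; weaker units are dropped at high_rate.
    Survivors are rescaled per-unit so expected activations are preserved."""
    rng = rng or np.random.default_rng(0)
    strong = np.abs(h) >= np.median(np.abs(h))
    rate = np.where(strong, low_rate, high_rate)
    keep = (rng.random(h.shape) >= rate).astype(h.dtype)
    return h * keep / (1.0 - rate)
```

The intuition matches the abstract: high-contribution units are kept more often, while low-contribution units absorb more of the injected noise.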
A key component of music information retrieval (MIR) is music genre classification, which has several uses, including automatic labelling and recommendation systems found on numerous music applications. Considering previous studies that exhibited overfitting, this study compares three widely used regularization techniques (Dropout, L2-Regularization, and Batch Normalization) to evaluate their effectiveness in minimizing overfitting. The GTZAN dataset was used, where the raw audio waveforms were converted into log-scaled Mel-spectrograms, which were used as the input to the convolutional neural network (CNN) model. A baseline CNN model and three regularized variants were trained and evaluated based on classification test accuracy, the generalization gap, and the generalization ratio. The results indicate that Batch Normalization was the most effective strategy, achieving the highest test accuracy of 75.00%. In contrast, L2-Regularization provided moderate improvement compared to the baseline model. At the same time, using Dropout by itself made the model worse at handling new songs, because randomly removing neurons interfered with learning stable patterns in how the frequency changes over time in spectrograms made from the original audio data. These results indicate that the effectiveness of regularization methods in audio-based CNNs depends strongly on the specific technique employed, with Batch Normalization offering the most promising approach for enhancing generalization in CNN-based music classification systems.
In this work, we propose a novel method named Weighted Channel Dropout (WCD) for the regularization of deep Convolutional Neural Network (CNN). Different from Dropout which randomly selects the neurons to set to zero in the fully-connected layers, WCD operates on the channels in the stack of convolutional layers. Specifically, WCD consists of two steps, i.e., Rating Channels and Selecting Channels, and three modules, i.e., Global Average Pooling, Weighted Random Selection and Random Number Generator. It filters the channels according to their activation status and can be plugged into any two consecutive layers, which unifies the original Dropout and Channel-Wise Dropout. WCD is totally parameter-free and deployed only in training phase with very slight computation cost. The network in test phase remains unchanged and thus the inference cost is not added at all. Besides, when combining with the existing networks, it requires no re-pretraining on ImageNet and thus is well-suited for the application on small datasets. Finally, WCD with VGGNet-16, ResNet-101, Inception-V3 are experimentally evaluated on multiple datasets. The extensive results demonstrate that WCD can bring consistent improvements over the baselines.
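WCD's two steps, Rating Channels and Selecting Channels, can be sketched as follows. This is a simplification: we assume nonnegative (e.g. post-ReLU) activations and replace the paper's Weighted Random Selection and Random Number Generator modules with a per-channel Bernoulli draw whose probability is proportional to the channel's score:

```python
import numpy as np

def weighted_channel_dropout(fmaps, keep_frac=0.8, rng=None):
    """WCD sketch for an (N, C, H, W) batch of nonnegative activations.
    Rating Channels: global average pooling gives one score per channel.
    Selecting Channels: survival probability is proportional to the score,
    so weakly activated channels are dropped more often."""
    rng = rng or np.random.default_rng(0)
    n, c = fmaps.shape[:2]
    scores = fmaps.mean(axis=(2, 3))                               # (N, C)
    probs = scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-12)
    keep = rng.random((n, c)) < np.clip(probs * c * keep_frac, 0.0, 1.0)
    return fmaps * keep.astype(fmaps.dtype)[:, :, None, None]
```

As in the paper, nothing here touches the test-time graph: the function is simply not called at inference, so no extra cost is added.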
Though Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) via Convolutional Neural Networks (CNNs) has made huge progress toward deep learning, some key issues still remain unsolved due to the lack of sufficient samples and robust models. In this paper, we propose an efficient transferred Max-Slice CNN (MS-CNN) with L2-Regularization for SAR ATR, which can enrich the features and recognize the targets with superior performance. Firstly, the data amplification method is presented to reduce the computational time and enrich the raw features of SAR targets. Secondly, the proposed MS-CNN framework with L2-Regularization is trained to extract robust features, in which the L2-Regularization is incorporated to avoid the overfitting phenomenon and further optimize our proposed model. Thirdly, transfer learning is introduced to enhance the feature representation and discrimination, which can boost the performance and robustness of the proposed model on small samples. Finally, various activation functions and dropout strategies are evaluated for further improving recognition performance. Extensive experiments demonstrated that our proposed method can not only outperform other state-of-the-art methods on the public and extended MSTAR dataset but also obtain good performance on random small datasets.
The big breakthrough on the ImageNet challenge in 2012 was partially due to the 'Dropout' technique used to avoid overfitting. Here, we introduce a new approach called 'Spectral Dropout' to improve the generalization ability of deep neural networks. We cast the proposed approach in the form of regular Convolutional Neural Network (CNN) weight layers using a decorrelation transform with fixed basis functions. Our spectral dropout method prevents overfitting by eliminating weak and 'noisy' Fourier domain coefficients of the neural network activations, leading to remarkably better results than the current regularization methods. Furthermore, the proposed method is very efficient due to the fixed basis functions used for spectral transformation. In particular, compared to Dropout and Drop-Connect, our method significantly speeds up the network convergence rate during the training process (roughly ×2), with considerably higher neuron pruning rates (an increase of ∼30%). We demonstrate that the spectral dropout can also be used in conjunction with other regularization approaches resulting in additional performance gains.
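The coefficient-elimination step can be illustrated with a plain FFT on a 2D activation map. This deterministic magnitude-threshold version is our sketch: the published method embeds the transform in fixed-basis CNN weight layers, and the relative threshold `tau` is an assumption of ours:

```python
import numpy as np

def spectral_dropout(acts, tau=0.1):
    """Spectral Dropout sketch: move a 2D activation map into the Fourier
    domain, zero 'weak and noisy' coefficients whose magnitude is below a
    fraction tau of the largest coefficient, and transform back."""
    coeffs = np.fft.fft2(acts)
    coeffs[np.abs(coeffs) < tau * np.abs(coeffs).max()] = 0.0
    return np.real(np.fft.ifft2(coeffs))
```

With `tau=0` the round trip is the identity, so any regularization effect comes entirely from the pruned spectrum, not from the change of basis.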
Due to limited training data for SAR target recognition tasks, different regularization techniques have been used in deep convolutional neural networks (CNNs) to improve generalization ability. In this paper, the influences of three regularization techniques, including data augmentation, the L2 regularization term, and dropout, are studied under standard operating conditions (SOC) when the moving and stationary target recognition (MSTAR) dataset is used for SAR target recognition. Four representative CNN models based on classical architectures, such as AlexNet and ResNet, are selected and trained to recognize 10 classes of targets. Additionally, a CNN model with fewer network parameters is designed based on a multi-scale spatial feature extraction strategy and SqueezeNet to study the influence of the number of network parameters. The experimental results demonstrate that, when using the AlexNet series models for SAR target recognition, using dropout may greatly improve model optimization. ResNet series models, which have more layers, perform better on Test 1+noise than other CNN models, especially when dropout is used in the model. For the models based on highway networks, adding L2 regularization terms to the loss function can improve the test accuracy, but it also makes the latter phase of training extremely unstable. Data augmentation is an effective regularization technique when the model can reach high training accuracy.
In this study, we propose a new method for oral cancer detection using a modified Vanilla Convolutional Neural Network (CNN) architecture with incorporated batch normalization, dropout regularization, and a customized design structure for the convolutional block. An Improved Artificial Protozoa Optimizer (IAPO) metaheuristic algorithm is proposed to optimize the Vanilla CNN; the IAPO improves the original Artificial Protozoa Optimizer through a new search strategy and an adaptive parameter tuning mechanism. Due to its effectiveness in search space exploration while avoiding local optima, the IAPO algorithm is chosen to optimize the convolutional neural network. In this study, a published dataset of 1000 patient images is preprocessed with contrast enhancement, noise reduction, and data augmentation (such as rotation, flipping, and cropping) to build a robust, targeted model for oral cancer detection. The experimental results are evaluated against benchmark performance measures (accuracy, precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve (AUC-ROC)). We demonstrate through experimental results that the proposed IAPO-optimized Vanilla CNN achieves a high accuracy of 92.5%, which is superior to previous state-of-the-art models such as ResNet-101 (90.1%) and DenseNet-121 (89.5%). Given this accuracy advantage over other existing models, the proposed method offers a more trustworthy approach to oral cancer detection.
For identifying foliar diseases in crops at an early stage, accurate detection is necessary in maintaining food security, minimizing economic losses, and cultivating sustainable agriculture. In staple crops, potato is highly vulnerable to lethal diseases like Early Blight and Late Blight that can drastically affect both the quality and the quantity of the yield. Conventional diagnostic procedures using visual observation and/or laboratory examinations are frequently tedious, time‐consuming, and susceptible to error. To address these problems, in this research, we propose a novel deep learning architecture using a customized convolutional neural network (CNN) for classifying potato leaf images into three distinct classes, namely Early Blight, Late Blight and Healthy. The model is trained on a selective and heavily augmented subset of the PlantVillage dataset containing 11,593 images and further optimized using regularization techniques like dropout and batch normalization. The system architecture is intended to keep the tradeoff between performance and computational efficiency, so as to fit real‐world agricultural scenarios. To increase interpretability and improve trust, we use the Gradient‐weighted Class Activation Mapping (Grad‐CAM) to visualize the regions in space of the leaves that most contribute to the prediction of the model. The experimental results show superior performance and the proposed model reaches 99.14% accuracy and close‐to‐perfect precision, recall and F1‐scores in all of the classes. Grad‐CAM visualizations validate that the model is robust in attending to biologically meaningful regions for the disease symptoms. In addition, we perform comparative analyses against recent state‐of‐the‐art models, and demonstrate that the proposed approach outperforms the others in accuracy and interpretability.
Sewer pipelines are essential urban infrastructure that play a key role in sanitation and disaster prevention. Regular condition assessments are necessary to detect defects early and determine optimal maintenance timing. However, traditional visual inspection using closed-circuit television (CCTV) footage is time-consuming, labor-intensive, and dependent on subjective human judgment. To address these limitations, this study develops a convolutional neural network (CNN)-based sewer defect classification model and analyzes how data quality—such as mislabeled or redundant images—affects model accuracy. A large-scale public dataset of approximately 470,000 sewer images was used for training. The model was designed to classify non-defect and three major defect categories. Based on the ResNet50 architecture, the model incorporated dropout and L2 regularization to prevent overfitting. Experimental results showed the highest accuracy of 92.75% at a dropout rate of 0.2 and a regularization coefficient of 0.01. Further analysis revealed that mislabeled, redundant, or obscured images within the dataset negatively impacted model performance. Additional experiments quantified the impact of data quality on accuracy, emphasizing the importance of proper dataset curation. This study provides practical insights into optimizing data-driven approaches for automated sewer defect detection and high-performance model development.
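The regularization pairing reported above (dropout rate 0.2 plus an L2 coefficient of 0.01) amounts to adding a weight-decay term to the training loss. A minimal sketch with illustrative names; the data-loss value and weight shapes below are placeholders, not values from the study:

```python
import numpy as np

def regularized_loss(data_loss, weights, lam=0.01):
    """Total training loss with L2 regularization: the data term plus
    lam * sum of squared weights, using the study's best coefficient (0.01)
    as the default."""
    return data_loss + lam * sum(np.sum(w ** 2) for w in weights)

weights = [np.ones((2, 2)), np.ones((3,))]
total = regularized_loss(1.0, weights)  # 1.0 + 0.01 * (4 + 3) = 1.07
```

Dropout (here at rate 0.2) would act on activations during the forward pass, while the L2 term penalizes the weights themselves; the two are complementary, which is why grid-searching both jointly, as the study does, makes sense.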
This study aims to develop a lightweight convolutional network for the classification of multiple retinal diseases using fundus images and to evaluate cross-dataset generalization under strict label alignment. We introduce OcuMDNet (Ocular Multi-Disease Net), a compact convolutional neural network (CNN) specifically designed for fundus imagery, incorporating batch normalization and dropout for regularization. A standardized processing pipeline is employed, which includes cropping and resizing images to 224×224 pixels, applying contrast-limited adaptive histogram equalization (CLAHE), and performing per-channel normalization. The training process utilizes the AdamW optimizer and incorporates early stopping to enhance model performance. We propose a label-aligned evaluation protocol that (i) assesses 4-class performance (Normal, diabetic retinopathy (DR), Glaucoma, age-related macular degeneration (AMD)) on a combined dataset assembled from public sources; (ii) reports disease-specific results based on the native labels of each dataset (DR: EyePACS, Messidor; Glaucoma: ORIGA; AMD: AREDS); and (iii) evaluates cross-dataset transfer for DR (training on EyePACS and testing on Messidor). Patient-level splits are implemented to prevent data leakage, and class counts are reported for each split. Performance metrics such as accuracy, macro-F1 score, and one-vs-rest ROC-AUC are calculated with 95% confidence intervals using stratified bootstrap (n=1000). Paired comparisons are conducted using McNemar's test for accuracy and DeLong's method for AUC, with multiplicity control applied. The OcuMDNet demonstrates strong performance on both combined and disease-specific benchmarks, maintaining robust discrimination in cross-dataset evaluations for DR while ensuring computational efficiency suitable for large-scale screening applications. Ablation studies confirm the significance of preprocessing steps and architectural choices.
In conjunction with a label-aligned protocol, the OcuMDNet provides an accurate and efficient baseline for multi-disease fundus analysis, facilitating a transparent assessment of cross-dataset reliability. The code, scripts, and trained weights will be made available to support reproducibility.
Musical instrument recognition is a challenging task with applications in music information retrieval, audio processing, and automated transcription. This study presents a Convolutional Neural Network (CNN) model leveraging Mel spectrograms from the IRMAS dataset to classify 11 instrument categories. The model, incorporating convolutional layers, batch normalization, and dropout regularization, achieved a peak validation accuracy of 78.37 % over 60 epochs. Comparative analysis with state-of-the-art methods highlights its competitive performance and computational efficiency. Robustness evaluations on varying input lengths and noise levels assess the model's generalization. Performance metrics, including accuracy trends, loss curves, and a confusion matrix, demonstrate strong classification for instruments like piano and violin while revealing challenges in distinguishing spectrally similar instruments. These findings underscore the effectiveness of CNNs for instrument classification and provide insights for enhancing deep learning-based audio recognition models.
Facial expression recognition (FER) is an essential technology at the intersection of artificial intelligence (AI), computer vision, and psychology. This study proposes a novel framework for FER, aiming to improve system robustness and generalization, especially under variable real-world conditions. Using the FER2013 dataset, this research combines an adaptive preprocessing pipeline with a custom Convolutional Neural Network (CNN) architecture. Key preprocessing steps include normalization, rotation, and flipping to improve data quality and diversity. The CNN architecture combines regularization methods, including dropout and L2 regularization. Dynamic hyperparameter tuning and early stopping optimize performance and prevent overfitting. The normalized confusion matrix indicates strong recognition of well-represented emotions, such as happiness with 86% accuracy, and difficulties with underrepresented categories like disgust. This research aims to contribute to the ongoing development of facial expression recognition systems by enhancing their robustness and generalization. While further refinement is needed, this work provides a step toward more accurate and adaptable FER models, with the potential to support advancements in human-computer interaction and various real-world applications.
This study aims to develop a hybrid Convolutional Neural Network (CNN) model for classifying the roasting levels of Coffea arabica beans by integrating RGB color and GLCM texture features. A total of 1,600 high-resolution images were used, consisting of 1,200 training images and 400 testing images, evenly distributed across four roasting levels: Green, Light, Medium, and Dark. Local feature extraction was performed using a sliding window approach to capture fine-grained color and texture information from each image. Three model types were evaluated: a CNN with RGB-only input, a CNN with GLCM-only input, and a hybrid CNN with dual inputs. The hybrid model consistently demonstrated superior performance, achieving a validation accuracy of 99.74%, with minimal misclassification and stable convergence throughout training. Furthermore, six architectural variations of the hybrid model were tested by applying dropout and L2 regularization techniques. The model combining both dropout and L2 regularization achieved the most balanced results in terms of accuracy, generalization, and training stability. This research contributes an effective feature fusion strategy for fine-grained visual classification tasks, particularly in domains where inter-class visual differences are subtle. The proposed approach offers a cost-effective and scalable solution that is well-suited for real-time implementation in small to medium-sized coffee production facilities, and it shows strong potential for broader applications in agricultural product quality assessment.
The increasing challenges of global food security, driven by limited natural resources, climate change, and the rising demands of a growing population, require innovative solutions for agricultural decision-making. Accurate pre-season crop prediction is crucial for optimizing resource allocation, minimizing risks, and fostering sustainable farming practices. This study presents a hybrid model that integrates CNN and BiLSTM networks to improve crop prediction accuracy by leveraging historical crop rotation data and synthetic field-level representations. Unlike conventional pixel-based methods, which often struggle to capture complex spatial and temporal dynamics, the proposed model effectively extracts spatial features through CNNs and temporal patterns via BiLSTM, converting raw data into detailed spatial-temporal representations. Robustness and generalizability are ensured using stratified k-fold cross-validation and regularization techniques, such as dropout to prevent overfitting, making the model capable of handling large-scale datasets. Experimental results demonstrate that the hybrid model consistently outperforms existing approaches, delivering more precise crop predictions that closely align with practical agricultural needs. By offering accurate and actionable information, this framework enables better decision-making for farmers and stakeholders, supports sustainable agriculture, and paves the way for resilient food systems in the face of future challenges.
Although melanoma accounts for only a small fraction of skin cancers, it causes approximately 75% of skin cancer deaths, making it one of the most lethal forms of the disease. Although early detection is the first step toward curative treatment, automated detection is becoming increasingly critical as demand for dermatologic care continues to grow. In this paper, we present an end-to-end CNN-based method to detect melanoma, using the ISIC (International Skin Imaging Collaboration) dataset. Our approach also addresses the critical issue of class imbalance by using systematic data augmentation through the Augmentor library to create balanced representations of 9 different skin lesion classes. For our proposed CNN architecture, we included multiple convolutional layers and dropout regularization to deal with overfitting, which enabled us to reach a validation accuracy of 84.26%. Overall, the model demonstrated good performance in distinguishing melanoma from other skin lesions and in detecting the most important morphologic features of malignant lesions. Our findings further support a growing body of evidence that deep learning methods offer a valuable approach to medical image analysis, and specifically to dermatologic applications, where early detection can make a meaningful difference for patients.
This paper introduces a lightweight CNN-based speech emotion recognition (SER) approach with a balance of accuracy and computational cost optimization. Using Root Mean Square Energy (RMSE) features from speech signals, our model achieves an impressive accuracy of 94.34% across seven emotion categories (anger, disgust, fear, happiness, neutral, sorrow, and surprise) with just 7.1 million parameters. We use four established emotional speech datasets (RAVDESS, CREMA-D, TESS, and SAVEE) and employ strategic augmentation approaches such as noise addition, pitch shifting, and combination transformations to strengthen the model. We use three progressive convolutional blocks with increasing filter sizes, batch normalization, max pooling, and dropout regularization, ending in dense layers with softmax activation. F1-scores of 0.93–0.97 are obtained, which suggest good confidence in all the mentioned emotions. The efficiency and accuracy of our model demonstrate that carefully designed architectures with concentrated feature selection can outperform more complicated methods, making it appropriate for resource-constrained or real-time SER applications.
Early detection of Alzheimer’s disease relies on the evaluation of cognitive impairment from magnetic resonance imaging. Our study presents a model based on the convolutional neural network (CNN) to classify cognitive impairment from magnetic resonance imaging. The model employs pre-processing techniques such as grayscale conversion, normalization, and image resizing to improve feature extraction. Its architecture consists of multiple convolutional and pooling layers, followed by fully connected layers with dropout regularization to mitigate overfitting. The model is trained and evaluated using categorical cross-entropy loss, with performance measured through accuracy and AUC-ROC/AUC-PR curves. The experimental results demonstrate the effectiveness of the model in classifying cognitive impairment, strengthening its potential for automated diagnosis based on magnetic resonance imaging.
No abstract available
Accurate classification of Land Use and Land Cover (LULC) from satellite imagery is vital for environmental monitoring, sustainable urban development, and resource management. With the increasing availability of multi-spectral data from Earth observation missions such as Sentinel-2, deep learning provides powerful solutions for automating LULC classification. In this study, we present a lightweight Convolutional Neural Network (CNN) architecture tailored for 4-band satellite imagery. Unlike conventional approaches that rely solely on RGB inputs, our model incorporates Red, Green, Blue, and Near-Infrared (NIR) bands to capture a broader range of surface and vegetation characteristics. The architecture combines stacked convolutional blocks with batch normalization, pooling layers, and dropout regularization, ensuring both strong accuracy and efficient computation. Training was further enhanced through data augmentation strategies such as rotation, flipping, and zooming. Using the EuroSAT dataset (27,000 images across 10 classes), the model achieved a test accuracy of 96% and a macro-averaged F1-score of 0.96, with excellent performance in challenging categories such as Residential, SeaLake, and Forest. The compact design of the model makes it highly suitable for deployment in time-sensitive or resource-limited scenarios, including monitoring of city growth, assessing agricultural productivity, and supporting rapid response to environmental hazards.
Innovative agricultural technologies increasingly utilize artificial intelligence (AI) and machine learning to enhance productivity and precision. Among these advancements, Convolutional Neural Networks (CNNs) have demonstrated significant promise in image classification tasks across various domains, including agriculture. However, the classification of Tikog leaves, a culturally significant raw material used in the banig weaving industry in the Philippines, has not been explored using CNNs with feature engineering. This study developed and optimized a feature-engineered CNN model for Tikog leaf classification by integrating Lab color space representation, data augmentation, autoencoder-based feature extraction, mean-max pooling, and dropout regularization. A total sample size of 500 standard-quality and 500 substandard-quality Tikog leaf images was augmented to generate 3,000 training images and 500 validation samples. Among the 27 CNN configurations tested, four models demonstrated superior performance, with Case 12 emerging as the best. This model achieved training and validation accuracies of 94.23% and 96.83%, F1-scores of 94.35% and 96.87%, ROC/AUC scores of 98.18% and 99.40%, and low sum of squared errors (SSE) values (173, 19). Case 12 exhibited excellent generalizability, high classification performance, and computational efficiency, making it the most effective model for deployment in real-world Tikog quality assessment. The study advances both technological innovation and the preservation of indigenous knowledge through intelligent systems.
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that severely impacts cognitive functions such as memory, attention, and reasoning, ultimately affecting daily life. Early and accurate detection is crucial for timely intervention and management. Traditional diagnostic methods, including neuroimaging and cognitive assessments, can be expensive and time-consuming, necessitating more accessible and efficient alternatives. This study aims to develop an automated and efficient deep learning-based detection system that uses Electroencephalogram (EEG) signals to accurately classify AD and healthy individuals. A Convolutional Neural Network (CNN) model was designed to extract meaningful features from preprocessed EEG data. The architecture consists of convolutional layers with max pooling, dropout regularization, and fully connected layers to improve classification accuracy. The model was trained and evaluated on a comprehensive EEG dataset, using key performance metrics such as accuracy, recall, precision, and F1-score. The proposed CNN model achieved a high classification accuracy of 94.56%, a low loss of 0.2162, and an AUC value of 0.93828, demonstrating superior classification capability. The results indicate that the model effectively distinguishes between AD and healthy individuals, outperforming several state-of-the-art approaches. The findings highlight the potential of deep learning-based EEG analysis for AD detection, providing an accessible and cost-effective tool for early diagnosis. The high accuracy of the proposed CNN model suggests that it can assist medical professionals in making well-informed decisions, ultimately improving patient outcomes.
Malaria remains a major public health challenge across tropical and subtropical areas, affecting both global health and economic security. Reducing overall mortality and preventing disease depend on the early detection of malaria-infected blood cells. Current diagnostic approaches are time- and resource-intensive and thus call for advanced automated methods. This research introduces a Convolutional Neural Network (CNN) system that performs both quick and precise identification of parasitized and uninfected blood cells. The model obtained 89% overall accuracy when processing 43,432 microscopic cell images from the Kaggle platform. Multiple convolutional and pooling layers in the CNN architecture utilize dropout regularization jointly with data augmentation techniques to boost generalization. The strong predictive abilities of the model are verified through precision, recall, F1-score, and confusion matrix evaluation, which demonstrate strong infection detection performance. This research helps fulfil the Sustainable Development Goals of Good Health and Well-being, Industry Innovation, Reduced Inequalities, Quality Education, Sustainable Communities, and Climate Action through its creation of scalable, accessible, intelligent healthcare diagnostics.
This paper introduces PneuNet, a specialized lightweight CNN model aimed at the binary classification of pneumonia in chest X-rays of children. Existing deep learning models impose a heavy computational burden and typically require pre-trained networks and large-scale frameworks, which are not efficient. PneuNet's design instead adopts a seven-layered structure that achieves a better balance between functionality and efficiency. The model has been trained and tested on a clinically verified dataset containing 5,856 publicly available images. A systematic algorithmic approach to preprocessing provides high-quality data augmentation and standardization. PneuNet reaches 96% accuracy on the test set, with good precision and recall, particularly in cases of pneumonia. PneuNet's structure offers progressive filter scaling, global average pooling, and dropout regularization, enabling its use in resource-constrained environments. The results suggest that PneuNet significantly improves diagnostic support for health systems that lack radiologists or high-end equipment, making it a practical solution.
Objective. Radiation therapy for head and neck (H&N) cancer relies on accurate segmentation of the primary tumor. A robust, accurate, and automated gross tumor volume segmentation method is warranted for H&N cancer therapeutic management. The purpose of this study is to develop a novel deep learning segmentation model for H&N cancer based on independent and combined CT and FDG-PET modalities. Approach. In this study, we developed a robust deep learning-based model leveraging information from both CT and PET. We implemented a 3D U-Net architecture with 5 levels of encoding and decoding, computing model loss through deep supervision. We used a channel dropout technique to emulate different combinations of input modalities. This technique prevents potential performance issues when only one modality is available, increasing model robustness. We implemented ensemble modeling by combining two types of convolutions with differing receptive fields, conventional and dilated, to improve capture of both fine details and global information. Main Results. Our proposed methods yielded promising results, with a Dice similarity coefficient (DSC) of 0.802 when deployed on combined CT and PET, DSC of 0.610 when deployed on CT, and DSC of 0.750 when deployed on PET. Significance. Application of a channel dropout method allowed for a single model to achieve high performance when deployed on either single modality images (CT or PET) or combined modality images (CT and PET). The presented segmentation techniques are clinically relevant to applications where images from a certain modality might not always be available.
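The channel dropout idea described above, randomly silencing an entire input modality during training so the network learns to cope when only CT or only PET is available, can be sketched in a few lines of numpy. This is an illustrative sketch under assumed shapes and parameter names (`channel_dropout`, `p_drop`), not the authors' 3D U-Net implementation.

```python
import numpy as np

def channel_dropout(x, p_drop, rng, training=True):
    """Randomly zero whole input channels (e.g. the CT or PET volume)
    during training, forcing the network to tolerate missing modalities.

    x: array of shape (channels, D, H, W) -- one multimodal volume.
    p_drop: probability of dropping each channel independently.
    """
    if not training or p_drop == 0.0:
        return x
    keep = rng.random(x.shape[0]) >= p_drop
    if not keep.any():                      # always keep at least one modality
        keep[rng.integers(x.shape[0])] = True
    return x * keep[:, None, None, None]

rng = np.random.default_rng(0)
vol = np.ones((2, 4, 4, 4))                 # 2 modalities: CT and PET
out = channel_dropout(vol, p_drop=0.5, rng=rng)
```

At inference time the function is a no-op, so the same trained weights serve single-modality and combined-modality inputs.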
Deep learning techniques face the problem of overfitting due to their complex layer structure. Regularization methods are used to overcome this problem and improve the designed models. In this article, we use combinations of L1 regularization, L2 regularization, Elastic Net regularization, and Dropout. The deep network model using a combination of these methods is designed with different rates. Finally, the performance of all combination methods is compared with a Convolutional Neural Network model that does not use regularization methods. Experiments are performed using the gold price per ounce data set and a linear simulation model. The obtained results show that the combination of Dropout and Elastic Net regularization performs better than the other models.
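The Elastic Net term that the article combines with dropout is a weighted sum of L1 and L2 penalties added to the training loss. A minimal sketch; the function name and coefficient values are illustrative, not taken from the paper:

```python
import numpy as np

def elastic_net_penalty(weights, l1=1e-4, l2=1e-4):
    """Elastic Net regularization term: l1 * sum|w| + l2 * sum(w^2).
    Setting l2=0 recovers plain L1 (lasso); l1=0 recovers L2 (ridge)."""
    return l1 * np.abs(weights).sum() + l2 * np.square(weights).sum()

w = np.array([1.0, -2.0, 0.5])
penalty = elastic_net_penalty(w, l1=0.1, l2=0.01)  # 0.1*3.5 + 0.01*5.25
```

In training, this scalar would simply be added to the data loss before backpropagation, while dropout acts separately on the activations.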
One of the main causes of diabetes, heart disease, and obesity in today's world is consumption behavior. To help people prevent such diseases, consumption advice is required to support behavior change. This paper presents a comparative study of methods for Thai food image recognition using a Convolutional Neural Network (CNN) with the dropout technique to recognize the type of Thai food. The CNN is used for distinguishing the complexity among Thai food images, while the dropout technique is used for noise reduction. The sample used in the experiment comprised 16,170 images covering 52 food types. The proposed method is compared with a CNN without dropout and with Nu-In Net 1.1. The comparison results showed that the CNN with dropout performed best, with 96.67% accuracy.
Precise classification and detection of apple diseases are essential for efficient crop management and maximizing yield. This paper presents a fine-tuned EfficientNet-B0 convolutional neural network (CNN) for the automated classification of apple leaf diseases. The model builds upon a pre-trained EfficientNet-B0 base, enhanced through architectural modifications such as the integration of a global max pooling (GMP) layer, dropout, regularization, and full-model fine-tuning. To address class imbalance and improve generalization, the study adopts a holistic training strategy that integrates data augmentation, stratified data splitting, and class weighting, alongside transfer learning. The model is evaluated on the PlantVillage (PV) dataset and a curated Apple PV (APV) dataset and compared against EfficientNet-B0, EfficientNet-B3, Inception-v3, ResNet50, and VGG16 models. The fine-tuned model demonstrates outstanding test accuracies of 99.69% and 99.78% for classifying plant diseases using the APV and PV datasets, respectively. The fine-tuned model outperforms EfficientNet-B0, EfficientNet-B3, and VGG16 on both datasets and shows superior performance compared to Inception-v3 and ResNet-50 on the PV dataset. Both EfficientNet-B0 and the fine-tuned model demonstrate the lowest memory consumption and floating-point operations per second (FLOPs). Also, as compared to the EfficientNet-B0 model, the fine-tuned model achieves an 11% increase in accuracy on the APV dataset and a 49.5% accuracy improvement on the PV dataset, with approximately a 7-8% increase in both memory usage and FLOPs. The fine-tuned model thus emerges as an effective solution for plant leaf disease classification, delivering outstanding accuracy with optimized memory consumption and FLOPs, making it suitable for resource-constrained environments. 
This study demonstrates that fine-tuned CNN approaches, when combined with transfer learning, advanced data pre-processing, and architectural optimizations, can significantly enhance the accuracy of diseased leaf classification in crops with efficient implementation in limited-resource settings.
Deep neural networks are the most used machine learning systems in the literature, for they are able to train on huge amounts of data with a large number of parameters in a very effective way. However, one of the problems that such networks face is overfitting. There are many ways to address the overfitting issue, one of which is regularization using the dropout function. The use of dropout has the benefit of combining different networks in one architecture and preventing units from co-adapting excessively. The dropout function is known to work well in fully connected layers as well as in pooling layers. In this work, we propose a novel method called Mixed-Pooling-Dropout that adapts the dropout function to a mixed-pooling strategy. The dropout operation is represented by a binary mask with each element drawn independently from a Bernoulli distribution. Experimental results show that our proposed method outperforms conventional pooling methods as well as the max-pooling-dropout method by a clear margin (0.926 vs 0.868) regardless of the retaining probability.
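The Mixed-Pooling-Dropout idea, a Bernoulli dropout mask on the activations followed by a per-window random switch between max- and average-pooling, can be sketched as follows. This is a hedged numpy illustration with hypothetical parameter names (`p_keep`, `alpha`), not the paper's implementation:

```python
import numpy as np

def mixed_pooling_dropout(x, pool=2, p_keep=0.8, alpha=0.5, rng=None):
    """Per-window mix of max- and average-pooling, with a Bernoulli
    dropout mask applied to the activations before pooling.

    x: (H, W) activation map, H and W divisible by `pool`.
    alpha: Bernoulli probability of picking max-pooling for a window.
    """
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) < p_keep).astype(x.dtype)
    x = x * mask / p_keep                       # inverted dropout
    H, W = x.shape
    xw = x.reshape(H // pool, pool, W // pool, pool)
    mx = xw.max(axis=(1, 3))
    av = xw.mean(axis=(1, 3))
    choose_max = rng.random(mx.shape) < alpha   # per-window Bernoulli switch
    return np.where(choose_max, mx, av)

rng = np.random.default_rng(1)
fmap = np.arange(16.0).reshape(4, 4)
# With p_keep=1.0 and alpha=1.0 this degenerates to plain max-pooling.
pooled = mixed_pooling_dropout(fmap, pool=2, p_keep=1.0, alpha=1.0, rng=rng)
```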
Multiple sclerosis is a condition affecting the brain and/or spinal cord. Based on deep learning, this study aims to develop an improved convolutional neural network system. We collected 676 multiple sclerosis brain slices and 681 healthy control brain slices. Data augmentation was used to increase the size of the training set. Our improved convolutional neural network combined the parametric rectified linear unit (PReLU) and dropout techniques. Finally, a 10-layer deep convolutional neural network was established, with 7 convolutional layers and 3 fully connected layers. The retention probabilities of the three dropout layers were set to 0.4, 0.5, and 0.5, respectively. Our method achieved a sensitivity of 98.22%, a specificity of 98.24%, and an accuracy of 98.23%. Dropout helped increase the accuracy by 0.88% compared to not using dropout. PReLU helped increase the accuracy by 1.92% compared to using ordinary ReLU, and by 1.48% compared to using leaky ReLU. This proposed method is superior to four state-of-the-art approaches.
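The PReLU activation that the network pairs with dropout differs from ordinary ReLU only in its negative slope. A minimal sketch; here the slope `a` is a fixed constant for illustration, whereas in PReLU it is a trainable parameter:

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for x > 0, slope `a` for x <= 0.
    Ordinary ReLU is the special case a = 0; leaky ReLU fixes a small
    constant a; PReLU learns `a` during training."""
    return np.where(x > 0, x, a * x)

out = prelu(np.array([-4.0, -1.0, 0.0, 2.0]), a=0.25)
```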
More than 35 million patients are suffering from Alzheimer’s disease, and this number is growing, which puts a heavy burden on countries around the world. Early detection is beneficial, and deep learning can aid AD identification effectively and achieve good results. A novel eight-layer convolutional neural network with batch normalization and dropout techniques for classification of Alzheimer’s disease was proposed. After data augmentation, the training dataset contained 7,399 AD and 7,399 healthy control (HC) samples. Our eight-layer CNN-BN-DO-DA method yielded a sensitivity of 97.77%, a specificity of 97.76%, a precision of 97.79%, an accuracy of 97.76%, an F1-score of 97.76%, and an MCC of 95.56% on the test set, achieving the best performance among seven state-of-the-art approaches. The results strongly demonstrate that this method can effectively assist the clinical diagnosis of Alzheimer’s disease.
No abstract available
Medical image segmentation is a key technology for image guidance. Therefore, the advantages and disadvantages of image segmentation play an important role in image-guided surgery. Traditional machine learning methods have achieved certain beneficial effects in medical image segmentation, but they have problems such as low classification accuracy and poor robustness. Deep learning theory has good generalizability and feature extraction ability, which provides a new idea for solving medical image segmentation problems. However, deep learning poses two problems for medical image segmentation: first, the deep learning network structure cannot be constructed according to medical image characteristics; second, the generalizability of the deep learning model is weak. To address these issues, this paper first adapts a neural network to medical image features by adding cross-layer connections to a traditional convolutional neural network. In addition, an optimized convolutional neural network model is established. The optimized convolutional neural network model can segment medical images using the features of two scales simultaneously. At the same time, to solve the generalizability problem of the deep learning model, an adaptive distribution function is designed according to the position of the hidden layer, and then the activation probability of each layer of neurons is set. This enhances the generalizability of the dropout model, and an adaptive dropout model is proposed. This model better addresses the problem of the weak generalizability of deep learning models. Based on the above ideas, this paper proposes a medical image segmentation algorithm based on an optimized convolutional neural network with adaptive dropout depth calculation. An ultrasonic tomographic image and a lumbar CT medical image were separately segmented using the proposed method.
The experimental results show that not only are the segmentation effects of the proposed method improved compared with those of the traditional machine learning and other deep learning methods but also the method has a high adaptive segmentation ability for various medical images. The research work in this paper provides a new perspective for research on medical image segmentation.
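The adaptive dropout idea above, a retention probability that depends on the position of the hidden layer, might be sketched as below. The paper's actual adaptive distribution function is not reproduced here; the linear interpolation and its endpoint values are purely illustrative assumptions:

```python
import numpy as np

def adaptive_retention(layer_idx, n_layers, p_shallow=0.9, p_deep=0.5):
    """Hypothetical layer-position-dependent retention probability:
    linear interpolation from p_shallow (first hidden layer) to
    p_deep (last hidden layer)."""
    t = layer_idx / max(n_layers - 1, 1)
    return p_shallow + t * (p_deep - p_shallow)

def adaptive_dropout(x, layer_idx, n_layers, rng, training=True):
    """Inverted dropout whose keep probability varies with layer depth."""
    p = adaptive_retention(layer_idx, n_layers)
    if not training:
        return x
    mask = rng.random(x.shape) < p
    return x * mask / p

probs = [adaptive_retention(i, 5) for i in range(5)]  # 0.9 down to 0.5
```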
Aim: Multiple sclerosis is a severe brain and/or spinal cord disease. It may lead to a wide range of symptoms. Hence, the early diagnosis and treatment is quite important. Method: This study proposed a 14-layer convolutional neural network, combined with three advanced techniques: batch normalization, dropout, and stochastic pooling. The output of the stochastic pooling was obtained via sampling from a multinomial distribution formed from the activations of each pooling region. In addition, we used data augmentation method to enhance the training set. In total 10 runs were implemented with the hold-out randomly set for each run. Results: The results showed that our 14-layer CNN secured a sensitivity of 98.77 ± 0.35%, a specificity of 98.76 ± 0.58%, and an accuracy of 98.77 ± 0.39%. Conclusion: Our results were compared with CNN using maximum pooling and average pooling. The comparison shows stochastic pooling gives better performance than other two pooling methods. Furthermore, we compared our proposed method with six state-of-the-art approaches, including five traditional artificial intelligence methods and one deep learning method. The comparison shows our method is superior to all other six state-of-the-art approaches.
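Stochastic pooling as described above samples the pooled value from a multinomial distribution formed from the activations of each pooling region. An illustrative numpy version for a single region, not the authors' code:

```python
import numpy as np

def stochastic_pool(region, rng):
    """Stochastic pooling over one region: sample an activation with
    probability proportional to its (non-negative) magnitude; fall back
    to 0 when the region is all zero."""
    flat = region.ravel()
    total = flat.sum()
    if total <= 0:
        return 0.0
    probs = flat / total
    idx = rng.choice(flat.size, p=probs)
    return flat[idx]

rng = np.random.default_rng(0)
region = np.array([[0.0, 0.0], [0.0, 3.0]])
val = stochastic_pool(region, rng)   # only one non-zero activation can win
```

Unlike max pooling, smaller activations still have a chance of being selected, which acts as a regularizer during training.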
Modified Convolutional Neural Network Based on Dropout and the Stochastic Gradient Descent Optimizer
This study proposes a modified convolutional neural network (CNN) algorithm that is based on dropout and the stochastic gradient descent (SGD) optimizer (MCNN-DS), after analyzing the problems of CNNs in extracting convolution features, to improve the feature recognition rate and reduce the time cost of CNNs. The MCNN-DS has a quadratic CNN structure and adopts the rectified linear unit as the activation function to avoid the gradient problem and accelerate convergence. To address the overfitting problem, the algorithm combines an SGD optimizer with dropout layers inserted into the fully connected and output layers, minimizing the cross-entropy loss. This study used the datasets MNIST, HCL2000, and EnglishHand as the benchmark data, analyzed the performance of the SGD optimizer under different learning parameters, and found that the proposed algorithm exhibited good recognition performance when the learning rate was set to [0.05, 0.07]. The performances of WCNN, MLP-CNN, SVM-ELM, and MCNN-DS were compared. Statistical results showed the following: (1) For the benchmark MNIST, the MCNN-DS exhibited a high recognition rate of 99.97%, and the time cost of the proposed algorithm was merely 21.95% of MLP-CNN and 10.02% of SVM-ELM; (2) Compared with SVM-ELM, the average improvement in the recognition rate of MCNN-DS was 2.35% for the benchmark HCL2000, and the time cost of MCNN-DS was only 15.41%; (3) For the EnglishHand test set, the lowest recognition rate of the algorithm was 84.93%, the highest recognition rate was 95.29%, and the average recognition rate was 89.77%.
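The core combination described, a dropout layer on the fully connected activations feeding a softmax output trained to minimize cross-entropy, might look like this in outline. The toy shapes and zero-initialized weights are assumptions for illustration only, not the MCNN-DS configuration:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fc_dropout_forward(x, W, b, rate, rng, training=True):
    """Fully connected layer followed by inverted dropout."""
    h = x @ W + b
    if training and rate > 0.0:
        mask = rng.random(h.shape) >= rate
        h = h * mask / (1.0 - rate)         # rescale so E[h] is unchanged
    return h

def cross_entropy(probs, label):
    return -np.log(probs[label] + 1e-12)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
W = np.zeros((2, 3)); b = np.zeros(3)       # toy 2-in, 3-class layer
h = fc_dropout_forward(x, W, b, rate=0.5, rng=rng)
loss = cross_entropy(softmax(h), label=0)   # uniform probs -> ln(3)
```

An SGD step would then update `W` and `b` along the negative gradient of this loss, scaled by the learning rate.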
No abstract available
No abstract available
Melanoma skin cancer is a deadly disease with a high mortality rate. A prompt diagnosis can aid in the treatment of the disease and potentially save the patient’s life. Artificial intelligence methods can help diagnose cancer at a rapid speed. The literature has employed numerous Machine Learning (ML) and Deep Learning (DL) algorithms to detect skin cancer. ML algorithms perform well for small datasets but struggle with larger ones. Conversely, DL algorithms exhibit strong performance on large datasets but misclassify when applied to smaller ones. We conduct extensive experiments using a convolutional neural network (CNN), varying its parameter values to determine which set of values yields the best performance. We discovered that adding layers, giving each Conv2D layer multiple filters, and removing dropout layers greatly improves the accuracy of the classifiers, from 62.5% to 85%. We also discuss the parameters that have the potential to significantly impact the model’s performance. This shows how powerful it is to fine-tune the parameters of a CNN-based model. These findings can assist researchers in fine-tuning their CNN-based models for use with skin cancer image datasets.
Motor imagery (MI) decoding methods are pivotal in advancing rehabilitation and motor control research. Effective extraction of spectral-spatial-temporal features is crucial for MI decoding from limited and low signal-to-noise ratio electroencephalogram (EEG) signal samples based on brain-computer interface (BCI). In this paper, we propose a lightweight Multi-Feature Attention Neural Network (M-FANet) for feature extraction and selection of multi-feature data. M-FANet employs several unique attention modules to eliminate redundant information in the frequency domain, enhance local spatial feature extraction and calibrate feature maps. We introduce a training method called Regularized Dropout (R-Drop) to address training-inference inconsistency caused by dropout and improve the model’s generalization capability. We conduct extensive experiments on the BCI Competition IV 2a (BCIC-IV-2a) dataset and the 2019 World robot conference contest-BCI Robot Contest MI (WBCIC-MI) dataset. M-FANet achieves superior performance compared to state-of-the-art MI decoding methods, with 79.28% 4-class classification accuracy (kappa: 0.7259) on the BCIC-IV-2a dataset and 77.86% 3-class classification accuracy (kappa: 0.6650) on the WBCIC-MI dataset. The application of multi-feature attention modules and R-Drop in our lightweight model significantly enhances its performance, validated through comprehensive ablation experiments and visualizations.
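The R-Drop training method mentioned above regularizes dropout by running the model twice on the same input, so that two independent dropout masks are drawn, and penalizing the symmetric KL divergence between the two predicted distributions. A minimal numpy sketch with a hypothetical stand-in model; this illustrates the consistency term only, not M-FANet itself:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def r_drop_term(logits_fn, x, rng, alpha=1.0):
    """R-Drop consistency term: two forward passes through the same
    dropout-bearing model, penalizing disagreement between outputs."""
    p1 = softmax(logits_fn(x, rng))
    p2 = softmax(logits_fn(x, rng))   # independent dropout mask
    return alpha * 0.5 * (kl(p1, p2) + kl(p2, p1))

def toy_model(x, rng, rate=0.5):
    """Stand-in for a network containing a dropout layer."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
term = r_drop_term(toy_model, np.array([0.5, 1.0, 1.5]), rng)
```

The term is added to the usual cross-entropy loss, pushing the two dropout sub-networks toward consistent predictions and narrowing the training-inference gap.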
In recent years, deep learning (DL) methods have achieved much success in the area of intelligent fault diagnosis. However, because working conditions vary and noise is inevitable, previous models degrade severely. To address the challenge of bearing fault detection in strong-noise environments, this article proposes a novel antinoise deep residual multiscale convolutional neural network with an attention mechanism, named Attention-MSCNN. First, dynamic dropout is used to improve noise robustness by introducing artificial noise into the training process. In addition, we design a residual connection between the input and the convolved features to fully capture the characteristics of the initial input. Finally, a novel denoised multihead attention mechanism is applied to remove excess noise from the raw input and capture relationships across long time series. The experimental results show that Attention-MSCNN achieves robust performance under strong noise, with over 85% accuracy on the Case Western Reserve University (CWRU) dataset. On a self-collected two-stage gear drive test bench, our model achieves an accuracy of over 99% in a strong-noise environment. Thus, Attention-MSCNN successfully addresses the low detection accuracy of previous models under strong noise.
The popularity and demand for high-quality date palm fruits (Phoenix dactylifera L.) have been growing, and their quality largely depends on the type of handling, storage, and processing methods. The current methods of geometric evaluation and classification of date palm fruits are characterised by high labour intensity and are usually performed mechanically, which may cause additional damage and reduce the quality and value of the product. Therefore, non-contact methods are being sought based on image analysis, with digital solutions controlling the evaluation and classification processes. The main objective of this paper is to develop an automatic classification model for varieties of date palm fruits using a convolutional neural network (CNN) based on two fundamental criteria, i.e., colour difference and evaluation of geometric parameters of dates. A CNN with a fixed architecture was built, marked as DateNET, consisting of a system of five alternating Conv2D, MaxPooling2D, and Dropout classes. The validation accuracy of the model presented in this study depended on the selection of classification criteria. It was 85.24% for fruit colour-based classification and 87.62% for the geometric parameters only; however, it increased considerably to 93.41% when both the colour and geometry of dates were considered.
In the contemporary era of deep learning, where models often grapple with the challenge of simultaneously achieving robustness against adversarial attacks and strong generalization, this study introduces an innovative Local Feature Masking (LFM) strategy aimed at fortifying the performance of Convolutional Neural Networks (CNNs) on both fronts. During the training phase, we strategically incorporate random feature masking in the shallow layers of CNNs, effectively alleviating overfitting, thereby enhancing the model's generalization ability and bolstering its resilience to adversarial attacks. LFM compels the network to adapt by leveraging the remaining features to compensate for the absence of certain semantic features, nurturing a more elastic feature learning mechanism. The efficacy of LFM is substantiated through a series of quantitative and qualitative assessments, which collectively show a consistent and significant improvement in CNN generalization and resistance to adversarial attacks not observed in current and prior methodologies. The seamless integration of LFM into established CNN frameworks underscores its potential to advance both generalization and adversarial robustness within the deep learning paradigm. Through comprehensive experiments, including robust person re-identification baseline generalization experiments and adversarial attack experiments, we demonstrate the substantial enhancements offered by LFM in addressing these challenges. This contribution represents a noteworthy stride toward robust neural network architectures.
Authentication plays a pivotal role in contemporary security frameworks, with methods including passwords, hardware tokens, and biometrics. Biometric authentication and face recognition hold significant application potential, albeit susceptible to forgery, termed face spoofing attacks. These attacks, encompassing 2D and 3D modalities, pose challenges through fake photos, warped images, video displays, and 3D masks. Existing countermeasures are attack-specific and use complex architectures, adding computational cost. Deep transfer learning models such as AlexNet, ResNet, VGG, and Inception V3 can be used, but they are computationally expensive. This article proposes LwFLNeT, a lightweight deep CNN method that leverages parallel dropout layers to prevent overfitting and achieves excellent performance on 2D and 3D face spoofing datasets. The proposed method is validated through cross-dataset train-test evaluation. The methodology has the following key contributions: (1) a lightweight dual-stream CNN architecture with a parallel dropout layer to minimize overfitting; (2) a generalized and robust deep CNN architecture that detects both 2D and 3D attacks more efficiently than existing methods; (3) validation against state-of-the-art methods using standard performance metrics for face spoofing attack detection.
No abstract available
Acoustic scene classification is an intricate problem for a machine. In this emerging field of research, deep Convolutional Neural Networks (CNNs) achieve convincing results. In this paper, we explore the use of a multi-scale densely connected convolutional neural network (DenseNet) for the classification task, with the goal of improving classification performance, since multi-scale features can be extracted from the time-frequency representation of the audio signal. Most previous CNN-based audio scene classification approaches aim to improve accuracy by employing regularization techniques, such as dropout of hidden units and data augmentation, to reduce overfitting. It is widely known that outliers in the training set have a highly negative influence on the trained model, and culling them may improve classification performance, yet this is often under-explored. In this paper, inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed that removes outliers from the training dataset. Using the DCASE 2017 audio scene classification dataset, the experimental results demonstrate that the proposed multi-scale DenseNet provides superior performance over the traditional single-scale DenseNet, while the sample dropout method further improves the classification robustness of the multi-scale DenseNet.
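The sample dropout idea, culling the highest-loss training samples as presumed outliers, can be sketched as follows (a minimal numpy illustration of the general idea; the paper's exact outlier criterion may differ):

```python
import numpy as np

def sample_dropout(losses, drop_fraction=0.05):
    """Return indices of training samples to KEEP after removing the
    highest-loss fraction, treating those samples as likely outliers."""
    losses = np.asarray(losses)
    n_drop = int(len(losses) * drop_fraction)
    if n_drop == 0:
        return np.arange(len(losses))
    # argsort ascending: the last n_drop entries are the worst offenders
    return np.sort(np.argsort(losses)[:-n_drop])

keep = sample_dropout([0.2, 3.5, 0.1, 0.4], drop_fraction=0.25)
print(keep)   # [0 2 3] -- sample 1 (loss 3.5) is culled
```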
Development of Convolutional Neural Network Models to Improve Facial Expression Recognition Accuracy
Advancements in information and computer technology, particularly in machine learning, have significantly alleviated human tasks. One of the current primary focuses is facial expression recognition using deep learning methods such as Convolutional Neural Network (CNN). Complex models like CNNs often encounter issues such as gradient vanishing and overfitting. This study aims to enhance the accuracy of CNN models in facial expression recognition by incorporating additional convolutional layers, dropout layers, and optimizing hyperparameters using Grid Search. The research utilizes the FER2013 public dataset sourced from the Kaggle website, trained and evaluated using CNN models, hyperparameter tuning, and downsampling methods. FER2013 comprises thousands of facial images representing various human expressions, with a specific focus on four facial expression categories (angry, happy, neutral, and sad). Through the addition of convolutional and dropout layers, as well as hyperparameter optimization, the developed model demonstrates a significant improvement in accuracy. Findings reveal that the refined CNN model achieves a highest accuracy of 98.89%, with testing accuracy at 89%, precision 78%, recall 78%, and F1-score 78%. This research contributes by enhancing facial expression recognition accuracy through optimized CNN models and providing a framework beneficial for the social-emotional development of children with special needs and aiding in the detection of mental health conditions. Additionally, it identifies avenues for future research, including exploring advanced data augmentation techniques and integrating multimodal information. Furthermore, this study paves the way for applications across diverse fields like human-computer interaction and mental health diagnostics.
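Grid search over hyperparameters such as the dropout rate reduces to exhaustively scoring every combination and keeping the best; a stdlib-only sketch (the grid values and scoring function below are hypothetical stand-ins for real training runs):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every hyperparameter combination exhaustively and return
    the best one -- the same idea as the Grid Search used in the study,
    with score_fn standing in for training and validating a CNN."""
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"dropout": [0.2, 0.4, 0.5], "conv_blocks": [3, 4]}
# Hypothetical validation accuracies standing in for real training runs:
fake_acc = lambda p: 0.80 + 0.1 * p["dropout"] + 0.01 * p["conv_blocks"]
best, score = grid_search(grid, fake_acc)
print(best, score)
```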
Biometric security is the cornerstone of modern identity verification and authentication systems, where the integrity and reliability of biometric samples is of paramount importance. This paper introduces AttackNet, a bespoke Convolutional Neural Network architecture, meticulously designed to combat spoofing threats in biometric systems. Rooted in deep learning methodologies, this model offers a layered defense mechanism, seamlessly transitioning from low-level feature extraction to high-level pattern discernment. Three distinctive architectural phases form the crux of the model, each underpinned by judiciously chosen activation functions, normalization techniques, and dropout layers to ensure robustness and resilience against adversarial attacks. Benchmarking our model across diverse datasets affirms its prowess, showcasing superior performance metrics in comparison to contemporary models. Furthermore, a detailed comparative analysis accentuates the model's efficacy, drawing parallels with prevailing state-of-the-art methodologies. Through iterative refinement and an informed architectural strategy, AttackNet underscores the potential of deep learning in safeguarding the future of biometric security.
People who have hearing loss can communicate through sign language, which is subsequently converted into text, specifically alphabet letters. This paper presents a specifically designed Convolutional Neural Network (CNN) composed of eleven layers, each carefully calibrated with exact parameters. These layers encompass convolution, activation, max-pooling, and flattening processes, ultimately leading to dense layers that incorporate dropout regularization; the last dense layer employs softmax activation. In the initial phase, the MNIST dataset undergoes preprocessing, after which essential characteristics of the preprocessed hand gesture images are computed. The ASL task employs 24 classes, representing the letters A to Y (without J and Z); no cases exist for labels 9 (J) or 25 (Z) because those letters involve gesture motion. The dataset contains 34,627 training examples, divided into 80% for training and 20% for testing, all 28 × 28 pixel grayscale images with values from 0 to 255. The results demonstrate that the suggested approach yields favorable results, achieving a classification accuracy of 98.75%.
The classification of traffic signs holds significant importance in the realm of autonomous vehicles. The primary objective of our research is to perform this classification effectively to mitigate accidents and enhance the overall reliability of autonomous vehicles. This research proposes TSC18, a CNN model using only 18 layers for traffic sign classification, evaluated against pre-trained Convolutional Neural Network (CNN) models. The TSC18 model contains four convolutional layers, three max-pooling layers, one flattening layer, four fully connected layers, and batch normalization and dropout layers. It is trained and validated using the Road Cracks dataset, which contains a traffic image collection of 43 classes; the training set includes 34,799 labeled images and the validation set contains 4,410 labeled images. The performance of the proposed model is compared with well-known pre-trained CNN models, including EfficientNet, InceptionNet, and VGG-19, tested on the same dataset, with training and validation accuracy and losses presented in graphical form. The TSC18 model exhibited a test accuracy of 99.21%. Its performance is also compared with existing models using the GTSRB dataset, and the proposed model performs well relative to the pre-trained CNN models. The TSC18 CNN model can additionally be used for road crack detection using road images.
Image processing is used to classify lung images with malignant or normal nodules, and the Convolutional Neural Network (CNN) method is often used for such image classification. This study uses a modified CNN architecture with varying numbers of layers and filters, batch sizes, dropout rates, and epoch values. The variations were made to determine the best accuracy and to reduce the overfitting of the proposed CNN architecture. The method is implemented with the Keras library in Python, on CT-scan images of cancerous and normal lungs. Across several experiments, the proposed model achieves its best accuracy of 95% using three convolutional layers with 128 filters in the first layer, 256 in the second, and 512 in the third, a batch size of 32, and a dropout rate of 0.5.
The classification of brain tumors from medical imaging is pivotal for accurate medical diagnosis but remains challenging due to the intricate morphologies of tumors and the precision required. Existing methodologies, including manual MRI evaluations and computer-assisted systems, primarily utilize conventional machine learning and pre-trained deep learning models. These systems often suffer from overfitting due to modest medical imaging datasets and exhibit limited generalizability on unseen data, alongside substantial computational demands that hinder real-time application. To enhance diagnostic accuracy and reliability, this research introduces an advanced model utilizing the Xception architecture, enriched with additional batch normalization and dropout layers to mitigate overfitting. This model is further refined by leveraging large-scale data through transfer learning and employing a customized dense layer setup tailored to effectively distinguish between meningioma, glioma, and pituitary tumor categories. This hybrid method not only capitalizes on the strengths of pre-trained network features but also adapts specific training to a targeted dataset, thereby improving the generalization capacity of the model across different imaging conditions. Demonstrating a marked improvement in diagnostic performance, the proposed model achieves a classification accuracy of 98.039% on the test dataset, with precision and recall rates above 96% for all categories. These results underscore the model's potential as a reliable diagnostic tool in clinical settings, significantly surpassing existing diagnostic protocols for brain tumors.
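Batch normalization, one of the two layers the study adds to the Xception backbone, standardizes each feature over the batch in training mode (a minimal numpy sketch; the learnable per-feature gamma/beta vectors and running-average bookkeeping of a real layer are simplified away):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch dimension, then scale and
    shift. Training-mode statistics only; inference would instead use
    running averages accumulated during training."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

features = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 16))
out = batch_norm(features)
print(out.mean(), out.std())   # close to 0 and 1
```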
No abstract available
In this research, we employ a deep convolutional neural network (Deep CNN) to propose a novel approach to detecting illnesses in plant leaves. An already available dataset, containing photographs of the leaves of 39 distinct plant species, is used to train the Deep CNN model. Six data augmentation methods were utilized: image inversion, gamma correction, noise injection, PCA-based color enhancement, rotation, and scaling. We conclude that adding more data can improve a model's accuracy. The proposed model was trained over many combinations of epochs, batch sizes, and dropout percentages. On validation data, the suggested model outperforms commonly used transfer learning methods. Extensive simulations demonstrate that the proposed model achieves 83.12% accuracy in data classification, more accurate than many machine learning approaches currently in use. In addition, we put the suggested model through consistency and reliability testing.
This paper proposes a Convolutional Neural Network–Block Development Mechanism (CNN-BDM) enabling the development of a lightweight deep learning architecture for the detection of damaged pallet-racking, within the manufacturing/warehousing environment. The developed CNN architecture consisted of only 6.5 Million learnable parameters, making it the first custom designed CNN architecture for the pallet racking domain. Architectural training was based on a real dataset collected from various warehouses after implementation of several data modelling strategies for scaling and increasing the variance within the dataset, in a representative manner. Additionally, after achieving a baseline accuracy of greater than 90%, various regularization strategies were applied for further enhancing the performance and generalizability of the network. Dropout at a drop rate of 50% provided the highest performance during training, achieving 99% precision, recall and F1 score. The effectiveness of the proposed methodology was manifested by the fact that the architecture was able to maintain high performance on the test data achieving an overall F1 score of 96%.
Pet dogs are our good friends, and recognizing a dog's emotions from its facial expressions benefits the harmonious coexistence of humans and pet dogs. This paper describes a study on dog facial expression recognition using a convolutional neural network (CNN), a representative deep learning model. Parameter settings have a profound impact on the performance of a CNN model; improper settings expose shortcomings such as slow learning and a tendency to fall into local optima. To address these shortcomings and improve recognition accuracy, a novel CNN model based on the improved whale optimization algorithm (IWOA), called IWOA–CNN, is applied to this recognition task. Unlike human face recognition, a dedicated face detector from the Dlib toolkit is utilized to locate the facial region, and the captured facial images are augmented to build an expression dataset. A random dropout layer and L2 regularization are introduced into the network to reduce the number of transmitted parameters and avoid overfitting. The IWOA optimizes the keep probability of the dropout layer, the parameter λ of L2 regularization, and the dynamic learning rate of the gradient descent optimizer. A comparative experiment among IWOA–CNN, Support Vector Machine, LeNet-5, and other classifiers demonstrates that IWOA–CNN achieves better facial expression recognition performance and illustrates the efficiency of swarm intelligence algorithms in model parameter optimization.
BACKGROUND Parkinson's disease (PD) is the second most prevalent neurological disease, with a significant growth in incidence. Convolutional neural networks using structural magnetic resonance images (sMRI) are widely used for PD classification. However, the areas of change in a patient's MRI are small and not fixed, so accurately capturing features of the lesioned areas is a challenge. METHOD We propose a deep learning framework that combines multi-scale attention guidance and multi-branch feature processing modules to diagnose PD by learning sMRI T2 slice features. First, to achieve effective feature transfer and gradient descent, a deep convolutional neural network framework based on dense blocks is designed. Next, an Adaptive Weighted Attention algorithm is proposed, whose purpose is to extract multi-branch and diverse features. Finally, a Dropout layer and a SoftMax layer are added to the network to obtain good classification results and rich, diverse feature information. The Dropout layer reduces the number of intermediate features, increasing the orthogonality between the features of each layer; the SoftMax activation increases the flexibility of the network by converting linear outputs to nonlinear ones, improving the fit to the training set. RESULTS The proposed method achieved an accuracy of 92%, a sensitivity of 94%, a specificity of 90%, and an F1 score of 95% for identifying PD versus healthy controls (HC). CONCLUSION Experiments show that the proposed method successfully distinguishes PD from HC, obtaining good classification results in the PD diagnosis task compared with advanced research methods.
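The SoftMax operation referred to above is a standard construction; a numerically stable numpy sketch (not code from the paper):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating
    so large logits cannot overflow, then normalize to a probability
    distribution over classes."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.sum(), probs.argmax())   # sums to 1; class 0 wins
```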
Deep convolutional neural networks (DCNNs) have been extensively studied for detection and classification in biomedical image processing, and many have produced results on par with or better than those of radiologists and neurologists. However, obtaining good results from such DCNNs typically requires a large dataset. This paper presents a unique single-model approach for classifying brain tumours on a small dataset. A modified DCNN, RegNetY-3.2G, is used, integrated with DropOut and DropBlock regularization to prevent overfitting. Furthermore, an improved augmentation technique, RandAugment, is used to lessen the small-dataset problem. Lastly, a Multi-Weighted New Loss (MWNL) and an end-to-end cumulative learning strategy (CLS) are used to address unequal sample sizes and classification complexity, and to lessen the effect of aberrant samples on training.
Data is a crucial part of machine learning, used to create models whose patterns are defined by specific algorithms. The primary purpose of the machine learning process is to generate the best prediction models. In general, however, models that perform well in the training phase behave differently in the testing phase. This phenomenon, called overfitting, is a common problem in machine learning, particularly when developing image classification models using Convolutional Neural Networks (CNNs). The main task of the output layer in a CNN is to predict the true labels of the input images, and a model's performance is defined by how well it predicts them. Overfitting is one of the problems that can degrade this performance, and dropout is one of the regularization techniques intended to solve it. Our purpose in this research is to examine the relationship between the use of dropout and a CNN's image classification performance from the perspective of overfitting. The experimental results show that this technique can alleviate overfitting by 35% with a learning rate of 0.001.
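Standard inverted dropout, the regularizer studied here, can be written in a few lines of numpy (a generic sketch, not the authors' experimental code):

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale survivors by 1/(1-rate) so the
    expected activation is unchanged; at test time it is the identity."""
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng(rng)
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

acts = np.ones((1000,))
train_out = dropout(acts, rate=0.5, rng=0)
test_out = dropout(acts, rate=0.5, training=False)
print(train_out.mean())           # close to 1.0: expectation preserved
print((test_out == acts).all())   # True: identity at inference
```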
Crop diseases can significantly affect crop yield, quality, production costs, and crop loss. Modern technologies such as image analysis via machine learning enable early and precise detection of crop diseases, empowering farmers to manage and avoid them effectively. The proposed methodology uses a modified MobileNetV3Large model deployed on an edge device for real-time monitoring of grape leaf disease, reducing computational and memory demands while ensuring satisfactory classification performance. To enhance the applicability of MobileNetV3Large, custom layers consisting of two dense layers, each followed by a dropout layer, were added; this helped mitigate overfitting and kept the model efficient. Comparisons showed that the proposed model outperformed the others, with average train and test accuracies of 99.66% and 99.42%, and precision, recall, and F1 score of approximately 99.42%. The model was deployed on an edge device (Nvidia Jetson Nano) using a custom-developed GUI app and produced predictions from both saved and real-time data with high confidence. Grad-CAM visualization was used to identify and represent the image areas that drive the convolutional neural network (CNN) classification decision. This research contributes to plant disease classification technologies for edge devices, which can enhance the ability of farmers, agronomists, and researchers to monitor and mitigate plant diseases efficiently and effectively, with a positive impact on global food security.
This paper briefly introduces an enhanced neural network regularization method, so-called weight dropout, to prevent deep neural networks from overfitting. In the suggested method, the fully connected layer used jointly with weight dropout is a collection of layers in which the weights between nodes are dropped randomly during training. To realize the desired regularization, we propose a building block combining our weight dropout mask with a CNN. The performance of the proposed method is compared with previous methods on image classification and segmentation tasks. The results show that the proposed method achieves strong accuracy on several datasets.
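The distinction from standard dropout, masking weights rather than activations, can be illustrated with a minimal numpy forward pass (a DropConnect-style sketch under our own assumptions, not the paper's building block):

```python
import numpy as np

def weight_dropout_forward(x, W, rate=0.5, rng=None):
    """Weight dropout: randomly zero individual weights -- rather than
    activations -- for one training step, rescaling survivors so the
    expected pre-activation is unchanged."""
    rng = np.random.default_rng(rng)
    mask = rng.random(W.shape) >= rate      # one mask entry per weight
    return x @ (W * mask / (1.0 - rate))

x = np.ones((2, 8))
W = np.ones((8, 4))
out = weight_dropout_forward(x, W, rate=0.5, rng=1)
print(out.shape)   # (2, 4)
```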
Considering that ozone is essential to understanding air quality and climate change, this study presents a deep learning method for predicting atmospheric ozone concentrations. The method combines an attention mechanism with a convolutional neural network (CNN) and long short-term memory (LSTM) network to address the nonlinear nature of multivariate time-series data. It employs the CNN and LSTM to extract features from short time series, enhanced by the attention mechanism to improve short-term prediction accuracy. It takes as input eight meteorological and environmental parameters from 16,806 records (2018–2019), selected using principal component analysis (PCA). The attention-based CNN-LSTM hybrid model uses the following settings: a time step of 5, a batch size of 25, 15 units in the LSTM layer, the ReLU activation function, 25 epochs, and a dropout rate of 0.15 to avoid overfitting. Experimental results demonstrate that this hybrid model outperforms the individual models and the plain CNN-LSTM model, especially in forward prediction with a multi-hour time lag. The model exhibits a high coefficient of determination (R2 = 0.971) and a root mean square error of 3.59 for a 1-hour time lag, and shows consistent accuracy across seasons, highlighting its robustness and superior time-series prediction capabilities for ozone concentrations.
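The attention step that weights informative time steps can be sketched as generic dot-product attention over LSTM hidden states (our own simplification, not the paper's exact layer; the dimensions below are illustrative):

```python
import numpy as np

def temporal_attention(hidden_states, query):
    """Score each time step against a query vector, softmax the scores,
    and return the attention-weighted context over the sequence."""
    scores = hidden_states @ query                     # (T,)
    scores = scores - scores.max()                     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # (T,), sums to 1
    context = weights @ hidden_states                  # (D,)
    return weights, context

H = np.random.default_rng(0).normal(size=(5, 15))      # 5 steps, 15 LSTM units
w, ctx = temporal_attention(H, H[-1])                  # attend w.r.t. last step
print(w.sum(), ctx.shape)
```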
This study evaluates a hybrid model that integrates Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) to predict stock prices. The model leverages two datasets: historical Google stock data and sentiment data from Reddit comments. Sentiment analysis was performed using VADER from NLTK, which classified comments as negative, neutral, or positive, while a CNN model was trained to predict sentiment scores. Separately, an LSTM model was built using ten years of Google stock data from Yahoo Finance, with features scaled using MinMax normalization to improve learning and a dropout layer added to prevent overfitting. Model performance was evaluated using Root Mean Squared Error (RMSE) and Mean Squared Error (MSE). The LSTM model performed well on test data but showed lower accuracy on unseen data during forecasting. The hybrid model successfully combined the outputs of both the CNN and LSTM, demonstrating superior performance with lower RMSE and higher classification accuracy compared to the standalone models. This highlights the potential of integrating sentiment analysis with traditional stock prediction. The study acknowledges challenges in classifying neutral sentiments, suggesting that more comprehensive sentiment data is needed for future research.
Overfitting in CNN training has long been combated with model regularization, including weight decay, model averaging, and data augmentation. In this paper, we present DisturbLabel, an extremely simple algorithm that randomly replaces a portion of labels with incorrect values in each iteration. Although it seems strange to intentionally generate incorrect training labels, we show that DisturbLabel prevents overfitting by implicitly averaging over exponentially many networks trained with different label sets. To the best of our knowledge, DisturbLabel is the first work to add noise to the loss layer. Meanwhile, DisturbLabel cooperates well with Dropout, providing a complementary regularization function. Experiments demonstrate competitive recognition results on several popular image recognition datasets.
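The DisturbLabel rule itself is simple enough to state in code (a numpy sketch following the description above; the function and argument names are our own):

```python
import numpy as np

def disturb_label(labels, num_classes, alpha=0.1, rng=None):
    """DisturbLabel: independently, with probability alpha, replace a
    training label with one drawn uniformly over all classes (so a
    disturbed label may even land back on the truth). The label noise
    acts as regularization on the loss layer."""
    rng = np.random.default_rng(rng)
    labels = np.array(labels, copy=True)
    disturb = rng.random(labels.shape) < alpha
    labels[disturb] = rng.integers(0, num_classes, size=int(disturb.sum()))
    return labels

y = np.arange(10) % 3
print(disturb_label(y, num_classes=3, alpha=0.0, rng=0))  # unchanged: [0 1 2 0 1 2 0 1 2 0]
```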
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that impacts social interaction and behavior. Accurate early diagnosis is critical for timely intervention, yet current diagnostic methods are often lengthy and influenced by subjective interpretation. This research introduces an automated ASD detection framework based on electroencephalogram (EEG) data analysis using a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model. In this approach, the CNN component identifies spatial patterns in the EEG signals, while the LSTM component detects temporal relationships, providing a comprehensive basis for ASD classification. The CNN-LSTM architecture is designed to process raw EEG signals without additional preprocessing, employing distinct temporal and spatial filters focused on motor cortex electrode pairs. A 5-layer CNN paired with a 4-layer max pooling structure optimizes dimensionality reduction, followed by a fully connected (FC) layer for final classification. To mitigate overfitting, dropout and batch normalization techniques are applied. This model was trained and validated using EEG data from 5, 10, 20, and 50 subjects, obtained from the Physionet repository and recorded with a 14-channel Emotiv EPOC EEG device. This study incorporated varying cognitive states such as focused, unfocused, and drowsy. Experimental results indicate a group-level classification accuracy averaging 94.13%, an ROC area of 0.971, and a peak accuracy of 99.41% for the FC3-FC4 electrode pair on a 10-subject dataset. This automated system demonstrates promising support for clinicians in diagnosing ASD, potentially streamlining the diagnostic process and minimizing subjective biases.
In this research, we propose a hybrid CNN and LSTM model for word-level Ethiopian sign language recognition. The recognition system has four major components: preprocessing, feature extraction, sequence modeling, and training. Image frames were extracted from the collected word video dataset, followed by keyframe selection using the k-means clustering algorithm; image concatenation and resizing were then applied to the selected keyframes. The final dataset was downsized to 1500 images, 150 for each class. A CNN was utilized to extract important features through convolutional and pooling layers, with dropout used to overcome overfitting. Following feature extraction, sequence modeling was performed using LSTM layers, a dense layer, and a SoftMax classifier to recognize words. The system was tested on the dataset collected for this purpose. The experiments demonstrate that the proposed hybrid CNN-LSTM model performs well, achieving 98%, 97%, and 95% accuracy in the training, validation, and testing phases, respectively. These results demonstrate the model's success at recognizing isolated word-level Ethiopian sign language, a step forward in assisting hearing-impaired individuals.
Financial time series are nonlinear, volatile, and stochastic, making the stock market extremely difficult to predict. Traditional time-series models, and even standard deep learning models, are prone to overfitting and fail to capture key behavioral patterns in the market such as resistance zones, price levels at which reversal trends have historically occurred. This paper proposes a deep learning-based stock market prediction system that includes overfitting-resistance mechanisms and resistance-zone-sensitive forecasting. The model uses a Bi-LSTM and CNN structure to sequentially extract local features, assisted by a Resistance Encoding Layer (REL) that encodes historical reversal areas using Gaussian blur operations. Adversarial perturbation, dropout annealing, and Bayesian regularization training schemes are applied to further prevent overfitting. The proposed model was compared with ARIMA, LSTM, and CNN-LSTM baselines on Alpha Vantage historical data for ten actively traded stocks. Under simulated trading, the proposed model achieved a mean prediction accuracy of 97.89%, a Resistance Sensitivity Score (RSS) of 0.93, and a Profit Factor of 3.16, outperforming all baseline models; directional accuracy exceeded 94% on all stocks. The results confirm that adding resistance-zone intelligence and overfitting control enhances accuracy and generalization, providing a more reliable tool for stock forecasting in real financial settings. Future research will integrate news sentiment, multi-stock correlation modeling, and adaptive real-time learning to improve performance in live trading environments.
With the rapid development of deep learning, especially the application of Convolutional Neural Networks (CNNs), significant progress has been made in the identification and recognition of facial micro-expressions. This paper explores and compares three widely used models in this field, VGG, DenseNet, and ResNet, across seven expression labels: Anger, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise. The comparison is based on several key evaluation metrics commonly used in deep learning classification tasks. To ensure consistency, critical training parameters such as epochs, learning rate, and data preprocessing steps are kept the same across all models. Additionally, a dropout layer is incorporated into each model to address overfitting and improve generalization. The results indicate that VGG achieves the highest performance on the RAF-DB dataset, with an F1-score of 0.76, an AUC of 0.95, and a 75% accuracy rate. Additionally, VGG uses only 1243.31 MB of memory, making it the most efficient of the three models. This shows that VGG excels not only in performance but also in memory efficiency.
The research introduces an extended Convolutional Neural Network (CNN) architecture, based on AlexNet, to classify medical images into benign, malignant, and normal categories. The experimental dataset was downloaded from Kaggle and contains 1,194 images, of which 1,050 are used for training and 144 for testing. Every image was preprocessed by resizing, normalizing, and applying augmentation techniques to maintain consistency and enhance generalization. The suggested architecture starts with several convolutional and pooling layers for feature extraction, followed by Batch Normalization to stabilize the training process and Dropout layers to avoid overfitting. To reduce model complexity and enhance robustness, a Global Average Pooling (GAP) layer is utilized instead of a conventional flattening operation. Moreover, a Squeeze-and-Excitation (SE) attention mechanism is used to enable the network to focus on salient areas of the image and suppress irrelevant details. Lastly, classification is conducted using dense fully connected layers followed by a softmax output layer. Experimental results show that the proposed model attains about 99% accuracy on the test set, with high precision, recall, and F1-score for all classes. These results show that the suggested CNN model is accurate, efficient, and trustworthy for medical image classification, and can offer promising assistance in clinical decision-making.
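The Squeeze-and-Excitation mechanism mentioned above reweights channels by a gate computed from globally pooled features; a minimal numpy sketch with toy (untrained) weights `w1` and `w2`:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (H, W, C).

    w1: (C, C//r) reduction weights, w2: (C//r, C) expansion weights,
    where r is the reduction ratio (weights here are illustrative;
    a real SE layer learns them).
    """
    squeeze = x.mean(axis=(0, 1))                 # (C,) global average pool
    hidden = np.maximum(squeeze @ w1, 0.0)        # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid gate in (0, 1)
    return x * scale                              # reweight each channel
```

The gate is a per-channel scalar, so the block adds very few parameters while letting the network emphasize informative channels.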
This article presents a hybrid Transformer-CNN model designed for the classification of 12-lead ECG signals, combining Convolutional Neural Networks (CNNs) and Transformer-based self-attention mechanisms to effectively capture local morphological features and long-range temporal relationships. The model uses the PTB-XL dataset, consisting of a variety of annotated ECG recordings, analyzing pre-segmented multi-lead signals for precise identification of cardiac anomalies. The CNN layers function as the principal extractor of localized waveform components, including QRS complexes and P waves, while the Transformer encoder contextualizes these features over time, facilitating comprehensive rhythm analysis. Positional encoding, residual connections, and layer normalization are utilized to enhance the stability of the learning process. The architecture used the Adam optimizer and incorporated regularization methods, including dropout and early stopping, to mitigate overfitting. Assessment on patient-specific test cases demonstrated good accuracy (~98%), accompanied by favorable sensitivity and specificity metrics. The comparative study and initial results suggest that the proposed hybrid model possesses considerable potential for implementation in clinical decision support systems. Future endeavors will concentrate on benchmarking against known methodologies and enhancing the framework for real-time ECG interpretation in wearable and remote monitoring applications.
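The positional encoding referred to above is, in the standard Transformer formulation, a fixed sinusoidal table added to the sequence embeddings; a sketch (assuming the common Vaswani et al. form and an even `d_model`):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: PE[pos, 2i]   = sin(pos / 10000^(2i/d)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d)). Assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]                 # (T, 1)
    i = np.arange(d_model // 2)[None, :]              # (1, d/2)
    angle = pos / np.power(10000.0, 2 * i / d_model)  # (T, d/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions
    pe[:, 1::2] = np.cos(angle)   # odd dimensions
    return pe
```

Because the table is deterministic, it adds temporal order information to the CNN features without introducing any trainable parameters.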
The timely identification of plant diseases is vital to crop productivity and food security. Conventional manual inspection is time-consuming, inaccurate, and highly reliant on expert knowledge. To address these issues, this study proposes LeafGuardNet, a deep Convolutional Neural Network (CNN)-based model for predicting plant leaf diseases with high accuracy and at an early stage in smart agriculture settings. The model uses a large annotated leaf image dataset, preprocessed with image enhancement and noise reduction algorithms. The proposed model uses a multi-layer CNN optimized for feature extraction, with dropout and batch normalization layers to enhance accuracy and minimize overfitting. Pre-trained models, including ResNet50 and EfficientNet-B0, are incorporated via transfer learning to improve classification performance. The proposed system is trained and tested on publicly available datasets such as PlantVillage and on field-acquired images, with an overall classification accuracy exceeding 97% across a variety of crop species. The experimental findings reveal that LeafGuardNet outperforms existing models in precision, recall, and F1-score, making it a stable decision-support tool for precision agriculture and sustainable crop management.
This study explores various architectural modifications and hyperparameter tuning strategies to enhance the performance of Convolutional Neural Networks (CNNs) in image classification. We systematically analyze the effects of varying convolutional kernel sizes, dropout layers, and fully connected layer configurations. Our experiments demonstrate that while dropout improves generalization, increasing convolutional layers can lead to overfitting. Comparisons with alternative approaches, including the Bag of Visual Words (BoW) model and a Multi-Layer Perceptron (MLP), reveal that BoW achieves the highest accuracy but at the cost of increased computational complexity, while MLP exhibits significant overfitting. The findings highlight the trade-offs between network depth, regularization techniques, and computational efficiency. Future improvements may focus on data augmentation, refined region-of-interest segmentation, and leveraging pre-trained models like VGG16 for fine-tuning. This study provides insights into optimizing CNN architectures for improved image classification performance.
In order to address the issue of high energy consumption during recognition in existing fruit classification methods, this paper proposes a fruit recognition system based on Convolutional Neural Networks (CNNs). The system employs a multi-layer convolutional structure to extract features from fruit images, which are then classified using a fully connected layer. A capture program is developed to collect images of fruits, and techniques such as Gaussian blur are applied to increase the sample size and ensure diversity in the dataset. Additionally, dropout techniques are utilized to prevent overfitting during model training. Compared to conventional methods, the proposed approach achieves an accuracy of 95.86% in fruit recognition, enabling more accurate detection and recognition and thus meeting the requirements for fruit classification and recognition.
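Gaussian blur, used above to enlarge and diversify the training set, can be implemented as a separable convolution; a minimal grayscale sketch (kernel radius and sigma are illustrative):

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    """Blur a 2D grayscale image with a separable Gaussian kernel.

    Edges are handled by reflection padding, and the Gaussian's
    separability lets us do a horizontal then a vertical 1D pass.
    """
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()  # normalize so a constant image stays constant
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return out
```

Applying the blur at a few different sigmas yields several augmented variants of each training image.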
The research develops a Convolutional Neural Network (CNN) that classifies pebbles and shells from images. The open-source dataset, extracted from Kaggle, includes 4,284 labeled images covering both the shell and pebble categories. The CNN model starts with an input layer that applies resizing and rescaling, continues with a feature extraction stage and two dense layers of 128 units that incorporate dropout layers with a rate of 0.2 for regularization, and ends with an output layer for binary classification. The model achieved 87% accuracy, together with a precision of 0.87 for both classes. The classification report and confusion matrix highlight high recognition capability for pebbles and moderate performance for shells. Both the loss and accuracy curves indicate proper model behavior throughout training, without significant overfitting. The research demonstrates how artificial intelligence supports several United Nations Sustainable Development Goals, from responsible consumption and life below water to climate action, sustainable communities, and industrial innovation. Deep learning techniques allow the model to be applied to environmental research while also supporting coastal management operations and automated material classification, advancing sustainability knowledge about coastal resources.
Identification and diagnosis of schizophrenia based on multichannel EEG and CNN deep learning model.
This paper proposes a high-accuracy EEG-based schizophrenia (SZ) detection approach. Unlike comparable studies employing conventional machine learning algorithms, our method autonomously extracts the features necessary for network training from EEG recordings. The proposed model is a ten-layer CNN containing a max pooling layer, a Global Average Pooling layer, four convolution layers, two dropout layers for overfitting prevention, and two fully connected layers. The efficiency of the suggested method was assessed using ten-fold cross-validation on the EEG records of 14 healthy subjects and 14 SZ patients. The obtained mean accuracy score was 99.18%. To confirm the high mean accuracy attained, we tested the model on unseen data, obtaining a near-perfect accuracy score (almost 100%). In addition, our results outperform numerous comparable works.
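Ten-fold cross-validation, as used in the evaluation above, partitions the samples into ten disjoint folds and rotates the held-out fold; a plain-Python sketch:

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k shuffled, disjoint folds and yield
    (train_idx, test_idx) pairs, one per fold, as in k-fold CV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each sample appears in exactly one test fold, so the reported mean accuracy averages k models evaluated on non-overlapping held-out data.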
Cardiovascular diseases remain the leading global cause of death, with Invasive Coronary Angiography (ICA) being the gold standard for cardiac interventions. While deep learning enables automated vessel segmentation from ICA for critical tasks like stenosis assessment, these models struggle with domain shifts across clinical settings due to variations in protocols, equipment, and patient demographics. This is exacerbated by the scarcity of annotated datasets, making Single-source Domain Generalization (SDG) essential. Current SDG methods are augmentation-based and risk overfitting to synthetic variations. To address this, we propose a progressive and targeted channel dropout method that explicitly targets channel behavior in the first layer of a Convolutional Neural Network (CNN). By identifying and progressively dropping domain-specific channels that overfit to training source features, our proposed method stabilizes feature learning and promotes domain-invariant representations. Our architecture-agnostic method can be integrated with any CNN backbone to enhance generalization. Extensive evaluation across five diverse ICA datasets demonstrates improved out-of-distribution performance while maintaining in-domain performance, establishing a robust foundation for clinical deployment. Clinical relevance: Our proposed method aids the reliable deployment of coronary vessel segmentation models for X-ray angiography by improving generalization across diverse imaging conditions, thus supporting important downstream tasks such as 3D reconstruction of coronary arteries and non-invasive hemodynamic analysis.
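The progressive channel dropout idea described above can be sketched as zeroing a growing fraction of a pre-ranked list of domain-specific channels; the ranking itself (which the paper derives from channel behavior) is assumed given here:

```python
import numpy as np

def channel_dropout(feat, drop_channels, epoch, total_epochs):
    """Progressively zero out suspected domain-specific channels.

    feat: first-layer feature map of shape (C, H, W).
    drop_channels: channel ids ranked as domain-specific (a hypothetical
    ranking; the paper identifies these from channel behavior).
    The fraction of that list actually dropped grows linearly with
    training progress, so regularization ramps up gradually.
    """
    n_drop = int(len(drop_channels) * (epoch + 1) / total_epochs)
    out = feat.copy()
    for c in drop_channels[:n_drop]:
        out[c] = 0.0  # suppress this channel's activations entirely
    return out
```

Unlike standard (random) dropout, the channels removed here are chosen deterministically, which is what makes the method "targeted".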
Recent improvements in Artificial Intelligence and Machine Learning have led to automated models for detecting diseases from ECGs. Previous models based on 2D CNNs achieved good accuracy in ECG classification, but they require a 2D spatial representation of the ECG, which is 1D temporal data. Since this conversion is typically done through CWT scalograms, it is computationally intensive. In this paper, a model based on a 1D Convolutional Neural Network is proposed that can classify specific cardiac diseases from ECG data. The model consists of four feature-extracting convolutional layers, each followed by a max pooling layer and a dropout layer to prevent overfitting. The model is trained on ECGs of different cardiac conditions, such as cardiac arrhythmia, congestive heart failure, and normal sinus rhythm, and can classify these conditions from ECGs. The MIT-BIH arrhythmia database has been used for training and evaluation. The model achieves a classification accuracy of 96.83%, almost equal to the previous 2D CNN model, with a significant decrease in complexity.
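One of the four feature-extraction blocks described above (convolution, then max pooling, then dropout) can be sketched in numpy for a single 1D signal; the kernel values and dropout rate are illustrative:

```python
import numpy as np

def conv1d_block(x, kernel, pool=2, drop_p=0.25, rng=None):
    """One feature-extraction block: 1D valid convolution, ReLU,
    max pooling, then inverted dropout (training mode only)."""
    # np.convolve flips the kernel, so flip it back to get cross-correlation
    y = np.convolve(x, kernel[::-1], mode="valid")
    y = np.maximum(y, 0.0)                         # ReLU
    n = len(y) // pool
    y = y[:n * pool].reshape(n, pool).max(axis=1)  # max pooling
    if rng is not None:                            # rng given => training mode
        mask = rng.random(y.shape) >= drop_p
        y = y * mask / (1.0 - drop_p)              # inverted-dropout scaling
    return y
```

Passing `rng=None` gives the deterministic inference-mode path; passing a generator enables the stochastic dropout used during training.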
This study presents a CNN architecture aimed at increasing the accuracy of skin lesion detection by incorporating advanced techniques such as batch normalization, data augmentation, dropout layers, and data balancing. Through experiments conducted on the ISIC dataset, the effectiveness of these modifications is demonstrated. The model, despite its simplicity, exhibits significant potential for further enhancement through additional training epochs. Remarkably, it achieves an accuracy rate exceeding 90%, outperforming other models, with data balancing, augmentation, and dropout layers mitigating overfitting. This achievement highlights the model's applicability in various settings, including real-time medical diagnostics. However, the research also emphasizes the importance of further development and testing, suggesting future research avenues including the exploration of transfer learning, evaluation with larger datasets, and expansion into more medical diagnostic areas. The paper concludes by advocating for continued research efforts to ensure the reliability and effectiveness of these techniques in clinical environments, pointing to their promising future in medical diagnostics.
This paper presents a CNN architecture for pneumonia classification consisting of 5 convolution blocks followed by 4 fully connected layers. The convolutional blocks extract pertinent features from the input X-ray images, and the fully connected layers determine the final classification from those learned features. Several techniques are integrated to improve performance. Batch normalization normalizes the inputs to each layer, which can speed up training and enhance generalization. Dynamic dropout, which removes certain neurons at random during training, helps avoid overfitting. L2 regularization (weight decay) and learning rate decay are two efficient strategies for preventing overfitting and improving the model's ability to generalize to new data. The Adam optimizer, a popular optimization algorithm, is used for efficient neural network training, and binary cross-entropy is the loss function of choice for binary classification problems such as pneumonia diagnosis. The model is validated on publicly available benchmark datasets, where experimental comparisons against current methods assess its effectiveness. The model performs well, with accuracy scores of 90.93% for multi-class classification and 89.17% for binary classification. Automated methods such as the one proposed here may help medical practitioners recognize pneumonia and spot diseased regions in chest X-ray images. However, automated systems should not replace the skills and judgment of professional radiologists and doctors; rather, they should be used as supportive tools, with medical specialists always reviewing and interpreting the system's output to ensure accurate diagnosis and suitable patient care. It is also crucial to consider potential drawbacks and challenges, such as data quality, the interpretability of the model's decision-making, and the need for ongoing validation and refinement.
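Two of the regularizers named above, L2 weight decay and learning-rate decay, amount to a small modification of the plain gradient step; a sketch (the hyperparameter values are illustrative, not the paper's):

```python
import numpy as np

def sgd_step(w, grad, lr0, decay, step, weight_decay=1e-4):
    """One gradient step with L2 weight decay and exponential
    learning-rate decay.

    L2 regularization adds weight_decay * w to the gradient (the
    derivative of 0.5 * weight_decay * ||w||^2), shrinking weights
    toward zero; the learning rate shrinks geometrically with step.
    """
    lr = lr0 * (decay ** step)                 # learning-rate schedule
    return w - lr * (grad + weight_decay * w)  # decayed, regularized update
```

Both effects combat overfitting: weight decay penalizes large weights, while the shrinking learning rate stabilizes late training.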
Convolutional Neural Networks (CNNs) are extensively employed in crop disease classification and have achieved remarkable results. However, the challenges researchers face while designing CNN models include misclassification of diseases, leading to low accuracy, and long training times. Overfitting is a common problem faced by CNN models, and the dataset used is a key factor in training them. This study proposes a CNN architecture for rice disease identification with enhanced accuracy and reduced training time. Three convolutional layers and two dense layers are used, along with max pooling, batch normalization, and dropout layers. Hyperparameter fine-tuning is applied during training to obtain the optimized parameters that improve model accuracy. The proposed CNN model is evaluated through comparative analysis on three variants of a rice disease dataset: RGB images, grayscale images, and augmented RGB images. The study finds that the proposed CNN architecture classifies rice diseases with high accuracy and without overfitting on all three dataset variants. It attains its best training accuracy of 96.75% and testing accuracy of 95.25% on augmented RGB images, with a training time of only 127 seconds.
Detecting diseases on rice leaves plays a crucial role in advancing agricultural automation. The challenge of image resolution impacting model error in disease identification prompted the development of the Tiny-CNN model in this research. This model addresses image recognition on datasets with diverse resolutions and dimensions, leveraging a multi-layer structure, continuous convolution, and consistent image size across layers. Designed with activation filter parameters, dropout, and ReLU to counter overfitting and gradient vanishing during training, the Tiny-CNN model is evaluated on datasets with high and low resolutions and varying input sizes. Results show an impressive accuracy and F1-score exceeding 91.8% and 92%, respectively. Notably, Tiny-CNN trained on high-resolution images excels, achieving over 98% across all metrics when tested on low-resolution images, with only a 2% decrease compared to high-resolution scenarios. Even when trained on limited data for low-resolution input sizes, Tiny-CNN maintains results above 95%, demonstrating its effectiveness in addressing image resolution challenges in leaf disease identification. This progress opens avenues for deploying image-processing techniques on devices such as UAVs and IoT hardware.
Face image classification is an important branch of computer vision and artificial intelligence, commonly applied in various fields such as facial recognition and facial attribute analysis. One facial attribute that is particularly interesting to classify is the use of eyeglasses, as it can affect the overall accuracy of facial recognition systems. This study aims to develop an eyeglass-use classification system based on a Convolutional Neural Network (CNN) and implement it in a web application using Flask to enable real-time prediction results. The research methodology includes collecting facial image datasets from Kaggle, performing preprocessing steps such as resizing, augmentation, and normalization, designing the CNN architecture, training the model, and evaluating its performance using a confusion matrix and classification report. The designed CNN model consists of three convolutional layers, max pooling, a flattening process, two fully connected layers, and a dropout layer to reduce the risk of overfitting. During the training phase, the model achieved 90% training accuracy and 96% validation accuracy, while testing on the test dataset resulted in an overall accuracy of 82%. The Flask-based system is capable of displaying real-time predictions, including the input image, classification label, accuracy percentage, and inference time. In the detection process, the accuracy of the model implementation reached 93% and the time required was in the range of a few milliseconds. The results demonstrate that the CNN can effectively classify faces with and without eyeglasses, and its implementation through a web interface offers broad potential for visual identification applications.
Lightweight convolutional neural network (CNN) models have proven effective in recognizing common pest species, yet challenges remain in enhancing their nonlinear learning capacity and reducing overfitting. This study introduces a grouped dropout strategy and modifies the CNN architecture to improve the accuracy of multi-class insect recognition. Specifically, we optimized the base model by selecting appropriate optimizers, fine-tuning the dropout probability, and adjusting the learning rate decay strategy. Additionally, we replaced ReLU with PReLU and added BatchNorm layers after each Inception layer, enhancing the model’s nonlinear expression and training stability. Leveraging the Inception module’s branching structure and the adaptive grouping properties of the WeDIV clustering algorithm, we developed two grouped dropout models, the iGDnet-IP and GDnet-IP. Experimental results on a dataset containing 20 insect species (15 pests and five beneficial insects) demonstrated an increase in cross-validation accuracy from 84.68% to 92.12%, with notable improvements in the recognition rates for difficult-to-classify species, such as Parnara guttatus Bremer and Grey (PGBG) and Papilio xuthus Linnaeus (PXLL), increasing from 38% and 47% to 62% and 93%, respectively. Furthermore, these models showed significant accuracy advantages over standard dropout methods on test sets, with faster training times compared to four conventional CNN models, highlighting their suitability for mobile applications. Theoretical analyses of model gradients and Fisher information provide further insight into the grouped dropout strategy’s role in improving CNN interpretability for insect recognition tasks.
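The grouped dropout strategy above drops or keeps entire channel groups together rather than individual units; a numpy sketch in which the grouping is assumed given (the paper derives it with WeDIV clustering, not reproduced here):

```python
import numpy as np

def grouped_dropout(x, groups, p=0.5, rng=None):
    """Dropout applied per channel *group*: all channels in a group are
    kept or dropped together, sharing one Bernoulli draw.

    x: feature map of shape (C, H, W); groups: list of lists of channel
    indices (here given; the paper clusters channels adaptively).
    """
    if rng is None:
        rng = np.random.default_rng()
    out = np.zeros_like(x)
    for g in groups:
        if rng.random() >= p:             # keep the whole group...
            out[g] = x[g] / (1.0 - p)     # ...with inverted-dropout scaling
    return out
```

Because correlated channels stand or fall together, the surviving groups form a more coherent sub-network than with unit-level dropout.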
White blood cells (WBCs) play a remarkable role in the human immune system. To diagnose blood-related diseases, pathologists need to consider the characteristics of WBCs, which can be defined by the morphological properties of the WBC nucleus. Nucleus segmentation therefore plays a vital role in classifying WBC images and is an important part of a medical diagnosis system. In this study, a new WBC nucleus segmentation method based on color space conversion and the k-means algorithm is proposed. We then localize the WBC based on the location of the segmented nucleus to separate it from the entire blood smear image. To classify the localized WBC image, we propose a new convolutional neural network (CNN) model that fuses the features of the first and last convolutional layers and propagates the input image to the convolutional layer. We also use a dropout layer to prevent the model from overfitting. We show the effectiveness of our proposed nucleus segmentation method by evaluating it with seven quality metrics and comparing it with other methods on four public databases, achieving an average accuracy of 98.61% and more than 97% on each public database. We also evaluate our proposed CNN model using nine classification metrics, achieving an overall accuracy of 96% on the BCCD test database. To validate the generalization capability of our proposed CNN model, we show the training and testing accuracy and loss curves for a random test set of the BCCD database. Further, we compare the performance of our proposed CNN model with four state-of-the-art CNN models (biomedical image classifiers) across the evaluation metrics.
This paper recommends a customized six-layer Convolutional Neural Network (CNN) for detecting facial regions in real time using the Vasuki Patel dataset (2,562 face and 90 non-face images). It consists of a compact model of convolution, batch normalization, average pooling, dropout, and dense layers that balances efficiency and accuracy. Trained for 50 iterations, it achieved 98% training and 88% validation accuracy, showing strong learning capability and a low level of overfitting. Compared with traditional methods, it is superior under background, illumination, and pose variation, and is therefore suitable for applications in security, identity authentication, and automatic attendance tracking.
In the face of a large number of network attacks, an intrusion detection system can issue early warnings indicating the emergence of attacks. To improve on traditional machine learning intrusion detection models in identifying attack behavior and to increase detection accuracy, a convolutional neural network is used to construct the intrusion detection model, offering a better ability to solve complex problems and better algorithmic adaptability. To address problems such as the dimension explosion caused by the input data, a whitening PCA algorithm is used to extract data features and reduce dimensionality. For the common problem of overfitting in convolutional neural networks applied to intrusion detection, Dropout layers are added before and after the fully connected layer of the CNN, and Sigmoid is selected as the intrusion classification prediction function. This reduces overfitting, improves the robustness of the intrusion detection model, and enhances its fault tolerance and generalization ability, improving detection accuracy. The effectiveness of the proposed method is verified by comparative analysis of numerical examples.
The 2019 Coronavirus (COVID-19) has damaged people's respiratory systems around the world. Computed Tomography (CT) is a faster complement to RT-PCR during peak virus spread times. Nowadays, Deep Learning (DL) with CT provides more robust and reliable methods for classifying patterns in medical images. In this paper, we propose a simple, low-training-cost customized Convolutional Neural Network (CNN) model in which optional layers may be included, such as a batch normalization layer to reduce training time and a dropout layer to deal with overfitting. We employed a large dataset of chest CT slice images from diverse sources, COVIDx-CT, which consists of 16,146 images from 810 patients of various nationalities. The proposed customized model's classification results are compared to the VGG-16, AlexNet, and ResNet50 deep learning models. The proposed CNN model shows robustness by achieving an overall accuracy of 93%, compared to 88%, 89%, and 95% for the VGG-16, AlexNet, and ResNet50 models in 3-class classification. For binary classification, the classification accuracies of the proposed model and VGG-16 were identical (almost 100%), with 0.17% misclassification in the Non-COVID-19 class; AlexNet achieved almost 100% classification accuracy with 0.33% misclassification in the Non-COVID-19 class; and ResNet50 achieved 95% classification accuracy with 5% misclassification in the Non-COVID-19 class.
A brain-computer interface (BCI) based on electroencephalography (EEG) can provide independent information exchange and control channels between the brain and the outside world. However, EEG signals come from multiple electrodes, whose data can generate multiple features, and how to select electrodes and features to improve classification performance has become an urgent problem. This paper proposes a deep convolutional neural network (CNN) structure with separated temporal and spatial filters, which takes the raw EEG signals of electrode pairs over the motor cortex region as hybrid samples without any preprocessing or artificial feature extraction. In the proposed structure, a 5-layer CNN learns EEG features, a 4-layer max pooling stage reduces dimensionality, and a fully-connected (FC) layer performs classification. Dropout and batch normalization are also employed to reduce the risk of overfitting. In the experiment, 4 s EEG segments from 10, 20, 60, and 100 subjects of the Physionet database are used as the data source, and the motor imagery (MI) tasks are divided into four types: left fist, right fist, both fists, and both feet. The results indicate that the global averaged accuracy on group-level classification can reach 97.28%, the area under the receiver operating characteristic (ROC) curve stands out at 0.997, and the electrode pair with the highest accuracy on the 10-subject dataset is FC3-FC4, at 98.61%. The results also show that this CNN classification method can obtain high accuracy with a minimal number of electrodes (two), which is an advantage over other methods on the same database. This proposed approach provides a new idea for simplifying the design of BCI systems and accelerates the process of clinical application.
Next-generation networks are data-driven by design but face uncertainty due to changing user group patterns and the hybrid nature of the infrastructures running these systems. Meanwhile, the amount of data gathered in computer systems is increasing, and how to classify and process this massive data to reduce the amount of data transmitted in the network is a problem well worth solving. Recent research uses deep learning to propose solutions for these and related issues. However, deep learning faces problems like overfitting that may undermine the effectiveness of its applications to different network problems. This paper considers the overfitting problem of convolutional neural network (CNN) models in practical applications. An algorithm combining max-pooling dropout and weight decay is proposed to avoid overfitting. First, max-pooling dropout is designed into the pooling layer of the model to sparsify the neurons; then, regularization based on weight decay is introduced to reduce the complexity of the model when the gradient of the loss function is calculated by backpropagation. Theoretical analysis and experiments show that the proposed method can effectively avoid overfitting and reduces the dataset classification error rate by more than 10% on average compared with other methods. The proposed method can improve the quality of different deep learning-based solutions designed for data management and processing in next-generation networks.
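Max-pooling dropout, as proposed above, zeroes units inside each pooling window before the max is taken, so the pooled value is sampled stochastically from the window; a 1D numpy sketch:

```python
import numpy as np

def max_pool_dropout(x, pool=2, p=0.3, rng=None):
    """Max-pooling dropout on a 1D activation vector.

    Units inside each pooling window are dropped with probability p
    *before* the max, so the output is a stochastic sample from the
    window rather than always its maximum (p=0 recovers plain pooling).
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(x) // pool
    windows = x[:n * pool].reshape(n, pool)
    if p > 0.0:
        mask = rng.random(windows.shape) >= p  # per-unit Bernoulli keep mask
        windows = windows * mask
    return windows.max(axis=1)
```

Because smaller activations occasionally win the max, the layer injects noise exactly where pooling would otherwise be deterministic, which is the regularizing effect the paper exploits.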
Computer vision with deep learning has recently emerged for Automatic Violence Detection and Classification (AVDC) with enormous potential. This paper reports the early development of a new Deep Convolutional Neural Network (DCNN) model that we call BrutNet. Building on the Gated Recurrent Unit (GRU), BrutNet is designed to operate on patterns within multiple frames of a video or video clips of shape 160×90 with a duration of at least 3 seconds. To obtain the image-feature set for each frame, convolutional layers are applied within a time-distributed layer, encoding the data from 4D to 2D and yielding a 512-feature set per frame. The temporal nature of these frames is then extracted by the GRU layer as a 1D vector, which is processed by several dense layers. A binary classification is thereby performed, labeling the content as violent or non-violent. Dropout layers with a dropping rate of 0.25 were added to avoid overfitting. Besides, ReLU and sigmoid activation functions were used in the hidden and output layers, respectively. Trained with a recent high-resolution AVDC video dataset and appropriate hyperparameters on the NVIDIA Tesla K80 GPU of Google Colab, the initial testing and validation of the model recorded a test accuracy of 90.00%, outperforming an earlier LSTM-based ResNet50 model.
Visible light positioning (VLP) has emerged as a promising indoor positioning technology, owing to its high accuracy and cost-effectiveness. In practical scenarios, signal attenuation, multiple light reflections, or light-deficient regions, particularly near room corners or furniture, can significantly degrade the light quality. In addition, the non-uniform light distribution by light-emitting diode (LED) luminaires can also introduce errors in VLP estimation. To mitigate these challenges, recent studies have increasingly explored the use of machine learning (ML) techniques to model the complex nonlinear characteristics of indoor optical channels and improve VLP performance. Convolutional neural networks (CNNs) have demonstrated strong potential in reducing positioning errors and improving system robustness under non-ideal lighting conditions. However, the performance of CNN-based systems is highly sensitive to their hyperparameters, including learning rate, dropout rate, batch size, and optimizer selection. Manual tuning of these parameters is not only time-consuming but also often suboptimal, particularly when models are applied to new or dynamic environments. Therefore, there is a growing need for automated optimization techniques that can adaptively determine optimal model configurations for VLP tasks. In this work, we propose and demonstrate a VLP system that integrates received signal strength (RSS) signal pre-processing, a CNN, and particle swarm optimization (PSO) for automated hyperparameter tuning. In the proof-of-concept VLP experiment, three different height layer planes (i.e., 200, 225, and 250 cm) are employed for the comparison of three different ML models, including linear regression (LR), an artificial neural network (ANN), and a CNN. For instance, the mean positioning error of a CNN + pre-processing model at the 200 cm receiver (Rx)-plane reduces from 9.83 cm to 5.72 cm. This represents an improvement of 41.81%. 
By employing a CNN + pre-processing + PSO, the mean error can be further reduced to 4.93 cm. These findings demonstrate that integrating PSO-based hyperparameter tuning with a CNN and RSS pre-processing significantly enhances positioning accuracy, reliability, and model robustness. This approach offers a scalable and effective solution for real-world indoor positioning applications in smart buildings and Internet of Things (IoT) environments.
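The PSO-based hyperparameter search described above can be sketched in a few lines; the toy objective below stands in for training a CNN and measuring positioning error (the optimum at learning rate 1e-3 and dropout 0.3 is hypothetical, as are the swarm coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)

def val_error(lr, drop):
    # Stand-in for training the CNN and measuring mean positioning
    # error; hypothetical optimum at lr = 1e-3, dropout = 0.3.
    return (np.log10(lr) + 3) ** 2 + (drop - 0.3) ** 2

# Particles search over (log10 learning rate, dropout rate).
n, iters = 10, 30
pos = rng.uniform([-5.0, 0.0], [-1.0, 0.8], size=(n, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([val_error(10 ** p[0], p[1]) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, 2)), rng.random((n, 2))
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([val_error(10 ** p[0], p[1]) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(gbest)  # converges toward [-3, 0.3]
```

Searching the learning rate in log space, as here, is the usual choice because its useful values span orders of magnitude.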
Herbal plants have various health benefits, but identifying their types remains challenging for the general public. This study aims to improve the accuracy of herbal plant leaf classification using a Convolutional Neural Network (CNN) based on the MobileNetV2 architecture. To enhance model performance, various optimization techniques, including fine-tuning, batch normalization, dropout, and learning rate scheduling, were implemented. The experimental results showed that the proposed optimized model achieved an accuracy of 100%, significantly outperforming previous studies that used standard MobileNet with an accuracy of 86.7%. While these perfect results warrant additional validation with more diverse datasets to confirm generalizability, this study contributes to the development of a more accurate herbal plant classification system that is readily accessible to the general public. Future work should explore model performance under varying environmental conditions and with expanded plant species datasets.
This research addresses the challenge of multi-class image classification on the low-resolution CIFAR-10 dataset by developing and empirically examining various Convolutional Neural Network (CNN) and ResNet-8 architectures. The study systematically evaluates the impact of critical deep learning techniques, including image normalization, batch normalization, and advanced data augmentation, using a sequential CNN baseline. Subsequently, a ResNet-8 model was implemented to mitigate the vanishing gradient problem inherent in deeper architectures. Performance optimization was achieved through a rigorous three-phase Bayesian hyperparameter optimization (HPO) strategy, focusing on learning rate, dropout rate, filter capacity, and optimization algorithms. To overcome the computational bottlenecks of high-capacity models, the experimental framework was accelerated using an NVIDIA GeForce RTX 4090 GPU on the RunPod platform. The results indicate that the systematic addition of batch normalization and data augmentation improved the baseline CNN accuracy from 70.81% to 72.23%. Crucially, the transition to the ResNet-8 architecture provided a significant performance leap, with the final optimized configuration (Architecture 8) achieving a peak mean accuracy of 85.31% ± 0.0044. This model, characterized by 256-filter layers and an optimized learning rate identified through the Bayesian search landscape, demonstrates superior generalization compared to traditional CNN baselines. This study confirms the efficacy of residual learning and high-performance hardware acceleration in achieving robust classification on challenging 32×32 low-resolution datasets like CIFAR-10.
Tomato leaf diseases pose a major threat to crop yield and agricultural productivity worldwide. Although Convolutional Neural Networks (CNNs) have become the standard for image-based plant disease detection, their performance heavily depends on carefully chosen hyperparameters such as learning rate, dropout rate, and dense layer configuration. In this study, we explore how the Ant Lion Optimization (ALO) algorithm can effectively tune CNN hyperparameters to enhance the model's ability to extract disease-relevant features. ALO mimics the hunting strategy of antlions to balance exploration and exploitation in the high-dimensional search space of hyperparameter combinations. We evaluated the proposed ALO-tuned CNN on a real-world tomato leaf disease dataset comprising 30,609 images across 10 classes captured under varied lighting and background conditions. The optimized model achieves a test accuracy of 96.61%, with macro-averaged precision, recall, and F1-score of 97%. These results demonstrate that ALO not only outperforms manual and random search strategies but also reduces the number of evaluations required for convergence. This work highlights the potential of metaheuristic-guided optimization in advancing robust and efficient plant disease diagnosis systems for agricultural applications.
Hyperparameter tuning is an essential procedure for improving the precision and effectiveness of Convolutional Neural Networks (CNNs), especially in the examination of medical images. This paper examines the optimization of CNN hyperparameters by Bayesian methods and Keras Tuner, applied to a dataset of histopathology images of gastric cancer. We modified critical hyperparameters such as the learning rate, batch size, number of filters, kernel size, dropout rate, and optimizer type to enhance the model's ability to identify malignant traits. We employed Bayesian optimization to efficiently navigate the hyperparameter space, significantly reducing the number of iterations required in comparison to exhaustive search techniques. The experimental results demonstrated that the model excelled in classification, converged more rapidly, and exhibited superior generalization. The study emphasizes the effectiveness of Bayesian optimization in CNN applications focused on medical imaging and offers a practical approach for hyperparameter tuning using Keras tools.
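Bayesian optimization of a single hyperparameter can be sketched with a small Gaussian-process surrogate and an upper-confidence-bound acquisition; the toy objective below (validation accuracy peaking at dropout 0.4) is hypothetical, and the paper used Keras Tuner rather than a hand-rolled loop like this:

```python
import numpy as np

def objective(drop):
    # Hypothetical validation accuracy as a function of dropout rate,
    # peaking at 0.4 (a stand-in for training the CNN per trial).
    return -(drop - 0.4) ** 2

def rbf(a, b, ls=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(xs, ys, grid, noise=1e-4):
    # Gaussian-process posterior mean and variance on the grid.
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(grid, xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

grid = np.linspace(0.0, 0.8, 81)   # candidate dropout rates
xs = np.array([0.0, 0.8])          # two initial trials
ys = objective(xs)
for _ in range(8):                 # Bayesian-optimization loop
    mu, var = gp_posterior(xs, ys, grid)
    ucb = mu + 2.0 * np.sqrt(var)  # upper-confidence-bound acquisition
    x_next = grid[ucb.argmax()]
    xs = np.append(xs, x_next)
    ys = np.append(ys, objective(x_next))

print(xs[ys.argmax()])  # best dropout rate found, near 0.4
```

This is why Bayesian search needs far fewer iterations than exhaustive search: the surrogate concentrates trials where the model is either promising or uncertain.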
In the era of health digitalization, early detection of pneumonia through medical image analysis is one of the main challenges in improving the quality of health services. This study aims to enhance pneumonia classification performance through Convolutional Neural Network (CNN) architecture optimization and careful hyperparameter tuning. By applying optimization techniques such as Random Search and Bayesian Optimization, and tuning key hyperparameters such as the number of convolution layers, kernel size, dropout rate, and learning rate, this research identified the optimal model configuration. Based on results involving three models (the Proposed Method, VGG16, and ResNet50), the Proposed Method shows the best overall performance. With the highest F1-score of 0.8440, an accuracy of 0.9000, and the lowest loss of 0.0977, the Proposed Method achieved an optimal balance between recall and precision. Although VGG16 has the highest recall, its low precision indicates a tendency to produce more false positives. In contrast, the Proposed Method, with the best precision of 0.7600 and superior accuracy, is the most reliable model for classifying pneumonia in this study. Experimental results show a significant increase in classification accuracy compared with conventional approaches, supporting further implementation in clinical applications. This study also provides insight into the importance of a systematic approach to designing and optimizing CNN models for disease classification tasks, especially pneumonia.
We provide an extensive study of hyperparameter optimization for CNN and Transformer models on the deepfake detection problem. We use Optuna for hyperparameter tuning with Bayesian optimization, tuning several important parameters (dropout rate, learning rate, optimizer type). Experiments are performed on a real-and-fake image dataset extracted from video clips. We evaluate the optimized models in terms of accuracy, precision, recall, F1-score, and ROC-AUC. With hyperparameters discovered by Bayesian search, the Transformer model delivers the best performance, attaining 94% accuracy and 0.94 AUC, compared with 90% accuracy and 0.89 AUC for the optimized CNN. Moreover, we observe that the Transformer has a higher recall for identifying fake videos (96% vs. 83% for the CNN), indicating its robustness in identifying deepfakes. In conclusion, the proposed framework for hyperparameter optimization can be a valuable addition to the growing body of research aimed at developing robust and effective models for deepfake detection.
This study evaluates whether targeted optimization (early data augmentation, dropout, L2 kernel regularization, and exponential learning-rate decay) improves the generalization and accuracy of a convolutional neural network (CNN) model on a three-class image dataset (450 training images of backpacks, erasers, and pens, plus 15 test images). The optimized CNN reached 100% test accuracy (15/15), outperforming the unoptimized CNN along with other traditional models tested under the same conditions (Random Forest 46.67%, Gradient Boosting 53.33%). During training, the unoptimized model showed a validation loss that bottomed out early and rose inconsistently afterwards, a clear sign of overfitting. By contrast, the optimized model maintained a stable validation curve and a significantly smaller train-validation gap. The added optimizations moderately increased the runtime of the CNN (optimized CNN: 399 seconds; unoptimized CNN: 355 seconds; Random Forest: 10 seconds; Gradient Boosting: 1946 seconds). All models failed a simple unknown-class detection test relying on confidence thresholds, mainly due to the small dataset size. These results indicate that targeted optimization can notably improve CNN test accuracy and generalization, albeit with minor runtime overhead.
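The L2 kernel regularization and exponential learning-rate decay named above amount to a penalty term added to the loss and a per-epoch multiplicative schedule; a minimal sketch with illustrative coefficients (not the study's actual values):

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """L2 kernel-regularization term added to the training loss."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def exp_decay(lr0=1e-3, rate=0.9, epoch=0):
    """Exponential learning-rate decay: lr shrinks by `rate` each epoch."""
    return lr0 * rate ** epoch

w = [np.ones((3, 3)), np.ones(3)]        # toy kernel and bias
print(round(l2_penalty(w), 6))           # 0.0012  (1e-4 * 12)
print(round(exp_decay(epoch=10), 6))     # 0.000349
```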
In this study, we used nature-inspired metaheuristic algorithms for hyperparameter optimization, a key problem in deep learning. Specifically, the Gray Wolf Optimization (GWO) and Harris Hawk Optimization (HHO) algorithms were comparatively evaluated on the MNIST dataset, which is widely used for handwritten digit classification. The main objective of the study was to achieve high classification accuracy while keeping the network structure as simple, and the computational cost as low, as possible. Critical hyperparameters such as the number of layers, number of neurons, learning rate, dropout rate, and batch size were optimized. The findings show that both algorithms achieve high accuracy, but HHO, with a test accuracy of 98.1%, surpasses GWO's 97.94%. Importantly, HHO achieved this with fewer layers, a lower epoch count, and minimal regularization. This demonstrates the advantage of HHO, especially under limited hardware resources and time constraints. In conclusion, our study highlights that the GWO and HHO algorithms provide effective solutions for hyperparameter optimization; moreover, HHO stands out for its low computational cost and high generalization ability.
This study aims to develop a transparent and reliable artificial intelligence model for pneumonia diagnosis using chest X-ray images by implementing and optimizing Convolutional Neural Networks (CNN) with Saliency Mapping. The research employed a combination of advanced optimization techniques, including aggressive data augmentation, class weight balancing, L2 regularization, dropout, batch normalization, and adaptive learning rate scheduling to address overfitting challenges. A functional prototype was then deployed in a Streamlit-based application to provide an interactive diagnostic tool. The evaluation results demonstrated that the model achieved strong performance, with high training accuracy and competitive testing accuracy, while visualization through Saliency Mapping provided meaningful interpretability by highlighting critical lung regions, particularly the mid-to-lower lung fields and hilar area. This interpretability ensured that the system not only delivered accurate predictions but also supported clinical reasoning by aligning with radiological characteristics of early-stage pneumonia and bronchopneumonia. The integration into a user-friendly application illustrates the potential for practical adoption in healthcare settings, especially in regions with limited access to radiologists. Overall, the study demonstrates that combining CNN-based classification with explainable AI techniques can bridge the gap between advanced machine learning and clinical applicability, offering a strategic pathway to improve pneumonia diagnosis and patient outcomes.
Tomato leaf diseases are a major challenge in agriculture, often leading to reduced yields and economic loss. Deep learning, particularly Convolutional Neural Networks (CNNs), has shown strong potential to automatically identify diseases from images. However, the performance of CNNs depends heavily on hyperparameters, and traditional tuning methods such as manual adjustment, grid search, and random search can be inefficient and time-consuming. In this paper, we present a CNN model optimized using the Whale Optimization Algorithm (WOA), a bio-inspired approach that mimics the humpback whale's bubble-net hunting strategy. WOA was used to tune three important hyperparameters: learning rate, dense layer units, and dropout rate. The proposed WOA-CNN was evaluated on a 10-class tomato leaf disease dataset and achieved a test precision of 97.94%, outperforming both manual tuning and conventional search strategies. These findings demonstrate the effectiveness of WOA in improving CNN performance and highlight its potential for building practical, scalable plant disease detection systems.
The five types of acne scars (rolling, ice pick, boxcar, hypertrophic, and keloidal) present significant psychological and skin-related issues. With a global acne prevalence of 9.4%, manual diagnosis by dermatologists remains subjective, labor-intensive, and inaccessible in resource-limited settings, highlighting the urgent need for an automated solution. To enhance hyperparameter tuning (learning rate, filter size, dropout), this research merges Convolutional Neural Networks (CNN) with Particle Swarm Optimization (PSO) to develop ScarNet, a refined deep-learning framework for accurate acne scar classification. To improve generalization, a newly curated dataset of 250 images covering the five scar types underwent preprocessing through Lab color space transformation and guided filtering, followed by augmentation with flipping and rotation. The ScarNet architecture incorporated convolutional layers, batch normalization, and SoftMax classification. PSO optimization significantly boosted performance, achieving 0.9670 accuracy (0.9799 validation accuracy), 0.8521 precision, and 0.8700 recall, surpassing the standalone ScarNet (0.9253) and traditional models such as VGG-16 (0.8900) and Inception-V3 (0.8800). These results demonstrate the effectiveness of integrating CNN with PSO in delivering high-quality performance, making it a potentially valuable resource for telehealth and medical applications.
No abstract available
To solve the problem that signal processing-based bearing fault diagnosis methods require rich professional knowledge and experience and cannot cope with the large amount of data, an end-to-end one-dimensional parallel multi-channel deep convolutional neural network (PMDCNN) has been proposed. The experimental results show that the network model can effectively mine the different signal characteristics of different channels, which can realize the bearing fault diagnosis with high accuracy. The performance of the model is improved by increasing the depth of the network, using batch normalization (BN), Dropout technique, and choosing the right optimizer and learning rate. To reduce the number of model parameters, a local sparse structure is employed. Firstly, the larger convolutional kernels are replaced by smaller sized ones, thus greatly reducing the redundant parameters of that convolutional layer. Secondly, the fully connected layer is substituted by a global average pooling layer to further reduce the number of parameters. The final result is that the number of parameters is only 1/66th of the previous model with similar accuracy, and the training time is reduced, which shows the optimization method of local sparse structure achieves good results. The average fault diagnosis accuracy of 99.90 ± 0.03% in Case Western Reserve University (CWRU), 99.80 ± 0.05% in comprehensive experimental platform, and 99.65 ± 0.10% in University of Ottawa Rolling-element Dataset-Vibration and Acoustic Faults under Constant Load and Speed conditions (UORED-VAFCLS) also indicates that the optimized network model is good at extracting fault features from the original fault data, reflecting the superiority of the proposed method.
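The parameter saving from replacing the fully connected layer with global average pooling, as in the local sparse structure above, can be checked with simple arithmetic; the layer sizes below are illustrative, not the paper's:

```python
# Parameter count of the classifier head: flatten + fully connected
# versus global average pooling + fully connected (illustrative sizes:
# 64 channels, an 8x8 feature map, 10 classes).
c, h, w, classes = 64, 8, 8, 10
fc_params = c * h * w * classes + classes   # weights + biases
gap_params = c * classes + classes          # GAP leaves one value per channel
print(fc_params, gap_params)  # 40970 650
```

The head shrinks by roughly the spatial size of the feature map (here 64x), which is where most of the paper's 1/66 overall reduction plausibly comes from.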
Predicting heart disease in the presence of several comorbid conditions (hypertension, diabetes, arrhythmia, and obesity) is complex due to nonlinear interactions between conditions and heterogeneous risk patterns. To address these issues, this paper proposes an Adaptive Convolutional Neural Network (CNN) architecture to predict multi-comorbid heart disease on a unified U.S. Heart Disease dataset (n = 1,025) formed from the union of the Cleveland, Hungarian, Switzerland, and VA repositories. Out of the 76 attributes, 24 clinically validated features were selected based on correlation ranking, SHAP-based interpretability analysis, and cardiologist validation. These features were converted to structured image representations through a feature-to-image encoding scheme that enabled learning of deep spatial and relational patterns. The proposed CNN architecture includes layer-driven optimization, dropout regularization, adaptive learning-rate scheduling, and reproducibility control to guarantee consistent generalization across a wide range of comorbidity groups. SHAP-based feature attribution generates clinically understandable explanations of model predictions, providing clinical interpretability. Experimental evaluation shows strong performance, with 92% accuracy, 89% precision, 90% recall, an F1-score of 0.89, an ROC-AUC of 0.94, and sensitivity and specificity high enough for a clinical decision-support setting. The proposed framework offers a scalable, interpretable, and clinically reviewed method of automated multi-comorbid cardiac risk assessment, aiding the integration of AI transparency into healthcare systems and clinical implementation.
No abstract available
Model optimization is a problem of great concern and challenge when developing an image classification model. In image classification, selecting appropriate hyperparameters can substantially boost the model's ability to learn intricate patterns and features from complex image data, and hyperparameter optimization helps prevent overfitting by finding the right balance between model complexity and generalization. An ensemble genetic algorithm and convolutional neural network (EGACNN) is proposed to enhance image classification by fine-tuning hyperparameters. The convolutional neural network (CNN) model is combined with a genetic algorithm (GA) using stacking on the Modified National Institute of Standards and Technology (MNIST) dataset to enhance efficiency and prediction rate in image classification. The GA optimizes the number of layers, kernel size, learning rates, dropout rates, and batch sizes of the CNN model to improve its accuracy and performance. The objective of this research is to improve the CNN-based image classification system by utilizing the advantages of ensemble learning and the GA. The highest accuracy, 99.91%, is obtained using the proposed EGACNN model, while an ensemble of a CNN and a spiking neural network (CSNN) shows an accuracy of 99.68%. Ensemble approaches like EGACNN and CSNN tend to be more effective than CNN, RNN, AlexNet, ResNet, and VGG models. Hyperparameter optimization of deep learning classification models reduces human effort and produces better prediction results. Performance comparison with existing approaches also shows the superior performance of the proposed model.
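A GA over CNN hyperparameters, as used by EGACNN, can be sketched as selection, crossover, and mutation over a small discrete space; the fitness function below is a stand-in for actually training and validating the model (the search space and its optimum are hypothetical):

```python
import random

random.seed(3)

# Hypothetical discrete search space; `fitness` stands in for training
# the CNN and reading off validation accuracy (best at -3, 0.3, 64).
SPACE = {
    "log_lr": [-4, -3, -2],
    "dropout": [0.2, 0.3, 0.5],
    "batch": [32, 64, 128],
}

def fitness(ind):
    return (-abs(ind["log_lr"] + 3)
            - abs(ind["dropout"] - 0.3)
            - abs(ind["batch"] - 64) / 64)

def random_ind():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene comes from either parent.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, p=0.2):
    # With probability p, resample a gene from the search space.
    return {k: random.choice(SPACE[k]) if random.random() < p else v
            for k, v in ind.items()}

pop = [random_ind() for _ in range(12)]
for _ in range(15):                       # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                     # truncation selection (elitist)
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(8)]

best = max(pop, key=fitness)
print(best)
```

Keeping the parents in the next generation (elitism) guarantees the best configuration found so far is never lost.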
In order to solve the problem of classifying household garbage accurately and efficiently, a convolutional neural network classifier is an effective method. In this study, a garbage classification device was designed, and the image dataset Wit-Garbage for garbage classification was constructed with the device by collecting garbage images under different light intensities and weather conditions. The performance of five network models (VGG16, ResNet50, DenseNet121, MobileNet V2, and Inception V3) on this dataset was compared via transfer learning. Then, the lightweight convolutional neural network MobileNet V2 was optimized by fine-tuning hyperparameters such as the type of optimizer, learning rate, Dropout parameter, and number of frozen layers, and training accuracy and efficiency were discussed in detail. Finally, the optimized MobileNet V2 model was deployed to the self-made garbage classification device for verification. The results show that the MobileNet V2 network model is superior to the other networks in training accuracy and efficiency on the proposed dataset when the image input size was 224 × 224 pixels, the Adamax optimizer was adopted, the learning rate was 0.0001, the Dropout was less than 0.5, and the number of frozen layers was less than 30. The actual verification results show that the average accuracy of the optimized network model trained on the proposed dataset for municipal solid waste (MSW) classification was up to 98.75%; compared with the model before optimization, the average accuracy was improved by 2.83% and the average detection time was reduced by 69%.
Received: 17 June 2020; Accepted: 2 January 2021. Glaucoma is a chronic ocular neurodegenerative condition characterised by optic neuropathy and visual disturbance, corresponding to optic disc cupping and degeneration of optic nerve fibres. Globally, 76 million people suffer from glaucoma, a collective name for a group of eye conditions that cause vision loss and ultimately blindness through progressive structural and functional damage to the optic nerve; it is one of the leading causes of blindness. Early detection can slow the progression of the disease and save the sight of many patients throughout the world, so detecting and identifying glaucoma in an image is important for controlling vision loss. Although there are numerous models for classifying glaucoma disease, their detection and prediction rates are low, and accurate identification is of foremost importance. To train the CNN, data augmentation and dropout were performed. A classifier was trained to identify disc fundus images of healthy and glaucomatous eyes using a feature-vector representation of each input image, integrating the results from each CNN model after removing the second fully connected layer. This manuscript proposes a convolutional neural network (CNN) with an optimization mechanism for classifying glaucoma OCT images, based on a CNN-based firefly optimization model. The proposed firefly-based CNN model performs better than state-of-the-art mechanisms.
This paper introduces a fine-tuned Convolutional Neural Network (CNN) for multiclass classification of brain tumors on contrast-enhanced T1-weighted MRI scans. The proposed model integrates batch normalization, dropout, and lightweight convolutional blocks to extract discriminative features while maintaining computational efficiency suitable for embedded deployment. Experiments were conducted on the Figshare dataset comprising 3,064 MRI slices from 233 patients with gliomas, meningiomas, or pituitary tumors. Images were preprocessed through resizing and normalization. The model was trained using the Adam optimizer with a learning rate of 1e-4, a batch size of 32, and 100 epochs. Evaluation metrics included accuracy, precision, recall, and F1-score. The fine-tuned CNN achieved an overall accuracy of 94.08%, with class-specific performance indicating strong results for pituitary tumors (precision 95.65%, recall 95.96%) and meningiomas (precision 90.20%, recall 88.81%), while glioma classification showed high sensitivity (recall 96.85%) but lower precision (75.00%). To validate real-world applicability, the model was converted to TensorFlow Lite and deployed on a Raspberry Pi 4, achieving an inference time of approximately 60 ms per image. These findings demonstrate that fine-tuned CNNs can offer a competitive and resource-efficient solution for computer-aided diagnosis of brain tumors, balancing accuracy and practicality in clinical environments with limited computational resources.
Our objective is to develop an advanced computational approach to classify patterns of Interstitial Lung Disease (ILD) using a hybrid model of a Convolutional Neural Network (CNN) and a Genetic Algorithm (GA). ILD encompasses a group of lung disorders that are often difficult to diagnose due to the complex and overlapping patterns observed in medical imaging. By leveraging the optimization capabilities of GAs, this work seeks to enhance the accuracy of ILD pattern classification. The approach finds the best hyperparameters to train the CNN classifier, which identifies the nine most essential classes of ILD patterns, including consolidation, fibrosis, emphysema, micronodules, ground-glass opacities (GGO), and healthy tissue. The learning rate, dropout rate, and number of filters are considered as hyperparameters. Our experimental results show outstanding performance, attaining an accuracy of 98.54% and an F1-score of 98.55%. This model is expected to aid healthcare professionals in making more accurate and timely diagnoses of ILD, ultimately improving patient outcomes.
Ensuring quality power to all consumers is the fundamental objective of the power systems. This necessitates continuous monitoring and accurate classification of various power quality issues. The accurate classification helps to perform accurate diagnoses, prioritize the issues, and plan mitigation strategies. Deep learning models like Convolutional Neural Networks (CNN) can accurately classify power quality issues. However, the classification accuracy of these models highly relies on the selection of activation functions and hyperparameter optimization techniques. Further, inbuilt optimization techniques like Adam, SGD, RMSprop, Adagrad, AdaDelta, Nadam, FTRL, and AdamW optimize only weights and/or learning rates and/or exponential decay rates. This necessitates a separate optimization technique for other hyperparameters like batch size, number of filters, and dropout rate. Therefore, this paper proposes a two-stage optimized CNN model for power quality classification. It implements two CNN architectures to show the impact of the number of layers on classification accuracy. The weights and exponential decay rates are optimized using an inbuilt optimization technique. The learning rate, batch size, number of filters, and dropout rates are optimized using Bayesian optimization. Further, it analyses the impact of various activation functions like Rectified Linear Unit (ReLU), Leaky ReLU, Exponential Linear Unit (ELU), Gaussian Error Linear Unit (GELU), Parametric Rectified Linear Unit (PReLU), and Tanh on classification accuracy. It selects the best combination of activation functions. Performance analysis is conducted on a standard power quality dataset containing diverse disturbance signals. It shows that the proposed model offers a classification accuracy of 99.71%.
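The activation functions compared in this study have simple closed forms; a NumPy sketch (the GELU uses the common tanh approximation):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)

def elu(x, a=1.0):
    return np.where(x > 0, x, a * (np.exp(x) - 1))

def gelu(x):
    # Widely used tanh approximation of GELU.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))  # [0. 0. 2.]
```

The variants differ mainly in how they treat negative inputs: ReLU zeroes them, Leaky ReLU and ELU pass a damped signal through, and GELU weights inputs by a smooth Gaussian-based gate, which is why the choice can shift classification accuracy.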
Cancer affecting the colorectal region (CRC) is a leading contributor to worldwide cancer-associated mortality, where timely and precise diagnosis is critical for enhancing patient prognosis. Although histopathology remains the benchmark for diagnostic accuracy, its manual assessment is time-consuming and subject to variability among pathologists. To address these challenges, this paper proposes a novel optimization framework, "CrcMRFA," based on the Manta Ray Foraging Optimization (MRFO) algorithm to fine-tune convolutional neural networks initialized with weights learned from prior training for histopathological image classification. Three architectures (VGG16, ResNet50, and DenseNet121) were optimized with respect to key hyperparameters, including learning rate, batch size, dropout rate, and the number of trainable layers. Experimental evaluation on the Kather_texture_2016_image_tiles_5000 dataset demonstrated significant performance enhancements across all metrics. The optimized ResNet50 achieved the best results, with accuracy improving from 90.32% to 95.97% and the Weighted Sum Metric (WSM) exceeding 96.77%. These findings highlight the potential of MRFO in automating CNN optimization for robust and efficient CRC tissue classification.
Dyslexia, regarded as a learning disability among children, is a neurological disorder that affects children’s cognitive capacity, resulting in difficulties with reading and writing. As early diagnosis is critical for the well-being of dyslexic children, in this study, we introduce an innovative approach that utilizes the voxel data within functional Magnetic Resonance Imaging (fMRI), specifically the axial, coronal, and sagittal planes of fMRI and transforms them into two dimensional (2D) images. Our proposed system involves adapting and optimizing the LeNet-5 architecture’s hyperparameters by employing Bayesian Optimization within a K-fold cross-validation framework. The hyperparameters under consideration include the number of filters in the three convolution layers, the volume of units within the dense layer, the dropout rate, and a customized learning rate. The accuracy score of our model is 98%, with an average precision of 97.5%, an average recall of 97.5%, and an average F1 score of 98%. Additionally, to gain insight into our model’s decision-making process, we utilize GRAD-CAM (Gradient-weighted Class Activation Mapping) to visualize the regions of the brain that contribute more significantly to the model’s decision, illustrate the activations of individual activation maps, and highlight the learned patterns.
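The K-fold cross-validation framework wrapping the Bayesian search above can be sketched as a plain index splitter (generic logic, not the study's code):

```python
def kfold_indices(n, k):
    """Yield (train, val) index lists for plain K-fold cross-validation;
    fold i takes every k-th sample starting at i as its validation set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

for train, val in kfold_indices(10, 5):
    print(val)  # each sample appears in exactly one validation fold
```

Evaluating each candidate hyperparameter setting on all k folds, rather than a single held-out split, gives the optimizer a less noisy objective at the cost of k trainings per trial.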
Alzheimer's Disease (AD) is a neurodegenerative disease characterized by deficiencies in memory and cognitive function, and it primarily afflicts aging individuals. Early detection is difficult owing to the limited sensitivity of traditional imaging methods. This paper introduces NeuroFusion, a new hybrid model of Convolutional Neural Networks (CNN) and Cuckoo Search Optimization (CSO) for enhancing the accuracy of AD detection based on MRI data. There are several well-known design issues with CNNs: they are naturally prone to overfitting, and it is difficult to find good hyperparameters. NeuroFusion tackles these problems by utilizing CSO to fine-tune hyperparameters (e.g., learning rate, kernel size, dropout rates) via bio-inspired stochastic search. State-of-the-art accuracy (96.52%) on the ADNI dataset is reported for the model, surpassing traditional CNNs (92%) and machine learning methods (e.g., SVM: 83%). The model is also highly interpretable in the clinical context through the integration of explainable AI (XAI) methods.
This study explores the use of deep learning to analyze genetic data and predict phenotypic traits associated with schizophrenia, a complex psychiatric disorder with a strong hereditary component yet incomplete genetic characterization. We applied Convolutional Neural Network models to a large-scale case-control exome sequencing dataset from the Swedish population to identify genetic patterns linked to schizophrenia. To enhance model performance and reduce overfitting, we employed advanced optimization techniques, including dropout layers, learning rate scheduling, batch normalization, and early stopping. Following systematic refinements in data preprocessing, model architecture, and hyperparameter tuning, the final model achieved an accuracy of 80%. These results demonstrate the potential of deep learning approaches to uncover intricate genotype-phenotype relationships and support their future integration into precision medicine and genetic diagnostics for psychiatric disorders such as schizophrenia.
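Among the overfitting controls listed above, early stopping is the simplest to illustrate. A minimal sketch in plain Python; the patience value and the validation-loss curve are illustrative assumptions, not values from the paper:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training would stop.

    Stops once the validation loss has not improved for `patience`
    consecutive epochs; returns the last epoch if it never triggers.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation-loss curve: improves, then plateaus.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
print(early_stopping(losses, patience=3))  # → 5
```

In practice the model weights from the best epoch (here, epoch 2) would be restored, which is what framework callbacks such as Keras's `EarlyStopping(restore_best_weights=True)` do.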
No abstract available
Phishing attacks seriously threaten information privacy and security within the Internet of Things (IoT) ecosystem. Numerous phishing attack detection solutions have been developed for IoT; however, many of these are either not optimally efficient or lack the lightweight characteristics needed for practical application. This paper proposes and optimizes a lightweight deep-learning model for phishing attack detection. Our model employs a two-fold optimization approach: first, it utilizes the analysis of variance (ANOVA) F-test to select the optimal features for phishing detection, and second, it applies the Cuckoo Search algorithm to tune the hyperparameters (learning rate and dropout rate) of the deep learning model. Additionally, our model is trained in only five epochs, making it more lightweight than other deep learning (DL) and machine learning (ML) models. The proposed model achieved a phishing detection accuracy of 91%, with a precision of 92% for the 'normal' class and 91% for the 'attack' class. Moreover, the model's recall and F1-score are 91% for both classes. We also compared our approach with traditional DL/ML models and past literature, demonstrating that our model is more accurate. This study enhances the security of sensitive information and IoT devices by offering a novel and effective approach to phishing detection.
Problems in the classification of plant diseases are often caused by difficulties in determining optimal hyperparameters, which can affect the accuracy and generalization ability of the model. This study aims to explore the influence of hyperparameter variations, especially the number of units, dropout rate, and learning rate, and compare the effectiveness of manual optimization methods with Random Search. Experiments are conducted on MobileNetV2-based CNN models using the Rice Plant Disease (RPD) dataset, which consists of 3698 images of rice leaf diseases classified into 10 categories, including fungal, bacterial, and viral infections. Random Search is applied to find optimal combinations. The results showed that settings with low dropout rates and small learning rates, such as 0.001, resulted in higher accuracy. Random Search shows superior performance with an accuracy of 97.84% and significantly reduces validation losses, especially with longer training durations. These findings underscore the importance of the proper selection of hyperparameters and the effectiveness of Random Search in finding the optimal configuration for the CNN model.
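The random-search procedure applied above can be sketched as follows. The search-space values and the stand-in scoring function are illustrative assumptions, not the paper's actual training loop; in a real experiment `score_fn` would train and validate a model:

```python
import random

# Illustrative search space over the three hyperparameters the study varies.
SPACE = {
    "units": [64, 128, 256],
    "dropout": [0.2, 0.3, 0.5],
    "lr": [0.01, 0.001, 0.0001],
}

def sample_config(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def random_search(score_fn, n_trials=20, seed=0):
    """Try n_trials random configurations, keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        s = score_fn(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Stand-in for validation accuracy, loosely echoing the abstract's finding
# that a small learning rate (0.001) and a low dropout rate do best.
def toy_score(cfg):
    return -abs(cfg["lr"] - 0.001) * 100 - cfg["dropout"]

best, _ = random_search(toy_score, n_trials=50)
print(best["lr"], best["dropout"])
```

The practical appeal, as the abstract notes, is that random sampling covers a large space cheaply without the exhaustive cost of a full grid.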
X-ray radiography, as a form of medical imaging, forms the basis for the diagnosis of musculoskeletal disorders. However, manual interpretation by radiologists is not only time-consuming but also inconsistent, making automated deep learning-based classification essential. This study applies Convolutional Neural Networks (CNNs) to classify musculoskeletal radiographs from the MURA dataset, which contains 5,107 X-ray images. To evaluate and validate the performance of the four deep learning models VGG19, Xception, MobileNetV2, and ConvNeXtBase, the following classification performance measures were used: Recall, Precision, and F1-score. Among these models, MobileNetV2 turned out to be the best baseline with an F1-score of 0.92. Hyperparameter optimization of the learning rate, dropout, and L2 regularization increased the score to 0.94. The preprocessing techniques included auto-orientation, resizing, and data augmentation (consisting of blurring, rotation, shearing, and adding noise), all of which contributed to the model's generalization ability. To strengthen clinical validity, the interpretability methods Local Interpretable Model-Agnostic Explanations (LIME), Saliency Maps, and Gradient-weighted Class Activation Mapping (Grad-CAM) were combined to facilitate the visual representation of the model's focus on clinically relevant anatomical structures. These techniques corroborated the model's attention to the joint and bone areas, in line with radiological expertise. The fine-tuned MobileNetV2 model, together with interpretability, offers high accuracy and computational efficiency, making it clinically appropriate for supporting radiologists. This method improves diagnostic accuracy, decreases workload, and advances AI-based medical image analysis for the identification of musculoskeletal disorders.
Facial emotion recognition (FER) is a challenging task in computer vision with wide applications in areas such as human-computer interaction, security, and healthcare. To improve the performance of convolutional neural networks (CNN) in FER, a novel approach combining CNN with grey wolf optimization (GWO) was proposed to optimize key hyperparameters. The CNN-GWO model was fine-tuned by adjusting hyperparameters such as the number of convolutional layers, kernel size, number of filters, and learning rate. This model was evaluated using the CK+ dataset and achieved an accuracy of 90.97%, demonstrating its competitive performance compared to existing methods. The optimized hyperparameters included three convolutional layers, 35 filters, a kernel size of 5, a learning rate of 0.045990, a dropout rate of 0.4988, and a max pooling size of 3. These results confirm that GWO is effective in optimizing CNN for FER tasks, providing an efficient solution to enhance model accuracy. This approach shows promising potential for future FER applications, highlighting GWO as a valuable optimization technique for CNN architectures.
Sentiment analysis technology based on deep learning plays a crucial role in reducing the high costs and low efficiency associated with manual annotation. However, traditional basic neural network structures are relatively simple and struggle with fine-grained sentiment analysis. In this paper, we introduce a six-category emotion framework for practical sentiment analysis and enhance a simple convolutional neural network (CNN) by incorporating multiple convolutional layers. Additionally, in terms of hyperparameter tuning for performance optimization, this paper first applies the control variable method to identify individual optimal parameters such as filter size and dropout rate. The random search and Bayesian optimization methods are then applied to find the best hyperparameter combinations. Comparisons of accuracy and efficiency reveal that the relatively simpler mechanism of random search remains a practical choice for hyperparameter optimization, balancing overall accuracy and efficiency.
Convolutional Neural Networks (CNNs) require careful hyperparameter tuning, which significantly impacts model performance. Manual tuning is time-consuming and often suboptimal. This paper proposes an automated approach for CNN hyperparameter optimization using the Lion Optimization Algorithm (LOA), a bio-inspired metaheuristic. LOA optimizes various CNN hyperparameters, including the number of convolutional layers, filter sizes, stride, padding, activation functions, dropout rate, learning rate, pooling type, fully connected layers, and batch size. The approach is validated on the MNIST handwritten digit dataset and a COVID-19 chest X-ray dataset. Experimental results show that LOA achieves 99.5% accuracy on MNIST and 88.44% on the COVID-19 dataset. Moreover, LOA demonstrates superior convergence speed and performance compared to existing state-of-the-art methods.
Tuberculosis (TB) is an infectious illness that continues to be a major global health challenge due to its high rates of disease and death. Early detection of TB using chest X-ray images often faces challenges related to subjective interpretation by radiologists and limited sensitivity and specificity. This study develops a Convolutional Neural Network (CNN) model to classify chest X-ray images into Normal and Tuberculosis classes, using a total of 2,198 chest X-ray images consisting of 1,173 Normal and 1,025 Tuberculosis samples. Hyperparameter optimization was carried out using the Hyperband algorithm implemented in the Keras Tuner framework to obtain the best parameter combination that produced optimal model performance. The main hyperparameters tuned included the number of dense layers, the number of units per layer, dropout rate, and learning rate. The optimization process yielded the best configuration consisting of two dense layers with 160 and 64 units, a dropout rate of 0.3, and a learning rate of 0.0011. The optimization process increased the model’s accuracy from 0.84 to 0.88 and reduced the validation loss from 0.44 to 0.34, indicating a more stable and effective learning outcome after optimization using Hyperband. The application of Hyperband successfully enhanced learning stability, accelerated convergence, and improved overall model performance. These results indicate that hyperparameter optimization using Hyperband not only enhances CNN-based TB classification accuracy but also strengthens its potential clinical utility by supporting more consistent, rapid, and objective early diagnosis in real-world healthcare settings.
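Hyperband, used above via Keras Tuner, is built around successive halving: allocate a small budget to many configurations, keep the top fraction, and retrain the survivors with a larger budget. A minimal sketch of that core loop; the toy evaluation function and grid are hypothetical stand-ins for actual model training, biased toward the configuration the abstract reports as best (dropout 0.3, learning rate 0.0011):

```python
def successive_halving(configs, train_eval, budget=1, eta=3):
    """Core loop of Hyperband: evaluate all configs at the current
    budget, keep the top 1/eta, multiply the budget by eta, repeat.

    `train_eval(cfg, budget)` returns a validation score (higher is better).
    """
    rung = list(configs)
    while len(rung) > 1:
        scored = sorted(rung, key=lambda c: train_eval(c, budget), reverse=True)
        rung = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return rung[0]

# Hypothetical stand-in: score a (dropout, lr) pair by its closeness
# to the best configuration reported in the abstract.
def toy_eval(cfg, budget):
    d, lr = cfg
    return -abs(d - 0.3) - abs(lr - 0.0011) * 100

grid = [(d, lr) for d in (0.2, 0.3, 0.5) for lr in (0.01, 0.0011, 0.0001)]
print(successive_halving(grid, toy_eval))  # → (0.3, 0.0011)
```

Real Hyperband runs several such brackets with different initial budgets; this sketch shows only one bracket, which is where most of the speed-up over plain grid search comes from.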
Early detection is crucial for improving patients' chances of recovery, and using artificial intelligence (AI) in medical image analysis has become a potential solution to enhance diagnostic accuracy. Convolutional Neural Network (CNN) models have been widely used in breast tumor classification; however, the main challenges in their implementation are the risk of overfitting and prediction uncertainty. This study aims to analyse the impact of Dropout and Variational Inference (VI) techniques on the performance of CNN models in breast tumor classification. This study applied an AlexNet-based CNN model to two primary datasets: the Breast Cancer Histopathological Dataset and the Breast Ultrasound Dataset. The results show that the model without dropout achieved the highest accuracy (90.38%) and F1-score (0.90). In contrast, the model with a high dropout rate experienced a drop in accuracy to 73.55%, indicating that excessive dropout can hinder model learning. This study concludes that an optimal CNN architecture, combined with appropriate dropout selection and optimizer choice, can improve breast tumor classification performance. Additionally, Variational Inference provides extra benefits in handling prediction uncertainty, which can enhance the model's reliability in medical applications.
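A common dropout-based approximation to the variational inference mentioned above is Monte Carlo dropout: keep dropout active at prediction time and average several stochastic forward passes, using the spread as an uncertainty estimate. This is a related technique, not necessarily the paper's exact method; a minimal sketch for a single linear layer with illustrative weights:

```python
import random

def dropout_forward(x, weights, p, rng):
    """One stochastic forward pass of a linear layer with inverted
    dropout left active at prediction time (the MC dropout trick)."""
    kept = [xi * (1 / (1 - p)) if rng.random() > p else 0.0 for xi in x]
    return sum(w * k for w, k in zip(weights, kept))

def mc_dropout_predict(x, weights, p=0.3, passes=200, seed=0):
    """Average many stochastic passes; the variance across passes
    serves as a simple predictive-uncertainty estimate."""
    rng = random.Random(seed)
    outs = [dropout_forward(x, weights, p, rng) for _ in range(passes)]
    mean = sum(outs) / passes
    var = sum((o - mean) ** 2 for o in outs) / passes
    return mean, var

# Illustrative input and weights; the deterministic output would be 0.4.
mean, var = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], p=0.3)
print(round(mean, 2), var > 0)
```

In a clinical setting, a prediction whose variance across passes is large would be flagged for human review rather than trusted outright.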
This paper presents a systematic evaluation of deep learning architectures for facial expression recognition, focusing on improving recognition accuracy through advanced CNN models. This paper investigates three different architectures: Conv2D with Max Pooling (M1), Conv2D with Max Pooling & Dropout (M2), and EfficientNet-B0 (M3), and examines their effectiveness in recognizing eight different facial expressions (Anger, Content, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise). The experimental framework uses the Tsinghua facial expression database, which has a baseline recognition rate of 79.08% by human evaluators. The study yields several significant findings through rigorous comparative analysis using standardized metrics, such as accuracy measurements and confusion matrices. The EfficientNet-B0 model achieves superior performance with an average accuracy of 86.47%, while Conv2D with Max Pooling demonstrates robust performance at 81.68%, both exceeding the accuracy of human evaluators. Notably, the Conv2D with Max Pooling & Dropout model shows reduced effectiveness at 73.25%. Heat map analysis reveals specific recognition patterns: happiness achieves the highest recognition rate (96%), while sadness shows the lowest (63%). The study provides three main contributions: (1) empirical evidence for the superiority of EfficientNet-B0 for facial expression recognition, (2) comprehensive error pattern analysis through heat map visualization, and (3) practical insights into the limitations of dropout layers in expression recognition tasks. These findings advance the technical understanding of CNN architectures in emotion recognition systems and provide practical guidelines for implementing efficient facial expression recognition systems in real-world applications.
The demand for efficient and accurate fruit quality monitoring systems, particularly for bananas, continues to grow. This research presents a lightweight Convolutional Neural Network model derived from the MobileNetV2 architecture, implemented in the Ba-Nanas! mobile application to classify banana ripeness levels based on images captured using a smartphone camera. The primary dataset consists of 4,000 images representing three Indonesian banana varieties (Ambon, Kepok, and Susu), divided into 16 maturity level classes. This study evaluates four hyperparameter configurations (batch size, learning rate, dropout, and the number of frozen layers). Experimental results show that the optimal hyperparameter configuration during training was a batch size of 32, a learning rate of 0.0001, a dropout rate of 0.4, and a freeze layer of -4, achieving a validation accuracy of 88.17% and the lowest validation loss of 0.3939. In the testing phase, the best performance was obtained using a batch size of 32, a learning rate of 0.0001, a dropout rate of 0.5, and a freeze layer of -10, resulting in an accuracy of 85.67%, precision of 86.09%, recall of 86.25%, and F1-score of 85.87%. The proposed system demonstrates high precision, recall, and F1-score values and fast inference time on mobile devices. This study demonstrates the potential of the MobileNetV2 model in real-time AI-based agricultural applications, particularly for fruit ripeness classification at the farmer or end-consumer level.
This paper investigates the use of Convolutional Neural Networks (CNNs) for plant disease diagnosis, utilizing the notion of transfer learning to develop a deep CNN network at a low cost. The dataset utilized includes 30,000 pictures from 18 distinct categories. Several architectural models, including ResNet50, ResNet50V2, VGG16, VGG19, EfficientNetB0, MobileNet, MobileNetV2, NASNetMobile, and DenseNet, were created using transfer learning methodologies. The models are trained on the target plant disease dataset while keeping the features learnt from large-scale image datasets like ImageNet. Performance criteria such as accuracy, precision, and recall are used to assess the models' accuracy in detecting various plant diseases. Furthermore, the research investigates the effect of dropout on accuracy and other model parameters in the context of transfer learning. Finally, the outcomes of a tailored model with and without batch normalization are compared.
This study investigates the impact of two crucial hyperparameters, learning rate and dropout rate, on the performance of Convolutional Neural Networks (CNNs) on the CIFAR-10 dataset. Learning rate controls how quickly a model adapts to new data, while dropout rate helps prevent overfitting by randomly omitting neurons during training. These parameters are critical because they directly influence model convergence speed, accuracy, and generalization ability. By experimenting with various combinations of learning rates (0.01, 0.001, 0.0001) and dropout rates (0.3, 0.5, 0.7), we analyze the model's performance in terms of accuracy and overfitting. Our results suggest that a learning rate of 0.001 combined with a dropout rate of 0.5 yields the best balance between learning efficiency and generalization. This study offers important insights for optimizing hyperparameters in CNN training.
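The dropout mechanism described above, randomly omitting neurons during training, is usually implemented as inverted dropout: survivors are scaled by 1/(1 - rate) so the expected activation is unchanged and no rescaling is needed at test time. A minimal sketch with illustrative activations:

```python
import random

def inverted_dropout(activations, rate, rng=random):
    """Zero each activation with probability `rate`; scale the
    survivors by 1/(1 - rate) to preserve the expected value."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

random.seed(0)
layer = [0.8, 0.1, 0.5, 0.9, 0.3]
print(inverted_dropout(layer, rate=0.5))  # roughly half the units zeroed
```

With `rate=0.5` (the value the study found best), each surviving activation is doubled; at inference the layer is simply left untouched.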
Obtaining a lightweight ensemble model through clearly explained, effective ensemble-member selection and representing the data in various valuable forms are major challenges in medical image classification tasks. Although numerous deep learning (DL) models have been developed to address the generalization problem in DL through ensemble learning, most lack an evaluation of the effect of each ensemble member on the final ensemble model. Additionally, existing ensemble models tend to include huge numbers of parameters and use images only in RGB or greyscale form, missing crucial representations of the dataset needed to reach robust classification outcomes, particularly in medical applications. In this study, the novel ensemble model IMed-CNN addresses these gaps by introducing the data to the model in ten different forms and applying systematic model dropout (SMDE) with unique true prediction (UTP) analysis, which ensures that only useful ensemble members are chosen. To verify the performance of IMed-CNN, extensive experiments were designed for testing. The results illustrate that IMed-CNN outperformed baseline models.
This paper focuses on the classification of dermoscopic images to identify whether a skin lesion is benign or malignant. Dermoscopic images provide deep insight for the analysis of any type of skin lesion. Initially, a custom Convolutional Neural Network (CNN) model is developed to classify the images for lesion identification. This model is trained across different train-test splits, and a 30% split of the training data is found to produce better accuracy. To further improve the classification accuracy, a Batch Normalized Convolutional Neural Network (BN-CNN) is proposed. The proposed solution consists of 6 layers of convolutional blocks with batch normalization, followed by a fully connected layer that performs binary classification. The custom CNN model is similar to the proposed model, except for the absence of batch normalization and the presence of dropout at the fully connected layer. Experimental results show that the proposed model achieved a better accuracy of 89.30%. Final work includes analysis of the proposed model to identify the best tuning parameters.
In recent years, artificial intelligence (AI) has become an automated tool for detecting cardiovascular diseases using ECG images. Activation functions are the core of neural network models, ranging from shallow to deep convolutional neural networks (CNN). In ECG image‐based cardiovascular disease detection, activation functions enable the network to capture non‐linear patterns like irregular heartbeats and subtle anomalies. The proposed CNN architecture in this paper comprised convolutional layers for feature extraction, followed by custom activation functions to introduce non‐linearity and enhanced learning. These features are downsampled using max pooling and aggregated through global average pooling. Fully connected layers, with a suitable dropout regularization, map the features to the final classification output, which is probabilistically determined using a softmax activation function. This paper used a public dataset of ECG images of cardiac patients to analyze the significance of activation functions in predicting the four main cardiac abnormalities: irregular heartbeat, myocardial infarction, history of myocardial infarction, and normal person classes. We have analyzed 19 different activation functions for their detection performance on the same dataset. The detection performance is compared with the existing state‐of‐the‐art studies. A set of activation functions is suggested for robust and accurate detection of cardiovascular disease using ECG images.
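The final-layer pipeline described above, nonlinear activations followed by a softmax over the four cardiac classes, can be sketched in plain Python. The logits below are hypothetical, and these are only three of the nineteen activation functions the paper benchmarks:

```python
import math

# A few of the standard activations commonly included in such comparisons.
def relu(x):    return max(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
def tanh(x):    return math.tanh(x)

def softmax(logits):
    """Final-layer softmax: exponentiate (shifted by the max for
    numerical stability) and normalize so the class probabilities
    sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the four cardiac classes in the abstract.
probs = softmax([2.0, 1.0, 0.5, 0.1])
print(round(sum(probs), 6))  # → 1.0
```

Swapping the hidden-layer activation (e.g. `relu` for `sigmoid`) changes how non-linear ECG patterns are captured, which is precisely the axis the study varies; the softmax output layer stays fixed.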
Deep learning finds use in many fields, including security, marketing, and healthcare, and gender classification has drawn a lot of interest in all of them. This paper uses an open-source Kaggle dataset to present a convolutional neural network model for gender classification. There are 2250 pictures in all, equally divided between men and women. Images were preprocessed to a constant size of 126x126 pixels so they could be fed into the model. The CNN architecture consists of three convolutional layers with max-pooling layers, dropout layers for regularization, and fully connected layers for classification. The model was trained using eighty percent of the data; twenty percent was used for validation. The results show that the model obtains an accuracy of 79%, with balanced performance across both classes. Though the validation loss varies somewhat, the model performs gender classification well, which makes it an interesting tool for practical uses. Further progress can be made by tuning the model for improved generalization and performance.
The hashtag #kaburajadulu has recently gained significant traction in Indonesia’s digital discourse, reflecting a growing public sentiment of disillusionment with domestic conditions and aspirations to pursue better opportunities abroad. This phenomenon, amplified through social media platforms, provides valuable insight into broader societal concerns, including economic hardship, employment challenges, access to education, public services, and declining institutional trust. To investigate these sentiments, this study analyzes user responses in the comment section of a BBC News Indonesia YouTube video using a Convolutional Neural Network (CNN). Given the substantial class imbalance in the dataset, particularly within the neutral sentiment category, this study incorporates class weighting and dual dropout layers, along with the application of the Synthetic Minority Oversampling Technique (SMOTE), to improve model generalization and mitigate overfitting. The performance of the proposed model is compared with previous research utilizing a CNN-LSTM architecture, which achieved an accuracy of 79.91%. Experimental results indicate that the CNN model integrated with SMOTE achieved a notable improvement in classification performance, reaching an accuracy of 82.80%. Precision, recall, and F1-score metrics also showed consistent enhancement compared to the baseline CNN model without SMOTE. These findings suggest that SMOTE-based oversampling is effective in addressing class imbalance and significantly improves the performance of deep learning models in sentiment analysis tasks.
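The two imbalance remedies combined above, class weighting and SMOTE, reduce to a weight formula and an interpolation step. A minimal sketch; the toy labels and feature vectors are illustrative, and a full SMOTE implementation would also search for k nearest minority neighbors:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights, w_c = N / (K * n_c), the
    scheme typically passed to a framework's `class_weight` option."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * nc) for c, nc in counts.items()}

def smote_sample(x, neighbor, lam):
    """One SMOTE synthetic point: interpolate a minority sample
    toward one of its minority-class neighbors (0 <= lam <= 1)."""
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

# Imbalanced toy labels: many negative, few neutral comments.
labels = ["neg"] * 6 + ["pos"] * 3 + ["neu"] * 1
w = class_weights(labels)
print(w["neu"] > w["neg"])                           # → True
print(smote_sample([0.0, 0.0], [1.0, 2.0], 0.5))     # → [0.5, 1.0]
```

The rare "neu" class gets the largest weight, so its misclassifications cost more in the loss, while SMOTE adds synthetic neutral samples so the classifier actually sees enough of them, which matches the accuracy gain the study reports.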
This paper endeavours to distinguish between Burmese areca nut and Assamese areca nut based on their cross-sectional texture and pattern. Employing advanced machine learning techniques, specifically Convolutional Neural Networks (CNNs), we constructed a model capable of accurately classifying these two variants using images of their cross-sections. The paper explores two distinct datasets: one comprising unedited images and the other generated by applying edge detection algorithms to these images. A custom Python script utilizing OpenCV libraries was developed to create edge maps, providing enhanced feature representations for the CNN model. The model architecture involves convolutional layers, max-pooling layers, and dropout layers for regularization. We conducted extensive experimentation with varied batch sizes and epochs to optimize the model's performance. Our results reveal that the model achieved high accuracy in distinguishing between the two areca nut variants, demonstrating the potential for machine learning to contribute to quality control and identification in this domain. This paper not only presents a successful classification model but also sheds light on the efficacy of edge maps as input data for image-based classification tasks.
Management of crop health and guarantees of food security depend on accurate detection of maize leaf diseases. Conventional manual inspection techniques are highly laborious and prone to errors. This work uses a convolutional neural network (CNN) to automatically increase disease detection precision for maize leaves. The CNN model was trained on a photo dataset classified into four categories: common rust, blight, grey leaf spot, and healthy leaves, with images scaled to 256 × 256 pixels and augmented through rotation and scaling to boost generalization. Built with convolutional, dropout, pooling, and fully connected layers, the model was trained with a batch size of 32 using categorical cross-entropy loss and the Adam optimiser over 15 epochs. With an overall accuracy of 89%, evaluation metrics comprising recall, precision, and F1-score point to excellent performance. The model excels at spotting healthy leaves (precision: 1.00, recall: 0.99) and common rust (precision: 1.00, recall: 0.99), while blight (precision: 0.84, recall: 0.87) and grey leaf spot (precision: 0.73, recall: 0.71) show somewhat lower performance. By providing reliable disease diagnosis to support crop management and enhance food security, the CNN model has great practical relevance in agriculture.
The rising incidence and death rate of breast cancer make it a serious issue. To tackle this problem, this study applied Convolutional Neural Networks (CNNs) for breast cancer detection using mammography images. CNNs can automatically learn hierarchical feature representations directly from raw image data. This study evaluates the performance of CNN models in predictive tasks, focusing on the impact of various parameters such as dropout rate, the number of convolutional layers, batch size, and data augmentation techniques. This study uses the CBIS-DDSM (Curated Breast Imaging Subset of the Digital Database for Screening Mammography) mammography image dataset. The experimental results highlight the critical role of parameter selection in optimizing model performance.
Leukemia, a severe hematological malignancy, presents significant diagnostic challenges due to the subtle morphological differences among blood cell subtypes. This research proposes a convolutional neural network (CNN)-based framework for the automated classification of blood cells into four categories: benign, malignant pre-B, malignant pro-B, and malignant early pre-B. Utilizing a carefully curated dataset of 3,242 microscopic blood cell images, the model architecture incorporates convolutional layers, batch normalization, max pooling, and dropout layers to enhance feature extraction and reduce overfitting. The model was trained over 30 epochs with a batch size of 32, employing real-time data augmentation and dynamic learning rate scheduling. Experimental evaluation demonstrated outstanding performance, achieving an overall accuracy of 97.9%, with class-specific precision, recall, and F1-scores consistently exceeding 0.94. Accuracy and loss plots confirmed the model's stable convergence, while the confusion matrix revealed minimal misclassification across classes. This study contributes to the advancement of automated hematological diagnostics, demonstrating the potential of deep learning to support early and accurate leukemia detection. The findings suggest promising applications in clinical workflows, enhancing diagnostic precision, reducing manual workload, and improving patient outcomes. Future work may explore larger datasets, alternative model architectures, and real-world deployment strategies to further strengthen diagnostic capabilities.
This study presents a systematic comparison and implementation of a Convolutional Neural Network (CNN) for Facial Emotion Recognition (FER) across multiple public datasets, namely FER-2013, FER+, RAF-DB, and AffectNet, unlike previous studies that focused on a single dataset or on differing model architectures. The main contributions of this research consist of three aspects. First, a five-layer integrated CNN architecture is used to enable fair cross-dataset evaluation within a consistent training and testing framework. Second, structured hyperparameter tuning is performed, including variations in learning rate, batch size, filter configuration, and dropout rate, resulting in a stable and reproducible model configuration. Third, an in-depth analysis was conducted to explore the impact of annotation quality and dataset complexity on model performance. The experimental results show that FER+ achieved the highest accuracy and weighted F1 score thanks to better label consistency, followed by RAF-DB, while FER-2013 and AffectNet experienced a decline in performance due to label noise and higher pose and lighting variations. Further confusion matrix analysis shows that happy and neutral expressions are classified more reliably, while negative emotions such as anger, fear, and disgust remain challenging. To validate practical application, the best-performing model was implemented in a webcam-based facial expression recognition prototype using Python and OpenCV, demonstrating reliable frame-level emotion inference under controlled real-time conditions.
Computer-aided Invasive Ductal Carcinoma (IDC) grading classification systems based on deep learning have shown that deep learning may achieve reliable accuracy in IDC grade classification using histopathology images. However, there is a dearth of comprehensive performance comparisons of Convolutional Neural Network (CNN) designs on IDC in the literature. As such, we conduct a comparative analysis of the performance of seven selected CNN models: EfficientNetB0, EfficientNetV2B0, EfficientNetV2B0-21k, ResNetV1-50, ResNetV2-50, MobileNetV1, and MobileNetV2 with transfer learning. To implement each pre-trained CNN architecture, we deployed the corresponding feature vector available from TensorFlow Hub, integrating it with dropout and dense layers to form a complete CNN model. Our findings indicated that EfficientNetV2B0-21k (0.72B floating-point operations and 7.1 M parameters) outperformed the other CNN models in the IDC grading task. Nevertheless, we discovered that practically all selected CNN models perform well in the IDC grading task, with an average balanced accuracy of 0.936 ± 0.0189 on the cross-validation set and 0.9308 ± 0.0211 on the test set.
Intelligent handwritten recognition using hybrid CNN architectures based-SVM classifier with dropout
Text recognition in Arabic handwritten scripts is an active research field. These recognition systems face numerous challenges, including enormous open databases, infinite variation in people's handwriting, and freestyle writing. In this manuscript, the authors model a deep learning architecture that can efficiently be utilized to recognize Arabic handwritten scripts. This work explores a new model for both single-font and multi-font types, concentrating on two common classifiers: Support Vector Machine (SVM) and Convolutional Neural Network (CNN). Furthermore, the authors protect the proposed model against over-fitting through the strong performance of the dropout technique. Both classification and feature extraction are done automatically. In light of an analysis of the error backpropagation method, the authors also propose an innovative deep neural network training rule for maximum-interval minimum classification error. Meanwhile, max-margin minimum classification error (M3CE) and cross entropy are analyzed and hybridized to obtain better outcomes. The authors tested the proposed model on the AHDB, AHCD, HACDB, and IFN/ENIT databases. The proposed model's performance is compared with text recognition accuracies from state-of-the-art Arabic text recognition systems, and it delivers favorable results.
We proposed a convolutional neural network (CNN)-based surrogate model to predict the nonlocal response for flexoelectric structures with complex topologies. The input for the CNN, i.e. binary images, is obtained by converting geometries into pixels, while the output comes from simulations of an isogeometric (IGA) flexoelectric model, which in turn exploits the higher-order continuity of the underlying non-uniform rational B-splines (NURBS) basis functions for fast computation of flexoelectric parameters, e.g., electric gradient, mechanical displacement, strain, and strain gradient. To generate the dataset of porous flexoelectric cantilevers, we developed a NURBS trimming technique based on the IGA model. As for CNN construction, the key factors were optimized based on the IGA dataset, including activation functions, dropout layers, and optimizers. Then cross-validation was conducted to test the CNN's generalization ability. Last but not least, the potential of the CNN performance has been explored under different model output sizes, and the corresponding possible optimal model layout is proposed. The results can be instructive for studies on deep learning of other nonlocal mechano-physical simulations.
For many unmanned aerial vehicle (UAV)-based applications, especially those that need to operate with resource-limited edge networked devices in real-time, it is crucial to have a lightweight computing model for data processing and analysis. In this study, we focus on UAV-based forest fire imagery detection using a lightweight convolutional neural network (CNN). The task is challenging owing to complex image backgrounds and insufficient training samples. Specifically, we enhance the MobileNetV2 model with an attention mechanism for UAV-based image classification. The proposed model first employs a transfer learning strategy that leverages the pre-trained weights from ImageNet to expedite learning. Then, the model incorporates randomly initialised weights and dropout mechanisms to mitigate over-fitting during training. In addition, an ensemble framework with a majority voting scheme is adopted to improve the classification performance. A case study on forest fire scene classification with benchmark and real-world images is demonstrated. The results on a publicly available UAV-based image data set reveal the competitiveness of our proposed model as compared with those from existing methods. In addition, based on a set of self-collected images with complex backgrounds, the proposed model illustrates its generalisation capability to undertake forest fire classification tasks with aerial images.
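The majority-voting ensemble step can be sketched in a few lines of NumPy; the three-model toy predictions below are illustrative, not taken from the paper:

```python
import numpy as np

def majority_vote(member_preds):
    """member_preds: (n_models, n_samples) array of class labels.
    Returns the most frequent label per sample (ties resolve to
    the lowest label, since argmax returns the first maximum)."""
    member_preds = np.asarray(member_preds)
    n_classes = int(member_preds.max()) + 1
    # votes[c, s] = how many models predicted class c for sample s
    votes = np.stack([(member_preds == c).sum(axis=0)
                      for c in range(n_classes)])
    return votes.argmax(axis=0)

preds = [[1, 0, 1],   # model A
         [1, 1, 0],   # model B
         [0, 1, 1]]   # model C
print(majority_vote(preds))  # [1 1 1]
```

Each sample's label flips only if a majority of ensemble members agree, which is what damps the variance of any single over-fitted member.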
While threats in cyberspace are in a state of constant evolution, the use of AI in cyber defense presents both numerous opportunities and dangers. This paper evaluates adversarial robustness for deep learning networks in network security applications by introducing a novel one-dimensional CNN model for malicious traffic detection. We conducted rigorous end-to-end processing and analysis of network traffic data, using a balanced dataset of 200,000 connections (46.52% benign, 53.48% malicious). Our model architecture includes three convolutional blocks (32, 64, and 128 filters, respectively) with batch normalization and dropout mechanisms (rates 0.3 and 0.2, respectively). We use standardized feature scaling, label encoding for categorical features, and stratified sampling to maintain class distribution integrity. Our proposed approach achieved remarkable performance metrics compared to standard approaches, with a 95% AUC-ROC result (15% better than baseline CNN models) and a malicious-traffic detection rate of 99.99% (compared to 98.5% with standard architectures). The model demonstrates better robustness with only 10 false negatives out of 107,895 malicious samples, a 67% enhancement compared to current state-of-the-art systems. Training dynamics show great stability with minimal overfitting (validation-training loss difference of only 0.01), indicating good generalization ability.
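The dropout rates quoted above (0.3 and 0.2 in successive blocks) are usually realized as inverted dropout, which rescales surviving activations at training time so the expected activation is unchanged and inference needs no rescaling. A NumPy sketch of that mechanism (not the authors' code):

```python
import numpy as np

def dropout(x, rate, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of activations and
    rescale survivors by 1/(1 - rate), so E[output] == E[input]."""
    if not training or rate == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for reproducibility
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

x = np.ones((4, 8))
h1 = dropout(x, rate=0.3)   # e.g. after the first conv block
h2 = dropout(h1, rate=0.2)  # e.g. after the second conv block

# At inference time the layer is the identity:
assert np.array_equal(dropout(x, 0.3, training=False), x)
```

Because the rescaling happens during training, the trained weights can be used directly at test time with dropout simply switched off.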
Lung cancer remains a primary cause of cancer-related deaths globally, and early diagnosis is crucial to improve patient outcomes. Whole-body computed tomography (CT) imaging holds promise as a diagnostic tool for the early detection of lung cancer. Still, malignancies' highly variable and spatially complex properties require refined image analysis techniques. Although convolutional neural networks (CNNs) have become powerful tools for image analysis, performance relies heavily on network architecture design. This project focuses on CNN architecture optimization for very early-stage lung cancer detection from CT scan images. We train and evaluate different CNN architectures on a large dataset of lung CT images, adding dropout and experimenting with network designs and hyperparameters such as layer sizes, filter sizes, and activation functions (ReLU or PReLU). We use data augmentation to make the models more general. Once the best CNN model is identified, we assess its capability to identify lung cancer images earlier than other state-of-the-art methods, presenting our results as a comparison on a separate testing dataset. The project can potentially improve diagnostic tests for early lung cancer detection, thus improving patient survival.
This study aims to develop an efficient and accurate deep learning-based model for the classification of plant leaf diseases using Convolutional Neural Networks (CNN). The objective is to automate disease detection in agricultural crops to assist farmers and agricultural experts in early and reliable diagnosis. The model is trained on the publicly available “Plant Village CLAHE Processed Data” dataset, which includes high-resolution RGB images of healthy and diseased plant leaves. Images are resized to 128×128, normalized, and split into training, validation, and test sets. Data augmentation techniques such as flipping, zooming, and rotation are used to improve generalization. A custom CNN architecture comprising convolutional, pooling, dense, and dropout layers is employed and trained using the Adam optimizer. Exploratory Data Analysis (EDA) ensures data quality and balance. The model achieves impressive results, with 93% test accuracy, 91% precision, 93% recall, and an F1-score of 92%, indicating robust performance in identifying diverse plant diseases. Training accuracy reached 94.64% with a validation accuracy of 92.95%, confirming minimal overfitting. These results validate the model’s reliability for practical use in smart farming solutions, especially in mobile or IoT-based applications for real-time disease monitoring and precision agriculture.
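The preprocessing pipeline described (resize to 128×128, normalize to [0, 1], flip-based augmentation) can be sketched in NumPy; nearest-neighbour index resizing stands in for whatever interpolation the paper actually used, and the random image is purely illustrative:

```python
import numpy as np

def preprocess(img):
    """Resize to 128x128 via nearest-neighbour indexing,
    then scale uint8 pixel values into [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(128) * h // 128
    cols = np.arange(128) * w // 128
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

def augment(img, rng):
    """Random horizontal/vertical flips, one of the
    augmentation steps mentioned in the abstract."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]  # vertical flip
    return img

rng = np.random.default_rng(42)
raw = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
x = augment(preprocess(raw), rng)
print(x.shape)  # (128, 128, 3)
```

Zoom and rotation augmentations would follow the same pattern: a random parameter drawn per sample, applied before the batch is fed to the CNN.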
Deep learning approaches are widely used in the design and development of intrusion detection systems (IDS) for cybersecurity. This paper discusses deep-learning techniques for developing network-based IDS. The research reviews classic IDS methods and their drawbacks in processing complex and adaptive network traffic, and examines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for superior IDS detection rates and precision. The paper utilizes SoftMax-based multiclass classification and offers a structure built on the New Structured Label (NSL) Knowledge Discovery and Data Mining (KDD) data sample. The preprocessing component of the adopted methodology covers the removal of redundancy, the removal of null values, and one-hot encoding to enhance data quality and training. An analysis of CNN structures that include extra hidden layers and dropout indicates better performance in identifying suspicious traffic. Moreover, the paper compares RNN models with CNNs with respect to temporal patterns. The experimental findings reveal that the CNN-Focal and CNN-Cross models obtain greater accuracy, forming a solid foundation for future deep-learning-based IDS adapted to real network setups.
This report systematically surveys the application of Dropout in CNNs, covering four core directions: medical image analysis (focusing on diagnostic accuracy and small-sample handling), general image recognition and industrial practice (focusing on robustness and generalization), multimodal and time-series signal analysis (focusing on feature extraction with hybrid architectures), and methodological innovations in Dropout techniques and hyperparameter optimization. Taken together, the studies show that Dropout is no longer merely a cornerstone of over-fitting prevention: combined with metaheuristic algorithms, attention mechanisms, and new variants (such as channel Dropout and weighted Dropout), it markedly raises model reliability and the performance ceiling across a wide range of complex tasks.