Machine Learning with Non-Flat Inputs for Depression Prediction
Multimodal Data Fusion for Depression Recognition
These studies integrate heterogeneous data sources such as audio/video, text, and physiological signals (EEG/fNIRS), and use Transformers, Mamba, and multimodal fusion networks to mine complementary cross-modal information for accurate quantitative assessment of depression severity.
- Improving the diagnostic accuracy for major depressive disorder using machine learning algorithms integrating clinical and near-infrared spectroscopy data.(Cyrus S. H. Ho, Y. L. Chan, T. Tan, Gabrielle W. N. Tay, T. Tang, 2022, Journal of Psychiatric Research)
- Neuroreflect: A Multimodal AI Framework for Real-Time Emotion and Mental Disorder Detection Using Speech and Text(S. Philip, Nehal Shaju, E. Priya, Reena Pagare, 2025, 2025 IEEE Pune Section International Conference (PuneCon))
- A Depression Detection Auxiliary Decision System Based on Multi-Modal Feature-Level Fusion of EEG and Speech(Zhaolong Ning, Hao Hu, Ling Yi, Zihan Qie, Amr Tolba, Xiaojie Wang, 2024, IEEE Transactions on Consumer Electronics)
- mDRA: A Multimodal Depression Risk Assessment Model Using Audio and Text(Longhui Zhou, Bin Hu, Zhi-Hong Guan, 2025, IEEE Signal Processing Letters)
- Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121(Musyyab Yousufi, R. Damaševičius, R. Maskeliūnas, 2024, Brain Sciences)
- Multimodal Mental Health Digital Biomarker Analysis From Remote Interviews Using Facial, Vocal, Linguistic, and Cardiovascular Patterns(Zifan Jiang, S. Seyedi, Emily Griner, Ahmed Abbasi, Ali Bahrami Rad, Hyeokhyen Kwon, Robert O. Cotes, Gari D. Clifford, 2024, IEEE Journal of Biomedical and Health Informatics)
- Multimodal Depression Detection Using Audio, Video and Text Fusion with PCA-Based Feature Selection(Taufeeq Ahmed, Ramesh Chandra Sahoo, 2025, 2025 3rd International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT))
- Depression level prediction via textual and acoustic analysis(Jisun Hong, Jihun Lee, Daegil Choi, Jaehyo Jung, 2025, Computers in Biology and Medicine)
- An objective quantitative diagnosis of depression using a local-to-global multimodal fusion graph neural network(Shuyu Liu, Jingjing Zhou, Xuequan Zhu, Ya Zhang, Xinzhu Zhou, Shaoting Zhang, Zhi Yang, Zijin Wang, Ruoxi Wang, Yizhe Yuan, Xin Fang, Xiongying Chen, Yanfeng Wang, Ling Zhang, Gang Wang, Cheng Jin, 2024, Patterns)
- Transformer-MIL and AdaBoost Fusion for Non-Invasive Depression Detection(A. Saji, M. P, Alwin Poulose, 2025, 2025 International Conference on Robotics and Mechatronics (ICRM))
- Fusing Multi-Level Features from Audio and Contextual Sentence Embedding from Text for Interview-Based Depression Detection(Junqi Xue, Ruihan Qin, Xinxu Zhou, Honghai Liu, Min Zhang, Zhiguo Zhang, 2024, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes(Jiayu Ye, Yanhong Yu, Lin Lu, Hao Wang, Yunshao Zheng, Yang Liu, Qingxiang Wang, 2025, IEEE Transactions on Circuits and Systems for Video Technology)
- DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection(Jiaxin Ye, Junping Zhang, Hongming Shan, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Development of a Depression Detection System using Speech and Text Data(B. Surekha Reddy, Kondaveti Jishitha, V. Akshaya, P. Aishwarya, 2023, 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT))
- Cross-Silo, Privacy-Preserving, and Lightweight Federated Multimodal System for the Identification of Major Depressive Disorder Using Audio and Electroencephalogram(Chetna Gupta, Vikas Khullar, Nitin Goyal, Kirti Saini, Ritu Baniwal, Sushil Kumar, Rashi Rastogi, 2023, Diagnostics)
- A Robust Hybrid Deep Learning Model for Multiclass Depression Classification from Speech Audio(N. Sulistianingsih, Galih Hendro Martono, 2026, International Journal of Image, Graphics and Signal Processing)
- Multimodal integration of neuroimaging and genetic data for the diagnosis of mood disorders based on computer vision models.(Seungeun Lee, Yongwon Cho, Yu Ji, Minhyek Jeon, A. Kim, Byung-Joo Ham, Yoonjung Yoonie Joo, 2024, Journal of Psychiatric Research)
- Wavelet Convolutions for Audio-Visual Fusion in Mental Health Disorder Detection(Yichun Li, Douglas Amobi Amoke, S. Naqvi, 2025, 2025 25th International Conference on Digital Signal Processing (DSP))
- Multimodal Depression Detection from Social Media X Using a Hybrid LSTM-GRU Model and FastText Feature Expansion(Gusti Raka Ananto, Erwin Budi Setiawan, 2025, 2025 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS))
- A Novel Audio-Visual Multimodal Semi-Supervised Model Based on Graph Neural Networks for Depression Detection(Yaqin Li, Chenjian Sun, Yihong Dong, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Climate and Weather: Inspecting Depression Detection via Emotion Recognition(Wen Wu, Mengyue Wu, K. Yu, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- STE-Mamba: Automated Multimodal Depression Detection through Emotional Analysis and Spatio-Temporal Information Ensemble(Zulong Lin, Yaowei Wang, Yujue Zhou, Fei Du, Yun Yang, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Emotion-Guided Graph Attention Networks for Speech-Based Depression Detection under Emotion-Inducting Tasks(Yuqiu Zhou, Yongjie Zhou, Yudong Yang, Yang Liu, Jun Huang, Shuzhi Zhao, Rongfeng Su, Lan Wang, Nan Yan, 2025, Interspeech 2025)
- A novel approach to depression detection using POV glasses and machine learning for multimodal analysis(Hakan Kayış, Murat Çelik, V. ÇAKIR KARDEŞ, Hatice Aysima Karabulut, Ezgi Özkan, Çınar Gedizlioğlu, Burcu Özbaran, Nuray Atasoy, 2025, Frontiers in Psychiatry)
- Depression Detection Based on Recursive Joint Specific Cross-Modal Fusion Enhanced by Multi-Modal Features(Jiaxin Wang, Zhuochang Xu, Lei Jin, Xiaojia Wang, Chunfeng Yang, R. L. B. Jeannès, 2025, 2025 10th International Conference on Biomedical Signal and Image Processing (ICBIP))
- Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition(Esaú Villatoro-Tello, S. Pavankumar Dubagunta, J. Fritsch, Gabriela Ramírez-de-la-Rosa, P. Motlícek, M. Magimai-Doss, 2021, Interspeech 2021)
- Real-time Acoustic based Depression Detection using Machine Learning Techniques(B. Yalamanchili, N. Kota, Maruthi Saketh Abbaraju, Venkata Sai Sathwik Nadella, S. Alluri, 2020, 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE))
- Voice of Mind, a Deep Learning Model for Depression and Anxiety Assessment From Acoustic and Lexical Vocal Biomarkers.(S. Regondi, F. Roncone, V. Colombo, R. Pugliese, E. Bagli, G. Russo, A. Panella, M. Radavelli, S. Bolognini, 2025, Journal of Voice)
- Depression detection methods based on multimodal fusion of voice and text(Zhenrong Xu, Yuan Gao, Fang Wang, Longqian Zhang, Li Zhang, Junke Wang, Jie Shu, 2025, Scientific Reports)
- Estimating Severity of Depression From Acoustic Features and Embeddings of Natural Speech(Sri Harsha Dumpala, S. Rempel, K. Dikaios, M. Sajjadian, R. Uher, Sageev Oore, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- HAN-DistilBERT for Depression Detection from Twitter Data: Integrating Hierarchical Attention with Transformer-Based Model(Saif Rahman, Tarannum Binte Tariq, Nusrot Jahan, Naimul Hasan Naime, Nazneen Akhter, 2024, 2024 27th International Conference on Computer and Information Technology (ICCIT))
- FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning(Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho D. Choi, Sung-Ju Lee, 2023, Conference on Empirical Methods in Natural Language Processing)
- Beyond Short-Frame Acoustic Features: Capturing Long-Term Speech Patterns for Depression Detection(Shizuku Fushimi, Mohammad Aiman Azani, Mizuto Chiba, Yoshifumi Okada, 2026, Technologies)
- A Multimodel Deep Learning Framework for Emotion Aware Mental Health Assessment(Pranav T, P. Prabhavathy, Kanipriya M, Venkatesan M, Devipriya S K, Rashmi N S, 2025, 2025 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT))
- Speech Signal Analysis to Predict Depression(Mogeeb A. Saeed, Vladimir Komashinsky, Saleem A. Mohammed, Noha N. Abdulqader, Laila Q. Saif, 2025, International Journal of Advanced Networking and Applications)
- Multimodal Depression Classification using Articulatory Coordination Features and Hierarchical Attention Based text Embeddings(Nadee Seneviratne, C. Espy-Wilson, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Automatic Depression Detection: an Emotional Audio-Textual Corpus and A Gru/Bilstm-Based Model(Yingli Shen, Huiyu Yang, Lin Lin, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Comparative Efficacy of MultiModal AI Methods in Screening for Major Depressive Disorder: Machine Learning Model Development Predictive Pilot Study(Donghao Chen, Pengfei Wang, Xiaolong Zhang, Runqi Qiao, Nanxi Li, Xiaodong Zhang, Honggang Zhang, Gang Wang, 2025, JMIR Formative Research)
- An Automatic Depression Recognition Method from Spontaneous Pronunciation Using Machine Learning(Minghao Du, Wenquan Zhang, Tao Wang, Shuang Liu, Dong Ming, 2022, Proceedings of the 2022 9th International Conference on Biomedical and Bioinformatics Engineering)
- Enhanced classification and severity prediction of major depressive disorder using acoustic features and machine learning(Lijuan Liang, Yang Wang, Hui Ma, Ran Zhang, Rongxun Liu, Rongxin Zhu, Zhiguo Zheng, Xizhe Zhang, Fei Wang, 2024, Frontiers in Psychiatry)
- MFMamba: A Multimodal Fusion State Space Model for Depression Recognition(Jingyi Liu, Yuanyuan Shang, Mengyuan Yang, Zhuhong Shao, Jiaxi Lu, Tie Liu, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
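A recurring design choice across these fusion papers is feature-level (early) fusion versus decision-level (late) fusion. The numpy sketch below illustrates both under purely hypothetical assumptions: the feature dimensions and the sigmoid "classifiers" are stand-ins, not taken from any listed paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-interview features: 13 acoustic stats + 8 text stats.
audio_feat = rng.normal(size=(4, 13))   # e.g. MFCC summary statistics
text_feat = rng.normal(size=(4, 8))     # e.g. sentence-embedding summary

# Feature-level (early) fusion: concatenate modalities into one vector,
# so a single downstream model can learn cross-modal interactions.
fused = np.concatenate([audio_feat, text_feat], axis=1)
print(fused.shape)       # (4, 21)

# Decision-level (late) fusion: each modality gets its own scorer,
# and only the scores are combined (here, stand-in sigmoid scorers).
audio_score = 1 / (1 + np.exp(-audio_feat.sum(axis=1)))
text_score = 1 / (1 + np.exp(-text_feat.sum(axis=1)))
late_fused = 0.5 * (audio_score + text_score)
print(late_fused.shape)  # (4,)
```

Early fusion lets one model exploit correlations between modalities; late fusion keeps the per-modality models independent, which is simpler when modalities are sometimes missing.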
Neuroimaging and EEG Electrophysiological Feature Analysis
This line of work processes non-flat neurobiological signals such as EEG, fMRI, and sMRI, applying time-frequency analysis, spatial connectivity feature extraction, and graph neural networks to probe how brain functional connectivity relates to the pathological mechanisms of depression.
- A novel EEG-based major depressive disorder detection framework with two-stage feature selection(Yujie Li, Yingshan Shen, Xiaomao Fan, Xingxian Huang, Haibo Yu, Gansen Zhao, Wenjun Ma, 2022, BMC Medical Informatics and Decision Making)
- A novel fuzzy deep learning network for electroencephalogram classification of major depressive disorder.(Rong Hu, Tangsen Huang, Xiangdong Yin, Ensong Jiang, 2025, Computer Methods in Biomechanics and Biomedical Engineering)
- Depression Detection with EEG Based on Mutual Information Regularization(Haoyu Lin, Tianyuan Ma, Chen Zhao, Jun Qi, Tingting Zhang, Xiangzeng Kong, 2024, 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA))
- Novel Approach Explains Spatio-Spectral Interactions in Raw Electroencephalogram Deep Learning Classifiers(Charles A. Ellis, Abhinav Sattiraju, Robyn Miller, V. Calhoun, 2023, 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW))
- Applied Machine Learning in EEG data Classification to Classify Major Depressive Disorder by Critical Channels(Sudhir Dhekane, Anand Khandare, 2025, Journal of Electronics, Electromedical Engineering, and Medical Informatics)
- FPGA Implementation of Enhanced Intelligent Signal Processing System for Depression(T. S., S. R., M. P., A. L., 2025, IET Signal Processing)
- Dynamic reward-augmented ensemble learning for EEG signal classification in major depressive disorder(Jin Xu, Yu Ziwei, Zhaojun Xu, 2025, Biomedical Physics & Engineering Express)
- EEG-VARNet: An Advanced Deep Learning Model for Accurate Prediction and Classification of Major Depressive Disorder Using EEG Data(Udutala Mahender, V. Sathiyasuntharam, 2024, 2024 IEEE 6th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA))
- Impact of Feature Selection Techniques on the Performance of Machine Learning Models for Depression Detection Using EEG Data(M. Hassan, N. Kaabouch, 2024, Applied Sciences)
- EEG-Based Classification of MDD Using Different Feature Sets(Oleksandr Sukholeister, Adrian Nakonecnyi, 2025, 2025 IEEE 13th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS))
- Functional connectivity signatures of major depressive disorder: machine learning analysis of two multicenter neuroimaging studies(S. Gallo, A. El‐Gazzar, P. Zhutovsky, R. Thomas, N. Javaheripour, Meng Li, L. Bartova, Deepti R. Bathula, U. Dannlowski, C. Davey, T. Frodl, I. Gotlib, S. Grimm, D. Grotegerd, T. Hahn, Paul J Hamilton, B. Harrison, A. Jansen, T. Kircher, B. Meyer, I. Nenadić, S. Olbrich, Elisabeth R Paul, L. Pezawas, M. Sacchet, P. Sämann, G. Wagner, H. Walter, M. Walter, G. V. van Wingen, 2023, Molecular Psychiatry)
- A Local-to-Global Graph Neural Network for Major Depressive Disorder Classification from rs-fMRI(Leoni-Stavroula Christakou, Tiziana Currieri, Francesco Prinzi, Salvatore Vitabile, 2025, 2025 International Joint Conference on Neural Networks (IJCNN))
- Suicidal Tendency and Depression Diagnosing Medical Agent Using fNIRS and VFT(Joo Hun Yoo, Harim Jeong, J. An, Hong Jin Jeon, Tai-Myung Chung, 2023, 2023 IEEE International Conference on Agents (ICA))
- Mood Disorder Severity and Subtype Classification Using Multimodal Deep Neural Network Models(Joo Hun Yoo, Harim Jeong, J. An, Tai-Myung Chung, 2024, Sensors)
- Major Depressive Disorder Detection using Dung Beetle Optimization based QGNN Classification(R. K. Shalini, V. Sujitha, P. Parameswari, 2025, 2025 3rd International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA))
- Classifying Major Depressive Disorder Using Multimodal MRI Data: A Personalized Federated Algorithm(Zhipeng Fan, Jingrui Xu, Jianpo Su, Dewen Hu, 2025, Brain Sciences)
- Through the Youth Eyes: Training Depression Detection Algorithms with Eye Tracking Data(Derick Axel Lagunes-Ramírez, Gabriel González-Serna, Leonor Rivera-Rivera, Nimrod González-Franco, María Y. Hernández-Pérez, J. Reyes-Ortíz, 2025, IEEE Latin America Transactions)
- Teaching Machines to Know Your Depressive State: On Physical Activity in Health and Major Depressive Disorder(Kun Qian, Hiroyuki Kuromiya, Zixing Zhang, Jinhyuk Kim, Toru Nakamura, K. Yoshiuchi, Björn Schuller, Yoshiharu Yamamoto, 2019, 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC))
- Prediction of self-reported depression scores using person-generated health data from a virtual 1-year mental health observational study(M. Makhmutova, R. Kainkaryam, Marta Ferreira, J. Min, M. Jaggi, I. Clay, 2021, Proceedings of the 2021 Workshop on Future of Digital Biomarkers)
- Predictive brain networks for major depression in a semi-multimodal fusion hierarchical feature reduction framework.(Jie Yang, Yingying Yin, Zuping Zhang, J. Long, Jian Dong, Yuqun Zhang, Zhi Xu, Lei Li, Jie Liu, Yonggui Yuan, 2018, Neuroscience Letters)
- LightFFNet: MDD Prediction on EEG Quantitative Biomarkers(U. Shukla, Shreeya Garg, 2022, 2022 International Conference on Engineering and Emerging Technologies (ICEET))
- Evaluating the diagnostic utility of applying a machine learning algorithm to diffusion tensor MRI measures in individuals with major depressive disorder(David M Schnyer, P. Clasen, Christopher E. Gonzalez, C. Beevers, 2017, bioRxiv)
- Early-Stage Non-Severe Depression Detection Using a Novel Convolutional Neural Network Approach Based on Resting-State EEG Data(Pascal Penava, Ricardo Buettner, 2024, IEEE Access)
- Discriminating between bipolar and major depressive disorder using a machine learning approach and resting-state EEG data.(M. Ravan, A. Noroozi, M. Sanchez, L. Borden, N. Alam, P. Flor-Henry, G. Hasey, 2022, Clinical Neurophysiology)
- Identification of suicidality in adolescent major depressive disorder patients using sMRI: A machine learning approach.(Su Hong, Yang S. Liu, Bo Cao, Jun Cao, Ming Ai, Jian-mei Chen, A. Greenshaw, L. Kuang, 2020, Journal of Affective Disorders)
- Multivariate Machine Learning Analyses in Identification of Major Depressive Disorder Using Resting-State Functional Connectivity: A Multicentral Study.(Yachen Shi, Linhai Zhang, Zan Wang, Xiang Lu, Tao Wang, Deyu Zhou, Zhijun Zhang, 2021, ACS Chemical Neuroscience)
- Using PCA Machine Learning Approach Based on Psychological Questionnaires and Spectral Characteristics of the EEG to Separate the Healthy Participants and Participants with Major Depressive Disorder(E. Merkulova, I. Kozulin, A. Savostyanov, A. Bocharov, E. Privodnova, 2023, 2023 IEEE 24th International Conference of Young Professionals in Electron Devices and Materials (EDM))
- Classification of major depressive disorder using vertex-wise brain sulcal depth, curvature, and thickness with a deep and a shallow learning model(R. Goya-Maldonado, T. Erwin-Grabner, Ling-Li Zeng, Christopher R. K. Ching, André Aleman, Alyssa R Amod, Zeynep Basgoze, Francesco Benedetti, B. Besteher, Katharina Brosch, Robin Bülow, Romain Colle, Colm G. Connolly, Emmanuelle Corruble, B. Couvy-Duchesne, Kathryn R. Cullen, U. Dannlowski, C. Davey, Annemieke Dols, J. Ernsting, Jennifer W Evans, L. Fisch, P. Fuentes-Claramonte, A. Gonul, I. Gotlib, Hans J Grabe, N. Groenewold, D. Grotegerd, Tim Hahn, J. Hamilton, L. K. Han, B. J. Harrison, T. C. Ho, N. Jahanshad, A. Jamieson, A. Karuk, T. Kircher, B. Klimes-Dougan, S. Koopowitz, T. Lancaster, R. Leenings, Meng Li, David E J Linden, F. MacMaster, David M. A. Mehler, S. Meinert, E. Melloni, Byron A Mueller, B. Mwangi, I. Nenadić, Amar Ojha, Y. Okamoto, M. Oudega, Brenda W. J. H. Penninx, Sara Poletti, E. Pomarol-Clotet, M. J. Portella, J. Raduà, E. Rodríguez-Cano, M. Sacchet, R. Salvador, A. Schrantee, Kang Sim, Jair C Soares, Aleix Solanes, D. J. Stein, F. Stein, A. Stolicyn, S. Thomopoulos, Y. Toenders, Aslihan Uyar-Demir, E. Vieta, Yolanda Vives-Gilabert, H. Völzke, Martin Walter, H. Whalley, S. Whittle, N. Winter, K. Wittfeld, M. J. Wright, Mon-Ju Wu, Tony T. Yang, Carlos Zarate, D. Veltman, L. Schmaal, P. Thompson, 2025, Molecular Psychiatry)
- Identification of major depressive disorder based on Triple-GCN model constructed with multimodal elastic network from higher-order brain connectivity.(Ying Zou, H. Shan, Yuan Li, 2026, Psychiatry Research: Neuroimaging)
- Leveraging Machine Learning and Feature Extraction from Physiological Signals for Emotion Detection(A. Raina, 2025, 2025 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA))
- Comparing resting state and task-based EEG using machine learning to predict vulnerability to depression in a non-clinical population(Pallavi Kaushik, Han Yang, P. Roy, Marieke K. van Vugt, 2023, Scientific Reports)
- Optimized EEG-Based Depression Detection and Severity Staging Using GAN-Augmented Neuro-Fuzzy and Deep Learning Models(Sudhir Dhekane, Anand Khandare, 2025, Journal of Electronics, Electromedical Engineering, and Medical Informatics)
- Classification of Major Depressive Disorder Based on EEG Signals Using Superlet Transformation and ResNet-18(Sudhan M, S. Priya, M. Subathra, S. A, S. T. George, 2025, 2025 8th International Conference on Trends in Electronics and Informatics (ICOEI))
- Prediction of rTMS treatment response in major depressive disorder using machine learning techniques and nonlinear features of EEG signal.(Fatemeh Hasanzadeh, M. Mohebbi, R. Rostami, 2019, Journal of Affective Disorders)
- Hilbert-Huang Transform Embedded Self-Attention Neural Network for EEG-based major depressive disorder vs. healthy controls classification(Junxiang Chen, Kaikun Tian, Yu Ye, Jiaming Liu, 2025, Frontiers in Psychiatry)
- Deep-Asymmetry: Asymmetry Matrix Image for Deep Learning Method in Pre-Screening Depression(Min Kang, Hyunjin Kwon, Jinhyeok Park, Seokhwan Kang, Youngho Lee, 2020, Sensors)
- Brain and Heart Rate Variability Patterns Recognition for Depression Classification of Mental Health Disorder(Qaisar Abbas, M. E. Celebi, Talal Albalawi, Yassine Daadaa, 2024, International Journal of Advanced Computer Science and Applications)
- Automated Detection of Major Depressive Disorder With EEG Signals: A Time Series Classification Using Deep Learning(Alireza Rafiei, Rasoul Zahedifar, C. Sitaula, F. Marzbanrad, 2022, IEEE Access)
- Channel Selection Guided by Layer-Wise Relevance Propagation for CNN-Based EEG Classification of Major Depressive Disorder(Woo-Seok Ahn, Seung-Hwan Lee, Han-Jeong Hwang, 2025, 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC))
- DST-M2D: A Dualistic-Series Transformer Framework for Major Depressive Disorder Analysis Using Multimodal MRI Data(Rakesh Kumar Mahendran, Rohini C, Siva Sankaralingam G, P. Kumar, 2025, 2025 International Conference on Emerging Smart Computing and Informatics (ESCI))
- Improving Multichannel Raw Electroencephalography-based Diagnosis of Major Depressive Disorder via Transfer Learning with Single Channel Sleep Stage Data(Charles A. Ellis, Robyn Miller, V. Calhoun, 2023, bioRxiv)
- Identifying neuroimaging biomarkers of major depressive disorder from cortical hemodynamic responses using machine learning approaches(Zhifei Li, R. McIntyre, S. F. Husain, R. Ho, B. Tran, H. T. Nguyen, S. Soo, C. Ho, Nanguang Chen, 2022, eBioMedicine)
- Implications of Aperiodic and Periodic EEG Components in Classification of Major Depressive Disorder from Source and Electrode Perspectives(Ahmad Zandbagleh, Saeid Sanei, Hamed Azami, 2024, Sensors)
- Toward Probabilistic Diagnosis and Understanding of Depression Based on Functional MRI Data Analysis with Logistic Group LASSO(Yu Shimizu, J. Yoshimoto, Shigeru Toki, M. Takamura, Shinpei Yoshimura, Y. Okamoto, S. Yamawaki, K. Doya, 2015, PLOS ONE)
- Gray and white matter structural examination for diagnosis of major depressive disorder and subthreshold depression in adolescents and young adults: a preliminary radiomics analysis(Huan Ma, Dafu Zhang, Dewei Sun, Hongbo Wang, Jianzhong Yang, 2022, BMC Medical Imaging)
- Identifying suicide attempter in major depressive disorder through machine learning: the importance of pain avoidance, event-related potential features of affective processing.(Huanhuan Li, Shijie Wei, Fang Sun, Jiachen Wan, Ting Guo, 2024, Cerebral Cortex)
- Fusion of eyes-open and eyes-closed electroencephalography in resting state for classification of major depressive disorder(Jianli Yang, Jiehui Li, Songlei Zhao, Yunshu Zhang, Bing Li, Xiuling Liu, 2025, Biomedical Signal Processing and Control)
- Enhanced EEG-based detection of major depressive disorder using maximum likelihood estimation and machine learning(A. K. Choudhary, Kamtanath Mishra, Rajesh Kumar Lal, Alok Mishra, 2026, Journal of Big Data)
- A machine learning framework involving EEG-based functional connectivity to diagnose major depressive disorder (MDD)(W. Mumtaz, S. Ali, M. A. M. Yasin, A. Malik, 2017, Medical & Biological Engineering & Computing)
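Many of the EEG studies above start from band-power features obtained by time-frequency analysis. A minimal sketch, assuming a synthetic single-channel epoch and the standard clinical band boundaries (the 256 Hz sampling rate and the 10 Hz test tone are illustrative choices, not any paper's setup):

```python
import numpy as np

fs = 256  # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)  # one 4-second epoch
# Synthetic one-channel EEG: a 10 Hz alpha oscillation plus white noise.
rng = np.random.default_rng(1)
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)

# Periodogram via FFT: power at each frequency bin.
freqs = np.fft.rfftfreq(eeg.size, 1 / fs)
psd = np.abs(np.fft.rfft(eeg)) ** 2 / eeg.size

def band_power(lo, hi):
    """Total power in [lo, hi) Hz - a common scalar EEG feature."""
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum()

bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
features = {name: band_power(lo, hi) for name, (lo, hi) in bands.items()}
# The 10 Hz component makes the alpha band dominate this feature vector.
print(max(features, key=features.get))
```

In practice such band powers are computed per channel (often with Welch averaging rather than a raw periodogram) and stacked into the flat feature vectors these classifiers consume.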
Unimodal Speech and Text Emotion and Depression Detection
These studies use a single input modality (speech or text) and focus on feature engineering (e.g., MFCCs, prosody, semantic embeddings), emphasizing NLP techniques and generative models to screen for depression risk in interview and social-media text.
- Breaking Age Barriers With Automatic Voice-Based Depression Detection(Brian Stasak, Dale Joachim, J. Epps, 2022, IEEE Pervasive Computing)
- Depression recognition using voice-based pre-training model(Xiang Huang, Fang Wang, Yuan Gao, Yilong Liao, Wenjing Zhang, Li Zhang, Zhenrong Xu, 2024, Scientific Reports)
- The Significance of Time Duration and Feature Extraction of Voice Signal Dataset for Depression Classification(Patteera Tongnopparat, Komsan Kiatrungrit, Treesukon Treebupachatsakul, Suvit Poomrittigul, 2024, 2024 16th Biomedical Engineering International Conference (BMEiCON))
- Using Emotionally Rich Speech Segments for Depression Prediction(Jiawei Yu, Heysem Kaya, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Depression detection with machine learning of structural and non‐structural dual languages(Filza Rehmani, Qaisar Shaheen, Muhammad Anwar, Muhammed Faheem, Shahzad Sarwar Bhatti, 2024, Healthcare Technology Letters)
- MFCC-based Recurrent Neural Network for Automatic Clinical Depression Recognition and Assessment from Speech(Emna Rejaibi, Ali Komaty, F. Mériaudeau, Said Agrebi, Alice Othmani, 2019, Biomedical Signal Processing and Control)
- A deep learning approach for the depression detection of social media data with hybrid feature selection and attention mechanism(M. Bhuvaneswari, V. Prabha, 2023, Expert Systems)
- LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild(Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari, 2024, Information Fusion)
- Vision Transformer for Audio-Based Depression Detection on Multi-Lingual Audio Data(Monica Pratiwi, Samuel Ady Sanjaya, 2024, Proceedings of the 2024 7th International Conference on Digital Medicine and Image Processing)
- From Social Media to Mental Health Insights: A Hybrid CNN-LSTM Model for Depression Detection in Bangladesh(Tuntusree Banik, S. Asgar, Md.Hamid Hosen, Altaf Uddin, Sadia Nawar, Arnob Saha, 2024, 2024 IEEE International Conference on Computing, Applications and Systems (COMPAS))
- Utilizing Temporal Inductive Path Neural Networks for Accurate Voice-Based Depression Classification: A Detailed Approach for Analyzing Speech Patterns to Identify Mental Health States(K. Ashok Kumar, Narsaiah Domala, Vijayakumar Sajjan, Kiran Kumar Bhadavath, Sreedhar Jadapalli, Ramadevi Vemula, 2025, Journal of Voice)
- Depression Classification Algorithm Based on Voice Signals Using MFCC and CNN Autoencoders(Jisun Hong, Jihun Lee, Daegil Choi, Jaehyo Jung, 2024, 2024 International Conference on Machine Learning and Applications (ICMLA))
- HCAG: A Hierarchical Context-Aware Graph Attention Model for Depression Detection(M. Niu, Kai Chen, Qingcai Chen, Lufeng Yang, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Learning Voice Source Related Information for Depression Detection(S. Pavankumar Dubagunta, Bogdan Vlasenko, M. Magimai-Doss, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Depression Detection in Speech Using Speaker Disentanglement and Multi-Task Learning(Licheng Wan, Zhiyuan Guo, Ke Shi, Fei Xie, Minchao Wu, 2025, 2025 IEEE 8th International Conference on Electronic Information and Communication Technology (ICEICT))
- Depression Detection in Speech Using Generative Models(Lojaine Amr Elghandour, Adel Eltoweiry, M. A. Salem, Nada Sharaf, 2025, 2025 Twelfth International Conference on Intelligent Computing and Information Systems (ICICIS))
- Detecting Anxiety and Depression from Phone Conversations using x-vectors(Namhee Kwon, Shahruk Hossain, Nate Blaylock, Henry O'Connell, Naomi Hachen, Joseph T. Gwin, 2022, SMM22, Workshop on Speech, Music and Mind 2022)
- Bilingual Audio Depression Identification Model by Machine Learning(Suvit Poomrittigul, Komsan Kiatrungrit, Phanomkorn Homsiang, Treesukon Treebupachatsakul, 2025, 2025 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC))
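The prosodic feature engineering these speech papers describe (energy, voicing, rhythm) reduces to frame-level statistics over the waveform. In the sketch below the decaying synthetic tone is only a stand-in for real speech, and the 25 ms frame / 12.5 ms hop is a common but assumed framing choice:

```python
import numpy as np

fs = 16000  # assumed sampling rate
t = np.arange(0, 1.0, 1 / fs)
# Synthetic "voiced" signal: a 150 Hz tone whose loudness decays over time,
# loosely mimicking the reduced vocal energy often reported in depressed speech.
signal = np.sin(2 * np.pi * 150 * t) * np.exp(-2 * t)

frame_len, hop = 400, 200  # 25 ms frames, 12.5 ms hop
starts = range(0, signal.size - frame_len, hop)

# Two classic prosodic descriptors computed per frame:
# short-time energy and zero-crossing rate.
energy = np.array([np.mean(signal[s:s + frame_len] ** 2) for s in starts])
zcr = np.array([np.mean(np.abs(np.diff(np.sign(signal[s:s + frame_len]))) > 0)
                for s in starts])

# Utterance-level statistics of these contours are typical classifier inputs.
utterance_features = [energy.mean(), energy.std(), zcr.mean(), zcr.std()]
print([round(f, 5) for f in utterance_features])
```

MFCCs follow the same frame-then-summarize pattern, with a mel filterbank and DCT applied per frame before the statistics are taken.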
Behavioral Sensing and Longitudinal Mobile-Health Monitoring
Using continuous behavioral data collected by wearables and smartphone sensors (heart rate, step count, location, sleep patterns, etc.), these studies build longitudinal time-series models to predict depressive tendencies and mood stability.
- Privacy-Conscious Internet Behavior for Depression Detection with Cross-Scale Adaptive Transformer.(Minqiang Yang, Ye Bai, Weihao Zheng, Bin Hu, 2025, IEEE Journal of Biomedical and Health Informatics)
- Towards Privacy-Preserving Depression Detection: Experiments on Passive Sensor Signal Data(Aijia Yuan, Michael Xu, Hongyi Zhu, Sagar Samtani, Edlin Garcia Colato, 2023, 2023 IEEE International Conference on Digital Health (ICDH))
- A Machine Learning Based Study to Predict Depression with Monitoring Actigraph Watch Data(M. Raihan, A. K. Bairagi, Shagoto Rahman, 2021, 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT))
- Automated classification of stress and relaxation responses in major depressive disorder, panic disorder, and healthy participants via heart rate variability(Sangwon Byun, A. Kim, Min-Sup Shin, Hong Jin Jeon, Chul-Hyun Cho, 2025, Frontiers in Psychiatry)
- Depression and Severity Detection Based on Body Kinematic Features: Using Kinect Recorded Skeleton Data of Simple Action(Yanhong Yu, Wentao Li, Yue Zhao, Jiayu Ye, Yunshao Zheng, Xinxin Liu, Qingxiang Wang, 2022, Frontiers in Neurology)
- An unsupervised machine learning approach using passive movement data to understand depression and schizophrenia.(George D. Price, Michael V. Heinz, Daniel Zhao, M. Nemesure, Franklin Y. Ruan, N. Jacobson, 2022, Journal of Affective Disorders)
- Classifying and clustering mood disorder patients using smartphone data from a feasibility study(Carsten Langholm, Scott A. Breitinger, Lucy Gray, Fernando S Goes, Alex Walker, Ashley Xiong, Cindy Stopel, P. Zandi, M. Frye, J. Torous, 2023, npj Digital Medicine)
- Long-term trajectories of depressive symptoms and machine learning techniques for fall prediction in older adults: Evidence from the China Health and Retirement Longitudinal Study (CHARLS).(Xiaodong Chen, Shaowu Lin, Yixuan Zheng, Lingxiao He, Ya Fang, 2023, Archives of Gerontology and Geriatrics)
- Using Resting State Heart Rate Variability and Skin Conductance Response to Detect Depression in Adults(Lukasz Tyszczuk Smith, L. Levita, F. Amico, Jennifer Fagan, J. Yek, J. Brophy, Haihong Zhang, M. Arvaneh, 2020, 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC))
- Tracking and Monitoring Mood Stability of Patients With Major Depressive Disorder by Machine Learning Models Using Passive Digital Data: Prospective Naturalistic Multicenter Study(Ran Bai, Le Xiao, Yu Guo, Xuequan Zhu, Nanxi Li, Yashen Wang, Qinqin Chen, Lei Feng, Yinghua Wang, Xiangyi Yu, Haiyong Xie, Gang Wang, 2021, JMIR mHealth and uHealth)
- Mann-Whitney U Test Based Identification of Significant ECG Features for Improving Accuracy of Major Depressive Disorder Classification(Rupali Pawar, Prachichi Mukherji, 2025, 2025 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI))
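The longitudinal models in this cluster typically turn raw daily sensor streams into rolling-window summary features before any classifier sees them. A sketch under assumed inputs (synthetic step and sleep series, a two-week window):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 60 days of passive phone/wearable data.
steps = rng.normal(8000, 1500, size=60)   # daily step counts
sleep = rng.normal(7.0, 0.8, size=60)     # hours slept per night

def rolling(x, window):
    """Trailing windows over a daily series: shape (n_windows, window)."""
    return np.lib.stride_tricks.sliding_window_view(x, window)

w = 14  # two-week horizon, a common choice for depressive-episode screening
feats = np.column_stack([
    rolling(steps, w).mean(axis=1),   # average activity level
    rolling(steps, w).std(axis=1),    # activity variability
    rolling(sleep, w).mean(axis=1),   # average sleep duration
    rolling(sleep, w).std(axis=1),    # sleep regularity
])
# One feature row per day once the window is full: (60 - 14 + 1, 4).
print(feats.shape)  # (47, 4)
```

Each row can then be paired with a self-reported score (e.g., PHQ-9) for supervised training, which is essentially how the passive-data studies above frame the problem.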
Clinical Decision Support Systems and Algorithm Optimization Methods
This category covers clinically oriented, integrated diagnostic-support research as well as general machine-learning optimization for medical settings (e.g., adversarial robustness, federated learning, feature selection, and control algorithms), aiming to address model robustness and real-world deployability.
- Development of a differential treatment selection model for depression on consolidated and transformed clinical trial datasets(K. Perlman, J. Mehltretter, D. Benrimoh, C. Armstrong, R. Fratila, C. Popescu, Jingla-Fri Tunteng, Jérôme Williams, Colleen P. E. Rollins, G. Golden, G. Turecki, 2024, Translational Psychiatry)
- Minimal model of interictal and ictal discharges “Epileptor-2”(A. Chizhov, A. V. Zefirov, D. Amakhin, E. Y. Smirnova, A. Zaitsev, 2018, PLOS Computational Biology)
- Improving Flat Maxima with Natural Gradient for Better Adversarial Transferability(Yunfei Long, Huosheng Xu, 2026, Big Data and Cognitive Computing)
- Electroencephalogram (EEG) based prediction of attention deficit hyperactivity disorder (ADHD) using machine learning(Nitin Ahire, R. Awale, Abhay E. Wagh, 2023, Applied Neuropsychology: Adult)
- Parkinson’s Disease Prediction Using Machine Learning Models(Srinivas Kanakala, 2024, International Journal of Advances in Computer Science and Technology)
- Classification of Major Depressive Disorder Using Graph Attention Mechanism with Multi-Site rs-fMRI Data(Shiyue Su, Yicai Ning, Zijian Guo, Weifeng Yang, Manyun Zhu, Qilin Zhou, Xuan He, 2025, Neuroinformatics)
- An Efficient Network Intrusion Detection System using Machine Learning Approach with Recursive Feature Elimination(Deepak Patel, Ankit Singh, N. Chandra, Nibedita Jagadev, Smita Rath, P. Sahu, 2025, 2025 6th International Conference on Inventive Research in Computing Applications (ICIRCA))
- Model Predictive Control of a Tandem-Rotor Helicopter With a Nonuniformly Spaced Prediction Horizon(Faraaz Ahmed, Ludwik A. Sobiesiak, J. Forbes, 2022, IEEE Control Systems Letters)
- SRFE: A stepwise recursive feature elimination approach for network intrusion detection systems(Abdelaziz Alshaikh Qasem, Mahmoud H. Qutqut, F. Alhaj, Asem Kitana, 2024, Peer-to-Peer Networking and Applications)
- Positive–Unlabeled Learning-Based Hybrid Models and Interpretability for Groundwater Potential Mapping in Karst Areas(Benteng Bi, Jingwen Li, Tianyu Luo, Bo Wang, Chen Yang, Lina Shen, 2025, Water)
- Discriminating Bipolar Disorder From Major Depression Based on SVM-FoBa: Efficient Feature Selection With Multimodal Brain Imaging Data(Nan-Feng Jie, Mao-Hu Zhu, Xiao-Ying Ma, E. Osuch, M. Wammes, J. Théberge, Huan-Dong Li, Yu Zhang, Tianzi Jiang, J. Sui, V. Calhoun, 2015, IEEE Transactions on Autonomous Mental Development)
- Classification of major depressive disorder based on functional and structural MRI(Yucheng Wei, Junlong Gao, 2024, Fourth International Conference on Computer Vision and Pattern Analysis (ICCPA 2024))
- Unveiling the prevalence and risk factors of early stage postpartum depression: a hybrid deep learning approach(U. Lilhore, Surjeet Dalal, Neetu Faujdar, Sarita Simaiya, Mamta Dahiya, Shilpi Tomar, Arshad Hashmi, 2024, Multimedia Tools and Applications)
- Machine learning approaches for integrating clinical and imaging features in late‐life depression classification and response prediction(Meenal J. Patel, C. Andreescu, J. Price, Kathryn L. Edelman, C. Reynolds, H. Aizenstein, 2015, International Journal of Geriatric Psychiatry)
- From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech(Sharifa Alghowinem, Roland Göcke, M. Wagner, J. Epps, M. Breakspear, G. Parker, 2012, The Florida AI Research Society)
- Two-Stage Feature Selection Algorithm Based on an Improved Grey Predictive Evolutionary Algorithm for Depression Detection(Mengyuan Li, Shuhua Mao, 2025, 2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI))
- Digital phenotypes and digital biomarkers for health and diseases: a systematic review of machine learning approaches utilizing passive non-invasive signals collected via wearable devices and smartphones(A. Sameh, M. Rostami, M. Oussalah, R. Korpelainen, Vahid Farrahi, 2024, Artificial Intelligence Review)
- Machine Learning for Depression Detection: An Evaluation of Acoustic and sMRI Biomarkers(P. Mahale, Y. Chavhan, Niranjana Patil, T. Patil, 2025, 2025 3rd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIHEI))
- Investigating Protective and Risk Factors and Predictive Insights for Aboriginal Perinatal Mental Health: Explainable Artificial Intelligence Approach(Guanjin Wang, Hachem Bennamoun, Wai Hang Kwok, Jenny Paola Ortega Quimbayo, Bridgette Kelly, Trish Ratajczak, Rhonda Marriott, Roz Walker, Jayne Kotz, 2025, Journal of Medical Internet Research)
- A deep learning approach to analyse stress by using voice and body posture(Sumita Gupta, Sapna Gambhir, M. Gambhir, Dr. Rana Majumdar, A. Shrivastava, H. Pham, 2025, Soft Computing)
- Application of Machine Learning Technique to Distinguish Parkinson’s Disease Dementia and Alzheimer’s Dementia: Predictive Power of Parkinson’s Disease-Related Non-Motor Symptoms and Neuropsychological Profile(H. Byeon, 2020, Journal of Personalized Medicine)
- Analysis of Features Selected by a Deep Learning Model for Differential Treatment Selection in Depression(J. Mehltretter, Colleen P. E. Rollins, D. Benrimoh, R. Fratila, K. Perlman, S. Israel, Marc J. Miresco, M. Wakid, G. Turecki, 2020, Frontiers in Artificial Intelligence)
- A Novel Deep Learning Model Leveraging State-of-the-Art Audio Transformations for Robust Speech Depression Detection(Ananya Lakkaraju, 2025, 2025 5th International Conference on Electrical, Computer and Energy Technologies (ICECET))
- Automatic Detection of Major Depressive Disorder via a Bag-of-Behaviour-Words Approach(Kun Qian, Hiroyuki Kuromiya, Zhao Ren, Maximilian Schmitt, Zixing Zhang, Toru Nakamura, K. Yoshiuchi, Björn Schuller, Yoshiharu Yamamoto, 2019, Proceedings of the Third International Symposium on Image Computing and Digital Medicine)
- An evolutionary approach for depression detection from Twitter big data using a novel deep learning model with attention based feature learning mechanism(P. K, K. V, 2024, Automatika)
- Using Machine Learning Methods to Search for EEG and Genetic Markers of Depressive Disorder(K. Zorina, A. Kriveckiy, Darya Klemeshova, A. Bocharov, V. Karmanov, 2025, 2025 IEEE 26th International Conference of Young Professionals in Electron Devices and Materials (EDM))
- Implementing Sentiment Analysis with Applied Nonlinear Analysis: Predicting Mental Disorders from Diverse Online Social Network Datasets(Sushant A. Patinge, Vijaya K. Shandilya, 2025, Advances in Nonlinear Variational Inequalities)
- Optimal Trajectory Planning and Model Predictive Control of Underactuated Marine Surface Vessels using a Flatness-Based Approach(Max Lutz, T. Meurer, 2021, 2021 American Control Conference (ACC))
- A Lyapunov-Based Framework for Trajectory Planning of Wheeled Vehicle Using Imitation Learning(Jialun Lai, Zongze Wu, Zhigang Ren, Ci Chen, Qi Tan, Shengli Xie, 2025, IEEE Transactions on Automation Science and Engineering)
- NOMA System Performance Improvement Using Chaos and Deep Learning(Huiwen Yin, H. Ren, 2025, IEEE Transactions on Circuits and Systems I: Regular Papers)
- Optimization Based Deep Learning for COVID-19 Detection Using Respiratory Sound Signals(J. Dar, K. Srivastava, S. A. Lone, 2024, Cognitive Computation)
- A Machine Learning Approach for Path Loss Prediction Using Combination of Regression and Classification Models(I. Iliev, Y. Velchev, Peter Z. Petkov, B. Bonev, Georgi Iliev, Ivaylo Nachev, 2024, Sensors)
- Neural Network-Based Dynamic Stress Prediction Method for Bladed Disks(Yan Jiang, Houxin She, Chaoping Zang, Daorui Bai, Tong Jing, 2025, 2025 International Conference on Mechatronics, Robotics, and Artificial Intelligence (MRAI))
- Online Payment Fraud Detection Optimization with XG Boost and Recursive Feature Elimination(J. Wibowo, Budi Hartono, Veronica Lusiana, 2024, Journal of Software Engineering and Simulation)
- Channel Interpolation of Fading Channels and the Pilot Density Required for Predictor Antennas(Joachim Björsell, Mikael Sternad, Dinh-Thuy Phan-Huy, Michael Grieger, 2024, IEEE Transactions on Vehicular Technology)
- Efficacy of novel attention-based gated recurrent units transformer for depression detection using electroencephalogram signals(Neha Prerna Tigga, S. Garg, 2022, Health Information Science and Systems)
- Detection of Clinical Depression in Adolescents’ Speech During Family Interactions(L. Low, N. Maddage, M. Lech, L. Sheeber, N. Allen, 2011, IEEE Transactions on Biomedical Engineering)
- Predicting Levels of Depression and Anxiety in People with Neurodegenerative Memory Complaints Presenting with Confounding Symptoms(Dalia Attas, Bahman Mirheidari, Daniel Blackburn, A. Venneri, T. Walker, K. Harkness, M. Reuber, C. Blackmore, H. Christensen, 2021, Lecture Notes in Networks and Systems)
- PyBird-JAX: Accelerated inference in large-scale structure with model-independent emulation of one-loop galaxy power spectra(Alexander Reeves, Pierre Zhang, Henry Zheng, 2025, Journal of Cosmology and Astroparticle Physics)
- Optimizing Facial Expression and Head Dynamics Data Processing to Enhance Depression Detection with Cutting-Edge AI Models(Aldo Januansyah, Rifa Amril Sahputra, Muhammad Adhiguna Hasibuan, Muhammad Yudya Ananda Hasibuan, Faiz Syukri Arta, Muhammad Fikry, 2025, 2025 International Conference on Activity and Behavior Computing (ABC))
- Application of Mind Evolutionary Algorithm and Artificial Neural Networks for Prediction of Profile and Flatness in Hot Strip Rolling Process(Zhenhua Wang, G. Ma, D. Gong, Jie Sun, Dian-hua Zhang, 2019, Neural Processing Letters)
- Depression Detection Using Machine Learning(V. Salunkhe, Harshada Gangane, Shrutika Awlankar, Kshitija Bhosale, Govind Pole, Gopal Deshmukh, 2025, 2025 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI))
- Parametric and non-parametric modeling of short-term synaptic plasticity. Part II: Experimental study(D. Song, Zhuo Wang, V. Marmarelis, T. Berger, 2009, Journal of Computational Neuroscience)
- Improved rank-based recursive feature elimination method based ovarian cancer detection model via customized deep architecture(Namani Deepika Rani, M. Babu, 2024, Computer Methods and Programs in Biomedicine)
- Explainable depression detection from low-resource languages using CNN-BiLSTM with deep attention mechanism(Nurul Absar, Md. Mahbubul Islam, Zannatun Naim Somaya, 2025, Machine Learning for Computational Science and Engineering)
- Multi-Disease Prediction System Using Machine Learning(Vasepalli Kamakshamma, G. Sumanth, K. Reddy, B. Pallavi, C. Varshini, 2025, Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies)
- Depression detection using cascaded attention based deep learning framework using speech data(Sachi Gupta, Gaurav Agarwal, Shivani Agarwal, Dilkeshwar Pandey, 2024, Multimedia Tools and Applications)
- Reflectance modeling by neural texture synthesis(M. Aittala, Timo Aila, J. Lehtinen, 2016, ACM Transactions on Graphics)
- Data-Driven Model Predictive Control of a Nonlinear Ball-on-a- Wheel System(Niklas Kruse, A. Wache, Harald Aschemann, Jens Starke, 2024, 2024 American Control Conference (ACC))
- Deep Transformation Learning for Depression Diagnosis from Facial Images(Yajun Kang, Xiao Jiang, Ye Yin, Yuanyuan Shang, Xiuzhuang Zhou, 2017, Lecture Notes in Computer Science)
- Detection of High-Risk Depression Groups Based on Eye-Tracking Data(Simeng Lu, Shen Huang, Yun Zhang, Xiujuan Zheng, D. Miao, Jiajun Wang, Z. Chi, 2020, Lecture Notes in Computer Science)
- Convolution Neural Network Having Multiple Channels with Own Attention Layer for Depression Detection from Social Data(Sumit Dalal, Sarika Jain, M. Dave, 2023, New Generation Computing)
- A Multimodal Framework for Prognostic Modelling of Mental Health Treatment and Recovery Trajectories(Harold Ngabo-Woods, Larisa Dunai, I. Verdú, Sui Liang, 2026, Applied Sciences)
- Exploring Machine Learning and Language Models for Multimodal Depression Detection(Javier Si Zhao Hong, Timothy Zoe Delaya, Sherwyn Chan Yin Kit, Pai Chet Ng, Xiaoxiao Miao, 2025, 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC))
- Simulated depression risk classification from Parkinson’s voice features using a self-attention-enhanced MLP architecture(Nalineekumari Arasavali, Mohammed Ashik, Vaddadi Nirmal, Mogadala Vinod Kumar, U. Siddaraj, 2026, Scientific Reports)
- Integrating Hyperspectral, Thermal, and Ground Data with Machine Learning Algorithms Enhances the Prediction of Grapevine Yield and Berry Composition(S. Jewan, Deepak Gautam, Debbie Sparkes, Ajit Singh, L. Billa, A. Cogato, Erik H. Murchie, Vinay Pagay, 2024, Remote Sensing)
This survey systematically reviews machine learning research on depression prediction from non-flat data inputs. The covered work improves diagnostic performance through multimodal data fusion, mines pathological features from neuroimaging and EEG signals, and combines longitudinal behavioral monitoring with clinical decision-support methods, advancing a paradigm shift from subjective symptom assessment to objective, digitized prediction.
A total of 177 related references.
Relative limb movement is an important feature in assessing depression. In this study, we investigated whether a skeleton-mimetic task using natural stimuli can aid depression recognition. We innovatively used Kinect V2 to collect participant data: sequential skeletal data were extracted directly from the Kinect 3D coordinates of each participant's 25 body joints. After data preparation, two constructed skeletal datasets of whole-body joints (for binary and multi-class classification) were fed into the proposed model for depression recognition. We improved the temporal convolution network (TCN), creating a novel spatial attention dilated TCN (SATCN) that includes a hierarchy of temporal convolution groups with different dilation scales to capture important skeletal features, plus a spatial attention block for the final prediction. Experimental results show that depression and non-depression groups can be classified automatically with a maximum accuracy of 75.8% on the binary task, and 64.3% accuracy on the multi-class dataset for finer-grained identification of depression severity. Our Kinect V2-based experiments and methods can not only identify and screen depression patients but also track patients' recovery, for example the transition from severe to moderate or mild depression in the multi-class setting.
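The dilated convolutions in a TCN grow the temporal receptive field exponentially with depth; a small sketch of the standard receptive-field arithmetic (the kernel size and dilation schedule are illustrative, not the SATCN's actual configuration):

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions:
    1 + sum of (kernel_size - 1) * dilation over all layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Kernel 3 with dilations doubling across 4 layers covers 31 time steps.
rf = tcn_receptive_field(3, [1, 2, 4, 8])
```

This is why a few dilated layers can summarize long joint-trajectory sequences that an undilated stack of the same depth could not.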
No abstract available
Depression, a leading global mental health concern, is often underdiagnosed due to subjective and time-intensive clinical evaluations. This paper proposes a proactive, Machine Learning-based depression detection system that leverages facial expression analysis to enable early and accessible mental health screening. The proposed model utilizes Convolutional Neural Networks (CNNs) trained on standardized grayscale facial images to identify emotional states such as sadness, fear, and anger, which are then mapped to a binary classification of "Depressed" or "Not Depressed." Image preprocessing techniques, including grayscale conversion, Contrast Limited Adaptive Histogram Equalization (CLAHE), and normalization, are applied to enhance image quality and classification accuracy. The CNN model's performance is evaluated against other configurations, including PCA+CNN, CNN+SVM, SVM, and PCA+SVM, to benchmark its effectiveness. The best-performing model is deployed within a Flask-based web application that delivers real-time, user-friendly predictions with responsive design, making it suitable for remote or underserved regions. Results indicate strong performance in detecting key expressions, though challenges remain in accurately distinguishing between visually similar emotions. Future improvements will explore the integration of multimodal inputs (e.g., voice and text), local inference to enhance privacy, and Explainable AI (XAI) for improved transparency and clinical interpretability. This work highlights the potential of Machine Learning to serve as a scalable, non-invasive solution for early mental health screening.
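CLAHE is a tiled, clip-limited variant of plain histogram equalization; the global form it builds on can be sketched in a few lines (this is the generic textbook algorithm, not the paper's pipeline):

```python
def equalize(img, levels=256):
    """Global histogram equalization of a 2D grayscale image (list of lists).
    CLAHE applies the same cumulative-histogram remapping per tile with a
    clip limit; this sketch shows only the plain global variant."""
    flat = [p for row in img for p in row]
    n = len(flat)
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:                       # cumulative histogram
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)

    def remap(p):                        # stretch CDF to full dynamic range
        return round((cdf[p] - cdf_min) / max(n - cdf_min, 1) * (levels - 1))

    return [[remap(p) for p in row] for row in img]
```

On a low-contrast patch such as [[100, 100], [100, 200]] the remapping stretches the two occupied gray levels to the extremes of the range, which is the contrast boost the preprocessing step is after.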
No abstract available
During depression, neurophysiological changes can occur that may affect laryngeal control, i.e. the behaviour of the vocal folds. Characterising these changes precisely from speech signals is a non-trivial task, as it typically requires reliable separation of the voice source information from the speech signal. In this paper, by exploiting the ability of CNNs to learn task-relevant information from raw input signals, we investigate several methods to model voice-source-related information for depression detection. Specifically, we investigate modelling of low-pass-filtered speech signals, linear prediction residual signals, homomorphically filtered voice source signals, and zero frequency filtered signals to learn voice source related information for depression detection. Our investigations show that subsegmental-level modelling of linear prediction residual signals or zero frequency filtered signals leads to systems better than the state-of-the-art low-level-descriptor-based systems and deep learning systems that model the vocal tract information.
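The linear prediction residual mentioned above is the part of the signal a short-term linear predictor cannot explain; a first-order sketch (real systems use order 10+ via Levinson-Durbin, this keeps only the lag-1 case for clarity):

```python
def lp_residual_order1(signal):
    """First-order linear prediction residual: e[n] = s[n] - a * s[n-1],
    with predictor coefficient a = r1 / r0 (lag-1 autocorrelation ratio),
    the order-1 least-squares solution."""
    r0 = sum(s * s for s in signal)
    r1 = sum(signal[i] * signal[i - 1] for i in range(1, len(signal)))
    a = r1 / r0 if r0 else 0.0
    return [signal[0]] + [signal[i] - a * signal[i - 1]
                          for i in range(1, len(signal))]
```

For strongly self-correlated (vocal-tract-shaped) content the residual is small, so what remains is dominated by the glottal excitation, which is the voice source information these systems model.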
One of the main factors causing suicide is depression. However, many cases of depression go undiagnosed because they are not correctly identified. An increasing number of people with mental illnesses express their emotions online using tools like social media (SM) and specialized websites. Recently, efforts have been made to use Machine Learning (ML) and deep learning (DL) models to predict depression from SM platforms. However, it is problematic that most ML algorithms now provide no explanation. As a result, this study proposes a novel Deep Learning (DL) model called residual network 50, which includes optimal long short-term memory (RNT-OLSTM), for Depression Detection (DD) on Twitter data. In addition, to address the issue of data imbalance in the Twitter data, a cluster-based oversampling approach is used, which considerably reduces the possibility of bias towards the dominant class (non-depressed). Finally, the embedding layers are input to RNT-OLSTM for DD, in which the hyperparameters of the network are tuned using the Sine Chaotic map and constriction factor-based Coyote Optimization Algorithm (SCCOA) to minimize the prediction loss. The outcomes show that the proposed system performs better than the existing schemes for the DD of imbalanced Twitter data, with higher detection rates.
No abstract available
Depression is a prevalent mental health condition that impacts a considerable proportion of the world's population, and early detection and intervention are crucial for effective treatment. Current depression detection models often rely on a single modality, either speech or text data, which may lead to inaccurate results due to the limited information being considered. In this paper, we propose a novel approach for the accurate detection of depression in individuals using both speech and text data. This approach addresses the limitations of existing solutions by combining objective measures of speech and text data, providing a non-intrusive and efficient means for depression detection. In this project, two models were trained: a speech emotion recognition model using Long Short-Term Memory (LSTM) and a text model based on the Random Forest method. The TESS dataset was used to train the speech emotion recognition model, whereas the text model was trained on the Twitter dataset available on Kaggle. The results indicate that our proposed approach is highly effective in detecting depression, with an accuracy of 98% achieved by the speech emotion recognition model and 95% by the depression detection model on the testing data. The outputs of both models are combined using a decision tree method, resulting in an accuracy of 100%. The proposed method employs a Decision Tree algorithm that takes the outputs of both the Speech Emotion Recognition (SER) model and the text-based depression detection model as inputs and applies a set of rules to classify users into one of three categories: depressed, mildly depressed, or not depressed. A webpage was then developed where users can input their speech and text data and receive a prediction of their depression status based on the integrated output of both models. Overall, the results demonstrate that the proposed system provides a promising solution for the early diagnosis of depressive symptoms.
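The abstract does not spell out the fusion rules, so the sketch below is one plausible hand-written rule set mapping the two model outputs to the three categories; the averaging scheme and both thresholds are illustrative assumptions, not the paper's learned decision tree:

```python
def fuse(speech_depressed_prob, text_depressed_prob):
    """Combine SER-model and text-model probabilities into one of three
    labels. Thresholds (0.7 / 0.4) are illustrative, not from the paper."""
    score = (speech_depressed_prob + text_depressed_prob) / 2
    if score >= 0.7:
        return "depressed"
    if score >= 0.4:
        return "mildly depressed"
    return "not depressed"
```

A learned decision tree would instead pick these split points from training data, but the interface, two per-modality scores in and one of three labels out, is the same.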
Diagnosis of psychiatric disorders based on brain imaging data is highly desirable in clinical applications. However, a common problem in applying machine learning algorithms is that the number of imaging data dimensions often greatly exceeds the number of available training samples. Furthermore, interpretability of the learned classifier with respect to brain function and anatomy is an important, but non-trivial issue. We propose the use of logistic regression with a least absolute shrinkage and selection operator (LASSO) to capture the most critical input features. In particular, we consider application of group LASSO to select brain areas relevant to diagnosis. An additional advantage of LASSO is its probabilistic output, which allows evaluation of diagnosis certainty. To verify our approach, we obtained semantic and phonological verbal fluency fMRI data from 31 depression patients and 31 control subjects, and compared the performance of group LASSO (gLASSO) and sparse group LASSO (sgLASSO) to that of standard LASSO (sLASSO), Support Vector Machine (SVM), and Random Forest. Over 90% classification accuracy was achieved with gLASSO, sgLASSO, and SVM; however, in contrast to SVM, the LASSO approaches allow identification of the most discriminative weights and estimation of prediction reliability. Semantic task data revealed contributions to the classification from left precuneus, left precentral gyrus, left inferior frontal cortex (pars triangularis), and left cerebellum (crus I). Weights for the phonological task indicated contributions from left inferior frontal operculum, left postcentral gyrus, left insula, left middle frontal cortex, bilateral middle temporal cortices, bilateral precuneus, left inferior frontal cortex (pars triangularis), and left precentral gyrus. The distribution of normalized odds ratios further showed that predictions with absolute odds ratios higher than 0.2 could be regarded as certain.
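The sparsity that makes the LASSO weights interpretable comes from the soft-thresholding proximal operator applied during optimization; a minimal sketch of that one step (not the full gLASSO solver, which applies an analogous shrinkage to whole voxel groups):

```python
def soft_threshold(weights, lam):
    """Proximal operator of the L1 penalty: shrink every coefficient
    toward zero by lam and zero out anything smaller than lam. Group
    LASSO applies the same shrinkage to the norm of each weight group."""
    return [max(abs(w) - lam, 0.0) * (1 if w > 0 else -1 if w < 0 else 0)
            for w in weights]
```

Coefficients that survive the threshold correspond to the discriminative brain regions the abstract lists; everything else is driven exactly to zero rather than merely made small.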
Psychiatrists diagnose mental disorders via the linguistic use of patients. Still, due to data privacy, existing passive mental health monitoring systems use alternative features such as activity, app usage, and location via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. We explore multiple model designs by comparing their performance and overhead for FedTherapist to overcome the complex nature of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones' large and noisy text for mental health signal sensing. Our IRB-approved evaluation of the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows higher accuracy of FedTherapist compared with the performance with non-language features, achieving 0.15 AUROC improvement and 8.21% MAE reduction.
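The federated learning that keeps FedTherapist's text on-device typically aggregates client updates with FedAvg; a minimal sketch of that server-side step (flat parameter vectors and sample counts are illustrative):

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average model parameters across clients,
    weighted by each client's number of local samples. Raw speech and
    keyboard text never leave the devices; only parameters are shared."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two hypothetical clients with 1 and 3 local samples:
global_model = fed_avg([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```

The weighting by sample count matters on phones, where per-user data volume varies widely.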
No abstract available
Over 300 million people worldwide are affected by depression, with symptoms that have a major impact on patients and, in the worst cases, can lead to suicide. As the severity of the disease increases over time, early detection can save a patient’s life. The disease is diagnosed by professionals using questionnaires that might be influenced by biases, and of which the accuracy and reliability are not guaranteed. For this reason, an increasing number of studies are looking at physiological ways of detecting the disease, with electroencephalogram-based machine learning prediction models having been successful in recent years. However, the focus is not on the early detection of mild depression, which can be the entry point to major depression. In this work, we developed a deep learning based model using a 1D convolutional neural network to detect mild depression in resting-state electroencephalogram data. We evaluated the model using a realistic world-like dataset and were able to achieve a balanced accuracy of 69.21%. With this result, we are setting a new benchmark for resting-state-based early detection. Due to the low level of preprocessing and the associated fast computing time and low computational intensity, our innovative approach can serve as a basis for applications in the real world. This enables patients with suitable hardware to recognize the disease themselves at an early stage and thus receive timely treatment to prevent further development.
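The building block of the 1D CNN described above is a valid-mode 1D convolution sliding a learned kernel over an EEG channel; a minimal pure-Python sketch (the signal and kernel values are illustrative):

```python
def conv1d_valid(signal, kernel):
    """Valid-mode 1D convolution in the cross-correlation form used by
    CNN layers: slide the kernel over the signal, one dot product per
    position, output length len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [1, 0, -1] kernel acts as a simple difference (edge) detector:
out = conv1d_valid([1, 2, 3, 4], [1, 0, -1])
```

Stacking such layers with nonlinearities and pooling yields the low-preprocessing, low-compute model the authors argue suits on-device screening.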
No abstract available
Depression is a prevalent mental health disorder, and early detection is crucial for effective intervention. Recent advancements in eye-tracking technology and machine learning offer new opportunities for non-invasive diagnosis. This study aims to assess the performance of different machine learning algorithms in predicting depression in a young sample using eye-tracking metrics. Eye-tracking data from 139 participants were recorded with an emotional induction paradigm in which each participant observed a set of positive and negative emotional stimuli. The data were analyzed to find differences between groups, and the most significant features were selected to train prediction models. The dataset was then split into training and testing sets using stratified sampling. Four algorithms were trained with hyperparameter optimization and 5-fold cross-validation: support vector machines (SVM), random forest (RF), a multi-layer perceptron (MLP) neural network, and gradient boosting (GB). The RF algorithm achieved the highest accuracy at 84%, followed by SVM, GB, and the MLP neural network. Performance metrics such as accuracy, recall, F1-score, precision-recall area under the curve (PR-AUC), and Matthews Correlation Coefficient (MCC) were also used to evaluate the models. The findings suggest that eye-tracking metrics combined with machine learning algorithms can effectively identify depressive symptoms in the young, indicating their potential as non-invasive diagnostic tools in clinical settings.
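The MCC reported above is a single correlation-style score over all four confusion-matrix cells, which is why it is favored on imbalanced clinical data; the standard formula as a sketch:

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from binary confusion-matrix
    counts: (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    returns 0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike plain accuracy, MCC stays near zero for a classifier that just predicts the majority class, which matters when depressed participants are a minority of the sample.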
Depression is a widespread mental health disorder characterized by persistent sadness, lack of interest, and cognitive impairment, affecting millions of people worldwide. Early detection is crucial but is often hindered by limited access to mental health professionals. This paper aims to automatically classify individuals as depressed or non-depressed based on the PHQ-8 scale using multimodal data from the DAIC-WOZ dataset, particularly the CLNF scores and the interview transcripts. Participants were labelled as depressed (PHQ-8 ≥ 10) or non-depressed (PHQ-8 < 10), forming a binary classification task. Facial behaviour was captured using CLNF-derived Action Unit (AU) features and modelled with an AdaBoost classifier. Conversational text data was processed using RoBERTa embeddings and modelled through a Transformer-based Multiple Instance Learning (MIL) architecture. To leverage the complementary strengths of both modalities, a late fusion strategy was applied by combining output probabilities from both models. Experimental results on development and test sets indicate that the fused model outperforms individual modalities, highlighting the effectiveness of multimodal learning for depression detection. This study demonstrates the practical viability of integrating facial and linguistic cues for mental health screening and offers a scalable, non-invasive tool to support early intervention in clinical and remote settings.
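The PHQ-8 binarization is exactly as stated in the abstract; the late-fusion step below is a generic weighted average of the two modality probabilities, with the weight an illustrative assumption rather than the paper's tuned value:

```python
def phq8_label(score):
    """Binarize PHQ-8 as in the abstract: depressed iff score >= 10."""
    return "depressed" if score >= 10 else "non-depressed"

def late_fusion(p_face, p_text, w_face=0.5):
    """Late fusion: combine the AdaBoost (facial AU) and Transformer-MIL
    (text) output probabilities by a weighted average. The 0.5 weight is
    an illustrative choice, not from the paper."""
    return w_face * p_face + (1 - w_face) * p_text
```

Because fusion happens on output probabilities rather than features, each modality's model can be trained and replaced independently.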
Depression is a serious mental state that negatively impacts thoughts, feelings, and actions. Social media use is rapidly growing, with people expressing themselves in their regional languages. In Pakistan and India, many people use Roman Urdu on social media, which makes Roman Urdu important for predicting depression in these regions. However, previous studies show no significant contribution in predicting depression through Roman Urdu or in combination with structured languages like English. The study aims to create a Roman Urdu dataset to predict depression risk in dual languages [Roman Urdu (non-structural language) + English (structural language)]. Two datasets were used: Roman Urdu data manually converted from English Facebook posts, and English comments from Kaggle. These datasets were merged for the research experiments. Machine learning models, including Support Vector Machine (SVM), Support Vector Machine Radial Basis Function (SVM-RBF), Random Forest (RF), and Bidirectional Encoder Representations from Transformers (BERT), were tested. Depression risk was classified into not depressed, moderate, and severe. Experimental studies show that the SVM achieved the best result, with an accuracy of 0.84, compared to existing models. The presented study refines the area of depression prediction in Asian countries.
Depression has the potential to impact death rates, particularly when it comes to death by suicide. Inadequate diagnosis may result in delayed or unsuitable therapy, which can worsen symptoms of depression. Unaddressed or insufficiently addressed depression can result in deteriorating mental well-being, including a higher risk of suicidal ideation and self-destructive actions. Voice analysis can be employed to distinguish between individuals with depression and those without. However, research on voice recognition for distinguishing depressed from non-depressed individuals has focused on a single dataset source as input for the classification model being developed. This work utilizes the Vision Transformer model and various pre-trained transformers, such as DeiT (Data-Efficient Image Transformer) and the Swin Transformer, to detect depression based on voice. The objective was to create a more generic model not restricted to a specific language. Integrating Mel-spectrogram characteristics with a vision-transformer-based model can enhance the efficacy of voice recognition models when dealing with multi-language data. The result shows 21% higher accuracy than a previous study that also implemented a cross-dataset test.
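The Mel-spectrogram features above rest on the standard frequency warping that spaces filter banks perceptually; the common HTK-style mapping as a one-line sketch:

```python
import math

def hz_to_mel(f_hz):
    """Standard HTK mel-scale mapping used when building Mel-spectrograms:
    mel = 2595 * log10(1 + f / 700). Roughly linear below ~1 kHz and
    logarithmic above, mirroring human pitch perception."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```

Because the warp is language-independent, Mel features are a reasonable basis for the cross-language generalization the paper targets.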
Depression, a severe mental health disorder often linked to suicide, can be detected through social media content. However, analyzing depression-related signs is challenging due to the informal, brief, and varied nature of posts. To tackle this, this study presents a Hierarchical Attention Network (HAN) combined with DistilBERT to identify depression indicators in social media posts. Through the utilization of Hierarchical Attention Network (HAN) architecture that incorporates word-level attention using DistilBERT and sentence-level attention through a customized layer, this model significantly enhances its capability to distinguish between depressed and non-depressed posts with improved accuracy. Performance metrics such as accuracy, recall, precision, and F1 score are used to evaluate the model and the results demonstrate an outstanding accuracy of 96%. The integration of hierarchical attention with transformer-based embeddings has proven to make this model a powerful and reliable solution for detecting depression in social media posts.
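Both attention levels of a HAN reduce to the same softmax-weighted pooling, words into a sentence vector and sentences into a post vector; a minimal numerically-stable sketch (the vectors and scores are illustrative, not DistilBERT outputs):

```python
import math

def attention_pool(vectors, scores):
    """Softmax-weighted pooling, the core of both HAN attention levels:
    convert raw relevance scores to weights via a stable softmax, then
    return the weighted average of the input vectors."""
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]  # subtract max for stability
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
```

With equal scores this degenerates to a plain mean; the learned score function is what lets the model up-weight depression-indicative words and sentences.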
By improving the accuracy of depression recognition and designing a consumer-oriented depression detection system, consumers are expected to receive convenient and fast e-health services. Recently, depression recognition methods based on the analysis of physiological and behavioral data have attracted attention. In particular, research on electroencephalography (EEG) and speech signals has become a hotspot. However, EEG is susceptible to individual differences, while speech signals are susceptible to environmental factors. In this study, we propose an auxiliary decision-making system for depression detection that considers both physiological and behavioral factors by fusing EEG and speech signals. Compared to existing studies, our proposed multi-modal fusion strategy exploits more linear and nonlinear features to support the recognition task. In addition, we analyze the functional connectivity of brain regions to facilitate EEG feature extraction. Considering the non-stationary nature of EEG and speech signals, we perform filtering, artifact processing, and time-frequency domain processing. Furthermore, we integrate the EEG and speech signals at the feature level and train the classification on the fused features. Performance evaluation results show that our proposed multi-modal feature fusion strategy achieves 86.11% accuracy on the major depressive disorder dataset, and 87.44% recognition accuracy on the healthy controls.
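Feature-level fusion, in contrast to the decision-level fusion of some other entries here, simply concatenates per-sample feature vectors before a single classifier; a trivial but illustrative sketch:

```python
def feature_level_fusion(eeg_feats, speech_feats):
    """Feature-level fusion: concatenate each sample's EEG feature vector
    with its speech feature vector, so one classifier sees both modalities
    jointly rather than fusing two separate decisions."""
    return [e + s for e, s in zip(eeg_feats, speech_feats)]
```

The joint vector lets the classifier learn cross-modal interactions, at the cost of requiring both modalities to be present and time-aligned per sample.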
Depression is a severe mental health issue. The volume of user-generated content on social media (SM) is growing, and some computational approaches have been proposed for detecting depression from users' SM data. However, because of the informal language, limited vocabulary, and misspellings in SM data, depression detection (DD) is a challenging task. This paper proposes a novel deep learning (DL) technique for performing DD on SM data with the help of a hybrid feature selection (FS) mechanism. Initially, two publicly available datasets containing user tweets are collected for implementing the proposed research model. The collected datasets are then preprocessed; the preprocessing phase includes critical steps that produce a ready-to-use dataset for training and testing. After preprocessing, the data is divided into prime and non-prime words based on a dictionary approach. The hybrid FS approach is then applied to select the most relevant features from the prime and non-prime words for higher classification accuracy (AC). In the hybrid model, first, a Term Frequency-Inverse Document Frequency integrated Modified Information Gain (TFIDF-MIG) approach is proposed, which assigns a score to each prime and non-prime word in the dataset. Second, optimal features are selected from the weighted features using the Improved Elephant Herding Algorithm (IEHA). Finally, the selected features from the hybrid model are fed into the DL model, namely an attention-included improved ReLU-based Convolution Neural Network with Long Short-Term Memory (AIRCNN-LSTM), for DD. Experiments are performed on the collected datasets to assess the proposed model's performance efficiency. The results of the extensive experiments show that the presented work outperforms existing techniques regarding DD classification AC by locating the best solutions, while also reducing the number of selected features.
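The TF-IDF half of the TFIDF-MIG score is standard; a minimal sketch of plain TF-IDF (the MIG term the paper adds is not sketched here, and the tokenized documents are illustrative):

```python
import math

def tf_idf(term, doc, corpus):
    """Plain TF-IDF weight of a term in one tokenized document:
    term frequency within the document times the log inverse document
    frequency across the corpus. Words frequent in one post but rare
    corpus-wide score highest."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [["sad", "alone"], ["happy", "sun"], ["sad", "tired"]]
w = tf_idf("sad", corpus[0], corpus)
```

A score of zero falls out both for absent terms (tf = 0) and, via idf = log(1) = 0, for terms present in every document.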
No abstract available
Depression affects emotional expression and perception. As a non-invasive and privacy-preserving method, speech is widely used for automatic depression detection. However, existing models often focus only on depressive features in speech, ignoring the differential emotion expression patterns across different emotion-inducing tasks. To address this, we propose an emotion-guided graph attention network (emoGAT) for depression detection. By collecting speech-text data from depressed individuals and healthy controls during emotion-inducing tasks, we construct graph embeddings using sentiment cues from both speech and text. Experimental results show our method reduces the standard deviation by 1.8% and improves accuracy by 4.36%. Graph attention visualization also reveals depression-specific characteristics, such as flattened prosody in neutral picture description tasks and cognitive biases toward negative information, offering deeper insights into emotional relational expressions.
Depression remains a leading cause of suicide among college students, highlighting the need for effective and scalable screening methods. Internet usage behavior has shown strong potential for identifying depressive tendencies, but privacy concerns limit its practical use. In this study, we propose a privacy-conscious cross-scale adaptive transformer designed for irregular time series data derived from weakly private online behavior, such as application categories and usage patterns, while excluding content-sensitive or personally identifiable information. Our model incorporates an adaptive sampling strategy to unify temporal resolutions and uses a cross-scale attention mechanism to capture depression-related behavioral patterns. We compared several classic models for irregular time series data, and the proposed method outperformed them, offering a promising, non-intrusive approach for depression detection based on privacy-conscious online activity patterns.
Depression is a widespread yet often undiagnosed mental health issue due to the limitations of traditional depression detection methods. Speech-based depression detection offers a non-invasive alternative, but its development is affected by data scarcity, class imbalance, and privacy concerns. This paper investigates synthetic data augmentation to improve depression detection from speech. In Experiment 1, real speech is segmented, and traditional features like MFCC, ZCR, and Spectral Centroid are extracted to train classical machine learning models (SVM, RF, LR). In Experiment 2, deep audio embeddings from wav2vec 2.0 are summarized and used with the same classifiers in addition to XGBoost. Synthetic speech is introduced to balance the dataset, and its quality is assessed using Fréchet Audio Distance (FAD). Results show that synthetic data can improve model performance when carefully integrated. SVM achieved the best performance in Experiment 1, with 0.661 accuracy and a macro F1-score of 0.617. While synthetic data is not a replacement for real data, it presents a practical and scalable solution for reducing data scarcity, especially in privacy-sensitive domains such as medicine.
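Two of the traditional features named in Experiment 1 (ZCR and spectral centroid) are simple enough to sketch directly; the frame choice and pure-tone example are illustrative, and MFCC extraction (which needs a mel filterbank) is omitted:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent-sample sign changes in a frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of a frame."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mags) / (np.sum(mags) + 1e-12))

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)  # one second of a pure 440 Hz tone
zcr = zero_crossing_rate(tone)
sc = spectral_centroid(tone, sr)
print(round(zcr, 3), round(sc))  # ZCR ≈ 2*440/8000 = 0.11; centroid ≈ 440 Hz
```

In the paper's pipeline, per-frame features like these would be aggregated over each speech segment and fed to the SVM/RF/LR classifiers.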
Depression detection through speech signal analysis has emerged as a promising non-invasive method for assessing mental health conditions. Although deep learning approaches have demonstrated significant success in this domain, they raise critical concerns regarding patient privacy. To address this issue, we propose a multi-task learning framework for depression detection, which enhances performance by incorporating speaker disentanglement during pre-training. The proposed framework concurrently addresses depression-related speech patterns and speaker characteristics through independent tasks, thereby improving both the robustness and performance of the model. By leveraging large-scale speech data during pre-training, the model effectively disentangles depression-specific features from speaker-specific attributes, leading to more reliable and efficient detection. Experimental results on the DAIC-WOZ benchmark demonstrate the superiority of the proposed approach in depression detection tasks. Moreover, the ablation study further highlights the benefits of combining speaker identity disentanglement with multi-task learning, confirming its potential to advance automated depression screening. These findings underscore the importance of integrating sophisticated learning paradigms in AI-driven mental health analysis.
With rising global depression rates, there is an urgent need for scalable and accurate automated screening tools. Existing automated methods often rely on single data modalities or fail to effectively manage the high dimensionality of combined data streams. This research introduces a multimodal depression detection framework that integrates audio, video, and text features. The core contribution is the application of Principal Component Analysis (PCA) on a fused feature vector to eliminate redundancy and extract the most informative signals. Evaluated on the DAIC-WOZ benchmark dataset, our approach, using a Logistic Regression classifier, achieves an F1-score of 98.0%, significantly outperforming unimodal, bimodal, and non-PCA-enhanced multimodal configurations. This result demonstrates that a lightweight, interpretable model can surpass more complex architectures when paired with robust feature fusion and dimensionality reduction, offering a promising tool for real-world clinical use.
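The core idea, PCA over a fused feature vector followed by logistic regression, can be sketched with scikit-learn on synthetic stand-in data (the real work uses DAIC-WOZ audio/video/text features; the dimensions and class signal below are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# synthetic stand-in for fused audio+video+text features (200 samples, 60 dims);
# the class signal lives in 5 dimensions, the other 55 are redundant noise
y = rng.integers(0, 2, size=200)
signal = y[:, None] * 2.0 + rng.normal(size=(200, 5))
noise = rng.normal(size=(200, 55))
X = np.hstack([signal, noise])

# PCA strips the redundancy before the lightweight linear classifier
clf = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
clf.fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
print(f"held-out accuracy: {acc:.2f}")
```

Because the discriminative variance is concentrated in a few components, the reduced representation lets a simple linear model perform well, which is the effect the paper reports.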
Depression is a mental health disorder that can be identified through text and visuals shared on social media platforms. This study proposes a multimodal depression detection model that combines text and image features using a hybrid LSTM + GRU architecture with FastText feature expansion. The dataset consists of 24,779 multimodal samples collected from the social media platform X, comprising 12,279 depressed and 12,500 non-depressed individuals. Text was processed using TF-IDF feature extraction and enriched through FastText expansion with three similarity levels (Top 1, Top 5, and Top 10) built from the Tweet, IndoNews, and Tweet+IndoNews corpora, while image features were extracted using ResNet50. The experiment was conducted in four scenarios: baseline, feature expansion, optimization, and ensemble learning. The results show that the hybrid LSTM + GRU model achieved the best performance, increasing accuracy from 77.07% at baseline to 79.30% after applying FastText expansion, optimizer adjustment, and ensemble strategies. These findings indicate that semantic enrichment through FastText significantly improves text representation, while the hybrid LSTM + GRU model is effective at capturing both short-term and long-term dependencies. The multimodal architecture demonstrates stable performance, offering a promising approach for early detection of depression on social media platforms.
No abstract available
Human accuracy in diagnosing psychiatric disorders is still low. Even though digitizing health care leads to more and more data, the successful adoption of AI-based digital decision support (DDSS) is rare. One reason is that AI algorithms are often not evaluated based on large, real-world data. This research shows the potential of using deep learning on the medical claims data of 812,853 people between 2018 and 2022, with 26,973,943 ICD-10-coded diseases, to predict depression (F32 and F33 ICD-10 codes). The dataset used represents almost the entire adult population of Estonia. Based on these data, to show the critical importance of the underlying temporal properties of the data for the detection of depression, we evaluate the performance of non-sequential models (LR, FNN), sequential models (LSTM, CNN-LSTM) and sequential models with a decay factor (GRU-Δt, GRU-decay). Furthermore, since explainability is necessary for the medical domain, we combine a self-attention model with the GRU-decay and evaluate its performance. We named this combination Att-GRU-decay. After extensive empirical experimentation, our model (Att-GRU-decay), with an AUC score of 0.990, an AUPRC score of 0.974, a specificity of 0.999 and a sensitivity of 0.944, proved to be the most accurate. The results of our novel Att-GRU-decay model outperform the current state of the art, demonstrating the potential usefulness of deep learning algorithms for DDSS development. We further expand on this by describing a possible application scenario of the proposed algorithm for depression screening in a general practitioner (GP) setting: not only to decrease healthcare costs, but also to improve the quality of care and ultimately reduce people's suffering.
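The decay mechanism in the GRU-Δt / GRU-decay family can be sketched as a single recurrent step in NumPy; the weight names and the placement of the decay are one common formulation, not necessarily the paper's exact Att-GRU-decay cell:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_decay_step(h, x, dt, p):
    """One GRU step with a time-decay factor: the previous hidden state is
    attenuated by exp(-relu(w_d * dt)) before the usual gates, so long gaps
    between coded diagnoses weaken the carried-over memory. Parameter names
    (p['Wz'], ...) are illustrative, not the paper's notation."""
    h = h * np.exp(-np.maximum(0.0, p["wd"] * dt))      # decay by elapsed time
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.5, size=(2, 3) if k.startswith("W") else (2, 2))
     for k in ["Wz", "Wr", "Wh", "Uz", "Ur", "Uh"]}
p["wd"] = 0.1
x = np.array([0.5, -0.2, 1.0])
h0 = np.array([1.0, -1.0])
h_short = gru_decay_step(h0, x, dt=1.0, p=p)    # recent event: memory mostly kept
h_long = gru_decay_step(h0, x, dt=100.0, p=p)   # long gap: memory nearly erased
print(h_short, h_long)
```

With a large Δt the cell behaves almost as if it had started from an empty hidden state, which is what makes irregularly timed claims records tractable for a recurrent model.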
Depression can significantly impact many aspects of an individual's life, including personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of concerns over subjects' privacy, data in this area are still scarce, presenting a challenge for the deep discriminative models used in detecting depression. To navigate these obstacles, a large-scale multimodal vlog dataset (LMVD) for depression recognition in the wild is built. LMVD contains 1823 samples (214 hours in total) from 1475 participants, captured from four multimedia platforms (Sina Weibo, Bilibili, TikTok, and YouTube). A novel architecture termed MDDformer is proposed to learn the non-verbal behaviors of individuals. Extensive validations are performed on the LMVD dataset, demonstrating superior performance for depression detection. We anticipate that LMVD will serve a valuable function for the depression detection community. The data and code will be released at: https://github.com/helang818/LMVD/.
Mental health issues such as depression are unfortunately on the rise. Analyzing signals generated from sensors using machine learning (ML) techniques has shown promise as an objective, holistic, and cost-effective tool for assessing human mental conditions. However, many sensors expose individuals' Personally Identifiable Information (PII). In this work-in-progress study, we explore the performance of ML-based models for depression detection using only non-PII-releasing sensors. We compared the performance of prevailing ML models on the StudentLife dataset, which contains smartphone sensor signal data for 48 students across 65 days. Our findings suggest that Decision Tree, Gradient Boosting, and Logistic Regression show higher predictive power even with limited non-PII-releasing sensor data. Additionally, we found that certain combinations of non-PII-releasing sensors, such as surrounding conversation and phone lock frequency, help ML classifiers attain almost perfect performance across accuracy, recall, precision, and F1. This study has implications for a privacy-preserving approach to detecting depression.
Major depressive disorder (MDD) represents a significant global health challenge, yet its diagnosis predominantly relies on subjective clinical assessments. This paper introduces a comprehensive framework for augmenting depression diagnostics through objective, data-driven methods. In this study, we first test two basic classification algorithms, support vector machines (SVM) and decision trees, and then proceed to more advanced methods such as the Exponential African Vulture Optimization Algorithm (ExpAVOA) with Deep Maxout. The aim is to detect depression from measurable biomarkers. This is important because, in practice, clinicians often depend on structured interviews and rating scales; common examples include the Beck Depression Inventory (BDI) and the Hamilton Rating Scale for Depression (HRSD). The study focuses on two promising and non-invasive data modalities: acoustic patterns derived from speech and neuroanatomical attributes extracted from structural magnetic resonance imaging (sMRI). We present a detailed analysis of the end-to-end pipeline for each modality, encompassing the critical stages of data preprocessing, feature engineering, and robust model evaluation. This work addresses key computational challenges, including the extraction of discriminative features from complex audio signals and overcoming the “curse of dimensionality” inherent in neuroimaging data through systematic feature selection. By evaluating model performance using metrics suitable for imbalanced clinical datasets, this paper provides a comparative analysis of classifier suitability for this critical application. The findings underscore the potential for these computational tools to serve as powerful diagnostic aids, augmenting clinical judgment with quantitative evidence to improve detection rates and mitigate subjective bias in mental healthcare.
No abstract available
No abstract available
As the field of deep learning has grown in recent years, its application to the domain of raw resting-state electroencephalography (EEG) has also increased. Relative to traditional machine learning methods or deep learning methods applied to manually engineered features, there are fewer methods for developing deep learning models on small raw EEG datasets. One potential approach for enhancing deep learning performance, in this case, is the use of transfer learning. While a number of studies have presented transfer learning approaches for manually engineered EEG features, relatively few approaches have been developed for raw resting-state EEG. In this study, we propose a novel EEG transfer learning approach wherein we first train a model on a large publicly available single-channel sleep stage classification dataset. We then use the learned representations to develop a classifier for automated major depressive disorder diagnosis with raw multichannel EEG. Statistical testing reveals that our approach significantly improves the performance of our model (p < 0.05), and we also find that the performance of our approach exceeds that of many previous studies using both engineered features and raw EEG. We further examine how transfer learning affected the representations learned by the model through a pair of explainability analyses, identifying key frequency bands and channels utilized across models. Our proposed approach represents a significant step forward for the domain of raw resting-state EEG classification and has broader implications for use with other electrophysiology and time-series modalities. Importantly, it has the potential to expand the use of deep learning methods across a greater variety of raw EEG datasets and lead to the development of more reliable EEG classifiers.
Major Depressive Disorder (MDD) affects a large portion of the population and levies a huge societal burden. It has serious consequences, such as decreased productivity and reduced quality of life, so there is considerable interest in understanding and predicting it. As it is a mental disorder, neural measures like EEG are used to study and understand its underlying mechanisms. However, most of these studies have explored either resting-state EEG (rs-EEG) data or task-based EEG data, but not both; we seek to compare their respective efficacy. We work with data from non-clinically depressed individuals who score higher and lower on the depression scale and hence are more and less vulnerable to depression, respectively. Forty participants volunteered for the study, from whom questionnaires and EEG data were collected. We found that people who are more vulnerable to depression had, on average, increased EEG amplitude in the left frontal channel and decreased amplitude in the right frontal and occipital channels for raw rs-EEG data. For task-based EEG data from a sustained attention to response task used to measure spontaneous thinking, we found an increased EEG amplitude in the central part of the brain for individuals with low vulnerability and an increased EEG amplitude in the right temporal, occipital and parietal regions for individuals more vulnerable to depression. In an attempt to predict vulnerability (high/low) to depression, we found that a Long Short-Term Memory model gave the maximum accuracy of 91.42% in the delta band for task-based data, whereas a 1D convolutional neural network gave the maximum accuracy of 98.06% on raw rs-EEG data. Hence, on the primary question of which data are better for predicting vulnerability to depression, rs-EEG seems better than task-based EEG data. However, if mechanisms driving depression such as rumination or stickiness are to be understood, task-based data may be more effective.
Furthermore, as there is no consensus as to which biomarker of rs-EEG is more effective in the detection of MDD, we also experimented with evolutionary algorithms to find the most informative subset of these biomarkers. Higuchi fractal dimension, phase lag index, correlation and coherence features were also found to be the most important features for predicting vulnerability to depression using rs-EEG. These findings bring up new possibilities for EEG-based machine/deep learning diagnostics in the future.
INTRODUCTION Schizophrenia and Major Depressive Disorder (MDD) are highly burdensome mental disorders, with significant cost to both individuals and society. Despite these disorders representing distinct clinical categories, they are each heterogeneous in their symptom profiles, with considerable transdiagnostic features. Although movement and sleep abnormalities exist in both disorders, little is known of the precise nature of these changes longitudinally. Passively-collected longitudinal data from wearable sensors is well suited to characterize naturalistic features which may cross traditional diagnostic categories (e.g., highlighting behavioral markers not captured by self-report information). METHODS The present analyses utilized raw minute-level actigraphy data from three diagnostic groups: individuals with schizophrenia (N = 23), individuals with depression (N = 22), and controls (N = 32), to interrogate naturalistic behavioral differences between groups. Subjects' week-long actigraphy data was processed without diagnostic labels via unsupervised machine learning clustering methods, in order to investigate the natural bounds of psychopathology. Further, actigraphic data was analyzed across time to determine timepoints influential in model outcomes. RESULTS We find distinct actigraphic phenotypes, which differ between diagnostic groups, suggesting that unsupervised clustering of naturalistic data aligns with existing diagnostic constructs. Further, we found statistically significant inter-group differences, with depressed persons showing the highest behavioral variability. LIMITATIONS However, diagnostic group differences only consider biobehavioral trends captured by raw actigraphy information.
CONCLUSIONS Passively-collected movement information combined with unsupervised deep learning algorithms shows promise in identifying naturalistic phenotypes in individuals with mental health disorders, specifically in discriminating between MDD and schizophrenia.
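The unsupervised clustering of unlabeled actigraphy profiles can be sketched with a minimal k-means on toy daily activity vectors; the paper's actual features, cluster count, and algorithm may differ:

```python
import numpy as np

def kmeans(x, k, iters=50):
    """Minimal k-means — a stand-in for the unsupervised clustering applied
    to week-long actigraphy profiles (no diagnostic labels used). Naive
    initialization from evenly spaced samples, fine for this toy example."""
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # assign each profile to its nearest center, then recompute centers
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
# toy 24-bin daily activity profiles: a low-activity and a high-activity group
low = rng.normal(0.2, 0.05, size=(10, 24))
high = rng.normal(0.8, 0.05, size=(10, 24))
x = np.vstack([low, high])
labels, _ = kmeans(x, k=2)
print(labels)  # the two activity groups separate into the two clusters
```

The point of the paper's design is that cluster membership is computed blind to diagnosis and only afterwards compared against the diagnostic groups.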
The application of deep learning classifiers to resting-state electroencephalography (rs-EEG) data has become increasingly common. However, relative to studies using traditional machine learning methods and extracted features, deep learning methods are less explainable. A growing number of studies have presented explainability approaches for rs-EEG deep learning classifiers. However, to our knowledge, no approaches give insight into spatio-spectral interactions (i.e., how spectral activity in one channel may interact with activity in other channels). In this study, we combine gradient and perturbation-based explainability approaches to give insight into spatio-spectral interactions in rs-EEG deep learning classifiers for the first time. We present the approach within the context of major depressive disorder (MDD) diagnosis identifying differences in frontal δ activity and reduced interactions between frontal electrodes and other electrodes. Our approach provides novel insights and represents a significant step forward for the field of explainable EEG classification.
A less-invasive method for diagnosing major depressive disorder can be useful for both psychiatrists and patients. We propose a machine learning framework for automatically discriminating between patients suffering from major depressive disorder (n = 14) and healthy subjects (n = 17). To this end, spontaneous physical activity data were recorded via a watch-type computer device worn by the participants in their daily lives. Two machine learning models are investigated and compared: support vector machines and deep recurrent neural networks. Experimental results show that both methods, i.e., the static model fed with human hand-crafted features and the sequential model fed with raw data, can reach promising performance, with unweighted average recalls of 76.0% and 56.3%, respectively.
In recent years, machine learning has been increasingly applied to mental health diagnosis, treatment, support, research, and clinical administration. In particular, using less-invasive wearables combined with artificial intelligence to monitor or diagnose mental diseases is in great demand in real practice. To this end, we propose a novel approach for automatic detection of major depressive disorder. First, spontaneous physical activity data are recorded by a watch-type device equipped with an activity monitor. Subsequently, a bag-of-behaviour-words approach is applied to extract higher-level representations from the raw sensor data in an unsupervised scenario. Finally, a support vector machine is selected as the classifier to make predictions for screening major depressive disorder. There are 69 healthy control subjects and 14 major depressive disorder patients involved in this study. The experimental results demonstrate the effectiveness of the proposed method in a rigorous subject-independent test, achieving an unweighted average recall of 59.3% (an accuracy of 66.0%). This unweighted average recall significantly (p < .05, one-tailed z-test) outperforms human hand-crafted features, which achieve an unweighted average recall of 53.6% (an accuracy of 61.7%).
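The bag-of-behaviour-words step can be sketched as vector quantization against a codebook followed by a histogram; in the paper the codebook would be learned unsupervised (e.g. by k-means), whereas here it is fixed by hand for illustration:

```python
import numpy as np

def bag_of_behaviour_words(frames, codebook):
    """Quantize each frame of raw sensor data to its nearest codeword and
    return the normalized histogram of codeword counts (the 'bag of
    behaviour words'). The codebook here is hand-fixed for illustration."""
    # squared distance from every frame to every codeword
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0],   # word 0: resting
                     [1.0, 1.0],   # word 1: light movement
                     [3.0, 3.0]])  # word 2: vigorous movement
frames = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [2.8, 3.1]])
print(bag_of_behaviour_words(frames, codebook))  # → histogram [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram per subject is what the SVM consumes, regardless of how long the raw recording was.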
To achieve an objective depression diagnosis, numerous studies based on machine learning and deep learning using electroencephalogram (EEG) data have been conducted. Most studies depend on one-dimensional raw data and require fine-grained feature extraction. To address this, in the EEG visualization research field, short-time Fourier transform (STFT), wavelet, and coherence representations are commonly used as methods for transforming EEG data into 2D images. However, we devised a new approach based on the observation that EEG asymmetry is considered one of the major biomarkers of depression. This study proposes a deep-asymmetry methodology that converts the EEG's asymmetry feature into a matrix image and uses it as input to a convolutional neural network. The asymmetry matrix image in the alpha band achieved 98.85% accuracy and outperformed most of the methods presented in previous studies. This study indicates that the proposed method can be an effective tool for pre-screening major depressive disorder patients.
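A plausible reading of the asymmetry-matrix construction is a pairwise channel asymmetry index arranged as an image; the (Pi − Pj)/(Pi + Pj) form below is the standard EEG asymmetry index, and the paper's exact matrix may differ:

```python
import numpy as np

def asymmetry_matrix(band_power):
    """Pairwise channel asymmetry image: entry (i, j) is the normalized
    difference in band power between channels i and j. This uses the common
    (Pi - Pj) / (Pi + Pj) asymmetry index; the paper's exact construction
    is an assumption here."""
    p = np.asarray(band_power, dtype=float)
    return (p[:, None] - p[None, :]) / (p[:, None] + p[None, :] + 1e-12)

alpha_power = [4.0, 2.0, 1.0, 1.0]  # toy alpha-band power for 4 channels
m = asymmetry_matrix(alpha_power)
print(m.shape)             # (4, 4): ready to feed a CNN as a 1-channel image
print(round(m[0, 1], 3))   # (4 - 2) / (4 + 2) = 0.333
```

The matrix is antisymmetric with a zero diagonal, so the CNN effectively sees every left/right power imbalance between channel pairs at once.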
Major depressive disorder (MDD) is a global issue, and the number of people suffering from it increases at an alarming rate every year. The role of electroencephalography (EEG) in diagnosing MDD has shown great potential. Many studies have been carried out to design an automated approach to MDD diagnosis with EEG as the primary tool of analysis. However, most of these methodologies depend on machine learning and deep neural network tools, which rely heavily on annotated EEG signals for training and thus require trained professionals for data generation. In addition, the time and memory complexity of their implementations is large. Given these challenges, this article designs an approach for the detection of MDD using spectral clustering. The raw EEG is pre-processed, and then three band-power biomarkers (delta, beta, and theta) and three non-linear features are extracted from the raw EEG signals. Channel-wise and hemisphere-wise analyses have been conducted to understand the correlation and dependence across hemispheres. The efficiency and effectiveness of the solution are tested and validated against other existing designs.
The promise of machine learning has fueled the hope for developing diagnostic tools for psychiatry. Initial studies showed high accuracy for the identification of major depressive disorder (MDD) with resting-state connectivity, but progress has been hampered by the absence of large datasets. Here we used regular machine learning and advanced deep learning algorithms to differentiate patients with MDD from healthy controls and identify neurophysiological signatures of depression in two of the largest resting-state datasets for MDD. We obtained resting-state functional magnetic resonance imaging data from the REST-meta-MDD (N = 2338) and PsyMRI (N = 1039) consortia. Classification of functional connectivity matrices was done using support vector machines (SVM) and graph convolutional neural networks (GCN), and performance was evaluated using 5-fold cross-validation. Features were visualized using GCN-Explainer, an ablation study and univariate t-testing. The results showed a mean classification accuracy of 61% for MDD versus controls. Mean accuracy for classifying (non-)medicated subgroups was 62%. Sex classification accuracy was substantially better across datasets (73–81%). Visualization of the results showed that classifications were driven by stronger thalamic connections in both datasets, while nearly all other connections were weaker with small univariate effect sizes. These results suggest that whole brain resting-state connectivity is a reliable though poor biomarker for MDD, presumably due to disease heterogeneity as further supported by the higher accuracy for sex classification using the same methods. Deep learning revealed thalamic hyperconnectivity as a prominent neurophysiological signature of depression in both multicenter studies, which may guide the development of biomarkers in future studies.
OBJECTIVE Distinguishing major depressive disorder (MDD) from bipolar disorder (BD) is a crucial clinical challenge as effective treatment is quite different for each condition. In this study electroencephalography (EEG) was explored as an objective biomarker for distinguishing MDD from BD using an efficient machine learning algorithm (MLA) trained by a relatively large and balanced dataset. METHODS A 3 step MLA was applied: (1) a multi-step preprocessing method was used to improve the quality of the EEG signal, (2) symbolic transfer entropy (STE), an effective connectivity measure, was applied to the resultant EEG and (3) the MLA used the extracted STE features to distinguish MDD (N = 71) from BD (N = 71) subjects. RESULTS 14 connectivity features were selected by the proposed algorithm. Most of the selected features were related to the frontal, parietal, and temporal lobe electrodes. The major involved regions were the Broca region in the frontal lobe and the somatosensory association cortex in the parietal lobe. These regions are near electrodes FC5 and CPz and are involved in processing language and sensory information, respectively. The resulting classifier delivered an evaluation accuracy of 88.5% and a test accuracy of 89.3%, using 80% of the data for training and evaluation and the remaining 20% for testing, respectively. CONCLUSIONS The high evaluation and test accuracies of our algorithm, derived from a large balanced training sample suggests that this method may hold significant promise as a clinical tool. SIGNIFICANCE The proposed MLA may provide an inexpensive and readily available tool that clinicians may use to enhance diagnostic accuracy and shorten time to effective treatment.
Background Major depressive disorder (MDD) is a common mental illness characterized by persistent sadness and a loss of interest in activities. Using smartphones and wearable devices to monitor the mental condition of patients with MDD has been examined in several studies. However, few studies have used passively collected data to monitor mood changes over time. Objective The aim of this study is to examine the feasibility of monitoring mood status and stability of patients with MDD using machine learning models trained by passively collected data, including phone use data, sleep data, and step count data. Methods We constructed 950 data samples representing time spans during three consecutive Patient Health Questionnaire-9 assessments. Each data sample was labeled as Steady or Mood Swing, with subgroups Steady-remission, Steady-depressed, Mood Swing-drastic, and Mood Swing-moderate based on patients’ Patient Health Questionnaire-9 scores from three visits. A total of 252 features were extracted, and 4 feature selection models were applied; 6 different combinations of types of data were experimented with using 6 different machine learning models. Results A total of 334 participants with MDD were enrolled in this study. The highest average accuracy of classification between Steady and Mood Swing was 76.67% (SD 8.47%) and that of recall was 90.44% (SD 6.93%), with features from all types of data being used. Among the 6 combinations of types of data we experimented with, the overall best combination was using call logs, sleep data, step count data, and heart rate data. The accuracies of predicting between Steady-remission and Mood Swing-drastic, Steady-remission and Mood Swing-moderate, and Steady-depressed and Mood Swing-drastic were over 80%, and the accuracy of predicting between Steady-depressed and Mood Swing-moderate and the overall Steady to Mood Swing classification accuracy were over 75%. 
Comparing all 6 aforementioned combinations, we found that the overall prediction accuracies between Steady-remission and Mood Swing (drastic and moderate) are better than those between Steady-depressed and Mood Swing (drastic and moderate). Conclusions Our proposed method could be used to monitor mood changes in patients with MDD with promising accuracy by using passively collected data, which can be used as a reference by doctors for adjusting treatment plans or for warning patients and their guardians of a relapse. Trial Registration Chinese Clinical Trial Registry ChiCTR1900021461; http://www.chictr.org.cn/showprojen.aspx?proj=36173
Depression is one of the most common mental disorders. Therefore, the development of new methods for early diagnosis of depression is a highly relevant task. It is well known that depression has both a genetic predisposition and greatly depends on the patient's life background. Therefore, the analysis of only genetic markers of depression is usually unsuccessful, because it does not take into account the physiological state of a person at the time of examination. The aim of our study was to develop an algorithm for the joint analysis of a collection of genetic and neurophysiological data collected from healthy people and patients with depression to identify genetic markers and neurophysiological correlates of pathology. As EEG indicators, we considered the amplitudes of evoked potentials under the conditions of performing tasks in the stop-signal paradigm. We applied machine learning algorithms that allow us to identify single nucleotide polymorphisms associated with the risk of depression.
No abstract available
The statistical method Principal Component Analysis (PCA), applied to psychological questionnaires and EEG spectral characteristics, was used to reduce the complexity of the data by finding the most important variables for separating healthy participants from participants with major depressive disorder recruited at the clinic of the State Research Institute of Neurosciences and Medicine. The basic idea of PCA is to identify data patterns that are otherwise difficult to see. In this article, PCA was used for disease classification: we applied PCA to reduce the data dimensionality and uncover hidden patterns or relationships between the features, and then trained the classification model on the reduced data to distinguish between healthy participants and participants with major depressive disorder. The study aimed to investigate the relationship between psychological factors and EEG spectral characteristics in participants with major depressive disorder. The study found a significant correlation between the severity of depression and the EEG spectral characteristics. These findings suggest that the combination of psychological questionnaires and EEG spectral characteristics could be a useful tool for the diagnosis and classification of major depressive disorder.
The correct and early identification of Major Depressive Disorder (MDD) is challenging because of its complicated causes and symptoms that are similar to those of other mental diseases. This study introduces a new method for forecasting and categorizing MDD by utilizing EEG data, with a specific emphasis on Valence and Arousal measurements. Our approach combines sophisticated methods for extracting and selecting features with deep-learning models. Firstly, EEG signals are subjected to preprocessing in order to eliminate artifacts and standardize the quality of the data. The Boruta algorithm is utilized to extract features, which are then selected using the improved Cuckoo Search Algorithm. The suggested model, referred to as EEG-VARNet, utilizes two CNN models (Xception V4) and an SVM classifier to capture both spatial and temporal dependencies present in the EEG data. The model undergoes hyperparameter adjustment to maximize performance. The results of our study show substantial enhancements in the prediction of MDD, with an accuracy rate of 98.2%, a precision rate of 96.43%, and a recall rate of 97.89%. The measurements demonstrate the model's exceptional ability to diagnose MDD, emphasizing its promise as a dependable tool for clinical use.
Summary Background Early diagnosis of major depressive disorder (MDD) could enable timely interventions and effective management, which subsequently improves clinical outcomes. However, quantitative and objective assessment tools for suspected cases who present with depressive symptoms have not been fully established. Methods Based on a large-scale dataset (n = 363 subjects) collected with functional near-infrared spectroscopy (fNIRS) measurements during the verbal fluency task (VFT), this study proposed a data representation method for extracting spatiotemporal characteristics of NIRS signals, which emerged as candidate predictors in a two-phase machine learning framework to detect distinctive biomarkers for MDD. Supervised classifiers (e.g., support vector machine (SVM), k-nearest neighbors (KNN)) combined with cross-validation were implemented to evaluate the predictive capability of selected features in a training set. Another test set that was not involved in developing the algorithms enabled the independent assessment of the model's generalization. Findings For the classification with the optimal fusion features, the SVM classifier achieved the highest accuracy of 75.6% ± 4.7% in the nested cross-validation, and a correct prediction rate of 78.0% with a sensitivity of 75.0% and a specificity of 81.4% in the test set. Moreover, the multiway ANOVA test on clinical and demographic factors confirmed that twenty out of 39 optimal features were significantly correlated with the MDD-distinctive consequence. Interpretation The abnormal prefrontal activity of MDD may be quantified as diminished relative intensity and inappropriate activation timing of the hemodynamic response, resulting in an objectively measurable biomarker for assessing cognitive deficits and screening MDD at the early stage. Funding This study was funded by NUS iHeathtech Other Operating Expenses (R-722-000-004-731).
Diagnosis of major depressive disorder (MDD) using resting-state functional connectivity (rs-FC) data faces many challenges, such as high dimensionality, small sample sizes, and individual differences. To assess the clinical value of rs-FC in MDD and identify the potential rs-FC machine learning (ML) model for the individualized diagnosis of MDD, based on the rs-FC data, a progressive three-step ML analysis was performed, including six different ML algorithms and two dimension reduction methods, to investigate the classification performance of ML models in a multicentral, large sample dataset [1021 MDD patients and 1100 normal controls (NCs)]. Furthermore, a linear least-squares fitted regression model was used to assess the relationships between rs-FC features and the severity of clinical symptoms in MDD patients. Among the ML methods used, the rs-FC model constructed by the eXtreme Gradient Boosting (XGBoost) method showed the optimal classification performance for distinguishing MDD patients from NCs at the individual level (accuracy = 0.728, sensitivity = 0.720, specificity = 0.739, area under the curve = 0.831). Meanwhile, rs-FCs identified by the XGBoost model were primarily distributed within and between the default mode network, limbic network, and visual network. More importantly, the individual 17-item Hamilton Depression Scale scores of MDD patients could be accurately predicted using rs-FC features identified by the XGBoost model (adjusted R2 = 0.180, root mean squared error = 0.946). The XGBoost model using rs-FCs showed the optimal classification performance between MDD patients and NCs, with good generalization and neuroscientific interpretability.
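The symptom-severity step of this study (a linear least-squares regression from connectivity features to HAMD-17 scores, reported via adjusted R² and RMSE) can be sketched with synthetic stand-ins for the rs-FC features and scores:

```python
import numpy as np

# Fit a linear least-squares model from synthetic "connectivity"
# features to a synthetic severity score and report adjusted R^2 and
# RMSE, mirroring the metrics quoted in the abstract.
rng = np.random.default_rng(1)
n, p = 200, 10                  # 200 patients, 10 selected rs-FC features
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=1.0, size=n)   # HAMD-17-like target

Xd = np.column_stack([np.ones(n), X])               # add intercept column
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)       # penalise feature count
rmse = np.sqrt(ss_res / n)
print(round(adj_r2, 3), round(rmse, 3))
```

Adjusted R² discounts the gain that extra features would otherwise contribute by chance, which matters when the feature set is selected by a prior model as in this study.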
No abstract available
No abstract available
BACKGROUND Suicidal behavior is a major concern for patients who suffer from major depressive disorder (MDD), especially among adolescents and young adults. Machine learning models with the capability of suicide risk identification at an individual level could improve suicide prevention among high-risk patient populations. METHODS A cross-sectional assessment was conducted on a sample of 66 adolescents/young adults diagnosed with MDD. The structural T1-weighted MRI scan of each subject was processed using the FreeSurfer software. The classification model was built using the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm to distinguish suicide attempters from patients with suicidal ideation but without attempts. RESULTS The SVM model was able to correctly identify suicide attempters and patients with suicidal ideation but without attempts with a cross-validated prediction balanced accuracy of 78.59%; the sensitivity was 73.17% and the specificity was 84.0%. The positive predictive value of suicide attempt was 88.24%, and the negative predictive value was 65.63%. Right lateral orbitofrontal thickness, left caudal anterior cingulate thickness, left fusiform thickness, left temporal pole volume, right rostral anterior cingulate volume, left lateral orbitofrontal thickness, left posterior cingulate thickness, right pars orbitalis thickness, right posterior cingulate thickness, and left medial orbitofrontal thickness were the 10 top-ranked classifiers for suicide attempt. CONCLUSIONS The findings indicated that structural MRI data can be useful for the classification of suicide risk. The algorithm developed in the current study may help identify suicide attempt risk among MDD patients.
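The recursive feature elimination (RFE) idea behind SVM-RFE can be sketched in a few lines. A least-squares linear classifier stands in here for the linear SVM, and the "cortical thickness" features are synthetic, so this is a structural sketch rather than the study's implementation:

```python
import numpy as np

# RFE sketch: repeatedly fit a linear model and drop the feature with
# the smallest absolute weight until the desired count remains.
rng = np.random.default_rng(2)
n, p = 66, 12
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]                  # 3 informative features
y = np.sign(X @ w_true + rng.normal(scale=0.5, size=n))

remaining = list(range(p))
while len(remaining) > 3:                      # keep the top-3 ranked features
    Xs = X[:, remaining]
    Xd = np.column_stack([np.ones(n), Xs])
    w, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    drop = int(np.argmin(np.abs(w[1:])))       # weakest feature this round
    remaining.pop(drop)

print(sorted(remaining))
```

The key property RFE exploits, as in the abstract's top-10 ranking of morphometric features, is that refitting after each removal lets correlated features re-rank rather than being scored once.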
No abstract available
BACKGROUND Prediction of the therapeutic outcome of repetitive transcranial magnetic stimulation (rTMS) treatment is an important goal that eliminates the financial and psychological consequences of applying inefficient therapy. To achieve this goal we proposed a method based on machine learning to classify responders (R) and non-responders (NR) to rTMS treatment for major depressive disorder (MDD) patients. METHODS Nineteen-electrode resting-state EEG was recorded from 46 MDD patients before treatment. The patients then underwent 7 weeks of rTMS, and 23 of them responded to treatment. Features extracted from EEG included Lempel-Ziv complexity (LZC), Katz fractal dimension (KFD), correlation dimension (CD), the power spectral density, features based on the bispectrum, frontal and prefrontal cordance, and combinations of them. The most relevant features were selected by the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm. For classifying the two groups of R and NR, k-nearest neighbors (KNN) was applied. The performance of the proposed method was evaluated by leave-one-out cross-validation. For further study, the capability of features in differentiating R and NR was investigated by a statistical test. RESULTS Effective EEG features for prediction of rTMS treatment response were found. EEG beta power, the sum of bispectrum diagonal elements in the delta and beta bands, and CD were the most discriminative features. Beta power classified R and NR with a high performance of 91.3% accuracy, 91.3% specificity, and 91.3% sensitivity. LIMITATIONS The lack of a large sample size restricts our method from use in clinical applications. CONCLUSION This considerably high accuracy indicates that our proposed method with power and some of the nonlinear and bispectral features can lead to promising results in predicting the treatment outcome of rTMS for MDD patients with only one pretreatment EEG recording session.
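The leave-one-out evaluation of a KNN classifier used in this study can be sketched as follows. The EEG features are synthetic placeholders (not real cordance/bispectrum values), and k = 3 is an illustrative choice:

```python
import numpy as np

# Leave-one-out cross-validation of k-nearest neighbours: each patient
# is held out in turn and classified by majority vote of its k closest
# neighbours among the remaining patients.
rng = np.random.default_rng(3)
n_per, p, k = 23, 6, 3                       # 23 R + 23 NR patients, k = 3
X = np.vstack([rng.normal(0.0, 1.0, (n_per, p)),
               rng.normal(2.0, 1.0, (n_per, p))])
y = np.array([0] * n_per + [1] * n_per)

correct = 0
for i in range(len(y)):                      # hold out one patient at a time
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                            # exclude the held-out sample
    nn = np.argsort(d)[:k]
    vote = int(y[nn].sum() > k / 2)          # majority vote of k neighbours
    correct += (vote == y[i])

acc = correct / len(y)
print(round(acc, 3))
```

Leave-one-out is the natural choice at this sample size (n = 46), since it maximises the training data available for each prediction.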
Major depressive disorder (MDD) has been considered a severe and common ailment with effects on functional frailty, while its clear manifestations are shrouded in mystery. Hence, manual detection of MDD is a challenging and subjective task. Although Electroencephalogram (EEG) signals have shown promise in aiding diagnosis, further enhancement is required to improve accuracy, clinical utility, and efficiency. This study focuses on the automated detection of MDD using EEG data and deep neural network architecture. For this aim, first, a customized InceptionTime model is recruited to detect MDD individuals via 19-channel raw EEG signals. Then a channel-selection strategy, which comprises three channel-selection steps, is conducted to omit redundant channels. The proposed method achieved 91.67% accuracy using the full set of channels and 87.5% after channel reduction. Our analysis shows that i) only the first minute of EEG recording is sufficient for MDD detection, ii) models based on EEG recorded in eyes-closed resting-state outperform eyes-open conditions, and iii) customizing the InceptionTime model can improve its efficiency for different assignments. The proposed method is able to help clinicians as an efficient, straightforward, and intelligent diagnostic tool for the objective detection of MDD.
Background: Neuroimaging-based diagnostic approaches are of critical importance for the accurate diagnosis and treatment of major depressive disorder (MDD). However, multisite neuroimaging data often exhibit substantial heterogeneity in terms of scanner protocols and population characteristics. Moreover, concerns over data ownership, security, and privacy make raw MRI datasets from multiple sites inaccessible, posing significant challenges to the development of robust diagnostic models. Federated learning (FL) offers a privacy-preserving solution to facilitate collaborative model training across sites without sharing raw data. Methods: In this study, we propose the personalized Federated Gradient Matching and Contrastive Optimization (pF-GMCO) algorithm to address domain shift and support scalable MDD classification using multimodal MRI. Our method incorporates gradient matching based on cosine similarity to weight contributions from different sites adaptively, contrastive learning to promote client-specific model optimization, and multimodal compact bilinear (MCB) pooling to effectively integrate structural MRI (sMRI) and functional MRI (fMRI) features. Results and Conclusions: Evaluated on the Rest-Meta-MDD dataset with 2293 subjects from 23 sites, pF-GMCO achieved accuracy of 79.07%, demonstrating superior performance and interpretability. This work provides an effective and privacy-aware framework for multisite MDD diagnosis using federated learning.
Background/Objectives: This study investigates the classification of Major Depressive Disorder (MDD) using electroencephalography (EEG) Short-Time Fourier-Transform (STFT) spectrograms and audio Mel-spectrogram data of 52 subjects. The objective is to develop a multimodal classification model that integrates audio and EEG data to accurately identify depressive tendencies. Methods: We utilized the Multimodal open dataset for Mental Disorder Analysis (MODMA) and trained a pre-trained DenseNet121 model using transfer learning. Features from both the EEG and audio modalities were extracted and concatenated before being passed through the final classification layer. Additionally, an ablation study was conducted on both datasets separately. Results: The proposed multimodal classification model demonstrated superior performance compared to existing methods, achieving an Accuracy of 97.53%, Precision of 98.20%, F1 Score of 97.76%, and Recall of 97.32%. A confusion matrix was also used to evaluate the model’s effectiveness. Conclusions: The paper presents a robust multimodal classification approach that outperforms state-of-the-art methods with potential application in clinical diagnostics for depression assessment.
Depression is one of the most common mental health disorders and has been a major focus of research, particularly through the lens of automated diagnostic methods. While many studies have explored magnetic resonance imaging techniques separately, the integration of multiple neuroimaging modalities has received less attention. To address this gap, we introduce a multimodal automatic classification method that leverages both resting-state functional magnetic resonance imaging and structural magnetic resonance imaging. Our approach employs a multi-stream 3D Convolutional Neural Network model to facilitate joint training on diverse features extracted from rs-fMRI and sMRI data. By classifying a combined group of 830 MDD patients and 771 normal controls from the REST-meta-MDD dataset, our model achieves an impressive accuracy of 69.38% using a feature combination of CSF, REHO, and fALFF. This result signifies a notable enhancement in classification performance, contributing valuable insights into the capabilities of multimodal imaging in MDD diagnosis.
In this day and age, depression is still one of the biggest problems in the world. If left untreated, it can lead to suicidal thoughts and attempts. There is a need for proper diagnoses of Major Depressive Disorder (MDD) and evaluation of the early stages to stop the side effects. Early detection is critical to identify a variety of serious conditions. In order to provide safe and effective protection to MDD patients, it is crucial to automate diagnoses and make decision-making tools widely available. Although there are various classification systems for the diagnosis of MDD, no reliable, secure method that meets these requirements has been established to date. In this paper, a federated deep learning-based multimodal system for MDD classification using electroencephalography (EEG) and audio datasets is presented while meeting data privacy requirements. The performance of the federated learning (FL) model was tested on independent and identically distributed (IID) and non-IID data. The study began by extracting features from several pre-trained models and ultimately decided to use bidirectional long short-term memory (Bi-LSTM) as the base model, as it had the highest validation accuracy of 91% compared to a convolutional neural network and LSTM with 85% and 89% validation accuracy on audio data, respectively. The Bi-LSTM model also achieved a validation accuracy of 98.9% for EEG data. The FL method was then used to perform experiments on IID and non-IID datasets. The FL-based multimodal model achieved an exceptional training and validation accuracy of 99.9% when trained and evaluated on both IID and non-IID datasets. These results show that the FL multimodal system performs almost as well as the Bi-LSTM multimodal system and emphasize its suitability for processing IID and non-IID data. Several clients were found to perform better than conventional pre-trained models in a multimodal framework for federated learning using EEG and audio datasets.
The proposed framework stands out from other classification techniques for MDD due to its special features, such as multimodality and data privacy for edge machines with limited resources. Due to these additional features, the framework concept is the most suitable alternative approach for the early classification of MDD patients.
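The federated averaging principle behind this framework can be illustrated with a toy example. A least-squares linear model stands in for the Bi-LSTM, and client data are synthetic; the point is only that raw data never leave the clients, while the server averages model weights:

```python
import numpy as np

# FedAvg toy sketch: each client fits a local model on its private data
# and sends only the weights; the server computes a sample-size-weighted
# average. All data are synthetic.
rng = np.random.default_rng(4)
w_true = np.array([1.0, -2.0, 0.5])

def local_fit(n):
    """One client's private data and its least-squares local update."""
    X = rng.normal(size=(n, 3))
    y = X @ w_true + rng.normal(scale=0.1, size=n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, n

# Three clients with different sample sizes
updates = [local_fit(n) for n in (50, 80, 120)]

# Server: sample-size-weighted average of client weights (FedAvg)
total = sum(n for _, n in updates)
w_global = sum(w * n for w, n in updates) / total
print(np.round(w_global, 2))
```

Weighting by client sample size is the standard FedAvg rule; handling genuinely non-IID clients, as the study above evaluates, requires additional care beyond this sketch.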
How to achieve a high-precision suicide attempt classifier based on the three-dimensional psychological pain model is a valuable issue in suicide research. The aim of the present study is to explore the importance of pain avoidance and its related neural features in suicide attempt classification models among patients with major depressive disorder. By recursive feature elimination with cross-validation and support-vector-machine algorithms, scores from the measurements and the task-based EEG signals were chosen to achieve a suicide attempt classification model. In the multimodal suicide attempt classifier, with an accuracy of 83.91% and an area under the curve of 0.90, pain avoidance ranked first in the optimal feature set. Theta (reward positive feedback minus neutral positive feedback) was the shared neural representation ranking first among the event-related potential features in the pain avoidance and suicide attempt classifiers. In conclusion, the suicide attempt classifier based on pain avoidance and its related affective processing neural features has excellent accuracy among patients with major depressive disorder. Pain avoidance is a stable and strong indicator for identifying suicide risks in both traditional analyses and machine-learning approaches. A novel methodology is needed to clarify the relationship between cognitive and affective processing evoked by punishment stimuli and pain avoidance.
The subtype diagnosis and severity classification of mood disorders have traditionally been made through the judgment of psychiatrists aided by validated assessment tools. Recently, however, many studies have used biomarker data collected from subjects to assist in diagnosis; most of these use heart rate variability (HRV) data, collected to assess the balance of the autonomic nervous system, and perform classification through statistical analysis. In this research, three mood disorder severity or subtype classification algorithms are presented through multimodal analysis of the collected heart-related data variables and hidden features from the time- and frequency-domain variables of HRV. Comparing the classification performance of the statistical analysis widely used in existing major depressive disorder (MDD), anxiety disorder (AD), and bipolar disorder (BD) classification studies with the multimodal deep neural network analysis newly proposed in this study, it was confirmed that the severity or subtype classification accuracy for each disease improved by 0.118, 0.231, and 0.125 on average. Through the study, it was confirmed that deep learning analysis of biomarker data such as HRV can be applied as a primary identification and diagnosis aid for mental diseases, and that it can help psychiatrists diagnose objectively in that it can confirm not only the diagnosed disease but also the current mood status.
BACKGROUND Given that major depressive disorder (MDD) is both biologically and clinically heterogeneous, a diagnostic system integrating neurobiological markers and clinical characteristics would allow for better diagnostic accuracy and, consequently, treatment efficacy. OBJECTIVE Our study aimed to evaluate the discriminative and predictive ability of unimodal, bimodal, and multimodal approaches in a total of seven machine learning (ML) models-clinical, demographic, functional near-infrared spectroscopy (fNIRS), combinations of two unimodal models, as well as a combination of all three-for MDD. METHODS We recruited 65 adults with MDD and 68 matched healthy controls, who provided both sociodemographic and clinical information, and completed the HAM-D questionnaire. They were also subject to fNIRS measurement when participating in the verbal fluency task. Using the nested cross validation procedure, the classification performance of each ML model was evaluated based on the area under the receiver operating characteristic curve (ROC), balanced accuracy, sensitivity, and specificity. RESULTS The multimodal ML model was able to distinguish between depressed patients and healthy controls with the highest balanced accuracy of 87.98 ± 8.84% (AUC = 0.92; 95% CI (0.84-0.99) when compared with the uni- and bi-modal models. CONCLUSIONS Our multimodal ML model demonstrated the highest diagnostic accuracy for MDD. This reinforces the biological and clinical heterogeneity of MDD and highlights the potential of this model to improve MDD diagnosis rates. Furthermore, this model is cost-effective and clinically applicable enough to be established as a robust diagnostic system for MDD based on patients' biosignatures.
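The nested cross-validation procedure used in this and several of the studies above can be sketched at the index level. Fold counts and sample size are illustrative, and model fitting/hyperparameter scoring are abstracted away; the sketch shows only the split logic that keeps the outer test fold untouched during model selection:

```python
import numpy as np

# Nested CV index logic: an outer loop estimates generalisation while
# an inner loop, run only on the outer-training folds, would pick a
# hyperparameter.
rng = np.random.default_rng(9)
n, outer_k, inner_k = 30, 5, 3
idx = rng.permutation(n)
outer_folds = np.array_split(idx, outer_k)

for i, test_idx in enumerate(outer_folds):
    train_idx = np.concatenate([f for j, f in enumerate(outer_folds) if j != i])
    inner_folds = np.array_split(train_idx, inner_k)
    # inner loop: each candidate hyperparameter would be scored here
    for val_idx in inner_folds:
        fit_idx = np.setdiff1d(train_idx, val_idx)
        assert np.intersect1d(fit_idx, val_idx).size == 0
    # the outer test fold is never touched during inner selection
    assert np.intersect1d(train_idx, test_idx).size == 0

print("outer folds:", [len(f) for f in outer_folds])
```

This separation is what makes the reported balanced accuracy an estimate of out-of-sample performance rather than of the tuning procedure's optimism.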
Depression is common and dangerous if untreated. We must detect depression patterns early and accurately to provide timely interventions and assistance. We present a novel depression prediction method (depressive-deep), which combines preprocessed brain electroencephalogram (EEG) and ECG-based heart-rate variability (HRV) signals into a 2D scalogram. We then extracted features from the 2D scalogram images using a fine-tuned MobileNetV2 deep learning (DL) architecture and integrated an AdaBoost ensemble learning algorithm to improve the model's performance. Our study suggests ensemble learning can accurately predict asymmetric and symmetric depression patterns from multimodal signals such as EEG and ECG. These patterns include major depressive state (MDS), cognitive and emotional arousal (CEA), mood disorder patterns (MDPs), mood and emotional regulation (MER), and stress and emotional dysregulation (SED). To develop this depressive-deep model, we performed a pre-training strategy on two publicly available datasets, MODMA and SWEEL-KW. The sensitivity (SE), specificity (SP), accuracy (ACC), F1-score, precision (P), Matthew's correlation coefficient (MCC), and area under the curve (AUC) were analyzed to determine the best depression prediction model. Moreover, we used wearable devices over the Internet of Medical Things (IoMT) to extract signals and check the depressive-deep system's generalizability. To ensure model robustness, we used several assessment criteria, including cross-validation. The depressive-deep and feature extraction strategies outperformed the other methods in depression prediction, obtaining an ACC of 0.96, SE of 0.98, SP of 0.95, P of 0.95, F1-score of 0.96, and MCC of 0.96. The main findings suggest that the 2D scalogram and depressive-deep (fine-tuned MobileNetV2 + AdaBoost) algorithms outperform existing approaches in detecting early depression, improving mental health diagnosis and treatment.
No abstract available
Objective: Psychiatric evaluation suffers from subjectivity and bias, and is hard to scale due to intensive professional training requirements. In this work, we investigated whether behavioral and physiological signals, extracted from tele-video interviews, differ in individuals with psychiatric disorders. Methods: Temporal variations in facial expression, vocal expression, linguistic expression, and cardiovascular modulation were extracted from simultaneously recorded audio and video of remote interviews. Averages, standard deviations, and Markovian process-derived statistics of these features were computed from 73 subjects. Four binary classification tasks were defined: detecting 1) any clinically-diagnosed psychiatric disorder, 2) major depressive disorder, 3) self-rated depression, and 4) self-rated anxiety. Each modality was evaluated individually and in combination. Results: Statistically significant feature differences were found between psychiatric and control subjects. Correlations were found between features and self-rated depression and anxiety scores. Heart rate dynamics provided the best unimodal performance with areas under the receiver-operator curve (AUROCs) of 0.68–0.75 (depending on the classification task). Combining multiple modalities provided AUROCs of 0.72–0.82. Conclusion: Multimodal features extracted from remote interviews revealed informative characteristics of clinically diagnosed and self-rated mental health status. Significance: The proposed multimodal approach has the potential to facilitate scalable, remote, and low-cost assessment for low-burden automated mental health services.
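The Markovian-process-derived statistics mentioned in this study can be illustrated with a toy example: discretise a per-frame signal (e.g., heart rate) into states and use the estimated state-transition probabilities as interview-level features. The state count and sequence here are synthetic assumptions:

```python
import numpy as np

# Estimate a state-transition matrix from a discretised feature
# sequence; its entries then serve as features alongside means and
# standard deviations.
rng = np.random.default_rng(6)
states = rng.integers(0, 3, size=500)     # e.g. low/medium/high heart rate
T = np.zeros((3, 3))
for a, b in zip(states[:-1], states[1:]):
    T[a, b] += 1                          # count observed transitions
T = T / T.sum(axis=1, keepdims=True)      # row-normalise to probabilities
print(np.round(T, 2))
```

Unlike averages, transition probabilities capture temporal dynamics, which is why heart rate dynamics could outperform static summaries in the study's unimodal comparison.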
Summary This study developed an artificial intelligence (AI) system using a local-global multimodal fusion graph neural network (LGMF-GNN) to address the challenge of diagnosing major depressive disorder (MDD), a complex disease influenced by social, psychological, and biological factors. Utilizing functional MRI, structural MRI, and electronic health records, the system offers an objective diagnostic method by integrating individual brain regions and population data. Tested across cohorts from China, Japan, and Russia with 1,182 healthy controls and 1,260 MDD patients from 24 institutions, it achieved a classification accuracy of 78.75%, an area under the receiver operating characteristic curve (AUROC) of 80.64%, and correctly identified MDD subtypes. The system further discovered distinct brain connectivity patterns in MDD, including reduced functional connectivity between the left gyrus rectus and right cerebellar lobule VIIB, and increased connectivity between the left Rolandic operculum and right hippocampus. Anatomically, MDD is associated with thickness changes of the gray and white matter interface, indicating potential neuropathological conditions or brain injuries.
Mood disorders, particularly major depressive disorder (MDD) and bipolar disorder (BD), are often underdiagnosed, leading to substantial morbidity. Harnessing the potential of emerging methodologies, we propose a novel multimodal fusion approach that integrates patient-oriented brain structural magnetic resonance imaging (sMRI) scans with DNA whole-exome sequencing (WES) data. Multimodal data fusion aims to improve the detection of mood disorders by employing established deep-learning architectures for computer vision and machine-learning strategies. We analyzed brain imaging genetic data of 321 East Asian individuals, including 147 patients with MDD, 78 patients with BD, and 96 healthy controls. We developed and evaluated six fusion models by leveraging common computer vision models in image classification: Vision Transformer (ViT), Inception-V3, and ResNet50, in conjunction with advanced machine-learning techniques (XGBoost and LightGBM) known for high-dimensional data analysis. Model validation was performed using a 10-fold cross-validation. Our ViT ⊕ XGBoost fusion model with MRI scans, genomic Single Nucleotide polymorphism (SNP) data, and unweighted polygenic risk score (PRS) outperformed baseline models, achieving an incremental area under the curve (AUC) of 0.2162 (32.03% increase) and 0.0675 (+8.19%) and incremental accuracy of 0.1455 (+25.14%) and 0.0849 (+13.28%) compared to SNP-only and image-only baseline models, respectively. Our findings highlight the opportunity to refine mood disorder diagnostics by demonstrating the transformative potential of integrating diverse, yet complementary, data modalities and methodologies.
A Multimodal Framework for Prognostic Modelling of Mental Health Treatment and Recovery Trajectories
The clinical management of major depressive disorder is constrained by a trial-and-error approach. While computational methods have focused on static binary classification (e.g., responder vs. non-responder), they ignore the dynamic nature of recovery. Building upon the recently proposed prognostic theory of treatment response, this article presents a methodological framework for its operationalisation. We define a multi-modal data architecture for the theory’s core constructs—the Patient State Vector (PSV), Therapeutic Impulse Function (TIF), and Predicted Recovery Trajectory (PRT)—transforming them from abstract concepts into specified computational inputs. To model the asynchronous interactions between these components, we specify a Time-Aware Long Short-Term Memory (LSTM) architecture, providing explicit mathematical formulations for time-decay gates to handle irregular clinical sampling. Furthermore, we outline a synthetic validation protocol to benchmark this dynamic approach against static baselines. By integrating these technical specifications with a translational pipeline for Explainable AI (XAI) and ethical governance, this paper provides the necessary blueprint to transition psychiatry from theoretical prognosis to empirical forecasting.
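The time-decay gate idea can be illustrated as follows. This is a generic Time-Aware-LSTM-style decay, not necessarily the paper's exact formulation: the carried-over cell state is discounted by g(dt) = 1/log(e + dt), so large gaps between clinic visits weaken the memory carried forward:

```python
import numpy as np

# Illustrative elapsed-time decay for an LSTM cell state, in the style
# of Time-Aware LSTMs for irregularly sampled clinical data.
def decay_cell(c_prev, dt):
    """Discount the previous cell state by a decreasing function of the
    elapsed time dt between observations."""
    g = 1.0 / np.log(np.e + dt)       # monotonically decreasing in dt
    return c_prev * g

c = np.array([0.8, -0.4])
print(np.round(decay_cell(c, dt=1.0), 3))    # short gap: mild decay
print(np.round(decay_cell(c, dt=30.0), 3))   # month-long gap: strong decay
```

In the full architecture, only the long-term component of the memory is typically decayed this way before the next gated update, so that recent dynamics survive irregular sampling intact.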
No abstract available
Major Depressive Disorder (MDD) remains a leading cause of disability worldwide. Electroencephalography (EEG) coupled with machine learning provides a rapid, non-invasive decision-support tool for clinicians. Seven families of EEG features (relative power, wavelet energy, frontal-alpha asymmetry (FAA), theta-beta ratio (TBR), P300 amplitude, and their combinations) were benchmarked on 90 eyes-closed resting recordings from the open TDBRAIN repository. After artefact removal, each 5-min record was split into 10 s epochs and class counts were balanced. A grid-searched Support Vector Machine with five-fold cross-validation achieved a peak accuracy of 90.9% (F1 = 90.9%) using channel-wise wavelet features (198 variables); region-level averaging compressed dimensionality by 3.6× while retaining 82.9% accuracy. The findings highlight the diagnostic value of spatially resolved wavelet dynamics and caution that earlier studies may be biased by gender/age imbalance and by multimodal acquisitions that are not properly synchronized in time.
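One of the feature families benchmarked above, relative band power, can be sketched with a plain FFT periodogram (the study's exact spectral estimator is not specified here, and the signal is synthetic: a 10 Hz alpha-like oscillation plus noise):

```python
import numpy as np

# Relative band power from one synthetic eyes-closed EEG epoch.
fs = 250                                   # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)               # one 10 s epoch
rng = np.random.default_rng(7)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)

freqs = np.fft.rfftfreq(x.size, 1 / fs)
psd = np.abs(np.fft.rfft(x)) ** 2          # raw periodogram

def band_power(lo, hi):
    """Summed periodogram power in [lo, hi) Hz."""
    return psd[(freqs >= lo) & (freqs < hi)].sum()

bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
total = band_power(1, 30)
rel = {name: band_power(lo, hi) / total for name, (lo, hi) in bands.items()}
print({k: round(v, 2) for k, v in rel.items()})
```

Because the bands partition 1-30 Hz, the relative powers sum to one; the dominant alpha share reflects the injected 10 Hz component, mimicking the eyes-closed resting condition the study favours.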
Differentiating between bipolar disorder and major depressive disorder can be challenging for clinicians. The diagnostic process might benefit from new ways of monitoring the phenotypes of these disorders. Smartphone data might offer insight in this regard. Today, smartphones collect dense, multimodal data from which behavioral metrics can be derived. Distinct patterns in these metrics have the potential to differentiate the two conditions. To examine the feasibility of smartphone-based phenotyping, two study sites (Mayo Clinic, Johns Hopkins University) recruited patients with bipolar I disorder (BPI), bipolar II disorder (BPII), major depressive disorder (MDD), and undiagnosed controls for a 12-week observational study. On their smartphones, study participants used a digital phenotyping app (mindLAMP) for data collection. While in use, mindLAMP gathered real-time geolocation, accelerometer, and screen-state (on/off) data. mindLAMP was also used for ecological momentary assessment (EMA) delivery. The mindLAMP data were then used as input variables in binary classification, three-group k-nearest neighbors (KNN) classification, and k-means clustering. The best-performing binary classification model was able to classify patients as control or non-control with an AUC of 0.91 (random forest). The model that performed best at classifying patients as having MDD or bipolar I/II had an AUC of 0.62 (logistic regression). The k-means clustering model had a silhouette score of 0.46 and an ARI of 0.27. Results support the potential for digital phenotyping methods to cluster depression, bipolar disorder, and healthy controls. However, due to inconsistencies in accuracy, more data streams are required before these methods can be applied to clinical practice.
Functional Near-Infrared Spectroscopy (fNIRS) is a technique for measuring blood flow in the brain, specifically focusing on changes in the frontal lobe. It has found valuable applications in psychiatry, particularly in diagnostic processes. This study explores the potential of fNIRS data and verbal fluency test (VFT) data, both collected during fNIRS measurements, as tools for diagnosing the severity of major depressive disorder, a significant mental health condition. Beyond merely detecting depression, our research introduces a medical support agent model to identify signs of suicidal tendencies in individuals with severe depression. By integrating data collection, preprocessing techniques, feature extraction, and multimodal classification methods for fNIRS and VFT data, our study suggests these artificial intelligence-based medical agents could enhance diagnostic accuracy and provide valuable support in clinical judgments.
Depression is the leading cause of disability worldwide, yet rates of missed- and mis-diagnoses are alarmingly high. The introduction of objective biomarkers, to aid diagnosis, informed by depression’s physiological pathology may alleviate some of the burden on strained mental health services. Three minutes of eyes-closed resting state heart rate and skin conductance response (SCR) data were acquired from 27 participants (16 healthy controls, 11 with major depressive disorder (MDD)). Various classifiers were trained on state-of-the-art and novel features. We are aware of no previous studies analysing the utility of multimodal vs. individual modalities for classification. We found no improvement using multimodal classifiers over using heart rate variability (HRV) alone, which achieved 81% test accuracy. The best multimodal and SCR only classifiers were only slightly less accurate at 78%. Despite not improving depression detection, SCR features did show stronger correlation with suicidal ideation than HRV. SD1/SD2² is a novel HRV feature proposed in this paper, similar to the commonly used ratio SD1/SD2 but with more marked separation between classes, having the largest Rank Biserial Correlation of all examined features (p-value = 0.002, RBC = -0.73). We recommend further studies in this area.
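The Poincaré-plot descriptors behind the proposed feature can be computed directly from an RR-interval series. A hedged sketch follows, using the standard SD1/SD2 definitions and a synthetic RR series; it is not the paper's code:

```python
# Poincaré HRV features from RR intervals, including the SD1/SD2^2 variant
# proposed in the abstract. The RR series below is synthetic (ms).
import numpy as np

def poincare_features(rr):
    """Return SD1, SD2, SD1/SD2, and the proposed SD1/SD2^2 ratio."""
    drr = np.diff(rr)
    sd1 = np.sqrt(np.var(drr, ddof=1) / 2.0)                     # short-term variability
    sd2 = np.sqrt(2.0 * np.var(rr, ddof=1) - np.var(drr, ddof=1) / 2.0)
    return sd1, sd2, sd1 / sd2, sd1 / sd2**2

rng = np.random.default_rng(1)
rr = 800 + np.cumsum(rng.normal(0, 5, size=180))                 # slowly drifting RR series
sd1, sd2, ratio, ratio_sq = poincare_features(rr)
print(round(ratio, 3), round(ratio_sq, 5))
```

Squaring the denominator shrinks the ratio for recordings dominated by long-term variability, which is one plausible reason the authors saw sharper class separation.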
Abstract Background Conventional approaches for major depressive disorder (MDD) screening rely on two effective but subjective paradigms: self-rated scales and clinical interviews. Artificial intelligence (AI) can potentially contribute to psychiatry, especially through the use of objective data such as objective audiovisual signals. Objective This study aimed to evaluate the efficacy of different paradigms using AI analysis on audiovisual signals. Methods We recruited 89 participants (mean age, 37.1 years; male: 30/89, 33.7%; female: 59/89, 66.3%), including 41 patients with MDD and 48 asymptomatic participants. We developed AI models using facial movement, acoustic, and text features extracted from videos obtained via a tool, incorporating four paradigms: conventional scale (CS), question and answering (Q&A), mental imagery description (MID), and video watching (VW). Ablation experiments and 5-fold cross-validation were performed using two AI methods to ascertain the efficacy of paradigm combinations. Attention scores from the deep learning model were calculated and compared with correlation results to assess comprehensibility. Results In video clip-based analyses, Q&A outperformed MID with a mean binary sensitivity of 79.06% (95% CI 77.06%-83.35%; P=.03) and an effect size of 1.0. Among individuals, the combination of Q&A and MID outperformed MID alone with a mean extent accuracy of 80.00% (95% CI 65.88%-88.24%; P=.01), with an effect size of 0.61. The mean binary accuracy exceeded 76.25% for video clip predictions and 74.12% for individual-level predictions across the two AI methods, with a top individual binary accuracy of 94.12%. The features exhibiting high attention scores demonstrated a significant overlap with those that were statistically correlated, including 18 features (all Ps<.05), while also aligning with established nonverbal markers. Conclusions The Q&A paradigm demonstrated higher efficacy than MID, both individually and in combination.
Using AI to analyze audiovisual signals across multiple paradigms has the potential to be an effective tool for MDD screening.
Major Depressive Disorder (MDD) is a widespread and devastating mental health disorder marked by ongoing feelings of sadness, a diminished interest in activities, and various cognitive and physical challenges. Many existing models rely exclusively on either structural MRI (sMRI) or resting-state fMRI (rs-fMRI) data, limiting their capacity to fully capture the complexity of brain activity and dynamics, which hampers our understanding of mental health conditions. Traditional methods often employ basic attention mechanisms that fail to grasp intricate relationships in high-dimensional neuroimaging data, leading to deficient feature representations. Furthermore, these approaches frequently overlook temporal variability, which is essential for understanding the fluctuations associated with mental health disorders. Conventional models also struggle with effective strategies for integrating information from different modalities, often lack interpretability, and tend to be computationally intensive, which limits their practical use. To address these challenges, this study introduces a novel framework for detecting and classifying MDD using deep learning (DL) techniques that use both sMRI and rs-fMRI data. Our framework employs the Dualistic-series Transformer (Dual-Trans), which features an enhanced mechanism for capturing and representing features. This includes multi-Dconv-head Temporal Attention (MDCTA) for simultaneous spectral feature extraction and Multihead Spatial Self-Attention (MSPSA) for analyzing spatial relationships within the data. Additionally, the Feature Compensation Module (FCM) assesses consistency across spatial and temporal dimensions, identifying discrepancies between modalities to enhance the model's reliability. Furthermore, the Gate Module (GM) fuses these features, enabling a decision-making process that weighs their relevance in the final representation.
Ultimately, detection and classification occur in a Dualistic-series Feed-Forward Network, which processes input in parallel branches tailored for each modality while benefiting from integrated features.
No abstract available
Background Stress is a significant risk factor for psychiatric disorders such as major depressive disorder (MDD) and panic disorder (PD). This highlights the need for advanced stress-monitoring technologies to improve treatment. Stress affects the autonomic nervous system, which can be evaluated via heart rate variability (HRV). While machine learning has enabled automated stress detection via HRV in healthy individuals, its application in psychiatric patients remains underexplored. This study evaluated the feasibility of using machine-learning algorithms to detect stress automatically in MDD and PD patients, as well as healthy controls (HCs), based on HRV features. Methods The study included 147 participants (MDD: 41, PD: 47, HC: 59) who visited the laboratory up to five times over 12 weeks. HRV data were collected during stress and relaxation tasks, with 20 HRV features extracted. Random forest and multilayer perceptron classifiers were applied to distinguish between the stress and relaxation tasks. Feature importance was analyzed using SHapley Additive exPlanations, and differences in HRV between the tasks (ΔHRV) were compared across groups. The impact of personalized longitudinal scaling on classification accuracy was also assessed. Results Random forest classification accuracies were 0.67 for MDD, 0.69 for PD, and 0.73 for HCs, indicating higher accuracy in the HC group. Longitudinal scaling improved accuracies to 0.94 for MDD, 0.90 for PD, and 0.96 for HCs, suggesting its potential in monitoring patients’ conditions using HRV. The HC group demonstrated greater ΔHRV fluctuation across a larger number of features, and with stronger significance, than the patient groups, potentially contributing to its higher accuracy. Multilayer perceptron models provided results consistent with random forest, confirming the robustness of the findings.
Conclusion This study demonstrated that differentiating between stress and relaxation was more challenging in the PD and MDD groups than in the HC group, underscoring the potential of HRV metrics as stress biomarkers. Psychiatric patients exhibited altered autonomic responses, which may influence their stress reactivity. This indicates the need for a tailored approach to stress monitoring in these patient groups. Additionally, we emphasized the significance of longitudinal scaling in enhancing classification accuracy, which can be utilized to develop personalized monitoring technologies for psychiatric patients.
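The personalized longitudinal scaling credited with the large accuracy gains above amounts to z-scoring each participant's repeated measurements against that participant's own visit history. A minimal sketch, assuming per-visit HRV feature rows tagged with subject IDs (all names and data are illustrative):

```python
# Per-subject ("personalized longitudinal") scaling of repeated HRV measurements.
import numpy as np

def longitudinal_scale(features, subject_ids):
    """Z-score each feature within each subject's own visit history."""
    scaled = np.empty_like(features, dtype=float)
    for sid in np.unique(subject_ids):
        mask = subject_ids == sid
        block = features[mask]
        std = block.std(axis=0)
        std[std == 0] = 1.0                      # guard against constant features
        scaled[mask] = (block - block.mean(axis=0)) / std
    return scaled

rng = np.random.default_rng(2)
subjects = np.repeat(np.arange(10), 5)           # 10 participants x 5 visits
# Give each subject a distinct baseline so raw features mix subject identity with state.
hrv = rng.normal(loc=subjects[:, None] * 3.0, scale=1.0, size=(50, 4))
z = longitudinal_scale(hrv, subjects)
print(z.shape)
```

Removing each subject's baseline this way leaves only within-person change, which is what a stress-vs-relaxation classifier should be learning.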
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing 7012 participants from 31 sites (N = 2772 MDD and N = 4240 HC), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%) when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating a site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not enable differentiation between MDD and HC. Our results support the notion that MDD classification on this combination of features and classifiers is unfeasible.
Future studies are needed to determine whether more sophisticated integration of information from other MRI modalities such as fMRI and DWI will lead to a higher performance in this diagnostic task.
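The site effect reported above is exactly what a leave-site-out split exposes: keeping every scan from a site in the same fold makes test sites truly unseen. A minimal sketch with scikit-learn's GroupKFold; the site labels, per-site offset, and features are synthetic assumptions:

```python
# Leave-site-out cross-validation to probe site effects in multi-site data.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
sites = rng.integers(0, 5, size=200)                 # 5 fictitious scanner sites
# Inject a per-site offset so pooled CV would look better than leave-site-out CV.
X = rng.normal(size=(200, 10)) + sites[:, None] * 0.5
y = rng.integers(0, 2, size=200)

# groups= keeps all subjects from one site together in a single fold.
scores = cross_val_score(SVC(), X, y, cv=GroupKFold(n_splits=5), groups=sites)
print(scores.round(2))
```

Comparing these scores against ordinary shuffled k-fold on the same data is the quickest way to quantify how much apparent accuracy a site confound contributes.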
This paper proposes a novel approach for distinguishing Major Depressive Disorder (MDD) patients from healthy controls (HC), namely depression screening, using EEG signals, where the Hilbert-Huang Transform (HHT) is integrated into a Self-Attention neural network (HHT-SANN). The incorporation of the HHT enhances the model’s time-frequency analysis capabilities and allows for more effective nonlinear processing of the EEG data. By embedding the HHT within the self-attention module, the model captures intricate temporal and spectral patterns that are critical for accurate depression classification. We evaluated our method on a clinical EEG dataset comprising 34 MDD patients and 30 healthy controls from the Hospital of Universiti Sains Malaysia. Experimental results indicate that the proposed method achieves an accuracy of 98.78%, sensitivity of 99.23%, and specificity of 98.27%, outperforming traditional models and offering a more robust solution for depression detection. This work contributes to advancing the field of neuroinformatics by providing a more interpretable and effective model for mental health diagnostics based on EEG data.
The electroencephalogram (EEG) stands out as a promising non-invasive tool for assessing depression. However, efficient channel selection is crucial for pinpointing the key channels that can differentiate between different stages of depression within the vast dataset. This study presents a comprehensive strategy for optimizing EEG channels to classify Major Depressive Disorder (MDD) using machine learning (ML) and deep learning (DL) approaches, and monitors the effect of central-lobe channels. A thorough review underscores the vital significance of EEG channel selection in the analysis of mental disorders; neglecting this optimization step could result in heightened computational expense, squandered resources, and potentially inaccurate classification results. Our assessment encompassed a range of techniques, including the Asymmetric Variance Ratio (AVR), the Amplitude Asymmetry Ratio (AAR), entropy-based selection employing a Probability Mass Function (PMF), and Recursive Feature Elimination (RFE). RFE exhibited superior performance, particularly in pinpointing the most pertinent EEG channels while including central-lobe channels such as Fz, Cz, and Pz; with it, accuracies between 97% and 99% were recorded by the Electroencephalography Neural Network (EEGNet). Our experimental findings indicate that models using RFE achieved enhanced accuracy in classifying depressive disorders across diverse classifiers: EEGNet (96%), Random Forest (95%), Long Short-Term Memory (LSTM, 97.4%), 1D-CNN (95%), and Multi-Layer Perceptron (98%), irrespective of central-lobe incorporation. A pivotal contribution of this research is the development of a robust Multilayer Perceptron (MLP) model trained on EEG data from 382 participants, which achieved an accuracy of 98.7%, a perfect precision score of 1.00, an F1-score of 0.983, and a recall of 0.966, making it an enhanced technique for depression classification.
Significant channels identified include Fp1, Fp2, F7, F4, F8, T3, C3, Cz, T4, T5, and P3, offering critical insights into depression. Our findings show that optimized EEG channel selection via RFE enhances depression classification accuracy in the field of brain-computer interfaces.
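The RFE step can be sketched with scikit-learn: rank one summary feature per channel with a linear SVM and keep the top channels. The channel names follow the 10-20 montage mentioned in the abstract; the data and which channels are informative are synthetic assumptions:

```python
# Recursive Feature Elimination over synthetic per-channel EEG features.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

channels = ["Fp1", "Fp2", "F7", "F4", "F8", "Fz", "Cz", "Pz", "T3", "C3"]
rng = np.random.default_rng(4)
X = rng.normal(size=(150, len(channels)))        # one summary feature per channel
y = (X[:, 5] + X[:, 6] > 0).astype(int)          # make Fz and Cz informative by design

rfe = RFE(LinearSVC(max_iter=5000), n_features_to_select=3).fit(X, y)
kept = [ch for ch, keep in zip(channels, rfe.support_) if keep]
print(kept)
```

RFE repeatedly drops the channel with the smallest SVM weight, so the planted Fz/Cz signal should survive the elimination rounds.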
This study presents a deep learning-based computer-aided diagnostic (CAD) system incorporating explainable artificial intelligence (XAI) techniques to support the accurate diagnosis of major depressive disorder (MDD). Resting-state electroencephalography (EEG) data from 40 drug-naïve male MDD patients and 41 male healthy controls were analyzed using a shallow convolutional neural network (Shallow ConvNet) combined with layer-wise relevance propagation (LRP). The proposed system achieved a classification accuracy of 99% without relying on hand-crafted feature engineering. The topographical map of relevance scores revealed higher relevance in the prefrontal, central, and occipital regions for MDD patients, and predominantly in the occipital regions for healthy controls. In addition, the LRP-based channel selection method enabled a substantial reduction in the number of EEG channels (from 62 to 10) while maintaining over 90% classification accuracy. These findings demonstrate the potential of XAI-based CAD systems not only to enhance diagnostic performance but also to provide novel insights into the neurophysiological mechanisms underlying MDD.
Major Depressive Disorder (MDD) diagnosis through Electroencephalography (EEG) is hindered by the non-stationary characteristics of neural oscillations and the limited adaptability of conventional classification frameworks. Static ensemble models, which rely on predetermined weight assignments, exhibit suboptimal performance in handling EEG variability induced by inter-individual neurophysiological diversity or environmental artifacts. Meanwhile, monolithic deep learning architectures often suffer from inadequate generalizability in clinical practice. To overcome these limitations, we present an Adaptive Agent-Based Ensemble Learning (AABEL) framework that integrates reinforcement learning (RL) with neurocomputational principles. AABEL pioneers three methodological advancements: (1) RL-Driven Adaptive Weighting: A meta-controller dynamically adjusts the contributions of convolutional (CNN), recurrent (GRU), and attention-based (Transformer) submodels through task-oriented reward signals, resolving the inflexibility of static ensemble paradigms. (2) Multiscale Neurodynamic Feature Fusion: Parallel processing branches extract complementary representations of EEG signals, including spatial-spectral patterns (CNN), temporal-contextual dynamics (GRU), and global interdependencies (Transformer), enabling holistic modeling of neuropathological signatures. (3) End-to-End Reward Propagation: An automated optimization pipeline eliminates manual aggregation rules by directly linking reward calculations to model weight updates. Utilizing the OpenNeuro ds003478 dataset, AABEL achieves superior classification metrics (accuracy: 98.06%, F1-score: 98.20%), outperforming static ensembles (e.g., the Fuzzy Ensemble at 96% accuracy). The RL reward mechanism significantly enhances noise robustness, improving classification stability by 3.6%. By integrating dynamic reward-augmented learning with neurosignal processing, AABEL establishes a new paradigm for adaptive EEG-MDD diagnostics.
This work bridges computational neuroscience and translational neuroengineering, offering a scalable framework for personalized mental health monitoring.
Major depressive disorder is a growing mental health issue, considered by the WHO as one of the main contributors to global disability. It affects nearly 350 million people around the world, alongside a rapidly increasing rate of suicide attempts. The electroencephalogram (EEG) is a non-invasive method that detects neural activity in the brain, reflecting the working status of the human brain, and is used as a diagnostic tool for depression detection. In this proposed work, the dataset includes EEG signals from 64 subjects (30 healthy controls and 34 MDD patients), each recorded from 19 electrode channels. This project focuses on classifying the EEG signals of MDD individuals versus healthy ones using deep learning algorithms, which effectively help to find patterns that distinguish the two groups. The proposed model applies Independent Component Analysis and wavelet denoising to clean the noisy signals and remove irregularities from the raw EEG. The cleaned EEG data are transformed using the superlet transform, and the transformed data are converted into images, with all 19 channel signals compressed into a single image by taking the mean across channels. The transformed images are fed to pretrained deep learning models (various ResNet architectures) for feature extraction and classification. The ResNet-18 architecture achieves the highest accuracy of 95.65%, which can effectively support the early diagnosis of MDD patients.
Major Depressive Disorder (MDD) is a prevalent and incapacitating mental health disorder that mandates correct and early diagnosis for appropriate intervention. Conventional diagnostic methods are based on subjective assessments, often resulting in discrepancies and delayed treatment. In our research, we offer a novel framework based on Quantum Graph Neural Networks (QGNN) for automatic detection of MDD, which focuses on leveraging deep learning and optimization techniques to improve diagnostic accuracy. The framework integrates QGNN with Dynamic Bayesian Optimization (DBO) for hyperparameter fine-tuning, enhancing the model's predictive accuracy. Further, enhanced data preprocessing methods, such as the Hanning window function, are applied to filter out noise and enhance the input features. Experimental evaluations on the MDD patient dataset demonstrate that the proposed approach achieves an accuracy of 86.62%, higher than existing graph-based models. Expanding datasets, optimizing deep learning architectures, and incorporating clinical data are potential avenues for future research to further improve model generalization and diagnostic reliability. These conclusions indicate that QGNN-DBO has the potential to be a powerful, scalable, and efficient solution for automatic MDD detection, potentially reshaping the landscape of mental health diagnostics and treatment planning.
Major Depressive Disorder (MDD) is among the most prevalent and debilitating mental health conditions, demanding accurate and scalable predictive tools to support early intervention and prevention. In this study, we propose a novel Local-to-Global Graph Neural Network (LG-GNN) framework specifically designed to enhance the prediction of MDD from resting-state fMRI (rs-fMRI) data. The model integrates two complementary components: a Local-GNN, which extracts subject-specific features from regional brain connectivity, and a Global Subject-GNN, which captures inter-subject relationships by incorporating non-imaging data. This multi-scale approach enables both personalized and population-level insights into MDD. Comparative analyses identified GraphSAGE as the optimal architecture for the Global Subject-GNN and a custom GCN with Self-Attention-Based Pooling (SABP) as the best-performing Local-GNN. The final LG-GNN model significantly outperformed traditional machine learning and graph-based baselines, demonstrating its potential for MDD prediction. Beyond classification, the framework offers a scalable and interpretable approach for implementing personalized, digitalized, and data-driven prevention strategies in at-risk populations.
This study introduces the EEG-FDL model, a novel optimized fuzzy deep learning approach for classifying Major Depressive Disorder (MDD) using EEG data. Integrating deep learning with fuzzy learning via the Non-Dominated Sorting Genetic Algorithm II (NSGA-II), EEG-FDL optimizes fuzzy membership functions and backpropagation. The model handles noise and data uncertainty, achieving a remarkable 99.72% accuracy in distinguishing MDD from healthy EEG signals using 5-fold cross-validation on a large dataset. External validation further confirms its efficacy. EEG-FDL outperforms traditional classifiers due to its effective handling of uncertainties and optimized parameter tuning.
Major Depressive Disorder (MDD) is a composite mental state manifested through continuous negative thoughts, mood disturbances, and frequent thoughts of self-harm. Existing MDD diagnostic methods rely exclusively on mental health assessment questionnaires. There is a pressing need for an AI-driven approach to automate the detection of MDD, assisting healthcare practitioners. Since MDD has been shown to affect cardiovascular systems, this study aims to identify significant Electrocardiogram (ECG) features and automatically detect MDD using them. The study acquires ECG signals from 30 subjects, including 15 healthy subjects (HS) and 15 MDD patients. A total of 21 features are derived from ECG. Out of 21 features, significant ECG features are identified using 3 methods: Area Under the Receiver Operating Characteristic curve (AUC-ROC), Least Absolute Shrinkage and Selection Operator (LASSO), and Mann-Whitney U test (p < 0.001). Four Machine Learning (ML) classifiers: Random Forest (RF), Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT) with Grid Search optimization, are used to test classification accuracy obtained using features selected by 3 methods. The best classification accuracy (92.99%) is achieved by 16 significant features selected using the Mann-Whitney U test with an SVM-radial basis function (rbf) classifier. This is almost 2% more than the accuracy obtained using all 21 features.
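The Mann-Whitney screening followed by an RBF-kernel SVM can be sketched as follows. The group sizes, feature count, and p-value cut-off mirror the abstract, but the feature values and which columns are informative are synthetic assumptions:

```python
# Mann-Whitney U feature screening + SVM-RBF on synthetic ECG-style features.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC

rng = np.random.default_rng(5)
healthy = rng.normal(0.0, 1.0, size=(15, 21))        # 15 HS x 21 ECG features
mdd = rng.normal(0.0, 1.0, size=(15, 21))
mdd[:, :5] += 2.5                                    # plant 5 informative features

# Keep features whose two-sided Mann-Whitney p-value clears the cut-off.
pvals = np.array([mannwhitneyu(healthy[:, j], mdd[:, j]).pvalue for j in range(21)])
selected = np.where(pvals < 0.001)[0]

X = np.vstack([healthy, mdd])[:, selected]
y = np.array([0] * 15 + [1] * 15)
acc = SVC(kernel="rbf").fit(X, y).score(X, y)        # training accuracy, for illustration
print(selected.size, round(acc, 2))
```

A rank-based screen like this makes no normality assumption, which suits small-sample physiological features; the abstract's held-out evaluation would replace the training-accuracy shortcut here.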
No abstract available
Major Depressive Disorder (MDD) is a common mental health disease threatening human well-being. Several neuroimaging studies show that analyzing neural connectivity patterns improves diagnostic accuracy, though most approaches overlook node-edge interactions. Our study proposes an integrated approach combining LASSO and Ridge regression algorithms with brain connectivity features. Using both sMRI and fMRI data, we constructed a Multi-Feature (gray matter volume/ALFF/ReHo) Fusion Elastic Net (MFEN) framework to enhance MDD identification. Furthermore, we improved the Graph Convolutional Neural Network (GCN) algorithm by incorporating a self-attention mechanism and applied a triple Siamese network to enhance feature extraction. Our proposed MDD identification method was evaluated on 2048 first-episode drug-naive MDD patients and 2562 healthy controls, using rs-fMRI data and sMRI features from the UK Biobank database. Results demonstrated that the extracted features significantly enhanced discriminative capability, establishing the foundation for identifying more reliable biomarkers in MDD patients. By integrating these techniques with elastic networks, the classification accuracy for MDD detection improved substantially to 89%, highlighting the framework's superior performance in mental health diagnostics. In summary, this MDD identification framework proved highly effective and may offer novel insights for the auxiliary diagnosis of other neuropsychiatric disorders in clinical practice.
Electroencephalography (EEG) is useful for studying brain activity in major depressive disorder (MDD), particularly focusing on theta and alpha frequency bands via power spectral density (PSD). However, PSD-based analysis has often produced inconsistent results due to difficulties in distinguishing between periodic and aperiodic components of EEG signals. We analyzed EEG data from 114 young adults, including 74 healthy controls (HCs) and 40 MDD patients, assessing periodic and aperiodic components alongside conventional PSD at both source and electrode levels. Machine learning algorithms classified MDD versus HC based on these features. Sensor-level analysis showed stronger Hedge’s g effect sizes for parietal theta and frontal alpha activity than source-level analysis. MDD individuals exhibited reduced theta and alpha activity relative to HC. Logistic regression-based classifications showed that periodic components slightly outperformed PSD, with the best results achieved by combining periodic and aperiodic features (AUC = 0.82). Strong negative correlations were found between reduced periodic parietal theta and frontal alpha activities and higher scores on the Beck Depression Inventory, particularly for the anhedonia subscale. This study emphasizes the superiority of sensor-level over source-level analysis for detecting MDD-related changes and highlights the value of incorporating both periodic and aperiodic components for a more refined understanding of depressive disorders.
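The conventional PSD features this abstract benchmarks against, theta (4-8 Hz) and alpha (8-13 Hz) band power, can be sketched with SciPy's Welch estimator. The sampling rate and the synthetic alpha-dominated signal below are illustrative assumptions:

```python
# Welch PSD band power for one synthetic EEG channel.
import numpy as np
from scipy.signal import welch

fs = 250                                             # sampling rate, Hz
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(6)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)   # 10 Hz "alpha" + noise

f, psd = welch(eeg, fs=fs, nperseg=fs * 2)           # 0.5 Hz frequency resolution

def band_power(f, psd, lo, hi):
    """Sum the PSD over [lo, hi) Hz (rectangle rule)."""
    mask = (f >= lo) & (f < hi)
    return psd[mask].sum() * (f[1] - f[0])

theta = band_power(f, psd, 4, 8)
alpha = band_power(f, psd, 8, 13)
print(alpha > theta)
```

Separating periodic from aperiodic (1/f) components, as the study advocates, would further decompose `psd` before the band sums; the raw band power above is the baseline being improved upon.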
Parkinson’s disease affects both motor and non-motor functions, including vocal features that may indicate underlying mental health conditions such as depression. This work proposes a novel framework for simulated depression risk classification using vocal biomarkers derived from the UCI Parkinson’s dataset. A Self-Attention-Enhanced Multilayer Perceptron (MLP) architecture is used to model interactions between key acoustic features, particularly Harmonic-to-Noise Ratio and Jitter, which serve as the basis for generating binary depression risk labels. The proposed model outperforms traditional and deep learning benchmarks, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), TabNet, CNN-LSTM, Deep Neural Network (DNN), and Explainable Boosting Machine (EBM), with an accuracy of 97%, an F1-score of 98%, a recall of 95%, and a specificity of 100%. While EBM offers strong interpretability, the attention-enhanced model demonstrates optimal predictive capability. These findings highlight the efficacy of voice-based features combined with attention mechanisms for early, non-invasive identification of depression risk in PD patients.
Speech-based depression detection is promising for objective mental health assessment. However, conventional methods relying on short-frame acoustic features often fail to capture long-term temporal and behavioral characteristics of speech essential for modeling depression-specific speaking patterns. Herein, four novel acoustic feature sets extracted from long-term speech are proposed: utterance interval feature set (UIFS), pause interval feature set (PIFS), response interval feature set (RIFS), and speech density (SD). These features explicitly characterize temporal structures and session-level speech behaviors beyond short-frame analysis. These features are combined with conventional acoustic features, including standard features extracted using openSMILE and voice level features, and evaluated using support vector machines under subject-independent conditions for the binary classification of depressed and nondepressed speakers. Incorporating the proposed features improves classification performance compared with baseline features (accuracy: 0.54 for openSMILE and 0.52 for openSMILE + voice level features). The configuration integrating all four proposed feature sets achieves an accuracy of 0.58, a precision of 0.56, a recall of 0.58, and a specificity of 0.58, indicating consistent performance gains under subject-independent and strictly controlled evaluation conditions. Thus, depression-related speech patterns can be captured by explicitly modeling temporal and behavioral speech characteristics across entire dialog sessions. This study contributes to advancing acoustic feature design for speech-based depression detection and developing clinically supportive screening and monitoring technologies.
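The pause-interval idea above can be sketched with a toy energy-threshold voice activity detector: mark frames as speech or silence, then read pause durations off the silent runs. The threshold and frame energies are illustrative assumptions, not the paper's method:

```python
# Pause-interval extraction from per-frame energies (toy VAD).
import numpy as np

def pause_intervals(frame_energy, threshold):
    """Return durations (in frames) of silent runs that sit between speech frames."""
    silent = frame_energy < threshold
    pauses, run = [], 0
    for s in silent:
        if s:
            run += 1
        elif run:
            pauses.append(run)               # a silent run just ended
            run = 0
    return pauses

energy = np.array([5, 6, 0.1, 0.2, 0.1, 7, 5, 0.1, 6, 6], dtype=float)
pauses = pause_intervals(energy, threshold=1.0)
print(pauses)                                # two pauses: 3 frames, then 1 frame
```

Statistics over these durations (mean, variance, count per minute) are the kind of session-level features the proposed PIFS aggregates, in contrast to short-frame spectral descriptors.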
Adults over the age of 60 years are a rising population at-risk for depression, and there is a need to create automatic screening for this illness. Most existing voice-based depression datasets comprise speakers younger than 60 and variations in speech due to age and depression are not well understood. In this study, which uses Patient Health Questionnaires for depression severity ground-truth, automatic depression detection is explored using acoustic-based prosodic, spectral, landmark, and voice quality features derived from smartphone recordings from 152 speakers in four different age ranges (e.g., 18–34, 35–48, 49–62, and 63–79). An age-dependent modeling paradigm for voice-based depression detection is proposed and evaluated. Results show that age-dependent models improve voice-based automatic depression classification accuracy with up to 10% absolute gains when compared with an age-agnostic model. Further, when compared with age-agnostic and gender-dependent models, age-dependent models often produced greater depressed class identification f-score sensitivity (up to 0.39 absolute gains). While automatically extracted acoustic voice features lead to statistically significant depression detection accuracy gains over the age-agnostic modeling baseline (4%–9% absolute), manually extracted voice quality features also are useful (4%–7% absolute gains over baseline). This investigation demonstrates the benefits of implementing age modeling to improve voice-based depression screening via smart devices.
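The age-dependent modeling paradigm reduces to routing each speaker to a classifier trained only on their age band. A minimal sketch using the abstract's four bands; the acoustic features and labels are synthetic placeholders:

```python
# One depression classifier per age band, with band-based routing at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

BANDS = [(18, 34), (35, 48), (49, 62), (63, 79)]

def band_index(age):
    """Map an age to its band's index (ages outside all bands would raise)."""
    return next(i for i, (lo, hi) in enumerate(BANDS) if lo <= age <= hi)

rng = np.random.default_rng(7)
ages = rng.integers(18, 80, size=200)
X = rng.normal(size=(200, 6))                        # stand-in acoustic features
y = rng.integers(0, 2, size=200)                     # stand-in depressed/not labels

models = {}
for i, _ in enumerate(BANDS):
    mask = np.array([band_index(a) == i for a in ages])
    models[i] = LogisticRegression().fit(X[mask], y[mask])

pred = models[band_index(40)].predict(X[:1])         # route a 40-year-old speaker
print(int(pred[0]))
```

The trade-off the paper quantifies is visible in the structure: each band model sees cleaner within-band speech variation but only a quarter of the training data.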
Depression is predicted to rise to the second leading cause of disability by 2030, according to the World Health Organization (WHO). Although well-trained clinicians and medical and psychological treatments are available, individuals and families are often reluctant to speak out or approach doctors about this disorder for various social reasons. Diagnosing depressive disorder involves numerous interviews with the patient and family, clinical analysis, and questionnaires, which is time-consuming and demands well-trained clinicians. In the present era of machine learning, automating depression detection is not complicated and can easily be deployed; however, the automation should use fewer resources, provide accurate results, and offer wide reachability. In this paper, acoustic features are used to train a classification model that categorizes a person as depressed or not depressed. The DAIC-WOZ database, available with the AVEC2016 challenge, is used for training the classifiers. Prosodic, spectral, and voice-control features are extracted using the COVAREP toolbox and feature-fused. SMOTE analysis is used to overcome class imbalance, and 93% accuracy is obtained with the SVM algorithm, resulting in the Depression Classification Model (DCM). An Android application, cureD, deployed on the cloud, is developed to self-assess depression using the DCM and the PHQ-8 questionnaire. The application was tested on real-time data from 50 subjects under the supervision of a qualified psychiatrist, and an accuracy of 90% was obtained.
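The SMOTE rebalancing step can be sketched without depending on the imbalanced-learn package: interpolate between each minority sample and one of its nearest neighbours. This is a minimal SMOTE-style sketch with synthetic data and illustrative parameters, not the paper's pipeline:

```python
# Minimal SMOTE-style oversampling via nearest-neighbour interpolation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples along neighbour line segments."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                    # column 0 is the point itself
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                 # pick a minority sample
        j = idx[i, rng.integers(1, k + 1)]           # pick one of its k neighbours
        lam = rng.random()                           # random point on the segment
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(8)
minority = rng.normal(size=(20, 5))                  # under-represented class
synthetic = smote_like(minority, n_new=30)
print(synthetic.shape)
```

The synthetic rows are appended to the minority class before classifier training, which is what lets the SVM in the abstract see a balanced decision problem.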
Mental disorders, e.g., depression and dementia, are categorized as priority conditions by the World Health Organization (WHO). When diagnosing, psychologists employ structured questionnaires/interviews and different cognitive tests. Although accurate, there is an increasing need to develop digital mental health support technologies to alleviate the burden faced by professionals. In this paper, we propose a multi-modal approach for modeling the communication process of patients taking part in a clinical interview or a cognitive test. The language-based modality, inspired by the Lexical Availability (LA) theory from psycholinguistics, identifies the most accessible vocabulary of the interviewed subject and uses it as features in a classification process. The acoustic-based modality is processed by a Convolutional Neural Network (CNN) trained on speech signals that predominantly contain voice source characteristics. In the end, a late fusion technique based on majority voting assigns the final classification. Results show the complementarity of both modalities, reaching an overall Macro-F1 of 84% and 90% for depression and Alzheimer's dementia, respectively.
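Majority-voting late fusion reduces to counting per-sample votes across modality outputs. A sketch (the third "prior" vote is our addition purely to avoid two-way ties; the paper describes only the two modalities):

```python
from collections import Counter

def late_fusion_majority(*modality_preds):
    """Late fusion by majority voting: each modality casts one class-label
    vote per sample; the most common vote wins."""
    fused = []
    for votes in zip(*modality_preds):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Per-sample predictions from the language-based and acoustic modalities,
# plus a hypothetical third vote as a tiebreaker (illustrative labels).
lang_preds  = ["dep", "ctl", "dep", "ctl"]
audio_preds = ["dep", "dep", "ctl", "ctl"]
prior_preds = ["dep", "dep", "dep", "ctl"]

print(late_fusion_majority(lang_preds, audio_preds, prior_preds))
# → ['dep', 'dep', 'dep', 'ctl']
```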
We developed a model for detecting anxiety and depression from telephone recordings between a customer and a representative at a call center using vocal features and a deep neural network. Our binary classification model using x-vectors outperformed other acoustic features such as i-vectors and openSMILE features, as well as linguistic or text-based features. Our models were built on self-reported scores: GAD-7 for anxiety and PHQ-8 for depression. Notably, the anxiety model's performance is very similar to the GAD-7 score's screening accuracy: a prior study comparing self-reported GAD-7 scores to an actual mental health professional's diagnosis of anxiety disorder reported a sensitivity of 0.74 and a specificity of 0.54, while our model showed a sensitivity of 0.70 and a specificity of 0.54. This study exhibits the potential of voice analysis on topic-independent speech, particularly from 8 kHz phone conversations, to identify anxiety and depression.
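The sensitivity/specificity comparison against GAD-7 screening rests on the standard confusion-matrix definitions; a sketch on toy labels (the data below are illustrative only, not the study's):

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Screening metrics: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Toy screening outcome: 10 positives (7 detected), 10 negatives (5 rejected).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [0] * 5 + [1] * 5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # → 0.7 0.5
```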
The rapidly growing number of depressed people increases the burden of clinical diagnosis. Due to the abnormal speech signals of depressed patients, automatic audio-based depression recognition has the potential to become a complementary method for diagnosis. However, recognition performance varies widely with different speech acquisition tasks and classifiers, making results hard to compare, and the performance requires further improvement before clinical application. This work extracted high-level statistical acoustic features (prosodic, voice-quality, and spectral features) from 23 depressed patients and 29 healthy subjects under spontaneous pronunciation tasks (interview and picture description) and mechanical pronunciation tasks (story reading and word reading), then applied principal component analysis (PCA) to reduce feature dimensionality, and finally employed a multilayer perceptron (MLP) to establish the classification model, compared with traditional classifiers (logistic regression, support vector machine, decision tree, and naive Bayes). The results showed that spontaneous pronunciation induced more discriminative acoustic features and accordingly achieved better recognition performance. PCA retained 90% of the useful information with 50% of the features. Furthermore, MLP achieved the best performance, with an accuracy of 0.875 and an average F1 score of 0.855, under the picture description task. This study provides support for task design and classifier building for audio-based depression recognition, which could assist in mass screening for depression.
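The PCA-then-MLP pipeline maps directly onto scikit-learn, where PCA(n_components=0.90) keeps just enough components to explain 90% of the variance. A sketch on synthetic stand-in features (the feature count and MLP size are our assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in for high-level statistical acoustic features
# (prosodic / voice-quality / spectral): 52 subjects x 40 features.
X = rng.normal(size=(52, 40))
y = np.array([1] * 23 + [0] * 29)  # 23 depressed, 29 healthy

# PCA with a float n_components keeps the smallest number of components
# whose cumulative explained variance reaches 90%, then an MLP classifies.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.90),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0))
model.fit(X, y)
n_kept = model.named_steps["pca"].n_components_  # components actually retained
```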
NeuroReflect is a modular, offline-capable AI framework for early mental health screening using speech and text. It integrates (i) speech emotion recognition from acoustic features (MFCC, Chroma, Contrast) on RAVDESS, (ii) multilabel text emotion classification via TF-IDF and One-vs-Rest Logistic Regression on Reddit/eRisk data, and (iii) binary depression detection from voice prosody on DAIC-WOZ. We introduce emotion conflict scoring, which detects co-occurring opposite-valence emotions (e.g., joy + fear), as a lightweight proxy for linguistic disorganization. The system achieves 74.2% accuracy (ROC-AUC 0.75) in depression detection, 71% Hamming score in text emotion labeling, and 62.4% in speech emotion recognition. All processing runs locally via a Python CLI, enabling privacy-preserving, real-time assessment in telehealth and self-monitoring scenarios.
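The multilabel text branch (TF-IDF features into One-vs-Rest logistic regression, one binary classifier per emotion label) can be sketched with scikit-learn on a toy corpus; the texts and the two-label scheme below are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (not the Reddit/eRisk data) with a multilabel
# indicator matrix: columns = [sadness, fear]; rows may have several 1s.
texts = ["i feel so sad and alone", "i am scared of everything",
         "sad and scared at once", "what a lovely day",
         "so alone and so sad", "terrified and scared again"]
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])

# TF-IDF features feeding one binary logistic regression per label.
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression()))
clf.fit(texts, Y)
pred = clf.predict(["sad and scared"])  # one row, one column per label
```

The Hamming score reported in the abstract is then computed label-wise over such indicator matrices.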
No abstract available
No abstract available
OBJECTIVE To develop a deep learning model to assess anxiety and depression from acoustic and lexical biomarkers, able to analyze Italian psychotherapy recordings and classify three distinct conditions: depression, anxiety, and no pathology. METHOD Five patients diagnosed with either Major Depressive Disorder or Generalized Anxiety Disorder were selected from psychotherapy sessions conducted at RAM Psyche. A total of seven audio recordings were manually analyzed by a clinical psychologist using the DASS-21 scale, resulting in over 1000 audio segments labeled for psychopathological content. From these recordings, acoustic features and lexical markers were extracted. These features were processed through a hybrid architecture combining a Convolutional Neural Network for Mel spectrogram analysis and a Multi-Layer Perceptron for integrating lexical and acoustic inputs. Three model variants (VOM 1.1, 1.2, and 1.3) were trained and evaluated using two custom datasets (DVOM2, DVOM3), including both internal patient audio and external neutral voices. RESULTS The model successfully classified segments into depression, anxiety, and no pathology with promising results. Feature importance analysis revealed that prosodic cues such as lower pitch, reduced intensity, and increased pauses were highly predictive of depression, while lexical richness and adverb usage were associated with both disorders. Among the model variants, VOM 1.1 showed balanced performance across all three classes, particularly excelling in detecting depression and no pathology. In contrast, VOM 1.2 prioritized depression and anxiety detection, occasionally misclassifying ambiguous cases as symptomatic, suggesting heightened sensitivity to subtle pathological cues. VOM 1.3, while maintaining strong classification performance, demonstrated improved robustness on external neutral voices.
CONCLUSIONS The Voice of Mind model demonstrates the feasibility of using speech data to support mental health diagnostics. Its capacity to distinguish between depression and anxiety, while maintaining generalization across nonpathological voices, suggests its potential as a clinical decision-support tool.
Background Previous studies have classified major depression and healthy control groups based on vocal acoustic features, but classification accuracy needs to be improved. Therefore, this study utilized deep learning methods to construct classification and prediction models for major depression and healthy control groups. Methods 120 participants aged 16–25 took part in this study, including 64 in the MDD group and 56 in the HC group. We used the Covarep open-source algorithm to extract a total of 1200 high-level statistical functions for each sample. In addition, we used Python for correlation analysis and a neural network to establish models to distinguish whether participants experienced depression, predict the total depression score, and evaluate the effectiveness of the classification and prediction models. Results Classification of the major depression and healthy control groups using relevant, significant vocal acoustic features reached 0.90, and Receiver Operating Characteristic (ROC) curve analysis showed a classification accuracy of 84.16%, a sensitivity of 95.38%, and a specificity of 70.9%. The depression prediction model based on speech characteristics showed that the predicted score was closely related to the total score of the 17-item Hamilton Depression Scale (HAMD-17) (r=0.687, P<0.01), and the Mean Absolute Error (MAE) between the model's predicted score and the total HAMD-17 score was 4.51. Limitation This study's results may have been influenced by anxiety comorbidities. Conclusion Vocal acoustic features can not only effectively classify the major depression and healthy control groups but also accurately predict the severity of depressive symptoms.
This project studies the effect of audio time duration and feature extraction on depression classification with machine learning models. Three time durations of audio signals were investigated: dataset (1): 1–6 min, dataset (2): 1 min, and dataset (3): 30 seconds. Five machine learning models were applied with 12 feature extractions, including 4 time-domain features, 4 frequency-domain features, and 4 time-frequency-domain features. The statistical significance of dataset time duration and feature extraction was examined by one-way ANOVA against a p-critical of 0.05. The performance metrics indicate that the highest accuracy and F1-score were achieved by logistic regression with MFCC on dataset (1), at 88.24% and 85.71%, respectively. Moreover, precision and recall reached 1 for several models and feature extractions, especially in the frequency and time-frequency domains. Also, the time durations and feature extractions on each dataset did not have a statistically significant effect on model performance.
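The one-way ANOVA check against a 0.05 p-critical can be sketched with SciPy; the accuracy samples below are synthetic stand-ins for per-condition results:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(7)

# Per-dataset model accuracies (synthetic stand-ins for the three duration
# conditions: full clips, 1-minute clips, 30-second clips).
acc_full  = rng.normal(0.85, 0.03, size=12)
acc_1min  = rng.normal(0.84, 0.03, size=12)
acc_30sec = rng.normal(0.83, 0.03, size=12)

# One-way ANOVA: is mean accuracy significantly different across durations?
stat, p = f_oneway(acc_full, acc_1min, acc_30sec)
significant = p < 0.05   # compare against the paper's p-critical
```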
Depression is a very common mental illness. In severe cases, it is a frightening disease that can lead to suicide. Consequently, early diagnosis is essential, because outcomes improve with appropriate treatment if it is discovered early. Recently, research on voice-based automatic depression detection systems has been actively conducted. Most existing studies to date have diagnosed depression by analyzing fragmented characteristics of voice signals, so the data are not used efficiently because the signal is analyzed only from a specific, fragmentary aspect. To solve this problem, we propose a method to extract features from both the time-series and spatial aspects of the signal. First, MFCC (Mel-Frequency Cepstral Coefficients) are used to extract important features through time-series analysis of the speech signal. These extracted features are then input into a CNN AE (Convolutional Auto-Encoder) to capture complex patterns among the initial features. This methodology extracts features that contain important information from both time-series and spatial aspects, making it possible to learn features and subtle signals highly associated with depression from speech data. Because the proposed method analyzes voice signals in both aspects, the model can utilize more diverse information and diagnose depression more accurately. To evaluate the performance of the proposed method, experiments are performed on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset and compared with feature extraction methods frequently used in voice signal analysis, such as MFCC and CNN. As a result, the proposed method showed an accuracy of 91%, an improvement of about 15% over previous studies.
Depression is a prevalent mental health disorder, and early detection is crucial for timely intervention. Traditional diagnostics often rely on subjective judgments, leading to variability and inefficiency. This study proposes a fusion model for automated depression detection, leveraging bimodal data from voice and text. Wav2Vec 2.0 and BERT pre-trained models were utilized for feature extraction, while a multi-scale convolutional layer and Bi-LSTM network were employed for feature fusion and classification. Adaptive pooling was used to integrate features, enabling simultaneous depression classification and PHQ-8 severity estimation within a unified system. Experiments on the CMDC and DAIC datasets demonstrate the model's effectiveness. On CMDC, the F1 score improved by 0.0103 and 0.2017 compared to voice-only and text-only models, respectively, while RMSE decreased by 0.5186. On DAIC, the F1 score increased by 0.0645 and 0.2589, with RMSE reduced by 1.9901. These results highlight the proposed method's ability to capture and integrate multi-level information across modalities, significantly improving the accuracy and reliability of automated depression detection and severity prediction.
Depressive disorder is a common mental illness often triggered by environmental stress or social influences. It impacts mood and behavior, leading to difficulties in areas such as education, family life, and the workplace. In severe cases, depression can result in suicide attempts. Fortunately, depression is a treatable condition when properly diagnosed by mental health professionals. However, many individuals aware of their mental health issues hesitate to seek psychiatric care due to long waiting times and high treatment costs. To overcome this problem, this paper proposes voice-based depression classification using temporal inductive path neural networks (TIPNNs) to analyze speech patterns and identify mental health states (AVA-TIPNN-DD). Initially, input data are gathered from voice recordings and preprocessed using the Koopman Kalman particle filter, which cleans the audio recordings. The preprocessed data are then passed to a multi-objective matched synchrosqueezing chirplet transform to extract spectral features such as power spectral density, spectral power, spectral centroid, and spectral flatness measure. The extracted features are fed into the TIPNN, which classifies subjects as depressed or non-depressed. Because TIPNN by itself lacks an adaptive optimization strategy for determining optimal parameters, the binary battle royale optimizer algorithm is used to optimize the TIPNN's weight parameters for accurate detection. The proposed method is implemented in Python, and the efficacy of the AVA-TIPNN-DD technique was assessed using various performance measures, including Accuracy, Precision, Sensitivity, Specificity, and F1-score.
The AVA-TIPNN-DD method achieves 99.26% Accuracy, 99.5% Precision, 99.6% Sensitivity, and 99.5% F1-score when compared with existing techniques such as a decision-support system for major depression detection utilizing spectrum and convolutional neural network features from electroencephalogram signals (DSS-DD-CNN), a textual feature method for depression recognition using machine learning classifiers and social media texts (TBFA-DD-FFANN), and smart voice detection relying on deep learning for depression diagnosis (SVR-DD-DRN).
The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio. Additionally, a small fine-tuning network was used as a classification model to output depression classification results. Subsequently, the proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-classification, with an accuracy of 0.9481 and an RMSE of 0.3810. The wav2vec 2.0 model was used for depression recognition for the first time and showed strong generalization ability. The method is simple, practical, and applicable, and can assist doctors in the early screening of depression.
Extensive research on automatic depression diagnosis has utilized video data to capture related cues, but data collection is challenging because of privacy concerns. By contrast, voice data offer a less-intrusive assessment method and can be analyzed for features such as simple tones, the expression of negative emotions, and a focus on oneself. Recent advancements in multimodal depression-level prediction using speech and text data have gained traction, but most studies overlook the temporal alignment of these modalities, limiting their analysis of the interaction between speech content and intonation. To overcome these limitations, this study introduces timestamp-integrated multimodal encoding for depression (TIMEX-D), which synchronizes the acoustic features of human speech with corresponding text data to predict depression levels on the basis of their relationship. TIMEX-D comprises three main components: a timestamp extraction block that extracts timestamps from speech and text, a multimodal encoding block that extends positional encoding from transformers to mimic human speech recognition, and a depression analysis block that predicts depression levels while reducing model complexity compared with existing transformers. In experiments using the DAIC-WOZ and EDAIC datasets, TIMEX-D achieved accuracies of 99.17% and 99.81%, respectively, outperforming previous methods by approximately 13%. The effectiveness of TIMEX-D in predicting depression levels can enhance mental health diagnostics and monitoring across various contexts.
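Extending positional encoding from token indices to real timestamps, as the multimodal encoding block does, can be sketched as sinusoidal encoding keyed to seconds, so that a word and the audio frame it overlaps share the same encoding. The dimensionality and scale constant below are our assumptions, not TIMEX-D's actual configuration:

```python
import numpy as np

def timestamp_encoding(timestamps, d_model=8, scale=10000.0):
    """Sinusoidal positional encoding keyed to real timestamps (seconds)
    rather than token indices: modalities sampled at different rates get
    identical encodings for the same moment in time."""
    t = np.asarray(timestamps, dtype=float)[:, None]   # (N, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = t / scale ** (2 * i / d_model)
    enc = np.empty((len(t), d_model))
    enc[:, 0::2] = np.sin(angles)                      # even dims: sine
    enc[:, 1::2] = np.cos(angles)                      # odd dims: cosine
    return enc

# A word starting at 3.2 s and the audio frame at 3.2 s align exactly.
word_enc  = timestamp_encoding([3.2])
frame_enc = timestamp_encoding([3.2])
assert np.allclose(word_enc, frame_enc)
```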
Major depressive disorder, referred to as depression, is a leading cause of disability, absence from work, and premature death. Automatic assessment of depression from speech is a critical step towards improving diagnosis and treatment of depression. Previous works on depression assessment from speech considered various acoustic features extracted from speech to estimate depression severity. However, the performance of these approaches is not yet at clinical standards and thus requires further improvement. In this work, we examine two novel approaches for improving depression severity estimation from short audio recordings of speech. Specifically, in audio recordings of a narrative by individuals diagnosed with major depressive disorder, we analyze spectral-based and excitation source-based features extracted from speech, and the significance of sentiment and emotion classification in estimating depression severity. Initial results indicate synchrony between depression scores and the sentiment and emotion labels. We propose the use of sentiment- and emotion-based embeddings obtained using machine learning techniques in the estimation of depression severity. We also propose the use of multi-task training to better estimate depression severity. We show that the proposed approaches provide additive improvements in the estimation of depression severity.
Background Radiomics is an emerging image analysis framework that provides more details than conventional methods. In the present study, we aimed to identify structural radiomics features of gray matter (GM) and white matter (WM), and to develop and validate a classification model for major depressive disorder (MDD) and subthreshold depression (StD) diagnosis using radiomics analysis. Methods A consecutive cohort of 142 adolescents and young adults, including 43 cases with MDD, 49 cases with StD, and 50 healthy controls (HC), were recruited and underwent three-dimensional T1-weighted imaging (3D-T1WI) and diffusion tensor imaging (DTI). We extracted radiomics features representing the shape and diffusion properties of GM and WM from all participants. Then, an all-relevant feature selection process embedded in a 10-fold cross-validation framework was used to identify features with significant discriminative power. Random forest classifiers (RFC) were established and evaluated successively using the identified features. Results A total of 3030 features were extracted after preprocessing, including 2262 shape-related features from each T1-weighted image representing GM morphometry and 768 features from each DTI representing the diffusion properties of WM. 25 features were selected ultimately, including ten features for MDD versus HC, eight features for StD versus HC, and seven features for MDD versus StD.
The RFC achieved an accuracy of 86.75% and an area under the curve (AUC) of 0.93 for distinguishing MDD from HC, with significant radiomics features located in the left medial orbitofrontal cortex, right superior and middle temporal regions, right anterior cingulate, left cuneus, and hippocampus; 70.51% and 0.69 for discriminating StD from HC, within the left cuneus, medial orbitofrontal cortex, cerebellar vermis, hippocampus, anterior cingulate, and amygdala, and right superior and middle temporal regions; and 59.15% and 0.66 for differentiating MDD from StD, within the left medial orbitofrontal cortex, middle temporal region, and cuneus, and right superior frontal and superior temporal regions, hippocampus, and anterior cingulate. Conclusion These findings provide preliminary evidence that radiomics features of brain structure are valid for discriminating MDD and StD subjects from healthy controls. The MRI-based radiomics approach, with further improvement and validation, might be a potential facilitating method for the clinical diagnosis of MDD or StD.
The number of depression patients worldwide, particularly in Thailand, is steadily increasing. Depression screening commonly relies on self-report questionnaires; however, these instruments provide subjective assessments. Recent advancements in machine learning technology offer potential improvements in diagnostic accuracy through more objective measures. This study aims to evaluate the effectiveness of machine learning models in classifying depression using a bilingual audio dataset comprising Thai and English. Such models have the potential to assist clinicians by providing objective preliminary screening for depression based on vocal analysis, enhancing diagnostic precision and clinical decision-making. Various machine learning models were implemented, including KNN, MLP, Random Forest, Decision Tree, SGD, Logistic Regression, SVM, AdaBoost, and Gaussian Naïve Bayes, using MFCC-converted audio datasets. The results indicate that machine learning models effectively classify and identify depression even on bilingual audio datasets compared to individual-language models, with the highest accuracy reaching 0.95 from MLP and KNN when testing the trained model on a single Thai audio sample.
Depression is a kind of mental illness that is harmful to the development of society. Electroencephalography (EEG) is a promising tool for the auxiliary diagnosis of disease. In this paper, we develop a mutual information-based least absolute shrinkage and selection operator (MI-LASSO) model to learn representative features from the power spectral density (PSD) ratios extracted from the data. Specifically, MI-LASSO adds an adaptive weight based on mutual information to LASSO, which can discriminate the weights of different features. Following the feature selection accomplished by MI-LASSO, the feature set is input into the classifier. We design a stacking ensemble classifier composed of support vector machine (SVM), adaptive boosting (AdaBoost), random forest (RF), and K-nearest neighbor (KNN). Compared to independent classifiers, stacking has a stronger ability to recognize depression. The proposed framework is validated on two open datasets: MODMA and the dataset from Hospital Universiti Sains Malaysia (HUSM). The best classification accuracy reached 99.025% on MODMA and 99.06% on the second dataset. The results indicate that our framework outperforms other EEG-based methods in the identification of depression. We conducted several experiments whose results demonstrate that our framework can effectively assist in the diagnosis of depression based on EEG.
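A stacking ensemble over the paper's four base learners maps onto scikit-learn's StackingClassifier. The logistic-regression meta-learner and synthetic features below are our assumptions (the abstract does not name the meta-learner or provide data):

```python
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              AdaBoostClassifier)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic stand-in for selected PSD-ratio features (not MODMA/HUSM data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Stacking: the four base learners produce out-of-fold predictions, which a
# meta-learner combines into the final depression decision.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("ada", AdaBoostClassifier()),
                ("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(X, y)
acc = stack.score(X, y)  # training accuracy, for illustration only
```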
The consequences of depression are devastating these days: suicidal tendencies and other debilitating effects have spread across the world. An early detection system can combat such consequences. Motor activity sensor values capture an individual's daily routine activities, which can signify momentary changes in behavior, and consolidating these motor sensor data with demographic and clinical data can be very convenient for depression detection. This study combines motor sensor readings with demographic data using machine learning approaches, namely Random Forest (RF), AdaBoost, and Artificial Neural Networks (ANN), achieving an accuracy and F1-score of 98% in both cases. Cohen's kappa coefficient and Matthew's correlation coefficient are both 0.96.
Depression has emerged as a high-profile mental health issue, particularly among students. This study obtained multisource heterogeneous data from the Kaggle open-source platform. During the data preprocessing phase, an effective data dimensionality reduction method was proposed: a two-stage feature selection approach combining significance statistics and an Improved Binary Grey Predictive Evolutionary Algorithm (IBGPEA). The first stage employs a feature hierarchy mechanism integrating statistical concepts with the “jump degree” principle, while the second stage introduces a novel feature selection strategy based on a grey predictive evolutionary algorithm to enhance prediction accuracy and accelerate convergence. Based on the selected features, a depression risk prediction model was constructed using TabNet. The model leverages attention mechanisms and sequential decision-making processes to effectively capture complex relationships among features. Interpretability analysis reveals that the feature “Have you ever had suicidal thoughts” contributes significantly to depression risk prediction. Compared with Random Forest, SVM, KNN, Multi-Layer Perceptron, and AdaBoost, the TabNet model shows remarkable advantages across various performance indicators, enabling more accurate and reliable prediction of depression risk.
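Stage one of a significance-statistics feature filter, keeping features whose class-wise means differ significantly, can be sketched with per-feature t-tests on synthetic tabular data. The IBGPEA second stage is a metaheuristic search and is not reproduced here:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)

# Synthetic tabular data: 6 features, only the first two truly informative.
n = 300
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 6))
X[:, 0] += y * 1.5        # informative feature
X[:, 1] += y * 1.0        # informative feature

# Stage one (significance statistics): keep features whose class-conditional
# distributions differ significantly (two-sample t-test, p < 0.05).
p_values = np.array([ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue
                     for j in range(X.shape[1])])
selected = np.flatnonzero(p_values < 0.05)
```

The surviving feature subset would then be handed to the second-stage search and, finally, to the downstream predictor.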
Introduction: WHO projected that the depression disorder will increase over the next two decades; therefore, it should be diagnosed early and the sooner it can be detected and prevented. Several machine learning algorithms that may predict depression will be compared in this paper: CNN, LSTM, Bidirectional LSTM, Naive Bayes, Logistic Regression, Random Forest, AdaBoost, and Support Vector Machine. The results show that the models of deep learning, mainly variants of CNN and LSTM, exhibit high accuracy, and traditional classifiers perform exceptionally well. These results highlight how machine learning can contribute towards developing useful tools for the prevention of mental health disorders early in life. In modern life, online social platforms have very subtly integrated into the daily routine of a significant proportion of global citizens.
Mental health conditions like anxiety and depression often involve emotional dysregulation, but most methods used to measure emotions depend on people's own subjective reports. The goal of this study is to use signal processing and machine learning to create a system that can objectively classify emotional states based on physiological signals, especially EEG and ECG, to support real-time mental health monitoring and intervention. For this research, the DREAMER dataset and features from EEG and ECG channels for 23 subjects were used. Bandpass filtering and z-score normalization were used for preprocessing to remove noise and make the data more consistent. Band power, entropy, and HRV were extracted from 14 EEG and 2 ECG channels. To differentiate between baseline and stimulus conditions, multiple classification models were evaluated, including Gradient Boosting, Random Forest, AdaBoost Classifier, Logistic Regression, and K-Nearest Neighbors. Subject-exclusive train-test splits ensured generalizability. The Gradient Boosting model achieved the highest performance among all tested algorithms, with an F1 score of 76.74%, an accuracy of 77.78%, a precision of 80.49%, and a recall of 73.33%, suggesting that EEG and ECG signals offer a promising foundation for developing objective, physiology-based emotion recognition systems. The multimodal machine learning approach used in this study also lends itself to future work incorporating additional bio-signals, such as skin temperature from smartwatches, to enhance emotion detection accuracy in clinical and personal health applications.
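The preprocessing and band-power feature step can be sketched with SciPy, assuming DREAMER's 128 Hz EEG sampling rate and an illustrative 1–45 Hz passband (the study's exact filter design is not given in the abstract):

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

fs = 128  # assumed DREAMER EEG sampling rate (Hz)
rng = np.random.default_rng(5)
eeg = rng.normal(size=fs * 10)  # 10 s of synthetic single-channel EEG

# Preprocessing as described: bandpass filtering, then z-score normalization.
b, a = butter(4, [1, 45], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, eeg)
z = (filtered - filtered.mean()) / filtered.std()

def band_power(x, fs, lo, hi):
    """Mean PSD in the [lo, hi] Hz band from a Welch periodogram."""
    f, psd = welch(x, fs=fs, nperseg=fs * 2)
    mask = (f >= lo) & (f <= hi)
    return psd[mask].mean()

alpha = band_power(z, fs, 8, 13)   # one entry of the per-channel feature vector
```

Entropy and HRV features would be stacked alongside such band powers before classification.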
Depression, a pervasive mental disorder, significantly impacts individuals' daily lives, manifesting as persistent sadness, a negative disposition, lack of interest or pleasure, feelings of guilt or worthlessness, and sleep disturbances. Advances in psychological methodologies, including the use of questionnaires and the analysis of social media posts, have facilitated the detection of individuals affected by this mental health issue. While mental health is a priority in many developed countries, it remains largely neglected in regions like Bangladesh. This neglect is evident in the lack of a standardized corpus for detecting depression in the area. In this paper, we present the annotated data collected by our crawlers and detail the processes undertaken to standardize this corpus. We also propose a novel model for depression analysis and detection, which integrates Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Our model achieves an accuracy of 90.11%, demonstrating its efficacy. Additionally, we compare the proposed model with various machine learning models, including SVM, Random Forest, Naïve Bayes, KNN, AdaBoost, CNN, LSTM, and Decision Tree with hyperparameter tuning, using several performance metrics to highlight the advantages of our approach.
“Attention-Deficit Hyperactivity Disorder (ADHD)” is a neuro-developmental disorder in children under 12 years old. Learning deficits, anxiety, depression, sensory processing disorder, and oppositional defiant disorder are the most frequent comorbidities of ADHD. This research focuses on ADHD in children, considering its common occurrence and frequent coexistence with other mental disorders. The study utilizes the resting-state open-eye “Electroencephalogram” (EEG) signals of 61 children with ADHD and 60 healthy children. Morphological and “Power Spectral Density” (PSD) features associated with ADHD are analysed and “Principal Component Analysis” (PCA) is employed to reduce data dimensionality. Classification algorithms including AdaBoost, “K-Nearest Neighbour” (KNN) classifier, Naive Bayes, and random forest are utilized, with the Bernoulli Naive Bayes classifier achieving the highest accuracy of 96%. This study found some relevant characteristics for classification at the frontal (F), central (C), and parietal (P) electrode placement sites. Finally, this reveals distinct EEG patterns in children with ADHD and the study provides a potential supplementary method for ADHD diagnosis.
There is a significant correlation between depression, verbal behavior, and facial expressions. By analyzing patients' audio and facial visuals, depression assessments can be conducted. However, existing work is predominantly based on single modalities. Additionally, acquiring a sufficient amount of labeled data in clinical settings is challenging and costly. To leverage multimodal audio-visual data while addressing the lack of trainable labeled data, we propose an audiovisual multimodal semi-supervised depression detection model based on Graph Neural Networks (AVS-GNN). This model first extracts dual-modality temporal information from audio features and facial visual features of patients and obtains modality-specific high-level embedding representations through graph representation learning. Subsequently, it utilizes graph-based contrastive unsupervised learning to capture consistency information between pairs of unlabeled samples across different modalities and to facilitate cross-modal interactions. We specifically designed a hybrid weighted pseudo-labeling strategy to assign high-confidence pseudo-labels to unlabeled data and then retrain the model. Experiments on two depression datasets show that this model outperforms baseline methods across all evaluation metrics.
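The pseudo-labeling step can be sketched as confidence-thresholded self-training, a simplification of the hybrid weighted strategy described above; the data, the stand-in logistic-regression model, and the 0.9 threshold are all illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)

# Small labeled set plus a larger unlabeled pool (synthetic fused embeddings).
X_lab = np.vstack([rng.normal(-1, 1, (20, 4)), rng.normal(1, 1, (20, 4))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(-1, 1, (50, 4)), rng.normal(1, 1, (50, 4))])

# Fit on labeled data, pseudo-label only the unlabeled samples the model
# predicts with high confidence, then retrain on the enlarged set.
clf = LogisticRegression().fit(X_lab, y_lab)
proba = clf.predict_proba(X_unlab)
conf = proba.max(axis=1)
keep = conf >= 0.9                          # high-confidence samples only
X_aug = np.vstack([X_lab, X_unlab[keep]])
y_aug = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
clf_retrained = LogisticRegression().fit(X_aug, y_aug)
```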
This letter proposes a multimodal depression risk assessment (mDRA) framework to overcome the limitations of single-modal approaches and data fusion in depression detection from audio and text. The mDRA leverages advanced neural network models, including a BiLSTM model with ELMo embeddings for text analysis and a ResNet50 model to extract 2D heatmap features from audio inputs. A collaborative attention mechanism integrates cross-modal information to enhance diagnostic precision. Experiments on the EATD-Corpus (Chinese language) and DAIC-WoZ (English language) datasets demonstrate the framework's effectiveness, achieving F1-scores of 0.94 (18% improvement) and 0.88 (3% improvement), outperforming state-of-the-art methods. These improvements enhance the reliability of early depression risk screening across different languages, which is critical for timely intervention. When deployed on a cloud platform with a mobile interface, mDRA can facilitate user-friendly depression screening through interactive virtual conversations, showcasing potential for early-stage assessment.
Automatic depression detection based on audio and text representations from participants’ interviews has attracted widespread attention. However, most previous research used only one type of feature from a single modality for depression detection, so the rich information of audio and text from interviews has not been fully utilized. Moreover, an effective multi-modal fusion approach that leverages the independence between audio and text representations is still lacking. To address these problems, we propose a multi-modal fusion depression detection model based on the interaction of multi-level audio features and text sentence embeddings. Specifically, we first extract Low-Level Descriptors (LLDs), mel-spectrogram features, and wav2vec features from the audio. Then we design a Multi-level Audio Features Interaction Module (MAFIM) to fuse these three levels of features into a comprehensive audio representation. For interview text, we use pre-trained BERT to extract sentence-level embeddings. Further, to effectively fuse audio and text representations, we design a Channel Attention-based Multi-modal Fusion Module (CAMFM) that takes into account the independence and correlation between the two modalities. Our proposed model shows better performance on two datasets, DAIC-WOZ and EATD-Corpus, than existing methods, so it has high potential to be applied to interview-based depression detection in practice.
Depression is a global mental health problem, the worst case of which can lead to suicide. An automatic depression detection system provides great help in facilitating depression self-assessment and improving diagnostic accuracy. In this work, we propose a novel depression detection approach utilizing speech characteristics and linguistic content from participants’ interviews. In addition, we establish an Emotional Audio-Textual Depression Corpus (EATD-Corpus), which contains audio recordings and extracted transcripts of responses from depressed and non-depressed volunteers. To the best of our knowledge, EATD-Corpus is the first and only public depression dataset that contains audio and text data in Chinese. Evaluated on two depression datasets, the proposed method achieves state-of-the-art performance, demonstrating its effectiveness and generalization ability. The source code and EATD-Corpus are available at https://github.com/speechandlanguageprocessing/ICASSP2022-Depression.
Depression remains one of the most prevalent and underdiagnosed mental health disorders globally, necessitating scalable, objective, and non-invasive diagnostic tools. Speech, as a rich biomarker of emotional and psychological states, offers a promising avenue for automated depression detection. This study proposes a robust hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer architectures to classify depression severity into three levels: normal, mild, and severe. Using a curated multimodal dataset comprising 400 labeled audio recordings, we extract comprehensive acoustic features, including MFCC, Chroma, Spectrogram, Contrast, and Tonnetz representations. Models are evaluated using precision, recall, F1-score, and accuracy. Experimental results show that the proposed hybrid models outperform traditional architectures, achieving up to 99% accuracy and strong generalization across all classes. This study demonstrates the potential of attention-enhanced hybrid architectures in mental health assessment and provides a foundation for future deployment in clinical and real-world settings. Future work includes multimodal fusion with EEG data and the implementation of explainable AI for clinical interpretability.
Depression is a severe mental illness, and extracting emotional information from video-audio signals for multimodal depression recognition is a challenging problem. Recent methods use the self-attention (SA) mechanism from Transformers to capture the dynamic relationships between different modalities. However, the quadratic computational complexity of SA reduces its effectiveness in modeling long sequences, making it insufficient for capturing complex intra-modal and inter-modal complementarity. To address this issue, this work proposes a Multimodal Fusion Mamba (MFMamba) framework, which is attention-free and purely focuses on using state space models (SSMs) for long-sequence modeling. Specifically, we devise Video Spatio-Temporal Mamba (VSTMamba) and Audio Temporal Mamba (ATMamba) for video-audio feature extraction. To fully capture the correlations among multimodal features and eliminate information redundancy, we introduce Fusion Mamba (FMamba) to integrate various features effectively. In experiments on AVEC 2013 and AVEC 2014 datasets, our method achieved competitive results.
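The state-space recurrence at the heart of Mamba-style models like MFMamba can be illustrated with a minimal (non-selective, unoptimized) linear SSM scan; the toy dimensions and parameters below are assumptions for illustration only:

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Linear time-invariant SSM: x_t = A x_{t-1} + B u_t, y_t = C . x_t.

    u: (T,) input; A: (N, N); B, C: (N,). Returns the (T,) output.
    Mamba makes B, C (and the step size) input-dependent ("selective")
    and evaluates this recurrence with a hardware-aware parallel scan.
    """
    x = np.zeros(B.shape[0])
    ys = np.empty(len(u))
    for t, ut in enumerate(u):
        x = A @ x + B * ut     # state update
        ys[t] = C @ x          # readout
    return ys

# Toy check: with A = 0.5 * I and B = C = ones, the model is a leaky
# accumulator whose impulse response decays geometrically.
y = ssm_scan(np.array([1.0, 0.0, 0.0]), 0.5 * np.eye(2), np.ones(2), np.ones(2))
```

Because the recurrence is linear in the state, long sequences cost O(T) rather than the O(T²) of self-attention, which is the efficiency argument these abstracts make.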
Automatic Depression Detection (ADD) garners widespread attention due to its convenience and objectivity. While existing research makes significant progress, challenges remain. First, most current ADD methods struggle to balance computational overhead and prediction accuracy. Second, these methods primarily rely on facial images and audio, which are susceptible to external factors, affecting the model’s generalizability. In this study, we propose the Spatiotemporal Ensemble Mamba (STE-Mamba), a framework based entirely on the Mamba architecture for detecting and ensembling spatiotemporal information. This approach reduces computational overhead while effectively capturing long-range spatiotemporal information. Additionally, we extract remote Photoplethysmography (rPPG) and emotion trends (ET) from facial videos, providing two more generalizable physiological modalities for ADD. Experimental results indicate that the inclusion of the ET modality, which only adds two dimensions, improves diagnostic accuracy by approximately 6%. We conduct extensive experiments on five datasets (AVEC2013, AVEC2014, AVEC2017, AVEC2019, CMDep), and the results demonstrate that STE-Mamba is highly competitive in terms of both effectiveness and generalizability. The self-built CMDep can be requested via the following link.
Depression is one of the most common mental health disorders, so it is crucial to design an effective and robust model for automatic depression detection (ADD). Current approaches rely on extra topic models or manual topic-selection procedures, which are time-consuming, and they have not thoroughly explored the rich contextual information in clinical interviews. In this paper, we propose HCAG, a novel Hierarchical Context-Aware Graph attention model for ADD. Our model mirrors the hierarchical structure of depression assessment and leverages the Graph Attention Network (GAT) to grasp relational contextual information of the text/audio modality. Experiments on the DAIC-WOZ dataset show a great performance improvement, with an F1-score of 0.92, a Mean Absolute Error (MAE) of 2.94, and a Root Mean Square Error (RMSE) of 3.80. To the best of our knowledge, our model outperforms the existing state-of-the-art methods.
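A single graph-attention layer of the kind GAT provides can be sketched in NumPy. The node count, dimensions, and chain adjacency are hypothetical, and this simplification omits multi-head attention and biases:

```python
import numpy as np

def gat_layer(H, adj, W, a):
    """One single-head Graph Attention Network layer in plain NumPy.

    H: (n, f) node features; adj: (n, n) adjacency with self-loops;
    W: (f, d) projection; a: (2*d,) attention vector.
    """
    Z = H @ W                                              # projected nodes (n, d)
    d = Z.shape[1]
    # Attention logits e_ij = LeakyReLU(a . [z_i || z_j]).
    logits = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]
    logits = np.where(logits > 0, logits, 0.2 * logits)    # LeakyReLU
    logits = np.where(adj > 0, logits, -1e9)               # mask non-edges
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)              # softmax over neighbors
    return np.tanh(alpha @ Z)                              # aggregate and activate

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 8))                            # 6 utterance nodes
adj = np.eye(6) + np.eye(6, k=1) + np.eye(6, k=-1)         # chain + self-loops
out = gat_layer(H, adj, rng.standard_normal((8, 4)), rng.standard_normal(8))
```

In a hierarchical setup such as HCAG, nodes would be interview utterances and a second level would aggregate utterance embeddings into a session-level prediction.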
The high prevalence of mental disorders is increasingly pressuring public healthcare services and has become a major global societal and health challenge. Current deep learning-based diagnostic systems are costly and often lack versatility, being constrained by expensive MRI machines, trained staff, and the acquisition and processing of neuroimaging data. This paper introduces a cost-effective audio-visual detection model that leverages the emotional expression features of mental disorders. The proposed model innovatively integrates the Wavelet Transform (WT) into the ResNet backbone, replacing the conventional convolutional kernels. This approach expands the receptive field and generates effective multi-frequency responses. We also propose audio-visual fusion methods to capture features across both modalities efficiently. The proposed model significantly improves the detection of mental disorders on three real-world datasets focused on depression and ADHD: AVEC 2013, AVEC 2014, and a multimodal ADHD dataset. The model demonstrates state-of-the-art performance, achieving over 80% accuracy across the datasets, indicating strong robustness and efficiency with potential for broad application and screening.
The automatic assessment of depression through human voice has gained increasing interest due to its cost-effectiveness and non-invasiveness. This paper employs acoustic embeddings from emotionally rich speech segments for depression prediction, given the strong connection between depression and emotion expression. We leverage a large pre-trained model to predict depression. The public dimensional emotion model (PDEM) used in this study is fine-tuned for recognizing arousal, valence and dominance. We use PDEM for both extracting embeddings and selecting emotionally rich speech segments based on its arousal, valence, and dominance predictions. We advance the state-of-the-art performance on both the Androids corpus (Interview task) for depression detection, following the predetermined protocol, and the E-DAIC corpus for acoustic-based depression severity prediction, adhering to the 2019 Audio Visual Emotion Challenge (AVEC) protocol. The analysis demonstrates that emotionally rich speech segments contain more depression-related cues compared to emotion-neutral segments.
Depression is a common mental disorder that affects millions of people worldwide. Although promising, current multimodal methods hinge on aligned or aggregated multi-modal fusion, suffering two significant limitations: (i) inefficient long-range temporal modeling, and (ii) sub-optimal multimodal fusion between intermodal fusion and intramodal processing. In this paper, we propose an audio-visual progressive fusion Mamba for multimodal depression detection, termed DepMamba. DepMamba features two core designs: hierarchical contextual modeling and progressive multimodal fusion. On the one hand, hierarchical modeling introduces convolution neural networks and Mamba to extract the local-to-global features within long-range sequences. On the other hand, the progressive fusion first presents a multimodal collaborative State Space Model (SSM) extracting intermodal and intramodal information for each modality, and then utilizes a multimodal enhanced SSM for modality cohesion. Extensive experimental results on two large-scale depression datasets demonstrate the superior performance of our DepMamba over existing state-of-the-art methods. Code is available at https://github.com/Jiaxin-Ye/DepMamba.
This paper presents our approach to the first Multimodal Personality-Aware Depression Detection Challenge, focusing on multimodal depression detection using machine learning and deep learning models. We explore and compare the performance of XGBoost, transformer-based architectures, and large language models (LLMs) on audio, video, and text features. Our results highlight the strengths and limitations of each type of model in capturing depression-related signals across modalities, offering insights into effective multimodal representation strategies for mental health prediction.
Depression is a widespread mental health disorder that affects millions of people around the world. Compared to other forms of expression, text modality more authentically conveys human thoughts, making it valuable to explore its potential for multimodal depression detection. However, combining audio and text modalities for automatic depression detection presents challenges due to the redundancy of information and the differences between the two. This paper proposes a new approach to address these challenges by introducing a multimodal feature enhancement-based cross-modal feature fusion framework for depression detection. Our method focuses on extracting relevant features from both the audio and text modalities, with an emphasis on semantic information. Temporal convolutional networks are used to model the time-based features of each modality. Additionally, a bidirectional cross-attention modality expert module is designed to enhance and better understand the features of each modality. To effectively capture both the internal relationships within each modality and the connections between audio and text, we introduce a recursive joint-specific cross-modal attention feature fusion technique. This approach ensures that each modality's unique information is preserved during the fusion process, preventing the loss of important features. The proposed model is evaluated extensively on the DAIC-WoZ public dataset, and the results demonstrate its competitive performance compared to other methods.
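Bidirectional cross-modal attention of the kind these fusion modules build on can be sketched with plain scaled dot-product attention (single head, no learned projections; the shapes and mean-pooling fusion are illustrative assumptions):

```python
import numpy as np

def cross_attention(Q, KV, d):
    """Scaled dot-product attention: queries from one modality,
    keys/values from the other (learned projections omitted for brevity)."""
    scores = Q @ KV.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)) @ KV

rng = np.random.default_rng(2)
audio = rng.standard_normal((20, 32))     # 20 audio frames
text = rng.standard_normal((8, 32))       # 8 sentence embeddings

# Bidirectional: audio attends to text and vice versa; the enriched
# streams are mean-pooled and concatenated for a classifier head.
a2t = cross_attention(audio, text, 32)
t2a = cross_attention(text, audio, 32)
fused = np.concatenate([a2t.mean(axis=0), t2a.mean(axis=0)])   # (64,)
```

Recursive or joint-specific variants, as in the paper, would interleave such cross-attention steps with modality-specific processing so that each modality's own information survives the fusion.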
Major depression, also known as clinical depression, is a constant sense of despair and hopelessness. It is a major mental disorder that can affect people of any age, including children, and that negatively affects a person's personal life, work life, social life, and health. Globally, over 300 million people of all ages are estimated to suffer from clinical depression. A deep recurrent neural network-based framework is presented in this paper to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire (a depression assessment test) and the binary class of depression diagnosis. To overcome the small size of Speech Depression Recognition (SDR) datasets, data augmentation techniques are used to expand the labeled training set, and transfer learning is performed: the proposed model is trained on a related task and reused as a starting point for the SDR task. The proposed framework is evaluated on the DAIC-WOZ corpus of the AVEC2017 challenge and promising results are obtained. An overall accuracy of 76.27% with a root mean square error of 0.4 is achieved in assessing depression, while a root mean square error of 0.168 is achieved in predicting the depression severity levels.
Automatic depression detection has attracted an increasing amount of attention but remains a challenging task. Psychological research suggests that depressive mood is closely related to emotion expression and perception, which motivates the investigation of whether knowledge from emotion recognition can be transferred to depression detection. This paper uses pretrained features extracted from an emotion recognition model for depression detection, and further fuses the emotion modality with audio and text to form multimodal depression detection. The proposed emotion transfer improves depression detection performance on DAIC-WOZ and increases training stability. The analysis of how the emotion expressed by depressed individuals is perceived provides clues for further understanding of the relationship between depression and emotion.
Multimodal depression classification has gained immense popularity over recent years. We develop a multimodal depression classification system using articulatory coordination features extracted from vocal tract variables and text transcriptions obtained from an automatic speech recognition tool, which yields improvements in the area under the receiver operating characteristic curve over unimodal classifiers (7.5% for audio and 13.7% for text). We show that in the case of limited training data, a segment-level classifier can first be trained and then used to obtain a session-wise prediction without hindering performance, using a multi-stage convolutional recurrent neural network. A text model is trained using a Hierarchical Attention Network (HAN). The multimodal system is developed by combining embeddings from the session-level audio model and the HAN text model.
Mental health disorders often go undiagnosed due to the subjective nature of conventional self-assessment methods. This paper presents a multi-model deep learning framework that integrates text, audio, and video modalities for emotion-aware mental health assessment. Exploiting transformer-based architectures such as Wav2Vec2, BiLSTM with attention pooling, and Swin Transformers, the model performs simultaneous emotion recognition and depression detection via multi-task learning. A cross-modal attention fusion mechanism with residual connections is incorporated to achieve effective feature alignment and modality-invariant representations, and a contrastive learning strategy provides stronger robustness to missing or noisy inputs. Each modality embedding is mapped to a shared latent space and fused into a 256-dimensional joint representation with an eight-head Cross-Modal Attention module. Residual Fusion Layers (RFL) provide stable training and preserve modality-specific features. The final classifier predicts the degree of mental health risk (0–3: healthy to severe) and dominant emotions mapped to standardized scales (PHQ-9, GAD-7). Experiments on CMU-MOSEI
Depression is a widespread disease of increasing global concern. Early recognition of signs of depression is crucial to evaluating and treating or preventing mental illness. With advances in machine learning, it has become possible to develop intelligent systems capable of recognizing depression and its signs in speech by analyzing and processing audio signals. This study presents an AI model for detecting and predicting mental illnesses through speech analysis of medical datasets related to depression. We used the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database, where 60% of the data was reserved for training, 20% was allocated for testing the model, and another 20% for model validation. The model includes a convolutional neural network (CNN) to detect and predict mental illnesses. The proposed CNN model achieved an accuracy of 82% in the training and testing phases. Ultimately, the results are useful for classifying depression in English speech and will help psychiatrists and psychologists contribute to early detection of depression at an affordable cost.
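The 60/20/20 train/test/validation split described above can be reproduced with a simple index shuffle; the seed and dataset size here are placeholders:

```python
import numpy as np

def split_indices(n, frac=(0.6, 0.2, 0.2), seed=0):
    """Shuffle n sample indices into train/test/validation partitions."""
    idx = np.random.default_rng(seed).permutation(n)
    a, b = int(frac[0] * n), int((frac[0] + frac[1]) * n)
    return idx[:a], idx[a:b], idx[b:]

train, test, val = split_indices(100)   # 60 / 20 / 20 samples
```

For interview corpora such as DAIC-WOZ, the split would normally be done per participant rather than per audio segment, so that no speaker appears in more than one partition.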
Depression affects the entire nervous system and, in turn, human behavior. Electroencephalogram (EEG) signal classification of depression datasets using traditional methods takes time. Non-invasive EEG signals provide valuable insights into the neural patterns and abnormalities of this mental health condition. The proposed ENS model classifies these EEG datasets better than many other machine learning models; thus, it is used to investigate the dataset and classify it as normal or depressed. The ENS model reduces dimensionality after extracting features, and multiple classifiers classify the dataset. The proposed work attains a maximum classification accuracy of 97%. To validate the hardware's computational efficiency, the proposed method was implemented on an FPGA, and performance analyses were performed on various multiply-accumulate (MAC) units. Overall performance of the proposed work is improved to 98.8% compared to the conventional approach.
Clinical research has demonstrated that exploring behavioral signal differences between depressed patients and non-depressed people using audiovisual technology is an effective approach for achieving depression recognition. Hence, in this paper we propose an emotion word reading experiment (EWRE), and extract features from facial expressions and audios for depression recognition. Building upon this, we propose a depression recognition model (DEP-Former), which deeply integrates multimodal features. DEP-Former first designs a modality adapter to achieve emotion space mapping and the sharing of multimodal features, addressing cross-modal inconsistencies. Simultaneously, it proposes a mechanism of attention index sharing, exceeding the limitations of cognitive subjectivity by calculating confidence in key emotional information across modalities. Finally, we propose a multimodal cross-attention module and a Bernoulli distribution feature fusion prediction module to achieve deep integration of multilevel information, thereby enabling depression recognition. Compared with existing advanced multimodal models, DEP-Former demonstrates superior performance in EWRE, achieving an accuracy of 0.9500 and an F1 score of 0.9499, significantly enhancing depression recognition over the single-modality methods. Furthermore, its robust generalization ability is validated on the AVEC 2014 dataset. Through the attention query of the interpretability analysis module, we discover that depressed patients exhibit heightened sensitivity to negative emotional words, such as dismissal and tragedy. In contrast, healthy individuals tend to be more attuned to positive emotional words, including passion, purity, and justice. Additionally, depressed patients exhibit a degree of psychological state diversity, showing sensitivity to some positive emotional words as well. Our codes and data are available at https://github.com/QLUTEmoTechCrew/DEP-Former.
Major depressive disorder (MDD) poses a significant challenge in mental healthcare due to difficulties in accurate diagnosis and timely identification. This study explores the potential of machine learning models trained on EEG-based features for depression detection. Six models and six feature selection techniques were compared, highlighting the crucial role of feature selection in enhancing classifier performance. This study investigates the six feature selection methods: Elastic Net, Mutual Information (MI), Chi-Square, Forward Feature Selection with Stochastic Gradient Descent (FFS-SGD), Support Vector Machine-based Recursive Feature Elimination (SVM-RFE), and Minimal-Redundancy-Maximal-Relevance (mRMR). These methods were combined with six diverse classifiers: Logistic Regression, Support Vector Machine (SVM), Random Forest, Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). The results demonstrate the substantial impact of feature selection on model performance. SVM-RFE with SVM achieved the highest accuracy (93.54%) and F1 score (95.29%), followed by Logistic Regression with an accuracy of 92.86% and F1 score of 94.84%. Elastic Net also delivered strong results, with SVM and Logistic Regression both achieving 90.47% accuracy. Other feature selection methods yielded lower performance, emphasizing the importance of selecting appropriate feature selection and machine learning algorithms. These findings suggest that careful selection and application of feature selection techniques can significantly enhance the accuracy of EEG-based depression detection.
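Recursive feature elimination of the kind paired with SVM above can be sketched with a least-squares linear scorer standing in for the SVM; the synthetic data, coefficients, and the "drop the smallest |coefficient|" rule are illustrative:

```python
import numpy as np

def rfe(X, y, keep):
    """Recursive feature elimination: repeatedly fit a linear scorer
    and drop the feature with the smallest absolute coefficient."""
    active = list(range(X.shape[1]))
    while len(active) > keep:
        w, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        active.pop(int(np.argmin(np.abs(w))))
    return active

# Synthetic target driven by features 3 and 7 only.
rng = np.random.default_rng(3)
X = rng.standard_normal((120, 10))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(120)
selected = rfe(X, y, keep=2)
```

With an actual SVM, the scorer would be the fitted weight vector of a linear kernel, and the number of features to keep would typically be chosen by cross-validation.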
Background Major depressive disorder (MDD) remains challenging to diagnose due to its reliance on subjective interviews and self-reports. Objective, technology-driven methods are increasingly needed to support clinical decision-making. Wearable point-of-view (POV) glasses, which capture both visual and auditory streams, may offer a novel solution for multimodal behavioral analysis. Objective This study investigated whether features extracted from POV glasses, analyzed with machine learning, can differentiate individuals with MDD from healthy controls. Methods We studied 44 MDD patients and 41 age- and sex-matched healthy controls (HCs) aged 18–55 years. During semi-structured interviews, POV glasses recorded video and audio data. Visual features included gaze distribution, smiling duration, eye-blink frequency, and head movements. Speech features included response latency, silence ratio, and word count. Recursive feature elimination was applied. Multiple classifiers were evaluated, and the primary model, ExtraTrees, was assessed using leave-one-out cross-validation. Results After Bonferroni correction, smiling duration, center-gaze duration, and happy-face duration showed significant group differences. The multimodal classifier achieved an accuracy of 84.7%, sensitivity of 90.9%, specificity of 78%, and an F1 score of 86%. Conclusions POV glasses combined with machine learning successfully captured multimodal behavioral markers distinguishing MDD from controls. This low-burden, wearable approach demonstrates promise as an objective adjunct to psychiatric assessment. Future studies should evaluate its generalizability in larger, more diverse populations and real-world clinical settings.
We established an empirically grounded framework for EEG-based depression detection by jointly optimizing channel selection and machine-learning models. Recursive Feature Elimination (RFE) identified 11 key channels, and an MLP classifier achieved 98.7% accuracy with AI explainability, outpacing XGBoost and LGBM by 5.2 to 8.2% across multiple datasets (n=184 to 382) while showing strong generalization (precision=1.000, recall=0.966). This makes the MLP a trustworthy BCI tool for real-world depression screening. We also examined assigning depression stages (Mild/Moderate/Severe) from EEG data with models trained with and without GAN-based augmentation (198 to 5,000 samples). CNNs performed well on Moderate-stage classification, while ANFIS maintained a firm accuracy of 98.34% with consistent metrics (precision/recall=0.98) and AI explainability. GAN augmentation improved classification of severe cases by 15%, indicating that neuro-fuzzy systems and synthetic data combine well for precise stage determination. This is an important contribution to BCI research, offering a data-efficient and scalable framework for EEG-based depression diagnosis and severity evaluation and bridging competitive modeling and clinical applicability. This work lays a pathway for the design of accessible, automated depression screening aids in both high-resource and low-resource settings.
No abstract available
Background Major depressive disorder (MDD) is a common mental illness, characterized by persistent depression, sadness, and despair, seriously troubling people’s daily life and work. Methods In this work, we present a novel automatic MDD detection framework based on EEG signals. First, we derive highly MDD-correlated features, calculating the ratio of features extracted from EEG signals in the $\beta$ and $\alpha$ frequency bands. Then, a two-stage feature selection method named PAR is presented as the sequential combination of the Pearson correlation coefficient (PCC) and recursive feature elimination (RFE), whose advantage lies in minimizing the feature search space. Finally, we employ the widely used machine learning methods of support vector machine (SVM), logistic regression (LR), and linear regression (LNR) for MDD detection, with the merit of feature interpretability. Results Experimental results show that our proposed MDD detection framework achieves competitive results. The accuracy and $F_1$ score are up to 0.9895 and 0.9846, respectively. Meanwhile, the regression determination coefficient $R^2$ for MDD severity assessment is up to 0.9479. Compared with existing MDD detection methods, whose best accuracy is 0.9840 with an $F_1$ score of 0.97, our proposed framework achieves state-of-the-art MDD detection performance. Conclusions This MDD detection framework can potentially be deployed in a medical system to aid physicians in screening for MDD patients.
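The β/α band-power ratio features and the PCC stage of PAR might be sketched as follows; the sampling rate, band edges, and synthetic data are assumptions, and the RFE stage is omitted:

```python
import numpy as np

def beta_alpha_ratio(eeg, fs):
    """Per-channel beta/alpha spectral power ratio from raw EEG."""
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2
    beta = psd[..., (freqs >= 13) & (freqs < 30)].sum(axis=-1)
    alpha = psd[..., (freqs >= 8) & (freqs < 13)].sum(axis=-1)
    return beta / (alpha + 1e-12)

def pcc_filter(F, y, k):
    """PCC stage of PAR: keep the k features most correlated with the label."""
    Fc, yc = F - F.mean(axis=0), y - y.mean()
    r = (Fc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Fc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r))[:k]

rng = np.random.default_rng(4)
eeg = rng.standard_normal((30, 16, 512))   # 30 subjects, 16 channels, 2 s @ 256 Hz
feats = beta_alpha_ratio(eeg, fs=256)      # (30, 16) feature matrix
labels = rng.integers(0, 2, 30).astype(float)
top = pcc_filter(feats, labels, k=5)       # candidates passed on to RFE
```

The PCC filter cheaply prunes the search space before the more expensive RFE stage, which is the stated advantage of the two-stage design.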
No abstract available
No abstract available
This study addresses the challenge of monitoring dynamic stresses in high-pressure turbine blades of aviation engines by designing a non-contact dynamic stress prediction method for rotating disks based on a BP neural network. First, the research focuses on a bladed disk, simulating blade tip displacement and blade root strain under various excitation conditions, and calculating stress outcomes using Hooke’s Law. Subsequently, an input-output neural network surrogate model is constructed with disk excitation conditions as inputs and blade tip displacement and monitoring-point stress as outputs. Finally, the input and output data are divided into training and testing sets, and the surrogate model is trained and validated, ultimately establishing a non-contact dynamic stress prediction model for the disk under multi-parameter coupling. The data for building the neural network surrogate model are obtained through flat-bladed disk simulations, with model prediction errors within 6%.
The healthcare industry is currently experiencing a significant imbalance, with a high patient-to-doctor ratio, posing challenges to effective patient care. In the age of healthcare digitization, using machine learning (ML) to forecast diseases has become essential for accurately identifying and effectively treating a variety of medical disorders. This paper examines how to use Django and machine learning to build a reliable multi-disease prediction system for diseases such as stroke, depression in students, and diabetes. Based on user input features, the system is trained to categorize and forecast the likelihood of various diseases using real-world medical datasets. Both healthcare experts and non-technical individuals can benefit from the system thanks to the Django framework's smooth user interface. The paper also discusses crucial procedures including data pre-processing, model selection, training, and deployment, emphasizing how crucial recall and dependability are for predictive health solutions. By offering prompt medical action, it seeks to increase awareness of how these technologies can transform early diagnosis, lower healthcare costs, and ultimately improve patient outcomes.
Background: Undiagnosed mental illnesses represent one of the biggest challenges in our society. Due to stigma surrounding mental health, many people experience symptoms years before diagnosis and often never receive active management. Objectives: We use person-generated health data, consisting of self-reports and data from consumer wearable devices, to predict an individual's depression severity level. Methods: Reference labels and input feature sets were derived from a 1-year longitudinal cohort study consisting of 10,036 individuals. Participant-reported PHQ-9 scores were used as reference labels for depression severity, and input feature sets consisted of self-reported socio-demographic information, lifestyle and medication change surveys, and objective behavioral data collected using consumer wearables. Results: Our best performing model achieved an adjacent accuracy of 0.889 (CI ±0.006) and a Kappa of 0.655 (CI ±0.015). We observe that socio-demographic features contribute strongly to model performance, and that although good performance can be achieved with self-reported features, the addition of a small number of threshold-based features derived from objective wearable data improves model robustness. Conclusions: To our knowledge, the presented classification model is developed using the largest longitudinal cohort study ever considered for depression diagnosis, and is one of the first attempts to predict granular depression severity beyond binary classification of depressed individuals versus healthy controls. We demonstrate the feasibility of our approach for this non-trivial problem. Future work will focus on combining the output labels of this model with self-reports in order to attempt to predict changes in individual, longitudinal mental health status.
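The adjacent-accuracy metric reported above (0.889) counts a prediction as correct when it falls within one severity band of the truth. A minimal sketch using the standard PHQ-9 cut-offs, with invented example predictions:

```python
def phq9_severity(score):
    """Standard PHQ-9 banding: 0-4 minimal, 5-9 mild, 10-14 moderate,
    15-19 moderately severe, 20-27 severe (returned as levels 0-4)."""
    for level, hi in enumerate((4, 9, 14, 19, 27)):
        if score <= hi:
            return level
    raise ValueError("PHQ-9 scores range from 0 to 27")

def adjacent_accuracy(pred, true):
    """Fraction of predictions within one severity level of the truth."""
    return sum(abs(p - t) <= 1 for p, t in zip(pred, true)) / len(true)

levels = [phq9_severity(s) for s in (3, 7, 12, 18, 24)]   # one score per band
acc = adjacent_accuracy([0, 2, 2, 3, 4], levels)          # every miss is adjacent
```

Predicting the banded level rather than the raw 0-27 score is what makes the task a granular classification problem rather than a regression.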
BACKGROUND Falls are the most common adverse outcome of depression in older adults, yet an accurate fall-risk prediction model stratified by distinct long-term trajectories of depressive symptoms is still lacking. METHODS We collected data on 1617 participants from the China Health and Retirement Longitudinal Study register, spanning 2011 to 2018. The 36 input variables included in the baseline survey were treated as candidate features. Trajectories of depressive symptoms were classified using the latent class growth model and the growth mixture model. Three data balancing techniques and four machine learning algorithms were used to develop predictive models for fall classification across depressive prognoses. RESULTS Depressive symptom trajectories fell into four categories: non-symptoms, new-onset increasing symptoms, slowly decreasing symptoms, and persistent high symptoms. The random forest-TomekLinks model achieved the best performance among the case and incident models, with AUC-ROCs of 0.844 and 0.731, respectively. In the chronic model, the gradient boosting decision tree with the synthetic minority oversampling technique obtained an AUC-ROC of 0.783. In all three models, the depressive symptom score was the most important feature. Lung function was a common and significant feature in both the case and chronic models. CONCLUSIONS This study suggests that the best model has a good chance of identifying older persons at high risk of falling, stratified by long-term trajectories of depressive symptoms. Baseline depressive symptom score, lung function, income, and injury experience are influential factors associated with falls across the course of depression.
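The data-balancing step reported above (the synthetic minority oversampling technique in the chronic model) synthesizes minority-class samples by interpolating between nearby minority points. A rough SMOTE-style sketch in plain NumPy, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(X_min, n_new, k=3):
    # SMOTE-style oversampling: each synthetic point lies on the segment
    # between a minority sample and one of its k nearest minority neighbors.
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                     # interpolation weight in [0, 1)
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)

X_min = rng.normal(size=(10, 4))   # toy minority class: 10 samples, 4 features
X_new = smote_like(X_min, n_new=20)
print(X_new.shape)  # (20, 4)
```

TomekLinks, the other balancer named above, works in the opposite direction: it undersamples the majority class by removing points that form cross-class nearest-neighbor pairs.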
Deep neural networks are vulnerable to adversarial examples, which induce erroneous predictions by injecting imperceptible perturbations. Transferability is a crucial property of adversarial examples, enabling effective attacks under black-box settings. Adversarial examples at flat maxima, regions where the loss is high but varies slowly, have been demonstrated to exhibit higher transferability. Existing methods for finding flat maxima rely on the gradient of the worst-case loss within a small neighborhood of the adversarial point. However, this neighborhood is typically defined in Euclidean space, which neglects the information geometry of the input space and leads to suboptimal results. In this work, we build on the idea of flat maxima but extend the neighborhood structure from Euclidean space to a manifold measured by the Fisher metric, which accounts for the information geometry of the data space. In the non-Euclidean case, we search for the worst-case point in the direction of the natural gradient with respect to the adversarial example. The natural gradient adjusts the original gradient using the Fisher information matrix, giving the steepest ascent direction on the manifold. Furthermore, to reduce the computational cost of the Fisher information matrix, we introduce a diagonal approximation and propose an empirical Fisher method under the model ensemble setting. Experimental results demonstrate that our manifold extensions significantly enhance attack success rates against both normally and adversarially trained models. In particular, compared to methods relying on the Euclidean metric, our approach is more efficient.
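The diagonal empirical-Fisher approximation described above amounts to preconditioning the gradient by the per-coordinate second moments of per-sample (or per-ensemble-member) gradients. A toy sketch with invented values, not the paper's attack pipeline:

```python
import numpy as np

def natural_gradient(grad, per_sample_grads, eps=1e-8):
    # Diagonal empirical Fisher: F_ii ~ E[g_i^2] across samples.
    # Preconditioning by F^{-1} approximates the natural-gradient
    # (steepest-ascent) direction on the Fisher-metric manifold.
    fisher_diag = np.mean(per_sample_grads ** 2, axis=0)
    return grad / (fisher_diag + eps)

# Toy per-sample gradients: coordinate 1 carries much higher Fisher
# information, so the natural gradient down-weights it relative to
# coordinate 0 even though the averaged gradient treats them equally.
per_sample = np.array([[1.0, 10.0], [-1.0, 10.0], [1.0, 10.0], [-1.0, 10.0]])
g = np.array([1.0, 1.0])
ng = natural_gradient(g, per_sample)
print(ng[0] > ng[1])  # True: the low-information direction is amplified
```

The diagonal approximation is what makes this affordable: a full Fisher matrix for an image-sized input would be quadratic in the number of pixels.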
Major depressive disorder (MDD) is the leading cause of disability worldwide, yet treatment selection still proceeds via “trial and error”. Given the varied presentation of MDD and heterogeneity of treatment response, the use of machine learning to understand complex, non-linear relationships in data may be key for treatment personalization. Well-organized, structured data from clinical trials with standardized outcome measures is useful for training machine learning models; however, combining data across trials poses numerous challenges. There is also persistent concern that machine learning models can propagate harmful biases. We have created a methodology for organizing and preprocessing depression clinical trial data such that transformed variables harmonized across disparate datasets can be used as input for feature selection. Using Bayesian optimization, we identified an optimal multi-layer dense neural network that used data from 21 clinical and sociodemographic features as input in order to perform differential treatment benefit prediction. With this combined dataset of 5032 individuals and 6 drugs, we created a differential treatment benefit prediction model. Our model generalized well to the held-out test set and produced similar accuracy metrics in the test and validation sets, with an AUC of 0.7 when predicting binary remission. To address the potential for bias propagation, we used a bias testing performance metric to evaluate the model for harmful biases related to ethnicity, age, or sex. We present a full pipeline from data preprocessing to model validation that was employed to create the first differential treatment benefit prediction model for MDD containing 6 treatment options.
Passive non-invasive sensing signals from wearable devices and smartphones are typically collected continuously without user input. This passive and continuous data collection makes these signals suitable for moment-by-moment monitoring of health-related outcomes, disease diagnosis, and prediction modeling. A growing number of studies have utilized machine learning (ML) approaches to predict and analyze health indicators and diseases using passive non-invasive signals collected via wearable devices and smartphones. This systematic review identified peer-reviewed journal articles utilizing ML approaches for digital phenotyping and measuring digital biomarkers to analyze, screen, identify, and/or predict health-related outcomes using passive non-invasive signals collected from wearable devices or smartphones. PubMed, PubMed with MeSH, Web of Science, Scopus, and IEEE Xplore were searched for peer-reviewed journal articles published up to June 2024, identifying 66 papers. We reviewed the study populations used for data collection, data acquisition details, signal types, data preparation steps, ML approaches used, digital phenotypes and digital biomarkers, and health outcomes and diseases predicted using these ML techniques. Our findings highlight the promising potential of passive non-invasive signals collected from wearable devices and smartphones, combined with ML approaches, for objective characterization and prediction of a range of health outcomes and diseases, such as stress, seizure, fatigue, depression, and Parkinson’s disease. Future studies should focus on improving the quality of collected data, addressing missing data challenges, providing better documentation on study participants, and sharing the source code of the implemented methods and algorithms, along with their datasets, for reproducibility purposes.
One of the key parameters in radio link planning is the propagation path loss. Most existing methods for its prediction do not strike a good balance between accuracy, generality, and low computational complexity. To address this problem, a machine learning approach for path loss prediction is presented in this study. The novelty is a compound model consisting of two regression models and one classifier. The first regression model is adequate when a line-of-sight scenario holds in radio wave propagation, whereas the second is appropriate for non-line-of-sight conditions. The classification model provides a probabilistic output, through which the outputs of the regression models are combined. Only five input parameters are used, related to the distance, the antenna heights, and the statistics of the terrain profile and line-of-sight obstacles. The proposed approach allows the creation of a generalized model that is valid for various types of areas and terrains, different antenna heights, and both line-of-sight and non-line-of-sight propagation conditions. An experimental dataset is provided by measurements over a variety of relief types (flat, hilly, mountain, and foothill) and rural, urban, and suburban areas. The experimental results show excellent performance, with a prediction root mean square error as low as 7.3 dB and a coefficient of determination as high as 0.702. Although the study covers only one operating frequency of 433 MHz, the proposed model can be trained and applied for any frequency in the decimeter wavelength range. This operating frequency was chosen because it falls within the range in which many wireless systems of different types operate.
These include the Internet of Things (IoT), machine-to-machine (M2M) mesh radio networks, and power-efficient long-range communication technologies such as Low-Power Wide-Area Network (LPWAN) LoRa.
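The compound structure described above, two path-loss regressors blended by the classifier's probabilistic LOS output, can be sketched as follows. The stand-in models, coefficients, and features are invented for illustration and are not the paper's trained models:

```python
import numpy as np

# Hypothetical stand-ins for the three trained sub-models: two path-loss
# regressors (LOS / NLOS) and a LOS-probability classifier.
def pl_los(x):
    return 40.0 + 20.0 * np.log10(x[..., 0])       # free-space-like exponent

def pl_nlos(x):
    return 30.0 + 35.0 * np.log10(x[..., 0])       # steeper NLOS decay

def p_los(x):
    return 1.0 / (1.0 + np.exp(x[..., 1] - 2.0))   # fewer obstacles -> higher p

def compound_path_loss(x):
    # Blend the two regressors with the classifier's probabilistic output,
    # so the model degrades gracefully in mixed LOS/NLOS conditions.
    p = p_los(x)
    return p * pl_los(x) + (1.0 - p) * pl_nlos(x)

# Two toy links at 1 km: clear path vs. five LOS obstacles
x = np.array([[1000.0, 0.0], [1000.0, 5.0]])       # [distance m, obstacle count]
print(compound_path_loss(x))                       # obstructed link loses more
```

The probabilistic blend is the design point: a hard LOS/NLOS switch would produce discontinuous predictions along a route, while the soft combination interpolates between the two regimes.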
Parkinson's disease is a neurodegenerative condition that affects millions of people worldwide. This abstract aims to shed light on the causes and consequences of this debilitating condition. The primary cause of Parkinson's disease is the progressive degeneration of dopaminergic neurons in the substantia nigra region of the brain. This neuronal loss depletes dopamine, a crucial neurotransmitter responsible for regulating movement and coordination. As a result, individuals with Parkinson's disease exhibit symptoms such as tremors, rigidity, bradykinesia, and postural instability. These signs profoundly impact quality of life, causing difficulties with daily activities and reducing independence. In addition to motor symptoms, non-motor symptoms such as depression, cognitive impairment, and autonomic dysfunction often accompany the disease, further complicating the clinical picture. Research into the causes and consequences of Parkinson's disease is ongoing, with a focus on finding effective medications and improving the quality of life of those affected. Using machine learning algorithms, we can predict whether a person has a specific disease based on input values such as gender and age. These algorithms analyze patterns and relationships in data to make predictions about an individual's health status. This technology can assist in early disease detection and improve healthcare outcomes.
This letter considers model predictive control of a tandem-rotor helicopter. The error is formulated using the matrix Lie group $SE_2(3)$. A reference trajectory to a target is calculated using a quartic guidance law, leveraging the differentially flat properties of the system, and refined using a finite-horizon linear quadratic regulator. The nonlinear system is linearized about the reference trajectory, enabling the formulation of a quadratic program with control input, attitude keep-in zone, and attitude error constraints. A non-uniformly spaced prediction horizon is leveraged to capture the multi-timescale dynamics while keeping the problem size tractable. Monte-Carlo simulations demonstrate robustness of the proposed control structure to initial conditions, model uncertainty, and environmental disturbances.
Background: Deep learning has utility in predicting differential antidepressant treatment response among patients with major depressive disorder, yet there remains a paucity of research describing how to interpret deep learning models in a clinically or etiologically meaningful way. In this paper, we describe methods for analyzing deep learning models of clinical and demographic psychiatric data, using our recent work on a deep learning model of STAR*D and CO-MED remission prediction. Methods: Our deep learning analysis with STAR*D and CO-MED yielded four models that predicted response to the four treatments used across the two datasets. Here, we use classical statistics and simple data representations to improve interpretability of the features output by our deep learning model and provide finer grained understanding of their clinical and etiological significance. Specifically, we use representations derived from our model to yield features predicting both treatment non-response and differential treatment response to four standard antidepressants, and use linear regression and t-tests to address questions about the contribution of trauma, education, and somatic symptoms to our models. Results: Traditional statistics were able to probe the input features of our deep learning models, reproducing results from previous research, while providing novel insights into depression causes and treatments. We found that specific features were predictive of treatment response, and were able to break these down by treatment and non-response categories; that specific trauma indices were differentially predictive of baseline depression severity; that somatic symptoms were significantly different between males and females, and that education and low income proved important psycho-social stressors associated with depression. Conclusion: Traditional statistics can augment interpretation of deep learning models. 
Such interpretation can suggest new hypotheses about depression and contribute to building causal models of etiology and prognosis. We discuss dataset-specific effects and ideal clinical samples for machine learning analysis aimed at improving tools to assist in optimizing treatment.
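The group comparisons mentioned above (e.g. somatic symptoms between males and females) reduce to standard two-sample statistics. A sketch of Welch's t statistic on synthetic scores with made-up group means, not the STAR*D or CO-MED data:

```python
import numpy as np

rng = np.random.default_rng(3)

def welch_t(a, b):
    # Welch's two-sample t statistic (does not assume equal variances)
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

# Illustrative somatic-symptom scores for two groups (synthetic data:
# means 4.0 vs 4.8, common SD 1.5, n = 200 per group)
males = rng.normal(4.0, 1.5, 200)
females = rng.normal(4.8, 1.5, 200)
print(abs(welch_t(males, females)) > 2)  # a clearly detectable difference
```

This is the sense in which "traditional statistics can augment interpretation": the deep model surfaces candidate features, and classical tests quantify whether each candidate's group differences are credible.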
In order to develop a predictive model that can distinguish Parkinson’s disease dementia (PDD) from other dementia types, such as Alzheimer’s dementia (AD), it is necessary to evaluate the predictive accuracy of the cognitive profile while considering non-motor symptoms such as depression and rapid eye movement (REM) sleep behavior disorders. This study compared Parkinson’s disease (PD) non-motor symptoms and the diagnostic predictive power of cognitive profiles that distinguish AD and PDD using machine learning. This study analyzed 118 patients with AD and 110 patients with PDD; all subjects were 60 years or older. To develop the PDD prediction model, the dataset was divided into training data (70%) and test data (30%). The prediction accuracy of the model was calculated as the recognition rate. The results show that Parkinson-related non-motor symptoms, such as REM sleep behavior disorders, and cognitive screening tests, such as the Korean version of the Montreal Cognitive Assessment, were highly accurate factors for predicting PDD. These results call for the development of customized screening tests that can detect PDD at an early stage. Furthermore, including biomarkers such as brain images or cerebrospinal fluid as input variables is expected to be useful for developing future PDD prediction models.
Accurately predicting grapevine yield and quality is critical for optimising vineyard management and ensuring economic viability. Numerous studies have reported the complexity in modelling grapevine yield and quality due to variability in the canopy structure, challenges in incorporating soil and microclimatic factors, and management practices throughout the growing season. The use of multimodal data and machine learning (ML) algorithms could overcome these challenges. Our study aimed to assess the potential of multimodal data (hyperspectral vegetation indices (VIs), thermal indices, and canopy state variables) and ML algorithms to predict grapevine yield components and berry composition parameters. The study was conducted during the 2019/20 and 2020/21 grapevine growing seasons in two South Australian vineyards. Hyperspectral and thermal data of the canopy were collected at several growth stages. Simultaneously, grapevine canopy state variables, including the fractional intercepted photosynthetically active radiation (fiPAR), stem water potential (Ψstem), leaf chlorophyll content (LCC), and leaf gas exchange, were collected. Yield components were recorded at harvest. Berry composition parameters, such as total soluble solids (TSSs), titratable acidity (TA), pH, and the maturation index (IMAD), were measured at harvest. A total of 24 hyperspectral VIs and 3 thermal indices were derived from the proximal hyperspectral and thermal data. These data, together with the canopy state variable data, were then used as inputs for the modelling. Both linear and non-linear regression models, such as ridge (RR), Bayesian ridge (BRR), random forest (RF), gradient boosting (GB), K-Nearest Neighbour (KNN), and decision trees (DTs), were employed to model grape yield components and berry composition parameters. The results indicated that the GB model consistently outperformed the other models. 
The GB model had the best performance for the total number of clusters per vine (R2 = 0.77; RMSE = 0.56), average cluster weight (R2 = 0.93; RMSE = 0.00), average berry weight (R2 = 0.95; RMSE = 0.00), cluster weight (R2 = 0.95; RMSE = 0.13), and average berries per bunch (R2 = 0.93; RMSE = 0.83). For the yield, the RF model performed the best (R2 = 0.97; RMSE = 0.55). The GB model performed the best for the TSSs (R2 = 0.83; RMSE = 0.34), pH (R2 = 0.93; RMSE = 0.02), and IMAD (R2 = 0.88; RMSE = 0.19). However, the RF model performed best for the TA (R2 = 0.83; RMSE = 0.33). Our results also revealed the top 10 predictor variables for grapevine yield components and quality parameters, namely, the canopy temperature depression, LCC, fiPAR, normalised difference infrared index, Ψstem, stomatal conductance (gs), net photosynthesis (Pn), modified triangular vegetation index, modified red-edge simple ratio, and ANTgitelson index. These predictors significantly influence the grapevine growth, berry quality, and yield. The identification of these predictors of the grapevine yield and fruit composition can assist growers in improving vineyard management decisions and ultimately increase profitability.
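The R2 and RMSE figures quoted above follow the standard definitions; a quick sketch on synthetic values, not the study's measurements:

```python
import numpy as np

def r2_score(y, yhat):
    # Coefficient of determination: 1 - residual SS / total SS
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, yhat):
    # Root mean square error, in the same units as y
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

y    = [10.0, 12.0, 14.0, 16.0]   # toy observed yield values
yhat = [10.5, 11.5, 14.5, 15.5]   # toy model predictions
print(r2_score(y, yhat), rmse(y, yhat))
```

Note that RMSE carries the target's units, which is why the near-zero RMSEs reported for the per-berry weights coexist with RMSEs around 0.5 for cluster counts: the scales of the targets differ.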
The topic of this paper is the design of a data-driven model predictive controller for a nonlinear ball-on-a-wheel system. A Koopman predictor is identified using simulation data of a ball-on-a-wheel system model. The challenge lies in using data characterized by the instability of the given system, which limits the time horizon during which meaningful time series can be extracted. Within this scenario, the predictor is included in a model predictive control scheme. This controller successfully tracks a specified non-flat output trajectory subject to input and output constraints. Finally, the prediction quality of the Koopman predictor is compared with a linear predictor based on dynamic mode decomposition with control.
Trajectory planning with a learning-based approach has emerged as a crucial element in autonomous unmanned systems and has attracted substantial interest from both academia and industry. However, unresolved issues persist concerning data efficiency, safety, convergence, and generalization within the control pipeline. To address this gap, this work presents a trajectory planning method that combines the differential flatness of wheeled vehicles with a global convergence property. Our proposed framework transforms the trajectory planning problem, integrating kinematic constraints into a motion planning paradigm. This transformation significantly reduces the state space associated with trajectory planning. Initially, Gaussian mixture regression (GMR) is employed to learn the nonlinear mapping from the flat input, leveraging a limited number of demonstrations solved by the optimal control method. Subsequently, we design an asymmetric quadratic Lyapunov function that incorporates both random barrier information and the potential convergence property of the demonstration trajectories. Based on the optimized parameterized Lyapunov function incorporating the convergence and safety criteria, an analytical supplementary control is then obtained by solving a quadratic programming problem to compensate for the prediction errors of GMR, making the framework complete. Both numerical and real-world experiments are performed to validate the effectiveness of our framework. Note to Practitioners: This work is motivated by the concept of robot motion primitives. While direct segmented planning based on dynamical system movement is feasible for robots with multiple degrees of freedom, it cannot be directly applied to wheeled vehicles with non-holonomic constraints. In this paper, we propose a method that utilizes the differential flatness properties of wheeled vehicles to bridge the gap between these two application domains.
By converting the trajectory planning problem of mobile vehicles into an optimal control problem, we can obtain a satisfactory optimal trajectory. Solving the optimal control problem involves an iterative process due to its nature as an initial value problem. While some learning-based methods offer end-to-end learning capabilities and can perform real-time optimal control, they require a substantial amount of training data (different optimal trajectories) and lack convergence validation. The foundation of this work is the imitation learning paradigm. Although learning-based approaches are also capable of directly learning from demonstrations, our approach combines Lyapunov theory, which allows us to extract the planning task’s potential properties from a limited number of demonstrations, with the vehicle’s differential flatness properties to construct a framework with an analytical solution. The framework can effectively re-plan obstacle-free trajectories, even in scenarios with obstacles not present in the learning phase. This effectively compensates for the shortcomings of learning-based methods in terms of data reliance, convergence, and interpretability.
No abstract available
No abstract available
Despite the increasing adoption of machine learning and data-driven models for predicting regional groundwater potential (GWP), exploration geoscientists have recognized that these models still face various challenges to their predictive precision. For instance, the stochastic uncertainty associated with incomplete groundwater investigation inventories and the inherent opacity of machine learning models, which obscures how input features influence outcomes, pose significant challenges. This research constructs a bagging-based learning framework that integrates Positive–Unlabeled samples (BPUL), along with ex-post interpretability, to map the GWP of the Lijiang River Basin in China, a renowned karst region. For this purpose, we first aggregated various topographic, hydrological, geological, meteorological, and land conditional factors. The training samples were enhanced with data from the subterranean streams investigated in the study area, in addition to conventional groundwater inventories such as wells, boreholes, and karst springs. We employed the BPUL algorithm with four different base learners: Logistic Regression (LR), k-nearest neighbor (KNN), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). Model validation was conducted to map the GWP in karst regions. The findings indicate that all models exhibit satisfactory performance in GWP mapping, with the hybrid ensemble models (RF-BPUL and LightGBM-BPUL) achieving higher validation scores. The model interpretation of the aggregated SHAP values revealed the contribution patterns of various conditional factors to groundwater distribution in karst zones, emphasizing that lithology, the multiresolution index of valley bottom flatness (MRVBF), and the geochemical element calcium oxide (CaO) have the most significant impact on groundwater enrichment in karst zones.
These findings offer new approaches and methodologies for the in-depth exploration and scientific prediction of groundwater potential.
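The bagging-based Positive-Unlabeled scheme (BPUL) described above can be sketched as repeated training against pseudo-negative bootstraps drawn from the unlabeled pool, averaging each unlabeled sample's out-of-bag score. A toy version with a logistic-regression base learner and synthetic 2-D "conditional factors"; all data, sizes, and parameters here are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logreg(X, y, iters=200, lr=0.5):
    # Minimal logistic regression by gradient descent (stand-in base learner)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def pu_bagging_scores(X_pos, X_unl, n_rounds=25):
    # Each round: treat a bootstrap from the unlabeled pool as negatives,
    # train the base learner, and accumulate out-of-bag (OOB) scores for
    # the unlabeled samples that were NOT used as pseudo-negatives.
    scores = np.zeros(len(X_unl))
    counts = np.zeros(len(X_unl))
    for _ in range(n_rounds):
        idx = rng.choice(len(X_unl), size=len(X_pos), replace=True)
        oob = np.setdiff1d(np.arange(len(X_unl)), idx)
        X = np.vstack([X_pos, X_unl[idx]])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(idx))]
        w = fit_logreg(X, y)
        scores[oob] += 1 / (1 + np.exp(-X_unl[oob] @ w))
        counts[oob] += 1
    return scores / np.maximum(counts, 1)

X_pos = rng.normal(loc=1.0, size=(30, 2))            # known groundwater sites
X_unl = np.vstack([rng.normal(1.0, size=(20, 2)),    # hidden positives
                   rng.normal(-1.0, size=(20, 2))])  # true negatives
s = pu_bagging_scores(X_pos, X_unl)
print(s[:20].mean() > s[20:].mean())  # hidden positives score higher
```

This is the point of PU learning for groundwater mapping: unvisited locations are unlabeled rather than negative, and averaging over many pseudo-negative draws keeps hidden positives from being permanently mislabeled.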
Non-orthogonal multiple access (NOMA) is one of the key technologies of 5G systems for enhancing capacity and spectral efficiency. However, the fast-changing wireless channel makes ideal power allocation a challenging task in practice. This paper proposes a chaotic shape-forming filter (CSF) based NOMA system and uses a deep learning (DL) based method to estimate the user channel gains for power allocation. The contributions of the work are: 1) The CSF and the corresponding matched filter (MF) enhance the noise resistance of the NOMA system. 2) The autocorrelation function (ACF) of the superposition of the chaotic signals generated by the CSFs of different users is proved to be the same as the ACF of the base function of the CSF, an important basis for applying previous theoretical results on blind channel identification. 3) For the flat fading channel assumed in the NOMA system, an analytical formula for the channel gain, the sole parameter of this channel type, in terms of the ACFs of the received signal and the base function of the CSF is derived for the first time, which provides the underlying mechanism for using a deep neural network (DNN) for channel gain prediction. Simulation results show that 1) the chaos-based NOMA using the CSF and corresponding MF outperforms the conventional binary phase shift keying NOMA (BPSK-NOMA) with a root-raised cosine (RRC) filter; and 2) the DNN, with a simplified structure and fewer input neurons than required for the frequency selective fading channel, achieves superior performance for both blind channel identification and the NOMA system in terms of low bit error rate (BER).
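The ACF-based channel-gain relation in contribution 3 can be illustrated on a toy flat-fading model r = g·s + n: the lag-ACF of the received signal scales with g², so the gain can be read off from an ACF ratio. The waveform and noise level below are invented stand-ins, not a CSF signal:

```python
import numpy as np

rng = np.random.default_rng(2)

def acf(x, lag):
    # Biased sample autocorrelation at an integer lag
    return float(np.mean(x[:-lag] * x[lag:])) if lag else float(np.mean(x * x))

# Flat fading: received signal is the base waveform scaled by a single gain g
s = np.sin(2 * np.pi * 0.01 * np.arange(20000))   # stand-in base waveform
g = 0.6
r = g * s + 0.01 * rng.normal(size=s.size)        # r = g*s + noise

# ACF_r(lag) ~ g^2 * ACF_s(lag); a nonzero lag suppresses the white-noise term
g_hat = np.sqrt(acf(r, 100) / acf(s, 100))
print(round(g_hat, 2))  # recovers g = 0.6
```

Using a nonzero lag matters: at lag 0 the white-noise power would bias the ratio upward, while at lag 100 the noise term averages out and only the g²-scaled signal ACF survives.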
Background Perinatal depression and anxiety significantly impact maternal and infant health, potentially leading to severe outcomes like preterm birth and suicide. Aboriginal women, despite their resilience, face elevated risks due to the long-term effects of colonization and cultural disruption. The Baby Coming You Ready (BCYR) model of care, centered on a digitized, holistic, strengths-based assessment, was co-designed to address these challenges. The successful BCYR pilot demonstrated its ability to replace traditional risk-based screens. However, some health professionals still overrely on psychological risk scores, often overlooking the contextual circumstances of Aboriginal mothers, their cultural strengths, and mitigating protective factors. This highlights the need for new tools to improve clinical decision-making. Objective We explored different explainable artificial intelligence (XAI)–powered machine learning techniques for developing culturally informed, strengths-based predictive modeling of perinatal psychological distress among Aboriginal mothers. The model identifies and evaluates influential protective and risk factors while offering transparent explanations for AI-driven decisions. Methods We used deidentified data from 293 Aboriginal mothers who participated in the BCYR program between September 2021 and June 2023 at 6 health care services in Perth and regional Western Australia. The original dataset includes variables spanning cultural strengths, protective factors, life events, worries, relationships, childhood experiences, family and domestic violence, and substance use. After applying feature selection and expert input, 20 variables were chosen as predictors. The Kessler-5 scale was used as an indicator of perinatal psychological distress. 
Several machine learning models, including random forest (RF), CatBoost (CB), light gradient-boosting machine (LightGBM), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), support vector machine (SVM), and explainable boosting machine (EBM), were developed and compared for predictive performance. To make the black-box model interpretable, post hoc explanation techniques including Shapley additive explanations and local interpretable model-agnostic explanations were applied. Results The EBM outperformed other models (accuracy=0.849, 95% CI 0.8170-0.8814; F1-score=0.771, 95% CI 0.7169-0.8245; area under the curve=0.821, 95% CI 0.7829-0.8593) followed by RF (accuracy=0.829, 95% CI 0.7960-0.8617; F1-score=0.736, 95% CI 0.6859-0.7851; area under the curve=0.795, 95% CI 0.7581-0.8318). Explanations from EBM, Shapley additive explanations, and local interpretable model-agnostic explanations identified consistent patterns of key influential factors, including questions related to “Feeling Lonely,” “Blaming Herself,” “Makes Family Proud,” “Life Not Worth Living,” and “Managing Day-to-Day.” At the individual level, where responses are highly personal, these XAI techniques provided case-specific insights through visual representations, distinguishing between protective and risk factors and illustrating their impact on predictions. Conclusions This study shows the potential of XAI-driven models to predict psychological distress in Aboriginal mothers and provide clear, human-interpretable explanations of how important factors interact and influence outcomes. These models may help health professionals make more informed, non-biased decisions in Aboriginal perinatal mental health screenings.
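An explainable boosting machine (EBM), the best performer above, is a glass-box additive model: the prediction is a sum of per-feature contribution terms that can be read off directly, which is what enables the case-specific protective/risk breakdowns described. A toy additive scorer using three of the influential items named above; the shape functions and weights are invented, not the study's trained model (real EBM shape functions are learned piecewise curves):

```python
import numpy as np

# Invented linear stand-ins for learned EBM shape functions; negative
# contributions act as protective factors, positive ones as risk factors.
shape_fns = {
    "feeling_lonely":      lambda v: 0.8 * v,    # risk
    "makes_family_proud":  lambda v: -0.6 * v,   # protective
    "managing_day_to_day": lambda v: -0.5 * v,   # protective
}

def predict_with_explanation(x):
    # The model IS its explanation: per-feature contributions sum to the logit
    contribs = {name: f(x[name]) for name, f in shape_fns.items()}
    logit = sum(contribs.values())
    prob = 1 / (1 + np.exp(-logit))
    return prob, contribs

prob, contribs = predict_with_explanation(
    {"feeling_lonely": 1.0, "makes_family_proud": 0.0, "managing_day_to_day": 0.0})
print(round(prob, 3), contribs)
```

Unlike post hoc SHAP or LIME explanations of a black box, these contributions are exact by construction, which is one reason glass-box models are attractive in clinical screening.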
We present PyBird-JAX, a differentiable, JAX-based implementation of PyBird, using internal neural network emulators to accelerate computationally costly operations for rapid large-scale structure (LSS) analysis. PyBird-JAX computes one-loop EFTofLSS predictions for redshift-space galaxy power spectrum multipoles in 1.2 ms on a CPU and 0.2 ms on a GPU, achieving 3–4 orders of magnitude speed-up over PyBird. The emulators take a compact spline-based representation of the input linear power spectrum P(k) as feature vectors, making the approach applicable to a wide range of cosmological models. We rigorously validate its accuracy against large-volume simulations and on BOSS data, including cosmologies not explicitly represented in the training set. Leveraging automatic differentiation, PyBird-JAX supports Fisher forecasting, Taylor expansion of model predictions, and gradient-based searches. Interfaced with a variety of samplers and Boltzmann solvers, PyBird-JAX provides a high-performance, end-to-end inference pipeline. Combined with a symbolic-P(k) generator, a typical Stage-4 LSS MCMC converges in minutes on a GPU. Our results demonstrate that PyBird-JAX delivers the precision and speed required for upcoming LSS surveys, opening the door to accelerated cosmological inference with minimal accuracy loss and no pretraining. In a companion paper [1], we put PyBird-JAX to use in achieving LSS marginalised constraints free from volume projection effects through non-flat measures.
No abstract available
No abstract available
Predictor antennas (PAs) are a potential solution to the severe channel aging that can occur at high vehicular velocities in non-line-of-sight (NLOS) environments. Channel aging reduces the performance of many advanced communication schemes based on channel state information at the transmitter (CSIT). Although PAs have been shown to work in combination with dense pilots in time and space, prediction performance can degrade when channel estimates are sparse. This paper answers how densely pilots must be placed for PAs to remain feasible when performing basic interpolation between channel estimates. This is important, especially for establishing upper limits on the length of the downlink (DL) frames required in a time-division duplex (TDD) system with PAs. Nearest-neighbor, linear, and spline interpolation are analyzed when applied to stochastic radio channels. A theoretical expression is derived for the power of the expected interpolation error for any interpolation method that can be expressed as a linear function of a set of measured values. The interpolation methods are evaluated on three theoretical channels with Rayleigh, flat, and Rician fading, and on two sets of channel measurements. The results indicate that linear and spline interpolation can be used with down to five and three samples per wavelength, respectively, without affecting the PA-based prediction NMSE. At two samples per wavelength, the prediction NMSE is still at a level that can be useful for precoding design in massive multiple-input multiple-output (M-MIMO) systems.
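The sampling-density effect above can be reproduced qualitatively on an idealized channel (a unit-modulus complex exponential in position). This sketch compares nearest-neighbor and linear interpolation NMSE; spline is omitted to stay dependency-free, and the channel model is a deliberate simplification of the paper's stochastic channels:

```python
import numpy as np

def interp_nmse(samples_per_wavelength, kind="linear"):
    # Idealized channel: unit-modulus complex exponential over 4 wavelengths
    x_fine = np.linspace(0.0, 4.0, 4001)          # position in wavelengths
    h = np.exp(2j * np.pi * x_fine)
    step = 1.0 / samples_per_wavelength
    x_s = np.arange(0.0, 4.0 + step / 2, step)    # sparse pilot positions
    h_s = np.exp(2j * np.pi * x_s)                # channel estimates at pilots
    if kind == "nearest":
        idx = np.abs(x_fine[:, None] - x_s[None, :]).argmin(axis=1)
        h_hat = h_s[idx]
    else:  # linear, interpolating real and imaginary parts separately
        h_hat = (np.interp(x_fine, x_s, h_s.real)
                 + 1j * np.interp(x_fine, x_s, h_s.imag))
    err = np.mean(np.abs(h - h_hat) ** 2) / np.mean(np.abs(h) ** 2)
    return float(err)

print(interp_nmse(5) < interp_nmse(2))             # denser sampling helps
print(interp_nmse(5, "linear") < interp_nmse(5, "nearest"))
```

At two samples per wavelength consecutive estimates of this channel are antipodal, so linear interpolation collapses toward zero between them, which mirrors why the paper finds that density to be the practical floor.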
This paper demonstrates a refined approach to solving dynamic optimization problems for underactuated marine surface vessels. To this end, the differential flatness of a mathematical model assuming full actuation is exploited to derive an efficient representation of a finite-dimensional nonlinear programming problem, which is in turn constrained to apply to the underactuated case. It is illustrated how the properties of the flat output can be employed to generate an initial guess for the optimization algorithm in the presence of static and dynamic obstacles. As an example, energy-optimal point-to-point trajectory planning is undertaken for a nonlinear 3-degrees-of-freedom dynamic model of an underactuated surface vessel. Input constraints, in both rate and magnitude, as well as state constraints due to convex and non-convex obstacles in the area of operation are considered, and simulation results for a challenging scenario are reported. Furthermore, an extension to a trajectory tracking controller using model predictive control is made, where the benefits of the flatness-based direct method allow the introduction of nonuniform sample times that help realize long prediction horizons while maintaining short-term accuracy and real-time capability. This is also verified in simulation, where environmental disturbances, dynamic obstacles, and parameter mismatch are additionally introduced.
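The nonuniform sample times used for the MPC horizon can be sketched with a simple geometric step schedule, fine near the current time and coarse toward the end of the horizon; the step size, growth factor, and step count below are illustrative values, not taken from the paper:

```python
# Geometric step schedule: dt_k = dt_min * growth**k, so N points cover a
# horizon much longer than N * dt_min while keeping early steps fine.

def nonuniform_grid(dt_min, growth, n_steps):
    """Return time points 0 = t_0 < t_1 < ... < t_N with dt_k = dt_min * growth**k."""
    times = [0.0]
    for k in range(n_steps):
        times.append(times[-1] + dt_min * growth ** k)
    return times

grid = nonuniform_grid(dt_min=0.1, growth=1.3, n_steps=10)
# The first interval is 0.1 s, but the 11 points span over 4 s of horizon,
# versus 1 s for a uniform grid with the same number of steps.
```

This is the trade-off the abstract describes: short-term accuracy is set by `dt_min`, while horizon length is set by the growth of the later intervals.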
Seizures occur in a recurrent manner with intermittent states of interictal and ictal discharges (IIDs and IDs). The transitions to and from IDs are determined by a set of processes, including synaptic interaction and ionic dynamics. Although mathematical models of separate types of epileptic discharges have been developed, modeling the transitions between states remains a challenge. A simple generic mathematical model of seizure dynamics (Epileptor) has recently been proposed by Jirsa et al. (2014); however, it is formulated in terms of abstract variables. In this paper, a minimal population-type model of IIDs and IDs is proposed that is as simple to use as the Epileptor but attributes physical meaning to its variables. The model is expressed in ordinary differential equations for extracellular potassium and intracellular sodium concentrations, membrane potential, and short-term synaptic depression variables. A quadratic integrate-and-fire model driven by the population input current is used to reproduce spike trains in a representative neuron. In simulations, potassium accumulation governs the transition from the silent state to the state of an ID. Each ID is composed of clustered IID-like events. The sodium accumulates during discharge and activates the sodium-potassium pump, which terminates the ID by restoring the potassium gradient and thus polarizing the neuronal membranes. The whole-cell and cell-attached recordings of a 4-AP-based in vitro model of epilepsy confirmed the primary model assumptions and predictions. The mathematical analysis revealed that the IID-like events are large-amplitude stochastic oscillations, which in the case of ID generation are controlled by slow oscillations of ionic concentrations. The IDs originate in the conditions of elevated potassium concentrations in a bath solution via a saddle-node-on-invariant-circle-like bifurcation for a non-smooth dynamical system. By providing a minimal biophysical description of ionic dynamics and network interactions, the model may serve as a hierarchical basis for moving from simple to more complex models of seizures.
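The representative-neuron mechanism described above, a quadratic integrate-and-fire model, can be sketched in a few lines; the reset, threshold, and drive values below are illustrative rather than the paper's fitted parameters:

```python
# Quadratic integrate-and-fire (QIF) neuron: dv/dt = v**2 + I, with a spike
# emitted and v reset whenever v crosses v_peak. Forward-Euler integration.

def qif_spike_count(i_input, v_reset=-1.0, v_peak=10.0, dt=1e-3, t_end=5.0):
    """Count spikes of a QIF neuron driven by constant input current i_input."""
    v, spikes = v_reset, 0
    for _ in range(int(t_end / dt)):
        v += dt * (v * v + i_input)
        if v >= v_peak:
            v = v_reset
            spikes += 1
    return spikes

quiet = qif_spike_count(i_input=-0.5)  # negative drive: v settles at a fixed point
active = qif_spike_count(i_input=5.0)  # positive drive: periodic firing
```

For negative input the dynamics has a stable fixed point at v = -sqrt(-I) and the neuron stays silent; for positive input no fixed point exists and the neuron fires periodically, which is how the population input current shapes the representative spike train.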
To protect systems from unauthorized access and attacks, a network intrusion detection system monitors network traffic for anomalous activity and potential threats. This study introduces a novel intrusion detection system (IDS) that enhances the capabilities of conventional IDSs by leveraging machine learning (ML). Traditional IDSs frequently struggle with high data volumes and complex attack patterns. We collect data, preprocess it, and apply ML models to existing networks to distinguish intrusions from regular connections. Evaluation using simulated and real-world scenarios ensures robust performance, enabling iterative improvements in response to new threats. The goal is to create an adaptive, proactive defense against cyber-attacks. The study uses diverse training data, allowing the model to detect both known and novel attack vectors, considerably improving detection accuracy. The stepwise strategy consists of data collection, preparation, and seamless ML integration, followed by rigorous effectiveness evaluation using simulations and real-world scenarios.
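As a minimal stand-in for the detection stage of such a pipeline, the sketch below flags connections whose traffic rate deviates strongly from a learned baseline; the single feature, toy data, and z-score threshold are illustrative and far simpler than the study's ML models:

```python
import statistics

# Train a baseline from normal traffic, then flag any connection whose
# bytes-per-second rate lies more than z_threshold standard deviations away.

def train_baseline(rates):
    """Mean and sample standard deviation of a normal-traffic feature."""
    return statistics.mean(rates), statistics.stdev(rates)

def is_intrusion(rate, mean, std, z_threshold=3.0):
    """Flag a connection whose z-score exceeds the threshold."""
    return abs(rate - mean) / std > z_threshold

normal_traffic = [100, 110, 95, 105, 98, 102, 97, 108]  # bytes/s (toy data)
mu, sigma = train_baseline(normal_traffic)

flag_normal = is_intrusion(101, mu, sigma)   # in-distribution connection
flag_attack = is_intrusion(900, mu, sigma)   # flood-like outlier
```

A real IDS replaces the single feature with many (ports, flags, durations) and the threshold rule with a trained classifier, but the collect, baseline, score, flag structure is the same.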
BACKGROUND Ovarian cancer (OC) is often considered the most lethal gynecological cancer because it tends to be diagnosed at an advanced stage, leading to limited treatment options and poorer outcomes. Several factors contribute to the challenges in managing ovarian cancer, including rapid metastasis, genetic factors, and reproductive history. This necessitates prompt and precise diagnosis of ovarian cancer in order to carry out efficient treatment plans and give affected patients the care and support they need. METHODS The proposed CCLSTM model comprises four essential stages: preprocessing, feature extraction, feature selection, and detection. Initially, the input data is preprocessed using Improved Two-step Data Normalization. Subsequently, features such as statistical features, modified entropy, raw features, and mutual information are extracted from the normalized data. Next, the obtained features undergo the Improved Rank-based Recursive Feature Elimination (IR-RFE) method to select the most suitable features. Finally, the proposed CCLSTM model takes the selected features as input and provides a final detection outcome. RESULTS The performance of the proposed CCLSTM technique is examined through a thorough assessment using diverse analyses. The CCLSTM scheme shows a sensitivity of 0.948, whereas the sensitivity ratings for ALO-LSTM + ALOCNN, Bi-GRU, LSTM, RNN, KNN, CNN, and DCNN are 0.808, 0.893, 0.829, 0.851, 0.765, 0.872, and 0.893, respectively. CONCLUSION The combination of CNN and LSTM techniques has produced an ovarian cancer detection method that is more accurate and consistent than other existing strategies.
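The recursive feature elimination family that IR-RFE belongs to can be sketched with a plain correlation-based ranking standing in for the paper's improved rank criterion; the feature names and data below are toy examples:

```python
# RFE loop: rank the remaining features, drop the weakest, repeat until the
# target count remains. Here the rank criterion is |Pearson correlation| with
# the label, a simple stand-in for IR-RFE's improved ranking.

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def rfe(features, labels, n_keep):
    """features: dict name -> value list; eliminate until n_keep names remain."""
    remaining = dict(features)
    while len(remaining) > n_keep:
        ranked = sorted(remaining,
                        key=lambda f: abs(correlation(remaining[f], labels)))
        remaining.pop(ranked[0])  # drop the least informative feature
    return set(remaining)

labels = [0, 0, 0, 1, 1, 1]
features = {
    "informative": [0.1, 0.2, 0.1, 0.9, 0.8, 0.95],  # tracks the label
    "noise_a":     [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],   # unrelated
    "noise_b":     [0.9, 0.1, 0.5, 0.2, 0.8, 0.4],   # unrelated
}
selected = rfe(features, labels, n_keep=1)
```

The recursive re-ranking after each elimination is what distinguishes RFE from one-shot filter selection: removing a feature can change the ranks of those that remain.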
Online payment fraud is an increasingly pressing issue as the volume of digital transactions grows. Accurate and fast detection is essential to minimize financial losses. This paper presents an approach to optimize fraud detection using the XGBoost algorithm and the Recursive Feature Elimination (RFE) feature selection technique. In this research, we use an online payment fraud dataset to train a model that can distinguish between legitimate and fraudulent transactions. The main contribution of this research is to demonstrate the effectiveness of the combination of XGBoost and RFE in improving fraud detection performance. The methods used include data preprocessing, feature selection with RFE, and model training with XGBoost. The evaluation results showed that the XGBoost model with RFE achieved a precision of 0.96, recall of 0.86, and f1-score of 0.91 for detecting fraudulent transactions, with an overall accuracy of 99.98%. In conclusion, the use of XGBoost together with RFE feature selection proved to be an efficient and effective approach for fraud detection in online payment systems, providing a reliable solution for real-world applications in the financial industry.
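The reported precision, recall, and F1-score are tied together by the standard definitions; the sketch below computes them from raw confusion counts chosen to mimic a heavily imbalanced fraud dataset (the counts are toy numbers, close to but not exactly the paper's figures):

```python
# precision = TP / (TP + FP), recall = TP / (TP + FN),
# F1 = harmonic mean of precision and recall.

def precision_recall_f1(tp, fp, fn):
    """Fraud-class metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts: 100 fraudulent transactions among ~100k, 86 caught, 4 false alarms.
tp, fp, fn, tn = 86, 4, 14, 99_896
p, r, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
```

Note how accuracy lands near 99.98% almost regardless of the fraud-class performance, because legitimate transactions dominate; this is why the paper reports per-class precision, recall, and F1 alongside accuracy.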
This survey systematically reviews machine-learning research on depression prediction from non-flat data inputs. This body of work improves diagnostic performance through multimodal data fusion, mines pathological features from neuroimaging and EEG signals, and combines longitudinal behavioral monitoring with clinical decision-support methods, driving a paradigm shift from subjective symptom assessment toward objective, digitized prediction.