Unsupervised Domain Adaptation for Landslide Identification
Landslide Susceptibility Assessment and Foundational Machine Learning Modeling
This group of papers focuses on foundational landslide risk assessment, applying ensemble learning (random forest, XGBoost), statistical methods, and feature-selection techniques to landslide susceptibility mapping (LSM) and spatiotemporal probability modeling, thereby providing the background and data foundation for deep-learning-based identification.
- Comparison of tree-based ensemble learning algorithms for landslide susceptibility mapping in Murgul (Artvin), Turkey(Ziya Usta, H. Akıncı, Alper Tunga Akın, 2024, Earth Science Informatics)
- Comparative assessment of machine learning models for landslide susceptibility mapping: a focus on validation and accuracy(Mohamed M. Abdelkader, Á. Csámer, 2025, Natural Hazards)
- Regional-scale spatiotemporal landslide probability assessment through machine learning and potential applications for operational warning systems: a case study in Kvam (Norway)(Nicola Nocentini, A. Rosi, L. Piciullo, Zhongqiang Liu, S. Segoni, Riccardo Fanti, 2024, Landslides)
- Integrating Machine Learning Ensembles for Landslide Susceptibility Mapping in Northern Pakistan(Nafees Ali, Jian Chen, Xiaodong Fu, Rashid Ali, M. A. Hussain, Hamza Daud, Javid Hussain, Ali A. Altalbe, 2024, Remote. Sens.)
- Space–time landslide hazard modeling via Ensemble Neural Networks(Ashok Dahal, H. Tanyaș, C. V. Westen, M. Meijde, P. M. Mai, Raphaël Huser, L. Lombardo, 2024, Natural Hazards and Earth System Sciences)
- Identification of Landslide Precursors for Early Warning of Hazards with Remote Sensing(K. Strzabala, P. Ćwiąkała, E. Puniach, 2024, Remote. Sens.)
- Insights into landslide susceptibility: a comparative evaluation of multi-criteria analysis and machine learning techniques(Z. Ferreira, Bruna Almeida, Ana Cristina Costa, Manoel do Couto Fernandes, Pedro Cabral, 2025, Geomatics, Natural Hazards and Risk)
- Exploring machine learning and statistical approach techniques for landslide susceptibility mapping in Siwalik Himalayan Region using geospatial technology(Abhik Saha, Lakshya Tripathi, V. G. K. Villuri, A. Bhardwaj, 2024, Environmental Science and Pollution Research)
- Improving landslide susceptibility prediction through ensemble recursive feature elimination and meta-learning framework(Krishnagopal Halder, A. Srivastava, Anitabha Ghosh, Subhabrata Das, Santanu Banerjee, Subodh Chandra Pal, U. Chatterjee, Dipak Bisai, Frank Ewert, T. Gaiser, 2025, Scientific Reports)
- Optimizing landslide susceptibility mapping using machine learning and geospatial techniques(Gazali Agboola, L. H. Beni, Tamer Elbayoumi, Gary Thompson, 2024, Ecol. Informatics)
- Landslide susceptibility assessment using information quantity and machine learning integrated models: a case study of Sichuan province, southwestern China(Pengtao Zhao, Ying Wang, Yi Xie, Md Galal Uddin, Zhengxuan Xu, Xi-yang Chang, Yunhui Zhang, 2025, Earth Science Informatics)
- Landslide mapping based on a hybrid CNN-transformer network and deep transfer learning using remote sensing images with topographic and spectral features(Lei Wu, Rui Liu, Nengpan Ju, Ao Zhang, Jingsong Gou, Guolei He, Yuzhu Lei, 2024, Int. J. Appl. Earth Obs. Geoinformation)
Deep Learning Architecture Innovation and Lightweight Design for Landslide Identification
These papers focus on developing and improving neural network architectures dedicated to landslide detection, including lightweight networks (BisDeNet, MobileNet), multi-scale fusion models (enhanced YOLOv8, DS Net), and real-time recognition strategies for UAV imagery.
- Rapid Detection and Segmentation of Landslide Hazards in Loess Tableland Areas Using Deep Learning: A Case Study of the 2023 Jishishan Ms 6.2 Earthquake in Gansu, China(Zhuoli Bai, Lingyun Ji, Hongtao Tang, Jiangtao Qiu, Shuai Kang, Chuanjin Liu, Zongpan Bian, 2025, Remote Sensing)
- BisDeNet: A New Lightweight Deep Learning-Based Framework for Efficient Landslide Detection(Tao Chen, Xiao Gao, Gang Liu, Chen Wang, Zeyang Zhao, Jie Dou, Ruiqing Niu, Antonio Plaza, 2024, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- Risk Assessment of Geological Landslide Hazards Using D-InSAR and Remote Sensing(Jiaxin Zhong, Qiaomin Li, Jia Zhang, Pingping Luo, Wei Zhu, 2024, Remote. Sens.)
- UNet and Semantic Segmentation Based Landslide Detection System(Rekha R. Nair, T. Babu, Tripti Singh, A. K, 2025, 2025 12th International Conference on Computing for Sustainable Global Development (INDIACom))
- DS Net: A Dual-Coded Segmentation Network Leveraging Large Model Prior Knowledge for Intelligent Landslide Extraction(Xiao Wang, Dongsheng Zhong, Chenghao Liu, Xiaochuan Song, Luting Xu, Yue Deng, Shaoda Li, 2025, Remote Sensing)
- MED-DeepLabv3+: a lightweight landslide recognition algorithm on multi-scale remote sensing images(Xuhui Li, Zhihua Zhang, Xinxiu Zhang, Xinyu Zhu, Shuwen Yang, Chunlin Huang, Jie Hu, Li Hou, Wei Wang, Lujia Zhao, 2025, J. Comput. Des. Eng.)
- Accelerating Cross-Scene Co-Seismic Landslide Detection Through Progressive Transfer Learning and Lightweight Deep Learning Strategies(Aonan Dong, Jie Dou, Changdong Li, Zeqiang Chen, J. Ji, Ke Xing, Jie Zhang, Hamza Daud, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- TLSTMF-YOLO: Transfer Learning and Feature Fusion Network for Earthquake-Induced Landslide Detection in Remote Sensing Images(Shaoqiang Meng, Zhenming Shi, Saied Pirasteh, Silvia Liberata Ullo, Ming Peng, Changshi Zhou, Wesley Nunes Gonçalves, Limin Zhang, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- Lightweight Attention-Guided YOLO With Level Set Layer for Landslide Detection From Optical Satellite Images(Yueheng Yang, Z. Miao, Hua Zhang, Bing Wang, Lixin Wu, 2024, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- Landslide Detection in UAV Imagery: A Wavelet-Domain-Driven Multiscale Attention Approach(Daoying Zhou, Huilin Liu, Xiaowei Jin, Qingjie Wei, Kaiheng Cui, 2026, IEEE Transactions on Geoscience and Remote Sensing)
- UAV imagery-based landslide detection in challenging environment using pixel segmentation and generative AI approach(Yong-Soo Ha, Ho-Hong-Duy Nguyen, Thanh-Nhan Nguyen, Minh-Vuong Pham, 2025, Modeling Earth Systems and Environment)
- Landslide Image Segmentation with Attention Residual U-Net: A Hybrid Deep Learning Model(Syed Mujtaba Hussaine, Linlong Mu, Yimin Lu, Syed Sajid Hussain, 2025, Procedia Computer Science)
- A Transfer Learning Approach for Landslide Semantic Segmentation Based on Visual Foundation Model(Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen, 2025, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- Enhancing Landslide Segmentation with Guide Attention Mechanism and Fast Fourier Transformer(Kai Yan, Fei Shen, Zongyi Li, 2024, No journal)
- Regional landslide mapping model developed by a deep transfer learning framework using post-event optical imagery(Adel Asadi, L. Baise, Snehamoy Chatterjee, Magaly Koch, Babak Moaveni, 2024, Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards)
General Theory and Alignment Techniques for Unsupervised Domain Adaptation (UDA) in Remote Sensing Imagery
This group investigates the algorithms underlying UDA, such as adversarial learning (DANN), distribution alignment (MMD), gradient harmonization, scene covariance alignment, and frequency decomposition, aiming to resolve the domain shift caused by sensor and environmental differences in remote sensing imagery.
- Decomposition-Based Unsupervised Domain Adaptation for Remote Sensing Image Semantic Segmentation(Xianping Ma, Xiaokang Zhang, Xingchen Ding, M. Pun, Siwei Ma, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- Multi-Granularity Domain-Adaptive Teacher for Unsupervised Remote Sensing Object Detection(Fang Fang, Jianing Kang, Shengwen Li, Panpan Tian, Yang Liu, Chaoliang Luo, Shunping Zhou, 2025, Remote Sensing)
- Unsupervised Domain Adaptation Semantic Segmentation of Remote Sensing Images With Mask Enhancement and Balanced Sampling(Xin Li, Yuanbo Qiu, Jixiu Liao, Fan Meng, P. Ren, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- Unsupervised Remote Sensing Image Semantic Segmentation Based on Multiscale Contrastive Domain Adaptation(Jie Geng, Shuai Song, Zhen Xu, Wen Jiang, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- S4DL: Shift-Sensitive Spatial–Spectral Disentangling Learning for Hyperspectral Image Unsupervised Domain Adaptation(Jie Feng, Tianshu Zhang, Junpeng Zhang, Ronghua Shang, Weisheng Dong, G. Shi, Licheng Jiao, 2024, IEEE Transactions on Neural Networks and Learning Systems)
- Unsupervised Domain Adaptation by Backpropagation(Yaroslav Ganin, V. Lempitsky, 2014, No journal)
- Low-Rank Correlation Learning for Unsupervised Domain Adaptation(Yuwu Lu, W. Wong, Chun Yuan, Zhihui Lai, Xuelong Li, 2024, IEEE Transactions on Multimedia)
- Unsupervised Domain Adaptation With Hierarchical Masked Dual-Adversarial Network for End-to-End Classification of Multisource Remote Sensing Data(Wen-Shuai Hu, Wei Li, Hengchao Li, Xudong Zhao, Mengmeng Zhang, Ran Tao, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- Local Pixel-Contrast and Global Gaussian Multiprototype Bidirectional Alignment for Unsupervised Domain Adaptation of Semantic Segmentation in Remote Sensing Imagery(Yongkang Hu, Yupei Wang, Liang Chen, 2026, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- A MultiKernel Domain Adaptation Method for Unsupervised Transfer Learning on Cross-Source and Cross-Region Remote Sensing Data Classification(Wei Liu, R. Qin, 2020, IEEE Transactions on Geoscience and Remote Sensing)
- Domain Adaptive and Interactive Differential Attention Network for Remote Sensing Image Change Detection(Yuliang Ji, Weiwei Sun, Yumiao Wang, Z. Lv, Gang Yang, Yuanzeng Zhan, Chong Li, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- Unsupervised Domain Adaptation Semantic Segmentation of Remote Sensing Imagery with Scene Covariance Alignment(Kangjian Cao, Sheng Wang, Ziheng Wei, Kexin Chen, Runlong Chang, Fu Xu, 2024, Electronics)
- MultiDAN: Unsupervised, Multistage, Multisource and Multitarget Domain Adaptation for Semantic Segmentation of Remote Sensing Images(Yuxiang Cai, Yongheng Shang, Jianwei Yin, 2024, Proceedings of the 32nd ACM International Conference on Multimedia)
- DDCI: Unsupervised Domain Adaptation for Remote Sensing Images Based on Diffusion Causal Distillation(Jiaqi Zhao, Yao Li, Yong Zhou, Wenliang Du, Xixi Li, Rui Yao, A. E. Saddik, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- Aligning Higher-Order Graph Structure for Unsupervised Domain Adaptation in Remote Sensing Scene Classification(Qing He, Erzhu Li, A. Samat, Wei Liu, Xing Li, 2026, IEEE Geoscience and Remote Sensing Letters)
- Cycle-Refined Multidecision Joint Alignment Network for Unsupervised Domain Adaptive Hyperspectral Change Detection(Jiahui Qu, Wenqian Dong, Yufei Yang, Tongzhen Zhang, Yunsong Li, Qian Du, 2024, IEEE Transactions on Neural Networks and Learning Systems)
- Gradient Harmonization in Unsupervised Domain Adaptation(Fuxiang Huang, Suqi Song, Lei Zhang, 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation(Jian Liang, Dapeng Hu, Jiashi Feng, 2020, No journal)
- Deep Hashing Network for Unsupervised Domain Adaptation(Hemanth Venkateswara, José Eusébio, Shayok Chakraborty, S. Panchanathan, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
- Equity in Unsupervised Domain Adaptation by Nuclear Norm Maximization(Mengzhu Wang, Shanshan Wang, Xun Yang, Jianlong Yuan, Wenjun Zhang, 2024, IEEE Transactions on Circuits and Systems for Video Technology)
- Geodesic flow kernel for unsupervised domain adaptation(Boqing Gong, Yuan Shi, Fei Sha, K. Grauman, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition)
- Continual Unsupervised Domain Adaptation in Data-Constrained Environments(A. M. N. Taufique, C. S. Jahan, Andreas Savakis, 2024, IEEE Transactions on Artificial Intelligence)
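Several of the alignment objectives surveyed in this group (e.g., the MMD used by multikernel methods) reduce to comparing source and target feature statistics in a kernel-induced space. A minimal single-Gaussian-kernel sketch in NumPy (the function names and toy data are illustrative, not taken from any cited paper):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise RBF kernel matrix between the rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy.
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (100, 8)), rng.normal(0, 1, (100, 8)))
shifted = mmd2(rng.normal(0, 1, (100, 8)), rng.normal(2, 1, (100, 8)))
print(shifted > same)  # shifted domains yield a larger discrepancy
```

Multikernel variants sum `mmd2` over a bank of `sigma` values so the bandwidth need not be tuned per dataset.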
Dedicated Cross-Domain Landslide Identification Models and Transfer Learning Applications
These papers directly address cross-region landslide identification, proposing dedicated frameworks such as LandsDANet and DisasterNets that use style transfer, synthetic-image augmentation, and cross-domain feature extraction to generalize across heterogeneous data sources.
- Cross-domain landslide mapping by harmonizing heterogeneous remote sensing datasets(B. Yu, Fangming Chen, Wenlong Chen, Guangyue Shi, Chong Xu, Ning Wang, Lei Wang, 2025, GIScience & Remote Sensing)
- A Transfer Learning Remote Sensing Landslide Image Segmentation Method Based on Nonlinear Modeling and Large Kernel Attention(Jiajun Li, Qiang Li, Jinzheng Lu, Kui Zheng, Lijuan Wei, Qiang Xiang, 2025, Applied Sciences)
- A Cross-Domain Landslide Extraction Method Utilizing Image Masking and Morphological Information Enhancement(Jie Chen, Jinge Liu, Xu Zeng, Songshan Zhou, Geng Sun, Siqiang Rao, Ya Guo, Jingru Zhu, 2025, Remote Sensing)
- Cross-Domain Landslide Mapping in Remote Sensing Images Based on Unsupervised Domain Adaptation Framework(Jing Yang, Mingtao Ding, Wu-da Huang, Qiang Xue, Ying Dong, Bo Chen, Lulu Peng, Fuling Zhang, Zhenhong Li, 2026, Remote Sensing)
- MSMCD: A Multi-Stage Mamba Network for Geohazard Change Detection(Liwei Qin, Quan Zou, Guoqing Li, Wenyang Yu, Lei Wang, Lichuan Chen, Heng Zhang, 2025, Remote Sensing)
- DisasterNets: Embedding Machine Learning in Disaster Mapping(Qingsong Xu, Yilei Shi, Xiao Xiang Zhu, 2023, IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium)
- Potential of synthetic images in landslide segmentation in data-poor scenario: a framework combining GAN and transformer models(Xiao Feng, Juan Du, Ming-Tai Wu, Bo Chai, Fa-sheng Miao, Yang Wang, 2024, Landslides)
- A landslide identification method based on integrated segmentation network and transfer learning(Sijing Chen, Hanqi Qu, Yunyan Shao, Yuxuan Zeng, Zikang Wu, Chengda Lu, Min Wu, 2025, Neurocomputing)
Cutting-Edge Domain Adaptation with Vision Foundation Models (VFMs) and Transformers
This group represents the latest trend: leveraging vision foundation models such as SAM and CLIP, or Transformer architectures, together with adapter fine-tuning, prompt learning, and diffusion-model augmentation, to improve zero-shot and few-shot transfer for landslide and disaster monitoring in complex scenes.
- Landslidenet: Adaptive Vision Foundation Model for Landslide Detection(Junchuan Yu, Yichuan Li, Yangyang Chen, Changhong Hou, Daqing Ge, Yanni Ma, Qiong Wu, 2024, IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium)
- Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation(B. B. Englert, Fabrizio J. Piva, Tommie Kerssies, Daan de Geus, G. Dubbelman, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model(Shuchang Lyu, Qi Zhao, Guangliang Cheng, Yiwei He, Zheng Zhou, Guangbiao Wang, Z. Shi, 2024, ArXiv)
- Integrating unsupervised domain adaptation and SAM technologies for image semantic segmentation: a case study on building extraction from high-resolution remote sensing images(Mengyuan Yang, Rui Yang, Min Wang, Haiyan Xu, Gang Xu, 2025, International Journal of Digital Earth)
- Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation(Zhekai Du, Xinyao Li, Fengling Li, Ke Lu, Lei Zhu, Jingjing Li, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- MarsScapes and UDAFormer: A Panorama Dataset and a Transformer-Based Unsupervised Domain Adaptation Framework for Martian Terrain Segmentation(Haiqiang Liu, Meibao Yao, Xueming Xiao, Bo Zheng, Hutao Cui, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- Unsupervised Domain Adaptation Augmented by Mutually Boosted Attention for Semantic Segmentation of VHR Remote Sensing Images(Xianping Ma, Xiaokang Zhang, Zhiguo Wang, M. Pun, 2023, IEEE Transactions on Geoscience and Remote Sensing)
- TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model(Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen, 2024, ArXiv)
- Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models(Zhengfeng Lai, Haoping Bai, Haotian Zhang, Xianzhi Du, Jiulong Shan, Yinfei Yang, Chen-Nee Chuah, Meng Cao, 2024, 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
- Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation(Xinyao Li, Yuke Li, Zhekai Du, Fengling Li, Ke Lu, Jingjing Li, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification(Earl Ranario, Lars Lundqvist, Heesup Yun, Brian N. Bailey, J. M. Earles, Borden Day, Borden Night, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- FireExpert: Fire Event Identification and Assessment Leveraging Cross-Domain Knowledge and Large Language Model(Guofeng Luo, Lijuan Weng, Yunqian Li, Yilu Sun, Yayao Hong, Yongyi Wu, Ruixiang Luo, Leye Wang, Cheng Wang, Longbiao Chen, 2025, IEEE Transactions on Mobile Computing)
Self-Training, Pseudo-Label Refinement, and Source-Free/Multi-Source Domain Adaptation Strategies
These studies use teacher-student models, pseudo-label instance calibration, Bayesian neural networks, and source-free domain adaptation to improve adaptive robustness when source data are inaccessible or labels are scarce.
- Self‐training with Bayesian neural networks and spatial priors for unsupervised domain adaptation in crack segmentation(Pang-jo Chun, Toshiya Kikuta, 2024, Computer‐Aided Civil and Infrastructure Engineering)
- Unsupervised active-transfer learning for automated landslide mapping(Zhihao Wang, Alexander Brenning, 2023, Comput. Geosci.)
- Self-Training Based Image–Text Multimodal Unsupervised Domain Adaptation Segmentation Model for Remote Sensing Images(Qianqian Liu, Xili Wang, 2026, Remote Sensing)
- Self-Training-Based Unsupervised Domain Adaptation for Object Detection in Remote Sensing Imagery(Sihao Luo, Li Ma, Xiaoquan Yang, Dapeng Luo, Q. Du, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- Self-Training Based Instance Calibration for Unsupervised Domain Adaptation Semantic Segmentation of Remote Sensing Images(Meichen Ai, Xili Wang, 2024, Proceedings of the 2024 8th International Conference on Big Data and Internet of Things)
- Unsupervised Domain Adaptation Enhanced by Fuzzy Prompt Learning(Kuo Shi, Jie Lu, Zhen Fang, Guangquan Zhang, 2024, IEEE Transactions on Fuzzy Systems)
- Class-Incremental Unsupervised Domain Adaptation via Pseudo-Label Distillation(Kun-Juan Wei, Xu Yang, Zhe Xu, Cheng Deng, 2024, IEEE Transactions on Image Processing)
- Source-free Domain Adaptive Object Detection in Remote Sensing Images(Weixing Liu, Jun Liu, X. Su, Han Nie, Bin Luo, 2024, ArXiv)
- Evidential Multi-Source-Free Unsupervised Domain Adaptation(Jiangbo Pei, Aidong Men, Yang Liu, X. Zhuang, Qingchao Chen, 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- KDA: Knowledge Distillation Adversarial Framework With Vision Foundation Models for Landslide Segmentation(Shijie Wang, Lulin Li, Xuan Dong, Lei Shi, Pin Tao, 2025, IEEE Geoscience and Remote Sensing Letters)
Multimodal Fusion, Spatiotemporal Modeling, and Cross-Disciplinary Extensions
This group explores methods that integrate multi-source information such as DEM-based terrain assistance, multi-temporal imagery, and graph convolutional networks (GCNs), and demonstrates the cross-disciplinary value of UDA in fields such as fault diagnosis and face recognition.
- DEM-Assisted Topography-Conditioned and Orientation-Adaptive Siamese Network for Cross-Region Landslide Change Detection(Jing Wang, Haiyang Li, Shuguang Wu, Guigen Nie, Yukui Yu, Z. Fan, 2026, Remote Sensing)
- Inter-Sensor High-Resolution and Multi-Temporal Image Fusion for Unsupervised Domain Adaptation in Remote Sensing(Damian Ibañez, Junshi Xia, N. Yokoya, Filiberto Pla, R. Fernández-Beltran, 2025, IEEE Transactions on Geoscience and Remote Sensing)
- M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training(L. Meegahapola, Hamza Hassoune, D. Gática-Pérez, 2024, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)
- Feature Mutual Representation-Based Graph Domain Adaptive Network for Unsupervised Hyperspectral Change Detection(Jiahui Qu, Jingyu Zhao, Wenqian Dong, Song Xiao, Yunsong Li, Q. Du, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- A High-Efficient Wi-Fi-Based Cross-Domain Recognition Framework Using Multisource Domain Adaptation for Single-Transceiver Scenarios(Wanguo Jiao, Wei Du, Changsheng Zhang, Long Suo, 2025, IEEE Sensors Journal)
- Semi-Supervised Multi-Temporal Deep Representation Fusion Network for Landslide Mapping from Aerial Orthophotos(Xiaokang Zhang, M. Pun, Ming Liu, 2021, Remote. Sens.)
- CoUDA: Continual Unsupervised Domain Adaptation for Industrial Fault Diagnosis Under Dynamic Working Conditions(Bojian Chen, Xinmin Zhang, Changqing Shen, Qi Li, Zhihuan Song, 2025, IEEE Transactions on Industrial Informatics)
- Cross-domain fault identification of bearings based on synchrosqueezed wavelet–scattering transform and bagging–convolutional neural network(Shizeng Lu, Huijun Dong, 2025, Journal of Vibration and Control)
- Pose-Invariant Face Recognition Using Optimized Hybrid Yolo Algorithm from Image(J. J. J. Prabha, R. S. Kumar, 2024, International Journal of Image and Graphics)
- MSC-TReID: Multi-Scale Transformer for Cross-Domain Person Re-Identification(Mahdi Khodayar, Jacob Regan, M. Saffari, Ali Farajzadeh Bavil, 2025, 2025 IEEE International Conference on Electro Information Technology (eIT))
- Makeup-Invariant Faces Recognition Using a Pre-Trained Neural Network, Grasshopper Optimization Algorithm, and Random Forest(Elaf Nassir, Abud Shubber, Omid Sojoodi Shijani, 2024, Journal of Education for Pure Science- University of Thi-Qar)
- Cross-Domain Identity Authentication Scheme for the IIoT Identification Resolution System Based on Self-Sovereign Identity(Yunhua He, Tingli Yuan, Bin Wu, Keshav Sood, Ke Xiao, Xiuzhen Cheng, 2025, IEEE Transactions on Networking)
- Toward Identity-Invariant Facial Expression Recognition: Disentangled Representation via Mutual Information Perspective(D. Kim, S. Kim, B. Song, 2024, IEEE Access)
- A novel facial expression recognition framework using deep learning based dynamic cross-domain dual attention network(A. Alzahrani, A. M. Alghamdi, M. U. Ashraf, Iqra Ilyas, Nadeem Sarwar, Abdulrahman Alzahrani, A. Alarood, 2025, PeerJ Computer Science)
- Multi-subspace mapping and adaptive learning: MMAL-CL for cross-domain few-shot image identification across scenarios(Qian Du, Xin Xia, Qilin Liu, Yanfei Lv, Lu Li, Zhuang Miao, 2025, Frontiers in Physics)
- Effective Comparative Prototype Hashing for Unsupervised Domain Adaptation(Hui Cui, Lihai Zhao, Fengling Li, Lei Zhu, Xiaohui Han, Jingjing Li, 2024, No journal)
- Unsupervised domain adaptation via feature transfer learning based on elastic embedding(Liran Yang, Bin Lu, Qinghua Zhou, Pan Su, 2024, International Journal of Machine Learning and Cybernetics)
- Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition(Xun Yang, Tianyu Chang, Tianzhu Zhang, Shanshan Wang, Richang Hong, Meng Wang, 2024, International Journal of Computer Vision)
- A style-Pix2Pix GAN framework for data augmentation in landslide semantic segmentation(Tianhe Ren, Wenping Gong, Federico Agliardi, Liang Gao, Xuyang Xiang, 2025, Landslides)
- Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification(Badr M. Abdullah, Matthew Baas, Bernd Mobius, Dietrich Klakow, 2025, ArXiv)
- VR Eye Tracking Data for Gender Identification: A Look at Same-Domain and Cross-Domain Scenarios(S. Asish, Arijet Sarker, 2025, Proceedings of the 2025 31st ACM Symposium on Virtual Reality Software and Technology)
The final grouping constructs a complete body of knowledge from foundational theory to cutting-edge applications. The research trajectory clearly shows how landslide identification has evolved from traditional susceptibility assessment to deep-learning semantic segmentation, ultimately focusing on unsupervised domain adaptation (UDA) to solve the cross-region generalization problem. Core trends include: 1) at the algorithmic level, a shift from simple feature alignment toward self-training, pseudo-label refinement, and source-free domain adaptation; 2) at the architectural level, a shift from lightweight CNNs toward Transformers and vision foundation models (VFMs); 3) at the data level, a shift from single-modality optical imagery toward multimodal fusion combining DEMs, multi-temporal imagery, and graph structures. Together, these studies are driving landslide monitoring toward greater robustness, automation, and cross-region transferability.
A total of 103 related papers.
Rapid and accurate acquisition of landslide inventories is essential for effective disaster relief. Deep learning-based pixel-wise semantic segmentation of remote sensing imagery has greatly advanced landslide mapping. However, the heavy dependence on extensive annotated labels and sensitivity to domain shifts severely constrain model performance in unseen domains, leading to poor generalization. To address these limitations, we propose LandsDANet, an unsupervised domain adaptation framework for cross-domain landslide identification. First, adversarial learning is employed to reduce the data distribution discrepancies between the source and target domains, thereby achieving output-space alignment. An improved SegFormer serves as the segmentation network, incorporating hierarchical Transformer blocks and an attention mechanism to enhance feature representation. Second, to alleviate inter-domain radiometric discrepancies and attain image-level alignment, a Wallis filter is used to perform image style transformation. Considering the class imbalance in the landslide dataset, a Rare Class Sampling strategy is introduced to mitigate bias toward common classes and strengthen learning of the rare landslide class. Finally, a contrastive loss is adopted to further sharpen the model's delineation of fine-grained class boundaries. The proposed model is validated on the Potsdam and Vaihingen benchmark datasets, followed by validation in two landslide scenarios induced by earthquakes and rainfall to evaluate its adaptability across disaster domains. Compared to the source-only model, LandsDANet improved IoU by 27.04% and 35.73% on the two cross-domain landslide recognition tasks, respectively, underscoring its potential for rapid-response applications.
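The Wallis-filter style transformation used above for image-level alignment can be illustrated with a simplified global-statistics variant (the operational Wallis filter typically works on local windows; `wallis_transfer` and the reference statistics below are illustrative):

```python
import numpy as np

def wallis_transfer(img, ref_mean, ref_std, eps=1e-6):
    # Rescale image statistics toward a reference mean/std. This is a
    # global variant of the Wallis filter; a windowed version applies
    # the same mapping per local patch for adaptive contrast.
    m, s = img.mean(), img.std()
    return (img - m) * (ref_std / (s + eps)) + ref_mean

rng = np.random.default_rng(0)
target_tile = rng.normal(120.0, 40.0, (64, 64))   # "bright" target-domain tile
aligned = wallis_transfer(target_tile, ref_mean=90.0, ref_std=25.0)
print(round(aligned.mean(), 1), round(aligned.std(), 1))  # → 90.0 25.0
```

After the transform, source and target tiles share first- and second-order radiometric statistics, which removes one easy cue the domain discriminator could otherwise exploit.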
Disaster mapping is a critical task that often requires on-site experts and is time-consuming. To address this, a comprehensive framework is presented for fast and accurate recognition of disasters using machine learning, termed DisasterNets. It consists of two stages, space granulation and attribute granulation. The space granulation stage leverages supervised/semi-supervised learning, unsupervised change detection, and domain adaptation with/without source data techniques to handle different disaster mapping scenarios. Furthermore, the disaster database with the corresponding geographic information field properties is built by using the attribute granulation stage. The framework is applied to earthquake-triggered landslide mapping and large-scale flood mapping. The results demonstrate a competitive performance for high-precision, high-efficiency, and cross-scene recognition of disasters. To bridge the gap between disaster mapping and machine learning communities, we will provide an openly accessible tool based on DisasterNets. The framework and tool will be available at https://github.com/HydroPML/DisasterNets.
Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g., synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on a large amount of labeled data from the source domain and a large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving an adaptation effect in the presence of big domain shifts and outperforming previous state-of-the-art on the Office datasets.
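The gradient reversal layer described above is an identity map in the forward pass whose backward pass flips the gradient sign, so the feature extractor learns to fool the domain classifier. A minimal hand-written forward/backward sketch in NumPy (no autograd framework assumed; in practice this is implemented as a custom autograd op):

```python
import numpy as np

class GradientReversal:
    """Identity forward; gradients scaled by -lambda on the way back."""

    def __init__(self, lam=1.0):
        self.lam = lam  # adaptation-strength coefficient

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        # The domain classifier minimizes its loss as usual, but the
        # feature extractor below this layer receives reversed gradients,
        # pushing features toward domain invariance.
        return -self.lam * grad_output

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
g = np.array([0.2, 0.4, -0.6])
print(grl.forward(x), grl.backward(g))
```

In training, `lam` is typically ramped from 0 to 1 so the reversed signal does not destabilize the early epochs.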
Unsupervised domain adaptation (UDA) has recently gained attention in fault diagnosis due to its ability to address domain shift problems arising from changes in working conditions. However, when faced with the continual domain shift problem inherent in real-world industries with dynamic working conditions, UDA often suffers from catastrophic forgetting. To address this challenge, we propose a novel replay-free continual UDA framework, CoUDA, for fault diagnosis under dynamic working conditions. In CoUDA, prototype contrastive learning is employed in source domain pre-training in order to improve the model generalization ability in preparation for the adaptation to the subsequent target domains. Then, source discriminator constraint is employed to ensure that the acquired source domain knowledge serves as an anchor, and source feature knowledge distillation is applied to prevent catastrophic forgetting without replay in sequential target domain adaptation. In addition, for better domain adaptation, local domain alignment and information entropy minimization are utilized to achieve fine-grained domain alignment. Experimental results demonstrate the superiority of the proposed CoUDA in achieving robust fault diagnosis under dynamic working conditions.
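The information-entropy-minimization term used above for fine-grained alignment can be sketched as a generic Shannon-entropy computation over softmax outputs (this is the standard entropy objective, not CoUDA's exact loss):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy per sample; minimizing it on unlabeled target data
    # pushes predictions toward confident, low-entropy outputs.
    return -(probs * np.log(probs + eps)).sum(-1)

confident = softmax(np.array([[6.0, 0.0, 0.0]]))
uncertain = softmax(np.array([[0.1, 0.0, 0.1]]))
print(entropy(confident)[0] < entropy(uncertain)[0])  # → True
```

Entropy minimization is usually weighted lightly, since over-confident wrong predictions on the target domain are self-reinforcing.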
Conventional Unsupervised Domain Adaptation (UDA) strives to minimize distribution discrepancy between domains, which neglects to harness rich semantics from data and struggles to handle complex domain shifts. A promising technique is to leverage the knowledge of large-scale pretrained vision-language models for more guided adaptation. Despite some endeavors, current methods often learn textual prompts to embed domain semantics for source and target domains separately and perform classification within each domain, limiting cross-domain knowledge transfer. Moreover, prompting only the language branch lacks flexibility to adapt both modalities dynamically. To bridge this gap, we propose Domain-Agnostic Mutual Prompting (DAMP) to exploit domain-invariant semantics by mutually aligning visual and textual embeddings. Specifically, the image contextual information is utilized to prompt the language branch in a domain-agnostic and instance-conditioned way. Meanwhile, visual prompts are imposed based on the domain-agnostic textual prompt to elicit domain-invariant visual embeddings. These two branches of prompts are learned mutually with a cross-attention module and regularized with a semantic-consistency loss and an instance-discrimination contrastive loss. Experiments on three UDA benchmarks demonstrate the superiority of DAMP over state-of-the-art approaches.
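The cross-attention module that couples the two prompt branches can be illustrated with a bare-bones single-head version (token counts, dimensions, and function names are made up for the example; DAMP's actual module is learned and multi-layer):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Tokens from one modality attend over the other modality's tokens;
    # each output row is a convex combination of the value rows.
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores) @ values

rng = np.random.default_rng(0)
text_prompts = rng.normal(size=(4, 16))   # 4 textual prompt tokens
image_tokens = rng.normal(size=(9, 16))   # 9 visual tokens
updated = cross_attention(text_prompts, image_tokens, image_tokens)
print(updated.shape)  # → (4, 16)
```

Running the same call with the roles swapped conditions the visual prompts on the textual ones, which is the "mutual" part of the scheme.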
This study proposes a novel self‐training framework for unsupervised domain adaptation in the segmentation of concrete wall cracks using accumulated crack data. The proposed method incorporates Bayesian neural networks for uncertainty estimation of pseudo‐labels, and spatial priors of cracks for screening noisy labels. Experiments demonstrate that the proposed approach achieves significant improvements in F1 score. Comparing the F1 scores, Bayesian DeepLabv3+ and Bayesian U‐Net showed performance improvements of 0.0588 and 0.1501, respectively, after domain adaptation. Furthermore, the integration of Stable Diffusion for few‐shot image generation enhances domain adaptation performance by 0.0332. The proposed framework enables high‐precision crack segmentation with as few as 100 target images, which can be easily obtained at the site, reducing the cost of model deployment in infrastructure maintenance. The study also investigates the optimal number of iterations for domain adaptation based on the uncertainty score, providing insights for practical implementation. The proposed method contributes to the development of efficient and automated structural health monitoring using AI.
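The uncertainty-based screening of pseudo-labels can be sketched by aggregating stochastic predictions per pixel, with the Bayesian-network weight sampling replaced here by simulated draws (the thresholds and pixel counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 stochastic forward passes over 5 pixels (crack probability per pixel);
# in the paper these come from a Bayesian segmentation network.
stable = np.clip(rng.normal(0.9, 0.02, (20, 3)), 0, 1)   # confident pixels
noisy = np.clip(rng.normal(0.5, 0.30, (20, 2)), 0, 1)    # ambiguous pixels
samples = np.concatenate([stable, noisy], axis=1)

mean_p, std_p = samples.mean(0), samples.std(0)
pseudo = (mean_p > 0.5).astype(int)   # hard pseudo-label from the mean
keep = std_p < 0.15                   # screen out high-uncertainty labels
print(pseudo, keep)
```

Only pixels whose predictions agree across samples survive the screen and enter the next self-training round, which is what keeps noisy pseudo-labels from being amplified.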
Unsupervised domain adaptation (UDA) techniques are vital for semantic segmentation in geosciences, effectively utilizing remote sensing imagery across diverse domains. However, most existing UDA methods, which focus on domain alignment at the high-level feature space, struggle to simultaneously retain local spatial details and global contextual semantics. To overcome these challenges, a novel decomposition scheme is proposed to guide domain-invariant representation learning. Specifically, multiscale high/low-frequency decomposition (HLFD) modules are proposed to decompose feature maps into high- and low-frequency components across different subspaces. This decomposition is integrated into a fully global-local generative adversarial network (GLGAN) that incorporates global-local transformer blocks (GLTBs) to enhance the alignment of decomposed features. By integrating the HLFD scheme and the GLGAN, a novel decomposition-based UDA framework called De-GLGAN is developed to improve the cross-domain transferability and generalization capability of semantic segmentation models. Extensive experiments on two UDA benchmarks, namely ISPRS Potsdam and Vaihingen, and LoveDA Rural and Urban, demonstrate the effectiveness and superiority of the proposed approach over existing state-of-the-art UDA methods. The source code for this work is accessible at https://github.com/sstary/SSRS.
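The high/low-frequency decomposition idea can be sketched with an ideal circular mask in the Fourier domain (the paper's HLFD modules are learned and multiscale; this fixed-cutoff version only illustrates the split):

```python
import numpy as np

def hlf_decompose(feat, cutoff=0.25):
    # Split a 2-D feature map into low- and high-frequency components.
    f = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    mask = r <= cutoff * min(h, w)     # low-frequency region of the spectrum
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = feat - low                  # residual carries the high frequencies
    return low, high

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32))
low, high = hlf_decompose(feat)
print(np.allclose(low + high, feat))  # exact reconstruction → True
```

The low-frequency part tends to encode global style/context and the high-frequency part local detail, so the two can be aligned with different objectives.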
Class-Incremental Unsupervised Domain Adaptation (CI-UDA) requires a model to learn continually over a sequence of steps, each containing unlabeled target-domain samples, while the labeled source dataset remains available throughout. The key to tackling CI-UDA is to transfer domain-invariant knowledge from the source domain to the target domain while preserving the knowledge of previous steps during continual adaptation. However, existing methods introduce substantial biased source knowledge at the current step, causing negative transfer and unsatisfactory performance. To tackle these problems, we propose a novel CI-UDA method named Pseudo-Label Distillation Continual Adaptation (PLDCA). We design a Pseudo-Label Distillation module that leverages the discriminative information of the target domain to filter biased knowledge at the class and instance levels. In addition, Contrastive Alignment is proposed to reduce domain discrepancy by aligning the class-level feature representations of confident target samples with the source domain, and to exploit robust instance-level feature representations of unconfident target samples. Extensive experiments demonstrate the effectiveness and superiority of PLDCA. Code is available.
In recent years, deep neural networks have emerged as a dominant machine learning tool for a wide variety of application domains. However, training a deep neural network requires a large amount of labeled data, which is an expensive process in terms of time, labor and human expertise. Domain adaptation or transfer learning algorithms address this challenge by leveraging labeled data in a different, but related source domain, to develop a model for the target domain. Further, the explosive growth of digital data has posed a fundamental challenge concerning its storage and retrieval. Due to its storage and retrieval efficiency, recent years have witnessed a wide application of hashing in a variety of computer vision applications. In this paper, we first introduce a new dataset, Office-Home, to evaluate domain adaptation algorithms. The dataset contains images of a variety of everyday objects from multiple domains. We then propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes, to accurately classify unseen target data. To the best of our knowledge, this is the first research effort to exploit the feature learning capabilities of deep neural networks to learn representative hash codes to address the domain adaptation problem. Our extensive empirical studies on multiple transfer tasks corroborate the usefulness of the framework in learning efficient hash codes which outperform existing competitive baselines for unsupervised domain adaptation.
Unsupervised domain adaptive hashing is a highly promising research direction within the field of retrieval. It aims to transfer valuable insights from the source domain to the target domain while maintaining high storage and retrieval efficiency. Despite its potential, this field remains relatively unexplored. Previous methods usually lead to unsatisfactory retrieval performance, as they frequently directly apply slightly modified domain adaptation algorithms to hash learning framework, or pursue domain alignment within the Hamming space characterized by limited semantic information. In this paper, we propose a simple yet effective approach named Comparative Prototype Hashing (CPH) for unsupervised domain adaptive image retrieval. We establish a domain-shared unit hypersphere space through prototype contrastive learning and then obtain the Hamming hypersphere space via mapping from the shared hypersphere. This strategy achieves a cohesive synergy between learning uniformly distributed and category conflict-averse feature representations, eliminating domain discrepancies, and facilitating hash code learning. Moreover, by leveraging dual-domain information to supervise the entire hashing model training process, we can generate hash codes that retain inter-sample similarity relationships within both domains. Experimental results validate that our CPH significantly outperforms the state-of-the-art counterparts across multiple cross-domain and single-domain retrieval tasks. Notably, on Office-Home and Office-31 datasets, CPH achieves an average performance improvement of 19.29% and 13.85% on cross-domain retrieval tasks compared to the second-best results, respectively. The source codes of our method are available at: https://github.com/christinecui/CPH.
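The retrieval machinery shared by such hashing methods, sign binarization plus Hamming-distance ranking, can be sketched as follows; the embeddings are toy values, and CPH's hypersphere learning is not reproduced here.

```python
import numpy as np

def hash_codes(features):
    """Binarize real-valued embeddings into {0, 1} hash codes by sign,
    the simplest quantizer used throughout the hashing literature."""
    return (features > 0).astype(np.uint8)

def hamming_rank(query, database):
    """Rank database codes by Hamming distance to a query code."""
    dists = (database != query).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists

db = hash_codes(np.array([[ 0.3, -0.2,  0.9,  0.1],
                          [-0.5,  0.4, -0.1, -0.8],
                          [ 0.2, -0.7, -0.3,  0.6]]))
query = hash_codes(np.array([0.8, -0.1, 0.5, 0.2]))
order, dists = hamming_rank(query, db)
```

Because the codes are short binary strings, distance computation is a bitwise XOR plus popcount in practice, which is what gives hashing its storage and retrieval efficiency.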
No abstract available
Martian terrain segmentation aims to assign all pixels of an input image with various terrain labels, which provides firm support for downstream research on rover traversing and geologic analysis tasks. However, existing studies in this field suffer from limitations in two aspects: one is the lack of large-scale, high-quality Martian terrain datasets, and the other is the over-reliance on purely supervised learning, which is data-hungry and sensitive to domain shifts among different datasets. In this article, we address these limitations from the perspective of both data and methodology. First, we publish MarsScapes, a panorama dataset with appreciable data volume and fine-grained annotations for Martian terrain understanding. The dataset contains 195 terrain panoramas composed of 3779 subimages, and all pixels in the panoramas are split into nine semantic categories. Then, we propose the first transformer-based unsupervised domain adaptation (UDA) framework (UDAFormer) for cross-domain terrain segmentation on Mars, which consists of a teacher–student model and an output-guided biased sampling (OGBS) module. The teacher–student model performs knowledge distillation to explore robust cross-domain features, where a modified augmentation regularization (MAR) is designed to alleviate the interference of undesirable augmentations with domain adaptation. The OGBS helps the teacher–student network emphasize categories that tend to be ambiguous or submerged during training, elevating the overall accuracy of UDA segmentation of Martian terrains. Extensive experiments on MarsScapes and another dataset called Mars-Seg demonstrate the superiority of UDAFormer over state-of-the-art methods in UDA Martian terrain segmentation.
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain. Prior UDA methods typically require access to the source data when learning to adapt the model, making them risky and inefficient for decentralized private data. This work tackles a practical setting where only a trained source model is available and investigates how we can effectively utilize such a model without source data to solve UDA problems. We propose a simple yet generic representation learning framework named Source HypOthesis Transfer (SHOT). SHOT freezes the classifier module (hypothesis) of the source model and learns the target-specific feature extraction module by exploiting both information maximization and self-supervised pseudo-labeling to implicitly align representations from the target domains to the source hypothesis. To verify its versatility, we evaluate SHOT in a variety of adaptation cases including closed-set, partial-set, and open-set domain adaptation. Experiments indicate that SHOT yields state-of-the-art results on multiple domain adaptation benchmarks.
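SHOT's information-maximization term can be written compactly; the sketch below computes only the loss value (no gradients) and omits SHOT's self-supervised pseudo-labeling component.

```python
import numpy as np

def information_maximization_loss(probs):
    """SHOT-style information-maximization (IM) objective on a batch of
    target-domain softmax outputs, shape (N, C). Minimizing it pushes
    each prediction to be confident (low per-sample entropy) while the
    batch-mean prediction stays spread over classes (high diversity)."""
    eps = 1e-12
    ent = -(probs * np.log(probs + eps)).sum(axis=1).mean()   # confidence term
    mean_p = probs.mean(axis=0)
    div = -(mean_p * np.log(mean_p + eps)).sum()              # diversity term
    return ent - div

# confident AND diverse predictions score lower (better) than a
# batch collapsed onto a single class
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])
```

The diversity term is what prevents the degenerate solution where every target sample is confidently assigned to the same class.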
Nuclear norm maximization has shown the power to enhance the transferability of unsupervised domain adaptation (UDA) models in an empirical scheme. In this paper, we identify a new property termed equity, which indicates the balance degree of predicted classes, to demystify the efficacy of nuclear norm maximization for UDA theoretically. With this in mind, we offer a new discriminability-and-equity maximization paradigm built on squares loss, such that predictions are equalized explicitly. To verify its feasibility and flexibility, two new losses, termed Class Weighted Squares Maximization (CWSM) and Normalized Squares Maximization (NSM), are proposed to maximize both predictive discriminability and equity, at the class level and the sample level, respectively. Importantly, we theoretically relate these two novel losses (i.e., CWSM and NSM) to equity maximization under mild conditions, and empirically demonstrate the importance of predictive equity in UDA. Moreover, the equity constraints in both losses are very efficient to realize. Experiments on cross-domain image classification over three popular benchmark datasets show that both CWSM and NSM outperform their corresponding counterparts.
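The batch nuclear norm whose efficacy the paper demystifies can be computed directly; the toy comparison below shows why maximizing it rewards predictions that are both confident and class-balanced.

```python
import numpy as np

def batch_nuclear_norm(probs):
    """Sum of singular values of the (N, C) batch prediction matrix.
    Maximizing it favors predictions that are simultaneously confident
    (discriminable) and balanced across classes -- the "equity"
    property the paper analyzes."""
    return np.linalg.svd(probs, compute_uv=False).sum()

balanced = np.eye(2)                            # confident, both classes used
collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])  # confident, one class only
```

The collapsed batch is rank-1, so its nuclear norm is strictly smaller even though every individual prediction is maximally confident.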
Over the years, multimodal mobile sensing has been used extensively for inferences regarding health and well-being, behavior, and context. However, a significant challenge hindering the widespread deployment of such models in real-world scenarios is the issue of distribution shift: the phenomenon where the distribution of data in the training set differs from the distribution of data in the real world, the deployment environment. While extensively explored in computer vision and natural language processing, and while prior research in mobile sensing briefly addresses this concern, current work primarily focuses on models dealing with a single modality of data, such as audio or accelerometer readings; consequently, there is little research on unsupervised domain adaptation for multimodal sensor data. To address this gap, we conducted extensive experiments with domain adversarial neural networks (DANN), showing that they can effectively handle distribution shifts in multimodal sensor data. Moreover, we propose a novel improvement over DANN, called M3BAT, unsupervised domain adaptation for multimodal mobile sensing with multi-branch adversarial training, which accounts for the multimodality of sensor data during domain adaptation with multiple branches. Through extensive experiments conducted on two multimodal mobile sensing datasets, three inference tasks, and 14 source-target domain pairs, including both regression and classification, we demonstrate that our approach performs effectively on unseen domains. Compared to directly deploying a model trained in the source domain to the target domain, our model improves performance by up to 12% AUC (area under the receiver operating characteristic curve) on classification tasks and by up to 0.13 MAE (mean absolute error) on regression tasks.
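The adversarial branches that DANN (and by extension M3BAT) rely on hinge on a gradient reversal layer; a framework-free sketch of its forward/backward behavior, with `lam` the usual adaptation-weight hyperparameter:

```python
import numpy as np

class GradientReversal:
    """The gradient reversal layer at the core of DANN: identity on the
    forward pass, gradients scaled by -lam on the backward pass, so the
    feature extractor learns to *fool* the domain discriminator while
    the discriminator itself trains normally."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                       # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient

grl = GradientReversal(lam=0.5)
```

M3BAT's contribution is to place such adversarial branches per modality rather than on a single fused representation; this sketch shows only the shared building block.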
Unsupervised Domain Adaptation (UDA) aims to leverage the labeled source domain to solve tasks on the unlabeled target domain. Traditional UDA methods face a tradeoff between domain alignment and semantic class discriminability, especially when a large domain gap exists between the source and target domains. Efforts to apply large-scale pre-training to bridge the domain gaps remain limited. In this work, we propose that Vision-Language Models (VLMs) can empower UDA tasks due to their language-aligned training pattern and their large-scale pre-training datasets. For example, CLIP and GLIP have shown promising zero-shot generalization in classification and detection tasks. However, directly fine-tuning these VLMs on downstream tasks may be computationally expensive and not scalable when multiple domains need to be adapted. Therefore, we first study an efficient adaptation of VLMs that preserves the original knowledge while maximizing flexibility for learning new knowledge. Then, we design a domain-aware pseudo-labeling scheme tailored to VLMs for domain disentanglement. We show the superiority of the proposed methods on four UDA-classification and two UDA-detection benchmarks, with a significant improvement (+9.9%) on DomainNet.
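The zero-shot pseudo-labels such VLM-based schemes start from can be sketched as temperature-scaled cosine similarity between normalized image and class-text embeddings; the embeddings and temperature below are toy values, and the domain-aware refinements from the paper are omitted.

```python
import numpy as np

def clip_pseudo_labels(img_emb, txt_emb, temp=0.01):
    """CLIP-style zero-shot pseudo-labeling: cosine similarity between
    L2-normalized image embeddings (N, D) and class text embeddings
    (C, D), softmaxed with temperature temp. Returns (labels, probs)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temp
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p

img_emb = np.array([[1.0, 0.0], [0.1, 0.9]])
txt_emb = np.array([[1.0, 0.0], [0.0, 1.0]])   # one row per class prompt
labels, p = clip_pseudo_labels(img_emb, txt_emb)
```

In the UDA setting, such pseudo-labels on the target domain are then filtered or disentangled per domain before being used for adaptation.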
Unsupervised domain adaptation (UDA) addresses the challenge of distribution shift between a labeled source domain and an unlabeled target domain by utilizing knowledge from the source. Traditional UDA methods mainly focus on single-modal scenarios, either vision or language, and thus do not fully explore the advantages of multimodal representations. Vision-language models utilize multimodal information, applying prompt learning techniques to address target domain tasks. Motivated by recent advancements in pretrained vision-language models, this article expands the UDA framework to incorporate multimodal approaches using fuzzy techniques. The adoption of fuzzy techniques, preferred over conventional domain adaptation methods, is based on the following two key aspects: 1) the nature of prompt learning is intrinsically linked to fuzzy logic, and 2) fuzzy techniques have a superior capability for processing soft information and effectively utilizing inherent relationships both within and across domains. To this end, we propose UDA enhanced by fuzzy prompt learning (FUZZLE), a simple and effective method for aligning the source and target domains via domain-specific prompt learning. Specifically, we introduce a novel technique to enhance prompt learning in the target domain. This method integrates fuzzy C-means clustering and a novel instance-level fuzzy vector into the prompt learning loss function, minimizing the distance between prompt cluster centers and instance prompts and thereby enhancing the prompt learning process. In addition, we propose a Kullback–Leibler (KL) divergence-based loss function with a fuzzification factor, designed to minimize the distribution discrepancy in the classification of similar cross-domain data by aligning domain-specific prompts during training. We contribute an in-depth analysis to understand the effectiveness of FUZZLE. Extensive experiments demonstrate that our method achieves superior performance on standard UDA benchmarks.
Unsupervised domain adaptation (UDA) techniques, extensively studied in hyperspectral image (HSI) classification, aim to use labeled source domain data and unlabeled target domain data to learn domain-invariant features for cross-scene classification. Compared to natural images, the numerous spectral bands of HSIs provide abundant semantic information, but they also increase the domain shift significantly. Most existing methods, whether using explicit or implicit alignment, simply align feature distributions, ignoring domain information in the spectrum. We note that when the spectral channels of the source and target domains differ markedly, the transfer performance of these methods tends to deteriorate. Additionally, their performance fluctuates greatly owing to varying domain shifts across datasets. To address these problems, a novel shift-sensitive spatial-spectral disentangling learning (S4DL) approach is proposed. In S4DL, gradient-guided spatial-spectral decomposition (GSSD) is designed to separate domain-specific and domain-invariant representations by generating tailored masks under the guidance of the gradient from domain classification. A shift-sensitive adaptive monitor is defined to adjust the intensity of disentangling according to the magnitude of the domain shift. Furthermore, a reversible neural network is constructed to retain domain information that lies not only in the semantics but also in shallow-level details. Extensive experimental results on several cross-scene HSI datasets consistently verify that S4DL outperforms state-of-the-art UDA methods. Our source code will be available at https://github.com/xdu-jjgs/IEEE_TNNLS_S4DL.
In unsupervised domain adaptation (UDA), negative transfer is one of the most challenging problems. Owing to complex environments, the domain data used in many applications are corrupted by noise or outliers. If the noisy data are used directly for domain adaptation, the disturbances and negative influence of the noise are also transferred to the target tasks. Thus, preventing the disturbances and negative effects caused by noise is a key problem in UDA that needs to be addressed. In this article, a low-rank correlation learning (LRCL) method is proposed for UDA. In LRCL, the noisy domain data are recovered by low-rank learning, cleaning the data of both domains; hence, the disturbances and negative effects of the noise are prevented. The maximally correlated features of the clean data from the source and target domains are learned through a novel correlation regularization term in a latent common space. LRCL also reduces the distribution difference between the learned clean source and target data by constructing a reconstruction term, in which the clean target data are linearly represented by the clean source data. To explore the temporal and structural information of the data, we further extend LRCL to the graph case and propose graph LRCL (GLRCL). Extensive experiments have been conducted on several public benchmarks, and the results demonstrate that our methods effectively prevent negative transfer and obtain better classification outcomes than other compared approaches.
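The basic low-rank recovery step behind methods like LRCL is singular value thresholding, the proximal operator of the nuclear norm; LRCL's full objective adds correlation and reconstruction terms that this sketch omits.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink the singular values of X by
    tau and reconstruct. Repeated application inside an optimization
    loop recovers a low-rank, noise-suppressed version of corrupted
    domain data."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # soft-threshold the spectrum
    return (U * s) @ Vt
```

On a rank-1 matrix the operator simply shrinks the single nonzero singular value, and a large enough threshold annihilates the matrix entirely.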
Large vision-language models (VLMs) like CLIP have demonstrated good zero-shot learning performance in the unsupervised domain adaptation task. Yet, most transfer approaches for VLMs focus on either the language or visual branches, overlooking the nuanced interplay between both modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while maintaining modality-specific nuances. We align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks reveal our approach sets a new state-of-the-art with minimal computational costs. Code: https://github.com/TL-UESTC/UniMoS.
Domain adaptation (DA) techniques aim to overcome the domain shift between the source domain used for training and the target domain where testing takes place. However, current DA methods assume that the entire target domain is available during adaptation, which may not hold in practice. We introduce a new, data-constrained DA paradigm where unlabeled target samples are received in batches and adaptation is performed continually. We propose a novel source-free method for continual unsupervised domain adaptation (UDA) that utilizes a buffer for selective replay of previously seen samples. In our continual DA framework, we selectively mix samples from incoming batches with data stored in a buffer using buffer management strategies and use the combination to incrementally update our model. We evaluate and compare the classification performance of the continual DA approach with state-of-the-art (SOTA) DA methods based on the entire target domain. Our results on three popular DA datasets demonstrate the benefits of our method when operating in data constrained environments. We further extend our experiments to adapting over multiple target domains and our method performs favorably with the SOTA methods.
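One concrete buffer-management strategy for such a replay buffer is reservoir sampling, which maintains a uniform random subset of all target samples seen so far; the paper evaluates its own strategies, so this is only an illustrative baseline.

```python
import random

def reservoir_update(buffer, item, capacity, n_seen, rng):
    """One step of reservoir sampling: after the call, the buffer holds
    a uniform random subset (size <= capacity) of the n_seen + 1 items
    observed so far. n_seen is the count of items seen BEFORE this one."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = rng.randrange(n_seen + 1)
        if j < capacity:
            buffer[j] = item   # replace a random slot with prob capacity/(n_seen+1)

rng = random.Random(0)
buf = []
for i in range(100):   # stream of 100 incoming target samples
    reservoir_update(buf, i, capacity=10, n_seen=i, rng=rng)
```

Each incoming batch can then be mixed with a draw from the buffer before the incremental model update.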
Multi-Source-Free Unsupervised Domain Adaptation (MSFUDA) requires aggregating knowledge from multiple source models and adapting it to the target domain. Two challenges remain: 1) suboptimal coarse-grained (domain-level) aggregation of multiple source models, and 2) risky semantics propagation based on local structures. In this article, we propose an evidential learning method for MSFUDA, where we formulate two uncertainties, i.e. Evidential Prediction Uncertainty (EPU) and Evidential Adjacency-Consistent Uncertainty (EAU), respectively for addressing the two challenges. The former, EPU, captures the uncertainty of a sample fitted to a source model, which can suggest the preferences of target samples for different source models. Based on this, we develop an EPU-Based Multi-Source Aggregation module to achieve fine-grained, instance-level source knowledge aggregation. The latter, EAU, provides a robust measure of consistency among adjacent samples in the target domain. Utilizing this, we develop an EAU-Guided Local Structure Mining module to ensure the trustworthy propagation of semantics. The two modules are integrated into the Evidential Aggregation and Adaptation Framework (EAAF), and we demonstrated that this framework achieves state-of-the-art performances on three MSFUDA benchmarks.
Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4× speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6× more parameters, the VFM-UDA approach maintains a 3.3× speed up, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://github.com/tue-mps/vfm-uda.
Unsupervised domain adaptation (UDA) intends to transfer knowledge from a labeled source domain to an unlabeled target domain. Many current methods focus on learning feature representations that are both discriminative for classification and invariant across domains by simultaneously optimizing domain alignment and classification tasks. However, these methods often overlook a crucial challenge: the inherent conflict between these two tasks during gradient-based optimization. In this paper, we delve into this issue and introduce two effective solutions, known collectively as Gradient Harmonization (GH and GH++), to mitigate the conflict between the domain alignment and classification tasks. GH operates by altering the gradient angle between the two tasks from an obtuse angle to an acute angle, thus resolving the conflict and trading off the two tasks in a coordinated manner. Yet, this causes both tasks to deviate from their original optimization directions. We therefore propose an improved version, GH++, which adjusts the gradient angle between tasks from an obtuse angle to a right angle. This not only eliminates the conflict but also minimizes deviation from the original gradient directions. Finally, for convenience and efficiency of optimization, we evolve the gradient harmonization strategies into a dynamically weighted loss function using an integral operator on the harmonized gradient. Notably, GH/GH++ are orthogonal to UDA and can be seamlessly integrated into most existing UDA models. Theoretical insights and experimental analyses demonstrate that the proposed approaches not only enhance popular UDA baselines but also improve recent state-of-the-art models.
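The core projection step can be sketched in the style of gradient surgery: when the classification and alignment gradients form an obtuse angle, each is projected onto the normal plane of the other so they end up orthogonal (the GH++ behavior). This is a simplification; the paper additionally derives a dynamically weighted loss via an integral operator, which this sketch omits.

```python
import numpy as np

def harmonize(g_cls, g_align):
    """If the two task gradients conflict (negative inner product),
    project each onto the normal plane of the other so the angle
    becomes a right angle; otherwise leave both unchanged."""
    if np.dot(g_cls, g_align) >= 0:
        return g_cls, g_align   # no conflict, nothing to do
    g1 = g_cls - np.dot(g_cls, g_align) / np.dot(g_align, g_align) * g_align
    g2 = g_align - np.dot(g_align, g_cls) / np.dot(g_cls, g_cls) * g_cls
    return g1, g2

g1, g2 = harmonize(np.array([1.0, 0.0]), np.array([-1.0, 1.0]))
```

Note both projections use the *original* gradients, so the operation is symmetric in the two tasks.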
This work investigates unsupervised domain adaptation (UDA)-based semantic segmentation of very high-resolution (VHR) remote sensing (RS) images from different domains. Most existing UDA methods resort to generative adversarial networks (GANs) to cope with the domain shift problem caused by the discrepancies across different domains. However, these GAN-based UDA methods directly align two domains in the appearance, latent, or output space based on convolutional neural networks (CNNs), making them ineffective in exploiting long-range dependencies across the high-level feature maps derived from different domains. Unfortunately, such high-level features play an essential role in characterizing RS images with complex content. To circumvent this obstacle, a mutually boosted attention transformer (MBATrans) is proposed to capture cross-domain dependencies of semantic feature representations in this work. Compared with conventional UDA methods, MBATrans can significantly reduce domain discrepancies by capturing transferable features using global attention. More specifically, MBATrans utilizes a novel mutually boosted attention (MBA) module to align cross-domain feature maps while enhancing domain-general features. Furthermore, a novel GAN-based network with improved discriminative capability is devised by integrating an additional discriminator to learn domain-specific features. Extensive experiments on two large-scale VHR RS datasets, namely, International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Vaihingen, confirm the superior performance of the proposed MBATrans-augmented GAN (MBATA-GAN) architecture. The source code in this work is available at https://github.com/sstary/SSRS.
Landslides are among the most frequent and destructive geological hazards worldwide. Accurate and timely detection of landslide-affected areas from high-resolution uncrewed aerial vehicle (UAV) imagery is therefore essential for disaster mitigation, emergency response, and post-event evaluation. However, complex terrain conditions, such as indistinct boundaries, scale variation, and background interference from exposed rocks or shadows, make it difficult to extract reliable semantic features and achieve robust segmentation results. To address these challenges, we propose wavelet-guided semantic decoupling for landslide detection (WSDLNet), a novel semantic segmentation framework that integrates wavelet-domain decomposition, hierarchical attention, and spatial-frequency decoupling. By leveraging frequency-aware encoding, adaptive multiscale representation, and refined spatial attention, WSDLNet enhances feature discrimination in both the spectral and spatial domains, enabling robust identification of landslide regions under diverse geological conditions. Extensive experiments on four practical UAV landslide datasets demonstrate that WSDLNet achieves state-of-the-art performance in terms of segmentation accuracy, boundary refinement, and generalization under diverse terrain and lighting conditions, offering a practical solution for intelligent landslide detection and risk assessment.
Automated landslide change detection using remote sensing imagery is critical for rapid disaster response. However, landslide change detection using bi-temporal optical imagery is frequently degraded by cross-region domain shifts and by the elongated, anisotropic morphology of landslide boundaries, leading to substantial pseudo-change alarms. To suppress pseudo-changes and improve cross-region robustness, we propose a DEM-assisted topography-conditioned and orientation-adaptive Siamese network (DEMO-Net) that injects topographic inductive bias through terrain-conditioned feature modulation and orientation-adaptive convolutions. Specifically, DEM-derived multi-channel priors are encoded to predict spatially varying FiLM parameters that recalibrate shallow optical features, suppressing spurious changes while preserving discriminative cues. In addition, we introduce an adaptive-oriented attention convolution that leverages a DEM-derived aspect to guide sparse multi-orientation aggregation via shared-kernel transformation, enabling direction-aware receptive-field alignment for elongated and direction-varying landslide structures without costly global attention. Experiments on the GVLM benchmark under a 5-fold site-wise cross-region protocol show that DEMO-Net achieves 85.17% F1 and 74.26% mIoU, outperforming the strongest CNN baseline FC-EF by 5.05% and 7.20%, respectively. These results demonstrate the effectiveness of jointly leveraging terrain-conditioned calibration and physically consistent orientation-aligned feature extraction for robust cross-region landslide change detection.
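The terrain-conditioned recalibration in DEMO-Net is a FiLM-style per-channel affine transform of the optical features, with the scale and shift predicted from DEM-derived priors; a minimal sketch (shapes, names, and the constant toy values are illustrative):

```python
import numpy as np

def film_modulate(feat, gamma, beta):
    """Feature-wise linear modulation (FiLM): gamma and beta, here
    imagined as outputs of a small network over DEM-derived priors,
    rescale and shift each channel of an optical feature map.
    Shapes: feat (C, H, W); gamma/beta (C,) for per-channel FiLM or
    (C, H, W) for the spatially varying case."""
    gamma = np.asarray(gamma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if gamma.ndim == 1:                 # broadcast per-channel params
        gamma = gamma[:, None, None]
        beta = beta[:, None, None]
    return gamma * feat + beta

feat = np.ones((2, 2, 2))               # toy 2-channel feature map
out = film_modulate(feat, gamma=[2.0, 0.0], beta=[1.0, 5.0])
```

Setting a channel's gamma to zero shows the suppression behavior: that channel is replaced entirely by the terrain-conditioned bias, which is how spurious optical cues can be damped where the DEM says the terrain is implausible for a landslide.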
Hyperspectral change detection, which provides abundant information on land cover changes on the Earth's surface, has become one of the most crucial tasks in remote sensing. Recently, deep-learning-based change detection methods have shown remarkable performance, but the acquisition of labeled data is extremely expensive and time-consuming. It is intuitive to learn changes from a scene with sufficient labeled data and adapt them to a new unlabeled scene. However, the nonnegligible domain shift between different scenes leads to inevitable performance degradation. In this article, a cycle-refined multidecision joint alignment network (CMJAN) is proposed for unsupervised domain adaptive hyperspectral change detection, which realizes progressive alignment of the data distributions between the source and target domains using cycle-refined high-confidence labeled samples. There are two key characteristics: 1) the distribution discrepancy is progressively mitigated to learn domain-invariant difference feature representations, and 2) the high-confidence training samples of the target domain are updated in a cyclic manner. The benefit is that the domain shift between the source and target domains is progressively alleviated to promote change detection performance on the target domain in an unsupervised manner. Experimental results on different datasets demonstrate that the proposed method achieves better performance than state-of-the-art change detection methods.
Recently, deep neural networks (DNNs) have been widely used in hyperspectral image change detection (HSI-CD). Generally, training such a DNN-based HSI-CD network often requires a large number of labeled training samples. However, it is time-consuming, labor-intensive, or even infeasible to label training samples in practice. In this article, we propose a feature mutual representation-based graph domain adaptive network (FGDANet) for unsupervised HSI-CD. This method constructs a pseudosiamese backbone consisting of two customized unsupervised learning domains, which can make full use of the information from different domains through the graph domain adaptation strategy to improve the feature expression capability and generalization. There are three key characteristics: first, in each customized unsupervised learning domain, a graph convolutional network (GCN)-based difference feature extraction architecture is designed to model the local and global dependence among the features of multitemporal HSIs; second, a progressive graph-to-pixel joint constraint strategy (PJCS) is proposed to provide the high-confidence training sample labels for the unsupervised learning of the network in each domain; and third, the homogeneous mutual representation joint graph feature alignment (HJGFA) module of the graph domain adaptation strategy can make full use of the difference features from the two domains through the information interaction to facilitate the model to capture the changed and unchanged essential characteristics. The experimental results on four HSI datasets demonstrate the superiority of the proposed FGDANet. Code is available at https://github.com/Jiahuiqu/FGDANet.
Object detection in remote sensing images (RSIs) is pivotal for various tasks such as natural disaster warning, environmental monitoring, and urban planning. Object detection methods based on domain adaptation have emerged that effectively decrease the dependence on annotated samples, making significant advances in unsupervised scenarios. However, these methods fall short in their ability to learn remote sensing object features of the target domain, thus limiting detection capability in many complex scenarios. To fill this gap, this paper integrates a multi-granularity feature alignment strategy with the teacher–student framework to enhance the capability of detecting remote sensing objects, and proposes a multi-granularity domain-adaptive teacher (MGDAT) framework to better bridge the feature gap between target and source domain data. MGDAT incorporates the teacher–student framework at three granularities: pixel-, image- and instance-level feature alignment. Extensive experiments show that MGDAT surpasses SOTA baselines in detection accuracy and exhibits strong generalizability. The proposed method can serve as a methodological reference for various unsupervised interpretation tasks on RSIs.
The objective of change detection (CD) is to identify the altered region between dual-temporal images. In pursuit of more precise change maps, numerous state-of-the-art (SOTA) methods design neural networks with robust discriminative capabilities. The convolutional neural network (CNN)-transformer model is specifically designed to integrate the strengths of the CNN and transformer, facilitating effective coupling of feature information. However, previous CNN-transformer studies have not effectively mitigated the interference of feature distribution differences as well as pseudovariations between two images due to cloud occlusion, imaging conditions, and other factors. In this article, we propose a domain adaptive and interactive differential attention network (DA-IDANet). This model incorporates domain adaptive constraints (DACs) to mitigate the interference of pseudovariations by mapping the two images to the same deep feature space for feature alignment. Furthermore, we designed the interactive differential attention module (IDAM), which effectively improves the feature representation and promotes the coupling of interactive differential discriminant information, thereby minimizing the impact of irrelevant information. Experiments on four datasets demonstrate the superior validity and robustness of our proposed model compared to other SOTA methods, as evident from both quantitative analysis and qualitative comparisons. The code will be available online (https://github.com/Jyl199904/DA-IDANet).
Recent studies have used unsupervised domain adaptive object detection (UDAOD) methods to bridge the domain gap in remote sensing (RS) images. However, UDAOD methods typically assume that the source domain data can be accessed during the domain adaptation process. This setting is often impractical in the real world due to RS data privacy and transmission difficulty. To address this challenge, we propose a practical source-free object detection (SFOD) setting for RS images, which aims to perform target domain adaptation using only the source pre-trained model. We propose a new SFOD method for RS images consisting of two parts: perturbed domain generation and alignment. The proposed multilevel perturbation constructs the perturbed domain in a simple yet efficient form by perturbing the domain-variant features at the image level and feature level according to the color and style bias. The proposed multilevel alignment calculates feature and label consistency between the perturbed domain and the target domain across the teacher-student network, and introduces feature-prototype distillation to mitigate pseudo-label noise. By requiring the detector to be consistent in the perturbed domain and the target domain, the detector is forced to focus on domain-invariant features. Extensive results of three synthetic-to-real experiments and three cross-sensor experiments have validated the effectiveness of our method, which does not require access to source domain RS images. Furthermore, experiments on computer vision datasets show that our method can be extended to other fields as well. Our code will be available at https://weixliu.github.io/.
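An image-level color perturbation for building a perturbed domain can be sketched as a per-channel random shift; the function and parameter values below are illustrative, not the paper's actual augmentation:

```python
import random

def perturb_colors(image, max_shift=0.1, seed=None):
    """Image-level perturbation: add one random offset per channel,
    mimicking a color/style bias, and clamp values to [0, 1].
    `image` is a list of (r, g, b) pixels with values in [0, 1]."""
    rng = random.Random(seed)
    shifts = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    out = []
    for pixel in image:
        out.append(tuple(min(1.0, max(0.0, v + s))
                         for v, s in zip(pixel, shifts)))
    return out

# One offset per channel: every pixel moves by the same amount per channel,
# which changes the image's "style" without touching object geometry.
perturbed = perturb_colors([(0.5, 0.5, 0.5), (0.4, 0.6, 0.5)],
                           max_shift=0.05, seed=0)
```

Because geometry is untouched, the detector can be trained to give consistent outputs on the original and perturbed images, which is the consistency idea the method relies on.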
Recent advancements in Vision Foundation Models (VFMs) like the Segment Anything Model (SAM) have exhibited remarkable progress in natural image segmentation. However, SAM's performance on remote sensing images is limited, especially in application scenarios that require strong expert knowledge, such as landslide detection. In this study, we propose an effective segmentation model, namely LandslideNet, realized by embedding a tuning layer in a pre-trained encoder and adapting SAM to the landslide detection scene for the first time. The proposed method is compared with traditional convolutional neural networks (CNNs) on two well-known landslide datasets. The results indicate that the proposed model, with fewer training parameters, has better performance in detecting small-scale targets and delineating landslide boundaries, with an improvement of 6–7 percentage points in accuracy (F1 and mIoU) compared to mainstream CNN-based methods.
No abstract available
No abstract available
No abstract available
Landslides are one of the most destructive natural disasters in the world, threatening human life and safety. With excellent performance as a foundation model for image segmentation, the segment anything model (SAM) has provided a novel paradigm for semantic segmentation research. However, the lack of remote sensing images in the SAM training data limits its ability to recognize landslides. In addition, although transfer learning can carry SAM's feature extraction capability over to the landslide segmentation task, it consumes substantial computational resources and training time. To solve these challenges, this study proposes a TransLandSeg model that transfers the segmentation capability of SAM while learning landslide features at a low training cost. To limit the number of trainable parameters, an adaptive transfer learning (ATL) module is purposely designed: the image encoder is frozen during model training, only the ATL module and mask decoder are trained, and the knowledge learned by the ATL module is fed into the original network. Moreover, to select the best ATL module, we designed nine kinds of ATL modules and analyzed the accuracy of the TransLandSeg model with each. We selected the Bijie landslide dataset and the Landslide4Sense dataset for model training and testing. The experiment results show that the TransLandSeg model increases the mean intersection over union by 1.48%–13.01% compared to other state-of-the-art semantic segmentation models. In addition, TransLandSeg requires only 1.3% of SAM's parameters to transfer SAM's powerful capabilities to landslide segmentation.
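Freezing the encoder and training only small added modules is what keeps the trainable fraction tiny; the bookkeeping can be sketched as follows (the parameter counts are invented so the arithmetic lands on 1.3% — they are not SAM's real layer sizes):

```python
# Sketch: with the backbone frozen, the trainable fraction is just the
# adapter + decoder parameter count over the total. Counts are illustrative.

def trainable_ratio(param_counts, trainable_names):
    """Fraction of parameters updated when everything outside
    `trainable_names` (e.g. the image encoder) is frozen."""
    total = sum(param_counts.values())
    trainable = sum(param_counts[n] for n in trainable_names)
    return trainable / total

counts = {
    "image_encoder": 98_700_000,  # frozen
    "atl_adapter": 800_000,       # trained
    "mask_decoder": 500_000,      # trained
}
ratio = trainable_ratio(counts, ["atl_adapter", "mask_decoder"])
print(f"{ratio:.1%}")  # 1.3%
```

In a deep learning framework the same effect is achieved by disabling gradients on the frozen modules and passing only the adapter and decoder parameters to the optimizer.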
Image segmentation plays a key role in remote sensing, particularly in landslide image segmentation. Remote sensing of landslide images is challenging due to their single category but complex detailed features, making it difficult to determine landslide boundaries and extents. Traditional segmentation methods often yield poor results for such images. To address these challenges, we propose the Large Kernel Nested UKAN (LKN-UKAN). The key contributions and findings are as follows. (1) We embed a Tokenized KAN Block (Tok-KAN) in U-Net++ to enhance complex feature modeling, leveraging Tok-KAN’s strengths in nonlinear modeling and relationship capture. (2) We design a Dual Large Fusion Selective Kernel Attention (DLFFSKA) module to improve global perception and contextual information capture. (3) We apply transfer learning to transfer feature-rich remote sensing image features to landslide data, significantly improving segmentation performance. The experimental results demonstrate that the LKN-UKAN achieved significant improvements in remote sensing landslide image segmentation compared with state-of-the-art methods, particularly in terms of boundary accuracy and feature representation.
No abstract available
Landslides are characterized by their suddenness and destructive power, making rapid and accurate identification crucial for emergency rescue and disaster assessment in affected areas. To address the challenges of limited landslide samples and data complexity, a landslide identification sample library was constructed using high-resolution remote sensing imagery combined with field validation. An innovative Dual-Coded Segmentation Network (DS Net) is proposed: through the multi-attention mechanisms of its Prior Knowledge Integration (PKI) module and Cross-Feature Aggregation (CFA) module, it realizes dynamic alignment and deep fusion of local details with global context and of image features with domain knowledge, significantly improving landslide detection accuracy and reliability. To objectively evaluate the performance of the DS Net model, four efficient semantic segmentation models—SegFormer, SegNeXt, FeedFormer, and U-MixFormer—were selected for comparison. The results demonstrate that DS Net achieves superior performance (overall accuracy = 0.926, precision = 0.884, recall = 0.879, and F1-score = 0.882), with metrics that are 3.5–7.1% higher than the other models. These findings confirm that DS Net effectively improves the accuracy and efficiency of landslide identification, providing a critical scientific basis for landslide prevention and mitigation.
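The reported F1-score is the harmonic mean of the reported precision and recall, which is easy to check:

```python
# F1 as the harmonic mean of precision and recall.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# With the precision and recall reported for DS Net:
print(round(f1_score(0.884, 0.879), 3))  # 0.881 — close to the reported 0.882
```

The small discrepancy with the reported 0.882 is consistent with the paper rounding precision and recall before publication.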
Addressing the technical demands for the rapid, precise detection of earthquake-triggered landslides in loess tablelands, this study proposes and validates an innovative methodology integrating enhanced deep learning architectures with large-tile processing strategies, featuring two core advances: (1) a critical enhancement of YOLOv8’s shallow layers via a higher-resolution P2 detection head to boost small-target capture capabilities, and (2) the development of a large-tile segmentation–tile mosaicking workflow to overcome the technical bottlenecks in large-scale high-resolution image processing, ensuring both timeliness and accuracy in loess landslide detection. This study utilized 20 km² of high-precision UAV imagery acquired after the 2023 Gansu Jishishan Ms 6.2 earthquake as foundational data, applying our methodology to achieve the rapid detection and precise segmentation of landslides in the study area. Validation was conducted through a comparative analysis of high-accuracy 3D models and field investigations. (1) The model achieved simultaneous convergence of all four loss functions within a 500-epoch progressive training strategy, with mAP50(M) = 0.747 and mAP50-95(M) = 0.46, thus validating the superior detection and segmentation capabilities for the Jishishan earthquake-triggered loess landslides. (2) The enhanced algorithm detected 417 landslides with 94.1% recognition accuracy. Landslide areas ranged from 7 × 10⁻⁴ km² to 0.217 km² (aggregate area: 1.3 km²), indicating small-scale landslide dominance. (3) Morphological characterization and the spatial distribution analysis revealed near-vertical scarps, diverse morphological configurations, and high spatial density clustering in loess tableland landslides.
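A large-tile segmentation–mosaicking workflow starts by enumerating overlapping tile origins over the full image; a minimal sketch, where the tile size and overlap are illustrative and the image is assumed to be at least one tile in each dimension:

```python
# Sliding-window tiling: detections on each tile are later mosaicked back
# into full-image coordinates by adding the tile's origin offset.

def tile_origins(width, height, tile, overlap):
    """Top-left corners for a tiling with `overlap` pixels between
    neighbouring tiles; the last row/column is clamped so every tile
    fits inside the image."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    # Ensure the right/bottom edges are covered by a final clamped tile.
    if xs[-1] != width - tile:
        xs.append(width - tile)
    if ys[-1] != height - tile:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

print(tile_origins(10, 6, tile=4, overlap=2))
# [(0, 0), (2, 0), (4, 0), (6, 0), (0, 2), (2, 2), (4, 2), (6, 2)]
```

The overlap matters: a landslide split across a tile boundary is still seen whole in at least one tile, and duplicate detections in the overlap zone are merged during mosaicking (e.g. by non-maximum suppression).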
Landslides pose huge risks to communities and infrastructure globally; one driver is slope movement caused by urbanization and climate change in recent years. Landslide Detection Systems (LDSs) are an evolving technology crucial for handling the impacts of these natural disasters, playing a vital role in identifying and monitoring landslide occurrences across diverse terrains. This paper proposes one such LDS that detects landslides on the benchmark LandSlide4Sense dataset through specialized deep-learning models. The motive is to determine the most effective methods by training them on large amounts of data, which can help enhance accuracy and ensure quick detection and response. Through this research, the paper highlights advancements in DL-based semantic segmentation for satellite imagery analysis.
No abstract available
No abstract available
No abstract available
Change detection plays a crucial role in geological disaster tasks such as landslide identification, post-earthquake building reconstruction assessment, and unstable rock mass monitoring. However, real-world scenarios often pose significant challenges, including complex surface backgrounds, illumination and seasonal variations between temporal phases, and diverse change patterns. To address these issues, this paper proposes a multi-stage model for geological disaster change detection, termed MSMCD, which integrates strategies of global dependency modeling, local difference enhancement, edge constraint, and frequency-domain fusion to achieve precise perception and delineation of change regions. Specifically, the model first employs a DualTimeMamba (DTM) module for two-dimensional selective scanning state-space modeling, explicitly capturing cross-temporal long-range dependencies to learn robust shared representations. Subsequently, a Multi-Scale Perception (MSP) module highlights fine-grained differences to enhance local discrimination. The Edge–Change Interaction (ECI) module then constructs bidirectional coupling between the change and edge branches with edge supervision, improving boundary accuracy and geometric consistency. Finally, the Frequency-domain Change Fusion (FCF) module performs weighted modulation on multi-layer, channel-joint spectra, balancing low-frequency structural consistency with high-frequency detail fidelity. Experiments conducted on the landslide change detection dataset (GVLM-CD), post-earthquake building change detection dataset (WHU-CD), and a self-constructed unstable rock mass change detection dataset (TGRM-CD) demonstrate that MSMCD achieves state-of-the-art performance across all benchmarks. These results confirm its strong cross-scenario generalization ability and effectiveness in multiple geological disaster tasks.
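The idea behind frequency-domain fusion — weighting low-frequency structure and high-frequency detail separately before recombining — can be illustrated with a 1-D toy that uses a moving average as the low-pass filter; this is an analogy for intuition, not the FCF module itself:

```python
# Toy spectral split: low frequencies via moving average, high frequencies
# as the residual, then recombination with independent weights.

def frequency_fusion(signal, low_weight=1.0, high_weight=0.5, window=3):
    """Split a 1-D feature into a low-frequency part (moving average) and
    a high-frequency residual, then recombine with separate weights."""
    n = len(signal)
    half = window // 2
    low = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        low.append(sum(signal[lo:hi]) / (hi - lo))
    high = [s - l for s, l in zip(signal, low)]
    return [low_weight * l + high_weight * h for l, h in zip(low, high)]

# With both weights at 1.0 the signal is reconstructed exactly;
# lowering high_weight smooths out fine detail while keeping structure.
sig = [1.0, 3.0, 2.0, 4.0]
print(frequency_fusion(sig, low_weight=1.0, high_weight=1.0))
```

Down-weighting the high-frequency residual suppresses noisy detail while preserving structural consistency, which is the balance the FCF module is described as targeting.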
The deployment of landslide intelligent recognition models in non-training regions encounters substantial challenges, primarily attributed to heterogeneous remote sensing acquisition parameters and inherent geospatial variability in factors such as topography, vegetation cover, and soil characteristics across distinct geographic zones. Addressing the issue of underutilization of landslide contextual information and morphological integrity in domain adaptation methods, this paper introduces a cross-domain landslide extraction approach that integrates image masking with enhanced morphological information. Specifically, our approach implements a pixel-level mask on target domain imagery, facilitating the utilization of context information from the masked images. Furthermore, it establishes a morphological information extraction module, grounded in predefined thresholds and rules, to produce morphological pseudo-labels for the target domain. The results demonstrate that our method achieves an IoU (intersection over union) improvement of 1.78% and 6.02% over the suboptimal method in two cross-domain tasks, respectively, and a remarkable performance enhancement of 33.13% and 31.79% compared to scenarios without domain adaptation. This cross-domain extraction method not only substantially boosts the accuracy of cross-domain landslide identification but also enhances the completeness of landslide morphology information, offering robust technical support for landslide disaster monitoring and early warning systems.
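IoU, the metric used to report these cross-domain gains, is simply the intersection over the union of the predicted and reference masks:

```python
# IoU (Jaccard index) between two binary masks, flattened to 0/1 lists.

def iou(pred, truth):
    """Intersection over union; defined as 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0

# One pixel overlaps out of three pixels marked in either mask: IoU = 1/3.
print(iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because IoU penalizes both missed landslide pixels and false alarms, it is a stricter summary of mask quality than per-pixel accuracy.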
Landslide mapping from remote sensing imagery is essential for disaster prevention and geohazard risk assessment. Recent advances in deep learning have enabled more accurate landslide extraction, yet current methods often suffer from limited generalization across regions with different imaging conditions, weak adaptability to diverse data sources, and difficulty in capturing small or subtle landslide features. To address these challenges, we propose a style-aware spatial-frequency spiking convolution (SSFSC) framework to enhance adaptability and extraction precision across heterogeneous domains using remote sensing images of varying spatial resolutions and sensors. SSFSC attempts to reduce domain shifts caused by variations in textures, color, and feature patterns across datasets by incorporating a style transfer strategy. On this basis, separable spiking convolution is proposed to mimic biologically inspired learning to capture the complex spectral and morphological features of landslides and aggregate distinctive landslide representations over background objects across spatial-frequency domains. To validate the effectiveness of SSFSC, we conduct experiments across diverse landslide-prone regions and compare with nine baseline methods, including DeepLabv3+, Segformer, TransUnet, Max-DeepLab, SwinUnet, Mask2Former, SCDUNet++, DAFormer and CLUDA. Results show that SSFSC significantly improves both accuracy and generalization, achieving an approximate 28.69% IoU improvement compared to existing methods. These findings demonstrate that SSFSC offers a scalable and efficient solution for automated landslide mapping, and has strong potential to support long-term hazard monitoring and emergency response applications under varying environmental conditions.
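A common, lightweight form of the style-transfer strategy mentioned here is matching first- and second-order channel statistics (as in AdaIN-style normalization); the 1-D sketch below illustrates the mechanism and is not the SSFSC implementation:

```python
# Style matching by channel statistics: shift/scale source values so their
# mean and standard deviation equal the target's, reducing appearance shift
# (color/texture statistics) without altering spatial content ordering.

def match_style(source, target):
    """Return `source` re-normalized to have `target`'s mean and std."""
    def stats(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, var ** 0.5
    src_mean, src_std = stats(source)
    tgt_mean, tgt_std = stats(target)
    scale = tgt_std / src_std if src_std else 0.0
    return [(x - src_mean) * scale + tgt_mean for x in source]

styled = match_style([0.0, 2.0, 4.0], [10.0, 11.0, 12.0])
print(styled)  # values now centered on 11.0 with the target's spread
```

Applied per channel to image or feature tensors, this makes source-domain imagery "look like" the target domain statistically, which is one standard way to narrow texture and color gaps between sensors.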
Semantically consistent cross-domain image translation facilitates the generation of training data by transferring labels across different domains, making it particularly useful for plant trait identification in agriculture. However, existing generative models struggle to maintain object-level accuracy when translating images between domains, especially when domain gaps are significant. In this work, we introduce AGILE (Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification), a diffusion-based framework that leverages optimized text embeddings and attention guidance to semantically constrain image translation. AGILE utilizes pretrained diffusion models and publicly available agricultural datasets to improve the fidelity of translated images while preserving critical object semantics. Our approach optimizes text embeddings to strengthen the correspondence between source and target images and guides attention maps during the denoising process to control object placement. We evaluate AGILE on cross-domain plant datasets and demonstrate its effectiveness in generating semantically accurate translated images. Quantitative experiments show that AGILE enhances object detection performance in the target domain while maintaining realism and consistency. Compared to prior image translation methods, AGILE achieves superior semantic alignment, particularly in challenging cases where objects vary significantly or domain gaps are substantial.
Specific Emitter Identification (SEI) is an emerging technology that leverages radio frequency fingerprint (RFF) to identify wireless devices. While deep learning methods have shown significant potential for automatic RFF feature extraction and identification, their reliance on large labeled datasets limits their practical applicability. This letter proposes a contrastive self-supervised learning framework integrating von Neumann entropy and exponential moving average (CSEE). With the support of CSEE, an RFF extractor can be pre-trained using unlabeled auxiliary datasets from the source domain. CSEE can directly control the eigenvalue distribution in the final representation’s autocorrelation matrix, thereby optimizing representation properties such as rank and isotropy while reducing information redundancy, thus preserving critical signal-domain features. In the target domain, only a few samples are needed to fine-tune the model to achieve good recognition accuracy. We conducted cross-domain SEI experiments using ADS-B as the source domain with LoRa and AIS as two separate target domains. The results show that with just 30 fine-tuned samples, the CSEE-pretrained RFF extractor achieved nearly 90% identification accuracy, surpassing other self-supervised learning methods. These findings demonstrate the strong generalization capability of CSEE across different target domains. The codes are available at https://github.com/NavLabLqs/CSEE.
Arabic dialect identification (ADI) systems are essential for large-scale data collection pipelines that enable the development of inclusive speech technologies for Arabic language varieties. However, the reliability of current ADI systems is limited by poor generalization to out-of-domain speech. In this paper, we present an effective approach based on voice conversion for training ADI models that achieves state-of-the-art performance and significantly improves robustness in cross-domain scenarios. Evaluated on a newly collected real-world test set spanning four different domains, our approach yields consistent improvements of up to +34.1% in accuracy across domains. Furthermore, we present an analysis of our approach and demonstrate that voice conversion helps mitigate the speaker bias in the ADI dataset. We release our robust ADI model and cross-domain evaluation dataset to support the development of inclusive speech technologies for Arabic.
Image detection plays a critical role in quality control across manufacturing and healthcare sectors, yet existing methods struggle to meet real-world requirements due to their heavy reliance on large labeled datasets, poor generalization across different domains, and limited adaptability to diverse application scenarios. These limitations significantly hinder the deployment of AI solutions in practical industrial settings where data scarcity and domain variations are common. To address these issues, we propose MMAL-CL, a unified deep learning framework that integrates an Edge Feature Module (EFM) with multi-subspace mapping attention and an Adaptive Deep Learning Module (ADLM) for cross-domain feature decoupling. The EFM extracts translation-invariant features through residual convolution blocks and a novel multi-subspace attention mechanism, enhancing the model’s ability to capture interdependencies between features. The ADLM enables few-shot learning by mixing task-irrelevant auxiliary data with target domain samples and optimizing feature separation via a dual-classifier strategy. Finally, evaluations of the model on five datasets (two industrial and three medical) demonstrate that MMAL-CL achieves 99.7% precision on the NEU-CLS dataset with full data and maintains 71.3% precision with only 20 samples per class, outperforming other methods in few-shot settings. The framework shows remarkable cross-domain generalization capability, with an average 12.8% improvement in F1-score over existing methods. These results highlight MMAL-CL’s potential as a practical solution for image detection that can operate effectively with limited training data while maintaining high accuracy across diverse application scenarios.
Prior research has shown that cross-domain gender identification (GI) in VR is challenging, often due to limited overlapping features and a lack of shared users across datasets. In this work, we examine two distinct VR environments—a solar panel task and a biological exploration task—using a consistent feature set and eye-tracking (ET) data from common users. Our results confirm that cross-domain classification is substantially harder than domain-specific tasks and highlight head position as a key feature. Importantly, we show that incorporating common users improves model performance, emphasizing the role of user overlap in enhancing the generalizability of GI models in VR.
In the Industrial Internet of Things (IIoT), the identification resolution system enhances communication and overall efficiency between isolated work islands, ensuring the trustworthiness and effectiveness of secure resource sharing across domains through cross-domain authentication. However, traditional identity authentication methods fail to empower users with control over their identity information and face challenges such as difficulty in tracking anonymous users, high computational overhead, and insufficient cross-domain trust. To address these issues, this paper proposes a cross-domain identity authentication scheme based on self-sovereign identity. The scheme leverages aggregate signature technology to enhance authentication efficiency, integrates blockchain technology and smart contracts to achieve cross-domain trust, and designs a mechanism for threshold identity tracking and revocation, as well as an attribute credential update mechanism to enable secure and efficient cross-domain authentication in the identification resolution system. The paper provides formal security definitions and proofs and evaluates the computational and storage efficiency of the scheme through theoretical analysis and experimental simulations. The results demonstrate that the proposed scheme offers significant advantages in resource-constrained scenarios within the IIoT.
No abstract available
Person re-identification (Re-ID) is a crucial task in surveillance and security applications, aiming to match individuals across non-overlapping camera views despite variations in illumination, pose, and occlusion. Traditional deep learning methods, particularly convolutional neural networks (CNNs), struggle with cross-domain generalization due to domain shifts. In this paper, we propose MSC-TReID, a Multi-Scale Transformer-based framework designed to enhance feature robustness and cross-domain adaptability. Our model integrates a hybrid CNN-Transformer backbone to capture both local and global identity features, a multi-scale attention mechanism to focus on discriminative regions, and domain adversarial training to mitigate domain shift. Extensive experiments on Market-1501, DukeMTMC-reID, and MSMT17 datasets demonstrate that MSC-TReID outperforms state-of-the-art Re-ID models, achieving significant improvements in Rank-1 accuracy and mean Average Precision. Additionally, cross-domain evaluation results confirm the model's superior generalization ability.
Cross-domain fault diagnosis of rolling bearings plays a pivotal role in maintaining mechanical system reliability and operational safety, particularly under complex operational environments where bearing failures frequently occur. To improve the accuracy of cross-domain fault identification, a novel cascaded synchrosqueezed wavelet–scattering transform and bagging–convolutional neural network (SWST-BCNN) was proposed in this paper. Firstly, a synchrosqueezed wavelet–scattering transform stage was designed. It preserves discriminative fault attributes in synchrosqueezed wavelet-based time–frequency representations while compressing feature dimensionality. Then, a bagging–convolutional neural network stage was designed to improve accuracy by integrating several different CNN classifiers. Finally, the proposed SWST-BCNN network was verified on the public bearing dataset of Case Western Reserve University and was compared with some classical algorithms. The results showed that the proposed SWST-BCNN network exhibited higher accuracy. This paper provides a feasible method for cross-domain fault identification of bearings.
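The bagging stage aggregates several CNN classifiers; the simplest aggregation rule, per-sample majority voting, can be sketched as follows (the paper may combine predictions differently, e.g. by averaging class probabilities):

```python
from collections import Counter

def bagging_vote(predictions):
    """Majority vote across the per-sample predictions of several
    classifiers; ties resolve to the smallest label."""
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(sorted(counts), key=lambda label: counts[label])
        fused.append(best)
    return fused

# Three classifiers, four samples: the ensemble corrects single-model errors.
clf_a = [0, 1, 2, 1]
clf_b = [0, 1, 1, 1]
clf_c = [1, 1, 2, 1]
print(bagging_vote([clf_a, clf_b, clf_c]))  # [0, 1, 2, 1]
```

The benefit comes from training each member on a different bootstrap sample (or, as here, with a different architecture), so their errors are partly uncorrelated and cancel in the vote.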
Fire events threaten the safety of residents and the health of ecosystems in affected areas, and post-disaster recovery efforts also require a large investment of resources and time. In recent years, the rising frequency of fire events has motivated local governments to strengthen their monitoring and emergency response efforts. However, current fire event identification methods can only identify the presence of a fire, without the ability to distinguish its specific category. In addition, when a fire occurs, the lack of information about the affected areas makes it challenging for emergency management authorities to take timely and effective rescue measures. To address these issues, we propose a two-stage framework for fire event identification and assessment. Specifically, in the first stage, based on multi-band fused remote sensing images and heterogeneous environmental images, the proposed framework not only identifies various fire events but also accurately identifies the boundaries of the fire events. In the second stage, integrating the results of fire event identification with social media data and domain knowledge, we present a real-time assessment agent for fire events based on the large language model. This agent enables timely and accurate analysis of the impact of fires on the affected areas. We evaluate our method on a real-world authority dataset, and results show that our framework identifies fire events with an F1-score of 61.0% and a mAP of 57.7%, which outperforms state-of-the-art baseline methods. In addition, the assessment results of fire events in real cases indicate that the proposed fire event assessment agent can assist emergency responders in obtaining timely and accurate information.
No abstract available
Using remote sensing techniques to monitor landslides and their resultant land cover changes is fundamentally important for risk assessment and hazard prevention. Despite enormous efforts in developing intelligent landslide mapping (LM) approaches, LM remains challenging owing to high spectral heterogeneity of very-high-resolution (VHR) images and the daunting labeling efforts. To this end, a deep learning model based on semi-supervised multi-temporal deep representation fusion network, namely SMDRF-Net, is proposed for reliable and efficient LM. In comparison with previous methods, the SMDRF-Net possesses three distinct properties. (1) Unsupervised deep representation learning at the pixel- and object-level is performed by transfer learning using the Wasserstein generative adversarial network with gradient penalty to learn discriminative deep features and retain precise outlines of landslide objects in the high-level feature space. (2) Attention-based adaptive fusion of multi-temporal and multi-level deep representations is developed to exploit the spatio-temporal dependencies of deep representations and enhance the feature representation capability of the network. (3) The network is optimized using limited samples with pseudo-labels that are automatically generated based on a comprehensive uncertainty index. Experimental results from the analysis of VHR aerial orthophotos demonstrate the reliability and robustness of the proposed approach for LM in comparison with state-of-the-art methods.
Detecting earthquake-induced landslides in remote sensing images is challenging due to the varying sizes of landslides, uneven distribution, and the prevalence of small targets. This study proposes a novel approach, the TLSTMF-YOLO model, which combines a C3-Swin-Transformer and multiscale feature fusion techniques to enhance detection accuracy and efficiency. Key innovations include the use of a convolutional block attention module (CBAM) to improve feature representation, and a bidirectional feature pyramid network (BiFPN) for optimized cross-scale feature fusion. To address data scarcity, a transfer learning strategy is applied, supported by an AdamW optimizer and cosine learning rate strategy for faster convergence. Evaluations on the Jiuzhaigou and Luding landslide datasets demonstrate the model’s effectiveness, achieving precision, recall, and mean average precision (mAP)@0.5 of 95.7%, 89.9%, and 90.5% on the Jiuzhaigou dataset, and 96.0%, 90.9%, and 94.5% on the Luding dataset, respectively. In addition, the model processes frames efficiently, with times of 6.61 and 12.2 ms on the two datasets. These results confirm the model’s capability for accurate and efficient landslide detection, highlighting its potential for real-world applications.
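BiFPN's characteristic operation is fast normalized weighted fusion of same-resolution feature maps; a flat-list sketch of that formula (the inputs and weights here are illustrative placeholders):

```python
# BiFPN-style fast normalized fusion: per-input learnable weights are
# clamped to be non-negative and normalized to sum to ~1 before the
# weighted sum, so each scale's contribution stays bounded and learnable.

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature vectors (flat lists here) with
    ReLU-clamped, normalized scalar weights."""
    w = [max(0.0, wi) for wi in weights]
    denom = sum(w) + eps
    fused = []
    for values in zip(*features):
        fused.append(sum(wi * v for wi, v in zip(w, values)) / denom)
    return fused

f_high = [1.0, 2.0]   # e.g. upsampled higher-level feature
f_low = [3.0, 4.0]    # same-level feature
print(fast_normalized_fusion([f_high, f_low], weights=[1.0, 1.0]))
# approximately [2.0, 3.0]
```

Because the weights are learned per fusion node, the network can decide how much each resolution contributes — which is why BiFPN helps with the uneven landslide sizes this abstract highlights.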
When sudden co-seismic landslides strike, they cause widespread devastation and demand a rapid response. The swift and accurate acquisition of landslide information is essential for effective disaster relief. Deep learning (DL)-based computer-aided interpretation methods have emerged as cutting-edge tools for landslide detection. Nevertheless, traditional DL approaches face limitations, such as high annotation costs, slow processing speeds, and low generalizability, rendering them unsuitable for rapid co-seismic landslide recognition tasks. This study presents a progressive approach for co-seismic landslide detection. First, we develop a Multi-scale Feature Fusion Lightweight Neural Network (MFFLnet), achieving exceptional generalizability and speed while maintaining precision. Second, we employ the deep transfer learning (TL) strategy, enabling MFFLnet to leverage prior landslide knowledge from a source domain, and a refined data augmentation algorithm to combat overfitting. The proposed methodology is implemented in two co-seismic landslide scenes in Hokkaido, Japan, and Luding, China. Experimental results demonstrate that the proposed method exhibits outstanding performance in regional landslide recognition and robust performance across different co-seismic landslide detection scenarios. Our approach proves competitive in efficient co-seismic landslide disaster recognition and cross-scene identification, showcasing significant applicability in the face of rapid response demands.
No abstract available
Landslides are major natural disasters in mountainous areas, often caused by earthquakes and heavy rainfalls. Traditional manual delineation methods for identifying landslide features using optical imagery are inefficient, highlighting the need for automated detection techniques. Deep Convolutional Neural Networks (CNNs) have emerged as advanced solutions in computer vision for this purpose. Despite the reliance on pre-event and post-event imagery or various data sources like digital elevation models (DEMs), the success of deep learning models largely depends on the quality and availability of training data. This poses a challenge for their immediate application after a landslide. This study explores the transferability of a CNN model trained on data from the 2016 Kumamoto Earthquakes for detecting landslides in different events, specifically the 2018 Hokkaido earthquake and the 2017 Asakura Rainfall in Japan. These cases were chosen for their geographical similarities. The proposed deep transfer learning model, based on a DeepLabV3 + architecture built on a pre-trained ResNet50, automatically identifies landslide features without needing specific training data or model adjustments for each event. It achieved high accuracy in both cases, demonstrating CNNs’ potential for broad application in landslide detection and enhancing disaster response efforts.
Landslides are one of the most destructive natural disasters in the world, posing a serious threat to human life and safety. The development of foundation models has provided a new research paradigm for large-scale landslide detection. The Segment Anything Model (SAM) has garnered widespread attention in the field of image segmentation. However, our experiments found that SAM performs poorly on the task of landslide segmentation. We propose TransLandSeg, a transfer learning approach for landslide semantic segmentation based on a vision foundation model (VFM). TransLandSeg outperforms traditional semantic segmentation models on both the Landslide4Sense dataset and the Bijie landslide dataset. Our proposed adaptive transfer learning (ATL) architecture enables the powerful segmentation capability of SAM to be transferred to landslide detection by training only 1.3% of SAM’s parameters, which greatly improves training efficiency. Finally, we conducted ablation experiments on models with different ATL structures and concluded that the deployment location and residual connections of ATL play an important role in TransLandSeg’s accuracy improvement.
No abstract available
Labeling remote sensing data for classification is labor-intensive and time-consuming. Transfer learning (TL), in this context, is attracting increasing attention, as it aims to harness information from data sets of other regions where labels are readily available. The central concern is to homogenize the large disparities in feature distribution between different data sets through domain adaptation (DA). This article proposes a novel DA method for unsupervised TL, namely multikernel jointly domain matching (MKJDM), which by definition considers multiple kernels, as opposed to the currently popular single-kernel methods, for measuring the distances between distributions. Single-kernel methods minimize the distance between the feature distributions of the source domain (the data set with training labels) and the target domain (the data set to be classified) through, for example, the maximum mean discrepancy (MMD) metric, formed under a kernel function mapping, while the multikernel version (MK-MMD) uses different kernel functions to encapsulate multiple aspects of distribution discrepancy and is therefore more capable of distance minimization. Our MKJDM implementation also simultaneously aligns marginal and class-conditional distributions and reweights each instance, which further improves performance. Two experiments performed on remote sensing images and multi-modal data sets (i.e., orthophotos and digital surface models), with regions of different countries with distinctly different land patterns serving as source- and target-domain data, show that the overall accuracies are improved by 37.28% and 46.62% after application of our MKJDM method. An additional comparative experiment with five state-of-the-art DA methods also demonstrates that our method achieves the best performance.
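The contrast between single-kernel MMD and MK-MMD is easy to make concrete. Below is a toy numpy sketch of a biased MK-MMD² estimate with equally weighted RBF kernels; the paper's full method additionally aligns class-conditional distributions and reweights instances, which is not shown, and the fixed bandwidths here are illustrative:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mk_mmd2(Xs, Xt, gammas=(0.5, 1.0, 2.0)):
    """Biased multi-kernel MMD^2: the average of single-kernel MMD^2
    estimates over a bank of RBF bandwidths (equal weights here)."""
    total = 0.0
    for g in gammas:
        total += (rbf_kernel(Xs, Xs, g).mean()
                  + rbf_kernel(Xt, Xt, g).mean()
                  - 2.0 * rbf_kernel(Xs, Xt, g).mean())
    return total / len(gammas)
```

Identical source and target samples give an MMD² of zero; shifting one domain away from the other drives the statistic up, which is exactly the quantity a DA method tries to minimize.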
Rapid landslide identification through remote sensing imagery plays a critical role in post-disaster emergency management. While deep learning approaches have shown promise for semantic segmentation, existing models face limitations in extraction accuracy and processing efficiency for landslide detection applications. We propose the MED-DeepLabv3+ model to resolve these challenges. The proposed model incorporates three key improvements: (1) replacing Xception in the encoder with the MobileNetV3 feature extraction network, which sharpens the model’s focus on landslide semantic features and reduces the parameter count; (2) integrating an Efficient Multi-Scale Attention (EMA) module to refine shallow feature representations and enhance multi-scale feature extraction; (3) introducing a DS-ASPP module, which replaces standard convolutions with Depthwise Separable Convolutions and incorporates Squeeze-and-Excitation (SE) and Strip Pooling (SP) modules to improve deep feature recognition. Additionally, the limited diversity of publicly available landslide datasets poses a significant challenge for training deep learning models in this domain. To mitigate this issue, we construct a diverse landslide dataset tailored for deep-learning-based landslide recognition research. Results demonstrate that MED-DeepLabv3+ achieves superior performance in landslide detection, obtaining a Mean Intersection over Union (MIoU) of 81.54%, pixel accuracy of 94.63%, precision of 77.69%, recall of 86.35%, and an F1-score of 81.79%. Compared to the baseline DeepLabv3+ model, MED-DeepLabv3+ achieves higher detection accuracy while maintaining a lightweight architecture, making it well-suited for rapid and precise landslide identification.
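One of the improvements listed above, Depthwise Separable Convolutions, cuts parameters by factoring a standard convolution into a depthwise stage and a 1×1 pointwise stage; the saving is simple arithmetic (the layer sizes below are illustrative, not from the paper):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    """Depthwise separable: k x k depthwise filters plus a 1 x 1
    pointwise convolution mixing channels."""
    return k * k * c_in + c_in * c_out

# A 3x3 layer mapping 256 -> 256 channels:
# standard: 589,824 weights; depthwise separable: 67,840 (~8.7x fewer)
```

This roughly k²-fold reduction is what lets DS-ASPP stay lightweight while keeping the receptive-field benefits of the original ASPP design.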
This paper presents an identity-invariant facial expression recognition framework. It aims to make a facial expression recognition (FER) model understand facial expressions independently of identity (ID) attributes such as gender, age, and skin, which are entangled in face images. The learned representations of the FER model pursue robustness against unseen ID samples with large attribute differences. Specifically, attribute properties describing (facial) images are retrieved through a powerful pre-trained model, i.e., CLIP. Then, expression features and ID features are realized through residual module(s). As a result, the features learn expression-efficient and ID-invariant representations based on mutual information. The proposed framework is compatible with various backbones and enables detachment/attachment of ID attributes and ablative analysis. Extensive experiments on several in-the-wild valence-arousal domain datasets showed a performance improvement of up to 9% over the runner-up, and also demonstrated the subjective realism of the ID-invariant representation in high-dimensional image space.
One application of computer vision is face recognition, which essentially involves the identification of visual patterns. Face recognition is a tool often employed for multimedia management, smart card applications, justice reform, and security. The goal of a face recognition system is to automatically identify faces in any image or video using computer vision. Several methods exist for detecting faces in video; still, inaccurate detection and computational complexity degrade recognition precision. Hence, an optimized hybrid deep learning model is introduced for recognizing pose-invariant faces from images. Pose-invariant face recognition is performed using the proposed ResNet-152-integrated YOLO (Res-YOLONet), wherein ResNet-152 and YOLOv5 are hybridized to enhance recognition accuracy with minimal computational complexity. In addition, loss function optimization is devised using the proposed Enhanced Fennec Fox (EnFF) algorithm, designed by integrating an adaptive weighting strategy into the conventional Fennec Fox algorithm to acquire the global best solution. Loss function optimization using EnFF enhances recognition accuracy. Assessment of the proposed EnFF_Res-YOLONet in terms of accuracy, precision, recall, and specificity yields values of 96%, 94%, 97%, and 96.6%, respectively.
Makeup-Invariant Face Recognition is a critical area of research that aims to identify and classify manipulated or forged face images in the field of face recognition systems. Advances in deep learning techniques, such as pre-trained neural networks, have shown promising results in this domain. In this paper, a novel approach for detecting manipulated faces is proposed, combining the strengths of pre-trained neural networks, the Grasshopper Optimization (GOA) Algorithm, and Random Forest classifier. The proposed method utilizes the powerful feature extraction capabilities of pre-trained deep neural networks to capture intricate details and patterns in face images. Subsequently, the GOA algorithm is employed to select optimal features. Additionally, the Random Forest classifier is utilized for effective classification of face images and identifying different individuals based on the selected features. By integrating these three algorithms, the proposed approach demonstrates significant improvements in detecting manipulated faces, achieving higher detection rates and lower false positive rates compared to existing state of the art methods. Simulation results on data from 25 different individuals with manipulated face images show an average accuracy of 97.23%, which represents an enhancement over the compared methods.
With the advancement of deep learning, Wi-Fi-based action recognition methods using channel state information (CSI) generally rely on domain-specific training, resulting in performance degradation in unseen domains, which remains a significant challenge. To address cross-domain recognition, some complex models have been proposed. However, these works mostly rely on multiple Wi-Fi transceivers, which are uncommon in everyday settings. To improve recognition efficiency and reduce the transceiver requirement, we propose a novel framework for the single-transceiver scenario that integrates a recursive plots-based CSI sample enhancement strategy with a multisource domain adaptation approach. The CSI sample is first enhanced using recursive plots. Then, a lightweight convolutional neural network with integrated spatial attention is used to extract initial domain-invariant features. Subsequently, fine-grained features are extracted using dedicated subnetworks. This process aligns the target domain with each source domain and regularizes the target-domain outputs across multiple classifiers, thereby enhancing the network’s feature extraction. The proposed model is evaluated on the publicly available Widar3.0 dataset. The results indicate that the proposed method achieves accuracy rates of 92.6% and 90.2% for cross-location and cross-orientation recognition in single-link scenarios, respectively, while effectively reducing complexity.
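The "recursive plots" used above for CSI enhancement are known in the time-series literature as recurrence plots: they turn a 1-D signal into a binary image a CNN can consume. A minimal numpy sketch (the threshold choice is illustrative; the paper's exact construction may differ):

```python
import numpy as np

def recurrence_plot(x, eps):
    """Binary recurrence plot of a 1-D series: R[i, j] = 1 when
    |x[i] - x[j]| <= eps, rendering temporal structure as an image."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    return (d <= eps).astype(np.uint8)
```

The plot is symmetric with an all-ones diagonal; periodic motion shows up as diagonal stripes, which is the structure the downstream CNN exploits.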
Variations in domain targets have recently posed significant challenges for facial expression recognition tasks, primarily due to domain shifts. Current methods focus largely on global feature adoption to achieve domain-invariant learning; however, transferring local features across diverse domains remains an ongoing challenge. Additionally, during training on target datasets, these methods often suffer from reduced feature representation in the target domain due to insufficient discriminative supervision. To tackle these challenges, we propose a dynamic cross-domain dual attention network for facial expression recognition. Our model is specifically designed to learn domain-invariant features through separate modules for global and local adversarial learning. We also introduce a semantic-aware module to generate pseudo-labels, which computes semantic labels from both global and local features. We assess our model’s effectiveness through extensive experiments on the Real-world Affective Faces Database (RAF-DB), FER-PLUS, AffectNet, Expression in the Wild (ExpW), SFEW 2.0, and Japanese Female Facial Expression (JAFFE) datasets. The results demonstrate that our scheme outperforms the existing state-of-the-art methods, attaining recognition accuracies of 93.18%, 92.35%, 82.13%, 78.37%, 72.47%, and 70.68%, respectively.
Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition
No abstract available
No abstract available
No abstract available
Landslides pose significant threats to ecosystems, lives, and economies, particularly in the geologically fragile Sub-Himalayan region of West Bengal, India. This study enhances landslide susceptibility prediction by developing an ensemble framework integrating Recursive Feature Elimination (RFE) with meta-learning techniques. Seven advanced machine learning models- Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Extremely Randomized Trees (ET), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), and a Meta Classifier (MC) were applied using Remote Sensing and GIS tools to identify key landslide-conditioning factors and classify susceptibility zones. Model performance was assessed through metrics such as accuracy, precision, recall, F1 score, and AUC of the ROC curve. Among the models, the Meta Classifier (MC) achieved the highest accuracy (0.956) and AUC (0.987), demonstrating superior predictive ability. Gradient Boosting (GB), XGBoost, and RF also performed well, with accuracies of 0.943 and AUC values of 0.987 (GB and XGBoost) and 0.983 (RF). Extremely Randomized Trees (ET) exhibited the highest accuracy (0.946) among individual models and an AUC of 0.985. SVM and LR, while slightly less accurate (0.941 and 0.860, respectively), provided valuable insights, with SVM achieving an AUC of 0.972 and LR achieving 0.935. The models effectively delineated landslide susceptibility into five zones (very low, low, moderate, high, and very high), with high and very high susceptibility zones concentrated in Darjeeling and Kalimpong subdivisions. These zones are influenced by intense rainfall, unstable geological structures, and anthropogenic activities like deforestation and urbanization. Notably, ET, RF, GB, and XGBoost demonstrated efficiency in feature selection, requiring fewer input variables while maintaining high performance. 
This study establishes a benchmark for landslide susceptibility mapping, providing a scalable and adaptable framework for geospatial hazard prediction. The findings hold significant implications for land-use planning, disaster management, and environmental conservation in vulnerable regions worldwide.
Landslides threaten communities worldwide, resulting in financial, environmental, and human losses. Although some studies have employed machine learning (ML) algorithms and multi-criteria analysis (MCA) for landslide susceptibility mapping (LSM), comparative evaluations of these methods remain scarce, particularly regarding predictor importance, performance metrics, and hyperparameter optimization. This research addresses these gaps by comparing logistic regression (LR), random forest (RF), support vector machines (SVM), and MCA, focusing on landslide susceptibility in Petrópolis, Brazil. The ML models used 29 influencing factors, encompassing geographic, geological, climatic, and anthropogenic variables, and feature importance analysis and hyperparameter tuning were applied to identify the most significant predictors. RF achieved the highest performance, with an accuracy of 0.94, ROC AUC of 0.98, and F1 score of 0.94. SVM and LR also performed well, with ROC AUCs of 0.96 and 0.95 and F1 scores of 0.92 and 0.89, respectively. Conversely, MCA showed lower results, with an accuracy of 0.41, ROC AUC of 0.41, and F1 score of 0.55. We attribute RF’s robustness to its adaptability to diverse variable types, reduced overfitting risk, and high predictive accuracy. These findings underscore RF’s strength in LSM and highlight ML’s potential to support urban planning and mitigate risks in landslide-prone areas. Highlights: Effective landslide susceptibility analysis is essential for anticipating and mitigating risks. MCA failed to identify non-landslide areas, highlighting its limitations. ML outperforms traditional MCA in landslide susceptibility mapping. RF achieved the highest prediction accuracy for landslide susceptibility, outperforming other methods. ML-based landslide susceptibility mapping ranks susceptibility factors more effectively.
Natural disasters, notably landslides, pose significant threats to communities and infrastructure. Landslide susceptibility mapping (LSM) has been globally deemed an effective tool to mitigate such threats. In this regard, this study considers the northern region of Pakistan, which is primarily susceptible to landslides amid rugged topography, frequent seismic events, and seasonal rainfall, to carry out LSM. To achieve this goal, this study pioneered the fusion of baseline models (logistic regression (LR), K-nearest neighbors (KNN), and support vector machine (SVM)) with ensemble algorithms (Cascade Generalization (CG), random forest (RF), Light Gradient-Boosting Machine (LightGBM), AdaBoost, Dagging, and XGBoost). With a dataset comprising 228 landslide inventory maps, this study employed a random forest classifier and a correlation-based feature selection (CFS) approach to identify the twelve most significant landslide-conditioning parameters. The evaluated parameters included slope angle, elevation, aspect, geological features, and proximity to faults, roads, and streams; slope was revealed as the primary factor influencing landslide distribution, followed closely by aspect and rainfall. The models, validated with an AUC of 0.784, ACC of 0.912, and K of 0.394 for logistic regression (LR), as well as an AUC of 0.907, ACC of 0.927, and K of 0.620 for XGBoost, highlight the practical effectiveness and potency of LSM. The results revealed the superior performance of LR among the baseline models and XGBoost among the ensembles, which contributed to the development of precise LSM for the study area. LSM may serve as a valuable tool for guiding precise risk-mitigation strategies and policies in geohazard-prone regions at national and global scales.
Turkey’s Artvin province is prone to landslides due to its geological structure, rugged topography, and climatic characteristics with intense rainfall. In this study, landslide susceptibility maps (LSMs) of Murgul district in Artvin province were produced. The study employed tree-based ensemble learning algorithms, namely Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and eXtreme Gradient Boosting (XGBoost). LSM was performed using 13 factors, including altitude, aspect, distance to drainage, distance to faults, distance to roads, land cover, lithology, plan curvature, profile curvature, slope, slope length, topographic position index (TPI), and topographic wetness index (TWI). The study utilized a landslide inventory consisting of 54 landslide polygons. Landslide inventory dataset contained 92,446 pixels with a spatial resolution of 10 m. Consistent with the literature, the majority of landslide pixels (70% – 64,712 pixels) were used for model training, and the remaining portion (30% – 27,734 pixels) was used for model validation. Overall accuracy, precision, recall, F1-score, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC-ROC) were considered as validation metrics. LightGBM and XGBoost were found to have better performance in all validation metrics compared to other algorithms. Additionally, SHapley Additive exPlanations (SHAP) were utilized to explain and interpret the model outputs. As per the LightGBM algorithm, the most influential factors in the occurrence of landslide in the study area were determined to be altitude, lithology, distance to faults, and aspect, whereas TWI, plan and profile curvature were identified as the least influential factors. Finally, it was concluded that the produced LSMs would provide significant contributions to decision makers in reducing the damages caused by landslides in the study area.
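Several of the susceptibility studies above report AUC-ROC as a headline validation metric. For reference, AUC can be computed directly as a Mann-Whitney U statistic, without tracing the ROC curve point by point (a small numpy sketch, not code from any of the papers):

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive outscores
    a randomly chosen negative (ties counted as 0.5)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))
```

An AUC of 0.987, as reported for the best models, means a landslide pixel outranks a non-landslide pixel in 98.7% of random pairs; 0.5 would be chance.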
Landslide inventory is significant for landslide disaster reduction. To construct the landslide inventory, deep learning has received growing attention to detect landslides from satellite images. Among various deep learning algorithms, you-only-look-once (YOLO) has a strong ability to detect objects efficiently and has been widely used in landslide extraction. Despite its efficiency, there is no general rule to select the backbone and attention mechanism for YOLO. The selection of these two modules depends on specific application needs. Meanwhile, YOLO output is a series of anchor boxes, not accurate landslide boundaries. A single bounding box may contain many landslides and cannot extract individual landslides, limiting the YOLO applications in constructing landslide inventory. To address these issues, this article presents a lightweight attention-guided YOLO with level set layer (LA-YOLO-LLL) for landslide detection from optical satellite images. First, we introduced the MobileNetv3 to replace the original backbone of YOLO to simultaneously reduce the parameter complexity and improve the model transferability. Then, we presented a light pyramid features reuse fusion attention mechanism to improve landslide detection performance. Finally, we integrated the level set layer into YOLO head to produce accurate landslide boundaries. This article validated the accuracy and transferability of the presented method in two study areas (Bijie and Taiwan) with similar geo-environmental conditions. Experimental results show that the presented LA-YOLO-LLL model outperformed traditional YOLO in landslide detection. Findings in this article are valuable for landslide inventory construction, land use planning and risk control.
Landslides are catastrophic geological events that can cause significant damage to property and result in the loss of human lives. Deep-learning technology applied to optical remote sensing images can enable effective landslide-prone area detection. However, conventional landslide detection (LD) models often employ complex structural designs to ensure detection accuracy. This complexity often hampers detection speed, rendering these models inadequate for the swift emergency monitoring of landslides. To address these problems, we propose a new lightweight deep-learning-based framework, BisDeNet, for efficient LD. To improve the efficiency of the proposed BisDeNet, we replaced the context path in the original BiSeNet with DenseNet due to its strong feature extraction ability, few required parameters, and low model complexity. Two sites with different and representative landslide developments were selected as the study areas to verify the performance of our proposed BisDeNet. Additionally, we introduced landslide causative factors to enhance the sampling dataset. To evaluate the effectiveness of our approach, we compared the performance of BisDeNet with those of three other BiSeNet-based methods and an advanced transformer-based model, the data-efficient image transformer (DeiT). Our experimental results indicate that the F1-scores of BisDeNet in the two study areas are 0.9006 and 0.8850, which are 26.22% and 1.86% higher than the scores of BiSeNet, respectively, but slightly lower than those of the DeiT model. Furthermore, our proposed BisDeNet requires the fewest parameters and the least memory of the five models.
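The F1-scores quoted above (e.g., 0.9006 and 0.8850) are the harmonic mean of precision and recall; from raw detection counts the computation is a one-liner:

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts: the harmonic mean of precision
    (tp / (tp + fp)) and recall (tp / (tp + fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of the two rates, F1 penalizes a detector that trades many false positives for high recall, which matters when comparing lightweight models against heavier baselines.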
No abstract available
Until now, a full numerical description of the spatio-temporal dynamics of a landslide could be achieved only via physically based models. The part of the geoscientific community developing data-driven models has instead focused on predicting where landslides may occur via susceptibility models, and on estimating when landslides may occur via models belonging to the early-warning-system or rainfall-threshold classes. In this context, few published research works have explored a joint spatio-temporal model structure. Furthermore, the third element completing the hazard definition, i.e., the landslide size (areas or volumes), has hardly ever been modeled over space and time. However, technological advancements in data-driven models have reached a level of maturity that allows all three components (location, frequency, and size) to be modeled. This work takes this direction and proposes, for the first time, a solution to the assessment of landslide hazard in a given area by jointly modeling landslide occurrences and their associated areal density per mapping unit, in space and time. To achieve this, we used a spatio-temporal landslide database generated for the Nepalese region affected by the Gorkha earthquake. The model relies on a deep-learning architecture trained using an Ensemble Neural Network, where the landslide occurrences and densities are aggregated over a squared mapping unit of 1 km × 1 km and classified or regressed against a nested 30 m lattice. At the nested level, we have expressed predisposing and triggering factors. As for the temporal units, we have used an approximately 6-month resolution. The results are promising, as our model performs satisfactorily in both the susceptibility (AUC = 0.93) and density prediction (Pearson r = 0.93) tasks over the entire spatio-temporal domain.
This model takes a significant distance from the common landslide susceptibility modeling literature, proposing an integrated framework for hazard modeling in a data-driven context.
Landslide geological disasters, occurring globally, often result in significant loss of life and extensive economic damage. In recent years, the severity of these disasters has increased, likely due to the frequent occurrence of extreme rainstorms associated with global warming. This escalating trend emphasizes the urgent need for a simple and efficient method to identify hidden dangers related to landslide geological disasters. Areas experiencing seasonal heavy rainfall are particularly susceptible to such disasters, posing a serious threat to the lives and property of local residents. In response to the challenging characteristics of landslide geological hazards, such as their strong concealment and the high vegetation coverage in the Liupan Mountain area of the Loess Plateau, this study focuses on the integrated remote sensing identification and research of hidden landslide dangers in Longde County. The methodology combines differential interferometric synthetic aperture radar technology (D-InSAR) and high-resolution optical remote sensing. Surface deformation information of Longde County was obtained by analyzing 85 Sentinel-1A data from 2019 to mid-2020 using Stacking-InSAR, in conjunction with high-resolution optical remote sensing image data from GF-2 in 2019. Furthermore, the study conducted integrated remote sensing identification and field verification of landslide hazards throughout the entire county. This involved interpreting the shape and deformation marks of landslide hazards, identifying the disaster-bearing bodies, and expertly interpreting the environmental factors contributing to the hazards. As a result, 47 suspected landslide hazards and 21 field investigation points were identified, with 16 hazards verified with an accuracy of 76.19%. This outcome directly confirms the applicability and accuracy of the integrated remote sensing identification technology in the study area. 
The research results presented in this paper provide an effective scientific and theoretical basis for the future monitoring and treatment of landslide geological disasters, and play a pivotal role in the prevention of such disasters.
Landslides are a widely recognized phenomenon, causing huge economic and human losses worldwide. The detection of spatial and temporal landslide deformation, together with the acquisition of precursor information, is crucial for hazard prediction and landslide risk management. Advanced landslide monitoring systems based on remote sensing techniques (RSTs) play a crucial role in risk management and provide important support for early warning systems (EWSs) at local and regional scales. The purpose of this article is to present a review of the current state of knowledge in the development of RSTs used for identifying landslide precursors, as well as detecting, monitoring, and predicting landslides. Almost 200 articles from 2010 to 2024 were analyzed, in which the authors utilized RSTs to detect potential precursors for early warning of hazards. The applications, challenges, and trends of RSTs, largely dependent on the type of landslide, deformation pattern, hazards posed by the landslide, and the size of the area of interest, were also discussed. Although the article indicates some limitations of the RSTs used so far, integrating different techniques and technological developments offers the opportunity to create reliable EWSs and improve existing ones.
No abstract available
The use of machine learning models for landslide susceptibility mapping is widespread but limited to spatial prediction. The potential of employing these techniques in spatiotemporal landslide forecasting remains largely unexplored. To address this gap, this study introduces an innovative dynamic (i.e., space–time-dependent) application of the random forest algorithm for evaluating landslide hazard (i.e., spatiotemporal probability of landslide occurrence). An area in Norway has been chosen as the case study because of the availability of a comprehensive, spatially, and temporally explicit rainfall-induced landslide inventory. The applied methodology is based on the inclusion of dynamic variables, such as cumulative rainfall, snowmelt, and their seasonal variability, as model inputs, together with traditional static parameters such as lithology and morphologic attributes. In this study, the variables’ importance was assessed and used to interpret the model decisions and to verify that they align with the physical mechanism responsible for landslide triggering. The algorithm, once trained and tested against landslide and non-landslide data sampled over space and time, produced a model predictor that was subsequently applied to the entire study area at different times: before, during, and after specific landslide events. For each selected day, a specific and space–time-dependent landslide hazard map was generated, then validated against field data. This study overcomes the traditional static applications of machine learning and demonstrates the applicability of a novel model aimed at spatiotemporal landslide probability assessment, with perspectives of applications to early warning systems.
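Among the dynamic inputs this kind of spatiotemporal model uses, cumulative (antecedent) rainfall is the most common. A sketch of deriving it from a daily series follows (the window length and function name are illustrative, not taken from the study):

```python
import numpy as np

def antecedent_rainfall(daily_mm, window):
    """Rolling sum of the most recent `window` days of rainfall, a
    common dynamic predictor for rainfall-induced landslides; early
    entries use however many days are available."""
    daily_mm = np.asarray(daily_mm, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(daily_mm)])
    out = np.empty_like(daily_mm)
    for i in range(len(daily_mm)):
        out[i] = csum[i + 1] - csum[max(0, i + 1 - window)]
    return out
```

Feeding such rolling sums (at several window lengths) alongside static factors like lithology and slope is what turns a purely spatial susceptibility model into a space-time-dependent one.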
Unsupervised domain adaptation (UDA) aims to improve model performance in the target domain by leveraging labeled data from the source domain while not requiring labeled data in the target domain. It has been widely applied in cross-domain semantic segmentation of remote sensing images (RSIs). Despite some advancements in this area, challenges such as class confusion due to color and texture similarities, class imbalance due to significant scale variations and sample imbalance continue to impede progress in UDA for RSI segmentation. To address these challenges, we propose a novel self-supervised teacher-student network framework, including two innovative techniques: mask-enhanced class mix (MECM) and scale-based rare class sampling (SRCS). The MECM method applies a high proportion of masks to mixed images derived from both source-domain images and target-domain images, which encourages the model to infer the semantic information of masked areas from the surrounding context, enhancing cross-domain contextual semantic learning and improving the recognition accuracy of similar classes. Additionally, SRCS increases the sampling proportion of small-scale rare classes, mitigating the issue of class imbalance. Experiments show that our method outperforms existing UDA techniques in terms of PA, mF1, and mIoU, achieving state-of-the-art results on three public datasets. Notably, in the Potsdam IRRG to Vaihingen UDA scenario, our method’s performance on the key metric, mIoU, even surpasses that of supervised training, demonstrating the superiority of our approach. Codes are available at https://github.com/Qiuyb-ai/UDA-With-ME-and-BS.
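The core idea behind SRCS, oversampling rare classes, can be illustrated with inverse-frequency sampling weights. The sketch below uses a softmax over negated class frequencies, a common formulation in UDA rare-class sampling; the paper's scale-based variant differs in detail:

```python
import numpy as np

def rare_class_weights(pixel_counts, temperature=0.1):
    """Sampling weights that favour rare classes: a softmax over the
    negated normalised class frequencies, so classes with few pixels
    are drawn more often. `temperature` controls the skew."""
    freq = np.asarray(pixel_counts, dtype=float)
    freq = freq / freq.sum()
    logits = (1.0 - freq) / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

Lowering the temperature concentrates almost all sampling mass on the rarest classes; raising it approaches uniform sampling, so the parameter trades rare-class recall against overall balance.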
The distribution of remote sensing (RS) images can vary significantly due to seasonal changes and lighting conditions, making it difficult for deep learning models to generalize effectively across different RS datasets. This variation leads to a domain gap that hampers model performance when applied to new, unseen data. To tackle this challenge, we introduce DDCI, a novel unsupervised domain adaptation (UDA) framework designed to bridge the domain gap in RS image perception. Our framework consists of two key components, i.e., the adaptation diffusion distillation (ADD) module and the consistent causal intervention (CCI) module. The ADD module addresses the domain gap by aligning the source and target domains. It enhances the representation of the target domain by distilling semantic knowledge from the teacher model of the source domain. This process allows the target domain to benefit from the rich features of the source domain, leading to improved model generalization. The CCI module focuses on removing spurious correlations between domain-agnostic knowledge and domain-specific knowledge. By carefully considering the distinct characteristics of the target domain while preserving the specificity of the source domain, the CCI module ensures that only relevant, causal information is transferred between domains. This prevents overfitting to irrelevant domain-specific features and enhances model robustness. We demonstrate the effectiveness of the DDCI framework on RS scene classification tasks, utilizing four widely recognized RS datasets. Our results show significant performance improvements, underscoring the potential of this approach to boost the adaptability of deep learning models across diverse RS image datasets.
Unsupervised domain adaptation (UDA) for remote sensing image semantic segmentation aims to train a deep model on the labeled source domain and apply it to the unlabeled target domain. However, resolution and scene inconsistencies of cross-domain remote sensing images lead to great distribution differences, which degrades semantic segmentation performance. To address these issues, an unsupervised remote sensing image semantic segmentation method based on multiscale contrastive domain adaptation is proposed. First, the mean teacher model is introduced into the UDA paradigm to generate pseudo-labels for target-domain data, thereby providing cross-domain segmentation capability. A dynamic class balance sampling (DCBS) method is proposed to mitigate the class imbalance problem in cross-domain data by increasing the sampling frequency of the categories with fewer samples. Then, a data augmentation method called cross-domain mixup (CDMix) is developed to reduce the gap between the source and target domains. Finally, a multiscale cross-domain contrastive loss (MCCL) is developed, which introduces contrastive learning to learn domain-consistent features across the source and target domains, resulting in a more coherent and discriminative feature representation. Experimental results show that the proposed method can yield superior performance for unsupervised remote sensing image semantic segmentation.
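CDMix's core idea, combining pixels from the two domains, can be sketched as a mixup-style interpolation on a toy 1-D image. The paper's CDMix may paste class regions rather than interpolate; the linear blend below is an assumed simplification.

```python
def cdmix(src_pixels, tgt_pixels, lam=0.5):
    """Blend source- and target-domain pixels; lam balances the two domains."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(src_pixels, tgt_pixels)]

mixed = cdmix([0.0, 2.0, 4.0], [2.0, 0.0, 4.0])
```

Training on such mixed inputs exposes the model to intermediate distributions between the domains, which is what narrows the domain gap.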
Deep learning (DL) has become the mainstream technique for extracting information from high-spatial-resolution (HSR) imagery because of its powerful feature representation capabilities. However, DL models rely heavily on accurate annotations, which limits their generalizability to new data. Recently, the Segment Anything Model (SAM) has significantly advanced image segmentation techniques, showing great potential for use in remote sensing applications. To address the above limitations and explore the potential of the SAM for use with HSR imagery, we propose a novel method for completing semantic segmentation tasks that combines the SAM and unsupervised domain adaptation (UDA) techniques, enhancing model performance on unlabeled HSR imagery. Specifically, we propose a pseudolabel refinement module by integrating SAM and UDA techniques. Furthermore, the obtained pseudolabels are used to train the proposed self-training and SAM-based network (STSAMNet) for performing semantic segmentation; this network embeds two types of adapter layers to adapt the capabilities of the SAM to HSR imagery. During the training process, an iterative training strategy and a noise-weighted loss are applied to further improve the accuracy of the model on unlabeled images. Compared with other UDA methods, our method achieves the best performance in terms of F1 and mean intersection over union (mIoU) values.
Motivated by the increasing demand for robust segmentation in unlabeled remote sensing data, we propose domain adaptation multimodal and multi-temporal transformer (DAM-Former), a novel unsupervised domain adaptation (UDA) model that fuses high-resolution (HR) multimodal imagery with multi-temporal multispectral data. Current UDA approaches in remote sensing rarely exploit the complementary strengths of spatial and temporal features. To address this gap, our framework integrates two interconnected branches: a transformer-based network for HR multimodal data and a lightweight convolutional network with temporal attention for multi-temporal imagery. To improve segmentation accuracy and lower noise, the extracted features are robustly combined through a deep temporal fusion (DTF) module and a new mixed loss (ML) with an ensemble pseudo-label (EP) strategy. Extensive experiments and an ablation study on the FLAIR-2 dataset demonstrate that DAM-Former outperforms state-of-the-art methods, marking the first in-depth study of temporal information fusion in UDA segmentation for remote sensing data. Code available at https://github.com/ibanezfd/DAM_Former.
Although unsupervised domain adaptation (UDA) has been successfully applied for cross-scene classification of multisource remote sensing (MSRS) data, there are still some tough issues: 1) the vast majority of them are patch-based, requiring pixel-by-pixel processing at high complexity and ignoring the roles of unlabeled data between different domains and 2) traditional masked autoencoder (MAE)-based methods lack effective multiscale analysis and require pre-training, ignoring the roles of low-level representations. As such, a hierarchical masked dual-adversarial DA network (HMDA-DANet) is proposed for cross-domain end-to-end classification of MSRS data. First, a hierarchical asymmetric MAE (HAMAE) without pre-training is designed, containing a frequency dynamic large-scale convolutional (FDLConv) block to enhance important structural information in the frequency domain, and an intramodality enhancement and intermodality interaction (IAEIEI) block to embed some additional information beyond the domain distribution by expanding the cross-modal reconstruction space. Representative multimodal multiscale features can be extracted, while to some extent improving their generalization to the target domain (TD). Then, a multimodal multiscale feature fusion (MMFF) block is built to model the spatial and scale dependencies for feature fusion and reduce the layer-by-layer transmission of redundancy or interference information. Finally, a dual-discriminator-based DA (DDA) block is designed for class-specific semantic features and global structural alignments in both spatial and prediction spaces. It will enable HAMAE to model the cross-modal, cross-scale, and cross-domain associations, yielding more representative domain-invariant multimodal fusion features. Extensive experiments on five cross-domain MSRS datasets verify the superiority of the proposed HMDA-DANet over other state-of-the-art methods.
Remote sensing imagery (RSI) segmentation plays a crucial role in environmental monitoring and geospatial analysis. However, in real-world practical applications, the domain shift problem between the source domain and target domain often leads to severe degradation of model performance. Most existing unsupervised domain adaptation methods focus on aligning global-local domain features or category features, neglecting the variations of ground object categories within local scenes. To capture these variations, we propose the scene covariance alignment (SCA) approach to guide the learning of scene-level features in the domain. Specifically, we propose a scene covariance alignment model to address the domain adaptation challenge in RSI segmentation. Unlike traditional global feature alignment methods, SCA incorporates a scene feature pooling (SFP) module and a covariance regularization (CR) mechanism to extract and align scene-level features effectively and focuses on aligning local regions with different scene characteristics between source and target domains. Experiments on both the LoveDA and Yanqing land cover datasets demonstrate that SCA exhibits excellent performance in cross-domain RSI segmentation tasks, particularly outperforming state-of-the-art baselines across various scenarios, including different noise levels, spatial resolutions, and environmental conditions.
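Covariance alignment of pooled scene features can be sketched in plain Python as a CORAL-style squared Frobenius distance between the two domains' feature covariances. The paper's SFP pooling and CR regularization details are not reproduced here; this is only the alignment objective.

```python
def covariance(feats):
    """Sample covariance (d x d) of a list of d-dimensional feature vectors."""
    n, d = len(feats), len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(d)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in feats) / n
             for j in range(d)] for i in range(d)]

def cov_align_loss(src_feats, tgt_feats):
    """Squared Frobenius distance between scene-level feature covariances."""
    Cs, Ct = covariance(src_feats), covariance(tgt_feats)
    d = len(Cs)
    return sum((Cs[i][j] - Ct[i][j]) ** 2 for i in range(d) for j in range(d))

fs = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # toy pooled scene features, source
ft = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0]]  # toy pooled scene features, target
```

Minimizing this loss over matched scene regions, rather than over whole images, is what lets the method align local scenes with different characteristics.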
Unsupervised Domain Adaptation for Remote Sensing Semantic Segmentation (UDA-RSSeg) addresses the challenge of adapting a model trained on source domain data to target domain samples, thereby minimizing the need for annotated data across diverse remote sensing scenes. This task presents two principal challenges: (1) severe inconsistencies in feature representation across different remote sensing domains, and (2) a domain gap that emerges due to the representation bias of source domain patterns when translating features to predictive logits. To tackle these issues, we propose a joint-optimized adversarial network incorporating the Segment Anything Model (SAM), termed SAM-JOANet, for UDA-RSSeg. Our approach integrates SAM to leverage its robust generalized representation capabilities, thereby alleviating feature inconsistencies. We introduce a finetuning decoder designed to convert SAM-Encoder features into predictive logits. Additionally, a feature-level adversarial-based prompted segmentor is employed to generate class-agnostic maps, which guide the finetuning decoder's feature representations. The network is optimized end-to-end, combining the prompted segmentor and the finetuning decoder. Extensive evaluations on benchmark datasets, including ISPRS (Potsdam/Vaihingen) and CITY-OSM (Paris/Chicago), demonstrate the effectiveness of our method. The results, supported by visualization and analysis, confirm the method's interpretability and robustness. The code of this paper is available at https://github.com/CV-ShuchangLyu/SAM-JOANet.
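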
Unsupervised domain adaptation (UDA) has been a crucial way for cross-domain semantic segmentation of remote sensing images and has achieved notable advances. However, most existing efforts focus on single-source, single-target domain adaptation, which does not explicitly consider the serious domain shift between multiple source and target domains in real applications, especially inter-domain shift between various target domains and intra-domain shift within each target domain. In this paper, to address simultaneous inter-domain shift and intra-domain shift for multiple target domains, we propose a novel unsupervised, multistage, multisource and multitarget domain adaptation network (MultiDAN), which involves multisource and multitarget domain adaptation (MSMTDA), entropy-based clustering (EC) and multistage domain adaptation (MDA). Specifically, MSMTDA learns feature-level multiple adversarial strategies to alleviate complex domain shift between multiple target and source domains. Then, EC clusters the various target domains into multiple subdomains based on the entropy of the target predictions of MSMTDA. In addition, we propose a new pseudo label update strategy (PLUS) to dynamically produce more accurate pseudo labels for MDA. Finally, MDA aligns the clean subdomains, including pseudo labels generated by PLUS, with other noisy subdomains in the output space via the proposed multistage adaptation algorithm (MAA). The extensive experiments on the benchmark remote sensing datasets highlight the superiority of our MultiDAN against recent state-of-the-art UDA methods.
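The entropy-based split into subdomains can be sketched as ranking target images by mean prediction entropy. This is a simplified two-way split in the spirit of entropy-ranking methods such as IntraDA; MultiDAN's EC clusters into multiple subdomains, so the two-way cut here is an assumption.

```python
import math

def mean_entropy(prob_map):
    """Average per-pixel entropy of a toy 'image' given as softmax vectors."""
    def H(p):
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return sum(H(p) for p in prob_map) / len(prob_map)

def split_subdomains(pred_maps, ratio=0.5):
    """Low-entropy (confident) images form the clean subdomain."""
    order = sorted(range(len(pred_maps)), key=lambda i: mean_entropy(pred_maps[i]))
    k = max(1, int(len(pred_maps) * ratio))
    return order[:k], order[k:]

uncertain = [[0.5, 0.5]] * 4    # high-entropy predictions
confident = [[0.99, 0.01]] * 4  # low-entropy predictions
clean, noisy = split_subdomains([uncertain, confident])
```

The clean subdomain then supplies the reliable pseudo labels against which the noisier subdomains are aligned.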
We propose a novel two-stage cross-domain self-training (CDST) framework for unsupervised domain adaptive object detection in remote sensing. The first stage introduces the generative adversarial network (GAN)-based domain transfer strategy to preliminarily mitigate the domain shift for higher quality initial pseudo-labeled images, which utilizes the CycleGAN to transfer source-domain images to match the target domain. Moreover, the key issue in tailoring the self-training (ST) to unsupervised domain adaptive detection lies in the quality of pseudo-labeled images. To select high-quality pseudo-labeled images under the domain-shift circumstance, we propose hard example selection-based self-training (HES-ST) with the three key steps: 1) detector-based example division (DED), which divides the detected examples into easy examples and hard ones according to their confidence level; 2) confidence and relation joint score (CRJS)-based hard example selection, which combines two reliability levels calculated, respectively, by the detector and relation network (RN) module to mine reliable examples; and 3) union example (UE)-based training image selection, which combines both easy and reliable hard examples to choose target-domain images that may contain fewer detection errors. The experimental results on several remote sensing datasets demonstrate the effectiveness of our proposed framework. Compared with the baseline detector trained on the source dataset, our approach consistently improves the detection performance on the target dataset by 15.7%–16.8% mean average precision (mAP) and achieves the state-of-the-art (SOTA) results under various domain adaptation scenarios.
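The example-division and joint-score steps can be sketched as a simple filter over detections. The thresholds and the linear combination weight `alpha` are assumed values for illustration, not the paper's settings.

```python
def select_training_images(images, tau_easy=0.9, alpha=0.5, tau_joint=0.7):
    """Keep easy detections (high detector confidence) plus hard-but-reliable
    ones scored jointly by detector confidence and relation-network score."""
    kept = []
    for img in images:
        ok = []
        for det in img["detections"]:
            if det["conf"] >= tau_easy:
                ok.append(det)  # easy example
            elif alpha * det["conf"] + (1 - alpha) * det["rel"] >= tau_joint:
                ok.append(det)  # reliable hard example
        if ok:
            kept.append((img["name"], ok))
    return kept

images = [
    {"name": "a.tif", "detections": [{"conf": 0.95, "rel": 0.20}]},
    {"name": "b.tif", "detections": [{"conf": 0.60, "rel": 0.90}]},
    {"name": "c.tif", "detections": [{"conf": 0.50, "rel": 0.30}]},
]
selected = select_training_images(images)
```

Only the surviving images feed the next self-training round, which is how the framework keeps pseudo-label noise down under domain shift.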
No abstract available
Self-training based unsupervised domain adaptation approaches play a pivotal role in mitigating the domain shift and improving the segmentation performance of the target domain, where a trained model generates pseudo-labels for the target domain. However, how to improve the quality of the pseudo-labels of the target domain containing noise has received considerable critical attention. Moreover, remote sensing images have an imbalanced number of samples across classes. In this paper, we propose a self-training based instance calibration for unsupervised domain adaptation semantic segmentation of remote sensing images. Firstly, we encourage the different augmented results of the same pixel of the target domain images to not only have the same class prediction but also have the similar relationship with the pixel instances of the source domain. Secondly, variations related to the number of samples of different classes are added to the logits of the pixels to narrow the gap between the feature areas of different classes. Experiments on International Society for Photogrammetry and Remote Sensing (ISPRS) and Tibet Plateau remote sensing datasets demonstrate that the proposed model can effectively improve the quality of the pseudo-labels of the target domain and mitigate low segmentation performance of the target domain caused by the imbalanced number of samples across classes.
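Adding count-dependent variations to the logits can be illustrated with the standard logit-adjustment form, subtracting the log class prior so rare classes get relatively larger logits. The paper's exact variation may differ; this is an assumed instantiation.

```python
import math

def adjust_logits(logits, class_counts, tau=1.0):
    """Subtract tau*log(class prior): rare classes get relatively larger logits."""
    total = sum(class_counts)
    return [z - tau * math.log(n / total) for z, n in zip(logits, class_counts)]

# two classes with equal raw logits but imbalanced sample counts
adjusted = adjust_logits([2.0, 2.0], [900, 100])
```

After adjustment, the minority class wins ties against the majority class, narrowing the gap between their decision regions.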
Deep self-training-based unsupervised domain adaptation (UDA) semantic segmentation methods learn from labeled source domain images and unlabeled target domain images, performing more stably than those based on adversarial training. We propose a self-training-based image–text multimodal unsupervised domain adaptation semantic segmentation model (SIT-UDA) for remote sensing images. Unlike UDA methods that rely solely on images, SIT-UDA enhances generalization performance by integrating category hint information from textual descriptions with image data to segment images. SIT-UDA employs a teacher–student self-training framework consisting of two components: the teacher multimodal segmentation model, which predicts pseudo-labels for target domain images, and the student multimodal segmentation model, which is trained to learn feature representations from both the source and target domains with guidance from the teacher model. To enhance the adaptability of image–text pretrained models in remote sensing domains, SIT-UDA introduces text prompt tuning to optimize the text features in the student model, and two learning strategies are proposed to optimize the model's training objectives. One is the entropy-guided pixel-level weighting (EGPW) strategy, which adaptively weights the loss obtained by self-training on target domain images, leveraging the pseudo-labels rationally according to the entropy value at the pixel level. The other is the contrastive text constraint (CTC) strategy, which maximizes the similarity of text features for the same category between teacher and student models while minimizing the similarity of text features across different categories, improving text feature discriminability to promote cross-domain image–text alignment.
Experiments in various domain adaptation scenarios among three remote sensing datasets (Potsdam, Vaihingen and LoveDA) demonstrate that the SIT-UDA is superior to the comparative domain adaptation semantic segmentation methods in terms of qualitative and quantitative segmentation results.
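The EGPW idea, down-weighting high-entropy pseudo-labeled pixels, can be sketched per pixel. Normalizing by log C so the weight lies in [0, 1] is an assumption; the paper's exact weighting function is not given in the abstract.

```python
import math

def egpw_weight(probs):
    """Per-pixel loss weight in [0, 1]; confident (low-entropy) pixels weigh more."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(len(probs))

confident = egpw_weight([0.9, 0.05, 0.05])
uncertain = egpw_weight([1 / 3, 1 / 3, 1 / 3])
```

Multiplying the self-training loss by this weight lets reliable pseudo-labels dominate training while noisy ones are suppressed rather than discarded.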
Unsupervised domain adaptation (UDA) is crucial for remote sensing (RS) scene classification under pervasive domain shifts. Many existing UDA methods primarily emphasize interdomain transferability via global alignment, including adversarial learning, yet underexploit intradomain structure, which can degrade discriminability. This letter proposes graph structural moment alignment (GSMA), a structure-regularized UDA framework that complements adversarial adaptation with graph-based structural constraints. GSMA introduces high-order spectral moment alignment (HOSMA) to align source and target feature graphs constructed from random-walk transition matrices via trace-based, decomposition-free moment statistics, encouraging the preservation of high-order local connectivity patterns. Moreover, GSMA integrates neighborhood consensus learning (NCL) with a memory bank to refine target pseudo-labels through neighborhood agreement and local consistency regularization. Experiments on cross-domain RS benchmarks demonstrate consistent improvements over representative baselines, validating the effectiveness of GSMA for robust RS scene adaptation.
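The decomposition-free moment statistics can be sketched as traces of powers of the random-walk transition matrix, since tr(P^k) equals the sum of the k-th powers of P's eigenvalues without any eigendecomposition. How the transition matrices are built from feature graphs is not reproduced; small plain-Python matrices stand in for batches of features.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def spectral_moments(P, K=3):
    """m_k = tr(P^k) for k = 1..K: decomposition-free spectral statistics."""
    moments, Pk = [], P
    for _ in range(K):
        moments.append(trace(Pk))
        Pk = matmul(Pk, P)
    return moments

def hosma_loss(Ps, Pt, K=3):
    """Match the moment sequences of source and target transition matrices."""
    return sum((a - b) ** 2 for a, b in zip(spectral_moments(Ps, K),
                                            spectral_moments(Pt, K)))

Ps = [[0.5, 0.5], [0.5, 0.5]]  # toy row-stochastic transition matrices
Pt = [[0.9, 0.1], [0.1, 0.9]]
```

Matching these moments encourages the two feature graphs to share high-order connectivity patterns, which is the structural constraint GSMA adds on top of adversarial alignment.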
In unsupervised domain adaptation for semantic segmentation of remote sensing imagery, identical land-cover classes across different domains often exhibit substantial variations in appearance, scale, and class distribution, which seriously hinder cross-domain generalization. Moreover, even within a single domain, land-cover classes present highly complex and diverse intraclass distributions that cannot be effectively captured by a single class representation, further increasing the challenge of generalization. To this end, we propose a collaborative framework integrating local pixel-level contrast and global Gaussian multiprototype bidirectional alignment. At the local level, we introduce probability-masked contrastive learning, which adaptively increases the sampling probability of minority classes to mitigate the class imbalance issue. Meanwhile, pixel contrastive learning is incorporated to enhance the robustness to cross-domain variations in appearance and texture. At the global level, we employ a Gaussian mixture model to represent each source-domain class with multiple Gaussian prototypes rather than a single one, thereby yielding richer and more fine-grained class representations. Building on this, a bidirectional alignment strategy is proposed. Concretely, the forward alignment uses the multiple prototypes as semantic anchors that progressively guide target-domain features to align with the source-domain class distributions, reducing the intraclass variance. Meanwhile, the reverse alignment dynamically refines the prototypes to increase anchor accuracy, further enhancing the stability and discriminability of cross-domain alignment. Experimental results on the widely used ISPRS and LoveDA datasets demonstrate the superiority of our proposed method over state-of-the-art approaches.
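The forward-alignment step can be sketched with fixed prototypes: each target feature is pulled toward the nearest of its class's several prototype means. GMM fitting and the reverse prototype refinement are omitted, and Euclidean nearest-mean assignment with equal covariances is an assumed simplification of the Gaussian assignment.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_prototype(feat, prototypes):
    """Closest of a class's several prototype means for one feature vector."""
    return min(prototypes, key=lambda m: sq_dist(feat, m))

def forward_align_loss(target_feats, prototypes):
    """Pull each target feature toward its nearest class prototype."""
    return sum(sq_dist(f, nearest_prototype(f, prototypes))
               for f in target_feats) / len(target_feats)

prototypes = [[0.0, 0.0], [10.0, 10.0]]  # two toy prototype means for one class
feats = [[0.1, 0.0], [9.9, 10.0]]        # toy target-domain features
loss = forward_align_loss(feats, prototypes)
```

Using several prototypes per class lets a multimodal intraclass distribution be matched mode by mode instead of being collapsed onto a single centroid.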
The final grouping builds a complete body of knowledge spanning foundational theory to frontier applications. The research thread clearly shows how landslide identification has evolved from traditional susceptibility assessment to deep-learning semantic segmentation, ultimately focusing on unsupervised domain adaptation (UDA) to tackle the cross-region generalization problem. Core trends include: 1) at the algorithm level, a shift from simple feature alignment toward self-training, pseudo-label refinement, and source-free domain adaptation; 2) at the architecture level, a move from lightweight CNNs toward Transformers and vision foundation models (VFMs); 3) at the data level, a transition from single-modality optical imagery toward multimodal fusion combining DEMs, multi-temporal data, and graph structures. Together, these studies push landslide monitoring toward greater robustness, automation, and cross-region transferability.