Medical Image Segmentation
Medical Image Segmentation Based on Transformers, Mamba, and Hybrid Architectures
This group of papers explores architectural fusion of long-range models such as the Vision Transformer and Mamba with CNNs, aiming to balance local detail features against global contextual information and thereby address the segmentation challenges posed by complex structures in medical images; a minimal PyTorch sketch of this local-global fusion pattern follows the list below.
- H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation(Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang, 2024, Neurocomputing)
- nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer(Hong-Yu Zhou, J. Guo, Yinghao Zhang, Lequan Yu, Liansheng Wang, Yizhou Yu, 2021, IEEE Transactions on Image Processing)
- TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation(Yundong Zhang, Huiye Liu, Qiang Hu, 2021, Lecture Notes in Computer Science)
- CFM-UNet: coupling local and global feature extraction networks for medical image segmentation.(Ke Niu, Jiacheng Han, Jiuyun Cai, 2025, Scientific reports)
- SCFMUNet: A fusion architecture based on multi-scale state space model and channel attention for medical image segmentation.(Zhiyong Huang, Zhiyu Zhao, Zhi Yu, Mingyang Hou, Shiyao Zhou, Jiahong Wang, Yan Yan, Yushi Liu, Hans Gregersen, 2025, Neural networks : the official journal of the International Neural Network Society)
- Switch-UMamba: Dynamic scanning vision Mamba UNet for medical image segmentation(Ziyao Zhang, Qiankun Ma, Tong Zhang, Jie Chen, Hairong Zheng, Wen Gao, 2025, Medical Image Analysis)
- U-RWKV: Accurate and Efficient Volumetric Medical Image Segmentation via RWKV(Hongyu Cai, Yifan Wang, Liu Wang, Jian Zhao, Zhejun Kuang, 2026, IEEE Transactions on Image Processing)
- EMCAH-Net: an effective multi-scale context aggregation hybrid network for medical image segmentation.(Yu Jin, Rui Tian, Qian Yu, Yu Bai, Guoqing Chao, Danqing Liu, Yanhui Guo, 2025, Quantitative imaging in medicine and surgery)
- GH-UNet: group-wise hybrid convolution-ViT for robust medical image segmentation(Shengxiang Wang, Ge Li, Ming Gao, Linlin Zhuo, Mingzhe Liu, Zhizhong Ma, Wei Zhao, Xiangzheng Fu, 2025, npj Digital Medicine)
- Rethinking Boundary Detection in Deep Learning Models for Medical Image Segmentation(Yi Lin, Dong Zhang, Xiaori Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen, 2023, Information Processing in Medical Imaging)
- Atten-Nonlocal Unet: Attention and Non-local Unet for medical image segmentation(Xiaofen Jia, Wenji Wang, Mei Zhang, Baiting Zhao, 2025, Computers in Biology and Medicine)
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation(Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, Lei Zhu, 2024, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- HResFormer: Hybrid Residual Transformer for Volumetric Medical Image Segmentation(Sucheng Ren, Xiaomeng Li, 2024, IEEE Transactions on Neural Networks and Learning Systems)
- GSAC-UFormer: Groupwise Self-Attention Convolutional Transformer-Based UNet for Medical Image Segmentation(Anass Garbaz, Yassine Oukdach, Said Charfi, Mohamed El Ansari, Lahcen Koutti, Mouna Salihoun, 2025, Cognitive Computation)
- UNETR: Transformers for 3D Medical Image Segmentation(Ali Hatamizadeh, Dong Yang, H. Roth, Daguang Xu, 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
- SWTRU: Star-shaped Window Transformer Reinforced U-Net for medical image segmentation.(Jianyi Zhang, Yong Liu, Qihang Wu, Yongpan Wang, Yuhai Liu, Xianchong Xu, Bo Song, 2022, Computers in biology and medicine)
- Multi-Scale Dynamic Sparse Attention UNet for Medical Image Segmentation.(Xiang Li, Chong Fu, Qun Wang, Wenchao Zhang, Chen Ye, Junxin Chen, Chiu-Wing Sham, 2025, IEEE journal of biomedical and health informatics)
- Boundary-enhanced sparse transformer for generalizable and accurate medical image segmentation(Chaofan Li, Qiong Liu, Jianxiang Song, 2025, Scientific Reports)
- TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers(Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, M. Lungren, Shaoting Zhang, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou, 2024, Medical Image Analysis)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation(Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia, 2021, Lecture Notes in Computer Science)
- Enhancing medical image segmentation with a multi-transformer U-Net.(Yongping Dan, Weishou Jin, Xuebin Yue, Zhida Wang, 2024, PeerJ)
- DA-TransUNet: integrating spatial and channel dual attention with transformer U-net for medical image segmentation.(Guanqun Sun, Yizhi Pan, Weikun Kong, Zichang Xu, Jianhua Ma, Teeradaj Racharak, Le-Minh Nguyen, Junyi Xin, 2024, Frontiers in bioengineering and biotechnology)
- BPAT-UNet: Boundary preserving assembled transformer UNet for ultrasound thyroid nodule segmentation.(Hui Bi, Chengjie Cai, Jiawei Sun, Yibo Jiang, Gang Lu, Huazhong Shu, Xinye Ni, 2023, Computer methods and programs in biomedicine)
- X-UNet: A novel global context-aware collaborative fusion U-shaped network with progressive feature fusion of codec for medical image segmentation(Shijie Xu, Yufeng Chen, Xiaoqian Zhang, Feng Sun, Siyu Chen, Yanchi Ou, Chao Luo, 2025, Neural Networks)
- VM-UNet: Vision Mamba UNet for Medical Image Segmentation(Jiacheng Ruan, Jincheng Li, Suncheng Xiang, 2024, ACM Transactions on Multimedia Computing, Communications, and Applications)
- UNet with self-adaptive Mamba-like attention and causal-resonance learning for medical image segmentation(Saqib Qamar, Mohd Fazil, Parvez Ahmad, Shakir Khan, Abu Taha Zamani, 2025, Scientific Reports)
- Enhancing Medical Image Segmentation with Mamba and UNet++(Ahmed AL Qurri, M. Almekkawy, 2025, 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI))
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation(Jeya Maria Jose Valanarasu, Poojan Oza, I. Hacihaliloglu, Vishal M. Patel, 2021, Lecture Notes in Computer Science)
- H2MaT-Unet: Hierarchical hybrid multi-axis transformer based Unet for medical image segmentation.(ZhiYong Ju, ZhongChen Zhou, ZiXiang Qi, Cheng Yi, 2024, Computers in biology and medicine)
- CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation(Xiao Liu, Peng Gao, Tao Yu, Fei Wang, Ruyue Yuan, 2024, Information Fusion)
- PSwinUNet: Bridging Local and Global Contexts for Accurate Medical Image Segmentation with Semi-Supervised Learning(Zhixuan Zhao, Bailin Liu, Hongpei Zhang, Chentao Qian, Yijian Zhang, 2025, International Journal of Advanced Network, Monitoring and Controls)
- DAUNet: A deformable aggregation UNet for multi-organ 3D medical image segmentation(Qinghao Liu, Min Liu, Yuehao Zhu, Licheng Liu, Zhe Zhang, Yaonan Wang, 2025, Pattern Recognition Letters)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation(Hu Cao, Yueyue Wang, Jieneng Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang, 2021, Lecture Notes in Computer Science)
- AFC-Unet: Attention-fused full-scale CNN-transformer unet for medical image segmentation(Wenjie Meng, Shujun Liu, Huajun Wang, 2025, Biomedical Signal Processing and Control)
- TransCUNet: UNet cross fused transformer for medical image segmentation.(Shen Jiang, Jinjiang Li, 2022, Computers in biology and medicine)
- Rolling-Unet: Revitalizing MLP's Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation(Yutong Liu, Haijiang Zhu, Mengting Liu, Huaiyuan Yu, Zihan Chen, Jie Gao, 2024, Proceedings of the AAAI Conference on Artificial Intelligence)
- U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation(Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan, 2024, AAAI Conference on Artificial Intelligence)
- KM-UNet: medical image segmentation with selective-scan Mamba and Kolmogorov-Arnold networks(Yibo Zhang, Jingwen Zhao, Xiang Liu, Xian Tang, Yunyu Shi, Lina Wei, Guyue Zhang, 2026, PeerJ Computer Science)
- ScribFormer: Transformer Makes CNN Work Better for Scribble-Based Medical Image Segmentation(Zihan Li, Yuan Zheng, Dandan Shan, Shuzhou Yang, Qingde Li, Beizhang Wang, Yuan-ting Zhang, Qingqi Hong, Dinggang Shen, 2024, IEEE Transactions on Medical Imaging)
- Nuclei instance segmentation using a transformer-based graph convolutional network and contextual information augmentation.(Juan Wang, Zetao Zhang, Minghu Wu, Yonggang Ye, Sheng Wang, Ye Cao, Hao Yang, 2023, Computers in biology and medicine)
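To make the shared design pattern concrete, here is a minimal PyTorch sketch of one local-global fusion block in the spirit of the CNN-Transformer hybrids above: a convolutional branch preserves local detail, a self-attention branch models global context, and a 1x1 convolution fuses them. The module names, channel sizes, and fusion strategy are illustrative assumptions, not the architecture of any specific paper in this list.

```python
# A hedged sketch of local-global feature fusion (cf. TransFuse/TransUNet-style
# hybrids). Everything here is illustrative: real models use patch embedding,
# multi-scale stages, and more elaborate fusion schemes.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: plain convolutions capture fine-grained texture.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: self-attention over all spatial positions.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branches and project back down.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, global_feat], dim=1))


if __name__ == "__main__":
    block = LocalGlobalFusion(channels=64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```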
Pretrained Foundation Models, Prompt Learning, and Interactive Segmentation
This group focuses on adapting general-purpose pretrained models (such as SAM) or designing interactive segmentation frameworks, using prompt learning to achieve rapid adaptation to medical imaging tasks and efficient annotation; a parameter-efficient adapter sketch follows the list below.
- MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation(Cheng Chen, Juzheng Miao, Dufan Wu, Aoxiao Zhong, Zhiling Yan, Sekeun Kim, Jiang Hu, Zheng Liu, Lichao Sun, Xiang Li, Tianming Liu, Pheng-Ann Heng, Quanzheng Li, 2024, Medical Image Analysis)
- Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation(Junde Wu, Rao Fu, Huihui Fang, Yuanpei Liu, Zhao-Yang Wang, Yanwu Xu, Yueming Jin, T. Arbel, 2023, Medical Image Analysis)
- SegVol: Universal and Interactive Volumetric Medical Image Segmentation(Yuxin Du, Fan Bai, Tiejun Huang, Bo Zhao, 2023, Neural Information Processing Systems)
- SAM2-UNet: segment anything 2 makes strong encoder for natural and medical image segmentation(Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li, 2024, Visual Intelligence)
- Segment Anything Model for Medical Image Analysis: an Experimental Study(Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang, 2023, ArXiv Preprint)
- Volumetric memory network for interactive medical image segmentation(Tianfei Zhou, Liulei Li, G. Bredell, Jianwu Li, E. Konukoglu, 2022, Medical Image Analysis)
- On-the-Fly Improving Segment Anything for Medical Image Segmentation Using Auxiliary Online Learning.(Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang, 2025, IEEE transactions on medical imaging)
- SAM-Med3D: Towards General-Purpose Segmentation Models for Volumetric Medical Images(Haoyu Wang, Sizheng Guo, Jin Ye, Zhongyi Deng, Junlong Cheng, Tian-Xin Li, Jianpin Chen, Yan-Cheng Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao, 2023, Lecture Notes in Computer Science)
- TsSAM-CA: A Two Stage SAM Integrating CNN Augmented ViT for Multi-Organ Medical Image Segmentation(Lei Fei, Shuai Wang, 2025, 2025 6th International Conference on Machine Learning and Computer Application (ICMLCA))
- Quality-Aware Memory Network for Interactive Volumetric Image Segmentation(Tianfei Zhou, Liulei Li, G. Bredell, Jianwu Li, E. Konukoglu, 2021, Lecture Notes in Computer Science)
- SAM-Med3D: A Vision Foundation Model for General-Purpose Segmentation on Volumetric Medical Images(Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tian-Xin Li, Jianpin Chen, Yan-Cheng Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, 2025, IEEE Transactions on Neural Networks and Learning Systems)
- Weakly Supervised Lesion Detection via Promptable Segmentation Models with Uncertainty-Aware Refinement(Nancy Taylor, 2025, Computer Science Bulletin)
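As a concrete illustration of the parameter-efficient adaptation idea behind entries such as Medical SAM Adapter and MA-SAM, the sketch below freezes a pretrained transformer block and trains only a small bottleneck adapter. It deliberately avoids the real segment-anything codebase; `AdaptedBlock`, `Adapter`, and all dimensions are hypothetical stand-ins.

```python
# A hedged sketch of adapter-style fine-tuning: pretrained weights stay frozen,
# and only tiny bottleneck adapters receive gradients. The TransformerEncoderLayer
# below merely stands in for one block of a pretrained (e.g., SAM) encoder.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck MLP added after a frozen layer; only this part trains."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as identity: residual is zero
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    def __init__(self, frozen_block: nn.Module, dim: int):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():
            p.requires_grad = False     # keep pretrained weights fixed
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


if __name__ == "__main__":
    pretrained = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    adapted = AdaptedBlock(pretrained, dim=256)
    out = adapted(torch.randn(1, 196, 256))  # 14x14 patch tokens
    trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
    print(out.shape, trainable)              # only the adapter's parameters train
```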
Semi-Supervised, Self-Supervised, and Domain-Adaptive Learning
To address the scarcity of annotated data in the medical domain, these studies improve model robustness in few-shot or unlabeled settings through self-supervised pretraining, pseudo-label generation, contrastive learning, and cross-modality domain adaptation; a mean-teacher consistency sketch follows the list below.
- Semi-Supervised Medical Image Segmentation with Multimodal Data Augmentation and Self-Supervised Learning(Duan Dechang, 2024, 2024 21st International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP))
- MPS-AMS: Masked Patches Selection and Adaptive Masking Strategy Based Self-Supervised Medical Image Segmentation(Xiang-Fei Wang, Ruizhi Wang, Biao Tian, Jiaojiao Zhang, Shuo Zhang, Junyang Chen, Thomas Lukasiewicz, Zhenghua Xu, 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Boundary-aware Information Maximization for Self-supervised Medical Image Segmentation(Jizong Peng, Ping Wang, Marco Pedersoli, Christian Desrosiers, 2022, Medical Image Analysis)
- Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts(Hong-Yu Zhou, Chi-Ken Lu, Sibei Yang, Xiaoguang Han, Yizhou Yu, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))
- DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis(F. Haghighi, M. Taher, M. Gotway, Jianming Liang, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Volume Fusion-Based Self-Supervised Pretraining for 3D Medical Image Segmentation(Guotai Wang, Jia Fu, Jianghao Wu, Xiangde Luo, Yubo Zhou, Xinglong Liu, Kang Li, Jingsheng Lin, Baiyong Shen, Shaoting Zhang, 2025, IEEE Transactions on Image Processing)
- Multi-ConDoS: Multimodal Contrastive Domain Sharing Generative Adversarial Networks for Self-Supervised Medical Image Segmentation(Jiaojiao Zhang, Shuo Zhang, Xiaoqian Shen, Thomas Lukasiewicz, Zhenghua Xu, 2023, IEEE Transactions on Medical Imaging)
- Curriculum Learning for Self-Iterative Semi-Supervised Medical Image Segmentation(Jiayu Zhang, Dexuan Xu, Yanyuan Chen, Yiwei Lou, Yue Huang, 2024, 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation(Yunhao Bai, Duowen Chen, Qingli Li, Wei-lei Shen, Yan Wang, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation(Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaïane, 2024, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- Self-Supervised Medical Image Segmentation Using Deep Reinforced Adaptive Masking(Zhenghua Xu, Yunxin Liu, Gang Xu, Thomas Lukasiewicz, 2024, IEEE Transactions on Medical Imaging)
- Learning anatomy from unlabelled CT volumes: A self-supervised framework for improving prostate radiotherapy segmentation.(Diyana Afrina Hizam, Ngie Min Ung, Marniza Saad, Firdaus Mohd Salleh, Asyraf Muaadz, Li Kuo Tan, 2026, Medical physics)
- Deblurring masked image modeling for ultrasound image analysis.(Qingbo Kang, Qicheng Lao, Jun Gao, Jingyan Liu, Huahui Yi, Buyun Ma, Xiaofan Zhang, Kang Li, 2024, Medical image analysis)
- A Topological Loss Function for Deep-Learning Based Image Segmentation Using Persistent Homology.(James R Clough, Nicholas Byrne, Ilkay Oksuz, Veronika A Zimmer, Julia A Schnabel, Andrew P King, 2022, IEEE transactions on pattern analysis and machine intelligence)
- ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations(Chinmay Prabhakar, Hongwei Li, Jiancheng Yang, Suprosana Shit, Benedikt Wiestler, Bjoern H Menze, 2023, International Conference on Medical Imaging with Deep Learning)
- Meta-Learning Approach with Attention U-Net for Skin Lesion Segmentation(Jenulin Makros G, P. B. Princess, 2025, 2025 International Conference on Sustainable Communication Networks and Application (ICSCN))
- Correspondence-based Generative Bayesian Deep Learning for semi-supervised volumetric medical image segmentation.(Yuzhou Zhao, Xinyu Zhou, Tongxin Pan, Shuyong Gao, Wenqiang Zhang, 2024, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- An image registration-based self-supervised Su-Net for carotid plaque ultrasound image segmentation.(Jing Ding, Ran Zhou, Xiaoyue Fang, Furong Wang, Ji Wang, Haitao Gan, Aaron Fenster, 2024, Computer methods and programs in biomedicine)
- Semi-Supervised Deep Learning Semantic Segmentation for 3D Volumetric Computed Tomographic Scoring of Chronic Rhinosinusitis: Clinical Correlations and Comparison with Lund-Mackay Scoring.(Chung-Feng Jeffrey Kuo, Yu-Shu Liao, Jagadish Barman, Shao-Cheng Liu, 2022, Tomography (Ann Arbor, Mich.))
- Domain Adaptive Nuclei Instance Segmentation and Classification via Category-aware Feature Alignment and Pseudo-labelling(Canran Li, Dongnan Liu, Haoran Li, Zheng Zhang, Guangming Lu, Xiaojun Chang, Weidong (Tom) Cai, 2022, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- Poisson-based image editing for semi-supervised vitiligo lesion segmentation with limited annotations.(Jiacong Wang, Xiaolan Ding, Jun Xiao, 2023, Computers in biology and medicine)
- Contrastive Learning vs. Self-Learning vs. Deformable Data Augmentation in Semantic Segmentation of Medical Images.(Hossein Arabi, Habib Zaidi, 2024, Journal of imaging informatics in medicine)
- Segmentation of Portal Vein in Multiphase CTA Image Based on Unsupervised Domain Transfer and Pseudo Label.(Genshen Song, Ziyue Xie, Haoran Wang, Shiman Li, Demin Yao, Shiyao Chen, Yonghong Shi, 2023, Diagnostics (Basel, Switzerland))
- Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoders(Jiafan Zhuang, Luyang Luo, Hao Chen, 2023, IEEE Transactions on Medical Imaging)
- MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning(Kyeonghun Kim, Hye-Won Jung, Y. Han, Junsu Lim, Yeonju Jean, Seongbin Park, E. Choi, Hyunsu Go, Seoyoung Ju, Seohyoung Park, Gyeongmin Kim, Min-Jin Kwon, Kyungseok Yuh, Soo Yong Kim, K. Liao, N. Kim, Hyuk-Jae Lee, 2026, 2026 International Conference on Electronics, Information, and Communication (ICEIC))
- Histogram of Oriented Gradients meet deep learning: A novel multi-task deep network for 2D surgical image semantic segmentation.(Binod Bhattarai, Ronast Subedi, Rebati Raman Gaire, Eduard Vazquez, Danail Stoyanov, 2023, Medical image analysis)
- Efficient few-shot medical image segmentation via self-supervised variational autoencoder(Yanjie Zhou, Feng Zhou, Fengjun Xi, Yong Liu, Yun Peng, David E. Carlson, Liyun Tu, 2025, Medical Image Analysis)
- DINOv2 Based Self Supervised Learning for Few Shot Medical Image Segmentation(Lev Ayzenberg, Raja Giryes, H. Greenspan, 2024, 2024 IEEE International Symposium on Biomedical Imaging (ISBI))
- FedATA: Adaptive attention aggregation for federated self-supervised medical image segmentation(Jian Dai, Hao Wu, Huan Liu, Liheng Yu, Xing Hu, Xiao Liu, Daoying Geng, 2024, Neurocomputing)
- AMLP: Adjustable Masking Lesion Patches for Self-Supervised Medical Image Segmentation.(Xiangtao Wang, Ruizhi Wang, Thomas Lukasiewicz, Zhenghua Xu, 2025, IEEE transactions on medical imaging)
- SymMatch: Symmetric Bi-Scale Matching with Self-Knowledge Distillation in Semi-Supervised Medical Image Segmentation(Chunshi Wang, Shougan Teng, Shaohua Sun, Bin Zhao, 2024, 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis(Jiaxin Zhuang, Linshan Wu, Qiong Wang, Peng Fei, V. Vardhanabhuti, Lin Luo, Hao Chen, 2024, IEEE Transactions on Medical Imaging)
- Adaptive-Masking Policy with Deep Reinforcement Learning for Self-Supervised Medical Image Segmentation(Gang Xu, Shengxin Wang, Thomas Lukasiewicz, Zhenghua Xu, 2023, 2023 IEEE International Conference on Multimedia and Expo (ICME))
- Mutual learning with reliable pseudo label for semi-supervised medical image segmentation(Jiawei Su, Zhiming Luo, Sheng Lian, Dazhen Lin, Shaozi Li, 2024, Medical Image Analysis)
- StAC-DA: Structure aware cross-modality domain adaptation framework with image and feature-level adaptation for medical image segmentation.(Maria Baldeon-Calisto, Susana K Lai-Yuen, Bernardo Puente-Mejia, 2024, Digital health)
- DistillMatch: Revisiting Self-Knowledge Distillation in Semi-Supervised Medical Image Segmentation(Chunshi Wang, Bin Zhao, Zhiyang Liu, 2024, 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation(Hyungseob Shin, Hyeongyu Kim, Sewon Kim, Yohan Jun, Taejoon Eo, D. Hwang, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Self-supervised learning for medical image analysis: Discriminative, restorative, or adversarial?(F. Haghighi, M. Taher, M. B. Gotway, Jianming Liang, 2024, Medical Image Analysis)
- Semi-supervised Medical Image Segmentation through Dual-task Consistency(Xiangde Luo, Jieneng Chen, Tao Song, Yinan Chen, Guotai Wang, Shaoting Zhang, 2020, Proceedings of the AAAI Conference on Artificial Intelligence)
- Image-Level Uncertainty in Pseudo-Label Selection for Semi-Supervised Segmentation.(Payden McBee, Fatima Zulqarnain, Sana Syed, Donald E Brown, 2022, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference)
- Dual Cross-Image Semantic Consistency With Self-Aware Pseudo Labeling for Semi-Supervised Medical Image Segmentation(Han Wu, Chong Wang, Zhiming Cui, 2025, IEEE Transactions on Medical Imaging)
- Learning From AI-Generated Annotations for Medical Image Segmentation(Youyi Song, Yuanlin Liu, Zhizhe Lin, Jinglin Zhou, Duo Li, Teng Zhou, Man-Fai Leung, 2025, IEEE Transactions on Consumer Electronics)
- Dynamic graph consistency and self-contrast learning for semi-supervised medical image segmentation(Gang Li, Jinjie Xie, Ling Zhang, Guijuan Cheng, Kairu Zhang, Mingqi Bai, 2024, Neural Networks)
- Semi-Mamba-UNet: Pixel-level contrastive and cross-supervised visual Mamba-based UNet for semi-supervised medical image segmentation(Chao Ma, Ziyang Wang, 2024, Knowledge-Based Systems)
- Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation(Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu, 2023, ArXiv Preprint)
- Stitching, Fine-Tuning, and Re-Training: A SAM-Enabled Framework for Semi-Supervised 3D Medical Image Segmentation.(Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao, 2025, IEEE transactions on medical imaging)
- Self-Supervised Learning for Few-Shot Medical Image Segmentation.(Cheng Ouyang, Carlo Biffi, Chen Chen, Turkay Kart, Huaqi Qiu, Daniel Rueckert, 2022, IEEE transactions on medical imaging)
- Linear semantic transformation for semi-supervised medical image segmentation.(Cheng Chen, Yunqing Chen, Xiaoheng Li, Huansheng Ning, Ruoxiu Xiao, 2024, Computers in biology and medicine)
- Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes(Yu Cai, Hao Chen, Xin Yang, Yu Zhou, Kwang-Ting Cheng, 2024, Medical Image Analysis)
- MMGL: Multi-Scale Multi-View Global-Local Contrastive learning for Semi-supervised Cardiac Image Segmentation(Ziyuan Zhao, Jinxuan Hu, Zeng Zeng, Xulei Yang, Peisheng Qian, Bharadwaj Veeravalli, Cuntai Guan, 2022, ArXiv Preprint)
- Self-supervised 3D medical image segmentation by flow-guided mask propagation learning(Adeleh Bitarafan, Mohammad Mozafari, Mohammad Farid Azampour, Mahdieh Soleymani Baghshah, Nassir Navab, Azade Farshad, 2025, Medical Image Analysis)
- DistAL: A Domain-Shift Active Learning Framework With Transferable Feature Learning for Lesion Detection(Fan Bai, Ran Wei, Xiaoyu Bai, Dakai Jin, X. Ye, Le Lu, Ke Yan, Max Q.-H. Meng, 2025, IEEE Transactions on Medical Imaging)
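A recurring ingredient across the semi-supervised entries above is consistency training against a teacher model. The sketch below shows one minimal variant under stated assumptions: a mean-teacher setup in which the teacher is an exponential moving average (EMA) of the student, labeled images get a supervised loss, and unlabeled images get a consistency loss against perturbed teacher predictions. The toy one-layer "UNet", the noise scale, and the 0.1 consistency weight are all placeholders.

```python
# A hedged sketch of mean-teacher consistency training for semi-supervised
# segmentation. Models, perturbations, and loss weights are illustrative.
import copy
import torch
import torch.nn.functional as F
from torch import nn


def ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.99):
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)


student = nn.Conv2d(1, 2, 3, padding=1)          # stand-in for a UNet
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x_lab = torch.randn(2, 1, 64, 64)                # labeled batch
y_lab = torch.randint(0, 2, (2, 64, 64))
x_unl = torch.randn(2, 1, 64, 64)                # unlabeled batch

for step in range(3):
    sup = F.cross_entropy(student(x_lab), y_lab)
    with torch.no_grad():                        # teacher sees a perturbed view
        pseudo = teacher(x_unl + 0.05 * torch.randn_like(x_unl)).softmax(1)
    cons = F.mse_loss(student(x_unl).softmax(1), pseudo)
    loss = sup + 0.1 * cons                      # consistency weight: hyperparameter
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)                 # teacher tracks the student
```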
Boundary Refinement, Lightweight Networks, and Clinical Deployment
This group addresses efficiency constraints in real-world deployment and the blurred-boundary problem in specific clinical tasks, improving computer-aided diagnosis through network pruning, multi-kernel lightweight designs, boundary-aware branches, and customized multimodal schemes; a boundary-aware loss sketch follows the list below.
- Dual consistency loss for contour-aware segmentation in medical images.(Helena R Torres, Bruno Oliveira, Jaime C Fonseca, Pedro Morais, Joao L Vilaca, 2023, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference)
- CIA-Net: Robust Nuclei Instance Segmentation with Contour-aware Information Aggregation(Yanning Zhou, O. Onder, Q. Dou, E. Tsougenis, Hao Chen, P. Heng, 2019, Lecture Notes in Computer Science)
- Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation.(Balamurali Murugesan, Kaushik Sarveswaran, Sharath M Shankaranarayana, Keerthi Ram, Jayaraj Joseph, Mohanasankar Sivaprakasam, 2019, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference)
- EG-Unet: Edge-Guided cascaded networks for automated frontal brain segmentation in MR images.(Xiufeng Zhang, Yansong Liu, Shengjin Guo, Zhao Song, 2023, Computers in biology and medicine)
- Morphology and Texture-Guided Deep Neural Network for Intracranial Aneurysm Segmentation in 3D TOF-MRA.(Maysam Orouskhani, Negar Firoozeh, Huayu Wang, Yan Wang, Hanrui Shi, Weijing Li, Beibei Sun, Jianjian Zhang, Xiao Li, Huilin Zhao, Mahmud Mossa-Basha, Jenq-Neng Hwang, Chengcheng Zhu, 2024, Neuroinformatics)
- An ultrasound image segmentation method for thyroid nodules based on dual-path attention mechanism-enhanced UNet+.(Peizhen Dong, Ronghua Zhang, Jun Li, Changzheng Liu, Wen Liu, Jiale Hu, Yongqiang Yang, Xiang Li, 2024, BMC medical imaging)
- Diff-UNet: A diffusion embedded network for robust 3D medical image segmentation(Zhaohu Xing, Liang Wan, Huazhu Fu, Guang Yang, Yijun Yang, Lequan Yu, Baiying Lei, Lei Zhu, 2025, Medical Image Analysis)
- HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation(Tao Chen, Chenhui Wang, Zhihao Chen, Yiming Lei, Hongming Shan, 2024, ArXiv Preprint)
- V-UNet: Medical Image Segmentation Based on Variational Attention Mechanism(Yang Zhang, Qiang Yang, Tian Li, Fanghong Zhang, Yu Ren, Yinhao Li, Chuanyun Xu, 2025, CAAI Transactions on Intelligence Technology)
- Dual-channel compression mapping network with fused attention mechanism for medical image segmentation.(Xiaokang Ding, Ke'er Qian, Qile Zhang, Xiaoliang Jiang, Ling Dong, 2025, Scientific reports)
- Fast Segmentation of Vertebrae CT Image Based on the SNIC Algorithm.(Bing Li, Shaoyong Wu, Siqin Zhang, Xia Liu, Guangqing Li, 2022, Tomography (Ann Arbor, Mich.))
- Deep-learning-based semantic segmentation of autonomic nerves from laparoscopic images of colorectal surgery: an experimental pilot study.(Shigehiro Kojima, Daichi Kitaguchi, Takahiro Igaki, Kei Nakajima, Yuto Ishikawa, Yuriko Harai, Atsushi Yamada, Younae Lee, Kazuyuki Hayashi, Norihito Kosugi, Hiro Hasegawa, Masaaki Ito, 2023, International journal of surgery (London, England))
- Auto-segmentation of neck nodal metastases using self-distilled masked image transformer on longitudinal MR images.(Ramesh Paudyal, Jue Jiang, James Han, Bill H Diplas, Nadeem Riaz, Vaios Hatzoglou, Nancy Lee, Joseph O Deasy, Harini Veeraraghavan, Amita Shukla-Dave, 2024, BJR artificial intelligence)
- Deep Neural Network Pruning for Nuclei Instance Segmentation in Hematoxylin & Eosin-Stained Histological Images(A. Mahbod, R. Entezari, I. Ellinger, O. Saukh, 2022)
- UNeXt: MLP-based Rapid Medical Image Segmentation Network(Jeya Maria Jose Valanarasu, Vishal M. Patel, 2022, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- MIScnn: a framework for medical image segmentation with convolutional neural networks and deep learning.(Dominik Müller, Frank Kramer, 2021, BMC medical imaging)
- Automated instance segmentation and registration of spinal vertebrae from CT-Scans with an improved 3D U-net neural network and corner point registration.(James Hill, Muhammad Rizwan Khokher, Chuong Nguyen, Matt Adcock, Rongxin Li, Stuart Anderson, Thomas Morrell, Tim Diprose, Olivier Salvado, Dadong Wang, Guan K Tay, 2025, Computers in biology and medicine)
- Development and Validation of a Model for Laparoscopic Colorectal Surgical Instrument Recognition Using Convolutional Neural Network-Based Instance Segmentation and Videos of Laparoscopic Procedures.(Daichi Kitaguchi, Younae Lee, Kazuyuki Hayashi, Kei Nakajima, Shigehiro Kojima, Hiro Hasegawa, Nobuyoshi Takeshita, Kensaku Mori, Masaaki Ito, 2022, JAMA network open)
- BreasTDLUSeg: A coarse-to-fine framework for segmentation of breast terminal duct lobular units on histopathological whole-slide images.(Zixiao Lu, Kai Tang, Yi Wu, Xiaoxuan Zhang, Ziqi An, Xiongfeng Zhu, Qianjin Feng, Yinghua Zhao, 2024, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation(Fabian Isensee, Tassilo Wald, Constantin Ulrich, M. Baumgartner, Saikat Roy, Klaus H. Maier-Hein, P. Jaeger, 2024, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- Systematic Evaluation of Image Tiling Adverse Effects on Deep Learning Semantic Segmentation(G. Reina, Ravi Panchumarthy, Siddhesh P. Thakur, A. Bastidas, S. Bakas, 2020, Frontiers in Neuroscience)
- SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation(Shehan Perera, Pouyan Navard, Alper Yilmaz, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Deep learning for automated boundary detection and segmentation in organ donation photography(G. Kourounis, Ali Elmahmudi, Brian Thomson, Robin Nandi, S. Tingle, Emily K. Glover, Emily Thompson, B. Mahendran, Chloe Connelly, Beth Gibson, Lucy Bates, Neil S Sheerin, James Hunter, Hassan Ugail, Colin Wilson, 2024, Innovative Surgical Sciences)
- Retinal Vascular Image Segmentation Using Improved UNet Based on Residual Module.(Ko-Wei Huang, Yao-Ren Yang, Zih-Hao Huang, Yi-Yang Liu, Shih-Hsiung Lee, 2023, Bioengineering (Basel, Switzerland))
- EMCAD: Efficient Multi-Scale Convolutional Attention Decoding for Medical Image Segmentation(M. Rahman, Mustafa Munir, R. Marculescu, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- RM-SSNet: An Ultra-Lightweight Medical Image Segmenter With Recursive Multi-Scale Reasoning and Boundary Enhancement(Jiangpeng Shi, Juanjuan Zhao, Yan Qiang, Qi Chen, Yan Wang, 2025, 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- MK-UNet: Multi-Kernel Lightweight CNN for Medical Image Segmentation(M. Rahman, R. Marculescu, 2025, 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW))
- PePR: Performance Per Resource Unit as a Metric to Promote Small-Scale Deep Learning in Medical Image Analysis(Raghavendra Selvan, Bob Pepin, Christian Igel, Gabrielle Samuel, Erik B Dam, 2024, ArXiv Preprint)
- Semantic-Based Optimization of Deep Learning for Efficient Real-Time Medical Image Segmentation(Zhenkun Wei, Jia Liu, Yu Yao, 2024, International Journal on Semantic Web and Information Systems)
- Lightweight medical image segmentation network with multi-scale feature-guided fusion.(Zhiqin Zhu, Kun Yu, Guanqiu Qi, Baisen Cong, Yuanyuan Li, Zexin Li, Xinbo Gao, 2024, Computers in biology and medicine)
- SSRepVM-UNet: a lightweight hybrid model for medical image segmentation based on channel parallelism(Yijing Guo, Fuhang Li, Kunhua Li, Huawei Wang, Pengyu Xu, 2025, Applied Intelligence)
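Many boundary-focused methods in this group add a contour-sensitive term to the usual region loss. One minimal way to sketch this, assuming a binary mask, is to extract a thin boundary band from the ground truth with a max-pooling morphological gradient and up-weight the pixel loss there; the band width and loss weight below are illustrative, not taken from any cited method.

```python
# A hedged sketch of a boundary-aware loss: Dice on the full mask plus a
# cross-entropy term restricted to a thin band around the object contour.
import torch
import torch.nn.functional as F


def boundary_band(mask: torch.Tensor, width: int = 3) -> torch.Tensor:
    """mask: (B, 1, H, W) binary. Returns a band around the object contour."""
    pad = width // 2
    dilated = F.max_pool2d(mask, width, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - mask, width, stride=1, padding=pad)
    return dilated - eroded  # 1 on the contour band, 0 elsewhere


def dice_loss(prob, target, eps=1e-6):
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)


def boundary_aware_loss(logits, target, lam=1.0):
    prob = logits.sigmoid()
    band = boundary_band(target)
    region = dice_loss(prob, target)
    # Per-pixel BCE, kept only on the boundary band, emphasizes contour pixels.
    bce = F.binary_cross_entropy(prob, target, reduction="none")
    edge = (bce * band).sum() / band.sum().clamp(min=1.0)
    return region + lam * edge


logits = torch.randn(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = boundary_aware_loss(logits, target)
loss.backward()
print(loss.item())
```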
Multi-Task Segmentation and Customized Network Design for Specific Clinical Scenarios
This group studies lesion or tissue segmentation for specific anatomical structures (such as the pancreas, kidney, heart, and choroid), using multi-task collaboration, morphological constraints, or deep supervision to meet the needs of specific clinical and pathological diagnosis; a shared-encoder multi-task sketch follows the list below.
- A general deep learning framework for neuron instance segmentation based on Efficient UNet and morphological post-processing(Huaqian Wu, N. Souedet, C. Jan, C. Clouchoux, T. Delzescaux, 2022, Computers in Biology and Medicine)
- Diagnosis of Choroidal Disease With Deep Learning-Based Image Enhancement and Volumetric Quantification of Optical Coherence Tomography(K. Maruyama, Song Mei, H. Sakaguchi, Chikako Hara, A. Miki, Zaixing Mao, R. Kawasaki, Zhenguo Wang, S. Sakimoto, N. Hashida, A. Quantock, Kinpui Chan, K. Nishida, 2022, Translational Vision Science & Technology)
- A Comparative Study of Deep Learning Methods for Multi-Class Semantic Segmentation of 2D Kidney Ultrasound Images.(Simao Valente, Pedro Morais, Helena R Torres, Bruno Oliveira, L R Buschle, A Fritz, Jorge Correia-Pinto, Estevao Lima, Joao L Vilaca, 2023, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference)
- EfficientNet family U-Net models for deep learning semantic segmentation of kidney tumors on CT images(Abubaker Abdelrahman, Serestina Viriri, 2023, Frontiers in Computer Science)
- Multi-scale context UNet-like network with redesigned skip connections for medical image segmentation.(Ledan Qian, Caiyun Wen, Yi Li, Zhongyi Hu, Xiao Zhou, Xiaonyu Xia, Soo-Hyung Kim, 2024, Computer methods and programs in biomedicine)
- Image segmentation and feature extraction method for lung lesion detection in computed tomography images(M. F. Abdullah, S. N. Sulaiman, M. K. Osman, S. Setumin, N. Karim, F. A. Sahimi, A. I. C. Ani, 2023, Journal of Physics: Conference Series)
- Building medical image classifiers with very limited data using segmentation networks(Ken C. L. Wong, Tanveer Syeda-Mahmood, Mehdi Moradi, 2018, ArXiv Preprint)
- CE-Net: Context Encoder Network for 2D Medical Image Segmentation(Zaiwang Gu, Jun Cheng, H. Fu, Kang Zhou, Huaying Hao, Yitian Zhao, Tianyang Zhang, Shenghua Gao, Jiang Liu, 2019, IEEE Transactions on Medical Imaging)
- SWT-UNet: Medical Image Segmentation Based on Multi-modality UNet with Sliding Window Transformer Block(Yangqianhui Zhang, Dong Han, Xiaoming Gang, Jiajun Ma, Xu Yuan, 2023, 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- Deep Semantic Segmentation Feature-Based Radiomics for the Classification Tasks in Medical Image Analysis.(Bingsheng Huang, Junru Tian, Hongyuan Zhang, Zixin Luo, Jing Qin, Chen Huang, Xueping He, Yanji Luo, Yongjin Zhou, Guo Dan, Hanwei Chen, Shi-Ting Feng, Chenglang Yuan, 2021, IEEE journal of biomedical and health informatics)
- Colored MRI biomedical image tumor classification and segmentation based on transfer learning of modified Y-Net(Nassr Nafeaa Khamis, Rahma Saadi Mustaf, 2023, ITM Web of Conferences)
- Automated Segmentation of Microvessels in Intravascular OCT Images Using Deep Learning.(Juhwan Lee, Justin N Kim, Lia Gomez-Perez, Yazan Gharaibeh, Issam Motairek, Gabriel T R Pereira, Vladislav N Zimin, Luis A P Dallan, Ammar Hoori, Sadeer Al-Kindi, Giulio Guagliumi, Hiram G Bezerra, David L Wilson, 2022, Bioengineering (Basel, Switzerland))
- Segmentation and classification of colon glands with deep convolutional neural networks and total variation regularization.(Philipp Kainz, Michael Pfeiffer, Martin Urschler, 2017, PeerJ)
- A category attention instance segmentation network for four cardiac chambers segmentation in fetal echocardiography.(Shan An, Haogang Zhu, Yuanshuai Wang, Fangru Zhou, Xiaoxue Zhou, Xu Yang, Yingying Zhang, Xiangyu Liu, Zhicheng Jiao, Yihua He, 2021, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- Deep Learning for Medical Image Segmentation: Applications in Disease Detection and Diagnosis(Kannan R, Jestus J, Madesh P, 2024, 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS))
- Hybrid Ensemble DL Model for Breast Cancer Detection and Classification with Enhanced Breast Lesion Segmentation using U-Net Model(M. Selvam, T. Seshaiah, I. Saidulu, Ghassan Samara, Aravinda K, 2025, 2025 International Conference on Recent Innovation in Science Engineering and Technology (ICRISET))
- V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation(Fausto Milletarì, N. Navab, Seyed-Ahmad Ahmadi, 2016, 2016 Fourth International Conference on 3D Vision (3DV))
- Capsules for biomedical image segmentation.(Rodney LaLonde, Ziyue Xu, Ismail Irmakci, Sanjay Jain, Ulas Bagci, 2021, Medical image analysis)
- Deep learning for multi-class semantic segmentation enables colorectal cancer detection and classification in digital pathology images.(John-Melle Bokhorst, Iris D Nagtegaal, Filippo Fraggetta, Simona Vatrano, Wilma Mesker, Michael Vieth, Jeroen van der Laak, Francesco Ciompi, 2023, Scientific reports)
- Segmentation for mammography classification utilizing deep convolutional neural network(D. Saha, Tuhin Hossain, Mejdl S. Safran, Sultan Alfarhood, M. F. Mridha, Dunren Che, 2024, BMC Medical Imaging)
- Automated magnetic resonance image segmentation of the anterior cruciate ligament.(Sean W Flannery, Ata M Kiapour, David J Edgar, Martha M Murray, Braden C Fleming, 2021, Journal of orthopaedic research : official publication of the Orthopaedic Research Society)
- Segmentation Quality and Volumetric Accuracy in Medical Imaging(Zheyuan Zhang, Ulas Bagci, 2024, ArXiv Preprint)
- LinSEM: Linearizing segmentation evaluation metrics for medical images.(Jieyu Li, Jayaram K Udupa, Yubing Tong, Lisheng Wang, Drew A Torigian, 2020, Medical image analysis)
- Nuclei instance segmentation from histopathology images using Bayesian dropout based deep learning(N. R. Gudhe, V. Kosma, Hamid Behravan, A. Mannermaa, 2023, BMC Medical Imaging)
- Self-Attention Diffusion Models for Zero-Shot Biomedical Image Segmentation: Unlocking New Frontiers in Medical Imaging.(Abderrachid Hamrani, Anuradha Godavarty, 2025, Bioengineering (Basel, Switzerland))
- Skin-lesion segmentation using boundary-aware segmentation network and classification based on a mixture of convolutional and transformer neural networks(Javaria Amin, Marium Azhar, Habibah Arshad, A. Zafar, S. Kim, 2025, Frontiers in Medicine)
- Leveraging transfer learning-driven convolutional neural network-based semantic segmentation model for medical image analysis using MRI images.(Amal Alshardan, Nuha Alruwais, Hamed Alqahtani, Asma Alshuhail, Wafa Sulaiman Almukadi, Ahmed Sayed, 2024, Scientific reports)
- MLFA-UNet: A multi-level feature assembly UNet for medical image segmentation.(Anass Garbaz, Yassine Oukdach, Said Charfi, Mohamed El Ansari, Lahcen Koutti, Mouna Salihoun, 2024, Methods (San Diego, Calif.))
- SEF-UNet: advancing abdominal multi-organ segmentation with SEFormer and depthwise cascaded upsampling.(Yaping Zhao, Yizhang Jiang, Lijun Huang, Kaijian Xia, 2024, PeerJ. Computer science)
- SABOS-Net: Self-supervised attention based network for automatic organ segmentation of head and neck CT images(Seenia Francis, Goutham Pooloth, Sai Bala Subrahmanyam Singam, Niyas Puzhakkal, Pournami Pulinthanathu Narayanan, Jayaraj Pottekkattuvalappil Balakrishnan, 2022, International Journal of Imaging Systems and Technology)
- SegTom: A 3D Volumetric Medical Image Segmentation Framework for Thoracoabdominal Multi-Organ Anatomical Structures(Yan Pang, Yunhao Li, Jiaming Liang, Haoyu Chen, Ying Hu, Qiong Wang, 2025, IEEE Journal of Biomedical and Health Informatics)
- Hybrid CNN-Transformer Model with Attention-Based Decoding for Accurate Abdominal Organ Segmentation in Medical Imaging(N. Lakshmi, S. S, Suriyaprakash M, Joshna K, Ramya Sree P, 2025, 2025 Fourth International Conference on Smart Technologies, Communication and Robotics (STCR))
- Instance Segmentation of Oral Cancer Images with Fusion of Swin Transformer and Mask RCNN(K. C., V. H S, 2025, Journal of Innovative Image Processing)
- An Integrated Deep Learning Model for Pancreatic Cancer Segmentation and Classification based on CT Images(Koteswaramma Dodda, G. Muneeswari, 2025, 2025 International Conference on Emerging Systems and Intelligent Computing (ESIC))
- Tooth instance segmentation from cone-beam CT images through point-based detection and Gaussian disentanglement(Jusang Lee, M. Chung, Minkyung Lee, Yeong-Gil Shin, 2021, Multimedia Tools and Applications)
- Application of deep learning for semantic segmentation in robotic prostatectomy: Comparison of convolutional neural networks and visual transformers.(Sahyun Pak, Sung Gon Park, Jeonghyun Park, Hong Rock Choi, Jun Ho Lee, Wonchul Lee, Sung Tae Cho, Young Goo Lee, Hanjong Ahn, 2024, Investigative and clinical urology)
- Multimodal Fusion for Enhanced Semantic Segmentation in Brain Tumor Imaging: Integrating Deep Learning and Guided Filtering Via Advanced 3D Semantic Segmentation Architectures(Abbadullah Husham Saleh ALEZZI, Ümit Atila, Oguzhan Menemencioglu, 2024, International Journal of Imaging Systems and Technology)
- Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models(Haoyue Zhang, Jennifer S. Polson, Eric J. Yang, Kambiz Nael, William Speier, Corey W. Arnold, 2023, ArXiv Preprint)
- SACA-UNet: Medical Image Segmentation Network Based on Self-Attention and ASPP(Gaojuan Fan, Jie Wang, Chongsheng Zhang, 2023, 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS))
- Automated cardiac segmentation of cross-modal medical images using unsupervised multi-domain adaptation and spatial neural attention structure.(Jinping Liu, Hui Liu, Subo Gong, Zhaohui Tang, Yongfang Xie, Huazhan Yin, Jean Paul Niyoyita, 2021, Medical image analysis)
- Information Geometric Approaches for Patient-Specific Test-Time Adaptation of Deep Learning Models for Semantic Segmentation(Hariharan Ravishankar, Naveen Paluru, Prasad Sudhakar, P. Yalavarthy, 2025, IEEE Transactions on Medical Imaging)
- Interactive 3D Segmentation Editing and Refinement via Gated Graph Neural Networks(Xiaosong Wang, Ling Zhang, H. Roth, Daguang Xu, Ziyue Xu, 2019, Lecture Notes in Computer Science)
- Bayesian image segmentation using local iso-intensity structural orientation.(Wilbur C K Wong, Albert C S Chung, 2005, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society)
- A deep learning-based interactive medical image segmentation framework with sequential memory.(Ivan Mikhailov, Benoit Chauveau, Nicolas Bourdel, Adrien Bartoli, 2024, Computer methods and programs in biomedicine)
- Bone Region Segmentation in Medical Images Based on Improved Watershed Algorithm.(Jun Zhou, Mei Yang, 2022, Computational intelligence and neuroscience)
- TNTdetect.AI: A Deep Learning Model for Automated Detection and Counting of Tunneling Nanotubes in Microscopy Images.(Yasin Ceran, Hamza Ergüder, Katherine Ladner, Sophie Korenfeld, Karina Deniz, Sanyukta Padmanabhan, Phillip Wong, Murat Baday, Thomas Pengo, Emil Lou, Chirag B Patel, 2022, Cancers)
- Robust three-dimensional object definition in CT and MRI.(P H Bland, C R Meyer, 1996, Medical physics)
- UNet segmentation network of COVID-19 CT images with multi-scale attention.(Mingju Chen, Sihang Yi, Mei Yang, Zhiwen Yang, Xingyue Zhang, 2023, Mathematical biosciences and engineering : MBE)
- Enhanced Skin Disease Detection Using Deep Learning with Fully-Convolutional Residual Networks and Lesion Index Calculation Unit(C. Thilagavathi, S. Thilagamani, 2024, 2024 9th International Conference on Communication and Electronics Systems (ICCES))
- A deep-learning semantic segmentation approach to fully automated MRI-based left-ventricular deformation analysis in cardiotoxicity.(Julia Kar, Michael V Cohen, Samuel P McQuiston, Christopher M Malozzi, 2021, Magnetic resonance imaging)
- Automatic Deep Learning Semantic Segmentation of Ultrasound Thyroid Cineclips Using Recurrent Fully Convolutional Networks(Jeremy M. Webb, Duane D. Meixner, Shaheeda A. Adusei, E. Polley, M. Fatemi, A. Alizad, 2020, IEEE Access)
- U-VQVAE-CTLesionNet: A Generalized Deep Learning Framework for Multi-Organ Lesion Detection and Segmentation in Medical Imaging(Alok Kumar, N. Mahendran, 2026, Computational Intelligence)
- LD-UNet: A long-distance perceptual model for segmentation of blurred boundaries in medical images.(Shuchao Chen, Chao Luo, Shanshan Liu, Haojiang Li, Yifei Liu, Haoyang Zhou, Lizhi Liu, Hongbo Chen, 2024, Computers in biology and medicine)
- Vertebrae and intervertebral discs segmentation using deep learning-based model in disability analysis.(Nizar Alsharif, Rajit Nair, Theyazn H H Aldhyani, Nesren S Farhah, Sultan Ahmad, Abdullah H Al-Nefaie, 2026, Frontiers in medicine)
- Hierarchical Pictorial Structures for Simultaneously Localizing Multiple Organs in Volumetric Pre-Scan CT.(Albert Montillo, Qi Song, Bipul Das, Zhye Yin, 2015, Proceedings of SPIE--the International Society for Optical Engineering)
- Marginal Space Deep Learning: Efficient Architecture for Volumetric Image Parsing.(Florin C Ghesu, Edward Krubasik, Bogdan Georgescu, Vivek Singh, Yefeng Zheng, Joachim Hornegger, Dorin Comaniciu, 2016, IEEE transactions on medical imaging)
- Classification of tissue biopsies by Raman spectroscopy guided by quantitative phase imaging and its application to bladder cancer.(Almog Taieb, Garry Berkovic, Miki Haifler, Ori Cheshnovsky, Natan T Shaked, 2022, Journal of biophotonics)
- Managing class imbalance and differential staining of immune cell populations in multi-class instance segmentation of multiplexed immunofluorescence images of Lupus Nephritis biopsies(Madeleine S. Durkee, R. Abraham, Junting Ai, M. Clark, M. Giger, 2021, Medical Imaging 2021: Digital Pathology)
- Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images(Sambuddha Ghosal, Pratik Shah, 2020, ArXiv Preprint)
- Whole brain segmentation with full volume neural network.(Yeshu Li, Jonathan Cui, Yilun Sheng, Xiao Liang, Jingdong Wang, Eric I-Chao Chang, Yan Xu, 2021, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- GLAC-Unet: Global-Local Active Contour Loss with an Efficient U-Shaped Architecture for Multiclass Medical Image Segmentation.(Minh-Nhat Trinh, Thi-Thao Tran, Do-Hai-Ninh Nham, Men-Tzung Lo, Van-Truong Pham, 2025, Journal of imaging informatics in medicine)
- Medical breast ultrasound image segmentation by machine learning.(Yuan Xu, Yuxin Wang, Jie Yuan, Qian Cheng, Xueding Wang, Paul L Carson, 2019, Ultrasonics)
- Accurate segmentation of nuclear instances using a double-stage neural network(Kesi Xu, M. Jahanifar, S. Graham, N. Rajpoot, 2023, Medical Imaging 2023: Digital and Computational Pathology)
- I2U-Net: A dual-path U-Net with rich information interaction for medical image segmentation(Duwei Dai, C. Dong, Qingsen Yan, Yongheng Sun, Chunyan Zhang, Zongfang Li, Song-Yuan Xu, 2024, Medical Image Analysis)
- Segmentation and classification of skin lesions using hybrid deep learning method in the Internet of Medical Things.(Arslan Akram, Javed Rashid, Muhammad Arfan Jaffar, Muhammad Faheem, Riaz Ul Amin, 2023, Skin research and technology : official journal of International Society for Bioengineering and the Skin (ISBS) [and] International Society for Digital Imaging of Skin (ISDIS) [and] International Society for Skin Imaging (ISSI))
- Validation of automated artificial intelligence segmentation of optical coherence tomography images.(Peter M Maloca, Aaron Y Lee, Emanuel R de Carvalho, Mali Okada, Katrin Fasler, Irene Leung, Beat Hörmann, Pascal Kaiser, Susanne Suter, Pascal W Hasler, Javier Zarranz-Ventura, Catherine Egan, Tjebo F C Heeren, Konstantinos Balaskas, Adnan Tufail, Hendrik P N Scholl, 2019, PloS one)
- PCAN: Pixel-wise classification and attention network for thoracic disease classification and weakly supervised localization.(Xiongfeng Zhu, Shumao Pang, Xiaoxuan Zhang, Junzhang Huang, Lei Zhao, Kai Tang, Qianjin Feng, 2022, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- Deep Learning Model for Coronary Angiography.(Hao Ling, Biqian Chen, Renchu Guan, Yu Xiao, Hui Yan, Qingyu Chen, Lianru Bi, Jingbo Chen, Xiaoyue Feng, Haoyu Pang, Chunli Song, 2023, Journal of cardiovascular translational research)
- Attentive neural cell instance segmentation(Jingru Yi, Pengxiang Wu, Menglin Jiang, Qiaoying Huang, Daniel J. Hoeppner, Dimitris N. Metaxas, 2019, Medical Image Analysis)
- InstantDL: an easy-to-use deep learning pipeline for image segmentation and classification.(Dominik Jens Elias Waibel, Sayedali Shetab Boushehri, Carsten Marr, 2021, BMC bioinformatics)
- UNet++: A Nested U-Net Architecture for Medical Image Segmentation(Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang, 2018, ArXiv Preprint)
- 3D Kidneys and Kidney Tumor Semantic Segmentation using Boundary-Aware Networks(Andriy Myronenko, Ali Hatamizadeh, 2019, ArXiv Preprint)
- Capsule-CRF Fusion: An Efficient & Novel Method for Precise Skin Lesion Localization and Boundary Estimations(Namrata Verma, P. K. Mishra, R. Janghel, 2023, 2023 OITS International Conference on Information Technology (OCIT))
- StrainNet: Improved Myocardial Strain Analysis of Cine MRI by Deep Learning from DENSE.(Yu Wang, Changyu Sun, Sona Ghadimi, Daniel C Auger, Pierre Croisille, Magalie Viallon, Kenneth Mangion, Colin Berry, Christopher M Haggerty, Linyuan Jing, Brandon K Fornwalt, J Jane Cao, Joshua Cheng, Andrew D Scott, Pedro F Ferreira, John N Oshinski, Daniel B Ennis, Kenneth C Bilchick, Frederick H Epstein, 2023, Radiology. Cardiothoracic imaging)
- Segmentation of vertebrae and intervertebral discs in lumbar spine MR images with iterative instance segmentation(J. W. V. D. Graaf, Miranda L. van Hooff, C. Buckens, N. Lessmann, 2022, Medical Imaging 2022: Image Processing)
- Volumetric Analysis of Brain Tumor Magnetic Resonance Image(Hapsari Peni Agustin, H. Hidayati, A. Sooai, I. K. E. Purnama, M. Purnomo, 2019, 2019 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM))
- Understanding the Mechanisms of Deep Transfer Learning for Medical Images(Hariharan Ravishankar, Prasad Sudhakar, Rahul Venkataramani, Sheshadri Thiruvenkadam, Pavan Annangi, Narayanan Babu, Vivek Vaidya, 2017, ArXiv Preprint)
- A Deep Learning Network for Classifying Arteries and Veins in Montaged Widefield OCT Angiograms.(Min Gao, Yukun Guo, Tristan T Hormel, Kotaro Tsuboi, George Pacheco, David Poole, Steven T Bailey, Christina J Flaxel, David Huang, Thomas S Hwang, Yali Jia, 2022, Ophthalmology science)
- PnP-AE: A Plug-and-Play Module for Volumetric Medical Image Segmentation(Qiankun Li, Xiaolong Huang, Bo Fang, Yani Zhang, Yongyong Chen, Junxing Chen, 2023, 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- Prostate Cancer Lesion Detection Based on Pseudo-Normal Image Generation Using Generative Adversarial Networks(Yang Liu, 2025, 2025 2nd International Conference on Image, Signal Processing and Communication Technology (ISPCT))
- Learning With Context Feedback Loop for Robust Medical Image Segmentation(Kibrom Berihu Girum, Gilles Créhange, Alain Lalande, 2021, ArXiv Preprint)
- Deep learning and its application to medical image segmentation(Holger R. Roth, Chen Shen, Hirohisa Oda, Masahiro Oda, Yuichiro Hayashi, Kazunari Misawa, Kensaku Mori, 2018, ArXiv Preprint)
- KAC-Unet: A Medical Image Segmentation With the Adaptive Group Strategy and Kolmogorov-Arnold Network(S. Lin, Rong Hu, Zuoyong Li, Qinghua Lin, Kun Zeng, Xiang Wu, 2025, IEEE Transactions on Instrumentation and Measurement)
- Artificial Intelligence-Based Tool for Tumor Detection and Quantitative Tissue Analysis in Colorectal Specimens.(Johanna Griem, Marie-Lisa Eich, Simon Schallenberg, Alexey Pryalukhin, Andrey Bychkov, Junya Fukuoka, Vitaliy Zayats, Wolfgang Hulla, Jijgee Munkhdelger, Alexander Seper, Tsvetan Tsvetkov, Anirban Mukhopadhyay, Antoine Sanner, Jonathan Stieber, Moritz Fuchs, Niklas Babendererde, Birgid Schömig-Markiefka, Sebastian Klein, Reinhard Buettner, Alexander Quaas, Yuri Tolkach, 2023, Modern pathology : an official journal of the United States and Canadian Academy of Pathology, Inc)
- Blurred Lesion Image Segmentation via an Adaptive Scale Thresholding Network(Qi Chen, Wenmin Wang, Zhibing Wang, Haomei Jia, Minglu Zhao, 2025, Applied Sciences)
- Deep learning-based semantic segmentation of non-melanocytic skin tumors in whole-slide histopathological images.(Linyan Wang, An Shao, Fengbo Huang, Zhengyun Liu, Yaqi Wang, Xingru Huang, Juan Ye, 2023, Experimental dermatology)
- Automated tumor volumetry using computer-aided image segmentation.(Bilwaj Gaonkar, Luke Macyszyn, Michel Bilello, Mohammed Salehi Sadaghiani, Hamed Akbari, Mark A Atthiah, Zarina S Ali, Xiao Da, Yiqang Zhan, Donald O'Rourke, Sean M Grady, Christos Davatzikos, 2015, Academic radiology)
- Segmentation of Medical Image Using Novel Dilated Ghost Deep Learning Model.(Marcelo Zambrano-Vizuete, Miguel Botto-Tobar, Carmen Huerta-Suárez, Wladimir Paredes-Parada, Darwin Patiño Pérez, Tariq Ahamed Ahanger, Neilys Gonzalez, 2022, Computational intelligence and neuroscience)
- DEEP LEARNING METHODOLOGIES FOR NUCLEI SEGMENTATION AND MITOSIS DETECTION IN HISTOPATHOLOGICAL IMAGES ANALYSIS(Nooshin Nemati, R. Samet, Emrah Hançer, S. Dizbay Sak, Ayça Kırmızı, Z. Yildirim, 2025, Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi)
- CP-Net: Instance-aware part segmentation network for biological cell parsing.(Wenyuan Chen, Haocong Song, Changsheng Dai, Zongjie Huang, Andrew Wu, Guanqiao Shan, Hang Liu, Aojun Jiang, Xingjian Liu, Changhai Ru, Khaled Abdalla, Shivani N Dhanani, Katy Fatemeh Moosavi, Shruti Pathak, Clifford Librach, Zhuoran Zhang, Yu Sun, 2024, Medical image analysis)
- A 3D Coarse-to-Fine Framework for Volumetric Medical Image Segmentation(Zhuotun Zhu, Yingda Xia, Wei-Che Shen, E. Fishman, A. Yuille, 2017, 2018 International Conference on 3D Vision (3DV))
- SpineParseNet: Spine Parsing for Volumetric MR Image by a Two-Stage Segmentation Framework With Semantic Image Representation.(Shumao Pang, Chunlan Pang, Lei Zhao, Yangfan Chen, Zhihai Su, Yujia Zhou, Meiyan Huang, Wei Yang, Hai Lu, Qianjin Feng, 2021, IEEE transactions on medical imaging)
- Stochastic gradient descent optimisation for convolutional neural network for medical image segmentation.(Sanam Nagendram, Arunendra Singh, Gade Harish Babu, Rahul Joshi, Sandeep Dwarkanath Pande, S K Hasane Ahammad, Dharmesh Dhabliya, Aadarsh Bisht, 2023, Open life sciences)
- A Dual-Task Synergy-Driven Generalization Framework for Pancreatic Cancer Segmentation in CT Scans.(Jun Li, Yijue Zhang, Haibo Shi, Minhong Li, Qiwei Li, Xiaohua Qian, 2025, IEEE transactions on medical imaging)
- MLDA-Net: Multi-Level Deep Aggregation Network for 3D Nuclei Instance Segmentation(Bin Hu, Zhiwei Ye, Zimei Wei, E. Snezhko, V. Kovalev, Mang Ye, 2025, IEEE Journal of Biomedical and Health Informatics)
- Thyroid nodule segmentation and classification in ultrasound images through intra- and inter-task consistent learning.(Qingbo Kang, Qicheng Lao, Yiyue Li, Zekun Jiang, Yue Qiu, Shaoting Zhang, Kang Li, 2022, Medical image analysis)
- A Unified Segmentation Network for Multi-Organ Lesion Detection in Two-Dimensional Grayscale Ultrasound Images(Hanshuo Xing, Yan Jiang, Xinyu Cao, Yin Fang, Peiyan Wu, Wenbo Song, Xinglong Wu, 2023, 2023 International Conference on Artificial Intelligence Innovation (ICAII))
- A Multiscale Attentional Unet Model for Automatic Segmentation in Medical Ultrasound Images.(Rui Wang, Haoyuan Zhou, Peng Fu, Hui Shen, Yang Bai, 2023, Ultrasonic imaging)
- Enhanced lung cancer detection: Integrating improved random walker segmentation with artificial neural network and random forest classifier.(Sneha S Nair, V N Meena Devi, Saju Bhasi, 2024, Heliyon)
- Weakly supervised brain tumor segmentation via semantic affinity deep neural network(Moshe Yerachmiel, H. Greenspan, 2022, Medical Imaging 2022: Image Processing)
- SegQC: a segmentation network-based framework for multi-metric segmentation quality control and segmentation error detection in volumetric medical images(Bella Specktor-Fadida, L. Ben‐Sira, D. Ben-Bashat, Leo Joskowicz, 2025, Medical Image Analysis)
- Comparative Analysis of YOLOv8 and YOLOv11 in Breast Lesion Detection(Bridget Beatrix Claire, D. M. Wonohadidjojo, 2025, bit-Tech)
- Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-Fine Framework and Its Adversarial Examples(Yingwei Li, Zhuotun Zhu, Yuyin Zhou, Yingda Xia, Wei Shen, E. Fishman, A. Yuille, 2020, Advances in Computer Vision and Pattern Recognition)
- Deep Anatomical Federated Network (Dafne): an open client/server framework for the continuous collaborative improvement of deep-learning-based medical image segmentation(F. Santini, J. Wasserthal, Abramo Agosti, X. Deligianni, K. Keene, H. Kan, Stefan Sommer, Christoph Stuprich, Fengdan Wang, C. Weidensteiner, Giulia Manco, Valentina Mazzoli, Arjun D Desai, A. Pichiecchio, 2023, Radiology: Artificial Intelligence)
- Calibrated bagging deep learning for image semantic segmentation: A case study on COVID-19 chest X-ray image.(Lucy Nwosu, Xiangfang Li, Lijun Qian, Seungchan Kim, Xishuang Dong, 2022, PloS one)
- UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation(Zongwei Zhou, M. R. Siddiquee, Nima Tajbakhsh, Jianming Liang, 2019, IEEE Transactions on Medical Imaging)
- SegNet Network Architecture for Deep Learning Image Segmentation and Its Integrated Applications and Prospects(Chenwei Zhang, Wenran Lu, Jiang Wu, Chunhe Ni, Hongbo Wang, 2024, Academic Journal of Science and Technology)
- Deformable multi-level feature network applied to nucleus segmentation.(Shulei Chang, Tingting Yang, Bowen Yin, Jiayi Zhang, Liang Ma, Yanhui Ding, Xiaodan Sui, 2024, Frontiers in microbiology)
- Benchmarking Semantic Segmentation Approaches for Polyp and Lesion Detection in Medical Imaging(P. V, Meghana Sunil, Shravya V, Shravan Venkatraman, Kannan A, 2024, 2024 International Conference on Emerging Research in Computational Science (ICERCS))
- MA-DenseUNet: A Skin Lesion Segmentation Method Based on Multi-Scale Attention and Bidirectional LSTM(Wenbo Huang, Xudong Cai, Yang Yan, Yufeng Kang, 2025, Applied Sciences)
- NG-NAS: Node growth neural architecture search for 3D medical image segmentation.(Shixi Qin, Zixun Zhang, Yuncheng Jiang, Shuguang Cui, Shenghui Cheng, Zhen Li, 2023, Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society)
- A shape context fully convolutional neural network for segmentation and classification of cervical nuclei in Pap smear images.(Elima Hussain, Lipi B Mahanta, Chandana Ray Das, Manjula Choudhury, Manish Chowdhury, 2020, Artificial intelligence in medicine)
- Automatic tooth periodontal ligament segmentation of cone beam computed tomography based on instance segmentation network.(Sha Su, Xueting Jia, Liping Zhan, Siyuan Gao, Qing Zhang, Xiaofeng Huang, 2024, Heliyon)
- Medical image segmentation with UNet-based multi-scale context fusion(Yongqi Yuan, Yong Cheng, 2024, Scientific Reports)
- Uncertainty-Aware Artery/Vein Classification on Retinal Images(A. Galdrán, Maria Inês Meyer, P. Costa, A. Mendonça, A. Campilho, 2019, 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019))
- Deep Learning-Based Boundary Detection for Model-Based Segmentation with Application to MR Prostate Segmentation(T. Brosch, J. Peters, A. Groth, T. Stehlé, J. Weese, 2018, Lecture Notes in Computer Science)
- Towards Clinical Diagnosis: Automated Stroke Lesion Segmentation on Multimodal MR Image Using Convolutional Neural Network(Zhiyang Liu, Chen Cao, Shuxue Ding, Tong Han, Hong Wu, Sheng Liu, 2018, ArXiv Preprint)
- Segment Like A Doctor: Learning reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation.(Liwen Zou, Yingying Cao, Ziwei Nie, Liang Mao, Yudong Qiu, Zhongqiu Wang, Zhenghua Cai, Xiaoping Yang, 2025, Medical image analysis)
- Deep Learning Model for the Automated Detection and Histopathological Prediction of Meningioma.(Hua Zhang, Jiajie Mo, Han Jiang, Zhuyun Li, Wenhan Hu, Chao Zhang, Yao Wang, Xiu Wang, Chang Liu, Baotian Zhao, Jianguo Zhang, Kai Zhang, 2021, Neuroinformatics)
- An attention mechanism-based lightweight UNet for musculoskeletal ultrasound image segmentation.(Yan Zhang, Xilong Yu, Qing Hu, Xianlei Zhang, Yixin Yang, Han Xiao, 2025, Medical physics)
- Automatic identification of segmentation errors for radiotherapy using geometric learning(E. Henderson, A. Green, M. Herk, E. V. Osorio, 2022, International Conference on Medical Image Computing and Computer-Assisted Intervention)
- Semantic Segmentation of Diabetic Retinopathy Lesions Using Deep Learning(Dimitrios Theodoropoulos, Nikolaos Sifakis, G. Manikis, G. Papadourakis, Konstantinos Armyras, Konstantinos Marias, 2025, SN Computer Science)
- Performance and Robustness of Regional Image Segmentation Driven by Selected Evolutionary and Genetic Algorithms: Study on MR Articular Cartilage Images.(Jan Kubicek, Alice Varysova, Martin Cerny, Kristyna Hancarova, David Oczka, Martin Augustynek, Marek Penhaker, Ondrej Prokop, Radomir Scurek, 2022, Sensors (Basel, Switzerland))
- Rapid multi-catheter segmentation for magnetic resonance image-guided catheter-based interventions.(Amanda M Aleong, Alejandro Berlin, Jette Borg, Joelle Helou, Akbar Beiki-Ardakani, Alexandra Rink, Srinivas Raman, Peter Chung, Robert A Weersink, 2024, Medical physics)
- Deep learning based retinal vessel segmentation and hypertensive retinopathy quantification using heterogeneous features cross-attention neural network.(Xinghui Liu, Hongwen Tan, Wu Wang, Zhangrong Chen, 2024, Frontiers in medicine)
- DENSE-INception U-net for medical image segmentation.(Ziang Zhang, Chengdong Wu, Sonya Coleman, Dermot Kerr, 2020, Computer methods and programs in biomedicine)
- HEDN: multi-oriented hierarchical extraction and dual-frequency decoupling network for 3D medical image segmentation.(Yu Wang, Guoheng Huang, Zeng Lu, Ying Wang, Xuhang Chen, Xiaochen Yuan, Yan Li, Jieni Liu, Yingping Huang, 2025, Medical & biological engineering & computing)
- 3D-ESPNet with Pyramidal Refinement for Volumetric Brain Tumor Image Segmentation(Nicholas Nuechterlein, Sachin Mehta, 2018, Lecture Notes in Computer Science)
- Extraction of volumetric indices from echocardiography: which deep learning solution for clinical use?(Han Ling, Nathan Painchaud, P. Courand, Pierre-Marc Jodoin, Damien Garcia, O. Bernard, 2023, Lecture Notes in Computer Science)
- Hybrid Intelligent-Annotation Organ Segmentation on Medical Datasets(Tao Peng, Jing Zhao, Yidong Gu, Gongye Di, Lei Zhang, Jing Cai, 2023, 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
- Instance Tumor Segmentation using Multitask Convolutional Neural Network(Mina Rezaei, Haojin Yang, C. Meinel, 2018, 2018 International Joint Conference on Neural Networks (IJCNN))
- StarDist Image Segmentation Improves Circulating Tumor Cell Detection.(Michiel Stevens, Afroditi Nanou, Leon W M M Terstappen, Christiane Driemel, Nikolas H Stoecklein, Frank A W Coumans, 2022, Cancers)
- SeUneter: Channel attentive U-Net for instance segmentation of the cervical spine MRI medical image.(Xiang Zhang, Yi Yang, Yi-Wei Shen, Ping Li, Yuan Zhong, Jing Zhou, Ke-Rui Zhang, Chang-Yong Shen, Yi Li, Meng-Fei Zhang, Long-Hai Pan, Li-Tai Ma, Hao Liu, 2022, Frontiers in physiology)
- DRINet for Medical Image Segmentation.(Liang Chen, Paul Bentley, Kensaku Mori, Kazunari Misawa, Michitaka Fujiwara, Daniel Rueckert, 2018, IEEE transactions on medical imaging)
- DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation(Debesh Jha, M. Riegler, Dag Johansen, P. Halvorsen, Haavard D. Johansen, 2020, 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS))
- Deep learning based semantic segmentation of leukemia effected white blood cell(Zahoor Jan, Muhammad Shabir, Haleem Farman, Afzal Rahman, Moustafa M. Nasralla, 2025, PLOS One)
- Robust deep learning-based semantic organ segmentation in hyperspectral images.(Silvia Seidlitz, Jan Sellner, Jan Odenthal, Berkin Özdemir, Alexander Studier-Fischer, Samuel Knödler, Leonardo Ayala, Tim J Adler, Hannes G Kenngott, Minu Tizabi, Martin Wagner, Felix Nickel, Beat P Müller-Stich, Lena Maier-Hein, 2022, Medical image analysis)
- Segmentation and Boundary Detection of Fetal Kidney Images in Second and Third Trimesters Using Kernel-Based Fuzzy Clustering(S. Meenakshi, M. Suganthi, P. S. Kumar, 2019, Journal of Medical Systems)
- Thalamus Optimized Multi Atlas Segmentation (THOMAS): fast, fully automated segmentation of thalamic nuclei from structural MRI.(Jason H Su, Francis T Thomas, Willard S Kasoff, Thomas Tourdias, Eun Young Choi, Brian K Rutt, Manojkumar Saranathan, 2019, NeuroImage)
- Unsupervised Segmentation of 3D Medical Images Based on Clustering and Deep Representation Learning(Takayasu Moriya, Holger R. Roth, Shota Nakamura, Hirohisa Oda, Kai Nagara, Masahiro Oda, Kensaku Mori, 2018, ArXiv Preprint)
- Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation(Cheng Zeng, Xinyu Yang, D. Smithard, M. Mirmehdi, A. Gambaruto, T. Burghardt, 2023, 2023 IEEE International Conference on Image Processing (ICIP))
- [Medical image automatic adjusting window and segmentation].(Zhenhuan Zhou, Siping Chen, Duchun Tao, Xinhai Chen, 2005, Journal of Biomedical Engineering)
- CXR-Seg: A Novel Deep Learning Network for Lung Segmentation from Chest X-Ray Images(Sadia Din, Muhammad Shoaib, E. Serpedin, 2025, Bioengineering)
- [Tooth segmentation and identification on cone-beam computed tomography with convolutional neural network based on spatial embedding information].(Shishi Bo, Chengzhi Gao, 2024, Journal of Peking University. Health Sciences)
- Dynamic neighbourhood-enhanced UNet with interwoven fusion for medical image segmentation(Limin Wan, Lin Song, Ying Zhou, Chenrui Kang, Shijian Zheng, Guo Chen, 2025, The Visual Computer)
- UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation(Huimin Huang, Lanfen Lin, Ruofeng Tong, Hongjie Hu, Qiaowei Zhang, Yutaro Iwamoto, Xianhua Han, Yenwei Chen, Jian Wu, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Attention Mechanism Enhanced Multi-layer Edge Perception Network for Deep Semantic Medical Segmentation(Meijun Sun, Pengfei Li, Jinchang Ren, Z. Wang, 2023, Cognitive Computation)
- DeepGraFT: A novel semantic segmentation auxiliary ROI-based deep learning framework for effective fundus tessellation classification.(Yinghao Yao, Jiaying Yang, Haojun Sun, Hengte Kong, Sheng Wang, Ke Xu, Wei Dai, Siyi Jiang, QingShi Bai, Shilai Xing, Jian Yuan, Xinting Liu, Fan Lu, Zhenhui Chen, Jia Qu, Jianzhong Su, 2024, Computers in biology and medicine)
- Multiple Sparse Representations Classification.(Esben Plenge, Stefan Klein, Wiro J Niessen, Erik Meijering, 2015, PloS one)
- Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver.(Jingchen Ma, Hao Yang, Yen Chou, Jin H Yoon, Tavis Allison, Ravikumar Komandur, J. McDunn, Asba Taneem, R K G Do, Lawrence H. Schwartz, Binsheng Zhao, 2024, Medical Physics)
- DAG-UNet: Dissimilarity-Aware Global Context Guided Lightweight UNet for Medical Image Segmentation(Ying He, Qianni Zhang, Marc E. Miquel, 2025, 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI))
- Modality-Agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention(Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim, Sang-Chul Lee, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Research on Feature Detection Based on Convolutional Network with Deep Instance Segmentation(Miyu Fan, 2021, 2021 IEEE International Conference on Data Science and Computer Application (ICDSCA))
- Znet: Deep Learning Approach for 2D MRI Brain Tumor Segmentation.(Mohammad Ashraf Ottom, Hanif Abdul Rahman, Ivo D Dinov, 2022, IEEE journal of translational engineering in health and medicine)
- Attention residual network for medical ultrasound image segmentation.(Honghua Liu, Peiqin Zhang, Jiamin Hu, Yini Huang, Shanshan Zuo, Lu Li, Mailan Liu, Chang She, 2025, Scientific reports)
- ResUNet++: An Advanced Architecture for Medical Image Segmentation(Debesh Jha, P. Smedsrud, M. Riegler, Dag Johansen, Thomas de Lange, P. Halvorsen, H. Johansen, 2019, 2019 IEEE International Symposium on Multimedia (ISM))
- Automatic tooth instance segmentation and identification from panoramic X-Ray images using deep CNN(Walid Brahmi, Imen Jdey, 2023, Multimedia Tools and Applications)
- Segmentation of medical ultrasonic image using hybrid neural network.(T F Wang, D Y Li, C Q Zheng, Y Zheng, 2001, Space Medicine & Medical Engineering)
- Federated Learning with Research Prototypes: Application to Multi-Center MRI-based Detection of Prostate Cancer with Diverse Histopathology.(Abhejit Rajagopal, Ekaterina Redekop, Anil Kemisetti, Rushikesh Kulkarni, Steven Raman, Karthik Sarma, Kirti Magudia, Corey W Arnold, Peder E Z Larson, 2023, Academic radiology)
- Two-Stage Approach for Semantic Image Segmentation of Breast Cancer: Deep Learning and Mass Detection in Mammographic Images(Fayçal Touazi, Djamel Gaceb, Marouane Chirane, Selma Hrzallah, 2023, International Workshop on Informatics & Data-Driven Medicine)
- Dual-Task ConvLSTM-UNet for Instance Segmentation of Weakly Annotated Microscopy Videos(Assaf Arbelle, Shaked Cohen, T. R. Raviv, 2022, IEEE Transactions on Medical Imaging)
- E-DU: Deep neural network for multimodal medical image segmentation based on semantic gap compensation.(Haojia Wang, Xicheng Chen, Rui Yu, Zeliang Wei, Tianhua Yao, Chengcheng Gao, Yang Li, Zhenyan Wang, Dong Yi, Yazhou Wu, 2022, Computers in biology and medicine)
- Enhancing Medical Image Segmentation with a Lightweight Boundary-Aware Multitask Detection Head(Boliang Li, Yaming Xu, Yan Wang, Xiaoyang Li, 2024, 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE))
- Robust cardiac segmentation corrected with heuristics.(Alan Cervantes-Guzmán, Kyle McPherson, Jimena Olveres, Carlos Francisco Moreno-García, Fabián Torres Robles, Eyad Elyan, Boris Escalante-Ramírez, 2023, PloS one)
- Hierarchical Markov Random Fields for mast cell segmentation in electron microscopic recordings(Margret Keuper, Thorsten Schmidt, M. Rodríguez-Franco, W. Schamel, T. Brox, H. Burkhardt, O. Ronneberger, 2011, 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro)
- Efficient convolutional neural networks for pixelwise classification on heterogeneous hardware systems(Fabian Tschopp, Julien N. P. Martel, Srinivas C. Turaga, Matthew Cook, Jan Funke, 2016, 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI))
- MILD-Net: Minimal information loss dilated network for gland instance segmentation in colon histology images.(Simon Graham, Hao Chen, Jevgenij Gamper, Qi Dou, Pheng-Ann Heng, David Snead, Yee Wah Tsang, Nasir Rajpoot, 2019, Medical image analysis)
- Eres-UNet++: Liver CT image segmentation based on high-efficiency channel attention and Res-UNet+.(Jian Li, Kongyu Liu, Yating Hu, Hongchen Zhang, Ali Asghar Heidari, Huiling Chen, Weijiang Zhang, Abeer D Algarni, Hela Elmannai, 2023, Computers in biology and medicine)
- μ-Net: Medical image segmentation using efficient and effective deep supervision.(Di Yuan, Zhenghua Xu, Biao Tian, Hening Wang, Yuefu Zhan, Thomas Lukasiewicz, 2023, Computers in biology and medicine)
Evaluation Systems, General-Purpose Foundation Models, and Framework Standardization for Medical Image Segmentation
This group of papers focuses on benchmarking standards for medical image segmentation, the scientific soundness of evaluation metrics, and the construction of generalizable foundation models on large-scale datasets, aiming to advance the standardization and large-scale clinical deployment of medical image processing techniques.
- Label Augmentation to Improve Generalization of Deep Learning Semantic Segmentation of Laparoscopic Images(Leticia Monasterio-Exposito, D. Pizarro, J. Macias-Guarasa, 2022, IEEE Access)
- A General Stitching Solution for Whole-Brain 3D Nuclei Instance Segmentation from Microscopy Images(Ziquan Wei, Tingting Dan, Jiaqi Ding, Mustafa Dere, Guorong Wu, 2023, Lecture Notes in Computer Science)
- Universal consensus 3D segmentation of cells from 2D segmented stacks.(Felix Y Zhou, Zach Marin, Clarence Yapp, Qiongjing Zou, Benjamin A Nanes, Stephan Daetwyler, Andrew R Jamieson, Md Torikul Islam, Edward Jenkins, Gabriel M Gihana, Jinlong Lin, Hazel M Borges, Bo-Jui Chang, Andrew Weems, Sean J Morrison, Peter K Sorger, Reto Fiolka, Kevin M Dean, Gaudenz Danuser, 2025, Nature methods)
- FDAS: Foundation Model Distillation and Anatomic Structure-Aware Multi-task Learning for Self-Supervised Medical Image Segmentation(Xiaoran Qi, Guoning Zhang, Jianghao Wu, Shaoting Zhang, Xiaorong Hou, Guotai Wang, 2025, Lecture Notes in Computer Science)
- Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool(A. Taha, A. Hanbury, 2015, BMC Medical Imaging)
- nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.(Fabian Isensee, Paul F Jaeger, Simon A A Kohl, Jens Petersen, Klaus H Maier-Hein, 2021, Nature methods)
- A minimum spanning forest based classification method for dedicated breast CT images.(Robert Pike, Ioannis Sechopoulos, Baowei Fei, 2015, Medical physics)
- Box2Mask: Box-Supervised Instance Segmentation via Level-Set Evolution.(Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang, 2024, IEEE transactions on pattern analysis and machine intelligence)
- Deep learning for head and neck semi-supervised semantic segmentation.(Shunyao Luan, Yi Ding, Jiakang Shao, Bing Zou, Xiao Yu, Nannan Qin, Benpeng Zhu, Wei Wei, Xudong Xue, 2024, Physics in medicine and biology)
- Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation(Ngan T. H. Le, Kha Gia Quach, Khoa Luu, M. Savvides, Chenchen Zhu, 2017, IEEE Transactions on Image Processing)
- An efficient neural network based method for medical image segmentation.(Nima Torbati, Ahmad Ayatollahi, Ali Kermani, 2014, Computers in biology and medicine)
- Rule and Neural Network-Based Image Segmentation of Mice Vertebrae Images.(Indeever Madireddy, Tongge Wu, 2022, Cureus)
- A Lightweight Deep Learning Architecture for Efficient Multimodal Medical Image Segmentation Using Attention Mechanism(Subah Nawar, Taslima Joty, M. Hashem, 2024, Proceedings of the 3rd International Conference on Computing Advancements)
- Skeleton Segmentation on Bone Scintigraphy for BSI Computation.(Po-Nien Yu, Yung-Chi Lai, Yi-You Chen, Da-Chuan Cheng, 2023, Diagnostics (Basel, Switzerland))
- PMFSNet: Polarized multi-scale feature self-attention network for lightweight medical image segmentation.(Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang, 2025, Computer methods and programs in biomedicine)
- TBHF-Unet: Medical Image Segmentation Network Based on Three-Branch Hierarchical Fusion(Dayu Tan, Zhenpeng Xu, Tianhao Liu, Yansen Su, Chunhou Zheng, 2025, 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- ES-UNet: efficient 3D medical image segmentation with enhanced skip connections in 3D UNet(Minyoung Park, Seungtaek Oh, Junyoung Park, Taikyeong Jeong, Sungwook Yu, 2025, BMC Medical Imaging)
- Deep Learning Semantic Segmentation for High-Resolution Medical Volumes(Imad Eddine Toubal, Y. Duan, Deshan Yang, 2020, 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR))
- Attention adaptive instance normalization style transfer for vascular segmentation using deep learning(Supriti Mulay, Keerthi Ram, M. Sivaprakasam, 2023, Applied Intelligence)
This report organizes the medical image segmentation literature into five core directions: (1) architecture fusion: exploring the synergy of Transformer, Mamba, and CNN to balance long-range and local feature perception; (2) interactive and prompt learning: medical-domain adaptation of large models such as SAM and interactive segmentation research; (3) training paradigms: semi-/self-supervised learning, contrastive learning, and domain adaptation to address annotation scarcity; (4) clinically oriented optimization: pathology-specific segmentation targeting boundary awareness, lightweight design, and multi-task collaboration; (5) evaluation and standardization: building general evaluation metrics, foundation models, and clinical support frameworks. The overall trend shows the field moving from pure architectural adjustment toward rigorous clinical deployment, generalizable foundation models, and standardized benchmarking.
A total of 350 related papers.
Image segmentation and computer vision are becoming increasingly important in computer-aided diagnosis. Extracting image borders, colours, and textures algorithmically is resource-intensive, and identifying distinctive features requires technical expertise; general-purpose medical image segmentation and recognition software remains scarce. The proposed model has 13 layers and uses dilated convolution and max-pooling to extract fine features, while a Ghost module removes duplicated feature maps, simplifying the pipeline and reducing complexity. The convolutional neural network (CNN) generates a feature map that improves the accuracy of region and bounding-box proposals, from which the initial region of the segmented medical image is obtained. Compared with traditional models, the proposed model achieves an accuracy of 96.05%, a precision of 98.2%, and a recall of 95.78%. The initial results are further refined by morphological post-processing that thickens and categorises the image's pixels. Experiments demonstrate that the proposed segmentation strategy is effective. This study rethinks medical image segmentation methods.
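As a rough illustration of the mechanism this abstract describes, the sketch below combines a dilated-convolution block with a GhostNet-style module that generates part of its output channels with cheap depthwise operations. The layer sizes and module design are assumptions for illustration, not the paper's exact 13-layer architecture.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Generate half the output channels with a regular convolution and the
    other half with cheap depthwise ops, as in GhostNet-style designs."""
    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=3):
        super().__init__()
        init_ch = out_ch // ratio                 # "intrinsic" feature maps
        cheap_ch = out_ch - init_ch               # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),  # depthwise: cheap ghost maps
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# One dilated-convolution + max-pooling block of the kind the abstract describes.
block = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=2, dilation=2),  # dilation widens the receptive field
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    GhostModule(32, 64),
)
print(block(torch.randn(1, 1, 64, 64)).shape)    # torch.Size([1, 64, 32, 32])
```

Generating half of the feature maps with depthwise convolutions is what makes the module cheaper than a plain convolution of the same output width.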
Fully-supervised deep learning segmentation models are inflexible when encountering new unseen semantic classes and their fine-tuning often requires significant amounts of annotated data. Few-shot semantic segmentation (FSS) aims to solve this inflexibility by learning to segment an arbitrary unseen semantically meaningful class by referring to only a few labeled examples, without involving fine-tuning. State-of-the-art FSS methods are typically designed for segmenting natural images and rely on abundant annotated data of training classes to learn image representations that generalize well to unseen testing classes. However, such a training mechanism is impractical in annotation-scarce medical imaging scenarios. To address this challenge, in this work, we propose a novel self-supervised FSS framework for medical images, named SSL-ALPNet, in order to bypass the requirement for annotations during training. The proposed method exploits superpixel-based pseudo-labels to provide supervision signals. In addition, we propose a simple yet effective adaptive local prototype pooling module which is plugged into the prototype networks to further boost segmentation accuracy. We demonstrate the general applicability of the proposed approach using three different tasks: organ segmentation of abdominal CT and MRI images respectively, and cardiac segmentation of MRI images. The proposed method yields higher Dice scores than conventional FSS methods which require manual annotations for training in our experiments.
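The core of prototype-based few-shot segmentation can be sketched in a few lines: a prototype is pooled from support features under the support mask, and query pixels are scored by cosine similarity. This is only the global masked-average-pooling baseline; SSL-ALPNet's adaptive local prototype pooling additionally computes prototypes over local windows, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def masked_avg_prototype(feat, mask):
    """Masked average pooling: average support features inside the mask.
    feat: (C, H, W) support feature map; mask: (H, W) binary {0, 1}."""
    m = mask.unsqueeze(0).float()
    return (feat * m).sum(dim=(1, 2)) / (m.sum() + 1e-6)   # (C,)

def prototype_predict(query_feat, proto, tau=20.0):
    """Score query pixels by scaled cosine similarity to the prototype,
    as is common in prototype-based few-shot segmentation."""
    q = F.normalize(query_feat, dim=0)           # (C, H, W)
    p = F.normalize(proto, dim=0)                # (C,)
    sim = torch.einsum('chw,c->hw', q, p)        # (H, W) cosine map
    return torch.sigmoid(tau * sim)              # soft foreground score

support_feat = torch.randn(64, 32, 32)
support_mask = (torch.rand(32, 32) > 0.7).float()
query_feat = torch.randn(64, 32, 32)
proto = masked_avg_prototype(support_feat, support_mask)
print(prototype_predict(query_feat, proto).shape)   # torch.Size([32, 32])
```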
Breast cancer is the most commonly diagnosed cancer, alone accounting for 30% of all new cancer diagnoses in women and posing a threat to women's health. Segmentation of breast ultrasound images into functional tissues can aid tumor localization, breast density measurement, and assessment of treatment response, which is important to the clinical diagnosis of breast cancer. However, manually segmenting the ultrasound images is skill- and experience-dependent and would lead to a subjective diagnosis; in addition, it is time-consuming for radiologists to review hundreds of clinical images. Therefore, automatic segmentation of breast ultrasound images into functional tissues has received attention in recent years, amidst the more numerous studies of detection and segmentation of masses. In this paper, we propose to use convolutional neural networks (CNNs) for segmenting three-dimensional (3D) breast ultrasound images into four major tissues: skin, fibroglandular tissue, mass, and fatty tissue. Quantitative metrics including Accuracy, Precision, Recall, and F1 score were used to evaluate the segmentation results.
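For reference, the four reported metrics can be computed per tissue class in a one-vs-rest fashion from the binary confusion counts; a minimal sketch:

```python
import numpy as np

def pixel_metrics(pred, gt):
    """Accuracy, precision, recall, and F1 for binary masks; for a multi-class
    segmentation (e.g. the four tissues above) call this once per class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt); fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt); tn = np.sum(~pred & ~gt)
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp + 1e-9)
    rec = tp / (tp + fn + 1e-9)
    f1 = 2 * prec * rec / (prec + rec + 1e-9)   # equals the Dice score for binary masks
    return acc, prec, rec, f1

pred = np.random.rand(64, 64) > 0.5
gt = np.random.rand(64, 64) > 0.5
print(pixel_metrics(pred, gt))
```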
Neural architecture search (NAS) has been applied to design proper 3D networks for medical image segmentation. In order to reduce the computation cost in NAS, researchers tend to adopt weight sharing mechanism to search architectures in a supernet. However, recent studies state that the searched architecture rankings may not be accurate with weight sharing mechanism because the training situations are inconsistent between the searching and training phases. In addition, some NAS algorithms design inflexible supernets that only search operators in a pre-defined backbone and ignore the importance of network topology, which limits the performance of searched architecture. To avoid weight sharing mechanism which may lead to inaccurate results and to comprehensively search network topology and operators, we propose a novel NAS algorithm called NG-NAS. Following the previous studies, we consider the segmentation network as a U-shape structure composed of a set of nodes. Instead of searching from the supernet with a limited search space, our NG-NAS starts from a simple architecture with only 5 nodes, and greedily grows the best candidate node until meeting the constraint. We design 2 kinds of node generations to form various network topological structures and prepare 4 candidate operators for each node. To efficiently evaluate candidate node generations, we use NAS without training strategies. We evaluate our method on several public 3D medical image segmentation benchmarks and achieve state-of-the-art performance, demonstrating the effectiveness of the searched architecture and our NG-NAS. Concretely, our method achieves an average Dice score of 85.11 on MSD liver, 65.70 on MSD brain, and 87.59 in BTCV, which performs much better than the previous SOTA methods.
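A toy sketch of the greedy growth loop the abstract describes, with a random stand-in for the training-free proxy score and a flat operator list standing in for the real node topology; the names and operator set are invented for illustration.

```python
import random

OPS = ['conv3x3', 'dilated_conv', 'depthwise_conv', 'identity']

def score_without_training(arch):
    """Placeholder for a training-free NAS proxy score; deterministic per
    architecture here purely to make the greedy loop below runnable."""
    random.seed(hash(tuple(arch)) % (2**32))
    return random.random()

def greedy_node_growth(start_nodes=5, max_nodes=10):
    """NG-NAS-style greedy growth: start from a small architecture and
    repeatedly keep the candidate node with the best proxy score
    (the paper's real node generations and topologies are richer)."""
    arch = ['conv3x3'] * start_nodes
    while len(arch) < max_nodes:
        arch = max((arch + [op] for op in OPS), key=score_without_training)
    return arch

print(greedy_node_growth())
```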
Dermoscopic images are degraded by hair artefacts and illumination variation, and chest X-ray images by differing acquisition conditions, all of which complicate clinical segmentation. The study proposes a novel deep convolutional neural network (CNN)-integrated methodology for medical image segmentation on chest X-ray and dermoscopic clinical images. It compares U-Net and fully convolutional network (FCN) architectures, trained with loss functions based on the Jaccard distance and binary cross-entropy and optimised with stochastic gradient descent plus Nesterov momentum. Digital imaging substantially supports diagnosis and the choice of the best treatment for a patient's condition, even though medical digital images are affected by noise, quality degradation, and other disturbances that the optimised segmentation process must handle. Finally, a thresholding technique is applied to the output in pre- and post-processing stages to enhance the contrast of the resulting image. The data source applied is the well-known PH2 dermoscopy dataset.
The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To ensure the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL), which entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation). The code is publicly available at https://sam-auxol.github.io/AuxOL/.
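A highly simplified sketch of the general test-time idea above (online updates from expert rectifications, fused with a frozen base model's output). This toy corrector learns only a global scale and bias on the base logits; it is not the AuxOL algorithm itself, whose adaptive online-batch and segmentation-fusion rules differ.

```python
import numpy as np

class OnlineResidualCorrector:
    """Toy online learner: a per-image logistic correction of a frozen model's
    probability map, updated from expert-rectified masks (illustrative only)."""
    def __init__(self, lr=0.5):
        self.a, self.b = 1.0, 0.0   # scale and bias applied to the base logits
        self.lr = lr

    def fuse(self, base_prob):
        logit = np.log(base_prob / (1 - base_prob + 1e-9) + 1e-9)
        return 1 / (1 + np.exp(-(self.a * logit + self.b)))

    def update(self, base_prob, rectified_mask):
        """One SGD step on binary cross-entropy between the fused prediction
        and the expert rectification (dBCE/dz = p - y for a sigmoid output)."""
        p = self.fuse(base_prob)
        logit = np.log(base_prob / (1 - base_prob + 1e-9) + 1e-9)
        err = p - rectified_mask
        self.a -= self.lr * np.mean(err * logit)
        self.b -= self.lr * np.mean(err)

corr = OnlineResidualCorrector()
base = np.clip(np.random.rand(64, 64), 0.01, 0.99)   # stand-in for a SAM probability map
expert = (base > 0.5).astype(float)                  # stand-in for a rectified mask
for _ in range(10):
    corr.update(base, expert)
print(corr.a, corr.b)
```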
Automatic image segmentation plays an important role in medical image processing, which places ever higher demands on segmentation accuracy and speed. In order to improve the speed and performance of medical image segmentation, we propose a segmentation algorithm based on simple non-iterative clustering (SNIC). First, a feature map is obtained by extracting the texture information of the image with a feature extraction algorithm; second, the image is downscaled to a quarter of its original size; then, the SNIC superpixel algorithm, extended with texture information and adaptive parameters, is used to segment the downscaled image and obtain a superpixel label map; finally, the superpixel label map is restored to the original size using the idea of the nearest-neighbour algorithm. Experimental results show that the algorithm, using an improved superpixel segmentation method on downscaled images, can increase segmentation speed on medical images while maintaining excellent segmentation accuracy.
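The pipeline is easy to prototype. Since scikit-image ships SLIC rather than SNIC, the sketch below uses SLIC purely as a stand-in to illustrate the downscale, superpixel, nearest-neighbour-restore sequence; parameters are illustrative, and `channel_axis=None` requires scikit-image 0.19 or newer.

```python
import numpy as np
from skimage import data, transform, segmentation

img = data.camera().astype(float) / 255.0
# Downscale (the abstract reduces the image to a quarter of its size).
small = transform.rescale(img, 0.5, anti_aliasing=True)
# Superpixel label map on the downscaled image (SLIC as a SNIC stand-in).
labels_small = segmentation.slic(small, n_segments=200, compactness=0.1,
                                 channel_axis=None)
# Restore to the original size with nearest-neighbour interpolation (order=0).
labels_full = transform.resize(labels_small, img.shape, order=0,
                               preserve_range=True, anti_aliasing=False).astype(int)
print(labels_full.shape, labels_full.max())
```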
Our work expands the use of capsule networks to the task of object segmentation for the first time in the literature. This is made possible via the introduction of locally-constrained routing and transformation matrix sharing, which reduces the parameter/memory burden and allows for the segmentation of objects at large resolutions. To compensate for the loss of global information in constraining the routing, we propose the concept of "deconvolutional" capsules to create a deep encoder-decoder style network, called SegCaps. We extend the masked reconstruction regularization to the task of segmentation and perform thorough ablation experiments on each component of our method. The proposed convolutional-deconvolutional capsule network, SegCaps, shows state-of-the-art results while using a fraction of the parameters of popular segmentation networks. To validate our proposed method, we perform experiments segmenting pathological lungs from clinical and pre-clinical thoracic computed tomography (CT) scans and segmenting muscle and adipose (fat) tissue from magnetic resonance imaging (MRI) scans of human subjects' thighs. Notably, our experiments in lung segmentation represent the largest-scale study in pathological lung segmentation in the literature, where we conduct experiments across five extremely challenging datasets, containing both clinical and pre-clinical subjects, and nearly 2000 computed-tomography scans. Our newly developed segmentation platform outperforms other methods across all datasets while utilizing less than 5% of the parameters in the popular U-Net for biomedical image segmentation. Further, we demonstrate capsules' ability to generalize to unseen rotations/reflections on natural images.
Producing high-quality segmentation masks for medical images is a fundamental challenge in biomedical image analysis. Recent research has investigated the use of supervised learning with large volumes of labeled data to improve segmentation across medical imaging modalities and unsupervised learning with unlabeled data to segment without detailed annotations. However, a significant hurdle remains in constructing a model that can segment diverse medical images in a zero-shot manner without any annotations. In this work, we introduce the attention diffusion zero-shot unsupervised system (ADZUS), a new method that uses self-attention diffusion models to segment biomedical images without needing any prior labels. This method combines self-attention mechanisms to enable context-aware and detail-sensitive segmentations, with the strengths of the pre-trained diffusion model. The experimental results show that ADZUS outperformed state-of-the-art models on various medical imaging datasets, such as skin lesions, chest X-ray infections, and white blood cell segmentations. The model demonstrated significant improvements by achieving Dice scores ranging from 88.7% to 92.9% and IoU scores from 66.3% to 93.3%. The success of the ADZUS model in zero-shot settings could lower the costs of labeling data and help it adapt to new medical imaging tasks, improving the diagnostic capabilities of AI-based medical imaging technologies.
After a CellSearch-processed circulating tumor cell (CTC) sample is imaged, a segmentation algorithm selects nucleic acid positive (DAPI+), cytokeratin-phycoerythrin expressing (CK-PE+) events for further review by an operator. Failures in this segmentation can result in missed CTCs. The CellSearch segmentation algorithm was not designed to handle samples with high cell density, such as diagnostic leukapheresis (DLA) samples. Here, we evaluate deep-learning-based segmentation method StarDist as an alternative to the CellSearch segmentation. CellSearch image archives from 533 whole blood samples and 601 DLA samples were segmented using CellSearch and StarDist and inspected visually. In 442 blood samples from cancer patients, StarDist segmented 99.95% of CTC segmented by CellSearch, produced good outlines for 98.3% of these CTC, and segmented 10% more CTC than CellSearch. Visual inspection of the segmentations of DLA images showed that StarDist continues to perform well when the cell density is very high, whereas CellSearch failed and generated extremely large segmentations (up to 52% of the sample surface). Moreover, in a detailed examination of seven DLA samples, StarDist segmented 20% more CTC than CellSearch. Segmentation is a critical first step for CTC enumeration in dense samples and StarDist segmentation convincingly outperformed CellSearch segmentation.
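StarDist has a public Python API with pretrained 2D models, so the inference step described above can be sketched as follows; the paper's CellSearch-specific training data and thresholds are not reproduced, and the input here is a random placeholder for a DAPI image.

```python
# Requires: pip install stardist csbdeep (downloads the pretrained model on first use).
import numpy as np
from csbdeep.utils import normalize
from stardist.models import StarDist2D

model = StarDist2D.from_pretrained('2D_versatile_fluo')   # pretrained fluorescence model
img = np.random.rand(256, 256)                            # placeholder for a DAPI channel
labels, details = model.predict_instances(normalize(img, 1, 99.8))
print(labels.max(), 'objects found')
```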
The thalamus and its nuclei are largely indistinguishable on standard T1 or T2 weighted MRI. While diffusion tensor imaging based methods have been proposed to segment the thalamic nuclei based on the angular orientation of the principal diffusion tensor, these are based on echo planar imaging which is inherently limited in spatial resolution and suffers from distortion. We present a multi-atlas segmentation technique based on white-matter-nulled MP-RAGE imaging that segments the thalamus into 12 nuclei with computation times on the order of 10 min on a desktop PC; we call this method THOMAS (THalamus Optimized Multi Atlas Segmentation). THOMAS was rigorously evaluated on 7T MRI data acquired from healthy volunteers and patients with multiple sclerosis by comparing against manual segmentations delineated by a neuroradiologist, guided by the Morel atlas. Segmentation accuracy was very high, with uniformly high Dice indices: at least 0.85 for large nuclei like the pulvinar and mediodorsal nuclei and at least 0.7 even for small structures such as the habenular, centromedian, and lateral and medial geniculate nuclei. Volume similarity indices ranged from 0.82 for the smaller nuclei to 0.97 for the larger nuclei. Volumetry revealed that the volumes of the right anteroventral, right ventral posterior lateral, and both right and left pulvinar nuclei were significantly lower in MS patients compared to controls, after adjusting for age, sex and intracranial volume. Lastly, we evaluated the potential of this method for targeting the Vim nucleus for deep brain surgery and focused ultrasound thalamotomy by overlaying the Vim nucleus segmented from pre-operative data on post-operative data. The locations of the ablated region and active DBS contact corresponded well with the segmented Vim nucleus. Our fast, direct structural MRI based segmentation method opens the door for MRI guided intra-operative procedures like thalamotomy and asleep DBS electrode placement as well as for accurate quantification of thalamic nuclear volumes to follow progression of neurological disorders.
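The simplest form of multi-atlas label fusion is a per-voxel majority vote over atlases already registered to the subject, sketched below; THOMAS's actual fusion and its white-matter-nulled registration pipeline are more involved.

```python
import numpy as np

def majority_vote_fusion(atlas_labels):
    """Fuse N registered atlas segmentations by per-voxel majority vote.
    atlas_labels: list of integer label volumes already warped to subject space."""
    stacked = np.stack(atlas_labels)                  # (N, X, Y, Z)
    n_labels = int(stacked.max()) + 1
    votes = np.zeros((n_labels,) + stacked.shape[1:], dtype=np.int32)
    for lab in range(n_labels):
        votes[lab] = (stacked == lab).sum(axis=0)     # count atlases voting for lab
    return votes.argmax(axis=0)                       # winning label per voxel

atlases = [np.random.randint(0, 13, (8, 8, 8)) for _ in range(5)]  # 12 nuclei + background
print(majority_vote_fusion(atlases).shape)
```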
We introduce a method for training neural networks to perform image or volume segmentation in which prior knowledge about the topology of the segmented object can be explicitly provided and then incorporated into the training process. By using the differentiable properties of persistent homology, a concept used in topological data analysis, we can specify the desired topology of segmented objects in terms of their Betti numbers and then drive the proposed segmentations to contain the specified topological features. Importantly this process does not require any ground-truth labels, just prior knowledge of the topology of the structure being segmented. We demonstrate our approach in four experiments: one on MNIST image denoising and digit recognition, one on left ventricular myocardium segmentation from magnetic resonance imaging data from the UK Biobank, one on the ACDC public challenge dataset and one on placenta segmentation from 3-D ultrasound. We find that embedding explicit prior knowledge in neural network segmentation tasks is most beneficial when the segmentation task is especially challenging and that it can be used in either a semi-supervised or post-processing context to extract a useful training gradient from images without pixelwise labels.
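A common way to write such a persistence-based loss (notation assumed here, not taken verbatim from the paper) encourages the desired number of k-dimensional topological features to be maximally persistent and suppresses all others:

```latex
\mathcal{L}_{\mathrm{topo}}
  = \sum_{k} \left[
      \sum_{i=1}^{\beta_k^{*}} \bigl(1 - (d_{k,i} - b_{k,i})\bigr)^{2}
    + \sum_{i=\beta_k^{*}+1}^{N_k} \bigl(d_{k,i} - b_{k,i}\bigr)^{2}
    \right]
```

Here (b_{k,i}, d_{k,i}) are the birth and death values of the i-th most persistent k-dimensional feature of the predicted probability map and beta*_k is the desired Betti number; the differentiability of the birth and death values with respect to the network output is what allows this term to be minimized by gradient descent without pixelwise labels.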
Magnetic resonance imaging (MRI) is the gold standard for delineating cancerous lesions in soft tissue. Catheter-based interventions require the accurate placement of multiple long, flexible catheters at the target site. The manual segmentation of catheters in MR images is a challenging and time-consuming task, and there is a need for automated catheter segmentation to improve the efficiency of MR-guided procedures. This work aimed to develop and assess a machine learning algorithm for the detection of multiple catheters in magnetic resonance images used during catheter-based interventions. A 3D U-Net was trained to retrospectively segment catheters in scans acquired during clinical MR-guided high dose rate (HDR) prostate brachytherapy cases. To assess confidence in segmentation, multiple AI models were trained. On clinical test cases, average segmentation results were used to plan the brachytherapy delivery, and dosimetric parameters were compared to the original clinical plan. Data were obtained from 35 patients who underwent HDR prostate brachytherapy for focal disease, with a total of 214 image volumes. 185 image volumes from 30 patients were used for training using a five-fold cross validation split to divide the data for training and validation. To generate confidence measures of segmentation accuracy, five trained models were generated. The remaining five patients (29 volumes) were used to test the performance of the trained model by comparison to manual segmentations of three independent observers and assessment of dosimetric impact on the final clinical brachytherapy plans. The network successfully identified 95% of catheters in the test set at a rate of 0.89 s per volume. The multi-model method identified the small number of cases where AI segmentation of individual catheters was poor, flagging the need for user input. AI-based segmentation performed as well as segmentations by independent observers. Plan dosimetry using AI-segmented catheters was comparable to the original plan. The vast majority of catheters were accurately identified by AI segmentation, with minimal impact on plan outcomes. The use of multiple AI models provided confidence in the segmentation accuracy and identified catheter segmentations that required further manual assessment. Real-time AI catheter segmentation can be used during MR-guided insertions to assess deflections and for rapid planning of prostate brachytherapy.
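The multi-model confidence idea can be sketched as an ensemble whose per-voxel vote agreement decides whether a case is flagged for manual review; the 0.95 threshold below is an illustrative choice, not the paper's.

```python
import numpy as np

def ensemble_segment(prob_maps, flag_thresh=0.95):
    """Average K per-model probability maps into one segmentation and flag
    cases where the models disagree on too many voxels (threshold illustrative)."""
    probs = np.stack(prob_maps)                        # (K, ...) foreground probabilities
    mean_prob = probs.mean(axis=0)
    seg = mean_prob > 0.5                              # consensus segmentation
    votes = (probs > 0.5).mean(axis=0)                 # per-voxel agreement in [0, 1]
    unanimity = np.mean((votes == 0.0) | (votes == 1.0))  # fraction of unanimous voxels
    needs_review = unanimity < flag_thresh
    return seg, needs_review

maps = [np.random.rand(16, 16, 16) for _ in range(5)]
seg, review = ensemble_segment(maps)
print(seg.shape, review)
```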
The aim of this research is to propose a new neural network based method for medical image segmentation. Firstly, a modified self-organizing map (SOM) network, named moving average SOM (MA-SOM), is utilized to segment medical images. After the initial segmentation stage, a merging process is designed to connect the objects of a joint cluster together. A two-dimensional (2D) discrete wavelet transform (DWT) is used to build the input feature space of the network. The experimental results show that MA-SOM is robust to noise and it determines the input image pattern properly. The segmentation results of breast ultrasound images (BUS) demonstrate that there is a significant correlation between the tumor region selected by a physician and the tumor region segmented by our proposed method. In addition, the proposed method segments X-ray computerized tomography (CT) and magnetic resonance (MR) head images much better than the incremental supervised neural network (ISNN) and SOM-based methods.
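Building the DWT input feature space is straightforward with PyWavelets; the sketch below stacks the four one-level subbands, upsampled back to image size, as per-pixel features. A Haar wavelet is assumed here, which may not match the paper's choice.

```python
import numpy as np
import pywt

def dwt_feature_space(img):
    """Per-pixel feature space from a one-level 2D DWT, in the spirit of the
    MA-SOM input features described above (Haar wavelet assumed)."""
    cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')
    # Upsample each subband back to image size and stack as feature channels.
    feats = [np.kron(c, np.ones((2, 2)))[:img.shape[0], :img.shape[1]]
             for c in (cA, cH, cV, cD)]
    return np.stack(feats, axis=-1)        # (H, W, 4) feature vectors

img = np.random.rand(64, 64)
print(dwt_feature_space(img).shape)        # (64, 64, 4)
```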
Medical image segmentation is a paramount task for several clinical applications, namely for the diagnosis of pathologies, for treatment planning, and for aiding image-guided surgeries. With the development of deep learning, Convolutional Neural Networks (CNN) have become the state-of-the-art for medical image segmentation. However, issues are still raised concerning the precise object boundary delineation, since traditional CNNs can produce non-smooth segmentations with boundary discontinuities. In this work, a U-shaped CNN architecture is proposed to generate both pixel-wise segmentation and probabilistic contour maps of the object to segment, in order to generate reliable segmentations at the object's boundaries. Moreover, since the segmentation and contour maps must be inherently related to each other, a dual consistency loss that relates the two outputs of the network is proposed. Thus, the network is forced to consistently learn the segmentation and contour delineation tasks during training. The proposed method was applied and validated on a public dataset of cardiac 3D ultrasound images of the left ventricle. The results obtained showed the good performance of the method and its applicability for the cardiac dataset, showing its potential to be used in clinical practice for medical image segmentation. Clinical Relevance: The proposed network with its dual consistency loss scheme can improve the performance of state-of-the-art CNNs for medical image segmentation, proving its value for computer-aided diagnosis.
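A sketch of the dual-output training objective under stated assumptions: the contour of the predicted segmentation is approximated by a soft morphological gradient, and a consistency term ties it to the predicted contour map. The paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def edge_map(prob):
    """Soft contour of a probability map via a morphological-gradient-like op:
    dilation minus erosion, both implemented with max pooling."""
    dil = F.max_pool2d(prob, 3, stride=1, padding=1)
    ero = -F.max_pool2d(-prob, 3, stride=1, padding=1)
    return dil - ero

def dual_consistency_loss(seg_pred, cont_pred, seg_gt, cont_gt, lam=1.0):
    """Segmentation loss + contour loss + a consistency term tying the
    segmentation-derived contour to the predicted contour map."""
    l_seg = F.binary_cross_entropy(seg_pred, seg_gt)
    l_cont = F.binary_cross_entropy(cont_pred, cont_gt)
    l_cons = F.mse_loss(edge_map(seg_pred), cont_pred)
    return l_seg + l_cont + lam * l_cons

seg_pred = torch.rand(2, 1, 64, 64); cont_pred = torch.rand(2, 1, 64, 64)
seg_gt = (torch.rand(2, 1, 64, 64) > 0.5).float(); cont_gt = edge_map(seg_gt)
print(dual_consistency_loss(seg_pred, cont_pred, seg_gt, cont_gt).item())
```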
The watershed algorithm is widely used in image segmentation, but it is prone to over-segmentation. Therefore, an image segmentation algorithm based on
Numerous algorithms are available for segmenting medical images. Empirical discrepancy metrics are commonly used in measuring the similarity or difference between segmentations by algorithms and "true" segmentations. However, one issue with the commonly used metrics is that the same metric value often represents different levels of "clinical acceptability" for different objects depending on their size, shape, and complexity of form. An ideal segmentation evaluation metric should be able to reflect degrees of acceptability directly from metric values and be able to show the same acceptability meaning by the same metric value for objects of different shape, size, and form. Intuitively, metrics which have a linear relationship with degree of acceptability will satisfy these conditions of the ideal metric. This issue has not been addressed in the medical image segmentation literature. In this paper, we propose a method called LinSEM for linearizing commonly used segmentation evaluation metrics based on corresponding degrees of acceptability evaluated by an expert in a reader study. LinSEM consists of two main parts: (a) estimating the relationship between metric values and degrees of acceptability separately for each considered metric and object, and (b) linearizing any given metric value corresponding to a given segmentation of an object based on the estimated relationship. Since algorithmic segmentations do not usually cover the full range of variability of acceptability, we create a set (S
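Once the metric-to-acceptability curve has been estimated from a reader study, the linearization step itself is a monotone one-dimensional mapping; a toy sketch with invented curve values:

```python
import numpy as np

# Toy LinSEM-style linearization: map raw metric values onto an acceptability
# scale using a monotone curve estimated from a reader study (values invented).
dice_knots   = np.array([0.0, 0.5, 0.7, 0.85, 0.95, 1.0])   # raw Dice values
accept_knots = np.array([0.0, 0.1, 0.3, 0.60, 0.90, 1.0])   # expert acceptability

def linearize(dice):
    """Piecewise-linear mapping of a Dice value onto the acceptability scale,
    so that equal metric differences mean equal acceptability differences."""
    return np.interp(dice, dice_knots, accept_knots)

print(linearize(0.80))   # interpolated acceptability for Dice = 0.80
```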
Background: Image segmentation is a fundamental technique that allows researchers to process images from various sources into individual components for certain applications, such as visual or numerical evaluations. Image segmentation is beneficial when studying medical images for healthcare purposes. However, existing semantic image segmentation models like the U-net are computationally intensive. This work aimed to develop less complicated models that could still accurately segment images. Methodology: Rule-based and linear layer neural network models were developed in Mathematica and trained on mouse vertebrae micro-computed tomography scans. These models were tasked with segmenting the cortical shell from the whole bone image. A U-net model was also set up for comparison. Results: It was found that the linear layer neural network had comparable accuracy to the U-net model in segmenting the mice vertebrae scans. Conclusions: This work provides two separate models that allow for automated segmentation of mouse vertebral scans, which could be potentially valuable in applications such as pre-processing the murine vertebral scans for further evaluations of the effect of drug treatment on bone micro-architecture.
Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost and maintain satisfactory performance, in this work, we leverage the capabilities of SAM for establishing semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness, efficiency, and compatibility, we propose a three-stage framework, i.e., Stitching, Fine-tuning, and Re-training (SFR). The current fine-tuning approaches mostly involve 2D slice-wise fine-tuning that disregards the contextual information between adjacent slices. Our stitching strategy mitigates the mismatch between natural and 3D medical images. The stitched images are then used for fine-tuning SAM, providing robust initialization of pseudo-labels. Afterwards, we train a 3D semi-supervised segmentation model while maintaining the same parameter size as a conventional segmenter such as V-Net. Our SFR framework is plug-and-play, and easily compatible with various popular semi-supervised methods. We also develop an extended framework SFR+ with selective fine-tuning and re-training through confidence estimation. Extensive experiments validate that our SFR and SFR+ achieve significant improvements in both moderate annotation and scarce annotation across five datasets. In particular, the SFR framework improves the Dice score of Mean Teacher from 29.68% to 74.40% with only one labeled scan of the LA dataset. The code is available at https://github.com/ShumengLI/SFR.
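The stitching step can be sketched as tiling consecutive 2D slices of a volume into larger 2D images so they better match the scale statistics of the natural images SAM was trained on; the 2x2 grid below is an illustrative choice.

```python
import numpy as np

def stitch_slices(volume, grid=(2, 2)):
    """Tile consecutive 2D slices of a 3D volume into large 2D images,
    the kind of stitching the SFR abstract describes (grid size illustrative)."""
    gy, gx = grid
    d, h, w = volume.shape
    n = gy * gx
    pad = (-d) % n                                    # pad so depth divides evenly
    vol = np.concatenate([volume, np.zeros((pad, h, w), volume.dtype)])
    out = []
    for i in range(0, vol.shape[0], n):
        block = vol[i:i + n].reshape(gy, gx, h, w)
        out.append(block.transpose(0, 2, 1, 3).reshape(gy * h, gx * w))
    return np.stack(out)                              # (ceil(d/n), gy*h, gx*w)

vol = np.random.rand(9, 64, 64)
print(stitch_slices(vol).shape)   # (3, 128, 128)
```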
The analysis and segmentation of articular cartilage magnetic resonance (MR) images is one of the most common routine tasks in diagnostics of the musculoskeletal system of the knee area. Conventional regional segmentation methods, based either on histogram partitioning (e.g., the Otsu method) or on clustering (e.g., K-means), have frequently been used for this task. Such methods are known to be fast and to work well when cartilage image features are reliably recognizable, but their performance is prone to image noise and artefacts. In this context, regional segmentation strategies driven by genetic algorithms or selected evolutionary computing strategies have the potential to outperform traditional methods such as Otsu thresholding or K-means. These optimization strategies consecutively generate a pyramid of possible sets of histogram thresholds, whose quality is evaluated using a fitness function based on Kapur's entropy maximization, to find the most optimal combination of thresholds for articular cartilage segmentation. On the other hand, such optimization strategies are often computationally demanding, which limits their use on a stack of MR images. In this study, we publish a comprehensive analysis of optimization methods based on fuzzy soft segmentation, driven by artificial bee colony (ABC), particle swarm optimization (PSO), Darwinian particle swarm optimization (DPSO), and a genetic algorithm for optimal threshold selection, against the routine Otsu and K-means segmentations, for the analysis and feature extraction of articular cartilage from MR images. This study objectively analyzes the performance of the segmentation strategies under variable noise with dynamic intensities, reporting segmentation robustness in various image conditions for different numbers of segmentation classes (4, 7, and 10), the precision of cartilage feature extraction (area, perimeter, and skeleton) against the routine segmentation strategies, and lastly the computing time, an important factor of segmentation performance. We use the same settings for the individual optimization strategies: 100 iterations and a population size of 50. This study suggests that the combination of fuzzy thresholding with the ABC algorithm gives the best performance among the compared methods, both under additive dynamic noise and for cartilage feature extraction. On the other hand, genetic algorithms for cartilage segmentation in some cases do not perform well. In most cases, the analyzed optimization strategies significantly outperform the routine segmentation methods except in computing time, which is normally lower for the routine algorithms. We also publish statistical tests of significance, showing differences in the performance of the individual optimization strategies against the Otsu and K-means methods. Lastly, as part of this study, we publish a software environment integrating all the methods from this study.
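The fitness function shared by all the optimizers in this study is Kapur's entropy: the sum of Shannon entropies of the grey-level classes induced by a candidate threshold set. A minimal sketch:

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    """Kapur's entropy fitness for a threshold set: the sum of the Shannon
    entropies of the normalized grey-level distributions of each class."""
    p = hist / hist.sum()
    edges = [0] + sorted(int(t) for t in thresholds) + [len(p)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()                 # class probability mass
        if w <= 0:
            continue
        q = p[lo:hi] / w                   # within-class distribution
        q = q[q > 0]
        total += -(q * np.log(q)).sum()
    return total

hist, _ = np.histogram(np.random.randint(0, 256, 10000), bins=256, range=(0, 256))
print(kapur_entropy(hist, [64, 128, 192]))   # fitness of a 4-class thresholding
```

An ABC, PSO, DPSO, or GA optimizer then simply searches for the threshold vector that maximizes this value.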
Anatomical image segmentation is one of the foundations for medical planning. Recently, convolutional neural networks (CNN) have achieved much success in segmenting volumetric (3D) images when a large number of fully annotated 3D samples are available. However, a volumetric medical image dataset containing a sufficient number of segmented 3D images is rarely accessible, since providing manual segmentation masks is monotonous and time-consuming. Thus, to alleviate the burden of manual annotation, we attempt to effectively train a 3D CNN using a sparse annotation where ground truth on just one 2D slice of the axial axis of each training 3D image is available. To tackle this problem, we propose a self-training framework that alternates between two steps: assigning pseudo annotations to unlabeled voxels, and updating the 3D segmentation network by employing both the labeled and pseudo-labeled voxels. To produce pseudo-labels more accurately, we benefit both from the propagation of labels (or pseudo-labels) between adjacent slices and from 3D processing of voxels. More precisely, a 2D registration-based method is proposed to gradually propagate labels between consecutive 2D slices, and a 3D U-Net is employed to utilize volumetric information. Ablation studies on benchmarks show that cooperation between the 2D registration and the 3D segmentation provides accurate pseudo-labels that enable the segmentation network to be trained effectively even when only one expert-segmented slice is available for each training sample. Our method is assessed on the CHAOS and Visceral datasets to segment abdominal organs. Results demonstrate that, despite utilizing just one segmented slice for each 3D image (weaker supervision than the compared weakly supervised methods), our approach achieves higher performance and comes close to the fully supervised setting.
Objective. To solve one of the most difficult problems in the multi-dimensional reconstruction of medical ultrasonic images: image segmentation. Method. A new segmentation method based on a hybrid neural network was presented in this paper. The hybrid neural network comprised two phases. The first phase was a Kohonen self-organizing neural network, which was used to segment and label the image coarsely. The feature vectors of those pixels within a specified distance from the cluster centers were employed to train the second phase, a three-layer perceptron network using the back-propagation (BP) technique. The trained BP network was then used to label every pixel of the image. In the end, a post-processing stage was used to remove small isolated points and smooth the contours of the segmented image. Result. The segmented image had smooth continuous edges, little noise or speckle, and the contour of the ventricle was clear and accurate. Conclusion. Our method could segment ultrasonic images accurately and effectively and had many advantages compared to traditional methods. Unsupervised segmentation problems could thus be solved using supervised methods.
The objective of this study was to develop an automated segmentation method for the anterior cruciate ligament that is capable of facilitating quantitative assessments of the ligament in clinical and research settings. A modified U-Net fully convolutional network model was trained, validated, and tested on 246 Constructive Interference in Steady State magnetic resonance images of intact anterior cruciate ligaments. Overall model performance was assessed on the image set relative to an experienced (>5 years) "ground truth" segmenter in two domains: anatomical similarity and the accuracy of quantitative measurements (i.e., signal intensity and volume) obtained from the automated segmentation. To establish model reliability relative to manual segmentation, a subset of the imaging data was resegmented by the ground truth segmenter and two additional segmenters (A, 6 months and B, 2 years of experience), with their performance evaluated relative to the ground truth. The final model scored well on anatomical performance metrics (Dice coefficient = 0.84, precision = 0.82, and sensitivity = 0.85). The median signal intensities and volumes of the automated segmentations were not significantly different from ground truth (0.3% difference, p = .9; 2.3% difference, p = .08, respectively). When the model results were compared with the independent segmenters, the model predictions demonstrated greater median Dice coefficient (A = 0.73, p = .001; B = 0.77, p = NS) and sensitivity (A = 0.68, p = .001; B = 0.72, p = .003). The model performed equivalently well to retest segmentation by the ground truth segmenter on all measures. The quantitative measures extracted from the automated segmentation model did not differ from those of manual segmentation, enabling their use in quantitative magnetic resonance imaging pipelines to evaluate the anterior cruciate ligament.
Cardiovascular diseases related to the right side of the heart, such as Pulmonary Hypertension, are some of the leading causes of death among the Mexican (and worldwide) population. To avoid invasive techniques such as catheterizing the heart, improving the segmenting performance of medical echocardiographic systems can be an option to early detect diseases related to the right-side of the heart. While current medical imaging systems perform well segmenting automatically the left side of the heart, they typically struggle segmenting the right-side cavities. This paper presents a robust cardiac segmentation algorithm based on the popular U-NET architecture capable of accurately segmenting the four cavities with a reduced training dataset. Moreover, we propose two additional steps to improve the quality of the results in our machine learning model, 1) a segmentation algorithm capable of accurately detecting cone shapes (as it has been trained and refined with multiple data sources) and 2) a post-processing step which refines the shape and contours of the segmentation based on heuristics provided by the clinicians. Our results demonstrate that the proposed techniques achieve segmentation accuracy comparable to state-of-the-art methods in datasets commonly used for this practice, as well as in datasets compiled by our medical team. Furthermore, we tested the validity of the post-processing correction step within the same sequence of images and demonstrated its consistency with manual segmentations performed by clinicians.
Accurate segmentation of brain tumors, and quantification of tumor volume, is important for diagnosis, monitoring, and planning therapeutic intervention. Manual segmentation is not widely used because of time constraints. Previous efforts have mainly produced methods that are tailored to a particular type of tumor or acquisition protocol and have mostly failed to produce a method that functions on different tumor types and is robust to changes in scanning parameters, resolution, and image quality, thereby limiting their clinical value. Herein, we present a semiautomatic method for tumor segmentation that is fast, accurate, and robust to a wide variation in image quality and resolution. A semiautomatic segmentation method based on the geodesic distance transform was developed and validated by using it to segment 54 brain tumors. Glioblastomas, meningiomas, and brain metastases were segmented. Qualitative validation was based on physician ratings provided by three clinical experts. Quantitative validation was based on comparing semiautomatic and manual segmentations. Tumor segmentations obtained using manual and automatic methods were compared quantitatively using the Dice measure of overlap. Subjective evaluation was performed by having human experts rate the computerized segmentations on a 0-5 rating scale where 5 indicated perfect segmentation. The proposed method addresses a significant, unmet need in the field of neuro-oncology. Specifically, this method enables clinicians to obtain accurate and reproducible tumor volumes without the need for manual segmentation.
Clinically, physicians diagnose portal vein diseases on abdominal CT angiography (CTA) images scanned in the hepatic arterial phase (H-phase), portal vein phase (P-phase) and equilibrium phase (E-phase) simultaneously. However, existing studies typically segment the portal vein on P-phase images without considering other phase images. We propose a method for segmenting portal veins on multiphase images based on unsupervised domain transfer and pseudo labels by using annotated P-phase images. Firstly, unsupervised domain transfer is performed to make the H-phase and E-phase images of the same patient approach the P-phase image in style, reducing the image differences caused by contrast media. Secondly, the H-phase (or E-phase) image and its style transferred image are input into the segmentation module together with the P-phase image. Under the constraints of pseudo labels, accurate prediction results are obtained. This method was evaluated on the multiphase CTA images of 169 patients. The portal vein segmented from the H-phase and E-phase images achieved DSC values of 0.76 and 0.86 and Jaccard values of 0.61 and 0.76, respectively. The method can automatically segment the portal vein on H-phase and E-phase images when only the portal vein on the P-phase CTA image is annotated, which greatly assists in clinical diagnosis.
Image segmentation is a fundamental problem in early computer vision. In segmentation of flat shaded, nontextured objects in real-world images, objects are usually assumed to be piecewise homogeneous. This assumption, however, is not always valid with images such as medical images. As a result, any techniques based on this assumption may produce less-than-satisfactory image segmentation. In this work, we relax the piecewise homogeneous assumption. By assuming that the intensity nonuniformity is smooth in the imaged objects, a novel algorithm that exploits the coherence in the intensity profile to segment objects is proposed. The algorithm uses a novel smoothness prior to improve the quality of image segmentation. The formulation of the prior is based on the coherence of the local structural orientation in the image. The segmentation process is performed in a Bayesian framework. Local structural orientation estimation is obtained with an orientation tensor. Comparisons between the conventional Hessian matrix and the orientation tensor have been conducted. The experimental results on the synthetic images and the real-world images have indicated that our novel segmentation algorithm produces better segmentations than both the global thresholding with the maximum likelihood estimation and the algorithm with the multilevel logistic MRF model.
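The orientation tensor used for the smoothness prior is closely related to the classical structure tensor; a minimal sketch of estimating local structural orientation with scikit-image, assuming a 2D grayscale image (the function name and sigma are illustrative):

```python
import numpy as np
from skimage.feature import structure_tensor

def local_orientation(image: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Estimate the local structural orientation (radians) from the
    smoothed structure/orientation tensor of a 2D image."""
    Arr, Arc, Acc = structure_tensor(image, sigma=sigma, order='rc')
    # Dominant orientation of the local intensity structure (rows = y, cols = x).
    return 0.5 * np.arctan2(2.0 * Arc, Acc - Arr)
```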
Image-guided surgical navigation systems are among the most advanced surgical apparatus, developing rapidly and holding great application prospects in neurosurgery, orthopaedics, otolaryngology (E.N.T.), and other departments. In current surgical navigation systems, windowing, segmentation, and registration of medical images all depend on manual operation, so automation of image processing is urgently needed. This paper proposes an algorithm that performs automatic windowing and segmentation of medical images well. First, we analyze a large number of MRI and CT images and propose a corresponding windowing algorithm according to the common features of their intensity distributions; experiments show that the windowing of most MRI and CT images is optimized. Second, we propose a seed-growing algorithm based on intensity connectivity, which can segment a tumor and its boundary accurately with a simple mouse click and control the results dynamically in real time. If computer memory permits, the algorithm can segment 3D images directly. Tests show that this function shortens surgical planning time, lowers complexity, and improves efficiency in navigation surgery.
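The seed-growing step based on intensity connectivity is essentially tolerance-based flood filling from a clicked point; a minimal sketch using scikit-image's flood, which works on 2D slices and 3D volumes alike (the seed coordinates and tolerance are illustrative):

```python
import numpy as np
from skimage.segmentation import flood

def grow_from_seed(image: np.ndarray, seed: tuple, tolerance: float = 50.0) -> np.ndarray:
    """Intensity-connectivity region growing from a user-clicked seed.
    Returns a boolean mask; works on 2D slices or full 3D volumes alike."""
    return flood(image, seed_point=seed, tolerance=tolerance)

# Example: grow a region in a 3D CT volume from a clicked voxel.
volume = np.random.randint(0, 255, size=(32, 128, 128))
mask = grow_from_seed(volume, seed=(16, 64, 64), tolerance=30.0)
```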
Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images, and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class-specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provide an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and sparsity level.
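The minimum-residual decision rule of conventional SRC (which mSRC generalizes by pooling multiple representations per dictionary) can be sketched with scikit-learn's sparse coder; the dictionary shapes and helper name are assumptions:

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def src_classify(patch: np.ndarray, dictionaries: dict, n_nonzero: int = 5) -> str:
    """Sparse-representation classification: sparse-code the patch with each
    class dictionary (via OMP) and pick the class with minimum residual energy."""
    residuals = {}
    for label, D in dictionaries.items():  # D: (n_atoms, patch_dim)
        code = sparse_encode(patch[None, :], D, algorithm='omp',
                             n_nonzero_coefs=n_nonzero)
        residuals[label] = np.linalg.norm(patch - code @ D)
    return min(residuals, key=residuals.get)
```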
Advancements in deep learning techniques have proved useful in biomedical image segmentation. However, the large amount of unlabeled data inherent in biomedical imagery, particularly in digital pathology, creates a semi-supervised learning paradigm. Specifically, because of the time-consuming nature of producing pixel-wise annotations and the high cost of having a pathologist dedicate time to labeling, there is a large amount of unlabeled data that we wish to utilize in training segmentation algorithms. Pseudo-labeling is one method to leverage the unlabeled data to increase overall model performance. We adapt a method used for image-classification pseudo-labeling to select images for segmentation pseudo-labeling and apply it to 3 digital pathology datasets. To select images for pseudo-labeling, we create and explore different thresholds for confidence and uncertainty on an image-level basis. Furthermore, we study the relationship of image-level uncertainty and confidence with model performance. We find that the certainty metrics do not consistently correlate with performance in the intuitive way, and that abnormal correlations serve as an indicator of a model's ability to produce pseudo-labels that are useful in training. Clinical relevance - The proposed approach adapts image-level confidence and uncertainty measures for segmentation pseudo-labeling on digital pathology datasets. Increased model performance enables better disease quantification for histopathology.
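A minimal sketch of image-level selection for pseudo-labeling along the lines described, using mean max-probability as confidence and mean entropy as uncertainty; the thresholds and metric choices are illustrative, not the paper's exact criteria:

```python
import numpy as np

def select_for_pseudolabeling(prob_maps, conf_thresh=0.9, unc_thresh=0.1):
    """Pick unlabeled images whose softmax maps are confident enough
    (mean max-probability) and certain enough (mean entropy) to pseudo-label."""
    selected = []
    for idx, p in enumerate(prob_maps):           # p: (C, H, W) softmax output
        confidence = p.max(axis=0).mean()         # image-level confidence
        entropy = -(p * np.log(p + 1e-8)).sum(axis=0).mean()  # image-level uncertainty
        if confidence >= conf_thresh and entropy <= unc_thresh:
            selected.append(idx)
    return selected
```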
Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application. We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented. With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.
Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, the study of MIM for ultrasound imaging remains relatively unexplored, and most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as the intrinsic imaging characteristics of the ultrasound modality, such as the high noise-to-signal ratio. In this paper, motivated by the unique high noise-to-signal ratio property in ultrasound, we propose a deblurring MIM approach specialized to ultrasound, which incorporates a deblurring task into the pretraining proxy task. The incorporation of deblurring facilitates the pretraining to better recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for the pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodule, Hashimoto's thyroiditis) and task types (classification, segmentation). The experimental results demonstrate the efficacy of the proposed deblurring MIM, achieving state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
Automatic chest X-ray (CXR) disease classification has drawn increasing public attention as CXR is widely used in thoracic disease diagnosis. Existing classification networks typically employ a global average pooling layer to produce the final feature for the subsequent classifier. This limits the classification performance owing to the characteristics of lesions in CXR images, including small relative sizes, varied absolute sizes, and different occurrence locations. In this study, we propose a pixel-wise classification and attention network (PCAN) to simultaneously perform disease classification and weakly supervised localization, which provides interpretability for disease classification. The PCAN comprises a backbone network for extracting mid-level features, a pixel-wise classification branch (pc-branch) for generating pixel-wise diagnoses, and a pixel-wise attention branch (pa-branch) for producing pixel-wise weights. The pc-branch is capable of explicitly detecting small lesions, and the pa-branch is capable of adaptively focusing on different regions when classifying different thoracic diseases. Then, the pixel-wise diagnoses are multiplied with the pixel-wise weights to obtain the disease localization map, which provides the sizes and locations of lesions in a manner of weakly supervised learning. The final image-wise diagnosis is obtained by summing up the disease localization map over the spatial dimensions. Comprehensive experiments conducted on the ChestX-ray14 and CheXpert datasets demonstrate the effectiveness of the proposed PCAN, which has great potential for thoracic disease diagnosis and treatment. The source codes are available at https://github.com/fzfs/PCAN.
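The described combination of the pc-branch and pa-branch reduces to an elementwise product followed by a spatial sum; a minimal PyTorch sketch (the sigmoid/softmax normalizations are assumptions about the activation choices, not confirmed by the abstract):

```python
import torch

def pcan_head(pixel_logits: torch.Tensor, attn_logits: torch.Tensor):
    """Combine pixel-wise diagnoses with pixel-wise weights, then sum
    spatially to get the image-wise diagnosis. Shapes: (B, C, H, W)."""
    diag = torch.sigmoid(pixel_logits)                       # pixel-wise diagnoses
    weights = torch.softmax(attn_logits.flatten(2), dim=-1)  # normalize over pixels
    weights = weights.view_as(attn_logits)
    localization = diag * weights                            # disease localization map
    image_diag = localization.sum(dim=(2, 3))                # image-wise diagnosis
    return localization, image_diag
```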
Bone Scan Index (BSI) is an image biomarker for quantifying bone metastasis of cancers. To compute BSI, not only the hotspots (metastases) but also the bones have to be segmented. Most related research focuses on binary classification in bone scintigraphy: metastasis present or absent. Few studies focus on pixel-wise segmentation. This study compares three advanced convolutional neural network (CNN)-based models for bone segmentation on an in-house dataset. The best model is Mask R-CNN, which reaches a precision, sensitivity, and F1-score of 0.93, 0.87, and 0.90 for prostate cancer patients and 0.92, 0.86, and 0.88 for breast cancer patients, respectively. The results are averages over 10-fold cross-validation, which supports the reliability of clinical use for bone segmentation.
Digital pathology adoption allows for applying computational algorithms to routine pathology tasks. Our study aimed to develop a clinical-grade artificial intelligence (AI) tool for precise multiclass tissue segmentation in colorectal specimens (resections and biopsies) and clinically validate the tool for tumor detection in biopsy specimens. The training data set included 241 precisely manually annotated whole-slide images (WSIs) from multiple institutions. The algorithm was trained for semantic segmentation of 11 tissue classes with an additional module for biopsy WSI classification. Six case cohorts from 5 pathology departments (4 countries) were used for formal and clinical validation, digitized by 4 different scanning systems. The developed algorithm showed high precision of segmentation of different tissue classes in colorectal specimens with composite multiclass Dice score of up to 0.895 and pixel-wise tumor detection specificity and sensitivity of up to 0.958 and 0.987, respectively. In the clinical validation study on multiple external cohorts, the AI tool reached sensitivity of 1.0 and specificity of up to 0.969 for tumor detection in biopsy WSI. The AI tool analyzes most biopsy cases in less than 1 minute, allowing effective integration into clinical routine. We developed and extensively validated a highly accurate, clinical-grade tool for assistive diagnostic processing of colorectal specimens. This tool allows for quantitative deciphering of colorectal cancer tissue for development of prognostic and predictive biomarkers and personalization of oncologic care. This study is a foundation for a SemiCOL computational challenge. We open-source multiple manually annotated and weakly labeled test data sets, representing a significant contribution to the colorectal cancer computational pathology field.
Thyroid nodule segmentation and classification in ultrasound images are two essential but challenging tasks for computer-aided diagnosis of thyroid nodules. Since these two tasks are inherently related to each other and share some common features, solving them jointly with multi-task learning is a promising direction. However, both previous studies and our experimental results confirm the problem of inconsistent predictions among these related tasks. In this paper, we summarize two types of task inconsistency according to the relationship among different tasks: intra-task inconsistency between homogeneous tasks (e.g., both tasks are pixel-wise segmentation tasks) and inter-task inconsistency between heterogeneous tasks (e.g., a pixel-wise segmentation task and a categorical classification task). To address the task inconsistency problems, we propose intra- and inter-task consistent learning on top of the designed multi-stage and multi-task learning network to enforce the network to learn consistent predictions for all the tasks during network training. Our experimental results based on a large clinical thyroid ultrasound image dataset indicate that the proposed intra- and inter-task consistent learning can effectively eliminate both types of task inconsistency and thus improve the performance of all tasks for thyroid nodule segmentation and classification.
Validation of automated artificial intelligence segmentation of optical coherence tomography images.
To benchmark the human and machine performance of spectral-domain (SD) and swept-source (SS) optical coherence tomography (OCT) image segmentation, i.e., pixel-wise classification, for the compartments vitreous, retina, choroid, sclera. A convolutional neural network (CNN) was trained on OCT B-scan images annotated by a senior ground truth expert retina specialist to segment the posterior eye compartments. Independent benchmark data sets (30 SDOCT and 30 SSOCT) were manually segmented by three classes of graders with varying levels of ophthalmic proficiencies. Nine graders contributed to benchmark an additional 60 images in three consecutive runs. Inter-human and intra-human class agreement was measured and compared to the CNN results. The CNN training data consisted of a total of 6210 manually segmented images derived from 2070 B-scans (1046 SDOCT and 1024 SSOCT; 630 C-Scans). The CNN segmentation revealed a high agreement with all grader groups. For all compartments and groups, the mean Intersection over Union (IOU) score of CNN compartmentalization versus group graders' compartmentalization was higher than the mean score for intra-grader group comparison. The proposed deep learning segmentation algorithm (CNN) for automated eye compartment segmentation in OCT B-scans (SDOCT and SSOCT) is on par with manual segmentations by human graders.
To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting model used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the classification method, DICE overlap ratios are used to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT images with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.
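A minimal scikit-learn sketch of the pixelwise SVM step described above; the feature vectors, labels, and helper name are hypothetical placeholders, not the paper's actual features:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical per-pixel training features (e.g., local intensity statistics)
# and labels: 0 = fat, 1 = glandular. Real features would come from the
# bilateral-filtered, cupping-corrected breast CT images.
X_train = np.random.rand(1000, 4)
y_train = np.random.randint(0, 2, size=1000)
svm = SVC(kernel='rbf').fit(X_train, y_train)

def classify_pixels(pixel_features: np.ndarray, image_shape: tuple) -> np.ndarray:
    """Predict a pixelwise fat/glandular classification map."""
    return svm.predict(pixel_features).reshape(image_shape)
```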
Early prostate cancer detection and staging from MRI is extremely challenging for both radiologists and deep learning algorithms, but the potential to learn from large and diverse datasets remains a promising avenue to increase their performance within and across institutions. To enable this for prototype-stage algorithms, where the majority of existing research remains, we introduce a flexible federated learning framework for cross-site training, validation, and evaluation of custom deep learning prostate cancer detection algorithms. We introduce an abstraction of prostate cancer ground truth that represents diverse annotation and histopathology data. We maximize use of this ground truth when it is available using UCNet, a custom 3D UNet that enables simultaneous supervision of pixel-wise, region-wise, and gland-wise classification. We leverage these modules to perform cross-site federated training using 1400+ heterogeneous multi-parametric prostate MRI exams from two university hospitals. We observe a positive result, with significant improvements in cross-site generalization performance and negligible intra-site performance degradation for both lesion segmentation and per-lesion binary classification of clinically significant prostate cancer. Cross-site lesion segmentation intersection-over-union (IoU) improved by 100%, while cross-site lesion classification overall accuracy improved by 9.5-14.8%, depending on the optimal checkpoint selected by each site. Federated learning can improve the generalization performance of prostate cancer detection models across institutions while protecting patient health information and institution-specific code and data. However, even more data and participating institutions are likely required to improve the absolute performance of prostate cancer classification models. To enable adoption of federated learning with limited re-engineering of federated components, we open-source our FLtools system at https://federated.ucsf.edu, including examples that can be easily adapted to other medical imaging deep learning projects.
To propose a deep-learning-based method to differentiate arteries from veins in montaged widefield OCT angiography (OCTA). Cross-sectional study. A total of 232 participants, including 109 participants with diabetic retinopathy (DR), 64 participants with branch retinal vein occlusion (BRVO), 27 participants with diabetes but without DR, and 32 healthy participants. We propose a convolutional neural network (CAVnet) to classify retinal blood vessels on montaged widefield OCTA en face images as arteries and veins. A total of 240 retinal angiograms from 88 eyes were used to train CAVnet, and 302 retinal angiograms from 144 eyes were used for testing. This method takes the OCTA images as input and outputs the segmentation results with arteries and veins down to the level of precapillary arterioles and postcapillary venules. The network also identifies their intersections. We evaluated the agreement (in pixels) between segmentation results and the manually graded ground truth using sensitivity, specificity, F1-score, and Intersection over Union (IoU). Measurements of arterial and venous caliber or tortuosity are made on our algorithm's output of healthy and diseased eyes. Classification of arteries and veins, arterial and venous caliber, and arterial and venous tortuosity. For classification and identification of arteries, the algorithm achieved average sensitivity of 95.3%, specificity of 99.6%, F1 score of 94.2%, and IoU of 89.3%. For veins, the algorithm achieved average sensitivity of 94.4%, specificity of 99.7%, F1 score of 94.1%, and IoU of 89.2%. We also achieved an average sensitivity of 76.3% in identifying intersection points. The results show CAVnet has high accuracy on differentiating arteries and veins in DR and BRVO cases. These classification results are robust across 2 instruments and multiple scan volume sizes. Outputs of CAVnet were used to measure arterial and venous caliber or tortuosity, and pixel-wise caliber and tortuosity maps were generated. Differences between healthy and diseased eyes were demonstrated, indicating potential clinical utility. The CAVnet can classify arteries and veins and their branches with high accuracy and is potentially useful in the analysis of vessel type-specific features on diseases such as branch retinal artery occlusion and BRVO.
Microvessels in vascular plaque are associated with plaque progression and are found in plaque rupture and intra-plaque hemorrhage. To analyze this characteristic of vulnerability, we developed an automated deep learning method for detecting microvessels in intravascular optical coherence tomography (IVOCT) images. A total of 8403 IVOCT image frames from 85 lesions and 37 normal segments were analyzed. Manual annotation was performed using dedicated software (OCTOPUS) previously developed by our group. Data augmentation was performed in the polar (r, θ) domain.
Automatic segmentation of breast terminal duct lobular units (TDLUs) on histopathological whole-slide images (WSIs) is crucial for the quantitative evaluation of TDLUs in the diagnostic and prognostic analysis of breast cancer. However, TDLU segmentation remains a great challenge due to its highly heterogeneous sizes, structures, and morphologies as well as the small areas on WSIs. In this study, we propose BreasTDLUSeg, an efficient coarse-to-fine two-stage framework based on multi-scale attention to achieve localization and precise segmentation of TDLUs on hematoxylin and eosin (H&E)-stained WSIs. BreasTDLUSeg consists of two networks: a superpatch-based patch-level classification network (SPPC-Net) and a patch-based pixel-level segmentation network (PPS-Net). SPPC-Net takes a superpatch as input and adopts a sub-region classification head to classify each patch within the superpatch as TDLU positive or negative. PPS-Net takes the TDLU positive patches derived from SPPC-Net as input. PPS-Net deploys a multi-scale CNN-Transformer as an encoder to learn enhanced multi-scale morphological representations and an upsampler to generate pixel-wise segmentation masks for the TDLU positive patches. We also constructed two breast cancer TDLU datasets containing a total of 530 superpatch images with patch-level annotations and 2322 patch images with pixel-level annotations to enable the development of TDLU segmentation methods. Experiments on the two datasets demonstrate that BreasTDLUSeg outperforms other state-of-the-art methods with the highest Dice similarity coefficients of 79.97% and 92.93%, respectively. The proposed method shows great potential to assist pathologists in the pathological analysis of breast cancer. An open-source implementation of our approach can be found at https://github.com/Dian-kai/BreasTDLUSeg.
We present a multimodal label-free optical measurement approach for analyzing sliced tissue biopsies by a unique combination of quantitative phase imaging and localized Raman spectroscopy. First, label-free quantitative phase imaging of the entire unstained tissue slice is performed using automated scanning. Then, pixel-wise segmentation of the tissue layers is performed by a kernelled structural support vector machine based on Haralick texture features, which are extracted from the quantitative phase profile, and used to find the best locations for performing the label-free localized Raman measurements. We use this multimodal label-free measurement approach for segmenting the urothelium in benign and malignant bladder cancer tissues by quantitative phase imaging, followed by location-guided Raman spectroscopy measurements. We then use sparse multinomial logistic regression (SMLR) on the Raman spectroscopy measurements to classify the tissue types, demonstrating that the prior segmentation of the urothelium done by label-free quantitative phase imaging improves the Raman spectra classification accuracy from 85.7% to 94.7%.
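Haralick-style texture features of the kind used to train the structural SVM can be derived from gray-level co-occurrence matrices; a minimal scikit-image sketch (the patch quantization, distances, and angles are illustrative, not the paper's exact configuration):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(patch: np.ndarray) -> np.ndarray:
    """GLCM-based texture features for one quantitative-phase patch,
    a stand-in for the Haralick features used to train the SVM."""
    # Quantize the (float) phase patch to 8-bit gray levels.
    patch_u8 = ((patch - patch.min()) / (np.ptp(patch) + 1e-12) * 255).astype(np.uint8)
    glcm = graycomatrix(patch_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ['contrast', 'homogeneity', 'energy', 'correlation']
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```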
Macrophage polarization into inflammatory (M1) and repairing/healing (M2) functional phenotypes is a fundamental mechanism in immune defensive responses, tissue repair, and disease control. Conventional phenotyping approaches based on molecular biomarkers are limited by destructive protocols, static endpoint analyses, and a disregard for the biomechanical attributes of cells. In this study, an integrated artificial intelligence (AI)-atomic force microscopy (AFM) platform is introduced that enables label-free mechanophenotyping of macrophages at single-cell resolution. Using nanoscale force mapping, morphological and nanomechanical profiles are captured, including details such as Young's modulus, adhesion, and sphericity, across diverse macrophage activation states. These profiles are interpreted through a deep neural network (DNN) trained with pixel-wise data enhancement and a meta-confidence estimator for dynamic, robust classification. The system accurately distinguishes naïve (M0), M1, and M2 functional phenotypes of human macrophages, even across donor heterogeneity, in the absence of conventional immunolabeling. The method reveals mixed macrophage polarization states and correlates cytoskeletal remodeling with mechanical biomarkers, establishing a direct link between cellular mechanics and immune function. This platform introduces a dynamic, non-destructive strategy for immune monitoring, redefining cellular mechanics as a critical dimension in diagnostic and therapeutic contexts, and laying the groundwork for the emerging field of mechanoimmunology.
Segmentation of histopathology sections is a necessary preprocessing step for digital pathology. Due to the large variability of biological tissue, machine learning techniques have shown superior performance over conventional image processing methods. Here we present our deep neural network-based approach for segmentation and classification of glands in tissue of benign and malignant colorectal cancer, which was developed to participate in a public gland segmentation challenge.
To develop a three-dimensional (two dimensions + time) convolutional neural network trained with displacement encoding with stimulated echoes (DENSE) data for displacement and strain analysis of cine MRI. In this retrospective multicenter study, a deep learning model (StrainNet) was developed to predict intramyocardial displacement from contour motion. Patients with various heart diseases and healthy controls underwent cardiac MRI examinations with DENSE between August 2008 and January 2022. Network training inputs were a time series of myocardial contours from DENSE magnitude images, and ground truth data were DENSE displacement measurements. Model performance was evaluated using pixelwise end-point error (EPE). For testing, StrainNet was applied to contour motion from cine MRI, and global and segmental circumferential strain (Ecc) were compared among StrainNet, conventional feature tracking (FT), and DENSE. The study included 161 patients (110 men; mean age, 61 years ± 14 [SD]), 99 healthy adults (44 men; mean age, 35 years ± 15), and 45 healthy children and adolescents (21 males; mean age, 12 years ± 3). StrainNet showed good agreement with DENSE for intramyocardial displacement, with an average EPE of 0.75 mm ± 0.35. The intraclass correlation coefficients (ICCs) between StrainNet and DENSE and between FT and DENSE were 0.87 and 0.72, respectively, for global Ecc. StrainNet outperformed FT for global and segmental Ecc analysis of cine MRI.
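Pixelwise end-point error, the evaluation metric named above, is simply the mean Euclidean norm of the displacement difference; a minimal NumPy sketch with an assumed (2, H, W) field layout:

```python
import numpy as np

def end_point_error(pred_disp: np.ndarray, true_disp: np.ndarray) -> float:
    """Mean pixelwise end-point error between a predicted displacement field
    and a reference (e.g., DENSE) field, each shaped (2, H, W) for (dx, dy)."""
    return float(np.linalg.norm(pred_disp - true_disp, axis=0).mean())
```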
Tunneling nanotubes (TNTs) are cellular structures connecting cell membranes and mediating intercellular communication. TNTs are manually identified and counted by a trained investigator; however, this process is time-intensive. We therefore sought to develop an automated approach for quantitative analysis of TNTs. We used a convolutional neural network (U-Net) deep learning model to segment phase contrast microscopy images of both cancer and non-cancer cells. Our method was composed of preprocessing and model development. We developed a new preprocessing method to label TNTs on a pixel-wise basis. Two sequential models were employed to detect TNTs. First, we identified the regions of images with TNTs by implementing a classification algorithm. Second, we fed parts of the image classified as TNT-containing into a modified U-Net model to estimate TNTs on a pixel-wise basis. The algorithm detected 49.9% of human expert-identified TNTs, counted TNTs, and calculated the number of TNTs per cell, or TNT-to-cell ratio (TCR); it also detected TNTs that were not originally detected by the experts. The model had 0.41 precision, 0.26 recall, and 0.32 F1 score on a test dataset. The predicted and true TCRs were not significantly different across the training and test datasets. Our automated approach labeled and detected TNTs and cells imaged in culture, resulting in TCRs comparable to those determined by human experts. Future studies will aim to improve the accuracy, precision, and recall of the algorithm.
Spine parsing (i.e., multi-class segmentation of vertebrae and intervertebral discs (IVDs)) for volumetric magnetic resonance (MR) images plays a significant role in the diagnosis and treatment of spinal disorders, yet it is still a challenge due to the inter-class similarity and intra-class variation of spine images. Existing fully convolutional network based methods fail to explicitly exploit the dependencies between different spinal structures. In this article, we propose a novel two-stage framework named SpineParseNet to achieve automated spine parsing for volumetric MR images. The SpineParseNet consists of a 3D graph convolutional segmentation network (GCSN) for 3D coarse segmentation and a 2D residual U-Net (ResUNet) for 2D segmentation refinement. In the 3D GCSN, region pooling is employed to project the image representation to a graph representation, in which each node representation denotes a specific spinal structure. The adjacency matrix of the graph is designed according to the connection of spinal structures. The graph representation is evolved by graph convolutions. Subsequently, the proposed region unpooling module re-projects the evolved graph representation to a semantic image representation, which facilitates the 3D GCSN to generate reliable coarse segmentation. Finally, the 2D ResUNet refines the segmentation. Experiments on T2-weighted volumetric MR images of 215 subjects show that SpineParseNet achieves impressive performance with mean Dice similarity coefficients of 87.32 ± 4.75%, 87.78 ± 4.64%, and 87.49 ± 3.81% for the segmentations of 10 vertebrae, 9 IVDs, and all 19 spinal structures, respectively. The proposed method has great potential in clinical spinal disease diagnosis and treatment.
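The graph-reasoning step can be illustrated with a generic Kipf-style graph convolution over the 19 spinal-structure nodes; a minimal PyTorch sketch (the symmetric normalization and weight shapes follow standard GCN conventions, not necessarily the paper's exact layer):

```python
import torch

def gcn_layer(X: torch.Tensor, A: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step over node features X (nodes = spinal
    structures) with adjacency A encoding which structures are connected."""
    A_hat = A + torch.eye(A.size(0))              # add self-loops
    d = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))          # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

# Example: 19 nodes (10 vertebrae + 9 IVDs); A[i, j] = 1 for adjacent structures.
X = torch.randn(19, 64)                 # pooled per-structure features
A = torch.zeros(19, 19)                 # anatomy-derived adjacency (filled by design)
W = torch.randn(64, 64)
H = gcn_layer(X, A, W)
```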
Robust and fast solutions for anatomical object detection and segmentation support the entire clinical workflow, from diagnosis and patient stratification to therapy planning, intervention, and follow-up. Current state-of-the-art techniques for parsing volumetric medical image data are typically based on machine learning methods that exploit large annotated image databases. Two main challenges need to be addressed: efficiency in scanning high-dimensional parametric spaces, and the need for representative image features, which otherwise require significant manual engineering. We propose a pipeline for object detection and segmentation in the context of volumetric image parsing, solving a two-step learning problem: anatomical pose estimation and boundary delineation. For this task we introduce Marginal Space Deep Learning (MSDL), a novel framework exploiting both the strengths of efficient object parametrization in hierarchical marginal spaces and the automated feature design of Deep Learning (DL) network architectures. In the 3D context, the application of deep learning systems is limited by the very high complexity of the parametrization. More specifically, 9 parameters are necessary to describe a restricted affine transformation in 3D, resulting in a prohibitive number of billions of scanning hypotheses. The mechanism of marginal space learning provides excellent run-time performance by learning classifiers in clustered, high-probability regions in spaces of gradually increasing dimensionality. To further increase computational efficiency and robustness, our system learns sparse adaptive data sampling patterns that automatically capture the structure of the input. Given the object localization, we propose a DL-based active shape model to estimate the non-rigid object boundary. Experimental results are presented on the aortic valve in ultrasound using an extensive dataset of 2891 volumes from 869 patients, showing significant improvements of up to 45.2% over the state of the art. To our knowledge, this is the first successful demonstration of the potential of DL for detection and segmentation in full 3D data with parametrized representations.
Parsing volumetric computed tomography (CT) into 10 or more salient organs simultaneously is a challenging task with many applications, such as personalized scan planning and dose reporting. In the clinic, pre-scan data can come in the form of very low dose volumes acquired just prior to the primary scan or from an existing primary scan. To localize organs in such diverse data, we propose a new learning-based framework that we call hierarchical pictorial structures (HPS), which builds multiple levels of models in a tree-like hierarchy that mirrors the natural decomposition of human anatomy from gross structures to finer structures. Each node of our hierarchical model learns (1) the local appearance and shape of structures, and (2) a generative global model that learns probabilistic structural arrangement. Our main contribution is twofold. First, we embed the pictorial structures approach in a hierarchical framework, which reduces test-time image interpretation and allows for the incorporation of additional geometric constraints that robustly guide model fitting in the presence of noise. Second, we guide our HPS framework with probabilistic cost maps extracted by random decision forests using volumetric 3D HOG features, which makes our model fast to train, fast to apply to novel test data, and highly invariant to shape distortion and imaging artifacts. All steps require approximately 3 minutes to compute, and all organs are located with suitably high accuracy for our clinical applications, such as personalized scan planning for radiation dose reduction. We assess our method using a database of volumetric CT scans from 81 subjects with widely varying age and pathology, and with simulated ultra-low-dose cadaver pre-scan data.
The volumetric assessment and accurate grading of meningiomas before surgery are highly relevant for therapy planning and prognosis prediction. This study aimed to design a deep learning algorithm and evaluate its performance in detecting meningioma lesions and classifying their grade. In total, 5088 patients with histopathologically confirmed meningioma were retrospectively included. The pyramid scene parsing network (PSPNet) was trained to automatically detect and delineate the meningiomas. The results were compared to manual segmentations by evaluating the mean intersection over union (mIoU). The performance of grade classification was evaluated by accuracy. For the automated detection and segmentation of meningiomas, the mean pixel accuracy, tumor accuracy, background accuracy and mIoU were 99.68%, 81.36%, 99.88% and 81.36% for all patients; 99.52%, 84.86%, 99.93% and 84.86% for grade I meningiomas; 99.57%, 80.11%, 99.92% and 80.12% for grade II meningiomas; and 99.75%, 78.40%, 99.99% and 78.40% for grade III meningiomas, respectively. For grade classification, the accuracy values of the training and test datasets were 99.93% and 81.52% for all patients; 99.98% and 98.51% for grade I meningiomas; 99.91% and 66.67% for grade II meningiomas; and 99.88% and 73.91% for grade III meningiomas, respectively. The automated detection, segmentation and grade classification of meningiomas based on deep learning were accurate and reliable and may improve the monitoring and treatment of this frequently occurring tumor entity. Furthermore, the method could function as a useful tool for preassessment and preselection for radiologists, offering auxiliary information for clinical decision making in presurgical evaluation.
Segmentation of vertebrae and intervertebral discs (IVDs) is a cornerstone of the diagnosis and treatment of disorders affecting the spine. Yet most methodologies, especially CNN-based ones, treat vertebrae and discs independently, missing out on the potential of their anatomical relationships. To fill this gap, we present a two-stage deep learning framework that incorporates structural dependency modeling to automate spine segmentation in T2-weighted MR images. In the framework, the components of the spine are modeled as nodes of a graph, with anatomical relationships stored in the system's adjacency matrix. A 3D Graph Convolutional Segmentation Network (GCSN) is first used to perform coarse multi-class segmentation, leveraging the relationships between vertebrae and discs. Then, a 2D ResNet refinement network is used to enhance boundary resolution. The model was tested on volumetric MR data of 218 subjects. The average Dice similarity coefficient (DSC) was 87.32% across 10 vertebrae, 87.78% across 9 intervertebral discs, and 87.49% across all 19 structures of the spinal column, showing exemplary segmentation performance. The results show that segmentation consistency and accuracy improved significantly due to the use of anatomical dependencies through the graph-based learning approach. The proposed framework provides a reliable and highly effective automated system for parsing the spine and can be used clinically for diagnosing spinal disorders and planning their treatment.
Pancreatic cancer is a lethal invasive tumor with one of the worst prognoses. Accurate and reliable segmentation of the pancreas and pancreatic cancer on computerized tomography (CT) images is vital in clinical diagnosis and treatment. Although certain deep learning-based techniques have been tentatively applied to this task, the current performance of pancreatic cancer segmentation is far from meeting clinical needs due to the tiny size, irregular shape and extremely uncertain boundary of the cancer. Besides, most existing studies are built on black-box models that only learn the annotation distribution instead of the logical thinking and diagnostic experience of high-level medical experts; the latter is more credible and interpretable. To alleviate the above issues, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Specifically, SLAD aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: the organ, lesion and boundary stages. Firstly, in the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. Secondly, in the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn the global judgment of doctors for lesion detection by exploring topological feature differences between the causal lesion and the non-causal organ. Finally, in the boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core. Experimental results on three independent datasets demonstrate that our approach boosts pancreatic cancer segmentation accuracy by 4%-9% compared with state-of-the-art methods. Additionally, a tumor-vascular involvement analysis is also conducted to verify the superiority of our method in clinical applications. Our source codes will be publicly available at https://github.com/ZouLiwen-1999/SLAD.
The abdomen houses multiple vital organs, which are associated with various diseases posing significant risks to human health. Early detection of abdominal organ conditions allows for timely intervention and treatment, preventing deterioration of patients' health. Segmenting abdominal organs aids physicians in more accurately diagnosing organ lesions. However, the anatomical structures of abdominal organs are relatively complex, with organs overlapping each other, sharing similar features, thereby presenting challenges for segmentation tasks. In real medical scenarios, models must demonstrate real-time and low-latency features, necessitating an improvement in segmentation accuracy while minimizing the number of parameters. Researchers have developed various methods for abdominal organ segmentation, ranging from convolutional neural networks (CNNs) to Transformers. However, these methods often encounter difficulties in accurately identifying organ segmentation boundaries. MetaFormer abstracts the framework of Transformers, excluding the multi-head Self-Attention, offering a new perspective for solving computer vision problems and overcoming the limitations of Vision Transformers and CNN backbone networks. To further enhance segmentation effectiveness, we propose a U-shaped network, integrating SEFormer and depthwise cascaded upsampling (dCUP) as the encoder and decoder, respectively, into the UNet structure, named SEF-UNet. SEFormer combines Squeeze-and-Excitation modules with depthwise separable convolutions, instantiating the MetaFormer framework, enhancing the capture of local details and texture information, thereby improving edge segmentation accuracy. dCUP further integrates shallow and deep information layers during the upsampling process. Our model significantly improves segmentation accuracy while reducing the parameter count and exhibits superior performance in segmenting organ edges that overlap each other, thereby offering potential deployment in real medical scenarios.
Ultrasound imaging can distinctly display the morphology and structure of internal organs within the human body, enabling the examination of organs like the breast, liver, and thyroid. It can identify the locations of tumors, nodules, and other lesions, thereby serving as an efficacious tool for treatment detection and rehabilitation evaluation. Typically, the attending physician is required to manually demarcate the boundaries of lesion locations, such as tumors, in ultrasound images. Nevertheless, several issues exist. The high noise level in ultrasound images, the degradation of image quality due to the impact of surrounding tissues, and the influence of the operator's experience and proficiency on the determination of lesion locations can all contribute to a reduction in the accuracy of delineating the boundaries of lesion sites. In the wake of the advancement of deep learning, its application in medical image segmentation is becoming increasingly prevalent. For instance, while the U-Net model has demonstrated a favorable performance in medical image segmentation, the convolution layers of the traditional U-Net model are relatively simplistic, leading to suboptimal extraction of global information. Moreover, due to the significant noise present in ultrasound images, the model is prone to interference. In this research, we propose an Attention Residual Network model (ARU-Net). By incorporating residual connections within the encoder section, the learning capacity of the model is enhanced. Additionally, a spatial hybrid convolution module is integrated to augment the model's ability to extract global information and deepen the vertical architecture of the network. During the feature fusion stage of the skip connections, a channel attention mechanism and a multi-convolutional self-attention mechanism are respectively introduced to suppress noisy points within the fused feature maps, enabling the model to acquire more information regarding the target region. Finally, the predictive efficacy of the model was evaluated using publicly accessible breast ultrasound and thyroid ultrasound data. The ARU-Net achieved mean Intersection over Union (mIoU) values of 82.59% and 84.88%, accuracy values of 97.53% and 96.09%, and F1-score values of 90.06% and 89.7% for breast and thyroid ultrasound, respectively.
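The channel attention mechanism in the skip connections is in the same family as squeeze-and-excitation; a minimal PyTorch sketch of such a block (the reduction ratio and layer layout are generic, not ARU-Net's exact module):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: reweight feature
    channels to suppress noisy maps, as skip-connection fusion describes."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * w                                      # channel-wise reweighting
```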
Medical image segmentation is essential for disease diagnosis and therapy planning, but the complexity of multi-organ structures and blurred skin lesion boundaries poses challenges. CNNs and Transformers are constrained by limited receptive fields and high computational complexity, respectively. The state-space model effectively captures long-range dependencies with linear complexity but struggles with local modeling and channel attention. These methods struggle to detect subtle differences in lesion areas, leading to poor performance in medical image segmentation, especially when lesions are discontinuous or boundaries are unclear. To address these challenges, we propose SCFMUNet, which enhances both local and global modeling across multi-scale features and effectively captures spatial and channel semantics. SCFMUNet integrates three key fusion strategies: 1) at the bottleneck, a multi-scale state-space fusion module combines convolutions and the SS2D method to process and fuse the encoder-stage features; 2) in the skip connections, a gated adaptive channel mechanism dynamically adjusts the encoder features and fuses them with the decoder-stage features using channel-wise addition; 3) in the decoder stages, a spatial-channel state-space model performs spatial- and channel-level modeling on the fused features from the skip-connection stage and the previous decoder layer. Experiments on four public datasets were conducted. On the Synapse and ACDC datasets, SCFMUNet achieved Dice scores of 82.31% and 92.14%, improving Dice by 0.85% on Synapse and 1.0% on ACDC compared to state-of-the-art methods. On the ISIC2017 and ISIC2018 skin lesion datasets, SCFMUNet achieved Dice scores of 90.69% and 89.69%, with improvements ranging from 0.5% to 2% over state-of-the-art methods. Experimental results show that SCFMUNet outperforms state-of-the-art methods on four publicly available biomedical datasets. The source code is publicly available at https://github.com/zzzeed/SCFMUNet.
This work describes the application of an object definition algorithm to the medical imaging environment for the task of automated detection of anatomical boundaries in three dimensions in the presence of low spatial frequency nonstationarities. We have chosen the Liou-Jain algorithm and have modified it for use with 3D medical image datasets and extended it by including a recruitment operator that corrects for the algorithm's inherent volume underestimation. The algorithm avoids problems in both traditional statistical segmentation and 2D techniques and elegantly bridges the gap between traditional gradient-based edge finding and regression-based segmentation techniques. Results are shown for MRI datasets from the human abdomen and brain and for a CT dataset of a liver tumor, as well as an MRI scan of a glioma in a rat brain. For comparison, the human abdomen dataset was processed by a multivariate, statistical classifier. The results demonstrate the statistical technique's susceptibility to low spatial frequency nonstationarities due to rf field inhomogeneity; the Liou-Jain algorithm is shown to be immune to this effect. Further, the results show spatial consistency as a result of inherent characteristics of the algorithm. Volumes identified by the algorithm are visualized and assessed qualitatively in three dimensions. Quantitative accuracy of the algorithm's volume estimates is assessed by the use of a phantom. This work demonstrates that this technique is effective in automatically detecting anatomical organ and lesion surfaces in 3D medical datasets that are corrupted by low spatial frequency nonstationarity and in obtaining volume estimates.
Medical image segmentation is crucial for accurate diagnosis and treatment in medical image analysis. Among the various methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach for segmenting medical images. Notably, the U-Net architecture and its variants have gained widespread adoption in this domain. This paper introduces MLFA-UNet, an innovative architectural framework aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped architecture and integrates two pivotal modules: multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. These modules synergistically enhance the segmentation process, fostering both robustness and precision. MLFA operates within both the network encoder and decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. Furthermore, the bottleneck MSIA module serves to replace stacking modules, thereby expanding the receptive field and augmenting feature diversity, fortified by the PV attention mechanism. These integrated mechanisms work together to boost segmentation performance by effectively capturing both detailed local features and a broader range of contextual information, enhancing both accuracy and resilience in identifying lesions. To assess the versatility of the network, we conducted evaluations of MLFA-UNet across a range of medical image segmentation datasets, encompassing diverse imaging modalities such as wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. Our results consistently demonstrate that MLFA-UNet outperforms state-of-the-art algorithms, achieving Dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% for the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicalDB datasets, respectively.
Accurate segmentation and lesion localization are essential for treating diseases in medical images. Although deep learning methods have enhanced segmentation, they still have limitations because convolutional neural networks cannot capture long-range feature dependencies. The self-attention mechanism in Transformers addresses this drawback, but high-resolution images present computational complexity. To improve on convolutions and Transformers, we propose a hierarchical hybrid multiaxial attention mechanism called H2MaT-UNet. This approach combines hierarchical post-feature data and applies the multiaxial attention mechanism to feature interactions, facilitating efficient local and global interactions. Furthermore, we introduce a Spatial and Channel Reconstruction Convolution (ScConv) module to enhance feature aggregation. The H2MaT-UNet model achieves 87.74% Dice in the multi-target segmentation task and 87.88% IoU in the single-target segmentation task, surpassing current popular models and setting a new state of the art. H2MaT-UNet synthesizes multi-scale feature information during the layering stage and utilizes a multi-axis attention mechanism to amplify global information interactions in an innovative manner. This research holds value for the practical application of deep learning in clinical settings, allowing healthcare providers to analyze segmented details of medical images more quickly and accurately.
Accurate segmentation of medical images is crucial for clinical diagnosis and evaluation. However, medical images have complex shapes, the structures of different objects vary greatly, and most medical datasets are small in scale, making effective training difficult. These problems increase the difficulty of automatic segmentation. To further improve segmentation performance, we propose a multi-branch network model, called TransCUNet, for segmenting medical images of different modalities. The model contains three structures: a cross residual fusion block (CRFB), a pyramidal pooling module (PPM) and gated axial-attention, which together achieve effective extraction of high-level and low-level image features while remaining highly robust to segmentation objects of different sizes and datasets of different scales. In our experiments, we used four datasets to train, validate and test the models. The experimental results show that TransCUNet has better segmentation performance than current mainstream segmentation methods, and the model has a smaller size and fewer parameters, giving it great potential for clinical applications.
Medical image segmentation has garnered significant research attention in the neural network community as a fundamental requirement for developing intelligent medical assistant systems. A series of UNet-like networks with an encoder-decoder architecture have achieved remarkable success in medical image segmentation. Among these networks, UNet2+ (UNet++) and UNet3+ (UNet+++) have introduced redesigned skip connections, dense skip connections, and full-scale skip connections, respectively, surpassing the performance of the original UNet. However, UNet2+ lacks comprehensive information obtained from the entire scale, which hampers its ability to learn organ placement and boundaries. Similarly, due to the limited number of neurons in its structure, UNet3+ fails to effectively segment small objects when trained with a small number of samples. In this study, we propose UNet_sharp (UNet#), a novel network topology named after the "#" symbol, which combines dense skip connections and full-scale skip connections. In the decoder sub-network, UNet# can effectively integrate feature maps of different scales and capture fine-grained features and coarse-grained semantics from the entire scale. This approach enhances the understanding of organ and lesion positions and enables accurate boundary segmentation. We employ deep supervision for model pruning to accelerate testing and enable mobile device deployment. Additionally, we construct two classification-guided modules to reduce false positives and improve segmentation accuracy. Compared to current UNet-like networks, our proposed method achieves the highest Intersection over Union (IoU) values ((92.67±0.96)%, (92.38±1.29)%, (95.36±1.22)%, (74.01±2.03)%) and F1 scores ((91.64±1.86)%, (95.70±2.16)%, (97.34±2.76)%, (84.77±2.65)%) on the semantic segmentation tasks of nuclei, brain tumors, liver, and lung nodules, respectively. The experimental results demonstrate that the reconstructed skip connections in UNet successfully incorporate multi-scale contextual semantic information. Compared to most state-of-the-art medical image segmentation models, our proposed method more accurately locates organs and lesions and precisely segments boundaries.
Convolutional neural networks (CNNs) have revolutionized medical image analysis over the past few years. The U-Net architecture is one of the most well-known CNN architectures for semantic segmentation and has achieved remarkable successes in many different medical image segmentation applications. The U-Net architecture consists of standard convolution layers, pooling layers, and upsampling layers. These convolution layers learn representative features of input images and construct segmentations based on the features. However, the features learned by standard convolution layers are not distinctive when the differences among different categories are subtle in terms of intensity, location, shape, and size. In this paper, we propose a novel CNN architecture, called Dense-Res-Inception Net (DRINet), which addresses this challenging problem. The proposed DRINet consists of three blocks, namely a convolutional block with dense connections, a deconvolutional block with residual inception modules, and an unpooling block. Our proposed architecture outperforms the U-Net in three different challenging applications, namely multi-class segmentation of cerebrospinal fluid on brain CT images, multi-organ segmentation on abdominal CT images, and multi-class brain tumor segmentation on MR images.
Various segmentation networks based on the Swin Transformer have shown promise in medical segmentation tasks. Nonetheless, challenges such as lower accuracy and slower training convergence have persisted. To tackle these issues, we introduce a novel approach that combines the Swin Transformer and the Deformable Transformer to enhance overall model performance. We leverage the Swin Transformer's window attention mechanism to capture local feature information and employ the Deformable Transformer to adjust sampling positions dynamically, accelerating model convergence and aligning it more closely with object shapes and sizes. By combining both Transformer modules and incorporating additional skip connections to minimize information loss, our proposed model excels at rapidly and accurately segmenting CT or X-ray lung images. Experimental results demonstrate the strong performance of our model: it surpasses the standalone Swin Transformer-based Swin-Unet and converges more rapidly under identical conditions, yielding accuracy improvements of 0.7% (resulting in 88.18%) and 2.7% (resulting in 98.01%) on the COVID-19 CT scan lesion segmentation dataset and the Chest X-ray Masks and Labels dataset, respectively. This advancement has the potential to aid medical practitioners in early diagnosis and treatment decision-making.
Computerized tomography (CT) is of great significance for the localization and diagnosis of liver cancer. Many scholars have recently applied deep learning methods to segment CT images of the liver and liver tumors. Unlike natural images, medical images are usually more challenging to segment by nature. Aiming at the problems of blurry boundaries and complex gradients in liver tumor images, a deeply supervised network based on the combination of high-efficiency channel attention and Res-UNet++ (ECA residual UNet++) is proposed for liver CT image segmentation, enabling fully automated end-to-end segmentation. In this paper, the UNet++ structure is selected as the baseline. A context-aware residual-block feature encoder enhances feature extraction and mitigates deep-network degradation. An efficient channel attention module combines the depth of the feature map with spatial information to alleviate the impact of uneven sample distribution, and Dice loss replaces the cross-entropy loss function to optimize the network parameters. The liver and liver tumor segmentation accuracy on the LiTS dataset was 95.8% and 89.3%, respectively. The results show that, compared with other algorithms, the proposed method achieves good segmentation performance, which is of practical value for computer-assisted diagnosis and treatment aimed at fine segmentation of the liver and liver tumors.
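For reference, the efficient channel attention (ECA) idea is small enough to sketch directly: a global average pool produces one descriptor per channel, and a cheap 1-D convolution across channels produces gating weights without any dimensionality reduction. The PyTorch sketch below shows the generic ECA module; how it is wired into the authors' residual UNet++ encoder is an assumption not reproduced here:

```python
# A minimal sketch of Efficient Channel Attention (ECA).
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size: int = 3):
        super().__init__()
        # 1-D conv across the channel dimension: local cross-channel interaction.
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel descriptor of shape (B, C)
        y = x.mean(dim=(2, 3))
        # Treat channels as a sequence and convolve locally across them.
        y = self.conv(y.unsqueeze(1)).squeeze(1)
        # Gate each channel of the input with its learned weight.
        return x * self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
```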
Transformers have recently gained significant attention in medical image segmentation due to their ability to capture long-range dependencies. However, the presence of excessive background noise in large regions of medical images introduces distractions and increases the computational burden on the fine-grained self-attention (SA) mechanism, which is a key component of the transformer model. Meanwhile, preserving fine-grained details is essential for accurately segmenting complex, blurred medical images with diverse shapes and sizes. Thus, we propose a novel Multi-scale Dynamic Sparse Attention (MDSA) module, which flexibly reduces computational costs while maintaining multi-scale fine-grained interactions with content awareness. Specifically, multi-scale aggregation is first applied to the feature maps to enrich the diversity of interaction information. Then, for each query, irrelevant key-value pairs are filtered out at a coarse-grained level. Finally, fine-grained SA is performed on the remaining key-value pairs. In addition, we design an enhanced downsampling merging (EDM) module and an enhanced upsampling fusion (EUF) module for building pyramid architectures. Using MDSA to construct the basic blocks, combined with EDMs and EUFs, we develop a UNet-like model named MDSA-UNet. Since MDSA-UNet dynamically processes only a small subset of relevant fine-grained features, it achieves strong segmentation performance with high computational efficiency. Extensive experiments on four datasets spanning three different types demonstrate that our MDSA-UNet, without using pre-training, significantly outperforms other non-pretrained methods and even competes with pre-trained models, achieving Dice scores of 82.10% on DDTI, 80.20% on TN3K, 90.75% on ISIC2018, and 91.05% on ACDC. Meanwhile, our model maintains lower complexity, with only 6.65 M parameters and 4.54 G FLOPs at a resolution of 224 × 224, ensuring both effectiveness and efficiency. Code is available at URL.
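The core of the dynamic sparse idea, filtering key-value pairs per query before running fine-grained attention, can be sketched with a simple top-k selection. Note that this toy version still forms the full score matrix, so it only illustrates the selection semantics; the paper's coarse-grained filtering is what actually avoids that cost, and its multi-scale aggregation is omitted entirely:

```python
# A minimal sketch of per-query top-k sparse attention.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k: int = 16):
    # q, k, v: (B, N, C); each query attends only to its top_k keys.
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale          # (B, N, N) - full matrix,
    topv, topi = scores.topk(top_k, dim=-1)             # kept here only for clarity
    attn = F.softmax(topv, dim=-1)                      # softmax over surviving keys
    # Gather the selected values: (B, N, top_k, C)
    v_sel = torch.gather(
        v.unsqueeze(1).expand(-1, q.shape[1], -1, -1),
        2,
        topi.unsqueeze(-1).expand(-1, -1, -1, v.shape[-1]),
    )
    return (attn.unsqueeze(-1) * v_sel).sum(dim=2)      # (B, N, C)
```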
Ultrasonography has become an essential part of clinical diagnosis owing to its noninvasive and real-time nature. To assist diagnosis, automatically segmenting a region of interest (ROI) in ultrasound images is becoming a vital part of computer-aided diagnosis (CAD). However, segmenting ROIs on medical images with relatively low contrast is a challenging task. To better achieve medical ROI segmentation, we propose an efficient module, denoted multiscale attentional convolution (MSAC), that uses cascaded convolutions and a self-attention approach to concatenate features from various receptive-field scales. MSAC-Unet is then constructed based on Unet, employing MSAC instead of the standard convolution in each encoder and decoder for segmentation. In this study, two representative types of ultrasound images, one of thyroid nodules and the other of brachial plexus nerves, were used to assess the effectiveness of the proposed approach. The best segmentation results from MSAC-Unet were achieved on two thyroid nodule datasets (TND-PUH3 and DDTI) and a brachial plexus nerve dataset (NSD) with Dice coefficients of 0.822, 0.792, and 0.746, respectively. Analysis of the segmentation results shows that MSAC-Unet greatly improves segmentation accuracy, producing more reliable ROI edges and boundaries and decreasing the number of erroneously segmented ROIs in ultrasound images.
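A common way to gather features from several receptive fields in one block is to run parallel convolution branches (for instance with different dilation rates) and concatenate them. The sketch below shows that generic pattern; it is a stand-in for the idea, not the published MSAC, which additionally cascades convolutions and applies self-attention:

```python
# A minimal sketch of a multi-scale convolution block with dilated branches.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        # One branch per dilation rate, each with a different receptive field.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.project = nn.Conv2d(branch_ch * len(dilations), out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate all scales, then project back to the target width.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```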
Convolutional neural networks (CNNs) play an important role in the field of medical image segmentation. Among the many kinds of CNNs, the U-net architecture is one of the most famous fully convolutional network architectures for medical semantic segmentation tasks. Recent work shows that the U-net network can be made substantially deeper, resulting in improved performance on segmentation tasks. Although adding more layers directly to a network is a popular way to make it deeper, it may lead to gradient vanishing or redundant computation during training. A novel CNN architecture is proposed that integrates the Inception-Res module and a densely connected convolutional module into the U-net architecture. The proposed network model consists of the following parts: first, the Inception-Res block is designed to increase the width of the network by replacing the standard convolutional layers; second, the Dense-Inception block is designed to extract features and deepen the network without additional parameters; third, the down-sampling block is adopted to reduce the size of feature maps to accelerate learning, and the up-sampling block is used to resize the feature maps. The proposed model is tested on blood vessel segmentation in retina images, lung segmentation of CT data from the benchmark Kaggle datasets, and the MRI brain tumor segmentation dataset from MICCAI BraTS 2017. The experimental results show that the proposed method provides better performance on these three tasks compared with state-of-the-art algorithms, reaching average Dice scores of 0.9857 for lung segmentation, 0.9582 for blood vessel segmentation, and 0.9867 for brain tumor segmentation. The experiments highlight that combining the inception module with dense connections in the U-Net architecture is a promising approach for semantic medical image segmentation.
Medical image segmentation is crucial for understanding anatomical or pathological changes, playing a key role in computer-aided diagnosis and advancing intelligent healthcare. Currently, important issues in medical image segmentation need to be addressed, particularly the problem of segmenting blurry edge regions and the generalizability of segmentation models. Therefore, this study focuses on different medical image segmentation tasks and the issue of blurriness. By addressing these tasks, the study significantly improves diagnostic efficiency and accuracy, contributing to the overall enhancement of healthcare outcomes. To optimize segmentation performance and leverage feature information, we propose a Neighborhood Fuzzy c-Means Multiscale Pyramid Hybrid Attention Unet (NFMPAtt-Unet) model. NFMPAtt-Unet comprises three core components: the Multiscale Dynamic Weight Feature Pyramid module (MDWFP), the Hybrid Weighted Attention mechanism (HWA), and the Neighborhood Rough Set-based Fuzzy c-Means Feature Extraction module (NFCMFE). The MDWFP dynamically adjusts weights across multiple scales, improving feature information capture. The HWA enhances the network's ability to capture and utilize crucial features, while the NFCMFE, grounded in neighborhood rough set concepts, aids in fuzzy C-means feature extraction, addressing complex structures and uncertainties in medical images, thereby enhancing adaptability. Experimental results demonstrate that NFMPAtt-Unet outperforms state-of-the-art models, highlighting its efficacy in medical image segmentation.
In medical image segmentation, traditional CNN-based models excel at extracting local features but have limitations in capturing global features. Conversely, Mamba, a novel network framework, effectively captures long-range feature dependencies and excels in processing linearly arranged image inputs, albeit at the cost of overlooking fine spatial relationships and local pixel interactions. This limitation highlights the need for hybrid approaches that combine the strengths of both architectures. To address this challenge, we propose CNN-Fusion-Mamba-based U-Net (CFM-UNet). The model integrates CNN-based Bottle2neck blocks for local feature extraction and Mamba-based visual state space blocks for global feature extraction. These parallel frameworks perform feature fusion through our designed SEF block, achieving complementary advantages. Experimental results demonstrate that CFM-UNet outperforms other advanced methods in segmenting medical image datasets, including liver organs, liver tumors, spine, and colon polyps, with notable generalization ability in liver organ segmentation. Our code is available at https://github.com/Jiacheng-Han/CFM-UNet .
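The parallel-branch pattern behind CFM-UNet can be sketched as a local convolutional branch and a global branch whose outputs are fused by a channel-wise gate. In the sketch below, plain multi-head self-attention stands in for the Mamba-based visual state space block (which would need an external SSM implementation), and the gate is a generic squeeze-excite style fusion, not the authors' exact SEF block:

```python
# A minimal sketch of parallel local/global branches with gated channel fusion.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        # Local branch: plain convolution for fine detail.
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Global branch: self-attention as a stand-in for a Mamba/VSS block.
        # dim must be divisible by num_heads.
        self.global_mix = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Squeeze-excite style gate over the concatenated branches.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, 2 * dim),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        loc = self.local(x)
        seq = x.flatten(2).transpose(1, 2)              # (B, HW, C)
        glb, _ = self.global_mix(seq, seq, seq)
        glb = glb.transpose(1, 2).reshape(b, c, h, w)
        both = torch.cat([loc, glb], dim=1)             # (B, 2C, H, W)
        weight = self.gate(both.mean(dim=(2, 3)))       # channel-wise fusion gate
        both = both * weight.unsqueeze(-1).unsqueeze(-1)
        return self.project(both)
```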
In recent years, deep learning technology for clinical diagnosis has progressed considerably, and the value of medical imaging continues to increase. In the past, clinicians evaluated medical images according to their individual expertise. In contrast, the application of artificial intelligence technology for automatic analysis and diagnostic assistance to support clinicians in evaluating medical information more efficiently has become an important trend. In this study, we propose a machine learning architecture designed to segment images of retinal blood vessels based on an improved U-Net neural network model. The proposed model incorporates a residual module to extract features more effectively, and includes a full-scale skip connection to combine low level details with high-level features at different scales. The results of an experimental evaluation show that the model was able to segment images of retinal vessels accurately. The proposed method also outperformed several existing models on the benchmark datasets DRIVE and ROSE, including U-Net, ResUNet, U-Net3+, ResUNet++, and CaraNet.
This study aims to design an auxiliary segmentation model for thyroid nodules to increase diagnostic accuracy and efficiency, thereby reducing the workload of medical personnel. This study proposes a Dual-Path Attention Mechanism (DPAM)-UNet++ model, which can automatically segment thyroid nodules in ultrasound images. Specifically, the model incorporates dual-path attention modules into the skip connections of the UNet++ network to capture global contextual information in feature maps. The model's performance was evaluated using Intersection over Union (IoU), F1_score, accuracy, etc. Additionally, a new integrated loss function was designed for the DPAM-UNet++ network. Comparative experiments with classical segmentation models revealed that the DPAM-UNet++ model achieved an IoU of 0.7451, an F1_score of 0.8310, an accuracy of 0.9718, a precision of 0.8443, a recall of 0.8702, an Area Under Curve (AUC) of 0.9213, and an HD95 of 35.31. Except for the precision metric, this model outperformed the other models on all the indicators and achieved a segmentation effect that was more similar to that of the ground truth labels. Additionally, ablation experiments verified the effectiveness and necessity of the dual-path attention mechanism and the integrated loss function. The segmentation model proposed in this study can effectively capture global contextual information in ultrasound images and accurately identify the locations of nodule areas. The model yields excellent segmentation results, especially for small and multiple nodules. Additionally, the integrated loss function improves the segmentation of nodule edges, enhancing the model's accuracy in segmenting edge details.
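A dual-path attention gate on a skip connection is commonly built from a channel path (global pooling plus a small MLP) and a spatial path (channel pooling plus a convolution). The sketch below shows that generic construction; the published DPAM details may well differ:

```python
# A minimal sketch of a dual-path (channel + spatial) attention gate for a skip
# connection, loosely in the spirit of DPAM.
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        # Channel path: reweight channels from a global descriptor.
        w_c = self.channel_mlp(skip.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
        # Spatial path: reweight locations from mean/max channel pooling.
        pooled = torch.cat(
            [skip.mean(dim=1, keepdim=True), skip.amax(dim=1, keepdim=True)], dim=1
        )
        w_s = self.spatial_conv(pooled)
        return skip * w_c * w_s
```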
The blurriness of boundaries in medical image target regions hinders further improvement in automatic segmentation accuracy and is a challenging problem. To address this issue, we propose a model called long-distance perceptual UNet (LD-UNet), which has a powerful long-distance perception ability and can effectively perceive the semantic context of an entire image. Specifically, LD-UNet utilizes global and local long-distance induction modules, which endow the model with contextual semantic induction capabilities for long-distance feature dependencies. The modules perform long-distance semantic perception at the high and low stages of LD-UNet, respectively, effectively improving the accuracy of local blurred information assessment. We also propose a top-down deep supervision method to enhance the model's ability to fit the data. Extensive experiments are then conducted on four types of tumor data with blurred boundaries: nasopharyngeal carcinoma, esophageal carcinoma, pancreatic carcinoma, and colorectal carcinoma. The Dice similarity coefficient scores obtained by LD-UNet on the four datasets are 73.35%, 85.93%, 70.04%, and 82.71%, respectively. Experimental results demonstrate that LD-UNet is more effective at improving the segmentation accuracy of blurred boundary regions than other methods with long-distance perception, such as Transformers. Among all models, LD-UNet achieves the best performance. By visualizing the models' feature dependency fields, we further explore the advantages of LD-UNet in segmenting blurred boundaries.
Accurate musculoskeletal ultrasound (MSKUS) image segmentation is crucial for diagnosis and treatment planning. Compared with traditional segmentation methods, deep learning segmentation methods that balance efficiency, accuracy, and model size have clear advantages for deployment on edge devices. This paper aims to design an MSKUS image segmentation method with fewer parameters, lower computational complexity, and higher segmentation accuracy. In this study, an attention mechanism-based lightweight UNet (AML-UNet) is designed to segment target muscle regions in MSKUS images. To suppress the transmission of redundant features, a Channel Reconstruction and Spatial Attention Module is designed in the encoding path. In addition, considering the inherent characteristics of MSKUS images, a Multiscale Aggregation Module is developed to replace the skip-connection architecture of U-Net. Deep supervision is also introduced to the decoding path to refine predicted masks gradually. Our method is evaluated on two MSKUS 2D-image segmentation datasets, containing 3917 and 1534 images, respectively. In the experiments, five-fold cross-validation is adopted for the ablation and comparison experiments, and the Wilcoxon signed-rank test with Bonferroni correction is employed to validate significance, with 0.01 used as the statistical significance level. AML-UNet yielded mIoU scores of 84.17% and 90.14% on the two datasets. Our proposed model achieved superior results with fewer parameters while maintaining segmentation efficiency and accuracy compared to other methods.
Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional U-Net architectures and their transformer-integrated variants excel in automated segmentation tasks, they often struggle with parameter efficiency and computational complexity, largely due to the extensive use of Transformers, and they lack the ability to harness the image's intrinsic position and channel features. Prior work employing Dual Attention mechanisms over position and channel has not been specifically optimized for the high-detail demands of medical images. To address these issues, this study proposes a novel deep medical image segmentation framework, called DA-TransUNet, which integrates the Transformer and a dual attention block (DA-Block) into the traditional U-shaped architecture. Tailored to the high-detail requirements of medical images, DA-TransUNet optimizes the intermediate channels of Dual Attention (DA) and employs DA in each skip connection to effectively filter out irrelevant information. This integration significantly enhances the model's capability to extract features, thereby improving medical image segmentation performance. DA-TransUNet is validated on medical image segmentation tasks, consistently outperforming state-of-the-art techniques across 5 datasets. In summary, DA-TransUNet has made significant strides in medical image segmentation, offering new insights into existing techniques. It strengthens model performance from the perspective of image features, thereby advancing the development of high-precision automated medical image diagnosis. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet.
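The position/channel Dual Attention pair that DA-TransUNet builds on (originally from DANet) is compact enough to sketch: position attention correlates every spatial location with every other, while channel attention correlates channel maps via their Gram matrix. The sketch below shows those two generic modules; the DA-Block placement and channel optimizations described in the abstract are not reproduced:

```python
# A minimal sketch of DANet-style position and channel attention modules.
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, HW, C//8)
        k = self.k(x).flatten(2)                        # (B, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)             # (B, HW, HW)
        v = self.v(x).flatten(2)                        # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                     # (B, C, HW)
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)  # (B, C, C) Gram matrix
        out = (attn @ f).reshape(b, c, h, w)
        return self.gamma * out + x
```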
In the last decade, deep neural networks have been widely applied to medical image segmentation, achieving good results in computer-aided diagnosis tasks. However, segmenting highly complex, low-contrast images of organs and tissues with high accuracy still faces great challenges. To better address this challenge, this paper proposes a novel model, SWTRU (Star-shaped Window Transformer Reinforced U-Net), combining the U-Net network, which performs well in the image segmentation field, with the Transformer, which possesses a powerful ability to capture global context. Unlike previous methods that import the Transformer into U-Net, an improved Star-shaped Window Transformer is introduced into the decoder of SWTRU to enhance the decision-making capability of the whole method. SWTRU uses a redesigned multi-scale skip-connection scheme, which retains the inductive bias of the original FCN structure for images while obtaining fine-grained features and coarse-grained semantic information. Our method also presents the FFIM (Filtering Feature Integration Mechanism) to integrate and reduce the dimensionality of the fused multi-layered features, which reduces computation. SWTRU yields 0.972 DICE on CHLISC for liver and tumor segmentation, 0.897 DICE on LGG for glioma segmentation, and 0.904 DICE on ISIC2018 for skin disease segmentation, achieving substantial improvements over 9 current state-of-the-art medical image segmentation methods. SWTRU can combine feature maps from different scales, high-level semantics, and global contextual relationships, making this architecture effective for medical image segmentation. The experimental findings indicate that SWTRU delivers superior performance on medical image segmentation tasks.
The field of medical image segmentation powered by deep learning has recently received substantial attention, with a significant focus on developing novel architectures and designing effective loss functions. Traditional loss functions, such as Dice loss and Cross-Entropy loss, predominantly rely on global metrics to compare predictions with labels. However, these global measures often struggle to address challenges such as occlusion and nonuniform intensity. To overcome these issues, in this study, we propose a novel loss function, termed Global-Local Active Contour (GLAC) loss, which integrates both global and local image features, reformulated within the Mumford-Shah framework and extended for multiclass segmentation. This approach enables the neural network model to be trained end-to-end while simultaneously segmenting multiple classes. In addition to this, we enhance the U-Net architecture by incorporating Dense Layers, Convolutional Block Attention Modules, and DropBlock. These improvements enable the model to more effectively combine contextual information across layers, capture richer semantic details, and mitigate overfitting, resulting in more precise segmentation outcomes. We validate our proposed method, namely GLAC-Unet, which utilizes the GLAC loss in conjunction with our modified U-shaped architecture, on three biomedical segmentation datasets that span a range of modalities, including two-dimensional and three-dimensional images, such as dermoscopy, cardiac magnetic resonance imaging, and brain magnetic resonance imaging. Extensive experiments demonstrate the promising performance of our approach, achieving a Dice score (DSC) of 0.9125 on the ISIC-2018 dataset, 0.9260 on the Automated Cardiac Diagnosis Challenge (ACDC) 2017, and 0.927 on the Infant Brain MRI Segmentation Challenge 2019. Furthermore, statistical significance testing with p-values consistently smaller than 0.05 on the ISIC-2018 and ACDC datasets confirms the superior performance of the proposed method compared to other state-of-the-art models. These results highlight the robustness and effectiveness of our multiclass segmentation technique, underscoring its potential for biomedical image analysis. Our code will be made available at https://github.com/minhnhattrinh312/Active-Contour-Loss-based-on-Global-and-Local-Intensity.
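The global half of such an objective can be illustrated with a Chan-Vese style region loss on a soft mask: foreground and background mean intensities are estimated from the predicted probabilities, and the loss penalizes intensity variance inside each region plus a boundary-length term. This is a minimal binary sketch of that ingredient, assuming a single-channel image; the paper's GLAC loss adds a local intensity term and a multiclass Mumford-Shah formulation not shown here:

```python
# A minimal sketch of a global region (Chan-Vese style) loss for a soft mask.
import torch

def region_loss(prob: torch.Tensor, image: torch.Tensor, eps: float = 1e-6):
    # prob: (B, 1, H, W) soft foreground mask; image: (B, 1, H, W) intensities.
    # Mean intensity inside and outside the predicted region.
    c1 = (prob * image).sum(dim=(2, 3)) / (prob.sum(dim=(2, 3)) + eps)
    c2 = ((1 - prob) * image).sum(dim=(2, 3)) / ((1 - prob).sum(dim=(2, 3)) + eps)
    c1 = c1.view(-1, 1, 1, 1)
    c2 = c2.view(-1, 1, 1, 1)
    inside = (prob * (image - c1) ** 2).mean()
    outside = ((1 - prob) * (image - c2) ** 2).mean()
    # Length (smoothness) term on the soft mask boundary.
    length = (prob[:, :, 1:, :] - prob[:, :, :-1, :]).abs().mean() + \
             (prob[:, :, :, 1:] - prob[:, :, :, :-1]).abs().mean()
    return inside + outside + 0.1 * length  # 0.1 is an illustrative weight
```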
In recent years, cervical spondylosis has become one of the most common chronic diseases and has received much public attention. Magnetic resonance imaging (MRI) is the most widely used imaging modality for diagnosing degenerative cervical spondylosis. Manual identification and segmentation of the cervical spine on MRI is a laborious, time-consuming, and error-prone process. In this work, we collected, for the first time, a new dataset of 300 patients with a total of 600 cervical spine images in the MRI T2-weighted (T2W) modality, covering the cervical spine, intervertebral discs, spinal cord, and spinal canal. A new instance segmentation approach called SeUneter was proposed for cervical spine segmentation. SeUneter expands the depth of the network structure based on the original U-Net and adds a channel attention module to the double convolution of the feature extraction, enhancing the semantic information of segmented structures and weakening that of non-segmented ones to screen for important feature channels in the double convolution. Meanwhile, to alleviate over-fitting under insufficient samples, Cutout was used to mask pixel information at random fixed-size positions in the original images, increasing the number of training samples. Prior knowledge of the data was used to optimize the segmentation results through post-processing, improving segmentation performance. The mean Intersection over Union (mIOU) was calculated for the different categories, and the mean Dice similarity coefficient (mDSC) and mIOU were calculated to compare the segmentation results of different deep learning models across all categories. Compared with multiple models under the same experimental settings, our proposed SeUneter outperformed U-Net, AttU-Net, UNet++, DeepLab-v3+, TransUNet, and Swin-Unet on the spinal cord with an mIOU of 86.34% and on the spinal canal with an mIOU of 73.44%, and it matched or exceeded these models when segmenting vertebral bodies or intervertebral discs. Among all models, SeUneter achieved the highest mIOU and mDSC of 82.73% and 90.66%, respectively, for the whole cervical spine.
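Channel attention inside a double convolution is typically a squeeze-and-excitation (SE) gate: a global average pool feeds a small bottleneck MLP whose sigmoid output reweights the channels. The sketch below shows that generic placement after a U-Net style double conv, assuming SE-style attention; SeUneter's exact attention design and hyperparameters are not reproduced:

```python
# A minimal sketch of a squeeze-and-excitation gate after a double convolution.
import torch
import torch.nn as nn

class SEDoubleConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 16):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
        # Bottleneck MLP that produces one gating weight per channel.
        self.se = nn.Sequential(
            nn.Linear(out_ch, out_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(out_ch // reduction, out_ch), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.double_conv(x)
        w = self.se(x.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
        return x * w  # emphasize informative channels, suppress the rest
```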
Accurate and efficient segmentation of thyroid nodules on ultrasound images is critical for computer-aided nodule diagnosis and treatment. On ultrasound images, convolutional neural networks (CNNs) and Transformers, which are widely used on natural images, cannot obtain satisfactory segmentation results, because they either fail to delineate precise boundaries or fail to segment small objects. To address these issues, we propose a novel Boundary-preserving assembly Transformer UNet (BPAT-UNet) for ultrasound thyroid nodule segmentation. In the proposed network, a Boundary point supervision module (BPSM), which adopts two novel self-attention pooling approaches, is designed to enhance boundary features and generate ideal boundary points. Meanwhile, an Adaptive multi-scale feature fusion module (AMFFM) is constructed to fuse features and channel information at different scales. Finally, to fully integrate high-frequency local and low-frequency global characteristics, an Assembled transformer module (ATM) is placed at the bottleneck of the network. Deformable features and feature-wise computation are introduced into the AMFFM and ATM modules to characterize the correlations among features. As designed and ultimately demonstrated, BPSM and ATM help BPAT-UNet further constrain boundaries, whereas AMFFM assists in detecting small objects. Compared to other classical segmentation networks, the proposed BPAT-UNet displays superior segmentation performance in both visualization results and evaluation metrics. Significant improvement in segmentation accuracy was shown on the public thyroid dataset TN3K, with a Dice similarity coefficient (DSC) of 81.64% and a 95th-percentile asymmetric Hausdorff distance (HD95) of 14.06, whereas on our private dataset the model achieved a DSC of 85.63% and an HD95 of 14.53. This paper presents a method for thyroid ultrasound image segmentation that achieves high accuracy and meets clinical requirements. Code is available at https://github.com/ccjcv/BPAT-UNet.
In recent years, the global outbreak of COVID-19 has posed an extremely serious risk to human life, and to maximize physicians' diagnostic efficiency, it is extremely valuable to investigate methods for lesion segmentation in COVID-19 images. Aiming at the problems of existing deep learning models, such as low segmentation accuracy, poor generalization, large parameter counts, and difficult deployment, we propose a UNet segmentation network integrating multi-scale attention for COVID-19 CT images. Specifically, the UNet model is used as the base network, and a multi-scale convolutional attention structure is proposed in the encoder stage to enhance the network's ability to capture multi-scale information. Second, a local channel attention module is proposed to extract spatial information by modeling local relationships to generate channel-domain weights, supplementing detailed information about the target region to reduce information redundancy and enhance important information. Moreover, the encoder uses the Meta-ACON activation function to avoid overfitting and improve the model's representational ability. Extensive experimental results on publicly available mixed datasets show that, compared with current mainstream image segmentation algorithms, the proposed method more effectively improves the accuracy and generalization of COVID-19 lesion segmentation and provides help for medical diagnosis and analysis.
No abstract
Vitiligo lesion segmentation is crucial for the assessment and treatment of vitiligo. There are two significant challenges in this problem, namely the availability of dense segmentation annotations and the collection of large amounts of vitiligo images, which are also major challenges in medical image analysis (MIA). Most existing methods rely heavily on large-scale labeled datasets with high-quality annotations, so their performance may not be easily reproducible or transferable to domains with limited data availability. As a result, there is a need for alternative approaches that can leverage unlabeled datasets for segmentation with a small-scale training set. In this paper, we propose a data augmentation strategy based on image editing, which can synthesize a large number of samples from a small amount of annotated data. The synthesized examples are of high visual quality and boost segmentation performance at no additional cost. In addition, we adapt the Mean-Teacher framework to mine reliable predictions from unlabeled samples, alleviating the demand for densely annotated segmentations; pseudo-labels for unlabeled samples are obtained from highly confident pixels. We also propose a new Bimodal Vitiligo Lesions Segmentation (BVLS) dataset containing fine-grained segmentation masks and the bimodal images usually used for vitiligo diagnosis, mitigating the lack of a vitiligo segmentation dataset. Extensive experiments conducted on this dataset demonstrate that our approach achieves significant improvements (+17.27%) over previous data augmentation methods on the UNet backbone. Furthermore, the semi-supervised framework reaches an IoU of 49.71% with only 10% of the images annotated. Our code and dataset are available at https://github.com/JcWang20/BLVS.
Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in many cases, networks similar to U-Net produce coarse, non-smooth segmentations with many discontinuities. To improve and refine the performance of U-Net-like networks, we propose the use of parallel decoders that, along with the mask predictions, also perform contour prediction and distance-map estimation. The contour and distance map help ensure smoothness in the segmentation predictions. To facilitate joint training of the three tasks, we propose a novel architecture called Psi-Net with a single encoder and three parallel decoders (thus having the shape of Ψ): one decoder learns the segmentation mask prediction, and the other two learn the auxiliary tasks of contour detection and distance-map estimation. Learning these auxiliary tasks helps capture shape and boundary information. We also propose a new joint loss function for the proposed architecture, consisting of a weighted combination of negative log-likelihood and mean squared error losses. We used two publicly available datasets to evaluate our model: 1) the Origa dataset for optic cup and disc segmentation and 2) the Endovis segment dataset for polyp segmentation. Extensive experiments show that our model gives better results in terms of segmentation, boundary, and shape metrics.
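The joint objective can be written directly as a weighted sum of the three decoder losses. The sketch below is a minimal version assuming cross-entropy (negative log-likelihood on softmax outputs) for the mask and contour heads and MSE for the distance map; the weights are illustrative placeholders, not the paper's tuned values:

```python
# A minimal sketch of a Psi-Net style joint loss over three decoder heads.
import torch
import torch.nn.functional as F

def psi_net_loss(mask_logits, contour_logits, dist_pred,
                 mask_gt, contour_gt, dist_gt,
                 w_mask=1.0, w_contour=1.0, w_dist=1.0):
    # mask_logits/contour_logits: (B, C, H, W); mask_gt/contour_gt: (B, H, W) long.
    # dist_pred/dist_gt: (B, 1, H, W) float distance maps.
    loss_mask = F.cross_entropy(mask_logits, mask_gt)
    loss_contour = F.cross_entropy(contour_logits, contour_gt)
    loss_dist = F.mse_loss(dist_pred, dist_gt)
    return w_mask * loss_mask + w_contour * loss_contour + w_dist * loss_dist
```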
Convolutional neural networks (CNNs) have achieved state-of-the-art results in various medical image segmentation tasks. However, CNNs often assume that the source and target dataset follow the same probability distribution and when this assumption is not satisfied their performance degrades significantly. This poses a limitation in medical image analysis, where including information from different imaging modalities can bring large clinical benefits. In this work, we present an unsupervised Structure Aware Cross-modality Domain Adaptation (StAC-DA) framework for medical image segmentation. StAC-DA implements an image- and feature-level adaptation in a sequential two-step approach. The first step performs an image-level alignment, where images from the source domain are translated to the target domain in pixel space by implementing a CycleGAN-based model. The latter model includes a structure-aware network that preserves the shape of the anatomical structure during translation. The second step consists of a feature-level alignment. A U-Net network with deep supervision is trained with the transformed source domain images and target domain images in an adversarial manner to produce probable segmentations for the target domain. The framework is evaluated on bidirectional cardiac substructure segmentation. StAC-DA outperforms leading unsupervised domain adaptation approaches, being ranked first in the segmentation of the ascending aorta when adapting from Magnetic Resonance Imaging (MRI) to Computed Tomography (CT) domain and from CT to MRI domain. The presented framework overcomes the limitations posed by differing distributions in training and testing datasets. Moreover, the experimental results highlight its potential to improve the accuracy of medical image segmentation across diverse imaging modalities.
Accurate segmentation of frontal lobe areas on magnetic resonance imaging (MRI) can assist in diagnosing and managing idiopathic normal-pressure hydrocephalus. However, frontal lobe segmentation is challenging due to the complexity of the degree and shape of damage and the ambiguity of the boundaries of frontal lobe sites. Therefore, to extract the rich edge information and feature representation of the frontal lobe, this paper designs an edge guidance (EG) module to enhance the representation of edge features. Accordingly, an edge-guided cascade network framework (EG-Net) is proposed to segment frontal lobe parts automatically. Two-dimensional MRI slice images are fed into the edge generation and segmentation networks. First, the edge generation network extracts the edge information from the input image. Then, the edge information is sent to the EG module to generate an edge attention map for feature representation enhancement. Meanwhile, multi-scale attentional convolution (MSA) is utilized in the feature coding stage of the segmentation network to obtain feature responses from different perceptual fields in the coding stage and enrich the spatial context information. Besides, the feature fusion module is employed to selectively aggregate the multi-scale features in the coding stage with the edge features output by the EG module. Finally, the two components are fused, and a decoder recovers the spatial information to generate the final prediction results. An extensive quantitative comparison is performed on a publicly available brain MRI dataset (MICCAI 2012) to evaluate the effectiveness of the proposed algorithm. The experimental results indicate that the proposed method achieves an average DICE score of 95.77% compared to some advanced methods, which is 4.96% better than the classical U-Net. The results demonstrate the potential of the proposed EG-Net in improving the accuracy of frontal edge pixel classification through edge guidance.
Biomedical imaging is a driver of scientific discovery and a core component of medical care and is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.
Basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) are the two most common skin cancers and impose a huge medical burden on society. Histopathological examination based on whole-slide images (WSIs) remains the confirmatory diagnostic method for skin tumors. Accurate segmentation of tumor tissue in WSIs by deep learning (DL) models can reduce the workload of pathologists and help surgeons ensure the complete removal of tumors. Our goal was to accurately segment the tumor areas in WSIs of BCC, SCC, and squamous cell papilloma (SCP, homologous to SCC) with robust models. We established a dataset (ZJU-NMSC) containing 151 WSIs of BCC, SCC, and SCP in total. Seven models were used to segment the WSIs, including the state-of-the-art model, models proposed by us, and other models. Dice score, intersection over union, accuracy, sensitivity, and specificity were used to evaluate and compare the performance of the different models. Heatmaps and tumor tissue masks were generated to reflect the segmentation results, and the processing times of the models were also recorded and compared. While the Dice scores of most models are higher than 0.85, DeepLab v3+ has the best performance, and its tumor tissue masks are more consistent with the ground-truth tumor areas even for complex and small lobular lesions. This study broadens the use of DL-based segmentation models for WSIs of skin tumors in terms of tumor types and computational approaches. Segmenting tumor areas can simplify histopathological inspection and benefit the diagnosis and subsequent management of these diseases in practice.
Semantic image segmentation is an important prerequisite for context-awareness and autonomous robotics in surgery. The state of the art has focused on conventional RGB video data acquired during minimally invasive surgery, but full-scene semantic segmentation based on spectral imaging data and obtained during open surgery has received almost no attention to date. To address this gap in the literature, we are investigating the following research questions based on hyperspectral imaging (HSI) data of pigs acquired in an open surgery setting: (1) What is an adequate representation of HSI data for neural network-based fully automated organ segmentation, especially with respect to the spatial granularity of the data (pixels vs. superpixels vs. patches vs. full images)? (2) Is there a benefit of using HSI data compared to other modalities, namely RGB data and processed HSI data (e.g. tissue parameters like oxygenation), when performing semantic organ segmentation? According to a comprehensive validation study based on 506 HSI images from 20 pigs, annotated with a total of 19 classes, deep learning-based segmentation performance increases - consistently across modalities - with the spatial context of the input data. Unprocessed HSI data offers an advantage over RGB data or processed data from the camera provider, with the advantage increasing with decreasing size of the input to the neural network. Maximum performance (HSI applied to whole images) yielded a mean DSC of 0.90 (standard deviation (SD) 0.04), which is in the range of the inter-rater variability (DSC of 0.89, SD 0.07). We conclude that HSI could become a powerful image modality for fully-automatic surgical scene understanding with many advantages over traditional imaging, including the ability to recover additional functional tissue information. Our code and pre-trained models are available at https://github.com/IMSY-DKFZ/htc.
Image segmentation is an essential component in medical image analysis. The case of 3D images such as MRI is particularly challenging and time consuming. Interactive or semi-automatic methods are thus highly desirable. However, existing methods do not exploit the typical sequentiality of real user interactions, because the interaction memory used in these systems discards ordering. In contrast, we argue that the order of the user corrections should be used for training and leads to performance improvements. We contribute to solving this problem by proposing a general multi-class deep learning-based interactive framework for image segmentation, which embeds a base network in a user interaction loop with a user feedback memory. We propose to model the memory explicitly as a sequence of consecutive system states, from which the features can be learned, generally learning from the segmentation refinement process. Training is a major difficulty owing to the network's input being dependent on the previous output. We adapt the network to this loop by introducing a virtual user into the training process, modelled by dynamically simulating the iterative user feedback. We evaluated our framework against existing methods on the complex task of multi-class semantic instance female pelvis MRI segmentation with 5 classes, including up to 27 tumour instances, using a segmentation dataset collected in our hospital, and on liver and pancreas CT segmentation, using public datasets. We conducted a user evaluation, involving both senior and junior medical personnel in matching and adjacent areas of expertise. We observed a reduction in annotation time, with 5'56" for our framework against 25' on average for classical tools. We systematically evaluated the influence of the number of clicks on segmentation accuracy. After a single interaction round, our framework outperforms existing automatic systems with a comparable setup. We provide an ablation study and show that our framework outperforms existing interactive systems. Our framework largely outperforms existing systems in accuracy, with the largest impact on the smallest, most difficult classes, and drastically reduces the average user segmentation time with fast inference at 47.2±6.2 ms per image.
No abstract
Accurate segmentation of medical images is essential for clinical decision-making, and deep learning techniques have shown remarkable results in this area. However, existing segmentation models that combine transformer and convolutional neural networks often use skip connections in U-shaped networks, which may limit their ability to capture contextual information in medical images. To address this limitation, we propose a coordinated mobile and residual transformer UNet (MRC-TransUNet) that combines the strengths of transformer and UNet architectures. Our approach uses a lightweight MR-ViT to address the semantic gap and a reciprocal attention module to compensate for the potential loss of details. To better explore long-range contextual information, we use skip connections only in the first layer and add MR-ViT and RPA modules in the subsequent downsampling layers. In our study, we evaluated the effectiveness of our proposed method on three different medical image segmentation datasets, namely, breast, brain, and lung. Our proposed method outperformed state-of-the-art methods in terms of various evaluation metrics, including the Dice coefficient and Hausdorff distance. These results demonstrate that our proposed method can significantly improve the accuracy of medical image segmentation and has the potential for clinical applications. (Graphical abstract: input images are first downsampled; MR-ViT replaces the original skip-connection structure; feature representations at different scales are fused by the RPA module; and upsampling restores the features to the input resolution.)
Semantic segmentation is a fundamental part of the surgical application of deep learning. Traditionally, segmentation in vision tasks has been performed using convolutional neural networks (CNNs), but the transformer architecture has recently been introduced and widely investigated. We aimed to investigate the performance of deep learning models in segmentation in robot-assisted radical prostatectomy (RARP) and identify which of the architectures is superior for segmentation in robotic surgery. Intraoperative images during RARP were obtained. The dataset was randomly split into training and validation data. Segmentation of the surgical instruments, bladder, prostate, vas and seminal vesicle was performed using three CNN models (DeepLabv3, MANet, and U-Net++) and three transformers (SegFormer, BEiT, and DPT), and their performances were analyzed. The overall segmentation performance during RARP varied across different model architectures. For the CNN models, DeepLabV3 achieved a mean Dice score of 0.938, MANet scored 0.944, and U-Net++ reached 0.930. For the transformer architectures, SegFormer attained a mean Dice score of 0.919, BEiT scored 0.916, and DPT achieved 0.940. The performance of CNN models was superior to that of transformer models in segmenting the prostate, vas, and seminal vesicle. Deep learning models provided accurate segmentation of the surgical instruments and anatomical structures observed during RARP. Both CNN and transformer models showed reliable predictions in the segmentation task; however, CNN models may be more suitable than transformer models for organ segmentation and may be more applicable in unusual cases. Further research with large datasets is needed.
In colorectal cancer (CRC), artificial intelligence (AI) can alleviate the laborious task of characterization and reporting on resected biopsies, including polyps, the numbers of which are increasing as a result of CRC population screening programs ongoing in many countries all around the globe. Here, we present an approach to address two major challenges in the automated assessment of CRC histopathology whole-slide images. We present an AI-based method to segment multiple tissue compartments in the H&E-stained whole-slide image, which provides a different, more perceptible picture of tissue morphology and composition. We test and compare a panel of state-of-the-art loss functions available for segmentation models, and provide indications about their use in histopathology image segmentation, based on the analysis of (a) a multi-centric cohort of CRC cases from five medical centers in the Netherlands and Germany, and (b) two publicly available datasets on segmentation in CRC. We used the best-performing AI model as the basis for a computer-aided diagnosis system that classifies colon biopsies into four main pathologically relevant categories. We report the performance of this system on an independent cohort of more than 1000 patients. The results show that, with a good segmentation network as a base, a tool can be developed to support pathologists in the risk stratification of colorectal cancer patients, among other possible uses. We have made the segmentation model available for research use at https://grand-challenge.org/algorithms/colon-tissue-segmentation/ .
The preservation of autonomic nerves is the most important factor in maintaining genitourinary function in colorectal surgery; however, these nerves are not clearly recognisable, and their identification is strongly affected by the surgical ability. Therefore, this study aimed to develop a deep learning model for the semantic segmentation of autonomic nerves during laparoscopic colorectal surgery and to experimentally verify the model through intraoperative use and pathological examination. The annotation data set comprised videos of laparoscopic colorectal surgery. The images of the hypogastric nerve (HGN) and superior hypogastric plexus (SHP) were manually annotated under a surgeon's supervision. The Dice coefficient was used to quantify the model performance after five-fold cross-validation. The model was used in actual surgeries to compare the recognition timing of the model with that of surgeons, and pathological examination was performed to confirm whether the samples labelled by the model from the colorectal branches of the HGN and SHP were nerves. The data set comprised 12 978 video frames of the HGN from 245 videos and 5198 frames of the SHP from 44 videos. The mean (±SD) Dice coefficients of the HGN and SHP were 0.56 (±0.03) and 0.49 (±0.07), respectively. The proposed model was used in 12 surgeries, and it recognised the right HGN earlier than the surgeons did in 50.0% of the cases, the left HGN earlier in 41.7% of the cases and the SHP earlier in 50.0% of the cases. Pathological examination confirmed that all 11 samples were nerve tissue. An approach for the deep-learning-based semantic segmentation of autonomic nerves was developed and experimentally validated. This model may facilitate intraoperative recognition during laparoscopic colorectal surgery.
Left-ventricular (LV) strain measurements with the Displacement Encoding with Stimulated Echoes (DENSE) MRI sequence provide accurate estimates of cardiotoxicity damage related to breast cancer chemotherapy. This study investigated an automated LV chamber quantification tool via segmentation with a supervised deep convolutional neural network (DCNN) before strain analysis with DENSE images. Segmentation for chamber quantification analysis was conducted with a custom DeepLabV3+ DCNN with ResNet-50 backbone on 42 female breast cancer datasets (22 training-sets, eight validation-sets and 12 independent test-sets). Parameters such as LV end-diastolic diameter (LVEDD) and ejection fraction (LVEF) were quantified, and myocardial strains analyzed with the Radial Point Interpolation Method (RPIM). Myocardial classification was validated against ground-truth with sensitivity-specificity analysis, the metrics of Dice, average perpendicular distance (APD) and Hausdorff-distance. Following segmentation, validation was conducted with the Cronbach's Alpha (C-Alpha) intraclass correlation coefficient between LV chamber quantification results with DENSE and Steady State Free Precession (SSFP) acquisitions and a vendor tool-based method to segment the DENSE data, and similarly for myocardial strain analysis in the chambers. The results of myocardial classification from segmentation of the DENSE data were accuracy = 97%, Dice = 0.89 and APD = 2.4 mm in the test-set. The C-Alpha correlations from comparing chamber quantification results between the segmented DENSE and SSFP data and vendor tool-based method were 0.97 for LVEF (56 ± 7% vs 55 ± 7% vs 55 ± 6%, p = 0.6) and 0.77 for LVEDD (4.6 ± 0.4 cm vs 4.5 ± 0.3 cm vs 4.5 ± 0.3 cm, p = 0.8). The validation metrics against ground-truth and equivalent parameters obtained from the SSFP segmentation and vendor tool-based comparisons show that the DCNN approach is applicable for automated LV chamber quantification and subsequent strain analysis in cardiotoxicity.
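For orientation, a DeepLabV3 segmenter with a ResNet-50 backbone can be instantiated directly from torchvision (recent versions), as sketched below for a two-class myocardium/background problem; this is a generic stand-in, not the study's customized DeepLabV3+ network or its training setup:

```python
# A minimal sketch of a DeepLabV3/ResNet-50 segmenter via torchvision.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None, num_classes=2)  # background vs myocardium
model.eval()

x = torch.randn(1, 3, 256, 256)        # a 3-channel input frame (illustrative size)
with torch.no_grad():
    logits = model(x)["out"]           # (1, 2, 256, 256) per-pixel class scores
pred = logits.argmax(dim=1)            # hard segmentation mask, (1, 256, 256)
```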
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19). Imaging tests such as chest X-ray (CXR) and computed tomography (CT) can provide useful information to clinical staff for facilitating a diagnosis of COVID-19 in a more efficient and comprehensive manner. As a breakthrough of artificial intelligence (AI), deep learning has been applied to perform COVID-19 infection region segmentation and disease classification by analyzing CXR and CT data. However, prediction uncertainty of deep learning models for these tasks, which is very important to safety-critical applications like medical image processing, has not been comprehensively investigated. In this work, we propose a novel ensemble deep learning model through integrating bagging deep learning and model calibration to not only enhance segmentation performance, but also reduce prediction uncertainty. The proposed method has been validated on a large dataset that is associated with CXR image segmentation. Experimental results demonstrate that the proposed method can improve the segmentation performance, as well as decrease prediction uncertainty.
U-Net includes encoder, decoder, and skip-connection structures and has become the benchmark network in medical image segmentation. However, the direct fusion by traditional skip connections of low-level and high-level convolutional features separated by a semantic gap may lead to problems such as fuzzy feature maps and target-region segmentation errors. We use spatial enhancement filtering to compensate for the semantic gap and propose an enhanced dense U-Net (E-DU), aiming to apply it to multimodal medical image segmentation to improve segmentation performance and efficiency. Before combining encoder and decoder features, we replace the traditional skip connection with a multiscale denoise enhancement (MDE) module: the encoder features are deeply convolved by the spatial enhancement filter and then combined with the decoder features. We propose a simple and efficient fully convolutional network structure, E-DU, which can not only fuse semantically diverse features but also denoise and enhance the feature maps. We performed experiments on medical image segmentation datasets covering seven image modalities and combined MDE with various baseline networks for ablation studies. E-DU achieved the best segmentation results among the U-Net family on evaluation indicators such as DSC, with DSC values of 97.78, 97.64, 95.31, 94.42, 94.93, 98.85, and 98.38 (%) on the seven modalities, respectively. Adding the MDE module to attention-mechanism networks also improves segmentation performance and efficiency, reflecting its generalization ability. Our method is also competitive with advanced methods. The proposed MDE module has a good segmentation effect and operating efficiency and can easily be extended to multiple multimodal medical segmentation datasets. Our idea and method can achieve clinical multimodal medical image segmentation and make full use of image information to provide clinical decision support, with great application value and promotion prospects.
Recently, an emerging trend in medical image classification is to combine radiomics framework with deep learning classification network in an integrated system. Although this combination is efficient in some tasks, the deep learning-based classification network is often difficult to capture an effective representation of lesion regions, and prone to face the challenge of overfitting, leading to unreliable features and inaccurate results, especially when the sizes of the lesions are small or the training dataset is small. In addition, these combinations mostly lack an effective feature selection mechanism, which makes it difficult to obtain the optimal feature selection. In this paper, we introduce a novel and effective deep semantic segmentation feature-based radiomics (DSFR) framework to overcome the above-mentioned challenges, which consists of two modules: the deep semantic feature extraction module and the feature selection module. Specifically, the extraction module is utilized to extract hierarchical semantic features of the lesions from a trained segmentation network. The feature selection module aims to select the most representative features by using a novel feature similarity adaptation algorithm. Experiments are extensively conducted to evaluate our method in two clinical tasks: the pathological grading prediction in pancreatic neuroendocrine neoplasms (pNENs), and the prediction of thrombolytic therapy efficacy in deep venous thrombosis (DVT). Experimental results on both tasks demonstrate that the proposed method consistently outperforms the state-of-the-art approaches by a large margin.
Fundus tessellation (FT) is a prevalent clinical feature associated with myopia and has implications in the development of myopic maculopathy, which causes irreversible visual impairment. Accurate classification of FT in color fundus photographs can help predict disease progression and prognosis. However, the lack of precise detection and classification tools has created an unmet medical need, underscoring the importance of exploring the clinical utility of FT. To address this gap, we introduce an automatic FT grading system (called DeepGraFT) using classification-and-segmentation co-decision models based on deep learning. ConvNeXt, using transfer learning from pretrained ImageNet weights, was employed for the classification algorithm, aligned with a region of interest based on the ETDRS grading system to boost performance. A segmentation model was developed to detect FT, complementing the classification for improved grading accuracy. The training set of DeepGraFT was from our in-house cohort (MAGIC), and the validation sets consisted of the remainder of the in-house cohort and an independent public cohort (UK Biobank). DeepGraFT performed well in the training stage and achieved high accuracy in the validation phase (in-house cohort: 86.85%; public cohort: 81.50%). Furthermore, our findings demonstrated that DeepGraFT surpasses machine learning-based classification models in FT classification, achieving a 5.57% increase in accuracy. Ablation analysis revealed that the introduced modules significantly enhanced classification effectiveness, elevating accuracy from 79.85% to 86.85%. Further analysis using the results provided by DeepGraFT unveiled a significant negative association between FT and spherical equivalent (SE) in the UK Biobank cohort. In conclusion, DeepGraFT highlights the potential benefits of deep learning models in automating the grading of FT and shows potential utility as a clinical decision support tool for predicting the progression of pathological myopia.
To develop a robust segmentation model, encoding the underlying features/structures of the input data is essential to discriminate the target structure from the background. To enrich the extracted feature maps, contrastive learning and self-learning techniques are employed, particularly when the size of the training dataset is limited. In this work, we set out to investigate the impact of contrastive learning and self-learning on the performance of deep learning-based semantic segmentation. To this end, three different datasets were employed for brain tumor and hippocampus delineation from MR images (BraTS and Decathlon datasets, respectively) and kidney segmentation from CT images (Decathlon dataset). Since data augmentation techniques are also aimed at enhancing the performance of deep learning methods, a deformable data augmentation technique was proposed and compared with the contrastive learning and self-learning frameworks. The segmentation accuracy for the three datasets was assessed with and without applying data augmentation, contrastive learning, and self-learning to individually investigate the impact of these techniques. The self-learning and deformable data augmentation techniques exhibited comparable performance, with Dice indices of 0.913 ± 0.030 and 0.920 ± 0.022 for kidney segmentation, 0.890 ± 0.035 and 0.898 ± 0.027 for hippocampus segmentation, and 0.891 ± 0.045 and 0.897 ± 0.040 for lesion segmentation, respectively. These two approaches significantly outperformed contrastive learning and the original model, whose Dice indices were 0.871 ± 0.039 and 0.868 ± 0.042 for kidney segmentation, 0.872 ± 0.045 and 0.865 ± 0.048 for hippocampus segmentation, and 0.870 ± 0.049 and 0.860 ± 0.058 for lesion segmentation, respectively. The combination of self-learning with deformable data augmentation led to a robust segmentation model with no outliers in the outcomes. This work demonstrated the beneficial impact of self-learning and deformable data augmentation on organ and lesion segmentation, where no additional training datasets are needed.
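Deformable augmentation of the kind compared above is commonly implemented as an elastic deformation: a smoothed random displacement field warps the image and, crucially, its mask with the same field. A minimal sketch with assumed strength parameters (`alpha`, `sigma`):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=30.0, sigma=4.0, seed=None):
    """Warp a 2D image by a Gaussian-smoothed random displacement field.
    For a segmentation mask, reuse the same field with order=0."""
    rng = np.random.default_rng(seed)
    shape = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    coords = np.array([y + dy, x + dx])          # (2, H, W) sampling grid
    return map_coordinates(image, coords, order=1, mode="reflect")

img = np.random.rand(128, 128)
warped = elastic_deform(img, seed=0)
```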
Ultrasound (US) imaging is a widely used medical imaging modality for the diagnosis, monitoring, and surgical planning of kidney conditions. Accurate segmentation of the kidney and its internal structures in US images is therefore essential for assessing kidney function and detecting pathological conditions such as cysts, tumors, and kidney stones, creating a need for automated methods that can perform this segmentation accurately. Over the years, automatic strategies have been proposed for this purpose, with deep learning methods achieving the current state-of-the-art results. However, these strategies typically ignore the segmentation of the internal structures of the kidney. Moreover, they were evaluated on different private datasets, hampering the direct comparison of results and making it difficult to determine the optimal strategy for this task. In this study, we perform a comparative analysis of 7 deep learning networks for the segmentation of the kidney and internal structures (Capsule, Central Echogenic Complex (CEC), Cortex, and Medulla) in 2D US images on an open-access multi-class kidney US dataset. The dataset includes 514 images acquired in multiple clinical centers using different US machines and protocols. The dataset contains annotations from two experts, but only the 321 images with complete segmentation of all 4 classes were used. Overall, the results demonstrate that the DeepLabV3+ network outperformed the inter-rater agreement, with a Dice score of 78.0% compared to 75.6% for inter-rater variability. Specifically, DeepLabV3+ achieved mean Dice scores of 94.2% for the Capsule, 85.8% for the CEC, 62.4% for the Cortex, and 69.6% for the Medulla. These findings suggest the potential of deep learning-based methods for improving the accuracy of kidney segmentation in US images. Clinical relevance: this study shows the potential of DL for improving the accuracy of kidney segmentation in US, leading to increased diagnostic efficiency and enabling new applications such as computer-aided diagnosis and treatment, ultimately resulting in improved patient outcomes and reduced healthcare costs.
Automated medical image segmentation plays a crucial role in diverse clinical applications. The high annotation costs of fully-supervised medical segmentation methods have spurred a growing interest in semi-supervised methods. Existing semi-supervised medical segmentation methods train a teacher segmentation network on labeled data to establish pseudo labels for unlabeled data. The quality of these pseudo labels is constrained, as these methods fail to effectively address the significant bias in the data distribution learned from the limited labeled data. To address these challenges, this paper introduces an innovative Correspondence-based Generative Bayesian Deep Learning (C-GBDL) model. Built upon the teacher-student architecture, we design a multi-scale semantic correspondence method to aid the teacher model in generating high-quality pseudo labels. Specifically, our teacher model, embedded with the multi-scale semantic correspondence, learns a better-generalized data distribution from input volumes by feature matching with the reference volumes. Additionally, a double uncertainty estimation scheme is proposed to further rectify the noisy pseudo labels: it uses the predictive entropy as the first uncertainty estimate and the structural similarity between the input volume and its corresponding reference volumes as the second. Four groups of comparative experiments conducted on two public medical datasets demonstrate the effectiveness and the superior performance of our proposed model. Our code is available on https://github.com/yumjoo/C-GBDL.
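The first of the two uncertainty estimates above, predictive entropy, is straightforward to compute from the network's logits. A minimal sketch (the thresholding at the end is an assumed usage, not the paper's exact rectification rule):

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    """Voxel-wise entropy of the softmax prediction; high entropy flags
    voxels whose pseudo labels are likely unreliable."""
    probs = F.softmax(logits, dim=1)                        # (B, C, ...)
    return -(probs * torch.log(probs + 1e-8)).sum(dim=1)    # (B, ...)

logits = torch.randn(2, 4, 16, 16, 16)   # batch of 3D predictions, 4 classes
unc = predictive_entropy(logits)
keep = unc < unc.mean()   # e.g., keep only low-uncertainty voxels for training
```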
We present a novel deep multi-task learning method for medical image segmentation. Existing multi-task methods demand ground truth annotations for both the primary and auxiliary tasks. In contrast, we propose to generate the pseudo-labels of an auxiliary task in an unsupervised manner. To generate the pseudo-labels, we leverage Histograms of Oriented Gradients (HOGs), one of the most widely used and powerful hand-crafted features for detection. Together with the ground truth semantic segmentation masks for the primary task and pseudo-labels for the auxiliary task, we learn the parameters of the deep network to jointly minimize the losses of the primary and auxiliary tasks. We applied our method to two powerful and widely used semantic segmentation networks, UNet and U2Net, trained in a multi-task setup. To validate our hypothesis, we performed experiments on two different medical image segmentation datasets. From the extensive quantitative and qualitative results, we observe that our method consistently improves performance compared to the counterpart methods. Moreover, our method is the winner of the FetReg Endovis sub-challenge on Semantic Segmentation organised in conjunction with MICCAI 2021. Code and implementation details are available at: https://github.com/thetna/medical_image_segmentation.
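HOG pseudo-labels of the kind described can be computed offline with scikit-image; the network then regresses this map as the auxiliary target alongside the segmentation mask. A minimal sketch (the cell and block sizes are assumed defaults, not the paper's settings):

```python
import numpy as np
from skimage.feature import hog

def hog_pseudo_label(image):
    """Unsupervised auxiliary target: the HOG visualization of the input,
    normalized to [0, 1] so it can be regressed with an L2 loss."""
    _, hog_map = hog(
        image,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,
    )
    return (hog_map - hog_map.min()) / (np.ptp(hog_map) + 1e-8)

pseudo = hog_pseudo_label(np.random.rand(256, 256))
```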
The traditional Lund-Mackay score (TLMs) is unable to subgrade the volume of inflammatory disease. We aimed to propose an effective modification and calculate a volume-based modified LM score (VMLMs), which should correlate more strongly with clinical symptoms than the TLMs. Semi-supervised learning with pseudo-labels used for self-training was adopted to train our convolutional neural networks, with the algorithm including a combination of MobileNet, SENet, and ResNet. A total of 175 CT sets were recruited, including 50 participants who would undergo sinus surgery. The Sinonasal Outcomes Test-22 (SNOT-22) was used to assess disease-specific symptoms before and after surgery. A 3D-projected view was created and VMLMs were calculated for further comparison. Our methods showed a significant improvement in both sinus classification and segmentation as compared to state-of-the-art networks, with an average Dice coefficient of 91.57%, an mIoU of 89.43%, and a pixel accuracy of 99.75%. The sinus volume exhibited sexual dimorphism. There was a significant positive correlation between volume and height, but a trend toward a negative correlation between maxillary sinus volume and age. Subjects who underwent surgery had significantly greater TLMs (14.9 vs. 7.38) and VMLMs (11.65 vs. 4.34) than those who did not. ROC-AUC analyses showed that the VMLMs had excellent discrimination in classifying a high probability of postoperative improvement, as reflected by SNOT-22 reduction. Our method is suitable for obtaining detailed information, excellent sinus boundary prediction, and differentiating the target from its surrounding structures. These findings demonstrate the promise of CT-based volumetric analysis of sinus mucosal inflammation.
Recognition and segmentation of brain tumours (BT) in MR images are valuable but tedious processes in healthcare. Early diagnosis and localization of BT give doctors timely options for selecting effective treatment plans and can save lives. BT segmentation from magnetic resonance imaging (MRI) is considered a major challenge owing to the heterogeneous appearance of BT tissue, and separating it from healthy tissue is difficult when segmentation is performed manually by radiologists. Among recent proposals, BT segmentation methods based on machine learning (ML) and image processing appear the most promising. Deep learning (DL)-based brain segmentation is therefore extensively applied, and convolutional networks achieve better segmentation results; however, deep convolutional models suffer from substantial information loss and large parameter counts in their encoding and decoding processes. With this motivation, this article presents a new Deep Transfer Learning with Semantic Segmentation based Medical Image Analysis (DTLSS-MIA) technique for MRI images. The DTLSS-MIA technique aims to segment the affected BT area in MRI images. First, the presented method utilizes a median filtering (MF) approach to optimize the quality of MRI images and remove noise. For semantic segmentation, the DTLSS-MIA method follows DeepLabv3+ with an EfficientNet backbone to determine the affected brain region. Moreover, the CapsNet architecture is employed for feature extraction. Lastly, the crayfish optimization (CFO) technique with a diffusion variational autoencoder (D-VAE) architecture is used as the classification mechanism, with CFO effectively tuning the D-VAE hyperparameters. The DTLSS-MIA technique is validated on a benchmark dataset, where it exhibited a superior accuracy of 99.53% over other methods.
Medical image segmentation is a research focus and a foundation for developing intelligent medical systems. Recently, deep learning for medical image segmentation has become a standard process and succeeded significantly, promoting the development of disease diagnosis, reconstruction, and surgical planning. However, semantic learning is often inefficient owing to the lack of supervision of feature maps, meaning that high-quality segmentation models rely on numerous and accurate data annotations. Learning robust semantic representations in latent spaces remains a challenge. In this paper, we propose a novel semi-supervised learning framework to learn vital attributes in medical images, which constructs generalized representations from diverse semantics to realize medical image segmentation. We first build a self-supervised learning part that achieves context recovery by reconstructing the space and intensity of medical images, which provides semantic representations for the feature maps. Subsequently, we combine the semantic-rich feature maps and apply a simple linear semantic transformation to convert them into an image segmentation. The proposed framework was tested on five medical segmentation datasets. Quantitative assessments indicate the highest scores of our method on the IXI (73.78%), ScaF (47.50%), COVID-19-Seg (50.72%), PC-Seg (65.06%), and Brain-MR (72.63%) datasets. Finally, we compared our method with the latest semi-supervised learning methods and obtained DSC values of 77.15% and 75.22%, ranking first on two representative datasets. The experimental results not only proved that the proposed linear semantic transformation is effective for medical image segmentation, but also demonstrated its simplicity and ease of use in pursuing robust segmentation with semi-supervised learning. Our code is now open at: https://github.com/QingYunA/Linear-Semantic-Transformation-for-Semi-Supervised-Medical-Image-Segmentation.
Particularly within the Internet of Medical Things (IoMT) context, skin lesion analysis is critical for precise diagnosis, and CAD systems play a crucial role in improving its accuracy and efficiency. This study focuses on hybrid deep learning techniques for segmenting and classifying skin lesions in dermoscopy images. The hybrid model combines two cutting-edge approaches: a Mask Region-based Convolutional Neural Network (MRCNN) for semantic segmentation and ResNet50 for lesion detection. The MRCNN delineates lesion borders to pinpoint the precise location of each skin lesion. We assembled a large, annotated collection of dermoscopy images, on which the hybrid model is trained end-to-end to capture subtle representations of the images. Experimental results on dermoscopy images show that the proposed hybrid method outperforms the current state-of-the-art methods. A segmentation accuracy of 95.49% demonstrates the model's capacity to separate lesions into distinct groups. In addition, the classification of skin lesions shows high accuracy and dependability, a notable advance over traditional methods. Evaluated on the ISIC 2020 Challenge dataset, the model achieves 96.75% accuracy. Compared with current best practices in IoMT, the segmentation and classification models perform exceptionally well. In conclusion, this paper's hybrid deep learning strategy is highly effective for skin lesion segmentation and classification. The results show that the model has the potential to improve diagnostic accuracy in the IoMT setting and outperforms the current gold standards, and the results on the ISIC 2020 Challenge dataset further confirm the viability of the suggested methodology for skin lesion analysis.
Detection and segmentation of brain tumors using MR images are challenging and valuable tasks in the medical field. Early diagnosis and localization of brain tumors can save lives and provide physicians with timely options for selecting efficient treatment plans. Deep learning approaches have attracted researchers in medical imaging due to their capacity, performance, and potential to assist in accurate diagnosis, prognosis, and medical treatment technologies. This paper presents a novel framework for segmenting 2D brain tumors in MR images using deep neural networks (DNN) and data augmentation strategies. The proposed approach (Znet) is based on skip connections, encoder-decoder architectures, and data amplification to propagate the intrinsic affinities of a relatively small number of expert-delineated tumors, e.g., hundreds of patients with low-grade glioma (LGG), to many thousands of synthetic cases. Our experimental results showed high values of the mean Dice similarity coefficient (Dice = 0.96 during model training and Dice = 0.92 for the independent testing dataset). Other evaluation measures were also relatively high, e.g., pixel accuracy = 0.996, F1 score = 0.81, and Matthews correlation coefficient MCC = 0.81. The results and visualization of the DNN-derived tumor masks in the testing dataset showcase the Znet model's capability to localize and auto-segment brain tumors in MR images. The approach can further be generalized to 3D brain volumes, other pathologies, and a wide range of image modalities. We can confirm the ability of deep learning methods and the proposed Znet framework to detect and segment tumors in MR images. Furthermore, pixel accuracy may not be a suitable evaluation measure for semantic segmentation under class imbalance in MR image segmentation, because the dominant class in the ground truth images is the background; a high pixel accuracy can therefore be misleading in some computer vision applications. Alternative evaluation metrics, such as Dice and IoU (Intersection over Union), are more informative for semantic segmentation. Artificial intelligence (AI) applications in medicine are advancing swiftly; however, few techniques are deployed in clinical practice. This research demonstrates a practical example of AI applications in medical imaging that can be deployed as a tool for auto-segmentation of tumors in MR images.
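The class-imbalance point above is easy to make concrete: on a slice where the tumor occupies about 1% of the pixels, a model that predicts pure background still scores roughly 99% pixel accuracy while its Dice is zero. A small numpy illustration:

```python
import numpy as np

def dice(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum() + 1e-8)

# A 256x256 slice where the tumor covers ~1% of the pixels.
gt = np.zeros((256, 256), dtype=bool)
gt[100:126, 100:126] = True            # 676 foreground pixels

pred = np.zeros_like(gt)               # a model that predicts "no tumor"
acc = (pred == gt).mean()
print(f"pixel accuracy = {acc:.3f}")   # ~0.990, looks excellent
print(f"dice = {dice(pred, gt):.3f}")  # 0.000, reveals total failure
```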
The increased availability and usage of modern medical imaging have induced a strong need for automatic medical image segmentation. Still, current image segmentation platforms do not provide the required functionality for the straightforward setup of medical image segmentation pipelines. Already implemented pipelines are commonly standalone software optimized on a specific public data set. Therefore, this paper introduces the open-source Python library MIScnn. The aim of MIScnn is to provide an intuitive API allowing fast building of medical image segmentation pipelines, including data I/O, preprocessing, data augmentation, patch-wise analysis, metrics, a library of state-of-the-art deep learning models, and model utilization such as training, prediction, and fully automatic evaluation (e.g. cross-validation). In addition, high configurability and multiple open interfaces allow full pipeline customization. Running a cross-validation with MIScnn on the Kidney Tumor Segmentation Challenge 2019 data set (multi-class semantic segmentation with 300 CT scans) resulted in a powerful predictor based on the standard 3D U-Net model. With this experiment, we show that the MIScnn framework enables researchers to rapidly set up a complete medical image segmentation pipeline with just a few lines of code. The source code for MIScnn is available in the Git repository: https://github.com/frankkramer-lab/MIScnn .
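The "few lines of code" claim can be illustrated with a sketch of such a pipeline. The class and argument names below are assumptions recalled from the project's README rather than a verified API; consult the linked repository for the authoritative interface.

```python
# Sketch of a MIScnn-style pipeline. All names here are assumptions
# based on the project README; check the repository for the actual API.
from miscnn.data_loading.interfaces import NIFTI_interface
from miscnn import Data_IO, Preprocessor, Neural_Network

interface = NIFTI_interface(channels=1, classes=3)       # assumed I/O interface
data_io = Data_IO(interface, "/data/kits19")             # data I/O layer
pp = Preprocessor(data_io, batch_size=2,
                  analysis="patchwise-crop",
                  patch_shape=(80, 160, 160))            # patch-wise analysis
model = Neural_Network(preprocessor=pp)                  # default: 3D U-Net
sample_list = data_io.get_indiceslist()                  # assumed helper
model.train(sample_list, epochs=50)
```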
Although existing deep supervised solutions have achieved great successes in medical image segmentation, they have the following shortcomings: (i) the semantic difference problem: since they are obtained by very different convolution or deconvolution processes, the intermediate masks and predictions in deep supervised baselines usually contain semantics of different depths, which hinders the models' learning capabilities; (ii) the low learning efficiency problem: additional supervision signals inevitably make training more time-consuming. Therefore, in this work, we first propose two deep supervised learning strategies, U-Net-Deep and U-Net-Auto, to overcome the semantic difference problem. Then, to resolve the low learning efficiency problem, we build on these two strategies and propose a new deep supervised segmentation model, called μ-Net, to achieve not only effective but also efficient deep supervised medical image segmentation, by introducing a tied-weight decoder that generates pseudo-labels with more diverse information and also speeds up convergence during training. Finally, three different types of μ-Net-based deep supervision strategies are explored, and a Similarity Principle of Deep Supervision is derived to guide future research in deep supervised learning. Experimental studies on four public benchmark datasets show that μ-Net greatly outperforms all the state-of-the-art baselines, including the state-of-the-art deeply supervised segmentation models, in terms of both effectiveness and efficiency. Ablation studies confirm the soundness of the proposed Similarity Principle of Deep Supervision, the necessity and effectiveness of the tied-weight decoder, and the use of both the segmentation and reconstruction pseudo-labels for deep supervised learning.
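For readers unfamiliar with the baseline scheme the paper improves on, generic deep supervision attaches a loss to several decoder depths and sums them with decaying weights. The sketch below shows that common scheme only, not the specific μ-Net formulation; the weights are illustrative.

```python
import torch
import torch.nn.functional as F

def deep_supervised_loss(outputs, target, weights=(1.0, 0.5, 0.25)):
    """Combine losses from side outputs at several decoder depths,
    resizing the ground truth to each side output's resolution."""
    total = 0.0
    for w, out in zip(weights, outputs):
        t = F.interpolate(target, size=out.shape[2:], mode="nearest")
        total = total + w * F.binary_cross_entropy_with_logits(out, t)
    return total

target = torch.randint(0, 2, (2, 1, 64, 64)).float()
outs = [torch.randn(2, 1, s, s) for s in (64, 32, 16)]   # side outputs
loss = deep_supervised_loss(outs, target)
```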
In the field of computer-aided medical diagnosis, it is crucial to adapt medical image segmentation to limited computing resources, and there is tremendous value in developing accurate, real-time vision processing models that require minimal computational resources. When building lightweight models there is always a trade-off between computational cost and segmentation performance, and performance often suffers when models are adapted to resource-constrained scenarios with limited computation, memory, or storage. This remains an ongoing challenge. This paper proposes a lightweight network for medical image segmentation. It introduces a lightweight transformer, proposes a simplified core feature extraction network to capture more semantic information, and builds a multi-scale feature interaction guidance framework. The fusion module embedded in this framework is designed to address spatial and channel complexities. Through the multi-scale feature interaction guidance framework and fusion module, the proposed network achieves robust semantic information extraction from low-resolution feature maps and rich spatial information retrieval from high-resolution feature maps while preserving segmentation performance. This significantly reduces the parameters required to maintain deep features within the network, resulting in faster inference and reduced floating-point operations (FLOPs) and parameter counts. Experimental results on the ISIC2017 and ISIC2018 datasets confirm the effectiveness of the proposed network for medical image segmentation tasks. For instance, on the ISIC2017 dataset, the proposed network achieved a segmentation accuracy of 82.33% mIoU and a speed of 71.26 FPS on 256 × 256 images using a GeForce RTX 3090 GPU. Furthermore, the proposed network is extremely lightweight, containing only 0.524M parameters. The corresponding source codes are available at https://github.com/CurbUni/LMIS-lightweight-network.
Instance segmentation of biological cells is important in medical image analysis for identifying and segmenting individual cells, and quantitative measurement of subcellular structures requires further cell-level subcellular part segmentation. Subcellular structure measurements are critical for cell phenotyping and quality analysis. For these purposes, an instance-aware part segmentation network is first introduced to distinguish individual cells and segment subcellular structures for each detected cell. This approach is demonstrated on human sperm cells, since the World Health Organization has established quantitative standards for sperm quality assessment. Specifically, a novel Cell Parsing Net (CP-Net) is proposed for accurate instance-level cell parsing. An attention-based feature fusion module is designed to alleviate contour misalignments for cells with irregular shapes by using instance masks as spatial cues instead of strict constraints to differentiate instances. A coarse-to-fine segmentation module is developed to effectively segment tiny subcellular structures within a cell through hierarchical whole-to-part segmentation, instead of directly segmenting each cell part. Moreover, a sperm parsing dataset is built, comprising 320 annotated sperm images with five semantic subcellular part labels. Extensive experiments on the collected dataset demonstrate that the proposed CP-Net outperforms state-of-the-art instance-aware part segmentation networks.
Nucleus instance segmentation is an important task in medical image analysis involving cell-level pathological analysis, and is of great significance for many biomedical applications, such as disease diagnosis and drug screening. However, high density and tight contact between cells are common features of most cell images, which poses a great technical challenge for nuclei instance segmentation. The latest research focuses on CNN-based methods, which typically rely on bounding box regression and non-maximum suppression to locate nuclei; this frequently results in poor local bounding boxes for nuclei that are adhered or clustered together. In response to these challenges, we propose a novel end-to-end nuclei instance segmentation model. Specifically, we first employ the Swin Transformer as the backbone of our model, which captures global multi-scale information by combining the global modelling capability of transformers with the local modelling capability of convolutional neural networks (CNNs). Additionally, we integrate a graph convolutional feature fusion module (GCFM), which combines deep and shallow features to learn an affinity matrix and adopts graph convolution to guide the network in learning object-level local information. Finally, we design a hybrid dilated convolution module (HDC) and insert it into the backbone to capture contextual information over a large range. These components assist the network in extracting rich features. The experimental results demonstrate that our algorithm outperforms several state-of-the-art models on the DSB2018 and LIVECell datasets.
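The hybrid dilated convolution idea can be sketched briefly. The block below is illustrative only, assuming the commonly used co-prime dilation rates (1, 2, 5) from the HDC literature rather than this paper's exact configuration:

```python
import torch
import torch.nn as nn

class HDCBlock(nn.Module):
    """Stacked 3x3 convolutions whose dilation rates share no common
    factor, enlarging the receptive field while avoiding the gridding
    artifact that repeating a single rate would produce."""
    def __init__(self, channels, rates=(1, 2, 5)):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x):
        return self.layers(x) + x   # residual connection preserves detail

block = HDCBlock(64)
y = block(torch.randn(1, 64, 56, 56))   # -> (1, 64, 56, 56)
```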
Whole brain segmentation is an important neuroimaging task that segments the whole brain volume into anatomically labeled regions of interest. Convolutional neural networks have demonstrated good performance in this task. Existing solutions usually segment the brain image by classifying voxels, or by labeling slices or sub-volumes separately; their representation learning is based on parts of the whole volume, and their labeling results are produced by aggregating partial segmentations. Learning and inference with incomplete information can lead to sub-optimal final segmentation results. To address these issues, we propose a full volume framework, which feeds the full brain volume into the segmentation network and directly outputs the segmentation result for the whole volume. The framework makes use of the complete information in each volume and can be implemented easily. An effective instantiation of this framework is then given: we adopt the 3D high-resolution network (HRNet) for learning spatially fine-grained representations and a mixed precision training scheme for memory-efficient training. Extensive experimental results on a publicly available 3D MRI brain dataset show that our proposed model advances the state-of-the-art methods in terms of segmentation performance.
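Mixed precision training is the standard way to fit a full brain volume into GPU memory. A minimal PyTorch AMP step, assuming `model`, `optimizer`, and `loss_fn` are defined elsewhere:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss_fn, volume, labels):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():          # forward pass in float16
        logits = model(volume)               # the full brain volume at once
        loss = loss_fn(logits, labels)
    scaler.scale(loss).backward()            # scaled to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```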
This paper presents a rapid and robust approach for 3D volumetric segmentation, labelling, and registration of human spinal vertebrae from CT scans using an optimised and improved 3D U-Net neural network architecture. The network is designed by incorporating residual and dense interconnections, followed by an extensive evaluation of different network setups by optimising the network components like activation functions, optimisers, and pooling operations. In addition, the network architecture is optimised for varying numbers of convolution layers per block and U-Net levels with fixed and cascading numbers of filters. For 3D virtual reality visualisation, the segmentation output of the improved 3D U-Net network is registered with the original scans through a corner point registration process. The registration takes into account the spatial coordinates of each segmented vertebra as a 3D volume and eight virtual fiducial markers to ensure alignment in all rotational planes. Trained on the VerSe'20 dataset, the proposed pipeline achieves a Dice score coefficient of 92.38% for vertebrae instance segmentation and a Hausdorff distance of 5.26 mm for vertebrae localisation on the VerSe'20 public test dataset, which outperforms many existing methods that participated in the VerSe'20 challenge. Integrated with Singular Health's MedVR software for virtual reality visualisation, the proposed solution has been deployed on standard edge-computing hardware in medical institutions. Depending on the scan size, the deployed solution takes between 90 and 210 s to label and segment vertebrae, including the cervical vertebrae. It is hoped that the acceleration of the segmentation and registration process will facilitate the easier preparation of future training datasets and benefit pre-surgical visualisation and planning.
The three-dimensional morphological structure of periodontal ligaments (PDLs) is important data for periodontal, orthodontic, prosthodontic, and implant interventions. This retrospective study aimed to employ a deep learning (DL) algorithm to segment the PDL automatically in cone-beam computed tomography (CBCT). We randomly selected 389 patients and 1734 axial CBCT images from the CBCT database, and designed a fully automatic computer-aided PDL segmentation model based on the Mask R-CNN instance segmentation network. The training labels were 'teeth' and 'alveolar bone', and the 'PDL' is defined as the region where the 'teeth' and 'alveolar bone' overlap. The model's segmentation performance was evaluated using CBCT data from eight patients outside the database. Qualitative evaluation indicates that the PDL segmentation accuracy of incisors, canines, premolars, wisdom teeth, and implants reached 100%, and the segmentation accuracy of molars was 96.4%. Quantitative evaluation indicates that the mIoU and mDSC of PDL segmentation were 0.667 ± 0.015 (>0.6) and 0.799 ± 0.015 (>0.7), respectively. This study presents a unique approach to AI-driven automatic segmentation of PDLs on CBCT imaging, possibly enabling chair-side measurements of PDLs to help periodontists, orthodontists, prosthodontists, and implantologists perform diagnosis and treatment planning more efficiently and accurately.
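The overlap definition of the PDL above reduces, for boolean masks, to a voxel-wise logical AND. A minimal sketch with synthetic masks:

```python
import numpy as np

def pdl_region(teeth_mask, bone_mask):
    """PDL defined as the region where the predicted 'teeth' and
    'alveolar bone' masks overlap: element-wise logical AND."""
    return np.logical_and(teeth_mask, bone_mask)

teeth = np.zeros((128, 128), dtype=bool); teeth[40:80, 40:80] = True
bone = np.zeros((128, 128), dtype=bool);  bone[70:110, 40:80] = True
pdl = pdl_region(teeth, bone)   # thin band where the two classes meet
print(pdl.sum())                # 400 overlapping pixels
```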
Deep learning-based automatic surgical instrument recognition is an indispensable technology for surgical research and development; however, pixel-level recognition with high accuracy is required to make it suitable for surgical automation. The aim was to develop a deep learning model that can simultaneously recognize 8 types of surgical instruments frequently used in laparoscopic colorectal operations and to evaluate its recognition performance. This quality improvement study was conducted at a single institution with a multi-institutional data set. Laparoscopic colorectal surgical videos recorded between April 1, 2009, and December 31, 2021, were included in the video data set. Deep learning-based instance segmentation, an image recognition approach that recognizes each object individually and pixel by pixel rather than roughly enclosing it with a bounding box, was performed for the 8 types of surgical instruments. Average precision, calculated from the area under the precision-recall curve, was used as the evaluation metric; it reflects the numbers of true-positive, false-positive, and false-negative results, and the mean average precision over the 8 instrument types was calculated. Five-fold cross-validation was used for validation: the annotation data set was split into 5 segments, of which 4 were used for training and the remainder for validation. The data set was split at the per-case level instead of the per-frame level; thus, images extracted from an intraoperative video in the training set never appeared in the validation set. Validation was performed for all 5 validation sets, and the average mean average precision was calculated. In total, 337 laparoscopic colorectal surgical videos were used. Pixel-by-pixel annotation was manually performed for 81,760 labels on 38,628 static images, constituting the annotation data set. The mean average precisions of the instance segmentation were 90.9% for 3 instruments, 90.3% for 4 instruments, 91.6% for 6 instruments, and 91.8% for 8 instruments. A deep learning-based instance segmentation model that simultaneously recognizes 8 types of surgical instruments with high accuracy was successfully developed, and the accuracy was maintained as the number of instrument types increased. This model can be applied to surgical innovations such as intraoperative navigation and surgical automation.
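The per-case (rather than per-frame) split described above corresponds to grouped cross-validation. A minimal sketch with synthetic IDs, using scikit-learn's GroupKFold to guarantee that frames from one surgical video never land in both folds:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

frame_ids = np.arange(1000)                      # 1000 annotated frames
case_ids = np.random.randint(0, 50, size=1000)   # which video each frame is from

for train_idx, val_idx in GroupKFold(n_splits=5).split(frame_ids, groups=case_ids):
    train_cases = set(case_ids[train_idx])
    val_cases = set(case_ids[val_idx])
    assert train_cases.isdisjoint(val_cases)     # no case-level leakage
```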
Fetal echocardiography is an essential and comprehensive examination technique for the detection of fetal heart anomalies. Accurate cardiac chamber segmentation can assist cardiologists in analyzing cardiac morphology and facilitate heart disease diagnosis. Previous research mainly focused on the segmentation of single cardiac chambers, such as left ventricle (LV) or left atrium (LA) segmentation. We propose a generic framework based on instance segmentation to segment the four cardiac chambers accurately and simultaneously. The proposed Category Attention Instance Segmentation Network (CA-ISNet) has three branches: a category branch for predicting the semantic category, a mask branch for segmenting the cardiac chambers, and a category attention branch for learning category information of instances, which is used to correct instance misclassification by the category branch. On our collected dataset, which contains echocardiography images with four-chamber views of 319 fetuses, experimental results show our method achieves superior segmentation performance compared with state-of-the-art methods. Specifically, using fivefold cross-validation, our model achieves Dice coefficients of 0.7956, 0.7619, 0.8199, and 0.7470 for the four cardiac chambers, with an average precision of 45.64%.
Current state-of-the-art medical image segmentation methods prioritize precision, often at the expense of increased computational demands and larger model sizes. Applying these large-scale models to relatively small medical image datasets tends to induce redundant computation, complicating the process without the necessary benefits, and poses challenges for integrating and deploying lightweight models on edge devices. For instance, recent transformer-based models have excelled in 2D and 3D medical image segmentation due to their extensive receptive fields and high parameter counts, but this effectiveness comes with a risk of overfitting on small datasets, and such designs often neglect the vital inductive biases of Convolutional Neural Networks (CNNs) that are essential for local feature representation. In this work, we propose PMFSNet, a novel medical image segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical of larger models. PMFSNet streamlines the UNet-based hierarchical structure and simplifies the computational complexity of the self-attention mechanism, making it suitable for lightweight applications. It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies. Comprehensive results demonstrate that our method achieves superior performance in various segmentation tasks at different data scales, even with fewer than a million parameters. PMFSNet achieves IoU of 84.68%, 82.02%, 78.82%, and 76.48% on public datasets of 3D CBCT Tooth, ovarian tumors ultrasound (MMOTU), skin lesions dermoscopy (ISIC 2018), and gastrointestinal polyp (Kvasir SEG), and yields DSC of 78.29%, 77.45%, and 78.04% on the three retinal vessel segmentation datasets DRIVE, STARE, and CHASE-DB1, respectively. Our proposed model exhibits competitive performance across various datasets with significantly fewer model parameters and less inference time, demonstrating its value for model integration and deployment. It strikes an effective compromise between efficiency and performance and can be a highly efficient solution for medical image analysis in resource-constrained clinical environments. The source code is available at https://github.com/yykzjh/PMFSNet.
The analysis of glandular morphology within colon histopathology images is an important step in determining the grade of colon cancer. Despite the importance of this task, manual segmentation is laborious, time-consuming, and can suffer from subjectivity among pathologists. The rise of computational pathology has led to the development of automated methods for gland segmentation that aim to overcome the challenges of manual segmentation. However, this task is non-trivial due to the large variability in glandular appearance and the difficulty of differentiating between certain glandular and non-glandular histological structures. Furthermore, a measure of uncertainty is essential for diagnostic decision making. To address these challenges, we propose a fully convolutional neural network that counters the loss of information caused by max-pooling by re-introducing the original image at multiple points within the network. We also use atrous spatial pyramid pooling with varying dilation rates to preserve resolution and aggregate multi-level information. To incorporate uncertainty, we apply random transformations at test time, producing an enhanced segmentation result along with an uncertainty map that highlights areas of ambiguity; we show that this map can be used to define a metric for disregarding predictions with high uncertainty. The proposed network achieves state-of-the-art performance on the GlaS challenge dataset and on a second independent colorectal adenocarcinoma dataset. In addition, we perform gland instance segmentation on whole-slide images from two further datasets to highlight the generalisability of our method. As an extension, we introduce MILD-Net+ for simultaneous gland and lumen segmentation.
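Test-time augmentation of the kind described can be sketched compactly. In the sketch below, flips stand in for the paper's random transformations, and the variance of the de-augmented predictions serves as the uncertainty map; `model` is assumed to be a trained binary segmentation network.

```python
import torch

def tta_predict(model, image):
    """Average predictions over flip augmentations and return their
    variance as an uncertainty map highlighting ambiguous regions."""
    flips = [(), (2,), (3,), (2, 3)]          # identity + flips over H and W
    preds = []
    with torch.no_grad():
        for dims in flips:
            x = torch.flip(image, dims) if dims else image
            p = torch.sigmoid(model(x))
            preds.append(torch.flip(p, dims) if dims else p)   # undo the flip
    preds = torch.stack(preds)                # (T, B, 1, H, W)
    return preds.mean(0), preds.var(0)        # prediction, uncertainty map
```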
The visual inspection of coronary artery stenosis is known to be significantly affected by variability due to the presence of other tissues, camera movements, and uneven illumination, so more accurate and intelligent coronary angiography diagnostic models are needed. In this study, 2980 medical images from 949 patients were collected and a novel deep learning-based coronary angiography (DLCAG) diagnosis system is proposed. First, we design a coronary classification module. Then, we introduce RetinaNet to balance positive and negative samples and improve recognition accuracy. Additionally, DLCAG adopts instance segmentation to segment vessel stenoses and quantify their degree. Our DLCAG is available at http://101.132.120.184:8077/ . To use the system, doctors simply log in and upload coronary angiography videos; a diagnostic report is then generated automatically.
Pixel-level medical image segmentation tasks are challenging due to factors such as variable target scales, complex geometric shapes, and low contrast. Although U-shaped hybrid networks have demonstrated strong performance, existing models often fail to effectively integrate the local features captured by convolutional neural networks (CNNs) with the global features provided by Transformers. Moreover, their self-attention mechanisms often lack adequate emphasis on critical spatial and channel information. To address these challenges, our goal was to develop a hybrid deep learning model that can effectively and robustly segment medical images, including but not limited to computed tomography (CT) and magnetic resonance (MR) images. We propose an effective hybrid U-shaped network, named the effective multi-scale context aggregation hybrid network (EMCAH-Net). It integrates an effective multi-scale context aggregation (EMCA) block in the backbone, along with a dual-attention augmented self-attention (DASA) block embedded in the skip connections and bottleneck layers. Tailored to the characteristics of medical images, the former block focuses on fine-grained local multi-scale feature encoding, whereas the latter enhances global representation learning by adaptively combining spatial and channel attention with self-attention. This approach not only effectively integrates local multi-scale and global features but also reinforces skip connections, thereby highlighting segmentation targets and precisely delineating boundaries. The code is publicly available at https://github.com/AloneIsland/EMCAH-Net. Compared to previous state-of-the-art (SOTA) methods, the EMCAH-Net achieves outstanding performance in medical image segmentation, with Dice similarity coefficient (DSC) scores of 84.73% (+2.85), 92.33% (+0.27), and 82.47% (+0.76) on the Synapse, automated cardiac diagnosis challenge (ACDC), and digital retinal images for vessel extraction (DRIVE) datasets, respectively. Additionally, it maintains computational efficiency in terms of model parameters and floating point operations (FLOPs). For instance, EMCAH-Net surpasses TransUNet on the Synapse dataset by 7.25% in DSC while requiring only 25% of the parameters and 71% of the FLOPs. EMCAH-Net has demonstrated significant advantages in segmenting multi-scale, small, and boundary-blurred features in medical images. Extensive experiments on abdominal multi-organ, cardiac, and retinal vessel medical segmentation tasks confirm that EMCAH-Net surpasses previous methods, including pure CNN, pure Transformer, and hybrid architectures.
Retinal vessels play a pivotal role as biomarkers in the detection of retinal diseases, including hypertensive retinopathy. The manual identification of these retinal vessels is both resource-intensive and time-consuming, and the fidelity of automated vessel segmentation depends directly on the quality of the fundus images. For sub-optimal image quality, deep learning-based methodologies emerge as the more effective approach for precise segmentation. We propose a heterogeneous neural network that combines the local semantic information extraction of convolutional neural networks with the long-range spatial feature mining of transformer structures. This cross-attention structure boosts the model's ability to handle vessel structures in retinal images. Experiments on four publicly available datasets demonstrate our model's superior vessel segmentation performance and its strong potential for hypertensive retinopathy quantification.
This study concentrates on the segmentation of intracranial aneurysms, a pivotal aspect of diagnosis and treatment planning. We aim to overcome the inherent instance imbalance and morphological variability by introducing a novel morphology and texture loss reweighting approach. Our method incorporates tailored weights within the loss function of deep neural networks; specifically designed to account for aneurysm size, shape, and texture, these weights strategically guide the model to focus on capturing discriminative information from imbalanced features. We conducted extensive experiments on the ADAM and RENJI TOF-MRA datasets to validate the proposed approach. The results demonstrate the effectiveness of the introduced methodology in improving aneurysm segmentation accuracy: by dynamically adapting to the variance in aneurysm features, our model shows promising outcomes for accurate diagnostic insights. Considering morphological and textural cues within the loss function proves instrumental in overcoming the challenge posed by instance imbalance. In conclusion, our study presents a targeted solution to the challenge of intracranial aneurysm segmentation. The proposed morphology and texture loss reweighting approach, with its tailored weights and dynamic adaptability, improves segmentation precision, and the promising experimental outcomes suggest the potential for accurate diagnostic insights and informed treatment strategies in this critical domain of medical imaging.
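One concrete form of instance-level loss reweighting in the spirit of this approach is sketched below: voxels of smaller aneurysm instances receive larger weights, inversely proportional to instance size. The paper's shape and texture terms are omitted; this shows only the size-based component, with an assumed mean-size normalization.

```python
import numpy as np
import torch
from scipy import ndimage

def instance_reweight_map(label_volume):
    """Per-voxel weight map: each connected component (instance) is
    weighted by mean_size / size, so small instances count more in the
    segmentation loss. label_volume is a boolean foreground mask."""
    labeled, n = ndimage.label(label_volume)
    weights = np.ones(label_volume.shape, dtype=np.float32)
    if n == 0:
        return torch.from_numpy(weights)
    sizes = ndimage.sum(label_volume, labeled, index=range(1, n + 1))
    mean_size = sizes.mean()
    for i, size in enumerate(sizes, start=1):
        weights[labeled == i] = mean_size / size
    return torch.from_numpy(weights)   # multiply voxel-wise into the loss
```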
To propose a novel neural network for tooth instance segmentation and recognition from cone-beam computed tomography (CBCT) voxel data. The proposed method comprises three convolutional neural network models built on Resnet modules following encoder-decoder and U-Net structures. The CBCT image is first down-sampled and a fixed-size region of interest (ROI) containing all teeth is determined. The ROI first passes through a two-branch encoder-decoder network that predicts a spatial embedding for each input voxel; a post-processing algorithm then clusters the predicted spatial location information to produce the tooth instance segmentation. Tooth position identification is realized by another U-Net model designed for a multi-class segmentation task; according to the network predictions, a post-processing algorithm assigns each tooth position by voting over the voxels of each tooth instance. At the original spatial resolution, a U-Net model for fine tooth segmentation is trained using the region corresponding to each tooth as input. Based on the instance segmentation and tooth position results, this model processes the corresponding positions in the high-resolution CBCT images to obtain high-resolution tooth segmentation results. In this study, CBCT data of 59 cases with simple crown prostheses and implants were collected and manually labeled as the database, and statistical indicators were evaluated for the algorithm's predictions. Instance-level metrics were used to assess segmentation and classification performance: the IDSC was 89.35% and the ADSC was 84.74%. After eliminating the data with prosthesis artifacts, a database of 43 samples was generated, on which the trained network performed better, with an IDSC of 90.34% and an ADSC of 87.88%. The framework achieved excellent performance on tooth segmentation and identification; voxels near intercuspation surfaces and fuzzy boundaries could be assigned to the correct instances. The results show that this method can not only successfully achieve 3D tooth instance segmentation but also accurately identify all tooth notation numbers, which has clinical practicability.
Accurate cardiac segmentation of multimodal images, e.g., magnetic resonance (MR) and computed tomography (CT) images, plays a pivotal role in auxiliary diagnoses, treatments, and postoperative assessments of cardiovascular diseases. However, training a well-behaved segmentation model for cross-modal cardiac image analysis is challenging due to the diverse appearances/distributions produced by different devices and acquisition conditions; for instance, a well-trained segmentation model based on a source domain of MR images often fails on CT images. In this work, a cross-modal cardiac segmentation scheme is proposed using a symmetric full convolutional neural network (SFCNN) with unsupervised multi-domain adaptation (UMDA) and a spatial neural attention (SNA) structure, termed UMDA-SNA-SFCNN, which has the merit of requiring no annotation on the test domain. Specifically, UMDA-SNA-SFCNN incorporates SNA into the classic adversarial domain adaptation network to highlight relevant regions while restraining irrelevant areas in the cross-modal images, so as to suppress negative transfer during unsupervised domain adaptation. In addition, multi-layer feature discriminators and a predictive segmentation-mask discriminator are established to connect the multi-layer features and segmentation mask of the backbone network, SFCNN, realizing fine-grained alignment of the unsupervised cross-modal feature domains. Extensive comparative experiments on the benchmark Multi-Modality Whole Heart Challenge dataset show that the proposed model is superior to the state-of-the-art cross-modal segmentation methods.
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods.
Cell segmentation is the foundation of a wide range of microscopy-based biological studies. Deep learning has revolutionized two-dimensional (2D) cell segmentation, enabling generalized solutions across cell types and imaging modalities. This has been driven by the ease of scaling up image acquisition, annotation and computation. However, three-dimensional (3D) cell segmentation, requiring dense annotation of 2D slices, still poses substantial challenges. Manual labeling of 3D cells to train broadly applicable segmentation models is prohibitive; even in high-contrast images, annotation is ambiguous and time-consuming. Here we develop a theory and toolbox, u-Segment3D, for 2D-to-3D segmentation, compatible with any 2D method generating pixel-based instance cell masks. u-Segment3D translates and enhances 2D instance segmentations to a 3D consensus instance segmentation without training data, as demonstrated on 11 real-life datasets, comprising >70,000 cells, spanning single cells, cell aggregates and tissue. Moreover, u-Segment3D is competitive with native 3D segmentation, even exceeding it when cells are crowded and have complex morphologies.
The nucleus plays a crucial role in medical diagnosis, and accurate nucleus segmentation is essential for disease assessment. However, existing methods have limitations in handling the diversity of nuclei and differences in staining conditions, restricting their practical application. A novel deformable multi-level feature network (DMFNet) is proposed for nucleus segmentation. The network is based on a convolutional neural network and divides feature processing and mask generation into two levels. At the feature level, deformable convolution is used to enhance feature extraction, and multi-scale features are integrated through a balanced feature pyramid. At the mask level, a one-stage framework is adopted to directly perform instance segmentation based on location. Experimental results on the MoNuSeg 2018 dataset show that the mean average precision (mAP) and mean average recall (mAR) of DMFNet reach 37.8% and 47.4%, respectively, outperforming many current advanced methods, and ablation experiments verify the effectiveness of each module. DMFNet provides an effective solution for nucleus segmentation and has important application value in medical image analysis.
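Deformable convolution of the kind used at the feature level is available off the shelf in torchvision. A minimal sketch, in which a plain convolution predicts per-position sampling offsets (the offset-prediction design is a common convention, not necessarily DMFNet's exact layout):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Deformable convolution for irregularly shaped nuclei: a plain conv
    predicts per-position sampling offsets, and DeformConv2d samples the
    input at those shifted locations."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dy, dx) per kernel element per spatial position.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

block = DeformBlock(64, 64)
y = block(torch.randn(1, 64, 32, 32))   # -> (1, 64, 32, 32)
```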
Pap smear is often employed as a screening test for diagnosing cervical pre-cancerous and cancerous lesions. Accurate identification of dysplastic changes among the cervical cells in a Pap smear image is thus essential for rapid diagnosis and prognosis. Manual pathological observation used in clinical practice requires exhaustive analysis of thousands of cell nuclei in a whole slide image to visualize dysplastic nuclear changes, which makes the process tedious and time-consuming. Automated nuclei segmentation and classification methods exist but struggle with issues such as nuclear intra-class variability and the separation of clustered nuclei. To address these challenges, we put forward an instance segmentation and classification framework for Pap smear images built on a Unet architecture, adding residual blocks, densely connected blocks, and a fully convolutional layer as a bottleneck between the encoder and decoder blocks. The convolutional layers of the standard Unet are replaced by densely connected blocks to ensure feature reusability, while the residual blocks help the network converge more rapidly. The framework provides simultaneous nuclei instance segmentation and also predicts whether each nucleus belongs to the normal or abnormal class. It works by assigning pixel-wise labels to individual nuclei in a whole slide image, which enables nuclei of the same or different classes to be identified as distinct instances. A joint loss function overcomes cell-level issues in clustered nuclei separation. To increase the robustness of the overall framework, the proposed model is preceded by a stacked auto-encoder based shape representation learning model. The proposed model outperforms two state-of-the-art deep learning models, Unet and Mask_RCNN, with an average Zijdenbos similarity index of 97% for segmentation and a binary classification accuracy of 98.8%. Experiments on hospital-based datasets using liquid-based cytology and conventional Pap smear methods, along with the benchmark Herlev dataset, proved the superiority of the proposed method over the Unet and Mask_RCNN models in terms of the evaluation metrics under consideration.
Accurate image segmentation is the key to quantitative analysis and recognition of pathological tissues in medical imaging technology, which can provide important technical support for medical diagnosis and treatment. However, the task of lesion segmentation is particularly challenging due to the difficulty in identifying edges, the complexity of different tissues, and the variability in their shapes. To address these challenges, we propose a dual-channel compression mapping network (DCM-Net) with fused attention mechanism for medical image segmentation. Firstly, a dual-channel compression mapping module is added to U-Net's standard convolution blocks to capture inter-channel information. Secondly, we replace the traditional skip path with a fusion attention mechanism that can better present context information in high-level features. Finally, the combination of squeeze-and-excitation module and residual connection in the decoder part can improve the adaptive ability of the network. Through extensive experiments on various medical image datasets, DCM-Net has demonstrated superior performance compared to other models. For instance, on the ISIC database, our network achieved an Accuracy of 91.42%, True Positive Rate (TPR) of 88.93%, Dice of 86.09%, and Jaccard of 76.02%. Additionally, on the pituitary adenoma dataset from Quzhou People's Hospital, DCM-Net reached an Accuracy of 97.07%, TPR of 93.09%, Dice of 92.29%, and Jaccard of 87.73%. These results demonstrate the effectiveness of DCM-Net in providing accurate and reliable segmentation, and it shows valuable potential in the field of medical imaging technology.
Medical image segmentation is a vital yet difficult job because of the multimodality of the acquired images, and it is difficult to locate the affected area before the disease spreads. This research makes use of several machine learning tools, including an artificial neural network and a random forest classifier, to increase the reliability of pulmonary nodule classification. Anisotropic diffusion filtering is initially used to remove noise from the image. After that, a modified random walk method is used to obtain the region of interest inside the lung parenchyma. Finally, features describing the consistency of the image segments are extracted using texture-based feature extraction for pulmonary nodules. The final stage is to identify and classify the pulmonary nodules using a classifier algorithm. The studies employ cross-validation to demonstrate the validity of the diagnostic framework. The proposed method is tested using CT scan data provided by the Lung Image Database Consortium. The random forest classifier achieved a 99.6 percent accuracy rate for detecting lung cancer, compared with an artificial neural network's 94.8 percent. Accordingly, current research is primarily concerned with identifying lung nodules and classifying them as benign or malignant. Machine learning and image processing approaches hold enormous diagnostic potential for the categorization of lung cancer.
Self-supervised masked image modeling (MIM) methods have shown promising performance in analyzing natural images. However, directly applying such methods to medical image segmentation tasks still cannot achieve satisfactory results. The challenges arise from the facts that (i) medical images are inherently more complex than natural images, and the subjects in medical images often exhibit more distinct contour features; and (ii) the conventional high and fixed masking ratio in MIM is likely to mask the background, limiting the scope of learnable information. To address these problems, we propose a new self-supervised medical image segmentation framework, called Adjustable Masking Lesion Patches (AMLP), which employs a Masked Patch Selection (MPS) strategy to identify patches with high probabilities of containing lesions, helping the model achieve precise lesion reconstruction. To improve the categorization of patches in MPS, we further introduce a Relative Reconstruction Loss (RRL) to better learn hard-to-reconstruct lesion patches. A Category Consistency Loss (CCL) is then proposed to refine patch categorization based on reconstruction difficulty, enhancing the difference between lesions and backgrounds. Moreover, an Adjustable Masking Ratio (AMR) strategy is proposed to gradually increase the masking ratio over training to expand the scope of learnable mutual information. Extensive experiments on two medical segmentation datasets demonstrate the superior performance of the proposed AMLP w.r.t. state-of-the-art self-supervised methods; the results prove that AMLP effectively addresses the challenges of applying masked modeling to medical images and captures accurate lesion details that are crucial for segmentation tasks.
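To make the AMR idea above concrete, here is a minimal sketch of a masking-ratio schedule and random patch masking in Python. The linear schedule and the start/end ratios are our assumptions for illustration, not the values used in AMLP.

```python
import numpy as np

def masking_ratio(step: int, total_steps: int,
                  start: float = 0.3, end: float = 0.75) -> float:
    """Linearly increase the masking ratio over training (assumed schedule)."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)

def mask_patches(n_patches: int, ratio: float, rng=np.random) -> np.ndarray:
    """Randomly select a `ratio` fraction of patches to mask; returns a boolean mask."""
    n_mask = int(round(n_patches * ratio))
    idx = rng.permutation(n_patches)[:n_mask]
    mask = np.zeros(n_patches, dtype=bool)
    mask[idx] = True
    return mask
```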
An image registration-based self-supervised Su-Net for carotid plaque ultrasound image segmentation.
Total Plaque Area (TPA) measurement is critical for early diagnosis and intervention of carotid atherosclerosis in individuals at high risk for stroke. The delineation of carotid plaques is necessary for TPA measurement, and deep learning methods can automatically segment the plaque and measure TPA from carotid ultrasound images. A large number of labeled images is essential for training a good deep learning model, but it is very difficult to collect such large labeled datasets for carotid image segmentation in clinical practice. Self-supervised learning can provide a possible solution to improve deep-learning models on small labeled training datasets by designing a pretext task to pre-train the models without using the segmentation masks. However, existing self-supervised learning methods do not consider the feature representations of object contours. In this paper, we propose an image registration-based self-supervised learning method and a stacked U-Net (SSL-SU-Net) for carotid plaque ultrasound image segmentation, which can better exploit the semantic features of carotid plaque contours in self-supervised task training. Our network was trained on different numbers of labeled images (n = 10, 33, 50 and 100 subjects) and tested on 44 subjects from the SPARC dataset (n = 144, London, Canada). The network trained on the entire SPARC dataset was then directly applied to an independent dataset collected in Zhongnan hospital (n = 497, Wuhan, China). For the 44 subjects tested on the SPARC dataset, our method yielded a DSC of 80.25-89.18% and produced TPA measurements that were strongly correlated with manual segmentation (r = 0.965-0.995, p < 0.0001). For the Zhongnan dataset, the DSC was 90.3% and algorithm TPAs were strongly correlated with manual TPAs (r = 0.985, p < 0.0001). The results demonstrate that our proposed method yielded excellent performance and good generalization ability when trained on a small labeled dataset, facilitating the use of deep learning in carotid ultrasound image analysis and clinical practice. The code of our algorithm is available at https://github.com/a610lab/Registration-SSL.
Previous 3D encoder-decoder segmentation architectures struggled with fine-grained feature decomposition, resulting in unclear feature hierarchies when fused across layers. Furthermore, the blurred nature of contour boundaries in medical imaging limits the focus on high-frequency contour features. To address these challenges, we propose a Multi-oriented Hierarchical Extraction and Dual-frequency Decoupling Network (HEDN), which consists of three modules: Encoder-Decoder Module (E-DM), Multi-oriented Hierarchical Extraction Module (Multi-HEM), and Dual-frequency Decoupling Module (Dual-DM). The E-DM performs the basic encoding and decoding tasks, while Multi-HEM decomposes and fuses spatial and slice-level features in 3D, enriching the feature hierarchy by weighting them through 3D fusion. Dual-DM separates high-frequency features from the reconstructed network using self-supervision. Finally, the self-supervised high-frequency features separated by Dual-DM are inserted into the process following Multi-HEM, enhancing interactions and complementarities between contour features and hierarchical features, thereby mutually reinforcing both aspects. On the Synapse dataset, HEDN outperforms existing methods, boosting Dice Similarity Score (DSC) by 1.38% and decreasing 95% Hausdorff Distance (HD95) by 1.03 mm. Likewise, on the Automatic Cardiac Diagnosis Challenge (ACDC) dataset, HEDN achieves 0.5% performance gains across all categories.
Auto-segmentation promises greater speed and lower inter-reader variability than manual segmentation in radiation oncology clinical practice. This study aims to implement and evaluate the accuracy of the auto-segmentation algorithm "Masked Image modeling using vision Transformers (SMIT)" for neck nodal metastases on longitudinal T2-weighted magnetic resonance imaging (MRI). This prospective clinical trial study included 123 patients with human papillomavirus-positive (HPV+) oropharyngeal squamous cell carcinoma (OPSCC) who received concurrent chemoradiotherapy. No significant difference was observed between manually and SMIT-delineated tumor volumes at pre-Tx (8.68 ± 7.15 vs 8.38 ± 7.01 cm³). The SMIT algorithm provides sufficient segmentation accuracy for oncological applications in HPV+ OPSCC. This is the first evaluation of auto-segmentation with SMIT using longitudinal T2-weighted MRI.
Pancreatic cancer, characterized by its notable prevalence and mortality rates, demands accurate lesion delineation for effective diagnosis and therapeutic interventions. The generalizability of extant methods is frequently compromised by the pronounced variability in imaging and the heterogeneous characteristics of pancreatic lesions, which may mimic normal tissues and exhibit significant inter-patient variability. Thus, we propose a generalization framework that synergizes pixel-level classification and regression tasks to accurately delineate lesions and improve model stability. The framework not only seeks to align segmentation contours with actual lesions but also uses regression to elucidate spatial relationships between diseased and normal tissues, thereby improving tumor localization and morphological characterization. Enhanced by the reciprocal transformation of task outputs, our approach integrates additional regression supervision within the segmentation context, bolstering the model's generalization ability from a dual-task perspective. In addition, dual self-supervised learning in the feature and output spaces augments the model's representational capability and stability across different imaging views. Experiments on 594 samples drawn from three datasets with significant imaging differences demonstrate that our generalized pancreas segmentation achieves results comparable to mainstream in-domain validation performance (Dice: 84.07%). More importantly, it improves the results on the highly challenging cross-lesion generalized pancreatic cancer segmentation task by 9.51%. Our model thus constitutes a resilient and efficient foundational technological support for pancreatic disease management and wider medical applications. The codes will be released at https://github.com/SJTUBME-QianLab/Dual-Task-Seg.
Accurate structure contouring on computed tomography (CT) is critical for prostate cancer radiotherapy, but it remains labour intensive and prone to interobserver variability, particularly for small, low-contrast organs such as the prostate, seminal vesicles (SV) and penile bulb (PB). Deep-learning models can automate this task. However, they typically require large, fully labelled datasets that are often unavailable in clinical settings. This study evaluated whether self-supervised (label-free) slice-prediction pretraining could enhance segmentation performance, especially in scenarios with limited annotated data. We used 322 pelvic CT volumes (215 from UMMC, 107 from TCIA), split 80:20 into training and testing sets (258 training, 64 testing patients). A novel lightweight 2D U-Net encoder was first pretrained on unlabelled data using a slice-prediction task across axial, sagittal, and coronal planes. The pretrained model was then fine-tuned for multi-class segmentation using either the full dataset or a reduced subset of 60 labelled patients. Baselines trained from scratch with 1-channel or 3-channel input were included for comparison. Segmentation accuracy was assessed using mean distance agreement (MDA) and Dice similarity coefficient (DSC). Paired t-tests with Bonferroni correction were applied to assess statistical significance. Models with self-supervised pretraining achieved consistently lower MDA across all major pelvic structures. Notable improvements included reductions in bladder MDA from 0.600 mm to 0.547 mm, femoral heads from 1.370 mm to 0.994 mm, PB from 1.470 mm to 1.283 mm, rectum from 0.792 mm to 0.669 mm, prostate from 1.281 mm to 1.183 mm, and SV from 1.175 mm to 0.893 mm. Self-supervised pretraining via slice prediction enables anatomically informed feature learning and improves segmentation robustness under limited data conditions. This strategy enhances accuracy without reliance on manual labels during pretraining and is compatible with computationally lightweight architectures, making it well-suited to resource-constrained clinical environments.
Current medical image segmentation relies on the region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de-facto standard. While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement. Clinicians often lack clear benchmarks to gauge the "goodness" of segmentation results based on these metrics. Recognizing the clinical relevance of volumetry, we utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks. Our work integrates theoretical analysis and empirical validation across diverse datasets. We delve into the often-ambiguous relationship between segmentation quality (measured by Dice) and volumetric accuracy in clinical practice. Our findings highlight the critical role of incorporating volumetric prediction accuracy into segmentation evaluation. This approach empowers clinicians with a more nuanced understanding of segmentation performance, ultimately improving the interpretation and utility of these metrics in real-world healthcare settings.
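As a concrete reference for the vpe metric discussed above, here is a minimal sketch in Python. Treating vpe as a signed relative error and scaling voxel counts by a physical voxel volume are our conventions for illustration.

```python
import numpy as np

def volume_prediction_error(pred: np.ndarray, gt: np.ndarray,
                            voxel_volume_mm3: float = 1.0) -> float:
    """Relative volume prediction error between a predicted and a ground-truth
    binary mask: (V_pred - V_gt) / V_gt. Sign convention is an assumption."""
    v_pred = pred.astype(bool).sum() * voxel_volume_mm3
    v_gt = gt.astype(bool).sum() * voxel_volume_mm3
    return (v_pred - v_gt) / v_gt
```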
Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging -- including the first large-scale comparative study of generative models for medical image translation -- and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
The patient with ischemic stroke can benefit most from the earliest possible definitive diagnosis. While high-quality medical resources are scarce across the globe, an automated diagnostic tool is expected to analyze magnetic resonance (MR) images and provide a reference for clinical diagnosis. In this paper, we propose a deep learning method to automatically segment ischemic stroke lesions from multi-modal MR images. By using atrous convolution and a global convolution network, our proposed residual-structured fully convolutional network (Res-FCN) is able to capture features from large receptive fields. The network architecture is validated on a large dataset of 212 clinically acquired multi-modal MR images and achieves a mean dice coefficient of 0.645 with a mean number of false-negative lesions of 1.515. The false-negative rate approaches that of a typical human reader, making the method promising for real clinical application.
With large-scale well-labeled datasets, deep learning has shown significant success in medical image segmentation. However, it is challenging to acquire abundant annotations in clinical practice due to extensive expertise requirements and costly labeling efforts. Recently, contrastive learning has shown a strong capacity for visual representation learning on unlabeled data, achieving impressive performance rivaling supervised learning in many domains. In this work, we propose a novel multi-scale multi-view global-local contrastive learning (MMGL) framework to thoroughly explore global and local features from different scales and views for robust contrastive learning performance, thereby improving segmentation performance with limited annotations. Extensive experiments on the MM-WHS dataset demonstrate the effectiveness of MMGL framework on semi-supervised cardiac image segmentation, outperforming the state-of-the-art contrastive learning methods by a large margin.
One of the most common tasks in medical imaging is semantic segmentation. Achieving this segmentation automatically has been an active area of research, but the task has been proven very challenging due to the large variation of anatomy across different patients. However, recent advances in deep learning have made it possible to significantly improve the performance of image recognition and semantic segmentation methods in the field of computer vision. Due to the data driven approaches of hierarchical feature learning in deep learning frameworks, these advances can be translated to medical images without much difficulty. Several variations of deep convolutional neural networks have been successfully applied to medical images. Especially fully convolutional architectures have been proven efficient for segmentation of 3D medical images. In this article, we describe how to build a 3D fully convolutional network (FCN) that can process 3D images in order to produce automatic semantic segmentations. The model is trained and evaluated on a clinical computed tomography (CT) dataset and shows state-of-the-art performance in multi-organ segmentation.
The recent advances in deep learning (DL) have been accelerated by access to large-scale data and compute. These large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions. These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale, particularly in the Global South. In this work, we take a comprehensive look at the landscape of existing DL models for medical image analysis tasks and demonstrate their usefulness in settings where resources are limited. To account for the resource consumption of DL models, we introduce a novel measure to estimate the performance per resource unit, which we call the PePR score. Using a diverse family of 131 unique DL architectures (spanning 1M to 130M trainable parameters) and three medical image datasets, we capture trends about the performance-resource trade-offs. In applications like medical image analysis, we argue that small-scale, specialized models are better than striving for large-scale models. Furthermore, we show that using existing pretrained models that are fine-tuned on new data can significantly reduce the computational resources and data required compared to training models from scratch. We hope this work will encourage the community to focus on improving AI equity by developing methods and models with smaller resource footprints.
For acute ischemic stroke (AIS) patients with large vessel occlusions, clinicians must decide if the benefit of mechanical thrombectomy (MTB) outweighs the risks and potential complications following an invasive procedure. Pre-treatment computed tomography (CT) and angiography (CTA) are widely used to characterize occlusions in the brain vasculature. If a patient is deemed eligible, a modified treatment in cerebral ischemia (mTICI) score will be used to grade how well blood flow is reestablished throughout and following the MTB procedure. An estimation of the likelihood of successful recanalization can support treatment decision-making. In this study, we proposed a fully automated prediction of a patient's recanalization score using pre-treatment CT and CTA imaging. We designed a spatial cross attention network (SCANet) that utilizes vision transformers to localize to pertinent slices and brain regions. Our top model achieved an average cross-validated ROC-AUC of 77.33 ± 3.9%. This is a promising result that supports future applications of deep learning on CT and CTA for the identification of eligible AIS patients for MTB.
This paper presents a novel unsupervised segmentation method for 3D medical images. Convolutional neural networks (CNNs) have brought significant advances in image segmentation. However, most of the recent methods rely on supervised learning, which requires large amounts of manually annotated data. Thus, it is challenging for these methods to cope with the growing amount of medical images. This paper proposes a unified approach to unsupervised deep representation learning and clustering for segmentation. Our proposed method consists of two phases. In the first phase, we learn deep feature representations of training patches from a target image using joint unsupervised learning (JULE) that alternately clusters representations generated by a CNN and updates the CNN parameters using cluster labels as supervisory signals. We extend JULE to 3D medical images by utilizing 3D convolutions throughout the CNN architecture. In the second phase, we apply k-means to the deep representations from the trained CNN and then project cluster labels to the target image in order to obtain the fully segmented image. We evaluated our methods on three images of lung cancer specimens scanned with micro-computed tomography (micro-CT). The automatic segmentation of pathological regions in micro-CT could further contribute to the pathological examination process. Hence, we aim to automatically divide each image into the regions of invasive carcinoma, noninvasive carcinoma, and normal tissue. Our experiments show the potential abilities of unsupervised deep representation learning for medical image segmentation.
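A minimal sketch of the second phase described above, assuming the deep patch representations have already been extracted by the trained CNN. The nearest-center label projection shown here is the simplest possible variant, and all names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_by_clustering(features: np.ndarray, patch_centers: np.ndarray,
                          image_shape: tuple, n_clusters: int = 3) -> np.ndarray:
    """features: (N, D) deep representations of N patches from the trained CNN;
    patch_centers: (N, 3) voxel coordinates of each patch center.
    Clusters the representations with k-means and projects cluster labels
    back onto the target image (label 0 is left as 'unassigned')."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    seg = np.zeros(image_shape, dtype=np.int32)
    for (z, y, x), lab in zip(patch_centers.astype(int), labels):
        seg[z, y, x] = lab + 1  # simplest projection: label each patch center
    return seg
```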
Deep learning (DL) models for disease classification or segmentation from medical images are increasingly trained using transfer learning (TL) from unrelated natural-world images. However, the shortcomings and utility of TL for specialized tasks in the medical imaging domain remain unknown and rest on the assumption that more training data will improve performance. We report detailed comparisons and rigorous statistical analysis of widely used DL architectures for binary segmentation after TL with ImageNet initialization (TII models) versus supervised learning with only medical images (LMI models), on macroscopic optical skin cancer, microscopic prostate core biopsy and computed tomography (CT) DICOM images. Through visual inspection of TII and LMI model outputs and their Grad-CAM counterparts, our results identify several counterintuitive scenarios where automated segmentation of one tumor by both models, or the use of individual segmentation output masks in various combinations from individual models, leads to a 10% increase in performance. We also report sophisticated ensemble DL strategies for achieving clinical-grade medical image segmentation and model explanations under low-data regimes. For example, estimating performance, explanations and replicability of LMI and TII models as described here can be used in situations where sparsity promotes better learning. A free GitHub repository of TII and LMI models, code and more than 10,000 medical images and their Grad-CAM outputs from this study can be used as starting points for advanced computational medicine and DL research for biomedical discovery and applications.
Deep learning has successfully been leveraged for medical image segmentation. It employs convolutional neural networks (CNN) to learn distinctive image features from a defined pixel-wise objective function. However, this approach can lead to weak interdependence among output pixels, producing incomplete and unrealistic segmentation results. In this paper, we present a fully automatic deep learning method for robust medical image segmentation by formulating the segmentation problem as a recurrent framework using two systems. The first is a forward system of an encoder-decoder CNN that predicts the segmentation result from the input image. The predicted probabilistic output of the forward system is then encoded by a fully convolutional network (FCN)-based context feedback system. The encoded feature space of the FCN is then integrated back into the forward system's feed-forward learning process. Using the FCN-based context feedback loop allows the forward system to learn and extract more high-level image features and fix previous mistakes, thereby improving prediction accuracy over time. Experimental results on four different clinical datasets demonstrate our method's potential application for single- and multi-structure medical image segmentation, outperforming state-of-the-art methods. With the feedback loop, deep learning methods can now produce results that are both anatomically plausible and robust to low-contrast images. Therefore, formulating image segmentation as a recurrent framework of two interconnected networks via a context feedback loop can be a potential method for robust and efficient medical image analysis.
Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advances in foundation Vision-Language Models (VLMs) trained on natural image-text pairs, several studies have proposed adapting them into Vision-Language Segmentation Models (VLSMs) that allow using language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data. Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint representation of vision-language in segmentation problems remains underexplored. This study introduces the first systematic study on transferring VLSMs to 2D medical images, using 11 carefully curated datasets encompassing diverse modalities, along with insightful language prompts and experiments. Our findings demonstrate that although VLSMs show competitive performance compared to image-only models for segmentation after finetuning on limited medical image datasets, not all VLSMs utilize the additional information from language prompts, with image features playing a dominant role. While VLSMs exhibit enhanced performance in handling pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts. The code and datasets are available at https://github.com/naamiinepal/medvlsm.
Medical image segmentation has been significantly advanced with the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, suffering from unstable feature space. In this work, we propose to complement discriminative segmentation methods with the knowledge of underlying data distribution from generative models. To that end, we propose a novel hybrid diffusion framework for medical image segmentation, termed HiDiff, which can synergize the strengths of existing discriminative segmentation models and new generative diffusion models. HiDiff comprises two key components: discriminative segmentor and diffusion refiner. First, we utilize any conventional trained segmentation models as discriminative segmentor, which can provide a segmentation mask prior for diffusion refiner. Second, we propose a novel binary Bernoulli diffusion model (BBDM) as the diffusion refiner, which can effectively, efficiently, and interactively refine the segmentation mask by modeling the underlying data distribution. Third, we train the segmentor and BBDM in an alternate-collaborative manner to mutually boost each other. Extensive experimental results on abdomen organ, brain tumor, polyps, and retinal vessels segmentation datasets, covering four widely-used modalities, demonstrate the superior performance of HiDiff over existing medical segmentation algorithms, including the state-of-the-art transformer- and diffusion-based ones. In addition, HiDiff excels at segmenting small objects and generalizing to new datasets. Source codes are made available at https://github.com/takimailto/HiDiff.
Continual Semantic Segmentation (CSS) requires learning new classes without forgetting previously acquired knowledge, addressing the fundamental challenge of catastrophic forgetting in dense prediction tasks. However, existing CSS methods typically employ single-stage encoder-decoder architectures where segmentation masks and class labels are tightly coupled, leading to interference between old and new class learning and suboptimal retention-plasticity balance. We introduce DecoupleCSS, a novel two-stage framework for CSS. By decoupling class-aware detection from class-agnostic segmentation, DecoupleCSS enables more effective continual learning, preserving past knowledge while learning new classes. The first stage leverages pre-trained text and image encoders, adapted using LoRA, to encode class-specific information and generate location-aware prompts. In the second stage, the Segment Anything Model (SAM) is employed to produce precise segmentation masks, ensuring that segmentation knowledge is shared across both new and previous classes. This approach improves the balance between retention and adaptability in CSS, achieving state-of-the-art performance across a variety of challenging tasks. Our code is publicly available at: https://github.com/euyis1019/Decoupling-Continual-Semantic-Segmentation.
Federated learning (FL) enables multiple client medical institutes to collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the generalization capabilities of conventional centralized DL as a "free lunch", its application in FL is largely underexplored. Notably, constrained by costly labeling, 3D medical segmentation generally relies on data augmentation. In this work, we aim to develop a vicinal feature-level data augmentation (VFDA) scheme to efficiently alleviate local feature shift and facilitate collaborative training for privacy-aware FL segmentation. We take both inner- and inter-institute divergence into consideration, without the need for cross-institute transfer of raw data or their mixup. Specifically, we exploit the batch-wise feature statistics (e.g., mean and standard deviation) in each institute to abstractly represent the discrepancy of data, and model each feature statistic probabilistically via a Gaussian prototype, with the mean corresponding to the original statistic and the variance quantifying the augmentation scope. From the vicinal risk minimization perspective, novel feature statistics can be drawn from the Gaussian distribution to fulfill augmentation. The variance is explicitly derived from the data bias in each individual institute and the underlying feature statistics characterized by all participating institutes. The added-on VFDA consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.
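A minimal PyTorch sketch of the feature-statistic augmentation idea behind VFDA: normalize a feature map, then re-style it with statistics drawn from Gaussians centered on the originals. The single scalar sigma used here stands in for the institute-derived variance that VFDA computes, which we do not model.

```python
import torch

def vicinal_stat_augment(feat: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """feat: (B, C, H, W) intermediate feature map.
    Draws novel per-channel mean/std from Gaussian prototypes centered on the
    batch-wise statistics; sigma is an assumed, fixed augmentation scope."""
    mu = feat.mean(dim=(2, 3), keepdim=True)         # batch-wise channel mean
    std = feat.std(dim=(2, 3), keepdim=True) + 1e-6  # batch-wise channel std
    normed = (feat - mu) / std
    new_mu = mu + sigma * torch.randn_like(mu)
    new_std = std * (1 + sigma * torch.randn_like(std))
    return normed * new_std + new_mu
```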
Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model intended to segment user-defined objects of interest in an interactive manner. While its performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. We report the following findings: (1) SAM's performance based on single prompts varies highly depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with less ambiguous prompts and poorer in various other scenarios such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple point prompts are provided iteratively, SAM's performance generally improves only slightly, while other methods improve to a level that surpasses SAM's point-based performance. We also provide several illustrations of SAM's performance on all tested datasets, iterative segmentation, and SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. SAM has the potential to make a significant impact in automated medical image segmentation, but appropriate care needs to be applied when using it.
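For reference, the IoU figures quoted above can be computed for binary masks with a few lines of NumPy; this is the standard definition, not code from the study.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks of equal shape."""
    p, g = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(p, g).sum()
    if union == 0:                    # both masks empty: define IoU as 1
        return 1.0
    return float(np.logical_and(p, g).sum()) / float(union)
```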
The ability to automatically learn task-specific feature representations has led to the huge success of deep learning methods. When large training datasets are scarce, such as in medical imaging problems, transfer learning has been very effective. In this paper, we systematically investigate the process of transferring a Convolutional Neural Network, trained on ImageNet images to perform image classification, to the kidney detection problem in ultrasound images. We study how the detection performance depends on the extent of transfer. We show that a transferred and tuned CNN can outperform a state-of-the-art feature-engineered pipeline, and a hybridization of these two techniques achieves 20% higher performance. We also investigate the evolution of intermediate response images from our network. Finally, we compare these responses to state-of-the-art image processing filters in order to gain greater insight into how transfer learning is able to effectively manage widely varying imaging regimes.
Deep learning has shown promising results in medical image analysis, however, the lack of very large annotated datasets confines its full potential. Although transfer learning with ImageNet pre-trained classification models can alleviate the problem, constrained image sizes and model complexities can lead to unnecessary increase in computational cost and decrease in performance. As many common morphological features are usually shared by different classification tasks of an organ, it is greatly beneficial if we can extract such features to improve classification with limited samples. Therefore, inspired by the idea of curriculum learning, we propose a strategy for building medical image classifiers using features from segmentation networks. By using a segmentation network pre-trained on similar data as the classification task, the machine can first learn the simpler shape and structural concepts before tackling the actual classification problem which usually involves more complicated concepts. Using our proposed framework on a 3D three-class brain tumor type classification problem, we achieved 82% accuracy on 191 testing samples with 91 training samples. When applying to a 2D nine-class cardiac semantic level classification problem, we achieved 86% accuracy on 263 testing samples with 108 training samples. Comparisons with ImageNet pre-trained classifiers and classifiers trained from scratch are presented.
Automated segmentation of kidneys and kidney tumors is an important step in quantifying a tumor's morphometrical details to monitor disease progression and accurately inform decisions regarding kidney tumor treatment. Manual delineation techniques are often tedious, error-prone and require expert knowledge for creating unambiguous representations of kidneys and kidney tumors. In this work, we propose an end-to-end boundary-aware fully Convolutional Neural Network (CNN) for reliable kidney and kidney tumor semantic segmentation from arterial-phase abdominal 3D CT scans. We propose a segmentation network consisting of an encoder-decoder architecture that specifically accounts for organ and tumor edge information by devising a dedicated boundary branch supervised by edge-aware loss terms. We evaluated our model on the 2019 MICCAI KiTS Kidney Tumor Segmentation Challenge dataset, and our method achieved dice scores of 0.9742 and 0.8103 for kidney and tumor respectively, and an overall composite dice score of 0.8923.
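A minimal sketch of an edge-aware auxiliary loss in the spirit of the boundary branch described above: derive a boundary map from the ground-truth mask via a morphological gradient and supervise a boundary head with binary cross-entropy. The boundary width and loss weighting are our assumptions.

```python
import torch
import torch.nn.functional as F

def boundary_map(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """mask: (B, 1, H, W) float binary mask; morphological gradient
    (dilation minus erosion) implemented with max-pooling."""
    dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)
    return (dilated - eroded).clamp(0, 1)

def edge_aware_loss(seg_logits, boundary_logits, gt_mask, w_edge: float = 0.5):
    """Total loss = segmentation BCE + weighted boundary BCE (assumed weighting)."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, gt_mask)
    edge_loss = F.binary_cross_entropy_with_logits(
        boundary_logits, boundary_map(gt_mask))
    return seg_loss + w_edge * edge_loss
```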
In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.
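The nested, dense skip pathways can be summarized compactly: each node X[i][j] fuses all earlier nodes at the same resolution with the upsampled output of the node one level below. A minimal PyTorch sketch under assumed channel sizes and 2x per-level downsampling follows; it illustrates the connectivity, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin: int, cout: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class NestedSkips(nn.Module):
    """Dense skip pathways of UNet++ given encoder features enc[i] = X[i][0]."""
    def __init__(self, chans=(32, 64, 128, 256)):
        super().__init__()
        self.depth = len(chans)
        self.blocks = nn.ModuleDict()
        for i in range(self.depth):
            for j in range(1, self.depth - i):
                # inputs: j same-resolution nodes + one upsampled node from below
                cin = chans[i] * j + chans[i + 1]
                self.blocks[f"{i},{j}"] = conv_block(cin, chans[i])

    def forward(self, enc):
        X = {(i, 0): f for i, f in enumerate(enc)}
        for j in range(1, self.depth):
            for i in range(self.depth - j):
                up = F.interpolate(X[(i + 1, j - 1)], scale_factor=2,
                                   mode="bilinear", align_corners=False)
                X[(i, j)] = self.blocks[f"{i},{j}"](
                    torch.cat([X[(i, k)] for k in range(j)] + [up], dim=1))
        return X[(0, self.depth - 1)]   # finest, most deeply nested node
```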
In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. In particular, deep neural networks based on U-shaped architectures and skip-connections have been widely applied to a variety of medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global and long-range semantic information interactions well due to the locality of the convolution operation. In this paper, we propose Swin-Unet, a Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use a hierarchical Swin Transformer with shifted windows as the encoder to extract context features, and a symmetric Swin Transformer-based decoder with a patch expanding layer is designed to perform up-sampling and restore the spatial resolution of the feature maps. With direct down-sampling and up-sampling of the inputs and outputs by 4x, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms methods using full convolution or a combination of transformer and convolution. The codes and trained models will be publicly available at https://github.com/HuCaoFighting/Swin-Unet.
Convolutional Neural Networks (CNNs) have been recently employed to solve problems from both the computer vision and medical image analysis fields. Despite their popularity, most approaches are only able to process 2D images while most medical data used in clinical practice consists of 3D volumes. In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network. Our CNN is trained end-to-end on MRI volumes depicting prostate, and learns to predict segmentation for the whole volume at once. We introduce a novel objective function, that we optimise during training, based on Dice coefficient. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels. To cope with the limited number of annotated volumes available for training, we augment the data applying random non-linear transformations and histogram matching. We show in our experimental evaluation that our approach achieves good performances on challenging test data while requiring only a fraction of the processing time needed by other previous methods.
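The Dice-based objective introduced here is easy to state in code. Below is a minimal sketch of a soft Dice loss with the squared-denominator form used by V-Net; the smoothing constant and per-batch reduction are common conventions rather than the paper's exact choices.

```python
import torch

def soft_dice_loss(probs: torch.Tensor, target: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """probs, target: (B, 1, D, H, W) with values in [0, 1].
    Returns 1 - mean soft Dice over the batch."""
    dims = tuple(range(1, probs.dim()))
    inter = (probs * target).sum(dims)
    denom = (probs ** 2).sum(dims) + (target ** 2).sum(dims)  # squared terms, as in V-Net
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()
```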
Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have shown prominence for the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations which can be utilized for semantic output prediction by the decoder. Despite their success, the locality of convolutional layers in FCNNs limits the capability of learning long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard.
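The sequence-to-sequence reformulation boils down to turning the volume into a token sequence for the transformer encoder. A minimal sketch of a 3D patch embedding follows, implemented with a strided 3D convolution; the patch size and embedding dimension are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Split a 3D volume into non-overlapping patches and project each to a token."""
    def __init__(self, in_ch: int = 1, embed_dim: int = 768, patch: int = 16):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, vol: torch.Tensor) -> torch.Tensor:
        """vol: (B, C, D, H, W) -> tokens: (B, N, embed_dim),
        with one token per (patch x patch x patch) sub-volume."""
        x = self.proj(vol)                   # (B, E, D/p, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # flatten spatial grid into a sequence
```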
Recently, a growing interest has been seen in deep learning-based semantic segmentation. UNet, a deep learning network with an encoder-decoder architecture, is widely used in medical image segmentation. Combining multi-scale features is one of the important factors for accurate segmentation. UNet++ was developed as a modified UNet with an architecture of nested and dense skip connections. However, it does not explore sufficient information from full scales, and there is still large room for improvement. In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervision. The full-scale skip connections incorporate low-level details with high-level semantics from feature maps at different scales, while the deep supervision learns hierarchical representations from the full-scale aggregated feature maps. The proposed method is especially beneficial for organs that appear at varying scales. In addition to accuracy improvements, the proposed UNet 3+ reduces the network parameters to improve computational efficiency. We further propose a hybrid loss function and devise a classification-guided module to enhance organ boundaries and reduce over-segmentation in non-organ images, yielding more accurate segmentation results. The effectiveness of the proposed method is demonstrated on two datasets. The code is available at: github.com/ZJUGiveLab/UNet-Version
Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence predictions have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers' self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate the Transformer into medical image analysis. In this study, we present the versatile framework of TransUNet that encapsulates Transformers' self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder's efficacy in modeling interactions among multiple abdominal organs and the decoder's strength in handling small targets like tumors. It excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, our TransUNet achieves a significant average Dice improvement of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, when compared to the highly competitive nn-UNet, and surpasses the top-1 solution in the BraTS2021 challenge. 2D/3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.
In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel at modeling long-range interactions but also maintain linear computational complexity. In this paper, leveraging state space models, we propose a U-shaped architecture for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed with fewer convolution layers to save calculation cost. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks, e.g. obtaining DSC scores of 89.03, 89.71 and 81.08 on the three datasets respectively. To the best of our knowledge, this is the first medical image segmentation model built on a pure SSM. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
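For intuition about why SSM-based blocks scale linearly with sequence length, here is a minimal NumPy sketch of the discrete state-space recurrence they build on, written as a plain (non-selective) scan; Mamba's selective, input-dependent parameterization is omitted.

```python
import numpy as np

def ssm_scan(x: np.ndarray, A_bar: np.ndarray,
             B_bar: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Discrete SSM: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t.
    x: (L, D_in); A_bar: (N, N); B_bar: (N, D_in); C: (D_out, N)."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_t in x:                 # one pass: linear in sequence length L
        h = A_bar @ h + B_bar @ x_t
        ys.append(C @ h)
    return np.stack(ys)           # (L, D_out)
```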
UNet and its latest extensions like TransUNet have been the leading medical image segmentation methods in recent years. However, these networks cannot be effectively adopted for rapid image segmentation in point-of-care applications as they are parameter-heavy, computationally complex and slow to use. To this end, we propose UNeXt, a Convolutional multilayer perceptron (MLP) based network for image segmentation. We design UNeXt in an effective way with an early convolutional stage and an MLP stage in the latent stage. We propose a tokenized MLP block where we efficiently tokenize and project the convolutional features and use MLPs to model the representation. To further boost performance, we propose shifting the channels of the inputs while feeding into the MLPs so as to focus on learning local dependencies. Using tokenized MLPs in latent space reduces the number of parameters and computational complexity while yielding a better representation to help segmentation. The network also consists of skip connections between various levels of encoder and decoder. We test UNeXt on multiple medical image segmentation datasets and show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x while also obtaining better segmentation performance than state-of-the-art medical image segmentation architectures. Code is available at https://github.com/jeya-maria-jose/UNeXt-pytorch
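A minimal sketch of the channel-shift idea behind the tokenized MLP block: shift channel groups along a spatial axis before the per-token MLP so that each token mixes a little local context. The group count, shift amount and layer sizes here are our assumptions, not UNeXt's exact configuration.

```python
import torch
import torch.nn as nn

class ShiftedMLP(nn.Module):
    """Shift channel groups along W, then apply a per-token MLP over channels."""
    def __init__(self, dim: int, hidden: int, shift: int = 1):
        super().__init__()
        self.shift = shift
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C, H, W)."""
        b, c, h, w = x.shape
        chunks = x.chunk(4, dim=1)                       # 4 channel groups (assumed)
        shifted = [torch.roll(t, (i - 2) * self.shift, dims=3)
                   for i, t in enumerate(chunks)]        # shifts of -2,-1,0,+1 * shift
        x = torch.cat(shifted, dim=1)
        tokens = x.flatten(2).transpose(1, 2)            # (B, HW, C) tokens
        tokens = self.fc2(self.act(self.fc1(tokens)))
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```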
Over the past decade, Deep Convolutional Neural Networks have been widely adopted for medical image segmentation and shown to achieve adequate performance. However, due to the inherent inductive biases present in convolutional architectures, they lack understanding of long-range dependencies in the image. Recently proposed Transformer-based architectures that leverage the self-attention mechanism encode long-range dependencies and learn representations that are highly expressive. This motivates us to explore Transformer-based solutions and study the feasibility of using Transformer-based network architectures for medical image segmentation tasks. The majority of existing Transformer-based network architectures proposed for vision applications require large-scale datasets to train properly. However, compared to the datasets for vision applications, the number of data samples in medical imaging is relatively low, making it difficult to efficiently train transformers for medical applications. To this end, we propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module. Furthermore, to train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves performance. Specifically, we operate on the whole image and on patches to learn global and local features, respectively. The proposed Medical Transformer (MedT) is evaluated on three different medical image segmentation datasets and is shown to achieve better performance than convolutional and other related transformer-based architectures. Code: https://github.com/jeya-maria-jose/Medical-Transformer
The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation due to its impressive capabilities in various segmentation tasks and its prompt-based interface. However, recent studies and individual experiments have shown that SAM underperforms in medical image segmentation due to the lack of medical-specific knowledge. This raises the question of how to enhance SAM's segmentation capability for medical images. We propose the Medical SAM Adapter (Med-SA), which is one of the first methods to integrate SAM into medical image segmentation. Med-SA uses a light yet effective adaptation technique instead of fine-tuning the SAM model, incorporating domain-specific medical knowledge into the segmentation model. We also propose Space-Depth Transpose (SD-Trans) to adapt 2D SAM to 3D medical images and Hyper-Prompting Adapter (HyP-Adpt) to achieve prompt-conditioned adaptation. Comprehensive evaluation experiments on 17 medical image segmentation tasks across various modalities demonstrate the superior performance of Med-SA while updating only 2% of the SAM parameters (13M). Our code is released at https://github.com/KidsWithTokens/Medical-SAM-Adapter.
Medical image segmentation - the prerequisite of numerous clinical needs - has been significantly prospered by recent advances in convolutional neural networks (CNNs). However, CNNs exhibit general limitations in modeling explicit long-range relations, and existing cures, resorting to building deep encoders along with aggressive downsampling operations, lead to redundant deepened networks and loss of localized details. Hence, the segmentation task awaits a better solution to improve the efficiency of modeling global contexts while maintaining a strong grasp of low-level details. In this paper, we propose a novel parallel-in-branch architecture, TransFuse, to address this challenge. TransFuse combines Transformers and CNNs in a parallel style, where both global dependency and low-level spatial details can be efficiently captured in a much shallower manner. Besides, a novel fusion technique - the BiFusion module - is created to efficiently fuse the multi-level features from both branches. Extensive experiments demonstrate that TransFuse achieves new state-of-the-art results on both 2D and 3D medical image sets including polyp, skin lesion, hip, and prostate segmentation, with a significant parameter decrease and inference speed improvement.
Medical image segmentation is an important step in medical image analysis. With the rapid development of convolutional neural networks in image processing, deep learning has been used for medical image segmentation, such as optic disc segmentation, blood vessel detection, lung segmentation, cell segmentation, and so on. Previously, U-Net based approaches have been proposed. However, the consecutive pooling and strided convolutional operations lead to the loss of some spatial information. In this paper, we propose a context encoder network (CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation. CE-Net mainly contains three major components: a feature encoder module, a context extractor, and a feature decoder module. We use the pretrained ResNet block as the fixed feature extractor. The context extractor module is formed by a newly proposed dense atrous convolution block and a residual multi-kernel pooling block. We applied the proposed CE-Net to different 2D medical image segmentation tasks. Comprehensive results show that the proposed method outperforms the original U-Net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation, cell contour segmentation, and retinal optical coherence tomography layer segmentation.
Learning from AI-generated annotations is well recognized as a key advance of deep learning techniques in medical image segmentation. In this direction, this paper investigates two questions: (1) how to accurately measure the loss value on AI-generated annotations that often contain errors, and (2) how to effectively update the model's parameters when the loss value is no longer a correct supervision signal for medical image segmentation. The main results are that (1) 'error-tolerant' loss functions exist and (2) 'cross-training', updating the model using data with a small loss under its 'twin' model, can tolerate errors in the loss function to some extent. From these results, we derived a robust training algorithm, called confidence regularized co-teaching, that helps deep models combat annotation errors in medical image segmentation. This algorithm simultaneously trains two 'twin' segmentation models and updates each model's parameters by cross-training with disagreement-confident data, i.e., data predicted differently by the two models, thereby being able to learn from data with annotation errors. Empirical evidence on a publicly available dataset shows that this new algorithm combats annotation errors better than existing methods for medical image segmentation, opening the opportunity to use AI-generated annotations to train segmentation models.
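A minimal PyTorch sketch of cross-training on disagreement-confident data: each twin model is updated on samples that the two models predict differently and that its twin scores as small-loss. The disagreement test, the selection fraction, and the omission of the confidence regularizer are simplifications of ours, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def cross_training_step(model_a, model_b, opt_a, opt_b,
                        x: torch.Tensor, y: torch.Tensor,
                        keep_frac: float = 0.5) -> None:
    """x: (B, C, H, W) images; y: (B, H, W) possibly noisy integer labels."""
    with torch.no_grad():
        pred_a = model_a(x).argmax(1)
        pred_b = model_b(x).argmax(1)
        disagree = (pred_a != pred_b).flatten(1).any(1)   # per-sample disagreement
        if not disagree.any():
            return
        xs, ys = x[disagree], y[disagree]
        # per-sample losses: each model ranks candidates for its twin
        la = F.cross_entropy(model_a(xs), ys, reduction="none").flatten(1).mean(1)
        lb = F.cross_entropy(model_b(xs), ys, reduction="none").flatten(1).mean(1)
        k = max(1, int(keep_frac * xs.size(0)))
        idx_for_b = la.topk(k, largest=False).indices      # A's small-loss picks
        idx_for_a = lb.topk(k, largest=False).indices      # B's small-loss picks
    opt_a.zero_grad()
    F.cross_entropy(model_a(xs[idx_for_a]), ys[idx_for_a]).backward()
    opt_a.step()
    opt_b.zero_grad()
    F.cross_entropy(model_b(xs[idx_for_b]), ys[idx_for_b]).backward()
    opt_b.step()
```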
U-Net has become a cornerstone of various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating Transformers or MLPs, these networks remain limited to linearly modeled patterns and suffer from deficient interpretability. To address these challenges, our intuition is inspired by the impressive results of Kolmogorov-Arnold Networks (KANs) in terms of accuracy and interpretability, which reshape neural network learning via stacks of non-linear learnable activation functions derived from the Kolmogorov-Arnold representation theorem. Specifically, in this paper, we explore the untapped potential of KANs for improving backbones for vision tasks. We investigate, modify, and re-design the established U-Net pipeline by integrating dedicated KAN layers on the tokenized intermediate representation, termed U-KAN. Rigorous medical image segmentation benchmarks verify the superiority of U-KAN, which attains higher accuracy even at lower computation cost. We further delve into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures.
In semi-supervised medical image segmentation, there is an empirical mismatch between the labeled and unlabeled data distributions. The knowledge learned from the labeled data may be largely discarded if labeled and unlabeled data are treated separately or in an inconsistent manner. We propose a straightforward method for alleviating this problem: copy-pasting labeled and unlabeled data bidirectionally within a simple Mean Teacher architecture. The method encourages unlabeled data to learn comprehensive common semantics from the labeled data in both inward and outward directions. More importantly, the consistent learning procedure for labeled and unlabeled data can largely reduce the empirical distribution gap. In detail, we copy-paste a random crop from a labeled image (foreground) onto an unlabeled image (background), and a crop from an unlabeled image (foreground) onto a labeled image (background). The two mixed images are fed into a Student network and supervised by mixed supervisory signals of pseudo-labels and ground truth. We show that this simple mechanism of copy-pasting bidirectionally between labeled and unlabeled data is good enough, and experiments show solid gains (e.g., over 21% Dice improvement on the ACDC dataset with 5% labeled data) compared with other state-of-the-art methods on various semi-supervised medical image segmentation datasets. Code is available at https://github.com/DeepMed-Lab-ECNU/BCP.
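The bidirectional copy-paste operation itself is easy to sketch: one random crop region is swapped in both directions between a labeled and an unlabeled image, and the supervisory masks are mixed with the same region so labels stay aligned. The sketch below assumes single-image tensors and a square crop; the paper's Mean Teacher pipeline and the loss weighting between pseudo-label and ground-truth regions are not shown.

```python
import torch

def bidirectional_copy_paste(labeled_img, labeled_mask, unlabeled_img, pseudo_mask, crop=64):
    """Sketch of bidirectional copy-paste: a random square crop from a labeled
    image is pasted onto an unlabeled image, and vice versa; the supervisory
    masks are mixed with the same region so labels stay aligned."""
    _, h, w = labeled_img.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    region = torch.zeros(h, w, dtype=torch.bool)
    region[top:top + crop, left:left + crop] = True
    # Labeled foreground onto unlabeled background (inward direction) ...
    mix_in = unlabeled_img.clone(); mix_in[:, region] = labeled_img[:, region]
    y_in = pseudo_mask.clone();     y_in[region] = labeled_mask[region]
    # ... and unlabeled foreground onto labeled background (outward direction).
    mix_out = labeled_img.clone(); mix_out[:, region] = unlabeled_img[:, region]
    y_out = labeled_mask.clone();  y_out[region] = pseudo_mask[region]
    return mix_in, y_in, mix_out, y_out

img_l, img_u = torch.rand(1, 128, 128), torch.rand(1, 128, 128)
m_l = torch.randint(0, 2, (128, 128)); m_u = torch.randint(0, 2, (128, 128))
mix_in, y_in, mix_out, y_out = bidirectional_copy_paste(img_l, m_l, img_u, m_u)
```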
The Transformer architecture has shown a remarkable ability to model global relationships. However, it poses a significant computational challenge when processing high-dimensional medical images, which hinders its development and widespread adoption for this task. Mamba, a State Space Model (SSM), recently emerged as a notable approach to long-range dependency modeling in sequential data, excelling in natural language processing with remarkable memory efficiency and computational speed. Inspired by this success, we introduce SegMamba, a novel 3D medical image Segmentation Mamba model, designed to effectively capture long-range dependencies within whole-volume features at every scale. In contrast to Transformer-based methods, SegMamba excels at whole-volume feature modeling from a state space model standpoint while maintaining superior processing speed, even with volume features at a resolution of 64×64×64. Comprehensive experiments on the BraTS2023 dataset demonstrate the effectiveness and efficiency of SegMamba. The code for SegMamba is available at: https://github.com/ge-xing/SegMamba
Background: Medical image segmentation is an important image processing step, and comparing images to evaluate the quality of segmentation is an essential part of measuring progress in this research area. Some of the challenges in evaluating medical segmentation are: metric selection, the use in the literature of multiple definitions for certain metrics, the inefficiency of metric calculation implementations leading to difficulties with large volumes, and the lack of support for fuzzy segmentation by existing metrics. Results: First, we present an overview of 20 evaluation metrics selected on the basis of a comprehensive literature review. For fuzzy segmentation, which gives the level of membership of each voxel in multiple classes, fuzzy definitions of all metrics are provided. We discuss metric properties to provide a guide for selecting evaluation metrics. Finally, we propose an efficient evaluation tool implementing the 20 selected metrics. The tool is optimized to perform efficiently in terms of speed and required memory, even when the image size is extremely large, as in the case of whole-body MRI or CT volume segmentation. An implementation of this tool is available as an open-source project. Conclusion: We propose an efficient evaluation tool for 3D medical image segmentation using 20 evaluation metrics and provide guidelines for selecting a subset of these metrics suitable for the data and the segmentation task.
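For fuzzy segmentations, overlap metrics are commonly generalized by replacing set intersection and union with voxel-wise minimum and maximum over membership maps; below is a sketch of fuzzy Dice and Jaccard under that convention (the tool's exact definitions for all 20 metrics may differ).

```python
import numpy as np

def fuzzy_dice(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """Fuzzy Dice for membership maps in [0, 1]: the intersection becomes a
    voxel-wise minimum, so the metric reduces to crisp Dice on binary masks."""
    inter = np.minimum(p, q).sum()
    return float(2.0 * inter / (p.sum() + q.sum() + eps))

def fuzzy_iou(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """Fuzzy Jaccard/IoU using min for intersection and max for union."""
    return float(np.minimum(p, q).sum() / (np.maximum(p, q).sum() + eps))

pred = np.random.rand(64, 64, 64)                     # e.g., one class's membership map
gt = (np.random.rand(64, 64, 64) > 0.5).astype(float) # crisp reference
print(fuzzy_dice(pred, gt), fuzzy_iou(pred, gt))
```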
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling long-range dependency due to their inductive biases of locality and weight sharing. Although the Transformer was born to address this issue, it suffers from extreme computational and spatial complexity when processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. Different from the vanilla Transformer, which treats all image positions equally, DeTrans pays attention only to a small set of key positions by introducing a deformable self-attention mechanism. The computational and spatial complexity of DeTrans is thus greatly reduced, making it possible to process the multi-scale and high-resolution feature maps that are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset, which covers 11 major human organs. The results indicate that CoTr leads to a substantial performance improvement over other CNN-based, Transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at https://github.com/YtongXie/CoTr
Accurate computer-aided polyp detection and segmentation during colonoscopy examinations can help endoscopists resect abnormal tissue and thereby decrease the chances of polyps growing into cancer. Towards developing a fully automated model for pixel-wise polyp segmentation, we propose ResUNet++, an improved ResUNet architecture for colonoscopic image segmentation. Our experimental evaluations show that the suggested architecture produces good segmentation results on publicly available datasets. Furthermore, ResUNet++ significantly outperforms U-Net and ResUNet, two key state-of-the-art deep learning architectures, achieving a Dice coefficient of 81.33% and a mean Intersection over Union (mIoU) of 79.27% on the Kvasir-SEG dataset, and a Dice coefficient of 79.55% and an mIoU of 79.62% on the CVC-612 dataset.
The release of nnU-Net marked a paradigm shift in 3D medical image segmentation, demonstrating that a properly configured U-Net architecture could still achieve state-of-the-art results. Despite this, the pursuit of novel architectures, and the respective claims of superior performance over the U-Net baseline, continued. In this study, we demonstrate that many of these recent claims fail to hold up when scrutinized for common validation shortcomings, such as the use of inadequate baselines, insufficient datasets, and neglected computational resources. By meticulously avoiding these pitfalls, we conduct a thorough and comprehensive benchmarking of current segmentation methods including CNN-based, Transformer-based, and Mamba-based approaches. In contrast to current beliefs, we find that the recipe for state-of-the-art performance is 1) employing CNN-based U-Net models, including ResNet and ConvNeXt variants, 2) using the nnU-Net framework, and 3) scaling models to modern hardware resources. These results indicate an ongoing innovation bias towards novel architectures in the field and underscore the need for more stringent validation standards in the quest for scientific progress.
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, such decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets belonging to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reductions in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish it as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.
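A sketch of the multi-scale depth-wise convolution idea that keeps the parameter count low: parallel depth-wise convolutions at several kernel sizes (groups equal to channels) are summed and then mixed by a point-wise convolution; the kernel sizes and normalization below are assumptions, and the gated attention branches of the actual decoder are omitted.

```python
import torch
import torch.nn as nn

class MultiScaleDWBlock(nn.Module):
    """Sketch of a multi-scale depth-wise convolution block: parallel depth-wise
    convolutions at several kernel sizes capture context cheaply (groups ==
    channels), followed by a point-wise projection to mix channels."""
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.dw = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)  # point-wise mixing
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for conv in self.dw:
            out = out + conv(x)  # sum multi-scale depth-wise responses
        return self.act(self.norm(self.pw(out)))

print(MultiScaleDWBlock(64)(torch.randn(1, 64, 28, 28)).shape)  # (1, 64, 28, 28)
```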
In the field of medical image segmentation, variant models based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as base modules have been widely developed and applied. However, CNNs are often limited in their ability to handle long sequences of information, while the low sensitivity of ViTs to local feature information and their quadratic computational complexity limit their development. Recently, the emergence of state space models (SSMs), especially 2D-selective-scan (SS2D), has challenged the long-standing dominance of traditional CNNs and ViTs as the foundational modules of visual neural networks. In this paper, we extend the adaptability of SS2D by proposing a High-order Vision Mamba UNet (H-vmunet) for medical image segmentation. The proposed High-order 2D-selective-scan (H-SS2D) progressively reduces the introduction of redundant information during SS2D operations through higher-order interactions. In addition, the proposed Local-SS2D module improves the ability of SS2D to learn local features at each order of interaction. We conducted comparison and ablation experiments on three publicly available medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB), and the results all demonstrate the strong competitiveness of H-vmunet in medical image segmentation tasks. The code is available at https://github.com/wurenkai/H-vmunet .
The Transformer, the model of choice in natural language processing, has drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, Transformers are promising for helping convolutional neural networks learn more contextualized visual representations. However, most recently proposed Transformer-based segmentation approaches simply treat Transformers as assisting modules that encode global context into convolutional representations. To address this issue, we introduce nnFormer (i.e., not-another transFormer), a 3D Transformer for volumetric medical image segmentation. nnFormer not only exploits the combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer proposes to use skip attention to replace the traditional concatenation/summation operations in the skip connections of U-Net-like architectures. Experiments show that nnFormer significantly outperforms previous Transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, the most widely recognized convnet-based 3D medical segmentation model, nnFormer produces significantly lower HD95 and is much more computationally efficient. Furthermore, we show that nnFormer and nnUNet are highly complementary to each other in model ensembling. Codes and models of nnFormer are available at https://git.io/JSf3i.
Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally can only capture small-range feature dependency for the convolutional layer with its local receptive field, which makes it difficult to learn global shape information from the limited information provided by scribble annotations. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation, called ScribFormer. The proposed ScribFormer model has a triple-branch structure, i.e., a hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. Specifically, the CNN branch collaborates with the Transformer branch to fuse the local features learned by the CNN with the global representations obtained from the Transformer, which can effectively overcome the limitations of existing scribble-supervised segmentation methods. Furthermore, the ACAM branch assists in unifying the shallow convolution features and the deep convolution features to further improve the model's performance. Extensive experiments on two public datasets and one private dataset show that ScribFormer has superior performance over state-of-the-art scribble-supervised segmentation methods, and achieves even better results than fully-supervised segmentation methods. The code is released at https://github.com/HUANGLIZI/ScribFormer.
Semi-supervised medical image segmentation has garnered significant interest because it can alleviate the burden of densely annotated data. Substantial advancements have been achieved by integrating consistency regularization and pseudo-labeling techniques, and the quality of the pseudo-labels is crucial in this regard: unreliable pseudo-labels introduce noise that leads the model to converge to suboptimal solutions. To address this issue, we propose learning from reliable pseudo-labels. In this paper, we tackle two critical questions: which pseudo-labels are reliable, and how reliable are they? Specifically, we conduct a comparative analysis of two subnetworks to address both challenges. First, we compare the prediction confidence of the two subnetworks; a higher confidence score indicates a more reliable pseudo-label. Second, we utilize intra-class similarity to assess pseudo-label reliability: the greater the intra-class similarity of the predicted classes, the more reliable the pseudo-label. Each subnetwork selectively incorporates the knowledge imparted by the other, contingent on the reliability of the pseudo-labels. By reducing the noise introduced by unreliable pseudo-labels, we are able to improve segmentation performance. To demonstrate the superiority of our approach, we conducted an extensive set of experiments on three datasets: Left Atrium, Pancreas-CT, and BraTS-2019. The experimental results demonstrate that our approach achieves state-of-the-art performance. Code is available at: https://github.com/Jiawei0o0/mutual-learning-with-reliable-pseudo-labels.
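A sketch of the two reliability criteria: per pixel, the more confident subnetwork supplies the pseudo-label, and an intra-class similarity proxy is computed as the cosine similarity between a pixel's feature and the mean feature (prototype) of its predicted class; the thresholds, the feature source, and this particular similarity definition are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reliable_pseudo_labels(logits_a, logits_b, feats, tau_conf=0.9, tau_sim=0.5):
    """Illustrative reliability screen for pseudo-labels from two subnetworks:
    keep a pixel when the more confident subnetwork exceeds a confidence
    threshold AND the pixel's feature is close to the prototype of its
    predicted class (an intra-class similarity proxy)."""
    conf_a, pred_a = logits_a.softmax(1).max(1)
    conf_b, pred_b = logits_b.softmax(1).max(1)
    use_a = conf_a >= conf_b                       # pick the more confident twin
    pseudo = torch.where(use_a, pred_a, pred_b)
    conf = torch.where(use_a, conf_a, conf_b)
    # Intra-class similarity: cosine similarity to the predicted-class prototype.
    n, c, h, w = feats.shape
    f = F.normalize(feats, dim=1).permute(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    labels = pseudo.reshape(-1)
    sim = torch.zeros(n * h * w, device=feats.device)
    for k in labels.unique():
        idx = labels == k
        proto = F.normalize(f[idx].mean(0), dim=0)  # class prototype
        sim[idx] = f[idx] @ proto                   # cosine similarity per pixel
    reliable = (conf >= tau_conf) & (sim.reshape(n, h, w) >= tau_sim)
    return pseudo, reliable

la, lb = torch.randn(1, 4, 32, 32), torch.randn(1, 4, 32, 32)
ft = torch.randn(1, 16, 32, 32)
pseudo, reliable = reliable_pseudo_labels(la, lb, ft)
```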
Deep learning, especially convolutional neural networks (CNNs) and Transformer architectures, has become the focus of extensive research in medical image segmentation, achieving impressive results. However, CNNs come with inductive biases that limit their effectiveness in more complex, varied segmentation scenarios, while Transformer-based methods excel at capturing global and long-range semantic details but suffer from high computational demands. In this study, we propose CSWin-UNet, a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet to perform self-attention over horizontal and vertical stripes. This method significantly enhances both computational efficiency and receptive-field interactions. Additionally, our innovative decoder utilizes a content-aware reassembly operator that strategically reassembles features, guided by predicted kernels, for precise image resolution restoration. Our extensive empirical evaluations on diverse datasets, including synapse multi-organ CT, cardiac MRI, and skin lesions, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy. Codes are available at https://github.com/eatbeanss/CSWin-UNet.
Although U-shaped networks have achieved remarkable performance in many medical image segmentation tasks, they rarely model the sequential relationship of hierarchical layers. This weakness makes it difficult for the current layer to effectively utilize the historical information of the previous layer, leading to unsatisfactory segmentation results for lesions with blurred boundaries and irregular shapes. To solve this problem, we propose a novel dual-path U-Net, dubbed I2U-Net. The newly proposed network encourages historical information re-usage and re-exploration through rich information interaction among the dual paths, allowing deep layers to learn more comprehensive features that contain both low-level detail description and high-level semantic abstraction. Specifically, we introduce a multi-functional information interaction module (MFII), which can model cross-path, cross-layer, and cross-path-and-layer information interactions via a unified design, making the proposed I2U-Net behave similarly to an unfolded RNN and enjoy its advantage in modeling time-sequence information. Besides, to further selectively and sensitively integrate the information extracted by the encoders of the dual paths, we propose a holistic information fusion and augmentation module (HIFA), which can efficiently bridge the encoder and the decoder. Extensive experiments on four challenging tasks, including skin lesion, polyp, brain tumor, and abdominal multi-organ segmentation, consistently show that the proposed I2U-Net has superior performance and generalization ability over other state-of-the-art methods. The code is available at https://github.com/duweidai/I2U-Net.
Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation. Specifically, SAM2-UNet adopts the Hiera backbone of SAM2 as the encoder, while the decoder uses the classic U-shaped design. Additionally, adapters are inserted into the encoder to enable parameter-efficient fine-tuning. Preliminary experiments on various downstream tasks, such as camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation, demonstrate that our SAM2-UNet can outperform existing specialized state-of-the-art methods with minimal additional complexity.
Generalizability in deep neural networks plays a pivotal role in medical image segmentation. However, deep learning-based medical image analyses tend to overlook the importance of frequency variance, which is a critical element for achieving a model that is both modality-agnostic and domain-generalizable. Additionally, various models fail to account for the potential information loss that can arise from multitask learning under deep supervision, a factor that can impair the model's representation ability. To address these challenges, we propose a Modality-agnostic Domain Generalizable Network (MADGNet) for medical image segmentation, which comprises two key components: a Multi-Frequency in Multi-Scale Attention (MFMSA) block and an Ensemble Sub-Decoding Module (E-SDM). The MFMSA block refines the process of spatial feature extraction, particularly in capturing boundary features, by incorporating multi-frequency and multi-scale features, thereby offering informative cues about tissue outlines and anatomical structures. Moreover, we propose E-SDM to mitigate information loss in multitask learning with deep supervision, especially during substantial upsampling from low resolution. We evaluate the segmentation performance of MADGNet across six modalities and fifteen datasets. Through extensive experiments, we demonstrate that MADGNet consistently outperforms state-of-the-art models across various modalities, showcasing superior segmentation performance. This affirms MADGNet as a robust solution for medical image segmentation that excels in diverse imaging scenarios. Our MADGNet code is available on GitHub.
The adoption of Vision Transformer (ViT)-based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly enhanced 3D segmentation performance, state-of-the-art architectures require extremely large and complex designs with large-scale computing resources for training and deployment. Furthermore, in the context of the limited datasets often encountered in medical imaging, larger models can present hurdles to both model generalization and convergence. In response to these challenges, and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory-efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33× fewer parameters and a 13× reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against current SOTA models on three widely used datasets, Synapse, BraTS, and ACDC, achieving competitive results. Code: https://github.com/OSUPCVLab/SegFormer3D.git
The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance declines significantly when applied to medical images, primarily due to the substantial disparity between the natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. Simultaneously, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named MA-SAM, that is applicable to various volumetric and video medical data. Our method is rooted in the parameter-efficient fine-tuning strategy of updating only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from the input data. We comprehensively evaluate our method on five medical image segmentation tasks, using 11 public datasets across CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation, respectively. Our model also demonstrates strong generalization, and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.
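A sketch of the adapter idea for lifting a frozen 2D backbone to 3D: tokens from D slices are down-projected, mixed along the depth axis by a depth-wise 3D convolution, up-projected, and added back residually, so only a few parameters need training; the bottleneck width, kernel shape, and placement inside the block are assumptions rather than MA-SAM's exact design.

```python
import torch
import torch.nn as nn

class Adapter3DSketch(nn.Module):
    """Sketch of a 3D adapter for a frozen 2D ViT block: tokens from D slices
    are down-projected, mixed across the depth axis with a depth-wise 3D
    convolution, up-projected, and added back residually."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.depth_mix = nn.Conv3d(bottleneck, bottleneck, kernel_size=(3, 1, 1),
                                   padding=(1, 0, 0), groups=bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # x: (B*D, H*W, C) tokens from a 2D backbone applied slice by slice;
        # a square token grid is assumed for simplicity.
        bd, n, c = x.shape
        h = w = int(n ** 0.5)
        z = self.act(self.down(x))                                    # (B*D, N, k)
        z = z.reshape(bd // depth, depth, h, w, -1).permute(0, 4, 1, 2, 3)
        z = self.depth_mix(z)                                         # mix along D
        z = z.permute(0, 2, 3, 4, 1).reshape(bd, n, -1)
        return x + self.up(z)                                         # residual

tokens = torch.randn(2 * 8, 14 * 14, 768)  # 2 volumes, 8 slices each
print(Adapter3DSketch(768)(tokens, depth=8).shape)  # (16, 196, 768)
```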
Semantic image segmentation is the process of labeling each pixel of an image with its corresponding class. An encoder-decoder based approach, like U-Net and its variants, is a popular strategy for solving medical image segmentation tasks. To improve the performance of U-Net on various segmentation tasks, we propose a novel architecture called DoubleU-Net, which is a combination of two U-Net architectures stacked on top of each other. The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, we added another U-Net at the bottom. We also adopt Atrous Spatial Pyramid Pooling (ASPP) to capture contextual information within the network. We have evaluated DoubleU-Net using four medical segmentation datasets, covering various imaging modalities such as colonoscopy, dermoscopy, and microscopy. Experiments on the MICCAI 2015 segmentation challenge, the CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation datasets demonstrate that the DoubleU-Net outperforms U-Net and the baseline models. Moreover, DoubleU-Net produces more accurate segmentation masks, especially in the case of the CVC-ClinicDB and MICCAI 2015 segmentation challenge datasets, which have challenging images such as smaller and flat polyps. These results show the improvement over the existing U-Net model. The encouraging results, produced on various medical image segmentation datasets, show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.
Deep learning-based semi-supervised learning (SSL) algorithms have led to promising results in medical image segmentation and can reduce doctors' expensive annotation effort by leveraging unlabeled data. However, most existing SSL algorithms in the literature regularize model training by perturbing networks and/or data. Observing that multi-/dual-task learning attends to various levels of information that carry inherent prediction perturbation, we ask in this work: can we explicitly build task-level regularization, rather than implicitly constructing network- and/or data-level perturbation-and-regularization, for SSL? To answer this question, we propose a novel dual-task-consistency semi-supervised framework for the first time. Concretely, we use a dual-task deep network that jointly predicts a pixel-wise segmentation map and a geometry-aware level-set representation of the target. The level-set representation is converted to an approximate segmentation map through a differentiable task-transform layer. Simultaneously, we introduce a dual-task consistency regularization between the level-set-derived segmentation maps and the directly predicted segmentation maps for both labeled and unlabeled data. Extensive experiments on two public datasets show that our method can largely improve performance by incorporating unlabeled data. Meanwhile, our framework outperforms state-of-the-art semi-supervised learning methods.
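The differentiable task-transform layer can be sketched as a steep sigmoid applied to the negated level-set prediction, which makes the derived mask comparable to the directly predicted one on labeled and unlabeled images alike; the steepness constant and the inside-negative signed-distance convention below are assumptions.

```python
import torch

def level_set_to_mask(level_set: torch.Tensor, k: float = 1500.0) -> torch.Tensor:
    """Differentiable task transform: a signed level-set prediction (assumed
    negative inside the object) is squashed into an approximate binary
    segmentation with a steep sigmoid."""
    return torch.sigmoid(-k * level_set)

def dual_task_consistency_loss(seg_prob: torch.Tensor, level_set_pred: torch.Tensor) -> torch.Tensor:
    """Consistency between the directly predicted segmentation probabilities
    and the segmentation derived from the level-set head; it needs no labels,
    so it applies to labeled and unlabeled images alike."""
    derived = level_set_to_mask(level_set_pred)
    return torch.mean((seg_prob - derived) ** 2)

seg = torch.rand(2, 1, 96, 96)           # segmentation head (probabilities)
lsf = torch.randn(2, 1, 96, 96) * 0.05   # level-set head (signed distances)
print(dual_task_consistency_loss(seg, lsf))
```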
No abstract available
The automatic differentiation of retinal vessels into arteries and veins (A/V) is a highly relevant task within the field of retinal image analysis. However, due to limitations of retinal image acquisition devices, specialists can find it impossible to label certain vessels in eye fundus images. In this paper, we introduce a method that takes into account such uncertainty by design. For this, we formulate the A/V classification task as a four-class segmentation problem, and a Convolutional Neural Network is trained to classify pixels into background, A/V, or uncertain classes. The resulting technique can directly provide pixelwise uncertainty estimates. In addition, instead of depending on a previously available vessel segmentation, the method automatically segments the vessel tree. Experimental results show a performance comparable or superior to several recent A/V classification approaches. In addition, the proposed technique also attains state-of-the-art performance when evaluated for the task of vessel segmentation, generalizing to data that was not used during training, even with considerable differences in terms of appearance and resolution.
No abstract available
In biomedical image analysis, developing architectures that effectively capture long-range dependencies is crucial. Traditional Convolutional Neural Networks (CNNs) are constrained by their local receptive fields, while Transformers, though proficient in global context integration, are computationally demanding for high-dimensional medical images. Here, we present nnMamba, a novel architecture that combines the strengths of CNNs with the long-range modeling capabilities of State Space Models (SSMs). We introduce the Mamba-In-Convolution with Channel-Spatial Siamese learning (MICCSS) block to model long-range voxel relationships. Additionally, we implement channel scaling and channel-sequential learning methods to enhance performance in dense prediction and classification tasks. Extensive experiments on seven datasets demonstrate that nnMamba outperforms current state-of-the-art methods in 3D image segmentation, classification, and landmark detection. nnMamba effectively integrates CNNs' local representation with SSMs' global context processing, establishing a new benchmark for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba.
By incorporating colored MRI identification synthesis into an MRI segmentation model with a transfer-learning Y-Net, this study shows the high potential of a multidisciplinary, system-level approach to diagnosis. Such a system can preserve the integrity of the overall goal without compromising the quality of each component, while saving time. An alternative use of this integration is enhancement and segmentation that is accurate and robust to variability in scanners and acquisition protocols. The system-level simulator is built on a Keras-based deep learning network, and its Y-VGG16 variant yields outstanding performance in medical image segmentation. Based on the literature, existing AI diagnosis models differ from what is proposed in this paper. A partially frozen network is applied to the U-Net to compare results between different fine-tuning (FT) strategies. Network performance is also evaluated as a function of dataset size, showing the importance of combining the dataset with transfer learning (TL) and data augmentation (DA). Transfer learning yields more accurate performance for deep learning-based MRI medical image segmentation. The system hybridizes the Y-Net architecture with transfer learning to reduce the domain-shift effect in the brain MRI segmentation results of the automated deep learning segmentation approach.
Accurate segmentation of thoracoabdominal anatomical structures in three-dimensional medical imaging modalities is fundamental for informed clinical decision-making across a wide array of medical disciplines. Current approaches often struggle to efficiently and comprehensively process this region’s intricate and heterogeneous anatomical information, leading to suboptimal outcomes in diagnosis, treatment planning, and disease management. To address this challenge, we introduce SegTom, a novel volumetric segmentation framework equipped with a cutting-edge SegTom Block specifically engineered to effectively capture the complex anatomical representations inherent to the thoracoabdominal region. This SegTom Block incorporates a hierarchical anatomical-representation decomposition to facilitate efficient information exchange by decomposing the computationally intensive self-attention mechanism and cost-effectively aggregating the extracted representations. Rigorous validation of SegTom across nine diverse datasets, encompassing both computed tomography (CT) and magnetic resonance imaging (MRI) modalities, consistently demonstrates high performance across a broad spectrum of anatomical structures. Specifically, SegTom achieves a mean Dice similarity coefficient (DSC) of 87.29% for cardiac segmentation on the MM-WHS MRI dataset, 83.48% for multi-organ segmentation on the BTCV abdominal CT dataset, and 92.01% for airway segmentation on a dedicated CT dataset.
Vision Transformers show great superiority in medical image segmentation due to their ability to learn long-range dependencies. For medical image segmentation from 3-D data, such as computed tomography (CT), existing methods can be broadly classified into 2-D-based and 3-D-based methods. A key limitation of 2-D-based methods is that inter-slice information is ignored, while the limitation of 3-D-based methods is their high computation cost and memory consumption, resulting in limited feature representation of inner-slice information. During clinical examination, radiologists primarily use the axial plane and then routinely review both the axial and coronal planes to form a 3-D understanding of the anatomy. Motivated by this fact, our key insight is to design a hybrid model that first learns fine-grained inner-slice information and then generates a 3-D understanding of the anatomy by incorporating 3-D information. We present a novel Hybrid Residual TransFormer (HResFormer) for 3-D medical image segmentation. Building upon standard 2-D and 3-D Transformer backbones, HResFormer involves two novel key designs: 1) a Hybrid Local-Global fusion Module (HLGM) to effectively and adaptively fuse inner-slice information from the 2-D Transformer and inter-slice information from 3-D volumes for the 3-D Transformer, with local fine-grained and global long-range representation, and 2) residual learning of the hybrid model, which can effectively leverage inner-slice and inter-slice information for a better 3-D understanding of the anatomy. Experiments show that HResFormer outperforms prior art on widely used medical image segmentation benchmarks. This article sheds light on an important but neglected way to design Transformers for 3-D medical image segmentation.
Precise image segmentation provides clinical study with instructive information. Despite the remarkable progress achieved in medical image segmentation, there is still an absence of a 3D foundation segmentation model that can segment a wide range of anatomical categories with easy user interaction. In this paper, we propose a 3D foundation segmentation model, named SegVol, supporting universal and interactive volumetric medical image segmentation. By scaling up training data to 90K unlabeled Computed Tomography (CT) volumes and 6K labeled CT volumes, this foundation model supports the segmentation of over 200 anatomical categories using semantic and spatial prompts. To facilitate efficient and precise inference on volumetric images, we design a zoom-out-zoom-in mechanism. Extensive experiments on 22 anatomical segmentation tasks verify that SegVol outperforms the competitors in 19 tasks, with improvements up to 37.24% compared to the runner-up methods. We demonstrate the effectiveness and importance of specific designs by ablation study. We expect this foundation model can promote the development of volumetric medical image analysis. The model and code are publicly available at: https://github.com/BAAI-DCAI/SegVol.
Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance in fully supervised manner. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation (UDA) can alleviate this problem, which makes it possible to use annotated data in one imaging modality to train a network that can successfully perform segmentation on target imaging modality with no labels. In this work, we propose SDC-UDA, a simple yet effective volumetric UDA framework for Slice-Direction Continuous cross-modality medical image segmentation which combines intra- and inter-slice self-attentive image translation, uncertainty-constrained pseudo-label refinement, and volumetric self-training. Our method is distinguished from previous methods on UDA for medical image segmentation in that it can obtain continuous segmentation in the slice direction, thereby ensuring higher accuracy and potential in clinical practice. We validate SDC-UDA with multiple publicly available cross-modality medical image segmentation datasets and achieve state-of-the-art segmentation performance, not to mention the superior slice-direction continuity of prediction compared to previous studies.
Masked Autoencoder (MAE) is a self-supervised pre-training technique that holds promise in improving the representation learning of neural networks. However, the current application of MAE directly to volumetric medical images poses two challenges: (i) insufficient global information for clinical context understanding of the holistic data, and (ii) the absence of any assurance of stabilizing the representations learned from randomly masked inputs. To conquer these limitations, we propose the Global-Local Masked AutoEncoders (GL-MAE), a simple yet effective self-supervised pre-training strategy. GL-MAE acquires robust anatomical structure features by incorporating multi-level reconstruction from fine-grained local details to high-level global semantics. Furthermore, a complete global view serves as an anchor to direct anatomical semantic alignment and stabilize the learning process through global-to-global consistency learning and global-to-local consistency learning. Our fine-tuning results on eight mainstream public datasets demonstrate the superiority of our method over other state-of-the-art self-supervised algorithms, highlighting its effectiveness on versatile volumetric medical image segmentation and classification tasks. We will release codes upon acceptance at https://github.com/JiaxinZhuang/GL-MAE
Accurate and efficient volumetric medical image segmentation is vital for clinical diagnosis, pre-operative planning, and disease-progression monitoring. Conventional convolutional neural networks (CNNs) struggle to capture long-range contextual information, whereas Transformer-based methods suffer from quadratic computational complexity, making it challenging to couple global modeling with high efficiency. To address these limitations, we explore an efficient yet accurate segmentation model for volumetric data. Specifically, we introduce a novel linear-complexity sequence modeling technique, RWKV, and leverage it to design a Tri-directional Spatial Enhancement RWKV (TSE-R) block; this module performs global modeling via RWKV and incorporates two optimizations tailored to three-dimensional data: 1) a spatial-shift strategy that enlarges the local receptive field and facilitates inter-block interaction, thereby alleviating the structural information loss caused by sequence serialization; and 2) a tri-directional scanning mechanism that constructs sequences along three distinct directions, applies global modeling via WKV, and fuses them with learnable weights to preserve the inherent 3D spatial structure. Building upon the TSE-R block, we develop an end-to-end 3D segmentation network, termed U-RWKV, and extensive experiments on three public 3D medical segmentation benchmarks demonstrate that U-RWKV outperforms state-of-the-art CNN-, Transformer-, and Mamba-based counterparts, achieving a Dice score of 87.21% on the Synapse multi-organ abdominal dataset while reducing the parameter count by a factor of 16.08 compared with leading methods.
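The tri-directional scanning mechanism can be sketched independently of the WKV operator itself: the volume is serialized along three axis orders, each sequence passes through a shared mixer (a plain linear layer stands in for RWKV below), and the three outputs are fused with learnable softmax weights; everything beyond that scanning-and-fusion skeleton is an assumption.

```python
import torch
import torch.nn as nn

class TriDirectionalScanSketch(nn.Module):
    """Sketch of tri-directional scanning over a 3D feature volume: three axis
    permutations produce three serializations, each is processed by a shared
    sequence mixer (placeholder for WKV), and results are fused with
    learnable weights."""
    def __init__(self, channels: int):
        super().__init__()
        self.mixer = nn.Linear(channels, channels)     # stand-in for the WKV operator
        self.weights = nn.Parameter(torch.ones(3) / 3) # learnable fusion weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W); orders give D-H-W, H-W-D, and W-D-H scans.
        b, c = x.shape[:2]
        outs = []
        for o in [(2, 3, 4), (3, 4, 2), (4, 2, 3)]:
            perm = (0, *o, 1)                          # move channels last
            inv = [perm.index(i) for i in range(5)]    # inverse permutation
            shape = [x.shape[i] for i in perm]
            seq = x.permute(*perm).reshape(b, -1, c)   # (B, L, C) serialization
            mixed = self.mixer(seq).reshape(*shape)
            outs.append(mixed.permute(*inv))           # back to (B, C, D, H, W)
        w = torch.softmax(self.weights, dim=0)
        return w[0] * outs[0] + w[1] * outs[1] + w[2] * outs[2]

print(TriDirectionalScanSketch(16)(torch.randn(1, 16, 4, 6, 8)).shape)
```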
In recent years, 3D volumetric medical images have been widely used in clinical diagnosis; however, popular 2D networks have been reported to be unsuitable for segmenting them. In this direction, we propose a plug-and-play module (PnP-AE) to improve the performance of 2D networks for 3D medical image segmentation. Our method takes advantage of the intrinsic correlation between adjacent slices, using multiple encoders and fusion components to decouple in-plane feature extraction from depth information integration. In addition, the proposed weight-sharing and feature-storage strategies make PnP-AE extremely efficient. Our method can be conveniently incorporated into mainstream 2D networks to segment 3D volumetric medical images. Experimental results demonstrate the excellent performance of our method. The source code is available at https://github.com/qklee-lz/PnP-AE.
Despite recent progress of automatic medical image segmentation techniques, fully automatic results usually fail to meet the clinical use and typically require further refinement. In this work, we propose a quality-aware memory network for interactive segmentation of 3D medical images. Provided by user guidance on an arbitrary slice, an interaction network is firstly employed to obtain an initial 2D segmentation. The quality-aware memory network subsequently propagates the initial segmentation estimation bidirectionally over the entire volume. Subsequent refinement based on additional user guidance on other slices can be incorporated in the same manner. To further facilitate interactive segmentation, a quality assessment module is introduced to suggest the next slice to segment based on the current segmentation quality of each slice. The proposed network has two appealing characteristics: 1) The memory-augmented network offers the ability to quickly encode past segmentation information, which will be retrieved for the segmentation of other slices; 2) The quality assessment module enables the model to directly estimate the qualities of segmentation predictions, which allows an active learning paradigm where users preferentially label the lowest-quality slice for multi-round refinement. The proposed network leads to a robust interactive segmentation engine, which can generalize well to various types of user annotations (e.g., scribbles, boxes). Experimental results on various medical datasets demonstrate the superiority of our approach in comparison with existing techniques.
Deep learning-based methods have spearheaded the automatic analysis of echocardiographic images, taking advantage of the publication of multiple open access datasets annotated by experts (CAMUS being one of the largest public databases). However, these models are still considered unreliable by clinicians due to unresolved issues concerning i) the temporal consistency of their predictions, and ii) their ability to generalize across datasets. In this context, we propose a comprehensive comparison between the current best performing methods in medical/echocardiographic image segmentation, with a particular focus on temporal consistency and cross-dataset aspects. We introduce a new private dataset, named CARDINAL, of apical two-chamber and apical four-chamber sequences, with reference segmentation over the full cardiac cycle. We show that the proposed 3D nnU-Net outperforms alternative 2D and recurrent segmentation methods. We also report that the best models trained on CARDINAL, when tested on CAMUS without any fine-tuning, still manage to perform competitively with respect to prior methods. Overall, the experimental results suggest that with sufficient training data, 3D nnU-Net could become the first automated tool to finally meet the standards of an everyday clinical device.
Purpose The purpose of this study was to quantify choroidal vessels (CVs) in pathological eyes in three dimensions (3D) using optical coherence tomography (OCT) and a deep-learning analysis. Methods A single-center retrospective study including 34 eyes of 34 patients (7 women and 27 men) with treatment-naïve central serous chorioretinopathy (CSC) and 33 eyes of 17 patients (7 women and 10 men) with Vogt-Koyanagi-Harada disease (VKH) or sympathetic ophthalmitis (SO) were imaged consecutively between October 2012 and May 2019 with a swept source OCT. Seventy-seven eyes of 39 age-matched volunteers (26 women and 13 men) with no sign of ocular pathology were imaged for comparison. Deep-learning-based image enhancement pipeline enabled CV segmentation and visualization in 3D, after which quantitative vessel volume maps were acquired to compare normal and diseased eyes and to track the clinical course of eyes in the disease group. Region-based vessel volumes and vessel indices were utilized for disease diagnosis. Results OCT-based CV volume maps disclose regional CV changes in patients with CSC, VKH, or SO. Three metrics, (i) choroidal volume, (ii) CV volume, and (iii) CV index, exhibit high sensitivity and specificity in discriminating pathological choroids from healthy ones. Conclusions The deep-learning analysis of OCT images described here provides a 3D visualization of the choroid, and allows quantification of features in the datasets to identify choroidal disease and distinguish between different diseases. Translational Relevance This novel analysis can be applied retrospectively to existing OCT datasets, and it represents a significant advance toward the automated diagnosis of choroidal pathologies based on observations and quantifications of the vasculature.
Existing volumetric medical image segmentation models are typically task-specific, excelling at specific targets but struggling to generalize across anatomical structures or modalities. This limitation restricts their broader clinical use. In this article, we introduce segment anything model (SAM)-Med3D, a vision foundation model (VFM) for general-purpose segmentation on volumetric medical images. Given only a few 3-D prompt points, SAM-Med3D can accurately segment diverse anatomical structures and lesions across various modalities. To achieve this, we gather and preprocess a large-scale 3-D medical image segmentation dataset, SA-Med3D-140K, from 70 public datasets and 8K licensed private cases from hospitals. This dataset includes 22K 3-D images and 143K corresponding masks. SAM-Med3D, a promptable segmentation model characterized by its fully learnable 3-D structure, is trained on this dataset using a two-stage procedure and exhibits impressive performance on both seen and unseen segmentation targets. We comprehensively evaluate SAM-Med3D on 16 datasets covering diverse medical scenarios, including different anatomical structures, modalities, targets, and zero-shot transferability to new/unseen tasks. The evaluation demonstrates the efficiency and efficacy of SAM-Med3D, as well as its promising application to diverse downstream tasks as a pretrained model. Our approach illustrates that substantial medical resources can be harnessed to develop a general-purpose medical AI for various potential applications. Our dataset, code, and models are available at: https://github.com/uni-medical/SAM-Med3D
Recently, deep convolutional neural networks have achieved great success in medical image segmentation. However, unlike natural images, most medical images, such as MRI and CT, are volumetric data. To make full use of volumetric information, 3D CNNs are widely used; however, 3D CNNs suffer from higher inference time and computation cost, which hinders their further clinical application. Additionally, with the increased number of parameters, the risk of overfitting is higher, especially for medical images, where data and annotations are expensive to acquire. To address this problem, many 2.5D segmentation methods have been proposed to exploit volumetric spatial information at lower computational cost. Although these works lead to improvements on a variety of segmentation tasks, to the best of our knowledge, there has not previously been a large-scale empirical comparison of these methods. In this paper, we present a review of the latest developments in 2.5D methods for volumetric medical image segmentation. To compare the performance and effectiveness of these methods, we additionally provide an empirical study of them on three representative segmentation tasks involving different modalities and targets. Our experimental results highlight that 3D CNNs may not always be the best choice: although all the 2.5D methods bring performance gains over a 2D baseline, not every method holds its benefits across different datasets. We hope the results and conclusions of our study will prove useful for the community in exploring and developing efficient volumetric medical image segmentation methods.
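The simplest 2.5D scheme reviewed in this line of work is easy to make concrete: each slice is fed to an ordinary 2D network together with its neighbors stacked as input channels, so some inter-slice context is used at near-2D cost; the sketch below uses replicate padding at the volume borders and a bare convolution as a stand-in for any trained 2D model.

```python
import torch
import torch.nn as nn

def volume_to_25d_batches(volume: torch.Tensor, context: int = 1) -> torch.Tensor:
    """Sketch of the simplest 2.5D scheme: each axial slice is paired with its
    `context` neighbors on either side, stacked as input channels, so a 2D
    network sees some inter-slice information at near-2D cost."""
    d, h, w = volume.shape
    # Replicate the first/last slice so border slices get full context.
    pad = torch.cat([volume[:1]] * context + [volume] + [volume[-1:]] * context, dim=0)
    stacks = [pad[i : i + 2 * context + 1] for i in range(d)]  # (2c+1, H, W) each
    return torch.stack(stacks)                                  # (D, 2c+1, H, W)

vol = torch.rand(40, 256, 256)                     # a volume: depth x height x width
batch = volume_to_25d_batches(vol, context=1)
net = nn.Conv2d(3, 2, kernel_size=3, padding=1)    # stand-in for any 2D model
print(net(batch).shape)                            # (40, 2, 256, 256): per-slice logits
```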
Although deep neural networks have been a dominant method for many 2D vision tasks, it is still challenging to apply them to 3D tasks, such as medical image segmentation, due to the limited amount of annotated 3D data and limited computational resources. In this chapter, by rethinking the strategy of applying 3D Convolutional Neural Networks to segment medical images, we propose a novel 3D-based coarse-to-fine framework to efficiently tackle these challenges. The proposed 3D-based framework outperforms its 2D counterparts by a large margin, since it can leverage the rich spatial information along all three axes. We further analyze the threat of adversarial attacks on the proposed framework and show how to defend against them. We conduct experiments on three datasets, the NIH pancreas dataset, the JHMI pancreas dataset, and the JHMI pathological cyst dataset, where the first two contain healthy pancreases and the last contains pathological ones, and achieve the current state-of-the-art in terms of Dice-Sorensen Coefficient (DSC) on all of them. In particular, on the NIH pancreas segmentation dataset, we outperform the previous best by an average of over 2%, and the worst case is improved by 7% to reach almost 70%, which indicates the reliability of our framework in clinical applications.
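A sketch of the coarse-to-fine control flow under simple assumptions (binary foreground, dense 3D logits from both stages): the coarse prediction supplies a bounding box with a safety margin, and the fine network re-segments only the crop; the stand-in models and the single-pass (non-iterative) refinement are simplifications of the actual framework.

```python
import torch

def coarse_to_fine(volume, coarse_net, fine_net, margin=8):
    """Sketch of a coarse-to-fine pass: a coarse network localizes the organ on
    the whole volume, a bounding box with a safety margin is cropped, and a
    fine network re-segments only the cropped region."""
    coarse = coarse_net(volume).argmax(1)[0]   # (D, H, W) hard labels
    idx = coarse.nonzero()                     # foreground voxel coordinates
    if idx.numel() == 0:
        return coarse                          # nothing found by the coarse stage
    lo = (idx.min(0).values - margin).clamp(min=0)
    hi = idx.max(0).values + margin + 1        # slicing clips overflow safely
    crop = volume[..., lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine = fine_net(crop).argmax(1)[0]
    out = torch.zeros_like(coarse)
    out[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine  # paste fine result back
    return out

vol = torch.rand(1, 1, 32, 64, 64)
net = lambda v: torch.randn(1, 2, *v.shape[2:])  # stand-ins for trained models
print(coarse_to_fine(vol, net, net).shape)       # (32, 64, 64)
```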
Despite recent progress of automatic medical image segmentation techniques, fully automatic results usually fail to meet clinically acceptable accuracy, thus typically require further refinement. To this end, we propose a novel Volumetric Memory Network, dubbed as VMN, to enable segmentation of 3D medical images in an interactive manner. Provided by user hints on an arbitrary slice, a 2D interaction network is firstly employed to produce an initial 2D segmentation for the chosen slice. Then, the VMN propagates the initial segmentation mask bidirectionally to all slices of the entire volume. Subsequent refinement based on additional user guidance on other slices can be incorporated in the same manner. To facilitate smooth human-in-the-loop segmentation, a quality assessment module is introduced to suggest the next slice for interaction based on the segmentation quality of each slice produced in the previous round. Our VMN demonstrates two distinctive features: First, the memory-augmented network design offers our model the ability to quickly encode past segmentation information, which will be retrieved later for the segmentation of other slices; Second, the quality assessment module enables the model to directly estimate the quality of each segmentation prediction, which allows for an active learning paradigm where users preferentially label the lowest-quality slice for multi-round refinement. The proposed network leads to a robust interactive segmentation engine, which can generalize well to various types of user annotations (e.g., scribble, bounding box, extreme clicking). Extensive experiments have been conducted on three public medical image segmentation datasets (i.e., MSD, KiTS19, CVC-ClinicDB), and the results clearly confirm the superiority of our approach in comparison with state-of-the-art segmentation models. The code is made publicly available at https://github.com/0liliulei/Mem3D.
Quality control (QC) of structures segmentation in volumetric medical images is important for identifying segmentation errors in clinical practice and for facilitating model development by enhancing network performance in semi-supervised and active learning scenarios. This paper introduces SegQC, a novel framework for segmentation quality estimation and segmentation error detection. SegQC computes an estimate measure of the quality of a segmentation in volumetric scans and in their individual slices and identifies possible segmentation error regions within a slice. The key components of SegQC include: 1) SegQCNet, a deep network that inputs a scan and its segmentation mask and outputs segmentation error probabilities for each voxel in the scan; 2) three new segmentation quality metrics computed from the segmentation error probabilities; 3) a new method for detecting possible segmentation errors in scan slices computed from the segmentation error probabilities. We introduce a novel evaluation scheme to measure segmentation error discrepancies based on an expert radiologist's corrections of automatically produced segmentations that yields smaller observer variability and is closer to actual segmentation errors. We demonstrate SegQC on three fetal structures in 198 fetal MRI scans - fetal brain, fetal body and the placenta. To assess the benefits of SegQC, we compare it to the unsupervised Test Time Augmentation (TTA)-based QC and to supervised autoencoder (AE)-based QC. Our studies indicate that SegQC outperforms TTA-based quality estimation for whole scans and individual slices in terms of Pearson correlation and MAE for fetal body and fetal brain structures segmentation as well as for volumetric overlap metrics estimation of the placenta structure. Compared to both unsupervised TTA and supervised AE methods, SegQC achieves lower MAE for both 3D and 2D Dice estimates and higher Pearson correlation for volumetric Dice. Our segmentation error detection method achieved recall and precision rates of 0.77 and 0.48 for fetal body, and 0.74 and 0.55 for fetal brain segmentation error detection, respectively. Ranking derived from metrics estimation surpasses rankings based on entropy and sum for TTA and SegQCNet estimations, respectively. SegQC provides high-quality metrics estimation for both 2D and 3D medical images as well as error localization within slices, offering important improvements to segmentation QC.
In this paper, we adopt 3D Convolutional Neural Networks to segment volumetric medical images. Although deep neural networks have proven very effective on many 2D vision tasks, it is still challenging to apply them to 3D tasks due to the limited amount of annotated 3D data and limited computational resources. We propose a novel 3D-based coarse-to-fine framework to tackle these challenges effectively and efficiently. The proposed 3D-based framework outperforms its 2D counterpart by a large margin, since it can leverage the rich spatial information along all three axes. We conduct experiments on two datasets, which include healthy and pathological pancreases respectively, and achieve the current state-of-the-art in terms of Dice-Sørensen Coefficient (DSC). On the NIH pancreas segmentation dataset, we outperform the previous best by an average of over 2%, and the worst case is improved by 7% to reach almost 70%, which indicates the reliability of our framework in clinical applications.
No abstract available
Existing volumetric medical image segmentation models are typically task-specific, excelling at specific targets but struggling to generalize across anatomical structures or modalities. This limitation restricts their broader clinical use. In this paper, we introduce SAM-Med3D for general-purpose segmentation of volumetric medical images. Given only a few 3D prompt points, SAM-Med3D can accurately segment diverse anatomical structures and lesions across various modalities. To achieve this, we gather and process a large-scale 3D medical image dataset, SA-Med3D-140K, from a blend of public sources and licensed private datasets. This dataset includes 22K 3D images and 143K corresponding 3D masks. SAM-Med3D, a promptable segmentation model characterized by a fully learnable 3D structure, is then trained on this dataset using a two-stage procedure and exhibits impressive performance on both seen and unseen segmentation targets. We comprehensively evaluate SAM-Med3D on 16 datasets covering diverse medical scenarios, including different anatomical structures, modalities, targets, and zero-shot transferability to new/unseen tasks. The evaluation shows the efficiency and efficacy of SAM-Med3D, as well as its promising application to diverse downstream tasks as a pre-trained model. Our approach demonstrates that substantial medical resources can be utilized to develop a general-purpose medical AI for various potential applications. Our dataset, code, and models are available at https://github.com/uni-medical/SAM-Med3D.
Volumetric analysis of brain tumors is decisive for detecting brain tumors, estimating patient survival, and guiding subsequent treatment. Few studies have explicitly quantified brain tumor volume, and expert volumetric analysis is limited by the huge amount of MRI data from brain tumor patients. Given the importance of brain tumor analysis in clinical use, the purpose of this research is to evaluate the agreement of semi-automatic segmentation methods for brain tumor image analysis. Agreement was compared using differences of means with 95% limits of agreement (LoA). Brain tumor segmentations were obtained using the Fast Marching and Grow Cut segmentation methods. Preoperative T2 MRI images of 20 low-grade glioma patients from The Cancer Imaging Archive (TCIA) database were used to analyze brain tumor volume. The volumes obtained with the two segmentation methods agreed within the 95% limits of agreement, with an average volume difference of 920 mm3 (0.92 mL). This shows that both methods performed comparably.
No abstract available
Lesion detection and segmentation are essential yet complex tasks in medical image analysis due to the substantial variability in lesion shape, size, contrast, and anatomical location across different organs. Existing deep learning methods often lack adaptability, as they are typically designed for specific organs or imaging modalities, leading to limited generalization when applied to diverse datasets. To address this limitation, this study introduces a unified and generalizable framework capable of accurate multi‐organ lesion detection, localization, and segmentation across heterogeneous medical imaging data. The proposed U‐VQVAE‐CTLesionNet integrates a U‐Net–based encoder–decoder architecture for spatial feature extraction with a Vector Quantized Variational Autoencoder (VQVAE) module that discretizes latent features through a learnable codebook, enabling the network to capture intricate texture and intensity variations while preserving structural consistency. A Bounding Box Regression (BBR) component is incorporated for lesion localization, followed by a GrabCut‐based refinement step that iteratively adjusts lesion boundaries using Gaussian Mixture Model estimation and graph‐cut optimization. The framework is further supported by a comprehensive preprocessing pipeline involving intensity normalization, Hounsfield Unit windowing, and affine transformations to standardize image quality and enhance model robustness across modalities. Comprehensive experiments conducted on multiple publicly available and locally curated datasets encompassing lung and kidney lesions validated the accuracy and stability of the proposed approach. For lung CT detection, the model achieved 98.8% accuracy, 98.0% precision, 97.03% recall, and a 97.51% F1‐score, while kidney CT detection attained 99.1% accuracy, 99.0% precision, 98.8% recall, and a 98.9% F1‐score. Segmentation performance yielded Dice coefficients of 96.5% for lung and 97.8% for kidney, with corresponding IoU values of 93.2% and 95.1%, and Hausdorff Distances of 2.8 mm for lung and 2.3 mm for kidney, respectively. Ablation studies further confirmed that the inclusion of preprocessing, quantization, BBR, and GrabCut modules improved segmentation accuracy by approximately 2%–3% compared to configurations without these components. These results demonstrate that U‐VQVAE‐CTLesionNet provides a robust, organ‐agnostic framework for precise lesion analysis and establishes a solid foundation for future expert‐assisted clinical integration.
Objectives Medical photography is ubiquitous and plays an increasingly important role in the fields of medicine and surgery. Any assessment of these photographs by computer vision algorithms requires first that the area of interest can accurately be delineated from the background. We aimed to develop deep learning segmentation models for kidney and liver organ donation photographs, where accurate automated segmentation has not yet been described. Methods Two novel deep learning models (Detectron2 and YoloV8) were developed using transfer learning and compared against existing tools for background removal (macBGRemoval, remBGisnet, remBGu2net). Anonymised photograph datasets comprised training/internal validation sets (821 kidney and 400 liver images) and external validation sets (203 kidney and 208 liver images). Each image had two segmentation labels: whole organ and clear view (parenchyma only). Intersection over Union (IoU) was the primary outcome, as the recommended metric for assessing segmentation performance. Results In whole kidney segmentation, Detectron2 and YoloV8 outperformed other models with internal validation IoU of 0.93 and 0.94, and external validation IoU of 0.92 and 0.94, respectively. The other methods, macBGRemoval, remBGisnet and remBGu2net, scored lower, with the highest internal validation IoU at 0.54 and external validation at 0.59. Similar results were observed in liver segmentation, where Detectron2 and YoloV8 both showed internal validation IoU of 0.97 and external validation of 0.92 and 0.91, respectively. The other models achieved maximum internal and external validation IoUs of 0.89 and 0.59, respectively. All image segmentation tasks with Detectron2 and YoloV8 completed within 0.13–1.5 s per image. Conclusions Accurate, rapid and automated image segmentation in the context of surgical photography is possible with open-source deep-learning software. These models outperform existing methods and could impact the field of surgery, enabling advancements similar to those seen in other areas of medical computer vision.
In recent years, computer-aided diagnostic systems have found wide application in ultrasound image analysis to enhance reliability and alleviate the workload of ultrasound practitioners. The advent of deep learning, particularly the rise of the Transformer, has significantly improved the performance of various medical diagnostic tasks, including breast cancer detection, thyroid nodule segmentation, fetal pathology assessment, primary thyroid cancer lymph node metastasis prediction, prostate cancer localization, and brachial plexus nerve diagnosis. However, most deep learning-based ultrasound lesion segmentation algorithms focus on specific organ types and lack a generalized approach for multi-organ lesion segmentation. This paper presents a unified segmentation network for two-dimensional grayscale ultrasound images, utilizing medical priors to guide the network in learning the relationships between multiple targets. The proposed network introduces a ConvFormer module to effectively fuse features at different scales, promoting multi-scale information recognition. Additionally, a self-attention mechanism is introduced to capture internal correlations within the features, thereby reducing reliance on external information. Comprehensive experiments validate the effectiveness and scalability of the proposed framework. Superior performance is achieved compared to current state-of-the-art segmentation models on a unified ultrasound dataset comprising over 6,000 images and five anatomical sites. This research contributes to the application of computer-aided diagnosis in clinical ultrasound, significantly reducing the workload of healthcare professionals.
BACKGROUND Tumor assessment through imaging is crucial for diagnosing and treating cancer. Lesions in the liver, a common site for metastatic disease, are particularly challenging to accurately detect and segment. This labor-intensive task is subject to individual variation, which drives interest in automation using artificial intelligence (AI). PURPOSE Evaluate AI for lesion detection and lesion segmentation using CT in the context of human performance on the same task. Use internal testing to determine how an AI-developed model (ScaleNAS) trained on lesions in multiple organs performs when tested specifically on liver lesions in a dataset integrating real-world and clinical trial data. Use external testing to evaluate whether ScaleNAS's performance generalizes to publicly available colorectal liver metastases (CRLM) from The Cancer Imaging Archive (TCIA). METHODS The CUPA study dataset included patients whose CT scan of chest, abdomen, or pelvis at Columbia University between 2010-2020 indicated solid tumors (CUIMC, n = 5011) and from two clinical trials in metastatic colorectal cancer, PRIME (n = 1183) and Amgen (n = 463). Inclusion required ≥1 measurable lesion; exclusion criteria eliminated 1566 patients. Data were divided at the patient level into training (n = 3996), validation (n = 570), and testing (n = 1529) sets. To create the reference standard for training and validation, each case was annotated by one of six radiologists, randomly assigned, who marked the CUPA lesions without access to any previous annotations. For internal testing we refined the CUPA test set to contain only patients who had liver lesions (n = 525) and formed an enhanced reference standard through expert consensus reviewing prior annotations. For external testing, TCIA-CRLM (n = 197) formed the test set. The reference standard for TCIA-CRLM was formed by consensus review of the original annotation and contours by two new radiologists. Metrics for lesion detection were sensitivity and false positives. Lesion segmentation was assessed with median Dice coefficient, under-segmentation ratio (USR), and over-segmentation ratio (OSR). Subgroup analysis examined the influence of lesion size ≥ 10 mm (measurable by RECIST1.1) versus all lesions (important for early identification of disease progression). RESULTS ScaleNAS trained on all lesions achieved sensitivity of 71.4% and Dice of 70.2% for liver lesions in the CUPA internal test set (3,495 lesions) and sensitivity of 68.2% and Dice 64.2% in the TCIA-CRLM external test set (638 lesions). Human radiologists had mean sensitivity of 53.5% and Dice of 73.9% in CUPA and sensitivity of 84.1% and Dice of 88.4% in TCIA-CRLM. Performance improved for ScaleNAS and radiologists in the subgroup of lesions that excluded sub-centimeter lesions. CONCLUSIONS Our study presents the first evaluation of ScaleNAS in medical imaging, demonstrating its liver lesion detection and segmentation performance across diverse datasets. Using consensus reference standards from multiple radiologists, we addressed inter-observer variability and contributed to consistency in lesion annotation. While ScaleNAS does not surpass radiologists in performance, it offers fast and reliable results with potential utility in providing initial contours for radiologists. Future work will extend this model to lung and lymph node lesions, ultimately aiming to enhance clinical applications by generalizing detection and segmentation across tissue types.
Background Skin cancer is one of the most prevalent cancers worldwide. In the clinical domain, detection of skin lesions such as melanoma remains a challenge due to occlusions, poor contrast, poor image quality, and similarities between skin lesions. Deep-/machine-learning methods are used for the early, accurate, and efficient detection of skin lesions. Therefore, we propose a boundary-aware segmentation network (BASNet) model comprising prediction and residual refinement modules. Materials and methods The prediction module works like a U-Net and is densely supervised by an encoder and decoder. A hybrid loss function is used, which has the potential to help in the clinical domain of dermatology. BASNet handles these challenges by providing robust outcomes, even in suboptimal imaging environments. This leads to accurate early diagnosis, improved treatment outcomes, and efficient clinical workflows. We further propose a compact convolutional transformer model (CCTM) based on convolutions and transformers for classification. It was designed with a selected number of layers and hyperparameters: two convolution layers, two transformer layers, 64 projection dimensions, a tokenizer, position embedding, sequence pooling, an MLP head, two attention heads, 0.1 stochastic depth, a batch size of 64, a learning rate of 0.001, a weight decay of 0.0001, and 100 epochs. Results The CCTM model was evaluated on six skin-lesion datasets, namely the MED-NODE, PH2, ISIC-2019, ISIC-2020, HAM10000, and DermNet datasets, achieving over 98% accuracy. Conclusion The proposed model holds significant potential in the clinical domain. Its ability to combine local feature extraction and global context understanding makes it ideal for tasks like medical image analysis and disease diagnosis.
Medical image segmentation is a pivotal technology for improving diagnostic accuracy and treatment outcomes and is critical for accurate lesion localization. Deep learning methods have become a significant tool for processing complex biomedical image data and have contributed significantly to the development of imaging segmentation techniques. However, ambiguous object boundaries and finite annotated samples tend to restrict the best performance of the models. To overcome these challenges, in this paper, we present a novel plug-and-play lightweight boundary-aware multitask detection head to improve the segmentation performance of deep learning models. Specifically, this module integrates Boundary Detection and Signed Distance Map as auxiliary tasks to efficiently utilize the pixel-level annotation information, which significantly improves the ability of the deep learning models to recognize and precisely delineate target boundaries, thereby enhancing segmentation performance. The experiments on two public segmentation datasets demonstrated that our proposed detection head outperforms traditional segmentation heads in the evaluation metrics with a slight increase in model parameters. These results not only verify the efficacy of our proposed model in improving the accuracy of boundary detection but also provide valuable insights for achieving more accurate medical image segmentation using deep learning models.
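For context, the Signed Distance Map auxiliary target mentioned in the abstract above can be computed from a binary mask with standard distance transforms. The sketch below shows one common construction using SciPy (the sign convention and the absence of normalization are assumptions; the paper's exact formulation may differ).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """Signed distance map for a binary mask (H, W): negative inside the
    object, positive outside, zero on the boundary. Used as an auxiliary
    regression target alongside the segmentation mask."""
    mask = mask.astype(bool)
    if not mask.any() or mask.all():
        return np.zeros(mask.shape, dtype=np.float32)
    dist_out = distance_transform_edt(~mask)   # distance to the foreground
    dist_in = distance_transform_edt(mask)     # distance to the background
    return (dist_out - dist_in).astype(np.float32)
```

A typical multitask setup regresses this map (often tanh-squashed) with an L1 or L2 term next to the usual segmentation loss.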
Medical image segmentation is important in diagnosing diseases and planning treatments. Abdominal organ segmentation is difficult due to variations in shape, size, and position. Traditional methods struggle with unclear boundaries and overlapping structures. Deep learning models improve segmentation but often miss details or fail with complex shapes. A new deep learning model is designed to improve accuracy. It combines convolutional networks with transformers to capture both local and global features. The model extracts multi-scale features to preserve fine details. A transformer module processes global relationships between organs. An attention-based decoder refines segmentation masks to enhance boundary detection. The model is tested on medical datasets containing CT and MRI scans. It is compared with leading segmentation methods using standard metrics. The results show improved accuracy and better-defined organ boundaries. The Dice similarity score increases by 3% for liver segmentation and 5% for kidney segmentation. The model also reduces false positives and improves consistency in segmenting complex structures. This method addresses key challenges in abdominal organ segmentation. It captures fine details while understanding organ relationships. The improved accuracy makes it useful for clinical applications.
Medical image segmentation is crucial for clinical diagnosis and treatment planning, as accurate boundary segmentation impacts lesion localization, organ identification, and quantitative assessment. While deep learning methods have significantly improved segmentation performance, two key challenges remain: first, existing methods heavily rely on large annotated datasets, which are hard to obtain due to patient privacy concerns and high annotation costs; second, achieving high-precision segmentation is still challenging in scenarios with low contrast or blurred boundaries in certain imaging modalities. In this paper, we propose Two stage SAM with CNN Augment (TsSAM-CA), a prompt-free SAM adaptation method, which fine-tunes a pre-trained base model via the Mona adapter, reducing the need for extensive manual annotations. We introduce two core components: the ResNet (RN) encoder and the Adaptive Feature Fusion Block (AFFB). The RN encoder, working in parallel with the image encoder, uses hierarchical convolutions to recover boundary information missed by long-range attention. The AFFB integrates fine-grained local features with global context, enhancing the decoder's input. On the Synapse multi-organ segmentation dataset, TsSAM-CA achieves 87.77% DSC and 10.82 HD, exceeding the performance of current leading methods.
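The abstract does not detail the Mona adapter's internals; the following sketch shows the standard bottleneck-adapter pattern for fine-tuning a frozen transformer block (such as SAM's image encoder). The `reduction` factor, GELU activation, and zero-initialization are illustrative choices, not the paper's specification.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: a small trainable residual branch
    inserted into each frozen transformer block, so fine-tuning updates
    only the adapter parameters rather than the whole backbone."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        nn.init.zeros_(self.up.weight)   # branch starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                # x: (B, tokens, dim)
        return x + self.up(self.act(self.down(x)))
```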
The precise delineation of pathological lesions in medical imaging is a cornerstone of computer-aided diagnosis, yet the development of robust segmentation models is frequently hindered by the scarcity of pixel-level annotations. While weakly supervised semantic segmentation (WSSS) offers a viable alternative by utilizing image-level labels, traditional approaches often struggle with boundary adherence and co-occurrence artifacts. This paper introduces a novel framework that leverages the zero-shot generalization capabilities of promptable foundation models, specifically adapting the Segment Anything Model (SAM), for weakly supervised lesion detection. We propose a two-stage pipeline: first, a coarse localization module generates geometric prompts from class activation maps; second, a promptable segmentation module refines these predictions. Crucially, we introduce an Uncertainty-Aware Refinement (UAR) mechanism that utilizes epistemic uncertainty estimation to identify and correct segmentation errors in ambiguous regions. Experimental validation on diverse datasets, including lung nodule and skin lesion benchmarks, demonstrates that our method significantly outperforms existing WSSS baselines and approaches fully supervised performance. The integration of uncertainty quantification not only improves segmentation accuracy but also provides interpretability essential for clinical deployment.
In medical imaging, semantic segmentation is essential because it can precisely locate and isolate regions of interest, such as lesions or tumours, from intricate anatomical systems. Deep learning (DL) has driven a substantial evolution in segmentation approaches, improving the accuracy and efficiency of medical diagnostics. This paper assesses the performance of six cutting-edge models: LinkNet, U-Net with ResNet50, LinkNet with ResNet50, FPN Net, Simple U-Net, and SegNet-Transformer, across four segmentation tasks: polyp detection, lung lesion segmentation from Computed Tomography (CT) scans, breast lesion detection from ultrasound images, and brain tumour segmentation from magnetic resonance imaging (MRI). With results such as 0.9618 for brain tumour segmentation and 0.9483 for lung lesion segmentation, SegNet-Transformer consistently obtained the highest Dice Coefficient across all tests, demonstrating superior ability in capturing intricate boundaries. Other models, such as U-Net with ResNet50, performed competitively with Dice scores of 0.9375 for lung lesion segmentation and 0.9472 for brain tumours but required slightly more memory and longer training times. This study underscores that model performance varies across tasks, revealing specific strengths and weaknesses. Ultimately, while SegNet-Transformer demonstrates robustness, the choice of model should be informed by the unique requirements of each segmentation task, balancing accuracy, efficiency, and computational demands.
Medical image segmentation is a fundamental task in computer-aided diagnosis, playing a crucial role in organ structure analysis, lesion delineation, and treatment planning. However, current Transformer-based segmentation networks still face two major challenges. First, the global self-attention in the encoder often introduces redundant connections, leading to high computational cost and potential interference from irrelevant tokens. Second, the decoder shows limited capability in reconstructing fine-grained boundary structures, resulting in blurred segmentation contours. To address these issues, we proposed an efficient and accurate framework for general medical image segmentation. Specifically, in the encoder, we introduce a frequency-domain similarity measure and construct a Key-Semantic Dictionary (KSD) via amplitude spectrum cosine similarity. This enables stage-wise sparse attention matrices that reduce redundancy and enhance semantic relevance. In the decoder, we design a learnable gradient-based operator that injects boundary-aware logits bias into the attention mechanism, thereby improving structural detail recovery along object boundaries. On ACDC, the framework delivers a 0.55% gain in average Dice and a 14.6% reduction in HD over the second-best baseline. On ISIC 2018, it achieves increases of 1.01% in Dice and 0.21% in ACC over the second-best baseline, while using 88.8% fewer parameters than typical Transformer-based models. On Synapse, it surpasses the strongest prior approach by 1.03% in Dice and 6.35% in HD, yielding up to 8.36% Dice improvement and 52.46% HD reduction compared with widely adopted Transformer baselines. Comprehensive results confirm that the proposed frequency-domain sparse attention and learnable edge-guided decoding effectively balance segmentation accuracy, boundary fidelity, and computational cost. This framework not only suppresses redundant global correlations and enhances structural detail reconstruction, but is also robust to different medical imaging modalities, providing a lightweight and clinically applicable solution for high-precision medical image segmentation.
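A minimal sketch of the amplitude-spectrum similarity underlying the Key-Semantic Dictionary idea follows, assuming one small feature patch per token and top-k sparsification; the exact construction in the paper may differ.

```python
import torch
import torch.nn.functional as F

def amplitude_topk(feat, top_k=8):
    """Build a sparse attention pattern from amplitude-spectrum cosine
    similarity: tokens whose patches have similar |FFT| magnitudes are
    kept as attention partners. feat: (N, C, h, w), one patch per token.
    Using |FFT| makes the comparison insensitive to spatial phase."""
    amp = torch.fft.fft2(feat).abs().flatten(1)   # (N, C*h*w) amplitude spectra
    amp = F.normalize(amp, dim=1)
    sim = amp @ amp.T                             # (N, N) cosine similarities
    return sim.topk(top_k, dim=1).indices         # each token's k partners
```

Restricting each query's attention to these indices yields the stage-wise sparse attention matrices the abstract describes.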
Detection of skin diseases is an important branch of medical image analysis, and early diagnosis is vital for the patient's recovery. This paper proposes an integrated multi-stage deep learning pipeline for automated skin lesion analysis, comprising lesion segmentation, coarse classification, and refined classification using a Lesion Index Calculation Unit (LICU). The segmentation stage delineates the lesion with a high boundary accuracy of 95%, so that subsequent analysis is restricted to genuine lesion regions. The coarse classification stage provides a preliminary estimate of disease probability but is prone to false positives. The LICU refinement stage removes these overdiagnoses, raising sensitivity to 0.92 and accuracy to 92%, thereby achieving precise lesion detection. Quantitative metrics, including sensitivity, specificity, and processing time, verify both the robustness and the speed of the model. Qualitative analysis further shows that the refined classification stage is more accurate and better respects lesion boundaries, which is essential for clinical diagnostics. This multi-stage technique not only sharpens the clinician's tools for recognizing disease but also offers an easily interpretable model, facilitating the analysis of complex cases such as skin conditions. According to our experiments, the proposed pipeline is also useful in the clinical sector by enabling precise and faster skin disease diagnosis.
Accurate skin lesion localization and boundary detection are imperative for early diagnosis and effective treatment planning of skin diseases, including malignant tumors. Despite advancements in medical imaging and computational methods, achieving high precision and specificity across varied clinical scenarios remains a challenge; this paper addresses that critical gap. Existing methods primarily rely on traditional image processing techniques and machine learning architectures such as Convolutional Neural Networks (CNNs). While these techniques have shown promise, they often fail to achieve high levels of precision and specificity, especially under varying clinical conditions. Furthermore, computational inefficiency is a significant drawback, making real-time analysis a challenging prospect. To overcome these limitations, this paper proposes a novel approach that integrates Capsule Networks (Capsule Net) with Conditional Random Fields (CRFs). Capsule Networks are effective in preserving spatial hierarchies and are less sensitive to variations in data, making them well suited to the complex task of skin lesion localization. Conditional Random Fields, in turn, fine-tune the output from the Capsule Networks, enhancing precision and specificity in the boundary detection process. Our experiments demonstrate remarkable improvements over existing methods: a 3.5% increase in precision, a 2.9% boost in accuracy, a 3.4% gain in recall, and a 4.9% improvement in speed across different scenarios. The approach is also computationally efficient, making it suitable for real-time applications. By introducing a robust, accurate, and computationally efficient technique for skin lesion localization, this paper holds the potential to set new standards for medical image analysis and to enhance the quality of healthcare delivery.
Deep learning has demonstrated exceptional performance in medical image analysis, but its effectiveness degrades significantly when applied to different medical centers due to domain shifts. Lesion detection, a critical task in medical imaging, is particularly impacted by this challenge due to the diversity and complexity of lesions, which can arise from different organs, diseases, imaging devices, and other factors. While collecting data and labels from target domains is a feasible solution, annotating medical images is often tedious, expensive, and requires professionals. To address this problem, we combine active learning with domain-invariant feature learning. We propose a Domain-shift Active Learning (DistAL) framework, which includes a transferable feature learning algorithm and a hybrid sample selection strategy. Feature learning incorporates contrastive-consistency training to learn discriminative and domain-invariant features. The sample selection strategy is called RUDY, which jointly considers Representativeness, Uncertainty, and DiversitY. Its goal is to select samples from the unlabeled target domain for cost-effective annotation. It first selects representative samples to deal with domain shift, as well as uncertain ones to improve class separability, and then leverages K-means++ initialization to remove redundant candidates to achieve diversity. We evaluate our method for the task of lesion detection. By selecting only 1.7% samples from the target domain to annotate, DistAL achieves comparable performance to the method trained with all target labels. It outperforms other AL methods in five experiments on eight datasets collected from different hospitals, using different imaging protocols, annotation conventions, and etiologies.
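A toy version of an uncertainty-plus-diversity selection in the spirit of RUDY is sketched below: rank unlabeled samples by a score, then use k-means++-style seeding so the chosen set is spread out in feature space. The candidate-pool size, the scoring, and the Euclidean distance are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def kmeanspp_select(features, scores, n_select, seed=0):
    """Select a diverse, informative subset for annotation.
    features: (N, D) embeddings; scores: (N,) uncertainty or
    representativeness values; returns indices of n_select samples."""
    rng = np.random.default_rng(seed)
    # keep the top-scoring half as candidates (informativeness filter)
    cand = np.argsort(-scores)[: max(n_select, len(scores) // 2)]
    chosen = [cand[0]]                       # start from the top-ranked sample
    for _ in range(n_select - 1):
        # distance of each candidate to its nearest already-chosen sample
        d = np.min(
            np.linalg.norm(features[cand, None, :] - features[None, chosen, :],
                           axis=-1),
            axis=1,
        )
        probs = d ** 2 / np.maximum((d ** 2).sum(), 1e-12)  # k-means++ weighting
        chosen.append(cand[rng.choice(len(cand), p=probs)])
    return np.array(chosen)
```

Samples already chosen have zero distance and thus zero probability of being re-selected, which is what removes redundant candidates.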
The deep integration of artificial intelligence technology into medical image analysis has positioned the early and accurate diagnosis of prostate cancer as a critical challenge in clinical research. To address issues such as ambiguous lesion boundaries and strong heterogeneity in multi-modal MRI data for prostate cancer, this paper proposes a lesion detection and segmentation method based on a generative adversarial network (GAN) that synthesizes pseudo-normal images. This method utilizes the GAN to generate pseudo-normal images, thereby accentuating the differences between lesions and normal tissue. Combined with a difference-aware weighting mechanism that dynamically adjusts loss function weights, the approach significantly enhances segmentation accuracy and model generalization. Experiments conducted on the public PICAI2022 dataset and a private dataset demonstrate that the proposed method outperforms existing mainstream models across multiple metrics, including the Dice coefficient, sensitivity, and false positive rate (FPR), achieving a reduction in FPR by 0.3 per case. This provides an effective solution for computer-aided diagnosis in prostate cancer imaging.
Early and accurate detection of breast cancer is a major challenge in medical diagnostics. Mammograms and ultrasound images often suffer from poor contrast and unclear lesion boundaries, which can hinder the performance of deep learning diagnostic models. To tackle these problems, a new model called the Hybrid Ensemble DL Model for Breast Cancer Detection and Classification with Enhanced Breast Lesion Segmentation using U-Net Model (HEBEBU) is introduced. The HEBEBU model enhances image quality and diagnostic accuracy through advanced preprocessing, segmentation, and classification techniques. Initially, the model uses morphological erosion to reduce structural noise, making complex breast tissue patterns easier to interpret. It then applies CLAHE to improve local contrast and highlight micro-calcifications. Laplacian of Gaussian (LoG) edge detection and unsharp masking sharpen the lesion borders and enhance structural visibility. For image segmentation, a dedicated U-Net design is employed, which maintains spatial resolution through skip connections and is trained with binary cross-entropy. An Active Contour Model (ACM) further refines segmentation results by accurately delineating irregular lesion borders. An RVFL neural network ensures fast, non-iterative training while achieving high accuracy. The HEBEBU model achieved impressive results, with a training accuracy of 99.2% and a testing accuracy of 99.0%; other metrics were consistently above 99%.
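The preprocessing chain described above maps naturally onto standard OpenCV operations; the sketch below strings them together with assumed kernel sizes and CLAHE settings (the paper's exact parameters are not given in the abstract).

```python
import cv2
import numpy as np

def preprocess_breast_image(img_gray):
    """Illustrative pipeline following the abstract: morphological
    erosion -> CLAHE -> LoG edge map -> unsharp masking.
    img_gray: uint8 grayscale image; returns (sharpened, log_edges)."""
    eroded = cv2.erode(img_gray, np.ones((3, 3), np.uint8))
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(eroded)
    # Laplacian of Gaussian: Gaussian blur first, then Laplacian
    blurred = cv2.GaussianBlur(enhanced, (5, 5), 0)
    log_edges = cv2.Laplacian(blurred, cv2.CV_16S, ksize=3)
    # Unsharp masking: boost the high-frequency residual of the image
    sharpened = cv2.addWeighted(enhanced, 1.5, blurred, -0.5, 0)
    return sharpened, log_edges
```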
Breast cancer remains one of the leading causes of mortality among women worldwide, emphasizing the need for accurate and efficient diagnostic tools. Ultrasound imaging is widely used for breast lesion screening due to its affordability and safety, yet manual interpretation often suffers from variability and subjectivity. Recent advancements in deep learning, particularly the YOLO (You Only Look Once) family, have demonstrated strong potential for real-time medical image detection and segmentation. This study aims to compare the performance of YOLOv8m-seg and YOLOv11m-seg models in detecting and segmenting breast lesions from ultrasound images to determine which model offers a better balance between accuracy, sensitivity, and computational efficiency. Two public ultrasound datasets were employed to ensure data diversity and evaluation fairness. Both models were trained under identical preprocessing, augmentation, and hyperparameter settings using 640×640 input resolution and the AdamW optimizer. Model performance was evaluated through Precision, Recall, F1-score, mAP@0.5, mAP@0.5:0.95, Mask Precision, and Inference Time metrics. The experimental results show that YOLOv11m-seg outperformed YOLOv8m-seg in precision (0.859), mask accuracy (0.859), and inference time (16.7 ms), while YOLOv8m-seg maintained slightly higher recall (0.736). YOLOv11m-seg demonstrated stronger generalization across heterogeneous datasets and superior boundary segmentation. YOLOv11m-seg achieved the best overall performance and is more suitable for real-time clinical applications. This study contributes empirical benchmarks for future Computer-Aided Diagnosis (CAD) development and highlights the potential of modern YOLO architectures in improving breast ultrasound lesion detection accuracy and efficiency.
The accurate diagnosis of disorders is contingent upon the accurate segmentation of medical images, which enables physicians to isolate specific regions of the body for further investigation. This study introduces a novel deep learning-based segmentation technique that employs a dynamic region-based model in conjunction with convolutional neural networks (CNNs) to enhance boundary precision and segmentation accuracy. To overcome the obstacles associated with segmenting intricate medical images and to guarantee precise edge identification, a novel adaptive boundary loss function is proposed. The attention mechanism of the model enables it to fine-tune both small and complicated regions of interest, and it is compatible with a variety of imaging modalities, such as MRI, CT, and ultrasound. The proposed method outperforms existing models on the Dice coefficient and Intersection over Union (IoU) metrics, as evidenced by comprehensive evaluations on benchmark datasets, including skin lesion and brain tumor segmentation. The findings indicate that the model has the potential to enhance clinical decision-making by offering more precise segmentation for disease detection.
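The adaptive boundary loss itself is not specified in the abstract; as a simple stand-in, the sketch below shows a boundary-weighted BCE in which pixels inside a morphological-gradient band around the ground-truth contour receive extra weight (`boundary_weight` is an assumed hyperparameter, and an adaptive variant would learn or schedule it).

```python
import torch
import torch.nn.functional as F

def boundary_weighted_bce(logits, target, boundary_weight=5.0):
    """BCE with extra weight on pixels in a band around the ground-truth
    boundary, computed as the morphological gradient of the mask.
    logits, target: (B, 1, H, W); target is binary {0, 1}."""
    t = target.float()
    kernel = torch.ones(1, 1, 3, 3, device=t.device)
    neighborhood = F.conv2d(t, kernel, padding=1)
    dilated = (neighborhood > 0).float()      # any foreground neighbor
    eroded = (neighborhood == 9).float()      # all 3x3 neighbors foreground
    band = dilated - eroded                   # 1 on the boundary band
    weights = 1.0 + boundary_weight * band
    return F.binary_cross_entropy_with_logits(logits, t, weight=weights)
```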
Ultrasound image segmentation is crucial for early disease detection and treatment planning but remains a challenging task due to the low contrast of organ boundaries and varying image quality. Current methods often require manual intervention or have limited accuracy. In this paper, we propose a novel hybrid framework that combines an automatic option polygon segment (AOPS) algorithm and a distributed- and memory-based evolution (DME) algorithm for precise ultrasound organ segmentation. Our pipeline consists of two cascaded stages: (1) a coarse segmentation step using the AOPS algorithm, which determines the number of vertices/clusters without human intervention, and (2) a refinement step using the DME algorithm to search for the optimal neural network, which is then used to represent the organ boundary as a smooth, explainable mathematical expression. We employ a fractional backpropagation learning network with L2 regularization (FBLN) for training and use the scaled exponential linear unit (SELU) activation function to address the vanishing gradient problem. To our knowledge, this is the first attempt to apply such a hybrid framework to ultrasound organ segmentation, and it demonstrates significant gains in accuracy, smoothness, and computational efficiency.
Deploying deep segmentation models in clinical settings is hindered by their excessive computational demands, which exceed the capabilities of the mobile and edge devices widely used in low-resource healthcare. We propose RM-SSNet, an ultra-lightweight medical image segmentation framework that uses only 2,612 parameters, a 99.97% reduction compared to a standard UNet, while maintaining competitive accuracy. First, a recursive multi-scale pyramid architecture reuses identical transformation functions across different spatial resolutions, achieving multi-resolution modeling with efficient parameter sharing. Second, a dynamic state space fusion module captures long-range spatial dependencies with linear computational complexity, replacing the quadratic complexity of traditional attention mechanisms. Third, a boundary-aware knowledge-guided refinement mechanism combines parameter-free classical edge detection with learnable anatomical constraints to enhance segmentation precision without additional computational overhead. This integrated design enables clinically viable accuracy across multiple medical imaging datasets, including skin lesions, breast ultrasound, and retinal images. The framework supports real-time segmentation on mobile devices, making advanced AI-assisted diagnostic screening accessible in low-resource clinical settings.
No abstract available
No abstract available
Despite its importance in computer-aided diagnosis, medical image segmentation frequently faces challenges due to limited annotated data and significant patient variability. To address this, attention mechanisms are integrated with meta-learning for few-shot skin lesion segmentation using the ISIC dermoscopic dataset. An Attention U-Net enhanced with spatial and channel squeeze-and-excitation (scSE) blocks is employed to improve feature representation, while two meta-learning techniques—Model-Agnostic Meta-Learning (MAML) and Reptile—are applied to enable rapid adaptation. Performance is evaluated using accuracy, recall, IoU, loss, and ROC-AUC. Results indicate that the attention-augmented architecture improves recall and boundary detection, while meta-learning further boosts adaptability, with Reptile achieving the highest accuracy and MAML yielding superior IoU. These findings highlight the potential of combining attention mechanisms with meta-learning to build more robust and adaptable few-shot medical image segmentation frameworks.
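The scSE block referenced above has a well-known form; a compact PyTorch version is sketched below (the `reduction` ratio and the max-combination of the two branches are common choices, not necessarily this paper's exact configuration).

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (scSE):
    channel attention from global average pooling plus spatial attention
    from a 1x1 convolution; the two recalibrated maps are combined."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.cse = nn.Sequential(           # channel squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.sse = nn.Sequential(           # spatial squeeze-and-excitation
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return torch.max(x * self.cse(x), x * self.sse(x))
```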
Medical image segmentation is crucial for disease diagnosis, as precise results aid clinicians in locating lesion regions. However, lesions often have blurred boundaries and complex shapes, challenging traditional methods in capturing clear edges and impacting accurate localization and complete excision. Small lesions are also critical but prone to detail loss during downsampling, reducing segmentation accuracy. To address these issues, we propose a novel adaptive scale thresholding network (AdSTNet) that acts as a lightweight post-processing network, enhancing sensitivity to lesion edges and cores through a dual-threshold adaptive mechanism. This mechanism is the key architectural component and comprises a main threshold map for core localization and an edge threshold map for more precise boundary detection. AdSTNet is compatible with any segmentation network and introduces only a small computational and parameter cost. Additionally, Spatial Attention and Channel Attention (SACA), the Laplacian operator, and a Fusion Enhancement module are introduced to improve feature processing. SACA enhances spatial and channel attention for core localization; the Laplacian operator retains edge details without added complexity; and the Fusion Enhancement module combines a concatenation operation with a Convolutional Gated Linear Unit (ConvGLU) to strengthen feature intensities, improving edge and small-lesion segmentation. Experiments show that AdSTNet achieves notable performance gains on the ISIC 2018, BUSI, and Kvasir-SEG datasets. Compared with the original U-Net, our method attains mIoU/mDice of 83.40%/90.24% on ISIC, 71.66%/80.32% on BUSI, and 73.08%/81.91% on Kvasir-SEG. Similar improvements are observed for the other networks.
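To make the dual-threshold idea concrete, a minimal sketch of inference-time binarization with two predicted threshold maps follows; the function name and the use of a boolean edge band are illustrative assumptions about how such maps would be applied.

```python
import torch

def dual_threshold_binarize(prob, main_thr, edge_thr, edge_band):
    """Binarize a probability map with two spatially varying thresholds:
    the main threshold governs the lesion core, while a (typically
    lower) edge threshold applies inside a predicted boundary band,
    recovering faint edge pixels. All inputs: (B, 1, H, W)."""
    core = prob >= main_thr
    edge = (prob >= edge_thr) & edge_band.bool()
    return (core | edge).float()
```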
Lung cancer is a form of cancer that causes uncontrollable cell growth in the lungs. Patients with lung cancer frequently miss treatment, face higher healthcare costs, and experience the worst outcomes. The presence of lung cancer can be detected in a variety of ways, such as computed tomography (CT), magnetic resonance imaging (MRI), and radiography. Because of noise and the low image contrast between the cancer cells, the lung, and the background, many researchers have developed ways of automating lung cancer diagnosis using image processing techniques. This study develops an image processing technique that uses segmentation algorithms and feature extraction to segment lung nodules in computed tomography images. The initial phase establishes a rigorous image processing framework with the following sequential steps: (i) object edge identification and (ii) lesion boundary recognition. The architecture includes image processing techniques, thresholding, and morphological operations (erosion and dilation). Lesions can have various sizes and shapes, both regular and irregular; the new method locates lesions using their roundness. In addition to learning purely from CT scans, previously studied lesion characteristics are also integrated. Data were collected from the Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, Penang. Manual segmentation with MATLAB functions was used to remove the background of the images. Performance was evaluated using accuracy, recall, and F-score. The analysis shows a lung lesion segmentation accuracy of 99.95%, a recall of 45.76%, and an F-score of 60.67%. For lung lesion detection, detected lesions spanned 3-5 slices, with roundness values that remained continuous across slices. These experimental results clearly support proceeding to the next step of this research: the classification of lesions.
Skin diseases are common medical conditions, and early detection significantly contributes to improved cure rates. To address the challenges posed by complex lesion morphology, indistinct boundaries, and image artifacts, this paper proposes a skin lesion segmentation method based on multi-scale attention and bidirectional long short-term memory (Bi-LSTM). Built upon the U-Net architecture, the proposed model enhances the encoder with dense convolutions and an adaptive feature fusion module to strengthen feature extraction and multi-scale information integration. Furthermore, it incorporates both channel and spatial attention mechanisms along with temporal modeling to improve boundary delineation and segmentation accuracy. A generative adversarial network (GAN) is also introduced to refine the segmentation output and boost generalization performance. Experimental results on the ISIC2017 dataset demonstrate that the method achieves an accuracy of 0.950, a Dice coefficient of 0.902, and a mean Intersection over Union (mIoU) of 0.865. These results indicate that the proposed approach effectively improves lesion segmentation performance and offers valuable support for computer-aided diagnosis of skin diseases.
No abstract available
Accurate medical image segmentation plays a crucial role in improving the precision of computer‐aided diagnosis. However, complex boundary shapes, low contrast and blurred anatomical structures make fine‐grained segmentation a challenging task. Variational Bayesian inference quantifies uncertainty through probability distributions and can construct robust probabilistic models for the boundaries of ambiguous organs and tissues. In this paper, we apply variational Bayesian inference to medical image segmentation and propose variational attention to model the uncertainty of low‐contrast and blurry tissue and organ boundaries. This enhances the model's ability to perceive segmentation boundaries, improving robustness and segmentation accuracy. Variational attention first estimates the parameters of the probability distribution of latent representations based on input features. Then, it samples latent representations from the learnt distribution to generate attention weights that optimise the interaction between global features and ambiguous boundaries. We integrate variational attention into the U‐Net model by replacing its skip connections, constructing a multi‐scale variational attention segmentation model (V‐UNet). Experiments on the ISBI 2012 and MoNuSeg 2018 datasets show that our method achieves Dice scores of 95.89% and 82.18%, respectively. Moreover, we integrate V‐UNet into the Mask R‐CNN framework by replacing the FPN feature extraction head and propose a two‐stage segmentation method. Compared to the original Mask R‐CNN, our method improves the Dice score by 0.81%, mAP by 8.06% and F1 score by 0.51%.
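A minimal sketch of variational attention via the reparameterization trick follows, assuming a per-pixel Gaussian latent code and a sigmoid gate over the skip features; in training, the predicted `mu`/`logvar` would additionally feed a KL-divergence term. This is an illustration of the general mechanism, not the paper's exact module.

```python
import torch
import torch.nn as nn

class VariationalAttention(nn.Module):
    """Variational attention sketch: predict a Gaussian over a latent
    per-pixel code, sample it with the reparameterization trick, and
    map the sample to attention weights applied to the skip features."""
    def __init__(self, channels, latent=16):
        super().__init__()
        self.mu = nn.Conv2d(channels, latent, 1)
        self.logvar = nn.Conv2d(channels, latent, 1)
        self.to_attn = nn.Sequential(nn.Conv2d(latent, channels, 1), nn.Sigmoid())

    def forward(self, skip_feat):            # skip_feat: (B, C, H, W)
        mu, logvar = self.mu(skip_feat), self.logvar(skip_feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        attended = skip_feat * self.to_attn(z)
        return attended, mu, logvar          # mu/logvar feed the KL term
```

Dropping this in place of a U-Net skip connection gives the stochastic, boundary-uncertainty-aware gating the abstract describes at a high level.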
In recent years, deep learning-based techniques have been successfully applied to medical image segmentation, which plays an important role in intelligent lesion analysis and disease diagnosis. At present, mainstream segmentation models are primarily based on the U-Net architecture, extracting local features through multi-layer convolution; they lack global information and multi-scale semantic interaction between the encoder and decoder, leading to sub-optimal segmentation performance. To address these issues, in this work we propose a new medical image segmentation network, SACA-UNet, which improves the U-Net model via self-attention and cross atrous spatial pyramid pooling (Cross-ASPP) mechanisms. Specifically, SACA-UNet first utilizes self-attention to capture global features; it then devises a Cross-ASPP module to extract and fuse features of varying receptive fields, promoting multi-scale semantic interaction. We evaluate the segmentation performance of our proposed model on four benchmark datasets, ISIC2018, BUSI, CVC-ClinicDB, and COVID-19, in terms of both the Dice coefficient and IoU metrics. Experimental results demonstrate that SACA-UNet remarkably outperforms the baseline methods.
No abstract available
In multi-organ segmentation tasks, both local details and global contextual information are crucial. Existing main-stream methods based on CNN-Transformer hybrid architectures typically employ simple serial stacking, end-stage concatenation, or pointwise addition for feature fusion, which struggle to handle feature inconsistency and often lead to information conflict and loss. To address the aforementioned challenges, we innovatively propose TBHF-Unet. We design a three-branch hierarchical encoder that dynamically fuses multi-source features in parallel, achieving deep layer-wise integration of multi-source information. The hierarchical structure maintains the independence of each branch while avoiding feature degradation, enabling superior performance without the need for excessively deep networks. Additionally, we design a Local-Global Feature Fusion (LGFF) module to efficiently and accurately integrate local details with global semantics, effectively alleviating feature inconsistency and achieving more comprehensive feature representation. Experiments on five public datasets demonstrate that the proposed method outperforms existing segmentation techniques, showing higher segmentation accuracy and robustness.
Achieving a balance between spatial and channel feature representations is critical for improving performance in medical image segmentation. This paper proposes the spatial-channel feature enhancement and adaptive fusion (SCEAF) module. This module is composed of a multi-scale spatial attention gated block (MSAGBlock) and a channel attention modulation block (CAMBlock) operating in parallel. The MSAGBlock enhances spatial detail recovery, while the CAMBlock strengthens channel feature discrimination, and achieves dynamic fusion between the two blocks by means of gated weighting. Building upon the RWKV-UNet backbone network, we integrate the SCEAF module into the decoder to construct the novel SCEAF-UNet architecture. In addition, we introduce the lightweight edge attention fusion (EAF) module at the skip connection, which captures edge information and highlights structural contours, helping the network better delineate organ borders. Experiments conducted on the public Synapse and ACDC datasets indicate that SCEAF-UNet significantly surpasses current models of various architectures. Further ablation experiments verify the effectiveness and scalability of the designed modules, which are suitable for integration into diverse medical image segmentation architectures.
Biomedical image segmentation plays a crucial role in aiding diagnosis and treatment planning. However, constructing effective frameworks remains challenging due to the variable size and irregular shape of target structures. U-Net has become a cornerstone in this field, but integrating it with Transformer or Multilayer Perceptron (MLP) models faces limitations such as quadratic computational complexity and insufficient interpretability. To address these challenges, we propose KM-UNet, a novel structure inspired by state-space models (SSMs) (e.g., Mamba) and the Kolmogorov-Arnold network (KAN). KM-UNet leverages nonlinear, learnable activation functions, rooted in the Kolmogorov-Arnold representation theorem, to enhance interpretability and efficiency. By combining the strengths of state-space models and Kolmogorov-Arnold networks, KM-UNet achieves a balance between accuracy and computational performance. Experiments on five public datasets demonstrate its superiority. On the BUSI dataset, KM-UNet achieved an Intersection over Union (IoU) of 65.21% and an F1-score (F1) of 78.43%, improving IoU by 1.83% over state-of-the-art methods. It also achieved the highest IoUs on the Glas (87.31%) and CVC (85.22%) datasets and delivered the best overall performance on the ISIC series datasets. These results highlight KM-UNet's ability to effectively integrate global and local information while maintaining interpretability. With its powerful feature extraction capabilities and computational efficiency, KM-UNet emerges as a versatile and reliable framework for medical image segmentation across diverse biomedical applications. The source code is available at https://github.com/2760613195/KM_UNet.
Medical images are information carriers that visually reflect and record the anatomical structure of the human body, and they play an important role in clinical diagnosis, teaching, and research. Modern medicine has become increasingly inseparable from the intelligent processing of medical images. In recent years, there have been more and more attempts to apply deep learning theory to medical image segmentation tasks, and it is imperative to explore simple and efficient deep learning algorithms for medical image segmentation. This paper investigates multi-modal medical image segmentation algorithms with a hybrid architecture of Convolutional Neural Networks and Vision Transformer, and proposes a multi-modal medical image segmentation model, SWT-UNet, based on the CNN-ViT hybrid framework. The self-attention mechanism and sliding-window design of the Vision Transformer are used to capture global feature associations and overcome the receptive-field limitation caused by the inductive bias of the convolution operation. At the same time, a widened self-attention vector is used to streamline the number of modules and compress the model size to fit the small scale of typical medical datasets, on which larger models would easily overfit. Experiments on two multi-modal medical image datasets validate that the algorithm can achieve efficient medical image segmentation at a lightweight scale.
Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies. While Transformers excel at modeling global information, their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates Structured State Space Models (SSM) and lightweight LSTMs (xLSTM). The network incorporates Visual State Space (VSS) and ViL modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages SSM to capture long-range dependencies and extract critical features from distant regions. Meanwhile, the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on datasets such as ISIC17, ISIC18, CVC-ClinicDB, and Kvasir demonstrate that VMAXL-UNet significantly outperforms traditional CNNs and Transformer-based models in capturing lesion boundaries and their distant correlations. These results highlight the model’s superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios.
No abstract available
Benefiting from the powerful generative capabilities of diffusion models, recent studies have utilized these models to address 2D medical image segmentation problems. However, directly extending these methods to 3D medical image segmentation slice-by-slice does not yield satisfactory results. The reason is that these approaches often ignore the inter-slice relations of 3D medical data and require significant computational costs. To overcome these challenges, we devise the first diffusion-based model (i.e., Diff-UNet) with two branches for general 3D medical image segmentation. Specifically, we devise an additional boundary-prediction branch to predict the auxiliary boundary information of the target segmentation region, which assists the diffusion-denoising branch in predicting 3D segmentation results. Furthermore, we design a Multi-granularity Boundary Aggregation (MBA) module to embed both low-level and high-level boundary features into the diffusion denoising process. Then, we propose a Monte Carlo Diffusion (MC-Diff) module to generate an uncertainty map and define an uncertainty-guided segmentation loss to improve the segmentation results of uncertain pixels. Moreover, during our diffusion inference stage, we develop a Progressive Uncertainty-driven REfinement (PURE) strategy to fuse intermediate segmentation results at each diffusion inference step. Experimental results on the three latest large-scale datasets (i.e., BraTS2023, SegRap2023, and AIIB2023) with diverse organs and modalities show that our Diff-UNet quantitatively and qualitatively outperforms state-of-the-art 3D medical segmentation methods, especially on regions with small or complex structures. Our code is available at the following link: https://github.com/ge-xing/DiffUNet.
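The Monte Carlo Diffusion idea can be illustrated with a few stochastic sampler passes whose per-voxel variance serves as the uncertainty map, which then reweights the segmentation loss. `sample_fn`, `n_samples`, and `gamma` are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_diffusion_uncertainty(sample_fn, volume, n_samples=8):
    """Run the stochastic diffusion sampler several times; the per-voxel
    variance of the predicted foreground probability is the uncertainty
    map. sample_fn(volume) -> (B, 1, D, H, W) probabilities."""
    probs = torch.stack([sample_fn(volume) for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)   # mean prediction, uncertainty map

def uncertainty_guided_bce(logits, target, uncertainty, gamma=2.0):
    """Up-weight uncertain voxels so the loss focuses on ambiguous regions."""
    w = 1.0 + gamma * uncertainty / (uncertainty.max() + 1e-8)
    return F.binary_cross_entropy_with_logits(logits, target.float(), weight=w)
```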
In the field of deep learning-based medical image segmentation, convolutional neural networks (CNNs) extract image features by combining linear convolutional layers with nonlinear activation functions. However, excessive stacking of linear layers in the network limits the model’s ability to capture fine-grained details. In addition, the feature distribution imbalance caused by the traditional fixed grouping strategy (FGS) can affect the deep model’s capacity to perceive the overall structure of the image. To address these challenges, we propose a medical image segmentation framework, called Kolmogorov-Arnold Network with the adaptive group strategy and contextual Transformer based on Unet (KAC-Unet). First, we propose the adaptive group strategy (AGS) to balance the grouping of different input channels, alleviating the performance degradation caused by differences in group information. Then, we propose the Shift Tokenized Kolmogorov-Arnold Network (KAN) Block to capture complex features in medical images through flexible nonlinear transformations and shift operations. Extensive experiments are conducted on three medical image segmentation datasets. The results demonstrate the effectiveness and superiority of our proposed method compared with state-of-the-art algorithms.
In this paper, we introduce MK-UNet, a paradigm shift towards ultra-lightweight, multi-kernel U-shaped CNNs tailored for medical image segmentation. Central to MK-UNet is the multi-kernel depth-wise convolution block (MKDC) we design to adeptly process images through multiple kernels while capturing complex multi-resolution spatial relationships. MK-UNet also emphasizes salient image features through sophisticated attention mechanisms, including channel, spatial, and grouped gated attention. Our MK-UNet network, with a modest computational footprint of only 0.316M parameters and 0.314G FLOPs, represents not only a remarkably lightweight but also a significantly improved segmentation solution that provides higher accuracy than state-of-the-art (SOTA) methods across six binary medical imaging benchmarks. Specifically, MK-UNet outperforms TransUNet in DICE score with nearly 333× and 123× fewer parameters and FLOPs, respectively. Similarly, when compared against UNeXt, MK-UNet exhibits superior segmentation performance, improving the DICE score by margins of up to 6.7% while operating with 4.7× fewer parameters. Our MK-UNet also outperforms other recent lightweight networks, such as MedT, CMUNeXt, EGE-UNet, and Rolling-UNet, with much lower computational resources. This leap in performance, coupled with drastic computational gains, positions MK-UNet as an unparalleled solution for real-time, high-fidelity medical diagnostics in resource-limited settings, such as point-of-care devices. Our implementation is available at https://github.com/SLDGroup/MK-UNet.
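A plausible reading of the MKDC block is a set of parallel depth-wise convolutions with different kernel sizes followed by point-wise mixing; the sketch below implements that pattern with assumed kernel sizes and normalization, and omits the attention mechanisms.

```python
import torch
import torch.nn as nn

class MultiKernelDWBlock(nn.Module):
    """Multi-kernel depth-wise block sketch: parallel depth-wise
    convolutions with different kernel sizes are summed, then mixed
    point-wise, capturing several receptive fields at low cost."""
    def __init__(self, channels, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernels
        )
        self.pw = nn.Conv2d(channels, channels, 1)   # point-wise channel mixing
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        out = sum(branch(x) for branch in self.branches)
        return self.act(self.norm(self.pw(out))) + x  # residual connection
```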
Medical image segmentation methods based on deep learning networks are mainly divided into CNNs and Transformers. However, CNNs struggle to capture long-distance dependencies, while Transformers suffer from high computational complexity and poor local feature learning. To efficiently extract and fuse local features and long-range dependencies, this paper proposes Rolling-Unet, a CNN model combined with MLPs. Specifically, we propose the core R-MLP module, which is responsible for learning the long-distance dependency in a single direction of the whole image. By controlling and combining R-MLP modules in different directions, the OR-MLP and DOR-MLP modules are formed to capture long-distance dependencies in multiple directions. Further, the Lo2 block is proposed to encode both local context information and long-distance dependencies without excessive computational burden. The Lo2 block has the same parameter size and computational complexity as a 3×3 convolution. The experimental results on four public datasets show that Rolling-Unet achieves superior performance compared to the state-of-the-art methods.
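To make the single-direction idea concrete, here is a minimal sketch of a directional MLP that mixes tokens along the width axis only, so every output pixel sees its entire row; combining it with a transposed copy covers the vertical direction. This is an illustrative reading of R-MLP, not the paper's exact module.

```python
import torch
import torch.nn as nn

class DirectionalMLP(nn.Module):
    """Mixes features along one spatial axis only (here: width)."""
    def __init__(self, width: int):
        super().__init__()
        self.mix = nn.Linear(width, width)  # acts on the last dim (W)

    def forward(self, x):   # x: (B, C, H, W)
        return self.mix(x)  # each output pixel attends to its whole row

horizontal = DirectionalMLP(width=64)
vertical = DirectionalMLP(width=64)  # reuse by swapping H and W
x = torch.randn(1, 8, 64, 64)
# Combine two single-direction passes to cover both axes.
y = horizontal(x) + vertical(x.transpose(2, 3)).transpose(2, 3)
print(y.shape)  # torch.Size([1, 8, 64, 64])
```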
Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, with deep learning offering promising advancements. Notably, the convolutional neural network (CNN) excels in capturing local image features, whereas the Vision Transformer (ViT) adeptly models long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both the CNN and ViT face challenges in efficiently processing long-range dependencies in medical images, often requiring substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses significant obstacles to achieving precise segmentation. To address these challenges, this study introduces Semi-Mamba-UNet, which integrates a purely visual Mamba-based U-shaped encoder-decoder architecture with a conventional CNN-based UNet into a semi-supervised learning (SSL) framework. This innovative SSL approach leverages both networks to generate pseudo-labels and cross-supervise one another at the pixel level simultaneously, drawing inspiration from consistency regularisation techniques. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to enhance the feature learning capabilities further, especially on unlabelled data. Semi-Mamba-UNet was comprehensively evaluated on two publicly available segmentation datasets and compared with seven other SSL frameworks with either CNN- or ViT-based UNet as the backbone network, highlighting the superior performance of the proposed method. The source code of Semi-Mamba-Unet, all baseline SSL frameworks, the CNN- and ViT-based networks, and the two corresponding datasets are made publicly accessible.
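The pixel-level cross-supervision described above can be sketched as follows: on unlabelled images, each network is trained against the other's hard pseudo-labels, with gradients blocked through the pseudo-label maker. The use of hard labels and the unweighted sum are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_pseudo_loss(logits_a: torch.Tensor, logits_b: torch.Tensor):
    """logits_*: (B, num_classes, H, W) from two different networks."""
    pseudo_a = logits_a.argmax(dim=1).detach()  # hard labels from net A
    pseudo_b = logits_b.argmax(dim=1).detach()  # hard labels from net B
    # Each network learns from the other's predictions; detach() stops
    # gradients from flowing through the pseudo-label maker.
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

la = torch.randn(2, 4, 32, 32)  # e.g. Mamba-based UNet output
lb = torch.randn(2, 4, 32, 32)  # e.g. CNN-based UNet output
print(cross_pseudo_loss(la, lb).item())
```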
Precise segmentation is vital for successful diagnosis and treatment planning. Medical image segmentation has demonstrated remarkable advances with the introduction of deep convolutional neural networks, particularly encoder-decoder networks such as U-Net. Despite their excellent performance, these methods have some limitations. First, the structure is limited in its ability to combine information, because the feature maps used to extract valid information from the final encoding stage are incompatible across the encoding and decoding levels. Second, the approach ignores significant semantic details and does not consider different types of small-scale contextual information when segmenting medical images. Lastly, most methods employing 3D architectures to process input medical images increase the computational complexity of the model without significantly improving the accuracy. To resolve these issues, we propose a segmentation network called Multi-Attention Gated Residual U-Net (MAGRes-UNet). This network incorporates four multi-attention gate (MAG) modules and residual blocks into a standard U-Net structure. The MAG module integrates the information from all encoding stages and focuses on small-scale tumors while disambiguating irrelevant and noisy feature responses, thereby promoting meaningful contextual information. The residual blocks simplify the network training and mitigate the problem of vanishing gradients. This improves the ability of the network to effectively learn intricate features and deep representations. Moreover, our network employs the Mish and ReLU activation functions (AFs), together with the AdamW and Adam optimization strategies, to achieve enhanced segmentation performance. The proposed MAGRes-UNet method was compared with the U-Net, Multi-Attention Gated-UNet (MAG-UNet), and Residual-UNet (ResUNet) models. In addition, a statistical T-test was performed to assess the difference in model significance between the approaches. The analysis revealed that MAGRes-UNet employing Mish and AdamW provides significant performance improvement over the ReLU AF and Adam optimizer on two benchmark datasets: Multi-Class BT T1-weighted Contrast-Enhanced Magnetic Resonance Imaging (T1-CE-MRI) and skin lesions HAM10000 (Human Against Machine with 10,000 training images). MAGRes-UNet using Mish and AdamW provides competitive performance over the representative medical image segmentation methods.
No abstract available
Due to the inductive bias of convolutions, CNNs perform hierarchical feature extraction efficiently in the field of medical image segmentation. However, the local correlation assumption of inductive bias limits the ability of convolutions to focus on global information, which has led to the performance of Transformer-based methods surpassing that of CNNs in some segmentation tasks in recent years. Although combining with Transformers can solve this problem, it also introduces computational complexity and considerable parameters. In addition, narrowing the encoder-decoder semantic gap for high-quality mask generation is a key challenge, addressed in recent works through feature aggregation from different skip connections. However, this often results in semantic mismatches and additional noise. In this paper, we propose a novel segmentation method, X-UNet, whose backbones employ the CFGC (Collaborative Fusion with Global Context-aware) module. The CFGC module enables multi-scale feature extraction and effective global context modeling. Simultaneously, we employ the CSPF (Cross Split-channel Progressive Fusion) module to progressively align and fuse features from corresponding encoder and decoder stages through channel-wise operations, offering a novel approach to feature integration. Experimental results demonstrate that X-UNet, with fewer computations and parameters, exhibits superior performance on various medical image datasets. The code and models are available on https://github.com/XSJ0410/X-UNet.
No abstract available
No abstract available
Deep learning has significantly advanced medical image analysis, particularly in semantic segmentation, which is essential for clinical decisions. However, existing 3D segmentation models, like the traditional 3D UNet, face challenges in balancing computational efficiency and accuracy when processing volumetric medical data. This study aims to develop an improved architecture for 3D medical image segmentation with enhanced learning strategies to improve accuracy and address challenges related to limited training data. We propose ES-UNet, a 3D segmentation architecture that achieves superior segmentation performance while offering competitive efficiency across multiple computational metrics, including memory usage, inference time, and parameter count. The model builds upon the full-scale skip connection design of UNet3+ by integrating channel attention modules into each encoder-to-decoder path and incorporating full-scale deep supervision to enhance multi-resolution feature learning. We further introduce Region Specific Scaling (RSS), a data augmentation method that adaptively applies geometric transformations to annotated regions, and a Dynamically Weighted Dice (DWD) loss to improve the balance between precision and recall. The model was evaluated on the MICCAI HECKTOR dataset, and additional validation was conducted on selected tasks from the Medical Segmentation Decathlon (MSD). On the HECKTOR dataset, ES-UNet achieved a Dice Similarity Coefficient (DSC) of 76.87%, outperforming baseline models including 3D UNet, 3D UNet 3+, nnUNet, and Swin UNETR. Ablation studies showed that RSS and DWD contributed up to 1.22% and 1.06% improvement in DSC, respectively. A sensitivity analysis demonstrated that the chosen scaling range in RSS offered a favorable trade-off between deformation and anatomical plausibility. Cross-dataset evaluation on MSD Heart and Spleen tasks also indicated strong generalization. Computational analysis revealed that ES-UNet achieves superior segmentation performance with moderate computational demands. Specifically, the enhanced skip connection design with lightweight channel attention modules integrated throughout the network architecture enables this favorable balance between high segmentation accuracy and computational efficiency. ES-UNet integrates architectural and algorithmic improvements to achieve robust 3D medical image segmentation. While the framework incorporates established components, its core contributions lie in the optimized skip connection strategy and supporting techniques like RSS and DWD. Future work will explore adaptive scaling strategies and broader validation across diverse imaging modalities.
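The exact form of the Dynamically Weighted Dice (DWD) loss is not given in the abstract, so the sketch below illustrates only the underlying precision/recall trade-off, using a Tversky-style weight alpha that could be scheduled dynamically during training; treat it as a hedged stand-in, not the paper's loss.

```python
import torch

def weighted_dice_loss(probs, target, alpha: float = 0.5, eps: float = 1e-6):
    """probs, target: (B, H, W) foreground probabilities / binary masks.

    alpha > 0.5 penalises false positives more (favours precision);
    alpha < 0.5 penalises false negatives more (favours recall).
    """
    tp = (probs * target).sum()
    fp = (probs * (1 - target)).sum()
    fn = ((1 - probs) * target).sum()
    tversky = (tp + eps) / (tp + alpha * fp + (1 - alpha) * fn + eps)
    return 1.0 - tversky

probs = torch.rand(2, 64, 64)
target = (torch.rand(2, 64, 64) > 0.5).float()
print(weighted_dice_loss(probs, target, alpha=0.7).item())
```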
The convolutional neural network (CNN)-based models have emerged as the predominant approach for medical image segmentation due to their effective inductive bias. However, their limitation lies in the lack of long-range information. In this study, we propose the Atten-Nonlocal Unet model that integrates CNN and transformer to overcome this limitation and precisely capture global context in 2D features. Specifically, we utilize the BCSM attention module and the Cross Non-local module to enhance feature representation, thereby improving the segmentation accuracy. Experimental results on the Synapse, ACDC, and AVT datasets show that Atten-Nonlocal Unet achieves DSC scores of 84.15%, 91.57%, and 86.94%, and HD95 values of 15.17, 1.16, and 4.78, respectively. Compared to existing methods for medical image segmentation, the proposed method demonstrates superior segmentation performance, ensuring high accuracy in segmenting large organs while improving segmentation for small organs.
No abstract available
Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public and one private dataset show GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former with 1.37% and 1.94% gains in DICE and IOU, respectively, using only 38% of the parameters and 49.61% of the FLOPs. The code is freely accessible via: https://github.com/xiachashuanghua/GH-UNet.
No abstract available
Medical image segmentation is predominantly achieved with U-Net architectures based on Convolutional Neural Networks (CNNs). However, U-Net has two primary limitations. First, CNNs are constrained in modeling long-range dependencies, a limitation that is partially addressed by transformers, which face challenges due to their quadratic computational complexity. Second, there is a semantic gap in U-Net between feature maps in the encoder and decoder, especially between shallow and deep layers. To address these issues, we propose Mamba-UNet++, which alleviates the limited receptive field using a Visual State Space Duality (VSSD) vision block based on the improved Mamba2 VSS block. To bridge the semantic gap, Mamba-UNet++ replaces U-Net's direct skip connections with UNet++ dense skip connections and incorporates deep supervision during training. Extensive experiments on three datasets across different modalities show that Mamba-UNet++ outperforms competing methods, as evidenced by metrics such as DSC and HD95.
Recently, State Space Models (SSMs), particularly the Mamba-based framework, have demonstrated exceptional performance in medical image segmentation. This is attributed to their capacity to capture long-range dependencies efficiently with linear computational complexity. Nonetheless, current Mamba-based models encounter challenges in preserving the spatial context of 2D visual features, a consequence of their reliance on static 1D selective scanning patterns. In this study, we present Switch-UMamba, an innovative hybrid UNet framework that integrates the local feature extraction power of Convolutional Neural Networks (CNNs) with the ability of SSMs to capture long-range dependencies. Switch-UMamba capitalizes on the Switch Visual State Space (VSS) module to leverage the Mixture-of-Scans (MoS) approach, a new scanning mechanism that amalgamates diverse scanning policies by treating each scan head as an expert within the Mixture-of-Experts (MoE) framework. MoS employs a router to dynamically allocate appropriate scanning policies and corresponding scan heads for each sample. This sparse-activated dynamic scanning approach not only ensures a rich and comprehensive acquisition of spatial information but also curtails computational expenses. Our comprehensive experimental evaluation on several medical image segmentation benchmarks indicates that Switch-UMamba achieves state-of-the-art performance without using any pretrained weights. It is also worth highlighting that our approach outperforms other Mamba-based models with fewer parameters.
Histopathological examination holds a crucial role in cancer grading and serves as a significant reference for devising individualized patient treatment plans in clinical practice. Nevertheless, the distinctive features of numerous histopathological image targets frequently contribute to suboptimal segmentation performance. In this paper, we propose a UNet-based multi-scale context fusion algorithm for medical image segmentation, which gathers rich contextual information from the semantic features of different encoding stages and assigns different weights to the semantic information at different scales through the TBSFF module, improving the network's ability to learn features. Through multi-scale context fusion and feature selection networks, richer semantic features and detailed information are extracted, so the target can be segmented more accurately without significant extra overhead. The results demonstrate that our algorithm achieves superior Dice and IoU scores with a relatively small parameter count. Specifically, on the GlaS dataset, the Dice score is 90.56 and the IoU is 83.47; for the MoNuSeg dataset, the Dice score is 79.07 and the IoU is 65.98.
Medical image segmentation plays an important role in various clinical applications, but existing deep learning models face trade-offs between efficiency and accuracy. Convolutional Neural Networks (CNNs) capture local details well but miss the global context, whereas transformers handle the global context but at a high computational cost. Recently, State Space Sequence Models (SSMs) have shown potential for capturing long-range dependencies with linear complexity, but their direct use in medical image segmentation remains limited due to incompatibility with image structures and autoregressive assumptions. To overcome these challenges, we propose SAMA-UNet, a novel U-shaped architecture that introduces two key innovations. First, the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block adaptively integrates local and global features through dynamic attention weighting, enabling an efficient representation of complex anatomical patterns. Second, the causal resonance multi-scale module (CR-MSM) improves encoder–decoder interactions by adjusting feature resolution and causal dependencies across scales, enhancing the semantic alignment between low- and high-level features. Extensive experiments on MRI, CT, and endoscopy datasets demonstrate that SAMA-UNet consistently outperforms CNN, Transformer, and Mamba-based methods. It achieves 85.38% DSC and 87.82% NSD on BTCV, 92.16% and 96.54% on ACDC, 67.14% and 68.70% on EndoVis17, and 84.06% and 88.47% on ATLAS23, establishing new benchmarks across modalities. These results confirm the effectiveness of SAMA-UNet in combining efficiency with accuracy, making it a promising solution for real-world clinical segmentation tasks. The source code is available on https://github.com/sqbqamar/SAMA-UNet.
SSRepVM-UNet: a lightweight hybrid model for medical image segmentation based on channel parallelism
No abstract available
Convolutional Neural Networks (CNNs) have shown promising results in the field of medical image segmentation, but often struggle to capture global context. While transformer-based models can address this limitation, they come with high computational costs. To tackle the challenges of high computational complexity and the limitations of CNNs, we propose DAG-UNet. The DAG-UNet reduces channel-wise redundancy and enhances feature diversity by introducing a similarity-based strategy to select dissimilar feature channels while efficiently capturing global context through multilayer perceptron. The DAG-UNet is also lightweight, with only 0.72M parameters and 0.62G FLOPs. We evaluate the performance of DAG-UNet on Speech MRI and ultrasound datasets. It achieves competitive performance on Speech MRI and outperforms other models on the ultrasound dataset. The code is available at: https://github.com/Yhe9718/DAG-UNet.
In response to the critical need for advanced solutions in medical imaging segmentation, particularly for real-time applications in diagnostics and treatment planning, this study introduces SM-UNet. This novel deep learning architecture efficiently addresses the challenge of real-time, accurate medical image segmentation by integrating a convolutional neural network (CNN) with a multilayer perceptron (MLP). The architecture uniquely combines an initial convolutional encoder for detailed feature extraction, an MLP module for capturing long-range dependencies, and a decoder that merges global features with the high-resolution CNN map. Further optimization is achieved through a tokenization approach, significantly reducing computational demands. Its superior performance is confirmed by evaluations on standard datasets, showing inference times drastically lower than those of comparable networks (between 1/6 and 1/10) and 1/25 of those of SOTA models. These advancements underscore SM-UNet's potential as a groundbreaking tool for facilitating real-time, precise medical diagnostics and treatment strategies.
Kidney tumors are a common cancer in advanced age, and early detection is crucial. Medical imaging and deep learning methods are increasingly attractive for identifying and segmenting kidney tumors. Convolutional neural networks have successfully classified and segmented images, enabling clinicians to recognize and segment tumors effectively. CT scans of kidneys aid in tumor assessment and morphology study, using semantic segmentation techniques for pixel-level identification of the kidney and surrounding anatomy. Accurate diagnostic procedures are crucial for early detection of kidney cancer. This paper proposes an EfficientNet model for complex segmentation by linking the encoder stage of EfficientNet with U-Net. This model represents a more successful system with improved encoder and decoder features. The Intersection over Union (IoU) metric quantifies model performance. The EfficientNet models showed high IoU scores for background, kidney, and tumor segmentation, with mean IoU scores ranging from 0.976 for B0 to 0.980 for B4. B7 received the highest IoU score for segmenting kidneys, while B4 received the highest for segmenting tumors. The study utilizes the KiTS19 dataset of contrast-enhanced CT images. Using semantic segmentation with the EfficientNet-family U-Net models, our method proved more reliable and will aid doctors in accurate tumor detection and image classification for early diagnosis.
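For reference, the per-class IoU used to score these models can be computed as below; the class indices (0/1/2 for background/kidney/tumor) are an assumption based on the KiTS19 setup.

```python
import numpy as np

def iou_score(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    """Intersection over Union for one class in a label map."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")

# Dummy prediction and ground truth with classes 0/1/2.
pred = np.random.randint(0, 3, (128, 128))
gt = np.random.randint(0, 3, (128, 128))
print([iou_score(pred, gt, c) for c in range(3)])  # per-class IoUs
```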
No abstract available
Automated semantic segmentation in the domain of medical imaging can enable a faster, more reliable, and more affordable clinical workflow. Fully convolutional networks (FCNs) have been heavily used in this area due to the level of success that they have achieved. In this work, we first leverage recent architectural innovations to make an initial segmentation: (i) a spatial and channel-wise squeeze-and-excitation mechanism; (ii) a 3D U-Net++ network with deep supervision. Second, we use classical methods for refining the initial segmentation: (i) spatial normalization and (ii) a local 3D refinement network applied to patches. Finally, we put our methods together in a novel segmentation pipeline. We train and evaluate our models and pipelines on a dataset of 120 volumetric abdominal magnetic resonance images (MRIs). The goal is to segment five different organs of interest (ORI): liver, kidneys, stomach, duodenum, and large bowel. Our experiments show that we can generate high-resolution segmentations of comparable quality to state-of-the-art methods trained on low resolution, without adding significant computational cost.
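A hedged 2D sketch of the concurrent spatial and channel-wise squeeze-and-excitation recalibration from step (i) follows; the pipeline itself is 3D, and combining the two branches with an element-wise max is one common choice, assumed here.

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    """Concurrent spatial + channel squeeze-and-excitation (2D sketch)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        self.cse = nn.Sequential(  # channel SE: global pool -> bottleneck MLP
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.sse = nn.Sequential(  # spatial SE: 1x1 conv -> per-pixel gate
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Recalibrate along channels and along space, keep the stronger gate.
        return torch.maximum(x * self.cse(x), x * self.sse(x))

x = torch.randn(1, 8, 32, 32)
print(SCSE(8)(x).shape)  # torch.Size([1, 8, 32, 32])
```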
No abstract available
Training deep neural networks to solve semantic segmentation is challenging with small labeled datasets, leading to overfitting. This is especially problematic in medical images, and in particular in laparoscopic surgery images, where ground-truth segmentation labels are available only for a small set of images from few patients, while inter-patient variability is very high in practice. Models trained for a specific setup and a set of patients usually perform poorly when deployed in a new environment. This work proposes a new training strategy that improves the generalization accuracy of current state-of-the-art semantic segmentation methods applied to laparoscopic images. Our approach is based on training a discriminator network, which learns to detect segmentation errors, producing a dense segmentation error map. Unlike in adversarial networks, we train the discriminator offline by synthetically altering ground-truth segmentation labels with simple morphological and geometric operations. We then use the discriminator to train a segmentation neural network, by minimizing the discriminator-predicted error jointly with a standard segmentation loss. This strategy results in segmentation models that are significantly more accurate when tested on unseen images than those relying only on data augmentation. The technique is well suited to boosting the performance of any state-of-the-art segmentation network and can be combined with other data augmentation strategies. This paper evaluates and validates our proposal by training and testing common state-of-the-art segmentation models on publicly available semantic segmentation datasets specialized in laparoscopic and endoscopic surgery. The results show that our methods are effective, obtaining a significant improvement in segmentation accuracy, especially on challenging small datasets.
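The offline discriminator training described above relies on synthetically corrupted labels. A minimal sketch, assuming simple dilation/erosion and translation as the corruptions (the paper's exact operations and magnitudes are not specified here), is:

```python
import numpy as np
from scipy import ndimage

def corrupt_mask(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """mask: binary (H, W). Returns a plausibly wrong segmentation."""
    out = mask.copy()
    if rng.random() < 0.5:  # morphological corruption: dilate or erode
        it = int(rng.integers(1, 4))
        op = ndimage.binary_dilation if rng.random() < 0.5 else ndimage.binary_erosion
        out = op(out, iterations=it)
    if rng.random() < 0.5:  # geometric corruption: small random translation
        shift = rng.integers(-5, 6, size=2)
        out = ndimage.shift(out.astype(float), shift, order=0) > 0.5
    return out.astype(mask.dtype)

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), np.uint8); mask[20:40, 20:40] = 1
bad = corrupt_mask(mask, rng)
# Dense per-pixel error map: the discriminator's training target.
error_map = (bad != mask).astype(np.uint8)
print(error_map.sum())
```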
No abstract available
No abstract available
The test-time adaptation (TTA) of deep-learning-based semantic segmentation models, specific to individual patient data, was addressed in this study. Existing TTA methods in medical imaging are often unconstrained, or require anatomical prior information or additional neural networks built during the training phase, making them less practical and prone to performance deterioration. In this study, a novel framework based on information geometric principles was proposed to achieve generic, off-the-shelf, regularized patient-specific adaptation of models at test time. By considering the pre-trained model and the adapted models as part of statistical neuromanifolds, test-time adaptation was treated as constrained functional regularization using information geometric measures, leading to improved generalization and patient optimality. The efficacy of the proposed approach was shown on three challenging problems: 1) improving the generalization of state-of-the-art models for segmenting COVID-19 anomalies in Computed Tomography (CT) images; 2) cross-institutional brain tumor segmentation from magnetic resonance (MR) images; and 3) segmentation of retinal layers in Optical Coherence Tomography (OCT) images. Further, it was demonstrated that robust patient-specific adaptation can be achieved without adding significant computational burden, making it the first of its kind based on information geometric principles.
No abstract available
Convolutional neural network (CNN) models obtain state of the art performance on image classification, localization, and segmentation tasks. Limitations in computer hardware, most notably memory size in deep learning accelerator cards, prevent relatively large images, such as those from medical and satellite imaging, from being processed as a whole in their original resolution. A fully convolutional topology, such as U-Net, is typically trained on down-sampled images and inferred on images of their original size and resolution, by simply dividing the larger image into smaller (typically overlapping) tiles, making predictions on these tiles, and stitching them back together as the prediction for the whole image. In this study, we show that this tiling technique combined with translationally-invariant nature of CNNs causes small, but relevant differences during inference that can be detrimental in the performance of the model. Here we quantify these variations in both medical (i.e., BraTS) and non-medical (i.e., satellite) images and show that training a 2D U-Net model on the whole image substantially improves the overall model performance. Finally, we compare 2D and 3D semantic segmentation models to show that providing CNN models with a wider context of the image in all three dimensions leads to more accurate and consistent predictions. Our results suggest that tiling the input to CNN models—while perhaps necessary to overcome the memory limitations in computer hardware—may lead to undesirable and unpredictable errors in the model's output that can only be adequately mitigated by increasing the input of the model to the largest possible tile size.
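For context, the overlapping-tile inference the study analyses can be sketched as follows; the tile size, stride, and averaging of overlaps are illustrative assumptions, and the averaging reduces but does not remove the seam effects the study quantifies.

```python
import numpy as np

def _positions(size, tile, stride):
    pos = list(range(0, max(size - tile, 0) + 1, stride))
    if pos[-1] != max(size - tile, 0):
        pos.append(max(size - tile, 0))  # always cover the far border
    return pos

def tiled_predict(image, predict, tile=128, stride=96):
    """image: (H, W, C); predict: fn mapping a tile to a (h, w) score map."""
    H, W = image.shape[:2]
    acc = np.zeros((H, W)); cnt = np.zeros((H, W))
    for y in _positions(H, tile, stride):
        for x in _positions(W, tile, stride):
            acc[y:y + tile, x:x + tile] += predict(image[y:y + tile, x:x + tile])
            cnt[y:y + tile, x:x + tile] += 1
    return acc / cnt  # average the overlapping predictions

img = np.random.rand(250, 250, 3)
out = tiled_predict(img, lambda t: t.mean(axis=-1))  # dummy "model"
print(out.shape)  # (250, 250)
```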
Medical image segmentation has numerous applications in diagnosing different diseases, various types of which are found in white blood cells (WBCs) and red blood cells (RBCs). This paper presents the segmentation of WBCs from blood smear images, a complex and challenging task due to the frequent overlap of WBCs with each other and with RBCs, and to variations in their size and shape; this overlapping is due to the rough borders of immature cells. The paper describes a new approach to WBC segmentation using UNet++, the marker watershed algorithm, and Neural Ordinary Differential Equations (ODEs). This technique uses UNet++ for pre-segmentation, followed by the marker watershed method, integrated with ODEs to deepen the segmentation process. This novel integration enhances clinical applications in automated blood cell analysis, diagnostic imaging, and disease monitoring, improving accuracy and robustness. The ODE is applied after the convolution operation to reduce the error at each step, preventing large error propagation in the forward and backward passes. The white blood cells are segmented from input smear images using the ALL_IDB1 and ALL_IDB2 datasets, which are further used in the experiment section. UNet++ generates pre-segmented probabilistic grayscale images, in which some white blood cells are connected and form groups; these groups of WBCs are separated using the marker watershed technique, which gives the final segmented result. The experimental results show that the mean intersection over union (Jaccard index), the Dice similarity coefficient, and the mean pixel accuracy are 97.73%, 98.36%, and 98.97%, respectively. The structure and size of white blood cells differ from those of red blood cells and platelets, which makes this work different from others; furthermore, the combination of UNet++, marker watershed, and Neural Ordinary Differential Equations makes the proposed system unique among existing systems. This work can be further investigated to reduce computational complexity and memory requirements for deployment on low-resource devices, such as smart healthcare systems. Techniques like model pruning, quantization, or knowledge distillation might be explored to create a lightweight version of the model without much loss in accuracy. Such developments would enable wide use of automated white blood cell segmentation in portable, low-cost health devices for point-of-care remote diagnostics and monitoring.
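A minimal sketch of the marker-watershed step follows: the network's probability map is thresholded, markers are taken from distance-transform cores, and watershed splits touching cells. The thresholds here are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def split_touching_cells(prob: np.ndarray, thresh: float = 0.5):
    """prob: (H, W) foreground probability map from the segmentation net."""
    fg = prob > thresh
    dist = ndimage.distance_transform_edt(fg)
    # Markers: confident cell cores, i.e. pixels far from any background.
    markers, _ = ndimage.label(dist > 0.6 * dist.max())
    # Flood the inverted distance map from the markers, within the mask.
    return watershed(-dist, markers, mask=fg)  # one integer label per cell

prob = np.zeros((64, 64)); prob[10:30, 10:30] = 0.9; prob[25:45, 25:45] = 0.9
labels = split_touching_cells(prob)
print(labels.max())  # number of separated instances
```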
Medical segmentation is an important but challenging task with applications in standardized report generation, remote medicine and reducing medical exam costs by assisting experts. In this paper, we exploit time sequence information using a novel spatio-temporal recurrent deep learning network to automatically segment the thyroid gland in ultrasound cineclips. We train a DeepLabv3+ based convolutional LSTM model in four stages to perform semantic segmentation by exploiting spatial context from ultrasound cineclips. The backbone DeepLabv3+ model is replicated six times and the output layers are replaced with convolutional LSTM layers in an atrous spatial pyramid pooling configuration. Our proposed model achieves mean intersection over union scores of 0.427 for cysts, 0.533 for nodules and 0.739 for thyroid. We demonstrate the potential application of convolutional LSTM models for thyroid ultrasound segmentation.
Brain tumor segmentation is paramount in medical diagnostics. This study presents a multistage segmentation model consisting of two main steps. First, the fusion of magnetic resonance imaging (MRI) modalities creates new and more effective tumor imaging modalities. Second, the semantic segmentation of the original and fused modalities, utilizing various modified architectures of the U-Net model. In the first step, a residual network with multi-scale backbone architecture (Res2Net) and guided filter are employed for pixel-by-pixel image fusion tasks without requiring any training or learning process. This method captures both detailed and base elements from the multimodal images to produce better and more informative fused images that significantly enhance the segmentation process. Many fusion scenarios were performed and analyzed, revealing that the best fusion results are attained when combining T2-weighted (T2) with fluid-attenuated inversion recovery (FLAIR) and T1-weighted contrast-enhanced (T1CE) with FLAIR modalities. In the second step, several models, including the U-Net and its many modifications (adding attention layers, residual connections, and depthwise separable connections), are trained using both the original and fused modalities. Further, a "Model Selection-based" fusion of these individual models is also considered for more enhancement. In the preprocessing step, the images are resized by cropping them to decrease the pixel count and minimize background interference. Experiments utilizing the brain tumor segmentation (BraTS) 2020 dataset were performed to verify the efficiency and accuracy of the proposed methodology. The "Model Selection-based" fusion model achieved an average Dice score of 88.4%, an individual score of 91.1% for the whole tumor (WT) class, an average sensitivity score of 86.26%, and a specificity score of 91.7%. These results prove the robustness and high performance of the proposed methodology compared to other state-of-the-art methods.
Medical imaging is essential for clinical diagnosis, prognosis assessment, therapy planning, and surgery. Deep learning architectures have made a significant impact in the field of medical image segmentation in recent years. U-Net, SegNet, and Attention UNet are some of the well-known deep neural networks in this area and have long been widely recommended architectures for medical imaging. Despite having exceptional outcomes in segmenting multimodal medical images, the traditional U-Net design appears to be inefficient in some areas, e.g., difficult datasets of various resolutions. Existing segmentation networks such as U-Net and SegNet are also prone to overfitting and contain semantic gaps between the encoder and decoder layers. Because of this, several changes to the already cutting-edge U-Net models are suggested. The goal of this paper is to provide an architecture that can effectively improve on the traditional methods and process multimodal biomedical image datasets, including MRI, microscope, dermoscopy, CT scan, chest X-ray, and endoscopy images. An average IOU of 0.88 and a mean Dice score of 0.93 were obtained for the six datasets. Compared to standard U-Net, there are fewer parameters and a shorter training time: the architecture uses just 28.34% of U-Net's parameters, requiring about 3.798M parameters and an average of 80 epochs to reach convergence. The architecture has also shown significant performance on various datasets.
Over the past decade, deep learning techniques, particularly neural networks, have become essential in medical imaging for tasks like image detection, classification, and segmentation. These methods have greatly enhanced diagnostic accuracy, enabling quicker identification and more effective treatments. In chest X-ray analysis, however, challenges remain in accurately segmenting and classifying organs such as the lungs, heart, diaphragm, sternum, and clavicles, as well as detecting abnormalities in the thoracic cavity. Despite progress, these issues highlight the need for improved approaches to overcome segmentation difficulties and enhance diagnostic reliability. In this context, we propose a novel architecture named CXR-Seg, tailored for semantic segmentation of lungs from chest X-ray images. The proposed network mainly consists of four components: a pre-trained EfficientNet as an encoder to extract feature encodings, a spatial enhancement module embedded in the skip connection to promote adjacent feature fusion, a transformer attention module at the bottleneck layer, and a multi-scale feature fusion block at the decoder. The performance of the proposed CXR-Seg was evaluated on four publicly available datasets (MC, Darwin, and Shenzhen for chest X-rays, and TCIA for brain FLAIR segmentation from MRI images). The proposed method achieved a Jaccard index, Dice coefficient, accuracy, sensitivity, and specificity of 95.63%, 97.76%, 98.77%, 98.00%, and 99.05% on MC; 91.66%, 95.62%, 96.35%, 95.53%, and 96.94% on V7 Darwin COVID-19; and 92.97%, 96.32%, 96.69%, 96.01%, and 97.40% on the Shenzhen Tuberculosis CXR dataset, respectively. Conclusively, the proposed network offers improved performance in comparison with state-of-the-art methods and better generalization for the semantic segmentation of lungs from chest X-ray images.
Medical image segmentation is a fundamental task in the community of medical image analysis. In this paper, a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO), is proposed. CTO employs a combination of Convolutional Neural Networks (CNNs), Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while maintaining an optimal balance between accuracy and efficiency. The proposed CTO follows the standard encoder-decoder segmentation paradigm, where the encoder network incorporates a popular CNN backbone for capturing local semantic information, and a lightweight ViT assistant for integrating long-range dependencies. To enhance the learning capacity on boundary, a boundary-guided decoder network is proposed that uses a boundary mask obtained from a dedicated boundary detection operator as explicit supervision to guide the decoding learning process. The performance of the proposed method is evaluated on six challenging medical image segmentation datasets, demonstrating that CTO achieves state-of-the-art accuracy with a competitive model complexity.
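The boundary supervision idea above can be illustrated by deriving a boundary mask from the ground truth to supervise a boundary-guided decoder; here a morphological gradient stands in for the paper's dedicated boundary detection operator (an assumption for illustration).

```python
import numpy as np
from scipy import ndimage

def boundary_mask(mask: np.ndarray, width: int = 1) -> np.ndarray:
    """mask: binary (H, W). Returns a thin band around object contours."""
    dil = ndimage.binary_dilation(mask, iterations=width)
    ero = ndimage.binary_erosion(mask, iterations=width)
    return (dil ^ ero).astype(np.uint8)  # morphological gradient

gt = np.zeros((32, 32), bool); gt[8:24, 8:24] = True
# The resulting band can serve as an explicit target for a boundary branch.
print(boundary_mask(gt).sum())  # number of boundary pixels
```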
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To present and evaluate Dafne (deep anatomic federated network), a freely available decentralized, collaborative deep learning system for the semantic segmentation of radiologic images through federated incremental learning. Materials and Methods Dafne is free software with a client-server architecture. The client side is an advanced user interface that applies the deep learning models stored on the server to the user's data and allows the user to check and refine the prediction. Incremental learning is then performed at the client's side and sent back to the server, where it is integrated into the root model. Dafne was evaluated locally, by assessing the performance gain across model generations on 38 MRI datasets of the lower legs, and through the analysis of real-world usage statistics (n = 639 use-cases). Results Dafne demonstrated a statistically improvement in the accuracy of semantic segmentation over time (average increase of the Dice Similarity Coefficient by 0.007 points/generation on the local validation set, P < .001). Qualitatively, the models showed enhanced performance on various radiologic image types, including those not present in the initial training sets, indicating good model generalizability. Conclusion Dafne showed improvement in segmentation quality over time, demonstrating potential for learning and generalization. ©RSNA, 2025.
Semantic image segmentation is a crucial task in computer vision, with applications ranging from autonomous driving to medical image analysis. In recent years, deep learning has revolutionized this field, leading to the development of various neural network models aimed at improving segmentation accuracy. One such architecture is SegNet, which we explore in this article. SegNet's architecture consists of an encoder network, a corresponding decoder network, and a pixel-wise classification layer. The encoder network, resembling VGG16 with 13 convolutional layers, extracts high-level features from input images. The innovation lies in the decoder network's approach to upsampling, utilizing pooled indices from the encoder's maximum pooling step to perform non-linear upsampling. This eliminates the need for additional learning during upsampling, making SegNet efficient in both storage and computation. SegNet represents an exciting advancement in deep learning image segmentation. Its efficient architecture, memory-conscious design, and potential for real-time applications make it a valuable tool in the field of computer vision with promising integrated applications and prospects.
Early and accurate diagnosis is needed for successful treatment of pancreatic cancer, as survival rates are low. Computed tomography (CT) is a commonly used diagnostic and staging tool for pancreatic cancers. A difficult task for clinicians and researchers is exact segmentation and classification from medical images, and manually segmenting volumetric CT scans is a laborious and subjective procedure. Recently, the U-Net technique has demonstrated remarkable progress in semantic segmentation of medical images. A crucial part of pancreatic cancer classification is differentiating tumour and non-tumour pancreatic tissues in CT images, and various deep learning models have been proposed for this task. This study presents a unified deep learning model that can accurately segment and classify pancreatic cancer. The U-Net model was trained on annotated CT scans to precisely segment the pancreas and potential lesions, achieving an accuracy of 99.4%. The DenseNet-121 classifier was subsequently applied to the segmented regions to differentiate tumor and non-tumor tissue, achieving a classification accuracy of 99.5%. The aim of this model is to implement a dependable and precise diagnostic system that improves the performance of pancreatic cancer diagnosis.
Histopathological image analysis is a pivotal area of medical research that leverages deep learning to derive quantitative insights from Hematoxylin and Eosin (H&E) stained images. This study aims to enhance the analysis of H&E breast cancer histopathology images by developing deep learning methodologies focused on nuclei and mitosis. Nuclei provide essential information for disease diagnosis, while mitosis is crucial for cancer grading and prognosis prediction. We propose two methodologies: the first segments nuclei using a U-shaped semantic segmentation architecture called CompSegNet; the second detects and classifies mitotic cells through a hybrid approach combining object detection and fuzzy classification algorithms. To evaluate the effectiveness of these methodologies, we introduce two new publicly available datasets: NuSeC (Nuclei Segmentation and Classification) and MiDeSeC (Mitosis Detection, Segmentation, and Classification). These datasets not only validate our methodologies but also provide valuable resources for developing deep learning models in histopathological image analysis.
No abstract available
Recently, pruning deep neural networks (DNNs) has received a lot of attention for improving accuracy and generalization power, reducing network size, and increasing inference speed on specialized hardware. Although pruning was mainly tested on computer vision tasks, its application in the context of medical image analysis has hardly been explored. This work investigates the impact of well-known pruning techniques, namely layer-wise and network-wide magnitude pruning, on the nuclei instance segmentation performance in histological images. Our utilised instance segmentation model consists of two main branches: (1) a semantic segmentation branch, and (2) a deep regression branch. We investigate the impact of weight pruning on the performance of both branches separately, and on the final nuclei instance segmentation result. Evaluated on two publicly available datasets, our results show that layer-wise pruning delivers slightly better performance than network-wide pruning for small compression ratios (CRs), while for large CRs, network-wide pruning yields superior performance. For semantic segmentation, deep regression, and final instance segmentation, 93.75%, 95%, and 80% of the model weights can be pruned by layer-wise pruning with less than a 2% reduction in the performance of the respective models.
No abstract available
Segmentation of cell nuclei from three-dimensional (3D) volumetric fluorescence microscopy images is crucial for biological and clinical analyses. In recent years, convolutional neural networks have become the reliable 3D medical image segmentation standard. However, convolutional layers are limited by their finite receptive fields and weight-sharing mechanisms. Consequently, they struggle to effectively model long-range dependencies and spatial correlations, which may lead to inadequate nuclei segmentation. Moreover, the diversity in nuclear appearance and density poses additional challenges. This work proposes a lightweight multi-layer deep aggregation network, MLDA-Net, incorporating Wide Receptive Field Attention (WRFA). This module effectively simulates the large receptive field generated by self-attention in the Swin Transformer while requiring fewer model parameters. This design implements an extended global sensory field that enhances the ability to capture a wide range of spatial information. In addition, the multiple cross-attention (MCA) module in MLDA-Net enhances the output features of different resolutions from the encoder while maintaining global effectiveness. The Multi-Path Aggregation Feature Pyramid Network (MAFPN) receives multi-scale outputs from the MCA module, generating a robust hierarchical feature pyramid for the final prediction. MLDA-Net outperforms state-of-the-art networks, including 3DU-Net, nnFormer, UNETR, SwinUNETR, and 3DUXNET, on the 3D volumetric datasets NucMM and MitoEM. It achieves average performance improvements of 4% to 7% in F1 score, MIoU, and PQ metrics, thereby establishing new benchmark results.
Mammography for the diagnosis of early breast cancer (BC) relies heavily on the identification of breast masses. However, in the early stages, it might be challenging to ascertain whether a breast mass is benign or malignant. Consequently, many deep learning (DL)-based computer-aided diagnosis (CAD) approaches for BC classification have been developed. Recently, the transformer model has emerged as a method for overcoming the constraints of convolutional neural networks (CNNs). Thus, our primary goal was to determine how well an improved transformer model could distinguish between benign and malignant breast tissues. In this instance, we drew on the Mendeley data repository's INbreast dataset, which includes benign and malignant breast types. Additionally, the segment anything model (SAM) method was used to generate the optimized cutoff for region of interest (ROI) extraction from all mammograms. We implemented a successful architecture modification at the bottom layer of a pyramid transformer (PTr) to identify BC from mammography images. The proposed PTr model, using a transfer learning (TL) approach with a segmentation technique, achieved the best accuracy of 99.96% for binary classification, with an area under the curve (AUC) score of 99.98%. We also compared the performance of the proposed model with that of another transformer model, the vision transformer (ViT), and of the DL models MobileNetV3 and EfficientNetB7. In this study, a modified transformer model is proposed for BC prediction and mammography image classification using segmentation approaches. Data segmentation techniques accurately identify the regions affected by BC. Finally, the proposed transformer model accurately classified benign and malignant breast tissues, which is vital for radiologists to guide future treatment.
Automatic nuclear instance segmentation is a crucial task in computational pathology, as this information is required for extracting cell-based features in downstream analysis. However, instance segmentation is a challenging task due to the nature of histology images, which show large variations and irregularities in nuclei appearance and arrangement. Various deep learning-based methods have tried to tackle these challenges; however, most of these methods fail to segment the nuclei instances in crowded scenes accurately, or they are not fast enough for practical usage. In this paper, we propose a double-stage neural network for nuclear instance segmentation which leverages the power of an interactive model, NuClick, for accurate instance segmentation by replacing the user input with a nuclei detection module, YOLOv5. We optimized the proposed method to be lightweight and fast and show that it can achieve promising results when tested on the largest publicly available nuclei dataset.
Oral cancer is the most preventable cancer if it is diagnosed at an early stage, and artificial intelligence can be a great help in its detection. Deep learning architectures are predominantly useful in medical image analysis, identifying patterns and predicting insights. The study proposes a deep learning methodology using Mask R-CNN (Region-Based Convolutional Neural Network) for the precise detection and segmentation of oral lesions in photographic images. With the Swin Transformer as a backbone, the model extracts features more effectively, supporting precise detection; its ability to identify relationships among different parts of an image is particularly useful in locating the smallest lesions. Precise annotation has helped generate the segmentation masks accurately. The model attains a mean average precision (mAP) of 99.5%, a precision of 92.7%, and a recall of 96.6%. This exceptional performance makes the model useful to the medical community as a tool for the early detection of oral cancer.
Automatic tumor segmentation is an important and challenging clinical task because tumors have different sizes, shapes, contrasts, and locations. In this paper, we present an automatic instance semantic segmentation method based on deep neural networks (DNNs). The proposed networks are tailored to picture tumors in magnetic resonance imaging (MRI) and computed tomography (CT) images. We present an end-to-end multitask learning architecture comprising three stages, namely detection, segmentation, and classification. This paper introduces a new technique for tumor detection based on high-level extracted features from convolutional neural networks (CNNs) using the Hough transform technique. The detected tumor(s), are segmented with a set of fully connected (FC) layers, and the segmented mask is classified through FCs. The proposed architecture gives promising results on the popular medical image benchmarks. Our framework is generalized in the sense that it can be used in different types of medical images in varied sizes, such as the Liver Tumor Segmentation (LiTS-2017) challenge, and the Brain Tumor Segmentation (BraTS-2016) benchmark.
No abstract available
Image segmentation tasks are considered resource intensive: they require domain specialists to labor manually over long periods of time, and for medical image segmentation the personnel and the error margin make these tasks expensive. Therefore, there is a need for an automated tool. Deep learning has fast become the state of the art for such tasks, yet the methods applied require large datasets of fully annotated examples. The need for supervision prevents researchers from developing deep learning and machine learning solutions on new datasets that were not annotated by professional personnel. In this paper we utilize weak supervision to train a deep neural network to perform instance segmentation. The data used for this project is the Multimodal Brain Tumor Segmentation Challenge 3D MRI scans. The method used is a two-step DNN. The first step is binary classification of slices as either pathological or healthy; this is the only step which uses supervision for training. In the second step, another DNN, in the form of a U-Net encoder-decoder network, is utilized. This network encodes the raw input data and decodes each pixel to a 32-dimensional vector representing a semantic identity (semantic map). The supervision for training this second network is derived from the Grad-CAM of the classification DNN. Lastly, to segment the input data we determine the semantic distance between suspected lesion points and the entirety of the map. We achieve an average Dice score of 0.73 over three test sets of 38 patients each.
Neural cell instance segmentation, which aims at joint detection and segmentation of every neural cell in a microscopic image, is essential to many neuroscience applications. The challenge of this task involves cell adhesion, cell distortion, unclear cell contours, low-contrast cell protrusion structures, and background impurities. Consequently, current instance segmentation methods generally fall short of precision. In this paper, we propose an attentive instance segmentation method that accurately predicts the bounding box of each cell as well as its segmentation mask simultaneously. In particular, our method builds on a joint network that combines a single shot multi-box detector (SSD) and a U-net. Furthermore, we employ the attention mechanism in both detection and segmentation modules to focus the model on the useful features. The proposed method is validated on a dataset of neural cell microscopic images. Experimental results demonstrate that our approach can accurately detect and segment neural cell instances at a fast speed, comparing favorably with the state-of-the-art methods. Our code is released on GitHub. The link is https://github.com/yijingru/ANCIS-Pytorch.
Background: The deterministic deep learning models have achieved state-of-the-art performance in various medical image analysis tasks, including nuclei segmentation from histopathology images. The deterministic models focus on improving the model prediction accuracy without assessing the confidence in the predictions. Methods: We propose a semantic segmentation model using Bayesian representation to segment nuclei from histopathology images and to further quantify the epistemic uncertainty. We employ Bayesian approximation with Monte-Carlo (MC) dropout during inference time to estimate the model's prediction uncertainty. Results: We evaluate the performance of the proposed approach on the PanNuke dataset, which consists of 312 visual fields from 19 organ types. We compare the nuclei segmentation accuracy of our approach with that of a fully convolutional neural network, U-Net, SegNet, and the state-of-the-art Hover-Net. We use the F1-score and intersection over union (IoU) as the evaluation metrics. The proposed approach achieves a mean F1-score of 0.893 ± 0.008 and an IoU value of 0.868 ± 0.003 on the test set of the PanNuke dataset. These results outperform Hover-Net, which has a mean F1-score of 0.871 ± 0.010 and an IoU value of 0.840 ± 0.032. Conclusions: The proposed approach, which incorporates Bayesian representation and Monte-Carlo dropout, demonstrates superior performance in segmenting nuclei from histopathology images compared to existing models such as U-Net, SegNet, and Hover-Net. By considering the epistemic uncertainty, our model provides a more reliable estimation of the prediction confidence. These findings highlight the potential of Bayesian deep learning for improving medical image analysis tasks and can contribute to the development of more accurate and reliable computer-aided diagnostic systems.
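Monte-Carlo dropout, the uncertainty mechanism described above, can be sketched as below: dropout stays active at inference, T stochastic passes are averaged, and per-pixel predictive entropy serves as the uncertainty map. The tiny network is a placeholder, not the paper's model.

```python
import torch
import torch.nn as nn

# Placeholder segmentation net with a dropout layer (not the paper's model).
net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Dropout2d(0.5), nn.Conv2d(8, 2, 1))

def mc_dropout_predict(net, x, T: int = 20):
    net.train()  # keep dropout active at test time (MC dropout)
    with torch.no_grad():
        probs = torch.stack([net(x).softmax(dim=1) for _ in range(T)])
    mean = probs.mean(dim=0)                               # (B, C, H, W)
    entropy = -(mean * mean.clamp_min(1e-8).log()).sum(1)  # (B, H, W)
    return mean, entropy  # mean prediction + per-pixel uncertainty

x = torch.randn(1, 1, 64, 64)
mean, unc = mc_dropout_predict(net, x)
print(mean.shape, unc.shape)
```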
No abstract available
This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data: the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving Dice coefficients of 0.8986 and 0.8186 for the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and the cross-dataset transferability of learned capabilities. Code and models are available at https://github.com/SimonZeng7108/Video-SwinUNet.
In this project, computer-based image enhancement, learning, and analysis were carried out on a large number of multimodal brain medical images (MRI) from patients, and a deep convolutional neural network was established to learn a mask mapping function between images and diseased areas, achieving accurate classification of high-grade glioma (HGG) and low-grade glioma (LGG) as well as semantic segmentation of the whole tumor area (WT).
Unsupervised domain adaptation (UDA) methods have been broadly utilized to improve the models' adaptation ability in general computer vision. However, different from the natural images, there exist huge semantic gaps for the nuclei from different categories in histopathology images. It is still under-explored how could we build generalized UDA models for precise segmentation or classification of nuclei instances across different datasets. In this work, we propose a novel deep neural network, namely Category-Aware feature alignment and Pseudo-Labelling Network (CAPL-Net) for UDA nuclei instance segmentation and classification. Specifically, we first propose a category-level feature alignment module with dynamic learnable trade-off weights. Second, we propose to facilitate the model performance on the target data via self-supervised training with pseudo labels based on nuclei-level prototype features. Comprehensive experiments on cross-domain nuclei instance segmentation and classification tasks demonstrate that our approach outperforms state-of-the-art UDA methods with a remarkable margin.
Convolutional Neural Networks (CNNs) are considered state-of-the-art segmentation methods for biomedical images in general and for microscopy sequences of living cells in particular. The success of CNNs is attributed to their ability to capture the structural properties of the data, which enables accommodating complex spatial structures of the cells, low contrast, and unclear boundaries. However, in their standard form CNNs do not exploit the temporal information available in time-lapse sequences, which can be crucial to separating touching and partially overlapping cell instances. In this work, we exploit cell dynamics using a novel CNN architecture which allows multi-scale spatio-temporal feature extraction. Specifically, a novel recurrent neural network (RNN) architecture is proposed based on the integration of a Convolutional Long Short Term Memory (ConvLSTM) network with the U-Net. The proposed ConvLSTM-UNet network is constructed as a dual-task network to enable training with weakly annotated data, in the form of approximate cell centers, termed markers, when the complete cells’ outlines are not available. We further use the fast marching method to facilitate the partitioning of clustered cells into individual connected components. Finally, we suggest an adaptation of the method for 3D microscopy sequences without drastically increasing the computational load. The method was evaluated on the Cell Segmentation Benchmark and was ranked among the top three methods on six submitted datasets. Exploiting the proposed built-in marker estimator, we also present state-of-the-art cell detection results for an additional, publicly available, weakly annotated dataset. The source code is available at https://gitlab.com/shaked0/lstmUnet.
Variational Level Set (LS) methods have been widely used in medical segmentation. However, they are limited when dealing with multi-instance objects in the real world. In addition, their segmentation results are quite sensitive to initial settings and highly dependent on the number of iterations. To address these issues and lift classic variational LS methods to the level of learnable deep learning approaches, we propose a novel definition of contour evolution named Recurrent Level Set (RLS), which employs a Gated Recurrent Unit under the energy minimization of a variational LS functional. The curve deformation process in RLS is formulated as a hidden-state evolution procedure and updated by minimizing an energy functional composed of fitting forces and contour length. By sharing convolutional features in a fully end-to-end trainable framework, we extend RLS to Contextual RLS (CRLS) to address semantic segmentation in the wild. The experimental results show that the proposed RLS improves both computational time and segmentation accuracy over the classic variational LS-based method, while the fully end-to-end system CRLS achieves competitive performance compared to state-of-the-art semantic segmentation approaches. (Source code will be made publicly available.)
Recent studies have demonstrated the superiority of deep learning in medical image analysis, especially in cell instance segmentation, a fundamental step for many biological studies. However, the excellent performance of the neural networks requires training on large, unbiased dataset and annotations, which is labor-intensive and expertise-demanding. This paper presents an end-to-end framework to automatically detect and segment NeuN stained neuronal cells on histological images using only point annotations. Unlike traditional nuclei segmentation with point annotation, we propose using point annotation and binary segmentation to synthesize pixel-level annotations. The synthetic masks are used as the ground truth to train the neural network, a U-Net-like architecture with a state-of-the-art network, EfficientNet, as the encoder. Validation results show the superiority of our model compared to other recent methods. In addition, we investigated multiple post-processing schemes and proposed an original strategy to convert the probability map into segmented instances using ultimate erosion and dynamic reconstruction. This approach is easy to configure and outperforms other classical post-processing techniques. This work aims to develop a robust and efficient framework for analyzing neurons using optical microscopic data, which can be used in preclinical biological studies and, more specifically, in the context of neurodegenerative diseases. Code is available at: https://github.com/MIRCen/NeuronInstanceSeg.
Accurately segmenting nuclei instances is a crucial step in computer-aided image analysis for extracting rich features for cellular estimation and subsequent diagnosis and treatment. It remains challenging, however, because the wide prevalence of nuclei clusters, along with the large morphological variance across organs, makes nuclei instance segmentation susceptible to over- and under-segmentation. Additionally, inevitably subjective annotation and mislabeling prevent the network from learning from reliable samples, ultimately reducing its generalization capability for robustly segmenting nuclei of unseen organs. To address these issues, we propose a novel deep neural network, namely the Contour-aware Informative Aggregation Network (CIA-Net), with a multi-level information aggregation module between two task-specific decoders. Rather than using independent decoders, it leverages the spatial and texture dependencies between nuclei and contours by bi-directionally aggregating task-specific features. Furthermore, we propose a novel smooth truncated loss that modulates losses to reduce the perturbation from outliers. Consequently, the network can focus on learning from reliable and informative samples, which inherently improves the generalization capability. Experiments on the 2018 MICCAI challenge of Multi-Organ-Nuclei-Segmentation validated the effectiveness of our proposed method, which surpassed all the other 35 competitive teams by a significant margin.
Segmentation of vertebrae and intervertebral discs (IVD) in MR images is an important step for automatic image analysis. This paper proposes an extension of an iterative vertebra segmentation method that relies on a 3D fully-convolutional neural network to segment the vertebrae one-by-one. We augment this approach with an additional segmentation step following each vertebra detection to also segment the IVD below each vertebra. To train and test the algorithm, we collected and annotated T2-weighted sagittal lumbar spine MR scans of 53 patients. The presented approach achieved a mean Dice score of 93% ± 2% for vertebra segmentation and 86% ± 7% for IVD segmentation. The method was able to cope with pathological abnormalities such as compression fractures, Schmorl’s nodes, and collapsed IVDs. In comparison, a similar network trained for IVD segmentation without knowledge of the adjacent vertebra segmentation result did not detect all IVDs (89%) and also achieved a lower Dice score of 83% ± 9%. These results indicate that combining IVD segmentation with vertebra segmentation in lumbar spine MR images can help to improve the detection and segmentation performance compared with segmenting these structures separately.
No abstract available
Individual tooth segmentation and identification from cone-beam computed tomography images are preoperative prerequisites for orthodontic treatments. Instance segmentation methods using convolutional neural networks have demonstrated ground-breaking results on individual tooth segmentation tasks, and are used in various medical imaging applications. While point-based detection networks achieve superior results on dental images, it is still a challenging task to distinguish adjacent teeth because of their similar topologies and proximate nature. In this study, we propose a point-based tooth localization network that effectively disentangles each individual tooth based on a Gaussian disentanglement objective function. The proposed network first performs heatmap regression accompanied by box regression for all the anatomical teeth. A novel Gaussian disentanglement penalty is employed by minimizing the sum of the pixel-wise multiplication of the heatmaps for all adjacent teeth pairs. Subsequently, individual tooth segmentation is performed by converting a pixel-wise labeling task to a distance map regression task to minimize false positives in adjacent regions of the teeth. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art approaches by increasing the average precision of detection by 9.1%, which results in a high performance in terms of individual tooth segmentation. The primary significance of the proposed method is two-fold: (1) the introduction of a point-based tooth detection framework that does not require additional classification and (2) the design of a novel loss function that effectively separates Gaussian distributions based on heatmap responses in the point-based detection framework.
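The penalty itself is stated explicitly in the abstract (the summed pixel-wise product of heatmaps over all adjacent tooth pairs), so it can be sketched almost directly; the `adjacent_pairs` index list and the batch-mean normalisation below are our own assumptions.

```python
import torch

def gaussian_disentanglement_penalty(heatmaps: torch.Tensor,
                                     adjacent_pairs: list) -> torch.Tensor:
    """heatmaps: (B, num_teeth, H, W) per-tooth heatmap predictions;
    adjacent_pairs: hypothetical list of (i, j) channel indices of neighbouring teeth."""
    loss = heatmaps.new_zeros(())
    for i, j in adjacent_pairs:
        # Overlapping Gaussian responses of neighbouring teeth are penalised directly.
        loss = loss + (heatmaps[:, i] * heatmaps[:, j]).sum()
    return loss / heatmaps.shape[0]  # batch-mean normalisation (our choice)
```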
Accurate segmentation of medical images is crucial for diagnosis and treatment planning. Convolutional Neural Networks (CNNs) excel at capturing local information but struggle with long-range dependencies, while Vision Transformers (ViTs) model global context well but demand substantial computation and labeled data. To address these difficulties, we introduce PSwinUNet, a hybrid CNN-Transformer framework built on partially supervised learning. Adding a Swin Transformer block to a U-shaped architecture improves PSwinUNet's ability to learn global semantics and to up-sample, and a polarized self-attention mechanism in the skip connections preserves spatial information that would otherwise be lost during downsampling. PSwinUNet outperforms the best currently available approaches on the BUSI, DRIVE, and CVC-ClinicDB datasets. For instance, it achieved Dice Similarity Coefficient (DSC) scores of 0.781, 0.896, and 0.960 on the BUSI dataset with 1/8, 1/2, and full labeled data, respectively, substantially better than the classic UNet and UNet++ models.
The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects—an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
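As a reading aid, the nested skip rule of UNet++ (each node fuses all earlier same-scale nodes with an upsampled node from the level below) can be written compactly. This toy PyTorch version fixes the depth and channel widths for illustration and omits deep supervision and pruning; see the official repository above for the real model.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNetPP(nn.Module):
    def __init__(self, chs=(32, 64, 128)):
        super().__init__()
        self.depth = len(chs)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.nodes = nn.ModuleDict()
        for i in range(self.depth):              # resolution level
            for j in range(self.depth - i):      # column along the nested pathway
                if j == 0:                       # plain encoder column
                    in_ch = 1 if i == 0 else chs[i - 1]
                else:                            # j same-level skips + one upsampled node
                    in_ch = chs[i] * j + chs[i + 1]
                self.nodes[f"{i}_{j}"] = conv_block(in_ch, chs[i])
        self.head = nn.Conv2d(chs[0], 1, 1)

    def forward(self, x):
        f = {}
        for i in range(self.depth):              # encoder backbone X[i][0]
            f[i, 0] = self.nodes[f"{i}_0"](x if i == 0 else self.pool(f[i - 1, 0]))
        for j in range(1, self.depth):           # nested decoder nodes X[i][j]
            for i in range(self.depth - j):
                inputs = [f[i, k] for k in range(j)] + [self.up(f[i + 1, j - 1])]
                f[i, j] = self.nodes[f"{i}_{j}"](torch.cat(inputs, dim=1))
        return self.head(f[0, self.depth - 1])   # output of the deepest nested column
```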
Lupus nephritis (LuN) is a manifestation of systemic lupus erythematosus defined by chronic infiltration of immune cells into the kidneys—particularly lymphocytes and dendritic cells (DCs). Ultimately, our goal is to characterize the cellular communities associated with progression to kidney failure. To accomplish this, we have generated a dataset of fluorescence confocal microscopy images of kidney biopsies from 31 LuN patients that have been stained for two T-lymphocyte populations, B-lymphocytes and two DC populations. We are using convolutional neural networks (CNNs) with a Mask R-CNN architecture to perform instance segmentation on these five classes. This multi-class instance segmentation task is hindered by an inherent class imbalance between lymphocytes and DCs, with DCs being much less prevalent. Here we discuss methods for managing class imbalance to achieve comparable instance segmentation of both DCs and lymphocytes in LuN biopsies. A network trained to identify all 5 classes yielded higher sensitivity to DCs when the training set was filtered to contain images with all 5 cell classes present. Average DC sensitivity on an independent test set improved from 0.54 to 0.63 with filtered training data. DC segmentation improved further when the network was trained specifically for DC classes. Average DC sensitivity reached 0.91 when trained separately from lymphocytes, with average Jaccard index of DCs improving from 0.69±0.2 to 0.76±0.2. Accurate segmentation of all cell types relevant to LuN pathogenesis enabled in-depth spatial analysis of the immune environments that result in renal failure in LuN patients.
The segmentation of Organs At Risk (OAR) in Computed Tomography (CT) images is an essential part of the planning phase of radiation treatment to avoid the adverse effects of cancer radiotherapy. Accurate segmentation is a tedious task in the head and neck region due to the large number of small and sensitive organs and the low contrast of CT images. Deep learning-based automatic contouring algorithms can ease this task even when the organs have irregular shapes and size variations. This paper proposes a fully automatic deep learning-based self-supervised 3D Residual UNet architecture with CBAM (Convolution Block Attention Mechanism) for organ segmentation in head and neck CT images. The Model Genesis structure and image context restoration techniques are used for self-supervision, which can help the network learn image features from unlabeled data, hence addressing the annotated medical data scarcity problem in deep networks. A new loss function is applied for training by integrating Focal loss, Tversky loss, and Cross-entropy loss. The proposed model outperforms the state-of-the-art methods in terms of dice similarity coefficient in segmenting the organs. Our self-supervised model could achieve a 4% increase in the dice score of the Chiasm, a small organ that is present in only a very few CT slices. The proposed model exhibited better accuracy for 5 out of 7 OARs than the recent state-of-the-art models, and could simultaneously segment all seven organs in an average time of 0.02 s. The source code of this work is made available at https://github.com/seeniafrancis/SABOSNet.
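A hedged sketch of a combined Focal + Tversky + cross-entropy objective of the kind described; the alpha/beta/gamma values and the equal weighting of the three terms are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, alpha=0.3, beta=0.7, gamma=2.0, eps=1e-6):
    """logits: raw network scores, e.g. (B, 1, D, H, W); target: same shape, float {0, 1}."""
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    # Focal term derived from BCE: exp(-bce) recovers p_t, the true-class probability.
    focal = ((1 - torch.exp(-bce)) ** gamma * bce).mean()
    tp = (p * target).sum()
    fp = (p * (1 - target)).sum()
    fn = ((1 - p) * target).sum()
    tversky = 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return bce.mean() + focal + tversky  # equal weighting is an assumption
```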
Automatic segmentation of organs-at-risk (OARs) in CT scans using convolutional neural networks (CNNs) is being introduced into the radiotherapy workflow. However, these segmentations still require manual editing and approval by clinicians prior to clinical use, which can be time consuming. The aim of this work was to develop a tool to automatically identify errors in 3D OAR segmentations without a ground truth. Our tool uses a novel architecture combining a CNN and a graph neural network (GNN) to leverage the segmentation's appearance and shape. The proposed model is trained with self-supervised learning on a synthetically-generated dataset of parotid segmentations with realistic contouring errors. The effectiveness of our model is assessed with ablation tests, evaluating the efficacy of different portions of the architecture as well as the use of transfer learning from an unsupervised pretext task. Our best performing model predicted errors on the parotid gland with a precision of 85.0% and 89.7% for internal and external errors respectively, and a recall of 66.5% and 68.6%. This offline QA tool could be used in the clinical pathway, potentially decreasing the time clinicians spend correcting contours by detecting regions which require their attention. All our code is publicly available at https://github.com/rrr-uom-projects/contour_auto_QATool.
Alignment between human knowledge and machine learning models is crucial for achieving efficient and interpretable AI systems. However, conventional self-supervised pre-training methods often suffer from low efficiency, as they do not incorporate human knowledge during the pre-training process and instead rely mainly on post-hoc alignment techniques. We propose Gaze Pre-Training (GzPT), a novel approach that introduces early alignment with human eye gaze information during the pre-training process to enhance both the learning efficiency and performance of self-supervised models. By leveraging contrastive learning to pull together images with similar gaze patterns, GzPT can effectively align the model with human attention during the pre-training. We demonstrate the effectiveness of our approach on three diverse medical image datasets, showing that GzPT can consistently outperform baseline methods and learn more meaningful and interpretable representations. Our findings also highlight the potential of incorporating human eye gaze as a form of passive knowledge to bridge the gap between human and machine learning in the self-supervised pre-training. Our code is available at Github.
Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that fundamentally discards critical axial coherence and 3D structural context. To address this limitation, we propose the autoencoder for enhanced self-supervised medical image learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the ‘superpatch,’ a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse large-scale public CT datasets. Our experimental results show that MAESIL delivers significant improvements over existing methods such as AE, VAE, and VQ-VAE in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
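A minimal sketch of the 3D chunking-and-masking step such a framework needs: the volume is split into cubic chunks and a random fraction is hidden before reconstruction. The patch size, mask ratio, and single masking pass are assumptions; MAESIL's actual superpatch construction and dual-masking strategy are more involved.

```python
import torch

def patchify_and_mask(vol: torch.Tensor, patch: int = 16, mask_ratio: float = 0.75):
    """vol: (B, 1, D, H, W) with D, H, W divisible by `patch`.
    Returns visible patches, their indices, and a boolean mask of hidden patches."""
    b = vol.shape[0]
    patches = (vol
               .unfold(2, patch, patch).unfold(3, patch, patch).unfold(4, patch, patch)
               .reshape(b, -1, patch ** 3))          # (B, N, patch^3) flattened chunks
    n = patches.shape[1]
    keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n, device=vol.device)
    ids = noise.argsort(dim=1)[:, :keep]             # random subset of visible patches
    visible = torch.gather(patches, 1, ids.unsqueeze(-1).expand(-1, -1, patch ** 3))
    mask = torch.ones(b, n, dtype=torch.bool, device=vol.device)
    mask.scatter_(1, ids, False)                     # True = masked, to be reconstructed
    return visible, ids, mask
```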
Preserving maximal information is one of the principles of designing self-supervised learning methodologies. To reach this goal, contrastive learning adopts an implicit approach: contrasting image pairs. However, we believe it is not fully optimal to rely on contrastive estimation alone for preservation; it is both necessary and complementary to introduce an explicit solution that preserves more information. From this perspective, we introduce Preservational Learning, which reconstructs diverse image contexts in order to preserve more information in the learned representations. Together with the contrastive loss, we present Preservational Contrastive Representation Learning (PCRL) for learning self-supervised medical representations. PCRL provides very competitive results under the pretraining-finetuning protocol, substantially outperforming both self-supervised and supervised counterparts in 5 classification/segmentation tasks. Codes are available at https://github.com/Luchixiang/PCRL.
Existing self-supervised medical image segmentation usually encounters the domain shift problem (i.e., the input distribution of pre-training is different from that of fine-tuning) and/or the multimodality problem (i.e., it is based on single-modal data only and cannot utilize the fruitful multimodal information of medical images). To solve these problems, in this work, we propose multimodal contrastive domain sharing (Multi-ConDoS) generative adversarial networks to achieve effective multimodal contrastive self-supervised medical image segmentation. Compared to the existing self-supervised approaches, Multi-ConDoS has the following three advantages: (i) it utilizes multimodal medical images to learn more comprehensive object features via multimodal contrastive learning; (ii) domain translation is achieved by integrating the cyclic learning strategy of CycleGAN and the cross-domain translation loss of Pix2Pix; (iii) novel domain sharing layers are introduced to learn not only domain-specific but also domain-sharing information from the multimodal medical images. Extensive experiments on two publicly multimodal medical image segmentation datasets show that, with only 5% (resp., 10%) of labeled data, Multi-ConDoS not only greatly outperforms the state-of-the-art self-supervised and semi-supervised medical image segmentation baselines with the same ratio of labeled data, but also achieves similar (sometimes even better) performances as fully supervised segmentation methods with 50% (resp., 100%) of labeled data, which thus proves that our work can achieve superior segmentation performances with very low labeling workload. Furthermore, ablation studies prove that the above three improvements are all effective and essential for Multi-ConDoS to achieve this very superior performance.
Self-supervised learning aims to learn transferable representations from unlabeled data for downstream tasks. Inspired by masked language modeling in natural language processing, masked image modeling (MIM) has achieved certain success in the field of computer vision, but its effectiveness in medical images remains unsatisfactory. This is mainly due to the high redundancy and small discriminative regions in medical images compared to natural images. Therefore, this paper proposes an adaptive hard masking (AHM) approach based on deep reinforcement learning to expand the application of MIM in medical images. Unlike predefined random masks, AHM uses an asynchronous advantage actor-critic (A3C) model to predict reconstruction loss for each patch, enabling the model to learn where masking is valuable. By optimizing the non-differentiable sampling process using reinforcement learning, AHM enhances the understanding of key regions, thereby improving downstream task performance. Experimental results on two medical image datasets demonstrate that AHM outperforms state-of-the-art methods. Additional experiments under various settings validate the effectiveness of AHM in constructing masked images.
Discriminative learning, restorative learning, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; and (4) enhances state-of-the-art restorative approaches, revealing that DiRA is a general mechanism for united representation learning. All code and pretrained models are available at https://github.com/JLiangLab/DiRA.
No abstract available
No abstract available
ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations
Self-supervised learning has attracted increasing attention as it learns data-driven representation from data without annotations. Vision transformer-based autoencoder (ViT-AE) by He et al. (2021) is a recent self-supervised learning technique that employs a patch-masking strategy to learn a meaningful latent space. In this paper, we focus on improving ViT-AE (nicknamed ViT-AE++) for a more effective representation of 2D and 3D medical images. We propose two new loss functions to enhance the representation during training. The first loss term aims to improve self-reconstruction by considering the structured dependencies and indirectly improving the representation. The second loss term leverages contrastive loss to optimize the representation from two randomly masked views directly. We extended ViT-AE++ to a 3D fashion for volumetric medical images as an independent contribution. We extensively evaluate ViT-AE++ on both natural images and medical images, demonstrating consistent improvement over vanilla ViT-AE and its superiority over other contrastive learning approaches. Codes are here: https://github.com/chinmay5/vit_ae_plus_plus.git.
Existing self-supervised learning methods based on contrastive learning and masked image modeling have demonstrated impressive performances. However, current masked image modeling methods are mainly utilized in natural images, and their applications in medical images are relatively lacking. Besides, their fixed high masking ratio limits the upper bound of conditional mutual information, and the considerable gradient noise reduces the information carried by the learned representations. Motivated by these limitations, in this paper we propose a masked patches selection and adaptive masking strategy based self-supervised medical image segmentation method, named MPS-AMS. We leverage the masked patches selection strategy to choose masked patches with lesions to obtain more lesion representation information, and the adaptive masking strategy is utilized to help learn more mutual information and further improve performance. Extensive experiments on three public medical image segmentation datasets (BUSI, Hecktor, and Brats2018) show that our proposed method greatly outperforms the state-of-the-art self-supervised baselines.
Although self-supervised learning methods based on masked image modeling have achieved some success in improving the performance of deep learning models, these methods have difficulty in ensuring that the masked region is the most appropriate for each image, resulting in segmentation networks that do not get the best weights in pre-training. Therefore, we propose a new adaptive-masking policy self-supervised learning method. Specifically, we model the process of masking images as a reinforcement learning problem and use the results of the reconstruction model as a feedback signal to guide the agent to learn the masking policy to select a more appropriate mask position and size for each image, helping the reconstruction network to learn more fine-grained image representation information and thus improve the downstream segmentation model performance. We conduct extensive experiments on two datasets, Cardiac and TCIA, and the results show that our approach outperforms current state-of-the-art self-supervised learning methods.
Self-supervised representation learning can boost the performance of a pre-trained network on downstream tasks for which labeled data is limited. A popular method based on this paradigm, known as contrastive learning, works by constructing sets of positive and negative pairs from the data, and then pulling closer the representations of positive pairs while pushing apart those of negative pairs. Although contrastive learning has been shown to improve performance in various classification tasks, its application to image segmentation has been more limited. This stems in part from the difficulty of defining positive and negative pairs for dense feature maps without having access to pixel-wise annotations. In this work, we propose a novel self-supervised pre-training method that overcomes the challenges of contrastive learning in image segmentation. Our method leverages Information Invariant Clustering (IIC) as an unsupervised task to learn a local representation of images in the decoder of a segmentation network, but addresses three important drawbacks of this approach: (i) the difficulty of optimizing the loss based on mutual information maximization; (ii) the lack of clustering consistency for different random transformations of the same image; (iii) the poor correspondence of clusters obtained by IIC with region boundaries in the image. Toward this goal, we first introduce a regularized mutual information maximization objective that encourages the learned clusters to be balanced and consistent across different image transformations. We also propose a boundary-aware loss based on cross-correlation, which helps the learned clusters to be more representative of important regions in the image. Compared to contrastive learning applied in dense features, our method does not require computing positive and negative pairs and also enhances interpretability through the visualization of learned clusters. Comprehensive experiments involving four different medical image segmentation tasks reveal the high effectiveness of our self-supervised representation learning method. Our results show the proposed method to outperform by a large margin several state-of-the-art self-supervised and semi-supervised approaches for segmentation, reaching a performance close to full supervision with only a few labeled examples.
No abstract available
Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), which are more relevant to medical image analysis than generic ones for natural images. However, previous work rarely paid attention to data with anatomy-oriented imaging planes, e.g., standard cardiac magnetic resonance imaging views. As these imaging planes are defined according to the anatomy of the imaged organ, pretext tasks effectively exploiting this information can pretrain the networks to gain knowledge on the organ of interest. In this work, we propose two complementary pretext tasks for this group of medical image data based on the spatial relationship of the imaging planes. The first is to learn the relative orientation between the imaging planes and implemented as regressing their intersecting lines. The second exploits parallel imaging planes to regress their relative slice locations within a stack. Both pretext tasks are conceptually straightforward and easy to implement, and can be combined in multitask learning for better representation learning. Thorough experiments on two anatomical structures (heart and knee) and representative target tasks (semantic segmentation and classification) demonstrate that the proposed pretext tasks are effective in pretraining deep networks for remarkably boosted performance on the target tasks, and superior to other recent approaches.
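The second pretext task here is simple enough to sketch directly: regress the normalised location of a slice within its stack. The encoder, head, and sampling scheme below are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceLocationPretext(nn.Module):
    """Wraps any 2D encoder that maps (B, 1, H, W) -> (B, feat_dim)."""
    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(feat_dim, 1)   # regress relative location in [0, 1]

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(slices)).squeeze(-1)

def pretext_loss(model: SliceLocationPretext, stack: torch.Tensor) -> torch.Tensor:
    """stack: (S, 1, H, W) parallel slices, ordered along the stack axis."""
    s = stack.shape[0]
    idx = torch.randint(0, s, (8,))           # sample a few slices per step
    target = idx.float() / (s - 1)            # normalised slice position in the stack
    return F.mse_loss(model(stack[idx]), target)
```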
Semi-supervised learning has proven highly effective in tackling the challenge of limited labeled training data in medical image segmentation. In general, current approaches, which rely on intra-image pixel-wise consistency training via pseudo-labeling, overlook the consistency at more comprehensive semantic levels (e.g., object region) and suffer from severe discrepancy of extracted features resulting from an imbalanced number of labeled and unlabeled data. To overcome these limitations, we present a new Dual Cross-image Semantic Consistency (DuCiSC) learning framework, for semi-supervised medical image segmentation. Concretely, beyond enforcing pixel-wise semantic consistency, DuCiSC proposes dual paradigms to encourage region-level semantic consistency across: 1) labeled and unlabeled images; and 2) labeled and fused images, by explicitly aligning their prototypes. Relying on the dual paradigms, DuCiSC can effectively establish consistent cross-image semantics via prototype representations, thereby addressing the feature discrepancy issue. Moreover, we devise a novel self-aware confidence estimation strategy to accurately select reliable pseudo labels, allowing for exploiting the training dynamics of unlabeled data. Our DuCiSC method is extensively validated on four datasets, including two popular binary benchmarks in segmenting the left atrium and pancreas, a multi-class Automatic Cardiac Diagnosis Challenge dataset, and a challenging scenario of segmenting the inferior alveolar nerve that features complicated anatomical structures, showing superior segmentation results over previous state-of-the-art approaches. Our code is publicly available at https://github.com/ShanghaiTech-IMPACT/DuCiSC
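A sketch of the prototype-alignment idea under common conventions: class prototypes are computed by masked average pooling of decoder features, and prototypes from pseudo-labeled images are pulled toward those from labeled images. The pooling formulation and the MSE alignment are our assumptions, not DuCiSC's exact losses.

```python
import torch
import torch.nn.functional as F

def class_prototypes(feats: torch.Tensor, mask: torch.Tensor, num_classes: int, eps=1e-6):
    """feats: (B, C, H, W) features; mask: (B, H, W) integer labels -> (num_classes, C)."""
    one_hot = F.one_hot(mask, num_classes).permute(0, 3, 1, 2).float()   # (B, K, H, W)
    num = torch.einsum("bchw,bkhw->kc", feats, one_hot)                  # per-class feature sums
    den = one_hot.sum(dim=(0, 2, 3)).unsqueeze(-1) + eps                 # per-class pixel counts
    return num / den

def prototype_consistency(f_lab, y_lab, f_unl, y_pseudo, num_classes):
    """Align prototypes of pseudo-labeled images toward labeled-image prototypes."""
    p_lab = class_prototypes(f_lab, y_lab, num_classes)
    p_unl = class_prototypes(f_unl, y_pseudo, num_classes)
    return F.mse_loss(p_unl, p_lab.detach())
```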
No abstract available
Few-shot medical image segmentation typically uses a joint model for registration and segmentation. The registration model aligns a labeled atlas with unlabeled images to form initial masks, which are then refined by the segmentation model. However, inevitable spatial misalignments during registration can lead to inaccuracies and diminished segmentation quality. To address this, we developed EFS-MedSeg, an end-to-end model using two labeled atlases and few unlabeled images, enhanced by data augmentation and self-supervised learning. Initially, EFS-MedSeg applies a 3D random regional switch strategy to augment atlases, thereby enhancing supervision in segmentation tasks. This not only introduces variability to the training data but also enhances the model's ability to generalize and prevents overfitting, resulting in natural and smooth label boundaries. Following this, we use a variational autoencoder for a weighted reconstruction task, focusing the model's attention on areas with lower Dice scores to ensure accurate segmentation that conforms to the atlas image's shape and structural appearance. Moreover, we introduce a self-contrastive module aimed at improving feature extraction, guided by anatomical structure priors, thus enhancing the model's convergence and segmentation accuracy. Results on multi-modal medical image datasets show that EFS-MedSeg achieves performance comparable to fully-supervised methods. Moreover, it consistently surpasses the second-best method in Dice score by 1.4%, 9.1%, and 1.1% on the OASIS, BCV, and BCH datasets, respectively, highlighting its robustness and adaptability across diverse datasets. The source code will be made publicly available at: https://github.com/NoviceFodder/EFS-MedSeg.
Despite significant progress in 3D medical image segmentation using deep learning, manual annotation remains a labor-intensive bottleneck. Self-supervised mask propagation (SMP) methods have emerged to alleviate this challenge, allowing intra-volume segmentation with just a single slice annotation. However, the previous SMP methods often rely on 2D information and ignore volumetric contexts. While our previous work, called Vol2Flow, attempts to address this concern, it exhibits limitations, including not focusing enough on local (i.e., slice-pair) information, neglecting global information (i.e., volumetric contexts) in the objective function, and error accumulation during slice-to-slice reconstruction. This paper introduces Flow2Mask, a novel SMP method, developed to overcome the limitations of previous SMP approaches, particularly Vol2Flow. During training, Flow2Mask proposes the Local-to-Global (L2G) loss to learn inter-slice flow fields among all consecutive slices within a volume in an unsupervised manner. This dynamic loss is based on curriculum learning to gradually learn information within a volume from local to global contexts. Additionally, the Inter-Slice Smoothness (ISS) loss is introduced as a regularization term to encourage changes between the slices occur consistently and continuously. During inference, Flow2Mask leverages these 3D flow fields for inter-slice mask propagation in a 3D image, spreading annotation from a single annotated slice to the entire volume. Moreover, we propose an automatic strategy to select the most representative slice as initial annotation in the mask propagation process. Experimental evaluations on different abdominal datasets demonstrate that our proposed SMP method outperforms previous approaches and improves the overall mean DSC of Vol2Flow by +2.1%, +8.2%, and +4.0% for the Sliver, CHAOS, and 3D-IRCAD datasets, respectively. Furthermore, Flow2Mask even exhibits substantial improvements in weakly-supervised and self-supervised few-shot segmentation methods when applied as a mask completion tool. The code and model for Flow2Mask are available at https://github.com/AdelehBitarafan/Flow2Mask, providing a valuable contribution to the field of medical image segmentation.
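The propagation step can be illustrated with a standard flow-warping operation: given a learned inter-slice flow field, the current slice's mask is resampled with grid_sample to produce the next slice's mask. How the flow is estimated, which is the core of Flow2Mask, is out of scope for this sketch.

```python
import torch
import torch.nn.functional as F

def warp_mask(mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """mask: (B, 1, H, W) float-valued label map of the current slice;
    flow: (B, 2, H, W) pixel displacements (dx, dy) toward the next slice."""
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(mask.device)   # (2, H, W), x first
    coords = base.unsqueeze(0) + flow                             # absolute sample positions
    gx = 2 * coords[:, 0] / (w - 1) - 1                           # normalise to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)                          # (B, H, W, 2)
    # Nearest-neighbour sampling keeps the propagated mask discrete.
    return F.grid_sample(mask, grid, mode="nearest", align_corners=True)
```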
The performance of deep learning models for medical image segmentation is often limited in scenarios where training data or annotations are limited. Self-Supervised Learning (SSL) is an appealing solution for this dilemma due to its feature learning ability from a large amount of unannotated images. Existing SSL methods have focused on pretraining either an encoder for global feature representation or an encoder-decoder structure for image restoration, where the gap between pretext and downstream tasks limits the usefulness of pretrained decoders in downstream segmentation. In this work, we propose a novel SSL strategy named Volume Fusion (VolF) for pretraining 3D segmentation models. It minimizes the gap between pretext and downstream tasks by introducing a pseudo-segmentation pretext task, where two sub-volumes are fused by a discretized block-wise fusion coefficient map. The model takes the fused result as input and predicts the category of fusion coefficient for each voxel, which can be trained with standard supervised segmentation loss functions without manual annotations. Experiments with an abdominal CT dataset for pretraining and both in-domain and out-domain downstream datasets showed that VolF led to large performance gain from training from scratch with faster convergence speed, and outperformed several state-of-the-art SSL methods. In addition, it is general to different network structures, and the learned features have high generalizability to different body parts and modalities.
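Because the pretext task is described concretely (fuse two sub-volumes with a discretized block-wise coefficient map and predict the coefficient category per voxel), it can be sketched almost directly; the block size and the number of discrete levels below are assumptions.

```python
import torch
import torch.nn.functional as F

def volume_fusion(vol_a, vol_b, num_levels=5, block=16):
    """vol_a, vol_b: (B, 1, D, H, W) sub-volumes with D, H, W divisible by `block`.
    Returns the fused input and free voxel-wise labels for the pseudo-segmentation task."""
    b, _, d, h, w = vol_a.shape
    # One discrete coefficient index per block, e.g. alpha in {0, .25, .5, .75, 1}.
    idx = torch.randint(0, num_levels, (b, 1, d // block, h // block, w // block),
                        device=vol_a.device)
    labels = F.interpolate(idx.float(), size=(d, h, w), mode="nearest").long()
    alpha = labels.float() / (num_levels - 1)
    fused = alpha * vol_a + (1 - alpha) * vol_b
    return fused, labels.squeeze(1)   # train with an ordinary CE segmentation loss
```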
The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Masked AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously ranging at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. Furthermore, we adopt a hybrid backbone to enhance the hierarchical representation learning efficiently during the pre-training. MiM was pre-trained on a large scale of available 3D volumetric images, i.e., Computed Tomography (CT) images containing various body parts. Extensive experiments on twelve public datasets demonstrate the superiority of MiM over other SSL methods in organ/tumor segmentation and disease classification. We further scale up the MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. Code is available at https://github.com/JiaxinZhuang/MiM
Discriminative, restorative, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, fail to capitalize on the potentially synergistic effects these methods may offer in a ternary setup, which, we envision can significantly benefit deep semantic representation learning. Towards this end, we developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA: (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; (4) improves reusability of low/mid-level features; and (5) enhances restorative self-supervised approaches, revealing that DiRA is a general framework for united representation learning. Code and pretrained models are available at https://github.com/JLiangLab/DiRA.
Medical image segmentation is a critical task in clinical diagnosis and treatment. However, the high cost of acquiring annotated data and the scarcity of labeled datasets have made semi-supervised learning methods a focal point of research in this field. In this paper, we propose a semi-supervised medical image segmentation method that combines multimodal data augmentation, self-supervised pretraining, and pseudo-label generation and correction strategies. By leveraging complementary information from different modalities such as CT and MRI, along with self-supervised pretraining on unlabeled data, our model significantly improves segmentation performance under limited labeled data. Moreover, a confidence-based pseudo-label generation and correction strategy further enhances the label quality during the training process. Experimental results demonstrate that the proposed method outperforms traditional fully-supervised and semi-supervised approaches across multiple medical image segmentation datasets, with notable improvements in segmentation accuracy, especially in scenarios with scarce labeled data. This method provides an effective solution for medical image segmentation tasks and shows great potential for widespread application, particularly in medical image analysis with limited annotated data.
Deep learning models have emerged as the cornerstone of medical image segmentation, but their efficacy hinges on the availability of extensive manually labeled datasets and their adaptability to unforeseen categories remains a challenge. Few-shot segmentation (FSS) offers a promising solution by endowing models with the capacity to learn novel classes from limited labeled examples. A leading method for FSS is ALPNet, which compares features between the query image and the few available support segmented images. A key question about using ALPNet is how to design its features. In this work, we delve into the potential of using features from DINOv2, which is a foundational self-supervised learning model in computer vision. Leveraging the strengths of ALPNet and harnessing the feature extraction capabilities of DINOv2, we present a novel approach to few-shot segmentation that not only enhances performance but also paves the way for more robust and adaptable medical image analysis.
Semi-supervised medical image segmentation endeavors to exploit a limited set of labeled data in conjunction with a substantial corpus of unlabeled data, with the aim of training models that can match or even exceed the efficacy of fully supervised segmentation models. Despite the potential of this approach, most existing semi-supervised medical image segmentation techniques that employ consistency regularization predominantly focus on spatial consistency at the image level, often neglecting the crucial role of feature-level channel information. To address this limitation, we propose an innovative method that integrates graph convolutional networks with a consistency regularization framework to develop a dynamic graph consistency approach. This method imposes channel-level constraints across different decoders by leveraging high-level features within the network. Furthermore, we introduce a novel self-contrast learning strategy, which performs image-level comparison within the same batch and engages in pixel-level contrast learning based on pixel positions. This approach effectively overcomes traditional contrast learning challenges related to identifying positive and negative samples, reduces computational resource consumption, and significantly improves model performance. Our experimental evaluation on three distinct medical image segmentation datasets indicates that the proposed method demonstrates superior performance across a variety of test scenarios.
Semi-supervised medical image segmentation still faces challenges although it is able to obtain better segmentation results using a small amount of labeled data and a large amount of unlabeled data. Despite the progress made by current methods in utilizing unlabeled data, they fail to exploit the full potential of labeled data in terms of improving model performance. In this paper, we propose a semi-supervised segmentation method, DistillMatch, that incorporates knowledge distillation and feature perturbation, which efficiently transfers knowledge between labeled and unlabeled data, thus making full use of the information of labeled data to improve segmentation results. DistillMatch consists of several key components: the Self-Training process based on knowledge distillation and feature perturbation, the Deterministic Knowledge Transfer (DKT) strategy, and the introduction of Teacher Assistant (TA), which work together to optimize model performance. Extensive experiments on two benchmark datasets demonstrate that our method outperforms the current state-of-the-art (SOTA) approaches, especially in terms of edge accuracy and model generalization capabilities. We also show how this performance improvement can be achieved without sacrificing computational efficiency through an effective multi-decoder implementation strategy. These experimental results not only demonstrate the effectiveness of our approach, but also highlight its practical value in medical image segmentation tasks. Code is available at https://github.com/AiEson/DistillMatch.
The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection framework (SPSS) for BSS. Specifically, SPSS comprises two main components: 1) self-paced uncertainty sample selection (SU) for explicitly improving the quality of pseudo labels in the image space, and 2) self-paced bidirectional feature contrastive learning (SC) for implicitly improving the quality of pseudo labels through enhancing the separability between class semantics in the feature space. Both SU and SC are trained collaboratively in a self-paced learning manner, ensuring that SPSS can learn from high-quality pseudo labels for BSS. Extensive experiments on two public medical image segmentation datasets demonstrate the effectiveness and superiority of SPSS over the state-of-the-art. Our code is released at https://github.com/SuuuJM/SPSS.
With the development of medical image segmentation technology, high-quality automatic segmentation methods, particularly within semi-supervised learning frameworks, have become a research hotspot. This study introduces a new semi-supervised medical image segmentation algorithm called SymMatch. The algorithm effectively leverages limited labeled data along with a large amount of unlabeled data through a symmetrical network structure and knowledge distillation techniques. SymMatch applies a spectrum of perturbations, from weak to strong, at both image and feature levels, effectively leveraging the potential of unlabeled data. Additionally, by incorporating a bi-scale distillation loss, the model’s robustness and accuracy in handling complex medical imaging data are further enhanced. Experimental results show that SymMatch demonstrates superior performance across multiple recognized medical imaging datasets (such as ACDC, LA and PanNuke). Notably, even with very limited labeled data, it maintains high segmentation accuracy. These achievements not only advance the development of semi-supervised medical image segmentation technology but also provide new ideas and methods for future research in related technologies. Code is available at https://github.com/AiEson/SymMatch.
Self-training with data augmentation has emerged as an efficacious strategy for harnessing unlabeled data in semi-supervised medical image segmentation. Within the synthetic domain, existing models make a deliberate trade-off, sacrificing some absolute performance on labeled data to bolster generalization on the predominantly abundant unlabeled data within the entire dataset. In this study, we identify the essence of such data augmentation techniques as creating a proxy data domain that serves as a bridge between labeled and unlabeled data. To this end, we optimize this approach by incorporating the concept of curriculum learning, which encompasses two primary components: a Dynamic Copy-Paste strategy and a Self-Iterative Segmentation Model. Concerning the former, dynamic scaling of the copy-paste box guides the model in acquiring shared semantics, progressing from easier (labeled data) to more challenging (unlabeled data) examples. To facilitate this incremental learning process, we have devised models that support progressive iterative evolution throughout the training phase. Our approach has demonstrated remarkable efficacy across a comprehensive series of benchmarks, consistently outperforming existing methods and achieving state-of-the-art performance.
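A sketch of a dynamically scaled copy-paste step consistent with this description: a box from a labeled image and its ground-truth mask is pasted into an unlabeled image, whose label elsewhere comes from a teacher pseudo-label, and the box ratio grows with training progress. The linear ramp schedule is our assumption.

```python
import torch

def dynamic_copy_paste(img_l, mask_l, img_u, pseudo_u, step, total_steps,
                       min_ratio=0.2, max_ratio=0.6):
    """img_l, img_u: (C, H, W); mask_l: (H, W) ground truth; pseudo_u: (H, W)
    teacher pseudo-label. The pasted box grows linearly as training proceeds."""
    _, h, w = img_l.shape
    ratio = min_ratio + (max_ratio - min_ratio) * step / total_steps
    bh, bw = int(h * ratio), int(w * ratio)
    top = torch.randint(0, h - bh + 1, (1,)).item()
    left = torch.randint(0, w - bw + 1, (1,)).item()
    mixed_img = img_u.clone()
    mixed_lab = pseudo_u.clone()
    # Paste the labeled crop (and its reliable label) into the unlabeled image.
    mixed_img[:, top:top + bh, left:left + bw] = img_l[:, top:top + bh, left:left + bw]
    mixed_lab[top:top + bh, left:left + bw] = mask_l[top:top + bh, left:left + bw]
    return mixed_img, mixed_lab
```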
This report organizes the medical image segmentation literature into five core directions: (1) architectural fusion: exploring the synergy of Transformer, Mamba, and CNN to optimize long-range and local feature perception; (2) interaction and prompt learning: medical-specific adaptation of large models such as SAM and research on interactive segmentation; (3) training paradigms: semi-/self-supervised learning, contrastive learning, and domain adaptation to address annotation scarcity; (4) clinically oriented optimization: pathology-specific segmentation research targeting boundary awareness, lightweight design, and multi-task collaboration; (5) evaluation and standardization: building common evaluation metrics, foundation models, and clinical assistance frameworks. The overall trend shows the field shifting from pure architectural adjustment toward rigorous clinical deployment, generalizable foundation models, and standardized benchmarking.