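Top-tier journals and conferences · SCI Q1 · Medical image segmentation · Mamba
Hybrid architectures: models fusing CNN/Transformer with Mamba
These works build hybrid networks that combine the strength of CNNs in local feature extraction with Mamba's ability to model long-range dependencies, with the aim of improving medical image segmentation accuracy. Many of the variants rely on multi-scale feature fusion.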
- A dual-branch network for lesion segmentation in medical images using state space models(Hao Chen, Byung-Won Min, Haifei Zhang, 2025, Quantitative Imaging in Medicine and Surgery)
- MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation(Caixu Xu, Junming Wei, Huizhen Chen, Pengchen Liang, Bocheng Liang, Ying Tan, Xin Wei, 2025, arXiv.org)
- Switch-UMamba: Dynamic scanning vision Mamba UNet for medical image segmentation(Ziyao Zhang, Qiankun Ma, Tong Zhang, Jie Chen, Hairong Zheng, Wen Gao, 2025, Medical Image Analysis)
- S-Net: A novel shallow network for enhanced detail retention in medical image segmentation(Qinghua Shang, Guanglei Wang, Xihao Wang, Yan Li, Hongrui Wang, 2025, Computer Methods and Programs in Biomedicine)
- SSRepVM-UNet: a lightweight hybrid model for medical image segmentation based on channel parallelism(Yijing Guo, Fuhang Li, Kunhua Li, Huawei Wang, Pengyu Xu, 2025, Applied Intelligence)
- ASP-VMUNet: Atrous Shifted Parallel Vision Mamba U-Net for Skin Lesion Segmentation(Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Changyu Zeng, Wenpei Bai, Guangliang Cheng, 2025, arXiv.org)
- MambaU-Lite: A Lightweight Model based on Mamba and Integrated Channel-Spatial Attention for Skin Lesion Segmentation(Thi-Nhu-Quynh Nguyen, Quang-Huy Ho, Duy-Thai Nguyen, Hoang-Minh-Quang Le, Van-Truong Pham, Thi-Thao Tran, 2024, arXiv.org)
- MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation(Thai Bui, F. Bougourzi, Fadi Dornaika, Vinh Truong Hoang, 2025, arXiv.org)
- AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation(Viet-Thanh Nguyen, Van-Truong Pham, Thi-Thao Tran, 2024, arXiv.org)
- Gl-MambaNet: A global-local hybrid Mamba network for medical image segmentation(Xiaoyan Kui, Shen Jiang, Qinsong Li, Yifei Peng, Zhipeng Hu, Beiji Zou, 2025, Neurocomputing)
- Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion(K. S. Sanjid, Md. Tanzim Hossain, M. Junayed, M. M. Uddin, 2024, arXiv.org)
- MAFUNet: Mamba with adaptive fusion UNet for medical image segmentation(Minchen Yang, Ziyi Yang, N. Ruhaiyem, 2025, Image and Vision Computing)
- TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation(Haoyu Yang, Yuxiang Cai, Jintao Chen, Xuhong Zhang, Wenhui Lei, Xiaoming Shi, Jianwei Yin, Yankai Jiang, 2025, arXiv.org)
- An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation(Dayu Tan, Cheng Kong, Yansen Su, Haijie Chen, Dongliang Yang, Junfeng Xia, Chunhou Zheng, 2025, arXiv.org)
- Prompt-Guided Dual-Path UNet with Mamba for Medical Image Segmentation(Shaolei Zhang, Jinyan Liu, Tianyi Qian, Xuesong Li, 2025, arXiv.org)
- SCFMUNet: A fusion architecture based on multi-scale state space model and channel attention for medical image segmentation(Zhiyong Huang, Zhiyu Zhao, Zhi Yu, Mingyang Hou, Shiyao Zhou, Jiahong Wang, Yan Yan, Yushi Liu, Hans Gregersen, 2025, Neural Networks)
- H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation(Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang, 2024, Neurocomputing)
- FreqConvMamba: Frequency-guided hierarchical hybrid SSM-CNN for medical image segmentation.(Yantao Song, Weixiang Dou, Yuhua Qian, Jieru Jia, Zheqing Zhu, 2026, Medical Image Analysis)
- HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation(Mingya Zhang, Limei Gu, Tingshen Ling, Xianping Tao, 2024, arXiv.org)
- Vision Mamba and xLSTM-UNet for medical image segmentation(Xin Zhong, Gehao Lu, Hao Li, 2025, Scientific Reports)
- VM-UNet: Vision Mamba UNet for Medical Image Segmentation(Jiacheng Ruan, Jincheng Li, Suncheng Xiang, 2024, ACM Transactions on Multimedia Computing, Communications, and Applications)
- VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation(Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao, 2024, arXiv.org)
- InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation(D. Kareem, Abdul Hannan, Mubashir Noman, Jean Lahoud, M. Fiaz, Hisham Cholakkal, 2025, arXiv.org)
- HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation(Jiashu Xu, 2024, arXiv.org)
- MambaVesselNet++: A Hybrid CNN-Mamba Architecture for Medical Image Segmentation(Qing Xu, Yanming Chen, Yue Li, Ziyu Liu, Zhenye Lou, Yixuan Zhang, Huizhong Zheng, Xiangjian He, 2025, ACM Transactions on Multimedia Computing, Communications, and Applications)
- HyM-UNet: Synergizing Local Texture and Global Context via Hybrid CNN-Mamba Architecture for Medical Image Segmentation(Haodong Chen, Xian-Hua Han, Qwen, 2025, arXiv.org)
- A parallel UNet integrating KAN and mamba for medical image segmentation(Jiyuan Liu, Jiabao Wu, Liming Xu, Wenxu Shi, Bochuan Zheng, 2026, Scientific Reports)
- Efficient UNet fusion of convolutional neural networks and state space models for medical image segmentation(Wenjie Meng, Aiming Mu, Huajun Wang, 2025, Digital Signal Processing)
- WTCM-UNet: A hybrid CNN-SSM framework combining wavelet transform for medical image segmentation(Zhihua Gan, Zhongxiang Xie, Yushu Zhang, Weihong Han, Bo Zhang, Xiu-li Chai, 2026, Biomedical Signal Processing and Control)
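Pure visual state space models and optimization of their core operators
These works focus on pure Mamba architectures and improvements to their core operators, pursuing computational efficiency and representational power through algorithm-level parameter optimization, adjusted scanning mechanisms, and multi-scale fusion.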
- Dual-domain guided vision Mamba network for medical image segmentation(Shangwang Liu, Mengjiao Zhao, Yinghai Lin, 2025, Expert Systems with Applications)
- LoG-VMamba 🐍: Local-Global Vision Mamba for Medical Image Segmentation(Trung Dang, Huy Hoang Nguyen, A. Tiulpin, 2024, Lecture Notes in Computer Science)
- Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation(Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, Lei Li, 2024, arXiv.org)
- HMM-VMamba: High-Order Morphological Method Vision Mamba for Medical Image Segmentation(Yifeng Yao, Beiying He, Minsheng Tan, Xiang Li, Zhenzhen Hu, Xingxing Duan, Lingna Chen, 2024, Lecture Notes in Computer Science)
- RM-UNet: UNet-like Mamba with rotational SSM module for medical image segmentation(Hao Tang, Guoheng Huang, Lianglun Cheng, Xiaochen Yuan, Qi Tao, Xuhang Chen, Guo Zhong, Xiaohui Yang, 2024, Signal, Image and Video Processing)
- UM-Mamba: An efficient U-network with medical visual state space for medical image segmentation(Hejian Chen, Qing Liu, Zhongming Fu, Li Liu, 2025, Computer Vision and Image Understanding)
- SK-VM++: Mamba assists skip-connections for medical image segmentation(Renkai Wu, L. Pan, Pengchen Liang, Qing Chang, Xianjin Wang, Weihuan Fang, 2025, Biomedical Signal Processing and Control)
- A sequential flow UNet for MRI brain tumor segmentation based on state-space-model(Jiacheng Lu, Hui Ding, Qirun Huo, Kaiwen Wang, Xinyu Sun, Shiyu Zhang, 2025, Applied Soft Computing)
- SegResMamba: An Efficient Architecture for 3D Medical Image Segmentation(B. K. Das, Ajay Singh, Saahil Islam, Gengyan Zhao, Andreas K. Maier, 2025, arXiv.org)
- Rethinking Convergence in Deep Learning: The Predictive-Corrective Paradigm for Anatomy-Informed Brain MRI Segmentation(Feifei Zhang, Zhenhong Jia, Sensen Song, Fei Shi, Dayong Ren, 2025, arXiv.org)
- PV-SSM: Exploring Pure Visual State Space Model for High-dimensional Medical Data Analysis(Cheng Wang, Xinyu Liu, Chenxin Li, Yifan Liu, Yixuan Yuan, 2024, 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- MedMamba: Multi-scale deformable attention via state space models for robust medical image segmentation(Junming Wang, Dajiang Lei, Yuqi Zhang, Jin Yuan, Chen Liu, Bin Luo, Qun Liu, Guoyin Wang, 2026, Biomedical Signal Processing and Control)
- Neural Memory State Space Models for Medical Image Segmentation(Zhihua Wang, Jingjun Gu, Wang Zhou, Quansong He, Tianli Zhao, Jialong Guo, Li Lu, Tao He, Jiajun Bu, 2024, International Journal of Neural Systems)
- SRL-UNet: An Improved Residual U-Net with 2D-Selective-Scan for Nuclear Segmentation(Junhui Xin, Jingyi Weng, Jierui Zhao, Hui Ding, Ouli Luo, Fanqian Meng, Jingbing Yang, 2025, Lecture Notes in Computer Science)
- MFEVM-UNet: Multi-scale Feature Fusion and Enhancement Vision Mamba UNet for medical image segmentation(Fengshuo Guo, Shizheng Zhang, zhenxing sun, Leilei Zhang, Junze Guo, Xin Lu, Xufan Chen, 2026, Biomedical Signal Processing and Control)
- Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation(Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kai Wu, 2024, arXiv.org)
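Scanning strategies and efficient modeling for 3D medical images
These works target the characteristics of 3D medical volumes, using multi-dimensional scanning, four-directional scanning, or hierarchical routing to handle 3D spatial continuity and the complexity of long-sequence modeling.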
- Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation(Szymon Płotka, Maciej Chrabaszcz, Gizem Mert, Ewa Szczurek, A. Sitek, 2025, arXiv.org)
- MM-UNet: Meta Mamba UNet for Medical Image Segmentation(Bin Xie, Yan Yan, G. Agam, 2025, arXiv.org)
- MambaDiff: Mamba-Enhanced Diffusion Model for 3D Medical Image Segmentation(Yu Liu, Yang Feng, Juan Cheng, Haolin Zhan, Zhiqin Zhu, 2025, IEEE Transactions on Image Processing)
- DM-SegNet: Dual-Mamba Architecture for 3D Medical Image Segmentation with Global Context Modeling(Hangyu Ji, 2025, arXiv.org)
- MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis(Wei Dai, Jun Liu, 2025, arXiv.org)
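Lightweight design and paradigm-specific enhancements
This group covers lightweight model design, few-shot learning, weakly supervised segmentation, and interactive enhancement with foundation models such as SAM, with an emphasis on optimizing paradigms for specific clinical scenarios.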
- LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation(Weibin Liao, Yinghao Zhu, Xinyuan Wang, Cehngwei Pan, Yasha Wang, Liantao Ma, 2024, arXiv.org)
- MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation(C. Bian, Nan Xia, Xia Yang, Feifei Wang, Fengjiao Wang, Bin Wei, Q. Dong, 2024, arXiv.org)
- VMS2-UNet: A Lightweight Non-Causal Vision Mamba2 Model with State Space Duality for Medical Image Segmentation(Jiawei Mo, Chaoqun Wang, 2025, 2025 International Joint Conference on Neural Networks (IJCNN))
- A Lightweight Vision Mamba Coding UNet for medical image segmentation(Yuanyuan Li, Yifei Duan, Guanqiu Qi, Baisen Cong, Li Zhang, Zhiqin Zhu, 2025, Engineering Applications of Artificial Intelligence)
- UltraLight VM-UNet: Parallel Vision Mamba significantly reduces parameters for skin lesion segmentation(Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang, 2024, Patterns)
- MDVM-UNet: An Improved Multiscale-Convolutional Dual-Pooling Vision Mamba Model for Medical Ultrasound Image Segmentation(Z Zhu, H An, W Chen, X Zhong, J Zhang, 2026, Engineering Research …)
- KM-UNet KAN Mamba UNet for medical image segmentation(Yibo Zhang, 2025, arXiv.org)
- Selective and Multi-Scale Fusion Mamba for Medical Image Segmentation(Guangju Li, Qinghua Huang, Wei Wang, Longzhong Liu, 2024, Expert Systems with Applications)
- Frequency-Enhanced Lightweight Vision Mamba Network for Medical Image Segmentation(Shangwang Liu, Yinghai Lin, Danyang Liu, Peixia Wang, Bingyan Zhou, Feiyan Si, 2025, IEEE Transactions on Instrumentation and Measurement)
- SAMM: Segment Anything Mamba Model for General Medical Image Segmentation(Qingxue Zhao, Di Wu, Siqi Wang, Jun-Wei Tian, 2026, ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation(Yufeng Jiang, Zongxi Li, Xiangyang Chen, Haoran Xie, Jing Cai, 2024, arXiv.org)
- SpectralMamba-UNet: Frequency-Disentangled State Space Modeling for Texture-Structure Consistent Medical Image Segmentation(Fuhao Zhang, Lei Liu, Jialin Zhang, Ya-nan Zhang, Nan Mu, 2026, arXiv.org)
- QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models(Tien-Yu Chi, Hung-Yueh Chiang, Diana Marculescu, Kai-Chiang Wu, 2025, arXiv.org)
- Taming Mambas for Voxel Level 3D Medical Image Segmentation(Luca Lumetti, Vittorio Pipoli, Kevin Marchesini, Elisa Ficarra, C. Grana, Federico Bolelli, 2024, arXiv.org)
- Parallel Prototype Filter and Ssm for Few Shot Medical Image Segmentation(zhx zhx, chen houjin, Yanfeng Li, sun jia, Chen Zi-wei, Jiaxin Li, 2025, SSRN Electronic Journal)
- Swin-UMamba†: Adapting Mamba-Based Vision Foundation Models for Medical Image Segmentation(Jiarun Liu, Hao Yang, Hong-Yu Zhou, Lequan Yu, Yong Liang, Yizhou Yu, Shaoting Zhang, Hairong Zheng, Shanshan Wang, 2024, IEEE Transactions on Medical Imaging)
- Mitosis detection in domain shift scenarios: a Mamba-based approach(G. Percannella, M. Sarno, F. Tortorella, M. Vento, 2025, arXiv.org)
- A Hybrid Mamba-SAM Architecture for Efficient 3D Medical Image Segmentation(Mohammadreza Gholipour Shahraki, M. Rezaeian, Mohammad Ghasemzadeh, 2026, arXiv.org)
- MS-VMUnet: Multispectral vision Mamba for medical image segmentation(Hanjun Tao, Hua Wang, Zewen Zhang, Fan Zhang, 2026, Biomedical Signal Processing and Control)
- MambaSAM: A Visual Mamba-Adapted SAM Framework for Medical Image Segmentation(Pengchen Liang, Leijun Shi, Bin Pu, Renkai Wu, Jianguo Chen, Lixin Zhou, Lite Xu, Zhuangzhuang Chen, Qing Chang, Yiwei Li, 2025, IEEE Journal of Biomedical and Health Informatics)
- Vision-Language Controlled Deep Unfolding for Joint Medical Image Restoration and Segmentation(Ping Chen, Zichen Huang, Xiangming Wang, Yung-Hsing Liu, Bing Liang, Haijin Zeng, Yongyong Chen, 2026, arXiv.org)
- Artificial Intelligence in Breast Cancer Care: Transforming Preoperative Planning and Patient Education with 3D Reconstruction(M. Khanbhai, Giulia Di Nardo, Jun Ma, V. Freitas, C. Masino, Ali Dolatabadi, Zhaoxun “Lorenz” Liu, W. Leong, Wagner H. Souza, Amin Madani, 2025, arXiv.org)
- ProMamba: Prompt-Mamba for polyp segmentation(Jianhao Xie, Ruofan Liao, Ziang Zhang, Sida Yi, Yuesheng Zhu, Guibo Luo, 2024, arXiv.org)
- Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation(Ziyang Wang, Chao Ma, 2024, arXiv.org)
- Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment(K. S. Sanjid, Md. Tanzim Hossain, M. Junayed, M. M. Uddin, 2024, arXiv.org)
- FMaMIL: Frequency-Driven Mamba Multi-Instance Learning for Weakly Supervised Lesion Segmentation in Medical Images(Hangbei Cheng, Xiaorong Dong, Xueyu Liu, Jianan Zhang, Xuetao Ma, Mingqiang Wei, Liansheng Wang, Junxing Chen, Yongfei Wu, 2025, arXiv.org)
- Proxy Prompt: Endowing SAM and SAM 2 with Auto-Interactive-Prompt for Medical Segmentation(Xinyi Wang, Hong-yu Kang, Peishan Wei, Shuai Li, Yu Sun, Sai-kit Lam, Yongping Zheng, 2025, arXiv.org)
- ASM-UNet: Adaptive Scan Mamba Integrating Group Commonalities and Individual Variations for Fine-Grained Segmentation(Bo Wang, Mengyuan Xu, Yue Yan, Yuqun Yang, Kechen Shu, W. Ping, Xu Tang, Wei Jiang, Zheng You, 2025, arXiv.org)
- Segmentation Strategies in Deep Learning for Prostate Cancer Diagnosis: A Comparative Study of Mamba, SAM, and YOLO(Ali Badiezadeh, Amin Malek Mohammadi, S. Mirhassani, P. Gifani, Majid Vafaeezadeh, 2024, arXiv.org)
- Adversarial Bidirectional Enhanced Mamba for Few-Shot Medical Image Segmentation(Bingjie Guo, Wenhui Huang, 2025, Lecture Notes in Computer Science)
- Enhancing Medical Image Segmentation with Mamba and UNet++(Ahmed AL Qurri, M. Almekkawy, 2025, 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI))
- Enhancing Medical Image Segmentation via Heat Conduction Equation(Rong Wu, Yi Yu, 2025, arXiv.org)
- Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation(Y. Lyu, Lian Xu, Mohammed Bennamoun, F. Boussaïd, Coen Arrow, Girish Dwivedi, 2025, arXiv.org)
- ABE-Mamba: Few-shot medical image segmentation via adversarial bidirectional enhanced Mamba(Bingjie Guo, Wenhui Huang, Xiaoyan Wang, 2025, Expert Systems with Applications)
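Systematic benchmarks and comprehensive evaluation
These works benchmark current medical image segmentation models (CNN, Transformer, Mamba) against each other, evaluating computational efficiency, generalization, and robustness on real clinical tasks.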
- xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart(Tianrun Chen, Chao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Yan Wang, Ying-Dong Zang, Zejian Li, 2024, arXiv.org)
- Taming Mambas for 3D Medical Image Segmentation(Luca Lumetti, Vittorio Pipoli, Kevin Marchesini, Elisa Ficarra, C. Grana, Federico Bolelli, 2025, IEEE Access)
- From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation(P. M. Kazaj, G. Baj, Y. Salimi, A. Stark, W. Valenzuela, G. Siontis, H. Zaidi, Mauricio Reyes, C. Graeni, I. Shiri, 2025, arXiv.org)
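Mamba research in medical imaging has settled into four main lines: (1) hybrid architectures that balance global and local information; (2) pure visual SSMs that probe the performance limits of the architecture; (3) scanning strategies tailored to 3D volumes for modeling high-dimensional continuity; and (4) lightweight designs and specific paradigms (SAM adaptation, weak supervision, few-shot learning) that serve diverse clinical needs. In addition, the community is using systematic comparative evaluations to clarify Mamba's practical effectiveness and performance ceiling relative to conventional architectures across medical scenarios.
88 related publications in total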
Vision foundation models have shown great potential in improving generalizability and data efficiency, especially for medical image segmentation since medical image datasets are relatively small due to high annotation costs and privacy concerns. However, current research on foundation models predominantly relies on transformers. The high quadratic complexity and large parameter counts make these models computationally expensive, limiting their potential for clinical applications. In this work, we introduce Swin-UMamba†, a novel Mamba-based model for medical image segmentation that seamlessly leverages the power of the vision foundation model, which is also computationally efficient with the linear complexity of Mamba. Moreover, we investigated and verified the impact of the vision foundation model on medical image segmentation, in which a self-supervised model adaptation scheme was designed to bridge the gap between natural and medical data. Notably, Swin-UMamba† outperforms 7 state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches across AbdomenMRI, Endoscopy, and Microscopy datasets. The code and models are publicly available at: https://github.com/JiarunLiu/Swin-UMamba.
Given the high variability in the morphology and size of lesion areas in medical images, accurate medical image segmentation requires both precise positioning of global contours and …
In the field of medical image segmentation, variant models based on Convolutional Neural Networks (CNNs) and Visual Transformers (ViTs) as the base modules have been very widely developed and applied. However, CNNs are often limited in their ability to deal with long sequences of information, while the low sensitivity of ViTs to local feature information and the problem of quadratic computational complexity limit their development. Recently, the emergence of state-space models (SSMs), especially 2D-selective-scan (SS2D), has had an impact on the long-standing dominance of traditional CNNs and ViTs as the foundational modules of visual neural networks. In this paper, we extend the adaptability of SS2D by proposing a High-order Vision Mamba UNet (H-vmunet) for medical image segmentation. The proposed High-order 2D-selective-scan (H-SS2D) progressively reduces the introduction of redundant information during SS2D operations through higher-order interactions. In addition, the proposed Local-SS2D module improves the learning ability of local features of SS2D at each order of interaction. We conducted comparison and ablation experiments on three publicly available medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB), and the results all demonstrate the strong competitiveness of H-vmunet in medical image segmentation tasks. The code is available from https://github.com/wurenkai/H-vmunet.
… In this study, we analyze the impact of Mamba on skip-connection operations for U-shaped … +) combining the UNet++ framework and Mamba. Specifically, Mamba is able to refine the …
In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed with fewer convolution layers to save calculation cost. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks, e.g. obtaining 89.03, 89.71 and 81.08 in terms of DSC score on three datasets respectively. To our best knowledge, this is the first medical image segmentation model constructed based on the pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
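The linear-complexity claim made for SSMs can be illustrated with a toy recurrence. The sketch below is a scalar, time-invariant state-space scan; it is not Mamba's selective mechanism and not VM-UNet's VSS block. The function name `ssm_scan` and the constants `a`, `b`, `c` are illustrative assumptions. It makes a single pass over the sequence, so its cost grows linearly with sequence length.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy discrete state-space recurrence (assumed, simplified form):
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    One loop over the inputs, so time is O(len(xs))."""
    h = 0.0          # hidden state carries a decayed summary of past inputs
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse at t=0 decays geometrically through the hidden state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

Each output depends on every earlier input through the hidden state, which is how a single pass can still carry long-range context.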
Recently, the field of 3D medical segmentation has been dominated by deep learning models employing Convolutional Neural Networks (CNNs) and Transformer-based architectures, each with its distinctive strengths and limitations. CNNs are constrained by a local receptive field, whereas Transformers are hindered by their substantial memory requirements as well as their data hunger, making them not ideal for processing 3D medical volumes at a fine-grained level. For these reasons, fully convolutional neural networks, as nnU-Net, still dominate the scene when segmenting medical structures in large 3D medical volumes. Despite numerous advancements toward developing Transformer variants with subquadratic time and memory complexity, these models still fall short in content-based reasoning. A recent breakthrough is Mamba, a Recurrent Neural Network (RNN) based on State-Space Models (SSMs), outperforming Transformers in many long-context tasks (million-length sequences) on famous natural language processing and genomic benchmarks while keeping a linear complexity. In this paper, we evaluate the effectiveness of Mamba-based architectures in comparison to state-of-the-art convolutional and Transformer-based models for 3D medical image segmentation across three well-established datasets: Synapse Abdomen, MSD BrainTumor, and ACDC. Additionally, we address the primary limitations of existing Mamba-based architectures by proposing alternative architectural designs, hence improving segmentation performances. The source code is publicly available to ensure reproducibility and facilitate further research: https://github.com/LucaLumetti/TamingMambas
The Segment Anything Model (SAM) has shown exceptional versatility in segmentation tasks across various natural image scenarios. However, its application to medical image segmentation poses significant challenges due to the intricate anatomical details and domain-specific characteristics inherent in medical images. To address these challenges, we propose a novel VMamba adapter framework that integrates a lightweight, trainable Visual Mamba (VMamba) branch with the pre-trained SAM ViT encoder. The VMamba adapter accurately captures multi-scale contextual correlations, integrates global and local information, and reduces ambiguities arising from local features only. Specifically, we propose a novel cross-branch attention (CBA) mechanism to facilitate effective interaction between the SAM and VMamba branches. This mechanism enables the model to learn and adapt more efficiently to the nuances of medical images, extracting rich, complementary features that enhance its representational capacity. Beyond architectural enhancements, we streamline the segmentation workflow by eliminating the need for prompt-driven input mechanisms. This results in an autonomous prediction model that reduces manual input requirements and improves operational efficiency. In addition, our method introduces only minimal additional trainable parameters, offering an efficient solution for medical image segmentation. Extensive evaluations of four medical image datasets demonstrate that our VMamba adapter framework achieves state-of-the-art performance. Specifically, on the ACDC dataset with limited training data, our method achieves an average Dice coefficient improvement of 0.18 and reduces the Hausdorff distance by 20.38 mm compared to the AutoSAM.
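For readers unfamiliar with the Dice coefficient reported above, here is a minimal sketch of how the overlap score is commonly computed on binary masks. It is not the authors' evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping pixel, two predicted, one in the reference mask.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```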
Medical image segmentation is predominantly achieved with U-Net architectures based on Convolutional Neural Networks (CNNs). However, U-Net has two primary limitations. First, CNNs are constrained in modeling long-range dependencies, a limitation that is partially addressed by transformers, which face challenges due to their quadratic computational complexity. Second, there is a semantic gap in U-Net between feature maps in the encoder and decoder, especially between shallow and deep layers. To address these issues, we propose Mamba-UNet++, which alleviates the limited receptive field using a Visual State Space Duality (VSSD) vision block based on the improved Mamba2 VSS block. To bridge the semantic gap, Mamba-UNet++ replaces U-Net's direct skip connections with UNet++ dense skip connections and incorporates deep supervision during training. Extensive experiments on three datasets across different modalities show that Mamba-UNet++ outperforms competing methods, as evidenced by metrics such as DSC and HD95.
… With the proposed Mamba-based modules, we present our segmentation models for both 2D and 3D medical imaging data. For 2D segmentation, we build our model on top of Swin-…
Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies. While Transformers excel at modeling global information, their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates Structured State Space Models (SSM) and lightweight LSTMs (xLSTM). The network incorporates Visual State Space (VSS) and ViL modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages SSM to capture long-range dependencies and extract critical features from distant regions. Meanwhile, the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on datasets such as ISIC17, ISIC18, CVC-ClinicDB, and Kvasir demonstrate that VMAXL-UNet significantly outperforms traditional CNNs and Transformer-based models in capturing lesion boundaries and their distant correlations. These results highlight the model’s superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios.
Medical image segmentation plays an important role in computer-aided diagnosis. Traditional convolution-based U-shape segmentation architectures are usually limited by the local receptive field. Existing vision transformers have been widely applied to diverse medical segmentation frameworks due to their superior capabilities of capturing global contexts. Despite the advantage, the real-world application of vision transformers is challenged by their non-linear self-attention mechanism, requiring huge computational costs. To address this issue, the selective state space model (SSM) Mamba has gained recognition for its adeptness in modeling long-range dependencies in sequential data, particularly noted for its efficient memory costs. In this paper, we propose MambaVesselNet++, a Hybrid CNN-Mamba framework for medical image segmentation. Our MambaVesselNet++ is comprised of a hybrid image encoder (Hi-Encoder) and a bifocal fusion decoder (BF-Decoder). In Hi-Encoder, we first devise the texture-aware layer to capture low-level semantic features by leveraging convolutions. Then, we utilize Mamba to effectively model long-range dependencies with linear complexity. The BF-Decoder adopts skip connections to combine local and global information of the Hi-Encoder for the accurate generation of segmentation masks. Extensive experiments demonstrate that MambaVesselNet++ outperforms current convolution-based, transformer-based, and Mamba-based state-of-the-art methods across diverse medical 2D, 3D, and instance segmentation tasks. The code is available at https://github.com/CC0117/MambaVesselNet.
… Mamba architecture has garnered considerable attention in medical image segmentation … low computational overhead while maintaining high segmentation accuracy. Based on this, we …
Automatic segmentation of medical images is a crucial step for lesion measurement in computer-aided diagnosis. Convolutional neural networks (CNNs) and vision transformers (ViTs) are widely adopted but have limitations. To address these challenges, we propose a frequency-enhanced lightweight vision Mamba (FMamba) network for automatic medical image segmentation. Specifically, we introduce the vision state-space (VSS) and frequency feature enhancement (FFE) modules for efficient parallel feature extraction. The VSS module employs 2-D-Selective-Scan (SS2D) to scan feature maps in multiple directions, effectively building long-range dependencies. At the same time, the FFE module refines the frequency domain of the feature maps, yielding enhanced global feature representations, thereby enhancing global context awareness. Compared to UNet, our method reduces GFLOPs and Parameters by 25.99 times and 5.84 times, respectively. On the BUSI dataset, Dice and intersection over union (IoU) scores improved by 3.25% and 3.35%, respectively. On Dataset B, improvements were 2.69% and 2.21%, respectively. Our method can effectively integrate state-space model (SSM) and frequency domain features, surpassing existing methods in medical image segmentation tasks.
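The multi-directional scanning attributed to SS2D can be pictured as flattening the same feature map in several traversal orders before a 1D sequence model reads it. A minimal sketch follows, assuming the four orders commonly described for VMamba-style cross-scan (row-major, column-major, and their reverses). The function name is hypothetical, and the abstract does not state which orders FMamba itself uses.

```python
def four_direction_scans(grid):
    """Flatten a 2D grid (list of equal-length rows) in four traversal orders."""
    height, width = len(grid), len(grid[0])
    # Row-major: left to right, top to bottom.
    row_major = [grid[r][c] for r in range(height) for c in range(width)]
    # Column-major: top to bottom, left to right.
    col_major = [grid[r][c] for c in range(width) for r in range(height)]
    return {
        "row": row_major,
        "row_rev": row_major[::-1],   # same path walked backwards
        "col": col_major,
        "col_rev": col_major[::-1],
    }


if __name__ == "__main__":
    for name, seq in four_direction_scans([[1, 2], [3, 4]]).items():
        print(name, seq)
```

Neighbouring pixels that are far apart in one ordering are adjacent in another, which is the usual motivation for combining several scans.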
… been the important approach for segmentation due to their ability … a dual-domain guided vision Mamba network that strikes a … The second stage applies vision Mamba in both the spatial …
… (RPE) on the Vision Mamba architecture to address the limitation of the Mamba module’s … UNet outperforms current lightweight segmentation methods and strikes a balance between …
Recently, State Space Models (SSMs), particularly the Mamba-based framework, have demonstrated exceptional performance in medical image segmentation. This is attributed to their capacity to capture long-range dependencies efficiently with linear computational complexity. Nonetheless, current Mamba-based models encounter challenges in preserving the spatial context of 2D visual features, which is a consequence of their reliance on static 1D selective scanning patterns. In this study, we present Switch-UMamba, an innovative hybrid UNet framework that integrates local feature extraction power of Convolutional Neural Networks (CNNs) with the abilities of SSMs for capturing the long-range dependency. Switch-UMamba capitalizes on the Switch Visual State Space (VSS) module to leverage the Mixture-of-Scans (MoS) approach, a new scanning mechanism that amalgamates diverse scanning policies by considering each scan head as an expert within the Mixture-of-Experts (MoE) framework. MoS employs a router to dynamically allocate appropriate scanning policies and corresponding scan heads for each sample. This sparse-activated dynamic scanning approach not only ensures a rich and comprehensive acquisition of spatial information but also curtails computational expenses. Our comprehensive experimental evaluation on several medical image segmentation benchmarks indicates that Switch-UMamba has achieved state-of-the-art performances without using any pretrained weights. It is also worth highlighting that our approach outperforms other Mamba-based models with fewer parameters.
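The routing idea behind Mixture-of-Scans can be sketched with standard top-k gating from the Mixture-of-Experts literature. This is an illustration under assumptions, not Switch-UMamba's implementation: the function name `route_top_k`, the softmax gate, and the renormalization over the selected experts are all choices made here, and the abstract does not specify them.

```python
import math


def route_top_k(logits, k=2):
    """Select the k highest-scoring scan experts for one sample.

    Returns {expert_index: weight}, with weights renormalized to sum to 1.
    Experts outside the top k are skipped entirely (sparse activation)."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    peak = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}


if __name__ == "__main__":
    # Hypothetical router scores for four scan policies.
    print(route_top_k([2.0, 0.0, 1.0, -1.0], k=2))
```

Only the selected scan heads would be evaluated for a given sample, which is where the compute saving of sparse activation comes from.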
Summary: Traditionally, to improve segmentation performance, most approaches rely on increasingly complex modules. This is ill-suited to the medical field, especially to mobile medical devices, where computationally heavy models cannot be deployed in real clinical environments because of limited computational resources. Recently, state-space models, represented by Mamba, have become a strong competitor to traditional convolutional neural networks and transformers. In this paper, we explore in depth the key factors that influence the parameter count in Mamba and, on this basis, propose an UltraLight Vision Mamba UNet (UltraLight VM-UNet). Specifically, we propose a method for processing features in parallel Vision Mamba, named the PVM Layer, which achieves competitive performance with the lowest computational complexity while keeping the overall number of processing channels constant. We conducted segmentation experiments on three public skin lesion datasets and showed that UltraLight VM-UNet exhibits competitive performance with only 0.049M parameters and 0.060 GFLOPs.
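The arithmetic below hints at why splitting channels into parallel branches can shrink a model while the total channel count stays fixed. It is a sketch under stated assumptions: a bias-free dense channel-mixing projection stands in for a Mamba block (whose real parameter count has more terms), and the group count of 4 is hypothetical.

```python
def dense_params(channels):
    """Weights in a bias-free channels-by-channels projection (stand-in layer)."""
    return channels * channels


def parallel_params(channels, groups=4):
    """Weights when channels are split into equal groups processed in parallel."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    return groups * dense_params(per_group)


full = dense_params(64)         # one layer over all 64 channels
split = parallel_params(64, 4)  # four layers over 16 channels each
print(full, split, full / split)
```

Because the cost of the stand-in layer grows quadratically with channel width, four branches of width C/4 need a quarter of the weights of one branch of width C.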
With the rapid advancement of deep learning, computer-aided diagnosis and treatment have become crucial in medicine. UNet is a widely used architecture for medical image segmentation, and various methods for improving UNet have been extensively explored. One popular approach is incorporating transformers, though their quadratic computational complexity poses challenges. Recently, State-Space Models (SSMs), exemplified by Mamba, have gained significant attention as a promising alternative due to their linear computational complexity. Another approach, neural memory Ordinary Differential Equations (nmODEs), exhibits similar principles and achieves good results. In this paper, we explore the respective strengths and weaknesses of nmODEs and SSMs and propose a novel architecture, the nmSSM decoder, which combines the advantages of both approaches. This architecture possesses powerful nonlinear representation capabilities while retaining the ability to preserve input and process global information. We construct nmSSM-UNet using the nmSSM decoder and conduct comprehensive experiments on the PH2, ISIC2018, and BU-COCO datasets to validate its effectiveness in medical image segmentation. The results demonstrate the promising application value of nmSSM-UNet. Additionally, we conducted ablation experiments to verify the effectiveness of our proposed improvements on SSMs and nmODEs.
Medical image segmentation is essential for disease diagnosis and therapy planning, but the complexity of multi-organ structures and blurred skin lesion boundaries poses challenges. CNNs and Transformers are constrained by limited receptive fields and high computational complexity, respectively. The state-space model effectively captures long-range dependencies with linear complexity but struggles with local modeling and channel attention. These methods struggle to detect subtle differences in lesion areas, leading to poor performance in medical image segmentation, especially when the lesions are discontinuous or boundaries are unclear. To address these challenges, we propose SCFMUNet, which enhances both local and global modeling across multi-scale features and effectively captures spatial and channel semantics. SCFMUNet integrates three key fusion strategies: 1) At the bottleneck, the multi-scale state-space fusion module combines convolutions with the SS2D method to process and fuse the encoder stage features. 2) In the skip connections, the gated adaptive channel mechanism dynamically adjusts the encoder features and fuses them with the decoder stage features using channel-wise addition. 3) In the decoder stages, the spatial channel state-space model performs spatial and channel-level modeling on the fused features from the skip connection stage and the previous decoder layer. Experiments were conducted on four public datasets. On the Synapse and ACDC datasets, SCFMUNet achieved Dice scores of 82.31% and 92.14%. Compared to state-of-the-art methods, SCFMUNet improves Dice by 0.85% on Synapse and 1.0% on ACDC. On the ISIC2017 and ISIC2018 skin lesion datasets, SCFMUNet achieved Dice scores of 90.69% and 89.69%, with improvements ranging from 0.5% to 2% compared to state-of-the-art methods. Experimental results show that SCFMUNet outperforms state-of-the-art methods on four publicly available biomedical datasets. The source code is publicly available at https://github.com/zzzeed/SCFMUNet.
… Accurate and robust medical image segmentation remains a critical challenge due to the … -scale spatially adaptive attention with efficient state space models (SSMs). Our architecture …
… State Space Models … Medical Visual State Space (MVSS) block with 2D Spiral Selective Scanning (SSS2D) module as the core, and constructs a U-shaped medical image segmentation …
… large-scale medical image datasets. … model, this study proposes an innovative model, CNS-UNet: Combined Neural Network and State Space Model in UNet, which integrates the State …
Background: Lesion segmentation in medical images is crucial for clinical diagnosis and treatment planning. However, existing methods often struggle to effectively extract both local and global features, limiting segmentation accuracy. To address this challenge, we propose a dual-branch network that integrates state space models (SSMs) with deep convolutional networks to enhance the extraction of both local and global features, thus improving lesion segmentation performance. Methods: The proposed model employs a dual-branch encoder: one branch incorporates the visual state space encoder to efficiently model long-range contextual dependencies, while the other branch, based on the residual network, extracts hierarchical local features. To refine feature representation, we introduce a lightweight multi-scale depth-wise separable convolution block, ensuring adaptability to varying lesion sizes while maintaining computational efficiency. The fused features are processed by the decoder for high-precision segmentation. Results: Extensive experiments on the Kaggle_3M and Kvasir-SEG datasets demonstrated that the proposed model outperformed existing state-of-the-art models. Specifically, it achieved a Dice similarity coefficient (Dice) of 0.9140 and a false negative rate (FNR) of 0.0800 on the Kaggle_3M dataset, and a Dice of 0.9173 and an FNR of 0.0788 on the Kvasir-SEG dataset. Compared to other models, our model delivered superior quantitative results and visual segmentation performance. In addition, when trained on Kvasir-SEG and tested on two external datasets, our model demonstrated superior cross-dataset generalization. Conclusions: The proposed model integrates SSMs and deep convolutional networks to improve lesion segmentation by effectively capturing both local and global features. It offers new insights for medical image segmentation with potential clinical applications.
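For reference, the two metrics reported above follow directly from pixel-level confusion counts. The sketch below uses the standard definitions on flattened binary masks; the conventions for empty masks (Dice of 1.0, FNR of 0.0) and the toy masks are assumptions of this example, not values from the paper.

```python
def dice_and_fnr(pred, target):
    """Dice similarity coefficient and false negative rate for 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 1.0        # both masks empty
    fnr = fn / (fn + tp) if (fn + tp) else 0.0     # no positive pixels
    return dice, fnr


pred = [1, 1, 0, 0, 1, 0]    # toy prediction, flattened
target = [1, 0, 0, 1, 1, 0]  # toy ground truth, flattened
dice, fnr = dice_and_fnr(pred, target)
print(round(dice, 4), round(fnr, 4))
```

Here two lesion pixels are found, one is missed, and one background pixel is wrongly marked, so Dice is 4/6 and FNR is 1/3.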
… Since the image masks utilized in our medical image segmentation are binary, the segmentation task can be regarded as a pixel-wise binary classification. Hence, we employ a …
Accurate segmentation of medical images is a fundamental prerequisite for quantitative disease diagnosis, treatment planning, and computational pathology. Although convolutional neural networks (CNNs) and Mamba-based approaches have shown promise in this domain, each comes with distinct strengths and limitations. To address these challenges, we propose a novel hierarchical network named FreqConvMamba. The core innovation of this architecture lies in its frequency-guided feature extraction mechanism, which enables simultaneous modeling of both local and global information across spatial and frequency domains. Furthermore, the integration of Haar wavelet transformation decomposes features into different frequency components, thereby enhancing the representation of fine details such as anatomical boundaries. We also introduce a Frequency Position Encoding (FPE) module that incorporates positional encoding along the frequency dimension, embedding spatial structural awareness while preserving the discriminative nature of frequency representations. This design effectively mitigates the lack of spatial perception in frequency-domain features and significantly improves the efficiency of frequency-aware feature extraction. Experimental evaluations on five public datasets spanning three imaging modalities demonstrate that FreqConvMamba outperforms state-of-the-art methods across multiple performance metrics. Code is available at: https://github.com/ccode-Rookie/FreqConvMamba.
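The Haar decomposition mentioned above can be illustrated with a single-level 2D transform on a small grid. This is a generic orthonormal Haar step, not FreqConvMamba's implementation; the function name and the sub-band names (`row_diff`, `col_diff`, `diag`) are chosen here for clarity, since LH/HL naming conventions differ between libraries.

```python
def haar2d_level1(image):
    """One level of an orthonormal 2D Haar transform on an even-sized grid.

    Returns four half-resolution sub-bands: a low-frequency approximation
    and three detail bands that respond to edges.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    ll, row_diff, col_diff, diag = [], [], [], []
    for i in range(0, height, 2):
        r_ll, r_rd, r_cd, r_dg = [], [], [], []
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # local average (scaled)
            r_rd.append((a + b - c - d) / 2)  # top row minus bottom row
            r_cd.append((a - b + c - d) / 2)  # left column minus right column
            r_dg.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        row_diff.append(r_rd)
        col_diff.append(r_cd)
        diag.append(r_dg)
    return ll, row_diff, col_diff, diag


image = [[1, 1, 5, 5],
         [1, 1, 5, 5],
         [2, 2, 2, 2],
         [4, 4, 4, 4]]
ll, row_diff, col_diff, diag = haar2d_level1(image)
print(ll, row_diff)
```

Flat regions land entirely in the approximation band, while the boundary between the rows of 2s and 4s shows up only in `row_diff`, which is the sense in which wavelet details highlight boundaries.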
… Medical image segmentation … image segmentation. However, CNNs struggle with global information extraction due to their limited receptive field and insensitivity to low-frequency image …
Despite previous endeavors to utilize Convolutional Neural Networks and Transformers as base networks for medical image analysis, their architectures still harbor inherent limitations: either an inability to model long-range dependencies or colossal computational consumption due to global self-attention. Recently, State Space Models (SSMs) have exhibited impressive capabilities in modeling long-term dependencies with satisfactory linear computational complexity. Nevertheless, extant medical visual SSMs are constrained by their limited capacity to capture inter-patch relationships and by inefficient modeling due to the introduction of additional depth convolutions to handle high-dimensional data. In this paper, we propose a novel Pure Visual State Space Model (PV-SSM) for high-dimensional medical data analysis. Unlike prior medical visual SSMs, our proposed framework does not involve any convolutional or global attention operations and instead leverages a series of Pure-SSM blocks that employ a novel parallel-SSM mechanism to simultaneously extract feature data across different dimensions. Furthermore, we propose a learnable Parameterized Positional Encoding, which incorporates absolute positional information into patch features, effectively endowing inter-patch relationships with stronger inferential capabilities. We conducted extensive validation on various modalities of medical imaging data. Experimental results demonstrate the superior performance and efficacy of our model against existing models. Our code is available at https://github.com/chengwang96/PV-SSM
… In medical image segmentation, SSM-based designs demonstrate strong accuracy–… -purpose medical image segmentation framework that is compatible with SAM yet tailored to medical …
… Accordingly, we propose a few-shot medical image segmentation method based on parallel prototype filtering and state space models, named as PPFS. To obtain high-quality support …
… These results demonstrate clear performance gains across multiple benchmarks, establishing SF-SSM UNet as a state-of-the-art approach for sequential medical image segmentation. …
CNN-based and Transformer-based models are widely applied in medical image segmentation. However, CNN models exhibit limitations in capturing long-range dependencies, while Transformer models demand a substantially higher number of parameters due to their self-attention mechanism. To address these challenges, some methods adopt State Space Models (SSM), which excel at modeling long-range interactions with a compact parameter design. Recently, Mamba2 introduces State Space Duality (SSD), an improved variant of SSM that enhances model performance and efficiency. Nevertheless, the inherent causal property of SSM/SSD restricts their applicability in medical image segmentation. To overcome this limitation, we propose a novel method, Vision Mamba2 State-Space UNet (VMS2-UNet), a U-shaped lightweight Vision Mamba2 model with Non-Causal State Space Duality, specifically designed for medical image segmentation. VMS2-UNet integrates multiple NCState Blocks (NCS Blocks), which adopt the non-causal format of SSD to efficiently model long-range dependencies while preserving parameter efficiency. To mitigate spatial information loss caused by downsampling, we introduce FIMA (Feature Integration and Modulation Attention), which enhances feature integration in the skip connections, improving segmentation performance while adding only 0.04M parameters. We conduct extensive experiments on three challenging benchmarks, and the results show that VMS2-UNet achieves competitive performance in medical image segmentation, outperforming several state-of-the-art methods while maintaining a lightweight design.
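The causal property discussed above can be seen in a toy linear recurrence: each output depends only on earlier inputs, so an impulse never influences positions before it. The sketch below is not the non-causal SSD formulation of VMS2-UNet. It uses a scalar recurrence with a fixed, hypothetical decay, and it shows a common workaround (summing a forward and a backward scan) purely to illustrate what non-causal context means.

```python
def causal_scan(x, decay=0.5):
    """Scalar linear recurrence h_t = decay * h_{t-1} + x_t (causal)."""
    h, out = 0.0, []
    for v in x:
        h = decay * h + v
        out.append(h)
    return out


def bidirectional_scan(x, decay=0.5):
    """Forward plus backward scan; x is subtracted once to avoid double counting."""
    fwd = causal_scan(x, decay)
    bwd = causal_scan(x[::-1], decay)[::-1]
    return [f + b - v for f, b, v in zip(fwd, bwd, x)]


impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
print(causal_scan(impulse))         # nothing appears before the impulse
print(bidirectional_scan(impulse))  # context spreads in both directions
```

For images, where no pixel is naturally "earlier" than another, the one-sided response of the causal scan is the restriction that non-causal designs aim to remove.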
… In the field of medical image processing, combining global and local relationship modeling is an effective method for achieving precise image segmentation. Previous studies have …
… Mamba, we propose Mamba with Adaptive Fusion UNet (MAFUNet). First, we design a hierarchy-aware Mamba … across different channel branches through Mamba and balances feature …
Medical image segmentation is fundamental for delineating lesion and organ boundaries in clinical workflows. While UNet-based models remain widely used, CNN-dominant designs are limited in modeling long-range context, and Transformer-based variants often introduce substantial computational overhead due to quadratic attention. To address this issue, we propose KMP-UNet, a parallel U-shaped framework that combines a Mamba-based state-space branch for linear-complexity contextual modeling and a Kolmogorov–Arnold Network (KAN) branch for nonlinear feature representation. We further introduce a task-oriented fusion block and a skip refinement module to better exploit hierarchical encoder–decoder features. KMP-UNet has a compact model size (about 1.0M parameters in our implementation). We evaluate the proposed method on four public datasets (ISIC2017, ISIC2018, CVC-ClinicDB, and BUSI) using standard segmentation metrics. On ISIC2018, KMP-UNet achieves 0.9038 DSC and 0.9600 accuracy under our protocol. Extensive comparisons and targeted ablations are conducted to analyze the contribution of each component.
… 3.4 HC-SS2D This paper designs the HC-SS2D module to … a range of other medical image segmentation approaches, … recognition in the medical image segmentation community. In …
BACKGROUND AND OBJECTIVE: In recent years, deep U-shaped network architectures have been widely applied to medical image segmentation tasks, achieving notable successes. However, the inherent limitation of this architecture is that multiple down-sampling operations lead to significant loss of input image detail. A series of improvements to skip connections designed to enhance information transfer have not fundamentally resolved the issue. Therefore, we consider retaining information in a simpler and more effective way. METHODS: In this paper, we propose a novel shallow network, S-Net, which contains only two output resolution stages, allowing for the preservation of more detailed information from the input images. To address the challenge of shallow networks primarily relying on high-resolution feature maps as the main information flow, we propose a Global-Local Feature Fusion (GLFF) module at the network bottleneck layer. This module integrates the superior global contextual information extraction capabilities of Mamba with the local feature capturing abilities of multi-scale depthwise convolutions, enabling the extraction of crucial semantic features from high-resolution feature maps within a shallow network architecture, while maintaining a smaller model size. RESULTS: Extensive experiments on four different types of medical image datasets show that S-Net achieves the best segmentation performance compared to existing models, with more refined segmentation details. For example, on the ultrasound dataset (BUSI), the IoU is 2.95% higher and the Dice is 2.27% higher than those of the second-best model. Additionally, S-Net has only 1.52M parameters, making it competitive in terms of lightweight design. CONCLUSIONS: Comparative and ablation experiments demonstrate the efficiency of the proposed architecture and modules. They show that many down-sampling operations are not needed to significantly reduce the size of feature maps. This work provides new research ideas for further improving the accuracy of medical image segmentation and expands the research direction for lightweight model design. The code will be available at: https://github.com/qinghua0715/S-Net.
… -Scan (Local-SS2D), integrating a local scanning strategy with 2D-Selective-Scan (SS2D). We … of our proposed MFEVM-UNet model in 2D medical segmentation tasks. The code can be …
SSRepVM-UNet: a lightweight hybrid model for medical image segmentation based on channel parallelism
… the SS-RepVM module and explore the application of a hybrid model that combines the lightweight convolutional RepViT module and SS2D in parallel for medical image segmentation. …
… In the discriminator, we introduce the cross-SS2D block to model global contextual relationships between real and fake target features. This design enhances the discriminator’s …
Traditional Few-Shot Medical Image Segmentation (FSMIS) methods primarily focus on mining information from support image to guide query image segmentation, while insufficiently …
… is optimized for computational efficiency and processing long sequences of feature … features in medical image processing. We propose HMM-SS2D, which combines the SS2D module …
… for dense medical image segmentation. Nevertheless, accurate segmentation in clinical … methods, confirming its effectiveness and practical value for medical image segmentation. …
… segmentation framework that augments the attention-based U-Net architecture with 2D-Selective-Scan (SS2D) … details, making it foundational in medical image segmentation [5]. Adding …
Accurate 3D medical image segmentation is crucial for diagnosis and treatment. Diffusion models demonstrate promising performance in medical image segmentation tasks due to the progressive nature of the generation process and the explicit modeling of data distributions. However, the weak guidance of conditional information and insufficient feature extraction in diffusion models lead to the loss of fine-grained features and structural consistency in the segmentation results, thereby affecting the accuracy of medical image segmentation. To address this challenge, we propose a Mamba-Enhanced Diffusion Model for 3D Medical Image Segmentation. We extract multilevel semantic features from the original images using an encoder and tightly integrate them with the denoising process of the diffusion model through a Semantic Hierarchical Embedding (SHE) mechanism, to capture the intricate relationship between the noisy label and image data. Meanwhile, we design a Global-Slice Perception Mamba (GSPM) layer, which integrates multi-dimensional perception mechanisms to endow the model with comprehensive spatial reasoning and feature extraction capabilities. Experimental results show that our proposed MambaDiff achieves more competitive performance compared to prior arts with substantially fewer parameters on four public medical image segmentation datasets including BraTS 2021, BraTS 2024, LiTS and MSD Hippocampus. The source code of our method is available at https://github.com/yuliu316316/MambaDiff
Accurate medical image segmentation requires effective modeling of both global anatomical structures and fine-grained boundary details. Recent state space models (e.g., Vision Mamba) offer efficient long-range dependency modeling. However, their one-dimensional serialization weakens local spatial continuity and high-frequency representation. To this end, we propose SpectralMamba-UNet, a novel frequency-disentangled framework to decouple the learning of structural and textural information in the spectral domain. Our Spectral Decomposition and Modeling (SDM) module applies discrete cosine transform to decompose low- and high-frequency features, where low frequency contributes to global contextual modeling via a frequency-domain Mamba and high frequency preserves boundary-sensitive details. To balance spectral contributions, we introduce a Spectral Channel Reweighting (SCR) mechanism to form channel-wise frequency-aware attention, and a Spectral-Guided Fusion (SGF) module to achieve adaptively multi-scale fusion in the decoder. Experiments on five public benchmarks demonstrate consistent improvements across diverse modalities and segmentation targets, validating the effectiveness and generalizability of our approach.
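The frequency disentanglement described above can be illustrated by splitting a signal into low- and high-frequency parts with a discrete cosine transform. This is a minimal sketch under several assumptions: a 1D signal stands in for a 2D feature map, the transform is a plain orthonormal DCT-II written out by hand, and the `cutoff` index is hypothetical. It does not reproduce the SDM module.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def split_bands(x, cutoff):
    """Return (low, high) signals; coefficients below cutoff count as low frequency."""
    coeffs = dct(x)
    low = idct([c if k < cutoff else 0.0 for k, c in enumerate(coeffs)])
    high = idct([c if k >= cutoff else 0.0 for k, c in enumerate(coeffs)])
    return low, high


signal = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0, 1.0]  # smooth trend plus a sharp peak
low, high = split_bands(signal, cutoff=2)
print([round(v, 2) for v in low])
print([round(v, 2) for v in high])
```

The two bands sum back to the original signal, so each can be routed to a different branch (global context for the low band, boundary detail for the high band) without losing information.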
Accurate segmentation of 3D medical images such as MRI and CT is essential for clinical diagnosis and treatment planning. Foundation models like the Segment Anything Model (SAM) provide powerful general-purpose representations but struggle in medical imaging due to domain shift, their inherently 2D design, and the high computational cost of fine-tuning. To address these challenges, we propose Mamba-SAM, a novel and efficient hybrid architecture that combines a frozen SAM encoder with the linear-time efficiency and long-range modeling capabilities of Mamba-based State Space Models (SSMs). We investigate two parameter-efficient adaptation strategies. The first is a dual-branch architecture that explicitly fuses general features from a frozen SAM encoder with domain-specific representations learned by a trainable VMamba encoder using cross-attention. The second is an adapter-based approach that injects lightweight, 3D-aware Tri-Plane Mamba (TPMamba) modules into the frozen SAM ViT encoder to implicitly model volumetric context. Within this framework, we introduce Multi-Frequency Gated Convolution (MFGC), which enhances feature representation by jointly analyzing spatial and frequency-domain information via 3D discrete cosine transforms and adaptive gating. Extensive experiments on the ACDC cardiac MRI dataset demonstrate the effectiveness of the proposed methods. The dual-branch Mamba-SAM-Base model achieves a mean Dice score of 0.906, comparable to UNet++ (0.907), while outperforming all baselines on Myocardium (0.910) and Left Ventricle (0.971) segmentation. The adapter-based TP MFGC variant offers superior inference speed (4.77 FPS) with strong accuracy (0.880 Dice). These results show that hybridizing foundation models with efficient SSM-based architectures provides a practical and effective solution for 3D medical image segmentation.
We propose VL-DUN, a principled framework for joint All-in-One Medical Image Restoration and Segmentation (AiOMIRS) that bridges the gap between low-level signal recovery and high-level semantic understanding. While standard pipelines treat these tasks in isolation, our core insight is that they are fundamentally synergistic: restoration provides clean anatomical structures to improve segmentation, while semantic priors regularize the restoration process. VL-DUN resolves the sub-optimality of sequential processing through two primary innovations. (1) We formulate AiOMIRS as a unified optimization problem, deriving an interpretable joint unfolding mechanism where restoration and segmentation are mathematically coupled for mutual refinement. (2) We introduce a frequency-aware Mamba mechanism to capture long-range dependencies for global segmentation while preserving the high-frequency textures necessary for restoration. This allows for efficient global context modeling with linear complexity, effectively mitigating the spectral bias of standard architectures. As a pioneering work in the AiOMIRS task, VL-DUN establishes a new state-of-the-art across multi-modal benchmarks, improving PSNR by 0.92 dB and the Dice coefficient by 9.76%. Our results demonstrate that joint collaborative learning offers a superior, more robust solution for complex clinical workflows compared to isolated task processing. The codes are provided in https://github.com/cipi666/VLDUN.
Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders that neglect the inherent volumetric nature of the data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised volumetric medical segmentation. TranSamba augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks, which leverage the linear complexity of state space models for efficient information exchange across neighboring slices. The information exchange enhances the pairwise self-attention within slices computed by the Transformer blocks, directly contributing to the attention maps for object localization. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume depth and maintains constant memory usage for batch processing. Extensive experiments on three datasets demonstrate that TranSamba establishes new state-of-the-art performance, consistently outperforming existing methods across diverse modalities and pathologies. Our source code and trained models are openly accessible at: https://github.com/YihengLyu/TranSamba.
Accurate organ and lesion segmentation is a critical prerequisite for computer-aided diagnosis. Convolutional Neural Networks (CNNs), constrained by their local receptive fields, often struggle to capture complex global anatomical structures. To tackle this challenge, this paper proposes a novel hybrid architecture, HyM-UNet, designed to synergize the local feature extraction capabilities of CNNs with the efficient global modeling capabilities of Mamba. Specifically, we design a Hierarchical Encoder that utilizes convolutional modules in the shallow stages to preserve high-frequency texture details, while introducing Visual Mamba modules in the deep stages to capture long-range semantic dependencies with linear complexity. To bridge the semantic gap between the encoder and the decoder, we propose a Mamba-Guided Fusion Skip Connection (MGF-Skip). This module leverages deep semantic features as gating signals to dynamically suppress background noise within shallow features, thereby enhancing the perception of ambiguous boundaries. We conduct extensive experiments on the public ISIC 2018 benchmark dataset. The results demonstrate that HyM-UNet significantly outperforms existing state-of-the-art methods in terms of Dice coefficient and IoU, while maintaining lower parameter counts and inference latency. This validates the effectiveness and robustness of the proposed method in handling medical segmentation tasks characterized by complex shapes and scale variations.
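The gating idea in the skip connection above can be sketched as an element-wise mask: deep features are squashed to the range (0, 1) and multiplied into the shallow features. The abstract does not give the internals of MGF-Skip, so the sigmoid gate, the matching feature lengths, and the function name `gated_skip` are all assumptions made for illustration.

```python
import math


def gated_skip(shallow, deep):
    """Scale each shallow feature by sigmoid(deep feature) at the same position.

    Assumption: both inputs are already at the same resolution and flattened.
    """
    return [s / (1.0 + math.exp(-g)) for s, g in zip(shallow, deep)]


shallow = [1.0, 1.0, 1.0]   # e.g. texture responses from an early stage
deep = [-10.0, 0.0, 10.0]   # e.g. semantic evidence: background, unsure, lesion
gated = gated_skip(shallow, deep)
print([round(v, 4) for v in gated])
```

Positions where the deep features signal background are driven toward zero, while positions with strong semantic evidence pass through almost unchanged.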
Medical image segmentation models struggle to achieve efficient global context modeling and long-range dependency reasoning under practical computational budgets. In this work, we propose a hybrid architecture utilizing U-Mamba with Heat Conduction Equation, which combines state-space modules for efficient long-range reasoning with Heat Conduction Operators (HCOs) in the bottleneck layers, simulating frequency-domain thermal diffusion for enhanced semantic abstraction. Experimental results show that our model attains the highest DSC (0.8719) on the Abdomen CT dataset. It suggests that blending state-space dynamics with heat-based global diffusion offers a scalable solution for medical segmentation tasks.
In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information across scales during both encoding and decoding, improving feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
In the field of multi-organ medical image segmentation, recent methods frequently employ Transformers to capture long-range dependencies from image features. However, these methods overlook the high computational cost of Transformers and their deficiencies in extracting local detailed information. To address high computational costs and inadequate local detail information, we reassess the design of feature extraction modules and propose a new deep-learning network called LamFormer for fine-grained segmentation tasks across multiple organs. LamFormer is a novel U-shaped network that employs Linear Attention Mamba (LAM) in an enhanced pyramid encoder to capture multi-scale long-range dependencies. We construct the Parallel Hierarchical Feature Aggregation (PHFA) module to aggregate features from different layers of the encoder, narrowing the semantic gap among features while filtering information. Finally, we design the Reduced Transformer (RT), which utilizes a distinct computational approach to globally model up-sampled features. RT enhances the extraction of detailed local information and improves the network's capability to capture long-range dependencies. LamFormer outperforms existing segmentation methods on seven complex and diverse datasets, demonstrating exceptional performance. Moreover, the proposed network achieves a balance between model performance and model complexity.
Effective preoperative planning requires accurate algorithms for segmenting anatomical structures across diverse datasets, but traditional models struggle with generalization. This study presents a novel machine learning methodology to improve algorithm generalization for 3D anatomical reconstruction beyond breast cancer applications. We processed 120 retrospective breast MRIs (January 2018-June 2023) through three phases: anonymization and manual segmentation of T1-weighted and dynamic contrast-enhanced sequences; co-registration and segmentation of whole breast, fibroglandular tissue, and tumors; and 3D visualization using ITK-SNAP. A human-in-the-loop approach refined segmentations using U-Mamba, designed to generalize across imaging scenarios. Dice similarity coefficient assessed overlap between automated segmentation and ground truth. Clinical relevance was evaluated through clinician and patient interviews. U-Mamba showed strong performance with DSC values of 0.97 (±0.013) for whole organs, 0.96 (±0.024) for fibroglandular tissue, and 0.82 (±0.12) for tumors on T1-weighted images. The model generated accurate 3D reconstructions enabling visualization of complex anatomical features. Clinician interviews indicated improved planning, intraoperative navigation, and decision support. Integration of 3D visualization enhanced patient education, communication, and understanding. This human-in-the-loop machine learning approach successfully generalizes algorithms for 3D reconstruction and anatomical segmentation across patient datasets, offering enhanced visualization for clinicians, improved preoperative planning, and more effective patient education, facilitating shared decision-making and empowering informed patient choices across medical applications.
Mitosis detection in histopathology images plays a key role in tumor assessment. Although machine learning algorithms could help physicians perform this task accurately, these algorithms suffer a significant performance drop when evaluated on images from domains that differ from the training ones. In this work, we propose a Mamba-based approach for mitosis detection under domain shift, inspired by the promising performance demonstrated by Mamba in medical image segmentation tasks. Specifically, our approach uses a VM-UNet architecture for the task, together with stain augmentation operations to further improve model robustness against domain shift. Our approach has been submitted to Track 1 of the MItosis DOmain Generalization (MIDOG) challenge. Preliminary experiments, conducted on the MIDOG++ dataset, show that the proposed method leaves substantial room for improvement.
Precise lesion resection depends on accurately identifying fine-grained anatomical structures. While many coarse-grained segmentation (CGS) methods have been successful in large-scale segmentation (e.g., organs), they fall short in clinical scenarios requiring fine-grained segmentation (FGS), which remains challenging due to frequent individual variations in small-scale anatomical structures. Although recent Mamba-based models have advanced medical image segmentation, they often rely on fixed manually-defined scanning orders, which limit their adaptability to individual variations in FGS. To address this, we propose ASM-UNet, a novel Mamba-based architecture for FGS. It introduces adaptive scan scores to dynamically guide the scanning order, generated by combining group-level commonalities and individual-level variations. Experiments on two public datasets (ACDC and Synapse) and a newly proposed challenging biliary tract FGS dataset, namely BTMS, demonstrate that ASM-UNet achieves superior performance in both CGS and FGS tasks. Our code and dataset are available at https://github.com/YqunYang/ASM-UNet.
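The idea of letting scores decide the scanning order, rather than a fixed raster order, can be illustrated with a small sketch. It is not the ASM-UNet implementation: the additive combination of group-level and individual-level scores, the function name `adaptive_scan`, and the running-sum `step` function standing in for a Mamba state update are all assumptions made for illustration.

```python
def adaptive_scan(tokens, group_scores, individual_scores, step):
    """Visit tokens in ascending order of a combined scan score.

    Assumption: the combined score is a plain sum of a group-level score
    (shared across cases) and an individual-level score (case-specific).
    `step(state, token)` is a placeholder for a sequential state update.
    """
    scores = [g + s for g, s in zip(group_scores, individual_scores)]
    order = sorted(range(len(tokens)), key=lambda i: scores[i])
    state = 0.0
    outputs = [None] * len(tokens)
    for i in order:
        state = step(state, tokens[i])
        outputs[i] = state  # write the result back at the original position
    return order, outputs


tokens = [1.0, 2.0, 3.0, 4.0]
group_scores = [0.1, 0.4, 0.2, 0.3]         # hypothetical shared ordering prior
individual_scores = [0.0, -0.35, 0.0, 0.0]  # hypothetical per-case adjustment
order, outputs = adaptive_scan(
    tokens, group_scores, individual_scores, lambda state, x: state + x
)
print(order)    # token 1 is promoted to the front by its individual score
print(outputs)
```

With a fixed order, token 1 would always be visited second; here the individual-level adjustment moves it first, which changes the state every later token sees.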
In recent years, artificial intelligence has significantly advanced medical image segmentation. Nonetheless, challenges remain, including efficient 3D medical image processing across diverse modalities and handling data variability. In this work, we introduce Hierarchical Soft Mixture-of-Experts (HoME), a two-level token-routing layer for efficient long-context modeling, specifically designed for 3D medical image segmentation. Built on the Mamba Selective State Space Model (SSM) backbone, HoME enhances sequential modeling through adaptive expert routing. In the first level, a Soft Mixture-of-Experts (SMoE) layer partitions input sequences into local groups, routing tokens to specialized per-group experts for localized feature extraction. The second level aggregates these outputs through a global SMoE layer, enabling cross-group information fusion and global context refinement. This hierarchical design, combining local expert routing with global expert refinement, enhances generalizability and segmentation performance, surpassing state-of-the-art results across datasets from the three most widely used 3D medical imaging modalities and varying data qualities. The code is publicly available at https://github.com/gmum/MambaHoME.
Recently, Mamba-based methods have become popular in medical image segmentation due to their lightweight design and long-range dependency modeling capabilities. However, current segmentation methods frequently encounter challenges in fetal ultrasound images, such as enclosed anatomical structures, blurred boundaries, and small anatomical structures. To address the need for balancing local feature extraction and global context modeling, we propose MS-UMamba, a novel hybrid convolutional-Mamba model for fetal ultrasound image segmentation. Specifically, we design a visual state space block integrated with a CNN branch (SS-MCAT-SSM), which leverages Mamba's global modeling strengths and convolutional layers' local representation advantages to enhance feature learning. In addition, we propose an efficient multi-scale feature fusion module that integrates spatial attention mechanisms; by integrating feature information from different layers, it enhances the feature representation ability of the model. Finally, we conduct extensive experiments on a non-public dataset, and the experimental results demonstrate that the MS-UMamba model achieves excellent segmentation performance.
Accurate microscopic medical image segmentation plays a crucial role in diagnosing various cancerous cells and identifying tumors. Driven by advancements in deep learning, convolutional neural networks (CNNs) and transformer-based models have been extensively studied to enhance receptive fields and improve medical image segmentation. However, they often struggle to capture complex cellular and tissue structures in challenging scenarios such as background clutter and object overlap. Moreover, their reliance on the availability of large datasets for improved performance, along with the high computational cost, limits their practicality. To address these issues, we propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features and offers both performance and computational efficiency. Specifically, we exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features to handle the blurred region boundaries (e.g., cell boundaries). These enriched features are input to a hybrid model that combines an Inception depth-wise convolution with a Mamba block, to maintain high efficiency and capture inherent variations in the scales and shapes of the regions of interest. These enriched features along with low-resolution features are fused to get the final segmentation mask. Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets (SegPC21 and GlaS) and two skin lesion segmentation datasets (ISIC2017 and ISIC2018), while reducing computational cost by about 5 times compared to the previous best performing method.
Accurate lesion segmentation in histopathology images is essential for diagnostic interpretation and quantitative analysis, yet it remains challenging due to the limited availability of costly pixel-level annotations. To address this, we propose FMaMIL, a novel two-stage framework for weakly supervised lesion segmentation based solely on image-level labels. In the first stage, a lightweight Mamba-based encoder is introduced to capture long-range dependencies across image patches under the MIL paradigm. To enhance spatial sensitivity and structural awareness, we design a learnable frequency-domain encoding module that supplements spatial-domain features with spectrum-based information. CAMs generated in this stage are used to guide segmentation training. In the second stage, we refine the initial pseudo labels via a CAM-guided soft-label supervision and a self-correction mechanism, enabling robust training even under label noise. Extensive experiments on both public and private histopathology datasets demonstrate that FMaMIL outperforms state-of-the-art weakly supervised methods without relying on pixel-level annotations, validating its effectiveness and potential for digital pathology applications.
Accurate 3D medical image segmentation demands architectures capable of reconciling global context modeling with spatial topology preservation. While State Space Models (SSMs) like Mamba show potential for sequence modeling, existing medical SSMs suffer from encoder-decoder incompatibility: the encoder's 1D sequence flattening compromises spatial structures, while conventional decoders fail to leverage Mamba's state propagation. We present DM-SegNet, a Dual-Mamba architecture integrating directional state transitions with anatomy-aware hierarchical decoding. The core innovations include a quadri-directional spatial Mamba module employing four-directional 3D scanning to maintain anatomical spatial coherence, a gated spatial convolution layer that enhances spatially sensitive feature representation prior to state modeling, and a Mamba-driven decoding framework enabling bidirectional state synchronization across scales. Extensive evaluation on two clinically significant benchmarks demonstrates the efficacy of DM-SegNet: achieving state-of-the-art Dice Similarity Coefficient (DSC) of 85.44% on the Synapse dataset for abdominal organ segmentation and 90.22% on the BraTS2023 dataset for brain tumor segmentation.
3D medical image segmentation is important for clinical diagnosis and treatment but faces challenges from high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. To alleviate these limitations, we propose TK-Mamba, a multimodal framework that fuses the linear-time Mamba with Kolmogorov-Arnold Networks (KAN) to form an efficient hybrid backbone. Our approach is characterized by two primary technical contributions. Firstly, we introduce the novel 3D-Group-Rational KAN (3D-GR-KAN), which marks the first application of KAN in 3D medical imaging, providing a superior and computationally efficient nonlinear feature transformation crucial for complex volumetric structures. Secondly, we devise a dual-branch text-driven strategy using Pubmedclip's embeddings. This strategy significantly enhances segmentation robustness and accuracy by simultaneously capturing inter-organ semantic relationships to mitigate label inconsistencies and aligning image features with anatomical texts. By combining this advanced backbone and vision-language knowledge, TK-Mamba offers a unified and scalable solution for both multi-organ and tumor segmentation. Experiments on multiple datasets demonstrate that our framework achieves state-of-the-art performance in both organ and tumor segmentation tasks, surpassing existing methods in both accuracy and efficiency. Our code is publicly available at https://github.com/yhy-whu/TK-Mamba
Convolutional neural networks (CNNs) and transformers are widely employed in constructing UNet architectures for medical image segmentation tasks. However, CNNs struggle to model long-range dependencies, while transformers suffer from quadratic computational complexity. Recently, Mamba, a type of State Space Model, has gained attention for its exceptional ability to model long-range interactions while maintaining linear computational complexity. Despite the emergence of several Mamba-based methods, they still present the following limitations: first, their network designs generally lack perceptual capabilities for the original input data; second, they primarily focus on capturing global information, while often neglecting local details. To address these challenges, we propose a prompt-guided CNN-Mamba dual-path UNet, termed PGM-UNet, for medical image segmentation. Specifically, we introduce a prompt-guided residual Mamba module that adaptively extracts dynamic visual prompts from the original input data, effectively guiding Mamba in capturing global information. Additionally, we design a local-global information fusion network, comprising a local information extraction module, a prompt-guided residual Mamba module, and a multi-focus attention fusion module, which effectively integrates local and global information. Furthermore, inspired by Kolmogorov-Arnold Networks (KANs), we develop a multi-scale information extraction module to capture richer contextual information without altering the resolution. We conduct extensive experiments on the ISIC-2017, ISIC-2018, DIAS, and DRIVE datasets. The results demonstrate that the proposed method significantly outperforms state-of-the-art approaches in multiple medical image segmentation tasks.
Skin lesion segmentation is a critical challenge in computer vision, and accurately separating pathological features from healthy skin is essential for diagnostics. Traditional Convolutional Neural Networks (CNNs) are limited by narrow receptive fields, and Transformers face significant computational burdens. This paper presents a novel skin lesion segmentation framework, the Atrous Shifted Parallel Vision Mamba UNet (ASP-VMUNet), which integrates the efficient and scalable Mamba architecture to overcome limitations in traditional CNNs and computationally demanding Transformers. The framework introduces an atrous scan technique that minimizes background interference and expands the receptive field, enhancing Mamba's scanning capabilities. Additionally, the inclusion of a Parallel Vision Mamba (PVM) layer and a shift round operation optimizes feature segmentation and fosters rich inter-segment information exchange. A supplementary CNN branch with a Selective-Kernel (SK) Block further refines the segmentation by blending local and global contextual information. Tested on four benchmark datasets (ISIC16/17/18 and PH2), ASP-VMUNet demonstrates superior performance in skin lesion segmentation, validated by comprehensive ablation studies. This approach not only advances medical image segmentation but also highlights the benefits of hybrid architectures in medical imaging technology. Our code is available at https://github.com/BaoBao0926/ASP-VMUNet/tree/main.
Efficient evaluation of three-dimensional (3D) medical images is crucial for diagnostic and therapeutic practices in healthcare. Recent years have seen a substantial uptake in applying deep learning and computer vision to analyse and interpret medical images. Traditional approaches, such as convolutional neural networks (CNNs) and vision transformers (ViTs), face significant computational challenges, prompting the need for architectural advancements. Recent efforts have led to the introduction of novel architectures like the "Mamba" model as alternative solutions to traditional CNNs or ViTs. The Mamba model excels in the linear processing of one-dimensional data with low computational demands. However, Mamba's potential for 3D medical image analysis remains underexplored and could face significant computational challenges as the dimension increases. This manuscript presents MobileViM, a streamlined architecture for efficient segmentation of 3D medical images. In the MobileViM network, we devise a new dimension-independent mechanism and a dual-direction traversing approach and incorporate them into a vision-Mamba-based framework. MobileViM also features a cross-scale bridging technique to improve efficiency and accuracy across various medical imaging modalities. With these enhancements, MobileViM achieves segmentation speeds exceeding 90 frames per second (FPS) on a single graphics processing unit (i.e., NVIDIA RTX 4090). This performance is over 24 FPS faster than the state-of-the-art deep learning models for processing 3D images with the same computational resources. In addition, experimental evaluations demonstrate that MobileViM delivers superior performance, with Dice similarity scores reaching 92.72%, 86.69%, 80.46%, and 77.43% for PENGWIN, BraTS2024, ATLAS, and Toothfairy2 datasets, respectively, which significantly surpasses existing models.
In this paper, we aim to address the unmet demand for automated prompting and enhanced human-model interactions of SAM and SAM2 for the sake of promoting their widespread clinical adoption. Specifically, we propose Proxy Prompt (PP), auto-generated by leveraging non-target data with a pre-annotated mask. We devise a novel 3-step context-selection strategy for adaptively selecting the most representative contextual information from non-target data via vision mamba and selective maps, empowering the guiding capability of non-target image-mask pairs for segmentation on target image/video data. To reinforce human-model interactions in PP, we further propose a contextual colorization module via a dual-reverse cross-attention to enhance interactions between target features and contextual embeddings while amplifying distinctive features of user-defined object(s). Via extensive evaluations, our method achieves state-of-the-art performance on four public datasets and yields comparable results with fully-trained models, even when trained with only 16 image masks.
Accurate segmentation of prostate cancer histopathology images is crucial for diagnosis and treatment planning. This study presents a comparative analysis of three deep learning-based methods, Mamba, SAM, and YOLO, for segmenting prostate cancer histopathology images. We evaluated the performance of these models on two comprehensive datasets, Gleason 2019 and SICAPv2, using Dice score, precision, and recall metrics. Our results show that the High-order Vision Mamba UNet (H-vmunet) model outperforms the other two models, achieving the highest scores across all metrics on both datasets. The H-vmunet model's advanced architecture, which integrates high-order visual state spaces and 2D-selective-scan operations, enables efficient and sensitive lesion detection across different scales. Our study demonstrates the potential of the H-vmunet model for clinical applications and highlights the importance of robust validation and comparison of deep learning-based methods for medical image analysis. The findings of this study contribute to the development of accurate and reliable computer-aided diagnosis systems for prostate cancer. The code is available at http://github.com/alibdz/prostate-segmentation.
In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. State Space Models (SSMs), such as Mamba, have been recognized as a promising method. They not only demonstrate superior performance in modeling long-range interactions, but also preserve a linear computational complexity. A carefully designed hybrid mechanism of SSM and Transformer can enhance the capability for efficient modeling of visual features. Extensive experiments have demonstrated that integrating the self-attention mechanism into the hybrid part behind the layers of Mamba's architecture can greatly improve the modeling capacity to capture long-range spatial dependencies. In this paper, leveraging the hybrid mechanism of SSM, we propose a U-shape architecture model for medical image segmentation, named Hybrid Transformer vision Mamba UNet (HTM-UNet). We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS-Larib PolypDB public datasets and the private ZD-LCI-GIM dataset. The results indicate that HTM-UNet exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/simzhangbest/HMT-Unet.
Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated superior performance compared to Transformers and State Space Models (SSMs) like Mamba in Natural Language Processing (NLP) and image classification (as demonstrated in the Vision-LSTM, or ViL, implementation). Here, the xLSTM-UNet we designed extends this success to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency capturing abilities of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdomen MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at http://tianrun-chen.github.io/xLSTM-UNet/
Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce dilated convolution in the HC-Mamba model, which extends the receptive field of the convolution kernel to capture a more extensive range of contextual information without increasing the computational cost. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational cost of the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. We conduct comprehensive experiments on segmentation tasks including organ segmentation and skin lesion segmentation, using the Synapse, ISIC17 and ISIC18 datasets, to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation.
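The parameter savings from depthwise separable convolution and the receptive-field gain from dilation follow from standard formulas, worked through below. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative values chosen here, not figures from the HC-Mamba paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixing the channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Side length of the window covered by a dilated k x k kernel."""
    return k + (k - 1) * (dilation - 1)


k, c_in, c_out = 3, 64, 128  # illustrative layer sizes
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 2))
print(effective_kernel_size(k, dilation=2))
```

For this layer the separable form needs roughly an eighth of the weights, and a dilation of 2 lets the same nine weights per filter cover a 5x5 window.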
Skin lesion segmentation is a critical task in computer-aided diagnosis systems for dermatological diseases. Accurate segmentation of skin lesions from medical images is essential for early detection, diagnosis, and treatment planning. In this paper, we propose AC-MambaSeg, a new model for skin lesion segmentation that has a hybrid CNN-Mamba backbone and integrates advanced components such as the Convolutional Block Attention Module (CBAM), Attention Gate, and Selective Kernel Bottleneck. AC-MambaSeg leverages the Vision Mamba framework for efficient feature extraction, while CBAM and Selective Kernel Bottleneck enhance its ability to focus on informative regions and suppress background noise. We evaluate the performance of AC-MambaSeg on diverse datasets of skin lesion images, including ISIC-2018 and PH2, and compare it against existing segmentation methods. Our model shows promising potential for improving computer-aided diagnosis systems and facilitating early detection and treatment of dermatological diseases. Our source code will be made available at: https://github.com/vietthanh2710/AC-MambaSeg.
Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of limited receptive field or high computing complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in the field of vision. However, their feature extraction methods may not be sufficiently effective and retain some redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet (TM-UNet). The method leverages residual VSS Blocks to extract intensive contextual features, while Triplet SSM is employed to fuse features across spatial and channel dimensions. We conducted experiments on ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, demonstrating the superior segmentation performance of our proposed TM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters.
Integrating components from convolutional neural networks and state space models in medical image segmentation presents a compelling approach to enhance accuracy and efficiency. We introduce Mamba HUNet, a novel architecture tailored for robust and efficient segmentation tasks. Leveraging strengths from Mamba UNet and the lighter version of Hierarchical Upsampling Network (HUNet), Mamba HUNet combines the local feature extraction power of convolutional neural networks with the long-range dependency modeling capabilities of state space models. We first converted HUNet into a lighter version, maintaining performance parity, and then integrated this lighter HUNet into Mamba HUNet, further enhancing its efficiency. The architecture partitions input grayscale images into patches, transforming them into 1D sequences for processing efficiency akin to Vision Transformers and Mamba models. Through Visual State Space blocks and patch merging layers, hierarchical features are extracted while preserving spatial information. Experimental results on publicly available Magnetic Resonance Imaging scans, notably in Multiple Sclerosis lesion segmentation, demonstrate Mamba HUNet's effectiveness across diverse segmentation tasks. The model's robustness and flexibility underscore its potential in handling complex anatomical structures. These findings establish Mamba HUNet as a promising solution in advancing medical image segmentation, with implications for improving clinical decision-making processes.
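The step of partitioning a grayscale image into patches and turning them into a 1D sequence can be sketched as follows. This is a generic patchify routine over nested lists, not the Mamba HUNet code; the function name `image_to_patch_sequence`, the raster ordering of patches, and the 4x4 toy image are assumptions made for illustration.

```python
def image_to_patch_sequence(image, patch):
    """Split a 2D grayscale image into non-overlapping patch x patch tiles.

    Returns a list of flattened tiles in raster order (left to right,
    top to bottom), i.e. a 1D sequence of patch tokens.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            sequence.append(token)
    return sequence


# 4 x 4 toy image whose pixel values are their raster indices 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = image_to_patch_sequence(image, patch=2)
for token in sequence:
    print(token)
```

Each token keeps a small 2D neighbourhood together, so pixels that are vertically adjacent inside a patch stay adjacent in the sequence, which a plain row-by-row flattening of the whole image would not guarantee.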
Detecting polyps through colonoscopy is an important task in medical image segmentation, which provides significant assistance and reference value for clinical surgery. However, accurate segmentation of polyps is a challenging task due to two main reasons. Firstly, polyps exhibit various shapes and colors. Secondly, the boundaries between polyps and their normal surroundings are often unclear. Additionally, significant differences between different datasets lead to limited generalization capabilities of existing methods. To address these issues, we propose a segmentation model based on Prompt-Mamba, which incorporates the latest Vision-Mamba and prompt technologies. Compared to previous models trained on the same dataset, our model not only maintains high segmentation accuracy on the validation part of the same dataset but also demonstrates superior accuracy on unseen datasets, exhibiting excellent generalization capabilities. Notably, we are the first to apply the Vision-Mamba architecture to polyp segmentation and the first to utilize prompt technology in a polyp segmentation model. Our model efficiently accomplishes segmentation tasks, surpassing previous state-of-the-art methods by an average of 5% across six datasets. Furthermore, we have developed multiple versions of our model with scaled parameter counts, achieving better performance than previous models even with fewer parameters. Our code and trained weights will be released soon.
In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models (SSMs), such as Mamba, have been recognized as a promising method. They not only demonstrate superior performance in modeling long-range interactions, but also preserve a linear computational complexity. Inspired by the Mamba architecture, we propose Vision Mamba-UNetV2 (VM-UNetV2), in which the Visual State Space (VSS) Block is introduced to capture extensive contextual information and the Semantics and Detail Infusion (SDI) is introduced to augment the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2.
Despite the remarkable success of the end-to-end paradigm in deep learning, it often suffers from slow convergence and heavy reliance on large-scale datasets, which fundamentally limits its efficiency and applicability in data-scarce domains such as medical imaging. In this work, we introduce the Predictive-Corrective (PC) paradigm, a framework that decouples the modeling task to fundamentally accelerate learning. Building upon this paradigm, we propose a novel network, termed PCMambaNet. PCMambaNet is composed of two synergistic modules. First, the Predictive Prior Module (PPM) generates a coarse approximation at low computational cost, thereby anchoring the search space. Specifically, the PPM leverages anatomical knowledge (bilateral symmetry) to predict a 'focus map' of diagnostically relevant asymmetric regions. Next, the Corrective Residual Network (CRN) learns to model the residual error, focusing the network's full capacity on refining these challenging regions and delineating precise pathological boundaries. Extensive experiments on high-resolution brain MRI segmentation demonstrate that PCMambaNet achieves state-of-the-art accuracy while converging within only 1-5 epochs, a performance unattainable by conventional end-to-end models. This dramatic acceleration highlights that by explicitly incorporating domain knowledge to simplify the learning objective, PCMambaNet effectively mitigates data inefficiency and overfitting.
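The intuition behind using bilateral symmetry as a prior can be shown with a hand-crafted stand-in: compare each pixel with its left-right mirror and treat large differences as candidate regions of interest. This is only an illustration of the idea, not the learned Predictive Prior Module; the function name `asymmetry_map`, the assumption that the midline sits at the image centre, and the toy intensities are all hypothetical.

```python
def asymmetry_map(image):
    """Absolute difference between an image and its left-right mirror.

    Assumes the anatomical midline coincides with the vertical centre
    line of the image. High values mark left-right asymmetric pixels.
    """
    return [
        [abs(value - mirrored) for value, mirrored in zip(row, reversed(row))]
        for row in image
    ]


image = [
    [1, 2, 2, 1],  # symmetric row: no response
    [1, 5, 2, 1],  # bright spot on one side only
]
focus = asymmetry_map(image)
for row in focus:
    print(row)
```

The map is itself mirror-symmetric, so it flags both the abnormal location and its healthy counterpart; deciding which side carries the pathology is left to a later refinement stage.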
State space models (SSMs) reduce the quadratic complexity of transformers by leveraging linear recurrence. Recently, VMamba has emerged as a strong SSM-based vision backbone, yet remains bottlenecked by spatial redundancy in its four-directional scan. We propose QuarterMap, a post-training activation pruning method that removes redundant spatial activations before scanning and restores dimensions via nearest-neighbor upsampling. Our method improves throughput without retraining. On ImageNet-1K, QuarterMap achieves up to 11% speedup on VMamba with less than 0.9% accuracy drop, and yields similar gains on ADE20K segmentation. Beyond VMamba, we validate QuarterMap on MedMamba, a domain-specific model that shares the same four-directional scanning structure, where it consistently improves throughput while preserving accuracy across multiple medical imaging tasks. Compared to token merging methods like ToMe, QuarterMap is tailored for SSMs and avoids costly merge-unmerge operations. Our method offers a plug-and-play tool for deployment-time efficiency without compromising transferability.
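The prune-then-restore pattern described above can be sketched on a single 2D activation map. This is not the QuarterMap implementation: keeping every second row and column (a quarter of the activations) is an assumption suggested by the method's name, the scan itself is omitted, and the function names `prune_activations` and `upsample_nearest` are hypothetical.

```python
def prune_activations(fmap, stride=2):
    """Keep every `stride`-th row and column of a 2D activation map.

    With stride=2 this retains one quarter of the activations (assumed
    pruning pattern for this sketch).
    """
    return [row[::stride] for row in fmap[::stride]]


def upsample_nearest(fmap, factor=2):
    """Restore spatial size by repeating each value factor x factor times."""
    restored = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]
        restored.extend(list(wide) for _ in range(factor))
    return restored


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
pruned = prune_activations(fmap)       # 2 x 2: the expensive scan would run here
restored = upsample_nearest(pruned)    # back to 4 x 4 for the following layers
print(pruned)
for row in restored:
    print(row)
```

Because the scan would run on the smaller map, its cost drops with the number of retained positions, while nearest-neighbor upsampling keeps the tensor shape that downstream layers expect.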
State Space Models (SSMs) have recently demonstrated outstanding performance in long-sequence modeling, particularly in natural language processing. However, their direct application to medical image segmentation poses several challenges. SSMs, originally designed for 1D sequences, struggle with 3D spatial structures in medical images due to discontinuities introduced by flattening. Additionally, SSMs have difficulty fitting high-variance data, which is common in medical imaging. In this paper, we analyze the intrinsic limitations of SSMs in medical image segmentation and propose a unified U-shaped encoder-decoder architecture, Meta Mamba UNet (MM-UNet), designed to leverage the advantages of SSMs while mitigating their drawbacks. MM-UNet incorporates hybrid modules that integrate SSMs within residual connections, reducing variance and improving performance. Furthermore, we introduce a novel bi-directional scan order strategy to alleviate discontinuities when processing medical images. Extensive experiments on the AMOS2022 and Synapse datasets demonstrate the superiority of MM-UNet over state-of-the-art methods. MM-UNet achieves a Dice score of 91.0% on AMOS2022, surpassing nnUNet by 3.2%, and a Dice score of 87.1% on Synapse. These results confirm the effectiveness of integrating SSMs in medical image segmentation through architectural design optimizations.
The Transformer architecture has opened a new paradigm in the domain of deep learning with its ability to model long-range dependencies and capture global context and has outpaced the traditional Convolution Neural Networks (CNNs) in many aspects. However, applying Transformer models to 3D medical image datasets presents significant challenges due to their high training time, and memory requirements, which not only hinder scalability but also contribute to elevated CO$_2$ footprint. This has led to an exploration of alternative models that can maintain or even improve performance while being more efficient and environmentally sustainable. Recent advancements in Structured State Space Models (SSMs) effectively address some of the inherent limitations of Transformers, particularly their high memory and computational demands. Inspired by these advancements, we propose an efficient 3D segmentation model for medical imaging called SegResMamba, designed to reduce computation complexity, memory usage, training time, and environmental impact while maintaining high performance. Our model uses less than half the memory during training compared to other state-of-the-art (SOTA) architectures, achieving comparable performance with significantly reduced resource demands.
Medical image segmentation is a critical task in medical imaging analysis. Traditional CNN-based methods struggle with modeling long-range dependencies, while Transformer-based models, despite their success, suffer from quadratic computational complexity. To address these limitations, we propose KM-UNet, a novel U-shaped network architecture that combines the strengths of Kolmogorov-Arnold Networks (KANs) and state-space models (SSMs). KM-UNet leverages the Kolmogorov-Arnold representation theorem for efficient feature representation and SSMs for scalable long-range modeling, achieving a balance between accuracy and computational efficiency. We evaluate KM-UNet on five benchmark datasets: ISIC17, ISIC18, CVC, BUSI, and GLAS. Experimental results demonstrate that KM-UNet achieves competitive performance compared to state-of-the-art methods in medical image segmentation tasks. To the best of our knowledge, KM-UNet is the first medical image segmentation framework integrating KANs and SSMs. This work provides a valuable baseline and new insights for the development of more efficient and interpretable medical image segmentation systems. The code is open source at https://github.com/2760613195/KM_UNet. Keywords: KAN, Mamba, state-space models, UNet, medical image segmentation, deep learning
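For reference, the Kolmogorov-Arnold representation theorem invoked in the abstract above states that any continuous function $f$ on $[0,1]^n$ can be written using only univariate continuous functions and addition. The symbols $\Phi_q$ and $\varphi_{q,p}$ follow the usual textbook notation and are not taken from the KM-UNet paper:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
```

KANs take this form as motivation and learn the univariate functions, typically as parameterised splines, in place of fixed activation functions.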
Recently, the field of 3D medical segmentation has been dominated by deep learning models employing Convolutional Neural Networks (CNNs) and Transformer-based architectures, each with their distinctive strengths and limitations. CNNs are constrained by a local receptive field, whereas Transformers are hindered by their substantial memory requirements as well as their data hungriness, making them ill-suited to processing 3D medical volumes at a fine-grained level. For these reasons, fully convolutional neural networks such as nnUNet still dominate the segmentation of structures in large 3D medical volumes. Despite numerous advancements towards developing Transformer variants with subquadratic time and memory complexity, these models still fall short in content-based reasoning. A recent breakthrough is Mamba, a Recurrent Neural Network (RNN) based on State Space Models (SSMs) that outperforms Transformers in many long-context tasks (million-length sequences) on well-known natural language processing and genomic benchmarks while keeping linear complexity.
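The linear complexity attributed to SSMs above comes from their recurrent form: each sequence element triggers one fixed-size state update. The following is a minimal sketch of a discrete linear state space recurrence with fixed matrices. It is an illustration under simplifying assumptions, not Mamba itself, which makes the parameters input-dependent and uses a hardware-aware parallel scan; the name `ssm_scan` is hypothetical.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a 1D input sequence."""
    h = np.zeros(A.shape[0])
    ys = np.empty(len(x))
    for t, x_t in enumerate(x):  # one state update per element: O(L) in sequence length
        h = A @ h + B * x_t
        ys[t] = C @ h
    return ys

# Toy system with a two-dimensional hidden state (values chosen for illustration only).
A = np.diag([0.9, 0.5])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
y = ssm_scan(A, B, C, np.array([1.0, 0.0, 0.0]))
print(y)
```

The cost per step depends only on the state size, so the total cost grows linearly with sequence length, in contrast to the pairwise comparisons of standard self-attention.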
Deep learning, particularly convolutional neural networks (CNNs) and Transformers, has significantly advanced 3D medical image segmentation. While CNNs are highly effective at capturing local features, their limited receptive fields may hinder performance in complex clinical scenarios. In contrast, Transformers excel at modeling long-range dependencies but are computationally intensive, making them expensive to train and deploy. Recently, the Mamba architecture, based on the State Space Model (SSM), has been proposed to efficiently model long-range dependencies while maintaining linear computational complexity. However, its application in medical image segmentation reveals shortcomings, particularly in capturing critical local features essential for accurate delineation of clinical regions. In this study, we propose MambaClinix, a novel U-shaped architecture for medical image segmentation that integrates a hierarchical gated convolutional network (HGCN) with Mamba in an adaptive stage-wise framework. This design significantly enhances computational efficiency and high-order spatial interactions, enabling the model to effectively capture both proximal and distal relationships in medical images. Specifically, our HGCN is designed to mimic the attention mechanism of Transformers with a purely convolutional structure, facilitating high-order spatial interactions in feature maps while avoiding the computational complexity typically associated with Transformer-based methods. Additionally, we introduce a region-specific Tversky loss, which emphasizes specific pixel regions to improve auto-segmentation performance, thereby optimizing the model's decision-making process. Experimental results on five benchmark datasets demonstrate that the proposed MambaClinix achieves high segmentation accuracy while maintaining low model complexity.
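The abstract above does not describe how the region-specific weighting of its loss is defined, so the sketch below shows only the standard Tversky loss that such a variant would build on. The default weights `alpha=0.3` and `beta=0.7` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - TP / (TP + alpha * FP + beta * FN) on soft or binary masks."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = (pred * target).sum()        # overlap between prediction and reference
    fp = (pred * (1 - target)).sum()  # predicted foreground that is background
    fn = ((1 - pred) * target).sum()  # missed foreground
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

pred = [1, 0, 0, 0]
target = [1, 1, 1, 0]
print(tversky_loss(pred, target, alpha=0.5, beta=0.5))  # equals the Dice loss for these weights
print(tversky_loss(pred, target))                       # beta > alpha penalises misses more
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient. Raising `beta` above `alpha` penalises false negatives more strongly, which is a common choice for small lesions.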
Deep learning has revolutionized medical imaging by providing innovative solutions to complex healthcare challenges. Traditional models often struggle to dynamically adjust feature importance, resulting in suboptimal representation, particularly in tasks like semantic segmentation crucial for accurate structure delineation. Moreover, their static nature incurs high computational costs. To tackle these issues, we introduce Mamba-Ahnet, a novel integration of State Space Model (SSM) and Advanced Hierarchical Network (AHNet) within the MAMBA framework, specifically tailored for semantic segmentation in medical imaging. Mamba-Ahnet combines SSM's feature extraction and comprehension with AHNet's attention mechanisms and image reconstruction, aiming to enhance segmentation accuracy and robustness. By dissecting images into patches and refining feature comprehension through self-attention mechanisms, the approach significantly improves feature resolution. Integration of AHNet into the MAMBA framework further enhances segmentation performance by selectively amplifying informative regions and facilitating the learning of rich hierarchical representations. Evaluation on the Universal Lesion Segmentation dataset demonstrates superior performance compared to state-of-the-art techniques, with notable metrics such as a Dice similarity coefficient of approximately 98% and an Intersection over Union of about 83%. These results underscore the potential of our methodology to enhance diagnostic accuracy, treatment planning, and ultimately, patient outcomes in clinical practice. By addressing the limitations of traditional models and leveraging the power of deep learning, our approach represents a significant step forward in advancing medical imaging technology.
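The two metrics quoted above, the Dice similarity coefficient and Intersection over Union, can be computed from binary masks as in the following sketch. The helper name `dice_and_iou` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Return (Dice, IoU) for a pair of binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = 2.0 * inter / total if total else 1.0  # both masks empty: treat as perfect match
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)
```

For a single pair of masks the two values are tied by Dice = 2 IoU / (1 + IoU), so Dice is never smaller than IoU. Averages over a dataset need not satisfy the identity exactly.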
UNet and its variants have been widely used in medical image segmentation. However, these models, especially those based on Transformer architectures, pose challenges due to their large number of parameters and computational loads, making them unsuitable for mobile health applications. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as competitive alternatives to CNN and Transformer architectures. Building upon this, we employ Mamba as a lightweight substitute for CNN and Transformer within UNet, aiming to tackle challenges stemming from computational resource limitations in real medical settings. To this end, we introduce the Lightweight Mamba UNet (LightM-UNet) that integrates Mamba and UNet in a lightweight framework. Specifically, LightM-UNet leverages the Residual Vision Mamba Layer in a pure Mamba fashion to extract deep semantic features and model long-range spatial dependencies, with linear computational complexity. Extensive experiments conducted on two real-world 2D/3D datasets demonstrate that LightM-UNet surpasses existing state-of-the-art methods. Notably, when compared to the renowned nnU-Net, LightM-UNet achieves superior segmentation performance while drastically reducing parameter and computation costs by 116x and 21x, respectively. This highlights the potential of Mamba in facilitating model lightweighting. Our code implementation is publicly available at https://github.com/MrBlankness/LightM-UNet.
In recent advancements in medical image analysis, Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have set significant benchmarks. While the former excels in capturing local features through its convolution operations, the latter achieves remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently modeling long-range dependencies within medical images, which is a critical aspect for precise segmentation. Inspired by the Mamba architecture, a State Space Model (SSM) known for its proficiency in handling long sequences and global contextual information with enhanced computational efficiency, we propose Mamba-UNet, a novel architecture that combines the U-Net design used in medical image segmentation with Mamba's capability. Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, with skip connections to preserve spatial information across different scales of the network. This design facilitates a comprehensive feature learning process, capturing intricate details and broader semantic contexts within medical images. We introduce a novel integration mechanism within the VMamba blocks to ensure seamless connectivity and information flow between the encoder and decoder paths, enhancing the segmentation performance. We conducted experiments on the publicly available ACDC MRI cardiac segmentation dataset and the Synapse CT abdomen segmentation dataset. The results show that Mamba-UNet outperforms several types of UNet in medical image segmentation under the same hyper-parameter setting. The source code and baseline implementations are available.
Recent advancements in medical imaging have resulted in more complex and diverse images, with challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise. Traditional segmentation methods struggle to address these challenges, making deep learning approaches, particularly U-shaped architectures, increasingly prominent. However, the quadratic complexity of standard self-attention makes Transformers computationally prohibitive for high-resolution images. To address these challenges, we propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy through its innovative combination of linear attention and Mamba-inspired adaptive mechanisms, complemented by an efficient symmetric sampling structure for enhanced feature processing. Our architecture effectively preserves essential spatial features while capturing long-range dependencies at reduced computational complexity. Additionally, we introduce a novel sampling strategy for multi-scale feature fusion. Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%. These results underscore the superiority of MLLA-UNet over existing methods. Our contributions include the novel 2D segmentation architecture and its empirical validation. The code is available via https://github.com/csyfjiang/MLLA-UNet.
Medical image segmentation is increasingly reliant on deep learning techniques, yet the promising performance often comes with high annotation costs. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework that leverages the capabilities of Convolutional Neural Network (CNN), Vision Transformer (ViT), and the cutting-edge Visual Mamba (VMamba) architecture for medical image segmentation, especially when dealing with scribble-based annotations. The proposed WSL strategy incorporates three networks with distinct architectures but the same symmetrical encoder-decoder structure: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling. The key concept of this framework is a collaborative and cross-supervisory mechanism that employs pseudo labels to facilitate iterative learning and refinement across the networks. The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed scribble annotations, where it surpasses the performance of a similar WSL framework utilizing only UNet or SwinUNet. This highlights its potential in scenarios with sparse or imprecise annotations. The source code is made publicly accessible.
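The cross-supervisory mechanism described above can be sketched in a few lines. This is an assumption-laden illustration rather than the Weak-Mamba-UNet loss: the abstract does not give the loss weights or the scribble-supervised term, so the sketch only shows each network being scored against the hard pseudo labels of the others. All function names are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of integer labels under per-pixel class probabilities."""
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return float(-np.log(picked + eps).mean())

def cross_supervision_loss(logits_per_net):
    """Sum of cross-entropies of each network against every other network's pseudo labels."""
    probs = [softmax(l) for l in logits_per_net]
    pseudo = [p.argmax(axis=-1) for p in probs]  # hard pseudo labels, treated as fixed targets
    loss = 0.0
    for i, p in enumerate(probs):
        for j, y in enumerate(pseudo):
            if i != j:
                loss += cross_entropy(p, y)
    return loss

confident = np.array([[10.0, -10.0], [-10.0, 10.0]])  # 2 pixels, 2 classes
agree = cross_supervision_loss([confident, confident, confident])
disagree = cross_supervision_loss([confident, confident, -confident])
print(agree, disagree)
```

When the three networks agree the term is close to zero; a dissenting network raises it, which is what pushes the networks toward consistent predictions on unlabelled pixels. In a training framework the pseudo labels would be excluded from gradient computation.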
Early detection of skin abnormalities plays a crucial role in diagnosing and treating skin cancer. Segmentation of affected skin regions using AI-powered devices is relatively common and supports the diagnostic process. However, achieving high performance remains a significant challenge due to the need for high-resolution images and the often unclear boundaries of individual lesions. At the same time, medical devices require segmentation models to have a small memory footprint and low computational cost. Based on these requirements, we introduce a novel lightweight model called MambaU-Lite, which combines the strengths of Mamba and CNN architectures, featuring just over 400K parameters and a computational cost of more than 1 GFLOPs. To enhance both global context and local feature extraction, we propose the P-Mamba block, a novel component that incorporates VSS blocks alongside multiple pooling layers, enabling the model to effectively learn multiscale features and enhance segmentation performance. We evaluate the model's performance on two skin datasets, ISIC2018 and PH2, yielding promising results. Our source code will be made publicly available at: https://github.com/nqnguyen812/MambaU-Lite.
While numerous architectures for medical image segmentation have been proposed, achieving competitive performance with state-of-the-art networks such as nnUNet still leaves room for further innovation. In this work, we introduce nnUZoo, an open source benchmarking framework built upon nnUNet, which incorporates various deep learning architectures, including CNNs, Transformers, and Mamba-based models. Using this framework, we provide a fair comparison to demystify performance claims across different medical image segmentation tasks. Additionally, in an effort to enrich the benchmarking, we explored five new architectures based on Mamba and Transformers, collectively named X2Net, and integrated them into nnUZoo for further evaluation. The proposed models combine features of the conventional U2Net, nnUNet, CNN, Transformer, and Mamba layers and architectures: UNETR2Net (UNETR), SwT2Net (SwinTransformer), SS2D2Net (SwinUMamba), Alt1DM2Net (LightUMamba), and MambaND2Net (MambaND). We extensively evaluate the performance of different models on six diverse medical image segmentation datasets, including microscopy, ultrasound, CT, MRI, and PET, covering various body parts, organs, and labels. We compare their performance, in terms of Dice score and computational efficiency, against their baseline models, U2Net, and nnUNet. CNN models like nnUNet and U2Net demonstrated both speed and accuracy, making them effective choices for medical image segmentation tasks. Transformer-based models, while promising for certain imaging modalities, exhibited high computational costs. The proposed Mamba-based X2Net architecture (SS2D2Net) achieved competitive accuracy with no significant difference from nnUNet and U2Net, while using fewer parameters. However, the Mamba-based models required significantly longer training time, highlighting a trade-off between model efficiency and computational cost.
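Current Mamba research in medical imaging has converged on four core directions: first, hybrid architectures that balance global and local information; second, pure visual SSMs that probe the performance limits of the architecture; third, scan-strategy optimization for 3D volumetric data to address continuity modeling in high dimensions; and fourth, lightweight designs and specific paradigms (SAM adaptation, weak supervision, few-shot learning) that meet diverse clinical application needs. In addition, the research community is using systematic comparative evaluation to gradually clarify the practical effectiveness and performance ceiling of Mamba relative to traditional architectures across different medical scenarios.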