Multimodal Industrial Anomaly Detection

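Multi-Source Modality Fusion and 3D Geometric Feature Enhancement

This group of studies focuses on deeply integrating RGB images with 3D point clouds, depth maps, or surface normals. Through mechanisms such as feature-level fusion, bidirectional reconstruction, cross-modal distillation, and frequency alignment, these methods exploit the complementarity between modalities to capture anomalies that involve subtle deformations or complex spatial structures, addressing the limited information available from any single modality.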
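As a rough illustration of feature-level fusion, the sketch below concatenates L2-normalized per-patch RGB and geometry feature vectors and scores a test patch by its distance to the nearest fused feature taken from defect-free samples. The feature values, the helper names `fuse` and `anomaly_score`, and the nearest-neighbor scoring rule are all assumptions made for illustration, not the method of any specific paper in this group.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(rgb_feat, geo_feat):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(rgb_feat) + l2_normalize(geo_feat)


def anomaly_score(test_feat, memory_bank):
    """Euclidean distance from a fused feature to its nearest normal feature."""
    return min(math.dist(test_feat, ref) for ref in memory_bank)


# Hypothetical patch features extracted from defect-free training samples.
memory_bank = [
    fuse([1.0, 0.0], [0.0, 1.0]),
    fuse([0.9, 0.1], [0.1, 0.9]),
]

# A patch that matches the normal data in both modalities.
normal_patch = fuse([1.0, 0.0], [0.0, 1.0])
# Same appearance, but the geometry feature differs (for example, a small dent).
dented_patch = fuse([1.0, 0.0], [1.0, 0.0])

normal_score = anomaly_score(normal_patch, memory_bank)
dented_score = anomaly_score(dented_patch, memory_bank)
print(normal_score, dented_score)
```

In this toy case the RGB half of the dented patch is identical to a normal patch, so only the geometry half raises the score, which is the complementarity argument in miniature.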

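Zero-Shot and Few-Shot Detection Based on Vision-Language Models (VLMs)

These works leverage the cross-modal alignment capability of pretrained models such as CLIP. Using prompt engineering, multi-scale perception, attribute awareness, or feature disentanglement, they classify and localize industrial defects quickly with no training, or only a very small amount of training, on target data.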
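The prompt-based scoring idea can be sketched as follows: an image embedding is compared against averaged text embeddings for "normal" and "anomalous" prompts, and a softmax over the two similarities gives an anomaly probability. The embeddings here are hand-written placeholder vectors standing in for the outputs of a CLIP-style image and text encoder, and the temperature of 0.07 and the function names are assumptions for illustration only.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_embedding(embeddings):
    """Prompt ensemble: average unit-length prompt embeddings, then renormalize."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    avg = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(avg)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def anomaly_probability(image_emb, normal_embs, abnormal_embs, temperature=0.07):
    """Softmax over similarities to the 'normal' and 'anomalous' prompt sets."""
    s_normal = cosine(image_emb, mean_embedding(normal_embs)) / temperature
    s_abnormal = cosine(image_emb, mean_embedding(abnormal_embs)) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


# Placeholder text embeddings, e.g. for "a photo of a flawless part" and
# "a photo of a damaged part" style prompts (values are made up).
normal_prompts = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]
abnormal_prompts = [[0.1, 0.9, 0.2], [0.0, 1.0, 0.1]]

# Placeholder image embeddings (values are made up).
image_ok = [0.95, 0.05, 0.0]
image_defect = [0.2, 0.8, 0.1]

p_ok = anomaly_probability(image_ok, normal_prompts, abnormal_prompts)
p_defect = anomaly_probability(image_defect, normal_prompts, abnormal_prompts)
print(p_ok, p_defect)
```

Applying the same comparison to per-patch embeddings instead of a single global embedding would give a coarse localization map; that extension is not shown here.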

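Logical Reasoning and Explainable Detection Driven by Multimodal Large Language Models (MLLMs)

This group of studies explores end-to-end anomaly analysis with large language models (LLMs) or multimodal large language models such as GPT-4V and InternVL. By introducing chain-of-thought (CoT) reasoning, multi-agent collaboration, or retrieval-augmented generation (RAG), the models not only localize anomalies but also provide logical explanations and defect descriptions, and can handle complex logical anomalies.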
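To make the RAG and chain-of-thought ingredients concrete, the sketch below retrieves the most relevant entries from a small knowledge base by cosine similarity and assembles them into a step-by-step inspection prompt. The knowledge-base texts, the embedding values, and the prompt wording are hypothetical; no model is called, and a real system would obtain embeddings from an encoder and send the prompt together with the image to an MLLM.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, knowledge_base, k=2):
    """Return the k entries whose embeddings are most similar to the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda entry: cosine(query_emb, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(object_name, retrieved):
    """Assemble a chain-of-thought style prompt from the retrieved knowledge."""
    context = "\n".join(f"- {entry['text']}" for entry in retrieved)
    return (
        f"You are inspecting an image of a {object_name}.\n"
        f"Reference knowledge:\n{context}\n"
        "Step 1: Describe what you see.\n"
        "Step 2: Compare it with the reference knowledge.\n"
        "Step 3: State whether the part is anomalous and explain why."
    )


# Hypothetical knowledge base with made-up embeddings.
knowledge_base = [
    {"text": "A complete connector has four pins.", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Scratches appear as thin bright lines on metal.", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Each cable must sit in the socket of matching color.", "embedding": [0.8, 0.2, 0.1]},
]

query_embedding = [0.9, 0.1, 0.0]  # made-up embedding of the query image
top_entries = retrieve(query_embedding, knowledge_base, k=2)
prompt = build_prompt("connector", top_entries)
print(prompt)
```

Grounding the prompt in retrieved rules is what lets the model justify a logical anomaly, such as a missing pin, rather than only flag a texture defect.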

  • Towards Training-free Anomaly Detection with Vision and Language Foundation Models. Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection. Yuhao Chao, Jie Liu, Jie Tang, Gangshan Wu, 2025, arXiv
  • OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning. Shifang Zhao, Yiheng Lin, Lu Han, Yao Zhao, Yunchao Wei, 2025, arXiv
  • IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning. Mengyang Zhao, Teng Fu, Haiyang Yu, Ke Niu, Bin Li, 2025, arXiv
  • The Amazon Nova Family of Models: Technical Report and Model Card. Amazon AGI, Aaron Langford, Aayush Shah, et al., 2025, arXiv
  • Intern-S1: A Scientific Multimodal Foundation Model. Lei Bai, Zhongrui Cai, Yuhang Cao, et al., 2025, arXiv
  • Towards VLM-based Hybrid Explainable Prompt Enhancement for Zero-Shot Industrial Anomaly Detection. Weichao Cai, Weiliang Huang, Yunkang Cao, Chao Huang, Fei Yuan, Bob Zhang, Jie Wen, 2025, venue not listed
  • Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models. Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models. Sassan Mokhtar, Arian Mousakhan, Silvio Galesso, Jawad Tayyub, Thomas Brox, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • EMIT: Enhancing MLLMs for Industrial Anomaly Detection via Difficulty-Aware GRPO. Wei Guan, Jun Lan, Jian Cao, Hao Tan, Huijia Zhu, Weiqiang Wang, 2025, arXiv
  • LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction. Er Jin, Qihui Feng, Yongli Mou, Stefan Decker, G. Lakemeyer, Oliver Simons, Johannes Stegmaier, 2025, arXiv
  • PB-IAD: Utilizing multimodal foundation models for semantic industrial anomaly detection in dynamic manufacturing environments. Bernd Hofmann, Albert Scheck, Joerg Franke, Patrick Bruendl, 2025, arXiv
  • AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection. Junwen Miao, Penghui Du, Yi Liu, Yu Wang, Yan Wang, 2025, arXiv
  • IAD-GPT: Advancing Visual Knowledge in Multimodal Large Language Model for Industrial Anomaly Detection. Zewen Li, Zitong Yu, Qilang Ye, Weicheng Xie, Wei Zhuo, Linlin Shen, 2025, IEEE Transactions on Instrumentation and Measurement
  • Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? Zhiling Chen, Hanning Chen, Mohsen Imani, Farhad Imani, 2025, arXiv
  • Think-to-Detect: Rationale-Driven Vision–Language Anomaly Detection. Mahmoud Abdalla, M. Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Abdelrahman Abdallah, Hyun-Soo Kang, 2025, Mathematics
  • LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning. Peijian Zeng, Feiyan Pang, Zhanbo Wang, Aimin Yang, 2025, 2025 IEEE International Conference on Data Mining (ICDM)
  • Zero-Shot Anomaly Detection in Laser Powder Bed Fusion Using Multimodal RAG and Large Language Models. Kiarash Naghavi Khanghah, Zhiling Chen, Lela Romeo, Qian Yang, R. Malhotra, Farhad Imani, Hongyi Xu, 2025, Journal of Mechanical Design
  • MALM-CLIP: A generative multi-agent framework for multimodal fusion in few-shot industrial anomaly detection. Hanzhi Chen, Jingbin Que, Kexin Zhu, Zhide Chen, F. Zhu, Wencheng Yang, Xu Yang, Xuechao Yang, 2025, Information Fusion
  • ID-RAG: industrial defect retrieval-augmented generation for industrial surface defect detection. Mingyu Lee, Jongwon Choi, 2026, Machine Vision and Applications

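Frontier Architectures: Mamba, Diffusion Models, and Parameter-Efficient Fine-Tuning

These studies introduce state space models such as Mamba to improve the efficiency of long-sequence processing, or use the generative capability of diffusion models to capture complex semantics. The group also covers parameter-efficient fine-tuning (PEFT) of industrial foundation models and cross-domain adaptation methods.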
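As one common example of parameter-efficient fine-tuning, the sketch below shows a LoRA-style low-rank update, where a frozen weight matrix W is adapted as W + (alpha / r) · B · A and only the small matrices A and B are trained. LoRA is used here purely as an illustration of the PEFT idea; the source does not state which PEFT techniques the surveyed papers use, and the matrix sizes and values are made up.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Compute (W + (alpha / r) * B @ A) @ x without forming the full update."""
    r = len(A)                          # rank of the low-rank update
    base = matvec(W, x)                 # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]


# Frozen pretrained weight: d_out = 3, d_in = 4 (values are made up).
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
A = [[0.5, 0.5, 0.5, 0.5]]            # r x d_in, with r = 1
B_init = [[0.0], [0.0], [0.0]]        # d_out x r, zero at initialization
B_trained = [[0.1], [0.0], [-0.1]]    # hypothetical values after fine-tuning

x = [1.0, 2.0, 3.0, 4.0]

y_init = lora_forward(x, W, A, B_init, alpha=2.0)
y_trained = lora_forward(x, W, A, B_trained, alpha=2.0)

full_params = len(W) * len(W[0])                    # parameters in W
lora_params = len(A) * len(A[0]) + len(B_init) * 1  # parameters in A and B
print(y_init, y_trained, full_params, lora_params)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output exactly, and the number of trainable parameters grows with r · (d_in + d_out) rather than d_in · d_out.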

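Industrial Deployment: Digital Twins, Embodied Intelligence, and Robustness

These works address practical deployment challenges, including background interference removal, robustness to missing modalities, and automated inspection systems that integrate digital twins, AR, and robotic platforms. The group also includes customized solutions and dataset construction for specific industries such as power grids, PCBA, and photovoltaics.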

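Multimodal industrial anomaly detection is undergoing a paradigm shift from "perceptual fusion" to "cognitive reasoning". The research focus has moved from pure RGB-D feature reconstruction toward using VLMs and MLLMs for zero-shot generalization and explainable logical analysis. New architectures such as Mamba and diffusion models further improve detection efficiency and generation quality, while the integration of digital twins and embodied intelligence signals that the technology is moving rapidly toward practical deployment on automated production lines.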

105 papers, 5 research directions
Multi-Source Modality Fusion and 3D Geometric Feature Enhancement
Related literature: 25 papers, including Chunshui Wang et al., 2026
Zero-Shot and Few-Shot Detection Based on Vision-Language Models (VLMs)
Related literature: 29 papers, including Xurui Li et al., 2025
Logical Reasoning and Explainable Detection Driven by Multimodal Large Language Models (MLLMs)
Related literature: 20 papers, including Jinjin Zhang et al., 2025
Frontier Architectures: Mamba, Diffusion Models, and Parameter-Efficient Fine-Tuning
Related literature: 8 papers, including Guo Zhao et al., 2025
Industrial Deployment: Digital Twins, Embodied Intelligence, and Robustness
Related literature: 23 papers, including GiBeom Kim et al., 2025