Constructing an Explainability Evaluation Framework and a Clinical Trust Measurement System for AI-Assisted Diagnosis of Rare Tumors
Deploying XAI techniques and interpretability mechanisms in imaging-based AI diagnosis (LIME, Grad-CAM, SHAP, etc.)
Common thread: these papers apply standard XAI methods (e.g., LIME, Grad-CAM, SHAP, saliency/CAM) directly to generate explanations for medical imaging tasks and discuss how explanations should be presented to support diagnostic understanding and trust building; most focus on local heatmaps/feature attribution and on explanation readability. A minimal Grad-CAM sketch follows the reference list below.
- Explainable deep learning framework for brain tumor detection: Integrating LIME, Grad-CAM, and SHAP for enhanced accuracy.(A. Akgündoğdu, Şerife Çelikbaş, 2025, Medical Engineering & Physics)
- Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems(Lorenzo Famiglini, Andrea Campagner, M. Barandas, Giovanni Andrea La Maida, Enrico Gallazzi, Federico Cabitza, 2024, Computers in Biology and Medicine)
- How the different explanation classes impact trust calibration: The case of clinical decision support systems(Mohammad Naiseh, Dena Al-Thani, Nan Jiang, Raian Ali, 2022, International Journal of Human-Computer Studies)
- Explainable AI for Clinical Decision Support Systems: Literature Review, Key Gaps, and Research Synthesis(Mozhgan Salimparsa, K. Sedig, Dan Lizotte, Sheikh S. Abdullah, Niaz Chalabianloo, F. Muanda, 2025, Informatics)
- Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI.(Drew Prinster, Amama Mahmood, S. Saria, Jean Jeudy, Cheng Ting Lin, Paul H. Yi, Chien-Ming Huang, 2024, Radiology)
- Explainable Artificial Intelligence in Medical Imaging: A Case Study on Enhancing Lung Cancer Detection through CT Images(T. R. Noviandy, A. Maulana, T. Zulfikar, A. Rusyana, S. Enitan, R. Idroes, 2024, Indonesian Journal of Case Reports)
- Explainable AI for lung cancer detection via a custom CNN on CT images(Mohamed Hammad, Mohammed Elaffendi, A. El-latif, A. A. Ateya, Gauhar Ali, Paweł Pławiak, 2025, Scientific Reports)
- An Explainable AI System for Medical Image Segmentation With Preserved Local Resolution: Mammogram Tumor Segmentation(Aya Farrag, Gad Gad, Z. Fadlullah, Mostafa M. Fouda, Maazen Alsabaan, 2023, IEEE Access)
- Explainable AI in medical imaging: an interpretable and collaborative federated learning model for brain tumor classification(Qurat-ul-ain Mastoi, Shahid Latif, S. Brohi, Jawad Ahmad, Abdulmajeed Alqhatani, Mohammed S. Alshehri, Alanoud Al Mazroa, Rahmat Ullah, 2025, Frontiers in Oncology)
- Explainable AI for Retinoblastoma Diagnosis: Interpreting Deep Learning Models with LIME and SHAP(C. Chiesa-Estomba, M. Graña, Bader Aldughayfiq, Farzeen Ashfaq, Noor Zaman Jhanjhi, M. Humayun, 2023, Diagnostics)
- Machine learning explainability in nasopharyngeal cancer survival using LIME and SHAP(R. O. Alabi, M. Elmusrati, I. Leivo, A. Almangush, A. Mäkitie, 2023, Scientific Reports)
- Preliminary analysis of explainable machine learning methods for multiple myeloma chemotherapy treatment recognition(Nesma Settouti, M. Saidi, 2023, Evolutionary Intelligence)
- Bridging the Gap Between Black Box AI and Clinical Practice: Advancing Explainable AI for Trust, Ethics, and Personalized Healthcare Diagnostics(Đặng Anh Tuấn, 2024, Preprints.org)
- Explainability and Trust in Deep Learning for Cancer Imaging: Systematic Barriers, Clinical Misalignment, and a Translational Roadmap(Surekha Borra, Nilanjan Dey, Simon Fong, R. Sherratt, Fuqian Shi, 2026, Cancers)
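To make this group concrete, the sketch below shows one common way to produce a Grad-CAM heatmap for a convolutional classifier in PyTorch; the backbone, target layer, preprocessing, and image path are illustrative assumptions, not the pipeline of any specific paper above.

```python
# Minimal Grad-CAM sketch for an image classifier (illustrative only; the model,
# target layer, and preprocessing are placeholder assumptions).
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target_layer = model.layer4                      # last convolutional block
activations, gradients = {}, {}

def fwd_hook(_module, _inputs, output):
    activations["value"] = output.detach()

def bwd_hook(_module, _grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def grad_cam(image: Image.Image) -> torch.Tensor:
    """Return an HxW heatmap in [0, 1] for the top predicted class."""
    x = preprocess(image).unsqueeze(0)
    logits = model(x)
    cls = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, cls].backward()
    # Weight each feature map by its average gradient, then ReLU and normalize.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]

# heatmap = grad_cam(Image.open("example_ct_slice.png").convert("RGB"))
```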
Evidence-based evaluation of XAI explanation types: explanation granularity, semantics, coloring, and performance across user groups
Common thread: these papers center on how explanations should be designed to be useful to users, relying on experimental, clinical, or user studies to quantify how different explanation types (local vs. global, low-level vs. high-level features, semantic vs. traditional coloring, etc.) affect diagnostic accuracy, efficiency, subjective utility, and trust calibration, and they propose hierarchy-of-evidence frameworks for XAI design. A sketch of how such reader-study results might be summarized follows the reference list below.
- Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems(Lorenzo Famiglini, Andrea Campagner, M. Barandas, Giovanni Andrea La Maida, Enrico Gallazzi, Federico Cabitza, 2024, Computers in Biology and Medicine)
- Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI.(Drew Prinster, Amama Mahmood, S. Saria, Jean Jeudy, Cheng Ting Lin, Paul H. Yi, Chien-Ming Huang, 2024, Radiology)
- Meaningful Explanation Effect on User’s Trust in an AI Medical System: Designing Explanations for Non-Expert Users(Retno Larasati, Anna De Liddo, Enrico Motta, 2023, ACM Transactions on Interactive Intelligent Systems)
- How the different explanation classes impact trust calibration: The case of clinical decision support systems(Mohammad Naiseh, Dena Al-Thani, Nan Jiang, Raian Ali, 2022, International Journal of Human-Computer Studies)
- Evidence-based XAI of clinical decision support systems for differential diagnosis: Design, implementation, and evaluation(Y. Miyachi, O. Ishii, K. Torigoe, 2024, medRxiv)
- The Impact of Explainable AI On EHR-Based Clinical Risk Prediction: A Quantitative Evaluation of Transparency and Diagnostic Accuracy(Hisham Mahmood, Khairum Nahar Pinky, 2024, International Journal of Scientific Interdisciplinary Research)
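As a rough illustration of how this kind of reader-study evidence can be quantified (the column names and toy values are hypothetical, not data from the studies above), one might tabulate physician accuracy by explanation type and AI correctness, plus a simple agreement-with-AI proxy:

```python
# Illustrative summary of a hypothetical reader-study table (schema is assumed).
import pandas as pd

# One row per (physician, case): which explanation type was shown, whether the
# simulated AI advice was correct, and whether the physician's final read was correct.
df = pd.DataFrame({
    "explanation_type":  ["local", "local", "global", "global", "local", "global"],
    "ai_correct":        [True,    False,   True,     False,    True,    True],
    "physician_correct": [True,    False,   True,     True,     True,    False],
    "followed_ai":       [True,    True,    True,     False,    True,    False],
})

# Diagnostic accuracy stratified by explanation type and AI correctness.
accuracy = (df.groupby(["explanation_type", "ai_correct"])["physician_correct"]
              .mean()
              .rename("accuracy"))

# A crude "simple trust" proxy: how often readers aligned with the AI advice.
alignment = df.groupby("explanation_type")["followed_ai"].mean().rename("agreement_rate")

print(accuracy, alignment, sep="\n\n")
```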
Measuring trustworthiness and trust calibration in clinical CDSS: communication gaps, adoption/reliance, and clinical risk
Common thread: these papers evaluate, from the standpoint of real clinical interaction, how explanations affect trust, adoption behavior, and potential biases (e.g., confirmation bias, automation bias); they also emphasize that trust stems not only from explainability but also from uncertainty handling, workflow embedding, liability, and clinical adoption metrics. A sketch of simple reliance metrics follows the reference list below.
- Assessing the communication gap between AI models and healthcare professionals: Explainability, utility and trust in AI-driven clinical decision-making(Oskar Wysocki, J. Davies, Markel Vigo, A. Armstrong, Dónal Landers, Rebecca Lee, A. Freitas, 2022, Artificial Intelligence)
- Artificial intelligence and clinical decision support: clinicians’ perspectives on trust, trustworthiness, and liability(Caroline Jones, James Thornton, J. Wyatt, 2023, Medical Law Review)
- Explainable AI for Clinical Decision Support Systems: Literature Review, Key Gaps, and Research Synthesis(Mozhgan Salimparsa, K. Sedig, Dan Lizotte, Sheikh S. Abdullah, Niaz Chalabianloo, F. Muanda, 2025, Informatics)
- Explainable AI in healthcare: Factors influencing medical practitioners' trust calibration in collaborative tasks(Mahdieh Darvish, J. Holst, Markus Bick, 2024, Proceedings of the Annual Hawaii International Conference on System Sciences)
- Explainable AI for lung cancer detection via a custom CNN on CT images(Mohamed Hammad, Mohammed Elaffendi, A. El-latif, A. A. Ateya, Gauhar Ali, Paweł Pławiak, 2025, Scientific Reports)
- The Trust-Aware XAI (TAXAI) framework: a quantitative model for interpretable and reliable clinical AI systems(M. Pal, H. Saha, Amlan Chakrabarti, 2026, Scientific Reports)
- Explainable recommendation: when design meets trust calibration(Mohammad Naiseh, Dena Al-Thani, Nan Jiang, Raian Ali, 2021, World Wide Web)
- Nudging through Friction: An Approach for Calibrating Trust in Explainable AI(Mohammad Naiseh, Reem S. Al-Mansoori, Dena Al-Thani, Nan Jiang, Raian Ali, 2021, 2021 8th International Conference on Behavioral and Social Computing (BESC))
- Clinician-informed XAI evaluation checklist with metrics (CLIX-M) for AI-powered clinical decision support systems(A. Brankovic, David Cook, Jessica Rahman, Alana Delaforce, Jane Li, Farah Magrabi, F. Cabitza, Enrico W. Coiera, Danakai Bradford, 2025, npj Digital Medicine)
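A minimal sketch of the reliance/trust-calibration rates these papers discuss, computed from (AI correct, clinician followed) pairs; the field names and the toy log are assumptions for illustration:

```python
# Sketch of common reliance metrics from (ai_correct, clinician_followed) pairs.
from dataclasses import dataclass

@dataclass
class Interaction:
    ai_correct: bool     # was the AI recommendation correct?
    followed_ai: bool    # did the clinician adopt the AI recommendation?

def reliance_rates(interactions):
    ai_wrong = [i for i in interactions if not i.ai_correct]
    ai_right = [i for i in interactions if i.ai_correct]
    # Over-reliance: following the AI when it was wrong.
    over_reliance = sum(i.followed_ai for i in ai_wrong) / max(len(ai_wrong), 1)
    # Under-reliance: rejecting the AI when it was right.
    under_reliance = sum(not i.followed_ai for i in ai_right) / max(len(ai_right), 1)
    # Appropriate reliance: fraction of all interactions that were neither error type.
    appropriate = 1.0 - (
        (sum(i.followed_ai for i in ai_wrong) + sum(not i.followed_ai for i in ai_right))
        / max(len(interactions), 1)
    )
    return {"over_reliance": over_reliance,
            "under_reliance": under_reliance,
            "appropriate_reliance": appropriate}

log = [Interaction(True, True), Interaction(False, True), Interaction(True, False)]
print(reliance_rates(log))
```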
Explainability and uncertainty/reliability engineering: fidelity, stability, calibration, and robustness
Common thread: these papers ground the trustworthiness of explanations in engineering and statistical metrics: fidelity, stability, consistency, probabilistic reliability/calibration, and cautious handling or flagging of samples outside the training distribution (outlier flagging). The emphasis is that explanations must come with verifiable reliability. A calibration-metric sketch follows the reference list below.
- Explainable deep learning framework for brain tumor detection: Integrating LIME, Grad-CAM, and SHAP for enhanced accuracy.(A. Akgündoğdu, Şerife Çelikbaş, 2025, Medical Engineering & Physics)
- The Impact of Explainable AI On EHR-Based Clinical Risk Prediction: A Quantitative Evaluation of Transparency and Diagnostic Accuracy(Hisham Mahmood, Khairum Nahar Pinky, 2024, International Journal of Scientific Interdisciplinary Research)
- Explainability and Trust in Deep Learning for Cancer Imaging: Systematic Barriers, Clinical Misalignment, and a Translational Roadmap(Surekha Borra, Nilanjan Dey, Simon Fong, R. Sherratt, Fuqian Shi, 2026, Cancers)
- Cautious Artificial Intelligence Improves Outcomes and Trust by Flagging Outlier Cases.(Abhiraj S Kanse, N. Kurian, H. P. Aswani, Z. Khan, P. Gann, S. Rane, A. Sethi, 2022, JCO Clinical Cancer Informatics)
- Deep learning referral suggestion and tumour discrimination using explainable artificial intelligence applied to multiparametric MRI(Hyungseob Shin, Ji Eun Park, Yohan Jun, Taejoon Eo, J. Lee, Ji Eun Kim, Da Hyun Lee, H. Moon, S. Park, Seonok Kim, D. Hwang, H. Kim, 2023, European Radiology)
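For the calibration component of this thread, here is a minimal sketch of two standard probability-reliability metrics (Brier score and expected calibration error) on synthetic binary predictions; the bin count and data are illustrative choices:

```python
# Probability-calibration sketch: Brier score and expected calibration error (ECE)
# for binary tumor/no-tumor predictions (synthetic values).
import numpy as np

def brier_score(y_true, p_pred):
    y_true, p_pred = np.asarray(y_true, float), np.asarray(p_pred, float)
    return float(np.mean((p_pred - y_true) ** 2))

def expected_calibration_error(y_true, p_pred, n_bins=10):
    y_true, p_pred = np.asarray(y_true, float), np.asarray(p_pred, float)
    # Assign each prediction to a bin; np.clip keeps p == 1.0 in the last bin.
    bin_ids = np.clip((p_pred * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between mean confidence and observed frequency, weighted by bin size.
            gap = abs(p_pred[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

y = [0, 1, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.9, 0.7, 0.3, 0.8, 0.6, 0.2, 0.95]
print("Brier:", brier_score(y, p), "ECE:", expected_calibration_error(y, p, n_bins=5))
```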
Training/modeling frameworks that couple explanation with performance (two-stage training, joint optimization, personalized explanations)
Common thread: these papers build frameworks in which explanation is not just a post hoc display but is coupled with model training and the overall pipeline: for example, two-stage training that augments data with explanation masks, explanation mechanisms that raise performance and transparency at the same time, and interpretable modules that support usable decisions in rare-tumor and personalized prediction. A sketch of the mask-as-augmentation idea follows the reference list below.
- Explainable deep learning framework for brain tumor detection: Integrating LIME, Grad-CAM, and SHAP for enhanced accuracy.(A. Akgündoğdu, Şerife Çelikbaş, 2025, Medical Engineering & Physics)
- Personalized health monitoring using explainable AI: bridging trust in predictive healthcare(M. S. Vani, R. V. Sudhakar, A. Mahendar, Sukanya Ledalla, Marepalli. Radha, M. Sunitha, 2025, Scientific Reports)
- MOSAIC: An Artificial Intelligence–Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers(S. D'amico, Lorenzo Dall'olio, C. Rollo, Patricia Alonso, Iñigo Prada-Luengo, Daniele Dall’Olio, C. Sala, Elisabetta Sauta, G. Asti, L. Lanino, G. Maggioni, Alessia Campagna, Elena Zazzetti, Mattia Delleani, M. Bicchieri, Pierandrea Morandini, V. Savevski, Borja Arroyo, J. Parras, Lin-Pierre Zhao, U. Platzbecker, M. Díez-Campelo, V. Santini, P. Fenaux, T. Haferlach, Anders Krogh, S. Zazo, P. Fariselli, T. Sanavia, M. D. Della Porta, G. Castellani, 2024, JCO Clinical Cancer Informatics)
- Explainable Machine Learning for the Early Clinical Detection of Ovarian Cancer Using Contrastive Explanations(Z. Kucukakcali, I. Cicek, S. Akbulut, 2025, Journal of Clinical Medicine)
- Machine learning explainability in nasopharyngeal cancer survival using LIME and SHAP(R. O. Alabi, M. Elmusrati, I. Leivo, A. Almangush, A. Mäkitie, 2023, Scientific Reports)
- AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines(D. J. Spaanderman, Matthew Marzetti, Xinyi Wan, A. Scarsbrook, Philip Robinson, E. Oei, Jacob J. Visser, R. Hemke, K. van Langevelde, D. Hanff, G. V. van Leenders, Cornelis Verhoef, D. Grünhagen, W. Niessen, Stefan Klein, M. Starmans, 2024, eBioMedicine)
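A hedged sketch of the "explanation mask as data augmentation" idea behind two-stage training: a stage-1 model's attribution map is turned into a soft mask and used to reweight images for stage 2. The mask source (a hypothetical grad_cam_batch helper) and the blending rule are assumptions, not the exact procedure of the cited work.

```python
# Stage 1 trains a base classifier; attribution maps from it become soft masks;
# stage 2 retrains on mask-weighted copies of the images (illustrative only).
import torch

def soft_mask_from_attribution(attr: torch.Tensor, floor: float = 0.3) -> torch.Tensor:
    """Normalize an HxW attribution map to [floor, 1] so background is dimmed, not erased."""
    attr = (attr - attr.min()) / (attr.max() - attr.min() + 1e-8)
    return floor + (1.0 - floor) * attr

def masked_augment(images: torch.Tensor, attributions: torch.Tensor) -> torch.Tensor:
    """images: (N, C, H, W); attributions: (N, H, W), e.g. Grad-CAM maps from the stage-1 model."""
    masks = torch.stack([soft_mask_from_attribution(a) for a in attributions])  # (N, H, W)
    return images * masks.unsqueeze(1)  # broadcast the mask over channels

# Stage 2 (pseudo-loop): train on original plus mask-weighted images.
# for x, y in loader:
#     attr = grad_cam_batch(stage1_model, x)            # hypothetical helper
#     x_aug = torch.cat([x, masked_augment(x, attr)])
#     y_aug = torch.cat([y, y])
#     loss = criterion(stage2_model(x_aug), y_aug)
```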
Extending explainable AI to rare/heterogeneous tumors: generalization, transfer, and privacy protection (including federated learning)
Common thread: these papers target the specific challenges of rare-tumor settings (small samples, heterogeneity, cross-site domain shift, insufficient external validation) and discuss how federated/distributed learning and structured evaluation can improve explainability and the feasibility of clinical translation, with emphasis on generalization and on regulatory/compliance boundaries. A minimal federated-averaging sketch follows the reference list below.
- Artificial Intelligence and Machine Learning in Pediatric Endocrine Tumors: Opportunities, Pitfalls, and a Roadmap for Trustworthy Clinical Translation(Michaela Kuhlen, Fabio Hellmann, Elisabeth Pfaehler, Elisabeth André, Antje Redlich, 2026, Biomedicines)
- MOSAIC: An Artificial Intelligence–Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers(S. D'amico, Lorenzo Dall'olio, C. Rollo, Patricia Alonso, Iñigo Prada-Luengo, Daniele Dall’Olio, C. Sala, Elisabetta Sauta, G. Asti, L. Lanino, G. Maggioni, Alessia Campagna, Elena Zazzetti, Mattia Delleani, M. Bicchieri, Pierandrea Morandini, V. Savevski, Borja Arroyo, J. Parras, Lin-Pierre Zhao, U. Platzbecker, M. Díez-Campelo, V. Santini, P. Fenaux, T. Haferlach, Anders Krogh, S. Zazo, P. Fariselli, T. Sanavia, M. D. Della Porta, G. Castellani, 2024, JCO Clinical Cancer Informatics)
- Explainable AI in medical imaging: an interpretable and collaborative federated learning model for brain tumor classification(Qurat-ul-ain Mastoi, Shahid Latif, S. Brohi, Jawad Ahmad, Abdulmajeed Alqhatani, Mohammed S. Alshehri, Alanoud Al Mazroa, Rahmat Ullah, 2025, Frontiers in Oncology)
- Bridging the Gap Between Black Box AI and Clinical Practice: Advancing Explainable AI for Trust, Ethics, and Personalized Healthcare Diagnostics(Đặng Anh Tuấn, 2024, Preprints.org)
- AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines(D. J. Spaanderman, Matthew Marzetti, Xinyi Wan, A. Scarsbrook, Philip Robinson, E. Oei, Jacob J. Visser, R. Hemke, K. van Langevelde, D. Hanff, G. V. van Leenders, Cornelis Verhoef, D. Grünhagen, W. Niessen, Stefan Klein, M. Starmans, 2024, eBioMedicine)
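A minimal federated-averaging (FedAvg) sketch of the kind of multi-site training these papers discuss; client data loading and local training are stubbed out, and weighting by local sample count follows the standard FedAvg scheme:

```python
# Minimal FedAvg sketch for multi-site rare-tumor data (client training is stubbed).
import copy
import torch

def fed_avg(client_states, client_sizes):
    """Average client state_dicts weighted by their local sample counts."""
    total = float(sum(client_sizes))
    avg_state = copy.deepcopy(client_states[0])
    for key, ref in avg_state.items():
        weighted = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
        avg_state[key] = weighted.to(ref.dtype)   # keep original dtype (e.g. int buffers)
    return avg_state

def federated_round(global_model: torch.nn.Module, clients, local_train_fn):
    """clients: list of (dataloader, n_samples); local_train_fn returns an updated state_dict."""
    states, sizes = [], []
    for loader, n_samples in clients:
        local_model = copy.deepcopy(global_model)   # each site starts from the global weights
        states.append(local_train_fn(local_model, loader))
        sizes.append(n_samples)
    global_model.load_state_dict(fed_avg(states, sizes))
    return global_model
```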
Overall, the literature can be organized into five parallel threads: (1) XAI method suites deployed in imaging diagnosis (LIME/Grad-CAM/SHAP, etc.); (2) evidence-based evaluation of explanation types and presentation formats for clinical users; (3) trust measurement and calibration grounded in real CDSS interaction (communication gaps, automation bias, adoption, and confirmation bias); (4) explanation reliability engineering (fidelity, stability, consistency, calibration, out-of-distribution caution); and (5) coupling explanations with modeling frameworks for rare tumors, covering generalization, personalization, and privacy protection (including federated learning, multimodality, and translational roadmaps).
A total of 32 related publications.
Abstract excerpt: layer-wise relevance propagation (LRP) is an explainable-AI method that redistributes the model output back onto its inputs; reported applications span a range of common and rare diseases, including haemorrhage, brain tumours, and related classification tasks.
PURPOSE Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)–based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities. METHODS We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods, compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into distributed infrastructure. RESULTS UMAP + HDBSCAN clustering obtained a more granular patient stratification, achieving a higher average silhouette coefficient (0.16) with respect to HDP (0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% and 85.8% ± 0.8%). AI methods for survival prediction outperform conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stands in the internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02). SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models. CONCLUSION MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection.
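The clustering workflow described in this abstract (UMAP embedding, HDBSCAN clustering, silhouette comparison) can be approximated with the open-source umap-learn and hdbscan packages; the sketch below uses synthetic data and placeholder hyperparameters, not MOSAIC's actual settings:

```python
# Rough sketch of a UMAP + HDBSCAN + silhouette workflow on synthetic features.
import numpy as np
import umap
import hdbscan
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Fake "clinical/genomic" feature matrix with two latent groups.
X = np.vstack([rng.normal(0, 1, (200, 30)), rng.normal(4, 1, (200, 30))])

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                      random_state=0).fit_transform(X)
labels = hdbscan.HDBSCAN(min_cluster_size=25).fit_predict(embedding)

# Silhouette is computed on non-noise points (HDBSCAN marks noise as -1).
keep = labels != -1
if keep.sum() > 0 and len(set(labels[keep])) > 1:
    print("average silhouette:", silhouette_score(embedding[keep], labels[keep]))
```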
Deep learning (DL) has transformed cancer imaging by enabling automated tumour detection, classification, and risk prediction. Despite impressive diagnostic performance, limited explainability and poor uncertainty calibration continue to restrict clinical integration. This review is guided by five research questions that examine the challenges, impact, and translational implications of explainable artificial intelligence (XAI) in oncology imaging. We identify key barriers to trust, including dataset bias, shortcut learning, opacity of convolutional neural networks, and workflow misalignment. Evidence suggests that explainable models can increase clinician confidence, reduce false positives, and improve collaborative decision-making when explanations are faithful, semantically meaningful, and uncertainty aware. We evaluate architectural strategies that embed interpretability such as concept-bottleneck models, prototype-based learning, and attention regularization along with post hoc techniques. Beyond performance metrics, we examine how interpretable AI aligns with clinical reasoning processes and analyse regulatory, ethical, and medico-legal considerations influencing deployment. The findings indicate that explainability alone is insufficient, durable trust requires epistemic alignment, prospective validation, lifecycle governance, and equity-focused evaluation. By reframing explainability as a structural design principle rather than a supplementary feature, this review outlines a pathway toward accountable and clinically dependable AI systems in oncology.
Abstract Artificial intelligence (AI) could revolutionise health care, potentially improving clinician decision making and patient safety, and reducing the impact of workforce shortages. However, policymakers and regulators have concerns over whether AI and clinical decision support systems (CDSSs) are trusted by stakeholders, and indeed whether they are worthy of trust. Yet, what is meant by trust and trustworthiness is often implicit, and it may not be clear who or what is being trusted. We address these lacunae, focusing largely on the perspective(s) of clinicians on trust and trustworthiness in AI and CDSSs. Empirical studies suggest that clinicians’ concerns about their use include the accuracy of advice given and potential legal liability if harm to a patient occurs. Onora O’Neill’s conceptualisation of trust and trustworthiness provides the framework for our analysis, generating a productive understanding of clinicians’ reported trust issues. Through unpacking these concepts, we gain greater clarity over the meaning ascribed to them by stakeholders; delimit the extent to which stakeholders are talking at cross purposes; and promote the continued utility of trust and trustworthiness as useful concepts in current debates around the use of AI and CDSSs.
Artificial intelligence (AI) and machine learning (ML) are reshaping cancer research and care. In pediatric oncology, early evidence-most robust in imaging-suggests value for diagnosis, risk stratification, and assessment of treatment response. Pediatric endocrine tumors are rare and heterogeneous, including intra- and extra-adrenal paraganglioma (PGL), adrenocortical tumors (ACT), differentiated and medullary thyroid carcinoma (DTC/MTC), and gastroenteropancreatic neuroendocrine neoplasms (GEP-NEN). Here, we provide a pediatric-first, entity-structured synthesis of AI/ML applications in endocrine tumors, paired with a methods-for-clinicians primer and a pediatric endocrine tumor guardrails checklist mapped to contemporary reporting/evaluation standards. We also outline a realistic EU-anchored roadmap for translation that leverages existing infrastructures (EXPeRT, ERN PaedCan). We find promising-yet preliminary-signals for early non-remission/recurrence modeling in pediatric DTC and interpretable survival prediction in pediatric ACT. For PGL and GEP-NEN, evidence remains adult-led (biochemical ML screening scores; CT/PET radiomics for metastatic risk or peptide receptor radionuclide therapy response) and serves primarily as methodological scaffolding for pediatrics. Cross-cutting insights include the centrality of calibration and validation hierarchy and the current limits of explainability (radiomics texture semantics; saliency ≠ mechanism). Translation is constrained by small datasets, domain shift across age groups and sites, limited external validation, and evolving regulatory expectations. We close with pragmatic, clinically anchored steps-benchmarks, multi-site pediatric validation, genotype-aware evaluation, and equity monitoring-to accelerate safe, equitable adoption in pediatric endocrine oncology.
PURPOSE Artificial intelligence (AI) models for medical image diagnosis are often trained and validated on curated data. However, in a clinical setting, images that are outliers with respect to the training data, such as those representing rare disease conditions or acquired using a slightly different setup, can lead to wrong decisions. It is not practical to expect clinicians to be trained to discount results for such outlier images. Toward clinical deployment, we have designed a method to train cautious AI that can automatically flag outlier cases. MATERIALS AND METHODS Our method-ClassClust-forms tight clusters of training images using supervised contrastive learning, which helps it identify outliers during testing. We compared ClassClust's ability to detect outliers with three competing methods on four publicly available data sets covering pathology, dermatoscopy, and radiology. We held out certain diseases, artifacts, and types of images from training data and examined the ability of various models to detect these as outliers during testing. We compared the decision accuracy of the models on held-out nonoutlier images also. We visualized the regions of the images that the models used for their decisions. RESULTS Area under receiver operating characteristic curve for outlier detection was consistently higher using ClassClust compared with the previous methods. Average accuracy on held-out nonoutlier images was also higher, and the visualizations of image regions were more informative using ClassClust. CONCLUSION The ability to flag outlier test cases need not be at odds with the ability to accurately classify nonoutliers in AI models. Although the latter capability has received research and regulatory attention, AI models for clinical deployment should possess the former as well.
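A much simpler stand-in for the outlier-flagging idea in this abstract (not ClassClust itself): score each test case by its distance to the nearest training-class centroid in an embedding space, take a flagging threshold from a training-set percentile, and report outlier-detection AUROC.

```python
# Simplified outlier-flagging sketch on synthetic embeddings (not ClassClust).
import numpy as np
from sklearn.metrics import roc_auc_score

def centroid_outlier_scores(train_emb, train_labels, test_emb):
    centroids = np.stack([train_emb[train_labels == c].mean(axis=0)
                          for c in np.unique(train_labels)])
    dists = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.min(axis=1)   # higher = more outlier-like

rng = np.random.default_rng(1)
train_emb = rng.normal(0, 1, (300, 16))
train_labels = rng.integers(0, 3, 300)
inliers = rng.normal(0, 1, (100, 16))
outliers = rng.normal(6, 1, (40, 16))      # e.g., a condition held out from training

scores = centroid_outlier_scores(train_emb, train_labels, np.vstack([inliers, outliers]))
y_outlier = np.r_[np.zeros(100), np.ones(40)]
threshold = np.percentile(centroid_outlier_scores(train_emb, train_labels, train_emb), 95)

print("outlier-detection AUROC:", roc_auc_score(y_outlier, scores))
print("flagged for review:", int((scores > threshold).sum()), "of", len(scores))
```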
Summary Background Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review aims to provide an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. Methods The systematic review identified literature from several bibliographic databases, covering papers published before 17/07/2024. Original research published in peer-reviewed journals, focused on radiology-based AI for diagnosis or prognosis of primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers to determine eligibility. Included papers were assessed against the two guidelines by one of three independent reviewers. The review protocol was registered with PROSPERO (CRD42023467970). Findings The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Interpretation Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. defining unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. building on previous work, training with data that reflect real-world usage, explainability), evaluation (e.g. ensuring biases are evaluated and addressed, evaluating AI against current best practices), and the awareness of data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods. Funding 10.13039/501100023452Hanarth Fonds, ICAI Lab, 10.13039/501100000272NIHR, EuCanImage.
Simple Summary For automated cancer diagnosis on medical imaging, explainable artificial intelligence technology uses advanced image analysis methods like deep learning to make a diagnosis and analyze medical images, as well as provide a clear explanation for how it arrived at its diagnosis. The objective of XAI is to provide patients and doctors with a better understanding of the system’s decision-making process and to increase transparency and trust in the diagnosis method. The manual classification of cancer using medical images is a tedious and tiresome process, which necessitates the design of automated tools for the decision-making process. In this study, we explored the significant application of explainable artificial intelligence and an ensemble of deep-learning models for automated cancer diagnosis. To demonstrate the enhanced performance of the proposed model, a widespread comparison study is made with recent models, and the results exhibit the significance of the proposed model on benchmark test images. Therefore, the proposed model has the potential as an automated, accurate, and rapid tool for supporting the detection and classification process of cancer. Abstract Explainable Artificial Intelligence (XAI) is a branch of AI that mainly focuses on developing systems that provide understandable and clear explanations for their decisions. In the context of cancer diagnoses on medical imaging, an XAI technology uses advanced image analysis methods like deep learning (DL) to make a diagnosis and analyze medical images, as well as provide a clear explanation for how it arrived at its diagnoses. This includes highlighting specific areas of the image that the system recognized as indicative of cancer while also providing data on the fundamental AI algorithm and decision-making process used. The objective of XAI is to provide patients and doctors with a better understanding of the system’s decision-making process and to increase transparency and trust in the diagnosis method. Therefore, this study develops an Adaptive Aquila Optimizer with Explainable Artificial Intelligence Enabled Cancer Diagnosis (AAOXAI-CD) technique on Medical Imaging. The proposed AAOXAI-CD technique intends to accomplish the effectual colorectal and osteosarcoma cancer classification process. To achieve this, the AAOXAI-CD technique initially employs the Faster SqueezeNet model for feature vector generation. As well, the hyperparameter tuning of the Faster SqueezeNet model takes place with the use of the AAO algorithm. For cancer classification, the majority weighted voting ensemble model with three DL classifiers, namely recurrent neural network (RNN), gated recurrent unit (GRU), and bidirectional long short-term memory (BiLSTM). Furthermore, the AAOXAI-CD technique combines the XAI approach LIME for better understanding and explainability of the black-box method for accurate cancer detection. The simulation evaluation of the AAOXAI-CD methodology can be tested on medical cancer imaging databases, and the outcomes ensured the auspicious outcome of the AAOXAI-CD methodology than other current approaches.
This study tackles the pressing challenge of lung cancer detection, the foremost cause of cancer-related mortality worldwide, hindered by late detection and diagnostic limitations. Aiming to improve early detection rates and diagnostic reliability, we propose an approach integrating Deep Convolutional Neural Networks (DCNN) with Explainable Artificial Intelligence (XAI) techniques, specifically focusing on the Residual Network (ResNet) architecture and Gradient-weighted Class Activation Mapping (Grad-CAM). Utilizing a dataset of 1,000 CT scans, categorized into normal, non-cancerous, and three types of lung cancer images, we adapted the ResNet50 model through transfer learning and fine-tuning for enhanced specificity in lung cancer subtype detection. Our methodology demonstrated the modified ResNet50 model's effectiveness, significantly outperforming the original architecture in accuracy (91.11%), precision (91.66%), sensitivity (91.11%), specificity (96.63%), and F1-score (91.10%). The inclusion of Grad-CAM provided insightful visual explanations for the model's predictions, fostering transparency and trust in computer-assisted diagnostics. The study highlights the potential of combining DCNN with XAI to advance lung cancer detection, suggesting future research should expand dataset diversity and explore multimodal data integration for broader applicability and improved diagnostic capabilities.
Medical image segmentation aims to identify important or suspicious regions within medical images. However, many challenges are usually faced while developing networks for this type of analysis. First, preserving the original image resolution is of utmost importance for this task where identifying subtle features or abnormalities can significantly impact the accuracy of diagnosis. While introducing the dilated convolution improves the resolution of the convolutional neural network (CNN), it is not without shortcoming, i.e., the loss of local spatial resolution due to increased kernel sparsity in checkboard patterns. To address this shortcoming, we conceptualize a double-dilated convolution module for maintaining local spatial resolution while improving the receptive field size. Then, this approach is applied, as a proof-of-work, to tumor segmentation task in mammograms. In addition, our proposal also tackles the class imbalance problem, originating at the pixel level of the mammogram screenings, by identifying and selecting the best candidate among a number of potential loss functions to facilitate mass segmentation. We also carry out quantitative and qualitative evaluations of the interpretability of our proposal by leveraging Grad-CAM (Gradient weighted Class Activation Map). We also present a comparative performance evaluation with existing explainable techniques tailored for segmenting images. Moreover, an empirical assessment on lesion segmentation is conducted on mammogram samples from the INBreast dataset, both with and without incorporating our envisaged dilation module into CNN. The obtained results elucidate the effectiveness of our proposal based on mass segmentation performance measures, such as Dice similarity and Miss Detection rate. Our analysis also promotes using the Tversky Loss function in training pixel-imbalanced data and integrating Grad-CAM for explaining image segmentation results.
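The Tversky loss mentioned in this abstract has a standard form; below is a common PyTorch formulation, with the alpha/beta weights on false positives and false negatives chosen for illustration only:

```python
# Common Tversky loss formulation for binary segmentation masks.
import torch

def tversky_loss(pred_probs: torch.Tensor, target: torch.Tensor,
                 alpha: float = 0.3, beta: float = 0.7, eps: float = 1e-6) -> torch.Tensor:
    """pred_probs and target are (N, H, W) tensors in [0, 1]; target is the binary mass mask."""
    pred = pred_probs.reshape(pred_probs.size(0), -1)
    tgt = target.reshape(target.size(0), -1)
    tp = (pred * tgt).sum(dim=1)
    fp = (pred * (1 - tgt)).sum(dim=1)
    fn = ((1 - pred) * tgt).sum(dim=1)
    # Tversky index: TP / (TP + alpha*FP + beta*FN); beta > alpha penalizes misses more.
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index.mean()

# Example: sigmoid outputs of a segmentation network vs. ground-truth masks.
logits = torch.randn(2, 256, 256)
masks = (torch.rand(2, 256, 256) > 0.9).float()
print(tversky_loss(torch.sigmoid(logits), masks))
```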
Introduction A brain tumor is a collection of abnormal cells in the brain that can become life-threatening due to its ability to spread. Therefore, a prompt and meticulous classification of the brain tumor is an essential element in healthcare care. Magnetic Resonance Imaging (MRI) is the central resource for producing high-quality images of soft tissue and is considered the principal technology for diagnosing brain tumors. Recently, computer vision techniques such as deep learning (DL) have played an important role in the classification of brain tumors, most of which use traditional centralized classification models, which face significant challenges due to the insufficient availability of diverse and representative datasets and exacerbate the difficulties in obtaining a transparent model. This study proposes a collaborative federated learning model (CFLM) with explainable artificial intelligence (XAI) to mitigate existing problems using state-of-the-art methods. Methods The proposed method addresses four class classification problems to identify glioma, meningioma, no tumor, and pituitary tumors. We have integrated GoogLeNet with a federated learning (FL) framework to facilitate collaborative learning on multiple devices to maintain the privacy of sensitive information locally. Moreover, this study also focuses on the interpretability to make the model transparent using Gradient-weighted class activation mapping (Grad-CAM) and saliency map visualizations. Results In total, 10 clients were selected for the proposed model with 50 communication rounds, each with decentralized local datasets for training. The proposed approach achieves 94% classification accuracy. Moreover, we incorporate Grad-CAM with heat maps and saliency maps to offer interpretability and meaningful graphical interpretations for healthcare specialists. Conclusion This study outlines an efficient and interpretable model for brain tumor classification by introducing an integrated technique using FL with GoogLeNet architecture. The proposed framework has great potential to improve brain tumor classification to make them more reliable and transparent for clinical use.
Lung cancer, which claims 1.8 million lives annually, is still one of the leading causes of cancer-related deaths globally. Patients with lung cancer frequently have a bad prognosis because of late-stage detection, which severely limits treatment options and decreases survival rates. Early detection is essential for better outcomes, but traditional CT image analysis is time-consuming, prone to error, and relies on subjective judgments. To overcome these issues, we propose a custom convolutional neural network (CNN) combined with explainable AI (XAI) techniques, particularly gradient-weighted class activation mapping (Grad-CAM). This approach is intended to reliably classify lung cancer into squamous cell carcinoma, large cell carcinoma, or adenocarcinoma. Unlike conventional methods, our approach not only achieves highly accurate classification of lung cancer subtypes but also incorporates clinically validated interpretability features to ensure alignment with medical diagnostics. Our model trained on a comprehensive dataset of CT images achieved an overall accuracy of 93.06%. This performance demonstrates the model’s robustness in detecting even subtle malignancies, with strong precision, recall, and F1-scores across all cancer types. Including interpretable Grad-CAM visualizations ensures reliability and transparency, aiding clinicians in understanding the model’s predictions. This innovative method demonstrates the potential to revolutionize early lung cancer detection and improve patient survival rates by combining state-of-the-art accuracy with explainability tailored for clinical application.
Abstract excerpt: specifies requirements for XAI in CDSS and the features of the XAI model, and evaluates the models' XAI performance, including the effect of "Selecting" on the surrogate model's prediction and its XAI output.
This paper proposes a user study aimed at evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task, the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two oft-neglected features of CAMs, that is granularity and coloring, in terms of what features, lower-level vs higher-level, should the maps highlight and adopting which coloring scheme, to bring better impact to the decision-making process, both in terms of diagnostic accuracy (that is effectiveness) and of user-centered dimensions, such as perceived confidence and utility (that is satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level features CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level features CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of adopting an evidence-based and human-centered approach to design and evaluate AI- and XAI-assisted diagnostic tools. To this aim, the paper also proposes a hierarchy of evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available or to focus on the gaps in the literature that need to be filled to move from opinionated and eminence-based research to one more based on empirical evidence and end-user work and preferences.
The rapid growth of clinical explainable AI (XAI) models raised concerns over unclear purposes and false hope regarding explanations. Currently, no standardised metrics exist for XAI evaluation. We developed a clinician-informed, 14-item checklist including clinical, machine and decision attributes. This is the first step toward XAI standardisation and transparent reporting XAI methods to enhance trust, reduce risks, foster AI adoption, and improve decisions to determine the true clinical potential of applied XAI.
While Artificial Intelligence (AI) promises significant enhancements for Clinical Decision Support Systems (CDSSs), the opacity of many AI models remains a major barrier to clinical adoption, primarily due to interpretability and trust challenges. Explainable AI (XAI) seeks to bridge this gap by making model reasoning understandable to clinicians, but technical XAI solutions have too often failed to address real-world clinician needs, workflow integration, and usability concerns. This study synthesizes persistent challenges in applying XAI to CDSS—including mismatched explanation methods, suboptimal interface designs, and insufficient evaluation practices—and proposes a structured, user-centered framework to guide more effective and trustworthy XAI-CDSS development. Drawing on a comprehensive literature review, we detail a three-phase framework encompassing user-centered XAI method selection, interface co-design, and iterative evaluation and refinement. We demonstrate its application through a retrospective case study analysis of a published XAI-CDSS for sepsis care. Our synthesis highlights the importance of aligning XAI with clinical workflows, supporting calibrated trust, and deploying robust evaluation methodologies that capture real-world clinician–AI interaction patterns, such as negotiation. The case analysis shows how the framework can systematically identify and address user-centric gaps, leading to better workflow integration, tailored explanations, and more usable interfaces. We conclude that achieving trustworthy and clinically useful XAI-CDSS requires a fundamentally user-centered approach; our framework offers actionable guidance for creating explainable, usable, and trusted AI systems in healthcare.
Nasopharyngeal cancer (NPC) has a unique histopathology compared with other head and neck cancers. Individual NPC patients may attain different outcomes. This study aims to build a prognostic system by combining a highly accurate machine learning model (ML) model with explainable artificial intelligence to stratify NPC patients into low and high chance of survival groups. Explainability is provided using Local Interpretable Model Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) techniques. A total of 1094 NPC patients were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database for model training and internal validation. We combined five different ML algorithms to form a uniquely stacked algorithm. The predictive performance of the stacked algorithm was compared with a state-of-the-art algorithm—extreme gradient boosting (XGBoost) to stratify the NPC patients into chance of survival groups. We validated our model with temporal validation (n = 547) and geographic external validation (Helsinki University Hospital NPC cohort, n = 60). The developed stacked predictive ML model showed an accuracy of 85.9% while the XGBoost had 84.5% after the training and testing phases. This demonstrated that both XGBoost and the stacked model showed comparable performance. External geographic validation of XGBoost model showed a c-index of 0.74, accuracy of 76.7%, and area under curve of 0.76. The SHAP technique revealed that age of the patient at diagnosis, T-stage, ethnicity, M-stage, marital status, and grade were among the prominent input variables in decreasing order of significance for the overall survival of NPC patients. LIME showed the degree of reliability of the prediction made by the model. In addition, both techniques showed how each feature contributed to the prediction made by the model. LIME and SHAP techniques provided personalized protective and risk factors for each NPC patient and unraveled some novel non-linear relationships between input features and survival chance. The examined ML approach showed the ability to predict the chance of overall survival of NPC patients. This is important for effective treatment planning care and informed clinical decisions. To enhance outcome results, including survival in NPC, ML may aid in planning individualized therapy for this patient population.
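A minimal sketch of the SHAP side of such a pipeline on a tree-based classifier; the data are synthetic and the feature names only mimic the kinds of variables mentioned in this abstract:

```python
# SHAP feature attributions for a tree-based survival-group classifier (synthetic data).
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(7)
X = pd.DataFrame({
    "age_at_diagnosis": rng.integers(20, 85, 500),
    "t_stage": rng.integers(1, 5, 500),
    "m_stage": rng.integers(0, 2, 500),
    "grade": rng.integers(1, 4, 500),
})
# Fake label: high vs. low chance of survival.
y = ((X["age_at_diagnosis"] < 60) & (X["m_stage"] == 0)).astype(int)

model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss").fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # per-patient feature attributions
print("mean |SHAP| per feature:")
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
        .sort_values(ascending=False))
```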
Retinoblastoma is a rare and aggressive form of childhood eye cancer that requires prompt diagnosis and treatment to prevent vision loss and even death. Deep learning models have shown promising results in detecting retinoblastoma from fundus images, but their decision-making process is often considered a “black box” that lacks transparency and interpretability. In this project, we explore the use of LIME and SHAP, two popular explainable AI techniques, to generate local and global explanations for a deep learning model based on InceptionV3 architecture trained on retinoblastoma and non-retinoblastoma fundus images. We collected and labeled a dataset of 400 retinoblastoma and 400 non-retinoblastoma images, split it into training, validation, and test sets, and trained the model using transfer learning from the pre-trained InceptionV3 model. We then applied LIME and SHAP to generate explanations for the model’s predictions on the validation and test sets. Our results demonstrate that LIME and SHAP can effectively identify the regions and features in the input images that contribute the most to the model’s predictions, providing valuable insights into the decision-making process of the deep learning model. In addition, the use of InceptionV3 architecture with spatial attention mechanism achieved high accuracy of 97% on the test set, indicating the potential of combining deep learning and explainable AI for improving retinoblastoma diagnosis and treatment.
Deep learning approaches have improved disease diagnosis efficiency. However, AI-based decision systems lack sufficient transparency and interpretability. This study aims to enhance the explainability and training performance of deep learning models using explainable artificial intelligence (XAI) techniques for brain tumor detection. A two-stage training approach and XAI methods were implemented. The proposed convolutional neural network achieved 97.20% accuracy, 98.00% sensitivity, 96.40% specificity, and 98.90% ROC-AUC on the BRATS2019 dataset. It was analyzed with explainability techniques including Local Interpretable Model-Agnostic Explanations (LIME), Gradient-weighted Class Activation Mapping (Grad-CAM), and Shapley Additive Explanations (SHAP). The masks generated from these analyses enhanced the dataset, leading to a higher accuracy of 99.40%, 99.20% sensitivity, 99.60% specificity, 99.60% precision, and 99.90% ROC-AUC in the final stage. The integration of LIME, Grad-CAM, and SHAP showed significant success by increasing the accuracy performance of the model from 97.20% to 99.40%. Furthermore, the model was evaluated for fidelity, stability, and consistency and showed reliable and stable results. The same strategy was applied to the BR35H dataset to test the generalizability of the model, and the accuracy increased from 96.80% to 99.80% on this dataset as well, supporting the effectiveness of the method on different data sources.
Abstract excerpt: compares various machine learning models for hepatitis B diagnosis using demographic and other patient data, applying XAI tools (permutation feature importance, LIME, kernel SHAP) to analyze the models.
Background: Ovarian cancer is often diagnosed at advanced stages due to the absence of specific early symptoms, resulting in high mortality rates. This study aims to develop a robust and interpretable machine learning (ML) model for the early detection of ovarian cancer, enhancing its transparency through the use of the Contrastive Explanation Method (CEM), an advanced technique within the field of explainable artificial intelligence (XAI). Methods: An open-access dataset of 349 patients with ovarian cancer or benign ovarian tumors was used. To improve reliability, the dataset was augmented via bootstrap resampling. A three-layer deep neural network was trained on normalized demographic, biochemical, and tumor marker features. Model performance was measured using accuracy, sensitivity, specificity, F1-score, and the Matthews correlation coefficient. CEM was used to explain the model’s classification results, showing which factors push the model toward “Cancer” or “No Cancer” decisions. Results: The model achieved high diagnostic performance, with an accuracy of 95%, sensitivity of 96.2%, and specificity of 93.5%. CEM analysis identified lymphocyte count (CEM value: 1.36), red blood cell count (1.18), plateletcrit (0.036), and platelet count (0.384) as the strongest positive contributors to the “Cancer” classification, with lymphocyte count demonstrating the highest positive relevance, underscoring its critical role in cancer detection. In contrast, age (change from −0.13 to +0.23) and HE4 (change from −0.43 to −0.05) emerged as key factors in reversing classifications, requiring substantial hypothetical increases to shift classification toward the “No Cancer” class. Among benign cases, a significant reduction in RBC count emerged as the strongest determinant driving a shift in classification. Overall, CEM effectively explained both the primary features influencing the model’s classification results and the magnitude of changes necessary to alter its outputs. Conclusions: Using CEM with ML allowed clear and trustworthy detection of early ovarian cancer. This combined approach shows the promise of XAI in assisting clinicians in making decisions in gynecologic oncology.
Background It is unclear whether artificial intelligence (AI) explanations help or hurt radiologists and other physicians in AI-assisted radiologic diagnostic decision-making. Purpose To test whether the type of AI explanation and the correctness and confidence level of AI advice impact physician diagnostic performance, perception of AI advice usefulness, and trust in AI advice for chest radiograph diagnosis. Materials and Methods A multicenter, prospective randomized study was conducted from April 2022 to September 2022. Two types of AI explanations prevalent in medical imaging-local (feature-based) explanations and global (prototype-based) explanations-were a between-participant factor, while AI correctness and confidence were within-participant factors. Radiologists (task experts) and internal or emergency medicine physicians (task nonexperts) received a chest radiograph to read; then, simulated AI advice was presented. Generalized linear mixed-effects models were used to analyze the effects of the experimental variables on diagnostic accuracy, efficiency, physician perception of AI usefulness, and "simple trust" (ie, speed of alignment with or divergence from AI advice); the control variables included knowledge of AI, demographic characteristics, and task expertise. Holm-Sidak corrections were used to adjust for multiple comparisons. Results Data from 220 physicians (median age, 30 years [IQR, 28-32.75 years]; 146 male participants) were analyzed. Compared with global AI explanations, local AI explanations yielded better physician diagnostic accuracy when the AI advice was correct (β = 0.86; P value adjusted for multiple comparisons [Padj] < .001) and increased diagnostic efficiency overall by reducing the time spent considering AI advice (β = -0.19; Padj = .01). While there were interaction effects of explanation type, AI confidence level, and physician task expertise on diagnostic accuracy (β = -1.05; Padj = .04), there was no evidence that AI explanation type or AI confidence level significantly affected subjective measures (physician diagnostic confidence and perception of AI usefulness). Finally, radiologists and nonradiologists placed greater simple trust in local AI explanations than in global explanations, regardless of the correctness of the AI advice (β = 1.32; Padj = .048). Conclusion The type of AI explanation impacted physician diagnostic performance and trust in AI, even when physicians themselves were not aware of such effects. © RSNA, 2024 Supplemental material is available for this article.
This paper contributes with a pragmatic evaluation framework for explainable Machine Learning (ML) models for clinical decision support. The study revealed a more nuanced role for ML explanation models, when these are pragmatically embedded in the clinical context. Despite the general positive attitude of healthcare professionals (HCPs) towards explanations as a safety and trust mechanism, for a significant set of participants there were negative effects associated with confirmation bias, accentuating model over-reliance and increased effort to interact with the model. Also, contradicting one of its main intended functions, standard explanatory models showed limited ability to support a critical understanding of the limitations of the model. However, we found new significant positive effects which repositions the role of explanations within a clinical context: these include reduction of automation bias, addressing ambiguous clinical cases (cases where HCPs were not certain about their decision) and support of less experienced HCPs in the acquisition of new domain knowledge.
This study quantitatively evaluated the impact of explainable artificial intelligence (XAI) on electronic health record (EHR)-based clinical risk prediction by comparing diagnostic accuracy, probability reliability, transparency stability, and clinician-centered utility within a controlled paired design. A retrospective cohort of 12,480 index patient-encounter observations was constructed from a hospital-based EHR system, with 3,744 encounters allocated to the independent test set. Identical feature engineering pipelines, cohort definitions, and data splits were applied to both non-explainable baseline models and explainable models augmented with structured explanation mechanisms. Diagnostic performance was assessed using discrimination, precision-oriented metrics, calibration summaries, and threshold-based error profiles. Transparency was evaluated using global and local explanation concentration indices and stability measures across 20 repeated training runs and controlled perturbations. The explainable model demonstrated higher discrimination performance (mean 0.826, SD 0.016) compared to the baseline model (mean 0.812, SD 0.018), along with improved precision-sensitive performance (0.447 vs. 0.421). Calibration slope improved from 0.91 to 0.97, and false-negative rate decreased from 18.6% to 16.8% at the prespecified operating threshold. Explanation stability was high, with a mean rank correlation of 0.91 across repeated runs. Clinician-centered evaluation (n = 64) showed strong internal reliability (Cronbach’s alpha range: 0.86–0.91) and high comprehension accuracy (84.3%, SD 6.7). Regression analysis indicated that the explainable condition significantly predicted improved discrimination (B = 0.014, 95% CI 0.006–0.022, p = 0.001) and increased odds of correct classification (OR = 1.28, p = 0.002). Explanation clarity significantly predicted clinician adoption proxy scores (B = 0.37, p < 0.001). Overall, findings demonstrated that explainability integration was associated with modest yet consistent improvements in technical performance, probability reliability, explanation stability, and clinician-facing interpretive outcomes under a rigorously controlled evaluation framework.
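The explanation-stability idea in this abstract (rank correlation of feature attributions across repeated runs) can be sketched as follows; the model, resampling scheme, and data are illustrative, and the snippet does not reproduce the reported 0.91 figure:

```python
# Explanation stability: rank correlation of feature importances across repeated runs.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)

importances = []
for seed in range(5):
    # Bootstrap resample, retrain, and record the global importance ranking.
    idx = np.random.default_rng(seed).choice(len(X), size=len(X), replace=True)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X[idx], y[idx])
    importances.append(clf.feature_importances_)

pairwise_rho = [spearmanr(a, b)[0] for a, b in combinations(importances, 2)]
print("mean rank correlation across runs:", float(np.mean(pairwise_rho)))
```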
Whereas most research in AI system explanation for healthcare applications looks at developing algorithmic explanations targeted at AI experts or medical professionals, the question we raise is: How do we build meaningful explanations for laypeople? And how does a meaningful explanation affect user’s trust perceptions? Our research investigates how the key factors affecting human-AI trust change in the light of human expertise, and how to design explanations specifically targeted at non-experts. By means of a stage-based design method, we map the ways laypeople understand AI explanations in a User Explanation Model. We also map both medical professionals and AI experts’ practice in an Expert Explanation Model. A Target Explanation Model is then proposed, which represents how experts’ practice and layperson’s understanding can be combined to design meaningful explanations. Design guidelines for meaningful AI explanations are proposed, and a prototype of AI system explanation for non-expert users in a breast cancer scenario is presented and assessed on how it affect users’ trust perceptions.
Abstract excerpt: despite the emphasis on explainability as a necessity, there is little evidence on how recent advances in eXplainable Artificial Intelligence support humans and an AI system working together in the decision-making process.
Abstract excerpt: proposes a model for trust calibration for AI in healthcare in the context of AI-based CDSSs, grounded in specific use cases in medical practice.
Explainable AI (XAI) has emerged as a pivotal tool in healthcare diagnostics, offering much-needed transparency and interpretability in complex AI models. XAI techniques, such as SHAP, Grad-CAM, and LIME, enable clinicians to understand AI-driven decisions, fostering greater trust and collaboration between human and machine in clinical settings. This review explores the key benefits of XAI in enhancing diagnostic accuracy, personalizing patient care, and ensuring compliance with regulatory standards. However, despite its advantages, XAI faces significant challenges, including balancing model accuracy with interpretability, scaling for real-time clinical use, and mitigating biases inherent in medical data. Ethical concerns, particularly surrounding fairness and accountability, are also discussed in relation to AI's growing role in healthcare. The review emphasizes the importance of developing hybrid models that combine high accuracy with improved interpretability and suggests that future research should focus on explainable-by-design systems, reducing computational costs, and addressing ethical issues. As AI continues to integrate into healthcare, XAI will play an essential role in ensuring that AI systems are transparent, accountable, and aligned with the ethical standards required in clinical practice.
Human-AI collaborative decision-making tools are being increasingly applied in critical domains such as healthcare. However, these tools are often seen as closed and intransparent for human decision-makers. An essential requirement for their success is the ability to provide explanations about themselves that are understandable and meaningful to the users. While explanations generally have positive connotations, studies showed that the assumption behind users interacting and engaging with these explanations could introduce trust calibration errors such as facilitating irrational or less thoughtful agreement or disagreement with the AI recommendation. In this paper, we explore how to help trust calibration through explanation interaction design. Our research method included two main phases. We first conducted a think-aloud study with 16 participants aiming to reveal main trust calibration errors concerning explainability in AI-Human collaborative decision-making tools. Then, we conducted two co-design sessions with eight participants to identify design principles and techniques for explanations that help trust calibration. As a conclusion of our research, we provide five design principles: Design for engagement, challenging habitual actions, attention guidance, friction and support training and learning. Our findings are meant to pave the way towards a more integrated framework for designing explanations with trust calibration as a primary goal.
Abstract excerpt: introduces TAXAI, a Trust-Aware Explainable AI framework for clinical AI systems, and motivates future extensions of TAXAI toward knowledge-supported trust calibration.
AI has propelled the potential for moving toward personalized health and early prediction of diseases. Unfortunately, a significant limitation of many of these deep learning models is that they are not interpretable, restricting their clinical utility and undermining trust by clinicians. However, all existing methods are non-informative because they report generic or post-hoc explanations, and few or none support patient-specific, accurate, individualized patient-level explanations. Furthermore, existing approaches are often restricted to static, limited-domain datasets and are not generalizable across various healthcare scenarios. To tackle these problems, we propose a new deep learning approach called PersonalCareNet for personalized health monitoring based on the MIMIC-III clinical dataset. Our system jointly models convolutional neural networks with attention (CHARMS) and employs SHAP (Shapley Additive exPlanations) to obtain global and patient-specific model interpretability. We believe the model, enabled to leverage many clinical features, would offer clinically interpretable insights into the contribution of features while supporting real-time risk prediction, thus increasing transparency and instilling clinically-oriented trust in the model. We provide an extensive evaluation that shows PersonalCareNet achieves 97.86% accuracy, exceeding multiple notable SoTA healthcare risk prediction models. The framework offers explainability at both local and global levels: locally using various diagnostic visualizations such as force plots, SHAP summary visualizations, and confusion matrix-based diagnostics, and globally through feature importance plots and Top-N list visualizations. As a result, we show quantitative results, demonstrating that much of the improvement can be achieved without paying a high price for interpretability. We have proposed a cost-effective and systematic approach as an AI-based platform that is scalable, accurate, transparent, and interpretable for critical care and personalized diagnostics. PersonalCareNet, by filling the void between performance and interpretability, promises a significant advancement in the field of reliable and clinically validated predictive healthcare AI. The design allows for additional extension to multiple data types and real-time deployment at the edge, creating a broader impact and adaptability.
Explainability has become an essential requirement for safe and effective collaborative Human-AI environments, especially when generating recommendations through black-box modality. One goal of eXplainable AI (XAI) is to help humans calibrate their trust while working with intelligent systems, i.e., avoid situations where human decision-makers over-trust the AI when it is incorrect, or under-trust the AI when it is correct. XAI, in this context, aims to help humans understand AI reasoning and decide whether to follow or reject its recommendations. However, recent studies showed that users, on average, continue to overtrust (or under-trust) AI recommendations, which is an indication of XAI's failure to support trust calibration. Such a failure to aid trust calibration was due to the assumption that XAI users would cognitively engage with explanations and interpret them without bias. In this work, we hypothesize that XAI interaction design can play a role in helping users' cognitive engagement with XAI and consequently enhance trust calibration. To this end, we propose friction as a Nudge-based approach to help XAI users to calibrate their trust in AI and present the results of a preliminary study of its potential in fulfilling that role.