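Multi-Source Heterogeneous Information Fusion
Foundations and Surveys of Multi-Source Heterogeneous Data Fusion
Surveys and theoretical studies that systematically summarize the field of multi-source heterogeneous data fusion, define standards, design frameworks, and discuss common challenges.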
- A Survey on Deep Learning for Multimodal Data Fusion(Jing Gao, Peng Li, Zhikui Chen, Jianing Zhang, 2020, Neural Computation)
- Integrating Heterogeneous Data: A Systematic Review of Challenges and Evolution Solution(Meriem Bensaci, Mohammed Charaf Eddine Meftah, Elahe Meftah, A. Laouid, Sajid M. Sheikh, 2025, Proceedings of the 9th International Conference on Future Networks and Distributed Systems)
- Heterogeneous Data Integration: A Literature Scope Review(S. Borowicc, S. N. Alves-Souza, 2024, Proceedings of the 26th International Conference on Enterprise Information Systems)
- Multimodal Representation Learning: Advances, Trends and Challenges(Sufang Zhang, Jun-Hai Zhai, Bo-Jun Xie, Yan Zhan, Xin Wang, 2019, 2019 International Conference on Machine Learning and Cybernetics (ICMLC))
- Multisensor Data Fusion(E Waltz, J Llinas, 2001, Multisensor Data Fusion)
- Information Fusion for Multi-Source Material Data: Progress and Challenges(Jingren Zhou, Xin Hong, Peiquan Jin, 2019, Applied Sciences)
- Deep Multimodal Representation Learning: A Survey(Wenzhong Guo, Jianwen Wang, Shiping Wang, 2019, IEEE Access)
- Multi-source knowledge fusion: a survey(Xiaojuan Zhao, Yan Jia, Aiping Li, Rong Jiang, Yichen Song, 2020, World Wide Web)
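Deep Multimodal Joint Representation and Cross-Modal Learning
Work that uses deep learning architectures (such as attention, GNNs, and contrastive learning) to extract joint features from multimodal data, translate between modalities, and build a unified feature space.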
- Deep Multimodal Data Fusion(Fei Zhao, Chengcui Zhang, Baocheng Geng, 2024, ACM Computing Surveys)
- Representation Learning and Nature Encoded Fusion for Heterogeneous Sensor Networks(Longwei Wang, Q. Liang, 2019, IEEE Access)
- Deep Multimodal Representation Learning from Temporal Data(Xitong Yang, Palghat Ramesh, Radha Chitta, S. Madhvanath, Edgar A. Bernal, Jiebo Luo, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
- Multimodal deep learning for biomedical data fusion: a review(S. Stahlschmidt, B. Ulfenborg, Jane Synnergren, 2022, Briefings in Bioinformatics)
- Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis(Yazhou Zhang, Yang Yu, Mengyao Wang, Min Huang, M. S. Hossain, 2023, ACM Transactions on Multimedia Computing, Communications, and Applications)
- MJPR: Multi-Modal Joint Predictive Representation in Deep Reinforcement Learning(Zehan Wang, Ziming He, Zijia Wang, Hua He, Beiya Yang, Haobin Shi, 2025, 2025 IEEE International Conference on Robotics and Automation (ICRA))
- Molecular representation learning via multimodal fusion and decoupling(Xuan Zang, Junjie Zhang, Buzhou Tang, 2026, Information Fusion)
- Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection(Ayush Ghadiya, P. Kar, Vishal M. Chudasama, P. Wasnik, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives(Shuyu Li, Shulei Ji, Zihao Wang, Songruoyao Wu, Jiaxing Yu, Kejun Zhang, 2025, ACM Computing Surveys)
- Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions(Lei Zhu, Tianshi Wang, Fengling Li, Jingjing Li, Zheng Zhang, H. Shen, 2023, Proceedings of the IEEE)
- Multimodal deep representation learning for video classification(Haiman Tian, Yudong Tao, Samira Pouyanfar, Shu‐Ching Chen, Mei-Ling Shyu, 2018, World Wide Web)
- A cross modal hierarchical fusion multimodal sentiment analysis method based on multi-task learning(Lan Wang, Junjie Peng, Cangzhi Zheng, Tong Zhao, Li’an Zhu, 2024, Information Processing & Management)
- RGBD Salient Object Detection via Disentangled Cross-Modal Fusion(Hao Chen, Yongjian Deng, Youfu Li, Tzu-Yi Hung, Guosheng Lin, 2020, IEEE Transactions on Image Processing)
- Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning(Qian Jiang, Changyou Chen, Han Zhao, Liqun Chen, Q. Ping, S. Tran, Yi Xu, Belinda Zeng, Trishul M. Chilimbi, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Transfer Representation Learning Meets Multimodal Fusion Classification for Remote Sensing Images(Mengru Ma, Wenping Ma, Licheng Jiao, Xu Liu, F. Liu, Lingling Li, Shuyuan Yang, B. Hou, 2022, IEEE Transactions on Geoscience and Remote Sensing)
- A Comprehensive Survey on Multimodal Data Representation and Information Fusion Algorithms(Apeksha Gaonkar, Yogya Chukkapalli, P. J. Raman, Sahana Srikanth, Sanjeev Gurugopinath, 2021, 2021 International Conference on Intelligent Technologies (CONIT))
- XKanFuse: A novel cross-modal fusion method based on Kolmogorov-Arnold Network for multi-modal medical image fusion(Xinjian Wei, Yafei Xiong, Haotian Lu, Xiaoxuan Xu, Jing Xu, 2025, Knowledge-Based Systems)
- Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion(Sijie Mai, Haifeng Hu, Songlong Xing, 2019, Proceedings of the AAAI Conference on Artificial Intelligence)
- Cross-Scale Mixing Attention for Multisource Remote Sensing Data Fusion and Classification(Yunhao Gao, Mengmeng Zhang, Junjie Wang, Wei Li, 2023, IEEE Transactions on Geoscience and Remote Sensing)
- AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-Modal Alignment(Yan Li, Yifei Xing, Xiangyuan Lan, Xin Li, Haifeng Chen, Dongmei Jiang, 2024, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data(Xu Jia, Kai Han, Yukun Zhu, Bradley Green, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))
- The State of the Art for Cross-Modal Retrieval: A Survey(Kun Zhou, F. H. Hassan, Gan Keng Hoon, 2023, IEEE Access)
- Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects(D. Lahat, T. Adalı, C. Jutten, 2015, Proceedings of the IEEE)
- Deep Multiscale Fusion Hashing for Cross-Modal Retrieval(Xiushan Nie, Bowei Wang, Jiajia Li, Fanchang Hao, Muwei Jian, Yilong Yin, 2021, IEEE Transactions on Circuits and Systems for Video Technology)
- Enhancing Classification with Joint Representation Learning on Multimodal Data(Neha Dhirendra Sirur, Padmashree Desai, Sujatha C, Uma Mudengudi, Ramesh Ashok Tabib, 2026, Lecture Notes in Networks and Systems)
- Multimodal Representation Learning by Alternating Unimodal Adaptation(Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao, 2023, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Effective Techniques for Multimodal Data Fusion: A Comparative Analysis(Maciej Pawłowski, Anna Wróblewska, S. Sysko-Romańczuk, 2022, Sensors)
- Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering(Wei Xia, Tianxiu Wang, Quanxue Gao, Ming Yang, Xinbo Gao, 2023, IEEE Transactions on Image Processing)
- Kernel Cross-Modal Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition(Yongjin Wang, L. Guan, A. Venetsanopoulos, 2012, IEEE Transactions on Multimedia)
- Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs(Tianyu Wu, Yang Tang, Qiyu Sun, Luolin Xiong, 2022, IEEE/ACM Transactions on Computational Biology and Bioinformatics)
- Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection(Hao Chen, Youfu Li, Dan Su, 2019, Pattern Recognition)
- Relation-Induced Multi-Modal Shared Representation Learning for Alzheimer’s Disease Diagnosis(Zhenyuan Ning, Qing Xiao, Qianjin Feng, Wufan Chen, Yu Zhang, 2021, IEEE Transactions on Medical Imaging)
- Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona(Kai Cao, Yiguang Hong, Lin Wan, 2021, Bioinformatics)
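Heterogeneous Networks, Knowledge Graphs, and Semantic Integration
Work that uses knowledge graphs, semantic mapping, and entity alignment to resolve structural heterogeneity and semantic inconsistency across multi-source knowledge.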
- Side Information Fusion for Recommender Systems over Heterogeneous Information Network(Huan Zhao, Quanming Yao, Yangqiu Song, J. Kwok, Lee, 2021, ACM Transactions on Knowledge Discovery from Data)
- A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks(Xiao Huang, Jundong Li, Na Zou, Xia Hu, 2018, ACM Transactions on Knowledge Discovery from Data)
- Trustworthy Knowledge Graph Completion Based on Multi-sourced Noisy Data(Jiacheng Huang, Yao Zhao, Wei Hu, Zhen-Hu Ning, Qijin Chen, Xiaoxia Qiu, Chengfu Huo, Weijun Ren, 2022, Proceedings of the ACM Web Conference 2022)
- Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment(Qian Li, Shu Guo, Yang Luo, Cheng Ji, Lihong Wang, Jiawei Sheng, Jianxin Li, 2023, Proceedings of the ACM Web Conference 2023)
- Leveraging Multi-source knowledge for Chinese clinical named entity recognition via relational graph convolutional network(Ying Xiong, Hao Peng, Yang Xiang, Ka-chun Wong, Qingcai Chen, Jun Yan, Buzhou Tang, 2022, Journal of Biomedical Informatics)
- Multi-source information fusion based heterogeneous network embedding(Bentian Li, D. Pi, Yunxia Lin, I. A. Khan, Lin Cui, 2020, Information Sciences)
- Embedding-based entity alignment between multi-source temporal knowledge graphs(Lin Zhu, Nan Li, Luyi Bai, 2024, Engineering Applications of Artificial Intelligence)
- Study on Multi-source Heterogeneous Data Fusion and Knowledge Graph Construction Techniques in Higher Education Institutions(Chengbo Wang, 2025, Proceedings of the 2025 3rd International Conference on Educational Knowledge and Informatization)
- Cross-knowledge-graph entity alignment via relation prediction(Hongren Huang, Chen Li, Xutan Peng, Lifang He, Shu Guo, Hao Peng, Lihong Wang, Jianxin Li, 2021, Knowledge-Based Systems)
- MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning(Xinyu Lu, Lifang Wang, Zejun Jiang, Shichang He, Shizhong Liu, 2021, Applied Intelligence)
- Collective Multi-type Entity Alignment Between Knowledge Graphs(Qi Zhu, Hao Wei, Bunyamin Sisman, Da Zheng, C. Faloutsos, Xin Dong, Jiawei Han, 2020, Proceedings of The Web Conference 2020)
- LMKG: A large-scale and multi-source medical knowledge graph for intelligent medicine applications(Peiru Yang, Hongjun Wang, Yingzhuo Huang, Shuai Yang, Ya Zhang, Liang Huang, Yuesong Zhang, Guoxin Wang, Shizhong Yang, Liang He, Yongfeng Huang, 2023, Knowledge-Based Systems)
- MMIEA: Multi-modal Interaction Entity Alignment model for knowledge graphs(Bin Zhu, Meng-Sheng Wu, Yunpeng Hong, Yi Chen, Bo Xie, Fei-Tsung Liu, Chenyang Bu, Weiping Ding, 2023, Information Fusion)
- Informed Multi-context Entity Alignment(Kexuan Xin, Zequn Sun, Wen Hua, Wei Hu, Xiaofang Zhou, 2022, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining)
- A comprehensive survey of entity alignment for knowledge graphs(Kaisheng Zeng, Chengjiang Li, Lei Hou, Juan-Zi Li, Ling Feng, 2021, AI Open)
- MultiJAF: Multi-modal joint entity alignment framework for multi-modal knowledge graph(Bo Cheng, Jia Zhu, Meimei Guo, 2022, Neurocomputing)
- An approach for semantic integration of heterogeneous data sources(Giuseppe Fusco, L. Aversano, 2020, PeerJ Computer Science)
- Temporal Knowledge Graph Entity Alignment via Representation Learning(Xiuting Song, Luyi Bai, Rongke Liu, Han Zhang, 2022, Lecture Notes in Computer Science)
- Towards Heterogeneous Network Alignment: Design and Implementation of a Large-Scale Data Processing Framework(Marianna Milano, P. Veltri, M. Cannataro, P. Guzzi, 2018, Lecture Notes in Computer Science)
- A semantics-based approach to multi-source heterogeneous information fusion in the internet of things(Feng Wang, Liang Hu, Jin Zhou, Jiejun Hu, Kuo Zhao, 2015, Soft Computing)
- Data Integration for Heterogenous Datasets(J. Hendler, 2014, Big Data)
- Heterogeneous Data Fusion via Space Alignment Using Nonmetric Multidimensional Scaling(J. Choo, S. Bohn, Grant C. Nakamura, Amanda M. White, Haesun Park, 2012, Proceedings of the 2012 SIAM International Conference on Data Mining)
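Privacy Protection and Secure Integration of Distributed Data
Work on secure alignment and fusion computation for multi-source heterogeneous data in distributed settings, supported by techniques such as federated learning and differential privacy.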
- Heterogeneous Data Fusion: A Scalable Approach to Intrusion Detection(Seonghyeon Gong, Jake Cho, K. Choi, 2025, IEEE Access)
- Federated Learning for Heterogeneous Data Integration and Privacy Protection(Chenwei Gong, Xuyang Zhang, Yuzhen Lin, Hang Lü, P. P. Su, Jingwei Zhang, 2025, … Cooperative Work in …)
- A multimodal differential privacy framework based on fusion representation learning(Chaoxin Cai, Yingpeng Sang, Hui Tian, 2022, Connection Science)
- Multisource Geospatial Data Fusion via Local Joint Sparse Representation(Yuhang Zhang, S. Prasad, 2016, IEEE Transactions on Geoscience and Remote Sensing)
- Research on Multi-Source Heterogeneous Big Data Fusion Method Based on Feature Level(Yanyan Chen, Chenxi Wang, Yuchen Zhou, Yuhang Zuo, Zi-shan Yang, Hui Li, Juan Yang, 2024, International Journal of Pattern Recognition and Artificial Intelligence)
- Research on Heterogeneous Network Data Fusion based on Deep Learning(Zengyun Hu, Minghao Liu, Lipeng Liu, Lei Fu, Yu M, Xirui Tang, 2024, 2024 4th International …)
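Multi-Source Sensor Fusion and Diagnosis for Industrial and Physical Systems
Work on heterogeneous signals from physical sensors and monitoring equipment, applied to mechanical fault diagnosis, energy management, mining safety, and real-time monitoring and prediction in complex environments.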
- Pattern fusion analysis by adaptive alignment of multiple heterogeneous omics data(Qianqian Shi, Chuanchao Zhang, Minrui Peng, Xiangtian Yu, Tao Zeng, Juan Liu, Luonan Chen, 2017, Bioinformatics)
- Analysis of Substation Joint Safety Control System and Model Based on Multi-Source Heterogeneous Data Fusion(Bo Wu, Yifan Hu, 2023, IEEE Access)
- Multi-source data fusion method for structural safety assessment of water diversion structures(Sherong Zhang, Liu Ting, Wang Chao, 2021, Journal of Hydroinformatics)
- A novel multi-source sensing data fusion driven method for detecting rolling mill health states under imbalanced and limited datasets(Peiming Shi, Yue Yu, Hao Gao, C. Hua, 2022, Mechanical Systems and Signal Processing)
- Bearing fault diagnosis method based on multi-source heterogeneous information fusion(K Zhang, T Gao, H Shi, 2022, Measurement Science and Technology)
- iFusion: Towards efficient intelligence fusion for deep learning from real-time and heterogeneous data(Kehua Guo, Tao Xu, Xiaoyan Kui, Ruifang Zhang, Tao Chi, 2019, Information Fusion)
- Credibility Assessment Method of Sensor Data Based on Multi-Source Heterogeneous Information Fusion(Yanling Feng, Jixiong Hu, Rui Duan, Zhuming Chen, 2021, Sensors)
- A multi-source heterogeneous data fusion framework for fault diagnosis in industrial processes with missing image data(Liang Ma, Qikai Yang, O. Llanes-Santiago, Kaixiang Peng, 2025, Measurement)
- A Comprehensive Review of Multi-Source Data Fusion Processing Methods(Xiaping Ma, Peimin Zhou, Xiaoxing He, Sheng Zhang, 2025, Preprints.org)
- Multisource Heterogeneous Information Fusion Based on Graph Convolutional Network for Gearbox Fault Diagnosis(Siyuan Gao, Khandaker Noman, Gang Mao, Zichen Deng, Yongbo Li, Wenqing Ge, 2025, IEEE Transactions on Instrumentation and Measurement)
- Deep well construction of big data platform based on multi-source heterogeneous data fusion(Yu Zhang, Yange Wang, Hongwei Ding, Yongzhen Li, Yan-ping Bai, 2019, International Journal of Internet Manufacturing and Services)
- Cross-Modal Fusion Convolutional Neural Networks With Online Soft-Label Training Strategy for Mechanical Fault Diagnosis(Yadong Xu, Ke Feng, Xiaoan Yan, Xin Sheng, Beibei Sun, Zheng Liu, Ruqiang Yan, 2024, IEEE Transactions on Industrial Informatics)
- Multi-source heterogeneous information fusion fault diagnosis method based on deep neural networks under limited datasets(Dongying Han, Yu Zhang, Yue Yu, Jinghui Tian, P. Shi, 2024, Applied Soft Computing)
- Concept-Aware Entity Alignment Network for Industrial Knowledge Graph(Shuai Wu, W. Tong, Yuhong Hou, Ping Li, Weidong Yang, Edmond Q. Wu, 2025, IEEE Transactions on Industrial Informatics)
- Multimodal Industrial Anomaly Detection via Uni-Modal and Cross-Modal Fusion(Hao Cheng, Jiaxiang Luo, Xianyong Zhang, 2025, IEEE Transactions on Industrial Informatics)
- Leakage diagnosis of natural gas pipeline based on multi-source heterogeneous information fusion(X. Miao, Hong Zhao, 2024, International Journal of Pressure Vessels and Piping)
- Rotor unbalance fault diagnosis using DBN based on multi-source heterogeneous information fusion(Jihong Yan, Yuanyuan Hu, Chaozhong Guo, 2019, Procedia Manufacturing)
- Multi-source heterogeneous data fusion technology for electric power based on big data mining(Zhongjian Liu, Ruixin Qian, Xianing Jin, Hanlin Zhao, Hongyi Li, Danlei Hu, Hanghai Hu, 2024, Journal of Computational Methods in Sciences and Engineering)
- Vehicle Heterogeneous Multi-Source Information Fusion Positioning Method(Chengkai Tang, Chen Wang, Lingling Zhang, Yi Zhang, H. Song, 2024, IEEE Transactions on Vehicular Technology)
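Multi-Source Geospatial and Remote Sensing Applications
Work that applies dedicated fusion algorithms to multi-source information such as remote sensing imagery and geospatial data to improve the accuracy of land-cover classification, geological interpretation, and disaster monitoring.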
- Multi-source remotely sensed data fusion for improving land cover classification(Bin Chen, Bo Huang, Bing Xu, 2017, ISPRS Journal of Photogrammetry and Remote Sensing)
- Multi-source remote sensing data fusion: status and trends(Jixian Zhang, 2010, International Journal of Image and Data Fusion)
- Geological Remote Sensing Interpretation Using Deep Learning Feature and an Adaptive Multisource Data Fusion Network(Wei Han, Jun Li, Shengte Wang, Xinyu Zhang, Yusen Dong, R. Fan, Xiaohan Zhang, Lizhe Wang, 2022, IEEE Transactions on Geoscience and Remote Sensing)
- Forest Types Classification Based on Multi-Source Data Fusion(Ming Lu, Bin Chen, X. Liao, T. Yue, Huanyin Yue, S. Ren, Xiaowen Li, Zhen Nie, Bing Xu, 2017, Remote Sensing)
- Integration of heterogeneous geospatial data in a federated database(M. Butenuth, G. V. Goesseln, M. Tiedge, C. Heipke, U. Lipeck, Monika Sester, 2007, ISPRS Journal of Photogrammetry and Remote Sensing)
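Domain-Specific Recommendation and Decision Support
Work that, in vertical scenarios such as transportation, recommendation, and healthcare, designs decision support systems, behavior prediction models, and engineering deployments that fuse multi-source heterogeneous data.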
- A Survey of Methods and Technologies for Congestion Estimation Based on Multisource Data Fusion(Dominik Cvetek, M. Mustra, Niko Jelusic, Leo Tišljarić, 2021, Applied Sciences)
- A clustering and fusion method for large group decision making with double information and heterogeneous experts(Xiang-yu Zhong, Xuan-hua Xu, Xiao-hong Chen, 2021, Soft Computing)
- Heterogeneous data integration methods for patient similarity networks(J. Gliozzo, M. Mesiti, M. Notaro, A. Petrini, A. Patak, Antonio Puertas-Gallardo, Alberto Paccanaro, G. Valentini, E. Casiraghi, 2022, Briefings in Bioinformatics)
- An efficient hierarchical model for multi-source information fusion(Ismaïl Saadi, B. Farooq, Ahmed M. Mustafa, J. Teller, M. Cools, 2018, Expert Systems with Applications)
- Advancing Multi-Modal Beam Prediction With Cross-Modal Feature Enhancement and Dynamic Fusion Mechanism(Qihao Zhu, Yu Wang, Wenmei Li, Hao Huang, Guan Gui, 2025, IEEE Transactions on Communications)
- Joint Representation Learning for Multi-Modal Transportation Recommendation(Hao Liu, Ting Li, Renjun Hu, Yanjie Fu, Jingjing Gu, Hui Xiong, 2019, Proceedings of the AAAI Conference on Artificial Intelligence)
- CMBF: Cross-Modal-Based Fusion Recommendation Algorithm(Xi Chen, Yang Lu, Yuehai Wang, Jianyi Yang, Yinong Chen, S. Guan, 2021, Sensors)
- Data fusion and multisource image classification(D. Amarsaikhan, T. Douglas, 2004, International Journal of Remote Sensing)
- CSF: Crowdsourcing semantic fusion for heterogeneous media big data in the internet of things(Kehua Guo, Yayuan Tang, Peiyun Zhang, 2017, Information Fusion)
- A Large Scale Video Surveillance System with Heterogeneous Information Fusion and Visualization for Wide Area Monitoring(Yuan-Kai Wang, Ching-Tang Fan, Caiyun Huang, 2012, 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing)
- Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks(Riccardo Cappuzzo, Paolo Papotti, Saravanan Thirumuruganathan, 2020, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data)
- A framework for multi-source data fusion(R. Yager, 2004, Information Sciences)
- Large scale heterogeneous monitoring system with decentralized sensor fusion(G. Stamatescu, I. Stamatescu, Cristian Dragana, D. Popescu, 2015, 2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS))
- Federated fault diagnosis using data fusion in large-scale heterogeneous unmanned systems(Runze Li, Bin Jiang, Yan Zong, N. Lu, Li Guo, 2025, Control Engineering Practice)
- A multi-source heterogeneous data fusion method for intelligent systems in the Internet of Things(Rongrong Sun, Yuemei Ren, 2024, Intelligent Systems with Applications)
- Heterogeneous Large-Scale Data Fusion Mechanism of Energy Storage Power Station Based on Neural Network(Yimin Deng, Zhoubo Weng, Tianlong Zhang, 2023, Journal of Multimedia Information System)
- A recommendation model with multi-scale semantic fusion on heterogeneous information network(H Zhang, X Wang, X Li, J Zhang, 2023, … International Conference on …)
- Mechanical fault diagnosis and prediction in IoT based on multi-source sensing data fusion(Min Huang, Zhen Liu, Yang Tao, 2020, Simulation Modelling Practice and Theory)
- Multi-source heterogeneous data fusion(Lili Zhang, Yuxiang Xie, Luan Xidao, Xin Zhang, 2018, 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD))
- FEV-Swin: Multi-source heterogeneous information fusion under a variant swin transformer framework for intelligent cross-domain fault diagnosis(Keyi Zhou, N. Lu, Bin Jiang, Zhisheng Ye, 2025, Knowledge-Based Systems)
- Fusing Heterogeneous Data: A Case for Remote Sensing and Social Media(Han Wang, E. Skau, H. Krim, G. Cervone, 2018, IEEE Transactions on Geoscience and Remote Sensing)
- Data Fusion for Multi-Source Sensors Using GA-PSO-BP Neural Network(Jiguo Liu, Jian Huang, Rui Sun, Haitao Yu, Randong Xiao, 2021, IEEE Transactions on Intelligent Transportation Systems)
- Deep learning based multi-source heterogeneous information fusion framework for online monitoring of surface quality in milling process(Xiaofeng Wang, Jihong Yan, 2024, Engineering Applications of Artificial Intelligence)
- Heterogeneous Information Fusion and Visualization for a Large-Scale Intelligent Video Surveillance System(Ching-Tang Fan, Yuan-Kai Wang, Caiyun Huang, 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems)
- Multi-source heterogeneous data fusion prediction technique for the utility tunnel fire detection(Bin Sun, Yan Li, Yangyang Zhang, Tong Guo, 2024, Reliability Engineering & System Safety)
- A MAS approach to fusion of heterogeneous information(G. Pavlin, P. D. Oude, J. Nunnink, 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05))
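This report organizes the literature on multi-source heterogeneous information fusion into seven dimensions: foundational surveys, deep multimodal joint representation, knowledge graphs and semantic integration, privacy-preserving integration, industrial and physical sensor diagnosis, geospatial remote sensing applications, and domain-specific recommendation and decision systems. Together these studies trace the full chain from low-level data processing through algorithmic modeling to vertical applications, and highlight resolving heterogeneity, improving robustness, and semantic interoperability as the core research themes.
120 related publications in total.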
… data fusion. This paper introduces big data fusion and methods for heterogeneous data fusion, … learning methods in multisource heterogeneous data fusion. Challenges of dealing with …
… Simulation tests confirm the superiority of our method, demonstrating a remarkable improvement in performance in the fusion of dynamic, multi-source heterogeneous data compared to …
… However, most existing multi-source fusion methods are in … data redundancy, fuzzy multi-source signal fusion strategy, and insufficient accuracy. As a result, a new multi-source fusion …
… of IoT information fusion is required. We compare features of IoT data and information with an … Then, we design a framework for multi-source heterogeneous information fusion in the IoT …
The credibility of sensor data is essential for security monitoring. High-credibility data are the precondition for utilizing data and data analysis, but the existing data credibility evaluation methods rarely consider the spatio-temporal relationship between data sources, which usually leads to low accuracy and low flexibility. In order to solve this problem, a new credibility evaluation method is proposed in this article, which includes two factors: the spatio-temporal relationship between data sources and the temporal correlation between time series data. First, the spatio-temporal relationship was used to obtain the credibility of data sources. Then, the combined credibility of data was calculated based on the autoregressive integrated moving average (ARIMA) model and back propagation (BP) neural network. Finally, the comprehensive data reliability for evaluating data quality can be acquired based on the credibility of data sources and combined data credibility. The experimental results show the effectiveness of the proposed method.
With the development of vehicle applications such as intelligent transportation and autonomous driving, the application fields based on location services have increasingly higher requirements for vehicle positioning reliability and real-time accuracy. However, the existing single navigation source of vehicles makes it difficult to realize real-time and high-precision positioning in different scenarios. The current multi-source information fusion methods have the problems of low generalization ability, poor expansibility, and high computational complexity, so it is challenging to apply in the field of vehicle positioning. To solve the above problems, this paper proposes a vehicle heterogeneous multi-source information fusion positioning method (MIFP) based on information probability, which converts the multiple heterogeneous navigation sources into information probability models to realize the unification of the time-frequency parameter format and designs an information fusion algorithm to realize the rapid fusion based on the theory of relative entropy. Through simulation tests and experimental verification by comparing with mainstream information fusion methods, such as the UKF method, the FGA method, and the NNA method, the MIFP method has high positioning accuracy and strong real-time performance. It can effectively solve the problems of weak expansion ability and large calculation amounts of current vehicle fusion positioning models. In the case of interference or mutation, the MIFP method can also suppress the influence of sudden errors on vehicle positioning.
… the structural heterogeneity of various sensor data imposes barriers to information fusion as … This study developed a novel multi-source heterogeneous information fusion framework …
In urban and transportation research, important information is often scattered over a wide variety of independent datasets which vary in terms of described variables and sampling rates. As activity-travel behavior of people depends particularly on socio-demographics and transport/urban-related variables, there is an increasing need for advanced methods to merge information provided by multiple urban/transport household surveys. In this paper, we propose a hierarchical algorithm based on a Hidden Markov Model (HMM) and an Iterative Proportional Fitting (IPF) procedure to obtain quasi-perfect marginal distributions and accurate multi-variate joint distributions. The model allows for the combination of an unlimited number of datasets. The model is validated on the basis of a synthetic dataset with 1,000,000 observations and 8 categorical variables. The results reveal that the hierarchical model is particularly robust as the deviation between the simulated and observed multivariate joint distributions is extremely small and constant, regardless of the sampling rates and the composition of the datasets in terms of variables included in those datasets. Besides, the presented methodological framework allows for an intelligent merging of multiple data sources. Furthermore, heterogeneity is smoothly incorporated into micro-samples with small sampling rates subjected to potential sampling bias. These aspects are handled simultaneously to build a generalized probabilistic structure from which new observations can be inferred. A major impact in terms of expert systems is that the outputs of the hierarchical model (HM) serve as a basis for qualitative and quantitative analyses of integrated datasets.
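The Iterative Proportional Fitting step named in this abstract can be illustrated with a minimal sketch. It covers only the generic two-dimensional IPF procedure, not the paper's hierarchical HMM-plus-IPF model. The function name `ipf`, the seed table, and the target marginals are all hypothetical values chosen for illustration.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Generic 2-D Iterative Proportional Fitting (illustrative sketch).

    Rescales `seed` until its row and column sums match the target
    marginals. Each step multiplies a whole row or column by a scalar,
    so the odds ratios of the seed table are preserved.
    """
    table = [[float(v) for v in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Row step: scale every row to its target total.
        for i in range(n_rows):
            total = sum(table[i])
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [v * factor for v in table[i]]
        # Column step: scale every column to its target total.
        for j in range(n_cols):
            total = sum(table[i][j] for i in range(n_rows))
            if total > 0:
                factor = col_targets[j] / total
                for i in range(n_rows):
                    table[i][j] *= factor
        # The column step disturbs the row sums; stop once they are back in tolerance.
        err = max(abs(sum(table[i]) - row_targets[i]) for i in range(n_rows))
        if err < tol:
            break
    return table


# Hypothetical seed (e.g. a small survey cross-tabulation) and target marginals.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fitted table reproduces both sets of marginals while keeping the association pattern of the seed, which is the property that makes IPF useful for merging a detailed small sample with aggregate totals from other sources.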
… Then, the multi-source heterogeneous data fusion fire detection is implemented for fire source localization and ceiling temperature distribution prediction based on Gauss model and the …
Heterogeneous network embedding aims to learn a mapping between network data in original topological space and vectored data in low dimensional latent space, while encoding valuable information, such as structural and semantic information. The resulting vector representation has shown promising performance for extensive real-world applications, such as node classification and node clustering. However, most existing methods merely focus on modeling network structural information, ignoring the rich multi-source information of different types of nodes. In this paper, we propose a novel Multi-source Information Fusion based Heterogeneous Network Embedding (MIFHNE) approach. We first capture the semantic information using the strategy of meta-graph based random walk. Subsequently, we jointly model the structural proximity, attribute information and label information in the framework of Nonnegative Matrix Factorization (NMF). Theoretical proofs and comprehensive experiments on two real-world heterogeneous network datasets demonstrate the feasibility and effectiveness of our approach.
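To make the idea of jointly modeling several information sources with NMF concrete, here is a minimal sketch that is not the MIFHNE objective. It uses plain Frobenius-norm NMF with the standard multiplicative updates, applied to a structure matrix and an attribute matrix concatenated column-wise. Factorizing the concatenation is equivalent to factorizing both matrices with one shared node factor `w`. The toy matrices, the helper names, and the rank are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(x, w, h):
    """Squared Frobenius reconstruction error ||X - WH||^2."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """NMF via multiplicative updates; returns nonnegative W (n x rank), H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 4 nodes, adjacency (structure) and 3 binary attributes.
structure = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
attributes = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
# Column-wise concatenation gives both sources one shared node factor W.
joint = [s + a for s, a in zip(structure, attributes)]
w, h = nmf(joint, rank=2)
```

Each row of `w` can be read as a two-dimensional node embedding informed by both structure and attributes. A real joint model would typically weight the sources differently and add label information, which this sketch leaves out.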
… use of multi-source heterogeneous data to monitor the … information from multi-source heterogeneous data, this paper proposes a novel multi-source heterogeneous information fusion …
In the age of the Internet of Things and Industry 4.0, new advanced methods are needed to analyse massive multi-source heterogeneous data from rotating machinery, since traditional data analysis methods struggle to mine features effectively and provide accurate fault results automatically. This paper proposes a rotor unbalance fault diagnosis method using a deep belief network (DBN) to learn representative features automatically and accurately identify fault states. Multi-source heterogeneous information composed of vibration signals and shaft orbit plots generated from raw displacement signals can fully exploit multi-sensor information in fault diagnosis. A multi-DBN model is introduced to handle the multi-source heterogeneous information fusion problem; it adaptively learns useful features through multiple nonlinear transformations, in contrast to traditional approaches that depend on time-consuming and labour-intensive manual feature extraction. The results indicate that the accuracy of classifying rotor unbalance fault states reaches up to 100% under proper DBN parameters, which significantly improves fault recognition and validates the effectiveness of the proposed method.
… In this paper, we propose a multi-source heterogeneous information fusion method for the complementary fusion of laser optical sensing and weak magnetic technologies. Firstly, the …
… single monitoring data hinder the engineering application and generalization of diagnostic models to some extent. To this end, a novel multi-source heterogeneous information fusion (…
As the number of substations continues to increase globally and market demand continues to rise, manual work alone can no longer meet the workload of substation maintenance and daily operation in power grids, so the design and implementation of intelligent safety control solutions for substations is imperative. Therefore, this paper proposes a joint safety control system and model analysis for substations based on multi-source heterogeneous data fusion. First, an efficient interactive operation platform with three-dimensional substation visualization is built, providing substation scene roaming, system login, information management, equipment parameters, status viewing, and operation ticket pushing. Next, a variety of intelligent hardware devices for data collection are designed, such as multi-dimensional terminal sensors, intelligent wearable devices, an intelligent pre-built positioning installation measure rod, and substation intelligent inspection robots, to greatly improve substation inspection efficiency and realize real-time monitoring and data interaction during inspection. Finally, we propose an Attention-LSTM-based prediction model for substation multidimensional data, which can predict power equipment spatio-temporal data in the short term; the prediction results can be combined with intelligent devices for joint diagnosis. The Attention-LSTM prediction model performs well in transformer oil temperature experiments, and the experimental results show that it can provide early warning of abnormal states of substation power equipment. In summary, this paper describes a complete and practically feasible set of intelligent safety control methods for substations. The joint safety control system and model analysis designed in this paper is mainly oriented to the substation as an electric power workplace and has broad application prospects for energy equipment.
Achieving information fusion of multisensor data plays an important role in improving the performance of gearbox fault diagnosis. However, this fusion process is hindered by the heterogeneity problem caused by the different data dimensions of various sensors. To solve this problem, it is necessary to exploit the complementary nature of multisource heterogeneous data to provide more accurate fault information. Thus, a multisource heterogeneous information fusion method based on a graph convolutional network (MHIF-GCN) is proposed in this article. First, a convolutional autoencoder (CAE) is used to extract deep features corresponding to different types of sensors as graph node features, addressing the data heterogeneity problem. Second, a graph convolutional network (GCN) model based on a K-nearest neighbor graph (KNNGraph) is introduced to establish the connection between different sensor data in the graph structure, realizing feature-level fusion of sensor data and mining deeper fault data features. The results of two gearbox experiments validate the excellent fault diagnosis performance of the proposed MHIF-GCN. In Experiment I, the MHIF-GCN can accurately recognize six structural and nonstructural fault types. With the support of the complementary fusion mechanism, the proposed MHIF-GCN has the highest average diagnostic accuracy of 99.00% when compared with the other six methods. Even with a small number of training samples, the MHIF-GCN still performs very favorably compared to other methods with an accuracy of 88.87%. In Experiment II, the MHIF-GCN has the highest diagnostic accuracy of 94.00%; the recall, precision, and F-score for each fault state remain above 85%, and the proposed MHIF-GCN maintains a stable diagnostic performance.
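The graph-based fusion idea can be sketched as follows. This is an illustrative simplification, not the MHIF-GCN implementation: the CAE feature extractor and all learned GCN weights are omitted, the node features are made-up numbers, and the function names are hypothetical. It builds a symmetric K-nearest-neighbor graph and applies one parameter-free propagation step using the standard normalization D^{-1/2}(A + I)D^{-1/2}X.

```python
import math

def knn_adjacency(features, k):
    """Build a symmetric k-nearest-neighbour adjacency matrix (Euclidean distance)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = 1.0
            adj[j][i] = 1.0  # symmetrise the graph
    return adj

def gcn_propagate(adj, features):
    """One propagation step without learned weights: D^{-1/2} (A + I) D^{-1/2} X."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for d in range(dim):
                    out[i][d] += w * features[j][d]
    return out

# Hypothetical node features (stand-ins for per-sensor CAE embeddings).
node_features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
adjacency = knn_adjacency(node_features, k=1)
smoothed = gcn_propagate(adjacency, node_features)
```

Each propagation step averages a node with its graph neighbors, which is how information from different sensors gets mixed at the feature level.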
With the rapid development of smart grid technology, a large amount of multi-source heterogeneous data has been generated in the power system, and its effective utilization is crucial for the optimization operation, demand prediction, and anomaly detection of the power system. However, the fusion processing of multi-source heterogeneous data faces many challenges, such as inconsistent data format, granularity, and quality, and direct fusion can easily lead to information redundancy and contradictions. A multi-source heterogeneous data fusion technology based on big data mining has been proposed to address the above issues. This method combines the advantages of convolutional neural networks and gated recurrent units to automatically extract features from image and sequence data and handle long-term dependency issues in time series data. Meanwhile, the K-means clustering algorithm is used to preprocess the data and train a specialized ConvGRU model. The results showed that in short-term load forecasting and abnormal electricity consumption behaviour detection tasks, the accuracy of this method reached 96.3% and 98.7%, respectively, with AUC values of 0.994 and 0.996. Compared to models that use only CNN or GRU, the performance is significantly improved. This method effectively solves the problem of integrating and processing multi-source heterogeneous power data, improves the accuracy and efficiency of power system data analysis, and provides strong support for the optimized operation of smart grids.
The development of material science in the manufacturing industry has resulted in a huge amount of material data, which are often from different sources and vary in data format and semantics. The integration and fusion of material data can offer a unified framework for material data representation, processing, storage and mining, which can further help to accomplish many tasks, including material data disambiguation, material feature extraction, material-manufacturing parameters setting, and material knowledge extraction. On the other hand, the rapid advance of information technologies like artificial intelligence and big data brings new opportunities for material data fusion. To the best of our knowledge, the community is currently lacking a comprehensive review of the state-of-the-art techniques on material data fusion. This review first analyzes the special properties of material data and discusses the motivations of multi-source material data fusion. Then, we particularly focus on the recent achievements of multi-source material data fusion. This review has a few unique features compared to previous studies. First, we present a systematic categorization and comparison framework for material data fusion according to the processing flow of material data. Second, we discuss the applications and impact of recent hot technologies in material data fusion, including artificial intelligence algorithms and big data technologies. Finally, we present some open problems and future research directions for multi-source material data fusion.
… the comprehensive analysis of multi-source heterogeneous data and fault … multi-source heterogeneous data fusion framework is designed for fault diagnosis with missing image data…
Representation learning is a crucial foundation for subsequent tasks such as classification, regression, and recognition. The goal of representation learning is to automatically learn good features with deep models. Multimodal representation learning is a special case of representation learning that automatically learns good features from multiple modalities; these modalities are not independent, and there are correlations and associations among them. Furthermore, multimodal data are usually heterogeneous. Because of these characteristics, multimodal representation learning poses many difficulties: how to combine multimodal data from heterogeneous sources; how to jointly learn features from multimodal data; how to effectively describe the correlations and associations, etc. These difficulties have attracted great interest from researchers, and along with the upsurge of deep learning, many deep multimodal learning methods have been proposed. In this paper, we present an overview of deep multimodal learning, especially the approaches proposed within the last decades. We provide potential readers with advances, trends and challenges, which can be very helpful to researchers in the field of machine learning, especially those engaged in the study of multimodal deep machine learning.
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion model for fusing multiple input modalities that are inherently temporal in nature. Key features of our proposed model include: (i) simultaneous learning of the joint representation and temporal dependencies between modalities, (ii) use of multiple loss terms in the objective function, including a maximum correlation loss term to enhance learning of cross-modal information, and (iii) the use of an attention model to dynamically adjust the contribution of different input modalities to the joint representation. We validate our model via experimentation on two different tasks: video- and sensor-based activity classification, and audio-visual speech recognition. We empirically analyze the contributions of different components of the proposed CorrRNN model, and demonstrate its robustness, effectiveness and state-of-the-art performance on multiple datasets.
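A correlation-maximizing loss term of the kind mentioned in (ii) can be sketched as the negative mean Pearson correlation between two modality representations, computed per dimension across a batch. This is one plausible formulation chosen for illustration; the exact loss used in CorrRNN is not given in the abstract, and the function names and sample data are hypothetical.

```python
import math

def pearson_correlation(x, y, eps=1e-12):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y + eps)

def correlation_loss(rep_a, rep_b):
    """Negative mean per-dimension correlation; minimising it maximises correlation."""
    dim = len(rep_a[0])
    corrs = []
    for d in range(dim):
        column_a = [sample[d] for sample in rep_a]
        column_b = [sample[d] for sample in rep_b]
        corrs.append(pearson_correlation(column_a, column_b))
    return -sum(corrs) / dim

# Hypothetical batch of 3 samples with 2-dimensional audio and video representations.
audio = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
video = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
loss = correlation_loss(audio, video)
```

In training, a term like this would be added to the task loss so that the two encoders are pushed toward representations that vary together.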
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data (e.g., images, texts, or data collected from different sensors), feature engineering (e.g., extraction, combination/fusion), and decision-making (e.g., majority vote). As architectures become more and more sophisticated, multimodal neural networks can integrate feature extraction, feature fusion, and decision-making processes into one single model. The boundaries between those processes are increasingly blurred. The conventional multimodal data fusion taxonomy (e.g., early/late fusion), based on the stage at which fusion occurs, is no longer suitable for the modern deep learning era. Therefore, based on the mainstream techniques used, we propose a new fine-grained taxonomy grouping the state-of-the-art (SOTA) models into five classes: Encoder-Decoder methods, Attention Mechanism methods, Graph Neural Network methods, Generative Neural Network methods, and other Constraint-based methods. Most existing surveys on multimodal data fusion focus only on one specific task with a combination of two specific modalities. Unlike those, this survey covers a broader combination of modalities, including Vision + Language (e.g., videos, texts), Vision + Sensors (e.g., images, LiDAR), and so on, and their corresponding tasks (e.g., video captioning, object detection). Moreover, a comparison among these methods is provided, as well as challenges and future directions in this area.
Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Due to its powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years. In this paper, we provide a comprehensive survey on deep multimodal representation learning, which has not previously been the exclusive focus of a survey. To facilitate the discussion on how the heterogeneity gap is narrowed, according to the underlying structures in which different modalities are integrated, we categorize deep multimodal representation learning methods into three frameworks: joint representation, coordinated representation, and encoder-decoder. Additionally, we review some typical models in this area ranging from conventional models to newly developed technologies. This paper highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective, which, to the best of our knowledge, have never been reviewed previously, even though they have become the major focuses of much contemporary research. For each framework or model, we discuss its basic structure, learning objective, application scenarios, key issues, advantages, and disadvantages, such that both new and experienced researchers can benefit from this survey. Finally, we suggest some important directions for future work.
With the wide deployment of heterogeneous networks, huge amounts of data with characteristics of high volume, high variety, high velocity, and high veracity are generated. These data, referred to as multimodal big data, contain abundant intermodality and cross-modality information and pose vast challenges to traditional data fusion methods. In this review, we present some pioneering deep learning models to fuse these multimodal big data. With the increasing exploration of multimodal big data, there are still some challenges to be addressed. Thus, this review presents a survey on deep learning for multimodal data fusion to provide readers, regardless of their original community, with the fundamentals of multimodal deep learning fusion methods and to motivate new multimodal data fusion techniques of deep learning. Specifically, representative architectures that are widely used are summarized as fundamental to the understanding of multimodal deep learning. Then the current pioneering multimodal data fusion deep learning models are summarized. Finally, some challenges and future topics of multimodal data fusion deep learning models are described.
To maximize the complementary advantages of synergistic modalities, a transfer representation learning fusion network (TRLF-Net) is proposed for multisource remote sensing image collaborative classification in this article. First, with respect to feature encoding, we design a dual-branch attention sparse transfer module (DAST-Module), which combines the spatial and channel attention (CA) masks to mutually migrate the advantageous attributes of the panchromatic (PAN) and the multispectral (MS) images. This not only enhances their respective image advantages but also facilitates the sparse fusion of low-level features. Second, for the separation of multiscale information, a deep dual-scale decomposition module (DDSD-Module) is designed, which allows the decomposition of high-frequency and low-frequency components. Then, through the design of the loss function, it uses the decomposed information to make the essential difference between the complementary multimodal images as small as possible and the surrounding contour difference as large as possible. Finally, to address the problem of large intraclass and small interclass differences, we develop a representation fusion module for global and local features (RFGAL-Module). It mainly adopts global features to sort local features within classes, and then outputs them in a cascade. Thus, the characterization ability of features is improved, and the global and local features are used in a coordinated manner to accomplish the sample classification tasks. In particular, the experimental results demonstrate that TRLF-Net can obtain much improved accuracy and efficiency. The code is available at: https://github.com/ru-willow/SRLF-Net.
… a multimodal fusion-then-decoupling self-supervised molecular representation learning … First, we use a unified encoder to fuse 2D and 3D molecular structural information by …
In various disciplines, information about the same phenomenon can be acquired from different types of detectors, at different conditions, in multiple experiments or subjects, among others. We use the term “modality” for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or “challenges,” are common to multiple domains. This paper deals with two key issues: “why we need data fusion” and “how we perform it.” The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, “diversity” is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
… learning models ignore some data types and only focus on a single modality. This paper presents a new multimodal deep learning … Multimodal data fusion is critical yet challenging for a …
Target detection based on heterogeneous sensor networks is considered in this paper. The fusion problem is investigated to take full advantage of the information in multi-modal data. The sensing data may not be compatible with each other due to heterogeneous sensing modalities, and the joint PDF of the sensors is not easily available. A two-stage fusion method is proposed to solve the heterogeneous data fusion problem. First, the multi-modality data is transformed into the same representation form by a certain linear or nonlinear transformation. Since there is a model mismatch among the different modalities, each modality is trained by an individual statistical model. In this way, the information of different modalities is preserved. Then, the representation is used as the input of the probabilistic fusion. The probabilistic framework allows data from different modalities to be processed in a unified information fusion space. The inherent inter-sensor relationship is exploited to encode the original sensor data on a graph. Iterative belief propagation is used to fuse the local sensing beliefs. The more general correlation case is also considered, in which the relation between two sensors is characterized by the correlation factor. Numerical results are provided to validate the effectiveness of the proposed method in heterogeneous sensor network fusion.
Data processing in robotics is currently challenged by the effective building of multimodal and common representations. Tremendous volumes of raw data are available and their smart management is the core concept of multimodal learning in a new paradigm for data fusion. Although several techniques for building multimodal representations have been proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) the late fusion, (2) the early fusion, and (3) the sketch, and compared them in classification tasks. Our paper explored different types of data (modalities) that could be gathered by sensors serving a wide range of sensor applications. Our experiments were conducted on Amazon Reviews, MovieLens25M, and MovieLens1M datasets. Their outcomes allowed us to confirm that the choice of fusion technique for building multimodal representation is crucial to obtain the highest possible model performance resulting from the proper modality combination. Consequently, we designed criteria for choosing this optimal data fusion technique.
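The difference between early and late fusion can be shown with a toy sketch. It assumes two modalities with hand-picked logistic scorers; all weights, feature values, and function names are hypothetical, and the third technique (the sketch) is not covered because the abstract does not describe it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, features):
    """Logistic score of a feature vector under fixed (hypothetical) weights."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

def early_fusion(text_feat, image_feat, joint_weights):
    """Concatenate modality features first, then score them with one joint model."""
    return linear_score(joint_weights, text_feat + image_feat)

def late_fusion(text_feat, image_feat, text_weights, image_weights):
    """Score each modality with its own model, then average the predictions."""
    p_text = linear_score(text_weights, text_feat)
    p_image = linear_score(image_weights, image_feat)
    return (p_text + p_image) / 2.0

# Hypothetical features and weights for a single example.
text_features = [1.0, 0.0]
image_features = [0.0, 2.0]
p_early = early_fusion(text_features, image_features, [1.0, 0.0, 0.0, 0.5])
p_late = late_fusion(text_features, image_features, [1.0, 0.0], [0.0, 0.5])
```

Early fusion lets one model see cross-modal feature combinations, while late fusion keeps the per-modality models independent and combines only their outputs.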
Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of the target modality via their respective encoders using adversarial training. Furthermore, we exert additional constraints on the embedding space by introducing reconstruction loss and classification loss. Then we fuse the encoded representations using a hierarchical graph neural network which explicitly explores unimodal, bimodal and trimodal interactions in multiple stages. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.
A contemporary survey on recent advancements in the field of multimodal signal processing, with a focus on multimodal data representation and information fusion, is presented in this paper. Multimodal data representation is of critical importance in many signal processing applications, and information fusion algorithms aim at narrowing the heterogeneity gap among the different modalities. First, we start with a brief overview of techniques for some of the commonly used unimodal signals such as text, speech and image, which serves as a fundamental requirement in multimodal representation. Next, we discuss multimodal data representation with audio-video, iris, fingerprint, face, LiDAR scanning and images. Later, we provide details on information fusion, broadly classified into model-agnostic and model-based approaches, and mention some applications. Further, we discuss some of the challenges associated with multimodal signal processing, in terms of uncertainties, mismatches and inaccuracies in data representation and fusion.
Differential privacy mechanisms vary in modalities, and there have been many methods implementing differential privacy on unimodal data. Few studies focus on unifying them to protect multimodal data, though privacy protection of multimodal data is of great significance. In our work, we propose a multimodal differential privacy protection framework. Firstly, we use multimodal representation learning to fuse different modalities and map them to the same subspace. Then based on this representation, we use the Local Differential Privacy (LDP) mechanism to protect data. We propose two protection methods for low-dimensional and high-dimensional fusion tensors respectively. The former is based on Binary Encoding, and the latter is based on multi-dimensional Fourier Transform. To the best of our knowledge, we are the first to propose LDP-based methods for the representation learning of multimodal fusion. Experimental results demonstrate the flexibility of our framework where both approaches show efficient performance as well as high data utility.
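The general idea of applying local differential privacy to a binary-encoded representation can be illustrated with classic randomized response. This is a generic textbook mechanism used here as a stand-in; it is not the authors' Binary Encoding method, whose details the abstract does not give. The sign-based `binarize` step, the function names, and the example vector are assumptions, and each bit is perturbed with its own budget `epsilon` (how a total budget is split across coordinates is left out).

```python
import math
import random

def binarize(vector):
    """Encode each coordinate as one bit by its sign (an illustrative choice)."""
    return [1 if v >= 0 else 0 for v in vector]

def keep_probability(epsilon):
    """Probability of reporting the true bit: e^eps / (1 + e^eps)."""
    return 1.0 / (1.0 + math.exp(-epsilon))

def randomized_response(bits, epsilon, rng):
    """Report each bit truthfully with probability p, otherwise flip it."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the true fraction of 1-bits from noisy reports."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Hypothetical fused representation of one user's multimodal record.
rng = random.Random(0)
fused_vector = [0.7, -1.2, 0.3, -0.1]
noisy_bits = randomized_response(binarize(fused_vector), epsilon=1.0, rng=rng)
```

Smaller `epsilon` values flip bits more often, which strengthens privacy at the cost of utility; the estimator corrects for the known flip rate when many reports are aggregated.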
Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle when some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Uni-modal Adaptation). MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process, thereby minimizing interference between modalities. Simultaneously, it captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information. During the inference phase, MLA utilizes a test-time uncertainty-based model fusion mechanism to integrate multimodal information. Extensive experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches. Our code is available at https://github.com/Cecile-hi/MLA.
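Uncertainty-based fusion at test time can be sketched by weighting each modality's prediction by its confidence. The version below assumes prediction entropy as the uncertainty measure and `exp(-entropy)` as the weighting; both are illustrative choices, and the actual MLA mechanism may differ (see the authors' repository). Function names and logits are hypothetical.

```python
import math

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fuse_by_uncertainty(logits_per_modality):
    """Weight each modality's class probabilities by exp(-entropy), then combine."""
    probs = [softmax(logits) for logits in logits_per_modality]
    raw = [math.exp(-entropy(p)) for p in probs]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)
    ]
    return fused, weights

# Hypothetical logits: a confident audio model and an uncertain video model.
audio_logits = [4.0, 0.0, 0.0]
video_logits = [0.1, 0.0, 0.0]
fused_probs, modality_weights = fuse_by_uncertainty([audio_logits, video_logits])
```

A missing modality can be handled in this scheme by simply leaving its logits out of the list, since the weights are renormalized over whatever is present.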
MOTIVATION: Single-cell multi-omics sequencing data can provide a comprehensive molecular view of cells. However, effective approaches for the integrative analysis of such data are challenging. Existing manifold alignment methods demonstrated the state-of-the-art performance on single-cell multi-omics data integration, but they are often limited by requiring that single-cell datasets be derived from the same underlying cellular structure. RESULTS: In this study, we present Pamona, a partial Gromov-Wasserstein distance-based manifold alignment framework that integrates heterogeneous single-cell multi-omics datasets with the aim of delineating and representing the shared and dataset-specific cellular structures across modalities. We formulate this task as a partial manifold alignment problem and develop a partial Gromov-Wasserstein optimal transport framework to solve it. Pamona identifies both shared and dataset-specific cells based on the computed probabilistic couplings of cells across datasets, and it aligns cellular modalities in a common low-dimensional space, while simultaneously preserving both shared and dataset-specific structures. Our framework can easily incorporate prior information, such as cell type annotations or cell-cell correspondence, to further improve alignment quality. We evaluated Pamona on a comprehensive set of publicly available benchmark datasets. We demonstrated that Pamona can accurately identify shared and dataset-specific cells, as well as faithfully recover and align cellular structures of heterogeneous single-cell modalities in a common space, outperforming the comparable existing methods. AVAILABILITY AND IMPLEMENTATION: Pamona software is available at https://github.com/caokai1073/Pamona. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Data have been collected by communities for analysis, visualization, predictions and other activities to support data-driven decisions. Obtaining value from data assets directly depends on the data integration task. However, Big Data poses new challenges to integration due to data heterogeneity. It is essential to understand the main problems and to know the technologies and techniques that have been employed to improve the ability to obtain value through heterogeneous data integration. This paper presents a literature scope review that highlights the main techniques applied to heterogeneous data integration. The literature reviewed presents solutions mostly focusing on a specific purpose or part of the integration process instead of a clear understanding of how the techniques can be used in a complete integration process. Therefore, this work shows a whole picture of a data integration process, organizing the techniques according to their functionalities, and presents a workflow with tasks associated with techniques and resources, focusing on semantic mediation, such as mapping and matching tasks. Ontologies and semantic web technologies are promising to address data heterogeneity and have been used in the semantic enrichment of data and semantic mediation between data sources and the global model. However, some aspects remain to be further investigated, such as ontology and terminology construction, data processing scalability and semantic mediation, especially for mapping definition.
… data” area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data … guarantee that terms align or even that …
… For instance, when integrating multi-lingual data, we can match them in a feature level by comparing the terms between different languages [2] or even use off-the-shelf translation …
… mine heterogeneous networks. We propose a two-step alignment strategy that receives as input two heterogeneous … For the sake of the simplicity we consider only the integration of two …
Motivation: Integrating different omics profiles is a challenging task, which provides a comprehensive way to understand complex diseases in a multi-view manner. One key for such an integration is to extract intrinsic patterns in concordance with data structures, so as to discover consistent information across various data types even with noise pollution. Thus, we proposed a novel framework called ‘pattern fusion analysis’ (PFA), which performs automated information alignment and bias correction, to fuse local sample-patterns (e.g. from each data type) into a global sample-pattern corresponding to phenotypes (e.g. across most data types). In particular, PFA can identify significant sample-patterns from different omics profiles by optimally adjusting the effects of each data type to the patterns, thereby alleviating the problems of processing different platforms and different reliability levels of heterogeneous data. Results: To validate the effectiveness of our method, we first tested PFA on various synthetic datasets, and found that PFA can not only capture the intrinsic sample clustering structures from the multi-omics data in contrast to the state-of-the-art methods, such as iClusterPlus, SNF and moCluster, but also provide an automatic weight-scheme to measure the corresponding contributions by data types or even samples. In addition, the computational results show that PFA can reveal shared and complementary sample-patterns across data types with distinct signal-to-noise ratios in Cancer Cell Line Encyclopedia (CCLE) datasets, and outperforms other works at identifying clinically distinct cancer subtypes in The Cancer Genome Atlas (TCGA) datasets. Availability and implementation: PFA has been implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/PFApackage_0.1.rar. Supplementary information: Supplementary data are available at Bioinformatics online.
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
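A minimal sketch of building one PSN per data view and fusing them is given below. It assumes an RBF kernel on Euclidean distance as the similarity measure and plain element-wise averaging as the fusion rule; both are simple illustrative choices rather than any specific method from the review (more elaborate schemes such as similarity network fusion exist). The patient data and function names are hypothetical.

```python
import math

def similarity_matrix(view, gamma=1.0):
    """Patient-by-patient RBF similarity for one data view."""
    n = len(view)
    return [
        [math.exp(-gamma * math.dist(view[i], view[j]) ** 2) for j in range(n)]
        for i in range(n)
    ]

def fuse_networks(matrices):
    """Fuse per-view similarity networks by element-wise averaging."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical views of the same three patients (rows are aligned across views).
omics_view = [[0.0], [0.1], [3.0]]
lab_view = [[1.0, 1.0], [1.1, 0.9], [4.0, 4.0]]
fused_psn = fuse_networks(
    [similarity_matrix(omics_view), similarity_matrix(lab_view)]
)
```

The fused matrix can be read as a weighted graph whose edge weights feed a clustering or label-propagation step; views with different dimensionality pose no problem because fusion happens at the similarity level.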
Data integration has become a cornerstone of modern data-driven systems, enabling organizations to combine heterogeneous, distributed data sources into unified, actionable forms. Despite substantial advancements, challenges such as semantic heterogeneity, scalability, data quality, and automation continue to limit the efficiency and reliability of integration techniques. This paper presents a comprehensive systematic literature review that investigates the major challenges, existing techniques, and emerging trends in data integration research. Following a rigorous four-stage selection process, high-quality studies published were analyzed to synthesize both theoretical frameworks and practical solutions. The reviewed literature reveals an evolution from traditional rule-based and ontology-driven approaches toward AI-assisted, machine learning-based, and cloud-enabled integration architectures. The study identifies ongoing research gaps and highlights the need for scalable, intelligent data integration frameworks, supported by reported improvements such as a 13.2% increase in precision and a 30% reduction in performance costs achieved by modern methods.
Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to most users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view.
Federated learning (FL) represents a promising approach that enables the collaborative training of machine learning models without compromising data privacy. This approach is particularly advantageous when handling heterogeneous data dispersed across numerous institutions or devices, as centralized data aggregation is often constrained by privacy concerns and data regulations. In order to address the challenges posed by heterogeneous data, we have devised an adaptive data integration mechanism. This mechanism maps the features of disparate data sources to a unified feature space through the use of feature alignment technology, thereby facilitating the effective fusion of data. This fusion is achieved through the application of statistical alignment and multi-perspective learning technology. Furthermore, in order to safeguard the confidentiality of data, we integrate differential privacy and homomorphic encryption techniques, thereby preventing the disclosure of information during model updates and data transfers. In addition, a multi-level privacy protection strategy is proposed, which employs de-identification, secure multi-party computation, and federated averaging technologies at the three stages of data preprocessing, model training, and result aggregation, respectively. This approach ensures data security and facilitates effective model updates. The experimental results demonstrate that the proposed framework exhibits enhanced model performance and robustness in comparison to traditional federated learning methods on a multitude of real-world heterogeneous datasets.
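The result-aggregation stage named in this abstract relies on federated averaging. As a rough sketch, standard federated averaging combines client parameters weighted by local sample counts. The function name `federated_average` and the flat-list parameter layout are assumptions for illustration; the privacy mechanisms mentioned in the abstract (differential privacy, homomorphic encryption, secure multi-party computation) are deliberately not modeled here.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style).

    client_weights: one flat parameter list per client, all of equal length.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

# Two hypothetical clients; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count lets clients with more data contribute proportionally more to the global model, which is the usual design choice when client datasets differ in size.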
Deep learning based techniques have been recently used with promising results for data integration problems. Some methods directly use pre-trained embeddings that were trained on a large corpus such as Wikipedia. However, they may not always be an appropriate choice for enterprise datasets with custom vocabulary. Other methods adapt techniques from natural language processing to obtain embeddings for the enterprise's relational data. However, this approach blindly treats a tuple as a sentence, thus losing a large amount of contextual information present in the tuple. We propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational databases. We make four major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we propose how to derive sentences from such a graph that effectively "describe" the similarity across elements (tokens, attributes, rows) in the two datasets. The embeddings are learned based on such sentences. Third, we propose effective optimization to improve the quality of the learned embeddings and the performance of integration tasks. Finally, we propose a diverse collection of criteria to evaluate relational embeddings and perform an extensive set of experiments validating them against multiple baseline methods. Our experiments show that our framework, EmbDI, produces meaningful results for data integration tasks such as schema matching and entity resolution both in supervised and unsupervised settings.
The integration of heterogeneous geospatial data offers possibilities to manually and automatically derive new information that is not available when using only a single data source. Furthermore, it allows for a consistent representation and the propagation of updates from one data set to the other. However, different acquisition methods, data schemata and updating cycles of the content can lead to discrepancies in geometric and thematic accuracy and correctness which hamper the combined integration. To overcome these difficulties, appropriate methods for the integration and harmonization of data from different sources and of different types are needed. In this paper we describe two generic cases including novel integration algorithms, namely the integration of two heterogeneous vector data sets, and the integration of raster and vector data. Both algorithms are linked to a federated database which allows for automatic object matching and for managing n:m relationships. We describe and illustrate our work using vector data from topography and the geosciences, as well as multi-spectral imagery. © 2007 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
Paired RGB and depth images are becoming popular multi-modal data adopted in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations in a late stage with only one path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream fusion architecture with a single fusion path is advanced by diversifying the fusion path into a global reasoning path and a local capturing path, while introducing cross-modal interactions in multiple layers. Compared to traditional two-stream architectures, the MMCI net is able to supply more adaptive and flexible fusion flows, thus easing the optimization and enabling sufficient and efficient fusion. In addition, the MMCI net is equipped with multi-scale perception ability (i.e., simultaneous global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods.
A recommendation system is often used to recommend items that may be of interest to users. One of the main challenges is that the scarcity of actual interaction data between users and items restricts the performance of recommendation systems. To solve this problem, multi-modal technologies have been used to expand the available information. However, existing multi-modal recommendation algorithms all extract single-modality features and simply concatenate the features of different modalities to predict the recommendation results. This fusion method cannot fully mine the relevance of multi-modal features and loses the relationships between different modalities, which degrades the prediction results. In this paper, we propose a Cross-Modal-Based Fusion Recommendation Algorithm (CMBF) that can capture both the single-modal features and the cross-modal features. Our algorithm uses a novel cross-modal fusion method to fuse the multi-modal features completely and learn the cross information between different modalities. We evaluate our algorithm on two datasets, MovieLens and Amazon. Experiments show that our method achieves the best performance compared to other recommendation algorithms. We also design an ablation study to show that our cross-modal fusion method improves the prediction results.
Constructing comprehensive multimodal feature representations from RGB images (RGB) and point clouds (PT) in 2D–3D multimodal anomaly detection (MAD) methods is very important to reveal various types of industrial anomalies. For multimodal representations, most existing MAD methods consider the explicit spatial correspondence between the modality-specific features extracted from RGB and PT through space-aligned fusion, while overlooking the implicit interaction relationships between them. In this study, we propose a uni-modal and cross-modal fusion (UCF) method, which comprehensively incorporates the implicit relationships within and between modalities in multimodal representations. Specifically, UCF first establishes uni-modal and cross-modal embeddings to capture intramodal and intermodal relationships through uni-modal reconstruction and cross-modal mapping. Then, an adaptive nonequal fusion method is proposed to develop fusion embeddings, with the aim of preserving the primary features and reducing interference of the uni-modal and cross-modal embeddings. Finally, uni-modal, cross-modal, and fusion embeddings are all combined to reveal anomalies existing in different modalities. Experiments conducted on the MVTec 3D-AD benchmark and the real-world surface mount inspection demonstrate that the proposed UCF outperforms existing approaches, particularly in precise anomaly localization.
With the exponential surge in diverse multimodal data, traditional unimodal retrieval methods struggle to meet the needs of users seeking access to data across various modalities. To address this, cross-modal retrieval has emerged, enabling interaction across modalities, facilitating semantic matching, and leveraging complementarity and consistency between heterogeneous data. Although prior literature has reviewed the field of cross-modal retrieval, it suffers from numerous deficiencies in terms of timeliness, taxonomy, and comprehensiveness. This article conducts a comprehensive review of cross-modal retrieval’s evolution, spanning from shallow statistical analysis techniques to vision-language pretraining (VLP) models. Commencing with a comprehensive taxonomy grounded in machine learning paradigms, mechanisms, and models, this article delves deeply into the principles and architectures underpinning existing cross-modal retrieval methods. Furthermore, it offers an overview of widely used benchmarks, metrics, and performances. Lastly, this article probes the prospects and challenges that confront contemporary cross-modal retrieval, while engaging in a discourse on potential directions for further progress in the field. To facilitate the ongoing research on cross-modal retrieval, we develop a user-friendly toolbox and an open-source repository at https://cross-modal-retrieval.github.io.
In millimeter-wave and terahertz band communication systems, precise beam prediction is crucial for optimizing network performance and enhancing signal transmission efficiency. Traditional beam prediction methods have primarily relied on single-modal data, which often fails to capture the comprehensive environmental information necessary for optimal accuracy. In contrast, multi-modal data-based approaches offer a more promising solution by leveraging the strengths of diverse data sources. However, many existing fusion methods are static, inadequately accounting for variations in information content across different modalities, which can hinder the full utilization of each modality’s advantages. To address these limitations, this paper proposes an advanced multi-modal beam prediction method that integrates multipath-like data augmentation (MLDA), cross-modal feature enhancement (CMFE), and an uncertainty-aware dynamic fusion mechanism. Our approach combines image and radar data to predict beam indices, dynamically adjusting the weights of different modalities to accommodate varying information densities. The proposed method employs ResNet34 for feature extraction from the multi-modal data, followed by a cross-modal feature enhancement module that aggregates complementary information from the image and radar data. Finally, the dynamic fusion mechanism integrates the predictions from the single-modal data. Experimental results demonstrate that our method significantly improves the accuracy and robustness of beam prediction, achieving an overall accuracy of 89.72%. The performance of the proposed method is further validated through comparisons with various existing methods and comprehensive ablation studies, highlighting its superiority in multi-modal assisted beam prediction scenarios.
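The abstract above describes a dynamic fusion step that reweights single-modal predictions according to how informative each modality is, but it does not say how uncertainty is measured. The sketch below assumes Shannon entropy as the uncertainty score and inverse-entropy weighting; both choices, the function names, and the toy probabilities are hypothetical illustrations rather than the paper's mechanism.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def dynamic_fusion(predictions, eps=1e-8):
    """Fuse per-modality class probabilities, weighting each modality by
    the inverse of its predictive entropy (an assumed uncertainty measure)."""
    raw = [1.0 / (entropy(p) + eps) for p in predictions]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_classes = len(predictions[0])
    fused = [sum(w * p[k] for w, p in zip(weights, predictions))
             for k in range(n_classes)]
    return fused, weights

# Hypothetical beam-index probabilities: the image branch is confident,
# the radar branch is close to uniform.
image_probs = [0.90, 0.05, 0.05]
radar_probs = [0.34, 0.33, 0.33]
fused, weights = dynamic_fusion([image_probs, radar_probs])
best_beam = fused.index(max(fused))
```

Because the weights are recomputed per sample, a modality that is uninformative in one scene (for example, a camera at night) can still dominate in another, which is the contrast with static fusion that the abstract draws.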
… image fusion is proposed to improve fusion performance, … ) in XKanFuse enables effective cross-modal exchange and … facilitating precise cross-modal interaction and fusion. Extensive …
Convolutional neural network (CNN)-based fault detection approaches based on multisource signals have attracted increasing interest from the research community and industrial practices, thanks to the powerful feature representation capability of CNN and the rapid development of sensor technology. Various strategies have been applied in existing CNN-based diagnostic models to learn features from 1-D real-valued multivariate data. However, the distribution gap and the intrinsic correlations among multisource mechanical signals during the learning process have been rarely considered, which may lead to suboptimal fault identification results. To tackle this issue, this article proposes a cross-modal fusion convolutional neural network (CMFCNN) for mechanical fault diagnosis, which performs modality-specific and cross-modal feature representation on multisource data. Specifically, CMFCNN adopts two parallel modality-specific networks and a cross-modal knowledge-sharing network to fully explore independent and shared features from the multisource mechanical signals. To achieve effective feature propagation and fusion, a cross-modal fusion module is introduced to integrate cross-modal features and pass the fused information to the next layer. Moreover, to alleviate overfitting and achieve a better diagnostic performance of the framework, an online soft-label training algorithm is adopted in the CMFCNN training phase. Extensive experimental results on the cylindrical rolling bearing dataset and the planetary gearbox dataset validate that the proposed CMFCNN outperforms seven state-of-the-art methods significantly, especially under strong noise conditions.
… Abstract—In this paper, we investigate kernel based methods for multimodal information analysis and fusion. We introduce a novel approach, kernel cross-modal factor analysis, which …
With the rapid development of artificial intelligence, music generation has evolved from single-modal to cross-modal approaches and is gradually moving toward multi-modal fusion. This survey systematically reviews this developmental trajectory. The discussion begins with the representation methods for key modalities, including audio, symbolic, text, and visual data. Music generation techniques are then organized across single-modal, cross-modal, and multi-modal settings. In addition, key datasets and evaluation methodologies relevant to these tasks are compiled. Finally, the survey discusses core challenges in the field, including modal fusion, data scarcity, and evaluation frameworks, and outlines potential directions for future research.
Owing to the rapid development of deep learning and the high efficiency of hashing, hashing methods based on deep learning models have been extensively adopted in the area of cross-modal retrieval. In general, in existing deep model-based methods, modality-specific features play an important role during the hash learning. However, most existing methods only use the modality-specific features from the final fully connected layer, ignoring the semantic relevance among modality-specific features with different scales in multiple layers. To address this issue, in this study, we put forward an end-to-end deep hashing method called deep multiscale fusion hashing (DMFH) for cross-modal retrieval. For the proposed DMFH, we first design different network branches for two modalities and then adopt multiscale fusion models for each branch network to fuse the multiscale semantics, which can be used to explore the semantic relevance. Furthermore, the multi-fusion models also embed the multiscale semantics into the final hash codes, making the final hash codes more representative. In addition, the proposed DMFH can learn common hash codes directly without a relaxation, thereby avoiding a loss in accuracy during hash learning. Experimental results on three benchmark datasets prove the relative superiority of the proposed method.
Cross-modal retrieval, which aims to search for semantically relevant data across different modalities, has received increasing attention in recent years. Deep learning, with its ability to extract high-level representations from multimodal data, has become a popular approach for cross-modal retrieval. In this paper, we present a comprehensive survey of deep learning techniques for cross-modal retrieval including 37 papers published in recent years. The review is organized into four main sections, covering traditional subspace learning methods, deep learning, and machine learning-based approaches, techniques based on large multi-modal models, and an analysis of datasets used in the field of cross-modal retrieval. We compare and analyze the performance of different deep learning methods on benchmark datasets. The results show that although a large number of innovative methods have been proposed, some problems remain to be solved, such as multi-modal feature alignment, multi-modal feature fusion, and subspace learning, as well as the need for specialized datasets.
Cross-modal alignment is crucial for multimodal representation fusion due to the inherent heterogeneity between modalities. While Transformer-based methods have shown promising results in modeling inter-modal relationships, their quadratic computational complexity limits their applicability to long-sequence or large-scale data. Although recent Mamba-based approaches achieve linear complexity, their sequential scanning mechanism poses fundamental challenges in comprehensively modeling cross-modal relationships. To address this limitation, we propose AlignMamba, an efficient and effective method for multimodal fusion. Specifically, grounded in Optimal Transport, we introduce a local cross-modal alignment module that explicitly learns token-level correspondences between different modalities. Moreover, we propose a global cross-modal alignment loss based on Maximum Mean Discrepancy to implicitly enforce the consistency between different modal distributions. Finally, the unimodal representations after local and global alignment are passed to the Mamba backbone for further cross-modal interaction and multimodal fusion. Extensive experiments on complete and incomplete multimodal fusion tasks demonstrate the effectiveness and efficiency of the proposed method. For instance, on the CMU-MOSI dataset, AlignMamba improves classification accuracy by 0.9%, reduces GPU memory usage by 20.3%, and decreases inference time by 83.3%.
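The global alignment loss in this abstract is based on Maximum Mean Discrepancy (MMD). As a rough illustration of the quantity involved, the sketch below computes the standard biased estimate of squared MMD between two sets of feature vectors with an RBF kernel. The kernel choice, the `gamma` value, and the toy feature sets are assumptions; the abstract does not specify how the loss is configured in the paper.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    kxx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

# Hypothetical token features from two modalities.
text_feats = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
audio_near = [[0.05, 0.05], [0.15, 0.2], [0.2, 0.15]]
audio_far = [[2.0, 2.0], [2.1, 2.2], [2.2, 2.1]]

loss_near = mmd_squared(text_feats, audio_near)  # small: distributions overlap
loss_far = mmd_squared(text_feats, audio_far)    # larger: distributions differ
```

Minimizing such a term during training pulls the feature distributions of the two modalities together without requiring explicit token-level pairs, which is why the abstract calls the global alignment implicit.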
… fusion of heterogeneous data is one of the core problems of multimodal sentiment analysis. Most cross-modal fusion … propose a cross-modal hierarchical fusion method for multimodal …
Recently, weakly supervised video anomaly detection (WS-VAD) has emerged as a contemporary research direction to identify anomaly events like violence and nudity in videos using only video-level labels. However, this task has substantial challenges, including addressing imbalanced modality information and consistently distinguishing between normal and abnormal features. In this paper, we address these challenges and propose a multi-modal WS-VAD framework to accurately detect anomalies such as violence and nudity. Within the proposed framework, we introduce a new fusion mechanism known as the Cross-modal Fusion Adapter (CFA), which dynamically selects and enhances highly relevant audio-visual features in relation to the visual modality. Additionally, we introduce a Hyperbolic Lorentzian Graph Attention (HLGAtt) to effectively capture the hierarchical relationships between normal and abnormal representations, thereby enhancing feature separation accuracy. Through extensive experiments, we demonstrate that the proposed model achieves state-of-the-art results on benchmark datasets of violence and nudity detection.
Depth is beneficial for salient object detection (SOD) because it provides additional saliency cues. Existing RGBD SOD methods focus on tailoring complicated cross-modal fusion topologies, which, although achieving encouraging performance, carry a high risk of over-fitting and are ambiguous in studying cross-modal complementarity. Unlike these conventional approaches, which combine cross-modal features entirely without differentiation, we concentrate on decoupling the diverse cross-modal complements to simplify the fusion process and enhance the fusion sufficiency. We argue that if cross-modal heterogeneous representations can be disentangled explicitly, the cross-modal fusion process can hold less uncertainty, while enjoying better adaptability. To this end, we design a disentangled cross-modal fusion network to expose structural and content representations from both modalities by cross-modal reconstruction. For different scenes, the disentangled representations allow the fusion module to easily identify and incorporate desired complements for informative multi-modal fusion. Extensive experiments show the effectiveness of our designs and a large improvement over state-of-the-art methods.
… The goal of entity alignment is to identify entities in two multi-source knowledge graphs (KGs… Recent researches on multi-source entity alignment mainly concentrate on static KGs. In fact…
… knowledge triplets from medical texts. Then we propose a hierarchical entity alignment framework for further knowledge … -scale, high-quality, multi-source, and multi-lingual medical KG …
A knowledge graph (e.g. Freebase, YAGO) is a multi-relational graph representing rich factual information among entities of various types. Entity alignment is the key step towards knowledge graph integration from multiple sources. It aims to identify entities across different knowledge graphs that refer to the same real-world entity. However, current entity alignment systems overlook the sparsity of different knowledge graphs and cannot align multi-type entities with a single model. In this paper, we present a Collective Graph neural network for Multi-type entity Alignment, called CG-MuAlign. Different from previous work, CG-MuAlign jointly aligns multiple types of entities, collectively leverages the neighborhood information and generalizes to unlabeled entity types. Specifically, we propose a novel collective aggregation function tailored for this task that (1) relieves the incompleteness of knowledge graphs via both cross-graph and self attentions, and (2) scales up efficiently with a mini-batch training paradigm and an effective neighborhood sampling strategy. We conduct experiments on real-world knowledge graphs with millions of entities and observe superior performance over existing methods. In addition, the running time of our approach is much less than that of the current state-of-the-art deep learning methods.
… entity alignment for knowledge graphs and proposed the Multi-Modal Interaction Entity Alignment … INT model for the entity alignment task in multi-modal knowledge graphs. Experimental …
Multi-source knowledge fusion is one of the important research topics in the fields of artificial intelligence, natural language processing, and so on. The research results of multi-source knowledge fusion can help computers better understand human intelligence, human language and human thinking, effectively promote the Big Search in Cyberspace, effectively promote the construction of domain knowledge graphs (KGs), and bring enormous social and economic benefits. Due to the uncertainty of knowledge acquisition, the reliability and confidence of KGs based on entity recognition and relationship extraction technology need to be evaluated. On the one hand, the process of multi-source knowledge reasoning can detect conflicts and provide help for knowledge evaluation and verification; on the other hand, the new knowledge acquired by knowledge reasoning is also uncertain and needs to be evaluated and verified. Collaborative reasoning of multi-source knowledge includes not only inferring new knowledge from multi-source knowledge, but also conflict detection, i.e. identifying erroneous knowledge or conflicts between pieces of knowledge. Starting from several related concepts of multi-source knowledge fusion, this paper comprehensively introduces the latest research progress of open-source knowledge fusion, multi-knowledge graphs fusion, information fusion within KGs, multi-modal knowledge fusion and multi-source knowledge collaborative reasoning. On this basis, the challenges and future research directions of multi-source knowledge fusion in a large-scale knowledge base environment are discussed.
… -parameter to balance embedding loss and alignment loss, the other is the … entity alignment framework named RpAlign (Relation prediction based cross-knowledge-graph entity Align…
… with the same real-world identity from different Knowledge Graphs (KGs). Existing methods … Joint entity Alignment Framework (MultiJAF), which can effectively utilize the knowledge of …
OBJECTIVE: External knowledge, such as a lexicon of words in Chinese and a domain knowledge graph (KG) of concepts, has recently been adopted to improve the performance of machine learning methods for named entity recognition (NER), as it can provide additional information beyond context. However, most existing studies only consider knowledge from one source (i.e., either lexicon or knowledge graph) in different ways and consider lexicon words or KG concepts independently with their boundaries. In this paper, we focus on leveraging multi-source knowledge in a unified manner where lexicon words or KG concepts are well combined with their boundaries for Chinese Clinical NER (CNER). MATERIAL AND METHODS: We propose a novel method based on relational graph convolutional network (RGCN), called MKRGCN, to utilize multi-source knowledge in a unified manner for CNER. For any sentence, a relational graph based on words or concepts in each knowledge source is constructed, where lexicon words or KG concepts appearing in the sentence are linked to the containing tokens with the boundary information of the lexicon words or KG concepts. RGCN is used to model all relational graphs constructed from multi-source knowledge, and the representations of tokens from multi-source knowledge are integrated into the context representations of tokens via an attention mechanism. Based on the knowledge-enhanced representations of tokens, we deploy a conditional random field (CRF) layer for named entity label prediction. In this study, a lexicon of words and a medical knowledge graph are used as knowledge sources for CNER. RESULTS: Our proposed method achieves the best performance on CCKS2017 and CCKS2018 in Chinese with F1-scores of 91.88% and 89.91%, respectively, significantly outperforming existing methods. The extended experiments on NCBI-Disease and BC2GM in English also prove the effectiveness of our method when only considering one knowledge source via RGCN.
CONCLUSION: The MKRGCN model can integrate knowledge from the external lexicon and knowledge graph effectively for CNER and has the potential to be applied to English NER.
… graph (KG) by matching the same entities in multi-source KGs. … for entity alignment between such temporal knowledge graphs (TKGs). In this paper, we propose a novel entity alignment …
The industrial knowledge graph (IKG) can improve the cognitive intelligence of the manufacturing system and is recognized as one of the cores of the next-generation industrial management information system. Due to the multisource heterogeneous nature of industrial data, aligning entities with the same semantics (entity alignment) is the core technology for building large-scale, high-coverage IKGs. Existing approaches show that embedded learning of IKGs performs well for this task. However, most advanced methods ignore concept information when learning topological information about IKGs. Inspired by the ontology matching theory, in this article, we realize the importance of entity concepts in alignment. The conceptual semantics of entities can usually be obtained through the is–a relation. However, the IKG is usually constructed by triples (entity, relation, entity) automatically extracted from a large text corpus. This will lead to entities in the IKG having problems such as lacking conceptual information, belonging to multiple concepts, or having different concept granularities. To solve the two problems of lacking conceptual information and different concept granularity, we propose the concept-aware entity alignment network (CAEA), aggregating bidirectional relations and attributes to get the entity concept semantics by a novel concept-aware graph attention mechanism. The excellent performance of the CAEA can better support the construction of large and complete IKGs and support downstream applications such as industrial knowledge recommendation and assisted decision-making. To verify the performance of the CAEA on the IKG, we construct a new entity alignment benchmark using industrial control network security data and verify the effectiveness of the CAEA on the new benchmark and several mainstream datasets. Experimental results show that our method outperforms other state-of-the-art (SOTA) methods and promotes the development of IKGs.
Entity alignment is a crucial step in integrating knowledge graphs (KGs) from multiple sources. Previous attempts at entity alignment have explored different KG structures, such as neighborhood-based and path-based contexts, to learn entity embeddings, but they are limited in capturing the multi-context features. Moreover, most approaches directly utilize the embedding similarity to determine entity alignment without considering the global interaction among entities and relations. In this work, we propose an Informed Multi-context Entity Alignment (IMEA) model to address these issues. In particular, we introduce Transformer to flexibly capture the relation, path, and neighborhood contexts, and design holistic reasoning to estimate alignment probabilities based on both embedding similarity and the relation/entity functionality. The alignment evidence obtained from holistic reasoning is further injected back into the Transformer via the proposed soft label editing to inform embedding learning. Experimental results on several benchmark datasets demonstrate the superiority of our IMEA model compared with existing state-of-the-art entity alignment methods.
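The abstract above notes that most approaches determine alignment directly from embedding similarity. A minimal sketch of that baseline (not of the IMEA model itself) is shown below: each source entity is matched to the target entity with the highest cosine similarity. The entity names and embedding values are invented, and the sketch assumes both KGs are already embedded in a shared vector space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align_by_similarity(source, target):
    """Map each source entity to its most similar target entity.

    source, target: dicts from entity name to embedding vector.
    Returns a dict: source name -> (best target name, similarity score).
    """
    alignment = {}
    for name, vec in source.items():
        best = max(target, key=lambda t: cosine(vec, target[t]))
        alignment[name] = (best, cosine(vec, target[best]))
    return alignment

# Hypothetical embeddings from two KGs in a shared space.
kg1 = {"Paris": [0.9, 0.1, 0.0], "Berlin": [0.1, 0.9, 0.0]}
kg2 = {"Paris_FR": [0.8, 0.2, 0.1], "Berlin_DE": [0.2, 0.8, 0.1]}

matches = align_by_similarity(kg1, kg2)
```

This greedy nearest-neighbor matching treats every entity independently, which is the limitation the abstract points to when it argues for reasoning over global interactions among entities and relations.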
Multi-modal entity alignment (MMEA) aims to find all equivalent entity pairs between multi-modal knowledge graphs (MMKGs). Rich attributes and neighboring entities are valuable for the alignment task, but existing works ignore the contextual gap problem, in which aligned entities have different numbers of attributes in a specific modality, when learning entity representations. In this paper, we propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA) to compensate for the contextual gaps by incorporating consistent alignment knowledge. Attribute-consistent KGs (ACKGs) are first constructed via multi-modal attribute uniformization with merge and generate operators so that each entity has one and only one uniform feature in each modality. The ACKGs are then fed into a relation-aware graph neural network with random dropouts, to obtain aggregated relation representations and robust entity representations. To evaluate how ACK-MMEA facilitates entity alignment, we specially design a joint alignment loss for both entity and attribute evaluation. Extensive experiments conducted on two benchmark datasets show that our approach achieves excellent performance compared to its competitors.
… multiple sources are aligned to the same canonical entity v, this paper merges their representations via confidence-weighted averaging or maximum-confidence selection: …
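The two merge strategies named in this snippet can be illustrated with a minimal sketch. The snippet elides the paper's actual formula, so the function names, the `(vector, confidence)` pair layout, and the example values below are assumptions for illustration only, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical layout: one (representation, confidence) pair per source.
Scored = Tuple[Vector, float]


def merge_weighted(reps: Sequence[Scored]) -> Vector:
    """Confidence-weighted average of the source representations."""
    total = sum(conf for _, conf in reps)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    dim = len(reps[0][0])
    return [sum(vec[i] * conf for vec, conf in reps) / total for i in range(dim)]


def merge_max_confidence(reps: Sequence[Scored]) -> Vector:
    """Keep only the representation from the most confident source."""
    best_vec, _ = max(reps, key=lambda pair: pair[1])
    return list(best_vec)


# Two sources aligned to the same canonical entity (illustrative values).
reps = [([1.0, 0.0], 0.75), ([0.0, 1.0], 0.25)]
averaged = merge_weighted(reps)        # blends both sources
selected = merge_max_confidence(reps)  # trusts the first source only
```

Averaging keeps some signal from every source, while selection discards low-confidence sources entirely; which is preferable depends on how reliable the confidence scores are.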
… However, current multi-source KGs have heterogeneity and complementarity, and it is … almost all the latest knowledge graph representations learning and entity alignment methods and …
Knowledge graphs (KGs) have become a valuable asset for many AI applications. Although some KGs contain plenty of facts, they are widely acknowledged as incomplete. To address this issue, many KG completion methods are proposed. Among them, open KG completion methods leverage the Web to find missing facts. However, noisy data collected from diverse sources may damage the completion accuracy. In this paper, we propose a new trustworthy method that exploits facts for a KG based on multi-sourced noisy data and existing facts in the KG. Specifically, we introduce a graph neural network with a holistic scoring function to judge the plausibility of facts with various value types. We design value alignment networks to resolve the heterogeneity between values and map them to entities even outside the KG. Furthermore, we present a truth inference model that incorporates data source qualities into the fact scoring function, and design a semi-supervised learning way to infer the truths from heterogeneous values. We conduct extensive experiments to compare our method with the state-of-the-arts. The results show that our method achieves superior accuracy not only in completing missing facts but also in discovering new facts.
Data resources in universities are increasingly abundant, yet data silos hinder their effective utilization. This research addresses multi-source heterogeneous data fusion and knowledge graph construction in universities. We propose a deep learning-based data fusion model with entity alignment and relationship extraction techniques, design a knowledge extraction method for educational contexts, and develop an integrated knowledge graph management system. The results show an entity alignment accuracy of 91.7% and response times below 120ms. The system successfully processes over 23 million entities, enabling intelligent data applications across university departments.
… multi-source data fusion within varying spatial and temporal resolutions. This article reviews current techniques of multi-source remote sensing data fusion … , i.e., pixel/data level, feature …
The development of real-time road condition systems will better monitor road network operation status. However, the weak point of all these systems is their need for comprehensive and reliable data. For traffic data acquisition, two sources are currently available: 1) floating vehicles and 2) remote traffic microwave sensors (RTMS). The former uses mobile probe vehicles as mobile sensors, and the latter consists of a set of fixed-point detectors installed in the roads. First, the structure of a three-layer BP neural network is designed to efficiently fuse the floating car data (FCD) and the fixed detector data (FDD). Second, to improve the accuracy of traffic speed estimation, a multi-source data fusion model based on a GA-PSO-BP neural network is proposed that combines information from floating vehicles and microwave sensors. The proposed model combines GA and PSO. The hybrid model not only overcomes the estimation inaccuracy of the traditional fusion model, but also compensates for the shortcomings of the traditional BP algorithm. Finally, this system has been tested and implemented on actual roads, and the simulation results show that the accuracy of the data has reached 98%.
… go into the development of a multi-source data fusion algorithm are described. Features that … for data fusion based on a voting like process that tries to adjudicate conflict among the data…
Traffic congestion occurs when traffic demand is greater than the available network capacity. It is characterized by lower vehicle speeds, increased travel times, arrival unreliability, and longer vehicular queueing. Congestion can also impose a negative impact on the society by decreasing the quality of life with increased pollution, especially in urban areas. To mitigate the congestion problem, traffic engineers and scientists need quality, comprehensive, and accurate data to estimate the state of traffic flow. Various types of data collection technologies have different advantages and disadvantages as well as data characteristics, such as accuracy, sampling frequency, and geospatial coverage. Multisource data fusion increases the accuracy and provides a comprehensive estimation of the performance of traffic flow on a road network. This paper presents a literature overview related to the estimation of congestion and prediction based on the data collected from multiple sources. An overview of data fusion methods and congestion indicators used in the literature for traffic state and congestion estimation is given. Results of these methods are analyzed, and a disseminative analysis of the advantages and disadvantages of surveyed methods is presented.
… We proposed to improve land cover classification accuracy by integrating multi-source RS features through data fusion. We further investigated the effect of different RS features on …
Using multi-source sensing data based on the Internet of Things (IoT) with artificial intelligence and big data processing technology to achieve predictive maintenance of mechanical equipment can remarkably improve the service life of the machine and reduce labor costs when diagnosing mechanical faults, and it has become a highly relevant research topic. In this paper, the multi-source sensing data fusion models and fusion algorithms are studied and discussed. First, the Joint Directors of Laboratories (JDL) fusion model and the Hierarchical fusion model are compared and analyzed. Then, various types of fusion algorithms based on Neural Networks and Deep Learning, including Dempster-Shafer (D-S) evidence theory and their applications in mechanical fault diagnosis and fault prediction, are studied and compared. The findings reveal that exploring and designing a more intelligent fusion model that incorporates the beneficial characteristics of different fusion algorithms is challenging and valuable for promoting the development of mechanical fault diagnosis and prediction.
… In remote sensing applications, the most widely used multisource classification techniques … The aim of this research is (a) to compare different data fusion techniques for the …
Hyperspectral and multispectral images (HS/MS) fusion and classification as an important branch of data quality improvement and interpretation have attracted increasing attention in recent years. However, the unavailable sensor prior still limits the performance of many traditional fusion methods, consequently deteriorating the classification results. Despite the unsupervised methods based on convolutional neural network (CNN) making a lot of attempts to mitigate the limitations, challenges with extracting the long-range dependencies hamper the performance. To address these impediments, a transformer-based baseline constructed by the cross-scale mixing attention transformer (CSMFormer) is designed for HS/MS fusion and classification. Especially, the spatial–spectral mixer (SSMixer) is utilized to extract the long-range dependencies at a large scale. Simultaneously, cross-scale feature calibration is achieved by combining information from the original scale. After that, the nonlinear enhancement module (NLEM) is designed to encourage feature discrimination. Note that the spatial and spectral mixers can be replaced by any spatial–spectral feature extractors. Therefore, the proposed CSMFormer is flexible in data fusion, land-covers’ classification, segmentation, and so on. Experiments about data fusion and land-covers’ classification on two HS/MS wetland remote sensing scenes demonstrate the superiority of the proposed CSMFormer baseline, improving the data quality and classification precision.
… , in this paper, multi-source sensors are mounted on the rolling mill to collect various data. … monitoring with multi-source sensing data, compared to the other state-of-the-art DL methods. …
Forest plays an important role in global carbon, hydrological and atmospheric cycles and provides a wide range of valuable ecosystem services. Timely and accurate forest-type mapping is an essential topic for forest resource inventory supporting forest management, conservation biology and ecological restoration. Despite efforts and progress having been made in forest cover mapping using multi-source remotely sensed data, fine spatial, temporal and spectral resolution modeling for forest type distinction is still limited. In this paper, we proposed a novel spatial-temporal-spectral fusion framework through spatial-spectral fusion and spatial-temporal fusion. Addressing the shortcomings of the commonly-used spatial-spectral fusion model, we proposed a novel spatial-spectral fusion model called the Segmented Difference Value method (SEGDV) to generate fine spatial-spectra-resolution images by blending the China environment 1A series satellite (HJ-1A) multispectral image (Charge Coupled Device (CCD)) and Hyperspectral Imager (HSI). A Hierarchical Spatiotemporal Adaptive Fusion Model (HSTAFM) was used to conduct spatial-temporal fusion to generate the fine spatial-temporal-resolution image by blending the HJ-1A CCD and Moderate Resolution Imaging Spectroradiometer (MODIS) data. The spatial-spectral-temporal information was utilized simultaneously to distinguish various forest types. Experimental results of the classification comparison conducted in the Gan River source nature reserves showed that the proposed method could enhance spatial, temporal and spectral information effectively, and the fused dataset yielded the highest classification accuracy of 83.6% compared with the classification results derived from single Landsat-8 (69.95%), single spatial-spectral fusion (70.95%) and single spatial-temporal fusion (78.94%) images, thereby indicating that the proposed method could be valid and applicable in forest type classification.
Geological remote sensing interpretation can extract elements of interest from multiple types of images, which is vital in geological survey and mapping, especially in inaccessible regions. However, due to numerous classes, high interclass similarities, complex distributions, and sample imbalances of geological elements, the interpretation results of machine learning (ML)-based methods are understandably worse than manual visual interpretation. In addition, scholars in remote sensing have mainly carried out their works to interpret a single geological element category, such as mineral, lithological, soil, and structure. The interpretation of multiple geological elements is missing, which is more in line with the open world. To improve the interpretation results of ML-based methods and reduce the labor cost in geological survey and mapping, we propose a deep learning (DL)-feature-based adaptive multisource data fusion network (AMSDFNet) for the efficient interpretation of multiple geological remote sensing elements. The AMSDFNet has two branches for learning valuable spatial and spectral information from two kinds of data sources, in which the atrous spatial pyramid pooling (ASPP) operation and an attention block are applied to adaptively extract and fuse multiscale informative features. A hard example mining algorithm was also added to select important training examples to address sample imbalance. A large-scale region in western China with sufficient geological elements was set as the research area. The proposed model improved the two critical metrics by about 2% in the experiment section. As far as we know, this research work is the first time DL features and multisource remote sensing images have been utilized to simultaneously interpret geological elements of lithology, soil, surface water, and glaciers. The extensive experimental results demonstrated the superiority of DL features and our model in geological remote sensing interpretation.
… encompasses theory, techniques and tools conceived and employed for exploiting the synergy in information acquired from multiple sources (sensor, databases, information gathered …
Building safety assessment based on single sensor data has the problems of low reliability and high uncertainty. Therefore, this paper proposes a novel multi-source sensor data fusion method based on Improved Dempster–Shafer (D-S) evidence theory and Back Propagation Neural Network (BPNN). Before data fusion, the improved self-support function is adopted to preprocess the original data. The process of data fusion is divided into three steps: Firstly, the feature of the same kind of sensor data is extracted by the adaptive weighted average method as the input source of BPNN. Then, BPNN is trained and its output is used as the basic probability assignment (BPA) of D-S evidence theory. Finally, Bhattacharyya Distance (BD) is introduced to improve D-S evidence theory from two aspects of evidence distance and conflict factors, and multi-source data fusion is realized by D-S synthesis rules. In practical application, a three-level information fusion framework of the data level, the feature level, and the decision level is proposed, and the safety status of buildings is evaluated by using multi-source sensor data. The results show that compared with the fusion result of the traditional D-S evidence theory, the algorithm improves the accuracy of the overall safety state assessment of the building and reduces the MSE from 0.18 to 0.01%.
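For orientation, the classical Dempster combination rule that this abstract builds on can be sketched as follows. This is only the unmodified textbook rule; it does not include the Bhattacharyya-distance improvement or the BPNN-generated basic probability assignments described above. The frame of discernment `{"safe", "unsafe"}` and the mass values are made up for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps subsets of the frame of discernment to belief mass.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with the classical Dempster rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


SAFE = frozenset({"safe"})
UNSAFE = frozenset({"unsafe"})
EITHER = SAFE | UNSAFE  # mass left on the whole frame expresses ignorance

# Hypothetical BPAs from two sensors.
sensor_a: Mass = {SAFE: 0.6, EITHER: 0.4}
sensor_b: Mass = {SAFE: 0.5, UNSAFE: 0.3, EITHER: 0.2}
fused = dempster_combine(sensor_a, sensor_b)
```

In this example the conflicting mass is 0.18, so the remaining products are divided by 0.82. The renormalization step is where the classical rule behaves poorly under high conflict, which is the kind of weakness that improved variants aim to address.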
In recent years, significant progress has been made in multi-source navigation data fusion methods, driven by rapid advancements in multi-sensor technology, artificial intelligence (AI) algorithms, and computational capabilities. On one hand, fusion methods based on filtering theory, such as Kalman Filtering (KF), Particle Filtering (PF), and Federated Filtering (FF), have been continuously optimized, enabling effective handling of non-linear and non-Gaussian noise issues. On the other hand, the introduction of AI technologies like deep learning and reinforcement learning has provided new solutions for multi-source data fusion, particularly enhancing adaptive capabilities in complex and dynamic environments. Additionally, methods based on Factor Graph Optimization (FGO) have also demonstrated advantages in multi-source data fusion, offering better handling of global consistency problems. In the future, with the widespread adoption of technologies such as 5G, the Internet of Things, and edge computing, multi-source navigation data fusion is expected to evolve towards real-time processing, intelligence, and distributed systems. So far, fusion methods mainly include optimal estimation methods, filtering methods, uncertain reasoning methods, Multiple Model Estimation (MME), AI, and so on. To analyze the performance of these methods and provide a reliable theoretical reference and basis for the design and development of multi-source data fusion systems, this paper summarizes the characteristics of these fusion methods and the corresponding adaptation scenarios. These results can provide references for theoretical research, system development, and application in the fields of autonomous driving, unmanned vehicle navigation, and intelligent navigation.
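As a minimal illustration of the filtering family mentioned above, the sketch below applies the textbook scalar Kalman measurement update to fuse readings from two sources. It is not taken from the surveyed work; the static state, the prior, and the two measurements (with their variances) are assumed values chosen for clarity.

```python
from typing import Tuple


def kalman_update(x: float, p: float, z: float, r: float) -> Tuple[float, float]:
    """Scalar Kalman measurement update.

    x, p: current state estimate and its variance
    z, r: new measurement and its noise variance
    """
    gain = p / (p + r)             # how much to trust the measurement
    x_new = x + gain * (z - x)     # move the estimate toward the measurement
    p_new = (1.0 - gain) * p       # uncertainty shrinks after each update
    return x_new, p_new


# Hypothetical prior, then one reading from each of two sources.
estimate, variance = 0.0, 4.0
for z, r in [(10.0, 4.0), (8.0, 2.0)]:
    estimate, variance = kalman_update(estimate, variance, z, r)
```

Each update weights the new reading by the ratio of current uncertainty to total uncertainty, so a more precise source (smaller `r`) pulls the estimate more strongly.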
… In this paper, the ALWMJ-SRC algorithm is proposed for multisource remote sensing data fusion and classification. The proposed algorithm, based on the multitask joint SR framework, …
… fusing multimodal information for a large-scale intelligent video surveillance … , data fusion, and sensor tasking. The visualization not only displays 2-D, 3-D, and geographical information …
… Based on these premises, we believe that a more advanced large-scale system should play the role of active assistance to help security work, instead of a new mechanism to replace …
With the development of research on multi-modal data fusion and its combination with online data management, the application of multi-modal big data fusion in information management systems is increasingly extensive. How to integrate multi-modal big data effectively is the key technology to building an efficient information management system. In this paper, based on the combination of a multi-support vector machine and convolutional neural network, the feature-level data fusion of multi-source heterogeneous big data is implemented, and it is applied to the real data set to test the relevant model. Experimental results show that this method can not only realize heterogeneous integration of big data, but also has high accuracy and reliability.
Deep learning has shown great strength in many fields and has allowed people to live more conveniently and intelligently. However, deep learning requires a considerable amount of uniform training data, which introduces difficulties in many application scenarios. On the one hand, in real-time systems, training data are constantly generated, but users cannot immediately obtain this vast amount of training data. On the other hand, training data from heterogeneous sources have different data formats. Therefore, existing deep learning frameworks are not able to train all data together. In this paper, we propose the iFusion framework, which achieves efficient intelligence fusion for deep learning from real-time data and heterogeneous data. For real-time data, we train only newly arrived data to obtain a new discrimination model and fuse the previously trained models to obtain the discrimination result. For heterogeneous data, different types of data are trained separately; then, we fuse the different discrimination models so that it is not necessary to consider heterogeneous data formats. We use a method based on Dempster-Shafer theory (DST) to fuse the discrimination models. We apply iFusion to the deep learning of medical image data, and the results of the experiments show the effectiveness of the proposed method.
Network analysis has been widely applied in many real-world tasks, such as gene analysis and targeted marketing. To extract effective features for these analysis tasks, network embedding automatically learns a low-dimensional vector representation for each node, such that the meaningful topological proximity is well preserved. While the embedding algorithms on pure topological structure have attracted considerable attention, in practice, nodes are often abundantly accompanied with other types of meaningful information, such as node attributes, second-order proximity, and link directionality. A general framework for incorporating the heterogeneous information into network embedding could be potentially helpful in learning better vector representations. However, it remains a challenging task to jointly embed the geometrical structure and a distinct type of information due to the heterogeneity. In addition, the real-world networks often contain a large number of nodes, which put demands on the scalability of the embedding algorithms. To bridge the gap, in this article, we propose a general embedding framework named Heterogeneous Information Learning in Large-scale networks (HILL) to accelerate the joint learning. It enables the simultaneous node proximity assessing process to be done in a distributed manner by decomposing the complex modeling and optimization into many simple and independent sub-problems. We validate the significant correlation between the heterogeneous information and topological structure, and illustrate the generalizability of HILL by applying it to perform attributed network embedding and second-order proximity learning. A variation is proposed for link directionality modeling. Experimental results on real-world networks demonstrate the effectiveness and efficiency of HILL.
Machine Learning-based Intrusion Detection Systems (ML-IDS) are core functionalities in responding to today’s cyber-attacks by learning, detecting, and classifying various attack patterns. However, despite achieving high overall accuracy, existing ML-IDS approaches suffer from high false positive and false negative rates for certain attack patterns due to limited generalization performances. This research proposes a novel dataset construction method that enhances the performance of ML-IDS by integrating heterogeneous security data to expand feature representations. Our approach integrates data collected from heterogeneous domains based on timestamps and evaluates the expanded feature space regarding information gain and entropy difference. The proposed method dynamically adjusts the time window for data fusion based on the evaluation of the feature space, thereby generating an optimal dataset. Our approach leverages multiple security data sources to enhance dataset quality and improve the classification performance of ML-IDS models. Experimental results demonstrate that the proposed dataset fusion mechanism enhances learning and generalization performance. Experimental results of the dataset reconstruction demonstrate improved performance of multiple baseline models on the CIC-IDS-2018 dataset, particularly in detecting attack patterns with previously high false positive rates. Notably, base models trained on the reconstructed dataset achieved a macro F1-score of 0.9968, surpassing state-of-the-art baselines. These results demonstrate that our approach to improving dataset quality can effectively enhance the performance of existing ML-IDS.
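The information-gain criterion mentioned in this abstract can be illustrated with the standard definition for discrete features. The sketch below is a generic computation, not the authors' evaluation procedure; the labels and the two candidate features are toy values.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: one feature separates the classes, the other does not.
labels = ["attack", "attack", "benign", "benign"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]

gain_good = information_gain(informative, labels)
gain_bad = information_gain(uninformative, labels)
```

A feature added by fusing another data source is worth keeping only if it carries gain of this kind; a feature with zero gain enlarges the feature space without helping classification.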
Collaborative filtering (CF) has been one of the most important and popular recommendation methods, which aims at predicting users’ preferences (ratings) based on their past behaviors. Recently, various types of side information beyond the explicit ratings users give to items, such as social connections among users and metadata of items, have been introduced into CF and shown to be useful for improving recommendation performance. However, previous works process different types of information separately, thus failing to capture the correlations that might exist across them. To address this problem, in this work, we study the application of heterogeneous information network (HIN), which offers a unifying and flexible representation of different types of side information, to enhance CF-based recommendation methods. However, we face challenging issues in HIN-based recommendation, i.e., how to capture similarities of complex semantics between users and items in a HIN, and how to effectively fuse these similarities to improve final recommendation performance. To address these issues, we apply metagraph to similarity computation and solve the information fusion problem with a “matrix factorization (MF) + factorization machine (FM)” framework. For the MF part, we obtain the user-item similarity matrix from each metagraph and then apply low-rank matrix approximation to obtain latent features for both users and items. For the FM part, we apply FM with Group lasso (FMG) on the features obtained from the MF part to train the recommending model and, at the same time, identify the useful metagraphs. Besides FMG, a two-stage method, we further propose an end-to-end method, hierarchical attention fusing, to fuse metagraph-based similarities for the final recommendation. Experimental results on four large real-world datasets show that the two proposed frameworks significantly outperform existing state-of-the-art methods in terms of recommendation performance.
… information of decision-making tasks. To achieve the efficient integration of heterogeneous large-scale data from energy storage power stations, this study presents a novel data fusion …
… architecture of a large scale heterogeneous monitoring system and the application of decentralized sensor fusion mechanisms for efficient information extraction and data reduction. …
Data heterogeneity can pose a great challenge to process and systematically fuse low-level data from different modalities with no recourse to heuristics and manual adjustments and refinements. In this paper, a new methodology is introduced for the fusion of measured data for detecting and predicting weather-driven natural hazards. The proposed research introduces a robust theoretical and algorithmic framework for the fusion of heterogeneous data in near real time. We establish a flexible information-based fusion framework with a target optimality criterion of choice, which for illustration, is specialized to a maximum entropy principle and a least effort principle for semisupervised learning with noisy labels. We develop a methodology to account for multimodality data and a solution for addressing inherent sensor limitations. In our case study of interest, namely, that of flood density estimation, we further show that by fusing remote sensing and social media data, we can develop well founded and actionable flood maps. This capability is valuable in situations where environmental hazards, such as hurricanes or severe weather, affect very large areas. Relative to the state of the art working with such data, our proposed information-theoretic solution is principled and systematic, while offering a joint exploitation of any set of heterogeneous sensor modalities with minimally assuming priors. This flexibility is coupled with the ability to quantitatively and clearly state the fusion principles with very reasonable computational costs. The proposed method is tested and substantiated with the multimodality data of a 2013 Boulder Colorado flood event.
… The other is the automatic method, which mainly involves low-level data to fuse semantic information without human intervention, and can be used for large-scale data. However, there …
The advent of the era of big data has led to the emergence of heterogeneous network data fusion as a prominent area of research. Heterogeneous network data is characterised by multi-modality, multi-source, and high dimensionality, which presents significant challenges for traditional data fusion methods. These methods often encounter difficulties in processing such data, including issues such as information redundancy, data inconsistency, and high computational complexity. This paper proposes a heterogeneous network data fusion model based on a deep neural network. The model employs the Multi-Layer Perceptron (MLP) as its fundamental framework, utilising the deep neural network to facilitate joint feature representation learning on data from disparate modalities. The Adaptive Feature Reconstruction Module enables the model to learn the interrelationships between different modalities and to balance the importance of different modal features in the fusion process in a dynamic manner. Furthermore, we introduce an innovative cross-modal attention mechanism, which is capable of effectively capturing the coupling relationship between deep features in heterogeneous data, thereby enhancing the expressiveness and data fusion efficacy of the model. The experimental results demonstrate that the proposed model markedly enhances the accuracy of classification and regression tasks in comparison to traditional methodologies.
… utilization of heterogeneous data and have the problem of information loss in the process of semantic information fusion. In this paper, we propose a Multi-scale Semantic Fusion …
… reliability of heterogeneous unmanned systems. This paper proposes a federated fault diagnosis … based on data fusion, which combines visual images and multi-sensor information to …
… and preference relation information, and heterogeneous … an expert clustering and information fusion method. First, … is proposed to classify the large-scale experts into several clusters. …
At present, energy saving and emission reduction have become a problem of great concern for mankind. At the same time, the mining industry faces problems such as waste of resources, low efficiency, and frequent industrial accidents. Therefore, this paper designs a big data platform for deep well construction. High-precision sensors that can withstand great pressure are added to the system to solve the problem that ordinary sensors have difficulty collecting information in deep wells. A multi-source heterogeneous data fusion algorithm is added to the system to resolve the differing formats of the acquired data. In conclusion, the completed platform enables data monitoring during the mining process. It not only helps to enhance the safety of mine construction, but also provides data analysis tools for further theoretical research on mine construction.
… large scale fusion of heterogeneous and noisy information. DPN agents can establish meaningful information filtering channels between the relevant information … -level information fusion…
Multi-modal transportation recommendation has a goal of recommending a travel plan which considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect among these modes. The successful development of multi-modal transportation recommendation systems can help to satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To this end, in this paper, we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully-constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding for transport modes to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multimodal transportation recommendations. Indeed, our method has been deployed into one of the largest navigation Apps to serve hundreds of millions of users, and extensive experimental results with real-world map query data demonstrate the enhanced performance of the proposed method for multimodal transportation recommendations.
This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment instance discrimination used in conventional contrastive learning approaches. We further employ Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data to better predict cluster assignments. We thoroughly evaluate our framework on large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and image benchmarks CIFAR10, CIFAR100 and ImageNet, obtaining state-of-the-art results.
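The Winner-Take-All hashing step can be illustrated with a small sketch. For each random permutation, the code records the position of the largest value among the first `k` permuted entries, so the hash depends only on the rank order of the features. How the paper turns codes into pairwise pseudo labels is not spelled out in the abstract; the agreement threshold in `same_cluster` is an assumption, as are all names and parameter values below.

```python
import random
from typing import List, Sequence


def make_permutations(dim: int, n_hashes: int, seed: int = 0) -> List[List[int]]:
    """Draw n_hashes random permutations of the feature indices."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        idx = list(range(dim))
        rng.shuffle(idx)
        perms.append(idx)
    return perms


def wta_hash(vec: Sequence[float], perms: Sequence[Sequence[int]], k: int) -> List[int]:
    """For each permutation, return the argmax position within its first k entries."""
    code = []
    for perm in perms:
        window = [vec[i] for i in perm[:k]]
        code.append(max(range(k), key=window.__getitem__))
    return code


def same_cluster(code_a: Sequence[int], code_b: Sequence[int], min_agreement: float) -> bool:
    """Assumed pseudo-label rule: positive pair if enough hash entries agree."""
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / len(code_a) >= min_agreement


perms = make_permutations(dim=6, n_hashes=8, seed=0)
embedding = [0.1, 0.9, 0.3, 0.7, 0.2, 0.5]
rescaled = [2 * x + 1 for x in embedding]  # same rank order, different scale

code = wta_hash(embedding, perms, k=3)
code_rescaled = wta_hash(rescaled, perms, k=3)
```

Because only rank order matters, `code` and `code_rescaled` are identical. This ordinal property makes the comparison less sensitive to the absolute scale of embeddings than a raw cosine-similarity threshold.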
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequence or graph) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both of them to solve the problem of specific information loss in single-modal representation on various tasks. To further fuse such multi-modal information, the correspondence between the chemical features learned from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representation as attention bias in Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
The fusion of multi-modal data (e.g., magnetic resonance imaging (MRI) and positron emission tomography (PET)) has been prevalent for accurate identification of Alzheimer’s disease (AD) by providing complementary structural and functional information. However, most of the existing methods simply concatenate multi-modal features in the original space and ignore their underlying associations, which may provide more discriminative characteristics for AD identification. Meanwhile, how to overcome the overfitting issue caused by high-dimensional multi-modal data remains an open problem. To this end, we propose a relation-induced multi-modal shared representation learning method for AD diagnosis. The proposed method integrates representation learning, dimension reduction, and classifier modeling into a unified framework. Specifically, the framework first obtains multi-modal shared representations by learning a bi-directional mapping between the original space and the shared space. Within this shared space, we utilize several relational regularizers (including feature-feature, feature-label, and sample-sample regularizers) and auxiliary regularizers to encourage learning underlying associations inherent in multi-modal data and alleviate overfitting, respectively. Next, we project the shared representations into the target space for AD diagnosis. To validate the effectiveness of our proposed approach, we conduct extensive experiments on two independent datasets (i.e., ADNI-1 and ADNI-2), and the experimental results demonstrate that our proposed method outperforms several state-of-the-art methods.
Multi-modal clustering (MMC) aims to explore complementary information from diverse modalities to improve clustering performance. This article studies challenging problems in MMC methods based on deep neural networks. On one hand, most existing methods lack a unified objective to simultaneously learn the inter- and intra-modality consistency, resulting in a limited representation learning capacity. On the other hand, most existing processes are modeled for a finite sample set and cannot handle out-of-sample data. To handle the above two challenges, we propose a novel Graph Embedding Contrastive Multi-modal Clustering network (GECMC), which treats representation learning and multi-modal clustering as two sides of one coin rather than two separate problems. In brief, we specifically design a contrastive loss that leverages pseudo-labels to explore consistency across modalities. Thus, GECMC offers an effective way to maximize the similarities of intra-cluster representations while minimizing the similarities of inter-cluster representations at both inter- and intra-modality levels. As a result, clustering and representation learning interact and jointly evolve in a co-training framework. After that, we build a clustering layer parameterized with cluster centroids, showing that GECMC can learn the clustering labels with given samples and handle out-of-sample data. GECMC outperforms 14 competitive methods on four challenging datasets. Codes and datasets are available: https://github.com/xdweixia/GECMC.
… (KGs); however, there is still much multi-modal (textual, visual) … solution called multi-modal knowledge representation learning (… Instead of simply integrating multi-modal knowledge with …
Sentiment and sarcasm are closely and intricately related, as sarcasm often deliberately elicits an emotional response in order to achieve its specific purpose. Current challenges in multi-modal sentiment and sarcasm joint detection mainly include multi-modal representation fusion and the modeling of the intrinsic relationship between sentiment and sarcasm. To address these challenges, we propose a single-input stream self-adaptive representation learning model (SRLM) for sentiment and sarcasm joint recognition. Specifically, we divide the image into blocks to learn its serialized features and fuse them with textual features as input to the target model. Then, we introduce an adaptive representation learning network using a gated network approach for sarcasm and sentiment classification. In this framework, each task is equipped with its dedicated expert network responsible for learning task-specific information, while the shared expert knowledge is acquired and weighted through the gating network. Finally, comprehensive experiments conducted on two publicly available datasets, namely Memotion and MUStARD, demonstrate the effectiveness of the proposed model when compared to state-of-the-art baselines. The results reveal a notable improvement in the performance of sentiment and sarcasm tasks.
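The expert-plus-gate design described above can be sketched in a few lines. This is a generic gated mixture of one task-specific expert and several shared experts, assuming linear experts and a softmax gate over all of them; the abstract does not give SRLM's actual layer shapes or gating details, so every name and weight below is hypothetical.

```python
import math

def linear(x, w):
    """Apply a weight matrix (list of rows) to vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def gated_task_representation(x, task_expert_w, shared_expert_ws, gate_w):
    """Combine one task-specific expert with the shared experts.
    gate_w has one row per expert, so the gate emits one weight each
    (assumed design: the gate also weights the task-specific expert)."""
    expert_outs = [linear(x, task_expert_w)]
    expert_outs += [linear(x, w) for w in shared_expert_ws]
    gate = softmax(linear(x, gate_w))
    dim = len(expert_outs[0])
    fused = [sum(g * out[c] for g, out in zip(gate, expert_outs)) for c in range(dim)]
    return fused, gate

x = [1.0, 2.0]                                   # fused image-text feature (toy)
sentiment_expert = [[1.0, 0.0], [0.0, 1.0]]      # task-specific expert
shared_experts = [[[0.0, 1.0], [1.0, 0.0]]]      # one shared expert
gate_weights = [[0.0, 0.0], [0.0, 0.0]]          # untrained gate: uniform weights
fused, gate = gated_task_representation(x, sentiment_expert, shared_experts, gate_weights)
print(fused, gate)
```

In a two-task setup, the sarcasm branch would call the same function with its own task expert and gate while reusing shared_experts, so shared knowledge is weighted differently per task.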
Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how modality alignment affects downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.
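To make the starting point of this argument concrete, the sketch below implements the standard symmetric InfoNCE loss used in CLIP-style two-tower training. It is the baseline contrastive objective only, not any of the three regularizers the paper proposes; the function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def log_softmax_at(row, idx):
    """Log-probability of entry idx under a softmax over row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy, where the
    i-th image and i-th text form the positive pair."""
    a = [normalize(v) for v in img_emb]
    b = [normalize(v) for v in txt_emb]
    n = len(a)
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    loss_i2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    loss_t2i = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)

images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]      # each text matches its image
misaligned_texts = [[0.0, 1.0], [1.0, 0.0]]   # pairs swapped
print(symmetric_info_nce(images, aligned_texts))
print(symmetric_info_nce(images, misaligned_texts))
```

The loss is lowest when paired embeddings coincide and unpaired ones are dissimilar, which is the "exact match in the limit" behaviour the abstract argues is not necessarily optimal for downstream prediction.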
Multi-modal reinforcement learning (RL) has been brought into focus due to its ability to provide complementary information from different sensors, enriching observations of agents. However, the introduction of multi-modal high-dimensional observations brings challenges to sample efficiency. There is a lack of research on how to efficiently obtain multi-modal latent states while encouraging them to generate complementary information. To address this, we propose a representation learning method, Multi-modal Joint Predictive Representation (MJPR), which utilizes multi-modal interactive information to predict future latent states. The joint prediction method achieves the representation training for modalities and encourages each modality to generate complementary information related to the predictions of the others. In addition, we introduce multi-modal loss balancing to promote training equilibrium and cross-modal contrastive learning (CMCL) to align the modalities for effective modal interaction. We establish the multi-modal environments in the DeepMind Control Suite (DMC) and Webots and compare our method with current RL representation methods. Experimental results show that MJPR outperforms state-of-the-art methods by an average of 12.0% on six subtasks in DMC environments. It outperforms advanced methods by 16.7% and 55.4% in simple tasks and complex tasks of the Webots environment, respectively. Moreover, ablation experiments are conducted in the DMC environment to verify the importance of each module to MJPR.
… Collectively, these works reinforce the importance of multimodal learning for classification and fusion. Building on these approaches, our work explores joint representation learning …
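Through a structured review of the literature, this report divides the field of multi-source heterogeneous information fusion into seven dimensions. These range from foundational surveys to deep multimodal joint representation, and also cover knowledge graph and semantic integration, privacy-preserving integration, physical and industrial sensing diagnostics, remote sensing and spatial applications, and industry recommendation and decision systems. Together, these studies show an end-to-end development trajectory from low-level data processing and algorithmic modeling to vertical application scenarios, and they highlight the core research value of resolving heterogeneity, improving robustness, and achieving semantic interoperability.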