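Multi-Source Data Parsing and Operational Knowledge Graph Construction for Steam Systems in Iron and Steel Enterprises
Industrial multi-source heterogeneous data integration, cleaning, and feature engineering
This group of papers addresses methods for integrating heterogeneous data (sensor readings, PLC signals, text, and logs) in industrial settings such as steel plants, power grids, and mines. The work centers on data cleaning, dimension reduction, feature selection, cloud-edge collaborative processing, and data fusion based on probabilistic soft logic, providing a high-quality, standardized data foundation for knowledge graph construction.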
- Multisource heterogeneous data integration and real-time analysis technology for cigarette production digital workshop(Borun Chen, Bing Hu, Hao Hu, Minh-Duc Cao, R. Lu, 2025, No journal)
- Multi-Source Heterogeneous Data Fusion Analysis Platform for Thermal Power Plants(Jianqi Wang, J. Wen, Huihui Gao, Chenchen Kang, 2025, Journal of Architectural Research and Development)
- Dimension Reduction of Multi-Source Time Series Sensor Data for Industrial Process(Jiao Meng, Xin Huo, Changchun He, Chao Zhu, 2024, 2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE))
- Multimodal Time Series Data Fusion Based on SSAE and LSTM(Qiding Zhu, Shukui Zhang, Yang Zhang, C. Yu, Mengli Dang, Li Zhang, 2021, 2021 IEEE Wireless Communications and Networking Conference (WCNC))
- A Novel Fusion and Feature Selection Framework for Multisource Time-Series Data Based on Information Entropy(Xiuwei Chen, Li Lai, Maokang Luo, 2025, IEEE Transactions on Neural Networks and Learning Systems)
- Mining Heterogeneous Data for Formulation Design(Krati Saxena, Ashwini Patil, Sagar Sunkle, V. Kulkarni, 2020, 2020 International Conference on Data Mining Workshops (ICDMW))
- The Linkup Data Structure for Heterogeneous Data Integration Platform(Michal Chromiak, K. Stencel, 2012, No journal)
- Research on multi-source heterogeneous data fusion method of substation based on cloud edge collaboration and AI technology(Pei Sun, Bo Zhao, Xiang Li, 2025, Discover Applied Sciences)
- Design of Intelligent Generation System for Power Grid Fault Reports Based on Multi-source Heterogeneous Data Fusion(M. Hou, Baichi Ou, Xuanzhen Chen, Xianzheng Chen, Lijun Mo, Jiaxuan Tang, Ming Gao, Ziyi Li, Yuanfa Cen, 2025, 2025 5th International Conference on Electrical Engineering and Control Science (IC2ECS))
- Research on multi-source heterogeneous data fusion system based on device object model(Yunqi Chen, 2025, ITM Web of Conferences)
- Research on Multi-source Heterogeneous Data Cleaning Technology based on Integrating Neural Network with Fuzzy Rules for Renewable Energy Accommodation(Rongfu Sun, Yuhui Wu, Haibo Lan, Yulin Wang, Ran Ding, Jian Xu, S. Liao, Jia Hu, Yamin Sun, 2020, 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2))
- Automated Data Generation Method for Power System Online Security Assessment Based on Multi-Source Heterogeneous Information Fusion(Bing Wang, Zhihong Yu, Lulu Zhang, Yupei Jia, Haibo Jia, Yong Tang, 2025, 2025 10th Asia Conference on Power and Electrical Engineering (ACPEE))
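Industrial log parsing and basic data verification
These papers focus on automated parsing and verification of industrial software system logs and on methods for parsing basic data structures. Such techniques underpin the extraction of key events from unstructured operating records and supply the base-level facts of an operational knowledge graph.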
- Understanding Industrial Log Analysis: A Multi-Dataset Evaluation of Parsing and Anomaly Detection(Yicheng Sun, J. Keung, Yihan Liao, Hi Kuen Yu, 2025, 2025 32nd Asia-Pacific Software Engineering Conference (APSEC))
- Parsing and verification method of basic power grid data based on multi data source fusion(Zhibin Zhou, Zhiguo Zhou, Xiongfeng Ye, 2025, Int. J. Bus. Intell. Data Min.)
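Domain ontology modeling and semantic integration frameworks for knowledge graphs
These papers study how ontologies, large language models (LLMs), and multimodal embedding techniques can be used to semantically model industrial equipment, production processes (such as cold rolling and hydropower), and maintenance knowledge. They also cover semantic integration of IT and OT systems, which is central to a standardized representation of steel-plant steam system knowledge.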
- Automated Knowledge Graph Generation for Hydropower Plants using ISO 81346 and RAG-enhanced Large Language Models(Surya T. Kandukuri, Kristoffer Tangrand, L. Vognild, Grunde Olimstad, 2025, 2025 14th International Conference on Renewable Energy Research and Applications (ICRERA))
- A Proposal of a Knowledge Graph for Digital Engineering Systems Integration for Operation and Maintenance Activities in Industrial Plants(Elvismary Molina De Armas, Geiza Maria Hamazaki Da Silva, Yenier Torres Izquierdo, Melissa Lemos, Paulo Vinícius De Lima Britto, E. Corseuil, Robinson Luiz Souza Garcia, 2024, Proceedings of the 20th Brazilian Symposium on Information Systems)
- Knowledge Graph-Enhanced Control and Diagnosis in Smart Building Energy Systems(Hao Yin, 2025, 2025 3rd International Conference on Artificial Intelligence and Automation Control (AIAC))
- Semantic Integration of FMEA Knowledge into Manufacturing Control: A Runtime Exception Handling Framework(Alexander Verkhov, Andreas Lober, H. Baumgärtel, Lisa Ollinger, 2025, 2025 25th International Conference on Control, Automation and Systems (ICCAS))
- Multimodal Knowledge Graph Embedding With Missing Data Integration(Yuan Liang, 2025, IEEE Transactions on Computational Social Systems)
- Semantic Framework for IT-OT Integration in Industrial Environments(Anand Todkar, Mrinmoy Sarkar, Jitendra Solanki, 2025, Annual Conference of the PHM Society)
- Modelling of Resources and Activity of the Scrap Iron and Steel Reverse Supply Chain Service Based on Ontology(Yingqing Xiong, Yi Liu, 2021, Journal of Physics: Conference Series)
- A Unified Ontology for Scalable Knowledge Graph-Driven Operational Data Analytics in High-Performance Computing Systems(Junaid Ahmed Khan, Andrea Bartolini, 2025, ArXiv)
- SCRO: A Domain Ontology for Describing Steel Cold Rolling Processes towards Industry 4.0(Sadeer Beden, Qiushi Cao, A. Beckmann, 2021, Inf.)
- Semantic Modeling and Rule-Based Evaluation of Building Chiller Plant Operation Using Brick Ontology(Xiaoyu Jia, Yiqun Pan, Rongxin Yin, 2025, Proceedings of the 12th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation)
- Research on the Construction of a Knowledge Graph Information Search Engine in the web Domain(Yuzhong Zhou, Zhèng-Hóng Lin, Jiahao Shi, Yuliang Yang, Rong Wei, 2023, 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT))
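Anomaly detection and operating state monitoring for multivariate time series data
This group emphasizes real-time monitoring of steam systems and related industrial facilities. Deep learning models such as variational autoencoders (VAEs), Transformers, and graph neural networks are applied to multivariate time series for anomaly detection, drift detection, and thermal fault diagnosis in support of safe system operation.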
- UDDT: An Unsupervised Drift Detection Method for Industrial Time Series Data(Deepti Maduskar, Divyasheel Sharma, Chandrika Kr, Reuben Borrison, Gianluca Manca, Marcel Dix, 2023, 2023 IEEE 2nd Industrial Electronics Society Annual On-Line Conference (ONCON))
- Unsupervised Deep Anomaly Detection for Industrial Multivariate Time Series Data(Wenqiang Liu, Li Yan, Ningning Ma, Gaozhou Wang, Xiaolong Ma, Peishun Liu, Ruichun Tang, 2024, Applied Sciences)
- Anomaly Detection Method for Industrial Control System Operation Data Based on Time–Frequency Fusion Feature Attention Encoding(Jiayi Liu, Yun Sha, WenChang Zhang, Yong Yan, Xuejun Liu, 2024, Sensors (Basel, Switzerland))
- Anomaly detection in multivariate time series data using deep ensemble models(Amjad Iqbal, Rashid Amin, Faisal S. Alsubaei, Abdulrahman Alzahrani, 2024, PLOS ONE)
- Active Fault Diagnosis for Gas-Steam Combined Cycle Power Plant Considering Variable Working Conditions(Yang Han, Demin Xu, Fan Zhou, Ying Liu, Jun Zhao, Wei Wang, 2024, 2024 14th Asian Control Conference (ASCC))
- An Anomaly Detection Method for Multivariate Time Series Data Based on Variational Autoencoders and Association Discrepancy(Haodong Wang, Huaxiong Zhang, 2025, Mathematics)
- Research on Thermal Fault Diagnosis Technology of Generator Stator Winding Insulation Based on Graph Semi-supervised Method(Cong Chen, Jun Xu, Zhiqiang Wang, Lei Hu, Xiaojian Wang, Yucai Wu, 2024, 2024 IEEE 7th International Electrical and Energy Conference (CIEEC))
- TinyAD: Memory-Efficient Anomaly Detection for Time-Series Data in Industrial IoT(Yuting Sun, Tong Chen, Q. Nguyen, Hongzhi Yin, 2023, IEEE Transactions on Industrial Informatics)
- MSAnomaly: Time Series Anomaly Detection with Multi-scale Augmentation and Fusion(Tao Yin, Zhibin Zhang, Shikang Hou, Huan Zhao, Lijiao Zheng, Yifei Zhou, Jin Xie, Meng Yan, 2024, No journal)
- TSMixAD: A Time-Series Anomaly Detection Framework for Industrial Control Systems Incorporating Time-Frequency Domain Data Augmentation Techniques(Yunkai Song, Huihui Huang, Qiang Wei, Lu Liu, Zihan Wei, 2025, Proceedings of the 2025 6th International Conference on Computer Information and Big Data Applications)
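Temporal-semantic association reasoning and dynamic logic checking
These papers explore the deep integration of time-series features with knowledge graphs, including semantic-aware event link reasoning and Markov property checking for time series data. The aim is to uncover implicit logical relationships in industrial operations and to support dynamic reasoning and intelligent monitoring of steam systems.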
- Semantic-aware event link reasoning over industrial knowledge graph embedding time series data(Bin Zhou, Xingwang Shen, Yuqian Lu, Xinyu Li, B. Hua, Tianyuan Liu, Jinsong Bao, 2022, International Journal of Production Research)
- Markov property checking methods for time series data(Xinran Chen, Huan Fang, 2025, No journal)
- Intelligent Monitoring of Hydropower Signals Based on Knowledge Graph(Wei Luo, Zheng Zhang, Ze He, 2023, 2023 6th International Conference on Data Storage and Data Engineering (DSDE))
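Parameter identification and multi-energy coupled scheduling optimization for steel energy systems
These papers directly address the physical characteristics of energy systems in iron and steel enterprises, covering parameter identification, multi-period modeling, and gas-steam-electricity coupled scheduling optimization. They illustrate the end-use value of knowledge graphs in decision support, energy saving and consumption reduction, and operational optimization.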
- Parameter Identification for Iron and Steel Enterprise Steam Network Model(Yong Ma, Yanguang Sun, Yujiao Zeng, 2013, J. Networks)
- Data-driven multi-period modeling and optimization for the industrial steam system of large-scale refineries(Tiantian Xu, Tianyue Li, J. Long, Liang Zhao, Wenli Du, 2023, Chemical Engineering Science)
- Gas–steam–electricity coupling optimal scheduling of iron and steel enterprises considering equipment maintenance plan(Zhengbiao Hu, Tingting Lu, Dongfeng He, Lili Jiang, Zonghua Li, 2025, Ironmaking & Steelmaking: Processes, Products and Applications)
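Taken together, these papers lay out a full-stack technical path from low-level parsing of multi-source heterogeneous data, through mid-level domain knowledge modeling, to high-level intelligent applications. The research covers cleaning and integration of industrial logs and sensor data, ontology-based knowledge graph construction, deep learning anomaly detection for time series data, and dynamic reasoning that incorporates semantic associations. The end target is multi-energy coupled scheduling and operational optimization of steam systems in iron and steel enterprises, providing a systematic theoretical framework and algorithmic support for the digital transformation and intelligent operation of steel industry energy systems.
41 related papers in total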
This paper introduces the Steel Cold Rolling Ontology (SCRO) to model and capture domain knowledge of cold rolling processes and activities within a steel plant. A case study is set up that uses real-world cold rolling data sets to validate the performance and functionality of SCRO. This includes using the Ontop framework to deploy virtual knowledge graphs for data access, data integration, data querying, and condition-based maintenance purposes. SCRO is evaluated using OOPS!, the ontology pitfall detection system, and feedback from domain experts from Tata Steel.
Formulated products such as cosmetics, personal care, pharmaceutical products and industrial products such as paints and coatings are a multi-billion dollar industry. Experts carry out designing of new formulations in most of these industries based on their knowledge and basic search from online and offline resources. Reference data for formulation design comes in several formats and from multiple sources with diverse representation. We present an approach to mine the heterogeneous data for formulation design with case studies of cosmetics and steel coating industries. Our contribution is threefold. First, we show data extraction and mining techniques from multi-source and multi-modal text data. Second, we describe how we store and retrieve the data in graph databases. Lastly, we demonstrate the use of extracted and stored data for a simple recommendation system based on data search techniques that aid the experts for the synthesis of new formulation design.
With the advent of the big data era, industrial process data has become increasingly large-scale and multi-source, and it is difficult to analyze such high-dimensional data directly. To solve this problem, dimension reduction of high-dimensional data is necessary, which saves substantial resources in subsequent data processing. This paper proposes a dimension reduction method that reduces the scale of time series and analyzes the correlation between multi-source sensors. For time series with large time scales, an adaptive largest-triangle-three-buckets method is proposed, which adaptively selects the optimal bucket number according to the similarity between the downsampled data and the original data of the multi-source data. Further, robust principal component analysis is used to decompose the high-dimensional data into a low-rank matrix and a sparse matrix. The low-rank matrix represents the low-dimensional principal component of the multi-source data, and correlation analysis reduces its dimension further. Experiments on an industrial excavator dataset verify the effectiveness and superiority of the method.
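The downsampling step mentioned in this abstract can be illustrated with a minimal Python sketch of the standard largest-triangle-three-buckets (LTTB) algorithm with a fixed bucket count. The paper's adaptive choice of bucket number and its robust PCA stage are not reproduced here, and the function name `lttb` and the `Point` alias are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp, value); hypothetical alias


def lttb(points: List[Point], n_out: int) -> List[Point]:
    """Downsample a time-ordered series to n_out points with standard LTTB."""
    n = len(points)
    if n_out < 3 or n_out >= n:
        return list(points)  # nothing to reduce

    bucket = (n - 2) / (n_out - 2)  # first and last points are always kept
    sampled = [points[0]]
    a = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        start = int(i * bucket) + 1
        end = int((i + 1) * bucket) + 1
        # Average of the next bucket; for the last bucket this slice
        # ends at the final point of the series.
        nxt = points[end:min(int((i + 2) * bucket) + 1, n)]
        avg_t = sum(p[0] for p in nxt) / len(nxt)
        avg_v = sum(p[1] for p in nxt) / len(nxt)

        # Keep the point that forms the largest triangle with the previous
        # selection and the next-bucket average.
        best, best_area = start, -1.0
        for j in range(start, end):
            area = abs(
                (points[a][0] - avg_t) * (points[j][1] - points[a][1])
                - (points[a][0] - points[j][0]) * (avg_v - points[a][1])
            ) / 2.0
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best

    sampled.append(points[-1])
    return sampled
```

An adaptive variant in the spirit of the abstract would presumably call such a routine for several candidate values of `n_out` and keep the one whose output stays most similar to the original series; the similarity measure used in the paper is not stated in the abstract.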
No abstract available
Log analysis plays a critical role in monitoring and maintaining the safety of industrial software systems. However, most existing research relies heavily on benchmark datasets derived from legacy or open-source systems, which fail to capture the structural diversity and operational complexity of real-world industrial logs. In this study, we present a comprehensive empirical evaluation of log parsing and anomaly detection models across four diverse datasets, including three collected from large-scale industrial software deployed in manufacturing, process control, and energy monitoring environments. Our analysis reveals that state-of-the-art models, particularly rule-based parsers and supervised detectors, experience substantial performance degradation when applied to industrial settings. To address this gap, we introduce a unified evaluation framework using representative training subsets, and we highlight the effectiveness of semi-supervised and LLM-based approaches in handling heterogeneous, low-resource log environments. The findings offer practical insights into the limitations of current log analysis techniques and suggest design principles for building more robust, domain-adaptive solutions for industrial software risk mitigation.
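To make concrete what "log parsing" means here, the following Python sketch reduces raw log lines to templates by masking variable fields with regular expressions. It is a deliberately simple rule-based parser of the kind the study reports as brittle on industrial logs, not a method from the paper. The log format, the mask tokens, and the names `to_template` and `template_counts` are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical masking rules, applied in order: timestamps, hex codes, numbers.
_MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<TS>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable fields in one log line with placeholder tokens."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line.strip()


def template_counts(lines: Iterable[str]) -> Counter:
    """Count how often each template occurs in a batch of log lines."""
    return Counter(to_template(line) for line in lines)
```

Each distinct template can then serve as an event type for downstream anomaly detection or as a candidate fact for a knowledge graph. Fixed rules like these break when the log format drifts, which is consistent with the degradation the abstract describes.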
Aiming at the problems in power grid fault report generation under the “dual carbon” background, such as insufficient fusion of multi-source heterogeneous data, frequent model hallucinations, and heavy reliance on manual work, this paper proposes an intelligent generation system for power grid fault reports based on multimodal fusion and domain knowledge enhancement. The system achieves a key feature extraction error of $< 1\,\mathrm{ms}$ for fault record graphs by constructing a unified representation model for multi-source data and an “OCR+CNN” dual-channel image parsing algorithm; suppresses model hallucinations and ensures the logical consistency of reports by relying on a domain knowledge-enhanced RAG-LLM collaborative framework; and completes the full-process automation from fault data collection and information retrieval to report generation by combining a multi-level retrieval strategy and standardized templates. Experimental verification shows that the system can improve the fault report generation efficiency to the minute level, outperforming traditional manual methods in data fusion accuracy, image recognition accuracy, and language logic. It successfully realizes the transformation of fault reports from “manual experience” to “intelligent generation”, providing highly reliable intelligent support for power grid dispatching and operation inspection.
No abstract available
No abstract available
The effective fusion of data from multi-source heterogeneous automation subsystems in coal mines is a current industry problem. Issues such as inconsistent equipment models and inconvenient collection and configuration need to be solved urgently. This work analyzes the main electromechanical equipment and attribute information of the mine, establishes a unified equipment object model, and develops equipment sensor monitoring data configuration systems for multiple protocols such as PLC, OPC, and TCP. The system supports online visual configuration of equipment object attribute information, autonomous selection of collection protocols, and adaptive configuration of distributed parsing services. According to the on-site application table, the system has improved the efficiency of data fusion and collection, reduced the difficulty of operation and maintenance, and accelerated the progress of intelligent mine construction, and therefore has reference value.
With the accelerating intelligent transformation of energy systems, the monitoring of equipment operation status and the optimization of production processes in thermal power plants face the challenge of multi-source heterogeneous data integration. In view of the heterogeneous characteristics of physical sensor data (including temperature, vibration, and pressure) generated by boilers, steam turbines, and other key equipment, and of real-time working condition data from the SCADA system, this paper proposes a multi-source heterogeneous data fusion and analysis platform for thermal power plants based on edge computing and deep learning. By constructing a multi-level fusion architecture, the platform adopts a dynamic weight allocation strategy and a 5D digital twin model to realize the collaborative analysis of physical sensor data, simulation calculation results, and expert knowledge. The data fusion module combines the Kalman filter, wavelet transform, and Bayesian estimation to solve the problems of time series alignment and dimension difference. Simulation results show that the data fusion accuracy can be improved to more than 98%, and the calculation delay can be controlled within 500 ms. The data analysis module integrates a Dymola simulation model and the AERMOD pollutant diffusion model and supports the cascade analysis of boiler combustion efficiency prediction and flue gas emission monitoring; the system response time is less than 2 seconds, and the data consistency verification accuracy reaches 99.5%.
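The abstract names the Kalman filter as one ingredient of the fusion module without giving its formulation. As a hedged illustration only, the Python sketch below shows a textbook scalar Kalman filter under a random-walk state model, which smooths a single noisy sensor channel. The function name `kalman_1d`, its parameters, and the random-walk assumption are not taken from the paper.

```python
from typing import List, Sequence


def kalman_1d(
    measurements: Sequence[float],
    meas_var: float,
    process_var: float,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[float]:
    """Textbook scalar Kalman filter with a random-walk state model."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows by process noise
        k = p / (p + meas_var)   # Kalman gain: trust in the new measurement
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1.0 - k) * p        # update the variance
        estimates.append(x)
    return estimates
```

A larger `meas_var` makes the filter trust new readings less and smooth more; a larger `process_var` makes it track changes faster. Fusing several sensors, as the platform does, would require a multivariate formulation that the abstract does not detail.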
In the iron and steel production process, significant fluctuations in the consumption and production of energy media occur during equipment maintenance, causing difficulties in equipment operation. It is therefore necessary to reconcile the conflict between global scheduling optimisation and the need for stable equipment operation during maintenance. This study integrates heuristic rules into the model with the objective of minimising both system energy operation costs (EOCs) and equipment fluctuation expenses. A multi-period optimal scheduling model for the energy system in steel enterprises was developed herein. Results demonstrate that, under standard production conditions, optimising EOCs (Scheme A) and multi-objective optimisation (Scheme B) both fully utilise the storage function of the gasholder. After optimisation, EOCs are reduced by 9.07% and 8.67%, respectively, with only marginal changes in equipment fuel fluctuation costs. Furthermore, during the maintenance of blast furnace No. 1, although Scheme B results in a 0.26% higher EOC than Scheme A, equipment fuel fluctuation costs are reduced by 83.3%, significantly enhancing the equipment operational stability. This not only optimises energy utilisation, but also mitigates operational challenges, aligning more closely with scheduling requirements. Thus, under abnormal operating conditions, Scheme B leads to notable improvements in both system EOCs and equipment fuel fluctuations. Further analysis suggests that iron and steel enterprises can strategically schedule equipment maintenance during periods of high electricity prices to effectively reduce overall system energy costs.
Scrap steel reverse supply chain service system is a typical complex system. The realization of the service activities involves a large number of service resources. And there are multiple relationships between atomic service activities and service resources. To realize the search matching between complex service resources and service activity demand in scrap steel reverse supply chain service, the modelling method of service resource information in scrap steel reverse supply chain based on ontology is proposed. After analysing the classification method of service resource, the ontology information model of service resources and service activities is constructed. Then the search matching between service demander and service resource provider is transformed into the mapping between service resource ontology and service activity ontology to solve. Finally, the ontology model of specific instance is established, and the Semantic Web ontology language OWL is used. The model is proved to be correct and feasible by describing the instance service resource ontology and service activity ontology.
No abstract available
No abstract available
This study proposes a comprehensive framework for multi-source heterogeneous data integration and real-time analysis in cigarette production digital workshops, addressing critical challenges in dynamic industrial environments. By combining edge-cloud collaboration, adaptive semantic reconciliation, and lightweight stream processing, the framework achieves efficient fusion of heterogeneous data from IoT sensors, PLC systems, and quality inspection units. A hybrid modeling approach integrating Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) networks captures spatiotemporal dependencies in production data, reducing defect prediction errors by 12.8% compared to conventional methods. Experimental results demonstrate 94.7% cross-source anomaly detection accuracy with sub-200 ms latency, alongside tangible operational improvements: 63% reduction in unplanned downtime, 15.2% energy savings in drying processes, and 2.1 tons/month reduction in material waste. The system exhibits scalability across product categories, enhancing parameter optimization cycles by 22% for traditional cigarettes and batch consistency by 18% for novel tobacco products. While demonstrating efficacy in real-world production scenarios, limitations in ultra-high-frequency data processing highlight directions for future research, including quantum-enhanced algorithms and 5G-Advanced-enabled digital twins. This work establishes a foundational architecture for intelligent manufacturing in process industries, bridging the gap between real-time analytics and autonomous decision-making.
As a critical part of the power system, the substation hosts a substantial number of intelligent devices. The data generated by these devices exhibit exponential growth, and the data types are diverse and complex, encompassing both structured and unstructured data. The integration of multi-source heterogeneous data is of significant importance for enhancing the operational efficiency, fault early warning capability, and intelligence level of substations. However, the integration of multi-source heterogeneous data in substations has consistently presented a challenge for AI technology intervention, and the effective integration of data from different platforms remains a significant challenge at present. Consequently, this paper proposes a multi-source heterogeneous data fusion method for substations based on cloud-edge collaboration and AI technology. This method is based on artificial intelligence cloud services and constructs a substation cloud-edge collaborative network based on the AI Cloud architecture. It utilizes the 5G PaaS platform to establish substation cloud PaaS and edge PaaS, respectively, and employs a dynamic task scheduling strategy for substation cloud-edge collaboration to achieve the collaboration of multi-source heterogeneous data in the substation cloud and edge. A heterogeneous data resource pool is established, and the data fusion module uses the dynamic Bayes network model in AI technology to achieve the fusion of multi-source heterogeneous data in the substation. The experimental results demonstrate that the method proposed in this paper can more effectively control the energy consumption of edge nodes, exhibits high efficiency in data fusion, and can effectively integrate and display multi-source heterogeneous data. All information gain values exceed 0.96, with the integrity value being the highest, approaching 100%. The method demonstrates a strong capability in fusing multi-source heterogeneous data of substations.
Build a substation cloud-edge collaborative network based on the AI Cloud architecture. Establishing cloud PaaS and edge PaaS separately is the key step in realizing substation data fusion. A cooperative dynamic task scheduling strategy is adopted to adjust the execution order of tasks.
In real-world network scenarios, modal absence may be caused by various factors, such as sensor damage, data corruption, and human errors in recording. Effectively integrating multimodal missing data still poses significant challenges. Different combinations of missing modes can form feature sets of inconsistent dimensions and quantities. Additionally, effectively merging multimodal data requires a thorough understanding of specific modal information and intermodal interactions. The abundance of missing data can significantly reduce the sample set size, leading to learning interaction features from only a few samples. Moreover, there is a lack of clear correspondence between heterogeneous data from different sources. To address these issues, we focus our research on multimodal knowledge graph scenarios with different types of structures and content and develop a new knowledge graph embedding method. First, we use three embedding components to automatically extract feature vector representations of items from the structural content, textual content, and visual content of the knowledge graph. Then, we divide the dataset into several modal groups and model these modal groups using a multilayer network structure, with each multilayer network corresponding to a specific multimodal combination. Subsequently, we construct corresponding multilayer network projection layers and propose a two-stage GAT-based transfer learning framework for the projection layers, in which the extracted incomplete multimodal information and intermodal interaction information are integrated and mapped to a low-dimensional space. Finally, we not only theoretically prove the feasibility of the proposed method but also validate its effectiveness through extensive comparative experiments on multiple datasets.
In order to avoid conservative results, when calculating the ability of power system to accommodate renewable energy, social data need to be integrated. Data cleaning is an important step in integration. Traditional data cleaning methods rely only on data and ignore the implicit rules. This paper first analyzes the data requirements for renewable energy accommodation, and then proposes a data cleaning technology. This technology digs out the implicit associations between data, converts them into fuzzy rules in the form of Probabilistic soft logic (PSL), and builds a neural network based on this for data cleaning.
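In probabilistic soft logic, truth values lie in [0, 1] and rules are relaxed with Łukasiewicz operators, so a rule's "distance to satisfaction" measures how strongly the data violate it. The Python sketch below shows only that standard scoring step. The neural network the paper builds on top of the rules is not reproduced, and the function names are assumptions.

```python
from typing import Sequence


def luk_and(truths: Sequence[float]) -> float:
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """Violation of the soft rule 'body_1 AND ... AND body_n -> head'.

    Zero means the rule is satisfied; larger values mean stronger violation.
    """
    return max(0.0, luk_and(body) - head)
```

For example, if a hypothetical rule says that two strongly correlated records (truth 0.9) with a valid first record (truth 0.8) imply a valid second record, and the second record is currently scored 0.2, the violation is about 0.5. A cleaning procedure could use such scores to flag or down-weight suspicious records.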
No abstract available
Modern high-performance computing (HPC) systems generate massive volumes of heterogeneous telemetry data from millions of sensors monitoring compute, memory, power, cooling, and storage subsystems. As HPC infrastructures scale to support increasingly complex workloads-including generative AI-the need for efficient, reliable, and interoperable telemetry analysis becomes critical. Operational Data Analytics (ODA) has emerged to address these demands; however, the reliance on schema-less storage solutions limits data accessibility and semantic integration. Ontologies and knowledge graphs (KG) provide an effective way to enable efficient and expressive data querying by capturing domain semantics, but they face challenges such as significant storage overhead and the limited applicability of existing ontologies, which are often tailored to specific HPC systems only. In this paper, we present the first unified ontology for ODA in HPC systems, designed to enable semantic interoperability across heterogeneous data centers. Our ontology models telemetry data from the two largest publicly available ODA datasets-M100 (Cineca, Italy) and F-DATA (Fugaku, Japan)-within a single data model. The ontology is validated through 36 competency questions reflecting real-world stakeholder requirements, and we introduce modeling optimizations that reduce knowledge graph (KG) storage overhead by up to 38.84% compared to a previous approach, with an additional 26.82% reduction depending on the desired deployment configuration. This work paves the way for scalable ODA KGs and supports not only analysis within individual systems, but also cross-system analysis across heterogeneous HPC systems.
Hydroelectric power plants present unique digitalization challenges as distributed, aging facilities with decades-old equipment configurations that vary significantly across installations. Conventional data analytics deployment requires custom adaptations to each facility’s unique equipment setup, creating scalability barriers for fleet-wide condition monitoring and asset management even if the facilities are owned by a single utility company. Equipment modifications, replacements, and modernization over operational lifespans further complicate data standardization, as as-planned, as-built, and as-is configurations diverge substantially. This paper presents an automated methodology for constructing and maintaining knowledge graphs based on ISO 81346-10:2022 Reference Designation System specifications using Large Language Models to address the data challenges. The approach enables rapid generation of machine-readable facility representations that capture hierarchical equipment relationships and semantic metadata essential for scalable industrial analytics. A case study using operational Norwegian hydropower data demonstrates $> 95\%$ accuracy in automated knowledge graph construction while ensuring $100\%$ RDS compliance. The methodology provides a foundation for fleet-wide digital standardization, enabling consistent data integration and analytics deployment across heterogeneous hydropower installations.
Semantic Modeling and Rule-Based Evaluation of Building Chiller Plant Operation Using Brick Ontology
The operation of building chiller plants is often inefficient due to issues such as low temperature differences and high water flow rates, which are difficult to detect and diagnose in practice. To address this challenge, this study develops a semantic framework for evaluating chiller plant operation using the Brick ontology. First, the equipment and sensor points of a chiller plant are modeled as a semantic graph that defines classes, attributes, and relationships between building energy systems. Then, expert knowledge and operational experience are encoded into semantic rules, which enable automated reasoning over the knowledge graph. These rules diagnose inefficiencies such as small temperature differences by sequentially analyzing relative heat-to-flow ratios, chilled water supply and return temperatures, and terminal setpoints. The semantic model is linked to the building's operational database through unique identifiers (UUIDs), enabling real-time query execution and inference. A case study is conducted on an educational building located in Shanghai, China, where the proposed approach is applied to the actual chiller plant system. The results demonstrate the feasibility of the method in identifying and interpreting inefficient operational conditions. Moreover, the ontology-based framework not only provides a standardized representation of equipment and sensors but also enhances the interpretability and reusability of diagnostic rules across different systems. This work highlights the potential of combining Brick ontology and expert rules for semantic-driven performance evaluation of building chiller plants.
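The paper encodes its diagnostic rules over a Brick semantic graph. The sketch below does not use Brick or any ontology tooling; it only illustrates, in plain Python, the kind of low temperature-difference check such a rule might express. The class `ChillerSample`, the function `low_delta_t`, and both threshold values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChillerSample:
    """One reading of chilled water supply and return temperature (deg C)."""
    supply_temp_c: float
    return_temp_c: float


def low_delta_t(
    samples: Sequence[ChillerSample],
    min_delta_c: float = 5.0,   # assumed threshold for an acceptable delta-T
    min_share: float = 0.5,     # assumed share of readings that must violate it
) -> bool:
    """Flag low delta-T when enough readings fall below the threshold."""
    if not samples:
        return False  # no data, no diagnosis
    low = sum(
        1 for s in samples
        if (s.return_temp_c - s.supply_temp_c) < min_delta_c
    )
    return low / len(samples) >= min_share
```

In the semantic framework described in the abstract, the supply and return temperature points would be located through the graph's equipment-to-sensor relationships rather than passed in by hand, which is what lets the same rule be reused across plants.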
The diagnosis of early stator insulation thermal faults is crucial for the safe operation of steam turbine generators. Traditionally, the water outlet temperature of the cooling water system is monitored using a threshold, but this method has low accuracy and cannot effectively detect early anomalies. To achieve more accurate and timely insulation thermal fault detection, utilizing massive operational data through machine learning is an effective approach, but there are problems such as the vague physical meaning of dimensionality reduction features and the high cost of sample preparation. Therefore, based on extracting the trend and temperature difference features of the water outlet, this paper adopts a graph semi-supervised method for fault diagnosis, which is conducive to intuitive understanding of the diagnosis results and reduces the dependence on the number of samples. Finally, experiments were conducted on the operational data of a steam turbine generator with model QFSN-660-2-22. The results show that the method has a diagnosis accuracy of over 98% for early stator insulation thermal faults, and is superior to traditional methods such as SVM and semi-supervised SVM under the same conditions.
Context: In recent years, Knowledge Graphs (KGs) have been increasingly used as a tool for representing knowledge, integrating data, and querying data. Problem: Many distinct yet only partially integrated information management systems are used to support the life-cycle of Oil and Gas industrial plants. Our approach considers a 3D plant viewer system, a visual navigation system on platforms, and an integrated intelligent search system. However, these systems lack a semantic integration that can guide user actions across each functionality for a single asset. Solution: This paper presents the use of KGs to represent, and help monitor and control, operational and maintenance activities within an Oil and Gas industrial environment. Our approach highlights the challenges and initial work required to establish a fully integrated management domain, where the execution of these activities can be managed easily. IS Theory: This study draws inspiration from Representation Theory, which posits that an information system faithfully mirrors specific phenomena occurring in the physical world. Method: To develop this work, it was necessary to review the literature related to the development of KGs and ontologies. The KG was developed using well-established standards such as the Industrial Data Ontology (IDO) and the Capital Facilities Information Handover Specification (CFIHOS), complemented by other ontologies. Summary of Results: A prototype of the conceptual KG was implemented, verifying the viability of our approach for data integration. Contributions and Impact in IS area: The resulting graph contains the main terms, in compliance with international semantic standards, for representing operational and maintenance activity data associated with facilities involved in Oil and Gas production. Finally, the KG resulting from this effort can be further extended through the incorporation of new tools and subdomains in the industrial plant life-cycle.
This paper presents an ontology that incorporates the Failure Mode and Effect Analysis (FMEA) including its extension for Monitoring and System Response (FMEA-MSR) based on the AIAG/VDA 2019 standard. Building on this foundation, the work introduces PFMEA-MSR, a novel semantic extension of Process Failure Mode and Effects Analysis (PFMEA) that enables the runtime integration of failure knowledge in automated production systems. Traditional FMEA approaches are limited to static risk assessments during the design and planning phases. PFMEA-MSR allows access to formalized fault handling strategies during operation and supports automated adaptive system responses in Industry 4.0 environments. A semantic knowledge graph is developed to represent the relationships between failure modes, system functions, risk metrics, and corrective actions. This graph underpins a control architecture in which programmable logic controllers (PLCs) interact with a skill orchestrator via semantic queries to perform context-sensitive exception handling. Moreover, this research introduces an inference mechanism for dynamic risk assessment that is evaluated using an industry-related demonstration system. By establishing a seamless and automated information flow between failure documentation and operational control, the PFMEA-MSR framework enables scalable, adaptive manufacturing processes while enhancing traceability, risk transparency, and runtime resilience.
This paper presents a semantic framework to bridge the IT-OT integration gap in industrial environments. The proposed solution addresses fundamental challenges of PHM (prognostics and health management) by providing contextualized semantic information from the shop floor to enterprise IT systems. Built upon an OPC UA (Open Platform Communications Unified Architecture) aggregation server architecture, the framework leverages OPC UA Information Models and companion specifications as its foundation for semantic representation. By transforming these models into knowledge graphs stored in RDF format, the system enables sophisticated semantic information retrieval through SPARQL-based semantic queries that can traverse complex relationships between equipment, processes, and operational parameters. The framework further implements GraphQL to automatically generate a Type schema derived from OPC UA types, creating a unified query interface that facilitates IT-like interaction with industrial data. This semantic approach significantly improves fault diagnostics, predictive maintenance, and anomaly detection by preserving contextual relationships that are often lost in traditional data integration methods. Furthermore, the GraphQL schema provides a structured foundation for generative AI applications to formulate contextually appropriate queries, extract relevant maintenance insights, and generate human-interpretable explanations of equipment health patterns, all while maintaining semantic fidelity across the IT-OT boundary. The vertical integration capability ensures that domain-specific models remain coherent across organizational levels such as line, area, and floor, enabling PHM practitioners to implement more effective condition-based maintenance strategies with improved visibility into causal factors affecting equipment reliability and performance.
This paper presents a knowledge graph-driven framework for intelligent control and fault diagnosis across multi-domain energy systems in smart buildings. The framework semantically integrates heterogeneous data from electrical, HVAC, and water supply subsystems using an ontology-based knowledge graph, enabling unified modeling, real-time reasoning, and inter-system coordination. A multi-agent reinforcement learning module is developed to dynamically optimize energy control strategies, while a fault diagnosis engine combines graph neural networks and Bayesian inference for accurate and explainable anomaly detection. Experimental evaluation was conducted on 12-month operational data from three representative buildings, encompassing over 220 million sensor records. Results show that the proposed framework achieves an average energy savings of 12.5%, fault diagnosis accuracy of 93.1%, and end-to-end reasoning latency below 0.5 seconds. The framework demonstrates strong interpretability, scalability, and adaptability under diverse building scenarios. This work contributes a generalizable, semantic-driven architecture for future building energy management and intelligent operation.
The article proposes a construction approach for a knowledge graph data search engine in the field of power grids. A detailed analysis is conducted on the construction, design, and operational implementation of the knowledge graph. A search engine system based on fuzzy algorithms and a search engine system integrating domain feature knowledge graphs are designed, which can recognize entities/assertions based on the TextRank model, accurately search for external knowledge based on semantic methods, and set a global KG update strategy. The algorithm can achieve high entity/assertion recognition accuracy, automatically map domain knowledge to the local KG, quickly update the service knowledge base online, and produce high-accuracy answers with low response delay. According to testing results from the information and communication branch of the power company, the accuracy and recall of the keyword search algorithm in this method are 6.62%, 7.96%, and 2.24% higher, respectively. It can effectively overcome the ambiguity of power grid problem expression and accurately search for power grid knowledge.
Hydropower station monitoring systems collect a large number of system running signals, and monitoring these signals intelligently and processing them in a timely manner is critical to the safe and stable operation of hydropower stations. However, efficient intelligent monitoring is quite challenging to achieve, because of problems such as the variety of device types, complicated signal information, and the difficulty of extracting latent knowledge. In this paper, we propose a novel intelligent monitoring framework for hydropower signals based on a knowledge graph. Specifically, we first collect a large amount of unstructured operational log text data from a real-world hydropower monitoring system, and then propose a hydropower signal knowledge graph (HSKG) construction method that combines semantic parsing technology with expertise in the field of hydropower operation. We further propose a BERT-BiGRU-CRF model for automatic entity extraction from the collected data. Finally, we develop an intelligent signal analysis model based on the constructed HSKG. The experimental results on real-world datasets demonstrate the effectiveness and efficiency of our method for intelligent hydropower signal monitoring.
Information technology growth brings vast time-series data. Despite richness, challenges like redundancy emphasize the need for time-series data fusion research. Rough set theory, a valuable tool for dealing with uncertainty, can identify features and reduce dimensionality, enhancing time-series data fusion. The contribution of the study lies in establishing a fusion and feature selection framework for multisource time-series data. This framework selects optimal information sources by minimizing entropy. In addition, the fusion process integrates a feature selection algorithm to eliminate redundant features, preventing a sequential increase in entropy. Crucial experiments on abundant datasets demonstrate that the proposed approach outperforms several state-of-the-art algorithms in terms of enhancing the accuracy of common classifiers. This research significantly advances the field of time-series data fusion in rough set theory, offering improved accuracy and efficiency in data processing and analysis.
With the rapid development of deep learning, researchers are actively exploring its applications in the field of industrial anomaly detection. Deep learning methods differ significantly from traditional mathematical modeling approaches, eliminating the need for intricate mathematical derivations and offering greater flexibility. Deep learning technologies have demonstrated outstanding performance in anomaly detection problems and gained widespread recognition. However, when dealing with multivariate data anomaly detection problems, deep learning faces challenges such as large-scale data annotation and handling relationships between complex data variables. To address these challenges, this study proposes an innovative and lightweight deep learning model—the Attention-Based Deep Convolutional Autoencoding Prediction Network (AT-DCAEP). The model consists of a characterization network based on convolutional autoencoders and a prediction network based on attention mechanisms. The AT-DCAEP exhibits excellent performance in multivariate time series data anomaly detection without the need for pre-labeling large-scale datasets, making it an efficient unsupervised anomaly detection method. We extensively tested the performance of AT-DCAEP on six publicly available datasets, and the results show that compared to current state-of-the-art methods, AT-DCAEP demonstrates superior performance, achieving the optimal balance between anomaly detection performance and computational cost.
The time series data in the manufacturing process reflect the sequential state of the manufacturing system, and fusing temporal features into the industrial knowledge graph can significantly improve the efficiency of knowledge processing in the manufacturing system. This paper proposes a semantic-aware event link reasoning method over an industrial knowledge graph that embeds time series data. The knowledge graph skeleton is constructed from a specific manufacturing process. NLTK is used to transform technical documents into a structured industrial knowledge graph. We employ deep learning (DL)-based models to obtain semantic information related to product quality prediction using time series data collected from IoT devices. The prediction information is then attached to the specified node in the knowledge graph, so that the knowledge graph describes the dynamic semantic information of manufacturing contexts. In addition, a dynamic event link reasoning model that uses graph embedding to aggregate manufacturing process information is proposed, so that implicit information carrying industrial temporal knowledge can be further mined and inferred. The case study shows that the proposed knowledge graph link reasoning reflects dynamic temporal characteristics, and that the model outperforms classical knowledge graph prediction baselines.
Monitoring and detecting abnormal events in cyber-physical systems is crucial to industrial production. With the prevalent deployment of the industrial Internet of Things (IIoTs), an enormous amount of time-series data is collected to facilitate machine learning models for anomaly detection, and it is of the utmost importance to directly deploy the trained models on the IIoT devices. However, it is most challenging to deploy complex deep learning models such as convolutional neural networks (CNNs) on these memory-constrained IIoT devices embedded with microcontrollers (MCUs). To alleviate the memory constraints of MCUs, we propose a novel framework named Tiny Anomaly Detection (TinyAD) to efficiently facilitate onboard inference of CNNs for real-time anomaly detection. First, we conduct a comprehensive analysis of depthwise separable CNNs and regular CNNs for anomaly detection and find that the depthwise separable convolution operation can reduce the model size by 50%–90% compared with the traditional CNNs. Then, to reduce the peak memory consumption of CNNs, we explore two complementary strategies, 1) in-place; and 2) patch-by-patch memory rescheduling, and integrate them into a unified framework. The in-place method decreases the peak memory of the depthwise convolution by sparing a temporary buffer to transfer the activation results, while the patch-by-patch method further reduces the peak memory of layer-wise execution by slicing the input data into corresponding receptive fields and executing in order. Furthermore, by adjusting the dimension of convolution filters, these strategies apply to both univariate time series and multidomain time series features. Extensive experiments on real-world industrial datasets show that our framework can reduce peak memory consumption by 2–5× with negligible computation overhead.
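The 50%–90% model-size reduction reported above for depthwise separable convolutions can be sanity-checked with simple weight counting. The following is a minimal sketch, not the TinyAD implementation: it assumes 1-D convolutions, ignores bias terms, and uses hypothetical layer sizes (kernel size 3, 32 input channels, 64 output channels) that do not come from the paper.

```python
def conv1d_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 1-D convolution (bias terms ignored)."""
    return kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * c_in  # one k-tap filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


def reduction(kernel_size: int, c_in: int, c_out: int) -> float:
    """Fraction of weights saved by the separable form relative to the standard one."""
    standard = conv1d_params(kernel_size, c_in, c_out)
    separable = depthwise_separable_params(kernel_size, c_in, c_out)
    return 1.0 - separable / standard


# Hypothetical layer sizes chosen only for illustration.
saved_small_kernel = reduction(kernel_size=3, c_in=32, c_out=64)
saved_large_kernel = reduction(kernel_size=9, c_in=32, c_out=64)
```

The ratio of separable to standard weights is 1/c_out + 1/kernel_size, so the saving grows with kernel size and output channel count, which is consistent with a range of reductions rather than a single figure.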
Industrial ML models are primarily data-driven. Therefore, one of the main focuses when monitoring a model should be identifying drifts in the data that might affect the performance of the model. Traditional drift detection methods are usually based on assumptions about the underlying data, such as the absence of inter-dependence. However, industrial sensor data typically consist of time series collected at regular intervals. Detecting drift in dependent data, where the current readings depend on previously registered readings, therefore demands a different approach. Existing solutions require either the ground truth, a fixed size, or the underlying model details. We propose an Unsupervised Drift Detection method for industrial Time series data, or UDDT, a generic approach with no such prerequisites. In our approach, we can check whether two series belong to the same model. Apart from detecting the drift between the two series, it can also provide the rationale behind the observed drift, i.e., whether the drift is due to a difference in stationarity, correlation structures, or noise distributions. We evaluate UDDT on two datasets to demonstrate its correctness and its trust regions under various circumstances. We also establish its applicability to real industrial settings.
Driven by rapid advancements in big data and Internet of Things (IoT) technologies, time series data are now extensively utilized across diverse industrial sectors. The precise identification of anomalies in time series data—especially within intricate and ever-changing environments—has emerged as a key focus in contemporary research. This paper proposes a multivariate anomaly detection framework that synergistically combines variational autoencoders with association discrepancy analysis. By incorporating prior knowledge of associations and sequence association mechanisms, the model can capture long-term dependencies in time series and effectively model the association discrepancy between different time points. Through reconstructing time series data, the model enhances the distinction between normal and anomalous points, learning the association discrepancy during reconstruction to strengthen its ability to identify anomalies. By combining reconstruction errors and association discrepancy, the model achieves more accurate anomaly detection. Extensive experimental validation demonstrates that the proposed methodological framework achieves statistically significant improvements over existing benchmarks, attaining superior F1 scores across diverse public datasets. Notably, it exhibits enhanced capability in modeling temporal dependencies and identifying nuanced anomaly patterns. This work establishes a novel paradigm for time series anomaly detection with profound theoretical implications and practical implementations.
In recent years, sensor multimodal time series data fusion has attracted widespread attention. One of the key challenges is to extract features from multimodal data to obtain shared representations, which combines the characteristics of time series data to further improve prediction performance. To solve this problem, based on Stacked Sparse Auto-Encoder (SSAE) and Long Short-Term Memory (LSTM), we propose a multimodal time series data fusion model called SSAE-LSTM. SSAE mines the inherent correlation features of multimodal data to extract a good shared representation, which is used as the input of the LSTM neural network to perform data fusion processing. Experiments on real time series datasets demonstrate that SSAE-LSTM can obtain a good shared representation of multimodal data to predict the future development trend. Compared with other neural networks, SSAE-LSTM has better performance in Precision, Accuracy, Recall, F-score and so on.
Anomaly detection in time series data is essential for fraud detection and intrusion monitoring applications. However, it poses challenges due to data complexity and high dimensionality. Industrial applications struggle to process high-dimensional, complex data streams in real time despite existing solutions. This study introduces deep ensemble models to improve traditional time series analysis and anomaly detection methods. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks effectively handle variable-length sequences and capture long-term relationships. Convolutional Neural Networks (CNNs) are also investigated, especially for univariate or multivariate time series forecasting. The Transformer, an architecture based on Artificial Neural Networks (ANN), has demonstrated promising results in various applications, including time series prediction and anomaly detection. Graph Neural Networks (GNNs) identify time series anomalies by capturing temporal connections and interdependencies between periods, leveraging the underlying graph structure of time series data. A novel feature selection approach is proposed to address challenges posed by high-dimensional data, improving anomaly detection by selecting different or more critical features from the data. This approach outperforms previous techniques in several aspects. Overall, this research introduces state-of-the-art algorithms for anomaly detection in time series data, offering advancements in real-time processing and decision-making across various industrial sectors.
Industrial control systems are an important part of the nation's critical infrastructure, and effective anomaly detection is important to ensure system safety. However, the scarcity of anomaly samples and the imbalanced data distribution in industrial control systems limit the effectiveness of existing anomaly detection methods. Therefore, this paper proposes a time-series anomaly detection framework, TSMixAD, which integrates time-frequency domain enhancement. A Mixup method based on the Dirichlet distribution is adopted in the time domain, and frequency masking, noise injection, and frequency shift enhancement are adopted in the frequency domain to improve sample classification performance. A TCN-Transformer hybrid encoder is constructed, in which the TCN efficiently extracts local temporal dependencies and the Transformer models global associations, so as to improve the robustness of anomaly detection. We validate our approach on two publicly available industrial control system datasets, SWaT and WADI.
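The time-domain augmentation mentioned above, Mixup with Dirichlet-distributed weights, can be illustrated as follows. This is a minimal sketch under stated assumptions, not the TSMixAD implementation: the abstract does not give the exact formulation, so the symmetric concentration parameter `alpha`, the function names, and the toy windows are all hypothetical. Dirichlet weights are drawn with the standard construction of normalizing independent Gamma samples.

```python
import random


def dirichlet_weights(k: int, alpha: float, rng: random.Random) -> list[float]:
    """Draw k mixing weights from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mixup(windows: list[list[float]], alpha: float,
          rng: random.Random) -> tuple[list[float], list[float]]:
    """Return a convex combination of equal-length windows and the weights used."""
    weights = dirichlet_weights(len(windows), alpha, rng)
    length = len(windows[0])
    mixed = [
        sum(w * window[t] for w, window in zip(weights, windows))
        for t in range(length)
    ]
    return mixed, weights


# Hypothetical toy windows; real inputs would be sensor windows of equal length.
rng = random.Random(0)
windows = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
mixed, weights = mixup(windows, alpha=1.0, rng=rng)
```

In a supervised setting the same weights would normally be applied to the labels of the mixed windows, so that the augmented sample carries a soft label.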
No abstract available
In application scenarios, data used for forecasting and decision-making are usually expected to exhibit characteristics such as time stationarity and the Markov property. However, industrial applications often skip verifying whether the data meet these requirements in order to save time and effort, which may lead to inaccurate results. This paper explores a Markov property checking method from statistical and information-theoretic perspectives, and uses two time series datasets, AAPL stock price data and BPIC2012 event logs, to validate the effectiveness of the proposed checking method. Experimental results show that datasets conforming to the Markov property tend to perform better in predictive tasks.
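One information-theoretic way to probe the first-order Markov property is to compare conditional entropies: if the symbol two steps back adds no information once the previous symbol is known, then H(X_t | X_{t-1}) and H(X_t | X_{t-1}, X_{t-2}) should be close. The sketch below illustrates that idea only; it is an assumption that this resembles the paper's check, the function names are hypothetical, and a practical test would also need a significance threshold to account for finite-sample bias.

```python
from collections import Counter
from math import log2


def conditional_entropy(seq: list[int], order: int, start: int) -> float:
    """Empirical H(X_t | previous `order` symbols) in bits, over t >= start."""
    context_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    for t in range(start, len(seq)):
        context = tuple(seq[t - order:t])
        context_counts[context] += 1
        joint_counts[(context, seq[t])] += 1
    n = sum(joint_counts.values())
    entropy = 0.0
    for (context, _symbol), count in joint_counts.items():
        entropy -= (count / n) * log2(count / context_counts[context])
    return entropy


def markov_gap(seq: list[int]) -> float:
    """Information (bits) gained by conditioning on two past symbols instead of one."""
    return conditional_entropy(seq, 1, start=2) - conditional_entropy(seq, 2, start=2)


# Hypothetical discrete sequences: the first is determined by the previous symbol,
# the second needs two past symbols to predict the next one.
first_order = [0, 1] * 100
second_order = [0, 0, 1, 1] * 50
gap_first = markov_gap(first_order)
gap_second = markov_gap(second_order)
```

A gap near zero is compatible with a first-order Markov description, while a clearly positive gap suggests longer memory. Continuous data such as stock prices would have to be discretized before applying this kind of check.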
Anomaly detection in industrial control system (ICS) data is one of the key technologies for ensuring the security monitoring of ICSs. ICS data are characterized as complex, multi-dimensional, and long-sequence time-series data that embody ICS business logic. Due to its complex and varying periodic characteristics, as well as the presence of long-distance and misaligned temporal associations among features, current anomaly detection methods in ICS are insufficient for feature extraction. This paper proposes an anomaly detection method named TFANet, based on time–frequency fusion feature attention encoding. Considering that periodic variations are more concentrated in the frequency domain, this method first transforms the time-domain data into the frequency domain, obtaining both amplitude and phase data. Then, these data, together with the original time-series data, are used to extract features from two perspectives: long-term temporal changes and long-distance associations. Finally, the six features learned from both the time and frequency domains are fused, and the feature weights are calculated using an attention mechanism to complete the anomaly classification. In multi-classification tasks on three ICS datasets, the proposed method outperforms three popular time-series models—iTransformer, Crossformer, and TimesNet—across five metrics: accuracy, precision, recall, F1 score, and AUC-ROC, with average improvements of approximately 19%, 37%, 31%, 35%, and 22%, respectively.
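The first step described above, transforming a time-domain window into frequency-domain amplitude and phase, can be sketched with a plain discrete Fourier transform. This is an illustration only, not TFANet code: the naive O(n²) transform, the function names, and the synthetic 16-sample sine window are assumptions made for the example.

```python
import cmath
import math


def dft(signal: list[float]) -> list[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short illustrative windows."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def amplitude_phase(signal: list[float]) -> tuple[list[float], list[float]]:
    """Split the spectrum into per-bin amplitude and phase (radians)."""
    spectrum = dft(signal)
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


# Hypothetical window: a sine wave completing 2 cycles over 16 samples.
N = 16
window = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
amplitude, phase = amplitude_phase(window)

# The periodic component shows up as a single dominant bin in the amplitude spectrum.
dominant_bin = max(range(1, N // 2), key=lambda k: amplitude[k])
```

The amplitude and phase sequences, together with the original window, would then serve as the separate inputs from which time-domain and frequency-domain features are extracted and fused.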
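This body of literature establishes a full-stack technical path that runs from low-level parsing of multi-source heterogeneous data, through mid-level domain knowledge modeling, to high-level intelligent applications. The studies cover the cleaning and integration of industrial log and sensor data, ontology-based knowledge graph construction, deep-learning anomaly detection for time-series data, and dynamic reasoning that incorporates semantic associations. The work ultimately targets multi-energy coupled scheduling and operational optimization of steam systems in steel enterprises, providing a systematic theoretical framework and algorithmic support for the digital transformation and intelligent operation of energy systems in the steel industry.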