Use Machine Learning to Automate Measurement Point Identification Throughout Your Enterprise
image credit: Royalty free by Ryzhi, Shutterstock
- Mar 26, 2020 8:59 pm GMT
Often, we think of innovation as an invention that creates a brand-new class of devices like the iPhone or a “disruptive” app that changes our daily life like Google or Facebook. Sometimes, though, innovation happens in a business model through the incremental application of a technology like machine learning (ML) over time. For example, a recent Harvard Business Review article noted: “you need only a computer system to be able to perform tasks traditionally handled by people.”[i] In a previous post we discussed using ML and open source tools to automate load prediction. Our present ML innovation project involves automating measurement management.
As Grid complexity increases daily, so does the workload of the engineers who constantly create, update, and load measurement files into each of their Operational Technology (OT) systems. One issue that makes this work complex: each SCADA, OMS, and GIS system has its own naming convention for data points. When a new device is added to the network, each OT system needs to be updated as well. Typically, this is a highly manual process because each system manages its own name space. In many system architectures, engineers register new devices in the GIS/OMS system, generate new SCADA data point files, and send the points to the SCADA system. This often involves engineers running custom-developed scripts to manage the differences in metadata (name spaces and device models) across these various OT systems.
In some instances, our utility customers benefit from our automated Asset ID Management (AIM) solution, which automatically keeps the measurement fields of Operational Technology (OT) systems up to date with new devices and avoids the manual registration of these devices.
Creating the configuration files and the scripts that automate the AIM process requires considerable manual code development by experienced engineers, which makes it a perfect place for ML to increase productivity.
Automated Measurement Point Identification
In organizations that encode OMS measurement lookups into their SCADA data point names, an extract, transform, load (ETL) process can be employed. Specifically, the system extracts data point names, transforms the points to OMS measurements, and loads the data into the OMS systems.
This all sounds great, but with tens of thousands of data points, creating the mappings can be time consuming, error prone, and reliant on institutional knowledge. This seemed like a perfect example of a task performed by people where ML could help and bring innovation to a process that is repeated throughout the power industry.
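As a rough illustration of the ETL idea described above, the following minimal Python sketch extracts point names, transforms them with a lookup, and loads the results into a dictionary standing in for an OMS table. The naming convention (`DEVICE_SUFFIX`), the `SUFFIX_TO_MEASUREMENT` lookup, and the `OmsMeasurement` record are all assumptions made for illustration; real conventions differ from utility to utility.

```python
from dataclasses import dataclass

# Hypothetical suffix-to-measurement lookup; real conventions vary by utility.
SUFFIX_TO_MEASUREMENT = {"KV": "Voltage", "AMP": "Current", "STAT": "Status"}


@dataclass
class OmsMeasurement:
    device: str
    measurement: str
    scada_point: str


def extract(scada_source):
    """Extract: pull raw data point names, skipping blank entries."""
    return [name.strip() for name in scada_source if name.strip()]


def transform(point_names):
    """Transform: split each name into device and suffix, then look up the measurement."""
    mapped, unmapped = [], []
    for name in point_names:
        device, _, suffix = name.rpartition("_")
        measurement = SUFFIX_TO_MEASUREMENT.get(suffix.upper())
        if device and measurement:
            mapped.append(OmsMeasurement(device, measurement, name))
        else:
            unmapped.append(name)  # left for manual review
    return mapped, unmapped


def load(measurements, oms_table):
    """Load: store each measurement, keyed by its SCADA point name."""
    for item in measurements:
        oms_table[item.scada_point] = item
    return oms_table


raw_points = ["SUB12_BKR45_KV", "SUB12_BKR45_AMP", " ", "SUB07_XFMR2_TEMP"]
mapped, unmapped = transform(extract(raw_points))
oms_table = load(mapped, {})
```

The hand-written lookup table is exactly the part that becomes hard to maintain at scale, which is where a learned mapping may help.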
Applying Machine Learning to OMS SCADA Points Table Creation
Typically, SCADA systems’ data point naming conventions are complex and not uniformly applied. The task with which we thought ML could help was to automatically determine the type of measurement associated with a particular SCADA data point so that we could automatically create a measurement table in an Outage Management System. With our automated mapping solution in place, we developed an ML algorithm that generates prediction rules for data points in the OMS measurement tables, eliminating the manual process of creating configuration files.
As shown in the above diagram:
- The automated process starts with measurement point discovery.
- The data point name is retrieved from the SCADA system via ICCP.
- The attribute part is extracted from the ICCP name. The AttributeName extractor performs a search in the SCADA table.
- The first AttributeName match is returned. If no match is found, an exception report is created and the search process for this ICCP name stops.
- The base name is extracted and tokenized.
- A BaseName Extractor removes the attribute name substring from the ICCP name.
- The BaseName is now available for further processing.
- A tokenizer splits the BaseName into tokens using a user-defined separator. After tokenization, production rules based on the relative association between SCADA tokens and OMS tokens are generated in a format that is more legible for users.
- A few examples of SCADA point to OMS name mappings are used as a supervised dataset to train the neural network. From these mappings, the previously mentioned production rules are generated, and these become the labels for the model to learn.
- The ordered list of extracted tokens is available for further processing.
- The ML algorithm processes the tokens and predicts a production rule.
- The OmsName generator tries to match the token set against a list of regular expressions ordered by prediction probability. For a match with probability above the user-defined cutoff, it applies the corresponding production rule to generate the OmsName candidate, which it verifies against the SCADA_POINTS table.
- If the candidate does not match, it iterates using the next regular expression.
- If no match is found, an entry is added to the exception report.
- The OmsName is available for the configuration file.
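The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production implementation: the attribute list, the contents of `SCADA_POINTS`, the sample ICCP names, and the shape of a production rule (a tuple of token positions joined with ".") are hypothetical, and the `PREDICTIONS` list is a hard-coded stand-in for what the trained neural network would output.

```python
import re

# Hypothetical lookup data; real tables would come from the SCADA and OMS systems.
ATTRIBUTE_NAMES = ["_STATUS", "_KV", "_AMPS"]
SCADA_POINTS = {"BKR.45.SUB12", "XFMR.2.SUB07"}


def extract_attribute(iccp_name):
    """Return the first attribute name found in the ICCP name, or None."""
    for attribute in ATTRIBUTE_NAMES:
        if attribute in iccp_name:
            return attribute
    return None


def tokenize(base_name, separator="_"):
    """Split the base name into an ordered token list using a user-defined separator."""
    return [token for token in base_name.split(separator) if token]


def generate_oms_name(tokens, predictions, cutoff):
    """Try rules in descending probability order; return the first verified candidate."""
    token_text = " ".join(tokens)
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    for probability, pattern, order in ranked:
        if probability < cutoff:
            break  # every remaining rule is below the cutoff
        if not re.fullmatch(pattern, token_text):
            continue  # iterate using the next regular expression
        if max(order) >= len(tokens):
            continue  # rule refers to a token position that does not exist
        candidate = ".".join(tokens[i] for i in order)
        if candidate in SCADA_POINTS:
            return candidate
    return None


def map_point(iccp_name, predictions, cutoff=0.5, separator="_", exceptions=None):
    """Run attribute extraction, base name extraction, tokenization, and name generation."""
    exceptions = exceptions if exceptions is not None else []
    attribute = extract_attribute(iccp_name)
    if attribute is None:
        exceptions.append((iccp_name, "no attribute match"))
        return None
    base_name = iccp_name.replace(attribute, "", 1)
    tokens = tokenize(base_name, separator)
    oms_name = generate_oms_name(tokens, predictions, cutoff)
    if oms_name is None:
        exceptions.append((iccp_name, "no production rule matched"))
    return oms_name


# Stand-in for model output: (probability, token pattern, production rule).
PREDICTIONS = [
    (0.30, r"SUB\d+ XFMR \d+", (1, 2, 0)),
    (0.91, r"SUB\d+ BKR \d+", (1, 2, 0)),
    (0.62, r"SUB\d+ \w+ \d+", (0, 1, 2)),
]

report = []
breaker = map_point("SUB12_BKR_45_STATUS", PREDICTIONS, exceptions=report)
transformer = map_point("SUB07_XFMR_2_KV", PREDICTIONS, cutoff=0.2, exceptions=report)
unknown = map_point("SUB12_BKR_45_TEMP", PREDICTIONS, exceptions=report)
```

In this sketch the transformer point is only resolved because the cutoff is lowered to 0.2; with the default cutoff of 0.5 it would land in the exception report instead, which shows how the cutoff trades automation against manual review.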
Our preliminary results indicate 96.51% accuracy for OMS namespace generation with limited training examples. As we continue with this project, we expect to see the accuracy improve with additional data points for learning.
Generalization was our biggest challenge. Ultimately, we wanted to make parts of the process user configurable. This meant the automated mapping process needed to learn the different types of mappings in a way that let us give control and feedback to the user. We considered different approaches based on Natural Language Processing, Reinforcement Learning, and ML to solve the problem. Based on the observed nature of the different mappings, we chose a classification approach based on Neural Networks.
Another challenge was simulating a real-life scenario: the model had to learn the production rules from a very limited set of examples. Initially, we generated a single OMSName based on the top prediction. But in the case of ambiguity, we wanted to let the user decide between two OMSNames. This could happen if the probabilistic scores given by the model were similar for two or more points. Additionally, we decided to add a cutoff level, so that only predictions scoring above it are applied. This creates a human-assisted ML process: with each iteration, the learning process can improve and resolve the previously known ambiguities.
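One way to picture this human-assisted step is the small Python sketch below. It is an assumption-laden illustration rather than our actual logic: the `select_candidates` function, the `margin` that defines when two scores count as "similar", and the sample scores are all hypothetical.

```python
def select_candidates(scores, cutoff=0.5, margin=0.05):
    """Decide how to handle the model's scored OMSName candidates.

    scores maps each candidate name to its predicted probability.
    Returns (decision, names), where decision is one of:
      "auto"      - a single clear winner above the cutoff
      "ask_user"  - two or more candidates within `margin` of the top score
      "exception" - no candidate reaches the cutoff
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranked = [(name, p) for name, p in ranked if p >= cutoff]
    if not ranked:
        return "exception", []
    top_score = ranked[0][1]
    close = [name for name, p in ranked if top_score - p <= margin]
    if len(close) > 1:
        return "ask_user", close
    return "auto", close


clear = select_candidates({"BKR.45.SUB12": 0.90, "SUB12.BKR.45": 0.07})
ambiguous = select_candidates(
    {"BKR.45.SUB12": 0.48, "SUB12.BKR.45": 0.46, "45.BKR.SUB12": 0.06},
    cutoff=0.4,
)
rejected = select_candidates({"BKR.45.SUB12": 0.30, "SUB12.BKR.45": 0.20})
```

Choices the user makes in the "ask_user" case could then be added to the training examples, so that later iterations resolve the same ambiguity on their own.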
In addition to improving accuracy, we are leveraging open-source Python libraries like Pandas and NumPy to create an ML pipeline for the mapping, thus increasing computational efficiency and reducing load time. Future work will include Named Entity Recognition for tokenization. This means we will seek to identify different components within a name, such as a device ID and attribute, within unstructured text.
This is another example of how we applied ML and open source libraries to eliminate manual, repetitive tasks for operations engineers. On the scale of innovative disruption, it may not become a verb the way we “Google” a topic to get information. However, as a practice over time, the incremental application of ML is a path to automation and productivity. It reduces monotonous tasks for engineers and increases their available time to ensure Grid reliability and optimization. Internally, it enables our Professional Services engineers to deliver increased value to our customers by greatly decreasing the time required to integrate name spaces across OT systems.
[i] Iansiti, Marco and Lakhani, Karim R. “Competing in the Age of AI,” HBR (Jan-Feb 2020)