Importance of effective data management in analytics

Electric utilities across the world are investing heavily in advanced analytics to transform their business on the journey to becoming data-driven organizations.

Utilities are using an enormous amount of data from various devices and systems in a plethora of use cases. Examples include optimizing the electrical grid for reliability, Distributed Energy Resource (DER) optimization, and predictive asset maintenance. The deployment of smart meters and sensors across networks is providing further insights and opportunities. With advanced analytics serving as an essential differentiator from past practices in achieving business goals, there are tremendous opportunities to unlock new business capabilities and business models to enable a practical digital utility.

As in other industries, too much time on analytics projects is spent collecting, cleaning, and preparing data for analysis. A survey of data scientists in the US reported by Forbes magazine¹ provides a valuable insight: 80% of a data scientist’s time on a typical analytical project is spent on data preparation.

So, while utilities need their data scientists to develop superior algorithms, the bulk of their time is instead spent on untangling and organizing data. Focus is lost, and the results are sub-optimal. Recognizing that data scientists are a finite and highly valued resource, a goal of Southern California Edison (SCE) is to leverage their time in more beneficial ways, such as data exploration, data visualization, and analysis, that facilitate making the right business decisions.

Looking a little deeper, the amount of time required to prepare the data for a specific use case depends on the health of the data. The various types of data collected from different sources, such as asset data from asset management systems, network connectivity data from GIS/EMS systems, or SCADA information from a historian, come in a variety of sizes and formats.

Still, it is crucial to model and connect the information in a coherent way, even though the existing utility IT systems are not designed to enable easy correlation of the data across various functional domains. Take the example of electrical asset information, an important subject area required for many analytical use cases. An asset’s information (asset characteristics, asset operational state, operational parameters, connectivity, location, etc.) is spread across multiple systems. The introduction of customer-owned assets, in the case of DERs, further complicates this scenario.

In this case, a data scientist needs to make sure the data received from these systems is correctly formatted and adheres to specific rule sets. This can be a daunting task for anyone, and it certainly isn’t the glamorous work data scientists hoped to be performing when they chose their career paths! Furthermore, there will be other challenges, such as the quality of the data, which can adversely impact the accuracy and performance of the analytical model. According to Gartner,² “Poor data quality is hitting organizations where it hurts – to the tune of $15 million as the average annual financial cost in 2017.” The value derived from an analytical model can only be as good as its information sources.
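
To make the idea of checking records against format rules and rule sets concrete, here is a minimal sketch in Python. It is purely illustrative and not a description of any utility's actual tooling: the field names (asset_id, rated_kv, install_date), the rules themselves, and the sample records are all assumptions made for this example.

```python
from datetime import datetime


def _is_iso_date(value):
    """True if value is a string in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def _is_positive_number(value):
    """True for a positive int or float (booleans are rejected)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0


# Hypothetical rule set: field name -> (description, check function).
RULES = {
    "asset_id": ("must be a non-empty string",
                 lambda v: isinstance(v, str) and v.strip() != ""),
    "rated_kv": ("must be a positive number", _is_positive_number),
    "install_date": ("must be formatted YYYY-MM-DD", _is_iso_date),
}


def validate(record):
    """Return a list of human-readable rule violations for one record."""
    violations = []
    for field, (description, check) in RULES.items():
        if not check(record.get(field)):
            violations.append(f"{field} {description}")
    return violations


# Made-up records standing in for data received from a source system.
good = {"asset_id": "XFMR-001", "rated_kv": 12.0, "install_date": "2015-06-30"}
bad = {"asset_id": "", "rated_kv": -4, "install_date": "06/30/2015"}

print(validate(good))
print(validate(bad))
```

Keeping the rules in one table rather than scattered through analysis scripts means the same checks can be reused by every project that consumes the data.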

In utilities and other sectors alike, data management is often overlooked or treated as a lower priority when investing in analytics efforts. Data management does not happen by accident. One of the main reasons for bad data and data non-availability is that an adept data management approach is absent, and data quality is poorly managed in the context of inter-application data-in-motion and data-at-rest. An effective data management strategy should address data availability, quality, and accessibility for easy analysis. This would significantly reduce the amount of time spent on data preparation activities in an analytical project.

Developing building blocks

In the case of utilities, many of these challenges can be addressed by investing in individual foundational blocks, which can help to leverage the best from the analytical efforts.

Electrical connectivity model: Maintaining and managing the electrical connectivity model through the lifecycle of the grid is critical. Utilities generally have a range of enterprise systems in the areas of asset management, work management, planning and engineering, mapping and GIS, mobile mapping, etc. The challenge is to provide a single version of the truth about grid connectivity by consolidating the information from various source systems and representing the real-world scenario of the grid (electrical, communication, and structural networks), including the status of various equipment, configurations, and settings. The electrical connectivity model should have the latest connectivity information about all the stages of the grid, including forecasting, planning, designing, as-built, and as-operated views (both transmission and distribution). It should also share and receive information from various planning, engineering, and operational systems seamlessly by consuming and exposing services.
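
One way to picture an as-operated connectivity view is as a graph in which equipment status determines which connections are live. The Python sketch below is a simplified illustration under stated assumptions, not an actual connectivity model: the node names, device identifiers, and the edge layout are invented for this example, and a real model would carry far more detail (phases, ratings, settings, and so on).

```python
from collections import deque

# Hypothetical as-operated view: (node_a, node_b, device_id, is_closed).
EDGES = [
    ("SUB-1", "N1", "BRK-1", True),
    ("N1", "N2", "SW-1", True),
    ("N2", "N3", "SW-2", False),  # open switch: N3 is not fed from this side
    ("N1", "N4", "FUSE-1", True),
]


def energized_nodes(edges, source):
    """Breadth-first trace from the source across closed devices only."""
    adjacency = {}
    for a, b, _device, is_closed in edges:
        if is_closed:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(sorted(energized_nodes(EDGES, "SUB-1")))
```

Because device status is stored on the edges, the same structure can answer different questions (for example, what changes if SW-2 is closed) without reshaping the underlying data.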

Information model: To enable an IT system that can act as a system of truth for electrical connectivity and other data subject areas, it is vital to define information/data models with a common vocabulary, especially when the data is scattered across multiple systems. An information management approach is needed to achieve consistent system development, integration, and analysis. A standard industry technique for resolving enterprise semantics is to map information sources to each other. Key challenges with this approach include:

Difficulty arriving at the collective agreement of semantics across all uses
Varying formats and change rates of mapping sources (i.e., inconsistencies due to revisions, upgrades, and replacements)

As a means of resolving these issues, industry-standard information models are often employed. However, this initially adds the standard's terminology on top of the existing enterprise semantics. Critical challenges with this approach therefore include:

Additional semantic mapping to develop and maintain
The complexity of understanding and using multiple standards
Differences in the format of mapping sources
Possible internal model vulnerability to external model changes
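
The idea of mapping each source system's terms to a common vocabulary can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the system names (asset_mgmt, gis), their field names, and the common terms are hypothetical and are not drawn from any particular standard or from SCE's systems.

```python
# Hypothetical mappings from source-system field names to a shared vocabulary.
FIELD_MAPS = {
    "asset_mgmt": {"EQUIP_NO": "asset_id", "VOLT_RATING": "rated_kv"},
    "gis": {"facility_id": "asset_id", "lat": "latitude", "lon": "longitude"},
}


def to_common(system, record):
    """Rename mapped fields; report unmapped ones instead of dropping them silently."""
    mapping = FIELD_MAPS[system]
    common, unmapped = {}, []
    for name, value in record.items():
        if name in mapping:
            common[mapping[name]] = value
        else:
            unmapped.append(name)
    return common, unmapped


def merge_by_asset(records):
    """Combine common-vocabulary records that share the same asset_id."""
    merged = {}
    for rec in records:
        merged.setdefault(rec["asset_id"], {}).update(rec)
    return merged


# Made-up records for the same transformer held in two systems.
am, am_unmapped = to_common(
    "asset_mgmt", {"EQUIP_NO": "XFMR-001", "VOLT_RATING": 12.0, "OWNER": "utility"}
)
gis, _ = to_common("gis", {"facility_id": "XFMR-001", "lat": 34.05, "lon": -118.25})

print(merge_by_asset([am, gis]))
print(am_unmapped)
```

Each source is mapped once to the common terms rather than to every other source, so adding a system means adding one mapping. The list of unmapped fields makes visible the maintenance burden noted above when a source system is revised or replaced.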

Data quality management – Stopping garbage-in garbage-out: Data quality issues are not new to utilities, but often these are ignored, or overcome with temporary/siloed solutions. With a changing landscape, where there is a need to merge data across the enterprise landscape to support analytical needs, these kinds of approaches will fail to meet the desired business outcome. From a practical point of view, it will be impossible to solve all of the data quality problems. However, a data quality management strategy should include:

Identification of all the potential data quality challenges
Impact of the data quality challenges on business capabilities
Priority identification
Ability to measure and monitor the quality
A framework for fixing the high priority issues
Establishing a governance process to manage and maintain the data
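
As a rough illustration of measuring quality and identifying priorities, the Python sketch below computes a completeness figure per field and ranks fields by a business-impact weight. The sample records, the field names, and the impact weights are invented for this example, and completeness is only one of many possible quality measures.

```python
# Hypothetical sample records and business-impact weights (1 = low, 5 = high).
RECORDS = [
    {"asset_id": "XFMR-001", "rated_kv": 12.0, "latitude": 34.05},
    {"asset_id": "XFMR-002", "rated_kv": None, "latitude": 34.10},
    {"asset_id": "XFMR-003", "rated_kv": None, "latitude": None},
    {"asset_id": "XFMR-004", "rated_kv": 4.0, "latitude": 33.90},
]
IMPACT = {"asset_id": 5, "rated_kv": 4, "latitude": 2}


def completeness(records, field):
    """Fraction of records with a non-None value for field (records must be non-empty)."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def prioritize(records, impact):
    """Rank fields by impact weight multiplied by the share of missing values."""
    scored = []
    for field, weight in impact.items():
        missing = 1.0 - completeness(records, field)
        scored.append((field, missing, weight * missing))
    return sorted(scored, key=lambda item: item[2], reverse=True)


for field, missing, score in prioritize(RECORDS, IMPACT):
    print(f"{field}: {missing:.0%} missing, priority score {score:.2f}")
```

Running the same measurement on a schedule and tracking the scores over time is one simple way to monitor whether fixes and governance are having an effect.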

Data-driven system development: Understanding the future-state business process and analytic requirements helps to drive a top-down approach to system implementations that directly support corporate goals. Rationalizing these requirements with Enterprise Architecture and Business Architecture provides traceability from those goals for ‘data-in-motion’ and ‘data-at-rest’ project solutions. Deriving this common understanding further requires a common vocabulary that applies semantic meaning to the terms being used. Capturing this semantic meaning is critical for building efficient systems that can be easily correlated to business functions and processes. This step in the process brings clarity to applications, data, and technology by articulating:

What data is required?
What are the data sources?
How do we prepare the data?
What are the integration points?
What kinds of technologies are required (data preparation, algorithm deployment, integration, etc.)?

At SCE, we have implemented data management processes to address many of the challenges listed above, to reduce the cycle time involved in data preparation. These include:

Establishing a single version of truth for data
Limiting unnecessary data duplication and proliferation
Improving data quality, integrity, consistency, availability, and accessibility
Reducing lifecycle costs for integrating systems and data stores
Enabling integration and analytics to be performed in step with business needs, while ensuring that each increment of functionality aligns with the overall solution
Minimizing the impact on existing systems and data stores when replacing a system
Allowing the assimilation of data required for holistic decision making, analysis, planning, risk management, reporting, etc.
Allowing for the development of new reports and functionality not previously available in any off-the-shelf applications

Please join us for future articles in which we will describe our data management approach, followed by examples of how this approach is being used to resolve particularly challenging requirements, such as:

Electrical distribution and transmission system planning
DER management
Environmental data (weather conditions like fire, windstorm, etc.) management

References

1. Press, G. (2016, March 23). Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Retrieved from https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#7c1509476f63
2. How to Stop Data Quality Undermining Your Business. (n.d.). Retrieved from https://www.gartner.com/smarterwithgartner/how-to-stop-data-quality-undermining-your-business/