

Importance of effective Data Management in analytics

Vidyod Kumar M
Sr.Advisor-IT Southern California Edison

An experienced IT Professional with over 20 years of experience, Vidyod has worked in multiple industry domains, which include Manufacturing, Utility, and eCommerce. He has successfully...

  • Mar 31, 2020 5:21 pm GMT

This item is part of the Special Issue - 2020-03 - Innovation in Power.

Electric utilities across the world are investing big in advanced analytics to transform the business on their journey to establishing a data-driven organization.

Utilities are using an enormous amount of data from various devices and systems in a plethora of use cases. Examples include optimizing the electrical grid for reliability, Distributed Energy Resource (DER) optimization, and predictive asset maintenance. The deployment of smart meters and sensors across networks is providing further insights and opportunities. With advanced analytics serving as an essential differentiator from past practices in achieving business goals, there are tremendous opportunities to unlock new business capabilities and business models to enable a practical digital utility.


As in other industries, when it comes to analytics projects, too much time is spent on collecting, cleaning, and preparing data for analysis. A survey of data scientists in the US, reported by Forbes magazine [1], provides a valuable insight: 80% of a data scientist’s time on a typical analytics project is spent on data preparation.

So, while utilities need their data scientists to develop a superior algorithm, the bulk of their time is instead spent on untangling and organizing data. Focus is lost, and the results are sub-optimal. Recognizing that data scientists are a finite and highly valued resource, a goal of Southern California Edison is to leverage their time in more beneficial ways - such as data exploration, data visualization, and analysis - that facilitate making the right business decisions.

Looking a little deeper, the amount of time required to prepare the data for a specific use case depends on the health of the data. The data collected from different sources, such as asset data from asset management systems, network connectivity data from GIS/EMS systems, or SCADA information from a historian, comes in a variety of sizes and differs in structure and format.

Still, it is crucial to model and connect the information in a coherent way, even though the existing utility IT systems are not designed to enable easy correlation of the data across various functional domains. Take the example of electrical asset information, an important subject area required for many analytical use cases. An asset’s information (asset characteristics, asset operational state, operational parameters, connectivity, location, etc.) is spread across multiple systems. The introduction of customer-owned assets, in the case of DERs, further complicates this scenario.
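
To make the scattering of asset information concrete, here is a minimal sketch of consolidating records for the same asset from several systems and reporting where an asset is missing. The system names, field names, and sample values are hypothetical and are not taken from any actual utility system.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems; all names and values are illustrative.
asset_mgmt = [
    {"asset_id": "TX-1001", "type": "transformer", "rating_kva": 500},
    {"asset_id": "TX-1002", "type": "transformer", "rating_kva": 250},
]
gis = [
    {"asset_id": "TX-1001", "lat": 34.05, "lon": -118.24, "feeder": "FDR-7"},
]
historian = [
    {"asset_id": "TX-1001", "load_kw": 310.0},
    {"asset_id": "TX-1003", "load_kw": 120.0},
]


def consolidate(sources):
    """Merge records from named sources into one view per asset_id.

    Returns the merged view and, per asset, the sources that have no record of it.
    """
    merged = defaultdict(dict)
    seen_in = defaultdict(set)
    for name, records in sources.items():
        for record in records:
            asset_id = record["asset_id"]
            merged[asset_id].update(record)
            seen_in[asset_id].add(name)
    all_sources = set(sources)
    gaps = {
        asset_id: sorted(all_sources - present)
        for asset_id, present in seen_in.items()
        if all_sources - present
    }
    return dict(merged), gaps


sources = {"asset_mgmt": asset_mgmt, "gis": gis, "historian": historian}
assets, gaps = consolidate(sources)
print(assets["TX-1001"])
print(gaps)
```

In this toy data only one asset is known to all three systems; the other two show up as gaps that someone has to investigate before the data can feed a model.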


In this case, a data scientist needs to make sure the data received from these systems is correctly formatted and adheres to specific rule sets. This can be a daunting task for anyone, and it certainly isn’t the glamorous work data scientists hope to be performing when choosing their career paths! Furthermore, there will be other challenges, such as the quality of the data, which can adversely impact the accuracy and performance of the analytical model. According to Gartner [2], “Poor data quality is hitting organizations where it hurts – to the tune of $15 million as the average annual financial cost in 2017.” The value derived from an analytical model can only be as good as its information sources.
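
As an illustration of the format and rule-set checks described above, the following sketch validates a single record against a small rule set. The rules, field names, and ID pattern are assumptions made for this example only.

```python
import re

# Hypothetical rule set: (field, requirement text, predicate). Illustrative only.
RULES = [
    (
        "asset_id",
        "must match pattern AA-0000",
        lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}-\d{4}", v) is not None,
    ),
    (
        "rating_kva",
        "must be a positive number",
        lambda v: isinstance(v, (int, float)) and v > 0,
    ),
    (
        "lat",
        "must be between -90 and 90",
        lambda v: isinstance(v, (int, float)) and -90 <= v <= 90,
    ),
]


def validate(record):
    """Return a list of rule violations for one record (empty list means clean)."""
    violations = []
    for field, requirement, predicate in RULES:
        value = record.get(field)
        if value is None:
            violations.append(f"{field}: missing")
        elif not predicate(value):
            violations.append(f"{field}: {requirement} (got {value!r})")
    return violations


print(validate({"asset_id": "TX-1001", "rating_kva": 500, "lat": 34.05}))
print(validate({"asset_id": "tx1", "rating_kva": -5}))
```

Keeping the rules as data rather than scattered `if` statements makes it easier to review them with the business owners of each source system.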

Whether in utilities or other sectors, the importance of data management is often ignored and treated as a lower priority when investing in analytics efforts. Data management does not happen by accident. One of the main reasons for bad data and data non-availability is that an adept data management approach is absent and data quality is poorly managed in the context of inter-application data-in-motion and data-at-rest. An effective data management strategy should address data availability, quality, and accessibility for easy analysis. This would significantly reduce the amount of time spent on data preparation activities in an analytical project.

Developing building blocks

In the case of utilities, many of these challenges can be addressed by investing in individual foundational blocks, which can help to leverage the best from the analytical efforts.

Electrical connectivity model: Maintaining and managing the electrical connectivity model through the lifecycle of the grid is critical. Utilities generally have a range of enterprise systems in areas such as Asset Management, Work Management, Planning and Engineering, Mapping and GIS, and Mobile Mapping. The challenge is to provide a single version of the truth about grid connectivity by consolidating the information from various source systems and representing the real-world state of the grid (electrical, communication, and structural networks), including the status of various equipment, configurations, and settings. The Electrical Connectivity Model should hold the latest connectivity information for all stages of the grid lifecycle, including forecasting, planning, design, as-built, and as-operated views (both transmission and distribution). It should also share and receive information seamlessly with various planning, engineering, and operational systems by consuming and exposing services.
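
The difference between as-built and as-operated views can be sketched with a tiny graph. This is a simplified illustration, not a description of SCE's model: the node names, device names, and the single open switch are all hypothetical.

```python
from collections import deque

# Hypothetical feeder: (from_node, to_node, device, closed_in_operation).
SEGMENTS = [
    ("SUB-A", "N1", "breaker-1", True),
    ("N1", "N2", "line-12", True),
    ("N2", "N3", "switch-23", False),  # open switch in the as-operated view
    ("N2", "N4", "line-24", True),
    ("N3", "N5", "line-35", True),
]


def connected_nodes(source, segments, as_operated=True):
    """Breadth-first trace from a source node.

    With as_operated=True, open devices break connectivity.
    With as_operated=False, device state is ignored (as-built view).
    """
    adjacency = {}
    for a, b, _device, closed in segments:
        if closed or not as_operated:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited


print(sorted(connected_nodes("SUB-A", SEGMENTS)))
print(sorted(connected_nodes("SUB-A", SEGMENTS, as_operated=False)))
```

The same topology answers different questions depending on the view, which is one reason a single maintained connectivity model is more useful than per-system copies.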

Information model: To enable an IT system that can act as a system of truth for electrical connectivity and other data subject areas, it is vital to define information/data models with a common vocabulary, particularly when the data is scattered across multiple systems. An information management approach is needed to achieve consistent system development, integration, and analysis. A standard industry technique for resolving enterprise semantics is to map information sources to each other. Key challenges with this approach include:

  • Difficulty arriving at the collective agreement of semantics across all uses
  • Varying formats and change rates of mapping sources (i.e., inconsistencies due to revisions, upgrades, and replacements)

As a means of resolving these issues, industry-standard information models are often employed. However, this initially adds another layer of standard terminology to the enterprise semantics. Critical challenges with this approach therefore include:

  • Additional semantic mapping to develop and maintain
  • The complexity of understanding and using multiple standards
  • Differences in the format of mapping sources
  • Possible internal model vulnerability to external model changes
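
A minimal sketch of the semantic mapping discussed above: each source system's field names are mapped once to a shared vocabulary instead of to every other system. The source field names and the common terms below are hypothetical and are not drawn from any specific industry standard.

```python
# Hypothetical per-source field mappings to a shared vocabulary (illustrative only).
FIELD_MAPS = {
    "asset_mgmt": {"EQUIP_NO": "asset_id", "KVA": "rated_power_kva"},
    "gis": {"FacilityID": "asset_id", "Circuit": "feeder_id"},
}


def to_common(source, record):
    """Rename a source record's fields to the common vocabulary.

    Fields without a mapping are returned separately so they can be reviewed
    rather than silently dropped.
    """
    mapping = FIELD_MAPS[source]
    common, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped[field] = value
    return common, unmapped


common, unmapped = to_common(
    "gis", {"FacilityID": "TX-1001", "Circuit": "FDR-7", "Owner": "customer"}
)
print(common)
print(unmapped)
```

The mapping tables themselves are the artifacts that must be developed and maintained, which is the extra cost the list above points to.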

Data quality management – Stopping garbage-in garbage-out: Data quality issues are not new to utilities, but often these are ignored, or overcome with temporary/siloed solutions. With a changing landscape, where there is a need to merge data across the enterprise landscape to support analytical needs, these kinds of approaches will fail to meet the desired business outcome. From a practical point of view, it will be impossible to solve all of the data quality problems. However, a data quality management strategy should include:

  1. Identification of all the potential data quality challenges
  2. Impact of the data quality challenges on business capabilities
  3. Priority identification
  4. Ability to measure and monitor the quality
  5. A framework for fixing the high priority issues
  6. Establishing a governance process to manage and maintain the data
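
The measurement and prioritization steps in the list above can be sketched as follows. The completeness metric, the impact scores, and the scoring formula (impact multiplied by share of records affected) are assumptions chosen for illustration, not a prescribed method.

```python
def completeness(records, field):
    """Share of records where the field is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def prioritize(issues):
    """Order issues by a simple score: business impact times share of records affected."""
    return sorted(
        issues, key=lambda i: i["impact"] * i["affected_share"], reverse=True
    )


# Hypothetical asset records with some missing values.
records = [
    {"asset_id": "TX-1001", "feeder": "FDR-7", "install_year": 2004},
    {"asset_id": "TX-1002", "feeder": None, "install_year": 1998},
    {"asset_id": "TX-1003", "feeder": "FDR-7", "install_year": None},
    {"asset_id": "TX-1004", "feeder": None, "install_year": 2011},
]

# Impact values (1 = low, 5 = high) are assumed to come from the business.
issues = [
    {"name": "missing install_year", "impact": 2,
     "affected_share": 1 - completeness(records, "install_year")},
    {"name": "missing feeder", "impact": 5,
     "affected_share": 1 - completeness(records, "feeder")},
]

ranked = prioritize(issues)
for issue in ranked:
    print(issue["name"], issue["impact"] * issue["affected_share"])
```

Running the same metrics on a schedule and storing the results gives a simple way to monitor whether quality is improving after fixes are applied.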

Data-driven system development: Understanding the future-state business process and analytic requirements helps to drive a top-down approach to realize system implementations that directly support corporate goals. Rationalizing these requirements with Enterprise Architecture and Business Architecture provides traceability from those goals for ‘data-in-motion’ and ‘data-at-rest’ project solutions. Deriving this common understanding further requires a common vocabulary that applies semantic meaning to the terms being used. Capturing this semantic meaning is critical for building efficient systems that can be easily correlated to business functions and processes. This step in the process brings clarity to applications, data, and technology by articulating:

  1. What data is required?
  2. What are the data sources?
  3. How do we prepare the data?
  4. What are the integration points?
  5. What kind of technologies are required (data preparation, algorithm deployment, integration, etc.)?

At SCE, we have implemented data management processes to address many of the challenges listed above, to reduce the cycle time involved in data preparation. These include:

  1. Establishing a single version of truth for data
  2. Limiting unnecessary data duplication and proliferation
  3. Improving data quality, integrity, consistency, availability, and accessibility
  4. Reducing lifecycle costs for integrating systems and data stores
  5. Enabling integration and analytics to be performed in step with business needs, while ensuring that each increment of functionality aligns with the overall solution
  6. Minimizing the impact on existing systems and data stores when replacing a system
  7. Allowing the assimilation of data required for holistic decision making, analysis, planning, risk management, reporting, etc.
  8. Allowing for the development of new reports and functionality not previously available in any off-the-shelf applications

Please join us for future articles where we will be describing our data management approach, followed by examples of how this approach is being used to resolve particularly challenging requirements, such as:

  1. Electrical distribution and transmission system planning
  2. DER management
  3. Environmental data (weather conditions like fire, windstorm, etc.) management


1. Press, G. (2016, March 23). Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Retrieved from
2. How to Stop Data Quality Undermining Your Business. (n.d.). Retrieved from

Sundeep Dakarapu on Mar 31, 2020

Very Well Explained 

Vidyod Kumar M on Mar 31, 2020

Thank you Sundeep!

Vidyod Kumar M on Mar 31, 2020

I would like to acknowledge the contributions of Shawn Hu and Greg Robinson of Xtensible Solutions to this and upcoming articles.

Greg Robinson on Mar 31, 2020

Thank you Vidyod.  It was a pleasure and we look forward to assisting with the upcoming articles; it is so exciting that we are describing concepts that have been implemented and are working well on a large scale. 

Kevin Monagle on Mar 31, 2020

Excellent article, Vidyod. Wholeheartedly support the holistic approach to building a solid foundation for analytics. As a topic of interest, please consider adding an article on the importance of governing analytical models (including data used to validate models).

Vidyod Kumar M on Mar 31, 2020

Thanks, Kevin! I agree, governance of analytical models is important for analytical project success; we will share our experience.

Matt Chester on Mar 31, 2020

"In the case of utilities, many of these challenges can be addressed by investing in individual foundational blocks, which can help to leverage the best from the analytical efforts."

I love the framing of this, Vidyod. These types of transformations need to be institution-wide and they need to plan for the long haul. These aren't aspects that you integrate quickly or superficially; you really have to dive in deep. Have you seen any resistance from certain entities that are more prone to go for the quick fix and have to be pushed towards the more foundational change?

Vidyod Kumar M on Apr 1, 2020

Thanks, Matt!

My observation is that it is a lack of awareness rather than resistance. Due to a lack of focus on data preparation activities, the health of the data (completeness, consistency, accuracy) is not understood until the model is ready to be deployed in production, or after it is deployed. Data preparation/building the data foundation is a time-consuming activity, and it is not as glamorous a job as analytics and visualization. Organizations are slowly realizing this, but it will take time.

Matt Chester on Apr 1, 2020

Well said, Vidyod. In many ways lack of awareness can be more risky than resistance, but at least with lack of awareness there's a clear path forward-- education!

Noah Badayos on Apr 1, 2020

Very good article Vidyod!

Vidyod Kumar M on Apr 1, 2020

Thank you Noah!

Jim Horstman on Apr 1, 2020

Good article Vidyod!

Vidyod Kumar M on Apr 1, 2020

Thanks Jim!

Sachin Nijhawan on Apr 2, 2020

Vidyod - great read and very well articulated. I have been through the digitization of many systems, especially generation assets, at various utilities. As highlighted in your article, one of the critical aspects is data management, i.e., how the organization adapts to using new systems and processes. I think this is underrated at various organizations, and they fail to reach the optimal potential possible through data analytics. It would be great to hear your perspective on the time it took to drive leadership and cultural transformation in this journey towards using digital tools.

Vidyod Kumar M on Apr 3, 2020

Thanks, Sachin!
In our case, the organization had excellent awareness of and drive for the “data-driven business decision making” process. The challenge is in the execution. As you are aware, utilities have complex systems, and these systems were developed over a long period. The problems associated with data correlation and data quality were realized only when the analytical model was ready to test and deploy. Addressing these challenges requires making changes to existing systems and engagement from various stakeholders across IT and Business. There are existing processes to manage and maintain the data; changing them to meet the analytical needs is time-consuming.

Chris Law on Apr 3, 2020

Very good Vidyod. Your overview is spot on.

There is one thing, however, I would add: it's the type of data that makes this a barrier for utilities. The data is time-series, but unlike SCADA, where this concept is well known, it's now at 100x the scale and must be converted into the grid context quickly, in line with your network and asset information (the network model you mentioned). Doing this fast and at scale is actually a barrier for many utilities. This makes the challenge very different from, say, an online store processing transactions.

As an example, we have many customers that have under 1M smart meters but now each generate over 4 billion data points per day from power quality data (voltage, current, phase angle, etc.). This data is valuable, but converting it into visibility of the grid was the "data barrier" they had to overcome.

Vidyod Kumar M on Apr 9, 2020

Thank you, Chris, sorry for the delayed response!

I agree with your thoughts on time series data like AMI. In our case, we have over 5 million meters (collecting AMI data for the last ten years), and we were able to utilize the AMI for various grid optimization functions like outage management, transformer load analysis, and other use cases.

We could achieve this by collecting the data and organizing it for easy use. Our data warehouse/data lake follows an IEC CIM-based data model, in which we can easily correlate asset and connectivity information with time-series data (SCADA, AMI).

Besides AMI, there are other time-series data sets, such as DER performance data (15-minute interval data for DER generation) and weather information (a vast data set, considering our territory).

My experience is that with the capability to organize (connect) the data, and with the help of computing power, there are many potential use cases utilities can take advantage of.

Chris Law on Apr 10, 2020

Excellent, Vidyod. Sounds very good. I would add that for historical analysis this is good; however, as you move to more active grid management you may find (like we have) that some new processing techniques are required, though the modelling will stay consistent.

e.g. as you need to manage LV voltage fluctuations more quickly with larger shifts in load (DER, EVs, Solar) 

Vidyod Kumar M on Apr 14, 2020

Agree, integrating AMI and other information in real-time analytics is vital! 

Michael Covarrubias on Apr 13, 2020

Very well written article, Vidyod. This is something that many utilities need to consider and think about; it is something you must take time for, and it is part of the “long-term” objective.

Vidyod Kumar M on Apr 14, 2020

Thanks, Michael!

Rather than jumping directly into a specific analytical solution (utilities are often persuaded by aggressive marketing), if utilities invest in foundational blocks, it will help to avoid lots of roadblocks!

Chris Fischer on Apr 29, 2020

Yes! And utilities need to resist the press from every major vendor wanting to sell them yet another analytical solution. How unfortunate would it be to move from application silos feeding one data warehouse, to application silos feeding multiple analytical silos.

Alrun Wigand on Apr 24, 2020

Thanks Vidyod!
We in EQL in QLD, Australia, are unsurprisingly facing the same challenges. I'm a Digital Strategist, and this is one of the areas I'm working on with the business atm. Is there someone I could get in touch with to learn more about your data management approach?




Vidyod Kumar M on May 4, 2020

Hi Alrun,

Sorry for the delay in response, happy to share our experience 

Chris Fischer on Apr 29, 2020

I couldn't stop myself from nodding my head as I read through this article. I think all of us who have been in the data space for a while have seen each and every one of the challenges Vidyod described in our own projects.

Vidyod Kumar M on May 4, 2020

Thank you, Chris, looks like it is a common story!

Subbarao Govindaraju on Jun 9, 2020

Great article Vidyod.  You articulated the issues faced by the utility industry very well.  Keep up the great work and will look for future articles.

Sorry for the delayed response

