Importance of Data Modeling in Analytics and Beyond!

This is the second in a series of articles on data management in the utility industry. As mentioned in the previous article, utility investments in analytics and other system implementation efforts cannot succeed without a proper Enterprise Information Management (EIM) strategy. An effective data management strategy should address multiple aspects, including data quality and data modeling.

The focus of this paper is on the importance of data modeling, an integral part of the Enterprise Information Management (EIM) strategy.

Why does data modeling matter?

It is unrealistic to expect that the various data required for developing analytical models are easily correlated and ready for use!

Existing utility IT systems across various functional domains are not designed to enable a natural correlation of the data. Often, some of the data elements required for building analytical models are not managed and maintained in enterprise systems.  

To illustrate the importance of data modeling, let’s take the example of a “Long term load forecasting” use case, which is an integral part of power system planning and operations.  One of the steps involved in this use case is “calculating the historical load and DER (generation) profiles.”

As illustrated in the diagram above, developing an analytical model to calculate the load and DER profiles requires integrating a wide variety of data from different sources. Before the data can be used in the analytical model, it must be modeled and prepared: logically correlated and organized so that the model can access it easily and with acceptable performance.
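A minimal sketch of what "logically correlating" the data can mean in practice is shown below. It assumes two hypothetical inputs that are not taken from the article: hourly interval readings from a meter data system, and a meter-to-transformer mapping from a connectivity source such as GIS. All identifiers and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical interval readings: (meter_id, hour_of_day, kwh).
READINGS = [
    ("M1", 0, 1.0),
    ("M1", 1, 3.0),
    ("M2", 0, 2.0),
    ("M2", 1, 4.0),
    ("M3", 0, 5.0),
]

# Hypothetical connectivity data: which transformer serves each meter.
METER_TO_TRANSFORMER = {"M1": "T1", "M2": "T1", "M3": "T2"}


def load_profile_by_transformer(readings, meter_to_transformer):
    """Aggregate metered energy per transformer and hour.

    Returns ({transformer: {hour: kwh}}, set of meters with no mapping).
    """
    profile = defaultdict(lambda: defaultdict(float))
    unmapped = set()
    for meter_id, hour, kwh in readings:
        transformer = meter_to_transformer.get(meter_id)
        if transformer is None:
            # A reading that cannot be correlated is a data gap, not a zero load.
            unmapped.add(meter_id)
            continue
        profile[transformer][hour] += kwh
    return {t: dict(hours) for t, hours in profile.items()}, unmapped


profiles, unmapped = load_profile_by_transformer(READINGS, METER_TO_TRANSFORMER)
print(profiles)
print(unmapped)
```

The aggregation is only possible because both sources share a common key for the meter. Without an agreed model for that relationship, each analytical project would have to rediscover the correlation on its own.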

Consider the scenario where the analytical model deployed in production is taking hours and hours to produce the results, a not-so-unusual situation. “Performance challenges” are the last thing a data scientist wants to experience.  Having the data organized efficiently will enable easy access and optimal performance while setting the stage for delivering clean, re-usable data for analysis across the enterprise.

Now the question is, what is the right approach and methodology to develop a data model?

Approach for developing a data model

The decision regarding the right approach for developing the data model is critical, and there are multiple options available.

One of the approaches is to take an existing data model that was designed for some other purpose (e.g., power applications, GIS, asset management, and so on) and scale it up to an enterprise model. Key challenges with this approach include:

  • Time-consuming
  • Difficulty arriving at the collective agreement of semantics across all uses
  • Varying formats and change rates of mapping sources (i.e., inconsistencies due to revisions, upgrades, and replacements)
  • Understanding future requirements
  • Late realization of significant flaws in those models that become ‘showstoppers’ for enhancements

Another approach is to use an industry-standard information model such as the IEC Common Information Model (CIM), defined in IEC 61968/61970/62325, which provides standard terminology for enterprise semantics. One advantage of CIM is that it offers an excellent foundational data model for most functional domains in the electric utility industry. With contributions from industry experts and product companies, CIM has become broader and more abstract than custom-built models, and it has the benefit of not pushing the integrator or analytics designer into the proverbial corner. Many product vendors have also implemented CIM (with a certain level of customization) as part of their software solutions. CIM scales up well because its foundational design is built to support multiple disparate business functions simultaneously.

At SCE, we have adopted the industry-standard IEC Common Information Model (IEC-CIM) as the foundation for our data model. However, our approach is not limited to what is available in the standard; instead, we use it as a foundational model and extend it to cover enterprise information needs. Take the example of Distributed Energy Resources (DER). CIM does not have enough coverage for this functional domain but provides some foundational building blocks, so it is essential to extend the model to include the missing aspects. The extended model is referred to as the SCE Common Information Model (SCIM), which provides a shared vocabulary for all information assets to manage and facilitate various business processes. The image below is an example of an extended model for DER.

Note: The extension details and information modeling for DER will be covered as part of the forthcoming DER white paper.
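The idea of extending a foundational model, rather than replacing it, can be sketched with plain class inheritance. In the sketch below, `IdentifiedObject` and `PowerSystemResource` mirror the names of CIM base classes. The `DerUnit` class and its attributes are hypothetical placeholders and do not represent the actual SCIM extension.

```python
from dataclasses import dataclass, fields


@dataclass
class IdentifiedObject:
    """Foundational class: every modeled object has an identifier and a name."""
    mrid: str
    name: str


@dataclass
class PowerSystemResource(IdentifiedObject):
    """Foundational class, reused as is."""


@dataclass
class DerUnit(PowerSystemResource):
    """Hypothetical extension class; attribute names are illustrative only."""
    technology: str = "unknown"
    rated_kw: float = 0.0
    interconnection_date: str = ""  # ISO 8601 date kept as text for simplicity


def extension_attributes(cls, base):
    """List the attributes that cls adds on top of the foundational base class."""
    base_names = {f.name for f in fields(base)}
    return [f.name for f in fields(cls) if f.name not in base_names]


unit = DerUnit(mrid="der-001", name="Rooftop PV", technology="solar", rated_kw=7.5)
print(extension_attributes(DerUnit, PowerSystemResource))
```

Because the extension inherits from the foundational classes, anything written against `IdentifiedObject` keeps working for the extended objects, and the list of added attributes stays easy to audit.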

 

While the effort started with the primary goal of creating a data model for analytics, having an enterprise semantic model helps to enable capabilities for both data-in-motion and data-at-rest. As shown in the figure below, the SCIM serves as the logical model on which all semantically aware design artifacts are based, such as those for integration services, data warehouses, Operational Data Stores (ODS), reporting, and analytics. For example, the model can be easily converted into an interface exchange model for system integration.

Figure 1: The SCIM provides a unified data model that integrates data from disparate sources to provide an end-to-end view of data (Data-At-Rest & Data-In-Motion)
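As a rough illustration of one logical model serving both data-at-rest and data-in-motion, the sketch below derives a table definition and a message payload from the same class. The `Transformer` class, the type mapping, and the header/payload envelope are all assumptions made for this example, not the SCIM or any standard message format.

```python
import json
import sqlite3
from dataclasses import dataclass, asdict, fields

# Assumed mapping from logical attribute types to SQLite column types.
SQL_TYPES = {str: "TEXT", float: "REAL", int: "INTEGER"}


@dataclass
class Transformer:
    """Hypothetical logical-model class."""
    mrid: str
    name: str
    rated_kva: float


def to_ddl(cls):
    """Data-at-rest: derive a CREATE TABLE statement from the logical class."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({columns})"


def to_exchange_message(obj, verb="created"):
    """Data-in-motion: wrap the same object in a simple, made-up envelope."""
    envelope = {
        "header": {"verb": verb, "noun": type(obj).__name__},
        "payload": asdict(obj),
    }
    return json.dumps(envelope, sort_keys=True)


bank = Transformer(mrid="T1", name="Bank 1", rated_kva=500.0)

# Check that the generated DDL is valid by running it against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(to_ddl(Transformer))
conn.execute("INSERT INTO transformer VALUES (:mrid, :name, :rated_kva)", asdict(bank))
stored = conn.execute("SELECT mrid, name, rated_kva FROM transformer").fetchall()

message = to_exchange_message(bank)
print(to_ddl(Transformer))
print(message)
```

Both artifacts use the same attribute names because both are generated from one definition, which is the practical benefit of keeping a single logical model behind the physical and exchange models.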

 

The process and framework

We have all heard plenty of stories about failed attempts to develop enterprise semantic models, primarily around implementation. Developing a data model in isolation, disconnected from the rest of the process and not tied to real business or project goals, often leads to failure. An academic exercise that is not aligned with organizational goals is a recipe for disappointment.

A model succeeds only if it can be deployed (Data-at-Rest or Data-in-Motion) and used for system implementations in a timely and cost-effective manner.

Even though IEC-CIM provides a great start, simply adopting an industry standard does not guarantee success. Adopting IEC-CIM has its own challenges:

  • Gaining acceptance from stakeholders
  • Converting the logical model to an implementation model
  • Additional semantic mapping to develop and maintain
  • The complexity of understanding and using standards
  • Differences in the format of mapping sources
  • Possible internal model vulnerability to external model changes
  • Integrating the modeling effort as part of the overall project effort
  • Effort to develop the model
  • Extending the standards to match requirements

To overcome these challenges, we have adopted a systematic and iterative data modeling approach. Instead of trying to model every data subject area in the energy utility business domain at once, we followed a use case-based approach and modeled only the subject areas relevant to a given use case. Going back to the use case “to calculate load and DER profiles,” we focused on modeling Asset, Connectivity, etc. All the tasks associated with model development were included in the project plan, which ensured rigor, made the cost and schedule impact explicit, and gave the effort visibility. The figure below illustrates the framework we have adopted.

Figure 2: Framework for data model development

The framework comprises tools, technologies, standards, governance, roles/people, and processes that need to work together to achieve the desired results. At the core of this framework are the various roles involved and the processes driving the iterative data model development. It is important to note that the modeling steps illustrated above will go through multiple iterations before the model is used for the analytical model deployment.

 

ROLES, RESPONSIBILITIES, AND SKILL SET

[Data scientist]

Responsibilities:

  • Variables or features selection
  • Work with the data engineer to optimize the data structure (access, performance, etc.)

Skill set:

  • Ability to design solutions that meet business requirements and specify system (non-functional) requirements

[Business stakeholder]

Responsibilities:

  • Share insight into how the selected data set is used for various business decision processes
  • Share business value and challenges (data quality) associated with the data set
  • Share data classification information

Skill set:

  • Ability to clearly articulate data inputs and outputs and classifications, and to communicate desired business outcomes

[Data engineer]

Responsibilities:

  • Gathering the data: identifying the system of record or system of truth for the identified variables or features
  • Work with the data scientist to identify the data quality requirements
  • Gap analysis (what is available or not available in a system of record or system of truth)
  • Support the data modeler in logical data model development
  • Converting the logical model to the physical model (optimizing the data model for the selected deployment platform)
  • Organizing/preparing the data model for ease of access and performance
  • Work with the data scientist to integrate the data model with the analytical model
  • Designing the data integration process and overseeing the implementation

Skill set:

  • Understanding of the existing enterprise domain and system landscape
  • Utility domain knowledge (foundational)
  • Expert in the deployment platform of choice (database, data warehouse, data lake technologies, etc.)
  • Expert in data integration
  • Good understanding of data modeling

[Data modeler]

Responsibilities:

  • Gap analysis between the selected foundational model (e.g., IEC-CIM) and business requirements
  • Map data elements (covering the requirements) to the system of record or system of truth information model to understand gaps and extension requirements
  • Extend the model for the gaps identified
  • Generate the logical model
  • Make sure that the model is comprehensive (for example, when modeling a Power Transformer, it should contain all attributes applicable to a Power Transformer, not just the attributes required for the selected use case)

Skill set:

  • Strong understanding of the utility domain
  • Expert in the industry-standard model (e.g., IEC-CIM)
  • Expert in data modeling
  • Understanding of database, data warehouse, and data lake technologies

Note: The roles and responsibilities listed above are for developing the data model and do not cover the end-to-end life cycle of analytical model development.
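The gap analysis between a foundational model and the use case requirements can be pictured as a set comparison per class. The sketch below is only an illustration: the class and attribute names are invented placeholders and say nothing about what IEC-CIM actually contains.

```python
# Hypothetical attributes the use case needs, grouped by class.
REQUIRED = {
    "PowerTransformer": {"mrid", "name", "rated_kva", "install_date"},
    "DerUnit": {"mrid", "technology", "rated_kw"},
}

# Hypothetical attributes already offered by the foundational model.
FOUNDATION = {
    "PowerTransformer": {"mrid", "name", "rated_kva"},
}


def gap_analysis(required, foundation):
    """Return {class_name: sorted missing attributes} for every class with a gap.

    A class that is absent from the foundation is reported with all of its
    required attributes, since the whole class would have to be added.
    """
    gaps = {}
    for class_name, attributes in required.items():
        missing = attributes - foundation.get(class_name, set())
        if missing:
            gaps[class_name] = sorted(missing)
    return gaps


gaps = gap_analysis(REQUIRED, FOUNDATION)
print(gaps)
```

The output is the list of candidate extensions. In practice each entry would still need review by the data modeler, because a missing name can also mean the concept exists in the foundational model under different terminology.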

 

Aligning with analytical model development methodology: CRISP-DM

Now the question is: how can data modeling efforts be integrated into the analytical development life cycle? Even though there are different approaches for data mining and developing analytical models, CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely used methodology for analytical projects, including advanced analytics.

In the CRISP-DM process, the “Data Preparation” phase consists of activities to prepare the final data set from raw data received from multiple sources. Data modeling should be part of the “Data Preparation” phase, as it aligns with other activities such as gathering the data, discovering and assessing the state of the data, and transforming and enriching the data to meet the use case needs. Close collaboration between the data engineer, data modeler, and data scientist ensures this data preparation phase is successful.

Data modeling beyond analytics

As mentioned above, the scope of data modeling goes beyond analytics. Take the example of system integration. The challenges with integrating different systems are many and begin with the way the systems are procured. When a specific vendor product is purchased, vendors are driven by the procurement process to meet user requirements at the lowest cost. Each of the acquired applications has a unique mixture of platform technologies, databases, communications systems, data formats, and application program interfaces. While utilities prefer products that support industry-standard interfaces, another high priority is for product vendors to supply application interfaces that remain relatively stable across product releases.

Even though it may not be practical to expect every system-to-system interaction developed in the organization to use a standardized message model, having an enterprise semantic model helps to start the discussion about the data and drive towards message standardization.

Take the example of asset and grid connectivity information, which is vital across many enterprise systems such as asset management, work management, mapping and GIS, mobile applications, engineering and planning, and more. At SCE, we have developed a system of truth for grid connectivity information, integrating data from GIS, asset management, EMS, and other operational applications. The system exposes the information through APIs (Application Programming Interfaces) using a SCIM-based standard exchange model. These APIs provide a complete set of electrical network connectivity that includes Transmission, Sub-Transmission, Distribution Primary, Distribution Secondary, and Substation Internals, serving multiple application needs using a standards-based message model.

Figure 3: Grid Connectivity Information- System of Truth - Common model used as a physical data model and exchange model.
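To give a feel for why end-to-end connectivity matters to consuming applications, the sketch below traces a path from a meter up to its substation. It assumes a simplified radial network in which every element has at most one upstream element. The element names and the dictionary structure are hypothetical and are not the actual API or exchange model.

```python
# Hypothetical radial connectivity: each element points to its upstream element.
UPSTREAM = {
    "meter-1": "secondary-1",
    "secondary-1": "dist-transformer-1",
    "dist-transformer-1": "primary-feeder-1",
    "primary-feeder-1": "substation-1",
}


def trace_upstream(node, upstream):
    """Follow upstream links from node until an element with no parent is reached."""
    path = [node]
    seen = {node}
    while path[-1] in upstream:
        parent = upstream[path[-1]]
        if parent in seen:
            # A loop means the data violates the radial assumption of this sketch.
            raise ValueError(f"cycle detected at {parent}")
        path.append(parent)
        seen.add(parent)
    return path


path = trace_upstream("meter-1", UPSTREAM)
print(" -> ".join(path))
```

A trace like this only works when the secondary, primary, and substation segments are held in one consistent model. If each segment lived in a separate system with its own identifiers, the path would break at every system boundary.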

Data modeling can also help to create a standard view of data across the enterprise, enabling data quality and data governance efforts. This includes defining common terminology, semantics, and implementation, along with developing semantic traceability and lineage of data maintained across the organization.

Summary

Data modeling not only helps to validate the understanding of the data between business and IT but is also a very useful tool for analyzing and extracting value from available data. It constitutes a crucial step in the analytics development cycle. Focusing on this step enables electric utilities to manage data systematically through the data lifecycle (capture, organize, analyze, and deliver) to achieve the desired outcome.

When venturing into analytics or major system integration projects, organizations need to focus on their Enterprise Information Management (EIM) strategy. A well-designed EIM strategy requires business units and IT to treat enterprise data and information as assets and to understand the nature of the information and how it is used and controlled. This effort includes addressing critical issues around data definition, quality, integrity, security, compliance, access and generation, management, integration, and governance. These issues are interrelated and systemic, so business units and IT must work together to understand and solve them. The process is iterative and requires a holistic, evolutionary EIM strategy and framework to ensure a consistent and practical approach.

While utilities are looking ahead to incorporating unstructured data, such as HD drone video and LiDAR scans of assets, to better understand asset condition, there is plenty of structured data available in the organization that can enable advanced analytics, both real-time and historical. Furthermore, tremendous benefits can be derived by connecting structured data with unstructured information to enable deep and valuable insights. Having a comprehensive Enterprise Information Model, the SCIM, along with an EIM strategy is helping SCE take advantage of these precious data assets.

Thank Vidyod Kumar for the Post!

Discussions

Richard Brooks on May 28, 2020 5:20 pm GMT

This is a rather long posting, but that's because it has to be to cover this topic as thoroughly as needed to establish a firm foundation. Interesting to see that you created an "internal CIM"; we did the same at ISO New England - it would be incredibly difficult to build an effective and reliable analytic framework without standard terms/definitions. Well done.

Vidyod Kumar M on May 28, 2020 6:15 pm GMT

Thank you, Richard! I tried to compress the content, but as you mentioned covering all aspects was challenging.
Regarding the CIM customization, I am eager to hear about your experience!

Richard Brooks on May 28, 2020 7:50 pm GMT

We considered using the IEC CIM 61970 as a starting point, but quickly moved away due to lots of impedance mismatches with our own data standards. It was far easier for our developers to use our own "data language" instead of trying to learn the IEC language, e.g. Organisation versus Customer. We ended up creating a comprehensive Data Governance program that integrated with our BI environment, allowing the use of mouse-over on a data element in the warehouse to see its description and other metadata. End users loved this feature.

Vidyod Kumar M on May 28, 2020 10:34 pm GMT

Got it, in our case, we developed the entire Analytical system data model using IEC profiles, adding extensions (attributes/classes which are not in CIM). Whatever exists in CIM, we used it as is.  

Richard Brooks on May 29, 2020 6:51 pm GMT

Interesting; What % of your current data model uses IEC CIM objects verbatim, versus your own extensions? I found the enumeration items in IEC CIM to be an issue which would have required customization.

Jim Horstman on May 31, 2020 10:15 pm GMT

Enumerations are definitely an area in the CIM where there is a lot of debate. The first debate is whether or not to use enumerations instead of a text string or other type with no defined values. The second is coming up with the actual values which are likely to vary as to both the actual values and how many are required. The second area is where utilities typically will need to come up with their own list of values.

I recently worked at a utility which was taking a similar approach as SCE. I didn't keep track but would guess that extensions might have ranged into 10-15%.

Richard Brooks on Jun 3, 2020 5:05 pm GMT

Thanks, Jim. 10-15% is much better than I expected.

Vidyod Kumar M on Jun 4, 2020 4:03 pm GMT

Sorry for the delayed response. In our case, we could leverage 75% of existing objects from IEC-CIM. There are some scenarios, like DER, Environmental area, where we end up adding a lot of extensions since CIM does not have much coverage. 

Richard Brooks on Jun 4, 2020 5:36 pm GMT

Thanks Vidyod - that's very helpful to know. Much appreciate you posting such useful content for others, such as myself, to learn from.

Jim Horstman on May 29, 2020 6:51 pm GMT

Another good article Vidyod covering the topic well. I hope you continue your participation with the CIM working group teams to bring your experience to bear and help us continue the extension of the CIM.

Vidyod Kumar M on May 29, 2020 10:00 pm GMT

Thank you, Jim! Definitely, hopefully, our experience and learning will help others!

Jim Horstman on May 31, 2020 10:17 pm GMT

Hopefully you will keep participating in the CIM DER team as well as others. You have a lot of value to add.

Noah Badayos on May 30, 2020 12:20 am GMT

Another excellent article Vidyod!

Vidyod Kumar M on May 30, 2020 12:49 am GMT

Thank you, Noah!

Chris Law on May 31, 2020 9:33 pm GMT

Great post again Vidyod.  As you mention connectivity modeling is probably the most important outcome as it provides the context.

Having the model is a great starting point. The next step is to apply the live operational data to create a "living and breathing" view of the distribution network for active analytics (perhaps the next post!)

Great work!

Vidyod Kumar M on Jun 4, 2020 4:04 pm GMT

Thank you, Chris! We are working on applying the model to different states of the network, i.e. AS-BUILT, AS-PLANNED, AS-OPERATED.

Ben Ettlinger on Jun 3, 2020 2:39 pm GMT

Great post. Very thorough. Did/do you use a tool to do the modeling? If so, did you bring the CIM model into the tool?

Jim Horstman on Jun 3, 2020 9:06 pm GMT

Ben, typically EnterpriseArchitect is used for CIM modeling. It is published as an EAP file.

Vidyod Kumar M on Jun 4, 2020 4:00 pm GMT

Thank you Ben for the kind words.  As mentioned in the white paper (the process section), we use multiple tools, to achieve the following.

1. Mapping requirements to model

2. Foundational model (UML) development based on IEC-CIM

3. Converting the UML model  (CIM)  to the Logical / Physical model (XSD, DDL)

4. Implementation of the model and associated scripts.

Happy to share further details about tools and technologies.

 
