Importance of Data Modeling in Analytics and Beyond!
- May 28, 2020 11:07 am GMT
This is the second in a series of articles on data management in the utility industry. As mentioned in the previous article, utility investments in analytics and other system implementation efforts cannot be successful without a proper Enterprise Information Management (EIM) strategy. An effective data management strategy should address multiple aspects, including data quality and data modeling.
The focus of this paper is on the importance of data modeling, an integral part of the Enterprise Information Management (EIM) strategy.
Why does data modeling matter?
It is unrealistic to expect that the various data required for developing analytical models are easily correlated and ready for use!
Existing utility IT systems across various functional domains are not designed to enable a natural correlation of the data. Often, some of the data elements required for building analytical models are not managed and maintained in enterprise systems.
To illustrate the importance of data modeling, let’s take the example of a “Long term load forecasting” use case, which is an integral part of power system planning and operations. One of the steps involved in this use case is “calculating the historical load and DER (generation) profiles.”
As illustrated in the diagram above, developing an analytical model to calculate the load and DER profiles requires integrating a wide variety of data from different sources. Before the data can be used in the analytical model, it must be modeled and prepared: logically correlated, and organized so that the model can access it easily and with acceptable performance.
Consider the scenario where the analytical model deployed in production is taking hours and hours to produce the results, a not-so-unusual situation. “Performance challenges” are the last thing a data scientist wants to experience. Having the data organized efficiently will enable easy access and optimal performance while setting the stage for delivering clean, re-usable data for analysis across the enterprise.
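To make the correlation step concrete, here is a minimal Python sketch of how load and DER readings from separate sources might be tied together through connectivity data to produce a profile per transformer and hour. The sample records, field names, and the `build_profiles` function are all hypothetical and are not taken from any SCE system; a real implementation would read from the actual metering, DER, and connectivity sources.

```python
from collections import defaultdict

# Hypothetical sample data; field names are illustrative only.
# (meter_id, hour, kWh) interval readings from a metering system
meter_readings = [
    ("M1", 0, 1.2), ("M1", 1, 1.0),
    ("M2", 0, 0.8), ("M2", 1, 0.9),
]
# (der_id, hour, kWh) generation readings from a DER monitoring system
der_readings = [
    ("PV1", 0, 0.0), ("PV1", 1, 0.5),
]
# Connectivity: which transformer serves each meter or DER unit
served_by = {"M1": "T1", "M2": "T1", "PV1": "T1"}


def build_profiles(meter_readings, der_readings, served_by):
    """Aggregate load and DER generation per (transformer, hour)."""
    profiles = defaultdict(lambda: {"load_kwh": 0.0, "der_kwh": 0.0})
    for meter_id, hour, kwh in meter_readings:
        profiles[(served_by[meter_id], hour)]["load_kwh"] += kwh
    for der_id, hour, kwh in der_readings:
        profiles[(served_by[der_id], hour)]["der_kwh"] += kwh
    return dict(profiles)


profiles = build_profiles(meter_readings, der_readings, served_by)
```

The point of the sketch is that neither reading set carries the transformer identifier on its own. The correlation only becomes possible once connectivity is modeled alongside the measurements.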
Now the question is, what is the right approach and methodology to develop a data model?
Approach for developing a data model
The decision regarding the right approach for developing the data model is critical, and there are multiple options available.
One of the approaches is to take an existing data model that was designed for some other purpose (e.g., power applications, GIS, asset management, and so on) and scale it up to an enterprise model. Key challenges with this approach include:
- Difficulty arriving at the collective agreement of semantics across all uses
- Varying formats and change rates of mapping sources (i.e., inconsistencies due to revisions, upgrades, and replacements)
- Understanding future requirements
- Late realization of significant flaws in those models that become ‘showstoppers’ for enhancements
Another approach is to use an industry-standard information model such as the IEC Common Information Model (CIM), defined in IEC 61968, 61970, and 62325, which provides standard terminology for enterprise semantics. One of the advantages of CIM is that it gives an excellent foundational data model for most of the functional domains in the electric utility industry. With the contribution from industry experts and product companies, CIM has become more significant and more abstract than custom-built models, and it has the benefit of not pushing the integrator or analytics designer into the proverbial corner. Also, many product vendors have implemented CIM (with a certain level of customization) as part of their software solutions. It scales up well because its foundational design is built to support multiple disparate business functions simultaneously.
At SCE, we have adopted the industry-standard IEC Common Information Model (IEC-CIM) as the foundation for our data model. However, our approach is not limited by what is available in the standard; instead, we use it as a foundational model and extend it to cover the enterprise information needs. Take the example of Distributed Energy Resources (DER). CIM does not have enough coverage for this functional domain but provides some foundational blocks. Hence it is essential to extend the model to include the missing aspects. The extended model is referred to as the SCE Common Information Model (SCIM), which provides a shared vocabulary for all information assets to manage and facilitate various business processes. The image below is an example of an extended model for DER.
Note: The extension details and information modeling for DER will be covered as part of the forthcoming DER white paper.
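As a rough illustration of what "extending a foundational model" can look like in code, the Python sketch below derives DER classes from simplified stand-ins for CIM base classes. `IdentifiedObject` and `PowerSystemResource` are names used in the CIM, but the versions here are heavily reduced, and `DerUnit`, `DerSite`, and all of their attributes are hypothetical. They do not represent the actual SCIM extension.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IdentifiedObject:
    """Simplified stand-in for the CIM class that carries identity."""
    mrid: str
    name: str


@dataclass
class PowerSystemResource(IdentifiedObject):
    """Simplified stand-in for a CIM power system resource."""


@dataclass
class DerUnit(PowerSystemResource):
    """Hypothetical extension class; attributes are illustrative only."""
    technology: str = "unknown"      # e.g. "solar", "storage"
    rated_kw: float = 0.0
    interconnection_date: str = ""   # ISO 8601 date string


@dataclass
class DerSite(PowerSystemResource):
    """Hypothetical grouping of DER units at one service location."""
    units: List[DerUnit] = field(default_factory=list)

    def total_rated_kw(self) -> float:
        """Sum the rated capacity of all units at this site."""
        return sum(unit.rated_kw for unit in self.units)


site = DerSite(
    "site-1",
    "Example site",
    [
        DerUnit("u1", "PV array", "solar", 5.0),
        DerUnit("u2", "Battery", "storage", 3.5),
    ],
)
```

Because the extension classes inherit from the base classes, anything written against the foundational model (for example, code that only needs `mrid` and `name`) continues to work with the extended objects.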
While the effort started with a primary goal of creating a data model for analytics, having an enterprise semantic model helps to enable capabilities for both data-in-motion and data-at-rest. As shown in the figure below, the SCIM serves as the logical model on which all semantically aware design artifacts are based, such as those for integration services, data warehouses, Operational Data Stores (ODS), reporting, and analytics. For example, the model can be easily converted into an interface exchange model (exchange model) for system integration.
Figure 1: The SCIM provides a unified data model that integrates data from disparate sources to provide an end-to-end view of data (Data-At-Rest & Data-In-Motion)
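The idea of deriving an exchange model from the logical model can be sketched in a few lines of Python. The `Transformer` class, its attributes, and the envelope layout with `verb` and `noun` header fields are assumptions made for illustration; they are not the SCIM exchange format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Transformer:
    """Hypothetical logical-model class; attributes are illustrative."""
    mrid: str
    name: str
    rated_kva: float


def to_exchange_message(obj, verb="created"):
    """Wrap a logical-model object in a simple JSON message envelope."""
    return json.dumps(
        {
            "header": {"verb": verb, "noun": type(obj).__name__},
            "payload": asdict(obj),
        },
        sort_keys=True,
    )


message = to_exchange_message(Transformer("T1", "Example transformer", 50.0))
```

The same class definition could equally drive a table definition for data-at-rest, which is the sense in which one logical model serves both uses.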
The process and framework
Most of us have heard stories about failed attempts to develop enterprise semantic models, primarily around implementations. Developing a data model in isolation, disconnected from the rest of the process and not integrated into real business or project goals, often leads to failure. An academic exercise that is not aligned with organizational goals is a recipe for disappointment.
Success can only be ensured if the model can be successfully deployed (Data-at-Rest or Data-in-Motion) and used for system implementations in a timely and cost-effective manner.
Even though IEC-CIM provides a great start, just adopting an industry standard such as IEC-CIM does not guarantee success. The adoption of IEC-CIM has its challenges too:
- Gaining acceptance from stakeholders
- Converting the logical model to an implementation model
- Additional semantic mapping to develop and maintain
- The complexity of understanding and using standards
- Differences in the format of mapping sources
- Possible internal model vulnerability to external model changes
- Integrating the modeling effort as part of the overall project effort
- Effort to develop the model
- Extending the standards to match requirements
To overcome these challenges, we have adopted a systematic and iterative data modeling approach. Instead of jumping to develop a data model for all data subject areas applicable to the entire energy utility business domain, we followed a use case-based approach, modeling only the subject areas relevant to a given use case. Going back to the use case “to calculate load and DER profiles,” we focused on modeling Asset, Connectivity, etc. All the tasks associated with model development were included as part of the project plan, which ensured rigor, accounted for the cost and schedule impact, and gave the effort visibility. The figure below illustrates the framework we have adopted.
Figure 2: Framework for data model development
The framework comprises tools, technologies, standards, governance, roles and people, and processes that need to work together to achieve the desired results. At the core of this framework are the various roles involved and the processes driving the iterative data model development. It is important to note that the modeling steps illustrated above will go through multiple iterations before the data model is used for the analytical model deployment.
Ability to design solutions that meet business requirements and specify system (non-functional) requirements
Ability to clearly articulate data input and outputs, classifications and communicate desired business outcomes.
Understanding of existing enterprise domain and system landscape
Utility domain knowledge (foundational)
Expert in deployment platform of choice (Database/Data warehouse/Data lake technologies, etc.)
Expert in data integration
Good understanding of data modeling
Strong understanding of utility domain
Expert in industry-standard model (e.g., IEC-CIM)
Expert in data modeling
Understanding of Database/Data warehouse/Data lake technologies
Note: The roles and responsibilities listed above are for developing the data model and do not cover the end-to-end life cycle of analytical model development.
Aligning with analytical model development methodology: CRISP-DM
Now the question is how to integrate data modeling efforts into the analytical development life cycle. Even though there are different approaches for data mining and developing analytical models, CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely used methodology for analytical projects, including advanced analytics.
As per the CRISP-DM process, the “Data Preparation” phase consists of activities to prepare the final data set from raw data received from multiple sources. Data modeling should be part of the “Data Preparation” phase as it aligns with other activities like gathering the data, discovering and assessing the state of the data, and transforming and enriching the data to meet the use case needs. Close collaboration between the data engineer, data modeler, and data scientist ensures this data preparation phase is successful.
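The "assessing the state of the data" activity can be illustrated with a small Python sketch that separates usable records from rejects and records why each reject failed. The record layout, the two rules (required fields present, no negative energy values), and the `assess_readings` function are hypothetical examples, not a prescribed rule set.

```python
def assess_readings(readings, required=("meter_id", "hour", "kwh")):
    """Split raw records into usable rows and rejects with a reason."""
    clean, rejects = [], []
    for row in readings:
        missing = [key for key in required if row.get(key) is None]
        if missing:
            rejects.append((row, "missing: " + ", ".join(missing)))
        elif row["kwh"] < 0:
            rejects.append((row, "negative kwh"))
        else:
            clean.append(row)
    return clean, rejects


# Hypothetical raw records as they might arrive from a source system
raw = [
    {"meter_id": "M1", "hour": 0, "kwh": 1.2},
    {"meter_id": "M2", "hour": 0, "kwh": None},
    {"meter_id": "M3", "hour": 0, "kwh": -4.0},
]
clean, rejects = assess_readings(raw)
```

Keeping the reject reason alongside each record gives the data engineer, data modeler, and data scientist a shared, reviewable view of data quality before any transformation is applied.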
Data modeling beyond analytics
As mentioned above, the scope of data modeling extends beyond analytics. Take the example of system integration. The challenges with integrating different systems are many and begin with the way the systems are procured. When a specific vendor product is purchased, vendors are driven by the procurement process to meet user requirements at the lowest cost. Each of the acquired applications has a unique mixture of platform technologies, databases, communications systems, data formats, and application program interfaces. While utilities prefer products that support industry-standard interfaces, another high priority is for product vendors to supply application interfaces that remain relatively stable across product releases.
Even though it may not be practical to expect every system-to-system interaction developed in the organization to use a standardized message model, having an enterprise semantic model helps to start the discussion about the data and drive towards message standardization.
Take the example of asset and grid connectivity information, which is vital across many enterprise systems such as asset management, work management, mapping and GIS, mobile applications, engineering and planning, and more. At SCE, we have developed a system of truth for Grid Connectivity information, integrating data from GIS, asset management, EMS, and other operational applications. The system provides the information through APIs (Application Programming Interfaces) using a SCIM-based standard exchange model. These APIs provide a complete set of electrical network connectivity that includes Transmission, Sub-Transmission, Distribution Primary, Distribution Secondary, and Substation Internals, serving multiple application needs using a standards-based message model.
Figure 3: Grid Connectivity Information - System of Truth - Common model used as a physical data model and exchange model.
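To show why connectivity is so widely reused, the Python sketch below traces everything downstream of a given node in a toy network. The node names, the `feeds` structure, and the `trace_downstream` function are hypothetical; they are not the SCE APIs or their data format.

```python
from collections import deque

# Hypothetical connectivity: each key feeds the listed downstream nodes.
feeds = {
    "Substation-A": ["Feeder-1"],
    "Feeder-1": ["Transformer-1", "Transformer-2"],
    "Transformer-1": ["Meter-1", "Meter-2"],
    "Transformer-2": ["Meter-3"],
}


def trace_downstream(feeds, start):
    """Return every node reachable downstream of `start`, breadth-first."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in feeds.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


downstream = trace_downstream(feeds, "Feeder-1")
# downstream == ["Transformer-1", "Transformer-2", "Meter-1", "Meter-2", "Meter-3"]
```

The same trace answers different questions for different consumers: which customers are affected by an outage, which meters roll up to a transformer for load profiling, or which assets belong to a planning study area.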
Data modeling can also help to create a standard view of data across the enterprise, enabling data quality and data governance efforts. This includes defining common terminology, semantics, and implementation, along with developing semantic traceability and lineage of data maintained across the organization.
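Common terminology and lineage can be pictured as a mapping from source-system fields to terms in the common model. In the Python sketch below, the source names, field names, common terms, and the `to_common` function are all hypothetical and serve only to show the shape of such a mapping.

```python
# Hypothetical mapping from (source system, field) to a common model term.
SEMANTIC_MAP = {
    ("gis", "XFMR_ID"): "Transformer.mrid",
    ("asset_mgmt", "equip_no"): "Transformer.mrid",
    ("gis", "KVA"): "Transformer.rated_kva",
}


def to_common(source, record):
    """Rename source fields to common terms and record their lineage."""
    common, lineage = {}, {}
    for field_name, value in record.items():
        term = SEMANTIC_MAP.get((source, field_name))
        if term is None:
            continue  # unmapped fields are left out of the common view
        common[term] = value
        lineage[term] = f"{source}.{field_name}"
    return common, lineage


common, lineage = to_common("gis", {"XFMR_ID": "T1", "KVA": 50, "COLOR": "grey"})
```

Here two systems that name the same identifier differently resolve to one common term, and the lineage dictionary records where each value came from.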
Data modeling not only helps to validate understanding of the data between business and IT but is also a very useful tool to analyze and extract value from available data. It constitutes a crucial step in the analytics development cycle. Focusing on this step enables electric utilities to manage data systematically through the data lifecycle (capture, organize, analyze, and deliver) to achieve the desired outcome.
While venturing into analytics or major system integration projects, organizations need to focus on their Enterprise Information Management (EIM) strategy. A well-designed EIM requires business units and IT to look at enterprise data and information as assets to understand the nature of the information and how it is used and controlled. This effort includes addressing critical issues around data definition, quality, integrity, security, compliance, access and generation, management, integration, and governance. These issues are interrelated and systemic, requiring business units and IT to work together to understand and solve the challenges. The process is iterative and requires a holistic, evolutionary EIM strategy and framework to ensure a consistent and practical approach.
While utilities are looking ahead to incorporating unstructured data, like HD drone video and LiDAR of assets, to better understand asset condition, there is plenty of structured data available in the organization that can enable advanced analytics, both real-time and historical. Furthermore, tremendous benefits can be derived by connecting structured data with unstructured information to enable deep and valuable insights. Having a comprehensive Enterprise Information Model, the SCIM, along with an EIM strategy, is helping SCE take advantage of these precious data assets.