GRID ARCHITECTURE ARTICLE 4: How Do We Architect a Resilient Grid?

Posted to GridIntellect, LLC – A Veteran-owned Company in the Digital Utility Group
Lead Architect, eaaS, Siemens Smart Infrastructure CTO

2021 Cleanie Award winner. Siemens Smart Infrastructure CTO Office, technologist, distributed energy expert, researcher, author, and climate change warrior. Genuinely focused on doing good for...

  • Sep 8, 2020

by Eamonn McCormick, Stuart McCafferty & David Forfia

"Resilience should be measured from the customer's perspective." Jeff Taft, creator of Grid Architecture


Figure 7:  The Climate Center initiative to build clean and smart community microgrids to build Community Energy Resilience[1]

This fourth GRID ARCHITECTURE ARTICLE introduces how the electric power industry measures resilience differently from every other industry. It also presents DOE’s deconstruction of resilience from the customer's viewpoint and from various points on the grid. Just a warning - some of this may “hurt your head” since it was designed by some really smart PhDs, but their framework for how to think about AND measure resilience is pragmatic and simple enough for even us simple-minded folk.

Grid Architecture was developed by PNNL for the DOE as a framework that the industry can use to solve its most pressing problem: transitioning to a cheaper, cleaner, and more sustainable grid.

What Do We Mean by Resilience?

As our economy and social fabric become more and more dependent on electricity, the expectation that quality power will always be there has percolated into the customer mindset. Power has become a basic need. If Maslow were alive today, his Physiological (the most basic) needs would be Food, Water, Shelter, and Electricity! Even short outages or poor power quality can have devastating financial, health, computing, and performance effects, just to name a few. Resilience is a very important concept.

According to PNNL, grid resilience is the ability to avoid or withstand grid stress events without suffering operational compromise or to adapt to and compensate for the resultant strains to minimize compromise via graceful degradation. In other words, resilience is the ability to avoid or limit operational degradation to the grid or electricity in the event of an unexpected incident.  The authors believe there is one additional aspect to resilience, and that is the ability to recover quickly when degradation or outages do actually occur.

Today’s grid infrastructure is controlled by a web of centralized control systems: Energy Management Systems (EMS) for transmission and (Advanced) Distribution Management Systems (DMS/ADMS) for distribution, combined with Supervisory Control and Data Acquisition (SCADA) systems. These systems have worked well for many years in a top-down, one-way power flow, fossil fuel bulk generation, centralized power system. But times are changing. As Distributed Energy Resources (DER) are rapidly adopted, Distributed Generation (DG) and energy storage are moving closer to the loads they serve. Obviously, this presents some challenges for our centralized control systems, but it also presents a real opportunity to increase grid resiliency. In California, AB-2868, passed in 2016, directed the California Public Utilities Commission (CPUC) to require the three Investor Owned Utilities (IOUs) – Pacific Gas & Electric, Southern California Edison, and San Diego Gas & Electric – to pursue up to 500MW of distributed energy storage systems on their distribution networks. This requirement serves two purposes: 1) maximize renewable energy, and 2) increase system resilience.

Of course, we can go much more granular than the California example just cited. Everyone is aware of the massive uptake of DERs by residential and Commercial & Industrial (C&I) customers, particularly solar photovoltaic and energy storage systems. This trend is accelerating, and there is currently a shortage of solar PV panels on the market, partly due to COVID-related manufacturing hiccups but also because demand for solar is so high.

Figure 8: Laminar/Hierarchical Control Paradigm[2]

Operating these systems while optimizing globally for resilience and other desirable characteristics (e.g. maximum use of renewables, efficiency, security) will require a different kind of control orchestration. The figure above shows PNNL’s example of parent/child relationships for control instances. Unfortunately, this screen shot doesn’t do PNNL’s work justice, since they provide an animation showing the messaging and orchestration between these control instances. The takeaway from the image is that the idea of distributed control is becoming mainstream. IoT standards such as the Open Field Message Bus (OpenFMB) support this paradigm. This year, OpenFMB will publish testing and compliance supporting standards, a test harness, and a test bed to move into commercial solutions. With a distributed control paradigm, outages can be contained by allowing electrical subsystems (circuit segments) to operate independently when upstream systems are compromised, thus increasing resiliency by reducing the number of customers affected.
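To make the outage-containment idea concrete, here is a minimal Python sketch that counts affected customers on a radial feeder with and without segment-level independent operation. Everything in it is an illustrative assumption rather than part of PNNL's work or OpenFMB: the Segment class, the can_island flag, the customer counts, and the simplification that an islanding-capable segment can always carry its own load.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Segment:
    """One circuit segment on a radial feeder (hypothetical model)."""
    name: str
    customers: int
    can_island: bool  # assumed: local DER and controls can carry the segment alone


def customers_out(segments: Sequence[Segment], failed_index: int,
                  distributed_control: bool) -> int:
    """Count customers who lose power when the segment at failed_index faults.

    Segments are ordered from the substation outward, so a fault cuts the
    upstream supply to every segment at or beyond failed_index.
    """
    out = 0
    for i, seg in enumerate(segments):
        if i < failed_index:
            continue  # still fed from the substation
        if i == failed_index:
            out += seg.customers  # the faulted segment itself is de-energized
        elif distributed_control and seg.can_island:
            continue  # downstream segment rides through on local resources
        else:
            out += seg.customers
    return out


# Made-up feeder: four segments, the last two equipped to operate independently.
FEEDER = [
    Segment("A", 400, False),
    Segment("B", 300, False),
    Segment("C", 250, True),
    Segment("D", 150, True),
]

if __name__ == "__main__":
    print("centralized:", customers_out(FEEDER, 1, distributed_control=False))
    print("distributed:", customers_out(FEEDER, 1, distributed_control=True))
```

In this toy case a fault on segment B affects only B's customers when downstream segments can operate on their own, instead of everyone from B outward.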

Another key factor to consider when developing a resilient grid is the effect climate change could have on infrastructure and energy usage. As temperatures increase, the amount of energy needed for cooling will increase accordingly. If storms happen more often and/or are more powerful, grid assets will be more vulnerable to damage. As climate change impacts our society, we need to build in more grid resilience.

Wikipedia describes Climate resilience as “the capacity for a socio-ecological system to: (1) absorb stresses and maintain function in the face of external stresses imposed upon it by climate change and (2) adapt, reorganize, and evolve into more desirable configurations that improve the sustainability of the system, leaving it better prepared for future climate change impacts.”[3] Resilience of the grid is therefore the ability of the grid to absorb stresses and maintain function in the face of external stresses imposed upon it and the ability to evolve to more desirable configurations in the face of changing threats.

In most industries where there are single or separated systems, resilience and reliability are calculated the same way. Reliability and resilience calculations are based on the mean time between failures (MTBF). This approach determines the number of failures for components and systems over some time period. MTBF is the predicted elapsed time between inherent failures of a mechanical or electronic system during normal system operation. System Reliability Theory is based on a function of component MTBFs, structure, and time window:

System Reliability = f (component MTBFs, structure, time window)

Mathematically, this is calculated using the following formula:

System reliability = e^{-T/MTBF}
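As a quick illustration of the formula above, the following Python sketch evaluates it for a single component and for a simple series arrangement. The function names, the 8,760-hour window, and the MTBF figures are illustrative assumptions, not values from PNNL. The series rule (multiply the component reliabilities) is the standard result for independent components with constant failure rates and is only one example of how "structure" can enter the calculation.

```python
import math
from typing import Sequence


def reliability(time_window_h: float, mtbf_h: float) -> float:
    """Probability of running through the time window without failure: e^(-T/MTBF)."""
    if mtbf_h <= 0:
        raise ValueError("MTBF must be positive")
    return math.exp(-time_window_h / mtbf_h)


def series_reliability(time_window_h: float, mtbfs_h: Sequence[float]) -> float:
    """Series structure: every component must survive, so reliabilities multiply."""
    return math.prod(reliability(time_window_h, m) for m in mtbfs_h)


if __name__ == "__main__":
    one_year_h = 8760.0  # assumed time window: one year of operation
    print("single component:", reliability(one_year_h, 50_000.0))
    print("two in series:   ", series_reliability(one_year_h, [50_000.0, 100_000.0]))
```

Note that when the time window equals the MTBF, the reliability is e^{-1}, or roughly 37%, which is why MTBF should not be read as a guaranteed lifetime.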

In the electric power industry, resilience and reliability are not the same.  We’ve all heard it and probably said it – “Reliability is about keeping the lights on.  Resilience is about planning for and recovering quickly from disruptions.”

In the Grid Architecture methodology, resilience is measured from the customer's perspective and is evaluated based on impacts to the customer. Grid resilience is therefore defined at a particular point of delivery for energy supply, meaning that resilience values differ depending on where they are calculated. In System Resilience Theory, “component determinants” replace “component MTBFs”:

System resilience = f (component determinants, structure, time window)

Mathematically, this is calculated using the following formula:

System resilience = e^{t(1-d)/T}

where: t = time; d = component determinant; T = time horizon

PNNL has named the “component determinants” in the function above “d-Blocks” in its Grid Architecture methodology. d-Blocks are abstractions of one or more grid components based on their contribution to system resilience at a delivery point. d-Block values are dimensionless and normalized. The methodology assigns a resilience contribution value to each d-Block, where 0 is no contribution and 1 is very high contribution. So, obviously, the higher the d-Block value, the more it contributes to system resilience. How d-Blocks interact with one another determines how their values are combined mathematically (additively, multiplicatively, etc.). The math can get somewhat complicated because of the distributed, system-of-systems nature of the grid, but it allows resilience to be calculated for whole regional grids, distribution grids, or subsets of grids all the way down to component levels.

This may seem like an abstract concept, but being able to measure resilience has implications for the design of the grid and for how we assign economic value to elements that contribute more to resilience. If we can calculate resilience, utilities could charge customers for more of it. Utilities could also use resilience calculations in rate base conversations with the Commissions that are requiring better grid resilience. As an example, creating parallel, redundant components and structures creates higher resilience. Part of the PNNL methodology is to define the d-Blocks and string them together to derive the net resilience of the related d-Blocks in various configurations. It may be possible to change the relationships between interconnecting d-Blocks to increase resilience and to link that measurable increase in resilience to incentives.

So how would this relate to specific cases? In the case of a simple radial feeder, for example, we can potentially link feeders together so that loads are served redundantly. Linking also increases the d value from the perspective of the load. By understanding how we can change the overall resilience of the system, we can determine the economic cost and calculate the resilience value for different configurations and scenarios to find “best value” resiliency solutions. The resilience value can then be factored into the rate base or the market value of the transmission capacity.
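The feeder-linking case can be sketched in a few lines of Python. This article does not spell out PNNL's actual d-Block combination rules, so the sketch borrows the series and parallel rules from classical reliability block diagrams as a stand-in. Those rules, all d values, the option names, and the costs are assumptions made purely for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def series(d_values: Sequence[float]) -> float:
    """Assumed rule: blocks in a chain combine multiplicatively."""
    return math.prod(d_values)


def parallel(d_values: Sequence[float]) -> float:
    """Assumed rule: redundant paths combine as 1 - product of (1 - d)."""
    return 1.0 - math.prod(1.0 - d for d in d_values)


def best_value(baseline: float, options: Dict[str, Tuple[float, float]]) -> str:
    """Return the option with the largest gain in d per dollar spent.

    options maps a name to (resulting d value, cost in dollars).
    """
    return max(options, key=lambda name: (options[name][0] - baseline) / options[name][1])


# Hypothetical radial feeder seen from one load: substation -> feeder -> lateral.
RADIAL = series([0.9, 0.8, 0.95])

# Option 1: tie in a second, similar feeder so the load has two supply paths.
LINKED = series([0.9, parallel([0.8, 0.8]), 0.95])

# Option 2: add local storage as a weaker alternate path (d = 0.6).
STORAGE = series([0.9, parallel([0.8, 0.6]), 0.95])

OPTIONS = {
    "feeder tie": (LINKED, 250_000.0),    # made-up cost
    "local storage": (STORAGE, 400_000.0),  # made-up cost
}

if __name__ == "__main__":
    print("radial d:", RADIAL)
    print("linked d:", LINKED)
    print("best value option:", best_value(RADIAL, OPTIONS))
```

Under these assumed rules, adding a redundant path raises the d value seen by the load, and dividing the gain by the cost gives a simple way to rank options. A real study would use PNNL's combination rules and actual project costs.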

Figure 9:  Resilience Calculations for Centralized, De-Centralized, and Distributed Systems

As we re-examine the grid, we can use this approach to compare the resilience of different combinations of d-Blocks in various configurations. As we move to more distributed batteries and DER, we can recalculate the effective resilience of the system on an ongoing basis. PNNL has defined different scenarios to compare decentralized, centralized, and distributed cases. PNNL has shown mathematically that distributed approaches hold great promise for improving resilience over the centralized system we have today, as long as we are prepared to architect for resilience and align the incentives to create it. This is great news! Some camps in the industry believe high-penetration DER will decrease grid resilience. In fact, PNNL's analysis points to the opposite conclusion. DER done right should increase resilience.

This is especially important as climate change challenges us to increase grid resilience. As climate events become more frequent and powerful, resilience of the grid becomes even more important. Improving climate resilience involves assessing how climate change will create new, or alter current, climate-related risks, and taking steps to better cope with these risks. The PNNL Grid Architecture resilience approach can be used to identify the best value resilience opportunities.

PNNL further clarifies the types of resilience measures as:

  • Avoidance – resilience measures that are used to reduce the number of system outages.
    • Vegetation management
    • Undergrounding
    • Pre-emptive disconnect (wind and other events)
    • Connectivity limitation for cyber protection
  • Resist (hardness) – built-in hardness/elasticity to either allow the grid to “not budge” or to “snap back” when the stress subsides.
    • Withstand environmental stresses such as wind, rain, heat, cold, fire, or flooding
    • Withstand electrical stresses such as overcurrent or thermal limits
  • Shock absorbance – dynamically apply adjustments to keep the grid within operational limits.
    • Volt/VAR regulation
    • System frequency regulation
    • Computing capacity elasticity and scaling such as that found in cloud-based solutions and discussed in the EnergyIoT Reference Architecture series.


The PNNL approach shows how to calculate a resilience value mathematically. The fuzzy notion of resilience can be treated quantitatively to determine which architectures are more resilient and to compare how much individual components contribute relative to one another. The process helps us understand and prepare the grid as we transition from a centralized model to a more distributed and resilient future. The ability to architect for resilience using quantitative methods, and to derive mechanisms to pay for resilience, is a very important societal issue. As COVID-19 has demonstrated, we did not have a resilient public health system in the US to address a pandemic. The result was grave social, health, and economic harm. Building resilience into the future grid is no less important than building a more robust and resilient public health system.


[1], August 2020

[2] PNNL, Grid Architecture Training, Dr. Jeff Taft, 2019

[3] Wikipedia, “Climate resilience”, Aug 2020

Matt Chester on Sep 8, 2020

This is fascinating and so informative-- thanks for sharing, Stu. I'm going to be bookmarking this article to keep this as a point of reference for myself!

Stuart McCafferty on Sep 9, 2020

Thanks Matt!  Hope all is well with you.
