In the last few years, the primary focus for discussions of the cloud in NERC circles has been the question, “What changes need to be made in the CIP requirements and definitions, so that a NERC entity will not be at risk of non-compliance when using the cloud?” The answer to that question is easy: Nothing in the CIP requirements today even mentions the cloud, let alone forbids its use. However, there are a small number of requirements and definitions that were developed without the cloud in mind; these unintentionally impose on cloud providers an impossible burden of documenting compliance with a large number of irrelevant CIP requirements.
In this recent post, I identified seven small changes that I believe will make it possible for NERC entities with high and medium impact systems to deploy or use BES Cyber Systems (BCS), Electronic Access Control or Monitoring Systems (EACMS), and Physical Access Control Systems (PACS) in the cloud (entities with low impact systems are already free to use the cloud however they want). I may be wrong about exactly which changes are required, but I’m sure only a small number are needed. Unfortunately, the current “Cloud CIP” Standards Drafting Team seems to have decided that only a comprehensive set of changes to almost all the current CIP requirements will work. It will take years to develop and get approval for those changes, so I can’t say when or if my small set of changes will ever see the light of day.
However, a far more important question needs to be answered: What needs to be done to give both the electric power industry and the North American public confidence that deploying systems that operate the Bulk Electric System (BES) in the cloud is safe?
Until perhaps a few months ago, I thought the answer to that question was simple: Just about every industry is making heavy use of the cloud, including critical industries like oil and gas, transportation, banking, electronics, etc. Those industries all take precautions; they clearly feel those precautions make it safe for them to use the cloud. Since the electric power industry could take those same precautions, why shouldn’t the cloud be safe for it as well?
However, the electric power industry is different from the others. In other industries, the risks are mainly to the individual organizations that use the cloud; if something bad happens to one organization because of their use of the cloud, that will usually not have a huge impact on other people or organizations.
However, this is not true if you consider an electric utility’s OT systems. Electric utilities and independent power producers (IPPs), as well as organizations like Independent System Operators and Regional Transmission Organizations, cooperatively run the BES. The BES covers the Continental US, part of Mexico and the Canadian provinces that border the US. It consists of four separate Interconnects: Eastern, Western, ERCOT (most of Texas) and Quebec. These are collectively referred to as the North American “grid”, but for our current purposes we need to refer to each Interconnect as its own grid. A disturbance in any part of one of the four grids could easily impact other parts of that grid, but it will not automatically impact any other grid.
The interconnectedness of the systems that run the BES is unique, and that’s why the question of whether the cloud is safe for control systems is very different for the electric power industry than for any other critical infrastructure industry, such as petroleum refining. For example, what if a large refinery’s control systems are all deployed in one cloud, which then experiences widespread outages like those that crippled AWS last October? The refinery will probably need to shut down, easily costing its owner millions of dollars for every day the shutdown continues. There could also be an explosion in which people in or near the plant are injured or killed. But the immediate effects of the explosion will not be felt even 25 miles away.
However, what if the grid monitoring systems used in the Control Center of a large electric utility are deployed in one cloud and they all go down in a widespread cloud outage? And what if the normal failover to the backup Control Center (whose systems are all located in one or more buildings, not in the cloud) doesn’t work[i]?
At the same time, suppose there is a serious deficiency in power supply in the area controlled by the utility (which could be a metropolitan area or even a multistate area). Because the operators in the Control Center can’t recognize this deficiency without the monitoring systems, they don’t take the emergency actions, such as shedding load, that would keep the grid from collapsing in their control area. The grid in the utility’s control area collapses, and that area immediately starts “sucking” power from nearby parts of the Interconnect; those parts in turn suck power from other parts of the Interconnect, and so on. This is called a cascading outage. The end result is that a sizable portion of the Interconnect is blacked out.
The above scenario (minus the cloud, of course) is roughly what happened at the start of the 2003 Northeast Blackout. The direct cause of that event was that a hot day in August, with a lot of air conditioner use, caused overloaded transmission lines bringing power to northeastern Ohio to sag (a normal reaction to heat in power lines) and contact trees, thus shorting them out.
Meanwhile, in the main Control Center of FirstEnergy, the utility that controlled that region, a software bug rendered the alarm system inoperable. However, the operators in the Control Center didn’t know the alarm system was down, and they didn’t take emergency actions because the screens they were watching didn’t indicate any problems. This led to a cascading outage starting at 4:05 PM in northeastern Ohio, which within a second or two started spreading beyond Ohio.
By 4:13 PM, eight minutes later, the cascade had ended, but the damage was done. The resulting blackout affected 55 million people and killed almost 100; 508 generating units at 265 power plants were shut down. Twelve major cities or metropolitan areas, large parts of eight states, and Ontario up to Hudson Bay were blacked out. Some cities (including New York City and Toronto) were almost completely blacked out for 36 hours. A significant portion of the Eastern Interconnect was lost.
Keep in mind that the primary cause of the blackout was the loss of the alarm system in FirstEnergy’s Control Center (although there were contributing causes, such as the fact that FirstEnergy had not been in compliance with NERC’s then-voluntary standard for “vegetation management”, i.e., trimming trees under transmission lines). If that one system had been deployed in the cloud and there had been a widespread cloud outage on August 14, 2003 (so that normal cloud redundancy might not have kicked in), the Northeast Blackout would probably still have occurred.
Why does the same cloud outage lead to hugely different results when the control systems that are lost belong to a large oil refinery than when they belong to a large electric utility’s Control Center? It is because the Control Center not only operates the grid in its local control area; it is also part of the Bulk Electric System. The effects of a severe disturbance in that control area can propagate well beyond its borders.
And now, a word from our sponsor
Tom Alrich’s Blog, too, is a reader-supported publication. You can view new posts for two months after they come out by becoming a free subscriber. You can also access my 1,300 existing posts dating back to 2013, as well as support my work, by becoming a paid subscriber for $30 for one year (and if you feel so inclined, you can donate more than that). Please subscribe!
This is why many NERC entities believe it is simply too risky to allow deployment or use of systems that operate, monitor or secure the BES in the cloud. Note this isn’t the same thing as saying, “The cloud is too risky for my BES systems.” Instead, it’s saying, “The cloud is too risky for any NERC entity’s BES systems.” Given this opposition (which is probably shared by a lot of knowledgeable people inside and outside of government), we can’t push off the question of whether the cloud is safe for the BES to another day. We need to start answering the question as soon as possible, even though a reasonably comprehensive answer will probably take years to develop.
However, there’s another consideration: The electric power industry no longer has the option of not using BES systems in the cloud at all[ii], for these reasons:
1. Software developers are increasingly moving to exclusively cloud-based delivery. One major software developer (that offers both on-premises and cloud-based options) told me it is no longer possible to make money selling exclusively on-premises software.
2. Developers that continue to offer an on-premises version of their product often do so with the stipulation that all future functionality upgrades will be included in the cloud version only, although bugs and vulnerabilities in the on-premises version will continue to be fixed.
3. Besides being less functional than the cloud version, a developer’s on-premises version is sometimes much more expensive. In this post, I described how a large electric utility was quoted a price of $80,000 for the cloud version of a product but $800,000 for the on-premises version.
4. Software-as-a-service (SaaS) is inherently more secure than on-premises software. This is especially true regarding software vulnerabilities. When a developer discovers a vulnerability in the SaaS version of their product, they only need to apply the patch once for all their customers; they don’t have to plead with their customers to apply any patch. In fact, the customers won’t usually even know there was a vulnerability[iii], let alone need to apply the patch themselves.
5. Many cyber and physical security monitoring services, such as SecureWorks, have never been available other than in the cloud. These services often have a significant advantage over on-premises security monitoring software. For similar reasons, on-premises security monitoring products such as SIEMs are increasingly moving to the cloud.
How can we decide if the cloud is safe for BES systems?
Is the cloud safe for BES systems? As you might have guessed, there is no up-or-down answer to this question, especially since it’s clear that electric utilities and IPPs will increasingly have little choice but to utilize or deploy at least some BES systems in the cloud. Instead, we need to answer three questions:
1. What cybersecurity risks originate in the cloud? In other words, what cyber risks apply to systems deployed in the cloud that do not apply to systems deployed on premises? For example, one risk is that a widespread cloud outage will overwhelm at least some of the normal redundancy built into the cloud. That seems to have happened in the AWS outage last October. There are many other “purely cloud” risks as well; moreover, new cloud risks are identified all the time.[iv]
2. What mitigations are available for each of those risks and who needs to apply them? In most cases, the “platform” cloud service provider (CSP) or SaaS provider will need to identify mitigations for cloud risks and apply them. Cloud users can do very little to mitigate cloud risks, other than avoid the cloud altogether.
3. Finally, is there even one cloud risk that is too great to be effectively mitigated? For example, if there were a serious risk that North Korea would take control of one of the major CSPs without customers knowing it, that would probably be an “un-mitigatable” risk. If there is one such risk, it might be best to prohibit all use of cloud-based OT systems by NERC entities.
Fortunately, I suspect the answer to the third question is no. It seems likely that any un-mitigatable cloud risk would have been discovered by now; if one had been discovered, cloud customers would have moved back on premises in massive numbers. Of course, exactly the opposite has happened.
However, this doesn’t mean we don’t need to answer questions 1 and 2. We must identify cloud risks and determine mitigations for them. We must also try to ensure that cloud providers have applied, or are currently applying, needed mitigations. Since the cloud providers are the only organizations that can knowledgeably discuss cloud risks and identify mitigations for those risks, providers need to be “at the table” for all these discussions.
I used to think that these cloud risks would be addressed by the Risk Management for Third-Party Cloud Services Standards Drafting Team. After all, their Standards Authorization Request (SAR) includes the following passage as part of its Detailed Description of the Project Scope:
The Drafting Team will consider risks related to cloud services for CIP applicable systems, including but not limited to:
o Procurement / supply chain controls
o Reliability / operational risk / resilience
o Compliance / enforcement risk
o Data sovereignty
o Life cycle
o Key management
o Cloud ramping / communications
o Concentrated Span of control
o Reliance on indirect services
o Multi-tenancy
o Regional considerations
o Blackstart scenarios
(Note that none of these risks are addressed by the current CIP requirements. This is because most of the current requirements were developed in 2010-2012, when the cloud was considered an experimental technology by NERC, FERC, and most NERC entities. At that time, it was almost inconceivable that there would ever even be discussion of locating systems that operate or monitor the BES in the cloud).
However, in the last 2-3 months, I have come to realize that cloud risks will never be addressed through new CIP requirements. Here are my reasons for saying that:
1. Addressing cloud risks in CIP will take a long time. Developing requirements to address even one of the above risks, and getting those requirements approved by a) the NERC lawyers, b) the NERC Ballot Body, and c) FERC, will require a huge effort by the SDT. Toward the end of this post, I described what that effort will entail in a section titled “What cloud risks does CIP need to address?” I estimated that just addressing one of the above risks will require 250 hours of the SDT’s time.
The problem is that this year, the SDT met for less than 150 hours. This means it will take the SDT more than a year and a half to adequately address each cloud risk. Thus, addressing the 12 risks identified by the SDT, plus four risks I’ve identified (and there are certainly many more that could easily be identified), will take the SDT about 27 years. Even if my 250-hour estimate is 100% too high, it will still take 13 ½ years to address the 16 risks (the arithmetic is spelled out after this list).
Moreover, consider that the SDT has a lot more on its plate than cloud risks (most of that work isn’t necessary, but that’s a subject for another post). I estimate that work will require another 5-8 years. Therefore, it’s not an exaggeration to say that almost all the SDT members, even the youngest ones, will be retired before the new standards come into effect, if the SDT addresses everything it has said it will address. Of course, this will never happen.
2. In fact, it will literally take forever. Even if the SDT could develop and get approval for new requirements for the 16 cloud risks in, say, five years, they would quickly discover that new cloud risks are being identified all the time. Somewhat like Sisyphus, they will continually roll the boulder up a steep hill, only to find themselves at the base of an even steeper one. This will go on forever, until either the cloud is no longer used (highly unlikely, of course) or all possible cloud risks have been identified and mitigated (equally unlikely).
3. The SDT hasn’t even started to address cloud risks. When I saw the revised SAR that was approved in December 2024, I was quite pleased to see it listed 12 cloud risks that the SDT planned to address. However, even though I attended a number of the SDT’s meetings over the last year, I never once heard any discussion of these risks or saw any evidence they had been discussed in the meetings I missed. Moreover, the white paper that the team released for comment in early December makes no mention of these risks, other than to identify the standard (CIP-116) that will address “cloud-specific risk mitigation”. In other words, 1 ½ years after it started work, the SDT has yet to seriously discuss any true cloud risks.
4. It will be impossible to produce compliance evidence. As I mentioned earlier, none of the cloud-only risks can be mitigated by NERC entities (although they may play a secondary role in a few cases). Therefore, all the compliance evidence will need to be produced by the cloud service providers themselves. However, I have heard those providers (both platform CSPs like AWS and Azure, and SaaS providers) say they will never be able to provide evidence specific to any individual NERC entity. The most they will ever do is provide audit reports for ISO 27001 certification or FedRAMP authorization; ironically, those standards don’t address most cloud risks, since the controls they assess apply to both on-premises and cloud-based systems.
5. Requirements can’t be developed without the cloud providers being at the table. As I said above, only experts from the cloud providers (or people who have worked for them) can discuss how to mitigate cloud risks. Therefore, they would have to be part of the team that is developing requirements that address those risks. Of course, this would never be allowed for a NERC SDT, which needs to be composed of representatives from NERC entities.
6. Contract terms, questionnaires and audits aren’t options. Most cloud providers don’t allow customers to negotiate contract terms, since this would be a nightmare for them. Similarly, they don’t answer individual customers’ questionnaires, and they certainly won’t allow any customer to audit them. Without these three tools, there is no way a NERC entity can be assured that any cloud provider has mitigated, or is in the process of mitigating, any cloud risk.
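To make the arithmetic behind reason 1 explicit (the inputs are my own rough estimates, so treat the result as an order-of-magnitude illustration, not a forecast):

250 hours per risk ÷ 150 SDT meeting hours per year ≈ 1.7 years per risk

16 risks × 1.7 years per risk ≈ 27 years

Even if the real figure is only 125 hours per risk, 16 × (125 ÷ 150) ≈ 13 ½ years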
So, what’s Plan B?
It will never be possible to say it is safe to utilize cloud-based systems to monitor or control the BES unless the power industry, working with cloud providers, has identified cloud risks as well as mitigations for them, and has been assured those mitigations are either in place or in progress. Therefore, I’m suggesting there needs to be a separate committee to accomplish these tasks. Here are some of my ideas for the committee:
A. It will be composed of staff members from electric utilities, IPPs, ISO/RTOs, platform CSPs, SaaS providers with power industry customers, cloud security experts, and basically anybody else who has an interest in this topic. While it is important to try to get representatives from each of these groups, there will be no attempt to have a fixed number from any group.
B. None of the committee members will be there as representatives of the organization they work for, or even the industry they are part of. They will always “represent” just themselves.
C. The committee cannot be part of the NERC standards enforcement process, although CIP auditors and other NERC ERO staff may join, just like any other member of the public.
D. The committee could operate under the NERC E-ISAC, if there is interest there in sponsoring it.
E. If the committee is outside of NERC, it might be sponsored by one or more trade associations, electric utility companies, “platform” cloud service providers, software developers, SaaS providers, or any other interested parties (even US or international government agencies would be welcome, if they don’t attach strings to the donation).
F. The committee could be part of a nonprofit organization, so it can accept tax-deductible donations. I could get the project set up under OWASP, since I already lead one OWASP project. OWASP is a 501(c)(3) nonprofit. Donations to OWASP can be “restricted” to a particular project.
G. The first task of the committee will be to identify as many potential cloud-based risks as possible, including the ones already identified above. Many suggested risks will turn out not to be important enough to deal with, because their likelihood or impact is so low that the amount of risk is negligible. For example, the risk of nuclear war wiping out a large number of cloud data centers in North America is certainly real, but – hopefully – is so unlikely that it doesn’t need to be considered.
H. When a real risk is identified, the whole committee, or perhaps a subcommittee[v], will first document it in a short description, including a discussion of its likelihood and impact.
I. Then, the whole committee or subcommittee will start discussing mitigations for the risk. Cloud provider employees or contractors, either current or past, will play the most important role in these discussions, since they are most likely to identify realistic mitigations.[vi]
J. After possible mitigations for a cloud risk have been identified, they will be included, along with the description of the risk, in a “BES Cloud Risk Report”, which will be made available to all interested parties (a purely illustrative sketch of what one entry in such a report might look like appears after this list).
K. As new reports are released or existing reports are updated, they will be compiled into a single online volume, which will be made available to all interested parties.
L. Of course, cloud providers will be encouraged to apply mitigations for all risks where applicable, but the committee will never require application of any mitigation or attempt to shame a provider into applying a mitigation.
M. Cloud providers will also be encouraged to regularly publish a report to customers and interested parties which discusses the mitigation(s) they have applied for each cloud risk so far. However, no provider will be required to publish any report. It is likely that, as cloud providers start publishing these reports, more providers will follow suit, rather than cede an advantage to their competitors. Not too long ago, very few software developers reported new vulnerabilities to the CVE program. Now, most new CVEs are reported by the developers of the affected products. There may be a similar result in the case of cloud risks.
N. The committee will not publicize which cloud providers have taken steps to mitigate which cloud risks.
O. New cloud risks are being identified all the time. Moreover, it is likely that the number of newly identified risks will itself increase every year; in other words, the yearly increase will keep growing (in calculus terms, the second derivative of the cumulative count is positive). This has been the case with new CVE records. In 1999, the year that CVE was introduced, about 300 CVEs were identified. In 2022, a little less than 25,000 new CVEs were identified, while in 2025 about 50,000 were identified. Thus, it took 23 years before 25,000 CVEs were identified in one year, but only three more years before twice that number were identified in a year. Today, there are around 311,000 total CVE records, and it is likely that more than 50,000 new CVE records will be added in 2026.
P. Therefore, the committee will need to remain “in session” for the foreseeable future. It is possible there will need to be a formal program to catalog and publicize cloud risks, similar to the CVE Program.
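To make item J a little more concrete, here is a purely hypothetical sketch of what a single entry in a “BES Cloud Risk Report” might look like. The layout, the field names and the example mitigations are my own invention; the committee would of course decide what actually goes into a report:

o Risk: A widespread outage in one cloud provider’s infrastructure overwhelms the normal redundancy built into the cloud, taking down systems that monitor the BES.

o Likelihood and impact: A short discussion agreed on by the committee or subcommittee, per item H.

o Who can mitigate it: Primarily the platform CSP and/or the SaaS provider; the NERC entity may play a secondary role (for example, by maintaining an on-premises backup Control Center capability).

o Possible mitigations: Illustrative examples only, such as failover of the service across cloud regions and periodic testing of that failover.

o Status: Updated as the report is revised, based on whatever the providers choose to publish (per items M and N, the committee itself will not publicize which providers have mitigated which risks).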
Conclusion
Would you like to participate in and/or support the committee just described? Please drop me an email.
If you would like to comment on what you have read here, I would love to hear from you. Please comment below or email me at [email protected].
[i] All Control Centers owned and/or operated by larger NERC entities are required by the NERC EOP-008 and CIP-012 standards to maintain a backup facility to which the critical functions of the Control Center can be failed over in the event of an emergency. Complete failover must be tested annually.
[ii] The NERC CIP “prohibition” on using the cloud only applies to NERC entities with high and medium impact BES assets. Those with low impact assets aren’t restricted at all today.
[iii] It isn’t necessarily a good thing that SaaS providers don’t report vulnerabilities after they have patched them. See this post.
[iv] I pointed to over twelve “cloud-only risks” (which I’m now calling “cloud risks”) in this post, in the section titled “Cloud-only risks”.
[v] Since there are so many possible cloud risks and addressing each risk is likely to take many months, there may ultimately need to be many subcommittees working at the same time. Of course, each subcommittee will need to have a certain number of members from as many different categories (electric utilities, software developers, etc.) as possible. No subcommittee should be composed entirely of individuals from just one company or category.
[vi] No cloud provider will be required to discuss any mitigation whose disclosure might pose a security risk or require exposing trade secrets.