

Use Machine Learning and Open Source Tools to Automate Load Prediction

This item is part of the Special Issue - 2019-10 - Artificial Intelligence.

Several of our utility customers were interested in a solution that would add load estimation capabilities to their existing architectures. Accurate load prediction helps generators plan for future demand and participate in the energy markets more cost-effectively. When meters or the telemetry to them are down, a load estimator can also reconcile the missing load data. We knew machine learning would be well suited to sifting through the wide range of candidate variables to determine the predicted load, and that open source tools would allow us to quickly develop a highly accurate load estimator at a cost-effective price. We call the tool AMBLE (Automated Machine Learning Based Load Estimator). This article discusses how we developed the system and the lessons we learned.

Model Set-up

The project began in the summer of 2018 with three years of a customer’s load data. Our data scientist explored several machine learning algorithms and ultimately got the best results with random forest regression. The model was built with open source tools, including the Python packages numpy, pandas, seaborn, datetime, matplotlib, requests, xlrd, sklearn, and bs4, along with InfluxDB.
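As a rough illustration of the approach described above, the following is a minimal sketch of random forest regression with sklearn. It is not the AMBLE code: the data is synthetic, and the feature set, the toy load formula, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical load data (NOT the customer's data).
rng = np.random.default_rng(0)
n_hours = 2000
hour = rng.integers(0, 24, n_hours)
temperature_c = rng.uniform(-5, 35, n_hours)
humidity_pct = rng.uniform(20, 90, n_hours)

# Assumed toy relationship: load rises as temperature moves away from 18 C,
# with a small daily cycle and some noise.
load_mw = (
    50.0
    + 2.0 * np.abs(temperature_c - 18.0)
    + 5.0 * np.sin(2 * np.pi * hour / 24)
    + rng.normal(0.0, 1.0, n_hours)
)

# One row per hour, one column per input variable.
X = np.column_stack([hour, temperature_c, humidity_pct])
X_train, X_test, y_train, y_test = train_test_split(
    X, load_mw, test_size=0.25, random_state=0
)

# Fit the random forest on the training split and predict the held-out hours.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predicted_mw = model.predict(X_test)

# Share of held-out predictions within +/-10% of the actual load.
pct_error = 100.0 * (predicted_mw - y_test) / y_test
share_within_10 = float(np.mean(np.abs(pct_error) <= 10.0))
print(f"Share within +/-10%: {share_within_10:.3f}")
```

A real deployment would replace the synthetic arrays with historical load and weather records, for example loaded through pandas.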



With the scripting and workflow in place, we continued to refine the model through many simulations. We looked carefully at the results and at the input variables that most influenced the model. Through a series of experiments and a productive exchange of ideas, we continued to improve the correlation of the model and better understood which parameters had the greatest influence on the predictions. The initial work showed promise: 95% of the data was within ±10% error and 99.6% was within ±20% error.
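The error bands quoted above can be computed along the following lines. This is a sketch under an assumption: the article does not define its error formula, so percent error is taken here as (predicted - actual) / actual, and the sample values are made up for illustration.

```python
def share_within(actual, predicted, band_pct):
    """Return the fraction of points whose percent error is within +/- band_pct.

    Assumes percent error = 100 * (predicted - actual) / actual and that
    every actual value is nonzero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = 0
    for a, p in zip(actual, predicted):
        pct_error = 100.0 * (p - a) / a
        if abs(pct_error) <= band_pct:
            hits += 1
    return hits / len(actual)


# Hypothetical readings in MW, for illustration only.
actual_mw = [100.0, 120.0, 80.0, 95.0]
predicted_mw = [104.0, 110.0, 95.0, 95.5]

for band in (5, 10, 20):
    print(f"within +/-{band}%: {share_within(actual_mw, predicted_mw, band):.2f}")
```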


Figure 2 95% of data is within ±10% error

Figure 3 Original model error plot

95% within ±10% error

99.6% within ±20% error


Moving into 2019, we adjusted the input variables to include date, hour, and day, along with environmental variables: temperature, humidity, wind speed, wind direction, gust speed, precipitation, dew point, pressure, light conditions, and wind chill. The continued development effort led to faster performance and more accurate results, as indicated in the figures below.
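One way to assemble such a feature row from a timestamp and a weather observation is sketched below. The encodings are assumptions, since the article does not say how the variables were represented: date is encoded as day of year, day as day of week, and light conditions as a numeric code. The function and field names are hypothetical.

```python
from datetime import datetime

# Environmental variables named in the article; the key spellings are assumed.
WEATHER_FIELDS = (
    "temperature", "humidity", "wind_speed", "wind_direction", "gust_speed",
    "precipitation", "dew_point", "pressure", "light_conditions", "wind_chill",
)


def build_features(timestamp, weather):
    """Combine calendar fields and weather readings into one feature dict."""
    missing = [name for name in WEATHER_FIELDS if name not in weather]
    if missing:
        raise KeyError(f"missing weather fields: {missing}")
    features = {
        "day_of_year": timestamp.timetuple().tm_yday,  # assumed encoding of "date"
        "hour": timestamp.hour,
        "day_of_week": timestamp.weekday(),  # 0 = Monday ... 6 = Sunday
    }
    for name in WEATHER_FIELDS:
        features[name] = weather[name]
    return features


# Hypothetical observation, for illustration only.
sample_weather = {
    "temperature": 4.5, "humidity": 62.0, "wind_speed": 12.0,
    "wind_direction": 270.0, "gust_speed": 20.0, "precipitation": 0.0,
    "dew_point": -2.0, "pressure": 1015.0, "light_conditions": 1,
    "wind_chill": 1.0,
}
row = build_features(datetime(2019, 3, 4, 14, 0), sample_weather)
print(row)
```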

Figure 4 99.96% of data is within ±10% error

Figure 5 Improved model error plot

99.36% within ±5% error

99.96% within ±10% error





Lessons Learned

Temperature carries the most weight in determining load

Reviewing the impact of each variable helped us build confidence in the model. Standard industry practice holds that temperature typically accounts for about 70% of the influence on a load model. This is consistent with the data we observed, as shown below.

Figure 6 Temperature carries the most importance in load prediction.
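For readers who want to inspect variable influence themselves, sklearn's random forest exposes a `feature_importances_` attribute after fitting. The sketch below uses synthetic data with an assumed, strongly temperature-driven load formula, so its numbers are illustrative and do not reproduce the 70% figure or the article's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["temperature", "hour", "humidity"]

# Synthetic inputs (assumption: not real utility data).
rng = np.random.default_rng(1)
n = 1500
temperature = rng.uniform(-5, 35, n)
hour = rng.integers(0, 24, n)
humidity = rng.uniform(20, 90, n)

# Assumed toy load: dominated by temperature, with a smaller daily cycle.
load = (
    50.0
    + 2.0 * np.abs(temperature - 18.0)
    + 3.0 * np.sin(2 * np.pi * hour / 24)
    + 0.02 * humidity
    + rng.normal(0.0, 1.0, n)
)

X = np.column_stack([temperature, hour, humidity])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, load)

# Importances sum to 1; sort from most to least influential.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:.3f}")
```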

However, we also learned that not all cities and models are influenced by temperature in the same way. We explore some of the abnormal model data below. The examples we present offer strong evidence that black-box machine learning models are a good fit for this class of problems because of the non-intuitive variance in the load over time and even across the days of the week.

Machine Learning is Adaptable

We tested the adaptability of the model by comparing data from two time periods for the same city. As shown below, the 2019 load profile changed significantly from the original 2016 training data. The peak load doubled. In addition, the weekend-to-weekday load ratio changed dramatically. In 2016, the load went up and down on a regular daily cycle, but in the same period in 2019, Monday through Friday had almost twice the load of Saturdays and Sundays. Our hypothesis is that a large industrial load entered the grid between 2016 and 2019 and operated only Monday through Friday. Despite these changes, AMBLE adapted and yielded greater accuracy than in our earlier simulations.

Figure 7 The model is adaptable. Despite peak loads doubling and load ratio changes, it became more accurate over time.
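A shift like the weekday and weekend change described above can be checked with a simple ratio of average loads. The following sketch uses made-up daily values for one week; the function name and data are hypothetical.

```python
from datetime import date
from statistics import mean


def weekday_weekend_ratio(daily_load):
    """Return mean weekday load divided by mean weekend load.

    daily_load maps a datetime.date to that day's average load in MW.
    """
    weekday = [mw for day, mw in daily_load.items() if day.weekday() < 5]
    weekend = [mw for day, mw in daily_load.items() if day.weekday() >= 5]
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend day")
    return mean(weekday) / mean(weekend)


# Hypothetical week starting Monday 2019-03-04, for illustration only.
loads_mw = [200.0, 200.0, 200.0, 200.0, 200.0, 105.0, 105.0]
week = {date(2019, 3, 4 + i): mw for i, mw in enumerate(loads_mw)}

ratio = weekday_weekend_ratio(week)
print(f"weekday/weekend ratio: {ratio:.2f}")
```

Tracking this ratio over successive periods is one simple way to flag a structural change in the load profile, such as a new industrial customer.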

Open Source Tools Enable Collaboration

Open source tools can be easily integrated with closed source solutions. We generated the graphs above and below with an open source analytics tool. Once integrated, these tools enable universal data access and analysis. It is important to note that open source does not necessarily mean “free.” If you choose to integrate open source tools into your machine learning application, they will require either internal or third-party support. The good news is that these solutions are scalable and reliable. We have implemented open source time series solutions in commercial settings with proven results.

Figure 8 Open source tools can easily integrate with closed source solutions. They are scalable, reliable, and can be implemented to enable universal data access and analysis.

Automated Load Prediction

In environments where communications between the customer communities and the generation plant are not 100% reliable, load prediction serves as a backup (perhaps even a primary source) in the event actual load data is not received. Automating the process reduces manual effort and the likelihood of errors. Universal data access across the enterprise allows generation and load to be managed as a single entity.
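The backup role described above amounts to filling gaps in the metered series with the model's estimate. A minimal sketch follows, assuming a missing reading arrives as `None`; the function name and sample values are hypothetical.

```python
def reconcile_load(metered_mw, estimated_mw):
    """Return (value, source) pairs, using the estimate where telemetry is missing.

    A metered value of None is treated as a missed reading (assumption).
    """
    if len(metered_mw) != len(estimated_mw):
        raise ValueError("series must have the same length")
    reconciled = []
    for metered, estimated in zip(metered_mw, estimated_mw):
        if metered is None:
            reconciled.append((estimated, "estimated"))
        else:
            reconciled.append((metered, "metered"))
    return reconciled


# Hypothetical hourly values in MW; None marks a telemetry dropout.
metered = [101.2, None, None, 98.7]
estimated = [100.5, 102.0, 99.8, 99.1]

for value, source in reconcile_load(metered, estimated):
    print(f"{value:6.1f} MW ({source})")
```

Recording the source next to each value keeps estimated points distinguishable from metered ones in later analysis.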

Future Implications

Our example is one in which machine learning works with increasingly large data sets to predict load for a power plant. We made the application easy to integrate so that multiple stakeholders can access and analyze the data. As the number of devices and applications connecting to the grid continues to grow exponentially, so will the amounts and types of data available to utilities. Machine learning will gain popularity as the tool for utilities that seek to unlock the value of the ever-growing and increasingly complex data they have and continue to gather.


Matt Chester on Dec 2, 2019

In environments where communications between the customer communities and the generation plant are not 100% reliable, load prediction serves as a backup in the event actual load data is not received.

What situations are most likely to make these communications least reliable? Is it a matter of distance, technology, or incident?

Posted by Allison Salke
