Asset Integrity Management: Case Studies in Data Science for Power Generators
- Jun 1, 2019 12:08 am GMT
Data science is a multidisciplinary field that combines math, statistics, computer science, machine learning and domain expertise. Without domain expertise, it is difficult to extract appropriate insights from the data. Data science involves:
- Data collection and preparation
Within the energy field there are four aspects where data science can be applied:
- Energy generation
Plants that have applied data science and machine learning have reduced their operations and repair costs; improving calculations by even a few percent can lead to higher profit margins. As computing power has increased, data science has shifted from a descriptive tool to a predictive one, in which machine learning is used to predict failures and minimize outages. Today the objective is to be prescriptive, that is, to recommend actions that prevent incidents or optimize outcomes based on plant operating data trends and industry operating experience.
State-of-the-art analytics should include real-time predictions and decision making, but this is still difficult to implement and not yet affordable for many companies.
Evaluation Process
A data scientist has to:
- Identify the problem
- Gather and prepare the data
- Visualize it
- Perform “feature engineering” by selecting the particular features for the model
- Choose the appropriate model and build it
- Implement the solution and test the hypothesis
At this point it is essential to interpret the results correctly, which is why domain expertise is a fundamental requirement. If the results are unexpected or unfavorable, it is necessary to iterate and refine the model or its inputs.
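The steps above can be sketched as a small script. This is a minimal illustration on invented numbers, not Intertek's workflow: the records, the single feature (unit starts per year), the cost index, and the straight-line model are all assumptions chosen for brevity.

```python
# Hypothetical records: (unit starts per year, maintenance cost index).
# None marks a missing reading; every value here is invented for illustration.
raw = [(10, 1.2), (25, 1.9), (40, None), (55, 3.1), (70, 3.8), (85, 4.6), (100, 5.1)]

def prepare(records):
    """Gather and prepare: drop rows with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_line(points):
    """Choose and build a model: ordinary least squares for y = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_abs_error(model, points):
    """Test the hypothesis: average absolute error on held-out points."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in points) / len(points)

clean = prepare(raw)
train, test = clean[::2], clean[1::2]   # simple alternating hold-out split
model = fit_line(train)
error = mean_abs_error(model, test)
print(f"slope={model[1]:.4f}, held-out error={error:.3f}")
```

If the held-out error were unacceptably large, the next iteration would revisit the features or the model choice rather than the test data.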
Plant Operational and Reliability Database (INGRID)
Using data science, Intertek created a database of hourly generation and emissions for all the plants in the US that report to the EPA. The database currently holds almost a billion records and can be used to plot:
- Aggregated data which allows us to see state-wide trends
- Individual point plots which allow us to see clusters and patterns
- Plant outcome histograms which allow us to see changes in operating regimes
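The aggregated and histogram views can be sketched on a handful of hourly records. The record layout (state, plant, hour, MW), the plant names, and the values are hypothetical; the article does not describe the database schema.

```python
from collections import Counter, defaultdict

# Hypothetical hourly records: (state, plant, hour of day, output in MW).
records = [
    ("CA", "Plant A", 0, 120.0), ("CA", "Plant A", 1, 110.0),
    ("CA", "Plant B", 0, 300.0), ("CA", "Plant B", 1, 280.0),
    ("TX", "Plant C", 0, 450.0), ("TX", "Plant C", 1, 455.0),
]

# Aggregated view: total output per state and hour, for state-wide trends.
state_hour_mw = defaultdict(float)
for state, _plant, hour, mw in records:
    state_hour_mw[(state, hour)] += mw

def output_histogram(rows, plant, bin_mw=100):
    """Histogram view: count the hours one plant spent in each output bin."""
    bins = Counter()
    for _state, name, _hour, mw in rows:
        if name == plant:
            bins[int(mw // bin_mw) * bin_mw] += 1
    return dict(bins)

print(state_hour_mw[("CA", 0)])
print(output_histogram(records, "Plant B"))
```

A shift in which bins fill up from one year to the next is the kind of change in operating regime that the histograms are meant to reveal.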
Case Studies Using Plant Database
Chart 1- Statewide Generation
Using the database, Intertek created Chart 1 (below), which shows the average generation of all fossil plants, solar farms and wind farms in California over the course of a single day for several years. This plot, called the duck curve because of its shape, shows how the generation mix keeps evolving and quantifies the cycling that will be required of the fossil fleet if the solar contribution keeps growing at the observed rate. The chart shows that in 2015 the entire fossil fleet in California had to ramp up 1 TWh in only 3 hours (3 pm - 6 pm), even though solar represented only 5% of the total energy generated in the state.
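The ramp that a duck curve imposes on the fossil fleet can be measured directly from an hourly profile. The sketch below uses an invented 24-hour profile in GW, not the California data, and the helper name `largest_ramp` is hypothetical.

```python
# Hypothetical average fossil generation by hour of day (GW), hours 0..23.
fossil_gw = [10, 9, 9, 9, 9, 10, 11, 11, 10, 8, 7, 6,
             6, 6, 6, 7, 9, 12, 14, 15, 15, 14, 12, 11]

def largest_ramp(series, window_hours=3):
    """Return (start_hour, increase) for the steepest rise over the window."""
    best_start, best_rise = 0, float("-inf")
    for start in range(len(series) - window_hours):
        rise = series[start + window_hours] - series[start]
        if rise > best_rise:
            best_start, best_rise = start, rise
    return best_start, best_rise

start, rise = largest_ramp(fossil_gw)
print(f"Steepest 3-hour ramp starts at hour {start}: +{rise} GW")
```

Running the same calculation on each year's profile would show how the required ramp grows as midday solar output deepens the curve.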
Charts 2 & 3- Cycling and Output Frequency
The database also allowed Intertek to create Chart 2 (below left), which shows the estimated cycling damage for different unit types and sizes in Texas, and Chart 3 (below right), which shows the output history and output frequency histogram of a single unit. That unit went from operating at full load to operating at lower loads and, recently, above maximum capacity, which has a big impact on cycling damage.
Chart 4- Wind Turbine
This is an example of temperature data measured on a wind turbine stator. These temperatures exceeded a maximum threshold, close to the melting point of the insulating material, which led to ground faults in the generators. By using data science, Intertek could quantify the total time the material was exposed to those excessive temperatures, determine the period in which it happened and make a targeted recommendation for replacement.
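Quantifying exposure of this kind comes down to finding the stretches of a time series that sit above a limit and adding up their durations. The sketch below assumes a 10-minute sampling interval, a 130 °C limit, and invented readings; none of these values come from the study.

```python
from datetime import datetime, timedelta

# Hypothetical stator temperature log at 10-minute intervals (degrees C).
t0 = datetime(2018, 7, 1, 12, 0)
temps_c = [118, 124, 131, 137, 142, 139, 133, 127, 121, 136, 138, 125]
readings = [(t0 + timedelta(minutes=10 * i), t) for i, t in enumerate(temps_c)]

THRESHOLD_C = 130                 # assumed limit, not a value from the study
SAMPLE = timedelta(minutes=10)    # assumed sampling interval

def exceedance_periods(samples, limit):
    """Group consecutive over-limit samples into (start, end) periods."""
    periods, start = [], None
    for stamp, temp in samples:
        if temp > limit and start is None:
            start = stamp
        elif temp <= limit and start is not None:
            periods.append((start, stamp))
            start = None
    if start is not None:         # still over the limit when the log ends
        periods.append((start, samples[-1][0] + SAMPLE))
    return periods

periods = exceedance_periods(readings, THRESHOLD_C)
total = sum((end - begin for begin, end in periods), timedelta())
for begin, end in periods:
    print(f"{begin:%H:%M} to {end:%H:%M}")
print(f"total time above {THRESHOLD_C} C: {total}")
```

Each over-limit sample is treated as covering the interval up to the next reading, so the total is only as precise as the sampling interval.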
Charts 5 & 6- Gas Plants
Intertek studied the change in heat rate and the efficiency loss due to cycling operation of a gas plant. Chart 5 shows heat rate versus megawatts on days with a start (plotted in blue) compared with days without a start, when the unit was working at full load (plotted in red). There is a significant loss of efficiency on days with a start-up: the heat rate trend on those days exceeds the trend on days without a start. The higher the heat rate, the lower the efficiency.
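A comparison like the one in Chart 5 starts by computing a heat rate for each day and splitting the days by whether a start occurred. The daily totals below are invented, and the tuple layout is an assumption; only the definition of heat rate (fuel energy in divided by electrical energy out) is standard.

```python
from statistics import mean

# Hypothetical daily totals: (had_start, heat input in MMBtu, generation in MWh).
days = [
    (True, 41_000, 4_000), (True, 52_500, 5_000), (True, 31_500, 3_000),
    (False, 93_600, 9_600), (False, 94_080, 9_600), (False, 92_160, 9_600),
]

def heat_rate_btu_per_kwh(heat_mmbtu, gen_mwh):
    """Fuel energy in per unit of electricity out; 1 MMBtu/MWh = 1000 Btu/kWh."""
    return heat_mmbtu / gen_mwh * 1000

with_start = mean(heat_rate_btu_per_kwh(h, g) for s, h, g in days if s)
without_start = mean(heat_rate_btu_per_kwh(h, g) for s, h, g in days if not s)
print(f"start days: {with_start:.0f} Btu/kWh, no-start days: {without_start:.0f} Btu/kWh")
```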
In Chart 6, the same unit's average heat rate increased by 3% (with a corresponding loss of efficiency) over the six-year period from 2011 to 2017. Part of the loss is attributable to general plant and component aging, but the most significant contributor is the increase in cycling operation over those same six years.
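To see what a 3% heat rate increase means for efficiency, recall that thermal efficiency is the energy content of a kilowatt-hour (about 3412 Btu) divided by the heat rate in Btu/kWh. The starting heat rate of 7000 Btu/kWh below is an assumed figure for illustration, not the unit's actual value.

```python
BTU_PER_KWH = 3412.14   # energy content of 1 kWh expressed in Btu

def efficiency(heat_rate_btu_per_kwh):
    """Thermal efficiency as a fraction: electrical output / fuel input."""
    return BTU_PER_KWH / heat_rate_btu_per_kwh

base_heat_rate = 7000.0                  # assumed starting heat rate, Btu/kWh
aged_heat_rate = base_heat_rate * 1.03   # the 3% increase described above

base_eff = efficiency(base_heat_rate)
aged_eff = efficiency(aged_heat_rate)
relative_loss = 1 - aged_eff / base_eff  # equals 1 - 1/1.03 for any base value

print(f"{base_eff:.1%} -> {aged_eff:.1%} (relative loss {relative_loss:.2%})")
```

Because efficiency is inversely proportional to heat rate, the relative loss depends only on the 3% figure and not on the assumed starting point.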
Chart 7- Machine Learning Applied to Gas and Coal Plant Failures
Boiler tubes are sometimes composed of two metals, and the dissimilar metal welds are prone to fail due to creep caused by repeated heat-up/cool-down cycles as the plant output varies. Intertek used data from 56 fossil plants with dissimilar metal weld failures and tested different machine learning algorithms. Using cycling-related variables as inputs, Intertek wanted to learn whether a model could be built to predict both which welds were prone to failure and the time to failure. Of the algorithms tested, neural networks predicted these types of failures best.
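The article does not list the cycling-related variables that were used. As a hedged illustration of how such inputs might be derived from hourly output data, the sketch below computes three hypothetical features (start count, large load drops, and online hours) from an invented series; the feature names and the 150 MW drop limit are assumptions.

```python
# Hypothetical hourly net output (MW) for one unit; 0 means offline.
hourly_mw = [0, 0, 150, 400, 400, 220, 400, 400, 0, 0, 0, 180, 400, 400, 210, 0]

def count_starts(series):
    """Count offline-to-online transitions."""
    return sum(1 for prev, cur in zip(series, series[1:]) if prev == 0 and cur > 0)

def count_load_swings(series, min_drop_mw=150):
    """Count hour-to-hour drops of at least min_drop_mw while the unit stays online."""
    return sum(
        1 for prev, cur in zip(series, series[1:])
        if cur > 0 and prev - cur >= min_drop_mw
    )

features = {
    "starts": count_starts(hourly_mw),
    "load_swings": count_load_swings(hourly_mw),
    "online_hours": sum(1 for mw in hourly_mw if mw > 0),
}
print(features)
```

Features of this kind, computed per plant over its operating history, are the sort of tabular input a failure-prediction model would be trained on.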
The model matched the data with a 95% prediction rate. Intertek was able not only to predict whether a failure occurred at a given plant, but also to estimate the time of failure, which was quite accurate for failures occurring in less than 40 years. The model used only 10 cycling-related variables, and including more variables would be expected to improve its prediction accuracy. An advantage of the neural network model is that the model itself decides which features are more important.